Interactive Reinforcement Learning with Dynamic Reuse of Prior Knowledge
from Human and Agent Demonstrations
Abstract
Reinforcement learning has enjoyed multiple impressive successes in recent years. However, these
successes typically require very large amounts of
data before an agent achieves acceptable performance. This paper focuses on a novel way of
reducing such data requirements by leveraging existing (human or agent) knowledge. In particular, it
uses demonstrations, allowing an agent
to quickly achieve high performance. This paper
introduces the Dynamic Reuse of Prior (DRoP)
algorithm, which combines the offline knowledge
(demonstrations recorded before learning) with
an online confidence-based performance analysis.
DRoP leverages the demonstrator’s knowledge by
automatically balancing between reusing the prior
knowledge and the current learned policy, allowing the agent to outperform the original demonstrations. We compare with multiple state-of-the-art learning algorithms and empirically show that
DRoP can achieve superior performance in two domains. Additionally, we show that this confidence
measure can be used to selectively request additional demonstrations, significantly improving the
learning performance of the agent.
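
The idea of balancing prior knowledge against the current learned policy through an online confidence measure can be sketched roughly as follows. This is an illustrative sketch only, not the DRoP algorithm itself: the abstract does not state how confidence is computed, so the running-average update, the per-state bookkeeping, the tie-breaking rule, the demonstration-request threshold, and all names (`ConfidenceArbiter`, `choose_source`, `update`, `wants_demonstration`) are assumptions made for this example.

```python
import random
from collections import defaultdict


class ConfidenceArbiter:
    """Hypothetical helper: tracks a confidence value per (state, source).

    The two sources are "prior" (knowledge from demonstrations) and
    "learned" (the agent's current policy). The update rule below is an
    assumed running average, not the rule used by DRoP.
    """

    def __init__(self, step_size=0.1, epsilon=0.0, seed=None):
        self.step_size = step_size      # how quickly confidence tracks outcomes
        self.epsilon = epsilon          # probability of picking a source at random
        self.conf = defaultdict(float)  # (state, source) -> confidence, starts at 0
        self.rng = random.Random(seed)

    def choose_source(self, state):
        """Return which source should pick the action in this state."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(["prior", "learned"])
        prior = self.conf[(state, "prior")]
        learned = self.conf[(state, "learned")]
        # Assumed tie-break: fall back on the prior knowledge.
        return "learned" if learned > prior else "prior"

    def update(self, state, source, outcome):
        """Move the confidence of the source that acted toward the observed outcome."""
        key = (state, source)
        self.conf[key] += self.step_size * (outcome - self.conf[key])

    def wants_demonstration(self, state, threshold):
        """Assumed request rule: ask for a demonstration when both sources are weak."""
        best = max(self.conf[(state, "prior")], self.conf[(state, "learned")])
        return best < threshold


# Usage: the learned policy takes over once its confidence exceeds the prior's.
arbiter = ConfidenceArbiter(step_size=0.5)
first_choice = arbiter.choose_source("s0")      # both confidences are 0, tie goes to "prior"
arbiter.update("s0", "learned", outcome=1.0)    # learned policy observed a good outcome
second_choice = arbiter.choose_source("s0")     # now "learned" has higher confidence
```

Under these assumptions, a state in which neither source has accumulated confidence is also the natural place to request an additional demonstration, which is what `wants_demonstration` models.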