Interactive Reinforcement Learning with Dynamic Reuse of Prior Knowledge from Human and Agent Demonstrations

资源分类

2019-10-10 |

68 |

39 |

Abstract Reinforcement learning has enjoyed multiple impressive successes in recent years. However, these successes typically require very large amounts of data before an agent achieves acceptable performance. This paper focuses on a novel way of combating such requirements by leveraging existing (human or agent) knowledge. In particular, this paper leverages demonstrations, allowing an agent to quickly achieve high performance. This paper introduces the Dynamic Reuse of Prior (DRoP) algorithm, which combines the offline knowledge (demonstrations recorded before learning) with an online confidence-based performance analysis. DRoP leverages the demonstrator’s knowledge by automatically balancing between reusing the prior knowledge and the current learned policy, allowing the agent to outperform the original demonstrations. We compare with multiple state-of-theart learning algorithms and empirically show that DRoP can achieve superior performance in two domains. Additionally, we show that this confidence measure can be used to selectively request additional demonstrations, significantly improving the learning performance of the agent

上一篇：Failure-Scenario Maker for Rule-Based Agent using Multi-agent Adversarial Reinforcement Learning and its Application to Autonomous Driving

下一篇：Leveraging Human Guidance for Deep Reinforcement Learning Tasks

用户评价

全部评价

还没有评论，说两句吧！

热门资源

Learning to Predi...

Much of model-based reinforcement learning invo...
Stratified Strate...

In this paper we introduce Stratified Strategy ...
The Variational S...

Unlike traditional images which do not offer in...
A Mathematical Mo...

Direct democracy, where each voter casts one vo...
Rating-Boosted La...

The performance of a recommendation system reli...

智能在线

400-630-6780
聆听.建议反馈

E-mail: support@tusaishared.com