RIDE: REWARDING IMPACT-DRIVEN EXPLORATION FOR PROCEDURALLY-GENERATED ENVIRONMENTS

2020-01-02

Abstract

Exploration in sparse reward environments remains one of the key challenges of model-free reinforcement learning (RL). Instead of solely relying on extrinsic rewards provided by the environment, many state-of-the-art methods use intrinsic rewards to encourage the agent to explore the environment. However, we show that existing methods fall short in procedurally-generated environments where an agent is unlikely to ever visit the same state more than once. We propose a novel type of intrinsic exploration bonus which rewards the agent for actions that change the agent's learned state representation. We evaluate our method on multiple challenging procedurally-generated tasks in MiniGrid, as well as on tasks used in prior curiosity-driven exploration work. Our experiments demonstrate that our approach is more sample efficient than existing exploration methods, particularly for procedurally-generated MiniGrid environments. Furthermore, we analyze the learned behavior as well as the intrinsic reward received by our agent. In contrast to previous approaches, our intrinsic reward does not diminish during the course of training and it rewards the agent substantially more for interacting with objects that it can control.
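As a rough illustration of the idea in the abstract, the bonus can be sketched as the size of the change between the embeddings of two consecutive states. This is only a minimal sketch under stated assumptions: the abstract does not give the exact formula, so the use of the Euclidean (L2) distance, the function name `impact_reward`, and the plain-list embeddings are all hypothetical choices made for illustration, and any additional normalization used in the paper is not shown.

```python
import math
from typing import Sequence


def impact_reward(phi_t: Sequence[float], phi_next: Sequence[float]) -> float:
    """Hypothetical impact-style bonus: L2 distance between two embeddings.

    phi_t and phi_next stand in for the learned representations of the
    current and next state. The choice of the L2 distance is an assumption;
    the abstract only says the reward tracks change in the representation.
    """
    if len(phi_t) != len(phi_next):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(phi_t, phi_next)))


# An action that leaves the representation unchanged earns no bonus,
# while an action that moves it earns a bonus equal to the distance moved.
no_change = impact_reward([1.0, 2.0], [1.0, 2.0])
big_change = impact_reward([0.0, 0.0], [3.0, 4.0])
```

Under this sketch, an action that has no effect on the learned representation earns nothing, which matches the abstract's claim that the agent is rewarded more for interacting with objects it can control.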
