
Reconciling λ-Returns with Experience Replay

2020-02-23

Abstract

Modern deep reinforcement learning methods have departed from the incremental learning required for eligibility traces, rendering the implementation of the λ-return difficult in this context. In particular, off-policy methods that utilize experience replay remain problematic because their random sampling of minibatches is not conducive to the efficient calculation of λ-returns. Yet replay-based methods are often the most sample efficient, and incorporating λ-returns into them is a viable way to achieve new state-of-the-art performance. Towards this, we propose the first method to enable practical use of λ-returns in arbitrary replay-based methods without relying on other forms of decorrelation such as asynchronous gradient updates. By promoting short sequences of past transitions into a small cache within the replay memory, adjacent λ-returns can be efficiently precomputed by sharing Q-values. Computation is not wasted on experiences that are never sampled, and stored λ-returns behave as stable temporal-difference (TD) targets that replace the target network. Additionally, our method grants the unique ability to observe TD errors prior to sampling; for the first time, transitions can be prioritized by their true significance rather than by a proxy to it. Furthermore, we propose the novel use of the TD error to dynamically select λ-values that facilitate faster learning. We show that these innovations can enhance the performance of DQN when playing Atari 2600 games, even under partial observability. While our work specifically focuses on λ-returns, these ideas are applicable to any multi-step return estimator.
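The caching idea rests on the fact that λ-returns over a contiguous block of cached transitions satisfy a backward recursion, R_t = r_t + γ[(1 − λ) max_a Q(s_{t+1}, a) + λ R_{t+1}], so adjacent returns can be precomputed in a single backward pass while sharing the same Q-values. Below is a minimal sketch of that recursion, not the authors' implementation; the function name, array arguments, and the truncation to a one-step return at the end of the cache are illustrative assumptions.

```python
import numpy as np

def precompute_lambda_returns(rewards, dones, bootstrap_q, lam=0.9, gamma=0.99):
    """Sketch: backward recursion for lambda-returns over a cached block.

    rewards     : shape (T,), rewards r_t for the cached sequence
    dones       : shape (T,), episode-termination flags
    bootstrap_q : shape (T,), max_a Q(s_{t+1}, a) for each cached step
    Working backwards lets adjacent lambda-returns share Q-values, so each
    return costs O(1) instead of recomputing an O(T) sum per transition.
    """
    T = len(rewards)
    returns = np.zeros(T, dtype=np.float32)
    # Assumed truncation: the last cached step falls back to a one-step return.
    next_return = bootstrap_q[-1]
    for t in reversed(range(T)):
        if dones[t]:
            # No bootstrapping across an episode boundary.
            returns[t] = rewards[t]
        else:
            # R_t = r_t + gamma * ((1 - lam) * max_a Q(s_{t+1}, a) + lam * R_{t+1})
            returns[t] = rewards[t] + gamma * (
                (1.0 - lam) * bootstrap_q[t] + lam * next_return
            )
        next_return = returns[t]
    return returns
```

In a replay-based agent, such precomputed returns would be stored alongside the cached transitions and used directly as TD targets when those transitions are later sampled.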


