Model-Augmented Actor-Critic: Backpropagating through Paths

Abstract

Current model-based reinforcement learning approaches use the model simply as a learned black-box simulator to augment the data for policy optimization or value function learning. In this paper, we show how to make more effective use of the model by exploiting its differentiability. We construct a policy optimization algorithm that uses the pathwise derivative of the learned model and policy across future timesteps. Instabilities of learning across many timesteps are prevented by using a terminal value function, learning the policy in an actor-critic fashion. Furthermore, we present a derivation of the monotonic improvement of our objective in terms of the gradient error in the model and value function. We show that our approach (i) is consistently more sample efficient than existing state-of-the-art model-based algorithms, (ii) matches the asymptotic performance of model-free algorithms, and (iii) scales to long horizons, a regime where past model-based approaches have typically struggled.
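
The sketch below illustrates the core idea the abstract describes: roll a differentiable learned model forward for a short horizon under the policy, close the rollout with a terminal value function, and backpropagate the resulting return directly into the policy parameters ("backpropagating through paths"). It is a minimal illustration only; the module names (`dynamics`, `policy`, `value_fn`, `reward_fn`), network sizes, and the deterministic rollout are assumptions for brevity and are not the paper's exact formulation, which also trains the model and critic and uses reparameterized stochastic policies.

```python
# Minimal PyTorch sketch of a pathwise (model-based actor-critic) policy update.
# All components here are untrained placeholders standing in for learned modules.
import torch
import torch.nn as nn

state_dim, action_dim, horizon, gamma = 4, 2, 5, 0.99

dynamics = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.Tanh(),
                         nn.Linear(64, state_dim))           # s_{t+1} = f(s_t, a_t)
policy   = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(),
                         nn.Linear(64, action_dim))          # a_t = pi(s_t)
value_fn = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(),
                         nn.Linear(64, 1))                   # terminal critic V(s_H)

def reward_fn(state, action):
    # Hypothetical differentiable reward; a real task supplies its own.
    return -(state.pow(2).sum(-1) + 0.1 * action.pow(2).sum(-1))

def pathwise_objective(init_states):
    """H-step model rollout return, bootstrapped with a terminal value function.
    Gradients flow through the model and policy at every step."""
    state, total, discount = init_states, 0.0, 1.0
    for _ in range(horizon):
        action = torch.tanh(policy(state))                    # differentiable action
        total = total + discount * reward_fn(state, action)
        state = dynamics(torch.cat([state, action], dim=-1))  # differentiable transition
        discount *= gamma
    total = total + discount * value_fn(state).squeeze(-1)    # terminal value bootstrap
    return total.mean()

optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)
states = torch.randn(32, state_dim)                           # batch of start states
loss = -pathwise_objective(states)                            # maximize the objective
optimizer.zero_grad()
loss.backward()                                               # pathwise derivative w.r.t. policy
optimizer.step()
```

Truncating the rollout at a short horizon and handing the tail to the terminal value function is what keeps gradients through many timesteps from becoming unstable, per the abstract.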

