A Deeper Look at Planning as Learning from Replay

Abstract

In reinforcement learning, the notions of experience replay, and of planning as learning from replayed experience, have long been used to find good policies with minimal training data. Replay can be seen either as model-based reinforcement learning, where the store of past experiences serves as the model, or as a way to avoid a conventional model of the environment altogether. In this paper, we look more deeply at how replay blurs the line between model-based and model-free methods. First, we show for the first time an exact equivalence between the sequence of value functions found by a model-based policy-evaluation method and by a model-free method with replay. Second, we present a general replay method that can mimic a spectrum of methods ranging from the explicitly model-free (TD(0)) to the explicitly model-based (linear Dyna). Finally, we use insights gained from these relationships to design a new model-based reinforcement learning algorithm for linear function approximation. This method, which we call forgetful LSTD(λ), improves upon regular LSTD(λ) because it extends more naturally to online control, and improves upon linear Dyna because it is a multi-step method, enabling it to perform well even in non-Markov problems or, equivalently, in problems with significant function approximation.
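
To make the replay-as-model view concrete, here is a minimal Python sketch. It is an illustration under assumed toy settings (a random 5-state Markov chain, arbitrary step size and sweep counts), not the paper's construction or its forgetful LSTD(λ) algorithm. It evaluates a fixed policy in two ways: by sweeping model-free TD(0) updates over a replay buffer, and by directly solving the LSTD fixed point implied by that same buffer, which here plays the role of the model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, gamma, alpha = 5, 0.9, 0.05

# Toy "environment": a random 5-state Markov chain with state rewards.
P = rng.dirichlet(np.ones(n_states), size=n_states)  # row-stochastic transitions
R = rng.normal(size=n_states)                        # reward for leaving each state

# Collect a fixed batch of on-policy transitions: the replay buffer.
buffer, s = [], 0
for _ in range(1000):
    s_next = int(rng.choice(n_states, p=P[s]))
    buffer.append((s, R[s], s_next))
    s = s_next

# Model-free with replay: repeatedly sweep tabular TD(0) over the buffer.
theta = np.zeros(n_states)
for _ in range(500):
    for s, r, s2 in buffer:
        theta[s] += alpha * (r + gamma * theta[s2] - theta[s])

# Model-based view: the buffer *is* the model. Solve its LSTD fixed point
# A theta = b, where A = sum phi(s) (phi(s) - gamma phi(s'))^T and
# b = sum r phi(s), with one-hot features phi.
A = np.zeros((n_states, n_states))
b = np.zeros(n_states)
for s, r, s2 in buffer:
    A[s, s] += 1.0
    A[s, s2] -= gamma
    b[s] += r
theta_lstd = np.linalg.solve(A, b)

# The replayed TD(0) estimate approaches the model-based solution.
print(np.abs(theta - theta_lstd).max())
```

With a small step size the two estimates nearly coincide; the residual gap in this naive sweeping scheme is a step-size effect, whereas the equivalence established in the paper is exact for its particular replay method.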
