
Convergent TREE BACKUP and RETRACE with Function Approximation


Abstract

Off-policy learning is key to scaling up reinforcement learning, as it allows learning about a target policy from the experience generated by a different behavior policy. Unfortunately, it has been challenging to combine off-policy learning with function approximation and multi-step bootstrapping in a way that leads to both stable and efficient algorithms. In this work, we show that the TREE BACKUP and RETRACE algorithms are unstable with linear function approximation, both in theory and in practice with specific examples. Based on our analysis, we then derive stable and efficient gradient-based algorithms using a quadratic convex-concave saddle-point formulation. By exploiting the problem structure proper to these algorithms, we are able to provide convergence guarantees and finite-sample bounds. The applicability of our new analysis also goes beyond TREE BACKUP and RETRACE and allows us to provide new convergence rates for the GTD and GTD2 algorithms without having recourse to projections or Polyak averaging.
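The saddle-point construction the abstract refers to generalizes the gradient-TD family that includes GTD and GTD2. As a point of reference, below is a minimal sketch of the classic one-step GTD2 update with linear features, which solves the analogous quadratic saddle point via a primal-dual pair of weight vectors. The function name, step sizes, and single-transition interface are illustrative assumptions, and the paper's TREE BACKUP and RETRACE trace corrections are deliberately omitted.

```python
import numpy as np

def gtd2_update(theta, w, phi_s, phi_next, reward, gamma=0.99,
                alpha=1e-3, beta=1e-2):
    """One GTD2 update on a single transition (illustrative sketch).

    theta    -- primal weights (linear value-function parameters)
    w        -- auxiliary (dual) weights of the saddle-point formulation
    phi_s    -- feature vector of the current state
    phi_next -- feature vector of the next state
    """
    # One-step TD error under the current value estimate.
    delta = reward + gamma * (phi_next @ theta) - (phi_s @ theta)
    # Dual update: w tracks the expected TD error projected onto the features.
    w = w + beta * (delta - phi_s @ w) * phi_s
    # Primal update: gradient-style correction using the auxiliary weights.
    theta = theta + alpha * (phi_s - gamma * phi_next) * (phi_s @ w)
    return theta, w

# Usage sketch: update from a single transition with 4-dimensional features.
theta, w = np.zeros(4), np.zeros(4)
theta, w = gtd2_update(theta, w,
                       phi_s=np.array([1.0, 0.0, 0.5, 0.0]),
                       phi_next=np.array([0.0, 1.0, 0.0, 0.5]),
                       reward=1.0)
```

The two step sizes reflect the two-timescale structure: the dual variable `w` is typically moved faster (larger `beta`) than the primal weights `theta`.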

