
Action-Gap Phenomenon in Reinforcement Learning


Abstract
Many practitioners of reinforcement learning problems have observed that oftentimes the performance of the agent reaches very close to the optimal performance even though the estimated (action-)value function is still far from the optimal one. The goal of this paper is to explain and formalize this phenomenon by introducing the concept of the action-gap regularity. As a typical result, we prove that for an agent following the greedy policy $\hat{\pi}$ with respect to an action-value function $\hat{Q}$, the performance loss $\mathbb{E}\left[V^*(X) - V^{\hat{\pi}}(X)\right]$ is upper bounded by $O\left(\|\hat{Q} - Q^*\|_\infty^{1+\zeta}\right)$, in which $\zeta \ge 0$ is the parameter quantifying the action-gap regularity. For $\zeta > 0$, our results indicate smaller performance loss compared to what previous analyses had suggested. Finally, we show how this regularity affects the performance of the family of approximate value iteration algorithms.
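For intuition: the action gap at a state is the difference between the optimal action-value of the best action and that of the second-best action, and the regularity condition roughly requires that states with a tiny gap are rare, e.g. $P(0 < \text{gap} \le t) \lesssim t^\zeta$ near zero. Below is a minimal Monte-Carlo sketch of why such regularity yields a loss of order $\varepsilon^{1+\zeta}$ when $\|\hat{Q} - Q^*\|_\infty \le \varepsilon$; the one-step, two-action construction and every name in it are illustrative assumptions, not code or notation from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulated_loss(eps, zeta, n_states=200_000):
    """Monte-Carlo estimate of E[V*(X) - V^{pi_hat}(X)] in a toy
    one-step problem with two actions per state.

    Action gaps g(X) are drawn so that P(g <= t) = t^zeta on [0, 1],
    i.e. the polynomial action-gap regularity assumed in this sketch.
    """
    # Inverse-transform sampling: u ~ U(0,1), g = u^(1/zeta) has CDF t^zeta.
    g = rng.uniform(0.0, 1.0, n_states) ** (1.0 / zeta)
    # Optimal values: Q*(x, a0) = g(x), Q*(x, a1) = 0, so V*(x) = g(x)
    # and the action gap at x is exactly g(x).
    q_star = np.stack([g, np.zeros(n_states)], axis=1)
    # Estimated values: uniform perturbation with ||Q_hat - Q*||_inf <= eps.
    q_hat = q_star + rng.uniform(-eps, eps, size=q_star.shape)
    greedy = np.argmax(q_hat, axis=1)
    # The greedy policy loses g(x) whenever it picks the suboptimal action a1,
    # which can only happen on states whose gap is below ~2*eps.
    loss = np.where(greedy == 1, g, 0.0)
    return loss.mean()

for zeta in [0.5, 1.0, 2.0]:
    for eps in [0.4, 0.2, 0.1, 0.05]:
        print(f"zeta={zeta:3.1f}  eps={eps:5.2f}  "
              f"loss={simulated_loss(eps, zeta):.2e}  "
              f"eps^(1+zeta)={eps ** (1 + zeta):.2e}")
```

As $\varepsilon$ shrinks, the printed loss tracks $\varepsilon^{1+\zeta}$ rather than $\varepsilon$: states with a gap larger than the estimation error are never misclassified, and the rare small-gap states cost little when mistaken, which is the qualitative improvement over classical bounds that are linear in the value-estimation error.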
