
Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function

2020-02-26

Abstract

We present an algorithm based on the Optimism in the Face of Uncertainty (OFU) principle which is able to efficiently learn Reinforcement Learning (RL) problems modeled by a Markov decision process (MDP) with finite state-action space. By evaluating the state-pair difference of the optimal bias function $h^*$, the proposed algorithm achieves a regret bound of $\tilde{O}(\sqrt{SAHT})$ for an MDP with $S$ states and $A$ actions, in the case that an upper bound $H$ on the span of $h^*$, i.e., $\mathrm{sp}(h^*)$, is known. This result outperforms the best previous regret bound $\tilde{O}(HS\sqrt{AT})$ [Bartlett and Tewari, 2009] by a factor of $\sqrt{SH}$. Furthermore, this regret bound matches the lower bound of $\Omega(\sqrt{SAHT})$ [Jaksch et al., 2010] up to a logarithmic factor. As a consequence, we show that there is a near-optimal regret bound of $\tilde{O}(\sqrt{SADT})$ for MDPs with a finite diameter $D$, compared to the lower bound of $\Omega(\sqrt{SADT})$ [Jaksch et al., 2010].
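As a quick worked check of the claimed improvement, dividing the previous bound by the new one (with logarithmic factors dropped) recovers exactly the stated $\sqrt{SH}$ factor; all symbols are as defined in the abstract:

\[
\frac{HS\sqrt{AT}}{\sqrt{SAHT}}
  = \sqrt{\frac{H^2 S^2 \, AT}{SAHT}}
  = \sqrt{HS}.
\]

The same calculation with the diameter $D$ in place of $H$ relates the diameter-based bound $\tilde{O}(\sqrt{SADT})$ to the $\tilde{O}(DS\sqrt{AT})$ bound of UCRL2 [Jaksch et al., 2010].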
