Online Regret Bounds for Undiscounted Continuous Reinforcement Learning


Abstract

We derive sublinear regret bounds for undiscounted reinforcement learning in continuous state space. The proposed algorithm combines state aggregation with the use of upper confidence bounds for implementing optimism in the face of uncertainty. Besides the existence of an optimal policy which satisfies the Poisson equation, the only assumptions made are Hölder continuity of rewards and transition probabilities.
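For reference, the two assumptions named in the abstract are commonly stated as follows; the notation below follows the standard formulation and is not quoted from the paper: Hölder continuity of rewards and transition probabilities in the state variable, and a Poisson equation for an optimal policy.

```latex
% Hölder continuity (L > 0 and \alpha \in (0, 1] are the Hölder constants):
\[
  |r(s,a) - r(s',a)| \;\le\; L\,|s - s'|^{\alpha},
  \qquad
  \bigl\| p(\cdot \mid s, a) - p(\cdot \mid s', a) \bigr\|_{1}
    \;\le\; L\,|s - s'|^{\alpha}.
\]

% Poisson equation for an optimal policy \pi^* with optimal average
% reward \rho^* and bias function \lambda:
\[
  \rho^{*} + \lambda(s)
    \;=\; r\bigl(s, \pi^{*}(s)\bigr)
      + \int \lambda(s')\, p\bigl(ds' \mid s, \pi^{*}(s)\bigr).
\]
```

As a rough illustration of the state-aggregation-plus-UCB idea, the sketch below discretizes a one-dimensional state space and keeps optimistic reward estimates per aggregated state. This is a minimal sketch, not the paper's actual algorithm: all class and parameter names are assumptions, and the paper's method additionally builds confidence sets over aggregated transition probabilities and plans optimistically over the resulting set of plausible MDPs.

```python
import numpy as np

# Minimal sketch of state aggregation + upper confidence bounds for a
# one-dimensional state space [0, 1]. All names and constants here are
# illustrative assumptions; this is not the paper's exact algorithm.

class AggregatedUCB:
    def __init__(self, n_bins, n_actions, confidence=1.0):
        self.n_bins = n_bins
        self.n_actions = n_actions
        self.confidence = confidence
        # Visit counts and accumulated rewards per (aggregated state, action).
        self.counts = np.zeros((n_bins, n_actions))
        self.reward_sums = np.zeros((n_bins, n_actions))

    def bin_of(self, state):
        # State aggregation: map a continuous state in [0, 1] to a bin index.
        return min(int(state * self.n_bins), self.n_bins - 1)

    def act(self, state, t):
        # Optimism in the face of uncertainty: empirical mean reward plus a
        # confidence bonus that shrinks with the number of visits.
        s = self.bin_of(state)
        n = np.maximum(self.counts[s], 1.0)
        means = self.reward_sums[s] / n
        bonus = self.confidence * np.sqrt(np.log(max(t, 2)) / n)
        return int(np.argmax(means + bonus))

    def update(self, state, action, reward):
        s = self.bin_of(state)
        self.counts[s, action] += 1.0
        self.reward_sums[s, action] += reward
```

The sketch captures only the reward-side optimism and the aggregation step; in the paper's undiscounted setting, the regret analysis hinges on the transition-side confidence sets as well, with the aggregation granularity chosen to balance discretization error (controlled by the Hölder assumption) against estimation error.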
