On the Sample Complexity of Reinforcement Learning with a Generative Model

2020-03-02

Abstract

We consider the problem of learning the optimal action-value function in discounted-reward Markov decision processes (MDPs). We prove a new PAC bound on the sample complexity of the model-based value iteration algorithm in the presence of a generative model, which indicates that for an MDP with N state-action pairs and discount factor γ ∈ [0, 1), only O(N log(N/δ) / ((1 − γ)³ ε²)) samples are required to find an ε-optimal estimate of the action-value function with probability 1 − δ. We also prove a matching lower bound of Θ(N log(N/δ) / ((1 − γ)³ ε²)) on the sample complexity of estimating the optimal action-value function by any RL algorithm. To the best of our knowledge, this is the first result in which the upper bound on the sample complexity of estimating the optimal (action-)value function matches the lower bound in terms of N, ε, δ, and 1/(1 − γ). Moreover, both our lower bound and our upper bound significantly improve on the state of the art in terms of their dependence on 1/(1 − γ).
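The model-based approach in the abstract works by drawing a fixed number of samples per state-action pair from the generative model, building an empirical transition model, and then running value iteration on that empirical MDP. A minimal sketch of this pipeline (the toy MDP, sampling budget, and function names below are illustrative assumptions, not the paper's exact construction or constants):

```python
import numpy as np

def estimate_q(sample_next_state, reward, n_states, n_actions,
               gamma=0.9, samples_per_pair=1000, iters=500):
    """Model-based Q-value iteration with a generative model.

    sample_next_state(s, a) draws one next state from P(.|s, a);
    reward[s, a] is a known deterministic reward. Both are
    illustrative assumptions for this sketch.
    """
    # Build the empirical transition model from i.i.d. samples,
    # using the same sampling budget for every state-action pair.
    p_hat = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            for _ in range(samples_per_pair):
                p_hat[s, a, sample_next_state(s, a)] += 1
    p_hat /= samples_per_pair

    # Run value iteration on the empirical MDP to approximate Q*.
    q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        v = q.max(axis=1)                 # greedy value per state
        q = reward + gamma * p_hat @ v    # empirical Bellman backup
    return q

# Toy 2-state, 2-action MDP: action 1 drifts toward state 1, which pays 1.
rng = np.random.default_rng(0)
P = np.array([[[0.9, 0.1], [0.1, 0.9]],
              [[0.9, 0.1], [0.1, 0.9]]])
R = np.array([[0.0, 0.0],
              [1.0, 1.0]])
q_hat = estimate_q(lambda s, a: rng.choice(2, p=P[s, a]), R, 2, 2)
```

The paper's sample-complexity bound governs how large `samples_per_pair` must be (as a function of N, ε, δ, and 1/(1 − γ)) for the resulting `q_hat` to be ε-accurate with probability 1 − δ.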
