
Q-learning with Nearest Neighbors

2020-02-17

Abstract 

We consider model-free reinforcement learning for infinite-horizon discounted Markov Decision Processes (MDPs) with a continuous state space and unknown transition kernel, when only a single sample path under an arbitrary policy of the system is available. We consider the Nearest Neighbor Q-Learning (NNQL) algorithm to learn the optimal Q function using a nearest neighbor regression method. As the main contribution, we provide a tight finite-sample analysis of the convergence rate. In particular, for MDPs with a d-dimensional state space and discount factor γ ∈ (0, 1), given an arbitrary sample path with "covering time" L, we establish that the algorithm is guaranteed to output an ε-accurate estimate of the optimal Q-function using Õ(L / (ε^3 (1 − γ)^7)) samples. For instance, for a well-behaved MDP, the covering time of the sample path under the purely random policy scales as Õ(1/ε^d), so the sample complexity scales as Õ(1/ε^(d+3)). Indeed, we establish a lower bound that argues that a dependence of Ω̃(1/ε^(d+2)) is necessary.
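The core idea is to represent the Q-function on a finite set of anchor states covering the continuous state space and to route each observed transition to nearby anchors via nearest-neighbor lookup; substituting the random-policy covering time L = Õ(1/ε^d) into the Õ(L / (ε^3 (1 − γ)^7)) bound gives the quoted Õ(1/ε^(d+3)) rate (suppressing the (1 − γ) factors). Below is a minimal Python sketch of this nearest-neighbor idea, not the paper's exact NNQL procedure (which aggregates samples over epochs on a covering of the state space); the anchor set, step size, and all function names are illustrative assumptions.

```python
import numpy as np

def nearest(anchors, s):
    """Return the index of the anchor state closest to s (Euclidean distance)."""
    return int(np.argmin(np.linalg.norm(anchors - np.asarray(s), axis=1)))

def nn_q_learning(path, anchors, n_actions, gamma=0.9, alpha=0.1):
    """Nearest-neighbor style Q-learning from a single sample path.

    path      -- iterable of (s, a, r, s_next) transitions
    anchors   -- (K, d) array of representative states covering the state space
    n_actions -- size of the finite action set
    """
    Q = np.zeros((len(anchors), n_actions))
    for s, a, r, s_next in path:
        i = nearest(anchors, s)            # anchor responsible for the current state
        j = nearest(anchors, s_next)       # anchor used to evaluate the next state
        target = r + gamma * Q[j].max()    # one-step bootstrapped target
        Q[i, a] += alpha * (target - Q[i, a])
    return Q
```

For example, with anchors placed on a uniform grid over [0, 1]^d and transitions collected from a single trajectory under a random policy, the returned table approximates the optimal Q-function at the anchors; a finer grid (smaller ε) reduces discretization error at the cost of the 1/ε^d covering time discussed in the abstract.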

