
Speedy Q-Learning

2020-01-08

Abstract

We introduce a new convergent variant of Q-learning, called speedy Q-learning (SQL), to address the problem of slow convergence in the standard form of the Q-learning algorithm. We prove a PAC bound on the performance of SQL, which shows that for an MDP with n state-action pairs and discount factor γ, only T = O(log(n)/(ε²(1−γ)⁴)) steps are required for the SQL algorithm to converge to an ε-optimal action-value function with high probability. This bound has a better dependency on 1/ε and 1/(1−γ), and thus is tighter than the best available result for Q-learning. Our bound is also superior to the existing results for both model-free and model-based instances of batch Q-value iteration that are considered to be more efficient than incremental methods like Q-learning.
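The abstract can be made concrete with a small sketch. Speedy Q-learning replaces the standard Q-learning update with one that uses the empirical Bellman operator applied to both the current and the previous iterate: Q_{k+1} = Q_k + α_k(T_k Q_{k−1} − Q_k) + (1 − α_k)(T_k Q_k − T_k Q_{k−1}), with step size α_k = 1/(k+1). Below is a minimal, illustrative Python sketch of the synchronous version on a tabular MDP; the helper names, the toy two-state MDP, and the value-iteration baseline are my own for demonstration, not from the paper.

```python
import numpy as np

def speedy_q_learning(P, R, gamma, num_iters, rng):
    """Synchronous speedy Q-learning sketch on a tabular MDP.

    P: (n_states, n_actions, n_states) transition probabilities
    R: (n_states, n_actions) rewards
    Each iteration samples one next state per (s, a) pair and applies
        Q_{k+1} = Q_k + a_k (T_k Q_{k-1} - Q_k)
                      + (1 - a_k)(T_k Q_k - T_k Q_{k-1}),  a_k = 1/(k+1),
    where T_k is the one-sample empirical Bellman optimality operator.
    """
    n_s, n_a, _ = P.shape
    Q_prev = np.zeros((n_s, n_a))
    Q = np.zeros((n_s, n_a))
    for k in range(num_iters):
        alpha = 1.0 / (k + 1)
        # One sampled next state for every (s, a) pair.
        next_states = np.array([
            [rng.choice(n_s, p=P[s, a]) for a in range(n_a)]
            for s in range(n_s)
        ])
        # Empirical Bellman operator applied to both iterates,
        # using the same samples for Q and Q_prev.
        TQ = R + gamma * Q.max(axis=1)[next_states]
        TQ_prev = R + gamma * Q_prev.max(axis=1)[next_states]
        Q_next = Q + alpha * (TQ_prev - Q) + (1 - alpha) * (TQ - TQ_prev)
        Q_prev, Q = Q, Q_next
    return Q

def value_iteration(P, R, gamma, num_iters=1000):
    """Exact value iteration, used here only as a reference for Q*."""
    n_s, n_a, _ = P.shape
    Q = np.zeros((n_s, n_a))
    for _ in range(num_iters):
        Q = R + gamma * P @ Q.max(axis=1)
    return Q

# Toy deterministic 2-state MDP: action a moves to state a;
# reward 1 for choosing action 1 (moving to / staying in state 1).
P = np.zeros((2, 2, 2))
P[0, 0, 0] = P[0, 1, 1] = 1.0
P[1, 0, 0] = P[1, 1, 1] = 1.0
R = np.array([[0.0, 1.0], [0.0, 1.0]])

rng = np.random.default_rng(0)
Q_sql = speedy_q_learning(P, R, gamma=0.9, num_iters=3000, rng=rng)
Q_star = value_iteration(P, R, gamma=0.9)
err = np.abs(Q_sql - Q_star).max()
```

In this deterministic toy MDP the empirical operator coincides with the exact one, so the sketch mainly illustrates the update's structure; the paper's PAC bound concerns the sampled, noisy setting.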

Previous: Query-Aware MCMC

Next: Fast and Accurate k-means For Large Datasets

