
A Unified Approach for Multi-step Temporal-Difference Learning with Eligibility Traces in Reinforcement Learning

2019-11-05

Abstract: Recently, a new multi-step temporal-difference learning algorithm, Q(σ), was proposed; it unifies n-step Tree-Backup (when σ = 0) and n-step Sarsa (when σ = 1) by introducing a sampling parameter σ. However, like other multi-step temporal-difference learning algorithms, Q(σ) requires substantial memory and computation time. Eligibility traces are an important mechanism for transforming offline updates into efficient online ones that consume less memory and computation time. In this paper, we combine the original Q(σ) with eligibility traces and propose a new algorithm, called Q(σ, λ), where λ is the trace-decay parameter. This new algorithm unifies Sarsa(λ) (when σ = 1) and Q^π(λ) (when σ = 0). Furthermore, we give an upper error bound for the Q(σ, λ) policy-evaluation algorithm, and we prove that the Q(σ, λ) control algorithm converges to the optimal value function exponentially. We also compare it empirically with conventional temporal-difference learning methods. Results show that, with an intermediate value of σ, Q(σ, λ) creates a mixture of the existing algorithms that learns the optimal value significantly faster than either extreme (σ = 0 or 1).
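To make the unification concrete, here is a minimal tabular sketch of one online update in the spirit of Q(σ, λ). This is an illustrative assumption, not the paper's exact operator: the target mixes the sampled successor value (Sarsa) with the expectation under the target policy (Tree-Backup) via σ, and the trace-decay factor is interpolated so that σ = 1 recovers the Sarsa(λ) decay γλ and σ = 0 recovers the Q^π(λ)-style decay γλπ(a'|s'). The function name and the accumulating-trace form are ours.

```python
def q_sigma_lambda_update(Q, E, s, a, r, s2, a2, pi, alpha, gamma, lam, sigma):
    """One online Q(sigma, lambda)-style update (illustrative sketch).

    Q, E : nested lists [state][action] of values and eligibility traces.
    pi   : target policy, pi[state][action] probabilities.
    sigma = 1 gives a Sarsa(lambda)-like update; sigma = 0 an
    expected, Tree-Backup / Q^pi(lambda)-like update.
    """
    # Expected action value at the successor state under the target policy.
    expected = sum(pi[s2][b] * Q[s2][b] for b in range(len(Q[s2])))
    # sigma interpolates between the sampled and the expected backup target.
    target = r + gamma * (sigma * Q[s2][a2] + (1.0 - sigma) * expected)
    delta = target - Q[s][a]

    E[s][a] += 1.0  # accumulating trace for the visited pair
    # Decay factor interpolates gamma*lam (sigma=1) and
    # gamma*lam*pi(a'|s') (sigma=0) -- our assumed form.
    decay = gamma * lam * (sigma + (1.0 - sigma) * pi[s2][a2])
    for si in range(len(Q)):
        for ai in range(len(Q[si])):
            Q[si][ai] += alpha * delta * E[si][ai]  # broadcast the TD error
            E[si][ai] *= decay                      # then decay the trace
    return Q, E

# A single transition: state 0, action 0, reward 1.0 -> state 1, action 1,
# under a uniform target policy over two actions.
Q = [[0.0, 0.0], [0.0, 0.0]]
E = [[0.0, 0.0], [0.0, 0.0]]
pi = [[0.5, 0.5], [0.5, 0.5]]
Q, E = q_sigma_lambda_update(Q, E, 0, 0, 1.0, 1, 1, pi,
                             alpha=0.5, gamma=0.9, lam=0.8, sigma=0.5)
```

With an intermediate σ, each step updates all recently visited state-action pairs in proportion to their traces, which is exactly the memory/computation advantage over storing full n-step returns that the abstract emphasizes.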

