Re-evaluating Complex Backups in Temporal Difference Learning

2020-01-08

Abstract

We show that the λ-return target used in the TD(λ) family of algorithms is the maximum likelihood estimator for a specific model of how the variance of an n-step return estimate increases with n. We introduce the γ-return estimator, an alternative target based on a more accurate model of variance, which defines the TDγ family of complex-backup temporal difference learning algorithms. We derive TDγ, the γ-return equivalent of the original TD(λ) algorithm, which eliminates the λ parameter but can only perform updates at the end of an episode and requires time and space proportional to the episode length. We then derive a second algorithm, TDγ(C), with a capacity parameter C. TDγ(C) requires C times more time and memory than TD(λ) and is incremental and online. We show that TDγ outperforms TD(λ) for any setting of λ on 4 out of 5 benchmark domains, and that TDγ(C) performs as well as or better than TDγ for intermediate settings of C.
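The first claim above, that the λ-return is a maximum likelihood estimate, can be checked numerically. The sketch below assumes one particular variance model, since the abstract only says "a specific model": the n-step returns R^(n) are independent Gaussian estimates of the same value, with variance proportional to λ^(-n). Under this assumption the maximum likelihood estimate is an inverse-variance weighted average. Its weights are proportional to λ^n, and after normalization they become the λ-return weights (1-λ)λ^(n-1). All function names are hypothetical. This is an illustration of the λ-return side only, not the paper's TDγ algorithm or its γ-return.

```python
import numpy as np


def n_step_returns(rewards, values, gamma):
    """n-step returns R^(1), ..., R^(L) from the first state of an episode.

    rewards[k] is the reward received on step k + 1; values[k] is the current
    estimate V(s_k). The episode ends after the last reward, so R^(L) is the
    full Monte Carlo return and has no bootstrap term.
    """
    L = len(rewards)
    returns = np.empty(L)
    for n in range(1, L + 1):
        g = sum(gamma**k * rewards[k] for k in range(n))
        if n < L:
            g += gamma**n * values[n]  # bootstrap from V(s_n)
        returns[n - 1] = g
    return returns


def lambda_return(returns, lam):
    """Episodic lambda-return: weight (1 - lam) * lam**(n - 1) on R^(n) for n < L,
    plus the leftover weight lam**(L - 1) on the full return R^(L)."""
    L = len(returns)
    w = (1 - lam) * lam ** np.arange(L)  # position n - 1 holds lam**(n - 1)
    w[-1] = lam ** (L - 1)               # all weight for n >= L collapses onto R^(L)
    return float(w @ returns)


def mle_under_geometric_variance(returns, lam, n_max=2000):
    """Gaussian MLE of the common mean of independent estimates R^(n) with
    Var[R^(n)] proportional to lam**(-n), for 0 < lam < 1 (an assumed model).

    The MLE is the inverse-variance weighted average, so weight_n is proportional
    to lam**n. After termination R^(n) = R^(L) for every n >= L, so the list is
    padded with the full return and the infinite sum is truncated at n_max terms.
    """
    L = len(returns)
    padded = np.concatenate([returns, np.full(n_max - L, returns[-1])])
    w = lam ** np.arange(1, n_max + 1)  # 1 / Var[R^(n)], up to a constant
    return float(w @ padded / w.sum())
```

With `lam = 0` the λ-return reduces to the one-step TD target R^(1), and with `lam = 1` it reduces to the Monte Carlo return R^(L). For 0 < λ < 1, `lambda_return` and `mle_under_geometric_variance` agree to floating-point precision, because the truncation error is of order λ^n_max.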

Popular Resources

  • The Variational S...

    Unlike traditional images which do not offer in...

  • Learning to Predi...

    Much of model-based reinforcement learning invo...

  • Stratified Strate...

    In this paper we introduce Stratified Strategy ...

  • A Mathematical Mo...

    Direct democracy, where each voter casts one vo...

  • Rating-Boosted La...

    The performance of a recommendation system reli...