Temporal Difference Bayesian Model Averaging: A Bayesian Perspective on Adapting Lambda

2020-02-26

Abstract

Temporal difference (TD) algorithms are attractive for reinforcement learning due to their ease of implementation and use of "bootstrapped" return estimates to make efficient use of sampled data. In particular, TD(λ) methods comprise a family of reinforcement learning algorithms that often yield fast convergence by averaging multiple estimators of the expected return. However, TD(λ) chooses a very specific way of averaging these estimators based on the fixed parameter λ, which may not lead to optimal convergence rates in all settings. In this paper, we derive an automated Bayesian approach to setting λ that we call temporal difference Bayesian model averaging (TD-BMA). Empirically, TD-BMA always performs as well as, and often much better than, the best fixed λ for TD(λ) (even when performance for different values of λ varies across problems) without requiring that λ or any analogous parameter be manually tuned.
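For background on the fixed-λ baseline the paper improves on: TD(λ) implicitly averages n-step return estimators with geometric weights governed by λ, which in practice is implemented via eligibility traces. The sketch below is a minimal tabular TD(λ) with accumulating traces on a hypothetical three-state chain; it is standard TD(λ) with a hand-picked λ, not the paper's TD-BMA rule, which would replace the fixed λ with Bayesian model-averaging weights.

```python
import numpy as np

def td_lambda(episodes, n_states, alpha=0.1, gamma=0.9, lam=0.8):
    """Tabular TD(lambda) value estimation with accumulating eligibility traces.

    `episodes` is a list of trajectories, each a list of
    (state, reward, next_state) transitions with integer states in
    [0, n_states). `lam` is the fixed averaging parameter that TD-BMA
    would adapt automatically.
    """
    V = np.zeros(n_states)
    for episode in episodes:
        e = np.zeros(n_states)                    # eligibility trace
        for s, r, s_next in episode:
            delta = r + gamma * V[s_next] - V[s]  # one-step TD error
            e[s] += 1.0                           # accumulate trace for s
            V += alpha * delta * e                # update all traced states
            e *= gamma * lam                      # geometric trace decay
    return V

# Toy chain: state 0 -> state 1 (reward 0) -> terminal state 2 (reward 1).
episodes = [[(0, 0.0, 1), (1, 1.0, 2)]] * 200
V = td_lambda(episodes, n_states=3)
```

With discount 0.9, the values converge toward V[1] ≈ 1.0 and V[0] ≈ 0.9; the trace lets each episode's terminal reward propagate to state 0 immediately rather than over many episodes, which is the fast-convergence effect of averaging multi-step estimators that the abstract refers to.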

