Policy Evaluation Using the Ω-Return

2020-02-07

Abstract 

We propose the Ω-return as an alternative to the λ-return currently used by the TD(λ) family of algorithms. The benefit of the Ω-return is that it accounts for the correlation of different length returns. Because it is difficult to compute exactly, we suggest one way of approximating the Ω-return. We provide empirical studies that suggest that it is superior to the λ-return and γ-return for a variety of problems.
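
As background for the baseline named in the abstract, the λ-return used by TD(λ) is a weighted average of returns of different lengths (n-step returns). The following is a minimal sketch of that baseline only, assuming a single finite episode, a terminal state of value 0, and the standard geometric weights. The function names are hypothetical and chosen for illustration. It does not implement the proposed return, whose weights are not specified in the abstract; under this sketch, such an alternative would amount to passing a different weight vector to `weighted_return`.

```python
def n_step_returns(rewards, values, gamma):
    """Return [G^(1), ..., G^(L)] from time 0 for an episode of length L.

    rewards[t] is the reward received after step t, and values[t] is the
    current estimate of the value of state s_t, for t = 0..L-1.
    Assumption: the terminal state has value 0.
    """
    L = len(rewards)
    returns = []
    discounted_sum = 0.0
    for n in range(1, L + 1):
        # Add the n-th reward, discounted back to time 0.
        discounted_sum += gamma ** (n - 1) * rewards[n - 1]
        # Bootstrap from the value estimate unless the episode has ended.
        bootstrap = values[n] if n < L else 0.0
        returns.append(discounted_sum + gamma ** n * bootstrap)
    return returns


def lambda_weights(L, lam):
    """Standard lambda-return weights for an episode of length L.

    Weight (1 - lam) * lam**(n - 1) for n < L, with the remaining mass
    lam**(L - 1) placed on the full (Monte Carlo) return.
    """
    weights = [(1 - lam) * lam ** (n - 1) for n in range(1, L)]
    weights.append(lam ** (L - 1))
    return weights


def weighted_return(returns, weights):
    """Combine different length returns with the given weights."""
    return sum(w * g for w, g in zip(weights, returns))


if __name__ == "__main__":
    rewards = [1.0, 0.0, 2.0]
    values = [0.0, 1.0, 3.0]
    gs = n_step_returns(rewards, values, gamma=0.5)
    ws = lambda_weights(len(gs), lam=0.5)
    print(gs, ws, weighted_return(gs, ws))
```

With `lam=0` the weights select the one-step return, and with `lam=1` they select the full Monte Carlo return. These weights depend only on the return length, not on how the different length returns co-vary.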
