
Stochastic Variance-Reduced Policy Gradient


Abstract

In this paper, we propose a novel reinforcement-learning algorithm consisting of a stochastic variance-reduced version of policy gradient for solving Markov Decision Processes (MDPs). Stochastic variance-reduced gradient (SVRG) methods have proven to be very successful in supervised learning. However, their adaptation to policy gradient is not straightforward and needs to account for I) a non-concave objective function; II) approximations in the full-gradient computation; and III) a non-stationary sampling process. The result is SVRPG, a stochastic variance-reduced policy gradient algorithm that leverages importance weights to preserve the unbiasedness of the gradient estimate. Under standard assumptions on the MDP, we provide convergence guarantees for SVRPG with a convergence rate that is linear under increasing batch sizes. Finally, we suggest practical variants of SVRPG and evaluate them empirically on continuous MDPs.
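To make the abstract's core idea concrete, below is a minimal sketch of an SVRPG-style update on a toy one-step Gaussian-policy problem. This is not the paper's algorithm or experimental setup: the toy reward, the plain REINFORCE per-sample gradient (no baseline or GPOMDP estimator), and all hyperparameters are illustrative assumptions. The sketch only shows how the snapshot (full-batch) gradient and the importance-weighted correction term combine so that the mini-batch estimate remains (approximately) unbiased even though samples come from the current, non-stationary policy.

```python
import numpy as np

# Toy SVRPG-style sketch on a one-step problem (a stand-in for a full MDP).
# All names, the reward, and the hyperparameters are illustrative assumptions.

rng = np.random.default_rng(0)
SIGMA = 1.0           # fixed policy std-dev: pi_theta(a) = N(theta, SIGMA^2)
TARGET = 2.0          # toy reward r(a) = -(a - TARGET)^2, maximized at a = TARGET

def sample_actions(theta, n):
    return rng.normal(theta, SIGMA, size=n)

def log_prob(theta, a):
    # Up to an additive constant; constants cancel in the importance ratio
    # because both policies share the same SIGMA.
    return -0.5 * ((a - theta) / SIGMA) ** 2

def pg_estimate(theta, a):
    # REINFORCE-style per-sample gradient: grad_theta log pi(a) * r(a)
    return ((a - theta) / SIGMA ** 2) * (-(a - TARGET) ** 2)

theta = 0.0
N, B, LR = 500, 20, 0.01      # snapshot batch, mini-batch, step size

for epoch in range(50):
    # 1) Snapshot the policy and estimate the "full" gradient at the snapshot.
    theta_snap = theta
    snap_actions = sample_actions(theta_snap, N)
    mu = pg_estimate(theta_snap, snap_actions).mean()

    for _ in range(10):  # sub-iterations between snapshots
        # 2) Sample a small batch from the *current* (non-stationary) policy.
        a = sample_actions(theta, B)
        # 3) Importance weights w = pi_snap(a) / pi_theta(a) keep the
        #    correction term an unbiased estimate of the snapshot gradient.
        w = np.exp(log_prob(theta_snap, a) - log_prob(theta, a))
        v = (pg_estimate(theta, a) - w * pg_estimate(theta_snap, a)).mean() + mu
        theta += LR * v  # gradient *ascent* on the expected return

print("learned theta:", theta)  # should approach TARGET
```

The importance weight `w` re-weights samples drawn from the current policy as if they came from the snapshot policy; this is the adjustment needed to cope with the non-stationary sampling process that plain SVRG, designed for a fixed supervised-learning dataset, does not face.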

