
SAMPLE EFFICIENT POLICY GRADIENT METHODS WITH RECURSIVE VARIANCE REDUCTION

2020-01-02

Abstract

Improving the sample efficiency in reinforcement learning has been a long-standing research problem. In this work, we aim to reduce the sample complexity of existing policy gradient methods. We propose a novel policy gradient algorithm called SRVR-PG, which only requires $O(1/\epsilon^{3/2})$ episodes to find an $\epsilon$-approximate stationary point of the nonconcave performance function $J(\boldsymbol{\theta})$ (i.e., $\boldsymbol{\theta}$ such that $\|\nabla J(\boldsymbol{\theta})\|_2^2 \leq \epsilon$). This sample complexity improves the existing result $O(1/\epsilon^{5/3})$ for stochastic variance reduced policy gradient algorithms by a factor of $O(1/\epsilon^{1/6})$. In addition, we also propose a variant of SRVR-PG with parameter exploration, which explores the initial policy parameter from a prior probability distribution. We conduct numerical experiments on classic control problems in reinforcement learning to validate the performance of our proposed algorithms.
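The abstract itself contains no pseudocode, so the following is only a minimal sketch of how a recursive (SARAH/SPIDER-style) variance-reduced policy gradient loop of the kind described above is typically structured: a large-batch reference gradient at each epoch anchor, followed by small-batch recursive corrections with importance weighting. All function names (`sample_trajectories`, `policy_gradient`, `importance_weight`), batch sizes, and step sizes here are hypothetical placeholders, not the paper's API.

```python
import numpy as np

def srvr_pg_sketch(theta0, sample_trajectories, policy_gradient, importance_weight,
                   n_epochs=10, m=5, N=100, B=10, lr=1e-2):
    """Sketch of a recursively variance-reduced policy gradient method.

    sample_trajectories(theta, k) -> list of k trajectories under policy theta
    policy_gradient(theta, tau)   -> REINFORCE-style gradient estimate from tau
    importance_weight(old, new, tau) -> likelihood ratio of tau under old vs. new policy
    """
    theta = theta0
    for _ in range(n_epochs):
        # Reference gradient from a large batch of N episodes at the epoch anchor.
        trajs = sample_trajectories(theta, N)
        v = np.mean([policy_gradient(theta, tau) for tau in trajs], axis=0)
        theta_prev = theta
        theta = theta + lr * v  # gradient *ascent* on the performance J(theta)
        for _ in range(m - 1):
            batch = sample_trajectories(theta, B)
            # Recursive correction: gradient difference between the current and
            # previous iterates on the same small batch. The old-policy term is
            # importance-weighted because the batch was sampled under the
            # current policy.
            diff = np.mean([
                policy_gradient(theta, tau)
                - importance_weight(theta_prev, theta, tau)
                  * policy_gradient(theta_prev, tau)
                for tau in batch], axis=0)
            v = v + diff  # SARAH-style recursive estimator
            theta_prev = theta
            theta = theta + lr * v
    return theta
```

In this sketch, `v` is the running recursive estimator: each inner step adds only the gradient difference between consecutive iterates rather than a fresh full gradient, which is the mechanism by which such estimators reduce variance, and hence the number of episodes needed, relative to plain REINFORCE-style updates.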
