
Clipped Action Policy Gradient


Abstract

Many continuous control tasks have bounded action spaces. When policy gradient methods are applied to such tasks, out-of-bound actions need to be clipped before execution, while policies are usually optimized as if the actions were not clipped. We propose a policy gradient estimator that exploits the knowledge of actions being clipped to reduce the variance in estimation. We prove that our estimator, named clipped action policy gradient (CAPG), is unbiased and achieves lower variance than the conventional estimator that ignores action bounds. Experimental results demonstrate that CAPG generally outperforms the conventional estimator, indicating that it is a better policy gradient estimator for continuous control tasks. The source code is available at https://github.com/pfnet-research/capg.
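As a rough illustration of the idea described in the abstract, the sketch below replaces the usual log-density term of a one-dimensional Gaussian policy with the log-probability of the clipped action that was actually executed: actions clipped to a bound are scored by the probability mass of the whole clipped region (via the CDF) rather than by the density at the pre-clipping sample. This is a hedged, minimal reading of the abstract, not the authors' implementation; their code is in the repository linked above, and the function and variable names here are illustrative.

```python
import torch
from torch.distributions import Normal

def capg_log_prob(mean, std, action, low, high):
    """Log-probability term used in place of log pi(a|s) when the
    executed action is clipped to [low, high].

    Sketch of the abstract's idea: if the executed action sits at a
    bound, use the log of the probability mass that would be clipped
    to that bound; otherwise fall back to the usual log-density.
    """
    dist = Normal(mean, std)
    log_prob_inside = dist.log_prob(action)
    # Mass of the region clipped to the lower / upper bound.
    log_prob_low = torch.log(dist.cdf(torch.as_tensor(low)).clamp(min=1e-12))
    log_prob_high = torch.log((1.0 - dist.cdf(torch.as_tensor(high))).clamp(min=1e-12))
    return torch.where(
        action <= low,
        log_prob_low,
        torch.where(action >= high, log_prob_high, log_prob_inside),
    )

# Usage: weight by the advantage and differentiate w.r.t. the policy
# parameters, exactly as with the conventional estimator.
mean = torch.tensor([0.8], requires_grad=True)
std = torch.tensor([0.5])
action = torch.tensor([1.0])      # sample was clipped to the bound of [-1, 1]
advantage = torch.tensor([2.0])
loss = -(capg_log_prob(mean, std, action, -1.0, 1.0) * advantage).mean()
loss.backward()
```

The intuition for the variance reduction is that all out-of-bound samples map to the same executed action, so scoring them with a single CDF term removes the spread that the conventional estimator would attribute to the (irrelevant) pre-clipping values.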

