Natural Policy Gradient Methods with Parameter-based Exploration for Control Tasks

Abstract

In this paper, we propose an efficient algorithm for estimating the natural policy gradient using parameter-based exploration; this algorithm samples directly in the parameter space. Unlike previous natural-gradient methods, our algorithm calculates the natural policy gradient using the inverse of the exact Fisher information matrix. Its computational cost is equal to that of conventional policy gradient methods, whereas previous natural policy gradient methods incur a prohibitive computational cost. Experimental results show that the proposed method outperforms several policy gradient methods.
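The cheap natural-gradient update rests on a standard fact about parameter-based exploration: when policy parameters are sampled from a factored Gaussian, the Fisher information matrix of that sampling distribution is diagonal and known in closed form (1/σ² for each mean coordinate, 2/σ² for each standard deviation), so multiplying by its inverse is an elementwise rescaling rather than a matrix inversion. The following is a minimal sketch of that idea, not the paper's exact algorithm; `episodic_return` is a hypothetical stand-in for a real environment rollout, here replaced by a toy objective so the snippet runs on its own.

```python
import numpy as np

# Toy stand-in for an episodic return R(theta). The real method would
# roll out the policy parameterized by theta and return the total reward.
def episodic_return(theta):
    return -np.sum((theta - 1.0) ** 2)

def npgpe_step(mu, sigma, n_samples=50, lr=0.05, rng=None):
    """One natural-gradient update of a factored Gaussian N(mu, diag(sigma^2))
    over policy parameters (parameter-based exploration).

    Because the Gaussian is factored, the Fisher information matrix is
    diagonal (1/sigma^2 for mu, 2/sigma^2 for sigma), so the natural
    gradient F^{-1} g costs no more than the vanilla gradient.
    """
    rng = np.random.default_rng() if rng is None else rng
    thetas = mu + sigma * rng.standard_normal((n_samples, mu.size))
    returns = np.array([episodic_return(t) for t in thetas])
    adv = returns - returns.mean()                 # baseline for variance reduction

    eps = (thetas - mu) / sigma                    # standardized perturbations
    # Vanilla likelihood-ratio gradients of E[R] w.r.t. mu and sigma
    g_mu = (adv[:, None] * eps / sigma).mean(axis=0)
    g_sigma = (adv[:, None] * (eps ** 2 - 1.0) / sigma).mean(axis=0)

    # Natural gradient: elementwise multiply by the inverse Fisher
    # information, sigma^2 for mu and sigma^2 / 2 for sigma.
    mu = mu + lr * (sigma ** 2) * g_mu
    sigma = sigma + lr * (sigma ** 2 / 2.0) * g_sigma
    return mu, np.maximum(sigma, 1e-6)             # keep the stds positive

mu, sigma = np.zeros(3), np.ones(3)
for _ in range(200):
    mu, sigma = npgpe_step(mu, sigma)
print(mu)  # moves toward [1, 1, 1], the optimum of the toy objective
```

Since the rescaling is elementwise, the per-update cost matches that of the plain likelihood-ratio gradient, which is the efficiency argument the abstract makes against earlier natural policy gradient methods that require inverting an estimated Fisher matrix.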
