Improving Stochastic Policy Gradients in Continuous Control with Deep Reinforcement Learning using the Beta Distribution

2020-03-10


Abstract

Recently, reinforcement learning with deep neural networks has achieved great success in challenging continuous control problems such as 3D locomotion and robotic manipulation. However, in real-world control problems, the actions one can take are bounded by physical constraints, which introduces a bias when the standard Gaussian distribution is used as the stochastic policy. In this work, we propose to use the Beta distribution as an alternative and analyze the bias and variance of the policy gradients of both policies. We show that the Beta policy is bias-free and provides significantly faster convergence and higher scores than the Gaussian policy when both are used with trust region policy optimization (TRPO) and actor critic with experience replay (ACER), the state-of-the-art on- and off-policy stochastic methods respectively, on OpenAI Gym's and MuJoCo's continuous control environments.
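
The boundary effect described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the action range [-1, 1], and the distribution parameters are all assumptions chosen for illustration. The sketch compares a Gaussian policy whose samples are clipped to the action bounds with a Beta policy whose samples are rescaled from [0, 1] to the same bounds.

```python
import numpy as np

def sample_gaussian_clipped(mean, std, low, high, n, rng):
    """Sample from a Gaussian policy, then clip to the action bounds.

    Clipping moves all out-of-range probability mass onto the bounds,
    which is the source of the bias discussed in the abstract.
    """
    raw = rng.normal(mean, std, size=n)
    return np.clip(raw, low, high)

def sample_beta_scaled(alpha, beta, low, high, n, rng):
    """Sample from Beta(alpha, beta) on [0, 1] and rescale to [low, high].

    The Beta distribution has bounded support, so no clipping is needed.
    """
    u = rng.beta(alpha, beta, size=n)
    return low + (high - low) * u

# Illustrative values only (assumptions, not taken from the paper).
rng = np.random.default_rng(0)
low, high = -1.0, 1.0
n = 100_000

gauss_actions = sample_gaussian_clipped(0.8, 0.5, low, high, n, rng)
beta_actions = sample_beta_scaled(5.0, 2.0, low, high, n, rng)

# Fraction of Gaussian samples that were pushed onto a bound by clipping.
frac_clipped = np.mean((gauss_actions == low) | (gauss_actions == high))

print(f"Gaussian samples sitting on a bound: {frac_clipped:.3f}")
print(f"Beta action range: [{beta_actions.min():.3f}, {beta_actions.max():.3f}]")
```

With these assumed parameters, a substantial share of the Gaussian samples lands exactly on the upper bound, while every Beta sample stays inside the allowed range without any post-processing.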



