PROJECTION-BASED CONSTRAINED POLICY OPTIMIZATION


2020-01-02

Abstract

In this paper, we consider the problem of learning control policies that optimize a reward function while satisfying constraints due to considerations of safety, fairness, or other costs. We propose a new algorithm, Projection-Based Constrained Policy Optimization (PCPO), an iterative method for optimizing policies in a two-step process: the first step performs an unconstrained update, while the second step reconciles the constraint violation by projecting the policy back onto the constraint set. We theoretically analyze PCPO and provide a lower bound on reward improvement, as well as an upper bound on constraint violation for each policy update. We further characterize the convergence of PCPO with projection based on two different metrics: L2 norm and Kullback-Leibler divergence. Our empirical results over several control tasks demonstrate that our algorithm achieves superior performance, averaging more than 3.5 times less constraint violation and around 15% higher reward compared to state-of-the-art methods.
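The two-step update described above can be sketched in a simplified form. This is not the paper's actual implementation (which operates in policy-parameter space with trust-region machinery); it is a minimal illustration of the idea, assuming a plain gradient ascent step on the reward followed by an L2 projection onto a linearized cost constraint. The function name `pcpo_step` and all parameter names are hypothetical.

```python
import numpy as np

def pcpo_step(theta, reward_grad, cost_grad, cost_margin, lr=0.1):
    """One simplified PCPO-style update with an L2 projection.

    Step 1: unconstrained gradient ascent on the reward.
    Step 2: project the intermediate iterate back onto the half-space
            {theta' : cost_grad @ (theta' - theta) + cost_margin <= 0},
            a first-order approximation of the constraint set.
    """
    # Step 1: reward improvement, ignoring the constraint.
    theta_mid = theta + lr * reward_grad

    # Step 2: closed-form L2 projection onto the linearized constraint.
    violation = cost_grad @ (theta_mid - theta) + cost_margin
    if violation > 0:
        theta_mid = theta_mid - (violation / (cost_grad @ cost_grad)) * cost_grad
    return theta_mid

# Toy usage: the reward gradient pushes both coordinates up, while the
# cost constraint forbids any net movement along the first coordinate.
theta = np.zeros(2)
new_theta = pcpo_step(theta,
                      reward_grad=np.array([1.0, 1.0]),
                      cost_grad=np.array([1.0, 0.0]),
                      cost_margin=0.0)
# The projection removes the component along cost_grad, so the update
# keeps the reward-improving direction that does not violate the constraint.
```

The paper additionally analyzes a KL-divergence projection, which replaces the Euclidean distance in step 2 with a KL-based metric; the L2 case above is the simpler of the two to write in closed form.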


