资源论文Projected Natural Actor-Critic

Projected Natural Actor-Critic

2020-01-16 | |  61 |   47 |   0

Abstract

Natural actor-critics form a popular class of policy search algorithms for ?nding locally optimal policies for Markov decision processes. In this paper we address a drawback of natural actor-critics that limits their real-world applicability—their lack of safety guarantees. We present a principled algorithm for performing natural gradient descent over a constrained domain. In the context of reinforcement learning, this allows for natural actor-critic algorithms that are guaranteed to remain within a known safe region of policy space. While deriving our class of constrained natural actor-critic algorithms, which we call Projected Natural ActorCritics (PNACs), we also elucidate the relationship between natural gradient descent and mirror descent.

上一篇:Small-Variance Asymptotics for Hidden Markov Models

下一篇:Learning to Prune in Metric and Non-Metric Spaces

用户评价
全部评价

热门资源

  • Learning to Predi...

    Much of model-based reinforcement learning invo...

  • Stratified Strate...

    In this paper we introduce Stratified Strategy ...

  • The Variational S...

    Unlike traditional images which do not offer in...

  • A Mathematical Mo...

    Direct democracy, where each voter casts one vo...

  • Rating-Boosted La...

    The performance of a recommendation system reli...