Safe and Sample-Efficient Reinforcement Learning Algorithms for Factored Environments

2019-10-10
Abstract: Reinforcement Learning (RL) deals with problems that can be modeled as a Markov Decision Process (MDP) in which the transition function is unknown. In settings where a policy π is already being executed and the resulting experience has been recorded in a batch D, an RL algorithm can use D to compute a new policy π′. However, the policy computed by traditional RL algorithms may perform worse than π. Our goal is to develop safe RL algorithms, in which the agent has high confidence that the performance of π′ is better than that of π given D. To develop sample-efficient and safe RL algorithms, we combine ideas from exploration strategies in RL with a safe policy improvement method.
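The abstract does not spell out the paper's algorithm, but the general idea of certifying that π′ outperforms π given only a batch D is commonly realized with off-policy importance sampling plus a confidence bound. The sketch below is an illustrative, hypothetical implementation of that generic recipe (not the paper's method): trajectories, the policies `pi_new`/`pi_old`, the normal-approximation bound, and all function names are assumptions for demonstration.

```python
import math

def importance_weights(trajectories, pi_new, pi_old):
    """Per-trajectory importance weights for off-policy evaluation.
    Each trajectory is {"steps": [(state, action), ...], "ret": return}."""
    weights = []
    for traj in trajectories:
        w = 1.0
        for (s, a) in traj["steps"]:
            # Ratio of action probabilities under the new vs. behavior policy.
            w *= pi_new(s, a) / pi_old(s, a)
        weights.append(w)
    return weights

def lower_confidence_return(trajectories, pi_new, pi_old):
    """One-sided lower bound on the new policy's expected return,
    using an importance-sampling estimate and a normal approximation
    (z ~= 1.645 for a 95% one-sided bound)."""
    ws = importance_weights(trajectories, pi_new, pi_old)
    vals = [w * t["ret"] for w, t in zip(ws, trajectories)]
    n = len(vals)
    mean = sum(vals) / n
    var = sum((v - mean) ** 2 for v in vals) / (n - 1)
    return mean - 1.645 * math.sqrt(var / n)

def safe_improvement(trajectories, pi_new, pi_old):
    """Accept pi_new only if its lower confidence bound beats the
    behavior policy's empirical average return; otherwise keep pi_old."""
    baseline = sum(t["ret"] for t in trajectories) / len(trajectories)
    return lower_confidence_return(trajectories, pi_new, pi_old) > baseline
```

With enough data the bound tightens and a genuinely better π′ is accepted; with too few trajectories the bound stays below the baseline and the agent safely keeps π, which is the behavior a safe RL algorithm aims for.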
