
Potential Based Reward Shaping for Hierarchical Reinforcement Learning

2019-11-22

Abstract: Hierarchical Reinforcement Learning (HRL) outperforms many 'flat' Reinforcement Learning (RL) algorithms in some application domains. However, HRL may take longer to obtain the optimal policy because of its large action space. Potential Based Reward Shaping (PBRS) has been widely used to incorporate heuristics into flat RL algorithms so as to reduce their exploration. In this paper, we investigate the integration of PBRS and HRL, and propose a new algorithm: PBRS-MAXQ-0. We prove that under certain conditions, PBRS-MAXQ-0 is guaranteed to converge. Empirical results show that PBRS-MAXQ-0 significantly outperforms MAXQ-0 given good heuristics, and can converge even when given misleading heuristics.
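The PBRS referred to in the abstract augments each transition's reward with a potential difference F(s, s') = γΦ(s') − Φ(s), which encodes a heuristic without changing the optimal policy (Ng et al., 1999). The sketch below is only a rough illustration of that idea applied to flat tabular Q-learning; it is not the paper's PBRS-MAXQ-0 algorithm, which integrates shaping into the MAXQ hierarchy. The chain environment, the potential function, and all hyperparameters are illustrative assumptions, not taken from the paper.

```python
import random

# Hypothetical 1-D chain environment (not from the paper): start at state 0,
# move left/right, reward 1 for reaching the goal state, 0 otherwise.
N_STATES = 10
GOAL = N_STATES - 1
ACTIONS = (-1, +1)
GAMMA = 0.95    # discount factor
ALPHA = 0.1     # learning rate
EPSILON = 0.1   # exploration rate

def step(s, a):
    """Apply action a in state s; return (next state, reward, done)."""
    s_next = min(max(s + a, 0), GOAL)
    reward = 1.0 if s_next == GOAL else 0.0
    return s_next, reward, s_next == GOAL

def potential(s):
    # Heuristic potential: larger for states closer to the goal.
    # A "misleading" heuristic in the abstract's sense would invert this.
    return s / GOAL

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for episode in range(200):
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection
        if random.random() < EPSILON:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s_next, r, done = step(s, a)
        # Potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s)
        f = GAMMA * potential(s_next) - potential(s)
        bootstrap = 0.0 if done else GAMMA * max(Q[(s_next, act)] for act in ACTIONS)
        Q[(s, a)] += ALPHA * (r + f + bootstrap - Q[(s, a)])
        s = s_next

print(max(Q[(0, act)] for act in ACTIONS))  # learned value at the start state
```

With this potential, the shaping term gives dense guidance toward the goal while leaving the optimal policy unchanged, which is the property the paper extends to the hierarchical MAXQ setting.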


