
Active Policy Iteration: Efficient Exploration through Active Learning for Value Function Approximation in Reinforcement Learning

2019-11-16

Abstract

Appropriately designing sampling policies is highly important for obtaining better control policies in reinforcement learning. In this paper, we first show that the least-squares policy iteration (LSPI) framework allows us to employ statistical active learning methods for linear regression. We then propose a method for designing good sampling policies for efficient exploration, which is particularly useful when the sampling cost of immediate rewards is high. We demonstrate the usefulness of the proposed method, named active policy iteration (API), through simulations with a batting robot.
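The key observation in the abstract is that LSPI reduces value-function approximation to linear regression, so standard active-learning criteria can score candidate sampling policies before any costly rewards are collected. Below is a minimal sketch of that view, assuming NumPy, caller-supplied feature matrices, and a generic A-optimality variance criterion; the paper uses its own generalization-error estimator, and the function names here (lstdq, sampling_policy_score) are illustrative, not taken from the paper.

```python
import numpy as np

def lstdq(phi_sa, rewards, phi_next, gamma=0.95, ridge=1e-6):
    """LSTD-Q: fit Q(s, a) ~= phi(s, a) @ w by least squares.

    phi_sa   -- (n, d) features of visited (state, action) pairs
    rewards  -- (n,)   observed immediate rewards
    phi_next -- (n, d) features of (next state, greedy next action)
    """
    # Solve A w = b; the ridge term keeps A invertible when samples are scarce.
    A = phi_sa.T @ (phi_sa - gamma * phi_next)
    b = phi_sa.T @ rewards
    return np.linalg.solve(A + ridge * np.eye(A.shape[1]), b)

def sampling_policy_score(phi_candidate, ridge=1e-6):
    """A-optimality score for the batch of (s, a) features a candidate
    sampling policy would visit: a smaller trace of the inverse Gram
    matrix means lower variance of the least-squares weights, i.e. a
    more informative policy to sample with."""
    d = phi_candidate.shape[1]
    G = phi_candidate.T @ phi_candidate + ridge * np.eye(d)
    return np.trace(np.linalg.inv(G))
```

Under a tight reward budget, one would roll out each candidate sampling policy in simulation (the score needs only the visited features, not the rewards), evaluate sampling_policy_score on the resulting feature matrix, collect real rewards only under the best-scoring policy, and then run lstdq followed by a greedy policy-improvement step as in ordinary LSPI.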

