资源论文Model-based reinforcement learning with nearly tight exploration complexity bounds

Model-based reinforcement learning with nearly tight exploration complexity bounds

2020-02-27 | |  69 |   43 |   0

Abstract

One might believe that model-based algorithms of reinforcement learning can propagate the obtained experience more quickly, and are able to direct exploration better. As a consequence, fewer exploratory actions should be enough to learn a good policy. Strangely enough, current theoretical results for model-based algorithms do not support this claim: In a finite Markov decision process with N states, the best bounds on the number of exploratory steps necessary are of order O(图片.png log N ), in contrast to the O(N log N ) bound available for the modelfree, delayed Q-learning algorithm. In this paper we show that Mormax, a modified version of the Rmax algorithm needs to make at most O(N log N ) exploratory steps. This matches the lower bound up to logarithmic factors, as well as the upper bound of the state-of-the-art model-free algorithm, while our new bound improves the dependence on other problem parameters. In the reinforcement learning (RL) framework, an agent interacts with an unknown environment and tries to maximize its long-term profit. A standard way to measure the efficiency of the agent is sample complexity or exploration complexity. Roughly, this quantity tells how many non-optimal (exploratory) steps does the agent make at most. The best understood and most studied case is when theenvironment is a finite Markov decision process (MDP) with the expected total discounted reward criterion.Since the work of Kearns & Singh (1998), many algorithms have been published with bounds on their samAppearing in Proceedings of the 27 th International Confeence on Machine Learning, Haifa, Israel, 2010. Copyright2010 by the author(s)/owner(s).

上一篇:Constructing States for Reinforcement Learning

下一篇:The Translation-invariant Wishart-Dirichlet Process for Clustering Distance Data

用户评价
全部评价

热门资源

  • Learning to Predi...

    Much of model-based reinforcement learning invo...

  • Stratified Strate...

    In this paper we introduce Stratified Strategy ...

  • The Variational S...

    Unlike traditional images which do not offer in...

  • A Mathematical Mo...

    Direct democracy, where each voter casts one vo...

  • Rating-Boosted La...

    The performance of a recommendation system reli...