Abstract
One might believe that model-based algorithms of reinforcement learning can propagate the obtained experience more quickly, and are able to direct exploration better. As a consequence, fewer exploratory actions should be enough to learn a good policy. Strangely enough, current theoretical results for model-based algorithms do not support this claim: in a finite Markov decision process with N states, the best available bounds on the number of exploratory steps necessary are of order O(N² log N), in contrast to the O(N log N) bound available for the model-free, delayed Q-learning algorithm. In this paper we show that Mormax, a modified version of the Rmax algorithm, needs to make at most O(N log N) exploratory steps. This matches the lower bound up to logarithmic factors, as well as the upper bound of the state-of-the-art model-free algorithm, while our new bound improves the dependence on other problem parameters.

Introduction

In the reinforcement learning (RL) framework, an agent interacts with an unknown environment and tries to maximize its long-term profit. A standard way to measure the efficiency of the agent is its sample complexity, also called exploration complexity. Roughly, this quantity tells how many non-optimal (exploratory) steps the agent makes at most. The best understood and most studied case is when the environment is a finite Markov decision process (MDP) with the expected total discounted reward criterion. Since the work of Kearns & Singh (1998), many algorithms have been published with bounds on their sample complexity.
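To fix ideas, a minimal sketch of the mechanism behind Rmax-style algorithms may help: each state-action pair is "unknown" until it has been sampled m times; unknown pairs are valued optimistically at r_max / (1 - gamma), which is what drives the agent to take the exploratory steps that the bounds above count. The Python sketch below is ours and purely illustrative: it shows the generic known-pair bookkeeping, not the Mormax algorithm of this paper, and all class, method, and parameter names are assumptions.

import numpy as np

class RmaxSketch:
    """Illustrative sketch of the 'known state-action' bookkeeping in
    Rmax-type algorithms; not the paper's Mormax, and all names are ours."""

    def __init__(self, n_states, n_actions, gamma, r_max, m):
        self.gamma, self.r_max, self.m = gamma, r_max, m
        self.counts = np.zeros((n_states, n_actions), dtype=int)
        self.reward_sum = np.zeros((n_states, n_actions))
        self.trans_counts = np.zeros((n_states, n_actions, n_states))

    def record(self, s, a, r, s_next):
        # Only the first m samples of a pair are used for its model;
        # after that the pair is 'known' and its estimates are frozen.
        if self.counts[s, a] < self.m:
            self.counts[s, a] += 1
            self.reward_sum[s, a] += r
            self.trans_counts[s, a, s_next] += 1

    def is_known(self, s, a):
        return self.counts[s, a] >= self.m

    def model(self, s, a):
        # Known pair: empirical reward and transition estimates.
        # Unknown pair: an optimistic self-loop worth r_max / (1 - gamma)
        # under the discounted criterion, which attracts the agent and
        # thereby forces exploration of the pair.
        if self.is_known(s, a):
            r = self.reward_sum[s, a] / self.counts[s, a]
            p = self.trans_counts[s, a] / self.counts[s, a]
        else:
            r = self.r_max
            p = np.zeros(self.trans_counts.shape[2])
            p[s] = 1.0
        return r, p

Under this scheme, sample-complexity bounds essentially count how many steps the agent can spend at unknown pairs before every pair it keeps visiting becomes known, which is where the dependence on N enters.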