Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning

Abstract

Recently, there has been significant progress in understanding reinforcement learning in discounted infinite-horizon Markov decision processes (MDPs) by deriving tight sample complexity bounds. However, in many real-world applications, an interactive learning agent operates for a fixed or bounded period of time, for example tutoring students for exams or handling customer service requests. Such scenarios can often be better treated as episodic fixed-horizon MDPs, for which only looser bounds on the sample complexity exist. A natural notion of sample complexity in this setting is the number of episodes required to guarantee a certain performance with high probability (PAC guarantee). In this paper, we derive an upper PAC bound Õ(|S|²|A|H²/ε² ln(1/δ)) and a lower PAC bound Ω̃(|S||A|H²/ε² ln(1/(δ+c))) that match up to log-terms and an additional linear dependency on the number of states |S|. The lower bound is the first of its kind for this setting. Our upper bound leverages Bernstein’s inequality to improve on previous bounds for episodic fixed-horizon MDPs which have a time-horizon dependency of at least H³.
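
As context for the bounds quoted above, the following is a minimal LaTeX sketch of the episode-counting PAC criterion in the standard episodic fixed-horizon formulation; the value functions V^*_1 and V^{\pi_k}_1, the start states s_{k,1}, the episode index k, and the bound function F are notational assumptions chosen here for illustration and are not quoted from the abstract.

% Sketch of the episodic fixed-horizon PAC criterion (assumed standard
% formulation; notation is illustrative, not taken verbatim from the paper).
% With probability at least 1 - \delta, the policies \pi_k executed by the
% algorithm are more than \epsilon-suboptimal in at most F episodes:
\Pr\Big[ \big|\{\, k : V^{*}_{1}(s_{k,1}) - V^{\pi_k}_{1}(s_{k,1}) > \epsilon \,\}\big|
    \;\le\; F\big(|S|, |A|, H, \epsilon, \delta\big) \Big] \;\ge\; 1 - \delta
% In the abstract's statement of results, the upper bound corresponds to
%   F = \tilde{O}\big(|S|^2 |A| H^2 \epsilon^{-2} \ln(1/\delta)\big)
% episodes, and the lower bound shows that
%   F = \tilde{\Omega}\big(|S| |A| H^2 \epsilon^{-2} \ln(1/(\delta + c))\big)
% episodes are unavoidable, matching up to log-terms and one factor of |S|.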

