资源论文Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning

Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning

2020-02-10 | |  74 |   41 |   0

Abstract 

Statistical performance bounds for reinforcement learning (RL) algorithms can be critical for high-stakes applications like healthcare. This paper introduces a new framework for theoretically measuring the performance of such algorithms called Uniform-PAC, which is a strengthening of the classical Probably Approximately Correct (PAC) framework. In contrast to the PAC framework, the uniform version may be used to derive high probability regret guarantees and so forms a bridge between the two setups that has been missing in the literature. We demonstrate the benefits of the new framework for finite-state episodic MDPs with a new algorithm that is Uniform-PAC and simultaneously achieves optimal regret and PAC guarantees except for a factor of the horizon.

上一篇:Machine Learning with Adversaries: Byzantine Tolerant Gradient Descent

下一篇:AIDE: An algorithm for measuring the accuracy of probabilistic inference algorithms

用户评价
全部评价

热门资源

  • Learning to Predi...

    Much of model-based reinforcement learning invo...

  • Stratified Strate...

    In this paper we introduce Stratified Strategy ...

  • The Variational S...

    Unlike traditional images which do not offer in...

  • A Mathematical Mo...

    Direct democracy, where each voter casts one vo...

  • Rating-Boosted La...

    The performance of a recommendation system reli...