资源论文Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning

Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning

2020-02-12 | |  78 |   53 |   0

Abstract 

Statistical performance bounds for reinforcement learning (RL) algorithms can be critical for high-stakes applications like healthcare. This paper introduces a new framework for theoretically measuring the performance of such algorithms called Uniform-PAC, which is a strengthening of the classical Probably Approximately Correct (PAC) framework. In contrast to the PAC framework, the uniform version may be used to derive high probability regret guarantees and so forms a bridge between the two setups that has been missing in the literature. We demonstrate the benefits of the new framework for finite-state episodic MDPs with a new algorithm that is Uniform-PAC and simultaneously achieves optimal regret and PAC guarantees except for a factor of the horizon.

上一篇:EX2: Exploration with Exemplar Models for Deep Reinforcement Learning

下一篇:Semi-supervised Learning with GANs: Manifold Invariance with Improved Inference

用户评价
全部评价

热门资源

  • The Variational S...

    Unlike traditional images which do not offer in...

  • Learning to Predi...

    Much of model-based reinforcement learning invo...

  • Stratified Strate...

    In this paper we introduce Stratified Strategy ...

  • A Mathematical Mo...

    Direct democracy, where each voter casts one vo...

  • Joint Pose and Ex...

    Facial expression recognition (FER) is a challe...