Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs

2020-02-20

Abstract

This paper establishes that optimistic algorithms attain gap-dependent and non-asymptotic logarithmic regret for episodic MDPs. In contrast to prior work, our bounds do not depend on diameter-like quantities or on ergodicity, and they smoothly interpolate between gap-dependent logarithmic regret and the $\widetilde{\mathcal{O}}(\sqrt{HSAT})$ minimax rate. The key technique in our analysis is a novel “clipped” regret decomposition, which applies to a broad family of recent optimistic algorithms for episodic MDPs.
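The abstract names a “clipped” regret decomposition but does not define it. The Python sketch below illustrates the general clipping idea. It assumes a clipping operator of the form clip[x | ε] = x·1{x ≥ ε}, which zeroes out any term below a threshold ε. The names `clip` and `clipped_sum`, the per-step surplus values and the threshold are all hypothetical illustrations. They are not the paper's algorithm or its analysis.

```python
from typing import Sequence


def clip(x: float, eps: float) -> float:
    """Assumed clipping operator: keep x if it reaches the threshold eps, else return 0."""
    return x if x >= eps else 0.0


def clipped_sum(surpluses: Sequence[float], eps: float) -> float:
    """Sum per-step terms after clipping each one at threshold eps (hypothetical helper)."""
    return sum(clip(s, eps) for s in surpluses)


# Hypothetical per-step "optimism surplus" values from one episode (illustrative only).
surpluses = [0.50, 0.02, 0.30, 0.01, 0.005]

# With eps = 0.05, only the terms 0.50 and 0.30 survive the clipping.
print(clipped_sum(surpluses, eps=0.05))
```

In a gap-dependent analysis, one would expect the threshold to depend on the suboptimality gaps. Here it is simply a fixed number chosen for illustration. The point is that terms below the threshold contribute nothing to the clipped sum.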
