资源论文On Thompson Sampling and Asymptotic Optimality?

On Thompson Sampling and Asymptotic Optimality?

2019-10-29 | |  53 |   45 |   0
Abstract We discuss some recent results on Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments. These environments can be non-Markovian, non-ergodic, and partially observable. We show that Thompson sampling learns the environment class in the sense that (1) asymptotically its value converges in mean to the optimal value and (2) given a recoverability assumption regret is sublinear. We conclude with a discussion about optimality in reinforcement learning.

上一篇:Nonparametric Online Machine Learning with Kernels

下一篇:Online Algorithm Selection

用户评价
全部评价

热门资源

  • A Mathematical Mo...

    Direct democracy, where each voter casts one vo...

  • Stratified Strate...

    In this paper we introduce Stratified Strategy ...

  • Learning to Predi...

    Much of model-based reinforcement learning invo...

  • dynamical system ...

    allows to preform manipulations of heavy or bul...

  • The Variational S...

    Unlike traditional images which do not offer in...