On Thompson Sampling and Asymptotic Optimality

2019-10-29
Abstract: We discuss some recent results on Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments. These environments can be non-Markovian, non-ergodic, and partially observable. We show that Thompson sampling learns the environment class in the sense that (1) asymptotically its value converges in mean to the optimal value and (2) given a recoverability assumption, its regret is sublinear. We conclude with a discussion about optimality in reinforcement learning.
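As a rough illustration of the idea behind Thompson sampling over a class of environments, here is a minimal sketch on a toy problem. It is not the paper's algorithm or setting: it assumes a finite class of two-armed Bernoulli bandits (far simpler than the general, possibly non-Markovian environments the abstract describes), and the function name `thompson_sampling` and all of its parameters are hypothetical. At each step the agent samples one environment from its posterior, acts optimally for that sample, and updates the posterior with Bayes' rule.

```python
import random


def thompson_sampling(envs, prior, true_index, steps, seed=0):
    """Toy Thompson sampling over a finite class of Bernoulli bandits.

    Assumptions (not from the paper): each environment is a list of
    per-arm success probabilities strictly between 0 and 1, `prior`
    holds positive weights summing to 1, and `true_index` selects the
    environment that actually generates the rewards.
    """
    rng = random.Random(seed)
    posterior = list(prior)
    total_reward = 0
    for _ in range(steps):
        # Sample one candidate environment from the current posterior.
        sampled = rng.choices(range(len(envs)), weights=posterior)[0]
        # Act optimally for the sampled environment (pick its best arm).
        arm = max(range(len(envs[sampled])), key=lambda a: envs[sampled][a])
        # Observe a reward drawn from the true environment.
        reward = 1 if rng.random() < envs[true_index][arm] else 0
        total_reward += reward
        # Bayes update: reweight each environment by the likelihood
        # it assigns to the observed reward, then renormalize.
        likelihoods = [e[arm] if reward else 1.0 - e[arm] for e in envs]
        unnormalized = [w * l for w, l in zip(posterior, likelihoods)]
        z = sum(unnormalized)
        posterior = [u / z for u in unnormalized]
    return posterior, total_reward


if __name__ == "__main__":
    envs = [[0.2, 0.8], [0.8, 0.2]]
    posterior, total_reward = thompson_sampling(envs, [0.5, 0.5], 0, 200)
    print(posterior, total_reward)
```

In this toy class every action is informative about which environment is the true one, so the posterior concentrates quickly. The general setting in the abstract needs the stated recoverability assumption before a sublinear regret guarantee holds.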
