
Connections Between Mirror Descent, Thompson Sampling and the Information Ratio

2020-02-21

Abstract

The information-theoretic analysis by Russo and Van Roy [25], in combination with minimax duality, has proved a powerful tool for the analysis of online learning algorithms in full and partial information settings. In most applications there is a tantalising similarity to the classical analysis based on mirror descent. We make a formal connection, showing that the information-theoretic bounds in most applications can be derived from existing techniques for online convex optimisation. Besides this, for k-armed adversarial bandits we provide an efficient algorithm with regret that matches the best information-theoretic upper bound, and we improve the best known regret guarantees for online linear optimisation on ℓp-balls and for bandits with graph feedback.
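For context, the k-armed adversarial bandit result mentioned in the abstract is of the same flavour as online mirror descent / FTRL with the 1/2-Tsallis entropy regulariser, a standard construction known to achieve O(√(kn)) regret. The sketch below illustrates that construction only; it is not the paper's own algorithm, and the learning-rate schedule, bisection tolerance, and helper names are assumptions made for illustration.

```python
# Illustrative sketch: FTRL / online mirror descent with the 1/2-Tsallis
# entropy regulariser for k-armed adversarial bandits. This is a standard
# O(sqrt(k*n))-regret construction, offered as background for the abstract,
# not as the paper's exact algorithm.
import numpy as np

def tsallis_ftrl_distribution(L_hat, eta):
    """Solve p_i = 4 / (eta * (L_hat_i - x))^2 with sum_i p_i = 1 by bisection on x."""
    k = len(L_hat)
    lo = L_hat.min() - 2.0 * np.sqrt(k) / eta  # at lo, sum(p) <= 1
    hi = L_hat.min() - 2.0 / eta               # at hi, sum(p) >= 1
    for _ in range(60):                        # fixed bisection depth (assumed tolerance)
        x = 0.5 * (lo + hi)
        p = 4.0 / (eta * (L_hat - x)) ** 2
        if p.sum() > 1.0:
            hi = x
        else:
            lo = x
    return p / p.sum()

def run_bandit(losses, seed=0):
    """losses: hypothetical (n, k) array of adversarial losses in [0, 1]."""
    rng = np.random.default_rng(seed)
    n, k = losses.shape
    L_hat = np.zeros(k)          # cumulative importance-weighted loss estimates
    total = 0.0
    for t in range(1, n + 1):
        eta = 2.0 / np.sqrt(t)   # anytime learning-rate schedule (an assumption)
        p = tsallis_ftrl_distribution(L_hat, eta)
        a = rng.choice(k, p=p)   # play an arm from the mirror-descent distribution
        loss = losses[t - 1, a]
        total += loss
        L_hat[a] += loss / p[a]  # unbiased importance-weighted loss estimate
    return total
```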

