
Fighting Bandits with a New Kind of Smoothness

2020-02-04

Abstract

We provide a new analysis framework for the adversarial multi-armed bandit problem. Using the notion of convex smoothing, we define a novel family of algorithms with minimax optimal regret guarantees. First, we show that regularization via the Tsallis entropy, which includes EXP3 as a special case, matches the Θ(√(TN)) minimax regret with a smaller constant factor. Second, we show that a wide class of perturbation methods achieves a near-optimal regret as low as O(√(TN log N)) as long as the perturbation distribution has a bounded hazard function. For example, the Gumbel, Weibull, Frechet, Pareto, and Gamma distributions all satisfy this key property and lead to near-optimal algorithms.
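The perturbation methods described above can be illustrated with a minimal Follow-the-Perturbed-Leader sketch: each round, play the arm minimizing estimated cumulative loss minus fresh Gumbel noise, then update the pulled arm's loss estimate with an importance weight. This is a NumPy sketch under stated assumptions, not the paper's exact algorithm: the learning rate `eta`, the Monte Carlo estimate of the play probability, and the sample count 256 are all illustrative choices.

```python
import numpy as np

def ftpl_gumbel_bandit(loss_matrix, eta=0.1, seed=0):
    """Sketch of Follow-the-Perturbed-Leader for adversarial bandits.

    loss_matrix: (T, N) array of per-round losses in [0, 1].
    Returns the total loss incurred by the algorithm.
    """
    rng = np.random.default_rng(seed)
    T, N = loss_matrix.shape
    cum_loss_est = np.zeros(N)   # importance-weighted loss estimates
    total_loss = 0.0
    for t in range(T):
        # Fresh Gumbel noise smooths the argmin (the "convex smoothing").
        noise = rng.gumbel(size=N)
        arm = int(np.argmin(cum_loss_est - noise / eta))
        loss = loss_matrix[t, arm]
        total_loss += loss
        # Monte Carlo estimate of the probability this arm is played
        # (an illustrative stand-in for the exact play distribution).
        samples = rng.gumbel(size=(256, N))
        plays = np.argmin(cum_loss_est - samples / eta, axis=1)
        p_hat = max((plays == arm).mean(), 1.0 / 256)
        # Importance-weighted update: only the pulled arm's loss is observed.
        cum_loss_est[arm] += loss / p_hat
    return total_loss
```

Swapping the Gumbel draws for Weibull, Frechet, Pareto, or Gamma noise gives other members of the bounded-hazard family the abstract refers to.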

