资源论文N EWTRON: an Efficient Bandit algorithm for Online Multiclass Prediction

N EWTRON: an Efficient Bandit algorithm for Online Multiclass Prediction

2020-01-08 | |  82 |   41 |   0

Abstract

We present an efficient algorithm for the problem of online multiclass prediction with bandit feedback in the fully adversarial setting. We measure its regret with respect to the log-loss defined in [AR09], which is parameterized by a scalar 图片.png. We prove that the regret of N EWTRON is 图片.png when 图片.png is a constant that does not vary with horizon T , and at most 图片.png if 图片.png is allowed  to increase to infinity with T . For 图片.png, the regret is bounded by 图片.png, thus solving the open problem of [KSST08, AR09]. Our algorithm is based on a novel application of the online Newton method [HAK07]. We test our algorithm and show it to perform well in experiments, even when 图片.png is a small constant.

上一篇:Stochastic convex optimization with bandit feedback

下一篇:Inductive reasoning about chimeric creatures

用户评价
全部评价

热门资源

  • The Variational S...

    Unlike traditional images which do not offer in...

  • Learning to Predi...

    Much of model-based reinforcement learning invo...

  • Stratified Strate...

    In this paper we introduce Stratified Strategy ...

  • Learning to learn...

    The move from hand-designed features to learned...

  • A Mathematical Mo...

    Direct democracy, where each voter casts one vo...