
Reducing Dueling Bandits to Cardinal Bandits


Abstract

We present algorithms for reducing the Dueling Bandits problem to the conventional (stochastic) Multi-Armed Bandits problem. The Dueling Bandits problem is an online model of learning with ordinal feedback of the form "A is preferred to B" (as opposed to cardinal feedback like "A has value 2.5"), giving it wide applicability in learning from implicit user feedback and from revealed and stated preferences. In contrast to existing algorithms for the Dueling Bandits problem, our reductions, named Doubler, MultiSBM and Sparring, provide a generic schema for translating the extensive body of known results about conventional Multi-Armed Bandit algorithms to the Dueling Bandits setting. For Doubler and MultiSBM we prove regret upper bounds in both finite and infinite settings, and we conjecture about the performance of Sparring, which empirically outperforms the other two as well as previous algorithms in our experiments. In addition, we provide the first almost optimal regret bound in terms of second order terms, such as the differences between the values of the arms.
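The abstract only names the three reductions; as a rough illustration of the simplest one, here is a minimal Python sketch of the Sparring idea: two independent copies of a black-box stochastic bandit algorithm (UCB1 here, as one standard choice) each control one side of every duel, and each copy is fed a binary reward according to whether its arm won. The class and function names, and the linear link function in the toy duel, are assumptions for this demo, not the paper's exact construction.

```python
import math
import random

class UCB1:
    """Standard UCB1 algorithm for the stochastic multi-armed bandit."""
    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.sums = [0.0] * n_arms
        self.t = 0

    def select(self):
        self.t += 1
        # Play each arm once before relying on confidence bounds.
        for arm, c in enumerate(self.counts):
            if c == 0:
                return arm
        ucb = [self.sums[a] / self.counts[a]
               + math.sqrt(2.0 * math.log(self.t) / self.counts[a])
               for a in range(len(self.counts))]
        return max(range(len(self.counts)), key=lambda a: ucb[a])

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward

def sparring(n_arms, duel, horizon):
    """Sparring reduction: two black-box bandit copies pick the two
    sides of each duel; the winner's copy gets reward 1, the loser's 0."""
    left, right = UCB1(n_arms), UCB1(n_arms)
    for _ in range(horizon):
        x, y = left.select(), right.select()
        left_wins = duel(x, y)  # ordinal feedback only: does x beat y?
        left.update(x, 1.0 if left_wins else 0.0)
        right.update(y, 0.0 if left_wins else 1.0)
    return left, right

# Toy usage: hidden arm utilities, with P(x beats y) given by a
# linear link on the utility gap (an assumption for this demo).
utils = [0.2, 0.5, 0.9, 0.4]
def duel(x, y):
    return random.random() < 0.5 + (utils[x] - utils[y]) / 2.0

left, right = sparring(len(utils), duel, horizon=10_000)
print(left.counts, right.counts)  # both copies should concentrate on arm 2
```

Note that neither copy ever observes a cardinal reward from the environment; the duel outcome alone is converted into the 0/1 rewards the black-box algorithms expect, which is what makes this a reduction from ordinal to cardinal feedback.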
