资源论文Copeland Dueling Bandits

Copeland Dueling Bandits

2020-02-04 | |  62 |   37 |   0

Abstract 

A version of the dueling bandit problem is addressed in which a Condorcet winner may not exist. Two algorithms are proposed that instead seek to minimize regret with respect to the Copeland winner, which, unlike the Condorcet winner, is guaranteed to exist. The first, Copeland Confidence Bound (CCB), is designed for small numbers of arms, while the second, Scalable Copeland Bandits (SCB), works better for large-scale problems. We provide theoretical results bounding the regret accumulated by CCB and SCB, both substantially improving existing results. Such existing results either offer bounds of the form image.pngbut require restrictive assumptions, or offer bounds of the form image.png without requiring such assumptions. Our results offer the best of both worlds: image.png bounds without restrictive assumptions.

上一篇:Structured Transforms for Small-Footprint Deep Learning

下一篇:Improved Iteration Complexity Bounds of Cyclic Block Coordinate Descent for Convex Problems

用户评价
全部评价

热门资源

  • Learning to learn...

    The move from hand-designed features to learned...

  • A Mathematical Mo...

    Direct democracy, where each voter casts one vo...

  • Stratified Strate...

    In this paper we introduce Stratified Strategy ...

  • Rating-Boosted La...

    The performance of a recommendation system reli...

  • Hierarchical Task...

    We extend hierarchical task network planning wi...