资源论文Copeland Dueling Bandits

Copeland Dueling Bandits

2020-02-04 | |  138 |   97 |   0

Abstract 

A version of the dueling bandit problem is addressed in which a Condorcet winner may not exist. Two algorithms are proposed that instead seek to minimize regret with respect to the Copeland winner, which, unlike the Condorcet winner, is guaranteed to exist. The first, Copeland Confidence Bound (CCB), is designed for small numbers of arms, while the second, Scalable Copeland Bandits (SCB), works better for large-scale problems. We provide theoretical results bounding the regret accumulated by CCB and SCB, both substantially improving existing results. Such existing results either offer bounds of the form image.pngbut require restrictive assumptions, or offer bounds of the form image.png without requiring such assumptions. Our results offer the best of both worlds: image.png bounds without restrictive assumptions.

上一篇:Structured Transforms for Small-Footprint Deep Learning

下一篇:Improved Iteration Complexity Bounds of Cyclic Block Coordinate Descent for Convex Problems

用户评价
全部评价

热门资源

  • Deep Cross-media ...

    Cross-media retrieval is a research hotspot in ...

  • Regularizing RNNs...

    Recently, caption generation with an encoder-de...

  • The Variational S...

    Unlike traditional images which do not offer in...

  • Visual Reinforcem...

    For an autonomous agent to fulfill a wide range...

  • Supervised Descen...

    Many computer vision problems (e.