资源论文Advancements in Dueling Bandits

Advancements in Dueling Bandits

2019-11-06 | |  45 |   41 |   0
Abstract The dueling bandits problem is an online learning framework where learning happens “on-thefly” through preference feedback, i.e., from comparisons between a pair of actions. Unlike conventional online learning settings that require absolute feedback for each action, the dueling bandits framework assumes only the presence of (noisy) binary feedback about the relative quality of each pair of actions. The dueling bandits problem is wellsuited for modeling settings that elicit subjective or implicit human feedback, which is typically more reliable in preference form. In this survey, we review recent results in the theories, algorithms, and applications of the dueling bandits problem. As an emerging domain, the theories and algorithms of dueling bandits have been intensively studied during the past few years. We provide an overview of recent advancements, including algorithmic advances and applications. We discuss extensions to standard problem formulation and novel application areas, highlighting key open research questions in our discussion.

上一篇:Stackelberg Security Games: Looking Beyond a Decade of Success

下一篇:Ontology-Based Data Access: A Survey

用户评价
全部评价

热门资源

  • The Variational S...

    Unlike traditional images which do not offer in...

  • Learning to Predi...

    Much of model-based reinforcement learning invo...

  • Stratified Strate...

    In this paper we introduce Stratified Strategy ...

  • A Mathematical Mo...

    Direct democracy, where each voter casts one vo...

  • Rating-Boosted La...

    The performance of a recommendation system reli...