Online Rank Elicitation for Plackett-Luce: A Dueling Bandits Approach

2020-02-04

Abstract 

We study the problem of online rank elicitation, assuming that rankings of a set of alternatives obey the Plackett-Luce distribution. Following the setting of the dueling bandits problem, the learner is allowed to query pairwise comparisons between alternatives, i.e., to sample pairwise marginals of the distribution in an online fashion. Using this information, the learner seeks to reliably predict the most probable ranking (or top-alternative). Our approach is based on constructing a surrogate probability distribution over rankings based on a sorting procedure, for which the pairwise marginals provably coincide with the marginals of the Plackett-Luce distribution. In addition to a formal performance and complexity analysis, we present first experimental studies.
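
To make the setting concrete, the following is a minimal Python sketch of a Plackett-Luce model and its pairwise marginals. It is an illustration under stated assumptions, not the algorithm of the paper: the function names and the weight vector are hypothetical. It relies on two standard properties of the Plackett-Luce model with positive weights v: alternative i precedes alternative j with probability v_i / (v_i + v_j), and the most probable ranking orders the alternatives by decreasing weight.

```python
import random


def sample_pl_ranking(weights, rng):
    """Draw one ranking (best first) from a Plackett-Luce model.

    Alternatives are chosen one at a time, each with probability
    proportional to its weight among those not yet ranked.
    """
    remaining = list(range(len(weights)))
    ranking = []
    while remaining:
        chosen = rng.choices(remaining, weights=[weights[i] for i in remaining])[0]
        ranking.append(chosen)
        remaining.remove(chosen)
    return ranking


def pairwise_marginal(weights, i, j):
    """Probability that alternative i is ranked ahead of alternative j."""
    return weights[i] / (weights[i] + weights[j])


def duel(weights, i, j, rng):
    """Simulate one pairwise comparison query; True means i beat j."""
    return rng.random() < pairwise_marginal(weights, i, j)


def most_probable_ranking(weights):
    """Mode of the Plackett-Luce distribution: sort by decreasing weight."""
    return sorted(range(len(weights)), key=lambda i: -weights[i])


if __name__ == "__main__":
    rng = random.Random(0)
    v = [1.0, 4.0, 0.5, 2.0]  # hypothetical weights, chosen for illustration
    wins = sum(duel(v, 1, 3, rng) for _ in range(10_000))
    print("empirical P(1 beats 3):", wins / 10_000)
    print("exact     P(1 beats 3):", pairwise_marginal(v, 1, 3))
    print("most probable ranking: ", most_probable_ranking(v))
```

In this sketch the weights are known, so the most probable ranking can be read off directly. In the online setting described in the abstract, the learner only observes outcomes such as those returned by `duel` and has to infer the ranking from them.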


