资源论文Contextual Combinatorial Cascading Bandits

Contextual Combinatorial Cascading Bandits

2020-03-05 | |  64 |   33 |   0

Abstract

We propose the contextual combinatorial cascading bandits, a combinatorial online learning game, where at each time step a learning agent is given a set of contextual information, then selects a list of items, and observes stochastic outcomes of a prefix in the selected items by some stopping criterion. In online recommendation, the stopping criterion might be the first item a user selects; in network routing, the stopping criterion might be the first edge blocked in a path. We consider position discounts in the list order, so that the agent’s reward is discounted depending on the position where the stopping criterion is met. We design a UCB-type algorithm, 图片.png for this problem, prove an n-step regret bound 图片.png in the general setting, and give finer analysis for two special cases. Our work generalizes existing studies in several directions, including contextual information, position discounts, and a more general cascading bandit model. Experiments on synthetic and real datasets demonstrate the advantage of involving contextual information and position discounts.

上一篇:Differentially Private Policy Evaluation

下一篇:Learning Sparse Combinatorial Representations via Two-stage Submodular Maximization

用户评价
全部评价

热门资源

  • The Variational S...

    Unlike traditional images which do not offer in...

  • Learning to Predi...

    Much of model-based reinforcement learning invo...

  • Stratified Strate...

    In this paper we introduce Stratified Strategy ...

  • A Mathematical Mo...

    Direct democracy, where each voter casts one vo...

  • Rating-Boosted La...

    The performance of a recommendation system reli...