Contextual Combinatorial Multi-armed Bandits with Volatile Arms and Submodular Reward


2020-02-13

Abstract

In this paper, we study the stochastic contextual combinatorial multi-armed bandit (CC-MAB) framework, tailored for volatile arms and submodular reward functions. CC-MAB inherits properties from both contextual and combinatorial bandits: it selects a set of arms in each round based on the side information (a.k.a. context) associated with the arms. By "volatile arms", we mean that the set of arms available for selection may change from round to round; by "submodular rewards", we mean that the total reward achieved by the selected arms is not a simple sum of individual rewards but exhibits diminishing returns determined by the relations between selected arms (e.g. relevance and redundancy). Volatile arms and submodular rewards arise in many real-world applications, e.g. recommender systems and crowdsourcing, in which multi-armed bandit (MAB) based strategies are extensively applied. Although existing works investigate these issues separately within the standard MAB setting, jointly considering all of them in a single MAB problem requires very different algorithm design and regret analysis. Our algorithm, CC-MAB, provides an online decision-making policy in a contextual and combinatorial bandit setting and effectively addresses the issues raised by volatile arms and submodular reward functions. The proposed algorithm is proved to achieve a sublinear regret bound after a span of T rounds. The performance of CC-MAB is evaluated by experiments on a real-world crowdsourcing dataset, and the results show that our algorithm outperforms prior art.
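To make the setting concrete, the following is a minimal sketch (not the authors' algorithm) of how greedy submodular maximization can be combined with upper-confidence-bound (UCB) estimates each round. The coverage-style reward, the `avail_arms`/`stats` structures, and the confidence radius are illustrative assumptions; the paper's actual method additionally partitions the context space into hypercubes and uses a tailored exploration rule.

```python
import math

def greedy_submodular_ucb(avail_arms, stats, t, budget):
    """Greedily pick up to `budget` arms to maximize a coverage-style
    (monotone submodular) reward, scoring each arm by a UCB estimate.

    avail_arms: dict arm_id -> set of covered items (volatile arms:
                this dict may differ every round)
    stats:      dict arm_id -> (pull_count, empirical mean reward)
    t:          current round index (drives the confidence radius)
    budget:     number of arms to select this round
    """
    def ucb(arm):
        n, mean = stats.get(arm, (0, 0.0))
        if n == 0:
            return float("inf")  # never-pulled arms are explored first
        return mean + math.sqrt(2.0 * math.log(t) / n)

    selected, covered = [], set()
    for _ in range(min(budget, len(avail_arms))):
        best, best_gain = None, -1.0
        for arm, items in avail_arms.items():
            if arm in selected:
                continue
            new = items - covered
            # marginal gain: UCB-weighted count of newly covered items;
            # zero if the arm adds no coverage (diminishing returns)
            gain = len(new) * ucb(arm) if new else 0.0
            if gain > best_gain:
                best, best_gain = arm, gain
        if best is None or best_gain <= 0.0:
            break  # no remaining arm adds value
        selected.append(best)
        covered |= avail_arms[best]
    return selected

# Example round: arm "c" has never been pulled, so it is explored first;
# then "a" is picked because its marginal coverage beats "b"'s.
arms = {"a": {1, 2}, "b": {2, 3}, "c": {3}}
stats = {"a": (5, 0.5), "b": (5, 0.9), "c": (0, 0.0)}
print(greedy_submodular_ucb(arms, stats, t=10, budget=2))
```

The greedy rule exploits submodularity: once an item is covered, further arms covering it contribute nothing, so each arm's score shrinks as the selected set grows.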

