资源论文Combinatorial Partial Monitoring Game with Linear Feedback and Its Applications

Combinatorial Partial Monitoring Game with Linear Feedback and Its Applications

2020-03-04 | |  56 |   42 |   0

Abstract

In online learning, a player chooses actions to play and receives reward and feedback from the environment with the goal of maximizing her reward over time. In this paper, we propose the model of combinatorial partial monitoring games with linear feedback, a model which simultaneously addresses limited feedback, infinite outcome space of the environment and exponentially large action space of the player. We present the Global Confidence Bound (GCB) algorithm, which integrates ideas from both combinatorial multi-armed bandits and finite partial monitoring games to handle all the above issues. GCB only requires feedback on a small set of actions and achieves 图片.png distribution-independent regret and O(log T ) distribution-dependent regret (the latter assuming unique optimal action), where T is the total time steps played. Moreover, the regret bounds only depend linearly on log |X | rather than |X |, where X is the action space. GCB isolates offline optimization tasks from online learning and avoids explicit enumeration of all actions in the online learning part. W demonstrate that our model and algorithm can be applied to a crowdsourcing application leading to both an efficient learning algorithm and low regret, and argue that they can be applied to a wide range of combinatorial applications constrained with limited feedback. Proceedings of the 31 st International Conference on Mach Learning, Beijing, China, 2014. JMLR: W&CP volume 32. Copright 2014 by the author(s).

上一篇:Rectangular Tiling Process

下一篇:A Highly Scalable Parallel Algorithm for Isotropic Total Variation Models

用户评价
全部评价

热门资源

  • Learning to Predi...

    Much of model-based reinforcement learning invo...

  • Stratified Strate...

    In this paper we introduce Stratified Strategy ...

  • The Variational S...

    Unlike traditional images which do not offer in...

  • A Mathematical Mo...

    Direct democracy, where each voter casts one vo...

  • Rating-Boosted La...

    The performance of a recommendation system reli...