Combinatorial Multi-Armed Bandit: General Framework, Results and Applications

资源分类

2020-03-02 |

59 |

44 |

Abstract

We define a general framework for a large class of combinatorial multi-armed bandit (CMAB) problems, where simple arms with unknown distributions form super arms. In each round, a super arm is played and the outcomes of its related simple arms are observed, which helps the selection of super arms in future rounds. The reward of the super arm depends on the outcomes of played arms, and it only needs to satisfy two mild assumptions, which allow a large class of nonlinear reward instances. We assume the availability of an (α,β)-approximation oracle that takes the means of the distributions of arms and outputs a super arm that with probability β generates an ? fraction of the optimal expected reward. The objective of a CMAB algorithm is to minimize (α,β)approximation regret, which is the difference in total expected reward between the αβ fraction of expected reward when always playing the optimal super arm, and the expected reward of playing super arms according to the algorithm. We provide CUCB algorithm that achieves O(log n) regret, where n is the number of rounds played, and we further provide distribution-independent bounds for a large class of reward functions. Our regret analysis is tight in that it matches the bound for classical MAB problem up to a constant factor, and it significantly improves the regret bound

上一篇：Joint Transfer and Batch-mode Active Learning

下一篇：Stochastic Alternating Direction Method of Multipliers

用户评价

全部评价

还没有评论，说两句吧！

热门资源

Learning to Predi...

Much of model-based reinforcement learning invo...
Stratified Strate...

In this paper we introduce Stratified Strategy ...
The Variational S...

Unlike traditional images which do not offer in...
A Mathematical Mo...

Direct democracy, where each voter casts one vo...
Rating-Boosted La...

The performance of a recommendation system reli...

智能在线

400-630-6780
聆听.建议反馈

E-mail: support@tusaishared.com