Improved Regret Bounds for Bandit Combinatorial Optimization*

2020-02-21

Abstract

Bandit combinatorial optimization is a bandit framework in which a player chooses an action within a given finite set $\mathcal{A} \subseteq \{0,1\}^d$ and incurs a loss that is the inner product of the chosen action and an unobservable loss vector in $\mathbb{R}^d$ in each round. In this paper, we aim to reveal the property that makes bandit combinatorial optimization hard. Recently, Cohen et al. [8] obtained a lower bound of $\Omega(\sqrt{d k^3 T / \log T})$ on the regret, where $k$ is the maximum $\ell_1$-norm of the action vectors and $T$ is the number of rounds. This lower bound was achieved by considering a continuous strongly-correlated distribution of losses. Our main contribution is that we improve this bound by a factor of $\Theta(\sqrt{\log T})$, obtaining a lower bound of $\Omega(\sqrt{d k^3 T})$, which can be achieved by means of strongly-correlated losses with binary values. This bound yields improved regret bounds for three specific examples of bandit combinatorial optimization: the multitask bandit, the bandit ranking, and the multiple-play bandit. In particular, the bound obtained for the bandit ranking in the present study addresses an open problem raised in [8]. In addition, we demonstrate that the problem becomes easier without correlations among the entries of the loss vectors: if each entry of the loss vectors is an independent random variable, then one can achieve a regret of $\tilde{O}(\sqrt{d k^2 T})$, which is $\sqrt{k}$ times smaller than the lower bound shown above. These results indicate that correlation among losses is the cause of the large regret.

* This work was supported by JST, ERATO, Grant Number JPMJER1201, Japan.
† This work was supported by JST, ACT-I, Grant Number JPMJPR18U5, Japan.
‡ This work was supported by JST, PRESTO, Grant Number JPMJPR1759, Japan.
§ This work was supported by JSPS, KAKENHI, Grant Number JP18H05291, Japan.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.
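For concreteness, the following is a minimal LaTeX sketch of the setting and of the bounds quoted in the abstract. The notation $a_t$ (action chosen in round $t$), $\ell_t$ (loss vector of round $t$), and $R_T$ (expected regret) is not spelled out above and is introduced here editorially; the regret display uses the usual pseudo-regret convention for this setting rather than being quoted from the paper.

% Action set: binary vectors of dimension d whose l1-norm is at most k.
\[
  \mathcal{A} \subseteq \{0,1\}^d, \qquad \|a\|_1 \le k \quad \text{for all } a \in \mathcal{A}.
\]
% In round t the player picks a_t in \mathcal{A} and observes only the scalar loss <\ell_t, a_t>.
% Expected regret against the best fixed action in hindsight:
\[
  R_T \;=\; \mathbb{E}\!\left[\sum_{t=1}^{T} \langle \ell_t, a_t \rangle\right]
        \;-\; \min_{a \in \mathcal{A}} \sum_{t=1}^{T} \langle \ell_t, a \rangle.
\]
% Lower bound established in this paper (strongly-correlated binary losses):
\[
  R_T \;=\; \Omega\!\left(\sqrt{d k^3 T}\right).
\]
% Achievable regret when the entries of \ell_t are independent random variables:
\[
  R_T \;=\; \tilde{O}\!\left(\sqrt{d k^2 T}\right).
\]

The gap between the last two displays is the $\sqrt{k}$ factor discussed in the abstract, which is attributed to correlation among the entries of the loss vectors.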

