资源论文Unifying the Stochastic and the Adversarial Bandits with Knapsack

Unifying the Stochastic and the Adversarial Bandits with Knapsack

2019-10-09 | |  83 |   36 |   0
Abstract This work investigates the adversarial Bandits with Knapsack (BwK) learning problem, where a player repeatedly chooses to perform an action, pays the corresponding cost of the action, and receives a reward associated with the action. The player is constrained by the maximum budget that can be spent to perform the actions, and the rewards and the costs of these actions are assigned by an adversary. This setting is studied in terms of expected regret, defined as the difference between the total expected rewards per unit cost corresponding the best fixed action and the total expected rewards per unit cost of the learning algorithm. We propose a novel algorithm EXP3.BwK and show that the expected regret of the algorithm is order optimal in the budget. We then propose another algorithm EXP3++.BwK, which is order optimal in the adversarial BwK setting, and incurs an almost optimal expected regret in the stochastic BwK setting where the rewards and the costs are drawn from unknown underlying distributions. These results are then extended to a more general online learning setting, by designing another algorithm EXP3++.LwK and providing its performance guarantees. Finally, we investigate the scenario where the costs of the actions are large and comparable to the budget. We show that for the adversarial setting, the achievable regret bounds scale at least linearly with the maximum cost for any learning algorithm, and are significantly worse in comparison to the case of having costs bounded by a constant, which is a common assumption in the BwK literature

上一篇:SynthNet: Learning to Synthesize Music End-to-End

下一篇:A Vectorized Relational Graph Convolutional Network for Multi-Relational Network Alignment

用户评价
全部评价

热门资源

  • Learning to Predi...

    Much of model-based reinforcement learning invo...

  • Stratified Strate...

    In this paper we introduce Stratified Strategy ...

  • The Variational S...

    Unlike traditional images which do not offer in...

  • A Mathematical Mo...

    Direct democracy, where each voter casts one vo...

  • Rating-Boosted La...

    The performance of a recommendation system reli...