资源论文Improved Regret Bounds for Oracle-Based Adversarial Contextual Bandits

Improved Regret Bounds for Oracle-Based Adversarial Contextual Bandits

2020-02-05 | |  48 |   41 |   0

Abstract 

We propose a new oracle-based algorithm, BISTRO+, for the adversarial contextual bandit problem, where either contexts are drawn i.i.d. or the sequence of contexts is known a priori, but where the losses are picked adversarially. Our algorithm is computationally efficient, assuming access to an offline optimization 2 1 oracle, and enjoys a regret of order image.png, where K is the number of actions, T is the number of iterations, and N is the number of baseline policies. 3 Our result is the first to break the image.png barrier achieved by recent algorithms, which was left as a major open problem. Our analysis employs the recent relaxation framework of Rakhlin and Sridharan [7].

上一篇:Scan Order in Gibbs Sampling: Models in Which it Matters and Bounds on How Much

下一篇:Split LBI: An Iterative Regularization Path with Structural Sparsity

用户评价
全部评价

热门资源

  • The Variational S...

    Unlike traditional images which do not offer in...

  • Learning to Predi...

    Much of model-based reinforcement learning invo...

  • Stratified Strate...

    In this paper we introduce Stratified Strategy ...

  • A Mathematical Mo...

    Direct democracy, where each voter casts one vo...

  • Rating-Boosted La...

    The performance of a recommendation system reli...