Contextual Bandits With Cross-Learning

2020-02-21

Abstract

In the classical contextual bandits problem, in each round $t$, a learner observes some context $c$, chooses some action $a$ to perform, and receives some reward $r_{a,t}(c)$. We consider the variant of this problem where, in addition to receiving the reward $r_{a,t}(c)$, the learner also learns the values of $r_{a,t}(c')$ for all other contexts $c'$; i.e., the rewards that would have been achieved by performing that action under different contexts. This variant arises in several strategic settings, such as learning how to bid in non-truthful repeated auctions, which has gained a lot of attention lately as many platforms have switched to running first-price auctions. We call this problem the contextual bandits problem with cross-learning. The best algorithms for the classical contextual bandits problem achieve $\tilde{O}(\sqrt{CKT})$ regret against all stationary policies, where $C$ is the number of contexts, $K$ the number of actions, and $T$ the number of rounds. We demonstrate algorithms for the contextual bandits problem with cross-learning that remove the dependence on $C$ and achieve regret $\tilde{O}(\sqrt{KT})$. We simulate our algorithms on real auction data from an ad exchange running first-price auctions, showing that they outperform traditional contextual bandit algorithms.
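The cross-learning feedback model can be illustrated with a small first-price auction simulation. The sketch below is not the algorithm from the paper: it is a plain epsilon-greedy learner, chosen only to show how one observed outcome updates the estimates for every context at once. The function names, the uniform context distribution, and the uniform competing bid are all assumptions made for illustration.

```python
import random


def cross_learning_rewards(bid, competing_bid, valuations):
    """Reward the chosen bid would have earned under every valuation (context).

    First-price rule (assumed): the bidder wins if bid >= competing_bid
    and then pays its own bid; losing yields zero reward.
    """
    won = bid >= competing_bid
    return [(v - bid) if won else 0.0 for v in valuations]


def run_epsilon_greedy(valuations, bids, rounds, epsilon=0.1, seed=0):
    """Epsilon-greedy with cross-learning updates (illustration only)."""
    rng = random.Random(seed)
    C, K = len(valuations), len(bids)
    counts = [0] * K                        # pulls per action, shared by all contexts
    means = [[0.0] * K for _ in range(C)]   # mean reward estimate per (context, action)
    total = 0.0
    for _ in range(rounds):
        c = rng.randrange(C)                # context drawn uniformly (assumption)
        competing = rng.random()            # competing bid ~ Uniform(0, 1) (assumption)
        if rng.random() < epsilon or 0 in counts:
            a = rng.randrange(K)            # explore
        else:
            a = max(range(K), key=lambda k: means[c][k])  # exploit
        rewards = cross_learning_rewards(bids[a], competing, valuations)
        counts[a] += 1
        for ctx in range(C):                # cross-learning: update every context
            means[ctx][a] += (rewards[ctx] - means[ctx][a]) / counts[a]
        total += rewards[c]                 # only the realized context pays out
    return means, total


if __name__ == "__main__":
    means, total = run_epsilon_greedy([0.5, 1.0], [0.1, 0.3, 0.5], rounds=2000)
    print(means, total)
```

Because each pull of an action yields one observation per context, the pull counts are indexed by action only. A classical contextual bandit learner would instead update a single (context, action) cell per round, which is where the dependence on the number of contexts comes from.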
