Thompson Sampling for Contextual Bandits with Linear Payoffs

资源分类

2020-03-02 |

72 |

69 |

Abstract

Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several studies demonstrated it to have better empirical performance compared to the state-of-theart methods. However, many questions regarding its theoretical performance remained open. In this paper, we design and analyze a generalization of Thompson Sampling algorithm for the stochastic contextual multi-armed bandit problem with linear payoff functions, when the contexts are provided by an adaptive adversary. This is among the most important and widely studied version of the contextual bandits problem. We prove a high probability regret bound of 图片.png in time T for any 0 < < 1, where d is the dimension of each context vector and is a parameter used by the algorithm. Our results provide the first theoretical guarantees for the contextual version of Thompson Sampling, ? and are close to the lower bound of for this problem. This essentially solves a COLT open problem of Chapelle and Li [COLT 2012]. Proceedings of the 30 th International Conference on Machine Learning, Atlanta, Georgia, USA, 2013. JMLR: W&CP volume 28. Copyright 2013 by the author(s).

上一篇：One-bit Compressed Sensing: Provable Support and Vector Recovery

下一篇：Almost Optimal Exploration in Multi-Armed Bandits

用户评价

全部评价

还没有评论，说两句吧！

热门资源

A Mathematical Mo...

Direct democracy, where each voter casts one vo...
Learning to Predi...

Much of model-based reinforcement learning invo...
Hierarchical Task...

We extend hierarchical task network planning wi...
The Variational S...

Unlike traditional images which do not offer in...
Shape-based Autom...

We present an algorithm for automatic detection...

智能在线

400-630-6780
聆听.建议反馈

E-mail: support@tusaishared.com