
Threshold Bandit, With and Without Censored Feedback


Abstract 

We consider the Threshold Bandit setting, a variant of the classical multi-armed bandit problem in which the reward on each round depends on a piece of side information known as a threshold value. The learner selects one of K actions (arms), this action generates a random sample from a fixed distribution, and the action then receives a unit payoff in the event that this sample exceeds the threshold value. We consider two versions of this problem, the uncensored and the censored case, which determine whether the sample is always observed or observed only when the threshold is not met. Using new tools to understand the popular UCB algorithm, we show that the uncensored case is essentially no more difficult than the classical multi-armed bandit setting. Finally, we show that the censored case presents additional challenges, but we give guarantees in the event that the sequence of threshold values is generated optimistically.
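The uncensored protocol lends itself to a short simulation. Below is a minimal sketch of a UCB-style rule for the uncensored case: each arm's index is its empirical survival probability at the current threshold plus a confidence bonus. The Gaussian arm distributions, the uniform threshold sequence, and the bonus constant are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 5                                  # number of arms (assumption)
T = 10_000                             # horizon (assumption)
means = rng.uniform(0, 1, size=K)      # unknown arm means (illustrative)

samples = [[] for _ in range(K)]       # observed samples per arm (uncensored feedback)

def ucb_index(arm, threshold, t):
    """Empirical survival probability at the threshold plus a confidence bonus."""
    obs = samples[arm]
    if not obs:
        return float("inf")            # force one initial pull per arm
    n = len(obs)
    p_hat = np.mean(np.asarray(obs) >= threshold)
    return p_hat + np.sqrt(2 * np.log(t + 1) / n)

total_payoff = 0
for t in range(T):
    c_t = rng.uniform(0, 1)            # threshold side information for this round
    arm = max(range(K), key=lambda a: ucb_index(a, c_t, t))
    x = rng.normal(means[arm], 0.25)   # sample from the chosen arm's fixed distribution
    total_payoff += int(x >= c_t)      # unit payoff if the sample clears the threshold
    samples[arm].append(x)             # uncensored: the sample is always observed

print(f"average payoff over {T} rounds: {total_payoff / T:.3f}")
```

In the censored variant, the final `samples[arm].append(x)` would execute only when `x < c_t`, since the sample is observed only when the threshold is not met; this one-sided feedback is what makes estimation in the censored case harder.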

