ε-Best-Arm Identification in Pay-Per-Reward Multi-Armed Bandits

2020-02-20

Abstract

We study ε-best-arm identification in a setting where, during the exploration phase, the cost of each arm pull is proportional to the expected future reward of that arm. We call this setting Pay-Per-Reward. We provide an algorithm for this setting that, with high probability, returns an ε-best arm while incurring a cost that depends only linearly on the total expected reward of all arms and does not depend at all on the number of arms. Under mild assumptions, the algorithm can also be applied to problems with infinitely many arms.
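
To make the Pay-Per-Reward cost model concrete, here is a minimal sketch under stated assumptions. It is NOT the algorithm from the paper. It is a naive uniform-sampling baseline, and the function name naive_epsilon_best_arm is hypothetical. It assumes Bernoulli arms with means in [0, 1] and a proportionality constant of 1, so each exploration pull of arm i costs exactly means[i]. Hoeffding's inequality with a union bound over the K arms gives a per-arm sample size n = ceil((2/ε²) ln(2K/δ)). With that many pulls, every empirical mean lies within ε/2 of its true mean with probability at least 1 − δ, so the empirically best arm is ε-best. The total cost is n times the sum of the means. Because n grows with ln K, this baseline's cost depends on the number of arms, which is exactly the dependence the abstract says the paper's algorithm avoids.

```python
import math
import random


def naive_epsilon_best_arm(means, epsilon, delta, seed=0):
    """Hypothetical uniform-sampling baseline for epsilon-best-arm
    identification under a pay-per-reward cost model (illustrative only;
    this is NOT the algorithm from the paper).

    Assumptions (for illustration):
      * arms are Bernoulli with the given means in [0, 1];
      * one exploration pull of arm i costs exactly means[i]
        (cost proportional to expected reward, constant 1).

    Returns (index of the chosen arm, total exploration cost).
    """
    rng = random.Random(seed)
    k = len(means)
    # Hoeffding + union bound over k arms: with n pulls per arm, every
    # empirical mean is within epsilon/2 of its true mean w.p. >= 1 - delta.
    n = math.ceil((2.0 / epsilon ** 2) * math.log(2 * k / delta))

    estimates = []
    total_cost = 0.0
    for mu in means:
        # Simulate n Bernoulli(mu) rewards for this arm.
        successes = sum(1 for _ in range(n) if rng.random() < mu)
        estimates.append(successes / n)
        # Pay-per-reward: each of the n pulls of this arm costs mu.
        total_cost += n * mu

    # If all estimates are within epsilon/2, the empirical best arm is
    # within epsilon of the true best arm.
    best = max(range(k), key=lambda i: estimates[i])
    return best, total_cost


if __name__ == "__main__":
    arm, cost = naive_epsilon_best_arm([0.1, 0.5, 0.9], epsilon=0.1, delta=0.05)
    print(arm, cost)
```

In this example, K = 3, ε = 0.1 and δ = 0.05 give n = ceil(200 ln 120) = 958 pulls per arm, for a total cost of 958 × (0.1 + 0.5 + 0.9). Arms with low expected reward are cheap to explore, while high-reward arms dominate the bill.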

Popular resources

  • Stratified Strate...

    In this paper we introduce Stratified Strategy ...

  • The Variational S...

    Unlike traditional images which do not offer in...

  • Learning to Predi...

    Much of model-based reinforcement learning invo...

  • Learning to learn...

    The move from hand-designed features to learned...

  • A Mathematical Mo...

    Direct democracy, where each voter casts one vo...