PAC Subset Selection in Stochastic Multi-armed Bandits

资源分类

2020-02-28 |

84 |

45 |

Abstract

We consider the problem of selecting, from among the arms of a stochastic n-armed bandit, a subset of size m of those arms with the highest expected rewards, based on efficiently sampling the arms. This “subset selection” problem finds application in a variety of areas. In the authors’ previous work (Kalyanakrishnan & Stone, 2010), this problem is framed under a PAC setting (denoted “Explore-m”), and corresponding sampling algorithms are analyzed. Whereas the formal analysis therein is restricted to the worst case sample complexity of algorithms, in this paper, we design and analyze an algorithm (“LUCB”) with improved expected sample complexity. Interestingly LUCB bears a close resemblance to the wellknown UCB algorithm for regret minimization. The expected sample complexity bound we show for LUCB is novel even for singlearm selection (Explore-1). We also give a lower bound on the worst case sample complexity of PAC algorithms for Explore-m.

上一篇：Local Loss Optimization in Operator Models: A New Insight into Spectral Learning

下一篇：Copula-based Kernel Dependency Measures

用户评价

全部评价

还没有评论，说两句吧！

热门资源

Learning to Predi...

Much of model-based reinforcement learning invo...
Stratified Strate...

In this paper we introduce Stratified Strategy ...
The Variational S...

Unlike traditional images which do not offer in...
A Mathematical Mo...

Direct democracy, where each voter casts one vo...
Rating-Boosted La...

The performance of a recommendation system reli...

智能在线

400-630-6780
聆听.建议反馈

E-mail: support@tusaishared.com