Efficient Selection of Multiple Bandit Arms: Theory and Practice

2020-02-26

Abstract

We consider the general, widely applicable problem of selecting, from n real-valued random variables, a subset of size m of those with the highest means, based on as few samples as possible. This problem, which we denote Explore-m, arises as a core component of several stochastic optimization algorithms and in applications from simulation and industrial engineering. The theoretical basis for our work is an extension of a previous multi-armed bandit formulation devoted to identifying just the single best of n random variables (Explore-1). In addition to providing PAC bounds for the general case, we tailor our theoretically grounded approach to work efficiently in practice. Empirical comparisons of the resulting sampling algorithm against state-of-the-art subset selection strategies demonstrate significant gains in sample efficiency.
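The abstract does not include pseudocode, so the sketch below only illustrates the Explore-m setting it describes: given n arms and a sampling budget, return the m arms with the highest empirical means. It implements a naive uniform-sampling baseline, not the paper's algorithm, and all names (uniform_explore_m, pull, budget) are illustrative assumptions.

```python
import numpy as np

def uniform_explore_m(pull, n, m, budget):
    """Naive baseline for the Explore-m setting: sample every arm the same
    number of times, then return the m arms with the highest empirical means.
    `pull(i)` draws one real-valued sample from arm i."""
    per_arm = budget // n
    means = np.array([np.mean([pull(i) for _ in range(per_arm)])
                      for i in range(n)])
    return np.argsort(means)[-m:][::-1]   # indices of the top-m arms

# Toy usage: 10 Bernoulli arms with means 0.1 .. 0.9, select the best 3.
rng = np.random.default_rng(0)
true_means = np.linspace(0.1, 0.9, 10)
pull = lambda i: rng.binomial(1, true_means[i])
print(uniform_explore_m(pull, n=10, m=3, budget=5000))
```

The point of the paper is that adaptive sampling strategies with PAC guarantees can reach the same selection quality with far fewer samples than this uniform allocation.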
