资源论文Pure Exploration with Multiple Correct Answers

Pure Exploration with Multiple Correct Answers

2020-02-23 | |  34 |   30 |   0

Abstract

We determine the sample complexity of pure exploration bandit problems with multiple good answers. We derive a lower bound using a new game equilibrium argument. We show how continuity and convexity properties of single-answer problems ensure that the existing Track-and-Stop algorithm has asymptotically optimal sample complexity. However, that convexity is lost when going to the multiple-answer setting. We present a new algorithm which extends Track-andStop to the multiple-answer case and has asymptotic sample complexity matching the lower bound.

上一篇:Preventing Gradient Attenuation in Lipschitz Constrained Convolutional Networks

下一篇:An adaptive nearest neighbor rule for classification

用户评价
全部评价

热门资源

  • Learning to Predi...

    Much of model-based reinforcement learning invo...

  • Stratified Strate...

    In this paper we introduce Stratified Strategy ...

  • The Variational S...

    Unlike traditional images which do not offer in...

  • Learning to learn...

    The move from hand-designed features to learned...

  • A Mathematical Mo...

    Direct democracy, where each voter casts one vo...