Nonparametric Contextual Bandits in an Unknown Metric Space

2020-02-19

Abstract

Consider a nonparametric contextual multi-arm bandit problem where each arm is associated with a nonparametric reward function mapping from contexts to the expected reward. Suppose that there is a large set of arms, yet there is a simple but unknown structure amongst the arm reward functions, e.g. finite types or smoothness with respect to an unknown metric space. We present a novel algorithm that learns data-driven similarities amongst the arms in order to implement adaptive partitioning of the context-arm space for more efficient learning. We provide regret bounds along with simulations that highlight the algorithm's dependence on the local geometry of the reward functions.
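To make the idea of "data-driven similarities amongst the arms" concrete, here is a minimal sketch in Python. It is not the paper's algorithm. It only illustrates one plausible way to estimate how similar two arms are from observed rewards, under assumptions that are not in the abstract: contexts lie in [0, 1], arms are sampled uniformly at random, and the arms fall into two latent types. The names `estimate_arm_distances` and `simulate`, the bin count, and the reward functions are all hypothetical.

```python
import random


def estimate_arm_distances(samples, n_arms, n_bins):
    """Estimate pairwise arm distances from (arm, context, reward) samples.

    Assumption: contexts lie in [0, 1]. The distance between two arms is the
    largest gap between their per-bin empirical mean rewards, taken over the
    context bins where both arms have at least one sample.
    """
    sums = [[0.0] * n_bins for _ in range(n_arms)]
    counts = [[0] * n_bins for _ in range(n_arms)]
    for arm, x, r in samples:
        b = min(int(x * n_bins), n_bins - 1)  # clamp x == 1.0 into the last bin
        sums[arm][b] += r
        counts[arm][b] += 1

    dist = [[0.0] * n_arms for _ in range(n_arms)]
    for i in range(n_arms):
        for j in range(i + 1, n_arms):
            gaps = [
                abs(sums[i][b] / counts[i][b] - sums[j][b] / counts[j][b])
                for b in range(n_bins)
                if counts[i][b] > 0 and counts[j][b] > 0
            ]
            # No shared bins means the data says nothing about this pair.
            dist[i][j] = dist[j][i] = max(gaps) if gaps else float("inf")
    return dist


def simulate(n_samples=4000, seed=0):
    """Hypothetical setup: 4 arms of 2 latent types, noisy rewards."""
    rng = random.Random(seed)
    arm_type = [0, 0, 1, 1]  # arms 0,1 share a type; arms 2,3 share a type
    reward_fns = [lambda x: x, lambda x: 1.0 - x]  # illustrative choices
    samples = []
    for _ in range(n_samples):
        arm = rng.randrange(len(arm_type))  # uniform exploration over arms
        x = rng.random()                    # context drawn uniformly from [0, 1)
        r = reward_fns[arm_type[arm]](x) + rng.gauss(0.0, 0.1)
        samples.append((arm, x, r))
    return estimate_arm_distances(samples, n_arms=len(arm_type), n_bins=5)


if __name__ == "__main__":
    for row in simulate():
        print(["%.2f" % d for d in row])
```

In this toy setup, arms of the same latent type should end up with a small estimated distance and arms of different types with a large one. A learner could use such estimates to pool observations across similar arms. How the paper actually builds and uses its similarity estimates is not specified in the abstract.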
