Committing Bandits

2020-01-08

Abstract

We consider a multi-armed bandit problem with two phases. The first is an experimentation phase, in which the decision maker is free to explore multiple options. In the second phase, the decision maker has to commit to one of the arms and stick with it. Cost is incurred during both phases, with a higher cost during the experimentation phase. We analyze the regret in this setup, propose algorithms, and provide upper and lower bounds that depend on the ratio of the duration of the experimentation phase to the duration of the commitment phase. Our analysis reveals that, if given the choice, it is optimal to experiment for Θ(ln T) steps and then commit, where T is the time horizon.
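The two-phase setup can be illustrated with a small simulation. The following is only a minimal sketch of a generic explore-then-commit strategy, not the algorithms proposed in the paper: it assumes Bernoulli arms, round-robin sampling during experimentation, and a hypothetical `explore_cost` multiplier standing in for the higher cost of the experimentation phase. The function name, parameters, and regret accounting (expected regret against the best arm) are all assumptions made for illustration.

```python
import math
import random


def explore_then_commit(means, horizon, explore_steps, explore_cost=1.0, seed=0):
    """Sample arms round-robin for explore_steps, then commit to one arm.

    means         -- true Bernoulli means of the arms (unknown to the learner)
    horizon       -- total number of steps T
    explore_steps -- length of the experimentation phase
    explore_cost  -- hypothetical multiplier on regret while experimenting
    Returns (committed_arm, expected_regret).
    """
    rng = random.Random(seed)
    n_arms = len(means)
    best_mean = max(means)
    pulls = [0] * n_arms
    wins = [0] * n_arms
    regret = 0.0

    # Phase 1: experimentation, cycling through the arms.
    for t in range(explore_steps):
        arm = t % n_arms
        reward = 1 if rng.random() < means[arm] else 0
        pulls[arm] += 1
        wins[arm] += reward
        regret += explore_cost * (best_mean - means[arm])

    # Commit to the arm with the highest empirical mean
    # (arms never pulled are scored 0; ties go to the lowest index).
    estimates = [wins[a] / pulls[a] if pulls[a] else 0.0 for a in range(n_arms)]
    committed = max(range(n_arms), key=lambda a: estimates[a])

    # Phase 2: commitment, paying the gap of the chosen arm on every step.
    regret += (horizon - explore_steps) * (best_mean - means[committed])
    return committed, regret


if __name__ == "__main__":
    T = 10_000
    # Experimentation length proportional to ln T; the constant 20 is arbitrary.
    steps = math.ceil(20 * math.log(T))
    print(explore_then_commit([0.4, 0.6], T, steps, explore_cost=2.0))
```

Varying `explore_steps` for a fixed `horizon` shows the trade-off the abstract describes: too little experimentation risks committing to a worse arm for the rest of the horizon, while too much accumulates the higher experimentation cost.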
