
Cornering Stationary and Restless Mixing Bandits with Remix-UCB

2020-02-04

Abstract 

We study the restless bandit problem where arms are associated with stationary φ-mixing processes and where rewards are therefore dependent: the question that arises from this setting is that of carefully recovering some independence by 'ignoring' the values of some rewards. As we shall see, the bandit problem we tackle requires us to address the exploration/exploitation/independence trade-off, which we do through the idea of a waiting arm in Remix-UCB, a new algorithm we introduce that generalizes Improved-UCB to the problem at hand. We provide a regret analysis for this bandit strategy; two noticeable features of Remix-UCB are that i) it reduces to the regular Improved-UCB when the φ-mixing coefficients are all 0, i.e. when the i.i.d. scenario is recovered, and ii) when φ(τ) = O(τ^(−α)), it is able to ensure a controlled regret of order Θ̃(Δ*^((α−2)/α) log^(1/α) T), where Δ* encodes the distance between the best arm and the best suboptimal arm, even in the case when α < 1, i.e. the case when the φ-mixing coefficients are not summable.
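To give a feel for the exploration/exploitation/independence trade-off the abstract describes, here is a minimal, purely illustrative Python sketch: a UCB-style loop whose confidence width is inflated by a hypothetical polynomial mixing term φ(τ) = τ^(−α), where τ is the time since an arm was last pulled. This is NOT the paper's Remix-UCB (which builds on Improved-UCB and an explicit waiting arm); the function names, the AR(1) reward simulation, and the exact width formula are all assumptions made for the sketch.

```python
import math
import random

def phi(tau, alpha=0.5):
    # Hypothetical polynomial mixing coefficients phi(tau) = tau^(-alpha);
    # alpha < 1 corresponds to the non-summable regime mentioned in the abstract.
    return tau ** (-alpha)

def mixing_ucb_sketch(means, horizon, alpha=0.5, seed=0):
    """Toy UCB loop with a dependence-inflated confidence width.

    Rewards are simulated as AR(1) noise around each arm's mean to mimic
    dependence. Samples taken close together are less informative, so the
    confidence width is widened by phi(tau) -- only the intuition, not the
    paper's actual algorithm or bounds.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k           # number of pulls per arm
    sums = [0.0] * k           # cumulative observed reward per arm
    state = [0.0] * k          # AR(1) noise state per arm (toy dependence)
    last_pull = [-10**9] * k   # time of each arm's last pull

    for t in range(1, horizon + 1):
        def ucb(i):
            if counts[i] == 0:
                return float("inf")  # pull every arm at least once
            tau = max(1, t - last_pull[i])  # elapsed time since last pull
            width = math.sqrt(2 * math.log(t) / counts[i]) + phi(tau, alpha)
            return sums[i] / counts[i] + width
        i = max(range(k), key=ucb)
        state[i] = 0.5 * state[i] + rng.gauss(0, 0.1)  # dependent noise
        sums[i] += means[i] + state[i]
        counts[i] += 1
        last_pull[i] = t
    return counts

counts = mixing_ucb_sketch([0.2, 0.5, 0.8], horizon=2000)
```

With a clear gap between the means, the loop concentrates its pulls on the best arm (index 2) despite the dependent noise; the point of the sketch is only that dependence enters through a wider confidence bound, whereas Remix-UCB additionally reasons about when to wait so that elapsed time buys back independence.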
