
DISTRIBUTED BANDIT LEARNING: NEAR-OPTIMAL REGRET WITH EFFICIENT COMMUNICATION

2020-01-02

Abstract

We study the problem of regret minimization for distributed bandit learning, in which M agents work collaboratively to minimize their total regret under the coordination of a central server. Our goal is to design communication protocols with near-optimal regret and low communication cost, where cost is measured by the total amount of transmitted data. For distributed multi-armed bandits, we propose a protocol with near-optimal regret and only [formula missing in source] communication cost, where K is the number of arms. The communication cost is independent of the time horizon T, has only logarithmic dependence on the number of arms, and matches the lower bound except for a logarithmic factor. For distributed d-dimensional linear bandits, we propose a protocol that achieves near-optimal regret and has communication cost of order [formula missing in source], which has only logarithmic dependence on T.
