Distributed Bandit Learning: Near-Optimal Regret with Efficient Communication

2020-01-02

Abstract

We study the problem of regret minimization for distributed bandit learning, in which M agents work collaboratively to minimize their total regret under the coordination of a central server. Our goal is to design communication protocols with near-optimal regret and low communication cost, where cost is measured by the total amount of transmitted data. For distributed multi-armed bandits, we propose a protocol with near-optimal regret and only O(M log(MK)) communication cost, where K is the number of arms. The communication cost is independent of the time horizon T, has only logarithmic dependence on the number of arms, and matches the lower bound except for a logarithmic factor. For distributed d-dimensional linear bandits, we propose a protocol that achieves near-optimal regret and has communication cost of order O((Md + d log log d) log T), which has only logarithmic dependence on T.
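To make the setting concrete, the following is a minimal sketch of a naive baseline, not the protocol proposed in the paper. It assumes Bernoulli arms, a standard UCB1 index computed by the server from pooled statistics, and one message per agent per round. The function name `simulate_naive_baseline` and all parameters are hypothetical and chosen only for illustration. The point is that this baseline sends M·T messages, so its communication cost grows linearly with the horizon T, which is the dependence the abstract says can be avoided.

```python
import math
import random


def simulate_naive_baseline(means, num_agents, horizon, seed=0):
    """Naive full-communication baseline (illustrative, not the paper's protocol).

    Each round the server picks one arm with UCB1 on pooled statistics,
    every agent pulls it, and every agent reports its reward to the server.
    Returns (total_pseudo_regret, messages_sent).
    """
    rng = random.Random(seed)
    num_arms = len(means)
    best_mean = max(means)
    counts = [0] * num_arms   # pooled pull counts held by the server
    sums = [0.0] * num_arms   # pooled reward sums held by the server
    regret = 0.0
    messages = 0

    for _ in range(horizon):
        # Server side: play each arm once, then use the UCB1 index.
        unexplored = [a for a in range(num_arms) if counts[a] == 0]
        if unexplored:
            arm = unexplored[0]
        else:
            total = sum(counts)
            arm = max(
                range(num_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(total) / counts[a]),
            )

        # Agent side: every agent pulls the arm and uploads its reward.
        for _ in range(num_agents):
            reward = 1.0 if rng.random() < means[arm] else 0.0
            counts[arm] += 1
            sums[arm] += reward
            regret += best_mean - means[arm]
            messages += 1  # one upload per agent per round

    return regret, messages


if __name__ == "__main__":
    regret, messages = simulate_naive_baseline([0.9, 0.5], num_agents=4, horizon=500)
    print(f"total pseudo-regret: {regret:.1f}, messages sent: {messages}")
```

In this sketch the message count is exactly `num_agents * horizon` regardless of how the rewards turn out, so doubling T doubles the communication. A protocol of the kind described in the abstract would have to synchronize far less often while keeping the regret comparable.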
