
Communication trade-offs for Local-SGD with large step size


Abstract

Synchronous mini-batch SGD is state-of-the-art for large-scale distributed machine learning. However, in practice, its convergence is bottlenecked by slow communication rounds between worker nodes. A natural solution to reduce communication is the "local-SGD" model, in which the workers train their models independently and synchronize only every once in a while. This algorithm improves the computation-communication trade-off, but its convergence is not well understood. We propose a non-asymptotic error analysis, which enables comparison to one-shot averaging, i.e., a single communication round among independent workers, and mini-batch averaging, i.e., communicating at every step. We also provide adaptive lower bounds on the communication frequency for large step-sizes (η_t ∝ 1/√t) and show that local-SGD reduces communication by a factor of O(√T / P^{3/2}), with T the total number of gradients and P the number of machines.
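To make the three regimes compared above concrete, here is a minimal, self-contained sketch (not code from the paper): the toy least-squares objective, the 1/√t step-size schedule, and the synchronization period of 50 steps are all illustrative assumptions. Setting the period to 1 recovers mini-batch averaging, and setting it to the full horizon recovers one-shot averaging.

```python
# Toy comparison of mini-batch averaging, local-SGD, and one-shot averaging.
# Illustrative sketch only; problem, constants, and sync period are assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, P, T = 10, 8, 2000          # dimension, number of workers, local steps per worker

w_star = rng.normal(size=d)    # ground-truth model for a synthetic least-squares problem

def stochastic_grad(w):
    """Noisy gradient of the least-squares loss 0.5*(x@w - y)^2 at w, from one fresh sample."""
    x = rng.normal(size=d)
    y = x @ w_star + rng.normal()          # noisy label
    return (x @ w - y) * x

def run(sync_every):
    """Local-SGD with P workers that average their models every `sync_every` steps."""
    workers = np.zeros((P, d))
    for t in range(1, T + 1):
        eta = 1.0 / np.sqrt(t)             # large, slowly decaying step size (assumption)
        for p in range(P):
            workers[p] -= eta * stochastic_grad(workers[p])
        if t % sync_every == 0:            # communication round: replace all models by their average
            workers[:] = workers.mean(axis=0)
    return workers.mean(axis=0)

for name, period in [("mini-batch averaging (sync every step)", 1),
                     ("local-SGD (sync every 50 steps)", 50),
                     ("one-shot averaging (sync once at the end)", T)]:
    w_hat = run(period)
    print(f"{name:45s} error = {np.linalg.norm(w_hat - w_star):.4f}")
```

The only moving part is `sync_every`: it controls how many local gradient steps each worker takes between averaging rounds, which is exactly the communication frequency that the paper's adaptive lower bounds constrain.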

