
Communication trade-offs for Local-SGD with large step size


Abstract

Synchronous mini-batch SGD is state-of-the-art for large-scale distributed machine learning. However, in practice, its convergence is bottlenecked by slow communication rounds between worker nodes. A natural solution to reduce communication is the "local-SGD" model, in which the workers train their models independently and synchronize only every once in a while. This algorithm improves the computation-communication trade-off, but its convergence is not well understood. We propose a non-asymptotic error analysis, which enables comparison to one-shot averaging, i.e., a single communication round among independent workers, and mini-batch averaging, i.e., communicating at every step. We also provide adaptive lower bounds on the communication frequency for large step-sizes (η_t ∝ 1/√t) and show that local-SGD reduces communication by a factor of O(√T / P^{3/2}), with T the total number of gradients and P the number of machines.
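To make the three regimes compared above concrete, here is a minimal, self-contained sketch (not code from the paper): the toy least-squares objective, the 1/√t step-size schedule, and the synchronization period of 50 steps are all illustrative assumptions. Setting the period to 1 recovers mini-batch averaging, and setting it to the full horizon recovers one-shot averaging.

```python
# Toy comparison of mini-batch averaging, local-SGD, and one-shot averaging.
# Illustrative sketch only; problem, constants, and sync period are assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, P, T = 10, 8, 2000          # dimension, number of workers, local steps per worker

w_star = rng.normal(size=d)    # ground-truth model for a synthetic least-squares problem

def stochastic_grad(w):
    """Noisy gradient of the least-squares loss 0.5*(x@w - y)^2 at w, from one fresh sample."""
    x = rng.normal(size=d)
    y = x @ w_star + rng.normal()          # noisy label
    return (x @ w - y) * x

def run(sync_every):
    """Local-SGD with P workers that average their models every `sync_every` steps."""
    workers = np.zeros((P, d))
    for t in range(1, T + 1):
        eta = 1.0 / np.sqrt(t)             # large, slowly decaying step size (assumption)
        for p in range(P):
            workers[p] -= eta * stochastic_grad(workers[p])
        if t % sync_every == 0:            # communication round: replace all models by their average
            workers[:] = workers.mean(axis=0)
    return workers.mean(axis=0)

for name, period in [("mini-batch averaging (sync every step)", 1),
                     ("local-SGD (sync every 50 steps)", 50),
                     ("one-shot averaging (sync once at the end)", T)]:
    w_hat = run(period)
    print(f"{name:45s} error = {np.linalg.norm(w_hat - w_star):.4f}")
```

The only moving part is `sync_every`: it controls how many local gradient steps each worker takes between averaging rounds, which is exactly the communication frequency that the paper's adaptive lower bounds constrain.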

