Asynchronous Decentralized Parallel Stochastic Gradient Descent

2020-03-19

Abstract

Most commonly used distributed machine learning systems are either synchronous or centralized asynchronous. Synchronous algorithms like AllReduce-SGD perform poorly in a heterogeneous environment, while asynchronous algorithms using a parameter server suffer from 1) a communication bottleneck at the parameter servers when there are many workers, and 2) significantly worse convergence when traffic to the parameter server is congested. Can we design an algorithm that is robust in a heterogeneous environment, while being communication efficient and maintaining the best-possible convergence rate? In this paper, we propose an asynchronous decentralized parallel stochastic gradient descent algorithm (AD-PSGD) satisfying all of the above expectations. Our theoretical analysis shows that AD-PSGD converges at the same optimal O(1/√K) rate as SGD and has linear speedup w.r.t. the number of workers. Empirically, AD-PSGD outperforms the best of decentralized parallel SGD (D-PSGD), asynchronous parallel SGD (A-PSGD), and standard data parallel SGD (AllReduce-SGD), often by orders of magnitude in a heterogeneous environment. When training ResNet-50 on ImageNet with up to 128 GPUs, AD-PSGD converges (w.r.t. epochs) similarly to AllReduce-SGD, but each epoch can be up to 4-8× faster than its synchronous counterparts in a network-sharing HPC environment.
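The abstract does not spell out the update rule, so the following is only a rough, single-process sketch of the general idea it describes: workers keep local models and exchange them with peers instead of going through a parameter server. Everything specific here is an assumption made for illustration, not taken from the paper: the function name `ad_psgd_sketch`, the ring topology, the toy quadratic objective, the noise level, and the default parameters. Asynchrony is imitated by letting one randomly chosen worker act per step, and its gradient is evaluated on its model before averaging, which mimics a slightly stale gradient.

```python
import numpy as np

def ad_psgd_sketch(num_workers=8, dim=5, steps=4000, lr=0.05, seed=0):
    """Toy simulation of gossip-style decentralized SGD (illustrative only)."""
    rng = np.random.default_rng(seed)
    target = rng.normal(size=dim)                 # minimizer of the toy objective
    models = rng.normal(size=(num_workers, dim))  # one local model per worker

    for _ in range(steps):
        # Assumption: whichever worker finishes next is picked uniformly at random.
        i = rng.integers(num_workers)

        # Noisy gradient of 0.5 * ||x - target||^2 at worker i's current model.
        grad = (models[i] - target) + 0.1 * rng.normal(size=dim)

        # Gossip step: average with one random neighbor on a ring (hypothetical topology).
        j = (i + rng.choice([-1, 1])) % num_workers
        avg = 0.5 * (models[i] + models[j])
        models[i] = avg
        models[j] = avg

        # Apply the gradient computed before averaging to worker i's model.
        models[i] = models[i] - lr * grad

    return models, target

models, target = ad_psgd_sketch()
print("distance of averaged model from optimum:",
      np.linalg.norm(models.mean(axis=0) - target))
```

No worker waits on a global barrier or a central server in this sketch; each step touches only two local models, which is the property the abstract contrasts with AllReduce-SGD and parameter-server designs.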