
A Linear Speedup Analysis of Distributed Deep Learning with Sparse and Quantized Communication

2020-02-14

Abstract 

The large communication overhead has imposed a bottleneck on the performance of distributed Stochastic Gradient Descent (SGD) for training deep neural networks. Previous works have demonstrated the potential of using gradient sparsification and quantization to reduce the communication cost. However, there is still a lack of understanding about how sparse and quantized communication affects the convergence rate of the training algorithm. In this paper, we study the convergence rate of distributed SGD for non-convex optimization under two communication-reducing strategies: sparse parameter averaging and gradient quantization. We show that an O(1/√(MK)) convergence rate can be achieved if the sparsification and quantization hyperparameters are configured properly, where M is the number of workers and K is the number of iterations. We also propose a strategy called periodic quantized averaging SGD (PQASGD) that further reduces the communication cost while preserving the O(1/√(MK)) convergence rate. Our evaluation validates our theoretical results and shows that PQASGD can converge as fast as full-communication SGD with only 3%-5% of the communication data size.
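To make the compression ideas in the abstract concrete, here is a minimal NumPy sketch of a PQASGD-style worker loop that runs local SGD steps and periodically exchanges a quantized, sparsified model difference. This is an illustration under assumptions, not the paper's exact algorithm: the functions `quantize`, `sparsify`, `stochastic_grad`, `allreduce_mean`, and all hyperparameter values below are hypothetical names introduced for this example.

```python
# Illustrative sketch (assumed names, not the paper's exact algorithm):
# a single worker performing periodic quantized averaging of its model.
import numpy as np

def quantize(v, levels=256):
    """Unbiased stochastic uniform quantization onto `levels` buckets."""
    scale = np.max(np.abs(v)) + 1e-12
    normalized = np.abs(v) / scale * (levels - 1)
    lower = np.floor(normalized)
    # Round up with probability equal to the fractional part (keeps E[q(v)] = v).
    rounded = lower + (np.random.rand(*v.shape) < (normalized - lower))
    return np.sign(v) * rounded / (levels - 1) * scale

def sparsify(v, keep_ratio=0.05):
    """Keep a random fraction of coordinates, rescaled so the estimate stays unbiased."""
    mask = np.random.rand(*v.shape) < keep_ratio
    return v * mask / keep_ratio

def pqasgd_worker(x0, stochastic_grad, allreduce_mean,
                  lr=0.01, num_steps=1000, avg_period=8):
    """Run local SGD; every `avg_period` steps, average a compressed model difference."""
    x = x0.copy()
    for k in range(num_steps):
        x -= lr * stochastic_grad(x)            # local SGD step on this worker
        if (k + 1) % avg_period == 0:
            delta = sparsify(quantize(x - x0))  # compress before communication
            x = x0 + allreduce_mean(delta)      # average compressed updates across workers
            x0 = x.copy()                       # new reference point for the next round
    return x
```

Both compression operators in the sketch are unbiased, so the expected communicated update equals the uncompressed one; this kind of property is what convergence analyses of compressed SGD typically rely on. For a single-process sanity check, `allreduce_mean` can simply return its argument unchanged.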

