
signSGD: Compressed Optimisation for Non-Convex Problems

2020-03-19

Abstract

Training large neural networks requires distributing learning across multiple workers, where the cost of communicating gradients can be a significant bottleneck. signSGD alleviates this problem by transmitting just the sign of each minibatch stochastic gradient. We prove that it can get the best of both worlds: compressed gradients and SGD-level convergence rate. The relative ℓ1/ℓ2 geometry of gradients, noise and curvature informs whether signSGD or SGD is theoretically better suited to a particular problem. On the practical side we find that the momentum counterpart of signSGD is able to match the accuracy and convergence speed of Adam on deep Imagenet models. We extend our theory to the distributed setting, where the parameter server uses majority vote to aggregate gradient signs from each worker, enabling 1-bit compression of worker-server communication in both directions. Using a theorem by Gauss (1823) we prove that majority vote can achieve the same reduction in variance as full precision distributed SGD. Thus, there is great promise for sign-based optimisation schemes to achieve fast communication and fast convergence. Code to reproduce experiments is to be found at https://github.com/jxbz/signSGD.
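Since the abstract describes the two update rules directly (each worker uses only the sign of its stochastic gradient, and the parameter server aggregates the signs by majority vote), a minimal NumPy sketch of the idea is given below. The function names (sign_sgd_step, signum_step, majority_vote_step), the hyperparameter values, and the toy objective are illustrative assumptions, not the authors' API; the reference implementation is in the linked repository.

```python
import numpy as np

def sign_sgd_step(params, grad, lr=1e-3):
    """signSGD: update each coordinate by the sign of its stochastic gradient,
    so only one bit per coordinate needs to be communicated."""
    return params - lr * np.sign(grad)

def signum_step(params, grad, momentum, lr=1e-3, beta=0.9):
    """Momentum counterpart of signSGD: take the sign of an exponential
    moving average of gradients instead of the raw minibatch gradient."""
    momentum = beta * momentum + (1 - beta) * grad
    return params - lr * np.sign(momentum), momentum

def majority_vote_step(params, worker_grads, lr=1e-3):
    """Distributed signSGD with majority vote: each worker sends sign(grad)
    to the server (1-bit uplink); the server broadcasts the sign of the
    summed votes (1-bit downlink), i.e. the element-wise majority."""
    votes = sum(np.sign(g) for g in worker_grads)   # tally of +/-1 votes per coordinate
    return params - lr * np.sign(votes)             # majority decides the update direction

# Toy usage: minimise f(x) = 0.5 * ||x||^2 with three simulated noisy workers.
rng = np.random.default_rng(0)
x = rng.normal(size=5)
for _ in range(200):
    worker_grads = [x + 0.1 * rng.normal(size=5) for _ in range(3)]
    x = majority_vote_step(x, worker_grads, lr=0.05)
print(np.linalg.norm(x))  # ends up near zero, oscillating within a few multiples of lr
```

Because the update magnitude is fixed at lr per coordinate, the iterates do not converge exactly but settle into a small neighbourhood of the optimum whose size scales with the learning rate, which is why decaying lr (or the momentum variant) is used in practice.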

