signSGD: Compressed Optimisation for Non-Convex Problems

2020-03-19

Abstract

Training large neural networks requires distributing learning across multiple workers, where the cost of communicating gradients can be a significant bottleneck. signSGD alleviates this problem by transmitting just the sign of each minibatch stochastic gradient. We prove that it can get the best of both worlds: compressed gradients and SGD-level convergence rate. The relative ℓ1/ℓ2 geometry of gradients, noise and curvature informs whether signSGD or SGD is theoretically better suited to a particular problem. On the practical side, we find that the momentum counterpart of signSGD is able to match the accuracy and convergence speed of Adam on deep ImageNet models. We extend our theory to the distributed setting, where the parameter server uses majority vote to aggregate gradient signs from each worker, enabling 1-bit compression of worker-server communication in both directions. Using a theorem by Gauss (1823), we prove that majority vote can achieve the same reduction in variance as full-precision distributed SGD. Thus, there is great promise for sign-based optimisation schemes to achieve fast communication and fast convergence. Code to reproduce experiments is to be found at https://github.com/jxbz/signSGD.
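
To make the two mechanisms in the abstract concrete, the following is a minimal NumPy sketch of a sign-based update and of majority-vote aggregation. It is an illustration under stated assumptions, not the authors' implementation (see the linked repository for that): the function names, the fixed learning rate `lr`, and the toy gradients are all hypothetical, and momentum is left out.

```python
import numpy as np

def sign_sgd_step(params, grad, lr):
    """Single-worker update: step each coordinate by lr against the gradient's sign."""
    return params - lr * np.sign(grad)

def majority_vote(worker_grads):
    """Elementwise majority vote over the signs of the workers' gradients.

    Each worker only needs to send the sign of every coordinate, and the
    server only needs to send back the sign of the summed votes.
    Note: np.sign(0) is 0, so a tied vote leaves that coordinate unchanged
    in this sketch (an assumption, not something the abstract specifies).
    """
    signs = np.sign(np.stack(worker_grads))  # shape: (num_workers, dim)
    return np.sign(signs.sum(axis=0))        # shape: (dim,)

def distributed_sign_sgd_step(params, worker_grads, lr):
    """Distributed update: apply the majority-vote sign with step size lr."""
    return params - lr * majority_vote(worker_grads)

# Hypothetical toy gradients from three workers for a 3-parameter model.
worker_grads = [
    np.array([0.5, -2.0, 0.1]),
    np.array([1.5, -0.1, -0.3]),
    np.array([-0.2, -4.0, 0.7]),
]
params = distributed_sign_sgd_step(np.zeros(3), worker_grads, lr=0.1)
```

In this toy run the gradient magnitudes play no role: one worker disagrees on the first coordinate and another on the third, and both are outvoted, while all three agree on the second. With an odd number of workers and non-zero gradients, a tie cannot occur.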
