Momentum-Based Variance Reduction in Non-Convex SGD

2020-02-20

Abstract

Variance reduction has emerged in recent years as a strong competitor to stochastic gradient descent in non-convex problems, providing the first algorithms to improve upon the convergence rate of stochastic gradient descent for finding first-order critical points. However, variance reduction techniques typically require carefully tuned learning rates and a willingness to use excessively large "mega-batches" in order to achieve their improved results. We present a new algorithm, STORM, that does not require any batches and makes use of adaptive learning rates, enabling simpler implementation and less hyperparameter tuning. Our technique for removing the batches uses a variant of momentum to achieve variance reduction in non-convex optimization. On smooth losses F, STORM finds a point x with E[‖∇F(x)‖] ≤ O(1/√T + σ^{1/3}/T^{1/3}) in T iterations with σ² variance in the gradients, matching the best-known rate but without requiring knowledge of σ.
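To illustrate the idea described in the abstract, below is a minimal Python sketch of a STORM-style update loop on a toy noisy quadratic. The objective, the constants k, w, c, and the noise model are illustrative assumptions, not the paper's exact settings; the key elements shown are the momentum-based variance-reduced estimator (which reuses the same sample at the current and previous iterates) and a step size that adapts to observed gradient magnitudes, so no mega-batches or knowledge of σ are needed.

```python
import numpy as np

def stochastic_grad(x, xi):
    """Noisy gradient of a toy quadratic F(x) = 0.5 * ||x||^2 (hypothetical objective)."""
    return x + xi

def storm_sketch(dim=10, T=2000, k=0.1, w=1.0, c=10.0, sigma=0.5, seed=0):
    """STORM-style loop under assumed constants k, w, c (a sketch, not the paper's tuning).

    The estimator d_t = grad(x_t, xi_t) + (1 - a_t) * (d_{t-1} - grad(x_{t-1}, xi_t))
    reuses the SAME sample xi_t at both iterates; this correction term is the
    momentum variant that provides variance reduction without mega-batches.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=dim)
    xi = sigma * rng.normal(size=dim)
    g = stochastic_grad(x, xi)
    d = g.copy()                        # variance-reduced gradient estimate
    G_sum = float(np.dot(g, g))         # running sum of squared gradient norms
    for _ in range(T):
        eta = k / (w + G_sum) ** (1.0 / 3.0)   # adaptive learning rate
        x_prev, x = x, x - eta * d
        a = min(1.0, c * eta ** 2)             # momentum / correction weight
        xi = sigma * rng.normal(size=dim)      # one fresh sample per step
        g_new = stochastic_grad(x, xi)
        g_old = stochastic_grad(x_prev, xi)    # same sample at the previous iterate
        d = g_new + (1.0 - a) * (d - g_old)
        G_sum += float(np.dot(g_new, g_new))
    return x

if __name__ == "__main__":
    x_final = storm_sketch()
    print("final iterate norm:", np.linalg.norm(x_final))
```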
