
How To Make the Gradients Small Stochastically: Even Faster Convex and Nonconvex SGD

2020-02-14

Abstract

Stochastic gradient descent (SGD) gives an optimal convergence rate when minimizing convex stochastic objectives $f(x)$. However, in terms of making the gradients small, the original SGD does not give an optimal rate, even when $f(x)$ is convex. If $f(x)$ is convex, to find a point with gradient norm $\|\nabla f(x)\| \le \varepsilon$, we design an algorithm SGD3 with a near-optimal rate $\tilde{O}(\varepsilon^{-2})$, improving the best known rate $O(\varepsilon^{-8/3})$. If $f(x)$ is nonconvex, to find its $\varepsilon$-approximate local minimum, we design an algorithm SGD5 with rate $\tilde{O}(\varepsilon^{-3.5})$, where previously SGD variants only achieve $\tilde{O}(\varepsilon^{-4})$ [6, 14, 30]. This is no slower than the best known stochastic version of Newton's method in all parameter regimes [27].
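
The abstract measures success by the norm of the true gradient $\|\nabla f(x)\|$ at the point the algorithm returns, rather than by objective value. As context for that criterion, the sketch below runs plain SGD (not the paper's SGD3 or SGD5) on a synthetic convex quadratic with noisy gradient estimates and prints how the true gradient norm shrinks over iterations. The objective, noise model, step-size schedule, and iteration budget are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch: vanilla SGD on a convex quadratic with noisy gradients,
# tracking the true gradient norm ||grad f(x)||. Not the paper's SGD3/SGD5;
# all problem parameters below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d = 20
A = rng.standard_normal((d, d))
H = A.T @ A / d + 0.1 * np.eye(d)   # positive-definite Hessian, so f is convex
b = rng.standard_normal(d)

def true_grad(x):
    """Exact gradient of f(x) = 0.5 x^T H x - b^T x."""
    return H @ x - b

def stoch_grad(x, sigma=1.0):
    """Unbiased stochastic gradient: exact gradient plus Gaussian noise."""
    return true_grad(x) + sigma * rng.standard_normal(d)

x = np.zeros(d)
T = 20000
eta = 0.05
for t in range(1, T + 1):
    x -= (eta / np.sqrt(t)) * stoch_grad(x)   # decaying step size eta / sqrt(t)
    if t % 5000 == 0:
        print(f"t={t:6d}  ||grad f(x)|| = {np.linalg.norm(true_grad(x)):.4f}")
```

Even in this benign setting, the gradient norm of plain SGD decays slowly in the noisy regime; the paper's contribution is algorithms whose rates for driving $\|\nabla f(x)\|$ below $\varepsilon$ are provably faster than such baselines.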
