Gradient Descent Can Take Exponential Time to Escape Saddle Points


Abstract 

Although gradient descent (GD) almost always escapes saddle points asymptotically [Lee et al., 2016], this paper shows that even with fairly natural random initialization schemes and non-pathological functions, GD can be significantly slowed down by saddle points, taking exponential time to escape. On the other hand, gradient descent with perturbations [Ge et al., 2015, Jin et al., 2017] is not slowed down by saddle points—it can find an approximate local minimizer in polynomial time. This result implies that GD is inherently slower than perturbed GD, and justifies the importance of adding perturbations for efficient non-convex optimization. While our focus is theoretical, we also present experiments that illustrate our theoretical findings.
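To illustrate the contrast the abstract describes, below is a minimal sketch (not the paper's construction) comparing plain GD with a simplified perturbed-GD rule on a toy function f(x, y) = x⁴/4 − x²/2 + y²/2, which has a saddle at the origin and minima at (±1, 0). The objective, step size, perturbation radius, and gradient threshold are illustrative assumptions; the perturbation rule only loosely follows the spirit of Jin et al. [2017].

```python
import numpy as np

# Toy non-convex objective with a saddle at the origin and minima at (+-1, 0):
#   f(x, y) = x**4 / 4 - x**2 / 2 + y**2 / 2
def grad(p):
    x, y = p
    return np.array([x**3 - x, y])

def gd(p0, eta=0.1, steps=500):
    """Plain gradient descent."""
    p = np.array(p0, dtype=float)
    for _ in range(steps):
        p -= eta * grad(p)
    return p

def perturbed_gd(p0, eta=0.1, steps=500, g_thresh=1e-3, radius=1e-2, seed=0):
    """Gradient descent with random perturbations: when the gradient is small
    (a possible saddle), add a small uniform perturbation before stepping.
    Constants are illustrative, not the paper's."""
    rng = np.random.default_rng(seed)
    p = np.array(p0, dtype=float)
    for _ in range(steps):
        g = grad(p)
        if np.linalg.norm(g) < g_thresh:
            p += rng.uniform(-radius, radius, size=p.shape)
            g = grad(p)
        p -= eta * g
    return p

# Start on the saddle's stable manifold (x = 0): plain GD converges to the
# saddle (0, 0), while perturbed GD escapes toward a local minimizer (+-1, 0).
print("GD:          ", gd([0.0, 1.0]))
print("Perturbed GD:", perturbed_gd([0.0, 1.0]))
```

Initialized exactly on the x = 0 axis, plain GD never acquires a component in the escape direction and converges to the saddle, whereas the occasional perturbation gives perturbed GD a nonzero x-component that the negative curvature then amplifies.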

