
How SGD Selects the Global Minima in Over-parameterized Learning: A Dynamical Stability Perspective

2020-02-13

Abstract

The question of which global minima are accessible to a stochastic gradient descent (SGD) algorithm with a specific learning rate and batch size is studied from the perspective of dynamical stability. The concept of non-uniformity is introduced, which, together with sharpness, characterizes the stability property of a global minimum and hence whether a particular SGD algorithm can converge to it. In particular, this analysis shows that learning rate and batch size play different roles in minima selection. Extensive empirical results correlate well with the theoretical findings and provide further support for these claims.
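For intuition, here is a minimal 1D sketch of a mean-square stability check in the spirit of the abstract; it is an illustrative assumption, not the paper's construction. Per-sample quadratic losses l_i(x) = a_i * x^2 / 2 share the global minimum x = 0; the mean of the a_i plays the role of sharpness and their spread stands in for non-uniformity. The choice of n, the values a_i, and the without-replacement variance correction are all assumptions made for this toy example.

```python
import numpy as np

# Toy 1D setting (an assumption, not the paper's experiments): per-sample
# losses l_i(x) = a_i * x**2 / 2 all share the global minimum x = 0.
# Sharpness of the full loss is mean(a_i); the spread of the a_i plays the
# role of non-uniformity.
n = 200
a = np.linspace(0.2, 3.8, n)          # per-sample curvatures
sharpness = a.mean()
non_uniformity = a.std()

def ms_factor(lr, batch_size):
    """Per-step amplification of E[x^2] for SGD x <- x - lr * mean(a_batch) * x,
    with the mini-batch drawn without replacement; the minimum is stable in the
    mean-square sense when this factor is at most 1."""
    var_batch_mean = (non_uniformity**2 / batch_size) * (n - batch_size) / (n - 1)
    return (1.0 - lr * sharpness) ** 2 + lr**2 * var_batch_mean

print(f"sharpness={sharpness:.2f}, non-uniformity={non_uniformity:.2f}")
for lr in (0.3, 0.9, 1.05):
    for B in (1, 16, n):
        f = ms_factor(lr, B)
        verdict = "stable" if f <= 1.0 else "unstable"
        print(f"lr={lr:.2f}  batch={B:3d}  factor={f:.3f}  {verdict}")
```

In this toy setting the output shows the two effects separately: at a moderate learning rate only the smallest batch destabilizes the minimum (the non-uniformity term dominates), while once the learning rate exceeds roughly 2/sharpness no batch size restores stability, consistent with the claim that learning rate and batch size play different roles in minima selection.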

