Resource Paper: THE BREAK-EVEN POINT ON THE OPTIMIZATION TRAJECTORIES OF DEEP NEURAL NETWORKS

2019-12-30

Abstract

Understanding the optimization trajectory is critical for understanding the training of deep neural networks. We show how the hyperparameters of stochastic gradient descent (SGD) influence the covariance of the gradients (K) and the Hessian of the training loss (H) along the trajectory. Based on a theoretical model, we conjecture that using a high learning rate or a small batch size leads SGD to regions of the loss landscape, typically early during training, characterized by (1) reduced spectral norm of K, and (2) improved conditioning of K and H. We refer to the point on the training trajectory after which these effects hold as the break-even point. We demonstrate these effects empirically for a range of deep neural networks applied to different tasks. Finally, we apply our analysis to networks with batch normalization (BN) layers and find that a higher learning rate is necessary to improve loss surface conditioning relative to a network without BN layers.
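
Below is a minimal sketch, not the paper's code, of how one might track the two quantities the abstract mentions: the spectral norm and conditioning of the gradient covariance K along an SGD trajectory. The toy PyTorch model, the synthetic data, the learning rate, and the sampling sizes are illustrative assumptions.

```python
# Minimal sketch (assumed setup, not from the paper): estimate the covariance K
# of mini-batch gradients during SGD training and report its spectral norm and
# a crude condition number at a few points along the trajectory.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
loss_fn = nn.MSELoss()
X, y = torch.randn(512, 10), torch.randn(512, 1)  # synthetic regression data

def flat_grad(xb, yb):
    """Flattened gradient of the mini-batch loss w.r.t. all parameters."""
    model.zero_grad()
    loss_fn(model(xb), yb).backward()
    return torch.cat([p.grad.reshape(-1) for p in model.parameters()])

def covariance_stats(batch_size=32, n_batches=64):
    """Spectral norm and condition number (over nonzero eigenvalues) of the
    empirical covariance K of sampled mini-batch gradients."""
    grads = []
    for _ in range(n_batches):
        idx = torch.randint(0, X.shape[0], (batch_size,))
        grads.append(flat_grad(X[idx], y[idx]))
    G = torch.stack(grads)                      # (n_batches, n_params)
    G = G - G.mean(dim=0, keepdim=True)         # center the gradients
    # Nonzero eigenvalues of K = G^T G / n equal those of the smaller G G^T / n.
    eig = torch.linalg.eigvalsh(G @ G.T) / n_batches
    eig = eig[eig > 1e-12]
    return eig.max().item(), (eig.max() / eig.min()).item()

opt = torch.optim.SGD(model.parameters(), lr=0.1)
for step in range(101):
    idx = torch.randint(0, X.shape[0], (32,))
    opt.zero_grad()
    loss_fn(model(X[idx]), y[idx]).backward()
    opt.step()
    if step % 25 == 0:
        lam_max, cond = covariance_stats()
        print(f"step {step:3d}  spectral_norm(K)={lam_max:.4f}  cond(K)~{cond:.1f}")
```

Rerunning the same sketch with a larger learning rate or a smaller batch size would let one compare how these statistics evolve after the early phase of training, mirroring the break-even-point conjecture stated in the abstract.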

