At Stability's Edge: How to Adjust Hyperparameters to Preserve Minima Selection in Asynchronous Training of Neural Networks

2019-12-30

Abstract

Background: Recent developments have made it possible to significantly accelerate neural network training using large batch sizes and data parallelism. Asynchronous training, in which gradients are applied with a delay, can make training even more scalable. However, asynchronous training has its pitfalls, chiefly a degradation in generalization that persists even after the algorithm has converged. This gap remains poorly understood, because theoretical analysis has so far focused mainly on the convergence rate of asynchronous methods.

Contributions: We examine asynchronous training from the perspective of dynamical stability. We find that the degree of delay interacts with the learning rate to change the set of minima accessible to an asynchronous stochastic gradient descent algorithm. We derive closed-form rules for how the learning rate can be changed while keeping the accessible set the same. Specifically, for high delay values, we find that the learning rate should be kept inversely proportional to the delay. We then extend this analysis to include momentum, and find that momentum should be either turned off or modified to improve training stability. We provide empirical experiments to validate our theoretical findings.
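The interaction between delay and learning rate can be illustrated with a toy sketch. The code below is not the authors' method or code: it assumes a one-dimensional quadratic loss f(x) = 0.5 * curvature * x**2, a fixed gradient delay, and no noise or momentum. The names `delayed_gd` and `lr_for_delay` are hypothetical. The abstract does not state the paper's closed-form rules, so the sketch only encodes the high-delay rule of thumb it does mention, namely keeping the learning rate inversely proportional to the delay.

```python
from collections import deque


def delayed_gd(lr, delay, steps, curvature=1.0, x0=1.0):
    """Run gradient descent on f(x) = 0.5 * curvature * x**2 with stale gradients.

    Each update uses the gradient evaluated at the iterate from `delay` steps
    ago: x[t+1] = x[t] - lr * curvature * x[t - delay].
    Returns the list of iterates (a toy model, assumed for illustration).
    """
    # The deque keeps the last delay+1 iterates; index 0 is the stale one.
    history = deque([x0] * (delay + 1), maxlen=delay + 1)
    trajectory = [x0]
    for _ in range(steps):
        stale_x = history[0]
        x_new = history[-1] - lr * curvature * stale_x
        history.append(x_new)  # drops the oldest iterate automatically
        trajectory.append(x_new)
    return trajectory


def lr_for_delay(reference_lr, reference_delay, delay):
    """Scale the learning rate so that lr * delay stays constant.

    Assumption: both delays are large enough for the inverse-proportional
    rule of thumb to apply; this is not the paper's exact closed-form rule.
    """
    return reference_lr * reference_delay / delay
```

In this toy setting, a learning rate of 0.2 that converges at a delay of 5 produces growing oscillations when the delay is doubled to 10, whereas `lr_for_delay(0.2, 5, 10)` halves the learning rate and the iterates shrink toward the minimum again.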


