INFORMATION GEOMETRY OF ORTHOGONAL INITIALIZATIONS AND TRAINING

2020-01-02

Abstract

Recently, mean field theory has been successfully used to analyze properties of wide, random neural networks. It gave rise to a prescriptive theory for initializing feed-forward neural networks with orthogonal weights, which ensures that both the forward-propagated activations and the backpropagated gradients are near ℓ2 isometries and, as a consequence, training is orders of magnitude faster. Despite strong empirical performance, the mechanisms by which critical initializations confer an advantage in the optimization of deep neural networks are poorly understood. Here we show a novel connection between the maximum curvature of the optimization landscape (gradient smoothness), as measured by the Fisher information matrix (FIM), and the spectral radius of the input-output Jacobian, which partially explains why more isometric networks can train much faster. Furthermore, given that orthogonal weights are necessary to ensure that gradient norms are approximately preserved at initialization, we experimentally investigate the benefits of maintaining orthogonality throughout training, and we conclude that manifold optimization of weights performs well regardless of the smoothness of the gradients. Moreover, we observe a surprising yet robust behavior of highly isometric initializations: even though such networks have a lower FIM condition number at initialization, and therefore by analogy to convex functions should be easier to optimize, experimentally they prove to be much harder to train with stochastic gradient descent. We conjecture that the stability of a model linearized around its initial parameters has implications beyond the lazy training regime: the stability of the linearization affects the degree to which the initial curvature is predictive of the local curvature during training.
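The isometry claim in the abstract can be illustrated with a minimal sketch. It assumes a deep linear network, for which the input-output Jacobian is simply the product of the weight matrices. This is a simplification and not the paper's experimental setup, which concerns nonlinear networks and the FIM. The helper names (`orthogonal`, `gaussian`, `jacobian_singular_values`) and the width and depth values are hypothetical choices made for illustration.

```python
import numpy as np

def orthogonal(n, rng):
    # QR decomposition of a Gaussian matrix; flipping column signs by the
    # sign of diag(R) gives a uniformly (Haar) distributed orthogonal matrix.
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))

def gaussian(n, rng):
    # i.i.d. Gaussian weights with variance 1/n (a common non-orthogonal baseline).
    return rng.standard_normal((n, n)) / np.sqrt(n)

def jacobian_singular_values(weights):
    # Assumption: deep *linear* network, so the input-output Jacobian
    # is the product of the layer weight matrices.
    jac = np.eye(weights[0].shape[0])
    for w in weights:
        jac = w @ jac
    return np.linalg.svd(jac, compute_uv=False)

rng = np.random.default_rng(0)
width, depth = 64, 20  # illustrative values only

sv_orth = jacobian_singular_values([orthogonal(width, rng) for _ in range(depth)])
sv_gauss = jacobian_singular_values([gaussian(width, rng) for _ in range(depth)])

print("orthogonal: min %.3f, max %.3f" % (sv_orth.min(), sv_orth.max()))
print("gaussian:   min %.3e, max %.3e" % (sv_gauss.min(), sv_gauss.max()))
```

In this linear setting a product of orthogonal matrices is itself orthogonal, so every singular value of the Jacobian equals 1 up to floating-point error, while the Gaussian baseline produces a widely spread spectrum. With nonlinear activations the Jacobian also depends on the activation derivatives, so exact isometry no longer follows from orthogonal weights alone.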
