Which Neural Net Architectures Give Rise to Exploding and Vanishing Gradients?


Abstract 

We give a rigorous analysis of the statistical behavior of gradients in a randomly initialized fully connected network N with ReLU activations. Our results show that the empirical variance of the squares of the entries in the input-output Jacobian of N is exponential in a simple architecture-dependent constant β, given by the sum of the reciprocals of the hidden layer widths. When β is large, the gradients computed by N at initialization vary wildly. Our approach complements the mean field theory analysis of random networks. From this point of view, we rigorously compute finite width corrections to the statistics of gradients at the edge of chaos.
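The following is a minimal Monte Carlo sketch (not the authors' code) of the quantity the abstract describes: the empirical variance of squared entries of the input-output Jacobian of a randomly initialized fully connected ReLU network, compared across architectures with different β. It assumes He ("edge of chaos") initialization with Var[W_ij] = 2 / fan_in, zero biases, ReLU after every layer, and uses the (0,0) Jacobian entry as a representative entry; all function names and parameters are illustrative.

```python
import numpy as np


def jacobian_entry(widths, rng):
    """Sample the (0,0) entry of the input-output Jacobian of one random ReLU net.

    widths = [n_0, n_1, ..., n_d]; the hidden widths are n_1, ..., n_{d-1}.
    """
    x = rng.standard_normal(widths[0])          # random input
    v = np.zeros(widths[0])
    v[0] = 1.0                                  # forward-mode tangent: compute J @ e_0
    for fan_in, fan_out in zip(widths[:-1], widths[1:]):
        W = rng.standard_normal((fan_out, fan_in)) * np.sqrt(2.0 / fan_in)  # He init
        pre = W @ x
        mask = (pre > 0).astype(float)          # derivative of ReLU
        x = np.maximum(pre, 0.0)
        v = mask * (W @ v)                      # push the tangent through the layer
    return v[0]                                 # first component of J @ e_0, i.e. J[0, 0]


def squared_entry_stats(widths, n_trials=2000, seed=0):
    """Empirical mean and variance of the squared Jacobian entry over random nets."""
    rng = np.random.default_rng(seed)
    sq = np.array([jacobian_entry(widths, rng) ** 2 for _ in range(n_trials)])
    return sq.mean(), sq.var()


if __name__ == "__main__":
    depth = 8                                   # number of weight layers
    for hidden in (100, 20, 5):                 # same depth, shrinking hidden width
        widths = [hidden] * (depth + 1)
        beta = sum(1.0 / n for n in widths[1:-1])   # sum of reciprocal hidden widths
        mean_sq, var_sq = squared_entry_stats(widths)
        print(f"hidden={hidden:3d}  beta={beta:.2f}  "
              f"E[J00^2]={mean_sq:.3f}  Var[J00^2]={var_sq:.3f}")
```

Under these assumptions, the run should show that at fixed depth the fluctuations of the squared Jacobian entry relative to their mean grow sharply as the hidden layers narrow (i.e. as β grows), consistent with the exponential-in-β scaling stated in the abstract; this is a numerical illustration, not a substitute for the paper's proof.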
