On Learning Over-parameterized Neural Networks: A Functional Approximation Perspective

2020-02-25

Abstract

We consider training over-parameterized two-layer neural networks with Rectified Linear Unit (ReLU) activation using the gradient descent (GD) method. Inspired by a recent line of work, we study the evolution of network prediction errors across GD iterations, which can be neatly described in matrix form. When the network is sufficiently over-parameterized, these matrices individually approximate an integral operator that is determined by the feature vector distribution ρ only. Consequently, the GD method can be viewed as approximately applying the powers of this integral operator to the underlying function f* that generates the responses. We show that if f* admits a low-rank approximation with respect to the eigenspaces of this integral operator, then the empirical risk decreases to this low-rank approximation error at a linear rate that is determined by f* and ρ only, i.e., the rate is independent of the sample size n. Furthermore, if f* has zero low-rank approximation error, then, as long as the width of the neural network is Ω(n log n), the empirical risk decreases to Θ(1/√n). To the best of our knowledge, this is the first result showing the sufficiency of nearly-linear network over-parameterization. We provide an application of our general results to the setting where ρ is the uniform distribution on the spheres and f* is a polynomial. Throughout this paper, we consider the scenario where the input dimension d is fixed.
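
The setting in the abstract can be illustrated with a small NumPy sketch. It is not the authors' code. The 1/√m output scaling, the fixed random ±1 output weights, the Gaussian initialization, the squared loss, and all hyperparameter values are assumptions borrowed from common practice in this line of work; the abstract confirms none of them. The function name `train_two_layer_relu` is hypothetical. The features are drawn uniformly from the unit sphere and the responses come from a degree-1 polynomial, mirroring the application mentioned above.

```python
import numpy as np

def train_two_layer_relu(X, y, m=2000, lr=1.0, steps=200, seed=0):
    """Full-batch GD on the hidden layer of a width-m two-layer ReLU network.

    Assumed model (not taken from the abstract):
        f(x) = (1 / sqrt(m)) * sum_j a_j * relu(w_j . x)
    with fixed output weights a_j in {-1, +1} and Gaussian-initialized w_j.
    Returns the trained W, the fixed a, and the empirical risk per iteration.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.standard_normal((m, d))        # hidden-layer weights, trained
    a = rng.choice([-1.0, 1.0], size=m)    # output weights, kept fixed
    risks = []
    for _ in range(steps):
        pre = X @ W.T                                  # pre-activations, shape (n, m)
        pred = np.maximum(pre, 0.0) @ a / np.sqrt(m)   # network predictions, shape (n,)
        err = pred - y                                 # prediction errors
        risks.append(0.5 * np.mean(err ** 2))          # empirical risk (squared loss)
        act = (pre > 0).astype(float)                  # ReLU activation pattern
        # Gradient of the empirical risk with respect to each hidden weight w_j.
        grad = ((act * err[:, None]).T @ X) * a[:, None] / (np.sqrt(m) * n)
        W -= lr * grad
    return W, a, np.array(risks)

# Features uniform on the unit sphere in d = 3; responses from a linear f*.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = X[:, 0]

_, _, risks = train_two_layer_relu(X, y)
print("initial risk:", risks[0], "final risk:", risks[-1])
```

Tracking `err` across iterations is the quantity whose evolution the abstract describes in matrix form; the sketch only records its mean square and makes no claim about the rates stated in the paper.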
