
Spurious Local Minima are Common in Two-Layer ReLU Neural Networks

2020-03-16

Abstract

We consider the optimization problem associated with training simple ReLU neural networks of the form $\mathbf{x} \mapsto \sum_{i=1}^{k}\max\{0, \mathbf{w}_i^\top \mathbf{x}\}$ with respect to the squared loss. We provide a computer-assisted proof that even if the input distribution is standard Gaussian, even if the dimension is arbitrarily large, and even if the target values are generated by such a network with orthonormal parameter vectors, the problem can still have spurious local minima once $6 \le k \le 20$. By a concentration of measure argument, this implies that in high input dimensions, nearly all target networks of the relevant sizes lead to spurious local minima. Moreover, we conduct experiments which show that the probability of hitting such local minima is quite high, and increases with the network size. On the positive side, mild over-parameterization appears to drastically reduce such local minima, indicating that an over-parameterization assumption is necessary to obtain a positive result in this setting.
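The student-teacher setup described in the abstract can be illustrated with a short simulation. The sketch below is not the authors' code; it assumes, purely for illustration, $d = k = 6$, an identity-matrix teacher (one choice of orthonormal parameter vectors), plain gradient descent on fresh Gaussian mini-batches, and arbitrary learning-rate and step-count settings. A run whose loss plateaus clearly above zero is a candidate spurious local minimum; repeating over many random initializations gives a rough estimate of how often this happens.

```python
# Illustrative sketch (not the paper's exact protocol): d = k, identity teacher,
# standard Gaussian inputs, squared loss, plain gradient descent.
import numpy as np

rng = np.random.default_rng(0)
k = d = 6                      # number of ReLU units / input dimension
W_teacher = np.eye(k, d)       # orthonormal target parameter vectors w_1..w_k

def net(W, X):
    """f(x) = sum_i max(0, w_i^T x), evaluated on the rows of X."""
    return np.maximum(W @ X.T, 0.0).sum(axis=0)

def loss_and_grad(W, X):
    """Squared loss against the teacher network and its gradient w.r.t. W."""
    pre = W @ X.T                                                     # (k, n)
    residual = np.maximum(pre, 0.0).sum(axis=0) - net(W_teacher, X)   # (n,)
    grad = ((pre > 0) * residual) @ X / X.shape[0]                    # (k, d)
    return 0.5 * np.mean(residual ** 2), grad

W = rng.standard_normal((k, d)) / np.sqrt(d)       # random student initialization
lr, batch_size, steps = 0.05, 4096, 20000
for _ in range(steps):
    X = rng.standard_normal((batch_size, d))       # fresh Gaussian mini-batch
    loss, grad = loss_and_grad(W, X)
    W -= lr * grad

# A loss that stalls well above 0 suggests convergence to a spurious
# (non-global) local minimum of the population objective.
print(f"final mini-batch loss: {loss:.4f}")
```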
