Parameters as interacting particles: long time convergence and asymptotic error scaling of neural networks

资源分类

2020-02-17 |

109 |

36 |

Abstract

The performance of neural networks on high-dimensional data distributions suggests that it may be possible to parameterize a representation of a given highdimensional function with controllably small errors, potentially outperforming standard interpolation methods. We demonstrate, both theoretically and numerically, that this is indeed the case. We map the parameters of a neural network to a system of particles relaxing with an interaction potential determined by the loss function. We show that in the limit that the number of parameters n is large, the landscape of the mean-squared error becomes convex and the representation error in the function scales as In this limit, we prove a dynamical variant of the universal approximation theorem showing that the optimal representation can be attained by stochastic gradient descent, the algorithm ubiquitously used for parameter optimization in machine learning. In the asymptotic regime, we study the fluctuations around the optimal representation and show that they arise at a scale These fluctuations in the landscape identify the natural scale for the noise in stochastic gradient descent. Our results apply to both single and multi-layer neural networks, as well as standard kernel methods like radial basis functions.

上一篇：Differentially Private Bayesian Inference for Exponential Families

下一篇：Playing hard exploration games by watching YouTube

用户评价

全部评价

还没有评论，说两句吧！

热门资源

The Variational S...

Unlike traditional images which do not offer in...
Learning to Predi...

Much of model-based reinforcement learning invo...
Stratified Strate...

In this paper we introduce Stratified Strategy ...
Learning to learn...

The move from hand-designed features to learned...
A Mathematical Mo...

Direct democracy, where each voter casts one vo...

智能在线

400-630-6780
聆听.建议反馈

E-mail: support@tusaishared.com