Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity


Abstract

We study finite sample expressivity, i.e., the memorization power of ReLU networks. Recent results require N hidden nodes to memorize/interpolate arbitrary N data points. In contrast, by exploiting depth, we show that 3-layer ReLU networks with Ω(√N) hidden nodes can perfectly memorize most datasets with N points. We also prove that width Θ(√N) is necessary and sufficient for memorizing N data points, proving tight bounds on memorization capacity. The sufficiency result can be extended to deeper networks; we show that an L-layer network with W parameters in the hidden layers can memorize N data points if W = Ω(N). Combined with a recent upper bound O(WL log W) on VC dimension, our construction is nearly tight for any fixed L. Subsequently, we analyze the memorization capacity of residual networks under a general position assumption; we prove results that substantially reduce the known requirement of N hidden nodes. Finally, we study the dynamics of stochastic gradient descent (SGD), and show that when initialized near a memorizing global minimum of the empirical risk, SGD quickly finds a nearby point with much smaller empirical risk.
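
The sketch below is a minimal empirical illustration of what "memorization" means here, assuming PyTorch and illustrative hyperparameters: it trains a 3-layer ReLU network whose hidden widths scale like √N and checks whether the empirical risk on N arbitrarily labeled points can be driven (near) to zero. It is not the paper's explicit construction or its SGD analysis; the widths, learning rate, and step count are assumptions chosen for demonstration only.

```python
# Minimal sketch of "memorization" (finite sample expressivity), assuming PyTorch.
# Fits N arbitrary labels on N generic inputs with a 3-layer ReLU network whose
# hidden widths are on the order of sqrt(N). Purely an empirical illustration,
# not the paper's construction; all hyperparameters are illustrative.
import math
import torch

torch.manual_seed(0)

N, d = 400, 16                          # number of data points, input dimension
width = int(4 * math.sqrt(N))           # hidden width ~ sqrt(N)

X = torch.randn(N, d)                   # generic ("most datasets") inputs
y = torch.randn(N, 1)                   # arbitrary real-valued targets

# 3-layer ReLU network: two hidden layers of width ~ sqrt(N),
# so the hidden-layer parameter count W is on the order of N.
model = torch.nn.Sequential(
    torch.nn.Linear(d, width), torch.nn.ReLU(),
    torch.nn.Linear(width, width), torch.nn.ReLU(),
    torch.nn.Linear(width, 1),
)

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(20_000):
    loss = ((model(X) - y) ** 2).mean() # empirical risk (mean squared error)
    opt.zero_grad()
    loss.backward()
    opt.step()
    if loss.item() < 1e-6:              # (near-)interpolation reached
        break

print(f"steps taken: {step + 1}, final empirical risk: {loss.item():.2e}")
```

Note that with both hidden layers of width ~√N, the weight matrix between them alone has ~N entries, which matches the abstract's statement that W = Ω(N) hidden-layer parameters suffice.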
