POLYLOGARITHMIC WIDTH SUFFICES FOR GRADIENT DESCENT TO ACHIEVE ARBITRARILY SMALL TEST ERROR WITH SHALLOW RELU NETWORKS

2020-01-02

Abstract

Recent work has revealed that overparameterized networks trained by gradient descent achieve arbitrarily low training error, and sometimes even low test error. The required width, however, is always polynomial in at least one of the sample size n, the (inverse) target error 1/ε, and the (inverse) failure probability 1/δ. This work shows that O(1/ε) iterations of gradient descent on two-layer networks of any width exceeding polylog(n, 1/ε, 1/δ), together with Ω(1/ε²) training examples, suffice to achieve a test error of ε. The analysis further relies upon a margin property of the limiting kernel, which is guaranteed positive and can distinguish between true labels and random labels.
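
As a concrete illustration of the setting studied in the abstract (a sketch, not the paper's exact construction), the snippet below trains only the hidden-layer weights of a two-layer ReLU network f(x) = m^(-1/2) Σ_j a_j relu(w_j·x) by full-batch gradient descent on the logistic loss, with a fixed random ±1 output layer as is common in NTK-style analyses. The width m, step size, data, and labels here are all hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: sample count n, input dimension d, hidden width m.
n, d, m = 200, 10, 64
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit-norm inputs
y = np.sign(X[:, 0] + 1e-12)                   # hypothetical +/-1 labels

W = rng.standard_normal((m, d))        # hidden-layer weights (the trained parameters)
a = rng.choice([-1.0, 1.0], size=m)    # fixed random output layer

def forward(X, W):
    H = np.maximum(X @ W.T, 0.0)       # ReLU activations, shape (n, m)
    return H @ a / np.sqrt(m)          # network outputs, shape (n,)

lr = 1.0
for _ in range(500):
    # Derivative of the logistic loss log(1 + exp(-y f)) with respect to f.
    margin_grad = -y / (1.0 + np.exp(y * forward(X, W)))
    gate = (X @ W.T > 0).astype(float)  # ReLU derivative, shape (n, m)
    # Chain rule: dL/dW[j] = (1/n) sum_i margin_grad_i * (a_j / sqrt(m)) * gate_ij * x_i
    grad = (margin_grad[:, None] * gate).T @ X * (a[:, None] / np.sqrt(m)) / n
    W -= lr * grad

print("train misclassification error:", np.mean(np.sign(forward(X, W)) != y))
```

Freezing the output layer keeps the sketch close to the two-layer regime the abstract describes, where the analysis tracks how the trained hidden weights move relative to their random initialization.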
