PIECEWISE LINEAR ACTIVATIONS CAN SUBSTANTIALLY SHAPE THE LOSS SURFACES OF NEURAL NETWORKS

2019-12-30

Abstract

Understanding the loss surface of a neural network is fundamental to understanding deep learning. This paper shows how piecewise linear activation functions substantially shape the loss surfaces of neural networks. We first prove that the loss surfaces of many neural networks have infinitely many spurious local minima, defined as local minima whose empirical risk is higher than that of the global minima. Our result holds for any neural network of arbitrary depth with arbitrary piecewise linear activation functions (excluding linear functions), under most loss functions used in practice and some mild assumptions. This result demonstrates that networks with piecewise linear activations differ substantially from the well-studied linear neural networks. The underlying assumptions are consistent with most practical settings, in which the output layer is narrower than every hidden layer. In addition, the loss surface of a neural network with piecewise linear activations is partitioned by nondifferentiable boundaries into multiple smooth, multilinear open cells. Within a cell, the constructed spurious local minima are connected with each other by a continuous path along which the empirical risk is invariant. We further prove that within every cell of a one-hidden-layer network, all local minima are equally good: they are all global minima within that cell.
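To make the "cells" picture concrete, here is a small toy sketch. It is not the paper's construction and does not exhibit a spurious local minimum. It assumes a one-hidden-layer ReLU network with a scalar output, squared loss, and random data; all function names (`pattern`, `forward_in_cell`, `rescale_unit`, etc.) are hypothetical. A cell is identified with the set of parameters sharing one activation pattern on the training data. Inside a cell, ReLU acts as a fixed 0/1 mask, so the output is linear in each layer's parameters separately (multilinear). The sketch also walks along one simple path of constant empirical risk that stays inside a cell, using the positive homogeneity of ReLU (relu(c*z) = c*relu(z) for c > 0). This is a standard ReLU property, not necessarily the path constructed in the paper.

```python
import random

# Toy setup (assumed, not from the paper): one hidden ReLU layer, scalar output.
def relu(z):
    return z if z > 0.0 else 0.0

def preact(W1, b1, x):
    # Hidden-layer pre-activations W1 x + b1.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]

def forward(W1, b1, w2, x):
    return sum(a * relu(z) for a, z in zip(w2, preact(W1, b1, x)))

def pattern(W1, b1, X):
    # Activation pattern on the whole dataset; parameters with the same
    # pattern lie in the same cell of the loss surface.
    return tuple(tuple(z > 0.0 for z in preact(W1, b1, x)) for x in X)

def forward_in_cell(W1, b1, w2, x, mask):
    # Inside a cell, ReLU is a fixed 0/1 mask, so the output is linear in
    # (W1, b1) for fixed w2 and linear in w2 for fixed (W1, b1).
    return sum(a * m * z for a, m, z in zip(w2, mask, preact(W1, b1, x)))

def risk(W1, b1, w2, X, Y):
    # Empirical risk under squared loss (an assumed choice of loss).
    return sum((forward(W1, b1, w2, x) - y) ** 2 for x, y in zip(X, Y)) / len(X)

def rescale_unit(W1, b1, w2, j, c):
    # For c > 0, scale unit j's incoming weights and bias by c and its
    # outgoing weight by 1/c: outputs and activation pattern are unchanged.
    W1 = [row[:] for row in W1]
    b1, w2 = b1[:], w2[:]
    W1[j] = [c * w for w in W1[j]]
    b1[j] *= c
    w2[j] /= c
    return W1, b1, w2

random.seed(0)
d, h, n = 3, 4, 8  # input dim, hidden width, number of samples
X = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
Y = [random.gauss(0, 1) for _ in range(n)]
W1 = [[random.gauss(0, 1) for _ in range(d)] for _ in range(h)]
b1 = [random.gauss(0, 1) for _ in range(h)]
w2 = [random.gauss(0, 1) for _ in range(h)]

cell = pattern(W1, b1, X)
# The masked multilinear form reproduces the ReLU network inside its cell.
in_cell_ok = all(
    abs(forward(W1, b1, w2, x) - forward_in_cell(W1, b1, w2, x, m)) < 1e-12
    for x, m in zip(X, cell)
)

# Walk along c in [0.5, 2.0] for hidden unit 0.
base = risk(W1, b1, w2, X, Y)
path = [rescale_unit(W1, b1, w2, 0, 0.5 + 0.1 * k) for k in range(16)]
risks = [risk(*p, X, Y) for p in path]
same_cell = all(pattern(p[0], p[1], X) == cell for p in path)
print(in_cell_ok, same_cell, max(abs(r - base) for r in risks) < 1e-9)
```

Crossing a cell boundary (changing the sign of some pre-activation) swaps in a different mask. That swap is where the loss surface can fail to be differentiable, which is why the sketch tracks the activation pattern along the path.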


Popular resources

  • Learning to Predi...

    Much of model-based reinforcement learning invo...

  • Stratified Strate...

    In this paper we introduce Stratified Strategy ...

  • The Variational S...

    Unlike traditional images which do not offer in...

  • A Mathematical Mo...

    Direct democracy, where each voter casts one vo...

  • Rating-Boosted La...

    The performance of a recommendation system reli...