
On the Local Minima of the Empirical Risk

2020-02-14

Abstract 

Population risk is always of primary interest in machine learning; however, learning algorithms only have access to the empirical risk. Even for applications with nonconvex nonsmooth losses (such as modern deep networks), the population risk is generally significantly more well-behaved from an optimization point of view than the empirical risk. In particular, sampling can create many spurious local minima. We consider a general framework which aims to optimize a smooth nonconvex function $F$ (population risk) given only access to an approximation $f$ (empirical risk) that is pointwise close to $F$ (i.e., $\|F - f\|_\infty \le \nu$). Our objective is to find the $\epsilon$-approximate local minima of the underlying function $F$ while avoiding the shallow local minima (arising because of the tolerance $\nu$) which exist only in $f$. We propose a simple algorithm based on stochastic gradient descent (SGD) on a smoothed version of $f$ that is guaranteed to achieve our goal as long as $\nu \le O(\epsilon^{1.5}/d)$. We also provide an almost matching lower bound showing that our algorithm achieves the optimal error tolerance $\nu$ among all algorithms making a polynomial number of queries of $f$. As a concrete example, we show that our results can be directly used to give sample complexities for learning a ReLU unit.
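The core algorithmic idea, SGD on a Gaussian smoothing of $f$, can be sketched with a zeroth-order gradient estimator. The sketch below is a minimal illustration assuming only query access to $f$; the function name `smoothed_sgd` and all hyperparameters are illustrative, not from the paper, whose actual algorithm additionally tunes the smoothing radius and mini-batch size to $\nu$, $\epsilon$, and $d$.

```python
import numpy as np

def smoothed_sgd(f, x0, sigma=0.1, eta=0.01, batch=100, iters=1000, seed=0):
    """Minimal sketch: zeroth-order SGD on the Gaussian smoothing of f.

    The smoothed objective is f_sigma(x) = E_{z ~ N(0, sigma^2 I)}[f(x + z)].
    Its gradient equals E[(f(x + z) - f(x)) z] / sigma^2, so a mini-batch of
    Gaussian perturbations yields an unbiased estimate using only function
    values of f (no gradients of the nonsmooth f are required).
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    d = x.size
    for _ in range(iters):
        z = rng.normal(scale=sigma, size=(batch, d))      # Gaussian perturbations
        vals = np.array([f(x + zi) for zi in z]) - f(x)   # queries of f only; f(x) baseline cuts variance
        g = (vals[:, None] * z).mean(axis=0) / sigma**2   # smoothed-gradient estimate
        x -= eta * g                                      # SGD step on the smoothed objective
    return x
```

As a usage example, `smoothed_sgd(lambda x: np.abs(x).sum(), np.ones(10))` runs the estimator on a nonsmooth test function. The smoothing radius `sigma` embodies the trade-off the abstract describes: large enough to erase the $\nu$-shallow minima of $f$, small enough to preserve the $\epsilon$-approximate local minima of $F$.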

