Asynchronous Parallel Stochastic Gradient for Nonconvex Optimization


Abstract 

Asynchronous parallel implementations of stochastic gradient (SG) have been broadly used in training deep neural networks and have achieved many successes in practice recently. However, existing theories cannot explain their convergence and speedup properties, mainly due to the nonconvexity of most deep learning formulations and the asynchronous parallel mechanism. To fill this gap in theory and provide theoretical support, this paper studies two asynchronous parallel implementations of SG: one over a computer network and the other on a shared memory system. We establish an ergodic convergence rate of O(1/√K) for both algorithms and prove that linear speedup is achievable if the number of workers is bounded by √K (K is the total number of iterations). Our results generalize and improve existing analysis for convex minimization.
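For concreteness, below is a minimal sketch of the shared-memory (lock-free) variant of asynchronous parallel SG on a toy least-squares problem. The objective, names, and constants here are illustrative assumptions for this sketch, not the paper's notation or implementation; the point is only the update pattern: each worker reads a possibly stale copy of the shared iterate, computes a stochastic gradient, and writes back without locking.

```python
import threading
import numpy as np

# Toy least-squares problem (illustrative only): minimize (1/2n) * ||A x - b||^2.
rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 20))
b = A @ rng.standard_normal(20) + 0.1 * rng.standard_normal(1000)

x = np.zeros(20)                  # shared parameter vector, updated by all workers
STEP, K, WORKERS = 1e-3, 5000, 4  # step size, iterations per worker, number of workers

def worker(seed):
    local_rng = np.random.default_rng(seed)   # per-thread sampling
    for _ in range(K):
        i = local_rng.integers(len(b))        # sample one data point uniformly
        snapshot = x.copy()                   # possibly stale read of the shared iterate
        g = (A[i] @ snapshot - b[i]) * A[i]   # stochastic gradient for sample i
        x[:] = x - STEP * g                   # lock-free write; races are tolerated

threads = [threading.Thread(target=worker, args=(s,)) for s in range(WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("final objective:", 0.5 * np.mean((A @ x - b) ** 2))
```

Because of CPython's global interpreter lock this sketch does not show real parallel speedup; in practice the same pattern is implemented with native threads on shared memory or with parameter servers over a network, which are the two settings the paper analyzes.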

