资源论文TOWARDS BETTER UNDERSTANDING OF ADAPTIVEG RADIENT ALGORITHMS IN GENERATIVE ADVER -SARIAL NETS

TOWARDS BETTER UNDERSTANDING OF ADAPTIVEG RADIENT ALGORITHMS IN GENERATIVE ADVER -SARIAL NETS

2020-01-02 | |  83 |   53 |   0

Abstract

Adaptive gradient algorithms perform gradient-based updates using the history of gradients and are ubiquitous in training deep neural networks. While adaptive gradient methods theory is well understood for minimization problems, the underlying factors driving their empirical success in min-max problems such as GANs remain unclear. In this paper, we aim at bridging this gap from both theoretical and empirical perspectives. First, we analyze a variant of Optimistic Stochastic Gradient (OSG) proposed in (Daskalakis et al., 2017) for solving a class of nonconvex non-concave min-max problem and establish 图片.pngcomplexity for finding -first-order stationary point, in which the algorithm only requires invoking one stochastic first-order oracle while enjoying state-of-the-art iteration complexity achieved by stochastic extragradient method by (Iusem et al., 2017). Then we propose an adaptive variant of OSG namedOptimistic  Adagrad (OAdagrad) and reveal an improved adaptive complexity 图片.png , where 图片.png characterizes the growth rate of the cumulative stochastic gradient and 图片.png To the best of our knowledge, this is the first work for establishing adaptive complexity in nonconvex non-concave min-max optimization. Empirically, our experiments show that indeed adaptive gradient algorithms outperform their non-adaptive counterparts in GAN training. Moreover, this observation can be explained by the slow growth rate of the cumulative stochastic gradient, as observed empirically.

上一篇:DIFF SIM :D IFFERENTIABLE PROGRAMMING FORP HYSICAL SIMULATION

下一篇:INTENSITY-F REE LEARNING OFT EMPORAL POINT PROCESSES

用户评价
全部评价

热门资源

  • Learning to Predi...

    Much of model-based reinforcement learning invo...

  • Stratified Strate...

    In this paper we introduce Stratified Strategy ...

  • The Variational S...

    Unlike traditional images which do not offer in...

  • A Mathematical Mo...

    Direct democracy, where each voter casts one vo...

  • Rating-Boosted La...

    The performance of a recommendation system reli...