
Nostalgic Adam: Weighting More of the Past Gradients When Designing the Adaptive Learning Rate

2019-10-08
Abstract: First-order optimization algorithms have proven prominent in deep learning. In particular, algorithms such as RMSProp and Adam are extremely popular. However, recent works have pointed out the lack of "long-term memory" in Adam-like algorithms, which can hamper their performance and lead to divergence. In our study, we observe that there are benefits to weighting more of the past gradients when designing the adaptive learning rate. We therefore propose an algorithm called Nostalgic Adam (NosAdam) with theoretically guaranteed convergence at the best known convergence rate. NosAdam can be regarded as a fix to the non-convergence issue of Adam, as an alternative to the recent work of [Reddi et al., 2018]. Our preliminary numerical experiments show that NosAdam is a promising alternative to Adam. The proofs, code, and other supplementary materials have already been released.
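The core idea, putting relatively more weight on past squared gradients when forming the second-moment (adaptive learning rate) term, can be sketched as follows. This is a minimal illustration assuming the hyper-harmonic weighting b_k = k^(-γ) variant the paper proposes (NosAdam-HH); the function name, hyperparameter defaults, and the α/√t step-size schedule below are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np

def nosadam_hh_update(theta, grad, state, alpha=0.01, beta1=0.9, gamma=0.1, eps=1e-8):
    """One NosAdam-HH-style step (illustrative sketch, not the reference code)."""
    t = state["t"] + 1
    b_t = t ** (-gamma)          # hyper-harmonic weight: earlier steps get larger b_k
    B_t = state["B"] + b_t       # running sum B_t = sum_{k<=t} b_k

    # First moment: exponential moving average of gradients, as in Adam.
    m = beta1 * state["m"] + (1 - beta1) * grad
    # Second moment: v_t = (B_{t-1}/B_t) * v_{t-1} + (b_t/B_t) * g_t^2,
    # a weighted average that keeps more mass on past squared gradients
    # than Adam's exponential moving average.
    v = (state["B"] / B_t) * state["v"] + (b_t / B_t) * grad ** 2

    theta = theta - (alpha / np.sqrt(t)) * m / (np.sqrt(v) + eps)
    return theta, {"t": t, "B": B_t, "m": m, "v": v}

# Toy usage: minimize f(x) = 0.5 * x^2, whose gradient at x is simply x.
state = {"t": 0, "B": 0.0, "m": 0.0, "v": 0.0}
x = 5.0
for _ in range(2000):
    x, state = nosadam_hh_update(x, x, state, alpha=0.1)
print(x)  # x moves toward the minimizer at 0
```

With γ = 0 the weights b_k are uniform and v_t reduces to a plain average of all past squared gradients; larger γ shifts relatively more weight onto earlier gradients, which is the "nostalgic" behavior the title refers to.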


