Understanding Synthetic Gradients and Decoupled Neural Interfaces

2020-03-10

Abstract

When training neural networks, the use of Synthetic Gradients (SG) allows layers or modules to be trained without update locking – without waiting for a true error gradient to be backpropagated – resulting in Decoupled Neural Interfaces (DNIs). This unlocked ability of being able to update parts of a neural network asynchronously and with only local information was demonstrated to work empirically in Jaderberg et al. (2016). However, there has been very little demonstration of what changes DNIs and SGs impose from a functional, representational, and learning dynamics point of view. In this paper, we study DNIs through the use of synthetic gradients on feed-forward networks to better understand their behaviour and elucidate their effect on optimisation. We show that the incorporation of SGs does not affect the representational strength of the learning system for a neural network, and prove the convergence of the learning system for linear and deep linear models. On practical problems we investigate the mechanism by which synthetic gradient estimators approximate the true loss, and, surprisingly, how that leads to drastically different layer-wise representations. Finally, we also expose the relationship of using synthetic gradients to other error approximation techniques and find a unifying language for discussion and comparison.
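To make the idea of training without update locking concrete, here is a minimal NumPy sketch of a synthetic gradient on a two-layer linear model. It is an illustration under stated assumptions, not the paper's implementation: the toy data, the dimensions, the learning rate, the function name `train_with_synthetic_gradients`, and the choice of a linear SG module conditioned on both the activation and the label are all hypothetical. For simplicity everything runs synchronously in one loop; the point is only that the lower layer's update never uses the backpropagated error, which is needed solely as a regression target for the SG module.

```python
import numpy as np


def train_with_synthetic_gradients(steps=2000, lr=0.02, seed=0):
    """Train h = x W1, p = h W2 where W1 only ever sees a synthetic gradient."""
    rng = np.random.default_rng(seed)
    n, d_in, d_hid, d_out = 256, 4, 8, 2

    # Toy regression data from a hypothetical linear teacher (assumption).
    x = rng.normal(size=(n, d_in))
    y = x @ (0.5 * rng.normal(size=(d_in, d_out)))

    W1 = 0.3 * rng.normal(size=(d_in, d_hid))
    W2 = 0.3 * rng.normal(size=(d_hid, d_out))

    # Linear SG module conditioned on the activation h and the label y:
    # sg(h, y) = h A + y B. It starts at zero, so W1 is not updated at first.
    A = np.zeros((d_hid, d_hid))
    B = np.zeros((d_out, d_hid))

    def loss():
        # Mean over samples of half the squared error.
        return 0.5 * np.mean(np.sum((x @ W1 @ W2 - y) ** 2, axis=1))

    initial_loss = loss()
    for _ in range(steps):
        h = x @ W1
        sg = h @ A + y @ B  # predicted dL/dh, computed from local information

        # Lower layer: updated from the synthetic gradient, not from backprop.
        grad_W1 = x.T @ sg / n

        # Upper layer: ordinary gradient of the squared error.
        delta = h @ W2 - y  # dL/dp for each sample
        grad_W2 = h.T @ delta / n

        # SG module: L2 regression onto the true gradient dL/dh = delta W2^T.
        sg_error = sg - delta @ W2.T
        grad_A = h.T @ sg_error / n
        grad_B = y.T @ sg_error / n

        W1 -= lr * grad_W1
        W2 -= lr * grad_W2
        A -= lr * grad_A
        B -= lr * grad_B

    return initial_loss, loss()
```

Calling `train_with_synthetic_gradients()` returns the loss before and after training, which can be compared to check that the model still learns even though `W1` never receives the true gradient. In this linear setting the true gradient with respect to `h` is itself linear in `h` and `y`, which is why a linear SG module is a reasonable choice for the sketch.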
