Abstract
When training neural networks, the use of Synthetic Gradients (SG) allows layers or modules to be trained without update locking (that is, without waiting for a true error gradient to be backpropagated), resulting in Decoupled Neural Interfaces (DNIs). This unlocked ability to update parts of a neural network asynchronously and with only local information was demonstrated to work empirically in Jaderberg et al. (2016). However, there has been very little demonstration of what changes DNIs and SGs impose from a functional, representational, and learning dynamics point of view. In this paper, we study DNIs through the use of synthetic gradients on feed-forward networks to better understand their behaviour and elucidate their effect on optimisation. We show that the incorporation of SGs does not affect the representational strength of the learning system for a neural network, and prove the convergence of the learning system for linear and deep linear models. On practical problems we investigate the mechanism by which synthetic gradient estimators approximate the true loss, and, surprisingly, how that leads to drastically different layer-wise representations. Finally, we also expose the relationship of using synthetic gradients to other error approximation techniques and find a unifying language for discussion and comparison.
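As an informal illustration of update unlocking, the following minimal NumPy sketch trains a two-layer linear network in which the first layer is updated only from a locally predicted gradient. Everything here is an assumption made for illustration rather than the setup studied in the paper: the function name `train_with_synthetic_gradients`, the toy regression data, the layer sizes, the learning rate, and the choice of a linear SG module of the form `h @ a + b` trained by L2 regression onto the true gradient.

```python
import numpy as np


def train_with_synthetic_gradients(steps=2000, lr=0.01, seed=0):
    """Two-layer linear net; layer 1 is updated from a synthetic gradient only."""
    rng = np.random.default_rng(seed)
    n, d_in, d_hid, d_out = 64, 4, 8, 2  # hypothetical sizes

    # Toy regression data (assumption): targets are a linear map of the inputs.
    x = rng.normal(size=(n, d_in))
    t = x @ rng.normal(size=(d_in, d_out))

    w1 = 0.1 * rng.normal(size=(d_in, d_hid))
    w2 = 0.1 * rng.normal(size=(d_hid, d_out))
    # Linear SG module, sg(h) = h @ a + b, initialised to predict zero.
    a = np.zeros((d_hid, d_hid))
    b = np.zeros(d_hid)

    losses = []
    for _ in range(steps):
        # Layer 1: forward pass and gradient from local information only.
        h = x @ w1
        sg = h @ a + b               # predicted dL/dh, one row per sample
        grad_w1 = x.T @ sg / n

        # Layer 2: ordinary forward pass and true gradient.
        err = h @ w2 - t
        losses.append(0.5 * np.mean(np.sum(err ** 2, axis=1)))
        grad_w2 = h.T @ err / n

        # SG module: L2 regression onto the true gradient with respect to h.
        true_grad_h = err @ w2.T
        sg_err = sg - true_grad_h
        grad_a = h.T @ sg_err / n
        grad_b = sg_err.mean(axis=0)

        w1 -= lr * grad_w1
        w2 -= lr * grad_w2
        a -= lr * grad_a
        b -= lr * grad_b
    return losses


if __name__ == "__main__":
    history = train_with_synthetic_gradients()
    print(f"initial loss {history[0]:.4f}, final loss {history[-1]:.4f}")
```

In this sketch the first layer never receives the backpropagated error. The true gradient is used only as a regression target for the SG module, so in an asynchronous implementation that target could arrive later without blocking the first layer's update.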