Disclaimer: right now the code contains a little bug, and I'll fix it as soon as possible. Author: https://github.com/andrewliao11
Image classification with synthetic gradients in TensorFlow
I implement Decoupled Neural Interfaces using Synthetic Gradients in TensorFlow. The paper uses synthetic gradients to decouple the layers in the network. This is pretty interesting since we no longer suffer from update locking (here is a talk on DNI: https://www.youtube.com/watch?v=toZprSCCmNI). I tested my model on CIFAR-10 and achieved results similar to those claimed in the paper.
TODO: apply it to more complicated networks to see how general the approach is.
What are synthetic gradients?
We often optimize neural networks by backpropagation, which is usually implemented in some well-known framework. However, is there another way for the layers in a network to communicate with the other layers? Here come synthetic gradients!
They give us a way to allow neural networks to communicate, to learn to send messages between themselves, in a decoupled, scalable manner, paving the way for multiple neural networks to communicate with each other or for improving the long-term temporal dependency of recurrent networks. The neurons in each layer are updated with an error signal (δa_head) produced by a synthetic-gradient layer, and the optimization proceeds from that signal alone. So how is this error signal generated? The network still does backpropagation, but the error signal (δa) from the objective function is not used to optimize the neurons in the network directly; instead, it is used as the training target for the error signal (δa_head) produced by the synthetic-gradient layer. The following is the illustration from the paper:
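To make the mechanism concrete, here is a minimal, hypothetical sketch (not the repo's actual code) of a single synthetic-gradient module for one hidden layer of an MLP, written in TF1-style graph code. The shapes, initializers, and learning rate are assumptions chosen to roughly match the CIFAR-10 MLP setting used here.

```python
import tensorflow as tf

# Hypothetical shapes for a CIFAR-10 MLP (3072 inputs, 1000 hidden units, 10 classes).
x = tf.placeholder(tf.float32, [None, 3072])
labels = tf.placeholder(tf.float32, [None, 10])

# Layer 1: the layer we want to update *without* waiting for the true gradient.
W1 = tf.Variable(tf.truncated_normal([3072, 1000], stddev=0.05))
b1 = tf.Variable(tf.zeros([1000]))
h1 = tf.nn.relu(tf.matmul(x, W1) + b1)

# Synthetic-gradient module M1: predicts δa_head = dL/dh1 from h1 alone.
# Zero init so the first predicted gradients are zero.
M1_W = tf.Variable(tf.zeros([1000, 1000]))
delta_hat = tf.matmul(h1, M1_W)

# Layer 2 + loss (needed here only to obtain the true gradient for training M1).
W2 = tf.Variable(tf.truncated_normal([1000, 10], stddev=0.05))
b2 = tf.Variable(tf.zeros([10]))
logits = tf.matmul(h1, W2) + b2
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))

# Update layer 1 with the *synthetic* gradient: backprop δa_head through h1
# into W1 and b1 via tf.gradients(h1, [W1, b1], grad_ys=delta_hat).
dW1, db1 = tf.gradients(h1, [W1, b1], grad_ys=delta_hat)
opt = tf.train.AdamOptimizer(3e-5)
train_layer1 = opt.apply_gradients([(dW1, W1), (db1, b1)])

# Train M1 so that δa_head matches the true gradient δa = dloss/dh1.
delta_true = tf.stop_gradient(tf.gradients(loss, h1)[0])
synthetic_loss = tf.reduce_mean(tf.square(delta_hat - delta_true))
train_synthetic = opt.minimize(synthetic_loss, var_list=[M1_W])

# The top layer is trained against the real loss as usual.
train_layer2 = opt.minimize(loss, var_list=[W2, b2])
```

In this sketch `train_layer1` only needs `delta_hat`, so layer 1 can be updated as soon as its forward pass finishes; the true gradient is only needed later, to refine the synthetic module.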
Usage
Right now only the fully-connected (MLP) version is implemented, and it is set as the default network structure. You can set the following variables from the command line, e.g. `python main.py --max_step 100000 --checkpoint_dir ./model` (a sketch of how these options might be parsed follows the list):
max_step = 50000
model_name = mlp # the checkpoint will be saved in $checkpoint_dir/$model_name/checkpoint-*
checkpoint_dir = './checkpoint' # the checkpoint directory
gpu_fraction = 1/2 # the fraction of GPU memory to use
batch_size = 256
hidden_size = 1000 # hidden size of the mlp
test_per_iter = 50
optim_type = adam
synthetic = False # use synthetic gradients or not
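These flags correspond to command-line options. Below is a minimal, hypothetical sketch of how they might be parsed and how gpu_fraction could be turned into a TensorFlow per-process GPU memory cap; the actual main.py may differ in flag names, types, and defaults.

```python
import argparse

import tensorflow as tf

# Hypothetical argument parsing; the real main.py may differ.
parser = argparse.ArgumentParser()
parser.add_argument('--max_step', type=int, default=50000)
parser.add_argument('--model_name', type=str, default='mlp')
parser.add_argument('--checkpoint_dir', type=str, default='./checkpoint')
parser.add_argument('--gpu_fraction', type=str, default='1/2')
parser.add_argument('--batch_size', type=int, default=256)
parser.add_argument('--hidden_size', type=int, default=1000)
parser.add_argument('--test_per_iter', type=int, default=50)
parser.add_argument('--optim_type', type=str, default='adam')
parser.add_argument('--synthetic', action='store_true')
args = parser.parse_args()

# "1/2" -> 0.5, then cap the per-process GPU memory usage accordingly.
num, denom = args.gpu_fraction.split('/')
gpu_options = tf.GPUOptions(
    per_process_gpu_memory_fraction=float(num) / float(denom))
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
```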
Experiment Result
DNI-mlp tested on CIFAR-10
Something Beautiful in TensorFlow
TensorFlow is known for the convenience of automatic differentiation, yet many people don't know how it performs the backward pass or how to get at the intermediate gradients. Compared to Torch, there is no obvious way to access `gradOutput` or `gradInput`. Actually, TensorFlow contains some beautiful functions that make this easier and more flexible. Sometimes you might want to calculate a gradient dy/dx: use `tf.gradients(y, x)`. It's very simple.
If you want to continue backpropagation from a gradient you already have, whether it comes from the loss or from something you've defined yourself (chain rule: dy/dx = dy/du * du/dx, given dy/du), use `tf.gradients(u, x, grad_ys=dy_du)`; the third argument is treated as the gradient flowing into the first argument.
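For example, here is a small self-contained sketch (the variable names are made up for illustration) showing both calls and checking that they agree:

```python
import tensorflow as tf

x = tf.Variable([3.0])
u = x * x            # u = x^2
y = u * u            # y = u^2 = x^4

# Case 1: plain dy/dx.
dy_dx = tf.gradients(y, x)[0]                         # 4*x^3 = 108

# Case 2: continue backprop from a gradient you already have (dy/du),
# which is exactly how a synthetic gradient is injected into the graph.
dy_du = tf.gradients(y, u)[0]                         # 2*u = 18
dy_dx_chained = tf.gradients(u, x, grad_ys=dy_du)[0]  # du/dx * dy/du = 6*18 = 108

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run([dy_dx, dy_dx_chained]))           # both print [108.]
```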
Reference
DeepMind's post on Decoupled Neural Interfaces Using Synthetic Gradients