This repository contains a re-implementation of the paper "Identity Mappings in Deep Residual Networks" (http://arxiv.org/abs/1603.05027). This work enables training high-quality 1k-layer neural networks in a remarkably simple way.
Acknowledgement: This code was re-implemented by Xiang Ming from Xi'an Jiaotong University for ease of release.
[a]
```
@article{He2016,
  author  = {Kaiming He and Xiangyu Zhang and Shaoqing Ren and Jian Sun},
  title   = {Identity Mappings in Deep Residual Networks},
  journal = {arXiv preprint arXiv:1603.05027},
  year    = {2016}
}
```

[b]
```
@article{He2015,
  author  = {Kaiming He and Xiangyu Zhang and Shaoqing Ren and Jian Sun},
  title   = {Deep Residual Learning for Image Recognition},
  journal = {arXiv preprint arXiv:1512.03385},
  year    = {2015}
}
```
The experiments in the paper were conducted in Caffe, whereas this code is implemented in Torch. We observed similar results within reasonable statistical variation.
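
To make the key idea concrete, below is a minimal Torch sketch of the pre-activation residual unit from [a]: the residual branch uses BN -> ReLU -> conv ordering, and the shortcut is a pure identity mapping. The function name and the layer sizes are illustrative assumptions, not code from this repository:

```lua
require 'nn'

-- Minimal sketch of one pre-activation residual unit:
-- the residual branch is BN -> ReLU -> conv -> BN -> ReLU -> conv,
-- and the shortcut is a pure identity mapping, i.e. y = x + F(x).
local function preActUnit(nPlanes)
   local residual = nn.Sequential()
      :add(nn.SpatialBatchNormalization(nPlanes))
      :add(nn.ReLU(true))
      :add(nn.SpatialConvolution(nPlanes, nPlanes, 3, 3, 1, 1, 1, 1))
      :add(nn.SpatialBatchNormalization(nPlanes))
      :add(nn.ReLU(true))
      :add(nn.SpatialConvolution(nPlanes, nPlanes, 3, 3, 1, 1, 1, 1))

   return nn.Sequential()
      :add(nn.ConcatTable()
         :add(nn.Identity())   -- identity shortcut (no BN/ReLU on this path)
         :add(residual))       -- residual function F(x)
      :add(nn.CAddTable())     -- elementwise addition x + F(x)
end

-- e.g., a 16-channel unit on a CIFAR-sized feature map
local unit = preActUnit(16)
print(unit:forward(torch.randn(2, 16, 32, 32)):size())
```

Because the shortcut path carries no nonlinearity or normalization, the signal can propagate unchanged through arbitrarily many such units, which is what makes very deep (1k-layer) networks trainable.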
To fit the 1k-layer models into memory without modifying much code,
we simply reduced the mini-batch size to 64, noting that results in the
paper were obtained with a mini-batch size of 128. Somewhat unexpectedly, the
results with the mini-batch size of 64 are slightly better:
| mini-batch size | CIFAR-10 test error (%): median (mean ± std) |
| --- | --- |
| 128 (as in [a]) | 4.92 (4.89 ± 0.14) |
| 64 (as in this code) | 4.62 (4.69 ± 0.20) |
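
For reference, training the 1001-layer model with the reduced mini-batch size could be launched along these lines, assuming a fb.resnet.torch-style main.lua; the script name and every flag shown are assumptions for illustration, not a verified interface of this repository:

```
th main.lua -depth 1001 -batchSize 64 -dataset cifar10
```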
Curves obtained by running this code with a mini-batch size of 64
(training loss: y-axis on the left; test error: y-axis on the right):