Memory-efficient implementation of DenseNets, supporting both the DenseNet and DenseNet-BC series.
Environments: Linux CPU/GPU, Python 3, PyTorch 1.0
Check the implementation's correctness with python -m utils.gradient_checking (the -m flag takes a module name, not a .py file), using the different settings in utils/gradient_checking.py (CPU, single GPU, multiple GPUs).
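The idea behind the gradient check can be sketched as follows. This is a minimal illustration, not the repository's actual script: a hypothetical TinyDenseLayer with dense (concatenating) connectivity is verified against finite differences using torch.autograd.gradcheck.

```python
import torch

# Hypothetical minimal module with DenseNet-style connectivity:
# the output concatenates the input with the newly computed features.
class TinyDenseLayer(torch.nn.Module):
    def __init__(self, in_features, growth):
        super().__init__()
        self.linear = torch.nn.Linear(in_features, growth)

    def forward(self, x):
        # tanh keeps the function smooth, which gradcheck requires
        return torch.cat([x, torch.tanh(self.linear(x))], dim=1)

# gradcheck compares analytic gradients with finite differences;
# it needs float64 inputs for numerical accuracy.
layer = TinyDenseLayer(4, 2).double()
x = torch.randn(3, 4, dtype=torch.float64, requires_grad=True)
ok = torch.autograd.gradcheck(layer, (x,), eps=1e-6, atol=1e-4)
print(ok)
```

The repository's script applies the same check to the efficient implementation on CPU, a single GPU, and multiple GPUs.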
Benchmark the forward/backward passes of the efficient and non-efficient DenseNet with python -m utils.benckmark_effi
(CPU, single GPU, multiple GPUs). The following results were measured on
a Linux system with 40 Intel(R) Xeon(R) E5-2630 v4 CPUs @
2.20GHz and an NVIDIA GTX 1080 Ti GPU.
F is the average forward time (ms), B is the average backward time (ms), and R = B/F.
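A minimal sketch of how such F/B timings can be collected (an illustration, not the repository's benchmark script; the stand-in model and iteration count are arbitrary). On GPU, torch.cuda.synchronize() must be called before reading the clock, since CUDA kernels run asynchronously.

```python
import time
import torch

def benchmark(model, x, n_iters=10):
    """Return average forward time F (ms), backward time B (ms), and R = B/F."""
    fwd_ms, bwd_ms = 0.0, 0.0
    for _ in range(n_iters):
        t0 = time.perf_counter()
        out = model(x)
        if x.is_cuda:
            torch.cuda.synchronize()  # wait for async CUDA kernels
        t1 = time.perf_counter()
        out.sum().backward()
        if x.is_cuda:
            torch.cuda.synchronize()
        t2 = time.perf_counter()
        fwd_ms += (t1 - t0) * 1000
        bwd_ms += (t2 - t1) * 1000
    f, b = fwd_ms / n_iters, bwd_ms / n_iters
    return f, b, b / f

# Stand-in model so the sketch runs anywhere; the real benchmark
# uses the efficient and non-efficient DenseNets instead.
model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU())
x = torch.randn(32, 64, requires_grad=True)
f, b, r = benchmark(model, x)
print(f"F={f:.3f} ms  B={b:.3f} ms  R={r:.2f}")
```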
The efficient version can process a batch size of up to 1450 on a single GPU
(~12GB), compared with 350 for the non-efficient version. That
is, the efficient version is roughly 4x as memory-efficient as the non-efficient version.
How to load a pretrained DenseNet into the efficient version?
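One possible approach, assuming the efficient model keeps the same parameter names as the standard DenseNet: transfer the standard state_dict directly, using strict=False so any key mismatches are reported instead of raising. The models below are stand-ins so the sketch runs without downloads; the real code would use the pretrained DenseNet and the efficient implementation.

```python
import torch

# Stand-ins for a pretrained standard DenseNet and the efficient
# version; in practice both would be DenseNet instances with
# identical parameter names.
standard_model = torch.nn.Sequential(torch.nn.Linear(16, 8), torch.nn.Linear(8, 4))
efficient_model = torch.nn.Sequential(torch.nn.Linear(16, 8), torch.nn.Linear(8, 4))

# strict=False returns the mismatched keys instead of raising,
# which helps diagnose naming differences between the two versions.
result = efficient_model.load_state_dict(standard_model.state_dict(), strict=False)
print(result.missing_keys, result.unexpected_keys)  # both empty when names line up
```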