Abstract
Very deep convolutional neural networks offer excellent
recognition results, yet their computational expense limits
their impact for many real-world applications. We introduce BlockDrop, an approach that learns to dynamically
choose which layers of a deep network to execute during
inference so as to best reduce total computation without degrading prediction accuracy. Exploiting the robustness of
Residual Networks (ResNets) to layer dropping, our framework selects on-the-fly which residual blocks to evaluate
for a given novel image. In particular, given a pretrained
ResNet, we train a policy network in an associative reinforcement learning setting for the dual reward of utilizing
a minimal number of blocks while preserving recognition
accuracy. We conduct extensive experiments on CIFAR and
ImageNet. The results provide strong quantitative and qualitative evidence that these learned policies not only accelerate inference but also encode meaningful visual information. Built upon a ResNet-101 model, our method achieves a
speedup of 20% on average, going as high as 36% for some
images, while maintaining the same 76.4% top-1 accuracy
on ImageNet.
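
To make the mechanism concrete, the following is a minimal sketch of the two ingredients the abstract describes: a residual stack in which a binary keep/drop decision gates each block, and a reward that favors using few blocks while keeping the prediction correct. It is illustrative only. The names gated_forward and block_usage_reward, the penalty parameter, and the specific reward shape (one minus the squared fraction of blocks used when the prediction is correct, a fixed negative penalty otherwise) are assumptions made for this example and are not specified in the abstract. The toy blocks operate on plain lists of floats rather than image tensors, and the policy network that would produce the keep vector is not modeled.

```python
from typing import Callable, List, Sequence, Tuple

# A "block" here is any function that maps features to a residual of the same size.
Block = Callable[[List[float]], List[float]]


def gated_forward(
    x: List[float], blocks: Sequence[Block], keep: Sequence[int]
) -> Tuple[List[float], int]:
    """Run a residual stack, evaluating only blocks whose keep flag is nonzero.

    A dropped block reduces to the identity mapping, so its residual
    function is never called and its computation is saved.
    """
    if len(blocks) != len(keep):
        raise ValueError("need exactly one keep/drop decision per block")
    executed = 0
    for block, flag in zip(blocks, keep):
        if flag:
            residual = block(x)
            x = [xi + ri for xi, ri in zip(x, residual)]
            executed += 1
    return x, executed


def block_usage_reward(
    keep: Sequence[int], correct: bool, penalty: float = 1.0
) -> float:
    """Hypothetical dual reward: high when few blocks are used AND the
    prediction is correct, negative when the prediction is wrong."""
    if not correct:
        return -penalty
    fraction_used = sum(1 for flag in keep if flag) / len(keep)
    return 1.0 - fraction_used ** 2


if __name__ == "__main__":
    toy_blocks = [
        lambda v: [0.5 * a for a in v],   # adds half of the input
        lambda v: [1.0 for _ in v],       # adds a constant
        lambda v: [-a for a in v],        # cancels the input
    ]
    policy = [1, 0, 1]  # keep the first and third block, drop the second
    features, used = gated_forward([2.0, 4.0], toy_blocks, policy)
    print(features, used, block_usage_reward(policy, correct=True))
```

In a real setting the keep vector would be sampled from the output of the policy network for each input image, and the reward would drive a policy-gradient update; both steps are omitted here.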