Abstract
Deep convolutional neural networks (CNNs) are now widely deployed in vision applications, but the size of the training data can bottleneck their performance. Transfer learning allows CNNs to learn from limited data samples by transferring knowledge from weights pre-trained on large datasets. However, blindly transferring all learned features from the source dataset adds unnecessary computation to CNNs on the target task. In this paper, we propose attentive feature distillation and selection (AFDS), which not only adjusts the strength of the regularization introduced by transfer learning but also dynamically determines which features are important to transfer. When deploying AFDS on ResNet-101, we achieve state-of-the-art computation reduction at the same accuracy budget, outperforming all existing transfer learning methods. On a 10× MACs reduction budget, when transferring from ImageNet to Stanford Dogs 120, AFDS achieves an accuracy that is 12.51% higher than its best competitor.