Abstract
CNNs have made an undeniable impact on computer vision through their ability to learn high-capacity models with large annotated training sets. One of their remarkable properties is the ability to transfer knowledge from a large source dataset to a (typically smaller) target dataset. This is usually accomplished through fine-tuning a fixed-size network on new target data. Indeed, virtually every contemporary visual recognition system makes use of fine-tuning to transfer knowledge from ImageNet. In this work, we analyze what components and parameters change during fine-tuning, and discover that increasing model capacity allows for more natural model adaptation through fine-tuning. By making an analogy to developmental learning, we demonstrate that “growing” a CNN with additional units, either by widening existing layers or deepening the overall network, significantly outperforms classic fine-tuning approaches. But in order to properly grow a network, we show that newly-added units must be appropriately normalized to allow for a pace of learning that is consistent with existing units. We empirically validate our approach on several benchmark datasets, producing state-of-the-art results.
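The sketch below is not the authors' released code; it is a minimal illustration of the widening-plus-normalization idea from the abstract, written in PyTorch. The layer sizes, the helper name widen_linear, and the rescaling rule (matching the average weight norm of the pre-trained units) are illustrative assumptions, not the paper's exact procedure.

```python
import torch
import torch.nn as nn


def widen_linear(old: nn.Linear, extra_units: int) -> nn.Linear:
    """Return a wider copy of `old` with `extra_units` additional output units."""
    new = nn.Linear(old.in_features, old.out_features + extra_units)
    with torch.no_grad():
        # Copy the pre-trained units unchanged.
        new.weight[: old.out_features] = old.weight
        new.bias[: old.out_features] = old.bias
        # Rescale the randomly initialized new units so their weight norms match
        # the average norm of the existing units (an assumed normalization rule),
        # keeping the pace of learning consistent between old and new units.
        old_norm = old.weight.norm(dim=1).mean()
        fresh = new.weight[old.out_features:]
        fresh *= old_norm / fresh.norm(dim=1, keepdim=True)
        new.bias[old.out_features:] = 0.0
    return new


# Example: widen a pre-trained 4096-unit fully-connected layer by 1024 units
# before fine-tuning the whole network on the (smaller) target dataset.
fc = nn.Linear(4096, 4096)          # stand-in for a pre-trained layer
wider_fc = widen_linear(fc, extra_units=1024)
print(wider_fc.weight.shape)        # torch.Size([5120, 4096])
```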