Abstract
Transferring the knowledge learned from large-scale datasets (e.g., ImageNet) via fine-tuning offers an effective solution for domain-specific fine-grained visual categorization (FGVC) tasks (e.g., recognizing bird species or car make and model). In such scenarios, data annotation often calls for specialized domain knowledge and is therefore difficult to scale. In this work, we first tackle a large-scale FGVC problem: our method won first place in the iNaturalist 2017 large-scale species classification challenge. Central to the success of our approach is a training scheme that uses higher image resolution and deals with the long-tailed distribution of the training data. Next, we study transfer learning via fine-tuning from large-scale datasets to small-scale, domain-specific FGVC datasets. We propose a measure to estimate domain similarity via Earth Mover's Distance and demonstrate that transfer learning benefits from pre-training on a source domain that is similar to the target domain by this measure. Our proposed transfer learning approach outperforms ImageNet pre-training and obtains state-of-the-art results on multiple commonly used FGVC datasets.
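The Earth Mover's Distance (EMD) based domain similarity can be illustrated with a small sketch. This is not the authors' implementation; it rests on assumptions stated here: each class is summarized by one feature vector (e.g., a mean image embedding) weighted by its number of images, the ground distance is Euclidean, EMD is solved exactly with a tiny min-cost-flow routine, and distance is mapped to similarity as exp(-gamma * EMD) with an assumed default gamma = 0.01. The function names `emd` and `domain_similarity` are hypothetical.

```python
import math


def _euclidean(a, b):
    # Ground distance between two class-level feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def emd(src_feats, src_w, tgt_feats, tgt_w, eps=1e-12):
    """Exact EMD between two weighted point sets via min-cost flow.

    Weights are normalized to sum to 1, so the returned total transport
    cost is the EMD (total flow is 1)."""
    ws = [w / sum(src_w) for w in src_w]
    wt = [w / sum(tgt_w) for w in tgt_w]
    m, n = len(ws), len(wt)
    S, T, N = 0, m + n + 1, m + n + 2  # source, sink, node count
    graph = [[] for _ in range(N)]  # edge: [to, capacity, cost, reverse index]

    def add_edge(u, v, cap, cost):
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0.0, -cost, len(graph[u]) - 1])

    for i in range(m):
        add_edge(S, 1 + i, ws[i], 0.0)  # source supplies class mass
        for j in range(n):
            add_edge(1 + i, 1 + m + j, math.inf,
                     _euclidean(src_feats[i], tgt_feats[j]))
    for j in range(n):
        add_edge(1 + m + j, T, wt[j], 0.0)  # target classes absorb mass

    total_cost, remaining = 0.0, 1.0
    while remaining > eps:
        # Bellman-Ford shortest path (residual edges may have negative cost).
        dist, prev = [math.inf] * N, [None] * N
        dist[S] = 0.0
        for _ in range(N - 1):
            updated = False
            for u in range(N):
                if dist[u] == math.inf:
                    continue
                for k, (v, cap, cost, _) in enumerate(graph[u]):
                    if cap > eps and dist[u] + cost < dist[v] - 1e-15:
                        dist[v], prev[v] = dist[u] + cost, (u, k)
                        updated = True
            if not updated:
                break
        if dist[T] == math.inf:
            break  # no augmenting path left
        # Bottleneck capacity along the shortest path.
        push, v = remaining, T
        while v != S:
            u, k = prev[v]
            push = min(push, graph[u][k][1])
            v = u
        # Augment flow and update residual capacities.
        v = T
        while v != S:
            u, k = prev[v]
            edge = graph[u][k]
            edge[1] -= push
            graph[v][edge[3]][1] += push
            v = u
        total_cost += push * dist[T]
        remaining -= push
    return total_cost


def domain_similarity(source, target, gamma=0.01):
    """source/target: lists of (class_feature, num_images); gamma is assumed."""
    d = emd([f for f, _ in source], [c for _, c in source],
            [f for f, _ in target], [c for _, c in target])
    return math.exp(-gamma * d)
```

For example, a source domain with one class at feature (0, 0) and a target domain with one class at (3, 4) are separated by an EMD of 5, giving a similarity of exp(-0.05). A pure-Python solver keeps the sketch dependency-free; with hundreds of classes, a dedicated optimal-transport library would be the practical choice.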