Abstract. State-of-the-art visual perception models for a wide range
of tasks rely on supervised pretraining. ImageNet classification is the de
facto pretraining task for these models. Yet, ImageNet is now nearly ten
years old and is by modern standards “small”. Even so, relatively little is
known about the behavior of pretraining with datasets that are multiple
orders of magnitude larger. The reasons are obvious: such datasets are
difficult to collect and annotate. In this paper, we present a unique study
of transfer learning with large convolutional networks trained to predict
hashtags on billions of social media images. Our experiments demonstrate
that training for large-scale hashtag prediction leads to excellent
results. We show improvements on several image classification and object
detection tasks, and report the highest ImageNet-1k single-crop, top-1
accuracy to date: 85.4% (97.6% top-5). We also perform extensive
experiments that provide novel empirical data on the relationship between
large-scale pretraining and transfer learning performance.