Abstract
An analysis of different techniques for recognizing and
detecting objects under extreme scale variation is presented. Scale-specific and scale-invariant detector designs are compared by training them with different
configurations of input data. By evaluating the performance
of different network architectures for classifying small objects on ImageNet, we show that CNNs are not robust to
changes in scale. Based on this analysis, we propose to
train and test detectors on the same scales of an image pyramid. Since small and large objects are difficult to recognize at smaller and larger scales, respectively, we present
a novel training scheme called Scale Normalization for Image Pyramids (SNIP) which selectively back-propagates the
gradients of object instances of different sizes as a function
of the image scale. On the COCO dataset, our single model
achieves an mAP of 45.7% and an ensemble of 3 networks obtains an mAP of 48.3%. We use off-the-shelf ImageNet-1000
pre-trained models and only train with bounding box supervision. Our submission won the Best Student Entry in
the COCO 2017 challenge. Code will be made available at
http://bit.ly/2yXVg4c
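
The selective back-propagation described above can be illustrated with a minimal sketch. It is not the released implementation. It assumes that each pyramid scale has a valid range of object sizes, measured after resizing, and that only objects inside that range contribute to the loss. The names `VALID_RANGES`, `box_size`, `gradient_mask`, and `masked_loss`, as well as the numeric ranges, are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical valid object-size ranges (pixels, measured after resizing),
# keyed by pyramid scale. These values are illustrative assumptions only.
VALID_RANGES = {
    0.5: (120.0, math.inf),  # low resolution: keep only large objects
    1.0: (40.0, 160.0),      # medium resolution: keep mid-sized objects
    2.0: (0.0, 80.0),        # high resolution: keep only small objects
}


def box_size(box):
    """Square root of the box area for a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return math.sqrt(max(x2 - x1, 0.0) * max(y2 - y1, 0.0))


def gradient_mask(boxes, scale, valid_ranges=VALID_RANGES):
    """Return one boolean per box: True if it should receive gradients
    at this pyramid scale, i.e. its resized size is in the valid range."""
    lo, hi = valid_ranges[scale]
    return [lo <= box_size(b) * scale <= hi for b in boxes]


def masked_loss(per_box_losses, mask):
    """Average the losses of the boxes selected by the mask.
    Masked-out boxes contribute nothing, so no gradient flows from them."""
    kept = [loss for loss, keep in zip(per_box_losses, mask) if keep]
    return sum(kept) / len(kept) if kept else 0.0


if __name__ == "__main__":
    boxes = [(0, 0, 20, 20), (0, 0, 100, 100), (0, 0, 400, 400)]
    for scale in sorted(VALID_RANGES):
        print(scale, gradient_mask(boxes, scale))
```

In this sketch each object is trained on only at the pyramid scale where its resized size falls in the assumed range, so the small box is used at the largest scale and the large box at the smallest.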