Abstract. We present a scalable approach for Detecting Objects by
transferring Common-sense Knowledge (DOCK) from source to target
categories. In our setting, the training data for the source categories
have bounding-box annotations, while those for the target categories have only
have image-level annotations. Current state-of-the-art approaches focus
on image-level visual or semantic similarity to adapt a detector trained
on the source categories to the new target categories. In contrast, our key
idea is to (i) use similarity not at the image level, but rather at the region
level, and (ii) leverage richer common-sense knowledge (based on attributes,
spatial relations, etc.) to guide the algorithm towards learning the correct
detections.
We acquire such common-sense cues automatically from readily available
knowledge bases without any extra human effort. On the challenging MS
COCO dataset, we find that common-sense knowledge can substantially
improve detection performance over existing transfer-learning baselines.