Abstract. Object detection is one of the major problems in computer
vision and has been extensively studied. Most existing detection
methods rely on labor-intensive supervision, such as ground-truth bounding boxes of objects or at least image-level annotations. In contrast, we propose an object detection method that does not require any
form of human annotation on target tasks, by exploiting freely available web images. In order to facilitate effective knowledge transfer from
web images, we introduce a multi-instance multi-label domain adaptation
learning framework with two key innovations. First, we propose an
instance-level adversarial domain adaptation network with attention on
foreground objects to transfer object appearances from the web domain
to the target domain. Second, to preserve the class-specific semantic structure of the transferred object features, we propose a simultaneous transfer
mechanism to transfer the supervision across domains through pseudo
strong label generation. With our end-to-end framework that simultaneously learns a weakly supervised detector and transfers knowledge across
domains, we achieve significant improvements over baseline methods on
the benchmark datasets.