Abstract
We address the problem of weakly supervised object localization where only image-level annotations are available for training. Many existing approaches tackle this problem through object proposal mining. However, a substantial amount of noise in object proposals causes ambiguities for learning discriminative object models. Such approaches are sensitive to model initialization and often converge to an undesirable local minimum. In this paper, we address this problem by progressive domain adaptation with two main steps: classification adaptation and detection adaptation. In classification adaptation, we transfer a pre-trained network to our multi-label classification task for recognizing the presence of a certain object in an image. In detection adaptation, we first use a mask-out strategy to collect class-specific object proposals and apply multiple instance learning to mine confident candidates. We then use these selected object proposals to fine-tune all the layers, resulting in a fully adapted detection network. We extensively evaluate the localization performance on the PASCAL VOC and ILSVRC datasets and demonstrate significant performance improvement over the state-of-the-art methods.
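To make the mask-out idea concrete, the following is a minimal sketch and not the implementation used in this work. It assumes that a proposal is scored by how much the class score drops once the proposal region is blanked out. The function `mask_out_scores`, the `fill_value` parameter, and the toy scorer `toy_class_score` (mean intensity standing in for a classification network) are all hypothetical names introduced for illustration.

```python
import numpy as np

def mask_out_scores(image, proposals, class_score, fill_value=0.0):
    """Return, for each proposal, the drop in class score when it is masked out.

    Assumption: a large drop suggests the proposal covers the object.
    proposals are (x0, y0, x1, y1) boxes with exclusive upper bounds.
    """
    base = class_score(image)
    drops = []
    for (x0, y0, x1, y1) in proposals:
        masked = image.copy()
        masked[y0:y1, x0:x1] = fill_value  # blank out the proposal region
        drops.append(base - class_score(masked))
    return np.array(drops)

def toy_class_score(image):
    # Hypothetical stand-in for a classification network: mean intensity.
    return float(image.mean())

# Toy image: a bright 3x3 "object" on a dark 8x8 background.
image = np.zeros((8, 8))
image[2:5, 2:5] = 1.0

proposals = [(2, 2, 5, 5), (5, 5, 8, 8), (0, 0, 3, 3)]
drops = mask_out_scores(image, proposals, toy_class_score)
best = proposals[int(np.argmax(drops))]
```

In this toy setting the proposal that exactly covers the bright region produces the largest drop, the proposal over empty background produces none, and the partially overlapping proposal falls in between. In practice `class_score` would be the adapted classification network's score for a given class, and the resulting proposals would still be noisy, which is why a mining step such as multiple instance learning follows.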