Abstract
Weakly supervised learning of object detection is an im-portant problem in image understanding that still does not have a satisfactory solution. In this paper, we address thisproblem by exploiting the power of deep convolutional neural networks pre-trained on large-scale image-level classification tasks. We propose a weakly supervised deep detection architecture that modifies one such network to operate at the level of image regions, performing simultaneously re-gion selection and classification. Trained as an image clas-sifier, the architecture implicitly learns object detectors thaare better than alternative weakly supervised detection systems on the PASCAL VOC data. The model, which is a simple and elegant end-to-end architecture, outperforms standard data augmentation and fine-tuning techniques for the task of image-level classification as well.