Abstract
This paper aims to classify and locate objects accurately and efficiently, without using bounding box annotations. This is challenging because objects in the wild can appear at arbitrary locations and at different scales. We propose ProNet, a novel classification architecture based on convolutional neural networks. It uses computationally efficient neural networks to propose image regions that are likely to contain objects, and applies more powerful but slower networks to the proposed regions. The basic building block is a multi-scale fully-convolutional network which assigns object confidence scores to boxes at different locations and scales. We show that such networks can be trained effectively using image-level annotations, and can be connected into cascades or trees for efficient object classification. ProNet significantly outperforms the previous state of the art on the PASCAL VOC 2012 and MS COCO datasets for object classification and point-based localization.
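The propose-then-verify idea can be illustrated with a minimal sketch; it is not the ProNet implementation. It assumes square boxes on a fixed grid, hand-written stand-in scorers in place of the cheap and powerful networks, and a max over rescored proposals as the image-level score. All names (`enumerate_boxes`, `cascade_classify`, `cheap`, `strong`, `top_k`) are hypothetical and introduced only for illustration.

```python
from typing import Callable, List, Tuple

# (x, y, size): top-left corner and side length of a square box (assumed layout).
Box = Tuple[int, int, int]


def enumerate_boxes(width: int, height: int, sizes: List[int], stride: int) -> List[Box]:
    """List square boxes on a regular grid, one grid per scale."""
    boxes = []
    for size in sizes:
        for y in range(0, height - size + 1, stride):
            for x in range(0, width - size + 1, stride):
                boxes.append((x, y, size))
    return boxes


def cascade_classify(
    boxes: List[Box],
    cheap_score: Callable[[Box], float],
    strong_score: Callable[[Box], float],
    top_k: int,
) -> Tuple[float, Box]:
    """Score every box cheaply, then rescore only the top_k proposals.

    Returns the image-level score (max over rescored proposals, an
    assumption of this sketch) and the box that produced it.
    """
    if not boxes or top_k < 1:
        raise ValueError("need at least one box and top_k >= 1")
    proposals = sorted(boxes, key=cheap_score, reverse=True)[:top_k]
    rescored = [(strong_score(box), box) for box in proposals]
    return max(rescored, key=lambda pair: pair[0])


# Toy 8x8 "image": a bright 4x4 object in the bottom-right corner.
image = [[1.0 if x >= 4 and y >= 4 else 0.0 for x in range(8)] for y in range(8)]


def cheap(box: Box) -> float:
    """Stand-in for the fast network: looks at a single pixel near the box center."""
    x, y, size = box
    return image[y + size // 2][x + size // 2]


def strong(box: Box) -> float:
    """Stand-in for the slow network: averages every pixel in the box."""
    x, y, size = box
    total = sum(image[r][c] for r in range(y, y + size) for c in range(x, x + size))
    return total / (size * size)


boxes = enumerate_boxes(8, 8, sizes=[4, 8], stride=4)
score, box = cascade_classify(boxes, cheap, strong, top_k=2)
print(score, box)  # prints: 1.0 (4, 4, 4)
```

In this toy setting the expensive scorer runs on only two of the five candidate boxes, and the center of the winning box, here (6, 6), serves as a point-based location estimate.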