Abstract
Object detection is often conducted by object proposal generation and classifification sequentially. This paper handles object detection in a superpixel oriented manner instead of the proposal oriented. Specially, this paper takes object detection as a multi-label superpixel labeling problem by minimizing an energy function. It uses the data cost term to capture the appearance, smooth cost term to encode the spatial context and label cost term to favor compact detection. The data cost is learned through a convolutional neural network and the parameters in the labeling model are learned through a structural SVM. Compared with proposal generation and classifification based methods, the proposed superpixel labeling method can naturally detect objects missed by proposal generation step and capture the global image context to infer the overlapping objects. The proposed method shows its advantage in Pascal VOC and ImageNet. Notably, it performs better than the ImageNet ILSVRC2014 winner GoogLeNet (45.0% V.S. 43.9% in mAP) with much shallower and fewer CNNs.