Abstract
Sliding window classifiers are among the most successful and widely applied techniques for object localization. However, training is typically done in a way that is not specific to the localization task: first a binary classifier is trained using a sample of positive and negative examples, and this classifier is subsequently applied to multiple regions within test images. We propose instead to treat object localization in a principled way by posing it as a problem of predicting structured data: we model the problem not as binary classification, but as the prediction of the bounding box of objects located in images. The use of a joint-kernel framework allows us to formulate the training procedure as a generalization of an SVM, which can be solved efficiently. We further improve computational efficiency by using a branch-and-bound strategy for localization during both training and testing. Experimental evaluation on the PASCAL VOC and TU Darmstadt datasets shows that the structured training procedure improves performance over binary training as well as over the best previously published scores.
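To make the structured view concrete, the following minimal sketch treats localization as choosing the box that maximizes a score over all candidate boxes, rather than classifying each window independently. It rests on several assumptions that are not taken from the paper: the box score is the sum of hypothetical per-pixel scores, the search is exhaustive (a stand-in for the branch-and-bound strategy), and the names `integral_image`, `box_score`, and `predict_box` are illustrative only.

```python
import numpy as np

def integral_image(scores):
    # Pad with a zero row and column so box sums need no boundary checks.
    ii = np.zeros((scores.shape[0] + 1, scores.shape[1] + 1))
    ii[1:, 1:] = scores.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_score(ii, top, left, bottom, right):
    # Sum of scores over rows top..bottom and columns left..right (inclusive).
    return (ii[bottom + 1, right + 1] - ii[top, right + 1]
            - ii[bottom + 1, left] + ii[top, left])

def predict_box(scores):
    # Structured prediction: return the argmax over all boxes of the box score.
    # Assumption: exhaustive search, used here only for clarity.
    ii = integral_image(scores)
    h, w = scores.shape
    best, best_box = -np.inf, None
    for top in range(h):
        for bottom in range(top, h):
            for left in range(w):
                for right in range(left, w):
                    s = box_score(ii, top, left, bottom, right)
                    if s > best:
                        best, best_box = s, (top, left, bottom, right)
    return best_box, best

# Hypothetical per-pixel scores: a positive 3x3 "object" on a negative background.
scores = -np.ones((6, 8))
scores[2:5, 3:6] = 2.0
box, value = predict_box(scores)
print(box, value)
```

In this toy setting the output is a single box in (top, left, bottom, right) form, which is the kind of structured output the training procedure is designed to predict. The exhaustive loop visits every box, so it scales poorly with image size; it only illustrates the argmax that an efficient search would compute.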