Abstract
The problem of ranking a set of visual samples according to their relevance to a query plays an important role in computer vision. The traditional approach for ranking is to train a binary classifier such as a support vector machine (svm). Binary classifiers suffer from two main deficiencies: (i) they do not optimize a ranking-based loss function, for example, the average precision (ap) loss; and (ii) they cannot incorporate high-order information such as the a priori correlation between the rele- vance of two visual samples (for example, two persons in the same image tend to perform the same action). We propose two novel learning formu- lations that allow us to incorporate high-order information for ranking. The first framework, called high-order binary svm (hob-svm), allows for a structured input. The parameters of hob-svm are learned by minimiz- ing a convex upper bound on a surrogate 0-1 loss function. In order to obtain the ranking of the samples that form the structured input, hob- svm sorts the samples according to their max-marginals. The second framework, called high-order average precision svm (hoap-svm), also allows for a structured input and uses the same ranking criterion. How- ever, in contrast to hob-svm, the parameters of hoap-svm are learned by minimizing a difference-of-convex upper bound on the ap loss. Using a standard, publicly available dataset for the challenging problem of action classification, we show that both hob-svm and hoap-svm outperform the baselines that ignore high-order information.