Abstract
The scarcity of labeled training data relative to the high dimensionality of multi-modal features is one of the major obstacles to semantic concept classification of images and videos. Semi-supervised learning leverages the large amount of unlabeled data to develop effective classifiers. Feature subspace learning finds optimal feature subspaces for representing data and aiding classification. In this paper, we present a novel algorithm, Locality Preserving Semi-supervised Support Vector Machines (LPSSVM), to jointly learn an optimal feature subspace and a large-margin SVM classifier. Over both labeled and unlabeled data, an optimal feature subspace is learned that maintains the smoothness of local neighborhoods while remaining discriminative for classification. Simultaneously, an SVM classifier is optimized in the learned feature subspace to have a large margin. The resulting classifier can be readily applied to unseen test data. Additionally, we show that the LPSSVM algorithm can be used in a Reproducing Kernel Hilbert Space for nonlinear classification. We extensively evaluate the proposed algorithm over four types of data sets: a toy problem, two UCI data sets, the Caltech 101 data set for image classification, and the challenging Kodak consumer video data set for semantic concept detection. Promising results are obtained that clearly confirm the effectiveness of the proposed method.