Abstract
Latent subcategory models (LSMs) offer significant improvements over linear support vector machines (SVMs). Training LSMs is challenging, however, because the objective function may have a large number of local optima and because the increased model complexity requires large training sets. Larger datasets are often available only as a collection of heterogeneous datasets, and previous work has highlighted the danger of simply training a model on the combined datasets, due to the presence of dataset bias. In this paper, we present a model that jointly learns an LSM for each dataset as well as a compound LSM. The method provides a means to borrow statistical strength from the datasets while reducing their inherent bias. In experiments we demonstrate that the compound LSM, when tested on PASCAL, LabelMe, Caltech101 and SUN09 in a leave-one-dataset-out fashion, achieves an average improvement of more than 6.5% over a previous SVM-based undoing-bias approach and an average improvement of more than 8.5% over a standard LSM trained on the concatenation of the datasets.