Abstract.
Given a set of images of scenes containing multiple object categories (e.g. grass, roads, buildings), our objective is to discover these objects in each image in an unsupervised manner, and to use this object distribution to perform scene classification. We achieve this discovery using probabilistic Latent Semantic Analysis (pLSA), a generative model from the statistical text literature, here applied to a bag of visual words representation for each image. The scene classification on the object distribution is carried out by a k-nearest neighbour classifier. We investigate the classification performance under changes in the visual vocabulary and number of latent topics learnt, and develop a novel vocabulary using colour SIFT descriptors. Classification performance is compared to the supervised approaches of Vogel & Schiele [19] and Oliva & Torralba [11], and the semi-supervised approach of Fei-Fei & Perona [3], using their own datasets and testing protocols. In all cases the combination of (unsupervised) pLSA followed by (supervised) nearest neighbour classification achieves superior results. We show applications of this method to image retrieval with relevance feedback and to scene classification in videos.
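The pipeline summarised above — fit pLSA topics on bag-of-words histograms, fold test images in with the topic-word distributions held fixed, then classify by k-nearest neighbour on the per-image topic distributions — can be sketched as follows. This is a minimal illustration on synthetic word counts, not the paper's implementation; all function names, the toy data, and the EM iteration counts are invented for this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def plsa(counts, n_topics, n_iter=100):
    """Fit pLSA by EM on a (docs x words) count matrix.
    Returns P(z|d) of shape (docs, topics) and P(w|z) of shape (topics, words)."""
    n_docs, n_words = counts.shape
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: P(z|d,w) proportional to P(z|d) * P(w|z), shape (docs, words, topics)
        post = p_z_d[:, None, :] * p_w_z.T[None, :, :]
        post /= post.sum(axis=2, keepdims=True) + 1e-12
        # M-step: re-estimate from expected counts n(d,w) * P(z|d,w)
        weighted = counts[:, :, None] * post
        p_w_z = weighted.sum(axis=0).T
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = weighted.sum(axis=1)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_z_d, p_w_z

def fold_in(counts, p_w_z, n_iter=50):
    """Estimate P(z|d) for unseen documents with P(w|z) held fixed ('fold-in')."""
    p_z_d = rng.random((counts.shape[0], p_w_z.shape[0]))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        post = p_z_d[:, None, :] * p_w_z.T[None, :, :]
        post /= post.sum(axis=2, keepdims=True) + 1e-12
        p_z_d = (counts[:, :, None] * post).sum(axis=1)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_z_d

def knn_predict(train_z, train_y, test_z, k=3):
    """Majority vote over the k nearest training topic distributions."""
    d = np.linalg.norm(test_z[:, None, :] - train_z[None, :, :], axis=2)
    nn = np.argsort(d, axis=1)[:, :k]
    return np.array([np.bincount(train_y[i]).argmax() for i in nn])

# Toy "scenes": class 0 images use visual words 0-4, class 1 use words 5-9.
def make_doc(cls):
    counts = np.zeros(10)
    counts[slice(0, 5) if cls == 0 else slice(5, 10)] = rng.integers(5, 15, size=5)
    return counts

train_y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
train = np.array([make_doc(c) for c in train_y])
test = np.array([make_doc(0), make_doc(1)])

train_z, p_w_z = plsa(train, n_topics=2)   # unsupervised topic discovery
test_z = fold_in(test, p_w_z)              # topic distribution for unseen images
pred = knn_predict(train_z, train_y, test_z)
print(pred)
```

With two latent topics and clearly separated word usage, the learnt topics align with the two classes and the k-NN step recovers the labels of the held-out documents from their topic distributions alone.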