Abstract

The web holds tremendous potential as a source of training data for visual classification. However, web images must be correctly indexed and labeled before this potential can be realized. Accordingly, there has been considerable recent interest in collecting imagery from the web using image search engines to build databases for object and scene recognition research. While search engines can provide rough sets of image data, results are noisy, and this leads to problems when training classifiers. In this paper we propose a semi-supervised model for automatically collecting clean example imagery from the web. Our approach includes both visual and textual web data in a unified framework. Minimal supervision is enabled by the selective use of generative and discriminative elements in a probabilistic model and a novel learning algorithm. We show through experiments that our model discovers good training images from the web with minimal manual work. Classifiers trained using our method significantly outperform analogous baseline approaches on the Caltech-256 dataset.