Abstract
We attack the problem of learning concepts automatically from noisy Web image search results. The idea is to discover common characteristics shared among subsets of images, using a method that organises the data while eliminating irrelevant instances. We propose a novel clustering and outlier detection method, namely Concept Map (CMAP). Given an image collection returned for a concept query, CMAP provides clusters pruned of outliers. Each cluster is used to train a model representing a different characteristic of the concept. The proposed method outperforms state-of-the-art studies on the task of learning from noisy Web data for low-level attributes as well as high-level object categories. It is also competitive with supervised methods in learning scene concepts. Moreover, results on naming faces support the generalisation capability of the CMAP framework to different domains. CMAP is capable of working at large scale with no supervision by exploiting the available sources.