Abstract
Learning visual classifiers for object recognition from weakly labeled data requires determining correspondence between image regions and semantic object classes. Most approaches use co-occurrence of “nouns” and image features over large datasets to determine the correspondence, but many correspondence ambiguities remain. We further constrain the correspondence problem by exploiting additional language constructs to improve the learning process from weakly labeled data. We consider both “prepositions” and “comparative adjectives” which are used to express relationships between objects. If the models of such relationships can be determined, they help resolve correspondence ambiguities. However, learning models of these relationships requires solving the correspondence problem. We simultaneously learn the visual features defining “nouns” and the differential visual features defining such “binary-relationships” using an EM-based approach.