Abstract
In this paper we investigate a new method of learning part-based models for visual object recognition, from training data that only provides information about class membership (and not object location or configuration). This method learns both a model of local part appearance and a model of the spatial relations between those parts. In contrast, other work using such a weakly supervised learning paradigm has not considered the problem of simultaneously learning appearance and spatial models. Some of these methods use a “bag” model where only part appearance is considered whereas other methods learn spatial models but only given the output of a particular feature detector. Previous techniques for learning both part appearance and spatial relations have instead used a highly supervised learning process that provides substantial information about object part location. We show that our weakly supervised technique produces better results than these previous highly supervised methods. Moreover, we investigate the degree to which both richer spatial models and richer appearance models are helpful in improving recognition performance. Our results show that while both spatial and appearance information can be useful, the effect on performance depends substantially on the particular object class and on the difficulty of the test dataset.