Abstract
We describe a new method for unsupervised structure learning of a hierarchical compositional model (HCM) for deformable objects. The learning is unsupervised in the sense that we are given a training dataset of images containing the object in cluttered backgrounds, but we do not know the position or boundary of the object. Structure learning is performed by a bottom-up process followed by a top-down process. The bottom-up process is a novel form of hierarchical clustering that recursively composes proposals for simple structures to generate proposals for more complex structures. We combine standard clustering with the suspicious coincidence principle and the competitive exclusion principle to prune the proposals to a practical number and avoid an exponential explosion of possible structures. The hierarchical clustering stops automatically when it fails to generate new proposals, and it outputs a proposal for the object model. The top-down process validates the proposals and fills in missing elements. We tested our approach by using it to learn a hierarchical compositional model for parsing and segmenting horses on the Weizmann dataset. We show that the resulting model is comparable with (or better than) alternative methods. The versatility of our approach is demonstrated by learning models for other objects (e.g., faces, pianos, butterflies, and monitors). It is worth noting that the lower levels of the object hierarchies automatically learn generic image features, while the higher levels learn object-specific features.