Abstract
We present a discriminatively trained model for joint modelling of object class labels (e.g. "person", "dog", "chair", etc.) and their visual attributes (e.g. "has head", "furry", "metal", etc.). We treat attributes of an object as latent variables in our model and capture the correlations among attributes using an undirected graphical model built from training data. The advantage of our model is that it allows us to infer object class labels using the information of both the test image itself and its (latent) attributes. Our model unifies object class prediction and attribute prediction in a principled framework. It is also flexible enough to deal with different performance measurements. Our experimental results provide quantitative evidence that attributes can improve object naming.