Abstract
We consider image retrieval with structured object queries: queries that specify the objects that should be present in the scene and their spatial relations. An example of such a query is “car on the road”. Existing image retrieval systems typically consider queries consisting of object classes (i.e. keywords). They train a separate classifier for each object class and combine the outputs heuristically. In contrast, we develop a learning framework that jointly considers object classes and their relations. Our method considers not only the objects in the query (“car” and “road” in the above example), but also related object categories that can be useful for retrieval. Since we do not have ground-truth labeling of object bounding boxes on the test image, we represent them as latent variables in our model. Our learning method is an extension of the ranking SVM with latent variables, which we call latent ranking SVM. We demonstrate image retrieval and ranking results on a dataset with more than a hundred object classes.