Abstract
Relative attribute models can compare images in terms
of all detected properties or attributes, exhaustively predicting which image is fancier, more natural, and so on, without
any regard to ordering. However, when humans compare
images, certain differences will naturally stick out and come
to mind first. These most noticeable differences, or prominent differences, are likely to be described first. In addition,
many differences, although present, may not be mentioned
at all. In this work, we introduce and model prominent differences, a rich new functionality for comparing images. We
collect instance-level annotations of most noticeable differences, and build a model trained on relative attribute features that predicts prominent differences for unseen pairs.
We test our model on the challenging UT-Zap50K shoes and
LFW10 faces datasets, and outperform an array of baseline
methods. We then demonstrate how our prominence model
improves two vision tasks, image search and description
generation, enabling more natural communication between
people and vision systems.