Abstract
Current human-in-the-loop fine-grained visual categorization systems depend on a predefined vocabulary of attributes and parts, usually determined by experts. In this work, we move away from that expert-driven and attribute-centric paradigm and present a novel interactive classification system that incorporates computer vision and perceptual similarity metrics in a unified framework. At test time, users are asked to judge relative similarity between a query image and various sets of images; these general queries do not require expert-defined terminology and are applicable to other domains and basic-level categories, enabling a flexible, efficient, and scalable system for fine-grained categorization with humans in the loop. Our system outperforms existing state-of-the-art systems for relevance feedback-based image retrieval as well as interactive classification, resulting in a reduction of up to 43% in the average number of questions needed to correctly classify an image.