Abstract
In most state-of-the-art hashing-based visual search systems, the local descriptors of an image are first aggregated into a single feature vector. This feature vector is
then subjected to a hashing function that produces a binary hash code. In previous work, the aggregating and the
hashing processes are designed independently. In this paper, we propose a novel framework where feature aggregating and hashing are designed simultaneously and optimized
jointly. Specifically, our joint optimization produces aggregated representations that can be reconstructed more accurately from
binary codes. This leads to more discriminative binary hash codes and improved retrieval accuracy. In addition, we propose a fast version of the recently proposed
Binary Autoencoder for use in our framework. We perform extensive retrieval experiments on several benchmark datasets with both SIFT and convolutional
features. Our results suggest that the proposed framework
achieves significant improvements over the state of the art.