Abstract
Learning to hash involves learning hash functions from a set of images for embedding high-dimensional visual descriptors into a similarity-preserving low-dimensional Hamming space. Most existing methods resort to a single representation of images; that is, only one type of visual descriptor is used to learn a hash function that assigns binary codes to images. However, images are often described by multiple different visual descriptors (such as SIFT, GIST, and HOG), so it is desirable to incorporate these multiple representations into the learning of a hash function, leading to multi-view hashing. In this paper we present a sequential spectral learning approach to multi-view hashing, where a hash function is determined sequentially by solving successive maximizations of local variances subject to decorrelation constraints. We compute multi-view local variances by α-averaging view-specific distance matrices, such that the best averaged distance matrix is determined by minimizing its α-divergence from the view-specific distance matrices. We also present a scalable implementation, exploiting a fast approximate k-NN graph construction method, in which α-averaged distances computed in small partitions determined by recursive spectral bisection are gradually merged in conquer steps until all examples are covered. Numerical experiments on the Caltech-256, CIFAR-20, and NUS-WIDE datasets confirm the high performance of our method in comparison with single-view spectral hashing as well as existing multi-view hashing methods.
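The α-averaging of view-specific distance matrices can be sketched as Amari-style α-integration, a minimal illustration only: the function name, uniform default weights, and the ε-guard against zero distances are assumptions for this sketch, not the paper's implementation.

```python
import numpy as np

def alpha_mean(distance_mats, weights=None, alpha=-1.0, eps=1e-12):
    """Alpha-integration of view-specific distance matrices.

    Uses f_alpha(d) = d**((1 - alpha) / 2) for alpha != 1 (log for alpha == 1);
    the alpha-mean is f_alpha^{-1} of the weighted sum of f_alpha(D_v).
    alpha = -1 recovers the arithmetic mean, alpha = 3 the harmonic mean.
    """
    mats = np.stack(distance_mats) + eps   # shape (V, n, n); eps guards zeros
    V = mats.shape[0]
    w = np.full(V, 1.0 / V) if weights is None else np.asarray(weights, float)
    if np.isclose(alpha, 1.0):
        # alpha = 1: geometric-type mean via log/exp
        return np.exp(np.tensordot(w, np.log(mats), axes=1))
    p = (1.0 - alpha) / 2.0
    return np.tensordot(w, mats ** p, axes=1) ** (1.0 / p)
```

Varying α interpolates between averaged distance matrices that emphasize large view-specific distances (arithmetic-like) and those dominated by small ones (harmonic-like), which is what makes the choice of α a tunable way to fuse heterogeneous views.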