Abstract
Compact coding has been widely applied to approximate nearest neighbor search for large-scale image retrieval, due to its computational efficiency and retrieval quality. This paper presents a compact coding solution with a focus on the deep learning-to-quantization approach, which improves retrieval quality through end-to-end representation learning and compact encoding, and has shown superior performance over hashing solutions for similarity retrieval.
We propose Deep Visual-Semantic Quantization (DVSQ), the first approach to learning deep quantization models from labeled image data together with the semantic information underlying general text domains. The main contribution lies in jointly learning deep visual-semantic embeddings and visual-semantic quantizers, using carefully designed hybrid networks and well-specified loss functions.
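As a rough illustration (not the paper's exact formulation), the following NumPy sketch shows one plausible form of such a visual-semantic loss: a pairwise hinge over inner products between a deep image embedding and semantic label embeddings. The function name, the fixed margin, and the exact hinge form are assumptions made purely for illustration.

import numpy as np

def visual_semantic_ranking_loss(z, label_embeddings, positives, margin=0.1):
    # Illustrative sketch, not the paper's loss.
    # z: (D,) deep image embedding; label_embeddings: (C, D) semantic
    # embeddings of the C labels; positives: (C,) boolean mask of the
    # labels attached to this image.
    scores = label_embeddings @ z          # (C,) inner products
    pos = scores[positives]                # scores of relevant labels
    neg = scores[~positives]               # scores of irrelevant labels
    # Pairwise hinge: every positive label should outscore every
    # negative label by at least `margin`.
    gaps = margin - pos[:, None] + neg[None, :]
    return np.maximum(0.0, gaps).sum()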
DVSQ enables efficient and effective image retrieval by supporting maximum inner-product search, computed over the learned codebooks via fast distance table lookups.
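To make the lookup mechanism concrete, here is a minimal sketch of quantization-based inner-product scoring, assuming the standard setup in which each database vector is approximated by a sum of M codewords, one drawn from each of M codebooks of K codewords; the names and shapes are illustrative, not the paper's implementation. Given a query, its inner products with all M×K codewords are precomputed once, after which each database item is scored with just M table lookups and additions.

import numpy as np

def build_lookup_table(query, codebooks):
    # codebooks: (M, K, D) -- M codebooks of K codewords each.
    # Returns (M, K): inner products of the query with every codeword.
    return np.einsum('d,mkd->mk', query, codebooks)

def score_database(codes, table):
    # codes: (N, M) -- codes[n, m] indexes the codeword picked from
    # codebook m for item n, so item n ~ sum_m codebooks[m, codes[n, m]].
    # Each item's approximate inner product with the query is a sum of
    # M table entries -- no full D-dimensional arithmetic per item.
    M = table.shape[0]
    return table[np.arange(M), codes].sum(axis=1)   # (N,)

# Toy usage: M=4 codebooks, K=256 codewords, D=300 dims, N=100000 items.
rng = np.random.default_rng(0)
codebooks = rng.standard_normal((4, 256, 300))
codes = rng.integers(0, 256, size=(100000, 4))
query = rng.standard_normal(300)
scores = score_database(codes, build_lookup_table(query, codebooks))
top10 = np.argsort(-scores)[:10]    # 10 largest approximate inner products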
Comprehensive empirical evidence shows that DVSQ can generate compact binary codes and yield state-of-the-art similarity retrieval performance on standard benchmarks.