Abstract
Recent studies show that large-scale sketch-based image retrieval (SBIR) can be efficiently tackled by cross-modal binary representation learning, where Hamming-distance matching significantly speeds up similarity search. Given training and test data drawn from a fixed set of pre-defined categories, cutting-edge SBIR and cross-modal hashing methods achieve acceptable retrieval performance. However, most existing methods fail when the categories of the query sketches have never been seen during training.
In this paper, we formulate the above problem as a novel but realistic zero-shot SBIR hashing task. We elaborate on the challenges of this task and accordingly propose a zero-shot sketch-image hashing (ZSIH) model. We build an end-to-end three-network architecture, two networks of which serve as binary encoders. The third network mitigates the sketch-image heterogeneity and enhances the semantic relations among data by using a Kronecker fusion layer and graph convolution, respectively. As an important part of ZSIH, we formulate a generative hashing scheme that reconstructs semantic knowledge representations for zero-shot retrieval. To the best of our knowledge, ZSIH is the first zero-shot hashing work suitable for SBIR and cross-modal search. Comprehensive experiments are conducted on two extended datasets, i.e., Sketchy and TU-Berlin, with a novel zero-shot train-test split. The proposed model remarkably outperforms related works.