Learning K-way D-dimensional Discrete Embedding for Hierarchical Data
Visualization and Retrieval
Abstract
Traditional embedding approaches associate a real-valued embedding vector with each symbol or data
point, which is equivalent to applying a linear transformation to a “one-hot” encoding of discrete symbols or data objects. Despite their simplicity, these methods generate storage-inefficient representations and
fail to effectively encode the internal semantic
structure of data, especially when the number of
symbols or data points and the dimensionality of
the real-valued embedding vectors are large. In
this paper, we propose a regularized autoencoder
framework to learn compact Hierarchical K-way
D-dimensional (HKD) discrete embeddings of symbols or data points, aiming to capture the essential semantic structure of data. Experimental results on
synthetic and real-world datasets show that our proposed HKD embedding can effectively reveal the
semantic structure of data via hierarchical data visualization and greatly reduce the search space of
nearest neighbor retrieval while preserving high accuracy.
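
As a minimal illustration of two points made in the abstract, the sketch below first checks that looking up a row of an embedding table gives the same vector as multiplying a one-hot encoding by that table, and then compares per-symbol storage for a real-valued vector and a discrete code. It is not the HKD method itself. The table W, the helper names, and the sizes (d = 128, K = 16, D = 8, 32-bit floats) are hypothetical, and the sketch assumes that a K-way D-dimensional code consists of D components, each taking one of K values.

```python
import math

def one_hot(index, size):
    """Return a one-hot list of length `size` with 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def linear(vec, matrix):
    """Multiply a row vector of length N by an N x d matrix (list of rows)."""
    cols = len(matrix[0])
    return [sum(v * row[j] for v, row in zip(vec, matrix)) for j in range(cols)]

# Hypothetical toy embedding table: N = 4 symbols, d = 3 real-valued dimensions.
W = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [1.0, 1.1, 1.2],
]

symbol = 2
lookup = W[symbol]                               # direct table lookup
projected = linear(one_hot(symbol, len(W)), W)   # one-hot times the table

# The two routes yield the same embedding vector.
assert all(math.isclose(a, b) for a, b in zip(lookup, projected))

def real_valued_bits(d, bits_per_float=32):
    """Bits needed to store one d-dimensional real-valued embedding."""
    return d * bits_per_float

def discrete_code_bits(K, D):
    """Bits needed to store one code with D components, each one of K values."""
    return D * math.ceil(math.log2(K))

# Hypothetical sizes, chosen only for illustration.
print(real_valued_bits(128))      # storage for a 128-dimensional float32 vector
print(discrete_code_bits(16, 8))  # storage for a 16-way 8-dimensional code
```

Under these assumed sizes the discrete code takes far fewer bits per symbol than the real-valued vector. This count covers only the per-symbol codes and ignores any shared parameters needed to decode them.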