Embedding-based Representation of Categorical Data by
Hierarchical Value Coupling Learning
Abstract
Learning the representation of categorical data with
hierarchical value coupling relationships is very
challenging but critical for the effective analysis
and learning of such data. This paper proposes a
novel coupled unsupervised categorical data representation (CURE) framework and its instantiation,
i.e., a coupled data embedding (CDE) method, for
representing categorical data by hierarchical value-to-value cluster coupling learning. Unlike existing embedding- and similarity-based representation
methods which can capture only a part or none of
these complex couplings, CDE explicitly incorporates the hierarchical couplings into its embedding
representation. CDE first learns two complementary feature value couplings which are then used to
cluster values with different granularities. It further models the couplings in value clusters within
the same granularity and with different granularities to embed feature values into a new numerical
space with independent dimensions. Extensive
experiments show that CDE significantly outperforms three popular unsupervised embedding methods and three state-of-the-art similarity-based representation methods.
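
To make the idea of a value coupling concrete, the following minimal sketch estimates one simple co-occurrence-based coupling between feature values. It is an illustration under stated assumptions, not the coupling functions used by CDE: the function name value_couplings and the use of a plain conditional co-occurrence frequency are hypothetical choices made here. The later stages described above (clustering values at several granularities and embedding them) are not shown.

```python
from collections import Counter
from itertools import combinations


def value_couplings(rows):
    """Return (values, matrix), where matrix[i][j] estimates P(values[j] | values[i]).

    Hypothetical helper for illustration only; it is not the paper's
    coupling function. Each row is a tuple of categorical feature values.
    """
    value_count = Counter()  # how many objects contain each feature value
    pair_count = Counter()   # how many objects contain both values of a pair
    for row in rows:
        # Treat each (feature index, category) pair as one distinct feature value.
        items = [(f, v) for f, v in enumerate(row)]
        value_count.update(items)
        for a, b in combinations(items, 2):
            pair_count[(a, b)] += 1
            pair_count[(b, a)] += 1
    values = sorted(value_count)
    # Row i is a numerical description of values[i] in terms of every other value.
    matrix = [
        [1.0 if a == b else pair_count[(a, b)] / value_count[a] for b in values]
        for a in values
    ]
    return values, matrix


rows = [("red", "small"), ("red", "large"), ("blue", "small"), ("red", "small")]
values, matrix = value_couplings(rows)
for value, coupling_row in zip(values, matrix):
    print(value, [round(x, 2) for x in coupling_row])
```

Each row of the resulting matrix describes one feature value by how strongly it co-occurs with every other value. Vectors of this kind are the sort of input on which values could then be clustered at different granularities.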