Abstract
We study the problem of Network Embedding (NE)
for content-rich networks. NE models aim to learn
efficient, low-dimensional dense vector representations of network vertices, which are crucial to many network
analysis tasks. The core problem of content-rich
network embedding is to learn and integrate the semantic information conveyed by network structure
and node content. In this paper, we propose a general end-to-end model, Dual GEnerative Network
Embedding (DGENE), to leverage the complementary information of network structure and content.
In this model, each vertex is regarded as an object with two modalities: node identity and textual
content. We then formulate two dual generation
tasks. One is Node Identification (NI), which recognizes a node's identity given its content. Conversely, the other is Content Generation (CG),
which generates textual content given a node's
identity. We develop specific Content2Node and
Node2Content models for the two tasks. Under
the DGENE framework, the two dual models are
learned by sharing and integrating intermediate layers, through which they mutually enhance each other.
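To make the dual-task structure concrete, the following is a minimal sketch assuming a bag-of-words simplification in NumPy. It is not the DGENE architecture: the class `DualSketch`, its parameters, and the mean-of-word-embeddings encoder are hypothetical choices for illustration. The only thing the two tasks share here is a node-embedding matrix `E`, standing in for the shared intermediate layers, so minimizing the joint loss lets both NI and CG shape the same node vectors.

```python
import numpy as np


class DualSketch:
    """Hypothetical toy model: NI and CG tasks sharing node embeddings E.

    Assumption: content is a bag of word ids; this is NOT the DGENE model.
    """

    def __init__(self, num_nodes, vocab_size, dim, seed=0):
        rng = np.random.default_rng(seed)
        # Shared node-identity embeddings used by BOTH tasks.
        self.E = rng.normal(scale=0.1, size=(num_nodes, dim))
        # Word embeddings used by the Content2Node (NI) encoder.
        self.W_enc = rng.normal(scale=0.1, size=(vocab_size, dim))
        # Output projection used by the Node2Content (CG) decoder.
        self.W_dec = rng.normal(scale=0.1, size=(dim, vocab_size))

    @staticmethod
    def log_softmax(x):
        # Numerically stable log-softmax over the last axis.
        x = x - x.max(axis=-1, keepdims=True)
        return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

    def ni_loss(self, node, words):
        # Content2Node: encode content as the mean word embedding,
        # then score it against every node embedding in E.
        h = self.W_enc[words].mean(axis=0)
        logp = self.log_softmax(self.E @ h)
        return -logp[node]

    def cg_loss(self, node, words):
        # Node2Content: predict each content word from the node's embedding
        # (an unordered bag-of-words decoder, not a sequence generator).
        logp = self.log_softmax(self.E[node] @ self.W_dec)
        return -logp[words].mean()

    def joint_loss(self, node, words):
        # Both terms depend on E, so training on their sum couples the tasks.
        return self.ni_loss(node, words) + self.cg_loss(node, words)


model = DualSketch(num_nodes=5, vocab_size=20, dim=8)
words = np.array([1, 4, 7])
print(model.joint_loss(2, words))
```

In practice the joint loss would be minimized with an autodiff framework; the sketch only shows the forward computation and where the parameter sharing happens.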
Extensive experimental results show that our model
yields a significant performance gain over
state-of-the-art NE methods. Moreover, our
model has a useful byproduct: a component
of the model can generate text,
which is potentially useful for many tasks.