Self-paced Compensatory Deep Boltzmann Machine for Semi-Structured
Document Embedding
Abstract
Over the last decade, a large number of documents
carrying different types of rich metadata, known as
Semi-Structured Documents (SSDs), have appeared
in many real applications. Modeling this type of text
data in the way humans understand text with informative
metadata is an interesting research problem. In
this paper, we introduce a Self-paced Compensatory
Deep Boltzmann Machine (SCDBM) architecture
that uses metadata information to learn a deep
structure layer by layer, in a self-paced way, for
SSD embedding. Inspired by the way humans
understand text, the model defines a deep process
of document vector extraction that goes beyond the
space of words by jointly using the metadata, where each layer
selects different types of metadata. We present
efficient learning and inference algorithms for the
SCDBM model and empirically demonstrate that
the representation discovered by this model
outperforms state-of-the-art baselines on semi-structured
document classification, retrieval, and tag prediction.