Abstract
Automated representation learning is behind many
recent success stories in machine learning. It is often used to transfer knowledge learned from a large
dataset (e.g., raw text) to tasks for which only a
small number of training examples are available.
In this paper, we review recent advances in learning
to represent social media users in low-dimensional
embeddings. This technology is critical for building high-performance social media-based models of human
traits and behavior, since the ground truth
for assessing latent human traits and behavior is often expensive to acquire at scale. In this
survey, we review typical methods for learning
unified user embeddings from heterogeneous user
data (e.g., combining social media text with images
to learn a unified user representation). Finally, we
point out some current issues and future directions.