Abstract
We propose the Ornstein auto-encoder (OAE), a
representation learning model for correlated data.
In many interesting applications, data have nested
structures. Examples include the VGGFace and
MNIST datasets. We view such data as consisting of
i.i.d. copies of a stationary random process, and
seek a latent space representation of the observed
sequences. This viewpoint necessitates a distance
measure between two random processes. We propose to use Ornstein's d-bar distance, a process extension of the Wasserstein distance. We first show
that the theorem by Bousquet et al. (2017) for
Wasserstein auto-encoders extends to stationary
random processes. This result, however, requires
both encoder and decoder to map an entire sequence to another. We then show that, when exchangeability within a process is assumed, as is valid for VGGFace
and MNIST, these maps reduce to univariate ones, yielding a much simpler, tractable
optimization problem. Our experiments show that
OAEs successfully separate individual sequences in
the latent space, and can generate new variations of
unknown, as well as known, identities. The latter has
not been possible with existing methods.