EigenSent: Spectral sentence embeddings using higher-order Dynamic
Mode Decomposition
Abstract
Distributed representations of words, or word
embeddings, have motivated methods for calculating semantic representations of word sequences such as phrases, sentences, and paragraphs. Most existing methods
either use algorithms to learn such representations or improve on weighted
averages of the word vectors. In this work,
we experiment with spectral methods of signal
representation and summarization as mechanisms for constructing such word-sequence
embeddings in an unsupervised fashion. In
particular, we explore an algorithm rooted in
fluid dynamics, known as higher-order Dynamic Mode Decomposition, which is designed to capture the eigenfrequencies, and
hence the fundamental transition dynamics, of
periodic and quasi-periodic systems. We observe empirically that this approach, which
we call EigenSent, can summarize transitions
in a sequence of words and generate an embedding that represents the sequence
well. To the best of the authors’ knowledge,
this is the first application of a spectral decomposition and signal summarization technique
to text for creating sentence embeddings. We
test the efficacy of this algorithm in creating
sentence embeddings on three public datasets,
where it performs well. Moreover, because the two
representations have complementary properties, concatenating the embeddings generated
by EigenSent with simple word vector averages achieves state-of-the-art results.
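
To make the idea of summarizing transitions in a word-vector sequence more concrete, the following is a minimal sketch and not the authors' implementation. It uses plain (first-order) Dynamic Mode Decomposition rather than the higher-order variant named above, which additionally works with time-delayed snapshots. The function names `dmd` and `embed`, the `rank` parameter, and the choice of the leading mode's magnitude as the spectral summary are all assumptions made for illustration. The toy sequence is a damped rotation standing in for a sentence of at least two word vectors.

```python
import numpy as np

def dmd(seq, rank):
    """Plain DMD on a (T, d) sequence of vectors, T >= 2.

    Fits a rank-truncated linear map A with x_{t+1} ~ A x_t and
    returns its eigenvalues and the corresponding DMD modes.
    """
    X, Y = seq[:-1].T, seq[1:].T                  # snapshot pairs, each (d, T-1)
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    r = max(1, min(rank, int(np.sum(s > 1e-10 * s[0]))))
    U, s, V = U[:, :r], s[:r], Vh[:r].conj().T
    A_tilde = U.conj().T @ Y @ V / s              # projected operator, (r, r)
    eigvals, W = np.linalg.eig(A_tilde)
    modes = (Y @ V / s) @ W                       # DMD modes, (d, r)
    return eigvals, modes

def embed(seq, rank=2):
    """Illustrative embedding (an assumption, not the paper's construction):
    magnitude of the mode with the largest |eigenvalue|, concatenated with
    the plain average of the word vectors."""
    eigvals, modes = dmd(seq, rank)
    lead = np.abs(modes[:, np.argmax(np.abs(eigvals))])
    return np.concatenate([lead, seq.mean(axis=0)])

# Toy "sentence": 10 two-dimensional vectors generated by a damped rotation.
theta = 0.3
A = 0.9 * np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
toy = [np.array([1.0, 0.0])]
for _ in range(9):
    toy.append(A @ toy[-1])
toy = np.array(toy)

eigvals, modes = dmd(toy, rank=2)   # eigenvalues encode decay rate and frequency
sentence_vec = embed(toy)           # length 2 * d
```

On this toy sequence the recovered eigenvalues match those of the generating map, which is the sense in which such a decomposition captures the transition dynamics of a sequence. With real word embeddings the dimension is much larger than the sentence length, so the truncation rank would need to be chosen with care.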