Abstract
Spectral models for learning weighted nondeterministic automata have appealing theoretical
and algorithmic properties. Despite this, it has
been challenging to obtain competitive results
in language modeling tasks, for two main reasons. First, in order to capture long-range dependencies in the data, the method must use statistics from long substrings, which results in very large matrices that are difficult to decompose. Second, the loss function behind spectral learning, based on moment matching, differs from the probabilistic metrics used to evaluate language models. In this work we employ a technique for scaling up spectral learning, and use interpolated predictions that are optimized to minimize perplexity. Our experiments in
character-based language modeling show that
our method matches the performance of state-of-the-art n-gram models, while being very fast to train.
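
To make the decomposition step concrete, below is a minimal sketch of the standard spectral method for weighted automata, assuming empirical substring statistics (a Hankel matrix and its per-symbol shifts) have already been collected. The function and variable names here are illustrative assumptions, not the paper's own implementation.

```python
# Minimal sketch of the standard spectral method for weighted automata.
# Assumes Hankel statistics are given; names are illustrative, not the
# paper's API.
import numpy as np

def spectral_wfa(H, H_sigma, h_prefix, h_suffix, n_states):
    """Recover WFA parameters from Hankel statistics.

    H        : (|prefixes| x |suffixes|) Hankel matrix, H[u, v] = f(uv)
    H_sigma  : dict mapping each symbol a to the shifted Hankel
               matrix H_a with H_a[u, v] = f(u a v)
    h_prefix : vector of f(u) over prefixes (empty-suffix column of H)
    h_suffix : vector of f(v) over suffixes (empty-prefix row of H)
    """
    # Rank-n truncated SVD of the (potentially very large) Hankel matrix.
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    U, s, Vt = U[:, :n_states], s[:n_states], Vt[:n_states, :]
    # Low-rank factorization H = P S with P = U diag(s), S = Vt.
    P_inv = np.linalg.pinv(U * s)   # (n x |prefixes|)
    S_inv = np.linalg.pinv(Vt)      # (|suffixes| x n)
    # Initial weights, per-symbol transition operators, final weights.
    alpha = h_suffix @ S_inv
    A = {a: P_inv @ Ha @ S_inv for a, Ha in H_sigma.items()}
    beta = P_inv @ h_prefix
    return alpha, A, beta

def score(alpha, A, beta, word):
    """Compute f(word) = alpha A[w1] ... A[wk] beta."""
    v = alpha
    for a in word:
        v = v @ A[a]
    return float(v @ beta)
```

Capturing longer dependencies means indexing the Hankel matrix by longer prefixes and suffixes, which is exactly what blows up the SVD step and motivates the scaling technique the abstract refers to.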
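The interpolation step can likewise be sketched. Assuming each component model's probabilities on held-out symbols are available, mixture weights that minimize held-out perplexity (equivalently, maximize held-out likelihood) can be fit with EM, as in classical deleted interpolation; `preds` and the helper names below are assumptions for illustration, not the paper's actual procedure.

```python
# Hedged sketch: fit interpolation weights by EM to minimize held-out
# perplexity. preds[m, t] is assumed to be model m's probability of the
# t-th held-out symbol; these names are illustrative.
import numpy as np

def fit_interpolation_weights(preds, n_iters=100):
    n_models, n_events = preds.shape
    w = np.full(n_models, 1.0 / n_models)   # start from uniform weights
    for _ in range(n_iters):
        # E-step: posterior responsibility of each model for each event.
        mix = w[:, None] * preds
        resp = mix / mix.sum(axis=0, keepdims=True)
        # M-step: new weights proportional to total responsibility.
        w = resp.sum(axis=1) / n_events
    return w

def perplexity(w, preds):
    logp = np.log(w @ preds)   # log-probability of each held-out event
    return float(np.exp(-logp.mean()))
```

In this setting the rows of `preds` might come, for instance, from the spectral model together with lower-order n-gram predictors, so that the tuned mixture directly optimizes the probabilistic metric used for evaluation rather than the moment-matching objective of spectral learning.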