Abstract
Lattices are an efficient and effective method to encode the ambiguity of upstream systems in natural language processing tasks, for example to compactly capture multiple speech recognition hypotheses or to represent multiple linguistic analyses. Previous work has extended recurrent neural networks to model lattice inputs and achieved improvements in various tasks, but these models suffer from very slow computation speeds. This paper extends the recently proposed paradigm of self-attention to handle lattice inputs. Self-attention is a sequence modeling technique that relates inputs to one another by computing pairwise similarities, and it has gained popularity for both its strong results and its computational efficiency. To extend such models to handle lattices, we introduce probabilistic reachability masks that incorporate lattice structure into the model and support lattice scores if available. We also propose a method for adapting positional embeddings to lattice structures. We apply the proposed model to a speech translation task and find that it outperforms all examined baselines while being much faster to compute than previous neural lattice models during both training and inference.
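As a rough illustration of the idea (a minimal sketch of our own, not the authors' reference implementation), a probabilistic reachability mask can be folded into scaled dot-product self-attention by adding log reachability probabilities to the attention logits; the function and variable names below, and the single-head, projection-free setup, are assumptions made only to keep the example short.

# Illustrative sketch: self-attention over lattice nodes with a
# probabilistic reachability mask applied in log space.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def lattice_self_attention(X, reach_prob, eps=1e-10):
    """X: (n, d) node embeddings; reach_prob: (n, n) pairwise reachability
    probabilities derived from the lattice (all ones for a linear sequence)."""
    d = X.shape[-1]
    # Pairwise similarities (single head, no learned projections here).
    scores = X @ X.T / np.sqrt(d)
    # Log-domain mask: zero-probability pairs receive -inf-like penalties,
    # partially reachable pairs are down-weighted by their lattice scores.
    mask = np.log(reach_prob + eps)
    return softmax(scores + mask, axis=-1) @ X

# Toy usage: 4 lattice nodes, where nodes 1 and 2 lie on alternative paths.
X = np.random.randn(4, 8)
reach = np.array([[1.0, 0.6, 0.4, 1.0],
                  [0.6, 1.0, 0.0, 0.6],
                  [0.4, 0.0, 1.0, 0.4],
                  [1.0, 0.6, 0.4, 1.0]])
out = lattice_self_attention(X, reach)
print(out.shape)  # (4, 8)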