Abstract
A topic of recent interest in NLP research is universal sentence encoding: learning sentence representations that can be used in any supervised task. At the word-sequence level, fully attention-based models suffer from two problems: a quadratic increase in memory consumption with respect to sentence length, and an inability to capture and use syntactic information. Recursive neural networks, by contrast, extract rich syntactic information by traversing a tree structure. To this end, we propose Tree Transformer, a model that captures phrase-level syntax in constituency trees as well as word-level dependencies in dependency trees by performing the recursive traversal with attention alone. Evaluation on four tasks yields noteworthy results compared to the standard Transformer and LSTM-based models, as well as tree-structured LSTMs. We also provide ablation studies examining whether positional information is inherently encoded in the trees and which type of attention is best suited for the recursive traversal.