Abstract
Current state-of-the-art dependency parsing approaches employ BiLSTMs to encode input sentences. Motivated by the success of transformer-based machine translation, this work applies the self-attention mechanism to dependency parsing for the first time as a replacement for the BiLSTM, achieving competitive performance on both English and Chinese benchmark datasets.
Based on detailed error analysis, we then combine the strengths of the BiLSTM and self-attention encoders via model ensembles, demonstrating their complementary ability to capture contextual information. Finally, we explore recently proposed contextualized word representations as extra input features and further improve parsing performance.