Abstract
We show that constituency parsing benefits
from unsupervised pre-training across a variety of languages and a range of pre-training
conditions. We first compare the benefits of
no pre-training, fastText (Bojanowski et al.,
2017; Mikolov et al., 2018), ELMo (Peters
et al., 2018), and BERT (Devlin et al., 2018a)
for English and find that BERT outperforms
ELMo, in large part due to increased model
capacity, whereas ELMo in turn outperforms
the non-contextual fastText embeddings. We
also find that pre-training is beneficial across
all 11 languages tested; however, large model
sizes (more than 100 million parameters) make
it computationally expensive to train separate
models for each language. To address this
shortcoming, we show that joint multilingual
pre-training and fine-tuning allows sharing all
but a small number of parameters between ten
languages in the final model. The 10x reduction in model size compared to fine-tuning one
model per language causes only a 3.2% relative error increase in aggregate. We further
explore the idea of joint fine-tuning and show
that it gives low-resource languages a way to
benefit from the larger datasets of other languages. Finally, we demonstrate new state-of-the-art results for 11 languages, including English (95.8 F1) and Chinese (91.8 F1).