Sequence Tagging with Contextual and Non-Contextual Subword Representations: A Multilingual Evaluation
Abstract
Pretrained contextual and non-contextual subword embeddings have become available in over 250 languages, allowing massively multilingual NLP. However, while there is no dearth of pretrained embeddings, the distinct lack of systematic evaluations makes it difficult for practitioners to choose between them. In this work, we conduct an extensive evaluation comparing non-contextual subword embeddings, namely FastText and BPEmb, and a contextual representation method, namely BERT, on multilingual named entity recognition and part-of-speech tagging. We find that overall, a combination of BERT, BPEmb, and character representations works well across languages and tasks. A more detailed analysis reveals different strengths and weaknesses: Multilingual BERT performs well in medium- to high-resource languages, but is outperformed by non-contextual subword embeddings in a low-resource setting.
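As a concrete illustration of the non-contextual subword representations compared here, the sketch below loads pretrained BPEmb embeddings via the bpemb Python package and looks up subword vectors for a single word; the vocabulary size and dimensionality are illustrative choices, not the paper's experimental settings.

```python
from bpemb import BPEmb

# Load pretrained English BPE subword embeddings.
# vs (BPE vocabulary size) and dim are illustrative choices here.
bpemb_en = BPEmb(lang="en", vs=100000, dim=100)

# Segment a word into BPE subword units ...
subwords = bpemb_en.encode("unaffable")  # e.g. ['▁un', 'aff', 'able']

# ... and embed it: one vector per subword unit, shape (n_subwords, 100).
vectors = bpemb_en.embed("unaffable")
```

The package also provides a multilingual model (lang="multi"), covering many languages with a single shared BPE vocabulary.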