Abstract
In cross-lingual transfer, NLP models over one or more source languages are applied to a low-resource target language. While most prior work has used a single source model or a few carefully selected models, here we consider a “massive” setting with many such models. This setting raises the problem of poor transfer, particularly from distant languages. We propose two techniques for modulating the transfer, suitable for zero-shot or few-shot learning, respectively. Evaluating on named entity recognition, we show that our techniques are much more effective than strong baselines, including standard ensembling, and our unsupervised method rivals oracle selection of the single best individual model.