Abstract
In this paper, we present a Multi-Task Deep
Neural Network (MT-DNN) for learning representations across multiple natural language
understanding (NLU) tasks. MT-DNN not
only leverages large amounts of cross-task
data, but also benefits from a regularization effect that leads to more general representations
to help adapt to new tasks and domains. MT-DNN extends the model proposed in Liu et al.
(2015) by incorporating a pre-trained bidirectional transformer language model, known as
BERT (Devlin et al., 2018). MT-DNN obtains new state-of-the-art results on ten NLU
tasks, including SNLI, SciTail, and eight out of
nine GLUE tasks, pushing the GLUE benchmark to 82.7% (2.2% absolute improvement). We also demonstrate using the SNLI and SciTail datasets that the representations learned
by MT-DNN allow domain adaptation with
substantially fewer in-domain labels than the
pre-trained BERT representations. The code
and pre-trained models are publicly available
at https://github.com/namisan/mt-dnn.