Abstract
Recent advances such as GPT and BERT have shown that pre-training a transformer language model and then fine-tuning it can substantially improve downstream NLP systems. However, this framework still has fundamental problems in effectively incorporating supervised knowledge from other related tasks. In this study, we investigate a transferable BERT (TransBERT) training framework, which can transfer not only general language knowledge from large-scale unlabeled data but also specific kinds of knowledge from various semantically related supervised tasks to a target task. Specifically, we propose using three kinds of transfer tasks, namely natural language inference, sentiment classification, and next action prediction, to further train BERT on top of the pre-trained model. This gives the model a better initialization for the target task. We take story ending prediction as the target task in our experiments. The final result, an accuracy of 91.8%, dramatically outperforms previous state-of-the-art baseline methods. Several comparative experiments offer helpful guidance on how to select transfer tasks for improving BERT.
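To make the two-stage idea concrete, the following is a minimal, illustrative sketch of the transfer-then-fine-tune procedure, assuming a HuggingFace transformers-style interface. The data loaders (`load_nli_batches`, `load_story_ending_batches`) and checkpoint names are hypothetical placeholders, and the actual TransBERT tasks, objectives, and hyper-parameters follow the paper rather than this sketch.

```python
# Illustrative sketch only: further train pre-trained BERT on a related
# supervised transfer task, then fine-tune the result on the target task.
import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Stage 1: further train BERT on a semantically related supervised task
# (here, natural language inference with 3 labels).
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
for batch in load_nli_batches(tokenizer):       # hypothetical loader;
    loss = model(**batch).loss                  # batch holds input_ids,
    loss.backward()                             # attention_mask, labels
    optimizer.step()
    optimizer.zero_grad()
model.save_pretrained("bert-nli-transferred")   # hypothetical checkpoint name

# Stage 2: fine-tune the transferred encoder on the target task
# (story ending prediction, cast here as binary ending classification);
# the classification head is re-initialized for the new label space.
target = BertForSequenceClassification.from_pretrained(
    "bert-nli-transferred", num_labels=2, ignore_mismatched_sizes=True)
optimizer = torch.optim.AdamW(target.parameters(), lr=2e-5)
for batch in load_story_ending_batches(tokenizer):   # hypothetical loader
    loss = target(**batch).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```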