Abstract
During the past decades, due to the lack of sufficient labeled data, most studies on cross-domain parsing have focused on unsupervised domain adaptation, assuming there is no target-domain training data. However, unsupervised approaches have made limited progress so far, owing to the intrinsic difficulty of both domain adaptation and parsing. This paper tackles the semi-supervised domain adaptation problem for Chinese dependency parsing, based on two newly annotated large-scale domain-specific datasets. We propose a simple domain embedding approach to merge the source- and target-domain training data, which is shown to be more effective than both direct corpus concatenation and multi-task learning.
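As a rough illustration of the idea (the abstract does not give the exact architecture, so every name and dimension below is hypothetical), a domain embedding can be looked up from the id of the corpus a sentence comes from and concatenated to each token's word embedding before the merged data is fed to the parser:

```python
import torch
import torch.nn as nn

class DomainAwareEmbedding(nn.Module):
    """Concatenate a domain embedding (keyed by the corpus a sentence
    comes from) to every token's word embedding, so source- and
    target-domain training data can be merged into one training set."""

    def __init__(self, vocab_size, word_dim, num_domains, domain_dim):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        self.domain_emb = nn.Embedding(num_domains, domain_dim)

    def forward(self, word_ids, domain_ids):
        # word_ids: (batch, seq_len); domain_ids: (batch,)
        words = self.word_emb(word_ids)                       # (B, T, word_dim)
        dom = self.domain_emb(domain_ids)                     # (B, domain_dim)
        dom = dom.unsqueeze(1).expand(-1, words.size(1), -1)  # repeat per token
        return torch.cat([words, dom], dim=-1)                # (B, T, word_dim + domain_dim)

# Hypothetical usage: domain 0 = source corpus, domain 1 = target corpus.
emb = DomainAwareEmbedding(vocab_size=30000, word_dim=100, num_domains=2, domain_dim=8)
out = emb(torch.randint(0, 30000, (4, 12)), torch.tensor([0, 1, 0, 1]))  # -> (4, 12, 108)
```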
To utilize unlabeled target-domain data, we employ recent contextualized word representations and show that a simple fine-tuning procedure can further boost cross-domain parsing accuracy by large margins.
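A minimal sketch of this fine-tuning step, assuming a BERT-style masked language model via HuggingFace `transformers` (the abstract does not name the contextualized encoder, and the data path and hyperparameters below are illustrative): the pretrained encoder continues language-model training on unlabeled target-domain text, and its hidden states then feed the parser's input layer.

```python
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)
from datasets import load_dataset

tok = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModelForMaskedLM.from_pretrained("bert-base-chinese")

# Unlabeled target-domain sentences, one per line (hypothetical path).
raw = load_dataset("text", data_files={"train": "target_domain.txt"})
enc = raw["train"].map(
    lambda b: tok(b["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ft-lm", num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=enc,
    data_collator=DataCollatorForLanguageModeling(tok, mlm_probability=0.15),
)
trainer.train()  # continue masked-LM pretraining on target-domain text
```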