Abstract
Supervised models suffer from domain shift, where a distribution mismatch in the data across domains greatly degrades model performance. Training data selection (TDS) has proven to be a promising solution for domain adaptation by leveraging appropriate data. However, conventional TDS methods normally require a predefined threshold, which is neither easy to set nor transferable across tasks, and the model is trained separately from the TDS process. To make TDS self-adaptive to the data and task, and to combine it with model training, in this paper we propose a reinforcement
learning (RL) framework that synchronously
searches for training instances relevant to the
target domain and learns better representations
for them. A selection distribution generator
(SDG) is designed to perform the selection and
is updated according to the rewards computed
from the selected data; a predictor is also included in the framework to ensure that a task-specific model can be trained on the selected data and to provide feedback for the rewards. Experimental results on part-of-speech tagging, dependency parsing, and sentiment analysis, as
well as ablation studies, show that the proposed framework is not only effective in data selection and representation, but also generalizes well across different NLP tasks.
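To make the selection loop concrete, the following is a minimal, self-contained sketch of the kind of interaction the abstract describes: an SDG samples a training subset, a predictor is trained on that subset, and accuracy on a target-domain dev set is fed back as the reward through a REINFORCE-style update. This is an illustrative assumption, not the authors' implementation; the toy data, the logistic-regression forms of the SDG and predictor, and all names and hyperparameters (train_predictor, w_sdg, baseline, etc.) are hypothetical stand-ins for the paper's neural, task-specific components.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy source-domain pool and target-domain dev set (hypothetical stand-ins).
X_pool = rng.normal(size=(200, 5)); y_pool = (X_pool[:, 0] > 0).astype(int)
X_dev = rng.normal(size=(50, 5));   y_dev = (X_dev[:, 0] > 0).astype(int)

w_sdg = np.zeros(5)        # SDG parameters scoring each instance for selection
baseline, lr = 0.0, 0.1    # moving-average reward baseline, learning rate

def train_predictor(X, y, steps=200, eta=0.1):
    """Task-specific predictor (here a toy logistic regression) trained
    only on the instances the SDG selected."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-X @ w))
        w -= eta * X.T @ (p - y) / len(y)
    return w

for epoch in range(50):
    # 1. SDG defines a selection distribution over the pool and samples a subset.
    probs = 1 / (1 + np.exp(-X_pool @ w_sdg))
    mask = rng.random(len(probs)) < probs
    if mask.sum() < 5:
        continue  # skip degenerate selections
    # 2. Predictor is trained on the selected instances only.
    w_pred = train_predictor(X_pool[mask], y_pool[mask])
    # 3. Reward: predictor accuracy on the target-domain dev set.
    acc = np.mean(((X_dev @ w_pred) > 0).astype(int) == y_dev)
    # 4. REINFORCE update: raise the log-probability of selections
    #    whose reward beats the running baseline.
    advantage = acc - baseline
    grad = X_pool.T @ ((mask.astype(float) - probs) * advantage)
    w_sdg += lr * grad / len(probs)
    baseline = 0.9 * baseline + 0.1 * acc
```

In this sketch, selection and predictor training happen in the same loop, which is the synchronous behavior the framework aims for: the reward signal from the task-specific model directly shapes the SDG's selection distribution, with no hand-set threshold.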