Abstract
Transfer learning is effective for improving performance on related tasks; multi-task learning (MTL) and cross-lingual learning (CLL) are two important instances. This paper argues that hard parameter sharing, which hard-codes the layers shared across different tasks or languages, cannot generalize well when a task shares with only a loosely related task. Such a case, which we call sparse transfer, may actually hurt performance, a phenomenon known as negative transfer. Our contribution is to use adversarial training across tasks to "soft-code" the shared and private spaces, preventing the shared space from becoming too sparse. For CLL, our proposed architecture also addresses the additional challenge of dealing with low-quality input.
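As an illustration of the soft-coding idea, the sketch below shows one common way to realize adversarial shared-private multi-task learning: a task discriminator is trained to identify the source task from the shared representation, while a gradient-reversal layer pushes the shared encoder toward task-invariant features. This is a minimal sketch under our own assumptions (gradient reversal, the module names, and all hyperparameters are illustrative), not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reverses and scales gradients backward."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class SharedPrivateModel(nn.Module):
    """Hypothetical shared-private model with an adversarial task discriminator."""

    def __init__(self, input_dim, hidden_dim, num_tasks, num_classes, lambd=0.05):
        super().__init__()
        self.lambd = lambd
        # One shared encoder plus a private encoder per task.
        self.shared = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.private = nn.ModuleList(
            nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
            for _ in range(num_tasks)
        )
        # Task-specific classifiers consume [shared; private] features.
        self.classifiers = nn.ModuleList(
            nn.Linear(2 * hidden_dim, num_classes) for _ in range(num_tasks)
        )
        # The discriminator tries to recover the task id from shared features;
        # gradient reversal trains the shared encoder to fool it.
        self.discriminator = nn.Linear(hidden_dim, num_tasks)

    def forward(self, x, task_id):
        shared_feat = self.shared(x)
        private_feat = self.private[task_id](x)
        logits = self.classifiers[task_id](
            torch.cat([shared_feat, private_feat], dim=-1)
        )
        task_logits = self.discriminator(GradReverse.apply(shared_feat, self.lambd))
        return logits, task_logits


# Illustrative training step: task loss plus adversarial task-identification loss.
model = SharedPrivateModel(input_dim=300, hidden_dim=128, num_tasks=2, num_classes=3)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(8, 300)                     # dummy batch from task 0
y = torch.randint(0, 3, (8,))
task_id = 0
logits, task_logits = model(x, task_id)
task_targets = torch.full((8,), task_id, dtype=torch.long)
loss = criterion(logits, y) + criterion(task_logits, task_targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In this sketch the shared space is "soft-coded" in the sense that nothing about it is reserved for a particular task by construction; the adversarial term merely discourages task-identifying information from residing there, while the private encoders absorb task-specific signal.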