Abstract
Not all types of supervision signals are created equal: different types of feedback have different costs and effects on learning. We show how self-regulation strategies that decide when to ask for which kind of feedback from a teacher (or from oneself) can be cast as a learning-to-learn problem, leading to improved cost-aware sequence-to-sequence learning. In experiments on interactive neural machine translation, we find that the self-regulator discovers an ε-greedy strategy for the optimal cost-quality trade-off by mixing different feedback types, including corrections, error markups, and self-supervision. Furthermore, we demonstrate its robustness under domain shift and identify it as a promising alternative to active learning.