Abstract
We incorporate morphological supervision
into character language models (CLMs) via
multitasking and show that this addition improves bits-per-character (BPC) performance
across 24 languages, even when the morphology data and language modeling data are disjoint. Analyzing the CLMs shows that
inflected words benefit more from explicitly
modeling morphology than uninflected words,
and that morphological supervision improves
performance even as the amount of language
modeling data grows. We then transfer morphological supervision across languages to improve language modeling performance in the
low-resource setting.