Abstract
We present a study of morphological irregularity. Following recent work, we define an
information-theoretic measure of irregularity
based on the predictability of forms in a language. Using a neural transduction model,
we estimate this quantity for the forms in
28 languages. We first present several validatory and exploratory analyses of irregularity. We then show that our analyses provide
evidence for a correlation between irregularity and frequency: higher frequency items
are more likely to be irregular and irregular
items are more likely to be highly frequent. To
our knowledge, this result is the first of its
breadth and confirms longstanding proposals
from the linguistics literature. The correlation is more robust when aggregated at the
level of whole paradigms—providing support
for models of linguistic structure in which inflected
forms are unified by abstract underlying stems or lexemes. Code is available
at https://github.com/shijie-wu/neural-transducer