Abstract
A longstanding debate in semiotics centers on
the relationship between linguistic signs and
their corresponding semantics: is there an arbitrary relationship between a word form and
its meaning, or does some systematic phenomenon pervade? For instance, does the
character bigram gl have any systematic relationship to the meaning of words like glisten, gleam and glow? In this work, we offer a holistic quantification of the systematicity of the sign using mutual information and
recurrent neural networks. We employ these
in a data-driven and massively multilingual
approach to the question, examining 106 languages. We find a statistically significant reduction in entropy when modeling a word
form conditioned on its semantic representation. Encouragingly, we also recover well-attested English examples of systematic affixes. We conclude with the meta-point: Our
approximate effect size (measured in bits) is
quite small—despite some amount of systematicity between form and meaning, an arbitrary
relationship and its resulting benefits dominate
human language.
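One way to read the quantity described above, as a sketch consistent with the abstract's wording (the symbols $W$ for the word form and $V$ for the semantic representation are introduced here for illustration and are not taken from the text), is as the mutual information between form and meaning:

\[
\mathrm{I}(W; V) \;=\; \mathrm{H}(W) \;-\; \mathrm{H}(W \mid V),
\]

where $\mathrm{H}(W)$ is the entropy of word forms and $\mathrm{H}(W \mid V)$ is the entropy of a form conditioned on its semantic representation; a positive difference, measured in bits, corresponds to the "reduction in entropy" and the small effect size reported above.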