Abstract
Word embeddings are now pervasive across NLP subfields as the de facto method of forming text representations. In this work, we show that existing embedding models are inadequate at constructing representations that capture salient aspects of the mathematical meaning of numbers. Because numbers are ubiquitous and appear frequently in text, capturing this meaning is important for language understanding. Inspired by cognitive studies of how humans perceive numbers, we develop an analysis framework to test how well word embeddings capture two essential properties of numbers: magnitude (e.g., 3 < 4) and numeration (e.g., 3 = three). Our experiments reveal that most models capture an approximate notion of magnitude but are inadequate at capturing numeration. We hope that our observations provide a starting point for the development of methods that better capture numeracy in NLP systems.