Predicting Human Similarity Judgments with Distributional Models: The Value of
Word Associations
Abstract
To represent the meaning of a word, most models
use external language resources, such as text corpora, to derive the distributional properties of word
usage. In this study, we propose that internal language models, which are more closely aligned to the
mental representations of words, can be used to derive new theoretical questions regarding the structure of the mental lexicon. A comparison with internal models also puts into perspective a number
of assumptions underlying recently proposed distributional text-based models and could provide important
insights into cognitive science, including linguistics and artificial intelligence. We focus on word-embedding models which have been proposed to
learn aspects of word meaning in a manner similar to humans and contrast them with internal language models derived from a new extensive data set
of word associations. An evaluation using relatedness judgments shows that internal language models consistently outperform current state-of-the-art
text-based external language models. This suggests
alternative approaches to representing word meaning
using properties that aren’t encoded in text.