Know What You Don’t Know: Modeling a Pragmatic
Speaker that Refers to Objects of Unknown Categories
Abstract
Zero-shot learning in Language & Vision is the
task of correctly labelling (or naming) objects
of novel categories. Another strand of work in
L&V aims at pragmatically informative rather
than “correct” object descriptions, e.g. in reference games. We combine these lines of research and model zero-shot reference games,
where a speaker needs to successfully refer to
a novel object in an image. Inspired by models
of “rational speech acts”, we extend a neural
generator into a pragmatic speaker that reasons about uncertain object categories. As
a result of this reasoning, the generator produces fewer nouns and names of distractor categories than a literal speaker. We
show that this conversational strategy for dealing with novel objects often improves communicative success, measured as the resolution accuracy of an automatic listener.