Abstract
Consistency is a long-standing issue faced by dialogue models. In this paper, we frame the consistency of dialogue agents as natural language inference (NLI) and create a new natural language inference dataset called Dialogue NLI. We propose a method which demonstrates that a model trained on Dialogue NLI can be used to improve the consistency of a dialogue model, and we evaluate the method with human evaluation and with automatic metrics on a suite of evaluation sets designed to measure a dialogue model's consistency.