Abstract
The sequential order of utterances is often
meaningful in coherent dialogues, and changing that order can lead to low-quality and incoherent conversations. We consider the order information a crucial supervisory signal for dialogue learning, which,
however, has been neglected by many previous dialogue systems. Therefore, in this paper, we introduce a self-supervised learning
task, inconsistent order detection, to explicitly capture the flow of conversation in dialogues. Given a sampled utterance pair triple,
the task is to predict whether it is ordered or
misordered. We then propose a sampling-based self-supervised network (SSN) to perform the prediction with sampled triple references from the previous dialogue history. Furthermore, we design a joint learning framework in which SSN guides dialogue systems towards more coherent and relevant dialogue learning through adversarial training.
We demonstrate that the proposed methods
can be applied to both open-domain and task-oriented dialogue scenarios, and achieve
new state-of-the-art performance on the OpenSubtitles and Movie-Ticket Booking datasets.
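
To make the inconsistent order detection task concrete, the following is a minimal sketch of how labeled training examples could be sampled from a dialogue. It is an illustration under stated assumptions, not the SSN implementation: it assumes a dialogue is a list of utterance strings, that an "utterance pair" is two consecutive utterances, and that a "triple" is three such pairs. The helper names `utterance_pairs` and `sample_triple` and the parameter `p_misordered` are hypothetical.

```python
import random
from typing import List, Tuple

Pair = Tuple[str, str]


def utterance_pairs(dialogue: List[str]) -> List[Pair]:
    """Group a dialogue into consecutive, non-overlapping utterance pairs.

    Assumption: a "pair" is two adjacent utterances; a trailing unpaired
    utterance is dropped.
    """
    return [(dialogue[i], dialogue[i + 1]) for i in range(0, len(dialogue) - 1, 2)]


def sample_triple(
    pairs: List[Pair], rng: random.Random, p_misordered: float = 0.5
) -> Tuple[List[Pair], int]:
    """Sample three utterance pairs and a binary order label.

    Returns (triple, label), where label is 1 if the three pairs appear in
    their original dialogue order and 0 if they have been misordered.
    """
    if len(pairs) < 3:
        raise ValueError("need at least three utterance pairs")

    # Pick three distinct positions and keep them in dialogue order.
    idx = sorted(rng.sample(range(len(pairs)), 3))
    label = 1

    if rng.random() < p_misordered:
        # Reshuffle until the order differs from the original one.
        shuffled = idx[:]
        while shuffled == idx:
            rng.shuffle(shuffled)
        idx = shuffled
        label = 0

    return [pairs[i] for i in idx], label
```

A classifier trained on such `(triple, label)` examples needs no manual annotation, because the labels come from the dialogue order itself; this is what makes the task self-supervised.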