Reading Turn by Turn: Hierarchical Attention Architecture for
Spoken Dialogue Comprehension
Abstract
Comprehending multi-turn spoken conversations is an emerging research area that presents challenges distinct from passage reading comprehension, owing to the interactive nature of information exchange between at least two speakers. Unlike passages, where sentences are often the default semantic modeling unit, in multi-turn conversations a turn is a topically coherent unit embedded in its immediately relevant context, making it a linguistically intuitive segment for computationally modeling verbal interactions. In this work, we therefore propose a hierarchical attention neural network architecture, combining turn-level and word-level attention mechanisms, to improve spoken dialogue comprehension performance. Experiments are conducted on a multi-turn conversation dataset in which nurses inquire about and discuss symptom information with patients. We empirically show that the proposed approach outperforms standard attention baselines, achieves more efficient learning outcomes, and is more robust to lengthy and out-of-distribution test samples.
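The two-level design described above can be sketched as follows: word-level attention pools the word embeddings of each turn into a turn vector, and turn-level attention then pools the turn vectors into a single dialogue representation. This is a minimal NumPy illustration under stated assumptions, not the paper's implementation; the query vectors `w_word` and `w_turn` are hypothetical stand-ins for learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(H, w):
    # H: (seq_len, d) hidden states; w: (d,) attention query
    # Returns a weighted sum of the rows of H.
    scores = softmax(H @ w)   # (seq_len,) attention weights
    return scores @ H         # (d,) pooled vector

d = 16
# A toy dialogue of 3 turns with 5, 8, and 3 word embeddings each
# (random vectors stand in for encoder outputs)
dialogue = [rng.standard_normal((n_words, d)) for n_words in (5, 8, 3)]

w_word = rng.standard_normal(d)  # word-level query (would be learned)
w_turn = rng.standard_normal(d)  # turn-level query (would be learned)

# Word-level attention: pool each turn into one turn vector
turn_vecs = np.stack([attention_pool(H, w_word) for H in dialogue])  # (3, d)

# Turn-level attention: pool turn vectors into one dialogue vector
dialogue_vec = attention_pool(turn_vecs, w_turn)  # (d,)

print(turn_vecs.shape, dialogue_vec.shape)
```

Because attention is applied turn by turn before turns are combined, each word competes for weight only within its own turn, which matches the view of a turn as the basic coherent unit of the conversation.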