Abstract
The Air Travel Information Service (ATIS)
corpus has been the most common benchmark
for evaluating Spoken Language Understanding (SLU) tasks in the more than three decades
since its release. Recent state-of-the-art
neural models have obtained F1-scores near
98% on the task of slot filling. We developed a
rule-based grammar for the ATIS domain that
achieves a 95.82% F1-score on our evaluation
set. In the process, we also discovered numerous shortcomings in the ATIS corpus annotation, which we have fixed.
This paper presents a detailed account of these
shortcomings, our proposed repairs, our rule-based grammar and the neural slot-filling architectures associated with ATIS. We also rationally reappraise the motivations for choosing a neural architecture in view of this account. Fixing the annotation errors results in
a relative error reduction of between 19.4% and
52% across all architectures. We nevertheless
argue that neural models must play a different
role in ATIS dialogues because of the latter’s
lack of variety.