Are Red Roses Red? Evaluating Consistency of Question-Answering Models

2019-09-20
Abstract: Although current evaluation of question-answering systems treats predictions in isolation, we need to consider the relationship between predictions to measure true understanding. A model should be penalized for answering “no” to “Is the rose red?” if it answers “red” to “What color is the rose?”. We propose a method to automatically extract such implications for instances from two QA datasets, VQA and SQuAD, which we then use to evaluate the consistency of models. Human evaluation shows these generated implications are well formed and valid. Consistency evaluation provides crucial insights into gaps in existing models, and retraining with implication-augmented data improves consistency on both synthetic and human-generated implications.
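The rose example in the abstract can be made concrete with a small sketch. The code below is an illustration only and not the authors' implementation: the function name `consistency`, the data layout of `implied`, and the toy model are all assumptions introduced here. It scores a model by the fraction of implied question-answer pairs it agrees with, counting only implications that follow from answers the model actually gave.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical layout: (question, answer) -> implied (question, answer) pairs.
Implied = Dict[Tuple[str, str], List[Tuple[str, str]]]


def consistency(model: Callable[[str], str], implied: Implied) -> float:
    """Fraction of implications the model agrees with (assumed metric)."""
    checked = 0
    agreed = 0
    for (question, answer), pairs in implied.items():
        # Only score implications of answers the model actually produced.
        if model(question) != answer:
            continue
        for implied_question, implied_answer in pairs:
            checked += 1
            if model(implied_question) == implied_answer:
                agreed += 1
    # With nothing to check, the score is undefined.
    return agreed / checked if checked else float("nan")


# Toy model as a lookup table: it says the rose is "red" but then denies it.
toy_model = {
    "What color is the rose?": "red",
    "Is the rose red?": "no",
    "Is the rose blue?": "no",
}
implied = {
    ("What color is the rose?", "red"): [
        ("Is the rose red?", "yes"),
        ("Is the rose blue?", "no"),
    ],
}
score = consistency(toy_model.get, implied)
print(score)
```

In this toy case the model agrees with one of the two implications, so the inconsistency on “Is the rose red?” lowers its score even though each answer would be judged separately under isolated evaluation.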
