The KNOWREF Coreference Corpus: Removing Gender and Number
Cues for Difficult Pronominal Anaphora Resolution
Abstract
We introduce a new benchmark for coreference resolution and NLI, KNOWREF, that targets common-sense understanding and world knowledge. Previous coreference resolution tasks can largely be solved by exploiting the number and gender of the antecedents, or have been handcrafted and do not reflect the diversity of naturally occurring text. We present a corpus of over 8,000 annotated text passages with ambiguous pronominal anaphora. These instances are both challenging and realistic. We show that various coreference systems, whether rule-based, feature-rich, or neural, perform significantly worse on the task than humans, who display high inter-annotator agreement. To explain this performance gap, we show empirically that state-of-the-art models often fail to capture context, instead relying on the gender or number of candidate antecedents to make a decision. We then use problem-specific insights to propose a data-augmentation trick called antecedent switching to alleviate this tendency in models. Finally, we show that antecedent switching yields promising results on other tasks as well: we use it to achieve state-of-the-art results on the GAP coreference task.
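The abstract does not spell out how antecedent switching is implemented; one plausible minimal sketch, under the assumption that each instance stores a passage, its two candidate antecedents, and a label indexing the correct one (all field names here are hypothetical, not from the paper), is:

```python
# Hypothetical sketch of antecedent switching as a data-augmentation step.
# The data layout (text, candidate pair, integer label) is an assumption.

def antecedent_switch(text, candidates, label):
    """Swap the two candidate antecedents in the passage and flip the label.

    A model that decides based only on surface cues of the candidate
    strings (e.g. gender or number) gives the same answer for both the
    original and the switched copy, so training on both penalizes that
    shortcut and rewards attending to context.
    """
    a, b = candidates
    # Swap via a placeholder so replacing `a` with `b` does not clobber
    # occurrences of `b` already present in the text.
    placeholder = "\x00"
    switched = (
        text.replace(a, placeholder)
            .replace(b, a)
            .replace(placeholder, b)
    )
    flipped = 1 - label  # the correct antecedent is now the other candidate
    return switched, (b, a), flipped


example = "Paul tried to call George, but he wasn't successful."
new_text, new_cands, new_label = antecedent_switch(
    example, ("Paul", "George"), label=0
)
# new_text == "George tried to call Paul, but he wasn't successful."
```

This simple string swap assumes each candidate appears verbatim in the passage; handling inflected mentions or nested names would need more care.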