Abstract
One of the key steps in language resource
creation is the identification of the text segments to be annotated, or markables: in our
case, the (potentially nested) noun phrases
used in coreference resolution, or mentions. In
this paper, we present a method for identifying markables for coreference annotation that
combines high-performance automatic markable detectors with checking through a Game-With-A-Purpose (GWAP) and aggregation using a Bayesian annotation model. The method
was evaluated on both news data and data from
a variety of other genres, and yields an improvement in mention-boundary F1 of
over seven percentage points compared
with a state-of-the-art, domain-independent
automatic mention detector, and of almost three
points over an in-domain mention detector.
One of the key contributions of our proposal
is its applicability to nested markables, as is the case with coreference markables. Moreover, the GWAP and several of
the proposed markable detectors are task- and
language-independent and are thus applicable
to a variety of other annotation scenarios.