Abstract
Recent neural network models have significantly advanced the task of coreference resolution. However, current neural coreference models are typically trained with heuristic loss functions computed over a sequence of local decisions. In this paper, we introduce an end-to-end reinforcement-learning-based coreference resolution model that directly optimizes coreference evaluation metrics. Specifically, we modify the state-of-the-art higher-order mention ranking approach of Lee et al. (2018) into a reinforced policy gradient model by incorporating the reward associated with a sequence of coreference linking actions. Furthermore, we introduce maximum entropy regularization to ensure adequate exploration and prevent the model from prematurely converging to a bad local optimum.
Our proposed model achieves new state-of-the-art performance on the English OntoNotes v5.0 benchmark.