Abstract
This paper proposes Confusionset-guided
Pointer Networks for the Chinese Spell Check
(CSC) task. More concretely, our approach
uses an off-the-shelf confusionset to guide character generation. To this end,
our novel Seq2Seq model jointly learns either to
copy a correct character from the input sentence through a pointer network or to generate
a character from the confusionset rather than from
the entire vocabulary. We conduct experiments
on three human-annotated datasets, and the results show that our proposed generative
model outperforms all competing models by a
large margin of up to 20% in F1 score, achieving
state-of-the-art performance on all three datasets.
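
To make the copy-or-generate idea concrete, the following is a minimal sketch of a single decoding step, not the exact formulation used in this paper. It assumes a scalar copy gate that mixes a pointer distribution over source positions with a generation distribution whose softmax is restricted to the confusionset of the current input character. All names (masked_softmax, step_distribution, copy_gate) and all toy scores are hypothetical and chosen only for illustration.

```python
import math


def masked_softmax(logits, allowed):
    """Softmax over `logits`, restricted to the indices in `allowed`.

    Indices outside `allowed` receive probability 0.0.
    """
    if not allowed:
        raise ValueError("allowed set must not be empty")
    peak = max(logits[i] for i in allowed)  # subtract max for numerical stability
    exps = {i: math.exp(logits[i] - peak) for i in allowed}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(logits))]


def step_distribution(src_ids, pointer_scores, vocab_logits, confusion_ids, copy_gate):
    """Return a distribution over vocabulary ids for one decoding step.

    Assumption (illustrative only): the final distribution is
    copy_gate * copy distribution + (1 - copy_gate) * generation distribution.
    """
    # Copy branch: attention-like distribution over source positions.
    pointer = masked_softmax(pointer_scores, set(range(len(src_ids))))
    # Generate branch: softmax restricted to the confusionset,
    # instead of the entire vocabulary.
    generate = masked_softmax(vocab_logits, set(confusion_ids))

    final = [(1.0 - copy_gate) * g for g in generate]
    for position, char_id in enumerate(src_ids):
        final[char_id] += copy_gate * pointer[position]
    return final


# Toy example with a 6-character vocabulary (ids 0..5).
src_ids = [0, 1, 2]               # ids of the characters in the input sentence
pointer_scores = [0.0, 2.0, 0.0]  # hypothetical pointer scores per source position
vocab_logits = [0.1, 0.2, 0.3, 1.5, 0.5, 9.0]  # hypothetical decoder logits
confusion_ids = [3, 4]            # hypothetical confusionset of the current character

dist = step_distribution(src_ids, pointer_scores, vocab_logits, confusion_ids, copy_gate=0.3)
best_id = max(range(len(dist)), key=dist.__getitem__)
# Id 5 has the largest raw logit, but it lies outside the confusionset and
# does not occur in the input, so it receives zero probability.
```

Restricting the generation softmax in this way shrinks the candidate space from the whole vocabulary to a handful of similar characters, while the copy branch covers the common case in which the input character is already correct.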