Abstract
Cloze-style reading comprehension in Chinese
is still limited by the lack of diverse corpora. In this paper, we propose a large-scale
Chinese cloze test dataset, ChID, for studying the comprehension of idioms, a unique linguistic phenomenon in Chinese. In this corpus,
the idioms in a passage are replaced by blank
symbols, and the correct answer must be
chosen from carefully designed candidate idioms.
We carefully study how the design of candidate idioms and the representation of idioms
affect the performance of state-of-the-art models. Results show that machine accuracy is
substantially lower than human accuracy, indicating ample room for further research.