Identifying the relationship between two articles, e.g., whether two articles published by
different sources describe the same breaking news, is critical to many document understanding tasks. Existing approaches for
modeling and matching sentence pairs do not
perform well on longer documents,
which embody more complex interactions among their entities than a single sentence
does. To model article pairs, we propose the
Concept Interaction Graph, which represents an article as a graph of concepts. We then match
a pair of articles by comparing the sentences
associated with the same concept vertex through
a series of encoding techniques, and aggregate
the matching signals through a graph convolutional network. To facilitate the evaluation
of long article matching, we have created two
datasets, each consisting of about 30K pairs
of breaking news articles covering diverse topics in the open domain. Extensive evaluations
of the proposed methods on the two datasets
demonstrate significant improvements over a
wide range of state-of-the-art methods for natural language matching.
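
The pipeline described above (group sentences by concept, compare the two articles' sentences at each concept vertex, then aggregate the per-vertex signals over the graph) can be illustrated with a small sketch. This is not the authors' implementation: the bag-of-words cosine feature, the shared-sentence edge rule, the unweighted single propagation step, and all function names (`build_graph`, `gcn_aggregate`, `match_score`) are assumptions made for illustration, and the concepts are supplied by hand rather than extracted.

```python
import math
from collections import Counter

PUNCT = ".,;:!?\"'()"

def tokenize(text):
    """Lowercase whitespace tokenizer that strips surrounding punctuation."""
    return [w.strip(PUNCT).lower() for w in text.split() if w.strip(PUNCT)]

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def sentences_for(concept, sentences):
    """Sentences that mention at least one keyword of the concept."""
    return [s for s in sentences if concept & set(tokenize(s))]

def build_graph(concepts, sents_a, sents_b):
    """One vertex per concept, holding both articles' sentences and a match feature."""
    vertices = []
    for c in concepts:
        sa, sb = sentences_for(c, sents_a), sentences_for(c, sents_b)
        bag_a = Counter(w for s in sa for w in tokenize(s))
        bag_b = Counter(w for s in sb for w in tokenize(s))
        # Assumption: a single cosine score stands in for the learned encodings.
        vertices.append({"concept": c, "a": sa, "b": sb, "match": cosine(bag_a, bag_b)})
    n = len(vertices)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 1.0  # self-loop, as in a standard GCN
        for j in range(i + 1, n):
            si = set(vertices[i]["a"] + vertices[i]["b"])
            sj = set(vertices[j]["a"] + vertices[j]["b"])
            # Assumption: connect two concepts when they share a sentence.
            if si & sj:
                adj[i][j] = adj[j][i] = 1.0
    return vertices, adj

def gcn_aggregate(adj, feats):
    """One propagation step D^-1/2 A D^-1/2 x, with no learned weights."""
    deg = [sum(row) for row in adj]
    n = len(feats)
    return [
        sum(adj[i][j] * feats[j] / math.sqrt(deg[i] * deg[j]) for j in range(n))
        for i in range(n)
    ]

def match_score(concepts, sents_a, sents_b):
    """Mean-pool the propagated vertex features into one article-pair score."""
    vertices, adj = build_graph(concepts, sents_a, sents_b)
    if not vertices:
        return 0.0
    hidden = gcn_aggregate(adj, [v["match"] for v in vertices])
    return sum(hidden) / len(hidden)

if __name__ == "__main__":
    concepts = [{"storm"}, {"evacuation", "officials"}]
    article_a = ["The storm hit the coast on Monday.",
                 "Officials ordered an evacuation of the coast."]
    article_b = ["A storm struck the coast Monday.",
                 "An evacuation was ordered by officials."]
    print(match_score(concepts, article_a, article_b))
```

In a trained model the per-vertex feature would be a learned vector and the propagation step would include weight matrices and a nonlinearity, followed by a classifier; the sketch keeps only the divide-by-concept and aggregate-over-graph structure.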