Improving the Similarity Measure of Determinantal Point Processes for
Extractive Multi-Document Summarization
Abstract
The most important obstacles facing multi-document summarization include excessive redundancy in source descriptions and the looming shortage of training data. These obstacles
prevent encoder-decoder models from being
used directly, but optimization-based methods
such as determinantal point processes (DPPs)
are known to handle them well. In this paper
we seek to strengthen a DPP-based method for
extractive multi-document summarization by
presenting a novel similarity measure inspired
by capsule networks. The approach measures
redundancy between a pair of sentences based
on surface form and semantic information. We
show that our DPP system with the improved similarity measure performs competitively, outperforming strong summarization baselines on
benchmark datasets. Our findings are particularly meaningful for summarizing documents
that are created by multiple authors and contain redundant yet lexically diverse expressions.
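
As a rough illustration of how a DPP trades sentence importance against redundancy, the sketch below builds the standard L-ensemble kernel L[i, j] = q_i * S[i, j] * q_j and selects sentences greedily. It is a minimal sketch under stated assumptions, not the system described in this paper: the quality scores and the similarity matrix are hand-made toy values standing in for the learned similarity measure, greedy selection is one common approximate inference scheme, and the names build_kernel and greedy_select are hypothetical.

```python
import numpy as np


def build_kernel(quality, similarity):
    """Build the L-ensemble kernel L[i, j] = q_i * S[i, j] * q_j."""
    q = np.asarray(quality, dtype=float)
    S = np.asarray(similarity, dtype=float)
    return q[:, None] * S * q[None, :]


def greedy_select(L, k):
    """Pick up to k items, each time adding the item that gives the
    enlarged set the largest determinant det(L_Y)."""
    selected = []
    for _ in range(k):
        best, best_det = None, 0.0
        for i in range(L.shape[0]):
            if i in selected:
                continue
            idx = selected + [i]
            d = np.linalg.det(L[np.ix_(idx, idx)])
            if d > best_det:
                best, best_det = i, d
        if best is None:
            # No remaining item yields a positive determinant.
            break
        selected.append(best)
    return selected


# Toy input (assumed values): sentences 0 and 1 are near-duplicates,
# sentence 2 is different from both but has a lower quality score.
quality = [1.0, 0.9, 0.8]
similarity = [
    [1.00, 0.95, 0.10],
    [0.95, 1.00, 0.10],
    [0.10, 0.10, 1.00],
]

L = build_kernel(quality, similarity)
summary = greedy_select(L, 2)
print(summary)
```

With these toy values the second pick skips the higher-quality sentence 1 in favor of sentence 2, because sentence 1 is nearly redundant with the already selected sentence 0. This is why the choice of pairwise similarity measure matters so much for the resulting summary.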