Abstract
In this paper, we develop a neural summarization model which can effectively process
multiple input documents and distill them into abstractive summaries. Our model augments a previously proposed Transformer architecture (Liu
et al., 2018) with the ability to encode documents in a hierarchical manner. We represent
cross-document relationships via an attention
mechanism that allows information to be shared,
as opposed to simply concatenating text spans
and processing them as a flat sequence. Our
model learns latent dependencies among textual units, but can also take advantage of explicit graph representations focusing on similarity or discourse relations. Empirical results
on the WikiSum dataset demonstrate that the
proposed architecture brings substantial improvements over several strong baselines.
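
To make the hierarchical encoding concrete, the following is a minimal sketch of one cross-paragraph attention layer in PyTorch. The class name, dimensions, and pooling assumption are hypothetical illustrations, not the authors' released code; the idea it demonstrates is that each paragraph's pooled representation attends to all other paragraphs' representations, sharing information across documents rather than processing one flat concatenated sequence.

```python
# Minimal sketch of cross-paragraph attention (hypothetical names, not the
# authors' released implementation). Each paragraph is assumed to be pooled
# into a single vector; multi-head attention then lets every paragraph
# attend to all others, so information is shared across documents instead
# of concatenating text spans into a flat sequence.
import torch
import torch.nn as nn

class CrossParagraphAttention(nn.Module):
    def __init__(self, d_model: int = 256, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, para_vecs: torch.Tensor) -> torch.Tensor:
        # para_vecs: (batch, num_paras, d_model), one pooled vector per paragraph.
        shared, _ = self.attn(para_vecs, para_vecs, para_vecs)
        # Residual connection and layer norm, as in a standard Transformer block.
        return self.norm(para_vecs + shared)

# Toy usage: 6 paragraphs drawn from multiple source documents, dimension 256.
layer = CrossParagraphAttention()
paragraphs = torch.randn(1, 6, 256)
print(layer(paragraphs).shape)  # torch.Size([1, 6, 256])
```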