Extractive and Abstractive Event Summarization over Streaming Web Text

2019-11-25
Abstract

Chris Kedzie, Dept. of Computer Science, Columbia University, kedzie@cs.columbia.edu

During crises, information is critical for responders and victims. When the event is significant, as in the case of Hurricane Sandy, the amount of content produced by traditional news outlets, relief organizations, and social media vastly overwhelms those trying to monitor the situation. The ensuing digital overload that accompanies large-scale disasters suggests an opportunity for automatic summarization. The implied task here is to monitor an event as it unfolds over time by processing an associated stream of documents and producing a rolling update summary containing the most salient information with respect to the event (which we also refer to as the query).

This general task is found in a variety of fields including journalism, finance, and especially crisis informatics, where there is a dire need at all times for situational awareness (i.e., what is happening now) that is largely achieved manually [Starbird and Palen, 2013].

This should be a major use case for the decades-long research on automatic multi-document summarization (MDS) systems. Such systems could deliver relevant and salient information without interruption, even when humans are unable to. Perhaps more importantly, they could help filter out unnecessary and irrelevant detail when the volume of incoming information is large.

Frustratingly, classic MDS approaches are not robust enough to handle streaming data. Their reliance on unsupervised clustering and nearest neighbors techniques leans heavily on lexical redundancy to determine the salience of a text [Erkan and Radev, 2004]. In the streaming scenario we focused on recovering novel information, which is often not detected by these algorithms.
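To make the redundancy problem concrete, here is a minimal sketch of a redundancy-based salience score. It is not the LexRank algorithm of [Erkan and Radev, 2004] itself, only a simplified degree-centrality variant over bag-of-words cosine similarity. The function names, the example sentences, and the 0.3 similarity threshold are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def redundancy_salience(sentences, threshold=0.3):
    """Score each sentence by how many other sentences are lexically similar.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    bags = [Counter(s.lower().split()) for s in sentences]
    return [
        sum(
            1
            for j, other in enumerate(bags)
            if j != i and cosine(bag, other) >= threshold
        )
        for i, bag in enumerate(bags)
    ]


# Hypothetical example sentences: three restate the same fact, one is novel.
sentences = [
    "the storm made landfall in new jersey",
    "the storm made landfall near atlantic city in new jersey",
    "officials said the storm made landfall in new jersey",
    "subway tunnels flooded overnight",  # new information, mentioned only once
]
scores = redundancy_salience(sentences)
print(scores)
```

In this toy input the three near-duplicate sentences reinforce one another, while the sentence carrying new information shares no vocabulary with the rest and receives the lowest score. That is the behavior the passage above describes as a poor fit for streaming updates.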
In addition, most MDS methods assume a fixed input set to which they have full retrospective access, which is clearly not the case with streaming data and may not be feasible for most large web text corpora. The streaming or time component of the summarization task also brings with it the notion of timeliness: information may become stale or outdated. Managing this has not been extensively studied in the context of MDS.

In [Kedzie et al., 2015] we were able to significantly reduce reliance on redundancy by explicitly predicting salience with a Gaussian process regression model. We ran experiments in a crisis informatics type scenario, where our summarization system was given an event query (e.g. “boston marathon bombing”) and was expected to filter a multi-terabyte stream
