Extractive and Abstractive Event Summarization over Streaming Web Text

2019-11-25

Chris Kedzie
Dept. of Computer Science, Columbia University
kedzie@cs.columbia.edu

Abstract

During crises, information is critical for responders and victims. When the event is significant, as in the case of Hurricane Sandy, the amount of content produced by traditional news outlets, relief organizations, and social media vastly overwhelms those trying to monitor the situation. The digital overload that accompanies large-scale disasters suggests an opportunity for automatic summarization. The implied task is to monitor an event as it unfolds over time by processing an associated stream of documents and producing a rolling update summary containing the most salient information with respect to the event (which we also refer to as the query). This general task arises in a variety of fields, including journalism, finance, and especially crisis informatics, where there is a constant, pressing need for situational awareness (i.e., knowing what is happening now), which is largely achieved manually [Starbird and Palen, 2013].

This should be a major use case for the decades-long research on automatic multi-document summarization (MDS). Such systems could deliver relevant and salient information without interruption, even when humans are unable to. Perhaps more importantly, they could help filter out unnecessary and irrelevant detail when the volume of incoming information is large.

Frustratingly, classic MDS approaches are not robust enough to handle streaming data. They rely on unsupervised clustering and nearest-neighbor techniques that lean heavily on lexical redundancy to determine the salience of a text [Erkan and Radev, 2004]. In the streaming scenario, we focus on recovering novel information, which these algorithms often fail to detect.
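
To make the redundancy argument concrete, here is a minimal Python sketch of thresholded degree centrality, a simple relative of the centrality-based methods of Erkan and Radev [2004]: each sentence's salience is the fraction of the other sentences whose bag-of-words cosine similarity with it exceeds a threshold. The tokenizer, the 0.1 threshold, the function names, and the example sentences are illustrative assumptions, not taken from the source.

```python
import math
import re
from collections import Counter

# Illustrative sketch only: redundancy-based salience in the spirit of
# centrality methods such as LexRank. Threshold and tokenizer are assumptions.

def tokenize(text):
    """Lowercased alphanumeric tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def degree_centrality(sentences, threshold=0.1):
    """Score each sentence by the fraction of other sentences it lexically overlaps with."""
    bags = [Counter(tokenize(s)) for s in sentences]
    n = len(bags)
    scores = []
    for i in range(n):
        degree = sum(1 for j in range(n) if j != i and cosine(bags[i], bags[j]) > threshold)
        scores.append(degree / (n - 1) if n > 1 else 0.0)
    return scores

docs = [
    "Hurricane Sandy makes landfall in New Jersey.",
    "Sandy makes landfall near Atlantic City, New Jersey.",
    "Hurricane Sandy landfall reported in New Jersey.",
    "Shelter opens at Brooklyn Tech for evacuees.",
]
scores = degree_centrality(docs)
print([round(s, 3) for s in scores])  # [0.667, 0.667, 0.667, 0.0]
```

The shelter announcement scores 0.0 only because no other sentence repeats its words. A redundancy-driven scorer cannot distinguish "unimportant" from "new", which is exactly the problem for a stream where the most useful update may be the first of its kind.
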
In addition, most MDS methods assume a fixed input set to which they have full retrospective access. This is clearly not the case with streaming data and may not be feasible for most large web text corpora. The streaming, or temporal, component of the summarization task also brings with it the notion of timeliness: information may become stale or outdated. Managing this has not been extensively studied in the context of MDS. In [Kedzie et al., 2015], we were able to significantly reduce the reliance on redundancy by explicitly predicting salience with a Gaussian process regression model. We ran experiments in a crisis-informatics scenario, where our summarization system was given an event query (e.g., "boston marathon bombing") and was expected to filter a multi-terabyte stream
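
The idea of predicting salience directly, rather than inferring it from redundancy, can be sketched as follows. This is a minimal, hypothetical illustration, not the model or feature set of [Kedzie et al., 2015]. It assumes each incoming sentence has already been mapped to a numeric feature vector and fits a zero-mean Gaussian process regressor with an RBF kernel to sentences with known salience labels. For each new batch from the stream, it appends a sentence to the rolling summary only if the predicted salience clears a threshold and the sentence does not overlap too heavily (by word-level Jaccard similarity) with sentences already in the summary. The kernel, noise level, thresholds, redundancy test, features, and all names (`GPSalience`, `update_summary`, and so on) are assumptions.

```python
import math

# Hypothetical sketch: zero-mean GP regression for salience prediction plus a
# greedy rolling-summary update. Kernel, thresholds, and features are assumptions.

def rbf(x, y, length_scale=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return variance * math.exp(-0.5 * sq_dist / length_scale ** 2)

def cholesky(a):
    """Lower-triangular Cholesky factor L of a symmetric positive-definite matrix a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def cho_solve(L, b):
    """Solve (L L^T) x = b by forward substitution, then back substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

class GPSalience:
    """Hypothetical salience model; predictive mean is k(x, X) (K + noise^2 I)^-1 y."""

    def __init__(self, noise=0.1, length_scale=1.0):
        self.noise = noise
        self.length_scale = length_scale

    def fit(self, X, y):
        self.X = X
        K = [[rbf(a, b, self.length_scale) for b in X] for a in X]
        for i in range(len(X)):
            K[i][i] += self.noise ** 2  # observation noise on the diagonal
        self.alpha = cho_solve(cholesky(K), y)
        return self

    def predict(self, x):
        return sum(rbf(x, xi, self.length_scale) * a for xi, a in zip(self.X, self.alpha))

def jaccard(a, b):
    """Word-set Jaccard overlap, used here as a crude redundancy test."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def update_summary(summary, batch, model, min_salience=0.5, max_overlap=0.5):
    """Append sentences from one time slice that are predicted salient and not redundant."""
    for text, features in batch:
        if model.predict(features) < min_salience:
            continue  # not salient enough with respect to the query
        if any(jaccard(text, s) > max_overlap for s in summary):
            continue  # already covered by the rolling summary
        summary.append(text)
    return summary

# Toy usage with made-up 2-d features and salience labels.
model = GPSalience().fit([[0.0, 0.0], [3.0, 3.0]], [0.0, 1.0])
batch = [
    ("Sandy makes landfall near Atlantic City", [2.9, 3.0]),
    ("Sandy makes landfall near Atlantic City tonight", [3.0, 2.9]),
    ("Traffic update for the Bronx", [0.1, 0.0]),
]
summary = update_summary([], batch, model)
print(summary)  # ['Sandy makes landfall near Atlantic City']
```

Because salience comes from a learned regressor rather than from how often content is repeated, a sentence can be selected the first time it appears. The separate overlap check then plays the role redundancy actually should play in a rolling summary: suppressing repeats of what has already been reported.
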
