Abstract
Spatial aggregation refers to the merging of documents created at the same spatial location.
We show that by spatially aggregating a large
collection of documents and applying a traditional topic discovery algorithm to the aggregated data, we can efficiently discover spatially distinct topics. Viewing topic discovery through the lens of matrix factorization, we
show that spatial aggregation allows a low-rank
approximation of the original document-word
matrix, in which spatially distinct topics are
preserved and non-spatial topics are aggregated into a single topic. Our experiments on
synthetic data confirm this observation. Our
experiments on 4.7 million tweets collected
during Hurricane Sandy in 2012 show that
spatial and temporal aggregation allow rapid
discovery of relevant spatial and temporal topics during that period. Our work indicates
that different forms of document aggregation
might be effective for the rapid discovery of various types of distinct topics from large collections of documents.
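
The aggregation step described in this abstract can be illustrated with a minimal sketch. It assumes a toy document-word count matrix and one location label per document, both invented here for illustration. The helper names `aggregate_by_location` and `nmf` are hypothetical, and the multiplicative-update non-negative matrix factorization is only a stand-in for "a traditional topic discovery algorithm", not necessarily the method used in the experiments.

```python
import numpy as np


def aggregate_by_location(doc_word, locations):
    """Sum the rows of a document-word matrix that share a location label."""
    doc_word = np.asarray(doc_word, dtype=float)
    labels = np.asarray(locations).ravel()
    unique_locs, inverse = np.unique(labels, return_inverse=True)
    aggregated = np.zeros((len(unique_locs), doc_word.shape[1]))
    # Unbuffered add: every document row is accumulated into its location row.
    np.add.at(aggregated, inverse, doc_word)
    return unique_locs, aggregated


def nmf(X, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor X ~ W @ H with non-negative W, H via multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], rank)) + eps
    H = rng.random((rank, X.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy data (assumed): words 0-1 occur only at location "A", words 2-3 only
# at location "B", and word 4 occurs everywhere (a non-spatial word).
doc_word = [
    [2, 1, 0, 0, 1],
    [1, 2, 0, 0, 1],
    [0, 0, 2, 1, 1],
    [0, 0, 1, 2, 1],
    [1, 1, 0, 0, 2],
    [0, 0, 1, 1, 2],
]
locations = ["A", "A", "B", "B", "A", "B"]

locs, aggregated = aggregate_by_location(doc_word, locations)
W, H = nmf(aggregated, rank=2)

print(locs)
print(aggregated)
print(np.round(H, 2))  # each row of H is one topic's word weights
```

In this sketch the six documents collapse to two location rows, so the factorization runs on a much smaller matrix than the original. The location-specific words stay separated across the two rows, while the non-spatial word contributes equally to both.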