Causal Inference with Rare Events in Large-Scale Time-Series Data
Samantha Kleinberg
Abstract
Large-scale observational datasets are prevalent in many areas of research, including biomedical informatics, computational social science, and finance. However, our ability to use these data for decision-making lags behind our ability to collect and mine them. One reason for this is the lack of methods for inferring the causal impact of rare events. In cases such as the monitoring of continuous data streams from intensive care patients, social media, or finance, though, rare events may in fact be the most important ones – signaling critical changes in a patient’s status or trading volume. While prior data mining approaches can identify or predict rare events, they cannot determine their impact, and probabilistic causal inference methods fail to handle inference with infrequent events. Instead, we develop a new approach to finding the causal impact of rare events that leverages the large amount of data available to infer a model of a system’s functioning and evaluates how rare events explain deviations from usual behavior. Using simulated data, we evaluate the approach and compare it against others, demonstrating that it can accurately infer the effects of rare events.