Enhancing Air Quality Prediction with Social Media and Natural
Language Processing
Abstract
With modern industrial development, air pollution has become a major concern for human health. Hence, air
quality measures, such as the concentration
of PM2.5, have attracted increasing attention.
Even though some studies apply historical measurements to air quality forecasting, changes in
air quality conditions remain hard to monitor. In this paper, we propose to exploit social
media and natural language processing techniques to enhance air quality prediction. Social media users are treated as social sensors
that contribute their findings and locations. After filtering noisy tweets using word selection and
topic modeling, we propose a deep learning model based
on convolutional neural networks and over-tweet-pooling to enhance air quality prediction. We conduct experiments on
7-month real-world Twitter datasets from the five
most heavily polluted states in the USA. The
results show that our approach significantly
improves air quality prediction over the baseline that does not use social media, by 6.9% to
17.7% in macro-F1 score.
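
For readers unfamiliar with the reported metric, the following is a minimal sketch of how a macro-F1 score can be computed: the F1 score is calculated separately for each class and then averaged without weighting, so rare air quality categories count as much as common ones. The function name `macro_f1` and the class labels in the usage note are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    # Consider every class that appears in either the truth or the predictions.
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

As a usage example with made-up labels, `macro_f1(["good", "good", "moderate", "unhealthy"], ["good", "moderate", "moderate", "unhealthy"])` averages the per-class scores 2/3, 2/3, and 1.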