Composition Loss for Counting, Density Map
Estimation and Localization in Dense Crowds
Abstract. With multiple crowd gatherings of millions of people every year in
events ranging from pilgrimages to protests, concerts to marathons, and festivals
to funerals, visual crowd analysis is emerging as a new frontier in computer vision. In particular, counting in highly dense crowds is a challenging problem with
far-reaching applicability in crowd safety and management, as well as gauging
the political significance of protests and demonstrations. In this paper, we propose
a novel approach that simultaneously solves the problems of counting, density
map estimation and localization of people in a given dense crowd image. Our
formulation is based on the important observation that the three problems are inherently related to each other, making the loss function for optimizing a deep CNN
decomposable. Since localization requires high-quality images and annotations,
we introduce the UCF-QNRF dataset, which overcomes the shortcomings of previous
datasets and contains 1.25 million humans manually marked with dot annotations. Finally, we present evaluation measures and a comparison with recent deep
CNNs, including those developed specifically for crowd counting. Our
approach significantly outperforms the state of the art on the new dataset, which is
the most challenging dataset with the largest number of crowd annotations in the
most diverse set of scenes.
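The relationship among the three tasks can be illustrated with a minimal sketch. It is not the CNN or the composition loss proposed in this paper. It only assumes the common convention of building a density map by placing one normalized Gaussian on each dot annotation, so that the count is the sum of the map and, for a sufficiently sharp Gaussian, the local maxima recover the annotated locations. The function names, image size, `sigma`, and `threshold` below are hypothetical choices made for illustration.

```python
import math


def density_map(points, height, width, sigma=1.5):
    """Build a density map with one normalized Gaussian per (row, col) dot."""
    dmap = [[0.0] * width for _ in range(height)]
    for r0, c0 in points:
        # Unnormalized Gaussian weights over the whole image.
        weights = [
            [math.exp(-((r - r0) ** 2 + (c - c0) ** 2) / (2 * sigma ** 2))
             for c in range(width)]
            for r in range(height)
        ]
        total = sum(map(sum, weights))
        # Normalize so that each person contributes exactly 1 to the sum.
        for r in range(height):
            for c in range(width):
                dmap[r][c] += weights[r][c] / total
    return dmap


def count_from_density(dmap):
    """Counting: the estimated count is the sum of the density map."""
    return sum(map(sum, dmap))


def localize(dmap, threshold=0.01):
    """Localization: strict local maxima (8-neighbourhood) above a threshold."""
    h, w = len(dmap), len(dmap[0])
    peaks = []
    for r in range(h):
        for c in range(w):
            v = dmap[r][c]
            if v < threshold:
                continue
            neighbours = [
                dmap[rr][cc]
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, c - 1), min(w, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(v > n for n in neighbours):
                peaks.append((r, c))
    return peaks


# Three hypothetical, well-separated dot annotations in a 32x32 image.
points = [(5, 5), (5, 20), (20, 12)]
dmap = density_map(points, height=32, width=32)
estimated_count = count_from_density(dmap)
estimated_locations = localize(dmap)
```

In this toy setting, a single density map yields both the count and the locations, which gives some intuition for why the three problems can share one decomposable objective. With overlapping heads or a wider Gaussian, the peaks merge and localization from the density map alone degrades.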