CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes
Abstract
We propose a network for Congested Scene Recognition, called CSRNet, that provides a data-driven, deep-learning method for understanding highly congested scenes, performing accurate count estimation and producing high-quality density maps. The proposed CSRNet is composed of two major components: a convolutional neural network (CNN) as the front-end for 2D feature extraction and a dilated CNN as the back-end, which uses dilated kernels to deliver larger receptive fields and to replace pooling operations. CSRNet is an easy-to-train model because of its purely convolutional structure. We demonstrate CSRNet on four datasets (the ShanghaiTech dataset, the UCF_CC_50 dataset, the WorldEXPO'10 dataset, and the UCSD dataset) and deliver state-of-the-art performance. On the ShanghaiTech Part B dataset, CSRNet achieves a 47.3% lower Mean Absolute Error (MAE) than the previous state-of-the-art method. We also extend the targeted applications to counting other objects, such as the vehicles in the TRANCOS dataset. Results show that CSRNet significantly improves the output quality, with a 15.4% lower MAE than the previous state-of-the-art approach.
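To make the claim that dilated kernels enlarge the receptive field while replacing pooling more concrete, the following is a minimal Python sketch, not the CSRNet implementation. It applies the standard receptive-field recurrence to three illustrative layer stacks: three 3×3 convolutions, the same stack with dilation rate 2, and a stack with a 2×2 stride-2 max-pooling layer inserted. The helper names `receptive_field` and `dilated_conv1d` and all layer configurations are hypothetical and chosen only for illustration.

```python
def receptive_field(layers):
    """Receptive field and output stride of a stack of layers.

    Each layer is a (kernel, stride, dilation) tuple. A dilated kernel of
    size k with rate d covers d * (k - 1) + 1 input positions, so it widens
    the receptive field without adding weights or reducing resolution.
    """
    rf, jump = 1, 1  # jump = distance (in input pixels) between adjacent outputs
    for kernel, stride, dilation in layers:
        effective_kernel = dilation * (kernel - 1) + 1
        rf += (effective_kernel - 1) * jump
        jump *= stride
    return rf, jump


def dilated_conv1d(signal, kernel, dilation):
    """'Same'-padded 1D dilated cross-correlation (assumes an odd kernel length).

    Output length equals input length: dilation spreads the kernel taps
    apart instead of downsampling the signal.
    """
    k = len(kernel)
    pad = dilation * (k - 1) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(kernel[j] * padded[i + j * dilation] for j in range(k))
        for i in range(len(signal))
    ]


if __name__ == "__main__":
    # Hypothetical stacks: (kernel, stride, dilation) per layer.
    plain = [(3, 1, 1)] * 3
    dilated = [(3, 1, 2)] * 3
    pooled = [(3, 1, 1), (2, 2, 1), (3, 1, 1), (3, 1, 1)]  # includes a 2x2, stride-2 max-pool
    for name, stack in [("plain", plain), ("dilated", dilated), ("pooled", pooled)]:
        rf, stride = receptive_field(stack)
        print(f"{name}: receptive field {rf}, output stride {stride}")

    # An impulse input reveals the spacing of the dilated kernel taps.
    impulse = [0.0] * 5 + [1.0] + [0.0] * 5
    print(dilated_conv1d(impulse, [1.0, 1.0, 1.0], dilation=2))
```

Running it reports receptive fields of 7, 13, and 12 pixels with output strides 1, 1, and 2, respectively. Under these assumptions, the dilated stack covers a wider context than the pooled one while keeping the resolution of its input feature map, which is the trade-off the back-end design relies on.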
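The MAE figures compare predicted and ground-truth counts per test image, MAE = (1/N) Σ_i |C_i − C_i^GT|. The sketch below assumes the common crowd-counting convention that an image's estimated count is the sum of its density-map values. It then shows how a relative figure such as "47.3% lower MAE" follows from two MAE values. The function names, density maps, and numbers are hypothetical and do not reproduce any result reported here.

```python
def count_from_density(density_map):
    """Estimated count: the sum (discrete integral) of a 2D density map."""
    return sum(sum(row) for row in density_map)


def mean_absolute_error(pred_counts, true_counts):
    """MAE = (1/N) * sum_i |pred_i - true_i| over N test images."""
    if not pred_counts or len(pred_counts) != len(true_counts):
        raise ValueError("expected two non-empty lists of equal length")
    total = sum(abs(p - t) for p, t in zip(pred_counts, true_counts))
    return total / len(pred_counts)


def relative_reduction(old_mae, new_mae):
    """Fraction by which new_mae is lower than old_mae (0.473 means '47.3% lower')."""
    if old_mae <= 0:
        raise ValueError("old_mae must be positive")
    return (old_mae - new_mae) / old_mae


if __name__ == "__main__":
    # Toy 2x2 density maps for two hypothetical test images.
    predicted_maps = [[[0.5, 0.5], [0.25, 0.75]], [[1.0, 2.0], [0.0, 1.0]]]
    true_counts = [3, 4]
    pred_counts = [count_from_density(m) for m in predicted_maps]
    mae = mean_absolute_error(pred_counts, true_counts)
    print(pred_counts, mae)
    print(f"{relative_reduction(old_mae=2.0, new_mae=mae):.1%} lower MAE")
```

With these toy inputs the predicted counts are 2.0 and 4.0, the MAE is 0.5, and against a hypothetical baseline MAE of 2.0 the script prints "75.0% lower MAE".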