Abstract
Crowd counting or density estimation is a challenging
task in computer vision due to large scale variations, perspective distortions, and severe occlusions. Existing
methods generally suffer from two issues: 1) the model averaging effects in multi-scale CNNs induced by the widely
adopted ℓ2 regression loss; and 2) inconsistent estimation
across different scaled inputs. To explicitly address these
issues, we propose a novel crowd counting (density estimation) framework called Adversarial Cross-Scale Consistency Pursuit (ACSCP). On the one hand, a U-net structured generation network is designed to generate a density map from
an input patch, and an adversarial loss is directly employed to
shrink the solution onto a realistic subspace, thus attenuating the blurry effects of density map estimation. On the
other hand, we design a novel scale-consistency regularizer which enforces that the sum of the crowd counts
from local patches (i.e., small scale) is coherent with the
overall count of their region union (i.e., large scale). The
above losses are integrated via a joint training scheme, so
as to help boost density estimation performance by further
exploring the collaboration between both objectives. Extensive experiments on four benchmarks demonstrate the effectiveness of the proposed innovations as well as
superior performance over prior art.
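
As a rough illustration of the scale-consistency idea described above, the following Python sketch compares the count of a parent patch with the summed counts of its sub-patches. The function names, the four-quadrant split, and the squared-difference penalty are assumptions made for illustration only; the abstract does not specify the exact form of the regularizer.

```python
import numpy as np


def split_into_quadrants(density):
    """Split a 2-D density map into four non-overlapping quadrants (assumed layout)."""
    h, w = density.shape
    hh, hw = h // 2, w // 2
    return [
        density[:hh, :hw],  # top-left
        density[:hh, hw:],  # top-right
        density[hh:, :hw],  # bottom-left
        density[hh:, hw:],  # bottom-right
    ]


def cross_scale_count_gap(parent_density, child_densities):
    """Squared gap between the parent count and the summed child counts.

    The squared difference is a hypothetical choice of penalty; a count is
    obtained by integrating (summing) a density map.
    """
    parent_count = float(np.sum(parent_density))
    child_count = float(sum(np.sum(d) for d in child_densities))
    return (parent_count - child_count) ** 2


# Toy density map standing in for a large-scale prediction.
rng = np.random.default_rng(0)
parent = rng.random((8, 8))

# Sub-patch maps cropped from the parent are consistent by construction.
children = split_into_quadrants(parent)
consistent_gap = cross_scale_count_gap(parent, children)

# Sub-patch maps that overcount by 50% violate the consistency constraint.
inconsistent_gap = cross_scale_count_gap(parent, [c * 1.5 for c in children])
```

When the sub-patch maps are exact crops of the parent map, the gap is zero up to floating-point error. In a training setting the maps would be predicted separately for each patch, so a gap of this kind could serve as a penalty on disagreement between the two scales.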