Abstract
We propose a novel crowd counting approach that leverages abundantly available unlabeled crowd imagery in
a learning-to-rank framework. To induce a ranking of
cropped images, we use the observation that any sub-image
of a crowded scene image is guaranteed to contain the same
number or fewer persons than the super-image. This allows us to address the limited size of existing
datasets for crowd counting. We collect two crowd scene
datasets from Google using keyword searches and query-by-example image retrieval, respectively. We demonstrate
how to efficiently learn from these unlabeled datasets by incorporating learning-to-rank in a multi-task network which
simultaneously ranks images and estimates crowd density
maps. Experiments on two of the most challenging crowd
counting datasets show that our approach obtains state-of-the-art results.
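
The sub-image observation can be made concrete with a small sketch. The abstract does not specify how crops are sampled or which ranking loss is used, so everything below is an assumption made for illustration: the helper names `nested_crops` and `ranking_loss`, the shrink factor `scale`, and the pairwise hinge form of the loss are hypothetical. Each crop is placed inside the previous one, so its true count can never exceed the count of the crop that contains it, and the loss penalizes predicted counts that violate this order.

```python
import random


def nested_crops(width, height, k=5, scale=0.75, seed=0):
    """Return k boxes (left, top, right, bottom), each inside the previous one.

    The first box is the full image; every later box is `scale` times the
    size of its parent and is placed at a random position within it.
    """
    rng = random.Random(seed)
    boxes = [(0, 0, width, height)]
    for _ in range(k - 1):
        left, top, right, bottom = boxes[-1]
        w = int((right - left) * scale)
        h = int((bottom - top) * scale)
        # Choose the new corner so the child box stays inside the parent.
        new_left = rng.randint(left, right - w)
        new_top = rng.randint(top, bottom - h)
        boxes.append((new_left, new_top, new_left + w, new_top + h))
    return boxes


def ranking_loss(predicted_counts, margin=0.0):
    """Sum of pairwise hinge penalties over crops ordered largest to smallest.

    A pair contributes zero when the larger crop's predicted count is at
    least the smaller crop's predicted count plus `margin`.
    """
    loss = 0.0
    for i in range(len(predicted_counts)):
        for j in range(i + 1, len(predicted_counts)):
            larger, smaller = predicted_counts[i], predicted_counts[j]
            loss += max(0.0, smaller - larger + margin)
    return loss


if __name__ == "__main__":
    boxes = nested_crops(640, 480)
    print(boxes)
    # Consistent ordering incurs no penalty; a violation is penalized.
    print(ranking_loss([10.0, 8.0, 8.0, 3.0]))
    print(ranking_loss([5.0, 7.0]))
```

No count labels are needed to compute this loss, only the containment order of the crops, which is why unlabeled images can serve as a training signal in such a setup.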