Abstract. Object counting is an important task in computer vision
due to its growing demand in applications such as surveillance, traffic
monitoring, and counting everyday objects. State-of-the-art methods
use regression-based optimization where they explicitly learn to count
the objects of interest. These often perform better than detection-based
methods that need to learn the more difficult task of predicting the location, size, and shape of each object. In contrast, we propose a detection-based method that does not need to estimate the size and shape of the
objects and that outperforms regression-based methods. Our contributions are three-fold: (1) we propose a novel loss function that encourages
the network to output a single blob per object instance using point-level annotations only; (2) we design two methods for splitting large predicted blobs between object instances; and (3) we show that our method
achieves new state-of-the-art results on several challenging datasets including the Pascal VOC and the Penguins dataset. Our method even
outperforms those that use stronger supervision such as depth features,
multi-point annotations, and bounding-box labels.
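When counting by localization, the predicted count is the number of blobs in the network's output mask. The sketch below is a minimal illustration under stated assumptions and is not the authors' implementation or loss. It labels 4-connected foreground blobs in a binary mask and tallies how many annotated points fall inside each blob. The 4-connectivity choice and the helper names label_blobs and points_per_blob are hypothetical, introduced only for this example. Under a one-blob-per-object objective, a blob that contains several annotated points is a candidate for splitting, and a blob that contains none is a false positive.

```python
from collections import deque


def label_blobs(mask):
    """Label 4-connected foreground blobs (assumed connectivity) in a 0/1 mask.

    Returns (labels, n_blobs): labels[y][x] is 0 for background, else a blob id >= 1.
    """
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    n_blobs = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not labels[y][x]:
                # Start a new blob and flood-fill it with breadth-first search.
                n_blobs += 1
                labels[y][x] = n_blobs
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = n_blobs
                            queue.append((ny, nx))
    return labels, n_blobs


def points_per_blob(labels, n_blobs, points):
    """Count annotated (y, x) points inside each blob; returns a list for ids 1..n_blobs."""
    counts = [0] * (n_blobs + 1)  # index 0 collects points that land on background
    for y, x in points:
        counts[labels[y][x]] += 1
    return counts[1:]


# Hypothetical example: two predicted blobs and two point annotations.
mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
labels, n_blobs = label_blobs(mask)
per_blob = points_per_blob(labels, n_blobs, [(0, 0), (1, 1)])
print(n_blobs, per_blob)  # 2 [2, 0]
```

Here the predicted count is 2, but the first blob covers two annotated objects (it should be split) and the second blob covers none (it is a false positive), so the correct count for this image is 2 only by coincidence.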