Abstract. We introduce count-guided weakly supervised localization
(C-WSL), an approach that uses per-class object counts as a new form
of supervision to improve weakly supervised localization (WSL). During
training, C-WSL uses a simple count-based region selection algorithm to
select high-quality regions, each of which covers a single object
instance, and improves existing WSL methods by training them with the
selected regions. To demonstrate the effectiveness of C-WSL, we
integrate it into two WSL architectures and conduct extensive
experiments on VOC2007 and VOC2012. Experimental results show that
C-WSL leads to large improvements in WSL and that the proposed approach
significantly outperforms state-of-the-art methods. Annotation
experiments on VOC2007 suggest that obtaining per-class object counts
requires only a modest amount of extra time compared to labeling only
the object categories in an image. Furthermore, compared to
center-click and bounding-box annotations, count annotation reduces
annotation time by more than 2× and 38×, respectively.
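To make the idea of count-guided region selection concrete, the following is a minimal, hypothetical sketch. It is not the C-WSL algorithm from this paper. It only illustrates one plausible way a per-class count could constrain selection: candidate regions are visited greedily in descending score order, and a region is kept only if it does not overlap an already kept region too much, stopping once `count` regions have been chosen. The function names `select_regions_by_count` and `iou`, the `(x1, y1, x2, y2)` box format, and the `iou_thresh=0.3` default are all illustrative assumptions.

```python
from typing import List, Sequence, Tuple

# Hypothetical box format (assumption): (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_regions_by_count(
    boxes: Sequence[Box],
    scores: Sequence[float],
    count: int,
    iou_thresh: float = 0.3,  # hypothetical overlap threshold
) -> List[int]:
    """Greedily pick up to `count` high-scoring, mutually non-overlapping regions.

    Illustrative sketch only: the count caps how many regions are kept, so each
    kept region is encouraged to cover a different object instance.
    """
    # Visit candidates from highest to lowest score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    selected: List[int] = []
    for i in order:
        if len(selected) == count:
            break  # the per-class count has been reached
        # Keep the region only if it does not heavily overlap any kept region.
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in selected):
            selected.append(i)
    return selected


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (50, 50, 55, 55)]
    scores = [0.9, 0.85, 0.8, 0.1]
    print(select_regions_by_count(boxes, scores, count=2))
```

With `count=2`, the second box is skipped because it overlaps the first heavily (IoU of about 0.68), so the sketch returns `[0, 2]`: one region per object instance rather than two regions on the same object. A hard count cap like this is simply the easiest way to show how count supervision can limit the selection.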