Improving Occlusion and Hard Negative Handling
for Single-Stage Pedestrian Detectors
Abstract
We propose methods for addressing two critical issues
in pedestrian detection: (i) occlusion of target objects, which causes
false negatives, and (ii) confusion with hard negative examples such as vertical structures, which causes false positives. Our solutions to these two problems are general and
flexible enough to be applicable to any single-stage detection model. We implement our methods in four state-of-the-art single-stage models: SqueezeDet+ [22],
YOLOv2 [17], SSD [12], and DSSD [8]. We empirically
validate that our approach improves the performance of all four models on the Caltech pedestrian [4] and
CityPersons [25] datasets. Moreover, in some heavy occlusion settings, our approach achieves the best reported performance. Specifically, our two solutions are as follows.
For better occlusion handling, we extend the output tensors of single-stage models so that they include predictions of part confidence scores, from which we compute
a final occlusion-aware detection score. To reduce confusion with hard negative examples, we introduce average
grid classifiers as post-refinement classifiers, trainable in
an end-to-end fashion with little memory and time overhead
(e.g., an increase of 1–5 MB in memory and 1–2 ms in inference
time).
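
As a rough illustration of how part confidence scores could feed an occlusion-aware detection score, the sketch below averages a grid of part confidences over a few candidate visible regions and scales the box confidence by the best-supported region. The abstract does not specify the scoring rule, so the function names, the candidate regions (full, upper, left, right), and the "box confidence times best region mean" rule are all assumptions made for this example and are not necessarily the formulation used in the paper.

```python
from statistics import mean


def region_mean(part_grid, rows, cols):
    """Average part confidence over the given rows and columns."""
    return mean(part_grid[r][c] for r in rows for c in cols)


def occlusion_aware_score(box_conf, part_grid):
    """Combine a box confidence with a grid of part confidences.

    Hypothetical rule (an assumption, not taken from the paper):
    score = box_conf * max over candidate visible regions of the
    mean part confidence inside that region.
    Assumes part_grid has at least 2 rows and 2 columns.
    """
    n_rows, n_cols = len(part_grid), len(part_grid[0])
    all_rows, all_cols = range(n_rows), range(n_cols)
    # Candidate visibility patterns; the choice of regions is illustrative.
    candidates = {
        "full": (all_rows, all_cols),
        "upper": (range(n_rows // 2), all_cols),
        "left": (all_rows, range(n_cols // 2)),
        "right": (all_rows, range(n_cols // 2, n_cols)),
    }
    best_name, best_mean = max(
        ((name, region_mean(part_grid, rows, cols))
         for name, (rows, cols) in candidates.items()),
        key=lambda item: item[1],
    )
    return box_conf * best_mean, best_name


# A pedestrian whose lower body is hidden: upper parts are confident,
# lower parts are not.
grid = [
    [0.9, 0.9],
    [0.9, 0.9],
    [0.1, 0.1],
    [0.1, 0.1],
]
score, region = occlusion_aware_score(0.8, grid)
```

Under this assumed rule, a detection whose lower half is occluded is scored by its visible upper half rather than being penalized by the low-confidence occluded parts.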