Abstract
Pedestrian detection has progressed significantly in recent years. However, occluded people are notoriously hard to detect, as their appearance varies substantially across a wide range of occlusion patterns. In this paper, we propose a simple and compact method for occluded pedestrian detection based on the FasterRCNN architecture.
We start by interpreting the CNN channel features of a pedestrian detector and find that different channels respond to different body parts. These findings motivate us to employ an attention mechanism across channels to represent various occlusion patterns in one single model, since each occlusion pattern can be formulated as a specific combination of body parts. We therefore propose an attention network with self or external guidance as an add-on to the baseline FasterRCNN detector. On the heavy occlusion subset, we achieve a significant improvement of 8pp over the baseline FasterRCNN detector on CityPersons, and on Caltech we outperform the state-of-the-art method by 4pp.
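To make "attention across channels" concrete, here is a minimal sketch of self-guided channel attention in the spirit of squeeze-and-excitation: each channel is pooled to a scalar, a linear layer plus sigmoid turns the pooled vector into per-channel weights in (0, 1), and every channel is rescaled by its weight. If channels correspond to body parts, a weight vector that keeps some channels and suppresses others acts like one occlusion pattern. The single linear layer, the hand-picked weights, and all function names are illustrative assumptions, not the paper's exact attention network, and external guidance is not shown.

```python
import math
from typing import List

# A feature map as nested lists: features[c][y][x] (C channels, each H x W).
FeatureMap = List[List[List[float]]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def global_avg_pool(features: FeatureMap) -> List[float]:
    """Squeeze each channel to one scalar by averaging over its spatial grid."""
    pooled = []
    for channel in features:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def channel_attention(
    features: FeatureMap, weights: List[List[float]], bias: List[float]
) -> List[float]:
    """Hypothetical self-guided attention: one linear layer + sigmoid on the pooled vector.

    attention[c] = sigmoid(sum_j weights[c][j] * pooled[j] + bias[c])
    """
    pooled = global_avg_pool(features)
    return [
        sigmoid(sum(w * p for w, p in zip(row, pooled)) + b)
        for row, b in zip(weights, bias)
    ]


def reweight(features: FeatureMap, attention: List[float]) -> FeatureMap:
    """Scale every value in channel c by attention[c]."""
    return [
        [[a * v for v in row] for row in channel]
        for channel, a in zip(features, attention)
    ]


# Toy example: channel 0 (say, upper body) is active, channel 1 (say, legs) is not.
features = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
weights = [[4.0, 0.0], [0.0, 4.0]]  # illustrative, hand-picked parameters
bias = [0.0, -4.0]

attention = channel_attention(features, weights, bias)
# attention is roughly [0.98, 0.02]: channel 0 kept, channel 1 suppressed
attended = reweight(features, attention)
```

In a real detector the linear layer's parameters would be learned jointly with the FasterRCNN head rather than set by hand, so that different inputs yield different channel weightings.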