Abstract. In this paper, we propose a graininess-aware deep feature
learning method for pedestrian detection. Unlike most existing pedestrian detection methods, which consider only low-resolution feature maps, we incorporate fine-grained information into convolutional features
to make them more discriminative for human body parts. Specifically,
we propose a pedestrian attention mechanism which efficiently identifies
pedestrian regions. Our method encodes fine-grained attention masks into convolutional feature maps, which significantly suppresses background
interference and highlights pedestrians. Hence, our graininess-aware features become more focused on pedestrians, in particular those that are small
or occluded. We further introduce a zoom-in-zoom-out module, which enhances the features by incorporating local details and contextual information. We integrate these two modules into a deep neural
network, forming an end-to-end trainable pedestrian detector. Comprehensive experimental results on four challenging pedestrian benchmarks
demonstrate the effectiveness of the proposed approach.