Abstract. Pedestrian detection in crowded scenes is challenging because pedestrians often gather together and occlude each other. In
this paper, we propose a new occlusion-aware R-CNN (OR-CNN) to improve detection accuracy in crowds. Specifically, we design a new
aggregation loss that encourages proposals to lie close to, and cluster compactly around,
their corresponding objects. Meanwhile, we use a new part occlusion-aware region of interest (PORoI) pooling unit to replace the RoI pooling
layer, integrating prior structural information about the human body
with visibility prediction into the network to handle occlusion. Our detector is trained end-to-end and achieves state-of-the-art
results on three pedestrian detection datasets, i.e., CityPersons, ETH,
and INRIA, and performs on par with the state of the art on Caltech.
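
To make the idea of proposals that "locate compactly" around their objects concrete, the following is a minimal sketch of one plausible compactness penalty. It is an illustration under stated assumptions, not the loss defined in the paper: the function names (smooth_l1, compactness_loss), the use of raw box coordinates, and the choice to compare the mean of each object's assigned proposals against its ground-truth box are all hypothetical.

```python
from collections import defaultdict


def smooth_l1(x):
    """Smooth L1 penalty on a scalar difference."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def compactness_loss(proposals, assignments, ground_truths):
    """Hypothetical compactness term (an assumption, not the paper's formula).

    proposals:     list of (x1, y1, x2, y2) boxes
    assignments:   list of ground-truth indices, one per proposal
    ground_truths: list of (x1, y1, x2, y2) boxes

    Returns the smooth-L1 distance between each ground-truth box and the
    mean of the proposals assigned to it, averaged over the ground-truth
    boxes that have at least one proposal.
    """
    # Group proposals by the ground-truth box they are assigned to.
    groups = defaultdict(list)
    for box, gt_idx in zip(proposals, assignments):
        groups[gt_idx].append(box)
    if not groups:
        return 0.0

    total = 0.0
    for gt_idx, boxes in groups.items():
        # Coordinate-wise mean of the proposals in this group.
        mean_box = [sum(coord) / len(boxes) for coord in zip(*boxes)]
        total += sum(
            smooth_l1(m - g) for m, g in zip(mean_box, ground_truths[gt_idx])
        )
    return total / len(groups)
```

In this sketch, a group of proposals whose average coincides with the ground-truth box incurs no penalty, while a group that is collectively shifted away from its object is pulled back toward it.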