Learning Deep Context-aware Features over Body and Latent Parts
for Person Re-identification
Abstract
Person re-identification (ReID) aims to identify the same
person across different cameras. It is a challenging task
due to large variations in person pose, occlusion, background clutter, etc. How to extract powerful features is a
fundamental problem in ReID and remains open today. In this paper, we design a Multi-Scale Context-Aware Network (MSCAN) to learn powerful features over the
full body and body parts, which captures local context knowledge well by stacking multi-scale convolutions
in each layer. Moreover, instead of using predefined rigid
parts, we propose to learn and localize deformable pedestrian parts using Spatial Transformer Networks (STN) with
novel spatial constraints. The learned body parts can alleviate some difficulties, e.g., pose variations and background
clutter, in part-based representation. Finally, we integrate the representation learning processes of the full body
and body parts into a unified framework for person ReID through multi-class person identification tasks. Extensive
evaluations on current challenging large-scale person ReID
datasets, including the image-based Market1501 and CUHK03
datasets and the sequence-based MARS dataset, show that the proposed method achieves state-of-the-art results.
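The idea of stacking convolutions at multiple scales within one layer can be sketched in a minimal NumPy toy. This is only an illustration of the concept, not the authors' implementation: the 3x3 kernel size and the dilation rates (1, 2, 3) are assumptions made here for clarity.

```python
import numpy as np

def dilated_conv2d(x, kernel, dilation):
    """'Same'-padded 2-D convolution of a single-channel map x with a
    dilated odd-sized square kernel (stride 1)."""
    h, w = x.shape
    k = kernel.shape[0]
    pad = dilation * (k // 2)
    xp = np.pad(x, pad)
    out = np.zeros((h, w), dtype=float)
    # Accumulate each kernel tap, offset by the dilation rate.
    for i in range(k):
        for j in range(k):
            out += kernel[i, j] * xp[i * dilation : i * dilation + h,
                                     j * dilation : j * dilation + w]
    return out

def multiscale_layer(x, kernels, dilations=(1, 2, 3)):
    """Apply one convolution per dilation rate and stack the responses,
    mimicking a multi-scale layer that sees small and large local context."""
    return np.stack([dilated_conv2d(x, k, d)
                     for k, d in zip(kernels, dilations)])
```

Each dilation rate enlarges the receptive field without extra parameters, so stacking the three responses gives every spatial position features computed over several context sizes at once.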