Abstract
Person Re-identification (ReID) is an important yet challenging task in computer vision. Due to diverse background clutter and variations in viewpoint and body pose, it
is far from solved. Extracting discriminative and robust
features that are invariant to background clutter is the core problem. In this paper, we first introduce binary segmentation masks to construct synthetic RGB-Mask pairs as inputs,
then we design a mask-guided contrastive attention model
(MGCAM) to learn features separately from the body and
background regions. Moreover, we propose a novel region-level triplet loss to constrain the features learnt from different regions, i.e., pulling the features from the full image and
body region close together while pushing the features from the background away. To the best of our knowledge, we are the first to successfully introduce binary masks into the person ReID task and the first
to propose region-level contrastive learning. We evaluate the proposed method on three public datasets, including
MARS, Market-1501 and CUHK03. Extensive experimental results show that the proposed method is effective and
achieves state-of-the-art results. Masks and code will be
released upon request.
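
The region-level triplet loss described above can be illustrated with a minimal sketch. It assumes the full-image feature acts as the anchor, the body feature as the positive, and the background feature as the negative, with a squared Euclidean distance and a hinge margin. The function names, the distance choice, and the default margin are illustrative assumptions and are not taken from the paper.

```python
def squared_l2(u, v):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def region_triplet_loss(f_full, f_body, f_bg, margin=1.0):
    """Hinge-style triplet loss over region features (illustrative sketch).

    f_full: feature of the full image (anchor)
    f_body: feature of the body region (positive, pulled towards the anchor)
    f_bg:   feature of the background region (negative, pushed away)
    margin: assumed margin value, not specified in the abstract
    """
    d_pos = squared_l2(f_full, f_body)
    d_neg = squared_l2(f_full, f_bg)
    # The loss is zero once the background is farther than the body by `margin`.
    return max(0.0, d_pos - d_neg + margin)
```

Under these assumptions, the loss vanishes when the full-image feature is already closer to the body feature than to the background feature by at least the margin, and grows otherwise.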