Abstract
Person re-identification (re-ID), which has broad application prospects in video surveillance
and beyond, has recently attracted increasing research attention. Most existing methods
rely heavily on well-aligned pedestrian images and
hand-engineered part-based models applied to the coarsest
feature map. In this paper, to relax the restriction of such fixed and coarse input alignment, an
end-to-end part power set model with multi-scale
features is proposed, which captures the discriminative parts of pedestrians from global to local, and
from coarse to fine, enabling part-based scale-free
person re-ID. In particular, we first factorize the
visual appearance by enumerating the k-combinations
of n body parts for all k, exploiting rich global
and partial information to learn discriminative feature maps. Then, a combination ranking module
is introduced to guide the model training with all
combinations of body parts, which alternates between ranking combinations and estimating an appearance model. To enable scale-free input, we further exploit the pyramid architecture of deep networks to construct multi-scale feature maps with
a feasible amount of extra cost in terms of memory and time. Extensive experiments on mainstream evaluation datasets, including Market-1501,
DukeMTMC-reID and CUHK03, validate that our
method achieves state-of-the-art performance.
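
As an illustration of the part power set idea, the enumeration of k-combinations over n body parts can be sketched as follows. This is a minimal sketch, not the model itself: the function name part_combinations, the use of integer indices to stand for body parts, and the choice to skip the empty combination (k = 0) are all assumptions made for this example.

```python
from itertools import combinations


def part_combinations(n_parts):
    """Return every non-empty subset of part indices, ordered by size k.

    Hypothetical helper: parts are represented by the indices 0..n_parts-1,
    and the empty combination (k = 0) is skipped by assumption.
    """
    combos = []
    for k in range(1, n_parts + 1):
        # All ways of choosing k of the n parts, in lexicographic order.
        combos.extend(combinations(range(n_parts), k))
    return combos


if __name__ == "__main__":
    for combo in part_combinations(3):
        print(combo)
```

For n parts this yields 2^n - 1 combinations, ranging from single parts (local cues) to the full set of parts (the global view), so the number of combinations grows exponentially with n.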