Abstract
Existing person re-identification benchmarks and methods mainly focus on matching cropped pedestrian images
between queries and candidates. However, this setting differs
from real-world scenarios, where pedestrian bounding-box annotations are unavailable and the target person must be searched for in a gallery of whole scene images. To close the gap, we propose a new deep learning
framework for person search. Instead of breaking it down
into two separate tasks, pedestrian detection and person
re-identification, we jointly handle both aspects in a single
convolutional neural network. We propose an Online Instance Matching (OIM) loss function to train the network effectively; it scales to datasets with numerous identities. To validate our approach, we collect and annotate
a large-scale benchmark dataset for person search. It contains 18,184 images, 8,432 identities, and 96,143 pedestrian bounding boxes. Experiments show that our framework outperforms approaches that treat the two tasks separately, and that the proposed OIM loss function converges much faster and to a better result
than the conventional Softmax loss.