Abstract
In this work, we propose a method for simultaneously learning features and a corresponding similarity metric for person re-identification. We present a deep convolutional architecture with layers specially designed to address the problem of re-identification. Given a pair of images as input, our network outputs a similarity value indicating whether the two input images depict the same person. Novel elements of our architecture include a layer that computes cross-input neighborhood differences, which capture local relationships between the two input images based on mid-level features from each input image. A high-level summary of the outputs of this layer is computed by a layer of patch summary features, which are then spatially integrated in subsequent layers. Our method significantly outperforms the state of the art on both a large data set (CUHK03) and a medium-sized data set (CUHK01), and is resistant to overfitting. We also demonstrate that by initially training on an unrelated large data set before fine-tuning on a small target data set, our network can achieve results comparable to the state of the art even on a small data set (VIPeR).
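To illustrate the idea of cross-input neighborhood differences, the following is a minimal NumPy sketch rather than the network layer itself. It assumes that each image has already been mapped to a mid-level feature map of shape (channels, height, width), and that the difference at each location is taken between one feature value from the first map and every value in a k x k window of the second map centered at the same location. The function name, the window size `k`, and the zero padding at the borders are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def cross_input_neighborhood_differences(f, g, k=5):
    """Hypothetical sketch of a cross-input neighborhood difference.

    f, g: feature maps of shape (C, H, W) from the two input images.
    Returns an array of shape (C, H, W, k, k) where entry [c, y, x, dy, dx]
    is f[c, y, x] minus the value of g at offset (dy - k//2, dx - k//2)
    from (y, x). Locations outside g are treated as zero (assumed padding).
    """
    if f.shape != g.shape:
        raise ValueError("f and g must have the same shape")
    if k % 2 != 1:
        raise ValueError("k must be odd so the window has a center")

    pad = k // 2
    _, height, width = f.shape
    # Zero-pad only the two spatial axes of g.
    g_padded = np.pad(g, ((0, 0), (pad, pad), (pad, pad)))

    out = np.empty(f.shape + (k, k), dtype=np.result_type(f, g))
    for dy in range(k):
        for dx in range(k):
            # Shifted view of g aligned with every (y, x) location of f.
            out[..., dy, dx] = f - g_padded[:, dy:dy + height, dx:dx + width]
    return out
```

In this sketch the center entry of each k x k block is the plain difference between the two maps at the same location, while the surrounding entries compare against nearby positions, which gives some tolerance to small spatial misalignment between the two images.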