Abstract
Person re-identification is usually addressed either by matching single-image representations (SIRs) or by classifying cross-image representations (CIRs). In this work, we exploit the connection between these two categories of methods and propose a joint learning framework that unifies SIR and CIR using a convolutional neural network (CNN). Specifically, our deep architecture contains one shared sub-network together with two sub-networks that extract the SIRs of given images and the CIRs of given image pairs, respectively. The SIR sub-network needs to be computed only once for each image (in both the probe and gallery sets), and the depth of the CIR sub-network is kept minimal to reduce the computational burden. The two types of representation can therefore be jointly optimized to pursue better matching accuracy at moderate computational cost. Furthermore, the representations learned with pairwise comparison and triplet comparison objectives can be combined to improve matching performance. Experiments on the CUHK03, CUHK01 and VIPeR datasets show that the proposed method achieves favorable accuracy compared with state-of-the-art methods.
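
The computational argument above can be illustrated with a minimal sketch. It is not the proposed network: the three functions are toy stand-ins for the shared, SIR and CIR sub-networks, and all names (`shared_net`, `sir_net`, `cir_net`, `match_scores`, `weight`) are hypothetical. The sketch also assumes that the CIR sub-network operates on cached outputs of the shared sub-network and that the two scores are fused by a weighted sum; neither detail is stated in the abstract. It only counts how often each part runs when every probe is compared with every gallery image.

```python
import math
from itertools import product

# Counters recording how many times each (toy) sub-network is evaluated.
calls = {"shared": 0, "sir": 0, "cir": 0}

def shared_net(image):
    """Toy stand-in for the shared sub-network: one pass per image."""
    calls["shared"] += 1
    return [math.tanh(x) for x in image]

def sir_net(shared_feat):
    """Toy stand-in for the SIR sub-network: one pass per image."""
    calls["sir"] += 1
    return [2.0 * x for x in shared_feat]

def cir_net(feat_a, feat_b):
    """Toy stand-in for a shallow CIR sub-network: one pass per image pair."""
    calls["cir"] += 1
    return sum(abs(a - b) for a, b in zip(feat_a, feat_b))

def sir_distance(sir_a, sir_b):
    """Euclidean distance between two single-image representations."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(sir_a, sir_b)))

def match_scores(probes, gallery, weight=1.0):
    """Score every probe/gallery pair; a lower score means a better match.

    The weighted sum of the SIR distance and the CIR output is an
    assumption made for illustration only.
    """
    for key in calls:
        calls[key] = 0
    # Per-image work: done once and cached.
    probe_feats = [shared_net(p) for p in probes]
    gallery_feats = [shared_net(g) for g in gallery]
    probe_sirs = [sir_net(f) for f in probe_feats]
    gallery_sirs = [sir_net(f) for f in gallery_feats]
    # Per-pair work: only the shallow CIR part and a cheap distance.
    scores = {}
    for i, j in product(range(len(probes)), range(len(gallery))):
        sir_term = sir_distance(probe_sirs[i], gallery_sirs[j])
        cir_term = cir_net(probe_feats[i], gallery_feats[j])
        scores[(i, j)] = sir_term + weight * cir_term
    return scores
```

With P probe images and G gallery images, the shared and SIR stand-ins each run P + G times, while only the CIR stand-in runs P × G times, which is why keeping that part shallow matters for the overall cost.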