HorNet: A Hierarchical Offshoot Recurrent Network for Improving Person Re-ID via Image Captioning
Abstract
Person re-identification (re-ID) aims to recognize a person of interest across different cameras despite significant variation in appearance. Existing research has focused on the capability and robustness of the visual representation. In this paper, we instead propose a novel hierarchical offshoot recurrent network (HorNet) that improves person re-ID via image captioning. Image captions are semantically richer and more consistent than visual attributes, and can therefore substantially alleviate this appearance variance. We use the similarity preserving generative adversarial network (SPGAN) for domain transfer and an image captioner to generate language descriptions. The proposed HorNet then jointly learns visual and language representations from both the images and their captions, thereby enhancing person re-ID performance. Extensive experiments on several benchmark datasets with or without image captions, i.e., CUHK03, Market-1501, and Duke-MTMC, demonstrate the superiority of the proposed method. Our method generates and extracts meaningful image captions while achieving state-of-the-art performance.
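The abstract does not specify how HorNet combines its visual and language representations. As a rough illustration only, the sketch below assumes the simplest possible scheme: each modality's embedding is L2-normalized, the two are concatenated, and the result is normalized again. A gallery is then ranked by cosine similarity, which is the standard retrieval protocol behind re-ID metrics such as rank-1 accuracy. The function names (`fuse`, `rank_gallery`, `rank1_accuracy`) and the concatenation fusion are hypothetical. This is not the hierarchical offshoot recurrent architecture itself.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse(visual: Sequence[float], language: Sequence[float]) -> List[float]:
    """Hypothetical fusion: normalize each modality, concatenate, renormalize.

    This is an assumed stand-in, NOT HorNet's actual fusion mechanism.
    """
    return l2_normalize(l2_normalize(visual) + l2_normalize(language))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; both inputs are assumed to be unit length already."""
    return sum(x * y for x, y in zip(a, b))


def rank_gallery(query: Sequence[float], gallery: Sequence[Sequence[float]]) -> List[int]:
    """Return gallery indices sorted from most to least similar to the query."""
    scores = [cosine(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


def rank1_accuracy(queries, query_ids, gallery, gallery_ids) -> float:
    """Fraction of queries whose top-ranked gallery entry has the same identity."""
    hits = 0
    for q, qid in zip(queries, query_ids):
        top = rank_gallery(q, gallery)[0]
        hits += int(gallery_ids[top] == qid)
    return hits / len(queries)


if __name__ == "__main__":
    # Toy embeddings: 2-D "visual" and 3-D "caption" features per image.
    gallery = [fuse([1.0, 0.0], [1.0, 0.0, 0.0]),   # identity "A"
               fuse([0.0, 1.0], [0.0, 0.0, 1.0])]   # identity "B"
    queries = [fuse([0.9, 0.1], [1.0, 0.0, 0.0]),
               fuse([0.2, 0.8], [0.0, 0.1, 0.9])]
    print(rank1_accuracy(queries, ["A", "B"], gallery, ["A", "B"]))
```

Concatenating unit-normalized modalities gives each one equal weight in the similarity score. A learned model such as HorNet would instead learn how the two sources of evidence interact.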