Abstract
Key to effective person re-identification (Re-ID) is modelling discriminative and view-invariant factors of person
appearance at both high and low semantic levels. Recently
developed deep Re-ID models learn a holistic feature representation at a single
semantic level and/or require laborious human annotation of these factors as attributes. We
propose Multi-Level Factorisation Net (MLFN), a novel network architecture that factorises the visual appearance of a
person into latent discriminative factors at multiple semantic levels without manual annotation. MLFN is composed
of multiple stacked blocks. Each block contains multiple
factor modules to model latent factors at a specific level,
and factor selection modules that dynamically select the
factor modules to interpret the content of each input image. The outputs of the factor selection modules also provide a compact latent factor descriptor that is complementary to the conventional deeply learned features. MLFN
achieves state-of-the-art results on three Re-ID datasets, as
well as compelling results on the CIFAR-100 general object categorisation dataset.
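
The block structure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify layer types, dimensions, or the gating function, so the dense factor modules, the sigmoid gates, the gate-weighted sum, and all names (`Block`, `forward_network`, `num_factors`) are assumptions made for illustration. Each block holds several factor modules and one selection module, and the selection outputs of all blocks are concatenated into a compact latent factor descriptor.

```python
import math
import random


def linear(weights, x):
    """Dense layer without bias: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def random_matrix(rng, rows, cols):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class Block:
    """One block: several factor modules plus one selection module.

    Hypothetical design; the real modules are presumably convolutional.
    """

    def __init__(self, rng, dim, num_factors):
        self.factor_modules = [random_matrix(rng, dim, dim)
                               for _ in range(num_factors)]
        self.selector = random_matrix(rng, num_factors, dim)

    def forward(self, x):
        # One gate in (0, 1) per factor module, computed from the input itself,
        # so the selection is dynamic (input-dependent). Sigmoid is an assumption.
        gates = [sigmoid(z) for z in linear(self.selector, x)]
        # Each factor module transforms the input; tanh is illustrative only.
        outputs = [[math.tanh(v) for v in linear(fm, x)]
                   for fm in self.factor_modules]
        # The gate-weighted sum of module outputs feeds the next block.
        mixed = [sum(g * out[i] for g, out in zip(gates, outputs))
                 for i in range(len(x))]
        return mixed, gates


def forward_network(blocks, x):
    """Run stacked blocks; collect every block's gates as a factor descriptor."""
    descriptor = []
    for block in blocks:
        x, gates = block.forward(x)
        descriptor.extend(gates)
    return x, descriptor


rng = random.Random(0)
blocks = [Block(rng, dim=8, num_factors=4) for _ in range(3)]
image_feature = [rng.gauss(0.0, 1.0) for _ in range(8)]  # stand-in for an image
feature, descriptor = forward_network(blocks, image_feature)
```

Here `feature` plays the role of the conventional deeply learned feature, while `descriptor` (3 blocks × 4 gates) stands in for the complementary latent factor descriptor; in a retrieval setting the two could be combined before matching.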