Abstract
Matching persons across non-overlapping cameras is a challenging task. Successful methods therefore often build on complex feature representations or sophisticated learners. A recent trend is to use metric learning to find a suitable space for matching samples from different cameras. However, most of these approaches ignore the transition from one camera to the other. In this paper, we propose to learn a metric from pairs of samples from different cameras. In this way, even less sophisticated features describing color and texture information are sufficient to obtain state-of-the-art classification results. Moreover, once the metric has been learned, only linear projections are necessary at search time, where a simple nearest neighbor classification is performed. The approach is demonstrated on three publicly available datasets of different complexity, showing that state-of-the-art results can be obtained at much lower computational cost.
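The search-time step described above (a linear projection followed by nearest neighbor matching) can be sketched as follows. This is a minimal illustration only, not the implementation used in the paper: the projection matrix `L` is assumed to have been learned beforehand, the function names `project` and `rank_gallery` are hypothetical, and the toy feature vectors are made up.

```python
def project(L, x):
    """Apply the linear map L (given as a list of rows) to feature vector x."""
    return [sum(l_ij * x_j for l_ij, x_j in zip(row, x)) for row in L]


def rank_gallery(L, probe, gallery):
    """Rank gallery samples by squared Euclidean distance to the probe
    after projecting both with L.

    This equals the distance (x - y)^T L^T L (x - y) in the original
    feature space, so no further learning is needed at search time.
    """
    p = project(L, probe)
    dists = []
    for g in gallery:
        q = project(L, g)
        dists.append(sum((a - b) ** 2 for a, b in zip(p, q)))
    # Indices of gallery samples, nearest first.
    order = sorted(range(len(gallery)), key=lambda i: dists[i])
    return order, dists


# Toy data, made up for illustration: L down-weights the second feature.
L = [[1.0, 0.0],
     [0.0, 0.1]]
probe = [0.0, 0.0]                                # sample from camera A
gallery = [[0.5, 0.0], [0.0, 2.0], [1.0, 1.0]]    # samples from camera B

order, dists = rank_gallery(L, probe, gallery)
print(order)  # gallery indices, nearest first
```

In this toy example the second gallery sample becomes the nearest neighbor after projection, although it is the farthest under the plain Euclidean distance, which illustrates how a learned linear map can change the ranking while matching itself stays a simple nearest neighbor search.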