Abstract
Despite the great success of face recognition techniques,
recognizing persons under unconstrained settings remains
challenging. Issues like profile views, unfavorable lighting,
and occlusions can cause substantial difficulties. Previous
works have attempted to tackle this problem by exploiting
context, e.g., clothes and social relations. While showing
promising improvement, they are usually limited in two
important aspects: they rely on simple heuristics to combine
different cues, and they separate the construction of context
from the reasoning about identities. In this work, we aim to
move beyond such limitations and propose a new framework
that leverages context for person recognition. In particular,
we propose a Region Attention Network, which learns to
adaptively combine visual cues with instance-dependent
weights. We also
develop a unified formulation in which social contexts are
learned jointly with the reasoning about people's identities.
These models substantially improve robustness when handling
complex contextual relations in unconstrained environments.
On two large datasets, PIPA [27] and Cast In
Movies (CIM), a new dataset proposed in this work, our
method consistently achieves state-of-the-art performance
under multiple evaluation policies.