Abstract
To narrow the inherent sensing gap in heterogeneous face recognition (HFR), recent methods
have resorted to generative models and explored
the “recognition via generation” framework. Even
so, it remains very challenging to synthesize photo-realistic visible (VIS) faces from
near-infrared (NIR) images, especially when paired
training data are unavailable. We present an approach to avert the data misalignment problem and
faithfully preserve pose, expression and identity
information during cross-spectral face hallucination. At the pixel level, we introduce an unsupervised attention mechanism for warping that is
jointly learned with the generator to derive pixel-wise correspondence from unaligned data. At the
image level, an auxiliary generator is employed to
facilitate learning the mapping from the NIR to the VIS
domain. At the domain level, we are the first to apply a
mutual information constraint to explicitly measure
the correlation between domains and thereby benefit
synthesis. Extensive experiments on three heterogeneous face datasets demonstrate that our approach not only outperforms current state-of-the-art
HFR methods but also produces visually appealing
results at a high resolution (256×256).