Abstract. Face recognition has witnessed great progress in recent years,
mainly owing to high-capacity models and abundant labeled data.
However, scaling up beyond the current million-level identity
annotations is becoming increasingly prohibitive. In this work,
we show that unlabeled face data can be as effective as labeled data.
Here, we consider a setting closely mimicking the real-world scenario,
where the unlabeled data are collected from unconstrained environments
and their identities are disjoint from those of the labeled data. Our
main insight is that, although class information is not available, we
can still faithfully approximate the underlying semantic relationships
by constructing a relational graph in a bottom-up manner. We propose
Consensus-Driven Propagation (CDP) to tackle this challenging problem
with two modules, the “committee” and the “mediator”, which select
positive face pairs robustly by carefully aggregating multi-view
information. Extensive experiments validate the effectiveness of both
modules in discarding outliers and mining hard positives. With CDP, we
achieve a compelling accuracy of 78.18% on the MegaFace identification
challenge using only 9% of the labels, compared to 61.78% when no
unlabeled data are used and 78.52% when all labels are employed.
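
To make the idea of consensus-based pair selection concrete, the following is a minimal sketch, not the implementation described in this work. It assumes that each "view" is a feature matrix produced by a different model for the same unlabeled faces, and it replaces the mediator with a simple unanimous-agreement rule over k-nearest-neighbor graphs. The function names (knn_edges, consensus_pairs, propagate_labels) and the parameter k are hypothetical and chosen only for illustration.

```python
import numpy as np


def knn_edges(features, k):
    """Return undirected k-NN pairs (i, j) with i < j under cosine similarity."""
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)  # exclude self-matches
    neighbors = np.argsort(-sim, axis=1)[:, :k]
    return {
        (min(i, int(j)), max(i, int(j)))
        for i in range(len(features))
        for j in neighbors[i]
    }


def consensus_pairs(base_features, committee_features, k):
    """Keep base-model k-NN pairs that every committee view also proposes.

    Assumption: unanimous agreement stands in for the mediator here.
    """
    candidates = knn_edges(base_features, k)
    for feats in committee_features:
        candidates &= knn_edges(feats, k)
    return candidates


def propagate_labels(num_samples, pairs):
    """Assign pseudo labels as connected components of the selected pairs."""
    parent = list(range(num_samples))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in pairs:
        parent[find(i)] = find(j)

    remap = {}
    return [remap.setdefault(find(i), len(remap)) for i in range(num_samples)]
```

With this sketch, a pair receives the same pseudo label only when all views agree that the two faces are neighbors, which illustrates how aggregating several views can discard outlier pairs that a single model would accept.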