Abstract
We present a learning-based approach for synthesizing
facial geometry at medium and fine scales from diffusely-lit
facial texture maps. When applied to an image sequence,
the synthesized detail is temporally coherent. Unlike current state-of-the-art methods [17, 5], which assume "dark
is deep", our model is trained with measured facial detail collected using polarized gradient illumination in a
Light Stage [20]. This enables us to produce plausible
facial detail across the entire face, including regions where previous approaches may incorrectly interpret dark features, such as moles, hair stubble, and occluded
pores, as concavities. Instead of directly inferring 3D geometry, we propose to encode fine details in high-resolution displacement
pores. Instead of directly inferring 3D geometry, we propose to encode fine details in high-resolution displacement
maps which are learned through a hybrid network adopting the state-of-the-art image-to-image translation network
[29] and super resolution network [43]. To effectively capture geometric detail at both mid- and high frequencies, we
factorize the learning into two separate sub-networks, enabling the full range of facial detail to be modeled. Results from our learning-based approach compare favorably
with a high-quality active facial scanning technique, and
require only a single passive lighting condition without a
complex scanning setup.
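To make the two-branch factorization concrete, the following is a minimal PyTorch sketch, not the authors' implementation: one image-to-image branch predicts a mid-frequency displacement map from the diffusely-lit texture, and a super-resolution-style branch adds a high-frequency residual. All module names, resolutions, channel counts, and layer choices are illustrative assumptions.

```python
# Hypothetical sketch of the mid-/high-frequency factorization described in the
# abstract; layer choices and sizes are assumptions, not the paper's architecture.
import torch
import torch.nn as nn

class MidFrequencyNet(nn.Module):
    """Image-to-image translation style branch (coarse displacement)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, texture):
        return self.net(texture)

class HighFrequencyNet(nn.Module):
    """Super-resolution style branch (fine displacement residual)."""
    def __init__(self, upscale=4):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(4, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, upscale * upscale, 3, padding=1),
            nn.PixelShuffle(upscale),  # upsample to high-resolution detail
        )

    def forward(self, texture, mid_disp):
        # Condition the fine branch on both the texture and the coarse displacement.
        x = torch.cat([texture, mid_disp], dim=1)
        return self.body(x)

# Usage: combine both branches into one high-resolution displacement map.
texture = torch.rand(1, 3, 256, 256)                # diffusely-lit texture map
mid = MidFrequencyNet()(texture)                    # 256x256 coarse displacement
fine = HighFrequencyNet(upscale=4)(texture, mid)    # 1024x1024 fine residual
mid_up = nn.functional.interpolate(mid, scale_factor=4, mode='bilinear')
displacement = mid_up + fine                        # full-range displacement map
```

Splitting the prediction this way lets each sub-network specialize in one frequency band, which is the motivation the abstract gives for factorizing the learning into two parts.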