Abstract. In this work we integrate ideas from surface-based modeling
with neural synthesis: we propose a combination of surface-based pose estimation and deep generative models that allows us to perform accurate
pose transfer, i.e. synthesize a new image of a person based on a single
image of that person and the image of a pose donor. We use a dense
pose estimation system that maps pixels from both images to a common
surface-based coordinate system, allowing the two images to be brought
in correspondence with each other. We inpaint and refine the source image intensities in the surface coordinate system, prior to warping them
onto the target pose. These predictions are fused with those of a convolutional predictive module through a neural synthesis module, which allows
the whole pipeline to be trained jointly end-to-end by optimizing a combination
of adversarial and perceptual losses. We show that dense pose estimation
is a substantially more powerful conditioning input than landmark- or
mask-based alternatives, and report systematic improvements over
state-of-the-art generators on the DeepFashion and MVC datasets.
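
To illustrate how a common surface-based coordinate system brings two images into correspondence, the following is a minimal sketch and not the system described above. It assumes that every pixel carries a hypothetical label (part, u, v) giving its body part index and surface coordinates, or None for background. The function names, the quantization resolution `res`, and the toy data are all assumptions made for illustration. Surface locations not visible in the source are left unfilled here, whereas the approach above inpaints them.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical per-pixel label: (part index, u, v) with u, v in [0, 1],
# or None for background pixels.
UV = Optional[Tuple[int, float, float]]
Texel = Tuple[int, int, int]


def quantize(uv: Tuple[int, float, float], res: int) -> Texel:
    """Map continuous surface coordinates to a discrete texel index."""
    part, u, v = uv
    iu = min(int(u * res), res - 1)
    iv = min(int(v * res), res - 1)
    return (part, iu, iv)


def scatter_to_surface(image: List[List[int]],
                       uv_map: List[List[UV]],
                       res: int) -> Dict[Texel, int]:
    """Copy source intensities into the shared surface coordinate system."""
    texture: Dict[Texel, int] = {}
    for pixel_row, uv_row in zip(image, uv_map):
        for value, uv in zip(pixel_row, uv_row):
            if uv is not None:
                texture[quantize(uv, res)] = value
    return texture


def gather_from_surface(texture: Dict[Texel, int],
                        target_uv_map: List[List[UV]],
                        res: int) -> List[List[Optional[int]]]:
    """Read intensities back out at the target pose's surface coordinates.

    Background pixels and texels never observed in the source stay None.
    """
    return [
        [None if uv is None else texture.get(quantize(uv, res)) for uv in row]
        for row in target_uv_map
    ]


# Toy 2x2 example: three source pixels lie on body part 0.
source_image = [[10, 20],
                [30, 0]]
source_uv = [[(0, 0.1, 0.1), (0, 0.9, 0.1)],
             [(0, 0.1, 0.9), None]]
# The target pose shows the same surface points at different pixels,
# plus one surface point (u=0.9, v=0.9) that the source never observed.
target_uv = [[None, (0, 0.1, 0.9)],
             [(0, 0.9, 0.1), (0, 0.9, 0.9)]]

texture = scatter_to_surface(source_image, source_uv, res=2)
warped = gather_from_surface(texture, target_uv, res=2)
print(warped)
```

In this toy setting the two observed surface points are carried over to their new pixel positions, while the unobserved one remains empty; such holes are what inpainting in the surface coordinate system is meant to address.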