Abstract
We present a novel approach for the task of human pose
transfer, which aims at synthesizing a new image of a person
from an input image of that person and a target pose. Unlike
existing methods, we propose to estimate a dense, intrinsic 3D appearance flow to better guide the transfer of pixels between poses. In particular, we aim to generate this 3D
flow from just the reference and target poses. Training a network for this purpose is non-trivial, since annotations for 3D appearance flow are inherently scarce. We
address this problem through a flow synthesis stage. This is
achieved by fitting a 3D model to the given pose pair and
projecting it back to the 2D plane to compute the dense appearance flow for training. The synthesized ground-truth flows
are then used to train a feedforward network for efficient
mapping from the input and target skeleton poses to the
3D appearance flow. With this appearance flow, we perform feature warping on the input image to generate a
photorealistic image of the person in the target pose. Extensive experiments
on the DeepFashion and Market-1501 datasets demonstrate the
effectiveness of our approach over existing methods. Our
code is available at http://mmlab.ie.cuhk.edu.hk/projects/pose-transfer/
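To make the warping step concrete: a dense appearance flow tells each pixel of the target-pose image where to sample from in the source image. The sketch below is a minimal, illustrative nearest-neighbor backward warp in NumPy; the function name and conventions are assumptions for illustration, not the paper's implementation (which warps features differentiably, e.g. with bilinear sampling).

```python
import numpy as np

def warp_with_flow(image, flow):
    """Backward-warp `image` with a dense appearance flow field.

    image: (H, W, C) array of source pixels or features.
    flow:  (H, W, 2) array; flow[y, x] = (dx, dy) points from each
           target-pose pixel back to its source-pose location.
    Uses nearest-neighbor sampling for simplicity; a trainable
    pipeline would use differentiable bilinear sampling instead.
    """
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # Source coordinates, rounded and clamped to the image bounds.
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    return image[src_y, src_x]
```

With a zero flow the warp is the identity; a flow of (+1, 0) everywhere samples each output pixel from its right-hand neighbor in the source.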