Abstract
We present a novel approach for synthesizing photorealistic images of people in arbitrary poses using generative adversarial learning. Given an input image of a person and a desired pose represented by a 2D skeleton, our model renders an image of the same person in the new pose, synthesizing novel views of the body parts visible in the input image and hallucinating those that are not seen. This problem has recently been addressed in a supervised manner [16, 35], i.e., during training the ground-truth images under the new poses are given to the network. We go beyond these approaches by proposing a fully unsupervised strategy. We tackle this challenging scenario by splitting the problem into two principal subtasks. First, we consider a pose-conditioned bidirectional generator that maps the initially rendered image back to the original pose, making it directly comparable to the input image without the need for any ground-truth image of the target pose. Second, we devise a novel loss function that incorporates content and style terms and aims to produce images of high perceptual quality. Extensive experiments conducted on the DeepFashion dataset demonstrate that the images rendered by our model are very close in appearance to those obtained by fully supervised approaches.
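
To make the pose-conditioned cycle concrete, the following is a minimal PyTorch-style sketch of the idea described above: a generator renders the person in a target pose, then maps the rendering back to the source pose so the result can be compared against the input image itself, with no ground-truth image of the target pose required. All names here (Generator, unsupervised_cycle_loss), the toy architecture, and the choice of 18 keypoint-heatmap channels to encode the 2D skeleton are illustrative assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Generator(nn.Module):
        # Hypothetical pose-conditioned generator G(image, pose) -> image.
        def __init__(self, img_channels=3, pose_channels=18):
            super().__init__()
            # Toy fully convolutional body; the actual architecture is not
            # specified in the abstract.
            self.net = nn.Sequential(
                nn.Conv2d(img_channels + pose_channels, 64, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(64, img_channels, 3, padding=1),
                nn.Tanh(),
            )

        def forward(self, image, pose):
            # Condition on the desired pose by concatenating its keypoint
            # heatmaps to the image as extra channels.
            return self.net(torch.cat([image, pose], dim=1))

    def unsupervised_cycle_loss(G, x, pose_src, pose_tgt):
        # Render the person in the desired pose, then map the rendering
        # back to the original pose; the cycled result is directly
        # comparable to the input image, so no ground-truth image of the
        # target pose is needed during training.
        x_rendered = G(x, pose_tgt)
        x_cycled = G(x_rendered, pose_src)
        return F.l1_loss(x_cycled, x)

In training, such a cycle term would be combined with the adversarial objective and the content/style loss terms mentioned above; the abstract does not specify the exact formulation or weighting.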