Abstract
We address the computational problem of novel human
pose synthesis. Given an image of a person and a desired
pose, we produce a depiction of that person in that pose, retaining the appearance of both the person and background.
We present a modular generative neural network that synthesizes unseen poses using training pairs of images and
poses taken from human action videos. Our network separates a scene into distinct body-part and background layers, moves body parts to new locations and refines their
appearances, and composites the new foreground with a
hole-filled background. These subtasks, implemented with
separate modules, are trained jointly using only a single
target image as a supervised label. We use an adversarial
discriminator to force our network to synthesize realistic
details conditioned on pose. We demonstrate image synthesis results on three action classes (golf, yoga/workouts, and tennis)
and show that our method produces accurate results both within and across action classes.
Given a sequence of desired poses, we also produce coherent videos of actions.
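
To make the final compositing step concrete, the following is a minimal sketch of blending a synthesized foreground over a hole-filled background with a per-pixel soft mask. The abstract does not specify the blending rule, so the linear alpha blend used here, the function name `composite`, and the toy grayscale image representation are all illustrative assumptions rather than the network's actual implementation.

```python
from typing import List

# Assumption: a grayscale image is a list of rows of floats in [0, 1].
Image = List[List[float]]


def composite(foreground: Image, mask: Image, background: Image) -> Image:
    """Blend a foreground over a background using a per-pixel soft mask.

    Hypothetical blending rule (not taken from the paper):
        out = mask * foreground + (1 - mask) * background
    """
    if not (len(foreground) == len(mask) == len(background)):
        raise ValueError("all inputs must have the same height")
    out: Image = []
    for fg_row, m_row, bg_row in zip(foreground, mask, background):
        if not (len(fg_row) == len(m_row) == len(bg_row)):
            raise ValueError("all inputs must have the same width")
        # Mask value 1 keeps the foreground pixel, 0 keeps the background pixel.
        out.append([m * f + (1.0 - m) * b
                    for f, m, b in zip(fg_row, m_row, bg_row)])
    return out


# Toy 2x2 example: a white foreground over a black background.
fg = [[1.0, 1.0], [1.0, 1.0]]
bg = [[0.0, 0.0], [0.0, 0.0]]
mask = [[1.0, 0.0], [0.5, 0.0]]
result = composite(fg, mask, bg)
```

In this toy case the output simply reproduces the mask, since the foreground is uniformly 1 and the background uniformly 0. A soft mask is used rather than a binary one so that boundaries between the person and the background can blend smoothly.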