Abstract
We study how to synthesize novel views of a human body from a single image. Though recent deep-learning-based methods work well for rigid objects, they often fail on objects with large articulation, such as human bodies. The core step of existing methods is to fit a map from observed views to novel views with CNNs; however, the rich articulation modes of the human body make it challenging for CNNs to memorize and interpolate the data well.
To address this problem, we propose a novel deep-learning-based pipeline that explicitly estimates and leverages the geometry of the underlying human body. The pipeline is a composition of a shape estimation network and an image generation network; at their interface, a perspective transformation is applied to generate a forward flow for pixel-value transportation. This design factors out the space of data variation and makes learning at each step much easier. Empirically, we show that performance on
pose-varying objects improves dramatically. Our method can also be applied to real data captured by 3D sensors, and the flow it generates can be used to produce high-quality results at higher resolutions.
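
To make the composition concrete, below is a minimal PyTorch-style sketch of the two-stage pipeline described in this abstract; it is our illustration under stated assumptions, not the authors' implementation. The names NovelViewPipeline, shape_net, image_net, flow_from_depth, and forward_warp are hypothetical placeholders, and the forward splat uses a simplified last-write-wins rule where a real system would resolve collisions by depth ordering (z-buffering).

import torch
import torch.nn as nn

def forward_warp(src, flow):
    # Nearest-neighbour forward splat: transport each source pixel to the
    # target location given by the forward flow. src is (b, c, h, w); flow is
    # (b, 2, h, w) pixel displacements. Unfilled targets stay zero (holes);
    # colliding pixels are resolved last-write-wins for simplicity.
    b, c, h, w = src.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=src.device),
        torch.arange(w, device=src.device), indexing="ij")
    tx = (xs + flow[:, 0]).round().long().clamp(0, w - 1)  # target x per pixel
    ty = (ys + flow[:, 1]).round().long().clamp(0, h - 1)  # target y per pixel
    idx = (ty * w + tx).reshape(b, 1, -1).expand(-1, c, -1)
    out = torch.zeros_like(src)
    out.view(b, c, -1).scatter_(2, idx, src.reshape(b, c, -1))
    return out

class NovelViewPipeline(nn.Module):
    # Composition described in the abstract: a shape estimation network, a
    # perspective transformation that turns the estimated geometry and the
    # camera change into a forward flow, and an image generation network
    # that refines the warped result.
    def __init__(self, shape_net, image_net, flow_from_depth):
        super().__init__()
        self.shape_net = shape_net              # image -> per-pixel geometry (e.g. depth)
        self.image_net = image_net              # warped image -> refined novel view
        self.flow_from_depth = flow_from_depth  # geometry + cameras -> forward flow

    def forward(self, src_img, src_cam, tgt_cam):
        depth = self.shape_net(src_img)
        flow = self.flow_from_depth(depth, src_cam, tgt_cam)
        warped = forward_warp(src_img, flow)    # pixel-value transportation
        return self.image_net(warped)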