Abstract
This paper describes a method to obtain accurate 3D
body models and texture of arbitrary people from a single
monocular video in which a person is moving. Based on
a parametric body model, we present a robust processing
pipeline to infer 3D model shapes, including clothed people, with 4.5 mm reconstruction accuracy. At the core of our
approach is the transformation of dynamic body pose into
a canonical frame of reference. Our main contribution is
a method to transform the silhouette cones corresponding
to dynamic human silhouettes to obtain a visual hull in a
common reference frame. This enables efficient estimation
of a consensus 3D shape, texture and implanted animation
skeleton based on a large number of frames. Results on four
different datasets demonstrate that our approach produces accurate 3D models. Requiring only an
RGB camera, our method enables everyone to create their
own fully animatable digital double, e.g., for social VR applications or virtual try-on for online fashion shopping.
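The central operation described above, mapping geometry observed under a dynamic pose back into a common canonical frame, can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a silhouette ray inherits the linear-blend-skinning weights of its nearest body-model vertex, and that per-bone 4x4 transforms (canonical pose to observed pose) are available; the function name `unpose_ray` and all variable names are hypothetical.

```python
import numpy as np

def unpose_ray(origin, direction, bone_transforms, weights):
    """Map a silhouette ray from a posed frame into the canonical frame
    by inverting the blended skinning transform (assumed setup).

    bone_transforms: (K, 4, 4) bone matrices, canonical -> posed
    weights: (K,) skinning weights borrowed from the nearest model vertex
    """
    # Blend the bone transforms with the skinning weights.
    T = np.tensordot(weights, bone_transforms, axes=1)  # (4, 4)
    T_inv = np.linalg.inv(T)
    # Transform the origin as a point and the direction as a vector.
    o = T_inv @ np.append(origin, 1.0)
    d = T_inv[:3, :3] @ direction
    return o[:3], d / np.linalg.norm(d)

# Toy check: under the identity pose the ray is unchanged.
K = 2
transforms = np.stack([np.eye(4)] * K)
w = np.array([0.7, 0.3])
o, d = unpose_ray(np.array([0.0, 0.0, 5.0]),
                  np.array([0.0, 0.0, -1.0]),
                  transforms, w)
```

Rays unposed this way from many frames can then be intersected as a visual hull in the shared canonical frame, which is what allows a consensus shape to be estimated from a large number of observations.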