Abstract
3D models provide a common ground for different representations of human bodies. In turn, robust 2D estimation has proven to be a powerful tool to obtain 3D fits “in-the-wild”. However, depending on the level of detail, it can be difficult or even impossible to acquire labeled data for training 2D estimators at large scale. We propose a hybrid approach to this problem: with an extended version of the recently introduced SMPLify method, we obtain high-quality 3D body model fits for multiple human pose datasets. Human annotators only sort good fits from bad ones. This procedure leads to an initial dataset, UP-3D, with rich annotations. With a comprehensive set of experiments, we show how this data can be used to train discriminative models that produce results with an unprecedented level of detail: our models predict 31 segments and 91 landmark locations on the body.
Using the 91-landmark pose estimator, we present state-of-the-art results for 3D human pose and shape estimation, using an order of magnitude less training data and without assumptions about gender or pose in the fitting procedure.
We show that UP-3D can be enhanced with these improved fits to grow in both quantity and quality, which makes the system deployable at large scale. The data, code and models are available for research purposes.