Abstract
Image-based 3D reconstruction for Internet photo collections has become a robust technology to produce impressive virtual representations of real-world scenes. However,
several fundamental challenges remain for Structure-from-Motion (SfM) pipelines, namely: the placement and reconstruction of transient objects only observed in single views,
estimating the absolute scale of the scene, and (surprisingly
often) recovering ground surfaces in the scene. We propose
a method to jointly address these remaining open problems
of SfM. In particular, we focus on detecting people in individual images and accurately placing them into an existing
3D model. As part of this placement, our method also estimates the absolute scale of the scene from object semantics, which in this case are given by the height distribution of
the population. Further, we obtain a smooth approximation of the ground surface and recover the gravity vector
of the scene directly from the individual person detections.
We demonstrate the results of our approach on a number of
unordered Internet photo collections, and we quantitatively
evaluate the obtained absolute scene scales.