Abstract
Objects and structures within man-made environments typically exhibit a high degree of organization in the form of orthogonal and parallel planes. Traditional approaches to scene representation exploit this phenomenon via the somewhat restrictive assumption that every plane is perpendicular to one of the axes of a single coordinate system. Known as the Manhattan-World model, this assumption is widely used in computer vision and robotics. The complexity of many real-world scenes, however, necessitates a more flexible model. We propose a novel probabilistic model that describes the world as a mixture of Manhattan frames: each frame defines a different orthogonal coordinate system. This results in a more expressive model that still exploits the orthogonality constraints. We propose an adaptive Markov-Chain Monte-Carlo sampling algorithm with Metropolis-Hastings split/merge moves that utilizes the geometry of the unit sphere. We demonstrate the versatility of our Mixture-of-Manhattan-Frames model by describing complex scenes using depth images of indoor scenes as well as aerial-LiDAR measurements of an urban center. Additionally, we show that the model lends itself to focal-length calibration of depth cameras and to plane segmentation.