Abstract
Since most current scene understanding approaches operate either on the 2D image or on a surface-based representation, they do not allow reasoning about the physical constraints within the 3D scene. Inspired by the “Blocks World” work of the 1960s, we present a qualitative physical representation of an outdoor scene in which objects have volume and mass, and relationships describe 3D structure and mechanical configurations. Our representation allows us to apply powerful global geometric constraints between 3D volumes, as well as the laws of statics, in a qualitative manner. We also present a novel iterative “interpretation-by-synthesis” approach where, starting from an empty ground plane, we progressively “build up” a physically plausible 3D interpretation of the image. For surface layout estimation, our method demonstrates an improvement in performance over the state of the art [9]. More importantly, our approach automatically generates 3D parse graphs that describe qualitative geometric and mechanical properties of objects and the relationships between objects within an image.