Abstract
Visual scene understanding is a difficult problem interleaving object detection, geometric reasoning and scene classification. We present a hierarchical scene model for learning and reasoning about complex indoor scenes which is computationally tractable, can be learned from a reasonable amount of training data, and avoids oversimplification. At the core of this approach is the 3D Geometric Phrase Model, which captures the semantic and geometric relationships between objects that frequently co-occur in the same 3D spatial configuration. Experiments show that this model effectively explains scene semantics, geometry and object groupings from a single image, while also improving individual object detections.