Abstract
In this paper we show that a geometric representation of an object occurring in indoor scenes, along with rich scene structure, can be used to produce a detector for that object in a single image. Using perspective cues from the global scene geometry, we first develop a 3D-based object detector. This detector is competitive with an image-based detector built using state-of-the-art methods; however, combining the two produces a notably improved detector, because it unifies contextual and geometric information. We then use a probabilistic model that explicitly uses constraints imposed by spatial layout (the locations of walls and floor in the image) to refine the 3D object estimates. We use an existing approach to compute spatial layout [1], and impose constraints such as that objects are supported by the floor and cannot stick through the walls. The resulting detector (a) has significantly improved accuracy when compared to state-of-the-art 2D detectors and (b) gives a 3D interpretation of the location of the object, derived from a 2D image. We evaluate the detector on beds, for which we give extensive quantitative results derived from images of real scenes.