Abstract
This work combines two active areas of research in computer vision: unsupervised object extraction from a single image, and depth estimation from a stereo image pair. A recent, successful trend in unsupervised object extraction is to exploit so-called “3D scene-consistency”, that is, to enforce that objects obey the underlying physical constraints of the 3D scene, such as occupancy of 3D space and the effect of gravity on objects. Our main contribution is to introduce the concept of 3D scene-consistency into stereo matching. We show that this concept benefits both tasks, object extraction and depth estimation. In particular, we demonstrate that our approach can create a large set of 3D scene-consistent object proposals by varying, e.g., the prior on the number of objects. After automatically ranking the proposals, we show experimentally that our results are considerably closer to ground truth than those of state-of-the-art techniques that use either stereo or monocular images. We envision that our method will form the front end of a future object recognition system for stereo images.