Abstract
Shape is a natural, highly prominent characteristic of objects that human vision utilizes every day. But despite its expressiveness, shape poses significant challenges for category-level object detection in cluttered scenes: object form is an emergent property that cannot be perceived locally but becomes available only once the whole object has been detected and segregated from the background. Thus we address the detection of objects and the assembling of their shape simultaneously. A dictionary of meaningful contours is obtained by clustering based on contour co-activation in all training images. We seek a joint, consistent placement of all contours in an image, since placing them independently of one another is not reliable due to the emergence of shape. Therefore, the characteristic object shape is learned by discovering spatially consistent configurations of all dictionary contours using maximum margin multiple instance learning. During recognition, objects are detected and their shape is explained simultaneously by optimizing a single cost function. We demonstrate the benefit of our approach on standard shape benchmarks.