Abstract
Hough transform based object detectors learn a mapping from the image domain to a Hough voting space. Within this space, object hypotheses are formed by local maxima. The votes contributing to a hypothesis are called its support. In this work, we investigate the use of the support and its backprojection to the image domain for multi-view object detection. To this end, we create a shared codebook with training and matching complexities independent of the number of quantized views. We show that, since the backprojection encodes enough information about the viewpoint, all views can be handled together. In our experiments, we demonstrate that treating views jointly achieves superior accuracy and efficiency compared to the popular one-vs-the-rest detectors, especially with few training examples and no view annotations. Furthermore, we go beyond detection and, based on the support, introduce a part-based similarity measure between two arbitrary detections that naturally takes the spatial relationships of parts into account and is insensitive to partial occlusions. We also show that backprojection can be used to efficiently measure the similarity of a detection to all training examples. Finally, we demonstrate how these metrics can be used to estimate continuous object parameters such as human pose and object viewpoint. In our experiments, we achieve state-of-the-art performance for view classification on the PASCAL VOC'06 dataset.
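The voting-and-backprojection idea described above can be sketched minimally as follows. This is an illustrative toy, not the paper's method: the displacement vectors stand in for matched codebook entries, votes are unweighted, and all function and variable names are hypothetical.

```python
# Toy Hough voting with backprojection of a hypothesis' support.
# Illustrative sketch only -- displacements stand in for codebook matches.
import numpy as np

def hough_vote(features, H, W):
    """Accumulate votes for object centers, keeping vote provenance.

    features: list of (x, y, dx, dy) tuples -- a feature at image location
    (x, y) casting a vote for an object center displaced by (dx, dy).
    Returns the accumulator and a map from center cells to voter indices.
    """
    acc = np.zeros((H, W))
    provenance = {}  # (cx, cy) -> indices of features voting for that cell
    for i, (x, y, dx, dy) in enumerate(features):
        cx, cy = x + dx, y + dy
        if 0 <= cy < H and 0 <= cx < W:
            acc[cy, cx] += 1.0
            provenance.setdefault((cx, cy), []).append(i)
    return acc, provenance

def detect_and_backproject(features, H, W):
    """Pick the strongest hypothesis and backproject its support.

    The support of the winning local maximum backprojects to the image
    locations of the features that voted for it.
    """
    acc, provenance = hough_vote(features, H, W)
    cy, cx = np.unravel_index(np.argmax(acc), acc.shape)
    support = provenance.get((cx, cy), [])
    return (cx, cy), [features[i][:2] for i in support]
```

Here two features agree on a center and form its support, while a third votes elsewhere; the returned image locations are the backprojection of that support.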