Abstract
Hough transform-based methods for object detection work by allowing image features to vote for the location of the object. While this representation allows parts observed in different training instances to support a single object hypothesis, it also produces false positives by accumulating votes that are consistent in location but inconsistent in other properties such as pose, color, shape, or type. In this work, we propose to augment the Hough transform with latent variables in order to enforce consistency among votes. To this end, only votes that agree on the assignment of the latent variable are allowed to support a single hypothesis. For training a Latent Hough Transform (LHT) model, we propose a learning scheme that exploits the linearity of Hough transform-based methods. Our experiments on two datasets, including the challenging PASCAL VOC 2007 benchmark, show that our method outperforms traditional Hough transform-based methods, leading to state-of-the-art performance on some categories.
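The voting idea above can be illustrated with a minimal sketch. It is not the paper's implementation: the vote format `(x, y, weight, z)`, the function names, and the choice of scoring a location by the maximum over latent assignments are all assumptions made for illustration. It contrasts a standard accumulator, which sums all votes at a location, with a latent one, which only lets votes sharing the same latent value `z` reinforce each other.

```python
import numpy as np

def hough_accumulate(votes, shape):
    """Standard Hough voting: every vote adds its weight at its location."""
    acc = np.zeros(shape)
    for x, y, w, _z in votes:  # the latent value is ignored here
        acc[y, x] += w
    return acc

def latent_hough_accumulate(votes, shape, num_latent):
    """Illustrative latent voting: one accumulator per latent value.

    Votes only add up with other votes that carry the same latent value z.
    Each location is then scored by its best-supported latent assignment
    (an assumption of this sketch).
    """
    acc = np.zeros((num_latent,) + shape)
    for x, y, w, z in votes:
        acc[z, y, x] += w
    return acc.max(axis=0)

# Hypothetical toy votes: (x, y, weight, latent value z).
votes = [
    (2, 1, 1.0, 0),  # two votes agree on location (2, 1) ...
    (2, 1, 1.0, 1),  # ... but disagree on the latent value
    (0, 3, 1.0, 0),  # two votes agree on location (0, 3) ...
    (0, 3, 1.0, 0),  # ... and on the latent value
]
shape = (4, 4)  # (rows, cols) of the voting space

standard = hough_accumulate(votes, shape)
latent = latent_hough_accumulate(votes, shape, num_latent=2)
```

In this toy setting the standard accumulator scores both locations equally, while the latent accumulator gives a lower score to the location whose votes disagree on `z`, which is the kind of inconsistent hypothesis the abstract describes as a source of false positives.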