Abstract
Object viewpoint classification aims at predicting an approximate 3D pose of objects in a scene and is receiving increasing attention. State-of-the-art approaches to viewpoint classification use generative models to capture relations between object parts. In this work we propose to use a mixture of holistic templates (e.g. HOG) and discriminative learning for joint viewpoint classification and category detection. Inspired by the work of Felzenszwalb et al. 2009, we discriminatively train multiple components simultaneously for each object category. A large number of components are learned in the mixture, and they are associated with canonical viewpoints of the object through different levels of supervision: fully supervised, semi-supervised, or unsupervised. We show that discriminative learning is capable of producing mixture components that directly provide robust viewpoint classification, significantly outperforming the state of the art: we improve the viewpoint accuracy on the Savarese et al. 3D Object database from 57% to 74%, and that on the VOC 2006 car database from 73% to 86%. In addition, the mixture-of-templates approach to object viewpoint/pose has a natural extension to the continuous case by discriminatively learning a linear appearance model locally at each discrete view. We evaluate continuous viewpoint estimation on a dataset of everyday objects collected using IMUs for ground-truth annotation: our mixture model shows great promise compared to a number of baselines including discrete nearest neighbor and linear regression.