Abstract
Object detection and pose estimation are interdependent problems in computer vision. Many past works decouple these problems, either by discretizing the continuous pose and training pose-specific object detectors, or by building pose estimators on top of detector outputs. In this paper, we propose a structured kernel machine approach to treat object detection and pose estimation jointly in a mutually beneficial way. In our formulation, a unified, continuously parameterized, discriminative appearance model is learned over the entire pose space. We propose a cascaded discrete-continuous algorithm for efficient inference, and give effective online constraint generation strategies for learning our model using structural SVMs. On three standard benchmarks, our method performs better than, or on par with, state-of-the-art methods in the combined task of object detection and pose estimation.