Abstract
This paper explores the localization of pre-defined semantic object parts, a task that is much more challenging than traditional object detection and is important for applications such as face recognition, human-computer interaction (HCI) and fine-grained object recognition. To address this problem, we make two critical improvements over the widely used deformable part model (DPM). First, we use appearance-based shape regression to globally estimate the anchor location of each part, and then locally refine each part around its estimated anchor under the constraint of the DPM. The resulting DPM with shape regression (SR-DPM) is more flexible than the traditional DPM because it relaxes the fixed anchor location of each part. It retains the efficient dynamic programming inference of the traditional DPM and can be discriminatively trained via a coordinate descent procedure. Second, we propose to stack multiple SR-DPMs, where each layer takes the output of the previous SR-DPM as its input and progressively refines the result. This stacking is analogous to a deep neural network while still benefiting from hand-crafted features and models. The proposed methods are applied to human pose estimation, face alignment and general object part localization, and achieve state-of-the-art performance.
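The two-step inference and the stacking described above can be illustrated with a minimal sketch. It is not the paper's implementation and rests on several assumptions: part positions are (row, column) pixels, each part has a precomputed appearance score map, the deformation cost is an isotropic quadratic penalty around the regressed anchor, and parts are refined independently (as in a star model with a fixed root) instead of by full dynamic programming. The names `refine_part`, `sr_dpm_layer`, `stacked_sr_dpm` and `toy_regressor` are hypothetical, and the toy regressor ignores appearance entirely, which a real shape regressor would not.

```python
import numpy as np

def refine_part(score_map, anchor, deform_weight):
    """Return the location maximizing appearance score minus a
    quadratic deformation penalty measured from the anchor (assumed form)."""
    h, w = score_map.shape
    ys, xs = np.mgrid[0:h, 0:w]
    penalty = deform_weight * ((ys - anchor[0]) ** 2 + (xs - anchor[1]) ** 2)
    total = score_map - penalty
    best = np.unravel_index(np.argmax(total), total.shape)
    return np.array(best, dtype=float)

def sr_dpm_layer(score_maps, shape, regress_anchors, deform_weight):
    """One layer: global anchor estimate, then local per-part refinement."""
    anchors = regress_anchors(shape)  # hypothetical stand-in for shape regression
    return np.stack([refine_part(m, a, deform_weight)
                     for m, a in zip(score_maps, anchors)])

def stacked_sr_dpm(score_maps, init_shape, regressors, deform_weight=0.05):
    """Feed each layer's output shape into the next layer."""
    shape = np.asarray(init_shape, dtype=float)
    for regress in regressors:
        shape = sr_dpm_layer(score_maps, shape, regress, deform_weight)
    return shape

# Toy data: two parts, each with a single appearance peak at its true location.
true_shape = np.array([[10, 10], [5, 15]])
score_maps = np.zeros((2, 20, 20))
for part, (r, c) in enumerate(true_shape):
    score_maps[part, r, c] = 1.0

def toy_regressor(shape):
    # Assumption for illustration only: shift every anchor by a fixed offset.
    return shape + np.array([2.0, 2.0])

init_shape = [[4, 4], [0, 9]]
one_layer = stacked_sr_dpm(score_maps, init_shape, [toy_regressor])
two_layers = stacked_sr_dpm(score_maps, init_shape, [toy_regressor] * 2)
```

With these toy inputs, a single layer leaves both parts at their regressed anchors because the appearance peaks are still too far away to offset the deformation penalty, while the second layer starts closer and snaps both parts onto the peaks. This mirrors, in a very simplified form, the progressive refinement that stacking is meant to provide.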