Abstract
Recently, Deep Convolutional Neural Networks (DCNNs) have been applied to the task of human pose estimation, and have shown their potential for learning better feature representations and capturing contextual relationships. However, it is difficult to incorporate domain prior knowledge such as geometric relationships among body parts into DCNNs. In addition, training DCNN-based body part detectors without consideration of global body joint consistency introduces ambiguities, which increases the complexity of training. In this paper, we propose a novel end-to-end framework for human pose estimation that combines DCNNs with the expressive deformable mixture of parts. We explicitly incorporate domain prior knowledge into the framework, which greatly regularizes the learning process and allows the framework to accommodate both loopy and tree-structured models. The effectiveness of jointly learning a DCNN with a deformable mixture of parts model is evaluated through extensive experiments on several widely used benchmarks. The proposed approach significantly improves the performance compared with state-of-the-art approaches, especially on benchmarks with challenging articulations.