Abstract
Multi-person pose estimation is fundamental to many
computer vision tasks and has made significant progress
in recent years. However, few previous methods have explored
pose estimation in crowded scenes, even though it remains
challenging and unavoidable in many scenarios.
Moreover, current benchmarks cannot provide an appropriate evaluation for such cases. In this paper, we propose a novel and efficient method to tackle the problem
of pose estimation in the crowd and a new dataset to better evaluate algorithms. Our model consists of two key
components: joint-candidate single person pose estimation
(SPPE) and global maximum joints association. With multi-peak prediction for each joint and global association using a graph model, our method is robust to the inevitable
interference in crowded scenes and efficient at inference. The proposed method surpasses state-of-the-art
methods on the CrowdPose dataset by 5.2 mAP, and results on
the MSCOCO dataset demonstrate the generalization ability
of our method. Source code and dataset are available at
https://github.com/Jeff-sjtu/CrowdPose
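
As a rough illustration of what multi-peak prediction for a joint can mean in practice, the sketch below extracts several joint candidates from a single heatmap instead of keeping only the global maximum. It is not the implementation from the repository above: the function name `find_peaks`, the 8-neighbour local-maximum test, the threshold of 0.3, and the toy heatmap are all assumptions made for this example.

```python
from typing import List, Tuple

Heatmap = List[List[float]]


def find_peaks(heatmap: Heatmap, threshold: float = 0.3) -> List[Tuple[int, int, float]]:
    """Return (row, col, score) for each strict local maximum at or above threshold.

    Hypothetical helper: a cell counts as a peak when it is strictly greater
    than all of its (up to 8) neighbours. Results are sorted by score, highest first.
    """
    rows, cols = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            value = heatmap[r][c]
            if value < threshold:
                continue
            # Collect the neighbouring responses, clipped at the heatmap border.
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbours):
                peaks.append((r, c, value))
    peaks.sort(key=lambda p: p[2], reverse=True)
    return peaks


# Toy heatmap (assumed data): a strong response for the target person's joint
# and a weaker one from an overlapping neighbour in the same crop.
toy = [
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.9, 0.1, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.2, 0.1],
    [0.0, 0.0, 0.2, 0.6, 0.2],
    [0.0, 0.0, 0.1, 0.2, 0.1],
]
candidates = find_peaks(toy)
print(candidates)  # [(1, 1, 0.9), (3, 3, 0.6)]
```

Keeping both candidates, rather than only the strongest, is what leaves room for a later global association step to decide which person each candidate belongs to.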