Abstract
We present an approach to efficiently detect the 2D pose of multiple people in an image. The approach uses a nonparametric representation, which we refer to as Part Affinity Fields (PAFs), to learn to associate body parts with individuals in the image. The architecture encodes global context, allowing a greedy bottom-up parsing step that maintains high accuracy while achieving realtime performance, irrespective of the number of people in the image. The architecture is designed to jointly learn part locations and their association via two branches of the same sequential prediction process. Our method placed first in the inaugural COCO 2016 keypoints challenge, and significantly exceeds the previous state-of-the-art result on the MPII Multi-Person benchmark, both in performance and efficiency.