Abstract
Training accurate 3D human pose estimators requires
large amounts of 3D ground-truth data, which is costly to
collect. Because 3D data is scarce, various weakly or self-supervised pose estimation methods have been proposed.
Nevertheless, in addition to 2D ground-truth poses, these methods require either additional supervision in various forms (e.g. unpaired 3D ground-truth data, a small subset of labels) or the camera parameters in multi-view settings. To address these problems, we present EpipolarPose,
a self-supervised learning method for 3D human pose estimation, which does not need any 3D ground-truth data
or camera extrinsics. During training, EpipolarPose estimates 2D poses from multi-view images and then uses epipolar geometry to obtain a 3D pose and camera
geometry, which are subsequently used to train a 3D pose
estimator. We demonstrate the effectiveness of our approach on standard benchmark datasets (i.e. Human3.6M
and MPI-INF-3DHP), where we set a new state of the art
among weakly/self-supervised methods. Furthermore, we
propose a new performance measure, Pose Structure Score
(PSS), which is a scale-invariant, structure-aware measure
for evaluating the structural plausibility of a pose with respect to its ground truth. Code and pretrained models
are available at https://github.com/mkocabas/EpipolarPose
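
To make the epipolar step mentioned above more concrete, the following NumPy sketch shows one textbook way to recover relative camera geometry and a 3D pose from 2D joints seen in two views. It is an illustration under stated assumptions, not the released EpipolarPose code. It assumes the 2D joints are noise-free and already in normalized image coordinates (intrinsics removed). It uses a plain eight-point estimate of the essential matrix, whereas a practical system would likely use a robust estimator. The function names `triangulate`, `essential_matrix`, and `relative_pose_and_structure` are hypothetical.

```python
import numpy as np


def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of N joints seen in two views.

    P1, P2: (3, 4) projection matrices; x1, x2: (N, 2) normalized 2D joints.
    """
    points = []
    for (u1, v1), (u2, v2) in zip(x1, x2):
        # Each view contributes two linear constraints on the homogeneous point.
        A = np.stack([u1 * P1[2] - P1[0], v1 * P1[2] - P1[1],
                      u2 * P2[2] - P2[0], v2 * P2[2] - P2[1]])
        Xh = np.linalg.svd(A)[2][-1]  # null vector of A
        points.append(Xh[:3] / Xh[3])
    return np.array(points)


def essential_matrix(x1, x2):
    """Eight-point estimate of E such that [x2, 1] @ E @ [x1, 1] = 0 (N >= 8)."""
    ones = np.ones(len(x1))
    A = np.column_stack([
        x2[:, 0] * x1[:, 0], x2[:, 0] * x1[:, 1], x2[:, 0],
        x2[:, 1] * x1[:, 0], x2[:, 1] * x1[:, 1], x2[:, 1],
        x1[:, 0], x1[:, 1], ones,
    ])
    E = np.linalg.svd(A)[2][-1].reshape(3, 3)
    # Project onto the set of valid essential matrices (singular values 1, 1, 0).
    U, _, Vt = np.linalg.svd(E)
    return U @ np.diag([1.0, 1.0, 0.0]) @ Vt


def relative_pose_and_structure(x1, x2):
    """Return (R, t, X): pose of view 2 relative to view 1 and the 3D joints.

    t has unit norm, so X is recovered only up to a global scale.
    """
    U, _, Vt = np.linalg.svd(essential_matrix(x1, x2))
    # Force proper rotations; E is only defined up to sign.
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    best = None
    # Four candidate poses; keep the one placing most joints in front of both cameras.
    for R in (U @ W @ Vt, U @ W.T @ Vt):
        for t in (U[:, 2], -U[:, 2]):
            P2 = np.hstack([R, t[:, None]])
            X = triangulate(P1, P2, x1, x2)
            depth2 = (X @ R.T + t)[:, 2]
            n_front = int(np.sum((X[:, 2] > 0) & (depth2 > 0)))
            if best is None or n_front > best[0]:
                best = (n_front, R, t, X)
    return best[1], best[2], best[3]
```

Calling `relative_pose_and_structure(x1, x2)` with two `(N, 2)` arrays of corresponding joints returns a rotation, a unit-norm translation, and `N` triangulated 3D joints in the first camera's frame. Because the translation is recovered only up to scale, so is the 3D pose.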