Unconstrained Monocular 3D Human Pose Estimation by Action Detection and
Cross-modality Regression Forest
Abstract
This work addresses the challenging problem of unconstrained 3D human pose estimation (HPE) from a novel perspective. Existing approaches struggle to operate in realistic applications, mainly due to their scene-dependent priors, such as background segmentation and multi-camera network, which restrict their use in unconstrained environments. We therfore present a framework which applies action detection and 2D pose estimation techniques to infer 3D poses in an unconstrained video. Action detection offers spatiotemporal priors to 3D human pose estimation by both recognising and localising actions in space-time. Instead of holistic features, e.g. silhouettes, we leverage th ?exibility of deformable part model to detect 2D body parts as a feature to estimate 3D poses. A new unconstrained pose dataset has been collected to justify the feasibility o our method, which demonstrated promising results, signi?cantly outperforming the relevant state-of-the-arts.