Abstract
Articulated hand pose estimation plays an important role in human-computer interaction. Despite recent progress, the accuracy of existing methods is still not satisfactory, partially due to the difficulty of the embedded high-dimensional and non-linear regression problem. Unlike existing discriminative methods that regress the hand pose from a single depth image, we propose to first project the query depth image onto three orthogonal planes and use these multi-view projections to regress 2D heat-maps that estimate the joint positions on each plane. These multi-view heat-maps are then fused with learned pose priors to produce the final 3D hand pose estimate. Experiments show that the proposed method outperforms state-of-the-art methods by a large margin on a challenging dataset. Moreover, a cross-dataset experiment demonstrates the good generalization ability of the proposed method.
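To make the multi-view projection step concrete, the following is a minimal sketch and not the implementation used in this work. It assumes the hand is already available as a list of 3D points, uses an axis-aligned bounding box as the projection frame, and stores in each pixel the smallest normalized distance along the projection axis. The function name `project_to_planes`, the `size` parameter, and the background value of 1.0 are all hypothetical choices made for illustration.

```python
def project_to_planes(points, size=8):
    """Project 3D points onto the x-y, y-z and z-x planes.

    Illustrative sketch only: the axis-aligned bounding box, the image
    resolution and the background value 1.0 are assumptions, not details
    taken from the paper.
    """
    if not points:
        raise ValueError("need at least one 3D point")

    # Axis-aligned bounding box of the point cloud.
    mins = [min(p[i] for p in points) for i in range(3)]
    maxs = [max(p[i] for p in points) for i in range(3)]
    # Scale by the longest edge so the aspect ratio is preserved.
    extent = max(maxs[i] - mins[i] for i in range(3)) or 1.0

    # For each view: (row axis, column axis, depth axis).
    views = {"xy": (1, 0, 2), "yz": (2, 1, 0), "zx": (0, 2, 1)}
    images = {name: [[1.0] * size for _ in range(size)] for name in views}

    for p in points:
        # Normalize the point into the unit cube.
        q = [(p[i] - mins[i]) / extent for i in range(3)]
        for name, (row_ax, col_ax, depth_ax) in views.items():
            row = min(int(q[row_ax] * (size - 1) + 0.5), size - 1)
            col = min(int(q[col_ax] * (size - 1) + 0.5), size - 1)
            # Keep the point closest to the projection plane.
            images[name][row][col] = min(images[name][row][col], q[depth_ax])
    return images
```

Each of the three resulting images could then be passed to a separate 2D heat-map regressor, one per view, before the fusion stage.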