Point-to-Pose Voting based Hand Pose Estimation using Residual Permutation Equivariant Layer
Recently, hand pose estimation methods based on 3D input data have shown state-of-the-art performance, because 3D data capture more spatial information than a depth image. However, 3D voxel-based methods need a large amount of memory, and PointNet-based methods need tedious preprocessing steps, such as a K-nearest-neighbour search for each point. In this paper, we present a novel deep learning hand pose estimation method for an unordered point cloud. Our method takes 1024 3D points as input and does not require additional information. We use the Permutation Equivariant Layer (PEL) as the basic element and propose a residual network version of PEL for the hand pose estimation task. Furthermore, we propose a voting-based scheme to merge information from individual points into the final pose output. In addition to pose estimation, the voting-based scheme also provides a point cloud segmentation result without requiring segmentation ground truth. We evaluate our method on both the NYU dataset and the Hands2017Challenge dataset, where it outperforms recent state-of-the-art methods.
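The abstract names the Permutation Equivariant Layer (PEL) and a residual version of it but does not give their equations. The sketch below assumes a Deep Sets-style PEL, in which each point's output combines its own feature with a max-pooled summary of all points, and a hypothetical residual block made of two PELs plus an identity skip connection. The names `pel`, `residual_pel_block`, and `init_params`, the 64-channel width, and the random weights are illustrative only and are not the authors' architecture. The final check shows the property that makes PELs suitable for unordered point clouds: permuting the input points permutes the output rows in the same way.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def pel(x, w_self, w_pool, b):
    """Permutation equivariant layer (assumed Deep Sets-style max-pool variant).

    x: (N, C_in) features of N unordered points.
    Each output row depends on that point's own feature and on an
    order-independent summary (max over all points).
    """
    pooled = x.max(axis=0, keepdims=True)      # (1, C_in), invariant to point order
    return x @ w_self - pooled @ w_pool + b    # (N, C_out), summary broadcast to every point

def residual_pel_block(x, params1, params2):
    """Hypothetical residual block: two PELs with an identity skip connection.
    Assumes C_in == C_out so the skip can be added directly."""
    h = relu(pel(x, *params1))
    h = pel(h, *params2)
    return relu(x + h)

def init_params(rng, c_in, c_out):
    """Random weights for illustration only (no training is performed)."""
    scale = 1.0 / np.sqrt(c_in)
    return (rng.normal(0.0, scale, (c_in, c_out)),
            rng.normal(0.0, scale, (c_in, c_out)),
            np.zeros(c_out))

rng = np.random.default_rng(0)
points = rng.normal(size=(1024, 3))            # 1024 input points (x, y, z)
lift = init_params(rng, 3, 64)                 # lift xyz to 64 channels
block = (init_params(rng, 64, 64), init_params(rng, 64, 64))

feat = relu(pel(points, *lift))                # (1024, 64)
out = residual_pel_block(feat, *block)

# Equivariance check: shuffle the input points and compare.
perm = rng.permutation(1024)
out_perm = residual_pel_block(relu(pel(points[perm], *lift)), *block)
print(out.shape, np.allclose(out[perm], out_perm))   # (1024, 64) True
```

Max pooling is used for the set summary here because it is invariant to point order; a sum or mean would preserve equivariance equally well.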
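The abstract states that a voting-based scheme merges per-point information into the final pose and, as a by-product, segments the point cloud without segmentation labels, but it does not specify the form of the votes. The following sketch assumes one plausible form: every point predicts all J joint positions together with an unnormalized confidence per joint, the confidences are normalized over points with a softmax, and each joint is the confidence-weighted average of the point votes. Segmentation then labels each point with the joint it contributes to most. The functions `vote_pose`, `segment_points`, and `softmax` and the toy inputs are hypothetical.

```python
import numpy as np

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)    # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def vote_pose(point_poses, point_scores):
    """Merge per-point pose votes into one pose (hypothetical scheme).

    point_poses:  (N, J, 3) each point's estimate of all J joint positions
    point_scores: (N, J)    unnormalized confidence of each point for each joint
    Returns the (J, 3) pose and the (N, J) weights (each column sums to 1).
    """
    weights = softmax(point_scores, axis=0)                # normalize over points, per joint
    pose = np.einsum("nj,njk->jk", weights, point_poses)   # weighted average per joint
    return pose, weights

def segment_points(weights):
    """Label each point with the joint it contributes most to; no segmentation labels are used."""
    return weights.argmax(axis=1)

# Toy example: 4 points, 2 joints, noisy votes around a known pose.
rng = np.random.default_rng(0)
true_pose = np.array([[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]])
point_poses = true_pose + 0.01 * rng.normal(size=(4, 2, 3))
scores = np.array([[5.0, 0.0],
                   [4.0, 0.0],
                   [0.0, 5.0],
                   [0.0, 4.0]])

pose, w = vote_pose(point_poses, scores)
labels = segment_points(w)
print(labels)                                        # [0 0 1 1]
print(np.allclose(pose, true_pose, atol=0.05))       # True
```

In this formulation the point-to-joint assignment is read directly off the voting weights, so the segmentation needs no per-point ground truth.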