Abstract
Convolutional Neural Networks (CNNs) have shown
promising results for 3D hand pose estimation from depth images. Unlike existing CNN-based hand pose estimation methods that take either 2D images or 3D volumes
as the input, our proposed Hand PointNet directly processes the 3D point cloud that models the visible surface of the
hand for pose regression. Taking the normalized point cloud
as the input, our proposed hand pose regression network
is able to capture complex hand structures and accurately regress a low-dimensional representation of the 3D hand
pose. To further improve fingertip accuracy, we design a fingertip refinement network that directly
takes the neighboring points of the estimated fingertip location as input to refine the fingertip location. Experiments
on three challenging hand pose datasets show that our proposed method outperforms state-of-the-art methods.
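
As a rough illustration of the input preparation for fingertip refinement, the sketch below gathers the points nearest to an estimated fingertip location and normalizes them. It is a minimal sketch under stated assumptions, not the method used in this work: the function names `fingertip_neighbors` and `normalize_point_cloud`, the neighborhood size `k`, and the centroid-and-scale normalization are all hypothetical choices made for illustration.

```python
import math


def fingertip_neighbors(points, fingertip, k):
    """Return the k points closest to the estimated fingertip location.

    Assumption: neighbors are chosen by Euclidean distance; k is a
    hypothetical parameter, not a value taken from the paper.
    """
    return sorted(points, key=lambda p: math.dist(p, fingertip))[:k]


def normalize_point_cloud(points):
    """Center points on their centroid and scale them into the unit sphere.

    Assumption: this simple normalization stands in for whatever
    normalization the actual pipeline applies.
    """
    n = len(points)
    centroid = tuple(sum(p[i] for p in points) / n for i in range(3))
    centered = [tuple(p[i] - centroid[i] for i in range(3)) for p in points]
    # Guard against a degenerate cloud whose points all coincide.
    scale = max(math.hypot(*c) for c in centered) or 1.0
    return [tuple(v / scale for v in c) for c in centered]


# Toy point cloud and a toy fingertip estimate (illustrative values only).
cloud = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 2.0, 0.0),
    (0.0, 0.0, 3.0),
    (5.0, 5.0, 5.0),
]
estimated_fingertip = (0.9, 0.1, 0.0)

neighbors = fingertip_neighbors(cloud, estimated_fingertip, k=2)
normalized = normalize_point_cloud(neighbors)
```

In this sketch, `normalized` is what a refinement network would consume; its output would then have to be mapped back through the same centroid and scale to recover a fingertip location in the original coordinates.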