Abstract
This work addresses a novel and challenging problem
of estimating the full 3D hand shape and pose from a single RGB image. Most current methods for 3D hand analysis from monocular RGB images focus only on estimating the 3D locations of hand keypoints, which cannot fully
express the 3D shape of the hand. In contrast, we propose a
Graph Convolutional Neural Network (Graph CNN) based
method to reconstruct a full 3D mesh of the hand surface,
which contains richer information about both 3D hand shape
and pose. To train the networks with full supervision, we create a
large-scale synthetic dataset containing both ground truth
3D meshes and 3D poses. When fine-tuning the networks
on real-world datasets without 3D ground truth, we propose a weakly-supervised approach that leverages the depth
map as weak supervision during training. Through extensive
evaluations on our proposed new datasets and two public
datasets, we show that our proposed method can produce
accurate and reasonable 3D hand meshes, and achieves
superior 3D hand pose estimation accuracy compared
with state-of-the-art methods.