Abstract
This paper addresses vision-based hand pose
estimation from a single depth map using a convolutional neural network (CNN). Our main contribution is the design of
a new pose regression network architecture named CrossInfoNet. The proposed CrossInfoNet decomposes the hand pose
estimation task into a palm pose estimation sub-task and a finger pose estimation sub-task, and adopts a two-branch cross-connection structure to share the beneficial complementary
information between the sub-tasks. Our work is inspired
by the multi-task information sharing mechanism, which has
rarely been discussed in previous publications on hand pose
estimation from depth data. In addition, we propose a
heat-map guided feature extraction structure to get better
feature maps, and train the complete network end-to-end.
The effectiveness of the proposed CrossInfoNet is evaluated
with extensive self-comparative experiments and in comparison with state-of-the-art methods on four public hand
pose datasets. The code is available (see footnote 1).
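
As an illustration of the two-branch cross-connection idea, the following is a minimal toy sketch in plain Python, not the released CrossInfoNet code. It assumes that each branch first extracts its own features from a shared feature vector and then receives the other branch's features by concatenation before regressing its joints. The exact form of the cross-connection, the layer sizes, the joint counts, and all names (linear, cross_info_forward, N_PALM, N_FINGER) are hypothetical choices made for this example only.

```python
import random

N_PALM = 2      # hypothetical number of palm joints (toy value)
N_FINGER = 3    # hypothetical number of finger joints (toy value)
SHARED_DIM = 8  # hypothetical size of the shared feature vector
HIDDEN_DIM = 4  # hypothetical size of each branch's feature vector


def linear(x, w):
    """Multiply feature vector x (length n) by weight matrix w (n rows, m columns)."""
    return [sum(xi * row[j] for xi, row in zip(x, w)) for j in range(len(w[0]))]


def relu(x):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def make_weights(n_in, n_out, rng):
    """Random weight matrix with n_in rows and n_out columns."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_in)]


def cross_info_forward(shared, params):
    """Toy two-branch forward pass with an assumed cross-connection."""
    # Stage 1: each branch extracts its own features from the shared features.
    palm_feat = relu(linear(shared, params["palm_1"]))
    finger_feat = relu(linear(shared, params["finger_1"]))
    # Cross-connection (assumed form): each branch also sees the other
    # branch's features, joined by list concatenation.
    palm_in = palm_feat + finger_feat
    finger_in = finger_feat + palm_feat
    # Stage 2: each branch regresses the 3D coordinates of its own joints.
    palm_joints = linear(palm_in, params["palm_2"])
    finger_joints = linear(finger_in, params["finger_2"])
    return palm_joints, finger_joints


rng = random.Random(0)
params = {
    "palm_1": make_weights(SHARED_DIM, HIDDEN_DIM, rng),
    "finger_1": make_weights(SHARED_DIM, HIDDEN_DIM, rng),
    "palm_2": make_weights(2 * HIDDEN_DIM, 3 * N_PALM, rng),
    "finger_2": make_weights(2 * HIDDEN_DIM, 3 * N_FINGER, rng),
}
shared = [rng.uniform(0.0, 1.0) for _ in range(SHARED_DIM)]
palm, finger = cross_info_forward(shared, params)
```

Here palm and finger hold three coordinates per joint for the toy palm and finger joint sets. In a real network the shared feature vector would come from a CNN applied to the depth map and the weights would be learned end-to-end.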