Two Heads Are Better than One: Image-Point Cloud Network for Depth-Based 3D Hand Pose Estimation
DOI:
https://doi.org/10.1609/aaai.v37i2.25310
Keywords:
CV: Biometrics, Face, Gesture & Pose, CV: 3D Computer Vision
Abstract
Depth images and point clouds are the two most commonly used data representations for depth-based 3D hand pose estimation. Benefiting from the regular structure of image data and the inherent inductive biases of the 2D Convolutional Neural Network (CNN), image-based methods are highly efficient and effective. However, treating depth data as a 2D image inevitably ignores its 3D nature. Point cloud-based methods can better exploit the 3D geometric structure of depth data. However, they suffer from the disorder and irregularity of point cloud data, which makes them computationally inefficient. In this paper, we propose an Image-Point cloud Network (IPNet) for accurate and robust 3D hand pose estimation. IPNet utilizes a 2D CNN to extract visual representations in 2D image space and performs iterative correction in 3D point cloud space to exploit the 3D geometric information of depth data. In particular, we propose a sparse anchor-based "aggregation-interaction-propagation" paradigm to enhance point cloud features and refine the hand pose, which reduces irregular data access. Furthermore, we introduce a 3D hand model into the iterative correction process, which significantly improves the robustness of IPNet to occlusion and depth holes. Experiments show that IPNet outperforms state-of-the-art methods on three challenging hand datasets.
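The abstract describes a two-branch pipeline: a 2D CNN produces an initial pose from the depth image, and a point-cloud stage refines it through sparse anchors. The following is a minimal, hypothetical sketch of that idea in PyTorch; it is not the authors' implementation, and all layer sizes, the anchor count, and the naive anchor sampling are assumptions made only for illustration.

```python
# Hypothetical sketch of an image + point-cloud two-branch pipeline in the
# spirit of IPNet (NOT the authors' code): a 2D CNN regresses an initial
# hand pose, then a refinement step performs aggregation (points -> anchors),
# interaction (anchors <-> anchors), and propagation (anchors -> joints).
import torch
import torch.nn as nn

J, A = 21, 32  # number of hand joints and sparse anchors (assumed values)

class ImageBranch(nn.Module):
    """2D CNN that maps a depth image to an initial 3D joint estimate."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, J * 3)

    def forward(self, depth):                      # depth: (B, 1, H, W)
        return self.head(self.backbone(depth)).view(-1, J, 3)

class AnchorRefine(nn.Module):
    """One correction step: aggregate point features onto anchors,
    let anchors interact, then propagate corrections to the joints."""
    def __init__(self, c=64):
        super().__init__()
        self.point_mlp = nn.Linear(3, c)
        self.interact = nn.MultiheadAttention(c, num_heads=4, batch_first=True)
        self.propagate = nn.Linear(c, 3)

    def forward(self, pose, points, anchors):      # points: (B, N, 3), anchors: (B, A, 3)
        feat = self.point_mlp(points)               # per-point features (B, N, C)
        # Aggregation: soft-assign points to anchors with distance-based weights.
        w = torch.softmax(-torch.cdist(anchors, points), dim=-1)   # (B, A, N)
        anchor_feat = w @ feat                                     # (B, A, C)
        # Interaction: anchors exchange information via self-attention.
        anchor_feat, _ = self.interact(anchor_feat, anchor_feat, anchor_feat)
        # Propagation: pull anchor features onto each joint and predict a correction.
        wj = torch.softmax(-torch.cdist(pose, anchors), dim=-1)    # (B, J, A)
        return pose + self.propagate(wj @ anchor_feat)             # refined pose

if __name__ == "__main__":
    depth = torch.randn(2, 1, 128, 128)             # toy depth images
    points = torch.randn(2, 1024, 3)                # toy point clouds
    pose = ImageBranch()(depth)
    anchors = points[:, torch.randperm(1024)[:A]]   # placeholder anchor sampling
    refine = AnchorRefine()
    for _ in range(2):                              # iterative correction
        pose = refine(pose, points, anchors)
    print(pose.shape)                               # torch.Size([2, 21, 3])
```

The sparse anchors keep the irregular point-to-feature lookups confined to a small set of locations, which is one plausible way to read the paper's claim that the paradigm reduces irregular data access.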
Published
2023-06-26
How to Cite
Ren, P., Chen, Y., Hao, J., Sun, H., Qi, Q., Wang, J., & Liao, J. (2023). Two Heads Are Better than One: Image-Point Cloud Network for Depth-Based 3D Hand Pose Estimation. Proceedings of the AAAI Conference on Artificial Intelligence, 37(2), 2163-2171. https://doi.org/10.1609/aaai.v37i2.25310
Section
AAAI Technical Track on Computer Vision II