Two Heads Are Better than One: Image-Point Cloud Network for Depth-Based 3D Hand Pose Estimation

Authors

  • Pengfei Ren Beijing University of Posts and Telecommunications
  • Yuchen Chen Beijing University of Posts and Telecommunications
  • Jiachang Hao Beijing University of Posts and Telecommunications
  • Haifeng Sun Beijing University of Posts and Telecommunications
  • Qi Qi Beijing University of Posts and Telecommunications
  • Jingyu Wang Beijing University of Posts and Telecommunications
  • Jianxin Liao Beijing University of Posts and Telecommunications

DOI:

https://doi.org/10.1609/aaai.v37i2.25310

Keywords:

CV: Biometrics, Face, Gesture & Pose; CV: 3D Computer Vision

Abstract

Depth images and point clouds are the two most commonly used data representations for depth-based 3D hand pose estimation. Benefiting from the regular structure of image data and the inherent inductive biases of 2D Convolutional Neural Networks (CNNs), image-based methods are highly efficient and effective. However, treating the depth data as a 2D image inevitably ignores its 3D nature. Point cloud-based methods can better mine the 3D geometric structure of depth data, but they suffer from the unordered and unstructured nature of point cloud data, which makes them computationally inefficient. In this paper, we propose an Image-Point cloud Network (IPNet) for accurate and robust 3D hand pose estimation. IPNet utilizes a 2D CNN to extract visual representations in 2D image space and performs iterative correction in 3D point cloud space to exploit the 3D geometry information of depth data. In particular, we propose a sparse anchor-based "aggregation-interaction-propagation" paradigm to enhance point cloud features and refine the hand pose, which reduces irregular data access. Furthermore, we introduce a 3D hand model into the iterative correction process, which significantly improves the robustness of IPNet to occlusion and depth holes. Experiments show that IPNet outperforms state-of-the-art methods on three challenging hand datasets.
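To illustrate how the two representations named in the abstract relate, the following is a minimal sketch of back-projecting a depth image into a point cloud with a standard pinhole camera model. It is not taken from IPNet: the function name `depth_to_point_cloud`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the toy depth values are all assumptions chosen for illustration, and pixels with non-positive depth are treated as depth holes and skipped.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def depth_to_point_cloud(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point]:
    """Back-project a depth image into camera-space 3D points.

    Assumes a pinhole camera with focal lengths (fx, fy) and principal
    point (cx, cy), all hypothetical values supplied by the caller.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):        # v: pixel row index
        for u, z in enumerate(row):        # u: pixel column index
            if z <= 0:                     # depth hole: no valid measurement
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Toy 3x3 depth image (arbitrary units); zeros stand in for depth holes.
depth = [
    [0.0, 500.0, 0.0],
    [480.0, 450.0, 470.0],
    [0.0, 460.0, 0.0],
]
cloud = depth_to_point_cloud(depth, fx=2.0, fy=2.0, cx=1.0, cy=1.0)
```

The image keeps a regular grid that a 2D CNN can process directly, while the resulting list of points is unordered and carries explicit 3D coordinates, which is the trade-off the abstract describes.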

Published

2023-06-26

How to Cite

Ren, P., Chen, Y., Hao, J., Sun, H., Qi, Q., Wang, J., & Liao, J. (2023). Two Heads Are Better than One: Image-Point Cloud Network for Depth-Based 3D Hand Pose Estimation. Proceedings of the AAAI Conference on Artificial Intelligence, 37(2), 2163-2171. https://doi.org/10.1609/aaai.v37i2.25310

Section

AAAI Technical Track on Computer Vision II