Two Heads Are Better than One: Image-Point Cloud Network for Depth-Based 3D Hand Pose Estimation
DOI:
https://doi.org/10.1609/aaai.v37i2.25310
Keywords:
CV: Biometrics, Face, Gesture & Pose, CV: 3D Computer Vision
Abstract
Depth images and point clouds are the two most commonly used data representations for depth-based 3D hand pose estimation. Benefiting from the regular structure of image data and the inherent inductive biases of the 2D Convolutional Neural Network (CNN), image-based methods are highly efficient and effective. However, treating depth data as a 2D image inevitably ignores its 3D nature. Point cloud-based methods can better exploit the 3D geometric structure of depth data. However, they suffer from the disorder and irregularity of point cloud data, which makes them computationally inefficient. In this paper, we propose an Image-Point cloud Network (IPNet) for accurate and robust 3D hand pose estimation. IPNet utilizes a 2D CNN to extract visual representations in 2D image space and performs iterative correction in 3D point cloud space to exploit the 3D geometric information of depth data. In particular, we propose a sparse anchor-based "aggregation-interaction-propagation" paradigm to enhance point cloud features and refine the hand pose, which reduces irregular data access. Furthermore, we introduce a 3D hand model into the iterative correction process, which significantly improves the robustness of IPNet to occlusion and depth holes. Experiments show that IPNet outperforms state-of-the-art methods on three challenging hand datasets.
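The abstract describes a two-branch pipeline: a 2D CNN produces an initial pose from the depth image, and a point-cloud stage refines it through sparse anchors. The following is a minimal, hypothetical sketch of that idea in PyTorch; it is not the authors' implementation, and all layer sizes, the anchor count, and the naive anchor sampling are assumptions made only for illustration.

```python
# Hypothetical sketch of an image + point-cloud two-branch pipeline in the
# spirit of IPNet (NOT the authors' code): a 2D CNN regresses an initial
# hand pose, then a refinement step performs aggregation (points -> anchors),
# interaction (anchors <-> anchors), and propagation (anchors -> joints).
import torch
import torch.nn as nn

J, A = 21, 32  # number of hand joints and sparse anchors (assumed values)

class ImageBranch(nn.Module):
    """2D CNN that maps a depth image to an initial 3D joint estimate."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, J * 3)

    def forward(self, depth):                      # depth: (B, 1, H, W)
        return self.head(self.backbone(depth)).view(-1, J, 3)

class AnchorRefine(nn.Module):
    """One correction step: aggregate point features onto anchors,
    let anchors interact, then propagate corrections to the joints."""
    def __init__(self, c=64):
        super().__init__()
        self.point_mlp = nn.Linear(3, c)
        self.interact = nn.MultiheadAttention(c, num_heads=4, batch_first=True)
        self.propagate = nn.Linear(c, 3)

    def forward(self, pose, points, anchors):      # points: (B, N, 3), anchors: (B, A, 3)
        feat = self.point_mlp(points)               # per-point features (B, N, C)
        # Aggregation: soft-assign points to anchors with distance-based weights.
        w = torch.softmax(-torch.cdist(anchors, points), dim=-1)   # (B, A, N)
        anchor_feat = w @ feat                                     # (B, A, C)
        # Interaction: anchors exchange information via self-attention.
        anchor_feat, _ = self.interact(anchor_feat, anchor_feat, anchor_feat)
        # Propagation: pull anchor features onto each joint and predict a correction.
        wj = torch.softmax(-torch.cdist(pose, anchors), dim=-1)    # (B, J, A)
        return pose + self.propagate(wj @ anchor_feat)             # refined pose

if __name__ == "__main__":
    depth = torch.randn(2, 1, 128, 128)             # toy depth images
    points = torch.randn(2, 1024, 3)                # toy point clouds
    pose = ImageBranch()(depth)
    anchors = points[:, torch.randperm(1024)[:A]]   # placeholder anchor sampling
    refine = AnchorRefine()
    for _ in range(2):                              # iterative correction
        pose = refine(pose, points, anchors)
    print(pose.shape)                               # torch.Size([2, 21, 3])
```

The sparse anchors keep the irregular point-to-feature lookups confined to a small set of locations, which is one plausible way to read the paper's claim that the paradigm reduces irregular data access.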
Published
2023-06-26
How to Cite
Ren, P., Chen, Y., Hao, J., Sun, H., Qi, Q., Wang, J., & Liao, J. (2023). Two Heads Are Better than One: Image-Point Cloud Network for Depth-Based 3D Hand Pose Estimation. Proceedings of the AAAI Conference on Artificial Intelligence, 37(2), 2163-2171. https://doi.org/10.1609/aaai.v37i2.25310
Section
AAAI Technical Track on Computer Vision II