Adaptive Multi-Modal Control of Digital Human Hand Synthesis Using a Region-Aware Cycle Loss

Fu, Qifan; Yang, Xiaohang; Asad, Muhammad; Oh, Changjae; Yuan, Shanxin; Slabaugh, Gregory

arXiv:2409.09149 (cs)

[Submitted on 13 Sep 2024]

Abstract: Diffusion models have shown a remarkable ability to synthesize images, including humans in specific poses. However, current models struggle to express conditional control for detailed hand pose generation, which leads to significant distortion in the hand regions. To tackle this problem, we first curate the How2Sign dataset to provide richer and more accurate hand pose annotations. In addition, we introduce adaptive, multi-modal fusion to integrate characters' physical features expressed in different modalities such as skeleton, depth, and surface normal. Furthermore, we propose a novel Region-Aware Cycle Loss (RACL) that focuses diffusion model training on the hand region, improving the quality of generated hand gestures. More specifically, RACL computes a weighted keypoint distance between the full-body pose keypoints of the generated image and those of the ground truth, so as to generate higher-quality hand poses while balancing overall pose accuracy. Moreover, we use two hand region metrics, hand-PSNR and hand-Distance, to evaluate hand pose generation. Our experiments demonstrate that the proposed approach improves the quality of digital human pose generation using diffusion models, especially in the hand region. The source code is available at this https URL.
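
As a rough illustration of the two ideas named in the abstract, the sketch below computes a weighted keypoint distance in which hand keypoints count more than the rest of the body, and a PSNR restricted to a rectangular region. It is not the authors' implementation. The function names, the `hand_weight` value, the keypoint index layout, the box format, and the normalization by total weight are all assumptions made for illustration. The abstract does not define hand-PSNR or hand-Distance precisely, and in the paper the keypoints for the generated image would presumably come from a pose estimator, which is omitted here.

```python
import math


def weighted_keypoint_distance(pred, target, hand_indices, hand_weight=2.0):
    """Weighted mean Euclidean distance between matching 2D keypoints.

    pred, target: lists of (x, y) tuples for the full-body pose.
    hand_indices: indices of the hand keypoints (hypothetical layout).
    hand_weight: assumed weight for hand keypoints; all others get 1.0.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    if not pred:
        raise ValueError("at least one keypoint is required")
    hand = set(hand_indices)
    total, weight_sum = 0.0, 0.0
    for i, ((px, py), (tx, ty)) in enumerate(zip(pred, target)):
        w = hand_weight if i in hand else 1.0
        total += w * math.hypot(px - tx, py - ty)
        weight_sum += w
    # Normalize by the total weight so the result stays in pixel units.
    return total / weight_sum


def region_psnr(img_a, img_b, box, max_value=255.0):
    """PSNR over a rectangular region of two grayscale images.

    img_a, img_b: 2D lists of pixel intensities with the same shape.
    box: (top, left, bottom, right), with bottom and right exclusive.
    """
    top, left, bottom, right = box
    sq_err, count = 0.0, 0
    for r in range(top, bottom):
        for c in range(left, right):
            sq_err += (img_a[r][c] - img_b[r][c]) ** 2
            count += 1
    if count == 0:
        raise ValueError("empty region")
    mse = sq_err / count
    if mse == 0:
        return math.inf  # identical regions
    return 10.0 * math.log10(max_value ** 2 / mse)
```

With `hand_weight` greater than 1, an error on a hand keypoint raises the distance more than the same error on a body keypoint, which is one way a loss can emphasize the hand region while still accounting for overall pose accuracy.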

Comments:	This paper has been accepted by the ECCV 2024 HANDS workshop
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2409.09149 [cs.CV]
	(or arXiv:2409.09149v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2409.09149

Submission history

From: Qifan Fu
[v1] Fri, 13 Sep 2024 19:09:19 UTC (2,799 KB)
