MotionLLM: Understanding Human Behaviors from Human Motions and Videos

Chen, Ling-Hao; Lu, Shunlin; Zeng, Ailing; Zhang, Hao; Wang, Benyou; Zhang, Ruimao; Zhang, Lei

arXiv:2405.20340 (cs)

[Submitted on 30 May 2024]

Abstract: This study delves into the realm of multi-modality (i.e., video and motion modalities) human behavior understanding by leveraging the powerful capabilities of Large Language Models (LLMs). Diverging from recent LLMs designed for video-only or motion-only understanding, we argue that understanding human behavior necessitates joint modeling from both videos and motion sequences (e.g., SMPL sequences) to capture nuanced body part dynamics and semantics effectively. In light of this, we present MotionLLM, a straightforward yet effective framework for human motion understanding, captioning, and reasoning. Specifically, MotionLLM adopts a unified video-motion training strategy that leverages the complementary advantages of existing coarse video-text data and fine-grained motion-text data to glean rich spatial-temporal insights. Furthermore, we collect a substantial dataset, MoVid, comprising diverse videos, motions, captions, and instructions. Additionally, we propose MoVid-Bench, with careful manual annotations, for better evaluation of human behavior understanding on video and motion. Extensive experiments show the superiority of MotionLLM in captioning, spatial-temporal comprehension, and reasoning ability.

Comments:	MotionLLM version 1.0, project page see this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2405.20340 [cs.CV]
	(or arXiv:2405.20340v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2405.20340

Submission history

From: Ling-Hao Chen
[v1] Thu, 30 May 2024 17:59:50 UTC (17,062 KB)
