MotionMix: Weakly-Supervised Diffusion for Controllable Motion Generation

Authors

  • Nhat M. Hoang (Huawei Technologies Co., Ltd.; Nanyang Technological University)
  • Kehong Gong (Huawei Technologies Co., Ltd.)
  • Chuan Guo (Huawei Technologies Co., Ltd.)
  • Michael Bi Mi (Huawei Technologies Co., Ltd.)

DOI:

https://doi.org/10.1609/aaai.v38i3.27988

Keywords:

CV: Motion & Tracking, CV: 3D Computer Vision

Abstract

Controllable generation of 3D human motion has become an important topic as the world embraces digital transformation. Existing works, though making promising progress with the advent of diffusion models, rely heavily on meticulously captured and annotated (e.g., text) high-quality motion corpora, a resource-intensive endeavor in the real world. This motivates our proposed MotionMix, a simple yet effective weakly-supervised diffusion model that leverages both noisy and unannotated motion sequences. Specifically, we separate the denoising objectives of a diffusion model into two stages: obtaining conditional rough motion approximations in the initial T − T* steps by learning from the noisy annotated motions, followed by the unconditional refinement of these preliminary motions during the last T* steps using unannotated motions. Notably, though learning from two sources of imperfect data, our model does not compromise motion generation quality compared to fully supervised approaches that access gold data. Extensive experiments on several benchmarks demonstrate that MotionMix, as a versatile framework, consistently achieves state-of-the-art performance on text-to-motion, action-to-motion, and music-to-dance tasks.
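The two-stage denoising schedule described in the abstract can be sketched as follows. This is a minimal, illustrative sampler, not the authors' implementation: `cond_denoise`, `uncond_denoise`, and the toy stand-in networks are hypothetical names, and the motion tensor shape is assumed for demonstration. Steps T down to T* + 1 apply the condition-guided denoiser (trained on noisy annotated motions); the remaining T* steps apply the unconditional refiner (trained on clean unannotated motions).

```python
import numpy as np

def motionmix_sample(cond_denoise, uncond_denoise, cond,
                     T=1000, T_star=200, shape=(60, 22, 3), seed=0):
    """Illustrative two-stage sampler in the spirit of MotionMix.

    Stage 1 (steps T .. T*+1): conditional denoising yields a rough,
    condition-aligned motion approximation.
    Stage 2 (steps T* .. 1): unconditional denoising refines that rough
    motion without using the condition.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)  # start from Gaussian noise x_T
    for t in range(T, 0, -1):
        if t > T_star:
            x = cond_denoise(x, t, cond)   # stage 1: rough approximation
        else:
            x = uncond_denoise(x, t)       # stage 2: condition-free refinement
    return x

# Toy stand-ins for trained denoising networks (purely for shape/flow checks).
def toy_cond_denoise(x, t, cond):
    return 0.9 * x

def toy_uncond_denoise(x, t):
    return 0.9 * x
```

A trained model would replace the toy functions with the learned noise-prediction networks; the key design choice shown here is that the condition is consulted only during the first T − T* steps.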

Published

2024-03-24

How to Cite

Hoang, N. M., Gong, K., Guo, C., & Bi Mi, M. (2024). MotionMix: Weakly-Supervised Diffusion for Controllable Motion Generation. Proceedings of the AAAI Conference on Artificial Intelligence, 38(3), 2157-2165. https://doi.org/10.1609/aaai.v38i3.27988

Section

AAAI Technical Track on Computer Vision II