A Mixer Layer is Worth One Graph Convolution: Unifying MLP-Mixers and GCNs for Human Motion Prediction

Wang, Xinshun; Zhao, Shen; Chen, Chen; Liu, Mengyuan

Computer Science > Computer Vision and Pattern Recognition

arXiv:2304.03532v1 (cs)

[Submitted on 7 Apr 2023 (this version), latest version 7 Aug 2023 (v2)]

Title:A Mixer Layer is Worth One Graph Convolution: Unifying MLP-Mixers and GCNs for Human Motion Prediction

Authors:Xinshun Wang, Shen Zhao, Chen Chen, Mengyuan Liu

View PDF

Abstract:The past few years has witnessed the dominance of Graph Convolutional Networks (GCNs) over human motion prediction, while their performance is still far from satisfactory. Recently, MLP-Mixers show competitive results on top of being more efficient and simple. To extract features, GCNs typically follow an aggregate-and-update paradigm, while Mixers rely on token mixing and channel mixing operations. The two research paths have been independently established in the community. In this paper, we develop a novel perspective by unifying Mixers and GCNs. We show that a mixer layer can be seen as a graph convolutional layer applied to a fully-connected graph with parameterized adjacency. Extending this theoretical finding to the practical side, we propose Meta-Mixing Network (M$^2$-Net). Assisted with a novel zero aggregation operation, our network is capable of capturing both the structure-agnostic and the structure-sensitive dependencies in a collaborative manner. Not only is it computationally efficient, but most importantly, it also achieves state-of-the-art performance. An extensive evaluation on the Human3.6M, AMASS, and 3DPW datasets shows that M$^2$-Net consistently outperforms all other approaches. We hope our work brings the community one step further towards truly predictable human motion. Our code will be publicly available.

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2304.03532 [cs.CV]
	(or arXiv:2304.03532v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2304.03532

Submission history

From: Xinshun Wang [view email]
[v1] Fri, 7 Apr 2023 08:11:16 UTC (2,798 KB)
[v2] Mon, 7 Aug 2023 07:25:34 UTC (1,024 KB)

Computer Science > Computer Vision and Pattern Recognition

Title:A Mixer Layer is Worth One Graph Convolution: Unifying MLP-Mixers and GCNs for Human Motion Prediction

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Computer Vision and Pattern Recognition

Title:A Mixer Layer is Worth One Graph Convolution: Unifying MLP-Mixers and GCNs for Human Motion Prediction

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators