TKN: Transformer-based Keypoint Prediction Network For Real-time Video Prediction

Li, Haoran; Zhou, Pengyuan; Lin, Yihang; Hao, Yanbin; Xie, Haiyong; Liao, Yong

arXiv:2303.09807 (cs)

[Submitted on 17 Mar 2023 (v1), last revised 20 Mar 2023 (this version, v2)]

Abstract: Video prediction is a complex time-series forecasting task with great potential in many use cases. However, conventional methods overemphasize accuracy and ignore the slow prediction speed caused by complicated model structures, which learn too much redundant information and consume excessive GPU memory. Furthermore, conventional methods mostly predict frames sequentially (frame by frame) and are thus hard to accelerate. Consequently, valuable use cases such as real-time danger prediction and warning cannot achieve inference fast enough to be applicable in practice. We therefore propose a transformer-based keypoint prediction neural network (TKN), an unsupervised learning method that boosts the prediction process via constrained information extraction and a parallel prediction scheme. To the best of our knowledge, TKN is the first real-time video prediction solution, and it significantly reduces computation costs while maintaining performance on other metrics. Extensive experiments on the KTH and Human3.6 datasets demonstrate that TKN predicts 11 times faster than existing methods while reducing memory consumption by 17.4% and achieving state-of-the-art prediction performance on average.
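
The contrast the abstract draws between frame-by-frame and parallel prediction can be illustrated with a toy sketch. This is not the TKN architecture: a constant-velocity extrapolator stands in for a learned model, keypoints are assumed to be flat lists of coordinates, and every name below is hypothetical.

```python
from typing import List

Keypoints = List[float]  # assumed representation: flattened (x, y) coordinates of one frame


def extrapolate(prev: Keypoints, last: Keypoints, k: int) -> Keypoints:
    """Toy stand-in for a learned predictor: constant-velocity motion, k steps ahead."""
    return [b + k * (b - a) for a, b in zip(prev, last)]


def predict_sequential(history: List[Keypoints], horizon: int) -> List[Keypoints]:
    """Frame-by-frame rollout: step t needs the output of step t-1, so steps cannot overlap."""
    frames = list(history)
    predictions = []
    for _ in range(horizon):
        nxt = extrapolate(frames[-2], frames[-1], 1)
        predictions.append(nxt)
        frames.append(nxt)  # the prediction is fed back as input to the next step
    return predictions


def predict_parallel(history: List[Keypoints], horizon: int) -> List[Keypoints]:
    """Each future offset depends only on observed frames, so all offsets are independent."""
    prev, last = history[-2], history[-1]
    # The iterations share no state and could be batched or run concurrently.
    return [extrapolate(prev, last, k) for k in range(1, horizon + 1)]


observed = [[0.0, 0.0], [1.0, 2.0]]
sequential = predict_sequential(observed, 3)
parallel = predict_parallel(observed, 3)
```

For this toy predictor both functions return the same trajectory. The difference is only in the dependency structure: the sequential loop carries state from one step to the next, whereas the parallel version computes every future offset directly from the observed frames.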

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2303.09807 [cs.CV]
	(or arXiv:2303.09807v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2303.09807

Submission history

From: Haoran Li
[v1] Fri, 17 Mar 2023 07:26:16 UTC (14,288 KB)
[v2] Mon, 20 Mar 2023 10:57:45 UTC (14,288 KB)
