Vision Transformer Computation and Resilience for Dynamic Inference

Sreedhar, Kavya; Clemons, Jason; Venkatesan, Rangharajan; Keckler, Stephen W.; Horowitz, Mark

arXiv:2212.02687 (cs)

[Submitted on 6 Dec 2022 (v1), last revised 15 Apr 2024 (this version, v3)]

Abstract: State-of-the-art deep learning models for computer vision tasks are based on the transformer architecture and often deployed in real-time applications. In this scenario, the resources available for every inference can vary, so it is useful to be able to dynamically adapt execution to trade accuracy for efficiency. To create dynamic models, we leverage the resilience of vision transformers to pruning and switch between different scaled versions of a model. Surprisingly, we find that most FLOPs are generated by convolutions, not attention. These relative FLOP counts are not a good predictor of GPU performance since GPUs have special optimizations for convolutions. Some models are fairly resilient and their model execution can be adapted without retraining, while all models achieve better accuracy with retraining alternative execution paths. These insights mean that we can leverage CNN accelerators and these alternative execution paths to enable efficient and dynamic vision transformer inference. Our analysis shows that leveraging this type of dynamic execution can lead to saving 28% of energy with a 1.4% accuracy drop for SegFormer (63 GFLOPs), with no additional training, and 53% of energy for ResNet-50 (4 GFLOPs) with a 3.3% accuracy drop by switching between pretrained Once-For-All models.
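
The idea of switching between scaled versions of a model as available resources vary can be illustrated with a minimal sketch. This is not the authors' implementation: the `ExecutionPath` type, the `select_path` function, the path names, and every cost and accuracy value below are hypothetical placeholders chosen only to show the selection logic, and none of the numbers come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutionPath:
    """One scaled or pruned variant of a model (hypothetical description)."""
    name: str
    relative_cost: float  # cost relative to the full model (1.0 = full); placeholder
    accuracy: float       # placeholder accuracy, not a measured result


def select_path(paths, budget):
    """Pick the most accurate path whose cost fits the current budget.

    If no path fits, fall back to the cheapest one so that an
    inference result is still produced.
    """
    feasible = [p for p in paths if p.relative_cost <= budget]
    if not feasible:
        return min(paths, key=lambda p: p.relative_cost)
    return max(feasible, key=lambda p: p.accuracy)


# Illustrative values only; these are assumptions, not data from the paper.
PATHS = [
    ExecutionPath("full", 1.0, 0.80),
    ExecutionPath("pruned_a", 0.7, 0.78),
    ExecutionPath("pruned_b", 0.5, 0.74),
]

if __name__ == "__main__":
    # The budget would change per inference in a real-time system.
    for budget in (1.0, 0.75, 0.1):
        print(budget, select_path(PATHS, budget).name)
```

In such a scheme the budget would be re-evaluated before each inference, so the chosen path can change from frame to frame without reloading or retraining anything, assuming all variants are already resident.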

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Hardware Architecture (cs.AR)
Cite as:	arXiv:2212.02687 [cs.CV]
	(or arXiv:2212.02687v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2212.02687
Journal reference:	2024 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)

Submission history

From: Kavya Sreedhar
[v1] Tue, 6 Dec 2022 01:10:31 UTC (6,467 KB)
[v2] Thu, 23 Feb 2023 21:25:53 UTC (6,955 KB)
[v3] Mon, 15 Apr 2024 22:13:39 UTC (4,066 KB)
