Fine-grained Attention I/O Complexity: Comprehensive Analysis for Backward Passes

Li, Xiaoyu; Liang, Yingyu; Shi, Zhenmei; Song, Zhao; Zhou, Yufa

Computer Science > Machine Learning

arXiv:2410.09397 (cs)

[Submitted on 12 Oct 2024]

Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in processing long-context information. However, the quadratic complexity of attention computation with respect to sequence length poses significant computational challenges, and I/O-aware algorithms have been proposed to address them. This paper presents a comprehensive analysis of the I/O complexity of attention mechanisms, focusing on backward passes and treating the small-cache and large-cache scenarios separately. Using the red-blue pebble game framework, we establish tight bounds on I/O complexity across all cache sizes. We confirm that FlashAttention, the de facto standard I/O-aware algorithm, is optimal for both forward and backward passes in the large-cache scenario. For small cache sizes, we provide an algorithm that improves over existing methods and achieves the tight bounds. Additionally, we extend our analysis to sparse attention, a mainstream acceleration approach, deriving fine-grained lower bounds for both forward and backward passes and for both small and large caches. Our findings complete the theoretical foundation for I/O complexity in attention mechanisms, offering insights for designing efficient algorithms for LLM training and inference.

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Complexity (cs.CC); Computation and Language (cs.CL)
Cite as:	arXiv:2410.09397 [cs.LG]
	(or arXiv:2410.09397v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2410.09397

Submission history

From: Zhenmei Shi
[v1] Sat, 12 Oct 2024 07:01:30 UTC (152 KB)
