A Survey on Failure Analysis and Fault Injection in AI Systems

Yu, Guangba; Tan, Gou; Huang, Haojia; Zhang, Zhenyu; Chen, Pengfei; Natella, Roberto; Zheng, Zibin

Computer Science > Software Engineering

arXiv:2407.00125 (cs)

[Submitted on 28 Jun 2024]

Abstract: The rapid advancement of Artificial Intelligence (AI) has led to its integration into many domains, with Large Language Models (LLMs) in particular significantly enhancing capabilities in Artificial Intelligence Generated Content (AIGC). However, the complexity of AI systems has also exposed their vulnerabilities, necessitating robust methods for failure analysis (FA) and fault injection (FI) to ensure resilience and reliability. Despite the importance of these techniques, a comprehensive review of FA and FI methodologies in AI systems is still lacking. This study fills this gap by presenting a detailed survey of existing FA and FI approaches across six layers of AI systems. We systematically analyze 160 papers and repositories to answer three research questions: (1) what are the prevalent failures in AI systems, (2) what types of faults can current FI tools simulate, and (3) what gaps exist between the simulated faults and real-world failures. From our findings we derive a taxonomy of AI system failures, assess the capabilities of existing FI tools, and highlight discrepancies between real-world and simulated failures. Moreover, this survey contributes to the field by providing a framework for fault diagnosis, evaluating the state of the art in FI, and identifying areas where FI techniques can be improved to enhance the resilience of AI systems.

Subjects:	Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:2407.00125 [cs.SE]
	(or arXiv:2407.00125v1 [cs.SE] for this version)
	https://doi.org/10.48550/arXiv.2407.00125

Submission history

From: Guangba Yu
[v1] Fri, 28 Jun 2024 00:32:03 UTC (3,098 KB)