LLM-Enhanced Static Analysis for Precise Identification of Vulnerable OSS Versions

Cheng, Yiran; Shar, Lwin Khin; Zhang, Ting; Yang, Shouguo; Dong, Chaopeng; Lo, David; Lv, Shichao; Shi, Zhiqiang; Sun, Limin

Abstract:Open-source software (OSS) has experienced a surge in popularity, attributed to its collaborative development model and cost-effective nature. However, the adoption of specific software versions in development projects may introduce security risks when these versions bring along vulnerabilities. Current methods of identifying vulnerable versions typically analyze and trace the code involved in vulnerability patches using static analysis with pre-defined rules. They then use syntactic-level code clone detection to identify the vulnerable versions. These methods are hindered by imprecisions due to (1) the inclusion of vulnerability-irrelevant code in the analysis and (2) the inadequacy of syntactic-level code clone detection. This paper presents Vercation, an approach designed to identify vulnerable versions of OSS written in C/C++. Vercation combines program slicing with a Large Language Model (LLM) to identify vulnerability-relevant code from vulnerability patches. It then backtraces historical commits to gather previous modifications of identified vulnerability-relevant code. We propose semantic-level code clone detection to compare the differences between pre-modification and post-modification code, thereby locating the vulnerability-introducing commit (vic) and enabling to identify the vulnerable versions between the patch commit and the vic. We curate a dataset linking 74 OSS vulnerabilities and 1013 versions to evaluate Vercation. On this dataset, our approach achieves the F1 score of 92.4%, outperforming current state-of-the-art methods. More importantly, Vercation detected 134 incorrect vulnerable OSS versions in NVD reports.

Subjects:	Software Engineering (cs.SE); Cryptography and Security (cs.CR)
Cite as:	arXiv:2408.07321 [cs.SE]
	(or arXiv:2408.07321v1 [cs.SE] for this version)
	https://doi.org/10.48550/arXiv.2408.07321

Computer Science > Software Engineering

Title:LLM-Enhanced Static Analysis for Precise Identification of Vulnerable OSS Versions

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators