RoPGen: Towards Robust Code Authorship Attribution via Automatic Coding Style Transformation

Li, Zhen; Chen, Guenevere (Qian); Chen, Chen; Zou, Yayi; Xu, Shouhuai

doi:10.1145/3510003.3510181

arXiv:2202.06043 (cs)

[Submitted on 12 Feb 2022]

Authors: Zhen Li, Guenevere (Qian) Chen, Chen Chen, Yayi Zou, Shouhuai Xu

Abstract: Source code authorship attribution is an important problem often encountered in applications such as software forensics, bug fixing, and software quality analysis. Recent studies show that current source code authorship attribution methods can be compromised by attackers who exploit adversarial examples and coding style manipulation. This calls for robust solutions to the problem of code authorship attribution. In this paper, we initiate the study of making Deep Learning (DL)-based code authorship attribution robust. We propose an innovative framework called Robust coding style Patterns Generation (RoPGen), which learns authors' unique coding style patterns that are hard for attackers to manipulate or imitate. The key idea is to combine data augmentation and gradient augmentation in the adversarial training phase. This increases the diversity of training examples, generates meaningful perturbations to the gradients of deep neural networks, and learns diversified representations of coding styles. We evaluate the effectiveness of RoPGen using four datasets of programs written in C, C++, and Java. Experimental results show that RoPGen can significantly improve the robustness of DL-based code authorship attribution, reducing the success rates of targeted and untargeted attacks by 22.8% and 41.0% on average, respectively.
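
The data augmentation idea mentioned in the abstract can be illustrated with a toy sketch. The abstract does not say which style transformations RoPGen applies, so everything below is an assumption made for illustration: the function names `camel_to_snake`, `restyle_identifiers`, and `augment` are hypothetical, and identifier renaming is only one plausible example of a semantics-preserving coding style change. The sketch shows how each training program could be paired with a restyled copy that keeps the same author label.

```python
from __future__ import annotations

import re


def camel_to_snake(name: str) -> str:
    """Convert a camelCase identifier to snake_case (hypothetical helper)."""
    return re.sub(
        r"(?<=[a-z0-9])([A-Z])",
        lambda m: "_" + m.group(1).lower(),
        name,
    )


def restyle_identifiers(source: str, identifiers: list[str]) -> str:
    """Rename each listed identifier wherever it occurs as a whole word.

    Assumption: the caller supplies the identifiers that are safe to rename.
    This naive regex pass would also touch string literals and comments;
    a real tool would work on a parsed syntax tree instead.
    """
    for name in identifiers:
        source = re.sub(rf"\b{re.escape(name)}\b", camel_to_snake(name), source)
    return source


def augment(
    samples: list[tuple[str, str]],
    identifiers: list[str],
) -> list[tuple[str, str]]:
    """Return the original (source, author) pairs plus restyled copies.

    The author label is unchanged because the transformation alters
    style only, not behaviour or authorship.
    """
    augmented = list(samples)
    for source, author in samples:
        restyled = restyle_identifiers(source, identifiers)
        if restyled != source:  # skip copies that would be exact duplicates
            augmented.append((restyled, author))
    return augmented


if __name__ == "__main__":
    program = "int totalSum = 0; for (int i = 0; i < n; i++) totalSum += a[i];"
    for source, author in augment([(program, "author_A")], ["totalSum"]):
        print(author, "|", source)
```

The restyled copy differs from the original only in naming convention, so a classifier trained on both is pushed to rely on features other than that convention. Gradient augmentation, the other half of the key idea, is not covered by this sketch.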

Comments:	ICSE 2022
Subjects:	Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2202.06043 [cs.CR]
	(or arXiv:2202.06043v1 [cs.CR] for this version)
	https://doi.org/10.48550/arXiv.2202.06043
Related DOI:	https://doi.org/10.1145/3510003.3510181

Submission history

From: Zhen Li
[v1] Sat, 12 Feb 2022 11:27:32 UTC (2,162 KB)
