Machine Unlearning in Large Language Models

Chen, Kongyang; Wang, Zixin; Mi, Bing; Liu, Waixi; Wang, Shaowei; Ren, Xiaojun; Shen, Jiaxing

arXiv:2404.16841 (cs)

[Submitted on 3 Feb 2024]

Abstract: Large language models (LLMs) have recently attracted significant attention for their ability to automatically generate content across a wide range of application domains. However, LLMs still suffer from significant security and privacy issues. For example, they may leak private user information when subjected to hacking attacks or targeted prompts. To address this problem, this paper introduces a novel machine unlearning framework for LLMs. Our objective is to stop LLMs from producing harmful, hallucinatory, or privacy-compromising responses while retaining their standard output capabilities. To accomplish this, we use an evaluative model to pinpoint dialogues that need unlearning. We also establish a distance loss that serves as the model's negative loss, steering it away from its previous undesirable outputs. Furthermore, we compute the cluster mean of the expected outputs to formulate a positive loss, which directs the model's outputs toward preferable outcomes without compromising its reasoning abilities and performance. Experimental results show that our approach meets the unlearning objectives without substantially compromising model performance.
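The three ingredients named in the abstract (an evaluative model that selects dialogues, a negative distance loss, and a positive loss built from a cluster mean) can be illustrated with a small sketch. The abstract gives no formulas, so everything concrete here is an assumption made for illustration: the function names, the use of Euclidean distance on plain embedding vectors, the hinge form and `margin` of the negative term, the weight `alpha`, and the toy scorer are all hypothetical and are not taken from the paper.

```python
import math


def select_for_unlearning(dialogues, score_fn, threshold=0.5):
    """Keep the dialogues that a (hypothetical) evaluative model flags.

    score_fn stands in for the evaluative model: it maps a dialogue to a
    score where higher means "more in need of unlearning".
    """
    return [d for d in dialogues if score_fn(d) >= threshold]


def cluster_mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def unlearning_loss(output, undesirable, preferred_examples,
                    alpha=1.0, margin=1.0):
    """Assumed combined objective: positive term plus weighted negative term.

    Negative term (assumption): a hinge penalty that is non-zero only while
    the output is still within `margin` of the undesirable output, so
    minimizing it pushes the output away from that point.
    Positive term (assumption): distance from the output to the cluster
    mean of the preferred outputs, so minimizing it pulls the output
    toward them.
    """
    negative = max(0.0, margin - euclidean(output, undesirable))
    positive = euclidean(output, cluster_mean(preferred_examples))
    return positive + alpha * negative


if __name__ == "__main__":
    # Toy stand-in for the evaluative model: flag dialogues that mention
    # a password. A real evaluator would be a trained model.
    def toy_scorer(dialogue):
        return 1.0 if "password" in dialogue else 0.0

    dialogues = ["my password is hunter2", "what is the capital of France?"]
    print(select_for_unlearning(dialogues, toy_scorer))

    preferred = [[0.0, 0.0], [2.0, 0.0]]   # embeddings of preferred outputs
    undesirable = [5.0, 0.0]               # embedding of the output to forget
    print(unlearning_loss([5.0, 0.0], undesirable, preferred))
    print(unlearning_loss([1.0, 0.0], undesirable, preferred))
```

In this toy setting the loss is largest when the output coincides with the undesirable embedding and falls to zero once the output sits on the cluster mean and outside the margin. A bounded hinge is used for the negative term only so that the sketch has a finite minimum; the paper may define its distance loss differently.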

Subjects:	Cryptography and Security (cs.CR)
Cite as:	arXiv:2404.16841 [cs.CR]
	(or arXiv:2404.16841v1 [cs.CR] for this version)
	https://doi.org/10.48550/arXiv.2404.16841

Submission history

From: Kongyang Chen
[v1] Sat, 3 Feb 2024 05:14:56 UTC (228 KB)
