CodeChameleon: Personalized Encryption Framework for Jailbreaking Large Language Models

Lv, Huijie; Wang, Xiao; Zhang, Yuansen; Huang, Caishuang; Dou, Shihan; Ye, Junjie; Gui, Tao; Zhang, Qi; Huang, Xuanjing

arXiv:2402.16717 (cs)

[Submitted on 26 Feb 2024]

Abstract: Adversarial misuse, particularly through "jailbreaking" that circumvents a model's safety and ethical protocols, poses a significant challenge for Large Language Models (LLMs). This paper delves into the mechanisms behind such successful attacks, introducing a hypothesis for the safety mechanism of aligned LLMs: intent security recognition followed by response generation. Grounded in this hypothesis, we propose CodeChameleon, a novel jailbreak framework based on personalized encryption tactics. To elude the intent security recognition phase, we reformulate tasks into a code completion format, enabling users to encrypt queries using personalized encryption functions. To guarantee response generation functionality, we embed a decryption function within the instructions, which allows the LLM to decrypt and execute the encrypted queries successfully. We conduct extensive experiments on 7 LLMs, achieving state-of-the-art average Attack Success Rate (ASR). Remarkably, our method achieves an 86.6% ASR on GPT-4-1106.
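
The abstract reports results as an Attack Success Rate (ASR) and as an average ASR over 7 LLMs, without defining either on this page. As a minimal sketch, assuming ASR is the percentage of attempts that an evaluator judged successful and that the average is an unweighted mean over models, the two quantities could be computed as below. The function names, model names, and outcome lists are hypothetical and invented for illustration; they are not the paper's data or evaluation code.

```python
from statistics import mean


def attack_success_rate(judgements):
    """Return the percentage of attempts judged successful.

    `judgements` is a sequence of booleans, one per attempt. How each
    attempt is judged is outside the scope of this sketch (assumption).
    """
    if not judgements:
        raise ValueError("no judgements given")
    return 100.0 * sum(1 for j in judgements if j) / len(judgements)


def average_asr(per_model):
    """Unweighted mean of per-model ASR (assumed averaging scheme)."""
    return mean(attack_success_rate(j) for j in per_model.values())


# Hypothetical outcomes, invented for illustration only.
outcomes = {
    "model_a": [True, True, False, True],
    "model_b": [False, True, False, False],
}

per_model_asr = {name: attack_success_rate(j) for name, j in outcomes.items()}
overall = average_asr(outcomes)
```

An unweighted mean treats every model equally regardless of how many attempts were run against it; if the per-model attempt counts differed, a pooled rate over all attempts would give a different number.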

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Cite as:	arXiv:2402.16717 [cs.CL]
	(or arXiv:2402.16717v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2402.16717

Submission history

From: Xiao Wang
[v1] Mon, 26 Feb 2024 16:35:59 UTC (6,743 KB)
