Semantic-guided Prompt Organization for Universal Goal Hijacking against LLMs

Huang, Yihao; Wang, Chong; Jia, Xiaojun; Guo, Qing; Juefei-Xu, Felix; Zhang, Jian; Pu, Geguang; Liu, Yang

arXiv:2405.14189 (cs)

[Submitted on 23 May 2024]

Abstract: With the rising popularity of Large Language Models (LLMs), assessing their trustworthiness through security tasks has gained critical importance. Regarding the new task of universal goal hijacking, previous efforts have concentrated solely on optimization algorithms, overlooking the crucial role of the prompt. To fill this gap, we propose a universal goal hijacking method called POUGH that incorporates semantic-guided prompt processing strategies. Specifically, the method starts with a sampling strategy to select representative prompts from a candidate pool, followed by a ranking strategy that prioritizes the prompts. Once the prompts are organized sequentially, the method employs an iterative optimization algorithm to generate the universal fixed suffix for the prompts. Experiments conducted on four popular LLMs and ten types of target responses verified the effectiveness of our method.
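The abstract does not say how the sampling and ranking strategies are defined, so the following is only an illustrative sketch of the prompt-organization stage, not the POUGH procedure itself. It assumes token-set Jaccard similarity as a crude stand-in for semantic similarity, greedy farthest-point selection for "representative" sampling, and mean similarity to the other prompts as the ranking key. All function names (`tokens`, `jaccard`, `sample_representative`, `rank_prompts`, `organize`) are hypothetical, and the suffix optimization step is deliberately left out.

```python
import re


def tokens(text):
    """Lowercased word set; an assumed, crude proxy for a semantic embedding."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def sample_representative(pool, k):
    """Greedy farthest-point sampling (an assumption, not the paper's rule).

    Starts from the first prompt and repeatedly adds the prompt whose
    similarity to its nearest already-chosen prompt is lowest.
    """
    if k <= 0 or not pool:
        return []
    toks = [tokens(p) for p in pool]
    chosen = [0]
    while len(chosen) < min(k, len(pool)):
        best_i, best_score = None, None
        for i in range(len(pool)):
            if i in chosen:
                continue
            nearest = max(jaccard(toks[i], toks[j]) for j in chosen)
            if best_score is None or nearest < best_score:
                best_i, best_score = i, nearest
        chosen.append(best_i)
    return [pool[i] for i in chosen]


def rank_prompts(prompts):
    """Order prompts by mean similarity to the others, most central first."""
    toks = [tokens(p) for p in prompts]

    def centrality(i):
        others = [jaccard(toks[i], toks[j]) for j in range(len(prompts)) if j != i]
        return sum(others) / len(others) if others else 0.0

    # Ties are broken by original position so the result is deterministic.
    order = sorted(range(len(prompts)), key=lambda i: (-centrality(i), i))
    return [prompts[i] for i in order]


def organize(pool, k):
    """Sample k representative prompts, then return them in ranked order."""
    return rank_prompts(sample_representative(pool, k))
```

For a pool containing two near-duplicate prompts, `organize(pool, k)` with `k` smaller than the pool size would tend to keep only one of the duplicates. A real system would more plausibly use learned sentence embeddings than token overlap; that substitution would only change `tokens` and `jaccard`.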

Comments:	15 pages
Subjects:	Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2405.14189 [cs.CL]
	(or arXiv:2405.14189v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2405.14189

Submission history

From: Yihao Huang
[v1] Thu, 23 May 2024 05:31:41 UTC (147 KB)
