Dynamic Guided and Domain Applicable Safeguards for Enhanced Security in Large Language Models

Luo, Weidi; Cao, He; Liu, Zijing; Wang, Yu; Wong, Aidan; Feng, Bing; Yao, Yuan; Li, Yu

Computer Science > Artificial Intelligence

arXiv:2410.17922 (cs)

[Submitted on 23 Oct 2024 (v1), last revised 9 Feb 2025 (this version, v2)]

Abstract: With the extensive deployment of Large Language Models (LLMs), ensuring their safety has become increasingly critical. However, existing defense methods often struggle with two key issues: (i) inadequate defense capabilities, particularly in domain-specific scenarios such as chemistry, where a lack of specialized knowledge can lead to harmful responses to malicious queries; and (ii) over-defensiveness, which compromises the general utility and responsiveness of LLMs. To mitigate these issues, we introduce a multi-agent defense framework, Guide for Defense (G4D), which leverages accurate external information to provide an unbiased summary of user intentions and analytically grounded safety response guidance. Extensive experiments on popular jailbreak attacks and benign datasets show that G4D can enhance LLMs' robustness against jailbreak attacks in both general and domain-specific scenarios without compromising the models' general functionality.

Subjects:	Artificial Intelligence (cs.AI)
Cite as:	arXiv:2410.17922 [cs.AI]
	(or arXiv:2410.17922v2 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2410.17922

Submission history

From: He Cao
[v1] Wed, 23 Oct 2024 14:40:37 UTC (1,385 KB)
[v2] Sun, 9 Feb 2025 03:34:47 UTC (1,386 KB)