"Yes, My LoRD." Guiding Language Model Extraction with Locality Reinforced Distillation

Liang, Zi; Ye, Qingqing; Wang, Yanyun; Zhang, Sen; Xiao, Yaxin; Li, Ronghua; Xu, Jianliang; Hu, Haibo


arXiv:2409.02718 (cs)

[Submitted on 4 Sep 2024 (v1), last revised 8 Feb 2025 (this version, v2)]


Authors:Zi Liang, Qingqing Ye, Yanyun Wang, Sen Zhang, Yaxin Xiao, Ronghua Li, Jianliang Xu, Haibo Hu


Abstract: Model extraction attacks (MEAs) on large language models (LLMs) have received increasing attention in recent research. However, existing attack methods typically adapt extraction strategies originally developed for deep neural networks (DNNs). They neglect the underlying inconsistency between the training tasks of MEA and LLM alignment, leading to suboptimal attack performance. To tackle this issue, we propose Locality Reinforced Distillation (LoRD), a novel model extraction algorithm specifically designed for LLMs. In particular, LoRD employs a newly defined policy-gradient-style training task that uses the responses of the victim model as the signal to guide the crafting of preferences for the local model. Theoretical analyses demonstrate that (I) the convergence procedure of LoRD in model extraction is consistent with the alignment procedure of LLMs, and (II) LoRD can reduce query complexity while mitigating watermark protection through our exploration-based stealing. Extensive experiments validate the superiority of our method in extracting various state-of-the-art commercial LLMs. Our code is available at: this https URL.

Subjects:	Cryptography and Security (cs.CR); Computation and Language (cs.CL)
Cite as:	arXiv:2409.02718 [cs.CR]
	(or arXiv:2409.02718v2 [cs.CR] for this version)
	https://doi.org/10.48550/arXiv.2409.02718

Submission history

From: Zi Liang
[v1] Wed, 4 Sep 2024 13:54:38 UTC (1,473 KB)
[v2] Sat, 8 Feb 2025 10:14:26 UTC (1,538 KB)
