Proxy-RLHF: Decoupling Generation and Alignment in Large Language Model with Proxy

Zhu, Yu; Sun, Chuxiong; Yang, Wenfei; Wei, Wenqiang; Tang, Bo; Zhang, Tianzhu; Li, Zhiyu; Zhang, Shifeng; Xiong, Feiyu; Hu, Jie; Yang, Mingchuan

arXiv:2403.04283 (cs)

[Submitted on 7 Mar 2024]

Abstract: Reinforcement Learning from Human Feedback (RLHF) is the prevailing approach to aligning Large Language Models (LLMs) with human values. However, existing RLHF methods incur a high computational cost, one main reason being that RLHF assigns both the generation and alignment tasks to the LLM simultaneously. In this paper, we introduce Proxy-RLHF, which decouples the generation and alignment processes of LLMs, achieving alignment with human values at a much lower computational cost. We start with a novel Markov Decision Process (MDP) designed for the alignment process and employ Reinforcement Learning (RL) to train a streamlined proxy model that oversees the token generation of the LLM, without altering the LLM itself. Experiments show that our method achieves a comparable level of alignment with only 1% of the training parameters of other methods.
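
The abstract does not spell out how the proxy oversees token generation, so the following is only a minimal sketch under one assumed reading: a frozen generator proposes candidate tokens, and a separate, much smaller proxy accepts or rejects each one. Every name here (`make_toy_generator`, `toy_proxy_accepts`, `proxy_guided_decode`, the retry limit, and the choice to stop when no candidate is accepted) is hypothetical and not taken from the paper. The proxy below is a hand-written rule, whereas the paper trains its proxy with RL.

```python
import random
from typing import Callable, List, Sequence

Token = str


def make_toy_generator(vocab: Sequence[Token], seed: int = 0) -> Callable[[List[Token]], Token]:
    """Stand-in for a frozen LLM: proposes a random token and is never updated."""
    rng = random.Random(seed)

    def propose(context: List[Token]) -> Token:
        # A real LLM would sample from p(token | context); this toy ignores context.
        return rng.choice(list(vocab))

    return propose


def toy_proxy_accepts(context: List[Token], candidate: Token) -> bool:
    """Stand-in for the proxy: a fixed rule instead of an RL-trained policy."""
    return candidate != "rude"


def proxy_guided_decode(
    propose: Callable[[List[Token]], Token],
    accepts: Callable[[List[Token], Token], bool],
    prompt: Sequence[Token],
    max_new_tokens: int = 8,
    max_retries: int = 4,
    eos: Token = "<eos>",
) -> List[Token]:
    """Generate tokens, keeping only candidates that the proxy accepts."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        chosen = None
        for _ in range(max_retries):
            candidate = propose(tokens)
            if accepts(tokens, candidate):
                chosen = candidate
                break
        if chosen is None:
            # Assumption: stop early if every retry was rejected.
            break
        tokens.append(chosen)
        if chosen == eos:
            break
    return tokens


if __name__ == "__main__":
    generator = make_toy_generator(["hello", "thanks", "rude", "<eos>"], seed=0)
    print(proxy_guided_decode(generator, toy_proxy_accepts, ["user:", "hi"]))
```

In this sketch the generator is only ever called, never modified, which mirrors the abstract's claim that alignment is handled without altering the LLM itself; only the accept/reject component would need training.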

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2403.04283 [cs.CL]
	(or arXiv:2403.04283v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2403.04283

Submission history

From: Yu Zhu
[v1] Thu, 7 Mar 2024 07:31:00 UTC (1,640 KB)
