ALaRM: Align Language Models via Hierarchical Rewards Modeling

Lai, Yuhang; Wang, Siyuan; Liu, Shujun; Huang, Xuanjing; Wei, Zhongyu

arXiv:2403.06754 (cs)

[Submitted on 11 Mar 2024 (v1), last revised 16 Mar 2024 (this version, v2)]

Abstract: We introduce ALaRM, the first framework modeling hierarchical rewards in reinforcement learning from human feedback (RLHF), which is designed to enhance the alignment of large language models (LLMs) with human preferences. The framework addresses the limitations of current alignment approaches, which often struggle with the inconsistency and sparsity of human supervision signals, by integrating holistic rewards with aspect-specific rewards. This integration enables more precise and consistent guidance of language models towards desired outcomes, particularly in complex and open text generation tasks. By employing a methodology that filters and combines multiple rewards based on their consistency, the framework provides a reliable mechanism for improving model alignment. We validate our approach through applications in long-form question answering and machine translation tasks, employing gpt-3.5-turbo for pairwise comparisons, and demonstrate improvements over existing baselines. Our work underscores the effectiveness of hierarchical rewards modeling in refining LLM training processes for better human preference alignment. We release our code at this https URL.
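The abstract does not spell out the filtering or combination rule, so the following is only an illustrative sketch of the general idea of "filtering and combining rewards by consistency", not the authors' implementation. It assumes that consistency is measured as the fraction of human preference pairs on which a reward ranks the preferred text higher, that aspect-specific rewards below a threshold are dropped, and that the survivors are added to the holistic reward with a fixed weight. All names here (`consistency`, `select_rewards`, `combine`, `threshold`, `weight`) are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Reward = Callable[[str], float]   # maps a generated text to a scalar score
Pair = Tuple[str, str]            # (preferred text, rejected text)


def consistency(reward: Reward, pairs: List[Pair]) -> float:
    """Fraction of preference pairs on which `reward` scores the preferred text higher."""
    if not pairs:
        return 0.0
    agree = sum(1 for better, worse in pairs if reward(better) > reward(worse))
    return agree / len(pairs)


def select_rewards(
    aspects: Dict[str, Reward], pairs: List[Pair], threshold: float
) -> Dict[str, Reward]:
    """Keep only the aspect-specific rewards whose consistency meets the threshold (assumed rule)."""
    return {
        name: reward
        for name, reward in aspects.items()
        if consistency(reward, pairs) >= threshold
    }


def combine(holistic: Reward, selected: Dict[str, Reward], weight: float) -> Reward:
    """Return a reward equal to the holistic score plus a weighted sum of the selected aspects."""
    def combined(text: str) -> float:
        return holistic(text) + weight * sum(r(text) for r in selected.values())
    return combined
```

In this sketch, an aspect reward that disagrees with the human preference pairs is filtered out before training, so only aspect signals that point in the same direction as the holistic preference contribute to the final scalar reward.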

Comments:	15 pages, 6 figures
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2403.06754 [cs.CL]
	(or arXiv:2403.06754v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2403.06754

Submission history

From: Yuhang Lai
[v1] Mon, 11 Mar 2024 14:28:40 UTC (760 KB)
[v2] Sat, 16 Mar 2024 12:43:33 UTC (760 KB)
