Continual SFT Matches Multimodal RLHF with Negative Supervision

Zhu, Ke; Wang, Yu; Sun, Yanpeng; Chen, Qiang; Liu, Jiangjiang; Zhang, Gang; Wang, Jingdong

arXiv:2411.14797 (cs)

[Submitted on 22 Nov 2024]

Abstract: Multimodal RLHF usually follows the supervised finetuning (SFT) stage to further improve vision-language models' (VLMs) comprehension. Conventional wisdom holds that it is superior to continual SFT during this preference alignment stage. In this paper, we observe that the inherent value of multimodal RLHF lies in its negative supervision, the logit of the rejected responses. We thus propose a novel negative supervised finetuning (nSFT) approach that fully exploits this information. Our nSFT disentangles this negative supervision from the RLHF paradigm and continually aligns VLMs with a simple SFT loss. This is more memory efficient than multimodal RLHF, where 2 (e.g., DPO) or 4 (e.g., PPO) large VLMs are strictly required. The effectiveness of nSFT is demonstrated by comparing it with various multimodal RLHF approaches across different dataset sources, base VLMs, and evaluation metrics. In addition, extensive ablations are provided to support our hypothesis. We hope this paper will stimulate further research on properly aligning large vision-language models.
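
To make the abstract's contrast concrete, the following is a minimal sketch of the standard DPO objective next to a plain SFT loss, written on scalar sequence log-probabilities. It is not the paper's nSFT procedure, which the abstract does not specify. The function names, the `beta` default, and the scalar inputs are assumptions chosen for illustration. The sketch shows where the rejected response enters the DPO loss and why DPO needs a second (frozen reference) model, while SFT needs only the policy being trained.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(
    logp_chosen: float,
    logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,  # illustrative value, not taken from the paper
) -> float:
    """Standard DPO loss for one preference pair.

    logp_* are sequence log-probabilities under the policy being trained;
    ref_logp_* come from a frozen reference model, which is the second
    model DPO has to keep in memory.
    """
    chosen_margin = logp_chosen - ref_logp_chosen
    # The rejected response contributes with a negative sign: this term is
    # the "negative supervision" that a positive-only SFT loss lacks.
    rejected_margin = logp_rejected - ref_logp_rejected
    return -log_sigmoid(beta * (chosen_margin - rejected_margin))


def sft_loss(logp_target: float) -> float:
    """Plain SFT loss: negative log-likelihood of the target response.

    Only the policy is needed; no reference model and no rejected response.
    """
    return -logp_target
```

With identical policy and reference log-probabilities the DPO loss equals log 2, and lowering the policy's log-probability of the rejected response reduces the loss, which is the signal a loss over chosen responses alone never sees.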

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2411.14797 [cs.LG]
	(or arXiv:2411.14797v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2411.14797

Submission history

From: Ke Zhu
[v1] Fri, 22 Nov 2024 08:48:30 UTC (2,938 KB)
