TLDR: Token-Level Detective Reward Model for Large Vision Language Models

Fu, Deqing; Xiao, Tong; Wang, Rui; Zhu, Wang; Zhang, Pengchuan; Pang, Guan; Jia, Robin; Chen, Lawrence

arXiv:2410.04734 (cs)

[Submitted on 7 Oct 2024 (v1), last revised 24 Feb 2025 (this version, v2)]

Abstract: Although reward models have been successful in improving multimodal large language models, the reward models themselves remain coarse and convey minimal information. Notably, existing reward models mimic human annotations by assigning a single binary feedback signal to any text, no matter how long the text is. In the realm of multimodal language models, where models are required to process both images and text, a naive reward model may learn implicit biases toward text and become less grounded in images. In this paper, we propose a $\textbf{T}$oken-$\textbf{L}$evel $\textbf{D}$etective $\textbf{R}$eward Model ($\textbf{TLDR}$) to provide fine-grained annotations for each text token. We first introduce a perturbation-based method to generate synthetic hard negatives and their token-level labels to train TLDR models. Then we show the usefulness of TLDR models both in assisting off-the-shelf models to self-correct their generations, and in serving as a hallucination evaluation tool. We show that TLDR automatically trains a token-level likelihood optimization, and can improve the base model's performance significantly. Finally, we show that TLDR models can speed up human annotation by a factor of 3, making it possible to acquire a broader range of high-quality vision language data.
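
As a rough illustration of what token-level labels for a perturbed hard negative could look like, the sketch below aligns an original caption with a perturbed one and marks each token of the perturbed caption as kept (1) or altered (0). This is not the authors' pipeline: the whitespace tokenization, the `difflib`-based alignment, the example captions, and the function name `token_labels` are all assumptions made here for illustration.

```python
from difflib import SequenceMatcher


def token_labels(original: str, perturbed: str) -> list[tuple[str, int]]:
    """Label each whitespace token of `perturbed`: 1 if it aligns with
    `original`, 0 if the perturbation changed or inserted it.

    Hypothetical helper for illustration only.
    """
    orig_tokens = original.split()
    pert_tokens = perturbed.split()

    # Start by assuming every token was altered, then mark aligned ones.
    labels = [0] * len(pert_tokens)
    matcher = SequenceMatcher(a=orig_tokens, b=pert_tokens, autojunk=False)
    for block in matcher.get_matching_blocks():
        for j in range(block.b, block.b + block.size):
            labels[j] = 1

    return list(zip(pert_tokens, labels))


# Hypothetical caption pair: one object word is swapped.
original = "A brown dog sits on the grass"
perturbed = "A brown cat sits on the grass"
print(token_labels(original, perturbed))
```

A single binary reward would mark the whole perturbed caption as bad; per-token labels of this kind instead point at the specific word that no longer matches the image.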

Comments:	Published as a conference paper at ICLR 2025
Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2410.04734 [cs.LG]
	(or arXiv:2410.04734v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2410.04734

Submission history

From: Deqing Fu
[v1] Mon, 7 Oct 2024 04:00:22 UTC (3,779 KB)
[v2] Mon, 24 Feb 2025 22:15:33 UTC (3,787 KB)
