A Tale of Trust and Accuracy: Base vs. Instruct LLMs in RAG Systems

Cuconasu, Florin; Trappolini, Giovanni; Tonellotto, Nicola; Silvestri, Fabrizio

arXiv:2406.14972 (cs)

[Submitted on 21 Jun 2024]

Abstract: Retrieval Augmented Generation (RAG) represents a significant advancement in artificial intelligence combining a retrieval phase with a generative phase, with the latter typically being powered by large language models (LLMs). The current common practices in RAG involve using "instructed" LLMs, which are fine-tuned with supervised training to enhance their ability to follow instructions and are aligned with human preferences using state-of-the-art techniques. Contrary to popular belief, our study demonstrates that base models outperform their instructed counterparts in RAG tasks by 20% on average under our experimental settings. This finding challenges the prevailing assumptions about the superiority of instructed LLMs in RAG applications. Further investigations reveal a more nuanced situation, questioning fundamental aspects of RAG and suggesting the need for broader discussions on the topic; or, as Fromm would have it, "Seldom is a glance at the statistics enough to understand the meaning of the figures".

Subjects:	Computation and Language (cs.CL); Information Retrieval (cs.IR)
Cite as:	arXiv:2406.14972 [cs.CL]
	(or arXiv:2406.14972v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2406.14972

Submission history

From: Florin Cuconasu
[v1] Fri, 21 Jun 2024 08:31:02 UTC (620 KB)
