Why So Gullible? Enhancing the Robustness of Retrieval-Augmented Models against Counterfactual Noise

Hong, Giwon; Kim, Jeonghwan; Kang, Junmo; Myaeng, Sung-Hyon; Whang, Joyce Jiyoung

arXiv:2305.01579 (cs)

[Submitted on 2 May 2023 (v1), last revised 9 Jun 2024 (this version, v3)]


Abstract: Most existing retrieval-augmented language models (LMs) assume a naive dichotomy within a retrieved document set: query-relevance and irrelevance. Our work investigates a more challenging scenario in which even the "relevant" documents may contain misleading or incorrect information, causing conflict among the retrieved documents and thereby negatively influencing model decisions as noise. We observe that existing LMs are highly brittle to the presence of conflicting information in both the fine-tuning and in-context few-shot learning scenarios. We propose approaches for handling knowledge conflicts among retrieved documents by explicitly fine-tuning a discriminator or prompting GPT-3.5 to elicit its discriminative capability. Our empirical results on open-domain QA show that these approaches significantly enhance model robustness. We also provide our findings on incorporating the fine-tuned discriminator's decision into the in-context learning process, proposing a way to exploit the benefits of two disparate learning schemes. Alongside our findings, we provide MacNoise, a machine-generated, conflict-induced dataset to further encourage research in this direction.

Comments:	NAACL 2024 (Findings; Long Paper)
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2305.01579 [cs.CL]
	(or arXiv:2305.01579v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2305.01579

Submission history

From: Jeonghwan Kim
[v1] Tue, 2 May 2023 16:28:10 UTC (7,827 KB)
[v2] Thu, 14 Mar 2024 01:39:58 UTC (9,362 KB)
[v3] Sun, 9 Jun 2024 23:42:48 UTC (9,362 KB)
