Rethinking Dense Retrieval's Few-Shot Ability

Sun, Si; Lu, Yida; Yu, Shi; Li, Xiangyang; Li, Zhonghua; Cao, Zhao; Liu, Zhiyuan; Ye, Deiming; Bao, Jie

Computer Science > Computation and Language

arXiv:2304.05845 (cs)

[Submitted on 12 Apr 2023]

Title: Rethinking Dense Retrieval's Few-Shot Ability

Authors: Si Sun, Yida Lu, Shi Yu, Xiangyang Li, Zhonghua Li, Zhao Cao, Zhiyuan Liu, Deiming Ye, Jie Bao


Abstract: Few-shot dense retrieval (DR) aims to generalize effectively to novel search scenarios by learning from only a few samples. Despite its importance, specialized datasets and standardized evaluation protocols have received little study. As a result, current methods often resort to random sampling from supervised datasets to create "few-data" setups and employ inconsistent training strategies during evaluation, which makes it difficult to compare recent progress accurately. In this paper, we propose a customized FewDR dataset and a unified evaluation benchmark. Specifically, FewDR employs class-wise sampling to establish a standardized "few-shot" setting with finely defined classes, reducing variability across multiple sampling rounds. Moreover, the dataset is split into disjoint base and novel classes, allowing DR models to be continuously trained on ample data from base classes and a few samples from novel classes. This benchmark eliminates the risk of novel class leakage, providing a reliable estimate of a DR model's few-shot ability. Our extensive empirical results reveal that current state-of-the-art DR models still face challenges in the standard few-shot scenario. Our code and data will be open-sourced at this https URL.
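
The class-wise sampling and base/novel split described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' released code: the function names (split_classes, sample_k_shot), the "cls" field, the toy class labels, and the 50% novel fraction are all hypothetical choices made for illustration only.

```python
import random
from collections import defaultdict


def split_classes(classes, novel_fraction, seed=0):
    """Partition class labels into disjoint base and novel sets (hypothetical helper)."""
    rng = random.Random(seed)
    shuffled = sorted(classes)  # sort first so the shuffle is reproducible
    rng.shuffle(shuffled)
    n_novel = max(1, int(len(shuffled) * novel_fraction))
    novel = set(shuffled[:n_novel])
    base = set(shuffled[n_novel:])
    return base, novel


def sample_k_shot(examples, novel_classes, k, seed=0):
    """Class-wise sampling: draw up to k examples from each novel class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for ex in examples:
        if ex["cls"] in novel_classes:
            by_class[ex["cls"]].append(ex)
    support = []
    for cls in sorted(by_class):
        pool = by_class[cls]
        support.extend(rng.sample(pool, min(k, len(pool))))
    return support


# Toy data: four made-up classes with ten queries each (assumed, not from FewDR).
examples = [
    {"query": f"{cls}-query-{i}", "cls": cls}
    for cls in ["capital", "author", "spouse", "founder"]
    for i in range(10)
]

base, novel = split_classes({ex["cls"] for ex in examples}, novel_fraction=0.5)

# Ample training data comes only from base classes, so no novel class leaks in.
base_train = [ex for ex in examples if ex["cls"] in base]

# A fixed number of shots per novel class, rather than random sampling
# over the whole dataset, keeps every novel class equally represented.
novel_support = sample_k_shot(examples, novel, k=3)
```

Sampling per class, as opposed to drawing a random subset of the full dataset, guarantees the same number of shots for every novel class, which is one plausible reading of how the benchmark reduces variance across sampling rounds.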

Comments:	Work in progress
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2304.05845 [cs.CL]
	(or arXiv:2304.05845v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2304.05845

Submission history

From: Si Sun
[v1] Wed, 12 Apr 2023 13:20:16 UTC (926 KB)
