Hindi-BEIR : A Large Scale Retrieval Benchmark in Hindi

Acharya, Arkadeep; Murthy, Rudra; Kumar, Vishwajeet; Sen, Jaydeep

arXiv:2408.09437 (cs)

[Submitted on 18 Aug 2024]

Abstract: Given the large number of Hindi speakers worldwide, there is a pressing need for robust and efficient information retrieval systems for Hindi. Despite ongoing research, there is no comprehensive benchmark for evaluating retrieval models in Hindi. To address this gap, we introduce the Hindi version of the BEIR benchmark, which includes a subset of English BEIR datasets translated to Hindi, existing Hindi retrieval datasets, and synthetically created datasets for retrieval. The benchmark comprises 15 datasets spanning 8 distinct tasks. We evaluate state-of-the-art multilingual retrieval models on this benchmark to identify task- and domain-specific challenges and their impact on retrieval performance. By releasing this benchmark and a set of relevant baselines, we enable researchers to understand the limitations and capabilities of current Hindi retrieval models, promoting advancements in this critical area. The datasets from Hindi-BEIR are publicly available.

Subjects:	Information Retrieval (cs.IR); Computation and Language (cs.CL)
Cite as:	arXiv:2408.09437 [cs.IR]
	(or arXiv:2408.09437v1 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2408.09437

Submission history

From: Arkadeep Acharya
[v1] Sun, 18 Aug 2024 10:55:04 UTC (2,781 KB)
