QA Dataset Explosion: A Taxonomy of NLP Resources for Question Answering and Reading Comprehension

Rogers, Anna; Gardner, Matt; Augenstein, Isabelle

doi:10.1145/3560260

arXiv:2107.12708 (cs)

[Submitted on 27 Jul 2021 (v1), last revised 19 Sep 2022 (this version, v2)]

Abstract: Alongside huge volumes of research on deep learning models in NLP in recent years, there has also been much work on the benchmark datasets needed to track modeling progress. Question answering and reading comprehension have been particularly prolific in this regard, with over 80 new datasets appearing in the past two years. This study is the largest survey of the field to date. We provide an overview of the various formats and domains of the current resources, highlighting the current lacunae for future work. We further discuss the current classifications of "skills" that question answering/reading comprehension systems are supposed to acquire, and propose a new taxonomy. The supplementary materials survey the current multilingual resources and monolingual resources for languages other than English, and we discuss the implications of over-focusing on English. The study is aimed both at practitioners looking for pointers to the wealth of existing data and at researchers working on new resources.

Comments:	Published in ACM Comput. Surv. (2022). This version differs from the final version in that section 7 ("Languages") is in the main paper rather than the supplementary materials
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2107.12708 [cs.CL]
	(or arXiv:2107.12708v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2107.12708
Related DOI:	https://doi.org/10.1145/3560260

Submission history

From: Anna Rogers
[v1] Tue, 27 Jul 2021 10:09:13 UTC (1,777 KB)
[v2] Mon, 19 Sep 2022 16:38:40 UTC (2,066 KB)
