A Dataset for Plain Language Adaptation of Biomedical Abstracts

Attal, Kush; Ondov, Brian; Demner-Fushman, Dina

arXiv:2210.12242 (cs)

[Submitted on 21 Oct 2022]

Abstract: Although an exponentially growing body of health-related literature has been made available to a broad audience online, the language of scientific articles can be difficult for the general public to understand. Adapting this expert-level language into plain-language versions is therefore necessary for the public to reliably comprehend the vast health-related literature. Deep learning algorithms for automatic adaptation are a possible solution; however, gold-standard datasets are needed for proper evaluation. Datasets proposed thus far consist either of pairs of comparable professional- and general-public-facing documents or of pairs of semantically similar sentences mined from such documents. This leads to a trade-off between imperfect alignments and small test sets. To address this issue, we created the Plain Language Adaptation of Biomedical Abstracts dataset, the first manually adapted dataset that is both document- and sentence-aligned. The dataset contains 750 adapted abstracts, totaling 7,643 sentence pairs. Along with describing the dataset, we benchmark automatic adaptation on it with state-of-the-art deep learning approaches, setting baselines for future research.

Comments:	12 pages, 4 figures, 7 tables
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2210.12242 [cs.CL]
	(or arXiv:2210.12242v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2210.12242

Submission history

From: Kush Attal
[v1] Fri, 21 Oct 2022 20:47:34 UTC (219 KB)
