NELA-GT-2022: A Large Multi-Labelled News Dataset for The Study of Misinformation in News Articles

Gruppi, Maurício; Horne, Benjamin D.; Adalı, Sibel

Computer Science > Computation and Language

arXiv:2203.05659 (cs)

[Submitted on 10 Mar 2022 (v1), last revised 17 Mar 2023 (this version, v2)]

Abstract: In this paper, we present the fifth installment of the NELA-GT datasets, NELA-GT-2022. The dataset contains 1,778,361 articles from 361 outlets between January 1, 2022, and December 31, 2022. As in past releases, NELA-GT-2022 includes outlet-level veracity labels from Media Bias/Fact Check and tweets embedded in the collected news articles. The NELA-GT-2022 dataset can be found at: this https URL
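
The veracity labels are assigned per outlet, not per article. One way to use them is to let every article inherit the label of the outlet that published it. The sketch below shows that propagation step. It assumes the articles and the outlet labels are already loaded into memory. The `Article` fields, the outlet names, and the label strings are hypothetical placeholders and do not reflect the dataset's actual schema or file format.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical record type: field names are illustrative, not the dataset's schema.
@dataclass
class Article:
    article_id: str
    source: str  # outlet that published the article
    title: str

def propagate_outlet_labels(articles, outlet_labels):
    """Pair each article with its outlet's label; skip articles from unlabeled outlets."""
    labeled = []
    for art in articles:
        label = outlet_labels.get(art.source)
        if label is not None:
            labeled.append((art, label))
    return labeled

# Hypothetical outlet-level labels (placeholder names and values).
outlet_labels = {"outlet_a": "reliable", "outlet_b": "unreliable"}
articles = [
    Article("1", "outlet_a", "..."),
    Article("2", "outlet_a", "..."),
    Article("3", "outlet_b", "..."),
    Article("4", "outlet_c", "..."),  # outlet without a label
]

pairs = propagate_outlet_labels(articles, outlet_labels)
label_counts = Counter(label for _, label in pairs)
print(label_counts)
```

This version drops articles whose outlet has no label. Keeping those articles with an explicit "unlabeled" marker would be an equally reasonable choice. Either way, every article from a given outlet receives the same label, regardless of its individual content.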

Comments:	Technical report documenting the most recent NELA-GT update (NELA-GT-2022). arXiv admin note: substantial text overlap with arXiv:2102.04567
Subjects:	Computation and Language (cs.CL); Computers and Society (cs.CY); Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Cite as:	arXiv:2203.05659 [cs.CL]
	(or arXiv:2203.05659v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2203.05659

Submission history

From: Benjamin Horne
[v1] Thu, 10 Mar 2022 21:58:33 UTC (96 KB)
[v2] Fri, 17 Mar 2023 22:21:50 UTC (109 KB)
