Detecting Toxicity in News Articles: Application to Bulgarian

Dinkov, Yoan; Koychev, Ivan; Nakov, Preslav

arXiv:1908.09785 (cs)

[Submitted on 26 Aug 2019]

Abstract: Online media aim to reach an ever bigger audience and to hold its attention ever longer. This competition creates an environment that rewards sensational, fake, and toxic news. To help limit the spread and impact of such news, we propose and develop a news toxicity detector that can recognize various types of toxic content. While previous research has primarily focused on English, here we target Bulgarian. We created a new dataset by crawling a website that for five years has been collecting Bulgarian news articles that were manually categorized into eight toxicity groups. Then we trained a multi-class classifier with nine categories: eight toxic and one non-toxic. We experimented with different representations based on ELMo, BERT, and XLM, as well as with a variety of domain-specific features. Due to the small size of our dataset, we created a separate model for each feature type, and we ultimately combined these models into a meta-classifier. The evaluation results show an accuracy of 59.0% and a macro-F1 score of 39.7%, which represent sizable improvements over the majority-class baseline (Acc=30.3%, macro-F1=5.2%).
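
The two majority-class baseline figures in the abstract are consistent with each other, which a minimal Python sketch can illustrate. A classifier that always predicts the most frequent of nine classes scores a non-zero F1 on that class only, so its macro-F1 is roughly one ninth of that single F1. The class names and the counts for the eight non-majority classes below are invented for illustration; only the nine-way setup and the 30.3% majority share come from the abstract. Scoring a class with no true positives as F1 = 0 is also an assumption about the convention used.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1; an undefined F1 counts as 0."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical label counts for 1,000 articles in 9 classes.
# Only the 30.3% majority share is taken from the abstract;
# the split of the remaining 697 articles is made up.
counts = {
    "class_0": 303, "class_1": 150, "class_2": 120,
    "class_3": 100, "class_4": 90, "class_5": 80,
    "class_6": 70, "class_7": 50, "class_8": 37,
}
y_true = [label for label, n in counts.items() for _ in range(n)]

# Majority-class baseline: always predict the most frequent label.
majority = Counter(y_true).most_common(1)[0][0]
y_pred = [majority] * len(y_true)

baseline_acc = accuracy(y_true, y_pred)
baseline_f1 = macro_f1(y_true, y_pred, list(counts))
print(f"accuracy={baseline_acc:.3f} macro-F1={baseline_f1:.3f}")
# prints: accuracy=0.303 macro-F1=0.052
```

Under these assumptions the majority class has precision 0.303 and recall 1, giving F1 of about 0.465, and dividing by nine classes yields about 0.052, in line with the reported 5.2%.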

Comments:	Fact-checking, source reliability, political ideology, news media, Bulgarian, RANLP-2019. arXiv admin note: text overlap with arXiv:1810.01765
Subjects:	Computation and Language (cs.CL); Information Retrieval (cs.IR)
MSC classes:	68T50
ACM classes:	I.2.7
Cite as:	arXiv:1908.09785 [cs.CL]
	(or arXiv:1908.09785v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1908.09785
Journal reference:	RANLP-2019

Submission history

From: Preslav Nakov
[v1] Mon, 26 Aug 2019 16:37:03 UTC (107 KB)
