Hybrid Approaches to Detect Comments Violating Macro Norms on Reddit

Chandrasekharan, Eshwar; Gilbert, Eric

arXiv:1904.03596 (cs)

[Submitted on 7 Apr 2019 (v1), last revised 17 Jul 2019 (this version, v2)]

Abstract: In this dataset paper, we present a three-stage process to collect Reddit comments that were removed by moderators of several subreddits for violating subreddit rules and guidelines. Beyond the fact that moderators flagged these comments for violating community norms, we have no other information about the nature of the violations. Through this procedure, we collect over 2M comments removed by moderators of 100 different Reddit communities, and publicly release the data. Working with this dataset of removed comments, we identify 8 macro norms: norms that are widely enforced on most parts of Reddit. We extract these macro norms with a hybrid approach (classification, topic modeling, and open-coding) applied to comments identified as norm violations within at least 85 out of the 100 study subreddits. Finally, we label over 40K Reddit comments removed by moderators according to the specific type of macro norm being violated, and make this dataset publicly available. Because it breaks down a collection of removed comments into more granular types of macro norm violation, our dataset can be used to train more nuanced machine learning classifiers for online moderation.
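The abstract states that macro norms were extracted from comments identified as norm violations within at least 85 of the 100 study subreddits. The sketch below illustrates that cross-community agreement filter under a stated assumption: that there is one binary "would this be removed here?" classifier per subreddit. The names `Classifier`, `count_removal_votes`, `macro_norm_candidates`, and the keyword-based toy classifiers are hypothetical and for illustration only; this is not the authors' code.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: one binary classifier per study subreddit that
# returns True when it predicts the comment would be removed there.
Classifier = Callable[[str], bool]


def count_removal_votes(comment: str, classifiers: Dict[str, Classifier]) -> int:
    """Count how many subreddit classifiers flag `comment` as a violation."""
    return sum(1 for predict in classifiers.values() if predict(comment))


def macro_norm_candidates(
    comments: Sequence[str],
    classifiers: Dict[str, Classifier],
    min_votes: int = 85,
) -> List[str]:
    """Keep comments flagged by at least `min_votes` classifiers (85 of 100 in the abstract)."""
    return [c for c in comments if count_removal_votes(c, classifiers) >= min_votes]


if __name__ == "__main__":
    # Toy stand-ins for trained models: 90 of 100 "subreddits" flag any
    # comment containing "idiot"; the remaining 10 flag nothing.
    toy = {
        f"sub{i}": (lambda text: "idiot" in text.lower()) if i < 90 else (lambda text: False)
        for i in range(100)
    }
    print(macro_norm_candidates(["You are an idiot", "Nice post"], toy))
    # prints ['You are an idiot']
```

Raising `min_votes` demands stricter agreement across communities, which narrows the candidate pool toward norms enforced almost everywhere; the abstract does not describe the actual classifiers or features used.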

Subjects:	Social and Information Networks (cs.SI)
Cite as:	arXiv:1904.03596 [cs.SI]
	(or arXiv:1904.03596v2 [cs.SI] for this version)
	https://doi.org/10.48550/arXiv.1904.03596

Submission history

From: Eshwar Chandrasekharan
[v1] Sun, 7 Apr 2019 07:15:35 UTC (1,779 KB)
[v2] Wed, 17 Jul 2019 01:23:39 UTC (2,532 KB)