Detecting hate speech against politicians in Arabic community on social media
International Journal of Web Information Systems
ISSN: 1744-0084
Article publication date: 4 August 2020
Issue publication date: 8 October 2020
Abstract
Purpose
This paper aims to propose an approach for detecting hate speech against politicians in the Arabic community on social media (e.g. YouTube). Similar work has been presented in the literature for other languages, such as English. However, to the best of the authors’ knowledge, little work has been conducted for the Arabic language.
Design/methodology/approach
This approach uses both classical classification algorithms and deep learning algorithms. For the classical algorithms, the authors use Gaussian Naive Bayes (GNB), Logistic Regression (LR), Random Forest (RF), SGD Classifier (SGD) and Linear SVC (LSVC). For the deep learning classification, four different algorithms are applied: convolutional neural network (CNN), multilayer perceptron (MLP), long short-term memory (LSTM) and bi-directional long short-term memory (Bi-LSTM). For feature extraction, the authors use both Word2vec and FastText with their two implementations, namely, Skip Gram (SG) and Continuous Bag of Words (CBOW).
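The difference between the two embedding variants named above can be illustrated with a minimal sketch of how training examples are formed from a tokenized comment. This is not the authors' code: the function names `skipgram_pairs` and `cbow_examples`, the `window` parameter and the sample tokens are hypothetical, and real Word2vec or FastText training adds steps (subsampling, negative sampling, subword n-grams) that are omitted here.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Skip Gram: emit (center, context) pairs; the model predicts each
    context word from the center word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """CBOW: emit (context words, target) examples; the model predicts the
    target word from its surrounding context taken together."""
    examples = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = tokens[lo:i] + tokens[i + 1:hi]
        if context:  # skip one-word inputs that have no context
            examples.append((context, target))
    return examples


# Hypothetical tokenized comment, used only for illustration.
tokens = ["this", "politician", "is", "corrupt"]
sg = skipgram_pairs(tokens, window=1)
cbow = cbow_examples(tokens, window=1)
```

With a window of 1, SG produces one training pair per (word, neighbour) combination, while CBOW produces one example per word with all of its neighbours grouped as the input.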
Findings
Simulation results show that LSVC, Bi-LSTM and MLP perform best, achieving an accuracy of up to 91% when combined with the SG model. The results also show that classification on the balanced corpus is more accurate than classification on the unbalanced corpus.
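The abstract does not state how the balanced corpus was derived. As one common option, and purely as an assumption, the sketch below balances a labelled corpus by random undersampling, so that every class keeps as many comments as the smallest class. The function name `undersample`, the label names and the sample data are hypothetical.

```python
import random
from collections import defaultdict
from typing import List, Tuple


def undersample(samples: List[Tuple[str, str]], seed: int = 0) -> List[Tuple[str, str]]:
    """Return a class-balanced subset of (text, label) samples by randomly
    keeping, for each label, as many items as the rarest label has."""
    if not samples:
        return []
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))
    n = min(len(items) for items in by_label.values())
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], n))
    rng.shuffle(balanced)
    return balanced


# Hypothetical unbalanced corpus: 5 "normal" comments and 2 "hate" comments.
corpus = [(f"comment {i}", "normal") for i in range(5)] + [
    ("comment a", "hate"),
    ("comment b", "hate"),
]
balanced = undersample(corpus)
```

Undersampling discards data from the majority class; oversampling the minority class is an alternative with the opposite trade-off.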
Originality/value
The principal originality of this paper is the construction of a new hate speech corpus (Arabic_fr_en), which was annotated by three different annotators. This corpus contains the three languages used by people in the Arabic community: Arabic, French and English. For Arabic, the corpus contains both Arabic script and Arabizi (i.e. Arabic words written with Latin letters). Another original contribution is the use of both shallow and deep learning classification with different feature extraction models, namely Word2vec and FastText, each with its two implementations, SG and CBOW.
Keywords
Citation
Guellil, I., Adeel, A., Azouaou, F., Chennoufi, S., Maafi, H. and Hamitouche, T. (2020), "Detecting hate speech against politicians in Arabic community on social media", International Journal of Web Information Systems, Vol. 16 No. 3, pp. 295-313. https://doi.org/10.1108/IJWIS-08-2019-0036
Publisher
Emerald Publishing Limited
Copyright © 2020, Emerald Publishing Limited