Detecting hate speech against politicians in Arabic community on social media

Imane Guellil (Laboratoire des Méthodes de Conception des Systèmes, Ecole Nationale Supérieure d’Informatique, Algiers, Algeria; School of Engineering and Applied Science (EAS), Aston University, Birmingham, UK and Folding Space, Birmingham, UK)
Ahsan Adeel (School of Mathematics and Computer Science, University of Wolverhampton, Wolverhampton, UK)
Faical Azouaou (Laboratoire des Méthodes de Conception des Systèmes, Ecole Nationale Supérieure d’Informatique, Algiers, Algeria)
Sara Chennoufi (Laboratoire des Méthodes de Conception des Systèmes, Ecole Nationale Supérieure d'Informatique, Alger, Algeria)
Hanene Maafi (Laboratoire des Méthodes de Conception des Systèmes, Ecole Nationale Supérieure d'Informatique, Alger, Algeria)
Thinhinane Hamitouche (Laboratoire des Méthodes de Conception des Systèmes, Ecole Nationale Supérieure d'Informatique, Alger, Algeria)

International Journal of Web Information Systems

ISSN: 1744-0084

Article publication date: 4 August 2020

Issue publication date: 8 October 2020

Abstract

Purpose

This paper proposes an approach for detecting hate speech against politicians in the Arabic community on social media (e.g. YouTube). Similar work has been presented in the literature for other languages, such as English. However, to the best of the authors’ knowledge, little work has been conducted for the Arabic language.

Design/methodology/approach

This approach uses both classical classification algorithms and deep learning algorithms. For the classical algorithms, the authors use Gaussian NB (GNB), Logistic Regression (LR), Random Forest (RF), SGD Classifier (SGD) and Linear SVC (LSVC). For deep learning classification, four algorithms are applied: convolutional neural network (CNN), multilayer perceptron (MLP), long short-term memory (LSTM) and bi-directional long short-term memory (Bi-LSTM). For feature extraction, the authors use both Word2vec and FastText, each with its two implementations, namely, Skip Gram (SG) and Continuous Bag of Words (CBOW).
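The difference between the SG and CBOW implementations can be illustrated with a minimal sketch of how each one forms training examples from a tokenised comment. This is not the authors' code: the function names `skipgram_pairs` and `cbow_pairs`, the `window` parameter and the toy tokens are assumptions made for illustration only, and no embedding is actually trained here.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Skip-gram view: each center word is paired with every neighbour
    inside the window, giving (center, context_word) examples."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_pairs(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """CBOW view: all neighbours inside the window are grouped together
    to predict the center word, giving (context_words, center) examples."""
    examples = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = tokens[lo:i] + tokens[i + 1:hi]
        if context:  # a single-token comment has no context to learn from
            examples.append((context, center))
    return examples


if __name__ == "__main__":
    # Hypothetical toy comment, already tokenised.
    toy = ["this", "is", "a", "comment"]
    print(skipgram_pairs(toy, window=1))
    print(cbow_pairs(toy, window=1))
```

SG produces one example per (center, neighbour) pair, while CBOW produces one example per position with the whole context grouped together.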

Findings

Simulation results show that LSVC, Bi-LSTM and MLP perform best, achieving an accuracy of up to 91% when associated with the SG model. The results also show that classification on the balanced corpus is more accurate than classification on the unbalanced corpus.
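The abstract does not state how the balanced corpus was obtained. As an illustration only, one common way to balance a labelled corpus is random undersampling of the larger classes, sketched below. The function name `undersample`, the label strings and the `seed` parameter are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Example = Tuple[str, str]  # (comment text, label)


def undersample(examples: Sequence[Example], seed: int = 0) -> List[Example]:
    """Return a corpus in which every label has as many examples as the
    rarest label, by randomly dropping examples from larger classes."""
    if not examples:
        return []

    # Group the examples by their label.
    by_label: Dict[str, List[Example]] = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    smallest = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible

    balanced: List[Example] = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, smallest))
    rng.shuffle(balanced)
    return balanced


if __name__ == "__main__":
    # Hypothetical toy corpus: five "normal" comments and two "hate" comments.
    corpus = [(f"comment {i}", "normal") for i in range(5)]
    corpus += [(f"comment {i}", "hate") for i in range(5, 7)]
    print(undersample(corpus))
```

Undersampling discards data from the majority class, so it trades corpus size for class balance; other strategies, such as oversampling the minority class, are equally plausible readings of the abstract.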

Originality/value

The principal originality of this paper is the construction of a new hate speech corpus (Arabic_fr_en), which was annotated by three different annotators. This corpus contains the three languages used by Arabic people, namely Arabic, French and English. For Arabic, the corpus contains both Arabic script and Arabizi (i.e. Arabic words written with Latin letters). Another original aspect is the reliance on both shallow and deep learning classification, using different feature extraction models, namely Word2vec and FastText, each with its two implementations, SG and CBOW.
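The abstract does not say how the three annotations per comment were combined into a final label. A minimal sketch of one common choice, majority voting, is given below under that assumption. The function name `majority_label` and the label strings are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_label(votes: Sequence[str]) -> Optional[str]:
    """Return the label chosen by more annotators than any other label,
    or None when there are no votes or the top labels are tied."""
    counts = Counter(votes).most_common()
    if not counts:
        return None
    # A tie for first place means there is no majority to report.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


if __name__ == "__main__":
    # Hypothetical votes from three annotators on one comment.
    print(majority_label(["hate", "hate", "normal"]))
```

With three annotators and two labels a majority always exists; the `None` case only matters if more than two labels are used or an annotation is missing.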

Citation

Guellil, I., Adeel, A., Azouaou, F., Chennoufi, S., Maafi, H. and Hamitouche, T. (2020), "Detecting hate speech against politicians in Arabic community on social media", International Journal of Web Information Systems, Vol. 16 No. 3, pp. 295-313. https://doi.org/10.1108/IJWIS-08-2019-0036

Publisher

Emerald Publishing Limited

Copyright © 2020, Emerald Publishing Limited
