Amharic Fake Account Detection in Social Network
Abstract
A social networking service is a platform for building social networks or social relations among people who
share interests, activities, backgrounds, or real-life connections. Such a service is generally offered to
participants who register on the site with a unique representation (often a profile) and their social links.
Most social network services are web-based and provide means for users to interact over the Internet [1].
Online social networking sites have become an important part of daily life. Millions of users register and share
personal information with others. Because of the fast expansion of social networks, people may exploit them for
unprincipled and illegitimate activities. As a result, privacy threats and the disclosure of personal information
have become the most important issues for users of social networking sites. Fake profiles created with malicious
intent have adverse effects, and such identities and malicious content are difficult to detect without appropriate
research. Existing research on detecting malicious content has primarily considered the characteristics of the
user profile, and most existing techniques lack comprehensive evaluation. In this work we propose a new model
using machine learning and NLP (Natural Language Processing) techniques to enhance the accuracy of detecting
fake identities in online social networks. We would like to apply this approach to Facebook by extracting
features such as time, date of publication, language, and geo-position [2].
Key words: Amharic, Classification, Detection, Fake account, Machine learning, NLP, Social media
GSJ© 2020
www.globalscientificjournal.com
GSJ: Volume 8, Issue 6, June 2020
ISSN 2320-9186 605
1. Introduction
1.1. Background
Social media platforms currently provide localization, which allows users to use different world languages on
their sites. One of these languages is Amharic, a widely spoken language and the working language of the federal
government of Ethiopia. The language is written left-to-right and has its own script, which lacks capitalization
and has 275 characters in total, mainly consonant-vowel pairs [3]. It is the second most spoken Semitic language
in the world (after Arabic) and is closely related to Tigrinya. It is probably the second largest language in
Ethiopia (after Oromo, a Cushitic language) and possibly one of the five largest languages on the African
continent. Despite the relatively large number of speakers, very few computational linguistic resources have
been developed for Amharic [3].
Online social networks are among the most popular channels through which information is exchanged around the
world. Social networks are the center of attraction for many applications, and they bring a range of new
information and communication tools to the user community. A social network is best viewed as a graph whose
nodes and edges depict the users and their interaction activities, respectively. The nodes and edges in a social
network graph can be labeled or unlabeled, depending on the structure of the network being used. Because of
their great popularity, social networking sites such as Facebook, YouTube, Twitter, LinkedIn, Pinterest, Google+,
Tumblr and Instagram have become the preferred means of communication and information sharing among a diverse
set of users, including individuals and companies. The users of social networks play a vital role and are
responsible for the content exchanged in the networks. Users share information such as interesting websites,
videos and files. People share confidential data in good faith, and others place the same faith in the data
shared. The surge in popularity of online social networks and the accessibility of huge amounts of data make
them an easy target for adversaries, whose objectives mainly include stealing individual users' details without
permission. One of the main problems in social media is spammers, who can use their accounts for different
purposes. One of these is spreading rumors, which may affect a particular business or even society at large.
Given the importance of social media's effect on society, the authors of [4] aim to detect fake profile accounts
on the Twitter online social network to prevent the spreading of fake news, advertisements and fake followers.
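The graph view of a social network described above can be made concrete with a small sketch. The user names and interaction labels below are hypothetical placeholders, not data from any of the cited studies; the sketch only shows one possible way to store labeled edges and derive a simple per-user feature (the degree).

```python
from collections import defaultdict

# Hypothetical interaction records: (user, user, interaction label).
interactions = [
    ("user_a", "user_b", "comment"),
    ("user_a", "user_c", "like"),
    ("user_b", "user_c", "message"),
    ("user_d", "user_a", "friend_request"),
]


def build_graph(edges):
    """Return an undirected adjacency list: node -> list of (neighbor, edge label)."""
    graph = defaultdict(list)
    for source, target, label in edges:
        graph[source].append((target, label))
        graph[target].append((source, label))
    return graph


graph = build_graph(interactions)

# Degree (number of interactions per user) is one simple graph-level feature.
degree = {user: len(neighbors) for user, neighbors in graph.items()}
print(degree)
```

An unlabeled graph would simply drop the third element of each record.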
Encroaching on a legitimate user's profile through fake identities is considered the most commonly practiced
technique. As the security of online social networking sites has increased, it has become very hard to break
into them directly. As a result, adversaries create false identities to gain access to other profiles [2]. In
2019, Facebook took down on average close to 2 billion fake accounts per quarter. Fraudsters use these fake
accounts to spread spam, phishing links, or malware. It is a lucrative business that can be devastating for any
innocent users it snares. Facebook has released details about the machine-learning system it uses to tackle this
challenge. The company distinguishes between two types of fake accounts. First, there are “user-misclassified
accounts,” personal profiles for businesses or pets that are meant to be Pages. These are relatively
straightforward to deal with: they just get converted to Pages. “Violating accounts,” on the other hand, are
more serious. These are personal profiles that engage in scamming and spamming or otherwise violate the
platform’s terms of service. Violating accounts need to be removed as quickly as possible without casting too
wide a net and snagging real accounts as well [5]. The main objective of any social networking site is to target
different user segments. The best thing about Facebook is the ability to find old friends, whereas YouTube
provides a platform for people to connect, inform, and inspire others across the world through video sharing.
According to an ETV News (Ethiopian Television) report of June 5, 2020, more than 5 million Birr was lost to
fraud by fake account users on social media. The following figure shows how serious the fake account problem
is [6].
Available on: https://www.youtube.com/watch?v=e9s3B4dZJus
PCA is applied to reduce the dimensionality of the dataset. In the proposed work, PCA plays an important role
by supporting decisions on which profile features to use. Principal Component Analysis (PCA) is a simple and
robust dimensionality reduction technique. In this paper we have selected the variance maximization formulation
for deriving the PCA results. According to this model, the first principal component is the direction in feature
space along which the projection variance is highest, and the second component is the direction with the highest
projection variance among all directions orthogonal to the first component. When calculating the scores on
profile features, both fake and real accounts are to be measured [9].
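The variance maximization view of PCA can be illustrated with a minimal sketch. It is not the implementation used in [9]: it assumes a tiny hypothetical table of profile features (friend count, post count, comment count), uses only the Python standard library, and finds each component by power iteration on the sample covariance matrix followed by deflation. Power iteration can fail if the starting vector happens to be orthogonal to the sought direction, which is not the case for this toy data.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length feature rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def leading_component(cov, iterations=500):
    """Power iteration: unit direction with the highest projection variance."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = mat_vec(cov, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    variance = sum(a * b for a, b in zip(v, mat_vec(cov, v)))
    return v, variance


def deflate(cov, v, variance):
    """Remove the variance already explained along direction v."""
    d = len(cov)
    return [[cov[i][j] - variance * v[i] * v[j] for j in range(d)]
            for i in range(d)]


# Hypothetical profile features: (friend count, post count, comment count).
profiles = [
    [120, 30, 14],
    [95, 22, 10],
    [400, 3, 1],
    [380, 1, 0],
    [150, 41, 20],
    [420, 2, 2],
]

cov = covariance_matrix(profiles)
pc1, var1 = leading_component(cov)
# The second component maximizes variance among directions orthogonal to pc1.
pc2, var2 = leading_component(deflate(cov, pc1, var1))
print("PC1:", pc1, "variance:", var1)
print("PC2:", pc2, "variance:", var2)
```

In practice the features would usually be standardized first, since otherwise the feature with the largest scale (here the friend count) dominates the first component.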
1.3. Related Work
Different studies have approached fake account detection in different ways. The authors of [4] presented a
classification method for detecting fake accounts on Twitter. They preprocessed the dataset using a supervised
discretization technique named Entropy Minimization Discretization (EMD) on numerical features and analyzed the
results of the Naïve Bayes algorithm [4]. Inspired by the importance of detecting fake accounts, researchers
have recently started to investigate efficient fake account detection mechanisms. Most detection mechanisms
attempt to predict and classify user accounts as real or fake (malicious, Sybil) by analyzing user-level
activities or graph-level structures. Several data mining methodologies and approaches help in detecting fake
accounts [4], [7]. In this section, we describe some of the works that have been presented in this area.
Reference [1] reached an accuracy of 80%: performance was evaluated using supervised machine learning
algorithms, and the maximum percentage of skin exposed was calculated from the images collected from the fake
accounts. In [10], a neural network algorithm is used to evaluate the proposed feature set and compare it
against state-of-the-art feature sets for detecting fraud. The feature set considers the user’s social
interaction on the Yelp platform to determine whether the user is committing fraud. Any attempt to find the
characteristics that lead to fraud must first be good enough to detect fraud. As noted in [11], OSNs suffer from
abuse in the form of fake accounts, which do not correspond to real humans. Fakes can introduce spam, manipulate
online ratings, or exploit knowledge extracted from the network. OSN operators currently expend significant
resources to detect, manually verify, and shut down fake accounts. According to [12], information spreads across
social networks quickly, but at the same time social media networks become susceptible to different types of
unwanted spammer actions. The authors propose a mechanism to detect spammers in the Facebook social network,
based on a number of features at the content level and the user level. The authors of [13] use machine learning
classification algorithms to detect fake accounts. The process of finding a fake account mainly depends on
factors such as engagement rate and artificial activity, and decision trees are built according to the success
rate, i.e., in their case taking the value that contains more fake accounts. The following table shows work done
by different authors in this area [1], [4], [14], [9], [10], [15], [12], [8].
Author and year | Title | Feature extraction | Method and Accuracy
----------------|-------|--------------------|--------------------
M. Smruthi, N. Harini (2019) | A Hybrid Scheme for Detecting Fake Accounts in Facebook | machine learning and NLP (Natural Language Processing) techniques | Time, date of publication, language, and geoposition
Buket Ersahin, Ozlem Aktas, Deniz Kilinç, Ceyhun Akyol (2017) | Twitter Fake Account Detection | supervised discretization technique named Entropy Minimization Discretization (EMD) | 85.55%
Mohammadreza Mohammadrezaei, Mohammad Ebrahim Shiri and AmirMasoud Rahmani (2018) | Identifying Fake Accounts on Social Networks Based on Graph Analysis and Classification Algorithms | Graph Analysis and Classification Algorithms | 75%
Srinivas Rao Pulluri, Jayadev Gyani, Narsimha Gugulothu (2017) | A Comprehensive Model for Detecting Fake Profiles in Online Social Networks | machine learning and NLP (Natural Language Processing) techniques | Time, date of publication, language, and geoposition
Kunal Goswami, Younghee Park and Chungsik Song (2017) | Impact of reviewer social interaction on online consumer review fraud detection | machine learning techniques | F-score of 75.4% for burst reviews, and 68.7% for all reviews
Michael Crawford, Taghi M. Khoshgoftaar, Joseph D. Prusa, Aaron N. Richter and Hamzah Al Najada (2017) | Survey of review spam detection using machine learning techniques | machine learning techniques | 65% accuracy
K Subba Reddy, Dr E Srinivasa Reddy (2017) | An Efficient Methodology to Detect Spam in Social Networking Sites | Naïve Bayes and Decision Tree algorithms | The integrated algorithm classifies an account as spammer or non-spammer with an overall accuracy of 90.5%
Sarah Khaled, Hoda M. O. Mokhtar, Neamat El-Tazi (2018) | Detecting Fake Accounts on Social Media | classification | Roughly 70% of spammers and 96% of non-spammers were effectively characterized in their outcome
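The entropy-based discretization listed above for the Twitter study can be illustrated with a simplified sketch. It shows only the core step, choosing the single cut point on a numerical feature that minimizes the weighted class entropy of the two resulting bins; the full method applies this step recursively with a stopping criterion, which is omitted here. The follower counts and labels are hypothetical, and the function names are illustrative rather than taken from [4].

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut(values, labels):
    """Return (cut point, weighted entropy) for the best binary split.

    Candidate cuts are midpoints between consecutive distinct values.
    Returns (None, inf) if all values are identical.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_point, best_entropy = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut possible between equal values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
        if weighted < best_entropy:
            best_point = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_entropy = weighted
    return best_point, best_entropy


# Hypothetical follower counts with account labels.
followers = [5, 8, 12, 300, 450, 520]
labels = ["fake", "fake", "fake", "real", "real", "real"]

cut, cut_entropy = best_cut(followers, labels)
print("cut point:", cut, "weighted entropy:", cut_entropy)
```

The resulting bins (below and above the cut) can then be fed to a classifier such as Naïve Bayes as categorical values.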
Consolidation data
Data annotation
In the feature reduction phase, four data reduction techniques were applied to guide the process of deciding the most promising feature
patterns to be used in the mining process [7]:
• PCA
• Spearman's Rank-Order Correlation
• Wrapper Feature Selection using SVM
• Multiple Linear Regression
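As an illustration of one of the listed techniques, the following sketch computes Spearman's rank-order correlation between a single feature and the class label, using only the Python standard library. The post counts and labels are hypothetical, and how [7] turns such correlations into a feature pattern is not stated here, so the sketch stops at the coefficient itself.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # positions i..j are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: post count per account and label (1 = fake, 0 = real).
post_count = [30, 22, 3, 1, 41, 2]
is_fake = [0, 0, 1, 1, 0, 1]

rho = spearman(post_count, is_fake)
print("Spearman rho:", rho)
```

A strongly negative or positive coefficient suggests the feature carries a monotonic relationship with the label and is a candidate to keep.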
Experiments are performed to evaluate the performance of the developed system, as shown in the following flow
chart.
cation algorithm and SVM classification algorithm were used as the principal mining techniques in many social
network studies, so they have been applied to the feature sets mentioned in Feature Reduction and compared with
the proposed SVM-NN algorithm. [7]
As mentioned above, the feature subsets with the highest accuracy were highlighted, as follows: the best pattern
for Spearman's rank-order correlation was (1000001000110110), the best pattern for multiple linear regression
was (0110110111001111), and the best pattern for Wrapper-SVM was (110111111011111). [7]
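The bit patterns above can be read as feature masks. The sketch below assumes that each position corresponds to one candidate feature, that the leftmost bit is the first feature, and that 1 means the feature is kept; none of these conventions is stated in the text. The feature names are hypothetical placeholders. The Wrapper-SVM pattern is printed with 15 digits rather than 16, and it is used here exactly as printed.

```python
# Hypothetical placeholder names; the actual 16 features are not listed here.
FEATURE_NAMES = [f"feature_{i}" for i in range(1, 17)]

# Patterns copied as printed in the text.
BEST_PATTERNS = {
    "Spearman's rank-order correlation": "1000001000110110",
    "Multiple linear regression": "0110110111001111",
    "Wrapper-SVM": "110111111011111",
}


def selected_features(mask, names):
    """Return the names whose mask bit is '1' (leftmost bit = first name)."""
    if len(mask) > len(names):
        raise ValueError("mask is longer than the list of feature names")
    if set(mask) - {"0", "1"}:
        raise ValueError("mask must contain only '0' and '1'")
    return [name for bit, name in zip(mask, names) if bit == "1"]


for technique, mask in BEST_PATTERNS.items():
    kept = selected_features(mask, FEATURE_NAMES)
    print(f"{technique}: {len(kept)} of {len(mask)} positions kept")
```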
Most of the existing techniques for detecting malicious content on Facebook lack comprehensive evaluation. The
main objective of the research in [2] is to increase the accuracy of identifying fake profiles and malicious
content in online social networking sites compared with existing research. We would like to apply the proposed
approach to Facebook.
Working Principle of Proposed Work
Figure 7: Working principle of proposed work
users were classified as real, possibly because fake accounts mimic real user behavior to elude detection
mechanisms.
Detecting and blocking fake accounts is important for online communities, both to maintain a safe environment
for their real users and as a responsibility, considering their impact on society. A fake account detection
system will help reduce the time, fraud and human effort involved in identifying privacy attacks on social
media. The system will help to filter out any fake user who leads members of the local population to
participate, directly or indirectly, in violent activities across the different regions of the country.
1.7. Conclusion
Fake accounts are continuously evolving in online social media. Therefore, it is essential to develop new
methods to detect fake profiles in online social media. A real-time Facebook dataset was required to detect
fake accounts and vulgar images on Facebook. For the detection of fake accounts, user timeline information,
namely post count, comment count, etc., was used, and for vulgar image detection, the images from the user
timeline and the users' display pictures were extracted. Performance was evaluated using supervised machine
learning algorithms; the highest accuracy obtained was 80%, and the maximum percentage of skin exposed was
calculated from the images collected from the fake accounts. As future work, a more complex algorithm for skin
detection can be implemented. Natural language processing techniques can be applied to detect fake accounts
more accurately. New features will certainly be introduced by Facebook, and these can also be included when
analyzing fake accounts. [1]
REFERENCES
[1] M. Smruthi and N. Harini, "A Hybrid Scheme for Detecting Fake Accounts in Facebook," International Journal of
Recent Technology and Engineering (IJRTE), pp. 213-217, February 2019.
[2] S. R. Pulluri, J. Gyani, and N. Gugulothu, "A Comprehensive Model for Detecting Fake Profiles in Online Social
Networks," International Journal of Advanced Research in Science and Engineering, pp. 1-10, 2017.
[3] Y. K. Defar, "Hate Speech Detection for Amharic Language on Social Media Using Machine Learning
Techniques," pp. 1-103, September 2019.
[4] B. Ersahin, Ö. Aktas, D. Kilinç, and C. Akyol, "Twitter Fake Account Detection," IEEE, pp. 388-392, 2017.
[5] K. Hao, "Hao, Karen Archive Page," 4 March 2020. [Online]. Available:
https://www.technologyreveiw.com.
[7] S. Ingle et al., "Detecting Fake User Accounts on," IJARIIE-ISSN(O)-2395-4396, pp. 927-931,
2019.
[8] S. Khaled, H. M. O. Mokhtar, and N. El-Tazi, "Detecting Fake Accounts on Social Media," in IEEE International
Conference on Big Data (Big Data), Cairo, 2018.
[9] S. R. Pulluri, J. Gyani, and N. Gugulothu, "A Comprehensive Model for Detecting Fake Profiles in Online Social
Networks," International Journal of Advanced Research in Science and Engineering, p. 10, 2017.
[10] K. Goswami, Y. Park, and C. Song, "Impact of reviewer social interaction on online consumer review fraud
detection," Springer Journal of Big Data, pp. 1-19, 2017.
[11] Q. Cao, M. Sirivianos, X. Yang, and T. Pregueiro, "Aiding the Detection of Fake Accounts in Large Scale Social
Online Services," pp. 1-14.
[12] K. Subba Reddy and E. Srinivasa Reddy, "An Efficient Methodology to Detect Spam in Social Networking Sites,"
International Journal of Computer Science and Information Security (IJCSIS), pp. 151-158, 2017.
[13] S. P. Maniraj et al., "Fake Account Detection using Machine Learning and Data Science,"
International Journal of Innovative Technology and Exploring Engineering (IJITEE), pp. 583-585, 2019.
[17] L. Guta, "Social Network Hate Speech Detection for Afaan Oromoo Language," p. 8, 11 June 2019.
[18] C. L. P. and N. Solomom, "Social media and journalism in Ethiopia," FOJO Media Institute, Linnaeus
University, Stockholm, 2019.