Paper 69-Fake Reviews Detection Using Supervised Machine Learning
in the fake reviews detection research: textual and behavioral features. Textual features refer to the verbal characteristics of review activity; in other words, textual features depend mainly on the content of the reviews. Behavioral features refer to the nonverbal characteristics of the reviews. They depend mainly on the behaviors of the reviewers, such as writing style, emotional expressions, and how frequently the reviewers write reviews. Although tackling textual features is challenging and crucial, behavioral features are also very important and cannot be ignored, as they have a high impact on the performance of the fake reviews detection process. Textual features have been used extensively in several fake reviews detection research papers. In [7], the authors used supervised machine learning approaches for fake reviews detection. Five classifiers were used: SVM, Naive Bayes, KNN, K-star, and decision tree. Simulation experiments were performed on three versions of a labeled movie reviews dataset [8] consisting of 1400, 2000, and 10662 movie reviews, respectively. Also, in [9], the authors used Naive Bayes, decision tree, SVM, random forest, and maximum entropy classifiers to detect fake reviews in a dataset that they collected themselves; the collected dataset contains around 10,000 negative tweets related to Samsung products and their services. In [10], the authors used both SVM and Naive Bayes classifiers on a dataset consisting of 1600 reviews collected from 20 popular hotels in Chicago. In [11], the authors used neural and discrete models with Average, CNN, RNN, GRNN, Average GRNN, and bi-directional Average GRNN deep learning classifiers to detect deceptive opinion spamming. They used the dataset from [12], which contains truthful and deceptive reviews in three domains, namely hotels, restaurants, and doctors. All of the above research works considered only the textual features, without any effort towards the behavioral features.

Other articles have considered behavioral features in the fake reviews detection process. In [13], some behavioral features of Amazon reviews were considered, such as the average rating and the ratio of the number of reviews that the reviewer wrote. In another work [14], the authors investigated the impact of both textual and behavioral features on the fake review detection process, focusing on the restaurant and hotel domains. Also, in [15], an iterative computation framework plus plus (ICF++) is proposed that integrates textual and behavioral features; fake reviews are detected by measuring the honesty value of a review, the trustiness value of the reviewers, and the reliability value of a product.

From the above discussion and to the best of our knowledge, no approaches have dived deeply into extracting features that reflect the reviewers' behaviors. Such features can highly influence the effectiveness of the fake reviews detection process. In this paper, a machine learning approach to identify fake reviews is presented. In addition to the feature extraction process applied to the reviews, the presented approach performs several feature engineering steps to extract various behaviors of the reviewers. Some new behavioral features are created. The created features are used as inputs to the proposed system, besides the textual features, for the fake reviews detection task.

III. BACKGROUND

Machine learning is one of the most important technological trends and lies behind many critical applications. The main power of machine learning is helping machines to automatically learn and improve themselves from previous experience [16]. There are several types of machine learning algorithms [17], namely supervised, semi-supervised, and unsupervised machine learning. In the supervised approach, both input and output data are provided, and the training data must be labeled and classified [18]. In the unsupervised learning approach, only the data is given, without any classification or labels, and the role of the approach is to find the best-fitting clustering or classification of the input data. Thus, in unsupervised learning, all data are unlabeled and the role of the approach is to label them. Finally, in the semi-supervised approach, some data are labeled but most are unlabeled. In this part, we introduce a summary of the supervised learning algorithms, as they are the main focus of this paper.

Several classification algorithms have been developed for supervised machine learning. The main objective of these algorithms is to find a proper model that discriminates the training data. For example, Support Vector Machines (SVM) is a discriminative classifier that separates the given data into classes by finding the best separating hyperplane for the given training data [19]. Another common supervised learning algorithm is Naive Bayes (NB). The key idea of NB relies on Bayes' theorem: the probability of event A happening given event B, formed as P(A|B) = P(B|A) * P(A) / P(B) [20]. NB calculates a set of probabilities by counting the frequencies and combinations of values in a given dataset. NB has been successfully applied in several application domains such as text classification, spam filtering, and recommendation systems.

The K-Nearest Neighbors (KNN) algorithm [21] is one of the simplest yet most powerful classification algorithms. KNN has been used mostly in statistical estimation and pattern recognition. The key idea behind KNN is to classify a query instance based on the votes of a group of similar, already classified instances; the similarity is usually calculated using a distance function [22].

Decision tree [23] is another machine learning classifier that relies on building a tree representing decisions over the training instances. The algorithm constructs the tree iteratively based on the best possible split among the features. The selection of the best feature relies on a predefined function such as entropy, information gain, gain ratio, or the Gini index. Random Forest [24] is a successful method that handles the overfitting problems that occur in decision trees. The key essence of random forest is to construct a bag of trees from different samples of the dataset. Instead of constructing each tree from all features, random forest selects a small random subset of the features while constructing each tree in the forest. Logistic regression [25] is another simple supervised machine learning classifier; it relies on finding a hyperplane that classifies the data.

IV. PROPOSED APPROACH

This section explains the details of the proposed approach shown in Fig. 1. The proposed approach consists of three basic phases in order to get the best model that will be used
www.ijacsa.thesai.org 602 | P a g e
(IJACSA) International Journal of Advanced Computer Science and Applications,
Vol. 12, No. 1, 2021
for fake reviews detection. These phases are explained in the following:

A. Data Preprocessing

The first step in the proposed approach is data preprocessing [26], one of the essential steps in machine learning approaches. Data preprocessing is a critical activity, as real-world data is rarely appropriate to be used as-is. A sequence of preprocessing steps has been used in this work to prepare the raw data of the Yelp dataset for computational activities. These steps can be summarized as follows:

1) Tokenization: Tokenization is one of the most common natural language processing techniques. It is a basic step before applying any other preprocessing techniques. The text is divided into individual words called tokens. For example, if we have the sentence ("wearing helmets is a must for pedal cyclists"), tokenization will divide it into the following tokens: ("wearing", "helmets", "is", "a", "must", "for", "pedal", "cyclists") [27].

2) Stop Words Cleaning: Stop words [28] are the words that are used the most yet hold no value. Common examples of stop words are (an, a, the, this). In this paper, all data are cleaned of stop words before going forward in the fake reviews detection process.

3) Lemmatization: Lemmatization is used to convert a plural form to a singular one. It aims to remove inflectional endings only and to return the base or dictionary form of the word, for example converting the word ("plays") to ("play") [29].

Fig. 1. The Proposed Framework.

B. Feature Extraction

Feature extraction is a step which aims to increase the performance of a pattern recognition or machine learning system. Feature extraction represents a reduction of the data to its important features, which results in feeding the machine and deep learning models with more valuable data. It is mainly a procedure of removing the unneeded attributes of the data that may actually reduce the accuracy of the model [30].

Several approaches have been developed in the literature to extract features for fake reviews detection. Textual features are one popular approach [31]. These include sentiment classification [32], which depends on getting the percentage of positive and negative words in the review, e.g. "good", "weak". The cosine similarity is also considered. The cosine similarity is the cosine of the angle between two n-dimensional vectors in an n-dimensional space: the dot product of the two vectors divided by the product of the two vectors' lengths (or magnitudes) [33]. TF-IDF is another textual feature method that combines the term frequency (TF) and the inverse document frequency (IDF). Each word has a respective TF and IDF score, and the product of the TF and IDF scores of a term is called the TF-IDF weight of that term [34]. A confusion matrix is used to classify the reviews into four results: True Negative (TN), real events classified as real; True Positive (TP), fake events classified as fake; False Positive (FP), real events classified as fake; and False Negative (FN), fake events classified as real.

Second, there are user personal profile and behavioral features. These are the two main ways used to identify spammers: whether the time stamps of a user's comments are more frequent and distinctive than those of other normal users, or whether the user posts redundant reviews that have no relation to the target domain.

In this paper, we apply TF-IDF to extract the features of the contents in two language models, namely bi-gram and tri-gram. In both language models, we also use the extended dataset after extracting the features representing the users' behaviors.

C. Feature Engineering

Fake reviews are known to have other descriptive features [35] related to the behaviors of the reviewers while writing their reviews. In this paper, we consider some of these features and their impact on the performance of the fake reviews detection process. We consider the caps-count, punct-count, and emojis behavioral features: caps-count represents the total number of capital characters a reviewer uses when writing the review, punct-count represents the total number of punctuation marks found in each review, and emojis counts the total number of emojis in each review. Also, we have applied statistical analysis to the reviewers' behaviors using a "groupby" function that gets the number of fake or real reviews written by each reviewer on a certain date and on each hotel. All these features are taken into consideration to see the effect of the users' behaviors on the performance of the classifiers.

V. EXPERIMENTAL RESULTS

We evaluated our proposed system on the Yelp dataset [5]. This dataset includes 5853 reviews of 201 hotels in Chicago written by 38,063 reviewers. The reviews are classified into 4709 reviews labeled as real and 1144 reviews labeled as fake; Yelp itself has classified the reviews into genuine and fake. Each review instance in the dataset contains the review date, review ID, reviewer ID, product ID, review label, and star rating. The statistics of the dataset are summarized in Table I. The maximum review length in the data is 875 words, the minimum review length is 4 words, the average length
of all the reviews is 439.5 words, the total number of tokens in the data is 103052, and the number of unique words is 102739.

TABLE I. SUMMARY OF THE DATASET

Total number of reviews     5853 reviews
Number of fake reviews      1144 reviews
Number of real reviews      4709 reviews
Number of distinct words    102739 words
Total number of tokens      103052 tokens
Maximum review length       875 words
Minimum review length       4 words
Average review length       439.5 words

TABLE II. ACCURACY OF BI-GRAM AND TRI-GRAM IN THE ABSENCE OF EXTRACTED BEHAVIORAL FEATURES

Classification Algorithm   Accuracy% Bigram   Accuracy% Trigram   Average Accuracy
Logistic Regression        87.87%             87.87%              87.87%
Naive Bayes                86.76%             87.30%              87.03%
KNN (K=7)                  86.34%             87.87%              87.82%
SVM                        87.82%             87.82%              87.82%
Random Forest              87.82%             87.82%              87.82%

indicators when the data is unbalanced. Similar to the previous, Table IV represents the recall, precision, and hence the f1-score in the absence of the extracted behavioral features of the users in the two language models. For the trade-off between recall and precision, the f1-score is taken as the evaluation criterion of each classifier. In bi-gram, KNN (K=7) outperforms all other classifiers with an f1-score of 82.40%. Whereas, in tri-gram, both logistic regression and KNN (K=7) outperform the other classifiers with an f1-score of 82.20%. To evaluate the overall performance of the classifiers in both language models, the average f1-score is calculated. It is found that KNN outperforms all the classifiers with an average f1-score of 82.30%. Fig. 4 depicts the overall performance of all classifiers.

TABLE V. RECALL, PRECISION, AND F1-SCORE IN PRESENCE OF EXTRACTED BEHAVIORAL FEATURES

                       Bi-gram                          Tri-gram                         Avg F-score
                       Recall   Precision  F-score     Recall   Precision  F-score
Logistic Regression    86.90%   75.53%     82%         86.90%   75.53%     80.82%      81.41%
Naive Bayes            85.82%   76%        80.38%      86.34%   76.59%     80.64%      80.51%
KNN (K=7)              86.56%   80%        81.26%      85.30%   78.50%     86.20%      83.73%
SVM                    86.90%   75.50%     80.82%      84.90%   75.53%     81.82%      81.32%
Random Forest          86.85%   75.50%     80.79%      87.90%   74.53%     81.90%      81.34%
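The four confusion-matrix outcomes defined in Section IV-B (TN, TP, FP, FN) and the precision/recall/f1-score trade-off used as the evaluation criterion can be made concrete with a small pure-Python sketch. The helper names below are ours, and "fake" is treated as the positive class, matching the TP definition above; this is an illustration, not the authors' evaluation code:

```python
def confusion_counts(y_true, y_pred, positive="fake"):
    # TP: fake reviews classified as fake; FP: real classified as fake;
    # FN: fake classified as real; TN: real classified as real.
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn

def scores(y_true, y_pred, positive="fake"):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # f1 is the harmonic mean of precision and recall, capturing their trade-off.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

m = scores(["fake", "fake", "real", "real", "real"],
           ["fake", "real", "fake", "real", "real"])
```

Because accuracy alone is misleading on unbalanced data such as this dataset (1144 fake vs. 4709 real reviews), the f1-score is the safer single-number summary.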
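As a concrete illustration of the TF-IDF weighting described in Section IV-B, the following pure-Python sketch computes a single term weight. It assumes the common logarithmic IDF variant; the function name and toy corpus are ours, and a real pipeline would typically use a library implementation such as scikit-learn's TfidfVectorizer with ngram_range set for bi-grams or tri-grams:

```python
import math

def tf_idf(term, doc, corpus):
    # Term frequency: how often the term occurs in this document's token list.
    tf = doc.count(term) / len(doc)
    # Inverse document frequency: down-weight terms found in many documents.
    df = sum(term in d for d in corpus)
    idf = math.log(len(corpus) / df) if df else 0.0
    # The TF-IDF weight of a term is the product of its TF and IDF scores.
    return tf * idf

# Toy corpus of tokenized reviews (illustrative only).
corpus = [["great", "hotel", "great", "staff"],
          ["bad", "hotel"],
          ["great", "view"]]
weight = tf_idf("great", corpus[0], corpus)  # tf = 2/4, idf = log(3/2)
```

A term occurring often in one review but in few reviews overall gets a high weight, which is what makes TF-IDF useful as a textual feature for the classifiers above.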
[2] S. Tadelis, "The economics of reputation and feedback systems in e-commerce marketplaces," IEEE Internet Computing, vol. 20, no. 1, pp. 12–19, 2016.
[3] M. J. H. Mughal, "Data mining: Web data mining techniques, tools and algorithms: An overview," Information Retrieval, vol. 9, no. 6, 2018.
[4] C. C. Aggarwal, "Opinion mining and sentiment analysis," in Machine Learning for Text. Springer, 2018, pp. 413–434.
[5] A. Mukherjee, V. Venkataraman, B. Liu, and N. Glance, "What yelp fake review filter might be doing?" in Seventh International AAAI Conference on Weblogs and Social Media, 2013.
[6] N. Jindal and B. Liu, "Review spam detection," in Proceedings of the 16th International Conference on World Wide Web, ser. WWW '07, 2007.
[7] E. Elmurngi and A. Gherbi, Detecting Fake Reviews through Sentiment Analysis Using Machine Learning Techniques. IARIA/DATA ANALYTICS, 2017.
[8] V. Singh, R. Piryani, A. Uddin, and P. Waila, "Sentiment analysis of movie reviews and blog posts," in Advance Computing Conference (IACC), 2013, pp. 893–898.
[9] A. Molla, Y. Biadgie, and K.-A. Sohn, "Detecting negative deceptive opinion from tweets," in International Conference on Mobile and Wireless Technology. Singapore: Springer, 2017.
[10] S. Shojaee et al., "Detecting deceptive reviews using lexical and syntactic features," 2013.
[11] Y. Ren and D. Ji, "Neural networks for deceptive opinion spam detection: An empirical study," Information Sciences, vol. 385, pp. 213–224, 2017.
[12] H. Li et al., "Spotting fake reviews via collective positive-unlabeled learning," 2014.
[13] N. Jindal and B. Liu, "Opinion spam and analysis," in Proceedings of the 2008 International Conference on Web Search and Data Mining, ser. WSDM '08, 2008, pp. 219–230.
[14] D. Zhang, L. Zhou, J. L. Kehoe, and I. Y. Kilic, "What online reviewer behaviors really matter? Effects of verbal and nonverbal behaviors on detection of fake online reviews," Journal of Management Information Systems, vol. 33, no. 2, pp. 456–481, 2016.
[15] E. D. Wahyuni and A. Djunaidy, "Fake review detection from a product review using modified method of iterative computation framework," 2016.
[16] D. Michie, D. J. Spiegelhalter, C. Taylor et al., "Machine learning," Neural and Statistical Classification, vol. 13, 1994.
[17] T. O. Ayodele, "Types of machine learning algorithms," in New Advances in Machine Learning. InTech, 2010.
[18] F. Sebastiani, "Machine learning in automated text categorization," ACM Computing Surveys (CSUR), vol. 34, no. 1, pp. 1–47, 2002.
[19] T. Joachims, "Text categorization with support vector machines: Learning with many relevant features," 1998.
[20] T. R. Patil and S. S. Sherekar, "Performance analysis of naive bayes and j48 classification algorithm for data classification," pp. 256–261, 2013.
[21] M.-L. Zhang and Z.-H. Zhou, "ML-KNN: A lazy learning approach to multi-label learning," Pattern Recognition, vol. 40, no. 7, pp. 2038–2048, 2007.
[22] N. Suguna and K. Thanushkodi, "An improved k-nearest neighbor classification using genetic algorithm," International Journal of Computer Science Issues, vol. 7, no. 2, pp. 18–21, 2010.
[23] M. A. Friedl and C. E. Brodley, "Decision tree classification of land cover from remotely sensed data," Remote Sensing of Environment, vol. 61, no. 3, pp. 399–409, 1997.
[24] A. Liaw, M. Wiener et al., "Classification and regression by randomForest," R News, vol. 2, no. 3, pp. 18–22, 2002.
[25] D. G. Kleinbaum, K. Dietz, M. Gail, M. Klein, and M. Klein, Logistic Regression. Springer, 2002.
[26] G. G. Chowdhury, "Natural language processing," Annual Review of Information Science and Technology, vol. 37, no. 1, pp. 51–89, 2003.
[27] J. J. Webster and C. Kit, "Tokenization as the initial phase in NLP," in Proceedings of the 14th Conference on Computational Linguistics - Volume 4. Association for Computational Linguistics, 1992, pp. 1106–1110.
[28] C. Silva and B. Ribeiro, "The importance of stop word removal on recall values in text categorization," in Proceedings of the International Joint Conference on Neural Networks, vol. 3. IEEE, 2003, pp. 1661–1666.
[29] J. Plisson, N. Lavrac, D. Mladenić et al., "A rule based approach to word lemmatization," 2004.
[30] C. Lee and D. A. Landgrebe, "Feature extraction based on decision boundaries," IEEE Transactions on Pattern Analysis & Machine Intelligence, no. 4, pp. 388–400, 1993.
[31] N. Jindal and B. Liu, "Opinion spam and analysis," in Proceedings of the 2008 International Conference on Web Search and Data Mining. ACM, 2008.
[32] M. Hu and B. Liu, "Mining and summarizing customer reviews," 2004.
[33] R. Mihalcea, C. Corley, C. Strapparava et al., "Corpus-based and knowledge-based measures of text semantic similarity," in AAAI, vol. 6, 2006, pp. 775–780.
[34] J. Ramos et al., "Using tf-idf to determine word relevance in document queries," in Proceedings of the First Instructional Conference on Machine Learning, vol. 242, 2003, pp. 133–142.
[35] G. Fei, A. Mukherjee, B. Liu, M. Hsu, M. Castellanos, and R. Ghosh, "Exploiting burstiness in reviews for review spammer detection," in Seventh International AAAI Conference on Weblogs and Social Media, 2013.