Self-Attention GRU Networks For Fake Job Classification
Abstract:- This paper analyses the Employment Scam Aegean Dataset and compares various machine learning algorithms, including Logistic Regression, Decision Tree, Random Forest, XGBoost, K-Nearest Neighbor, Naïve Bayes and Support Vector Classifier, on the task of fake job classification. The paper also proposes two self-attention enhanced Gated Recurrent Unit networks, one with a vanilla RNN architecture and the other with a bidirectional architecture, for separating fake job postings from real ones. The proposed framework uses Gated Recurrent Units with a multi-head self-attention mechanism to enhance long-term retention within the network. In comparison to the other algorithms, the two GRU models proposed in this paper obtain better results.

Keywords:- Fake Job Classification; Text Classification; Gated Recurrent Unit; Recurrent Neural Networks.

I. INTRODUCTION

The 21st century world is the world of data. There has never been more data available to humans at once than now. Data is available in various formats: texts, audios, videos, images, graphs and more. There was a time when reaching people or accessing things was not easy, but with the advent of the internet everything has changed. People are one text or internet call (audio or video) away from each other irrespective of their geographical locations. Books, journals, news, recruitment notices: information regarding anything and everything used to be difficult to access, and again, with the internet, it has become far easier to reach such information. Within three decades of the arrival of the internet, we have moved from a time of not enough data to one of far too much data. With so much data available at once, we are at an advantage. However, just as there is some bane associated with every boon, this availability of too much data also has some hidden issues, especially when the validity of the data is not established. With the advent of social media platforms it has become very easy to share information obtained from these data with other people. However, this ease has brought a major issue with it: people can and do share information without verifying it. Information that is not verified can pose a real threat to the people who use it. For instance, a famous journalist in India believed she had been offered a position to teach at one of the top-ranked universities in the world. She quit her job to accept this teaching position. Only later did she learn that the job offer she had received was fake and that there was no teaching job for her, by which time she had already left her journalism job. This is just one instance of people falling into the trap of fake or unverified information.

A large amount of the data that we encounter is text based. Text data requires considering the semantic as well as the syntactic significance of words. With deep learning, Natural Language Processing (NLP) has reached great heights. It has empowered our machines to examine, comprehend and extract important context from written compositions. Nowadays, the Recurrent Neural Network (RNN) has emerged as a compelling alternative that has stood the test of time not just on one but on numerous text-based tasks.

Recurrent Neural Networks have been utilized for different applications such as text classification [1, 2, 3, 4], speech recognition [5], language translation [6], image captioning [7], and various others. In theory, vanilla Recurrent Neural Networks exhibit dynamic temporal behaviour on time series tasks. However, Hochreiter [8] and Bengio et al. [9] proved that vanilla Recurrent Neural Networks are prone to vanishing or exploding gradients. To overcome the vanishing gradient issue, Hochreiter and Schmidhuber proposed Long Short-Term Memory (LSTM) in their 1997 paper [10]. The LSTM combines three gates, namely the input, forget and output gates, which together address the gradient problem. A simplified adaptation of LSTM called the Gated Recurrent Unit (GRU) was proposed in 2014 by Cho et al. [11]. Both LSTM and GRU have been used in RNN architectures for various tasks and have produced many state-of-the-art results. Since the GRU has only two gates instead of the three used by the LSTM, GRUs are computationally faster than LSTMs.

The rest of this paper is structured as follows: Section 2 details the GRU cell and the use of GRU-based RNN architectures for text classification, and also describes the calculation of the self-attention weights. In Section 3, we give the details of our models. Section 4 includes the details of the dataset, the implementation, the results and the various observations that we have made based on the outcomes of our experiments. We conclude the paper in Section 5.

II. BACKGROUND

A. Recurrent Neural Networks for Text Classification
A recurrent neural network is a sequential network in which the output at each step is calculated as a function of the current input and the outputs obtained from the previous inputs. With the recent progress in text classification using RNNs, recurrent networks are being utilized for an assortment of tasks. Irsoy et al. [12], in 2014, used RNNs for opinion mining. Pollastri et al. [13], in 2002, used RNNs for estimating protein secondary structure. Tang et al. [14] did
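For reference, one common formulation of the GRU cell mentioned above, with its two gates, is given below. The notation is the standard convention from the literature rather than notation copied from this paper, and the roles of z_t and (1 - z_t) are swapped in some sources:

    z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z)
    r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r)
    \tilde{h}_t = \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h)
    h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t

Here \sigma is the logistic sigmoid and \odot denotes element-wise multiplication. The update gate z_t controls how much of the previous hidden state is carried forward, while the reset gate r_t controls how much of it is used when forming the candidate state, which is how the GRU achieves LSTM-like gating with only two gates.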
IV. EXPERIMENTS
A. Dataset
We have conducted the training and testing of our models using the Employment Scam Aegean Dataset (EMSCAD) [6], a publicly available dataset containing 17,880 real-life job advertisements that aims at providing a clear picture of the Employment Scam problem to the research community. EMSCAD records were annotated by hand and classified into two categories. The dataset contains 17,014 legitimate job advertisements and 866 fraudulent job advertisements. These advertisements were published between 2012 and 2014.

The dataset was divided randomly into training, validation and testing sets, with 60% of the real and fake records used for training, 20% for validation and 20% for testing.

B. Implementation
The training data is first preprocessed in order to prepare it for training the models. The strings are encoded into utf-8 unicode and all words are converted to lowercase. The Porter stemmer is applied to the whole dataset to remove the common morphological and inflexional endings from the words. Several features from the original dataset are removed: job id is removed since all of its values are unique, and we have further removed the records with a missing value in the description column. We then prepare two variations of this dataset.

Fig. 3. Heatmap for null values in the dataset.

The GRUSA model, for the first variation of the dataset, uses 1024 GRU cells followed by the self-attention layer to improve learning over longer lengths. It is followed by a dense hidden layer of 2048 cells with the sigmoid activation function, which is further followed by the output layer, which uses softmax as the activation function. The same configuration is used for the BGRUSA model as well. For the second variation of the data, we have used 2608 GRU cells followed by the self-attention layer. The dense layer for this variation has 4096 cells with the sigmoid activation function, further followed by the output layer with the softmax function. Again, the same configuration is used for the BGRUSA model as well.

To compare the learning of our models, we have also implemented other machine learning algorithms. These algorithms include Logistic Regression, Decision Tree, Random Forest, XGBoost, K-Nearest Neighbor, Naïve Bayes and Support Vector Classifier. Besides these, we have also implemented the base GRU and Bidirectional GRU models without self-attention. We have used accuracy, precision, recall and F1 score to evaluate the learning of all the models.
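As an illustration only (the authors' code is not included in the paper), the following Python sketch shows how the preprocessing steps and the first GRUSA configuration described above could be assembled with pandas, NLTK and TensorFlow/Keras. The file name, column names, vocabulary size, sequence length, embedding size, number of attention heads, pooling step and optimizer are assumptions made for the sketch, not details taken from the paper.

    import pandas as pd
    from nltk.stem import PorterStemmer
    from tensorflow.keras import layers, models

    # --- Preprocessing (mirrors Section IV.B; file and column names are assumed) ---
    df = pd.read_csv("emscad.csv")              # hypothetical file name
    df = df.drop(columns=["job_id"])            # job id carries only unique values
    df = df.dropna(subset=["description"])      # drop records with a missing description

    stemmer = PorterStemmer()
    def clean(text):
        # utf-8 encoding, lowercasing and Porter stemming, as described above
        text = str(text).encode("utf-8", errors="ignore").decode("utf-8").lower()
        return " ".join(stemmer.stem(tok) for tok in text.split())

    df["description"] = df["description"].apply(clean)

    # --- GRUSA model, first variation (hyper-parameters marked as assumptions) ---
    VOCAB_SIZE, SEQ_LEN, EMBED_DIM = 20000, 300, 128   # assumptions, not from the paper

    inputs = layers.Input(shape=(SEQ_LEN,), dtype="int32")
    x = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(inputs)
    x = layers.GRU(1024, return_sequences=True)(x)     # 1024 GRU cells (per the paper)
    # Multi-head self-attention over the GRU outputs (query = key = value);
    # the number of heads and key_dim are assumptions.
    x = layers.MultiHeadAttention(num_heads=8, key_dim=64)(x, x)
    x = layers.GlobalAveragePooling1D()(x)             # pooling step assumed
    x = layers.Dense(2048, activation="sigmoid")(x)    # dense hidden layer (per the paper)
    outputs = layers.Dense(2, activation="softmax")(x) # softmax output (per the paper)

    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    # After fitting, precision, recall and F1 on the test split could be obtained with
    # sklearn.metrics.classification_report(y_test, model.predict(X_test).argmax(axis=1)).

For the BGRUSA variant, the GRU layer would be wrapped in layers.Bidirectional, and for the second variation of the data the sketch would use 2608 GRU cells and a 4096-unit dense layer, as stated above.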