Ijaret 11 12 313

EMPLOYEE ATTRITION PREDICTION IN INDUSTRY USING MACHINE LEARNING TECHNIQUES
Article in INTERNATIONAL JOURNAL OF ADVANCED RESEARCH IN ENGINEERING & TECHNOLOGY · December 2020

DOI: 10.34218/IJARET.11.12.2020.313
2 authors, including:
Dr. GOPINATH. R.
Bharat Sanchar Nigam Ltd.
All content following this page was uploaded by Dr. GOPINATH. R. on 16 November 2021.

International Journal of Advanced Research in Engineering and Technology (IJARET)
Volume 11, Issue 12, December 2020, pp. 3329-3341, Article ID: IJARET_11_12_313
Available online at https://iaeme.com/Home/issue/IJARET?Volume=11&Issue=12
Journal Impact Factor (2020): 10.9475 (Calculated by GISI) www.jifactor.com
ISSN Print: 0976-6480 and ISSN Online: 0976-6499
DOI: https://doi.org/10.34218/IJARET.11.12.2020.313
© IAEME Publication Scopus Indexed
EMPLOYEE ATTRITION PREDICTION IN INDUSTRY USING MACHINE LEARNING TECHNIQUES
Dr. M. Subhashini¹ and Dr. R. Gopinath²
¹ Assistant Professor in Department of Computer Science,
Srimad Andavan Arts and Science College (Autonomous)
(Affiliated to Bharathidasan University), Tiruchirappalli, Tamil Nadu, India
² D.Litt. (Business Administration) - Researcher, Madurai Kamaraj University,
Madurai, Tamil Nadu, India
ABSTRACT
Companies are always looking for ways to keep their professional personnel on
board in order to save money on hiring and training. Predicting whether or not a
specific employee would depart will assist the organisation in making proactive
decisions. Human resource problems, unlike physical systems, cannot be defined by a
scientific-analytical formula. As a result, machine learning approaches are the most
effective instruments for achieving this goal. In this study, a feature selection strategy
based on a Machine Learning Classifier is proposed to improve classification accuracy,
precision, and True Positive Rate while lowering error rates such as False Positive Rate
and Miss Rate. Different feature selection techniques, such as Information Gain, Gain
Ratio, Chi-Square, Correlation-based, and Fisher Exact test, are analysed with six
Machine Learning classifiers, such as Artificial Neural Network, Support Vector
Machine, Gradient Boosting Tree, Bagging, Random Forest, and Decision Tree, for the
proposed approach. In this study, combining Chi-Square feature selection with a
Gradient Boosting Tree classifier improves employee attrition classification accuracy
while lowering error rates.
Key words: Feature Selection, Employee Attrition, Classification, Error Rates,
Accuracy.
Cite this Article: M. Subhashini and R. Gopinath, Employee Attrition Prediction in
Industry Using Machine Learning Techniques, International Journal of Advanced
Research in Engineering and Technology, 11(12), 2020, pp. 3329-3341.
https://iaeme.com/Home/issue/IJARET?Volume=11&Issue=12
1. INTRODUCTION
Employee attrition is also known as employee turnover. Attrition is a common problem, and it
is increasingly prevalent in today's industry. In the vast majority of organisations, it is
one of the most important difficulties [1]. Employee attrition refers to the gradual reduction
in the number of employees due to retirement, resignation, or death. Attrition rates vary
widely from industry to industry, and they can also differ between skilled and unskilled
positions. Organisations face the daunting task of recruitment and talent retention, while also
dealing with the loss of talent through continual attrition, whether due to industry downturns
or voluntary individual turnover [2]. When a well-trained and well-adjusted employee quits the
company, a void is created. As a result, the organisation loses important skills, knowledge,
and business relationships [29]. Managers and personnel executives are therefore keenly
interested in reducing attrition in a way that contributes to the organisation's maximum
effective growth and progress. Employee retention is, in any case, a concern for every business
organisation. If the situation is not handled properly, the departure of key personnel can
result in a significant loss of earnings. Employee turnover results in performance losses,
which can have a long-term negative influence on businesses [3][4]. With the reduction of
attrition a significant concern for every industry, businesses endeavour to use innovative
business methods to improve retention [5]. Although there is no way to eliminate attrition
completely, it can be reduced by implementing appropriate solutions. It also helps when a
manager can estimate the rate of employee turnover ahead of time [30].
Machine learning can be used to classify labelled data or to uncover hidden structure in
unlabelled data. Top-level management can use the ability of machine learning algorithms to
predict the likelihood that a person will leave the organisation [6][7]. This helps in
controlling the factors that cause attrition and in preventing attrition itself. Employee
turnover is a significant issue for employers, and every organisation's ability to retain
talent is critical. If management can obtain the predicted likelihood of an employee's
separation, as well as the variables driving it, it can take actions that reduce attrition
risk [8][9]. This is where machine learning comes in handy: based on the forecasts made by
machine learning algorithms, top management can take proactive steps to retain personnel [10].
2. RELATED WORKS
Tang, Ziyuan, Gautam Srivastava, and Shuai Liu [11] proposed a strategy, based on accounting
market big data (AMBD), for selecting accounting models for small and medium-sized enterprises
(SMEs). First, indicators of a company's solvency, operating capacity, profitability, and
growth capacity are chosen, such as the current ratio, quick ratio, asset-liability ratio, and
accounts receivable turnover rate. Next, the AMBD constraints are classified using principal
component analysis. Finally, the optimal accounting model is determined iteratively by
combining particle swarm optimization with ant colony optimization.
Marichelvam, M. K., M. Geetha, and Ömür Tosun [12] took the effect of human factors into
account when solving the multi-stage hybrid flow shop scheduling problem with identical
parallel machines at each stage. The objective function is to minimise the weighted sum of the
makespan and total flow time. Because the problem is NP-hard, the authors propose an improved
version of the particle swarm optimization (PSO) algorithm to solve it. A dispatching rule and
a constructive heuristic are used to improve the PSO algorithm's initial solutions, and the
variable neighbourhood search (VNS) algorithm is combined with the PSO algorithm to provide the
best results in the shortest amount of time.
Jhaver, Mehul, Yogesh Gupta, and Amit Kumar Mishra [13] investigated the use of the Gradient
Boosting technique, which is more robust owing to its regularisation formulation. Using data
from a global retailer, Gradient Boosting is compared with three commonly used supervised
classifiers, namely Logistic Regression, Support Vector Machine, and Random Forest, and is
shown to have higher accuracy for predicting employee turnover.
Machado, Marcos Roberto, Salma Karray, and Ivaldo Tributino de Sousa [14] showcased a
financial company's deployment of a Machine Learning model to predict customer loyalty. The
researchers assessed the accuracy of two Gradient Boosting Decision Tree models: XGBoost and
the LightGBM algorithm, which had not previously been used to predict customer loyalty.
Keshri, Rajat, and P. Srividya [15] explored the LightGBM algorithm, which Microsoft published
in 2017. The authors used their dataset to compare LightGBM with other known classification
algorithms and to demonstrate LightGBM's excellent prediction accuracy.
Padmasini, Ms, and K. Shyamala [16] presented a model that clusters Original Equipment
Customers, Distributors, and Dealers in order to determine what consumers think about the
products and the firm, based on pricing, quality, and delivery logistics, before they decide
to buy an automobile product and become loyal clients. The proposed Integrated Gower based
PSO-KMode model uses the Gower dissimilarity measure and optimises the data using Particle
Swarm Optimization with the K-Modes clustering method in order to assess existing customers
through a survey.
Eitle, Verena, and Peter Buxmann [17] proposed a model to assist software sales reps in
managing the complicated sales funnel. Data-driven qualification assistance decreases the high
degree of arbitrariness produced by professional expertise and experiences by incorporating
business analytics in the form of machine learning into lead and opportunity management.
Using real business data from the company's CRM system, the authors created an artefact
consisting of three models to map the end-to-end sales pipeline.
3. PROPOSED FRAMEWORK FOR EMPLOYEE ATTRITION PREDICTION
3.1. Artificial Neural Network Classifier
The artificial neural network is modelled on the biological neuron and is used here for
prediction [18]. Consider first how a single neuron computes its output. Figure 1 depicts a
single neuron with a single input, which is defined by the equation
$$O = \sigma(\xi\,\omega)$$
where O is the output, σ is the sigmoid function, ξ is the neuron's input, and ω is the weight
that connects the input to the neuron.
Figure 1 Single input neuron

When a neuron has many inputs, as shown in Figure 2, each input is weighted and connected to
the neuron. Layers of such neurons, each taking many weighted inputs, form a multilayer
perceptron (MLP). Figure 2 depicts a multilayer perceptron.

Figure 2 Multilayer perceptron

$$O = \sigma(\xi_1\omega_1 + \xi_2\omega_2 + \dots + \xi_k\omega_k + \Theta)$$
where O is the output,
σ is the sigmoid function or transfer function,
ξ_i is the i-th input to the neuron,
ω_i is the weight of input i (i = 1 to k), and
Θ is the bias.
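The neuron equations can be illustrated with a minimal Python sketch. It assumes, as is conventional, that the bias Θ is added to the weighted sum before the sigmoid is applied; the function names and the input values are illustrative and are not taken from the paper.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron_output(inputs, weights, bias=0.0):
    """Output of one neuron: sigmoid of the weighted input sum plus the bias."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return sigmoid(weighted_sum + bias)


# Single-input neuron, O = sigma(xi * omega), with made-up values.
single = neuron_output([0.5], [2.0])

# Neuron with k = 3 inputs and a bias term (values are made up).
multi = neuron_output([1.0, 0.0, -1.0], [0.4, 0.3, 0.4], bias=0.0)
```

A multilayer perceptron would apply `neuron_output` to every neuron of a layer and feed the resulting outputs to the next layer as inputs.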
3.2. Support Vector Machine Classifier

A support vector machine (SVM) is a classifier defined by a separating hyperplane. The output
of the approach is an optimal hyperplane that categorises new instances. In a two-dimensional
space, this hyperplane is a line that divides the plane into two halves, with each class on
one side. SVM produces good outcomes when dealing with more complex classification problems.
Each data element is represented as a point in n-dimensional space, with the value of each
feature being the value of one coordinate. The SVM [19] is a highly effective classifier that
separates the two classes.
Figure 3 Support Vector Machine Hyper plane classifying two classes
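As a rough illustration of how a separating hyperplane classifies a point, the sketch below evaluates the sign of w·x + b. The weights and bias are hand-picked assumptions for a two-dimensional example; a real SVM would learn them by maximising the margin, which this sketch does not attempt.

```python
def hyperplane_side(weights, bias, point):
    """Return +1 or -1 depending on which side of the hyperplane w.x + b = 0
    the point lies on."""
    score = sum(w * x for w, x in zip(weights, point)) + bias
    return 1 if score >= 0 else -1


# Hypothetical hyperplane x1 - x2 = 0, i.e. the diagonal of the plane.
weights, bias = (1.0, -1.0), 0.0

below_diagonal = hyperplane_side(weights, bias, (3.0, 1.0))
above_diagonal = hyperplane_side(weights, bias, (1.0, 3.0))
```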

3.3. Gradient Boosting Classifier

Gradient boosting (GB) builds an ensemble of weak models sequentially, with each new model
fitted so as to reduce the loss function. The loss function is minimised by gradient descent.
Every new model therefore fits the observations better and improves overall accuracy.
Boosting, on the other hand, must be stopped at some point; otherwise, the model tends to
overfit. The stop criterion could be a certain level of accuracy or a certain number of
models.
GBDT [20] is an ensemble model of sequentially trained decision trees. In each iteration,
GBDT fits a decision tree to the negative gradients (also called residual errors). Learning
the decision trees is the most time-consuming part of GBDT, and finding the best split points
is the most time-consuming part of learning a decision tree. The pre-sort algorithm, which
enumerates all possible split points on pre-sorted feature values, is one of the most widely
used algorithms for finding split points. This method is simple and can find the optimal
splits; however, it is inefficient in terms of training speed and memory usage.
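The boosting loop can be sketched in a few lines of plain Python. This is a simplified illustration and not the authors' implementation: it assumes a single numeric feature, squared loss (for which the negative gradient is simply the residual), one-split trees (stumps) as the weak models, and a fixed number of rounds as the stop criterion. All names and data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the threshold on one feature that minimises squared error.

    Candidate split points are enumerated over the sorted feature values,
    in the spirit of the pre-sort approach.
    """
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)


def stump_predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Fit an additive model of stumps to (xs, ys) under squared loss."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):  # stop criterion: a fixed number of models
        # For squared loss, the negative gradient is the residual y - F(x).
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump_predict(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, stumps, learning_rate


def boosted_predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


# Toy data: a hypothetical risk score that jumps after four overtime hours.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
risk = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
model = gradient_boost(hours, risk)
```

Each round shrinks the remaining residual, so `boosted_predict(model, 2)` moves towards 0 and `boosted_predict(model, 7)` towards 1 as more stumps are added.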
3.4. Bagging Classifier

Bagging methods are a powerful category of algorithms that build many instances of a black-box
estimator on random subsets of the original data set and then aggregate their individual
predictions to form the final prediction. Bagging methods deliberately introduce randomisation
into the construction of the ensemble in order to reduce the variance of the base estimator.
Suppose you have a base learner, for example a decision tree. You can improve its accuracy and
reduce its variance with the bootstrap technique:
• Using the following scheme, generate several samples of your data set, each of which
serves as a training set: repeatedly draw a random element from your training set and
then put it back (sampling with replacement). As a result, some of the training elements
appear several times in a new sample, while others are absent. Each sample must be
identical in size to the original training set.
• Train your learner on each generated sample.
• To make a prediction, average the learners' outputs if they are regressors, or take a
vote if they are classifiers.
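The three steps above can be sketched as follows. The base learner here is a toy one-nearest-neighbour rule chosen only to keep the example short (the text suggests a decision tree), and the data set, seed, and function names are assumptions made for illustration.

```python
import random
from collections import Counter


def bootstrap_sample(dataset, rng):
    """Draw len(dataset) items with replacement: some repeat, some are left out."""
    return [rng.choice(dataset) for _ in dataset]


def train_nearest_neighbour(sample):
    """Toy base learner (illustrative only): predict the label of the closest point."""
    def predict(x):
        nearest = min(sample, key=lambda item: abs(item[0] - x))
        return nearest[1]
    return predict


def bagging_fit(dataset, n_learners=25, seed=0):
    """Train one base learner per bootstrap sample."""
    rng = random.Random(seed)
    return [train_nearest_neighbour(bootstrap_sample(dataset, rng))
            for _ in range(n_learners)]


def bagging_classify(learners, x):
    """Aggregate the learners' predictions by majority vote."""
    votes = Counter(learner(x) for learner in learners)
    return votes.most_common(1)[0][0]


# Hypothetical data: (years at company, outcome).
data = [(1, "leave"), (2, "leave"), (3, "leave"),
        (8, "stay"), (9, "stay"), (10, "stay")]
learners = bagging_fit(data)
prediction = bagging_classify(learners, 1.5)
```

For regression, the vote in `bagging_classify` would be replaced by the mean of the learners' numeric outputs.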
3.5. Random Forest Classifier

Random Forest is a machine learning technique used for classification and regression. It is
an ensemble technique that combines a number of weak models to create a powerful model. The
random forest generates several trees. To classify a new instance, each tree gives a
classification, that is, it votes for a class. The forest chooses the classification with the
most votes. Figure 4 depicts the random forest prediction process [22].
• Take the test features, use each tree to predict an outcome, and store the predicted
outcomes.
• Count the votes for each predicted outcome.
• Take the prediction with the most votes as the final prediction.

Figure 4 Prediction process taken by random forest
3.6. Decision Tree Classifier

The decision tree algorithm [23] is a supervised learning method for classification and
regression problems. Its main purpose is to create a training model that can be used to
predict employee attrition from decision rules learned on past data. It tackles the problem
through a hierarchy of nodes. Three types of node are present:
• Root
• Internal Nodes
• Leaf Nodes
The root node represents the complete sample, which is then split through internal nodes,
each of which tests an attribute, down to leaf nodes that represent the class labels.
Figure 5 Representation of the Decision Tree
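A decision tree of this kind can be represented as nested dictionaries and traversed from the root to a leaf. The tree below is hand-written for illustration; the attributes and split choices are assumptions and are not learned from the dataset.

```python
# Hypothetical tree: internal nodes are dicts, leaf nodes are class labels.
tree = {
    "attribute": "OverTime",                    # root node
    "branches": {
        "Yes": {                                # internal node
            "attribute": "JobSatisfaction",
            "branches": {"Low": "Leave", "High": "Stay"},  # leaf nodes
        },
        "No": "Stay",                           # leaf node
    },
}


def classify(node, record):
    """Follow the branch matching the record's attribute value until a leaf."""
    while isinstance(node, dict):
        node = node["branches"][record[node["attribute"]]]
    return node


label = classify(tree, {"OverTime": "Yes", "JobSatisfaction": "Low"})
```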
3.7. Proposed Research Methodology with Feature Selection Technique

In this paper, a feature selection approach based on Chi-Square is proposed in order to
predict employee attrition using machine learning algorithms. Chi-Square is compared with
other feature selection approaches for the classification of employee attrition. Feature
selection methods such as correlation-based feature selection, information gain, the gain
ratio, Chi-Square-based feature selection, and Fisher's Exact Test are used to predict
employee attrition.

Figure 6 Proposed Research Methodology for Predicting Employee Attrition in Industry
3.7.1. Correlation based Feature Selection

In this algorithm, the correlation between every feature and the output class is determined,
and a heuristic correlation-based evaluation function is used to choose the best feature
subset [24]. It examines the correlation between nominal and categorical features, with
numerical features first being converted to discrete values. The merit of a feature subset is
given by
$$r_{zc} = \frac{K\,\bar{r}_{zi}}{\sqrt{K + K(K-1)\,\bar{r}_{ii}}}$$
where r_zc denotes the correlation between the feature subset and the class variable, K is
the number of features, r̄_zi is the mean feature-class correlation, and r̄_ii is the mean
inter-correlation between features.
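A small sketch of the correlation-based merit follows. It assumes the standard CFS heuristic, K·r̄_zi / sqrt(K + K(K−1)·r̄_ii), and takes the two mean correlations as given numbers; computing them from data is outside the scope of this illustration.

```python
import math


def cfs_merit(k, mean_feature_class_corr, mean_feature_feature_corr):
    """Merit of a subset of k features under the standard CFS heuristic."""
    numerator = k * mean_feature_class_corr
    denominator = math.sqrt(k + k * (k - 1) * mean_feature_feature_corr)
    return numerator / denominator


# Four features that correlate with the class but not with each other...
independent = cfs_merit(4, 0.5, 0.0)
# ...versus four features that are fully redundant with one another.
redundant = cfs_merit(4, 0.5, 1.0)
```

The redundant subset scores lower, which is how the heuristic rewards subsets whose features are predictive of the class yet uncorrelated with each other.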
3.7.2. Information Gain Feature Selection

Information gain measures the amount of information that a feature contributes to class
prediction. It quantifies how much information about the class attribute is gained by
splitting the data on a given attribute, and it is frequently used as an index for evaluating
attributes. In practice, the entropy of every attribute is calculated and the attributes are
ranked by information gain [25]. It is a supervised, univariate, simple, robust, symmetrical,
and entropy-based feature selection approach. For two variables X and Y, the information gain
is given by
$$IG(X \mid Y) = H(X) - H(X \mid Y)$$
where H(X) and H(X|Y) are the entropy of X and the conditional entropy of X given Y. The
entropy of X is computed as
$$H(X) = -\sum_{i} P(x_i)\,\log_2 P(x_i)$$
and the conditional entropy of X given Y as
$$H(X \mid Y) = -\sum_{j} P(y_j) \sum_{i} P(x_i \mid y_j)\,\log_2 P(x_i \mid y_j)$$
This strategy evaluates each feature independently and selects the top 'm' features, i.e. the
features with the highest information gain are treated as the most relevant. The fundamental
disadvantage of this technique is that it favours attributes with a high information gain
even though they are not necessarily more informative. Because each feature is evaluated on
its own, information gain is also unable to handle redundant features.
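Entropy and information gain for categorical variables can be computed as in the sketch below, which uses the standard definitions IG = H(X) − H(X|Y) with base-2 logarithms. The column values are made up for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy H(X), in bits, of a list of categorical values."""
    total = len(values)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(values).values())


def conditional_entropy(xs, ys):
    """H(X|Y): entropy of xs within each group of equal ys, weighted by group size."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    total = len(xs)
    return sum(len(group) / total * entropy(group) for group in groups.values())


def information_gain(xs, ys):
    """IG(X|Y) = H(X) - H(X|Y)."""
    return entropy(xs) - conditional_entropy(xs, ys)


# Hypothetical columns for four employees.
attrition = ["yes", "yes", "no", "no"]
overtime = ["yes", "yes", "no", "no"]   # perfectly aligned with attrition
gender = ["m", "f", "m", "f"]           # unrelated to attrition

gain_overtime = information_gain(attrition, overtime)
gain_gender = information_gain(attrition, gender)
```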
3.7.3. Gain Ratio Feature Selection

Gain Ratio is a modified version of information gain. It takes into account the child nodes
into which an attribute splits the class data, which reduces the bias of information gain
towards attributes with a large number of possible values [26]. The gain ratio is given by
$$\text{Gain Ratio} = \frac{\text{Information Gain}}{H(X)}$$
That is, when predicting variable Y, we normalise the information gain by dividing it by the
entropy of X, and vice versa. As a result of this normalisation, the gain ratio takes a value
between 0 and 1. A gain ratio of 1 signifies that knowledge of X completely predicts Y, and a
gain ratio of 0 signifies that there is no relationship between Y and X. Compared with
information gain, it favours variables with fewer values.
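The normalisation can be sketched as follows: the information gain between a feature and the target is divided by the entropy of the feature. The example columns are hypothetical and were chosen to show how a many-valued attribute, such as an employee identifier, is penalised.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of a list of categorical values."""
    total = len(values)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(values).values())


def conditional_entropy(xs, ys):
    """Entropy of xs within each group of equal ys, weighted by group size."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    total = len(xs)
    return sum(len(group) / total * entropy(group) for group in groups.values())


def gain_ratio(feature, target):
    """Information gain about the target, divided by the feature's entropy."""
    feature_entropy = entropy(feature)
    if feature_entropy == 0:
        return 0.0  # a constant feature carries no information
    gain = entropy(target) - conditional_entropy(target, feature)
    return gain / feature_entropy


attrition = ["yes", "yes", "no", "no"]
overtime = ["yes", "yes", "no", "no"]   # two values, predicts attrition exactly
employee_id = [1, 2, 3, 4]              # four values, also "predicts" exactly

ratio_overtime = gain_ratio(overtime, attrition)
ratio_employee_id = gain_ratio(employee_id, attrition)
```

Both features have the same information gain here, but the identifier's higher entropy halves its gain ratio.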
3.7.4. Chi-Square Feature Selection Method

Chi-Square feature selection is based on two quantities, the observed and the expected
frequency [27]. It also yields a weight for each attribute; the attributes with the greatest
weights are the relevant ones. The method tests each predictor variable against the class
label, and the predictors are chosen accordingly. For an attribute with 'r' values and 'c'
classes, the statistic is defined as
$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
where O_ij is the number of occurrences of value 'i' in class 'j', and E_ij is the expected
number of occurrences of value 'i' in class 'j'.
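The Chi-Square statistic for one attribute can be computed from its contingency table of observed counts, as in the sketch below. It assumes the usual definition of the expected count, E_ij = (row total × column total) / grand total, and that no expected count is zero; the counts themselves are made up.

```python
def chi_square(observed):
    """Chi-Square statistic of an r x c table of observed counts.

    Rows are attribute values and columns are classes.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    return statistic


# Hypothetical counts: rows = overtime (yes, no), columns = attrition (yes, no).
dependent = chi_square([[30, 10], [10, 30]])
independent = chi_square([[20, 20], [20, 20]])
```

Attributes would then be ranked by this statistic, with higher values indicating a stronger association with the class label.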

3.7.5. Fisher’s Exact Test

Using the well-known Fisher ratio and a heuristic approach to assign a score to each
attribute, the Fisher score can be used to choose the suitable attributes [27]. The benefits
are:
• Identifies the features relevant to a particular problem.
• Reduces the problem size and the storage required.
• Reduces computation time and can also increase prediction accuracy.
• Strengthens the classification by eliminating irrelevant features and noise.
4. RESULT AND DISCUSSION

4.1. Description of the Dataset
The employee attrition dataset is considered to evaluate the performance of the proposed
framework with various feature selection and classifiers. The employee attrition dataset is taken
from Kaggle repository [28]. The following table 1 represents the dataset used in this research
work.
Table 1 IBM Employee Attrition Dataset

Sl.No Feature Name
1 Age
2 Attrition
3 Business Travel
4 Daily Rate
5 Department
6 Distance From Home
7 Education
8 Education Field
9 Employee Count
10 Employee Number
11 Environment Satisfaction
12 Gender
13 Hourly Rate
14 Job Involvement
15 Job Level
16 Job Role
17 Job Satisfaction
18 Marital Status
19 Monthly Income
20 Monthly Rate
21 Number of Companies worked
22 Over18
23 Overtime
24 Percentage Salary Hike
25 Performance Rating
26 Relationship Satisfaction
27 Standard Hours
28 Stock Option Level
29 Total Working Years
30 Training time last year
31 Work life balance
32 Years at Company
33 Years in current role
34 Years since last promotion
35 Years with current manager

4.2. Number of Features obtained by Feature Selection Methods

Table 2 gives the number of features obtained by applying the feature selection techniques,
namely Correlation based Feature Selection (CFS), Information Gain (IG), Gain Ratio (GR),
Chi-Square, and Fisher Exact Test. From Table 2, it is clear that the Chi-Square Feature
Selection method yields the smallest number of features of all the feature selection methods.
Table 2 Number of Features obtained by Feature Selection Methods

Feature Selection Method                      Number of Features obtained
Original Dataset                              34
Correlation based Feature Selection Method    32
Information Gain                              29
Gain Ratio                                    27
Chi-Square                                    24
Fisher Exact Test                             28
4.3. Performance Analysis of the Feature Selection Methods

The performance metrics Classification Accuracy, True Positive Rate (TPR), Precision, False
Positive Rate (FPR), and Miss Rate are considered in this paper to evaluate the performance of
the feature selection methods in the prediction of employee attrition in industry using
different Machine Learning classifiers.
Table 3 gives the classification accuracy obtained by the feature selection methods using
various classifiers. Table 4 depicts the True Positive Rate (in %) obtained by Feature Selection
Methods using different classifiers. Table 5 presents the precision (in %) obtained by the feature
selection methods using various classifiers. Table 6 depicts the False Positive Rate (in %)
obtained by the feature selection methods using various classifiers. Table 7 gives the miss rate
(in %) obtained by the feature selection methods using various classifiers.
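For reference, the five metrics can be derived from the entries of a binary confusion matrix as sketched below, using their standard definitions. The counts are invented for illustration and do not come from the paper's experiments.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard metrics, in percent, from binary confusion-matrix counts."""
    return {
        "accuracy": 100.0 * (tp + tn) / (tp + fp + tn + fn),
        "tpr": 100.0 * tp / (tp + fn),        # true positive rate (recall)
        "precision": 100.0 * tp / (tp + fp),
        "fpr": 100.0 * fp / (fp + tn),        # false positive rate
        "miss_rate": 100.0 * fn / (tp + fn),  # false negative rate
    }


# Hypothetical counts for a classifier evaluated on 200 employees.
metrics = classification_metrics(tp=80, fp=10, tn=90, fn=20)
```

By these definitions the TPR and the miss rate always sum to 100%, which is consistent with the corresponding rows of Tables 4 and 7.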
Table 3 Classification accuracy (in %) obtained by Feature Selection Methods using different classifiers
Feature Selection Method   ANN     SVM     GBT     Bagging   RF      DT
Original Dataset           44.27   45.31   49.21   43.98     41.65   42.86
Correlation based FS       68.51   68.76   69.73   67.31     66.82   65.41
Information Gain           64.33   65.74   69.81   62.23     63.71   65.43
Gain Ratio                 64.43   64.74   66.64   64.34     61.82   62.77
Chi-Square                 89.44   85.75   89.15   79.81     80.53   81.22
Fisher’s Exact             72.65   73.21   73.76   68.92     69.16   70.88
Table 4 True Positive Rate (in %) obtained by Feature Selection Methods using different classifiers
Feature Selection Method   ANN     SVM     GBT     Bagging   RF      DT
Gain Ratio                 65.46   65.66   67.37   63.42     61.78   62.87
Chi-Square                 88.54   87.72   89.22   86.34     85.62   81.11
Fisher’s Exact             68.26   66.54   64.57   66.91     65.17   67.49

Table 5 Precision (in %) obtained by Feature Selection Methods using different classifiers
Feature Selection Method   ANN     SVM     GBT     Bagging   RF      DT
Gain Ratio                 57.68   57.52   59.52   54.43     52.16   58.65
Chi-Square                 83.18   85.24   85.42   81.53     79.72   78.18
Fisher’s Exact             55.85   56.21   61.28   65.89     64.25   55.32

Table 6 False Positive Rate (in %) obtained by Feature Selection Methods using different classifiers
Feature Selection Method   ANN     SVM     GBT     Bagging   RF      DT
Gain Ratio                 35.61   34.63   27.43   33.31     34.34   35.48
Chi-Square                 7.62    7.84    6.73    10.25     17.33   20.09
Fisher’s Exact             31.97   31.77   23.14   31.56     33.13   32.88

Table 7 Miss Rate (in %) obtained by Feature Selection Methods using different classifiers
Feature Selection Method   ANN     SVM     GBT     Bagging   RF      DT
Gain Ratio                 34.54   34.34   32.63   36.58     38.22   37.13
Chi-Square                 11.46   12.28   10.78   13.66     14.38   18.89
Fisher’s Exact             31.74   33.46   35.43   33.09     34.83   32.51
From Tables 3 to 7, it is clear that Chi-Square Feature Selection with the Gradient Boosting
Tree (GBT) classifier gives the best overall results for predicting employee attrition in
industry. It attains the highest TPR (89.22%) and Precision (85.42%) and the lowest FPR
(6.73%) and Miss Rate (10.78%), with a classification accuracy (89.15%) close to the highest
value in Table 3 (89.44%, obtained with ANN).
5. CONCLUSION
Employee attrition prediction has become a key issue in today's organisations. Employee
attrition is a major problem for businesses, especially when trained, technical, and critical
staff leave for better opportunities elsewhere. This results in a financial loss, as a trained
employee must be replaced. Feature selection is critical in the pre-processing stage of data
mining, because several machine learning approaches struggle to manage large volumes of
irrelevant features. Various feature selection strategies are used in this research article to
improve the accuracy of employee attrition prediction in the industry. The performance of six
classifiers, namely ANN, SVM, GBT, Bagging, RF, and DT, is tested for the prediction of
employee attrition using the feature selection approaches Information Gain, Gain Ratio,
Chi-Square, Correlation based, and Fisher's Exact. As evidenced by the results, Chi-Square
Feature Selection with the Gradient Boosting Tree classifier performs better in the prediction
of employee attrition than the other feature selection techniques with the other classifiers.

REFERENCES
[1] Gopinath, R. "Impact of Stress Management by development of Emotional Intelligence in
CMTS, BSNL, Tamilnadu Circle-A Study." International Journal of Management Research
and Development (IJMRD) 4.1 (2014).
[2] Gopinath, R. "Prominence of Self-Actualization in Organization." International Journal of
Advanced Science and Technology 29.3 (2020): 11591-11602.
[3] Gopinath, R. (2019). Quality of Work Life (QWL) among the Employees of LIC, International
Journal of Scientific Research and Review, 8(5), 373-377.
[4] Gopinath, R. "Relationship Between Knowledge Management and Human Resource
Development–A Study on Telecommunication Industry." Suraj Punj Journal for
Multidisciplinary Research 9.5 (2019): 477-480.
[5] Gopinath, R., & Kalpana, R. (2020). Relationship of Job Involvement with Job Satisfaction.
Adalya Journal, 9 (7), 306-315.
[6] Fan, Chin-Yuan, et al. "Using hybrid data mining and machine learning clustering analysis to
predict the turnover rate for technology professionals." Expert Systems with Applications 39.10
(2012): 8844-8851.
[7] Samuel, Michael O., and Crispen Chipunza. "Employee retention and turnover: Using
motivational variables as a panacea." African journal of business management 3.9 (2009): 410-
415.
[8] Samuel, Michael O., and Crispen Chipunza. "Employee retention and turnover: Using
motivational variables as a panacea." African journal of business management 3.9 (2009): 410-
415.
[9] Glebbeek, Arie C., and Erik H. Bax. "Is high employee turnover really harmful? An empirical
test using company records." Academy of management journal 47.2 (2004): 277-286.
[10] Allen, David G. Retaining talent: A guide to analysing and managing employee turnover. SHRM
Foundations, 2008.
[11] Tang, Ziyuan, Gautam Srivastava, and Shuai Liu. "Swarm intelligence and ant colony
optimization in accounting model choices." Journal of Intelligent & Fuzzy Systems Preprint
(2020): 1-9.
[12] Marichelvam, M. K., M. Geetha, and Ömür Tosun. "An improved particle swarm optimization
algorithm to solve hybrid flowshop scheduling problems with the effect of human factors–A
case study." Computers & Operations Research 114 (2020): 104812.
[13] Jhaver, Mehul, Yogesh Gupta, and Amit Kumar Mishra. "Employee Turnover Prediction
System." 2019 4th International Conference on Information Systems and Computer Networks
(ISCON). IEEE, 2019.
[14] Machado, Marcos Roberto, Salma Karray, and Ivaldo Tributino de Sousa. "LightGBM: An
effective decision tree gradient boosting method to predict customer loyalty in the finance
industry." 2019 14th International Conference on Computer Science & Education
(ICCSE). IEEE, 2019.
[15] Keshri, Rajat, and P. Srividya. "Prediction of Employee Turnover Using Light GBM
Algorithm."
[16] Padmasini, Ms, and K. Shyamala. "An Integrated Gower based PSO-K Mode Clustering Model
for Business Solutions through Existing Customer Assessment."
[17] Eitle, Verena, and Peter Buxmann. "Business analytics for sales pipeline management in the
software industry: a machine learning perspective." Proceedings of the 52nd Hawaii
International Conference on System Sciences. 2019.
[18] Dutta, Shawni, and Samir Kumar Bandyopadhyay. "Employee attrition prediction using neural
network cross validation method." International Journal of Commerce and Management
Research (2020).
[19] Kim, Soo Y. "Prediction of hotel bankruptcy using support vector machine, artificial neural
network, logistic regression, and multivariate discriminant analysis." The Service Industries
Journal 31.3 (2011): 441-468.
[20] Qutub, Aseel, et al. "Prediction of Employee Attrition Using Machine Learning and Ensemble
Methods." Int. J. Mach. Learn. Comput 11 (2021).
[21] Bhuva, Kashyap, and Kriti Srivastava. "Comparative Study of the Machine Learning
Techniques for Predicting the Employee Attrition." IJRAR-International Journal of Research
and Analytical Reviews (IJRAR) 5.3 (2018): 568-577.
[22] Sisodia, Dilip Singh, Somdutta Vishwakarma, and Abinash Pujahari. "Evaluation of machine
learning models for employee churn prediction." 2017 International Conference on Inventive
Computing and Informatics (ICICI). IEEE, 2017.
[23] Alao, D. A. B. A., and A. B. Adeyemo. "Analyzing employee attrition using decision tree
algorithms." Computing, Information Systems, Development Informatics and Allied Research
Journal 4.1 (2013): 17-28.
[24] Najafi-Zangeneh, Saeed, et al. "An Improved Machine Learning-Based Employees Attrition
Prediction Framework with Emphasis on Feature Selection." Mathematics 9.11 (2021): 1226.
[25] Jain, Divyang. Evaluation of Employee Attrition by Effective Feature Selection using Hybrid
Model of Ensemble Methods. Diss. Dublin, National College of Ireland, 2017.
[26] Ozdemir, Fatma. Recommender System For Employee Attrition Prediction And Movie
Suggestion. Diss. Abdullah Gul University, 2020.
[27] PM, Usha, and N. V. Balaji. "Chi Square Selector Enhanced Fuzzy Clustering Method for
Employee Attrition Prediction." Design Engineering (2021): 3405-3425.
[28] https://www.kaggle.com/pavansubhasht/ibm-hr-analytics-attrition-dataset
[29] Gopinath, R., and Chitra, A. (2020) Emotional Intelligence and Job Satisfaction of Employees
at Sago Companies in Salem District: Relationship Study. Adalya Journal, 9 (6), pp. 203-217.
[30] Gopinath, R., and N. S. Shibu. "A study on few HRD related entities influencing Job Satisfaction
in BSNL, Tamil Nadu Telecom Circle." Annamalai Business Review, Special Issue (2015): 24-
30.