Ijaret 11 12 313
All content following this page was uploaded by Dr. GOPINATH. R. on 16 November 2021.
ABSTRACT
Companies are always looking for ways to keep their professional personnel on
board in order to save money on hiring and training. Predicting whether or not a
specific employee would depart will assist the organisation in making proactive
decisions. Human resource problems, unlike physical systems, cannot be defined by a
scientific-analytical formula. As a result, machine learning approaches are the most
effective instruments for achieving this goal. In this study, a feature selection strategy
based on a Machine Learning Classifier is proposed to improve classification accuracy,
precision, and True Positive Rate while lowering error rates such as False Positive Rate
and Miss Rate. Five feature selection techniques (Information Gain, Gain Ratio, Chi-Square,
Correlation-based, and Fisher's Exact test) are analysed with six Machine Learning
classifiers (Artificial Neural Network, Support Vector Machine, Gradient Boosting Tree,
Bagging, Random Forest, and Decision Tree). In this study, combining Chi-Square feature
selection with a Gradient Boosting Tree classifier improves employee attrition
classification accuracy while lowering error rates.
Key words: Feature Selection, Employee Attrition, Classification, Error Rates,
Accuracy.
Cite this Article: M. Subhashini and R. Gopinath, Employee Attrition Prediction in
Industry Using Machine Learning Techniques, International Journal of Advanced
Research in Engineering and Technology, 11(12), 2020, pp. 3329-3341.
https://iaeme.com/Home/issue/IJARET?Volume=11&Issue=12
1. INTRODUCTION
Employee attrition is also known as employee turnover. Attrition is a common problem, and it
is increasingly prevalent in today's industry. In the vast majority of organisations, it is
one of the most important difficulties [1]. Employee attrition refers to the gradual reduction
in the number of employees due to retirement, resignation, or death. Attrition rates vary
widely from industry to industry, each with its own norms, and they can also differ between
skilled and unskilled positions. Organisations face the daunting task of recruiting and
retaining talent while also dealing with talent loss due to attrition, whether because of
industry downturns or voluntary individual turnover [2]. When a well-trained and well-adjusted
employee quits the company, a void is created. As a result, the organisation loses important
skills, information, and business relationships [29]. Modern managers and personnel executives
are extremely interested in reducing attrition in the organisation in such a way that it
contributes to the organisation's maximum effective growth and progress. Retaining employees
is nonetheless a challenge for any business organisation. If the situation is not handled
properly, departures of key personnel can result in a significant loss of earnings. Employee
turnover results in performance losses, which can have a long-term negative influence on
businesses [3][4]. With attrition rates a significant concern for every industry, businesses
endeavour to use innovative business methods to reduce attrition [5]. Although there is no way
to eliminate attrition completely, it can be reduced by implementing appropriate solutions.
This becomes possible when a manager can estimate employee turnover ahead of time [30].
Machine learning can be used to classify labelled data or to discover hidden structure in
unlabelled data. Top-level management can use the ability of machine learning algorithms to
anticipate the possibility of a person leaving an organisation [6][7]. This helps to control
the factors that cause attrition and to prevent it. Employee turnover is a significant issue
for employers, and every organisation's ability to retain talent is critical. As a result, if
management can obtain the predicted likelihood of an employee's separation as well as the
variables driving it, it can take actions that reduce attrition risk [8][9]. This is where
machine learning comes in handy. Top management can take proactive actions to retain personnel
based on the forecasts made by machine learning algorithms [10].
2. RELATED WORKS
Tang, Ziyuan, Gautam Srivastava, and Shuai Liu [11] offered a strategy, based on accounting
market big data (AMBD), for selecting accounting models for small and medium-sized firms
(SMEs). To begin, indicators of a company's solvency, operating capacity, profitability, and
growth capacity are chosen, such as the current ratio, quick ratio, asset-liability ratio, and
accounts receivable turnover rate. Following that, the AMBD constraints are classified using
the principal component analysis method. Finally, the optimal accounting model is established
iteratively by combining particle swarm optimization with ant colony optimization.
Marichelvam, M. K., M. Geetha, and Ömür Tosun [12] took the effect of human factors into
account when solving the multi-stage hybrid flow shop scheduling problem with identical
parallel machines at each stage. The objective function is to minimise the weighted sum of the
makespan and total flow time. Because the problem is NP-hard, the authors propose an improved
version of the particle swarm optimization (PSO) algorithm to solve it. To improve the PSO
algorithm's initial solutions, a dispatching rule and a constructive heuristic are used. The
variable neighbourhood search (VNS) algorithm is combined with the PSO algorithm to provide
the best results in the shortest amount of time.
Jhaver, Mehul, Yogesh Gupta, and Amit Kumar Mishra [13] investigated the use of the Gradient
Boosting technique, which is more robust because of its regularisation formulation; this is
the study's novel contribution. Gradient Boosting is compared with three commonly used
supervised classifiers (Logistic Regression, Support Vector Machine, and Random Forest) on
data from a global retailer, showing that it achieves higher accuracy in forecasting staff
turnover.
Machado, Marcos Roberto, Salma Karray, and Ivaldo Tributino de Sousa [14] presented a
financial company's deployment of a Machine Learning model to predict customer loyalty. The
researchers assessed the accuracy of two Gradient Boosting Decision Tree models: XGBoost and
LightGBM, the latter of which had not previously been used to predict customer loyalty.
Keshri, Rajat, and P. Srividya [15] explored the LightGBM algorithm, which Microsoft published
in 2017. The authors compared LightGBM with other well-known classification algorithms on
their dataset and demonstrated LightGBM's excellent prediction accuracy.
Padmasini, Ms, and K. Shyamala [16] presented a model that clusters Original Equipment
Customers, Distributors, and Dealers in order to determine what consumers think about the
products and the firm based on pricing, quality, and delivery logistics, and hence whether
they decide to buy an automobile product and become loyal clients. The Integrated Gower based
PSO-KMode model uses the Gower dissimilarity measure and optimises the data using Particle
Swarm Optimization with the K-Modes clustering method to assess existing customers through a
survey.
Eitle, Verena, and Peter Buxmann [17] proposed a model to assist software sales
representatives in managing the complicated sales funnel. By incorporating business analytics
in the form of machine learning into lead and opportunity management, data-driven
qualification support reduces the high degree of arbitrariness that comes from relying on
professional expertise and experience. Using real business data from the company's CRM system,
the authors created an artefact consisting of three models to map the end-to-end sales
pipeline.
where $r_{zc}$ denotes the relationship between the features and the class variable, $K$
represents the number of features, $\bar{r}_{zi}$ indicates the mean value of the
feature-class correlations, and $\bar{r}_{ii}$ represents the mean value of the
inter-correlations between features.
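The symbols defined here match the standard merit heuristic of correlation-based feature selection. Assuming that this is the intended measure, it can be written as follows, where the numerator rewards features that correlate with the class and the denominator penalises features that correlate with each other:

```latex
r_{zc} = \frac{K\,\bar{r}_{zi}}{\sqrt{K + K(K-1)\,\bar{r}_{ii}}}
```

Under this form, a subset whose features are highly inter-correlated (large $\bar{r}_{ii}$) scores lower than an equally predictive subset of mutually independent features.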
where $H(X)$ and $H(X \mid Y)$ are the entropy of X and the conditional entropy of X given Y,
respectively. The entropy of X can be computed as
$$H(X) = -\sum_{x} p(x) \log_2 p(x)$$
This strategy evaluates the gain for each feature independently and selects the top 'm'
features as the most relevant, i.e. a feature F with a high information gain is considered
highly relevant. The fundamental disadvantage of this technique is that it favours attributes
with a high information gain, which may or may not be more informative. Because the features
are scored one at a time, information gain is unable to handle redundant features.
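The univariate ranking described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the usual definition IG = H(X) - H(X|Y) with X as the class label and Y as one discrete feature, and the function names and toy columns are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(x, y):
    """H(x) - H(x | y) for two equal-length discrete sequences."""
    total = len(x)
    groups = {}
    for y_value, x_value in zip(y, x):
        groups.setdefault(y_value, []).append(x_value)
    # Conditional entropy: entropy of x within each y-group, weighted by group size.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(x) - conditional


def top_m_features(columns, target, m):
    """Score every feature on its own and keep the m highest-scoring names."""
    scores = {name: information_gain(target, values) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:m]


# Toy data (hypothetical, not taken from the paper's dataset).
attrition = ["yes", "yes", "no", "no"]
columns = {
    "overtime": ["y", "y", "n", "n"],        # perfectly predicts attrition
    "overtime_copy": ["y", "y", "n", "n"],   # redundant duplicate of the above
    "gender": ["m", "f", "m", "f"],          # carries no information here
}

print(top_m_features(columns, attrition, 2))  # ['overtime', 'overtime_copy']
```

The toy result also shows the drawback noted above: because each feature is scored in isolation, the redundant copy is ranked as highly as the original and both are selected.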
where $O_{ij}$ is the observed number of instances with value 'i' in class 'j', and $E_{ij}$
is the expected number of instances with value 'i' and class 'j'.
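As a rough sketch of how these observed and expected counts combine, the snippet below computes the Pearson chi-square statistic, the sum over all (i, j) of (O_ij - E_ij)^2 / E_ij, for one discrete feature against the class. It assumes the expected count is taken under independence (row total times column total divided by the number of instances); the function name and toy data are hypothetical.

```python
from collections import Counter


def chi_square(feature, target):
    """Pearson chi-square statistic between a discrete feature and the class."""
    n = len(target)
    observed = Counter(zip(feature, target))   # O_ij: count of value i in class j
    value_totals = Counter(feature)            # how often each feature value occurs
    class_totals = Counter(target)             # how often each class occurs
    statistic = 0.0
    for i in value_totals:
        for j in class_totals:
            # E_ij under independence of feature value and class.
            expected = value_totals[i] * class_totals[j] / n
            statistic += (observed[(i, j)] - expected) ** 2 / expected
    return statistic


# Toy data (hypothetical, not taken from the paper's dataset).
attrition = ["yes", "yes", "no", "no"]
print(chi_square(["y", "y", "n", "n"], attrition))  # 4.0, strongly associated
print(chi_square(["m", "f", "m", "f"], attrition))  # 0.0, independent
```

Features are then ranked by this statistic, with larger values indicating a stronger dependence between the feature and the class.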
Table 3 Classification accuracy (in %) obtained by Feature Selection Methods using different classifiers

| Feature Selection Method | ANN | SVM | GBT | Bagging | RF | DT |
|---|---|---|---|---|---|---|
| Original Dataset | 44.27 | 45.31 | 49.21 | 43.98 | 41.65 | 42.86 |
| Correlation based FS | 68.51 | 68.76 | 69.73 | 67.31 | 66.82 | 65.41 |
| Information Gain | 64.33 | 65.74 | 69.81 | 62.23 | 63.71 | 65.43 |
| Gain Ratio | 64.43 | 64.74 | 66.64 | 64.34 | 61.82 | 62.77 |
| Chi-Square | 89.44 | 85.75 | 89.15 | 79.81 | 80.53 | 81.22 |
| Fisher’s Exact | 72.65 | 73.21 | 73.76 | 68.92 | 69.16 | 70.88 |
Table 4 True Positive Rate (in %) obtained by Feature Selection Methods using different classifiers

| Feature Selection Method | ANN | SVM | GBT | Bagging | RF | DT |
|---|---|---|---|---|---|---|
| Original Dataset | 51.42 | 51.73 | 49.62 | 49.89 | 49.34 | 48.88 |
| Correlation based FS | 72.16 | 71.78 | 72.57 | 71.72 | 70.62 | 69.96 |
| Information Gain | 70.53 | 70.43 | 70.95 | 68.73 | 67.46 | 69.57 |
| Gain Ratio | 65.46 | 65.66 | 67.37 | 63.42 | 61.78 | 62.87 |
| Chi-Square | 88.54 | 87.72 | 89.22 | 86.34 | 85.62 | 81.11 |
| Fisher’s Exact | 68.26 | 66.54 | 64.57 | 66.91 | 65.17 | 67.49 |
Table 5 Precision (in %) obtained by Feature Selection Methods using different classifiers

| Feature Selection Method | ANN | SVM | GBT | Bagging | RF | DT |
|---|---|---|---|---|---|---|
| Original Dataset | 44.72 | 48.16 | 50.61 | 43.43 | 44.31 | 45.76 |
| Correlation based FS | 67.67 | 67.52 | 72.51 | 66.97 | 65.43 | 66.81 |
| Information Gain | 58.39 | 57.61 | 60.43 | 55.64 | 54.32 | 59.45 |
| Gain Ratio | 57.68 | 57.52 | 59.52 | 54.43 | 52.16 | 58.65 |
| Chi-Square | 83.18 | 85.24 | 85.42 | 81.53 | 79.72 | 78.18 |
| Fisher’s Exact | 55.85 | 56.21 | 61.28 | 65.89 | 64.25 | 55.32 |
Table 6 False Positive Rate (in %) obtained by Feature Selection Methods using different classifiers

| Feature Selection Method | ANN | SVM | GBT | Bagging | RF | DT |
|---|---|---|---|---|---|---|
| Original Dataset | 66.25 | 59.17 | 55.61 | 66.54 | 67.26 | 56.36 |
| Correlation based FS | 34.31 | 32.54 | 26.51 | 35.36 | 35.34 | 33.54 |
| Information Gain | 33.53 | 34.51 | 25.42 | 32.53 | 35.34 | 34.25 |
| Gain Ratio | 35.61 | 34.63 | 27.43 | 33.31 | 34.34 | 35.48 |
| Chi-Square | 7.62 | 7.84 | 6.73 | 10.25 | 17.33 | 20.09 |
| Fisher’s Exact | 31.97 | 31.77 | 23.14 | 31.56 | 33.13 | 32.88 |
Table 7 Miss Rate (in %) obtained by Feature Selection Methods using different classifiers

| Feature Selection Method | ANN | SVM | GBT | Bagging | RF | DT |
|---|---|---|---|---|---|---|
| Original Dataset | 48.58 | 48.27 | 50.38 | 50.11 | 50.66 | 51.12 |
| Correlation based FS | 27.84 | 28.22 | 27.43 | 28.28 | 29.38 | 30.04 |
| Information Gain | 29.47 | 29.57 | 29.05 | 31.27 | 32.54 | 30.43 |
| Gain Ratio | 34.54 | 34.34 | 32.63 | 36.58 | 38.22 | 37.13 |
| Chi-Square | 11.46 | 12.28 | 10.78 | 13.66 | 14.38 | 18.89 |
| Fisher’s Exact | 31.74 | 33.46 | 35.43 | 33.09 | 34.83 | 32.51 |
Tables 3 to 7 show that Chi-Square feature selection gives the best results for every
classifier. Combined with the Gradient Boosting Tree (GBT) classifier, it achieves the highest
TPR (89.22%) and Precision (85.42%) and the lowest FPR (6.73%) and Miss Rate (10.78%) for
predicting employee attrition in industry; its classification accuracy (89.15%) is marginally
below that of ANN with Chi-Square (89.44%) and well above all other combinations.
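For reference, the five measures reported in Tables 3 to 7 can all be derived from a binary confusion matrix. The sketch below assumes the standard definitions of these measures; the function name and the counts passed to it are hypothetical and are not taken from the paper's experiments.

```python
def classification_rates(tp, fp, tn, fn):
    """Return accuracy, TPR, precision, FPR and miss rate as percentages.

    tp/fn: leavers predicted correctly / missed.
    tn/fp: stayers predicted correctly / wrongly flagged as leavers.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": 100 * (tp + tn) / total,
        "tpr": 100 * tp / (tp + fn),        # true positive rate (recall)
        "precision": 100 * tp / (tp + fp),
        "fpr": 100 * fp / (fp + tn),        # false positive rate
        "miss_rate": 100 * fn / (fn + tp),  # false negative rate
    }


# Hypothetical counts for illustration only.
rates = classification_rates(tp=40, fp=5, tn=45, fn=10)
print(rates["accuracy"], rates["tpr"], rates["fpr"], rates["miss_rate"])  # 85.0 80.0 10.0 20.0
```

Under these definitions the miss rate is the complement of the TPR, which agrees with the reported values: each entry of Table 7 equals 100 minus the corresponding entry of Table 4 (for example, 89.22 + 10.78 = 100 for Chi-Square with GBT).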
5. CONCLUSION
Employee attrition prediction has become a key issue in today's organisations. Employee
attrition is a major problem for businesses, especially when trained, technical, and critical
staff leave for better opportunities elsewhere. This leads to a financial loss, as a trained
employee must be replaced. Feature selection is critical in the pre-processing stage of data
mining, and several machine learning approaches struggle to manage large numbers of irrelevant
features. Various feature selection strategies are used in this research article to improve
the accuracy of employee attrition prediction in the industry. The performance of six
different classifiers (ANN, SVM, GBT, Bagging, RF, and DT) is tested for the prediction of
employee attrition using feature selection approaches, namely Information Gain, Gain Ratio,
Chi-Square, Correlation based, and Fisher's Exact. The results show that Chi-Square feature
selection with the Gradient Boosting Tree classifier performs better in the prediction of
employee attrition than the other feature selection techniques with the other classifiers.
REFERENCES
[1] Gopinath, R. "Impact of Stress Management by development of Emotional Intelligence in
CMTS, BSNL, Tamilnadu Circle-A Study." International Journal of Management Research
and Development (IJMRD) 4.1 (2014).
[3] Gopinath, R. (2019). Quality of Work Life (QWL) among the Employees of LIC, International
Journal of Scientific Research and Review, 8(5), 373-377.
[5] Gopinath, R., & Kalpana, R. (2020). Relationship of Job Involvement with Job Satisfaction.
Adalya Journal, 9 (7), 306-315.
[6] Fan, Chin-Yuan, et al. "Using hybrid data mining and machine learning clustering analysis to
predict the turnover rate for technology professionals." Expert Systems with Applications 39.10
(2012): 8844-8851.
[7] Samuel, Michael O., and Crispen Chipunza. "Employee retention and turnover: Using
motivational variables as a panacea." African journal of business management 3.9 (2009): 410-
415.
[8] Samuel, Michael O., and Crispen Chipunza. "Employee retention and turnover: Using
motivational variables as a panacea." African journal of business management 3.9 (2009): 410-
415.
[9] Glebbeek, Arie C., and Erik H. Bax. "Is high employee turnover really harmful? An empirical
test using company records." Academy of management journal 47.2 (2004): 277-286.
[10] Allen, David G. Retaining talent: A guide to analysing and managing employee turnover. SHRM
Foundations, 2008.
[11] Tang, Ziyuan, Gautam Srivastava, and Shuai Liu. "Swarm intelligence and ant colony
optimization in accounting model choices." Journal of Intelligent & Fuzzy Systems Preprint
(2020): 1-9.
[12] Marichelvam, M. K., M. Geetha, and Ömür Tosun. "An improved particle swarm optimization
algorithm to solve hybrid flowshop scheduling problems with the effect of human factors–A
case study." Computers & Operations Research 114 (2020): 104812.
[13] Jhaver, Mehul, Yogesh Gupta, and Amit Kumar Mishra. "Employee Turnover Prediction
System." 2019 4th International Conference on Information Systems and Computer Networks
(ISCON). IEEE, 2019.
[14] Machado, Marcos Roberto, Salma Karray, and Ivaldo Tributino de Sousa. "LightGBM: An
effective decision tree gradient boosting method to predict customer loyalty in the finance
industry." 2019 14th International Conference on Computer Science & Education
(ICCSE). IEEE, 2019.
[15] Keshri, Rajat, and P. Srividya. "Prediction of Employee Turnover Using Light GBM
Algorithm."
[16] Padmasini, Ms, and K. Shyamala. "An Integrated Gower based PSO-K Mode Clustering Model
for Business Solutions through Existing Customer Assessment."
[17] Eitle, Verena, and Peter Buxmann. "Business analytics for sales pipeline management in the
software industry: a machine learning perspective." Proceedings of the 52nd Hawaii
International Conference on System Sciences. 2019.
[18] Dutta, Shawni, and Samir Kumar Bandyopadhyay. "Employee attrition prediction using neural
network cross validation method." International Journal of Commerce and Management
Research (2020).
[19] Kim, Soo Y. "Prediction of hotel bankruptcy using support vector machine, artificial neural
network, logistic regression, and multivariate discriminant analysis." The Service Industries
Journal 31.3 (2011): 441-468.
[20] Qutub, Aseel, et al. "Prediction of Employee Attrition Using Machine Learning and Ensemble
Methods." Int. J. Mach. Learn. Comput 11 (2021).
[21] Bhuva, Kashyap, and Kriti Srivastava. "Comparative Study of the Machine Learning
Techniques for Predicting the Employee Attrition." IJRAR-International Journal of Research
and Analytical Reviews (IJRAR) 5.3 (2018): 568-577.
[22] Sisodia, Dilip Singh, Somdutta Vishwakarma, and Abinash Pujahari. "Evaluation of machine
learning models for employee churn prediction." 2017 International Conference on Inventive
Computing and Informatics (ICICI). IEEE, 2017.
[23] Alao, D. A. B. A., and A. B. Adeyemo. "Analyzing employee attrition using decision tree
algorithms." Computing, Information Systems, Development Informatics and Allied Research
Journal 4.1 (2013): 17-28.
[24] Najafi-Zangeneh, Saeed, et al. "An Improved Machine Learning-Based Employees Attrition
Prediction Framework with Emphasis on Feature Selection." Mathematics 9.11 (2021): 1226.
[25] Jain, Divyang. Evaluation of Employee Attrition by Effective Feature Selection using Hybrid
Model of Ensemble Methods. Diss. Dublin, National College of Ireland, 2017.
[26] Ozdemir, Fatma. Recommender System For Employee Attrition Prediction And Movie
Suggestion. Diss. Abdullah Gul University, 2020.
[27] PM, Usha, and N. V. Balaji. "Chi Square Selector Enhanced Fuzzy Clustering Method for
Employee Attrition Prediction." Design Engineering (2021): 3405-3425.
[28] https://www.kaggle.com/pavansubhasht/ibm-hr-analytics-attrition-dataset
[29] Gopinath, R., and Chitra, A. (2020) Emotional Intelligence and Job Satisfaction of Employees
at Sago Companies in Salem District: Relationship Study. Adalya Journal, 9 (6), pp. 203-217.
[30] Gopinath, R., and N. S. Shibu. "A study on few HRD related entities influencing Job Satisfaction
in BSNL, Tamil Nadu Telecom Circle." Annamalai Business Review, Special Issue (2015): 24-
30.