Analysis of Various Data Mining Techniques To Predict Diabetes Mellitus


International Journal of Applied Engineering Research, ISSN 0973-4562, Volume 11, Number 1 (2016), pp. 727-730
© Research India Publications. http://www.ripublication.com

Dr. M. Renuka Devi, Dean of Computer Science, Sri G.V.G Visalakshi College for Women, Udumalpet, Tamil Nadu, India. E-mail: [email protected]

J. Maria Shyla, Part Time Research Scholar, Bharathiar University, Coimbatore, Tamil Nadu, India. E-mail: [email protected]

Abstract

Data mining approaches help to diagnose patients' diseases. Diabetes mellitus is a chronic disease that affects various organs of the human body. Early prediction can save lives and help bring the disease under control. This paper explores the early prediction of diabetes using various data mining techniques. The dataset comprises 768 instances from the Pima Indian Diabetes Dataset and is used to determine the accuracy of the data mining techniques in prediction. The analysis indicates that the Modified J48 classifier provides the highest accuracy among the techniques compared.

Keywords: Data mining, Diabetes, Prediction, Accuracy, Classification

Introduction

Today "health care" is the buzzword all over the world. Early prediction of diseases can reduce the human fatality rate. Hospitals and other medical institutions hold enormous volumes of data, and information technology plays a vital role in health care. Diabetes is a chronic disease with the potential to cause a worldwide health care crisis. According to the International Diabetes Federation, 382 million people are living with diabetes worldwide, and by 2035 this number is expected to rise to 592 million. Early prediction of diabetes is a challenging task for medical practitioners because of the complex interdependence of various factors. Diabetes affects human organs such as the kidneys, eyes, heart, nerves and feet.

Data mining is the process of extracting useful information from large databases. It is a multidisciplinary field of computer science that involves computational processes, machine learning, statistical techniques, classification, clustering and pattern discovery. Data mining techniques have proved useful for the early prediction of disease with high accuracy, which helps to save lives and reduce treatment costs. This paper analyzes various data mining techniques used to predict diabetes, such as Naive Bayes, MLP, Bayesian Network, C4.5, Amalgam KNN, ANFIS, PLS-LDA, Homogeneity-Based, ANN and Modified J48.

Veena (2014) combined Amalgam KNN and ANFIS to improve prediction accuracy. In that work, K-means and KNN are combined to overcome the computational complexity of large datasets, and the training set is verified with fuzzy systems and neural networks to produce better results. Sapna (2012) implemented a genetic algorithm with data mining techniques to test patients affected by diabetes, based on the fitness value and the accuracy of the chromosome value. Gaganjot Kaur (2014) proposed a new approach for predicting diabetes using WEKA and MATLAB, generating J48 classifiers with an improved version of the existing J48 algorithm. Murat Koklu (2013) built a decision support system that uses data mining and artificial intelligence classification algorithms, namely Multilayer Perceptron, Naive Bayes and J48, to diagnose illness. To achieve good performance in predicting the onset of diabetes, Manaswini Pradhan (2011) proposed and tested an ANN-based classification model with a genetic algorithm for feature selection. Hence, this paper focuses on data mining techniques and analyzes their accuracy with various tools.

Diabetes

Diabetes mellitus (DM), commonly referred to as diabetes, is a condition in which the body does not properly process food for use as energy. Most of the food we eat is turned into glucose, or sugar, for energy. The pancreas makes a hormone called insulin that helps glucose get into the cells of the body. When a body is affected by diabetes, it either cannot make enough insulin or cannot use its own insulin, which causes sugar to build up in the blood. Several pathogenic processes are involved in the development of diabetes. These range from autoimmune destruction of the β-cells of the pancreas, with consequent insulin deficiency, to abnormalities that result in resistance to insulin action. Diabetes is a life-threatening disease in rural and urban areas, and in both developed and underdeveloped countries. The common symptoms of diabetic patients are frequent urination, increased thirst, weight loss, slow-healing wounds, giddiness and increased hunger. Diabetes can cause serious health complications including heart disease, blindness, kidney failure and lower-extremity amputations.

A. Types of Diabetes

Type 1 diabetes is also called insulin-dependent diabetes mellitus (IDDM) or juvenile-onset diabetes. Autoimmune, genetic and environmental factors are involved in the development of this type of diabetes. Type 1 mostly occurs in young people below 30 years of age. It can affect children or adults, but the majority of cases are in children. In persons with type 1 diabetes, the beta cells of the pancreas, which are responsible for insulin production, are destroyed by an autoimmune process.

Type 2 diabetes is also called non-insulin-dependent diabetes mellitus (NIDDM) or adult-onset diabetes. In type 2 diabetes, the pancreas usually produces some insulin, but either the amount produced is not enough for the body's needs or the body's cells are resistant to it. Risk factors for type 2 diabetes include older age, obesity, family history of diabetes, prior history of gestational diabetes, impaired glucose tolerance, physical inactivity and race/ethnicity.

Gestational diabetes is the third main form and occurs when pregnant women without a previous history of diabetes develop a high blood glucose level. The majority of gestational diabetes patients can control their diabetes with exercise and diet. Between 10% and 20% of them will need to take some kind of blood-glucose-controlling medication. In a few cases, gestational diabetes may lead to type 2 diabetes in the future. It affects 4% of all pregnant women.

Other forms include congenital diabetes, which is due to genetic defects of insulin secretion; cystic fibrosis-related diabetes; and steroid diabetes, which is induced by high doses of glucocorticoids.

Application of Data mining Techniques in Diabetes

Data mining techniques can be trained on medical data to predict diabetes. For this, the dataset has to be preprocessed to remove noise and fill in missing values. The Pima Indian Diabetes Dataset was taken to evaluate data mining classification.
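
As a rough illustration of the preprocessing step, the sketch below fills missing values with the column median. It assumes, as is commonly reported for the Pima dataset, that a zero in a physiological measurement such as insulin or body mass index stands for a missing reading. The column names follow the attribute names of Table 1 in lower case; the sample rows and the helper `fill_missing_with_median` are hypothetical and are not taken from the studies surveyed here.

```python
from statistics import median

# Columns where a value of 0 is treated as "missing". This is an assumption of
# the sketch; the surveyed papers do not state how missing values were encoded.
ZERO_MEANS_MISSING = ("plasma", "pressure", "skin", "insulin", "mass")


def fill_missing_with_median(rows, columns=ZERO_MEANS_MISSING):
    """Return a copy of rows with zeros in `columns` replaced by the column median."""
    filled = [dict(row) for row in rows]
    for col in columns:
        observed = [row[col] for row in rows if row[col] != 0]
        if not observed:
            continue  # nothing to impute from; leave the column unchanged
        col_median = median(observed)
        for row in filled:
            if row[col] == 0:
                row[col] = col_median
    return filled


# Hypothetical records, made up for illustration only.
sample = [
    {"plasma": 148, "pressure": 72, "skin": 35, "insulin": 0, "mass": 33.6, "class": 1},
    {"plasma": 85, "pressure": 66, "skin": 29, "insulin": 94, "mass": 26.6, "class": 0},
    {"plasma": 183, "pressure": 64, "skin": 0, "insulin": 168, "mass": 23.3, "class": 1},
]

cleaned = fill_missing_with_median(sample)
```

Median imputation is only one possible choice; dropping incomplete records or using the mean are equally simple alternatives, and the best option depends on the classifier being trained.
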
The dataset comprises 9 attributes and 768 instances. Table 1 shows the description of the attributes. Data mining techniques can be applied to the effective factors such as BMI, DPF (diabetes pedigree function), age and skin fold thickness to predict diabetes. Insulin and GTT (glucose tolerance test) measurements are used for testing diabetes. Pregnancy and BP (blood pressure) are also considered as testing factors. These attributes can be classified and clustered using various techniques such as Naive Bayes, J48, PLS-LDA, SVM, BLR, MLP, K-NN and Bayesian Network. With the help of these attributes, type 1, type 2 and gestational diabetes can be diagnosed. Obesity, age and family history are the main causes of type 2 diabetes. A class variable of 1 indicates that the diabetes test is positive and 0 indicates that it is negative.

The Tanagra, WEKA and MATLAB tools help to carry out data mining tasks with a range of machine learning algorithms. Supervised learning algorithms are used for the categorization task. Data mining techniques can uncover hidden patterns from previous history. Classification is the most commonly used technique in medical data mining, and the predictive accuracy of the classifier is estimated. The application of data mining techniques can minimize the number of tests required for detecting disease.
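
To make the accuracy estimate concrete, the following sketch computes accuracy from the counts of true and false positives and negatives for the binary class variable. It also computes Cohen's kappa, which is one common way of combining a total accuracy with a random (chance) accuracy, the two quantities named in the Results & Discussion section; whether the surveyed studies used exactly this formulation is an assumption. The function name and the example counts are hypothetical.

```python
def accuracy_and_kappa(tp, tn, fp, fn):
    """Compute total accuracy, chance ("random") accuracy and Cohen's kappa."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    total_accuracy = (tp + tn) / total
    # Chance agreement: probability that actual and predicted labels match
    # if predictions were made at random with the same class proportions.
    random_accuracy = (
        (tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)
    ) / (total * total)
    if random_accuracy == 1.0:
        kappa = 0.0  # degenerate case: only one class is present and predicted
    else:
        kappa = (total_accuracy - random_accuracy) / (1 - random_accuracy)
    return total_accuracy, random_accuracy, kappa


# Hypothetical counts for 100 test instances (1 = diabetic, 0 = non-diabetic).
acc, rand_acc, kappa = accuracy_and_kappa(tp=30, tn=50, fp=10, fn=10)
```

Reporting kappa alongside plain accuracy is useful when the classes are imbalanced, because a classifier that always predicts the majority class can reach a high accuracy while its kappa stays near zero.
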
Table 1: Attributes of Diabetes Dataset

Attribute No. | Attribute | Description
1 | Plasma | Plasma glucose concentration at 2 hours in an oral glucose tolerance test
2 | Pressure | Diastolic blood pressure (mmHg)
3 | Skin | Triceps skin fold thickness (mm)
4 | Insulin | 2-hour serum insulin (mu U/ml)
5 | Pregnancy | Number of times pregnant
6 | Mass | Body Mass Index (BMI)
7 | Pedigree | Diabetes pedigree function
8 | Age | Age (in years)
9 | Class | Class variable (0 or 1)

Various Data mining techniques used to predict diabetes

Diabetic patients suffer from various associated conditions, and the disease affects various organs. If treatment is not taken to control the disease, it can lead to the patient's death. Hence, effective measures have to be taken to predict the disease at the earliest stage and control it. In this paper, various data mining techniques, applied with various tools, are analyzed to identify the best techniques for diagnosing diabetes mellitus. As shown in Table 2, Gaganjot compared the accuracy and error rate of various data mining algorithms such as Naive Bayes, MLP, Random Forest, Random Tree and Modified J48. The result shows 99.87% accuracy for the Modified J48 classifier. Radha experimented with the C4.5, SVM, K-NN, PNN and BLR classification techniques to classify patients with and without diabetes. The results show that the C4.5 decision tree algorithm provides 86% accuracy. Mohtaram used the Bayesian network technique to predict diabetic patients; the Bayesian network, trained with the given dataset, provides 90.4% accuracy in prediction. The genetic algorithm creates an optimal solution to predict the disease with 80.5% accuracy. The Multilayer Perceptron machine learning algorithm is fed with a training dataset and trained to classify the feature vectors; the accuracy obtained with MLP is 97.61%. This paper compares different data mining techniques with respect to their performance in predicting diabetes.

Table 2: Analysis of Different Data mining Techniques to predict diabetes

Author & year | Data mining techniques | Tools | Accuracy | Best DM technique
Gaganjot Kaur and Amit Chhabra, 2014 [1] | Naive Bayes, MLP, Random Tree, REP tree, RAD, Random Forest, J48, Modified J48 Classifier | WEKA, MATLAB | 99.87% | Modified J48 Classifier
P. Radha and Dr. B. Srinivasan, 2014 [2] | C4.5, SVM, k-NN, PNN, BLR | Tanagra | 86% | C4.5
Mohtaram Mohammadi, Mitra Hosseini, Hamid Tabatabaee, 2014 [3] | Bayesian Network | MATLAB | 90.4% | Bayesian Network
Sudesh Rao and N. Arun Kumar, 2014 [4] | Decision Tree, Genetic algorithm | Clementine | 80.5% | Genetic algorithm with fuzzy logic
Veena Vijayan. V and Aswathy Ravikumar, 2014 [5] | EM, KNN, K-means, amalgam KNN and ANFIS algorithm | SharperLight | 80% | Amalgam KNN and ANFIS
Arwa Al-Rofiyee, Maram Al-Nowiser, Nasebih Al-Mufadi, 2013 [6] | MLP | WEKA | 97.61% | MLP
K.R. Lakshmi and S. Premkumar, 2013 [7] | C4.5, SVM, k-NN, PNN, BLR, MLR, PLS-DA, PLS-LDA, k-means & Apriori | Tanagra | 76.78% | PLS-LDA
Murat Koklu and Yavuz Unal, 2013 [8] | Multilayer Perceptron, J48 and Naive Bayes Classifier | WEKA | 76.3% | Naive Bayes Classifier
Rupa Bagdi, Prof. Pramod Patil [9] | ID3, C4.5 Decision Tree | - | 74% | C4.5 Decision Tree
Ashwinkumar.U.M and Dr. Anandakumar. K.R, 2012 [10] | ID3, C4.5 Decision Tree | WEKA | 68% | C4.5 Decision Tree and Incremental learning
S. Sapna, Dr. A. Tamilarasi and M. Pravin Kumar, 2012 [11] | Genetic Algorithm | MATLAB | 80% | Generic Genetic Algorithm
Manaswini Pradhan and Dr. Ranjit Kumar Sahu, 2011 [12] | Artificial Neural Network, Genetic algorithm | Tanagra | 73.438% | Artificial Neural Network
Muhammad Waqar Aslam and Asoke Kumar Nandi, 2010 [13] | Genetic Programming | GP Lab tool Box | 78.5% | Genetic Programming
Huy Nguyen Anh Pham and Evangelos Triantaphyllou, 2008 [14] | Homogeneity-Based algorithm | RapidMiner | 80.1% | Homogeneity-Based algorithm

Results & Discussion

The given dataset is classified into two classes, i.e. patients with diabetes and patients without diabetes, using various data mining techniques. The accuracy of the different techniques in predicting diabetes is shown graphically in Figure 1. Based on the results reported, the Modified J48 classifier provides the highest accuracy, 99.87%, in predicting the disease. The performance of the algorithm is calculated using the equations for Total Accuracy and Random Accuracy, which are evaluated from the true positive, true negative, false positive and false negative counts. Radha compared classification techniques and found that the C4.5 decision tree algorithm gives the better accuracy of 86% in prediction. Arwa Al-Rofiyee et al. used the Multilayer Perceptron machine learning algorithm to predict the disease with 97.61% accuracy.

Figure 1: Performance of various data mining techniques

Conclusion

In the medical field, accuracy in predicting disease is a more important factor than execution time. In this analysis of data mining techniques and tools, the Modified J48 classifier gives the highest accuracy, 99.87%, using the WEKA and MATLAB tools. Since diabetes is a chronic disease, it has to be prevented before it affects people. In future, diabetes may be prevented using gene analysis and patients' previous history of diabetes.

References

[1] Gaganjot Kaur, Amit Chhabra, "Improved J48 Classification Algorithm for the prediction of Diabetes", International Journal of Computer Applications (0975-8887), vol. 98, No. 22, July 2014.
[2] P. Radha, Dr. B. Srinivasan, "Predicting Diabetes by consequencing the various Data mining Classification Techniques", International Journal of Innovative Science, Engineering & Technology, vol. 1, Issue 6, August 2014, pp. 334-339.
[3] Mohtaram Mohammadi, Mitra Hosseini, Hamid Tabatabaee, "Using Bayesian Network for the prediction and Diagnosis of Diabetes", MAGNT Research Report, vol. 2(5), pp. 892-902.
[4] Sudesh Rao, V. Arun Kumar, "Applying Data mining Technique to predict the diabetes of our future generations", ISRASE eXplore digital library, 2014.
[5] Veena Vijayan, Aswathy Ravikumar, "Study of Data mining algorithms for prediction and diagnosis of Diabetes Mellitus", International Journal of Computer Applications (0975-8887), vol. 95, No. 17, June 2014.
[6] Arwa Al-Rofiyee, Maram Al-Nowiser, Nasebih Al-Mufadi, Dr. Mohammed Abdullah AL-Hagery, "Using Prediction Methods in Data mining for Diabetes Diagnosis", http://www.psu.edu.sa/megdam/sdma/Downloads/Posters
[7] K.R. Lakshmi, S. Premkumar, "Utilization of Data mining Techniques for prediction of Diabetes Disease survivability", International Journal of Scientific & Engineering Research, vol. 4, Issue 6, June 2013.
[8] Murat Koklu and Yavuz Unal, "Analysis of a population of Diabetic patients Databases with Classifiers", International Journal of Medical, Health, Pharmaceutical and Biomedical Engineering, vol. 7, No. 8, 2013.
[9] Rupa Bagdi, Prof. Pramod Patil, "Diagnosis of Diabetes Using OLAP and Data Mining Integration", International Journal of Computer Science & Communication Networks, vol. 2(3), pp. 314-322.
[10] Ashwinkumar.U.M and Dr. Anandakumar K.R, "Predicting Early Detection of cardiac and Diabetes symptoms using Data mining techniques", International conference on computer Design and Engineering, vol. 49, 2012.
[11] S. Sapna, Dr. A. Tamilarasi and M. Pravin Kumar, "Implementation of Genetic Algorithm in predicting Diabetes", International Journal of computer science, vol. 9, Issue 1, No. 3, January 2012.
[12] Manaswini Pradhan, Dr. Ranjit Kumar Sahu, "Predict the onset of diabetes disease using Artificial Neural Network", International Journal of Computer Science & Emerging Technologies, vol. 2, Issue 2, April 2011.
[13] Muhammad Waqar Aslam and Asoke Kumar Nandi, "Detection of Diabetes using Genetic Programming", European Signal Processing Conference (EUSIPCO-2010), ISSN 2076-1465.
[14] Huy Nguyen Anh Pham and Evangelos Triantaphyllou, "Prediction of Diabetes by Employing New Data mining approach which balances Fitting and Generalization", Springer, 2008.
