
www.ijcrt.org © 2021 IJCRT | Volume 9, Issue 5 May 2021 | ISSN: 2320-2882

Disease Prediction using Machine Learning


1Palle Pramod Reddy, 2Dirisinala Madhu Babu, 3Hardeep Kumar and 4Dr. Shivi Sharma

1,2,3Students, School of Computer Science and Engineering, Lovely Professional University, Jalandhar (Punjab), India

4Assistant Professor, School of Electronics and Electrical Engineering, Lovely Professional University, Jalandhar (Punjab), India

Abstract— The “Disease Prediction” method is based on predictive modeling: it predicts the user's disease from the symptoms that the user provides as input. The method examines the user's symptoms and returns the likelihood of the disease as output. Disease prediction is accomplished using the random forest classifier.

Keywords—Random Forest, Chronic Disease

I. INTRODUCTION

When people are afflicted with an illness, they must see a doctor, which is both time consuming and costly. It can also be difficult for them if they are not near doctors and hospitals, because the illness then cannot be identified. If this procedure can instead be carried out by automated software that saves time and money, it could be better for the patient and make the process go more smoothly. There are other heart disease prediction systems that use data mining methods to analyze the patient's risk level. Disease Predictor is a web-based system that predicts a user's disease based on the symptoms they have. Data sets for the Disease Prediction system have been obtained from various health-related websites. Using Disease Predictor, the user will be able to determine the likelihood of a disease based on the symptoms given. People are always curious to learn new things, particularly as the use of the internet grows every day. When an issue occurs, people often want to look it up on the internet. People generally have easier access to the internet than to hospitals and physicians. When people are afflicted with an illness, they do not have many options. As a result, this system can be beneficial to people.

A chronic illness is a disease that lasts a long time or takes a long time to heal; many chronic diseases cannot be cured but can only be managed with daily treatment. India, like all other nations, is undergoing significant social and economic shifts, which are causing a rapid rise in the frequency of cardiovascular disease.

Many advanced countries, including India, are dealing with a wide range of chronic diseases, mostly cardiovascular disease and diabetes, which could have deep consequences for global health, security, and the economy. The rapid urbanization and economic growth of today's world have resulted in a wide range of lifestyles. Chronic diseases are now a problem in all nations, with chronic disease afflicting one-third of the population in each. Chronic disease care is more expensive, and it is difficult for those who are sick. In the medical field, a huge number of chronic disease datasets are gathered and processed, and data mining aids in the early detection of disease. Cardiovascular disease, diabetes, liver disease, Alzheimer's disease, and Parkinson's disease are among the most expensive diseases to diagnose.

It is a major challenge in the medical or healthcare industries to offer the highest quality services to all patients; at present, only those who can afford such services benefit from them. There is a vast amount of healthcare data available that is not being mined efficiently and reliably enough to uncover hidden knowledge for successful decision-making. The proposed framework employs data mining techniques to detect chronic diseases early.

Machine learning is the process of programming computers to improve their output based on examples or previous data. The study of computer systems that learn from data and experience is known as machine learning. Training and testing are the two stages of a machine learning algorithm. Predicting a disease from the symptoms and medical history of the patient has been a challenge for machine learning for decades.

Machine learning technology provides a strong platform in the medical sector for efficiently resolving healthcare issues.

IJCRT2105229 International Journal of Creative Research Thoughts (IJCRT) www.ijcrt.org c205


II. RESEARCH OBJECTIVE
There is a need to research and develop a system that will enable end users to predict chronic diseases without having to visit a physician or doctor for diagnosis. The objective is to identify various diseases by observing the symptoms of patients and applying various machine learning techniques. There is no proper procedure for handling both text and structured data, so the proposed framework would consider both structured and unstructured data. Machine learning can improve the accuracy of predictions.

III. LITERATURE REVIEW
K.M. Al-Aidaroos, A.A. Bakar, and Z. Othman studied which data mining technique is best for medical diagnosis. The authors compared Naive Bayes (NB) to five other classifiers: LR, KStar (K*), Decision Tree (DT), Neural Network (NN), and a basic rule-based algorithm (ZeroR). The performance of all algorithms was evaluated using 15 real-world medical problems from the UCI machine learning repository (Asuncion and Newman, 2007). In the experiment, NB outperformed the other algorithms in 8 of the 15 data sets, leading to the conclusion that the predictive accuracy of Naive Bayes is superior to that of the other techniques.

Darcy A. Davis, Nitesh V. Chawla, Nicholas Blumm, Nicholas Christakis, and Albert-Laszlo Barabasi found that treating chronic illness at a global level is neither time nor cost effective. The authors therefore performed their study in order to forecast potential disease risk. For this they used CARE, which relies only on a patient's medical history and ICD-9-CM codes to predict possible disease risks. Based on a patient's own medical history and that of similar patients, CARE combines collaborative filtering with clustering to predict each patient's greatest disease risks. The authors also defined ICARE, an iterative version that integrates ensemble principles for improved performance.

These cutting-edge systems do not need any advanced knowledge and can predict a wide range of medical conditions in a single run. ICARE's remarkable potential risk coverage means more precise early alerts for thousands of illnesses, several years ahead of time. When used to its full extent, the CARE system can be used to investigate a wider range of disease backgrounds, raise previously unconsidered questions, and facilitate discussions regarding early detection and prevention.

Jyoti Soni, Ujma Ansari, Dipesh Sharma, and Sunita Soni provided a survey of existing techniques of knowledge discovery in databases using data mining techniques that are used in today's medical research, specifically in heart disease prediction. A number of experiments were carried out to compare the performance of predictive data mining techniques on the same dataset. The results show that Decision Tree performs best, with Bayesian classification having comparable accuracy to Decision Tree in some cases, while other predictive approaches such as KNN, Neural Networks, and classification based on clustering underperform.

Shadab Adam Pattekari and Asma Parveen conducted a study to predict heart disease using the Naive Bayes algorithm, in which the user provides data that is compared to a trained set of values. As a result of this study, patients were able to provide basic information that was compared with the data, and heart disease was predicted.

M.A. Nishara Banu and B. Gomathy analysed the various types of heart-related problems using medical data mining techniques such as association rule mining, classification, and clustering. The aim of a decision tree is to show every possible outcome of a decision. To achieve the best result, various rules are devised. The criteria used in this study were age, sex, smoking, being overweight, drinking alcohol, blood sugar, heart rate, and blood pressure. The risk level for the various parameters is stored with IDs ranging from 1 to 100 (1-8). The normal level of prediction is represented by IDs less than 1, whereas IDs higher than 1 represent higher risk levels. The pattern in the dataset is studied using the K-means clustering method. The algorithm divides the data into k groups. Each point in the dataset is assigned to the closest cluster, and each cluster centre is recalculated as the average of the cluster's points.

PROPOSED SYSTEM

In this project we combine structured and unstructured data from the healthcare field to determine disease risk. We use a latent factor model to reconstruct missing data in medical records obtained from online sources. We could also assess the major chronic diseases in a specific area and population using statistical information. When dealing with structured data, we consult hospital experts to learn about useful features. In the case of unstructured text files, we use the random forest algorithm to select features automatically.

Fig. 1:- System Model
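The K-means procedure summarized in the literature review (assign each point to its closest cluster centre, then recompute each centre as the average of its points) can be illustrated with a minimal sketch. The function name `kmeans`, the choice of initial centres, and the toy data are assumptions made for illustration only; they are not taken from the cited study.

```python
import math


def kmeans(points, initial_centres, max_iter=100):
    """Minimal K-means: alternate assignment and centre-update steps."""
    centres = [tuple(c) for c in initial_centres]
    k = len(centres)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its closest centre.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centres[c]))
            for p in points
        ]
        # Update step: each centre becomes the mean of its assigned points.
        new_centres = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centres.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # Keep the previous centre if a cluster ends up empty.
                new_centres.append(centres[c])
        if new_centres == centres:
            break  # centres stopped moving, so the clustering has converged
        centres = new_centres
    return centres, labels


# Hypothetical two-feature records forming two visibly separate groups.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]

# Assumption: one starting centre is taken from each group.
centres, labels = kmeans(data, initial_centres=[data[0], data[3]])
```

The result depends on the starting centres, which the text does not specify; random or repeated initialization is a common choice in practice.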

A. Data collection
Data were collected from the internet to identify the disease. The real symptoms of each disease are collected, i.e. no dummy values are entered. The symptoms of the diseases are collected from different health-related websites.

Data Preprocessing
Before feeding the data into the prediction model, the following data cleaning and preprocessing steps are performed:
● Checking null values and filling using forward fill method
● Converting data into different cases
● Standardizing the data using mean and standard deviation
● Splitting the dataset into training and testing sets

B. Building Model
Many methods are used to perform data mining, and machine learning, including random forest, is one such approach. Machine learning strategies include classification, clustering, summarization, and many others. Since classification techniques are used in this project, classification is the data mining process applied in this phase, which deals with categorical data. This step is divided into two phases: training and testing. In the training phase, predetermined data and the associated class labels are used for classification; the training stage is often referred to as supervised learning. The training and testing phases of the classification process are depicted in the diagram. Training tuples are used in the training phase and test tuples in the testing phase, where the accuracy of the classification rule is calculated. If the rule's accuracy on the test data is sufficient, the rule can be used to classify new, unseen data.

C. Prediction
Prediction using Random Forest: prediction is done by a Random Forest model that is trained on the chronic disease dataset and served using the Flask framework.

IV. RESULTS AND CONCLUSION

Model                   Accuracy (%)
Diabetes Model          98.25
Breast Cancer Model     98.25
Heart Disease Model     85.25
Kidney Disease Model    99
Liver Disease Model     78

Table 1:- Accuracy achieved using random forest for each disease
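As a small illustration of how the figures in Table 1 can be read, the sketch below computes classification accuracy as the percentage of test tuples predicted correctly, and takes the plain arithmetic mean of the five tabulated values. The helper `accuracy` and the label lists are hypothetical; only the five accuracies are copied from Table 1, and they are assumed to be percentages.

```python
from statistics import mean


def accuracy(true_labels, predicted_labels):
    """Percentage of test tuples whose predicted class matches the true class."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)


# Accuracies copied from Table 1 (assumed to be percentages).
table_1 = {
    "Diabetes": 98.25,
    "Breast Cancer": 98.25,
    "Heart Disease": 85.25,
    "Kidney Disease": 99.0,
    "Liver Disease": 78.0,
}

# Hypothetical labels for eight test tuples, used only to exercise accuracy().
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]

example_accuracy = accuracy(y_true, y_pred)  # 6 of 8 correct, i.e. 75.0
unweighted_mean = mean(table_1.values())     # 458.75 / 5 = 91.75
```

This is only the unweighted mean of the tabulated values; an average weighted by test-set size, or one taken over a different set of runs, would give a different number.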

Fig. 2:- Accuracy of each model using the Random forest classifier
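Three of the steps listed under Data Preprocessing (forward fill of null values, standardization using mean and standard deviation, and the train/test split) can be sketched in plain Python as follows. The function names, the 25% test fraction, the fixed seed, and the glucose readings are all assumptions made for illustration; the paper does not state which library or parameters were used.

```python
import random
from statistics import mean, pstdev


def forward_fill(column):
    """Replace each None with the most recent non-missing value above it."""
    filled, last = [], None
    for value in column:
        if value is not None:
            last = value
        filled.append(last)  # stays None if the column starts with a gap
    return filled


def standardize(column):
    """Rescale a numeric column to zero mean and unit standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]  # constant column: nothing to scale
    return [(x - mu) / sigma for x in column]


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and hold out a fraction of them for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical glucose readings with two missing entries, plus class labels.
glucose = [148.0, None, 183.0, 89.0, None, 116.0, 78.0, 115.0]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

glucose_filled = forward_fill(glucose)
glucose_scaled = standardize(glucose_filled)
train, test = train_test_split(list(zip(glucose_scaled, labels)))
```

Standardizing before splitting follows the order in which the steps are listed in the Data Preprocessing section. Computing the mean and standard deviation on the training split only is a common alternative that keeps test data out of the preparation step.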

Fig. 3:- Home screen

Fig. 4:- Diabetes Prediction entry form

Fig. 5:- Breast cancer Prediction entry form

Fig. 6:- Heart Disease Prediction entry form

Fig. 7:- Kidney Disease Prediction entry form

Fig. 7:- Liver Disease Prediction entry form

Conclusion
The aim of this project is to predict disease based on symptoms. The project is set up in such a way that the system takes the user's symptoms as input and generates a disease prediction as output. An average prediction accuracy of 95% is obtained. Disease Predictor was successfully implemented using the Grails system.

REFERENCE

1) Davis, D. A., Chawla, N. V., Blumm, N., Christakis, N., & Barabasi, A. L. (2008). Predicting Individual Disease Risk Based On Medical History.
2) Pattekari, S. A., & Parveen, A. (2012). Prediction System For Heart Disease Using Naive Bayes.
3) Al-Aidaroos, K. M., Bakar, A. A., & Othman, Z. (2012). Medical Data Classification With Naive Bayes Approach. Information Technology Journal.
4) Soni, J., Ansari, U., Sharma, D., & Soni, S. (2011). Predictive Data Mining for Medical Diagnosis: An Overview Of Heart Disease Prediction.
5) Nishara Banu, M. A., & Gomathy, B. (2013). Disease Predicting System Using Data Mining Techniques.
