Disease Prediction Using Machine Learning: V. Sharon Rose (Urk18Cs178)
BACHELOR OF TECHNOLOGY
in
MARCH 2021
BONAFIDE CERTIFICATE
This is to certify that the project report entitled, “Disease Prediction using Machine Learning”
is a bonafide record of Mini Project work done during the even semester of the academic year
2020-2021 by
in partial fulfillment of the requirements for the award of the degree of Bachelor of Technology
in Computer Science and Engineering of Karunya Institute of Technology and Sciences.
First and foremost, I praise and thank ALMIGHTY GOD whose blessings have bestowed
I am grateful to our beloved founders Late. Dr. D.G.S. Dhinakaran, C.A.I.I.B, Ph.D and
Dr. Paul Dhinakaran, M.B.A, Ph.D, for their love and always remembering us in their prayers.
I extend my thanks to our Vice Chancellor Dr. P. Mannar Jawahar, Ph.D and our
Registrar Dr. Elijah Blessing, M.E., Ph.D, for giving me this opportunity to do the project.
I would like to thank Dr. Prince Arulraj, M.E., Ph.D., Dean, School of Engineering and
Technology for his direction and invaluable support to complete the same.
I would like to place my heart-felt thanks and gratitude to Dr. J. Immanuel John Raja,
M.E., Ph.D., Head of the Department, Computer Science and Engineering for his
Department of Computer Science and Engineering and Dr. Esther Daniel for their invaluable
I also thank all the staff members of the Department for extending their helping hands to
I would also like to thank all my friends and my parents who have prayed and helped me
This project proposes a disease prediction system built with machine learning. At present,
people have to visit a hospital in person even for minor problems, which is time consuming, and
arranging appointments by telephone is tedious. A disease prediction application can ease this
problem by giving users proper guidance on healthy living. Over the past decade, the use of
disease prediction tools has increased, along with general concern for health, because of the
variety of diseases and the low doctor-to-patient ratio. This system therefore concentrates on
giving users an immediate and accurate prediction of the disease from the symptoms they enter,
together with the severity of the predicted disease. Different machine learning algorithms are
used to keep the predictions quick and accurate. The system works through two channels. The
first, which is its primary function, cross-checks the entered symptoms against the database
and stores any symptom that is new; the second reports the severity of the predicted disease.
A web/Android application is deployed so that the system is portable, easy to configure, and
accessible remotely in places that doctors cannot easily reach. This arrangement therefore
makes health management easier.
Keywords: Machine Learning, Decision Tree Algorithm, Random Forest Algorithm, Naive
Bayes Algorithm.
1.1 INTRODUCTION
At present, a person suffering from a disease has to visit a doctor, which is both time
consuming and costly. If the person is out of reach of doctors and hospitals, the disease may
go unidentified. An automated program that carries out this process could save the patient both
time and money. There are existing heart disease prediction systems that use data mining
techniques to analyze the risk level of the patient.
Disease Predictor is a web-based application that predicts the user's disease from the symptoms
the user provides. Its data sets are collected from different health-related sites. With the
help of Disease Predictor, the user can learn the probability of a disease given the symptoms.
As internet use grows every day, people turn to it whenever a problem arises. More people have
access to the internet than to hospitals and doctors, and they have no immediate options when
they fall ill. This system can therefore help, because it is available over the internet
24 hours a day.
1.2 OBJECTIVE
It is estimated that more than 70% of people in India are prone to general body ailments such
as viral infections, flu, cough and cold every two months. Because many people do not realize
that these general ailments could be symptoms of something more harmful, an estimated 25% of
the population dies as a result of ignoring early general symptoms. This is an alarming
situation, so identifying or predicting a disease as early as possible is very important to
avoid unwanted casualties. The currently available systems
6 | Page Mini Project 2020-2021
are either dedicated to a particular disease or, for generalized disease prediction, still in
the research phase. The purpose of this system is to provide predictions for general, commonly
occurring diseases that can turn fatal if left unchecked. The system applies data mining
techniques and decision tree algorithms. It predicts the most probable disease from the given
symptoms and suggests the precautionary measures required to keep the disease from worsening;
it also helps doctors analyse the pattern of diseases present in society. In this project, the
disease prediction system carries out data mining in its preliminary stages and is then trained
using machine learning. I have used three different algorithms for this purpose and obtained an
accuracy of 92-95%. Such a system has great potential in the medical treatment of the future. I
have also designed an interactive interface for the system and have attempted to visualize the
results of this study.
1.3 MOTIVATION
Every machine learning pipeline is a set of operations that are executed to produce a model.
An ML model can be roughly defined as a mathematical representation of a real-world process. We
can think of it as a function that takes some input data and produces an output (a
classification, sentiment, recommendation, or cluster assignment). The performance of each
model is evaluated using metrics such as precision, recall and accuracy. ML can solve some
problems that are too complex to solve with traditional methods; for such problems, a
probabilistic solution implemented with machine learning may be the right approach. Disease
prediction using machine learning outputs a likely disease when the user enters their symptoms,
without a visit to the hospital. Many people notice a few symptoms, search for them online and
assume they have a chronic illness, which leads to unnecessary worry. An app of this kind has
personally helped me: it gave me a prediction that calmed my mind, and the prediction matched
the diagnosis I later received.
REQUIREMENT ANALYSIS
a. Display a list of symptoms from which users can select their symptoms.
b. Decision Tree, Random Forest and Naïve Bayes classifiers are used to classify the data sets.
The proposed project is economically feasible, as the only cost involved is that of hosting the
project. As the number of data samples increases, more time and processing power are consumed;
in that case a better processor might be needed.
The proposed project is operationally feasible, as the user only needs basic knowledge of
computers and the Internet. Disease Predictor is based on a client-server architecture in which
the client is the user and the server is the machine where the datasets are stored.
In existing systems the data set is typically small and limited to patients and diseases with
specific conditions. These systems are mostly designed for major diseases such as heart disease
and cancer. The pre-selected characteristics may not keep up with changes in a disease and its
influencing factors, which can lead to inaccurate results. As we live in a continuously
evolving world, the symptoms of diseases also evolve over time. In addition, most current
systems make users wait for long periods by having them answer lengthy questionnaires.
Here, I propose a system that offers a simple and elegant user interface and is also time
efficient. To make it less time consuming, the system follows a shorter, more specific
questionnaire. The aim is for this system to be a bridge between doctors and patients. The main
feature is machine learning: algorithms such as Naïve Bayes, K-Nearest Neighbours, Decision
Tree, Random Forest and Support Vector Machine are used to obtain accurate predictions, and
they are compared with one another to find which gives the fastest and most efficient result.
Another feature of the system is Doctor's Consultation. After delivering the results, the
system suggests that the user consult a doctor about the report. This feature not only
addresses the other class of users, the doctors, but also helps gain their trust by showing
that the system does not affect their business.
Training models is a fundamental process in machine learning projects. There are two main
approaches to machine learning: supervised learning and unsupervised learning. This project
uses supervised learning: the system is trained on the training set and is then asked to
predict values for the test set. We first apply different algorithms to the training dataset
and, based on each model's confidence and its accuracy on the testing dataset, select the best
algorithm to generate the final predictions.
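The selection procedure described above can be sketched roughly as follows. This is only an
illustration under stated assumptions: the symptom data is synthetic, the 75/25 split and the
model settings are arbitrary, and GaussianNB stands in for the naïve Bayes variant, which the
report does not specify.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: rows are patients, columns are symptoms (1 = present).
rng = np.random.default_rng(0)
patterns = np.array([
    [1, 1, 0, 0, 0, 0],  # typical symptoms of disease 0
    [0, 0, 1, 1, 0, 0],  # typical symptoms of disease 1
    [0, 0, 0, 0, 1, 1],  # typical symptoms of disease 2
])
y = np.repeat(np.arange(3), 100)
X = patterns[y]
# Flip about 5% of the entries so the task is not trivially separable.
noise = rng.random(X.shape) < 0.05
X = np.where(noise, 1 - X, X)

# Hold out a quarter of the data as the test set (split ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "naive_bayes": GaussianNB(),
}

# Train each model on the training set and score it on the held-out test set.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

# Keep the model with the highest test accuracy.
best_name = max(scores, key=scores.get)
print(scores, "best:", best_name)
```

Choosing the model on the same test set that is later used to report accuracy tends to give an
optimistic estimate; a separate validation split is a common alternative.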
The class diagram describes the classes used in the Disease Predictor. Three classes are used
in total:
Symptoms Reader: Reads the user input and creates the list of symptoms.
Symptoms Analyzer: Displays the subjective result according to the symptoms parameter.
Calculate Values: Calculates the probabilistic model of the diseases.
The sequence diagram describes the sequence of operations in the Disease Predictor. Initially
the system shows the symptoms available for selection. The user selects the symptoms and
submits them to the system. The Disease Predictor then predicts and displays the result.
In this project, standard libraries for data analysis and model creation are used. The
following libraries are used in this project.
1. tkinter: tkinter is the standard GUI library of Python. Combined with Python, it provides a
fast and easy way to create a GUI through a powerful object-oriented interface.
It provides various widgets for creating a GUI, some of the prominent ones being:
1. Button
2. Canvas
3. Label
4. Entry
5. Check Button
6. List box
7. Message
8. Text
9. Messagebox
Some of these were used in this project to create the GUI, namely messagebox, Button, Label,
OptionMenu and Text, along with the window title. Using tkinter we were able to create an
interactive GUI for our model.
2. Numpy: NumPy is the core library for scientific computing in Python. It provides
powerful tools for working with multi-dimensional arrays and is a general-purpose
array-processing package.
NumPy's main purpose is to deal with multidimensional homogeneous arrays. It has tools ranging
from array creation to array handling. An n-dimensional array can be created simply by calling
np.zeros(), and its contents can be handled with functions such as arange, random, save and
load. It also helps in array processing through methods such as sum, mean, std, max, min and
all.
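As a small illustration of the array creation and aggregation described above, the sketch below
encodes a set of selected symptoms as a 0/1 vector with np.zeros(). The symptom names and the
helper encode_symptoms are hypothetical; the project's real symptom list is not reproduced
here.

```python
import numpy as np

# Hypothetical symptom list used only for this illustration.
SYMPTOMS = ["fever", "cough", "headache", "fatigue", "nausea"]


def encode_symptoms(selected):
    """Return a 0/1 vector with a 1 at the position of each selected symptom."""
    vector = np.zeros(len(SYMPTOMS), dtype=int)
    for name in selected:
        vector[SYMPTOMS.index(name)] = 1
    return vector


patient = encode_symptoms(["fever", "cough"])
print(patient)         # [1 1 0 0 0]
print(patient.sum())   # 2 symptoms selected
print(patient.mean())  # 0.4, the fraction of symptoms selected
```

A vector of this form can be passed directly to a classifier as one row of input features.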
4. sklearn: Sklearn (scikit-learn) is an open-source Python library which implements a wide
range of machine-learning, pre-processing, cross-validation and visualization algorithms. It
features simple and efficient tools for data mining and data processing, including
classification, regression and clustering algorithms such as support vector machines, random
forest classifiers, decision trees, Gaussian naïve Bayes and KNN, to name a few.
In this project we have used sklearn to take advantage of its built-in classification
algorithms: decision tree, random forest classifier and naïve Bayes. We have also used its
built-in evaluation features such as the classification report, confusion matrix and accuracy
score.
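The evaluation features mentioned above might be used as in the following sketch. The label
names and predictions are made up for illustration and are not results from this project.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Hypothetical true and predicted disease labels for six test patients.
y_true = ["flu", "flu", "cold", "cold", "malaria", "malaria"]
y_pred = ["flu", "cold", "cold", "cold", "malaria", "malaria"]

labels = ["cold", "flu", "malaria"]
acc = accuracy_score(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred, labels=labels)

print(f"accuracy: {acc:.3f}")  # 5 of 6 predictions are correct
print(cm)  # rows are true labels, columns are predicted labels
print(classification_report(y_true, y_pred, labels=labels, zero_division=0))
```

The confusion matrix shows which diseases are mistaken for one another, which a single accuracy
figure hides.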
Random Forest is a supervised learning algorithm used for both classification and regression.
The algorithm works in four basic steps:
1. It draws random samples of data from the dataset.
2. It constructs a decision tree for every sample.
3. Each tree makes a prediction, and the predictions are collected as votes.
4. The most voted prediction is selected and presented as the result of the classification.
In this project we have used a random forest classifier, which gave an accuracy of ~95%.
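The four steps can be written out by hand as in the sketch below. It is a simplified
illustration on hypothetical noise-free data, not the code used in this project; the number of
trees and the helper forest_predict are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)

# Hypothetical toy data: two diseases, four symptoms (1 = present).
patterns = np.array([[1, 1, 0, 0], [0, 0, 1, 1]])
y = np.repeat([0, 1], 50)
X = patterns[y]

# Steps 1 and 2: draw random samples with replacement and fit one tree per sample.
trees = []
for seed in range(15):
    idx = rng.integers(0, len(X), size=len(X))
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=seed)
    trees.append(tree.fit(X[idx], y[idx]))


def forest_predict(sample):
    """Predict a class for one symptom vector by majority vote over the trees."""
    # Step 3: collect one vote per tree.
    votes = np.array([int(t.predict(sample.reshape(1, -1))[0]) for t in trees])
    # Step 4: return the most voted class.
    return int(np.bincount(votes).argmax())


print(forest_predict(np.array([1, 1, 0, 0])))  # 0
print(forest_predict(np.array([0, 0, 1, 1])))  # 1
```

Note that scikit-learn's RandomForestClassifier combines trees by averaging their predicted
class probabilities rather than by counting hard votes, so its output can differ slightly from
this hand-written version.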
Naïve Bayes is a family of algorithms based on Bayes' theorem. They share a common principle:
every pair of features is assumed to be independent of each other, given the class. The
algorithm also assumes that each feature makes an independent and equal contribution to the
prediction.
In this project we have used the naïve Bayes algorithm to obtain ~95% accurate predictions.
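The independence assumption can be made concrete with a small hand-written example. The sketch
below is a Bernoulli-style naïve Bayes with Laplace smoothing on hypothetical data; the symptom
columns, class labels and the helper posterior are illustrative assumptions and do not come
from this project.

```python
import numpy as np

# Hypothetical training data: columns are [fever, cough, rash], 1 = present.
X = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 1],
    [0, 0, 1],
])
y = np.array([0, 0, 0, 1, 1, 1])  # illustrative labels: 0 = "flu", 1 = "measles"

classes = np.unique(y)
# Prior P(class): the fraction of training rows in each class.
priors = np.array([(y == c).mean() for c in classes])
# P(symptom present | class), with Laplace smoothing to avoid zero probabilities.
likelihood = np.array(
    [(X[y == c].sum(axis=0) + 1) / ((y == c).sum() + 2) for c in classes]
)


def posterior(sample):
    """Posterior over classes, treating symptoms as independent given the class."""
    per_feature = np.where(sample == 1, likelihood, 1 - likelihood)
    joint = priors * per_feature.prod(axis=1)
    return joint / joint.sum()


probs = posterior(np.array([1, 1, 0]))
print(probs, "predicted class:", classes[probs.argmax()])  # about [0.96 0.04], class 0
```

Multiplying the per-symptom probabilities is exactly where the independence assumption enters:
each symptom contributes its own factor, regardless of the other symptoms.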
GUI
4.1 TESTING
The test case designed for the project is discussed below:
Test steps: 1. Select the checkbox from the list 2. Select submit
Expected Result: The symptoms selected should be submitted and further analyzed to calculate
the probability of the disease.
4.2 RESULTS
TRAINING SET
PREDICTED RESULT-1
PREDICTED RESULT-3
We set out to create a system which can predict disease on the basis of the symptoms given to
it. Such a system can decrease the rush at the OPDs of hospitals and reduce the workload on
medical staff. We were successful in creating such a system and used three different algorithms
to do so. On average we achieved an accuracy of ~95%, which suggests that such a system can be
largely reliable for this task. While creating the system, we also added a way to store the
data entered by the user in the database, which can be used in future to help create a better
version of the system. Our system also has an easy-to-use interface and provides various visual
representations of the data collected and the results achieved.
FURTHER SCOPE
● Facility for modifying user details.
● More interactive user interface.
● Facilities for Backup creation.
● Can be implemented as a Web page.
● Can be implemented as a Mobile Application.
● More Details and Latest Diseases.