DISEASE PREDICTION USING MACHINE LEARNING

A mini project report submitted by

V. SHARON ROSE (URK18CS178)

in partial fulfillment for the award of the degree


of

BACHELOR OF TECHNOLOGY

in

COMPUTER SCIENCE AND ENGINEERING

under the supervision of

Dr. ESTHER DANIEL, Assistant Professor

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
KARUNYA INSTITUTE OF TECHNOLOGY AND SCIENCES
(Declared as Deemed to be University -under Sec-3 of the UGC Act, 1956)
Karunya Nagar, Coimbatore - 641 114. INDIA

MARCH 2021

1 | Page Mini Project 2020-2021


DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

BONAFIDE CERTIFICATE

This is to certify that the project report entitled, “Disease Prediction using Machine Learning”
is a bonafide record of Mini Project work done during the even semester of the academic year
2020-2021 by

V. SHARON ROSE (Reg. No: URK18CS178)

in partial fulfillment of the requirements for the award of the degree of Bachelor of Technology
in Computer Science and Engineering of Karunya Institute of Technology and Sciences.

Submitted for the Viva Voce held on

Project Coordinator Signature of the Guide



ACKNOWLEDGEMENT

First and foremost, I praise and thank ALMIGHTY GOD whose blessings have bestowed

in me the will power and confidence to carry out my project.

I am grateful to our beloved founders Late. Dr. D.G.S. Dhinakaran, C.A.I.I.B, Ph.D and

Dr. Paul Dhinakaran, M.B.A, Ph.D, for their love and always remembering us in their prayers.

I extend my thanks to our Vice Chancellor Dr. P. Mannar Jawahar, Ph.D and our

Registrar Dr. Elijah Blessing, M.E., Ph.D, for giving me this opportunity to do the project.

I would like to thank Dr. Prince Arulraj, M.E., Ph.D., Dean, School of Engineering and

Technology for his direction and invaluable support to complete the same.

I would like to place my heart-felt thanks and gratitude to Dr. J. Immanuel John Raja,

M.E., Ph.D., Head of the Department, Computer Science and Engineering for his

encouragement and guidance.

I feel it is a pleasure to be indebted to Mr. J. Andrew, M.E, (Ph.D.), Assistant Professor,

Department of Computer Science and Engineering and Dr. Esther Daniel for their invaluable

support, advice and encouragement.

I also thank all the staff members of the Department for extending their helping hands to

make this project a successful one.

I would also like to thank all my friends and my parents who have prayed and helped me

during the project work.



ABSTRACT

This project proposes a disease prediction system based on machine learning. At present, users
have to visit a hospital in person even for minor problems, which is time consuming, and handling
telephone calls for appointments is hectic. A disease prediction application can ease this problem
by giving users proper guidance regarding healthy living. Over the past decade, the use of disease
prediction tools has increased because of the variety of diseases and the low doctor-to-patient
ratio. This system therefore concentrates on giving users an immediate and accurate prediction of
the disease from the symptoms they enter, along with the severity of the predicted disease.
Different machine learning algorithms are used to ensure quick and accurate predictions. The
system works through two channels. The first, which is its primary work, cross-checks the entered
symptoms against the database and stores any new symptom in the database. The second provides the
severity of the predicted disease. A web/Android application is deployed so that users can carry
it easily, configure it and access it remotely in places that doctors cannot easily reach.
This arrangement therefore helps in easier health management.

Keywords: Machine Learning, Decision Tree Algorithm, Random Forest Algorithm, Naive
Bayes Algorithm.



CONTENTS
Acknowledgement
Abstract
1. Introduction
1.1 Introduction
1.2 Objective
1.3 Motivation
1.4 Overview of the Project
1.5 Chapter wise Summary
2. Analysis and Design
2.1 Functional Requirements
2.2 Non-Functional Requirements
2.3 Feasibility Analysis
2.4 Existing System
2.5 Proposed System
2.6 System Design
3. Implementation
3.1 Libraries Used
3.2 Algorithm Models
3.3 Modules
4. Test results/experiments/verification
4.1 Testing
4.2 Results
5. Conclusions and Further Scope



CHAPTER 1
INTRODUCTION

1.1 INTRODUCTION

At present, a person suffering from a disease has to visit a doctor, which is both time consuming
and costly. If the user is out of reach of doctors and hospitals, the disease may not be
identified at all. An automated program that completes this process can save time as well as
money and make things easier for the patient. Other systems exist, such as heart disease
prediction systems that use data mining techniques to analyze the risk level of the patient.

Disease Predictor is a web based application that predicts the disease of the user with respect to
the symptoms given by the user. The Disease Prediction system has data sets collected from
different health related sites. With the help of Disease Predictor the user will be able to know the
probability of the disease with the given symptoms.

As the use of the internet grows every day, people are curious to learn new things and turn to
the internet whenever a problem arises. People have access to the internet more readily than to
hospitals and doctors, and they do not have immediate options when they suffer from a particular
disease. This system can therefore help people, as they have access to the internet around the
clock.

1.2 OBJECTIVE

It is estimated that more than 70% of people in India are prone to general body diseases such as
viral infections, flu, cough and cold every two months. Because many people do not realize that
these general ailments can be symptoms of something more harmful, 25% of the population succumbs
to death after ignoring the early general body symptoms. This situation is dangerous and alarming
for the population. Hence, identifying or predicting the disease as early as possible is very
important to avoid casualties. The currently available systems are either dedicated to a
particular disease or, for generalized disease, still in the research phase for algorithms. The
purpose of this system is to provide prediction for the general and more commonly occurring
diseases that can turn fatal when left unchecked. The system applies data mining techniques and
decision tree algorithms. It predicts the most probable disease based on the given symptoms,
together with the precautionary measures required to keep the disease from worsening, and it also
helps doctors analyse the pattern of diseases present in society. In this project, the disease
prediction system carries out data mining in its preliminary stages and is then trained using
machine learning. I have used three different algorithms for this purpose and obtained an accuracy
of 92-95%. Such a system has very large potential in the medical treatment of the future. I have
also designed an interactive interface to facilitate interaction with the system, and I have
attempted to visualize the results of this study and project.

1.3 MOTIVATION
Every machine learning pipeline is a set of operations, which are executed to produce a model.
An ML model is roughly defined as a mathematical representation of a real-world process. We
might think of the ML model as a function that takes some input data and produces an output
(classification, sentiment, recommendation, or clusters). The performance of each model is
evaluated by using evaluation metrics, such as precision, recall and accuracy. ML might solve
some problems, which can be too complex to be solved traditionally. For such problems, a
probabilistic solution that is implemented by using machine learning, might be the right way to
pursue. Disease prediction using machine learning outputs a disease when the user enters the
symptoms, without the user having to go to a hospital. Many people notice symptoms, google them
and assume that they have a chronic illness, which leads to unnecessary worry. This kind of app
has personally helped me reach a conclusion that calmed my mind, and that conclusion matched the
diagnosis I later received.



1.4 OVERVIEW OF THE PROJECT
The world is passing through a purple patch of technology, with increasing demand for
intelligence and accuracy behind it. People today are more likely to be addicted to the Internet
than concerned about their personal health. In the 21st century, humans are surrounded by
technology, which has become part of day-to-day life. Alongside this, we always look after our
health and our earned valuables. People avoid going to hospital for small problems, which may
develop into a major disease in the future. Question-and-answer forums are becoming a simple way
to answer such queries, rather than browsing through a list of potentially relevant documents
from the web. Our basic idea is to develop a system that takes symptoms as input from the user,
predicts the disease and gives details of the predicted disease along with its severity. The
system will compare the symptoms with the datasets provided in the database. The main feature is
machine learning: we will use the Naïve Bayes, Decision Tree and Random Forest algorithms to
predict the disease accurately, and we will compare them to find which gives a faster and more
efficient result. According to the literature survey, this algorithm results in the maximum
accuracy for a larger dataset. The dataset contains diseases as labels and, for each disease, the
associated symptoms. 70% of the dataset will be used for training and the remaining 30% for
testing. Training and testing are carried out on the dataset to obtain the desired output.
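
The 70/30 split described above can be sketched as follows. This is a minimal illustration in plain Python and not the project's own code; the function name `split_70_30`, the fixed seed and the toy records are assumptions made for the example.

```python
import random

def split_70_30(records, seed=42):
    """Shuffle a copy of the records and return (training, testing) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split repeatable
    cut = (len(shuffled) * 7) // 10         # the first 70% goes to training
    return shuffled[:cut], shuffled[cut:]

# Hypothetical records: (symptom vector, disease label)
records = [([i % 2, (i // 2) % 2], "disease_%d" % (i % 3)) for i in range(10)]
train, test = split_70_30(records)
print(len(train), "training records,", len(test), "testing records")
```

Shuffling before cutting matters when the dataset is ordered by disease, since an unshuffled cut could leave some diseases entirely out of the training part.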

1.5 CHAPTER WISE SUMMARY


Chapter 1 talks about the introduction, objective, overview and motivation of the project.
Chapter 2 talks about the requirement analysis and architecture of the system. Chapter 3 talks
about the implementation part, the libraries used, algorithm models and modules used in the
program. Chapter 4 is about the testing and results predicted by the proposed system. Chapter 5
is the final chapter and talks about the conclusion and future scope of the project.



CHAPTER 2
ANALYSIS AND DESIGN

REQUIREMENT ANALYSIS

2.1 FUNCTIONAL REQUIREMENTS


a. Predict disease with the given symptoms.

b. Compare the given symptoms with the input datasets

2.2 NON-FUNCTIONAL REQUIREMENTS

a. Display the list of symptoms where users can select the symptoms.

b. Decision Tree, Random Forest and Naïve Bayes classifier are used to classify the data sets.

2.3 FEASIBILITY ANALYSIS

2.3.1 TECHNICAL FEASIBILITY


The proposed project is technically feasible as it can be built using existing technologies. The
application is written in Python and its interface is built with the Tkinter GUI toolkit. The
technology required by Disease Predictor is available and hence it is technically feasible.

2.3.2 ECONOMIC FEASIBILITY

The proposed project is economically feasible as the only cost involved is the hosting of the
project. As the number of data samples increases, more time and processing power are consumed,
and a better processor might then be needed.



2.3.3 OPERATIONAL FEASIBILITY

The proposed project is operationally feasible as the user has basic knowledge about computers
and the Internet. Disease Predictor is based on client-server architecture where client is user and
server is the machine where datasets are stored.

2.4 EXISTING SYSTEM

In the existing systems, the data set is typically small and covers patients and diseases with
specific conditions. These systems are mostly designed for major diseases such as heart disease
and cancer. The pre-selected characteristics may sometimes fail to reflect changes in the disease
and its influencing factors, which can lead to inaccurate results. As we live in a continuously
evolving world, the symptoms of diseases also evolve over time. In addition, most current systems
make users wait for long periods by making them answer lengthy questionnaires.

2.5 PROPOSED SYSTEM

Here, I propose a system with a simple and elegant user interface that is also time efficient. To
make it less time consuming, the system follows a more specific questionnaire. The aim of this
system is to be the connecting bridge between doctors and patients. The main feature is machine
learning: algorithms such as Naïve Bayes, K-Nearest Neighbours, Decision Tree, Random Forest and
Support Vector Machine help in getting accurate predictions, and comparing them shows which
algorithm gives a faster and more efficient result. Another feature of the system is Doctor's
Consultation. After delivering the results, the system also suggests that the user consult a
doctor about the report. This feature not only addresses the other class of users, i.e. the
doctors, but also gains their trust, as it shows that the system does not affect their business.



As shown in the above figure, the raw data from the original dataset is passed onto the first phase
i.e. Data pre-processing. In Data pre-processing this raw data is then cleaned of all redundancies,
missing values etc. The new clean data is fit for training different algorithmic models on it.

The process of training models is a fundamental process in Machine learning Projects. There are
two approaches to machine learning mainly Supervised Learning and Unsupervised Learning.
The system is trained using the Training set and then the model is asked to predict new values
based on the test set. In our system we aim at first applying different algorithms on the training
dataset and based on the model’s Confidence and testing dataset accuracy, we select the best
model algorithm and apply it on testing dataset to generate accurate results.
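
The model selection step described above could look roughly like the sketch below. It assumes scikit-learn and NumPy are available, and it uses a small synthetic dataset as a stand-in for the cleaned data; the three disease patterns, the noise level and the variable names are all made up for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the cleaned dataset: three diseases (0, 1, 2),
# each with a typical 0/1 pattern over six symptoms, plus 10% random noise.
patterns = np.array([[1, 1, 0, 0, 0, 0],
                     [0, 0, 1, 1, 0, 0],
                     [0, 0, 0, 0, 1, 1]])
y = rng.integers(0, 3, size=300)
X = patterns[y]
noise = rng.random(X.shape) < 0.1
X = np.where(noise, 1 - X, X)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "naive_bayes": GaussianNB(),
}

# Train every candidate on the training set and score it on the test set
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)   # mean accuracy on the test set

best_name = max(scores, key=scores.get)
print(scores, "->", best_name)
```

Selecting on test accuracy follows the description above. A stricter setup would hold out a separate validation set for the selection so that the final test score stays unbiased.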



2.6 SYSTEM DESIGN

2.6.1 CLASS DIAGRAM

Fig 2. Class Diagram

It explains the classes used in the Disease Predictor. There are three classes in total:
Symptoms Reader: Reads the user input and creates the list of symptoms.
Symptoms Analyzer: Displays the subjective result according to the symptoms parameter.
Calculate Values: Calculates the probabilistic model of the diseases.



2.6.2 STATE DIAGRAM

Fig 3. State Diagram


It explains the different states of the system. First the user opens the Disease Predictor and
selects the symptoms. When finished selecting symptoms, the user submits them.
Disease Predictor analyzes the symptoms and displays the result.



2.6.3 SEQUENCE DIAGRAM

Fig 4. Sequence Diagram

It explains the sequence of the Disease Predictor. Initially the system shows the symptoms to be
selected. The user selects the symptoms and submits them to the system. The Disease Predictor
then predicts and displays the result.



CHAPTER 3
IMPLEMENTATION

3.1 LIBRARIES USED

In this project, the standard libraries for data analysis and model creation are used. The
following are the libraries used in this project.
1. tkinter: It is the standard GUI library of Python. Python combined with tkinter provides a
fast and easy way to create a GUI, and tkinter offers a powerful object-oriented interface for
building one.

It provides various widgets to create GUI some of the prominent ones being:
1. Button
2. Canvas
3. Label
4. Entry
5. Check Button
6. List box
7. Message
8. Text
9. Messagebox
Some of these were used in this project to create our GUI namely messagebox, button, label,
Option Menu, text and title. Using tkinter we were able to create an interactive GUI for our
model.
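
A minimal sketch of how such a GUI might be put together is given below. It is not the project's actual interface: the symptom list, the helper `selected_to_vector`, the function `build_gui` and the `predict` callback are hypothetical names introduced only for this example. tkinter is imported inside `build_gui` so that the helper can be used on a machine without a display.

```python
SYMPTOMS = ["fever", "cough", "headache"]   # hypothetical symptom list

def selected_to_vector(selected, symptoms=SYMPTOMS):
    """Turn the chosen symptom names into a 0/1 vector for a model."""
    return [1 if s in selected else 0 for s in symptoms]

def build_gui(predict):
    """Build a small window; `predict` maps a 0/1 vector to a disease name."""
    import tkinter as tk
    from tkinter import messagebox

    root = tk.Tk()
    root.title("Disease Predictor")
    tk.Label(root, text="Select up to two symptoms").pack()

    # One OptionMenu per symptom slot, each backed by a StringVar
    choices = [tk.StringVar(root, value="None") for _ in range(2)]
    for var in choices:
        tk.OptionMenu(root, var, "None", *SYMPTOMS).pack()

    result = tk.Text(root, height=1, width=30)

    def on_submit():
        selected = {var.get() for var in choices} - {"None"}
        if not selected:
            messagebox.showinfo("Disease Predictor",
                                "Please select at least one symptom.")
            return
        result.delete("1.0", tk.END)
        result.insert(tk.END, predict(selected_to_vector(selected)))

    tk.Button(root, text="Predict", command=on_submit).pack()
    result.pack()
    return root
```

Calling `build_gui(my_predict).mainloop()`, where `my_predict` is any function that returns a disease name for a symptom vector, would open the window.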
2. Numpy: Numpy is the core library for scientific computing in Python. It provides
powerful tools for working with multi-dimensional arrays and is a general-purpose
array processing package.
Numpy's main purpose is to deal with multidimensional homogeneous arrays. It has tools
ranging from array creation to array handling. It makes it easy to create an n-dimensional array
just by using np.zeros(), or to build and store arrays using functions such as arange, random,
save and load. It also helps in array processing using methods like sum, mean, std, max, min,
all, etc.



Arrays created with numpy also behave differently from ordinary Python lists when they are
operated upon using operators such as +, -, * and /: the operation is applied element by element.
All these qualities make numpy arrays highly suitable for our purpose of handling data. Data
manipulation on arrays must give the desired results when predicting outputs, and this requires
such operational capabilities.
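
The short example below illustrates the numpy behaviour described above. The array contents are arbitrary values chosen for the illustration.

```python
import numpy as np

symptoms = np.zeros(5, dtype=int)     # five symptoms, none selected yet
symptoms[[0, 3]] = 1                  # mark the first and fourth symptom as present

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

print(a + b)                          # element-wise sum
print([1, 2, 3] + [10, 20, 30])       # plain lists are concatenated instead
print(a * b, b / a)                   # element-wise product and division
print(symptoms.sum(), b.mean(), b.max(), symptoms.all())
```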
3. pandas: It is one of the most popular Python libraries for data analysis. It provides highly
optimized performance, with back-end source code written in C or Python.
Data in pandas can be analysed in two ways:
1. Series
2. Dataframes
A Series is a one-dimensional array defined in pandas that can store any data type.
A Dataframe is a two-dimensional data structure that stores data in rows and columns.
Pandas dataframes are used extensively in this project to hold the datasets required for training
and testing the algorithms. Dataframes make it easier to work with attributes and results.
Several inbuilt functions, such as replace, were used in our project for data manipulation and
preprocessing.
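
As an illustration of a Series, a Dataframe and the replace function, consider the sketch below. The column names, the toy rows and the use of -1 as a "not recorded" marker are assumptions for the example and are not taken from the project's dataset.

```python
import pandas as pd

# A Series: one labelled column of values
fever = pd.Series([1, 0, -1], name="fever")

# A DataFrame: rows are patient records, columns are symptoms plus the label
df = pd.DataFrame({
    "fever":     [1, 0, -1],
    "cough":     [-1, 1, 1],
    "prognosis": ["flu", "cold", "flu"],
})

symptom_cols = ["fever", "cough"]
# Treat the hypothetical "not recorded" marker -1 as "symptom absent"
df[symptom_cols] = df[symptom_cols].replace(-1, 0)

X = df[symptom_cols]        # features passed to a classifier
y = df["prognosis"]         # labels the classifier should learn
print(df)
```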

4. sklearn: Sklearn is an open source Python library that implements a wide range of machine
learning, pre-processing, cross-validation and evaluation tools. It features various simple and
efficient tools for data mining and data processing, along with classification, regression and
clustering algorithms such as support vector machines, random forest classifiers, decision trees,
Gaussian naïve Bayes and KNN, to name a few.
In this project we have used sklearn to take advantage of inbuilt classification algorithms like
decision tree, random forest classifier and naïve Bayes. We have also used its inbuilt evaluation
features such as classification report, confusion matrix and accuracy score.
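
The three evaluation utilities named above can be tried on a handful of hand-written labels, as in the sketch below. The true and predicted labels are invented for the illustration and are not results from this project.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Invented labels: what the test set says versus what a model predicted
y_true = ["flu", "flu", "cold", "cold"]
y_pred = ["flu", "cold", "cold", "cold"]

accuracy = accuracy_score(y_true, y_pred)           # fraction of correct predictions
# Rows are true classes, columns are predicted classes, in the order given by `labels`
matrix = confusion_matrix(y_true, y_pred, labels=["cold", "flu"])
report = classification_report(y_true, y_pred)      # precision, recall and F1 per class

print(accuracy)
print(matrix)
print(report)
```

Here three of the four predictions are correct, and the confusion matrix shows that the single error is a flu case predicted as cold.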



3.2 ALGORITHM MODELS
There are three different kinds of models in our project to predict the disease. These are:
Decision tree
Random forest
Gaussian Naïve Bayes
Decision tree is a very effective and versatile classification technique. It is used in pattern
recognition and image classification. It is used for classification in very complex problems due
to its high adaptability, and it is also capable of handling problems of higher dimensionality.
It mainly consists of three parts: root, nodes and leaves.
The root holds the attribute that has the most effect on the outcome, each internal node tests
the value of a certain attribute, and each leaf gives the output of the tree.
Decision tree is the first prediction method we have used in this project. It gives us an accuracy
of ~95%.
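
To make the idea of the root attribute concrete, the sketch below picks the attribute with the highest information gain on a four-record toy dataset. Information gain is one common splitting criterion and is an assumption here; the report does not state which criterion was used, and scikit-learn's DecisionTreeClassifier uses Gini impurity by default. The records and symptom names are invented.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Reduction in entropy obtained by splitting the records on one attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Invented toy records: 1 means the symptom is present
rows = [
    {"fever": 1, "cough": 1, "rash": 0},
    {"fever": 1, "cough": 0, "rash": 0},
    {"fever": 1, "cough": 0, "rash": 1},
    {"fever": 0, "cough": 0, "rash": 1},
]
labels = ["flu", "flu", "measles", "measles"]

gains = {attribute: information_gain(rows, labels, attribute) for attribute in rows[0]}
root = max(gains, key=gains.get)    # the attribute that best separates the diseases
print(root, gains)
```

In this toy data the rash attribute separates the two diseases perfectly, so it has the highest gain and becomes the root.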

Random Forest Algorithm is a supervised learning algorithm used for both classification and
regression. This algorithm works on 4 basic steps –
1. It chooses random data samples from a dataset.
2. It constructs decision trees for every sample dataset chosen.
3. At this step every predicted result will be compiled and voted on.
4. At last most voted predictions will be selected and be presented as the result of classification.
In this project we have used a random forest classifier and the result given is ~95% accuracy.
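
The four steps above can be sketched by hand as follows. This is only an illustration of the idea, assuming NumPy and scikit-learn are available; in practice scikit-learn's RandomForestClassifier performs these steps internally. The helper names `fit_forest` and `predict_forest` and the toy data are made up for the example.

```python
from collections import Counter

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_forest(X, y, n_trees=15, seed=0):
    """Steps 1 and 2: draw a bootstrap sample and fit one tree per sample."""
    rng = np.random.default_rng(seed)
    trees = []
    for i in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))   # sample rows with replacement
        tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
        trees.append(tree.fit(X[idx], y[idx]))
    return trees

def predict_forest(trees, X):
    """Steps 3 and 4: collect every tree's prediction and return the majority vote."""
    votes = np.array([tree.predict(X) for tree in trees])   # shape (n_trees, n_rows)
    return np.array([Counter(column).most_common(1)[0][0] for column in votes.T])

# Noise-free toy data: 30 copies of each disease's symptom pattern
patterns = np.array([[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1]])
X = np.repeat(patterns, 30, axis=0)
y = np.repeat(np.array([0, 1, 2]), 30)

trees = fit_forest(X, y)
print(predict_forest(trees, patterns))
```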

Naïve Bayes is a family of algorithms based on Bayes' theorem. They share a common principle:
every pair of features is assumed to be independent of each other, given the class. The method
also assumes that each feature makes an independent and equal contribution to the
prediction.
In this project we have used the naïve Bayes algorithm to obtain a ~95% accurate prediction.
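
The independence assumption can be seen in a small hand-written classifier. Note that this sketch uses a Bernoulli-style variant that counts 0/1 symptoms with Laplace smoothing, which is simpler to follow than the Gaussian variant listed above; the function names and the six training records are invented for the illustration.

```python
from collections import defaultdict

def train_naive_bayes(rows, labels, alpha=1.0):
    """Estimate P(class) and P(symptom present | class) with Laplace smoothing."""
    n_features = len(rows[0])
    class_count = defaultdict(int)
    present_count = defaultdict(lambda: [0] * n_features)
    for row, label in zip(rows, labels):
        class_count[label] += 1
        for j, value in enumerate(row):
            present_count[label][j] += value
    priors = {c: n / len(rows) for c, n in class_count.items()}
    likelihoods = {
        c: [(present_count[c][j] + alpha) / (class_count[c] + 2 * alpha)
            for j in range(n_features)]
        for c in class_count
    }
    return priors, likelihoods

def posterior(row, priors, likelihoods):
    """Multiply the prior by one factor per symptom, then normalise."""
    scores = {}
    for c, prior in priors.items():
        p = prior
        for j, value in enumerate(row):
            p *= likelihoods[c][j] if value else 1 - likelihoods[c][j]
        scores[c] = p
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

# Invented records with two symptoms: (fever, cough)
rows = [(1, 1), (1, 1), (1, 0), (0, 1), (0, 1), (1, 1)]
labels = ["flu", "flu", "flu", "cold", "cold", "cold"]

priors, likelihoods = train_naive_bayes(rows, labels)
result = posterior((1, 0), priors, likelihoods)      # fever without cough
print(max(result, key=result.get), result)
```

Each symptom contributes its own factor to the product, which is exactly the "independent contribution" assumption stated above.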



3.3 MODULES
COLUMN DISTRIBUTION



CORRELATION MATRIX



SCATTER MATRIX



DECISION TREE ALGORITHM

RANDOM FOREST ALGORITHM



NAIVE BAYES ALGORITHM

GUI



THE DEVELOPED GUI



CHAPTER 4
TEST RESULTS

4.1 TESTING
The test case designed for the project is discussed below:

Test Case- I: Submit the symptoms from the list

Precondition: The application is open.

Assumptions: The symptoms for the disease are available

Test steps: 1. Select the checkbox from the list 2. Select submit

Expected Result: The symptoms selected should be submitted and further analyzed to calculate
the probability of the disease.
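
Test Case I could also be automated along the lines of the sketch below. The function `submit_symptoms` is a hypothetical stand-in for the application's submit step, and the symptom list is invented; neither is taken from the project's code.

```python
import unittest

SYMPTOMS = ["fever", "cough", "headache"]      # hypothetical symptom list

def submit_symptoms(selected):
    """Hypothetical submit step: validate the selection and encode it as 0/1."""
    unknown = set(selected) - set(SYMPTOMS)
    if unknown:
        raise ValueError("unknown symptoms: %s" % sorted(unknown))
    return [1 if s in selected else 0 for s in SYMPTOMS]

class SubmitSymptomsTest(unittest.TestCase):
    def test_selected_symptoms_are_submitted(self):
        # Steps 1 and 2 of the test case: select symptoms, then submit
        self.assertEqual(submit_symptoms(["fever", "headache"]), [1, 0, 1])

    def test_unknown_symptom_is_rejected(self):
        with self.assertRaises(ValueError):
            submit_symptoms(["sneezing"])
```

Saved in a file, these tests can be run with `python -m unittest`.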

4.2 RESULTS

TRAINING SET



TESTING SET

PREDICTED RESULT-1



PREDICTED RESULT-2

PREDICTED RESULT-3



RESULTS



CHAPTER 5
CONCLUSION

We set out to create a system which can predict disease on the basis of the symptoms given to it.
Such a system can decrease the rush at the OPDs of hospitals and reduce the workload on medical
staff. We were successful in creating such a system and used three different algorithms to do so.
On average we achieved an accuracy of ~95%, so such a system can be largely relied upon to do the
job. While creating this system we also added a way to store the data entered by the user in the
database, which can be used in future to help create a better version of the system. Our
system also has an easy-to-use interface and various visual representations of the data collected
and the results achieved.

FURTHER SCOPE
● Facility for modifying user details.
● More interactive user interface.
● Facilities for Backup creation.
● Can be implemented as a Web page.
● Can be implemented as a Mobile Application.
● More Details and Latest Diseases.



