
A

MAJOR PROJECT REPORT

ON

CUSTOMER CHURN PREDICTION IN BANKING
INDUSTRY USING DATA MINING

Submitted to
Osmania University, Hyderabad
In partial fulfillment of the requirement for the award of the degree of

BACHELOR OF ENGINEERING
In

COMPUTER SCIENCE AND ENGINEERING – AI&ML

Submitted by
MOHAMMED SAIF UR RAHMAN (160920748100)

Under The Guidance Of


Mrs. SHEREEN UZMA
Assistant Professor

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING - AI&ML


LORDS INSTITUTE OF ENGINEERING AND TECHNOLOGY
(Approved by AICTE-New Delhi and Accredited by NAAC ‘A’ grade. Affiliated to OU-HYD)
2023-2024
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING - AI&ML
LORDS INSTITUTE OF ENGINEERING AND TECHNOLOGY
(Approved by AICTE-New Delhi and Accredited by NAAC ‘A’ grade. Affiliated to OU-HYD)

CERTIFICATE

This is to certify that the major project work entitled “CUSTOMER CHURN
PREDICTION IN BANKING INDUSTRY USING DATA MINING” is submitted by
Mohammed Saif Ur Rahman (160920748100) in partial fulfillment for the award of the degree of
Bachelor of Engineering in “Computer Science and Engineering – AI&ML” of Osmania University,
Hyderabad during the academic year 2023-2024. The project report has been approved as it satisfies the
academic requirements in respect of project work prescribed for the Bachelor of Engineering. The results
embodied in this project report have not been submitted, either in part or in full, for the award of any other
degree in this institute or any other institute or university.

Mrs. Shereen Uzma Dr. Abdul Rasool MD


ASSISTANT PROFESSOR HEAD OF THE DEPARTMENT
CSE - (AI&ML)

PRINCIPAL EXTERNAL EXAMINER

DECLARATION

We hereby declare that the project report entitled “CUSTOMER CHURN
PREDICTION IN BANKING INDUSTRY USING DATA MINING”, submitted in partial
fulfillment of the requirement for the award of the degree of Bachelor of Engineering in
“Computer Science and Engineering - AI&ML” at the Lords Institute of Engineering and
Technology, affiliated to Osmania University, Hyderabad, is a record of bona fide project
work carried out by us under the guidance of Mrs. Shereen Uzma.

We further declare that the work reported in this project and the results embodied in this
project report have not been submitted, either in part or in full, for the award of any other
degree in this institute or any other institute or university.

MOHAMMED SAIF UR RAHMAN (160920748100)

DATE:
PLACE: HYDERABAD

ACKNOWLEDGEMENT

While developing the project and preparing the thesis, we were helped by many people.
We avail this opportunity to express our profound sense of gratitude to all who rendered their
valuable help and time in completing the project on time and with quality.

We would like to express our immense gratitude and sincere thanks to the Management of
Lords Institute of Engineering and Technology for providing the infrastructure, necessary
equipment, support and excellent academic environment required during the research
and development of the project.

We are thankful to DR. RAVI KISHORE SINGH, the Principal of Lords Institute of
Engineering and Technology for providing the necessary guidance to pursue and successfully
complete the project on time with quality.

We are most obliged and grateful to DR. ABDUL RASOOL MD, Head of the
Department, Computer Science and Engineering - AI&ML, Lords Institute of Engineering
and Technology, Hyderabad, for his valuable guidance, keen and sustained interest and
encouragement throughout the implementation of this project.

We express our deepest sense of gratitude to our internal guide Mrs. SHEREEN UZMA,
Assistant Professor, Lords Institute of Engineering and Technology, for her valuable advice
and guidance in the critical review, project implementation and thesis preparation.

We express our gratitude to all the other faculty members of the CSE (AI&ML) Department
who helped us in learning, implementation and project execution.

Finally, we acknowledge with gratitude the unflagging support and patience of our
parents, and their guidance and encouragement during this project work.

MOHAMMED SAIF UR RAHMAN (160920748100)

CUSTOMER CHURN
PREDICTION IN BANKING
INDUSTRY USING DATA
MINING

ABSTRACT

A new method for customer churn analysis and prediction is proposed. The method uses a
data mining model in the banking industry and is motivated by the fact that around 1.5
million customers churn in a year, a number that increases every year. Churn prediction is the
activity of predicting whether a customer will leave the company or not. One way to predict
customer churn is to use a classification technique from data mining that produces a machine
learning model. This study tested 5 different classification methods with a dataset consisting of 57
attributes. Experiments were carried out several times using comparisons between different class
distributions. A Support Vector Machine (SVM) with 50:50 class-sampled data proved to be the best method for
predicting churn customers at a private bank in Indonesia. The results of this modeling can be utilized
by the company to apply strategic actions that prevent customer churn.

INDEX

Contents Page No.

Certificate i
Declaration ii
Acknowledgement iii
Abstract v
Index vi
List of Figures ix

CHAPTER-1: INTRODUCTION 1

INTRODUCTION 1

CHAPTER-2: LITERATURE SURVEY 5

LITERATURE REVIEW 5

CHAPTER-3: FEASIBILITY STUDY 9

ECONOMIC FEASIBILITY 9

TECHNICAL FEASIBILITY 9

SOCIAL FEASIBILITY 10

CHAPTER-4: SYSTEM ANALYSIS 11

EXISTING SYSTEM 11

DISADVANTAGES OF EXISTING SYSTEM 11

PROPOSED SYSTEM 12

ADVANTAGES OF PROPOSED SYSTEM 12

CHAPTER-5: SYSTEM SPECIFICATION 13

HARDWARE SPECIFICATIONS 13

SOFTWARE SPECIFICATIONS 13
CHAPTER-6: SYSTEM DESIGN 14

SYSTEM ARCHITECTURE 14

DATAFLOW DIAGRAM 14

UML DIAGRAMS 16

USE CASE DIAGRAM 17

CLASS DIAGRAM 18

SEQUENCE DIAGRAM 19

ACTIVITY DIAGRAM 20

CHAPTER-7: MODULES 21

MODULES 21

MODULE DESCRIPTION 21

USER 21

ADMIN 22

DATA PREPROCESSING 22

MODELS 22

PYTHON 23

DJANGO 23

CHAPTER-8: IMPLEMENTATION AND CODE 33

INPUT DESIGN 33

OBJECTIVES 33

OUTPUT DESIGN 34
SAMPLE CODE 35

USER VIEW CODE 35

ADMIN VIEW CODE 43


CHAPTER-9: TESTING 47

UNIT TESTING 47

INTEGRATION TESTING 47

FUNCTIONAL TEST 48

SYSTEM TEST 48

WHITE BOX TESTING 48

BLACK BOX TESTING 49

UNIT TESTING WITH RESPECT TO PROJECT 49

INTEGRATION TESTING WITH RESPECT TO PROJECT 50

ACCEPTANCE TESTING WITH RESPECT TO PROJECT 50

TEST CASES 51

CHAPTER-10: SCREENSHOTS 52

CHAPTER-11: CONCLUSION AND FUTURE SCOPE 61

CONCLUSION 61

FUTURE SCOPE 61

CHAPTER-12: BIBLIOGRAPHY 62

LIST OF FIGURES

S.no Figure No. Name of the Figure Page No.

1. 6.1 System Architecture 14

2. 6.2 Data Flow Diagram 15

3. 6.3 Use Case Diagram 17

4. 6.4 Class Diagram 18

5. 6.5 Sequence Diagram 19

6. 6.6 Activity Diagram 20

7. 7.1 Django Architecture 24

8. 7.2 Project Structure 24

9. 10.1 Home Page 52

10. 10.2 User Register Form 52

11. 10.3 User Login Form 53

12. 10.4 User Home Page 53

13. 10.5 View Dataset 54

14. 10.6 Logistic Regression Results 54

15. 10.7 Decision Tree Results 55

16. 10.8 SVM Results 55

17. 10.9 Naïve Bayes Result 56

18. 10.10 Neural Network Starts 56

19. 10.11 Neural Network Results 57

20. 10.12 Admin Login 57

21. 10.13 User Activation 58

22. 10.14 Gender Comparison 58

23. 10.15 Age Comparison 59

24. 10.16 Active Members 59

25. 10.17 Credit Card Holders 60

CHAPTER-1

INTRODUCTION

Our case study, XYZ Bank, is one of the largest banks in Indonesia, with dozens of millions of
customers who must be looked after well so that they continue to use the facilities provided
by the company. Companies have realized that they must strive not only to acquire new customers but
also to retain existing ones, because if existing customers churn, the total number of
customers will decrease unless new customers replace them. At XYZ Bank, around 1.5 million
customers churn each year, and this number is increasing every year. Although churn can also affect
the growth in new customers, acquiring a new customer costs five to six times more than retaining
an existing one.

Several techniques can be used to retain existing customers; one is to predict which customers will churn.
Predicting churn customers aims to identify prospective churners based on past information
and previous behaviour so that incentives can be offered to make them stay. Data analysis can be described
as an in-depth examination of the meaning and important values available in the data to identify
important information using specific methods and techniques. One such technique is
data mining. Much previous research has shown that data mining techniques
can be used to predict churn customers. The purpose of this study is to obtain the best data mining
learning model that can be implemented by XYZ Bank to prevent customers from leaving.

Customer Churn
In banking, a churn customer can be defined as a person who closes all of his
accounts and stops doing business with the bank. Churn customers can not only cause a
depletion of funds but also reduce company profits and have other negative impacts on the
company's operations. Therefore, searching for and identifying customers who show a high
tendency to leave the company, i.e. predicting customer churn, is an important part of customer-oriented
retention that aims to reduce the number of churners.

Data Mining
Data mining is a technique for finding information hidden in a data set, using statistical
techniques, mathematics, artificial intelligence and machine learning to extract and identify
potentially useful information stored in large data sets. Classification is one of the processes of data
mining; it works by finding a model or function that describes and distinguishes a
class or concept of data. This model is derived from an analysis of training data and is used to
predict the class label of an object whose class label is unknown [11]. For the prediction of
churn customers, commonly used data mining classification algorithms include Artificial Neural
Networks, Decision Tree Learning, Logistic Regression, Support Vector Machines and Naïve Bayes.
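The five classifier families named above can be sketched side by side with scikit-learn (an assumption; the report's own pipeline uses Python and Django but its exact code is not shown here). The dataset below is synthetic and stands in for the bank's customer attributes:

```python
# A minimal, illustrative sketch -- not the report's actual pipeline.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for a churn dataset: 1000 customers, 20 attributes,
# with churners (class 1) as the 20% minority.
X, y = make_classification(n_samples=1000, n_features=20,
                           weights=[0.8, 0.2], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "SVM": SVC(kernel="rbf"),
    "Naive Bayes": GaussianNB(),
    "Neural Network": MLPClassifier(max_iter=1000, random_state=42),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.3f}")
```

Parameter choices (kernel, iteration counts) are placeholders; the study's own attribute set and tuning are described in later chapters.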

Modelling can be evaluated with k-fold cross validation, one of the most popular model
validation methods. This method works by dividing the data into k parts and
repeating the train-and-test iteration k times. This ensures that the resulting model is good not only
on the training dataset but also on other datasets (i.e., it avoids overfitting).
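As a rough sketch of this procedure (using scikit-learn on synthetic data, not the bank's dataset), k-fold validation can be run as follows; k = 10 mirrors the 10-fold evaluation mentioned later in this report:

```python
# Hedged sketch of k-fold cross-validation; the data is synthetic.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# cross_val_score splits the data into k folds, trains on k-1 folds and
# evaluates on the held-out fold, repeating k times.
scores = cross_val_score(SVC(), X, y, cv=10)
print("10-fold accuracies:", scores.round(3))
print("mean accuracy:", round(float(scores.mean()), 3))
```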

CRISP-DM

This study used CRISP-DM in conducting the research. CRISP-DM was compiled jointly by
Daimler Chrysler, SPSS and NCR in 1996, was first published in 1999, and was reported as the leading
methodology for data mining and predictive analytics projects in polls conducted in 2002, 2004
and 2007. There are six phases in CRISP-DM: business understanding, data understanding,
data preparation, modeling, evaluation and deployment.

Emphasizing the higher costs associated with attracting new customers compared with retaining
existing ones, and the fact that long-term customers tend to produce more profit, Verbeke et
al. (2011) assert that customer retention increases profitability. Many competitive organizations
have realized that a key strategy for survival within the industry is to retain existing customers.
Tsai and Chen (2010) argued that “this leads to the importance of churn management.” Customer
churn represents a basic problem within the competitive atmosphere of the banking industry.

According to Nie et al. (2011), a bank can increase its profits by up to 85 % by improving its
retention rate by as little as 5 %. In addition, customer retention is seen as more important than in the
past. This study seeks to identify common characteristics of churned customers in order to build
a customer churn prediction model.

Customer churn

According to Sharma and Panigrahi (2011), churning refers to a customer who leaves one company
to go to another. Customer churn causes not only a loss of income but also other
negative effects on the operations of companies (Chen et al. 2014). As Hadden et al. (2005)
stipulated, “Churn management is the concept of identifying those customers who are intending to
move their custom to a competing service provider.” Risselada et al. (2010) stated that churn
management is becoming part of customer relationship management. It is important for companies
to consider it as they try to establish long-term relationships with customers and maximize the
value of their customer base.

Electronic banking
Liébana-Cabanillas et al. (2013) recognized electronic banking portals as the initial alternative
channels to traditional bank branches. They mentioned many advantages of electronic banking,
including convenient and global access, availability, time and cost savings, wider choices of
services, information transparency, customization, and financial innovation.

Related works
Guo-en and Wei-dong (2008) focused on building a customer churn prediction model using SVM in the
telecommunication industry. They compared this method with other techniques such as DT,
artificial neural networks, naïve Bayesian (NB) and logistic regression. The results proved SVM
to be a simple classification method of high capability yet good precision. Anil Kumar and Ravi
(2008) used data mining to predict credit card customer churn. They used multilayer perceptron
(MLP), logistic regression, DT, random forest, radial basis function, and SVM techniques. Nie et
al. (2011) built a customer churn prediction model using logistic regression and DT-based
techniques within the context of the banking industry. In their study, Lin et al. (2011) used rough set
theory and rule-based decision-making techniques to extract rules related to customer churn in
credit card accounts using a flow network graph (a path-dependent approach to deriving decision
rules and variables). They further showed how rules and different kinds of churn are related.
Sharma and Panigrahi (2011) applied neural networks to predict customer churn from cellular
network services. The results indicated that neural networks could predict customer churn with an
accuracy higher than 92 %. Saradhi and Palshikar (2011) compared machine learning techniques
used to build an employee churn prediction model. Yu et al. (2011) applied neural network, SVM,
DT, and extended SVM (ESVM) techniques to forecast customer churn. Of the methods studied,
ESVM performed best. Huang et al. (2012) presented new-features-based logistic regression (LR),
linear classifier (LC), NB, DT, MLP neural networks, and SVM. In their experiments, each
technique produced a different output. Data mining by evolutionary learning (DMEL) could show
the reason or probability of a churning phenomenon; DT, however, could only show the reason.
LR, NB, and MLP could provide probabilities of different customer behaviors. LC and SVM could
distinguish between a churner and a non-churner. Farquad et al. (2014) used SVM to predict
customer churn from bank credit cards. They introduced a hybrid approach to extract rules from
SVM for customer relationship management purposes. The approach is composed of three phases:
1) SVM-recursive feature elimination is applied to reduce the feature set; 2) the obtained
dataset is used to build the SVM model; and 3) using NB, tree rules are generated. Keramati et al.
(2014) not only presented different approaches to data mining and classification methods, such as
DT, neural networks, SVM, and k-nearest neighbors, but also compared the performances of these
approaches. They analyzed, as a case study, data from an Iranian mobile company.

CHAPTER-2

LITERATURE SURVEY

1) Customer churn analysis in banking sector using data mining techniques

AUTHORS: Oyeniyi, A., & Adeyemo

Customer churn has become a major problem within a customer-centred banking industry, and
banks have always tried to track customer interaction with the company in order to detect early
warning signs in customer behaviour, such as reduced transactions or account dormancy, and
take steps to prevent churn. This paper presents a data mining model that can be used to predict which
customers are most likely to churn (or switch banks). The study used real-life customer records
provided by a major Nigerian bank. The raw data was cleaned, pre-processed and then analysed
using WEKA, a data mining software tool for knowledge analysis. Simple K-Means was used for
the clustering phase, while a rule-based algorithm, JRip, was used for the rule generation phase. The
results obtained showed that the methods used can determine patterns in customer behaviour and
help banks identify likely churners and hence develop customer retention modalities. The
regulatory framework within which financial institutions and insurance firms operate requires their
interaction with customers to be tracked, recorded and stored in Customer Relationship Management
(CRM) databases, and the information to be data mined in a way that increases customer relations
and average revenue per user (ARPU) and decreases the churn rate. Churn has an equal or
greater impact on Customer Lifetime Value (CLTV) when compared to one of the most regarded
Key Performance Indicators (KPIs), Average Revenue Per User (ARPU). As one of the
biggest destroyers of enterprise value, churn has become one of the top issues for the banking industry.
Customer churn prediction is aimed at determining which customers are at risk of leaving, and
whether such customers are worth retaining.
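The cluster-then-rules workflow summarised above can be sketched in Python. The study used WEKA's Simple K-Means and JRip; here scikit-learn's KMeans stands in for the clustering phase and, since JRip has no scikit-learn equivalent, a shallow decision tree stands in as the rule learner. The data is synthetic, not the Nigerian bank's records:

```python
# Illustrative transposition of the WEKA workflow to scikit-learn.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))                   # 4 customer attributes, as in the study
churned = (X[:, 0] + X[:, 1] > 1).astype(int)   # illustrative churn label

# Phase 1: group customers into five clusters (the study found five groups).
clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)

# Phase 2: learn human-readable rules predicting churn from the attributes
# plus the cluster assignment (a stand-in for JRip's rule induction).
features = np.column_stack([X, clusters])
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(features, churned)
print(export_text(tree, feature_names=["a1", "a2", "a3", "a4", "cluster"]))
```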

2) Analytical model of customer churn based on Bayesian network

AUTHORS: Peng, S., Xin, G., Yunpeng, Z., & Ziyan, W

A customer churn analytical model based on a Bayesian network is built for the prediction of customer
churn. We propose Bayesian network approaches to predict churn motivation, mining the resulting
churn characteristics in order to help decision-making managers formulate corresponding retention
strategies. Experimental results show that the classification performance of both methods is effective.
Customer churn has been a major problem in marketing for a long time. Companies have become aware that
they should put much effort not only into trying to convince customers to sign contracts, but also into
retaining existing clients. On one hand, customer churn may decrease sales. On the other hand, it
may lead to a reduction in new customers. Moreover, the cost of gaining a new customer is five
to six times that of retaining an old one. For companies, it is very important to build a well-defined
model of customer churn which can explain who is likely to churn and why. Effective
approaches must be found to help decision-making managers formulate corresponding
retention strategies. Scholars have presented two kinds of methods to analyze customer
churn. One is the traditional classification method, such as Decision Tree, Logistic Regression, Naïve
Bayes and Clustering Analysis. The other is the artificial intelligence method, such as Artificial Neural
Network (ANN), Self-organizing Map (SOM) and Evolutionary Learning. This paper adopts
Bayesian networks to build a model of customer churn analysis which can analyze the probability
of churn factors and dissect customer behavior to provide a decision basis for marketing.

3) Designing of customer and employee churn prediction model based on data mining method and neural predictor

AUTHORS: Dolatabadi, S. H., & Keynia, F

In recent years, due to increased competition between companies in the services sector, predicting
customer churn in order to retain customers has become very important. Brand loyalty and
customer churn affect an organization, and the difficulty of attracting a new customer for every lost
customer is very painful for organizations. Obtaining a predictive model of customer behaviour,
to plan for and deal with such cases, can be very helpful. Employee churn, or the loss of staff, is closely
related to customer churn, but the impact of losing a major customer is certainly more painful
for the organization, while the consequences of finding good employees to replace lost ones, as
well as the cost of in-service training that must be given to new employees, make the loss of
human resources an issue to which each organization is sensitive.

4) A Churn Prediction Model Using Random Forest: Analysis of Machine Learning Techniques for Churn Prediction and Factor Identification in Telecom Sector

AUTHORS: Irfan Ullah, Basit Raza, Ahmad Kamran Malik, Muhammad Imran, Saif Ul Islam, and Sung Won Kim

In the telecom sector, a huge volume of data is generated on a daily basis due to a vast client
base. Decision makers and business analysts emphasize that attaining new customers is costlier
than retaining existing ones. Business analysts and customer relationship management (CRM)
analyzers need to know the reasons for customer churn, as well as behavior patterns from the
existing churn customers' data. This paper proposes a churn prediction model that uses
classification as well as clustering techniques to identify churn customers and provides the
factors behind the churning of customers in the telecom sector. Feature selection is performed
using an information gain and correlation attribute ranking filter. The proposed model first classifies
churn customer data using classification algorithms, among which the Random Forest (RF) algorithm
performed well with 88.63% correctly classified instances. Creating effective retention policies is
an essential task of CRM to prevent churners. After classification, the proposed model
segments the churning customers' data by categorizing the churn customers into groups using cosine
similarity to provide group-based retention offers. This paper also identifies churn factors that are
essential in determining the root causes of churn. By knowing the significant churn factors from
customers' data, CRM can improve productivity, recommend relevant promotions to groups of
likely churn customers based on similar behavior patterns, and greatly improve the marketing
campaigns of the company. The proposed churn prediction model is evaluated using metrics such
as accuracy, precision, recall, f-measure, and receiver operating characteristic (ROC) area. The
results reveal that the proposed churn prediction model produced better churn classification using
the RF algorithm and customer profiling using k-means clustering. Furthermore, it also provides
factors behind the churning of customers through the rules generated by the attribute-selected
classifier algorithm.
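A minimal sketch of the two-stage idea above, classify with Random Forest, then group predicted churners by cosine similarity, might look like this (scikit-learn on synthetic data; not the authors' actual code or dataset):

```python
# Illustrative two-stage churn model: RF classification, then similarity grouping.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=800, n_features=15, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)

# Stage 1: classify churners vs. non-churners with a Random Forest.
rf = RandomForestClassifier(n_estimators=100, random_state=1).fit(X_tr, y_tr)
acc = rf.score(X_te, y_te)
print("RF test accuracy:", round(acc, 3))

# Stage 2: compute pairwise cosine similarity among predicted churners,
# the basis for grouping them for group-based retention offers.
churners = X_te[rf.predict(X_te) == 1]
sim = cosine_similarity(churners)
print("similarity matrix shape:", sim.shape)
```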

5) A comparison of machine learning techniques for customer churn prediction

AUTHORS: T. Vafeiadis, K. I. Diamantaras, G. Sarigiannidis, K. Ch. Chatzisavvas

We present a comparative study of the most popular machine learning methods applied to the
challenging problem of customer churn prediction in the telecommunications industry. In the
first phase of our experiments, all models were applied and evaluated using cross-validation on a
popular, public domain dataset. In the second phase, the performance improvement offered by
boosting was studied. In order to determine the most efficient parameter combinations, we
performed a series of Monte Carlo simulations for each method over a wide range of parameters.
Our results demonstrate the clear superiority of the boosted versions of the models against the plain
(non-boosted) versions. The best overall classifier was SVM-POLY using AdaBoost, with
accuracy of almost 97% and an F-measure over 84%. Customer Relationship Management (CRM) is
a comprehensive strategy for building, managing and strengthening loyal and long-lasting
customer relationships. It is broadly acknowledged and extensively applied in different fields, e.g.
telecommunications, banking and insurance, the retail market, etc. One of its main objectives is
customer retention. The importance of this objective is obvious, given that the cost of
customer acquisition is much greater than the cost of customer retention (in some cases 20
times more expensive). Thus, tools to develop and apply customer retention models (churn models)
are required and are essential Business Intelligence (BI) applications. In the dynamic market
environment, churning can result from low customer satisfaction, aggressive
competitive strategies, new products, regulations, etc. Churn models aim to identify early churn
signals and recognize customers with an increased likelihood of leaving voluntarily.
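The plain-versus-boosted comparison can be illustrated as below. The paper's best model boosted a polynomial-kernel SVM; for a simpler self-contained sketch, AdaBoost's default decision-stump base learner is used instead, on synthetic data:

```python
# Hedged sketch: compare a plain weak learner against its AdaBoost ensemble.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=600, n_features=12, flip_y=0.1,
                           random_state=2)

plain = DecisionTreeClassifier(max_depth=1, random_state=2)   # a single stump
boosted = AdaBoostClassifier(n_estimators=100, random_state=2)

plain_acc = cross_val_score(plain, X, y, cv=5).mean()
boosted_acc = cross_val_score(boosted, X, y, cv=5).mean()
print(f"plain stump, 5-fold accuracy : {plain_acc:.3f}")
print(f"boosted,     5-fold accuracy : {boosted_acc:.3f}")
```

The same comparison pattern applies if the base learner is swapped for a polynomial-kernel SVM, as in the paper.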

CHAPTER-3

FEASIBILITY STUDY

The feasibility of the project is analyzed in this phase, and a business proposal is put forth with a very
general plan for the project and some cost estimates. During system analysis, the feasibility study of
the proposed system is carried out to ensure that the proposed system is not a burden
to the company. For feasibility analysis, some understanding of the major requirements for the
system is essential.

Three key considerations involved in the feasibility analysis are,

 ECONOMICAL FEASIBILITY
 TECHNICAL FEASIBILITY
 SOCIAL FEASIBILITY

ECONOMICAL FEASIBILITY
This study is carried out to check the economic impact that the system will have on the
organization. The amount of funds that the company can pour into the research and development of
the system is limited, so the expenditures must be justified. The developed system is well within
the budget; this was achieved because most of the technologies used are freely available, and only
the customized products had to be purchased.

TECHNICAL FEASIBILITY
This study is carried out to check the technical feasibility, that is, the technical
requirements of the system. Any system developed must not place a high demand on the available
technical resources, as this would lead to high demands being placed on the client. The developed
system must have modest requirements, as only minimal or no changes are required to implement
this system.

SOCIAL FEASIBILITY

This aspect of the study checks the level of acceptance of the system by the user. This
includes the process of training the user to use the system efficiently. The user must not feel
threatened by the system, but must instead accept it as a necessity. The level of acceptance by the users
solely depends on the methods employed to educate the user about the system and to make
him familiar with it. His level of confidence must be raised so that he is also able to offer some
constructive criticism, which is welcomed, as he is the final user of the system.

CHAPTER-4

SYSTEM ANALYSIS

EXISTING SYSTEM

Previous research has shown that the use of data mining can help predict customer churn. These
studies use various learning methods. Oyeniyi & Adeyemo (2015) examined the use of
customer demographic data and customer transaction data to detect churners using the k-Means
method and the JRip algorithm. That study used only 500 records with only four attributes, because
it focused on customers who were still carrying out transaction activities within a span
of two months before closing their accounts. The study managed to group customers
into five groups using the k-Means algorithm, which were then processed using the JRip algorithm;
this produced an analysis model that was evaluated using a confusion matrix and 10-fold cross validation.

DISADVANTAGES OF EXISTING SYSTEM

 The difficulties faced by the researchers in this study include missing values and
inconsistent data.
 The researchers also found that changing the network topology did not improve the results,
because all the topologies that were tried produced similar results.
 The classification method for predicting churn customers is also influenced by customer
product ownership data. This data is used because a customer's tendency to leave can
be seen from the number of products they hold, or a sudden change in that number,
as evidenced by previous research.

PROPOSED SYSTEM

The proposed system uses a deductive method, and the type of research is case study and
experimental research. The experiment was conducted by creating a data mining learning model
that aims to predict customers who will churn. From these problems, the research question is:
“what is the best classification model that can be used to predict churn customers, thereby
reducing the risk of customers leaving Bank XYZ?” All learning models produced are then
evaluated to find the best learning model for the case at hand. For the research
phases, this study uses CRISP-DM as a framework. Data preparation refers to the formation of the
training, testing and validation datasets, covering the balance, transaction and demographic data
that will be used as input to the model.

ADVANTAGES OF PROPOSED SYSTEM

 The models used are decision tree, neural network, support vector machine (SVM), naïve
Bayes and logistic regression.
 All True Positive predictions are valued in monetary terms, on the assumption that the
corresponding funds have been successfully retained rather than leaving the company.
 The value of the retainable funds is assumed to be the profit obtained by the company if the
model is implemented. This is based on the assumption that each customer who is detected
as churn, and actually would churn, will remain a customer after being followed up.

Algorithms: Decision Tree, Neural Network, Support Vector Machine (SVM), Naïve Bayes
and Logistic Regression.
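The 50:50 class sampling mentioned in the abstract can be sketched as a simple majority-class undersampling step before training the SVM (scikit-learn on synthetic data; the real dataset's 57 attributes are not reproduced here):

```python
# Hedged sketch of 50:50 class sampling: undersample the majority class so
# churners and non-churners are equally represented before training.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Imbalanced synthetic data: ~90% stayers (class 0), ~10% churners (class 1).
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=3)

churn_idx = np.flatnonzero(y == 1)
stay_idx = np.flatnonzero(y == 0)
rng = np.random.default_rng(3)
keep_stay = rng.choice(stay_idx, size=len(churn_idx), replace=False)

balanced = np.concatenate([churn_idx, keep_stay])
X_bal, y_bal = X[balanced], y[balanced]
counts = np.bincount(y_bal)
print("class counts after 50:50 sampling:", counts)

svm = SVC(kernel="rbf").fit(X_bal, y_bal)
print("training accuracy on balanced data:", round(svm.score(X_bal, y_bal), 3))
```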

CHAPTER-5

REQUIREMENT SPECIFICATION

The project involved analyzing the design of a few applications so as to make the application
more user friendly. To do so, it was important to keep the navigation from one screen to
another well-ordered while reducing the amount of typing the user needs to do. In
order to make the application more accessible, the browser version had to be chosen so that it is
compatible with most browsers.
Functional Requirements
 Graphical User interface with the User.
Software Requirements
For developing the application the following are the Software Requirements:
1. Python
2. Django
Operating Systems supported
1. Windows 10 64 bit OS
Technologies and Languages used to Develop
1. Python

Debugger and Emulator

 Any Browser (Particularly Chrome)

Hardware Requirements
For developing the application the following are the Hardware Requirements:
 Processor: Intel i7

 RAM: 16 GB

 Space on Hard Disk: minimum 1 TB

CHAPTER-6

SYSTEM DESIGN

SYSTEM ARCHITECTURE

Fig 6.1 System Architecture

DATA FLOW DIAGRAM

1. The DFD is also called a bubble chart. It is a simple graphical formalism that can be used
to represent a system in terms of the input data to the system, the various processing carried
out on this data, and the output data generated by the system.
2. The data flow diagram (DFD) is one of the most important modeling tools. It is used to
model the system components: the system processes, the data used by the processes, the
external entities that interact with the system, and the information flows in the system.

3. A DFD shows how information moves through the system and how it is modified by a
series of transformations. It is a graphical technique that depicts information flow and the
transformations that are applied as data moves from input to output.
4. A DFD may be used to represent a system at any level of abstraction, and may be
partitioned into levels that represent increasing information flow and functional detail.

Fig 6.2 Data Flow Diagram

UML DIAGRAMS

UML stands for Unified Modeling Language. UML is a standardized general-purpose
modeling language in the field of object-oriented software engineering. The standard is managed,
and was created by, the Object Management Group.
The goal is for UML to become a common language for creating models of object-oriented
computer software. In its current form, UML comprises two major components: a meta-model
and a notation. In the future, some form of method or process may also be added to, or associated
with, UML.
The Unified Modeling Language is a standard language for specifying, visualizing,
constructing and documenting the artifacts of a software system, as well as for business modeling
and other non-software systems.
The UML represents a collection of best engineering practices that have proven successful
in the modeling of large and complex systems.
The UML is a very important part of developing object-oriented software and the software
development process. The UML uses mostly graphical notations to express the design of software
projects.

GOALS
The primary goals in the design of the UML are as follows:
1. Provide users with a ready-to-use, expressive visual modeling language so that they can
develop and exchange meaningful models.
2. Provide extensibility and specialization mechanisms to extend the core concepts.
3. Be independent of particular programming languages and development processes.
4. Provide a formal basis for understanding the modeling language.
5. Encourage the growth of the OO tools market.
6. Support higher-level development concepts such as collaborations, frameworks, patterns,
and components.
7. Integrate best practices.

USE CASE DIAGRAM

A use case diagram in the Unified Modeling Language (UML) is a type of behavioral
diagram defined by and created from a Use-case analysis. Its purpose is to present a graphical
overview of the functionality provided by a system in terms of actors, their goals (represented as
use cases), and any dependencies between those use cases. The main purpose of a use case diagram
is to show what system functions are performed for which actor. Roles of the actors in the system
can be depicted.

Fig 6.3 Use Case Diagram

CLASS DIAGRAM

In software engineering, a class diagram in the Unified Modeling Language (UML) is a type of
static structure diagram that describes the structure of a system by showing the system's classes,
their attributes, operations (or methods), and the relationships among the classes. It explains which
class contains information.

Fig 6.4 Class Diagram

SEQUENCE DIAGRAM

A sequence diagram in Unified Modeling Language (UML) is a kind of interaction diagram that
shows how processes operate with one another and in what order. It is a construct of a Message
Sequence Chart. Sequence diagrams are sometimes called event diagrams, event scenarios, and
timing diagrams.

Fig 6.5 Sequence Diagram

ACTIVITY DIAGRAM

Activity diagrams are graphical representations of workflows of stepwise activities and actions
with support for choice, iteration and concurrency. In the Unified Modeling Language, activity
diagrams can be used to describe the business and operational step-by-step workflows of
components in a system. An activity diagram shows the overall flow of control.

Fig 6.6 Activity Diagram

CHAPTER-7

MODULES

MODULES

 User
 Admin
 Data Preprocess
 Models
 Python
 Django

MODULES DESCRIPTION

User

The user registers first. While registering, a valid email address and mobile number are required
for further communication. Once the user registers, the admin can activate the account, and only
then can the user log in to the system. The user can view the dataset of banking customers; the
dataset contains 10,000 records used to find which customers may leave the bank. The attributes
used are CustomerId, Surname, CreditScore, Geography, Gender, Age, Tenure, Balance,
NumOfProducts, HasCrCard, IsActiveMember, EstimatedSalary, and Exited. Using these
attributes, the user can run the models on the specified attributes.
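As a quick illustration (not the project's actual loader — the real data lives in Churn_Modelling.csv), the attribute list above can be inspected with pandas; the `sample` frame and its two rows here are invented for demonstration:

```python
import pandas as pd

# The thirteen attributes listed above, in dataset order.
columns = ["CustomerId", "Surname", "CreditScore", "Geography", "Gender",
           "Age", "Tenure", "Balance", "NumOfProducts", "HasCrCard",
           "IsActiveMember", "EstimatedSalary", "Exited"]

# Two made-up rows standing in for the 10,000-record file.
sample = pd.DataFrame([
    [1, "Smith", 650, "France", "Male", 40, 3, 60000.0, 2, 1, 1, 50000.0, 0],
    [2, "Jones", 500, "Spain", "Female", 55, 8, 0.0, 1, 0, 0, 70000.0, 1],
], columns=columns)

# "Exited" is the churn label: 1 means the customer left the bank.
churned = sample[sample["Exited"] == 1]
print(len(churned))  # 1
```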

Admin

The admin can log in with his credentials. Once logged in, he can activate the users; only
activated users can log in to the application. The admin sets the training and testing data for the
project dynamically in the code; the data is split into training and testing sets (a 75/25 split in the
implementation). The admin can view each model's training and testing accuracy, and afterwards
the confusion matrix of each algorithm.
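The split described above can be sketched with scikit-learn's `train_test_split`; the toy arrays below are placeholders, while `test_size=0.25` and `random_state=0` mirror the values used in this project's ProcessAlgorithm.py:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in features and labels (20 samples, 2 features each).
X = np.arange(40).reshape(20, 2)
y = np.arange(20) % 2

# A fixed random_state makes the partition reproducible, so every
# algorithm in the comparison is trained and tested on the same rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
print(len(X_train), len(X_test))  # 15 5
```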

Data Preprocessing

Pre-processing refers to the transformations applied to the data before feeding it to the algorithm.
Data preprocessing is a technique used to convert raw data into a clean dataset. In other words,
whenever data is gathered from different sources, it is collected in a raw format that is not feasible
for analysis.

To achieve better results from the applied model in machine learning projects, the data must be
in a proper format. Some machine learning models need information in a specified format; for
example, the Random Forest algorithm does not support null values, so null values have to be
managed in the original raw dataset before it can run. Another aspect is that the dataset should be
formatted so that more than one machine learning or deep learning algorithm can be executed on
the same dataset, and the best of them chosen.
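A minimal sketch of the null-value handling mentioned above, using a made-up two-column frame (mean imputation is one common choice; the project's own dataset happens to contain no nulls):

```python
import numpy as np
import pandas as pd

# Illustrative only: a frame with missing values, which a model such as
# Random Forest (in scikit-learn) would reject as-is.
df = pd.DataFrame({"CreditScore": [650, np.nan, 500],
                   "Balance": [60000.0, 1000.0, np.nan]})

# Impute each missing value with its column mean.
cleaned = df.fillna(df.mean())
print(cleaned.isnull().sum().sum())  # 0
```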

Models

At this stage a classification model is established in accordance with the criteria and settings
explained previously. Five classification methods are used: Decision Tree, Neural Network,
Support Vector Machine (SVM), Naïve Bayes, and Logistic Regression. Recall is important to
compare because it is the percentage of customers who actually churn that the model successfully
predicts as churn. Precision is the percentage of customers predicted as churn who actually churn,
out of all the customers the model predicts as churn. These metrics affect the calculation of losses
incurred by the company if it follows up on customers who are wrongly predicted.
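The two metrics can be checked on a toy example with scikit-learn (the labels below are invented for illustration; 1 = churn):

```python
from sklearn.metrics import precision_score, recall_score

y_true = [1, 1, 1, 0, 0, 0]   # three actual churners
y_pred = [1, 1, 0, 1, 0, 0]   # three churn predictions, one of them wrong

print(recall_score(y_true, y_pred))     # 2 of 3 actual churners were found
print(precision_score(y_true, y_pred))  # 2 of 3 churn predictions were correct
```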

PYTHON

Python is a general-purpose interpreted, interactive, object-oriented, high-level programming
language. As an interpreted language, Python has a design philosophy that emphasizes code
readability (notably using whitespace indentation to delimit code blocks rather than curly brackets
or keywords) and a syntax that allows programmers to express concepts in fewer lines of code
than might be used in languages such as C++ or Java. It provides constructs that enable clear
programming on both small and large scales. Python interpreters are available for many operating
systems. CPython, the reference implementation of Python, is open-source software and has a
community-based development model, as do nearly all of its variant implementations. CPython is
managed by the non-profit Python Software Foundation. Python features a dynamic type system
and automatic memory management. It supports multiple programming paradigms, including
object-oriented, imperative, functional, and procedural, and has a large and comprehensive
standard library.

DJANGO

Django is a high-level Python Web framework that encourages rapid development and
clean, pragmatic design. Built by experienced developers, it takes care of much of the hassle of
Web development, so you can focus on writing your app without needing to reinvent the wheel. It’s
free and open source.

Django's primary goal is to ease the creation of complex, database-driven websites. Django
emphasizes reusability and "pluggability" of components, rapid development, and the principle of
don't repeat yourself. Python is used throughout, even for settings files and data models.

Django also provides an optional administrative create, read, update and delete interface that is
generated dynamically through introspection and configured via admin models.

Fig 7.1 Django Architecture

Fig 7.2 Project Structure

Create a Project

Whether you are on Windows or Linux, just get a terminal or a cmd prompt and navigate to the
place you want your project to be created, then use this code −

$ django-admin startproject myproject


This will create a "myproject" folder with the following structure −
myproject/
    manage.py
    myproject/
        __init__.py
        settings.py
        urls.py
        wsgi.py

The Project Structure

The “myproject” folder is just your project container; it actually contains two elements −
manage.py − This file is a kind of project-local django-admin for interacting with your project via
the command line (start the development server, sync the db, ...). To get a full list of commands
accessible via manage.py you can use the code −

$ python manage.py help


The “myproject” subfolder − This folder is the actual Python package of your project. It contains
four files −

__init__.py − Just for Python; treat this folder as a package.
settings.py − As the name indicates, your project settings.
urls.py − All links of your project and the function to call. A kind of ToC of your project.
wsgi.py − If you need to deploy your project over WSGI.

Setting Up Your Project

Your project is configured in the file myproject/settings.py. Following are some important options
you might need to set −
DEBUG = True
This option lets you set whether your project is in debug mode or not. Debug mode lets you get
more information about your project's errors. Never set it to ‘True’ for a live project. However, it
has to be set to ‘True’ if you want the Django development server to serve static files. Do this only
in development mode.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': 'database.sql',
        'USER': '',
        'PASSWORD': '',
        'HOST': '',
        'PORT': '',
    }
}
The database is set in the ‘DATABASES’ dictionary. The example above is for the SQLite engine.
As stated earlier, Django also supports −

MySQL (django.db.backends.mysql)
PostgreSQL (django.db.backends.postgresql_psycopg2)
Oracle (django.db.backends.oracle) and NoSQL DB
MongoDB (django_mongodb_engine)
Before setting any new engine, make sure you have the correct db driver installed.
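For example, switching to the MySQL engine listed above might look like the following settings.py fragment (the database name, user, and password here are placeholders, not values from this project):

```python
# Illustrative MySQL configuration; the engine path is from the list above,
# everything else is a placeholder to adapt to your own environment.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'churn_db',
        'USER': 'dbuser',
        'PASSWORD': 'secret',
        'HOST': 'localhost',
        'PORT': '3306',
    }
}
```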
You can also set other options like: TIME_ZONE, LANGUAGE_CODE, TEMPLATE…
Now that your project is created and configured, make sure it's working −
$ python manage.py runserver
You will get something like the following on running the above code −
Validating models...
0 errors found
September 03, 2015 - 11:41:50
Django version 1.6.11, using settings 'myproject.settings'
Starting development server at http://127.0.0.1:8000/
Quit the server with CONTROL-C.
A project is a sum of many applications. Every application has an objective and can be reused in
another project; for example, the contact form on a website can be an application that is reused for
others. See it as a module of your project.

Create an Application

We assume you are in your project folder, in our main “myproject” folder, the same folder as
manage.py −
$ python manage.py startapp myapp
You just created the myapp application; as with the project, Django creates a “myapp” folder with
the application structure −
myapp/
    __init__.py
    admin.py
    models.py
    tests.py
    views.py

__init__.py − Just to make sure Python handles this folder as a package.
admin.py − This file helps you make the app modifiable in the admin interface.
models.py − This is where all the application models are stored.
tests.py − This is where your unit tests are.
views.py − This is where your application views are.

Get the Project to Know About Your Application

At this stage we have our "myapp" application; now we need to register it with our Django project
"myproject". To do so, update the INSTALLED_APPS tuple in the settings.py file of your project
(add your app name) −
INSTALLED_APPS = (
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'myapp',
)

Creating forms in Django is really similar to creating a model. Here again, we just need to inherit
from a Django class, and the class attributes will be the form fields. Let's add a forms.py file in the
myapp folder to contain our app forms. We will create a login form.
myapp/forms.py
#-*- coding: utf-8 -*-
from django import forms

class LoginForm(forms.Form):
    username = forms.CharField(max_length = 100)
    password = forms.CharField(widget = forms.PasswordInput())

As seen above, the field type can take a "widget" argument for HTML rendering; in our case, we
want the password to be hidden, not displayed. Many other widgets are available in Django:
DateInput for dates, CheckboxInput for checkboxes, etc.

Using Form in a View

There are two kinds of HTTP requests, GET and POST. In Django, the request object passed as
parameter to your view has an attribute called "method" where the type of the request is set, and
all data passed via POST can be accessed via the request.POST dictionary.
Let's create a login view in our myapp/views.py −
#-*- coding: utf-8 -*-
from django.shortcuts import render
from myapp.forms import LoginForm

def login(request):
    username = "not logged in"
    if request.method == "POST":
        # Get the posted form
        MyLoginForm = LoginForm(request.POST)
        if MyLoginForm.is_valid():
            username = MyLoginForm.cleaned_data['username']
    else:
        MyLoginForm = LoginForm()
    return render(request, 'loggedin.html', {"username": username})
The view will display the result of the login form posted through the loggedin.html. To test it, we
will first need the login form template. Let's call it login.html.
<html>
<body>
<form name = "form" action = "{% url "myapp.views.login" %}"
method = "POST" >{% csrf_token %}
<div style = "max-width:470px;">

<center>
<input type = "text" style = "margin-left:20%;"
placeholder = "Identifiant" name = "username" />
</center>
</div>
<br>
<div style = "max-width:470px;">
<center>
<input type = "password" style = "margin-left:20%;"
placeholder = "password" name = "password" />
</center>
</div>
<br>
<div style = "max-width:470px;">
<center>
<button style = "border:0px; background-color:#4285F4; margin-top:8%;
height:35px; width:80%;margin-left:19%;" type = "submit"
value = "Login" >
<strong>Login</strong>
</button>
</center>
</div>
</form>
</body>
</html>
The template will display a login form and post the result to our login view above. You have
probably noticed the {% csrf_token %} tag in the template, which is just to prevent Cross-Site
Request Forgery (CSRF) attacks on your site.
Once we have the login template, we need the loggedin.html template that will be rendered after
form treatment.
<html>
<body>
You are : <strong>{{username}}</strong>
</body>
</html>
Now, we just need our pair of URLs to get started: myapp/urls.py
from django.conf.urls import patterns, url
from django.views.generic import TemplateView

urlpatterns = patterns('myapp.views',
    url(r'^connection/', TemplateView.as_view(template_name = 'login.html')),
    url(r'^login/', 'login', name = 'login'))
When accessing "/myapp/connection", we will get the following login.html template rendered –

Setting Up Sessions

In Django, enabling sessions is done in your project's settings.py, by adding some lines to the
MIDDLEWARE_CLASSES and INSTALLED_APPS options. This should be done while
creating the project, but it's always good to know, so MIDDLEWARE_CLASSES should have −
'django.contrib.sessions.middleware.SessionMiddleware'
And INSTALLED_APPS should have −
'django.contrib.sessions'
By default, Django saves session information in the database (the django_session table or
collection), but you can configure the engine to store information in other ways, such as in a file
or in the cache.
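The alternative storage engines just mentioned are selected via the SESSION_ENGINE setting; a sketch of the settings.py fragment, using backend paths from Django's session framework:

```python
# Pick exactly one; the database backend is the default.
SESSION_ENGINE = 'django.contrib.sessions.backends.db'       # database-backed
# SESSION_ENGINE = 'django.contrib.sessions.backends.file'   # file-backed
# SESSION_ENGINE = 'django.contrib.sessions.backends.cache'  # cache-backed
```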
When sessions are enabled, every request (the first argument of any view in Django) has a session
(dict) attribute.
Let's create a simple example to see how to create and save sessions. We built a simple login
system before. Let us save the username so that, if the user has not signed out, the login form is
not shown when accessing the login page. Basically, let's make the login system we used in
Django cookie handling more secure, by saving the cookie server side.
For this, first let's change our login view to save the username cookie server side −
def login(request):
    username = 'not logged in'
    if request.method == 'POST':
        MyLoginForm = LoginForm(request.POST)
        if MyLoginForm.is_valid():
            username = MyLoginForm.cleaned_data['username']
            request.session['username'] = username
    else:
        MyLoginForm = LoginForm()
    return render(request, 'loggedin.html', {"username": username})

Then let us create a formView view for the login form, which won't display the form if the cookie
is set −
def formView(request):
    if request.session.has_key('username'):
        username = request.session['username']
        return render(request, 'loggedin.html', {"username": username})
    else:
        return render(request, 'login.html', {})
Now let us change the urls.py file so the URLs pair with our new view −
from django.conf.urls import patterns, url
from django.views.generic import TemplateView

urlpatterns = patterns('myapp.views',
    url(r'^connection/', 'formView', name = 'loginform'),
    url(r'^login/', 'login', name = 'login'))

CHAPTER-8

IMPLEMENTATION AND CODE

INPUT DESIGN

The input design is the link between the information system and the user. It comprises
developing the specifications and procedures for data preparation, the steps necessary to put
transaction data into a usable form for processing. This can be achieved by having the computer
read data from a written or printed document, or by having people key the data directly into the
system. The design of input focuses on controlling the amount of input required, controlling
errors, avoiding delay, avoiding extra steps, and keeping the process simple. The input is
designed to provide security and ease of use while retaining privacy.
Input Design considered the following things:

 What data should be given as input?
 How should the data be arranged or coded?
 The dialog to guide the operating personnel in providing input.
 Methods for preparing input validations and the steps to follow when errors occur.

OBJECTIVES

1. Input design is the process of converting a user-oriented description of the input into a
computer-based system. This design is important to avoid errors in the data input process and to
show the management the correct direction for getting correct information from the computerized
system.

2. It is achieved by creating user-friendly screens for data entry that handle large volumes
of data. The goal of designing input is to make data entry easier and free from errors. The data
entry screen is designed so that all data manipulations can be performed. It also provides
record-viewing facilities.

3. When data is entered, it is checked for validity. Data can be entered with the help of
screens, and appropriate messages are provided as needed so that the user is never left in a maze.
Thus the objective of input design is to create an input layout that is easy to follow.
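As a small sketch of such a validity check (the helper function name is invented; the pattern itself is the one attached to the mobile field in this project's forms.py):

```python
import re

# The mobile-number pattern from forms.py: first digit 5-9, ten digits total.
MOBILE_RE = re.compile(r'^[56789][0-9]{9}$')

def validate_mobile(value: str) -> bool:
    """Return True when the value matches the project's mobile pattern."""
    return bool(MOBILE_RE.match(value))

print(validate_mobile("9876543210"))  # True
print(validate_mobile("1234567890"))  # False: starts with 1
```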

OUTPUT DESIGN

A quality output is one which meets the requirements of the end user and presents the
information clearly. In any system, the results of processing are communicated to the users and to
other systems through outputs. In output design it is determined how the information is to be
displayed for immediate need, as well as the hard-copy output. It is the most important and direct
source of information for the user. Efficient and intelligent output design improves the system's
relationship with the user and helps decision-making.

1. Designing computer output should proceed in an organized, well-thought-out manner; the
right output must be developed while ensuring that each output element is designed so that people
find the system easy and effective to use. When analysts design computer output, they should
identify the specific output that is needed to meet the requirements.

2. Select methods for presenting information.

3. Create documents, reports, or other formats that contain information produced by the system.

The output form of an information system should accomplish one or more of the following
objectives:

 Convey information about past activities, current status, or projections of the future.
 Signal important events, opportunities, problems, or warnings.
 Trigger an action.
 Confirm an action.

SAMPLE CODE

User Side views.py


from django.shortcuts import render, HttpResponse
from django.contrib import messages
from .forms import UserRegistrationForm
from .models import UserRegistrationModel
from .util.GetCSVData import ReadBankChurnData
from .algorithms.ProcessAlgorithm import Algorithms

algo = Algorithms()
def UserRegisterActions(request):
    if request.method == 'POST':
        form = UserRegistrationForm(request.POST)
        if form.is_valid():
            print('Data is Valid')
            form.save()
            messages.success(request, 'You have been successfully registered')
            form = UserRegistrationForm()
            return render(request, 'UserRegistrations.html', {'form': form})
        else:
            messages.success(request, 'Email or Mobile Already Existed')
            print("Invalid form")
    else:
        form = UserRegistrationForm()
    return render(request, 'UserRegistrations.html', {'form': form})
def UserLoginCheck(request):
    if request.method == "POST":
        loginid = request.POST.get('loginname')
        pswd = request.POST.get('pswd')
        print("Login ID = ", loginid, ' Password = ', pswd)
        try:
            check = UserRegistrationModel.objects.get(loginid=loginid, password=pswd)
            status = check.status
            print('Status is = ', status)
            if status == "activated":
                request.session['id'] = check.id
                request.session['loggeduser'] = check.name
                request.session['loginid'] = loginid
                request.session['email'] = check.email
                print("User id At", check.id, status)
                return render(request, 'users/UserHome.html', {})
            else:
                messages.success(request, 'Your Account Not at activated')
                return render(request, 'UserLogin.html')
        except Exception as e:
            print('Exception is ', str(e))
        messages.success(request, 'Invalid Login id and password')
    return render(request, 'UserLogin.html', {})
def UserHome(request):
    return render(request, 'users/UserHome.html', {})

def ViewChurnData(request):
    obj = ReadBankChurnData()
    data = obj.readData()
    data = data.to_html()
    return render(request, 'users/ViewChurnData.html', {'data': data})
def UserDecisionTree(request):
    dt_acc, dt_recall, dt_precc, dt_auc = algo.decisionTree()
    return render(request, 'users/DTResult.html',
                  {'dt_acc': dt_acc, 'dt_recall': dt_recall, 'dt_precc': dt_precc, 'dt_auc': dt_auc})

def UserLogisticRegression(request):
    dt_acc, dt_recall, dt_precc, dt_auc = algo.logisticRegressions()
    return render(request, 'users/LGResult.html',
                  {'dt_acc': dt_acc, 'dt_recall': dt_recall, 'dt_precc': dt_precc, 'dt_auc': dt_auc})

def UserSVM(request):
    acc, recall, precc, auc = algo.processSVM()
    return render(request, 'users/SVMResult.html',
                  {'dt_acc': acc, 'dt_recall': recall, 'dt_precc': precc, 'dt_auc': auc})

def UserNaiveBayes(request):
    acc, recall, precc, auc = algo.processNaiveBayes()
    return render(request, 'users/NaiveBayesResult.html',
                  {'dt_acc': acc, 'dt_recall': recall, 'dt_precc': precc, 'dt_auc': auc})

def UserNeuralNetworks(request):
    acc, recall, precc, auc = algo.processNeuralNetwork()
    return render(request, 'users/NeuralNetworkResult.html',
                  {'dt_acc': acc, 'dt_recall': recall, 'dt_precc': precc, 'dt_auc': auc})

Models.py
from django.db import models

class UserRegistrationModel(models.Model):
    name = models.CharField(max_length=100)
    loginid = models.CharField(unique=True, max_length=100)
    password = models.CharField(max_length=100)
    mobile = models.CharField(unique=True, max_length=100)
    email = models.CharField(unique=True, max_length=100)
    locality = models.CharField(max_length=100)
    address = models.CharField(max_length=1000)
    city = models.CharField(max_length=100)
    state = models.CharField(max_length=100)
    status = models.CharField(max_length=100)

    def __str__(self):
        return self.loginid

    class Meta:
        db_table = 'UserRegistrations'
forms.py
from django import forms
from .models import UserRegistrationModel

class UserRegistrationForm(forms.ModelForm):
    name = forms.CharField(widget=forms.TextInput(attrs={'pattern': '[a-zA-Z]+'}),
                           required=True, max_length=100)
    loginid = forms.CharField(widget=forms.TextInput(attrs={'pattern': '[a-zA-Z]+'}),
                              required=True, max_length=100)
    password = forms.CharField(widget=forms.PasswordInput(attrs={
        'pattern': '(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,}',
        'title': 'Must contain at least one number and one uppercase and lowercase letter, and at least 8 or more characters'}),
        required=True, max_length=100)
    mobile = forms.CharField(widget=forms.TextInput(attrs={'pattern': '[56789][0-9]{9}'}),
                             required=True, max_length=100)
    email = forms.CharField(widget=forms.TextInput(attrs={'pattern': '[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$'}),
                            required=True, max_length=100)
    locality = forms.CharField(widget=forms.TextInput(), required=True, max_length=100)
    address = forms.CharField(widget=forms.Textarea(attrs={'rows': 4, 'cols': 22}),
                              required=True, max_length=250)
    city = forms.CharField(widget=forms.TextInput(
        attrs={'autocomplete': 'off', 'pattern': '[A-Za-z ]+', 'title': 'Enter Characters Only '}),
        required=True, max_length=100)
    state = forms.CharField(widget=forms.TextInput(
        attrs={'autocomplete': 'off', 'pattern': '[A-Za-z ]+', 'title': 'Enter Characters Only '}),
        required=True, max_length=100)
    status = forms.CharField(widget=forms.HiddenInput(), initial='waiting', max_length=100)

    class Meta:
        model = UserRegistrationModel
        fields = '__all__'
ProcessAlgorithm.py
from django.conf import settings
import pandas as pd
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

class Algorithms:
    path = settings.MEDIA_ROOT + "\\" + "Churn_Modelling.csv"
    data = pd.read_csv(path, delimiter=',')
    data = data.drop(['CustomerId', 'Surname', 'RowNumber'], axis=1)
    x = data.iloc[:, 0:10]
    y = data.iloc[:, 10]
    x = pd.get_dummies(x)

    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)
    sc = StandardScaler()
    x_train = sc.fit_transform(x_train)
    # Scale the test set with the parameters fitted on the training set
    x_test = sc.transform(x_test)
    x_train = pd.DataFrame(x_train)
    x_train.head()

    def decisionTree(self):
        from sklearn.tree import DecisionTreeClassifier
        model = DecisionTreeClassifier()
        model.fit(self.x_train, self.y_train)
        y_pred = model.predict(self.x_test)
        print("Training Accuracy :", model.score(self.x_train, self.y_train))
        print("Testing Accuracy :", model.score(self.x_test, self.y_test))
        cm = confusion_matrix(self.y_test, y_pred)
        print(cm)
        dt_acc = accuracy_score(self.y_test, y_pred)
        dt_precc = precision_score(self.y_test, y_pred)
        dt_recall = recall_score(self.y_test, y_pred)
        dt_auc = roc_auc_score(self.y_test, y_pred)
        from sklearn.model_selection import cross_val_score
        cvs = cross_val_score(estimator=model, X=self.x_train, y=self.y_train, cv=10)
        print(cvs)
        return dt_acc, dt_recall, dt_precc, dt_auc

    def logisticRegressions(self):
        from sklearn.linear_model import LogisticRegression
        model = LogisticRegression()
        model.fit(self.x_train, self.y_train)
        y_pred = model.predict(self.x_test)
        print("Training Accuracy :", model.score(self.x_train, self.y_train))
        print("Testing Accuracy :", model.score(self.x_test, self.y_test))
        cm = confusion_matrix(self.y_test, y_pred)
        print(cm)
        dt_acc = accuracy_score(self.y_test, y_pred)
        dt_precc = precision_score(self.y_test, y_pred)
        dt_recall = recall_score(self.y_test, y_pred)
        dt_auc = roc_auc_score(self.y_test, y_pred)
        from sklearn.model_selection import cross_val_score
        cvs = cross_val_score(estimator=model, X=self.x_train, y=self.y_train, cv=10)
        print(cvs)
        return dt_acc, dt_recall, dt_precc, dt_auc

    def processSVM(self):
        from sklearn.svm import SVC
        model = SVC()
        model.fit(self.x_train, self.y_train)
        y_pred = model.predict(self.x_test)
        print("Training Accuracy :", model.score(self.x_train, self.y_train))
        print("Testing Accuracy :", model.score(self.x_test, self.y_test))
        cm = confusion_matrix(self.y_test, y_pred)
        print(cm)
        dt_acc = accuracy_score(self.y_test, y_pred)
        dt_precc = precision_score(self.y_test, y_pred)
        dt_recall = recall_score(self.y_test, y_pred)
        dt_auc = roc_auc_score(self.y_test, y_pred)
        from sklearn.model_selection import cross_val_score
        cvs = cross_val_score(estimator=model, X=self.x_train, y=self.y_train, cv=10)
        print(cvs)
        return dt_acc, dt_recall, dt_precc, dt_auc
    def processNaiveBayes(self):
        from sklearn.naive_bayes import GaussianNB
        model = GaussianNB()
        model.fit(self.x_train, self.y_train)
        y_pred = model.predict(self.x_test)
        print("Training Accuracy :", model.score(self.x_train, self.y_train))
        print("Testing Accuracy :", model.score(self.x_test, self.y_test))
        cm = confusion_matrix(self.y_test, y_pred)
        print(cm)
        dt_acc = accuracy_score(self.y_test, y_pred)
        dt_precc = precision_score(self.y_test, y_pred)
        dt_recall = recall_score(self.y_test, y_pred)
        dt_auc = roc_auc_score(self.y_test, y_pred)
        from sklearn.model_selection import cross_val_score
        cvs = cross_val_score(estimator=model, X=self.x_train, y=self.y_train, cv=10)
        print(cvs)
        return dt_acc, dt_recall, dt_precc, dt_auc

    def processNeuralNetwork(self):
        from keras.models import Sequential
        from keras.layers import Dense
        classifier = Sequential()
        # Keras 2+ argument names (units / kernel_initializer / epochs) replace
        # the deprecated output_dim / init / nb_epoch of older Keras versions
        classifier.add(Dense(13, kernel_initializer='uniform', activation='relu', input_dim=13))
        classifier.add(Dense(9, kernel_initializer='uniform', activation='relu'))
        classifier.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))
        classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
        classifier.summary()
        classifier.fit(self.x_train, self.y_train, batch_size=10, epochs=100)
        y_pred = classifier.predict(self.x_test)
        y_pred = (y_pred > 0.5)
        cm = confusion_matrix(self.y_test, y_pred)
        acc = accuracy_score(self.y_test, y_pred)
        precc = precision_score(self.y_test, y_pred)
        recall = recall_score(self.y_test, y_pred)
        auc = roc_auc_score(self.y_test, y_pred)
        print("Confusion Matrix ", cm)
        print('Deep Learning ', acc, precc, recall, auc)
        return acc, recall, precc, auc

Admin side views

from django.shortcuts import render, HttpResponse
from django.contrib import messages
from users.models import UserRegistrationModel
from .util.PreProcess import PreProcessData
# Create your views here.

def AdminLoginCheck(request):
    if request.method == 'POST':
        usrid = request.POST.get('loginid')
        pswd = request.POST.get('pswd')
        print("User ID is = ", usrid)
        if usrid == 'admin' and pswd == 'admin':
            return render(request, 'admins/AdminHome.html')
        elif usrid == 'Admin' and pswd == 'Admin':
            return render(request, 'admins/AdminHome.html')
        else:
            messages.success(request, 'Please Check Your Login Details')
    return render(request, 'AdminLogin.html', {})

def AdminHome(request):
    return render(request, 'admins/AdminHome.html')

def ViewRegisteredUsers(request):
    data = UserRegistrationModel.objects.all()
    return render(request, 'admins/RegisteredUsers.html', {'data': data})

def AdminActivaUsers(request):
    if request.method == 'GET':
        id = request.GET.get('uid')
        status = 'activated'
        print("PID = ", id, status)
        UserRegistrationModel.objects.filter(id=id).update(status=status)
    data = UserRegistrationModel.objects.all()
    return render(request, 'admins/RegisteredUsers.html', {'data': data})

def AdminDataPreProcess(request):
    obj = PreProcessData()
    total_france, total_germany, total_spain = obj.startProcess()
    return render(request, 'admins/AdminHome.html')

datapreprocess.py
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from django.conf import settings


class PreProcessData:
    def startProcess(self):
        print('Pre Process Starts')
        path = settings.MEDIA_ROOT + "\\" + "Churn_Modelling.csv"
        data = pd.read_csv(path)
        data.head(4)
        data.info()
        data.describe()
        data.tail()
        data.isnull().sum()

        # Plotting the features of the dataset to see the correlation between them
        data['Gender'].value_counts()
        plt.hist(x=data.Gender, bins=3, color='pink')
        plt.title('comparison of male and female')
        plt.xlabel('Gender')
        plt.ylabel('population')
        plt.show()

        # comparison of age in the dataset
        data['Age'].value_counts()
        plt.hist(x=data.Age, bins=10, color='orange')
        plt.title('comparison of Age')
        plt.xlabel('Age')
        plt.ylabel('population')
        plt.show()

        # comparison of geography
        data['Geography'].value_counts()
        plt.hist(x=data.Geography, bins=5, color='green')
        plt.title('comparison of Geography')
        plt.xlabel('Geography')
        plt.ylabel('population')
        plt.show()

        # comparison of how many customers hold a credit card
        data['HasCrCard'].value_counts()
        plt.hist(x=data.HasCrCard, bins=3, color='red')
        plt.title('how many people have or not have the credit card')
        plt.xlabel('customers holding credit card')
        plt.ylabel('population')
        plt.show()

        # how many active members does the bank have?
        data['IsActiveMember'].value_counts()
        plt.hist(x=data.IsActiveMember, bins=3, color='brown')
        plt.title('Active Members')
        plt.xlabel('Customers')
        plt.ylabel('population')
        plt.show()

        # comparison between Geography and Gender
        Gender = pd.crosstab(data['Gender'], data['Geography'])
        Gender.div(Gender.sum(1).astype(float), axis=0).plot(kind="bar", stacked=True, figsize=(6, 6))

        # comparison between geography and card holders
        HasCrCard = pd.crosstab(data['HasCrCard'], data['Geography'])
        HasCrCard.div(HasCrCard.sum(1).astype(float), axis=0).plot(kind='bar', stacked=True, figsize=(6, 6))

        # comparison of active members in different geographies
        IsActiveMember = pd.crosstab(data['IsActiveMember'], data['Geography'])
        IsActiveMember.div(IsActiveMember.sum(1).astype(float), axis=0).plot(kind='bar', stacked=True, figsize=(6, 6))

        # comparing ages in different geographies
        Age = pd.crosstab(data['Age'], data['Geography'])
        Age.div(Age.sum(1).astype(float), axis=0).plot(kind='bar', stacked=True, figsize=(15, 15))

        # calculating the total balance in France, Germany and Spain
        total_france = data.Balance[data.Geography == 'France'].sum()
        total_germany = data.Balance[data.Geography == 'Germany'].sum()
        total_spain = data.Balance[data.Geography == 'Spain'].sum()
        print("Total Balance in France :", total_france)
        print("Total Balance in Germany :", total_germany)
        print("Total Balance in Spain :", total_spain)

        # plotting a pie chart of balances by country
        labels = 'France', 'Germany', 'Spain'
        colors = ['cyan', 'magenta', 'orange']
        sizes = [311, 300, 153]
        explode = [0.01, 0.01, 0.01]
        plt.pie(sizes, colors=colors, labels=labels, explode=explode, shadow=True)
        plt.axis('equal')
        plt.show()
        return total_france, total_germany, total_spain
CHAPTER-9

SYSTEM TEST

The purpose of testing is to discover errors. Testing is the process of trying to discover every
conceivable fault or weakness in a work product. It provides a way to check the functionality of
components, sub-assemblies, assemblies and/or the finished product. It is the process of exercising
software with the intent of ensuring that the software system meets its requirements and user
expectations and does not fail in an unacceptable manner. There are various types of tests, each
of which addresses a specific testing requirement.

TYPES OF TESTS

Unit testing

Unit testing involves the design of test cases that validate that the internal program
logic is functioning properly and that program inputs produce valid outputs. All decision branches
and internal code flow should be validated. It is the testing of individual software units of the
application, and it is done after the completion of an individual unit before integration. This is
structural testing that relies on knowledge of the unit's construction and is invasive. Unit tests perform
basic tests at the component level and test a specific business process, application, and/or system
configuration. Unit tests ensure that each unique path of a business process performs accurately to
the documented specifications and contains clearly defined inputs and expected results.
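As a concrete illustration of unit testing in this project's own language, the probability-thresholding step used in the deep-learning evaluation code (`y_pred > 0.5`) can be isolated and tested on its own. The helper `to_labels` below is a hypothetical extraction written for illustration, not a function that exists in the project:

```python
import unittest
import numpy as np

def to_labels(probs, threshold=0.5):
    """Convert predicted churn probabilities into 0/1 class labels."""
    return (np.asarray(probs) > threshold).astype(int)

class ThresholdTest(unittest.TestCase):
    def test_high_probability_is_churn(self):
        # Probabilities strictly above the threshold map to class 1 (churn).
        self.assertEqual(to_labels([0.9, 0.51]).tolist(), [1, 1])

    def test_low_probability_is_not_churn(self):
        # Probabilities at or below the threshold map to class 0 (stays).
        self.assertEqual(to_labels([0.5, 0.1]).tolist(), [0, 0])
```

Such a test exercises one decision branch in isolation and can be run with `python -m unittest` before the unit is integrated into the evaluation pipeline.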

Integration testing

Integration tests are designed to test integrated software components to determine if they
actually run as one program. Testing is event driven and is more concerned with the basic outcome
of screens or fields. Integration tests demonstrate that although the components were individually
satisfactory, as shown by successful unit testing, the combination of components is correct and
consistent. Integration testing is specifically aimed at exposing the problems that arise from the
combination of components.
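An integration-style check for this project would verify that the output of the data-splitting step plugs directly into a classifier's fit/predict interface. The sketch below uses a small synthetic frame rather than the project's `Churn_Modelling.csv`, and `LogisticRegression` merely stands in for whichever model is under test:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the churn dataset (the real CSV is not assumed here).
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "CreditScore": rng.integers(300, 900, 200),
    "Age": rng.integers(18, 80, 200),
    "Balance": rng.uniform(0, 200_000, 200),
    "Exited": rng.integers(0, 2, 200),
})

x = data.drop(columns="Exited")
y = data["Exited"]

# The same 1/3 hold-out split the project uses.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=1 / 3, random_state=42)

# The split output must feed the classifier's fit/predict API without error.
model = LogisticRegression(max_iter=1000).fit(x_train, y_train)
acc = accuracy_score(y_test, model.predict(x_test))
print("integration check passed, accuracy:", round(acc, 3))
```

The accuracy itself is meaningless on random labels; the point of the test is that the two components interact without interface defects.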

Functional test

Functional tests provide systematic demonstrations that functions tested are available as
specified by the business and technical requirements, system documentation, and user manuals.

Functional testing is centered on the following items:

Valid Input : identified classes of valid input must be accepted.

Invalid Input : identified classes of invalid input must be rejected.

Functions : identified functions must be exercised.

Output : identified classes of application outputs must be exercised.

Systems/Procedures : interfacing systems or procedures must be invoked.

Organization and preparation of functional tests is focused on requirements, key
functions, or special test cases. In addition, systematic coverage pertaining to identified business
process flows, data fields, predefined processes, and successive processes must be considered for
testing. Before functional testing is complete, additional tests are identified and the effective value
of current tests is determined.
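As a sketch of how the valid and invalid input classes above can be exercised, the fragment below checks a registration-style validator. `validate_registration` is a hypothetical helper written for illustration, not the project's actual form-handling code:

```python
import re

def validate_registration(email, password, existing_emails):
    """Return a list of reasons the input is rejected; empty list means valid."""
    errors = []
    # Invalid input class: malformed email addresses must be rejected.
    if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email):
        errors.append("invalid email format")
    # Invalid input class: duplicate registrations must be rejected.
    if email in existing_emails:
        errors.append("email already registered")
    # Invalid input class: weak passwords must be rejected.
    if len(password) < 6:
        errors.append("password too short")
    return errors

# Valid input class: accepted with no errors.
assert validate_registration("user@example.com", "secret1", set()) == []
# Invalid input classes: each is rejected with a reason.
assert "email already registered" in validate_registration(
    "user@example.com", "secret1", {"user@example.com"})
assert "invalid email format" in validate_registration(
    "not-an-email", "secret1", set())
```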

System Test

System testing ensures that the entire integrated software system meets requirements. It
tests a configuration to ensure known and predictable results. An example of system testing is the
configuration-oriented system integration test. System testing is based on process descriptions and
flows, emphasizing pre-driven process links and integration points.

White Box Testing

White box testing is testing in which the software tester has knowledge of the inner
workings, structure and language of the software, or at least its purpose. It is used to test areas
that cannot be reached from a black-box level.

Black Box Testing

Black box testing is testing the software without any knowledge of the inner workings,
structure or language of the module being tested. Black box tests, like most other kinds of tests, must
be written from a definitive source document, such as a specification or requirements document. It is
testing in which the software under test is treated as a black box: you cannot "see" into it. The test
provides inputs and responds to outputs without considering how the software works.

Unit Testing

Unit testing is usually conducted as part of a combined code and unit test phase of the
software lifecycle, although it is not uncommon for coding and unit testing to be conducted as two
distinct phases.

Test strategy and approach

Field testing will be performed manually and functional tests will be written in detail.

Test objectives

 All field entries must work properly.


 Pages must be activated from the identified link.
 The entry screen, messages and responses must not be delayed.

Features to be tested

 Verify that the entries are of the correct format


 No duplicate entries should be allowed
 All links should take the user to the correct page.

Integration Testing

Software integration testing is the incremental integration testing of two or more integrated
software components on a single platform to produce failures caused by interface defects.

The task of the integration test is to check that components or software applications,
e.g. components in a software system or – one step up – software applications at the company level –
interact without error.

Test Results: All the test cases mentioned above passed successfully. No defects encountered.

Acceptance Testing

User Acceptance Testing is a critical phase of any project and requires significant
participation by the end user. It also ensures that the system meets the functional requirements.

Test Results: All the test cases mentioned above passed successfully. No defects encountered.

Sample Test Cases

| S.No | Test Case | Expected Result | Result | Remarks (If Fails) |
|------|-----------|-----------------|--------|--------------------|
| 1 | User Register | User registers successfully. | Pass | If the user email already exists, registration fails. |
| 2 | User Login | With a correct user name and password, the user reaches a valid page. | Pass | Unregistered users will not be logged in. |
| 3 | Start Data Preprocess | Data preprocessing starts from the media folder which contains the CSV file. | Pass | If the CSV file is not loaded, it fails. |
| 4 | Split 1/3 of the data for training and testing | Using sklearn model selection we can split the data. | Pass | The pandas function read_csv reads the data; otherwise it fails. |
| 5 | Train the models | Based on X_train and y_train we can train the data. | Pass | It checks the available models for testing. |
| 6 | Test the models | Test model data is generated. | Pass | Without training there is no test data. |
| 7 | Check the Accuracy | The accuracy score is generated for our model. | Pass | If the model is not loaded, it fails. |
| 8 | Check the Precision Score | The precision score is generated for our model. | Pass | Precision score generated. |
| 9 | Admin Login | Admin can log in with his login credentials; on success he gets his home page. | Pass | Invalid login details are not allowed. |
| 10 | Activate Registered Users | Admin can activate the registered user's ID. | Pass | If the user ID is not found, the user cannot log in. |

CHAPTER-10

SCREENSHOTS

Fig 10.1 Home Page

Fig 10.2 User Registration Form

Fig 10.3 User Login Form

Fig 10.4 User Home Page

Fig 10.5 View Dataset

Fig 10.6 Logistic Regression Results

Fig 10.7 Decision Tree Result

Fig 10.8 Support Vector Machine Result

Fig 10.9 Naive Bayes Result

Fig 10.10 Neural Network Starts

Fig 10.11 Neural Network Result

Fig 10.12 Admin Login Form

Fig 10.13 User Activation

Fig 10.14 Gender Comparison

Fig 10.15 Age Comparison

Fig 10.16 Active Members

Fig 10.17 Credit Card Holders

CHAPTER-11

CONCLUSION AND FUTURE SCOPE

CONCLUSION

The use of data mining has proven effective for predicting customer churn in the banking business.
The number of data samples used for learning greatly influences the modeling results, and the
inter-class ratio greatly influences recall: a 50:50 split of the data yields a higher recall value
(on average 70%) than the other two settings. This study used around 15,949 data samples, giving
around 7,975 samples per class. Accuracy values cannot be fully relied on for comparison when the
distribution of data is very unbalanced. The best model is the one with the highest profit value,
namely the 50:50 SVM sampling model with a profit value of 456 billion under the loss and benefit
calculations of Table 5.6; its five most significant attributes are vintage, volume of EDC
(Electronic Data Capture) transactions, amount of EDC transactions, average balance in one month,
and age. This is in line with the research of Dolatabadi et al. (2017), which also obtained SVM as
the model with the best accuracy, but Logistic Regression is also worth considering because it
results in smaller losses.
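The 50:50 sampling scheme described above can be sketched as simple undersampling of the majority class before training the SVM. The data frame below is synthetic, and the column names (`vintage`, `balance`, `age`) merely echo the significant attributes named in the study:

```python
import numpy as np
import pandas as pd
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score

# Synthetic, imbalanced stand-in for the churn data described in the study.
rng = np.random.default_rng(1)
n = 2000
df = pd.DataFrame({
    "vintage": rng.integers(1, 120, n),
    "balance": rng.uniform(0, 1e6, n),
    "age": rng.integers(18, 80, n),
})
df["churn"] = (rng.random(n) < 0.1).astype(int)  # roughly 10% churners

# 50:50 undersampling: keep every churner, sample an equal number of stayers.
churners = df[df["churn"] == 1]
stayers = df[df["churn"] == 0].sample(n=len(churners), random_state=0)
balanced = pd.concat([churners, stayers])

x = balanced.drop(columns="churn")
y = balanced["churn"]
x_tr, x_te, y_tr, y_te = train_test_split(
    x, y, test_size=0.3, random_state=0, stratify=y)

# Fit the SVM on the balanced sample; recall is the metric the study favors.
model = SVC().fit(x_tr, y_tr)
print("recall on balanced test set:", recall_score(y_te, model.predict(x_te)))
```

On random synthetic data the recall value itself is not meaningful; the sketch only shows why equalizing the class counts prevents the majority class from dominating training, which is what drives the higher recall reported above.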

FUTURE SCOPE

The study of predicting churn customers using data mining classification techniques encountered
several obstacles, which suggest directions for further research. Difficulties in accessing data and
the very large amount of data were constraints in this study, so further research could use more
adequate hardware to overcome these constraints and process larger datasets. Additional types of data
could also be used in modeling, such as complaint-handling records, customer satisfaction levels,
and external data such as social media.

CHAPTER-12

BIBLIOGRAPHY
[1] Oyeniyi, A., & Adeyemo, A. (2015). Customer churn analysis in banking
sector using data mining techniques. African Journal of Computing &
ICT, Vol. 8, 165-174.

[2] Peng, S., Xin, G., Yunpeng, Z., & Ziyan, W. (2013). Analytical model of
customer churn based on Bayesian network. Ninth International
Conference on Computational Intelligence and Security (pp. 269-271).

[3] Banarescu, A. (2015). Detecting and preventing fraud with data analytics.

[4] Dolatabadi, S. H., & Keynia, F. (2017). Designing of customer and
employee churn prediction model based on data mining method and
neural predictor. The 2nd International Conference on Computer and
Communication Systems (pp. 74-77). IEEE.

[5] Keramati, A., Ghaneei, H., & Mohammad Mirmohammadi, S. (2016).
Developing a prediction model for customer churn from electronic
banking services using data mining. Financial Innovation, 2-10.

[6] Zoric, A. B. (2016). Predicting customer churn in banking industry using
neural networks. Interdisciplinary Description of Complex Systems,
116-124.

[7] Chitra, K., & Subashini, B. (2011). Customer retention in banking sector
using predictive data mining technique. International Conference on
Information Technology. ICIT.

[8] De Caigny, A., Coussement, K., & De Bock, K. W. (2018). A new hybrid
classification algorithm for customer churn prediction based on logistic
regression and decision trees. European Journal of Operational Research,
269, 760-772.

[9] Larose, D. T. (2006). Data mining methods and models. New Jersey: John
Wiley & Sons, Inc.

[10] Turban, E., Aronson, J. E., & Liang, T.-P. (2005). Decision support
systems and intelligent systems. New Jersey: Pearson Education, Inc.

[11] Han, J., Kamber, M., & Pei, J. (2012). Data mining: concepts and
techniques, third edition. Morgan Kaufmann Publishers.

[12] Vafeiadis, T., Diamantaras, K., Sarigiannidis, G., & Chatzisavvas, K.
(2015). A comparison of machine learning techniques for customer
churn prediction. Simulation Modelling Practice and Theory, 55, 1-9.

[13] Azevedo, A., & Filipe Santos, M. (2008). KDD, SEMMA and CRISP-DM:
A parallel overview. European Conference on Data Mining (pp.
182-185). Amsterdam: IADIS.

[14] Swamynathan, M. (2017). Mastering Machine Learning with Python in
Six Steps. Bangalore: Apress.

[15] Fayyad, U., & Stolorz, P. (1997). Data mining and KDD: promise and
challenges. Future Generation Computer Systems, 13, 99-115.

[16] Ali, A., Shamsuddin, S. M., & Ralescu, A. L. (2015). Classification with
class imbalance problem: A review. IJASCA, Volume 7, 176-204.

[17] Witten, I. H., & Frank, E. (2005). Data Mining: Practical Machine
Learning Tools and Techniques, Second Edition. San Francisco:
Morgan Kaufmann.
