Loan Approval Prediction Using Supervised Learning Algorithm

ABSTRACT

Technology has improved both human existence and the quality of life. Every day we
set out to create something new and different, and we now have machines that
support our lives and offer a solution to almost every problem. In the banking
sector, an applicant must provide proofs and supporting documents before a loan
amount is approved. Whether an application is approved or rejected depends on the
system's assessment of the applicant's historical data. Every day many people apply
for loans, but a bank has limited funds. In this situation, an accurate prediction
from a classification algorithm, for example logistic regression, a random forest
classifier or a support vector machine classifier, would be very beneficial. A
bank's profit or loss depends largely on its loans, that is, on whether customers
repay them. Loan recovery is therefore of the highest importance to the banking
sector, and improving the approval process plays an important role. In this work,
historical applicant data were used to build machine learning models with different
classification algorithms. The main objective of this paper is to predict whether a
new applicant should be granted a loan, using machine learning models trained on
the historical data set.

TABLE OF CONTENTS

CHAPTER NO. TITLE

1 INTRODUCTION
2 LITERATURE SURVEY
2.1 LITERATURE SURVEY 1
2.2 LITERATURE SURVEY 2
2.3 LITERATURE SURVEY 3
2.4 LITERATURE SURVEY 4
2.5 LITERATURE SURVEY 5
3 METHODOLOGY
3.1 EXISTING SYSTEM
3.2 PROPOSED SYSTEM
3.3 SYSTEM ARCHITECTURE
3.4 FLOW CHART
3.5 UML DIAGRAMS
3.5.1 USE CASE DIAGRAM
3.5.2 CLASS DIAGRAM
3.5.3 SEQUENCE DIAGRAM
3.5.4 ACTIVITY DIAGRAM
3.6 SYSTEM REQUIREMENTS
3.6.1 HARDWARE REQUIREMENTS
3.6.2 SOFTWARE REQUIREMENTS
4 SOFTWARE MODULES
4.1 PYTHON
4.1.1 PYTHON HISTORY
4.2 NUMPY
4.2.1 INSTALL NUMPY
4.2.2 IMPORT NUMPY
4.2.3 USE OF NUMPY
4.2.4 NUMPY ARRAY
4.3 MATPLOTLIB
4.4 MODULE DESCRIPTION
4.4.1 DATA COLLECTION
4.4.2 DATASET
4.4.3 DATA PREPARATION
4.4.4 MODEL SELECTION
4.4.5 ANALYSE AND PREDICTION
4.4.6 ACCURACY ON TEST SET
4.4.7 SAVING THE TRAINED MODEL
4.5 PROPOSED ALGORITHM
4.5.1 SUPPORT VECTOR MACHINE (SVM)
4.6 SYSTEM TESTING
5 CONCLUSION
5.1 CONCLUSION
5.2 FUTURE ENHANCEMENT
5.3 APPENDICES
A. SAMPLE CODE
B. SCREENSHOTS

LIST OF ABBREVIATIONS

ABBREVIATION EXPANSION

SVM Support Vector Machine

LIST OF FIGURES
FIGURE NO. FIGURE NAME
3.1 Architecture Diagram
3.2 Flow chart
3.3 Use case diagram
3.4 Class diagram
3.5 Sequence Diagram
3.6 Activity diagram
5.1 Home page
5.2 Login page
5.3 Entering credentials
5.4 Upload page
5.5 Uploaded page
5.6 Preview of data set
5.7 Training data set
5.8 Prediction page
5.9 Prediction result
5.10 Performance Analysis
5.11 Chart Representation of Data set

CHAPTER 1
INTRODUCTION

As digitization in the banking sector increases the volume of data every day, more
people want to apply for loans over the internet. Artificial intelligence (AI), a
common method of data analysis, is attracting growing attention, and people in many
industries use AI algorithms to solve problems based on their industry data. Banks
face a significant problem in loan approval. Every day they receive so many
applications that bank employees struggle to manage them, and the chance of
mistakes is high. Most banks earn their profit from loans, but choosing deserving
customers from a large number of applications is risky, and a single mistake can
cause a bank a massive loss. Loan distribution is the primary business of almost
every bank. This project aims to grant loans to the deserving applicants among all
applicants. An efficient and unbiased system reduces the time the bank spends
checking each applicant and allows applications to be checked on a priority basis.
The bank authorities can then complete customers' remaining formalities on time,
which benefits the customers. The system is thus efficient for both banks and
applicants, and it allows jumping to particular applications that deserve to be
approved on a priority basis. The data set fields used for prediction are
'Gender', 'Married', 'Dependents', 'Education', 'Self_Employed',
'ApplicantIncome', 'CoapplicantIncome', 'LoanAmount', 'Loan_Amount_Term',
'Credit_History', 'Property_Area' and 'Loan_Status'.
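As a rough illustration of how such a record could be turned into numbers before training, the sketch below encodes one applicant. The column names are taken from the text; the category values (for example "Graduate" or "Semiurban"), the numeric codes, the sample applicant, and the choice to treat Loan_Status as the label rather than an input are all assumptions made for this example.

```python
# Column names come from the text above. The category values and their
# numeric codes are assumed for illustration and may differ in the real data.
FEATURES = [
    "Gender", "Married", "Dependents", "Education", "Self_Employed",
    "ApplicantIncome", "CoapplicantIncome", "LoanAmount",
    "Loan_Amount_Term", "Credit_History", "Property_Area",
]
TARGET = "Loan_Status"  # assumed to be the label to predict, not an input

# Hypothetical mappings from category labels to numbers.
CATEGORY_CODES = {
    "Gender": {"Male": 1, "Female": 0},
    "Married": {"Yes": 1, "No": 0},
    "Education": {"Graduate": 1, "Not Graduate": 0},
    "Self_Employed": {"Yes": 1, "No": 0},
    "Property_Area": {"Rural": 0, "Semiurban": 1, "Urban": 2},
}


def encode_record(record):
    """Turn one applicant record (a dict) into a list of floats."""
    row = []
    for name in FEATURES:
        value = record[name]
        if name in CATEGORY_CODES:
            # Categorical column: look up its numeric code.
            row.append(float(CATEGORY_CODES[name][value]))
        else:
            # Numeric column: use the value directly.
            row.append(float(value))
    return row


# A made-up applicant, used only to exercise the function.
applicant = {
    "Gender": "Male", "Married": "Yes", "Dependents": 1,
    "Education": "Graduate", "Self_Employed": "No",
    "ApplicantIncome": 4583, "CoapplicantIncome": 1508,
    "LoanAmount": 128, "Loan_Amount_Term": 360,
    "Credit_History": 1, "Property_Area": "Rural",
}
encoded = encode_record(applicant)
print(encoded)
```

Integer codes are the simplest possible encoding; for a column with more than two unordered categories, such as Property_Area, one-hot encoding may be a better fit for a linear model.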

CHAPTER 2

LITERATURE SURVEY

2.1 LITERATURE SURVEY 1

“Prediction for Loan Approval using Machine Learning Algorithm”


AUTHORS: Ashwini S. Kadam, Shraddha R Nikam, Ankita A. Aher, Gayatri V.
Shelke, Amar S. Chandgude
Content:
In our banking system, banks have many products to sell, but the main source of
income of any bank is its credit line: banks earn from the interest on the loans
they extend. A bank's profit or loss depends to a large extent on loans, i.e. on
whether customers repay their loans or default. By predicting loan defaulters, a
bank can reduce its non-performing assets, which makes the study of this phenomenon
very important. Previous research in this area has shown that there are many
methods for studying the problem of controlling loan default. Because accurate
predictions are essential for maximizing profit, it is important to study the
nature of the different methods and compare them. The paper studies the problem of
predicting loan defaulters with a predictive analytics approach consisting of
(i) collection of data, (ii) data cleaning and (iii) performance evaluation.
Experimental tests found that the Naïve Bayes model performs better than the other
models in loan forecasting.

2.2 LITERATURE SURVEY 2

“An Approach for Prediction of Loan Approval using Machine Learning Algorithm”

AUTHORS: Mohammad Ahmad Sheikh, Amit Kumar Goel, Tapas Kumar
Content:

In our banking system, banks have many products to sell, but the main source of
income of any bank is its credit line: banks earn from the interest on the loans
they extend. A bank's profit or loss depends to a large extent on loans, i.e. on
whether customers repay their loans or default. By predicting loan defaulters, a
bank can reduce its non-performing assets, which makes the study of this phenomenon
very important. Previous research in this area has shown that there are many
methods for studying the problem of controlling loan default. Because accurate
predictions are essential for maximizing profit, it is important to study the
nature of the different methods and compare them. The paper studies the problem of
predicting loan defaulters with an important approach in predictive analytics: the
logistic regression model. The data were collected from Kaggle for study and
prediction. Logistic regression models were fitted and different performance
measures were computed. The models are compared on the basis of performance
measures such as sensitivity and specificity. The final results show that the
models produce different results. One model is marginally better because it
includes variables (personal attributes of the customer such as age, purpose,
credit history, credit amount and credit duration) other than checking account
information (which shows the wealth of a customer) that should be taken into
account to calculate the probability of default on a loan correctly. Therefore, by
using a logistic regression approach, the right customers to target for granting
loans can be easily identified by evaluating their likelihood of default. The paper
concludes that a bank should not only target rich customers for granting loans but
should also assess the other attributes of a customer, which play a very important
part in credit granting decisions and in predicting loan defaulters.
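The two performance measures named above can be computed from a confusion matrix. The sketch below is a minimal illustration, not the cited paper's code: the label lists are made up, and treating 1 as the positive class (a defaulter) is an assumption.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1          # predicted positive, actually positive
        elif p == positive:
            fp += 1          # predicted positive, actually negative
        elif a == positive:
            fn += 1          # predicted negative, actually positive
        else:
            tn += 1          # predicted negative, actually negative
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """Share of actual positives that the model catches (true positive rate)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of actual negatives that the model clears (true negative rate)."""
    return tn / (tn + fp)


# Made-up labels: 1 = defaulter (assumed positive class), 0 = non-defaulter.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]

tp, fp, tn, fn = confusion_counts(actual, predicted)
print("sensitivity:", sensitivity(tp, fn))
print("specificity:", specificity(tn, fp))
```

For a lender, low sensitivity means defaulters are being approved, while low specificity means reliable customers are being turned away, so the two measures are usually reported together.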

2.3 LITERATURE SURVEY 3

“An exploratory Data Analysis for Loan Prediction based on nature of clients”
AUTHORS: X. Frencis Jensy, V. P. Sumathi, Janani Shiva Shri
Content:

In India, the number of people applying for loans has increased in recent years for
various reasons. Bank employees are not able to analyse or predict whether a
customer can pay back the amount (good customer or bad customer) at the given
interest rate. The aim of this paper is to find the nature of the clients applying
for personal loans. An exploratory data analysis technique is used to deal with
this problem. The result of the analysis shows that the majority of clients prefer
short-term loans and that clients mostly apply for loans for debt consolidation.
The results are shown in graphs that help bankers understand the clients'
behaviour.

2.4 LITERATURE SURVEY 4

“ACCURATE LOAN APPROVAL PREDICTION BASED ON MACHINE LEARNING APPROACH”

AUTHORS: J. Tejaswini, T. Mohana Kavya, R. Devi Naga Ramya, P. Sai Triveni,
Venkata Rao Maddumala

Content: Loan approval is a very important process for banking organizations. The
banking industry always needs more accurate predictive modeling systems for many
issues. Predicting credit defaulters is a difficult task for the banking industry.
The system approves or rejects loan applications. Recovery of loans is a major
contributing parameter in the financial statements of a bank, and it is very
difficult to predict whether a customer will repay a loan. Machine learning (ML)
techniques are very useful in predicting outcomes for large amounts of data. In
this paper three machine learning algorithms, Logistic Regression (LR), Decision
Tree (DT) and Random Forest (RF), are applied to predict the loan approval of
customers. The experimental results show that the Decision Tree algorithm is more
accurate than the Logistic Regression and Random Forest approaches.

2.5 LITERATURE SURVEY 5

“Predictive and probabilistic approach using logistic regression: Application to
prediction of loan approval”

AUTHORS: Vaidya

Decision making is supported by probabilistic and predictive approaches developed
with various machine learning algorithms. This paper discusses logistic regression
and its mathematical representation. It uses logistic regression as a machine
learning tool to apply the predictive and probabilistic approaches to the problem
of loan approval prediction. Using logistic regression as a tool, the paper
specifically describes whether or not a loan will be approved for a given set of
applicant records.
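For reference, the usual textbook form of the logistic regression model is given below. This is the standard representation and not necessarily the notation of the cited paper; the symbols x (applicant feature vector), w (weight vector), b (bias) and the 0.5 decision threshold are assumptions introduced here.

```latex
% Probability that an application is approved (y = 1), given features x.
\[
P(y = 1 \mid \mathbf{x}) = \sigma(\mathbf{w}^{\top}\mathbf{x} + b)
  = \frac{1}{1 + e^{-(\mathbf{w}^{\top}\mathbf{x} + b)}}
\]

% Equivalently, the log-odds of approval are linear in the features.
\[
\log \frac{P(y = 1 \mid \mathbf{x})}{1 - P(y = 1 \mid \mathbf{x})}
  = \mathbf{w}^{\top}\mathbf{x} + b
\]

% Decision rule with an assumed threshold of 0.5.
\[
\hat{y} =
\begin{cases}
1 & \text{if } P(y = 1 \mid \mathbf{x}) \ge 0.5 \\
0 & \text{otherwise}
\end{cases}
\]
```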

CHAPTER 3

METHODOLOGY

3.1 EXISTING SYSTEM

• Y. Shi and P. Song proposed a method for evaluating project loans using risk
analysis. The method evaluates the risk involved in the loans of commercial
banks.
• R. Zhang and D. Li used machine learning approaches in prediction systems.
The machine learning approach was used for the assessment of water quality. The
paper concluded that machine learning is a very important tool in prediction
systems.
• C. Frank et al. used machine learning to predict smoking status.
Different machine learning approaches were applied and investigated for
finding the smoking status. The results showed that the logistic algorithm
performed better.
• R. Lopes et al. applied a machine learning approach to the prediction of
credit recovery. Credit recovery is a very important issue for the banking
system, and predicting it is a challenging task. Different
machine learning approaches were applied to predict credit recovery, and the
gradient boosting machine (GBM) outperformed the other machine learning
approaches.

3.2 PROPOSED SYSTEM

• The proposed model characterizes the behavior of customers on the basis of
their records. These records are collected from the customers to create a data set.
Using this data set to train a machine learning model, we predict
whether a customer's loan will be approved.
• The aim of this paper is to provide a quick, immediate and easy way to choose
deserving applicants, which can give the bank special advantages. The Loan
Prediction System can automatically calculate the weight of each feature taking
part in loan processing, and on new test data the same features are processed with
respect to their associated weights. A time limit can be set for the applicant to
check whether his/her loan can be sanctioned.
• The Loan Prediction System allows jumping to a specific application so that it
can be checked on a priority basis. This paper is exclusively for the managing
authority of a bank or finance company; the whole prediction process is done
privately, and no stakeholders are able to alter the processing.
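The idea of learning one weight per feature and then applying those weights to new data can be sketched with a linear support vector machine. The report names SVM as its algorithm but, in this section, does not state the kernel or solver, so the training loop below (plain sub-gradient descent on the hinge loss), the two toy features, the data values and all function names are assumptions chosen for brevity.

```python
def train_linear_svm(rows, labels, epochs=200, learning_rate=0.01, reg=0.01):
    """Fit one weight per feature plus a bias by sub-gradient descent
    on the regularized hinge loss.

    rows:   list of feature lists (already scaled to similar ranges)
    labels: +1 (approve) or -1 (reject) for each row
    """
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            margin = y * (sum(w * v for w, v in zip(weights, x)) + bias)
            # Regularization: shrink every weight slightly on each step.
            weights = [w - learning_rate * reg * w for w in weights]
            if margin < 1:
                # Point is misclassified or inside the margin: move the
                # boundary so that this point is pushed to the right side.
                weights = [w + learning_rate * y * v
                           for w, v in zip(weights, x)]
                bias += learning_rate * y
    return weights, bias


def predict(weights, bias, x):
    """Apply the learned weights to a new applicant: +1 approve, -1 reject."""
    score = sum(w * v for w, v in zip(weights, x)) + bias
    return 1 if score >= 0 else -1


# Toy data (made up): [credit history flag, income score scaled to 0..1].
rows = [[1.0, 0.9], [1.0, 0.7], [1.0, 0.8],
        [0.0, 0.2], [0.0, 0.4], [0.0, 0.3]]
labels = [1, 1, 1, -1, -1, -1]

weights, bias = train_linear_svm(rows, labels)
print("weights:", weights, "bias:", bias)
print("new applicant [1.0, 0.85]:", predict(weights, bias, [1.0, 0.85]))
print("new applicant [0.0, 0.25]:", predict(weights, bias, [0.0, 0.25]))
```

In practice a library implementation would normally replace this loop; the hand-written version is shown only because it makes the per-feature weights explicit.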

3.3 SYSTEM ARCHITECTURE

[Figure: Input dataset -> Pre-processing and Feature Selection -> Support Vector
Machine (SVM) -> Prediction Results: "You are eligible for loan" OR "No, you are
not" -> Performance Analysis and Graph]
Fig 3.1-Architecture Diagram
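The stages of the architecture can be read as a chain of small functions. The sketch below is one possible arrangement under stated assumptions: mean imputation as the pre-processing step, column indices as the feature selection step, a hand-picked linear decision function standing in for the trained SVM, and the exact wording of the rejection message are all hypothetical, as are the sample rows.

```python
def preprocess(rows):
    """Pre-processing (assumed): replace missing values (None) with the
    mean of the known values in the same column."""
    n_cols = len(rows[0])
    means = []
    for c in range(n_cols):
        known = [r[c] for r in rows if r[c] is not None]
        means.append(sum(known) / len(known))
    return [[means[c] if r[c] is None else r[c] for c in range(n_cols)]
            for r in rows]


def select_features(rows, keep):
    """Feature selection (assumed): keep only the listed column indices."""
    return [[r[c] for c in keep] for r in rows]


def classify(row, weights, bias):
    """Stand-in for the trained SVM: a linear decision function."""
    score = sum(w * v for w, v in zip(weights, row)) + bias
    return score >= 0


def result_message(eligible):
    """Prediction result shown to the user (rejection wording assumed)."""
    if eligible:
        return "You are eligible for loan"
    return "No, you are not eligible for loan"


def accuracy(predicted, actual):
    """Performance analysis: share of predictions that match the truth."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Made-up input: [credit history flag, income score, dependents].
raw_rows = [[1, 0.9, 5], [0, None, 3], [1, 0.7, None], [0, 0.2, 1]]
actual = [True, False, True, False]

clean = preprocess(raw_rows)
selected = select_features(clean, keep=[0, 1])
# Hypothetical weights and bias, as if produced by an earlier training step.
predicted = [classify(r, weights=[2.0, 1.0], bias=-1.5) for r in selected]
messages = [result_message(p) for p in predicted]

for m in messages:
    print(m)
print("accuracy:", accuracy(predicted, actual))
```

Keeping each stage as a separate function mirrors the boxes in the diagram and lets any single stage be replaced without touching the others.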

3.4 FLOW CHART

1. The data flow diagram (DFD) is also called a bubble chart. It is a simple
graphical formalism that can be used to represent a system in terms of the input
data to the system, the various processing carried out on this data, and the
output data generated by the system.
2. The DFD is one of the most important modeling tools. It is
used to model the system components. These components are the system
process, the data used by the process, an external entity that interacts with
the system, and the information flows in the system.
3. The DFD shows how information moves through the system and how it is
modified by a series of transformations. It is a graphical technique that depicts
information flow and the transformations that are applied as data moves from
input to output.
