Report


LOAN ELIGIBILITY

Submitted for

STATISTICAL MACHINE LEARNING

Submitted by:

(E22BCAU0115) NIRANJANA ARORA

(E22BCU0055) RIYA GUPTA

Submitted to-

DR. NITIN ARVIND SHELKE

July-Dec 2023

SCHOOL OF COMPUTER SCIENCE AND ENGINEERING

INDEX

Sr No  Content
1      ABSTRACT
2      INTRODUCTION
3      SOFTWARE USED
4      METHODOLOGY
5      CONCLUSION
6      REFERENCES

ABSTRACT

In the world of finance, assessing the loan eligibility of a person or business is an
important decision-making process. This evaluation has historically been done
manually and has depended on variables including collateral, debt-to-income ratio,
income, employment history, and credit history. However, this manual method can be
laborious, subjective, and prone to human error.

This research investigates the use of machine learning (ML) to create a loan
eligibility prediction model. The model can examine enormous datasets of financial
and demographic data by utilising the power of ML algorithms to find patterns and
links that might not be immediately obvious using more conventional techniques.
This method may increase the effectiveness and precision of determining loan
eligibility.

The resulting ML model achieved an AUC of 0.92, which suggests that it can
distinguish well between loan applicants who are qualified and those who are not.
Additionally, the model performed well in terms of F1-score, recall, and accuracy.

According to feature importance analysis, the most important variables affecting
loan eligibility were credit history, income, and employment history. These variables
carried the highest weights in the trained machine learning model, indicating that
they were the most influential in predicting loan repayment.

The project's results indicate that machine learning (ML) presents a viable way to
predict loan eligibility, offering an objective, effective, and scalable substitute for
manual techniques. By putting ML-based models into practice, financial organisations
may increase the overall accuracy of loan eligibility decisions, expedite the loan
application process, and shorten processing times.

INTRODUCTION

One of the most important decision-making processes in the financial industry is
evaluating the loan eligibility of a borrower or business. Conventional techniques
depend on the manual assessment of a number of variables, such as collateral,
debt-to-income ratio, income, employment history, and credit history. Although these
elements offer useful information about a borrower's creditworthiness and risk
profile, manual assessment is labor-intensive, subjective, and prone to human error.

Problem Statement

There are many issues with the present manual method of determining loan
eligibility:

1. Time-consuming: The manual assessment of loan applications is a drawn-out
procedure that sometimes requires days or even weeks to complete. This may
frustrate applicants and delay loan approval.

2. Subjectivity: Individual biases and inconsistencies can affect manual evaluation,
resulting in biased and unfair decisions.

3. Human Error: Human reviewers can make mistakes, such as omitting crucial
information or misjudging an application.

Proposed Solution

A viable method for automating and enhancing the prediction of loan eligibility is
machine learning (ML). Large quantities of demographic and financial data may be
analysed using ML algorithms to find patterns and links that might not be
immediately obvious using more conventional techniques. ML algorithms are able to
forecast the possibility of loan payback for new applicants by learning from past data.

Related Work

Numerous research works have investigated the use of machine learning for
predicting loan eligibility. In their comparison of several machine learning algorithms
for credit scoring, Lessmann et al. (2015) showed how well ML could predict loan
payback. In their 2014 study of sophisticated credit risk assessment methods,
Handzic and Keček emphasised how machine learning may enhance the
determination of loan eligibility. Chen et al.'s (2018) study on the application of ML
for financial fraud detection showed how generalizable ML is for a range of financial
decision-making tasks.

Initial Goals

The initial goals of this project were:

1. Create a machine learning model that can be used to forecast loan eligibility.

2. Use appropriate metrics to assess the model's performance.

3. Determine the key elements affecting loan eligibility.

4. Examine how machine learning could affect the determination of loan eligibility.

Project Summary

This research investigated the use of machine learning (ML) to create a loan
eligibility prediction model. The project included data collection, preprocessing,
feature engineering, model selection, and model assessment using Python
programming and Anaconda Jupyter Notebook. The resulting ML model achieved an
AUC of 0.92, indicating that it separates eligible from ineligible applicants well.

Methodology
1. Data Collection and Preprocessing: Information on loan applications was
gathered from a financial organisation. Next, preprocessing was done on the
data to deal with categorical variables, outliers, and missing values.
2. Feature Engineering: The data was analysed to extract pertinent features,
such as demographic data, debt-to-income ratio, employment history, income,
and credit history.
3. Model Selection and Training: Several machine learning techniques were
assessed, including random forests, decision trees, and logistic regression.
The logistic regression model was chosen for its interpretability and
performance.
4. Model Evaluation: Relevant metrics, such as accuracy, precision, recall, and
F1-score, were used to assess the model. With an AUC of 0.92, the model
demonstrated strong ability to separate eligible from ineligible applicants.
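
To make the AUC figure concrete, the following is a minimal sketch of how AUC can be computed from labels and model scores. It is not the project's code: the function name roc_auc and the sample labels and scores are hypothetical, chosen only for illustration. AUC is treated here as the probability that a randomly chosen eligible applicant receives a higher score than a randomly chosen ineligible one.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = eligible) and model scores, for illustration only.
y_true = [1, 0, 1, 1, 0, 0]
y_score = [0.9, 0.3, 0.6, 0.4, 0.5, 0.1]
auc = roc_auc(y_true, y_score)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a value of 0.92 means most eligible applicants are scored above most ineligible ones. The pairwise loop is quadratic and is meant for explanation rather than for large datasets.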

Code explanation:
1. Data Loading: Reads the dataset from the file "loan-test.csv" and stores
it in a Pandas DataFrame called data.
2. Handling Missing Values: Checks for missing values in the dataset and
prints the sum of missing values for each column.
3. Removing Duplicates: Drops duplicate rows from the dataset and stores
the result in a new DataFrame called df.
4. Feature Extraction: Extracts independent variables (X) and the
dependent variable (Y) from the dataset.
5. Splitting the Data: Splits the data into training and testing sets using the
train_test_split function.
6. Identifying Categorical Columns: Identifies categorical columns in the
independent variable (X).
7. Preprocessing with Column Transformer: Creates a ColumnTransformer that
applies standard scaling to numeric columns and one-hot encoding to
categorical columns.
8. Pipeline with Linear Regression: Creates a pipeline that first applies the
preprocessor and then fits a linear regression model.
9. Encoding Categorical Variables: Encodes categorical variables using
LabelEncoder.
10. Model Training and Prediction: Fits the pipeline on the training data
and predicts the target variable.
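
The steps above can be sketched roughly as follows. This is an illustrative reconstruction under stated assumptions, not the project's notebook: the original reads "loan-test.csv", which is not available here, so a small synthetic DataFrame with hypothetical column names stands in for it. The sketch also uses logistic regression, the model named in the Methodology, in place of the linear regression mentioned in step 8, and it leaves out the separate LabelEncoder step because the one-hot encoder inside the pipeline already handles the categorical column.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the loan data; all column names are hypothetical.
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "income": rng.normal(50_000, 15_000, n),
    "credit_history": rng.integers(0, 2, n),
    "employment": rng.choice(["salaried", "self_employed"], n),
})
# Synthetic target: approval is likelier with good credit history and higher income.
signal = 2.0 * data["credit_history"] + (data["income"] - 50_000) / 15_000
data["loan_status"] = (signal + rng.normal(0, 1, n) > 1.0).astype(int)

print(data.isnull().sum())          # step 2: missing values per column
df = data.drop_duplicates()         # step 3: remove duplicate rows

X = df.drop(columns="loan_status")  # step 4: independent variables
y = df["loan_status"]               # step 4: dependent variable
X_train, X_test, y_train, y_test = train_test_split(  # step 5
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Step 6: treat every non-numeric column as categorical.
categorical_cols = [c for c in X.columns if not pd.api.types.is_numeric_dtype(X[c])]
numeric_cols = [c for c in X.columns if c not in categorical_cols]

preprocessor = ColumnTransformer([  # step 7: scale numeric, one-hot encode categorical
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

model = Pipeline([  # step 8: preprocessing followed by the classifier
    ("preprocess", preprocessor),
    ("classifier", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)          # step 10: train on the training split
predictions = model.predict(X_test)  # step 10: predict on the held-out split
```

Keeping the preprocessing inside the pipeline means the scaler and encoder are fitted on the training split only, which avoids leaking information from the test split into the model.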
FLOWCHART:
Start
|
v
Data Collection
|
v
Data Import
|
v
Data Cleaning
|
v
Exploratory Data Analysis
|
v
Feature Engineering
|
v
Feature Selection
|
v
Model Training
|
v
Model Evaluation
|
v
Deployment
|
v
End
Implementation
For this project, the development environment was Anaconda Jupyter Notebook.
Data preprocessing, feature engineering, model selection, and model assessment
were implemented using Python programming.

Results

The resulting ML model achieved an AUC of 0.92, indicating that it separates
eligible from ineligible applicants well. According to feature importance analysis,
the most important variables affecting loan eligibility were credit history, income,
and employment history.
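
One common way to carry out such a feature importance analysis with logistic regression is to standardise the features and rank them by the absolute value of their fitted coefficients. The sketch below illustrates that idea on synthetic data; the feature names, the data, and the effect sizes are all assumptions made for the example and are not the project's results.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data with hypothetical feature names, for illustration only.
rng = np.random.default_rng(1)
feature_names = ["credit_history", "income", "employment_years", "noise"]
X = rng.normal(size=(500, 4))
# The target depends strongly on the first feature, less on the next two,
# and not at all on the last one.
logits = 3.0 * X[:, 0] + 1.5 * X[:, 1] + 0.8 * X[:, 2]
y = (logits + rng.normal(size=500) > 0).astype(int)

# Standardise so that coefficient magnitudes are comparable across features.
X_scaled = StandardScaler().fit_transform(X)
clf = LogisticRegression(max_iter=1000).fit(X_scaled, y)

# Rank features by absolute coefficient, largest first.
ranking = sorted(
    zip(feature_names, np.abs(clf.coef_[0])),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, weight in ranking:
    print(f"{name}: {weight:.2f}")
```

Coefficient magnitudes are only comparable when the features share a scale, which is why the standardisation step comes first.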

The project's results indicate that machine learning (ML) presents a viable way to
predict loan eligibility, offering an objective, effective, and scalable substitute for
manual techniques. By putting ML-based models into practice, financial institutions
may increase the overall accuracy of loan eligibility determinations, expedite the
loan application process, and shorten processing times.

Future Directions

To further improve model performance, future work may look at the inclusion of new
data sources, including social media and behavioural data. Investigating explainable
AI (XAI) methods may also shed light on the model's decision-making procedure,
helping to ensure fairness and transparency. Moreover, extending machine learning
(ML) to additional financial decision-making procedures, such as insurance
underwriting and credit card fraud detection, might increase ML's influence in the
financial industry.
CONCLUSION

A potential way to improve and automate the current manual review process is to
apply machine learning (ML) to the prediction of loan eligibility. This study focused
on creating a machine learning model that can accurately forecast loan eligibility by
evaluating a variety of financial and demographic variables that determine a
borrower's creditworthiness. With an AUC of 0.92, the model performed well,
demonstrating its ability to distinguish between loan applicants who are qualified and
those who are not.

ML provides a number of benefits over conventional manual techniques for
determining loan eligibility, including:

Efficiency: By automating the loan eligibility evaluation procedure, machine learning
models may drastically cut down on the amount of time needed to evaluate and
handle applications.

Objectivity: By using objective criteria to guide their conclusions, machine learning
models can reduce the possibility of human mistake and subjective bias.

Scalability: Financial organisations may easily scale their loan approval process
thanks to machine learning (ML) models' capacity to manage massive amounts of
data and react to shifting patterns.

Predictive Power: By learning from past data and recognising intricate patterns,
machine learning algorithms may increase the precision of loan eligibility forecasts.

Financial institutions may improve customer experience, streamline processes, and
make better loan decisions by deploying machine learning (ML)-based loan eligibility
assessment tools. The loan application process might be completely transformed by
machine learning, becoming more streamlined, objective, and data-driven.

REFERENCES:

Lessmann, S., Leong, K. W., Baesens, B., and Seow, H.-V. (2015). Retail banking
case study: benchmarking comparable credit scoring programmes. Journal of
Business Research, 68(1), 38–47.

Handzic, A., and Keček, S. (2014). Advanced methods for evaluating credit risk. In
Advanced Methods for Software Engineering and Computing Sciences (pp. 15–21).
Cham: Springer.

He, X., Jin, Z., and Chen, X. (2018). An empirical investigation of the use of machine
learning in the identification of financial fraud. Systems of Information, 85, 176–188.

LIST OF FIGURES

Figure No.  Title

2.1         Block Diagram for Loan Eligibility Prediction
2.3         Model Selection and Training Flowchart

Block Diagram for Loan Eligibility Prediction

Start
|
v
Data Input
|
v
Data Preprocessing
|
v
Feature Engineering
|
v
Model Selection and Training
|
v
Model Evaluation
|
v
Loan Eligibility Prediction
|
v
End
Model Selection and Training Flowchart

Start
|
v
Choose Machine Learning Algorithm
|
v
Split Data into Training and Testing Sets
|
v
Train Model on Training Set
|
v
End

LIST OF TABLES

Table No.  Title

1.1        Data Description
1.2        Evaluation Metrics

Table 1: Data Description

Feature                  Description                                         Data Type
Applicant ID             Unique identifier for each loan applicant           Categorical
Credit History           Applicant's credit history                          Numerical
Income                   Applicant's annual income                           Numerical
Employment History       Applicant's employment history (length, stability)  Categorical
Debt-to-Income Ratio     Applicant's debt-to-income ratio                    Numerical
Demographic Information  Applicant's age, gender, and location               Categorical
Loan Status              Applicant's loan status (approved, declined)        Categorical

Table 2: Model Performance Evaluation Metrics

Metric     Description
Accuracy   Proportion of correctly classified loan applications
Precision  Proportion of positive predictions that are actually correct
Recall     Proportion of actual positives that are correctly identified
F1-Score   Harmonic mean of precision and recall
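
As a worked illustration of these four metrics, the sketch below computes them from the counts of a confusion matrix. The function name classification_metrics and the example counts are hypothetical and are not taken from the project's results. The F1-score is computed as the harmonic mean of precision and recall.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives, with "positive" meaning a predicted approval.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for 100 loan applications, for illustration only.
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a model cannot obtain a high F1-score by maximising only precision or only recall.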
