Report
Submitted for:
Submitted by:
Submitted to:
July-Dec 2023
Sr No Content
1 ABSTRACT
2 INTRODUCTION
3 SOFTWARE USED
4 METHODOLOGY
5 CONCLUSION
6 REFERENCES
ABSTRACT
This research investigates the use of machine learning (ML) to create a loan
eligibility prediction model. The model can examine enormous datasets of financial
and demographic data by utilising the power of ML algorithms to find patterns and
links that might not be immediately obvious using more conventional techniques.
This method may increase the effectiveness and precision of determining loan
eligibility.
With an area under the ROC curve (AUC) of 0.92, the resulting ML model
discriminated well between loan applicants who are eligible and those who are not.
The model also performed well in terms of accuracy, recall, and F1-score.
The project's results indicate that machine learning (ML) offers a viable way to
predict loan eligibility, providing an objective, efficient, and scalable alternative to
manual review. By putting ML-based models into practice, financial organisations
may improve the overall accuracy of loan eligibility decisions, streamline the loan
application process, and shorten processing times.
INTRODUCTION
There are many issues with the present manual method of determining loan
eligibility:
3. Human Error: Human reviewers can make mistakes, such as omitting crucial
information or judging things incorrectly.
Proposed Solution
Machine learning (ML) is a viable method for automating and improving the
prediction of loan eligibility. ML algorithms can analyse large quantities of
demographic and financial data to find patterns and relationships that might not be
immediately obvious using more conventional techniques. By learning from past
data, ML algorithms can estimate the likelihood of loan repayment for new applicants.
Related Work
Numerous research works have investigated the use of machine learning for
predicting loan eligibility. In their comparison of several machine learning algorithms
for credit scoring, Lessmann et al. (2015) showed how well ML could predict loan
repayment. In their 2014 study of sophisticated credit risk assessment methods,
Handzic and Keček emphasised how machine learning may enhance the
determination of loan eligibility. Chen et al.'s (2018) study on the application of ML
for financial fraud detection showed how generalizable ML is for a range of financial
decision-making tasks.
Initial Goals
1. Create a machine learning model that can be used to forecast loan eligibility.
4. Examine how machine learning could affect the determination of loan eligibility.
Project Summary
This research investigated the use of machine learning (ML) to create a loan
eligibility prediction model. The project included data collection, preprocessing,
feature engineering, model selection, and model assessment using Python
programming and Anaconda Jupyter Notebook. With an AUC of 0.92, the resulting
ML model discriminated well between eligible and ineligible applicants.
Methodology
1. Data Collection and Preprocessing: Information on loan applications was
gathered from a financial organisation. Next, preprocessing was done on the
data to deal with categorical variables, outliers, and missing values.
2. Feature Engineering: The data was analysed to extract pertinent features,
such as demographic data, debt-to-income ratio, employment history, income,
and credit history.
3. Model Selection and Training: Several machine learning techniques were
assessed, including random forests, decision trees, and logistic regression.
The logistic regression model was chosen for its interpretability and
performance.
4. Model Evaluation: Relevant criteria, such as accuracy, precision, recall, and
F1-score, were used to assess the model. With an AUC of 0.92, the model
demonstrated strong discriminative performance for loan eligibility.
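The model selection step above can be illustrated with a minimal sketch that compares the three candidate model families by cross-validated AUC. The data here is synthetic (generated with make_classification) because the project's dataset is not reproduced in this report, and the hyperparameters and the 5-fold setup are assumptions for illustration, not the settings used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the loan data (assumption: the real dataset is not shown).
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=0
)

# Candidate models named in the methodology; hyperparameters are illustrative.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Mean 5-fold cross-validated ROC AUC for each candidate.
auc_by_model = {
    name: cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    for name, model in candidates.items()
}

# Print the candidates from highest to lowest mean AUC.
for name, auc in sorted(auc_by_model.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: mean AUC = {auc:.3f}")
```

Scores on synthetic data say nothing about the real loan data. The sketch only shows how such a comparison might be set up before weighing interpretability against performance.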
Code explanation:
1. Data Loading: Reads the dataset from the file "loan-test.csv" and stores
it in a Pandas DataFrame called data.
2. Handling Missing Values: Checks for missing values in the dataset and
prints the sum of missing values for each column.
3. Removing Duplicates: Drops duplicate rows from the dataset and stores
the result in a new DataFrame called df.
4. Feature Extraction: Extracts independent variables (X) and the
dependent variable (Y) from the dataset.
5. Splitting the Data: Splits the data into training and testing sets using the
train_test_split function.
6. Identifying Categorical Columns: Identifies categorical columns in the
independent variable (X).
7. Preprocessing with Column Transformer:
8. Pipeline with Linear Regression: Creates a pipeline that first applies the
preprocessor and then fits a linear regression model.
9. Encoding Categorical Variables: Encodes categorical variables using
LabelEncoder.
10. Model Training and Prediction: Fits the pipeline on the training data
and predicts the target variable.
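The steps above can be sketched roughly as follows. This is a sketch under stated assumptions, not the project's actual code. A small inline DataFrame stands in for "loan-test.csv", and every column name is hypothetical. The classifier is LogisticRegression, following the Methodology section, although step 8 names linear regression. LabelEncoder is applied only to the target, while OneHotEncoder handles the categorical features inside the column transformer.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler

# Step 1: inline stand-in for pd.read_csv("loan-test.csv"); column names are hypothetical.
data = pd.DataFrame({
    "income": [2500, 4200, None, 6100, 3000, 5200, 1800, 7000, 3900, 4600, 2200, 6500],
    "credit_history": [0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1],
    "employment": ["salaried", "salaried", "self_employed", "salaried",
                   "self_employed", "self_employed", "salaried", "salaried",
                   "self_employed", "salaried", "salaried", "self_employed"],
    "eligible": ["N", "Y", "Y", "Y", "N", "Y", "N", "Y", "N", "Y", "N", "Y"],
})
data = pd.concat([data, data.iloc[[0]]], ignore_index=True)  # add one duplicate row

print(data.isnull().sum())   # Step 2: count missing values per column
df = data.drop_duplicates()  # Step 3: drop duplicate rows

X = df.drop(columns="eligible")                   # Step 4: independent variables
y = LabelEncoder().fit_transform(df["eligible"])  # Step 9: "N" -> 0, "Y" -> 1

# Step 5: hold out a quarter of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Step 6: treat every non-numeric column as categorical.
categorical_cols = [c for c in X.columns if not pd.api.types.is_numeric_dtype(X[c])]
numeric_cols = [c for c in X.columns if c not in categorical_cols]

# Step 7: impute and scale numeric columns, one-hot encode categorical ones.
preprocessor = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# Step 8: chain preprocessing and the classifier.
model = Pipeline([("preprocess", preprocessor),
                  ("classifier", LogisticRegression(max_iter=1000))])

model.fit(X_train, y_train)          # Step 10: train on the training split
predictions = model.predict(X_test)  # predicted labels for the held-out rows
print(predictions)
```

Keeping the imputer, scaler and encoder inside the pipeline means they are fitted on the training split only, which avoids leaking information from the test rows into preprocessing.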
FLOWCHART:
Start
|
v
Data Collection
|
v
Data Import
|
v
Data Cleaning
|
v
Exploratory Data Analysis
|
v
Feature Engineering
|
v
Feature Selection
|
v
Model Training
|
v
Model Evaluation
|
v
Deployment
|
v
End
Implementation
For this project, the development environment was Anaconda Jupyter Notebook.
Data preprocessing, feature engineering, model selection, and model assessment
were implemented using Python programming.
Results
With an AUC of 0.92, the resulting ML model discriminated well between eligible and
ineligible applicants. According to feature importance analysis, the most important
variables affecting loan eligibility were credit history, income, and employment history.
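The report does not say how feature importance was computed. One common approach for a logistic regression model, shown here as a sketch under that assumption, is to standardise the features and rank them by the absolute size of their fitted coefficients. The data below is synthetic and the feature names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical feature names; the values are synthetic, not the project's data.
feature_names = ["credit_history", "income", "employment_years", "noise"]
X = rng.normal(size=(400, len(feature_names)))

# Synthetic target: driven mostly by the first feature, not at all by the last.
logits = 2.0 * X[:, 0] + 1.0 * X[:, 1] + 0.5 * X[:, 2]
y = (logits + rng.normal(scale=0.5, size=400) > 0).astype(int)

# Standardise so that coefficient magnitudes are comparable across features.
X_scaled = StandardScaler().fit_transform(X)
clf = LogisticRegression(max_iter=1000).fit(X_scaled, y)

# Rank features by absolute coefficient, largest first.
ranking = sorted(
    zip(feature_names, clf.coef_[0]), key=lambda pair: abs(pair[1]), reverse=True
)
for name, coef in ranking:
    print(f"{name}: {coef:+.3f}")
```

Coefficient magnitudes are only a rough guide when features are correlated, so a ranking like this is best read alongside other checks such as permutation importance.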
The project's results indicate that machine learning (ML) offers a viable way to
predict loan eligibility, providing an objective, efficient, and scalable alternative to
manual review. By putting ML-based models into practice, financial institutions may
improve the overall accuracy of loan eligibility determinations, streamline the loan
application process, and shorten processing times.
Future Directions
To further improve model performance, future work could incorporate new data
sources such as social media and behavioural data. Investigating explainable AI
(XAI) methods may also shed light on the model's decision-making process, helping
to ensure fairness and transparency. Moreover, extending machine learning (ML) to
other financial decision-making processes, such as insurance underwriting and
credit card fraud detection, might broaden ML's impact in the financial industry.
CONCLUSION
A promising way to improve and automate the current manual review process is to
apply machine learning (ML) to the prediction of loan eligibility. This study focused on
creating a machine learning model that can accurately forecast loan eligibility by
evaluating a variety of financial and demographic variables that determine a
borrower's creditworthiness. With an AUC of 0.92, the model performed well,
demonstrating its ability to distinguish between loan applicants who are qualified and
those who are not.
Scalability: Financial organisations may easily scale their loan approval process
thanks to machine learning (ML) models' capacity to manage massive amounts of
data and react to shifting patterns.
Predictive Power: By learning from past data and recognising intricate patterns,
machine learning algorithms may increase the precision of loan eligibility forecasts.
REFERENCES:
Lessmann, S., Leong, K. W., Baesens, B., and Seow, H.-V. (2015). Retail banking
case study: benchmarking comparable credit scoring programmes. Journal of
Business Research, 68(1), 38–47.
Handzic, A., and Keček, S. (2014). Advanced methods for evaluating credit risk. In
Advanced Methods for Software Engineering and Computing Sciences (pp. 15–21).
Cham: Springer.
He, X., Jin, Z., and Chen, X. (2018). An empirical investigation of the use of machine
learning in the identification of financial fraud. Systems of Information, 85, 176–188.
LIST OF FIGURES
2.1
Block Diagram for Loan Eligibility Prediction
Start
|
v
Data Input
|
v
Data Preprocessing
|
v
Feature Engineering
|
v
Model Selection and Training
|
v
Model Evaluation
|
v
Loan Eligibility Prediction
|
v
End
Model Selection and Training Flowchart
Start
|
v
Choose Machine Learning Algorithm
|
v
Split Data into Training and Testing Sets
|
v
Train Model on Training Set
|
v
End
LIST OF TABLES
TABLE 1:
Data Description:
Metric     Description
Accuracy   Proportion of correctly classified loan applications
Precision  Proportion of positive predictions that are actually correct
Recall     Proportion of actual positives that are correctly identified
F1-Score   Harmonic mean of precision and recall
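As a worked illustration of the metrics in the table, the sketch below computes them from confusion-matrix counts, with F1 taken as the harmonic mean of precision and recall. The counts are made-up example values, not results from this project, and the function name is hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts.

    Assumes tp + fp > 0 and tp + fn > 0, so that no denominator is zero.
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # correct share of the positive predictions
    recall = tp / (tp + fn)      # share of actual positives that were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up counts: 40 true positives, 10 false positives,
# 20 false negatives, 30 true negatives.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, accuracy is 70/100 = 0.700, precision is 40/50 = 0.800, recall is 40/60 ≈ 0.667, and F1 is 80/110 ≈ 0.727.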