

Introduction:

The Loan Approval Classification Dataset is a collection of financial data used to predict whether a loan application will be approved or denied. It includes information about the applicant's demographics, employment, income, and credit history, and it is commonly used for training machine learning models to automate the loan approval process.

The objective of working with this dataset is to predict whether a loan application will be approved or denied based on various financial and demographic factors. The dataset is used to train and evaluate machine learning models that can automate the loan approval process, making it more efficient and accurate.

1. Handling Missing Values

The script first looks for missing values in the dataset using handy base R functions such as is.na() and colSums(). It calculates the total number of missing values and breaks the count down column by column. For example:
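A minimal sketch of those calls, assuming the dataset has been read into a data frame named loan_data (a hypothetical name; substitute your own object):

# Total number of missing values in the whole data frame
sum(is.na(loan_data))

# Missing values broken down column by column
colSums(is.na(loan_data))
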
Once the missing values are identified, the script offers a couple of solutions:

- It uses na.omit() to completely remove rows with missing values if necessary.
- For more refined handling, it fills missing values with something like the median of the column, as shown in the sketch after this list.

Either way, the script makes sure the missing data doesn't distort the analysis.
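Here is a sketch of both options, again assuming a loan_data data frame and imputing the person_income column (an assumed column name) with its median:

# Option 1: drop every row that contains at least one NA
loan_complete <- na.omit(loan_data)

# Option 2: keep the rows and fill a numeric column's NAs with its median
loan_data$person_income[is.na(loan_data$person_income)] <-
  median(loan_data$person_income, na.rm = TRUE)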

2. Visualizing Missing Data

To make things clearer, the script uses graphs to visualize where data is missing, which helps in understanding the problem better. It uses two tools:

1. naniar library: With gg_miss_var(), it creates a simple bar chart showing how many missing values each column has.
2. VIM library: The aggr() function makes a neat plot, coloring missing values in red and complete ones in blue, like a visual heatmap of the missing data.

Here’s an example:
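A sketch of both plots, assuming the naniar and VIM packages are installed and the data frame is named loan_data:

library(naniar)
library(VIM)

# naniar: bar chart of missing values per variable
gg_miss_var(loan_data)

# VIM: aggregation plot; complete cells in blue, missing cells in red
aggr(loan_data, col = c("navyblue", "red"), numbers = TRUE, sortVars = TRUE)
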
3. Balancing the Dataset

The dataset might have an imbalance in how many rows belong to each class, for example in loan_status. If one class (like "Approved") has far more rows than another (like "Rejected"), it can skew the analysis.

The script fixes this using the ROSE library. It tries three methods:

1. Oversampling: adding more rows to the smaller class.
2. Undersampling: removing rows from the larger class.
3. Combination (both): a mix of oversampling and undersampling.

For instance:
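A sketch using ROSE's ovun.sample() function, assuming loan_status is the two-class target in a data frame named loan_data. The calls below rely on the default target proportion of roughly 50/50; the N or p arguments can be supplied to control the resulting size, and the seed is only for reproducibility:

library(ROSE)

# Oversampling: duplicate minority-class rows until the classes are roughly even
over_bal  <- ovun.sample(loan_status ~ ., data = loan_data, method = "over",  seed = 1)$data

# Undersampling: drop rows from the majority class
under_bal <- ovun.sample(loan_status ~ ., data = loan_data, method = "under", seed = 1)$data

# Combination: a mix of both
both_bal  <- ovun.sample(loan_status ~ ., data = loan_data, method = "both",  seed = 1)$data

table(both_bal$loan_status)   # check the new class balance
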
Now, the data is balanced, making the analysis fairer.

4. Removing Duplicate Data

Duplicates in a dataset can distort results. The script identifies repeated rows using the duplicated() function and then removes them. It can also find duplicates based on specific columns, like person_age, person_gender and person_education. For example:
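A sketch of both checks, assuming a loan_data data frame with those column names:

# Flag and drop rows that are exact duplicates of an earlier row
dup_rows  <- duplicated(loan_data)
sum(dup_rows)                       # how many exact duplicates exist
loan_data <- loan_data[!dup_rows, ]

# Duplicates judged only on selected columns
dup_cols  <- duplicated(loan_data[, c("person_age", "person_gender", "person_education")])
loan_data <- loan_data[!dup_cols, ]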

This ensures that the data is clean and has no redundancy.

5. Filtering the Data

The script demonstrates several ways to filter rows of data based on conditions. For example:

- To keep only male applicants.
- To filter for people aged between 20 and 30 (both filters are sketched below).
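A minimal sketch of both filters in base R, assuming a loan_data data frame and that gender is stored as lowercase "male" (an assumption; check the actual labels in your data):

# Keep only male applicants
males <- loan_data[loan_data$person_gender == "male", ]

# Keep applicants aged between 20 and 30 (inclusive)
aged_20_30 <- subset(loan_data, person_age >= 20 & person_age <= 30)
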
This is useful when you want to focus on specific segments of the data.

6. Converting Between Categorical and Numeric Data

Sometimes, categorical data like "Male" and "Female" needs to be turned into numbers (0 and 1) for analysis. The script also turns education levels (like "Bachelor" and "Master") into numbers:
This makes the data easier to work with for statistical models.

7. Normalization

Normalization means rescaling numeric values so they fit within a specific range, such as 0 to 1. The script normalizes person_income using min-max scaling:
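A sketch of min-max scaling for person_income, assuming a loan_data data frame:

# Min-max scaling: map person_income onto the 0-1 range
inc <- loan_data$person_income
loan_data$income_scaled <- (inc - min(inc, na.rm = TRUE)) /
  (max(inc, na.rm = TRUE) - min(inc, na.rm = TRUE))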

This is useful when different attributes have very different scales.

8. Handling Outliers and Invalid Data

Outliers, values that are far higher or lower than the rest, can distort the analysis. The script identifies outliers using the Interquartile Range (IQR) method:

- First, it calculates the lower and upper bounds.
- Then, it replaces outliers with the median income.

For invalid data (like negative ages), it simply identifies and removes or fixes it. Both steps are sketched below.
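A sketch of the IQR rule plus the invalid-age check, assuming a loan_data data frame and the usual 1.5 × IQR convention:

# IQR bounds for person_income
q1  <- quantile(loan_data$person_income, 0.25, na.rm = TRUE)
q3  <- quantile(loan_data$person_income, 0.75, na.rm = TRUE)
iqr <- q3 - q1
lower_bound <- q1 - 1.5 * iqr
upper_bound <- q3 + 1.5 * iqr

# Replace outliers with the median income
med_income <- median(loan_data$person_income, na.rm = TRUE)
outlier <- loan_data$person_income < lower_bound | loan_data$person_income > upper_bound
loan_data$person_income[outlier] <- med_income

# Invalid data: drop rows with impossible ages
loan_data <- loan_data[loan_data$person_age > 0, ]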
