Business Report SMDM Project - Coded


Business Report

SMDM Project – Coded


PGPDSBA

➢ Analysis covers:

❖ Car Sales - Austo Motor Company


❖ Credit card policy Evaluation - GODIGT Bank

Submitted by:
Amit Kumar
Contents
Problem 1 - Data Overview
1.1.1 Import the libraries
1.1.2 Load the data
1.1.3 Structure of the data
1.1.4 Types of data
1.1.5 Treating missing values
1.1.6 Statistical summary
1.1.7 Check and treat data irregularities
1.1.8 Observations and Insights

Problem 1 - Univariate Analysis
1.2.1 Exploring all the variables (categorical and numerical) in the data
1.2.2 Check and treat outliers
1.2.3 Observations and Insights

Problem 1 - Bivariate Analysis
1.3.1 Explore the relationship between all numerical variables
1.3.2 Explore the correlation between all numerical variables
1.3.3 Explore the relationship between categorical vs numerical variables

Problem 1 - Key Questions
1.4.1 Do men tend to prefer SUVs more compared to women?
1.4.2 What is the likelihood of a salaried person buying a Sedan?
1.4.3 What evidence or data supports Sheldon Cooper's claim that a salaried male is an easier target for a SUV sale over a Sedan sale?
1.4.4 How does the amount spent on purchasing automobiles vary by gender?
1.4.5 How much money was spent on purchasing automobiles by individuals who took a personal loan?
1.4.6 How does having a working partner influence the purchase of higher-priced cars?

Problem 1 - Actionable Insights & Recommendations
1.5 Actionable Insights - Business Recommendations

Problem 2 - Framing Analytics Problem
2.1.1 Analyse the dataset
2.1.2 List the top 5 important variables, along with the business justifications
Problem 1
Austo Motor Company is a leading car manufacturer specializing in SUV, Sedan, and Hatchback
models. In its recent board meeting, concerns were raised by the members on the efficiency of
the marketing campaign currently being used. The board has decided to bring in an analytics
professional to improve the existing campaign.

Objective
They want to analyze the data to get a fair idea about the demand of customers which will help
them in enhancing their customer experience. Suppose you are a Data Scientist at the
company and the Data Science team has shared some of the key questions that need to be
answered. Perform the data analysis to find answers to these questions that will help the
company to improve the business.

Problem 1 - Data Overview

1.1.1 Importing the libraries


To analyze the data set using Python, we imported the necessary libraries: Pandas, NumPy,
seaborn, and Matplotlib.
1.1.2 Loading the data
Loading the data is the first step in any data analysis project: it brings the data into the
Jupyter Notebook for analysis. Here the data was loaded into a Pandas DataFrame (df).
1.1.3 Checking the structure of the data
Checking the structure of the data is a crucial step in understanding its format, features, and
initial characteristics. I used df.head(), df.info(), and df.shape (an attribute, not a method).
The data has 1581 rows and 14 columns, and df.head() shows the top 5 records.
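The structure checks named above can be sketched as follows. This is a minimal illustration on a made-up four-column frame, not the report's actual 1581 x 14 data set; the column names and values are assumptions.

```python
import pandas as pd

# Toy stand-in for the real data set; all values are made up for illustration.
df = pd.DataFrame({
    "Age": [53, 48, 41, 27, 31],
    "Gender": ["Male", "Female", "Female", None, "Male"],
    "Make": ["SUV", "SUV", "Sedan", "Hatchback", "Sedan"],
    "Price": [61000, 57000, 33000, 22000, 31000],
})

print(df.head())            # first five records
df.info()                   # column names, non-null counts, and dtypes
n_rows, n_cols = df.shape   # shape is an attribute, so no parentheses
print(n_rows, n_cols)
```

On the real data, the same df.shape call is what yields the 1581 rows and 14 columns reported here.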
1.1.4 Check the types of data.
The column data types are float64 (1), int64 (5), and object (8).
That gives 6 numerical columns and 8 object columns.
1.1.5 Treating missing values
Missing values distort the analysis, so identifying and treating them is important. In this
data set, the following columns have missing values:
Gender - 53
Partner Salary - 106

We treated these by replacing the missing values in Gender with the mode and the missing
values in Partner Salary with the mean.
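The mode and mean imputation described above might look like the sketch below. The column names Gender and Partner_salary are assumed spellings, and the five rows are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up rows; the real data has 53 missing Gender and 106 missing Partner Salary values.
df = pd.DataFrame({
    "Gender": ["Male", "Male", None, "Female", "Male"],
    "Partner_salary": [1000.0, np.nan, 3000.0, 2000.0, np.nan],
})

print(df.isnull().sum())  # missing values per column

# Categorical column: fill with the most frequent value (mode).
gender_mode = df["Gender"].mode()[0]
df["Gender"] = df["Gender"].fillna(gender_mode)

# Numerical column: fill with the mean of the observed values.
partner_mean = df["Partner_salary"].mean()
df["Partner_salary"] = df["Partner_salary"].fillna(partner_mean)
```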
1.1.6 Check the statistical summary:
I used the describe() method in Pandas to generate a statistical summary of this data set. It
helps in understanding the central tendency, dispersion, and shape of the distribution of the
numerical features.

1.1.7 Check for and treat (if needed) data irregularities:


Upon reviewing the data, I noticed that the Gender column contains several different spellings
of the same category. I corrected these using the replace function. The data contains no
duplicate records.
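A minimal sketch of this cleanup is shown below. The misspelled labels ("Femal", "Femle") are hypothetical examples, since the report does not list the actual variants found.

```python
import pandas as pd

# Made-up rows with hypothetical misspellings of "Female".
df = pd.DataFrame({
    "Gender": ["Male", "Femal", "Femle", "Female", "Male"],
    "Age": [30, 41, 28, 35, 52],
})

print(df["Gender"].value_counts())  # reveals the inconsistent labels

# Map every variant onto one canonical label.
df["Gender"] = df["Gender"].replace({"Femal": "Female", "Femle": "Female"})

# Count fully duplicated rows; 0 means there are no duplicate records.
n_duplicates = int(df.duplicated().sum())
print(n_duplicates)
```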
1.1.8 Observations and Insights
The available data set looks good and is sufficient for analyzing customer demand. It can
provide useful information for increasing sales and improving customer experience. The data had
a few anomalies, namely missing records and inconsistent category labels, which have been
treated. It also contains outliers, which are treated in the univariate analysis.
Problem 1 - Univariate Analysis

1.2.1 Exploring categorical and numerical variables in the data.

Exploring Categorical variable: We have the following categorical columns in the data set.
➢ Gender
➢ Profession
➢ Marital status
➢ Education
➢ Personal loan
➢ House loan
➢ Partner working
➢ Make

We use bar charts to understand the distribution of each categorical variable.
Gender: The data clearly shows that male car owners outnumber female car owners.
Profession: The data suggests that most car buyers are salaried.

Marital status: Married people buy more cars than single people.

Education: People with a postgraduate degree own more cars than people with a graduate
degree.
Personal loan: The data suggests that personal loan status has no visible impact on car purchases.

House loan: People with no house loan buy more cars. House loan liabilities appear to
hold back the decision to buy a car.
Partner working: A working partner means more money in hand, so the decision to buy a car
is easier than for a person with a non-working partner.

Make: Probably the most important chart among the categorical variables. The data suggests that
Sedan is the most preferred make, followed by Hatchback. SUV is the least preferred make.
Exploring Numerical variable: Numerical columns can hold both int and float values. I
used the df.select_dtypes function to find all numerical variables. We have the following
numerical columns in the data set.

➢ Age
➢ No of Dependents
➢ Salary
➢ Partner Salary
➢ Total Salary
➢ Price
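The df.select_dtypes call mentioned above can be sketched like this, on a made-up frame with assumed column names. Treating every non-numeric column as categorical is a simplification for this example.

```python
import pandas as pd

# Made-up rows; column names are assumptions based on the lists in this section.
df = pd.DataFrame({
    "Age": [30, 41],
    "Salary": [50000, 72000],
    "Partner_salary": [0.0, 25000.5],
    "Gender": ["Male", "Female"],
    "Make": ["Sedan", "SUV"],
})

# "number" matches both integer and float columns.
numeric_cols = df.select_dtypes(include="number").columns.tolist()

# Everything that is not numeric is treated as categorical here.
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)
print(categorical_cols)
```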

To understand the distribution of each numerical variable, we create a box plot.
A box plot is a graphical representation of the distribution of a data set based on its
5-point summary, five key values that capture important characteristics of the data.
Here is a brief description of each component:
Minimum (Min): The smallest value in the dataset.
First Quartile (Q1): The median of the lower half of the dataset.
Median: The middle value of the dataset when it is sorted.
Third Quartile (Q3): The median of the upper half of the dataset.
Maximum (Max): The largest value in the dataset.
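As a small worked example, the five values can be computed with NumPy on a made-up sample. Note that np.percentile interpolates linearly by default, so its quartiles can differ slightly from the "median of each half" definition given above.

```python
import numpy as np

# Made-up sample, already sorted for readability.
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

minimum = values.min()
q1, median, q3 = np.percentile(values, [25, 50, 75])  # linear interpolation by default
maximum = values.max()

print(minimum, q1, median, q3, maximum)
```

For these nine values the interpolated quartiles are 3 and 7, whereas the "median of each half" rule gives 2.5 and 7.5. Both conventions are in common use.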
Here is the graphical representation:
1.2.2 Identification and treatment of outliers
A few variables in this data set contain outliers, which can affect the analysis. Outliers
can skew measures of central tendency and distort the relationships between variables, so
treating them appropriately is important for reliable results.
The variables “No_of_Dependents” and “Total_salary” have outliers.

Outlier treatment: We used the median technique for these two variables, replacing every
outlier with the median value.
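The report does not state how outliers were flagged, so the sketch below assumes the common 1.5 x IQR rule (the same rule box plot whiskers usually follow) and then applies the median replacement described above. The helper name and the sample values are made up for illustration.

```python
import pandas as pd

def replace_outliers_with_median(series: pd.Series) -> pd.Series:
    """Replace values outside the 1.5 * IQR fences with the series median.

    The 1.5 * IQR fence is an assumption; the report only names the median replacement.
    """
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    is_outlier = (series < lower) | (series > upper)
    return series.mask(is_outlier, series.median())

# Made-up Total_salary values with one obvious outlier (300.0).
total_salary = pd.Series([50.0, 52.0, 55.0, 58.0, 60.0, 62.0, 65.0, 300.0])
cleaned = replace_outliers_with_median(total_salary)
print(cleaned.tolist())
```

The median here is computed before replacement, so the outlier itself still influences it slightly; for a variable with few outliers this effect is small.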
1.2.3 Observations and Insights
The outliers have been treated and the data is now ready for analysis. The analysis shows
that buyers are predominantly salaried, married men. According to the data, having a personal
loan makes no real difference to buying a car. Sedan is the most preferred make.
Problem 1 - Bivariate Analysis
1.3.1 Explore the relationship between all numerical variables.
I use a pair plot to show the relationship between every pair of numerical variables. This
chart helps in identifying patterns and correlations and supports multivariate analysis.
The plot shows that most pairs of variables are positively related.
1.3.2 Explore the correlation between all numerical variables.
Here, I use a heatmap of the correlation matrix, which makes patterns in the data easy to
identify. Correlation is a statistical measure of the strength and direction of the linear
relationship between two numerical variables. The correlation coefficient ranges from -1 to 1:
➢ 1 indicates a perfect positive correlation,
➢ -1 indicates a perfect negative correlation, and
➢ 0 indicates no correlation.

The heatmap shows that Age and Price have a strong positive correlation, while
Age and Number of dependents have a negative correlation.
The other correlations can be read from the heatmap in the same way.
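The correlation matrix behind such a heatmap can be sketched with pandas as below. The three columns and their values are made up so that the signs match the findings above; they are not the report's data.

```python
import pandas as pd

# Made-up values chosen to give a positive Age/Price and a negative Age/dependents relation.
df = pd.DataFrame({
    "Age": [25, 30, 35, 40, 45],
    "Price": [20000, 30000, 40000, 50000, 60000],
    "No_of_Dependents": [4, 3, 3, 2, 1],
})

corr = df.corr()  # Pearson correlation by default
print(corr.round(2))

# With seaborn installed, the matrix could be drawn as a heatmap, for example:
# import seaborn as sns
# sns.heatmap(corr, annot=True)
```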
1.3.3 Explore the relationship between categorical vs numerical variables.

I have related a few of the more important categorical and numerical variables to each other,
using box plots to explore the following relationships:
Total Salary vs. Make
Marital Status vs. Salary
Education vs. Salary
Personal Loan vs. Total Salary
House Loan vs. Price
Gender vs. Age
Car Choice Across Age Groups

[Figures: one box plot for each of the seven relationships listed above.]

Problem 1 - Key Questions

1.4.1 Q. Do men tend to prefer SUVs more compared to women?


No. In this data, women tend to prefer SUVs more than men do.

1.4.2 Q. What is the likelihood of a salaried person buying a Sedan?


As per the data in the graph, a salaried person is more likely to buy a Sedan than any other make.
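A likelihood of this kind can be read off a row-normalized cross-tabulation. The sketch below uses six made-up buyers and assumed column names (Profession, Make); the same approach with Gender in place of Profession would address question 1.4.1.

```python
import pandas as pd

# Made-up buyers; the proportions are illustrative only.
df = pd.DataFrame({
    "Profession": ["Salaried", "Salaried", "Salaried", "Salaried", "Business", "Business"],
    "Make": ["Sedan", "Sedan", "SUV", "Hatchback", "Sedan", "SUV"],
})

# normalize="index" turns each row into shares that sum to 1,
# i.e. P(Make | Profession).
make_share = pd.crosstab(df["Profession"], df["Make"], normalize="index")
p_sedan_given_salaried = make_share.loc["Salaried", "Sedan"]

print(make_share)
print(p_sedan_given_salaried)
```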
1.4.3 What evidence or data supports Sheldon Cooper's claim that a salaried male is an easier
target for a SUV sale over a Sedan sale?
The data shows that SUVs are preferred more by women and by people over forty.
1.4.4 How does the amount spent on purchasing automobiles vary by gender?
The data suggests that, on average, women spend more money on purchasing automobiles.

1.4.5 Q. How much money was spent on purchasing automobiles by individuals who took a
personal loan?
A total of Rs 27290000 has been spent by individuals who took a personal loan.
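A total like this comes from filtering on the loan flag and summing the price. The sketch below assumes columns named Personal_loan (with "Yes"/"No" values) and Price, and uses made-up rows.

```python
import pandas as pd

# Made-up rows; the real total reported above comes from the full data set.
df = pd.DataFrame({
    "Personal_loan": ["Yes", "No", "Yes", "No"],
    "Price": [30000, 45000, 25000, 60000],
})

# Keep only buyers with a personal loan, then add up what they spent.
total_with_loan = df.loc[df["Personal_loan"] == "Yes", "Price"].sum()
print(total_with_loan)
```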
1.4.6 Q. How does having a working partner influence the purchase of higher-priced cars?
Mean price of high-priced cars by partner working status:
Partner working    Mean price
No                 51377.289377
Yes                50374.233129
The two means are close, which suggests that having a working partner does not influence
the purchase of higher-priced cars.
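A comparison of this kind can be sketched with a filter followed by a group-wise mean. The report does not say how "high-priced" was defined, so the median price is used as the cutoff here purely as an assumption; the rows and column names are also made up.

```python
import pandas as pd

# Made-up rows for illustration.
df = pd.DataFrame({
    "Partner_working": ["Yes", "No", "Yes", "No", "Yes", "No"],
    "Price": [20000, 25000, 50000, 52000, 60000, 58000],
})

# Assumed definition: a car is "high-priced" if it costs more than the median price.
threshold = df["Price"].median()
high_priced = df[df["Price"] > threshold]

# Mean price of high-priced cars for each partner working status.
mean_price = high_priced.groupby("Partner_working")["Price"].mean()
print(mean_price)
```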
Problem 1 - Actionable Insights & Recommendations

1.5 Actionable Insights - Business Recommendations

Demographic Analysis:
Insight: The buyers in this data set are predominantly male and mostly married. Most of them
have a postgraduate degree.
Recommendation: Plan a campaign to reach more single people, and promote Hatchbacks to
salaried professionals.

Income and Affordability:


Insight: Most people have working partners, with the partners earning less. The data shows that more
people have taken house loans than personal loans.

Recommendation: The data shows that people with working partners are not spending more on
higher-priced cars. Higher-priced Hatchbacks can be recommended to them,
considering their high income, dependents, and marital status.

Loan Analysis:
Insight: Half of the people in this data set have taken personal loans. The data shows that more
people have taken house loans than personal loans.

Recommendation: The company can collaborate with financial institutions to offer attractive loan
packages to car buyers who have fewer loan liabilities.

Make Preferences Analysis:


Insight: Sedan is the most preferred make, followed by Hatchback. SUV is the least preferred.

Recommendation: Target potential SUV buyers with customized offers to increase SUV sales.
Problem 2
Framing Analytics Problem
A bank generates revenue through interest, transaction fees, and financial advice, with interest
charged on customer loans being a significant source of profits. GODIGT Bank, a mid-sized
private bank, offers various banking products and cross-sells asset products to existing
customers through different communication methods. However, the bank is facing high credit
card attrition, leading them to reevaluate their credit card policy to ensure customers receive
the right card for higher spending and intent, resulting in profitable relationships.
Objective:

You are a Data Scientist at the company, and the Data Science team has shared some data. You are
to find the key variables that have a vital impact on the analysis, which will help the
company to improve the business.

2.1.1 Analyse the dataset


This dataset has 8448 rows and 28 columns.
Out of total 28 columns, we have 19 numeric, 8 categorical and 1 datetime.
Data has no duplicates.
The column Transactor_revolver has 38 null values. These rows have been deleted because
they are few in number, less than 3 percent of the data.
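The share of affected rows can be checked from the counts reported above, and the deletion itself is a single dropna call. In the sketch below the counts come from this section, while the four-row frame and its values are made up for illustration.

```python
import pandas as pd

# Counts reported in this section.
n_rows, n_missing = 8448, 38
missing_share = n_missing / n_rows * 100
print(round(missing_share, 2))  # percentage of rows with a null Transactor_revolver

# Made-up frame showing the row deletion on the affected column only.
df = pd.DataFrame({
    "Transactor_revolver": ["T", "R", None, "T"],
    "avg_spends_l3m": [100.0, 250.0, 80.0, 40.0],
})
df_clean = df.dropna(subset=["Transactor_revolver"])
print(len(df_clean))
```

The computed share is well under the 3 percent threshold mentioned above, so dropping these rows loses very little data.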
The columns and their data types are shown below.
Statistical Summary: Here is the statistical summary of all the numeric variables, from which we
can understand the central tendency, dispersion, and shape of the distribution of the data set.
Correlation is a statistical measure of the strength and direction of the linear relationship
between two numerical variables. Here, we use a heatmap to show the correlation between
the numerical variables.

➢ 1 indicates a perfect positive correlation,
➢ -1 indicates a perfect negative correlation, and
➢ 0 indicates no correlation.
2.1.2 List the top 5 important variables, along with the business justifications.

The following five variables are the most important for ensuring that customers receive the
right card for higher spending and intent.

1. Credit card activity (cc_active30, cc_active60 and cc_active90)
2. avg_spends_l3m
3. Transactor_revolver
4. other_bank_cc_holding
5. annual_income_at_source

Business justifications shared below:


1. Credit card activity (cc_active30, cc_active60 and cc_active90)

A customer's recent credit card activity is very important for predicting credit card
attrition. Higher activity suggests that the card holder is engaged and satisfied with
the card, and therefore less likely to leave. No activity during these
periods suggests that the customer may be unhappy and might
terminate the card. Analyzing how activity changes across these periods can provide
insights into customer behavior and help in designing card retention strategies.

2. avg_spends_l3m

The average amount a customer has spent on their credit card over the last 3 months shows how
much they rely on the card for their needs. Customers who spend more are more valuable to
the bank. By looking at the spending patterns in their transactions, the bank can tailor special
offers or benefits to encourage customers to keep using their credit card.

3. Transactor_revolver

Transactor_revolver describes how customers handle their credit card balances. Transactors
pay off their balances every month, while revolvers carry balances over time. Knowing which
group a customer belongs to helps the bank manage risk and understand payment behavior. It
also informs decisions on credit limits and interest rates, and it
can help the bank offer customized services that keep customers happy and loyal.

4. other_bank_cc_holding

It is very important for the bank to know whether its customers hold credit cards from other
banks. If they do, their spending is likely to be divided
across the cards. Once the bank identifies that a customer holds another bank's credit card,
it can try to win a larger share of that spending by offering special deals, loyalty programs,
or adjusted credit limits. This also helps the bank learn about competing banks and stay
competitive.

5. annual_income_at_source

The income of the customer is the key determinant of credit card eligibility and spending
capacity. Banks can understand the relationship between income and spending behavior and
define various strategies in segmenting customers, offering credit limits, and creating
personalized products.

Thank You
