Unit2-Regression NGP


Linear Regression

Regression Lines
Applications of Regression
Fitting a Line to Data / Linear Regression / Least Squares

After plotting the points, we usually want to add a line to the data so that we can see the trend.
• How do we find the optimal line to fit our data?
There will be a sweet spot between these horizontal and vertical lines. To find that sweet spot, let us start with a generic line.
How good is this guess?
Calculating R² is the first step in determining how good this guess is.
R-squared (R², or the coefficient of determination) is a statistical measure in a regression model
that determines the proportion of variance in the dependent variable that can be explained by
the independent variable. In other words, R-squared shows how well the data fit the regression
model (the goodness of fit).
To compute R²: R² = 1 − SSE/SST
R-squared is about explanatory power; the p-value is the probability of obtaining your data
results (or results more extreme) under the model you have.
The p-value is attached to the F statistic, which tests the overall explanatory power of
a model based on that data (or data more extreme).
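As a sketch, R² can be computed as 1 − SSE/SST; the observed and fitted values below are hypothetical, not taken from the slides:

```python
# A minimal sketch of R-squared: 1 - SSE/SST.
# Observed and fitted values below are hypothetical.
y = [3.0, 5.0, 7.0, 9.0]        # observed values
y_hat = [3.2, 4.8, 7.1, 8.9]    # values fitted by some regression model

y_bar = sum(y) / len(y)
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
r_squared = 1 - sse / sst
```

Here most of the variation in y is explained by the fit, so r_squared comes out close to 1.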
In simple Linear Regression
In Multiple Linear Regression
• It is a method of finding the best-fit line.
• It is a method of updating the b0 and b1 values (intercept and slope) to reduce the RMSE, i.e. of finding the best values for b0 and b1.
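The update method described in the bullets above can be sketched as plain gradient descent on the squared error; the data points and learning rate here are hypothetical:

```python
# A rough sketch of iteratively updating b0 (intercept) and b1 (slope)
# to reduce the squared error, as described above. The data and the
# learning rate are hypothetical; the points lie exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

b0, b1 = 0.0, 0.0   # start from an arbitrary guess
lr = 0.05           # learning rate (assumed)
n = len(xs)

for _ in range(2000):
    # gradients of the mean squared error with respect to b0 and b1
    grad_b0 = sum(2 * ((b0 + b1 * x) - y) for x, y in zip(xs, ys)) / n
    grad_b1 = sum(2 * ((b0 + b1 * x) - y) * x for x, y in zip(xs, ys)) / n
    b0 -= lr * grad_b0
    b1 -= lr * grad_b1
# b0 and b1 now approach the true intercept 1 and slope 2
```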
Example: ABC is a café chain located in different cities of India. The
manager believes that the quarterly sales for a café (denoted by y)
are related to the size of the student population (denoted by x). Using
regression analysis, we can develop an equation showing how the
dependent variable y is related to the independent variable x.

The Least Squares Method:


Slope for the Estimated Regression Equation:

b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²

Intercept for the Estimated Regression Equation:

b0 = ȳ − b1x̄
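The closed-form slope and intercept formulas above translate directly into code; the sample data is made up for illustration:

```python
# The closed-form least-squares estimates above, written out in code.
# The data points are made up for illustration.
xs = [2.0, 4.0, 6.0, 8.0]
ys = [1.0, 3.0, 7.0, 9.0]

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

# b1 = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
# b0 = y_bar - b1 * x_bar
b0 = y_bar - b1 * x_bar
```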

Scatter plot
Calculating the least squares estimated
regression equation for ABC cafe
Table for SSE
Table for SST
Finding SSR and r2

• SSR = SST − SSE = 15730 − 1530 = 14200

• Coefficient of Determination:
  r² = SSR/SST = 14200/15730 = 0.9027
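The arithmetic above can be checked directly:

```python
# Checking the slide's arithmetic: SSR = SST - SSE and r^2 = SSR / SST.
sst = 15730.0
sse = 1530.0
ssr = sst - sse           # 14200.0
r_squared = ssr / sst     # 0.9027 to four decimal places
```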
LOGISTIC REGRESSION
ODDS
• Odds are the chances of success divided by the chances of failure, represented as a ratio:

odds = p / (1 − p)

where p is the probability of success and 1 − p is the probability of failure.

• In logistic regression, the odds of the dependent variable corresponding to a success are given by p / (1 − p).
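A minimal sketch of the odds ratio as defined above:

```python
# Odds as defined above: the chance of success divided by the chance
# of failure, for a probability of success p in (0, 1).
def odds(p: float) -> float:
    """Return the odds corresponding to a probability of success p."""
    return p / (1 - p)
```

For example, a success probability of 0.8 gives odds of 4 to 1, while p = 0.5 gives even odds of 1.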
Transforming Output
• So, at p > 0.5 we get values of log(odds) in the range (0, ∞),
and at p < 0.5 we get values of log(odds) in the range (−∞, 0).

• The log-odds, commonly known as the logit function, is used in logistic
regression models when we need an unbounded (non-binary) output.
This is how logistic regression is able to work as both a
regression and a classification model.
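The logit mapping described above can be sketched as:

```python
import math

# The logit (log-odds) maps a probability in (0, 1) onto the whole real
# line: positive for p > 0.5, zero at p = 0.5, negative for p < 0.5.
def logit(p: float) -> float:
    return math.log(p / (1 - p))
```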
Linear Regression and Logistic Regression are two well-known Machine Learning algorithms which come under
the supervised learning technique.

Since both algorithms are supervised in nature, they use labelled datasets to make
predictions. But the main difference between them is how they are used.

Linear Regression is used for solving regression problems, whereas Logistic Regression is used for solving
classification problems.
Linear Regression:
• Linear Regression is one of the simplest Machine Learning algorithms; it comes
under the Supervised Learning technique and is used for solving regression problems.
• It is used for predicting a continuous dependent variable with the help of
independent variables.
• The goal of Linear Regression is to find the best-fit line that can accurately
predict the output for the continuous dependent variable.
• If a single independent variable is used for prediction, it is called Simple Linear
Regression; if there is more than one independent variable, it is called
Multiple Linear Regression.
• By finding the best-fit line, the algorithm establishes the relationship between the
dependent variable and the independent variables, and this relationship should be
linear in nature.
• The output of Linear Regression should only be continuous values such as
price, age, salary, etc.
• The regression line can be written as y = a0 + a1x + ε, where a0 and a1 are the
coefficients and ε is the error term.
Logistic Regression:

• Logistic Regression is one of the most popular Machine Learning algorithms that come under
Supervised Learning techniques.
• It can be used for classification as well as regression problems, but it is mainly used for
classification problems.
• Logistic Regression is used to predict a categorical dependent variable with the help of
independent variables.
• The output of a Logistic Regression model can only be between 0 and 1.
• Logistic Regression can be used where a probability between two classes is required, such as
whether it will rain today or not: 0 or 1, true or false, etc.
• Logistic Regression is based on the concept of Maximum Likelihood estimation: the coefficients
are chosen so that the observed data is most probable.
• In Logistic Regression, we pass the weighted sum of the inputs through an activation function that
maps values to between 0 and 1. This activation function is known as the sigmoid function, and the
curve obtained is called the sigmoid curve or S-curve.
• The equation for Logistic Regression is:

p = 1 / (1 + e^−(a0 + a1x))
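A minimal sketch of the sigmoid model described above, with hypothetical coefficients:

```python
import math

# A minimal sketch of the logistic model described above: the weighted
# sum of inputs is passed through the sigmoid, giving an output in (0, 1).
# The coefficients a0 and a1 passed below are hypothetical.
def sigmoid(z: float) -> float:
    return 1 / (1 + math.exp(-z))

def predict_proba(x: float, a0: float, a1: float) -> float:
    """Probability of the positive class for input x."""
    return sigmoid(a0 + a1 * x)

# For classification, predict class 1 when the probability exceeds 0.5.
p = predict_proba(2.0, -1.0, 0.5)
```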
