Regression

The document discusses simple linear regression analysis. It introduces linear regression models and how they can be used to predict a dependent variable from an independent variable. It also discusses finding the least squares regression line and interpreting the slope and y-intercept coefficients.

REGRESSION

Dumisani Namakhwa, BEng(Hon), MSc


INTRODUCTION
▪ When comparing two different variables, two questions come to mind:
➢ Is there a relationship between two variables?
➢ How strong is that relationship?
▪ These questions can be answered using regression and correlation.
▪ Regression answers whether there is a relationship.
▪ Correlation answers how strong the linear relationship is.
▪ For example, we may want to determine whether there is a relationship
between alcohol content and number of calories, income and number of years of
education, height and weight of people, length and width of envelopes,
temperature and output of an industrial process, altitude and boiling point
of water, or dose of a drug and response.
▪ A scatter plot can be used to show the relationship between two variables.

APPLIED STATISTICA
REGRESSION MODELS
[Figure: example scatter plots of Y against X showing linear relationships and curvilinear relationships.]

TYPES OF RELATIONSHIPS
[Figure: example scatter plots of Y against X showing strong relationships, weak relationships, and no relationship.]

INTRODUCTION TO REGRESSION ANALYSIS
▪ Regression analysis is used to:
▪ Predict the value of a dependent variable based on the value of at least
one independent variable
▪ Explain the impact of changes in an independent variable on the
dependent variable
▪ Dependent variable: the variable we wish to predict or explain
▪ Independent variable: the variable used to predict or explain the
dependent variable

SIMPLE LINEAR REGRESSION MODEL
▪ Only one independent variable, 𝑋
▪ Relationship between 𝑋 and 𝑌 is described by a linear
function
▪ Changes in 𝑌 are assumed to be related to changes in 𝑋

SIMPLE LINEAR REGRESSION MODEL
▪ For a linear relationship, we can use a model of the form

𝑦 = 𝛽0 + 𝛽1 𝑥 + 𝜀,
▪ where
▪ 𝑦 = the dependent variable
▪ 𝛽0 = the y-intercept
▪ 𝛽1 = the slope coefficient
▪ 𝑥 = the independent variable
▪ 𝜀 = the random error term
▪ 𝛽0 + 𝛽1 𝑥 = the linear component

SIMPLE LINEAR REGRESSION MODEL
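[Figure: the model Yi = β0 + β1Xi + εi drawn as a line with intercept β0 and slope β1. For a given Xi, the figure marks the observed value of Y, the predicted value of Y on the line, and the random error εi for this Xi value, which is the gap between the two.]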

SIMPLE LINEAR REGRESSION EQUATION
(PREDICTION LINE)
The simple linear regression equation provides an
estimate of the population regression line

Ŷi = b0 + b1Xi

where
▪ Ŷi = the estimated (or predicted) Y value for observation i
▪ b0 = the estimate of the regression intercept
▪ b1 = the estimate of the regression slope
▪ Xi = the value of X for observation i

THE LEAST SQUARES METHOD
▪ 𝑏0 and 𝑏1 are obtained by finding the values that minimize the sum of the squared
differences between 𝑌 and Ŷ:

$$\min \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \min \sum_{i=1}^{n} \bigl(Y_i - (b_0 + b_1 X_i)\bigr)^2$$
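
As a sketch of why this minimization has a closed-form solution (a standard calculus argument, not taken from the slides): write the objective as S(b0, b1), a symbol introduced here only for illustration, and set both partial derivatives to zero.

```latex
\begin{aligned}
S(b_0, b_1) &= \sum_{i=1}^{n} \bigl(Y_i - (b_0 + b_1 X_i)\bigr)^2 \\
\frac{\partial S}{\partial b_0} &= -2 \sum_{i=1}^{n} \bigl(Y_i - b_0 - b_1 X_i\bigr) = 0
  \;\Longrightarrow\; b_0 = \bar{Y} - b_1 \bar{X} \\
\frac{\partial S}{\partial b_1} &= -2 \sum_{i=1}^{n} X_i \bigl(Y_i - b_0 - b_1 X_i\bigr) = 0
  \;\Longrightarrow\; b_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}
\end{aligned}
```

The second line uses the first: substituting b0 = Ȳ − b1X̄ into the second condition and simplifying gives the ratio shown.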

FINDING THE LEAST SQUARES EQUATION
▪ The business objective of the director of planning is to forecast annual sales for all
new stores, based on the number of profiled customers who live no more than 30
minutes from a Sunflowers store. To examine the relationship between the number
of profiled customers (in millions) who live within a fixed radius from a Sunflowers
store and its annual sales ($millions), data were collected from a sample of 14
stores. Determine the least squares equation for the given data using Excel.

Store   Profiled Customers (millions)   Annual Sales ($millions)
1       3.7                             5.7
2       3.6                             5.9
3       2.8                             6.7
4       5.6                             9.5
5       3.3                             5.4
6       2.2                             3.5
7       3.3                             6.2
8       3.1                             4.7
9       3.2                             6.1
10      3.5                             4.9
11      5.2                             10.7
12      4.6                             7.6
13      5.8                             11.8
14      3.0                             4.1

SIMPLE LINEAR REGRESSION EXAMPLE:
USING EXCEL DATA ANALYSIS FUNCTION
[Figure: Scatter plot of profiled customers and annual sales. x-axis: profiled customers (0.00 to 7.00); y-axis: annual sales (0.00 to 14.00).]

SIMPLE LINEAR REGRESSION EXAMPLE:
USING EXCEL DATA ANALYSIS FUNCTION
1. Choose Data
2. Choose Data Analysis
3. Choose Regression

SIMPLE LINEAR REGRESSION EXAMPLE:
USING EXCEL DATA ANALYSIS FUNCTION
▪ Enter Y range and X range and desired options

SIMPLE LINEAR REGRESSION EXAMPLE:
USING EXCEL DATA ANALYSIS FUNCTION
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.920798
R Square            0.847869
Adjusted R Square   0.835191
Standard Error      0.999298
Observations        14

ANOVA
             df   SS         MS         F          Significance F
Regression   1    66.7854    66.7854    66.87922   3E-06
Residual     12   11.98317   0.998597
Total        13   78.76857

               Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept      -1.20884       0.994874         -1.21507   0.247707   -3.37648    0.958806    -3.37648      0.958806
X Variable 1   2.074173       0.253629         8.177972   3E-06      1.521562    2.626784    1.521562      2.626784

▪ Observe that 𝑏0 = −1.2088 and 𝑏1 = 2.0742.
▪ Therefore, the prediction line for these data is
Ŷ𝑖 = −1.2088 + 2.0742𝑋𝑖
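
The entries under Regression Statistics can be cross-checked against the ANOVA table. The following is a minimal Python sketch, assuming the usual definitions (R Square as regression SS over total SS, Standard Error as the square root of the residual mean square, F as the ratio of mean squares); the variable names are illustrative and not part of the Excel output.

```python
import math

# Values copied from the ANOVA table in the Excel output.
n = 14
ss_regression = 66.7854
ss_residual = 11.98317
ss_total = 78.76857

df_regression = 1        # one independent variable
df_residual = n - 2      # n minus the two estimated coefficients

# R Square: share of the total variation in Y explained by the regression.
r_square = ss_regression / ss_total
multiple_r = math.sqrt(r_square)

# Adjusted R Square for a model with one independent variable.
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / df_residual

# Standard Error: square root of the residual mean square.
ms_residual = ss_residual / df_residual
standard_error = math.sqrt(ms_residual)

# F statistic: regression mean square over residual mean square.
f_stat = (ss_regression / df_regression) / ms_residual

print(f"R Square          = {r_square:.6f}")
print(f"Multiple R        = {multiple_r:.6f}")
print(f"Adjusted R Square = {adjusted_r_square:.6f}")
print(f"Standard Error    = {standard_error:.6f}")
print(f"F                 = {f_stat:.5f}")
```

Up to rounding of the inputs, the printed values should agree with the corresponding entries in the summary output.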

SIMPLE LINEAR REGRESSION EXAMPLE:
INTERPRETATION OF B0
Ŷ𝑖 = −1.2088 + 2.0742𝑋𝑖
▪ The 𝑌 intercept, 𝑏0 , is −1.2088. The 𝑌 intercept represents the
predicted value of 𝑌 when 𝑋 equals 0. Because the number of
profiled customers of the store cannot be 0, this 𝑌 intercept has
little or no practical interpretation. Also, 𝑋 = 0 lies outside the
range of the observed values of the 𝑋 variable (2.2 to 5.8 million
profiled customers), and therefore interpretations of the value of
𝑏0 should be made cautiously.

SIMPLE LINEAR REGRESSION EXAMPLE:
INTERPRETING B1
Ŷ𝑖 = −1.2088 + 2.0742𝑋𝑖
▪ The slope, 𝑏1 , is +2.0742 . This means that for each increase of 1
unit in 𝑋, the predicted mean value of 𝑌 is estimated to increase
by 2.0742 units. In other words, for each increase of 1.0 million
profiled customers within 30 minutes of the store, the predicted
mean annual sales are estimated to increase by $2.0742 million.
Thus, the slope represents the portion of the annual sales that
are estimated to vary according to the number of profiled
customers.
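
The slope interpretation can be illustrated with a small Python sketch. The function name `predict_sales` and the inputs 3.5 and 4.5 are chosen here for illustration only; both inputs lie inside the observed range of 𝑋 in the sample.

```python
B0 = -1.2088  # estimated Y intercept from the Excel output
B1 = 2.0742   # estimated slope from the Excel output


def predict_sales(customers_millions: float) -> float:
    """Predicted mean annual sales ($millions) from the prediction line."""
    return B0 + B1 * customers_millions


low = predict_sales(3.5)
high = predict_sales(4.5)

# Raising X by 1.0 million customers changes the prediction by exactly b1.
print(f"predicted sales at 3.5 million customers: {low:.4f}")
print(f"predicted sales at 4.5 million customers: {high:.4f}")
print(f"difference for a 1-unit increase in X:    {high - low:.4f}")
```

Whatever starting value of 𝑋 is used, the difference between the two predictions equals 𝑏1, which is what the slope interpretation states.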

ACTIVITY
▪ Use the prediction line found in the previous example to predict the annual sales
for a store with 4 million profiled customers.
▪ A statistics professor wants to use the number of hours a student studies for a statistics final
exam (𝑋) to predict the final exam score (𝑌). A regression model is fit based on data
collected from a class during the previous semester, with the following results:
Ŷ𝑖 = 35.0 + 3𝑋𝑖
▪ What is the interpretation of the 𝑌 intercept, 𝑏0 , and the slope, 𝑏1 ?

COMPUTING THE 𝑌 INTERCEPT,
𝑏0 , AND THE SLOPE, 𝑏1
▪ For small data sets, you can use a hand calculator to compute the
least-squares regression coefficients.
▪ Computational formula for the slope, 𝑏1
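$$b_1 = \frac{SSXY}{SSX}$$

where

$$SSXY = \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) = \sum_{i=1}^{n} X_i Y_i - \frac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n}$$

$$SSX = \sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}$$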

COMPUTING THE 𝑌 INTERCEPT,
𝑏0 , AND THE SLOPE, 𝑏1
▪ Computational formula for the y-intercept, 𝑏0

$$b_0 = \bar{Y} - b_1 \bar{X}$$

where

$$\bar{Y} = \frac{\sum_{i=1}^{n} Y_i}{n}, \qquad \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
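
The computational formulas for 𝑏1 and 𝑏0 translate directly into code. The following Python sketch is one possible implementation, not the method used in the slides (which rely on Excel or a hand calculator); the function name `least_squares` is illustrative. It is applied to the 14-store Sunflowers sample given earlier.

```python
def least_squares(x, y):
    """Return (b0, b1) using the computational formulas for SSXY and SSX."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_x2 = sum(xi * xi for xi in x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))

    ssxy = sum_xy - (sum_x * sum_y) / n   # SSXY
    ssx = sum_x2 - (sum_x ** 2) / n       # SSX

    b1 = ssxy / ssx                       # slope
    b0 = sum_y / n - b1 * (sum_x / n)     # intercept: Y-bar minus b1 * X-bar
    return b0, b1


# Profiled customers (millions) and annual sales ($millions) for stores 1-14.
customers = [3.7, 3.6, 2.8, 5.6, 3.3, 2.2, 3.3, 3.1, 3.2, 3.5, 5.2, 4.6, 5.8, 3.0]
sales = [5.7, 5.9, 6.7, 9.5, 5.4, 3.5, 6.2, 4.7, 6.1, 4.9, 10.7, 7.6, 11.8, 4.1]

b0, b1 = least_squares(customers, sales)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
```

The result should match the Intercept and X Variable 1 coefficients reported in the Excel output.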

SIMPLE LINEAR REGRESSION EXAMPLE:
HAND CALCULATION
▪ The business objective of the director of planning is to forecast annual sales for all
new stores, based on the number of profiled customers who live no more than 30
minutes from a Sunflowers store. To examine the relationship between the number
of profiled customers (in millions) who live within a fixed radius from a Sunflowers
store and its annual sales ($millions), data were collected from a sample of 14
stores. Determine the least-squares regression coefficients of the data given below.

SIMPLE LINEAR REGRESSION EXAMPLE: HAND
CALCULATION
▪ Five quantities need to be computed to determine 𝑏1 and
𝑏0 . These are 𝑛, the sample size; $\sum_{i=1}^{n} X_i$, the sum of the 𝑋
values; $\sum_{i=1}^{n} Y_i$, the sum of the 𝑌 values; $\sum_{i=1}^{n} X_i^2$, the sum of
the squared 𝑋 values; and $\sum_{i=1}^{n} X_i Y_i$, the sum of the products
of 𝑋 and 𝑌. The computations for these terms are shown in
the table below:

SIMPLE LINEAR REGRESSION EXAMPLE: HAND
CALCULATION

Store    Profiled Customers (𝑋)   Annual Sales (𝑌)   𝑋²       𝑋𝑌
1        3.7                      5.7                13.69    21.09
2        3.6                      5.9                12.96    21.24
3        2.8                      6.7                7.84     18.76
4        5.6                      9.5                31.36    53.20
5        3.3                      5.4                10.89    17.82
6        2.2                      3.5                4.84     7.70
7        3.3                      6.2                10.89    20.46
8        3.1                      4.7                9.61     14.57
9        3.2                      6.1                10.24    19.52
10       3.5                      4.9                12.25    17.15
11       5.2                      10.7               27.04    55.64
12       4.6                      7.6                21.16    34.96
13       5.8                      11.8               33.64    68.44
14       3.0                      4.1                9.00     12.30
Totals   52.9                     92.8               215.41   382.85
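
A worked sketch of the arithmetic, using the totals in the table (n = 14) and the computational formulas for 𝑏1 and 𝑏0. Intermediate values are rounded for display, so the last digit may differ slightly from a full-precision calculation.

```latex
\begin{aligned}
SSXY &= 382.85 - \frac{(52.9)(92.8)}{14} = 382.85 - 350.6514 = 32.1986 \\
SSX  &= 215.41 - \frac{(52.9)^2}{14} = 215.41 - 199.8864 = 15.5236 \\
b_1  &= \frac{SSXY}{SSX} = \frac{32.1986}{15.5236} \approx 2.07417 \\
\bar{Y} &= \frac{92.8}{14} \approx 6.62857, \qquad \bar{X} = \frac{52.9}{14} \approx 3.77857 \\
b_0  &= \bar{Y} - b_1 \bar{X} \approx 6.62857 - (2.07417)(3.77857) \approx -1.2088
\end{aligned}
```

These values agree with the coefficients in the Excel output (𝑏0 = −1.2088 and 𝑏1 = 2.0742).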

INFERENCES ABOUT THE SLOPE
▪ The standard error of the regression slope coefficient (b1) is estimated by

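$$S_{b_1} = \frac{S_{YX}}{\sqrt{SSX}} = \frac{S_{YX}}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$$

where $S_{YX}$ is the standard error of the estimate (reported as "Standard Error" under Regression Statistics in the Excel output).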
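
As an illustration, the standard error of the slope can be computed for the Sunflowers sample. This Python sketch assumes the usual definition of the standard error of the estimate, S_YX = sqrt(SSE / (n − 2)), where SSE is the sum of squared residuals; that definition and the variable names are assumptions added here, not taken from the slides.

```python
import math

# Profiled customers (millions) and annual sales ($millions) for stores 1-14.
customers = [3.7, 3.6, 2.8, 5.6, 3.3, 2.2, 3.3, 3.1, 3.2, 3.5, 5.2, 4.6, 5.8, 3.0]
sales = [5.7, 5.9, 6.7, 9.5, 5.4, 3.5, 6.2, 4.7, 6.1, 4.9, 10.7, 7.6, 11.8, 4.1]

n = len(customers)
x_bar = sum(customers) / n
y_bar = sum(sales) / n

# Least-squares coefficients from the definitional sums of squares.
ssx = sum((x - x_bar) ** 2 for x in customers)
ssxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(customers, sales))
b1 = ssxy / ssx
b0 = y_bar - b1 * x_bar

# Sum of squared residuals and standard error of the estimate (assumed definition).
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(customers, sales))
s_yx = math.sqrt(sse / (n - 2))

# Standard error of the slope.
s_b1 = s_yx / math.sqrt(ssx)

print(f"S_YX = {s_yx:.6f}")
print(f"S_b1 = {s_b1:.6f}")
```

The two printed values should match the Standard Error under Regression Statistics and the Standard Error of X Variable 1 in the Excel output.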

INFERENCES ABOUT THE SLOPE:
T TEST
▪ t test for a population slope
▪ Is there a linear relationship between X and Y?

▪ Null and alternative hypotheses


▪ H0: β1 = 0 (no linear relationship)
▪ H1: β1 ≠ 0 (linear relationship does exist)
▪ Test statistic

$$t_{STAT} = \frac{b_1 - \beta_1}{S_{b_1}}, \qquad \text{d.f.} = n - 2$$

where:
▪ b1 = regression slope coefficient
▪ β1 = hypothesized slope
▪ Sb1 = standard error of the slope

INFERENCES ABOUT THE SLOPE:
T TEST EXAMPLE
Estimated Regression Equation:

house price = 98.25 + 0.1098 (sq. ft.)

The slope of this model is 0.1098


Is there a relationship between the
square footage of the house and its
sales price?

INFERENCES ABOUT THE SLOPE:
T TEST EXAMPLE
From Excel output:

              Coefficients   Standard Error   t Stat    P-value
Intercept     98.24833       58.03348         1.69296   0.12892
Square Feet   0.10977        0.03297          3.32938   0.01039

Here b1 = 0.10977 and Sb1 = 0.03297, so

$$t_{STAT} = \frac{b_1 - \beta_1}{S_{b_1}} = \frac{0.10977 - 0}{0.03297} = 3.32938$$

INFERENCES ABOUT THE SLOPE:
T TEST EXAMPLE
H0: β1 = 0
H1: β1 ≠ 0

Test statistic: tSTAT = 3.329
d.f. = 10 − 2 = 8
Critical values (α/2 = 0.025 in each tail): −tα/2 = −2.3060 and tα/2 = 2.3060
Rejection regions: reject H0 if tSTAT < −2.3060 or tSTAT > 2.3060; otherwise do not reject H0.

Decision: Reject H0, because 3.329 > 2.3060.
There is sufficient evidence that square footage affects house price.
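
The decision rule in this example can be sketched in Python. The function `slope_t_test` is a hypothetical helper written for illustration; the critical value 2.3060 is taken from the slide (d.f. = 8, α/2 = 0.025) rather than computed, because the Python standard library has no t distribution.

```python
def slope_t_test(b1, s_b1, critical_t, hypothesized_slope=0.0):
    """Two-tailed t test for a population slope.

    Returns the test statistic and whether H0 is rejected at the
    significance level implied by critical_t.
    """
    t_stat = (b1 - hypothesized_slope) / s_b1
    reject_h0 = abs(t_stat) > critical_t
    return t_stat, reject_h0


# Slope and standard error from the house-price Excel output.
t_stat, reject_h0 = slope_t_test(b1=0.10977, s_b1=0.03297, critical_t=2.3060)

print(f"t_STAT = {t_stat:.3f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Because the test is two-tailed, the comparison uses the absolute value of the test statistic, so a large negative slope estimate would also lead to rejecting H0.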
THE END
