Regression Analysis


Correlation

Measures the relationship between two or more variables.
Correlation Coefficient (r)

Measures the
• strength (size of r, from 0 to 1)
• direction (negative / positive)

of the linear relationship between the
• independent variable (x)
• dependent variable (y)
−1 ≤ r ≤ 1
Negative values of r indicate a negative relationship; positive values indicate a positive relationship.

Value of r (ignoring sign)   Relationship
0                            None
Close to 0                   Weak
Close to 0.5                 Moderate
Close to 1                   Strong
Correlation Direction

• Increase in independent variable (x), increase in dependent variable (y): positive relationship
• Decrease in independent variable (x), decrease in dependent variable (y): positive relationship
• Increase in independent variable (x), decrease in dependent variable (y): negative relationship
• Decrease in independent variable (x), increase in dependent variable (y): negative relationship
Interpreting Correlation Values

r = −1                  Perfectly negative linear relationship
Between −1 and −0.5     Strong negative linear relationship
Between −0.5 and 0      Weak negative linear relationship
r = 0                   No linear relationship
Between 0 and 0.5       Weak positive linear relationship
Between 0.5 and 1       Strong positive linear relationship
r = 1                   Perfectly positive linear relationship
Scatter Plot

https://statistics.laerd.com/statistical-guides/img/pearson-2.png
Facts about Correlation
• The choice of which variable is independent and which is dependent does not influence the calculation of r.
• Both variables must be quantitative.
• r is not influenced by the units of measurement.
• r is always a number between −1 and 1. r = +1 indicates a perfect positive linear relationship, r = −1 indicates a perfect negative linear relationship, and r = 0 indicates no (linear) relationship between the variables.
• r describes only linear relationships.
• r is influenced by outliers, if any exist in the data.
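
Several of these facts can be checked numerically. The sketch below is an illustration only: `pearson_r` is a hypothetical helper that uses the mean-deviation form of the correlation coefficient, and the small data set (hours and scores) is made up purely for demonstration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation using deviations from the means (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative data, not taken from the slides.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]

r_xy = pearson_r(hours, score)
r_yx = pearson_r(score, hours)            # swap the roles of x and y
minutes = [h * 60 for h in hours]
r_units = pearson_r(minutes, score)       # same data, different units
r_outlier = pearson_r(hours + [6], score + [20])  # add one outlying point

print(f"r(x, y)          = {r_xy:.4f}")
print(f"r(y, x)          = {r_yx:.4f}")
print(f"r(minutes, y)    = {r_units:.4f}")
print(f"r with outlier   = {r_outlier:.4f}")
```

Swapping the variables or changing the units leaves r unchanged, while a single outlying point moves it substantially.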
Regression Analysis

Causal forecasting models usually consider several variables that are related to the quantity being predicted.

For example, we might want to predict Personal Computer (PC) sales for an organisation.

PC sales, the variable we are trying to predict, would be considered the Dependent Variable.

The sale of PCs might be related to
• Advertising budget
• Prices
• Competitors' prices
• Promotional strategies
• State of the economy
• Disposable income
• Unemployment rates

Each variable we use to make the prediction would be considered an Independent Variable.

The most common quantitative causal forecasting model is Linear Regression Analysis.
Assumptions of Simple Linear Regression (SLR)
1. Linearity of the relationship between the dependent and independent variables.
2. Independence of the errors (no serial correlation): the residuals (errors) should be uncorrelated with one another.
3. Homoscedasticity (constant variance) of the errors: for each value of X, the distribution of residuals has the same variance, so the level of error in the model is roughly the same regardless of the value of the explanatory variable.
4. Normality of the error distribution.
If any of these assumptions is violated (i.e., if there is nonlinearity, serial correlation, heteroscedasticity, and/or non-normality), then the forecasts, confidence intervals, and economic insights yielded by a regression model may be (at best) inefficient or (at worst) seriously biased or misleading.

Please read up on these assumptions.

[Figure: Example of Heteroscedasticity]
Regression Formula (3 Parts)

1. \( n\sum XY - \sum X \sum Y \) (Eq 1)
   Only this expression can give a negative answer.

2. \( n\sum X^2 - \left(\sum X\right)^2 \) (Eq 2)

3. \( n\sum Y^2 - \left(\sum Y\right)^2 \) (Eq 3)
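Regression Equations

Y = a + bX, where a is the intercept and b is the slope.

\[ b = \frac{\text{Eq 1}}{\text{Eq 2}} = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2} \]

\[ a = \frac{\sum Y - b\sum X}{n} \]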
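Correlation Coefficient Formula

\[ r = \frac{\text{Eq 1}}{\sqrt{\text{Eq 2} \times \text{Eq 3}}} = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}} \]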
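
The three-part formulas translate directly into code. The following is a minimal sketch, assuming the data arrive as two equal-length lists; the function name `simple_linear_regression` and the tiny demonstration data set (which lies exactly on the line Y = 1 + 2X) are hypothetical and chosen only so the result is easy to verify by hand.

```python
import math

def simple_linear_regression(xs, ys):
    """Return (a, b, r) using the three-part sums formulas (hypothetical helper)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    eq1 = n * sum_xy - sum_x * sum_y   # the only part that can be negative
    eq2 = n * sum_x2 - sum_x ** 2
    eq3 = n * sum_y2 - sum_y ** 2

    b = eq1 / eq2                      # slope
    a = (sum_y - b * sum_x) / n        # intercept
    r = eq1 / math.sqrt(eq2 * eq3)     # correlation coefficient
    return a, b, r

# Made-up data lying exactly on Y = 1 + 2X.
a, b, r = simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(f"a = {a}, b = {b}, r = {r}")
```

Because b and r share the numerator Eq 1, they always have the same sign.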
No.   Advertising (X)   Sales (Y)
1     10                50
2     20                80
3     30                105
4     40                110
5     50                112
6     60                120
7     70                133

The Dependent Variable (Y) is the variable that we are seeking to forecast. It should be clear that we have a greater interest in forecasting sales (not advertising).

The Independent Variable (X) is the variable that is being used to generate the forecast.

We now need columns for XY, X² and Y². The columns will be totaled to get ∑X, ∑Y, ∑XY, ∑X², and ∑Y².
No.     X     Y     XY       X²       Y²
1       10    50    500      100      2,500
2       20    80    1,600    400      6,400
3       30    105   3,150    900      11,025
4       40    110   4,400    1,600    12,100
5       50    112   5,600    2,500    12,544
6       60    120   7,200    3,600    14,400
7       70    133   9,310    4,900    17,689
Total   280   710   31,760   14,000   76,658

n = number of observations (values) = 7

NB: Figures are in ($'000).
n = 7          ∑XY = 31,760
∑X = 280       ∑X² = 14,000
∑Y = 710       ∑Y² = 76,658
1. \( n\sum XY - \sum X \sum Y \)
   = 7 × 31,760 − 280 × 710
   = 23,520 (Eq 1)

2. \( n\sum X^2 - \left(\sum X\right)^2 \)
   = 7 × 14,000 − 280²
   = 19,600 (Eq 2)

3. \( n\sum Y^2 - \left(\sum Y\right)^2 \)
   = 7 × 76,658 − 710²
   = 32,506 (Eq 3)

Eq 1 = 23,520   Eq 2 = 19,600   Eq 3 = 32,506
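
The column totals and the three parts can be reproduced with a few lines of code. This is a sketch for checking the hand calculation; the variable names are illustrative and not part of the original slides.

```python
# Advertising (X) and sales (Y), in $'000, from the worked example.
advertising = [10, 20, 30, 40, 50, 60, 70]
sales = [50, 80, 105, 110, 112, 120, 133]

n = len(advertising)
sum_x = sum(advertising)
sum_y = sum(sales)
sum_xy = sum(x * y for x, y in zip(advertising, sales))
sum_x2 = sum(x * x for x in advertising)
sum_y2 = sum(y * y for y in sales)

eq1 = n * sum_xy - sum_x * sum_y   # n∑XY − ∑X∑Y
eq2 = n * sum_x2 - sum_x ** 2      # n∑X² − (∑X)²
eq3 = n * sum_y2 - sum_y ** 2      # n∑Y² − (∑Y)²

print(f"sums: X={sum_x}, Y={sum_y}, XY={sum_xy}, X2={sum_x2}, Y2={sum_y2}")
print(f"Eq 1 = {eq1}, Eq 2 = {eq2}, Eq 3 = {eq3}")
```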
\[ b = \frac{\text{Eq 1}}{\text{Eq 2}} = \frac{23{,}520}{19{,}600} = 1.2 \]

\[ a = \frac{\sum Y - b\sum X}{n} = \frac{710 - (1.2 \times 280)}{7} = 53.43 \]
Intercept: a = 53.43
Interpretation: With no advertising expenditure, predicted sales are $53,430. (NB: Figures are in thousands.)

Slope: b = 1.2
Interpretation: For every additional dollar spent on advertising, sales are predicted to increase by $1.20.

Y = a + bX = 53.43 + 1.2X
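
One way to see how well the line Y = 53.43 + 1.2X describes the data is to compare fitted and observed sales. The sketch below is an illustration under the assumption that unrounded coefficients are used; the variable names are hypothetical.

```python
# Advertising (X) and sales (Y), in $'000, from the worked example.
advertising = [10, 20, 30, 40, 50, 60, 70]
sales = [50, 80, 105, 110, 112, 120, 133]

n = len(advertising)
sum_x, sum_y = sum(advertising), sum(sales)
sum_xy = sum(x * y for x, y in zip(advertising, sales))
sum_x2 = sum(x * x for x in advertising)

# Unrounded least-squares slope and intercept.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

# Fitted values and residuals (observed minus fitted).
fitted = [a + b * x for x in advertising]
residuals = [y - y_hat for y, y_hat in zip(sales, fitted)]

for x, y, y_hat, e in zip(advertising, sales, fitted, residuals):
    print(f"X={x:>3}  Y={y:>4}  fitted={y_hat:7.2f}  residual={e:7.2f}")

print(f"sum of residuals = {sum(residuals):.6f}")
```

For a least-squares line the residuals sum to zero, up to floating-point error. Whether their pattern is consistent with linearity and constant variance is a judgment call on so few points.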
\[ r = \frac{\text{Eq 1}}{\sqrt{\text{Eq 2} \times \text{Eq 3}}} = \frac{23{,}520}{\sqrt{19{,}600 \times 32{,}506}} = 0.93 \]

Strength: 0.93 (close to 1). Direction: positive.

Interpretation: Very strong positive linear relationship between advertising and sales.
Y = a + bX = 53.43 + 1.2X

r = 0.93

Checkpoint
If b is positive then r must also be positive, because both share the numerator Eq 1 and both denominators are positive.
Coefficient of Determination (r²)

Measures the proportion of the total variation in the dependent variable (Y) that can be explained by variation in the independent variable (X).

r = 0.93
Therefore r² = (0.93)² = 0.8649 = 86.49%

Interpretation:
86.49% of the variation in sales can be explained by variation in advertising expenditure. That is, 13.51% of the variation in sales cannot be explained by variation in advertising.
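
The 86.49% figure comes from squaring the rounded value r = 0.93. As a cross-check, the sketch below also computes r² directly from Eq 1, Eq 2 and Eq 3 without rounding r first, which gives a slightly higher value of about 86.8%. The variable names are illustrative only.

```python
import math

# The three parts from the worked example.
eq1, eq2, eq3 = 23_520, 19_600, 32_506

r = eq1 / math.sqrt(eq2 * eq3)

# Squaring the rounded r, as done on the slide.
r_squared_from_rounded_r = round(r, 2) ** 2

# Squaring before rounding: r² = Eq1² / (Eq2 × Eq3).
r_squared_unrounded = eq1 ** 2 / (eq2 * eq3)

print(f"r                    = {r:.4f}")
print(f"r² from rounded r    = {r_squared_from_rounded_r:.4f}")
print(f"r² without rounding  = {r_squared_unrounded:.4f}")
```

The interpretation is unchanged either way; the gap is purely a rounding effect.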
What is the predicted value for sales if advertising is $65,000?

Y = 53.43 + (1.2)(65) = 131.43

That is, Sales = $131,430. Remember that the original figures were in thousands.
What level of advertising would be required to generate sales of $200,000?

200 = 53.43 + 1.2X
200 - 53.43 = 1.2X
146.57 = 1.2X
X = 146.57 / 1.2 = 122.14

Required advertising = $122,140
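
Both calculations, forecasting Y from X and solving for the X needed to reach a target Y, can be wrapped in two small functions. This is a minimal sketch using the rounded coefficients from the worked example; the function names `predict_sales` and `required_advertising` are hypothetical.

```python
A = 53.43  # intercept, in $'000
B = 1.2    # slope

def predict_sales(advertising):
    """Forecast sales ($'000) for a given advertising spend ($'000)."""
    return A + B * advertising

def required_advertising(target_sales):
    """Rearrange Y = a + bX to find the advertising ($'000) for a sales target."""
    return (target_sales - A) / B

print(round(predict_sales(65), 2))          # advertising of $65,000
print(round(required_advertising(200), 2))  # sales target of $200,000
```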


Practice Problem
INTELLIGENCE TEST SCORES AND FRESHMEN CHEMISTRY GRADES

STUDENT   TEST SCORE (TS) X   CHEMISTRY GRADE (CG) Y
1         65                  85
2         50                  74
3         55                  76
4         65                  90
5         55                  85
6         70                  87
7         65                  94
8         70                  98
9         55                  81
10        70                  91
11        50                  76
12        55                  74

1. Find and interpret the:
   a) Least-squares regression equation to predict chemistry grades.
   b) Correlation coefficient
   c) Coefficient of determination
2. What chemistry grade would be expected based on a test score of 75?
3. What test score would you expect to lead to a chemistry grade of 60?
STUDENT   TS (X)   CG (Y)   XY      X²      Y²
1         65       85       5525    4225    7225
2         50       74       3700    2500    5476
3         55       76       4180    3025    5776
4         65       90       5850    4225    8100
5         55       85       4675    3025    7225
6         70       87       6090    4900    7569
7         65       94       6110    4225    8836
8         70       98       6860    4900    9604
9         55       81       4455    3025    6561
10        70       91       6370    4900    8281
11        50       76       3800    2500    5776
12        55       74       4070    3025    5476
Total     725      1011     61685   44475   85905
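
Before substituting into the formulas, the column totals can be double-checked with a short script. The sketch below only verifies the sums and leaves the regression itself as the exercise; the variable names are illustrative.

```python
# Test scores (X) and chemistry grades (Y) for the 12 students.
test_scores = [65, 50, 55, 65, 55, 70, 65, 70, 55, 70, 50, 55]
chem_grades = [85, 74, 76, 90, 85, 87, 94, 98, 81, 91, 76, 74]

n = len(test_scores)
sum_x = sum(test_scores)
sum_y = sum(chem_grades)
sum_xy = sum(x * y for x, y in zip(test_scores, chem_grades))
sum_x2 = sum(x * x for x in test_scores)
sum_y2 = sum(y * y for y in chem_grades)

print(f"n = {n}")
print(f"sum X = {sum_x}, sum Y = {sum_y}")
print(f"sum XY = {sum_xy}, sum X2 = {sum_x2}, sum Y2 = {sum_y2}")
```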
