Regression Analysis
Correlation
Measures
• strength (0 to 1)
• direction (negative / positive)
Value of r      Relationship
0               None
Close to 0      Weak
Close to 0.5    Moderate
Close to 1      Strong
Correlation Direction
• Increase in independent variable (x), increase in dependent variable (y): positive relationship
• Decrease in independent variable (x), decrease in dependent variable (y): positive relationship
• Increase in independent variable (x), decrease in dependent variable (y): negative relationship
• Decrease in independent variable (x), increase in dependent variable (y): negative relationship
Interpreting Correlation Values
[Figure: scale of r from -1 to 1]
• r = -1: perfectly negative linear relationship
• r between -1 and -0.5: strong negative linear relationship
• r between -0.5 and 0: weak negative linear relationship
• r = 0: no linear relationship
• r between 0 and 0.5: weak positive linear relationship
• r between 0.5 and 1: strong positive linear relationship
• r = 1: perfectly positive linear relationship
Scatter Plot
https://statistics.laerd.com/statistical-guides/img/pearson-2.png
Facts about Correlation
• The choice of which variable is independent and which is dependent does not affect the calculation of r.
• Both variables must be quantitative.
• r is not affected by the units of measurement.
• r is always a number between -1 and 1. r = +1 indicates a perfect positive linear relationship, r = -1 indicates a perfect negative linear relationship, and r = 0 indicates no linear relationship between the variables.
• r describes only linear relationships.
• r is sensitive to outliers, if any exist in the data.
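The facts above can be checked numerically. The sketch below is illustrative only: it uses the advertising and sales figures from the worked example later in this section, a hand-written helper named `pearson_r` (a hypothetical name, not from the slides), and one made-up outlier point that is not part of the original data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from raw sums."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

advertising = [10, 20, 30, 40, 50, 60, 70]
sales = [50, 80, 105, 110, 112, 120, 133]

r_xy = pearson_r(advertising, sales)
# Swapping the roles of the two variables leaves r unchanged.
r_yx = pearson_r(sales, advertising)
# Changing the units of x (multiplying by 1000) leaves r unchanged.
r_scaled = pearson_r([1000 * x for x in advertising], sales)
# Hypothetical outlier (80, 20), added for illustration only; it pulls r down sharply.
r_outlier = pearson_r(advertising + [80], sales + [20])

print(round(r_xy, 4), round(r_yx, 4), round(r_scaled, 4), round(r_outlier, 4))
```

The first three printed values agree, while the fourth is much closer to 0, which shows how a single unusual point can change r.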
Regression Analysis
For example, we might want to predict Personal Computer (PC) sales for an organisation.
PC sales, the variable we are trying to predict, would be considered the Dependent Variable.
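Three quantities are used in the formulas that follow:
1. \( n \sum XY - \sum X \sum Y \) (Eq 1; only this quantity can be negative)
2. \( n \sum X^2 - \left(\sum X\right)^2 \) (Eq 2)
3. \( n \sum Y^2 - \left(\sum Y\right)^2 \) (Eq 3)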
Regression Equations
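Y = a + bX, where a is the intercept and b is the slope.
\[ b = \frac{\text{Eq 1}}{\text{Eq 2}} = \frac{n \sum XY - \sum X \sum Y}{n \sum X^2 - \left(\sum X\right)^2} \]
\[ a = \frac{\sum Y - b \sum X}{n} \]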
Correlation Coefficient Formula
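\[ r = \frac{\text{Eq 1}}{\sqrt{\text{Eq 2} \times \text{Eq 3}}} = \frac{n \sum XY - \sum X \sum Y}{\sqrt{\left[n \sum X^2 - \left(\sum X\right)^2\right]\left[n \sum Y^2 - \left(\sum Y\right)^2\right]}} \]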
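The slope, intercept, and correlation formulas can be sketched as a small Python helper. The function names (`regression_sums`, `fit_line`) are hypothetical, and the four toy points lie exactly on the line Y = 1 + 2X; they are made up purely as a sanity check and are not data from the slides.

```python
from math import sqrt

def regression_sums(xs, ys):
    """Return n, the sums of X and Y, and the three quantities Eq 1, Eq 2, Eq 3."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    eq1 = n * sum_xy - sum_x * sum_y   # the only one that can be negative
    eq2 = n * sum_x2 - sum_x ** 2
    eq3 = n * sum_y2 - sum_y ** 2
    return n, sum_x, sum_y, eq1, eq2, eq3

def fit_line(xs, ys):
    """Return intercept a, slope b, and correlation r for Y = a + bX."""
    n, sum_x, sum_y, eq1, eq2, eq3 = regression_sums(xs, ys)
    b = eq1 / eq2
    a = (sum_y - b * sum_x) / n
    r = eq1 / sqrt(eq2 * eq3)
    return a, b, r

# Toy data on the exact line Y = 1 + 2X (illustrative only).
a, b, r = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b, r)  # 1.0 2.0 1.0
```

Because the toy points fall exactly on a rising line, the helper recovers the intercept 1, the slope 2, and r = 1.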
No.   Advertising (X)   Sales (Y)
1     10                50
2     20                80
3     30                105
4     40                110
5     50                112
6     60                120
7     70                133
n = 7
\( \sum X = 280 \), \( \sum Y = 710 \), \( \sum XY = 31{,}760 \), \( \sum X^2 = 14{,}000 \), \( \sum Y^2 = 76{,}658 \)
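As a cross-check, the column totals can be reproduced with a few lines of Python. This is a minimal sketch; the variable names are illustrative and not part of the slides.

```python
advertising = [10, 20, 30, 40, 50, 60, 70]   # X
sales = [50, 80, 105, 110, 112, 120, 133]    # Y

# One row per observation: X, Y, XY, X^2, Y^2
rows = [(x, y, x * y, x * x, y * y) for x, y in zip(advertising, sales)]
# Column totals, in the same order as the row entries
totals = [sum(column) for column in zip(*rows)]
n = len(rows)

print(f"{'X':>8}{'Y':>8}{'XY':>8}{'X^2':>8}{'Y^2':>8}")
for row in rows:
    print("".join(f"{value:>8}" for value in row))
print("".join(f"{value:>8}" for value in totals), " (totals, n = %d)" % n)
```

The last printed row gives the totals 280, 710, 31760, 14000, and 76658, matching the sums above.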
1. \( n \sum XY - \sum X \sum Y = 7 \times 31{,}760 - 280 \times 710 = 23{,}520 \) (Eq 1)
2. \( n \sum X^2 - \left(\sum X\right)^2 = 7 \times 14{,}000 - 280^2 = 19{,}600 \) (Eq 2)
3. \( n \sum Y^2 - \left(\sum Y\right)^2 = 7 \times 76{,}658 - 710^2 = 32{,}506 \) (Eq 3)
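Eq 1 = 23,520   Eq 2 = 19,600   Eq 3 = 32,506
\[ b = \frac{\text{Eq 1}}{\text{Eq 2}} = \frac{23{,}520}{19{,}600} = 1.2 \]
Slope: b = 1.2
Interpretation: For every additional dollar spent on advertising, sales are predicted to increase by $1.20.
\[ a = \frac{\sum Y - b \sum X}{n} = \frac{710 - 1.2 \times 280}{7} = 53.43 \]
Intercept: a = 53.43
Y = a + bX = 53.43 + 1.2X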
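\[ r = \frac{\text{Eq 1}}{\sqrt{\text{Eq 2} \times \text{Eq 3}}} = \frac{23{,}520}{\sqrt{19{,}600 \times 32{,}506}} = 0.93 \]
r = 0.93: the magnitude (0.93, close to 1) gives the strength, and the sign (positive) gives the direction.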
Checkpoint
If b is positive then r must also be
positive
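One way to see why the checkpoint holds, using only the quantities Eq 1, Eq 2, and Eq 3 defined earlier, is to rewrite r in terms of b. This assumes Eq 2 and Eq 3 are both positive, which is the case whenever X and Y each take more than one distinct value.

```latex
r = \frac{\text{Eq 1}}{\sqrt{\text{Eq 2} \times \text{Eq 3}}}
  = \frac{\text{Eq 1}}{\text{Eq 2}} \sqrt{\frac{\text{Eq 2}}{\text{Eq 3}}}
  = b \sqrt{\frac{\text{Eq 2}}{\text{Eq 3}}}
```

The square-root factor is always positive, so r and b share the same sign. With the values from the example, 1.2 × √(19,600 / 32,506) ≈ 0.93.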
Coefficient of Determination (\( r^2 \))
r = 0.93
Therefore \( r^2 = (0.93)^2 = 0.8649 = 86.49\% \)
Interpretation:
86.49% of the variation in sales can be explained by changes in advertising expenditure.
That is, 13.51% of the variation in sales cannot be explained by changes in advertising expenditure.
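The "explained variation" reading can be illustrated by comparing the spread of sales around their mean with the spread left over around the fitted line. The sketch below assumes the fitted line from the worked example (slope 1.2, intercept 374/7 ≈ 53.43); the variable names are illustrative. It works from the unrounded values, so it gives about 0.868 rather than 0.8649, which comes from squaring the rounded r = 0.93.

```python
advertising = [10, 20, 30, 40, 50, 60, 70]
sales = [50, 80, 105, 110, 112, 120, 133]
n = len(sales)

# Fitted line from the worked example: a = (710 - 1.2 * 280) / 7 = 374 / 7, b = 1.2
a, b = 374 / 7, 1.2
fitted = [a + b * x for x in advertising]

mean_sales = sum(sales) / n
# Total variation of sales around their mean
total_variation = sum((y - mean_sales) ** 2 for y in sales)
# Variation left unexplained by the fitted line
unexplained = sum((y - f) ** 2 for y, f in zip(sales, fitted))

r_squared = 1 - unexplained / total_variation
print(round(r_squared, 3))
```

The small gap between this figure and 86.49% is a rounding effect only; the interpretation is unchanged.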
What is the predicted value for sales if
advertising is $65,000?
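A minimal sketch of how the fitted line answers this kind of question. The slides do not state the units of the data table, so the code assumes advertising is recorded in thousands of dollars, which makes $65,000 correspond to X = 65; `predict_sales` is a hypothetical helper name.

```python
def predict_sales(advertising, a=53.43, b=1.2):
    """Predicted sales from the fitted line Y = a + bX."""
    return a + b * advertising

# Assumption: X is in thousands of dollars, so $65,000 is entered as 65.
predicted = predict_sales(65)
print(round(predicted, 2))
```

Under that assumption the prediction is about 131.43, in the same units as the sales column. X = 65 lies within the observed range of 10 to 70, so this is an interpolation rather than an extrapolation.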