Regression and Correlation


SIMPLE LINEAR REGRESSION

Simple linear regression is a process of estimating the statistical relationship between two variables. It is commonly carried out using the least squares method.
Examples:
• The number of hours a student spends studying relates to his academic grades.
• A student's grades in his undergraduate subjects relate to his board exam result.
• The amount of money spent on his project relates to his grade in Physics.
• The number of family members relates to the monthly expenses.
• The oil price in the world market relates to the country's economic stability.
• Lifestyle relates to health condition.
Linear Trend is a straight-line trend wherein the
amount of change is constant each period.

• There are three degrees of correlation, or relationship, between two variables:
o Perfect correlation (positive and negative)
o Some degree of correlation (positive and negative)
o No correlation
[Figure: Linear Trend. Scatter diagrams illustrating perfect positive, perfect negative, some positive, some negative, and no correlation]
Simple linear regression equation:

$$\hat{y} = a + bx$$

$$a = \frac{\sum y - b\sum x}{n} \qquad b = \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n\bar{x}^2} \qquad \bar{x} = \frac{\sum x}{n} \qquad \bar{y} = \frac{\sum y}{n}$$

Where:
x = independent variable
y = dependent variable
n = number of ordered pairs
x̄ = mean of the independent variable
ȳ = mean of the dependent variable
a = intercept
b = slope
ŷ = predicted value of the dependent variable
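As a minimal sketch, the summation formulas for a and b can be written directly in Python. The function names `fit_line` and `predict` are illustrative only and are not part of any library.

```python
def fit_line(xs, ys):
    """Return (a, b) for y-hat = a + b*x using the summation formulas."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # b = (Σxy − n·x̄·ȳ) / (Σx² − n·x̄²)
    denom = sum_x2 - n * x_bar ** 2
    if denom == 0:
        raise ValueError("all x values are equal; slope is undefined")
    b = (sum_xy - n * x_bar * y_bar) / denom
    # a = (Σy − bΣx) / n, which equals ȳ − b·x̄
    a = (sum(ys) - b * sum(xs)) / n
    return a, b


def predict(a, b, x):
    """Predicted value of the dependent variable: y-hat = a + b*x."""
    return a + b * x
```

For points lying exactly on y = 1 + 2x, such as x = 0, 1, 2, 3 and y = 1, 3, 5, 7, `fit_line` returns a = 1.0 and b = 2.0.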
CORRELATION ANALYSIS
Correlation is a measure of association between two variables.
The two most popular correlation coefficients are:
• Pearson's product-moment correlation coefficient (Pearson r): for interval or ratio-type data, use Pearson's technique.
• Spearman's rank correlation coefficient (Spearman rho): when calculating a correlation coefficient for ordinal data, select Spearman's technique.
Correlation Interpretation Guide
Value of r        Interpretation
±1.00             Perfect +/- Correlation
±0.76 to ±0.99    Very High +/- Correlation
±0.51 to ±0.75    High +/- Correlation
±0.26 to ±0.50    Moderately Small +/- Correlation
±0.01 to ±0.25    Very Small +/- Correlation
0.00              No Correlation
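The interpretation guide can be expressed as a small lookup. This is a hypothetical helper, `interpret_r`; it assumes the printed bands are contiguous, so a value such as 0.755 that falls between two listed bounds is assigned to the lower band.

```python
def interpret_r(r):
    """Label a correlation coefficient using the interpretation guide.

    Assumption: each band starts at its listed lower bound
    (e.g. 0.755 is labelled "High", not "Very High").
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)
    if size == 0:
        return "No Correlation"
    direction = "Positive" if r > 0 else "Negative"
    if size == 1:
        band = "Perfect"
    elif size >= 0.76:
        band = "Very High"
    elif size >= 0.51:
        band = "High"
    elif size >= 0.26:
        band = "Moderately Small"
    else:
        band = "Very Small"
    return f"{band} {direction} Correlation"
```

For example, `interpret_r(0.62)` returns "High Positive Correlation".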

The range of r values is −1 ≤ r ≤ 1, where:
• an r value close to +1 indicates a strong, direct linear relationship between variables x and y (as x increases, y increases linearly);
• an r value close to −1 indicates a strong but inverse linear relationship between the two variables (as x increases, y decreases linearly);
• an r value close to zero indicates a weak linear relationship.
The Pearson Product-Moment Correlation Coefficient (Pearson r)
The most widely used measure of linear correlation between two variables is the Pearson product-moment correlation coefficient, or simply the simple correlation coefficient. It is used for interval or ratio-type data.
The Pearson approach yields two related measures:
1. the coefficient of correlation, r, and
2. the coefficient of determination, r².

The coefficient of correlation (or of determination):
• is a measure of the strength of the relationship between the variables in a regression equation.
• is used to determine whether the regression equation is a reliable forecasting method. The equation is reliable when there is a relatively strong relationship between the two variables in the equation and any other factors affecting them remain constant.
The strength or quality of the linear relationship between two variables x and y may be measured using either of two coefficients: the coefficient of correlation, denoted r, or the coefficient of determination, r². They are computed as follows:

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

$$r^2 = \frac{\left(n\sum xy - \sum x \sum y\right)^2}{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}$$

or, using the intercept a and slope b of the regression equation,

$$r^2 = \frac{a\sum y + b\sum xy - n\bar{y}^2}{\sum y^2 - n\bar{y}^2}$$
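A sketch of both routes in Python, assuming plain lists of paired numbers; `pearson_r` and `r_squared_via_regression` are illustrative names. For any data set, the square of the first result should match the second, apart from floating-point rounding.

```python
import math


def pearson_r(xs, ys):
    """r = (nΣxy − ΣxΣy) / sqrt([nΣx² − (Σx)²][nΣy² − (Σy)²])."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


def r_squared_via_regression(xs, ys):
    """r² = (aΣy + bΣxy − nȳ²) / (Σy² − nȳ²), using the least-squares a and b."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    # Slope and intercept from the regression formulas
    b = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
    a = y_bar - b * x_bar
    return (a * sum(ys) + b * sum_xy - n * y_bar ** 2) / (sum_y2 - n * y_bar ** 2)
```

With x = 1, 2, 3, 4, 5 and y = 2, 4, 5, 4, 5, both routes give r² = 0.6 (r ≈ 0.7746).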
Example:

THE DELTA FOODS INC.


Compute the coefficient of determination and the coefficient of correlation of the variables for Delta Foods Inc., then interpret the result.
Delta Foods Inc. believes its sales are directly related to the amount of money it spends on promotion. The company has accumulated the following data on promotional expenditures and sales for the past ten years.
Annual Promotional           Annual Sales
Expenditure (P100,000), x    (P100,000), y
 8                            65
14                            90
10                            84
13                            95
15                            97
18                           100
19                           105
20                           111
24                           120
29                           123

a. Develop a simple linear regression equation for these data.
b. For a promotional expenditure of P2,500,000, what level of sales would the company expect?
c. Plot the data on a scatter diagram and superimpose the regression line on it.
d. Solve for the Pearson product-moment correlation, then interpret it.

Interpretation for Delta Foods Inc.:

A correlation of r = 0.9511 (or r = 0.9504; the two values differ only because of rounding in the intermediate computations) indicates a very high, direct (positive) linear relationship between x (annual promotional expenditure) and y (annual sales); that is, as x increases, y increases.
2. It is generally known that the number of road accidents is inversely related to road width. The following data show the results of a study of the number of accidents occurring per hundred thousand vehicle kilometers.

Road width in feet (x)      75   52   60   33   22
Number of accidents (y)     40   84   55   92   90

a. Determine the simple linear regression equation.
b. If the road width is 40 ft, how many accidents would likely occur?
c. Find the correlation coefficient (Pearson r) and interpret the result.
Exercise:
1. Suppose that a firm's marketing manager wishes to determine the correlation between the company's annual production expense and the sales it generates. To do this, he has taken a sample of seven previous periods, as shown below:

Sample   Annual Production Budget   Annual Sales in
Year     in P100,000 (x)            Millions of Pesos (y)
1        1.0                        0.75
2        1.5                        0.82
3        2.4                        1.20
4        3.3                        1.60
5        4.0                        2.00
6        4.8                        2.10
7        6.0                        3.40

a. Develop a simple linear regression equation.
b. The marketing manager is informed that the approved advertising budget for next year is P500,000. Estimate the sales figure for that year.
c. Plot the data on a scatter diagram and superimpose the regression line on it.
d. Solve for the coefficient of correlation r and the coefficient of determination r².
Spearman Rank-Order Correlation Coefficient (Spearman rho)
Simple correlation analysis between ordinal variables.
Examples:
o The final grades of the students in Statistics
o The results of the judges' ratings in a beauty pageant
o A student's order of preference for a professional career (e.g., 1st - White Collar Job, 2nd - Blue Collar Job, 3rd - Gold Collar Job, 4th - Pink Collar Job, 5th - Gray Collar Job)
Spearman Rank-Order Correlation Coefficient

$$\rho = 1 - \frac{6\sum D^2}{n\left(n^2 - 1\right)}$$

Where: n = number of samples
D = difference between ranks
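The formula translates directly into a short function. This is a sketch with a hypothetical name, `spearman_rho`, that takes two lists of ranks that have already been assigned. The shortcut formula is exact only when there are no tied ranks; with ties it is an approximation that is usually close when ties are few.

```python
def spearman_rho(rank_x, rank_y):
    """rho = 1 − 6ΣD² / (n(n² − 1)), where D is the difference between paired ranks."""
    n = len(rank_x)
    if n != len(rank_y) or n < 2:
        raise ValueError("need at least two pairs of ranks")
    # Sum of squared rank differences
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
```

Identical rankings give ρ = 1, and fully reversed rankings such as [1, 2, 3] and [3, 2, 1] give ρ = −1.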
Example:
1. Determine the degree of relationship between the performance ranks obtained by ten trainees during the first and second evaluation periods.

Student   Rank During      Rank During      D    D²
Trainee   1st Evaluation   2nd Evaluation
A         8                7
B         2                5
C         7                10
D         1                4
E         4                2
F         9                6
G         3                1
H         6                9
I         10               8
J         5                3
Solution:
Student   Rank During      Rank During      D     D²
Trainee   1st Evaluation   2nd Evaluation
A         8                7                 1    1
B         2                5                -3    9
C         7                10               -3    9
D         1                4                -3    9
E         4                2                 2    4
F         9                6                 3    9
G         3                1                 2    4
H         6                9                -3    9
I         10               8                 2    4
J         5                3                 2    4
                                          ΣD² =  62

$$\rho = 1 - \frac{6\sum D^2}{n\left(n^2 - 1\right)} = 1 - \frac{6(62)}{10\left(10^2 - 1\right)} = 0.62$$

ρ = 0.62 indicates a high positive correlation.
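The trainee computation can be reproduced in a few lines of Python, using the ranks exactly as listed in the table. As a quick check, ΣD should equal 0 whenever both columns rank the same n items from 1 to n.

```python
# Trainee ranks from the table (A through J)
first = [8, 2, 7, 1, 4, 9, 3, 6, 10, 5]
second = [7, 5, 10, 4, 2, 6, 1, 9, 8, 3]

# D = rank in 1st evaluation minus rank in 2nd evaluation
d = [r1 - r2 for r1, r2 in zip(first, second)]
sum_d2 = sum(x * x for x in d)
n = len(first)
rho = 1 - 6 * sum_d2 / (n * (n * n - 1))

print("D =", d)
print("ΣD² =", sum_d2)
print(f"rho = {rho:.2f}")
```

It prints D = [1, -3, -3, -3, 2, 3, 2, -3, 2, 2], ΣD² = 62 and rho = 0.62, matching the hand solution.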
2. Measure the degree of relationship between the rankings of five finalists in a certain competition as ranked by female and male judges.

Finalist   As Ranked by   As Ranked by   D   D²
           Male Judge     Female Judge
A          4              2.5
B          2              1
C          1              2.5
D          5              4
E          3              5
2. Compute the correlation between the scores in Math and Physics of the following students using Spearman rank correlation. Then interpret the result.

Student    A     B     C     D     E     F     G     H     I
Math       4     3     5     4     6     6     8     2     6
Rank       6.5   8     5     6.5   3     3     1     9     3
Physics    3     5     6     6     4     0     1     7     7
Rank       7     5     3.5   3.5   6     9     8     1.5   1.5
3. Calculate the correlation between a person's IQ and the number of hours spent in front of the TV per week. Then interpret.

IQ (x)   Hours of TV per Week (y)   Rx   Ry   D   D²
106      7
86       0
100      27
101      50
99       28
103      29
97       20
113      12
112      6
110      17
Note: When two or more observations of one variable are the same, ranks are assigned by averaging the positions they occupy in the rank order.

Score   2   3   4     4     5   6   6   6   8
Rank    1   2   3.5   3.5   5   7   7   7   9
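Tied ranks can also be assigned programmatically. The sketch below uses a hypothetical helper, `average_ranks`. The default ranks the smallest value as 1, reproducing the score/rank row in the note; `descending=True` ranks the largest value as 1, which is the convention used for the Math and Physics ranks in exercise 2.

```python
def average_ranks(values, descending=False):
    """Rank values 1..n, giving tied values the mean of the positions they occupy."""
    # Indices of values in rank order (stable sort keeps ties in original order)
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend `end` over the run of positions holding the same value
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Positions pos..end correspond to ranks pos+1..end+1; share their average
        avg = (pos + 1 + end + 1) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks
```

For example, `average_ranks([4, 3, 5, 4, 6, 6, 8, 2, 6], descending=True)` returns [6.5, 8.0, 5.0, 6.5, 3.0, 3.0, 1.0, 9.0, 3.0], the Math ranks listed in exercise 2.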
May your constant love be with us, Lord,
as we put our trust in you.
Ps. 33:22
