
CHAPTER SIX

SIMPLE LINEAR REGRESSION AND CORRELATION
Introduction
• Linear regression and correlation analysis study and measure the linear relationship between two or more variables.

• When only two variables are involved, the analysis is referred to as simple correlation and simple linear regression analysis; when there are more than two variables, the terms multiple regression and partial correlation are used.

• Regression Analysis is a statistical technique that can be used to develop a mathematical equation showing how variables are related.

• Correlation Analysis deals with the measurement of the closeness of the relationship that is described by the regression equation.

• We say there is correlation if the two series of items vary together directly or inversely.
Cont…
• Simple Correlation: Suppose we have two variables X = (X1, X2, ..., Xn) and Y = (Y1, Y2, ..., Yn).

When higher values of X are associated with higher values of Y and lower values of X are associated
with lower values of Y, then the correlation is said to be positive or direct.

• Examples: - Income and expenditure

- Height and weight

- Distance covered and fuel consumed by a car

When higher values of X are associated with lower values of Y and lower values of X are associated
with higher values of Y, then the correlation is said to be negative or inverse.

• Examples: - Demand and supply

- Income and the proportion of income spent on food


Cont…
• The correlation between X and Y may be one of the following:
i. Perfect positive (slope = 1)
ii. Positive (slope between 0 and 1)
iii. No correlation (slope = 0)
iv. Negative (slope between -1 and 0)
v. Perfect negative (slope = -1)

• The presence of correlation between two variables may be due to three reasons:
i. One variable being the cause of the other. The cause is called the independent variable, while the effect is called the dependent variable.
ii. Both variables being the result of a common cause. That is, the correlation that exists between the two variables is due to their being related to some third force.
Example: Let X1 = ESLCE result, Y1 = rate of surviving in the University, and Y2 = the rate of getting a scholarship. Both X1 & Y1 and X1 & Y2 have high positive correlation; likewise Y1 & Y2 have positive correlation, but they are not directly related. They are related to each other via X1.
iii. Chance: The correlation that arises by chance is called spurious correlation.
• Therefore, while interpreting a correlation coefficient, it is necessary to see if there is any likelihood of a relationship existing between the variables under study. The correlation coefficient between X and Y, denoted by r, is given by

r = (nΣXY - ΣXΣY) / sqrt[(nΣX² - (ΣX)²)(nΣY² - (ΣY)²)]

• Remark: r always lies between -1 and 1 inclusive, and it is symmetric (the correlation of X with Y equals that of Y with X).
• Interpretation of r
1. Perfect positive linear relationship ( if r = 1)
2. Some Positive linear relationship ( if r is between 0 and 1)
3. No linear relationship ( if r = 0)
4. Some Negative linear relationship ( if r is between -1 and 0)
5. Perfect negative linear relationship ( if r = -1)
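
• Illustration (a minimal Python sketch, not part of the original notes): the computational form of r above can be coded directly for any paired data.

from math import sqrt

def pearson_r(x, y):
    # Computational form: r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2)(n*Syy - Sy^2))
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    syy = sum(yi * yi for yi in y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))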

Example 1: Calculate the simple correlation between the mid-semester and final exam scores of 10 students (both out of 50).

Student Mid Sem. Exam (X) Final Sem. Exam (Y)


1 31 31
2 23 29
3 41 34
4 32 35
5 29 25
6 33 35
7 28 33
8 31 42
9 31 31
10 33 34
Solution: Here n = 10, ΣX = 312, ΣY = 329, ΣX² = 9920, ΣY² = 11003 and ΣXY = 10331, so

r = (10(10331) - (312)(329)) / sqrt[(10(9920) - 312²)(10(11003) - 329²)] = 662 / 1822.2 ≈ 0.363

• This means the mid-semester exam and final exam scores have a slight positive correlation.
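
• As a quick check (illustrative Python, assuming numpy is available), a library call gives the same value:

import numpy as np

mid = np.array([31, 23, 41, 32, 29, 33, 28, 31, 31, 33])
final = np.array([31, 29, 34, 35, 25, 35, 33, 42, 31, 34])

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is r
r = np.corrcoef(mid, final)[0, 1]
print(round(r, 3))   # 0.363, a slight positive correlation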
• Exercise: The following data were collected from a certain household on the monthly income (X) and consumption (Y) for the past 10 months. Compute the simple correlation coefficient.

X: 650 654 720 456 536 853 735 650 536 666

Y: 450 523 235 398 500 632 500 635 450 360

• The above formula and procedure are only applicable to quantitative data; when we have qualitative data such as efficiency, honesty, intelligence, etc., we calculate what is called Spearman's rank correlation coefficient, as follows:
• Steps
i. Rank the different items in X and Y.
ii. Find the difference of the ranks in each pair and denote it by Di.
iii. Use the following formula:

rs = 1 - (6ΣD²) / (n(n² - 1))

Where rs = coefficient of rank correlation, D = the difference between paired ranks, and n = the number of pairs.
Example: Aster and Almaz were asked to rank 7 different types of lipsticks. See if there is correlation between the tastes of the two ladies.

Lipstick types A B C D E F G
Aster 2 1 4 3 5 7 6
Almaz 1 3 2 4 5 6 7

Solution:

X (R1) Y (R2) R1-R2 (D) D2


2 1 1 1
1 3 -2 4
4 2 2 4
3 4 -1 1
5 5 0 0
7 6 1 1
6 7 -1 1
Total 12

rs = 1 - (6 × 12) / (7(7² - 1)) = 1 - 72/336 ≈ 0.786. Yes, there is a positive correlation between the two rankings.
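
• Illustration (a minimal Python sketch, not part of the original notes): the rank-correlation formula applied to the two rankings.

aster = [2, 1, 4, 3, 5, 7, 6]
almaz = [1, 3, 2, 4, 5, 6, 7]

n = len(aster)
sum_d2 = sum((a - b) ** 2 for a, b in zip(aster, almaz))   # sum of D^2 = 12
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))                  # 1 - 72/336
print(round(r_s, 3))   # 0.786, a positive rank correlation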


Simple Linear Regression
- Simple linear regression refers to the linear relationship between two variables.
- We usually denote the dependent variable by Y and the independent variable by X.
- A simple regression line is the line fitted to the points plotted in the scatter diagram; it describes the average relationship between the two variables. Therefore, to see the type of relationship, it is advisable to prepare a scatter plot before fitting the model.

• The linear model is:


Y = a + bX + e
Where: Y = dependent variable, X = independent variable,
a = regression constant,
b = regression slope,
e = random disturbance term, with e ~ N(0, σ²) and hence Y ~ N(a + bX, σ²).
- To estimate the parameters (a and b) we have several methods:
• The free-hand method
• The semi-average method
• The least squares method
• The maximum likelihood method
• The method of moments
• Bayesian estimation technique
- The above model is estimated as: Ŷ = a + bX
• Where
a is a constant which gives the value of Y when X = 0. It is called the Y-intercept.
b is a constant indicating the slope of the regression line; it gives a measure of the change in Y for a unit change in X. It is also called the regression coefficient of Y on X.
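
• Illustration (a minimal Python sketch, not part of the original notes): the standard least squares estimates, b = (nΣXY - ΣXΣY) / (nΣX² - (ΣX)²) and a = Ȳ - bX̄, which are the ones used in the worked example below, coded as a small helper.

def least_squares(x, y):
    # Returns (a, b) for the fitted line Y-hat = a + b*X
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = sy / n - b * (sx / n)   # a = Ybar - b*Xbar
    return a, b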
• Example 1: The following data show the scores of 12 students in Accounting and Statistics examinations.
a) Calculate the simple correlation coefficient.
b) Fit a regression equation of Statistics on Accounting using least squares estimates.
c) Predict the Statistics score if the Accounting score is 85.
Accounting (X) Statistics (Y) X² Y² XY
1 74.00 81.00 5476.00 6561.00 5994.00
2 93.00 86.00 8649.00 7396.00 7998.00
3 55.00 67.00 3025.00 4489.00 3685.00
4 41.00 35.00 1681.00 1225.00 1435.00
5 23.00 30.00 529.00 900.00 690.00
6 92.00 100.00 8464.00 10000.00 9200.00
7 64.00 55.00 4096.00 3025.00 3520.00
8 40.00 52.00 1600.00 2704.00 2080.00
9 71.00 76.00 5041.00 5776.00 5396.00
10 33.00 24.00 1089.00 576.00 792.00
11 30.00 48.00 900.00 2304.00 1440.00
12 71.00 87.00 5041.00 7569.00 6177.00
Total 687.00 741.00 45591.00 52525.00 48407.00
Mean 57.25 61.75
a) r = (12(48407) - (687)(741)) / sqrt[(12(45591) - (687)²)(12(52525) - (741)²)] = 71817 / sqrt[(75123)(81219)] ≈ 0.92

The coefficient of correlation r has a value of 0.92. This indicates that the two variables are positively correlated (Y increases as X increases).

b) b = (nΣXY - ΣXΣY) / (nΣX² - (ΣX)²) = (12(48407) - (687)(741)) / (12(45591) - (687)²) = 71817 / 75123 = 0.9560
a = Ȳ - bX̄ = 61.75 - 0.9560(57.25) = 7.0194
Ŷ = 7.0194 + 0.9560X is the estimated regression line.
c) Insert X = 85 in the estimated regression line.

Ŷ = 7.0194 + 0.9560X = 7.0194 + 0.9560(85) = 88.28
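
• As a quick check (illustrative Python, assuming numpy is available), fitting the same data with a library routine reproduces the estimates and the prediction:

import numpy as np

acct = np.array([74, 93, 55, 41, 23, 92, 64, 40, 71, 33, 30, 71])
stat = np.array([81, 86, 67, 35, 30, 100, 55, 52, 76, 24, 48, 87])

# np.polyfit with degree 1 returns (slope, intercept) of the least squares line
b, a = np.polyfit(acct, stat, 1)
print(round(a, 4), round(b, 4))   # approximately 7.0194 and 0.9560
print(round(a + b * 85, 2))       # approximately 88.28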

Exercise:

• To know how far the regression equation has been able to explain the variation in Y, we use a measure called the coefficient of determination (r²), which is simply the square of the simple correlation coefficient r.

• r² gives the proportion of the variation in Y explained by the regression of Y on X.

• 1 - r² gives the unexplained proportion and is called the coefficient of indetermination.
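
• For instance, in the Accounting/Statistics example above r ≈ 0.92, so r² ≈ (0.92)² ≈ 0.85: about 85% of the variation in the Statistics scores is explained by the regression on Accounting scores, and the remaining 1 - r² ≈ 0.15 (about 15%) is unexplained.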
