Statistics II: 6
• When only two variables are involved, the analysis is referred to as simple correlation and
simple linear regression analysis; when there are more than two variables, the terms
multiple regression and partial correlation are used.
• Correlation Analysis: deals with measuring the closeness of the relationship that
is described in the regression equation.
• We say there is correlation if the two series of items vary together directly or inversely.
• Simple Correlation: Suppose we have two variables X = (X1, X2, ..., Xn) and Y = (Y1, Y2, ..., Yn).
When higher values of X are associated with higher values of Y and lower values of X are associated
with lower values of Y, then the correlation is said to be positive or direct.
When higher values of X are associated with lower values of Y and lower values of X are associated
with higher values of Y, then the correlation is said to be negative or inverse.
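Assuming r denotes the usual Pearson product-moment (simple) correlation coefficient, which is the standard choice for this kind of analysis, it can be written as follows, where $\bar{X}$ and $\bar{Y}$ are the sample means:

```latex
r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}
         {\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \; \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}
  = \frac{n\sum X_i Y_i - \sum X_i \sum Y_i}
         {\sqrt{\left[n\sum X_i^2 - \left(\sum X_i\right)^2\right]
                \left[n\sum Y_i^2 - \left(\sum Y_i\right)^2\right]}}
```

The numerator is positive when X and Y tend to deviate from their means in the same direction and negative when they deviate in opposite directions, which matches the definitions of positive and negative correlation given here.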
• Remark: The simple correlation coefficient r always lies between -1 and 1 inclusive, and it is symmetric (the correlation between X and Y equals the correlation between Y and X).
• Interpretation of r
1. Perfect positive linear relationship (if r = 1)
2. Some positive linear relationship (if r is between 0 and 1)
3. No linear relationship (if r = 0)
4. Some negative linear relationship (if r is between -1 and 0)
5. Perfect negative linear relationship (if r = -1)
Example 1: Calculate the simple correlation between the mid-semester and final exam scores of 10 students (both out of 50).
• This means the mid-semester and final exam scores have a slight positive correlation.
• Exercise: The following data were collected from a certain household on monthly income (X) and consumption (Y) for
the past 10 months. Compute the simple correlation coefficient.
X: 650 654 720 456 536 853 735 650 536 666
Y: 450 523 235 398 500 632 500 635 450 360
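A hand calculation for this exercise can be checked with a short sketch like the one below. It assumes the simple correlation coefficient is the usual Pearson coefficient computed from deviations about the means; the function name `pearson_r` is only illustrative.

```python
import math

def pearson_r(x, y):
    """Simple (Pearson) correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Monthly income (X) and consumption (Y) from the exercise
income = [650, 654, 720, 456, 536, 853, 735, 650, 536, 666]
consumption = [450, 523, 235, 398, 500, 632, 500, 635, 450, 360]

r = pearson_r(income, consumption)
print(f"r = {r:.4f}")
```

Swapping the two arguments gives the same value, which illustrates the symmetry of r.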
• The above formula and procedure are applicable only to quantitative data. When we have qualitative data such as efficiency,
honesty, intelligence, etc., we calculate what is called Spearman's rank correlation coefficient as follows:
• Steps
i. Rank the different items in X and Y.
ii. Find the difference between the ranks in each pair and denote it by Di.
iii. Use the following formula:
$$ r_s = 1 - \frac{6 \sum D_i^2}{n(n^2 - 1)} $$
where rs = coefficient of rank correlation, Di = the difference between paired ranks, and n = the number of pairs.
Example: Aster and Almaz were asked to rank 7 different types of lipstick. Determine whether there is a correlation between the tastes of the
two ladies.
Lipstick type   A  B  C  D  E  F  G
Aster           2  1  4  3  5  7  6
Almaz           1  3  2  4  5  6  7
Solution:
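The rank correlation for the two rankings can be computed with a minimal sketch such as the following. It assumes the standard Spearman formula, rs = 1 - 6ΣD²/(n(n² - 1)), which is valid when there are no tied ranks, as is the case here; the function name `spearman_rs` is illustrative.

```python
def spearman_rs(rank_x, rank_y):
    """Spearman's rank correlation from two lists of ranks (no ties assumed)."""
    n = len(rank_x)
    if n != len(rank_y) or n < 2:
        raise ValueError("both rankings must have the same length (at least 2)")
    # Sum of squared differences between paired ranks
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Ranks given to lipstick types A..G
aster = [2, 1, 4, 3, 5, 7, 6]
almaz = [1, 3, 2, 4, 5, 6, 7]

rs = spearman_rs(aster, almaz)
print(round(rs, 4))  # 0.7857
```

Here the rank differences are 1, -2, 2, -1, 0, 1, -1, so ΣD² = 12 and rs = 1 - 72/336 ≈ 0.79, which suggests fairly strong agreement between the two rankings.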
The Coefficient of Correlation (r) has a value of 0.92. This indicates that the two variables are positively correlated (Y increases
as X increases).
b) $\hat{Y} = 7.0194 + 0.9560X$ is the estimated regression line.
c) Insert X = 85 into the estimated regression line: $\hat{Y} = 7.0194 + 0.9560(85) = 88.2794$.
Exercise:
• To know how far the regression equation has been able to explain the variation in Y, we use a
measure called the coefficient of determination ($r^2$).
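To illustrate what "explaining the variation in Y" means, the sketch below fits a least-squares line and compares the squared correlation with the explained share of variation, 1 - SSE/SST. The data values are hypothetical and chosen only for illustration; they do not come from the examples in these notes.

```python
import math

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)  # total variation in Y (SST)

# Least-squares estimates of slope and intercept
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Unexplained variation: sum of squared residuals (SSE)
sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))

r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2
explained_share = 1 - sse / syy

print(round(r_squared, 4), round(explained_share, 4))  # 0.6 0.6
```

In simple linear regression the two quantities coincide, so r² can be read as the proportion of the variation in Y explained by the fitted line. For instance, r = 0.92 corresponds to r² = 0.8464, meaning about 85% of the variation in Y is explained by X.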