Correlation Coefficient
The correlation coefficient, r, is a summary measure that describes the extent of the
statistical relationship between two interval or ratio level variables. The correlation
coefficient is scaled so that it is always between -1 and +1. When r is close to 0, there
is little relationship between the variables; the farther r is from 0, in either the
positive or negative direction, the stronger the relationship between the two variables.
The two variables are often given the symbols X and Y. In order to illustrate how the two
variables are related, the values of X and Y are pictured by drawing the scatter diagram,
graphing combinations of the two variables.
The scatter diagram is given first, and then the method of determining Pearson’s r is
presented. The examples that follow use relatively small sample sizes. Later, data from
larger samples are given.
Scatter Diagram
A scatter diagram is a diagram that shows the values of two variables X and Y, along
with the way in which these two variables relate to each other. The values of variable X
are given along the horizontal axis, with the values of the variable Y given on the
vertical axis.
Later, when the regression model is used, one of the variables is defined as an
independent variable, and the other is defined as a dependent variable. In regression,
the independent variable X is considered to have some effect or influence on the
dependent variable Y. Correlation methods are symmetric with respect to the two
variables, with no indication of causation or direction of influence being part of the
statistical consideration. A scatter diagram is given in the following example. The same
example is later used to determine the correlation coefficient.
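As a rough illustration of how a scatter diagram is laid out, the following sketch prints
a crude text version, with X along the horizontal axis and Y along the vertical axis. The
function name text_scatter and the five data points are made up for illustration, and the
sketch assumes integer-valued data; it is not the diagram used in the example.

```python
def text_scatter(xs, ys):
    """Return a crude text scatter diagram: X horizontal, Y vertical.

    Assumes integer-valued data so each point falls exactly on a grid cell.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    points = set(zip(xs, ys))
    rows = []
    for y in range(y_max, y_min - 1, -1):  # largest Y value on the top row
        cells = "".join(
            "*" if (x, y) in points else "."
            for x in range(x_min, x_max + 1)
        )
        rows.append(f"{y:>3} | {cells}")
    rows.append("    +" + "-" * (x_max - x_min + 2))  # horizontal axis
    return "\n".join(rows)


# Hypothetical data, chosen only to show points rising from lower left to upper right.
xs = [1, 2, 3, 4, 5]
ys = [2, 3, 3, 5, 6]
print(text_scatter(xs, ys))
```

Each asterisk marks one (X, Y) combination. Points that rise from the lower left to the
upper right suggest that larger values of X tend to occur with larger values of Y.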
Types of Correlation
The scatter plot shows the correlation between the two attributes or variables. It
represents how closely the two variables are connected. Three situations are possible:
Positive Correlation – when the values of the two variables move in the same direction,
so that an increase/decrease in the value of one variable is accompanied by an
increase/decrease in the value of the other variable.
Negative Correlation – when the values of the two variables move in opposite
directions, so that an increase/decrease in the value of one variable is accompanied by
a decrease/increase in the value of the other variable.
No Correlation – when there is no linear dependence or no relation between the two
variables.
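The three situations can be sketched in code by looking at the sign of the sum of
products of deviations from the means, which is the quantity that gives r its sign. The
function name co_movement and the small data sets are hypothetical, chosen only to
illustrate each case. With real data the sum is rarely exactly zero, so "none" in practice
means a value close to zero.

```python
def co_movement(xs, ys):
    """Classify the direction of the linear relationship between xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means; its sign is the sign of r.
    s = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "none"


# Hypothetical data sets, one for each situation.
print(co_movement([1, 2, 3, 4], [2, 4, 6, 8]))  # Y rises as X rises
print(co_movement([1, 2, 3, 4], [8, 6, 4, 2]))  # Y falls as X rises
print(co_movement([1, 2, 3, 4], [1, 2, 2, 1]))  # no linear tendency
```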
Correlation Formula
Correlation describes the relation between two variables, and the correlation coefficient
measures the strength of that relation. The coefficient is computed from the paired
values of the two variables using a correlation formula.
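One common way of writing the formula for Pearson’s r is shown here for reference. The
notation (n pairs of values, X_i and Y_i, with means X-bar and Y-bar) is an assumption
made for this sketch; other presentations use an algebraically equivalent computational
form.

```latex
r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}
         {\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \; \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}
```

The numerator is positive when X and Y tend to be above or below their means together,
and negative when one tends to be above its mean while the other is below. The
denominator scales the result so that r always lies between -1 and +1.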
Correlation Example
Years of Education and Age of Entry to Labour Force
Table 1 gives the number of years of formal education (X) and the age of entry into the
labour force (Y), for 12 males from the Regina Labour Force Survey. Both variables are
measured in years, a ratio level of measurement and the highest level of measurement. All
of the males are aged close to 30, so that most of these males are likely to have
completed their formal education.
Respondent    Years of         Age of Entry into
Number        Education (X)    Labour Force (Y)
 1            10               16
 2            12               17
 3            15               18
 4             8               15
 5            20               18
 6            17               22
 7            12               19
 8            15               22
 9            12               18
10            10               15
11             8               18
12            10               16
Table 1. Years of Education and Age of Entry into Labour Force for 12 Regina Males
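For readers who want to check the arithmetic, the following sketch computes Pearson’s r
for the 12 respondents in Table 1 using the deviations-from-the-mean form of the formula.
The function name pearson_r and the variable names are chosen here for illustration; the
data values are those of the table.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from sums of squared deviations and cross-products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Table 1: years of education (X) and age of entry into the labour force (Y).
education = [10, 12, 15, 8, 20, 17, 12, 15, 12, 10, 8, 10]
entry_age = [16, 17, 18, 15, 18, 22, 19, 22, 18, 15, 18, 16]

r = pearson_r(education, entry_age)
print(f"r = {r:.3f}")  # prints r = 0.624
```

A value of r a little above 0.6 indicates a positive relationship of moderate strength
between the two variables for these respondents.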
Since most males enter the labour force soon after they leave formal schooling, a close
relationship between these two variables is expected. By looking through the table, it
can be seen that those respondents who obtained more years of schooling generally
entered the labour force at an older age. The mean years of schooling is
X̄ = 12.4 years and the mean age of entry into the labour force is