Lesson 18 - Correlation


LESSON 18 - CORRELATION

Suppose you are a company manager, and you want to know whether there is a relationship between the number
of hours of training you provide for an employee and the number of monthly sales that employee makes afterward. How would you
establish this relationship?
In this lesson, you will learn to identify the type of relationship, or correlation, between two quantities or
variables and to determine whether their correlation is significant.

A correlation is a relationship between two variables.

If the data are represented as a set of ordered pairs (x, y), we take x as the independent (or
explanatory) variable and y as the dependent (or response) variable.

Bivariate data are data that have two variables. In statistics, it is often of interest to know whether the variables in
this type of data are related. There are numerical measures that determine whether two or more variables are related
and how strong their relationship is. Such a measure is called a correlation coefficient.
To determine the strength of a linear relationship between two variables, statisticians use a statistic called the
linear correlation coefficient, which is denoted by r and is defined as follows.

Linear Correlation Coefficient

For the n ordered pairs $(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_n, y_n)$, the linear correlation coefficient r
is given by

\[
r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}
\]
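The formula above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original lesson: the function name `linear_correlation` and the sample data are assumptions chosen for the example.

```python
import math


def linear_correlation(xs, ys):
    """Return the linear correlation coefficient r for paired data.

    Hypothetical helper that follows the computational formula term by term.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")

    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        # r is undefined when either variable takes a single constant value
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator


# Made-up data in which y is exactly twice x, so the points lie on a rising line
print(linear_correlation([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```

Because the made-up points fall exactly on a line with positive slope, the result is the largest possible value of r.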

• If the linear correlation coefficient r is positive, the variables have a positive
correlation. A value close to +1 signifies a strong positive linear relationship. In this case, if one variable
increases, the other variable also tends to increase.
• If r is negative, the variables have a negative correlation. A value close
to -1 signifies a strong negative linear relationship. In this case, if one variable increases, the other
variable tends to decrease.
• The most useful graph for displaying the relationship between two quantitative variables is a scatterplot.
A scatterplot shows the relationship between two quantitative variables measured for the same
individuals. The values of one variable appear on the horizontal axis, and the values of the other variable
appear on the vertical axis. Each individual in the data appears as a point on the graph.

[Figure: (a) scatter diagrams showing the type of linear correlation that exists between the x and y variables; (b) the strength of the correlation.]

EXAMPLE 1.

The following table shows the quiz scores of ten students in Mathematics and their corresponding quiz scores
in Physics. Compute the correlation coefficient of these two variables.
Student       1   2   3   4   5   6   7   8   9  10
Math (X)      3   7   2   9   8   4   1  10   6   5
Physics (Y)  11   1  19   5  17   3  15   9  15   8

Solution: The computation for r will be more efficient if we use a table that shows the columns $x$, $y$, $xy$, $x^2$, $y^2$.

Correlation Between Math and Physics


Student  Math (X)  Physics (Y)  X²         Y²           XY
1        3         11           9          121          33
2        7         1            49         1            7
3        2         19           4          361          38
4        9         5            81         25           45
5        8         17           64         289          136
6        4         3            16         9            12
7        1         15           1          225          15
8        10        9            100        81           90
9        6         15           36         225          90
10       5         8            25         64           40
SUM      Σx = 55   Σy = 103     Σx² = 385  Σy² = 1,401  Σxy = 506

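Computations: With n = 10, we have

\[
\begin{aligned}
r &= \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} \\
  &= \frac{10(506) - (55)(103)}{\sqrt{\left[10(385) - (55)^2\right]\left[10(1{,}401) - (103)^2\right]}} \\
  &= \frac{5{,}060 - 5{,}665}{\sqrt{\left[3{,}850 - 3{,}025\right]\left[14{,}010 - 10{,}609\right]}} \\
  &= \frac{-605}{\sqrt{(825)(3{,}401)}} \\
  &= \frac{-605}{\sqrt{2{,}805{,}825}} \\
  &= \frac{-605}{1{,}675.06}
\end{aligned}
\]

r = -0.361 or -0.36 (low negative correlation)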
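
The arithmetic in Example 1 can be double-checked with a short script. This is only a sketch for verification: the variable names are assumptions, and the data are the ten Math and Physics scores from the table.

```python
import math

# Scores of the ten students in Example 1
math_scores = [3, 7, 2, 9, 8, 4, 1, 10, 6, 5]
physics_scores = [11, 1, 19, 5, 17, 3, 15, 9, 15, 8]

n = len(math_scores)
sum_x = sum(math_scores)                                          # 55
sum_y = sum(physics_scores)                                       # 103
sum_x2 = sum(x * x for x in math_scores)                          # 385
sum_y2 = sum(y * y for y in physics_scores)                       # 1401
sum_xy = sum(x * y for x, y in zip(math_scores, physics_scores))  # 506

# Same computational formula used in the worked solution
numerator = n * sum_xy - sum_x * sum_y                            # -605
denominator = math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
r = numerator / denominator

print(round(r, 3))  # -0.361
```

The column sums and the final value agree with the hand computation.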
