Statistics and Probability: Quarter 4 - Module 7 Pearson's Sample Correlation Coefficient


11 Senior High School

STATISTICS
AND PROBABILITY
Quarter 4 - Module 7
Pearson’s Sample
Correlation Coefficient

Property of Schools Division of Negros Oriental | lrmds.depednodis.net | (035) 225 2376 / 225 2838
Statistics and Probability – Grade 11
Alternative Delivery Mode
Quarter 4 – Module 7: Pearson’s Sample Correlation Coefficient

First Edition, 2020


Republic Act 8293, section 176 states that: No copyright shall subsist in any work of
the Government of the Philippines. However, prior approval of the government agency or office
wherein the work is created shall be necessary for exploitation of such work for profit. Such
agency or office may, among other things, impose as a condition the payment of royalties.

Borrowed materials (i.e., songs, stories, poems, pictures, photos, brand names,
trademarks, etc.) included in this module are owned by their respective copyright holders.
Every effort has been exerted to locate and seek permission to use these materials from their
respective copyright owners. The publisher and authors do not represent nor claim ownership
over them.

Published by the Department of Education


Secretary: Leonor Magtolis Briones
Undersecretary: Diosdado M. San Antonio

Development Team of the Module


Writers: Duke Princeton Mariño and Ronald G. Tolentino
Editors: Didith T. Yap and Mercyditha D. Enolpe
Reviewer: Rickleoben V. Bayking

Layout Artist: Jerry Mar B. Vadil

Management Team:
Senen Priscillo P. Paulin, CESO V
Elisa L. Baguio, EdD
Joelyza M. Arcilla, EdD, CESE
Rosela R. Abiera
Marcelo K. Palispis, JD, EdD
Maricel S. Rasid
Nilita L. Ragay, EdD
Elmar L. Cabrera

Printed in the Philippines by ________________________

Department of Education - Region VII Schools Division of Negros Oriental

Office Address: Kagawasan, Ave., Daro, Dumaguete City, Negros Oriental


Tel #: (035) 225 2376 / 541 1117
E-mail Address: [email protected]

LEARNING COMPETENCIES:
▪ Calculates the Pearson’s sample correlation coefficient (M11/12SP-IVh-2)
▪ Solves problems involving correlation analysis (M11/12SP-IVh-3)

OBJECTIVES:
K: Defines Pearson’s correlation coefficient;
S: Computes the correlation coefficient; and
A: Appreciates the importance of interpreting the relationships
among data being observed.

Pre-Assessment
A data set consists of eight (x, y) pairs of numbers:
(0, 12), (2, 15), (4, 16), (5, 14), (8, 22), (13, 24), (15, 28), (20, 30)

1. Plot the data in a scatter diagram.


2. Based on the plot, explain whether the relationship between x and y appears to involve
randomness or no pattern.
3. Based on the plot, explain whether the relationship between x and y appears to be linear or
not.

What’s In

Let us recall the four different relationships between the variables x and y.

What have you observed?

Figure a illustrates a negative linear relationship and figure b a positive linear relationship;
in both, x could serve as a useful predictor of y. Figure c illustrates a nonlinear
relationship, while in figure d the linear relationship is weak.

What’s New

Definition:
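The Pearson’s correlation coefficient for a collection of n pairs (x, y) of numbers in a
sample is the number r given by the formula

\[
r = \frac{SS_{xy}}{\sqrt{(SS_{xx})(SS_{yy})}}
\]

where

\[
SS_{xx} = \sum x^2 - \frac{1}{n}\Bigl(\sum x\Bigr)^2, \qquad
SS_{xy} = \sum xy - \frac{1}{n}\Bigl(\sum x\Bigr)\Bigl(\sum y\Bigr), \qquad
SS_{yy} = \sum y^2 - \frac{1}{n}\Bigl(\sum y\Bigr)^2
\]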
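The formula above translates almost line for line into code. The following is a minimal sketch in Python; the function name `pearson_r` and the small data set are illustrative assumptions, not part of the module.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's sample correlation coefficient via SSxx, SSxy, SSyy."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    if ss_xx == 0 or ss_yy == 0:
        # r is undefined when every x (or every y) has the same value
        raise ValueError("r is undefined when all x or all y values are equal")
    return ss_xy / sqrt(ss_xx * ss_yy)

# Made-up data, for illustration only: SSxx = 10, SSyy = 6, SSxy = 6
print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 3))  # prints 0.775
```

The guard against a zero SSxx or SSyy is a design choice of this sketch: in that case the denominator is zero and the formula gives no value.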

The Pearson’s correlation coefficient has the following properties:


1. The value of r lies between -1 and 1, inclusive.
2. The sign of r indicates the direction of the linear relationship between x and y:
a. If r < 0, then y tends to decrease as x is increased.
b. If r > 0, then y tends to increase as x is increased.
3. The size of |𝑟| indicates the strength of the linear relationship between x and y:
a. If |r| is near 1 (that is, if r is near either 1 or -1) then the linear relationship between
x and y is strong.
b. If |𝑟| is near 0 (that is, if r is near 0 of either sign) then the linear relationship
between x and y is weak.
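These properties can be seen on tiny data sets. The sketch below uses made-up points (an assumption for illustration): one set lying exactly on a rising line, one exactly on a falling line, and one with a symmetric, nonlinear pattern.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from the SS formulas (hypothetical helper for this sketch)."""
    n = len(xs)
    ss_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    ss_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return ss_xy / sqrt(ss_xx * ss_yy)

xs = [1, 2, 3, 4]
r_up = pearson_r(xs, [2 * x + 1 for x in xs])    # y rises with x along a line
r_down = pearson_r(xs, [5 - 3 * x for x in xs])  # y falls with x along a line
r_flat = pearson_r(xs, [1, 3, 3, 1])             # rises then falls: no linear trend

print(r_up, r_down, r_flat)
```

Here r comes out at the two extremes, 1 and -1, for the exact lines, and at 0 for the symmetric pattern, even though y clearly depends on x there. A value of r near 0 therefore says only that the linear relationship is weak.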

The table below shows the verbal description of the strength of the
correlation between two variables.

Value of r                        Linear Relationship
0.00 to 0.19 (-0.19 to 0.00)      Very weak positive (negative) correlation
0.20 to 0.39 (-0.39 to -0.20)     Weak positive (negative) correlation
0.40 to 0.59 (-0.59 to -0.40)     Moderate positive (negative) correlation
0.60 to 0.79 (-0.79 to -0.60)     Strong positive (negative) correlation
0.80 to 1.00 (-1.00 to -0.80)     Very strong positive (negative) correlation
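The table can be read as a simple lookup on the size of r. The sketch below is one possible encoding. Two points are assumptions of the sketch rather than statements from the table: values that fall between the printed ranges (such as 0.195) are assigned by treating each range as running up to the start of the next, and r = 0 exactly is reported as "no linear correlation". The function name `describe_correlation` is hypothetical.

```python
def describe_correlation(r):
    """Return a verbal description of r following the table of ranges."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1, inclusive")
    if r == 0:
        # Assumption: the table gives no direction for exactly 0.
        return "no linear correlation"
    size = abs(r)
    if size < 0.20:
        strength = "very weak"
    elif size < 0.40:
        strength = "weak"
    elif size < 0.60:
        strength = "moderate"
    elif size < 0.80:
        strength = "strong"
    else:
        strength = "very strong"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation"

print(describe_correlation(0.868))  # very strong positive correlation
print(describe_correlation(-0.45))  # moderate negative correlation
```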

The closer the value of r is to 1 or -1, the stronger the linear relationship between
the variables. This can be shown in the diagram below.

[Figure: six scatter plots. a) Strong Positive Linear Correlation; b) Strong Negative Linear Correlation; c) No Linear Correlation; d) Weak to Medium Positive Linear Correlation; e) Weak to Medium Negative Linear Correlation; f) No Linear Correlation]

Example 1. Compute the linear correlation coefficient for the height and weight as shown in
the table below.

Height x (in)     68   69   70   70   71   72   72   72   73   73   74   75
Weight y (lbs)   151  146  157  164  171  160  163  180  170  175  178  188

Solutions.

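Step 1. Construct a table with the columns x, y, x², xy, and y², fill in the corresponding
values, and total each column. With n = 12, the totals are ∑x = 859, ∑y = 2003, ∑x² = 61537,
∑xy = 143626, and ∑y² = 336025.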

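Step 2. Compute SSxx, SSxy, and SSyy.

\[
SS_{xx} = \sum x^2 - \frac{1}{n}\Bigl(\sum x\Bigr)^2 = 61537 - \frac{1}{12}(859)^2 = 46.9167
\]

\[
SS_{xy} = \sum xy - \frac{1}{n}\Bigl(\sum x\Bigr)\Bigl(\sum y\Bigr) = 143626 - \frac{1}{12}(859)(2003) = 244.583
\]

\[
SS_{yy} = \sum y^2 - \frac{1}{n}\Bigl(\sum y\Bigr)^2 = 336025 - \frac{1}{12}(2003)^2 = 1690.9167
\]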

Step 3. Compute the linear correlation coefficient r.

\[
r = \frac{SS_{xy}}{\sqrt{(SS_{xx})(SS_{yy})}} = \frac{244.583}{\sqrt{(46.9167)(1690.9167)}} = 0.868
\]
Interpretation:
Since the value of r is greater than 0 (positive), weight y tends to increase as height x
increases. The value 0.868 is near 1 and falls in the 0.80 to 1.00 range, so the linear
relationship between height x and weight y is very strong.
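The arithmetic in Example 1 can be re-checked with a short script. This is only a sketch that follows Steps 1 to 3; the variable names are illustrative.

```python
from math import sqrt

heights = [68, 69, 70, 70, 71, 72, 72, 72, 73, 73, 74, 75]              # x, in inches
weights = [151, 146, 157, 164, 171, 160, 163, 180, 170, 175, 178, 188]  # y, in lbs

# Step 1: column totals
n = len(heights)
sum_x, sum_y = sum(heights), sum(weights)
sum_x2 = sum(x * x for x in heights)
sum_y2 = sum(y * y for y in weights)
sum_xy = sum(x * y for x, y in zip(heights, weights))

# Step 2: sums of squares
ss_xx = sum_x2 - sum_x ** 2 / n
ss_xy = sum_xy - sum_x * sum_y / n
ss_yy = sum_y2 - sum_y ** 2 / n

# Step 3: correlation coefficient
r = ss_xy / sqrt(ss_xx * ss_yy)

print(n, sum_x, sum_y, sum_x2, sum_xy, sum_y2)  # 12 859 2003 61537 143626 336025
print(round(ss_xx, 4), round(ss_xy, 3), round(ss_yy, 4), round(r, 3))
```

The second line of output should agree with the values 46.9167, 244.583, 1690.9167, and 0.868 obtained by hand.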
Note: Some books use a different formula for computing the correlation coefficient. However, for the
purpose of uniformity and to avoid confusion, let us just use the one above.
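For reference, one commonly seen alternative (possibly, though not necessarily, the one the note has in mind) is the deviation-from-the-mean form. It gives the same value of r, because each sum of squared deviations equals the corresponding SS quantity:

```latex
\[
r = \frac{\sum (x-\bar{x})(y-\bar{y})}
         {\sqrt{\sum (x-\bar{x})^2 \,\sum (y-\bar{y})^2}},
\qquad
\sum (x-\bar{x})^2 = \sum x^2 - \frac{1}{n}\Bigl(\sum x\Bigr)^2 = SS_{xx},
\]
\[
\sum (y-\bar{y})^2 = \sum y^2 - \frac{1}{n}\Bigl(\sum y\Bigr)^2 = SS_{yy},
\qquad
\sum (x-\bar{x})(y-\bar{y}) = \sum xy - \frac{1}{n}\Bigl(\sum x\Bigr)\Bigl(\sum y\Bigr) = SS_{xy}.
\]
```

Here x̄ and ȳ denote the sample means of x and y.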
Example 2.

Kalmin, a Physics teacher, is interested in determining the relationship
between the number of hours students spent studying for the Physics final exam
and the final scores they obtained. He selected a random sample of eight students in
a Physics class. The hours and scores obtained are shown in the table. Calculate and
interpret the Pearson’s sample correlation coefficient.

Hours spent in studying (x)    1   3   5   2   4   3   2   0
Score (y)                     70  79  90  77  85  81  75  64

Solution:
Step 1. Construct a table with the columns x, y, xy, x², and y² and fill in the corresponding
values as shown.

No.   Hours spent in studying (x)   Score (y)   xy     x²    y²
1     1                             70          70     1     4900
2     3                             79          237    9     6241
3     5                             90          450    25    8100
4     2                             77          154    4     5929
5     4                             85          340    16    7225
6     3                             81          243    9     6561
7     2                             75          150    4     5625
8     0                             64          0      0     4096
Ʃ     20                            621         1644   68    48677

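Step 2. Compute SSxx, SSxy, and SSyy.

\[
SS_{xx} = \sum x^2 - \frac{1}{n}\Bigl(\sum x\Bigr)^2 = 68 - \frac{1}{8}(20)^2 = 68 - 50 = 18
\]

\[
SS_{xy} = \sum xy - \frac{1}{n}\Bigl(\sum x\Bigr)\Bigl(\sum y\Bigr) = 1644 - \frac{1}{8}(20)(621) = 1644 - 1552.50 = 91.5
\]

\[
SS_{yy} = \sum y^2 - \frac{1}{n}\Bigl(\sum y\Bigr)^2 = 48677 - \frac{1}{8}(621)^2 = 48677 - 48205.125 = 471.875
\]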

Step 3. Compute the linear correlation coefficient r.

\[
r = \frac{SS_{xy}}{\sqrt{(SS_{xx})(SS_{yy})}} = \frac{91.5}{\sqrt{(18)(471.875)}} = \frac{91.5}{\sqrt{8493.75}} = 0.9928
\]

Interpretation:

Since the value of r is positive and close to 1, the variables have a very strong
positive correlation. This means that students who spent more hours studying tended
to get higher Physics scores.
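The working table of Step 1 can also be built programmatically, which makes it easy to confirm the column totals before computing r. The sketch below is illustrative only; the variable names are assumptions.

```python
from math import sqrt

hours = [1, 3, 5, 2, 4, 3, 2, 0]           # x
scores = [70, 79, 90, 77, 85, 81, 75, 64]  # y

# One row per student: (x, y, xy, x^2, y^2), as in the Step 1 table
rows = [(x, y, x * y, x * x, y * y) for x, y in zip(hours, scores)]
totals = [sum(column) for column in zip(*rows)]

for number, row in enumerate(rows, start=1):
    print(number, *row)
print("Sum", *totals)  # Sum 20 621 1644 68 48677

n = len(rows)
sum_x, sum_y, sum_xy, sum_x2, sum_y2 = totals
ss_xx = sum_x2 - sum_x ** 2 / n
ss_xy = sum_xy - sum_x * sum_y / n
ss_yy = sum_y2 - sum_y ** 2 / n
r = ss_xy / sqrt(ss_xx * ss_yy)
print(ss_xx, ss_xy, ss_yy, round(r, 4))  # 18.0 91.5 471.875 0.9928
```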

What I Have Learned

Directions: Reflect on the learning that you gained from this lesson on Pearson’s
Correlation Coefficient by completing the given statements below. Do this in your activity
notebook. Do not write anything on this module.

What were your thoughts or ideas about the topic before taking up the lesson?
I thought that _______________________________________________________________
___________________________________________________________________________
__________________________________________________________________________.
What new or additional ideas have you had after taking up this lesson?
I learned that ________________________________________________________________
___________________________________________________________________________
__________________________________________________________________________.
How are you going to apply your learning from this lesson?

I will apply ________________________________________________________________


___________________________________________________________________________
_________________________________________________________________________.

What I Can Do

Directions: In a study of 6 patients, their ages and glucose levels were
recorded. Based on the data in the table below, would you say that age (x) and
glucose level (y) are linearly correlated? Complete the table by supplying the
necessary information.

Patient   Age (x)   Glucose Level (y)   xy      x²      y²
1         43        99                  4257    1849    9801
2         21        65                  1365    441     4225
3         25        79                  1975    625     6241
4         42        75                  3150    1764    5625
5         57        87                  4959    3249    7569
6         59        81                  4779    3481    6561
Ʃ         247       486                 20485   11409   40022
Answer the following:

a) n = _________
b) ∑ 𝒙𝟐 = _______
c) ∑ 𝒙 = ________
d) ∑ 𝒙𝒚 = ______
e) ∑ 𝒚 = _______
f) r = _________
g) Interpretation:
_____________________________________________________________________
_____________________________________________________________________
____________________________________________________________________.

You will be graded following the rubric below.

Rating scale: Excellent = 4, Good = 3, Satisfactory = 2, Needs Improvement = 1

Completeness
  4: 100% of the necessary data asked in the task were completely and correctly followed and accomplished.
  3: 75% of the necessary data asked in the task were followed and accomplished.
  2: Only 50% of the necessary data asked in the task were accomplished.
  1: Only 25% of the necessary data asked in the task were accomplished.

Accuracy of answer
  4: The answer is 100% accurate.
  3: The answer is 75% accurate.
  2: The answer is 50% accurate.
  1: The answer does not show accuracy.

Organization
  4: The solution is well-organized and very evident.
  3: Only 75% of the solution is well-organized and very evident.
  2: Only 50% of the solution is well-organized and very evident.
  1: The organization of the solution is not evident.

Timeliness
  4: Submitted the task before the deadline.
  3: Submitted the task on time.
  2: Submitted the task after the deadline.
  1: Never submitted the given task.

Strategy/Procedures
  4: Typically uses an efficient and effective strategy to solve the problem.
  3: Typically uses an effective strategy to solve the problem.
  2: Partially uses an effective strategy to solve the problem.
  1: Rarely uses an effective strategy to solve problems.

Compute the linear correlation coefficient for the given data below.

1. n = 10, ∑x = 15, ∑y = 24, ∑x² = 60, ∑y² = 134, ∑xy = 144

2. The age x and resting heart rate y were measured for ten men, with the results shown in
the table below.
x 20 23 30 37 35 45 51 55 60 63
y 72 71 73 74 74 73 72 79 75 77

Compute the linear correlation coefficient for these sample data and interpret the
result.

a.
b. The y values tend to increase as the x values increase, thus the relationship between x and y appears to be positive
linear.
WHAT I HAVE LEARNED
1. SSxy = 24 SSxx = 40 SSyy = 18.8 r = 0.875
2.
3. SSxy = 1761 - (1/6)(92)(110) = 74.33
SSxx = 1426 - (1/6)(92)² = 15.33
SSyy = 2418 - (1/6)(110)² = 401.33
r = 0.948
Since the value of r is greater than 0 or positive, the number of vocabulary y tends to increase as age x is increased.
The value 0.948 is near 1 so the linear relationship between age x and vocabulary y is strong.
WHAT I CAN DO (Performance Task)
Patient   Age (x)   Glucose Level (y)   xy      x²      y²
1         43        99                  4257    1849    9801
2         21        65                  1365    441     4225
3         25        79                  1975    625     6241
4         42        75                  3150    1764    5625
5         57        87                  4959    3249    7569
6         59        81                  4779    3481    6561
Ʃ         247       486                 20485   11409   40022
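The remaining items of the performance task follow from the column totals. As a rough check, and only as a sketch applying the module's SS formulas to those totals (the variable names are illustrative):

```python
from math import sqrt

# Column totals taken from the table (n = 6 patients)
n = 6
sum_x, sum_y = 247, 486
sum_xy, sum_x2, sum_y2 = 20485, 11409, 40022

ss_xx = sum_x2 - sum_x ** 2 / n
ss_xy = sum_xy - sum_x * sum_y / n
ss_yy = sum_y2 - sum_y ** 2 / n
r = ss_xy / sqrt(ss_xx * ss_yy)

print(round(ss_xx, 4), ss_xy, ss_yy)  # 1240.8333 478.0 656.0
print(round(r, 3))                    # about 0.53
```

Under this arithmetic r is about 0.53, which the table of ranges describes as a moderate positive correlation between age and glucose level.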
Assessment:
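1. SSxy = 108, SSxx = 37.5, SSyy = 76.4. These give r = 108/√((37.5)(76.4)) ≈ 2.018 rather than 0.638; since r cannot exceed 1, the summary values given in item 1 are inconsistent with any real data set.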
2.
SSxy = 31244 - (1/10)(419)(740) = 238
SSxx = 19643 - (1/10)(419)² = 2086.9
SSyy = 54814 - (1/10)(740)² = 54
r = 0.709
Since the value of r is greater than 0 or positive, the resting heart rate y tends to increase as age x is increased. The
value 0.709 is near 1 so the linear relationship between age x and resting heart rate y is strong.
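When only summary sums are available, r can be computed directly from them. The sketch below does this for the sums of item 2. The function name `pearson_r_from_sums` and the consistency guard are assumptions added for illustration: the guard rejects sums that would push |r| above 1, which cannot happen for real data.

```python
from math import sqrt

def pearson_r_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Compute r from summary sums; reject sums no real data set could produce."""
    ss_xx = sum_x2 - sum_x ** 2 / n
    ss_yy = sum_y2 - sum_y ** 2 / n
    ss_xy = sum_xy - sum_x * sum_y / n
    if ss_xx <= 0 or ss_yy <= 0:
        raise ValueError("SSxx and SSyy must be positive for r to be defined")
    r = ss_xy / sqrt(ss_xx * ss_yy)
    if abs(r) > 1 + 1e-9:  # small tolerance for floating-point rounding
        raise ValueError("inconsistent sums: |r| would exceed 1")
    return r

# Sums for item 2 (age x and resting heart rate y of ten men)
r = pearson_r_from_sums(10, 419, 740, 19643, 54814, 31244)
print(round(r, 3))  # 0.709
```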
