Statistics 1
Statistics and Probability
Statistics
- The science of collecting, organizing, presenting, analyzing, and interpreting data
- A tool for decision making
Data and Variable
Data – Refers to any information
- Facts and statistics collected together for reference or analysis
Universe – A particular sphere of activity, interest, and experience
- The whole collection under study (Filipino: kabuuan, "the whole")
Population – The set of all possible values of a variable
- A finite or infinite collection of items under consideration
Variable – A quantity that, during a calculation, is assumed to vary or to be capable of varying in value
Qualitative and Quantitative
Qualitative – A variable whose values are descriptive categories (adjectives) such as color, gender, nationality, etc.
- Expressed in categories
Quantitative – A variable that takes numerical values for which arithmetic makes sense
- Numerical
Discrete Variable – A variable whose values can be counted
- Can take on only certain separate values
Continuous Variable – Has an infinite number of possible values; the probability associated with any single particular value of a continuous distribution is zero
Scale or level of Measurement
Nominal level – Categorical data used only as labels, such as the name of your school, type, etc.
- Variables that are categorical
- Non-numeric, with no sense of ordering
Ordinal level – Categorical data whose values have a meaningful order or ranking, but the differences between values are not meaningful
Interval level – Data are like ordinal data, except that the intervals between values are equally spaced
- Numerical scales in which intervals have the same interpretation throughout
Ratio level – Like interval data, but there is always a meaningful absolute zero
- Highest level of measurement; a value of 0 means a true absence of the quantity
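Probability – The measure of the likelihood that an event will occur
- Quantified as a number between 0 and 1
$P(E) = \frac{N(E)}{N(S)}$
where N(E) is the number of outcomes in the event and N(S) is the number of outcomes in the sample space.
Sample space (S) – The set of all possible outcomes
Event (E) – The set of outcomes of interest (a subset of the sample space)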
Example / Seatwork (two dice are rolled, so N(S) = 36; find the probability that the sum is):
A. 7: P = 6/36 = 1/6
B. 4: P = 3/36 = 1/12
C. 5: P = 4/36 = 1/9
D. 10: P = 3/36 = 1/12
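The seatwork answers can be checked by listing the whole sample space and counting. A minimal Python sketch, assuming two fair six-sided dice; the function name `prob_sum` is only illustrative:

```python
from fractions import Fraction
from itertools import product

# Sample space S: all 36 ordered outcomes (first die, second die)
sample_space = list(product(range(1, 7), repeat=2))

def prob_sum(target: int) -> Fraction:
    """Classical probability P(E) = N(E) / N(S) for the event 'sum == target'."""
    n_event = sum(1 for a, b in sample_space if a + b == target)
    return Fraction(n_event, len(sample_space))

for label, target in [("A", 7), ("B", 4), ("C", 5), ("D", 10)]:
    print(label, target, prob_sum(target))
```

Using `Fraction` keeps the answers exact, so they can be compared directly with 1/6, 1/12, 1/9, and 1/12.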
Random Variables
A random variable, usually written X, is a variable whose possible values are numerical outcomes of a random phenomenon. As a function, a random variable is required to be measurable, which rules out certain pathological cases where the quantity the random variable returns is infinitely sensitive to small changes in the outcome.
Example: Suppose that two coins are tossed, so the sample space is S = {HH, HT, TH, TT}.
Let X represent the number of heads. With each sample point we can associate a value of X, as shown in the table below:
Outcome   X   P(outcome)
HH        2   1/4
HT        1   1/4
TH        1   1/4
TT        0   1/4
Hence, the random variable X takes the values 0, 1, and 2 in this random experiment.
As the example shows, a discrete random variable takes only a finite (or countable) number of values, and a probability is associated with each value.
Examples of discrete random variables include the number of children in a family, the Friday-night attendance at a cinema, and the number of patients in a doctor's surgery.
The Probability Distribution of a discrete random variable is a list of probabilities
associated with each of its possible values. It is also sometimes called the probability
function or the probability mass function.
Example: Let X represent the sum of two dice.
Then the probability distribution of X is as follows:
X      2     3     4     5     6     7     8     9     10    11    12
P(x)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
[Figure: Probability histogram of X, the sum of two dice; horizontal axis X = 2 to 12, vertical axis P(x) in units of 1/36.]
A continuous random variable is one that takes an infinite number of possible values. Continuous random variables are usually measurements. Examples are height, weight, the amount of sugar in an orange, and the time required to run a mile.
Example: The temperature of the patients in a particular clinic lies between 37ºC and 40ºC. We write this as X = {x | 37 ≤ x ≤ 40}.
Probability Mass Function (PMF) is a function that gives the probability that a discrete
random variable is exactly equal to some value. The probability mass function is often
the primary means of defining a discrete probability distribution, and such functions exist
for either scalar or multivariate random variables whose domain is discrete.
The probabilities in a probability mass function always sum to one: $\sum P(x) = 1$.
The variance of a discrete random variable X measures the spread, or variability, of the
distribution, and is defined by:
$\sigma^2 = \sum (x-\mu)^2 P(x)$
Example 1:
X      0     1     2     3     4
P(x)   0.10  0.20  0.30  0.25  0.15
Mean:
$\mu = \sum x P(x)$
$\mu = (0)(0.10) + (1)(0.20) + (2)(0.30) + (3)(0.25) + (4)(0.15)$
$\mu = 0 + 0.2 + 0.6 + 0.75 + 0.6$
$\mu = 2.15$
Variance:
$\sigma^2 = \sum (x-\mu)^2 P(x)$
$\sigma^2 = (0-2.15)^2(0.10) + (1-2.15)^2(0.20) + (2-2.15)^2(0.30) + (3-2.15)^2(0.25) + (4-2.15)^2(0.15)$
$\sigma^2 = 0.46225 + 0.2645 + 0.00675 + 0.180625 + 0.513375$
$\sigma^2 = 1.4275$
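Example 2:
x    f    P(x)
0    1    1/18
1    6    6/18 = 1/3
2    8    8/18 = 4/9
3    3    3/18 = 1/6
     ∑f = 18
Mean:
$\mu = \sum x P(x)$
$\mu = (0)\left(\frac{1}{18}\right) + (1)\left(\frac{1}{3}\right) + (2)\left(\frac{4}{9}\right) + (3)\left(\frac{1}{6}\right)$
$\mu = 0 + \frac{1}{3} + \frac{8}{9} + \frac{1}{2}$
$\mu = \frac{31}{18} = 1\frac{13}{18} \approx 1.7222$
Variance:
$\sigma^2 = \sum (x-\mu)^2 P(x)$
$\sigma^2 = \left(0-\frac{31}{18}\right)^2\left(\frac{1}{18}\right) + \left(1-\frac{31}{18}\right)^2\left(\frac{1}{3}\right) + \left(2-\frac{31}{18}\right)^2\left(\frac{4}{9}\right) + \left(3-\frac{31}{18}\right)^2\left(\frac{1}{6}\right)$
$\sigma^2 = \frac{961}{5832} + \frac{169}{972} + \frac{25}{729} + \frac{529}{1944}$
$\sigma^2 = \frac{209}{324} \approx 0.6451$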
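Both examples apply the same two formulas, µ = ∑xP(x) and σ² = ∑(x − µ)²P(x). A short Python sketch that computes them with exact fractions; the helper name `mean_and_variance` is hypothetical, and the distributions are typed in from Examples 1 and 2:

```python
from fractions import Fraction

def mean_and_variance(pmf):
    """Return (mu, sigma^2) for a discrete distribution given as {x: P(x)}."""
    if sum(pmf.values()) != 1:
        raise ValueError("the probabilities of a PMF must sum to 1")
    mu = sum(x * p for x, p in pmf.items())               # mu = sum of x * P(x)
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())  # sigma^2 = sum of (x - mu)^2 * P(x)
    return mu, var

# Example 1: decimal probabilities; Fraction("0.10") is exactly 1/10
example1 = {x: Fraction(p) for x, p in zip(range(5), ["0.10", "0.20", "0.30", "0.25", "0.15"])}

# Example 2: probabilities built from the frequencies f
freq = {0: 1, 1: 6, 2: 8, 3: 3}
total = sum(freq.values())
example2 = {x: Fraction(f, total) for x, f in freq.items()}

for name, pmf in [("Example 1", example1), ("Example 2", example2)]:
    mu, var = mean_and_variance(pmf)
    print(name, mu, float(mu), var, float(var))
```

Working in fractions avoids rounding in the intermediate terms, which is where hand calculations of Example 2 are most error-prone.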
Normal distribution
A normal distribution is an arrangement of a data set in which most values cluster in
the middle of the range and the rest taper off symmetrically toward either extreme.
- The normal distribution is a mathematical model represented by a bell-shaped
curve which is symmetric with respect to the mean.
- The normal curve does not intersect or touch the horizontal axis.
The mean, median and mode of the normal distribution are equal.
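SEATWORK (A = area under the normal curve from the z-table; each z is computed as $z = \frac{x-\mu}{\sigma}$, and the z-values below correspond to µ = 50 and σ = 30):
A. 10 to 50
z = (10 − 50)/30 = −1.33
A = 0.4082 or 40.82%
B. 25 to 50
z = (25 − 50)/30 = −0.83
A = 0.2967 or 29.67%
C. 32 to 50
D. 50 to 65
z = (65 − 50)/30 = 0.5
A = 0.1915 or 19.15%
E. 50 to 78
z = (78 − 50)/30 = 0.93
A = 0.3238 or 32.38%
F. below 45
z = (45 − 50)/30 = −0.17
A = 0.5 − 0.0675
A = 0.4325 or 43.25%
G. below 38
z = (38 − 50)/30 = −0.4
Area between z = 0 and z = 0.4: 0.1554
A = 0.5 − 0.1554
A = 0.3446 or 34.46%
H. above 55
z = (55 − 50)/30 = 0.17
A = 0.5 − 0.0675
A = 0.4325 or 43.25%
I. above 48
z = (48 − 50)/30 = −0.07
A = 0.5 + 0.0279
A = 0.5279 or 52.79%
J. between 20 and 80
K. between 27 and 47
L. between 61 and 70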
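Each seatwork area can be cross-checked with the standard-library `statistics.NormalDist` class. This is a sketch under an assumption: the z-values in the seatwork are consistent with µ = 50 and σ = 30, so those parameters are used here even though the notes do not state them explicitly. The helper names are illustrative.

```python
from statistics import NormalDist

# Assumed parameters, inferred from the seatwork z-values
dist = NormalDist(mu=50, sigma=30)

def area_between(a, b):
    """P(a < X < b) for the assumed normal distribution."""
    return dist.cdf(b) - dist.cdf(a)

def area_below(x):
    """P(X < x)."""
    return dist.cdf(x)

def area_above(x):
    """P(X > x)."""
    return 1 - dist.cdf(x)

print("A", round(area_between(10, 50), 4))
print("F", round(area_below(45), 4))
print("I", round(area_above(48), 4))
```

Because the code does not round z to two decimals before looking up the area, its results can differ from the table answers in the third decimal place.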
SAMPLING AND SAMPLING DISTRIBUTION
A sampling distribution is a probability distribution of a statistic obtained through a large
number of samples drawn from a specific population. The sampling distribution of a
given population is the distribution of frequencies of a range of different outcomes that
could possibly occur for a statistic of a population. Sampling is the process of choosing a representative sample from a target population and collecting data from that sample in order to understand something about the population as a whole.
Example number 1:
x (sample)   (x − x̄)   (x − x̄)²
3            −2.5       6.25
4            −1.5       2.25
5            −0.5       0.25
6             0.5       0.25
7             1.5       2.25
8             2.5       6.25
N = 6   ∑x = 33   ∑(x − x̄)² = 17.5
A sampling distribution of sample means is a theoretical distribution of the values that
the mean of a sample takes on in all of the possible samples of a specific size that can be
made from a given population
Formula: Mean (x̄)
$\bar{x} = \frac{\sum x}{N} = \frac{33}{6} = 5.5$
Population variance (σ²) tells us how data points in a specific population are spread out. It is the average of the squared distances from each data point in the population to the mean.
Formula: $\sigma^2 = \frac{\sum (x-\bar{x})^2}{N} = \frac{17.5}{6} = 2.9167$
The standard deviation of a population gives researchers the amount of dispersion of
data for an entire population of survey respondents. A population standard deviation
represents a parameter, not a statistic. Parameters refer to a numerical property of a
population. A statistic, conversely, means that a number can be computed from data.
Researchers use statistics to estimate parameters.
Formula: $\sigma = \sqrt{\sigma^2} = \sqrt{2.9167} = 1.7078$
The sample variance, s², measures how varied a sample is; it is the variance estimated from a sample. Recall that a sample is a subset of data taken from the population, which may contain a very large amount of data.
Formula: $s^2 = \frac{\sum (x-\bar{x})^2}{N-1} = \frac{17.5}{5} = 3.5$
A standard deviation of a sample estimates the standard deviation of a population based
on a random sample. The sample standard deviation, unlike the population standard
deviation, is a statistic that measures the dispersion of the data around the sample mean.
In statistics, "mean" equals the average of a set of numbers; to obtain the mean, researchers add together a list of numbers and divide the total by the count of numbers on the list. To calculate the sample standard deviation, researchers divide the sum of the squared deviations by the number of data values minus 1, then take the square root.
Formula: $s = \sqrt{s^2} = \sqrt{3.5} = 1.8708$
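Python's standard `statistics` module provides both versions of each measure: `pvariance`/`pstdev` divide by N, while `variance`/`stdev` divide by N − 1. A short sketch reproducing Example number 1:

```python
import statistics

data = [3, 4, 5, 6, 7, 8]  # Example number 1

mean = statistics.mean(data)            # x̄ = Σx / N
pop_var = statistics.pvariance(data)    # σ² = Σ(x − x̄)² / N
pop_sd = statistics.pstdev(data)        # σ = √σ²
sample_var = statistics.variance(data)  # s² = Σ(x − x̄)² / (N − 1)
sample_sd = statistics.stdev(data)      # s = √s²

print(mean, round(pop_var, 4), round(pop_sd, 4), round(sample_var, 4), round(sample_sd, 4))
```

The same calls apply unchanged to the other data sets in this section, for example `statistics.pvariance(range(1, 11))` for Example 2.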
Example 2.
x    x − x̄    (x − x̄)²
1    −4.5     20.25
2    −3.5     12.25
3    −2.5      6.25
4    −1.5      2.25
5    −0.5      0.25
6     0.5      0.25
7     1.5      2.25
8     2.5      6.25
9     3.5     12.25
10    4.5     20.25
N = 10   ∑x = 55   ∑(x − x̄)² = 82.5
Mean: $\bar{x} = \frac{\sum x}{N} = \frac{55}{10} = 5.5$
Population Standard Deviation: $\sigma^2 = \frac{82.5}{10} = 8.25$, so $\sigma = \sqrt{\sigma^2} = \sqrt{8.25} = 2.8723$
Sample Standard Deviation: $s^2 = \frac{82.5}{9} = 9.1667$, so $s = \sqrt{s^2} = \sqrt{9.1667} = 3.0277$
Example 3.
S2 = 9.1667
Practice Seatwork:
1. n = 4
Mean: $\bar{x} = \frac{\sum x}{N} = \frac{30}{4} = 7.5$
Population Variance: $\sigma^2 = \frac{\sum (x-\bar{x})^2}{N} = \frac{5}{4} = 1.25$
Population S.D.: $\sigma = \sqrt{\sigma^2} = \sqrt{1.25} = 1.1180$
Sample Variance: $s^2 = \frac{\sum (x-\bar{x})^2}{N-1} = \frac{5}{4-1} = 1.6667$
Sample S.D.: $s = \sqrt{s^2} = \sqrt{1.6667} = 1.2910$
2.
x (sample)   (x − x̄)   (x − x̄)²
1            −2         4
2            −1         1
3             0         0
4             1         1
5             2         4
∑x = 15                ∑(x − x̄)² = 10
N = 5
3.
Population = 10, 11, 12, 13
Sample size = 2
Example 1.
1. µ = 80, σ = 5, n = 20; find P(x̄ > 83).
$z = \frac{\bar{x}-\mu}{\sigma/\sqrt{n}} = \frac{83-80}{5/\sqrt{20}} = 2.68$
Area between z = 0 and z = 2.68: 0.4963
A = 0.5 − 0.4963
A = 0.0037 or 0.37%
2. µ = 40000, σ = 4000, n = 20; find P(38000 < x̄ < 41500).
$z = \frac{38000-40000}{4000/\sqrt{20}} = -2.24$
A = 0.4875
$z = \frac{41500-40000}{4000/\sqrt{20}} = 1.68$
A = 0.4535
A = 0.4875 + 0.4535
A = 0.9410 or 94.10%
3. µ = 100, σ = 9, n = 20; find P(95 < x̄ < 97).
Solution:
$z = \frac{95-100}{9/\sqrt{20}} = -2.48$
A = 0.4934
$z = \frac{97-100}{9/\sqrt{20}} = -1.49$
A = 0.4319
A = 0.4934 − 0.4319
A = 0.0615 or 6.15%
Assignment:
1. µ = 45, σ = 15, n = 10; find P(36 < x̄ < 50).
$z = \frac{36-45}{15/\sqrt{10}} = -1.90$, A = 0.4713
$z = \frac{50-45}{15/\sqrt{10}} = 1.05$, A = 0.3531
A = 0.4713 + 0.3531
A = 0.8244 or 82.44%
2. µ = 496, σ = 20, n = 25; find P(x̄ > 485).
$z = \frac{485-496}{20/\sqrt{25}} = -2.75$
A = 0.4970
A = 0.5 + 0.4970
A = 0.9970 or 99.70%
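All of these problems use the fact that the sample mean x̄ is normally distributed with mean µ and standard error σ/√n. A minimal sketch with the standard-library `NormalDist`; the helper name `sample_mean_dist` is hypothetical:

```python
from math import sqrt
from statistics import NormalDist

def sample_mean_dist(mu, sigma, n):
    """Sampling distribution of x̄: normal, mean mu, standard error sigma / sqrt(n)."""
    return NormalDist(mu=mu, sigma=sigma / sqrt(n))

# Example 1: P(x̄ > 83)
p1 = 1 - sample_mean_dist(80, 5, 20).cdf(83)

# Example 2: P(38000 < x̄ < 41500)
d2 = sample_mean_dist(40000, 4000, 20)
p2 = d2.cdf(41500) - d2.cdf(38000)

# Example 3: P(95 < x̄ < 97)
d3 = sample_mean_dist(100, 9, 20)
p3 = d3.cdf(97) - d3.cdf(95)

# Assignment 1: P(36 < x̄ < 50)
d4 = sample_mean_dist(45, 15, 10)
p4 = d4.cdf(50) - d4.cdf(36)

# Assignment 2: P(x̄ > 485)
p5 = 1 - sample_mean_dist(496, 20, 25).cdf(485)

for label, p in [("Ex 1", p1), ("Ex 2", p2), ("Ex 3", p3), ("Asg 1", p4), ("Asg 2", p5)]:
    print(label, round(p, 4))
```

Small differences from the hand answers come from rounding z to two decimals before using the z-table.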
If the population standard deviation is not known, the t-distribution can be used. If the population standard deviation is known, the normal distribution and z-score may be used.
Note:
Ho and Ha are always opposites of each other. If the claim cannot be written as Ho in symbolic form, switch the symbols so that each hypothesis takes its proper form.
Always remember that Ho must contain an equal sign (=, ≤, or ≥) in its symbolic form.
If the claim does not contain an equal sign, it becomes Ha, and its opposite (which contains the equal sign) becomes Ho.
Step 2. Select the level of significance
Level of Confidence – The degree of assurance (belief) that a particular statistical statement is correct under specified conditions
- Equal to (1 − α)
Level of Significance (α) – The degree of uncertainty (doubt) about the statistical statement under the same conditions used to determine the confidence level
- α is commonly 0.01, 0.05, or 0.10
                      Situation
Decision              Ho is TRUE          Ho is FALSE
Reject Ho             Type I error        Correct decision
Fail to reject Ho     Correct decision    Type II error
Note:
The most commonly used values of α are 0.01 (1%), 0.05 (5%) and 0.10 (10%).
Choosing the 0.01 level of significance means that the researcher is 99% confident and has a 1% chance of committing a Type I error.
Critical values of z:
α       One-tailed   Two-tailed
0.10    1.28         1.645
0.05    1.645        1.96
0.01    2.33         2.58
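The critical values in this table are quantiles of the standard normal distribution: a one-tailed test puts all of α in one tail, while a two-tailed test splits it as α/2 per tail. A sketch using `statistics.NormalDist().inv_cdf`; the function name `z_critical` is illustrative:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def z_critical(alpha, tails):
    """Positive critical z: tails=1 for a one-tailed test, tails=2 for a two-tailed test."""
    return std_normal.inv_cdf(1 - alpha / tails)

for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(z_critical(alpha, 1), 3), round(z_critical(alpha, 2), 3))
```

The printed values agree with the table once rounded to the table's precision (for example, 2.326 is listed as 2.33).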
If the test concerns means, some parametric tests to choose from are:
1. Z – test
2. T – test
3. Paired t – test
4. Analysis of Variance (ANOVA)
If the test concerns means, some non-parametric tests to choose from are:
1. Sign Test
2. Wilcoxon signed – rank test
3. Wilcoxon rank – sum test
4. Kruskal – Wallis test
5. Chi – square test
THE Z – TEST
The z-test is another parametric test concerning means (one or two population means). It is used under the following assumptions:
The probability distribution of the random variable is normal, and the SD is known or assumed.
If unknown, the population SD is estimated from the sample SD.
n > 30
a. z-test for a One-Sample Mean
$z = \frac{\bar{x}-\mu}{\sigma/\sqrt{n}}$
b. z-test for Two Independent Means
(σ1 and σ2 known, or unknown with n1 > 30 and n2 > 30)
$z = \frac{\bar{x}_1-\bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}}$
Note:
If the population SD is unknown, the sample SD can be used in its place. This is acceptable because in a z-test the sample size (n) is large enough to represent the population.
THE T – TEST
The t-test is similar to the z-test; it is used under the following assumptions:
The probability distribution of the random variable is approximately normal.
n < 30
a. t-test for a One-Sample Mean
$t = \frac{\bar{x}-\mu}{s/\sqrt{n}}$
b. t-test between Two Independent Means
Case 1: (σ1² = σ2², but unknown)
$t = \frac{\bar{x}_1-\bar{x}_2}{s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}$, where the pooled variance is $s_p^2 = \frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}$
a. For Two-Tailed Test (Non-directional) – the critical values are ± the tabular value
Decision Rule:
Reject Ho if the computed value is > +Tabular Value or < −Tabular Value. Otherwise, do not reject Ho.
b. For One-Tailed Test (Directional) – the critical value is either negative or positive
For Left-Tail test (Ha is <)
Decision Rule:
Reject Ho if the computed value is < −CV. Otherwise, do not reject Ho.
For Right-Tail test (Ha is >)
Decision Rule:
Reject Ho if the computed value is > +CV. Otherwise, do not reject Ho.
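The three decision rules can be written as one small function. A sketch only; the name `decide` and the labels `"two-sided"`, `"less"`, and `"greater"` are illustrative choices, not part of the notes:

```python
def decide(statistic, critical_value, alternative):
    """Apply the decision rule for a z or t statistic.

    statistic: the computed z or t value.
    critical_value: the positive tabular value (e.g. 1.96).
    alternative: "two-sided" (Ha is ≠), "less" (Ha is <), or "greater" (Ha is >).
    """
    if alternative == "two-sided":
        reject = statistic < -critical_value or statistic > critical_value
    elif alternative == "less":
        reject = statistic < -critical_value
    elif alternative == "greater":
        reject = statistic > critical_value
    else:
        raise ValueError("alternative must be 'two-sided', 'less', or 'greater'")
    return "Reject Ho" if reject else "Do not reject Ho"

print(decide(-8, 1.96, "two-sided"))
print(decide(-8, 2.33, "greater"))
```

Note that a large negative statistic leads to rejection in a two-tailed or left-tailed test but never in a right-tailed test.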
Example #1
A Barangay Captain from a certain barangay in Valenzuela City claims that the average monthly income of families with 5 members in his vicinity is P12,000. When the City Statistics Office (CSO) surveyed 100 randomly selected 5-member families in his barangay, they found an average monthly income of only P10,800, with a standard deviation of P1,500. Based on this information, the CSO asserts that the claim is not true. Using the 0.05 level of significance, test the claim of the Brgy. Captain.
Critical values: z = ±1.96 (two-tailed, α = 0.05)
Decision Rule:
Reject Ho if the computed value is less than −1.96 or greater than +1.96. Otherwise, do not reject Ho.
Step 5. Compute for the z – value.
$z = \frac{\bar{x}-\mu}{\sigma/\sqrt{n}}$
$z = \frac{10800-12000}{1500/\sqrt{100}}$
z = −8
Step 6. Decision
Since the computed z-value (−8) is less than the tabular value (−1.96), reject the null hypothesis (Ho) at the 0.05 level of significance.
Step 7. Interpretation
Rejection of the null hypothesis (Ho) means that the average monthly income of families with 5 members is not P12,000, based on the sample of 100 at the 0.05 level of significance. Therefore, the claim of the Brgy. Captain is not true.
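The arithmetic of Steps 5 and 6 can be reproduced in a few lines. A sketch using only the numbers given in the problem; the sample SD stands in for σ because n = 100 is large, as the z-test note allows:

```python
from math import sqrt

# Example #1 data from the problem statement
mu0 = 12_000     # claimed mean monthly income (Ho: µ = 12,000)
x_bar = 10_800   # sample mean
s = 1_500        # sample SD, used in place of σ since n > 30
n = 100

z = (x_bar - mu0) / (s / sqrt(n))  # one-sample z statistic
critical = 1.96                    # two-tailed, α = 0.05
reject = abs(z) > critical         # reject if z < −1.96 or z > +1.96
print(f"z = {z:.2f}; reject Ho: {reject}")
```

The same z-value is reused in Example #2; only the direction of the test and the critical value change.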
Example #2
Suppose that the Barangay Captain asserts that the average monthly income of families with 5 members in his locality is greater than P12,000. Considering the data gathered by the CSO, what conclusion can be drawn? Assume that the Brgy. Captain is 99% confident about his claim.
Critical value: z = 2.33 (right-tailed, α = 0.01)
Decision Rule:
Reject Ho if the computed value is greater than 2.33. Otherwise, do not reject Ho.
Step 5. Compute for the z – value.
$z = \frac{\bar{x}-\mu}{\sigma/\sqrt{n}}$
$z = \frac{10800-12000}{1500/\sqrt{100}}$
z = −8
Step 6. Decision
Since the computed z-value (−8) is less than the tabular value (2.33), do not reject the null hypothesis (Ho) at the 0.01 level of significance.
Step 7. Interpretation
Failure to reject the null hypothesis (Ho) means that there is not enough evidence that the average monthly income of families with 5 members is greater than P12,000, based on the sample of 100 at the 0.01 level of significance. Therefore, the claim of the Brgy. Captain is not supported.
Example #3
A Physics professor claims that there is no significant difference between the mean scores obtained by students in the morning and afternoon sessions. If the professor is 95% confident in his claim, perform the hypothesis test.
        Morning   Afternoon
Mean    85        83
SD      15        10
n       40        40
Step 1.
Ho: µ1 = µ2 – There is no significant difference between the mean scores obtained by students in the morning and afternoon sessions
Ha: µ1 ≠ µ2 – There is a significant difference between the mean scores obtained by students in the morning and afternoon sessions
Step 2. Level of Significance
α = 0.05
Step 3. Test Statistics
n1 = 40 n2 = 40
z – test; 2 – mean/s; 2 – tailed test
Step 4. Define the area of rejection
Critical values: z = ±1.96 (two-tailed, α = 0.05)
Decision Rule:
Reject Ho if the computed value is less than −1.96 or greater than +1.96. Otherwise, do not reject Ho.
Step 5. Compute for the z – value.
$z = \frac{\bar{x}_1-\bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}}$
$z = \frac{85-83}{\sqrt{\frac{15^2}{40}+\frac{10^2}{40}}}$
z = 0.7016
Step 6. Decision
Since the computed z-value (0.7016) is less than the tabular value (1.96), do not reject the null hypothesis (Ho) at the 0.05 level of significance.
Step 7. Interpretation
Failure to reject the null hypothesis (Ho) means that there is no significant difference between the mean scores of the morning session (85) and the afternoon session (83), based on samples of 40 each at the 0.05 level of significance. Therefore, the claim of the professor is supported.
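The two-sample z statistic is the difference of the sample means divided by its standard error, √(σ1²/n1 + σ2²/n2). A sketch reproducing Example #3; the helper name `two_sample_z` is hypothetical:

```python
from math import sqrt

def two_sample_z(x1, x2, sd1, sd2, n1, n2):
    """z statistic for the difference of two independent means (large samples)."""
    standard_error = sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (x1 - x2) / standard_error

z = two_sample_z(85, 83, 15, 10, 40, 40)  # Example #3: morning vs afternoon
critical = 1.96                           # two-tailed, α = 0.05
print(round(z, 4), "reject Ho:", abs(z) > critical)
```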
Example #4
The mean score obtained by OLFU students in the entrance examination is 87. A group of 25 freshman students scored an average of 85 with a standard deviation of 5. Based on this result, the admission office asserts that the group's average score is lower than 87. If you were one of those students, would you agree?
Make the necessary statistical analysis to support your answer, using the 0.05 level of significance.
Let µ = the mean score obtained by OLFU students in entrance examination.
Step 1.
Ho: µ ≥ 87 – The average score is not lower than 87
Ha: µ < 87 – The average score is lower than 87
Step 2. Level of Significance
α = 0.05
Step 3. Test Statistics
n = 25
t-test; 1 mean; 1-tailed test (left-tailed)
Step 4. Define the area of rejection
df = n − 1 = 24
Critical value: t = −1.711
Decision Rule:
Reject Ho if the computed value is less than −1.711. Otherwise, do not reject Ho.
Step 5. Compute for the t-value.
$t = \frac{\bar{x}-\mu}{s/\sqrt{n}}$
$t = \frac{85-87}{5/\sqrt{25}}$
t = −2
Step 6. Decision
Since the computed t-value (−2) is less than the tabular value (−1.711), reject the null hypothesis (Ho) at the 0.05 level of significance.
Step 7. Interpretation
Rejection of the null hypothesis (Ho) means that the mean score of the group is significantly lower than 87, based on the sample of 25 at the 0.05 level of significance. Therefore, the claim of the admission office is supported.
Example #5
A rice dealer claims that the average mass of his sacks of rice is 50 kg. A sample of 20 sacks was taken and found to have a mean mass of 49.3 kg with a standard deviation of 1 kg. Would you agree with the dealer's claim? (Use the 0.05 level of significance.)
Let µ = the average mass of the rice dealer's sacks of rice.
Step 1.
Ho: µ = 50 kg – The average mass of sack of rice is 50 kg
Ha: µ ≠ 50 kg – The average mass of sack of rice is not 50 kg
Step 2. Level of Significance
α = 0.05
Step 3. Test Statistics
n = 20
t – test; 1 – mean/s; 2 – tailed test
Step 4. Define the area of rejection
df = n − 1 = 20 − 1 = 19
Critical values: t = ±2.093 (two-tailed, α = 0.05)
Decision Rule:
Reject Ho if the computed value is less than −2.093 or greater than +2.093. Otherwise, do not reject Ho.
Step 5. Compute for the t-value.
$t = \frac{\bar{x}-\mu}{s/\sqrt{n}}$
$t = \frac{49.3-50}{1/\sqrt{20}}$
t = −3.1305
Step 6. Decision
Since the computed t-value (−3.1305) is less than the tabular value (−2.093), reject the null hypothesis (Ho) at the 0.05 level of significance.
Step 7. Interpretation
Rejection of the null hypothesis (Ho) means that the average mass of the rice dealer's sacks of rice is not 50 kg, based on the sample of 20 at the 0.05 level of significance. Therefore, the claim of the rice dealer is not true.
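Examples #4 and #5 both use the one-sample t statistic with n − 1 degrees of freedom; only the direction of the test differs. A sketch that hard-codes the tabular critical values quoted above (1.711 for df = 24, one-tailed; 2.093 for df = 19, two-tailed); the helper name `one_sample_t` is hypothetical:

```python
from math import sqrt

def one_sample_t(x_bar, mu0, s, n):
    """t = (x̄ − µ) / (s / √n), with n − 1 degrees of freedom."""
    return (x_bar - mu0) / (s / sqrt(n))

# Example #4: left-tailed (Ha: µ < 87), α = 0.05, df = 24
t4 = one_sample_t(85, 87, 5, 25)
reject4 = t4 < -1.711

# Example #5: two-tailed (Ha: µ ≠ 50), α = 0.05, df = 19
t5 = one_sample_t(49.3, 50, 1, 20)
reject5 = abs(t5) > 2.093

print(f"Example 4: t = {t4:.4f}, reject Ho: {reject4}")
print(f"Example 5: t = {t5:.4f}, reject Ho: {reject5}")
```

If SciPy is available, `scipy.stats.t.ppf(0.95, 24)` gives the one-tailed critical value instead of reading it from a t-table.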
Example #6
A group of nursing students selected two brands of pain reliever and tested the average time each takes to take effect. For each brand, the following was determined:
Brand   x̄          s           n
A       5.2 mins   1 min       15 trials
B       4.7 mins   1.6 mins    14 trials
Assume unequal and unknown variances (σ1² ≠ σ2²). Test the hypothesis that there is no significant difference between the average times of brands A and B to take effect, using 0.01 as the level of significance.
Let µ1 = the average time of brand A to take effect
Let µ2 = the average time of brand B to take effect
Step 1.
Ho: µ1 = µ2 – There is no significant difference between the average times of brands A and B to take effect
Ha: µ1 ≠ µ2 – There is a significant difference between the average times of brands A and B to take effect
Step 2. Level of Significance
α = 0.01
Step 3. Test Statistics
n1 = 15 n2 = 14
t – test; 2 – mean/s; 2 – tailed test
Step 4. Define the area of rejection
df = (smaller n) − 1 = 14 − 1 = 13
Critical values: t = ±3.012 (two-tailed, α = 0.01)
Decision Rule:
Reject Ho if the computed value is less than −3.012 or greater than +3.012. Otherwise, do not reject Ho.
Step 5. Compute for the t-value.
$t = \frac{\bar{x}_1-\bar{x}_2}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}}$
$t = \frac{5.2-4.7}{\sqrt{\frac{1^2}{15}+\frac{1.6^2}{14}}}$
t = 1.0010
Step 6. Decision
Since the computed t-value (1.0010) is less than the tabular value (3.012), do not reject the null hypothesis (Ho) at the 0.01 level of significance.
Step 7. Interpretation
Failure to reject the null hypothesis (Ho) means that there is no significant difference between the average times of brand A (5.2 mins) and brand B (4.7 mins) to take effect, based on samples of 15 and 14 trials at the 0.01 level of significance. Therefore, the hypothesis of the nursing students is supported.
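With unequal, unknown variances, the standard error is built from the two sample variances separately instead of being pooled. A sketch reproducing Example #6 with the conservative df = (smaller n) − 1 used above; the helper name `welch_t` is hypothetical:

```python
from math import sqrt

def welch_t(x1, x2, s1, s2, n1, n2):
    """t statistic for two independent means with unequal, unknown variances."""
    return (x1 - x2) / sqrt(s1**2 / n1 + s2**2 / n2)

t = welch_t(5.2, 4.7, 1.0, 1.6, 15, 14)  # Example #6: brand A vs brand B
df = min(15, 14) - 1                     # conservative df, as in Step 4
critical = 3.012                         # two-tailed, α = 0.01, df = 13 (t-table)
print(f"t = {t:.4f}, df = {df}, reject Ho: {abs(t) > critical}")
```

Software such as `scipy.stats.ttest_ind_from_stats(..., equal_var=False)` uses the Welch–Satterthwaite degrees of freedom instead, which is usually larger than (smaller n) − 1, so its p-value can differ slightly from the table-based decision.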
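$r = \frac{N\sum xy - \sum x \sum y}{\sqrt{\left[N\sum x^2 - \left(\sum x\right)^2\right]\left[N\sum y^2 - \left(\sum y\right)^2\right]}}$
$r = \frac{5(42180)-(740)(287)}{\sqrt{\left[5(110050)-(740)^2\right]\left[5(16985)-(287)^2\right]}}$
r = −0.5686674282 or −0.5687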
Step 4:
Based on the result, |r| = 0.57 is less than the tabular value of 0.959, so do not reject Ho and reject Ha. There is no significant relationship between the height and weight of the female students. The value r = −0.57 indicates a moderate negative correlation.
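The computational formula for r needs only the six column totals (N, ∑x, ∑y, ∑x², ∑y², ∑xy), so it can be coded directly from a table like the ones in these examples. A sketch; both helper names are hypothetical:

```python
from math import sqrt

def pearson_r_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Pearson r from column totals, using the computational formula."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2))
    return numerator / denominator

def pearson_r(xs, ys):
    """Build the column totals from raw paired data, then apply the formula."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    return pearson_r_from_sums(
        len(xs),
        sum(xs),
        sum(ys),
        sum(x * x for x in xs),
        sum(y * y for y in ys),
        sum(x * y for x, y in zip(xs, ys)),
    )

# Example #1 (height and weight), using the column totals shown above
r1 = pearson_r_from_sums(5, 740, 287, 110_050, 16_985, 42_180)
print(round(r1, 4))
```

Because the numerator keeps its sign, the function returns a negative r whenever ∑xy is smaller than (∑x)(∑y)/N. For the tourist data of Example #3, `pearson_r([25, 20, 25, 35, 40], [30, 70, 20, 10, 30])` is about −0.587.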
Example #2
The researchers want to know if there is a correlation between the sleeping hours and quiz scores of 7 male students, using the 0.05 level of significance.
Step 1: Ho: There is no correlation between the sleeping hours and score in a quiz of 7
male students.
Ha: There is a correlation between the sleeping hours and score in a quiz of 7
male students.
Step 2: α = 0.05
df = n − 2 = 7 − 2 = 5
TV = 0.754
Step 3:
Male     x    y    x²    y²    xy
1        3    0    9     0     0
2        6    7    36    49    42
3        4    4    16    16    16
4        8    10   64    100   80
5        4    3    16    9     12
6        7    8    49    64    56
7        9    10   81    100   90
N = 7    41   42   271   338   296
Example #3
The tour guide wants to determine whether there is any substantial relationship between the number of tourists in the morning and in the afternoon at 5 tourist destinations. Use α = 0.01 as the level of significance.
Step 1: Ho: There is no correlation between the number of tourists in the morning and in the afternoon at the tourist destinations.
Ha: There is a correlation between the number of tourists in the morning and in the afternoon at the tourist destinations.
Step 2: α = 0.01
df = n − 2 = 5 − 2 = 3
TV = 0.959
Step 3:
Tourist       Morning   Afternoon
Destination   (x)       (y)         x²       y²       xy
1             25        30          625      900      750
2             20        70          400      4,900    1,400
3             25        20          625      400      500
4             35        10          1,225    100      350
5             40        30          1,600    900      1,200
N = 5         145       160         4,475    7,200    4,200
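$r = \frac{N\sum xy - \sum x \sum y}{\sqrt{\left[N\sum x^2 - \left(\sum x\right)^2\right]\left[N\sum y^2 - \left(\sum y\right)^2\right]}}$
$r = \frac{5(4200)-(145)(160)}{\sqrt{\left[5(4475)-(145)^2\right]\left[5(7200)-(160)^2\right]}}$
r = −0.587136564 or −0.5871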
Step 4:
Based on the result, |r| = 0.59 is less than the T.V. of 0.959, so do not reject Ho and reject Ha. There is no significant relationship between the number of tourists in the morning and in the afternoon at the tourist destinations. The value r = −0.59 indicates a moderate negative correlation.
Example #4
After listing the number of buyers of 10 products over 10 days, a businessman wants to determine whether there is a correlation between the number of buyers and the price of the products, using the level of significance α = 0.01.
Step 1: Ho: There is no correlation between the no. of buyers and price of the products.
Ha: There is a correlation between the no. of buyers and price of the products.
Step 2: α = 0.01
df = n − 2 = 10 − 2 = 8
TV = 0.765
Step 3:
Products   x     y     x²       y²       xy
1          12    25    144      625      300
2          10    30    100      900      300
3          7     30    49       900      210
4          15    40    225      1,600    600
5          30    10    900      100      300
6          25    12    625      144      300
7          17    20    289      400      340
8          19    20    361      400      380
9          55    10    3,025    100      550
10         70    5     4,900    25       350
N = 10     260   202   10,618   5,194    3,630
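$r = \frac{N\sum xy - \sum x \sum y}{\sqrt{\left[N\sum x^2 - \left(\sum x\right)^2\right]\left[N\sum y^2 - \left(\sum y\right)^2\right]}}$
$r = \frac{10(3630)-(260)(202)}{\sqrt{\left[10(10618)-(260)^2\right]\left[10(5194)-(202)^2\right]}}$
r = −0.7825374433 or −0.7825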
Step 4:
Based on the result, |r| = 0.78 is greater than the T.V. of 0.765, so reject Ho and accept Ha. There is a significant relationship between the number of buyers and the price of the products. The value r = −0.78 indicates a high negative correlation.
Example #5
The following data were obtained in a study of the relationship between the weight and chest size of infants at birth. Use the level of significance α = 0.05.
Step 1: Ho: There is no correlation between the weight and chest size of infants at birth.
Ha: There is a correlation between the weight and chest size of infants at birth.
Step 2: α = 0.05
df = n − 2 = 9 − 2 = 7
TV = 0.666
Step 3:
Infant   Weight (x)   Chest size (y)   x²         y²          xy
1        5.64         29.5             31.8096    870.25      166.380
2        4.41         26.3             19.4481    691.69      115.983
3        9.00         32.2             81.0000    1,036.84    289.800
4        11.32        36.5             128.1424   1,332.25    413.180
5        7.08         27.2             50.1264    739.84      192.576
6        8.86         27.7             78.4996    767.29      245.422
7        4.74         28.3             22.4676    800.89      134.142
8        8.82         30.3             77.7924    918.09      267.246
9        7.61         28.7             57.9121    823.69      218.407
N = 9    67.48        266.7            547.1982   7,980.83    2,043.136
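$r = \frac{N\sum xy - \sum x \sum y}{\sqrt{\left[N\sum x^2 - \left(\sum x\right)^2\right]\left[N\sum y^2 - \left(\sum y\right)^2\right]}}$
$r = \frac{9(2043.136)-(67.48)(266.7)}{\sqrt{\left[9(547.1982)-(67.48)^2\right]\left[9(7980.83)-(266.7)^2\right]}}$
r = 0.7684014652 or 0.7684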
Step 4:
Based on the result, r = 0.77 is greater than the T.V. of 0.666, so reject Ho and accept Ha. There is a significant relationship between the weight and chest size of infants at birth. The value of r indicates a high positive correlation.