Chapter 5 Estimation


Statistics and Probability CMU Mathematics Department

CHAPTER 5
Statistical Inference: Estimation

Below are the topics and objectives to be considered as you go through this chapter.

Topics:
5.1 Preliminary Concepts
5.2 Confidence Interval for the Population Mean
5.3 Confidence Interval for the Difference between Two Population Means 𝜇1 − 𝜇2
5.4 Confidence Interval for Paired Observations 𝜇𝑑 = 𝜇1 − 𝜇2
5.5 Confidence Interval for a Population Proportion
5.6 Confidence Interval for the Difference between Two Proportions
5.7 Sample Size Determination

Objectives:
(1) Illustrate the preliminary concepts of estimation;
(2) Know and differentiate the two types of estimation (point and interval estimation);
(3) Identify the point estimator for the population mean;
(4) Compute the point estimate of the population mean;
(5) Illustrate and construct a t-distribution;
(6) Compute the confidence interval estimate and draw conclusions based on the appropriate form of the estimator for the population mean;
(7) Compute the confidence interval estimate and draw conclusions based on the appropriate form of the estimator for the difference between two population means (two independent samples and paired observations);
(8) Identify the point estimator for the population proportion;
(9) Compute the point estimate of the population proportion;
(10) Compute the confidence interval estimate and draw conclusions based on the appropriate form of the estimator for the population proportion and the difference between two proportions;
(11) Compute the length of the confidence interval; and
(12) Solve problems involving sample size determination.

Statistical inference is a procedure whereby inferences or conclusions about a population are made on the basis of the results obtained from a sample drawn from that population. It is divided into two major areas: estimation and hypothesis testing. In estimation, the objective is to estimate unknown population parameters, such as the mean or a proportion. In hypothesis testing, the purpose is to decide whether to accept or reject a statement regarding a population characteristic.

Second Semester 1 JOBELLE S. SIMBLANTE


5.1 Preliminary Concepts

Estimation is concerned with finding a value or range of values for an unknown parameter.
An estimator of a parameter is a rule or formula for computing an estimate from the sample data.
• Good properties of an estimator
Accuracy measures the closeness of an estimate to the true value of the parameter.
✓ To measure accuracy, bias is used.
✓ Bias is the difference between the expected value of the estimator and the parameter being estimated, that is,
𝑏𝑖𝑎𝑠(𝜃̂) = 𝐸(𝜃̂) − 𝜃
✓ An estimator whose bias equals zero is said to be an unbiased estimator of the parameter.

Precision measures the closeness of the different possible values of the estimator to each other. The precision of an estimator can be measured by its variance or by its standard error, which is the square root of the variance.

Remark: We want the estimator to be both accurate and precise.

Measure of Accuracy and Precision

The Mean Square Error (MSE) measures both accuracy and precision. The mean square error of the estimator 𝜃̂ is
$$MSE(\hat{\theta}) = [bias(\hat{\theta})]^2 + Var(\hat{\theta})$$
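To make bias, variance and MSE concrete, the following Python sketch (an illustration, not part of the original notes) enumerates every ordered sample of size 3 drawn with replacement from a small hypothetical population, so the expected values it computes are exact rather than simulated. It compares the sample variance $s^2$ (divisor $n - 1$) with the version that divides by $n$; the population values and the sample size are assumptions chosen only for the demonstration.

```python
from itertools import product
from statistics import fmean, pvariance, variance

# Hypothetical population, used only for illustration
population = [2, 4, 6, 8]
sigma2 = pvariance(population)  # σ² = (1/N) Σ (x_i − μ)², here 5
n = 3

# All ordered samples of size n drawn with replacement are equally likely,
# so averaging over the full list gives exact expected values.
samples = list(product(population, repeat=n))


def bias_var_mse(estimator):
    """Return (bias, variance, MSE) of an estimator of σ² over all samples."""
    estimates = [estimator(s) for s in samples]
    expected = fmean(estimates)                              # E(θ̂)
    var = fmean([(e - expected) ** 2 for e in estimates])    # Var(θ̂)
    mse = fmean([(e - sigma2) ** 2 for e in estimates])      # E[(θ̂ − θ)²]
    return expected - sigma2, var, mse


# s² (divisor n − 1) versus the divisor-n version
bias_s2, var_s2, mse_s2 = bias_var_mse(variance)
bias_n, var_n, mse_n = bias_var_mse(pvariance)

print(f"s^2 with n-1: bias = {bias_s2:.4f}, MSE = {mse_s2:.4f}")
print(f"divisor n:    bias = {bias_n:.4f}, MSE = {mse_n:.4f}")
```

The divisor-$n$ version underestimates $\sigma^2$ by $\sigma^2/n$ on average, while $s^2$ is unbiased; for both estimators the computed MSE equals the squared bias plus the variance.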

An estimate is a numerical value of the estimator.


➢ Two Types of Estimates
1. Point Estimate – a numerical value of an estimator computed from the data contained in a sample. It is a single numerical value that is used as the best guess of a population value. The table below summarizes some common population parameters with their best point estimators.

Population Parameter | Point Estimator
$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$ | $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
$P = \frac{X}{N}$ | $\hat{p} = \frac{x}{n}$
$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$ | $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
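A minimal Python sketch of the point estimators in the table, using the standard-library `statistics` module. The sample reuses the honey data from Example 3 of Section 5.2 and the counts from Example 2 of Section 5.5; these are choices made only for illustration.

```python
from statistics import mean, variance

# Sample reused from Example 3 of Section 5.2 (contents of honey jugs, in liters)
sample = [10.2, 9.6, 10.3, 10.3, 10.4, 9.8, 9.7, 10.5, 10.6, 9.8]

x_bar = mean(sample)    # point estimate of μ: (1/n) Σ x_i
s2 = variance(sample)   # point estimate of σ²: Σ (x_i − x̄)² / (n − 1)

# Point estimate of a proportion: x "successes" out of n
# (counts reused from Example 2 of Section 5.5)
x, n = 500, 2500
p_hat = x / n

print(f"x̄ = {x_bar:.2f}, s² = {s2:.4f}, p̂ = {p_hat}")
```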

2. Interval Estimate – an interval that specifies the lower and upper boundaries for the unknown parameter. In interval estimation, a certain degree of confidence can be attached to the interval estimate. This degree of confidence is normally expressed as a percentage; a high percentage means a high level of certainty in the estimate. This degree of certainty is called the level of confidence or the confidence coefficient and is denoted by 𝟏 − 𝜶. The bounds of a confidence interval are called confidence limits. The objective of interval estimation is to find the lower (L) and upper (U) boundaries that minimize the width of the interval for a given confidence coefficient.

The general format of a confidence interval is

(point estimate) ± E, where E = (tabulated value)(standard error) is the margin of error.
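As a sketch of this general format, the Python functions below compute $z_{\alpha/2}$ directly from the standard normal distribution (`statistics.NormalDist().inv_cdf`, available since Python 3.8) and form (point estimate) ± E. The exact values, about 2.0537 for 96% and 2.5758 for 99%, differ slightly from the averaged table values 2.055 and 2.575 used in the worked examples, so results computed this way can differ in the fourth decimal place. The function names are hypothetical.

```python
from statistics import NormalDist


def z_critical(confidence):
    """z_{α/2}: the z-value whose cumulative area (area to its left) is 1 − α/2."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def confidence_interval(point_estimate, critical_value, standard_error):
    """Return (L, U) = point estimate ∓ E, where E = critical value × standard error."""
    margin = critical_value * standard_error
    return point_estimate - margin, point_estimate + margin


for level in (0.95, 0.96, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.4f}")
```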

5.2 Confidence Interval for the Population Mean


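➢ To construct a 𝟏𝟎𝟎(𝟏 − 𝜶)% confidence interval for the population mean 𝝁, the following notation is used: $\bar{x}$ is the sample mean, $s$ the sample standard deviation, $\sigma$ the population standard deviation, $n$ the sample size, $z_{\alpha/2}$ the z-value with an area of $\alpha/2$ to its right, and $t_{(\alpha/2,v)}$ the t-value with $v = n - 1$ degrees of freedom and an area of $\alpha/2$ to its right.

➢ The computing formulas are:
Case 1 (𝝈 known): $\bar{x} \pm z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$
Case 2 (𝝈 unknown, 𝒏 ≥ 𝟑𝟎): $\bar{x} \pm z_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$
Case 3 (𝝈 unknown, 𝒏 < 𝟑𝟎, population approximately normal): $\bar{x} \pm t_{(\alpha/2,v)}\left(\frac{s}{\sqrt{n}}\right)$, with $v = n - 1$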

Note: Case 1 is used when the value of 𝝈 is given in the problem. In practical situations, 𝝈 is usually unknown, so Case 2 or Case 3 is used, depending on whether 𝒏 ≥ 𝟑𝟎 or 𝒏 < 𝟑𝟎.
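The choice among Cases 1 to 3 can be sketched in Python as follows. Since the standard library has no t-distribution, the sketch carries a tiny excerpt of the t-table holding only the values quoted in this chapter (an assumption: other $(\alpha/2, v)$ pairs would need a full table or a statistics package), and it computes $z_{\alpha/2}$ exactly rather than reading it from the z-table. The function `mean_ci` is a hypothetical helper, not a library routine.

```python
from math import sqrt
from statistics import NormalDist

# Excerpt of t-table values t_(α/2, v) quoted in this chapter (assumption:
# other (α/2, v) pairs would need a full t-table or a statistics package).
T_TABLE = {(0.025, 9): 2.262, (0.025, 24): 2.064, (0.005, 4): 4.604}


def mean_ci(x_bar, n, confidence, sigma=None, s=None):
    """Hypothetical helper: 100(1 − α)% CI for μ following Cases 1 to 3.

    Case 1: σ known             -> z_{α/2} · σ/√n
    Case 2: σ unknown, n ≥ 30   -> z_{α/2} · s/√n
    Case 3: σ unknown, n < 30   -> t_(α/2, n−1) · s/√n (normal population assumed)
    """
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    if sigma is not None:
        case, crit, sd = 1, z, sigma
    elif s is None:
        raise ValueError("either sigma or s must be given")
    elif n >= 30:
        case, crit, sd = 2, z, s
    else:
        case, crit, sd = 3, T_TABLE[(round(alpha / 2, 4), n - 1)], s
    margin = crit * sd / sqrt(n)
    return case, (x_bar - margin, x_bar + margin)


print(mean_ci(10.12, 10, 0.95, s=0.361))
```

Example 3 below falls into Case 3, and `mean_ci(10.12, 10, 0.95, s=0.361)` returns approximately (9.8618, 10.3782), in line with the worked answer.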

Example 1. A frozen food company wishes to know the mean length of corn received in a large shipment. A random sample of 20 ears of corn was collected and measured and found to have a mean length of 8.8 inches. It is known that the standard deviation of all corn shipments is 1.5 inches. Find a 95% confidence interval for the mean length of corn in the large shipment.

Given: 𝑛 = 20,
𝑥̅ = 8.8,
𝜎 = 1.5,
𝐶𝐼 = 95% 𝑜𝑟 0.95
𝛼 = 1 − 𝐶𝐼 = 1 − 0.95 = 0.05
Solution: Since the population standard deviation 𝝈 is given, we use Case 1 with $z_{\alpha/2}$, where $z_{\alpha/2} = z_{0.05/2} = z_{0.025}$.

The cumulative area to the left of $z_{0.025}$ is $1 - 0.025 = 0.975$. Looking up this area in the z-table, the z-value of $z_{0.025}$ is 1.96.

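Thus, the 95% confidence interval for 𝝁 is given by:

$$\bar{x} \pm z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right) = 8.8 \pm (1.96)\left(\frac{1.5}{\sqrt{20}}\right),$$ that is,

$$8.8 - (1.96)\left(\frac{1.5}{\sqrt{20}}\right) < \mu < 8.8 + (1.96)\left(\frac{1.5}{\sqrt{20}}\right)$$

$$8.1426 < \mu < 9.4574$$

(8.1426, 9.4574)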

Conclusion: Thus, the 95% confidence interval for 𝝁 is (8.1426, 9.4574). We are 95% confident that the true mean length of the corn in the shipment lies between 8.1426 and 9.4574 inches.
Example 2. The average CMUCAT score of a random sample of 35 freshmen of CAS is 89.66
with a standard deviation of 18.28. Construct a 96% confidence interval for the true mean
CMUCAT score of CAS freshmen.

Given: 𝑛 = 35,
𝑥̅ = 89.66,
𝑠 = 18.28,
𝐶𝐼 = 96% 𝑜𝑟 0.96
𝛼 = 1 − 𝐶𝐼 = 1 − 0.96 = 0.04

Solution: Since 𝝈 is unknown and 𝑛 > 30, we use Case 2 with $z_{\alpha/2}$, where $z_{\alpha/2} = z_{0.04/2} = z_{0.02}$.

The cumulative area to the left of $z_{0.02}$ is $1 - 0.02 = 0.98$. There is no entry of exactly 0.98 in the z-table, so we take the two nearest areas, 0.9798 and 0.9803, whose z-values are 2.05 and 2.06, respectively, and average them:

$$\frac{2.05 + 2.06}{2} = 2.055$$

Thus, the z-value of $z_{0.02}$ is 2.055.

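Thus, the 96% confidence interval for 𝝁 is given by:

$$\bar{x} \pm z_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right) = 89.66 \pm (2.055)\left(\frac{18.28}{\sqrt{35}}\right),$$ that is,

$$89.66 - (2.055)\left(\frac{18.28}{\sqrt{35}}\right) < \mu < 89.66 + (2.055)\left(\frac{18.28}{\sqrt{35}}\right)$$

$$83.3103 < \mu < 96.0097$$

(83.3103, 96.0097)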

Conclusion: Thus, the 96% confidence interval for 𝝁 is (83.3103, 96.0097). We are 96% confident that the true mean CMUCAT score of CAS freshmen lies between 83.3103 and 96.0097.
Example 3. The contents of a random sample of 10 jugs of honey were recorded as follows: 10.2,
9.6, 10.3, 10.3, 10.4, 9.8, 9.7, 10.5, 10.6, and 9.8 liters. Assuming the distribution of contents to
be normal, construct a 95% confidence interval for the mean content of the jugs of honey.

Given: 𝑛 = 10,
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{10.2 + 9.6 + 10.3 + 10.3 + 10.4 + 9.8 + 9.7 + 10.5 + 10.6 + 9.8}{10} = 10.12,$$
$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2} = \sqrt{\frac{(10.2 - 10.12)^2 + (9.6 - 10.12)^2 + \cdots + (9.8 - 10.12)^2}{10 - 1}} = 0.361,$$
𝐶𝐼 = 95% 𝑜𝑟 0.95
𝛼 = 1 − 𝐶𝐼 = 1 − 0.95 = 0.05

Solution: Since 𝝈 is unknown and 𝑛 < 30, we use Case 3 with $t_{(\alpha/2,v)}$, where
$$t_{(\alpha/2,v)} = t_{(\alpha/2,\,n-1)} = t_{(0.05/2,\,10-1)} = t_{(0.025,9)}.$$
Referring to the t-table, the t-value $t_{(0.025,9)}$ is 2.262.

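Thus, the 95% confidence interval for 𝝁 is given by:

$$\bar{x} \pm t_{(\alpha/2,v)}\left(\frac{s}{\sqrt{n}}\right) = 10.12 \pm (2.262)\left(\frac{0.361}{\sqrt{10}}\right),$$ that is,

$$10.12 - (2.262)\left(\frac{0.361}{\sqrt{10}}\right) < \mu < 10.12 + (2.262)\left(\frac{0.361}{\sqrt{10}}\right)$$

$$9.8618 < \mu < 10.3782$$

(9.8618, 10.3782)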

Conclusion: Thus, the 95% confidence interval for 𝝁 is (9.8618, 10.3782). We are 95% confident that the true mean content of the jugs of honey lies between 9.8618 and 10.3782 liters.
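The computation in Example 3 can be reproduced from the raw data with a short Python sketch. Here the sample standard deviation is computed with `statistics.stdev` and is not rounded to 0.361, so the bounds may differ from the worked answer in the fourth decimal place; the t-value 2.262 is the table value used above.

```python
from math import sqrt
from statistics import mean, stdev

# Contents of the 10 jugs of honey (liters), from Example 3
contents = [10.2, 9.6, 10.3, 10.3, 10.4, 9.8, 9.7, 10.5, 10.6, 9.8]
n = len(contents)
x_bar = mean(contents)   # sample mean
s = stdev(contents)      # sample standard deviation, divisor n − 1
t_crit = 2.262           # t_(0.025, 9) read from the t-table

margin = t_crit * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin
print(f"x̄ = {x_bar:.2f}, s = {s:.4f}, CI = ({lower:.4f}, {upper:.4f})")
```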

5.3 Confidence Interval for the Difference between Two Population Means 𝝁𝟏 − 𝝁𝟐

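➢ To construct a 𝟏𝟎𝟎(𝟏 − 𝜶)% confidence interval for the difference between two population means 𝝁𝟏 − 𝝁𝟐, a sample is drawn from each population and the following notation is used: $\bar{x}_1, \bar{x}_2$ are the sample means, $s_1, s_2$ the sample standard deviations, $\sigma_1, \sigma_2$ the population standard deviations, and $n_1, n_2$ the sample sizes.

➢ The computing formulas are:
Case 1 ($\sigma_1^2$ and $\sigma_2^2$ known): $(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
Case 2 ($\sigma_1^2$ and $\sigma_2^2$ unknown, $n_1 \ge 30$ and $n_2 \ge 30$): $(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
Case 3 ($\sigma_1^2 = \sigma_2^2$ but unknown, $n_1 < 30$ and $n_2 < 30$, populations approximately normal): $(\bar{x}_1 - \bar{x}_2) \pm t_{(\alpha/2,v)}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$, where $s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{v}}$ and $v = n_1 + n_2 - 2$

Note: If the population variances $\sigma_1^2$ and $\sigma_2^2$ are given, then Case 1 should be used regardless of the sample size.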

To interpret the confidence interval, we note the signs of the lower bound L and the upper bound U. When both boundaries are positive, 𝜇1 − 𝜇2 is also positive, and we may conclude that 𝜇1 > 𝜇2. Similarly, when both boundaries are negative, 𝜇1 − 𝜇2 is also negative, and we may conclude that 𝜇1 < 𝜇2. When the lower boundary is negative and the upper boundary is positive, the interval contains 0, so 𝜇1 − 𝜇2 = 0 is a plausible value, and we may conclude that there is no significant difference between 𝜇1 and 𝜇2 (that is, we treat 𝜇1 = 𝜇2). This interpretation is summarized below.

L and U both positive: 𝜇1 − 𝜇2 > 0, so 𝜇1 > 𝜇2
L and U both negative: 𝜇1 − 𝜇2 < 0, so 𝜇1 < 𝜇2
L negative and U positive: 0 lies in the interval, so there is no significant difference (𝜇1 = 𝜇2)
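The decision rule above can be written as a small Python function. It is a hypothetical helper that assumes $L \le U$ and simply mirrors the three sign patterns just described; it does not replace a formal hypothesis test.

```python
def interpret_difference(lower, upper, name1="mu1", name2="mu2"):
    """Conclusion from a CI (L, U) for a difference name1 − name2, assuming L <= U."""
    if lower > 0:   # both boundaries positive
        return f"{name1} > {name2}"
    if upper < 0:   # both boundaries negative
        return f"{name1} < {name2}"
    # L <= 0 <= U: the interval contains 0
    return f"no significant difference between {name1} and {name2}"


print(interpret_difference(0.3729, 0.8271))
print(interpret_difference(-0.4189, 0.2989))
```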

Example 1. A random sample of 45 electric fans of Brand A showed a mean life span of 4.11 years with a standard deviation of 0.55 years, while a random sample of 50 electric fans of Brand B had a mean life span of 3.51 years with a standard deviation of 0.23 years. Construct a 99% confidence interval for the true difference between the means.

Given: 𝐶𝐼 = 99% 𝑜𝑟 0.99


𝛼 = 1 − 𝐶𝐼 = 1 − 0.99 = 0.01
Brand A Brand B
𝑛1 = 45 𝑛2 = 50
𝑥̅1 = 4.11 𝑥̅2 = 3.51
𝑠1 = 0.55 𝑠2 = 0.23

Solution: Since the population variances $\sigma_1^2$ and $\sigma_2^2$ are unknown and $n_1 > 30$, $n_2 > 30$, we use Case 2 with $z_{\alpha/2}$, where $z_{\alpha/2} = z_{0.01/2} = z_{0.005}$.

The cumulative area to the left of $z_{0.005}$ is $1 - 0.005 = 0.995$. There is no entry of exactly 0.995 in the z-table, so we take the two nearest areas, 0.9949 and 0.9951, whose z-values are 2.57 and 2.58, respectively, and average them:

$$\frac{2.57 + 2.58}{2} = 2.575$$

Thus, the z-value of $z_{0.005}$ is 2.575.

Thus, the 99% confidence interval for 𝝁𝟏 − 𝝁𝟐 is given by:

$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (4.11 - 3.51) \pm (2.575)\sqrt{\frac{0.55^2}{45} + \frac{0.23^2}{50}},$$ that is,

$$(4.11 - 3.51) - (2.575)\sqrt{\frac{0.55^2}{45} + \frac{0.23^2}{50}} < \mu_1 - \mu_2 < (4.11 - 3.51) + (2.575)\sqrt{\frac{0.55^2}{45} + \frac{0.23^2}{50}}$$

$$0.6 - (2.575)\sqrt{\frac{0.55^2}{45} + \frac{0.23^2}{50}} < \mu_1 - \mu_2 < 0.6 + (2.575)\sqrt{\frac{0.55^2}{45} + \frac{0.23^2}{50}}$$

$$0.3729 < \mu_1 - \mu_2 < 0.8271$$

(0.3729, 0.8271)

Conclusion: Since both the lower and upper boundaries are positive, we are 99% confident that 𝜇1 − 𝜇2 > 0, that is, 𝜇1 > 𝜇2. On average, Brand A electric fans have a longer life span than Brand B electric fans.
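A Python sketch of the Case 2 computation in Example 1, using the table value $z_{0.005} = 2.575$ from the solution so that the output matches the worked answer.

```python
from math import sqrt

# Example 1: Brand A (sample 1) and Brand B (sample 2)
n1, x1, s1 = 45, 4.11, 0.55
n2, x2, s2 = 50, 3.51, 0.23
z_crit = 2.575   # z_0.005, the table value used in the solution

se = sqrt(s1**2 / n1 + s2**2 / n2)   # standard error of x̄1 − x̄2 (Case 2)
diff = x1 - x2
lower, upper = diff - z_crit * se, diff + z_crit * se
print(f"({lower:.4f}, {upper:.4f})")   # (0.3729, 0.8271)
```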

Example 2. At a certain machinery shop, two machines are used to produce metal rods. A random sample of 11 rods from machine 1 showed a mean length of 5.95 inches with a variance of 0.18 square inches, while a random sample of 15 rods from machine 2 showed a mean length of 6.01 inches with a variance of 0.2 square inches. Construct a 95% confidence interval for the difference between the mean lengths, assuming that the populations are approximately normally distributed with equal population variances.

Given:
𝐶𝐼 = 95% 𝑜𝑟 0.95
𝛼 = 1 − 𝐶𝐼 = 1 − 0.95 = 0.05

Machine 1 Machine 2
𝑛1 = 11 𝑛2 = 15
𝑥̅1 = 5.95 𝑥̅2 = 6.01
𝑠1² = 0.18 𝑠2² = 0.2

Solution: Since the two random samples are independent, with $\sigma_1^2 = \sigma_2^2$ but unknown, and $n_1 < 30$ and $n_2 < 30$, we use Case 3, that is,
$$(\bar{x}_1 - \bar{x}_2) \pm t_{(\alpha/2,v)}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}},$$

where
$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{v}} = \sqrt{\frac{(11 - 1)(0.18) + (15 - 1)(0.2)}{24}} = 0.438$$


and
$$t_{(\alpha/2,v)} = t_{(\alpha/2,\,n_1+n_2-2)} = t_{(0.05/2,\,11+15-2)} = t_{(0.025,24)}.$$
Referring to the t-table, the t-value $t_{(0.025,24)}$ is 2.064.

Thus, the 95% confidence interval for 𝝁𝟏 − 𝝁𝟐 is given by:

$$(\bar{x}_1 - \bar{x}_2) - t_{(\alpha/2,v)}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + t_{(\alpha/2,v)}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

$$(5.95 - 6.01) - (2.064)(0.438)\sqrt{\frac{1}{11} + \frac{1}{15}} < \mu_1 - \mu_2 < (5.95 - 6.01) + (2.064)(0.438)\sqrt{\frac{1}{11} + \frac{1}{15}}$$

$$(-0.06) - (2.064)(0.438)\sqrt{\frac{1}{11} + \frac{1}{15}} < \mu_1 - \mu_2 < (-0.06) + (2.064)(0.438)\sqrt{\frac{1}{11} + \frac{1}{15}}$$

$$-0.4189 < \mu_1 - \mu_2 < 0.2989$$

(−0.4189, 0.2989)

Conclusion: Since the lower boundary is negative and the upper boundary is positive, the interval for 𝜇1 − 𝜇2 contains 0. Thus, 𝜇1 − 𝜇2 = 0 is a plausible value, and we conclude that there is no significant difference between the mean lengths of the metal rods produced by the two machines.
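A Python sketch of the pooled (Case 3) interval in Example 2. The pooled standard deviation is kept unrounded here (about 0.4378 instead of 0.438), so the bounds may differ from the worked answer in the fourth decimal place; the t-value 2.064 is the table value used above.

```python
from math import sqrt

# Example 2: machine 1 and machine 2 (sample variances are given directly)
n1, x1, var1 = 11, 5.95, 0.18
n2, x2, var2 = 15, 6.01, 0.20
v = n1 + n2 - 2   # degrees of freedom, 24
t_crit = 2.064    # t_(0.025, 24) from the t-table

s_p = sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / v)   # pooled standard deviation
margin = t_crit * s_p * sqrt(1 / n1 + 1 / n2)
diff = x1 - x2
lower, upper = diff - margin, diff + margin
print(f"s_p = {s_p:.4f}, CI = ({lower:.4f}, {upper:.4f})")
```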

5.4 Confidence Interval for Paired Observations 𝝁𝟏 − 𝝁𝟐 = 𝝁𝒅
Two observations are considered paired if there is some relationship between them, for example, when both observations are taken from the same person or object. The data layout for paired observations is:
Pair | 1 | 2 | … | n
𝑥1 | 𝑥11 | 𝑥12 | … | 𝑥1𝑛
𝑥2 | 𝑥21 | 𝑥22 | … | 𝑥2𝑛
𝑑𝑖 | 𝑑1 = 𝑥11 − 𝑥21 | 𝑑2 = 𝑥12 − 𝑥22 | … | 𝑑𝑛 = 𝑥1𝑛 − 𝑥2𝑛

Steps in constructing a confidence interval estimate for paired observations:

1. Take the difference for each pair, that is, 𝑑𝑖 = 𝑥1𝑖 − 𝑥2𝑖 for all 𝑖 = 1, 2, … , 𝑛, where 𝑛 is the number of pairs.
2. Get the mean and standard deviation of the 𝑑𝑖's, where
$$\bar{d} = \frac{\sum_{i=1}^{n} d_i}{n}; \qquad s_d = \sqrt{\frac{n\sum_{i=1}^{n} d_i^2 - \left(\sum_{i=1}^{n} d_i\right)^2}{n(n-1)}}$$
3. Compute the confidence interval using the formula
$$\bar{d} \pm t_{(\alpha/2,v)}\left(\frac{s_d}{\sqrt{n}}\right),$$ which is equivalent to
$$\bar{d} - t_{(\alpha/2,v)}\left(\frac{s_d}{\sqrt{n}}\right) < \mu_d < \bar{d} + t_{(\alpha/2,v)}\left(\frac{s_d}{\sqrt{n}}\right), \quad \text{with } v = n - 1$$
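Steps 1 and 2 can be sketched in Python as below. The before/after values are hypothetical and serve only to check that the computational formula for $s_d$ in Step 2 agrees with the usual $n - 1$ standard deviation from `statistics.stdev`; `paired_summary` is an illustrative name.

```python
from math import sqrt
from statistics import stdev


def paired_summary(x1, x2):
    """Steps 1-2: differences d_i = x1_i − x2_i, their mean d̄, and s_d."""
    if len(x1) != len(x2):
        raise ValueError("paired samples must have the same length")
    d = [a - b for a, b in zip(x1, x2)]
    n = len(d)
    d_bar = sum(d) / n
    # Computational formula for s_d from Step 2
    s_d = sqrt((n * sum(di**2 for di in d) - sum(d) ** 2) / (n * (n - 1)))
    return d, d_bar, s_d


# Hypothetical before/after measurements, for illustration only
before = [12.0, 15.0, 11.0, 14.0]
after = [10.0, 14.0, 11.0, 11.0]
d, d_bar, s_d = paired_summary(before, after)
print(d, d_bar, round(s_d, 4), round(stdev(d), 4))   # both s_d values agree
```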

Example 1. The weights in kilograms of five (5) women who took a new diet pill were recorded before and after taking the pill for a 2-week period. Construct a 99% confidence interval for the mean difference in the weights, assuming the distribution of weight to be approximately normal. The data are recorded below.
Women 1 2 3 4 5
Weight before (𝑥1 ) 58.5 60.3 61.7 69.0 64.0
Weight after (𝑥2 ) 60.0 54.9 58.1 62.1 58.5

Given: 𝑛 = 5,
𝐶𝐼 = 99% 𝑜𝑟 0.99
𝛼 = 1 − 0.99 = 0.01,
𝑣 =𝑛−1=5−1=4
$t_{(\alpha/2,v)} = t_{(\alpha/2,\,n-1)} = t_{(0.01/2,\,5-1)} = t_{(0.005,4)}$

The t-value of 𝑡(0.005,4) is 4.604.

Solution: Since we are dealing with paired observations, we follow the suggested steps.
1. Compute 𝑑𝑖 = 𝑥1𝑖 − 𝑥2𝑖, that is, 𝑑𝑖 = weight before minus weight after. The differences are shown in the last row of the table.
Women 1 2 3 4 5
Weight before (𝑥1 ) 58.5 60.3 61.7 69.0 64.0
Weight after (𝑥2 ) 60.0 54.9 58.1 62.1 58.5
𝑑𝑖 = 𝑥1𝑖 − 𝑥2𝑖 -1.5 5.4 3.6 6.9 5.5

2. The mean and standard deviation of the differences are obtained as follows:
$$\bar{d} = \frac{\sum_{i=1}^{n} d_i}{n} = \frac{-1.5 + 5.4 + 3.6 + 6.9 + 5.5}{5} = \frac{19.9}{5} = 3.98$$
$$s_d = \sqrt{\frac{n\sum_{i=1}^{n} d_i^2 - \left(\sum_{i=1}^{n} d_i\right)^2}{n(n-1)}} = \sqrt{\frac{5[(-1.5)^2 + (5.4)^2 + (3.6)^2 + (6.9)^2 + (5.5)^2] - (-1.5 + 5.4 + 3.6 + 6.9 + 5.5)^2}{5(5-1)}}$$
$$= \sqrt{\frac{5(122.23) - (19.9)^2}{5(4)}} = 3.28$$

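3. The 99% confidence interval for 𝜇𝑑 is given by
$$\bar{d} \pm t_{(\alpha/2,v)}\left(\frac{s_d}{\sqrt{n}}\right) = 3.98 \pm (4.604)\left(\frac{3.28}{\sqrt{5}}\right),$$ that is,
$$3.98 - (4.604)\left(\frac{3.28}{\sqrt{5}}\right) < \mu_d < 3.98 + (4.604)\left(\frac{3.28}{\sqrt{5}}\right)$$
$$-2.7734 < \mu_d < 10.7334$$
(−2.7734, 10.7334)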
Conclusion: Since the lower boundary is negative and the upper boundary is positive, the interval contains 0. Thus, 𝜇𝑑 = 𝜇1 − 𝜇2 = 0 is a plausible value, and at the 99% confidence level the data do not show a significant mean change in weight. In other words, the data do not provide evidence that the diet pill is effective.
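The complete calculation for Example 1 in Python, as a sketch. The differences are taken as weight before minus weight after, $s_d$ is not rounded to 3.28, and $t_{(0.005,4)} = 4.604$ is the table value; because of the unrounded $s_d$, the bounds come out near (−2.7730, 10.7330) rather than exactly matching the worked answer.

```python
from math import sqrt
from statistics import mean, stdev

# Example 1 data (kg)
before = [58.5, 60.3, 61.7, 69.0, 64.0]
after = [60.0, 54.9, 58.1, 62.1, 58.5]

d = [b - a for b, a in zip(before, after)]   # d_i = weight before − weight after
n = len(d)
d_bar = mean(d)     # mean difference
s_d = stdev(d)      # standard deviation of the differences (divisor n − 1)
t_crit = 4.604      # t_(0.005, 4) from the t-table

margin = t_crit * s_d / sqrt(n)
lower, upper = d_bar - margin, d_bar + margin
print(f"d̄ = {d_bar:.2f}, s_d = {s_d:.4f}, CI = ({lower:.4f}, {upper:.4f})")
```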

5.5 Confidence Interval for a Population Proportion
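➢ A 𝟏𝟎𝟎(𝟏 − 𝜶)% confidence interval for a population proportion is given by
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}},$$ which is equivalent to
$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} < P < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}},$$
where $\hat{p} = \frac{x}{n}$ and $\hat{q} = 1 - \hat{p}$.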

Example 1. A survey of 1000 households shows that 54% own personal computers at home. Find a 95% confidence interval for the true proportion of households who own a computer.

Given: 𝑛 = 1000,
𝑝̂ = 54% = 0.54,
𝑞̂ = 1 − 𝑝̂ = 1 − 0.54 = 0.46,
𝐶𝐼 = 95% 𝑜𝑟 0.95
𝛼 = 1 − 𝐶𝐼 = 1 − 0.95 = 0.05
$z_{\alpha/2} = z_{0.05/2} = z_{0.025}$. The cumulative area to the left of $z_{0.025}$ is $1 - 0.025 = 0.975$. Looking up this area in the z-table, the z-value of $z_{0.025}$ is 1.96.

Solution: The 95% confidence interval for the true proportion 𝑃 of households who own a computer is given by:
$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} < P < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
$$0.54 - (1.96)\sqrt{\frac{(0.54)(0.46)}{1000}} < P < 0.54 + (1.96)\sqrt{\frac{(0.54)(0.46)}{1000}}$$
$$0.5091 < P < 0.5709$$

(0.5091, 0.5709)

Conclusion: Thus, the true proportion of households who own a computer is between 50.91% and
57.09%, with a confidence coefficient of 95%.

Example 2. To determine p, the true proportion of underemployed persons who are college graduates, 2500 randomly selected working persons were interviewed, and 500 of them indicated that they are college graduates. Construct a 99% confidence interval for p.

Given: 𝑥 = 500,
𝑛 = 2500,
$\hat{p} = \frac{x}{n} = \frac{500}{2500} = 0.2$,
𝑞̂ = 1 − 𝑝̂ = 1 − 0.2 = 0.8,
𝐶𝐼 = 99% 𝑜𝑟 0.99
𝛼 = 1 − 𝐶𝐼 = 1 − 0.99 = 0.01
$z_{\alpha/2} = z_{0.01/2} = z_{0.005}$. The cumulative area to the left of $z_{0.005}$ is $1 - 0.005 = 0.995$. There is no entry of exactly 0.995 in the z-table, so we take the two nearest areas, 0.9949 and 0.9951, whose z-values are 2.57 and 2.58, respectively, and average them:
$$\frac{2.57 + 2.58}{2} = 2.575$$

Thus, the z-value of $z_{0.005}$ is 2.575.

Solution: The 99% confidence interval for the true proportion 𝑃 of underemployed persons who are college graduates is given by:
$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} < P < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
$$0.20 - (2.575)\sqrt{\frac{(0.2)(0.8)}{2500}} < P < 0.20 + (2.575)\sqrt{\frac{(0.2)(0.8)}{2500}}$$
$$0.1794 < P < 0.2206$$

(0.1794, 0.2206)

Conclusion: Thus, the true proportion of underemployed persons who are college graduates is
between 17.94% and 22.06% with a 99% confidence coefficient.
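Both Section 5.5 examples can be checked with one Python sketch of the formula $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}$. For Example 1, the count $x = 540$ is derived from 54% of 1000 households, and the z-values are the table values used in the solutions; `proportion_ci` is an illustrative name.

```python
from math import sqrt


def proportion_ci(x, n, z_crit):
    """p̂ ∓ z_{α/2} √(p̂q̂/n), with p̂ = x/n and q̂ = 1 − p̂."""
    p_hat = x / n
    q_hat = 1 - p_hat
    margin = z_crit * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin


# Example 1: x = 540 (54% of 1000 households), 95% confidence, z_0.025 = 1.96
print(proportion_ci(540, 1000, 1.96))
# Example 2: 500 of 2500 persons, 99% confidence, z_0.005 = 2.575
print(proportion_ci(500, 2500, 2.575))
```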

5.6 Confidence Interval for the Difference between Two Proportions


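➢ A 𝟏𝟎𝟎(𝟏 − 𝜶)% confidence interval for the difference between two proportions is given by
$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\bar{p}\bar{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)},$$ which is equivalent to
$$(\hat{p}_1 - \hat{p}_2) - z_{\alpha/2}\sqrt{\bar{p}\bar{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} < P_1 - P_2 < (\hat{p}_1 - \hat{p}_2) + z_{\alpha/2}\sqrt{\bar{p}\bar{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)},$$
where $\hat{p}_1 = \frac{x_1}{n_1}$; $\hat{p}_2 = \frac{x_2}{n_2}$; $\bar{p} = \frac{x_1 + x_2}{n_1 + n_2}$; $\bar{q} = 1 - \bar{p}$.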

Example 1. A survey was made to determine the difference in the proportions of men and women who completed a college education. A random sample of men and women aged 20 years and above was surveyed, and the results show that 70 out of 280 women and 92 out of 320 men were college graduates. Construct a 96% confidence interval for the true difference in proportions.

Given:
Men | Women
$x_1 = 92$ | $x_2 = 70$
$n_1 = 320$ | $n_2 = 280$
$\hat{p}_1 = \frac{x_1}{n_1} = \frac{92}{320} = 0.2875$ | $\hat{p}_2 = \frac{x_2}{n_2} = \frac{70}{280} = 0.25$

$\bar{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{92 + 70}{320 + 280} = 0.27$

𝑞̅ = 1 − 𝑝̅ = 1 − 0.27 = 0.73

𝐶𝐼 = 96% 𝑜𝑟 0.96
𝛼 = 1 − 𝐶𝐼 = 1 − 0.96 = 0.04
$z_{\alpha/2} = z_{0.04/2} = z_{0.02}$. The cumulative area to the left of $z_{0.02}$ is $1 - 0.02 = 0.98$. There is no entry of exactly 0.98 in the z-table, so we take the two nearest areas, 0.9798 and 0.9803, whose z-values are 2.05 and 2.06, respectively, and average them:
$$\frac{2.05 + 2.06}{2} = 2.055$$

Thus, the z-value of $z_{0.02}$ is 2.055.

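Solution: The 96% confidence interval for 𝑃1 − 𝑃2 is given by:
$$(\hat{p}_1 - \hat{p}_2) - z_{\alpha/2}\sqrt{\bar{p}\bar{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} < P_1 - P_2 < (\hat{p}_1 - \hat{p}_2) + z_{\alpha/2}\sqrt{\bar{p}\bar{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
$$(0.2875 - 0.25) - (2.055)\sqrt{(0.27)(0.73)\left(\frac{1}{320} + \frac{1}{280}\right)} < P_1 - P_2 < (0.2875 - 0.25) + (2.055)\sqrt{(0.27)(0.73)\left(\frac{1}{320} + \frac{1}{280}\right)}$$
$$0.0375 - (2.055)\sqrt{(0.27)(0.73)\left(\frac{1}{320} + \frac{1}{280}\right)} < P_1 - P_2 < 0.0375 + (2.055)\sqrt{(0.27)(0.73)\left(\frac{1}{320} + \frac{1}{280}\right)}$$
$$-0.0372 < P_1 - P_2 < 0.1122$$

(−0.0372, 0.1122)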

Conclusion: Since the lower boundary is negative and the upper boundary is positive, the interval contains 0. Thus, we conclude that there is no significant difference between the proportion of men and the proportion of women who completed a college education.
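A Python sketch of the interval in this section. With `pooled=True` it follows the formula given above, with $\bar{p}\bar{q}(1/n_1 + 1/n_2)$ under the square root. Many other textbooks build the interval for $P_1 - P_2$ from the unpooled standard error $\sqrt{\hat{p}_1\hat{q}_1/n_1 + \hat{p}_2\hat{q}_2/n_2}$ and reserve $\bar{p}$ for hypothesis tests, so the sketch also offers `pooled=False` for comparison; for this example both versions contain 0. The function name is illustrative.

```python
from math import sqrt


def two_proportion_ci(x1, n1, x2, n2, z_crit, pooled=True):
    """CI for P1 − P2.

    pooled=True follows the formula in this section: p̄q̄(1/n1 + 1/n2).
    pooled=False uses the unpooled standard error √(p̂1q̂1/n1 + p̂2q̂2/n2).
    """
    p1, p2 = x1 / n1, x2 / n2
    if pooled:
        p_bar = (x1 + x2) / (n1 + n2)
        se = sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    else:
        se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_crit * se, diff + z_crit * se


# Example 1: men (92 of 320) versus women (70 of 280), z_0.02 = 2.055
print(two_proportion_ci(92, 320, 70, 280, 2.055))
print(two_proportion_ci(92, 320, 70, 280, 2.055, pooled=False))
```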

5.7 Sample Size Determination
Formula for determining the sample size when estimating the population mean:
$$n = \frac{z_{\alpha/2}^2\,\sigma^2}{E^2}$$
where E is the maximum allowable error (margin of error).

Note: Always round the result up to the next higher integer.

Example 1. What sample size is needed to be 95% confident of being correct within ±10? A pilot study suggested that the standard deviation 𝜎 is 35.

Given: 𝐶𝐼 = 95% = 0.95,
𝛼 = 1 − 𝐶𝐼 = 1 − 0.95 = 0.05
$z_{\alpha/2} = z_{0.05/2} = z_{0.025} = 1.96$
𝜎 = 35
𝐸 = 10

Solution:
$$n = \frac{z_{\alpha/2}^2\,\sigma^2}{E^2} = \frac{(1.96)^2(35)^2}{(10)^2} = 47.0596 \approx 48$$
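The sample-size formula with its round-up rule, as a Python sketch using `math.ceil`. The function name is hypothetical; note that in general floating-point error could push an exactly integral result just above an integer, which does not happen for the values here.

```python
from math import ceil


def sample_size_mean(z_crit, sigma, E):
    """n = z_{α/2}² σ² / E², rounded up to the next integer."""
    return ceil((z_crit * sigma / E) ** 2)


print(sample_size_mean(1.96, 35, 10))   # Example 1: 47.0596 rounds up to 48
```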

Formula for determining the sample size when estimating the population proportion:
$$n = \frac{z_{\alpha/2}^2\,p(1 - p)}{E^2}$$

Note: Always round the result up to the next higher integer.

Example 2. A pollster is hired to determine the percentage of voters favoring the opposition party's presidential candidate. If we require 99% confidence that the estimated value is within two percentage points of the true percentage, how large should the random sample be?

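Given: 𝐶𝐼 = 99% = 0.99,
𝛼 = 1 − 𝐶𝐼 = 1 − 0.99 = 0.01
$z_{\alpha/2} = z_{0.01/2} = z_{0.005} = 2.575$
𝑝 = 0.5 (no prior estimate of 𝑝 is available, so we use 0.5, the value that maximizes 𝑝(1 − 𝑝) and therefore gives the largest, most conservative sample size)
𝐸 = 2% = 0.02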

Solution:
$$n = \frac{z_{\alpha/2}^2\,p(1 - p)}{E^2} = \frac{(2.575)^2(0.5)(1 - 0.5)}{(0.02)^2} = 4144.140625 \approx 4{,}145$$
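The corresponding Python sketch for a proportion, with $p = 0.5$ as the default since $p(1 - p)$ is largest at $p = 0.5$; supplying any other prior estimate of $p$ gives a smaller $n$. The function name is hypothetical.

```python
from math import ceil


def sample_size_proportion(z_crit, E, p=0.5):
    """n = z_{α/2}² p(1 − p) / E², rounded up; p = 0.5 is the conservative default."""
    return ceil(z_crit**2 * p * (1 - p) / E**2)


print(sample_size_proportion(2.575, 0.02))          # Example 2: 4145
print(sample_size_proportion(2.575, 0.02, p=0.3))   # a smaller n for p = 0.3
```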
