Chapter 5 Estimation
CHAPTER 5
Statistical Inference: Estimation
Below are the topics and objectives to consider as you go through this chapter.
Topics:
5.1 Preliminary Concepts
5.2 Confidence Interval for the Population Mean
5.3 Confidence Interval for the Difference between Two Population Means 𝜇1 − 𝜇2
5.4 Confidence Interval for Paired Observations 𝜇𝑑 = 𝜇1 − 𝜇2
5.5 Confidence Interval for a Population Proportion
5.6 Confidence Interval for the Difference between Two Proportions
5.7 Sample Size Determination
Objectives:
(1) Illustrate the preliminary concepts of estimation;
(2) Know and differentiate the two types of estimation (point and interval estimation);
(3) Identify point estimator for the population mean;
(4) Compute for the point estimate of the population mean;
(5) Illustrate and construct a t-distribution;
(6) Compute for the confidence interval estimate and draw conclusion based on the
appropriate form of the estimator for the population mean;
(7) Compute for the confidence interval estimate and draw conclusion based on the
appropriate form of the estimator for the difference between two population means (two
independent samples and paired observations);
(8) Identify point estimator for the population proportion;
(9) Compute for the point estimate of the population proportion;
(10) Compute for the confidence interval estimate and draw conclusion based on the
appropriate form of the estimator for the population proportion and the difference between
two proportions;
(11) Compute for the length of the confidence interval; and
(12) Solve problems involving sample size determination.
Estimation is concerned with finding a value or range of values for an unknown parameter.
An estimator of a parameter is a rule or formula for computing an estimate using the sample data.
• Good properties of an estimator
Accuracy measures the closeness of an estimate to its true value.
✓ To measure accuracy, bias is used.
✓ Bias is the difference between the expected value of the estimator and the
parameter being estimated, that is,
𝑏𝑖𝑎𝑠(𝜃̂) = 𝐸(𝜃̂) − 𝜃
✓ An estimator with its bias equal to zero is said to be an unbiased estimator
of the parameter.
Precision measures the closeness of the different possible values of the estimator
to each other. The precision of an estimator can be measured by its variance or by
its standard error which is the square root of the variance.
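Population parameter (left) and the corresponding sample point estimator (right):
Mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$ ; $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Proportion: $P = \frac{X}{N}$ ; $p = \frac{x}{n}$
Variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$ ; $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$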
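The bias definition above can be checked exactly on a small case. The sketch below is only an illustration: it uses a hypothetical toy population (not taken from this chapter) and assumes samples of size 2 drawn with replacement, so that every ordered sample is equally likely. It compares the expected value of the sample variance with divisor n − 1 against the same sum of squares divided by n.

```python
from itertools import product
from statistics import mean

# Hypothetical toy population, chosen only for illustration.
population = [2, 4, 6, 8]
N = len(population)
mu = mean(population)
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance

n = 2  # sample size (assumption: sampling with replacement)
samples = list(product(population, repeat=n))  # all N**n equally likely samples

def sample_var(sample, divisor):
    """Sum of squared deviations from the sample mean, over the given divisor."""
    xbar = sum(sample) / len(sample)
    return sum((x - xbar) ** 2 for x in sample) / divisor

# Expected values taken over every possible sample.
expected_xbar = mean(sum(s) / n for s in samples)
expected_s2 = mean(sample_var(s, n - 1) for s in samples)
expected_biased = mean(sample_var(s, n) for s in samples)

bias_xbar = expected_xbar - mu          # bias of the sample mean
bias_s2 = expected_s2 - sigma2          # bias of s^2 with divisor n - 1
bias_biased = expected_biased - sigma2  # bias when dividing by n instead

print(bias_xbar, bias_s2, bias_biased)
```

Under these assumptions the first two biases are zero, while dividing by n underestimates the variance by σ²/n, which is one way to motivate the n − 1 divisor in s².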
2. Interval Estimate - an interval that specifies the lower and upper boundaries for
the unknown parameter. In interval estimation, a certain degree of confidence can
be attached to the interval estimate. This degree of confidence is normally
expressed as a percentage. A high percentage means that there is a high level
of certainty in the estimate being used. This degree of certainty is called the level
of confidence or the confidence coefficient, and is denoted by 𝟏 − 𝜶. The bounds of
a confidence interval are called confidence limits. The objective of interval estimation
is to find the values of the lower (L) and upper (U) boundaries that minimize the
width of the interval for a given confidence coefficient.
Note: Case 1 is used when the value of 𝝈 is given in the problem. In practical
situations, 𝝈 is unknown so case 2 or case 3 is used, depending on whether 𝒏 ≥ 𝟑𝟎 or
𝒏 < 𝟑𝟎
Given: 𝑛 = 20,
𝑥̅ = 8.8,
𝜎 = 1.5,
𝐶𝐼 = 95% 𝑜𝑟 0.95
𝛼 = 1 − 𝐶𝐼 = 1 − 0.95 = 0.05
Solution: Since the population standard deviation 𝝈 is given, we use Case 1 with $Z_{\alpha/2}$.
$$8.8 - (1.96)\left(\frac{1.5}{\sqrt{20}}\right) < \mu < 8.8 + (1.96)\left(\frac{1.5}{\sqrt{20}}\right)$$
Conclusion: Thus, the 95% confidence interval for 𝝁 is (8.1426, 9.4574). The true population
mean length of the corn shipped is contained in the interval (8.1426, 9.4574) with a confidence
coefficient of 95%.
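The arithmetic of this example can be reproduced with a few lines of Python. This is a minimal sketch: the function name `z_interval` is made up for illustration, and the critical value 1.96 is passed in as read from the z-table rather than computed.

```python
from math import sqrt

def z_interval(xbar, sigma, n, z):
    """Return (lower, upper) for xbar +/- z * sigma / sqrt(n)."""
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin

# Values from the example: n = 20, sample mean 8.8, sigma = 1.5, z = 1.96.
lower, upper = z_interval(xbar=8.8, sigma=1.5, n=20, z=1.96)
print(f"({lower:.4f}, {upper:.4f})")  # (8.1426, 9.4574)
```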
Second Semester 4 JOBELLE S. SIMBLANTE
Statistics and Probability CMU Mathematics Department
Example 2. The average CMUCAT score of a random sample of 35 freshmen of CAS is 89.66
with a standard deviation of 18.28. Construct a 96% confidence interval for the true mean
CMUCAT score of CAS freshmen.
Given: 𝑛 = 35,
𝑥̅ = 89.66,
𝑠 = 18.28,
𝐶𝐼 = 96% 𝑜𝑟 0.96
𝛼 = 1 − 𝐶𝐼 = 1 − 0.96 = 0.04
$$89.66 - (2.055)\left(\frac{18.28}{\sqrt{35}}\right) < \mu < 89.66 + (2.055)\left(\frac{18.28}{\sqrt{35}}\right)$$
83.3103 < 𝜇 < 96.0097
(83.3103, 96.0097)
Conclusion: Thus, the 96% confidence interval for 𝝁, is (83.3103, 96.0097). The true mean
CMUCAT score is contained in the interval (83.3103, 96.0097) with a confidence coefficient of
96%.
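The critical value 2.055 used above comes from reading the z-table. As a hedged cross-check, the sketch below computes the same quantile with `statistics.NormalDist` from the Python standard library and compares the two resulting intervals. The helper name `large_sample_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

confidence = 0.96
alpha = 1 - confidence

# Quantile with area 1 - alpha/2 = 0.98 to its left, under the standard normal.
z_exact = NormalDist().inv_cdf(1 - alpha / 2)
z_table = 2.055  # value read from the z-table in the example

def large_sample_interval(xbar, s, n, z):
    """Return (lower, upper) for xbar +/- z * s / sqrt(n)."""
    margin = z * s / sqrt(n)
    return xbar - margin, xbar + margin

ci_table = large_sample_interval(89.66, 18.28, 35, z_table)
ci_exact = large_sample_interval(89.66, 18.28, 35, z_exact)

print(f"z from inv_cdf: {z_exact:.4f}")
print("table z :", tuple(round(b, 4) for b in ci_table))
print("exact z :", tuple(round(b, 4) for b in ci_exact))
```

The computed quantile is slightly smaller than the table reading, so the second interval is marginally narrower; the difference does not change the conclusion.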
Example 3. The contents of a random sample of 10 jugs of honey were recorded as follows: 10.2,
9.6, 10.3, 10.3, 10.4, 9.8, 9.7, 10.5, 10.6, and 9.8 liters. Assuming the distribution of contents to
be normal, construct a 95% confidence interval for the mean content of the jugs of honey.
Given: 𝑛 = 10,
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{10.2 + 9.6 + 10.3 + 10.3 + 10.4 + 9.8 + 9.7 + 10.5 + 10.6 + 9.8}{10} = 10.12,$$
$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2} = \sqrt{\frac{(10.2-10.12)^2 + (9.6-10.12)^2 + \cdots + (9.8-10.12)^2}{10-1}} = 0.361,$$
𝐶𝐼 = 95% 𝑜𝑟 0.95
𝛼 = 1 − 𝐶𝐼 = 1 − 0.95 = 0.05
Solution: Since 𝝈 is unknown and 𝑛 < 30, we use Case 3 with $t_{(\alpha/2,\,v)}$,
where $t_{(\alpha/2,\,v)} = t_{(\alpha/2,\,n-1)} = t_{(0.05/2,\,10-1)} = t_{(0.025,\,9)}$. Find the value of t by referring to the t-table.
$$10.12 - (2.262)\left(\frac{0.361}{\sqrt{10}}\right) < \mu < 10.12 + (2.262)\left(\frac{0.361}{\sqrt{10}}\right)$$
(9.8618, 10.3782)
Conclusion: Thus, the 95% confidence interval for 𝝁, is (9.8618, 10.3782). The true population
mean of the content of the jugs of honey is contained in the interval (9.8618, 10.3782) with a
confidence coefficient of 95%.
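The honey example can be recomputed from the raw data as a check. This sketch assumes the table value t(0.025, 9) = 2.262 used above and relies on `statistics.stdev`, which uses the n − 1 divisor. Because it keeps s unrounded (about 0.3615 rather than 0.361), the bounds may differ from the hand computation in the fourth decimal place.

```python
from math import sqrt
from statistics import mean, stdev

# Contents of the 10 jugs of honey, in liters (from the example).
contents = [10.2, 9.6, 10.3, 10.3, 10.4, 9.8, 9.7, 10.5, 10.6, 9.8]

n = len(contents)
xbar = mean(contents)
s = stdev(contents)  # sample standard deviation, divisor n - 1

t_crit = 2.262  # t(0.025, 9), read from the t-table
margin = t_crit * s / sqrt(n)
lower, upper = xbar - margin, xbar + margin

print(f"xbar = {xbar:.2f}, s = {s:.3f}")
print(f"({lower:.4f}, {upper:.4f})")
```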
➢ To construct a 𝟏𝟎𝟎(𝟏 − 𝜶)% confidence interval for the difference between two
population means 𝝁𝟏 − 𝝁𝟐 , a sample is drawn from each population and the following
notation is used:
Note: If the population variances 𝝈𝟏 𝟐 and 𝝈𝟐 𝟐 for the two populations are given, then case 1
should be used regardless of the sample size.
To interpret the confidence interval, we take note of the signs of the lower bound L and
upper bound U. When the lower and upper boundaries of the confidence interval are both positive
numbers, then 𝜇1 − 𝜇2 is also positive and we may conclude that 𝜇1 > 𝜇2 . Similarly, when the lower
and upper boundaries of the confidence interval are both negative numbers, then 𝜇1 − 𝜇2 is also
negative and we may conclude that 𝜇1 < 𝜇2 . When the lower boundary is negative and the upper
boundary is positive, the interval contains the number 0, so 𝜇1 − 𝜇2 = 0 is a plausible value and we
may conclude that there is no significant difference between the two means, that is, 𝜇1 ≈ 𝜇2 .
This interpretation can be summarized in the table below.
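The sign rule described above is simple enough to express as a small function. The sketch below is illustrative only; the function name and the wording of the returned strings are not from the chapter. The two sample calls use intervals that appear in this section's worked examples.

```python
def interpret_difference(lower, upper):
    """Interpret a confidence interval (lower, upper) for mu1 - mu2 by its signs."""
    if lower > 0 and upper > 0:
        return "mu1 > mu2"
    if lower < 0 and upper < 0:
        return "mu1 < mu2"
    # Otherwise the interval contains 0.
    return "no significant difference (interval contains 0)"

print(interpret_difference(0.3729, 0.8271))   # both bounds positive
print(interpret_difference(-0.4189, 0.2989))  # interval contains 0
```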
Solution: Since the population variances 𝜎1 2 and 𝜎2 2 are unknown and 𝑛1 > 30, 𝑛2 > 30, we
use Case 2 with $Z_{\alpha/2}$,
where $Z_{\alpha/2} = Z_{0.01/2} = Z_{0.005}$. The area to the left of this z-value is 1 − 0.005 = 0.995.
Find the z-value of this area/probability by referring to the z-table.
(0.3729, 0.8271)
Conclusion: Since the lower and upper boundaries are both positive, we are 99% confident that
𝜇1 − 𝜇2 > 0 or 𝜇1 > 𝜇2 . This means that, on the average, Brand A electric fans
have a longer life span than Brand B electric fans.
Example 2. At a certain machinery shop, two machines are used to produce metal rods. A random
sample of 11 rods from machine 1 showed a mean length of 5.95 inches with variance of 0.18
square inches while a random sample of 15 rods from machine 2 showed a mean length of 6.01
inches with variance of 0.2 square inches. Construct a 95% confidence interval for the difference
between the mean lengths, assuming that the populations are approximately normally distributed
with equal population variances.
Given:
𝐶𝐼 = 95% 𝑜𝑟 0.95
𝛼 = 1 − 𝐶𝐼 = 1 − 0.95 = 0.05
Machine 1 Machine 2
𝑛1 = 11 𝑛2 = 15
𝑥̅1 = 5.95 𝑥̅2 = 6.01
𝑠1 2 = 0.18 𝑠2 2 = 0.2
Solution: Since the two random samples are independent, where 𝜎1 2 = 𝜎2 2 but unknown and
𝑛1 < 30 & 𝑛2 < 30, we use Case 3, that is,
$$(\bar{x}_1 - \bar{x}_2) \pm t_{(\alpha/2,\,v)}\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}},$$
where
$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{v}} = 0.438$$
and $t_{(\alpha/2,\,v)} = t_{(\alpha/2,\,n_1+n_2-2)} = t_{(0.05/2,\,11+15-2)} = t_{(0.025,\,24)}$. Find the value of t by referring to the t-table.
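$$(\bar{x}_1 - \bar{x}_2) - t_{(\alpha/2,\,v)}\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + t_{(\alpha/2,\,v)}\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
$$(5.95 - 6.01) - (2.064)(0.438)\sqrt{\frac{1}{11} + \frac{1}{15}} < \mu_1 - \mu_2 < (5.95 - 6.01) + (2.064)(0.438)\sqrt{\frac{1}{11} + \frac{1}{15}}$$
$$(-0.06) - (2.064)(0.438)\sqrt{\frac{1}{11} + \frac{1}{15}} < \mu_1 - \mu_2 < (-0.06) + (2.064)(0.438)\sqrt{\frac{1}{11} + \frac{1}{15}}$$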
(−0.4189, 0.2989)
Conclusion: Since the lower boundary is negative and the upper boundary is positive, the mean
difference 𝜇1 − 𝜇2 is contained in an interval which includes 0. Thus, it is safe to assume that
𝜇1 − 𝜇2 ≈ 0 or 𝜇1 ≈ 𝜇2 and conclude that on the average, the two machines produce metal rods
of the same length.
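The pooled-variance computation in this example can be traced step by step in Python. This is a sketch under the example's own assumptions (independent samples, equal population variances), with the critical value t(0.025, 24) = 2.064 taken from the t-table. It keeps the pooled standard deviation unrounded, so the bounds may differ from the hand computation in the fourth decimal place.

```python
from math import sqrt

# Summary statistics from the example.
n1, xbar1, var1 = 11, 5.95, 0.18  # machine 1
n2, xbar2, var2 = 15, 6.01, 0.20  # machine 2

df = n1 + n2 - 2  # degrees of freedom v
sp = sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / df)  # pooled standard deviation

t_crit = 2.064  # t(0.025, 24), read from the t-table
margin = t_crit * sp * sqrt(1 / n1 + 1 / n2)

diff = xbar1 - xbar2
lower, upper = diff - margin, diff + margin

print(f"v = {df}, sp = {sp:.3f}")
print(f"({lower:.4f}, {upper:.4f})")
```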
$$\bar{d} - t_{(\alpha/2,\,v)}\left(\frac{s_d}{\sqrt{n}}\right) < \mu_d < \bar{d} + t_{(\alpha/2,\,v)}\left(\frac{s_d}{\sqrt{n}}\right), \quad \text{with } v = n - 1$$
Example 1. The weights in kilograms of five (5) women who took a new diet pill were recorded
before and after taking the pill for a 2-week period. Construct a 99% confidence interval for the mean
difference in the weights, assuming the distribution of weights to be approximately normal. The
data are recorded below.
Women 1 2 3 4 5
Weight before (𝑥1 ) 58.5 60.3 61.7 69.0 64.0
Weight after (𝑥2 ) 60.0 54.9 58.1 62.1 58.5
Given: 𝑛 = 5,
𝐶𝐼 = 99% 𝑜𝑟 0.99
𝛼 = 1 − 0.99 = 0.01,
𝑣 =𝑛−1=5−1=4
$t_{(\alpha/2,\,v)} = t_{(\alpha/2,\,n-1)} = t_{(0.01/2,\,5-1)} = t_{(0.005,\,4)}$
2. The mean and standard deviation of their difference are obtained as follows:
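$$\bar{d} = \frac{\sum_{i=1}^{n} d_i}{n} = \frac{-1.5 + 5.4 + 3.6 + 6.9 + 5.5}{5} = \frac{19.9}{5} = 3.98$$
$$s_d = \sqrt{\frac{n\sum_{i=1}^{n} d_i^2 - \left(\sum_{i=1}^{n} d_i\right)^2}{n(n-1)}} = \sqrt{\frac{5[(-1.5)^2 + (5.4)^2 + (3.6)^2 + (6.9)^2 + (5.5)^2] - (-1.5 + 5.4 + 3.6 + 6.9 + 5.5)^2}{5(5-1)}}$$
$$= \sqrt{\frac{5(122.23) - (19.9)^2}{5(4)}} = 3.28$$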
$$3.98 - (4.604)\left(\frac{3.28}{\sqrt{5}}\right) < \mu_d < 3.98 + (4.604)\left(\frac{3.28}{\sqrt{5}}\right)$$
−2.7734 < 𝜇𝑑 < 10.7334
(−2.7734, 10.7334)
Conclusion: Since the lower boundary is negative and the upper boundary is positive, the interval
contains 0. Thus, at the 99% confidence level, a mean difference of 𝜇𝑑 = 𝜇1 − 𝜇2 = 0 cannot be
ruled out. In other words, the data do not provide sufficient evidence that the diet pill is effective.
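The paired computation can be reproduced from the before and after weights. The sketch below forms the differences d = x1 − x2 (before minus after), uses `statistics.stdev` for s_d, and takes t(0.005, 4) = 4.604 from the t-table. Since s_d is not rounded to 3.28 here, the bounds may differ slightly from the hand computation in the fourth decimal place.

```python
from math import sqrt
from statistics import mean, stdev

# Weights in kilograms of the five women (from the example).
before = [58.5, 60.3, 61.7, 69.0, 64.0]  # x1
after = [60.0, 54.9, 58.1, 62.1, 58.5]   # x2

d = [x1 - x2 for x1, x2 in zip(before, after)]  # paired differences
n = len(d)
d_bar = mean(d)
s_d = stdev(d)  # standard deviation of the differences, divisor n - 1

t_crit = 4.604  # t(0.005, 4), read from the t-table
margin = t_crit * s_d / sqrt(n)
lower, upper = d_bar - margin, d_bar + margin

print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.2f}")
print(f"({lower:.4f}, {upper:.4f})")
```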
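$$\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} < P < \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}},$$
where $\hat{p} = \frac{x}{n}$ ; $\hat{q} = 1 - \hat{p}$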
Example 1. A survey of 1000 households shows that 54% own personal computers at home. Find
a 95% confidence interval for the true proportion of households who own a computer.
Given: 𝑛 = 1000,
𝑝̂ = 54% = 0.54,
𝑞̂ = 1 − 𝑝̂ = 1 − 0.54 = 0.46,
𝐶𝐼 = 95% 𝑜𝑟 0.95
𝛼 = 1 − 𝐶𝐼 = 1 − 0.95 = 0.05
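$Z_{\alpha/2} = Z_{0.05/2} = Z_{0.025}$. The area to the left of this z-value is 1 − 0.025 = 0.975.
Find the z-value of this area/probability by referring to the z-table.
$$\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} < P < \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
$$0.54 - (1.96)\sqrt{\frac{(0.54)(0.46)}{1000}} < P < 0.54 + (1.96)\sqrt{\frac{(0.54)(0.46)}{1000}}$$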
0.5091 < 𝑃 < 0.5709
(0.5091, 0.5709)
Conclusion: Thus, the true proportion of households who own a computer is between 50.91% and
57.09%, with a confidence coefficient of 95%.
Example 2. To determine p, the true proportion of underemployed persons who are college
graduates, 500 out of 2500 randomly selected working persons who were interviewed indicated that
they are college graduates. Construct a 99% confidence interval for p.
Given: 𝑥 = 500,
𝑛 = 2500,
$\hat{p} = \frac{x}{n} = \frac{500}{2500} = 0.2,$
𝑞̂ = 1 − 𝑝̂ = 1 − 0.2 = 0.8,
𝐶𝐼 = 99% 𝑜𝑟 0.99
𝛼 = 1 − 𝐶𝐼 = 1 − 0.99 = 0.01
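$Z_{\alpha/2} = Z_{0.01/2} = Z_{0.005}$. The area to the left of this z-value is 1 − 0.005 = 0.995.
Find the z-value of this area/probability by referring to the z-table.
$$\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} < P < \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
$$0.20 - (2.575)\sqrt{\frac{(0.2)(0.8)}{2500}} < P < 0.20 + (2.575)\sqrt{\frac{(0.2)(0.8)}{2500}}$$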
0.1794 < 𝑃 < 0.2206
(0.1794, 0.2206)
Conclusion: Thus, the true proportion of underemployed persons who are college graduates is
between 17.94% and 22.06% with a 99% confidence coefficient.
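Both proportion examples follow the same recipe, so they can be checked with one small helper. This is a minimal sketch: the function name `proportion_interval` is hypothetical, and the critical values 1.96 and 2.575 are the table readings used in the two examples.

```python
from math import sqrt

def proportion_interval(p_hat, n, z):
    """Return (lower, upper) for p_hat +/- z * sqrt(p_hat * q_hat / n)."""
    q_hat = 1 - p_hat
    margin = z * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin

# Example 1: 54% of 1000 households, 95% confidence (z = 1.96).
ci1 = proportion_interval(0.54, 1000, 1.96)

# Example 2: 500 out of 2500 persons, 99% confidence (z = 2.575).
ci2 = proportion_interval(500 / 2500, 2500, 2.575)

print(f"({ci1[0]:.4f}, {ci1[1]:.4f})")  # (0.5091, 0.5709)
print(f"({ci2[0]:.4f}, {ci2[1]:.4f})")  # (0.1794, 0.2206)
```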
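$$(\hat{p}_1 - \hat{p}_2) - Z_{\alpha/2}\sqrt{\bar{p}\,\bar{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} < P_1 - P_2 < (\hat{p}_1 - \hat{p}_2) + Z_{\alpha/2}\sqrt{\bar{p}\,\bar{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
where $\hat{p}_1 = \frac{x_1}{n_1}$ ; $\hat{p}_2 = \frac{x_2}{n_2}$ ; $\bar{p} = \frac{x_1 + x_2}{n_1 + n_2}$ ; $\bar{q} = 1 - \bar{p}$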
Example 1. A survey was made to determine the difference in proportion of men and women who
completed a college education. A random sample of men and women who were 20 years old and
above were surveyed and results show that 70 out of 280 women, and 92 out of 320 men, were
college graduates. Construct a 96% confidence interval for the true difference in proportions.
Given:
Men Women
𝑥1 = 92 𝑥2 = 70
𝑛1 = 320 𝑛2 = 280
$\hat{p}_1 = \frac{x_1}{n_1} = \frac{92}{320} = 0.2875$ ; $\hat{p}_2 = \frac{x_2}{n_2} = \frac{70}{280} = 0.25$
$\bar{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{92 + 70}{320 + 280} = 0.27$
𝑞̅ = 1 − 𝑝̅ = 1 − 0.27 = 0.73
𝐶𝐼 = 96% 𝑜𝑟 0.96
𝛼 = 1 − 𝐶𝐼 = 1 − 0.96 = 0.04
$Z_{\alpha/2} = Z_{0.04/2} = Z_{0.02}$. The area to the left of this z-value is 1 − 0.02 = 0.98.
Find the z-value of this area/probability by referring to the z-table.
$$(0.2875 - 0.25) - (2.055)\sqrt{(0.27)(0.73)\left(\frac{1}{320} + \frac{1}{280}\right)} < P_1 - P_2 < (0.2875 - 0.25) + (2.055)\sqrt{(0.27)(0.73)\left(\frac{1}{320} + \frac{1}{280}\right)}$$
$$0.0375 - (2.055)\sqrt{(0.27)(0.73)\left(\frac{1}{320} + \frac{1}{280}\right)} < P_1 - P_2 < 0.0375 + (2.055)\sqrt{(0.27)(0.73)\left(\frac{1}{320} + \frac{1}{280}\right)}$$
(−0.0372, 0.1122)
Conclusion: Since the lower boundary is negative and the upper boundary is positive, the interval
contains 0. It can be interpreted that the proportion of men who completed a college education is
about equal to the proportion of women who completed a college education.
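The steps of this example can be followed in code. The sketch below uses the pooled proportion in the standard error, exactly as in the chapter's formula above, and the table value 2.055 for 96% confidence. The variable names are chosen for illustration.

```python
from math import sqrt

# Counts from the example: group 1 = men, group 2 = women.
x1, n1 = 92, 320
x2, n2 = 70, 280

p1_hat = x1 / n1
p2_hat = x2 / n2
p_bar = (x1 + x2) / (n1 + n2)  # pooled proportion
q_bar = 1 - p_bar

z = 2.055  # z-table value for 96% confidence, as used in the example
margin = z * sqrt(p_bar * q_bar * (1 / n1 + 1 / n2))

diff = p1_hat - p2_hat
lower, upper = diff - margin, diff + margin

print(f"p1 = {p1_hat:.4f}, p2 = {p2_hat:.4f}, p_bar = {p_bar:.2f}")
print(f"({lower:.4f}, {upper:.4f})")  # (-0.0372, 0.1122)
```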
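Example 1. What sample size is needed to be 95% confident of being correct within
±10? A pilot study suggested that the standard deviation is 35.
Solution:
$$n = \frac{z_{\alpha/2}^2\,\sigma^2}{E^2} = \frac{(1.96)^2(35)^2}{(10)^2} = 47.0596 \approx 48$$
The result is rounded up to the next whole number so that the required margin of error is met.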
Formula for determining the sample size for estimating the population proportion:
$$n = \frac{z_{\alpha/2}^2\,(p)(1 - p)}{E^2}$$
Solution:
$$n = \frac{z_{\alpha/2}^2\,(p)(1 - p)}{E^2} = \frac{(2.575)^2(0.5)(1 - 0.5)}{(0.02)^2} = 4144.140625 \approx 4{,}145$$
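The two sample-size computations above can be checked with a short sketch. The function names are hypothetical, and `math.ceil` is used to round up, since the sample size must be a whole number that still satisfies the margin of error E.

```python
from math import ceil

def sample_size_mean(z, sigma, E):
    """Smallest n with z * sigma / sqrt(n) <= E, i.e. n >= (z * sigma / E) ** 2."""
    return ceil((z * sigma / E) ** 2)

def sample_size_proportion(z, p, E):
    """Smallest n with z * sqrt(p * (1 - p) / n) <= E."""
    return ceil(z ** 2 * p * (1 - p) / E ** 2)

# Mean: 95% confidence (z = 1.96), sigma = 35, E = 10.
n_mean = sample_size_mean(1.96, 35, 10)

# Proportion: z = 2.575, p = 0.5, E = 0.02, as in the solution above.
n_prop = sample_size_proportion(2.575, 0.5, 0.02)

print(n_mean)  # 48
print(n_prop)  # 4145
```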