Chapter 6

This document discusses estimation techniques used to estimate population parameters based on sample data. There are three key points: 1. Estimation involves using samples to represent whole populations since working with entire populations may be impossible due to large size or other factors. Good estimators are unbiased, efficient, and consistent. 2. Point estimation estimates parameters using single values like means, while interval estimation uses probability to determine intervals within which true population values are likely to fall. 3. Methods for constructing confidence intervals for population means, differences between population means, and other parameters are presented depending on whether population variances are known or unknown, and whether sample sizes are large or small.

Uploaded by

Raymond Kilangi


CHAPTER SIX

ESTIMATION

6.1 Introduction.
In practice it is not always possible to work with the entire population to determine the
desired statistical measures, such as the mean and standard deviation. The population may
be infinite, or so large that studying all of it is too expensive.
Estimation relies on sampling: findings from a sample are used to represent the whole
population. Estimators are the formulas used to estimate the population parameters.
A good estimator should have several properties, including the following:
(a) Unbiasedness
(b) Efficiency
(c) Consistency

Unbiasedness

An estimator $\hat{\theta}$ for a population parameter $\theta$ is said to be unbiased if $E(\hat{\theta}) = \theta$. The
quantity $E(\hat{\theta}) - \theta$ is called the bias of $\hat{\theta}$.
Efficiency
This property is used to compare estimators of the same population parameter $\theta$.
Let $\hat{\theta}_1$ and $\hat{\theta}_2$ be two unbiased estimators for $\theta$. Then $\hat{\theta}_1$ is said to be more efficient
than $\hat{\theta}_2$ if

$$\mathrm{Var}(\hat{\theta}_1) < \mathrm{Var}(\hat{\theta}_2)$$

An unbiased estimator whose variance is the smallest among all unbiased estimators of $\theta$
is known as the MVUE (Minimum Variance Unbiased Estimator).
Consistency
An estimator $\hat{\theta}$ for $\theta$ is said to be consistent if both its bias and its variance tend to zero
as the sample size approaches infinity.
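The three properties can be checked exactly on a toy case. The sketch below is only an illustration, assuming a hypothetical population (the six faces of a fair die) and using the sample mean as the estimator of the population mean; the names `population` and `sampling_mean_and_variance` are not from the text. It enumerates every equally likely sample of size n drawn with replacement.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population (assumption for illustration): faces of a fair die.
population = [1, 2, 3, 4, 5, 6]
N = len(population)
mu = Fraction(sum(population), N)                               # population mean
sigma2 = sum((Fraction(x) - mu) ** 2 for x in population) / N   # population variance


def sampling_mean_and_variance(n):
    """Exact mean and variance of the sample mean over all equally
    likely samples of size n drawn with replacement."""
    means = [Fraction(sum(s), n) for s in product(population, repeat=n)]
    expected = sum(means) / len(means)
    variance = sum((m - expected) ** 2 for m in means) / len(means)
    return expected, variance


for n in (1, 2, 4):
    expected, variance = sampling_mean_and_variance(n)
    bias = expected - mu        # E(estimator) minus the parameter
    print(n, bias, variance)    # the variance equals sigma2 / n
```

In this toy case the bias is zero for every n (unbiasedness) and the variance of the sample mean is the population variance divided by n, so it shrinks as n grows (consistency).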

There are two types of estimation.

(a) Point estimation
(b) Interval estimation

6.2 Point estimation
This is the technique of estimating a population parameter with a single-valued
statistic (estimator). Commonly used approaches to point estimation are Maximum
Likelihood and the Method of Moments. We shall not discuss these approaches.

6.3 Interval Estimation

This technique applies probability theory to determine an interval within which the true
value of a population parameter is expected to lie with a stated level of confidence. It
involves constructing confidence intervals for population parameters in different
situations and at different levels of significance, $\alpha$.

6.4 Confidence Interval Estimate for a Population Mean, 𝝁

There are two cases to consider under this estimation.
Case 1: When the Variance ($\sigma^2$) is known
In this case the statistic used is Z, and hence the formula for the $(1-\alpha)100\%$
confidence interval estimate for $\mu$ is given by

$$\bar{x} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

The value $Z_{\alpha/2}$ is called the critical value and is determined by inverse probability
lookup. Depending on the standard normal table you use, it is the value $c$ such that
$P(Z \ge c) = \alpha/2$ or, equivalently, $P(0 \le Z \le c) = 0.5 - \alpha/2$.
For instance, if $\alpha = 5\% = 0.05$, then $\alpha/2 = 0.025$ and $Z_{0.025} = c$.

[Figure: standard normal curve with area 0.4750 between 0 and c, and area 0.0250 in the tail to the right of c]

We find from the table that $P(0 \le Z \le 1.96) = 0.475$. It follows that $c = 1.96$.
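The table lookup can also be done numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the function name `z_critical` is an illustrative assumption, not part of the text.

```python
from statistics import NormalDist


def z_critical(alpha):
    """Return c such that P(Z >= c) = alpha / 2 for a standard normal Z."""
    return NormalDist().inv_cdf(1 - alpha / 2)


c = z_critical(0.05)
print(round(c, 2))                            # 1.96

# Check against the other table form: P(0 <= Z <= c) = 0.5 - alpha / 2
print(round(NormalDist().cdf(c) - 0.5, 4))    # 0.475
```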

Example 6.1
A population is known to have a variance of 81. A random sample of size 16 showed that
$\bar{x} = 10.5$. Estimate the population mean by means of a 95% confidence interval.

Solution
Given $\sigma^2 = 81 \Rightarrow \sigma = 9$, $n = 16$, $\bar{x} = 10.5$, $\alpha = 5\% = 0.05$.
Then the 95% confidence interval is given by

$$\bar{x} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 10.5 \pm Z_{0.025}\frac{9}{\sqrt{16}} = 10.5 \pm (1.96)(2.25) = 10.5 \pm 4.41$$

$$\Rightarrow 6.09 \le \mu \le 14.91$$

Case 2: When the Population Variance is unknown

There are two situations describing unknown variance.
1. Large sample size, $n \ge 30$
2. Small sample size, $n < 30$

Large Sample Size

If the sample size is large, the statistic used is still Z, while the unknown population
variance is replaced by the sample variance, $s^2$. Hence the formula for the $(1-\alpha)100\%$
confidence interval estimate for $\mu$ becomes

$$\bar{x} \pm Z_{\alpha/2}\frac{s}{\sqrt{n}}$$
Example 6.2
Let X be a normal random variable representing the value of individual invoices (in
dollars) issued by a certain firm. Suppose that $\mu$ and $\sigma$ are unknown. A random sample
of 49 invoices showed that $\bar{x} = 520$ and $s = 91$. Compute a 95% confidence
interval estimate for $\mu$.
Solution
Given $n = 49$, $\bar{x} = 520$, $s = 91$, $\alpha = 5\%$.
Since the sample size is large, the distribution used is Z, and hence the formula for the 95%
confidence interval for $\mu$ is

$$\bar{x} \pm Z_{\alpha/2}\frac{s}{\sqrt{n}}$$

But $Z_{\alpha/2} = Z_{0.025} = 1.96$, so we have

$$\bar{x} \pm Z_{\alpha/2}\frac{s}{\sqrt{n}} = 520 \pm 1.96\,\frac{91}{\sqrt{49}} = 520 \pm 1.96(13) = 520 \pm 25.48$$

$$\Rightarrow 494.52 \le \mu \le 545.48$$

Small Sample Size

If the sample size is small ($n < 30$) and the population variance is unknown, the standard
normal variable Z is no longer used. In this case a suitable distribution is the t-distribution
with $n - 1$ degrees of freedom. The degrees of freedom are a parameter of the
t-distribution; its derivation is not discussed here. The formula for the $(1-\alpha)100\%$
confidence interval estimate for $\mu$ is given by

$$\bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$

where $n - 1$ is the degrees of freedom of the t-distribution.

Example 6.3
Repeat Example 6.2 with sample size 25.

Solution
Given $n = 25$, $\bar{x} = 520$, $s = 91$, $\alpha = 5\%$.
Since the sample size is small, the distribution used is t, and hence the formula for the 95%
confidence interval for $\mu$ is

$$\bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$

But $t_{\alpha/2,\,n-1} = t_{0.025,\,24} = 2.064$, so we have

$$\bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} = 520 \pm 2.064\,\frac{91}{\sqrt{25}} = 520 \pm 2.064(18.2) = 520 \pm 37.56$$

$$\Rightarrow 482.44 \le \mu \le 557.56$$

6.5 Estimation for the Difference between Two Population Means

Similar to a single population, we consider two different cases.
Case 1: When Population Variances $\sigma_x^2$ and $\sigma_y^2$ are known
In this case the statistic used is Z. Let X and Y be two normally distributed random
variables representing two populations, and let $n_x$ and $n_y$ be the respective sample sizes.
Then the formula for the $(1-\alpha)100\%$ confidence interval estimate for the difference between
means $\mu_x - \mu_y$ is given by

$$(\bar{x} - \bar{y}) \pm Z_{\alpha/2}\sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}}$$
Example 6.4
Random variables X and Y are normally distributed with standard deviations $\sigma_x = 1.2$ and
$\sigma_y = 0.9$. Random samples of observations on both variables, each of size 32, provide the
following information: $\bar{x} = 4.1$ and $\bar{y} = 3.5$. Estimate the difference between the population
means by means of a 95% confidence interval.

Solution
Since the population variances are known, the distribution used is Z, and hence the
formula for the 95% confidence interval estimate for $\mu_x - \mu_y$ is given by

$$(\bar{x} - \bar{y}) \pm Z_{\alpha/2}\sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}}$$

where $Z_{\alpha/2} = Z_{0.025} = 1.96$. The confidence interval is then

$$(\bar{x} - \bar{y}) \pm Z_{\alpha/2}\sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}} = (4.1 - 3.5) \pm 1.96\sqrt{\frac{1.2^2}{32} + \frac{0.9^2}{32}} = 0.60 \pm 1.96(0.2652) = 0.60 \pm 0.52$$

$$\Rightarrow 0.08 \le \mu_x - \mu_y \le 1.12$$

Comment: Since the entire interval consists of positive numbers only, we are 95% confident
that $\mu_x > \mu_y$.

Case 2: When Population variances are unknown

As with a single population mean, two situations are considered in this case:
1. Both sample sizes are large, $n_x \ge 30$ and $n_y \ge 30$.
2. At least one of the samples is small.

Large Sample Sizes
The distribution used is still Z, and the population variances are replaced by their
respective sample variances. Hence the formula for the $(1-\alpha)100\%$ confidence interval
estimate for the difference between means $\mu_x - \mu_y$ is given by

$$(\bar{x} - \bar{y}) \pm Z_{\alpha/2}\sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}}$$
Example 6.5
A utility company used to send out monthly statements to its customers without addressed
return envelopes. From a random sample of 120 customers it was determined that, on
average, it took 9 days for a payment to be made, with a sample standard deviation of 2
days.
Wishing to speed up receipt of payment, the company subsequently included pre-addressed
return envelopes with the invoices. An independent sample of 130 customers
indicated that the average payment time fell to 8 days, with a sample standard deviation of 2.2
days.
Compute a 95% confidence interval estimate for the difference between population
means.

Solution
Let X represent the payment time for invoices sent without pre-addressed return envelopes.
Let Y represent the payment time for invoices sent with pre-addressed return envelopes.
The following information is given:
$n_x = 120$, $\bar{x} = 9$, $s_x = 2$, $n_y = 130$, $\bar{y} = 8$, $s_y = 2.2$

Since the sample sizes are large, the distribution used is Z. The population variances are
unknown but are replaced by the corresponding sample variances, and hence the formula
for the 95% confidence interval is given by

$$(\bar{x} - \bar{y}) \pm Z_{\alpha/2}\sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}}$$

where $Z_{\alpha/2} = Z_{0.025} = 1.96$, so we have

$$(\bar{x} - \bar{y}) \pm Z_{\alpha/2}\sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}} = (9 - 8) \pm 1.96\sqrt{\frac{2^2}{120} + \frac{2.2^2}{130}} = 1.0 \pm 1.96(0.2656) = 1.0 \pm 0.52$$

$$\Rightarrow 0.48 \le \mu_x - \mu_y \le 1.52$$

Comment: Since the entire interval is positive, we are again 95% confident that $\mu_x > \mu_y$.
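Examples 6.4 and 6.5 use the same Z-based computation; only the source of the variances differs (known population variances in one, sample variances from large samples in the other). The following sketch makes that explicit. The function name `diff_means_ci_z` and its parameter names are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def diff_means_ci_z(xbar, ybar, var_x, var_y, n_x, n_y, alpha=0.05):
    """Z-based confidence interval for mu_x - mu_y.

    var_x and var_y are the population variances if known, or the
    sample variances when both samples are large.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se = sqrt(var_x / n_x + var_y / n_y)      # standard error of xbar - ybar
    diff = xbar - ybar
    return diff - z * se, diff + z * se


# Example 6.4: known standard deviations 1.2 and 0.9, both samples of size 32
lo_64, hi_64 = diff_means_ci_z(4.1, 3.5, 1.2 ** 2, 0.9 ** 2, 32, 32)
print(round(lo_64, 2), round(hi_64, 2))       # 0.08 1.12

# Example 6.5: sample standard deviations 2 and 2.2, sample sizes 120 and 130
lo_65, hi_65 = diff_means_ci_z(9, 8, 2 ** 2, 2.2 ** 2, 120, 130)
print(round(lo_65, 2), round(hi_65, 2))       # 0.48 1.52
```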

Small Sample Sizes

In this situation, estimation is done under the assumptions that the two samples are drawn
from two independent populations (say X and Y) and that these populations have a
common variance, i.e. $\sigma_x^2 = \sigma_y^2 = \sigma^2$. This common variance is then estimated by a
common sample variance called the pooled sample variance $s_p^2$, such that

$$s_p^2 = \frac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n_x + n_y - 2}$$

The distribution used for estimation is now the t-distribution with $n_x + n_y - 2$
degrees of freedom. Hence the formula for the $(1-\alpha)100\%$ confidence interval estimate for
the difference between means $\mu_x - \mu_y$ is given by

$$(\bar{x} - \bar{y}) \pm t_{\alpha/2,\,n_x+n_y-2}\sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}}$$

Example 6.6
Repeat Example 6.5 with sample sizes $n_x = 19$ and $n_y = 25$.

Solution
The following information is given:
$n_x = 19$, $\bar{x} = 9$, $s_x = 2$, $n_y = 25$, $\bar{y} = 8$, $s_y = 2.2$
Since the population variances are unknown and the sample sizes are small, the
distribution used is the t-distribution and the common population variance is replaced by
the pooled sample variance. Hence the formula for the 95% confidence interval is

$$(\bar{x} - \bar{y}) \pm t_{\alpha/2,\,n_x+n_y-2}\sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}}$$

We first compute $s_p^2$ using the formula

$$s_p^2 = \frac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n_x + n_y - 2} = \frac{(19 - 1)2^2 + (25 - 1)2.2^2}{19 + 25 - 2} = \frac{188.16}{42} = 4.48$$

The critical value is $t_{\alpha/2,\,n_x+n_y-2} = t_{0.025,\,42} \approx 2.021$. (This is the tabulated value for 40
degrees of freedom, the nearest table entry; the exact value for 42 degrees of freedom is
about 2.018 and gives the same interval to two decimal places.)

Therefore the required confidence interval is

$$(\bar{x} - \bar{y}) \pm t_{\alpha/2,\,n_x+n_y-2}\sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}} = (9 - 8) \pm 2.021\sqrt{\frac{4.48}{19} + \frac{4.48}{25}} = 1.0 \pm 2.021(0.6442) = 1.0 \pm 1.30$$

$$\Rightarrow -0.30 \le \mu_x - \mu_y \le 2.30$$

Comment: This interval contains negative values, zero and positive values, so we cannot
conclude with 95% confidence that $\mu_x > \mu_y$.
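The two steps of Example 6.6 (pool the variances, then build the t interval) can be sketched as follows. The function names `pooled_variance` and `diff_means_ci_pooled` are hypothetical, and the t critical value is assumed to be supplied from a table, as in the example.

```python
from math import sqrt


def pooled_variance(s_x, s_y, n_x, n_y):
    """Pooled sample variance under the common-variance assumption."""
    return ((n_x - 1) * s_x ** 2 + (n_y - 1) * s_y ** 2) / (n_x + n_y - 2)


def diff_means_ci_pooled(xbar, ybar, s_x, s_y, n_x, n_y, t_crit):
    """t-based interval for mu_x - mu_y with n_x + n_y - 2 degrees of freedom.

    t_crit is the table value t_{alpha/2, n_x + n_y - 2}.
    """
    sp2 = pooled_variance(s_x, s_y, n_x, n_y)
    se = sqrt(sp2 / n_x + sp2 / n_y)
    diff = xbar - ybar
    return diff - t_crit * se, diff + t_crit * se


# Data of Example 6.6, with the table value 2.021 used in the text
sp2 = pooled_variance(2, 2.2, 19, 25)
lower, upper = diff_means_ci_pooled(9, 8, 2, 2.2, 19, 25, t_crit=2.021)
print(round(sp2, 2))                        # 4.48
print(round(lower, 2), round(upper, 2))     # -0.3 2.3
```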

Other formulas for confidence interval estimation are summarized in the following table.

| Type of estimation | Confidence interval formula |
|---|---|
| Population proportion, $\pi$ | $\pi = p \pm Z_{\alpha/2}\sqrt{\dfrac{p(1-p)}{n}}$ |
| Difference between two population proportions, $\pi_1 - \pi_2$ | $\pi_1 - \pi_2 = (p_1 - p_2) \pm Z_{\alpha/2}\sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}}$ |
| Population variance, $\sigma^2$ | $\dfrac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}} \le \sigma^2 \le \dfrac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}$ |
| Ratio of two population variances, $\sigma_1^2/\sigma_2^2$ | $\dfrac{s_1^2}{s_2^2}\cdot\dfrac{1}{F_{\alpha/2}(n_1-1,\,n_2-1)} \le \dfrac{\sigma_1^2}{\sigma_2^2} \le \dfrac{s_1^2}{s_2^2}\,F_{\alpha/2}(n_2-1,\,n_1-1)$ |

Example 6.7
In a random sample of 500 families owning television sets in a certain city, it is found that
340 have not yet subscribed to a newly introduced digital transmission system. Find a
95% confidence interval for the actual proportion of all families in the city who have not
yet subscribed to the system.

Solution
The point estimate of $\pi$ is $p = 340/500 = 0.68$. For 95% confidence, we have $\alpha = 0.05$,
so $Z_{\alpha/2} = Z_{0.025} = 1.96$. Therefore, the 95% confidence interval for $\pi$ is

$$\pi = p \pm Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} = 0.68 \pm (1.96)\sqrt{\frac{0.68(0.32)}{500}} = 0.68 \pm 0.04$$

$$\Rightarrow 0.64 \le \pi \le 0.72$$
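The calculation of Example 6.7 can be reproduced with the sketch below, which applies the population-proportion formula from the summary table. The function name `proportion_ci` is an illustrative assumption.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(count, n, alpha=0.05):
    """Z-based confidence interval for a population proportion."""
    p = count / n                              # sample proportion
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sqrt(p * (1 - p) / n)
    return p - margin, p + margin


# Data of Example 6.7: 340 of 500 families have not yet subscribed
lower, upper = proportion_ci(340, 500)
print(round(lower, 2), round(upper, 2))        # 0.64 0.72
```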

Example 6.8
In a housing survey in town A, 234 out of 300 respondents reported that they had exclusive
use of a flush toilet inside the house. In a housing survey in town B, 135 out of 150 also
reported that they had exclusive use of a flush toilet inside the house. Construct a 95%
confidence interval for the difference between the proportions of the two towns.

Solution
The following information is given:
$x_A = 234$, $n_A = 300$, $y_B = 135$, $n_B = 150$, $\alpha = 5\%$
Now, $p_A = 234/300 = 0.78$, $p_B = 135/150 = 0.90$, and $Z_{\alpha/2} = Z_{0.025} = 1.96$.

The required 95% confidence interval for $\pi_A - \pi_B$ is given by

$$\pi_A - \pi_B = (p_A - p_B) \pm Z_{\alpha/2}\sqrt{\frac{p_A(1-p_A)}{n_A} + \frac{p_B(1-p_B)}{n_B}}$$

$$= (0.78 - 0.90) \pm 1.96\sqrt{\frac{0.78(0.22)}{300} + \frac{0.90(0.10)}{150}} = -0.12 \pm 1.96(0.03423) = -0.12 \pm 0.067$$

$$\Rightarrow -0.187 \le \pi_A - \pi_B \le -0.053$$

Comment: The interval does not include zero or any positive value. This suggests that the
proportion of users in town A is less than that in town B.
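A sketch of the two-proportion interval used in Example 6.8 follows. The function name `diff_proportions_ci` and its parameter names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def diff_proportions_ci(count_a, n_a, count_b, n_b, alpha=0.05):
    """Z-based confidence interval for pi_A - pi_B."""
    p_a = count_a / n_a
    p_b = count_b / n_b
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_a - p_b
    return diff - z * se, diff + z * se


# Data of Example 6.8: 234 of 300 in town A, 135 of 150 in town B
lower, upper = diff_proportions_ci(234, 300, 135, 150)
print(round(lower, 3), round(upper, 3))     # -0.187 -0.053
```

Both limits are negative, which is what the comment in the example relies on.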

Example 6.9
The following are the weights, in decagrams, of 10 packages of grass seed distributed by a
certain company: 46.4, 46.1, 45.8, 47.0, 46.1, 45.9, 45.8, 46.9, 45.2 and 46.0. Find a 95%
confidence interval estimate for the variance of all such packages of grass seed distributed
by this company, assuming a normal population.
Solution
We first compute the sample variance of the data as follows:

$$s^2 = \frac{1}{n-1}\left[\sum x_i^2 - \frac{1}{n}\left(\sum x_i\right)^2\right] = \frac{1}{9}\left[21{,}273.12 - \frac{1}{10}(461.2)^2\right] = 0.286$$

For a 95% confidence interval, we have $\alpha = 0.05$, so
$\chi^2_{\alpha/2,\,n-1} = \chi^2_{0.025,\,9} = 19.023$ and $\chi^2_{1-\alpha/2,\,n-1} = \chi^2_{0.975,\,9} = 2.700$.

Therefore, the 95% confidence interval estimate for $\sigma^2$ is given by

$$\frac{9(0.286)}{19.023} \le \sigma^2 \le \frac{9(0.286)}{2.700}$$

$$\Rightarrow 0.135 \le \sigma^2 \le 0.953$$
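The arithmetic of Example 6.9 can be checked directly from the raw data. In this sketch the chi-square critical values are assumed to be taken from a table, as in the example, because the Python standard library has no chi-square quantile function; the variable names are illustrative.

```python
from statistics import variance

# Weights (in decagrams) from Example 6.9
weights = [46.4, 46.1, 45.8, 47.0, 46.1, 45.9, 45.8, 46.9, 45.2, 46.0]
n = len(weights)
s2 = variance(weights)        # sample variance, divisor n - 1

# Table values for 9 degrees of freedom (supplied, not computed)
chi2_upper_tail = 19.023      # chi-square_{0.025, 9}
chi2_lower_tail = 2.700       # chi-square_{0.975, 9}

lower = (n - 1) * s2 / chi2_upper_tail
upper = (n - 1) * s2 / chi2_lower_tail
print(round(s2, 3))                         # 0.286
print(round(lower, 3), round(upper, 3))     # 0.135 0.954
```

The upper limit prints as 0.954 rather than 0.953 because the code carries the unrounded sample variance through the calculation, whereas the worked solution rounds it to 0.286 first.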

6.6 Estimation of Sample size

The formula for the confidence interval estimate for the population mean $\mu$ or population
proportion $\pi$ can be used to estimate a sample size that is suitable to provide good
approximations of the population parameters. This can be done in advance if the desired
margin of error is fixed and the population (or sample) variance is known beforehand. For the
case of a proportion, we need a preliminary estimate of the sample proportion, $p$.
From

$$\mu = \bar{x} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

we get $\mu = \bar{x} \pm E$, where $E = Z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$ is called the margin of error (half the interval length).
Making $n$ (the sample size) the subject, we obtain

$$n \ge \left(\frac{Z_{\alpha/2}\,\sigma}{E}\right)^2$$

The inequality is preferred because the larger the sample size, the more precise the
estimates.

Example 6.10
What sample size would be required to estimate the population mean for a large set of
company invoices to within $0.30 with 95% confidence, given that the estimated
population standard deviation is $5?

Solution
The following information is given: $E = 0.30$, $\alpha = 5\%$, $\sigma = 5$.
We are required to find the estimated sample size $n$.
Now, $Z_{\alpha/2} = Z_{0.025} = 1.96$.

Then,

$$n \ge \left(\frac{Z_{\alpha/2}\,\sigma}{E}\right)^2 = \left(\frac{1.96 \times 5}{0.3}\right)^2 = 1067.11$$

Since $n$ must be a whole number, a sample of at least 1068 invoices is required.
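The sample-size formula, including the rounding up to a whole number, can be sketched as follows. The function name `required_sample_size` is hypothetical, and the critical value 1.96 is passed in as read from the table.

```python
from math import ceil


def required_sample_size(sigma, margin, z):
    """Smallest whole n satisfying n >= (z * sigma / margin) ** 2."""
    return ceil((z * sigma / margin) ** 2)


# Data of Example 6.10: sigma = 5, margin of error 0.30, Z_{0.025} = 1.96
raw = (1.96 * 5 / 0.30) ** 2
n = required_sample_size(sigma=5, margin=0.30, z=1.96)
print(round(raw, 2), n)       # 1067.11 1068
```

Rounding up rather than to the nearest integer keeps the inequality satisfied, so the margin of error is not exceeded.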