BS - CH II Estimation
STATISTICAL ESTIMATIONS
2.1 Basic Concepts
Introduction
In many cases values for a population parameter are unknown. If parameters are unknown it is
generally not sufficient to make some convenient assumption about their values, rather those
unknown parameters should be estimated.
In business, many decisions are made without complete information. A firm does not know
exactly what its sales volume will be next year or next month. A college does not know exactly
how many students will enroll next year. Both must estimate to make decisions about the future.
Inferential statistics is concerned with estimation. It is the procedure by which inferences
about a population are made on the basis of the results obtained from a sample drawn from that
population. This can be achieved by:
- Hypothesis testing, e.g. using sample evidence to test hypotheses about the population mean.
- Estimation, e.g. estimating the population mean using the information derived from a sample.
There are two types of estimations for a population parameter:
- Point estimation
- Interval estimation
Definition of Terms:
Estimation: is the process of predicting an unknown population parameter through sampling,
i.e. it is the process of using a sample statistic to estimate the unknown population
parameter. The objective of estimation is to determine the approximate value of a population
parameter on the basis of a sample statistic. E.g. the sample mean ($\bar{x}$) is employed to
estimate the population mean ($\mu$).
Estimator: is a sample statistic that is used to estimate an unknown population parameter.
E.g. sample mean, sample proportion, etc.
Estimate: is a single numerical value obtained for an estimator, e.g. 1, 2, 3, ... For
instance, the sample mean is an estimator of the population mean, and the value it takes in a
particular sample is an estimate.
Parameter: is a characteristic of a population. E.g. population mean, population standard
deviation, population proportion, etc.
Statistic: is a characteristic of a sample. E.g. sample mean, sample proportion, sample
standard deviation, etc.
2.2 Types of Estimates
2.2.1 Point Estimation
A point estimate is a single numerical value obtained from a random sample and used to estimate
the corresponding population parameter. A random sample of observations is taken from the
population of interest and the observed values are used to obtain a point estimate of the
relevant parameter.
The sample mean, $\bar{x}$, is the best estimator of the population mean ($\mu$). Different
samples from a population yield different point estimates of $\mu$.
Sample proportion $p$ is a good estimator of the population proportion, $P$.
Population proportion ($P$) is equal to the number of elements in the population belonging to
the category of interest ($X$) divided by the total number of elements in the population ($N$):
$P = X/N$. Where:
say that the exact parameter value is some specific number, but we can determine a range of
values within which we are confident the unknown parameter lies.
An interval estimate states the range within which a population parameter probably lies. The
interval within which a population parameter is expected to lie is usually referred to as the
confidence interval.
The confidence interval for the population mean is the interval that has a high probability of
containing the population mean, $\mu$.
Three confidence intervals are used extensively:
- 90% confidence interval
- 95% confidence interval
- 99% confidence interval
A 95% confidence interval means that about 95% of the similarly constructed intervals will
contain the parameter being estimated. If we use the 99% confidence interval we expect about
99% of the intervals to contain the parameter being estimated.
Another interpretation of the 95% confidence interval is that 95% of the sample means for a
specified sample size will lie within 1.96 standard deviations of the sampling distribution
from the hypothesized population mean. For 99%, the sample means will lie within 2.58 standard
deviations of the hypothesized population mean.
Where do the values 1.96 and 2.58 come from?
The middle 95% of the sample means lie equally on either side of the mean, so 0.95/2 = 0.4750,
or 47.5% of the area, is to the right of the mean and 0.4750 is to the left of the mean. The Z
value for this area is 1.96: Z to the right of the mean is +1.96 and Z to the left is -1.96.
By the same reasoning, for 99% the area on each side of the mean is 0.99/2 = 0.4950, for which
the Z value is about 2.58.
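The table lookup described above can also be reproduced numerically. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist`; the helper name `z_for_confidence` is made up for illustration and is not part of these notes.

```python
from statistics import NormalDist


def z_for_confidence(confidence):
    """Return the two-sided Z value for a confidence level such as 0.95."""
    # The area to the left of +Z is the central area plus one tail,
    # i.e. confidence + (1 - confidence) / 2 = (1 + confidence) / 2.
    return NormalDist().inv_cdf((1 + confidence) / 2)


z90 = z_for_confidence(0.90)
z95 = z_for_confidence(0.95)
z99 = z_for_confidence(0.99)
print(round(z90, 3), round(z95, 3), round(z99, 3))
```

Rounded to two decimals, the 95% and 99% values agree with the 1.96 and 2.58 quoted in the text.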
2.2.2.1 Interval Estimation of the Mean
a) Compute the standard error of the mean
Standard error of the mean is the standard deviation of the sample means.
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$, where $\sigma$ = population standard deviation, n = sample size
If the population standard deviation is not known, the standard deviation of the sample, s, is
used to approximate the population standard deviation:
$S_{\bar{x}} = \frac{s}{\sqrt{n}}$
This indicates that the error in estimating the population mean decreases as the sample size
increases.
b) The 95% and 99% confidence intervals are constructed as follows when n > 30.
95% confidence interval: $\bar{x} \pm 1.96\frac{s}{\sqrt{n}}$
99% confidence interval: $\bar{x} \pm 2.58\frac{s}{\sqrt{n}}$
1.96 and 2.58 are the Z values corresponding to the middle 95% and 99% of the observations,
respectively.
In general, a confidence interval for the mean is computed by $\bar{x} \pm Z\frac{\sigma}{\sqrt{n}}$,
or $\bar{x} \pm Z\frac{s}{\sqrt{n}}$, where Z reflects the selected level of confidence.
Example
An experiment involves selecting a random sample of 256 middle managers for studying their
annual income. The sample mean is computed to be Br. 35,420 and the sample standard
deviation is Br. 2,050.
a) What is the estimated mean income of all middle managers (the population)?
b) What is the 95% confidence interval (rounded to the nearest 10)?
c) What are the 95% confidence limits?
d) Interpret the finding.
Solution
a) The sample mean is 35420, so the point estimate of the population mean $\mu$ is 35420. It is
estimated from the sample mean.
b) The confidence interval is between 35168.87 and 35671.13, found by
$\bar{x} \pm 1.96\frac{s}{\sqrt{n}} = 35420 \pm 1.96\frac{2050}{\sqrt{256}} = 35420 \pm 251.13$, i.e. 35168.87 and 35671.13.
Rounded to the nearest 10, the interval is about 35170 to 35670.
c) The end points of the confidence interval are called the confidence limits. In this case
35168.87 is the lower limit and 35671.13 is the upper limit.
d) Interpretation: if 100 samples of this size were drawn and a 95% confidence interval were
constructed from each, about 95 of the 100 intervals would be expected to contain the
population mean annual income. About 5 out of the 100 confidence intervals would not contain
the population mean annual income.
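The arithmetic of this example can be checked with a short script. This is only a sketch under the large-sample assumption stated above (n > 30, with s standing in for the population standard deviation); the function name `mean_confidence_interval` is hypothetical, and Z is taken from the standard library rather than from a printed table, so the last decimal may differ slightly from a hand calculation.

```python
import math
from statistics import NormalDist


def mean_confidence_interval(sample_mean, sample_sd, n, confidence=0.95):
    """Large-sample confidence interval for a population mean."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # about 1.96 for 95%
    standard_error = sample_sd / math.sqrt(n)       # s / sqrt(n)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin


# Middle-manager income example: n = 256, mean = 35420, s = 2050.
lower, upper = mean_confidence_interval(35420, 2050, 256, confidence=0.95)
print(round(lower, 2), round(upper, 2))
```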
Exercise
A research firm conducted a survey to determine the mean amount smokers spend on
cigarettes during a week. A sample of 49 smokers revealed that the sample mean is Br. 20 with a
standard deviation of Br. 5. Construct a 95% confidence interval for the mean amount spent.
2.2.2.2 Interval Estimation of the difference between two independent means
If all possible samples of large sizes $n_1$ and $n_2$ are drawn from two different populations,
then the sampling distribution of the difference between the two means, $\bar{x}_1 - \bar{x}_2$,
is approximately normal with mean $(\mu_1 - \mu_2)$ and standard deviation
$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$
For a desired confidence level, the confidence interval limits for the difference between the
population means $(\mu_1 - \mu_2)$ are given by $(\bar{x}_1 - \bar{x}_2) \pm Z\sigma_{\bar{x}_1 - \bar{x}_2}$
Example:
The strength of the wire produced by company A has a mean of 4,500kg and a standard deviation
of 200 kg. Company B has a mean of 4,000 kg and a standard deviation of 300 kg. A sample of
50 wires of company A and 100 wires of company B are selected at random for testing the
strength. Find 99% confidence limits on the difference in the average strength of the populations
of wires produced by the two companies.
Solution: the following information is given:
Company A: $\bar{x}_1 = 4500$, $\sigma_1 = 200$, $n_1 = 50$
Company B: $\bar{x}_2 = 4000$, $\sigma_2 = 300$, $n_2 = 100$
Therefore, $\bar{x}_1 - \bar{x}_2 = 4500 - 4000 = 500$ and Z = 2.576
$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2} = \sqrt{40,000/50 + 90,000/100} = 41.23$
The required 99% confidence interval limits are given by $(\bar{x}_1 - \bar{x}_2) \pm Z\sigma_{\bar{x}_1 - \bar{x}_2}$
= 500 ± 2.576(41.23) = 500 ± 106.21
Hence, with 99% confidence, the difference in the average strength of wires produced by the two
companies is likely to fall in the interval $393.79 \le \mu_1 - \mu_2 \le 606.21$
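The steps of this solution can be sketched in code. The function name `two_mean_confidence_interval` is an assumption made for illustration, and Z is computed from the standard library instead of being read from a table, so the limits may differ from a hand calculation in the second decimal.

```python
import math
from statistics import NormalDist


def two_mean_confidence_interval(mean1, sd1, n1, mean2, sd2, n2, confidence):
    """Large-sample interval for the difference of two independent means."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    # Standard error of the difference: sqrt(sd1^2/n1 + sd2^2/n2).
    standard_error = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    difference = mean1 - mean2
    margin = z * standard_error
    return difference - margin, difference + margin, standard_error


# Wire-strength example: company A (4500, 200, n=50), company B (4000, 300, n=100).
lower, upper, se = two_mean_confidence_interval(4500, 200, 50, 4000, 300, 100, 0.99)
print(round(se, 2), round(lower, 2), round(upper, 2))
```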
Exercise 1: A large chain-store wishes to compare credit card holders living in area I with those
living in area II in terms of the length of time the customers have held the credit cards. A random
sample of 81 card holders is selected from each area. The sample means are found to be 120
months and 90 months for area I and area II, respectively. Given that the population variances
are 49 for area I and 36 for area II, construct a 99% confidence interval for the
difference between the two population means.
Exercise 2: A sample of 150 bulbs of brand A showed an average life of 1800 hrs with a standard
deviation of 15 hrs. Another sample of 100 bulbs of brand B showed an average life of 1500 hrs
with a standard deviation of 11 hrs. Obtain a 95% confidence interval for the difference in the
mean life of the populations of brand A and brand B bulbs.
2.2.2.3 Interval estimation for a population proportion
The confidence interval for a population proportion is estimated by
$p \pm Z\sigma_p$
where $\sigma_p$ is the standard error of the proportion and
$\sigma_p = \sqrt{\frac{p(1-p)}{n}}$
so that the interval is
$p \pm Z\sqrt{\frac{p(1-p)}{n}}$
Example. Suppose 1600 of 2000 union members sampled said they plan to vote for the proposal
to merge with a national union. Union bylaws state that at least 75% of all members must
approve for the merger to be enacted. Using the 0.95 degree of confidence, what is the interval
estimate for the population proportion? Based on the confidence interval, what conclusion can be
drawn?
$p = \frac{1600}{2000} = 0.8$. The sample proportion is 80%.
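As an illustration only, the interval for this example can be computed directly from the formula for the standard error of the proportion. The function name `proportion_confidence_interval` is hypothetical, and the closing comparison with the 75% threshold is one reading of the result rather than a worked solution from these notes.

```python
import math
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence=0.95):
    """Large-sample confidence interval for a population proportion."""
    p = successes / n
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    standard_error = math.sqrt(p * (1 - p) / n)  # sqrt(p(1-p)/n)
    margin = z * standard_error
    return p, p - margin, p + margin


# Union example: 1600 of 2000 sampled members favour the merger.
p, lower, upper = proportion_confidence_interval(1600, 2000, confidence=0.95)
print(round(p, 2), round(lower, 4), round(upper, 4))
# If the lower limit is above 0.75, the sample is consistent with the
# bylaw threshold being met.
print(lower > 0.75)
```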
for the proportion of people in the population who consider television their major source of news
information.
Exercise
As the sample size decreases, the curve representing the t distribution will have wider tails
and will be flatter at the center.
[Figure: curves of the Z distribution and the t distribution]
Computing t value
The t variable representing the Student's t distribution is defined as
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
where $\bar{x}$ is the sample mean of n measurements, $\mu$ is the population mean and s is the
sample standard deviation.
Note that t is just like $Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ except that we replace
$\sigma$ with s. Unlike in our methods for large samples, $\sigma$ cannot be approximated by s
when the sample size is less than 30, and we cannot use the normal distribution. The table for
the t distribution is constructed for selected levels of confidence for degrees of freedom up
to 30. To use the table we need to know two numbers: the tail area (1 minus the confidence
level selected) and the degrees of freedom.
(1 - confidence level selected) is $\alpha$, the Greek letter alpha. This is the error we commit
in estimating.
The confidence interval for the population mean is $\bar{x} \pm t\frac{s}{\sqrt{n}}$
Example: A traffic department in a town is planning to determine the mean number of accidents
at a high-risk intersection. Only a random sample of 10 days' measurements was obtained. The
numbers of accidents per day were:
8 7 10 15 11 6 8 5 13 12
Construct a 95% confidence interval for the mean number of accidents per day.
a) Compute $\bar{x}$ and $s$
$\bar{x} = \frac{\sum x}{n} = \frac{95}{10} = 9.5$ per day
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{94.5}{9}} = 3.24$ per day
The confidence level is 95%, so $\alpha = 1 - 0.95 = 0.05$ for the two tails combined, and the
area in each tail is $\frac{\alpha}{2} = \frac{0.05}{2} = 0.025$.
The degrees of freedom are df = n - 1 = 10 - 1 = 9. From the t table, $t_{0.025,\,9} = 2.26$.
The confidence interval is $\bar{x} \pm t_{0.025,\,9}\frac{s}{\sqrt{n}}$
$= 9.5 \pm (2.26)\frac{3.24}{\sqrt{10}} = 9.5 \pm 2.3$, i.e. 7.2 to 11.8
With 95% confidence, the mean number of accidents per day at this particular intersection is
between 7.2 and 11.8.
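The same small-sample calculation can be sketched in Python. The standard library has no t table, so the critical value is passed in by hand; the 2.262 used here is the usual three-decimal table entry for 0.025 in each tail with 9 degrees of freedom, and the function name `t_confidence_interval` is an assumption for illustration.

```python
import math
import statistics


def t_confidence_interval(data, t_critical):
    """Small-sample confidence interval for the mean, given a t table value."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 denominator)
    margin = t_critical * s / math.sqrt(n)
    return x_bar, s, x_bar - margin, x_bar + margin


accidents = [8, 7, 10, 15, 11, 6, 8, 5, 13, 12]
# t_critical is looked up from a t table: tail area 0.025, df = 9.
x_bar, s, lower, upper = t_confidence_interval(accidents, t_critical=2.262)
print(x_bar, round(s, 2), round(lower, 1), round(upper, 1))
```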
Exercise
A quality controller of a company plans to inspect the average diameter of small bolts made.
A random sample of 6 bolts was selected. The sample mean is computed to be 2.0016 mm and the
sample standard deviation 0.0012 mm. Construct the 99% confidence interval for the mean
diameter of all bolts made.
Size of a sample must be determined scientifically. Care must be taken not to select a sample too
large or too small. The sample size should be mathematically determined.
2.2.4.1 Sample size for estimating Population mean
When the distribution of the sample mean $\bar{x}$ is normal, the standard normal variable Z is
given as
$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$, or $\bar{x} - \mu = Z\frac{\sigma}{\sqrt{n}}$
The value of Z in the above equation will be positive or negative, depending on whether the
sample mean $\bar{x}$ is larger or smaller than the population mean µ. The difference between
$\bar{x}$ and µ is called the sampling error or margin of error, E. Thus, the acceptable margin
of error (i.e. the maximum tolerable difference between the unknown population mean and the
sample estimate at a particular level of confidence) can be written as:
$E = \bar{x} - \mu = Z\frac{\sigma}{\sqrt{n}}$
$\sqrt{n} = \frac{Z\sigma}{E}$, so $n = \frac{Z^2\sigma^2}{E^2}$
If population standard deviation is not known, the sample standard deviation, s can be used to
determine the sample size, n.
Example: Given a population with a standard deviation of 8.6, what sample size is needed to
estimate the mean of the population within ± 0.5 with 99% confidence?
Solution: we have E = 0.5, Z = 2.576 at 99% confidence and $\sigma$ = 8.6
$n = \frac{Z^2\sigma^2}{E^2} = \frac{(2.576)^2(8.6)^2}{(0.5)^2} = 1963.13$, which is rounded up to n = 1964
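A minimal sketch of this sample-size formula follows. The function name `sample_size_for_mean` is hypothetical, and Z is passed in as the table value so that the result matches a hand calculation; rounding up is applied because a fractional observation cannot be sampled.

```python
import math


def sample_size_for_mean(z, sigma, margin_of_error):
    """Sample size n = (Z * sigma / E)^2, rounded up to a whole number."""
    return math.ceil((z * sigma / margin_of_error) ** 2)


# Example values: Z = 2.576 (99%), sigma = 8.6, E = 0.5.
n = sample_size_for_mean(z=2.576, sigma=8.6, margin_of_error=0.5)
print(n)
```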
2.2.4.2 Sample size for estimating population proportion
The method for determining a sample size for estimating the population proportion is similar to
that used in the previous section. We require that the sample proportion fall within ± E of
the population proportion.
The formula for determining the sample size n for a proportion is
$n = p(1 - p)\left(\frac{Z}{E}\right)^2$
Where: p = the estimated proportion
Z = the Z value for the selected confidence level
E = the maximum tolerable error
Example: A member of parliament wants to determine her popularity in her region. She
indicates that the proportion of voters who will vote for her must be estimated within ± 2
percent of the population proportion. Further, the 95% degree of confidence is to be used. In
past elections she received 40% of the popular vote in that area. She doubts whether it has
changed much. How many registered voters should be sampled?
Solution: Z = 1.96, p = 0.40 and E = 0.02
$n = p(1 - p)\left(\frac{Z}{E}\right)^2 = 0.40(1 - 0.40)\left(\frac{1.96}{0.02}\right)^2 = 2304.96 \approx 2305$
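The proportion version of the sample-size formula can be sketched the same way. The function name `sample_size_for_proportion` is made up for illustration; the inputs are the values given in the example.

```python
import math


def sample_size_for_proportion(p, z, margin_of_error):
    """Sample size n = p(1 - p)(Z / E)^2, rounded up to a whole number."""
    return math.ceil(p * (1 - p) * (z / margin_of_error) ** 2)


# Example values: p = 0.40, Z = 1.96 (95%), E = 0.02.
n = sample_size_for_proportion(p=0.40, z=1.96, margin_of_error=0.02)
print(n)
```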
$n_0 = \frac{(1.96)^2(20)^2}{(5)^2} = 61.4656$
Since the population size is finite, the revised sample size is obtained by using the
correction factor:
$n = \frac{n_0 N}{n_0 + (N - 1)} = \frac{(61.4656)(1000)}{61.4656 + (1000 - 1)} = 57.96$
Thus a slightly smaller sample size of n = 58 should be taken.
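The finite-population correction can be checked numerically. This sketch takes its inputs from the arithmetic shown above (Z = 1.96, a standard deviation of 20, E = 5 and N = 1000); the function name `finite_population_sample_size` is an assumption, not something defined in these notes.

```python
import math


def finite_population_sample_size(n0, population_size):
    """Revised sample size n = n0 * N / (n0 + (N - 1))."""
    return n0 * population_size / (n0 + (population_size - 1))


# Initial sample size from n0 = (Z * sigma / E)^2.
n0 = (1.96 * 20 / 5) ** 2
n = finite_population_sample_size(n0, population_size=1000)
print(round(n0, 4), round(n, 2), math.ceil(n))
```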