Chapter 4: Sampling, Estimation and Confidence Interval, Hypothesis Testing
The shape of the sampling distribution of $\bar{X}$ depends on which of the following two cases applies.
1. The population from which samples are drawn has a normal distribution.
2. The population from which samples are drawn does not have a normal distribution.
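In case 1, for any sample size $n$, $\bar{X} \sim N\left(\mu_{\bar{X}} = \mu,\ \sigma_{\bar{X}}^2 = \dfrac{\sigma^2}{n}\right)$, so the standard deviation of $\bar{X}$ is $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.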
Example 4.1. In a recent STAT test, the mean score for all examinees was 1016. Assume that the
distribution of STAT scores of all examinees is normal with a mean of 1016 and a standard deviation of
153. Let $\bar{X}$ be the mean STAT score of a random sample of examinees. Calculate the mean and
standard deviation of $\bar{X}$ and describe the shape of its sampling distribution when the sample size is
(a) 16
(b) 50
(c) 1000
Solution.
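A minimal Python sketch for checking the arithmetic in Example 4.1 (not the official worked solution). Because the population is normal, $\bar{X}$ is exactly normal for every $n$, with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. The function name `sampling_dist_of_mean` is our own.

```python
import math


def sampling_dist_of_mean(mu: float, sigma: float, n: int) -> tuple[float, float]:
    """Return (mean, standard deviation) of the sample mean X-bar."""
    return mu, sigma / math.sqrt(n)


# Example 4.1: normal population with mu = 1016 and sigma = 153
for n in (16, 50, 1000):
    mean, sd = sampling_dist_of_mean(1016, 153, n)
    # Normal population, so X-bar is exactly normal for any n
    print(f"n = {n:4d}: mean = {mean}, sd = {sd:.2f}, shape: normal")
```

The standard deviation shrinks as $n$ grows (38.25, about 21.64 and about 4.84), while the mean stays at 1016.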
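For case 2, the central limit theorem applies: whatever the distribution of $X$ in the population, if $n$ is large then, approximately,
$$\bar{X} \sim N\left(\mu_{\bar{X}} = \mu,\ \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}\right).$$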
Remark:
1. The sample size is usually considered to be large if $n \ge 30$.
2. As the sample size increases, the sampling distribution of $\bar{X}$ becomes closer to a normal distribution, and hence the approximation improves.
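The remark can be illustrated with a small simulation, assuming an exponential population (mean 1, standard deviation 1), which is strongly skewed to the right. The population choice, the number of repetitions and the seed are arbitrary illustration values. The standard deviation of the simulated sample means should be close to $1/\sqrt{n}$, and their skewness should move toward 0 (the value for a normal curve) as $n$ grows.

```python
import random
import statistics


def simulate_sample_means(n: int, reps: int = 2000, seed: int = 1) -> list[float]:
    """Draw `reps` samples of size n from a right-skewed population
    (exponential with mean 1 and sd 1) and return the sample means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]


def skewness(values: list[float]) -> float:
    """Moment-based sample skewness; close to 0 for a symmetric shape such as the normal."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - m) / sd) ** 3 for v in values)


for n in (2, 30):
    means = simulate_sample_means(n)
    print(f"n = {n:2d}: mean of X-bar = {statistics.fmean(means):.3f}, "
          f"sd of X-bar = {statistics.pstdev(means):.3f} "
          f"(theory 1/sqrt(n) = {1 / n ** 0.5:.3f}), skewness = {skewness(means):.2f}")
```

For an exponential population the theoretical skewness of $\bar{X}$ is $2/\sqrt{n}$, so it falls from about 1.41 at $n = 2$ to about 0.37 at $n = 30$.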
Example 4.2. The mean rent paid by all tenants in a large city is RM1250 with a standard deviation of
RM225. However, the population distribution of rents for all tenants in this city is skewed to the right.
Calculate the mean and standard deviation of $\bar{X}$ and describe the shape of its sampling distribution when
the sample size is
(a) 30
(b) 100
Solution.
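A sketch that encodes the two cases of the sampling distribution together with the $n \ge 30$ rule from the remark. The function `describe_sampling_dist` is a hypothetical helper written for these notes. For Example 4.2 the population is skewed, so the normal shape comes from the central limit theorem rather than from the population.

```python
import math


def describe_sampling_dist(mu: float, sigma: float, n: int,
                           population_is_normal: bool) -> tuple[float, float, str]:
    """Return (mean, sd, shape) of the sampling distribution of X-bar."""
    sd = sigma / math.sqrt(n)
    if population_is_normal:
        shape = "normal (exact)"
    elif n >= 30:
        shape = "approximately normal (central limit theorem)"
    else:
        shape = "not determined by these rules"
    return mu, sd, shape


# Example 4.2: skewed population, mu = RM1250, sigma = RM225
for n in (30, 100):
    mean, sd, shape = describe_sampling_dist(1250, 225, n, population_is_normal=False)
    print(f"n = {n}: mean = RM{mean}, sd = RM{sd:.2f}, shape: {shape}")
```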
4.2
Example 4.3. Assume that the weights of all packages of a certain brand of cookies are normally
distributed with a mean of 32 ounces and a standard deviation of 0.3 ounce. Find the probability that the
mean weight, $\bar{X}$, of a random sample of 20 packages of this brand of cookies will be between 31.8 and
31.9 ounces.
Solution.
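A sketch for Example 4.3 that uses Python's standard-library `statistics.NormalDist` in place of the printed standard normal table. Since the population is normal, $\bar{X}$ is normal with mean 32 and standard deviation $0.3/\sqrt{20}$.

```python
import math
from statistics import NormalDist

mu, sigma, n = 32, 0.3, 20
se = sigma / math.sqrt(n)              # sd of X-bar = 0.3 / sqrt(20)
xbar_dist = NormalDist(mu, se)         # population is normal, so X-bar is normal

# P(31.8 < X-bar < 31.9)
p = xbar_dist.cdf(31.9) - xbar_dist.cdf(31.8)

# Equivalent z values for a table lookup
z_low, z_high = (31.8 - mu) / se, (31.9 - mu) / se
print(f"z from {z_low:.2f} to {z_high:.2f}, P = {p:.4f}")
```

With table values for $z = -2.98$ and $z = -1.49$, the table gives $0.0681 - 0.0014 = 0.0667$; the unrounded computation gives about 0.0666.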
Example 4.4. The prices of the houses in Selangor have a skewed probability distribution with a mean of
RM165,300 and a standard deviation of RM29,500. Find the probability that the mean price, $\bar{X}$, of a
random sample of 400 houses in Selangor is
i) within RM3,000 of the population mean,
ii) less than the population mean by at least RM2,500.
Solution.
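A sketch for Example 4.4. The population is skewed, but $n = 400 \ge 30$, so $\bar{X}$ is treated as approximately normal with standard deviation $29{,}500/\sqrt{400} = 1{,}475$. Part (ii) is read here as $P(\bar{X} \le \mu - 2{,}500)$.

```python
import math
from statistics import NormalDist

mu, sigma, n = 165_300, 29_500, 400
se = sigma / math.sqrt(n)            # 29,500 / 20 = 1,475
xbar = NormalDist(mu, se)            # approximately normal by the CLT since n >= 30

# (i) P(mu - 3000 <= X-bar <= mu + 3000)
p_within = xbar.cdf(mu + 3000) - xbar.cdf(mu - 3000)

# (ii) P(X-bar <= mu - 2500)
p_below = xbar.cdf(mu - 2500)
print(f"se = {se}, (i) {p_within:.4f}, (ii) {p_below:.4f}")
```

The results are roughly 0.958 for (i) and 0.045 for (ii); table lookups with $z$ rounded to $\pm 2.03$ and $-1.69$ give 0.9576 and 0.0455.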
4.3 Estimation
The assignment of value(s) to a population parameter based on a value of the corresponding sample
statistic is called estimation.
The value(s) assigned to a population parameter based on the value of a sample statistic is called an
estimate. The sample statistic used to estimate a population parameter is called an estimator.
The estimation procedure involves the following steps:
1. Select a sample.
2. Collect the required information from the members of the sample.
3. Calculate the value of the sample statistic.
4. Assign value(s) to the corresponding population parameter.
A point estimate is a single value (or point) used to approximate a population parameter.
For example,
i) the sample proportion, $\hat{p}$, is the best point estimate of the population proportion, $p$.
ii) the sample mean, $\bar{x}$, is the best point estimate of the population mean, $\mu$.
iii) the sample variance, $s^2$, is the best point estimate of the population variance, $\sigma^2$.
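A short sketch of computing the three point estimates with Python's `statistics` module. The data values are made up purely for illustration. Note that `statistics.variance` divides by $n - 1$, which matches the sample variance $s^2$; `statistics.pvariance` would divide by $n$ instead.

```python
import statistics

# Hypothetical samples (illustrative values only, not from the notes)
values = [2.0, 4.0, 4.0, 5.0]
answers = [1, 0, 1, 1, 0]   # 1 = "yes", 0 = "no"

x_bar = statistics.mean(values)          # point estimate of the population mean mu
s2 = statistics.variance(values)         # sample variance (divisor n - 1), estimates sigma^2
p_hat = sum(answers) / len(answers)      # sample proportion, estimates p

print(f"x_bar = {x_bar}, s^2 = {s2:.4f}, p_hat = {p_hat}")
```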
An interval estimate is an interval constructed around the point estimate, with the statement that this
interval is likely to contain the true value of the population parameter.
Each interval is constructed with regard to a given confidence level and is called a confidence interval.
The confidence level associated with a confidence interval states how much confidence we have that this
interval contains the true population parameter. The confidence level is denoted by $(1-\alpha)100\%$.
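The $(1-\alpha)100\%$ confidence interval for $\mu$ is
$$\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \quad \text{if } \sigma \text{ is known}, \qquad \bar{x} \pm z_{\alpha/2}\,\frac{s}{\sqrt{n}} \quad \text{if } \sigma \text{ is not known}.$$
The value of $z_{\alpha/2}$ used here is read from the standard normal distribution table for the given confidence level.
The maximum error of estimate for $\mu$, denoted by $E$, is the quantity that is subtracted from and added to the value of $\bar{x}$ to obtain a confidence interval for $\mu$. Thus,
$$E = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \quad \text{or} \quad E = z_{\alpha/2}\,\frac{s}{\sqrt{n}}.$$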
Example 4.5. A research department took a sample of 36 textbooks and collected information on their
prices. This information produced a mean price of RM54.40. It is known that the standard deviation of
the prices of all textbooks is RM4.50.
a) What is the point estimate of the mean price of all textbooks? What is the margin of error for the
95% confidence interval?
b) Construct a 90% confidence interval for the mean price of all textbooks.
Solution.
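A sketch of the z interval $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ applied to Example 4.5, with $z_{\alpha/2}$ taken from `statistics.NormalDist().inv_cdf` rather than from the table. The helper name `z_interval` is chosen here for illustration.

```python
import math
from statistics import NormalDist


def z_interval(x_bar: float, sd: float, n: int, confidence: float) -> tuple[float, float, float]:
    """(1 - alpha)100% z interval for mu: x_bar +/- z_{alpha/2} * sd / sqrt(n).
    Pass sigma when it is known (or s for a large sample). Returns (low, high, E)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    E = z * sd / math.sqrt(n)                 # maximum error of estimate
    return x_bar - E, x_bar + E, E


# Example 4.5: n = 36, x_bar = RM54.40, sigma = RM4.50 (known)
_, _, E95 = z_interval(54.40, 4.50, 36, 0.95)
low90, high90, _ = z_interval(54.40, 4.50, 36, 0.90)
print(f"point estimate = RM54.40, E(95%) = RM{E95:.2f}, "
      f"90% CI = (RM{low90:.2f}, RM{high90:.2f})")
```

This gives $E \approx$ RM1.47 for 95% confidence and a 90% interval of about RM53.17 to RM55.63. Using the rounded table value $z = 1.65$ instead gives RM53.16 to RM55.64.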
Example 4.6. According to a recent survey, the workers employed in manufacturing industries earned an
average of RM546 per month. Assume that this mean is based on a random sample of 1000 workers
selected from the manufacturing industries and that the standard deviation of earnings for this sample is
RM75. Find a 99% confidence interval for the mean earnings of all workers employed in manufacturing
industries.
Solution.
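In Example 4.6, $\sigma$ is unknown but the sample is large ($n = 1000$), so the sample standard deviation $s$ takes the place of $\sigma$ in the z interval. A minimal sketch:

```python
import math
from statistics import NormalDist

n, x_bar, s = 1000, 546, 75
z = NormalDist().inv_cdf(0.995)   # z_{alpha/2} for 99% confidence (about 2.576)
E = z * s / math.sqrt(n)          # sigma unknown, n large: use s in place of sigma
low, high = x_bar - E, x_bar + E
print(f"99% CI = (RM{low:.2f}, RM{high:.2f})")
```

The interval is about RM539.89 to RM552.11; with the table value $z = 2.58$ it becomes about RM539.88 to RM552.12.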
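The sample size needed to estimate $\mu$ with a maximum error of estimate $E$ at a $(1-\alpha)100\%$ confidence level is
$$n = \frac{z_{\alpha/2}^2\,\sigma^2}{E^2}.$$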
When finding the sample size $n$, if the formula does not give a whole number, always round $n$ up to the
next whole number.
Remark
When $\sigma$ is not known, we can estimate it using either of these methods:
1. $\sigma \approx \dfrac{\text{range}}{4}$
2. Estimate the value of $\sigma$ from an earlier result.
Example 4.7. We want to estimate the mean IQ score for the population of statistics professors. Assume
that the standard deviation of IQ scores for all statistics professors is 15. How many statistics professors
must be randomly selected for IQ tests if we want 99% confidence that the sample mean is within 2 IQ
points of the population mean?
Solution.
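A sketch of the sample-size formula together with the round-up rule, using `math.ceil`. The function name `required_sample_size` is hypothetical.

```python
import math
from statistics import NormalDist


def required_sample_size(sigma: float, E: float, confidence: float) -> int:
    """n = z_{alpha/2}^2 * sigma^2 / E^2, always rounded UP to a whole number."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{alpha/2}
    return math.ceil((z * sigma / E) ** 2)


# Example 4.7: sigma = 15, E = 2 IQ points, 99% confidence
n = required_sample_size(sigma=15, E=2, confidence=0.99)
print(n)
```

With the unrounded $z_{0.005} \approx 2.5758$ this gives $n = 374$ (373.21 before rounding up). With the rounded table value 2.58 the formula gives 374.42 and hence $n = 375$, so the answer depends slightly on how $z$ is rounded.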
4.4
Test statistic
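The test statistic is a value computed from the sample data. It is used in making the decision about the
rejection of the null hypothesis.
Test statistic for the mean (large sample):
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \quad \text{or} \quad z = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
Test statistic for the mean (small sample):
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$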
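                                   Two-tailed test   Left-tailed test   Right-tailed test
Sign in the null hypothesis H0     =                 = or ≥             = or ≤
Sign in the alternative H1         ≠                 <                  >
Rejection region                   In both tails     In the left tail   In the right tail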
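A sketch of how the sign in $H_1$ decides where the rejection region lies for a z test at significance level $\alpha$. The function names and the `tail` strings are illustrative choices, not a standard API.

```python
from statistics import NormalDist


def z_critical_values(alpha: float, tail: str) -> tuple[float, ...]:
    """Critical value(s) of z that bound the rejection region."""
    std = NormalDist()
    if tail == "two":                      # H1 uses "not equal": both tails
        c = std.inv_cdf(1 - alpha / 2)
        return (-c, c)
    if tail == "left":                     # H1 uses "<": left tail
        return (std.inv_cdf(alpha),)
    if tail == "right":                    # H1 uses ">": right tail
        return (std.inv_cdf(1 - alpha),)
    raise ValueError("tail must be 'two', 'left' or 'right'")


def reject_h0(z: float, alpha: float, tail: str) -> bool:
    """True if the observed z falls in the rejection region."""
    crit = z_critical_values(alpha, tail)
    if tail == "two":
        return z <= crit[0] or z >= crit[1]
    if tail == "left":
        return z <= crit[0]
    return z >= crit[0]
```

For example, `reject_h0(2.0, 0.05, "two")` is True because 2.0 exceeds 1.96, while `reject_h0(2.0, 0.01, "two")` is False because the critical value rises to about 2.576.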
Example 4.11. Write the null and alternative hypotheses for each of the following cases, and determine
whether each is a two-tailed, a left-tailed or a right-tailed test.
a) According to the official record, the mean family size was 3.19 in 1995. A researcher wants to
check whether or not this mean has changed since 1995.
b) A company claims that the mean amount of soda in all soft-drink cans is 12 ounces. Suppose a
consumer agency wants to test whether the mean amount of soda per can is less than 12 ounces.
c) A research report shows that the mean cholesterol level of all adult males in KL was 175 in 1995. Test
whether the mean cholesterol level of all adult males in KL is now higher than 175.
Solution.
                      Actual situation
Decision              H0 is true          H0 is false
Do not reject H0      Correct decision    Type II error
Reject H0             Type I error        Correct decision
A Type I error occurs when a true null hypothesis is rejected. The value of $\alpha$ represents the probability
of committing this type of error, that is
$$\alpha = P(H_0 \text{ is rejected} \mid H_0 \text{ is true}).$$
The value of $\alpha$ is the significance level of the test.
A Type II error occurs when a false null hypothesis is not rejected. The value of $\beta$ represents the
probability of committing a Type II error, that is
$$\beta = P(H_0 \text{ is not rejected} \mid H_0 \text{ is false}).$$
The value of $1-\beta$ is called the power of the test. It represents the probability of not making a Type II
error.
Steps to perform a test of hypothesis
1.
2.
3.
4.
5.
4.5
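In tests of hypothesis about $\mu$ for large samples ($n \ge 30$), the test statistic is
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \quad \text{if } \sigma \text{ is known}, \qquad z = \frac{\bar{x} - \mu}{s/\sqrt{n}} \quad \text{if } \sigma \text{ is not known}.$$
The value of $z$ calculated for a sample mean $\bar{x}$ is also called the observed value of $z$.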
Example 4.12. The management of Priority Health Club claims that its members lose an average of 10
pounds or more within the first month after joining the club. A consumer agency that wanted to check
this claim took a random sample of 36 members of this health club and found that they lost an average of
9.2 pounds within the first month of membership with a standard deviation of 2.4 pounds.
a) What is the conclusion if the test uses the 1% significance level?
b) What are the Type I and Type II errors in this case?
c) Compute the probability of committing a Type I error in this case.
d) What is the probability of making a Type II error if the true mean is 9 pounds?
Solution.
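A sketch of parts (a), (c) and (d), assuming $H_0: \mu \ge 10$ against $H_1: \mu < 10$ (the club's claim is the null hypothesis) and, since $n = 36$ is large, using $s = 2.4$ in place of $\sigma$, also when computing $\beta$.

```python
import math
from statistics import NormalDist

mu0, n, x_bar, s, alpha = 10, 36, 9.2, 2.4, 0.01
se = s / math.sqrt(n)               # 2.4 / 6 = 0.4 (s stands in for sigma, n >= 30)
std = NormalDist()

# (a) Left-tailed test of H0: mu >= 10 against H1: mu < 10
z_obs = (x_bar - mu0) / se          # observed value of z
z_crit = std.inv_cdf(alpha)         # about -2.326
reject = z_obs <= z_crit

# (c) P(Type I error) is the significance level
p_type1 = alpha

# (d) If the true mean is 9, a Type II error happens when x_bar lands above the cutoff
cutoff = mu0 + z_crit * se          # H0 is rejected when x_bar <= cutoff
beta = 1 - NormalDist(9, se).cdf(cutoff)   # P(do not reject H0 | mu = 9)
print(f"z = {z_obs:.2f}, critical z = {z_crit:.3f}, reject H0: {reject}, "
      f"P(Type I) = {p_type1}, beta = {beta:.4f}")
```

Here $z = -2.00$ is not below the critical value of about $-2.326$, so $H_0$ is not rejected at the 1% level. The cutoff is $\bar{x} \approx 9.069$, and if the true mean is 9 pounds, $\beta \approx 0.431$ (power $\approx 0.569$).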
4.6 Hypothesis tests about a population mean: small sample (n < 30)
Conditions under which the t distribution is used to make tests of hypotheses about $\mu$
The t distribution is used to conduct a test of hypotheses about $\mu$ if
1. The population from which the sample is drawn is (approximately) normally distributed.
2. The sample size is small (that is, $n < 30$).
3. The population standard deviation is not known.
In tests of hypothesis about $\mu$ for small samples ($n < 30$), the test statistic is
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}, \quad \text{with } \nu = n - 1 \text{ degrees of freedom}.$$
The value of $t$ calculated for a sample mean $\bar{x}$ is also called the observed value of $t$.
Note that the normal distribution is used to conduct a test of hypotheses about $\mu$ if
1. The population from which the sample is drawn is (approximately) normally distributed.
2. The sample size is small (that is, $n < 30$).
3. The population standard deviation is known.
Example 4.14. According to the past records of a bank, with the old computer system a teller at this bank could
serve, on average, 22 customers per hour. Recently, a new system was installed in the expectation that it would
increase the service rate. To check if the new computer system is more efficient than the old system, the
management took a random sample of 18 hours and found that during these hours the mean number of
customers served by tellers was 28 per hour with a standard deviation of 2.5. Testing at the 1%
significance level, would you conclude that the new computer system is more efficient than the old
computer system? Assume that the number of customers served per hour by a teller is approximately
normally distributed.
Solution.
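A sketch of the right-tailed t test for Example 4.14 ($H_0: \mu \le 22$, $H_1: \mu > 22$). The Python standard library has no t distribution, so the critical value $t_{0.01,\,17} = 2.567$ is copied from a standard t table.

```python
import math

mu0, n, x_bar, s = 22, 18, 28, 2.5
df = n - 1                                   # 17 degrees of freedom
t_obs = (x_bar - mu0) / (s / math.sqrt(n))   # observed value of t
t_crit = 2.567                               # t table: df = 17, area 0.01 in the right tail
reject = t_obs >= t_crit                     # right-tailed test
print(f"df = {df}, t = {t_obs:.2f}, critical t = {t_crit}, reject H0: {reject}")
```

The observed $t \approx 10.18$ is far beyond 2.567, so $H_0$ is rejected: at the 1% level the data support the claim that the new system is more efficient. If SciPy is available, `scipy.stats.t.ppf(0.99, 17)` reproduces the table value (about 2.567).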
Example 4.15. A psychologist claims that the mean age at which children start walking is 12.5 months.
Carol wanted to check if this claim is true. She took a random sample of 18 children and found that the
mean age at which these children started walking was 12.9 months with a standard deviation of 0.80
month. Using the 1% significance level, can you conclude that the mean age at which all children start
walking is different from 12.5 months? Assume that the ages at which all children start walking have an
approximately normal distribution.
Solution.
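The same approach applies to the two-tailed test in Example 4.15 ($H_0: \mu = 12.5$, $H_1: \mu \ne 12.5$). With $\alpha = 0.01$ split between the two tails, the critical value taken from a standard t table is $t_{0.005,\,17} = 2.898$.

```python
import math

mu0, n, x_bar, s = 12.5, 18, 12.9, 0.80
df = n - 1                                   # 17 degrees of freedom
t_obs = (x_bar - mu0) / (s / math.sqrt(n))   # observed value of t
t_crit = 2.898                               # t table: df = 17, area 0.005 in each tail
reject = abs(t_obs) >= t_crit                # two-tailed: reject in either tail
print(f"df = {df}, t = {t_obs:.3f}, critical t = +/-{t_crit}, reject H0: {reject}")
```

Since $|t| \approx 2.12 < 2.898$, $H_0$ is not rejected: at the 1% level there is not enough evidence that the mean walking age differs from 12.5 months.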