Chapter 4: Sampling, Estimation and Confidence Interval, Hypothesis Testing

UECM2623/UCCM2623 Numerical Methods and Statistics/UECM1693 Mathematics for Physics II
4.1 Shape of the sampling distribution of $\bar{X}$
The shape of the sampling distribution of $\bar{X}$ relates to the following two cases.
1. The population from which samples are drawn has a normal distribution.
2. The population from which samples are drawn does not have a normal distribution.
4.1.1 Sampling from a normally distributed population

If the population from which the samples are drawn is normally distributed with mean $\mu$ and standard
deviation $\sigma$, then the sampling distribution of the sample mean, $\bar{X}$, will also be normally distributed
with the following mean and standard deviation, irrespective of the sample size:
$\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
That means, if $X \sim N(\mu, \sigma^2)$, then $\bar{X} \sim N\left(\mu_{\bar{X}} = \mu,\ \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}\right)$.
Example 4.1. In a recent STAT test, the mean score for all examinees was 1016. Assume that the
distribution of STAT scores of all examinees is normal with a mean of 1016 and a standard deviation of
153. Let $\bar{X}$ be the mean STAT score of a random sample of certain examinees. Calculate the mean and
standard deviation of $\bar{X}$ and describe the shape of its sampling distribution when the sample size is
(a) 16
(b) 50
(c) 1000
Solution.
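As a numerical check on Example 4.1, the short Python sketch below applies $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \sigma/\sqrt{n}$ to the three sample sizes. The function name `sampling_distribution` is illustrative and not part of the original notes.

```python
import math

def sampling_distribution(mu, sigma, n):
    """Return (mean, standard deviation) of the sample mean for sample size n."""
    return mu, sigma / math.sqrt(n)

# Example 4.1: population mean 1016, population standard deviation 153
for n in (16, 50, 1000):
    mean, sd = sampling_distribution(1016, 153, n)
    print(f"n = {n:4d}: mean = {mean}, standard deviation = {sd:.4f}")
```

Because the population itself is normal, the sampling distribution is normal for every one of these sample sizes; only its spread shrinks as n grows.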
4.1.2 Sampling from a population that is not normally distributed

Central Limit Theorem
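For a relatively large sample size, the sampling distribution of $\bar{X}$ is approximately normal, regardless of
the distribution of the population under consideration. The mean and standard deviation of the sampling
distribution of $\bar{X}$ are $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$.
That means, for any distribution of X, if n is large, then approximately
$\bar{X} \sim N\left(\mu_{\bar{X}} = \mu,\ \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}\right)$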
Remark :
1. The sample size is usually considered to be large if $n \ge 30$.
2. As the sample size increases, the sampling distribution of $\bar{X}$ behaves more like a normal distribution
   and hence, the approximation is better.
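The Central Limit Theorem can be illustrated with a small simulation. The sketch below is only an illustration under assumed settings: it uses an exponential population with mean 1 and standard deviation 1 as a stand-in for a right-skewed population, and the function name, the number of repetitions and the seed are arbitrary choices, not part of the original notes.

```python
import math
import random
import statistics

def simulate_sample_means(n, repetitions=5000, seed=0):
    """Draw `repetitions` samples of size n from an exponential population
    (mean 1, standard deviation 1) and return the list of sample means."""
    rng = random.Random(seed)
    return [
        sum(rng.expovariate(1.0) for _ in range(n)) / n
        for _ in range(repetitions)
    ]

for n in (5, 30, 100):
    means = simulate_sample_means(n)
    print(
        f"n = {n:3d}: mean of sample means = {statistics.fmean(means):.3f}, "
        f"sd of sample means = {statistics.stdev(means):.3f}, "
        f"sigma/sqrt(n) = {1 / math.sqrt(n):.3f}"
    )
```

The simulated mean of the sample means should stay close to the population mean, and their standard deviation should track $\sigma/\sqrt{n}$ as n increases.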
Example 4.2. The mean rent paid by all tenants in a large city is RM1250 with a standard deviation of
RM225. However, the population distribution of rents for all tenants in this city is skewed to the right.
Calculate the mean and standard deviation of $\bar{X}$ and describe the shape of its sampling distribution when
the sample size is
(a) 30
(b) 100
Solution.
4.2 Application of the sampling distribution of $\bar{X}$
Example 4.3. Assume that the weights of all packages of a certain brand of cookies are normally
distributed with a mean of 32 ounces and a standard deviation of 0.3 ounce. Find the probability that the
mean weight, $\bar{X}$, of a random sample of 20 packages of this brand of cookies will be between 31.8 and
31.9 ounces.
Solution.
Example 4.4. The prices of the houses in Selangor have a skewed probability distribution with a mean of
RM165,300 and a standard deviation of RM29,500. Find the probability that the mean price, $\bar{X}$, of a
random sample of 400 houses in Selangor is
i) within RM3,000 of the population mean,
ii) less than the population mean by at least RM2,500.
Solution.
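Probabilities such as those in Examples 4.3 and 4.4 can be checked numerically by treating $\bar{X}$ as normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `sample_mean_distribution` is an illustrative choice. For Example 4.4 the normal model is the Central Limit Theorem approximation, since the population is skewed.

```python
import math
from statistics import NormalDist

def sample_mean_distribution(mu, sigma, n):
    """Normal model for the sample mean: N(mu, sigma / sqrt(n))."""
    return NormalDist(mu, sigma / math.sqrt(n))

# Example 4.3: P(31.8 < X-bar < 31.9) with mu = 32, sigma = 0.3, n = 20
cookies = sample_mean_distribution(32, 0.3, 20)
p_43 = cookies.cdf(31.9) - cookies.cdf(31.8)

# Example 4.4: mu = 165300, sigma = 29500, n = 400 (CLT approximation)
houses = sample_mean_distribution(165_300, 29_500, 400)
p_44_i = houses.cdf(165_300 + 3_000) - houses.cdf(165_300 - 3_000)  # within RM3,000
p_44_ii = houses.cdf(165_300 - 2_500)  # at least RM2,500 below the mean

print(f"Example 4.3:      {p_43:.4f}")
print(f"Example 4.4 (i):  {p_44_i:.4f}")
print(f"Example 4.4 (ii): {p_44_ii:.4f}")
```

The results should come out near 0.067, 0.958 and 0.045 respectively. Answers obtained from a printed z table can differ slightly in the third decimal place because the z scores are rounded to two decimals first.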
4.3 Estimation
The assignment of value(s) to a population parameter based on a value of the corresponding sample
statistic is called estimation.
The value(s) assigned to a population parameter based on the value of a sample statistic is called an
estimate. The sample statistic used to estimate a population parameter is called an estimator.
The estimation procedure involves the following steps:
1. Select a sample.
2. Collect the required information from the members of the sample.
3. Calculate the value of the sample statistic.
4. Assign value(s) to the corresponding population parameter.
A point estimate is a single value (or point) used to approximate a population parameter.
For example,
i) the sample proportion, $\hat{p}$, is the best point estimate of the population proportion, p.
ii) the sample mean, $\bar{X}$, is the best point estimate of the population mean, $\mu$.
iii) the sample variance, $s^2$, is the best point estimate of the population variance, $\sigma^2$.
An interval estimate is an interval that is constructed around the point estimate, and it is stated that this
interval is likely to contain the true value of a population parameter.
Each interval is constructed with regard to a given confidence level and is called a confidence interval.
The confidence level associated with a confidence interval states how much confidence we have that this
interval contains the true population parameter. The confidence level is denoted by $(1 - \alpha)100\%$.
4.3.1 Interval estimation of a population mean : large sample ($n \ge 30$)

The $(1 - \alpha)100\%$ confidence interval for $\mu$ for large samples ($n \ge 30$) is
$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$ if $\sigma$ is known
$\bar{x} \pm z_{\alpha/2} \frac{s}{\sqrt{n}}$ if $\sigma$ is not known
The value of $z_{\alpha/2}$ used here is read from the standard normal distribution table for the given confidence
level.
The maximum error of estimate for $\mu$, denoted by E, is the quantity that is subtracted from and added to
the value of $\bar{x}$ to obtain a confidence interval for $\mu$. Thus,
$E = z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$ or $E = z_{\alpha/2} \frac{s}{\sqrt{n}}$.
Example 4.5. A research department took a sample of 36 textbooks and collected information on their
prices. This information produced a mean price of RM54.40. It is known that the standard deviation of
the prices of all textbooks is RM4.50.
a) What is the point estimate of the mean price of all textbooks? What is the margin of error for the
   95% confidence interval?
b) Construct a 90% confidence interval for the mean price of all textbooks.
Solution.
Example 4.6. According to a recent survey, the workers employed in manufacturing industries earned an
average of RM546 per month. Assume that this mean is based on a random sample of 1000 workers
selected from the manufacturing industries and that the standard deviation of earnings for this sample is
RM75. Find a 99% confidence interval for the mean earnings of all workers employed in manufacturing
industries.
Solution.
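The large-sample interval $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ (with s in place of $\sigma$ when $\sigma$ is not known) can be sketched in Python as follows. The function name `z_confidence_interval` is illustrative. The critical value is computed with `statistics.NormalDist().inv_cdf` rather than read from a table, so results may differ from table-based answers in the last digit.

```python
import math
from statistics import NormalDist

def z_confidence_interval(x_bar, sd, n, confidence):
    """Large-sample confidence interval for the mean.

    `sd` is sigma when it is known, otherwise the sample standard deviation s.
    Returns (lower, upper, margin_of_error).
    """
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    margin = z * sd / math.sqrt(n)
    return x_bar - margin, x_bar + margin, margin

# Example 4.5: n = 36, x-bar = 54.40, sigma = 4.50
_, _, e95 = z_confidence_interval(54.40, 4.50, 36, 0.95)
lo90, hi90, _ = z_confidence_interval(54.40, 4.50, 36, 0.90)
print(f"Example 4.5 a) margin of error (95%): {e95:.2f}")
print(f"Example 4.5 b) 90% CI: ({lo90:.2f}, {hi90:.2f})")

# Example 4.6: n = 1000, x-bar = 546, s = 75
lo99, hi99, _ = z_confidence_interval(546, 75, 1000, 0.99)
print(f"Example 4.6    99% CI: ({lo99:.2f}, {hi99:.2f})")
```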
Sample size for estimating mean $\mu$
$n = \left( \frac{z_{\alpha/2}\,\sigma}{E} \right)^2$
where $z_{\alpha/2}$ = critical z score based on the desired confidence level
E = desired margin of error
$\sigma$ = population standard deviation
When finding the sample size n, if the formula does not result in a whole number, always increase
the value of n to the next larger whole number.
Remark
When $\sigma$ is not known, we can estimate $\sigma$ using these methods:
1. $\sigma \approx \text{range}/4$
2. Estimate the value of $\sigma$ by using an earlier result.
Example 4.7. We want to estimate the mean IQ score for the population of statistics professors. It is given
that the standard deviation of IQ scores for all statistics professors is 15. How many statistics professors
must be randomly selected for IQ tests if we want 99% confidence that the sample mean is within 2 IQ
points of the population mean?
Solution.
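The sample size formula $n = (z_{\alpha/2}\,\sigma / E)^2$, rounded up, can be sketched as below. The function name `sample_size_for_mean` is illustrative. The value 2.575 is assumed here as a typical rounded table entry for 99% confidence; the unrounded quantile is also shown because the two can land on different whole numbers when the raw result sits close to an integer.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(z, sigma, margin):
    """n = (z * sigma / E)^2, rounded up to the next whole number."""
    return math.ceil((z * sigma / margin) ** 2)

# Example 4.7: sigma = 15, E = 2, 99% confidence
n_table = sample_size_for_mean(2.575, 15, 2)    # rounded table value (assumed)
z_exact = NormalDist().inv_cdf(1 - 0.01 / 2)    # unrounded z_{0.005}
n_exact = sample_size_for_mean(z_exact, 15, 2)

print(f"with z = 2.575:       n = {n_table}")
print(f"with z = {z_exact:.4f}:      n = {n_exact}")
```

With the rounded table value the formula gives 372.97, so n = 373; with the unrounded quantile it gives about 373.21, so n = 374. Either answer follows the round-up rule; which one is expected depends on the table in use.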
4.3.2 Student's t-distribution

Student's t-distribution is a continuous distribution.
It is a symmetric, bell-shaped distribution that is flatter than the standard normal distribution.
As the sample size becomes larger, the t distribution approaches the standard normal distribution.
The t distribution has only one parameter, called the degrees of freedom (df), with df, $v = n - 1$, and it is
denoted by $t(v)$.
Example 4.8.
(a) Find the t-value of the t-distribution for the following:
    i) Area in the right tail = 0.05 and v = 5
    ii) Area in the left tail = 0.025 and v = 20
(b) Find the area in the appropriate tail of the t-distribution for the following:
    i) t = 2.467, v = 28
    ii) t = 2.878, v = 18
Solution.
4.3.3 Interval estimation of a population mean : small sample (n < 30)

Note:
1. For small sample size, the normal distribution is used to construct a confidence interval for $\mu$ if
   i) the sample is drawn from a normally distributed population; and
   ii) the value of $\sigma$ is known.
2. For small sample size, the t distribution is used to construct a confidence interval for $\mu$ if
   i) the sample is drawn from an (approximately) normally distributed population; and
   ii) $\sigma$ is not known.
The $(1 - \alpha)100\%$ confidence interval for $\mu$ for small samples (n < 30) is
$\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$
The value of $t_{\alpha/2}$ used here is read from the t-distribution table for degrees of freedom df, $v = n - 1$, and
the given confidence level.
Example 4.9. A doctor wanted to estimate the mean cholesterol level for all adult men living in a town.
He took a sample of 25 adult men from the town and found that the mean cholesterol level for this sample
is 186 with a standard deviation of 12. Assume that the cholesterol levels for all adult men in the town
are normally distributed. Construct a 95% confidence interval for the population mean $\mu$.
Solution.
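A minimal sketch of the small-sample interval $\bar{x} \pm t_{\alpha/2}\, s/\sqrt{n}$ for Example 4.9 follows. The Python standard library has no t-distribution quantile function, so the critical value is hard-coded from a t table: 2.064 is the tabulated $t_{0.025}$ for 24 degrees of freedom. The function name `t_confidence_interval` is illustrative.

```python
import math

def t_confidence_interval(x_bar, s, n, t_crit):
    """Small-sample confidence interval: x_bar +/- t_crit * s / sqrt(n)."""
    margin = t_crit * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Example 4.9: n = 25, x-bar = 186, s = 12, 95% confidence, df = 24
T_CRIT = 2.064  # table value t_{0.025} with 24 degrees of freedom
lower, upper = t_confidence_interval(186, 12, 25, T_CRIT)
print(f"95% CI for the mean cholesterol level: ({lower:.2f}, {upper:.2f})")
```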
4.4 Basics of Hypothesis Testing
In statistics, a hypothesis is a claim or statement about a property of a population.

A hypothesis test (or test of significance) is a standard procedure for testing a claim about a property of a
population.
4.4.1 Components of a Formal Hypothesis Test

Null hypothesis and alternative hypothesis
The null hypothesis (denoted by H0) is a statement about the value of a population parameter that is
assumed to be true until it is declared false.
The alternative hypothesis (denoted by H1 or Ha) is the statement that the parameter has a value that
somehow differs from the value stated in the null hypothesis.
Example 4.10. Give the relevant null hypothesis and alternative hypothesis.
a) Knowing that the proportion of drivers who admit to running red lights is at least 0.5, test if the
   proportion has changed.
b) The mean height of professional basketball players is at most 7 ft; test if the claim has changed.
c) The standard deviation of IQ scores of actors is equal to 15; test if the standard deviation
   i) has changed,
   ii) is getting smaller,
   iii) is getting bigger.
Solution.
Test statistic
The test statistic is a value computed from the sample data. It is used in making the decision about the
rejection of the null hypothesis.
Test statistic for mean (large sample):
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$ or $z = \frac{\bar{x} - \mu}{s / \sqrt{n}}$
Test statistic for mean (small sample):
$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$
Rejection and nonrejection region

The critical region (or rejection region) is the set of all values of the test statistic that cause us to reject
the null hypothesis.
The nonrejection region (or acceptance region) is the set of all values of the test statistic that cause us not
to reject the null hypothesis.
The critical value is a value that separates the critical region and the nonrejection region.
The significance level (denoted by $\alpha$) is the probability that the test statistic will fall in the critical region
when the null hypothesis is actually true.
Tails of a test
The tails in a distribution are the extreme regions bounded by critical values. Some hypothesis tests are
two-tailed (the critical region is in the two extreme regions), some are left-tailed (the critical region is in
the extreme left region) and some are right-tailed tests (the critical region is in the extreme right region).
                                        Two-tailed test   Left-tailed test   Right-tailed test
Sign in the null hypothesis H0          =                 = or ≥             = or ≤
Sign in the alternative hypothesis H1   ≠                 <                  >
Rejection region                        In both tails     In the left tail   In the right tail
Example 4.11. Write the null and alternative hypotheses for each of the following cases. Determine
whether each is a case of a two-tailed, a left-tailed or a right-tailed test.
a) According to the formal record, the mean family size was 3.19 in 1995. A researcher wants to
   check whether or not this mean has changed since 1995.
b) A company claims that the mean amount of soda in all soft-drink cans is 12 ounces. Suppose a
   consumer agency wants to test whether the mean amount of soda per can is less than 12 ounces.
c) A research report shows that the mean cholesterol of all adult males in KL was 175 in 1995. Test if
   the mean cholesterol of all adult males in KL is now higher than 175.
Solution.
Type I and Type II errors
                                            True State of Nature
Decision                                    The null hypothesis is true.   The null hypothesis is false.
We decide to reject the null hypothesis.    Type I error                   Correct decision
We fail to reject the null hypothesis.      Correct decision               Type II error
A Type I error occurs when a true null hypothesis is rejected. The value of $\alpha$ represents the probability
of committing this type of error, that is
$\alpha$ = P(H0 is rejected | H0 is true)
The value of $\alpha$ is the significance level of the test.
A Type II error occurs when a false null hypothesis is not rejected. The value of $\beta$ represents the
probability of committing a Type II error, that is
$\beta$ = P(H0 is not rejected | H0 is false)
The value of $1 - \beta$ is called the power of the test. It represents the probability of not making a Type II
error.
Steps to perform a test of hypothesis
1. State the null and alternative hypotheses.
2. Select the distribution to use (test statistic).
3. Calculate the value of the test statistic.
4. Determine the rejection and nonrejection regions or p-value.
5. Make a decision.
4.5 Hypothesis tests about a population mean : Large sample ($n \ge 30$)
In tests of hypothesis about $\mu$ for large samples ($n \ge 30$), the test statistic is
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$ if $\sigma$ is known
$z = \frac{\bar{x} - \mu}{s / \sqrt{n}}$ if $\sigma$ is not known
The value of z calculated for a sample mean $\bar{x}$ is also called the observed value of z.
Example 4.12. The management of Priority Health Club claims that its members lose an average of 10
pounds or more within the first month after joining the club. A consumer agency that wanted to check
this claim took a random sample of 36 members of this health club and found that they lost an average of
9.2 pounds within the first month of membership with a standard deviation of 2.4 pounds.
a) What will the conclusion be if using 1% significance level?
b) What are the Type I and Type II errors in this case?
c) Compute the probability of committing Type I error in this case.
d) What is the probability of making Type II error if the mean is changed to 9 pounds?
Solution.
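The calculations behind Example 4.12 can be sketched in Python as follows. This is one reading of the example, stated as an assumption: the claim "10 pounds or more" is taken as H0: $\mu \ge 10$ against H1: $\mu < 10$ (a left-tailed test), s is used in place of $\sigma$ as the large-sample formula allows, and the critical value comes from `statistics.NormalDist` instead of a printed table.

```python
import math
from statistics import NormalDist

# Example 4.12: H0: mu >= 10, H1: mu < 10 (assumed reading), alpha = 0.01
mu0, n, x_bar, s, alpha = 10, 36, 9.2, 2.4, 0.01
se = s / math.sqrt(n)                  # standard error of the sample mean

z_obs = (x_bar - mu0) / se             # observed value of z
z_crit = NormalDist().inv_cdf(alpha)   # left-tail critical value
reject_h0 = z_obs < z_crit

# (c) P(Type I error) equals the significance level alpha.
# (d) P(Type II error) if the true mean is 9: H0 is not rejected
#     whenever the sample mean is at or above the critical sample mean.
x_crit = mu0 + z_crit * se
beta = 1 - NormalDist(9, se).cdf(x_crit)

print(f"observed z = {z_obs:.3f}, critical z = {z_crit:.3f}, reject H0: {reject_h0}")
print(f"P(Type I error) = {alpha}, P(Type II error | mu = 9) = {beta:.4f}")
```

Under these assumptions the observed z of about -2.0 does not fall below the critical value of about -2.33, so H0 is not rejected at the 1% level, and the Type II error probability for a true mean of 9 is roughly 0.43.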
The p-value method in hypothesis testing

Right-tailed test:   p-value = area to the right of the test statistic z
Left-tailed test:    p-value = area to the left of the test statistic z
Two-tailed test:     p-value = twice the area of the extreme region bounded by the test statistic z

Criteria of the decision making

Reject H0            if p-value $\le \alpha$ (the significance level)
Do not reject H0     if p-value $> \alpha$ (the significance level)
Example 4.13. A sample of 106 body temperatures has a mean of 98.20°F. Assume that the sample
is a simple random sample and that the population standard deviation is known to be 0.62°F. Use a
0.05 significance level to test the common belief that the mean body temperature of healthy adults is
equal to 98.60°F. Find the p-value of the test.
Solution.
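The p-value method for Example 4.13 can be sketched as below, assuming H0: $\mu = 98.60$ against H1: $\mu \ne 98.60$ (a two-tailed test) and a known population standard deviation. The function name `two_tailed_p_value` is illustrative.

```python
import math
from statistics import NormalDist

def two_tailed_p_value(x_bar, mu0, sigma, n):
    """Return (z, p-value) for a two-tailed z test about a population mean."""
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    p = 2 * NormalDist().cdf(-abs(z))  # twice the tail area beyond |z|
    return z, p

# Example 4.13: n = 106, x-bar = 98.20, sigma = 0.62, mu0 = 98.60, alpha = 0.05
z, p = two_tailed_p_value(98.20, 98.60, 0.62, 106)
print(f"z = {z:.2f}, p-value = {p:.2e}")
print("Reject H0" if p <= 0.05 else "Do not reject H0")
```

The observed z is far in the left tail (about -6.64), so the p-value is essentially zero and H0 is rejected at the 0.05 level.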
4.6 Hypothesis tests about a population mean : small sample (n < 30)
Conditions under which the t distribution is used to make tests of hypotheses about $\mu$
The t distribution is used to conduct a test of hypotheses about $\mu$ if
1. The population from which the sample is drawn is (approximately) normally distributed.
2. The sample size is small (that is, n < 30).
3. The population standard deviation $\sigma$ is not known.
In tests of hypothesis about $\mu$ for small samples (n < 30), the test statistic is
$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$, with $v = n - 1$
The value of t calculated for a sample mean $\bar{x}$ is also called the observed value of t.
Note that
The Normal distribution is used to conduct a test of hypotheses about $\mu$ if
1. The population from which the sample is drawn is (approximately) normally distributed.
2. The sample size is small (that is, n < 30).
3. The population standard deviation $\sigma$ is known.
Example 4.14. From the past record of a bank, with the old computer system, a teller at this bank could
serve, on average, 22 customers per hour. Recently, a new system was installed in the expectation that it
would increase the service rate. To check if the new computer system is more efficient than the old system, the
management took a random sample of 18 hours and found that during these hours the mean number of
customers served by tellers was 28 per hour with a standard deviation of 2.5. Testing at the 1%
significance level, would you conclude that the new computer system is more efficient than the old
computer system? Assume that the number of customers served per hour by a teller is approximately
normally distributed.
Solution.
Example 4.15. A psychologist claims that the mean age at which children start walking is 12.5 months.
Carol wanted to check if this claim is true. She took a random sample of 18 children and found that the
mean age at which these children started walking was 12.9 months with a standard deviation of 0.80
month. Using the 1% significance level, can you conclude that the mean age at which all children start
walking is different from 12.5 months? Assume that the ages at which all children start walking have an
approximately normal distribution.
Solution.
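The small-sample test statistic $t = (\bar{x} - \mu)/(s/\sqrt{n})$ for Examples 4.14 and 4.15 can be sketched as follows. The critical values are hard-coded from a t table with 17 degrees of freedom (2.567 for a right-tail area of 0.01, 2.898 for a tail area of 0.005), because the Python standard library has no t-distribution functions. The hypotheses in the comments are assumed readings of the two examples.

```python
import math

def t_statistic(x_bar, mu0, s, n):
    """Observed t for a test about a population mean, with df = n - 1."""
    return (x_bar - mu0) / (s / math.sqrt(n))

# Example 4.14: H0: mu = 22, H1: mu > 22 (right-tailed), alpha = 0.01, df = 17
t_414 = t_statistic(28, 22, 2.5, 18)
reject_414 = t_414 > 2.567          # table value t_{0.01}, 17 df

# Example 4.15: H0: mu = 12.5, H1: mu != 12.5 (two-tailed), alpha = 0.01, df = 17
t_415 = t_statistic(12.9, 12.5, 0.80, 18)
reject_415 = abs(t_415) > 2.898     # table value t_{0.005}, 17 df

print(f"Example 4.14: t = {t_414:.3f}, reject H0: {reject_414}")
print(f"Example 4.15: t = {t_415:.3f}, reject H0: {reject_415}")
```

Under these assumptions, the observed t in Example 4.14 lies far beyond the critical value, so H0 is rejected, while in Example 4.15 the observed t stays inside the nonrejection region, so H0 is not rejected.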