ch10 Slides


Department of Quantitative Methods & Information Systems

Introduction to Business Statistics


QM 220
Chapter 10

Dr. Mohammad Zainal


Chapter 10: Estimation and hypothesis testing: two populations

What are we going to cover?


10.1 Comparing Two Population Means Using Large Independent Samples
10.2 Comparing Two Population Means Using Small Independent Samples: Equal standard deviations
10.3 Comparing Two Population Means Using Small Independent Samples: Unequal standard deviations
10.4 Paired Difference Experiments
10.5 Comparing Two Population Proportions Using Large Independent Samples

QM-220, M. Zainal 2

Why do we need to compare two populations?


- You have just received job offers in two different countries with the same salary.
- You want to invest your money in one of two different stock markets.
- A pharmaceutical company has just announced a new drug that it claims is better than Panadol.
- A bank manager has introduced a new service policy that is supposed to reduce waiting time.
- You want to decide between two cars.


10.1 Inferences about the difference between two population means for large and independent samples
Let µ1 be the mean of the first population and µ2 the mean of the second population. Suppose we want to construct a confidence interval and test a hypothesis about the difference between these two population means, that is, µ1 - µ2.
Let x̄1 be the mean of a sample taken from the first population and x̄2 the mean of a sample taken from the second population. Then x̄1 - x̄2 is the sample statistic used to make an interval estimate and to test a hypothesis about µ1 - µ2.

10.1.1 Independent versus Dependent Samples
Two samples are independent if they are drawn from two
different populations and the elements of one sample have
no relationship to the elements of the second sample.
If the elements of the two samples are somehow related,
then the samples are said to be dependent.
Thus, for two independent samples, the selection of one sample has no effect on the selection of the other.

Example: Suppose we want to estimate the difference between the
mean salaries of all male and all female executives. To do so, we draw
two samples, one from the population of male executives and another
from the population of female executives. These two samples are
independent because they are drawn from two different populations,
and the samples have no effect on each other.

Example: Suppose we want to estimate the difference between the mean weights of all participants before and after a weight loss program. To accomplish this, suppose we take a sample of 40 participants and measure their weights before and after the completion of this program. Note that these two samples include the same 40 participants. This is an example of two dependent samples. Such samples are also called paired or matched samples.

10.1.2 Mean, standard deviation, and sampling distribution of x̄1 - x̄2
Suppose we select two independent large samples from two different populations, referred to as population 1 and population 2, and let
µ1 = the mean of population 1, µ2 = the mean of population 2
σ1 = the standard deviation of population 1, σ2 = the standard deviation of population 2
n1 = the size of the sample drawn from population 1, n2 = the size of the sample drawn from population 2
x̄1 = the mean of sample 1, x̄2 = the mean of sample 2

If independent random samples are taken from two populations, then the sampling distribution of the difference in sample means x̄1 - x̄2
- is normal if each of the sampled populations is normal, and approximately normal if the sample sizes n1 and n2 are large;
- has mean
$$\mu_{\bar{x}_1-\bar{x}_2} = \mu_1 - \mu_2$$
- has standard deviation
$$\sigma_{\bar{x}_1-\bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
or, when σ1 and σ2 are unknown, the estimate
$$s_{\bar{x}_1-\bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
10.1.3 Interval estimate of µ1 - µ2
If σ1 and σ2 are known and the two independent samples come from populations that are normal, or each of the sample sizes is large, the 100(1 - α)% confidence interval for µ1 - µ2 is
$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

If σ1 and σ2 are unknown and each of the sample sizes is large (n1 ≥ 30 and n2 ≥ 30), we estimate the population standard deviations by the sample standard deviations s1 and s2, and the 100(1 - α)% confidence interval for µ1 - µ2 is
$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

Example: According to the U.S. Bureau of the Census, the
average annual salary of full-time state employees was $49,056 in
New York and $46,800 in Massachusetts in 2001. Suppose that
these mean salaries are based on random samples of 500 full-time
state employees from New York and 400 full-time state employees
from Massachusetts and that the population standard deviations
of the 2001 salaries of all full-time state employees in these two
states were $9000 and $8500, respectively.
(a) What is the point estimate of µ1 - µ2? What is the margin of error?
(b) Construct a 97% confidence interval for the difference
between the 2001 mean salaries of all full-time state employees in
these two states.
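The interval formula with known σ1 and σ2 can be checked numerically. A minimal sketch using only the Python standard library; the variable names are illustrative, and z_{α/2} comes from `statistics.NormalDist` rather than a z table, so the limits can differ from a hand calculation with z = 2.17 by well under a dollar.

```python
from math import sqrt
from statistics import NormalDist

# New York (population 1) and Massachusetts (population 2)
xbar1, sigma1, n1 = 49_056, 9_000, 500
xbar2, sigma2, n2 = 46_800, 8_500, 400

point_estimate = xbar1 - xbar2                      # point estimate of mu1 - mu2
se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)          # sigma of x̄1 - x̄2

confidence = 0.97
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}, about 2.17
half_width = z * se
ci = (point_estimate - half_width, point_estimate + half_width)

print(f"point estimate = {point_estimate}, standard deviation = {se:.2f}")
print(f"97% CI: {ci[0]:.2f} to {ci[1]:.2f}")
```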

Example: According to the National Association of Colleges and
Employers, the average salary offered to college students who graduated
in 2002 was $43,732 to MIS (Management Information Systems) majors
and $40,293 to accounting majors. Assume that these means are based
on samples of 900 MIS and 1200 accounting majors and that the sample
standard deviations for the two samples are $2200 and $1950,
respectively. Find a 99% confidence interval for the difference between
the corresponding population means.
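Because both samples are large, s1 and s2 stand in for σ1 and σ2. A minimal sketch of the 99% interval under that assumption, again taking z_{α/2} from `statistics.NormalDist` (about 2.576) instead of the table value 2.58:

```python
from math import sqrt
from statistics import NormalDist

xbar1, s1, n1 = 43_732, 2_200, 900    # MIS majors
xbar2, s2, n2 = 40_293, 1_950, 1_200  # accounting majors

point_estimate = xbar1 - xbar2
se = sqrt(s1**2 / n1 + s2**2 / n2)    # s of x̄1 - x̄2 (sample sds, large samples)

z = NormalDist().inv_cdf(1 - 0.01 / 2)  # z_{alpha/2} for 99% confidence
ci = (point_estimate - z * se, point_estimate + z * se)

print(f"x̄1 - x̄2 = {point_estimate}, s = {se:.2f}")
print(f"99% CI: {ci[0]:.2f} to {ci[1]:.2f}")
```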

10.1.4 Hypothesis testing of µ1 - µ2
The alternative hypothesis can take one of three forms:
1. µ1 ≠ µ2, which is the same as µ1 - µ2 ≠ 0
2. µ1 > µ2, which is the same as µ1 - µ2 > 0
3. µ1 < µ2, which is the same as µ1 - µ2 < 0

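Let H0: µ1 - µ2 = D0. The value of the test statistic z for x̄1 - x̄2 is computed as
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
When σ1 and σ2 are unknown and both samples are large, s1 and s2 replace σ1 and σ2 in the denominator.

Alternative | Reject H0 if | p-value
H1: µ1 - µ2 > D0 | $z > z_{\alpha}$ | Area under the standard normal curve to the right of z
H1: µ1 - µ2 < D0 | $z < -z_{\alpha}$ | Area under the standard normal curve to the left of z
H1: µ1 - µ2 ≠ D0 | $|z| > z_{\alpha/2}$, that is, $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$ | Twice the area under the standard normal curve to the right of |z|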
Example: Refer to the 2001 average salaries of full-time state employees in New York and Massachusetts. At the 1% significance level, test whether the 2001 mean salaries of full-time state employees in New York and Massachusetts are different.
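This is a two-tailed test of H0: µ1 - µ2 = 0 with known population standard deviations. A minimal sketch of the arithmetic, with the critical value and p-value taken from `statistics.NormalDist`:

```python
from math import sqrt
from statistics import NormalDist

# New York (1) and Massachusetts (2): sample means, population sds, sample sizes
xbar1, sigma1, n1 = 49_056, 9_000, 500
xbar2, sigma2, n2 = 46_800, 8_500, 400
alpha = 0.01

se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
z = ((xbar1 - xbar2) - 0) / se                # H0: mu1 - mu2 = 0
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}, critical values = ±{z_crit:.2f}, p-value = {p_value:.5f}")
print("Reject H0" if abs(z) > z_crit else "Do not reject H0")
```

Here z is about 3.85, beyond ±2.58, so the sketch rejects H0.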

Example: Refer to the mean salaries offered to college students who graduated in 2002 with MIS and accounting majors. At the 2.5% significance level, test whether the mean salary offered to college students who graduated in 2002 with an MIS major is higher than that offered to accounting majors.
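This is a right-tailed test (H1: µ1 - µ2 > 0), and with large samples s1 and s2 stand in for σ1 and σ2. A minimal sketch; the p-value is so small that it may print as 0 in floating point.

```python
from math import sqrt
from statistics import NormalDist

xbar1, s1, n1 = 43_732, 2_200, 900    # MIS majors
xbar2, s2, n2 = 40_293, 1_950, 1_200  # accounting majors
alpha = 0.025

se = sqrt(s1**2 / n1 + s2**2 / n2)        # large samples: s replaces sigma
z = ((xbar1 - xbar2) - 0) / se            # H0: mu1 - mu2 = 0
z_crit = NormalDist().inv_cdf(1 - alpha)  # right-tailed critical value
p_value = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, critical value = {z_crit:.2f}, p-value = {p_value:.3g}")
print("Reject H0" if z > z_crit else "Do not reject H0")
```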


F distribution
- The F distribution is continuous and skewed to the right.
- The F distribution has two degrees of freedom: df1 for the numerator and df2 for the denominator.
- The values of an F distribution are nonnegative. The value that has an area of α in the right tail is denoted by $F_{\alpha,\,df_1,\,df_2}$.

Example: Find the F value for 8 degrees of freedom for the numerator, 14 degrees of freedom for the denominator, and 0.05 area in the right tail of the F distribution curve, that is, $F_{0.05,\,8,\,14}$.

Example: Find the F value for 10 degrees of freedom for the numerator, 12 degrees of freedom for the denominator, and 0.01 area in the right tail of the F distribution curve.

Example: Find the F value for 15 degrees of freedom for the numerator, 15 degrees of freedom for the denominator, and 0.05 area in the right tail of the F distribution curve.
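These values are normally read from an F table. As a cross-check, the sketch below computes F_{α, df1, df2} with only the Python standard library. The approach (the F cumulative distribution expressed through the regularized incomplete beta function, evaluated with a continued fraction, then inverted by bisection) is an implementation choice of this sketch, not something the slides use, and the function names are hypothetical.

```python
import math


def _beta_cont_frac(a: float, b: float, x: float) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny, eps = 1e-300, 1e-14
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 300):
        m2 = 2 * m
        for aa in (
            m * (b - m) * x / ((a - 1.0 + m2) * (a + m2)),             # even term
            -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2)),  # odd term
        ):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_incomplete_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the continued fraction directly where it converges fast, symmetry otherwise.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cont_frac(a, b, x) / a
    return 1.0 - front * _beta_cont_frac(b, a, 1.0 - x) / b


def f_cdf(x: float, df1: int, df2: int) -> float:
    """P(F <= x) for an F distribution with df1 and df2 degrees of freedom."""
    if x <= 0.0:
        return 0.0
    return reg_incomplete_beta(df1 / 2, df2 / 2, df1 * x / (df1 * x + df2))


def f_critical(alpha: float, df1: int, df2: int) -> float:
    """F_{alpha, df1, df2}: the value with area alpha in the right tail (bisection)."""
    target = 1.0 - alpha
    lo, hi = 0.0, 1.0
    while f_cdf(hi, df1, df2) < target:  # widen the bracket until it contains the answer
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if f_cdf(mid, df1, df2) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


for alpha, df1, df2 in [(0.05, 8, 14), (0.01, 10, 12), (0.05, 15, 15)]:
    print(f"F({alpha}, {df1}, {df2}) = {f_critical(alpha, df1, df2):.2f}")
```

Rounded to two decimals, the printed values can be compared with the F table in the textbook's appendix.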


Comparing Two Population Variances Using Independent Samples (One-Tailed)
If both sampled populations are normal, we test
H0: σ1² = σ2² (σ1²/σ2² = 1) against H1: σ1² > σ2² (σ1²/σ2² > 1)
using the test statistic
$$F = \frac{s_1^2}{s_2^2}$$
Reject H0 in favor of H1 if $F > F_{\alpha,\,df_1,\,df_2}$ or if the p-value < α, where $F_{\alpha,\,df_1,\,df_2}$ is based on df1 = n1 - 1 and df2 = n2 - 1.

If both sampled populations are normal, we test
H0: σ1² = σ2² (σ1²/σ2² = 1) against H1: σ1² < σ2² (σ1²/σ2² < 1)
using the test statistic
$$F = \frac{s_2^2}{s_1^2}$$
Reject H0 in favor of H1 if $F > F_{\alpha,\,df_1,\,df_2}$ or if the p-value < α, where $F_{\alpha,\,df_1,\,df_2}$ is based on df1 = n2 - 1 and df2 = n1 - 1.


Comparing Two Population Variances Using Independent Samples (Two-Tailed)
If both sampled populations are normal, we test
H0: σ1² = σ2² (σ1²/σ2² = 1) against H1: σ1² ≠ σ2² (σ1²/σ2² ≠ 1)
using the test statistic
$$F = \frac{\text{larger of } s_1^2 \text{ and } s_2^2}{\text{smaller of } s_1^2 \text{ and } s_2^2}$$
Reject H0 in favor of H1 if $F > F_{\alpha/2,\,df_1,\,df_2}$ or if the p-value < α, where $F_{\alpha/2,\,df_1,\,df_2}$ is based on
df1 = (size of the sample with the larger variance) - 1
df2 = (size of the sample with the smaller variance) - 1
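The three variance tests differ only in which sample variance goes in the numerator and how the degrees of freedom follow it. A minimal sketch of that bookkeeping; `variance_f_statistic` is a hypothetical helper, the sample summaries are made up for illustration, and the critical value is still read from an F table ($F_{\alpha}$ for the one-tailed cases, $F_{\alpha/2}$ for the two-tailed case).

```python
def variance_f_statistic(s1_sq, n1, s2_sq, n2, alternative="two-sided"):
    """Return (F, df1, df2) for comparing two population variances.

    "greater":   H1: sigma1^2 > sigma2^2, F = s1^2 / s2^2, df = (n1 - 1, n2 - 1)
    "less":      H1: sigma1^2 < sigma2^2, F = s2^2 / s1^2, df = (n2 - 1, n1 - 1)
    "two-sided": larger sample variance in the numerator; df follow the variances.
    """
    if alternative == "greater":
        return s1_sq / s2_sq, n1 - 1, n2 - 1
    if alternative == "less":
        return s2_sq / s1_sq, n2 - 1, n1 - 1
    if alternative == "two-sided":
        if s1_sq >= s2_sq:
            return s1_sq / s2_sq, n1 - 1, n2 - 1
        return s2_sq / s1_sq, n2 - 1, n1 - 1
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")


# Hypothetical summaries: s1^2 = 25 from n1 = 10, s2^2 = 100 from n2 = 16
F, df1, df2 = variance_f_statistic(25.0, 10, 100.0, 16)
print(f"F = {F:.2f} with df1 = {df1}, df2 = {df2}")
```

H0 is then rejected if F exceeds the tabulated critical value for (df1, df2).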


10.2 Inferences about the difference between two population means for small and independent samples: Equal standard deviations
The t distribution is used to make inferences about µ1 - µ2 when the following assumptions hold:
1. The two populations from which the two samples are drawn are (approximately) normally distributed.
2. The samples are small (n1 < 30 and n2 < 30) and independent.
3. The standard deviations σ1 and σ2 of the two populations are unknown but assumed to be equal, that is, σ1 = σ2.

Since σ1 = σ2 and both are unknown, we estimate their common value by sp, the pooled sample standard deviation, whose square is the pooled sample variance:
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
Here n1 - 1 and n2 - 1 are the degrees of freedom for samples 1 and 2, respectively, and n1 + n2 - 2 are the degrees of freedom for the two samples taken together.
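The pooled variance is a weighted average of the two sample variances, with the degrees of freedom as weights. A minimal sketch, together with the estimated standard deviation of x̄1 - x̄2 that is built from it in this section; the function names and the sample summaries are hypothetical.

```python
from math import sqrt


def pooled_variance(s1_sq, n1, s2_sq, n2):
    """s_p^2: sample variances weighted by their degrees of freedom."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)


def pooled_standard_error(s1_sq, n1, s2_sq, n2):
    """Estimated standard deviation of x̄1 - x̄2, assuming sigma1 = sigma2."""
    return sqrt(pooled_variance(s1_sq, n1, s2_sq, n2) * (1 / n1 + 1 / n2))


# Hypothetical summaries: s1^2 = 4 from n1 = 11, s2^2 = 9 from n2 = 21
sp2 = pooled_variance(4.0, 11, 9.0, 21)
se = pooled_standard_error(4.0, 11, 9.0, 21)
print(f"sp^2 = {sp2:.4f}, standard error = {se:.4f}")
```

Because sample 2 has twice the degrees of freedom, sp² lands closer to 9 than to 4.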

The estimator of the standard deviation of x̄1 - x̄2 is
$$s_{\bar{x}_1-\bar{x}_2} = \sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
So, if two independent samples are drawn from normal populations with equal variances, the 100(1 - α)% confidence interval for µ1 - µ2 is
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
where t is based on α/2 and n1 + n2 - 2 degrees of freedom.

The value of the test statistic t for x̄1 - x̄2 is
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
with n1 + n2 - 2 degrees of freedom.

Alternative | Reject H0 if | p-value
H1: µ1 - µ2 > D0 | $t > t_{\alpha}$ | Area under the t distribution to the right of t
H1: µ1 - µ2 < D0 | $t < -t_{\alpha}$ | Area under the t distribution to the left of t
H1: µ1 - µ2 ≠ D0 | $|t| > t_{\alpha/2}$, that is, $t > t_{\alpha/2}$ or $t < -t_{\alpha/2}$ | Twice the area under the t distribution to the right of |t|
Example: A chemical engineer wants to test which of two catalysts maximizes the hourly yield of a chemical process. The following table gives the data collected from that experiment.
Catalyst A | Catalyst B
801 | 752
814 | 718
784 | 776
836 | 742
820 | 763

Assume that both populations of yields are approximately normal and that their standard deviations are equal.
1. Find a 99% confidence interval for the difference between the corresponding population means.
2. At the 5% significance level, test H0: µ1 = µ2 against H1: µ1 ≠ µ2.
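A minimal sketch of both parts using the pooled-variance formulas. The critical values are typed in from a standard t table for df = 8 (t_{0.005, 8} = 3.355 and t_{0.025, 8} = 2.306), since the Python standard library has no t quantile function.

```python
from math import sqrt
from statistics import mean, variance

catalyst_a = [801, 814, 784, 836, 820]
catalyst_b = [752, 718, 776, 742, 763]
n1, n2 = len(catalyst_a), len(catalyst_b)
df = n1 + n2 - 2

diff = mean(catalyst_a) - mean(catalyst_b)  # x̄1 - x̄2
sp2 = ((n1 - 1) * variance(catalyst_a) + (n2 - 1) * variance(catalyst_b)) / df
se = sqrt(sp2 * (1 / n1 + 1 / n2))          # pooled estimate of sd of x̄1 - x̄2

# Critical values read from a t table with df = 8
t_ci = 3.355    # t_{0.005, 8} for the 99% confidence interval
t_test = 2.306  # t_{0.025, 8} for the two-tailed test at alpha = 0.05

ci = (diff - t_ci * se, diff + t_ci * se)
t_stat = (diff - 0) / se                    # H0: mu1 - mu2 = 0

print(f"x̄1 - x̄2 = {diff:.1f}, sp^2 = {sp2:.1f}, standard error = {se:.3f}, df = {df}")
print(f"99% CI: {ci[0]:.2f} to {ci[1]:.2f}")
print(f"t = {t_stat:.3f}:", "reject H0" if abs(t_stat) > t_test else "do not reject H0")
```

With these data x̄1 - x̄2 = 60.8, sp² = 435.1, and t is about 4.61, which exceeds 2.306.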


10.3 Inferences about the difference between two population means for small and independent samples: Unequal standard deviations
In Section 10.2 we learned how to make inferences about µ1 - µ2 when the population standard deviations are unknown but equal. What if they are not only unknown but also unequal?
All the procedures (confidence interval and test of hypothesis) remain the same except for two things:
- The degrees of freedom are no longer n1 + n2 - 2.
- The standard deviation of x̄1 - x̄2 is not calculated using the pooled standard deviation sp.

If
1. the two populations from which the two samples are drawn are (approximately) normally distributed,
2. the samples are small (n1 < 30 and n2 < 30) and independent, and
3. the standard deviations σ1 and σ2 are unknown and not equal, that is, σ1 ≠ σ2,
then the standard deviation of x̄1 - x̄2 is estimated by
$$s_{\bar{x}_1-\bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
The degrees of freedom are given by
$$df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$
We always round df down to the nearest integer.
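The degrees-of-freedom formula (commonly known as the Welch-Satterthwaite approximation) is easy to get wrong by hand. A minimal sketch that returns both the raw value and the value rounded down as the slide instructs; `welch_df` is a hypothetical helper and the sample summaries are made up.

```python
from math import floor


def welch_df(s1_sq, n1, s2_sq, n2):
    """Degrees of freedom for unequal variances: (raw value, value rounded down)."""
    a = s1_sq / n1  # s1^2 / n1
    b = s2_sq / n2  # s2^2 / n2
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return df, floor(df)


# Hypothetical summaries: s1^2 = 4 and s2^2 = 16, each from a sample of size 10
raw, rounded = welch_df(4.0, 10, 16.0, 10)
print(f"df = {raw:.3f}, use df = {rounded}")
```

The result is always at most n1 + n2 - 2 (here 18), so unequal variances cost degrees of freedom.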

Example: A chemical engineer wants to test which of two catalysts maximizes the hourly yield of a chemical process. The following table gives the data collected from that experiment.
Catalyst A | Catalyst C
801 | 880
814 | 850
784 | 690
836 | 755
820 | 880

Assume that both populations of yields are approximately normal.
1. Find a 99% confidence interval for the difference between the corresponding population means.
2. At the 5% significance level, test H0: µ1 = µ2 against H1: µ1 ≠ µ2.
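A minimal sketch of both parts with the unpooled standard deviation and the rounded-down degrees of freedom. With these data the df formula gives about 4.43, which rounds down to 4, so the critical values below are typed in from a t table for df = 4 (t_{0.005, 4} = 4.604 and t_{0.025, 4} = 2.776).

```python
from math import floor, sqrt
from statistics import mean, variance

catalyst_a = [801, 814, 784, 836, 820]
catalyst_c = [880, 850, 690, 755, 880]
n1, n2 = len(catalyst_a), len(catalyst_c)

v1 = variance(catalyst_a) / n1  # s1^2 / n1
v2 = variance(catalyst_c) / n2  # s2^2 / n2
diff = mean(catalyst_a) - mean(catalyst_c)
se = sqrt(v1 + v2)              # unpooled sd of x̄1 - x̄2
df = floor((v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1)))

# Critical values read from a t table with df = 4
t_ci = 4.604    # t_{0.005, 4} for the 99% confidence interval
t_test = 2.776  # t_{0.025, 4} for the two-tailed test at alpha = 0.05

ci = (diff - t_ci * se, diff + t_ci * se)
t_stat = (diff - 0) / se        # H0: mu1 - mu2 = 0

print(f"x̄1 - x̄2 = {diff}, standard error = {se:.3f}, df = {df}")
print(f"99% CI: {ci[0]:.2f} to {ci[1]:.2f}")
print(f"t = {t_stat:.3f}:", "reject H0" if abs(t_stat) > t_test else "do not reject H0")
```

Both samples happen to have a mean of 811, so t = 0 and the sketch does not reject H0; the much larger spread of catalyst C shows up only in the wide interval.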


10.4 Inferences About the Difference Between Two Population Means for Paired Samples
In Sections 10.1, 10.2, and 10.3 we were concerned with estimation and hypothesis testing about the difference between two population means when the two samples were drawn independently from two different populations.
This section describes estimation and hypothesis-testing procedures for the difference between two population means when the samples are dependent.
In the case of two dependent samples, two data values, one for each sample, are collected from the same source (or element); hence, these are also called paired or matched samples.

For example, we may want to make inferences about the mean weight loss for members of a health club after they have gone through an exercise program for a certain period of time.
Suppose we select a sample of 15 members of this health
club and record their weights before and after the program.
In this example, both sets of data are collected from the
same 15 persons, once before and once after the program.
Thus, although there are two samples, they contain the
same 15 persons.

Paired or Matched Samples: Two samples are said to be
paired or matched samples when for each data value
collected from one sample there is a corresponding data
value collected from the second sample, and both these data
values are collected from the same source.
The procedures to make confidence intervals and test
hypotheses in the case of paired samples are different from
the ones for independent samples discussed in earlier
sections of this chapter.

In paired samples, the difference between the two data
values for each element of the two samples is denoted by d.
This value of d is called the paired difference.
We then treat all the values of d as one sample and make
inferences applying procedures similar to the ones used for
one-sample cases in Chapters 8 and 9.
Note that because each source (or element) gives a pair of
values (one for each of the two data sets), each sample
contains the same number of values. That is, both samples
are the same size.

We denote the (common) sample size by n, which gives the number of paired difference values d. The degrees of freedom for the paired samples are n - 1.
Let
µd = the mean of the paired differences for the population
σd = the standard deviation of the paired differences for the population, which is almost never known
d̄ = the mean of the paired differences for the sample
sd = the standard deviation of the paired differences for the sample
n = the number of paired difference values
df = n - 1

The mean and standard deviation, d̄ and sd, of the paired differences for the two samples are calculated as
$$\bar{d} = \frac{\sum d}{n}, \qquad s_d = \sqrt{\frac{\sum d^2 - \dfrac{\left(\sum d\right)^2}{n}}{n - 1}}$$

In paired samples, instead of using x̄1 - x̄2 as the sample statistic to make inferences about µ1 - µ2, we use the sample statistic d̄ to make inferences about µd. In fact, the value of d̄ is always equal to x̄1 - x̄2, and the value of µd is always equal to µ1 - µ2.
Sampling Distribution, Mean, and Standard Deviation of d̄
If the sample size is large (n ≥ 30), then the sampling distribution of d̄ is approximately normal with mean and standard deviation given as
$$\mu_{\bar{d}} = \mu_d \qquad \text{and} \qquad \sigma_{\bar{d}} = \frac{\sigma_d}{\sqrt{n}}$$
In paired samples, the sample size is usually small and σd is unknown. So, if
- n is less than 30,
- σd is not known, and
- the population of paired differences is (approximately) normal,
then the t distribution is used to make inferences about µd, and the standard deviation of d̄ is estimated by
$$s_{\bar{d}} = \frac{s_d}{\sqrt{n}}$$
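The 100(1 - α)% confidence interval for µd is
$$\bar{d} \pm t\, s_{\bar{d}}$$
where t is based on α/2 and n - 1 degrees of freedom.
The value of the test statistic t for the mean of the paired differences is
$$t = \frac{\bar{d} - \mu_d}{s_{\bar{d}}}$$
where µd is the value stated in the null hypothesis.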
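Once d̄ and sd are known, the interval and the test statistic are the one-sample t procedures applied to the differences. A minimal sketch; `paired_t_inference` is a hypothetical helper, the summary numbers are made up, and the critical value (here t_{0.025, 3} = 3.182 for 95% confidence with n = 4) is assumed to come from a t table.

```python
from math import sqrt


def paired_t_inference(d_bar, s_d, n, t_crit, mu_d0=0.0):
    """Confidence interval for mu_d and t statistic for H0: mu_d = mu_d0.

    t_crit is read from a t table using alpha/2 and n - 1 degrees of freedom.
    Returns (s_dbar, (lower, upper), t_stat).
    """
    se = s_d / sqrt(n)                          # s_{d̄} = s_d / sqrt(n)
    ci = (d_bar - t_crit * se, d_bar + t_crit * se)
    t_stat = (d_bar - mu_d0) / se
    return se, ci, t_stat


# Hypothetical summary: d̄ = 2, s_d = 1, n = 4
se, ci, t_stat = paired_t_inference(2.0, 1.0, 4, t_crit=3.182)
print(se, ci, t_stat)
```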

Example: A researcher wanted to find the effect of a special diet on
systolic blood pressure. She selected a sample of seven adults and put
them on this dietary plan for three months. The following table gives
the systolic blood pressures of these seven adults before and after the
completion of this plan. Construct a 95% confidence interval for μd.

Using the 5% significance level, can we conclude that the mean of the
paired differences μd is different from zero? Assume that the
population of paired differences is (approximately) normally
distributed.


10.5 Inferences About the Difference Between Two Population Proportions for Large and Independent Samples
As we saw in previous chapters, we often need to work with proportions.
In this section we will learn how to construct a confidence
interval and test a hypothesis about the difference between
two population proportions.
We may want to estimate the difference between the
proportions of defective items produced on two different
machines.
We may want to test the hypothesis that the proportion of
defective items produced on Machine I is different from the
proportion of defective items produced on Machine II.

In this case, we would test the null hypothesis p1 - p2 = 0 against the alternative hypothesis p1 - p2 ≠ 0.
The sample statistic used to make inferences about p1 - p2 is p̂1 - p̂2, where p̂1 and p̂2 are the sample proportions for two large and independent samples.
For two large and independent samples, the sampling distribution of p̂1 - p̂2 is (approximately) normal with mean and standard deviation given as
$$\mu_{\hat{p}_1-\hat{p}_2} = p_1 - p_2 \qquad \text{and} \qquad \sigma_{\hat{p}_1-\hat{p}_2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$$
where q1 = 1 - p1 and q2 = 1 - p2.

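If n1p̂1, n1q̂1, n2p̂2, and n2q̂2 are all greater than 5, then the 100(1 - α)% confidence interval for p1 - p2 is
$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\, s_{\hat{p}_1-\hat{p}_2}, \qquad \text{where} \qquad s_{\hat{p}_1-\hat{p}_2} = \sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$
When testing H0: p1 = p2, we assume H0 is true, so we need a common (pooled) estimate of p1 and p2:
$$\bar{p} = \frac{x_1 + x_2}{n_1 + n_2}$$
where x1 and x2 are the numbers of successes in samples 1 and 2.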

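The pooled estimate of the standard deviation of p̂1 - p̂2 used in the test is
$$s_{\hat{p}_1-\hat{p}_2} = \sqrt{\bar{p}\,\bar{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}, \qquad \bar{q} = 1 - \bar{p}$$
The value of the test statistic z for p̂1 - p̂2 is
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{s_{\hat{p}_1-\hat{p}_2}}$$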
Example: A researcher wanted to estimate the difference between the
percentages of users of two toothpastes who will never switch to
another toothpaste. In a sample of 500 users of Toothpaste A taken by
this researcher, 100 said that they will never switch to another
toothpaste. In another sample of 400 users of Toothpaste B taken by the
same researcher, 68 said that they will never switch to another
toothpaste.
a. Let p1 and p2 be the proportions of all users of Toothpastes A and B,
respectively, who will never switch to another toothpaste. What is the
point estimate of p1 − p2?
b. Construct a 97% confidence interval for the difference between the
proportions of all users of the two toothpastes who will never switch.
c. At the 1% significance level, can we conclude that p1 is higher than
p2?
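A minimal sketch of all three parts. Note the two different standard deviations: the interval uses the unpooled p̂1q̂1/n1 + p̂2q̂2/n2 form, while the test uses the pooled p̄. The z values come from `statistics.NormalDist` instead of a z table, so the interval limits may differ slightly in the fourth decimal from a hand calculation with z = 2.17.

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 100, 500  # Toothpaste A: users who will never switch
x2, n2 = 68, 400   # Toothpaste B
p1_hat, p2_hat = x1 / n1, x2 / n2

# Large-sample condition: n*p̂ and n*q̂ above 5 for both samples
assert min(x1, n1 - x1, x2, n2 - x2) > 5

# (a) Point estimate of p1 - p2
point_estimate = p1_hat - p2_hat

# (b) 97% confidence interval, unpooled standard deviation
se_ci = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
z_half = NormalDist().inv_cdf(1 - 0.03 / 2)  # z_{alpha/2} with alpha = 0.03
ci = (point_estimate - z_half * se_ci, point_estimate + z_half * se_ci)

# (c) H0: p1 - p2 = 0 vs H1: p1 - p2 > 0 at alpha = 0.01, pooled proportion
p_bar = (x1 + x2) / (n1 + n2)
se_test = sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
z = (point_estimate - 0) / se_test
z_crit = NormalDist().inv_cdf(1 - 0.01)      # right-tailed critical value

print(f"point estimate = {point_estimate:.2f}")
print(f"97% CI: {ci[0]:.4f} to {ci[1]:.4f}")
print(f"z = {z:.2f}, critical value = {z_crit:.2f}:",
      "reject H0" if z > z_crit else "do not reject H0")
```

With z of about 1.15 below about 2.33, this sketch does not reject H0, and the 97% interval includes zero.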
