Lecture - 3 (With Ink)


MSC 600
Quantitative Methods
Dilcu Barnes (Email: [email protected])

Statistical Inferences for Two Samples

Management Science Department


Introduction

Engineers and scientists are often interested in comparing two
different conditions to determine whether either condition produces
a significant effect on the response that is observed. These
conditions are sometimes called treatments.
Inference on the Difference in Means of Two Normal Distributions, Variances Known


Assumptions
1. Let $X_{11}, X_{12}, \ldots, X_{1n_1}$ be a random sample from population 1.

2. Let $X_{21}, X_{22}, \ldots, X_{2n_2}$ be a random sample from population 2.

3. The two populations X1 and X2 are independent.

4. Both X1 and X2 are normal.


Sampling Distribution of $\bar{x}_1 - \bar{x}_2$

• Expected Value

$$E(\bar{x}_1 - \bar{x}_2) = \mu_1 - \mu_2$$

• Standard Deviation (Standard Error)

$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

where: $\sigma_1$ = standard deviation of population 1
$\sigma_2$ = standard deviation of population 2
$n_1$ = sample size from population 1
$n_2$ = sample size from population 2
Hypothesis Tests on the Difference in Means, Variances Known

Null hypothesis: $H_0: \mu_1 - \mu_2 = \Delta_0$

Test statistic:
$$Z_0 = \frac{\bar{X}_1 - \bar{X}_2 - \Delta_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
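Confidence Interval on a Difference in Means, Variances Known
If $\bar{x}_1$ and $\bar{x}_2$ are the means of independent random samples of
sizes $n_1$ and $n_2$ from two independent normal populations
with known variances $\sigma_1^2$ and $\sigma_2^2$, respectively, a $100(1-\alpha)\%$
confidence interval for $\mu_1 - \mu_2$ is

$$\bar{x}_1 - \bar{x}_2 - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \le \mu_1 - \mu_2 \le \bar{x}_1 - \bar{x}_2 + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

where $z_{\alpha/2}$ is the upper $\alpha/2$ percentage point of the standard
normal distribution.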
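Inferences for a Difference in Means of Two Normal Distributions, Variances Known

One-Sided Confidence Bounds

Upper Confidence Bound

$$\mu_1 - \mu_2 \le \bar{x}_1 - \bar{x}_2 + z_{\alpha}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

Lower Confidence Bound

$$\bar{x}_1 - \bar{x}_2 - z_{\alpha}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \le \mu_1 - \mu_2$$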
Example_1

Par, Inc. is a manufacturer of golf equipment and has developed a new
golf ball that has been designed to provide “extra distance.” In a test of
driving distance using a mechanical driving device, a sample of Par golf
balls was compared with a sample of golf balls made by Rap, Ltd., a
competitor.

               Sample #1    Sample #2
               Par, Inc.    Rap, Ltd.
Sample Size    120 balls    80 balls
Sample Mean    295 yards    278 yards

Based on data from previous driving distance tests, the two population
standard deviations are known, with $\sigma_1$ = 15 yards and $\sigma_2$ = 20 yards.
Let us develop a 95% confidence interval estimate of the difference
between the mean driving distances of the two brands of golf ball.
Solution
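One way to carry out the Example_1 calculation is sketched below in Python, using only the standard library. It applies the known-variance interval formula from the earlier slide; the variable names are illustrative and not part of the lecture.

```python
from math import sqrt
from statistics import NormalDist

# Data from Example_1 (population standard deviations treated as known).
n1, n2 = 120, 80
xbar1, xbar2 = 295.0, 278.0
sigma1, sigma2 = 15.0, 20.0
conf_level = 0.95

alpha = 1 - conf_level
z = NormalDist().inv_cdf(1 - alpha / 2)            # upper alpha/2 point of N(0, 1)
std_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)  # standard error of xbar1 - xbar2
point_estimate = xbar1 - xbar2
margin = z * std_error
lower, upper = point_estimate - margin, point_estimate + margin

print(f"point estimate = {point_estimate:.2f} yards")
print(f"standard error = {std_error:.3f} yards")
print(f"95% CI         = ({lower:.2f}, {upper:.2f}) yards")
```

With these inputs the interval comes out to roughly 11.86 to 22.14 yards.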
Example_2

(Please refer to Example_1) Can we conclude, using $\alpha$ = .05, that the mean
driving distance of Par, Inc. golf balls is different from the mean driving
distance of Rap, Ltd. golf balls?
Solution
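A minimal sketch of the two-sided z test for Example_2, assuming the hypothesized difference is $\Delta_0 = 0$ and reusing the Example_1 data. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Data from Example_1; H0: mu1 - mu2 = 0 versus H1: mu1 - mu2 != 0.
n1, n2 = 120, 80
xbar1, xbar2 = 295.0, 278.0
sigma1, sigma2 = 15.0, 20.0
alpha = 0.05
delta0 = 0.0  # hypothesized difference (assumed to be zero here)

std_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
z0 = (xbar1 - xbar2 - delta0) / std_error

std_normal = NormalDist()
z_crit = std_normal.inv_cdf(1 - alpha / 2)       # z_{alpha/2}
p_value = 2 * (1 - std_normal.cdf(abs(z0)))      # two-sided P-value
reject_h0 = abs(z0) > z_crit

print(f"z0 = {z0:.2f}, critical value = {z_crit:.2f}")
print(f"P-value = {p_value:.2e}, reject H0: {reject_h0}")
```

Here $z_0$ is about 6.48, far beyond the critical value of about 1.96, so under these assumptions the null hypothesis of equal mean driving distances is rejected.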
Large Sample Tests

The assumptions of normal population distributions and known values
of $\sigma_1^2$ and $\sigma_2^2$ are fortunately unnecessary when both
sample sizes are sufficiently large. In this case, the Central
Limit Theorem guarantees that $\bar{X}_1 - \bar{X}_2$ has approximately a normal
distribution regardless of the underlying population
distributions. Furthermore, using sample variances in place of
population variances gives a variable whose distribution is
approximately standard normal:

$$Z = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\frac{S_1^2}{m} + \frac{S_2^2}{n}}}$$

Provided that m and n (the sizes of samples 1 and 2) are both large,
a CI for $\mu_1 - \mu_2$ with a confidence level of approximately
$100(1-\alpha)\%$ is

$$\bar{x}_1 - \bar{x}_2 \pm z_{\alpha/2}\sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}$$

where − gives the lower limit and + gives the upper limit of
the interval.
Two Sample t-Test

We illustrated, for large sample sizes, the use of a z
test and CI in which the sample variances were used
in place of the population variances. In fact, for
large samples, the CLT allows us to use these
methods even when the two populations of
interest are not normal.
In practice, though, it will often happen that at
least one sample size is small and the population
variances have unknown values.
Hypotheses Tests on the Difference in Means, Variances Unknown and Equal

Case 1: $\sigma_1^2 = \sigma_2^2 = \sigma^2$

We wish to test:
$$H_0: \mu_1 - \mu_2 = \Delta_0$$
$$H_1: \mu_1 - \mu_2 \neq \Delta_0$$

The pooled estimator of $\sigma^2$, denoted by $S_p^2$, is defined by

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$

The quantity

$$T = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

has a t distribution with $n_1 + n_2 - 2$ degrees of freedom.


Tests on the Difference in Means of Two Normal Distributions, Variances Unknown and Equal

Case 1: $\sigma_1^2 = \sigma_2^2 = \sigma^2$
Example_3
Two catalysts are being analyzed to determine how they affect
the mean yield of a chemical process. Specifically, catalyst 1 is
currently in use, but catalyst 2 is acceptable. Since catalyst 2 is
cheaper, it should be adopted, provided it does not change the
process yield. A test is run in the pilot plant and results in the
data shown in the table below. Is there any difference between the
mean yields? Use $\alpha$ = 0.05, and assume equal variances.

Observation Number    Catalyst 1    Catalyst 2
1                     91.50         89.19
2                     94.18         90.95
3                     92.18         90.46
4                     95.39         93.21
5                     91.79         97.19
6                     89.07         97.04
7                     94.72         91.07
8                     89.21         92.75

$s_1$ = 2.39, $s_2$ = 2.98
Solution
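A minimal sketch of the pooled (equal-variance) t test for the catalyst data, using only the Python standard library. The critical value $t_{0.025,14} = 2.145$ is an assumption read from a standard t table, since the standard library has no t quantile function; variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Yields from the Example_3 table.
catalyst1 = [91.50, 94.18, 92.18, 95.39, 91.79, 89.07, 94.72, 89.21]
catalyst2 = [89.19, 90.95, 90.46, 93.21, 97.19, 97.04, 91.07, 92.75]

n1, n2 = len(catalyst1), len(catalyst2)
xbar1, xbar2 = mean(catalyst1), mean(catalyst2)
s1, s2 = stdev(catalyst1), stdev(catalyst2)   # sample standard deviations

# Pooled estimate of the common standard deviation.
sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

# Test statistic for H0: mu1 - mu2 = 0.
t0 = (xbar1 - xbar2) / (sp * sqrt(1 / n1 + 1 / n2))
df = n1 + n2 - 2

# Assumption: t_{0.025,14} = 2.145, taken from a standard t table.
t_crit = 2.145
reject_h0 = abs(t0) > t_crit

print(f"xbar1 = {xbar1:.3f}, xbar2 = {xbar2:.3f}, sp = {sp:.2f}")
print(f"t0 = {t0:.2f} with {df} df, reject H0: {reject_h0}")
```

Under these assumptions $t_0$ is about -0.35, well inside the interval (-2.145, 2.145), so the data give no strong evidence that the mean yields differ.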
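Hypotheses Tests on the Difference in Means, Variances Unknown

Case 2: $\sigma_1^2 \neq \sigma_2^2$

If $H_0: \mu_1 - \mu_2 = \Delta_0$ is true, the statistic

$$T_0^* = \frac{\bar{X}_1 - \bar{X}_2 - \Delta_0}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$$

is distributed approximately as t with degrees of freedom given by

$$\nu = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$$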
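Hypotheses Tests on the Difference in Means, Variances Unknown

Case 2: $\sigma_1^2 \neq \sigma_2^2$

• Interval Estimate

$$\bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

where the degrees of freedom for $t_{\alpha/2}$ are:

$$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}$$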
Example_4

Specific Motors of Detroit has developed a new automobile
known as the M car. 24 M cars and 28 J cars (from Japan) were
road tested to compare miles-per-gallon (mpg) performance.

                    Sample #1    Sample #2
                    M Cars       J Cars
Sample Size         24 cars      28 cars
Sample Mean         29.8 mpg     27.3 mpg
Sample Std. Dev.    2.56 mpg     1.81 mpg

Let us develop a 90% confidence interval estimate of the
difference between the mpg performances of the two models
of automobile.
Solution
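A sketch of the unequal-variance interval for Example_4, using only the standard library. Two choices here are assumptions rather than part of the lecture: the computed degrees of freedom are rounded down to an integer, and the critical value $t_{0.05,40} = 1.684$ is read from a standard t table.

```python
from math import sqrt, floor

# Summary statistics from the Example_4 table.
n1, n2 = 24, 28
xbar1, xbar2 = 29.8, 27.3
s1, s2 = 2.56, 1.81

v1, v2 = s1**2 / n1, s2**2 / n2          # per-sample variance of the mean
std_error = sqrt(v1 + v2)

# Degrees of freedom from the formula on the interval-estimate slide.
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
df_used = floor(df)                      # assumption: round down (conservative)

# Assumption: t_{0.05,40} = 1.684 from a standard t table (90% two-sided CI).
t_crit = 1.684
point_estimate = xbar1 - xbar2
margin = t_crit * std_error
lower, upper = point_estimate - margin, point_estimate + margin

print(f"df = {df:.2f} (using {df_used})")
print(f"90% CI = ({lower:.3f}, {upper:.3f}) mpg")
```

Under these assumptions the interval is roughly 1.45 to 3.55 mpg.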
Example_5

(Please refer to Example_4) Can we conclude, using a .05 level of
significance, that the miles-per-gallon (mpg) performance of M
cars is greater than the miles-per-gallon performance of J cars?
Solution
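A sketch of the upper-tailed test for Example_5, treating the variances as unequal (Case 2). It assumes 40 degrees of freedom (the Case 2 formula gives about 40.6 for these data, rounded down) and a table value $t_{0.05,40} = 1.684$; variable names are illustrative.

```python
from math import sqrt

# Summary statistics from Example_4; H0: mu1 - mu2 = 0 versus H1: mu1 - mu2 > 0.
n1, n2 = 24, 28
xbar1, xbar2 = 29.8, 27.3
s1, s2 = 2.56, 1.81

std_error = sqrt(s1**2 / n1 + s2**2 / n2)
t0 = (xbar1 - xbar2) / std_error

# Assumption: t_{0.05,40} = 1.684 from a standard t table, with df rounded down to 40.
t_crit = 1.684
reject_h0 = t0 > t_crit   # upper-tailed rejection criterion

print(f"t0 = {t0:.2f}, critical value = {t_crit}, reject H0: {reject_h0}")
```

With $t_0$ close to 4.00, the statistic exceeds the critical value, so under these assumptions the data support the claim that M cars have the higher mean mpg.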
Paired t-Test

A special case of two-sample t-tests occurs when the
observations on the two populations of interest are
collected in pairs.
For example, in a medical experiment we would like
to compare the efficacy of two drugs for lowering
blood pressure. Since blood pressure is influenced by
age and weight, we might decide to create pairs of
patients so that, within each of the resulting pairs, age
and weight are approximately equal.
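Paired t-Test

Null hypothesis: $H_0: \mu_D = \Delta_0$

Test statistic:
$$T_0 = \frac{\bar{D} - \Delta_0}{S_D / \sqrt{n}}$$

Alternative hypothesis, P-value, and rejection criterion for fixed-level tests:

• $H_1: \mu_D \neq \Delta_0$. P-value: probability above $|t_0|$ and probability below $-|t_0|$. Reject if $t_0 > t_{\alpha/2, n-1}$ or $t_0 < -t_{\alpha/2, n-1}$.

• $H_1: \mu_D > \Delta_0$. P-value: probability above $t_0$. Reject if $t_0 > t_{\alpha, n-1}$.

• $H_1: \mu_D < \Delta_0$. P-value: probability below $t_0$. Reject if $t_0 < -t_{\alpha, n-1}$.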


Example_6

An article in the Journal of Strain Analysis [1983, Vol. 18(2)]
reports a comparison of several methods for predicting the
shear strength for steel plate girders. Data for two of these
methods, the Karlsruhe and Lehigh procedures, when applied to
nine specific girders, are shown in the table below.

Girder    Karlsruhe Method    Lehigh Method    Difference d_j
S1/1      1.186               1.061            0.125
S2/1      1.151               0.992            0.159
S3/1      1.322               1.063            0.259
S4/1      1.339               1.062            0.277
S5/1      1.200               1.065            0.135
S2/1      1.402               1.178            0.224
S2/2      1.365               1.037            0.328
S2/3      1.537               1.086            0.451
S2/4      1.559               1.052            0.507

Determine whether there is any difference (on the average) for
the two methods.
Solution
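A minimal sketch of the paired t test on the girder data, using only the standard library. The example does not state a significance level, so $\alpha = 0.05$ is an assumption here, as is the table value $t_{0.025,8} = 2.306$; variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Predicted/observed ratios from the Example_6 table.
karlsruhe = [1.186, 1.151, 1.322, 1.339, 1.200, 1.402, 1.365, 1.537, 1.559]
lehigh    = [1.061, 0.992, 1.063, 1.062, 1.065, 1.178, 1.037, 1.086, 1.052]

# Work with the within-girder differences d_j.
diffs = [k - l for k, l in zip(karlsruhe, lehigh)]
n = len(diffs)
d_bar = mean(diffs)
s_d = stdev(diffs)

# Test statistic for H0: mu_D = 0 versus H1: mu_D != 0.
t0 = d_bar / (s_d / sqrt(n))

# Assumptions: alpha = 0.05 and t_{0.025,8} = 2.306 from a standard t table.
t_crit = 2.306
reject_h0 = abs(t0) > t_crit

print(f"d_bar = {d_bar:.4f}, s_d = {s_d:.4f}")
print(f"t0 = {t0:.2f} with {n - 1} df, reject H0: {reject_h0}")
```

Here $t_0$ is about 6.08, which exceeds 2.306, so under these assumptions the two methods give different strength predictions on average.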
A Confidence Interval for $\mu_D$ from Paired Samples

If $\bar{d}$ and $s_D$ are the sample mean and standard
deviation of the differences of n random pairs of
normally distributed measurements, a $100(1-\alpha)\%$
confidence interval on the difference in
means $\mu_D = \mu_1 - \mu_2$ is

$$\bar{d} - t_{\alpha/2, n-1}\, s_D/\sqrt{n} \le \mu_D \le \bar{d} + t_{\alpha/2, n-1}\, s_D/\sqrt{n}$$

where $t_{\alpha/2, n-1}$ is the upper $\alpha/2$ percentage point of the t
distribution with n - 1 degrees of freedom.
Example_7

(Please refer to Example_6) Obtain the 95%
confidence interval.
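The interval for Example_7 can be sketched as follows, starting from the difference column of the Example_6 table. The critical value $t_{0.025,8} = 2.306$ is an assumption read from a standard t table; variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Differences d_j (Karlsruhe minus Lehigh) from the Example_6 table.
diffs = [0.125, 0.159, 0.259, 0.277, 0.135, 0.224, 0.328, 0.451, 0.507]

n = len(diffs)
d_bar = mean(diffs)
s_d = stdev(diffs)

# Assumption: t_{0.025,8} = 2.306 from a standard t table.
t_crit = 2.306
margin = t_crit * s_d / sqrt(n)
lower, upper = d_bar - margin, d_bar + margin

print(f"95% CI for mu_D = ({lower:.4f}, {upper:.4f})")
```

The resulting interval, roughly 0.170 to 0.378, does not contain zero.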
Inferences on the Variances of Two Normal Distributions

Suppose that two independent normal populations are of interest,
where the population means and variances, say $\mu_1$, $\sigma_1^2$, $\mu_2$, and $\sigma_2^2$,
are unknown. Assume that two random samples of size $n_1$ from
population 1 and size $n_2$ from population 2 are available, and let
$S_1^2$ and $S_2^2$ be the sample variances. We wish to test the hypotheses

$$H_0: \sigma_1^2 = \sigma_2^2$$
$$H_1: \sigma_1^2 \neq \sigma_2^2$$

The development of a test procedure for these hypotheses requires a
new probability distribution, the F distribution. The random variable F
is defined to be the ratio of two independent chi-square random
variables, each divided by its number of degrees of freedom. That is,

$$F = \frac{W/u}{Y/v}$$

where W and Y are independent chi-square random variables with u
and v degrees of freedom, respectively.
Inferences on the Variances of Two Normal Populations
Let W and Y be independent chi-square random variables with u
and v degrees of freedom, respectively. Then the ratio

$$F = \frac{W/u}{Y/v}$$

has the probability density function

$$f(x) = \frac{\Gamma\left(\frac{u+v}{2}\right)\left(\frac{u}{v}\right)^{u/2} x^{(u/2)-1}}{\Gamma\left(\frac{u}{2}\right)\Gamma\left(\frac{v}{2}\right)\left[\left(\frac{u}{v}\right)x + 1\right]^{(u+v)/2}}, \quad 0 < x < \infty$$

and is said to follow the F distribution with u degrees of freedom in
the numerator and v degrees of freedom in the denominator. It is
usually abbreviated as $F_{u,v}$.
Hypothesis Tests on the Ratio of Two Variances

Let $X_{11}, X_{12}, \ldots, X_{1n_1}$ be a random sample from a normal
population with mean $\mu_1$ and variance $\sigma_1^2$, and let
$X_{21}, X_{22}, \ldots, X_{2n_2}$ be a random sample from a second normal
population with mean $\mu_2$ and variance $\sigma_2^2$. Assume
that both normal populations are independent.
Let $S_1^2$ and $S_2^2$ be the sample variances. Then the ratio

$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$$

has an F distribution with $n_1 - 1$ numerator
degrees of freedom and $n_2 - 1$ denominator
degrees of freedom.
Hypothesis Tests on the Ratio of Two Variances

Null hypothesis: $H_0: \sigma_1^2 = \sigma_2^2$

Test statistic:
$$F_0 = \frac{S_1^2}{S_2^2}$$

Alternative hypotheses and rejection criteria:

• $H_1: \sigma_1^2 \neq \sigma_2^2$. Reject if $f_0 > f_{\alpha/2, n_1-1, n_2-1}$ or $f_0 < f_{1-\alpha/2, n_1-1, n_2-1}$.

• $H_1: \sigma_1^2 > \sigma_2^2$. Reject if $f_0 > f_{\alpha, n_1-1, n_2-1}$.

• $H_1: \sigma_1^2 < \sigma_2^2$. Reject if $f_0 < f_{1-\alpha, n_1-1, n_2-1}$.
Example_8

Oxide layers on semiconductor wafers are etched in a
mixture of gases to achieve the proper thickness. The
variability in the thickness of these oxide layers is a
critical characteristic of the wafer, and low variability is
desirable for subsequent processing steps. Two different
mixtures of gases are being studied to determine
whether one is superior in reducing the variability of the
oxide thickness. Sixteen wafers are etched in each gas.
The sample standard deviations of oxide thickness are $s_1$
= 1.96 angstroms and $s_2$ = 2.13 angstroms, respectively.
Is there any evidence to indicate that either gas is
preferable? Use a fixed-level test with $\alpha$ = 0.05.
Solution
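A sketch of the two-sided F test for Example_8. The upper critical value $f_{0.025,15,15} = 2.86$ is an assumption read from a standard F table; the lower critical value uses the reciprocal relation $f_{1-\alpha/2,u,v} = 1/f_{\alpha/2,v,u}$. Variable names are illustrative.

```python
# Summary statistics from Example_8; H0: sigma1^2 = sigma2^2 (two-sided alternative).
n1, n2 = 16, 16
s1, s2 = 1.96, 2.13   # sample standard deviations, in angstroms

f0 = s1**2 / s2**2    # test statistic F0 = S1^2 / S2^2

# Assumption: f_{0.025,15,15} = 2.86 from a standard F table.
f_upper = 2.86
# Lower point via f_{1-a/2,u,v} = 1 / f_{a/2,v,u}; here u = v = 15.
f_lower = 1 / f_upper

reject_h0 = f0 > f_upper or f0 < f_lower

print(f"f0 = {f0:.2f}, acceptance region = ({f_lower:.2f}, {f_upper:.2f})")
print(f"reject H0: {reject_h0}")
```

With $f_0$ about 0.85, which lies between roughly 0.35 and 2.86, the test under these assumptions finds no strong evidence that either gas gives lower variability.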
Confidence Interval on the Ratio of Two Variances

If $s_1^2$ and $s_2^2$ are the sample variances of random samples
of sizes $n_1$ and $n_2$, respectively, from two independent
normal populations with unknown variances $\sigma_1^2$ and $\sigma_2^2$,
then a $100(1-\alpha)\%$ confidence interval on the ratio $\sigma_1^2/\sigma_2^2$
is

$$\frac{s_1^2}{s_2^2}\, f_{1-\alpha/2, n_2-1, n_1-1} \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{s_1^2}{s_2^2}\, f_{\alpha/2, n_2-1, n_1-1}$$

where $f_{\alpha/2, n_2-1, n_1-1}$ and $f_{1-\alpha/2, n_2-1, n_1-1}$ are the upper and
lower $\alpha/2$ percentage points of the F distribution with $n_2 - 1$
numerator and $n_1 - 1$ denominator degrees of freedom,
respectively.

A confidence interval on the ratio of the standard
deviations can be obtained by taking square roots.
Large-Sample Test on the Difference in Population Proportions
Suppose that two independent random samples of sizes
$n_1$ and $n_2$ are taken from two populations, and let $X_1$ and
$X_2$ represent the number of observations that belong
to the class of interest in samples 1 and 2,
respectively. Furthermore, suppose that the
normal approximation to the binomial is applied
to each population, so the estimators of the
population proportions, $\hat{P}_1 = X_1/n_1$ and $\hat{P}_2 = X_2/n_2$,
have approximate normal distributions.
Large-Sample Test on the Difference in Population Proportions
We wish to test the hypotheses:

$$H_0: p_1 = p_2$$
$$H_1: p_1 \neq p_2$$

The following test statistic is distributed
approximately as standard normal and is the basis
of the test:

$$Z = \frac{\hat{P}_1 - \hat{P}_2 - (p_1 - p_2)}{\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}}$$
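Large-Sample Test on the Difference in Population Proportions
Null hypothesis: $H_0: p_1 = p_2$

Test statistic:
$$Z_0 = \frac{\hat{P}_1 - \hat{P}_2}{\sqrt{\hat{P}(1-\hat{P})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

where $\hat{P} = (X_1 + X_2)/(n_1 + n_2)$ is the pooled estimator of the common proportion under $H_0$.

Alternative hypothesis, P-value, and rejection criterion for fixed-level tests:

• $H_1: p_1 \neq p_2$. P-value: probability above $|z_0|$ and probability below $-|z_0|$, $P = 2[1 - \Phi(|z_0|)]$. Reject if $z_0 > z_{\alpha/2}$ or $z_0 < -z_{\alpha/2}$.

• $H_1: p_1 > p_2$. P-value: probability above $z_0$, $P = 1 - \Phi(z_0)$. Reject if $z_0 > z_{\alpha}$.

• $H_1: p_1 < p_2$. P-value: probability below $z_0$, $P = \Phi(z_0)$. Reject if $z_0 < -z_{\alpha}$.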


Example_10

Extracts of St. John's Wort are widely used to treat
depression. An article in the April 18, 2001, issue of the
Journal of the American Medical Association compared the
efficacy of a standard extract of St. John's Wort with a
placebo in 200 outpatients diagnosed with major depression.
Patients were randomly assigned to two groups; one group
received the St. John's Wort, and the other received the
placebo. After eight weeks, 19 of the placebo-treated patients
showed improvement, and 27 of those treated with St. John's
Wort improved. Is there any reason to believe that St. John's
Wort is effective in treating major depression? Use $\alpha$ = 0.05.
Solution
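A sketch of the large-sample test for Example_10, using only the standard library. The example gives only the total of 200 patients, so an even split of $n_1 = n_2 = 100$ is an assumption here; sample 1 is taken to be the St. John's Wort group and the alternative is $H_1: p_1 > p_2$. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Assumption: the 200 outpatients are split evenly between the two groups.
n1, n2 = 100, 100
x1, x2 = 27, 19          # improved: St. John's Wort (sample 1), placebo (sample 2)
alpha = 0.05

p1_hat, p2_hat = x1 / n1, x2 / n2
p_pooled = (x1 + x2) / (n1 + n2)   # pooled estimate under H0: p1 = p2

z0 = (p1_hat - p2_hat) / sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))

std_normal = NormalDist()
z_crit = std_normal.inv_cdf(1 - alpha)   # z_alpha for the upper-tailed test
p_value = 1 - std_normal.cdf(z0)         # P = 1 - Phi(z0)
reject_h0 = z0 > z_crit

print(f"p1_hat = {p1_hat:.2f}, p2_hat = {p2_hat:.2f}, pooled = {p_pooled:.2f}")
print(f"z0 = {z0:.2f}, z_crit = {z_crit:.3f}, P-value = {p_value:.3f}")
print(f"reject H0: {reject_h0}")
```

Under these assumptions $z_0$ is about 1.34, below the critical value of about 1.645, so the data do not give sufficient evidence at $\alpha$ = 0.05 that St. John's Wort is effective.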
Confidence Interval on the Difference in Population Proportions
If $\hat{p}_1$ and $\hat{p}_2$ are the sample proportions of observations in two
independent random samples of sizes $n_1$ and $n_2$ that belong to a
class of interest, an approximate two-sided $100(1-\alpha)\%$
confidence interval on the difference in the true proportions
$p_1 - p_2$ is

$$\hat{p}_1 - \hat{p}_2 - z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \le p_1 - p_2 \le \hat{p}_1 - \hat{p}_2 + z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$

where $z_{\alpha/2}$ is the upper $\alpha/2$ percentage point of the standard
normal distribution.
Example_11

(Please refer to Example_10) Obtain the 95% confidence
interval.
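The interval for Example_11 can be sketched as follows. As in the test for Example_10, the even split $n_1 = n_2 = 100$ is an assumption, since only the total of 200 patients is stated; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Assumption: 100 patients per group (only the total of 200 is given).
n1, n2 = 100, 100
x1, x2 = 27, 19          # improved: St. John's Wort (sample 1), placebo (sample 2)
conf_level = 0.95

p1_hat, p2_hat = x1 / n1, x2 / n2
alpha = 1 - conf_level
z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}

# Unpooled standard error, as in the confidence-interval formula.
std_error = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
diff = p1_hat - p2_hat
lower, upper = diff - z * std_error, diff + z * std_error

print(f"95% CI for p1 - p2 = ({lower:.4f}, {upper:.4f})")
```

Under these assumptions the interval runs from about -0.036 to 0.196. It contains zero, which agrees with the fixed-level test not rejecting $H_0$.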
