Chapter 6. Hypothesis Testing 2023


Dr. Lam Son
[email protected]
033.6969.909
1. Concept
1.1. The statistical hypothesis

Definition: A statistical hypothesis is a hypothesis about the probability distribution of a random variable, about the characteristic parameters of the random variable, or about the independence of random variables.

The null hypothesis: $H_0$.

The alternative hypothesis: $H_1$.


Example:
a. $H_0$: Income is a normally distributed random variable.
   $H_1$: Income is not a normally distributed random variable.
b. $H_0$: The average income in city A is 10 million.
   $H_1$: The average income in city A is more than 10 million.
1.2. Hypothesis testing
Hypothesis testing determines whether there is enough statistical evidence to believe in a hypothesis. Typical questions concern:

- The average weight of one manufactured product.
- The habits and consumption behavior of the population.
- Changes between before and after a project.
1.3. Method

- State a pair of hypotheses $H_0$ and $H_1$.
- Assume $H_0$ is true.
- Find an event A such that P(A) is very small when $H_0$ is true.
- By the principle of small probability, an event with a very small probability can be treated as one that will not occur in a single trial.
- Carry out the test on the sample:
+ If A occurs, $H_0$ lacks a basis and is rejected.
+ If A does not occur, there is no basis for rejection, so $H_0$ is accepted.
1.4. Test Criteria and Rejection Domain
Let $W = (X_1, X_2, \ldots, X_n)$ be a random sample, and form a statistic

$$G = f(X_1, X_2, \ldots, X_n, \theta_0)$$

such that, if $H_0$ is true, the distribution of $G$ is completely determined. Then, for a very small probability $\alpha$, we find a set $W_\alpha$ such that:

$$P(G \in W_\alpha \mid H_0) = \alpha$$

- $G$: test criterion.
- $W_\alpha$: rejection domain (rejection region).
- $\alpha$: level of significance.
1.5. Testing rule
Carry out the test with a sample that takes a specific value: $w = (x_1, x_2, \ldots, x_n)$.
From there it can be calculated:

$$g = f(x_1, x_2, \ldots, x_n, \theta_0)$$

$g$ is called the observed value of the test criterion.

- If $g \in W_\alpha$: reject $H_0$ (accept $H_1$).
- If $g \notin W_\alpha$: accept $H_0$.
2. Tests on characteristic parameters
2.1. Hypothesis testing on the mathematical expectation of a normally distributed random variable

Suppose $X \sim N(\mu, \sigma^2)$, where the expectation $\mu$ is unknown but there is a basis for comparing it with some number $\mu_0$.
We hypothesize $H_0: \mu = \mu_0$.
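2.1.1. Case 1: $\sigma^2$ is known

$$G = \frac{(\bar{X} - \mu_0)\sqrt{n}}{\sigma}$$

If $H_0$ is true then: $G \sim N(0,1)$

$H_0: \mu = \mu_0$, $H_1: \mu > \mu_0$:
$P(G > u_\alpha) = \alpha$, so $W_\alpha = (u_\alpha; +\infty)$

$H_0: \mu = \mu_0$, $H_1: \mu < \mu_0$:
$P(G < -u_\alpha) = \alpha$, so $W_\alpha = (-\infty; -u_\alpha)$

$H_0: \mu = \mu_0$, $H_1: \mu \neq \mu_0$:
$P(G < -u_{\alpha/2}) + P(G > u_{\alpha/2}) = \alpha$, so $W_\alpha = (-\infty; -u_{\alpha/2}) \cup (u_{\alpha/2}; +\infty)$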
Example:
A management report states that the average machining time for one machine part is 26 minutes. The machining time for a machine part is a normally distributed random variable with a standard deviation of 5.2 minutes. A random sample of size n = 100 is taken and gives an average of 27.56 minutes.
At the 5% significance level, what can be concluded about the above report?

$H_0: \mu = 26$, $H_1: \mu \neq 26$

$$g = \frac{(27.56 - 26)\sqrt{100}}{5.2} = 3, \qquad W_\alpha = (-\infty; -1.96) \cup (1.96; +\infty)$$

$g \in W_\alpha$: reject $H_0$.
So at the 5% level, the claim that the average time is 26 minutes is unfounded.
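The arithmetic of this example can be checked with a few lines of Python. This is only a sketch of the two-sided test with known standard deviation; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Figures from the machining-time example
mu0 = 26.0      # average time claimed in the report (minutes)
sigma = 5.2     # known population standard deviation (minutes)
n = 100         # sample size
x_bar = 27.56   # sample mean (minutes)
alpha = 0.05    # significance level

# Observed value of the test criterion G = (X_bar - mu0) * sqrt(n) / sigma
g = (x_bar - mu0) * sqrt(n) / sigma

# Two-sided critical value u_{alpha/2} of the standard normal distribution
u = NormalDist().inv_cdf(1 - alpha / 2)

reject = abs(g) > u
print(f"g = {g:.2f}, u = {u:.2f}, reject H0: {reject}")
# prints: g = 3.00, u = 1.96, reject H0: True
```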
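2.1.2. Case 2: $\sigma^2$ is unknown

$$G = \frac{(\bar{X} - \mu_0)\sqrt{n}}{S}$$

If $H_0$ is true, $G$ follows the Student distribution with $n - 1$ degrees of freedom.

$H_0: \mu = \mu_0$, $H_1: \mu > \mu_0$: $W_\alpha = (t_\alpha^{(n-1)}; +\infty)$

$H_0: \mu = \mu_0$, $H_1: \mu < \mu_0$: $W_\alpha = (-\infty; -t_\alpha^{(n-1)})$

$H_0: \mu = \mu_0$, $H_1: \mu \neq \mu_0$: $W_\alpha = (-\infty; -t_{\alpha/2}^{(n-1)}) \cup (t_{\alpha/2}^{(n-1)}; +\infty)$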
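Ex:
A company produces light bulbs with an average lamp life of 1200 hours. The company has just imported a new production line. A test of 40 bulbs from this line showed an average life of 1260 hours with a standard deviation of 215 hours. At the 5% level of significance, determine whether the new line is better than the old one.

$H_0: \mu = 1200$, $H_1: \mu > 1200$

$$g = \frac{(1260 - 1200)\sqrt{40}}{215} \approx 1.76, \qquad W_\alpha = (1.645; +\infty)$$

Here $n = 40 > 30$, so the Student critical value is approximated by the normal one, $u_{0.05} = 1.645$.

$g \in W_\alpha$: reject $H_0$.
So at the 5% level, it can be said that the new line is better.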
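A short Python check of the light-bulb example follows. It is a sketch under one stated assumption: because n = 40 is larger than 30, the Student critical value is replaced by the standard normal one (1.645), which is also the value used in the rejection region above. The exact Student quantile with 39 degrees of freedom is slightly larger (about 1.685), and the conclusion is the same with either value.

```python
from math import sqrt
from statistics import NormalDist

# Figures from the light-bulb example
mu0 = 1200.0    # average life under H0 (hours)
n = 40          # number of bulbs tested
x_bar = 1260.0  # sample mean (hours)
s = 215.0       # sample standard deviation (hours)
alpha = 0.05    # significance level

# Observed value of G = (X_bar - mu0) * sqrt(n) / S
g = (x_bar - mu0) * sqrt(n) / s

# Assumption: large-sample normal approximation to the Student quantile
crit = NormalDist().inv_cdf(1 - alpha)

reject = g > crit  # right-tailed test, H1: mu > mu0
print(f"g = {g:.3f}, critical value = {crit:.3f}, reject H0: {reject}")
```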
2.2. Hypothesis testing on two expected values

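Suppose there are two populations under study, with the random variables:

$$X \sim N(\mu_1, \sigma_1^2); \quad Y \sim N(\mu_2, \sigma_2^2)$$

If $\mu_1$ and $\mu_2$ are unknown but there is a basis to compare them with each other, a statistical hypothesis is made:

$H_0: \mu_1 = \mu_2$

With $\bar{X}_1$, $\bar{X}_2$ the sample means from samples of sizes $n_1$, $n_2$, if $H_0$ is true:

$$G = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \sim N(0,1)$$

$H_0: \mu_1 = \mu_2$, $H_1: \mu_1 > \mu_2$: $W_\alpha = (u_\alpha, +\infty)$

$H_0: \mu_1 = \mu_2$, $H_1: \mu_1 < \mu_2$: $W_\alpha = (-\infty, -u_\alpha)$

$H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \neq \mu_2$: $W_\alpha = (-\infty, -u_{\alpha/2}) \cup (u_{\alpha/2}, +\infty)$

Note: When both sample sizes are greater than 30, we can replace the population variances with the sample variances and still assume that G is normally distributed:

$$G = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}} \sim N(0,1)$$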
Example: At a factory, two production lines are set up to produce the same product. The following data were collected to evaluate whether the average material consumption in a production shift differs between the two lines.

Line 1:  x_i   2.5   3.2   3.5   3.8   3.5
         n_i   9     6     6     4     8
Line 2:  y_i   2.0   2.7   2.5   2.9   2.3   2.6
         n_i   8     7     8     9     5     8

At the 5% level of significance, draw a conclusion on the above problem, knowing that the material consumption on both lines is a normally distributed random variable.
Let X1, X2 be….
$H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \neq \mu_2$

$$G = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$

$$W_\alpha = (-\infty, -u_{\alpha/2}) \cup (u_{\alpha/2}, +\infty) = (-\infty, -1.96) \cup (1.96, +\infty)$$

$n_1 = 5$, $\bar{x}_1 = 3.3$; $n_2 = 6$, $\bar{x}_2 = 2.5$

$$g = \frac{3.3 - 2.5}{\sqrt{\dfrac{0.4^2}{5} + \dfrac{0.4^2}{6}}} \approx 3.3 \in W_\alpha$$

So at the 5% significance level, it can be said that the material consumption of the two production lines is really different.
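The two-sample statistic can be evaluated as below. This sketch simply plugs in the figures quoted in the worked solution (means 3.3 and 2.5, standard deviation 0.4 for both lines, sample sizes 5 and 6); the helper name `two_sample_z` is an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(x1_bar, x2_bar, s1, s2, n1, n2):
    """Observed value of G = (X1_bar - X2_bar) / sqrt(S1^2/n1 + S2^2/n2)."""
    return (x1_bar - x2_bar) / sqrt(s1**2 / n1 + s2**2 / n2)


alpha = 0.05
g = two_sample_z(3.3, 2.5, 0.4, 0.4, 5, 6)

# Two-sided critical value u_{alpha/2} of the standard normal distribution
u = NormalDist().inv_cdf(1 - alpha / 2)

reject = abs(g) > u
print(f"g = {g:.2f}, u = {u:.2f}, reject H0: {reject}")
```

With these inputs g comes out at roughly 3.3, which lies in the rejection region, so $H_0$ is rejected.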
2.3 Test for a probability value

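$H_0: p = p_0$

With $f$ the sample proportion, if $H_0$ is true:

$$G = \frac{(f - p_0)\sqrt{n}}{\sqrt{p_0(1 - p_0)}} \sim N(0,1)$$

$H_0: p = p_0$, $H_1: p > p_0$: $W_\alpha = (u_\alpha, +\infty)$

$H_0: p = p_0$, $H_1: p < p_0$: $W_\alpha = (-\infty, -u_\alpha)$

$H_0: p = p_0$, $H_1: p \neq p_0$: $W_\alpha = (-\infty, -u_{\alpha/2}) \cup (u_{\alpha/2}, +\infty)$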
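Ex:
The traffic police agency said that 62% of motorcyclists on the road carried a license. A random check of 130 motorcyclists found 68 people carrying a license. At the 1% significance level, does this data show that the percentage of people carrying a driving license is lower than 62%?

$H_0: p = 0.62$, $H_1: p < 0.62$

$$W_\alpha = (-\infty, -u_\alpha) = (-\infty, -2.326)$$

$$g = \frac{(0.523 - 0.62)\sqrt{130}}{\sqrt{0.62(1 - 0.62)}} = -2.278 \notin W_\alpha$$

Since $g \notin W_\alpha$, $H_0$ is not rejected: at the 1% significance level the data do not show that the percentage of license holders is lower than 62%.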
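The computation for the license example can be reproduced as follows. This is a sketch using the stated 1% level, for which the left-tail critical value of the standard normal distribution is about -2.326; variable names are illustrative. Using the unrounded sample proportion 68/130 gives g of about -2.277, while rounding the proportion to 0.523 gives the -2.278 quoted above.

```python
from math import sqrt
from statistics import NormalDist

# Figures from the license example
p0 = 0.62     # proportion claimed by the agency
n = 130       # motorcyclists checked
m = 68        # motorcyclists carrying a license
alpha = 0.01  # significance level

f = m / n  # sample proportion

# Observed value of G = (f - p0) * sqrt(n) / sqrt(p0 * (1 - p0))
g = (f - p0) * sqrt(n) / sqrt(p0 * (1 - p0))

# Left-tailed test: reject H0 when g < -u_alpha
crit = -NormalDist().inv_cdf(1 - alpha)

reject = g < crit
print(f"f = {f:.3f}, g = {g:.3f}, critical value = {crit:.3f}, reject H0: {reject}")
```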
2.4. Hypothesis testing on two probability
values

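$H_0: p_1 = p_2$

$$f_1 = \frac{m_1}{n_1}, \quad f_2 = \frac{m_2}{n_2}, \quad f = \frac{m_1 + m_2}{n_1 + n_2}$$

If $H_0$ is true:

$$G = \frac{f_1 - f_2}{\sqrt{f(1 - f)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \sim N(0,1)$$

$H_0: p_1 = p_2$, $H_1: p_1 > p_2$: $W_\alpha = (u_\alpha, +\infty)$

$H_0: p_1 = p_2$, $H_1: p_1 < p_2$: $W_\alpha = (-\infty, -u_\alpha)$

$H_0: p_1 = p_2$, $H_1: p_1 \neq p_2$: $W_\alpha = (-\infty, -u_{\alpha/2}) \cup (u_{\alpha/2}, +\infty)$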
Ex:

A study was conducted to compare the lower secondary school dropout rates in two mountainous districts. In district A, out of 160 monitored students, 48 dropped out of school before 9th grade; in district B, out of 400 monitored students, 90 dropped out before 9th grade. At the 2% significance level, can it be concluded that the dropout rates in the two districts are different?
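One way to work this exercise is sketched below, applying the two-proportion criterion of section 2.4 with a two-sided alternative to the figures given in the problem. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Figures from the dropout example
m1, n1 = 48, 160   # district A: dropouts, monitored students
m2, n2 = 90, 400   # district B: dropouts, monitored students
alpha = 0.02       # significance level

f1 = m1 / n1                 # sample dropout rate in district A
f2 = m2 / n2                 # sample dropout rate in district B
f = (m1 + m2) / (n1 + n2)    # pooled sample rate

# Observed value of G = (f1 - f2) / sqrt(f (1 - f) (1/n1 + 1/n2))
g = (f1 - f2) / sqrt(f * (1 - f) * (1 / n1 + 1 / n2))

# Two-sided critical value u_{alpha/2}
u = NormalDist().inv_cdf(1 - alpha / 2)

reject = abs(g) > u
print(f"f1 = {f1:.3f}, f2 = {f2:.3f}, g = {g:.3f}, u = {u:.3f}, reject H0: {reject}")
```

Here g is about 1.86 and the critical value is about 2.326, so g falls outside the rejection region: at the 2% level the data do not show a difference between the two dropout rates.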
2.5. Test for a variance
$H_0: \sigma^2 = \sigma_0^2$

If $H_0$ is true:

$$G = \frac{(n - 1)S^2}{\sigma_0^2} \sim \chi^2(n - 1)$$

$H_0: \sigma^2 = \sigma_0^2$, $H_1: \sigma^2 > \sigma_0^2$: $W_\alpha = (\chi^{2(n-1)}_{\alpha}, +\infty)$

$H_0: \sigma^2 = \sigma_0^2$, $H_1: \sigma^2 < \sigma_0^2$: $W_\alpha = (-\infty, \chi^{2(n-1)}_{1-\alpha})$

$H_0: \sigma^2 = \sigma_0^2$, $H_1: \sigma^2 \neq \sigma_0^2$: $W_\alpha = (-\infty, \chi^{2(n-1)}_{1-\alpha/2}) \cup (\chi^{2(n-1)}_{\alpha/2}, +\infty)$
Ex:

Measuring the diameters of 12 products from a production line, the quality control engineer calculates s = 0.3. It is known that if the variability of the products is greater than 0.2, the production line must be stopped for adjustment. At the 5% level of significance, what conclusion does the engineer reach?
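A possible way to work this exercise is sketched below. It rests on two assumptions that the problem does not state explicitly: the threshold 0.2 is read as a standard deviation (so $\sigma_0 = 0.2$), and the test is right-tailed ($H_1: \sigma^2 > \sigma_0^2$). The critical value 19.675 is the upper 5% point of the chi-square distribution with 11 degrees of freedom, taken from a standard table rather than computed.

```python
# Figures from the diameter example
n = 12         # number of products measured
s = 0.3        # sample standard deviation
sigma0 = 0.2   # assumption: the threshold 0.2 is a standard deviation
alpha = 0.05   # significance level

# Observed value of G = (n - 1) S^2 / sigma0^2
g = (n - 1) * s**2 / sigma0**2

# Upper 5% point of chi-square with n - 1 = 11 degrees of freedom (table value)
chi2_crit = 19.675

reject = g > chi2_crit  # right-tailed test, H1: sigma^2 > sigma0^2
print(f"g = {g:.2f}, critical value = {chi2_crit}, reject H0: {reject}")
```

Under these assumptions g = 24.75 exceeds 19.675, so $H_0$ is rejected and the line would need to be stopped for adjustment. If 0.2 were instead meant as a variance, the statistic would be different and this conclusion would not carry over.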
2.6. Test for 2 variances
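$$X \sim N(\mu_1, \sigma_1^2); \quad Y \sim N(\mu_2, \sigma_2^2)$$

$$G = \frac{S_1^2}{S_2^2} \sim F(n_1 - 1, n_2 - 1) \quad \text{if } H_0 \text{ is true}$$

$H_0: \sigma_1^2 = \sigma_2^2$, $H_1: \sigma_1^2 > \sigma_2^2$: $W_\alpha = (f^{(n_1-1,\,n_2-1)}_{\alpha}; +\infty)$

$H_0: \sigma_1^2 = \sigma_2^2$, $H_1: \sigma_1^2 < \sigma_2^2$: $W_\alpha = (-\infty; f^{(n_1-1,\,n_2-1)}_{1-\alpha})$

$H_0: \sigma_1^2 = \sigma_2^2$, $H_1: \sigma_1^2 \neq \sigma_2^2$: $W_\alpha = (-\infty; f^{(n_1-1,\,n_2-1)}_{1-\alpha/2}) \cup (f^{(n_1-1,\,n_2-1)}_{\alpha/2}; +\infty)$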
