Chapter 5.

Estimation

Nguyễn Văn Hạnh


Department of Applied Mathematics
School of Applied Mathematics and Informatics

Second semester, 2022-2023

NV HANH Probability and statistics Second semester, 2022-2023 1 / 60


Contents

1 Random sample

2 Sampling distribution of random sample mean

3 Estimation
Point estimation
Maximum likelihood estimation methods
Method of moments

4 Confidence interval estimation
Confidence interval estimation of µ
Confidence interval estimation of σ²
Confidence interval estimation of p
General problem


Random sample

Population and sample


Definition 5.1:
A population is the set of all individuals of interest.
In practice, we usually study a characteristic X of the individuals in a
population. X itself is also called the population.
The number of individuals N is called the population size.
A subset of n individuals taken from a population X is called a
sample of size n.
A sample of size n is a vector of n observations (x1, x2, ..., xn).


Random sample

Suppose that we study a population X having a probability distribution
f(x) (a probability density function or a probability mass function).
Definition 5.2: A vector of n random variables (X1, X2, ..., Xn),
where the Xi are independent and identically distributed with the
probability distribution f(x), is called a random sample of size n taken
from the population X.
The joint probability distribution (pdf or pmf) of the random sample
(X1, X2, ..., Xn) is

$$f(x_1, x_2, \dots, x_n) = f(x_1) f(x_2) \cdots f(x_n)$$

and, viewed as a function of the parameters of f, it is called the
likelihood function.


Example

Let X be the electricity bill (in thousand dong) of a household in a
region of Vietnam (in June 2020). The population is the set of all
households in this region (about one million households).
Suppose that X follows a normal distribution with mean µ and
variance σ², with probability density function f(x; µ, σ²).
A random vector (X1, X2, ..., X50), where the Xi are i.i.d. random
variables with the same normal distribution f(x; µ, σ²), is called a
random sample of size 50 drawn from the population X.
Observing the electricity bills of 50 households from this region gives
the sample (x1, x2, ..., x50) = (255, 367, ..., 423). This sample is a
realization of the random sample (X1, X2, ..., X50).


Statistic

Definition 5.3: A statistic is a function h(X1, X2, ..., Xn) of the
random sample (X1, X2, ..., Xn).
Example: The random sample mean

$$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n}$$

is a statistic.
A statistic is a random variable, and the distribution of a statistic is
called a sampling distribution.


Some important statistics

The random sample mean:

$$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n}$$

The non-adjusted random sample variance:

$$\hat{S}^2 = \frac{(X_1 - \bar{X})^2 + \dots + (X_n - \bar{X})^2}{n}$$

The adjusted random sample variance:

$$S^2 = \frac{(X_1 - \bar{X})^2 + \dots + (X_n - \bar{X})^2}{n - 1}$$

The adjusted random sample standard deviation: $S = \sqrt{S^2}$


The Z-statistic:

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

The T-statistic:

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
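The sample mean and the two sample variances defined above can be computed directly with Python's standard library. The sketch below uses a small set of hypothetical observations (not data from these slides) and checks the relation (n − 1)S² = nŜ² between the adjusted and non-adjusted variances.

```python
import math
import statistics

# Hypothetical observations, chosen only so the arithmetic is easy to follow.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(sample)

x_bar = statistics.mean(sample)               # random sample mean
s2_unadjusted = statistics.pvariance(sample)  # divides by n
s2_adjusted = statistics.variance(sample)     # divides by n - 1
s = math.sqrt(s2_adjusted)                    # adjusted standard deviation

print(f"mean = {x_bar}, S_hat^2 = {s2_unadjusted}, S^2 = {s2_adjusted:.4f}, S = {s:.4f}")
```

Note that `statistics.pvariance` corresponds to Ŝ² and `statistics.variance` to S².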


Sampling distribution of random sample mean

Consider a random sample (X1, X2, ..., Xn) taken from a population
X. Denote µ = E(X) and σ² = V(X).
The random sample mean:

$$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n}.$$

Theorem 5.1: For any distribution of X (with finite variance), we have

$$E(\bar{X}) = \mu \quad\text{and}\quad V(\bar{X}) = \frac{\sigma^2}{n}.$$

Theorem 5.2: If X is normal, X ∼ N(µ, σ²), then X̄ is also normal:
X̄ ∼ N(µ, σ²/n). So the Z-statistic is standard normal:

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).$$


Consider a random sample (X1, X2, ..., Xn) taken from a population
X. Denote µ = E(X) and σ² = V(X).
Theorem 5.3 (the central limit theorem): as n → +∞, the Z-statistic
converges in distribution to the standard normal:

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \xrightarrow{\;n \to +\infty\;} N(0, 1)$$

When n is large enough,

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \approx N(0, 1) \quad\text{or}\quad \bar{X} \approx N\left(\mu, \frac{\sigma^2}{n}\right),$$

for any distribution of X with finite variance.
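A small simulation can make the central limit theorem concrete. The sketch below is only an illustration under assumed settings (an exponential population with rate 1, n = 50, 5000 replications, a fixed seed); none of these values come from the slides. It standardizes each simulated sample mean and checks that the Z values behave roughly like N(0, 1).

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the experiment is repeatable

rate = 1.0                       # assumed exponential population
mu, sigma = 1 / rate, 1 / rate   # its mean and standard deviation
n = 50                           # sample size
replications = 5000              # number of simulated random samples

z_values = []
for _ in range(replications):
    sample = [random.expovariate(rate) for _ in range(n)]
    x_bar = statistics.fmean(sample)
    z_values.append((x_bar - mu) / (sigma / math.sqrt(n)))

z_mean = statistics.fmean(z_values)
z_var = statistics.pvariance(z_values)
# For a standard normal, P(|Z| <= 1.645) is about 0.90.
coverage = sum(abs(z) <= 1.645 for z in z_values) / replications

print(f"mean of Z = {z_mean:.3f}, variance of Z = {z_var:.3f}, "
      f"P(|Z| <= 1.645) = {coverage:.3f}")
```

The population here is clearly non-normal (skewed), yet the standardized means should have mean near 0, variance near 1, and about 90% of their mass inside ±1.645.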


Consider a random sample (X1, X2, ..., Xn) taken from a population
X. Denote µ = E(X) and σ² = V(X).
Theorem 5.4: If X is normal, X ∼ N(µ, σ²), then the T-statistic
follows the Student's distribution with n − 1 degrees of freedom:

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}.$$

Theorem 5.5 (the central limit theorem + Slutsky's theorem): For any
distribution of X with finite variance:

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \xrightarrow{\;n \to +\infty\;} N(0, 1)$$


Estimation Point estimation

Introduction
Consider a random sample (X1, X2, ..., Xn) taken from a population
X.
Suppose that the population X follows a distribution F(x; θ) that
depends on an unknown parameter θ.
The parameter θ is unknown because we usually cannot observe the
whole population.
Problem of estimation: estimate the parameter θ based on the random
sample (X1, X2, ..., Xn).
Definition 5.4: A point estimator of θ is a statistic
θ̂ = h(X1, X2, ..., Xn).
A confidence interval estimation of θ with confidence level 1 − α is
a random interval [θ̂1; θ̂2] = [h1(X1, X2, ..., Xn); h2(X1, X2, ..., Xn)]
such that:

$$P(\hat{\theta}_1 \le \theta \le \hat{\theta}_2) = 1 - \alpha$$
Example

Let X be the electricity bill (in thousand dong) of a household in a region
of Vietnam (in June 2020). The electricity bills of 200 households from this
region were observed.
[Data table, histogram of the data, and histogram with a fitted normal curve
are not reproduced here.]
The distribution of the data can be approximated by a normal distribution.

Example
Modelling: We can suppose that the electricity bills of households in
this region follow a normal distribution with parameter θ = (µ, σ²)
and probability density function

$$f(x; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$

The parameter µ is the population mean (the mean electricity bill of
all households) and the parameter σ² is the population variance.
A point estimator of µ is the random sample mean

$$\bar{X} = \frac{1}{n}(X_1 + X_2 + \dots + X_n)$$

For the given sample, the sample mean

$$\bar{x} = \frac{1}{n}(x_1 + x_2 + \dots + x_n) = 236.78$$

is called a point estimate of µ.
Maximum likelihood estimation (MLE)

Consider a random sample (X1, X2, ..., Xn) taken from a population
X that follows a distribution f(x; θ) (the pdf or pmf).
The likelihood function of θ is defined by:

$$L(\theta) = \prod_{i=1}^{n} f(X_i; \theta)$$

The likelihood function measures how plausible the observed sample is
under the model (for a discrete model, it is the probability that the
sample was observed). Its value depends on the parameter θ of the model:
different parameter values give different likelihoods.
The idea of MLE is to choose the parameter θ under which the observed
sample is most plausible, that is, the value that fits the data best and
maximizes the likelihood function.
Maximum likelihood estimator: the value θ̂ that maximizes the
likelihood function L(θ), or equivalently log L(θ):

$$\hat{\theta} = \arg\max_{\theta} L(\theta) = \arg\max_{\theta} \log L(\theta)$$
Procedure to find the maximum likelihood estimator:
Step 1: Write the pdf or pmf of X: f(x; θ).
Step 2: Compute the likelihood function of θ:

$$L(\theta) = \prod_{i=1}^{n} f(X_i; \theta)$$

Step 3: Compute the log-likelihood function:

$$\log L(\theta) = \sum_{i=1}^{n} \log f(X_i; \theta)$$

Step 4: Solve the equation

$$\frac{\partial \log L(\theta)}{\partial \theta} = 0$$

and let θ̂ be the solution, then check that

$$\left.\frac{\partial^2 \log L(\theta)}{\partial \theta^2}\right|_{\theta = \hat{\theta}} < 0.$$
Example 5.1: Let X be the lifetime of a type of battery produced by a
factory and suppose that X follows an exponential distribution with
parameter λ > 0. Find the maximum likelihood estimator of λ.
Step 1: The probability density function (pdf) of X is

$$f(x; \lambda) = \lambda e^{-\lambda x}, \quad \text{for } x > 0.$$

Step 2: The likelihood function of λ:

$$L(\lambda) = \prod_{i=1}^{n} \lambda e^{-\lambda X_i} = \lambda^n e^{-\lambda \sum_{i=1}^{n} X_i}$$

Step 3: The log-likelihood function:

$$\log L(\lambda) = n \log \lambda - \lambda \sum_{i=1}^{n} X_i$$


Step 4: Solve the equation

$$\frac{\partial \log L(\lambda)}{\partial \lambda} = \frac{n}{\lambda} - \sum_{i=1}^{n} X_i = 0$$

we obtain the solution

$$\hat{\lambda} = \frac{n}{\sum_{i=1}^{n} X_i} = \frac{1}{\bar{X}}.$$

Since

$$\frac{\partial^2 \log L(\lambda)}{\partial \lambda^2} = -\frac{n}{\lambda^2} < 0, \quad \text{for all } \lambda > 0,$$

the maximum likelihood estimator of λ is

$$\hat{\lambda} = \frac{1}{\bar{X}}.$$
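The four-step procedure for the exponential model can be checked numerically. The sketch below is a minimal illustration, assuming a handful of hypothetical lifetimes (not data from the slides): it evaluates the log-likelihood from Step 3 on a grid of candidate values of λ and compares the grid maximizer with the closed form 1/x̄ from Step 4.

```python
import math

def log_likelihood(lam, xs):
    """log L(lambda) = n*log(lambda) - lambda*sum(x_i) for an exponential sample."""
    return len(xs) * math.log(lam) - lam * sum(xs)

# Hypothetical battery lifetimes (units arbitrary), for illustration only.
lifetimes = [1.2, 0.4, 2.7, 0.9, 1.8]

# Closed-form maximum likelihood estimate: 1 / sample mean.
lam_hat = len(lifetimes) / sum(lifetimes)

# Coarse grid search over lambda in (0, 5] with step 0.001.
grid = [k / 1000 for k in range(1, 5001)]
lam_grid = max(grid, key=lambda lam: log_likelihood(lam, lifetimes))

print(f"closed form: {lam_hat:.4f}, grid search: {lam_grid:.3f}")
```

The grid search is deliberately crude; it only serves to confirm that the stationary point found analytically is indeed where the log-likelihood peaks.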
Maximum likelihood estimation (MLE) of normal distribution

Consider a random sample (X1, X2, ..., Xn) drawn from a normal
population X with mean µ and variance σ². Find the MLE
of θ = (µ, σ²).
The pdf of X is

$$f(x; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$

The likelihood function is

$$L(\theta) = \prod_{i=1}^{n} f(X_i; \theta) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(X_i - \mu)^2}{2\sigma^2}\right)$$

The log-likelihood function is

$$\log L(\theta) = -\frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (X_i - \mu)^2$$


Solve the following system of equations:

$$\frac{\partial \log L(\theta)}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (X_i - \mu) = 0$$

$$\frac{\partial \log L(\theta)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (X_i - \mu)^2 = 0$$

Obtain the MLE of µ and σ² as follows:

$$\hat{\mu} = \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \quad\text{and}\quad \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2$$
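The algebra that leads from the two score equations to the estimators can be written out as follows. This is a worked version of the same step, using only the symbols already defined for the normal model.

$$
\begin{aligned}
\sum_{i=1}^{n} (X_i - \mu) = 0
  &\;\Longrightarrow\; n\mu = \sum_{i=1}^{n} X_i
  \;\Longrightarrow\; \hat{\mu} = \bar{X}, \\
-\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (X_i - \bar{X})^2 = 0
  &\;\Longrightarrow\; n\sigma^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2
  \;\Longrightarrow\; \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2 .
\end{aligned}
$$

The second line is obtained by substituting µ = X̄ into the second equation and multiplying both sides by 2σ⁴.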


Maximum likelihood estimation (MLE) of proportion

Let p be the proportion of defective items in a production line. Find the
MLE of p.
Consider a random sample of n items and define the random variables Xi
equal to 1 if the i-th item in the sample is defective and equal
to 0 otherwise. Then the random sample (X1, X2, ..., Xn) is drawn
from a Bernoulli population X with parameter p.
The pmf of X is

$$f(x; p) = p^x (1 - p)^{1 - x}, \quad \text{for } x = 0, 1.$$

The likelihood function is

$$L(p) = \prod_{i=1}^{n} f(X_i; p) = \prod_{i=1}^{n} p^{X_i} (1 - p)^{1 - X_i} = p^{\sum_{i=1}^{n} X_i} (1 - p)^{n - \sum_{i=1}^{n} X_i}$$


Maximum likelihood estimation (MLE) of proportion

The log-likelihood function is

$$\log L(p) = \sum_{i=1}^{n} X_i \log p + \left(n - \sum_{i=1}^{n} X_i\right) \log(1 - p)$$

Solve the following equation:

$$\frac{\partial \log L(p)}{\partial p} = \frac{\sum_{i=1}^{n} X_i}{p} - \frac{n - \sum_{i=1}^{n} X_i}{1 - p} = 0$$

Obtain the MLE of p as follows:

$$\hat{p} = \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$
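The step from the score equation to the estimator of p is short enough to spell out. Writing $T = \sum_{i=1}^{n} X_i$ for the number of defective items in the sample (a shorthand introduced only for this derivation):

$$
\frac{T}{p} = \frac{n - T}{1 - p}
\;\Longrightarrow\; T(1 - p) = (n - T)\,p
\;\Longrightarrow\; T = np
\;\Longrightarrow\; \hat{p} = \frac{T}{n} = \bar{X}.
$$

So the maximum likelihood estimator is simply the sample proportion of defective items.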


Method of moments
Definition 5.5: Let X be a random variable. The k-th moment of X is
E[X^k], for k ∈ N*.
Definition 5.6: Let X be a random variable and (X1, X2, ..., Xn) be a
random sample drawn from the population X. The k-th sample
moment of X is

$$\frac{1}{n}(X_1^k + \dots + X_n^k) = \frac{1}{n} \sum_{i=1}^{n} X_i^k$$

Method of moments: Let X be a population with probability
distribution f(x; θ), where θ is an unknown parameter in R^r. The
estimator of θ by the method of moments is the solution of the
following system of equations:

$$E[X^k] = \frac{1}{n} \sum_{i=1}^{n} X_i^k, \quad k = 1, \dots, r.$$


Example 5.2: Let X be the lifetime of a type of battery produced by a
factory and suppose that X follows an exponential distribution with
parameter λ > 0. Find the estimator of λ by the method of moments.
The parameter λ is in R+*, so the dimension is r = 1.
The 1st moment of X is E[X] = 1/λ.
The 1st sample moment of X is

$$\frac{1}{n}(X_1 + \dots + X_n) = \bar{X}$$

We solve the equation:

$$E[X] = \frac{1}{n} \sum_{i=1}^{n} X_i \;\Leftrightarrow\; \frac{1}{\lambda} = \bar{X} \;\Leftrightarrow\; \lambda = \frac{1}{\bar{X}}.$$

The estimator of λ by the method of moments is

$$\hat{\lambda}_{MM} = \frac{1}{\bar{X}} = \hat{\lambda}_{MLE}.$$
Unbiased estimator

Definition 5.7: A point estimator θ̂ of θ is called unbiased if E(θ̂) = θ.
Example: Consider a random sample (X1, X2, ..., Xn) drawn from a
population X with mean µ and variance σ².
We can prove that:

$$E(\bar{X}) = \mu \quad\text{and}\quad E(\hat{S}^2) = \frac{n - 1}{n} \sigma^2.$$

Then X̄ is an unbiased estimator of µ and Ŝ² is a biased estimator of σ².
We adjust Ŝ² to obtain an unbiased estimator of σ² as follows:

$$S^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (X_i - \bar{X})^2.$$
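The bias of Ŝ² follows from a standard decomposition of the sum of squares. The derivation below is a sketch of that argument; it uses only E(Xᵢ) = µ, V(Xᵢ) = σ² and V(X̄) = σ²/n from Theorem 5.1.

$$
\begin{aligned}
\sum_{i=1}^{n} (X_i - \bar{X})^2
  &= \sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X} - \mu)^2, \\
E\left[\sum_{i=1}^{n} (X_i - \bar{X})^2\right]
  &= n\sigma^2 - n \cdot \frac{\sigma^2}{n} = (n - 1)\sigma^2, \\
E(\hat{S}^2) &= \frac{n - 1}{n}\sigma^2,
  \qquad E(S^2) = \frac{1}{n - 1}(n - 1)\sigma^2 = \sigma^2 .
\end{aligned}
$$

Dividing by n − 1 instead of n is therefore exactly the correction needed to remove the bias.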


Confidence interval estimation

A confidence interval estimation of θ with confidence level 1 − α is
a random interval [θ̂1; θ̂2] such that P(θ̂1 ≤ θ ≤ θ̂2) = 1 − α.
Procedure for finding a confidence interval estimation:
Find a point estimator θ̂ of θ.
Use the sampling distribution of θ̂, or the central limit theorem

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \approx N(0; 1)$$

(where µ and σ are functions of θ), to find an interval [θ̂1; θ̂2] such
that P(θ̂1 ≤ θ ≤ θ̂2) = 1 − α.


Confidence interval estimation of µ

Problem 1: Consider a random sample (X1, X2, ..., Xn) taken from a
population X with mean µ = E(X) and variance σ² = V(X).
Find a 1 − α confidence interval estimation of µ.
Solution:
Case 1: The population X is normal, X ∼ N(µ, σ²), where σ² is known.
A point estimator of µ is the random sample mean X̄.
The sampling distribution of X̄ is also normal: X̄ ∼ N(µ, σ²/n).
The statistic

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0; 1).$$

Let Zα/2 be the critical value of N(0; 1) at level 1 − α/2, meaning
that P(Z < Zα/2) = 1 − α/2.
We have

$$P\left(-Z_{\alpha/2} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le Z_{\alpha/2}\right) = 1 - \alpha$$

$$P\left(\bar{X} - Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$
Case 1 (continued): Then a 1 − α confidence interval (CI) estimation of µ is:

$$\left[\bar{X} - Z_{\alpha/2} \frac{\sigma}{\sqrt{n}};\; \bar{X} + Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right] = \bar{X} \mp \varepsilon,$$

where $\varepsilon = Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$ is called the error of the CI.


Case 2: The population X is normal, X ∼ N(µ, σ²), where σ² is unknown.
A point estimator of µ is the random sample mean X̄.
We use the T-statistic

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1},$$

where t_{n−1} is the Student's distribution with n − 1 degrees of freedom.
Let t_{n−1;α/2} be the critical value of t_{n−1} at level 1 − α/2, meaning
that P(t_{n−1} < t_{n−1;α/2}) = 1 − α/2.
We have

$$P\left(-t_{n-1;\alpha/2} \le \frac{\bar{X} - \mu}{S/\sqrt{n}} \le t_{n-1;\alpha/2}\right) = 1 - \alpha$$


Case 2 (continued): We have

$$P\left(\bar{X} - t_{n-1;\alpha/2} \frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{n-1;\alpha/2} \frac{S}{\sqrt{n}}\right) = 1 - \alpha$$

Then a 1 − α confidence interval (CI) estimation of µ is:

$$\left[\bar{X} - t_{n-1;\alpha/2} \frac{S}{\sqrt{n}};\; \bar{X} + t_{n-1;\alpha/2} \frac{S}{\sqrt{n}}\right] = \bar{X} \mp \varepsilon,$$

where $\varepsilon = t_{n-1;\alpha/2} \frac{S}{\sqrt{n}}$ is called the error of the CI.
Case 3: The population X is non-normal, n is large enough and the
population variance σ² is known.
The statistic

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \approx N(0; 1).$$

Then a 1 − α confidence interval (CI) estimation of µ is:

$$\left[\bar{X} - Z_{\alpha/2} \frac{\sigma}{\sqrt{n}};\; \bar{X} + Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right] = \bar{X} \mp \varepsilon,$$

where $\varepsilon = Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$ is called the error of the CI.


Case 4: The population X is non-normal, n is large enough and the
population variance σ² is unknown.
The statistic

$$Z = \frac{\bar{X} - \mu}{S/\sqrt{n}} \approx N(0; 1).$$

Then a 1 − α confidence interval (CI) estimation of µ is:

$$\left[\bar{X} - Z_{\alpha/2} \frac{S}{\sqrt{n}};\; \bar{X} + Z_{\alpha/2} \frac{S}{\sqrt{n}}\right] = \bar{X} \mp \varepsilon,$$

where $\varepsilon = Z_{\alpha/2} \frac{S}{\sqrt{n}}$ is called the error of the CI.


Example 5.3: Let X be the amount of the telephone bill (USD) of a customer
in a city. Suppose that X follows a normal distribution N(µ, σ²). From
a sample of 20 customers, we obtained the following data:

31.3, 28.8, 30.8, 29.6, 32.5, 30.1, 28.6, 32.2, 30.8, 32.6,
31.8, 28.5, 29.9, 27.2, 36.0, 30.6, 29.2, 30.9, 31.0, 30.8

Find the point estimates of µ and σ² by the method of moments.
Find the point estimates of µ and σ² by the MLE method.
Find a 90% confidence interval estimate of µ.
Suppose that the standard deviation σ is known and equal to 1.5. Find
a 90% confidence interval estimate of µ.


Solution of Example 5.3: The point estimators of µ and σ² by the method of
moments:
The parameter θ = (µ, σ²) is in R², so the dimension is r = 2.
We have E(X) = µ and E(X²) = µ² + σ².
We solve the following system of equations:

$$E[X] = \frac{1}{n} \sum_{i=1}^{n} X_i \quad\text{and}\quad E[X^2] = \frac{1}{n} \sum_{i=1}^{n} X_i^2$$

The point estimators of µ and σ² by the method of moments are:

$$\hat{\mu}_{MM} = \bar{X} \quad\text{and}\quad \hat{\sigma}^2_{MM} = \frac{1}{n} \sum_{i=1}^{n} X_i^2 - \bar{X}^2 = \hat{S}^2.$$


For the given sample, the point estimates of µ and σ² by the method of
moments are:

$$\hat{\mu}_{MM} = \bar{x} = \frac{1}{20}(31.3 + \dots + 30.8) = 30.66$$

and

$$\hat{\sigma}^2_{MM} = \hat{s}^2 = \frac{1}{20}(31.3^2 + \dots + 30.8^2 - 20 \times 30.66^2) = 3.4234$$
Solution of Example 5.3:
The point estimators of µ and σ² by the maximum likelihood
estimation method are:

$$\hat{\mu}_{MLE} = \bar{X} \quad\text{and}\quad \hat{\sigma}^2_{MLE} = \hat{S}^2.$$

For the given sample, the point estimates of µ and σ² by the
maximum likelihood estimation method are:

$$\hat{\mu}_{MLE} = \bar{x} = \frac{1}{20}(31.3 + \dots + 30.8) = 30.66$$

and

$$\hat{\sigma}^2_{MLE} = \hat{s}^2 = \frac{1}{20}(31.3^2 + \dots + 30.8^2 - 20 \times 30.66^2) = 3.4234$$
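The point estimates for Example 5.3 can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library and the 20 observations listed in the example; the variable names are illustrative.

```python
import statistics

# Telephone bills (USD) of the 20 customers in Example 5.3.
bills = [31.3, 28.8, 30.8, 29.6, 32.5, 30.1, 28.6, 32.2, 30.8, 32.6,
         31.8, 28.5, 29.9, 27.2, 36.0, 30.6, 29.2, 30.9, 31.0, 30.8]

x_bar = statistics.mean(bills)        # estimate of mu (MM and MLE coincide)
s2_hat = statistics.pvariance(bills)  # divides by n: MM and MLE estimate of sigma^2
s2 = statistics.variance(bills)       # divides by n - 1: adjusted sample variance

print(f"x_bar = {x_bar:.2f}, s_hat^2 = {s2_hat:.4f}, s^2 = {s2:.3f}")
```

The adjusted variance s² is not needed for the point estimates themselves, but it is the quantity used in the confidence intervals that follow.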


Solution of Example 5.3: Find a 90% confidence interval estimate of µ.
Since X ∼ N(µ; σ²) where σ² is unknown, we use the following
statistic

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1},$$

so a 1 − α confidence interval (CI) estimation of µ is:

$$\left[\bar{X} - t_{n-1;\alpha/2} \frac{S}{\sqrt{n}};\; \bar{X} + t_{n-1;\alpha/2} \frac{S}{\sqrt{n}}\right]$$

For the given sample, we have
n = 20; x̄ = 30.66; s² = (n/(n − 1)) ŝ² = 3.604; s = √3.604 = 1.9;
1 − α = 90%, so t_{n−1;α/2} = t_{19;0.05} = 1.73.
So the CI of µ is

$$30.66 \mp 1.73 \times \frac{1.9}{\sqrt{20}} = 30.66 \mp 0.735 = [29.925;\; 31.395]$$
Solution of Example 5.3: Find a 90% confidence interval estimate of µ when
σ = 1.5.
Since X ∼ N(µ; σ²) where σ is known and equal to 1.5, we use the
following statistic

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0; 1),$$

so a 1 − α confidence interval (CI) estimation of µ is:

$$\left[\bar{X} - Z_{\alpha/2} \frac{\sigma}{\sqrt{n}};\; \bar{X} + Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right]$$

For the given sample, we have n = 20; x̄ = 30.66; 1 − α = 90%, so
Zα/2 = Z0.05 = 1.645.
So the CI of µ is

$$30.66 \mp 1.645 \times \frac{1.5}{\sqrt{20}} = 30.66 \mp 0.55 = [30.11;\; 31.21]$$
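Both 90% intervals for µ in Example 5.3 can be recomputed as a check. The sketch below makes two assumptions worth flagging: the Student critical value t₁₉;₀.₀₅ = 1.73 is hard-coded from the slide (the Python standard library has no t quantile function), and the normal critical value is obtained from `statistics.NormalDist`.

```python
import math
from statistics import NormalDist, mean, stdev

bills = [31.3, 28.8, 30.8, 29.6, 32.5, 30.1, 28.6, 32.2, 30.8, 32.6,
         31.8, 28.5, 29.9, 27.2, 36.0, 30.6, 29.2, 30.9, 31.0, 30.8]
n = len(bills)
x_bar = mean(bills)
alpha = 0.10

# Case sigma unknown: Student interval with n - 1 = 19 degrees of freedom.
t_crit = 1.73                      # table value t_{19;0.05}, taken from the slide
s = stdev(bills)                   # adjusted sample standard deviation
eps_t = t_crit * s / math.sqrt(n)
ci_t = (x_bar - eps_t, x_bar + eps_t)

# Case sigma known (sigma = 1.5): normal interval.
sigma = 1.5
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.645
eps_z = z_crit * sigma / math.sqrt(n)
ci_z = (x_bar - eps_z, x_bar + eps_z)

print(f"sigma unknown: [{ci_t[0]:.3f}; {ci_t[1]:.3f}]")
print(f"sigma known:   [{ci_z[0]:.2f}; {ci_z[1]:.2f}]")
```

Small differences in the last digit relative to the slide come from rounding s to 1.9 in the hand calculation.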
Confidence interval estimation of σ²

Problem 2: Let (X1, X2, ..., Xn) be a random sample taken from a normal
population X ∼ N(µ, σ²). Find a 1 − α confidence interval estimate of σ².
The point estimator of σ² is S².
The sampling distribution of S² is given by

$$\frac{(n - 1)S^2}{\sigma^2} \sim \chi^2_{n-1},$$

where χ²_{n−1} is the Chi-squared distribution with n − 1 degrees of
freedom.
Let χ²_{n−1;1−α/2} and χ²_{n−1;α/2} be the critical values of the Chi-squared
distribution χ²_{n−1} at levels α/2 and 1 − α/2, respectively.
We have

$$P\left(\chi^2_{n-1;1-\alpha/2} \le \frac{(n - 1)S^2}{\sigma^2} \le \chi^2_{n-1;\alpha/2}\right) = 1 - \alpha$$
Then

$$P\left(\frac{(n - 1)S^2}{\chi^2_{n-1;\alpha/2}} \le \sigma^2 \le \frac{(n - 1)S^2}{\chi^2_{n-1;1-\alpha/2}}\right) = 1 - \alpha$$

A 1 − α confidence interval estimate of σ² is

$$\left[\frac{(n - 1)S^2}{\chi^2_{n-1;\alpha/2}};\; \frac{(n - 1)S^2}{\chi^2_{n-1;1-\alpha/2}}\right]$$


Example 5.4: Let X be the amount of the telephone bill (USD) of a customer
in a city. Suppose that X follows a normal distribution N(µ, σ²). From
a sample of 20 customers, we obtained the following data:
31.3, 28.8, 30.8, 29.6, 32.5, 30.1, 28.6, 32.2, 30.8, 32.6,
31.8, 28.5, 29.9, 27.2, 36.0, 30.6, 29.2, 30.9, 31.0, 30.8
Find a 90% confidence interval estimate of σ².
A 1 − α confidence interval estimate of σ² is

$$\left[\frac{(n - 1)S^2}{\chi^2_{n-1;\alpha/2}};\; \frac{(n - 1)S^2}{\chi^2_{n-1;1-\alpha/2}}\right],$$

where n = 20; s² = 3.604; 1 − α = 0.9, so
χ²_{n−1;1−α/2} = χ²_{19;0.95} = 10.12 and χ²_{n−1;α/2} = χ²_{19;0.05} = 30.14.
Then a 90% confidence interval estimate of σ² is

$$\left[\frac{19 \times 3.604}{30.14};\; \frac{19 \times 3.604}{10.12}\right] = [2.27;\; 6.77].$$
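The interval for σ² in Example 5.4 can be verified in the same way. In this sketch the two Chi-squared critical values are hard-coded from the slide, since the Python standard library does not provide Chi-squared quantiles; everything else is computed from the data.

```python
import statistics

bills = [31.3, 28.8, 30.8, 29.6, 32.5, 30.1, 28.6, 32.2, 30.8, 32.6,
         31.8, 28.5, 29.9, 27.2, 36.0, 30.6, 29.2, 30.9, 31.0, 30.8]
n = len(bills)
s2 = statistics.variance(bills)   # adjusted sample variance S^2

# Table values for 19 degrees of freedom, taken from the slide.
chi2_small = 10.12   # chi^2_{19;0.95}: P(chi^2 > 10.12) = 0.95
chi2_large = 30.14   # chi^2_{19;0.05}: P(chi^2 > 30.14) = 0.05

# The larger critical value gives the lower endpoint, and vice versa.
ci_low = (n - 1) * s2 / chi2_large
ci_high = (n - 1) * s2 / chi2_small

print(f"90% CI for sigma^2: [{ci_low:.2f}; {ci_high:.2f}]")
```

Unlike the intervals for µ, this interval is not symmetric around s², because the Chi-squared distribution is skewed.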
Confidence interval estimation of population proportion p

Problem 3: Let p be a population proportion, for example, the
proportion of defective items in a production line. Find a 1 − α confidence
interval estimate of p.
Consider a random sample of size n from the population.
A point estimator of p is p̂, the sample proportion (example: the
proportion of defective items in a sample of n items).
By the following limit theorem

$$Z = \frac{\hat{p} - p}{\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}} \approx N(0, 1)$$

we obtain the following 1 − α confidence interval estimate of p:

$$\hat{p} \mp \varepsilon = \hat{p} \mp Z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

where $\varepsilon = Z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$ is the error of the CI.
Example 5.5: Let p be the proportion of defective items in a production
line. A random sample of 120 items from the line was examined and there were
6 defective items. Find a 90% confidence interval estimate of p.
A 90% confidence interval estimate of p is

$$\hat{p} \mp \varepsilon = \hat{p} \mp Z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

where n = 120; p̂ = 6/120 = 0.05; 1 − α = 0.9, so
Zα/2 = Z0.05 = 1.645.
So the CI of p is

$$0.05 \mp 1.645 \sqrt{\frac{0.05 \times 0.95}{120}} = 5\% \mp 3.3\% = [1.7\%;\; 8.3\%].$$
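The proportion interval of Example 5.5 can be recomputed numerically. The sketch below uses the sample counts from the example (6 defective items out of 120) and takes the two-sided 90% critical value of the standard normal, z₀.₀₅ ≈ 1.645, from `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

n = 120          # items examined
defective = 6    # defective items found
alpha = 0.10     # 1 - alpha = 90% confidence level

p_hat = defective / n
z_crit = NormalDist().inv_cdf(1 - alpha / 2)          # about 1.645
eps = z_crit * math.sqrt(p_hat * (1 - p_hat) / n)     # error of the CI
ci_low, ci_high = p_hat - eps, p_hat + eps

print(f"p_hat = {p_hat:.3f}, error = {eps:.4f}, "
      f"90% CI = [{ci_low:.3%}; {ci_high:.3%}]")
```

For comparison, a 95% level would use z₀.₀₂₅ ≈ 1.96 and give a wider interval, so it is worth checking that the critical value matches the stated confidence level.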


Confidence interval estimation: general problem

General problem: Observe a population X with the pdf (or pmf) f(x; θ),
where θ is the unknown parameter to estimate. Find a 1 − α confidence
interval estimation of θ.
Consider a random sample (X1, X2, ..., Xn) taken from the population
X.
Find a point estimator θ̂ of θ.
Use the sampling distribution of θ̂ or a limit theorem, for example:

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{\bar{X} - g_1(\theta)}{g_2(\theta)/\sqrt{n}} \approx N(0; 1)$$

From the equation

$$P\left(-Z_{\alpha/2} \le \frac{\bar{X} - g_1(\theta)}{g_2(\theta)/\sqrt{n}} \le Z_{\alpha/2}\right) = 1 - \alpha$$

we find an interval [θ̂1; θ̂2] such that P(θ̂1 ≤ θ ≤ θ̂2) = 1 − α.
Example 5.6: Let X be the lifetime (in years) of a mechanical part.
Suppose that X follows an exponential distribution with rate parameter λ.
Construct a 1 − α confidence interval estimation of λ.
Given the following sample:

X (years)      [0, 1]  (1, 2]  (2, 3]  (3, 4]  (4, 5]  (5, 6]  (6, 7]
No. of parts     20      12       8       3       3       2       2

Find a 90% confidence interval estimate of λ for this sample.


Solution:
Since X ∼ E(λ), the pdf of X is f(x; λ) = λe^{−λx} for x > 0, and
µ = E(X) = 1/λ; σ² = V(X) = 1/λ², so σ = 1/λ.
By the central limit theorem:

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{\bar{X} - 1/\lambda}{(1/\lambda)/\sqrt{n}} = (\bar{X}\lambda - 1)\sqrt{n} \approx N(0; 1)$$

From the equation

$$P\left(-Z_{\alpha/2} \le (\bar{X}\lambda - 1)\sqrt{n} \le Z_{\alpha/2}\right) = 1 - \alpha$$

or

$$P\left(\frac{1 - Z_{\alpha/2}/\sqrt{n}}{\bar{X}} \le \lambda \le \frac{1 + Z_{\alpha/2}/\sqrt{n}}{\bar{X}}\right) = 1 - \alpha$$
Then a 1 − α confidence interval estimation of λ is

$$\left[\frac{1 - Z_{\alpha/2}/\sqrt{n}}{\bar{X}};\; \frac{1 + Z_{\alpha/2}/\sqrt{n}}{\bar{X}}\right]$$

For the given sample, we have n = 50; 1 − α = 0.9, so
Zα/2 = Z0.05 = 1.645;
x̄ = (20 × 0.5 + 12 × 1.5 + ... + 2 × 6.5)/50 = 1.92
(each interval is represented by its midpoint).
So the 90% CI of λ is

$$\frac{1 \mp 1.645/\sqrt{50}}{1.92} = [0.40;\; 0.64]$$

We are 90% confident that the parameter λ is between 0.40 and 0.64.


Solution: 2nd method

Using the following limit theorem:

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{\bar{X} - 1/\lambda}{S/\sqrt{n}} \approx N(0; 1)$$

From the equation

$$P\left(-Z_{\alpha/2} \le \frac{\bar{X} - 1/\lambda}{S/\sqrt{n}} \le Z_{\alpha/2}\right) = 1 - \alpha$$

or

$$P\left(\frac{1}{\bar{X} + Z_{\alpha/2}\, S/\sqrt{n}} \le \lambda \le \frac{1}{\bar{X} - Z_{\alpha/2}\, S/\sqrt{n}}\right) = 1 - \alpha$$


Then a 1 − α confidence interval estimation of λ is

$$\left[\frac{1}{\bar{X} + Z_{\alpha/2}\, S/\sqrt{n}};\; \frac{1}{\bar{X} - Z_{\alpha/2}\, S/\sqrt{n}}\right]$$

For the given sample, we have n = 50; Zα/2 = Z0.05 = 1.645; x̄ = 1.92;
s² = (20 × 0.5² + ... + 2 × 6.5² − 50 × 1.92²)/49 = 2.861;
s = √2.861 = 1.69.
So the 90% CI of λ is

$$\frac{1}{1.92 \pm 1.645 \times \dfrac{1.69}{\sqrt{50}}} = [0.43;\; 0.65]$$

We are 90% confident that the parameter λ is between 0.43 and 0.65.
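Both constructions for Example 5.6 can be checked against the grouped data. The sketch below assumes, as in the hand calculation, that every observation in a class is represented by the class midpoint; the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Grouped lifetimes from Example 5.6: class midpoints and class counts.
midpoints = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5]
counts = [20, 12, 8, 3, 3, 2, 2]

n = sum(counts)
x_bar = sum(c * m for c, m in zip(counts, midpoints)) / n
s2 = (sum(c * m ** 2 for c, m in zip(counts, midpoints)) - n * x_bar ** 2) / (n - 1)
s = math.sqrt(s2)
z = NormalDist().inv_cdf(0.95)   # Z_{0.05}, about 1.645

# Method 1: uses sigma = 1/lambda, so Z = (x_bar * lambda - 1) * sqrt(n).
ci1 = ((1 - z / math.sqrt(n)) / x_bar, (1 + z / math.sqrt(n)) / x_bar)

# Method 2: replaces sigma by the sample standard deviation S.
half = z * s / math.sqrt(n)
ci2 = (1 / (x_bar + half), 1 / (x_bar - half))

print(f"n = {n}, x_bar = {x_bar:.2f}, s^2 = {s2:.3f}")
print(f"method 1: [{ci1[0]:.2f}; {ci1[1]:.2f}]")
print(f"method 2: [{ci2[0]:.3f}; {ci2[1]:.3f}]")
```

The two intervals are close but not identical, because method 1 uses the model relation σ = 1/λ while method 2 estimates σ from the data.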


Example 5.7: Let X be the number of accidents per week in a small city.
Suppose that X follows a Poisson distribution with mean parameter λ.
Find the point estimator of λ by the method of moments and by the
MLE method.
Construct a 1 − α confidence interval estimation of λ.
Given the following sample:

X               0    1    2    3    4
No. of weeks    7   15   10   12    6

Find a 90% confidence interval estimate of λ for this sample.


Solution: The point estimator of λ by the method of moments:
The parameter λ ∈ R+*, so the dimension of the parameter space is
r = 1.
The first moment of X is E(X) = λ.
The first sample moment of X is $\frac{1}{n} \sum_{i=1}^{n} X_i = \bar{X}$.
We solve the following equation:

$$E(X) = \frac{1}{n} \sum_{i=1}^{n} X_i \quad\text{or}\quad \lambda = \bar{X}$$

So the point estimator of λ by the method of moments is λ̂_MM = X̄.


Solution: The point estimator of λ by the MLE method:
The pmf of X is

$$f(x; \lambda) = e^{-\lambda} \frac{\lambda^x}{x!}, \quad x = 0, 1, 2, \dots$$

The likelihood function of λ is

$$L(\lambda) = \prod_{i=1}^{n} f(X_i; \lambda) = \prod_{i=1}^{n} e^{-\lambda} \frac{\lambda^{X_i}}{X_i!} = e^{-n\lambda} \frac{\lambda^{\sum_{i=1}^{n} X_i}}{\prod_{i=1}^{n} X_i!}$$

The log-likelihood function of λ is

$$\log L(\lambda) = -n\lambda + \sum_{i=1}^{n} X_i \log(\lambda) - \log\left(\prod_{i=1}^{n} X_i!\right)$$


Solve the following equation:

$$\frac{\partial \log L(\lambda)}{\partial \lambda} = -n + \frac{\sum_{i=1}^{n} X_i}{\lambda} = 0$$

we have λ = X̄.
Since

$$\frac{\partial^2 \log L(\lambda)}{\partial \lambda^2} = -\frac{\sum_{i=1}^{n} X_i}{\lambda^2} < 0$$

the likelihood function attains its maximum at λ = X̄. So
λ̂_MLE = X̄.


Solution: Construct a 1 − α confidence interval estimation of λ:
Since X ∼ P(λ), we have µ = E(X) = λ; σ² = V(X) = λ; σ = √λ.
By the central limit theorem:

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{\bar{X} - \lambda}{\sqrt{\lambda}} \sqrt{n} \approx N(0; 1)$$

From the equation

$$P\left(\frac{|\bar{X} - \lambda|}{\sqrt{\lambda}} \sqrt{n} \le Z_{\alpha/2}\right) = 1 - \alpha$$

or

$$P\left(n(\bar{X} - \lambda)^2 \le \lambda Z_{\alpha/2}^2\right) = 1 - \alpha$$


Equivalently,

$$P\left(n\lambda^2 - (2n\bar{X} + Z_{\alpha/2}^2)\lambda + n\bar{X}^2 \le 0\right) = 1 - \alpha$$

or, solving the quadratic inequality in λ,

$$P\left(\lambda \in \frac{2n\bar{X} + Z_{\alpha/2}^2 \mp Z_{\alpha/2} \sqrt{4n\bar{X} + Z_{\alpha/2}^2}}{2n}\right) = 1 - \alpha$$

Then a 1 − α confidence interval estimation of λ is

$$\frac{2n\bar{X} + Z_{\alpha/2}^2 \mp Z_{\alpha/2} \sqrt{4n\bar{X} + Z_{\alpha/2}^2}}{2n}$$


For the given sample, we have n = 50; 1 − α = 0.9, so
Zα/2 = Z0.05 = 1.645;
x̄ = (7 × 0 + 15 × 1 + 10 × 2 + 12 × 3 + 6 × 4)/50 = 1.9.
So the 90% CI of λ is

$$\frac{2 \times 50 \times 1.9 + 1.645^2 \mp 1.645 \sqrt{4 \times 50 \times 1.9 + 1.645^2}}{2 \times 50} = [1.61;\; 2.25]$$

We are 90% confident that the parameter λ is between 1.61 and 2.25.


Solution: 2nd method.

By the following limit theorem:

$$Z = \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{\bar{X} - \lambda}{S} \sqrt{n} \approx N(0; 1)$$

From the equation

$$P\left(\frac{|\bar{X} - \lambda|}{S} \sqrt{n} \le Z_{\alpha/2}\right) = 1 - \alpha$$

or

$$P\left(\bar{X} - Z_{\alpha/2} \frac{S}{\sqrt{n}} \le \lambda \le \bar{X} + Z_{\alpha/2} \frac{S}{\sqrt{n}}\right) = 1 - \alpha$$


Then a 1 − α confidence interval estimation of λ is

$$\bar{X} \mp Z_{\alpha/2} \frac{S}{\sqrt{n}}$$

For the given sample, we have n = 50; Zα/2 = Z0.05 = 1.645; x̄ = 1.9;
s² = (7 × 0² + 15 × 1² + 10 × 2² + 12 × 3² + 6 × 4² − 50 × 1.9²)/49 = 1.602;
s = √1.602 = 1.27.
So the 90% CI of λ is

$$1.9 \mp 1.645 \times \frac{1.27}{\sqrt{50}} = [1.6;\; 2.2]$$

We are 90% confident that the parameter λ is between 1.6 and 2.2.
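The two Poisson intervals of Example 5.7 can be recomputed from the frequency table. This is a minimal sketch with illustrative variable names; the counts are those given in the example (7, 15, 10, 12 and 6 weeks with 0 to 4 accidents).

```python
import math
from statistics import NormalDist

# Frequency table from Example 5.7: accidents per week -> number of weeks.
table = {0: 7, 1: 15, 2: 10, 3: 12, 4: 6}

n = sum(table.values())
x_bar = sum(x * c for x, c in table.items()) / n
s2 = (sum(c * x ** 2 for x, c in table.items()) - n * x_bar ** 2) / (n - 1)
s = math.sqrt(s2)
z = NormalDist().inv_cdf(0.95)   # Z_{0.05}, about 1.645

# Method 1: roots of n*lam^2 - (2*n*x_bar + z^2)*lam + n*x_bar^2 = 0.
centre = 2 * n * x_bar + z ** 2
spread = z * math.sqrt(4 * n * x_bar + z ** 2)
ci1 = ((centre - spread) / (2 * n), (centre + spread) / (2 * n))

# Method 2: x_bar -/+ z * S / sqrt(n).
half = z * s / math.sqrt(n)
ci2 = (x_bar - half, x_bar + half)

print(f"n = {n}, x_bar = {x_bar:.2f}, s^2 = {s2:.3f}")
print(f"method 1: [{ci1[0]:.2f}; {ci1[1]:.2f}]")
print(f"method 2: [{ci2[0]:.2f}; {ci2[1]:.2f}]")
```

Method 1 gives an interval that is slightly shifted to the right of x̄, while method 2 is symmetric around x̄ by construction.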
