Chapter 7

BITS Pilani
Pilani Campus
Course No: MATH F113
Probability and Statistics

Chapter 7 (Estimation)
Sumanta Pasari
Parameter Estimation
• Parameter estimation is one of the important steps
in statistical inference.
• It belongs to the subject of estimation theory.
• Why do we require parameter estimation?
• What are the different estimation methods?
• What are the desirable properties of an estimator?
• How to judge “how good is my estimator”?
• Two broad types: point estimation and interval estimation.
Estimator and estimate
• A statistic (which is a function of a random sample, and hence a random variable) used to estimate the population parameter θ is called a point estimator for θ and is denoted by $\hat{\theta}$.
• The value of the point estimator on a particular sample of given size is called a point estimate for θ.
Desirable Properties
1. $\hat{\theta}$ should be unbiased for $\theta$.
2. $\hat{\theta}$ should have a small variance for large sample size.
Unbiased estimator:
An estimator $\hat{\theta}$ is an unbiased estimator for a population parameter $\theta$ if and only if
$E[\hat{\theta}] = \theta$.
Sample Mean
Ex.7.1. Let $X_1, X_2, \ldots, X_n$ be a random sample of size n from a distribution with mean $\mu$. Show that the sample mean, $\bar{X}$, is an unbiased estimator for $\mu$.
Sol. Here $X_1, X_2, \ldots, X_n$ is a random sample of size n from a distribution with mean $\mu$, so $E[X_i] = \mu$ for each i. The sample mean is
$\bar{X} = \frac{1}{n}\left(X_1 + X_2 + \cdots + X_n\right)$,
so, by linearity of expectation,
$E[\bar{X}] = \frac{1}{n} E[X_1 + X_2 + \cdots + X_n] = \frac{1}{n}\left(E[X_1] + E[X_2] + \cdots + E[X_n]\right) = \frac{1}{n}(n\mu) = \mu$.
Thus, the sample mean, $\bar{X}$, is an unbiased estimator for $\mu$.
Variance of Sample Mean
2
Ex.7.2. Show that   Var  X  
2
X
n
• From this theorem, it follows that larger the

sample size, sample mean can be expected to
lie closer to the population mean.
• Thus choosing large sample makes estimation
more reliable.
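One way to obtain the variance of the sample mean is sketched below. It assumes that the observations in the random sample are independent with a common finite variance $\sigma^2$; independence is what lets the variance of the sum split into a sum of variances.

```latex
\begin{align*}
\mathrm{Var}(\bar{X})
  &= \mathrm{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
   = \frac{1}{n^2}\,\mathrm{Var}\!\left(\sum_{i=1}^{n} X_i\right) \\
  &= \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}(X_i)
   \qquad \text{(independence)} \\
  &= \frac{1}{n^2}\,(n\sigma^2)
   = \frac{\sigma^2}{n}.
\end{align*}
```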
Variance of Sample Mean

Standard error of the mean: $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$.
Ex.7.3. Show that the sample variance $S^2$ of a random sample of size n from a population X is an unbiased estimator for the population variance:
$E[S^2] = \sigma_X^2$.
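A short simulation can make the unbiasedness of the sample variance plausible; it is not a proof. The sketch below assumes a normal population with standard deviation 2 (so the population variance is 4) and samples of size 5; the function name, the seed and these numbers are illustrative choices, not part of the course material. It averages the estimator that divides by n - 1 and, for contrast, the one that divides by n.

```python
import random


def average_variance_estimates(n, trials, mu=0.0, sigma=2.0, seed=0):
    """Average two variance estimators over many simulated samples.

    Returns (mean of S^2 with divisor n - 1, mean of the divisor-n version).
    """
    rng = random.Random(seed)
    total_unbiased = 0.0
    total_divisor_n = 0.0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        ss = sum((x - xbar) ** 2 for x in sample)  # sum of squared deviations
        total_unbiased += ss / (n - 1)
        total_divisor_n += ss / n
    return total_unbiased / trials, total_divisor_n / trials


mean_s2, mean_divisor_n = average_variance_estimates(n=5, trials=20000)
print(f"average S^2 (divisor n-1): {mean_s2:.3f}")
print(f"average with divisor n:    {mean_divisor_n:.3f}")
```

Under these assumptions the first average should settle near 4, while the second should settle near (n - 1)/n times 4, that is, 3.2.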

More Efficient Unbiased Estimator
A statistic $\hat{\theta}_1$ is said to be a more efficient unbiased estimator of the parameter $\theta$ than the statistic $\hat{\theta}_2$ if
(a) $\hat{\theta}_1$ and $\hat{\theta}_2$ are both unbiased estimators of $\theta$;
(b) the variance of the sampling distribution of the first estimator is no larger than that of the second, and is smaller for at least one value of $\theta$.

Method of Moments
• In the method of moments (MoM), we compare the observed sample moments (about the origin) with the corresponding population moments (about the origin).
• If there are k parameters in the distribution, then the first k sample moments are equated to the first k population moments to yield k equations. The solution of these k equations provides the required parameter estimates.

Example: Method of Moments
Ex.7.4. Use the method of moments to estimate the parameter of the exponential distribution
$f(x; \theta) = \frac{1}{\theta} e^{-x/\theta}$; $x > 0$, $\theta > 0$.
Sol.
Step 1: Find $E[X] = \theta$.
Step 2: Find the first sample moment as $M_1 = \frac{1}{n}\sum_{i=1}^{n} X_i$.
Step 3: Equate the first sample moment with the first population moment:
$\theta = \frac{1}{n}\sum_{i=1}^{n} X_i \;\Rightarrow\; \hat{\theta} = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$.
Is the estimator $\hat{\theta}$ unbiased?
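The three steps above can be sketched in a few lines of Python. The helper names `sample_moment` and `mom_exponential` and the data values are hypothetical, chosen only for illustration; the parameter is written as theta, the mean of the exponential density.

```python
def sample_moment(data, k):
    """k-th sample moment about the origin: (1/n) * sum of x_i ** k."""
    return sum(x ** k for x in data) / len(data)


def mom_exponential(data):
    """MoM estimate of theta for f(x; theta) = (1/theta) * exp(-x/theta).

    The first population moment is E[X] = theta, so equating it to the
    first sample moment gives theta_hat = sample mean.
    """
    return sample_moment(data, 1)


observations = [0.8, 2.1, 0.3, 1.6, 1.2]  # hypothetical data, not from the slides
theta_hat = mom_exponential(observations)
print(f"theta_hat = {theta_hat:.3f}")
```

For a distribution with k parameters, the same `sample_moment` helper would be called for orders 1 through k and the resulting equations solved for the parameters.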
HW.7.1. Use MoM to estimate the parameter of the Poisson distribution
$f(x; k) = \frac{e^{-k} k^{x}}{x!}$; $x = 0, 1, 2, \ldots$ and $k > 0$.
Sol. $\hat{k} = \bar{X}$? Is there an alternative estimator of k?
(Hint: compare the sample and population variance.)
HW.7.2. Use MoM to estimate the parameters of the binomial distribution
$f(x; n, p) = \binom{n}{x} p^{x} (1 - p)^{n - x}$; $x = 0, 1, 2, \ldots, n$ and $0 < p < 1$.
HW.7.3. Use MoM to estimate the parameter of the Rayleigh distribution
$f(x; \sigma) = \frac{x}{\sigma^2} \exp\left(-\frac{x^2}{2\sigma^2}\right)$; $\sigma > 0$, $x > 0$.
Sol. $\hat{\sigma} = \bar{X}\sqrt{\frac{2}{\pi}}$? Is it an unbiased estimator?

HW.7.4. Use MoM to estimate the parameter of the Maxwell distribution
$f(x; \alpha) = \sqrt{\frac{2}{\pi}}\,\frac{x^2}{\alpha^3} \exp\left[-\frac{1}{2}\left(\frac{x}{\alpha}\right)^2\right]$; for $\alpha > 0$, $x > 0$.
Sol. $\hat{\alpha} = \frac{\bar{X}}{2}\sqrt{\frac{\pi}{2}}$? Is it an unbiased estimator?
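HW.7.5. Use MoM to estimate the parameters of the Gaussian distribution
$f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right]$; $-\infty < x < \infty$, $-\infty < \mu < \infty$, $\sigma > 0$.
HW.7.6. Use MoM to estimate the parameters of the gamma distribution
$f(x; \alpha, \beta) = \frac{1}{\Gamma(\alpha)\,\beta^{\alpha}}\, x^{\alpha - 1} e^{-x/\beta}$ for $x > 0$, $\alpha > 0$, $\beta > 0$, and $f(x; \alpha, \beta) = 0$ otherwise.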
Maximum Likelihood Estimation
1. MLE is the most widely used parameter estimation method today.
2. The basic principle is to maximize the likelihood of the parameters, denoted by $L(\theta \mid x)$, as a function of the model parameters $\theta$.
3. Note that $\theta$ can be a single parameter or a vector of parameters, $\theta = (\theta_1, \theta_2, \ldots, \theta_p)$.
4. The likelihood function is defined as $L(\theta \mid x) = \prod_{i=1}^{n} f(x_i; \theta)$.
5. As log is a strictly increasing (and hence one-to-one) function, the log-likelihood $\ln L$ has the same maximizer as $L$, and maximizing $\ln L$ is often preferred for computational ease.
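The principle can be illustrated numerically. The sketch below evaluates the log-likelihood of the exponential density from Ex.7.4 (parameter written as theta) on a grid and picks the largest value. The function names, the grid limits and the data are illustrative assumptions; a plain grid search is used only because it is easy to follow, not because it is the recommended optimizer.

```python
import math


def exp_log_likelihood(theta, data):
    """ln L(theta | x) for f(x; theta) = (1/theta) * exp(-x/theta)."""
    return sum(-math.log(theta) - x / theta for x in data)


def grid_argmax(log_lik, data, lo, hi, steps=10000):
    """Return the grid point in [lo, hi] with the largest log-likelihood."""
    best_theta, best_value = lo, log_lik(lo, data)
    for i in range(1, steps + 1):
        theta = lo + (hi - lo) * i / steps
        value = log_lik(theta, data)
        if value > best_value:
            best_theta, best_value = theta, value
    return best_theta


observations = [0.8, 2.1, 0.3, 1.6, 1.2]  # hypothetical data, not from the slides
theta_grid = grid_argmax(exp_log_likelihood, observations, 0.1, 5.0)
sample_mean = sum(observations) / len(observations)
print(f"grid maximizer: {theta_grid:.3f}, sample mean: {sample_mean:.3f}")
```

For this density the grid maximizer lands next to the sample mean, up to the grid spacing.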

Examples: MLE
Ex.7.5. Let $X_1, X_2, \ldots, X_m$ be a random sample of size m from a binomial distribution with parameters n (known) and p. Find the maximum likelihood estimator for p. Is it an unbiased estimator?
Sol.
Step 1: The likelihood function for the binomial distribution is
$L(p \mid x) = \prod_{i=1}^{m} f(x_i; p)$, $0 < p < 1$
$= \prod_{i=1}^{m} \binom{n}{x_i} p^{x_i} (1 - p)^{n - x_i} = \left[\prod_{i=1}^{m} \binom{n}{x_i}\right] p^{\sum_{i=1}^{m} x_i} (1 - p)^{nm - \sum_{i=1}^{m} x_i}$,
so the log-likelihood function is
$\ln L(p \mid x) = \ln\left[\prod_{i=1}^{m} \binom{n}{x_i}\right] + \left(\sum_{i=1}^{m} x_i\right) \ln p + \left(nm - \sum_{i=1}^{m} x_i\right) \ln(1 - p)$.
Examples: MLE
Step 2: The corresponding log-likelihood equation is
$\frac{\partial}{\partial p} \ln L(p \mid x) = 0$
$\Rightarrow \frac{L'(p)}{L(p)} = \frac{\sum_{i=1}^{m} x_i}{p} - \frac{nm - \sum_{i=1}^{m} x_i}{1 - p} = 0$
$\Rightarrow \left(nm - \sum_{i=1}^{m} x_i\right) p = \left(\sum_{i=1}^{m} x_i\right)(1 - p)$.
Step 3: The estimator of p is then obtained as $\hat{p} = \frac{\sum_{i=1}^{m} X_i}{nm} = \frac{\bar{X}}{n}$.
Why does this estimator maximize the likelihood function?
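One way to approach the closing question is a second-derivative check, sketched here under the assumption that $0 < \sum x_i < nm$ (so that the stationary point lies strictly inside the interval $0 < p < 1$). The log-likelihood is then strictly concave in p, so its single stationary point is the global maximum.

```latex
\begin{align*}
\frac{\partial^2}{\partial p^2} \ln L(p \mid x)
  &= -\frac{\sum_{i=1}^{m} x_i}{p^2}
     - \frac{nm - \sum_{i=1}^{m} x_i}{(1 - p)^2}
   \;<\; 0
   \qquad \text{for all } 0 < p < 1 .
\end{align*}
```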
Examples: MLE
Ex.7.6. Use MLE to estimate the parameter of the exponential distribution
$f(x; \theta) = \frac{1}{\theta} e^{-x/\theta}$; $x > 0$, $\theta > 0$.
Sol.
Step 1: The log-likelihood function for the exponential distribution is
$\ln L(\theta \mid x) = \ln L(\theta; x_1, x_2, \ldots, x_n) = -n \ln\theta - \sum_{i=1}^{n} \frac{x_i}{\theta}$.
Step 2: The corresponding log-likelihood equation is
$\frac{\partial}{\partial\theta} \ln L = 0$.
Step 3: The estimator of $\theta$ is then obtained as $\hat{\theta} = \frac{1}{n}\sum_{i=1}^{n} X_i$.
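The step from the log-likelihood equation to the estimator can be spelled out as follows, writing $\theta$ for the parameter (the mean) of the exponential density. The last line is a check, under the assumption that at least one observation is positive, that the stationary point is a maximum.

```latex
\begin{align*}
\frac{\partial}{\partial\theta} \ln L
  &= -\frac{n}{\theta} + \frac{1}{\theta^{2}} \sum_{i=1}^{n} x_i = 0
  \;\Longrightarrow\;
  \hat{\theta} = \frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x}, \\
\left.\frac{\partial^{2}}{\partial\theta^{2}} \ln L \right|_{\theta = \hat{\theta}}
  &= \frac{n}{\hat{\theta}^{2}} - \frac{2 n \hat{\theta}}{\hat{\theta}^{3}}
   = -\frac{n}{\hat{\theta}^{2}} < 0 .
\end{align*}
```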
Examples: MLE
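HW.7.7. Use MLE to estimate the parameter of the Poisson distribution.
HW.7.8. Use MLE to estimate the parameters of the Gaussian distribution
$f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right]$; $-\infty < x < \infty$, $-\infty < \mu < \infty$, $\sigma > 0$.
Note that the ML estimator for $\sigma^2$ is $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2 = \frac{n - 1}{n} S^2$.
Thus the ML estimator for $\sigma^2$ is not unbiased, since $E[\hat{\sigma}^2] = \frac{n - 1}{n}\sigma^2 \neq \sigma^2$.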

Example: MLE
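HW7.9. Use MLE to estimate the parameter of the Rayleigh distribution
$f(x; \sigma) = \frac{x}{\sigma^2} \exp\left(-\frac{x^2}{2\sigma^2}\right)$; $\sigma > 0$, $x > 0$.
Sol. $\hat{\sigma} = \sqrt{\frac{1}{2n}\sum_{i=1}^{n} X_i^2}$?
HW7.10. Use MLE to estimate the parameter of the Maxwell distribution
$f(x; \alpha) = \sqrt{\frac{2}{\pi}}\,\frac{x^2}{\alpha^3} \exp\left[-\frac{1}{2}\left(\frac{x}{\alpha}\right)^2\right]$; for $\alpha > 0$, $x > 0$.
Sol. $\hat{\alpha} = \sqrt{\frac{1}{3n}\sum_{i=1}^{n} X_i^2}$?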

Example: MLE
HW7.11. Use MLE to estimate the parameters of the inverse Gaussian distribution
$f(t; \mu, \lambda) = \sqrt{\frac{\lambda}{2\pi t^3}} \exp\left[-\frac{\lambda (t - \mu)^2}{2\mu^2 t}\right]$; $t > 0$, $\mu > 0$, $\lambda > 0$.

HW7.12. Use MLE to estimate the parameters of the lognormal distribution
$f(t; \mu, \sigma) = \frac{1}{t\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{\ln t - \mu}{\sigma}\right)^2\right]$; $t > 0$, $\sigma > 0$.
Examples: MLE
Ex.7.7. Use MLE to estimate the parameters of the Weibull distribution
$f(t; \alpha, \beta) = \frac{\beta}{\alpha^{\beta}}\, t^{\beta - 1} e^{-(t/\alpha)^{\beta}}$; $t > 0$, $\alpha > 0$, $\beta > 0$.
Sol.
Step 1: The log-likelihood function is
$\ln L(\theta \mid t) = \ln L(\alpha, \beta; t_1, t_2, \ldots, t_n) = n \ln\beta - n\beta \ln\alpha + (\beta - 1)\sum_{i=1}^{n} \ln t_i - \sum_{i=1}^{n} \left(\frac{t_i}{\alpha}\right)^{\beta}$.
Step 2: The corresponding log-likelihood equations are
$\frac{\partial}{\partial\alpha} \ln L = 0$ and $\frac{\partial}{\partial\beta} \ln L = 0$.
Examples: MLE
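This gives
$\frac{\partial}{\partial\alpha} \ln L(\alpha, \beta; t_1, t_2, \ldots, t_n) = 0 \;\Rightarrow\; \alpha^{\beta} - \frac{1}{n}\sum_{i=1}^{n} t_i^{\beta} = 0$,
$\frac{\partial}{\partial\beta} \ln L(\alpha, \beta; t_1, t_2, \ldots, t_n) = 0 \;\Rightarrow\; \frac{n}{\beta} + \sum_{i=1}^{n} \left[1 - \left(\frac{t_i}{\alpha}\right)^{\beta}\right] \ln\left(\frac{t_i}{\alpha}\right) = 0$.
Step 3: The estimates of $\alpha$ and $\beta$ are then obtained from
$\frac{1}{\beta} + \frac{1}{n}\sum_{i=1}^{n} \ln t_i - \frac{\sum_{i=1}^{n} t_i^{\beta} \ln t_i}{\sum_{i=1}^{n} t_i^{\beta}} = 0$ and $\alpha = \left(\frac{1}{n}\sum_{i=1}^{n} t_i^{\beta}\right)^{1/\beta}$.
How to solve now? (Need to learn more! Numerical techniques?)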
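One possible numerical technique is bisection on the first Step 3 equation, which involves the shape parameter only, followed by the closed-form expression for the scale. The sketch below is an illustration, not the course's prescribed method. It assumes that alpha denotes the scale and beta the shape, that all observations are positive and not all identical (otherwise the equation has no root), and the function names and simulated data are hypothetical.

```python
import math
import random


def shape_equation(beta, log_t):
    """1/beta + mean(ln t) - sum(t^beta * ln t) / sum(t^beta)."""
    top = max(log_t)
    # Weights proportional to t_i ** beta, rescaled so exp() cannot overflow.
    weights = [math.exp(beta * (lt - top)) for lt in log_t]
    weighted_mean = sum(w * lt for w, lt in zip(weights, log_t)) / sum(weights)
    return 1.0 / beta + sum(log_t) / len(log_t) - weighted_mean


def weibull_mle(times, tol=1e-10):
    """Return (alpha_hat, beta_hat) by bisection on the shape equation."""
    log_t = [math.log(t) for t in times]
    lo, hi = 1e-6, 1.0
    while shape_equation(hi, log_t) > 0:  # widen bracket until the sign changes
        hi *= 2.0
    while hi - lo > tol:  # the left-hand side is decreasing in beta
        mid = 0.5 * (lo + hi)
        if shape_equation(mid, log_t) > 0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    alpha = (sum(t ** beta for t in times) / len(times)) ** (1.0 / beta)
    return alpha, beta


rng = random.Random(1)
# Simulated data with scale 2.0 and shape 1.5 (illustrative values only).
times = [rng.weibullvariate(2.0, 1.5) for _ in range(5000)]
alpha_hat, beta_hat = weibull_mle(times)
print(f"alpha_hat = {alpha_hat:.3f}, beta_hat = {beta_hat:.3f}")
```

Bisection is chosen here because the left-hand side of the shape equation is monotone in beta, so a sign change brackets the unique root; Newton-type methods are a common faster alternative.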


Self Reading Material:
Estimation of Tiger Population in India
https://esajournals.onlinelibrary.wiley.com/doi/epdf/10.1890/11-2110.1

Section 7.3: Functions of RV
Theorem 7.3.1: Let X and Y be random variables with moment generating functions $m_X(t)$ and $m_Y(t)$, respectively. If $m_X(t) = m_Y(t)$ for all t in some open interval about 0, then X and Y have the same distribution.
Theorem 7.3.2: Let $X_1$ and $X_2$ be independent random variables with moment generating functions $m_{X_1}(t)$ and $m_{X_2}(t)$, respectively. Let $Y = X_1 + X_2$. The moment generating function for Y is given by
$m_Y(t) = m_{X_1}(t)\, m_{X_2}(t)$.
Ex 7.3.2: (Distribution of the sum of independent normally distributed random variables)
Let $X_1, X_2, X_3, \ldots, X_n$ be independent normal random variables with means $\mu_1, \mu_2, \mu_3, \ldots, \mu_n$ and variances $\sigma_1^2, \sigma_2^2, \sigma_3^2, \ldots, \sigma_n^2$, respectively.
Let $Y = X_1 + X_2 + X_3 + \cdots + X_n$. Note that the moment generating function for $X_i$ is given by
$m_{X_i}(t) = \exp\left(\mu_i t + \frac{\sigma_i^2 t^2}{2}\right)$, $i = 1, 2, 3, \ldots, n$,
and the moment generating function for Y is (why?)
$m_Y(t) = \prod_{i=1}^{n} m_{X_i}(t) = \exp\left[\left(\sum_{i=1}^{n} \mu_i\right) t + \left(\sum_{i=1}^{n} \sigma_i^2\right) \frac{t^2}{2}\right]$.
The function on the right is nothing but the moment generating function for a normal random variable with mean $\mu = \sum_{i=1}^{n} \mu_i$ and variance $\sigma^2 = \sum_{i=1}^{n} \sigma_i^2$. Hence, by Theorem 7.3.1, Y is normal with this mean and variance.

Theorem 7.3.3:
Let X be a random variable with moment generating function $m_X(t)$. Let $Y = \alpha + \beta X$. The moment generating function for Y is
$m_Y(t) = e^{\alpha t}\, m_X(\beta t)$.

Theorem 7.3.4: (Distribution of $\bar{X}$, normal population)
Let $X_1, X_2, X_3, \ldots, X_n$ be a random sample of size n from a normal distribution with mean µ and variance σ². Then $\bar{X}$ is normally distributed with mean µ and variance σ²/n.

HW 7.13 (Q 39: Distribution of a sum of independent random variables)
Let $X_1, X_2, X_3, \ldots, X_n$ be a collection of independent random variables with moment generating functions $m_{X_i}(t)$ ($i = 1, 2, 3, \ldots, n$, respectively). Let $a_0, a_1, a_2, \ldots, a_n$ be real numbers, and let
$Y = a_0 + a_1 X_1 + a_2 X_2 + \cdots + a_n X_n$.
Show that the moment generating function for Y is given by
$m_Y(t) = e^{a_0 t} \prod_{i=1}^{n} m_{X_i}(a_i t)$.
HW 7.14 (Q 41: Distribution of a linear combination of independent normally distributed random variables)
Let $X_1, X_2, X_3, \ldots, X_n$ be independent normal random variables with means $\mu_i$ and variances $\sigma_i^2$ ($i = 1, 2, 3, \ldots, n$, respectively). Let $a_0, a_1, a_2, \ldots, a_n$ be real numbers, and let
$Y = a_0 + a_1 X_1 + a_2 X_2 + \cdots + a_n X_n$.
Show that Y is normal with mean $\mu = a_0 + \sum_{i=1}^{n} a_i \mu_i$ and variance $\sigma^2 = \sum_{i=1}^{n} a_i^2 \sigma_i^2$.
HW 7.15 (Q 44: Distribution of a sum of independent chi-squared random variables)
Let $X_1, X_2, X_3, \ldots, X_n$ be independent chi-squared random variables with $\gamma_1, \gamma_2, \gamma_3, \ldots, \gamma_n$ degrees of freedom, respectively.
Let $Y = X_1 + X_2 + \cdots + X_n$.
Show that Y is a chi-squared random variable with $\gamma$ degrees of freedom, where $\gamma = \sum_{i=1}^{n} \gamma_i$.
Homework
HW 7.16 (Q 52) Consider the random variable X with density given by
$f(x) = 1/\theta$; $0 < x < \theta$.
(a) Find E[X].
(b) Find the method of moments estimator for $\theta$.
(c) Find the method of moments estimate for $\theta$ based on these data:
1, 0.5, 1.4, 2.0, 0.25
Homework
HW 7.17 (Q 58). Let $X_1, X_2, X_3, \ldots, X_{100}$ be a random sample of size 100 from a gamma distribution with α = 5 and β = 3.
(a) Find the mgf of $Y = \sum_{i=1}^{100} X_i$.
(b) What is the distribution of Y?
(c) Find the mgf of $\bar{X} = Y/n$.
(d) What is the distribution of $\bar{X}$?

Interval Estimation
• A point estimate cannot be expected to equal the population parameter exactly; at best it is close to it.
• Usually, an interval estimate can be obtained by adding and subtracting a margin of error to the point estimate. Then,
Interval Estimate = Point Estimate ± Margin of Error
• Interval estimation provides us information about how close the point estimate is to the value of the parameter.
• Why do we use the term confidence interval?
Interval (CI) Estimation
– Instead of considering a statistic as a point estimator, we
may use random intervals to trap the parameter.
– In this case, the end points of the interval are RVs and we
can talk about the probability that it traps the parameter
value.
Confidence Interval: A 100(1 − α)% confidence interval for a parameter θ is a random interval [L1, L2] such that
P[L1 ≤ θ ≤ L2] = 1 − α, regardless of the value of θ.
Theorem 7.4.1: Interval estimation for µ: σ known
Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal population with mean $\mu$ (unknown) and variance $\sigma^2$ (known). Then, using Thm 7.3.4,
$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \;\Rightarrow\; \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$.
Taking two points $\pm z_{\alpha/2}$ symmetrically about the origin, we get
$P\left(-z_{\alpha/2} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right) = 1 - \alpha$.
Here $(1 - \alpha)$ is known as the confidence level, and $\alpha$ is the level of significance.

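Interval estimation for µ: σ known
$P\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$
Hence, the confidence interval for the population mean $\mu$ having confidence level $100(1 - \alpha)\%$ is given as $\left[\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\; \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$.
The endpoints of the confidence interval are called confidence limits.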

Interval estimation for µ: σ known
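Most commonly used confidence levels: for example, the 95% level corresponds to $z_{\alpha/2} = 1.96$.
Hence, the 95% CI for $\mu$ is given as $\left[\bar{X} - 1.96\frac{\sigma}{\sqrt{n}},\; \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right]$.
That is, $P\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$.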

Practice Problems
HW 7.18 (Q 53) Studies have shown that the random variable X, the processing time required to do a multiplication on a new 3-D computer, is normally distributed with mean μ and standard deviation 2 microseconds. A random sample of 16 observations is to be taken.
(a) These data are obtained:
42.65 45.15 39.32 44.44
41.63 41.54 41.59 45.68
46.50 41.35 44.37 40.27
43.87 43.79 43.28 40.70
Based on these data, find an unbiased estimate for μ.
(b) Find a 95% confidence interval for μ.
Practice Problems
HW7.19. The mean of a sample of size 50 from a normal population is observed to be 15.68. If the s.d. of the population is 3.27, find (a) 80%, (b) 95%, (c) 99% confidence intervals for the population mean. Can you find the respective margins of error? What is the length of the CI in each case?
Sol. (b)
Step 1: Here $n = 50$, $\bar{x} = 15.68$, $\sigma = 3.27$, and $\alpha = 0.05$. We need a CI for $\mu$.
Step 2: As $\alpha = 0.05$, we need to find $z_{\alpha/2}$ such that $P(Z \le z_{\alpha/2}) = 0.975$. From the cumulative normal distribution table, we see $z_{\alpha/2} = 1.96$.
Step 3: The CI for $\mu$ ($\sigma$ known) is $\left(\bar{x} - z_{0.025}\frac{\sigma}{\sqrt{n}},\; \bar{x} + z_{0.025}\frac{\sigma}{\sqrt{n}}\right) = (14.77, 16.59)$.

Impracticality of Assumptions in CI
In practice, we usually face two main problems in applying the previous C.I. formula.
• What if the population is not normal?
• What if the population variance is unknown?
Need to introduce Central Limit Theorem.
What is the beauty and importance of CLT?
Central Limit Theorem (CLT)
• Regardless of the population distribution model (provided its variance is finite), as the sample size increases, the sample mean tends to be normally distributed around the population mean, and its standard deviation shrinks as n increases.
• Two conditions must be satisfied to apply CLT: (a) samples must be i.i.d.; (b) sample size must be large enough (usually n ≥ 30, but this depends on the problem!).
Let $X_1, X_2, \ldots, X_n$ be a random sample (that is, the $X_i$ are i.i.d.) of size n from a distribution with mean $\mu$ and variance $\sigma^2$. Then for large n, the sample mean $\bar{X}$ is approximately normal with mean $\mu$ and variance $\sigma^2/n$: $\bar{X} \approx N\left(\mu, \sigma^2/n\right)$.
Furthermore, for large n, the random variable $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \approx N(0, 1)$.
CLT: Examples
• A certain brand of tires has a mean life of 25,000 km with a
s.d. of 1600 km. What is the probability that the mean life
of 64 tires is less than 24,600 km?
• A hawker sells dolls at various prices with a mean of Rs. 700 and s.d. of Rs. 250. During Christmas week, assume that he sells 60 dolls. What is the probability that he earns Rs. 45000 or more in that week?
Central Limit Theorem (CLT)
• Randomization: we assume that the observations constitute a random sample (i.i.d.) from the population.
• Large enough sample size: how large is large?
• If the population is normal, then the sampling distribution of $\bar{X}$ will also be normal, whatever the sample size.
• If the population is approximately symmetric, the distribution becomes approximately normal for relatively small values of n.
• When the population is skewed, a common rule of thumb is that the sample size should be at least 30 before the sampling distribution of $\bar{X}$ becomes approximately normal.
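The behaviour described in these bullets can be explored by simulation. The sketch below draws repeated samples from an exponential population (strongly skewed, with mean 1 and standard deviation 1) and looks at the resulting sample means. The population, the sample size 40, the number of trials and the seed are illustrative assumptions, not values from the slides.

```python
import math
import random
import statistics


def simulate_sample_means(n, trials, theta=1.0, seed=0):
    """Sample means of `trials` samples of size n from an exponential
    population with mean theta (and standard deviation theta)."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0 / theta) for _ in range(n))
        for _ in range(trials)
    ]


n = 40
means = simulate_sample_means(n=n, trials=5000)
centre = statistics.fmean(means)
spread = statistics.stdev(means)
predicted_spread = 1.0 / math.sqrt(n)  # sigma / sqrt(n) with sigma = 1

# Share of sample means within 1.96 predicted standard errors of the mean.
within = sum(abs(m - 1.0) <= 1.96 * predicted_spread for m in means) / len(means)

print(f"mean of sample means: {centre:.3f}")
print(f"s.d. of sample means: {spread:.3f} (CLT predicts {predicted_spread:.3f})")
print(f"share within 1.96 standard errors: {within:.3f}")
```

If the normal approximation is reasonable at this sample size, the share printed on the last line should be close to 0.95; rerunning with a much smaller n shows how the skewness of the population leaks into the sample means.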

CLT: Step by Step
Step 1: Identify parts of the problem. Your question should state:
• The mean (average or μ)
• The standard deviation (σ)
• The sample size (n)
Step 2: Find $\bar{X}$ and express the problem in terms of “greater than” or “less than” the sample mean $\bar{X}$.
Step 3: Use CLT to find the distribution of $\bar{X}$: $\bar{X} \approx N\left(\mu, \sigma^2/n\right)$, that is, approximately normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.
Step 4: Convert the normal variate $\bar{X}$ to a standard normal variate $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$.
Now you may draw a graph, centred at 0 (the mean of Z), and shade the appropriate area to find the required probability.
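The four steps can be sketched with Python's `statistics.NormalDist`, applied to the two problems from the "CLT: Examples" slide (tyres and dolls). The helper name `sample_mean_distribution` is hypothetical, and the normal distribution used here is the CLT approximation, not the exact distribution of the sample mean.

```python
import math
from statistics import NormalDist


def sample_mean_distribution(mu, sigma, n):
    """Approximate distribution of the sample mean under the CLT:
    normal with mean mu and standard deviation sigma / sqrt(n)."""
    return NormalDist(mu, sigma / math.sqrt(n))


# Tyres: mu = 25000 km, sigma = 1600 km, n = 64; want P(mean < 24600).
tyres = sample_mean_distribution(25000, 1600, 64)
p_tyres = tyres.cdf(24600)

# Dolls: mu = 700, sigma = 250, n = 60; total >= 45000 means mean >= 750.
dolls = sample_mean_distribution(700, 250, 60)
p_dolls = 1.0 - dolls.cdf(45000 / 60)

print(f"P(mean tyre life < 24600) = {p_tyres:.4f}")
print(f"P(weekly earnings >= 45000) = {p_dolls:.4f}")
```

Small differences from table-based answers are expected, because a table lookup rounds the z-value to two decimals while the code does not.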

Problem Solving
HW7.20. A certain brand of tyres has a mean life of 25,000 km with a s.d. of 1600 km. What is the probability that the mean life of 64 tyres is less than 24,600 km?
Sol.
Step 1: Here $X_1, X_2, \ldots, X_{64}$ constitute a random sample, and it is given that $E[X_i] = 25000$ and $\sigma_{X_i} = 1600$.
Step 2: $\bar{X} = \frac{1}{64}\sum_{i=1}^{64} X_i$, and our interest is to find $P(\bar{X} < 24600)$.
Step 3: As we have a random sample of size 64 (sufficiently large n), we can use CLT to find the distribution of $\bar{X}$: it is approximately normal with mean 25000 and standard deviation $1600/\sqrt{64} = 200$.
Step 4: $P(\bar{X} < 24600) = P\left(\frac{\bar{X} - 25000}{200} < \frac{24600 - 25000}{200}\right) = P(Z < -2) = 0.0228$ (using the standard normal cdf table).
Examples: Normal Distribution

Problem Solving
HW7.21. A hawker sells dolls at varying prices with a mean of Rs. 700 and s.d. of Rs. 250. During Christmas week, assume that he sells 60 dolls. What is the probability that he earns Rs. 45000 or more in that week?
Step 1: Here $X_1, X_2, \ldots, X_{60}$ constitute a random sample, and it is given that $E[X_i] = 700$ and $\sigma_{X_i} = 250$.
Step 2: $\bar{X} = \frac{1}{60}\sum_{i=1}^{60} X_i$, and our aim is to find $P\left(\sum_{i=1}^{60} X_i \ge 45000\right)$, i.e., $P(\bar{X} \ge 750)$.
Step 3: As we have a random sample of size 60 (sufficiently large n), we can use CLT to find the distribution of $\bar{X}$: it is approximately normal with mean 700 and standard deviation $250/\sqrt{60} \approx 32.27$.
Step 4: $P(\bar{X} \ge 750) = P\left(\frac{\bar{X} - 700}{32.27} \ge \frac{750 - 700}{32.27}\right) = P(Z \ge 1.55) = 1 - P(Z < 1.55) = 0.0606$ (using the standard normal cdf table).
Problem Solving
HW7.22. A population of 30-year-old males has a mean salary of Rs. 75000 with a standard deviation of Rs. 10000. If a sample of 100 men is taken, what is the probability that their mean salary will be less than Rs. 77500?
HW7.23. A certain group of welfare recipients receives pension benefits with a mean of Rs. 45000 per month and a standard deviation of Rs. 7500. If a random sample of 25 people is taken, what is the probability that their mean pension benefit will be greater than Rs. 47000 or less than Rs. 43000 per month?
HW7.24. A certain population of dogs has an average weight of 5 kg, with a standard deviation of 2 kg. If 40 dogs are chosen at random, what is the probability that they have an average weight greater than 6.0 kg or less than 4.5 kg?

Problem Solving
HW 7.25 (Q 49) When fission occurs, many of the nuclear
fragments formed have too many neutrons for stability.
Some of these neutrons are expelled almost
instantaneously. These observations are obtained on X,
the number of neutrons released during fission of
plutonium-239:
3 2 2 2 2 3 3 3
3 3 3 3 4 3 2 3
3 2 3 3 3 3 3 1
3 3 3 3 3 3 3 3
3 3 2 3 3 3 3 3
(a) Is X normally distributed? Explain. (b), (c), (d)
