Chapter 7


BITS Pilani

Pilani Campus

Course No: MATH F113

Probability and Statistics


Chapter 7 (Estimation)
Sumanta Pasari
Parameter Estimation
• Parameter estimation is one of the important steps
in statistical inference.
• It belongs to the subject of estimation theory.
• Why do we require parameter estimation?
• What are the different estimation methods?
• What are the desirable properties of an estimator?
• How to judge “how good is my estimator”?
• Two broad types: point estimation and interval estimation.
Estimator and estimate
• A statistic (which is a function of a random sample, and hence a random variable) used to estimate the population parameter $\theta$ is called a point estimator for $\theta$ and is denoted by $\hat{\theta}$.
• The value of the point estimator on a particular sample of given size is called a point estimate for $\theta$.
Desirable Properties
1. $\hat{\theta}$ should be unbiased for $\theta$.
2. $\hat{\theta}$ should have a small variance for large sample size.

Unbiased estimator:
An estimator $\hat{\theta}$ is an unbiased estimator for a population parameter $\theta$ if and only if

$E[\hat{\theta}] = \theta$.
Sample Mean
Ex.7.1. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a distribution with mean $\mu$. Show that the sample mean, $\bar{X}$, is an unbiased estimator for $\mu$.
Sol. Here $X_1, X_2, \ldots, X_n$ is a random sample of size $n$ from a distribution with mean $\mu$. The sample mean, $\bar{X}$, is
$\bar{X} = \frac{1}{n}\left(X_1 + X_2 + \cdots + X_n\right)$.
By linearity of expectation,
$E[\bar{X}] = \frac{1}{n} E[X_1 + X_2 + \cdots + X_n] = \frac{1}{n}\left(E[X_1] + E[X_2] + \cdots + E[X_n]\right) = \frac{1}{n}(n\mu) = \mu$.

Thus, the sample mean, $\bar{X}$, is an unbiased estimator for $\mu$.
Variance of Sample Mean
2
Ex.7.2. Show that   Var  X  
2
X
n

• From this theorem, it follows that larger the


sample size, sample mean can be expected to
lie closer to the population mean.
• Thus choosing large sample makes estimation
more reliable.
7 BITS Pilani, Pilani Campus
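The two results $E[\bar{X}] = \mu$ and $\mathrm{Var}(\bar{X}) = \sigma^2/n$ can be checked exactly on a small example. The sketch below is only an illustration under assumed inputs: the population is a fair six-sided die (not part of the slides), the sample size is 3, and the helper name `sampling_moments_of_mean` is hypothetical. It enumerates every equally likely sample instead of simulating, so the comparison is exact.

```python
from fractions import Fraction
from itertools import product

def sampling_moments_of_mean(values, n):
    """Exact mean and variance of the sample mean for n i.i.d. draws
    from a population that puts equal probability on each value."""
    samples = list(product(values, repeat=n))         # all equally likely samples
    means = [Fraction(sum(s), n) for s in samples]    # sample mean of each sample
    e_mean = sum(means) / len(means)
    var_mean = sum((m - e_mean) ** 2 for m in means) / len(means)
    return e_mean, var_mean

# Assumed population: a fair die (illustrative choice only).
die = [1, 2, 3, 4, 5, 6]
mu = Fraction(sum(die), len(die))
sigma2 = sum((Fraction(v) - mu) ** 2 for v in die) / len(die)

e_mean, var_mean = sampling_moments_of_mean(die, n=3)
print(e_mean == mu)             # True: the sample mean is unbiased
print(var_mean == sigma2 / 3)   # True: Var(sample mean) = sigma^2 / n
```

Exact fractions are used so that the equalities hold without rounding error.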
Variance of Sample Mean

Standard error of the mean: $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$.
Ex.7.3. Show that the sample variance $S^2$ of a random sample of size $n$ from a population $X$ is an unbiased estimator for the population variance:
$E[S^2] = \sigma_X^2$.
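A small exact check of Ex.7.3 may help before attempting the proof. This is a sketch under assumed inputs: the population is a fair die and the sample size is 3 (neither comes from the slides), and `expected_variance_estimators` is a hypothetical helper name. It compares the divisor $n-1$ (the sample variance $S^2$) with the divisor $n$ by averaging over every equally likely sample.

```python
from fractions import Fraction
from itertools import product

def expected_variance_estimators(values, n):
    """Exact expectations of the (n-1)-divisor and n-divisor variance
    estimators for n i.i.d. draws from an equally weighted population."""
    samples = list(product(values, repeat=n))
    total_unbiased = Fraction(0)
    total_biased = Fraction(0)
    for s in samples:
        xbar = Fraction(sum(s), n)
        ss = sum((x - xbar) ** 2 for x in s)   # sum of squared deviations
        total_unbiased += ss / (n - 1)         # S^2
        total_biased += ss / n                 # divisor n instead of n - 1
    return total_unbiased / len(samples), total_biased / len(samples)

die = [1, 2, 3, 4, 5, 6]                       # assumed population
mu = Fraction(sum(die), len(die))
sigma2 = sum((Fraction(v) - mu) ** 2 for v in die) / len(die)

e_s2, e_biased = expected_variance_estimators(die, n=3)
print(e_s2 == sigma2)                          # True: E[S^2] = sigma^2
print(e_biased == Fraction(2, 3) * sigma2)     # True: E = (n-1)/n * sigma^2
```

The second line shows that dividing by $n$ underestimates the variance by the factor $(n-1)/n$ on average.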


More Efficient Unbiased Estimator
A statistic $\hat{\theta}_1$ is said to be a more efficient unbiased estimator of the parameter $\theta$ than the statistic $\hat{\theta}_2$ if

(a) $\hat{\theta}_1$ and $\hat{\theta}_2$ are both unbiased estimators of $\theta$, and

(b) the variance of the sampling distribution of the first estimator is no larger than that of the second and is smaller for at least one value of $\theta$.


Method of Moments
• In method of moments (MoM), we compare the observed
sample moments (about origin) with the corresponding
population moments (about origin).
• If there are k parameters in the distribution, then the first k sample moments are equated to the first k population moments to yield k equations. The solution of these k equations provides the required parameter estimates.


Example: Method of Moments
Ex.7.4. Use the method of moments to estimate the parameter of the exponential distribution
$f(x; \beta) = \frac{1}{\beta} e^{-x/\beta}; \quad x > 0, \ \beta > 0$.
Sol.
Step 1: Find the first population moment, $E[X] = \beta$.
Step 2: Find the first sample moment, $M_1 = \frac{1}{n}\sum_{i=1}^{n} X_i$.
Step 3: Equate the first sample moment with the first population moment:
$\beta = \frac{1}{n}\sum_{i=1}^{n} X_i \ \Rightarrow\ \hat{\beta} = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$.
Is the estimator $\hat{\beta}$ unbiased?
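The three steps of Ex.7.4 can be sketched in a few lines of code. The function names `sample_moment` and `mom_exponential` and the data values are hypothetical, chosen only for illustration; the estimator itself is the one derived in Step 3.

```python
def sample_moment(data, k):
    """k-th sample moment about the origin: (1/n) * sum(x_i ** k)."""
    return sum(x ** k for x in data) / len(data)

def mom_exponential(data):
    """Method-of-moments estimate of the exponential mean parameter.
    Since E[X] equals the parameter, the estimate is the first sample moment."""
    return sample_moment(data, 1)

# Hypothetical observations, for illustration only.
waiting_times = [2.0, 4.0, 6.0]
print(mom_exponential(waiting_times))   # 4.0
```

For a distribution with k parameters, the same `sample_moment` helper would supply the first k sample moments to be equated with the population moments.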
Example: Method of Moments

HW.7.1. Use MoM to estimate the parameter of the Poisson distribution
$f(x; k) = \frac{e^{-k} k^{x}}{x!}; \quad x = 0, 1, 2, \ldots \ \text{and} \ k > 0$.
Sol. $\hat{k} = \bar{X}$? Is there an alternative estimator of $k$?
(Hint: compare the sample and population variances.)

HW.7.2. Use MoM to estimate the parameters of the binomial distribution
$f(x; n, p) = \binom{n}{x} p^{x} (1 - p)^{n - x}; \quad x = 0, 1, 2, \ldots, n \ \text{and} \ 0 < p < 1$.


Example: Method of Moments

HW.7.3. Use MoM to estimate the parameter of the Rayleigh distribution
$f(x; \sigma) = \frac{x}{\sigma^2} \exp\left(-\frac{x^2}{2\sigma^2}\right); \quad \sigma > 0, \ x > 0$.
Sol. $\hat{\sigma} = \bar{X}\sqrt{\frac{2}{\pi}}$? Is it an unbiased estimator?

HW.7.4. Use MoM to estimate the parameter of the Maxwell distribution
$f(x; \sigma) = \sqrt{\frac{2}{\pi}}\, \frac{x^2}{\sigma^3} \exp\left(-\frac{1}{2}\left(\frac{x}{\sigma}\right)^2\right); \quad \sigma > 0, \ x > 0$.
Sol. $\hat{\sigma} = \frac{\bar{X}}{2}\sqrt{\frac{\pi}{2}}$? Is it an unbiased estimator?
Example: Method of Moments

HW.7.5. Use MoM to estimate the parameters of the Gaussian distribution
$f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}; \quad -\infty < x < \infty, \ -\infty < \mu < \infty, \ \sigma > 0$.

HW.7.6. Use MoM to estimate the parameters of the gamma distribution
$f(x; \alpha, \beta) = \frac{1}{\Gamma(\alpha)\,\beta^{\alpha}}\, x^{\alpha - 1} e^{-x/\beta}$ for $x > 0, \ \alpha > 0, \ \beta > 0$, and $f(x; \alpha, \beta) = 0$ otherwise.
Maximum Likelihood Estimation

1. MLE is the most widely used parameter estimation method today.
2. The basic principle is to maximize the likelihood, denoted by $L(\theta \mid x)$, as a function of the model parameters $\theta$.
3. Note that $\theta$ can be a single parameter or a vector of parameters, $\theta = (\theta_1, \theta_2, \ldots, \theta_p)$.
4. The likelihood function is defined as $L(\theta \mid x) = \prod_{i=1}^{n} f(x_i; \theta)$.
5. As log is a strictly increasing function, $L$ and $\ln L$ are maximized at the same $\theta$, so maximization of the log-likelihood ($\ln L$) is often preferred for computational ease.


Examples: MLE
Ex.7.5. Let $X_1, X_2, \ldots, X_m$ be a random sample of size $m$ from a binomial distribution with parameters $n$ (known) and $p$. Find the maximum likelihood estimator for $p$. Is it an unbiased estimator?
Sol.
Step 1: The likelihood function for the binomial distribution is
$L(p \mid x) = \prod_{i=1}^{m} f(x_i; p) = \prod_{i=1}^{m} \binom{n}{x_i} p^{x_i} (1 - p)^{n - x_i} = \left[\prod_{i=1}^{m} \binom{n}{x_i}\right] p^{\sum_{i=1}^{m} x_i} (1 - p)^{nm - \sum_{i=1}^{m} x_i}, \quad 0 < p < 1$,
so the log-likelihood function is
$\ln L(p \mid x) = \ln\left[\prod_{i=1}^{m} \binom{n}{x_i}\right] + \left(\sum_{i=1}^{m} x_i\right) \ln p + \left(nm - \sum_{i=1}^{m} x_i\right) \ln(1 - p)$.
Examples: MLE
Step 2: The corresponding log-likelihood equation is
$\frac{\partial}{\partial p} \ln L(p \mid x) = 0$
$\Rightarrow \frac{L'(p)}{L(p)} = \frac{\sum_{i=1}^{m} x_i}{p} - \frac{nm - \sum_{i=1}^{m} x_i}{1 - p} = 0$
$\Rightarrow \left(nm - \sum_{i=1}^{m} x_i\right) p = \left(\sum_{i=1}^{m} x_i\right)(1 - p)$.
Step 3: The estimator of $p$ is then obtained as $\hat{p} = \frac{\sum_{i=1}^{m} X_i}{nm} = \frac{\bar{X}}{n}$.
Why does this estimator maximize the likelihood function?
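One way to build intuition for the closing question is to compare the closed-form estimate with a brute-force search over candidate values of $p$. The sketch below assumes hypothetical data (five observations, each out of $n = 10$ trials) and hypothetical function names; it is a numerical check, not a proof. The analytic reason is that the second derivative of $\ln L$, namely $-\sum x_i / p^2 - (nm - \sum x_i)/(1-p)^2$, is negative on $(0, 1)$.

```python
import math

def binomial_log_likelihood(p, xs, n):
    """ln L(p | x) for i.i.d. Binomial(n, p) observations xs."""
    total = sum(xs)
    const = sum(math.log(math.comb(n, x)) for x in xs)
    return const + total * math.log(p) + (n * len(xs) - total) * math.log(1 - p)

def binomial_mle(xs, n):
    """Closed-form maximizer from Step 3: sum(x_i) / (n * m)."""
    return sum(xs) / (n * len(xs))

# Hypothetical data: m = 5 observations, each out of n = 10 trials.
xs, n = [3, 5, 4, 6, 2], 10
p_hat = binomial_mle(xs, n)

# Brute-force search over a grid of candidate values in (0, 1).
grid = [k / 1000 for k in range(1, 1000)]
best = max(grid, key=lambda p: binomial_log_likelihood(p, xs, n))
print(p_hat, best)
```

The grid maximizer agrees with $\hat{p} = \bar{X}/n$ up to the grid resolution.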
Examples: MLE
Ex.7.6. Use MLE to estimate the parameter of the exponential distribution
$f(x; \beta) = \frac{1}{\beta} e^{-x/\beta}; \quad x > 0, \ \beta > 0$.
Sol.
Step 1: The log-likelihood function for the exponential distribution is
$\ln L(\beta \mid x) = \ln L(\beta; x_1, x_2, \ldots, x_n) = -n \ln \beta - \sum_{i=1}^{n} \frac{x_i}{\beta}$.
Step 2: The corresponding log-likelihood equation is
$\frac{\partial}{\partial \beta} \ln L = -\frac{n}{\beta} + \frac{1}{\beta^2}\sum_{i=1}^{n} x_i = 0$.
Step 3: The estimator of $\beta$ is then obtained as $\hat{\beta} = \frac{1}{n}\sum_{i=1}^{n} X_i$.
Examples: MLE
HW.7.7. Use MLE to estimate the parameter of the Poisson distribution.

HW.7.8. Use MLE to estimate the parameters of the Gaussian distribution
$f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}; \quad -\infty < x < \infty, \ -\infty < \mu < \infty, \ \sigma > 0$.

Note that the ML estimator for $\sigma^2$ is $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2 = \frac{n-1}{n} S^2$.
Thus the ML estimator for $\sigma^2$ is not unbiased.


Example: MLE
HW7.9. Use MLE to estimate the parameter of the Rayleigh distribution
$f(x; \sigma) = \frac{x}{\sigma^2} \exp\left(-\frac{x^2}{2\sigma^2}\right); \quad \sigma > 0, \ x > 0$.
Sol. $\hat{\sigma} = \sqrt{\frac{1}{2n}\sum_{i=1}^{n} X_i^2}$?

HW7.10. Use MLE to estimate the parameter of the Maxwell distribution
$f(x; \sigma) = \sqrt{\frac{2}{\pi}}\, \frac{x^2}{\sigma^3} \exp\left(-\frac{1}{2}\left(\frac{x}{\sigma}\right)^2\right); \quad \sigma > 0, \ x > 0$.
Sol. $\hat{\sigma} = \sqrt{\frac{1}{3n}\sum_{i=1}^{n} X_i^2}$?


Example: MLE
HW7.11. Use MLE to estimate the parameters of the inverse Gaussian distribution
$f(t; \mu, \lambda) = \sqrt{\frac{\lambda}{2\pi t^3}} \exp\left(-\frac{\lambda (t - \mu)^2}{2\mu^2 t}\right); \quad t > 0, \ \mu > 0, \ \lambda > 0$.

HW7.12. Use MLE to estimate the parameters of the lognormal distribution
$f(t; \mu, \sigma) = \frac{1}{t\,\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{\ln t - \mu}{\sigma}\right)^2\right); \quad t > 0, \ \sigma > 0$.


Examples: MLE
Ex.7.7. Use MLE to estimate the parameters of the Weibull distribution
$f(t; \alpha, \beta) = \frac{\beta}{\alpha^{\beta}}\, t^{\beta - 1} e^{-(t/\alpha)^{\beta}}; \quad t > 0, \ \alpha > 0, \ \beta > 0$.

Sol.
Step 1: The log-likelihood function is
$\ln L(\theta \mid t) = \ln L(\alpha, \beta; t_1, t_2, \ldots, t_n) = n \ln \beta - n\beta \ln \alpha + (\beta - 1)\sum_{i=1}^{n} \ln(t_i) - \sum_{i=1}^{n} \left(\frac{t_i}{\alpha}\right)^{\beta}$.

Step 2: The corresponding log-likelihood equations are
$\frac{\partial}{\partial \alpha} \ln L = 0$ and $\frac{\partial}{\partial \beta} \ln L = 0$.
Examples: MLE
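This gives
$\frac{\partial}{\partial \alpha} \ln L(\alpha, \beta; t_1, t_2, \ldots, t_n) = 0 \ \Rightarrow\ \alpha^{\beta} - \frac{1}{n}\sum_{i=1}^{n} t_i^{\beta} = 0$,
$\frac{\partial}{\partial \beta} \ln L(\alpha, \beta; t_1, t_2, \ldots, t_n) = 0 \ \Rightarrow\ \frac{n}{\beta} + \sum_{i=1}^{n} \left[1 - \left(\frac{t_i}{\alpha}\right)^{\beta}\right] \ln\left(\frac{t_i}{\alpha}\right) = 0$.
Step 3: The estimates of $\alpha$ and $\beta$ are then obtained from
$\frac{1}{\beta} + \frac{1}{n}\sum_{i=1}^{n} \ln(t_i) - \frac{\sum_{i=1}^{n} t_i^{\beta} \ln(t_i)}{\sum_{i=1}^{n} t_i^{\beta}} = 0$ and $\alpha = \left(\frac{1}{n}\sum_{i=1}^{n} t_i^{\beta}\right)^{1/\beta}$.

How to solve now? (Need to learn more! Numerical techniques?)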
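As one possible answer to the closing question, the Step 3 equation in $\beta$ alone can be solved by a simple root-finding method, after which $\alpha$ follows in closed form. The sketch below uses bisection, which is a choice made here and not a method prescribed by the slides. The function name `weibull_mle`, the tolerance, and the data are hypothetical. It assumes all observations are positive and not all equal, so that the left-hand side changes sign.

```python
import math

def weibull_mle(times, tol=1e-10, max_iter=200):
    """Weibull MLE (scale alpha, shape beta) by bisection on the Step 3
    equation in beta. Assumes positive observations that are not all equal."""
    n = len(times)
    mean_log = sum(math.log(t) for t in times) / n

    def g(beta):
        # Left-hand side of the Step 3 equation, a function of beta only.
        powers = [t ** beta for t in times]
        weighted = sum(p * math.log(t) for p, t in zip(powers, times))
        return 1.0 / beta + mean_log - weighted / sum(powers)

    lo, hi = 1e-6, 1.0            # g(lo) > 0 because of the 1/beta term
    while g(hi) > 0:              # expand until the root is bracketed
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    beta_hat = 0.5 * (lo + hi)
    alpha_hat = (sum(t ** beta_hat for t in times) / n) ** (1.0 / beta_hat)
    return alpha_hat, beta_hat, g(beta_hat)

# Hypothetical failure times, for illustration only.
data = [0.5, 1.2, 1.9, 2.4, 3.1, 3.3, 4.0, 5.2]
alpha_hat, beta_hat, residual = weibull_mle(data)
print(alpha_hat, beta_hat, residual)
```

The third returned value is the residual of the shape equation at the solution; a value close to zero indicates that the equation is satisfied. Newton's method would typically converge faster, but bisection needs no derivative.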
Self Reading Material:
Estimation of Tiger Population in India
https://esajournals.onlinelibrary.wiley.com/doi/epdf/10.1890/11-2110.1



Section 7.3: Functions of RV

Theorem 7.3.1: Let $X$ and $Y$ be random variables with moment generating functions $m_X(t)$ and $m_Y(t)$, respectively. If $m_X(t) = m_Y(t)$ for all $t$ in some open interval about 0, then $X$ and $Y$ have the same distribution.

Theorem 7.3.2: Let $X_1$ and $X_2$ be independent random variables with moment generating functions $m_{X_1}(t)$ and $m_{X_2}(t)$, respectively. Let $Y = X_1 + X_2$. The moment generating function for $Y$ is given by
$m_Y(t) = m_{X_1}(t)\, m_{X_2}(t)$.
Section 7.3: Functions of RV
Ex 7.3.2: (Distribution of the sum of independent normally distributed random variables)
Let $X_1, X_2, \ldots, X_n$ be independent normal random variables with means $\mu_1, \mu_2, \ldots, \mu_n$ and variances $\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2$, respectively.

Let $Y = X_1 + X_2 + \cdots + X_n$. Note that the moment generating function for $X_i$ is given by
$m_{X_i}(t) = \exp\left(\mu_i t + \frac{\sigma_i^2 t^2}{2}\right), \quad i = 1, 2, \ldots, n$,
and the moment generating function for $Y$ is (why?)
$m_Y(t) = \prod_{i=1}^{n} m_{X_i}(t) = \exp\left[\left(\sum_{i=1}^{n} \mu_i\right) t + \left(\sum_{i=1}^{n} \sigma_i^2\right) \frac{t^2}{2}\right]$.

The function on the right is the moment generating function of a normal random variable with mean $\mu = \sum_{i=1}^{n} \mu_i$ and variance $\sigma^2 = \sum_{i=1}^{n} \sigma_i^2$. By Theorem 7.3.1, $Y$ is therefore normal with this mean and variance.


Section 7.3: Functions of RV

Theorem 7.3.3:
Let $X$ be a random variable with moment generating function $m_X(t)$. Let $Y = \alpha + \beta X$. The moment generating function for $Y$ is
$m_Y(t) = e^{\alpha t}\, m_X(\beta t)$.


Section 7.3: Functions of RV

Theorem 7.3.4: (Distribution of $\bar{X}$, normal population)
Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a normal distribution with mean $\mu$ and variance $\sigma^2$. Then $\bar{X}$ is normally distributed with mean $\mu$ and variance $\sigma^2/n$.
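A short derivation sketch shows how Theorem 7.3.4 can be obtained from the earlier results. It assumes Theorem 7.3.2 extended from two to $n$ independent variables (by induction) and Theorem 7.3.3 applied with $\alpha = 0$ and $\beta = 1/n$ to each term of $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$.

```latex
\begin{align*}
m_{\bar{X}}(t)
  &= \prod_{i=1}^{n} m_{X_i}\!\left(\frac{t}{n}\right) \\
  &= \left[\exp\!\left(\mu\,\frac{t}{n} + \frac{\sigma^2 t^2}{2n^2}\right)\right]^{n} \\
  &= \exp\!\left(\mu t + \frac{(\sigma^2/n)\, t^2}{2}\right).
\end{align*}
```

The last expression is the moment generating function of a normal random variable with mean $\mu$ and variance $\sigma^2/n$, so the claim follows from Theorem 7.3.1.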


HW 7.13 (Q 39: Distribution of a sum of independent random variables)
Let $X_1, X_2, \ldots, X_n$ be a collection of independent random variables with moment generating functions $m_{X_i}(t)$ ($i = 1, 2, \ldots, n$), respectively. Let $a_0, a_1, a_2, \ldots, a_n$ be real numbers, and let
$Y = a_0 + a_1 X_1 + a_2 X_2 + \cdots + a_n X_n$.
Show that the moment generating function for $Y$ is given by
$m_Y(t) = e^{a_0 t} \prod_{i=1}^{n} m_{X_i}(a_i t)$.
HW 7.14 (Q 41: Distribution of a linear combination of independent normally distributed random variables)

Let $X_1, X_2, \ldots, X_n$ be independent normal random variables with means $\mu_i$ and variances $\sigma_i^2$ ($i = 1, 2, \ldots, n$), respectively. Let $a_0, a_1, a_2, \ldots, a_n$ be real numbers, and let
$Y = a_0 + a_1 X_1 + a_2 X_2 + \cdots + a_n X_n$.
Show that $Y$ is normal with mean $\mu = a_0 + \sum_{i=1}^{n} a_i \mu_i$ and variance $\sigma^2 = \sum_{i=1}^{n} a_i^2 \sigma_i^2$.
HW 7.15 (Q 44: Distribution of a sum of independent chi-squared random variables)

Let $X_1, X_2, \ldots, X_n$ be independent chi-squared random variables with $\gamma_1, \gamma_2, \ldots, \gamma_n$ degrees of freedom, respectively.

Let $Y = X_1 + X_2 + \cdots + X_n$.
Show that $Y$ is a chi-squared random variable with $\gamma$ degrees of freedom, where $\gamma = \sum_{i=1}^{n} \gamma_i$.
Homework

HW 7.16 (Q 52) Consider the random variable $X$ with density given by
$f(x) = 1/\theta; \quad 0 < x < \theta$.
(a) Find $E[X]$.
(b) Find the method of moments estimator for $\theta$.
(c) Find the method of moments estimate for $\theta$ based on these data:
1, 0.5, 1.4, 2.0, 0.25
Homework

HW 7.17 (Q 58). Let $X_1, X_2, \ldots, X_{100}$ be a random sample of size 100 from a gamma distribution with α = 5 and β = 3.
(a) Find the mgf of $Y = \sum_{i=1}^{100} X_i$.
(b) What is the distribution of $Y$?
(c) Find the mgf of $\bar{X} = Y/n$.
(d) What is the distribution of $\bar{X}$?
Interval Estimation
• A point estimate cannot be expected to equal the exact value of the population parameter; at best it is close to it.

• Usually, an interval estimate can be obtained by adding and subtracting a margin of error to the point estimate. Then,

Interval Estimate = Point Estimate ± Margin of Error

• Interval estimation provides us information about how close the point estimate is to the value of the parameter.
• Why do we use the term confidence interval?
Interval (CI) Estimation
– Instead of considering a statistic as a point estimator, we
may use random intervals to trap the parameter.
– In this case, the end points of the interval are RVs and we
can talk about the probability that it traps the parameter
value.
Confidence Interval: A 100(1 − α)% confidence interval for a parameter θ is a random interval [L1, L2] such that
P[L1 ≤ θ ≤ L2] = 1 − α, regardless of the value of θ.
Theorem 7.4.1: Interval estimation for µ: σ known
Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal population with mean $\mu$ (unknown) and variance $\sigma^2$ (known). Then, using Thm 7.3.4,
$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \ \Rightarrow\ \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$.
Taking two points $\pm z_{\alpha/2}$ symmetrically about the origin, we get
$P\left(-z_{\alpha/2} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right) = 1 - \alpha$.
Here $(1 - \alpha)$ is known as the confidence level, and $\alpha$ is the level of significance.


Interval estimation for µ: σ known

$P\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$
Hence, the confidence interval for the population mean $\mu$ having confidence level $100(1 - \alpha)\%$ is given as $\left[\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$.
The endpoints of the confidence interval are called confidence limits.


Interval estimation for µ: σ known

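Most commonly used confidence levels: 90% ($z_{\alpha/2} = 1.645$), 95% ($z_{\alpha/2} = 1.96$), and 99% ($z_{\alpha/2} = 2.576$).

Hence, a 95% CI for $\mu$ is given as $\left[\bar{X} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right]$.
That is, $P\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$.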


Practice Problems
HW 7.18 (Q 53) Studies have shown that the random variable
X, the processing time required to do a multiplication on a
new 3-D computer, is normally distributed with mean μ
and standard deviation 2 microseconds. A random sample
of 16 observations is to be taken.
(a) These data are obtained:
42.65 45.15 39.32 44.44
41.63 41.54 41.59 45.68
46.50 41.35 44.37 40.27
43.87 43.79 43.28 40.70
Based on these data, find an unbiased estimate for μ.
(b) Find a 95% confidence interval for μ.
Practice Problems
HW7.19. The mean of a sample of size 50 from a normal population is observed to be 15.68. If the s.d. of the population is 3.27, find (a) 80%, (b) 95%, (c) 99% confidence intervals for the population mean. Can you find the respective margins of error? What is the length of the CI in each case?

Sol. (b)
Step 1: Here $n = 50$, $\bar{x} = 15.68$, $\sigma = 3.27$, and $\alpha = 0.05$. We need a CI for $\mu$.
Step 2: As $\alpha = 0.05$, we need to find $z_{\alpha/2}$ such that $P(Z \le z_{\alpha/2}) = 0.975$. From the cumulative normal distribution table, we see $z_{\alpha/2} = 1.96$.
Step 3: The CI for $\mu$ ($\sigma$ known) is $\left[\bar{x} - z_{0.025}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{0.025}\frac{\sigma}{\sqrt{n}}\right] = [14.77, 16.59]$.
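The three steps of the solution can be reproduced in code as a check. The sketch below uses `statistics.NormalDist` from the Python standard library in place of the printed normal table; the function name `z_confidence_interval` is hypothetical. It assumes a normal population with known $\sigma$, as in Theorem 7.4.1.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, confidence=0.95):
    """Two-sided CI for the mean of a normal population with known sigma."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # z_{alpha/2}
    margin = z * sigma / sqrt(n)                  # margin of error
    return xbar - margin, xbar + margin

# Part (b) of HW7.19: n = 50, sample mean 15.68, sigma = 3.27, 95% level.
low, high = z_confidence_interval(xbar=15.68, sigma=3.27, n=50, confidence=0.95)
print(round(low, 2), round(high, 2))   # 14.77 16.59
```

The length of the interval is twice the margin of error, so it grows with the confidence level and shrinks as $n$ increases.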


Impracticality of Assumptions in CI

In practice, we usually face two main problems in applying the previous C.I. formula.
• What if the population is not normal?
• What if the population variance is unknown?
We need to introduce the Central Limit Theorem.
What is the beauty and importance of the CLT?
Central Limit Theorem (CLT)
• Regardless of the population distribution model, as the sample size increases, the sample mean tends to be normally distributed around the population mean, and its standard deviation shrinks as n increases.
• Two conditions must be satisfied to apply the CLT: (a) the samples must be i.i.d., and (b) the sample size must be large enough (usually n ≥ 30, but this depends on the problem!).

Let $X_1, X_2, \ldots, X_n$ be a random sample (that is, the $X_i$ are i.i.d.) of size $n$ from a distribution with mean $\mu$ and variance $\sigma^2$. Then for large $n$, the sample mean $\bar{X}$ is approximately normal with mean $\mu$ and variance $\sigma^2/n$; that is, $\bar{X} \sim N(\mu, \sigma^2/n)$ approximately.
Furthermore, for large $n$, the random variable $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is approximately $N(0, 1)$.
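A small simulation can make the statement concrete. The sketch below is an illustration under assumed settings, none of which come from the slides: the population is exponential with mean 1 (a clearly skewed distribution), the sample size is 30, and 2000 samples are drawn with a fixed seed. The function name `simulate_sample_means` is hypothetical.

```python
import random
from math import sqrt
from statistics import mean, pstdev

def simulate_sample_means(n, replications, rate=1.0, seed=0):
    """Draw `replications` samples of size n from an exponential population
    (mean 1/rate, s.d. 1/rate) and return the list of sample means."""
    rng = random.Random(seed)
    return [mean(rng.expovariate(rate) for _ in range(n))
            for _ in range(replications)]

n = 30
means = simulate_sample_means(n=n, replications=2000)

mu, sigma = 1.0, 1.0                      # population mean and s.d. for rate = 1
se = sigma / sqrt(n)                      # CLT standard deviation of the mean
within = sum(abs(m - mu) <= 1.96 * se for m in means) / len(means)

print(mean(means))     # should be close to mu = 1
print(pstdev(means))   # should be close to sigma / sqrt(n)
print(within)          # should be close to 0.95 if the normal approximation holds
```

Changing `n` to a small value such as 3 shows how the approximation degrades for a skewed population.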
CLT: Examples
• A certain brand of tyres has a mean life of 25,000 km with a s.d. of 1600 km. What is the probability that the mean life of 64 tyres is less than 24,600 km?

• A hawker sells dolls at varying prices with a mean of Rs. 700 and s.d. of Rs. 250. During Christmas week, assume that he sells 60 dolls. What is the probability that he earns Rs. 45000 or more in that week?
Central Limit Theorem (CLT)
• Randomization – we assume that the observations constitute a random sample (i.i.d.) from the population.

• Large enough sample size – how large is large?

• If the population is normal, then the sampling distribution of $\bar{X}$ will also be normal, no matter what the sample size is.
• If the population is approximately symmetric, the distribution of $\bar{X}$ becomes approximately normal for relatively small values of n.
• When the population is skewed, a common rule of thumb is that the sample size should be at least 30 before the sampling distribution of $\bar{X}$ becomes approximately normal.


CLT: Step by Step
Step 1: Identify the parts of the problem. Your question should state:
• The mean (average or μ)
• The standard deviation (σ)
• The sample size (n)
Step 2: Find $\bar{X}$ and express the problem in terms of “greater than” or “less than” a value of the sample mean $\bar{X}$.

Step 3: Use the CLT to find the distribution of $\bar{X}$: approximately, $\bar{X} \sim N(\mu, \sigma^2/n)$.
Step 4: Convert the normal variate $\bar{X}$ to a standard normal variate $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$.
Now you may draw a graph centred at 0 (the mean of Z) and shade the appropriate area to find the required probability.


Problem Solving
HW7.20. A certain brand of tyres has a mean life of 25,000 km with a s.d. of 1600 km. What is the probability that the mean life of 64 tyres is less than 24,600 km?
Sol.
Step 1: Here $X_1, X_2, \ldots, X_{64}$ constitute a random sample, and it is given that $E[X_i] = 25000$ and $\sigma_{X_i} = 1600$.
Step 2: $\bar{X} = \frac{1}{64}\sum_{i=1}^{64} X_i$, and our interest is to find $P(\bar{X} < 24600)$.
Step 3: As we have a random sample of size 64 (sufficiently large $n$), we can use the CLT to find the distribution of $\bar{X}$: it is approximately normal with mean 25000 and standard deviation $1600/\sqrt{64} = 200$, that is, $\bar{X} \sim N(25000, 200^2)$.
Step 4: $P(\bar{X} < 24600) = P\left(\frac{\bar{X} - 25000}{200} < \frac{24600 - 25000}{200}\right) = P(Z < -2) = 0.0228$ (using the standard normal cdf table).
Examples: Normal Distribution
[Figure not recovered in this text version.]


Problem Solving
HW7.21. A hawker sells dolls at varying prices with a mean of Rs. 700 and s.d. of Rs. 250. During Christmas week, assume that he sells 60 dolls. What is the probability that he earns Rs. 45000 or more in that week?
Step 1: Here $X_1, X_2, \ldots, X_{60}$ constitute a random sample, and it is given that $E[X_i] = 700$ and $\sigma_{X_i} = 250$.
Step 2: $\bar{X} = \frac{1}{60}\sum_{i=1}^{60} X_i$, and our aim is to find $P\left(\sum_{i=1}^{60} X_i \ge 45000\right)$, i.e., $P(\bar{X} \ge 750)$.
Step 3: As we have a random sample of size 60 (sufficiently large $n$), we can use the CLT to find the distribution of $\bar{X}$: it is approximately normal with mean 700 and standard deviation $250/\sqrt{60} = 32.27$, that is, $\bar{X} \sim N(700, 32.27^2)$.
Step 4: $P(\bar{X} \ge 750) = P\left(\frac{\bar{X} - 700}{32.27} \ge \frac{750 - 700}{32.27}\right) = P(Z \ge 1.55) = 1 - P(Z < 1.55) = 0.0606$ (using the standard normal cdf table).
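The table lookups in HW7.20 and HW7.21 can be cross-checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `sample_mean_distribution` is hypothetical. Note that `NormalDist` takes the standard deviation, not the variance, as its second argument.

```python
from math import sqrt
from statistics import NormalDist

def sample_mean_distribution(mu, sigma, n):
    """Approximate (CLT) distribution of the sample mean:
    normal with mean mu and standard deviation sigma / sqrt(n)."""
    return NormalDist(mu, sigma / sqrt(n))

# HW7.20: P(mean life of 64 tyres < 24600 km)
tyres = sample_mean_distribution(25000, 1600, 64)
p_tyres = tyres.cdf(24600)

# HW7.21: P(total earnings >= 45000) = P(mean price of 60 dolls >= 750)
dolls = sample_mean_distribution(700, 250, 60)
p_dolls = 1.0 - dolls.cdf(45000 / 60)

print(round(p_tyres, 4), round(p_dolls, 4))
```

The second probability comes out at about 0.0607 rather than 0.0606, because the table-based solution rounds the z-value to 1.55 before the lookup.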
Problem Solving
HW7.22. A population of 30-year-old males has a mean salary of Rs. 75000 with a standard deviation of Rs. 10000. If a sample of 100 men is taken, what is the probability that their mean salary will be less than Rs. 77500?

HW7.23. A certain group of welfare recipients receives pension benefits of Rs. 45000 per month on average, with a standard deviation of Rs. 7500. If a random sample of 25 people is taken, what is the probability that their mean pension benefit will be greater than Rs. 47000 or less than Rs. 43000 per month?

HW7.24. A certain population of dogs weighs an average of 5 kg, with a standard deviation of 2 kg. If 40 dogs are chosen at random, what is the probability that they have an average weight of greater than 6.0 kg or less than 4.5 kg?


Problem Solving
HW 7.25 (Q 49) When fission occurs, many of the nuclear
fragments formed have too many neutrons for stability.
Some of these neutrons are expelled almost
instantaneously. These observations are obtained on X,
the number of neutrons released during fission of
plutonium-239:
3 2 2 2 2 3 3 3
3 3 3 3 4 3 2 3
3 2 3 3 3 3 3 1
3 3 3 3 3 3 3 3
3 3 2 3 3 3 3 3
(a) Is X normally distributed? Explain. (b), (c), (d)