
UP School of Statistics Student Council

Education and Research


w erho.weebly.com | 0 [email protected] | f /erhoismyhero | t @erhomyhero

S131_Reviewer_002
Statistics 131 Parametric Statistical Inference
2nd Exam Reviewer-Summary

Some Notations

X = (X1, X2, …, Xn)' : r.s. of size n from FX(⋅; θ)
θ = (θ1, θ2, …, θk)' : vector of k unknown parameters
Ωθ = parameter space : the set of all admissible values of θ
θ^ = (θ^1, θ^2, …, θ^k)' : vector of estimators for θ
τ(θ) : a function of the unknown parameters
T(X) : a statistic used as an estimator for τ(θ)

Definitions

• Point estimator – a statistic used to estimate an unknown parameter or a function of unknown parameters; a point estimator is a rule or function
• Point estimate – a realized value of a point estimator; a real number

Mean squared error (MSE)


Formula for the MSE of T(X) with respect to τ(θ):

MSEτ(θ)(T) = E[T − τ(θ)]²
           = Var(T) + (Bias)²
           = Var(T) + [E(T) − τ(θ)]²
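
As a quick numerical illustration (a hedged sketch; the normal model and the two variance estimators below are assumptions added here, not part of the original reviewer), the decomposition can be checked by simulation:

```python
# Minimal sketch (assumed setup): check MSE = Var + Bias^2 by Monte Carlo
# for two estimators of sigma^2 under N(mu, sigma^2) sampling.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2, n, reps = 5.0, 4.0, 10, 200_000

x = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
s2_unbiased = x.var(axis=1, ddof=1)   # divisor n - 1
s2_biased = x.var(axis=1, ddof=0)     # divisor n

for name, t in (("divisor n-1", s2_unbiased), ("divisor n", s2_biased)):
    mse = np.mean((t - sigma2) ** 2)
    var, bias = t.var(), t.mean() - sigma2
    print(f"{name}: MSE = {mse:.4f}   Var + Bias^2 = {var + bias ** 2:.4f}")
```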

Consistency
Let T1, T2, …, Tn, … be a sequence of estimators of τ(θ), where Tn = Tn(X) is the estimator based on a r.s. of size n. The sequence {Tn} is:

a) MSE consistent iff: MSEτ(θ)(Tn) → 0 as n → ∞, ∀ θ ∈ Ωθ

b) weakly consistent iff: P{|Tn − τ(θ)| < ε} → 1 as n → ∞, ∀ ε > 0, ∀ θ ∈ Ωθ

A sufficient condition for {Tn} to be MSE consistent for τ(θ) is:

E(Tn) → τ(θ) and V(Tn) → 0 as n → ∞
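
A small simulation (illustrative only; the Exp(λ) model and the sample sizes are assumptions) shows the MSE of X̄ shrinking toward 0 as n grows, in line with the sufficient condition above:

```python
# Minimal sketch: MSE consistency of the sample mean under Exp(lambda) sampling;
# E(Xbar) = 1/lambda and V(Xbar) = 1/(n*lambda^2) -> 0, so the MSE shrinks with n.
import numpy as np

rng = np.random.default_rng(1)
lam, reps = 2.0, 20_000
true_mean = 1 / lam

for n in (5, 50, 500):
    xbar = rng.exponential(scale=1 / lam, size=(reps, n)).mean(axis=1)
    print(f"n = {n:3d}   MSE of Xbar ≈ {np.mean((xbar - true_mean) ** 2):.5f}")
```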

Sufficiency

a) A statistic S = S(X) is defined to be sufficient for θ iff fX(x1, x2, …, xn | S = s) is independent of θ, ∀ s.

b) A statistic S = S(X) is defined to be sufficient for θ iff, given any other statistic T = T(X), the conditional distribution fT(t | S = s) is independent of θ, ∀ s.

For a r.s. X of size n from Be(p), p ∈ [0, 1], ∑ Xi (i = 1, …, n) is sufficient for p.

For a r.s. X of size n from Po(λ), λ > 0, ∑ Xi (i = 1, …, n) is sufficient for λ.
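
To make the definition concrete (an added illustration, assuming a Bernoulli model), one can verify by enumeration that the conditional distribution of the sample given ∑ Xi is free of p:

```python
# Minimal sketch: for a Bernoulli(p) sample, the conditional distribution of the
# sample given Sn = sum(Xi) does not depend on p (checked here by enumeration).
from itertools import product

def cond_probs(p, n=3, s=1):
    # P(X = x | Sn = s) for every arrangement x with sum(x) == s
    joint = {x: p ** sum(x) * (1 - p) ** (n - sum(x)) for x in product((0, 1), repeat=n)}
    norm = sum(v for x, v in joint.items() if sum(x) == s)
    return {x: v / norm for x, v in joint.items() if sum(x) == s}

print(cond_probs(0.2))   # each arrangement has conditional probability 1/3
print(cond_probs(0.7))   # same conditional distribution, whatever p is
```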

Joint Sufficiency

The statistics S1(X), S2(X), …, Sr(X) are said to be jointly sufficient for θ iff fX(x1, x2, …, xn | S1(X) = s1, S2(X) = s2, …, Sr(X) = sr) is independent of θ, ∀ s1, s2, …, sr.

a) The random sample, X, is jointly sufficient.

b) The order statistics X(1), X(2), …, X(n) are jointly sufficient.

Fisher-Neyman Factorization Theorem

a) For single sufficient statistics:

Let X be a r.s. from fX(⋅; θ), θ unknown. A statistic S = S(X) is sufficient for θ iff

fX1, X2, …, Xn(x1, x2, …, xn) = g[S(X); θ] ⋅ h(X)

where h(X) ≥ 0 is a function of X independent of θ, and
g[S(X); θ] ≥ 0 is a function involving θ but depending on the r.s. X only through S(X).

b) For jointly sufficient statistics:

Let X be a r.s. from fX(⋅; θ), θ unknown. The statistics S1(X), S2(X), …, Sr(X) are jointly sufficient for θ iff

fX1, X2, …, Xn(x1, x2, …, xn) = g[S1(X), S2(X), …, Sr(X); θ] ⋅ h(X)

where h(X) ≥ 0 is a function of X independent of θ, and
g[S1(X), S2(X), …, Sr(X); θ] ≥ 0 is a function involving θ but depending on the r.s. X only through the statistics S1(X), S2(X), …, Sr(X).

Fisher-Neyman Factorizations

Note: h(X) ≥ 0, g[S(X); θ] ≥ 0, and g[S1(X), S2(X), …, Sr(X); θ] ≥ 0 in the following examples; all sums and products run over i = 1, …, n.
1. X a r.s. from Be(p), p ∈ [0, 1]

fX(x) = ∏ pX(xi)
      = ∏ p^(xi) (1 − p)^(1 − xi) I{0,1}(xi)
      = p^(∑ xi) (1 − p)^(n − ∑ xi) ⋅ ∏ I{0,1}(xi)

where g(∑ xi ; p) = p^(∑ xi) (1 − p)^(n − ∑ xi) and h(X) = ∏ I{0,1}(xi).

Sn = ∑ Xi is sufficient for p.

2. X a r.s. from Po(λ), λ > 0

fX(x) = ∏ pX(xi)
      = ∏ [e^(−λ) λ^(xi) / xi!] I{0,1,…}(xi)
      = e^(−nλ) λ^(∑ xi) ⋅ ∏ (1/xi!) I{0,1,…}(xi)

where g(∑ xi ; λ) = e^(−nλ) λ^(∑ xi) and h(X) = ∏ (1/xi!) I{0,1,…}(xi).

Sn = ∑ Xi is sufficient for λ.

3. X a r.s. from N(μ, σ²), μ ∈ ℝ, σ² > 0

fX(x) = ∏ fX(xi)
      = ∏ [1/(σ√(2π))] exp{−(xi − μ)² / (2σ²)} I(−∞,∞)(xi)
      = (2πσ²)^(−n/2) exp{−(1/(2σ²)) ∑ (xi − μ)²} ∏ I(−∞,∞)(xi)

Using ∑ (xi − μ)² = ∑ (xi − x̄)² + n(x̄ − μ)²,

fX(x) = (2πσ²)^(−n/2) exp{−(1/(2σ²)) [∑ (xi − x̄)² + n(x̄ − μ)²]} ∏ I(−∞,∞)(xi)

Case 1: σ² known, μ unknown

fX(x) = exp{−(n/(2σ²)) (x̄ − μ)²} ⋅ (2πσ²)^(−n/2) exp{−(1/(2σ²)) ∑ (xi − x̄)²} ∏ I(−∞,∞)(xi)

where g(x̄ ; μ) = exp{−(n/(2σ²)) (x̄ − μ)²} and h(X) is the remaining factor.

X̄ is sufficient for μ.

Case 2: σ² unknown, μ known

fX(x) = (2πσ²)^(−n/2) exp{−(1/(2σ²)) ∑ (xi − μ)²} ⋅ ∏ I(−∞,∞)(xi)

where g(∑ (xi − μ)² ; σ²) = (2πσ²)^(−n/2) exp{−(1/(2σ²)) ∑ (xi − μ)²} and h(X) = ∏ I(−∞,∞)(xi).

∑ (Xi − μ)² is sufficient for σ².

Case 3: both σ² and μ unknown

fX(x) = (2πσ²)^(−n/2) exp{−(1/(2σ²)) [∑ (xi − x̄)² + n(x̄ − μ)²]} ⋅ ∏ I(−∞,∞)(xi)

where g(∑ (xi − x̄)², x̄ ; μ, σ²) is the first factor and h(X) = ∏ I(−∞,∞)(xi).

X̄ and ∑ (Xi − X̄)² are jointly sufficient statistics for θ = (μ, σ²)'.

Theorem
If {S1, S2, …, Sr} is a set of jointly sufficient statistics, then any set of r statistics obtained from {S1, S2, …, Sr} by a 1–1 transformation is also jointly sufficient. In the case of a single sufficient statistic, if S = S(X) is sufficient, then any 1–1 transformation of S is also sufficient.
Examples:

1. X a r.s. from Be(p), p ∈ [0, 1]
   Sn = ∑ Xi is sufficient for p → X̄ = (1/n) Sn is also sufficient for p.

2. X a r.s. from Po(λ), λ > 0
   X̄ is also sufficient for λ.

3. X a r.s. from N(μ, σ²), μ ∈ ℝ, σ² > 0
   • σ² known, μ unknown: Sn = ∑ Xi is sufficient for μ.
   • both σ² and μ unknown: X̄, S² are jointly sufficient for (μ, σ²)'.

Completeness

A statistic T = T(X) is said to be complete iff, for every function g,

E[g(T)] = 0 ∀ θ ∈ Ωθ implies P[g(T) = 0] = 1, ∀ θ ∈ Ωθ

Examples:

1. Let X be a r.s. from Be(p), p ∈ [0, 1]. T = T(X) = ∑ Xi is complete.
2. Let X1, X2 be a r.s. from Be(p), p ∈ [0, 1]. T = T(X) = X2 − X1 is NOT complete.
3. Let X1 ∼ N(0, θ). T = T(X) = X1 is NOT complete.
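
To see why example 2 fails (an added check, not from the original notes): taking g(t) = t gives E(X2 − X1) = 0 for every p, yet X2 − X1 is not equal to 0 with probability 1. A quick enumeration:

```python
# Minimal sketch: with g(t) = t, E(X2 - X1) = 0 for every p, yet
# P(X2 - X1 = 0) = p^2 + (1 - p)^2 < 1, so T = X2 - X1 cannot be complete.
for p in (0.1, 0.5, 0.9):
    probs = {(x1, x2): (p if x1 else 1 - p) * (p if x2 else 1 - p)
             for x1 in (0, 1) for x2 in (0, 1)}
    e_t = sum((x2 - x1) * pr for (x1, x2), pr in probs.items())
    p_t0 = sum(pr for (x1, x2), pr in probs.items() if x2 == x1)
    print(f"p = {p}:   E(T) = {e_t:+.3f}   P(T = 0) = {p_t0:.2f}")
```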

Exponential Families of Densities

1. 1 – parameter exponential family of densities

A PDF/PMF fX(⋅; θ), θ unknown, is defined to belong to this family of densities iff fX(x; θ) can be expressed as:

fX(x; θ) = a(θ) ⋅ b(x) ⋅ exp{c(θ) ⋅ d(x)} , ∀ θ ∈ Ωθ

where a(θ) > 0, and b(x) > 0 ∀ x such that fX(x; θ) > 0.

Examples: Some distributions belonging to the 1 – parameter exponential family of densities:

a) Be ( p) and Geo( p)
b) Po( λ )
c) Exp( λ )
d) N ( μ , σ 2 ) , σ 2 known
e) Bi (n , p)

2. k – parameter exponential family of densities

A PDF/PMF fX(⋅; θ), θ = (θ1, θ2, …, θk)' unknown, is defined to belong to this family of densities iff fX(x; θ) can be expressed as:

fX(x; θ) = a(θ) ⋅ b(x) ⋅ exp{∑ cj(θ) ⋅ dj(x)} , ∀ θ ∈ Ωθ   (sum over j = 1, …, k)

where a(θ) > 0, and b(x) > 0 ∀ x such that fX(x; θ) > 0.

Examples: Some distributions belonging to the 2 – parameter exponential family of densities:

a) N ( μ , σ 2 )
b) Beta (a , b)
c) Ga(r , λ )

Theorem
Let X be a r.s. from fX(⋅; θ), where fX(⋅; θ) belongs to the 1 – parameter exponential family of densities. Then, a complete (minimal) sufficient statistic for θ is given by:

S = S(X) = ∑ d(Xi)   (sum over i = 1, …, n)

Theorem
Let X be a r.s. from fX(⋅; θ), where fX(⋅; θ) belongs to the k – parameter exponential family of densities. Then, a joint complete (minimal) sufficient set of statistics for θ is given by:

( ∑ d1(Xi), ∑ d2(Xi), …, ∑ dk(Xi) )   (sums over i = 1, …, n)

Factorizations of Members of Exponential Families of Densities and Associated Complete (Minimal) Sufficient Statistics
1. X a r.s. from Be(p), p ∈ [0, 1]

pX(x) = p^x (1 − p)^(1 − x) I{0,1}(x)
      = (1 − p) ⋅ I{0,1}(x) ⋅ exp{x ln[p / (1 − p)]}

with a(p) = 1 − p, b(x) = I{0,1}(x), c(p) = ln[p / (1 − p)], d(x) = x.

Complete Sufficient Statistic (CSS): ∑ d(Xi) = ∑ Xi = Sn

2. X a r.s. from Geo(p), p ∈ [0, 1]

pX(x) = p (1 − p)^x I{0,1,…}(x)
      = p I{0,1,…}(x) exp{ln[(1 − p)^x]}
      = p ⋅ I{0,1,…}(x) ⋅ exp{x ln(1 − p)}

with a(p) = p, b(x) = I{0,1,…}(x), c(p) = ln(1 − p), d(x) = x.

CSS: ∑ d(Xi) = ∑ Xi = Sn

3. X a r.s. from Po(λ), λ > 0

pX(x) = [e^(−λ) λ^x / x!] I{0,1,…}(x)
      = e^(−λ) (1/x!) I{0,1,…}(x) exp{ln(λ^x)}
      = e^(−λ) ⋅ (1/x!) I{0,1,…}(x) ⋅ exp{x ln(λ)}

with a(λ) = e^(−λ), b(x) = (1/x!) I{0,1,…}(x), c(λ) = ln(λ), d(x) = x.

CSS: ∑ d(Xi) = ∑ Xi = Sn

4. X a r.s. from Exp(λ), λ > 0

fX(x) = λ e^(−λx) I(0,∞)(x)
      = λ ⋅ I(0,∞)(x) ⋅ exp{−λx}

with a(λ) = λ, b(x) = I(0,∞)(x), c(λ) = −λ, d(x) = x.

CSS: ∑ d(Xi) = ∑ Xi = Sn

5. X a r.s. from N(μ, σ²), μ ∈ ℝ, σ² > 0, σ² known

fX(x) = [1/(σ√(2π))] exp{−(x − μ)² / (2σ²)} I(−∞,∞)(x)
      = [1/(σ√(2π))] exp{−(x² − 2μx + μ²) / (2σ²)} I(−∞,∞)(x)
      = [1/(σ√(2π))] exp{−x²/(2σ²) + (μ/σ²) x − μ²/(2σ²)} I(−∞,∞)(x)
      = exp{−x²/(2σ²)} [1/(σ√(2π))] I(−∞,∞)(x) ⋅ exp{−μ²/(2σ²)} ⋅ exp{(μ/σ²) x}

with b(x) = exp{−x²/(2σ²)} [1/(σ√(2π))] I(−∞,∞)(x), a(μ) = exp{−μ²/(2σ²)}, c(μ) = μ/σ², d(x) = x.

CSS: ∑ d(Xi) = ∑ Xi = Sn


6. X a r.s. from Bi(n, p), n ∈ ℤ, p ∈ [0, 1]

pX(x) = C(n, x) p^x (1 − p)^(n − x) I{0,1,…,n}(x)
      = C(n, x) (1 − p)^n [p / (1 − p)]^x I{0,1,…,n}(x)
      = (1 − p)^n ⋅ C(n, x) I{0,1,…,n}(x) ⋅ exp{x ln[p / (1 − p)]}

with a(p) = (1 − p)^n, b(x) = C(n, x) I{0,1,…,n}(x), c(p) = ln[p / (1 − p)], d(x) = x.

CSS: ∑ d(Xi) = ∑ Xi = Sn

7. X a r.s. from N(μ, σ²), μ ∈ ℝ, σ² > 0, both parameters unknown

fX(x) = [1/(σ√(2π))] exp{−x²/(2σ²) + (μ/σ²) x − μ²/(2σ²)} I(−∞,∞)(x)
      = [1/(σ√(2π))] exp{−μ²/(2σ²)} ⋅ I(−∞,∞)(x) ⋅ exp{−(1/(2σ²)) x² + (μ/σ²) x}

with a(μ, σ²) = [1/(σ√(2π))] exp{−μ²/(2σ²)}, b(x) = I(−∞,∞)(x), c1(σ²) = −1/(2σ²), d1(x) = x², c2(μ, σ²) = μ/σ², d2(x) = x.

Joint CSS: ∑ d1(Xi) = ∑ Xi², ∑ d2(Xi) = ∑ Xi = Sn

8. X a r.s. from Beta(a, b), a > 0, b > 0

fX(x) = [1/B(a, b)] x^(a−1) (1 − x)^(b−1) I(0,1)(x)
      = [1/B(a, b)] I(0,1)(x) exp{ln[x^(a−1) (1 − x)^(b−1)]}
      = [1/B(a, b)] I(0,1)(x) exp{ln(x^(a−1)) + ln[(1 − x)^(b−1)]}
      = [1/B(a, b)] ⋅ I(0,1)(x) ⋅ exp{(a − 1) ln(x) + (b − 1) ln(1 − x)}

with a(a, b) = 1/B(a, b), b(x) = I(0,1)(x), c1(a) = a − 1, d1(x) = ln(x), c2(b) = b − 1, d2(x) = ln(1 − x).

Joint CSS: ∑ d1(Xi) = ∑ ln(Xi), ∑ d2(Xi) = ∑ ln(1 − Xi)

9. X a r.s. from Ga(r, λ), r > 0, λ > 0

fX(x) = [λ^r / Γ(r)] x^(r−1) e^(−λx) I(0,∞)(x)
      = [λ^r / Γ(r)] I(0,∞)(x) exp{ln(x^(r−1))} exp{−λx}
      = [λ^r / Γ(r)] I(0,∞)(x) exp{ln(x^(r−1)) − λx}
      = [λ^r / Γ(r)] ⋅ I(0,∞)(x) ⋅ exp{(r − 1) ln(x) − λx}

with a(r, λ) = λ^r / Γ(r), b(x) = I(0,∞)(x), c1(r) = r − 1, d1(x) = ln(x), c2(λ) = −λ, d2(x) = x.

Joint CSS: ∑ d1(Xi) = ∑ ln(Xi), ∑ d2(Xi) = ∑ Xi = Sn

Unbiasedness

Definition: An estimator T = T(X) is defined to be unbiased for a function of the unknown parameter(s), say τ(θ), iff:

E(T) = E[T(X)] = τ(θ), ∀ θ ∈ Ωθ

Definition: If T = T(X) is an estimator of τ(θ), the bias of T with respect to τ(θ) is defined as:

bτ(θ)(T) = E(T) − τ(θ)

Minimum MSE

Definition: An estimator T = T(X) of τ(θ) is defined to be the minimum MSE estimator of τ(θ) if, for any other estimator T*(X) of τ(θ),

MSE(T) ≤ MSE(T*), ∀ θ ∈ Ωθ, and
MSE(T) < MSE(T*), for at least one θ ∈ Ωθ

UMVUE (Uniformly Minimum Variance Unbiased Estimator)

Definition: An estimator T = T ( X ) of τ (θ) is defined to be the UMVUE for τ (θ) iff ∀ θ ∈ Ωθ :

i. E (T ) = τ (θ) ; and
ii. V (T ) ≤ V (T * ) , for any other unbiased estimator T * of τ (θ) .

Some Results:

1. If T = T ( X ) is the UMVUE for τ (θ) , then (aT + b) is the UMVUE for [a τ ( θ) + b]

2. If T 1 = T 1( X ) is the UMVUE for τ 1 (θ) and T 2 = T 2 ( X ) is the UMVUE for τ 2 (θ) , with T 1
and T 2 independent, then (a 1 T 1 + a 2 T 2) is the UMVUE for [a1 τ 1 (θ) + a 2 τ 2 (θ)]

Cramer-Rao Lower Bound (CRLB)

CRLB = [τ'(θ)]² / [n I(θ)]

where τ'(θ) = Dθ τ(θ) = ∂τ(θ)/∂θ

n = sample size

I(θ) = Fisher's information measure
     = E{[∂ ln fX(Xi; θ) / ∂θ]²}
     = −E[∂² ln fX(Xi; θ) / ∂θ²]
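
For instance (an illustrative sketch; the Poisson choice is an assumption, not from the notes), for Po(λ) one has I(λ) = 1/λ, so the CRLB for estimating τ(λ) = λ is λ/n, which Var(X̄) = λ/n attains:

```python
# Minimal sketch: for Po(lambda), I(lambda) = 1/lambda, so the CRLB for estimating
# tau(lambda) = lambda is lambda/n, which Var(Xbar) attains.
import numpy as np

rng = np.random.default_rng(2)
lam, n, reps = 3.0, 25, 200_000

xbar = rng.poisson(lam, size=(reps, n)).mean(axis=1)
crlb = lam / n   # [tau'(lambda)]^2 / (n * I(lambda)) with tau'(lambda) = 1
print(f"Var(Xbar) ≈ {xbar.var():.5f}   CRLB = {crlb:.5f}")
```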

Rao-Blackwell Theorem
Let X be a r.s. from fX(⋅; θ), θ ∈ Ωθ ⊂ ℝ (i.e., θ is a scalar). Further, let S = S(X) be a sufficient statistic for θ and let T = T(X) be an unbiased estimator of τ(θ) with finite variance, V(T) < ∞. Define T' = E(T | S). Then,

i. T ' is a function of the sufficient statistic S .


ii. E (T ') = τ (θ) , ∀ θ ∈ Ω θ [ T ' is unbiased for τ (θ) ]
iii. V (T ' ) ≤ V (T ) , ∀ θ ∈ Ωθ , with equality iff P (T ' = T ) = 1 .
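
As an illustration (a hedged sketch with assumed Bernoulli data, not from the original notes): starting from the crude unbiased estimator T = X1 of p and conditioning on the sufficient statistic Sn = ∑ Xi gives T' = E(X1 | Sn) = Sn/n = X̄, still unbiased but with much smaller variance:

```python
# Minimal sketch: Rao-Blackwellize the crude unbiased estimator T = X1 of p by
# conditioning on the sufficient statistic Sn = sum(Xi): T' = E(X1 | Sn) = Sn/n = Xbar.
import numpy as np

rng = np.random.default_rng(3)
p, n, reps = 0.3, 20, 200_000

x = rng.binomial(1, p, size=(reps, n))
t = x[:, 0]               # T: first observation only (unbiased, variance p(1-p))
t_prime = x.mean(axis=1)  # T' = E(T | Sn), a function of the sufficient statistic

print(f"E(T)  ≈ {t.mean():.3f}   Var(T)  ≈ {t.var():.4f}")
print(f"E(T') ≈ {t_prime.mean():.3f}   Var(T') ≈ {t_prime.var():.4f}")
```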

Lehmann-Scheffé Theorem
Let X be a r.s. from fX(⋅; θ), θ ∈ Ωθ ⊂ ℝ (i.e., θ is a scalar). If the statistic S = S(X) is complete (minimal) and sufficient for θ and there exists an unbiased estimator T = T(X) for τ(θ), then there exists a unique UMVUE for τ(θ), given by T* = E(T | S).

Efficiency

Definitions:

1. An unbiased estimator T = T ( X ) of τ (θ) is defined to be efficient for τ (θ) , iff, the variance
of T attains the CRLB of the variances of unbiased estimators of τ (θ) .

2. The efficiency (Eff) of an unbiased estimator T = T(X) is defined as the ratio of the corresponding CRLB (of the variances of unbiased estimators of τ(θ)) to Var(T), i.e.,

Eff(T) = [CRLB / Var(T)] × 100%

3. If T1 and T2 are both unbiased for τ(θ), with Var(T1) ≤ Var(T2), then the relative efficiency (REff) of T2 with respect to T1 is defined as

REff = [Var(T1) / Var(T2)] × 100%

4. An estimator T = T ( X ) of τ (θ) is defined to be asymptotically efficient for τ (θ) , iff,

i. E (T ) → τ (θ) as n → ∞ , i.e., T is asymptotically unbiased for τ (θ)

ii. Eff ( T ) → 1 as n → ∞
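
For example (an illustrative simulation; the normal model and sample size are assumptions), both X̄ and the sample median are unbiased for μ under N(μ, σ²), and the relative efficiency of the median with respect to the mean is roughly 64%, since Var(median) ≈ πσ²/(2n):

```python
# Minimal sketch: relative efficiency of the sample median (T2) with respect to the
# sample mean (T1) as estimators of mu under N(mu, sigma^2); expect roughly 64%.
import numpy as np

rng = np.random.default_rng(4)
mu, sigma, n, reps = 0.0, 1.0, 101, 50_000

x = rng.normal(mu, sigma, size=(reps, n))
var_mean = x.mean(axis=1).var()
var_median = np.median(x, axis=1).var()

print(f"REff(median w.r.t. mean) ≈ {100 * var_mean / var_median:.1f}%")
```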

Invariance and Equivariance

Definition: Let X = (X1, X2, …, Xn)' be a random sample from fX(⋅; θ), θ ∈ Ωθ. Suppose T = T(X) is an estimator of τ(θ). Then T is defined to be:

(a) Location invariant, iff, T ( X + c 1 n) = T ( X ) , ∀ x ∈ S , ∀ c ∈ ℝ

(b) Scale invariant, iff, T (c X ) = T ( X ), ∀ x ∈ S , ∀ c ∈ ℝ +

(c) Location equivariant, iff, T ( X + c 1 n) = T ( X ) + c , ∀ x ∈ S , ∀ c ∈ ℝ

(d) Scale equivariant, iff, T (c X ) = c T ( X ) , ∀ x ∈ S , ∀ c ∈ ℝ+

(e) Location-scale invariant, iff, T (c 1 X + c 2 1 n ) = T ( X ), ∀ x ∈ S , ∀ c 1 ∈ ℝ+ , and ∀ c 2 ∈ ℝ

(f) Location-scale equivariant, iff, T(c1 X + c2 1n) = c1 T(X) + c2, ∀ x ∈ S, ∀ c1 ∈ ℝ+, and ∀ c2 ∈ ℝ
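
A quick numerical check of these properties (illustrative only; the statistics chosen below are assumptions, not from the notes):

```python
# Minimal sketch: numerically verify a few invariance/equivariance properties
# for common statistics.
import numpy as np

rng = np.random.default_rng(5)
x, c = rng.normal(size=8), 3.0

print(np.isclose(np.mean(x + c), np.mean(x) + c))                # mean: location equivariant
print(np.isclose(np.mean(c * x), c * np.mean(x)))                # mean: scale equivariant
print(np.isclose(np.var(x + c, ddof=1), np.var(x, ddof=1)))      # variance: location invariant
print(np.isclose(np.std(c * x, ddof=1), c * np.std(x, ddof=1)))  # std dev: scale equivariant
```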

Robustness and Resistance

Robustness: An estimator that performs well under modifications of the underlying assumptions is
said to be robust.

Resistance: An estimator that is little affected by extreme observations is said to be resistant.

Ancillarity
Let X be a r.s. from fX(⋅; θ). Let the statistic T = T(X) be sufficient for θ, and suppose dim(T) > dim(θ). If T can be written as T = (T1, T2), where T2 = T2(X) has a marginal distribution that is independent of θ, then the statistic T2 is defined to be an ancillary statistic. Moreover, the statistic T1 = T1(X) is called a conditionally sufficient statistic.
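
For instance (an added example, assuming a N(μ, 1) model): the sample range X(n) − X(1) has a distribution free of μ, so it is ancillary for the location parameter; a simulation makes this plausible:

```python
# Minimal sketch: for N(mu, 1) samples, the range R = X(n) - X(1) has the same
# distribution whatever mu is, i.e., R is ancillary for the location parameter.
import numpy as np

rng = np.random.default_rng(6)
n, reps = 10, 100_000

for mu in (0.0, 5.0):
    x = rng.normal(mu, 1.0, size=(reps, n))
    r = x.max(axis=1) - x.min(axis=1)
    print(f"mu = {mu}:   E(R) ≈ {r.mean():.3f}   SD(R) ≈ {r.std():.3f}")
```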

BLUE/UMVULE (Best Linear Unbiased Estimator / Uniformly Minimum Variance Unbiased Linear Estimator)

Definition: An estimator T * = T *( X ) is defined to be BLUE/UMVULE for τ (θ) , iff,

i. T * is a linear function of X 1 , X 2 , … , X n ;
ii. E (T * ) = τ (θ) , i.e., T * is unbiased for τ (θ) ; and
iii. V (T * ) ≤ V (T ), ∀ θ ∈ Ωθ for any other linear unbiased estimator T of τ (θ) .

Remark: The BLUE is the counterpart of the UMVUE when we restrict ourselves to linear estimators.

Result: Let X be a r.s. from fX(⋅; μ, σ²), with σ² < ∞. Then X̄ is the BLUE/UMVULE for μ, regardless of the form of fX.

Methods of Finding Estimators

1. Method of Moments Estimation (MME)


Let X be a r.s. from FX(⋅; θ), θ ∈ Ωθ, with PDF fX(⋅; θ). Let μr' denote the rth population raw moment. Generally, μr' will be a function of θ, so we can write μr' = μr'(θ). Let Mr' denote the rth sample raw moment.

➢ Equate the first k sample raw moments (if θ has k components) to their corresponding population raw moments, i.e.,

M1' = μ1', M2' = μ2', …, Mk' = μk'

➢ Solve these k equations simultaneously for θ1 , θ 2 , … , θk

➢ The solutions, denoted by θ~1, θ~2, …, θ~k, are the MMEs of the parameters θ1, θ2, …, θk, respectively.
Recall: μr' = E(X^r) and Mr' = (1/n) ∑ Xi^r   (sum over i = 1, …, n)
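
As a worked illustration (hedged; the Ga(r, λ) model and the true values are assumptions): since μ1' = r/λ and μ2' = r(r + 1)/λ², solving M1' = μ1' and M2' = μ2' gives λ~ = M1'/(M2' − M1'²) and r~ = M1' λ~:

```python
# Minimal sketch: MMEs for Ga(r, lambda) (rate parameterization), using
# mu1' = r/lambda and mu2' = r(r + 1)/lambda^2.
import numpy as np

rng = np.random.default_rng(7)
r_true, lam_true, n = 2.0, 1.5, 5_000

x = rng.gamma(shape=r_true, scale=1 / lam_true, size=n)
m1, m2 = x.mean(), np.mean(x ** 2)

lam_mme = m1 / (m2 - m1 ** 2)   # solve M1' = r/lambda, M2' = r(r+1)/lambda^2
r_mme = m1 * lam_mme
print(f"r~ = {r_mme:.3f} (true {r_true})   lambda~ = {lam_mme:.3f} (true {lam_true})")
```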

2. Maximum Likelihood Estimation (MLE)

➢ Likelihood function: a function of θ1, θ2, …, θk with the values of the random variables known:

L(θ | X = x) = L(θ | X1 = x1, X2 = x2, …, Xn = xn) = ∏ fX(xi; θ)   (product over i = 1, …, n)

➢ Let X be a r.s. from fX(⋅; θ), θ ∈ Ωθ, and let L(θ | X) be the likelihood function of the random sample. The estimator

θ^ = (θ^1, θ^2, …, θ^k)'

is defined as the maximum likelihood estimator of θ iff

L(θ^ | x) = sup over θ ∈ Ωθ of L(θ | x), i.e., L(θ^ | x) ≥ L(θ | x), ∀ θ ∈ Ωθ

➢ Method: maximize L(θ | X = x)

i. Use ∂L(θ | X = x)/∂θ = 0
ii. Find ln[L(θ | X = x)], then use ∂ ln[L(θ | X = x)]/∂θ = 0
iii. Sometimes differentiation might not work (e.g., the uniform distribution)
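
A small numerical sketch (illustrative; the Exp(λ) model is an assumption): maximizing the log-likelihood numerically recovers the closed-form MLE λ^ = 1/X̄, while item iii is why, e.g., U(0, θ) requires the order-statistic argument instead of differentiation:

```python
# Minimal sketch: numerical MLE for Exp(lambda) from the log-likelihood
# ln L(lambda | x) = n*ln(lambda) - lambda*sum(x), versus the closed form 1/Xbar.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(8)
lam_true, n = 2.5, 1_000
x = rng.exponential(scale=1 / lam_true, size=n)

def neg_loglik(lam):
    return -(n * np.log(lam) - lam * x.sum())

res = minimize_scalar(neg_loglik, bounds=(1e-6, 50), method="bounded")
print(f"numerical MLE = {res.x:.4f}   closed form 1/Xbar = {1 / x.mean():.4f}")
```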

Theorem:
Let θ^ = (θ^1, θ^2, …, θ^k)' be the MLE of θ. Suppose τ1(θ), τ2(θ), …, τr(θ), 1 ≤ r ≤ k, are r functions of the k unknown parameters. Then, the MLEs of these r functions are the same functions evaluated at the MLEs of the k unknown parameters, i.e., the MLEs of τ1(θ), τ2(θ), …, τr(θ) are τ1(θ^), τ2(θ^), …, τr(θ^).

Theorem: MLEs are functions of sufficient statistics

Remark: The MLE chooses as "best" the modal value of θ, i.e., the value of θ at which the likelihood function attains its maximum.

Theorem: If the parent PDF/PMF satisfies certain regularity conditions, the MLE of τ(θ), say τ^(θ) = τ(θ^), is asymptotically normally distributed:

τ(θ^) ≈ N( μ = τ(θ), σ² = CRLBτ(θ) ) as n → ∞

Some Results:

1. MLEs may or may not be unbiased, but they are consistent.

2. MLEs are asymptotically unbiased and asymptotically efficient for the target parameter, and are
also asymptotically normally distributed.

3. If an MLE is unbiased for a target parameter and its variance attains the corresponding CRLB,
then it is the UMVUE.

4. If an MLE is unbiased for a target parameter and if it is a function of the CSS, then it is the
UMVUE.
