Lecture 1
Indian Institute of Technology Delhi
Contents

Linear Mean Model
Finite Sample Properties
Large Sample Properties
    Consistency
    Asymptotic Normality
    Confidence Interval and Testing
※ Linear Mean Model

Consider, as a motivating example, a model for height as a linear function of age:
$$\mathrm{height}_i = b_0 + b_1\,\mathrm{age}_i + \epsilon_i,$$
where $E(\epsilon_i) = 0$ and $\mathrm{Var}(\epsilon_i) = \sigma^2 < \infty$. The only assumption here is that the variance is finite; everything else holds without loss of generality, since we can always write
$$X_i = E(X_i) + (X_i - E(X_i)) = E(X_i) + \epsilon_i,$$
where $\epsilon_i$ satisfies $E(\epsilon_i) = 0$. So the quantities of interest are $\theta = E(X_i)$ and $\sigma^2 = \mathrm{Var}(\epsilon_i)$.
Let $X = (X_1, \ldots, X_n)^\top$ be an i.i.d. sample from this model. We want to estimate $(\theta, \sigma^2)$ from the sample observations. Consider the following estimators:
$$\hat{\theta} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad \hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \hat{\theta})^2.$$
The estimator $\hat{\theta}$ can be motivated as the minimizer of the mean squared error $E[(X - \alpha)^2]$ over constants $\alpha$: since $E[(X - \alpha)^2] = \mathrm{Var}(X) + (E(X) - \alpha)^2$, the $\alpha$ which minimizes this problem is exactly $E(X)$. The best prediction of any random variable in terms of MSE is its expectation.
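A minimal numerical check of this claim (a sketch assuming Uniform(0,1) draws, so the minimizer should land near $E(X) = 0.5$):

import numpy as np

x = np.random.rand(100000)                   # draws with E(X) = 0.5
alphas = np.linspace(0, 1, 201)              # candidate constant predictions
mse = [np.mean((x - a)**2) for a in alphas]  # empirical MSE for each alpha
print(alphas[np.argmin(mse)])                # approximately 0.5 = E(X)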
The estimator $\hat{\theta}$ is unbiased:
$$E[\hat{\theta}] = \theta.$$
The following simulation checks this for Uniform(0,1) data, where $\theta = 0.5$:
import numpy as np

# average of 10000 sample means (n = 100) of Uniform(0,1) draws, minus the true mean 0.5
H = np.mean(np.mean(np.random.rand(100, 10000), axis=0)) - 0.5
print(H)  # close to 0, consistent with unbiasedness
※ Finite Sample Properties

Suppose in this section that, in addition, $X_i \sim N(\theta, \sigma^2)$. The following fact will be useful.
Fact 1.1
(1) If $X \sim N(\theta, \Sigma)$ is an $n \times 1$ random vector and $A$ an $m \times n$ matrix, then $AX \sim N(A\theta, A\Sigma A^\top)$.
(4) If $X \sim N(0, I_n)$ and $M$ is an $n \times n$ symmetric, idempotent matrix of rank $r$, then $X^\top M X \sim \chi^2_r$. Here $I_n$ denotes the $n \times n$ identity matrix.
By Fact 1.1 (1), applied with $A = \frac{1}{n}\mathbf{1}^\top$ (the $1 \times n$ vector with entries $1/n$),
$$\hat{\theta} \sim N(\theta, \sigma^2/n),$$
and
$$\frac{(n-1)\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-1};$$
the latter follows from Fact 1.1 (4), since
$$\frac{(n-1)\hat{\sigma}^2}{\sigma^2} = \frac{1}{\sigma^2}\, X^\top \underbrace{\begin{pmatrix} 1-\frac{1}{n} & -\frac{1}{n} & \cdots & -\frac{1}{n} \\ -\frac{1}{n} & 1-\frac{1}{n} & \cdots & -\frac{1}{n} \\ \vdots & \vdots & \ddots & \vdots \\ -\frac{1}{n} & -\frac{1}{n} & \cdots & 1-\frac{1}{n} \end{pmatrix}}_{M} X,$$
where $M = I_n - \frac{1}{n}\mathbf{1}\mathbf{1}^\top$ is symmetric and idempotent with rank $n-1$; since $M\mathbf{1} = 0$, we have $X^\top M X = (X - \theta\mathbf{1})^\top M (X - \theta\mathbf{1})$, and $(X - \theta\mathbf{1})/\sigma \sim N(0, I_n)$.
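A quick simulation sketch of both finite-sample claims (assuming $\theta = 0$ and $\sigma = 1$, so the moments can be compared directly):

import numpy as np

n, reps = 10, 100000
x = np.random.randn(reps, n)   # each row is an N(0,1) sample of size n
m = x.mean(axis=1)             # theta-hat for each replication
s2 = x.var(axis=1, ddof=1)     # sigma-hat^2 for each replication
print(np.var(m), 1 / n)        # Var(theta-hat) should be sigma^2/n = 0.1
q = (n - 1) * s2               # (n-1)*sigma-hat^2/sigma^2 with sigma = 1
print(q.mean(), n - 1)         # chi^2_{n-1} has mean n-1 = 9
print(q.var(), 2 * (n - 1))    # ... and variance 2(n-1) = 18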
※ Large Sample Properties

※ Consistency
We show that the estimators $\hat{\theta}$ and $\hat{\sigma}^2$ are consistent. Under $E(|X_i|) < \infty$, by the weak law of large numbers,
$$\hat{\theta} = \frac{1}{n}\sum_{i=1}^{n} X_i \xrightarrow{P} \theta.$$
The simulation below estimates $\Pr(|\hat{\theta} - 0.5| > \epsilon)$ for Uniform(0,1) data ($\epsilon = 0.01$, 1000 replications per sample size) as the sample size grows:
import numpy as np
import matplotlib.pyplot as plt

dff = np.zeros(100)
eps = 0.01
for i in range(100):
    # H indicates, for each of 1000 replications of sample size n = 100*(i+1),
    # whether |theta-hat - 0.5| > eps
    H = (np.sign(np.abs(np.mean(np.random.rand(100 * (i + 1), 1000),
                                axis=0) - 0.5) - eps) + 1) / 2
    dff[i] = np.mean(H)  # estimated probability of an eps-deviation
plt.plot(dff)
plt.show()
[Figure: estimated $\Pr(|\hat{\theta} - 0.5| > \epsilon)$ against the loop index; the probability decays from about 0.7 toward 0 as $n$ grows, illustrating consistency.]
※ Asymptotic Normality
Now we will see that, as the sample size grows, the approximate distribution of $\hat{\theta}$ is normal. We will use the central limit theorem, which states that
$$\sqrt{n}\left(\frac{1}{n}\sum_{i=1}^{n} X_i - \theta\right) \xrightarrow{d} N(0, \sigma^2),$$
where
$$\sigma^2 = \lim_{n\to\infty} n\,\mathrm{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right).$$
For Uniform(0,1) data $\sigma^2 = 1/12$, so with $n = 100$ the standardized statistic is $\sqrt{12 \cdot 100}\,(\hat{\theta} - 0.5)$; the line below simulates 1000 replications of it:

# 1000 standardized sample means, each approximately N(0,1) by the CLT
clt = np.sqrt(12 * 100) * (np.mean(np.random.rand(100, 1000), axis=0) - 0.5)
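A histogram of these standardized means against the $N(0,1)$ density reproduces the figure below (a plotting sketch; the exact commands used for the figure are an assumption):

import numpy as np
import matplotlib.pyplot as plt

# regenerate the standardized means (as in the snippet above)
clt = np.sqrt(12 * 100) * (np.mean(np.random.rand(100, 1000), axis=0) - 0.5)
t = np.linspace(-3, 3, 200)
plt.hist(clt, bins=30, density=True)                 # empirical distribution
plt.plot(t, np.exp(-t**2 / 2) / np.sqrt(2 * np.pi))  # N(0,1) density for comparison
plt.show()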
[Figure: histogram of the 1000 standardized sample means on $[-3, 3]$, closely matching the $N(0,1)$ density.]
※ Confidence Interval and Testing

Suppose we want to test
$$H_0: \theta = \theta_0 \quad \text{versus} \quad H_1: \theta \neq \theta_0.$$
The general strategy behind testing is to develop a test statistic $T(X)$ whose distribution under the null is known and whose distribution under the alternative differs from the null distribution. The ability of a test based on $T(X)$ to detect the alternative relies on this difference in behavior. Consider the following test statistic:
$$T(X) = \frac{\sqrt{n}\,(\hat{\theta} - \theta_0)}{\hat{\sigma}}.$$
Under the null, as the sample size increases,
$$T(X) \xrightarrow{d} N(0, 1)$$
by the central limit theorem combined with Slutsky's theorem, since $\hat{\sigma} \xrightarrow{P} \sigma$.
Under the alternative $\theta = \theta_1 \neq \theta_0$, write $\sqrt{n}(\hat{\theta} - \theta_0) = \sqrt{n}(\hat{\theta} - \theta_1) + \sqrt{n}(\theta_1 - \theta_0)$, where $\theta_1$ belongs to the true alternative; the second term diverges, so $|T(X)| \to \infty$ in probability. Suppose we also want to control the type I error to be less than $\alpha$. Consider the test $\phi^*$ given by
$$\phi^*(X) = \begin{cases} 1 & \text{if } T(X) < -c \text{ or } T(X) > c, \\ 0 & \text{otherwise}, \end{cases}$$
where taking $c = z_{\alpha/2}$ makes the asymptotic level equal to $\alpha$.
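A simulation sketch of this test's behavior (assuming Uniform(0,1) data, $\theta_0 = 0.5$, and $\alpha = 0.05$ so that $c = z_{\alpha/2} \approx 1.96$; reject_rate is a hypothetical helper, not from the notes):

import numpy as np

n, reps, c = 100, 10000, 1.96
def reject_rate(theta0, shift=0.0):
    # rejection frequency of phi* over many replications of shifted Uniform(0,1) data
    x = np.random.rand(reps, n) + shift
    t = np.sqrt(n) * (x.mean(axis=1) - theta0) / x.std(axis=1, ddof=1)
    return np.mean(np.abs(t) > c)
print(reject_rate(0.5))                 # type I error: close to alpha = 0.05
print(reject_rate(0.5, shift=0.05))     # power against theta_1 = 0.55: close to 1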
[Figure: the $N(0,1)$ null density of $T(X)$ on $[-3, 3]$, with the rejection region in the two tails.]
We now construct a confidence interval for $\theta$. One possibility is to set $C(X) = (-\infty, \infty)$. Such an interval will contain the parameter of interest with probability 1, but it is uninformative. The problem is to construct the shortest interval that will contain $\theta$ with large probability; the goal is to find a confidence interval such that
$$\Pr_\theta\{\theta \in C(X)\} \geq 1 - \alpha,$$
that is, with coverage probability at least $1 - \alpha$. Indeed, suppose that the true value of the parameter is $\theta_0$. Since the test of $\theta = \theta_0$ against $\theta \neq \theta_0$ has level $\alpha$ by construction, we can invert the test above to obtain our confidence interval:
$$C(X) = \left\{\theta' : \left|\frac{\sqrt{n}\,(\hat{\theta} - \theta')}{\hat{\sigma}}\right| \leq z_{\alpha/2}\right\} = \left\{\theta' : \hat{\theta} - \frac{z_{\alpha/2}\,\hat{\sigma}}{\sqrt{n}} \leq \theta' \leq \hat{\theta} + \frac{z_{\alpha/2}\,\hat{\sigma}}{\sqrt{n}}\right\}.$$
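Putting the pieces together, a short sketch computing this interval from one simulated Uniform(0,1) sample ($z_{\alpha/2} \approx 1.96$ for $\alpha = 0.05$ is hard-coded here as an assumption):

import numpy as np

x = np.random.rand(100)                    # one sample; the true theta is 0.5
theta_hat, sig_hat = x.mean(), x.std(ddof=1)
half = 1.96 * sig_hat / np.sqrt(len(x))    # z_{alpha/2} * sigma-hat / sqrt(n)
print(theta_hat - half, theta_hat + half)  # 95% confidence interval for theta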