
HSP 511: Economics Lab

Lecture 1
Indian Institute of Technology Delhi

Contents

Linear Mean Model
Finite Sample Properties
Large Sample Properties
    Consistency
    Asymptotic Normality
    Confidence Interval and Testing

An econometric model is a mathematical model that embodies a set of statistical assumptions concerning the generation of sample data (and of similar data from a larger population). It is usually specified as a mathematical relationship between one or more random variables and other non-random variables. Suppose that we have a population of children and an i.i.d. sample of data on height and age from this population. The height of a child is stochastically related to its age. We can formalize that relationship in a linear regression model,

heighti = b0 + b1 agei + ϵi ,

where b0 is the intercept, b1 is the coefficient that multiplies age to produce a prediction of height, and ϵi is the error term. This says that height is predicted by age, with some error. We can further impose restrictions on the error; however, any assumptions on the error term ϵi should be such that the model is consistent with all the data points.
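As a quick illustration (a sketch of ours, with made-up coefficients rather than estimates from real data), we can simulate data from such a model and recover b0 and b1 by least squares:

import numpy as np

# Hypothetical coefficients for illustration: height = 75 + 6*age + noise
age = np.random.uniform(2, 10, 500)
height = 75 + 6*age + np.random.normal(0, 5, 500)

# np.polyfit with deg=1 returns [slope, intercept]
b1, b0 = np.polyfit(age, height, 1)
print('b0 = {:.2f}, b1 = {:.2f}'.format(b0, b1))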

※ Linear Mean Model


Suppose we have a random variable Xi, e.g., the height of individuals in some population. This random variable has some distribution, which we do not know, and we want to learn about this distribution from an i.i.d. sample drawn from the population. Suppose we are only interested in the mean of this distribution, since the mean is the best prediction of any random variable in terms of mean squared error.
Let us consider the linear mean model: let θ ∈ R be a parameter such that we have the statistical model

Xi = θ + ϵi ,


where E(ϵi) = 0 and Var(ϵi) = σ² < ∞. The only substantive assumption here is that the variance is finite; everything else holds without loss of generality, since we can always write

Xi = E(Xi) + (Xi − E(Xi))
   = E(Xi) + ϵi ,

where ϵi satisfies E(ϵi) = 0. So the quantities of interest are θ = E(Xi) and σ² = Var(ϵi). Let X = (X1, . . . , Xn)⊤ be an i.i.d. sample from this model. We want to estimate (θ, σ²) from the sample observations. Consider the following estimators:
θ̂ = (1/n) Σ_{i=1}^n Xi ,

and

σ̂² = (1/(n − 1)) Σ_{i=1}^n (Xi − θ̂)² .

Remark. Suppose we want to predict a random variable X by a constant α such that the mean squared error is minimized:

min_α E(X − α)².

The α that solves this problem is exactly E(X): the best prediction of any random variable in terms of MSE is its expectation.
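A quick numerical check of this remark (our sketch, not part of the original notes): compute the empirical MSE for a grid of constants α and verify that the minimum lies at the sample mean.

x = np.random.normal(2, 1, 10000)        # sample with mean approximately 2

# Empirical MSE of predicting every draw by the constant alpha
alphas = np.linspace(0, 4, 401)
mse = [np.mean((x - a)**2) for a in alphas]

print('argmin: {:.3f}, sample mean: {:.3f}'.format(alphas[np.argmin(mse)], np.mean(x)))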

Remark (Plug-in Approach). The most common approach to developing estimators is based on the analogy principle, or plug-in method. The analogy principle of estimation proposes that population parameters be estimated by sample statistics that have the same property in the sample as the parameters do in the population. More generally, expectations are replaced by sample averages and parameters are replaced by their estimators.

Definition 1 An estimator θ̂ of θ is called unbiased if

E[θ̂] = θ.

Exercise 1 Show that θ̂ and σ̂² are unbiased estimators of θ and σ².


import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# 10000 samples of size n = 100 from Uniform(0,1), for which theta = 0.5;
# average the sample means across replications and subtract the true mean
H = np.mean(np.mean(np.random.rand(100,10000),axis=0))-.5

print('The bias for theta is: {}'.format(H))


## The bias for theta is: 0.00025030865719077866
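A companion check for the second part of Exercise 1 (our sketch, in the same style as the snippet above): with the n − 1 divisor, the bias of σ̂² should also be near zero. For Uniform(0, 1) draws, σ² = 1/12.

# np.var with ddof=1 uses the n-1 divisor, matching our sigma-hat^2
X = np.random.rand(100, 10000)                  # 10000 samples of size n = 100
H2 = np.mean(np.var(X, axis=0, ddof=1)) - 1/12
print('The bias for sigma^2 is: {}'.format(H2))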

Fact 1.1
(1) If X ∼ N(θ, Σ) is an n × 1 random vector and A is an m × n matrix, then
AX ∼ N(Aθ, AΣA⊤).
(2) If X ∼ N(0, 1), then
X² ∼ χ²_1.
(3) If X ∼ N(0, I_n), then
X⊤X ∼ χ²_n.
(4) If X ∼ N(0, I_n) and M is an orthogonal projection (M² = M and M = M⊤) with rank n − k, then
X⊤MX ∼ χ²_{n−k}.
(5) If z ∼ N(0, 1), w ∼ χ²_m and z ⊥ w, then
z / √(w/m) ∼ t_m.

(Here I_n denotes the n × n identity matrix.)

※ Finite Sample Properties

We want to know the distributions of our estimators θ̂ and σ̂² in order to make inferences or build confidence intervals. Usually, it is not easy to derive the distribution of an estimator unless we either make some distributional assumption or rely on large-sample approximations. Here we make a normality assumption in order to find the distributions of our estimators. Assume that Xi ∼ N(θ, σ²). Then

θ̂ ∼ N(θ, σ²/n),

and

(n − 1)σ̂² / σ² ∼ χ²_{n−1};

the latter follows from Fact 1.1 (4), since we can write

(n − 1)σ̂² / σ² = (1/σ²) X⊤ M X,   where

M = [ 1 − 1/n    −1/n    · · ·    −1/n   ]
    [  −1/n    1 − 1/n   · · ·    −1/n   ]
    [    ⋮         ⋮       ⋱        ⋮    ]
    [  −1/n      −1/n    · · ·   1 − 1/n ]

is an orthogonal projection matrix of rank n − 1. (Since M annihilates constant vectors, X⊤MX = (X − θ1_n)⊤M(X − θ1_n), where 1_n is the vector of ones, and (X − θ1_n)/σ ∼ N(0, I_n), so Fact 1.1 (4) applies.)
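A simulation sketch (ours) of this finite-sample result: draw N(0, 1) samples, compute (n − 1)σ̂²/σ², and compare the histogram with the χ²_{n−1} density.

from scipy.stats import chi2

n = 10
X = np.random.randn(n, 10000)               # N(0,1) draws: theta = 0, sigma^2 = 1
stat = (n - 1) * np.var(X, axis=0, ddof=1)  # (n-1) * sigma-hat^2 / sigma^2

x_axis = np.arange(0, 30, 0.01)
plt.hist(stat, bins=50, density=True)
plt.plot(x_axis, chi2.pdf(x_axis, n - 1))   # overlay the chi^2_{n-1} density
plt.show()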


※ Large Sample Properties


※ Consistency
Definition 2 We say an estimator θ̂ of θ is consistent if θ̂ →P θ, i.e., θ̂ converges to θ in probability.

We now show that the estimators θ̂ and σ̂² are consistent. First, under E(|Xi|) < ∞, the weak law of large numbers gives

θ̂ = (1/n) Σ_{i=1}^n Xi →P θ.

Second, under E(Xi²) < ∞,


σ̂² = (1/(n − 1)) Σ_{i=1}^n (Xi − θ̂)²
   = (1/(n − 1)) Σ_{i=1}^n (Xi² + θ̂² − 2θ̂Xi)
   = [n/(n − 1)] · (1/n) Σ_{i=1}^n Xi² − [n/(n − 1)] · θ̂ · θ̂,

where n/(n − 1) → 1, (1/n) Σ_{i=1}^n Xi² →P E(Xi²) by the weak law of large numbers, and θ̂ →P θ, so

σ̂² →P E(Xi²) − (E(Xi))² = σ².

dff = np.zeros(100)
eps = .01

for i in range(100):
    # Fraction of 1000 replications with |theta_hat - theta| > eps at sample
    # size n = 100*(i+1); the sign trick maps the event to {0, 1}
    H = (np.sign(np.abs(np.mean(np.random.rand(100*(i+1), 1000),
                                axis=0) - .5) - eps) + 1) / 2
    dff[i] = np.mean(H)

plt.plot(dff)
plt.show()
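The same experiment can be run for σ̂² (our sketch): with Uniform(0, 1) data, σ² = 1/12, and the fraction of replications with |σ̂² − σ²| > ε should also shrink as n grows.

dff2 = np.zeros(100)

for i in range(100):
    # Fraction of 1000 replications with |sigma-hat^2 - 1/12| > eps at n = 100*(i+1)
    s2 = np.var(np.random.rand(100*(i+1), 1000), axis=0, ddof=1)
    dff2[i] = np.mean(np.abs(s2 - 1/12) > eps)

plt.plot(dff2)
plt.show()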


[Figure: fraction of replications with |θ̂ − θ| > ε plotted against i (sample size n = 100(i + 1)); the fraction declines from about 0.7 toward 0 as n grows.]

※ Asymptotic Normality

Now we will see that, as the sample size grows, the approximate distribution of θ̂ is normal. We will use the central limit theorem to prove this result.

Theorem 3.1: Central Limit Theorem

Suppose {X1, . . . , Xn} is a sequence of i.i.d. random variables with E[Xi] = µ and Var[Xi] < ∞. Then, as n approaches infinity,

√n ( (1/n) Σ_{i=1}^n Xi − µ ) →d N(0, σ²),

where

σ² = lim_{n→∞} n · Var( (1/n) Σ_{i=1}^n Xi ).

x_axis = np.arange(-3, 3, 0.01)

# Standardized sample means: Var(Uniform(0,1)) = 1/12, so
# sqrt(n/Var) * (theta_hat - theta) = sqrt(12*100) * (theta_hat - 0.5)
clt = np.sqrt(12*100)*(np.mean(np.random.rand(100,1000),axis=0)-.5)

plt.hist(clt, bins=50, density=True)
plt.plot(x_axis, norm.pdf(x_axis, 0, 1))   # overlay the N(0,1) density
plt.show()


[Figure: histogram of the standardized sample means with the N(0, 1) density overlaid; the two closely agree.]

※ Confidence Interval and Testing


Now we want to develop a test of

H0 : θ = θ0 versus H1 : θ ≠ θ0 .

The general strategy behind testing is to develop a test statistic T(X) whose distribution under the null is known and whose distribution under the alternative differs from its distribution under the null. The ability of a test based on T(X) to distinguish the null from the alternative relies on this difference in behavior. Consider the following test statistic:

T(X) = √n (θ̂ − θ0) / σ̂ .

Under the null, as the sample size increases, we have

T(X) →d N(0, 1).

Under the alternative, if θ1 denotes the true parameter value, we have

T(X) = √n (θ̂ − θ0) / σ̂
     = √n (θ̂ − θ1) / σ̂ + √n (θ1 − θ0) / σ̂ ,

where the first term →d N(0, 1) and the second term becomes large (positive or negative) as n grows.


Suppose we also want to control the type I error to be at most α. Consider the test ϕ* given by

ϕ*(X) = 1 if T(X) < −c or T(X) > c,
ϕ*(X) = 0 otherwise,

where c is chosen so that, under the null,

E(ϕ*(X)) ≤ α.

In our hypothesis testing problem it is easy to see that c = z_{α/2}, where z_{α/2} is the upper α/2 quantile (that is, the 1 − α/2 quantile) of the standard normal distribution. So our test becomes

ϕ*(X) = 1{ √n |θ̂ − θ0| / σ̂ > z_{α/2} }.

x_axis = np.arange(-3, 3, 0.01)

# Rejection region at the 5% level: |T(X)| > z_{alpha/2} = 1.96
x_axis1 = np.arange(-3, -1.96, 0.01)
x_axis2 = np.arange(1.96, 3, 0.01)

# N(0,1) density of T(X) under the null
plt.plot(x_axis, norm.pdf(x_axis, 0, 1))

# Shade the two tails where the test rejects
plt.fill_between(x_axis1, norm.pdf(x_axis1, 0, 1))
plt.fill_between(x_axis2, norm.pdf(x_axis2, 0, 1))
plt.show()
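To complement the picture, here is a sketch (ours) of actually carrying out the test at level α = 0.05 on one simulated Uniform(0, 1) sample, for which the true θ is 0.5:

# One sample of size n = 100; test H0: theta = 0.5
x = np.random.rand(100)
n = len(x)
T = np.sqrt(n) * (np.mean(x) - 0.5) / np.std(x, ddof=1)
print('T = {:.3f}, reject H0: {}'.format(T, np.abs(T) > 1.96))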

[Figure: the N(0, 1) density of T(X) under the null, with the two rejection tails beyond ±1.96 shaded.]

A confidence interval is a data-dependent interval C(X) that contains the parameter of interest with large probability. Suppose we want to find a confidence


interval for θ. One possibility is to set C(X) = (−∞, ∞); such an interval contains the parameter of interest with probability 1. The problem is to construct the shortest interval that contains θ with large probability. The goal is to find a confidence interval such that

Prθ {θ ∈ C(X)} ≥ 1 − α.

A confidence interval can be constructed by inverting a test. For each possible parameter value θ′, consider the problem of testing the null hypothesis H0 : θ = θ′ against the alternative H1 : θ ≠ θ′. Suppose that for each such hypothesis we have a test of level α. Then the confidence set

C(X) = {θ′ ∈ Θ : the null hypothesis θ = θ′ is accepted}

has coverage probability at least 1 − α. Indeed, suppose that the true value of the parameter is θ0. Since the test of θ = θ0 against θ ≠ θ0 has level α by construction, the probability that θ0 is rejected, and hence excluded from C(X), is at most α.
We can use the test above to find our confidence interval as
 √ ′

 n(θ̂ − θ ) 
C(X) = θ′ : ≤ zα/2
 σ̂ 
( )
zα/2 σ̂ zα/2 σ̂
= θ : θ̂ − √ ≤ θ′ ≤ θ̂ + √

.
n n
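A direct computation of this interval (our sketch, in the same simulation setup as above):

# 95% confidence interval for theta from one Uniform(0,1) sample
x = np.random.rand(100)
theta_hat = np.mean(x)
half = 1.96 * np.std(x, ddof=1) / np.sqrt(len(x))
print('95% CI: ({:.3f}, {:.3f})'.format(theta_hat - half, theta_hat + half))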
