Lecture1 Introduction

Lecture 1: Introduction and Review
Zhuzhu Zhou
Xiamen University
Introduction and Review
• Economic Questions and Data

• Review of Probability
• Review of Statistics
Economic Questions and Data
Econometrics is the science and art of using economic theory
and statistical techniques to analyze economic data.
Economic Questions
• Outsiders sometimes laugh at economics, thinking its conclusions are too obvious, e.g. higher education leads to a higher salary in the future.
• We can ask such a person: with 500,000 yuan in cash at hand, should you pursue graduate education or invest in stocks?
• We need to compare the returns to education and to stocks: a quantitative answer is required.
• We want to know the answer as well as how precise our answer is.
• Econometrics, biometrics, anthropometrics, milkteametrics...
Economic Questions
Many economic questions can be described by the relationship between variables, e.g. education and salary.
Economic Questions: Causal Effect
• We want to compare two individuals' salaries, holding all other factors except education the same.
• Ideally, we would find two identical individuals (twins?) and run a randomized controlled experiment: one receives more education (treatment group) while the other receives less (control group).
• It is helpful to think about the "ideal experiment" when asking about a causal effect.
• Forecasting need not involve a causal effect: the rooster's crow predicts the sunrise but does not cause it.
• The conceptual framework this course uses to explore the relationship between variables is the Multiple Regression Model.
Data Sources
Experimental Data vs. Observational Data
• Economists usually cannot set the values of variables through experiments. Running experiments in economics is often infeasible: they can be expensive (Liang Jianzhang; J-PAL) or unethical (Stanford Prison Experiment).
• Many economists therefore try to estimate causal effects from non-random observational data using econometrics.
Data Types
Economists are like workers, and data are their inputs: even the cleverest cook cannot make a meal without rice. Different types of data allow different findings. E.g. education and earnings: comparing two individuals vs. comparing one individual before and after receiving higher education; drinking milk tea and gaining weight.
• Cross-sectional data: $\{x_i, y_i\}$
• Time series data: $\{x_t, y_t\}$
• Panel data: $\{x_{it}, y_{it}\}$
Examples: CHARLS, Bureau of Statistics, Statistical Yearbook
Data Types
It is always helpful to bear in mind what sort of variation in the data helps us identify the effect (reach our conclusions). Different types of data may give different findings for the same question. E.g. career training and earnings, or the effect of SK-II.
• Compare two individuals with and without training (cross-sectional).
• Compare one individual before and after receiving training (time series).
Review of Probability
1. Random Variables and Probability Distributions
2. Expected Values, Mean, and Variance
3. Two Random Variables
4. Normal, Chi-Squared, Student t, and F Distributions
5. Random Sampling and Distribution of Sample Average
6. Large-Sample Approximations to Sampling Distributions
1. Random Variables and Probability Distributions
Probabilities and Outcomes
• The mutually exclusive potential results of a random process are called the outcomes.
- Alipay lottery: normal people, mascot, super mascot.
- Pieces of meat you get from the lady serving at the dining hall: 1, 2, 3, 4.
• The probability of an outcome is the proportion of the time that the outcome occurs in the long run.
• The set of all possible outcomes is called the sample space.
• An event is a subset of the sample space.
Random Variables
• A random variable is a numerical summary of a random outcome.
• Discrete random variables take on a discrete set of values (e.g. the result of a university application or the graduate entrance exam; the number of pieces of pork you get from the dining hall; the number of students currently sleeping in class).
• Continuous random variables take on a continuum of possible values (e.g. the Shanghai Composite Index; the grams of rice you get).
• Usually we use upper case for the random variable, X, and lower case for a value it takes, x.
Probability Distribution of a Discrete Random Variable
• The probability distribution of a discrete random variable is the list of all possible values of the variable and the probability that each value will occur.
• Probabilities of events.
• The cumulative probability distribution (i.e. cumulative distribution function, CDF) is the probability that the random variable is less than or equal to a particular value.
• Bernoulli distribution and Bernoulli random variable.
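The definitions above can be sketched in a few lines of Python. The probabilities for the "pieces of meat" example are made up for illustration (the slides give none), and the helper names `cdf` and `bernoulli_pmf` are hypothetical.

```python
from fractions import Fraction

# Hypothetical distribution of the number of pieces of meat served
# (the probabilities are assumed, not taken from the lecture).
pmf = {1: Fraction(1, 10), 2: Fraction(4, 10), 3: Fraction(4, 10), 4: Fraction(1, 10)}

def cdf(pmf, y):
    """Pr(Y <= y): add up the probabilities of all values not exceeding y."""
    return sum(p for value, p in pmf.items() if value <= y)

def bernoulli_pmf(p):
    """A Bernoulli random variable equals 1 with probability p and 0 otherwise."""
    return {0: 1 - p, 1: p}

assert sum(pmf.values()) == 1                   # probabilities sum to one
print(cdf(pmf, 2))                              # Pr(Y <= 2)
print(cdf(bernoulli_pmf(Fraction(3, 10)), 0))   # Pr(Y <= 0) = 1 - p
```

Exact fractions are used so that the probabilities add up without rounding error.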
Probability Distribution of a Continuous Random
Variable
• The cumulative probability distribution is the probability that the random variable is less than or equal to a particular value.
• Probability density function (PDF): the area under the PDF between any two points is the probability that the random variable falls between those two points. The probability of taking any single value is zero.
2. Expected Values, Mean, and Variance
Expected Value
The expected value of a random variable, $E(Y)$, is the long-run average value of the random variable over many repeated trials or occurrences.
Discrete case: $E(Y) = y_1 p_1 + y_2 p_2 + \dots + y_k p_k = \sum_{i=1}^{k} y_i p_i$
Continuous case: $E(Y) = \int y f_Y(y)\,dy$
Expected Value
• It is computed as a weighted average of the possible outcomes of the random variable, where the weights are the probabilities of those outcomes.
• The expected value of Y is also called the expectation of Y or the mean of Y ($\mu_Y$).
E.g. pieces of pork from the dining hall.
Variance and Standard Deviation
The variance and standard deviation measure the dispersion, or "spread", of a probability distribution.
If we want to measure the spread, what should we do? We may want to calculate how far each value is from the mean. To get rid of the sign, we may want to square the difference.
Discrete case: $\sigma_Y^2 = E[(Y - \mu_Y)^2] = \sum_{i=1}^{k} (y_i - \mu_Y)^2 p_i$
Continuous case: $\sigma_Y^2 = E[(Y - \mu_Y)^2] = \int (y - \mu_Y)^2 f_Y(y)\,dy$
Variance and Standard Deviation
• The variance of a random variable Y, $\mathrm{var}(Y)$, is the expected value of the square of the deviation of Y from its mean.
$\sigma_Y^2 = \mathrm{var}(Y) = E[(Y - \mu_Y)^2]$
• Because deviations are squared, it penalizes extreme values heavily.
• The standard deviation is the square root of the variance, so it has the same units as the random variable.
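As a rough illustration of the discrete-case formulas for the mean, variance, and standard deviation, here is a small sketch for the pork example. The probabilities are assumed for illustration only, and the function names are hypothetical.

```python
from fractions import Fraction
from math import sqrt

# Hypothetical distribution of pieces of pork (assumed probabilities).
pmf = {1: Fraction(1, 10), 2: Fraction(4, 10), 3: Fraction(4, 10), 4: Fraction(1, 10)}

def mean(pmf):
    """E(Y) = sum_i y_i p_i: probability-weighted average of the values."""
    return sum(y * p for y, p in pmf.items())

def variance(pmf):
    """var(Y) = sum_i (y_i - mu_Y)^2 p_i: expected squared deviation from the mean."""
    mu = mean(pmf)
    return sum((y - mu) ** 2 * p for y, p in pmf.items())

mu_Y = mean(pmf)        # 5/2 pieces
var_Y = variance(pmf)   # 13/20 pieces squared
sd_Y = sqrt(var_Y)      # back in the original unit (pieces)
print(mu_Y, var_Y, sd_Y)
```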
Mean and Variance of a Linear Function of a Random
Variable
$Y = a + bX$
• Mean: $\mu_Y = a + b\mu_X$
• Variance: $\sigma_Y^2 = b^2 \sigma_X^2$
Prove it. Notice that the variance of a constant is 0.
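One possible way to carry out the proof, using only the definitions of the mean and variance and the linearity of expectation:

```latex
\begin{aligned}
\mu_Y &= E(a + bX) = a + bE(X) = a + b\mu_X, \\
\sigma_Y^2 &= E\left[(Y - \mu_Y)^2\right]
            = E\left[(a + bX - a - b\mu_X)^2\right]
            = E\left[b^2 (X - \mu_X)^2\right]
            = b^2 \sigma_X^2 .
\end{aligned}
```

Setting $b = 0$ gives $\mathrm{var}(a) = 0$: a constant has zero variance.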
Other Measures of the Shape of a Distribution
• Mean and variance are two important features of a distribution (like weight and height for a person's appearance).
Appearance: what an individual looks like;
Distribution: what a random variable looks like.
• Skewness describes how much a distribution deviates from symmetry. A distribution with a long right tail has positive skewness and is called positively skewed, right-skewed, or skewed to the right.
$\text{Skewness} = \frac{E[(Y - \mu_Y)^3]}{\sigma_Y^3}$
Other Measures of the Shape of a Distribution
• Kurtosis of a distribution is a measure of how much mass is in its tails.
How many outliers (extreme values)? (FIGURE)
$\text{Kurtosis} = \frac{E[(Y - \mu_Y)^4]}{\sigma_Y^4}$
• Mean, variance, skewness and kurtosis are moments of a distribution. More generally, there is the r-th moment, $E(Y^r)$.
Each captures part of the information in a distribution. For an interview, it is better to see the real person, though that is sometimes costly.
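The skewness and kurtosis formulas can be evaluated directly for a discrete distribution. The two distributions below are invented for illustration, and `central_moment` is a hypothetical helper.

```python
from fractions import Fraction

def central_moment(pmf, r):
    """E[(Y - mu_Y)^r] for a discrete distribution given as {value: probability}."""
    mu = sum(y * p for y, p in pmf.items())
    return sum((y - mu) ** r * p for y, p in pmf.items())

def skewness(pmf):
    """Third central moment divided by sigma^3."""
    return float(central_moment(pmf, 3)) / float(central_moment(pmf, 2)) ** 1.5

def kurtosis(pmf):
    """Fourth central moment divided by sigma^4."""
    return float(central_moment(pmf, 4)) / float(central_moment(pmf, 2)) ** 2

# Assumed example distributions: one symmetric, one with a long right tail.
symmetric = {1: Fraction(1, 10), 2: Fraction(4, 10), 3: Fraction(4, 10), 4: Fraction(1, 10)}
right_skewed = {0: Fraction(7, 10), 1: Fraction(2, 10), 10: Fraction(1, 10)}

print(skewness(symmetric))      # zero for a symmetric distribution
print(skewness(right_skewed))   # positive: skewed to the right
print(kurtosis(symmetric), kurtosis(right_skewed))
```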
3. Two Random Variables
Joint Distributions
• The joint probability distribution of two discrete random variables is the probability that the random variables simultaneously take on certain values.
Discrete case: $\Pr(X = x, Y = y)$
Continuous case: $f_{XY}(x, y)$
Marginal Distributions
• The marginal probability distribution of a random variable Y describes the distribution of Y alone.
Discrete case: $\Pr(Y = y) = \sum_{i=1}^{l} \Pr(X = x_i, Y = y)$
Continuous case: $f_Y(y) = \int f_{XY}(x, y)\,dx$
Joint and Marginal Distributions
Conditional Distributions
The distribution of a random variable Y conditional on another random variable X taking on a specific value is called the conditional distribution of Y given X. It is helpful to rethink: what is Pr? A number? Or a function? A function of x, or of x and y?
Discrete case: $\Pr(Y = y \mid X = x) = \frac{\Pr(X = x, Y = y)}{\Pr(X = x)}$
Continuous case: $f_{Y|X}(y \mid x) = \frac{f_{XY}(x, y)}{f_X(x)}$
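A small sketch of how joint, marginal, and conditional distributions relate in the discrete case. The joint table is hypothetical (say X = 1 for drinking milk tea and Y = 1 for gaining weight), and the function names are made up for this example.

```python
from fractions import Fraction as F

# Hypothetical joint distribution Pr(X = x, Y = y), keyed by (x, y).
joint = {(0, 0): F(3, 10), (0, 1): F(2, 10),
         (1, 0): F(1, 10), (1, 1): F(4, 10)}

def marginal(joint, axis):
    """Sum the joint probabilities over the other variable (axis 0 = X, axis 1 = Y)."""
    out = {}
    for pair, p in joint.items():
        out[pair[axis]] = out.get(pair[axis], 0) + p
    return out

def conditional_y_given_x(joint, x):
    """Pr(Y = y | X = x) = Pr(X = x, Y = y) / Pr(X = x), as a function of y."""
    pr_x = marginal(joint, 0)[x]
    return {y: p / pr_x for (xi, y), p in joint.items() if xi == x}

print(marginal(joint, 0))               # distribution of X alone
print(marginal(joint, 1))               # distribution of Y alone
print(conditional_y_given_x(joint, 1))  # distribution of Y among those with X = 1
```

For each fixed x the conditional distribution is itself a probability distribution over y, so its values sum to one.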
The conditional expectation of Y given X, i.e. the conditional mean of Y given X, is the mean of the conditional distribution of Y given X.
Discrete case: $E(Y \mid X) = y_1 p_1 + y_2 p_2 + \dots + y_k p_k = \sum_{i=1}^{k} y_i p_i$.
Where is the difference from the unconditional expectation?
Continuous case: $E(Y \mid X) = \int y f_{Y|X}\,dy$.
A function of what?
Law of Iterated Expectations (example: Y: weight; X: height)
$E(Y) = E(E(Y \mid X)) = \int \left( \int y f_{Y|X}\,dy \right) f_X\,dx$
$E(Y \mid Z) = E(E(Y \mid X, Z) \mid Z)$
Example: $E(Y \mid X)$, where Y is weight and X = 1 for drinking bubble milk tea, 0 otherwise.
Conditional variance: defined similarly.
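The law of iterated expectations can be checked numerically on a small example. The joint distribution below (X = 1 for milk-tea drinkers, Y = weight in kg) is invented for illustration.

```python
from fractions import Fraction as F

# Hypothetical joint distribution Pr(X = x, Y = y), keyed by (x, y).
joint = {(0, 50): F(6, 20), (0, 60): F(4, 20), (0, 70): F(2, 20),
         (1, 50): F(1, 20), (1, 60): F(3, 20), (1, 70): F(4, 20)}

# Marginal distribution of X.
pr_x = {}
for (x, y), p in joint.items():
    pr_x[x] = pr_x.get(x, 0) + p

def cond_mean_y(x):
    """E(Y | X = x): mean of the conditional distribution of Y given X = x."""
    return sum(y * p for (xi, y), p in joint.items() if xi == x) / pr_x[x]

E_Y = sum(y * p for (x, y), p in joint.items())                  # unconditional mean
E_of_cond = sum(cond_mean_y(x) * px for x, px in pr_x.items())   # E(E(Y | X))

print(cond_mean_y(0), cond_mean_y(1))
print(E_Y, E_of_cond)   # the two agree
```

Note that `cond_mean_y` is a function of x only: averaging it over the distribution of X recovers the unconditional mean.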
Independence, Covariance and Correlation
• Independent: knowing the value of one of the variables provides no information about the other.
$f_{X,Y} = f_X f_Y$. Notice that $\frac{f_{X,Y}}{f_X} = f_{Y|X} = f_Y$.
• Correlation:
$\mathrm{corr}(X, Y) = \frac{\mathrm{cov}(X, Y)}{\sigma(X)\sigma(Y)}$
X and Y are uncorrelated if $\mathrm{cov}(X, Y) = 0$. X and Y being uncorrelated only suggests there is no "linear" relationship between them. EXAMPLE.
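One standard example of variables that are uncorrelated but not independent (chosen here for illustration, not necessarily the one used in the lecture): let X take the values -1, 0, 1 with equal probability and let Y = X².

```python
from fractions import Fraction as F

# X is uniform on {-1, 0, 1}; Y = X**2 is completely determined by X.
pr_x = {-1: F(1, 3), 0: F(1, 3), 1: F(1, 3)}

E_X = sum(x * p for x, p in pr_x.items())
E_Y = sum(x ** 2 * p for x, p in pr_x.items())
E_XY = sum(x * x ** 2 * p for x, p in pr_x.items())
cov_XY = E_XY - E_X * E_Y          # cov(X, Y) = E(XY) - E(X)E(Y)

# Dependence: knowing X = 1 changes the probability that Y = 1.
pr_Y1 = sum(p for x, p in pr_x.items() if x ** 2 == 1)
pr_Y1_given_X1 = pr_x[1] / pr_x[1]  # Pr(X = 1, Y = 1) / Pr(X = 1)

print(cov_XY)                  # 0: uncorrelated
print(pr_Y1, pr_Y1_given_X1)   # 2/3 versus 1: not independent
```

The relationship between X and Y is exact but purely nonlinear, so the covariance does not detect it.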
Independence, Covariance and Correlation
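• Mean independent: if $E(Y \mid X) = E(Y)$, then $\mathrm{cov}(X, Y) = 0$.
Proof: $E(XY) = E(E(XY \mid X)) = E(X E(Y \mid X)) = E(X E(Y)) = E(X)E(Y)$, so $\mathrm{cov}(X, Y) = E(XY) - E(X)E(Y) = 0$.
• Relationship between independent, mean independent and uncorrelated: independence implies mean independence, which implies uncorrelatedness; the converses do not hold in general.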
Means, Variances, and Covariances of Sums of Random Variables
$E(a + bX + cY) = a + bE(X) + cE(Y)$
$\mathrm{var}(a + bY) = b^2\,\mathrm{var}(Y)$
$\mathrm{var}(aX + bY) = a^2\,\mathrm{var}(X) + b^2\,\mathrm{var}(Y) + 2ab\,\mathrm{cov}(X, Y)$
$\mathrm{cov}(a + bX + cZ, Y) = b\,\mathrm{cov}(X, Y) + c\,\mathrm{cov}(Z, Y)$
4. The Normal, Chi-Squared, Student t, and F
Distributions
The following distributions are used when testing certain types of hypotheses in econometrics.
• Normal distribution
• Chi-squared distribution
• Student t distribution
• F distribution
Normal Distribution
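• Univariate normal distribution: $N(0, 1)$; $N(\mu, \sigma^2)$; p.d.f. $\phi(x)$; c.d.f. $\Phi(x)$; skewness is 0; kurtosis is 3.
$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[ -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right]$
• Bivariate normal distribution:
$g_{X,Y}(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1 - \rho_{XY}^2}} \exp\left\{ -\frac{1}{2(1 - \rho_{XY}^2)} \left[ \left( \frac{x - \mu_X}{\sigma_X} \right)^2 + \left( \frac{y - \mu_Y}{\sigma_Y} \right)^2 - 2\rho_{XY} \left( \frac{x - \mu_X}{\sigma_X} \right) \left( \frac{y - \mu_Y}{\sigma_Y} \right) \right] \right\}$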
Normal Distribution
• If X and Y follow a bivariate normal distribution with covariance $\sigma_{XY}$, then $aX + bY$ has the distribution $N(a\mu_X + b\mu_Y,\ a^2\sigma_X^2 + b^2\sigma_Y^2 + 2ab\sigma_{XY})$. Moreover, if n random variables have a multivariate normal distribution, then any linear combination of these variables is normally distributed.
• If a set of variables has a multivariate normal distribution, then the marginal distribution of each of the variables is normal.
Normal Distribution
• If variables with a multivariate normal distribution have zero covariances, then the variables are independent.
• If X and Y have a bivariate normal distribution, then the conditional expectation of Y given X is linear in X: $E(Y \mid X = x) = a + bx$.
Chi-Squared Distribution
The chi-squared distribution with m degrees of freedom, $\chi^2_m$, is the distribution of the sum of m squared independent standard normal random variables.
Student t Distribution
• The Student t distribution with m degrees of freedom, $t_m$, is the distribution of the ratio of a standard normal random variable Z to the square root of an independently distributed chi-squared random variable W with m degrees of freedom divided by m: $\frac{Z}{\sqrt{W/m}}$.
• When m is greater than 30, the Student t distribution is close to the standard normal.
F Distribution
The F distribution with m and n degrees of freedom, $F_{m,n}$, is the distribution of the ratio of a chi-squared random variable W with m degrees of freedom divided by m, to an independently distributed chi-squared random variable V with n degrees of freedom divided by n: $\frac{W/m}{V/n}$.
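The three constructions above can be imitated by simulation, building each draw from independent standard normals exactly as in the definitions. This is only a sketch: the seed, degrees of freedom, and number of draws are arbitrary choices, and the theoretical means quoted in the comments are standard results rather than something stated in the slides.

```python
import random
from statistics import fmean

rng = random.Random(0)   # fixed seed so the run is reproducible

def chi2_draw(m):
    """Sum of m squared independent standard normal draws."""
    return sum(rng.gauss(0, 1) ** 2 for _ in range(m))

def t_draw(m):
    """Z / sqrt(W / m) with Z standard normal and W an independent chi-squared(m)."""
    z = rng.gauss(0, 1)
    w = chi2_draw(m)
    return z / (w / m) ** 0.5

def f_draw(m, n):
    """(W / m) / (V / n) with independent chi-squared draws W (m df) and V (n df)."""
    return (chi2_draw(m) / m) / (chi2_draw(n) / n)

N = 20000
chi2_mean = fmean(chi2_draw(5) for _ in range(N))   # theory: m = 5
t_mean = fmean(t_draw(10) for _ in range(N))        # theory: 0
f_mean = fmean(f_draw(4, 20) for _ in range(N))     # theory: n / (n - 2) = 20/18
print(chi2_mean, t_mean, f_mean)
```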
5. Random Sampling and Distribution of Sample
Average
Almost all the statistical and econometric procedures used in
this book involve averages or weighted averages of a sample
of data. Therefore understanding the distributions of sample
averages is important.
• Simple random sampling: n objects are selected at
random from a population. Each member of the
population is equally likely to be included in the sample.
• Y1 , ..., Yn can be treated as random variables. (The
randomness comes from sampling. If we are able to
sample again, the values of these variables will be
different.) Before being sampled, they can take many
values; after being sampled, a specific value is taken.
• When Y1 , ..., Yn are drawn from the same distribution and
are independently distributed, they are said to be
independently and identically distributed (i.i.d.).
Sample Average
• Sampling distribution: the distribution of a statistic computed from a random sample.
• The sample average, or sample mean, is a random variable. We want to know the sampling distribution of $\bar{Y}$.
$\bar{Y} = \frac{1}{n}(Y_1 + Y_2 + \dots + Y_n) = \frac{1}{n}\sum_{i=1}^{n} Y_i$
• The sampling distribution of $\bar{Y}$ depends on the distribution of Y. But its mean and variance are given as follows, regardless of the population distribution of Y.
Sample Average
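• Mean of $\bar{Y}$:
$E(\bar{Y}) = \frac{1}{n}\sum_{i=1}^{n} E(Y_i) = \mu_Y = E(Y)$
• Variance of $\bar{Y}$ (under i.i.d. sampling the covariance terms are zero):
$\mathrm{var}(\bar{Y}) = \mathrm{var}\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{var}(Y_i) + \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1, j\neq i}^{n}\mathrm{cov}(Y_i, Y_j) = \frac{\mathrm{var}(Y)}{n}$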
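A quick simulation can make the two results concrete. The sketch below draws many i.i.d. samples from a Uniform(0, 1) population (an arbitrary choice, with mean 1/2 and variance 1/12) and looks at the mean and variance of the resulting sample averages; the sample size, number of replications, and seed are also arbitrary.

```python
import random
from statistics import fmean, pvariance

rng = random.Random(1)
n, reps = 25, 20000   # sample size and number of repeated samples (assumed values)

# Each entry is the sample average of one i.i.d. sample of size n.
sample_means = [fmean(rng.random() for _ in range(n)) for _ in range(reps)]

mean_of_means = fmean(sample_means)      # should be close to mu_Y = 0.5
var_of_means = pvariance(sample_means)   # should be close to var(Y) / n = (1/12) / 25

print(mean_of_means, var_of_means, (1 / 12) / n)
```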
6. Large-Sample Approximations to Sampling
Distributions
Large-Sample Approximations to Sampling Distribution of $\bar{Y}$
• Under i.i.d. sampling, the exact distribution (finite-sample distribution) of $\bar{Y}$ depends on the number of observations n as well as the population distribution of $Y_i$.
• E.g. if $Y_i$ has a normal distribution, then so does $\bar{Y}$. However, if $Y_i$ has another distribution, the finite-sample distribution of $\bar{Y}$ is complicated.
• The asymptotic distribution is the large-sample approximation to the sampling distribution as n goes to infinity.
• Two powerful tools for asymptotic distributions: the Law of Large Numbers and the Central Limit Theorem.
Law of Large Numbers
• If $Y_i$, $i = 1, \dots, n$, are i.i.d. and $\mathrm{var}(Y_i) < \infty$, then
$\bar{Y} \xrightarrow{p} \mu_Y$
When n is very large, $\bar{Y}$ will be near $\mu_Y$ with very high probability.
• What is the difference between the law of large numbers and the property $E(\bar{Y}) = \mu_Y$?
• Convergence in probability.
Central Limit Theorem
• We know the mean and variance of $\bar{Y}$ are $\mu_Y$ and $\sigma_Y^2/n$. What about its distribution?
• When n is large, the distribution of $\bar{Y}$ is approximately $N(\mu_Y, \sigma_Y^2/n)$, whatever the distribution of Y.
$\bar{Y} \xrightarrow{d} N(\mu_Y, \sigma_Y^2/n)$
or (the formula above is not rigorous, because the variance depends on n, so the right-hand side is not a fixed limiting distribution)
$\sqrt{n}(\bar{Y} - \mu_Y) \xrightarrow{d} N(0, \sigma_Y^2)$
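The central limit theorem can be illustrated by simulation with a clearly non-normal population. In this sketch the population is Exponential(1) (mean 1, standard deviation 1); that choice, the sample size, the number of replications, and the seed are all assumptions made for the example.

```python
import random
from math import sqrt
from statistics import fmean

rng = random.Random(42)
n, reps = 200, 5000
mu_Y, sigma_Y = 1.0, 1.0   # mean and sd of the Exponential(1) population

def standardized_mean():
    """sqrt(n) * (Ybar - mu_Y) / sigma_Y for one i.i.d. sample of size n."""
    y_bar = fmean(rng.expovariate(1.0) for _ in range(n))
    return sqrt(n) * (y_bar - mu_Y) / sigma_Y

z_values = [standardized_mean() for _ in range(reps)]

# If the N(0, 1) approximation is good, about 95% of draws fall in (-1.96, 1.96).
share_within_196 = sum(abs(z) < 1.96 for z in z_values) / reps
print(share_within_196)
```

Standardizing by $\sqrt{n}$ is what gives a limit distribution that does not depend on n.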
Review of Statistics
1. Estimation of Population Mean
2. Hypothesis Tests Concerning Population Mean
3. Confidence Intervals for the Population Mean
The key insight of statistics is that one can learn about a population distribution by selecting a random sample from that population.
• Census: surveying the whole population is costly. The 6th China Population Census cost 8 billion yuan.
• Survey: instead, statistical methods allow us to draw statistical inferences about characteristics of the full population based on a random sample.
Three types of statistical methods are frequently used in econometrics:
• Estimation
• Hypothesis testing
• Confidence intervals
We will focus on the population mean as an example. We will also be interested in other features of the population; the statistical methods are similar.
1. Estimation of Population Mean
Estimators
• To estimate the population mean $\mu_Y$ (i.e. the estimand), there are many alternative estimators $\hat{\mu}_Y$. For example, $\bar{Y}$, $Y_1$, or $(Y_1 Y_2 \cdots Y_n)^{1/n}$.
• Estimators are functions of a sample of data to be drawn randomly from a population. Notice that they are themselves random variables.
• An estimate is the numerical value of the estimator when it is actually computed using data from a specific sample. An estimate is a nonrandom number.
Estimators
• Various estimators have different properties. Which is better?
- Viewing an estimator as a random variable, what are the desirable characteristics of its sampling distribution?
- What are the desirable characteristics of equipment in a video game? ATK, DEF, HP, MP...
- What are the desirable characteristics of cosmetics? Moisturizing, antioxidant protection, etc.
• We want an estimate that is "close" to the unknown true value of the estimand.
Properties of Estimators
• Unbiasedness: the mean of the sampling distribution of an estimator equals the true value.
$E(\hat{\mu}_Y) = \mu_Y$
• Consistency: the probability that the estimator is within a small interval of the true value approaches 1 as the sample size increases.
$\hat{\mu}_Y \xrightarrow{p} \mu_Y$
• Efficiency: of two unbiased estimators, the one with the smaller variance is more efficient.
Usually we restrict attention to unbiased estimators when discussing efficiency. Think about $\hat{\mu}_Y = c$, where c is a constant: the variance is 0 but the bias can be large.
Properties of Ȳ (i.i.d. is important!)
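• $E(\bar{Y}) = \mu_Y$: $\bar{Y}$ is unbiased.
• $\hat{\mu}_Y = \bar{Y} \xrightarrow{p} \mu_Y$: $\bar{Y}$ is consistent.
• $\bar{Y}$ is the most efficient among linear unbiased estimators: it is the best linear unbiased estimator (BLUE).
Compare with $Y_1$ and $\tilde{Y} = \frac{1}{n}\left(\frac{1}{2}Y_1 + \frac{3}{2}Y_2 + \dots + \frac{1}{2}Y_{n-1} + \frac{3}{2}Y_n\right)$, for which $\mathrm{var}(\tilde{Y}) = 1.25\sigma_Y^2/n$.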
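The comparison can be made exact without simulation: for i.i.d. observations, a linear estimator $\sum_i w_i Y_i$ has variance $\sigma_Y^2 \sum_i w_i^2$ and is unbiased when the weights sum to one. The sketch below applies this to $\bar{Y}$, $\tilde{Y}$, and $Y_1$; the sample size n = 10 is an arbitrary even number chosen for illustration, and `variance_factor` is a hypothetical helper.

```python
from fractions import Fraction as F

def variance_factor(weights):
    """var(sum_i w_i Y_i) / sigma_Y^2 for i.i.d. Y_i: the sum of squared weights."""
    return sum(w * w for w in weights)

n = 10   # assumed even sample size

w_bar = [F(1, n)] * n                                    # sample mean: equal weights
w_tilde = [F(1, 2) / n if i % 2 == 0 else F(3, 2) / n    # alternating 1/2 and 3/2
           for i in range(n)]
w_first = [F(1)] + [F(0)] * (n - 1)                      # Y_1 alone

for name, w in [("Y_bar", w_bar), ("Y_tilde", w_tilde), ("Y_1", w_first)]:
    # All three sets of weights sum to one, so all three estimators are unbiased.
    print(name, sum(w), variance_factor(w))
```

With n = 10 the variance factors are 1/10, 1/8 (that is, 1.25/n) and 1, so $\bar{Y}$ has the smallest variance of the three.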
2. Hypothesis Tests Concerning Population Mean
Null and Alternative Hypotheses (two-sided)
• Null hypothesis: the population mean $E(Y)$ takes on a specific value $\mu_{Y,0}$.
$H_0: E(Y) = \mu_{Y,0}$
• The alternative hypothesis specifies what is true if the null hypothesis is not.
$H_1: E(Y) \neq \mu_{Y,0}$
Null and Alternative Hypotheses (two-sided)
• Our estimate of $E(Y)$, the value of $\hat{\mu}_Y$, will rarely be exactly $\mu_{Y,0}$. Is the difference due to random sampling, or is the population mean $E(Y)$ not $\mu_{Y,0}$? We want to account for the uncertainty resulting from random sampling.
- I observe that I got fewer pieces of meat than Ryan, Jeremy and Maximo. Is it because I am less handsome, or because of random sampling?
p-value
• $\bar{Y}^{act}$ is the value of the sample average actually computed with the data.
• If the null hypothesis is true, $\bar{Y}$ has the approximate distribution $N(\mu_{Y,0}, \sigma_Y^2/n)$ when n is large.
• What is the probability of drawing a statistic at least as "extreme" as the value computed with the data at hand, given that the null hypothesis is true?
• If this probability is high, the data are consistent with the null hypothesis; if it is low, the data would be unlikely under the null hypothesis, which is evidence against it.
p-value
• Example: you try to borrow money from me. I tell you that I have no money (null hypothesis). But you observe that I am using an iPhone 13 Pro 256 GB (this is the value of the statistic actually computed from the data). If I indeed had no money (if the null hypothesis is true), the probability that I could afford an iPhone 13 Pro 256 GB (or an iPhone 14 Pro 512 GB) is very low. Therefore the null hypothesis is rejected.
(What if I am using a Xiaomi? Is the p-value large? Do you trust the null hypothesis?)
p-value
• Mathematically, the p-value is:
$p\text{-value} = \Pr_{H_0}\left[\, |\bar{Y} - \mu_{Y,0}| > |\bar{Y}^{act} - \mu_{Y,0}| \,\right]$
• To compute the p-value, it is necessary to know the sampling distribution of $\bar{Y}$ under the null hypothesis (if I had no money, what are the probabilities that I use different types of phones?). Again, the exact sampling distribution can be difficult to obtain, but it can be approximated when n is large, according to the central limit theorem.
Calculating p-Value
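• When $\sigma_Y$ is known (with $\sigma_{\bar{Y}} = \sigma_Y/\sqrt{n}$):
$p\text{-value} = \Pr_{H_0}\left[\, |\bar{Y} - \mu_{Y,0}| > |\bar{Y}^{act} - \mu_{Y,0}| \,\right] = \Pr_{H_0}\left[ \left| \frac{\bar{Y} - \mu_{Y,0}}{\sigma_{\bar{Y}}} \right| > \left| \frac{\bar{Y}^{act} - \mu_{Y,0}}{\sigma_{\bar{Y}}} \right| \right] = 2\Phi\left( -\left| \frac{\bar{Y}^{act} - \mu_{Y,0}}{\sigma_{\bar{Y}}} \right| \right)$
Pay attention: which is the random variable and which is the number?
• Unfortunately, $\sigma_Y$ is usually unknown.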
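The known-$\sigma_Y$ formula can be computed with the standard normal CDF from Python's standard library. The function name and the numbers in the example call are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def p_value_known_sigma(y_bar_act, mu_0, sigma_Y, n):
    """Two-sided p-value 2 * Phi(-|z_act|), using the large-sample normal approximation."""
    sigma_y_bar = sigma_Y / sqrt(n)            # standard deviation of the sample mean
    z_act = (y_bar_act - mu_0) / sigma_y_bar   # a number, computed from the data
    return 2 * NormalDist().cdf(-abs(z_act))

# Hypothetical numbers: sample mean 10.5, null value 10, sigma_Y = 2, n = 64.
p = p_value_known_sigma(10.5, 10.0, 2.0, 64)
print(p)   # z_act = 2, so p = 2 * Phi(-2)
```

Here `y_bar_act` is the number; the random variable $\bar{Y}$ only enters through its distribution under the null hypothesis.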
t-Statistic
• The t-statistic is the standardized sample average. It plays a central role in hypothesis testing in econometrics.
$t = \frac{\bar{Y} - \mu_{Y,0}}{SE(\bar{Y})}$
• Under the null hypothesis, t is approximately distributed $N(0, 1)$ for large n.
Hypothesis Testing with a Prespecified Significance
Level
• Reject the null hypothesis if the absolute value of the t-statistic computed from the sample is greater than the critical value associated with the prespecified significance level.
- Critical value: if I am poor, the most expensive phone that I can afford.
• Type I error: the null hypothesis is rejected when in fact it is true.
- False positive.
• Significance level of the test: the prespecified probability of a type I error.
Sample Variance
• Distinguish the variance of the sample mean (the one we have seen before) from the sample variance.
• Sample variance $s_Y^2$:
$s_Y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(Y_i - \bar{Y})^2$
• This is similar to the population variance. However, the sample mean $\bar{Y}$ is subtracted instead of the population mean $\mu_Y$, and the sum is divided by $n - 1$ instead of n.
• $s_Y^2$ is unbiased because of the factor $\frac{1}{n-1}$ (PROVE). An example where the rule of analogy does not always produce an estimator with desirable properties.
• $s_Y^2$ is a consistent estimator of the population variance.
Standard Error
• The standard error of $\bar{Y}$ is an estimator of the standard deviation of $\bar{Y}$. Notice that $\bar{Y} \xrightarrow{d} N(\mu_Y, \sigma_Y^2/n)$ and $\sigma_{\bar{Y}}$ is estimated.
$SE(\bar{Y}) = \hat{\sigma}_{\bar{Y}} = \frac{s_Y}{\sqrt{n}}$
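Putting the sample variance, the standard error, and the t-statistic together, a test with unknown $\sigma_Y$ might be sketched as follows. The data are made up, the function names are hypothetical, and with only 8 observations the normal approximation used for the p-value is crude; the small sample is just to keep the arithmetic easy to follow.

```python
from math import sqrt
from statistics import NormalDist, fmean

def sample_variance(ys):
    """s_Y^2: squared deviations from the sample mean, divided by n - 1."""
    y_bar = fmean(ys)
    return sum((y - y_bar) ** 2 for y in ys) / (len(ys) - 1)

def t_test(ys, mu_0):
    """Return (t, SE, p): t-statistic, standard error, large-sample two-sided p-value."""
    n = len(ys)
    se = sqrt(sample_variance(ys) / n)       # SE(Ybar) = s_Y / sqrt(n)
    t = (fmean(ys) - mu_0) / se
    p = 2 * NormalDist().cdf(-abs(t))        # uses t approx. N(0, 1) for large n
    return t, se, p

ys = [2, 4, 4, 4, 5, 5, 7, 9]                # hypothetical sample, mean 5
t_stat, se, p = t_test(ys, mu_0=3)
print(sample_variance(ys), se, t_stat, p)
reject_at_5_percent = abs(t_stat) > 1.96     # compare |t| with the critical value
```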
3. Confidence Intervals for the Population Mean
Confidence Intervals
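A 95% two-sided confidence interval for $\mu_Y$ is an interval constructed so that it contains the true value of $\mu_Y$ in 95% of all possible random samples. When the sample size n is large, 90%, 95%, and 99% confidence intervals for $\mu_Y$ are:
• 90% confidence interval for $\mu_Y$ = $\{\bar{Y} \pm 1.64\,SE(\bar{Y})\}$,
• 95% confidence interval for $\mu_Y$ = $\{\bar{Y} \pm 1.96\,SE(\bar{Y})\}$,
• 99% confidence interval for $\mu_Y$ = $\{\bar{Y} \pm 2.58\,SE(\bar{Y})\}$.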
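A large-sample confidence interval of this form can be computed as below. The multiplier is taken from the standard normal quantile function (about 1.64 for 90%, 1.96 for 95%, 2.58 for 99%); the function name and the data are hypothetical, and the sample is far too small for the approximation to be reliable in practice.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev

def confidence_interval(ys, level=0.95):
    """Large-sample two-sided interval: Ybar +/- z * SE(Ybar)."""
    n = len(ys)
    se = stdev(ys) / sqrt(n)                           # s_Y / sqrt(n)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)      # e.g. about 1.96 for 95%
    y_bar = fmean(ys)
    return y_bar - z * se, y_bar + z * se

ys = [2, 4, 4, 4, 5, 5, 7, 9]                          # hypothetical sample, mean 5
for level in (0.90, 0.95, 0.99):
    print(level, confidence_interval(ys, level))       # wider as the level rises
```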
