
Lecture 1: Introduction and Review

Zhuzhu Zhou

Xiamen University
Introduction and Review

• Economic Questions and Data
• Review of Probability
• Review of Statistics

1
Economic Questions and Data
Econometrics is the science and art of using economic theory
and statistical techniques to analyze economic data.

2
Economic Questions

• Outsiders laugh at Economics, thinking its conclusions too
straightforward: e.g. higher education leads to higher
salary in the future.
• We can ask such a critic: with 500,000 yuan in cash at hand,
should one pursue graduate education, or invest in stocks?
• We need to compare the returns to education and to stocks:
quantitative answers are required.
• We want to know the answer as well as how precise our
answer is.
• Econometrics, biometrics, anthropometrics,
milkteametrics...
3
Economic Questions

Many economic questions can be described by the relationship
between variables, e.g. education and salary.

4
Economic Questions: Causal Effect

• We want to compare two individuals' salaries while holding
all factors other than education the same.
• Ideally, we would find two identical individuals (twins?)
and run a Randomized Controlled Experiment:
one takes more education (Treatment Group) while the
other takes less (Control Group).
• It is helpful to think about the "ideal experiment" when
asking about a causal effect.
• Forecasting may not concern the causal effect: the rooster
and the sunrise.
• The conceptual framework of this course for exploring the
relationship between variables is the Multiple
Regression Model.
5
Data Sources

Experimental Data vs. Observational Data

• Economists usually cannot intervene in the values of
variables through experiments. Running experiments in
Economics is often intractable: expensive (Liang Jianzhang,
J-PAL) or unethical (Stanford Prison Experiment).
• Many economists therefore try to estimate causal effects
from non-random observational data using Econometrics.

6
Data Types

Economists are like workers, and data are their inputs. As the
Chinese proverb goes, even the cleverest cook cannot prepare a
meal without rice. Different types of data allow different
findings. E.g. education and earnings: comparing two individuals
vs. one individual before and after receiving higher education;
drinking milk tea and getting fat.

• Cross-sectional Data: $\{x_i, y_i\}$
• Time Series Data: $\{x_t, y_t\}$
• Panel Data: $\{x_{it}, y_{it}\}$

Examples: CHARLS, Bureau of Statistics, Statistical Yearbook
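As a rough illustration (the numbers are made up; not from the slides), the three data shapes differ only in how observations are indexed:

```python
# Cross-sectional: one observation per unit i, e.g. (education years, wage)
cross_section = {"i1": (12, 3000), "i2": (16, 4500), "i3": (19, 6000)}

# Time series: one observation per period t for a single unit
time_series = {2019: (12, 3000), 2020: (12, 3200), 2021: (16, 4100)}

# Panel: units observed over several periods -> indexed by the pair (i, t)
panel = {("i1", 2020): (12, 3000), ("i1", 2021): (16, 4100),
         ("i2", 2020): (16, 4500), ("i2", 2021): (16, 4700)}

n_units = len({i for (i, t) in panel})    # distinct units in the panel
n_periods = len({t for (i, t) in panel})  # distinct periods in the panel
```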

7
Data Types

It is always helpful to bear in mind what sort of variation in
the data helps us identify the effect (reach our conclusions).
Different types of data may yield different findings for the
same question. E.g. career training and earnings, or the effect
of SK2.

• Compare two individuals with/without training
(cross-sectional).
• Compare one individual before and after receiving training
(time series).

8
Review of Probability
Review of Probability

1. Random Variables and Probability Distributions
2. Expected Values, Mean, and Variance
3. Two Random Variables
4. Normal, Chi-Squared, Student t, and F Distributions
5. Random Sampling and Distribution of the Sample Average
6. Large-Sample Approximations to Sampling Distributions

9
1. Random Variables and Probability Distributions

10
Probabilities and Outcomes

• The mutually exclusive potential results of a random
process are called the outcomes.
- Alipay lottery: normal people, mascot, super mascot.
- Pieces of meat you get from the service lady at the
dining hall: 1, 2, 3, 4.
• The probability of an outcome is the proportion of the
time that the outcome occurs in the long run.
• The set of all possible outcomes is called the sample
space.
• An event is a subset of the sample space.

11
Random Variables

• A random variable is a numerical summary of a random
outcome.
• A discrete random variable takes on a discrete set of
values. (e.g. the result of a university application or the
graduate entrance exam; the number of pieces of pork you get
from the dining hall; the number of students sleeping in class
currently)
• A continuous random variable takes on a continuum of
possible values. (e.g. the Shanghai Composite Index; the
grams of rice you get)
• Usually we use upper case for the random variable X, and
lower case for the value taken by the random variable, x.
12
Probability Distribution of a Discrete Random Variable

• The probability distribution of a discrete random variable is
the list of all possible values of the variable and the
probability that each value will occur.
• Probabilities of events.
• The cumulative probability distribution (i.e. cumulative
distribution function, CDF) is the probability that the
random variable is less than or equal to a particular value.
• The Bernoulli distribution and Bernoulli random variable.
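The "long-run proportion" definition of probability and the Bernoulli CDF can be sketched numerically (an illustrative aside, not from the slides; the value p = 0.3 is made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# Bernoulli(p): Y = 1 with probability p, 0 with probability 1 - p
p = 0.3
draws = rng.random(100_000) < p  # boolean draws, True with probability p

# the long-run proportion of 1s approximates the probability of that outcome
freq = draws.mean()

def bernoulli_cdf(y, p):
    """CDF of a Bernoulli(p): Pr(Y <= y)."""
    if y < 0:
        return 0.0
    if y < 1:
        return 1.0 - p  # only the outcome 0 lies at or below y
    return 1.0
```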

13
Probability Distribution of a Continuous Random
Variable

• The cumulative probability distribution is the probability that
the random variable is less than or equal to a particular
value.
• Probability density function (PDF):
the area between any two points is the probability that
the random variable falls between those two points.
The probability of taking any single exact value is zero.

14
2. Expected Values, Mean, and Variance

15
Expected Value

The expected value of a random variable, $E(Y)$, is the long-run
average value of the random variable over many repeated trials
or occurrences.

Discrete case: $E(Y) = y_1 p_1 + y_2 p_2 + ... + y_k p_k = \sum_{i=1}^{k} y_i p_i$

Continuous case: $E(Y) = \int y f_Y(y)\, dy$
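A minimal numerical check of both formulas (the pork probabilities are invented for illustration; the continuous case uses a standard normal density on a grid):

```python
import numpy as np

# discrete case: pieces of pork, with made-up probabilities
values = np.array([1, 2, 3, 4])
probs = np.array([0.1, 0.4, 0.4, 0.1])  # sums to 1
E_Y = (values * probs).sum()            # weighted average = 2.5

# continuous case: Riemann-sum approximation of  E(Y) = ∫ y f(y) dy
# for the standard normal, whose mean is 0
y = np.linspace(-8, 8, 100_001)
f = np.exp(-y**2 / 2) / np.sqrt(2 * np.pi)
dy = y[1] - y[0]
E_cont = np.sum(y * f) * dy             # ≈ 0
```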

16
Expected Value

• It is computed as a weighted average of the possible
outcomes of that random variable, where the weights are
the probabilities of those outcomes.
• The expected value of Y is also called the expectation of
Y or the mean of Y ($\mu_Y$).
E.g. pieces of pork from the dining hall.

17
Variance and Standard Deviation

The variance and standard deviation measure the dispersion or
the "spread" of a probability distribution.
If we want to measure the "spread", what should we do? We
may want to calculate how far each value is away from the
mean. To get rid of the sign, we may want to square the
difference.

Discrete case: $\sigma^2 = E[(Y - \mu_Y)^2] = \sum_{i=1}^{k} (y_i - \mu_Y)^2 p_i$

Continuous case: $\sigma^2 = E[(Y - \mu_Y)^2] = \int (y - \mu_Y)^2 f_Y(y)\, dy$

18
Variance and Standard Deviation

• The variance of a random variable Y, $\mathrm{var}(Y)$, is the
expected value of the square of the deviation of Y from its
mean.

$\sigma_Y^2 = \mathrm{var}(Y) = E[(Y - \mu_Y)^2]$

• It punishes extremes heavily.
• The standard deviation is the square root of the variance, so
that it has the same unit as the random variable.

19
Mean and Variance of a Linear Function of a Random
Variable

Y = a + bX

• Mean: µY = a + bµX
• Variance: σY2 = b 2 σX2

Prove it. Notice that the variance of a constant is 0.
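The two results can be checked by simulation (a sketch with made-up values a = 2, b = −3, µX = 5, σX = 2):

```python
import numpy as np

rng = np.random.default_rng(1)
a, b = 2.0, -3.0
X = rng.normal(loc=5.0, scale=2.0, size=200_000)  # mu_X = 5, sigma_X = 2
Y = a + b * X

# theory: mu_Y = a + b*mu_X = 2 - 15 = -13
#         sigma_Y^2 = b^2 * sigma_X^2 = 9 * 4 = 36
# the sample mean and variance of Y should be close to these values
```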

20
Other Measures of the Shape of a Distribution

• Mean and variance are two important features of a
distribution (like the weight and height of a person's
appearance).
Appearance: how an individual looks;
Distribution: how a random variable looks.
• Skewness describes how much a distribution deviates
from symmetry. Positively skewed = right-skewed = skewed to
the right.

$\mathrm{Skewness} = \frac{E[(Y - \mu_Y)^3]}{\sigma_Y^3}$

21
Other Measures of the Shape of a Distribution

• The kurtosis of a distribution is a measure of how much
mass is in its tails.
How many outliers (extreme values)? (FIGURE)

$\mathrm{Kurtosis} = \frac{E[(Y - \mu_Y)^4]}{\sigma_Y^4}$

• Mean, variance, skewness and kurtosis are moments of
a distribution.
They capture parts of the information of a distribution.
The r-th moment. For an interview, it is better to see the real
person, though sometimes it is costly.
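Both shape measures can be computed directly from their definitions (a sketch; the normal and exponential samples are illustrative):

```python
import numpy as np

def skewness(y):
    # third standardized moment: E[(Y - mu)^3] / sigma^3
    mu, sigma = y.mean(), y.std()
    return ((y - mu) ** 3).mean() / sigma ** 3

def kurtosis(y):
    # fourth standardized moment: E[(Y - mu)^4] / sigma^4 (normal -> 3)
    mu, sigma = y.mean(), y.std()
    return ((y - mu) ** 4).mean() / sigma ** 4

rng = np.random.default_rng(2)
z = rng.normal(size=500_000)       # symmetric: skewness ~ 0, kurtosis ~ 3
e = rng.exponential(size=500_000)  # right-skewed: skewness ~ 2
```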

22
3. Two Random Variables

23
Joint Distributions

• The joint probability distribution of two random
variables gives the probability that the random
variables simultaneously take on certain values.

Discrete case: $\Pr(X = x, Y = y)$
Continuous case: $f_{XY}(x, y)$

24
Marginal Distributions

• The marginal probability distribution of a random variable
Y describes the distribution of Y alone.

Discrete case: $\Pr(Y = y) = \sum_{i=1}^{l} \Pr(X = x_i, Y = y)$

Continuous case: $f_Y(y) = \int f_{XY}(x, y)\, dx$

25
Joint and Marginal Distributions

26
Conditional Distributions

The distribution of a random variable Y conditional on another
random variable X taking on a specific value is called the
conditional distribution of Y given X. IT IS HELPFUL TO
RETHINK: WHAT IS Pr? A NUMBER? OR A FUNCTION? A
FUNCTION OF X, OR OF X AND Y?

Discrete case: $\Pr(Y = y \mid X = x) = \dfrac{\Pr(X = x, Y = y)}{\Pr(X = x)}$

Continuous case: $f_{Y|X}(y \mid x) = \dfrac{f_{XY}(x, y)}{f_X(x)}$

27
Conditional Distributions

The conditional expectation of Y given X, i.e. the conditional
mean of Y given X, is the mean of the conditional distribution
of Y given X.

Discrete case: $E(Y \mid X = x) = \sum_{i=1}^{k} y_i \Pr(Y = y_i \mid X = x)$.
Where is the difference from the unconditional expectation?

Continuous case: $E(Y \mid X = x) = \int y f_{Y|X}(y \mid x)\, dy$.
A function of what?

28
Conditional Distributions

LAW OF ITERATED EXPECTATIONS (example:
Y: weight; X: height)

$E(Y) = E(E(Y \mid X)) = \int \left( \int y f_{Y|X}(y \mid x)\, dy \right) f_X(x)\, dx$

$E(Y \mid Z) = E(E(Y \mid X, Z) \mid Z)$

Example: $E(Y \mid X)$, where Y is the weight and X = 1 for
drinking bubble milk tea, 0 otherwise.

Conditional variance: similar.
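The law of iterated expectations can be verified on simulated data (the bubble-milk-tea numbers are invented; for sample means the identity in fact holds exactly):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400_000
# X = 1 for drinking bubble milk tea, 0 otherwise (made-up probabilities)
X = (rng.random(n) < 0.4).astype(float)
# weight Y depends on X: heavier on average when X = 1
Y = 55 + 10 * X + rng.normal(scale=5, size=n)

# E(Y | X = x) estimated by group means, weighted by Pr(X = x):
# E(E(Y|X)) = E(Y|X=0) Pr(X=0) + E(Y|X=1) Pr(X=1), which should equal E(Y)
cond_mean = {x: Y[X == x].mean() for x in (0.0, 1.0)}
p1 = X.mean()
lie = cond_mean[0.0] * (1 - p1) + cond_mean[1.0] * p1
```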


29
Independence, Covariance and Correlation

• Independent: knowing the value of one of the variables
provides no information about the other.

$f_{X,Y} = f_X f_Y$. Notice that $\dfrac{f_{X,Y}}{f_X} = f_{Y|X} = f_Y$.

• Correlation:

$\mathrm{corr}(X, Y) = \dfrac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y}$

X and Y are uncorrelated if $\mathrm{cov}(X, Y) = 0$. X and Y
being uncorrelated only suggests there is no "linear"
relationship between them. EXAMPLE.
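A standard version of the EXAMPLE above is Y = X² with X symmetric around 0: the covariance is (approximately) zero although Y is a deterministic function of X (a sketch):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.uniform(-1, 1, size=500_000)
Y = X ** 2                      # Y is completely determined by X

cov = np.cov(X, Y)[0, 1]        # ~ 0: no *linear* relationship
corr = np.corrcoef(X, Y)[0, 1]  # ~ 0 as well, yet X and Y are dependent
```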

30
Independence, Covariance and Correlation

• Mean independence: if $E(Y \mid X) = E(Y)$, then $\mathrm{cov}(X, Y) = 0$.
Proof: $E(XY) = E(E(XY \mid X)) = E(X\,E(Y \mid X)) = E(X\,E(Y)) = E(X)E(Y)$.
• Relationship between independence, mean independence
and uncorrelatedness.

31
Means, Variances, and Covariances of Sums of Random Variables

$E(a + bX + cY) = a + bE(X) + cE(Y)$
$\mathrm{var}(a + bY) = b^2 \mathrm{var}(Y)$
$\mathrm{var}(aX + bY) = a^2 \mathrm{var}(X) + b^2 \mathrm{var}(Y) + 2ab\,\mathrm{cov}(X, Y)$
$\mathrm{cov}(a + bX + cZ, Y) = b\,\mathrm{cov}(X, Y) + c\,\mathrm{cov}(Z, Y)$

32
4. The Normal, Chi-Squared, Student t, and F
Distributions

33
The following distributions are used when testing certain types
of hypotheses in Econometrics.

• Normal Distribution
• Chi-Squared
• Student t
• F Distribution

34
Normal Distribution

• Univariate normal distribution: $N(0, 1)$; $N(\mu, \sigma^2)$; p.d.f. $\varphi(x)$;
c.d.f. $\Phi(x)$; skewness is 0; kurtosis is 3.

$f_X(x) = \dfrac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\dfrac{1}{2}\left(\dfrac{x - \mu}{\sigma}\right)^2\right)$

• Bivariate normal distribution:

$g_{X,Y}(x, y) = \dfrac{1}{2\pi \sigma_X \sigma_Y \sqrt{1 - \rho_{XY}^2}} \exp\left(-\dfrac{1}{2(1 - \rho_{XY}^2)}\left[\left(\dfrac{x - \mu_X}{\sigma_X}\right)^2 + \left(\dfrac{y - \mu_Y}{\sigma_Y}\right)^2 - 2\rho_{XY}\left(\dfrac{x - \mu_X}{\sigma_X}\right)\left(\dfrac{y - \mu_Y}{\sigma_Y}\right)\right]\right)$

35
Normal Distribution

• If X and Y follow a bivariate normal distribution with
covariance $\sigma_{XY}$, then $aX + bY$ has the distribution
$N(a\mu_X + b\mu_Y,\; a^2\sigma_X^2 + b^2\sigma_Y^2 + 2ab\sigma_{XY})$. Moreover, if n
random variables have a multivariate normal distribution,
then any linear combination of these variables is normally
distributed.
• If a set of variables has a multivariate normal distribution,
then the marginal distribution of each of the variables is
normal.

36
Normal Distribution

• If variables with a multivariate normal distribution have
covariances of 0, then the variables are independent.
• If X and Y have a bivariate normal distribution, then the
conditional expectation of Y given X is linear in X:
$E(Y \mid X = x) = a + bx$.

37
Chi-Squared Distribution

The chi-squared distribution with m degrees of freedom, $\chi^2_m$, is
the distribution of the sum of m squared independent standard
normal random variables.

38
Student t Distribution

• The Student t distribution with m degrees of freedom, $t_m$,
is the distribution of the ratio of a standard normal
random variable to the square root of an independently
distributed chi-squared random variable with m degrees of
freedom divided by m: $\dfrac{Z}{\sqrt{W/m}}$
• When m is greater than about 30, the Student t distribution
is close to the standard normal.

39
F Distribution

The F distribution with m and n degrees of freedom, $F_{m,n}$, is
the distribution of the ratio of a chi-squared random variable
with m degrees of freedom divided by m, to an independently
distributed chi-squared random variable with n degrees of
freedom divided by n: $\dfrac{W/m}{V/n}$

40
5. Random Sampling and Distribution of Sample
Average

41
Almost all the statistical and econometric procedures used in
this book involve averages or weighted averages of a sample
of data. Therefore understanding the distributions of sample
averages is important.

42
• Simple random sampling: n objects are selected at
random from a population. Each member of the
population is equally likely to be included in the sample.
• Y1 , ..., Yn can be treated as random variables. (The
randomness comes from sampling. If we are able to
sample again, the values of these variables will be
different.) Before being sampled, they can take many
values; after being sampled, a specific value is taken.
• When Y1 , ..., Yn are drawn from the same distribution and
are independently distributed, they are said to be
independently and identically distributed (i.i.d.).

43
Sample Average

• Sampling distribution: the distribution of a statistic
computed from a random sample.
• The sample average, or sample mean, is a random variable.
We want to know the sampling distribution of $\bar{Y}$.

$\bar{Y} = \dfrac{1}{n}(Y_1 + Y_2 + ... + Y_n) = \dfrac{1}{n}\sum_{i=1}^{n} Y_i$

• The sampling distribution of $\bar{Y}$ depends on the
distribution of Y. But the mean and variance are given as
follows, regardless of the population distribution of Y.

44
Sample Average

• Mean of $\bar{Y}$:

$E(\bar{Y}) = \dfrac{1}{n}\sum_{i=1}^{n} E(Y_i) = \mu_Y = E(Y)$

• Variance of $\bar{Y}$ (the covariance terms are all zero under
i.i.d. sampling):

$\mathrm{var}(\bar{Y}) = \mathrm{var}\left(\dfrac{1}{n}\sum_{i=1}^{n} Y_i\right) = \dfrac{1}{n^2}\sum_{i=1}^{n} \mathrm{var}(Y_i) + \dfrac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1, j\neq i}^{n} \mathrm{cov}(Y_i, Y_j) = \dfrac{\mathrm{var}(Y)}{n}$
45
6. Large-Sample Approximations to Sampling
Distributions

46
Large-Sample Approximations to the Sampling Distribution of Ȳ

• Under i.i.d. sampling, the exact distribution (finite-sample
distribution) of $\bar{Y}$ depends on the number of
observations n as well as the population distribution of $Y_i$.
• E.g. if $Y_i$ has a normal distribution, then so does $\bar{Y}$.
However, if $Y_i$ has another distribution, the finite-sample
distribution of $\bar{Y}$ is complicated.
• The asymptotic distribution is the large-sample approximation
to the sampling distribution as n goes to infinity.
• The two powerful tools for asymptotic distributions: the Law
of Large Numbers and the Central Limit Theorem.

47
Law of Large Numbers

• If $Y_i$, i = 1, ..., n are i.i.d. and $\mathrm{var}(Y_i) < \infty$, then

$\bar{Y} \xrightarrow{p} \mu_Y$

When n is very large, $\bar{Y}$ will be near $\mu_Y$ with very high
probability.
• What's the difference between the law of large numbers and
the property $E(\bar{Y}) = \mu_Y$?
• Convergence in probability.
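A quick simulation of the LLN (illustrative; the exponential population with µY = 2 is made up):

```python
import numpy as np

rng = np.random.default_rng(5)
mu = 2.0  # population mean of an exponential with scale 2

# sample averages at increasing sample sizes: Ybar concentrates around mu
means = {n: rng.exponential(scale=mu, size=n).mean()
         for n in (100, 10_000, 1_000_000)}
```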

48
Central Limit Theorem

• We know the mean and variance of $\bar{Y}$ are $\mu_Y$ and $\sigma_Y^2/n$.
How about the distribution?
• When n is large, the distribution of $\bar{Y}$ is approximately
$N(\mu_Y, \sigma_Y^2/n)$, whatever the distribution of Y:

$\bar{Y} \xrightarrow{d} N(\mu_Y, \sigma_Y^2/n)$

or, more precisely (the formula above is not strict, since the
variance depends on n and so is not a fixed distribution):

$\sqrt{n}(\bar{Y} - \mu_Y) \xrightarrow{d} N(0, \sigma_Y^2)$
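A sketch of the CLT at work: averages of exponential draws (a decidedly non-normal population) are standardized and compared with N(0, 1). The parameters are made up:

```python
import numpy as np

rng = np.random.default_rng(6)
mu, n, reps = 2.0, 500, 20_000
# many samples of n exponential draws (mean 2, standard deviation 2)
ybars = rng.exponential(scale=mu, size=(reps, n)).mean(axis=1)

# standardize: sqrt(n)(Ybar - mu)/sigma should be approximately N(0, 1)
z = np.sqrt(n) * (ybars - mu) / mu  # sigma = mu for the exponential

# for a standard normal, ~95% of draws fall inside (-1.96, 1.96)
frac_within_196 = np.mean(np.abs(z) < 1.96)
```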

49
Review of Statistics
Review of Statistics

1. Estimation of the Population Mean
2. Hypothesis Tests Concerning the Population Mean
3. Confidence Intervals for the Population Mean

50
The key insight of statistics is that one can learn about a
population distribution by selecting a random sample from
that population.

• Census: implementing a census of the whole population
is costly. The 6th China Population Census cost 8 billion
yuan.
• Survey: instead, statistical methods allow us to draw
statistical inferences about characteristics of the full
population based on a random sample.

51
Three types of statistical methods are frequently used in
Econometrics.

• Estimation
• Hypothesis Testing
• Confidence Intervals

We will focus on the population mean as an example. Later we
will be interested in other statistics of the population; the
statistical methods are similar.

52
1. Estimation of Population Mean

53
Estimators

• To estimate the population mean $\mu_Y$ (i.e. the
estimand), there are many alternative estimators $\hat{\mu}_Y$.
For example, $\bar{Y}$, $Y_1$, or $(Y_1 Y_2 \cdots Y_n)^{1/n}$.
• Estimators are functions of a sample of data to be
drawn randomly from a population.
Notice that they per se are random variables.
• An estimate is the numerical value of the estimator
when it is actually computed using data from a specific
sample. An estimate is a nonrandom number.

54
Estimators

• Various estimators have different properties. Which is
better?
- From the random-variable perspective, what are the
desirable characteristics of the sampling distribution of an
estimator?
- What are the desirable characteristics of equipment
in a video game? ATK, DEF, HP, MP...
- What are the desirable characteristics of cosmetics?
Keeping moistness, antioxidation, etc.
• We want an estimate "close" to the unknown true
value of the estimand.

55
Properties of Estimators

• Unbiasedness: the mean of the sampling distribution of
an estimator equals the true value.

$E(\hat{\mu}_Y) = \mu_Y$

• Consistency: the probability that the estimator is within
a small interval of the true value approaches 1 as the
sample size increases.

$\hat{\mu}_Y \xrightarrow{p} \mu_Y$

• Efficiency: of two unbiased estimators, the one with the
smaller variance is more efficient.
Usually we restrict attention to unbiased estimators when
discussing efficiency. Think about $\hat{\mu}_Y = c$, where c is a
constant: the variance is 0 but the bias is large. 56
Properties of Ȳ (i.i.d. is important!)

• $E(\bar{Y}) = \mu_Y$: $\bar{Y}$ is unbiased.
• $\bar{Y} \xrightarrow{p} \mu_Y$: $\bar{Y}$ is consistent.
• $\bar{Y}$ is the most efficient among linear unbiased estimators:
the Best Linear Unbiased Estimator (BLUE).
Compare with $Y_1$ and
$\tilde{Y} = \frac{1}{n}\left(\frac{1}{2}Y_1 + \frac{3}{2}Y_2 + ... + \frac{1}{2}Y_{n-1} + \frac{3}{2}Y_n\right)$, for which
$\mathrm{var}(\tilde{Y}) = 1.25\,\sigma_Y^2/n$.
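The three estimators on this slide can be compared by simulation (a sketch; µY = 5, σY = 2, n = 50 are made up):

```python
import numpy as np

rng = np.random.default_rng(7)
mu, sigma, n, reps = 5.0, 2.0, 50, 40_000
Y = rng.normal(mu, sigma, size=(reps, n))

ybar = Y.mean(axis=1)            # the sample mean
y1 = Y[:, 0]                     # first observation only
w = np.tile([0.5, 1.5], n // 2)  # weights 1/2, 3/2, ..., as on the slide
ytil = (Y * w).mean(axis=1)      # the weighted estimator Y-tilde

# all three are unbiased, but across the repetitions
# var(Ybar) = sigma^2/n = 0.08 < var(Ytil) = 1.25 sigma^2/n = 0.10 << var(Y1) = 4
```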

57
2. Hypothesis Tests Concerning Population Mean

58
Null and Alternative Hypotheses (two-sided)

• Null hypothesis: the population mean E(Y) takes on a
specific value $\mu_{Y,0}$:

$H_0: E(Y) = \mu_{Y,0}$

• The alternative hypothesis specifies what is true if the null
hypothesis is not:

$H_1: E(Y) \neq \mu_{Y,0}$

59
Null and Alternative Hypotheses (two-sided)

• Our estimate of E(Y), the value of $\hat{\mu}_Y$, will rarely be
exactly $\mu_{Y,0}$. Is this due to random sampling, or because
the population mean E(Y) is not $\mu_{Y,0}$? We want to account
for the uncertainty resulting from random sampling.
- I observe that I got fewer pieces of meat than Ryan,
Jeremy and Maximo. Is it because I am less handsome, or
because of random sampling?

60
p-value

• $\bar{Y}^{act}$ is the value of the sample average actually computed
from the data.
• If the null hypothesis is true, $\bar{Y}$ has the approximate
distribution $N(\mu_{Y,0}, \sigma_Y^2/n)$ when n is large.
• What is the probability of drawing a statistic more
"extreme" than (or at least as extreme as) the value
computed from the data at hand, given that the null
hypothesis is true?
• If this probability is high, then the null hypothesis is likely
to hold, whereas if it is low, the null hypothesis is unlikely
to be true.

61
p-value

• EXAMPLE: when you try to borrow money from me, I
tell you that I have no money (the null hypothesis). But you
observe that I am using an iPhone 13 Pro 256 GB (this is
the value of the estimator actually computed, the
data). If I indeed had no money (if the null
hypothesis were true), the probability that I could afford an
iPhone 13 Pro 256 GB (or an iPhone 14 Pro 512 GB) would be very
low. Therefore the null hypothesis is rejected.
(If I were using a Xiaomi, would the p-value be large? Would
you trust the null hypothesis?)

62
p-value

• Mathematically, the p-value is:

$\text{p-value} = \Pr_{H_0}\left[\,|\bar{Y} - \mu_{Y,0}| > |\bar{Y}^{act} - \mu_{Y,0}|\,\right]$

• To compute the p-value, it is necessary to know the
sampling distribution of $\bar{Y}$ under the null hypothesis (if I
had no money, what are the probabilities that I use
different types of mobile phones?). Again, the exact sampling
distribution can be difficult, but it can be approximated
when n is large by the central limit theorem.

63
Calculating p-Value

• When $\sigma_Y$ is known:

$\text{p-value} = \Pr_{H_0}\left[\,|\bar{Y} - \mu_{Y,0}| > |\bar{Y}^{act} - \mu_{Y,0}|\,\right] = \Pr_{H_0}\left[\left|\dfrac{\bar{Y} - \mu_{Y,0}}{\sigma_{\bar{Y}}}\right| > \left|\dfrac{\bar{Y}^{act} - \mu_{Y,0}}{\sigma_{\bar{Y}}}\right|\right] = 2\Phi\left(-\left|\dfrac{\bar{Y}^{act} - \mu_{Y,0}}{\sigma_{\bar{Y}}}\right|\right)$

Pay attention: which is the random variable and which is
the number?
• Unfortunately, $\sigma_Y$ is usually unknown.

64
t-Statistic

• The t-statistic is the standardized sample average. It plays a
central role in hypothesis testing in econometrics.

$t = \dfrac{\bar{Y} - \mu_{Y,0}}{SE(\bar{Y})}$

• t is approximately distributed N(0, 1) for large n.
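A minimal implementation of the large-n test (a sketch, not the textbook's code; it uses the normal approximation 2Φ(−|t|) rather than the exact Student t distribution):

```python
import numpy as np
from math import erf, sqrt

def Phi(x):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def t_test_mean(y, mu0):
    """t-statistic and approximate two-sided p-value for H0: E(Y) = mu0."""
    n = len(y)
    se = y.std(ddof=1) / np.sqrt(n)  # SE(Ybar) = s_Y / sqrt(n)
    t = (y.mean() - mu0) / se
    p = 2.0 * Phi(-abs(t))           # large-n normal approximation
    return t, p

rng = np.random.default_rng(8)
y = rng.normal(loc=5.0, scale=2.0, size=1_000)
t0, p0 = t_test_mean(y, 5.0)  # null true: |t| moderate, p not tiny
t1, p1 = t_test_mean(y, 4.0)  # null false: |t| huge, p essentially 0
```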

65
Hypothesis Testing with a Prespecified Significance
Level

• Reject the null hypothesis if the absolute value of the
t-statistic computed from the sample is greater than the
critical value associated with the prespecified significance
level.
- Critical value: if I am poor, the most expensive mobile
phone that I can afford.
• Type I error: the null hypothesis is rejected when in fact
it is true.
- A false positive.
• Significance level of the test: the prespecified probability
of a type I error.
66
Sample Variance

• Distinguish the variance of the sample mean (the one
we have seen before) from the sample variance.
• Sample variance $s_Y^2$:

$s_Y^2 = \dfrac{1}{n-1}\sum_{i=1}^{n} (Y_i - \bar{Y})^2$

• This is similar to the population variance. However, the
sample mean $\bar{Y}$ instead of the population mean $\mu_Y$ is
subtracted, and the sum is divided by n − 1 instead of n.
• $s_Y^2$ is unbiased because of the division by n − 1 (PROVE). An
example that the rule of analogy does not always produce
estimators with desirable properties.
• $s_Y^2$ is a consistent estimator of the population variance.
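The n − 1 vs. n distinction can be seen by simulation (a sketch with σ² = 4 and a deliberately small n = 5):

```python
import numpy as np

rng = np.random.default_rng(9)
sigma2, n, reps = 4.0, 5, 200_000
Y = rng.normal(0.0, 2.0, size=(reps, n))

s2_unbiased = Y.var(axis=1, ddof=1)  # divide by n - 1
s2_biased = Y.var(axis=1, ddof=0)    # divide by n

# averaging over many samples: E[s2 with n-1] = sigma^2 = 4,
# while dividing by n gives sigma^2 (n-1)/n = 3.2 (underestimates)
```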
67
Standard Error

• The standard error of $\bar{Y}$ is an estimator of the standard
deviation of $\bar{Y}$. Notice that $\bar{Y} \xrightarrow{d} N(\mu_Y, \sigma_Y^2/n)$ and $\sigma_{\bar{Y}}$
has to be estimated:

$SE(\bar{Y}) = \hat{\sigma}_{\bar{Y}} = \dfrac{s_Y}{\sqrt{n}}$

68
3. Confidence Intervals for the Population Mean

69
Confidence Intervals

A 95% two-sided confidence interval for $\mu_Y$ is an interval
constructed so that it contains the true value of $\mu_Y$ in 95% of
all possible random samples. When the sample size n is large,
the 90%, 95%, and 99% confidence intervals for $\mu_Y$ are:

• 90% confidence interval for $\mu_Y$: $\{\bar{Y} \pm 1.64\,SE(\bar{Y})\}$
• 95% confidence interval for $\mu_Y$: $\{\bar{Y} \pm 1.96\,SE(\bar{Y})\}$
• 99% confidence interval for $\mu_Y$: $\{\bar{Y} \pm 2.58\,SE(\bar{Y})\}$
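The coverage claim can be checked by simulation (a sketch; the population N(5, 9) is made up):

```python
import numpy as np

rng = np.random.default_rng(10)
mu, reps, n = 5.0, 10_000, 200
Y = rng.normal(mu, 3.0, size=(reps, n))

ybar = Y.mean(axis=1)
se = Y.std(axis=1, ddof=1) / np.sqrt(n)

# 95% CI: Ybar ± 1.96 SE(Ybar); it should cover mu in ~95% of samples
covered = (ybar - 1.96 * se <= mu) & (mu <= ybar + 1.96 * se)
coverage = covered.mean()
```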

70
