Basics
Christophe Croux
[email protected]
Session 1: Basic Statistical Concepts
Possible specifications:
1. Linear form: M = α + βY + γr
Note that ∂M/∂Y = β (marginal effect).
[If Y increases by one unit, then M increases by β units, ceteris paribus.]
II Probability Distributions
In statistics and econometrics we deal with variables whose values
are uncertain. While we cannot say in advance which value such a vari-
able will take, we are often able to attach certain probabilities to these
possible values.
k:   0      1      2      3
pk:  0.125  0.375  0.375  0.125
[Figure: histogram of the number of children, x-axis from 0 to 6.]
II.2 Continuous stochastic variables
We see that F(x) = ∫_{−∞}^{x} f(y) dy and that the
derivative of F equals f.
[Figure: standard normal density f(x), plotted over the range −4 to 4.]
For X a standard normal random variable, we
have E[X] = 0 and SD(X) = 1. Furthermore:
P (−1 ≤ X ≤ 1) ≈ 0.68, P (−2 ≤ X ≤ 2) ≈
0.95, P (−3 ≤ X ≤ 3) ≈ 0.997.
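These probabilities can be checked with a short computation outside EViews. A minimal Python sketch (standard library only): for a standard normal X, P(−k ≤ X ≤ k) equals erf(k/√2).

```python
import math

def std_normal_interval(k: float) -> float:
    """P(-k <= X <= k) for a standard normal X, via the error function."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    # Reproduces the 0.68 / 0.95 / 0.997 rule quoted above.
    print(f"P(-{k} <= X <= {k}) = {std_normal_interval(k):.3f}")
```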
(For example: if IQ is normally distributed with mean 100 and standard deviation 15, about 95% of Belgians have an IQ in the interval [70; 130].)
III Estimators of means and variances
[Figure: histogram of monthly income.]
The sample average equals 1201, with standard error SE = 6.47. Under
normality, we may say that about 95% of the monthly incomes are in
the interval [978; 1427].
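The distinction between the standard deviation (spread of individual incomes) and the standard error (precision of the sample average) can be illustrated with simulated data. The incomes below are hypothetical draws, not the course data set:

```python
import math
import random

random.seed(1)
# Hypothetical sample of monthly incomes (mean 1200, SD 110 are assumptions).
incomes = [random.gauss(1200, 110) for _ in range(300)]

n = len(incomes)
xbar = sum(incomes) / n
s = math.sqrt(sum((x - xbar) ** 2 for x in incomes) / (n - 1))  # sample SD
se = s / math.sqrt(n)                                           # standard error of the mean

print(f"mean = {xbar:.1f}, SD = {s:.1f}, SE = {se:.2f}")
# ~95% of individual incomes lie in mean +/- 2*SD;
# mean +/- 2*SE is instead a confidence interval for the mean itself.
print(f"~95% of incomes:  [{xbar - 2*s:.0f}; {xbar + 2*s:.0f}]")
print(f"~95% CI for mean: [{xbar - 2*se:.0f}; {xbar + 2*se:.0f}]")
```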
The joint distribution of (X, Y ) is given by the
probabilities of the form
P (X = k, Y = l).
The conditional distribution of Y given X = k
is given by probabilities of the form
P(Y = l | X = k) = P(Y = l and X = k) / P(X = k),
for each possible outcome k of X.
The marginal distributions of X and Y are nothing
else but the distributions of X and Y separately.
Property
• E[X+Y]=E[X]+E[Y]
• If X and Y are statistically independent, then
Var(X + Y ) = Var(X) + Var(Y )
Marginal distribution of Y :
P (Y = 0) = 0.5 and P (Y = 1) = 0.5.
Conditional distribution of Y given X = 1:
P (Y = 1|X = 1) = 0.5 and P (Y = 0|X = 1) = 0.5.
Conditional distribution of Y given X = 0:
P (Y = 1|X = 0) = 0.5 and P (Y = 0|X = 0) = 0.5.
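These marginal and conditional probabilities follow mechanically from the joint distribution. A small Python sketch, using the joint distribution of two independent fair coin flips (which reproduces the probabilities above):

```python
# Joint distribution P(X = k, Y = l) for two independent fair coin flips.
joint = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}

def marginal_Y(l):
    """P(Y = l): sum the joint probabilities over all outcomes of X."""
    return sum(p for (x, y), p in joint.items() if y == l)

def conditional_Y_given_X(l, k):
    """P(Y = l | X = k) = P(Y = l and X = k) / P(X = k)."""
    p_x = sum(p for (x, y), p in joint.items() if x == k)
    return joint[(k, l)] / p_x

print(marginal_Y(1))                # P(Y = 1) = 0.5
print(conditional_Y_given_X(1, 0))  # P(Y = 1 | X = 0) = 0.5
```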
V Some Exercises
1. Below we see a graphical representation of the total number of
bankruptcies in Belgium over the last 4 months.
[Figure: line graph of the number of bankruptcies, ranging between 1510 and 1560.]
[Figure: histograms of the four data sets x1, x2, x3, and x4.]
As an alternative to histograms, kernel density estimates can be
computed. These can be considered as a kind of smoothed
histogram. Compare these kernel density estimates with the
previous histograms.
[Figure: kernel density estimates of x1, x2, x3, and x4.]
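The idea behind a kernel density estimate — place a smooth bump at each observation and average the bumps — can be sketched in a few lines. The data and the bandwidth below are hypothetical choices for illustration:

```python
import math

def gaussian_kde(data, x, bandwidth):
    """Kernel density estimate at x: average of Gaussian bumps centred at the data."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data)

data = [0.9, 1.1, 1.0, 2.8, 3.1, 3.0]   # hypothetical sample with two clusters
for x in (1.0, 2.0, 3.0):
    # The estimate is high near the clusters and low in the gap between them.
    print(f"density at {x}: {gaussian_kde(data, x, bandwidth=0.4):.3f}")
```

A smaller bandwidth gives a rougher curve (closer to the raw data), a larger one gives a smoother curve, exactly as with the bin width of a histogram.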
Session 2: Econometrics in Practice: introduction to the use of Eviews
3. Open the object “GDP” by clicking on it. Using the View menu,
try out: (1) /graph/line, (2) /descriptive statistics/histogram and
stats/.
Note that
log(Xt) − log(Xt−1) = log(Xt/Xt−1) ≈ (Xt − Xt−1)/Xt−1.]
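This approximation — the log difference is close to the growth rate when changes are small — can be verified directly (the values 100 and 102 are an arbitrary example):

```python
import math

# Growth rate vs. log difference for a 2% increase.
x_prev, x_curr = 100.0, 102.0
growth = (x_curr - x_prev) / x_prev                # exact growth rate: 0.02
log_diff = math.log(x_curr) - math.log(x_prev)     # log difference: ~0.0198

print(growth, log_diff)  # the two agree to about three decimals
```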
6. Select now in the workfile the series M1, GDP, PR, and RS.
By clicking on the right mouse button, you can open them as
a group. Try /View/Graphs/Lines/ and /View/Multiple Graph
/Lines/. Compute the correlation matrix of these variables, by
/View/Correlations. Are these correlations spurious?
[Time series showing trends over time will always be highly cor-
related. The reason is that they are both driven by time. The
high correlation does not imply a causal relationship, it may be
spurious.]
7. Use /Quick/Estimate Equation/ to estimate the equation:
The default choice for the significance level is α = 0.05. This level
gives the type I error, i.e. the probability of rejecting H0 when it
holds. The smaller the choice of α, the more conservative we are
towards H0.
If P-value < 0.05, then the corresponding variable is said to be
significant (for explaining Y ). If P-value < 0.01, then it is highly
significant.
It is often better to interpret the P-value on a continuous scale
(e.g. P=0.049 and P=0.051 are almost identical). The smaller the
P-value, the more evidence in the data against the null hypothesis.
II Basic Principles of Eviews
• Eviews is a Windows-oriented econometric
software package.
• For every new data set, a workfile needs to be
created.
• Workfiles are characterized by a frequency and
a range.
• A workfile contains different objects.
• Objects may be of different types like series,
groups, equations, graphs, ...
• The available toolbars/menus of an object
window depend on the type of the object.
• The same EVIEWS instruction can be given
in several ways.
• It is possible to write programs in EVIEWS.
III Descriptive Univariate Statistics
Given a univariate sample x1, . . . , xn, we can compute
• location measures: mean x̄, median, ...
• spread/dispersion measures: standard deviation σ̂,
range = maximum − minimum, ...
• measure of asymmetry: skewness coefficient
Sk = (1/n) Σ_{i=1}^{n} ((x_i − x̄)/σ̂)³
Positive skewness means a long right tail.
• measure of “heavy tails”: kurtosis coefficient
κ = (1/n) Σ_{i=1}^{n} ((x_i − x̄)/σ̂)⁴
At normal distributions κ ≈ 3. If κ > 3, the
distribution is said to be peaked and heavy
tailed (leptokurtic) w.r.t. the normal. If κ < 3,
it is said to be flat (platykurtic).
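Both coefficients are straightforward to compute by hand. A Python sketch (using the divisor n for σ̂, an assumption, since the slides do not fix the convention); for a large normal sample the skewness should be near 0 and the kurtosis near 3:

```python
import math
import random

def skewness(xs):
    """Sk = (1/n) * sum(((x_i - xbar)/sd)^3)."""
    n = len(xs)
    xbar = sum(xs) / n
    sd = math.sqrt(sum((x - xbar) ** 2 for x in xs) / n)
    return sum(((x - xbar) / sd) ** 3 for x in xs) / n

def kurtosis(xs):
    """kappa = (1/n) * sum(((x_i - xbar)/sd)^4)."""
    n = len(xs)
    xbar = sum(xs) / n
    sd = math.sqrt(sum((x - xbar) ** 2 for x in xs) / n)
    return sum(((x - xbar) / sd) ** 4 for x in xs) / n

random.seed(0)
normal_sample = [random.gauss(0, 1) for _ in range(100_000)]
print(f"skewness ~ {skewness(normal_sample):.2f}")  # close to 0
print(f"kurtosis ~ {kurtosis(normal_sample):.2f}")  # close to 3
```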
The distribution of the data can be pictured by
a histogram or a kernel density plot.
IV Correlation coefficients
Given two stochastic variables X and Y, the
covariance between X and Y is defined as
Cov(X, Y) = E[(X − E(X))(Y − E(Y))].
The correlation between X and Y is defined as
Corr(X, Y) = Cov(X, Y) / √(Var(X) Var(Y)).
We have that
• −1 ≤ Corr(X, Y ) ≤ 1
• |Corr(aX + b, cY + d)| = |Corr(X, Y )|
• Corr(X, Y ) = 1 (respectively =-1) if and only
if there exist a > 0 (resp. a < 0) and b such
that Y = aX + b.
• If Corr(X, Y) = 0 then we say that X and Y
are uncorrelated. If (X, Y) follows a normal
distribution, then uncorrelatedness implies
independence.
From a random sample (X1, Y1), . . . , (Xn, Yn), we can
estimate ρ = Corr(X, Y) by the correlation coefficient
ρ̂ = Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ) / √( Σ_{i=1}^{n} (x_i − x̄)² · Σ_{i=1}^{n} (y_i − ȳ)² ).
The correlation coefficient is used as a measure
of the strength of the linear association between
two variables. It tells us to what extent two
variables “move together”, and has nothing to
say about causal relations.
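The formula above translates directly into code. A Python sketch; note that an exact linear relation Y = aX + b gives correlation ±1, matching the property listed earlier:

```python
import math

def corr(xs, ys):
    """Sample correlation coefficient of paired observations."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    num = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - xbar) ** 2 for x in xs) *
                    sum((y - ybar) ** 2 for y in ys))
    return num / den

x = [1, 2, 3, 4, 5]
print(corr(x, [2 * xi + 1 for xi in x]))   # 1.0: exact increasing linear relation
print(corr(x, [-3 * xi for xi in x]))      # -1.0: exact decreasing linear relation
```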
Exercise
For 6 datasets, visualized by their scatterplots
on the next page, we computed correlation coefficients and obtained:
-0.70 0.01 0.74 0.74 0.95 0.99
Match the correlations with the data sets.
[Figure: scatterplots of the six data sets (y versus x).]
Serial correlation:
If the data are a random sample X1, X2, . . . , Xn,
then Corr(Xi, Xj) = 0 for 1 ≤ i ≠ j ≤ n. But
if the data form a time series, then presence of
autocorrelation or serial correlation may occur.
If Xt is a stationary time series, then this
autocorrelation can be quantified.
Definition: Xt is a stationary series if
1. E[Xt] = µ for all t
2. Var(Xt) = σ² for all t
3. Corr(Xt, Xt−k) = ρk for all t, and for k = 1, 2, . . .
31
The graph of the function k → ρ̂k is called the
correlogram.
A correlogram also indicates critical bounds. If
ρ̂k passes them, then it is significantly different
from zero (H0 : ρk = 0 is rejected at 5%).
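The sample autocorrelations behind a correlogram can be computed directly. A Python sketch on simulated data (the AR(1)-style series with coefficient 0.9 is a hypothetical example of a persistent series), using the usual approximate 5% critical bounds ±1.96/√n:

```python
import math
import random

def acf(xs, k):
    """Sample autocorrelation of the series at lag k."""
    n = len(xs)
    xbar = sum(xs) / n
    num = sum((xs[t] - xbar) * (xs[t - k] - xbar) for t in range(k, n))
    den = sum((x - xbar) ** 2 for x in xs)
    return num / den

random.seed(0)
noise = [random.gauss(0, 1) for _ in range(500)]   # white noise: no serial correlation
persistent = [0.0]
for e in noise:                                     # AR(1)-style persistent series
    persistent.append(0.9 * persistent[-1] + e)
persistent = persistent[1:]

bound = 1.96 / math.sqrt(500)  # approximate 5% critical bounds for the correlogram
print(f"lag-1 ACF, persistent series: {acf(persistent, 1):.2f}")  # well above the bound
print(f"lag-1 ACF, white noise:       {acf(noise, 1):.2f}")       # typically small
```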
[Figure: two time series (series 1 and series 2) plotted over time, each with its correlogram (ACF up to lag 20).]
The correlograms above show that there is much more serial correlation
in the first than in the second series. One says that the first series is
more persistent.
V Exercise
In the file spurious.txt you find annual data from
1971 to 1990 for the variables
X: average disposable income per capita (1980
prices)
Y : average consumption per capita (1980 prices)
Z: number of professional football players
We are interested in the correlations between these
variables. Note that we have here time series
data, not cross-sectional data.
1. Create a new workfile and import the data (/File/Import/Text/)
6. Use Quick/Estimate equation to estimate the coefficients of the
regression equation Y = α + βX + ε. (Type in “Y c X” where c
represents the constant term α).
Session 3: The Linear Regression Model
I The model
Note that
∂E[Y | X1, . . . , Xp] / ∂Xj = βj
for any 1 ≤ j ≤ p → interpretation of βj:
III Properties of OLS
IV Summary Statistics in Regression Analysis
R-squared
The R-squared statistic is the fraction of the variance
of the dependent variable explained by the
independent variables:
R² = Var(Ŷ)/Var(Y) = 1 − Var(residuals)/Var(Y).
It measures the predictive power of the regression
equation.
• R2 = 1 if and only if Yi = Ŷi for all i
• R2 = 0 if and only if Ŷi = Ȳ for all i
We also call R2 the squared multiple correlation
coefficient.
Adjusted R-squared
A problem with using R² as a measure of goodness
of fit is that it never decreases if you add
more regressors. The adjusted R² penalizes for
the addition of regressors which do not contribute
to the explanatory power of the model:
Adjusted R² = 1 − (1 − R²) · (n − 1)/(n − k),
with n the number of observations and k the number of estimated coefficients.
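Both quantities can be computed by hand for a simple regression. A Python sketch on simulated data (the data-generating process y = 1 + 2x + noise is a hypothetical example, not the course data):

```python
import random

def ols_r2(xs, ys):
    """OLS of y on a constant and x; returns (R2, adjusted R2)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    beta = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
           sum((x - xbar) ** 2 for x in xs)
    alpha = ybar - beta * xbar
    resid = [y - (alpha + beta * x) for x, y in zip(xs, ys)]
    ss_res = sum(r ** 2 for r in resid)
    ss_tot = sum((y - ybar) ** 2 for y in ys)
    r2 = 1 - ss_res / ss_tot
    k = 2  # estimated coefficients: intercept + slope
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)
    return r2, adj_r2

random.seed(0)
xs = [random.uniform(0, 10) for _ in range(50)]
ys = [1 + 2 * x + random.gauss(0, 1) for x in xs]
r2, adj = ols_r2(xs, ys)
print(f"R2 = {r2:.3f}, adjusted R2 = {adj:.3f}")  # adjusted R2 is slightly smaller
```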
F-statistic
The F-statistic tests the hypothesis that all of
the slope coefficients (excluding the intercept) in
a regression are zero:
H0 : β1 = . . . = βp = 0.
An accompanying P-value is given by the software.
The F-test is a joint test, keeping the joint
type I error under control. Note that even if all
the t-statistics are insignificant, it is not excluded
that the F-statistic is highly significant.
Durbin-Watson Statistic
The Durbin-Watson (DW) statistic measures the
serial correlation (of order one) in the residuals.
The statistic is computed as
DW = Σ_{t=2}^{n} (r_t − r_{t−1})² / Σ_{t=1}^{n} r_t²
• 0 ≤ DW ≤ 4
• no serial correlation → DW ≈ 2
• DW << 2 → positive autocorrelation
(There are better tests for serial correlation in the
error terms.)
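The DW formula is easy to evaluate on a residual series. A Python sketch on simulated residuals (white noise versus an AR(1)-style series with coefficient 0.8, both hypothetical examples):

```python
import random

def durbin_watson(resid):
    """DW = sum((r_t - r_{t-1})^2) / sum(r_t^2); near 2 means no lag-1 autocorrelation."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(r ** 2 for r in resid)
    return num / den

random.seed(0)
noise = [random.gauss(0, 1) for _ in range(1000)]  # residuals without autocorrelation
positive_ac = [0.0]
for e in noise:                                    # positively autocorrelated residuals
    positive_ac.append(0.8 * positive_ac[-1] + e)
positive_ac = positive_ac[1:]

print(f"DW, white noise residuals:    {durbin_watson(noise):.2f}")        # near 2
print(f"DW, autocorrelated residuals: {durbin_watson(positive_ac):.2f}")  # well below 2
```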
V Residual Plots
[Figure: six residual plots against time.]
VI Using Dummy Variables
Exercise
VII Exercises
3. Ten experts make a prediction for the economic
growth in the EU and ten other experts
in the US for next year:
EU US
2.1 2.6
2.5 2.4
2.3 3.2
1.4 0.8
1.5 1.3
1.5 2.1
2.4 1.6
2.7 3.2
2.8 3.1
1.1 1.4
I Running Example
Demand for food in the USA, yearly data (1963-1992, file: “food.wmf”).
Q: the demand for food in constant prices
X: total expenditure in current prices
P : a price index for food
G: a general price index
Economic Theory suggests Q = f (X, P, G).
1. Make line graphs + some descriptive statistics of the series Q, X,
P and G.
5. Make a graph of the actual and the fitted series log(Q). Make a
residual plot (use /View/Actual, fitted, residuals). Make a QQ-plot
and a correlogram of the residuals. Comment.
II Coefficient tests
If we test for “H0: g(parameters) = 0”, then the
Wald test rejects the null hypothesis if
“g(estimated parameters)” is too far from 0.
IV Residual Tests
Adding lagged dependent (and independent) variables can solve
the problem of serial correlation.
3. Look at how the standard errors of your estimates change when
you use the White estimator for the covariance matrix of the
estimator (use the options when estimating the model equation in
EVIEWS). Same question for Newey/West.