Ch2 Slides
Regression
Regression is probably the single most important tool at the
econometrician's disposal.
But what is regression analysis?
It is concerned with describing and evaluating the relationship
between a given variable (usually called the dependent variable) and
one or more other variables (usually known as the independent
variable(s)).
Some Notation
Denote the dependent variable by y and the independent variable(s) by x1, x2, ... , xk
where there are k independent variables.
Some alternative names for the y and x variables:
y                      x
dependent variable     independent variables
regressand             regressors
effect variable        causal variables
explained variable     explanatory variables
Note that there can be many x variables but we will limit ourselves to the case
where there is only one x variable to start with. In our set-up, there is only one y
variable.
Simple Regression
For simplicity, say k=1. This is the situation where y depends on only one x
variable.
Year, t    Excess return = rXXX,t - rf,t
1          17.8
2          39.0
3          12.8
4          24.2
5          17.2
We have some intuition that the beta on this fund is positive, and we
therefore want to find whether there appears to be a relationship between x
and y given the data that we have. The first stage would be to form a scatter
plot of the two variables.
[Scatter plot of the excess return on fund XXX against the excess return on the market index]
[Diagram: the fitted line through the scatter, showing an actual point $y_i$, its fitted value $\hat{y}_i$, and the residual $\hat{u}_i$ at $x_i$]
So min. $\hat{u}_1^2 + \hat{u}_2^2 + \hat{u}_3^2 + \hat{u}_4^2 + \hat{u}_5^2$, or minimise $\sum_{t=1}^{5} \hat{u}_t^2$. This is known
as the residual sum of squares.
But what was $\hat{u}_t$? It was the difference between the actual point and
the line, $y_t - \hat{y}_t$.
So minimising $\sum_t (y_t - \hat{y}_t)^2$ is equivalent to minimising $\sum_t \hat{u}_t^2$
with respect to $\hat{\alpha}$ and $\hat{\beta}$.

But $\hat{y}_t = \hat{\alpha} + \hat{\beta} x_t$, so let

$L = \sum_t (y_t - \hat{y}_t)^2 = \sum_t (y_t - \hat{\alpha} - \hat{\beta} x_t)^2$

and differentiate L w.r.t. $\hat{\alpha}$ and $\hat{\beta}$, setting the derivatives to zero:

$\dfrac{\partial L}{\partial \hat{\alpha}} = -2 \sum_t (y_t - \hat{\alpha} - \hat{\beta} x_t) = 0$   (1)

$\dfrac{\partial L}{\partial \hat{\beta}} = -2 \sum_t x_t (y_t - \hat{\alpha} - \hat{\beta} x_t) = 0$   (2)
From (1), $\sum_t (y_t - \hat{\alpha} - \hat{\beta} x_t) = 0 \;\Rightarrow\; \sum_t y_t - T\hat{\alpha} - \hat{\beta} \sum_t x_t = 0$   (3)

But $\sum_t y_t = T\bar{y}$ and $\sum_t x_t = T\bar{x}$, so

$T\bar{y} - T\hat{\alpha} - \hat{\beta} T\bar{x} = 0$ or $\bar{y} - \hat{\alpha} - \hat{\beta}\bar{x} = 0$   (4)

From (4), $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$   (5)

Substitute into (2) for $\hat{\alpha}$ from (5):

$\sum_t x_t (y_t - \bar{y} + \hat{\beta}\bar{x} - \hat{\beta} x_t) = 0$

$\sum_t x_t y_t - \bar{y}\sum_t x_t + \hat{\beta}\bar{x}\sum_t x_t - \hat{\beta}\sum_t x_t^2 = 0$

$\sum_t x_t y_t - T\bar{x}\bar{y} + \hat{\beta} T\bar{x}^2 - \hat{\beta}\sum_t x_t^2 = 0$

Rearranging for $\hat{\beta}$:

$\hat{\beta}\left(T\bar{x}^2 - \sum_t x_t^2\right) = T\bar{x}\bar{y} - \sum_t x_t y_t$

$\hat{\beta} = \dfrac{\sum_t x_t y_t - T\bar{x}\bar{y}}{\sum_t x_t^2 - T\bar{x}^2}$   and   $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$
This method of finding the optimum is known as ordinary least squares.
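The closed-form expressions derived above can be implemented directly. A minimal sketch (the function name and the data are illustrative, not from the slides):

```python
def ols(x, y):
    """OLS for y_t = alpha + beta*x_t + u_t using the closed forms:
    beta_hat = (sum(x*y) - T*xbar*ybar) / (sum(x^2) - T*xbar^2),
    alpha_hat = ybar - beta_hat*xbar."""
    T = len(x)
    xbar = sum(x) / T
    ybar = sum(y) / T
    beta_hat = (sum(xi * yi for xi, yi in zip(x, y)) - T * xbar * ybar) \
               / (sum(xi ** 2 for xi in x) - T * xbar ** 2)
    alpha_hat = ybar - beta_hat * xbar
    return alpha_hat, beta_hat

# Illustrative check: points generated from y = 2 + 3x are recovered exactly
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2 + 3 * xi for xi in x]
alpha_hat, beta_hat = ols(x, y)
```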
What do We Use $\hat{\alpha}$ and $\hat{\beta}$ For?
For the fund data above, the estimates are $\hat{\alpha}$ = -1.74 and $\hat{\beta}$ = 1.64. We would write the fitted line as:
$\hat{y}_t = -1.74 + 1.64 x_t$
Question: If an analyst tells you that she expects the market to yield a return
20% higher than the risk-free rate next year, what would you expect the return
on fund XXX to be?
Solution: We can say that the expected value of y = -1.74 + 1.64 × value of x,
so plug x = 20 into the equation to get the expected value for y:

$\hat{y} = -1.74 + 1.64 \times 20 = 31.06$
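The plug-in calculation can be checked in a couple of lines (the coefficients are those of the fitted line quoted above):

```python
alpha_hat, beta_hat = -1.74, 1.64  # fitted coefficients from the example
x_new = 20  # market expected to beat the risk-free rate by 20%
y_expected = alpha_hat + beta_hat * x_new  # -1.74 + 1.64*20 = 31.06
```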
Example: population of interest — the entire electorate.

The population regression function (PRF) is $y_t = \alpha + \beta x_t + u_t$.
The SRF is $\hat{y}_t = \hat{\alpha} + \hat{\beta} x_t$,
and we also know that $\hat{u}_t = y_t - \hat{y}_t$.
We use the SRF to infer likely values of the PRF.
We also want to know how good our estimates of $\alpha$ and $\beta$ are.
Linearity
In order to use OLS, we need a model which is linear in the parameters ($\alpha$
and $\beta$). It does not necessarily have to be linear in the variables (y and x).
Linear in the parameters means that the parameters are not multiplied
together, divided, squared or cubed etc.
Some models can be transformed to linear ones by a suitable substitution or
manipulation, e.g. the exponential regression model

$Y_t = e^{\alpha} X_t^{\beta} e^{u_t} \;\Leftrightarrow\; \ln Y_t = \alpha + \beta \ln X_t + u_t$

Then let $y_t = \ln Y_t$ and $x_t = \ln X_t$:

$y_t = \alpha + \beta x_t + u_t$

Similarly, a model such as $y_t = \alpha + \beta \dfrac{1}{x_t} + u_t$ can be estimated by OLS by
letting $z_t = \dfrac{1}{x_t}$ and regressing $y_t$ on a constant and $z_t$.

But some models are intrinsically non-linear, e.g.

$y_t = \alpha + \beta x_t^{\gamma} + u_t$
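The log-transformation can be sketched as follows: generate data from the exponential model, take logs, and run OLS on the transformed variables. The data, parameter values, and helper function are illustrative assumptions (with u_t set to zero, the true parameters are recovered exactly):

```python
import math

def ols(x, y):
    # OLS slope and intercept via the standard closed-form expressions
    T = len(x)
    xbar, ybar = sum(x) / T, sum(y) / T
    beta = (sum(a * b for a, b in zip(x, y)) - T * xbar * ybar) \
           / (sum(a * a for a in x) - T * xbar * xbar)
    return ybar - beta * xbar, beta

alpha_true, beta_true = 0.5, 2.0          # illustrative parameter values
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [math.exp(alpha_true) * Xt ** beta_true for Xt in X]  # u_t = 0 for clarity

# Transform: ln Y_t = alpha + beta * ln X_t is linear in the parameters
y_t = [math.log(Yt) for Yt in Y]
x_t = [math.log(Xt) for Xt in X]
alpha_hat, beta_hat = ols(x_t, y_t)       # recovers alpha_true, beta_true
```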
Estimator or Estimate?
3. Cov(u_i, u_j) = 0 — the errors are linearly independent of one another
4. Cov(u_t, x_t) = 0 — no relationship between the error and the
corresponding x variate
Under these assumptions, OLS is the Best Linear Unbiased Estimator (BLUE):
Estimator — $\hat{\alpha}$ and $\hat{\beta}$ are estimators of the true values of $\alpha$ and $\beta$
Linear — $\hat{\alpha}$ and $\hat{\beta}$ are linear estimators
Unbiased — on average, the estimated values equal the true values
Best — the OLS estimator has minimum variance among linear unbiased estimators
Consistency/Unbiasedness/Efficiency
Consistent: the least squares estimators are consistent, i.e. for any $\delta > 0$,

$\lim_{T \to \infty} \Pr\left[\,|\hat{\beta} - \beta| > \delta\,\right] = 0$
Unbiased
The least squares estimates of $\alpha$ and $\beta$ are unbiased.
That is, E($\hat{\alpha}$) = $\alpha$ and E($\hat{\beta}$) = $\beta$.
Thus on average the estimated values will be equal to the true values. To prove this also
requires the assumption that E(u_t) = 0. Unbiasedness is a stronger condition than consistency.
Efficiency
An estimator $\hat{\beta}$ of parameter $\beta$ is said to be efficient if it is unbiased and no other unbiased
estimator has a smaller variance. If the estimator is efficient, we are minimising the
probability that it is a long way off from the true value of $\beta$.
The precision of the estimates ($\hat{\alpha}$ and $\hat{\beta}$) is given by their standard errors. Given
assumptions 1 - 4 above, then the standard errors can be shown to be given by

$SE(\hat{\alpha}) = s \sqrt{\dfrac{\sum x_t^2}{T \sum (x_t - \bar{x})^2}} = s \sqrt{\dfrac{\sum x_t^2}{T\left(\sum x_t^2 - T\bar{x}^2\right)}}$

$SE(\hat{\beta}) = s \sqrt{\dfrac{1}{\sum (x_t - \bar{x})^2}} = s \sqrt{\dfrac{1}{\sum x_t^2 - T\bar{x}^2}}$

where s is the estimated standard deviation of the residuals. The sample
counterpart to $u_t$ is $\hat{u}_t$, and s² is calculated from the residuals as

$s^2 = \dfrac{\sum \hat{u}_t^2}{T - 2}$
1. Both SE($\hat{\alpha}$) and SE($\hat{\beta}$) depend on s² (or s). The greater the variance s²,
then the more dispersed the errors are about their mean value and therefore
the more dispersed y will be about its mean value.
2. The sum of the squares of x about their mean appears in both formulae.
The larger the sum of squares, the smaller the coefficient variances.
3. The larger the sample size T, the smaller will be the coefficient variances.
T appears explicitly in SE($\hat{\alpha}$) and implicitly in SE($\hat{\beta}$) — implicitly because the
sum $\sum (x_t - \bar{x})^2$ is from t = 1 to T — whether $\sum x_t^2$ is small or large.
4. The term $\sum x_t^2$ appears in SE($\hat{\alpha}$) but not in SE($\hat{\beta}$).
The reason is that $\sum x_t^2$ measures how far the points are away from the
y-axis.
$\hat{y}_t = \hat{\alpha} + \hat{\beta} x_t$, estimated as $\hat{y}_t = -59.12 + 0.35 x_t$

Example (contd): with T = 22, $\bar{x}$ = 416.5, $\sum x_t^2$ = 3919654 and $\sum \hat{u}_t^2$ = 130.6:

SE(regression), $s = \sqrt{\dfrac{\sum \hat{u}_t^2}{T - 2}} = \sqrt{\dfrac{130.6}{20}} = 2.55$

$SE(\hat{\alpha}) = 2.55 \times \sqrt{\dfrac{3919654}{22 \times (3919654 - 22 \times 416.5^2)}} = 3.35$

$SE(\hat{\beta}) = 2.55 \times \sqrt{\dfrac{1}{3919654 - 22 \times 416.5^2}} = 0.0079$
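The worked standard-error numbers can be reproduced directly from the summary statistics quoted above (small discrepancies in the last digit reflect the slides' rounding of s to 2.55):

```python
import math

T = 22
rss = 130.6            # sum of squared residuals
sum_x_sq = 3919654.0   # sum of x_t^2
xbar = 416.5

s = math.sqrt(rss / (T - 2))                      # approx 2.55
denom = sum_x_sq - T * xbar ** 2                  # sum of squared deviations of x
se_alpha = s * math.sqrt(sum_x_sq / (T * denom))  # approx 3.35
se_beta = s * math.sqrt(1 / denom)                # approx 0.0079
```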
Suppose we have the following regression results, with standard errors in parentheses:

$\hat{y}_t = 20.3 + 0.5091 x_t$
(14.38) (0.2561)

$\hat{\beta}$ = 0.5091 is a single (point) estimate of the unknown population
parameter, $\beta$. How reliable is this estimate?
The reliability of the point estimate is measured by the coefficient's
standard error.
We can use the information in the sample to make inferences about the
population.
We will always have two hypotheses that go together, the null hypothesis
(denoted H0) and the alternative hypothesis (denoted H1).
The null hypothesis is the statement or the statistical hypothesis that is actually
being tested. The alternative hypothesis represents the remaining outcomes of
interest.
For example, suppose given the regression results above, we are interested in
the hypothesis that the true value of $\beta$ is in fact 0.5. We would use the notation
H0: $\beta$ = 0.5
H1: $\beta$ ≠ 0.5
This would be known as a two sided test.
$\hat{\beta} \sim N(\beta, \mathrm{Var}(\hat{\beta}))$
What if the errors are not normally distributed? Will the parameter estimates still
be normally distributed?
Yes, if the other assumptions of the CLRM hold, and the sample size is
sufficiently large.
$\dfrac{\hat{\alpha} - \alpha}{\sqrt{\mathrm{var}(\hat{\alpha})}} \sim N(0,1)$ and $\dfrac{\hat{\beta} - \beta}{\sqrt{\mathrm{var}(\hat{\beta})}} \sim N(0,1)$

But the variances are unknown, so replacing them with their sample estimates gives

$\dfrac{\hat{\alpha} - \alpha}{SE(\hat{\alpha})} \sim t_{T-2}$ and $\dfrac{\hat{\beta} - \beta}{SE(\hat{\beta})} \sim t_{T-2}$
Testing Hypotheses:
The Test of Significance Approach
Assume the regression equation is given by

$y_t = \alpha + \beta x_t + u_t$, for t = 1, 2, ..., T
[Figure: for a two-sided test, the rejection region is split between the two tails of f(x) — 2.5% in each tail, with a 95% non-rejection region in the middle; for a one-sided test, the whole 5% rejection region lies in one tail]

[Figure: the normal distribution compared with the t-distribution, e.g. t(4) — the t-distribution has fatter tails]
The reason for using the t-distribution rather than the standard normal is that
we had to estimate $\sigma^2$, the variance of the disturbances.
The confidence interval approach: find the critical value $t_{crit}$ and form the interval

$\left(\hat{\beta} - t_{crit} \cdot SE(\hat{\beta}), \;\; \hat{\beta} + t_{crit} \cdot SE(\hat{\beta})\right)$
$\hat{y}_t = 20.3 + 0.5091 x_t$, T = 22
(14.38) (0.2561)
Using both the test of significance and confidence interval approaches,
test the hypothesis that =1 against a two-sided alternative.
The first step is to obtain the critical value. We want tcrit = t20;5% = ±2.086.

test statistic $= \dfrac{0.5091 - 1}{0.2561} = -1.917$

Confidence interval: $0.5091 \pm 2.086 \times 0.2561 = (-0.0251, 1.0433)$

Do not reject H0, since the test statistic lies within the non-rejection
region; equivalently, since 1 lies within the confidence interval, do not reject H0.
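Both approaches for testing H0: β = 1 can be verified numerically (the critical value t20;5% = 2.086 is taken as given from the slides rather than computed):

```python
beta_hat, se_beta, beta_null = 0.5091, 0.2561, 1.0
t_crit = 2.086  # two-sided 5% critical value, t-distribution with 20 d.f.

# Test of significance approach
t_stat = (beta_hat - beta_null) / se_beta  # approx -1.917
reject = abs(t_stat) > t_crit              # False: do not reject H0

# Confidence interval approach
ci = (beta_hat - t_crit * se_beta, beta_hat + t_crit * se_beta)
# approx (-0.0251, 1.0433); 1 lies inside, so again do not reject H0
```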
What if instead we wanted to test H0: $\beta$ = 0 or H0: $\beta$ = 2?

H0: $\beta$ = 0        H0: $\beta$ = 2
H1: $\beta$ ≠ 0   vs.  H1: $\beta$ ≠ 2
Now consider using a 10% size of test. The test statistic is unchanged:

test stat $= \dfrac{0.5091 - 1}{0.2561} = -1.917$
[Figure: 10% two-sided test — a 5% rejection region in each tail, with critical values -1.725 and +1.725]
t20;10% = 1.725. So now, as the test statistic lies in the rejection region,
we would reject H0.
Caution should therefore be used when placing emphasis on or making
decisions in marginal cases (i.e. in cases where we only just reject or
not reject).
If we reject the null hypothesis at the 5% level, we say that the result
of the test is statistically significant.
                                  Reality
                                  H0 is true          H0 is false
Result of   Significant
Test        (reject H0)           Type I error = α    ✓
            Insignificant
            (do not reject H0)    ✓                   Type II error = β

A lower significance level means a more strict criterion for rejection, so we
reject the null hypothesis less often, and are therefore more likely to
incorrectly not reject a false null (a type II error).
So there is always a trade off between type I and type II errors when choosing a
significance level. The only way we can reduce the chances of both is to increase
the sample size.
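The trade-off can be illustrated with a small simulation under an illustrative false null (all numbers here are assumptions for the demonstration): the stricter 1% test rejects less often than the 5% test, so when the null is false it commits more type II errors.

```python
import random

random.seed(42)
z_5pct, z_1pct = 1.96, 2.576  # two-sided normal critical values (5% and 1%)

# Draw test statistics under a false null: the true effect shifts the mean to 2
stats = [random.gauss(2.0, 1.0) for _ in range(10_000)]

reject_5 = sum(abs(z) > z_5pct for z in stats)
reject_1 = sum(abs(z) > z_1pct for z in stats)
# The stricter 1% criterion rejects less often: more type II errors here
```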
test statistic $= \dfrac{\hat{\beta}_i - \beta_i^*}{SE(\hat{\beta}_i)}$

If the test is
H0: $\beta_i$ = 0
H1: $\beta_i$ ≠ 0
i.e. a test that the population coefficient is zero against a two-sided
alternative, this is known as a t-ratio test.

Since $\beta_i^*$ = 0, test stat $= \dfrac{\hat{\beta}_i}{SE(\hat{\beta}_i)}$

The ratio of the coefficient to its SE is known as the t-ratio or t-statistic.
With 12 d.f., the 5% two-sided critical value is t12;5% = 2.179.
Do we reject H0: $\beta_2$ = 0? (Yes, since the t-ratio exceeds 2.179 in absolute value.)
Estimate the regression for each fund:

$R_{jt} - R_{ft} = \alpha_j + \beta_j (R_{mt} - R_{ft}) + \varepsilon_{jt}$

             $\hat{\alpha}$   $\hat{\beta}$   t-ratio on $\hat{\alpha}$
Mean         -0.02%           0.91            -0.07
Minimum      -0.54%           0.56            -2.44
Maximum       0.33%           1.09             3.11
Median       -0.03%           0.91            -0.25
Methodology
Calculate the monthly excess return of the stock over the market over a 12,
24 or 36 month period for each stock i:
Uit = Rit - Rmt
n = 12, 24 or 36 months
Calculate the average monthly return for the stock i over the first 12, 24,
or 36 month period:
$\bar{R}_i = \dfrac{1}{n} \sum_{t=1}^{n} U_{it}$
Portfolio Formation
Then rank the stocks from highest average return to lowest and from 5
portfolios:
Portfolio 1:
Portfolio 2:
Portfolio 3:
Portfolio 4:
Portfolio 5:
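The ranking-and-splitting step might be sketched as follows (the stock names and average returns are made up for illustration):

```python
def form_portfolios(avg_returns, n_portfolios=5):
    """Rank stocks from highest to lowest average return and split them
    into n_portfolios equal-sized groups (portfolio 1 = winners)."""
    ranked = sorted(avg_returns, key=avg_returns.get, reverse=True)
    size = len(ranked) // n_portfolios
    return [ranked[i * size:(i + 1) * size] for i in range(n_portfolios)]

# Illustrative average monthly returns for ten stocks
avg_returns = {"A": 0.05, "B": 0.04, "C": 0.03, "D": 0.02, "E": 0.01,
               "F": 0.00, "G": -0.01, "H": -0.02, "I": -0.03, "J": -0.04}
portfolios = form_portfolios(avg_returns)
# portfolios[0] holds the winners, portfolios[4] the losers
```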
Calculate the monthly returns on the winner and loser portfolios, $R_{pW}$ and
$R_{pL}$, and define the difference $R_{Dt} = R_{pL} - R_{pW}$,
where
$R_{mt}$ is the return on the FTA All-share
$R_{ft}$ is the risk-free rate
Results (Test 2):

                                             n = 12       n = 24       n = 36
Return on Loser                                           0.0011       0.0129
Return on Winner                                         -0.0003       0.0115
Implied annualised return difference                      1.68%        1.56%
Coefficient for (3.47): $\hat{\alpha}_1$     -0.00031     0.0014**     0.0013
                                             (0.29)       (2.01)       (1.55)
$\hat{\alpha}_2$                             -0.00034     0.00147**    0.0013*
                                             (-0.30)      (2.01)       (1.41)
$\hat{\beta}$                                -0.022       0.010       -0.0025
                                             (-0.25)      (0.21)      (-0.06)
$\hat{\alpha}_1$ (January excluded)          -0.0007      0.0012*      0.0009
                                             (-0.72)      (1.63)       (1.05)

Notes: t-ratios in parentheses; * and ** denote significance at the 10% and 5% levels
respectively. Source: Clare and Thomas (1995). Reprinted with the permission of Blackwell
Publishers.
Is there evidence that losers out-perform winners more at one time of the
year than another?
To test this, calculate the difference between the winner & loser portfolios
as previously, $R_{Dt}$, and regress this on 12 month-of-the-year dummies:

$R_{Dt} = \sum_{i=1}^{12} \alpha_i M_{it} + u_t$
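Because the 12 monthly dummies are mutually exclusive (each observation has exactly one dummy equal to 1) and there is no intercept, the OLS coefficient on each dummy is simply the mean of R_Dt in that month. A sketch using that equivalence (the data and function name are illustrative):

```python
# OLS on a full set of non-overlapping dummies with no intercept reduces
# to computing the mean of the dependent variable within each category.
def month_dummy_coeffs(r_d, months):
    """Return {month: OLS coefficient} for R_Dt on 12 monthly dummies."""
    by_month = {}
    for r, m in zip(r_d, months):
        by_month.setdefault(m, []).append(r)
    return {m: sum(v) / len(v) for m, v in by_month.items()}

# Illustrative data: two years of winner/loser return differences
months = list(range(1, 13)) * 2
r_d = [0.01 * m for m in months]  # made-up values that vary by month
coeffs = month_dummy_coeffs(r_d, months)
```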
Conclusions
Small samples
No diagnostic checks of model adequacy