Lecture 2


Regression Introduction and Estimation Review

Dr. Frank Wood


Quick Example - Scatter Plot

Use linear regression/demo.m


Linear Regression
- Want to find parameters for a function of the form

  Yi = β0 + β1 Xi + εi

- Distribution of the error random variable εi not specified


Formal Statement of Model

  Yi = β0 + β1 Xi + εi

- Yi is the value of the response variable in the ith trial
- β0 and β1 are parameters
- Xi is a known constant, the value of the predictor variable in the ith trial
- εi is a random error term with mean E{εi} = 0 and finite variance σ²{εi} = σ²
- i = 1, . . . , n
Properties
- The response Yi is the sum of two components
  - Constant term β0 + β1 Xi
  - Random term εi
- The expected response is

  E{Yi} = E{β0 + β1 Xi + εi}
        = β0 + β1 Xi + E{εi}
        = β0 + β1 Xi
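The slides demo this with a MATLAB script (demo.m); as a language-neutral illustration, here is a minimal NumPy sketch (the values β0 = 1, β1 = 2, σ = 0.5 and the normal errors are illustrative assumptions, not from the slides) showing that the sample mean of Yi at a fixed Xi approaches β0 + β1 Xi:

```python
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1, sigma = 1.0, 2.0, 0.5   # illustrative parameter choices
x_i = 3.0                             # a fixed predictor value Xi

# Simulate many trials of Yi = beta0 + beta1*Xi + eps_i with E{eps_i} = 0
eps = rng.normal(0.0, sigma, size=100_000)
y = beta0 + beta1 * x_i + eps

print(y.mean())             # ~ 7.0 (empirical mean of the responses)
print(beta0 + beta1 * x_i)  # 7.0 = E{Yi}
```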
Expectation Review
- Definition (continuous case)

  E{X} = ∫ X P(X) dX,  X ∈ R

- Linearity property

  E{aX} = a E{X}
  E{aX + bY} = a E{X} + b E{Y}

- Obvious from definition
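A quick Monte Carlo sanity check of the linearity property, with arbitrary illustrative distributions and constants:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.exponential(scale=2.0, size=200_000)  # E{X} = 2
Y = rng.uniform(0.0, 1.0, size=200_000)       # E{Y} = 0.5
a, b = 3.0, -4.0

print((a * X + b * Y).mean())       # ~ 3*2 + (-4)*0.5 = 4.0
print(a * X.mean() + b * Y.mean())  # same value, by linearity
```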


Example Expectation Derivation

  P(X) = 2X,  0 ≤ X ≤ 1

Expectation

  E{X} = ∫_0^1 X P(X) dX
       = ∫_0^1 2X² dX
       = (2X³/3) |_0^1
       = 2/3
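To check E{X} = 2/3 numerically, one can draw samples from P(X) = 2X by inverse-transform sampling: the CDF is F(X) = X² on [0, 1], so X = √U with U uniform. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
u = rng.uniform(0.0, 1.0, size=500_000)
x = np.sqrt(u)  # inverse-CDF sampling from the density P(X) = 2X

print(x.mean())  # ~ 2/3
```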
Expectation of a Product of Random Variables
If X, Y are random variables with joint distribution P(X, Y), then the expectation of the product is given by

  E{XY} = ∫∫ XY P(X, Y) dX dY.

What if X and Y are independent? If X and Y are independent with density functions f and g respectively, then

  E{XY} = ∫∫ XY f(X) g(Y) dX dY
        = ∫_X ∫_Y XY f(X) g(Y) dY dX
        = ∫_X X f(X) [ ∫_Y Y g(Y) dY ] dX
        = ∫_X X f(X) E{Y} dX
        = E{X} E{Y}
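An empirical check of E{XY} = E{X}E{Y} with two independent samples (distributions chosen arbitrarily for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(2.0, 1.0, size=500_000)   # E{X} = 2
Y = rng.uniform(0.0, 4.0, size=500_000)  # E{Y} = 2, drawn independently of X

print((X * Y).mean())       # ~ 4.0
print(X.mean() * Y.mean())  # ~ 4.0, matching E{X}E{Y}
```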
Regression Function
- The response Yi comes from a probability distribution with mean

  E{Yi} = β0 + β1 Xi

- This means the regression function is

  E{Y} = β0 + β1 X

  since the regression function relates the means of the probability distributions of Y for a given X to the level of X
Error Terms
- The response Yi in the ith trial exceeds or falls short of the value of the regression function by the error term amount εi
- The error terms εi are assumed to have constant variance σ²
Response Variance
Responses Yi have the same constant variance

  σ²{Yi} = σ²{β0 + β1 Xi + εi}
         = σ²{εi}
         = σ²
Variance (2nd central moment) Review
- Continuous distribution

  σ²{X} = E{(X − E{X})²} = ∫ (X − E{X})² P(X) dX,  X ∈ R

- Discrete distribution

  σ²{X} = E{(X − E{X})²} = Σ_i (Xi − E{X})² P(Xi),  X ∈ Z
Alternative Form for Variance

  σ²{X} = E{(X − E{X})²}
        = E{X² − 2X E{X} + E{X}²}
        = E{X²} − 2E{X}E{X} + E{X}²
        = E{X²} − 2E{X}² + E{X}²
        = E{X²} − E{X}².
Example Variance Derivation

  P(X) = 2X,  0 ≤ X ≤ 1

  σ²{X} = E{(X − E{X})²} = E{X²} − E{X}²
        = ∫_0^1 2X · X² dX − (2/3)²
        = (2X⁴/4) |_0^1 − 4/9
        = 1/2 − 4/9 = 1/18
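Reusing the inverse-CDF sampler from the expectation example (X = √U for P(X) = 2X), the sample variance should land near 1/18 ≈ 0.0556:

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.sqrt(rng.uniform(0.0, 1.0, size=500_000))  # samples from P(X) = 2X

print(x.var())  # ~ 0.0556
print(1 / 18)   # analytic variance
```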
Variance Properties

  σ²{aX} = a² σ²{X}
  σ²{aX + bY} = a² σ²{X} + b² σ²{Y}   if X ⊥⊥ Y
  σ²{a + cX} = c² σ²{X}               if a, c both constant

More generally

  σ²{ Σ_i ai Xi } = Σ_i Σ_j ai aj Cov(Xi, Xj)
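A numerical check of the general formula, using two variables made deliberately correlated through a shared term (this construction is an arbitrary illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
z = rng.normal(size=500_000)
x1 = z + rng.normal(size=500_000)  # x1 and x2 share z, hence correlate
x2 = z + rng.normal(size=500_000)
a = np.array([2.0, -3.0])

s = a[0] * x1 + a[1] * x2
cov = np.cov(x1, x2)               # 2x2 sample covariance matrix
print(s.var())                     # direct sample variance of the sum
print(a @ cov @ a)                 # sum_ij a_i a_j Cov(X_i, X_j)
```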
Covariance
- The covariance between two real-valued random variables X and Y, with expected values E{X} = µ and E{Y} = ν, is defined as

  Cov(X, Y) = E{(X − µ)(Y − ν)}

- Which can be rewritten as

  Cov(X, Y) = E{XY − νX − µY + µν}
  Cov(X, Y) = E{XY} − νE{X} − µE{Y} + µν
  Cov(X, Y) = E{XY} − µν.
Covariance of Independent Variables
If X and Y are independent, then their covariance is zero. This follows because under independence

  E{XY} = E{X}E{Y} = µν,

and then

  Cov(X, Y) = µν − µν = 0.
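Empirically, the sample covariance of two independent samples should be close to zero (again with arbitrary illustrative distributions):

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(1.0, 2.0, size=500_000)
Y = rng.exponential(scale=3.0, size=500_000)  # drawn independently of X

print(np.cov(X, Y)[0, 1])  # ~ 0
```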
Least Squares Linear Regression
- Seek to minimize

  Q = Σ_{i=1}^{n} (Yi − (b0 + b1 Xi))²

- By careful choice of b0 and b1, where b0 is a point estimator for β0 and b1 is the same for β1

How?
[Figure: Guess #1]
[Figure: Guess #2]
Function maximization
- Important technique to remember!
- Take derivative
- Set result equal to zero and solve
- Test second derivative at that point
- Question: does this always give you the maximum?
- Going further: multiple variables, convex optimization
Function Maximization
Find

  argmax_x  −x² + ln(x)

[Figure: plot of −x² + ln(x) for 0 < x ≤ 5]
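Applying the recipe from the previous slide: d/dx(−x² + ln x) = −2x + 1/x = 0 gives x = 1/√2, and the second derivative −2 − 1/x² < 0 everywhere confirms a maximum. A numerical cross-check, sketched with SciPy (the bounds are an arbitrary choice covering the plotted range):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Maximize -x^2 + ln(x) by minimizing its negation over x > 0
res = minimize_scalar(lambda x: x**2 - np.log(x),
                      bounds=(1e-6, 5.0), method="bounded")

print(res.x)           # ~ 0.7071
print(1 / np.sqrt(2))  # analytic argmax
```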
Least Squares Max(min)imization
- Function to minimize w.r.t. b0 and b1; b0 and b1 are called point estimators of β0 and β1 respectively

  Q = Σ_{i=1}^{n} (Yi − (b0 + b1 Xi))²

- Minimize this by maximizing −Q
- Either way, find partials and set both equal to zero

  dQ/db0 = 0
  dQ/db1 = 0
Normal Equations
- The results of this maximization step are called the normal equations

  Σ Yi = n b0 + b1 Σ Xi
  Σ Xi Yi = b0 Σ Xi + b1 Σ Xi²

- This is a system of two equations in two unknowns. The solution is given by. . .
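The partials can also be taken symbolically; a minimal SymPy sketch (n = 3 symbolic data points, just to keep the output small) reproduces the normal equations and their solution:

```python
import sympy as sp

b0, b1 = sp.symbols("b0 b1")
X = sp.symbols("X1:4")  # symbolic data X1, X2, X3
Y = sp.symbols("Y1:4")  # symbolic data Y1, Y2, Y3

Q = sum((Y[i] - (b0 + b1 * X[i]))**2 for i in range(3))

# dQ/db0 = 0 and dQ/db1 = 0 are exactly the normal equations
eqs = [sp.Eq(sp.diff(Q, b0), 0), sp.Eq(sp.diff(Q, b1), 0)]
print(sp.solve(eqs, [b0, b1]))
```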
Solution to Normal Equations
After a lot of algebra one arrives at

  b1 = Σ (Xi − X̄)(Yi − Ȳ) / Σ (Xi − X̄)²

  b0 = Ȳ − b1 X̄

where

  X̄ = Σ Xi / n
  Ȳ = Σ Yi / n
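As a final sketch, the closed-form estimators on synthetic data (the true values β0 = 1, β1 = 2 and the normal noise are illustrative assumptions), cross-checked against np.polyfit:

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.uniform(0.0, 10.0, size=200)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, size=200)  # Yi = β0 + β1 Xi + εi

# Closed-form least squares estimates from the normal equations
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b0 = y.mean() - b1 * x.mean()

print(b0, b1)               # ~ 1.0, ~ 2.0
print(np.polyfit(x, y, 1))  # [b1, b0] from NumPy's polynomial fitter
```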
