Lecture 2


Regression Introduction and Estimation Review

Dr. Frank Wood


Quick Example - Scatter Plot

Use linear regression/demo.m


Linear Regression
- Want to find parameters for a function of the form

  Yi = β0 + β1 Xi + εi

- Distribution of the error random variable εi not specified


Formal Statement of Model

  Yi = β0 + β1 Xi + εi

- Yi is the value of the response variable in the ith trial
- β0 and β1 are parameters
- Xi is a known constant, the value of the predictor variable in the ith trial
- εi is a random error term with mean E{εi} = 0 and finite variance σ²{εi} = σ²
- i = 1, . . . , n
Properties
- The response Yi is the sum of two components
  - Constant term β0 + β1 Xi
  - Random term εi
- The expected response is

  E{Yi} = E{β0 + β1 Xi + εi}
        = β0 + β1 Xi + E{εi}
        = β0 + β1 Xi
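The slides demo this with a MATLAB script (demo.m); as a language-neutral illustration, here is a minimal NumPy sketch (the values β0 = 1, β1 = 2, σ = 0.5 and the normal errors are illustrative assumptions, not from the slides) showing that the sample mean of Yi at a fixed Xi approaches β0 + β1 Xi:

```python
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1, sigma = 1.0, 2.0, 0.5   # illustrative parameter choices
x_i = 3.0                             # a fixed predictor value Xi

# Simulate many trials of Yi = beta0 + beta1*Xi + eps_i with E{eps_i} = 0
eps = rng.normal(0.0, sigma, size=100_000)
y = beta0 + beta1 * x_i + eps

print(y.mean())             # ~ 7.0 (empirical mean of the responses)
print(beta0 + beta1 * x_i)  # 7.0 = E{Yi}
```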
Expectation Review
- Definition (continuous case)

  E{X} = ∫ X P(X) dX,  X ∈ R

- Linearity property

  E{aX} = a E{X}
  E{aX + bY} = a E{X} + b E{Y}

- Obvious from definition
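A quick Monte Carlo sanity check of the linearity property, with arbitrary illustrative distributions and constants:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.exponential(scale=2.0, size=200_000)  # E{X} = 2
Y = rng.uniform(0.0, 1.0, size=200_000)       # E{Y} = 0.5
a, b = 3.0, -4.0

print((a * X + b * Y).mean())       # ~ 3*2 + (-4)*0.5 = 4.0
print(a * X.mean() + b * Y.mean())  # same value, by linearity
```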


Example Expectation Derivation

  P(X) = 2X,  0 ≤ X ≤ 1

Expectation

  E{X} = ∫_0^1 X P(X) dX
       = ∫_0^1 2X² dX
       = (2X³/3) |_0^1
       = 2/3
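To check E{X} = 2/3 numerically, one can draw samples from P(X) = 2X by inverse-transform sampling: the CDF is F(X) = X² on [0, 1], so X = √U with U uniform. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
u = rng.uniform(0.0, 1.0, size=500_000)
x = np.sqrt(u)  # inverse-CDF sampling from the density P(X) = 2X

print(x.mean())  # ~ 2/3
```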
Expectation of a Product of Random Variables
If X, Y are random variables with joint distribution P(X, Y), then the expectation of the product is given by

  E{XY} = ∫∫ XY P(X, Y) dX dY.

What if X and Y are independent? If X and Y are independent with density functions f and g respectively, then

  E{XY} = ∫∫ XY f(X) g(Y) dX dY
        = ∫_X ∫_Y XY f(X) g(Y) dY dX
        = ∫_X X f(X) [ ∫_Y Y g(Y) dY ] dX
        = ∫_X X f(X) E{Y} dX
        = E{X} E{Y}
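An empirical check of E{XY} = E{X}E{Y} with two independent samples (distributions chosen arbitrarily for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(2.0, 1.0, size=500_000)   # E{X} = 2
Y = rng.uniform(0.0, 4.0, size=500_000)  # E{Y} = 2, drawn independently of X

print((X * Y).mean())       # ~ 4.0
print(X.mean() * Y.mean())  # ~ 4.0, matching E{X}E{Y}
```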
Regression Function
- The response Yi comes from a probability distribution with mean

  E{Yi} = β0 + β1 Xi

- This means the regression function is

  E{Y} = β0 + β1 X

  since the regression function relates the means of the probability distributions of Y for a given X to the level of X
Error Terms
- The response Yi in the ith trial exceeds or falls short of the value of the regression function by the error term amount εi
- The error terms εi are assumed to have constant variance σ²
Response Variance
Responses Yi have the same constant variance

  σ²{Yi} = σ²{β0 + β1 Xi + εi}
         = σ²{εi}
         = σ²
Variance (2nd central moment) Review
- Continuous distribution

  σ²{X} = E{(X − E{X})²} = ∫ (X − E{X})² P(X) dX,  X ∈ R

- Discrete distribution

  σ²{X} = E{(X − E{X})²} = Σ_i (Xi − E{X})² P(Xi),  X ∈ Z
Alternative Form for Variance

  σ²{X} = E{(X − E{X})²}
        = E{X² − 2X E{X} + E{X}²}
        = E{X²} − 2E{X}E{X} + E{X}²
        = E{X²} − 2E{X}² + E{X}²
        = E{X²} − E{X}².
Example Variance Derivation

  P(X) = 2X,  0 ≤ X ≤ 1

  σ²{X} = E{(X − E{X})²} = E{X²} − E{X}²
        = ∫_0^1 2X · X² dX − (2/3)²
        = (2X⁴/4) |_0^1 − 4/9
        = 1/2 − 4/9 = 1/18
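Reusing the inverse-CDF sampler from the expectation example (X = √U for P(X) = 2X), the sample variance should land near 1/18 ≈ 0.0556:

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.sqrt(rng.uniform(0.0, 1.0, size=500_000))  # samples from P(X) = 2X

print(x.var())  # ~ 0.0556
print(1 / 18)   # analytic variance
```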
Variance Properties

  σ²{aX} = a² σ²{X}
  σ²{aX + bY} = a² σ²{X} + b² σ²{Y}   if X ⊥⊥ Y
  σ²{a + cX} = c² σ²{X}               if a, c both constant

More generally

  σ²{ Σ_i ai Xi } = Σ_i Σ_j ai aj Cov(Xi, Xj)
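A numerical check of the general formula, using two variables made deliberately correlated through a shared term (this construction is an arbitrary illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
z = rng.normal(size=500_000)
x1 = z + rng.normal(size=500_000)  # x1 and x2 share z, hence correlate
x2 = z + rng.normal(size=500_000)
a = np.array([2.0, -3.0])

s = a[0] * x1 + a[1] * x2
cov = np.cov(x1, x2)               # 2x2 sample covariance matrix
print(s.var())                     # direct sample variance of the sum
print(a @ cov @ a)                 # sum_ij a_i a_j Cov(X_i, X_j)
```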
Covariance
- The covariance between two real-valued random variables X and Y, with expected values E{X} = µ and E{Y} = ν, is defined as

  Cov(X, Y) = E{(X − µ)(Y − ν)}

- Which can be rewritten as

  Cov(X, Y) = E{XY − νX − µY + µν}
  Cov(X, Y) = E{XY} − νE{X} − µE{Y} + µν
  Cov(X, Y) = E{XY} − µν.
Covariance of Independent Variables
If X and Y are independent, then their covariance is zero. This follows because under independence

  E{XY} = E{X}E{Y} = µν,

and then

  Cov(X, Y) = µν − µν = 0.
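Empirically, the sample covariance of two independent samples should be close to zero (again with arbitrary illustrative distributions):

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(1.0, 2.0, size=500_000)
Y = rng.exponential(scale=3.0, size=500_000)  # drawn independently of X

print(np.cov(X, Y)[0, 1])  # ~ 0
```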
Least Squares Linear Regression
- Seek to minimize

  Q = Σ_{i=1}^{n} (Yi − (b0 + b1 Xi))²

- By careful choice of b0 and b1, where b0 is a point estimator for β0 and b1 is the same for β1

How?
[Figure: Guess #1]
[Figure: Guess #2]
Function maximization
- Important technique to remember!
- Take derivative
- Set result equal to zero and solve
- Test second derivative at that point
- Question: does this always give you the maximum?
- Going further: multiple variables, convex optimization
Function Maximization
Find

  argmax_x  −x² + ln(x)

[Figure: plot of −x² + ln(x) for 0 < x ≤ 5]
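Applying the recipe from the previous slide: d/dx(−x² + ln x) = −2x + 1/x = 0 gives x = 1/√2, and the second derivative −2 − 1/x² < 0 everywhere confirms a maximum. A numerical cross-check, sketched with SciPy (the bounds are an arbitrary choice covering the plotted range):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Maximize -x^2 + ln(x) by minimizing its negation over x > 0
res = minimize_scalar(lambda x: x**2 - np.log(x),
                      bounds=(1e-6, 5.0), method="bounded")

print(res.x)           # ~ 0.7071
print(1 / np.sqrt(2))  # analytic argmax
```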
Least Squares Max(min)imization
- Function to minimize w.r.t. b0 and b1; b0 and b1 are called point estimators of β0 and β1 respectively

  Q = Σ_{i=1}^{n} (Yi − (b0 + b1 Xi))²

- Minimize this by maximizing −Q
- Either way, find partials and set both equal to zero

  dQ/db0 = 0
  dQ/db1 = 0
Normal Equations
- The results of this maximization step are called the normal equations

  Σ Yi = n b0 + b1 Σ Xi
  Σ Xi Yi = b0 Σ Xi + b1 Σ Xi²

- This is a system of two equations in two unknowns. The solution is given by. . .
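The partials can also be taken symbolically; a minimal SymPy sketch (n = 3 symbolic data points, just to keep the output small) reproduces the normal equations and their solution:

```python
import sympy as sp

b0, b1 = sp.symbols("b0 b1")
X = sp.symbols("X1:4")  # symbolic data X1, X2, X3
Y = sp.symbols("Y1:4")  # symbolic data Y1, Y2, Y3

Q = sum((Y[i] - (b0 + b1 * X[i]))**2 for i in range(3))

# dQ/db0 = 0 and dQ/db1 = 0 are exactly the normal equations
eqs = [sp.Eq(sp.diff(Q, b0), 0), sp.Eq(sp.diff(Q, b1), 0)]
print(sp.solve(eqs, [b0, b1]))
```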
Solution to Normal Equations
After a lot of algebra one arrives at

  b1 = Σ (Xi − X̄)(Yi − Ȳ) / Σ (Xi − X̄)²

  b0 = Ȳ − b1 X̄

where

  X̄ = Σ Xi / n
  Ȳ = Σ Yi / n
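As a final sketch, the closed-form estimators on synthetic data (the true values β0 = 1, β1 = 2 and the normal noise are illustrative assumptions), cross-checked against np.polyfit:

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.uniform(0.0, 10.0, size=200)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, size=200)  # Yi = β0 + β1 Xi + εi

# Closed-form least squares estimates from the normal equations
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b0 = y.mean() - b1 * x.mean()

print(b0, b1)               # ~ 1.0, ~ 2.0
print(np.polyfit(x, y, 1))  # [b1, b0] from NumPy's polynomial fitter
```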
