Summary of Topics For Midterm Exam #2: STA 371G, Fall 2017
Listed below are the major topics covered in class that are likely to be in
Midterm Exam #2:
Mean (expectation), variance and standard deviation of a discrete random variable.
E[X] = \sum_{i=1}^{n} x_i P(X = x_i), \quad Var[X] = \sum_{i=1}^{n} (x_i - E[X])^2 P(X = x_i), \quad sd[X] = \sqrt{Var[X]}
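These formulas can be checked numerically; a minimal Python sketch, using a made-up three-point distribution (the values and probabilities are hypothetical):

```python
# Hypothetical discrete distribution: P(X=1)=0.2, P(X=2)=0.5, P(X=3)=0.3
xs = [1, 2, 3]
ps = [0.2, 0.5, 0.3]

mean = sum(x * p for x, p in zip(xs, ps))                # E[X] = 2.1
var = sum((x - mean) ** 2 * p for x, p in zip(xs, ps))   # Var[X] = 0.49
sd = var ** 0.5                                          # sd[X] = 0.7
```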
Sample variance and standard deviation:
s_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}, \quad s_x = \sqrt{s_x^2}
s_y^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}, \quad s_y = \sqrt{s_y^2}
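A quick Python check that the n - 1 denominator matches the standard library's definition (the data values are made up):

```python
import statistics

xs = [2.0, 4.0, 6.0, 8.0]   # hypothetical sample
n = len(xs)
xbar = sum(xs) / n
s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)   # sample variance, n - 1 denominator
s = s2 ** 0.5                                     # sample standard deviation
# statistics.variance and statistics.stdev use the same n - 1 denominator
```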
Interpreting covariance, correlation and regression coefficients.
SST = SSR + SSE
Coefficient of determination:
R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} = r_{xy}^2
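The decomposition SST = SSR + SSE and the identities for R^2 can be verified on a tiny least-squares fit; a sketch using made-up data:

```python
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]     # hypothetical data, roughly y = 2x
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n

# Least-squares slope and intercept
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sum((x - xbar) ** 2 for x in xs)
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * x for x in xs]

sst = sum((y - ybar) ** 2 for y in ys)                   # total sum of squares
ssr = sum((yh - ybar) ** 2 for yh in yhat)               # regression sum of squares
sse = sum((y - yh) ** 2 for y, yh in zip(ys, yhat))      # error sum of squares
r2 = ssr / sst
```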
Regression assumptions and statistical model.
Y = \beta_0 + \beta_1 X + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2)
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2)
y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)
Assuming \beta_0, \beta_1 and \sigma^2 are known, given x_i, the 95% prediction interval of y_i is (\beta_0 + \beta_1 x_i) \pm 2\sigma.
95% confidence intervals for the coefficients: b_1 \pm 2 s_{b_1}, \quad b_0 \pm 2 s_{b_0}.
Hypothesis testing:
We test the null hypothesis H_0: \beta_1 = \beta_1^0 versus the alternative H_1: \beta_1 \neq \beta_1^0.
The t-stat t = \frac{b_1 - \beta_1^0}{s_{b_1}} measures the number of standard errors the estimate b_1 is from the proposed value \beta_1^0.
The p-value measures how unusual the estimate b_1 would be if the null hypothesis were true.
We usually reject the null hypothesis if |t| > 2, p < 0.05, or \beta_1^0 is not within the 95% confidence interval (b_1 - 2 s_{b_1}, b_1 + 2 s_{b_1}).
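A small numeric sketch of this decision rule (the estimate, standard error, and null value are all hypothetical):

```python
b1, s_b1 = 1.9, 0.4        # hypothetical estimate and its standard error
beta1_null = 1.0           # hypothetical proposed value under H0

t = (b1 - beta1_null) / s_b1            # number of standard errors from the null
ci = (b1 - 2 * s_b1, b1 + 2 * s_b1)     # approximate 95% confidence interval
reject = abs(t) > 2                     # same decision as beta1_null outside ci
```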
Forecasting:
Given X_f, the 95% plug-in prediction interval of Y_f is (b_0 + b_1 X_f) \pm 2s.
A large predictive error variance (high uncertainty) comes from a large s, a small n, a small s_x, and a large difference between X_f and \bar{X}.
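The plug-in interval is a direct computation; a sketch using hypothetical fitted values:

```python
b0, b1, s = 1.0, 2.0, 0.5   # hypothetical fitted intercept, slope, and residual SE
x_f = 3.0                   # hypothetical new observation

y_pred = b0 + b1 * x_f                         # point forecast
interval = (y_pred - 2 * s, y_pred + 2 * s)    # 95% plug-in prediction interval
```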
Multiple Linear Regression
Statistical model:
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2)
Y \mid X_1, \ldots, X_p \sim N(\beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p, \sigma^2)
Residual properties: \bar{e} = 0, \quad Corr(X_j, e) = 0, \quad Corr(\hat{Y}, e) = 0
R^2 = Corr(Y, \hat{Y})^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}
Standard error of a coefficient: s_{b_j}^2 = \frac{s^2}{\text{variation in } X_j \text{ not associated with the other } X\text{'s}}
Dummy Variables and Interactions
Dummy variables
Gender: Male, Female; Education level: High-school, Bachelor, Master, Doctor; Month: Jan, Feb, ..., Dec
A variable with n categories can be included in a multiple linear regression using C dummy variables, where 1 \le C \le n - 1.
Representing a variable of n categories with n dummy variables leads to perfect multicollinearity.
Interpretation: the same slope but different intercepts
Interactions
Interpretation: different intercepts and slopes
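The multicollinearity point can be seen directly in the design matrix; a sketch using hypothetical quarterly data, with Q1 as the baseline category:

```python
quarters = ["Q1", "Q2", "Q3", "Q4", "Q1", "Q3"]   # hypothetical observations

# n - 1 = 3 dummies, with Q1 as the baseline absorbed into the intercept
dummy_levels = ["Q2", "Q3", "Q4"]
X = [[1.0] + [1.0 if q == lvl else 0.0 for lvl in dummy_levels] for q in quarters]

# With all 4 dummies, the dummy columns sum to the intercept column in every
# row: perfect multicollinearity, so the coefficients cannot be estimated.
X_bad = [[1.0] + [1.0 if q == lvl else 0.0 for lvl in ["Q1", "Q2", "Q3", "Q4"]]
         for q in quarters]
collinear = all(row[0] == sum(row[1:]) for row in X_bad)
```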
Diagnostics and Transformations
Diagnostics
Model assumptions:
Statistical model:
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2)
Polynomial model: Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \cdots + \beta_m X^m + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2)
Log-log model: \log(Y) = \beta_0 + \beta_1 \log(X) + \varepsilon, equivalently Y = e^{\beta_0} X^{\beta_1} e^{\varepsilon}, \quad \varepsilon \sim N(0, \sigma^2)
Interpretation: about a \beta_1\% change in Y per 1% change in X.
Example: price elasticity
95% plug-in prediction interval of \log(Y): (b_0 + b_1 \log(X)) \pm 2s
Log transformation of Y
Statistical model:
\log(Y) = \beta_0 + \beta_1 X + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2)
Y = e^{\beta_0} e^{\beta_1 X} e^{\varepsilon}, \quad \varepsilon \sim N(0, \sigma^2)
Interpretation: about a (100 \beta_1)\% change in Y per unit change in X (if \beta_1 is small).
Example: exponential growth
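The (100 \beta_1)% rule of thumb comes from e^{\beta_1} \approx 1 + \beta_1 for small \beta_1; a quick numeric check (the slope value is made up):

```python
import math

b1 = 0.05                                # hypothetical small slope on X
exact_pct = (math.exp(b1) - 1) * 100     # exact % change in Y per unit change in X
approx_pct = 100 * b1                    # the rule-of-thumb value
# For small b1 the two agree closely; the gap grows as b1 gets larger.
```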
Time Series
AR(1) model: Y_t = \beta_0 + \beta_1 Y_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim N(0, \sigma^2)
AR(1) model with linear trend: Y_t = \beta_0 + \beta_1 Y_{t-1} + \beta_2 t + \varepsilon_t, \quad \varepsilon_t \sim N(0, \sigma^2)
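The first model above is just a regression of Y_t on its own lag; a least-squares sketch on a made-up series:

```python
ys = [10.0, 11.0, 10.5, 12.0, 11.5, 13.0, 12.5, 14.0]   # hypothetical series

lagged = ys[:-1]      # Y_{t-1}
current = ys[1:]      # Y_t
n = len(lagged)
xbar, ybar = sum(lagged) / n, sum(current) / n

# Least-squares fit of Y_t on Y_{t-1}
b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(lagged, current))
      / sum((x - xbar) ** 2 for x in lagged))
b0 = ybar - b1 * xbar
one_step_forecast = b0 + b1 * ys[-1]    # forecast of the next value in the series
```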
Modeling seasonality
Using no more than 11 dummy variables for 12 months; using no more than
3 dummy variables for 4 quarters
Seasonal model:
Y_t = \beta_0 + \beta_1 \text{Jan} + \cdots + \beta_{11} \text{Nov} + \varepsilon_t
Y_t = \beta_0 + \beta_1 \text{Jan} + \cdots + \beta_{11} \text{Nov} + \beta_{12} Y_{t-1} + \beta_{13} t + \varepsilon_t