A Project Report On Methods To Address Multicollinearity
INDIAN INSTITUTE OF TECHNOLOGY KANPUR
Submitted by –
Abhishek Verma (211255)
Krishna Pratap Mall (211320)
Shiv Varun Maurya (211380)
Abhinav Shukla (211415)
MTH689A: Linear and Non-Linear Models
(Department of Mathematics and Statistics)
Under Guidance of – Dr. Satya Prakash Singh
One of the main assumptions underlying the multiple regression model is the independence of the regressor variables. When there are near-linear dependencies among the regressors, the problem of multicollinearity is said to exist.
• There are four primary sources of multicollinearity:
1. The data collection method employed
2. Constraints on the model or in the population
3. Model specification
4. An over-defined model
VARIANCE INFLATION FACTOR:
• $\mathrm{VIF}_j = \dfrac{1}{1 - R_j^2}$, $j = 1, 2, \ldots, p$, where $R_j^2$ is the coefficient of determination obtained when the regressor $x_j$ is regressed on the remaining $p - 1$ regressors.
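The report does not show code for computing VIFs; the following pure-Python sketch (all data and helper names are invented for illustration) computes $\mathrm{VIF}_j = 1/(1 - R_j^2)$ by regressing each regressor on the others:

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting
    (A is a small dense matrix given as nested lists)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def r_squared(y, X):
    """R^2 from the OLS regression of y on X (X includes an intercept column)."""
    n, p = len(X), len(X[0])
    XtX = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    beta = solve(XtX, Xty)
    yhat = [sum(X[i][a] * beta[a] for a in range(p)) for i in range(n)]
    ybar = sum(y) / n
    ss_res = sum((y[i] - yhat[i]) ** 2 for i in range(n))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def vif(columns):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    the j-th regressor on all remaining regressors (plus an intercept)."""
    out = []
    for j, xj in enumerate(columns):
        others = [[1.0] + [columns[k][i] for k in range(len(columns)) if k != j]
                  for i in range(len(xj))]
        out.append(1.0 / (1.0 - r_squared(xj, others)))
    return out

# Invented data: x2 is nearly 2 * x1, so x1 and x2 share a near-linear
# dependency and both receive very large VIFs; x3 is unrelated.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
x3 = [5.0, 1.0, 4.0, 2.0, 6.0, 3.0]
print([round(v, 1) for v in vif([x1, x2, x3])])
```

Large VIFs for the first two columns flag the near-linear dependency, while the unrelated third column stays moderate.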
RIDGE REGRESSION:
• The ridge regression estimator minimizes the penalized sum of squares $(y - X\beta)'(y - X\beta) + k\,\beta'\beta$, where $k \ge 0$ is the biasing parameter.
• On solving the resulting normal equations $(X'X + kI)\beta = X'y$, we get $\hat{\beta}_R = (X'X + kI)^{-1}X'y$ (for $k = 0$ this is the OLS estimator).
• The bias of $\hat{\beta}_R$ is $E(\hat{\beta}_R) - \beta = -k(X'X + kI)^{-1}\beta$, so the estimator is biased for every $k > 0$.
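As a purely illustrative sketch (not part of the report), the ridge estimator $(X'X + kI)^{-1}X'y$ can be computed directly for two centered regressors via the closed-form 2×2 inverse; the near-collinear toy data below are invented:

```python
def ridge_2d(X, y, k):
    """Ridge estimator (X'X + kI)^{-1} X'y for two centered regressors,
    using the closed-form inverse of the 2x2 matrix X'X + kI."""
    s11 = sum(r[0] * r[0] for r in X) + k
    s22 = sum(r[1] * r[1] for r in X) + k
    s12 = sum(r[0] * r[1] for r in X)
    b1 = sum(r[0] * yi for r, yi in zip(X, y))
    b2 = sum(r[1] * yi for r, yi in zip(X, y))
    det = s11 * s22 - s12 * s12
    return [(s22 * b1 - s12 * b2) / det, (s11 * b2 - s12 * b1) / det]

# Toy data: the second column is almost a copy of the first, so X'X is
# nearly singular and the OLS solution (k = 0) is unstable.
X = [[-2.0, -2.02], [-1.0, -0.97], [0.0, 0.03], [1.0, 0.98], [2.0, 1.99]]
y = [-4.1, -1.9, 0.1, 2.0, 4.0]

beta_ols = ridge_2d(X, y, 0.0)    # erratic, oversized coefficients
beta_ridge = ridge_2d(X, y, 0.5)  # shrunk toward a stable solution
print(beta_ols, beta_ridge)
```

Even a modest $k$ pulls the two coefficients back toward a sensible, similar-sized pair, at the cost of the bias derived above.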
RIDGE TRACE:
• The ridge trace is the graphical display of the ridge regression estimator versus $k$.
• If multicollinearity is present and severe, the instability of the regression coefficients is reflected in the ridge trace.
• As $k$ increases, some of the ridge estimates vary dramatically, and they stabilize at some value of $k$.
• The objective in the ridge trace is to inspect the trace (curve) and find a reasonably small value of $k$ at which the ridge regression estimators are stable.
• The ridge regression estimator with such a choice of $k$ will have a smaller MSE than the variance of the OLS estimator (which, being unbiased, has MSE equal to its variance).
FITTING THE RIDGE REGRESSION MODEL:
• We used the glmnet function in R to fit our ridge model.
• We found the optimum value of the tuning parameter (lambda) for the ridge regression model, 0.7651, using K-fold cross-validation.
• Train set: R-square = 0.84, RMSE = 3.76
• Test set: R-square = 0.85, RMSE = 3.52
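The report obtained its lambda with glmnet in R. As a rough, language-neutral illustration of the same K-fold idea (the data, fold scheme, and lambda grid below are all invented, and this is not the report's code), a pure-Python sketch:

```python
def ridge_2d(X, y, lam):
    """Ridge estimator (X'X + lam*I)^{-1} X'y for two regressors."""
    s11 = sum(r[0] * r[0] for r in X) + lam
    s22 = sum(r[1] * r[1] for r in X) + lam
    s12 = sum(r[0] * r[1] for r in X)
    b1 = sum(r[0] * yi for r, yi in zip(X, y))
    b2 = sum(r[1] * yi for r, yi in zip(X, y))
    det = s11 * s22 - s12 * s12
    return [(s22 * b1 - s12 * b2) / det, (s11 * b2 - s12 * b1) / det]

def cv_choose_lambda(X, y, lambdas, K=3):
    """Pick the lambda with the smallest K-fold cross-validated MSE."""
    n = len(X)
    folds = [list(range(i, n, K)) for i in range(K)]  # interleaved folds
    best_lam, best_mse = None, None
    for lam in lambdas:
        sse = 0.0
        for fold in folds:
            train = [i for i in range(n) if i not in fold]
            beta = ridge_2d([X[i] for i in train], [y[i] for i in train], lam)
            for i in fold:
                pred = beta[0] * X[i][0] + beta[1] * X[i][1]
                sse += (y[i] - pred) ** 2
        mse = sse / n
        if best_mse is None or mse < best_mse:
            best_lam, best_mse = lam, mse
    return best_lam, best_mse

# Invented near-collinear data with y roughly 2 * x1.
X = [[-2.0, -2.02], [-1.0, -0.97], [0.0, 0.03],
     [1.0, 0.98], [2.0, 1.99], [3.0, 3.01]]
y = [-4.1, -1.9, 0.1, 2.0, 4.0, 6.1]

grid = [0.01, 0.1, 0.5, 1.0, 5.0]
lam, mse = cv_choose_lambda(X, y, grid)
print(lam, round(mse, 4))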
PRINCIPAL COMPONENT ANALYSIS:
Principal component analysis, or PCA, is a dimensionality reduction method that is often used to reduce the dimensionality of large data sets by transforming a large set of variables into a smaller one that still contains most of the information in the large set.
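As an illustrative toy example (not from the report), two-dimensional PCA can be carried out by hand from the eigen-decomposition of the 2×2 sample covariance matrix; the data below are invented so that most of the variance lies along one direction:

```python
import math

def pca_2d(pts):
    """Eigenvalues (descending) and the first principal direction of the
    2x2 sample covariance matrix of 2-D points."""
    n = len(pts)
    mx = sum(p[0] for p in pts) / n
    my = sum(p[1] for p in pts) / n
    a = sum((p[0] - mx) ** 2 for p in pts) / (n - 1)            # var(x)
    c = sum((p[1] - my) ** 2 for p in pts) / (n - 1)            # var(y)
    b = sum((p[0] - mx) * (p[1] - my) for p in pts) / (n - 1)   # cov(x, y)
    d = math.sqrt((a - c) ** 2 + 4 * b * b)
    l1, l2 = (a + c + d) / 2, (a + c - d) / 2
    # (b, l1 - a) solves (Cov - l1*I) v = 0 whenever b != 0.
    v = (b, l1 - a)
    norm = math.hypot(v[0], v[1])
    return (l1, l2), (v[0] / norm, v[1] / norm)

# Points scattered tightly around the line y = x.
pts = [(2.0, 1.9), (0.5, 0.6), (1.0, 1.2), (1.5, 1.4), (2.5, 2.6), (0.0, 0.1)]
(l1, l2), pc1 = pca_2d(pts)
print("explained by PC1:", round(l1 / (l1 + l2), 3), "direction:", pc1)
```

Because the first eigenvalue dominates, keeping only the first principal component retains nearly all of the variance, which is exactly the "smaller set that still contains most of the information" idea above.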