Linear Regression


Secondary Data Analysis

• Linear Regression
• Multiple Regression
• Logistic Regression
Multiple Linear Regression

• Multiple regression is an extension of simple linear regression. It is used when we want to predict the value of a variable based on the values of two or more other variables. The variable we want to predict is called the dependent variable (or sometimes the outcome, target, or criterion variable).
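The slides that follow describe SPSS menu steps. As a language-neutral companion, here is a minimal sketch of fitting such a model in Python with statsmodels; the data, variable names, and values are hypothetical, invented only for illustration:

import pandas as pd
import statsmodels.api as sm

# Hypothetical data set: predict 'salary' (dependent variable)
# from 'experience' and 'education' (independent variables).
df = pd.DataFrame({
    "salary":     [30, 35, 41, 52, 60, 66, 71, 80, 84, 92],
    "experience": [1,  2,  3,  5,  7,  8,  9,  11, 12, 14],
    "education":  [12, 12, 14, 14, 16, 16, 18, 18, 18, 20],
})

X = sm.add_constant(df[["experience", "education"]])  # add the intercept term
y = df["salary"]

model = sm.OLS(y, X).fit()   # ordinary least squares fit
print(model.summary())       # coefficients, R square, ANOVA F test, etc.

The later sketches reuse this df, X, and model object.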
Assumptions of Multiple Linear Regression
• No outliers
• Dependent variable is normally distributed
• Linear relationship between dependent and independent variables
• No or little multicollinearity
• No auto-correlation
• Homoscedasticity
• A sample-size rule of thumb for linear regression is at least 20 cases per independent variable in the analysis; for example, a model with three predictors needs at least 60 cases.
No Outliers

• Check Cook’s distance under the Save option while conducting Linear Regression.
• Cook’s distance should be less than 1.
• A new variable will be added to the data showing Cook’s distance for each respondent.
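Outside SPSS, the same check can be sketched with statsmodels, continuing from the hypothetical model fitted above:

# Cook's distance for every case; values above 1 flag influential outliers.
influence = model.get_influence()
cooks_d = influence.cooks_distance[0]   # one value per respondent

print(cooks_d.max())                    # should be less than 1
print((cooks_d >= 1).sum(), "influential case(s)")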
Dependent variable is normally distributed
• Analyze > Descriptive Statistics > Explore
• In Plots, check Histogram and Normality plots with tests
• The Shapiro-Wilk test should have a p value greater than 0.05
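The same Shapiro-Wilk test can be run with scipy, continuing the hypothetical example above:

from scipy import stats

# Shapiro-Wilk test of normality on the dependent variable.
w_stat, p_value = stats.shapiro(df["salary"])

print(p_value)           # p > 0.05 means normality is not rejected
print(p_value > 0.05)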
Linear relationship
• Visually inspect a scatter plot
• Graphs > Chart Builder > Scatter/Dot
• Select the dependent variable for the y axis
• Select each independent variable individually for the x axis
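The same visual check, sketched with matplotlib for the hypothetical data above:

import matplotlib.pyplot as plt

# One scatter plot per independent variable, dependent variable on the y axis.
for predictor in ["experience", "education"]:
    plt.figure()
    plt.scatter(df[predictor], df["salary"])
    plt.xlabel(predictor)
    plt.ylabel("salary")
    plt.title(f"salary vs {predictor}")
plt.show()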
Multicollinearity
• Multicollinearity refers to a situation in which two or more
explanatory variables in a multiple regression model are highly
linearly related.
• Check Collinearity diagnostics under the Statistics option while
conducting Linear Regression.
• VIF value should be less than 10.
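The same diagnostic, sketched with statsmodels on the hypothetical design matrix X from above:

from statsmodels.stats.outliers_influence import variance_inflation_factor

# VIF per predictor (the 'const' column is the intercept, so skip it).
for i, name in enumerate(X.columns):
    if name == "const":
        continue
    vif = variance_inflation_factor(X.values, i)
    print(name, round(vif, 2))   # each value should be less than 10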
No autocorrelation
• Autocorrelation means data that is correlated with itself, as opposed to
being correlated with some other data.
• It may happen in time series data, e.g. stock prices, where a price is
not independent of the previous price.
• Check Durbin-Watson under the Statistics option while conducting Linear
Regression.
• The Durbin-Watson statistic should be between 1.5 and 2.5.
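The same statistic can be computed on the residuals of the hypothetical model above:

from statsmodels.stats.stattools import durbin_watson

# Durbin-Watson statistic on the residuals; a value near 2 means no autocorrelation.
dw = durbin_watson(model.resid)

print(dw)
print(1.5 < dw < 2.5)   # rule of thumb from the slide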
Homoscedasticity

• Homoscedasticity describes a situation in which the error term (that is, the “noise” or random disturbance in the relationship between the independent variables and the dependent variable) is the same across all values of the independent variables.
• To check it, request a plot in the Plots option while conducting Linear Regression: put *ZRESID on the Y axis and *ZPRED on the X axis.
• A random, even spread of points with no funnel shape indicates homoscedasticity.
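A sketch of the same plot for the hypothetical model above, with residuals and predictions standardized by hand:

import matplotlib.pyplot as plt

# Standardized residuals (ZRESID) against standardized predictions (ZPRED).
zpred = (model.fittedvalues - model.fittedvalues.mean()) / model.fittedvalues.std()
zresid = model.resid / model.resid.std()

plt.scatter(zpred, zresid)
plt.axhline(0, linestyle="--")
plt.xlabel("ZPRED (standardized predicted values)")
plt.ylabel("ZRESID (standardized residuals)")
plt.show()   # homoscedastic errors show an even band with no funnel shape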
Regression Interpretation
• In the Model Summary table, check the value of Adjusted R Square
• It should be greater than 0.3
• In the ANOVA table, check the significance level
• It should be less than 0.05, which means that the model is significant
• In the Coefficients table, first check the significance level of each variable
• If the significance value of a variable is greater than 0.05, that variable is
insignificant. Remove the variable and run the regression again. Repeat until
there are no insignificant variables.
• Finally, note down the B (unstandardized coefficient) value of each variable.
• Develop the regression equation: Y = B0 + B1·X1 + B2·X2 + …
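The same quantities can be read off the fitted statsmodels object from the hypothetical example above:

# Model Summary: adjusted R square (want > 0.3 per the slide).
print(model.rsquared_adj)

# ANOVA: p value of the overall F test (want < 0.05).
print(model.f_pvalue)

# Coefficients: p value per variable; drop any with p > 0.05 and refit.
print(model.pvalues)

# B values for the regression equation Y = B0 + B1*X1 + B2*X2 + ...
print(model.params)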
