DS Unit-Iv
Graphing techniques like Q-Q plots help you assess whether the residuals are
normally distributed. If they are, the points fall along a diagonal line
through the center of the graph. If the residuals are not normally distributed,
you can check the data for random outliers or values that are not typical.
Removing the outliers or performing nonlinear transformations can fix the issue.
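The points of a Q-Q plot can be computed by hand, which makes clear what "falling along a diagonal line" means. The sketch below is only an illustration: the residual values are made up, the function names are hypothetical, and the plotting positions (i - 0.5) / n are one common convention among several. It uses only the Python standard library.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def qq_points(residuals):
    """Return (theoretical, sample) quantiles for a normal Q-Q plot."""
    n = len(residuals)
    ordered = sorted(residuals)
    m, s = mean(ordered), stdev(ordered)
    # Standardize the sorted residuals so they are comparable to N(0, 1).
    sample = [(r - m) / s for r in ordered]
    # Theoretical standard-normal quantiles at positions (i - 0.5) / n.
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return theoretical, sample


def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


# Hypothetical residuals from a fitted model (illustrative values only).
residuals = [-1.8, -1.1, -0.7, -0.4, -0.1, 0.0, 0.2, 0.5, 0.9, 1.2, 1.9]
theoretical, sample = qq_points(residuals)

# A correlation close to 1 means the points lie close to a straight line.
qq_correlation = pearson(theoretical, sample)
print(f"Q-Q correlation: {qq_correlation:.3f}")
```

Plotting `sample` against `theoretical` gives the Q-Q plot itself; the correlation is only a rough numeric summary and does not replace looking at the graph.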
Homoscedasticity
Homoscedasticity assumes that the residuals have a constant variance (or standard
deviation) around the mean for every value of x. If they do not, the results of
the analysis might not be accurate. If this assumption is not met, you might have
to change the dependent variable. Because variance occurs naturally in large
datasets, it makes sense to change the scale of the dependent variable. For
example, instead of using the population size to predict the number of fire
stations in a city, you might use population size to predict the number of fire
stations per person.
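A rough way to see whether the residual variance is constant is to compare the spread of the residuals at low values of x with the spread at high values of x. The sketch below is an informal check, not a formal statistical test; the function name `variance_ratio` and all data values are hypothetical.

```python
from statistics import pvariance


def variance_ratio(x, residuals):
    """Ratio of residual variance in the upper half of x to the lower half."""
    pairs = sorted(zip(x, residuals))      # order the residuals by x
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]     # residuals at small x
    high = [r for _, r in pairs[-half:]]   # residuals at large x
    return pvariance(high) / pvariance(low)


x = [1, 2, 3, 4, 5, 6, 7, 8]

# Hypothetical residuals whose spread grows with x (heteroscedastic).
growing = [0.1, -0.1, 0.2, -0.2, 1.0, -1.2, 1.5, -1.4]

# Hypothetical residuals with the same spread everywhere (homoscedastic).
steady = [0.5, -0.5, 0.4, -0.4, 0.5, -0.5, 0.4, -0.4]

ratio_growing = variance_ratio(x, growing)
ratio_steady = variance_ratio(x, steady)
print(f"growing spread: {ratio_growing:.1f}, steady spread: {ratio_steady:.1f}")
```

A ratio near 1 is consistent with constant variance. A ratio far from 1 suggests the assumption is violated, and rescaling the dependent variable, as in the fire stations example, may help.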
What are the types of linear regression?
• Logistic regression
Simple linear regression
Advantages of Simple Linear Regression in R:
1. Easy to implement: R provides built-in functions, such as lm(), to perform Simple Linear
Regression quickly and efficiently.
2. Easy to interpret: Simple Linear Regression models are easy to interpret, as they model a
linear relationship between two variables.
3. Useful for prediction: Simple Linear Regression can be used to make predictions about the
dependent variable based on the independent variable.
4. Provides a measure of goodness of fit: Simple Linear Regression provides a measure of how
well the model fits the data, such as the R-squared value.
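The points above (fitting, prediction, and R-squared) can be illustrated with a small least-squares fit. The notes refer to R's lm(); the sketch below uses plain Python instead so that every step is visible. The data and the function names `fit_line` and `r_squared` are made up for illustration.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Share of the variance in y that the fitted line explains."""
    y_bar = mean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical data: one independent variable x, one dependent variable y.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(x, y)
r2 = r_squared(x, y, slope, intercept)

# Prediction for a new value of the independent variable.
prediction = intercept + slope * 6

print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r2:.4f}")
print(f"predicted y at x=6: {prediction:.2f}")
```

The slope is the expected change in y for a one-unit change in x, which is what makes the model easy to interpret.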
Disadvantages of Simple Linear Regression in R:
In a multinomial logistic regression model, the dependent variable has three or more
possible outcomes; however, these values have no specified order. For example, movie studios
want to predict what genre of film a moviegoer is likely to see to market films more
effectively. A multinomial logistic regression model can help the studio to determine the
strength of influence a person's age, gender, and dating status may have on the type of film
that they prefer. The studio can then orient an advertising campaign of a specific movie
toward a group of people likely to go see it.
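To make the movie genre example concrete, the sketch below shows how a multinomial model turns one linear score per genre into probabilities that sum to 1, using the softmax function. The coefficients are invented for illustration and are not estimated from any data; a real model would learn them and would also include the other predictors. Only age is used here to keep the example short.

```python
import math


def softmax(scores):
    """Convert a dict of linear scores into probabilities that sum to 1."""
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {k: math.exp(v - top) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def genre_probabilities(age, coefficients):
    """Probability of each unordered outcome (genre) for a given age."""
    scores = {genre: b0 + b1 * age for genre, (b0, b1) in coefficients.items()}
    return softmax(scores)


# Hypothetical (intercept, age coefficient) per genre; "action" is the baseline.
coefficients = {
    "action": (0.0, 0.0),
    "comedy": (0.5, -0.01),
    "drama": (-1.0, 0.03),
}

probs_30 = genre_probabilities(30, coefficients)
probs_60 = genre_probabilities(60, coefficients)

for age, probs in ((30, probs_30), (60, probs_60)):
    likely = max(probs, key=probs.get)
    print(f"age {age}: most likely genre is {likely}")
```

Because the genres have no order, the model keeps one set of coefficients per genre relative to a baseline rather than a single ordered scale.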
Ordinal logistic regression:
where:
The p-values in the output also give us an idea of how effective each predictor
variable is at predicting the probability of default: