Statistical Testing and Prediction Using Linear Regression: Abstract
Abstract: -
We assume familiarity with the basics of R, including variables, matrices, data
frames, and functions, and we will use the ggplot2 package to make
visualizations of our data. Some familiarity with the mathematics behind
hypothesis tests, confidence intervals, and p-values will also be useful for
understanding the test we are going to implement. We will not go in depth into
the mathematical formulas or justifications behind our project, which is simple
linear regression analysis. Instead, we will focus on how to implement the test
in R. This covers only a small part of what R offers; once you know how to
implement these methods, it is easy to explore others.
Introduction: -
Linear regression is a very useful technique in data science. Many people are
familiar with this type of model from graphs on which straight lines are
overlaid. It is used to predict values or to evaluate whether there is a
linear relationship between variables.
Literature Survey: -
1) Multiple responses for each level of the predictor
2) Regression diagnostics.
Leverage :-
Residuals:-
As the residuals are the differences between the observed and predicted
values along a vertical plane, they provide a measure of how much of an
outlier each point is in y-space (on the y-axis). Outliers are identified by
relatively large residual values. Residuals can also be standardized and
studentized; the latter can be compared across different models and follow a t
distribution, enabling the probability of obtaining a given residual to be
determined. Plots of residuals against predicted y values (residual plots)
are also useful diagnostic tools for investigating the linearity and
homogeneity of variance assumptions.
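The residual diagnostics described above can be sketched in R as follows. This is a minimal illustration, not the project's exact code: it assumes a simple regression of mpg on wt from the built-in mtcars data set, and the object names (fit, raw_res, std_res, stud_res, p_val) are chosen here for illustration only.

```r
# Assumed example model: fuel economy (mpg) explained by car weight (wt).
fit <- lm(mpg ~ wt, data = mtcars)

raw_res  <- residuals(fit)  # observed minus predicted mpg
std_res  <- rstandard(fit)  # standardized (internally studentized) residuals
stud_res <- rstudent(fit)   # externally studentized residuals

# Externally studentized residuals follow a t distribution with
# n - p - 1 degrees of freedom, so each one can be converted into a
# two-sided probability of seeing a residual at least that extreme.
df_t  <- df.residual(fit) - 1
p_val <- 2 * pt(abs(stud_res), df = df_t, lower.tail = FALSE)

# The three cars with the largest studentized residuals (candidate outliers).
print(head(sort(abs(stud_res), decreasing = TRUE), 3))

# Residual plot: residuals against predicted values. A funnel or curve in
# this plot suggests unequal variance or non-linearity.
plot(fitted(fit), raw_res,
     xlab = "Predicted mpg", ylab = "Residual",
     main = "Residuals vs predicted values")
abline(h = 0, lty = 2)
```

The p-values here are unadjusted, so with many observations a few small values are expected by chance alone.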
Cook’s D:-
• splines - join together a series of polynomial fits that have been generated
after the entire data cloud is split up into a number of smaller windows, the
We can access one of these columns using the dollar sign, for example:
Input= mtcars$mpg
Output=
Input= mtcars$wt
Output=
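A minimal sketch of how these two columns can be inspected in an R session, assuming the built-in mtcars data set that ships with R. The variable names mpg and wt on the left-hand side are illustrative.

```r
# mtcars is part of the datasets package that is loaded with base R.
mpg <- mtcars$mpg  # miles per gallon, returned as a numeric vector
wt  <- mtcars$wt   # weight in units of 1000 lbs, also a numeric vector

print(head(mpg))     # first six fuel-economy values
print(head(wt))      # first six weights
print(summary(mpg))  # five-number summary plus the mean

# The dollar sign returns the same vector as double-bracket indexing by name.
print(identical(mpg, mtcars[["mpg"]]))
```

Because `$` returns a plain vector, the result can be passed directly to functions such as mean() or used as a variable in a model formula.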
Now we can see the straight line on our ggplot. The grey area indicates the
uncertainty in the fit: it is a 95% confidence interval for where the true
trend line could be. It is worth noting that this is not a perfect linear
model: we can see that the values of both variables have a tendency to be
higher than we would predict.
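The fitted line and its grey band can be reproduced along the following lines. This is a sketch under stated assumptions: it takes the model to be mpg regressed on wt from mtcars (the two columns shown earlier), and the names fit, grid, and band are hypothetical. The plotting step runs only if ggplot2 is installed.

```r
# Assumed model: mpg as a linear function of wt.
fit <- lm(mpg ~ wt, data = mtcars)

# Test of the slope: estimate and p-value for H0 "slope equals zero".
coef_table <- summary(fit)$coefficients
slope      <- coef_table["wt", "Estimate"]
slope_p    <- coef_table["wt", "Pr(>|t|)"]

# 95% confidence intervals for the intercept and slope.
ci <- confint(fit, level = 0.95)
print(ci)

# 95% confidence band for the mean response over a grid of weights.
# This is the quantity that the grey area around the line represents.
grid <- data.frame(wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 50))
band <- cbind(grid, predict(fit, newdata = grid,
                            interval = "confidence", level = 0.95))
print(head(band))  # columns: wt, fit, lwr, upr

# Scatter plot with the fitted line and its 95% band drawn by ggplot2.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
    geom_point() +
    geom_smooth(method = "lm", formula = y ~ x, level = 0.95)
  print(p)
}
```

The band is narrowest near the mean weight and widens toward the edges of the data, which is why predictions far from the observed range should be treated with more caution.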
Conclusion:-
We inferred that simple linear regression analysis lets us analyze a
response variable from an independent one whenever we need to, without
starting from the beginning each time we add information.