What Is Linear Regression


Linear regression is a predictive statistical approach for modelling the relationship between a dependent variable and a given set of independent variables.

It is a linear approach to modelling the relationship between a dependent variable and one or more independent variables. When we have only one independent variable, it is called simple linear regression. For more than one independent variable, the process is called multiple linear regression.

Therefore, minimizing the error between the model’s predictions and the actual data means performing the following steps for each x value in your data set:

 Use the linear regression equation, with values for A and B, to calculate a prediction for each value of x.
 Calculate the error for each value of x by subtracting the prediction for that x from the actual, known data.
 Sum the errors of all of the points to identify the total error from a linear regression equation using values for A and B.
Keep in mind that some errors will be positive while others will be negative. If they are simply summed, these errors partially cancel each other out, bringing the total misleadingly close to 0 even when the individual predictions are poor.
Take, for instance, two points, one with an error of 5 and the other with an error of -10. While both points together should be treated as contributing 15 total points of error, the method described above treats them as -5 points of error. To overcome this problem, algorithms that fit linear regression models use the squared error instead of the raw error. In other words, the formula for calculating error takes the form: Error = (Actual − Prediction)²
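The steps above can be sketched in a few lines of Python (the function name `sum_squared_error` and the sample values are ours, not from the text):

```python
def sum_squared_error(xs, ys, a, b):
    """Total squared error of the line y = a + b*x over the data."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Two residuals, +5 and -10: summed raw they misleadingly give -5,
# but squared they give 25 + 100 = 125, so both points count.
raw_total = 5 + (-10)
squared_total = 5 ** 2 + (-10) ** 2
print(raw_total, squared_total)  # -5 125
```

Squaring also weights large errors more heavily than small ones, which is exactly what least-squares fitting exploits.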
Linear Regression Model Representation
The linear regression representation is a linear equation that combines a specific set of input values (x), the solution to which is the predicted output (y) for that set of input values.

For example, in a simple regression problem (a single x and a single y), the form of the model would be:

y = B0 + B1*x, where

 B0 represents the intercept
 B1 represents the coefficient (slope)
 x represents the independent variable
 y represents the output or the dependent variable

The coefficients can be estimated directly from the data:

B1 = correlation(x, y) * σ(y) / σ(x)

B0 = mean(y) − B1 * mean(x)

where σ(x) and σ(y) represent the standard deviations of x and y.

Ordinary Least Squares


When we have more than one input, we can use Ordinary Least Squares to estimate the values of the coefficients.
The Ordinary Least Squares procedure seeks to minimize the sum of the squared residuals. This means that given a regression line through the data, we calculate the distance from each data point to the regression line, square it, and sum all of the squared errors together. This is the quantity that ordinary least squares seeks to minimize.
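A minimal sketch of OLS with two inputs, assuming NumPy is available (the data are made up so that the fit is exact):

```python
import numpy as np

# Design matrix: an intercept column plus two input features.
# The targets satisfy y = 1 + 2*x1 + 3*x2 exactly.
X = np.array([[1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])
y = np.array([1.0, 3.0, 4.0, 6.0])

# lstsq minimizes ||X @ beta - y||^2, i.e. the sum of squared residuals.
beta, residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # ≈ [1. 2. 3.]
```

Because the data were generated without noise, the recovered coefficients match the true intercept and slopes; with noisy data, `beta` would be the least-squares compromise instead.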

Some applications of Linear Regression:

1. Studying engine performance from test data in automobiles.
2. Least squares regression is used to model causal relationships between parameters in biological systems.
3. OLS (ordinary least squares) regression is used in weather data analysis.
4. Linear regression is used in market research studies and customer survey results analysis.
5. Linear regression is used in observational astronomy. A number of statistical tools and methods are used in astronomical data analysis, and there are entire libraries in languages like Python meant to do data analysis in astrophysics.

What Are the Limits of Linear Regression?

Just like all algorithms, there are limits to the performance of linear regression.
As we’ve seen, the linear regression model is only capable of returning straight lines. This makes it wholly unsuited to data sets with any sort of curve, such as exponential or logarithmic trends.
Simple linear regression only works when there is a single dependent variable and a single independent variable. If you want to include more than one independent variable in your model, you’ll need to use multiple regression.
Finally, don’t use a linear regression model to predict values outside the range of your training data set. There is no way to know that the same trends hold outside the training data, and you may need a very different model to predict the behavior of the data outside those ranges. Because of this uncertainty, extrapolation can lead to inaccurate predictions.
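To illustrate, here is a small sketch (with made-up quadratic data, not from the text) of how badly a straight line extrapolates when the true relationship is curved:

```python
import statistics as st

# Synthetic data: the true relationship is y = x**2 (a curve).
xs = list(range(1, 11))            # training range: x = 1..10
ys = [x ** 2 for x in xs]

# Closed-form least-squares slope and intercept for one input.
xbar, ybar = st.mean(xs), st.mean(ys)
b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
    sum((x - xbar) ** 2 for x in xs)
a = ybar - b * xbar

# Within the training range the line is a rough approximation,
# but far outside it the prediction falls apart.
print(a + b * 20)  # linear prediction at x = 20: 198.0
print(20 ** 2)     # true value: 400
```

Inside the 1–10 range the fitted line tracks the curve loosely; at x = 20 it predicts 198 where the true value is 400, roughly half the correct answer.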
Example:
The number of man-hours and the corresponding productivity (in units) are furnished below. Fit a simple linear regression equation Ŷ = a + bx applying the method of least squares.

Man-hours (x):     3.6   4.8   7.2   6.9   10.7   6.1   7.9   9.5   5.4
Productivity (y):  9.3   10.2  11.5  12.0  18.6   13.2  10.8  22.7  12.7

Solution:

The simple linear regression equation to be fitted for the given data is

ŷ = a + bx

Here, the estimates of a and b can be calculated using their least squares estimates:

b = (nΣxy − Σx Σy) / (nΣx² − (Σx)²)

a = ȳ − b x̄

From the given data, the following calculations are made with n = 9:

Σx = 62.1, Σy = 121, Σxy = 897.13, Σx² = 468.97

Substituting the column totals into the estimates of a and b, their values can be calculated as follows:

b = (9 × 897.13 − 62.1 × 121) / (9 × 468.97 − 62.1²) = 560.07 / 364.32

Thus, b = 1.5373.

a = ȳ − b x̄

a = 121/9 − (1.5373 × 62.1/9)

= 13.444 − 10.607

Hence, a = 2.837.

Therefore, the required simple linear regression equation fitted to the given data is

ŷ = 2.837 + 1.537x

It should be noted that the value of Y can be estimated using the above fitted
equation for the values of x in its range i.e., 3.6 to 10.7.

Goodness of our fit:

With ȳ = 121/9 ≈ 13.444, the deviations and residuals for each point are:

x       y       y − ȳ     (y − ȳ)²    ŷ        y − ŷ     (y − ŷ)²
3.6     9.3     -4.144    17.176      8.370    0.930     0.865
4.8     10.2    -3.244    10.526      10.215   -0.015    0.000
7.2     11.5    -1.944    3.781       13.903   -2.403    5.776
6.9     12.0    -1.444    2.086       13.442   -1.442    2.080
10.7    18.6    5.156     26.580      19.283   -0.683    0.466
6.1     13.2    -0.244    0.060       12.213   0.987     0.975
7.9     10.8    -2.644    6.993       14.979   -4.179    17.467
9.5     22.7    9.256     85.665      17.439   5.262     27.683
5.4     12.7    -0.744    0.554       11.137   1.563     2.444
62.1    121               153.422                        57.756

St = Σᵢ (yᵢ − ȳ)² = 153.422

Sr = Σᵢ (yᵢ − ŷᵢ)² = 57.756

R² = (St − Sr) / St = (153.422 − 57.756) / 153.422 ≈ 0.6235
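As a sanity check, the whole worked example can be reproduced in a few lines of Python (the variable names are ours; the data are the nine (x, y) pairs from the table above):

```python
import statistics as st

x = [3.6, 4.8, 7.2, 6.9, 10.7, 6.1, 7.9, 9.5, 5.4]
y = [9.3, 10.2, 11.5, 12.0, 18.6, 13.2, 10.8, 22.7, 12.7]

# Least-squares slope and intercept (closed form for one input).
xbar, ybar = st.mean(x), st.mean(y)
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
    sum((xi - xbar) ** 2 for xi in x)
a = ybar - b * xbar

s_t = sum((yi - ybar) ** 2 for yi in y)                      # total sum of squares
s_r = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # residual sum of squares
r2 = (s_t - s_r) / s_t

print(a, b, r2)
```

The printed values agree with the hand calculation: a ≈ 2.837, b ≈ 1.5373, and R² ≈ 0.6235 (small differences in the last decimal place come from rounding in the hand-computed table).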
