Evaluation Metrics for Regression: Dr. Jasmeet Singh, Assistant Professor, CSED, TIET, Patiala


Regression Evaluation Metrics
 The performance of a regression model is generally measured in terms of prediction error, i.e., the difference between the actual values and the predicted values for all the instances in the test set.

 The various error metrics used in regression analysis are:

1. Mean Absolute Error

2. Mean Squared Error

3. Root Mean Squared Error

4. R2 Score (Coefficient of Determination)

5. Adjusted R2 Score
Mean Absolute Error
 The Mean Absolute Error (MAE) is the average of all absolute errors, where an absolute error is the absolute value of the difference between the measured (predicted) value and the "true" (actual) value.

Mean Absolute Error (MAE) = (1/n) Σᵢ |yᵢ − ŷᵢ|

 where yᵢ is the actual value, ŷᵢ is the predicted value for the ith input of the test set, and n is the total number of test samples.
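As a quick sketch, MAE can be computed in plain Python (the function name and the example values below are illustrative, not taken from a specific library):

```python
def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences |y_i - y_hat_i|."""
    n = len(y_true)
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / n

# Illustrative values: actual vs. predicted
print(mean_absolute_error([2, 4, 6, 7], [2.601, 3.830, 5.059, 7.517]))  # ≈ 0.557
```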
Mean Squared Error
 The mean squared error (MSE) or mean squared deviation (MSD) of an estimator (of a procedure for estimating an unobserved quantity) measures the average of the squares of the errors, that is, the average squared difference between the estimated values and the actual values.

Mean Squared Error (MSE) = (1/n) Σᵢ (yᵢ − ŷᵢ)²

 where yᵢ is the actual value, ŷᵢ is the predicted value for the ith input of the test set, and n is the total number of test samples.
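A minimal sketch of MSE in plain Python (illustrative function name and values):

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences (y_i - y_hat_i)^2."""
    n = len(y_true)
    return sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / n

# Illustrative values: actual vs. predicted
print(mean_squared_error([2, 4, 6, 7], [2.601, 3.830, 5.059, 7.517]))  # ≈ 0.386
```

Because the errors are squared before averaging, MSE penalizes large errors more heavily than MAE does.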
Root Mean Squared Error
 The root-mean-square deviation (RMSD) or root-mean-square error (RMSE) is a frequently used measure of the differences between the values (sample or population values) predicted by a model and the observed values.
 It is the square root of the mean squared error.

Root Mean Squared Error (RMSE) = √[ (1/n) Σᵢ (yᵢ − ŷᵢ)² ]

where yᵢ is the actual value, ŷᵢ is the predicted value for the ith input of the test set, and n is the total number of test samples.
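RMSE is simply the square root of the MSE, which brings the metric back to the same units as y. A sketch (illustrative names):

```python
import math

def root_mean_squared_error(y_true, y_pred):
    """Square root of the mean squared error."""
    n = len(y_true)
    mse = sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / n
    return math.sqrt(mse)

# Illustrative values: actual vs. predicted
print(root_mean_squared_error([2, 4, 6, 7], [2.601, 3.830, 5.059, 7.517]))  # ≈ 0.621
```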
R2 Score/ Coefficient of Determination
It measures the proportion of the variation in the dependent variable explained by all the independent variables in the model.
 It assumes that every independent variable in the model helps to explain variation in the dependent variable.
 It is measured as the ratio of the explained variance of the model to the total variance of the data:

R² = Σᵢ (ŷᵢ − ȳ)² / Σᵢ (yᵢ − ȳ)²

 where yᵢ is the actual value, ŷᵢ is the predicted value for the ith input of the test set, ȳ is the mean of the actual values of y, and n is the total number of test samples.
R2 Score
Alternately, the R² score is measured from the unexplained variance as follows:

R² = 1 − SSE/SST = 1 − Σᵢ (yᵢ − ŷᵢ)² / Σᵢ (yᵢ − ȳ)²

 where SSE denotes the sum of squared errors and SST denotes the total sum of squares.
 The value of R² is at most 1 and can be negative. R² is negative when the chosen model does not follow the trend of the data, and so fits worse than a horizontal line at the mean of the data.
 Mathematically, this is possible when the error sum of squares from the model is larger than the total sum of squares about the horizontal line.
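The unexplained-variance form above translates directly into code. A sketch in plain Python (illustrative function name and values):

```python
def r2_score(y_true, y_pred):
    """R^2 = 1 - SSE/SST (unexplained-variance form)."""
    mean_y = sum(y_true) / len(y_true)
    sse = sum((a - p) ** 2 for a, p in zip(y_true, y_pred))  # sum of squared errors
    sst = sum((a - mean_y) ** 2 for a in y_true)             # total sum of squares
    return 1 - sse / sst

# Illustrative values: actual vs. predicted
print(r2_score([2, 4, 6, 7], [2.601, 3.830, 5.059, 7.517]))  # ≈ 0.895
```

Note that if SSE exceeds SST, the returned value is negative, matching the discussion above.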
Significance of R2 Score
 R-squared is a statistical measure of how close the data are to the fitted regression line.

• 0% indicates that the model explains none of the variability of the response data around
its mean.

•100% indicates that the model explains all the variability of the response data around its
mean.

The higher the R-squared, the better the model fits your data.
Evaluation Metrics- Numerical Example
Consider that the number of lectures per day (x) affects the number of hours spent at university per day (y). The equation of the regression line is

y = 0.143 + 1.229x

For the test set shown in the table, find
(i) MAE
(ii) MSE
(iii) RMSE
(iv) R² Score

S.No  x  y
1     2  2
2     3  4
3     4  6
4     6  7
Evaluation Metrics- Numerical Example
S.No   x   y   ŷ = 0.143 + 1.229x   Error = y − ŷ   Abs. Error = |y − ŷ|   Sq. Error   y − mean(y)   SST
1      2   2   2.601                −0.601          0.601                  0.361201    −2.75         7.5625
2      3   4   3.830                 0.170          0.170                  0.028900    −0.75         0.5625
3      4   6   5.059                 0.941          0.941                  0.885481     1.25         1.5625
4      6   7   7.517                −0.517          0.517                  0.267289     2.25         5.0625
Total      19                                       2.229                  1.542871     0            14.75
Evaluation Metrics- Numerical Example
MAE = 2.229 / 4 = 0.557

MSE = 1.542871 / 4 = 0.386

RMSE = √0.386 = 0.621

R² = 1 − SSE/SST = 1 − 1.542871/14.75 = 1 − 0.105 = 0.895
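The worked example can be reproduced with a short script (plain Python; the variable names are mine):

```python
# Worked example: test set and regression line y_hat = 0.143 + 1.229*x
x = [2, 3, 4, 6]
y = [2, 4, 6, 7]
y_hat = [0.143 + 1.229 * xi for xi in x]

n = len(y)
mae = sum(abs(a - p) for a, p in zip(y, y_hat)) / n
sse = sum((a - p) ** 2 for a, p in zip(y, y_hat))
mse = sse / n
rmse = mse ** 0.5
mean_y = sum(y) / n
sst = sum((a - mean_y) ** 2 for a in y)
r2 = 1 - sse / sst

print(round(mae, 3), round(mse, 3), round(rmse, 3), round(r2, 3))
# 0.557 0.386 0.621 0.895
```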
Adjusted R2 Score
 It measures the proportion of variation explained by only those independent variables that really affect the dependent variable.

It penalizes you for adding independent variables that do not affect the dependent variable.

 Every time you add an independent variable to a model, the R-squared increases, even if the independent variable is insignificant; it never declines.

Adjusted R-squared increases only when the independent variable is significant and affects the dependent variable.
Adjusted R2 Score
 Adjusted R² is computed as follows:

Adjusted R² = 1 − [(1 − R²)(N − 1) / (N − k − 1)]

 where R² is the sample R-squared, k is the number of predictors (independent variables), and N is the total sample size.

 The Adjusted R² score must be used to compare regression models with different numbers of predictors, and when we want to decide which predictors in our training set are important.
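The adjustment formula is a one-liner. A sketch (illustrative function name; the example plugs in the R² from the worked example with N = 4 samples and k = 1 predictor):

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (N - 1) / (N - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# R^2 = 0.8954 from the worked example, N = 4 samples, k = 1 predictor
print(adjusted_r2(0.8954, 4, 1))  # ≈ 0.843
```

Because (N − 1)/(N − k − 1) > 1 whenever k ≥ 1, the adjusted value is always below the raw R², and the penalty grows as more predictors are added.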
