Machine Learning: Lecture 13: Model Validation Techniques, Overfitting, Underfitting
Model Validation Techniques
Purpose: to estimate the performance of a classifier on previously unseen
data (the test set)
Holdout
Random subsampling
Cross validation (CV)
Leave-one-out CV (LOOCV)
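The four techniques listed above differ mainly in how the sample indices are divided into training and test parts. The following is a minimal sketch using only the Python standard library; the function names, the default 30% test fraction, and the fixed seeds are illustrative assumptions, not part of the lecture.

```python
import random

def holdout_split(n, test_fraction=0.3, seed=0):
    """Holdout: shuffle the indices once and cut them into train and test parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(round(n * test_fraction))
    return idx[n_test:], idx[:n_test]

def random_subsampling(n, repeats, test_fraction=0.3):
    """Random subsampling: repeat the holdout split with a different shuffle each time."""
    return [holdout_split(n, test_fraction, seed=s) for s in range(repeats)]

def k_fold_splits(n, k, seed=0):
    """k-fold CV: every sample lands in the test part exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k folds of nearly equal size
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

def loocv_splits(n):
    """Leave-one-out CV is k-fold CV with k equal to the number of samples."""
    return k_fold_splits(n, k=n)
```

In each case the model is fitted on the training indices and scored on the test indices; for the repeated schemes the scores are then averaged over all splits.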
A model with an optimal balance of bias and variance neither overfits nor underfits the data.
Overfitting and Underfitting (Cont.)
Underfitting
Underfitting refers to a model that can neither fit the training data
nor generalize to new data.
Underfitting happens when a model is unable to capture the
underlying pattern of the data. These models usually have high bias
and low variance.
It happens when we have too little data to build an
accurate model or when we try to fit a linear model to
nonlinear data.
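The case of a linear model on nonlinear data can be illustrated with a small sketch. The data set (y = x^2 on seven integer points) and the helper names are assumptions chosen for illustration; the line is fitted by ordinary least squares.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mse(predictions, targets):
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]          # noise-free quadratic pattern

slope, intercept = fit_line(xs, ys)
line_mse = mse([slope * x + intercept for x in xs], ys)
print(f"slope={slope:.2f} intercept={intercept:.2f} training MSE={line_mse:.2f}")
```

Even though the data contain no noise, the best possible straight line still leaves a large error on the training set itself, which is the signature of underfitting: the model is too simple for the pattern.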
Overfitting
Overfitting refers to a model that fits the training data too well but
generalizes poorly to new data.
Overfitting happens when our model captures the noise along with
the underlying pattern in data.
These models have low bias and high variance.
How to detect overfitting and underfitting?
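One common check, offered here as an assumption since the slide gives no method, is to compare the error on the training data with the error on held-out data. The sketch below uses a k-nearest-neighbour regressor on synthetic noisy sine data; the model, the data, the noise level, and the values of k are all illustrative choices.

```python
import math
import random

def knn_predict(train_x, train_y, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k

def knn_mse(train_x, train_y, xs, ys, k):
    """Mean squared error of the k-NN model on the points (xs, ys)."""
    return sum((knn_predict(train_x, train_y, x, k) - y) ** 2
               for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)
xs = [0.1 * i for i in range(60)]
ys = [math.sin(x) + rng.gauss(0, 0.3) for x in xs]  # pattern plus noise
train_x, train_y = xs[0::2], ys[0::2]   # even-indexed points for training
val_x, val_y = xs[1::2], ys[1::2]       # odd-indexed points held out

errors = {}
for k in (1, 5, len(train_x)):
    errors[k] = (knn_mse(train_x, train_y, train_x, train_y, k),
                 knn_mse(train_x, train_y, val_x, val_y, k))
    print(f"k={k:2d}  train MSE={errors[k][0]:.3f}  validation MSE={errors[k][1]:.3f}")
```

With k = 1 the model memorizes the training points, so its training error is zero while its validation error is not; a large gap of this kind suggests overfitting. With k equal to the number of training points the model predicts the training mean everywhere, so the error stays high on both sets, which suggests underfitting.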
Summary
A model with high bias error underfits the data and makes overly
simplistic assumptions about it
A model with high variance error overfits the data and learns too
much from it, including its noise
A good model is one in which the bias and variance errors are balanced
HW: How to avoid underfitting?
Find out for yourself
HW: How to avoid overfitting?
Overfitting is by far the most common problem in applied machine learning.
Commonly used remedies are:
Cross-Validation
Train with more diverse data
Remove irrelevant input features
Early Stopping
Pruning
Regularization
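As an illustration of one item from the list, early stopping can be sketched as a rule applied to the validation loss recorded after each training epoch. The function name, the `patience` parameter, and the loss values below are hypothetical; real training loops differ in how they store and restore the best weights.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the epoch with the lowest validation loss seen before stopping.

    Training stops once the loss has failed to improve for `patience`
    consecutive epochs; the weights from the returned epoch would be kept.
    """
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop training
    return best_epoch

# Hypothetical validation losses: they fall, then rise as the model starts to overfit.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61]
stop_at = early_stopping_epoch(val_losses, patience=2)
print("keep the weights from epoch", stop_at)
```

The rule relies on the same signal used to detect overfitting: once the validation loss starts rising while training continues, further epochs mostly fit noise.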
Some Learning Materials
Bias and Variance in Machine Learning – A Fantastic Guide for
Beginners!
Gentle Introduction to the Bias-Variance Trade-Off in Machine
Learning
What Are Overfitting and Underfitting in Machine Learning?
Overfitting and Underfitting With Machine Learning Algorithms