
MODEL SELECTION

What is Model Selection?


“The process of selecting the machine learning model most appropriate for a given issue is
known as model selection.”

Model selection is a procedure that can be used to compare models of the same type configured with different hyperparameters, as well as models of entirely different types.

Model selection is also a procedure used by statisticians to examine the relative merits of different predictive methods and to identify which one best fits the observed data. Evaluating a model on the data used to train it is not accepted practice in data science, because it easily produces overoptimistic, overfitted models.

You may have to check things like:

•Overfitting and underfitting
•Generalization error
•Validation for model selection
For certain algorithms, specific data preparation is the best way to expose the problem's structure to the learning algorithm. The next logical step, then, is to define model selection as the process of choosing among complete model development workflows.

So, depending on your use case, you choose an ML model.

How to Choose the Best Model in Machine Learning

The choice of model is influenced by many variables, including dataset, task, model type, etc.

Generally, you need to consider two factors:

•The reason for choosing a model
•The model's performance

So let's explore the reasons behind selecting a model. You can choose a model based on the data and the task:

Type of data
•Images and videos
If your application mainly works with images and videos (for example, image recognition), a Convolutional Neural Network (CNN) model works better with images and videos than other models do.

•Text data or speech data
Similarly, recurrent neural networks (RNNs) are employed if your problem involves speech or text data.

•Numerical data
You may use Support Vector Machines (SVM), logistic regression, or decision trees if your data is numerical.

How to select a model based on the task?


•Classification tasks - SVM, logistic regression, and decision trees.
•Regression tasks - linear regression, random forest, polynomial regression, etc.
•Clustering tasks - k-means clustering, hierarchical clustering.
Therefore, depending on the type of data you have and the task you do, you may use a variety of
models.

Resampling methods
As the name implies, resampling methods are straightforward methods of rearranging data samples to see how well the model performs on samples of data it hasn't been trained on. Resampling, in other words, enables us to determine the model's generalizability.

There are two main types of resampling techniques:


Cross-validation

Cross-validation is a resampling procedure used to evaluate models by splitting the data. Consider a situation where you have two models and want to determine which one is more appropriate for a certain problem. In this case, we can use a cross-validation process.

So, let's say you are working on an SVM model. We divide the dataset into a few groups, say five, and run multiple iterations. In each iteration, one group out of the five is used as test data while the model is trained on the remaining groups. Suppose you calculate the accuracy of each iteration.

Now, let's calculate the mean accuracy across all the iterations, which comes to around 84.4%. You then repeat the same procedure for the logistic regression model.

You can now compare the mean accuracy of the logistic regression model with the SVM. So,
according to accuracy, you might claim that a certain model is better for a given use case.

To implement cross-validation, you can use:

>>> from sklearn import datasets, linear_model
>>> from sklearn.model_selection import cross_val_score
>>> diabetes = datasets.load_diabetes()
>>> X = diabetes.data[:150]
>>> y = diabetes.target[:150]
>>> lasso = linear_model.Lasso()
>>> print(cross_val_score(lasso, X, y, cv=3))
[0.3315057  0.08022103 0.03531816]
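To compare two models as described above, you can run the same cross-validation on each and compare the mean scores. Below is a minimal sketch; the iris dataset and the specific model settings are illustrative choices, not part of the original example.

# Compare SVM and logistic regression by mean cross-validation accuracy.
# The dataset and model settings here are illustrative.
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)
for name, model in [("SVM", SVC()), ("Logistic regression", LogisticRegression(max_iter=1000))]:
    scores = cross_val_score(model, X, y, cv=5)  # 5-fold cross-validation
    print(name, "mean accuracy:", round(scores.mean(), 3))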

Bootstrap

Another resampling technique is the bootstrap, which draws random samples from the data with replacement. It is used to estimate statistics on a population by repeatedly sampling a dataset with replacement.

•It is typically used with smaller datasets.
•The number of bootstrap samples must be chosen.
•Each bootstrap sample should be the same size as the original dataset.
•The statistic of interest is computed on each sample, and the results are then aggregated.
In simple terms, you start by:

•Randomly selecting an observation.
•Noting its value.
•Putting the observation back (sampling with replacement).

You then repeat these steps N times, where N is the number of observations in the initial dataset. The final result is one bootstrap sample with N observations.
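A minimal sketch of this procedure, assuming a small illustrative dataset; numpy's random generator performs the N draws with replacement in one call.

# Draw one bootstrap sample of size N with replacement.
# The data values here are purely illustrative.
import numpy as np

data = np.array([2.1, 3.5, 4.0, 5.2, 6.8])  # original observations
rng = np.random.default_rng(seed=0)

sample = rng.choice(data, size=len(data), replace=True)  # N draws with replacement
print("bootstrap sample:", sample)
print("bootstrap mean:", sample.mean())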

Probabilistic measures

An information criterion is a kind of probabilistic measure that can be used to evaluate the effectiveness of statistical models. These methods use a scoring system, built on the log-likelihood framework of Maximum Likelihood Estimation (MLE), to select the most effective candidate model.

Resampling only focuses on model performance, whereas probabilistic measures concentrate on both model performance and complexity.

•An information criterion (IC) is a statistical metric that yields a score; the model with the lowest score is the most effective.
•Performance is calculated using in-sample data, so a test set is unnecessary. Instead, the score is calculated using all the training data.
•Lower complexity means a straightforward model with fewer parameters that is simple to train and maintain, but that may be unable to capture fluctuations that affect a model's performance.

There are three statistical methods for estimating a model's complexity and how well it fits a dataset:
Akaike Information Criterion (AIC)

AIC is a single numerical score that can be used to identify, among several models, the one most likely to be the best fit for a given dataset. AIC scores are only meaningful when compared with scores for other models on the same dataset.

Lower AIC scores are preferable.

AIC measures how well the model fits the training data and adds a penalty term for model complexity:

AIC = 2K − 2ln(L)

where:

K = the number of independent variables (predictors)
L = the maximum value of the model's likelihood function
N = the number of data points in the training set (especially relevant for small datasets, where a small-sample correction to AIC is often applied)

The drawback of AIC is that it can generalize poorly, since it favors intricate models that capture more of the training data. AIC is also purely relative: all of the tested models might still fit the data poorly.
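A minimal sketch of the formula above; the parameter counts and log-likelihood values are hypothetical, standing in for the output of a fitting procedure.

# Compute AIC = 2K - 2 ln(L) directly from the (hypothetical) maximized
# log-likelihood of each candidate model.
def aic(k, log_likelihood):
    return 2 * k - 2 * log_likelihood

candidates = {"model_a": (3, -120.5), "model_b": (7, -115.2)}  # (K, ln L)
for name, (k, ll) in candidates.items():
    print(name, "AIC =", aic(k, ll))  # lower is better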

Minimum Description Length (MDL)

According to the MDL principle, the best explanation for a limited collection of observed data is the one that allows the greatest compression of that data. Simply put, it is a technique that forms a cornerstone of statistical modeling, pattern recognition, and machine learning.

MDL = L(h) + L(D | h)

where:

h = the model
D = the model's predictions
L(h) = the number of bits needed to describe the model
L(D | h) = the number of bits needed to describe the model's predictions given the model

Bayesian Information Criterion (BIC)

BIC was derived from Bayesian probability theory and is appropriate for models trained by maximum likelihood estimation. BIC is most commonly employed in time series and linear regression models, but it can be applied broadly to any model fitted by maximum likelihood.
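For reference, BIC combines the same quantities as AIC but penalizes parameters more heavily as the dataset grows: BIC = K ln(N) − 2 ln(L). A minimal sketch, reusing the hypothetical values from the AIC example:

# BIC = K ln(N) - 2 ln(L); the log-likelihood values are hypothetical.
import math

def bic(k, n, log_likelihood):
    return k * math.log(n) - 2 * log_likelihood

print("model_a BIC =", bic(3, 150, -120.5))
print("model_b BIC =", bic(7, 150, -115.2))  # complexity penalized harder as N grows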

Structural Risk Minimization (SRM)

Overfitting occurs when the model becomes biased toward the training data, its primary source of learning.

In machine learning, a generalized model must frequently be chosen from a limited data set, which leads to overfitting: the model becomes too fitted to the specifics of the training set and performs poorly on new data. The SRM principle addresses this problem by weighing the model's complexity against how well it fits the training data.

In its common form, SRM minimizes the sum of the empirical risk and a complexity penalty:

R_srm(f) = R_emp(f) + λ J(f)

Here,

R_emp(f) is the empirical risk (the training error),
J(f) is the complexity of the model, and
λ ≥ 0 controls the trade-off between the two.
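As a concrete illustration (an example added here, not part of the original), ridge regression can be read as an instance of this principle: a squared-error empirical risk plus an L2 complexity penalty weighted by the alpha hyperparameter.

# Ridge regression as an instance of complexity-penalized risk minimization:
# larger alpha = stronger penalty on model complexity.
from sklearn import datasets
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = datasets.load_diabetes(return_X_y=True)
for alpha in [0.01, 1.0, 100.0]:
    scores = cross_val_score(Ridge(alpha=alpha), X, y, cv=5)
    print("alpha =", alpha, "mean R^2 =", round(scores.mean(), 3))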


MODEL SELECTION

In machine learning, the process of selecting the top model or algorithm from a list of potential models to address a certain problem is referred to as model selection. It entails assessing and contrasting various models according to how well they perform and choosing the one that achieves the highest accuracy or predictive power.

Because different models have varied levels of complexity, underlying assumptions, and
capabilities, model selection is a crucial stage in the machine-learning pipeline. Finding a model
that fits the training set of data well and generalizes well to new data is the objective. While a model
that is too complex may overfit the data and be unable to generalize, a model that is too simple
could underfit the data and do poorly in terms of prediction.

Model selection typically involves the following steps:

•Problem formulation: Clearly express the issue at hand, including the kind of predictions or task that you'd like the model to carry out (for example, classification, regression, or clustering).
•Candidate model selection: Pick a group of models that are appropriate for the issue at hand.
These models can include straightforward methods like decision trees or linear regression as well as
more sophisticated ones like deep neural networks, random forests, or support vector machines.
•Performance evaluation: Establish metrics for measuring how well each model performs. Common metrics include accuracy, precision, recall, F1-score, mean squared error, and the area under the receiver operating characteristic curve (AUC-ROC). The type of problem and the particular requirements determine which metrics are used.
•Training and evaluation: Each candidate model should be trained using a subset of the available
data (the training set), and its performance should be assessed using a different subset (the
validation set or via cross-validation). The established evaluation measures are used to gauge the
model's effectiveness.
•Model comparison: Evaluate the performance of various models and determine which one
performs best on the validation set. Take into account elements like data handling capabilities,
interpretability, computational difficulty, and accuracy.
•Hyperparameter tuning: Before training, many models require that certain hyperparameters, such as the learning rate, regularisation strength, or the number of hidden layers in a neural network, be configured. Use methods like grid search, random search, or Bayesian optimization to identify these hyperparameters' ideal values.
•Final model selection: After the models have been analyzed and fine-tuned, pick the model that performs the best. This model can then be used to make predictions on fresh, unseen data. (A compact sketch of these steps follows this list.)
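The sketch below ties the steps together on an illustrative dataset: candidates are compared by cross-validation on the training split, the winner is tuned with a grid search, and the final model is scored once on the held-out test set. The dataset and the hyperparameter grid are assumptions made for illustration only.

# Illustrative end-to-end model selection workflow.
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 1. Compare candidate models on the training data.
for name, model in [("svm", SVC()), ("logreg", LogisticRegression(max_iter=1000))]:
    print(name, cross_val_score(model, X_train, y_train, cv=5).mean())

# 2. Tune the chosen model's hyperparameters (grid is illustrative).
search = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}, cv=5)
search.fit(X_train, y_train)

# 3. Final, one-time evaluation on the held-out test set.
print("best params:", search.best_params_)
print("test accuracy:", search.score(X_test, y_test))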

Model Selection in machine learning:

Model selection in machine learning is the process of selecting the best algorithm and model
architecture for a specific job or dataset. It entails assessing and contrasting various models to
identify the one that best fits the data & produces the best results. Model complexity, data handling
capabilities, and generalizability to new examples are all taken into account while choosing a
model. Models are evaluated and contrasted using methods like cross-validation and grid search, as well as metrics like accuracy and mean squared error. Finding a model that balances complexity
and performance to produce reliable predictions and strong generalization abilities is the aim of
model selection.
There are numerous important considerations to bear in mind while selecting a model for machine
learning. These factors assist in ensuring that the chosen model is effective in solving the issue at its
core and has an opportunity for outstanding performance. Here are some crucial things to
remember:
•The complexity of the issue: Determine how complex the issue you're trying to resolve is. Simple
models might effectively solve some issues, but more complicated models can be necessary to fully
represent complex relationships in the data. Take into account the size of the dataset, the complexity
of the input features, and any potential for non-linear connections.
•Data Availability & Quality: Consider the accessibility and caliber of the data you already have.
Using complicated models with a lot of parameters on a limited dataset may result in overfitting.
Such situations may call for simpler models with fewer parameters. Take into account missing data,
outliers, and noise as well as how various models respond to these difficulties.
•Interpretability: Consider whether the model's interpretability is crucial in your particular setting.
Some models, like decision trees or linear regression, offer interpretability by giving precise
insights into the correlations between the input data and the desired outcome. Complex models,
such as neural networks, may perform better but offer less interpretability.
•Model Assumptions: Recognise the assumptions that various models make. For instance, linear regression assumes a linear relationship between the input features and the target variable, while decision trees assume piecewise-constant relationships. Make sure the model you choose is consistent with the fundamental assumptions underpinning the data and the problem.
•Scalability and Efficiency: If you're working with massive datasets or real-time applications, take
the model's scalability and computing efficiency into consideration. Deep neural networks and
support vector machines are two examples of models that could need a lot of time and computing
power to train.
•Regularisation and Generalisation: Assess the model's capacity to apply to fresh, untested data.
By including penalty terms to the objective function of the model, regularisation approaches like L1
or L2 regularisation can help prevent overfitting. When the training data is sparse, regularised
models may perform better in terms of generalization.
•Domain Expertise: Consider your expertise and domain knowledge. On the basis of previous
knowledge of the data or particular features of the domain, consider if particular models are
appropriate for the task. Models that are more likely to capture important patterns can be found by
using domain expertise to direct the selection process.
•Resource Constraints: Take into account any resource limitations you may have, such as constrained memory, processing speed, or time. Make sure that the chosen model can be successfully implemented using the resources at hand; some models require significant resources during training or inference.
•Ensemble Methods: Examine the potential advantages of ensemble methods, which integrate the
results of various models in order to perform more effectively. By utilizing the diversity of several
models' predictions, ensemble approaches, such as bagging, boosting, and stacking, frequently
outperform individual models.
•Evaluation and Experimentation: Experiment with and assess several models thoroughly. Use appropriate evaluation criteria and statistical tests to compare their performance. To evaluate the models' performance on unknown data and reduce the danger of overfitting, use hold-out sets or cross-validation.
Model Selection Techniques

Model selection in machine learning can be done using a variety of methods and tactics. These
methods assist in comparing and assessing many models to determine which is best suited to solve a
certain issue. Here are some methods for selecting models that are frequently used:
•Train-Test Split: With this strategy, the available data is divided into two sets: a training set & a separate test set. The models are trained on the training set and then evaluated on the test set using a predetermined evaluation metric. This method offers a quick and easy way to estimate a model's performance on unseen data.
•Cross-Validation: A resampling approach called cross-validation divides the data into multiple groups, or folds. Each fold in turn serves as the test set while the remaining folds form the training set, and the models are trained and evaluated on each split separately. This lowers the variance of the evaluation and makes it easier to obtain an accurate assessment of the model's performance. Frequently used cross-validation techniques include k-fold, stratified, and leave-one-out cross-validation.
•Grid Search: Hyperparameter tuning is done using the grid search technique. In order to do this, a
grid containing hyperparameter values must be defined, and all potential hyperparameter
combinations must be thoroughly searched. For each combination, the models are trained, assessed,
and their performances are contrasted. Finding the ideal hyperparameter settings to optimize the
model's performance is made easier by grid search.
•Random Search: In this hyperparameter tuning technique, hyperparameter values are sampled at random from specified distributions. In contrast to grid search, which considers every potential combination, random search only investigates a portion of the hyperparameter space. This strategy can be helpful when a thorough search is not feasible due to the size of the search space.
•Bayesian optimization: Bayesian optimization is a more sophisticated method of hyperparameter tuning. It models the relationship between the hyperparameters and the model's performance using a probabilistic model, and it intelligently chooses which set of hyperparameters to investigate next by iteratively evaluating the model and updating the probabilistic model. Bayesian optimization is especially effective when the search space is large and expensive to explore.
•Model averaging: This technique combines the forecasts of various models into a single prediction. For regression problems, this can be accomplished by averaging the predictions, while for classification problems, voting or weighted voting schemes can be used. Model averaging can increase overall prediction accuracy by lowering the bias and variance of individual models (a minimal sketch follows this list).
•Information Criteria: Information criteria offer a numerical assessment of the trade-off between
model complexity and goodness of fit. Examples include the Akaike Information Criterion (AIC)
and the Bayesian Information Criterion (BIC). These criteria discourage the use of too complicated
models and encourage the adoption of simpler models that adequately explain the data.
•Domain Expertise & Prior Knowledge: Prior understanding of the problem and the data, as well as domain expertise, can have a significant impact on model choice. Subject-matter experts may know which models are more suitable given the specifics of the problem and the details of the data.
•Model Performance Comparison: It is vital to evaluate the performance of various models using the right assessment measures. Depending on the issue at hand, these measurements could include F1-score, mean squared error, accuracy, precision, recall, or the area under the receiver operating characteristic curve (AUC-ROC). The best-performing model can be found by comparing many models.
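As a sketch of the model averaging item above, scikit-learn's VotingClassifier combines several classifiers by majority vote; the component models and dataset here are illustrative choices, not prescribed by the text.

# Model averaging via majority ("hard") voting across three classifiers.
from sklearn import datasets
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = datasets.load_iris(return_X_y=True)
ensemble = VotingClassifier(estimators=[
    ("logreg", LogisticRegression(max_iter=1000)),
    ("tree", DecisionTreeClassifier()),
    ("svm", SVC()),
])  # voting="hard" (majority vote) is the default
print("ensemble mean accuracy:", cross_val_score(ensemble, X, y, cv=5).mean())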
