Week 6: Machine Learning


Business Analytics

© 2021 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Predictive Data Mining

Introduction
• An observation, or record, is the set of recorded values of variables
associated with a single entity.
• Supervised learning: Data mining methods for predicting an
outcome based on a set of input variables, or features.
• Supervised learning can be used for:
• Estimation of a continuous outcome.
• Classification of a categorical outcome.

Introduction
The data mining process comprises the following steps:
1. Data sampling. Extract a sample of data that is relevant to the business
problem under consideration.
2. Data preparation. Manipulate the data to put it in a form suitable for formal
modeling.
3. Data partitioning. Divide the sample data into sets for training, validating, and
testing the performance of the data mining algorithm.
4. Model construction. Apply the appropriate data mining technique to the
training data set to accomplish the desired data mining task (classification or
estimation).
5. Model assessment. Evaluate models by comparing performance on the
training and validation data sets.
Data Sampling, Preparation, and Partitioning

• When dealing with large volumes of data, best practice is to extract a representative sample for analysis.
• A sample is representative if the analyst can draw the same conclusions from it as from the entire population of data.
• Data mining algorithms typically are more effective given more data.
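
As a concrete illustration, a minimal sketch of drawing a representative random sample with pandas; the file name customer_records.csv and the sample size are hypothetical.

```python
import pandas as pd

# Hypothetical source of a large volume of data.
df = pd.read_csv("customer_records.csv")

# Draw a simple random sample of 10,000 observations;
# a fixed random_state makes the sample reproducible.
sample = df.sample(n=10_000, random_state=42)

# A quick representativeness check: compare summary statistics
# of the sample against the entire population of data.
print(df.describe())
print(sample.describe())
```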

Data Sampling, Preparation, and Partitioning

• When obtaining a representative sample, it is generally best to include as many variables as possible in the sample.
• After exploring the data with descriptive statistics and visualization, the analyst can eliminate variables that are not of interest.
• The abundance of data in data mining applications simplifies the process of assessing the accuracy of data-based estimates of variable effects.

Data Sampling, Preparation, and Partitioning

• Overfitting occurs when the analyst builds a model that does a great job of explaining the sample of data on which it is based, but fails to accurately predict outside the sample data.
• We can use the abundance of data to guard against the potential for overfitting by splitting the data set into different subsets, as sketched below, for:
• The training (or construction) of candidate models.
• The validation and testing (or performance comparison/assessment) of candidate models and of the future performance of a selected model.
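
A minimal sketch of such a split using scikit-learn; the 60/20/20 training/validation/test proportions and the synthetic data are illustrative assumptions, not a prescribed rule.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the sample data: X = features, y = outcome.
X, y = make_classification(n_samples=1000, random_state=42)

# Hold out 20% of the observations as the test set.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42)

# Split the remainder into training (60% of the total) and
# validation (20% of the total) sets: 0.25 of the remaining 80% is 20%.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=42)
```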

Data Sampling, Preparation, and Partitioning

k-Fold Cross-Validation
• k-Fold Cross-Validation: A robust procedure to train and validate models in which the observations used to train and validate the model are randomly divided into k subsets called folds. In each iteration, one fold is designated as the validation set and the remaining k − 1 folds are designated as the training set. The results of the k iterations are then combined and evaluated.
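
A minimal sketch of k-fold cross-validation with scikit-learn, assuming k = 5 and a logistic regression classifier purely for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, random_state=42)

# cv=5 divides the observations into 5 folds; each fold serves once
# as the validation set while the other 4 folds train the model.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores)         # performance (accuracy) on each fold
print(scores.mean())  # combined result across the 5 iterations
```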

Data Sampling, Preparation, and Partitioning
Class Imbalanced Data
• There are two basic sampling approaches for modifying the class
distribution of the training set:
• Undersampling: Balances the number of Class 1 and Class 0
observations in a training set by removing majority class
observations from the training set.
• Oversampling: Balances the number of Class 1 and Class 0
observations in a training set by inserting copies of minority class
observations into the training set.
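
A minimal sketch of both approaches using pandas and scikit-learn's resample utility; the 900/100 class split is a hypothetical example of imbalance.

```python
import pandas as pd
from sklearn.utils import resample

# Hypothetical imbalanced training set: 900 Class 0, 100 Class 1.
train = pd.DataFrame({"x": range(1000), "y": [0] * 900 + [1] * 100})
majority = train[train["y"] == 0]
minority = train[train["y"] == 1]

# Undersampling: remove majority class observations (keep only 100).
under = pd.concat(
    [resample(majority, replace=False, n_samples=100, random_state=42),
     minority])

# Oversampling: insert copies of minority class observations (up to 900).
over = pd.concat(
    [majority,
     resample(minority, replace=True, n_samples=900, random_state=42)])

print(under["y"].value_counts())
print(over["y"].value_counts())
```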

Performance Measures
Evaluating the Classification of Categorical Outcomes
Evaluating the Estimation of Continuous Outcomes

Performance Measures
Evaluating the Classification of Categorical Outcomes:
• By counting the classification errors on a sufficiently large validation set
and/or test set that is representative of the population, we will generate
an accurate measure of the model’s classification performance.
• Classification confusion matrix: Displays a model's correct and incorrect classifications, using the counts of true positives (TP), false negatives (FN), false positives (FP), and true negatives (TN):

                              Predicted Class
                            Positive    Negative
  Actual Class   Positive      TP          FN
                 Negative      FP          TN
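
A minimal sketch with scikit-learn; the label vectors are hypothetical, and labels=[1, 0] is passed so the matrix matches the layout above (positive class first).

```python
from sklearn.metrics import confusion_matrix

# Hypothetical actual and predicted class labels (1 = positive).
y_actual = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred   = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows = actual class, columns = predicted class, positive first.
cm = confusion_matrix(y_actual, y_pred, labels=[1, 0])
tp, fn, fp, tn = cm.ravel()
print(cm)                                  # [[3 1]
                                           #  [1 3]]
print(f"TP={tp} FN={fn} FP={fp} TN={tn}")
```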

Performance Measures
Evaluating the Classification of Categorical Outcomes (cont.):
• Overall error rate: Percentage of misclassified observations:

  Overall error rate = (FN + FP) / (TP + FN + FP + TN)

• One minus the overall error rate is often referred to as the accuracy of the model:

  Accuracy = (TP + TN) / (TP + FN + FP + TN)

• While the overall error rate conveys an aggregate measure of misclassification, it counts misclassifying an actual Class 0 observation as a Class 1 observation (a false positive) the same as misclassifying an actual Class 1 observation as a Class 0 observation (a false negative).
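
Continuing the counts from the confusion-matrix sketch above, the two aggregate measures are simple arithmetic:

```python
# Confusion matrix counts from the hypothetical example above.
tp, fn, fp, tn = 3, 1, 1, 3

n = tp + fn + fp + tn
error_rate = (fn + fp) / n   # proportion of misclassified observations
accuracy = (tp + tn) / n     # equals 1 - error_rate
print(error_rate, accuracy)  # 0.25 0.75
```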

Performance Measures
Evaluating the Classification of Categorical Outcomes (cont.):
• The ability to correctly predict Class 1 (positive) observations is commonly expressed as sensitivity, or recall, and is calculated as:

  Sensitivity = Recall = TP / (TP + FN)

• The ability to correctly predict Class 0 (negative) observations is commonly expressed as specificity and is calculated as:

  Specificity = TN / (TN + FP)
Performance Measures
Evaluating the Classification of Categorical Outcomes (cont.):
• Precision is a measure that corresponds to the proportion of observations predicted to be Class 1 by a classifier that are actually in Class 1:

  Precision = TP / (TP + FP)

• The F1 Score combines precision and sensitivity into a single measure and is defined as:

  F1 Score = 2 × (Precision × Sensitivity) / (Precision + Sensitivity)
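
And a minimal sketch of both measures with scikit-learn, again on the hypothetical labels:

```python
from sklearn.metrics import f1_score, precision_score

y_actual = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred   = [1, 0, 0, 1, 0, 1, 1, 0]

precision = precision_score(y_actual, y_pred)  # TP / (TP + FP)
f1 = f1_score(y_actual, y_pred)                # harmonic mean of P and R
print(precision, f1)  # 0.75 0.75
```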

Performance Measures
Evaluating the Classification of Categorical Outcomes (cont.):
• The receiver operating characteristic (ROC) curve is an alternative
graphical approach for displaying the tradeoff between a classifier’s ability
to correctly identify Class 1 observations and its Class 0 error rate.
• In general, we can evaluate the quality of a classifier by computing the area
under the ROC curve, often referred to as the AUC.
• The greater the area under the ROC curve, i.e., the larger the AUC, the
better the classifier performs.
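
A minimal sketch of computing the ROC curve and the AUC with scikit-learn; the predicted Class 1 probabilities are hypothetical.

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical actual labels and predicted Class 1 probabilities.
y_actual = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob   = [0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3]

# tpr is sensitivity (correctly identified Class 1 observations);
# fpr is the Class 0 error rate (1 - specificity).
fpr, tpr, thresholds = roc_curve(y_actual, y_prob)
print(roc_auc_score(y_actual, y_prob))  # area under the ROC curve
```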

Performance Measures
Figure: Receiver Operating Characteristic (ROC) Curve

