01 - Introduction To ML
Introduction to Machine Learning (ML)
Definitions and core concepts
M. Zappatore
19/01/2022
# Table of contents
› Foreword
› Introduction
› ML core elements
› Artificial Neural Networks and Deep Learning
› Data Preparation & Feature Engineering
› Essential Statistics and Data Visualizations
› Dataset examples
# Foreword
› This course will address the usage of Python scripts for Machine Learning (ML).
› Before introducing core Python programming concepts, as well as specific packages dedicated to ML (and even before presenting the development environments needed to implement, execute, and validate Python scripts), it is worth providing some core concepts and definitions about ML.
› This lesson provides some introductory explanations on ML algorithms and processes.
› The addressed topics will be dealt with in much more detail in other courses for Ph.D. students at your University/Department. Moreover, many of you most probably already have a certain degree of expertise in ML.
› However, if you are not going to attend any of the other courses dedicated to ML, or if you do not have any prior knowledge about ML, you can refer to this lesson as the common knowledge ground about ML, to be referenced during the remaining part of this course.
Introduction
First definitions
› Moreover, many other definitions and concepts are available, too: Data Mining, Pattern Recognition, Statistical Learning, Computational Learning, Computational Statistics, etc.
› This situation is commonly defined as Machine Learning (ML).
› Therefore, it can be said that ML is a computer algorithm that automatically learns patterns and models from (large quantities of) data and then uses the learnt model to make predictions on new data.
Source: A. Farbin (2016)
# Why is ML useful?
› ML is useful as a system construction method, because:
– Some tasks can be properly defined only by example
– Certain features of the working environment are not known at design time
– Sometimes it is more useful to extract the solution from data rather than trying
to write down explicitly all the computational steps required to reach the solution
– Applications that can adapt to a changing environment reduce the need for
constant redesign
# When is ML useful?
› …provided that:
– An adequate amount of data well representing the problem is available
– It is possible / allowed to have a certain degree of tolerance in the precision/accuracy of the results
# Where is ML useful?
# AI vs ML vs DL
› Artificial Intelligence (AI): any technique able to mimic human behaviour (e.g.,
symbolic systems, knowledge bases, etc.)
› Machine Learning (ML): any AI technique able to learn from data (e.g., logistic
regression, decision trees, clustering, etc.)
› Representation Learning (RL): any ML technique able to learn representations of data suitable for specific/generic tasks (e.g., shallow auto-encoders)
› Artificial Neural Network (ANN): any biologically inspired RL technique
› Deep Learning (DL): any multi-layered ANN (e.g., MLPs, DNNs, CNNs, RNNs, GANs, VAEs, etc.)
# AI vs ML vs DL
Source: A. Farbin (2016)
# Short history of ML
(2020)
# ML application potential
(2017)
# ML application potential
(Source: McKinsey & Company, 2019)
ML core elements
List of aspects that must be considered when designing an ML application/solution
› Bayesian Models
– Learning algorithm (how knowledge is adapted to data)
› Backpropagation
› Expectation-maximization
› Performance measurement (how learning quality and performance are quantified)
› As it can be easily guessed even at this initial stage, no matter how sophisticated your ML system is, it can only be as good as the data it is fed with.
› Similarly, if you input poor data to your ML system (garbage in), you can only achieve poor output (garbage out).
› Learning quality increases with dataset size and data quality.
› A suitable learning quality is achieved only if the data adequately cover the process that you want to model.
› However, many more challenges exist: noisy data, missing data, unbalanced data, etc.
# Preprocessing
› The role of data preparation and data quality is pivotal in ML, as will be clarified in the following slides.
# Learning tasks
› Supervised Learning: predicting one or more dependent variables; based on labelled data; like classification and regression. It begins with an established set of data and a certain understanding of how data are classified. It is widely used in scenarios that require tasks such as classification, approximation, control, modelling/identification, signal processing, optimization, etc.
– Semi-Supervised Learning: not all the available data are labelled.
– Active Learning: the ML algorithm has to ask for (usually costly) labels with a limited
budget.
› Unsupervised Learning: looking for structure in data (without labels); like clustering or pattern mining. It is best suited when the problem requires a massive amount of data that are not labelled a priori. It is typically used for clustering, vector quantization, feature extraction, signal coding, data analysis, etc.
# Learning tasks
› Adversarial Learning: the environment tries to deceive the learner; it can be both
supervised and unsupervised (e.g., spam filters, malware detectors).
– Generative-Adversarial Learning: the fooling environment is replaced by another ML
algorithm.
› Reinforcement Learning: only based on feedback to the algorithm's actions in a dynamic environment. It is a behavioural learning model: the algorithm receives feedback from the data analysis so that the user is guided to the best outcome. It differs from supervised learning because the system is trained by trial and error rather than with a sample dataset only.
› Deep Learning (DL): it exploits multi-layered ANNs so that the system can learn from sample data in an iterative way. It is useful when patterns must be learnt from unstructured data.
# Supervised Learning
– Regression: the output is a continuous vector
– Classification: assign each input to a discrete class
A. Asperti, UniBO, 2019
› Needs supervised information associating the input xi to the desired target yi (which can be an integer in {1, … , C} for classification, or a real value for regression)
› Training set is in the form of D = { (x1,y1), … , (xN,yN) }
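› A minimal sketch of this setting, assuming scikit-learn and a synthetic labelled dataset (both choices are illustrative, not prescribed by the slide):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Labelled data: every input x_i comes with a discrete target y_i (classification).
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X, y)                 # learn a model from the training set D = {(x_i, y_i)}
print(model.predict(X[:5]))     # use the learnt model to predict classes for new inputs
```

Regression is analogous: the target yi is a real value and a regressor (e.g., LinearRegression) is fitted instead.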
# Unsupervised Learning
› Learns a natural / suitable grouping of the input data. It is used for clustering, finding a compressed representation of the available data, or estimating data density.
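› A minimal clustering sketch, assuming scikit-learn's k-means on synthetic, unlabelled data (illustrative choices only):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Unlabelled data: only the inputs X are available, no targets.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])       # grouping learnt for the first samples
print(kmeans.cluster_centers_)   # one centre per discovered cluster
```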
# Reinforcement Learning
› Learns how to choose the best action based on rewards or penalties received from the interacting environment. It is used for planning or behaviour learning.
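› A toy sketch of the reward-driven idea, using a hand-written epsilon-greedy agent on a simulated three-action environment (the reward probabilities are made up for illustration):

```python
import random

true_reward_prob = [0.2, 0.5, 0.8]   # hidden from the agent
q = [0.0, 0.0, 0.0]                  # estimated value of each action
counts = [0, 0, 0]
epsilon = 0.1                        # exploration rate

for step in range(2000):
    if random.random() < epsilon:
        action = random.randrange(3)          # explore (trial and error)
    else:
        action = q.index(max(q))              # exploit the best-known action
    reward = 1.0 if random.random() < true_reward_prob[action] else 0.0
    counts[action] += 1
    q[action] += (reward - q[action]) / counts[action]   # incremental average of rewards

print(q)   # the estimate for action 2 should approach 0.8
```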
E. Ricci, FBK
› Computational models will be discussed in the following slides.
Sources: ProMech (2018), Medium.com (2019)
# Training
› It is an iterative process.
› Determines new values for model parameters W’ based on training data Dtrain
› Evaluates the newly obtained model based on the loss L(Deval, W’),
where Deval is either the training set Dtrain or an external validation set Dvalid
› If L(Deval, W’) is sufficiently small, the training phase stops, otherwise it keeps
iterating the steps described above.
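› A sketch of such an iterative loop for a linear model trained with gradient descent on a mean-squared-error loss (the data, learning rate and stopping threshold are made-up values; here Deval is simply the training set):

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 3))
y_train = X_train @ np.array([1.0, -2.0, 0.5]) + 0.05 * rng.normal(size=100)

W = np.zeros(3)                  # model parameters
lr, threshold = 0.05, 1e-2

for epoch in range(1000):
    # determine new parameter values W' from the training data D_train
    grad = 2 * X_train.T @ (X_train @ W - y_train) / len(y_train)
    W = W - lr * grad
    # evaluate the newly obtained model with the loss L(D_eval, W')
    loss = np.mean((X_train @ W - y_train) ** 2)
    # stop when the loss is sufficiently small, otherwise keep iterating
    if loss < threshold:
        break

print(epoch, loss, W)
```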
› The core question for ML is, therefore, how well a hypothesis performs.
› Measuring this aspect on training data only is not indicative
› Therefore, an external data subset not used for training is needed
› This subset, usually called validation subset, provides a reasonable estimation of
the ML system’s performances on new data. This operation is called validation or
test.
› Specific techniques exist for cases where only small datasets are available (e.g., bootstrapping, cross-validation, etc.)
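› A minimal cross-validation sketch, assuming scikit-learn and a small synthetic dataset (illustrative only): every sample is used for validation exactly once, which gives a more robust estimate than a single hold-out split.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=150, n_features=6, random_state=0)

# 5-fold cross-validation: train on 4 folds, validate on the remaining one, repeat.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print("fold accuracies:", scores, "mean:", scores.mean())
```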
# Note: ML vs Optimization
# Artificial Neural Networks (ANNs) – A short history
VUNO, 2016
# Note: KB vs ML vs DL
› Deep Learning (DL): the same as traditional ML, but without the domain expert
› When it receives enough input signals, a biological neuron is activated and emits a signal along its axon, which activates nearby neurons
# Artificial Neurons
# The simplest ANN: Rosenblatt's Perceptron
E. Ricci, FBK
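› A hand-written perceptron sketch on a toy, linearly separable problem (NumPy is assumed; the data are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)     # targets in {-1, +1}

w = np.zeros(2)                                 # weights
b = 0.0                                         # bias
for _ in range(20):                             # a few passes over the data
    for xi, yi in zip(X, y):
        if yi * (np.dot(w, xi) + b) <= 0:       # misclassified sample
            w += yi * xi                        # perceptron update rule
            b += yi

pred = np.where(X @ w + b > 0, 1, -1)
print("training accuracy:", np.mean(pred == y))
```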
› The activation function must be selected depending on the specific task (e.g., ReLU and Identity: regression; Sigmoid: binary or multi-label classification, etc.)
› Activation functions are homogeneous within the same layer (i.e., all the nodes within the same layer are activated by the same function)
› Each neuron within a layer must be activated according to a (possibly non-
linear) activation function
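› NumPy versions of some common activation functions, for illustration:

```python
import numpy as np

def identity(z):                 # linear output, typical for regression
    return z

def relu(z):                     # rectified linear unit
    return np.maximum(0.0, z)

def sigmoid(z):                  # squashes values into (0, 1), used for binary outputs
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):                  # turns a vector into class probabilities
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([-2.0, 0.0, 3.0])
print(relu(z), sigmoid(z), softmax(z), sep="\n")
```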
› Each neuron has multiple inputs and produces a single output that is passed as input to the
neurons of the following layer
› If each neuron of the following layer is reached by that input, the ANN is said to be dense
› If more than one hidden layer is adopted, the ANN is deep, otherwise it is shallow
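› A sketch of a shallow and a deep dense network, assuming Keras (the library and the layer sizes are illustrative choices, not prescribed by the slide):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Shallow dense ANN: a single hidden layer.
shallow = keras.Sequential([
    keras.Input(shape=(4,)),
    layers.Dense(16, activation="relu"),    # every unit receives every input: dense
    layers.Dense(3, activation="softmax"),
])

# Deep dense ANN: more than one hidden layer.
deep = keras.Sequential([
    keras.Input(shape=(4,)),
    layers.Dense(32, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(3, activation="softmax"),
])
deep.summary()
```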
E. Ricci, FBK
# Example #2 – CNN
A. Asperti, UniBO, 2019
A. Brunello, UniUD, 2015
# ANNs – Typical parameters & Hyper-parameters
Parameters
› Node weights
› Number of inputs
› Number of outputs
Hyper-parameters
› Number of hidden layers
› Number of nodes per hidden layer
› Optimization method
› Number of epochs
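› A sketch of where these hyper-parameters appear in code, using scikit-learn's MLPClassifier as an illustrative example:

```python
from sklearn.neural_network import MLPClassifier

model = MLPClassifier(
    hidden_layer_sizes=(32, 16),   # number of hidden layers and nodes per layer
    solver="adam",                 # optimization method
    max_iter=200,                  # (maximum) number of epochs
    random_state=0,
)
# The parameters proper (e.g., the node weights in model.coefs_) are only
# determined later, when model.fit(X, y) is called on the training data.
```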
Data Preparation
Source: Developers.google.com (2020)
(Figure: the full dataset (100%) is split so that ~80% is used for training)
Source: Developers.google.com (2020)
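› A sketch of such a split, assuming scikit-learn and toy arrays (illustrative only):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(50, 2)   # toy feature matrix (50 samples)
y = np.arange(50) % 2               # toy labels

# ~80% of the full dataset for training, ~20% held back for evaluation.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
print(len(X_train), len(X_test))    # 40 and 10 samples
```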
› Reliability: the degree to which you can trust your data (the more reliable training data are,
the more reliable the trained ML model is). Reliability is affected by:
– label errors (i.e., wrongly labelled data points),
– noisy features (e.g., fluctuating measurements),
– unfiltered data
– omitted values (e.g., the data operator forgot to fill in a data field)
– duplicate values
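› A few quick reliability checks with pandas on a toy table (the values are made up):

```python
import pandas as pd

df = pd.DataFrame({
    "temperature": [21.5, 21.7, None, 21.7, 250.0],   # an omitted value and a noisy outlier
    "label":       ["ok", "ok", "ok", "ok", "ok"],
})

print(df.isna().sum())          # omitted values per column
print(df.duplicated().sum())    # duplicate rows
print(df.describe())            # value ranges help to spot implausible measurements
```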
Data Types
– Categorical: Nominal, Ordinal
– Numerical: Discrete, Continuous (Interval, Ratio)
Source: N. Donges (2018)
# Data Evaluation
# Data Transformation Techniques: Normalization
› Normalization: numerical parameters are converted into the same scale to improve performance and training stability of the model. Normalization is needed when you have:
– excessively different values within the same feature (this may cause problems to the gradient update of the ML model)
– different ranges on different features (this may affect the model convergence)
Source: Developers.google.com (2020)
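› A minimal min-max normalization sketch in NumPy (the values are illustrative; scikit-learn's MinMaxScaler would do the same job):

```python
import numpy as np

X = np.array([[10.0, 200.0],
              [20.0, 400.0],
              [30.0, 800.0]])

# Map each feature (column) to the [0, 1] range.
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
print(X_scaled)
```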
# Data Transformation Techniques: feature clipping
Source: Developers.google.com (2020)
# Data Transformation Techniques: log scaling
Source: Developers.google.com (2020)
# Data Transformation Techniques: Z-score
Source: Developers.google.com (2020)
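› A sketch of the three transformations above on a made-up feature with an outlier:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 500.0])

clipped = np.clip(x, 0.0, 10.0)          # feature clipping: cap extreme values
log_scaled = np.log1p(x)                 # log scaling: compress a very wide range
z_score = (x - x.mean()) / x.std()       # Z-score: zero mean, unit standard deviation

print(clipped, log_scaled, z_score, sep="\n")
```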
# Data Transformation Techniques: Encoding (one-hot)
Source: Developers.google.com (2020)
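› A one-hot encoding sketch with pandas (the categorical values are made up):

```python
import pandas as pd

df = pd.DataFrame({"colour": ["red", "green", "blue", "green"]})

# One binary column per category value.
print(pd.get_dummies(df, columns=["colour"]))
```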
# Suitable statistics
› Count
› Mean
› Standard deviation
› min
› Max
› 25% or bottom quartile
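› With pandas, all of these statistics are computed in one call (toy data for illustration):

```python
import pandas as pd

df = pd.DataFrame({"length": [4.9, 5.1, 5.4, 6.0, 6.3, 7.1]})
print(df.describe())   # count, mean, std, min, 25%, 50%, 75%, max
```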
Source: Matplotlib documentation (2020)
Source: Seaborn documentation (2020)
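› A minimal plotting sketch with Matplotlib and Seaborn on random data (illustrative only):

```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

values = np.random.default_rng(0).normal(size=500)

plt.hist(values, bins=30)        # Matplotlib histogram
plt.xlabel("value")
plt.ylabel("frequency")
plt.show()

sns.histplot(values, kde=True)   # Seaborn distribution plot with density estimate
plt.show()
```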
# Correlation matrix
› It is a matrix having all the parameters to be tested placed both on its rows and columns
› It is widely used to visually assess the existence of linear correlations amongst the considered parameters
› Conditions of no correlation can be depicted without any color or with specific colors
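› A correlation-matrix heatmap sketch with pandas and Seaborn on made-up data:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.normal(size=100)})
df["b"] = 2 * df["a"] + 0.1 * rng.normal(size=100)   # strongly correlated with a
df["c"] = rng.normal(size=100)                       # roughly uncorrelated

corr = df.corr()                                     # parameters on both rows and columns
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
plt.show()
```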
Source: Data to Fish (2020)
Dataset Examples
Three widely-known, entry-level training & validation datasets
# IRIS Dataset
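› The Iris dataset ships with scikit-learn and can be loaded in a couple of lines (shown here as an illustrative starting point):

```python
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
print(iris.frame.head())     # sepal/petal measurements plus the target class
print(iris.target_names)     # ['setosa' 'versicolor' 'virginica']
```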
End of lesson.