01 - Introduction To ML


L01

Introduction to
Machine Learning (ML)
19/01/2022
Definitions and core concepts

Eng. Marco Zappatore, Ph.D.


University of Salento

# Table of contents
› Foreword
› Introduction
› ML core elements
› Artificial Neural Networks and Deep Learning
› Data Preparation & Feature Engineering
› Essential Statistics and Data Visualizations
L01 – Introduction to ML

› Dataset examples

M. Zappatore

# Foreword
› This course will address the usage of Python scripts for Machine Learning (ML).
› Before introducing core Python programming concepts as well as specific packages
dedicated to ML (and even before presenting the development environments needed to
implement, execute, and validate Python scripts) it is worth providing some core
concepts and definitions about ML.
› This lesson provides some introductory explanations on ML algorithms and processes.
› The topics addressed here are covered in much more detail in other courses for Ph.D.
students at your University/Department. Moreover, many of you most probably already
have a certain degree of expertise in ML.
› However, if you are not going to attend any of the other courses dedicated to ML, or if
you have no prior knowledge of ML, you can refer to this lesson as the common
knowledge ground about ML, to be referenced during the remaining part of this course.

Introduction
First definitions

# A lot of (apparent) confusion…


› There are several definitions and even more buzzwords
› For non-professionals, Artificial Intelligence (AI), Machine Learning (ML)
and Deep Learning (DL) are all the same
› However, they are not synonyms

› ML is a subfield of AI: there are several non-learning applications in AI
› DL is a subfield of ML: there are several non-deep applications in ML
› Moreover, many other definitions and concepts are available, too: Data
Mining, Pattern Recognition, Statistical Learning, Computational Learning,
Computational Statistics, etc.

# First of all, what is learning?


› (for humans) It is a process by which we acquire new (or modify existing)
knowledge, skills, behaviors or preferences, thanks to specific underlying memory
mechanisms (e.g., habituation, associative learning, observational learning, etc.)
› According to Poggio & Shelton (AI Magazine, 1999), “The problem of learning is
arguably at the very core of the problem of intelligence, both biological and
artificial.”
› When machines are involved, we speak of Artificial Intelligence (AI), which
refers to the creation of intelligent and/or adaptive systems.
› When intelligent systems are supplied with a learning component, they can
modify their decision mechanisms in order to improve their performance.
› This situation is commonly defined as Machine Learning (ML).

# What is Machine Learning (ML)?


› One of the most widely-accepted definitions was proposed by Tom Mitchell in 1997:
› A computer program is said to learn from experience E with respect to some
class of (learning) tasks T and performance measure P, if its performance at
tasks in T, as measured by P, improves with experience E
› The experience E usually comes in the form of data.
› The learning process usually happens in terms of rewards or penalties.
› The classes of (learning) tasks T are manifold: supervised/unsupervised learning, etc.
› There exists a relationship between data and performance, and not only between data and
learning task(s).
› Therefore, it can be said that ML is a computer algorithm that automatically learns
patterns and models from (large quantities of) data and then uses the learnt model to
make predictions on new data.
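The roles of E, T and P in the definition above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the task T is a toy two-class problem on synthetic one-dimensional data, the "learner" is a simple threshold placed halfway between the class means, and P is the accuracy on a held-out set. With more experience E, P typically (not necessarily monotonically) improves.

```python
import random

def sample(n, rng):
    """Draw n labelled points (n >= 2): class 0 ~ N(0, 1), class 1 ~ N(4, 1)."""
    data = []
    for i in range(n):
        label = i % 2  # alternate labels so both classes are always present
        data.append((rng.gauss(4.0 * label, 1.0), label))
    return data

def learn_threshold(train):
    """Learn from experience E: midpoint between the two class means."""
    xs0 = [x for x, y in train if y == 0]
    xs1 = [x for x, y in train if y == 1]
    return (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2

def accuracy(threshold, data):
    """Performance measure P: fraction of correctly classified points."""
    correct = sum((x > threshold) == (y == 1) for x, y in data)
    return correct / len(data)

rng = random.Random(0)
test_set = sample(1000, rng)          # held-out data used only to measure P
results = {}
for n in (2, 20, 200):                # growing amounts of experience E
    threshold = learn_threshold(sample(n, rng))
    results[n] = accuracy(threshold, test_set)
    print(f"E = {n:3d} samples -> P = {results[n]:.3f}")
```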

# What is Machine Learning (ML)?


Source: A. Farbin (2016)

# Why is ML useful?
› ML is useful as a system construction method, because:
– Some tasks can be properly defined only by example
– Certain features of the working environment are not known at design time
– Sometimes it is more useful to extract the solution from data rather than trying
to write down explicitly all the computational steps required to reach the solution
– Applications that can adapt to a changing environment reduce the need for
constant redesign

# When is ML useful?
› To address real-world problems that:
– Lack a consolidated theoretical/knowledge background
– Cannot be solved (efficiently or at all) by using available mathematical models
– Are affected by noisy data
– Are supplied with an excessive amount of data / information
› …provided that:
– An adequate amount of data well representing the problem is available
– A certain degree of tolerance in the precision/accuracy of the results is
possible / allowed

# Where is ML useful?
› In situations that require:
– Predicting behaviours / events
– Estimating values / quantities
– Personalising services
– Recognising significant data from noisy/complex datasets
– Analysing large amounts of data / information
› More generally, ML is needed to tackle problems that are difficult to manage
with traditional programming techniques (e.g., spam detection, behavior estimation,
facial recognition, real-time translation, etc.)

# AI vs ML vs DL
› Artificial Intelligence (AI): any technique able to mimic human behaviour (e.g.,
symbolic systems, knowledge bases, etc.)
› Machine Learning (ML): any AI technique able to learn from data (e.g., logistic
regression, decision trees, clustering, etc.)
› Representation Learning (RL): any ML technique able to learn representations of
data suitable for specific/generic tasks (e.g., shallow auto-encoders)
› Artificial Neural Network (ANN): any biologically inspired RL technique
› Deep Learning (DL): any multi-layered ANN (e.g., MLPs, DNNs, CNNs, RNNs,
GANs, VAEs, etc.)

# AI vs ML vs DL
Source: A. Farbin (2016)

# Short history of ML
(2020)

# ML application potential
(2017)

# ML application potential
(Source: McKinsey & Company, 2019)

ML core elements
List of aspects that must be considered
when designing an ML application/solution

# Key elements for ML
› Data (observations or sample data)
› Learning Task (supervised/unsupervised/etc.)
› Learning “stuff”
– Computational model (how knowledge is represented)
› Decision Trees
› Artificial Neural Networks
› Bayesian Models
– Learning algorithm (how knowledge is adapted to data)
› Backpropagation
› Expectation-maximization
› Performance measurement (how learning quality and performance are quantified)

# Data and Data Quality
› As can easily be guessed even at this initial stage, no matter how sophisticated
your ML system is, it can only be as good as the data it is fed with.
› Similarly, if you feed poor data to your ML system (garbage in), you can only
obtain poor output (garbage out).
› Learning quality increases with dataset size and data quality.
› A suitable learning quality is achieved only if the data adequately cover the
process that you want to model.
› However, many more challenges exist: noisy data, missing data, unbalanced data,
etc.

# Preprocessing
› Preliminary activity of data preparation and filtering to ensure a pre-defined/minimal
degree of data quality
– Error correction
– Missing data handling
– Noise reduction
› Finding the data representation that maximizes the performance of the learning model
– Scaling and normalization
– Feature selection and extraction
› ML models themselves can be used to preprocess the data
› The role of data preparation and data quality is pivotal in ML, as will be
clarified in the following slides.

# Learning tasks
› Supervised Learning: predicting one or more dependent variables based on labelled data,
as in classification and regression. It begins with an established set of data and a certain
understanding of how data are classified. It is widely used in scenarios that require
classification, approximation, control, modelling/identification, signal
processing, optimization, etc.
– Semi-Supervised Learning: not all the available data are labelled.
– Active Learning: the ML algorithm has to ask for (usually costly) labels with a limited
budget.
› Unsupervised Learning: looking for structure in data (without labels), as in clustering or
pattern mining. It is best suited when the problem involves a massive amount of data whose
classes are not known a priori (unlabelled). It is typically used for clustering, vector
quantization, feature extraction, signal coding, data analysis, etc.

# Learning tasks
› Adversarial Learning: the environment tries to deceive the learner; it can be both
supervised and unsupervised (e.g., spam filters, malware detectors).
– Generative-Adversarial Learning: the deceiving environment is replaced by another ML
algorithm.
› Reinforcement Learning: based only on feedback to the algorithm's actions in a dynamic
environment. It is a behavioural learning model. The algorithm receives feedback from the
data analysis so that the user is guided to the best outcome. It differs from supervised
learning in that the system is trained by trial and error instead of from a sample dataset only.
› Deep Learning (DL): it exploits multi-layered ANNs so that the system can learn from
sample data in an iterative way. It is useful when patterns must be learned from
unstructured data.

# Supervised Learning
› Learns a function h that maps inputs to desired outputs
– Classification: assign each input to a discrete class (y is discrete)
– Regression: the output is a continuous vector (y is (conceptually) continuous)
Source: A. Asperti, UniBO, 2019
› Needs supervised information associating the input xi to the desired target yi, which
can be an integer in {1, … , C} (classification) or a real number (regression)
› Training set is in the form of D = { (x1,y1), … , (xN,yN) }
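As a minimal sketch of learning a function h from a labelled training set D = { (x1,y1), … , (xN,yN) }, the snippet below uses a 1-nearest-neighbour classifier. The choice of model, the helper name `fit_1nn` and the toy data are assumptions made for illustration; the slides do not prescribe a specific algorithm here.

```python
def fit_1nn(D):
    """Return h: for this model, 'training' just stores the labelled set D."""
    def h(x):
        # Predict the label of the closest training input (squared Euclidean distance)
        nearest = min(D, key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x)))
        return nearest[1]
    return h

# Toy training set: inputs are 2-D points, targets are class labels in {1, 2}
D = [((1.0, 1.0), 1), ((1.2, 0.8), 1), ((4.0, 4.2), 2), ((3.8, 4.0), 2)]
h = fit_1nn(D)

print(h((0.9, 1.1)))  # 1
print(h((4.2, 4.3)))  # 2
```

Replacing the integer labels with real-valued targets (and, for example, averaging the targets of the nearest neighbours) would turn the same idea into a regression model.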

# Unsupervised Learning
› Learns a natural / suitable grouping of the input data. It is used for clustering,
finding a compressed representation of the available data, and estimating data density
› Only input pattern xi is provided (no desired output)
› Training set is in the form of D = {x1, … , xN}
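Clustering an unlabelled set D = {x1, … , xN} can be sketched with a small k-means routine on one-dimensional data. This is only one possible clustering technique, chosen here as an assumption; the deterministic initialisation (centroids spread between the minimum and maximum value) and the toy data are also illustrative choices.

```python
def kmeans_1d(xs, k=2, iters=10):
    """Group 1-D points into k clusters (k >= 2); returns (centroids, labels)."""
    lo, hi = min(xs), max(xs)
    # Assumed initialisation: centroids evenly spread between min and max
    centroids = [lo + (hi - lo) * j / (k - 1) for j in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda c: abs(x - centroids[c]))
            clusters[nearest].append(x)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty)
        centroids = [sum(c) / len(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    labels = [min(range(k), key=lambda c: abs(x - centroids[c])) for x in xs]
    return centroids, labels

D = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]   # no labels are provided
centroids, labels = kmeans_1d(D, k=2)
print(centroids, labels)
```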

# Reinforcement Learning
› Learns how to choose the best action based on rewards or penalties received from
the interacting environment. It is used for planning or behaviour learning.
› An input pattern xi is provided, which describes an observation coming from the
environment, along with a reward ri ∈ { –1, +1} returned as a response to the
predicted action yi
› Training set is in the form of D = { (x1,y1,r1), … , (xN,yN,rN) }
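The trial-and-error loop can be sketched with an epsilon-greedy learner in a toy environment. All specifics are assumptions for illustration: observations and actions are binary, the environment returns +1 when the action equals the observation and –1 otherwise, and the learner keeps a running average of the reward for each (observation, action) pair. The collected triples have the form (xi, yi, ri) described above.

```python
import random

def train_bandit(steps=500, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = {x: [0.0, 0.0] for x in (0, 1)}     # estimated reward per (observation, action)
    counts = {x: [0, 0] for x in (0, 1)}
    D = []
    for _ in range(steps):
        x = rng.randint(0, 1)                        # observation from the environment
        if rng.random() < epsilon:
            y = rng.randint(0, 1)                    # explore: random action
        else:
            y = max((0, 1), key=lambda a: Q[x][a])   # exploit: best action so far
        r = 1 if y == x else -1                      # toy environment's reward
        counts[x][y] += 1
        Q[x][y] += (r - Q[x][y]) / counts[x][y]      # incremental mean of rewards
        D.append((x, y, r))
    policy = {x: max((0, 1), key=lambda a: Q[x][a]) for x in (0, 1)}
    return policy, D

policy, D = train_bandit()
print(policy)   # learned action for each observation
```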

# Taxonomy of learning tasks
Source: E. Ricci, FBK

# Taxonomy of learning tasks (enlarged)

# Computational Models (some examples)
Source: A. Asperti, UniBO, 2019
› Computational models will be discussed in the following slides.

# Performance Measures for testing & validation
› Performance measurements strictly depend on the task (e.g., classification
performance metrics must differ from reinforcement learning performance metrics,
as the two task types are different).
› Very often, performance measurements depend on the specific context, too (e.g.,
false positives assume different importance depending on the scenario: a false
positive in a repeatable test is less severe than a false positive in a one-shot test).
› Performance measurements may also depend on ethical, moral, and legal
considerations (e.g., a human behaviour classifier may achieve higher accuracy if
additional data such as race, religion, gender are considered, but this would render
the classifier ethically unacceptable).

# Sequence of key stages for ML
› Acquired knowledge is stored into model parameters W = {w1, … , wP}
› Two operational modes are available:
– Learning phase (training and fitting)
› Building the model from known data (if available)
› Estimating model parameters from the training dataset Dtrain
– Predictive phase (test or validation)
› Running the model with new, previously spared, samples Dtest
› Feeding new data x ∈ Dtest as input to predict an output out(x)
› A Loss Function L(D,W) is used to estimate the quality of the learned model
parameters W against dataset D.
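The two phases can be sketched on a toy regression problem. The model (a straight line with parameters W = {w0, w1}), the least-squares fit, the mean squared error used as L(D, W) and the noise-free synthetic data are all assumptions chosen to keep the example small; this simple linear case happens to admit a closed-form fit, which is not typical of ML models in general.

```python
def fit(D_train):
    """Learning phase: estimate W = {w0, w1} by ordinary least squares."""
    n = len(D_train)
    mean_x = sum(x for x, _ in D_train) / n
    mean_y = sum(y for _, y in D_train) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in D_train)
    var = sum((x - mean_x) ** 2 for x, _ in D_train)
    w1 = cov / var
    return {"w0": mean_y - w1 * mean_x, "w1": w1}

def out(x, W):
    """Predictive phase: output of the model for input x."""
    return W["w0"] + W["w1"] * x

def loss(D, W):
    """L(D, W): mean squared error of the model on dataset D."""
    return sum((y - out(x, W)) ** 2 for x, y in D) / len(D)

# Synthetic data generated from y = 2x + 1; the last two samples are spared for testing
D = [(float(x), 2.0 * x + 1.0) for x in range(10)]
D_train, D_test = D[:8], D[8:]

W = fit(D_train)
print(W, loss(D_train, W), loss(D_test, W))
```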

# ML typical workflow (basic!)
Source: ProMech (2018)

# ML typical workflow (basic, but applied)
Source: ProMech (2018)

# ML typical workflow (slightly more detailed)
Source: Medium.com (2019)

# Training
› It is an iterative process.
› Determines new values for model parameters W’ based on training data Dtrain
› Evaluates the newly obtained model based on the loss L(Deval, W’),
where Deval is either the training set Dtrain or an external validation set Dvalid
› If L(Deval, W’) is sufficiently small, the training phase stops; otherwise, it keeps
iterating the steps described above.
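The iterative loop above can be sketched with plain gradient descent on a one-parameter model out(x) = w·x. The update rule, the learning rate `lr`, the tolerance `tol` and the toy data are assumptions for illustration; here Deval is simply the training set, which is one of the two options mentioned above.

```python
def loss(D, w):
    """L(D, w): mean squared error of the model out(x) = w * x."""
    return sum((w * x - y) ** 2 for x, y in D) / len(D)

def train(D_train, D_eval, lr=0.05, tol=1e-8, max_iters=1000):
    w = 0.0                                   # initial parameter value
    iterations = 0
    for iterations in range(1, max_iters + 1):
        # Gradient of the mean squared error with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in D_train) / len(D_train)
        w = w - lr * grad                     # new parameter value W'
        if loss(D_eval, w) < tol:             # stop when L(Deval, W') is small enough
            break
    return w, iterations

D_train = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]   # generated from y = 3x
w, iterations = train(D_train, D_eval=D_train)
print(w, iterations)
```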

# Examples of loss functions
› For classification tasks

$$\text{accuracy} = \frac{\text{no. of correctly predicted samples}}{\text{total no. of samples}}$$

› For regression tasks

$$RSS = \sum_{i=1}^{N} \left( y_i - out(x_i) \right)^2$$
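Both measures translate directly into code. The sketch below is a straightforward implementation of the two formulas; the function names and the sample values are illustrative assumptions.

```python
def accuracy(y_true, y_pred):
    """No. of correctly predicted samples divided by the total no. of samples."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def rss(y_true, y_pred):
    """Residual sum of squares: sum over i of (y_i - out(x_i))^2."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred))

print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))      # 0.75
print(rss([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))     # 1.25
```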

# Testing & validation
› The core question for ML is, therefore, how well a hypothesis performs.
› Measuring this aspect on training data only is not indicative.
› Therefore, an external data subset not used for training is needed.
› This subset, usually called validation subset, provides a reasonable estimate of
the ML system’s performance on new data. This operation is called validation or
test.
› This approach requires coping with two fundamental issues:
1. Model selection and model training stages must be separated from the model
testing stage (generalisation assessment)
2. More advanced statistical methods can be used to assess model performance if
only small datasets are available (e.g., bootstrapping, cross-validation, etc.)
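Cross-validation, mentioned above for small datasets, can be sketched as follows. The interleaved fold assignment, the helper names and the trivial "model" (predicting the mean of the training targets) are assumptions chosen only to show how training and validation subsets stay separated.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, valid_idx) pairs for k-fold cross-validation."""
    folds = [list(range(i, n, k)) for i in range(k)]   # interleaved folds
    for i in range(k):
        valid = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, valid

def mean_model_loss(ys, train_idx, valid_idx):
    """Fit a trivial model (the training mean) and return its validation MSE."""
    mean = sum(ys[j] for j in train_idx) / len(train_idx)
    return sum((ys[j] - mean) ** 2 for j in valid_idx) / len(valid_idx)

ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
splits = list(k_fold_indices(len(ys), k=3))
losses = [mean_model_loss(ys, tr, va) for tr, va in splits]
cv_loss = sum(losses) / len(losses)        # average validation loss over the folds
print(splits, cv_loss)
```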

# Note: ML vs Optimization
› Even if ML problems have an optimization purpose, they differ from typical
optimization problems since the ML solution is not provided in an analytical form
(i.e., typically no closed-form solutions are available)
› The optimization purpose is reached in ML applications via an iterative approach
› By applying the algorithm iteratively, the result is approximated progressively
› Therefore, ML applications are forms of progressive learning applied to a
fitness function (i.e., loss function) depending on the results of past
observations

Artificial Neural Networks
and Deep Learning
An introduction

# Deep Learning (DL) – Definitions
› What is DL? DL refers to a specific branch of ML based on the usage of Artificial
Neural Networks (ANNs) made up of several layers of neuronal nodes that process
inputs in order to produce expected outputs.
› Why is DL so important? DL can be more effective than traditional ML because the
neuronal layers compute relevant features automatically, loosely as the human brain does.
› What is DL (more formally)? A family of parametric models which learn non-linear
hierarchical representations.

# Artificial Neural Networks (ANNs) – A short history
Source: VUNO, 2016

# Note: KB vs ML vs DL
› Knowledge-Based System (KBS): exploits logical rules defined according to
domain experts who explained how to solve a problem
› Traditional Machine Learning (ML): exploits machine-based progressive learning
algorithms applied to relevant features identified by a domain expert amongst all
the available data
› Deep Learning (DL): the same as traditional ML, but without the domain expert
(the relevant features are learned from the data)

# Real Neurons (a reminder…)
› A brain neuron behaves like an organic switch:
– Synapses within dendrites send signals to it
– When the received signal exceeds a specific threshold, the neuron is
activated and emits a signal along its axon, which activates nearby neurons

# Artificial Neurons
› An artificial neuron is an abstraction of a real neuron
– It computes a weighted sum of its inputs and then the result is passed through a
non-linear activation function
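A single artificial neuron can be sketched in a few lines. The sigmoid activation, the optional bias term and the sample weights are assumptions for illustration; the slide only states "weighted sum followed by a non-linear activation".

```python
import math

def sigmoid(z):
    """A common non-linear activation, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias=0.0, activation=sigmoid):
    """Weighted sum of the inputs (plus a bias), passed through the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

# Weighted sum: 0.5 * 1.0 + (-0.25) * 2.0 = 0.0, and sigmoid(0.0) = 0.5
y = neuron([1.0, 2.0], [0.5, -0.25])
print(y)  # 0.5
```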

# The simplest ANN: Rosenblatt’s Perceptron
Source: E. Ricci, FBK

# Non-Linear Activation Functions
› The activation function must be selected depending on the specific task (e.g., ReLU and
Identity: regression; Sigmoid: multiple classification, etc.)
› Activation functions are homogeneous within the same layer (i.e., all the nodes within the
same layer are activated by the same function)
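The three activation functions named above, and the idea that one function is applied to every node of a layer, can be sketched as follows. The helper `activate_layer` and the sample pre-activation values are hypothetical names and data used only for illustration.

```python
import math

def relu(z):
    """Rectified Linear Unit: zero for negative inputs, identity otherwise."""
    return max(0.0, z)

def identity(z):
    """Identity (linear) activation: returns its input unchanged."""
    return z

def sigmoid(z):
    """Sigmoid: squashes any real value into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def activate_layer(zs, activation):
    """Apply the same activation function to all nodes of a layer."""
    return [activation(z) for z in zs]

pre_activations = [-2.0, 0.0, 3.0]
print(activate_layer(pre_activations, relu))      # [0.0, 0.0, 3.0]
print(activate_layer(pre_activations, identity))  # [-2.0, 0.0, 3.0]
print(activate_layer(pre_activations, sigmoid))
```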

# Layered architecture for ANNs
› A typical ANN is organized into multiple layers
› Each layer represents an abstraction hierarchy
› Each neuron within a layer must be activated according to a (possibly non-linear)
activation function

# Some reference layered architectures
› MULTILAYER PERCEPTRON (MLP): the simplest one, made up of several
layers, ideal for regression and general-purpose classification
› CONVOLUTIONAL NEURAL NETWORK (CNN): specifically designed for image
processing and classification
› RECURRENT NEURAL NETWORK (RNN): particularly suitable for sequential data
(e.g., text processing/translation, time-series analyses, etc.)

# Example #1 – Dense MLP
› Each neuron has multiple inputs and produces a single output that is passed as input to the
neurons of the following layer
› If each neuron of the following layer is reached by that input, the ANN is said to be dense
› If more than one hidden layer is adopted, the ANN is deep, otherwise it is shallow
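The forward pass of a dense network can be sketched by chaining layers in which every neuron receives all the outputs of the previous layer. The layer sizes, weights, biases and activations below are arbitrary assumptions picked so that the result is easy to check by hand; with a single hidden layer this toy network is shallow in the sense defined above.

```python
def relu(z):
    return max(0.0, z)

def identity(z):
    return z

def dense_layer(inputs, weights, biases, activation):
    """Each row of `weights` holds the weights of one neuron over ALL inputs."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(x, layers):
    """Feed x through the layers; each layer's output is the next layer's input."""
    for weights, biases, activation in layers:
        x = dense_layer(x, weights, biases, activation)
    return x

layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu),   # hidden layer: 2 neurons
    ([[2.0, 1.0]], [0.5], identity),                 # output layer: 1 neuron
]
# Hidden outputs: relu(-1.0) = 0.0 and relu(1.5) = 1.5; output: 0.0*2 + 1.5*1 + 0.5
output = forward([1.0, 2.0], layers)
print(output)  # [2.0]
```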

# MLP architecture (details)
Source: E. Ricci, FBK

# MLP training by backpropagation (pseudo-code)
Source: E. Ricci, FBK

# Example #2 – CNN
Source: A. Asperti, UniBO, 2019

# ANNs – Real-Life example
Source: A. Brunello, UniUD, 2015

# ANNs – Typical parameters & Hyper-parameters
Parameters
› Node weights
› Number of inputs
› Number of outputs

Hyper-parameters
› Number of hidden layers
› Number of nodes per hidden layer
› Inter-layer connection type
› Activation function type (per layer)
› Loss function (per layer)
› Learning rate
› Batch size
› Optimization method
› Number of epochs

Data Preparation &
Feature Engineering
Two core requirements
in any ML/DL project

# Data Preparation in ML projects
[Figures; recoverable labels: "DATA PREPARATION", "100%", "~80%"]
Source: Developers.google.com (2020)

# Data Quality considerations
› Reliability: the degree to which you can trust your data (the more reliable training data are,
the more reliable the trained ML model is). Reliability is affected by:
– label errors (i.e., wrongly labelled data points),
– noisy features (e.g., fluctuating measurements),
– unfiltered data
– omitted values (e.g., the data operator forgot to fill in a data field)
– duplicate values
› Feature representation: it is the mapping of data to useful features. It may require:
– Data modelling
– Data normalization
– Outlier handling
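Two of the reliability issues listed above, duplicate values and omitted values, can be handled with very little code. The sketch below is only one possible treatment: dropping exact duplicate rows and replacing missing entries (represented here as `None`, an assumption) with the mean of the observed values in the same column.

```python
def drop_duplicates(rows):
    """Remove exact duplicate rows while preserving the original order."""
    seen, unique = set(), []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique

def impute_mean(rows, col):
    """Replace omitted values (None) in column `col` with the column mean."""
    observed = [r[col] for r in rows if r[col] is not None]
    mean = sum(observed) / len(observed)
    return [tuple(mean if (i == col and v is None) else v
                  for i, v in enumerate(r))
            for r in rows]

rows = [(1.0, 10.0), (1.0, 10.0), (2.0, None), (3.0, 30.0)]
clean = impute_mean(drop_duplicates(rows), col=1)
print(clean)  # [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0)]
```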

# Data Types (from a statistics perspective)
Data Types
– Categorical
› Nominal: discrete data, no quantitative value, no order
› Ordinal: discrete data, no quantitative value, ordered
– Numerical
› Discrete: discrete data, quantitative value, can be counted, cannot be measured
› Continuous: continuous data, quantitative value, cannot be counted, can be measured
› Interval: discrete data, quantitative value, ordered, no real base value, each unit has the same difference
› Ratio: discrete data, quantitative value, ordered, real base value, each unit has the same difference
Source: N. Donges (2018)

# Data Evaluation
› A dataset can exhibit different issues:
– Excessive availability of data
– Shortage of data
– Skewed proportions of data classes
– Heterogeneous data types (numerical parameters plus string parameters plus
Boolean parameters plus images plus… etc.)
› Therefore, data normalization is needed

# Data Transformation Techniques
Numerical parameters can be transformed in two ways:
› Normalization: numerical parameters are converted into the same scale to improve performance and
training stability of the model. Normalization is needed when you have:
– excessively different values within the same feature (this may cause problems to the gradient
update of the ML model)
– different ranges on different features (this may affect the model convergence)
› Bucketing: numerical (continuous) parameters are grouped into discrete bins/buckets

Categorical parameters are transformed by:
› Encoding: categorical (discrete) parameters are encoded as numerical ones
Source: Developers.google.com (2020)
# Data Transformation Techniques for numerical parameters
Source: Developers.google.com (2020)

# Data Transformation Techniques: feature clipping
Source: Developers.google.com (2020)

# Data Transformation Techniques: log scaling
Source: Developers.google.com (2020)

# Data Transformation Techniques: Z-score
Source: Developers.google.com (2020)
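The normalization techniques named in the slide titles can be sketched with the Python standard library. The formulas used are the commonly adopted ones and are stated here as assumptions, since the original figures are not available: scaling to a range x' = (x - min) / (max - min), clipping to fixed bounds, log scaling x' = log(x), and Z-score x' = (x - mean) / standard deviation (population standard deviation in this sketch).

```python
import math
import statistics

def min_max(xs):
    """Scale values linearly into the range [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def clip(xs, lo, hi):
    """Feature clipping: cap values below lo or above hi."""
    return [min(max(x, lo), hi) for x in xs]

def log_scale(xs):
    """Log scaling: compress a wide range of strictly positive values."""
    return [math.log(x) for x in xs]

def z_score(xs):
    """Z-score: number of standard deviations away from the mean."""
    mu = statistics.mean(xs)
    sigma = statistics.pstdev(xs)
    return [(x - mu) / sigma for x in xs]

print(min_max([0, 5, 10]))          # [0.0, 0.5, 1.0]
print(clip([1, 50, 100], 10, 60))   # [10, 50, 60]
print(log_scale([1.0, 10.0, 100.0]))
print(z_score([2, 4, 4, 4, 5, 5, 7, 9]))
```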
# Data Transformation Techniques: Bucketing (fixed spacing)
Source: Developers.google.com (2020)

# Data Transformation Techniques: Bucketing (quantile spacing)
Source: Developers.google.com (2020)
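The difference between the two bucketing strategies can be sketched as follows. The implementations are assumptions made for illustration: fixed spacing splits the value range into equally wide intervals, while quantile spacing assigns buckets by rank so that each bucket receives (roughly) the same number of samples.

```python
def fixed_buckets(xs, n):
    """Equal-width buckets over [min, max]; the maximum falls in the last bucket."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n
    return [min(int((x - lo) / width), n - 1) for x in xs]

def quantile_buckets(xs, n):
    """Rank-based buckets holding (roughly) the same number of samples each."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    buckets = [0] * len(xs)
    for rank, i in enumerate(order):
        buckets[i] = rank * n // len(xs)
    return buckets

xs = [1, 2, 3, 4, 100, 200]            # skewed toy data
print(fixed_buckets(xs, 2))            # [0, 0, 0, 0, 0, 1]
print(quantile_buckets(xs, 2))         # [0, 0, 0, 1, 1, 1]
```

On skewed data such as this toy list, fixed spacing leaves almost all samples in one bucket, whereas quantile spacing balances them.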
# Data Transformation Techniques: Encoding (one-hot)
NOTE: from 1 feature with N values we obtain N features
Source: Developers.google.com (2020)
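One-hot encoding can be sketched in a few lines: one feature with N distinct values becomes N binary features. The alphabetical ordering of the categories and the sample colour values are assumptions for illustration.

```python
def one_hot(values):
    """Encode a categorical feature as N binary features (N = distinct values)."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = [[1 if index[v] == i else 0 for i in range(len(categories))]
            for v in values]
    return categories, rows

categories, rows = one_hot(["red", "green", "blue", "green"])
print(categories)  # ['blue', 'green', 'red']
print(rows)        # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```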

Essential Statistics and
Data Visualizations
When preparing your ML/DL project,
statistical and visual data analysis is pivotal

# Suitable statistics
› Descriptive: summarize and identify patterns in the available data, without allowing
hypotheses beyond those data (e.g., mean, median, standard deviation, etc.)
› Inferential: allow making hypotheses about the entire population based on a sample

# Typical descriptive statistics
› Count
› Mean
› Standard deviation
› Min
› Max
› 25% or bottom quartile
› 50% or second quartile
› 75% or top quartile
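All the statistics listed above can be computed with Python's built-in `statistics` module. The sample data are an assumption; the sketch uses the sample standard deviation and the "inclusive" quartile method, which are choices made here and not mandated by the slide.

```python
import statistics

data = [1, 2, 3, 4, 5]

# Quartiles: the three cut points that split the data into four parts
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

summary = {
    "count": len(data),
    "mean": statistics.mean(data),
    "std": statistics.stdev(data),   # sample standard deviation
    "min": min(data),
    "25%": q1,
    "50%": q2,
    "75%": q3,
    "max": max(data),
}
for name, value in summary.items():
    print(f"{name:>5}: {value}")
```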

# Essential chart types: Box-and-Whisker plot

# Box-and-Whisker plot [example]
Source: Matplotlib documentation (2020)

# Essential chart types: Scatter Plot

# Scatter plot [example]
Source: Seaborn documentation (2020)

# Essential chart types: Correlation Matrix
› It is a matrix having all the parameters to be tested placed both on rows and
columns
› It is widely used to assess visually the existence of linear correlations amongst the
considered parameters
› Each cell is colored depending on whether a positive or negative correlation exists
› No-correlation conditions can be depicted without any color or with specific colors
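The numbers behind such a chart can be sketched by computing the Pearson correlation coefficient for every pair of parameters. The coefficient choice, the helper names and the toy columns are assumptions for illustration; the coloring itself would be done by a plotting library and is not shown here.

```python
import math

def pearson(xs, ys):
    """Pearson linear correlation coefficient between two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def correlation_matrix(columns):
    """Parameters on both rows and columns; each cell holds their correlation."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

columns = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],   # grows with a: positive correlation
    "c": [4.0, 3.0, 2.0, 1.0],   # shrinks as a grows: negative correlation
}
matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(name, {k: round(v, 2) for k, v in row.items()})
```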

# Correlation Matrix [example]
Source: Data to Fish (2020)

Dataset Examples
Three widely-known, entry-level
training & validation datasets

# MNIST – Handwritten digit database

# IRIS Dataset

# PIMA Indians Diabetes dataset
[Figure panels: DATASET, DATA MODEL]

# Boston house prices dataset
[Figure panels: DATASET, DATA MODEL]

End of lesson.
