ELM Tutorial
A Brief Tutorial
Outline
http://en.wikipedia.org/wiki/Perceptron

Perceptron and AI Winter
Three Waves of Machine Learning
• 1950s-1980s: Warm up
  – Features: computers not powerful, no efficient algorithms, not enough data
  – Situation: Chinese people already had this good dream since the inception of computers and called computers "Electronic Brains (电脑)"
• 1980s-2010: Research driven
  – Features: computers powerful enough, efficient algorithms developed, but not enough data in many cases
  – Situation: driven more by researchers than by industries
• 2010 onward
  – Features: computers everywhere and very powerful, powerful and smart computing, smart sensors/devices, huge data coming, efficient algorithms under way
  – Situation: whether you admit it or not, we have to rely on machine learning from now on
Rethink Artificial Intelligence and Machine Learning
• Rosenblatt's perceptron proposed in 1950s
• AI Winter (1970s)
• Neural networks reviving: almost all Deep Learning (CNN, BP, etc) techniques proposed in 1980s
• SVM proposed in 1990s
• ELMs born in 2004
• Deep Learning reviving in 2004 due to high performance of computing
• ELMs' direct biological evidence found in 2012
Necessary Conditions of Machine Learning Era
• Rich dynamic data
• Powerful computing environment
• Efficient learning algorithms
[Figure: SLFN with L hidden nodes $(a_i, b_i)$, $i = 1, \cdots, L$, and input $x$]
Output of RBF hidden nodes: $G(a_i, b_i, x) = g(b_i \|x - a_i\|)$
Feedforward Neural Networks
• Mathematical Model
– Approximation capability [Leshno 1993, Park and Sandberg 1991]: Any continuous target function $f(x)$ can be approximated by SLFNs with adjustable hidden nodes. In other words, given any small positive value $\epsilon$, for SLFNs with a large enough number of hidden nodes ($L$) we have $\|f_L(x) - f(x)\| < \epsilon$.
– Classification capability [Huang, et al 2000]: As long as SLFNs can approximate any continuous target function $f(x)$, such SLFNs can differentiate any disjoint regions.
M. Leshno, et al., “Multilayer feedforward networks with a nonpolynomial activation function can approximate any function,” Neural
Networks, vol. 6, pp. 861-867, 1993.
J. Park and I. W. Sandberg, “Universal approximation using radial-basis-function networks,” Neural Computation, vol. 3, pp. 246-257,
1991.
G.-B. Huang, et al, “Classification ability of single hidden layer feedforward neural networks,” IEEE Trans. Neural Networks, vol. 11,
no. 3, pp. 799–801, May 2000.
Feedforward Neural Networks
• Learning Issue
– Conventional theories only resolve the existence issue; they do not tackle the learning issue at all.
– In real applications, the target function $f$ is usually unknown. One wishes that the unknown $f$ could be approximated by SLFNs appropriately.
Feedforward Neural Networks
• Learning Methods
– Many learning methods, mainly based on gradient-descent / iterative approaches, have been developed over the past three decades.
• Back-Propagation (BP) [Rumelhart 1986] and its variants are the most popular.
– Least-squares (LS) solution for the RBF network, with a single impact factor for all hidden nodes [Broomhead and Lowe 1988]
– QuickNet [White 1988] and the random vector functional link network (RVFL) [Igelnik and Pao 1995]
– Support vector machines and their variants [Cortes and Vapnik 1995]
– Deep learning: dates back to the 1960s, with a resurgence in the mid-2000s [wiki 2015]
Support Vector Machine – an
Alternative Solution of SLFN
SVM optimization formula:
Minimize: $\frac{1}{2}\|w\|^2 + C\sum_{i=1}^{N}\xi_i$
subject to: $t_i\,(w\cdot\phi(x_i) + b) \ge 1 - \xi_i,\ \forall i$
            $\xi_i \ge 0,\ \forall i$

The decision function of SVM and LS-SVM is:
$f(x) = \mathrm{sign}\Big(\sum_{s=1}^{N_s}\alpha_s t_s K(x, x_s) + b\Big)$
Feedforward Neural Networks
Research in Neural Networks Stuck …?
Conventional Learning Methods | Biological Learning
Very sensitive to network size | Stable in a wide range (tens to thousands of neurons in each module)
Difficult for parallel implementation | Parallel implementation
Difficult for hardware implementation | "Biological" implementation
Very sensitive to user-specified parameters | Free of user-specified parameters
Different network types for different types of applications | One module possibly for several types of applications
Time consuming in each learning point | Fast in micro learning point
Research in Neural Networks Stuck …?
• Reasons
– Based on the conventional existence theories:
• Since hidden nodes are important and critical, we need to find some
way to adjust hidden nodes.
• Learning focuses on hidden nodes.
• Learning is tremendously inefficient.
– Intensive research: many departments/groups in almost every university/research institution have devoted huge manpower to looking for so-called "appropriate" (actually still very basic) learning methods over the past 30 years.
• Question
– Is a free lunch really impossible?
– The answer is "seemingly far away, actually close at hand and right under our nose" ("远在天边, 近在眼前")
Fundamental Problems to Be Resolved
by Extreme Learning Machines (ELM)
• Do we really need so many different types of learning
algorithms for so many different types of networks?
– different types of SLFNs
• sigmoid networks
• RBF networks
• polynomial networks
• complex (domain) networks
• Fourier series
• wavelet networks, etc
– multi-layer architectures
• Do we really need to tune a wide range of hidden neuron types, including biological neurons (whose modeling may even be unknown), in learning?
Extreme Learning Machines (ELM)
[Figure: ELM network with d input nodes $x_j$, L random hidden nodes $(a_i, b_i)$, and problem-based optimization constraints at the output for feature learning, clustering, regression and classification]

The hidden layer output function (hidden layer mapping, ELM feature space):
$h(x) = [G(a_1, b_1, x), \cdots, G(a_L, b_L, x)]$
The output functions of hidden nodes can be, but are not limited to, sigmoid, RBF, Fourier series, wavelets, etc.

– ELM theory not only proves the existence of the networks but also provides learning solutions.
– All these hidden node parameters can be randomly generated without training data.
– That is, for any continuous target function $f$ and any randomly generated sequence $\{(a_i, b_i)\}_{i=1}^{L}$, $\lim_{L\to\infty}\|f(x) - f_L(x)\| = \lim_{L\to\infty}\big\|f(x) - \sum_{i=1}^{L}\beta_i\,G(a_i, b_i, x)\big\| = 0$ holds with probability one if $\beta_i$ is chosen to minimize $\|f - f_i\|,\ \forall i$. [Huang, et al 2006]

G.-B. Huang, et al., "Universal approximation using incremental constructive feedforward networks with random hidden nodes," IEEE Transactions on Neural Networks, vol. 17, no. 4, pp. 879-892, 2006.
G.-B. Huang and L. Chen, "Convex incremental extreme learning machine," Neurocomputing, vol. 70, pp. 3056-3062, 2007.
O. Barak, et al., "The importance of mixed selectivity in complex cognitive tasks," Nature, vol. 497, pp. 585-590, 2013.
M. Rigotti, et al., "The sparseness of mixed selectivity neurons controls the generalization-discrimination trade-off," Journal of Neuroscience, vol. 33, no. 9, pp. 3844-3856, 2013.
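To make the hidden-layer mapping concrete, here is a minimal NumPy sketch of an ELM feature map $h(x)$ built from randomly generated hidden nodes; the specific sigmoid and Gaussian node forms, the uniform sampling range, and all names are illustrative assumptions rather than anything prescribed above.

```python
import numpy as np

def random_hidden_layer(d, L, node_type="sigmoid", rng=None):
    """Randomly generate L hidden nodes (a_i, b_i) without seeing any training data."""
    rng = np.random.default_rng(rng)
    A = rng.uniform(-1.0, 1.0, size=(L, d))   # a_i: input weights / RBF centres (assumed range)
    b = rng.uniform(-1.0, 1.0, size=L)        # b_i: biases / impact factors

    def h(X):
        X = np.atleast_2d(X)
        if node_type == "sigmoid":            # G(a, b, x) = g(a . x + b)
            return 1.0 / (1.0 + np.exp(-(X @ A.T + b)))
        elif node_type == "rbf":              # Gaussian node, an illustrative choice
            sq = ((X[:, None, :] - A[None, :, :]) ** 2).sum(axis=2)
            return np.exp(-np.abs(b) * sq)
        raise ValueError(node_type)
    return h

# h(x) = [G(a_1, b_1, x), ..., G(a_L, b_L, x)] is the ELM feature space
h = random_hidden_layer(d=3, L=20, node_type="sigmoid", rng=0)
H = h(np.random.randn(5, 3))   # 5 samples mapped into the 20-dimensional ELM feature space
print(H.shape)                 # (5, 20)
```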
Extreme Learning Machines (ELM)
• Essence of ELM
– The hidden layer need not be tuned.
• "Randomness" is just one of ELM's implementations, but not all of it.
– It satisfies both ridge regression theory [Hoerl and Kennard 1970] and neural network generalization theory [Bartlett 1998].
– It fills the gap and builds bridges among neural networks, SVM, random projection, Fourier series, matrix theories, linear systems, etc.
Basic ELM – an L2 Norm Solution
• Salient Features
– “Simple Math is Enough.” ELM is a simple tuning-free three-step
algorithm.
– The learning speed of ELM is extremely fast.
– Unlike conventional existence theories, the hidden node parameters
are not only independent of the training data but also of each other.
Although hidden nodes are important and critical, they need not
be tuned.
– Unlike conventional learning methods which MUST see the
training data before generating the hidden node parameters, ELM
could generate the hidden node parameters before seeing the training
data.
– Homogeneous architectures for compression, feature learning, clustering, regression and classification.
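As an illustration of the three-step recipe above, here is a minimal NumPy sketch of basic ELM training and prediction (random hidden layer, then a single least-squares / pseudo-inverse solve for the output weights); the function names, the sigmoid node type and the toy data are assumptions for illustration, not the exact setup of the slides.

```python
import numpy as np

def elm_train(X, T, L, rng=None):
    """Basic ELM: 1) random hidden nodes, 2) hidden-layer output matrix H, 3) beta = pinv(H) T."""
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    A = rng.uniform(-1.0, 1.0, size=(L, d))          # step 1: random input weights a_i
    b = rng.uniform(-1.0, 1.0, size=L)                #         and biases b_i (never tuned)
    H = 1.0 / (1.0 + np.exp(-(X @ A.T + b)))          # step 2: hidden-layer output matrix (N x L)
    beta = np.linalg.pinv(H) @ T                      # step 3: least-squares output weights (L x m)
    return A, b, beta

def elm_predict(X, A, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ A.T + b)))
    return H @ beta

# toy regression: learn y = sin(x) on [-3, 3]
X = np.linspace(-3, 3, 200).reshape(-1, 1)
T = np.sin(X)
A, b, beta = elm_train(X, T, L=50, rng=0)
print(np.mean((elm_predict(X, A, b, beta) - T) ** 2))  # small training MSE
```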
Extreme Learning Machines (ELM)
Minimize: $\|\beta\|^2$ and $C\sum_{i=1}^{N}\|\xi_i\|^2$
subject to: $h(x_i)\,\beta = t_i^{T} - \xi_i^{T},\ \forall i$
Extreme Learning Machines (ELM)
$\beta = H^T\Big(\frac{I}{C} + HH^T\Big)^{-1}T$ and $f(x) = h(x)\,\beta = h(x)\,H^T\Big(\frac{I}{C} + HH^T\Big)^{-1}T$

– Kernel based (if $h(x)$ is unknown):
$f(x) = \begin{bmatrix} K(x, x_1) \\ \vdots \\ K(x, x_N) \end{bmatrix}^{T} \Big(\frac{I}{C} + \Omega_{ELM}\Big)^{-1} T$,
where $\Omega_{ELM\,i,j} = K(x_i, x_j) = h(x_i)\cdot h(x_j)$
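The closed-form solutions above translate directly into a few lines of NumPy; this is a hedged sketch of the regularized solution $\beta = H^T(I/C + HH^T)^{-1}T$ and of the kernel variant with $\Omega_{ELM\,i,j} = K(x_i, x_j)$, assuming a Gaussian kernel purely for illustration.

```python
import numpy as np

def regularized_elm_output_weights(H, T, C=1.0):
    """beta = H^T (I/C + H H^T)^(-1) T  (convenient when N <= L)."""
    N = H.shape[0]
    return H.T @ np.linalg.solve(np.eye(N) / C + H @ H.T, T)

def kernel_elm(X_train, T, X_test, C=1.0, gamma=1.0):
    """Kernel ELM: f(x) = [K(x, x_1), ..., K(x, x_N)] (I/C + Omega)^(-1) T, Gaussian K assumed."""
    def gram(A, B):
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-gamma * sq)
    Omega = gram(X_train, X_train)                      # Omega_ELM[i, j] = K(x_i, x_j)
    alpha = np.linalg.solve(np.eye(len(X_train)) / C + Omega, T)
    return gram(X_test, X_train) @ alpha                # predictions for X_test

X = np.linspace(-3, 3, 100).reshape(-1, 1)
T = np.sin(X)
print(np.mean((kernel_elm(X, T, X, C=100.0, gamma=2.0) - T) ** 2))  # small training error
```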
Image Super-Resolution by ELM
From top to bottom: super-resolution at 2x and 4x. State-of-the-art methods: iterative curve based interpolation (ICBI), kernel regression based method (KR), compressive sensing based sparse representation method (SR). [An and Bhanu 2012]
Automatic Object Recognition
Situation of the wind measuring towers in Spain and within the eight wind farms. Wind speed prediction in tower 6 of the
considered wind farm in Spain obtained by the ELM network (prediction using data from 7 towers). (a) Best prediction
obtained and (b) worst prediction obtained. [Saavedra-Moreno, et al, 2013]
Electricity Price Forecasting
Average results of market clearing prices (MCP) forecast by ELM in winter: trading in the Australian national electricity market (NEM) is based on a 30-min trading interval. Generators submit their offers every 5 min each day. The dispatch price is determined every 5 min, and 6 dispatch prices are averaged every half-hour to determine the regional MCPs. In order to assist the decision-making process for generators, a total of 48 MCPs need to be predicted at the same time for the coming trading day. [Chen, et al, 2012]
Remote Control of a Robotic Hand
• Offline classification of eight wrist motions using linear support vector machines with little training time (under 10 minutes).
• This study shows a human could control the remote robot hand in real time using his or her sEMG signals, with less than 50 seconds of recorded training data, with ELM. [Lee, et al 2011]
Human Action Recognition
[Minhas, et al 2012]
3D Shape Segmentation and Labelling
[Xie, et al 2014]
Constraints of BP and SVM Theory
Essential Considerations of ELM
• High Accuracy
• Least User Intervention
• Real-Time Learning (in seconds, milliseconds, even microseconds)
ELM for Threshold Networks
ELM for Complex Networks
• Circular functions: $\tan(z)$, $\sin(z)$
• Hyperbolic functions: $\tanh(z)$, $\sinh(z)$

Compared with ESN, ELM reduces the error rate by a factor of 1000 or more.
Why SVM / LS-SVM Are
Suboptimal
Optimization Constraints of ELM and LS-SVM
ELM (equality constraints):
Minimize: $L_{P_{ELM}} = \frac{1}{2}\|\beta\|^2 + C\,\frac{1}{2}\sum_{i=1}^{N}\xi_i^2$
subject to: $h(x_i)\,\beta = t_i - \xi_i,\ \forall i$
– The corresponding dual optimization problem:
Minimize: $L_{D_{ELM}} = \frac{1}{2}\|\beta\|^2 + C\,\frac{1}{2}\sum_{i=1}^{N}\xi_i^2 - \sum_{i=1}^{N}\alpha_i\,\big(h(x_i)\,\beta - t_i + \xi_i\big)$
subject to: $\frac{\partial L_{D_{ELM}}}{\partial \beta} = 0,\ \frac{\partial L_{D_{ELM}}}{\partial \xi_i} = 0,\ \frac{\partial L_{D_{ELM}}}{\partial \alpha_i} = 0,\ \forall i$
Optimization Constraints of ELM and LS-SVM
LS-SVM:
Minimize: $L_{P_{LS\text{-}SVM}} = \frac{1}{2}\|w\|^2 + C\,\frac{1}{2}\sum_{i=1}^{N}\xi_i^2$
subject to: $t_i\,(w\cdot\phi(x_i) + b) = 1 - \xi_i,\ \forall i$
In LS-SVM the optimal $\alpha_i$ are found from one hyperplane $\sum_{i=1}^{N}\alpha_i t_i = 0$.
– The corresponding dual optimization problem:
Minimize: $L_{D_{LS\text{-}SVM}} = \frac{1}{2}\|w\|^2 + C\,\frac{1}{2}\sum_{i=1}^{N}\xi_i^2 - \sum_{i=1}^{N}\alpha_i\,\big(t_i\,(w\cdot\phi(x_i) + b) - 1 + \xi_i\big)$
subject to: $\sum_{i=1}^{N}\alpha_i t_i = 0$, $\alpha_i = C\,\xi_i$, $t_i\,(w\cdot\phi(x_i) + b) - 1 + \xi_i = 0,\ \forall i$
Optimization Constraints of ELM and SVM
ELM (inequality constraints):
Minimize: $\frac{1}{2}\|\beta\|^2 + C\sum_{i=1}^{N}\xi_i$
subject to: $t_i\,h(x_i)\,\beta \ge 1 - \xi_i,\ \forall i$
            $\xi_i \ge 0,\ \forall i$
– The corresponding dual optimization problem:
Minimize: $\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} t_i t_j\,\alpha_i\alpha_j\, h(x_i)\cdot h(x_j) - \sum_{i=1}^{N}\alpha_i$
subject to: $0 \le \alpha_i \le C,\ \forall i$
Optimization Constraints of ELM and SVM
SVM:
Minimize: $\frac{1}{2}\|w\|^2 + C\sum_{i=1}^{N}\xi_i$
subject to: $t_i\,(w\cdot\phi(x_i) + b) \ge 1 - \xi_i,\ \forall i$
            $\xi_i \ge 0,\ \forall i$
In SVM the optimal $\alpha_i$ are found from one hyperplane $\sum_{i=1}^{N}\alpha_i t_i = 0$.
– The corresponding dual optimization problem:
Minimize: $\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} t_i t_j\,\alpha_i\alpha_j\, \phi(x_i)\cdot\phi(x_j) - \sum_{i=1}^{N}\alpha_i$
subject to: $0 \le \alpha_i \le C,\ \forall i$
            $\sum_{i=1}^{N}\alpha_i t_i = 0$
Optimization Constraints of ELM and
SVM
[Figure: feasible regions of ELM's inequality-constraint variant [Huang, et al 2010] (the cube $[0, C]^N$) and of SVM (a hyperplane within the cube)]
ELM (based on inequality constraint conditions) and SVM have the same dual optimization objective functions, but in ELM the optimal $\alpha_i$ are found in the entire cube $[0, C]^N$, while in SVM the optimal $\alpha_i$ are found on one hyperplane $\sum_{i=1}^{N}\alpha_i t_i = 0$ within the cube $[0, C]^N$. SVM always provides a suboptimal solution, and so does LS-SVM.
SVM’s Suboptimal Solutions
• Reasons
– SVM's historical role is irreplaceable! Without SVM and Vapnik, computational intelligence might not have been so successful and the history of computational intelligence would have been re-written! However ...
– SVM always searches for the optimal solution in the hyperplane $\sum_{i=1}^{N}\alpha_i t_i = 0$ within the cube $[0, C]^N$ of the SVM feature space.
– SVMs may apply similar application-oriented constraints to irrelevant applications and search similar hyperplanes in the feature space if their target labels are similar. Irrelevant applications may become relevant in SVM solutions.
[Huang, et al 2010]
SVM’s Suboptimal Solutions
• Reasons
– SVM is too "generous" on the feature mappings and kernels, almost condition-free except for Mercer's conditions.
1) As the feature mappings and kernels need not satisfy the universal approximation condition, the bias $b$ must be present.
2) As $b$ exists, contradictions are caused.
3) LS-SVM inherits such "generosity" from the conventional SVM.
SVM’s Suboptimal Solutions
[Figure: SVM separating hyperplanes $w\cdot\phi(x) + b = 0$ and $w\cdot\phi(x) + b = +1$]
As SVM was originally proposed for classification, universal approximation capability was not considered in the first place. Actually the feature mappings $\phi(x)$ are unknown and may not satisfy the universal approximation condition, so the bias $b$ must be present to absorb the system error. ELM was originally proposed for regression; the feature mappings $h(x)$ are known and universal approximation capability was considered in the first place. In ELM the system error tends to be zero and $b$ should not be present.
SVM’s Suboptimal Solutions
• Maximum margin?
– Maximum margin is good for binary classification cases. However, if one only considers maximum margin, it is hard to imagine what "maximum margin" means in multi-class / regression problems.
– Over-emphasizing "maximum margin" locked SVM research into binary classification and made it difficult to find a direct solution for multi-class applications.
– "Maximum margin" is just a special case of ridge regression theory, linear system stability, and neural network generalization performance theory in binary applications.
• ELM integrates ridge regression theory, linear system stability, and neural network generalization performance theory for regression and multiclass applications; "maximum margin" is just a special case in ELM's binary applications.
SVM’s Suboptimal Solutions
G.-B. Huang, et al., “Extreme learning machine for regression and multiclass classification”, IEEE Transactions on Systems, Man
and Cybernetics - Part B, vol. 42, no. 2, pp. 513-529, 2012.
G.-B. Huang, “An Insight into Extreme Learning Machines: Random Neurons, Random Features and Kernels”, Cognitive
Computation, 2014.
ELM and SVM
(a) SVM: d input nodes → hidden layers → binary output. Unknown features in each layer, black box, layer-wise information is lost.
(b) ELM: d input nodes → ELM feature space → ELM feature space → m output nodes. Layer-wise features are learned, white box.
Relationship and Difference Between
ELM and SVM/LS-SVM
ELM vs QuickNet / RVFL
[Figure: QuickNet/RVFL network with direct input-output links and "enhanced patterns" (specific ELM feature mapping such as sigmoid nodes and RBF nodes) vs. ELM network with problem-based optimization constraints]

QuickNet (1989, not patented) / RVFL (1994, patented) | ELM (not patented)
Mainly on sigmoid and RBF nodes, not applicable to kernel learning | Proved for general cases: any piecewise continuous nodes. ELM theories extended to biological neurons whose mathematical formula is even unknown
Not feasible for multi-layer RVFL, losing learning in auto-encoder and feature learning. RVFL and PCA/random projection are different | Efficient for multi-layer ELM, auto-encoder, and feature learning; PCA and random projection are specific cases of ELM when linear neurons are used
If ELM's optimization is used in QuickNet (1988) / RVFL and Schmidt (1992), a suboptimal solution tends to be achieved | Regularization of output weights, ridge regression theory, neural network generalization performance theory (maximal margin in binary-class cases); SVM and LS-SVM provide suboptimal solutions
Hidden layer output matrix: [H_ELM for sigmoid or RBF, X], X: N x d | Hidden layer output matrix: H_ELM for almost any nonlinear piecewise continuous neurons
 | Homogeneous architectures for compression, feature learning, clustering, regression and classification
Relationship and Difference Between
ELM and QuickNet/RVFL, Duin’s Work
G.-B. Huang, "What are Extreme Learning Machines? Filling the Gap between Frank Rosenblatt's Dream and John von Neumann's Puzzle," Cognitive Computation, vol. 7, pp. 263-278, 2015.
Part II
Hierarchical ELM
- Layer-wise learning
- but learning without iteratively tuning hidden neurons
- output weights analytically calculated by closed-form solutions in many applications

Multi-Layer ELM
d Input Nodes → ELM Feature Space → ELM Feature Space → m Output Nodes
Different from Deep Learning, all the hidden neurons in ELM as a whole are not required to be iteratively tuned.
ELM as Auto-Encoder (ELM-AE)
ELM as Auto-Encoder (ELM-AE)
ELM-AE vs. singular value decomposition. (a) The output weights of ELM-AE and (b) rank 20
SVD basis shows the feature representation of each number (0–9) in the MNIST dataset.
ELM as Auto-Encoder (ELM-AE)
ELM-AE based multi-layer ELM (ML-ELM): different from Deep Learning, no iteration is required in tuning the entire multi-layer feedforward network.
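A minimal sketch of the ELM-AE / ML-ELM idea in NumPy: each ELM-AE layer uses random hidden nodes and learns output weights β that reconstruct its own input, and βᵀ is then reused to transform the data for the next layer; the sigmoid choice, the ridge parameter, and the omission of the orthogonalization step used in the cited work are simplifying assumptions for illustration.

```python
import numpy as np

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def elm_ae_layer(X, L, C=1e3, rng=None):
    """One ELM auto-encoder layer: random hidden nodes, output weights beta learned so that
    H beta reconstructs X itself; beta.T is then used to transform the data for the next layer."""
    rng = np.random.default_rng(rng)
    A = rng.standard_normal((X.shape[1], L))             # random input weights (orthogonalization omitted)
    b = rng.standard_normal(L)
    H = sigmoid(X @ A + b)                                # N x L hidden-layer outputs
    beta = np.linalg.solve(np.eye(L) / C + H.T @ H, H.T @ X)   # ridge solution, targets = inputs
    return sigmoid(X @ beta.T)                            # transformed representation (N x L)

def ml_elm(X, T, layer_sizes=(64, 32), C=1e3, rng=0):
    """ML-ELM sketch: stack ELM-AE layers, then one ordinary ELM output layer; no iterative tuning."""
    Z = X
    for i, L in enumerate(layer_sizes):
        Z = elm_ae_layer(Z, L, C=C, rng=rng + i)
    beta_out = np.linalg.solve(np.eye(Z.shape[1]) / C + Z.T @ Z, Z.T @ T)
    return Z @ beta_out                                    # outputs for the training data
```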
ELM vs Deep Learning
L. L. C. Kasun, et al, “Representational Learning with Extreme Learning Machine for Big Data,” IEEE Intelligent Systems, vol. 28,
no. 6, pp. 31-34, 2013.
J. Tang, et al., "Extreme Learning Machine for Multilayer Perceptron," IEEE Transactions on Neural Networks and Learning Systems, 2015 (in press).
Human Action Recognition
Methods      | ELM Based | Tensor canonical correlation | Tangent bundles on special manifolds
Accuracy (%) | 99.4      | 85                           | 93.4
[Deng, et al 2015]
Target Tracking
[Pipeline: Frame (n) → Sampling → Feature Extraction (Multilayer Encoding) → Online Sequential Updating (updating OS-ELM)]
J. Xiong, et al., "Extreme Learning Machine for Multilayer Perceptron," IEEE Transactions on Neural Networks and Learning Systems, 2015.
Target Tracking
Comparison of tracking location error using H-ELM, CT, and SDA on different data sets. (a) David Indoor. (b) Trellis
Car Detection
Methods      | ELM Based | Contour based learning | SDA
Accuracy (%) | 95.5      | 92.8                   | 93.3
Time         | 46.78 s   |                        | 3262.30 s
[Deng, et al 2014]
ELM vs Deep Learning
Learning Methods | Testing Accuracy | Training Time
ELM-AE           | 86.45            | 602 s
ELM Theory on Local Receptive Fields
and Super Nodes
[Figure: local receptive field with random input weight vector a_k and pooling size]
NORB dataset (3D object recognition) accuracies: DBNs 93.5%, DBMs 92.8%, SVMs 88.4%
Training time on NORB data: DBN 13 h, ELM 0.1 h
G.-B. Huang, et al., "Local Receptive Fields Based Extreme Learning Machine," IEEE Computational Intelligence Magazine, vol. 10, no. 2, pp. 18-29, 2015.
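As a rough, hedged illustration of the local-receptive-field idea (random convolutional filters, square-root pooling, then a single analytically computed output layer), the NumPy sketch below uses assumed filter and pooling sizes and is not the exact configuration of the cited paper.

```python
import numpy as np

def elm_lrf_features(images, n_filters=8, k=5, pool=3, rng=0):
    """Random local receptive fields: convolve with random k x k filters, then square-root pooling.
    `images` is assumed to be a list of same-sized 2-D grayscale arrays."""
    rng = np.random.default_rng(rng)
    filters = rng.standard_normal((n_filters, k, k))      # random, never-tuned filters
    feats = []
    for img in images:
        H, W = img.shape
        maps = []
        for f in filters:
            conv = np.zeros((H - k + 1, W - k + 1))        # 'valid' convolution with a random filter
            for i in range(conv.shape[0]):
                for j in range(conv.shape[1]):
                    conv[i, j] = np.sum(img[i:i + k, j:j + k] * f)
            ph, pw = conv.shape[0] // pool, conv.shape[1] // pool
            pooled = np.zeros((ph, pw))                    # square-root pooling over pool x pool blocks
            for i in range(ph):
                for j in range(pw):
                    block = conv[i * pool:(i + 1) * pool, j * pool:(j + 1) * pool]
                    pooled[i, j] = np.sqrt(np.sum(block ** 2))
            maps.append(pooled.ravel())
        feats.append(np.concatenate(maps))
    return np.stack(feats)                                 # N x (n_filters * ph * pw) feature matrix

def elm_lrf_train(images, T, C=1.0, **kw):
    H = elm_lrf_features(images, **kw)                     # random, untuned feature layer
    beta = H.T @ np.linalg.solve(np.eye(H.shape[0]) / C + H @ H.T, T)  # output weights only
    return beta
```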
ELM vs Deep Learning
Z. Bai, et al., "Generic Object Recognition with Local Receptive Fields Based Extreme Learning Machine," 2015 INNS Conference on Big Data, San Francisco, August 8-10, 2015.
ELM Slices
Traffic Sign Recognition (DNN + ELM)
ELM, SVM and Deep Learning
(a) SVM: d input nodes → hidden layers → binary output; unknown features in each layer.
(b) ELM: d input nodes → ELM feature space → ELM feature space → m output nodes; different from Deep Learning, all the hidden neurons in ELM as a whole are not required to be iteratively tuned.
[Figure: ELM subnetwork with d input nodes x_j, L hidden nodes, and problem-based optimization constraints for feature learning, clustering, regression and classification]
ELM and Deep Learning
• Compression
• Feature Learning
• Clustering
• Regression
• Classification
ELM Filling Gaps …
[Timeline figure:]
• Rosenblatt Perceptron (1958), Baum (1988), QuickNet (1989), Schmidt, et al (1992), RVFL (1994)
• Biological learning (?)
• Feature space methods: PCA (1901), SVM (1995), Random Projection (1998), LS-SVM (1999), PSVM (2001)
ELM Filling Gaps …
Before ELM theory, for these methods (Rosenblatt Perceptron (1958), Baum (1988), QuickNet (1989), Schmidt, et al (1992), RVFL (1994)):
1) Universal approximation capability was not proved for the fully random hidden nodes case.
2) Separation capability was not proved.
3) Optimization constraints were not used.
4) Dimensionality of hidden maps is usually lower than the number of training data.
ELM: 1) + random features; 2) remove the bias in the output nodes, which is contradictory to biological systems.
Towards Biological Learning, Cognition
and Reasoning?
Biological Learning | ELMs
Stable in a wide range (tens to thousands of neurons in each module) | Stable in a wide range (tens to thousands of neurons in each module)
Parallel implementation | Easy in parallel implementation
"Biological" implementation | Much easier in hardware implementation
Free of user-specified parameters | Least human intervention
One module possibly for several types of applications | One network type for different applications
Fast in micro learning point | Fast in micro learning point
Natural in online sequential learning | Easy in online sequential learning
Fast speed and high accuracy | Fast speed and high accuracy
Brains are built before applications | "Brains (devised by ELM)" can be generated before applications are present
Biological Learning vs Computers
• 60 Years Later …
• Answered by ELM Learning Theory [Huang, et al 2006, 2007, 2008]
– "As long as the output functions of hidden neurons are nonlinear piecewise continuous, and even if their shapes and modeling are unknown, (biological) neural networks with random hidden neurons attain both universal approximation and classification capabilities, and changes in a finite number of hidden neurons and their related connections do not affect the overall performance of the networks." [Huang 2014]
Biological Learning vs Computers
Things + ELMs → Intelligent Things (e.g. intelligent engines, intelligent devices, intelligent sensors, intelligent cameras, etc)

Society of Intelligent Things
Three Stages of Intelligent Things
1. Internet of Things: smart materials, smart sensors
2. Internet of Intelligent Things: intelligent things with ELMs
3. Society of Intelligent Things: Internet disappearing? From living thing intelligence to machine intelligence?
Human Intelligence vs Machine
Intelligence
Human
Intelligence
Machine
Intelligence
Part III
ELM Theories, Incremental/Sequential ELM
(ELM Web Portal: www.extreme-learning-machines.org)

Outline
1 ELM Theories
2 Incremental ELM
ELM Theory

$$\left\| f(x) - \sum_{n=1}^{L} \beta_n g_n \right\| < \epsilon \qquad (2)$$

holds with probability one if $\beta_n = \dfrac{\langle e_{n-1}, g_n\rangle}{\|g_n\|^2}$, $g_n = G(a_n, b_n, x)$, $n = 1, \cdots, L$.

Figure 2: Feedforward network architecture: any type of nonlinear piecewise continuous $G(a_i, b_i, x)$.

M. Leshno, et al., "Multilayer feedforward networks with a nonpolynomial activation function can approximate any function," Neural Networks, vol. 6, pp. 861-867, 1993.
J. Park and I. W. Sandberg, "Universal approximation using radial-basis-function networks," Neural Computation, vol. 3, pp. 246-257, 1991.
G.-B. Huang, et al., "Universal approximation using incremental constructive feedforward networks with random hidden nodes," IEEE Transactions on Neural Networks, vol. 17, no. 4, pp. 879-892, 2006.
G.-B. Huang and L. Chen, "Convex incremental extreme learning machine," Neurocomputing, vol. 70, pp. 3056-3062, 2007.
G.-B. Huang, et al., "Incremental extreme learning machine with fully complex hidden nodes," Neurocomputing, vol. 71, pp. 576-583, 2008.
I-ELM

Given a training set $\aleph = \{(x_i, t_i)\,|\,x_i \in \mathbf{R}^n, t_i \in \mathbf{R}^m, i = 1, \cdots, N\}$, hidden node output function $G(a, b, x)$, maximum node number $L_{max}$ and expected learning accuracy $\epsilon$:

1 Initialization: let $L = 0$ and residual error $E = t$, where $t = [t_1, \cdots, t_N]^T$.
2 Learning step:
   while $L < L_{max}$ and $\|E\| > \epsilon$
     - Increase the number of hidden nodes $L$ by 1: $L = L + 1$.
     - Assign random hidden node parameters $(a_L, b_L)$ for the new hidden node $L$.
     - Calculate the output weight $\beta_L$ for the new hidden node: $\beta_L = \dfrac{E \cdot H_L^T}{H_L \cdot H_L^T} \approx \dfrac{\langle e_{L-1}, g_L\rangle}{\|g_L\|^2}$.
     - Calculate the residual error after adding the new hidden node $L$: $E = E - \beta_L \cdot H_L$.
   endwhile

where $H_L = [h(1), \cdots, h(N)]^T$ is the activation vector of the new node $L$ for all the $N$ training samples and $E = [e(1), \cdots, e(N)]^T$ is the residual vector; $E \cdot H_L^T \approx \langle e_{L-1}, g_L\rangle$ and $H_L \cdot H_L^T \approx \|g_L\|^2$.
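A minimal NumPy sketch of the I-ELM loop above for single-output regression; the sigmoid node type, sampling ranges and stopping values are illustrative assumptions.

```python
import numpy as np

def i_elm(X, t, L_max=200, eps=1e-3, rng=0):
    """I-ELM: add random sigmoid nodes one at a time; beta_L = E.H_L / (H_L.H_L)."""
    rng = np.random.default_rng(rng)
    N, d = X.shape
    nodes, betas = [], []
    E = t.astype(float).copy()                 # residual error, initially the targets
    L = 0
    while L < L_max and np.linalg.norm(E) > eps:
        L += 1
        a = rng.uniform(-1, 1, d)              # random hidden node parameters (a_L, b_L)
        b = rng.uniform(-1, 1)
        H_L = 1.0 / (1.0 + np.exp(-(X @ a + b)))   # activation vector of the new node (length N)
        beta_L = (E @ H_L) / (H_L @ H_L)        # output weight of the new node
        E = E - beta_L * H_L                    # update residual
        nodes.append((a, b))
        betas.append(beta_L)
    return nodes, np.array(betas)

# quick check on a toy 1-D regression
X = np.linspace(-3, 3, 300).reshape(-1, 1)
t = np.sin(2 * X[:, 0])
nodes, betas = i_elm(X, t)
f = sum(beta / (1.0 + np.exp(-(X @ a + b))) for (a, b), beta in zip(nodes, betas))
print(len(betas), np.linalg.norm(t - f))       # residual norm shrinks as nodes are added
```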
Table 3: Training time (seconds) and network complexity comparison of different algorithms.
Table 5: Performance comparison (training time in seconds) of I-ELM (with 500 random sigmoid hidden nodes), stochastic gradient descent BP (SGBP), and SVR.

Table 6: Performance comparison between the approximated threshold network (λ = 10) trained by stochastic gradient descent BP (SGBP) and the true threshold networks trained by I-ELM with 500 threshold nodes: $g(x) = -1_{\{x<0\}} + 1_{\{x\ge 0\}}$.
G.-B. Huang and L. Chen, "Enhanced random search based incremental extreme learning machine," Neurocomputing, vol. 71, pp. 3460-3468, 2008.

EI-ELM Algorithm
Same as I-ELM, except that at each step k candidate random hidden nodes are generated; for each candidate i the output weight $\beta_{(i)}$ and the resulting residual $E^{(i)}$ are computed, and then:
   + Let $i^* = \arg\min_{1\le i\le k}\|E^{(i)}\|$. Set $E = E^{(i^*)}$, $a_L = a_{(i^*)}$, $b_L = b_{(i^*)}$, and $\beta_L = \beta_{(i^*)}$.
endwhile
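A hedged sketch of the EI-ELM modification to the I-ELM loop above: at each step k candidate random nodes are generated and only the one yielding the smallest residual is kept; k, the node type and all names are illustrative.

```python
import numpy as np

def ei_elm_step(X, E, k=10, rng=None):
    """One EI-ELM step: pick the best of k candidate random sigmoid nodes (smallest residual)."""
    rng = np.random.default_rng(rng)
    best = None
    for _ in range(k):
        a = rng.uniform(-1, 1, X.shape[1])
        b = rng.uniform(-1, 1)
        H = 1.0 / (1.0 + np.exp(-(X @ a + b)))       # candidate activation vector
        beta = (E @ H) / (H @ H)                     # candidate output weight
        E_new = E - beta * H                         # candidate residual E^(i)
        if best is None or np.linalg.norm(E_new) < np.linalg.norm(best[3]):
            best = (a, b, beta, E_new)               # keep i* = argmin ||E^(i)||
    return best   # (a_L, b_L, beta_L, updated residual E)
```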
Figure 7: Testing RMSE performance comparison between EI-ELM and I-ELM (with sigmoid hidden nodes) for the Abalone case.

Figure 8: Testing RMSE updating progress with new hidden nodes added and different numbers of selection trials k in the Airplane case.
Natural Learning
1 The training observations are sequentially (one-by-one or chunk-by-chunk with varying or fixed chunk length) presented to the learning algorithm/system.
2 At any time, only the newly arrived single or chunk of observations (instead of the entire past data) are seen and learned.
3 A single or a chunk of training observations is discarded as soon as the learning procedure for that particular (single or chunk of) observation(s) is completed.
4 The learning algorithm/system has no prior knowledge as to how many training observations will be presented.

G.-B. Huang, et al., "A generalized growing and pruning RBF (GGAP-RBF) neural network for function approximation," IEEE Transactions on Neural Networks, vol. 16, no. 1, pp. 57-67, 2005.
N.-Y. Liang, et al., "A fast and accurate on-line sequential learning algorithm for feedforward networks," IEEE Transactions on Neural Networks, vol. 17, no. 6, pp. 1411-1423, 2006.
RAN Based
1 RAN, MRAN, GAP-RBF, GGAP-RBF
2 At any time, only the newly arrived single observation is seen and learned
3 They do not handle chunks of training observations
4 Many control parameters need to be fixed by human. Very laborious! Very tedious!
5 Training time is usually huge!

BP Based
1 Stochastic gradient BP (SGBP)
2 It may handle chunks of training observations
Minimize: $\left\| \begin{bmatrix} H_0 \\ H_1 \end{bmatrix}\beta - \begin{bmatrix} T_0 \\ T_1 \end{bmatrix} \right\| \qquad (4)$

$\beta^{(1)} = K_1^{-1}\begin{bmatrix} H_0 \\ H_1 \end{bmatrix}^T\begin{bmatrix} T_0 \\ T_1 \end{bmatrix} = K_1^{-1}\big(K_1\beta^{(0)} - H_1^T H_1\beta^{(0)} + H_1^T T_1\big) = \beta^{(0)} + K_1^{-1} H_1^T\big(T_1 - H_1\beta^{(0)}\big) \qquad (5)$

where $\beta^{(1)}$ is the output weight for all the data learned so far,

$K_1 = \begin{bmatrix} H_0 \\ H_1 \end{bmatrix}^T\begin{bmatrix} H_0 \\ H_1 \end{bmatrix} = K_0 + H_1^T H_1, \quad K_0 = H_0^T H_0, \quad \beta^{(0)} = K_0^{-1} H_0^T T_0 \qquad (6)$
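A minimal NumPy sketch of the OS-ELM update (5)-(6) above, maintaining β and P = K⁻¹ and updating them chunk by chunk (via the Woodbury identity) without revisiting old data; the sigmoid hidden layer and all names are assumptions for illustration, and the first chunk is assumed to contain at least L samples so that K₀ is invertible.

```python
import numpy as np

class OSELM:
    """Online sequential ELM: initialize from a first chunk, then update beta per chunk."""
    def __init__(self, d, L=40, rng=0):
        rng = np.random.default_rng(rng)
        self.A = rng.uniform(-1, 1, (L, d))           # fixed random hidden nodes
        self.b = rng.uniform(-1, 1, L)
        self.P = None                                 # P = K^(-1)
        self.beta = None

    def _H(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.A.T + self.b)))

    def partial_fit(self, X, T):
        H = self._H(X)
        if self.beta is None:                         # initialization: beta_0 = K_0^(-1) H_0^T T_0
            self.P = np.linalg.inv(H.T @ H)
            self.beta = self.P @ H.T @ T
        else:                                         # recursive update (Woodbury form of eq. (5)-(6))
            PHt = self.P @ H.T
            M = np.linalg.inv(np.eye(H.shape[0]) + H @ PHt)
            self.P = self.P - PHt @ M @ H @ self.P
            self.beta = self.beta + self.P @ H.T @ (T - H @ self.beta)
        return self

    def predict(self, X):
        return self._H(X) @ self.beta
```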
Table 9: Comparison between OS-ELM and other sequential algorithms on regression applications.
Table 10: Comparison between OS-ELM and other sequential algorithms on classification applications.

Time-Series Problems
Table 11: Comparison between OS-ELM and other sequential algorithms on the Mackey-Glass time series application.
Table 12: Performance comparison of ELM and OS-ELM on regression applications.
Table 13: Performance comparison of ELM and OS-ELM on classification applications.
Table 14: Performance comparison of ELM and OS-ELM on the Mackey-Glass time series application.
K. Choi, et al., "Incremental face recognition for large-scale social network services," Pattern Recognition, vol. 45.

Figure 10: Example frames from top row: Weizmann dataset, middle row: KTH dataset, and bottom row: UCF sports dataset.

Figure 11: Tracking results using action videos of run, kick, golf and dive (top to bottom) from the UCF Sports dataset.

R. Minhas, et al., "Incremental learning in human action recognition based on Snippets," (in press) IEEE.
Weizmann dataset
Methods OS-ELM Based [32] [14] [36] [11]
Frames 1/1 3/3 6/6 10/10 1/12 1/9 1/1 7/7 10/10 8/8 20/20
Accuracy 65.2 95.0 99.63 99.9 55.0 93.8 93.5 96.6 99.6 97.05 98.68
KTH dataset
Methods OS-ELM Based [25] [33] [43] [14] [36] [12]
Frames 1/1 3/3 6/6 10/10 - - - - 1/1 7/7 20/20
Accuracy 74.4 88.5 92.5 94.4 91.3 90.3 83.9 91.7 88.0 90.9 90.84
Weizmann dataset
Methods OS-ELM Based [2] [32] [14] [36] [41] [30] [11]
Frames 1/1 3/3 6/6 10/10 - - - - - - -
Accuracy 100.0 100.0 100.0 100.0 100.0 72.8 98.8 100.0 97.8 99.44 100.0
KTH dataset
Methods OS-ELM Based [14] [36] [30] [21] [27] [9] [44]
Frames 1/1 3/3 6/6 10/10 - - - - - - -
Accuracy 92.8 93.5 95.7 96.1 91.7 92.7 94.83 95.77 97.0 96.7 95.7
Table 17: Classification comparison against different approaches at sequence-level.
Open Problems
4 Does ELM always have a faster learning speed than LS-SVM if the same kernel is used?