AI EEE Unit-I


UNIT – I

Artificial Neural Networks: Introduction, Models of Neuron Network-Architectures –


Knowledge representation, Artificial Intelligence and Neural networks–Learning process-
Error correction learning, Hebbian learning–Competitive learning-Boltzmann learning,
supervised learning-Unsupervised learning–Reinforcement learning-Learning tasks.
ANN Paradigms: Multi-layer perceptron using Back propagation Algorithm (BPA), Self-
Organizing Map (SOM), Radial Basis Function Network-Functional Link Network
(FLN), Hopfield Network

Artificial Neural Networks

Introduction

NEURAL NETWORK INTRODUCTION:


What is a neuron? A neuron is the basic processing unit of the neural network in our
brain. It consists of:
1. Nucleus
2. Axon (output node)
3. Dendrites (input nodes)
4. Synaptic junctions

The dynamics of the synaptic junction are complex. Signals produced by the action of a
neuron pass through the synaptic junction, and the output actuated there is carried
over the dendrites to another neuron; the carriers at the junction are the
neurotransmitters. Through experience, these synaptic junctions are either reinforced or
weakened, so that the output of a synaptic junction may excite or inhibit the receiving
neuron. This reinforcement of the synaptic weight is a concept that has been carried
over to the artificial neuron model.
The objective is to create an artificial machine, and these artificial neural networks are
motivated by certain features observed in the human brain, such as, as noted earlier,
parallel distributed information processing.

Artificial neural networks are among the most powerful learning models. They have the
versatility to approximate a wide range of complex functions representing multi-
dimensional input-output maps. Neural networks also have inherent adaptability, and can
perform robustly even in noisy environments.
An Artificial Neural Network (ANN) is an information processing paradigm that is inspired
by the way biological nervous systems, such as the brain, process information. The key
element of this paradigm is the novel structure of the information processing system. It is
composed of a large number of highly interconnected simple processing elements (neurons)
working in unison to solve specific problems. ANNs, like people, learn by example. An ANN
is configured for a specific application, such as pattern recognition or data classification,
through a learning process. Learning in biological systems involves adjustments to the
synaptic connections that exist between the neurons. This is true of ANNs as well. ANNs
can process information at great speed owing to their massive parallelism.
A trained neural network can be thought of as an "expert" in the category of information it
has been given to analyse. This expert can then be used to provide projections given new
situations of interest and answer "what if" questions.

Advantages of ANN:
1. Adaptive learning: An ability to learn how to do tasks based on the data given for training
or initial experience.
2. Self-Organisation: An ANN can create its own organisation or representation of the
information it receives during learning time.
3. Real Time Operation: ANN computations may be carried out in parallel, and special
hardware devices are being designed and manufactured which take advantage of this
capability.
4. Fault Tolerance via Redundant Information Coding: Partial destruction of a network
leads to the corresponding degradation of performance. However, some network capabilities
may be retained even with major network damage.
Artificial Neural Network (ANN) is an efficient computing system whose central theme is
borrowed from the analogy of biological neural networks. ANNs are also named as “artificial
neural systems,” or “parallel distributed processing systems,” or “connectionist systems.”
An ANN consists of a large collection of units that are interconnected in some pattern to allow
communication between the units. These units, also referred to as nodes or neurons, are
simple processors which operate in parallel.
Every neuron is connected to other neurons through connection links. Each connection
link is associated with a weight that has information about the input signal. This is the
most useful information for neurons to solve a particular problem because the weight
usually excites or inhibits the signal that is being communicated. Each neuron has an
internal state, which is called an activation signal. Output signals, which are produced
after combining the input signals and activation rule, may be sent to other units.
The history of ANN can be divided into the following three eras:
 ANN during 1940s to 1960s
 Some key developments of this era are as follows:
 1943: It has been assumed that the concept of neural network started with the work
of physiologist, Warren McCulloch, and mathematician, Walter Pitts, when in 1943
they modeled a simple neural network using electrical circuits in order to describe
how neurons in the brain might work.
 1949: Donald Hebb’s book, The Organization of Behavior, put forth the idea that
repeated activation of one neuron by another increases the strength of the connection
between them each time they are used.
 1956: An associative memory network was introduced by Taylor.
 1958: A learning method for McCulloch and Pitts neuron model named Perceptron
was invented by Rosenblatt.
 1960: Bernard Widrow and Marcian Hoff developed models called "ADALINE" and
“MADALINE.”
 ANN during 1960s to 1980s
 Some key developments of this era are as follows:
 1961: Rosenblatt made an unsuccessful attempt but proposed the “backpropagation”
scheme for multilayer networks.
 1964: Taylor constructed a winner-take-all circuit with inhibitions among output units.
 1969: Minsky and Papert published Perceptrons, analyzing the limitations of single-layer perceptrons.
 1971: Kohonen developed Associative memories.
 1976: Stephen Grossberg and Gail Carpenter developed Adaptive resonance theory.
 ANN from 1980s till Present
Some key developments of this era are as follows:
o 1982: The major development was Hopfield’s Energy approach.
o 1985: Boltzmann machine was developed by Ackley, Hinton, and Sejnowski.
o 1986: Rumelhart, Hinton, and Williams introduced Generalised Delta Rule.
o 1988: Kosko developed Binary Associative Memory (BAM) and also gave the concept
of Fuzzy Logic in ANN.
Biological Neuron
o A nerve cell (neuron) is a special biological cell that processes information. It is
estimated that there are approximately 10^11 neurons with numerous
interconnections, approximately 10^15.
The features of the biological neural network are attributed to its structure and function.
The fundamental unit of the network is called a neuron or a nerve cell. Figure 1 shows a
schematic of the structure of a neuron.

Figure 1: Schematic diagram of a typical neuron or nerve cell.


Tree-like nerve fibres called dendrites are associated with the cell body. These dendrites receive
signals from other neurons. Extending from the cell body is a single long fibre called the
axon, which eventually branches into strands and sub strands connecting to many other
neurons at the synaptic junctions, or synapses. The receiving ends of these junctions on
other cells can be found both on the dendrites and on the cell bodies themselves. The axon
of a typical neuron leads to a few thousand synapses associated with other neurons.
The transmission of a signal from one cell to another at a synapse is a complex chemical
process in which specific transmitter substances are released from the sending side of the
junction. The effect is to raise or lower the electrical potential inside the body of the
receiving cell. If this potential reaches a threshold, an electrical activity in the form of short
pulses is generated. When this happens, the cell is said to have fired. These electrical
signals of fixed strength and duration are sent down the axon. Generally the electrical
activity is confined to the interior of a neuron, whereas the chemical mechanism operates
at the synapses.
The dendrites serve as receptors for signals from other neurons, whereas the purpose of an
axon is transmission of the generated neural activity to other nerve cells (inter-neuron) or
to muscle fibres (motor neuron). A third type of neuron, which receives information from
muscles or sensory organs, such as the eye or ear, is called a receptor neuron.
As shown in the above diagram, a typical neuron consists of the following four parts with
the help of which we can explain its working:
 Dendrites: They are tree-like branches, responsible for receiving the information
from other neurons the cell is connected to. In a sense, they are like the ears of the
neuron.
 Soma: It is the cell body of the neuron and is responsible for processing the
information received from the dendrites.
 Axon: It is just like a cable through which the neuron sends the information.
 Synapses: They are the connections between the axon of one neuron and the dendrites of other neurons.
ANN versus BNN
Before taking a look at the differences between an Artificial Neural Network (ANN) and a
Biological Neural Network (BNN), let us take a look at the similarities in terminology
between the two: the soma corresponds to the node, the dendrites to the inputs, the
synapses to the weights or interconnections, and the axon to the output.

ANN and BNN can further be compared on criteria such as processing speed, size and
complexity, fault tolerance, and storage capacity.
Architecture:

Fig. Architecture of multilayer neural network


Artificial neural networks are represented by a set of nodes, often arranged in layers, and a
set of weighted directed links connecting them. The nodes are equivalent to neurons, while
the links denote synapses. The nodes are the information processing units and the links
act as the communicating media.
A neural network may have different layers of neurons like
1. input layer,
2. hidden layer,
3. output layer.
The input layer receives input data from the user and propagates a signal to the next layer,
called the hidden layer; while doing so, the input signal is multiplied by the connection
weights. The hidden layer is a middle layer which lies between the input and the output
layers. A hidden layer with a nonlinear activation function increases the ability of the
neural network to solve many more problems than would be possible without the hidden
layer. The output layer presents its calculated output to the user, from which a decision
can be made.
Neural nets can also be classified based on the above stated properties.
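The layered signal flow described above can be illustrated with a minimal Python sketch of a forward pass through a network with one hidden layer; the layer sizes, random weights, and sigmoid activation are arbitrary illustrative choices.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative sizes: 3 inputs, 4 hidden units, 2 outputs
rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(4, 3))   # weights from the input layer to the hidden layer
W_output = rng.normal(size=(2, 4))   # weights from the hidden layer to the output layer

def forward(x):
    hidden = sigmoid(W_hidden @ x)   # hidden layer with a nonlinear activation function
    output = W_output @ hidden       # output layer produces the calculated output
    return output

print(forward(np.array([0.5, -1.0, 2.0])))
```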
Models of Neuron
In this section we will consider three classical models for an artificial neuron or processing
unit.
McCulloch-Pitts Model
Perceptron
Adaline
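As a concrete illustration of the oldest of these three models, here is a minimal sketch of a McCulloch-Pitts unit, which fires (outputs 1) when the weighted sum of its inputs reaches a threshold; the AND-gate weights and threshold below are illustrative.

```python
def mcculloch_pitts(inputs, weights, threshold):
    # The unit fires (returns 1) when the weighted sum of its inputs reaches the threshold
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

# A two-input AND gate: both inputs must be active for the unit to fire
for a in (0, 1):
    for b in (0, 1):
        print(a, b, mcculloch_pitts([a, b], weights=[1, 1], threshold=2))
```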
Knowledge representation

Humans are best at understanding, reasoning, and interpreting knowledge. Humans know
things, which is knowledge, and as per their knowledge they perform various actions in the
real world. How machines do all these things comes under knowledge
representation and reasoning. Hence we can describe knowledge representation as
follows:

 Knowledge representation and reasoning (KR, KRR) is the part of Artificial Intelligence
which is concerned with how AI agents think and how thinking contributes to the intelligent
behavior of agents.
 It is responsible for representing information about the real world so that a computer
can understand it and can utilize this knowledge to solve complex real-world
problems such as diagnosing a medical condition or communicating with humans in
natural language.
 It is also a way which describes how we can represent knowledge in artificial
intelligence. Knowledge representation is not just storing data into some database,
but it also enables an intelligent machine to learn from that knowledge and
experiences so that it can behave intelligently like a human.

What to Represent:

Following are the kind of knowledge which needs to be represented in AI systems:

 Object: All the facts about objects in our world domain. E.g., guitars contain
strings, trumpets are brass instruments.
 Events: Events are the actions which occur in our world.
 Performance: It describes behavior which involves knowledge about how to do things.
 Meta-knowledge: It is knowledge about what we know.
 Facts: Facts are the truths about the real world and what we represent.
 Knowledge-Base: The central component of knowledge-based agents is the
knowledge base, represented as KB. The knowledge base is a group of sentences
(here, “sentence” is used as a technical term and is not identical with a sentence in the
English language).
Knowledge: Knowledge is awareness or familiarity gained by experiences of facts, data,
and situations. Following are the types of knowledge in artificial intelligence:

1. Declarative Knowledge:

 Declarative knowledge is to know about something.


 It includes concepts, facts, and objects.
 It is also called descriptive knowledge and is expressed in declarative sentences.
 It is simpler than procedural knowledge.

2. Procedural Knowledge

 It is also known as imperative knowledge.


 Procedural knowledge is a type of knowledge which is responsible for knowing how to
do something.
 It can be directly applied to any task.
 It includes rules, strategies, procedures, agendas, etc.
 Procedural knowledge depends on the task on which it can be applied.

3. Meta-knowledge:

 Knowledge about the other types of knowledge is called Meta-knowledge.


4. Heuristic knowledge:

 Heuristic knowledge represents the knowledge of some experts in a field or subject.


 Heuristic knowledge is rules of thumb based on previous experiences and awareness of
approaches, which are likely to work but are not guaranteed.

5. Structural knowledge:

 Structural knowledge is basic knowledge used in problem-solving.


 It describes relationships between various concepts such as kind of, part of, and
grouping of something.
 It describes the relationship that exists between concepts or objects.

Approaches to knowledge representation:

There are mainly four approaches to knowledge representation, which are given below:

1. Simple relational knowledge:

 It is the simplest way of storing facts, using the relational method: each fact
about a set of objects is set out systematically in columns.
 This approach of knowledge representation is famous in database systems where the
relationship between different entities is represented.
 This approach has little opportunity for inference.

Example: The following is the simple relational knowledge representation.

Player    Weight    Age
Player1   65        23
Player2   58        18
Player3   75        24

2. Inheritable knowledge:

 In the inheritable knowledge approach, all data must be stored into a hierarchy of
classes.
 All classes should be arranged in a generalized form or in a hierarchical manner.
 In this approach, we apply inheritance property.
 Elements inherit values from other members of a class.
 This approach contains inheritable knowledge which shows a relation between
instance and class, and it is called instance relation.
 Every individual frame can represent the collection of attributes and its value.
 In this approach, objects and values are represented in Boxed nodes.
 We use Arrows which point from objects to their values.

Example:

3. Inferential knowledge:

 Inferential knowledge approach represents knowledge in the form of formal logic.
 This approach can be used to derive more facts.
 It guarantees correctness.
 Example: Let's suppose there are two statements:
1. Marcus is a man.
2. All men are mortal.
Then they can be represented as:
man(Marcus)
∀x: man(x) → mortal(x)

4. Procedural knowledge:
 Procedural knowledge approach uses small programs and code which describe how
to do specific things and how to proceed.
 In this approach, one important rule is used, namely the If-Then rule (a minimal sketch
is given after this list).
 In this approach, we can use various coding languages such as LISP and Prolog.
 We can easily represent heuristic or domain-specific knowledge using this approach.
 However, it is not necessary that all cases can be represented in this approach.
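As a rough illustration of If-Then procedural knowledge, here is a minimal sketch written in Python rather than LISP or Prolog; the diagnosis rules and patient facts are invented purely for illustration.

```python
# Hypothetical facts about a patient (invented for illustration)
facts = {"fever": True, "cough": True, "rash": False}

# Procedural knowledge expressed as simple If-Then rules
def diagnose(facts):
    if facts.get("fever") and facts.get("cough"):
        return "possible flu"        # If fever AND cough Then possible flu
    if facts.get("rash"):
        return "possible allergy"    # If rash Then possible allergy
    return "no conclusion"

print(diagnose(facts))
```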

Requirements for knowledge Representation system:

A good knowledge representation system must possess the following properties.


1. Representational Accuracy:
A KR system should have the ability to represent all kinds of required knowledge.
2. Inferential Adequacy:
A KR system should have the ability to manipulate the representational structures to
produce new knowledge corresponding to the existing structures.
3. Inferential Efficiency:
The ability to direct the inferential mechanism into the most productive
directions by storing appropriate guides.
4. Acquisitional Efficiency:
The ability to acquire new knowledge easily using automatic methods.

Neural Network Learning Rules


Error correction learning (learning with a teacher)
Error correction learning uses the error between the desired output and the actual output
for a given input pattern to adjust the weights. These are supervised learning laws, as they
depend on the availability of the desired output for a given input. Let (a, b) be a sample of
the input-output pair of vectors for which a network has to be designed by adjusting its
weights so as to obtain minimum error between the desired and actual outputs. Let E be
the error function and ε[E] be the expected value of the error function over all the training
data consisting of several input-output pairs. Since the joint probability density function of
the pairs of random input-output vectors is not known, it is not possible to obtain the
desired expectation ε[E] directly. Stochastic approximation estimates the expectation using
the observed random input-output pairs of vectors (a, b). These estimates are used in a
discrete approximation algorithm, such as a stochastic gradient descent algorithm, to adjust
the weights of the network. This type of adjustment may not always result in the optimum
set of weights, in the sense of minimizing ε[E]; it may result in some local minimum of the
expected error function.
Most error correction learning methods use the instantaneous error (b – b’) to adjust the
weights, where b' is the actual output vector of the network for the input vector a.
Rosenblatt's perceptron learning uses the instantaneous misclassification error to adjust
the weights. It is given by
Δwij = η (bi − si) aj,
where bi is the desired output from the ith output unit for an input pattern a = (a1, a2, ...,
aM), si is the actual (hard-limited) output of the ith unit, aj is the jth component of the input
pattern to the ith unit, and η is a small positive learning constant.
The continuous perceptron learning uses a monotonically increasing nonlinear output
function f(.) for each unit. The weights are adjusted so as to minimize the squared error
between the desired and actual output at every instant. The corresponding learning
equation is given by
Δwij = η (bi − si) f′i(xi) aj,
where si = fi(xi) and xi = Σj wij aj is the activation value of the ith unit.


Continuous perceptron learning is also called delta learning, and it can be generalized for
a network consisting of several layers of feed forward units. The resulting learning method
is called the generalized delta rule.
Widrow's least mean squared error (LMS) algorithm uses the instantaneous squared error
between the desired and the actual output of a unit, assuming a linear output function for
each unit, i.e., f(x) = x. The corresponding learning equation is given by
Δwij = η (bi − wi·a) aj,
where wi·a = Σj wij aj is the actual output of the ith linear unit.

Note that in all of the above error correction learning methods, we have assumed the
passive decay term to be zero. These methods require that the learning constant (η) is made
as small as possible, and that the training samples are applied several times to the network
until the weights lead to a minimum error.
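To make these updates concrete, here is a minimal Python sketch of the LMS rule Δwij = η (bi − wi·a) aj for a single linear output unit; the training pairs, learning constant, and number of passes are illustrative choices.

```python
import numpy as np

# Illustrative training pairs (a, b): the unit should learn b = 2*a1 - a2
data = [(np.array([1.0, 0.0]), 2.0),
        (np.array([0.0, 1.0]), -1.0),
        (np.array([1.0, 1.0]), 1.0)]

w = np.zeros(2)   # weights of a single linear output unit, f(x) = x
eta = 0.1         # small positive learning constant

for epoch in range(200):               # apply the training samples several times
    for a, b in data:
        b_actual = w @ a               # actual output of the unit
        w += eta * (b - b_actual) * a  # LMS update: eta * instantaneous error * input

print(w)   # approaches [2, -1] as the error is driven towards a minimum
```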

Hebbian Learning Rule


This rule, one of the oldest and simplest, was introduced by Donald Hebb in his 1949 book
The Organization of Behavior. It is a kind of feed-forward, unsupervised learning.
Basic Concept − This rule is based on a proposal given by Hebb, who wrote:
“When an axon of cell A is near enough to excite a cell B and repeatedly or
persistently takes part in firing it, some growth process or metabolic change takes place in
one or both cells such that A’s efficiency, as one of the cells firing B, is increased.”
From the above postulate, we can conclude that the connections between two neurons
might be strengthened if the neurons fire at the same time and might weaken if they fire at
different times.
Mathematical Formulation − According to the Hebbian learning rule, the weight of a
connection is increased at every time step as
Δwji(t) = α xi(t) yj(t),
where α is the learning rate, xi(t) is the input, and yj(t) is the output of the neuron at time step t.
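A minimal Python sketch of this update, assuming the form Δwji(t) = α xi(t) yj(t) given above; the input patterns, initial weights, and learning rate are illustrative (note that the pure Hebbian rule lets the weights grow without bound).

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = 0.1                          # learning rate
w = rng.normal(scale=0.1, size=3)    # small random initial weights

# Illustrative input patterns presented over several time steps
patterns = [np.array([1.0, 0.0, 1.0]), np.array([1.0, 1.0, 0.0])]

for t in range(5):
    for x in patterns:
        y = w @ x                    # output of the neuron
        w += alpha * y * x           # Hebbian update: strengthen co-active connections

print(w)   # weights grow along directions where input and output are active together
```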

Competitive learning
It is concerned with unsupervised training in which the output nodes compete with
each other to represent the input pattern. To understand this learning rule, we
must understand the competitive network, which is given as follows −
Basic Concept of Competitive Network − This network is just like a single-layer feed-forward
network with feedback connections between the outputs. The connections between the
outputs are of the inhibitory type, shown by dotted lines, which means the competitors
never support themselves.

Basic Concept of Competitive Learning Rule: As said earlier, there will be a competition
among the output nodes. Hence, the main concept is that during training, the output unit
with the highest activation to a given input pattern will be declared the winner.
This rule is also called Winner-takes-all because only the winning neuron is updated and
the rest of the neurons are left unchanged.
Mathematical formulation: Following are the three important factors for mathematical
formulation of this learning rule:
Condition to be a winner: Suppose a neuron yk wants to be the winner; then the
following condition must hold:
yk = 1 if vk > vj for all j, j ≠ k; otherwise yk = 0.
It means that if any neuron, say yk, wants to win, then its induced local field (the output
of the summation unit), say vk, must be the largest among all the other neurons in the
network.
Condition on the sum total of weights: Another constraint of the competitive learning rule
is that the sum total of the weights to a particular output neuron is 1. For example, if we
consider neuron k, then
Σj wkj = 1 for all k.
Change of weight for winner: If a neuron does not respond to the input pattern, then no
learning takes place in that neuron. However, if a particular neuron wins, then the
corresponding weights are adjusted as follows:
Δwkj = η (xj − wkj) if neuron k wins the competition; Δwkj = 0 if neuron k loses.
This clearly shows that we favor the winning neuron by adjusting its weights, and if a
neuron loses, we need not bother to re-adjust its weights.
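A minimal Python sketch of this winner-takes-all update; the two output neurons, the input patterns, and the learning rate are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
eta = 0.3

# Two output neurons, three inputs; each neuron's weights are normalized to sum to 1
W = rng.random((2, 3))
W /= W.sum(axis=1, keepdims=True)

patterns = [np.array([0.9, 0.1, 0.0]), np.array([0.0, 0.2, 0.8])]

for epoch in range(20):
    for x in patterns:
        v = W @ x                 # induced local fields of the output neurons
        k = int(np.argmax(v))     # the neuron with the largest v wins
        W[k] += eta * (x - W[k])  # only the winner moves towards the input
        W[k] /= W[k].sum()        # keep the winner's weights summing to 1

print(W)   # each row has moved towards one group of similar inputs
```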
Boltzmann learning
Boltzmann learning is statistical in nature, and is derived from the field of
thermodynamics. It is similar to error-correction learning and is used during
supervised training. ... Neural networks that use Boltzmann learning are
called Boltzmann machines.
The main purpose of Boltzmann Machine is to optimize the solution of a problem. It is the
work of Boltzmann Machine to optimize the weights and quantity related to that particular
problem.
Architecture
The following diagram shows the architecture of a Boltzmann machine. It is clear from the
diagram that it is a two-dimensional array of units. Here, weights on interconnections
between units are –p where p > 0. The weights of self-connections are given by b where b >
0.

Training Algorithm
Since Boltzmann machines have fixed weights, there is no training algorithm as such; we do
not need to update the weights in the network. However, to test the network we have to set
the weights as well as find the consensus function (CF).
The Boltzmann machine has a set of units Ui and Uj with bi-directional connections between
them.
 We consider a fixed weight, say wij.
 wij ≠ 0 if Ui and Uj are connected.
 There also exists symmetry in the weighted interconnections, i.e. wij = wji.
 wii also exists, i.e. there can be a self-connection for a unit.
 For any unit Ui, its state ui would be either 1 or 0.
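As a rough illustration only, here is a minimal Python sketch that evaluates a consensus function of the assumed form CF = Σi Σj≤i wij ui uj and performs stochastic state updates at a temperature T; the weight matrix, temperature schedule, and acceptance rule 1/(1 + exp(−ΔCF/T)) are assumptions made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative symmetric weight matrix for 4 units (w_ij = w_ji, zero self-connections here)
W = np.array([[ 0.0, -1.0,  0.5,  0.0],
              [-1.0,  0.0,  0.0,  0.5],
              [ 0.5,  0.0,  0.0, -1.0],
              [ 0.0,  0.5, -1.0,  0.0]])

u = rng.integers(0, 2, size=4).astype(float)    # state of each unit is 0 or 1

def consensus(state):
    # CF = sum over i and j <= i of w_ij * u_i * u_j (assumed form)
    return sum(W[i, j] * state[i] * state[j]
               for i in range(len(state)) for j in range(i + 1))

T = 2.0
for step in range(100):
    i = rng.integers(len(u))                     # pick a unit at random
    delta = (1 - 2 * u[i]) * (W[i, i] + sum(W[i, j] * u[j]
                                            for j in range(len(u)) if j != i))
    if rng.random() < 1.0 / (1.0 + np.exp(-delta / T)):   # probabilistic acceptance
        u[i] = 1 - u[i]
    T *= 0.97                                    # slowly lower the temperature

print(u, consensus(u))
```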
Supervised Learning
As the name suggests, this type of learning is done under the supervision of a teacher. This
learning process is dependent.
During the training of an ANN under supervised learning, the input vector is presented
to the network, which will give an output vector. This output vector is compared with the
desired output vector. An error signal is generated, if there is a difference between the
actual output and the desired output vector. On the basis of this error signal, the weights
are adjusted until the actual output is matched with the desired output.

Unsupervised Learning
As the name suggests, this type of learning is done without the supervision of a teacher.
This learning process is independent.
During the training of an ANN under unsupervised learning, the input vectors of similar type
are combined to form clusters. When a new input pattern is applied, the
neural network gives an output response indicating the class to which the input pattern
belongs.
There is no feedback from the environment as to what should be the desired output and if
it is correct or incorrect. Hence, in this type of learning, the network itself must discover
the patterns and features from the input data, and the relation for the input data over the
output.
Reinforcement Learning
As the name suggests, this type of learning is used to reinforce or strengthen the network
on the basis of some critic information. This learning process is similar to supervised
learning; however, we might have very little information.
During the training of network under reinforcement learning, the network receives some
feedback from the environment. This makes it somewhat similar to supervised learning.
However, the feedback obtained here is evaluative, not instructive, which means there is no
teacher as in supervised learning. After receiving the feedback, the network
adjusts its weights to obtain better critic information in the future.

Learning paradigms:
There are three major learning paradigms, each corresponding to a particular abstract
learning task. These are supervised learning, unsupervised learning and reinforcement
learning.
 In supervised learning, the network is provided with a correct answer (output) for every
input pattern. Weights are determined to allow the network to produce answers as
close as possible to the known correct answers.
 Unsupervised learning does not require a correct answer associated with each input
pattern in the training data set. It explores the underlying structure in the data, or
correlations between patterns in the data, and organizes patterns into categories from
these correlations.
 Hybrid learning combines supervised and unsupervised learning. Part of the weights
is usually determined through supervised learning, while the others are obtained
through unsupervised learning. Reinforcement learning is a variant of supervised
learning in which the network is provided with only a critique on the correctness of
network outputs, not the correct answers themselves.
ANN Paradigms
Multilayer Neural Networks and Back propagation:
Self-Organizing Map (SOM)
Topographic Maps
Neurobiological studies indicate that different sensory inputs (motor, visual, auditory, etc.)
are mapped onto corresponding areas of the cerebral cortex in an orderly fashion.
This form of map, known as a topographic map, has two important properties:
1. At each stage of representation, or processing, each piece of incoming information is kept
in its proper context/neighbourhood.
2. Neurons dealing with closely related pieces of information are kept close together so that
they can interact via short synaptic connections.
Our interest is in building artificial topographic maps that learn through self-organization
in a neuro-biologically inspired manner.
We shall follow the principle of topographic map formation: “The spatial location of an
output neuron in a topographic map corresponds to a particular domain or feature drawn
from the input space”.
Setting up a Self Organizing Map
The principal goal of an SOM is to transform an incoming signal pattern of arbitrary
dimension into a one or two dimensional discrete map, and to perform this transformation
adaptively in a topologically ordered fashion.
We therefore set up our SOM by placing neurons at the nodes of a one or two dimensional
lattice. Higher dimensional maps are also possible, but not so common.
The neurons become selectively tuned to various input patterns (stimuli) or classes of
input patterns during the course of the competitive learning.
The locations of the neurons so tuned (i.e. the winning neurons) become ordered and a
meaningful coordinate system for the input features is created on the lattice. The SOM
thus forms the required topographic map of the input patterns.
We can view this as a non-linear generalization of principal component analysis (PCA).
Organization of the Mapping
We have points x in the input space mapping to points I(x) in the output space

Each point I in the output space will map to a corresponding point w(I) in the input space.
Components of Self Organization
The self-organization process involves four major components:
Initialization: All the connection weights are initialized with small random values.
Competition: For each input pattern, the neurons compute their respective values of a
discriminant function which provides the basis for competition. The particular neuron with
the smallest value of the discriminant function is declared the winner.
Cooperation: The winning neuron determines the spatial location of a topological
neighborhood of excited neurons, thereby providing the basis for cooperation among
neighboring neurons.
Adaptation: The excited neurons decrease their individual values of the discriminant
function in relation to the input pattern through suitable adjustment of the associated
connection weights, such that the response of the winning neuron to the subsequent
application of a similar input pattern is enhanced.
The Competitive Process
If the input space is D dimensional (i.e. there are D input units) we can write the input
patterns as x = {xi : i = 1, …, D} and the connection weights between the input units i and
the neurons j in the computation layer can be written wj = {wji : j = 1, …, N; i = 1, …, D}
where N is the total number of neurons.
We can then define our discriminant function to be the squared Euclidean distance
between the input vector x and the weight vector wj for each neuron j:
dj(x) = Σi (xi − wji)²
In other words, the neuron whose weight vector comes closest to the input vector (i.e. is
most similar to it) is declared the winner.
In this way the continuous input space can be mapped to the discrete output space of
neurons by a simple process of competition between the neurons.

The Cooperative Process


In neurobiological studies we find that there is lateral interaction within a set of excited
neurons. When one neuron fires, its closest neighbors tend to get excited more than those
further away. There is a topological neighborhood that decays with distance.
We want to define a similar topological neighborhood for the neurons in our SOM. If Sij is
the lateral distance between neurons i and j on the grid of neurons, we take
Tj,I(x) = exp(−S²j,I(x) / 2σ²)
as our topological neighborhood, where I(x) is the index of the winning neuron. This has
several important properties: it is maximal at the winning neuron, it is symmetrical about
that neuron, it decreases monotonically to zero as the distance goes to infinity, and it is
translation invariant (i.e. independent of the location of the winning neuron)
A special feature of the SOM is that the size σ of the neighborhood needs to decrease with
time. A popular time dependence is an exponential decay:
σ(t) = σ0 exp(−t/τσ)

The Adaptive Process
Clearly our SOM must involve some kind of adaptive, or learning, process by which the
outputs become self-organised and the feature map between inputs and outputs is
formed.
The point of the topographic neighborhood is that not only the winning neuron gets its
weights updated, but its neighbors will have their weights updated as well, although by not
as much as the winner itself. In practice, the appropriate weight update equation is
Δwji = η(t) Tj,I(x)(t) (xi − wji),
in which we have a time (epoch) t dependent neighborhood Tj,I(x)(t) and
learning rate η(t) = η0 exp(−t/τη), and the updates are applied for all the training
patterns x over many epochs.
The effect of each learning weight update is to move the weight vectors wi of the winning
neuron and its neighbors towards the input vector x. Repeated presentations of the training
data thus leads to topological ordering.
Ordering and Convergence

Provided the parameters (σ0, τσ, η0, τη) are selected properly, we can start from an
initial state of complete disorder, and the SOM algorithm will gradually lead to an organized
representation of activation patterns drawn from the input space. (However, it is possible to
end up in a metastable state in which the feature map has a topological defect.)
There are two identifiable phases of this adaptive process:
1. Ordering or self-organizing phase – during which the topological ordering of the weight
vectors takes place. Typically this will take as many as 1000 iterations of the SOM
algorithm, and careful consideration needs to be given to the choice of neighborhood and
learning rate parameters.
2. Convergence phase – during which the feature map is fine-tuned and comes to provide
an accurate statistical quantification of the input space. Typically the number of iterations
in this phase will be at least 500 times the number of neurons in the network, and again
the parameters must be chosen carefully.
Overview of the SOM Algorithm
We have a spatially continuous input space, in which our input vectors live. The aim is to
map from this to a low dimensional spatially discrete output space, the topology of which
is formed by arranging a set of neurons in a grid. Our SOM provides such a nonlinear
transformation called a feature map.
The stages of the SOM algorithm can be summarized as follows:
1. Initialization – Choose random values for the initial weight vectors wj.
2. Sampling – Draw a sample training input vector x from the input space.
3. Matching – Find the winning neuron I(x) with weight vector closest to input vector.

4. Updating – Apply the weight update equation Δwji = η(t) Tj,I(x)(t) (xi − wji).

5. Continuation – keep returning to step 2 until the feature map stops changing.
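Putting the five stages together, here is a minimal Python sketch of the SOM algorithm on a one-dimensional lattice, assuming the exponential decay schedules σ(t) = σ0 exp(−t/τσ) and η(t) = η0 exp(−t/τη) used above; the lattice size, training data, and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

N, D = 10, 2                      # 10 neurons on a 1-D lattice, 2-D input space
W = rng.random((N, D))            # 1. Initialization: random initial weight vectors
data = rng.random((200, D))       # illustrative training inputs

sigma0, tau_sigma = 3.0, 300.0    # neighborhood size schedule
eta0, tau_eta = 0.5, 500.0        # learning rate schedule

for t in range(1000):
    x = data[rng.integers(len(data))]               # 2. Sampling
    dists = np.sum((W - x) ** 2, axis=1)            # squared Euclidean discriminant function
    winner = int(np.argmin(dists))                  # 3. Matching: winning neuron I(x)

    sigma = sigma0 * np.exp(-t / tau_sigma)         # shrinking neighborhood size
    eta = eta0 * np.exp(-t / tau_eta)               # decaying learning rate
    lattice = np.arange(N)
    T = np.exp(-(lattice - winner) ** 2 / (2 * sigma ** 2))   # topological neighborhood

    W += eta * T[:, None] * (x - W)                 # 4. Updating: winner and neighbors move
# 5. Continuation: in practice, keep iterating until the feature map stops changing
print(W)
```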
Radial Basis Function Network
Introduction to Radial Basis Functions
The idea of Radial Basis Function (RBF) Networks derives from the theory of function
approximation. We have already seen how Multi-Layer Perceptron (MLP) networks with a
hidden layer of sigmoidal units can learn to approximate functions. RBF Networks take a
slightly different approach.
Their main features are:
1. They are two-layer feed-forward networks.
2. The hidden nodes implement a set of radial basis functions (e.g. Gaussian functions).
3. The output nodes implement linear summation functions as in an MLP.
4. The network training is divided into two stages: first the weights from the input to hidden
layer are determined, and then the weights from the hidden to output layer.
5. The training/learning is very fast.
6. The networks are very good at interpolation.
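As a rough illustration of the two-stage training described above, here is a minimal Python sketch using Gaussian basis functions with centres fixed from the data and output weights found by linear least squares; the toy data, the number of centres M, and the width σ are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression data (illustrative): y = sin(x) plus a little noise
X = np.linspace(0, 2 * np.pi, 50)
y = np.sin(X) + 0.1 * rng.normal(size=X.shape)

# Stage 1: fix the hidden layer - M Gaussian centres taken from the data, common width sigma
M, sigma = 8, 0.8
centres = X[:: len(X) // M][:M]
Phi = np.exp(-((X[:, None] - centres[None, :]) ** 2) / (2 * sigma ** 2))
Phi = np.hstack([Phi, np.ones((len(X), 1))])        # bias unit at the output layer

# Stage 2: output weights by linear least squares (the output nodes are linear sums)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)

print(w)          # hidden-to-output weights plus bias
print(Phi @ w)    # fitted values approximate sin(x)
```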
Commonly Used Radial Basis Functions
A range of theoretical and empirical studies have indicated that many properties of the
interpolating function are relatively insensitive to the precise form of the basis functions
φ(r). Some of the most commonly used basis functions are:
1. Gaussian Functions: φ(r) = exp(−r² / 2σ²), with width parameter σ > 0
2. Multi-Quadric Functions: φ(r) = (r² + σ²)^(1/2)
3. Generalized Multi-Quadric Functions: φ(r) = (r² + σ²)^β, 0 < β < 1
4. Inverse Multi-Quadric Functions: φ(r) = (r² + σ²)^(−1/2)
5. Generalized Inverse Multi-Quadric Functions: φ(r) = (r² + σ²)^(−α), α > 0
6. Thin Plate Spline Function: φ(r) = r² ln(r)
7. Cubic Function: φ(r) = r³
8. Linear Function: φ(r) = r
Problems with Exact Interpolation
We have seen how we can set up RBF networks that perform exact interpolation, but there
are two serious problems with these exact interpolation networks:
1. They perform poorly with noisy data
As we have already seen for Multi-Layer Perceptrons (MLPs), we do not usually want the
network’s outputs to pass through all the data points when the data is noisy, because that
would be a highly oscillatory function that will not provide good generalization.
2. They are not computationally efficient
The network requires one hidden unit (i.e. one basis function) for each training data
pattern, and so for large data sets the network will become very costly to evaluate.
With MLPs we can improve generalization by using more training data – the opposite
happens in RBF networks, and they take longer to compute as well.
Improving RBF Networks
We can take the basic structure of the RBF networks that perform exact interpolation and
improve upon them in a number of ways:
1. The number M of basis functions (hidden units) need not equal the number N of training
data points. In general it is better to have M much less than N.
2. The centers of the basis functions do not need to be defined as the training data input
vectors. They can instead be determined by a training algorithm.
3. The basis functions need not all have the same width parameter σ. These can also be
determined by a training algorithm.
4. We can introduce bias parameters into the linear sum of activations at the output layer.
These will compensate for the difference between the average value over the data set of the
basis function activations and the corresponding average value of the targets.
