Chapter 7 - Neural-Networks
Neural Networks
Compiled By: Bal Krishna Nyaupane
[email protected]
Basic Components of Biological Neurons
The brain is a collection of about 10
billion interconnected neurons.
Each neuron is a cell that uses
biochemical reactions to receive,
process and transmit information.
The majority of neurons encode
their activation or outputs as a series
of brief electrical pulses.
A neuron's dendritic tree is connected to a thousand neighbouring neurons. When one of those neurons fires, a positive or negative charge is received by one of the dendrites. The strengths of all the received charges are added together through the processes of spatial and temporal summation.
The neuron’s cell body (soma) processes the incoming activations and converts them into output
activations.
The neuron’s nucleus contains the genetic material (DNA)
Dendrites are fibers which emanate from the cell body and provide the receptive zone that receive
activation from other neurons.
Axons are fibers acting as transmission lines that send action potentials to other neurons.
Each terminal button is connected to other neurons across a small gap called a synapse. The synapses allow signal transmission between the axons and the dendrites.
Biological NN      Artificial NN
Soma               Neuron
Dendrite           Input
Axon               Output
Synapse            Weight
Introduction to Neural Networks
McCulloch & Pitts (1943) are generally recognised as the designers of the first neural network.
The inventor of the first neurocomputer, Dr. Robert Hecht-Nielsen, defines a neural network as:
a computing system made up of a number of simple, highly interconnected processing elements,
which process information by their dynamic state response to external inputs.
An Artificial Neural Network (ANN) is an information processing paradigm that is inspired by
the biological nervous systems, such as the human brain’s information processing mechanism.
An artificial network consists of a pool of simple processing units which communicate by
sending signals to each other over a large number of weighted connections.
Artificial Neural Networks (ANNs) are networks of Artificial Neurons and hence constitute
crude approximations to parts of real brains.
From a computer's point of view, an ANN is just a parallel computational system consisting of many
simple processing elements connected together in a specific way in order to perform a particular
task.
An Artificial Neural Network is composed of a large number of highly interconnected
processing elements (neurons) working in unison to solve specific problems. NNs, like people,
learn by example.
Neural networks are a powerful technique to solve many real world problems. They have the
ability to learn from experience in order to improve their performance and to adapt themselves
to changes in the environment. In addition, they are able to deal with incomplete
information or noisy data and can be very effective especially in situations where it is not
possible to define the rules or steps that lead to the solution of a problem.
Why use Artificial Neural Networks?
• They are extremely powerful computational devices.
• Massive parallelism makes them very efficient
• They can learn and generalize from training data, so there is no need for enormous feats of
programming
• They are particularly fault tolerant.
• They are very noise tolerant, so they can cope with situations where normal symbolic
systems would have difficulty
• In principle, they can do anything a symbolic/logic system can do, and more.
• They can perform tasks that a linear program cannot perform.
• Widely applied in data classification, clustering, pattern recognition.
Neural Network Applications
Brain modelling
• Aid our understanding of how the brain works, how behavior emerges from the
interaction of networks of neurons, what needs to “get fixed” in brain damaged
patients.
Real world applications
• Financial modelling – predicting the stock market
• Time series prediction – climate, weather
• Computer games – intelligent agents, chess, backgammon (A board game for two
players; pieces move according to throws of the dice)
• Robotics – autonomous adaptable robots
• Pattern recognition – speech recognition, seismic activity, sonar signals (acoustic
pulse in water and measures distances in terms of the time for the echo of the pulse to
return)
• Data analysis – data compression, data mining
• Bioinformatics – DNA sequencing, alignment
Learning Processes in Neural Networks
A neural network has the ability to learn from its environment, and to improve its performance
through learning. The improvement in performance takes place over time in accordance with
some prescribed measure.
A neural network learns about its environment through an iterative process of adjustments
applied to its synaptic weights and thresholds. The network becomes more knowledgeable
about its environment after each iteration of the learning process.
There are three broad types of learning:
1. Supervised learning (i.e. learning with an external teacher)
2. Unsupervised learning (i.e. learning with no help)
3. Reinforcement learning (i.e. learning with limited feedback)
Supervised learning
• Supervised learning is the machine learning task of inferring a function from training data,
where the training data consist of a set of training examples; i.e. a supervised learning algorithm
analyzes the training data and produces an inferred function, which can be used for mapping
new examples.
• In supervised learning, the variables under investigation can be split into two groups: explanatory
variables and one (or more) dependent variables. The target of the analysis is to specify
a relationship between the explanatory variables and the dependent variable as it is done in
regression analysis.
• In supervised training, both the inputs and the outputs are provided. The network then processes
the inputs and compares its resulting outputs against the desired outputs.
• Errors are then propagated back through the system, causing the system to adjust the weights
which control the network. This process occurs over and over as the weights are continually
tweaked.
• The set of data which enables the training is called the training set. During the training of a
network the same set of data is processed many times as the connection weights are continually
refined.
• Used for: classification, regression
Unsupervised learning
• Unsupervised learning is the task of finding hidden structure in unlabeled data. Since the
examples given to the learner are unlabeled, there is no error or reward signal to evaluate a
potential solution.
• In unsupervised learning situations all variables are treated in the same way, there is no
distinction between explanatory and dependent variables.
• In unsupervised training, the network is provided with inputs but not with desired outputs.
The system itself must then decide what features it will use to group the input data. This is
often referred to as self-organization or adaptation.
• The most common unsupervised learning method is cluster analysis, which is used for
exploratory data analysis to find hidden patterns or grouping in data.
• Used for: clustering
Reinforcement learning
• Reinforcement learning: the agent acts on its environment and receives some evaluation of its
action (reinforcement), but is not told which action is the correct one to achieve its goal.
• It allows machines and software agents to automatically determine the ideal behaviour
within a specific context, in order to maximize its performance. Simple reward feedback is
required for the agent to learn its behaviour; this is known as the reinforcement signal.
• In the reinforcement learning, the learner receives feedback about the appropriateness of its
response. For correct responses, reinforcement learning resembles supervised learning.
• However, the two forms of learning differ significantly for errors, situations in which the
learner's behavior is in some way inappropriate. In these situations, supervised learning lets
the learner know exactly what it should have done, whereas reinforcement learning only
says that the behavior was inappropriate and (usually) how inappropriate it was.
• Consider an animal that has to learn some aspects of how to walk. It tries out various
movements. Some work -- it moves forward -- and it is rewarded. Others fail -- it stumbles
or falls down -- and it is punished with pain.
McCulloch-Pitts (M-P) Neuron Equation
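The equation for this slide did not survive extraction. As a sketch of the classical McCulloch-Pitts model: the neuron fires (outputs 1) when the weighted sum of its binary inputs reaches a threshold θ, and outputs 0 otherwise. The helper names below are illustrative:

```python
def mp_neuron(inputs, weights, theta):
    """McCulloch-Pitts neuron: fire (1) iff the weighted sum reaches the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= theta else 0

# With unit weights, the threshold alone selects the logic function:
AND = lambda x1, x2: mp_neuron([x1, x2], [1, 1], theta=2)
OR  = lambda x1, x2: mp_neuron([x1, x2], [1, 1], theta=1)
```

For example, AND(1, 1) fires while AND(1, 0) does not, since only the first input pattern reaches the threshold of 2.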
Artificial Neuron - Basic Elements
Activation Function
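The figures for the activation-function slides were lost in extraction. As a minimal sketch, the functions typically listed here are the binary hard limiter (step), the bipolar hard limiter (sign), the sigmoid, and the linear activation; the function names below are illustrative:

```python
import math

def step(x):
    """Binary hard limiter: 1 if x >= 0, else 0."""
    return 1 if x >= 0 else 0

def sign(x):
    """Bipolar hard limiter: +1 if x >= 0, else -1."""
    return 1 if x >= 0 else -1

def sigmoid(x):
    """Smooth squashing function with outputs in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def linear(x):
    """Identity activation: output equals the neuron input."""
    return x
```

The hard limiters appear in the perceptron and Hopfield sections below; the sigmoid is the activation assumed in the back-propagation section.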
Types of Layers in ANN
The input layer
• Introduces input values into the network.
• No activation function or other processing.
The hidden layer(s)
• Perform classification of features.
• In principle, two hidden layers are sufficient to solve any problem.
• In practice, more layers may make particular features easier to learn.
The output layer
• Functionally just like the hidden layers
• Outputs are passed on to the world outside the neural network.
Types/Architectures/Structures/Topologies of Neural Network
Single layer feed forward Network
• The single layer feed forward network consists of a single layer of weights, where the inputs are
directly connected to the outputs via a series of weights. The synaptic links carrying weights
connect every input to every output, but not the other way around. In this sense it is considered a
network of feed-forward type.
• The sum of the products of the weights and the inputs is calculated in each neuron node, and if the
value is above some threshold (typically 0) the neuron fires and takes the activated value (typically
1); otherwise it takes the deactivated value (typically -1).
• For example, a simple Perceptron.
Multi-layer feed forward Network
• One input layer, one output layer, and one or more hidden layers of processing units. The hidden layers sit in
between the input and output layers, and are thus hidden from the outside world. The computational
units of the hidden layers are known as hidden neurons.
• The hidden layer does intermediate computation before directing the input to output layer. The input layer
neurons are linked to the hidden layer neurons.
• A multi-layer feedforward network with l input neurons, m1 neurons in the first hidden layer, m2
neurons in the second hidden layer, and n neurons in the output layer is written as (l - m1 - m2 - n).
• For example, a Multi-Layer Perceptron.
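To make the (l - m1 - m2 - n) notation concrete, here is a sketch of a forward pass through a small (2 - 2 - 2 - 1) network. The weight values are arbitrary illustrative numbers, and a sigmoid activation is assumed for every computational neuron:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights):
    """One layer: each neuron applies sigmoid to its weighted sum of the inputs."""
    return [sigmoid(sum(x * w for x, w in zip(inputs, ws))) for ws in weights]

def forward(x, network):
    """Propagate the input vector through each layer in turn."""
    for weights in network:
        x = layer(x, weights)
    return x

# A (2 - 2 - 2 - 1) network: 2 inputs, two hidden layers of 2 neurons, 1 output.
net = [
    [[0.5, -0.4], [0.3, 0.8]],   # first hidden layer: 2 neurons, 2 inputs each
    [[0.2, 0.7], [-0.6, 0.1]],   # second hidden layer
    [[1.0, -1.0]],               # output layer: 1 neuron
]
y = forward([1.0, 0.0], net)     # a single sigmoid output in (0, 1)
```

The nested list structure mirrors the notation: one inner list of weights per neuron, one outer list per layer.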
Recurrent Network
• A recurrent network has at least one feedback loop.
• There could be neurons with a self-feedback loop; that is, the output of a neuron is fed back
into itself as input.
The Perceptron
First studied in the late 1950s.
Also known as Layered Feed-Forward Networks.
The operation of Rosenblatt’s perceptron is based on the McCulloch and Pitts neuron model.
The model consists of a linear combiner followed by a hard limiter. The weighted sum of the
inputs is applied to the hard limiter, which produces an output equal to +1 if its input is positive
and -1 if it is negative.
Figure: Single-layer two-input perceptron. Inputs x1 and x2, weighted by w1 and w2, feed a linear
combiner followed by a hard limiter (with a threshold), which produces the output Y.
The perceptron learning rule
w_i(p+1) = w_i(p) + a · x_i(p) · e(p)
where p = 1, 2, 3, . . .
a is the learning rate, a positive constant less than unity.
The perceptron learning rule was first proposed by Rosenblatt in 1960. Using this rule we can
derive the perceptron training algorithm for classification tasks.
Step 2: Activation
• Activate the perceptron by applying inputs x1(p), x2(p),…, xn(p) and desired output Yd (p).
• Calculate the actual output at iteration p = 1:

  Y(p) = step[ Σ_{i=1}^{n} x_i(p) · w_i(p) − θ ]

• where n is the number of the perceptron inputs, θ is the threshold, and step is a step activation function.
• If at iteration p, the actual output is Y(p) and the desired output is Yd (p), then the error
is given by:
e(p) = Yd(p) − Y(p)     where p = 1, 2, 3, . . .
Perceptron’s training algorithm
Step 3: Weight training
• Update the weights of the perceptron:
  w_i(p+1) = w_i(p) + Δw_i(p)
  where Δw_i(p) = a · x_i(p) · e(p) is the weight correction at iteration p.
Step 4: Iteration
Increase iteration p by one, go back to Step 2 and repeat the process until
convergence.
Example of perceptron learning: the logical operation AND
Threshold: θ = 0.2; learning rate: a = 0.1

        Inputs   Desired   Initial     Actual   Error   Final
Epoch   x1  x2   output    weights     output           weights
                   Yd      w1    w2      Y       e      w1    w2
  1     0   0      0       0.3  -0.1     0       0      0.3  -0.1
        0   1      0       0.3  -0.1     0       0      0.3  -0.1
        1   0      0       0.3  -0.1     1      -1      0.2  -0.1
        1   1      1       0.2  -0.1     0       1      0.3   0.0
  2     0   0      0       0.3   0.0     0       0      0.3   0.0
        0   1      0       0.3   0.0     0       0      0.3   0.0
        1   0      0       0.3   0.0     1      -1      0.2   0.0
        1   1      1       0.2   0.0     1       0      0.2   0.0
  3     0   0      0       0.2   0.0     0       0      0.2   0.0
        0   1      0       0.2   0.0     0       0      0.2   0.0
        1   0      0       0.2   0.0     1      -1      0.1   0.0
        1   1      1       0.1   0.0     0       1      0.2   0.1
  4     0   0      0       0.2   0.1     0       0      0.2   0.1
        0   1      0       0.2   0.1     0       0      0.2   0.1
        1   0      0       0.2   0.1     1      -1      0.1   0.1
        1   1      1       0.1   0.1     1       0      0.1   0.1
  5     0   0      0       0.1   0.1     0       0      0.1   0.1
        0   1      0       0.1   0.1     0       0      0.1   0.1
        1   0      0       0.1   0.1     0       0      0.1   0.1
        1   1      1       0.1   0.1     1       0      0.1   0.1
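The training run in the table can be reproduced in a few lines. This sketch follows the learning rule w_i(p+1) = w_i(p) + a·x_i(p)·e(p) with θ = 0.2, a = 0.1, and initial weights (0.3, -0.1); the function name is illustrative:

```python
def step(x):
    return 1 if x >= 0 else 0

def train_and(theta=0.2, a=0.1, epochs=5):
    w1, w2 = 0.3, -0.1                                     # initial weights
    data = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1)]    # (x1, x2, Yd)
    for _ in range(epochs):
        for x1, x2, yd in data:
            y = step(x1 * w1 + x2 * w2 - theta)   # actual output
            e = yd - y                            # error e = Yd - Y
            w1 = round(w1 + a * x1 * e, 2)        # perceptron learning rule;
            w2 = round(w2 + a * x2 * e, 2)        # rounding avoids float drift
    return w1, w2
```

After five epochs this returns (0.1, 0.1), the converged weights in the last row of the table; those weights fire only for the input (1, 1), implementing AND.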
Perceptron: Linear separability
The single layer perceptron algorithm converges if examples are linearly separable.
A single layer perceptron can only learn linearly separable concepts.
A single layer perceptron can learn the operations AND, OR, and NOT , but not Exclusive-OR.
Figure: linear separability of the AND and OR operations in the two-dimensional input space.
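One way to see why Exclusive-OR resists a single-layer perceptron is to search a grid of weights and thresholds for any (w1, w2, θ) that reproduces a given truth table. The grid below is illustrative, and the search is a sanity check rather than a proof:

```python
def step(x):
    return 1 if x >= 0 else 0

def realizable(truth_table):
    """Brute-force search for (w1, w2, theta) reproducing the truth table."""
    grid = [i / 10 for i in range(-20, 21)]      # -2.0 .. 2.0 in steps of 0.1
    for w1 in grid:
        for w2 in grid:
            for theta in grid:
                if all(step(x1 * w1 + x2 * w2 - theta) == y
                       for x1, x2, y in truth_table):
                    return True
    return False

AND = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
XOR = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
```

The search finds weights for AND (for example w1 = w2 = 0.1, θ = 0.2) but none for XOR, since no single line can separate the two XOR classes.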
Multilayer Perceptron Neural Networks
A multilayer perceptron is a feedforward neural network with one or more hidden layers.
The network consists of an input layer of source neurons, at least one middle or hidden layer
of computational neurons, and an output layer of computational neurons.
The input signals are propagated in a forward direction on a layer-by-layer basis.
Figure: Multilayer perceptron with two hidden layers
Backpropagation Algorithm
In a back-propagation neural network, the learning algorithm has two phases.
First, a training input pattern is presented to the network input layer. The network propagates the
input pattern from layer to layer until the output pattern is generated by the output layer.
If this pattern is different from the desired output, an error is calculated and then propagated
backwards through the network from the output layer to the input layer. The weights are modified as
the error is propagated.
Figure: Three-layer back-propagation neural network. Inputs x1, …, xn feed hidden neurons j through
weights wij; hidden neurons feed output neurons k through weights wjk, producing outputs y1, …, yl.
Error signals propagate in the opposite direction.
The Back-Propagation Algorithm
Step 1: Initialization
Set all the weights and threshold levels of the network to random
numbers uniformly distributed inside a small range:
  ( −2.4/F_i , +2.4/F_i )
where F_i is the total number of inputs of neuron i in the network.
(a) Calculate the error gradient for the neurons in the output layer:
  δ_k(p) = y_k(p) · [1 − y_k(p)] · e_k(p)
  where e_k(p) = y_d,k(p) − y_k(p)
Calculate the weight corrections:
  Δw_jk(p) = a · y_j(p) · δ_k(p)
Update the weights at the output neurons:
  w_jk(p+1) = w_jk(p) + Δw_jk(p)
(b) Calculate the error gradient for the neurons in the hidden layer:
  δ_j(p) = y_j(p) · [1 − y_j(p)] · Σ_{k=1}^{l} δ_k(p) · w_jk(p)
Calculate the weight corrections:
  Δw_ij(p) = a · x_i(p) · δ_j(p)
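The gradient and weight-correction formulas can be collected into a single backward-pass step. This sketch assumes one hidden layer, sigmoid activations in both layers (so y(1 − y) is the activation derivative), and illustrative argument names:

```python
def backprop_step(x, y_hidden, y_out, y_desired, w_hidden_out, a=0.1):
    """One backward pass over a single-hidden-layer sigmoid network.
    w_hidden_out[j][k] is the weight from hidden neuron j to output neuron k."""
    # (a) output layer: delta_k = y_k (1 - y_k) e_k, with e_k = y_dk - y_k
    delta_k = [yk * (1 - yk) * (yd - yk) for yk, yd in zip(y_out, y_desired)]
    # weight corrections Dw_jk = a * y_j * delta_k
    dw_jk = [[a * yj * dk for dk in delta_k] for yj in y_hidden]
    # (b) hidden layer: delta_j = y_j (1 - y_j) * sum_k delta_k * w_jk
    delta_j = [yj * (1 - yj) * sum(dk * wjk for dk, wjk in zip(delta_k, w_jk))
               for yj, w_jk in zip(y_hidden, w_hidden_out)]
    # weight corrections Dw_ij = a * x_i * delta_j
    dw_ij = [[a * xi * dj for dj in delta_j] for xi in x]
    return delta_k, dw_jk, delta_j, dw_ij
```

Each returned list maps one-to-one onto the formulas above: the output-layer gradients, the hidden-to-output corrections, the hidden-layer gradients, and the input-to-hidden corrections.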
Now the actual output of neuron 5 in the output layer is determined.
The next step is weight training. To update the weights and threshold levels in
our network, we propagate the error, e, from the output layer backward to the
input layer.
First, we calculate the error gradient for neuron 5 in the output layer: δ5 = −0.1274.
Then we determine the weight corrections assuming that the learning rate
parameter, a, is equal to 0.1:
  Δw35 = a · y3 · δ5 = 0.1 × 0.5250 × (−0.1274) = −0.0067
  Δw45 = a · y4 · δ5 = 0.1 × 0.8808 × (−0.1274) = −0.0112
  Δθ5  = a · (−1) · δ5 = 0.1 × (−1) × (−0.1274) = 0.0127
Next we calculate the error gradients for neurons 3 and 4 in the hidden layer:
  δ3 = y3 · (1 − y3) · δ5 · w35 = 0.5250 × (1 − 0.5250) × (−0.1274) × (−1.2) = 0.0381
  δ4 = y4 · (1 − y4) · δ5 · w45 = 0.8808 × (1 − 0.8808) × (−0.1274) × 1.1 = −0.0147
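Assuming the example's values (y3 = 0.5250, y4 = 0.8808, δ5 = −0.1274, w35 = −1.2, w45 = 1.1, a = 0.1), the arithmetic can be checked directly:

```python
a, y3, y4, d5 = 0.1, 0.5250, 0.8808, -0.1274
w35, w45 = -1.2, 1.1

dw35 = a * y3 * d5              # weight correction for w35: -0.0067
dw45 = a * y4 * d5              # weight correction for w45: -0.0112
dtheta5 = a * (-1) * d5         # threshold correction:       0.0127
d3 = y3 * (1 - y3) * d5 * w35   # hidden gradient, neuron 3:  0.0381
d4 = y4 * (1 - y4) * d5 * w45   # hidden gradient, neuron 4: -0.0147
```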
Step 3: Use the delta rule to update the bias and weights.
Step 4: Stop if the largest weight change across all the training samples is less
than a specified tolerance; otherwise cycle through the training set again.
The Learning Rate, a
The performance of an ADALINE neuron depends heavily on the choice of the
learning rate
• if it is too large the system will not converge
• if it is too small the convergence will take too long
Typically, a is selected by trial and error
• typical range: 0.01 < a < 10.0
• often start at 0.1
• sometimes it is suggested that:
  0.1 < n·a < 1.0
  where n is the number of inputs
Example: Construct an AND function for an ADALINE neuron with a = 0.1.
Activation Function
Neuron input: y_in = b + Σ x_i·w_i
Output: y = −1 if y_in < 0; y = +1 if y_in ≥ 0
Continue to cycle through the four training inputs until the largest change in the weights over a
complete cycle is less than some small number (say 0.01).
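The cycle just described can be sketched with the delta rule Δw_i = a·(t − y_in)·x_i on the AND data. Bipolar coding of the inputs and targets (values in {−1, +1}) is the usual ADALINE convention and is assumed here; an epoch cap is also added, because with a constant learning rate the per-cycle weight change may never fall below the tolerance:

```python
def train_adaline(a=0.1, tol=0.01, max_epochs=50):
    """ADALINE trained by the delta rule on the bipolar AND function."""
    # bipolar AND: target +1 only for input (+1, +1)
    data = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
    w1 = w2 = b = 0.0
    for _ in range(max_epochs):
        largest = 0.0
        for (x1, x2), t in data:
            y_in = b + x1 * w1 + x2 * w2          # neuron input
            e = t - y_in                          # delta rule uses the raw error
            dw1, dw2, db = a * e * x1, a * e * x2, a * e
            w1, w2, b = w1 + dw1, w2 + dw2, b + db
            largest = max(largest, abs(dw1), abs(dw2), abs(db))
        if largest < tol:                         # the stopping test from the text
            break
    return w1, w2, b
```

The trained neuron then classifies AND through the hard limiter: y = +1 if b + x1·w1 + x2·w2 ≥ 0, else −1.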
Hopfield Neural Network
A Hopfield network is a form of recurrent artificial neural network popularized by John
Hopfield in 1982, but described earlier by Little in 1974.
A recurrent neural network has feedback loops from its outputs to its inputs. The presence of
such loops has a profound impact on the learning capability of the network.
Figure: Hopfield network with n neurons (inputs x1, x2, …, xn; outputs y1, y2, …, yn).

The 3 × 3 identity matrix I is

      | 1  0  0 |
  I = | 0  1  0 |
      | 0  0  1 |
Thus, we can now determine the weight matrix. For the two fundamental memories
Y1 = (1, 1, 1) and Y2 = (−1, −1, −1):

                               | 0  2  2 |
  W = Y1·Y1^T + Y2·Y2^T − 2I = | 2  0  2 |
                               | 2  2  0 |
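This construction can be checked numerically. The sketch below assumes the storage rule W = Y1·Y1^T + Y2·Y2^T − 2I and a sign activation, and verifies that both fundamental memories are stable states:

```python
def outer(u, v):
    return [[a * b for b in v] for a in u]

def mat_add(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def sign(x):
    return 1 if x >= 0 else -1

Y1, Y2 = [1, 1, 1], [-1, -1, -1]
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
minus2I = [[-2 * e for e in row] for row in I3]

# Storage rule: W = Y1 Y1^T + Y2 Y2^T - 2I  (symmetric, zero diagonal)
W = mat_add(mat_add(outer(Y1, Y1), outer(Y2, Y2)), minus2I)

# Both fundamental memories are stable: sign(W y) returns the pattern itself.
stable1 = [sign(v) for v in mat_vec(W, Y1)] == Y1
stable2 = [sign(v) for v in mat_vec(W, Y2)] == Y2
```

Subtracting 2I (one I per stored pattern) zeroes the diagonal, so no neuron feeds back directly into itself.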
The Kohonen network
The Kohonen model provides a topological mapping. It places a fixed number of input
patterns from the input layer into a higher-dimensional output or Kohonen layer.
Training in the Kohonen network begins with the winner’s neighbourhood of a fairly large size.
Then, as training proceeds, the neighbourhood size gradually decreases.
Figure: Kohonen network with two input-layer neurons (x1, x2) fully connected to three
output-layer (Kohonen) neurons (y1, y2, y3).
where xi and wij are the ith elements of the vectors X and Wj, respectively.
To identify the winning neuron, jX, that best matches the input vector X, we may apply the
following condition:
  j_X = min_j ‖ X − W_j ‖ ,   j = 1, 2, . . ., m
The weight vector W3 of the winning neuron 3 becomes closer to the input vector X with
each iteration.
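One competitive-learning iteration can be sketched as follows: find the winning neuron by the minimum-distance condition above, then move its weight vector toward X. The learning rate, the input vector, and the three weight vectors below are illustrative values:

```python
import math

def euclidean(x, w):
    return math.sqrt(sum((xi - wi) ** 2 for xi, wi in zip(x, w)))

def kohonen_step(x, weights, a=0.1):
    """One iteration: the winner (minimum distance to x) moves toward x."""
    jx = min(range(len(weights)), key=lambda j: euclidean(x, weights[j]))
    weights[jx] = [wi + a * (xi - wi) for xi, wi in zip(x, weights[jx])]
    return jx

X = [0.52, 0.12]
W = [[0.27, 0.81], [0.42, 0.70], [0.43, 0.21]]   # one weight vector per output neuron
winner = kohonen_step(X, W)   # the third neuron (index 2) wins and moves toward X
```

Only the winner's weight vector is updated; repeating such steps while shrinking the neighbourhood and learning rate gives the training behaviour described above.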