Unit 9 - Neural Network
CONTENT
• Introduction
• Neural Network Representation
• Appropriate Problems for Neural Network Learning
• Perceptrons
• Multilayer Networks and the BACKPROPAGATION Algorithm
• Remarks on the BACKPROPAGATION Algorithm
INTRODUCTION
Now, it is time to see how the human nervous system has been
mimicked in the computer world in the form of an artificial neural
network or simply a neural network.
• The study of artificial neural networks (ANNs) has been inspired by the
observation that biological learning systems are built of very complex webs of
interconnected neurons.
• The human information processing system is built from the brain's neurons: the
basic building-block cells that communicate information to and from various
parts of the body.
• The simplest model of a neuron treats it as a threshold unit, i.e. a processing
element (PE).
• It collects inputs and produces an output if the sum of the inputs exceeds an
internal threshold value (see the sketch below).
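A minimal sketch of such a threshold unit (the weights and threshold value here are chosen arbitrarily for illustration):

```python
def threshold_unit(inputs, weights, threshold):
    # Fire (output 1) only if the weighted sum of inputs exceeds the threshold
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0

print(threshold_unit([0.5, 0.9], [1.0, 1.0], 1.0))  # 1, since 1.4 > 1.0
```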
UNDERSTANDING THE BIOLOGICAL NEURON
Examples:
1. Speech phoneme recognition
2. Image classification
3. Financial prediction
Neuron
Neuron – Activation Function
The ReLU (rectified linear unit) activation function is defined as f(x) = max(0, x).
This means that f(x) is zero when x is less than zero, and f(x) is equal to x when x
is greater than or equal to zero. The figure depicts the curve of the ReLU activation
function.
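As a quick illustration (not from the original slides), a minimal NumPy sketch of ReLU and its piecewise behavior:

```python
import numpy as np

def relu(x):
    # ReLU: zero for negative inputs, identity for non-negative inputs
    return np.maximum(0, x)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x))  # [0.  0.  0.  1.5 3. ]
```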
Neuron – Activation Function
• In the ALVINN autonomous driving system, for example, the input to the neural
network is a 30x32 grid of pixel intensities obtained from a forward-pointed
camera mounted on the vehicle.
• The McCulloch–Pitts neural model, which was the earliest ANN model, has only
two types of inputs – excitatory and inhibitory.
• The inputs of the McCulloch–Pitts neuron could be either 0 or 1.
• It has a threshold function as activation function. So, the output signal y_out is 1
if the input y_sum is greater than or equal to a given threshold value, else 0.
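The following is a minimal sketch of a McCulloch–Pitts neuron. The AND-gate example and the veto-style handling of inhibitory inputs (any active inhibitory input forces the output to 0) are common conventions assumed here for illustration:

```python
def mcculloch_pitts(excitatory, inhibitory, threshold):
    # Inputs are binary (0 or 1); any active inhibitory input vetoes firing
    if any(inhibitory):
        return 0
    y_sum = sum(excitatory)
    # Threshold activation: y_out is 1 if y_sum reaches the threshold, else 0
    return 1 if y_sum >= threshold else 0

# Example: a 2-input AND gate needs threshold 2
print(mcculloch_pitts([1, 1], [], 2))  # 1
print(mcculloch_pitts([1, 0], [], 2))  # 0
```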
McCulloch–Pitts model of neuron - EXAMPLE
• The perceptron, as depicted in Figure, receives a set of inputs x_1, x_2, …, x_n.
The linear combiner or the adder node computes the linear combination of the
inputs applied to the synapses with synaptic weights being w_1, w_2, …, w_n.
• Then, the hard limiter checks whether the resulting sum is positive or negative. If
the input of the hard limiter node is positive, the output is +1, and if the input is
negative, the output is −1.
• Mathematically, the hard limiter input is
y_sum = w_1 x_1 + w_2 x_2 + … + w_n x_n + b
where b is the bias.
• The objective of perceptron is to classify a set of inputs into two classes, c_1 and
c_2.
• This can be done using a very simple decision rule: assign the inputs x_1, x_2,
x_3, …, x_n to c_1 if the output of the perceptron, i.e. y_out, is +1, and to c_2 if
y_out is −1.
Rosenblatt’s perceptron
• So, for an n-dimensional signal space, i.e. a space for ‘n’ input signals x_1, x_2,
x_3, …, x_n, the simplest form of perceptron will have two decision regions,
resembling two classes, separated by a hyperplane defined by
w_1 x_1 + w_2 x_2 + … + w_n x_n + b = 0
• Therefore, for two input signals denoted by variables x_1 and x_2, the decision
boundary is a straight line of the form
w_1 x_1 + w_2 x_2 + b = 0
• So, any point (x_1, x_2) which lies above the decision boundary, as depicted in
Figure 10.9, will be assigned to class c_1, and the points which lie below the
boundary are assigned to class c_2.
Example
• Let us examine whether this perceptron is able to classify the set of points given
below.
• As depicted in Figure 10.10, we can see that on the basis of the activation
function output, only points p1 and p2 generate an output of 1. Hence, they are
assigned to class c_1, as expected. On the other hand, points p3 and p4, whose
activation function outputs are negative, generate an output of 0. Hence, they are
assigned to class c_2, again as expected.
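The exact weights and points from Figure 10.10 are not reproduced here, so the sketch below uses assumed values (weights w = (1, 1), bias b = −2, and four hypothetical points) purely to illustrate the decision rule:

```python
def perceptron_output(x, w, b):
    # Hard limiter on the weighted sum: class c1 if the sum is
    # non-negative, class c2 otherwise
    y_sum = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if y_sum >= 0 else 0

w, b = (1.0, 1.0), -2.0  # assumed weights and bias
points = {"p1": (2, 2), "p2": (3, 1), "p3": (0, 1), "p4": (1, 0)}  # hypothetical
for name, x in points.items():
    cls = "c1" if perceptron_output(x, w, b) == 1 else "c2"
    print(name, "->", cls)  # p1, p2 -> c1; p3, p4 -> c2
```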
Multi-layer perceptron
• A basic perceptron works very successfully for data sets which possess linearly
separable patterns.
• A basic perceptron is not able to learn to compute even a simple 2-bit XOR. Why
is that so? Let us try to understand.
• The truth table below highlights the output of a 2-bit XOR function.

x_1 | x_2 | x_1 XOR x_2
 0  |  0  |      0
 0  |  1  |      1
 1  |  0  |      1
 1  |  1  |      0
Multi-layer perceptron
• The data is not linearly separable. Only a curved decision boundary can separate
the classes properly.
• To address this issue, the other option is to use two decision lines in place of
one. Figure 10.14 shows how a decision boundary made of two linear decision
lines can clearly partition the data (see the sketch below).
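As an illustration (not from the slides), a two-layer perceptron with hand-chosen weights computes XOR by combining two linear decision lines, here realized as OR and NAND units feeding an AND unit:

```python
def step(x):
    # Hard threshold activation
    return 1 if x >= 0 else 0

def xor(x1, x2):
    # Hidden layer: two linear units, i.e. the two decision lines
    h_or = step(x1 + x2 - 0.5)     # fires unless both inputs are 0
    h_nand = step(-x1 - x2 + 1.5)  # fires unless both inputs are 1
    # Output layer: AND of the two hidden units
    return step(h_or + h_nand - 1.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor(a, b))  # reproduces the XOR truth table
```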
Multi-layer perceptron
One main part of the algorithm is adjusting the interconnection weights. This is
done using a technique termed gradient descent.
In simple terms, the algorithm calculates the partial derivative of the cost function
with respect to each interconnection weight to identify the ‘gradient’, i.e. the
direction and extent of the change in that weight required to minimize the cost
function. Since this derivative is obtained by applying the chain rule through the
activation function, the activation function needs to be differentiable.
Gradient Descent algorithm and its variants
• Overall, the whole process of updating the parameters looks like the following
repeated update rule, where η is the learning rate and E is the cost function:
w := w − η ∂E/∂w
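A minimal sketch of this update loop (an illustration, with an assumed one-parameter cost E(w) = (w − 3)^2 so the gradient is easy to verify):

```python
def grad_E(w):
    # Gradient of the assumed cost E(w) = (w - 3)^2
    return 2 * (w - 3)

w = 0.0    # initial parameter
eta = 0.1  # learning rate
for _ in range(100):
    w = w - eta * grad_E(w)  # gradient descent update
print(w)  # converges toward the minimizer w = 3
```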
MULTILAYER NETWORKS AND THE BACKPROPAGATION ALGORITHM
The net input is given by the weighted sum
y_sum_k = Σ_i x_i w_ik
for the k-th neuron in the hidden layer. If f is the activation function of the hidden
layer, then its output is
y_k = f(y_sum_k)
Likewise, the net input is
z_sum_k = Σ_j y_j w'_jk
for the k-th neuron in the output layer. Note that the input signals to X and Y are
assumed as 1. If f is the activation function of the output layer, then its output is
z_k = f(z_sum_k)
MULTILAYER NETWORKS AND THE BACKPROPAGATION ALGORITHM
If t_k is the target output of the k-th output neuron, then the cost function defined as
the squared error of the output layer is given by
E = (1/2) Σ_k (t_k − z_k)^2
where the sum runs over all the neurons in the output layer.
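To make the forward pass and the squared-error cost concrete, here is a minimal NumPy sketch of a one-hidden-layer network with one backpropagation/gradient-descent step; the layer sizes, learning rate, and sigmoid activation are assumptions for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 2, 3, 1           # assumed layer sizes
W1 = rng.normal(size=(n_in, n_hidden))    # input-to-hidden weights
W2 = rng.normal(size=(n_hidden, n_out))   # hidden-to-output weights
eta = 0.5                                 # learning rate

x = np.array([[0.0, 1.0]])  # one training input
t = np.array([[1.0]])       # its target output

# Forward pass: weighted sums, then the differentiable activation
y = sigmoid(x @ W1)         # hidden-layer outputs
z = sigmoid(y @ W2)         # output-layer outputs

E = 0.5 * np.sum((t - z) ** 2)  # squared-error cost
print("cost before update:", E)

# Backward pass: the chain rule gives dE/dW2 and dE/dW1
delta_out = (z - t) * z * (1 - z)             # output-layer error term
delta_hid = (delta_out @ W2.T) * y * (1 - y)  # hidden-layer error term
W2 -= eta * y.T @ delta_out                   # gradient descent updates
W1 -= eta * x.T @ delta_hid
```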