Unit 9 - Neural Network


BASICS OF NEURAL NETWORKS

CONTENT
• Introduction
• Neural Network Representation
• Appropriate Problems for Neural Network Learning
• Perceptrons
• Multilayer Networks and the BACKPROPAGATION Algorithm
• Remarks on the BACKPROPAGATION Algorithm
INTRODUCTION

Now, it is time to see how the human nervous system has been
mimicked in the computer world in the form of an artificial neural
network or simply a neural network.

Artificial neural networks (ANNs) provide a general, practical method for learning real-valued, discrete-valued, and vector-valued target functions from examples.
Biological Motivation

• The study of artificial neural networks (ANNs) has been inspired by the observation that biological learning systems are built of very complex webs of interconnected neurons.
• The human information processing system consists of the brain; the neuron is its basic building-block cell, communicating information to and from various parts of the body.
• Simplest model of a neuron: a threshold unit, i.e. a processing element (PE).
• It collects inputs and produces an output if the sum of the inputs exceeds an internal threshold value.
UNDERSTANDING THE BIOLOGICAL NEURON

Figure presents the structure of a neuron. It has three main parts to carry out its primary functionality of receiving and transmitting information:
1. Dendrites – to receive signals from neighbouring neurons.
2. Soma – main body of the neuron which accumulates the
signals coming from the different dendrites. It ‘fires’ when a
sufficient amount of signal is accumulated.
3. Axon – last part of the neuron which receives signal from
soma, once the neuron ‘fires’, and passes it on to the
neighbouring neurons through the axon terminals (to the
adjacent dendrite of the neighbouring neurons).

There is a very small gap between the axon terminal of one neuron and the adjacent dendrite of the neighbouring neuron.
This small gap is known as synapse. The signals transmitted
through synapse may be excitatory or inhibitory.
Facts of Human Neurobiology

• Number of neurons ~ 10^11
• Connections per neuron ~ 10^4 – 10^5
• Neuron switching time ~ 0.001 second (10^-3 s)
• Scene recognition time ~ 0.1 second
• 100 inference steps doesn’t seem like enough
• Highly parallel computation based on distributed representation
Properties of Neural Networks

• Many neuron-like threshold switching units
• Many weighted interconnections among units
• Highly parallel, distributed process
• Emphasis on tuning weights automatically
When to consider Neural Networks ?

• Input is a high-dimensional discrete or real-valued (e.g., sensor input)
• Output is discrete or real-valued
• Output is a vector of values
• Possibly noisy data
• Form of target function is unknown
• Human readability of result is unimportant

Examples:
1. Speech phoneme recognition
2. Image classification
3. Financial prediction
Neuron
Neuron – Activation Function

ReLU (Rectified Linear Unit) function


ReLU is the most popularly used activation function in the areas of neural networks and deep learning. It is of the form

f(x) = max(0, x)

This means that f(x) is zero when x is less than zero and f(x) is equal to x when x is above or equal to zero. Figure depicts the curve for a ReLU activation function.
Neuron – Activation Function

Hyperbolic tangent function


Hyperbolic tangent function is another continuous activation function, which is bipolar in nature. It is of the form tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x)), producing outputs in the range (−1, 1).
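As a quick illustration, here is a minimal NumPy sketch of both activation functions; the sample inputs are arbitrary:

```python
import numpy as np

def relu(x):
    # f(x) = max(0, x): zero for x < 0, identity for x >= 0
    return np.maximum(0, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))     # [0.  0.  0.  0.5 2. ]
print(np.tanh(x))  # bipolar outputs in (-1, 1), e.g. tanh(2) ~ 0.964
```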
NEURAL NETWORK REPRESENTATIONS
• A prototypical example of ANN learning is provided by Pomerleau's (1993)
system ALVINN, which uses a learned ANN to steer an autonomous vehicle
driving at normal speeds on public highways.

• The input to the neural network is a 30x32 grid of pixel intensities obtained from
a forward-pointed camera mounted on the vehicle.

• The network output is the direction in which the vehicle is steered.


• Figure illustrates the neural network representation.
• The network is shown on the left side of the figure, with the input camera image
depicted below it.
• Each node (i.e., circle) in the network diagram corresponds to the output of a
single network unit, and the lines entering the node from below are its inputs.
• There are four units that receive inputs directly from all of the 30 x 32 pixels in
the image. These are called "hidden" units because their output is available only
within the network and is not available as part of the global network output. Each
of these four hidden units computes a single real-valued output based on a
weighted combination of its 960 inputs.
• These hidden unit outputs are then used as inputs to a second layer of 30 "output"
units.
• Each output unit corresponds to a particular steering direction, and the output
values of these units determine which steering direction is recommended most
strongly.
• The diagrams on the right side of the figure depict the learned weight values
associated with one of the four hidden units in this ANN.
• The large matrix of black and white boxes on the lower right depicts the weights
from the 30 x 32 pixel inputs into the hidden unit. Here, a white box indicates a
positive weight, a black box a negative weight, and the size of the box indicates
the weight magnitude.
• The smaller rectangular diagram directly above the large matrix shows the
weights from this hidden unit to each of the 30 output units.
APPROPRIATE PROBLEMS FOR
NEURAL NETWORK LEARNING
ANN is appropriate for problems with the following characteristics :
• Instances are represented by many attribute-value pairs.
• The target function output may be discrete-valued, real-valued, or a vector of
several real- or discrete-valued attributes.
• The training examples may contain errors.
• Long training times are acceptable.
• Fast evaluation of the learned target function may be required.
• The ability of humans to understand the learned target function is not important.
Architectures of Artificial Neural Networks
An artificial neural network can be divided into three parts (layers), which are
known as:
• Input layer: This layer is responsible for receiving information (data), signals, features, or measurements from the external environment. These inputs are usually normalized to within the limit values produced by the activation functions.
• Hidden, intermediate, or invisible layers: These layers are composed of neurons which are responsible for extracting the patterns associated with the process or system being analysed. These layers perform most of the internal processing of the network.
• Output layer: This layer is also composed of neurons, and thus is responsible for
producing and presenting the final network outputs, which result from the
processing performed by the neurons in the previous layers.
Architectures of Artificial Neural Networks
The main architectures of artificial neural networks, considering the neuron disposition, how the neurons are interconnected, and how the layers are composed, can be divided as follows:

1. Single-layer feedforward networks
2. Multi-layer feedforward networks
3. Recurrent or feedback networks
4. Mesh networks
Single-Layer Feedforward Architecture
• This artificial neural network has just one input layer and a single neural layer, which is also the
output layer.
• Figure illustrates a single-layer feedforward network composed of n inputs and m outputs.
• The information always flows in a single direction (thus, unidirectional), which is from the input
layer to the output layer
Multi-Layer Feedforward Architecture
• Artificial feedforward neural networks with multiple layers are composed of one or more hidden neural layers.
• Figure shows a feedforward network with multiple layers composed of one input layer with n
sample signals, two hidden neural layers consisting of n1 and n2 neurons respectively, and, finally,
one output neural layer composed of m neurons representing the respective output values of the
problem being analyzed.
Recurrent or Feedback Architecture
• In these networks, the outputs of the neurons are used as feedback inputs for other neurons.
• Figure illustrates an example of a Perceptron network with feedback, where one of its output
signals is fed back to the middle layer.
Mesh Architectures
• The main features of networks with mesh structures reside in considering the spatial arrangement
of neurons for pattern extraction purposes, that is, the spatial localization of the neurons is directly
related to the process of adjusting their synaptic weights and thresholds.
• Figure illustrates an example of the Kohonen network, where its neurons are arranged within a two-dimensional space.
McCulloch–Pitts model of neuron

• The McCulloch–Pitts neural model, which was the earliest ANN model, has only two types of inputs – excitatory and inhibitory.
• The inputs of the McCulloch–Pitts neuron could be either 0 or 1.
• It has a threshold function as activation function. So, the output signal y_out is 1
if the input y_sum is greater than or equal to a given threshold value, else 0.
McCulloch–Pitts model of neuron - EXAMPLE

• John carries an umbrella if it is sunny or if it is raining. There are four given situations. We need to decide when John will carry the umbrella. The situations are as follows:
• Situation 1 – It is not raining nor is it sunny.
• Situation 2 – It is not raining, but it is sunny.
• Situation 3 – It is raining, and it is not sunny.
• Situation 4 – Wow, it is so strange! It is raining as well as it is sunny.
• We can consider the input signals as follows:
• X_1 → Is it raining?
• X_2 → Is it sunny?
• So, the value of both x_1 and x_2 can be either 0 or 1. We can set the value of both weights w_1 and w_2 as 1 and the threshold value of the activation function as 1.
McCulloch–Pitts model of neuron - EXAMPLE

Formally, we can say,

y_sum = x_1 + x_2
y_out = 1 if y_sum >= 1, else 0

The corresponding truth table is:

x_1  x_2  y
 0    0   0
 0    1   1
 1    0   1
 1    1   1

From the truth table, we can conclude that in the situations where the value of y is 1, John needs to carry an umbrella. Hence, he will need to carry an umbrella in situations 2, 3, and 4.
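A minimal Python sketch of this McCulloch–Pitts neuron, with both weights set to 1 and a threshold of 1 as in the example:

```python
def mp_neuron(x1, x2, threshold=1):
    # All weights are 1, so y_sum is the plain sum of the binary inputs
    y_sum = x1 + x2
    # Threshold activation: output 1 when y_sum >= threshold, else 0
    return 1 if y_sum >= threshold else 0

# x1 = "Is it raining?", x2 = "Is it sunny?"
for x1 in (0, 1):
    for x2 in (0, 1):
        print(f"x1={x1}, x2={x2} -> carry umbrella: {mp_neuron(x1, x2)}")
# y = 1 in situations 2, 3, and 4 (a logical OR)
```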
Rosenblatt’s perceptron

• The perceptron, as depicted in Figure, receives a set of input x_1, x_2,…, x_n.
The linear combiner or the adder node computes the linear combination of the
inputs applied to the synapses with synaptic weights being w_1, w_2, …, w_n.
• Then, the hard limiter checks whether the resulting sum is positive or negative. If
the input of the hard limiter node is positive, the output is +1, and if the input is
negative, the output is −1.
• Mathematically, the hard limiter input is

y_sum = w_1 x_1 + w_2 x_2 + … + w_n x_n

• The output is decided by the expression

y_out = +1 if y_sum > 0, and y_out = −1 if y_sum < 0
Rosenblatt’s perceptron

• The objective of perceptron is to classify a set of inputs into two classes, c_1 and
c_2.
• This can be done using a very simple decision rule – assign the inputs x_1, x_2, x_3, …, x_n to c_1 if the output of the perceptron, i.e. y_out, is +1, and to c_2 if y_out is −1.
Rosenblatt’s perceptron

• So, for an n-dimensional signal space, i.e. a space for ‘n’ input signals x_1, x_2, x_3, …, x_n, the simplest form of perceptron will have two decision regions, resembling two classes, separated by a hyperplane defined by

w_0 + w_1 x_1 + w_2 x_2 + … + w_n x_n = 0

• Therefore, for two input signals denoted by variables x_1 and x_2, the decision boundary is a straight line of the form

w_0 + w_1 x_1 + w_2 x_2 = 0

• So, for a perceptron having the values of synaptic weights w_0, w_1, and w_2 as −2, ½, and ¼, respectively, the linear decision boundary will be of the form

−2 + (1/2) x_1 + (1/4) x_2 = 0, i.e. 2 x_1 + x_2 = 8
Rosenblatt’s perceptron

• So, any point (x_1, x_2) which lies above the decision boundary, as depicted by Figure 10.9, will be assigned to class c_1, and the points which lie below the boundary are assigned to class c_2.
Example

• Let us examine if this perceptron is able to classify a set of points given below:

• As depicted in Figure 10.10, we can see that on the basis of the activation function output, only points p1 and p2 generate an output of 1. Hence, they are assigned to class c_1, as expected. On the other hand, points p3 and p4, having a negative activation function output, generate an output of 0. Hence, they are assigned to class c_2, again as expected.
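Since the original points p1–p4 come from a figure not reproduced here, the coordinates below are hypothetical stand-ins; the weights are the ones from the preceding slide. A minimal sketch of the classification rule:

```python
# Weights from the preceding slide: w_0 = -2, w_1 = 1/2, w_2 = 1/4
w0, w1, w2 = -2.0, 0.5, 0.25

def classify(x1, x2):
    # Hard limiter on the linear combination w_0 + w_1*x1 + w_2*x2
    y_sum = w0 + w1 * x1 + w2 * x2
    return "c1" if y_sum > 0 else "c2"

# Hypothetical points (the original p1..p4 values came from the figure)
points = {"p1": (6, 4), "p2": (4, 2), "p3": (2, 2), "p4": (1, 1)}
for name, (x1, x2) in points.items():
    print(name, classify(x1, x2))  # p1, p2 -> c1; p3, p4 -> c2
```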
Multi-layer perceptron

• A basic perceptron works very successfully for data sets which possess linearly
separable patterns.
• A basic perceptron is not able to learn to compute even a simple 2-bit XOR. Why is that so? Let us try to understand.
• The truth table below highlights the output of a 2-bit XOR function:

x_1  x_2  x_1 XOR x_2
 0    0        0
 0    1        1
 1    0        1
 1    1        0
Multi-layer perceptron

• The data is not linearly separable. Only a curved decision boundary can separate
the classes properly.
• To address this issue, the other option is to use two decision lines in place of one.
Figure 10.14 shows how a linear decision boundary with two decision lines can
clearly partition the data.
Multi-layer perceptron

• This is the philosophy used to design the multi-layer perceptron model (a sketch follows the list below).
• The major highlights of this model are as follows:
• The neural network contains one or more intermediate layers between the input and the output
nodes, which are hidden from both input and output nodes.
• Each neuron in the network includes a non-linear activation function that is differentiable.
• The neurons in each layer are connected with some or all the neurons in the previous layer.
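As a concrete illustration of the two-decision-lines idea, the hand-wired two-layer network below computes XOR with fixed (not learned) weights: one hidden unit implements the OR decision line, another the AND decision line, and the output combines them. This is only a minimal sketch; the threshold units here are for illustration, since a trainable multi-layer perceptron needs differentiable activations, as noted above.

```python
def step(x):
    # Threshold unit (illustration only; a trainable MLP needs a
    # differentiable activation)
    return 1 if x >= 0 else 0

def xor_mlp(x1, x2):
    h1 = step(x1 + x2 - 0.5)    # hidden unit 1: the OR decision line
    h2 = step(x1 + x2 - 1.5)    # hidden unit 2: the AND decision line
    return step(h1 - h2 - 0.5)  # output: fires when OR is true but AND is not

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_mlp(x1, x2))  # 0, 1, 1, 0: exactly XOR
```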
ADALINE network model

• Adaptive Linear Neural Element (ADALINE) is an early single-layer ANN developed by Professor Bernard Widrow of Stanford University.
• As depicted in Figure 10.16, it has only one output neuron.
ADALINE network model

• The output value can be +1 or −1.
• A bias input x_0 (where x_0 = 1) having a weight w_0 is added. The activation function is such that if the weighted sum is positive or 0, then the output is 1, else it is −1.
• Formally, we can say,

y_sum = w_0 x_0 + w_1 x_1 + … + w_n x_n
y_out = +1 if y_sum >= 0, and −1 otherwise
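A minimal sketch of the ADALINE output rule just described, with the bias handled as x_0 = 1; the input and weight values are hypothetical:

```python
def adaline_output(x, w):
    # x and w include the bias terms: x[0] = x_0 = 1, w[0] = w_0
    y_sum = sum(xi * wi for xi, wi in zip(x, w))
    # Output is +1 if the weighted sum is positive or 0, else -1
    return 1 if y_sum >= 0 else -1

# Hypothetical inputs and weights, purely for illustration
print(adaline_output([1, 0.5, -0.3], [0.2, 0.4, 0.9]))  # 1 (sum = 0.13)
```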
MULTILAYER NETWORKS AND THE
BACKPROPAGATION ALGORITHM

In 1986, an efficient method of training an ANN was discovered. In this method, errors, i.e. the differences between the output values of the output layer and the expected values, are propagated back from the output layer to the preceding layers. Hence, the algorithm implementing this method is known as backpropagation, i.e. propagating the errors backward to the preceding layers.

The backpropagation algorithm is applicable to multilayer feedforward networks. It is a supervised learning algorithm which continues adjusting the weights of the connected neurons with the objective of reducing the deviation of the output signal from the target output.
MULTILAYER NETWORKS AND THE
BACKPROPAGATION ALGORITHM

The iterations continue till a stopping criterion is reached. Figure depicts a reasonably simplified version of the backpropagation algorithm.
MULTILAYER NETWORKS AND THE
BACKPROPAGATION ALGORITHM

One main part of the algorithm is adjusting the interconnection weights. This is done using a technique termed gradient descent.

In simple terms, the algorithm calculates the partial derivative of the cost function with respect to each interconnection weight to identify the ‘gradient’, or extent of change of the weight, required to minimize the cost function. Quite understandably, therefore, the activation function needs to be differentiable.
Gradient Descent algorithm and its variants

• Gradient Descent is an optimization algorithm used for minimizing the cost function in various machine learning algorithms. It is basically used for updating the parameters of the learning model. Its main variants are:
• Batch gradient descent
• Stochastic gradient descent
• Mini-batch gradient descent
• Here we are using a linear regression model. In our baseline model, we use the value 0 for B (the slope of the line) and the mean of the target values for b (the intercept).
Calculate cost/error

• Once we have our prediction, we can use it in the error calculation. Here, our error metric is the mean squared error (MSE).
• Mean squared error is the average squared difference between the estimated values and the actual values:

MSE = (1/n) Σ (y_i − ŷ_i)^2

• Our calculated error value of the baseline model can lead us anywhere on the Cost-B curve, as shown in the following images. Now our task is to update B in such a way that it leads the error towards the bottom of the curve.
Update Parameters

• Now the question is how the parameters (B in this case) will be updated. To update our parameters, we are going to use partial derivatives. The partial derivative gives the slope of the line; also, it is the change in the cost with respect to the change in B.
• Look at the image: in each case the partial derivative will give the slope of the tangent. In the first case, the slope will be negative, whereas in the other case the slope will be positive.
• Once we have the partial derivative,

dB = ∂(MSE)/∂B = (−2/n) Σ x_i (y_i − ŷ_i)

• we can update the value of B as shown below:

B = B − α · dB (where α is the learning rate)
Update Parameters

• Overall the whole process of updating the parameters will look like the following
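Putting the pieces together, below is a minimal batch gradient descent sketch for the linear regression model described above; the data values and learning rate are hypothetical, and the baseline initialization follows the earlier description:

```python
import numpy as np

# Hypothetical data: x is the independent variable, y the target
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.5, 8.1, 10.2])

# Baseline model as described above: slope B = 0, intercept b = mean of y
B, b = 0.0, y.mean()
lr = 0.01  # learning rate (alpha)

for epoch in range(1000):
    y_pred = B * x + b            # prediction of the current model
    error = y - y_pred
    dB = -2 * np.mean(x * error)  # partial derivative of MSE w.r.t. B
    db = -2 * np.mean(error)      # partial derivative of MSE w.r.t. b
    B -= lr * dB                  # step towards the bottom of the cost curve
    b -= lr * db

mse = np.mean((y - (B * x + b)) ** 2)
print(f"B = {B:.3f}, b = {b:.3f}, MSE = {mse:.4f}")
```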
MULTILAYER NETWORKS AND THE
BACKPROPAGATION ALGORITHM

The net signal input to the hidden layer neurons is given by

y_in_k = x_0 v_0k + x_1 v_1k + … + x_n v_nk

for the k-th neuron in the hidden layer, where v_ik is the weight from input i to hidden neuron k. If f is the activation function of the hidden layer, then

y_out_k = f(y_in_k)

The net signal input to the output layer neurons is given by

z_in_k = y_0 w_0k + y_out_1 w_1k + … + y_out_m w_mk

for the k-th neuron in the output layer, where w_jk is the weight from hidden neuron j to output neuron k. Note that the bias input signals x_0 and y_0 are assumed to be 1. If f is the activation function of the output layer, then

z_out_k = f(z_in_k)
MULTILAYER NETWORKS AND THE
BACKPROPAGATION ALGORITHM

If t_k is the target output of the k-th output neuron, then the cost function defined as the squared error of the output layer is given by

E = (1/2) Σ_k (t_k − z_out_k)^2