Neural Network: Throughout The Whole Network, Rather Than at Specific Locations
An individual neuron has a very simple structure, but an assembly of such elementary units constitutes tremendous processing power. A neuron consists of a cell body called the soma, a number of fibers called dendrites, and a single long fiber called the axon. While the dendrites branch into a network around the soma, the axon stretches out to the dendrites and somas of other neurons.
The human brain can be considered a highly complex, nonlinear and parallel information-processing system. Information is stored and processed in a neural network simultaneously throughout the whole network, rather than at specific locations.
Owing to the plasticity of the brain, connections between neurons leading to the 'right answer' are strengthened, while those leading to the 'wrong answer' are weakened; as a result, neural networks have the ability to learn through experience. Learning is a fundamental and essential characteristic of biological neural networks.
An artificial neural network consists of a number of very simple and highly interconnected
processors, also called neurons, which are analogous to the biological neurons in the brain. The
neurons are connected by weighted links passing signals from one neuron to another. Each
neuron receives a number of input signals through its connections; however, it never produces
more than a single output signal. The output signal is transmitted through the neuron’s outgoing
connection (corresponding to the biological axon). The outgoing connection, in turn, splits into
a number of branches that transmit the same signal (the signal is not divided among these
branches in any way). The outgoing branches terminate at the incoming connections of other
neurons in the network. Figure 1 shows the connections of a typical ANN, and Table 1 shows the analogy between biological and artificial neural networks [10].
Table 1: Analogy between biological and artificial neural networks
Biological neural network     Artificial neural network
Dendrite                      Input
Axon                          Output
Synapse                       Weight
Figure 1: Architecture of a Typical ANN and a Human Neuron
An ANN has a layered structure: between the input layer and the output layer there may be one or more hidden layers. If the mapping to be learned is simple, for example linear, few layers are needed, but the number of layers grows with the non-linearity of the problem. The neurons of the input and output layers are connected to the external environment. The weights are updated to bring the network behaviour towards the desired output [11]. An individual neuron is an elementary information-processing unit: it computes its activation level from its inputs and the numerical weights. As a computing element, a neuron receives several signals from its input links, computes a new activation level, and sends it as an output signal through the output links [12].
Figure 2: Weights in a Neuron
The neuron computes the weighted sum of the input signals and compares the result with a threshold value θ. If the net input is less than or equal to the threshold, the neuron output is 0; if the net input is greater than the threshold, the neuron becomes activated and its output attains the value 1 [13]:
$$Y = \begin{cases} 0 & \text{if } \sum_i w_i x_i \le \text{threshold} \\ 1 & \text{if } \sum_i w_i x_i > \text{threshold} \end{cases} \qquad (2.1)$$
where $x_i$ is the value of input $i$, $w_i$ is the weight of input $i$, $n$ is the number of neuron inputs, and $Y$ is the output of the neuron. This type of activation function is a step (threshold) function. Equivalently, in terms of a bias $b$, the rule can be written as:
$$Y = \begin{cases} 0 & \text{if } w \cdot x + b \le 0 \\ 1 & \text{if } w \cdot x + b > 0 \end{cases} \qquad (2.2)$$
where $w \cdot x \equiv \sum_i w_i x_i$, $w$ and $x$ are vectors whose components are the weights and inputs respectively, and $b \equiv -\theta$ is the bias, i.e. the negative of the threshold.
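As a minimal sketch of equations 2.1 and 2.2, the following Python snippet implements such a threshold neuron; the function name step_neuron and the example inputs, weights and threshold are illustrative assumptions, not values taken from the cited sources.

import numpy as np

def step_neuron(x, w, b):
    """Threshold neuron of equations 2.1/2.2: output 1 if w.x + b > 0, else 0."""
    z = np.dot(w, x) + b          # weighted sum of inputs plus bias (b = -threshold)
    return 1 if z > 0 else 0

# Example: two inputs, weights 0.5 and -0.4, threshold 0.3 (so b = -0.3)
x = np.array([1.0, 1.0])
w = np.array([0.5, -0.4])
print(step_neuron(x, w, b=-0.3))  # prints 0, because 0.5 - 0.4 - 0.3 <= 0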
The ANN converges by making small adjustments to the weights that reduce the difference between the actual and the desired outputs. The initial weights are assigned randomly and are then adjusted, using the gradient descent technique, so that the output becomes consistent with the training examples, i.e. so that the error is reduced. A technique of this kind, which lets us find weights and biases such that the output of the network approximates $y(x)$ for all training inputs $x$, is what we need. To quantify how well this goal is achieved, a cost function is defined:
$$C(w,b) \equiv \frac{1}{2n} \sum_x \lVert y(x) - a^L(x) \rVert^2 \qquad (2.3)$$
Here, $w$ denotes the collection of all weights in the network, $b$ all the biases, $n$ is the total number of training inputs, $a^L(x)$ is the vector of outputs from the network when $x$ is the input, and the sum is over all training inputs $x$. $C$ is the quadratic cost function, also known as the mean squared error. The cost $C(w,b)$ becomes small, i.e. $C(w,b) \approx 0$, precisely when $y(x)$ is approximately equal to the output $a^L(x)$ for all training inputs $x$. So the training algorithm has done a good job if it can find weights and biases such that $C(w,b) \approx 0$; by contrast, a large $C(w,b)$ would mean that $y(x)$ is not close to the output for a large number of inputs. The aim of the training algorithm is therefore to minimize the cost $C(w,b)$ as a function of the weights and biases. In other words, to find a set of weights and biases that make the cost as small as possible, gradient descent is implemented to solve the minimization problem. The idea is to use gradient descent to find the weights $w_k$ and biases $b_l$ that minimize the cost in equation 2.3. Gradient descent is a general strategy for searching for a minimum: each weight $w_k$ and bias $b_l$ is repeatedly updated using the corresponding partial derivatives $\partial C / \partial w_k$ and $\partial C / \partial b_l$, i.e.
$$w_k \rightarrow w_k' = w_k - \eta \frac{\partial C}{\partial w_k} \qquad (2.4)$$

$$b_l \rightarrow b_l' = b_l - \eta \frac{\partial C}{\partial b_l} \qquad (2.5)$$

where $\eta$ is a small positive parameter known as the learning rate.
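To make equations 2.3 to 2.5 concrete, the following sketch applies them to a single neuron with a linear output: the cost is the quadratic cost of equation 2.3 and each update follows equations 2.4 and 2.5, with the partial derivatives estimated numerically. The toy data, learning rate, and function names are assumptions made only for illustration.

import numpy as np

def quadratic_cost(w, b, xs, ys):
    """Equation 2.3 for a linear neuron a(x) = w.x + b (scalar output, so the norm is just a squared difference)."""
    n = len(xs)
    return sum((y - (np.dot(w, x) + b)) ** 2 for x, y in zip(xs, ys)) / (2 * n)

def gradient_descent_step(w, b, xs, ys, eta=0.5, eps=1e-6):
    """Equations 2.4 and 2.5: move each parameter against its (numerically estimated) partial derivative."""
    grad_w = np.zeros_like(w)
    for k in range(len(w)):
        w_plus = w.copy()
        w_plus[k] += eps
        grad_w[k] = (quadratic_cost(w_plus, b, xs, ys) - quadratic_cost(w, b, xs, ys)) / eps
    grad_b = (quadratic_cost(w, b + eps, xs, ys) - quadratic_cost(w, b, xs, ys)) / eps
    return w - eta * grad_w, b - eta * grad_b   # w_k' = w_k - eta dC/dw_k, b' = b - eta dC/db

# Toy training set generated from y = 2*x1 - x2 + 1
xs = [np.array([0.0, 1.0]), np.array([1.0, 0.0]), np.array([1.0, 1.0])]
ys = [0.0, 3.0, 2.0]
w, b = np.zeros(2), 0.0
for _ in range(200):
    w, b = gradient_descent_step(w, b, xs, ys)
print(w, b)   # approaches [2, -1] and 1 as the cost decreases towards zero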
The sigmoid activation function is defined as

$$\sigma(z) \equiv \frac{1}{1 + e^{-z}} \qquad (2.7)$$
With inputs $x_1, x_2, \ldots$, weights $w_1, w_2, \ldots$ and a bias $b$, the function can be written more explicitly in the form used for learning by backpropagation:
$$\sigma(z) \equiv \frac{1}{1 + \exp\left(-\sum_i w_i x_i - b\right)} \qquad (2.8)$$
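As a brief sketch of equations 2.7 and 2.8 (the function names are mine, chosen only for illustration), a sigmoid neuron can be computed as follows; for strongly positive or negative weighted inputs its output approaches 1 or 0, which is the behaviour discussed below.

import numpy as np

def sigmoid(z):
    """Equation 2.7: squashes any real z into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_neuron(x, w, b):
    """Equation 2.8: sigmoid applied to the weighted sum of the inputs plus the bias."""
    return sigmoid(np.dot(w, x) + b)

x = np.array([0.5, -1.0, 2.0])
w = np.array([0.8, 0.2, -0.5])
print(sigmoid_neuron(x, w, b=0.1))    # a value strictly between 0 and 1
print(sigmoid(10.0), sigmoid(-10.0))  # ~1 and ~0: the neuron behaves almost like a perceptron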
The indices i, j and k here refer to neurons in the input, hidden and output layers, respectively.
Figure 4: Back-Propagation Neural Network
Input signals $x_1, x_2, \ldots, x_n$ are propagated through the network from left to right, and error signals $C_1, C_2, \ldots, C_n$ from right to left. The symbol $w_{ij}$ denotes the weight of the connection between neuron $i$ in the input layer and neuron $j$ in the hidden layer, and the symbol $w_{jk}$ the weight of the connection between neuron $j$ in the hidden layer and neuron $k$ in the output layer. If $z \equiv w \cdot x + b$ is a large positive number, then $e^{-z} \approx 0$ and so $\sigma(z) \approx 1$. In other words, when $z = w \cdot x + b$ is large and positive, the output from the sigmoid neuron is approximately 1. If $z = w \cdot x + b$ is very negative, then $e^{-z} \rightarrow \infty$ and $\sigma(z) \approx 0$. So when $z = w \cdot x + b$ is very negative, the behaviour of a sigmoid neuron also closely approximates that of a perceptron.
The activation of a neuron in each layer can be represented in terms of the biases and weights: $b^l_j$ stands for the bias of the $j$th neuron in the $l$th layer, $w^l_{jk}$ denotes the weight of the connection from the $k$th neuron in the $(l-1)$th layer to the $j$th neuron in the $l$th layer, and the activation $a^l_j$ of the $j$th neuron in the $l$th layer is related to the activations in the $(l-1)$th layer. The equation can therefore be written as:
$$a^l_j = \sigma\left(\sum_k w^l_{jk}\, a^{l-1}_k + b^l_j\right) \qquad (2.9)$$
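A compact way to see equation 2.9 in code is to apply it layer by layer in matrix form. The sketch below assumes lists named weights and biases holding one matrix and one vector per layer; these names, the random initialisation, and the 3-2-1 layer sizes are illustrative assumptions only.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(a, weights, biases):
    """Repeatedly apply equation 2.9: a^l = sigma(W^l a^(l-1) + b^l), one layer at a time."""
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)   # W[j, k] plays the role of w^l_jk, so row j sums over the previous layer
    return a

# Illustrative 3-2-1 network with randomly initialised parameters
rng = np.random.default_rng(0)
weights = [rng.standard_normal((2, 3)), rng.standard_normal((1, 2))]
biases = [rng.standard_normal(2), rng.standard_normal(1)]
print(feedforward(np.array([0.2, -0.4, 0.9]), weights, biases))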
The goal of backpropagation is to compute the partial derivatives ∂C/∂w and ∂C/∂b of the cost
function C with respect to any weight w or bias b in the network.
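To make the role of those partial derivatives concrete, the following sketch differentiates the quadratic cost for a single sigmoid neuron by the chain rule. It is a minimal illustration under the assumption of one training example and one output, not the full backpropagation algorithm for a multi-layer network.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def single_neuron_gradients(x, y, w, b):
    """Chain rule for C = 1/2 (y - a)^2 with a = sigma(w.x + b):
    dC/dw_k = (a - y) * sigma'(z) * x_k and dC/db = (a - y) * sigma'(z)."""
    z = np.dot(w, x) + b
    a = sigmoid(z)
    delta = (a - y) * a * (1.0 - a)   # sigma'(z) = sigma(z) * (1 - sigma(z))
    return delta * x, delta           # dC/dw (vector), dC/db (scalar)

x = np.array([0.5, -1.0])
w = np.array([0.3, 0.7])
grad_w, grad_b = single_neuron_gradients(x, y=1.0, w=w, b=0.0)
print(grad_w, grad_b)   # these are the quantities plugged into updates 2.4 and 2.5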
The term activation function is biologically inspired: brain neurons receive signals from other neurons and decide whether or not to fire by taking the cumulative input into account. An activation function is used by a unit in a neural network to decide what the activation value of the unit should be, based on a set of input values. Neural networks have to implement complex mapping functions, hence they need activation functions that are non-linear in order to bring in the much-needed non-linearity that enables them to approximate any function. The activation values of many such units can then be used to make a decision based on the classification or predicted value of the input [14].
A sigmoid function, for example, produces a curve with an elongated 'S' shape. It takes a real-valued input and squashes it into the range between 0 and 1. It is a special case of the logistic function; it is differentiable and has a positive derivative at each point.
$$f(z) = \frac{1}{1 + \exp(-z)} \qquad (2.10)$$
The hyperbolic tangent is another commonly used activation function; it squashes its input into the range between -1 and 1:

$$f(z) = \tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}} \qquad (2.11)$$
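As a short, illustrative check of equations 2.10 and 2.11 (the function names below are mine), the two activation functions and their characteristic output ranges can be verified as follows.

import numpy as np

def sigmoid(z):
    """Equation 2.10: output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    """Equation 2.11: output in (-1, 1)."""
    return (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))

zs = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(sigmoid(zs))   # values between 0 and 1, equal to 0.5 at z = 0
print(tanh(zs))      # values between -1 and 1, equal to 0 at z = 0; matches np.tanh(zs)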