AI EEE Unit-I
Introduction
The dynamics of the synaptic junction are complex. Signals arrive from the axon of one neuron and, through the synaptic junction, an output is actuated and carried over the dendrites to the next neuron; neurotransmitters mediate this transfer. From experience we know that these synaptic junctions are either reinforced or weakened, so that the output of a junction may excite the receiving neuron or inhibit it. This reinforcement of synaptic strength is the concept that has been carried over to the artificial neural model as an adjustable weight.
The objective is to build artificial machines, and artificial neural networks are motivated by certain features observed in the human brain, such as, as we said earlier, parallel distributed information processing.
Artificial neural networks are among the most powerful learning models. They have the
versatility to approximate a wide range of complex functions representing multi-
dimensional input-output maps. Neural networks also have inherent adaptability, and can
perform robustly even in noisy environments.
An Artificial Neural Network (ANN) is an information processing paradigm that is inspired
by the way biological nervous systems, such as the brain, process information. The key
element of this paradigm is the novel structure of the information processing system. It is
composed of a large number of highly interconnected simple processing elements (neurons)
working in unison to solve specific problems. ANNs, like people, learn by example. An ANN
is configured for a specific application, such as pattern recognition or data classification,
through a learning process. Learning in biological systems involves adjustments to the
synaptic connections that exist between the neurons. This is true of ANNs as well. ANNs
can process information at great speed owing to their massive parallelism.
A trained neural network can be thought of as an "expert" in the category of information it has been given to analyze. This expert can then be used to provide projections in new situations of interest and answer "what if" questions.
Advantages of ANN:
1. Adaptive learning: An ability to learn how to do tasks based on the data given for training
or initial experience.
2. Self-Organisation: An ANN can create its own organisation or representation of the
information it receives during learning time.
3. Real Time Operation: ANN computations may be carried out in parallel, and special
hardware devices are being designed and manufactured which take advantage of this
capability.
4. Fault Tolerance via Redundant Information Coding: Partial destruction of a network
leads to the corresponding degradation of performance. However, some network capabilities
may be retained even with major network damage.
An Artificial Neural Network (ANN) is an efficient computing system whose central theme is borrowed from the analogy of biological neural networks. ANNs are also known as "artificial neural systems," "parallel distributed processing systems," or "connectionist systems." An ANN comprises a large collection of units that are interconnected in some pattern to allow communication between the units. These units, also referred to as nodes or neurons, are simple processors which operate in parallel.
Every neuron is connected to other neurons through connection links. Each connection link is associated with a weight that carries information about the input signal. This is the most useful information for neurons to solve a particular problem, because the weight usually excites or inhibits the signal being communicated. Each neuron has an internal state, which is called an activation signal. Output signals, produced by combining the input signals and the activation rule, may be sent on to other units.
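As a minimal sketch, a single artificial unit of this kind can be written in Python; the sigmoid activation, weights, and bias below are illustrative choices, not fixed by the text:

```python
import math

def neuron_output(inputs, weights, bias):
    """Weighted sum of inputs plus bias, passed through a sigmoid activation."""
    net = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-net))  # squashes the net input into (0, 1)

# A positive weight excites the unit; a negative weight inhibits it.
y = neuron_output([1.0, 0.5], weights=[0.8, -0.4], bias=0.1)
```

Changing the sign of a weight here plays the role of the excitatory or inhibitory synapse described earlier.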
The history of ANN can be divided into the following three eras:
ANN during 1940s to 1960s
Some key developments of this era are as follows:
1943: The concept of neural networks is generally taken to have started with the work of physiologist Warren McCulloch and mathematician Walter Pitts, who in 1943 modeled a simple neural network using electrical circuits in order to describe how neurons in the brain might work.
1949: Donald Hebb's book, The Organization of Behavior, put forth the idea that repeated activation of one neuron by another strengthens the connection between them each time it is used.
1956: An associative memory network was introduced by Taylor.
1958: A learning method for McCulloch and Pitts neuron model named Perceptron
was invented by Rosenblatt.
1960: Bernard Widrow and Marcian Hoff developed models called "ADALINE" and
“MADALINE.”
ANN during 1960s to 1980s
Some key developments of this era are as follows:
1961: Rosenblatt proposed an error-correction ("backpropagation"-like) scheme for multilayer networks, though the attempt was unsuccessful.
1964: Taylor constructed a winner-take-all circuit with inhibitions among output
units.
1969: Minsky and Papert published Perceptrons, demonstrating the limitations of single-layer perceptrons.
1971: Kohonen developed Associative memories.
1976: Stephen Grossberg and Gail Carpenter developed Adaptive resonance theory.
ANN from 1980s till Present
Some key developments of this era are as follows:
o 1982: The major development was Hopfield’s Energy approach.
o 1985: Boltzmann machine was developed by Ackley, Hinton, and Sejnowski.
o 1986: Rumelhart, Hinton, and Williams introduced Generalised Delta Rule.
o 1988: Kosko developed Binary Associative Memory (BAM) and also gave the concept
of Fuzzy Logic in ANN.
Biological Neuron
o A nerve cell (neuron) is a special biological cell that processes information. According to estimates, there are huge numbers of neurons, approximately 10^11, with numerous interconnections, approximately 10^15.
The features of the biological neural network are attributed to its structure and function.
The fundamental unit of the network is called a neuron or a nerve cell. Figure 1 shows a
schematic of the structure of a neuron.
The following table shows the comparison between ANN and BNN based on some criteria
mentioned.
Architecture:
Humans are best at understanding, reasoning, and interpreting knowledge. A human knows things (knowledge) and, according to that knowledge, performs various actions in the real world. How machines do all these things comes under knowledge representation and reasoning. Hence we can describe knowledge representation as follows:
Knowledge representation and reasoning (KR, KRR) is the part of Artificial Intelligence concerned with how AI agents think and how thinking contributes to the intelligent behavior of agents.
It is responsible for representing information about the real world so that a computer can understand it and utilize this knowledge to solve complex real-world problems, such as diagnosing a medical condition or communicating with humans in natural language.
It also describes how we can represent knowledge in artificial intelligence. Knowledge representation is not just storing data in a database; it also enables an intelligent machine to learn from that knowledge and experience so that it can behave intelligently like a human.
What to Represent:
Objects: All the facts about objects in our world domain. E.g., guitars contain strings; trumpets are brass instruments.
Events: Events are the actions which occur in our world.
Performance: It describes behavior, which involves knowledge about how to do things.
Meta-knowledge: It is knowledge about what we know.
Facts: Facts are the truths about the real world and what we represent.
Knowledge-Base: The central component of a knowledge-based agent is the knowledge base, represented as KB. The knowledge base is a group of sentences (here, "sentence" is used as a technical term; it is not identical to a sentence of the English language).
Knowledge: Knowledge is awareness or familiarity gained by experiences of facts, data,
and situations. Following are the types of knowledge in artificial intelligence:
1. Declarative knowledge
2. Procedural knowledge
3. Meta-knowledge
4. Structural knowledge
There are mainly four approaches to knowledge representation, which are given below:
1. Simple relational knowledge:
It is the simplest way of storing facts: using the relational method, each fact about a set of objects is set out systematically in columns.
This approach to knowledge representation is popular in database systems, where the relationship between different entities is represented.
This approach offers little opportunity for inference.
Player    Weight    Age
Player1   65        23
Player2   58        18
Player3   75        24
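The relational approach above can be sketched as plain rows of columns; the column names (weight, age) are assumed here for illustration:

```python
# Simple relational knowledge: each fact about an object is one row of columns.
# Columns assumed to be (player, weight, age) for the table above.
facts = [
    ("Player1", 65, 23),
    ("Player2", 58, 18),
    ("Player3", 75, 24),
]

# The representation supports direct lookup, but offers little inference:
heaviest = max(facts, key=lambda row: row[1])[0]  # "Player3"
```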
2. Inheritable knowledge:
In the inheritable knowledge approach, all data must be stored in a hierarchy of classes.
All classes should be arranged in a generalized or hierarchical manner.
In this approach, we apply the inheritance property.
Elements inherit values from other members of a class.
This approach contains inheritable knowledge which shows a relation between an instance and a class, called the instance relation.
Every individual frame can represent a collection of attributes and their values.
In this approach, objects and values are represented in boxed nodes.
Arrows point from objects to their values.
3. Inferential knowledge:
In the inferential knowledge approach, knowledge is represented in the form of formal logic, which can be used to derive more facts and guarantees correctness.
4. Procedural knowledge:
The procedural knowledge approach uses small programs and code that describe how to do specific things and how to proceed.
In this approach, one important rule is used: the If-Then rule.
This knowledge can be expressed using various coding languages, such as LISP and Prolog.
We can easily represent heuristic or domain-specific knowledge using this approach, but it is not always possible to represent all cases in this approach.
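The If-Then style of procedural knowledge can be sketched directly; the thermostat rules below are hypothetical examples, not from the text:

```python
# Procedural knowledge as If-Then rules: each rule pairs a condition with an
# action. The thermostat rules here are hypothetical illustrations.
rules = [
    (lambda facts: facts.get("temperature", 0) > 100, "turn_heater_off"),
    (lambda facts: facts.get("temperature", 0) < 20, "turn_heater_on"),
]

def apply_rules(facts):
    """Fire the first rule whose If-condition matches the current facts."""
    for condition, action in rules:
        if condition(facts):
            return action
    return None  # no rule covers this case
```

The `None` branch reflects the caveat above: not every case is covered by the rules we have written down.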
The discrete perceptron learning law adjusts the weights according to
Δw_ij = η (b_i − s_i) a_j,
where b_i is the desired output from the i-th output unit for an input pattern a = (a_1, a_2, ..., a_M), s_i is the actual output, a_j is the j-th component of the input pattern, and η is a small positive learning constant.
Continuous perceptron learning uses a monotonically increasing nonlinear output function f(.) for each unit. The weights are adjusted so as to minimize the squared error between the desired and actual output at every instant.
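The perceptron learning law described above can be sketched as follows, assuming a hard-limiting (step) output unit; the example weights and learning rate are illustrative:

```python
def step(net):
    # Hard-limiting output of the unit: fires (1) when net input >= 0
    return 1 if net >= 0 else 0

def perceptron_update(w, a, b, eta=0.1):
    """One application of the learning law: dw_j = eta * (b - s) * a_j,
    where b is the desired and s the actual output."""
    s = step(sum(wj * aj for wj, aj in zip(w, a)))  # actual output
    return [wj + eta * (b - s) * aj for wj, aj in zip(w, a)]

# Desired output 1, actual output 0 -> both weights are nudged upward.
w = perceptron_update([0.2, -0.5], a=[1.0, 1.0], b=1)
```

When the actual output already matches the desired output, the error term (b − s) is zero and the weights are left unchanged.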
Competitive learning
It is concerned with unsupervised training, in which the output nodes compete with each other to represent the input pattern. To understand this learning rule, we must understand the competitive network, which is given as follows:
Basic Concept of Competitive Network: This network is just like a single-layer feedforward network with feedback connections between the outputs. The connections between outputs are of the inhibitory type, shown by dotted lines, which means the competitors never support themselves.
Basic Concept of Competitive Learning Rule: As said earlier, there will be a competition
among the output nodes. Hence, the main concept is that during training, the output unit
with the highest activation to a given input pattern will be declared the winner.
This rule is also called Winner-takes-all because only the winning neuron is updated and
the rest of the neurons are left unchanged.
Mathematical formulation: Following are the three important factors for mathematical
formulation of this learning rule:
Condition to be a winner: Suppose a neuron y_k wants to be the winner; then the following condition must hold:
y_k = 1 if v_k > v_j for all j, j ≠ k; otherwise y_k = 0.
It means that if any neuron, say y_k, wants to win, then its induced local field (the output of the summation unit), say v_k, must be the largest among all the neurons in the network.
Condition on the sum total of weights: Another constraint of the competitive learning rule is that the sum total of the weights to a particular output neuron must be 1. For example, for neuron k:
Σ_j w_kj = 1 for all k.
Change of weight for the winner: If a neuron does not respond to the input pattern, then no learning takes place in that neuron. However, if a particular neuron wins, then the corresponding weights are adjusted as follows:
Δw_kj = η (x_j − w_kj) if neuron k wins, and Δw_kj = 0 if neuron k loses.
This clearly shows that we favor the winning neuron by adjusting its weights; if a neuron loses, we need not re-adjust its weights.
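The winner-takes-all update can be sketched as below; the learning rate and example vectors are illustrative choices:

```python
def competitive_update(W, x, eta=0.5):
    """Winner-takes-all: only the winning neuron's weights move toward x.
    W is a list of weight vectors, one per output neuron."""
    # Competition: winner k has the largest induced local field v_k = w_k . x
    v = [sum(wi * xi for wi, xi in zip(w, x)) for w in W]
    k = v.index(max(v))
    # dw_kj = eta * (x_j - w_kj) for the winner only; losers stay unchanged
    W[k] = [wj + eta * (xj - wj) for wj, xj in zip(W[k], x)]
    return k, W

# Two output neurons; the first has the larger inner product with x and wins.
k, W = competitive_update([[1.0, 0.0], [0.0, 1.0]], [0.8, 0.2])
```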
Boltzmann learning
Boltzmann learning is statistical in nature and is derived from the field of thermodynamics. It is similar to error-correction learning and is used during supervised training. Neural networks that use Boltzmann learning are called Boltzmann machines.
The main purpose of Boltzmann Machine is to optimize the solution of a problem. It is the
work of Boltzmann Machine to optimize the weights and quantity related to that particular
problem.
Architecture
The following diagram shows the architecture of a Boltzmann machine. It is clear from the diagram that it is a two-dimensional array of units. Here, the weights on interconnections between units are −p, where p > 0. The weights of self-connections are given by b, where b > 0.
Training Algorithm
Since the Boltzmann machine has fixed weights, there is no training algorithm: we do not need to update the weights in the network. However, to test the network we have to set the weights and find the consensus function (CF).
A Boltzmann machine has a set of units U_i and U_j with bi-directional connections between them.
We consider the fixed weight, say w_ij.
w_ij ≠ 0 if U_i and U_j are connected.
There also exists symmetry in the weighted interconnections, i.e. w_ij = w_ji.
w_ii also exists, i.e. there are self-connections between units.
For any unit U_i, its state u_i is either 1 or 0.
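Given fixed symmetric weights and binary states, the consensus function can be sketched as commonly defined (the sum of w_ij·u_i·u_j over each pair once, including self-connections); the weight values below are illustrative:

```python
def consensus(W, u):
    """Consensus function CF = sum over i and j <= i of W[i][j] * u[i] * u[j]
    for a symmetric weight matrix W and binary unit states u (0 or 1)."""
    n = len(u)
    return sum(W[i][j] * u[i] * u[j] for i in range(n) for j in range(i + 1))

# Illustrative 2-unit machine: self-connections b = 0.5 (> 0),
# interconnection weight -p with p = 1 (> 0), symmetric as required.
W = [[0.5, -1.0], [-1.0, 0.5]]
cf_both_on = consensus(W, [1, 1])  # 0.5 - 1.0 + 0.5 = 0.0
```

Testing the network amounts to searching over the binary states u for the configuration that maximizes CF.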
Supervised Learning
As the name suggests, this type of learning is done under the supervision of a teacher. This
learning process is dependent.
During the training of an ANN under supervised learning, the input vector is presented to the network, which produces an output vector. This output vector is compared with the desired output vector. An error signal is generated if there is a difference between the actual and the desired output. On the basis of this error signal, the weights are adjusted until the actual output matches the desired output.
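The procedure just described (present input, compare actual with desired output, form an error signal, adjust weights) can be sketched with a simple delta-rule-style linear unit; the particular rule and learning rate are illustrative choices:

```python
def train_supervised(samples, w, eta=0.2, epochs=20):
    """Present each input, compare actual with desired output, form the
    error signal, and adjust the weights - repeated over many epochs."""
    for _ in range(epochs):
        for x, desired in samples:
            actual = sum(wj * xj for wj, xj in zip(w, x))  # network output
            error = desired - actual                        # the error signal
            w = [wj + eta * error * xj for wj, xj in zip(w, x)]
    return w

# Learn the mapping y = 2x from two labelled examples.
w = train_supervised([([1.0], 2.0), ([2.0], 4.0)], w=[0.0])
```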
Unsupervised Learning
As the name suggests, this type of learning is done without the supervision of a teacher.
This learning process is independent.
During the training of an ANN under unsupervised learning, input vectors of similar type are combined to form clusters. When a new input pattern is applied, the neural network gives an output response indicating the class to which the input pattern belongs.
There is no feedback from the environment as to what should be the desired output and if
it is correct or incorrect. Hence, in this type of learning, the network itself must discover
the patterns and features from the input data, and the relation for the input data over the
output.
Reinforcement Learning
As the name suggests, this type of learning is used to reinforce or strengthen the network based on some critic information. This learning process is similar to supervised learning; however, we may have very little information.
During the training of a network under reinforcement learning, the network receives some feedback from the environment. This makes it somewhat similar to supervised learning. However, the feedback obtained here is evaluative, not instructive, which means there is no teacher as in supervised learning. After receiving the feedback, the network adjusts its weights to obtain better critic information in the future.
Learning paradigms:
There are three major learning paradigms, each corresponding to a particular abstract
learning task. These are supervised learning, unsupervised learning and reinforcement
learning.
In supervised learning, the network is provided with a correct answer (output) for every input pattern. Weights are determined to allow the network to produce answers as close as possible to the known correct answers.
Unsupervised learning does not require a correct answer associated with each input
pattern in the training data set. It explores the underlying structure in the data, or
correlations between patterns in the data, and organizes patterns into categories from
these correlations.
Hybrid learning combines supervised and unsupervised learning. Some of the weights are usually determined through supervised learning, while the others are obtained through unsupervised learning. Reinforcement learning is a variant of supervised learning in which the network is provided with only a critique of the correctness of network outputs, not the correct answers themselves.
ANN Paradigms
Multilayer Neural Networks and Backpropagation:
Self-Organizing Map (SOM)
Topographic Maps
Neurobiological studies indicate that different sensory inputs (motor, visual, auditory, etc.)
are mapped onto corresponding areas of the cerebral cortex in an orderly fashion.
This form of map, known as a topographic map, has two important properties:
1. At each stage of representation, or processing, each piece of incoming information is kept
in its proper context/neighbourhood.
2. Neurons dealing with closely related pieces of information are kept close together so that
they can interact via short synaptic connections.
Our interest is in building artificial topographic maps that learn through self-organization
in a neuro-biologically inspired manner.
We shall follow the principle of topographic map formation: “The spatial location of an
output neuron in a topographic map corresponds to a particular domain or feature drawn
from the input space”.
Setting up a Self Organizing Map
The principal goal of an SOM is to transform an incoming signal pattern of arbitrary
dimension into a one or two dimensional discrete map, and to perform this transformation
adaptively in a topologically ordered fashion.
We therefore set up our SOM by placing neurons at the nodes of a one or two dimensional
lattice. Higher dimensional maps are also possible, but not so common.
The neurons become selectively tuned to various input patterns (stimuli) or classes of
input patterns during the course of the competitive learning.
The locations of the neurons so tuned (i.e. the winning neurons) become ordered and a
meaningful coordinate system for the input features is created on the lattice. The SOM
thus forms the required topographic map of the input patterns.
We can view this as a non-linear generalization of principal component analysis (PCA).
Organization of the Mapping
We have points x in the input space mapping to points I(x) in the output space.
Each point I in the output space maps to a corresponding point w(I) in the input space.
Components of Self Organization
The self-organization process involves four major components:
Initialization: All the connection weights are initialized with small random values.
Competition: For each input pattern, the neurons compute their respective values of a
discriminant function which provides the basis for competition. The particular neuron with
the smallest value of the discriminant function is declared the winner.
Cooperation: The winning neuron determines the spatial location of a topological
neighborhood of excited neurons, thereby providing the basis for cooperation among
neighboring neurons.
Adaptation: The excited neurons decrease their individual values of the discriminant
function in relation to the input pattern through suitable adjustment of the associated
connection weights, such that the response of the winning neuron to the subsequent
application of a similar input pattern is enhanced.
The Competitive Process
If the input space is D dimensional (i.e. there are D input units) we can write the input
patterns as x = {xi : i = 1, …, D} and the connection weights between the input units i and
the neurons j in the computation layer can be written wj = {wji : j = 1, …, N; i = 1, …, D}
where N is the total number of neurons.
We can then define our discriminant function to be the squared Euclidean distance between the input vector x and the weight vector wj for each neuron j:
d_j(x) = Σ_i (x_i − w_ji)²
In other words, the neuron whose weight vector comes closest to the input vector (i.e. is most similar to it) is declared the winner. In this way the continuous input space is mapped to the discrete output space of neurons by a simple process of competition between the neurons.
The Cooperative Process
A Gaussian function of the lattice distance S_j,I(x) between neuron j and the winning neuron,
T_j,I(x) = exp(−S²_j,I(x) / 2σ²),
serves as our topological neighborhood, where I(x) is the index of the winning neuron. This has several important properties: it is maximal at the winning neuron, it is symmetrical about that neuron, it decreases monotonically to zero as the distance goes to infinity, and it is translation invariant (i.e. independent of the location of the winning neuron).
A special feature of the SOM is that the size σ of the neighborhood needs to decrease with time. The weight update for each excited neuron is
Δw_ji = η(t) · T_j,I(x)(t) · (x_i − w_ji),
with a decaying learning rate η(t), and the updates are applied for all the training patterns x over many epochs.
The effect of each weight update is to move the weight vectors wi of the winning neuron and its neighbors towards the input vector x. Repeated presentations of the training data thus lead to topological ordering.
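The four components of self-organization (initialization, competition, cooperation, adaptation) can be put together in a short 1-D SOM sketch; the lattice size, decay schedules, and rates below are illustrative assumptions:

```python
import math, random

def train_som(data, n_neurons=10, epochs=100):
    """1-D SOM sketch: competition (nearest weight vector wins), cooperation
    (Gaussian neighborhood on the lattice), adaptation (weights move toward x).
    Lattice size, decay schedules, and rates are illustrative assumptions."""
    dim = len(data[0])
    random.seed(0)  # initialization: small random connection weights
    W = [[random.random() for _ in range(dim)] for _ in range(n_neurons)]
    for t in range(epochs):
        eta = 0.5 * math.exp(-t / epochs)                        # decaying learning rate
        sigma = max(n_neurons / 2 * math.exp(-t / epochs), 0.5)  # shrinking neighborhood
        for x in data:
            # Competition: the winner minimizes the squared Euclidean distance
            d = [sum((xi - wi) ** 2 for xi, wi in zip(x, w)) for w in W]
            win = d.index(min(d))
            # Cooperation + adaptation: dw_ji = eta * T_j * (x_i - w_ji)
            for j in range(n_neurons):
                T = math.exp(-((j - win) ** 2) / (2 * sigma ** 2))
                W[j] = [wi + eta * T * (xi - wi) for wi, xi in zip(W[j], x)]
    return W
```

Because each update is a convex combination of the old weight and the input, the weights stay within the range of the training data.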
Ordering and Convergence
2. Multi-Quadric Functions: φ(r) = (r² + σ²)^(1/2)
7. Cubic Function: φ(r) = r³
8. Linear Function: φ(r) = r
Problems with Exact Interpolation
We have seen how we can set up RBF networks that perform exact interpolation, but there
are two serious problems with these exact interpolation networks:
1. They perform poorly with noisy data
As we have already seen for Multi-Layer Perceptrons (MLPs), we do not usually want the network's outputs to pass through all the data points when the data is noisy, because that would produce a highly oscillatory function that does not provide good generalization.
2. They are not computationally efficient
The network requires one hidden unit (i.e. one basis function) for each training data
pattern, and so for large data sets the network will become very costly to evaluate.
With MLPs we can improve generalization by using more training data; the opposite happens in exact-interpolation RBF networks, which also take longer to compute as the data set grows.
Improving RBF Networks
We can take the basic structure of the RBF networks that perform exact interpolation and
improve upon them in a number of ways:
1. The number M of basis functions (hidden units) need not equal the number N of training data points. In general it is better to have M much less than N.
2. The centers of the basis functions do not need to be defined as the training data input vectors. They can instead be determined by a training algorithm.
3. The basis functions need not all have the same width parameter σ. These can also be determined by a training algorithm.
4. We can introduce bias parameters into the linear sum of activations at the output layer.
These will compensate for the difference between the average value over the data set of the
basis function activations and the corresponding average value of the targets.
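An RBF network improved along these lines (M centers much smaller than N, plus a bias at the output layer) can be sketched as follows; the Gaussian basis, centers, output weights, and width are illustrative choices:

```python
import math

def rbf_output(x, centers, weights, bias, sigma=1.0):
    """Output of an RBF network: a linear sum of M Gaussian basis functions
    plus a bias parameter at the output layer. M can be much less than the
    number N of training points."""
    phi = [math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2 * sigma ** 2))
           for c in centers]
    return sum(w * p for w, p in zip(weights, phi)) + bias

# Two centers (M = 2) with illustrative output weights and bias.
y = rbf_output([0.0], centers=[[0.0], [1.0]], weights=[1.0, -1.0], bias=0.5)
```

In practice the centers and widths would come from a training algorithm (as points 2 and 3 above suggest), and the output weights and bias from a linear least-squares fit.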