New Microsoft Office Word Document


In machine learning and cognitive science, an artificial neural network (ANN) is a network
inspired by biological neural networks (the central nervous systems of animals, in particular the
brain) that is used to estimate or approximate functions that can depend on a large number of
inputs and are generally unknown. Artificial neural networks are typically specified using three
things:[1]

Architecture: The architecture specifies what variables are involved in the network and their
topological relationships. For example, the variables involved in a neural network might be the
weights of the connections between the neurons, along with the activities of the neurons.

Activity rule: Most neural network models have short time-scale dynamics: local rules
define how the activities of the neurons change in response to each other. Typically the
activity rule depends on the weights (the parameters) in the network.

Learning rule: The learning rule specifies the way in which the neural network's weights
change with time. This learning is usually viewed as taking place on a longer time scale
than the time scale of the dynamics under the activity rule. Usually the learning rule will
depend on the activities of the neurons. It may also depend on the target values supplied
by a teacher and on the current value of the weights.
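The three-part specification above can be illustrated with a minimal sketch. It assumes a single neuron with a logistic activation and a simple error-driven weight update; the names `activity_rule` and `learning_rule`, the learning rate, and the logical-OR training data are illustrative choices, not part of the definition.

```python
import math

# Architecture: the variables are two connection weights, a bias,
# and the activity of one output neuron (all names are illustrative).
weights = [0.0, 0.0]
bias = 0.0

def activity_rule(inputs, weights, bias):
    """Short time-scale dynamics: the neuron's activity given inputs and weights."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-total))  # logistic activation (an assumption)

def learning_rule(inputs, target, weights, bias, rate=0.5):
    """Longer time-scale change: nudge the weights toward the teacher's target."""
    activity = activity_rule(inputs, weights, bias)
    error = target - activity  # depends on the activity and the target value
    new_weights = [w + rate * error * x for w, x in zip(weights, inputs)]
    return new_weights, bias + rate * error

# Target values supplied by a "teacher": here, logical OR.
examples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
for _ in range(2000):
    for inputs, target in examples:
        weights, bias = learning_rule(inputs, target, weights, bias)

predictions = [round(activity_rule(x, weights, bias)) for x, _ in examples]
print(predictions)
```

The activity rule is evaluated once per input, whereas the learning rule only has a visible effect over many passes through the examples, which mirrors the two time scales described above.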

For example, a neural network for handwriting recognition is defined by a set of input neurons
which may be activated by the pixels of an input image. After being weighted and transformed
by a function (determined by the network's designer), the activations of these neurons are then
passed on to other neurons. This process is repeated until finally, the output neuron that
determines which character was read is activated.
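The handwriting example can be sketched as a forward pass through successive layers. The sketch below assumes a toy 2x2 "image", a sigmoid transformation, and hand-picked weights that distinguish a vertical stroke from a horizontal one; the functions `layer` and `recognise` and all numeric values are hypothetical and chosen only for illustration.

```python
import math

def layer(activations, weights, biases):
    """Weight the incoming activations, then transform them with a sigmoid."""
    return [
        1.0 / (1.0 + math.exp(-(sum(w * a for w, a in zip(row, activations)) + b)))
        for row, b in zip(weights, biases)
    ]

def recognise(pixels, layers, labels):
    """Pass activations from layer to layer; the most active output neuron wins."""
    activations = pixels
    for weights, biases in layers:
        activations = layer(activations, weights, biases)
    best = max(range(len(activations)), key=activations.__getitem__)
    return labels[best]

# Pixel order: top-left, top-right, bottom-left, bottom-right.
# Hand-picked (hypothetical) weights: hidden neuron 0 responds to the left
# column, hidden neuron 1 responds to the top row.
hidden = ([[4, -4, 4, -4], [4, 4, -4, -4]], [-4, -4])
output = ([[6, -6], [-6, 6]], [0, 0])
labels = ["|", "-"]

print(recognise([1, 0, 1, 0], [hidden, output], labels))  # prints |
print(recognise([1, 1, 0, 0], [hidden, output], labels))  # prints -
```

In a real recogniser the weights would be learned from labelled images rather than set by hand, and the input would have many more pixels.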
Like other machine learning methods (systems that learn from data), neural networks have been
used to solve a wide variety of tasks, such as computer vision and speech recognition, that are
hard to solve using ordinary rule-based programming.

Contents

1 Background
2 History
  o 2.1 Hebbian learning
  o 2.2 Backpropagation and resurgence
  o 2.3 Improvements since 2006
3 Models
  o 3.1 Network function
  o 3.2 Learning
      3.2.1 Choosing a cost function
  o 3.3 Learning paradigms
      3.3.1 Supervised learning
      3.3.2 Unsupervised learning
      3.3.3 Reinforcement learning
  o 3.4 Learning algorithms
4 Employing artificial neural networks
5 Applications
  o 5.1 Real-life applications
  o 5.2 Neural networks and neuroscience
      5.2.1 Types of models
      5.2.2 Memory networks
6 Neural network software
7 Types of artificial neural networks
8 Theoretical properties
  o 8.1 Computational power
  o 8.2 Capacity
  o 8.3 Convergence
  o 8.4 Generalization and statistics
9 Criticism
  o 9.1 Training issues
  o 9.2 Theoretical issues
  o 9.3 Hardware issues
  o 9.4 Practical counterexamples to criticisms
  o 9.5 Hybrid approaches
10 Classes and types of ANNs
11 Gallery
12 See also
13 References
14 Bibliography
15 External links

Background
Examinations of humans' central nervous systems inspired the concept of artificial neural
networks. In an artificial neural network, simple artificial nodes, known as "neurons",
"neurodes", "processing elements" or "units", are connected together to form a network which
mimics a biological neural network.
There is no single formal definition of what an artificial neural network is. However, a class of
statistical models may commonly be called "neural" if it possesses the following characteristics:
1. it contains sets of adaptive weights, i.e. numerical parameters that are tuned by a learning
algorithm, and
2. it is capable of approximating non-linear functions of its inputs.
The adaptive weights can be thought of as connection strengths between neurons, which are
activated during training and prediction.
Artificial neural networks resemble biological neural networks in that their units perform
functions collectively and in parallel, rather than through a clear delineation of subtasks
assigned to individual units. The term "neural network" usually refers to models employed in
statistics, cognitive psychology and artificial intelligence. Neural network models which
emulate the central nervous system and the rest of the brain are part of theoretical neuroscience
and computational neuroscience.[2]
In modern software implementations of artificial neural networks, the approach inspired by
biology has been largely abandoned for a more practical approach based on statistics and signal
processing. In some of these systems, neural networks or parts of neural networks (like artificial
neurons) form components in larger systems that combine both adaptive and non-adaptive
elements. While the more general approach of such systems is more suitable for real-world
problem solving, it has little to do with the traditional connectionist models of artificial
intelligence. What they do have in common, however, is the principle of non-linear, distributed,
parallel and local processing and adaptation. Historically, the use of neural network models
marked a directional shift in the late 1980s from high-level (symbolic) artificial intelligence,
characterized by expert systems with knowledge embodied in if-then rules, to low-level
(sub-symbolic) machine learning, characterized by knowledge embodied in the parameters of a
dynamical system.

History
Warren McCulloch and Walter Pitts[3] (1943) created a computational model for neural networks
based on mathematics and algorithms called threshold logic. This model paved the way for
neural network research to split into two distinct approaches. One approach focused on
biological processes in the brain and the other focused on the application of neural networks to
artificial intelligence.

Hebbian learning
In the late 1940s, psychologist Donald Hebb[4] created a hypothesis of learning based on the
mechanism of neural plasticity that is now known as Hebbian learning. Hebbian learning is
considered to be a 'typical' unsupervised learning rule, and its later variants were early models
for long-term potentiation. Researchers started applying these ideas to computational models in
1948 with Turing's B-type machines.
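A common textbook formalisation of Hebbian learning changes each weight in proportion to the product of the input and output activities, with no target value involved. The sketch below assumes a single linear neuron and a fixed learning rate; the function name `hebbian_update` and the numbers are illustrative, and this is not claimed to be Hebb's own formulation.

```python
def hebbian_update(weights, inputs, rate=0.1):
    """Strengthen each weight in proportion to joint input and output activity."""
    output = sum(w * x for w, x in zip(weights, inputs))  # linear neuron (assumption)
    return [w + rate * x * output for w, x in zip(weights, inputs)]

weights = [0.1, 0.1, 0.1]
pattern = [1.0, 1.0, 0.0]  # inputs 0 and 1 are repeatedly active together

for _ in range(5):
    weights = hebbian_update(weights, pattern)

# Weights on the co-active inputs have grown; the idle input's weight has not.
print(weights)
```

Because no teacher signal appears in the update, the rule is unsupervised. In this plain form the weights grow without bound under repeated stimulation, which is one reason later variants add normalisation or decay terms.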
Farley and Wesley A. Clark[5] (1954) first used computational machines, then called
"calculators," to simulate a Hebbian network at MIT. Other neural network computational
machines were created by Rochester, Holland, Habit, and Duda[6] (1956).
Frank Rosenblatt[7] (1958) created the perceptron, an algorithm for pattern recognition based on a
two-layer computer learning network using simple addition and subtraction. With mathematical
notation, Rosenblatt also described circuitry not in the basic perceptron, such as the exclusive-or
circuit, a circuit which could not be processed by neural networks until after the backpropagation
algorithm was created by Paul Werbos[8] (1975).
Neural network research stagnated after the publication of machine learning research by Marvin
Minsky and Seymour Papert[9] (1969), who discovered two key issues with the computational
machines that processed neural networks. The first was that basic perceptrons were incapable of
processing the exclusive-or circuit. The second significant issue was that computers did not have
enough processing power to effectively handle the long run time required by large neural
networks. Neural network research slowed until computers achieved greater processing power.
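The exclusive-or limitation can be reproduced with a small sketch of a single-layer perceptron trained by adding or subtracting the input whenever the output is wrong. The function names, the epoch count, and the zero initial weights are assumptions made for illustration; the sketch follows the usual textbook perceptron rule rather than Rosenblatt's original hardware.

```python
def predict(weights, bias, inputs):
    """Threshold unit: fire (1) if the weighted sum exceeds zero."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0

def train_perceptron(examples, epochs=25):
    """Add or subtract the input vector whenever the output is wrong."""
    weights, bias = [0, 0], 0
    for _ in range(epochs):
        for inputs, target in examples:
            error = target - predict(weights, bias, inputs)  # -1, 0 or +1
            weights = [w + error * x for w, x in zip(weights, inputs)]
            bias += error
    return weights, bias

def accuracy(examples, weights, bias):
    hits = sum(predict(weights, bias, x) == t for x, t in examples)
    return hits / len(examples)

AND = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
XOR = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

and_model = train_perceptron(AND)
xor_model = train_perceptron(XOR)
print("AND accuracy:", accuracy(AND, *and_model))  # linearly separable: learned
print("XOR accuracy:", accuracy(XOR, *xor_model))  # stays below 1.0
```

No single threshold unit can separate the exclusive-or cases with one line, so the XOR accuracy cannot reach 1.0 however long training runs.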

Backpropagation and resurgence
A key advance that came later was the backpropagation algorithm, which effectively solved the
exclusive-or problem and, more generally, the problem of quickly training multi-layer neural
networks (Werbos 1975).[8]
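A minimal sketch of backpropagation on the exclusive-or problem follows. It assumes one hidden layer of three sigmoid units, a halved squared-error loss, full-batch gradient descent, and a fixed random seed; these choices, and every function name, are illustrative rather than taken from Werbos's formulation. Training usually drives the error down, but convergence to a perfect solution is not guaranteed for every initialisation.

```python
import math
import random

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(params, x):
    """Return the hidden activations and the output for input x."""
    w_hidden, b_hidden, w_out, b_out = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return hidden, output

def loss(params, data=XOR):
    """Halved sum of squared errors over the data set."""
    return sum(0.5 * (forward(params, x)[1] - t) ** 2 for x, t in data)

def gradients(params, data=XOR):
    """Backpropagation: push the output error back through the hidden layer."""
    w_hidden, b_hidden, w_out, b_out = params
    g_wh = [[0.0] * len(row) for row in w_hidden]
    g_bh = [0.0] * len(b_hidden)
    g_wo = [0.0] * len(w_out)
    g_bo = 0.0
    for x, t in data:
        hidden, output = forward(params, x)
        delta_out = (output - t) * output * (1 - output)    # error at the output
        g_bo += delta_out
        for j, h in enumerate(hidden):
            g_wo[j] += delta_out * h
            delta_h = delta_out * w_out[j] * h * (1 - h)    # error sent backwards
            g_bh[j] += delta_h
            for i, xi in enumerate(x):
                g_wh[j][i] += delta_h * xi
    return g_wh, g_bh, g_wo, g_bo

def step(params, grads, rate):
    """One gradient-descent update of every weight and bias."""
    w_hidden, b_hidden, w_out, b_out = params
    g_wh, g_bh, g_wo, g_bo = grads
    return (
        [[w - rate * g for w, g in zip(row, grow)] for row, grow in zip(w_hidden, g_wh)],
        [b - rate * g for b, g in zip(b_hidden, g_bh)],
        [w - rate * g for w, g in zip(w_out, g_wo)],
        b_out - rate * g_bo,
    )

rng = random.Random(0)  # fixed seed so runs are repeatable
n_hidden = 3
params = (
    [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(n_hidden)],
    [rng.uniform(-1, 1) for _ in range(n_hidden)],
    [rng.uniform(-1, 1) for _ in range(n_hidden)],
    rng.uniform(-1, 1),
)
initial_loss = loss(params)
for _ in range(5000):
    params = step(params, gradients(params), rate=0.5)
final_loss = loss(params)
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

The hidden layer is what lets the network represent exclusive-or at all; backpropagation supplies the gradient needed to adjust the hidden weights, which a single-layer update rule cannot do.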
In the mid-1980s, parallel distributed processing became popular under the name connectionism.
The textbook by David E. Rumelhart and James McClelland[10] (1986) provided a full exposition
of the use of connectionism in computers to simulate neural processes.
Neural networks, as used in artificial intelligence, have traditionally been viewed as simplified
models of neural processing in the brain, even though the relation between this model and the
biological architecture of the brain is debated; it is not clear to what degree artificial neural
networks mirror brain function.[11]
Support vector machines and other, much simpler methods such as linear classifiers gradually
overtook neural networks in machine learning popularity. However, the advent of deep learning in
the late 2000s sparked renewed interest in neural networks.

Improvements since 2006
Computational devices have been created in CMOS for both biophysical simulation and
neuromorphic computing. More recent efforts show promise for creating nanodevices[12] for very
large scale principal components analyses and convolution. If successful, these efforts would
create a new class of neural computing[13], because it depends on learning rather than
programming and because it is fundamentally analog rather than digital, even though the first
instantiations may in fact use CMOS digital devices.
Between 2009 and 2012, the recurrent neural networks and deep feedforward neural networks
developed in the research group of Jürgen Schmidhuber at the Swiss AI Lab IDSIA won
eight international competitions in pattern recognition and machine learning.[14][15] For example,
the bi-directional and multi-dimensional long short-term memory (LSTM)[16][17][18][19] of Alex
Graves et al. won three competitions in connected handwriting recognition at the 2009
International Conference on Document Analysis and Recognition (ICDAR), without any prior
knowledge about the three different languages to be learned.
Fast GPU-based implementations of this approach by Dan Ciresan and colleagues at IDSIA have
won several pattern recognition contests, including the IJCNN 2011 Traffic Sign Recognition
Competition,[20][21] the ISBI 2012 Segmentation of Neuronal Structures in Electron Microscopy
Stacks challenge,[22] and others. Their neural networks also were the first artificial pattern
recognizers to achieve human-competitive or even superhuman performance[23] on important
benchmarks such as traffic sign recognition (IJCNN 2012), or the MNIST handwritten digits
problem of Yann LeCun at NYU.

Deep, highly nonlinear neural architectures similar to the 1980 neocognitron by Kunihiko
Fukushima[24] and the "standard architecture of vision",[25] inspired by the simple and complex
cells identified by David H. Hubel and Torsten Wiesel in the primary visual cortex, can also be
pre-trained by unsupervised methods[26][27] of Geoff Hinton's lab at the University of Toronto.[28][29] A
team from this lab won a 2012 contest sponsored by Merck to design software to help find
molecules that might lead to new drugs.[30]

Models
Neural network models in artificial intelligence are usually referred to as artificial neural
networks (ANNs); these are essentially simple mathematical models defining a function f : X → Y
or a distribution over X or over both X and Y, but sometimes models are also intimately
associated with a particular learning algorithm or learning rule. A common use of the phrase
"ANN model" is really the definition of a class of such functions (where members of the class
are obtained by varying parameters, connection weights, or specifics of the architecture such as
the number of neurons or their connectivity).
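The idea of an "ANN model" as a class of functions can be sketched with a small factory that returns one member of the class for each choice of parameters. The tanh hidden units, the scalar input, and the name `make_member` are assumptions chosen to keep the illustration short.

```python
import math
from typing import Callable, List

def make_member(hidden_weights: List[float],
                output_weights: List[float]) -> Callable[[float], float]:
    """Return one member of the class, fixed by its parameters."""
    def f(x: float) -> float:
        hidden = [math.tanh(w * x) for w in hidden_weights]
        return sum(v * h for v, h in zip(output_weights, hidden))
    return f

# Varying the connection weights, or the number of hidden neurons,
# selects a different function from the same class.
f_small = make_member([1.0], [1.0])
f_wide = make_member([1.0, -2.0, 0.5], [0.3, 0.3, 0.3])

print(f_small(1.0), f_wide(1.0))
```

A learning algorithm can then be viewed as a procedure for choosing one member of this class that fits the data.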

Network function
See also: Graphical models
The word network in the term 'artificial neural network' refers to the interconnections between t
