Artificial neural networks (ANNs) are a family of statistical learning models inspired by biological neural networks (the central nervous systems of animals, in particular the brain) and are used to estimate or approximate functions that can depend on a large number of inputs and are generally unknown. Artificial neural networks are typically specified using three things:[1]
1. Architecture: specifies what variables are involved in the network and their topological relationships. For example, the variables involved in a neural network might be the weights of the connections between the neurons, along with the activities of the neurons.
2. Activity Rule: most neural network models have short time-scale dynamics: local rules define how the activities of the neurons change in response to each other. Typically the activity rule depends on the weights (the parameters) in the network.
3. Learning Rule: the learning rule specifies the way in which the neural network's weights change with time. This learning is usually viewed as taking place on a longer time scale than the time scale of the dynamics under the activity rule. Usually the learning rule will depend on the activities of the neurons. It may also depend on the target values supplied by a teacher and on the current value of the weights. (A minimal code sketch of all three parts follows this list.)
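As a rough illustration of these three parts (not from the source), the following Python/NumPy sketch separates a small fully connected network into its architecture (a weight matrix and unit activities), an activity rule (a nonlinear update of the activities), and a Hebbian-style learning rule (a weight update driven by correlated activities). The network size, the tanh nonlinearity, and the learning rate are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Architecture: 4 fully connected units; the variables are the
# connection weights and the unit activities.
n_units = 4
weights = rng.normal(scale=0.1, size=(n_units, n_units))
activities = rng.choice([-1.0, 1.0], size=n_units)

def activity_rule(weights, activities):
    """Activity rule: each unit's new activity is a nonlinear
    function of the weighted activities of the other units."""
    return np.tanh(weights @ activities)

def learning_rule(weights, activities, learning_rate=0.01):
    """Learning rule (Hebbian-style): strengthen connections between
    units whose activities are correlated."""
    return weights + learning_rate * np.outer(activities, activities)

# Short time scale: activities settle under the activity rule.
for _ in range(5):
    activities = activity_rule(weights, activities)

# Longer time scale: weights change under the learning rule.
weights = learning_rule(weights, activities)
```

The split mirrors the specification above: changing the weight matrix or its connectivity changes the architecture, while swapping either function changes the activity or learning rule without touching the other parts.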
For example, a neural network for handwriting recognition is defined by a set of input neurons
which may be activated by the pixels of an input image. After being weighted and transformed
by a function (determined by the network's designer), the activations of these neurons are then
passed on to other neurons. This process is repeated until finally, the output neuron that
determines which character was read is activated.
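A hedged sketch of that process, assuming a tiny fully connected network in Python/NumPy: the layer sizes (a 28×28 pixel image, 32 hidden units, 10 character classes) and the random weights are invented for illustration; in a real system the weights would be set by a learning rule.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed sizes: a 28x28 input image flattened to 784 pixels,
# one hidden layer of 32 units, and 10 output units (one per character class).
n_pixels, n_hidden, n_classes = 28 * 28, 32, 10

# Weights and biases; random here, normally tuned by a learning rule.
W1, b1 = rng.normal(scale=0.01, size=(n_hidden, n_pixels)), np.zeros(n_hidden)
W2, b2 = rng.normal(scale=0.01, size=(n_classes, n_hidden)), np.zeros(n_classes)

def forward(pixels):
    """Propagate pixel activations through the network:
    input neurons -> weighted sums -> nonlinearity -> output neurons."""
    hidden = np.tanh(W1 @ pixels + b1)   # activations of the hidden neurons
    scores = W2 @ hidden + b2            # one score per character class
    return scores

image = rng.random(n_pixels)             # stand-in for a real input image
predicted_class = int(np.argmax(forward(image)))
print(predicted_class)
```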
Like other machine learning methods (systems that learn from data), neural networks have been used to solve a wide variety of tasks, such as computer vision and speech recognition, that are hard to solve using ordinary rule-based programming.
Contents
1 Background
2 History
o 2.1 Hebbian learning
o 2.2 Backpropagation and resurgence
o 2.3 Improvements since 2006
3 Models
5 Applications
o 5.1 Real-life applications
o 5.2 Neural networks and neuroscience
8 Theoretical properties
o 8.1 Computational power
o 8.2 Capacity
o 8.3 Convergence
o 8.4 Generalization and statistics
9 Criticism
11 Gallery
12 See also
13 References
14 Bibliography
15 External links
Background
Examinations of humans' central nervous systems inspired the concept of artificial neural
networks. In an artificial neural network, simple artificial nodes, known as "neurons",
"neurodes", "processing elements" or "units", are connected together to form a network which
mimics a biological neural network.
There is no single formal definition of what an artificial neural network is. However, a class of
statistical models may commonly be called "neural" if it possesses the following characteristics:
1. contains sets of adaptive weights, i.e. numerical parameters that are tuned by a learning
algorithm, and
2. is capable of approximating non-linear functions of its inputs.
The adaptive weights can be thought of as connection strengths between neurons, which are
activated during training and prediction.
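As a minimal, hypothetical illustration of both characteristics, the Python/NumPy sketch below tunes the adaptive weights of a single logistic unit (a non-linear function of its inputs) with plain gradient descent on a handful of invented data points; the data, step size, and iteration count are assumptions for illustration only.

```python
import numpy as np

# Toy data invented for illustration: two inputs per example, binary target.
X = np.array([[0.0, 0.1], [0.2, 0.9], [0.8, 0.2], [0.9, 0.95]])
y = np.array([0.0, 1.0, 0.0, 1.0])

# Adaptive weights: numerical parameters tuned by a learning algorithm.
w = np.zeros(2)
b = 0.0

def predict(X, w, b):
    """A non-linear (logistic) function of the inputs."""
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

# Learning algorithm: plain gradient descent on the squared error.
learning_rate = 0.5
for _ in range(5000):
    p = predict(X, w, b)
    error = p - y
    grad = error * p * (1 - p)            # chain rule through the logistic
    w -= learning_rate * (X.T @ grad) / len(y)
    b -= learning_rate * grad.mean()

print(np.round(predict(X, w, b), 2))      # moves toward the targets [0, 1, 0, 1]
```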
Artificial neural networks are similar to biological neural networks in that functions are performed collectively and in parallel by the units, rather than there being a clear delineation of subtasks to which individual units are assigned. The term "neural network" usually refers to models employed in statistics, cognitive psychology and artificial intelligence. Neural network models that emulate the central nervous system and the rest of the brain are part of theoretical neuroscience and computational neuroscience.[2]
In modern software implementations of artificial neural networks, the approach inspired by
biology has been largely abandoned for a more practical approach based on statistics and signal
processing. In some of these systems, neural networks or parts of neural networks (like artificial
neurons) form components in larger systems that combine both adaptive and non-adaptive
elements. While the more general approach of such systems is more suitable for real-world
problem solving, it has little to do with the traditional artificial intelligence connectionist
models. What they do have in common, however, is the principle of non-linear, distributed,
parallel and local processing and adaptation. Historically, the use of neural network models
marked a directional shift in the late eighties from high-level (symbolic) artificial intelligence,
characterized by expert systems with knowledge embodied in if-then rules, to low-level (subsymbolic) machine learning, characterized by knowledge embodied in the parameters of a
dynamical system.
History
Warren McCulloch and Walter Pitts[3] (1943) created a computational model for neural networks
based on mathematics and algorithms called threshold logic. This model paved the way for
neural network research to split into two distinct approaches. One approach focused on
biological processes in the brain and the other focused on the application of neural networks to
artificial intelligence.
Hebbian learning
In the late 1940s psychologist Donald Hebb[4] created a hypothesis of learning based on the
mechanism of neural plasticity that is now known as Hebbian learning. Hebbian learning is
considered to be a 'typical' unsupervised learning rule and its later variants were early models for
long-term potentiation. Researchers started applying these ideas to computational models in 1948
with Turing's B-type machines.
Farley and Wesley A. Clark[5] (1954) first used computational machines, then called
"calculators," to simulate a Hebbian network at MIT. Other neural network computational
machines were created by Rochester, Holland, Habit, and Duda[6] (1956).
Frank Rosenblatt[7] (1958) created the perceptron, an algorithm for pattern recognition based on a
two-layer computer learning network using simple addition and subtraction. With mathematical
notation, Rosenblatt also described circuitry not in the basic perceptron, such as the exclusive-or
circuit, a circuit which could not be processed by neural networks until after the backpropagation
algorithm was created by Paul Werbos[8] (1975).
Neural network research stagnated after the publication of machine learning research by Marvin
Minsky and Seymour Papert[9] (1969), who discovered two key issues with the computational
machines that processed neural networks. The first was that basic perceptrons were incapable of
processing the exclusive-or circuit. The second significant issue was that computers did not have
enough processing power to effectively handle the long run time required by large neural
networks. Neural network research slowed until computers achieved greater processing power.
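To make the exclusive-or limitation concrete, the following hypothetical Python/NumPy sketch trains a small two-layer network on the XOR truth table using backpropagation and gradient descent; the four XOR points are not linearly separable, so no single-layer perceptron can compute this mapping, while a network with a hidden layer can learn it. The hidden-layer size, learning rate, iteration count, and random initialization are illustrative assumptions, and training may occasionally settle in a poor local minimum depending on the initialization.

```python
import numpy as np

rng = np.random.default_rng(2)

# The exclusive-or (XOR) truth table; the four points are not
# linearly separable, so no single-layer perceptron can fit them.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Assumed two-layer architecture: 2 inputs -> 4 hidden units -> 1 output.
W1 = rng.normal(scale=1.0, size=(2, 4))
b1 = np.zeros(4)
W2 = rng.normal(scale=1.0, size=(4, 1))
b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

learning_rate = 1.0
for _ in range(10000):
    # Forward pass (activity rule).
    hidden = sigmoid(X @ W1 + b1)        # shape (4 examples, 4 hidden units)
    output = sigmoid(hidden @ W2 + b2)   # shape (4 examples, 1 output)

    # Backward pass: backpropagate the squared-error gradient.
    d_output = (output - y) * output * (1 - output)
    d_hidden = (d_output @ W2.T) * hidden * (1 - hidden)

    # Gradient-descent weight updates (learning rule).
    W2 -= learning_rate * hidden.T @ d_output
    b2 -= learning_rate * d_output.sum(axis=0)
    W1 -= learning_rate * X.T @ d_hidden
    b1 -= learning_rate * d_hidden.sum(axis=0)

print(np.round(output.ravel(), 2))  # for most initializations, approaches [0, 1, 1, 0]
```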
Deep, highly nonlinear neural architectures similar to the 1980 neocognitron by Kunihiko
Fukushima[24] and the "standard architecture of vision",[25] inspired by the simple and complex
cells identified by David H. Hubel and Torsten Wiesel in the primary visual cortex, can also be
pre-trained by unsupervised methods[26][27] of Geoff Hinton's lab at University of Toronto.[28][29] A
team from this lab won a 2012 contest sponsored by Merck to design software to help find
molecules that might lead to new drugs.[30]
Models
Neural network models in artificial intelligence are usually referred to as artificial neural
networks (ANNs); these are essentially simple mathematical models defining a function f : X → Y or a distribution over X or both X and Y, but sometimes models are also intimately associated with a
particular learning algorithm or learning rule. A common use of the phrase "ANN model" is
really the definition of a class of such functions (where members of the class are obtained by
varying parameters, connection weights, or specifics of the architecture such as the number of
neurons or their connectivity).
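As an assumed illustration of this "class of functions" view (not from the source), the Python/NumPy sketch below fixes one small architecture and shows that each choice of connection weights picks out a different member of the class, i.e. a different function f : X → Y:

```python
import numpy as np

rng = np.random.default_rng(3)

def make_network(W1, b1, W2, b2):
    """Return one member of the function class f : R^3 -> R^2
    determined by a fixed 3-4-2 architecture and the given weights."""
    def f(x):
        return np.tanh(W2 @ np.tanh(W1 @ x + b1) + b2)
    return f

# Two different parameter settings give two different functions
# with the same architecture.
f_a = make_network(rng.normal(size=(4, 3)), np.zeros(4),
                   rng.normal(size=(2, 4)), np.zeros(2))
f_b = make_network(rng.normal(size=(4, 3)), np.zeros(4),
                   rng.normal(size=(2, 4)), np.zeros(2))

x = np.array([0.2, -0.5, 1.0])
print(f_a(x), f_b(x))   # generally different outputs for the same input
```

Varying the layer sizes or connectivity instead of the weights would change the architecture itself, yielding a different class of functions.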
Network function
See also: Graphical models
The word network in the term 'artificial neural network' refers to the interconnections between t