Brains, Behavior, and Robotics
by
James S. Albus
Bibliography: p.
Includes index.
1. Artificial intelligence. I. Title.
Q335.A44 001.53'5 81-12310
ISBN 0-07-000975-9 AACR2
Cover Illustration
by Jonathan Graves.
Production Editing
by Peggy McCauley.
CREDITS
Chapter 2 2.1: from The Neuroanatomic Basis for Clinical Neurology, by Talmage L. Peele. ©
1961 by McGraw-Hill Book Co. 2.2: from Histologia Normal, by Ramon y Cajal. 2.3: from
Human Neuroanatomy, by Raymond C. Truex and Malcolm B. Carpenter. © 1969 by Williams
and Wilkins Co. 2.4: Peele. 2.5: Ramon y Cajal. 2.6: from Bailey’s Textbook of Histology, by
Copenhaver et al. © 1978 by Williams and Wilkins Co. 2.7: Truex and Carpenter. 2.8: Ramon
y Cajal. 2.9: from Physiology of Behavior, by Neil R. Carlson. © 1977 by Allyn and Bacon,
Inc. 2.10: from Organ Physiology, Structure, and Function of the Nervous System, by Arthur
C. Guyton. © 1976 by W. B. Saunders Co. 2.11: Truex and Carpenter. 2.12, 2.17, 2.19:
Guyton. 2.20: photo courtesy Drs. M. B. Bunge and R. P. Bunge, College of Physicians and
Surgeons, Columbia University.
Chapter 3 3.1: from D. Barker, Quarterly Journal of Microscopical Science 89 (1948): 143-186.
3.2: Peele. 3.3-3.4: Guyton. 3.5: Carlson. 3.6: from Submicroscopic Structure of the Inner Ear,
edited by S. Iurato. © 1967 by Pergamon Press. 3.7: Guyton. 3.8: Guyton as modified from
Skoglund, Acta Physiologica Scandinavica, Suppl. 124, 36:1 (1956). 3.9: Carlson. 3.10-3.11:
Guyton. 3.12: Carlson. 3.13-3.14: Guyton. 3.15: from The Vertebrate Visual System, by Stephen
Polyak. © 1957 by University of Chicago Press. 3.16: Carlson, redrawn from J. E. Dowling and
B. B. Boycott, Proceedings of the Royal Society (London) 166 (1966): 80-111. 3.17: Carlson.
3.18: Truex and Carpenter. 3.19-3.21: photos courtesy D. Hubel and T. Wiesel, Professors of
Neurobiology, Harvard Medical School. 3.23-3.24: from A Textbook of Physiological
Psychology, by S. P. Grossman. © 1967 by William C. Brown Co. 3.25-3.27: Carlson, 3.28:
from A. R. Tunturi, American Journal of Physiology 168 (1952): 712-727. 3.29: Peele.
3.30-3.32: Guyton.
Chapter 4 4.1: from Animals Without Backbones, by Ralph Buchsbaum. © 1948 by University
of Chicago Press. 4.3-4.5: Truex and Carpenter. 4.9: Guyton. 4.10-4.12: Truex and Carpenter.
4.13-4.14: Grossman. 4.16, 4.19: Peele. 4.22: Guyton from Warwick and Williams, Gray’s
Anatomy. © 1973 by Longman Group Ltd. 4.23: Guyton.
Chapter 7 7.8-7.9: from “Prospects for Industrial Vision,” by Tenenbaum, Barrow, and
Bolles. Computer Vision and Sensor-based Robots, edited by George G. Dodd and Lothar
Rossol. © 1979 by Plenum Publishing Co. 7.20: from “Attention in Unanesthetized Cats,” by
Raul Hernandez-Peon, Harald Scherrer, and Michel Jouvet. Science 123 (1956): 331-332.
Chapter 8 8.2-8.5: photos courtesy Musée d'Art et d'Histoire, Neuchâtel, Switzerland. 8.6: photo
courtesy Billy Rose Theatre Collection, The New York Public Library at Lincoln Center; Astor,
Lenox, and Tilden Foundations. 8.7: BBC copyright photo. 8.8: photo courtesy General Elec¬
tric. 8.9-8.10: photos courtesy Auto-Place, Inc. 8.11: photo courtesy Unimation, Inc. 8.12:
photo courtesy PRAB Conveyors, Inc. 8.13: photo courtesy Unimation, Inc. 8.16-8.17: photos
courtesy Cincinnati Milacron. 8.18: photo courtesy International Harvester Science and Technology Lab. 8.21-8.22: photos courtesy Unimation, Inc. 8.23: photo courtesy Cincinnati Milacron. 8.24: photo courtesy Astek Engineering, Inc. 8.25: Ben Rose photo. 8.27: photo courtesy Stanford University AI Lab. 8.28: photo courtesy Cincinnati Milacron. 8.29-8.30:
photos courtesy SRI International. 8.34: photos courtesy Jet Propulsion Lab.
TABLE OF CONTENTS
Chapter 10 Artificial Intelligence 281
Planning and Problem Solving
Production Systems
Language Understanding
Can Machines Understand?
References 341
Index 349
Brains, Behavior, and Robotics

CHAPTER 1
Mind and Matter
What is mind? What is the relationship between mind and brain? What is
thought? What are the mechanisms that give rise to imagination? What is perception
and how is it related to the object perceived? What are emotions and why do we have
them? What is will and how do we choose what we intend to do? How do we convert
intention into action? How do we plan and how do we know what to expect from the
future?
These questions deal with the innermost secrets of the human brain and have
occupied philosophers for centuries. They address the relationship between the en¬
vironment and the imagination, between reality and belief. These are issues that lie
at the very heart of what we know and how we think.
Until recently such questions could only be addressed indirectly by subjective
introspection or by psychological experiments in which the majority of the critical
variables cannot be measured or controlled. Only in the past three decades, since the
invention of the electronic computer and the development of high-level program¬
ming, has it become possible to approach these issues directly by building
mathematical structures that exhibit some of the mind’s essential qualities: the abili¬
ty to recognize patterns and relationships, to store and use knowledge, to reason and
plan, to learn from experience, and to understand what is observed. The appearance
of these structures is a critical step in the study of mind, for it is difficult to under¬
stand phenomena without a mathematical model. Understanding implies the ability
to compare the model’s predictions with observed facts and then modify the model
until it becomes increasingly accurate in predicting behavior.
Many models of the mind have been constructed in the past. Throughout the
ages, philosophers from Aristotle and Descartes to Kant and Bertrand Russell have
attempted to formulate models to explain the ability of the mind to reason and
wonder, to know and understand. Psychologists from Freud and Pavlov to Skinner
and Piaget have attempted to construct theories that explain the phenomena of emo¬
tion, learning, perception, and behavior.
If we take the mind to be those regions of the brain where the higher-level pattern recognizers and behavioral situation/action rules are stored, and the motor system to be those
regions of the brain where the lower-level pattern recognizers and simple behavioral
skills are stored, then the mind and motor system are separate physical entities.
When these two are electrically or chemically in contact so that they communicate
information and command signals, they can act together in concert to perform
highly skilled and intellectually clever behavioral patterns. On other occasions, when
mind and motor system are functionally disconnected, they can pursue quite
separate activities.
The famous split brain experiments of Roger W. Sperry show that when the
two sides of the brain are surgically disconnected, they function as two separate
brains. Just so, the higher levels of the brain, which produce the activity of mind,
can function independently from the lower levels of the motor system when the two
are logically disconnected. When nothing demanding is required of the motor
system (i.e., when the body is relaxed or engaged in some routine or overlearned
task), then the mind can disengage itself from the motor system. The motor system
can be given a command to perform a routine task and no further attention is re¬
quired of the higher levels for long periods of time. During this time the mind is not
occupied with supervising the motor system. It is free to “idle” or to otherwise oc¬
cupy itself with thoughts unrelated to current behavior or sensory experience.
Note the careful distinction between the activity of the mind and the physical
structure in which the mind resides. The mind is a process that takes place in the
physical structure of the upper levels of the hierarchical control system, the brain.
Unfortunately, this leads into a bit of semantic difficulty, for it is common usage to
refer to activities such as thinking, wondering, planning, hoping, and dreaming as
activities of the mind. Thus, we are confronted with the peculiar notion of an activi¬
ty of an activity. How can an activity (thinking) be an activity of another activity
(mind)? This technical difficulty might best be overcome by an analogy with the
computer science concept of a subroutine. A subroutine is a process evoked within another process. It is a process within a process, or, alternatively, a process of a pro¬
cess. Thus, common programming practice provides a good example of an activity
of an activity. In short, thinking is a subprocess of mind, as are imagining, believing, perceiving, and understanding.
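For readers who want the programming analogy spelled out, the fragment below is a trivial sketch of our own; the function names are invented for illustration. The called function is an activity that runs entirely inside the activity of its caller.

```python
# A trivial sketch of the subroutine analogy. The function names are invented
# for illustration: think() is an activity that runs inside the larger ongoing
# activity of mind(), just as thinking is a subprocess of mind.

def think():
    # a subprocess: one activity ...
    return "hypothesize an action and predict its outcome"

def mind():
    # ... invoked within another, ongoing activity
    return [think() for _ in range(3)]

print(mind())
```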
The activity of the thinking mind might result in the selection of behavioral
goals for intentional action. Alternatively, it could confine itself to the hypothesis of
imaginary actions that it had no intention of carrying out. In either case, thinking is
the process that generates images and patterns; mind is the process in which the pro¬
cess of thinking takes place.
Thinking allows the mind to generate predictions of the sensory input that
would be expected when or if the actions intended or hypothesized by the mind were
executed. Predicted input may be generated from memories resulting from past in¬
stances of similar actions. Memories may also derive from stories heard or images
seen while a similar action was previously being hypothesized. The result is the same.
Both hypothetical and intended actions generate expectations of sensory experience that are made available to the sensory-processing system. While an in¬
tended action is being executed, this internally generated expectation can be com¬
pared with the externally generated sensory experience. This makes it possible for
behavior to be guided by the difference between observation and expectation. On
the other hand, if a hypothesized action is merely imagined, the internally generated
expectation can still be analyzed and evaluated as if it had originated from the exter¬
nal world. This gives us the ability to plan, i.e., to think about and evaluate actions
before performing them.
If an evaluation modifies a hypothesized action, a new expectation will be
generated, leading to a new evaluation, and so on. The looping inherent in this pro¬
cess generates a series of hypotheses, expectations, and evaluations. The result is
what we call a thought or idea.
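This looping process can be sketched in a few lines of code. The sketch below is only an illustration of the hypothesize, predict, evaluate cycle; the stored outcomes, the candidate actions, and the scoring rule are invented placeholders, not anything proposed in the text.

```python
# A minimal, illustrative sketch of the hypothesize -> predict -> evaluate loop.
# The "memory" of past outcomes, the candidate actions, and the scoring are
# invented placeholders.

# Memory: the expected outcome recalled for each hypothesized action.
memory = {
    "retrace steps": {"effort": 8, "chance_of_success": 0.95},
    "continue around": {"effort": 4, "chance_of_success": 0.70},
    "ask directions": {"effort": 2, "chance_of_success": 0.80},
}

def predict(action):
    """Generate an expectation from memories of similar past actions."""
    return memory[action]

def evaluate(expectation):
    """Score an expectation: prefer likely success at low effort."""
    return expectation["chance_of_success"] * 10 - expectation["effort"]

def think(candidate_actions):
    """Loop over hypothesized actions, keeping the best-evaluated one."""
    best_action, best_score = None, float("-inf")
    for action in candidate_actions:           # hypothesize
        expectation = predict(action)          # generate the expected outcome
        score = evaluate(expectation)          # evaluate it as if it had happened
        if score > best_score:
            best_action, best_score = action, score
    return best_action

print(think(list(memory)))   # -> "ask directions" under these made-up numbers
```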
If this simple model of the mind is correct, it suggests that thinking first
developed as a means to facilitate behavior. After all, the brain is first and foremost
a control system, with a principal purpose of generating and controlling successful
goal-seeking behavior in searching for food, avoiding danger, competing for a mate,
and caring for offspring. All brains, even those of the tiniest insects, generate and
control behavior. Some brains produce only simple behavior, while others produce
very complex behavior. Only the most sophisticated and highly developed brains
show evidence of abstract thought. Cognitive thought separate from ongoing
behavior is a rare and recent phenomenon that occurs only in a tiny fraction of the
brains that have ever lived.
In its simplest form, thinking is the “deep structure” of behavior. But in
sophisticated brains, it can also generate expectations from hypothesized actions. In
its most highly developed form, thinking can be used to hypothesize an entire se¬
quence of actions and analyze the potential consequences of those actions before
they are actually performed. It can be used to rehearse future actions or to review
past actions. It is thus a mechanism for selecting and optimizing the most advan¬
tageous script for behavior in advance. Thinking allows us to analyze the past or to
plan for the future; it allows us to select goals and anticipate problems. Thinking
provides a means for choosing successful behavioral patterns that confer a clear ad¬
vantage in the competition for gene propagation.
Of course, once the mechanisms of thinking are developed for selecting and
controlling behavior, they can also be used for other purposes. In times of relaxa¬
tion, thinking can be used to hypothesize actions that yield pleasant memories of the
past or joyful anticipation of the future. We can hope and dream and fantasize.
Thinking allows us to wonder and calculate and contemplate our place in the
universe. It gives us the power to explore the logical consistency of the internal
models we use for understanding the environment of the natural elements, the living
creation, and the spiritual forces that lie beyond.
If thinking is primarily a high-level mechanism of behavior, then it would seem
that any serious attempt to model the cognitive powers of the mind should start with
the vastly simpler task of modeling the behavior-generating functions of the lower
regions of the brain. Once these are well understood, it could be possible to project
our understanding upwards and eventually understand the higher functions of the
mind.
For the most part, however, this has not been the approach taken in the study
of artificial intelligence. In 1950, Alan M. Turing wrote, “We may hope that
machines will eventually compete with men in all purely intellectual fields. But
which are the best ones to start with? . . . Many people think that a very abstract ac¬
tivity, like the playing of chess, would be best. It can also be maintained that it is
best to provide the machine with the best sense organs that money can buy, and then
teach it to understand. . . . This process could follow the normal teaching of a child.
Things would be pointed out and named, etc. Again I do not know what the right
answer is, but I think both approaches should be tried.”
Since that was written, both approaches have been tried. But by far the most ef¬
fort and commitment of intellectual and financial resources has gone into the first
method, the pursuit of abstract reasoning. The entire effort in the field of artificial
intelligence has been dedicated to an attempt to model the reasoning power of the
thinking mind. This is probably due to the historical fact that most of the brightest
pioneers in the field were trained in mathematics, a highly abstract and symbolically
orientated science. In a later chapter we will review some of the successes and
failures of the artificial intelligence approach to modeling the mind.
In this book we will take Turing’s second approach. We will assume that the
precursor to intelligence is behavior control; that abstract thought arises out of the
sophisticated computing mechanisms designed to generate and control complex
behavior; and that first comes the manipulation of objects, then the manipulation of
tokens that represent the objects, and, finally, the manipulation of symbols that
represent the tokens.
This approach implies that the would-be mind modeler should first attempt to
understand and, if possible, reproduce the control functions and behavioral patterns
that exist in insects, birds, mammals, and primates. After these systems are suc¬
cessfully modeled, we might expect to understand some aspects of the mechanisms
that give rise to intelligence and abstract thought in the human brain.
Even so-called “simple” behavioral tasks are complex. A great deal of intellec¬
tual power goes into the most routine of our daily activities. It will be instructive to
examine briefly the level of intellectual activity required to perform a typical every¬
day task.
Consider the simple task of stopping at a shopping center on the way home
from work to buy a record. A detailed examination of your experiences in executing
this kind of task will not only illustrate the enormous complexity of the computation
and control problems involved, but also demonstrate the range of intellectual
capacities needed for such a task.
First, it should be pointed out that every task can be described by a hierarchy of
descriptive levels. In the example chosen here, the highest level description is simply
<PICK UP RECORD>. The modifier “on the way home from work” describes
the time slot in which the task <PICK UP RECORD> is performed.
A second hierarchical level of description is <GO TO SHOPPING CENTER>, <PARK CAR>, <FIND RECORD SHOP>, <BUY RECORD>, <FIND WAY BACK TO CAR>, <LEAVE SHOPPING CENTER>.
A description at a third hierarchical level would break down each of these activities into a sequence of simpler actions. For example, <FIND RECORD SHOP> might decompose into <GET OUT OF CAR>, <LOCK CAR>, <FIND ENTRANCE TO BUILDING>, <WALK DOWN CORRIDOR>, <SEARCH FOR CORRIDOR CONTAINING RECORD SHOP>, <FIND ENTRANCE TO RECORD SHOP>.
A description at a fourth hierarchical level would define each of these activities as a sequence of still more detailed actions: <GET OUT OF CAR> might consist of <REACH FOR DOOR HANDLE>, <PULL HANDLE>, <PUSH DOOR>, <PUT LEFT FOOT OUT>, <TURN BODY LEFT>, <PUT RIGHT FOOT OUT>, <STAND UP>, <STEP FORWARD>.
Each succeeding level would become more refined. A description at a fifth
hierarchical level would further break down these activities into a sequence of trajec¬
tories of limb movements. At a sixth level each trajectory would decompose into a
series of positions and velocities for each of the joints and a sequence of forces in
each muscle.
Descriptions at the higher levels are relatively independent of feedback from the
environment. <PICK UP RECORD> could apply to virtually any record shop.
However, the expansion of this task into the next lower level description depends on
the characteristics of the particular record shop in the particular shopping center. At
lower hierarchical levels, actions become more and more dependent on detailed con¬
ditions in the specific environment and less related to the global purpose under con¬
sideration. For example, the third-level task of < WALK DOWN CORRIDOR > re¬
quires visual information concerning the position of the walls, the position and tra¬
jectory of other persons walking in the same corridor, and the position of obstacles
such as benches, potted plants, etc. At the fourth level, the placing of feet and the
motions of the body depend on the position of floors, stairs, doors, and windows,
etc. Because of the increasing dependence at the lower levels on specific execution¬
time feedback from the environment, planning is mostly confined to the higher
levels.
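As an illustration (our own sketch, not a formalism from the book), the short program below expands a high-level task into progressively finer subtasks and lets sensed conditions influence only the lowest level, echoing the point that planning is confined to the upper levels while execution-time feedback dominates the lower ones.

```python
# An illustrative sketch of hierarchical task decomposition. Each level expands
# a task into only slightly more detailed subtasks; sensory feedback enters only
# near the bottom. The task names and the toy "environment" are placeholders.

DECOMPOSITION = {
    "PICK UP RECORD": ["GO TO SHOPPING CENTER", "PARK CAR", "FIND RECORD SHOP",
                       "BUY RECORD", "FIND WAY BACK TO CAR", "LEAVE SHOPPING CENTER"],
    "FIND RECORD SHOP": ["GET OUT OF CAR", "LOCK CAR", "FIND ENTRANCE TO BUILDING",
                         "WALK DOWN CORRIDOR", "FIND ENTRANCE TO RECORD SHOP"],
    "WALK DOWN CORRIDOR": ["STEP FORWARD", "STEP FORWARD", "STEP FORWARD"],
}

def execute(task, environment, depth=0):
    """Recursively expand a task; leaves are primitive actions."""
    print("  " * depth + task)
    for subtask in DECOMPOSITION.get(task, []):
        execute(subtask, environment, depth + 1)
    if task == "STEP FORWARD" and environment.get("obstacle_ahead", False):
        # Only at the lowest level does execution depend on sensed conditions.
        print("  " * (depth + 1) + "(adjust trajectory around the obstacle)")

execute("PICK UP RECORD", {"obstacle_ahead": True})
```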
It is important to understand the sophistication of the high-level perceptual and intellectual capabilities involved in the execution of a simple task such as <PICK UP RECORD>. This can be illustrated by a detailed first-person account of the action sequence that actually took place one evening during a stop at the shopping center.
Upon entering the building, I walked down the corridor shown in figure 1.4.
Note that the visual cues in this scene are much simpler than those in the garage. The
lines formed by the intersection of the walls, floor, and ceiling apparently converge
on the end of the corridor. They are aligned with the flow of motion in the visual
field, which always radiates outward from the direction of motion. Any discrepancy
between the wall-ceiling lines and the visual flow lines can be interpreted by the vi¬
sion system as a velocity pointing error. This points out the importance of image
motion and visual flow in robot vision, to which we will return.
This particular shopping center consists of a circular building with corridors
radiating out like spokes on a wagon wheel. Once I reached the center shown in
figure 1.5, the problem then became: “Which way to turn?” Because I did not know
where the record shop was, I made an arbitrary choice to follow the flow of the lines
counterclockwise. I had been to the record shop once before but did not remember
where it was. The only clue that I could recall was that it had an old-fashioned
decor. The first corridor was definitely modern in style, so I tried the second, shown
in figure 1.6, which had an “old world” appearance. Of course, this required an
ability to distinguish old decor from modern.
A search of that corridor turned out to be fruitless. As I returned to the central
hub, I realized that a linear search of each spoke of the shopping center would re¬
quire an extensive amount of time, especially since there were three levels to each
corridor. At this point, I invoked the strategy “If lost, ask directions.” I thus began
a subtask, <FIND SOMEONE TO ASK>. The question then became “Who?” I
decided to ask one of the nearby shopkeepers and set off to find one who was not
busy with a customer. This required the ability to recognize a non-busy shopkeeper.
I finally found such a person in a tobacco shop. He said he wasn’t certain, but
he thought there was a bookstore that also sold records a few stores down the next
corridor. Following this suggestion, I finally came to the record store shown in
figure 1.7. This successfully accomplished the <FIND RECORD SHOP> task.
I then executed the <BUY RECORD > task and began the <FIND WAY
BACK TO CAR> task. I easily found my way back to the central hub, but now had
another right or left decision to make. If I went left and simply retraced my steps, I
could surely find my way back, but my previous search had been extensive, and it
probably would be shorter and take less effort to continue around to the right. Thus,
I continued counterclockwise, attempting to recognize the corridor where I had
entered. The first corridor I encountered had a cluttered appearance. I remembered
that my entrance corridor had been rather plain. So I rejected this corridor and
Figure 1.7: The record store. This was the goal of the
< FIND RECORD SHOP> task.
moved further to the right. The next corridor appeared plainer, and I decided that
this probably was the one. However, after walking halfway down it, I came to a
health food store that I did not recall seeing on my way in. But, I wasn’t certain—my
confidence that I was in the right corridor diminished. Next there was the meat
market shown in figure 1.8. Its striking appearance made me almost certain that if I
had seen it on the way in, I would have remembered it. Yet just ahead was a door
that had a familiar appearance. So, with great misgivings, I pressed onward.
Passing through the door, I encountered the parking garage; because it is a
relatively symmetrical structure, I reasoned that it must look the same at every door.
How could I be sure whether to go back and search some more—a long walk at
best—or go on and risk getting hopelessly lost? I then remembered that I had parked
my car at the edge of the roof of the parking garage, and that as I walked from my
car to the shopping center entrance, the shopping center building had been on my
right. Therefore, if I looked to my left from where I was now, I should see the corner
of the parking garage roof. I looked and saw, as shown in figure 1.9, that the roof
did not end. I was definitely not at the correct entrance.
Now certain of my error, I retraced my steps to the central hub and continued
my search counterclockwise. At the next corridor I saw a sign on the wall, shown in
figure 1.10, that said “Gartenhaus.” I remembered seeing this sign on my way in.
This landmark gave me great confidence. As I progressed down the Gartenhaus cor-
ridor, I noticed a window display that I also remembered. I was now certain that I
was on the right path. Going out the door, I looked up to the left and, sure enough,
there, as shown in figure 1.11, was the edge of the parking garage roof. I had found
my way back to the car.
This shopping center experience is interesting from a number of points of view.
First, it illustrates the complexity of the so-called simple tasks that we perform every
day. If we analyze routine daily activities—getting dressed in the morning, going to
school or work, preparing and eating meals, walking through woods or a crowd of
people—we will see that these simple activities are composed of intricate and com¬
plicated activities of manipulation and locomotion that require many subtle and
complex intellectual decisions.
Consider the complexity of the simple task that every child learns, the tying of one's shoe. At a high level we can easily describe the procedure involved: <GRASP THE TWO ENDS, ONE IN EACH HAND>, <FORM A CROSS>, <BRING THE BOTTOM STRING OVER AND AROUND THE TOP>, etc., the type of instructions that might be found in a Boy Scout knot-tying manual. However, try to
imagine the instructions required to implement each of these tasks at the lower
levels. Imagine the detailed instructions required to describe what to do for every er¬
ror condition: What if one string slips? What if one is too short to form a loop?
How much force should be felt when the task is proceeding correctly? How much is
felt, and in which direction, when a mistake has been made? How should a mistake
be corrected or would it be easier to start over?
We all perform such tasks every day, apparently without thinking. But the
amount of thought involved is considerable, and the amount of computation re¬
quired to process sensory input (especially visual images) so as to recognize relation¬
ships and patterns and to use that information in selecting and executing physical
movements is staggering.
Of course, there is nothing terribly mysterious about any of the isolated tasks
involved in everyday life. As in the shopping center problem, each task can be
broken down into a sequence of simpler subtasks. If there are a sufficient number of
levels in the computational hierarchy, each level merely needs to break each task into
a few subtasks. If each subtask decomposition is relatively short, it can be learned
and can be described by a reasonably compact set of behavioral rules. If the
breakdown is predictable, decision points at the various hierarchical levels can each
be described by a fairly small set of logical rules involving estimation of the costs
and benefits of various alternatives.
Much of the secret of complex behavior lies in structuring the computational
task into a hierarchy of computing modules such that each lower level describes the
task in only slightly more detail than the level above. This profound principle allows
a large number of relatively simple computing modules to be arranged in an in¬
tegrated system to produce behavior of arbitrary complexity. We will return to this
theme many times.
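A rough calculation, with purely illustrative numbers, shows why this structuring principle is so economical: if each module expands a task into only a handful of subtasks, a few levels suffice to generate enormously long primitive action sequences, while each module's own rule set stays small.

```python
# Rough arithmetic behind the hierarchy principle. The numbers are illustrative.
levels = 6          # hierarchical levels in the control system
branching = 5       # subtasks each module produces for a given task

print(branching ** levels)   # 15625 primitive steps at the bottom level
print(levels * branching)    # ~30 short expansion rules are enough to describe them
```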
Note that it takes a human being many years to learn the motor skills and in-
tellectual powers needed to perform what adults consider to be routine tasks. A child
cannot learn to tie his or her shoes before using the fingers in complex manipulatory
tasks for three to five years. Most youths with fewer than six to eight years' experience
in finding their way around their neighborhood would not be able to solve the shop¬
ping center problem. The apparent ease with which we solve everyday problems is
deceptive. Even the enormous computing power of the human brain does not deal
with such problems without difficulty.
Of course, once a set of effective procedures and algorithms is learned, the
problem appears simple: the many years of learning and practice are behind us. The
acquisition of skills and strategies is difficult and tedious, but once learned, their ex¬
ecution comes easily. In many ways, this is analogous to the apparent ease with
which performers in an ice skating show execute the most intricate maneuvers. Not
evident are the years of practice, the hundreds of failures and falls, the innumerable
hours of instruction, and the ruthless competition in which the less talented, less
skilled, and less determined performers were weeded out.
This suggests that the truly difficult part of complex behavior is in the learning,
or programming, by which the required skills and behavioral rules are acquired.
Once effective skills and strategies are mastered, they can be readily applied. It is the
discovery of the strategies and the learning of the skills that is difficult. Thus, a per¬
son or a robot who comes to some problem with a suitable repertoire of generalized
skills will appear very capable and intelligent. A person or robot without these
prelearned abilities will appear stupid and clumsy.
The shopping center illustration also gives a number of clues as to how
memories are stored in the brain. For example, it suggests that memories are stored
in addresses defined by the state of both the body and the brain which existed at the
time that the experience originally occurred. The memory of the garage roof’s
geometrical position was recalled by mentally re-enacting the sequence of states in¬
volved in getting out of the car and walking to the shopping center entrance. The
mental image of walking with the shopping center building on my right recalled the
memory of the edge of the parking garage roof to my back. Similarly, the memory
of having seen the “Gartenhaus” sign was triggered by seeing it once again. This im¬
plies that the recall of sensory experience can be accomplished by creating the same
mental state that was present when the experience was stored. This is what we mean
when we say that memory is “context addressable.” It is why we are slow to
recognize people or objects when we see them out of context in unexpected places. In
later chapters we will explore possible mechanisms by which the brain performs this
type of memory storage and recall operation. We will also suggest a means by
which similar memory mechanisms might be constructed for a robot-control system.
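One crude way to picture what "context addressable" means in software is a store keyed by the state in which each memory was laid down, recalled by finding the stored state nearest the current one. Everything in the sketch below, the state features, the distance measure, and the class itself, is an invented illustration; the book's own proposal for such a memory comes in later chapters.

```python
# A crude sketch of context-addressable recall: memories are stored under the
# sensory/body state present when they were formed, and recalled by re-creating
# a nearby state. The state features and distance measure are invented here.

def distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

class ContextMemory:
    def __init__(self):
        self.traces = []                      # list of (state, remembered content)

    def store(self, state, content):
        self.traces.append((state, content))

    def recall(self, state):
        """Return the memory whose storage context is closest to the current state."""
        return min(self.traces, key=lambda t: distance(t[0], state))[1]

mem = ContextMemory()
# state = (walking_direction, building_on_right, saw_sign)  -- toy features
mem.store((1.0, 1.0, 0.0), "garage roof edge was behind me")
mem.store((0.0, 0.0, 1.0), "the Gartenhaus sign marks my corridor")

# Re-enacting the walk from the car re-creates a similar state and recalls the first trace.
print(mem.recall((0.9, 1.0, 0.1)))
```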
Finally, the shopping center problem demonstrates the amount that can be
learned from careful observation and analysis of everyday activities. Nature has pro¬
vided us with innumerable examples of various degrees of intelligent behavior. Liv¬
ing creatures of every description abound and routinely demonstrate the most amaz¬
ing feats of manipulation, locomotion, and intellectual decision-making. Many
books are available that describe the behavior of various species in great detail.
These are valuable in that they relate behavioral patterns and cause-and-effect rela¬
tionships that are not easily observed. Nevertheless, there is no substitute for direct
first-hand observation.
For example, watching an ant climb a tree is enormously instructive in
understanding the complexity of legged locomotion. Observation of a beetle walking
reveals how such a creature can feel its way along in spite of very poor vision. Or
watch a bee move from flower to flower. Imagine the computing power of the visual
system that enables it to detect the motion of regions and edges in order to provide
the target tracking necessary to maneuver in a field of wild flowers. Watch a duck
land on a lake or fly in formation with companions. Consider the computational
problems of navigation and flight dynamics. Observe a squirrel jump from limb to
limb, a dog bury a bone, or a human play the violin or bake a cake. Consider the
problems of visual perception, motor coordination and dexterity, and intellectual
decision-making: these are the problems encountered when building a robot. How
do bees and ducks and squirrels and humans do what they do? What knowledge is
stored internally and how is that internal knowledge used to interpret the sensory in¬
put from the external environment?
Without the example of nature, we might easily conclude that intelligent
behavior is impossible. But obviously it is not. Even a mosquito can execute an in¬
tricate flight pattern, acquire and track a target, avoid getting swatted, execute a
precision landing, and perform a complex drilling operation on the skin of its vic¬
tim. How much computing power is there in the brain of a mosquito? There are but
a few thousand neurons in an insect’s brain. Surely this is within the capacity of
modern computers to duplicate or, at least, simulate.
To learn how creatures of nature do what we would like our robots to do, we’ll
first examine the computing structure of the brain. To show how the basic elements
of perception and behavior are organized in biological organisms, we’ll give a brief
survey of the structure and function of the sensory-motor system. Then we’ll
develop a theory of goal-directed behavior and propose a hierarchical structure of
computing modules that can produce the elements of such behavior in a robot.
Next, we will construct a neurological model that has the essential properties of
the proposed computational modules. We will show how this model can learn,
generalize, and recognize patterns. We will then try to show how such a computa¬
tional hierarchy might recall experiences, solve problems, plan tasks, select goals,
answer questions, structure knowledge of the world and events, understand music or
natural language, hope, dream, and contemplate the meaning of its own existence.
In the last five chapters, we will return to the subject of robotics and suggest
how the proposed hierarchical computing model can be used to build and program
robots with significant motor skills and intellectual capacities. Finally, we will offer
some speculations on the social and economic consequences of widespread use of
robots in the production of goods and services.
CHAPTER 2
The Basic Elements of the Brain

NEURONS
Neurons have four basic parts: a cell body, a set of dendrites, an axon, and a set
of terminal buttons. The cell body contains the nucleus and much of the machinery
that provides for the life processes of the neuron. Both the dendrites and cell body
receive informational input to the neuron. The axon is the output conductor that
transmits the neuron’s message to its destination. The terminal buttons are located
at the ends of the axon and release the transmitter chemicals that pass the neural
messages on to the next neuron.
Figure 2.1 shows a typical motor neuron. The cell body of the motor neuron is
usually located in the spinal cord, and the axon terminates on a set of muscle cells.
Long axons like those on motor neurons are encased in an insulating sheath of
myelin. Figure 2.2 shows a pyramidal neuron from the cerebral cortex. The axon
from this neuron sends out numerous collateral branches close to the cell body
before it enters the “white matter”—a layer of millions of myelin-encased axons all
bundled together like wires in a telephone cable. It is known as white matter because
in a brain preserved in formaldehyde the myelin coating on the axons in the bundle
appears whiter than the adjacent regions containing the cell bodies and dendrites.
These latter regions comprise the “grey matter.” The color of the white and grey
matter in photographs, however, depends on the type of stain which is used to
prepare the brain tissue. For example, the Weigert-Weil stain turns the white matter
black, as in figure 2.3.
Different types of stain can be used to bring out different aspects of a section of
neural tissues. Figure 2.3 shows three types of stain used on the same region of cor¬
tical tissue. Golgi stain brings out the entire shape of a few selected neurons in¬
cluding the dendrites and axons. Nissl stain brings out 100 percent of the neurons
but shows only the cell bodies. Weigert stain brings out the myelinated axon fibers.
Figure 2.4 shows the Nissl-stained cell bodies in seven different regions of the
cerebral cortex. Note the distributions of cell layers in the different regions. This
suggests that different types of computations are being performed in these various
regions.
Figure 2.5 shows a variety of neurons in the cerebral cortex. Note that some of
the axons descend into the white matter, but that others rise and enter the fiber layer
near the surface of the cortex, and still others remain confined to a volume within
the grey matter near the neuron of origin. Figure 2.6 illustrates a number of dif¬
ferent types of neurons. Note that all except one have the basic form of dendritic in¬
puts, cell body, and axon output. The unipolar sensory neuron (A) shown in figure
2.6 is different. It has free endings that pick up sensory signals and transmit them to
the terminal arborizations in the spinal cord. The cell body of the unipolar neuron
hangs off to the side.
Figure 2.4: Cell body organization in various cortical regions stained with Nissl stain. (A)
Precentral region, Brodmann area 4. (B) Postcentral region, Brodmann area 3. (C) Primary
visual cortex, area 17. (D) Superior temporal cortex, areas 41 and 42. (E) Associative visual
cortex, area 19. (F) Temporal cortex, area 21. (G) Inner temporal cortex, area 28. The
numbered Brodmann areas refer to the map of the cortex shown in figure 4.19.
The size and shape of each neuron depends on the particular computational job
it is asked to perform. Some neurons, such as the giant pyramidal cells, are large,
with cell bodies up to a tenth of a millimeter in diameter and axons that can reach
two feet or more in length. Other neurons are tiny, like the granule or stellate cells,
measuring only a few thousandths of a millimeter in diameter, with axons less than a millimeter or two in length.
DENDRITES
The dendrites of a neuron resemble the branches or, in many cases, the roots of
a tree. Dendrites are covered with synapses much like the branches of a bush are
covered with buds in the spring. See figure 2.7. Synapses are the receptor sites that
receive the input signals from the axons of other neurons. Some neurons have den¬
dritic trees that branch extensively and receive inputs from thousands of other
neurons. The Purkinje cell, for example, has upwards of 200,000 synaptic inputs.
Other neurons have only a few dendrites and receive a small number of inputs. The
cerebellar granule cell typically receives from one to eight inputs.
GLIAL CELLS

Neurons are surrounded and supported by glial cells, which bathe them in an enriched and purified fluid environment. They transport substances essential for
metabolism from capillaries to the neurons and transport waste products from
neurons to the capillaries, as illustrated in figure 2.9. This produces the so-called
“blood-brain barrier.” Glial cells regulate the chemical composition of the ex¬
tracellular fluid and even act as housekeepers by digesting and removing neurons
that die from injury, disease, or old age. In addition, they insulate neurons from
each other so that their electrical messages do not get scrambled.
AXONS
Axons are long tubes that carry the neurons’ electrical messages from the cell
body to the terminal buttons. The terminal regions of axons typically form branches
and attach themselves via buttons to synaptic sites on the dendrites and cell bodies of
the receiving neuron as illustrated in figure 2.10. In some cases the axon of the send¬
ing neuron will climb over the surface of the dendrites or cell body of the receiving
neuron like a vine, making repeated synaptic contacts. This is illustrated by the
climbing fibers in figure 2.11. In other cases, the axon of the sending neuron will
simply pass through the dendritic tree of a receiving neuron, making contact with
the few synaptic sites that happen to lie in its path, as illustrated by the parallel fibers
in figure 2.11.
Some neurons send axons to distant places in the brain or spinal cord. Motor
neurons leave the brain entirely to terminate on muscle cells and glands in the
Figure 2.11: A schematic diagram of the cerebellar cortex showing cell and fiber ar¬
rangements. Climbing fibers entwine about the branches of the dendritic trees of the Purkinje
cells, making repeated synaptic contact. This is quite different from the passage of the parallel
fibers through the dendritic trees of the Purkinje cells. Synaptic contacts here are made only at
those dendritic sites that happen to lie directly in the path of the passing fiber.
Figure 2.12: Different views of the motor end-plate. (A) side view, (B) top view, (C) magnified
side view showing the axon terminal contact with the motor end-plate.
peripheral regions of the body. Figure 2.12 shows several views of the
neuromuscular junction, sometimes called the motor end-plate. Other neurons send
axons only to nearby neurons. Some axons travel long distances, branching only oc¬
casionally and ending with terminal buttons on target neurons in very specific
regions. Other axons branch extensively, ending in thousands, or even hundreds of
thousands, of synapses over a large and diffuse volume.
SYNAPSES
The synapse is an electrical gate, or valve, whose resistance to the flow of cur¬
rent is controlled by the receipt of transmitter chemicals from the axon buttons of
other neurons. Three typical synapses are shown in figure 2.13. When an electrical
signal reaches the buttons at the terminal ends of the axon, tiny packets, or vesicles,
that contain a transmitter chemical are released. This transmitter diffuses across the
narrow gap between the button and the synaptic receptor site on a dendrite or cell
Figure 2.13: Different types of synapses on a typical neuron in the brain. Excitatory synapses
tend to have round vesicles and a continuous dense thickening of the postsynaptic membrane.
Inhibitory synapses tend to have flattened vesicles and a discontinuous postsynaptic mem¬
brane. [From “The Chemistry of the Brain,” by L. L. Iversen. Copyright © 1979 by Scientific American, Inc. All
rights reserved.]
body of the receiving neuron. The presence of the transmitter causes an electrical
current to flow in the synapse of the receiving neuron. This current may be either
positive or negative, depending on the type of transmitter chemical released.
As a general rule, a particular neuron releases only one type of transmitter
chemical. Thus, neurons can be classified as either excitatory (i.e., causing positive
current to flow in receiving neurons) or inhibitory (i.e., causing negative current to
flow). There is a synaptic receptor for every axon button. Thus, there are two types
of synaptic receptor sites: excitatory and inhibitory. A single receiving neuron may
have both excitatory and inhibitory inputs. Communication of information across
synapses is one-way, flowing from the terminal buttons of one neuron to the den¬
drites or cell body (and in some cases to the axon) of another neuron.
MEMBRANE POTENTIAL
In its resting state, the inside of a neuron is held at about −70 millivolts with respect to the outside; the cell behaves like a small charged battery, and synaptic inputs act by discharging or recharging it.
Figure 2.15: Action of transmitter chemicals causes selective changes in the permeability of the
synaptic membrane. Excitatory transmitter increases permeability to sodium (Na+) ions. An
inflow of Na+ ions reduces the negative potential inside the neuron. Inhibitory transmitter in¬
creases permeability to potassium (K+) ions. An increased outflow of K+ ions increases the
negative potential inside the neuron.
The presence of transmitter chemical causes changes to occur in the size or the
configuration of tiny pores in the synaptic membrane. The excitatory transmitter
opens pores that allow positively charged sodium ions to flow back into the cell.
This current flow discharges the electrical voltage of the cell battery. Thus, the ex¬
citatory transmitter causes the receiver cell battery to depolarize or decrease its
negative voltage. This is illustrated in figure 2.15.
The presence of an inhibitory transmitter at an inhibitory synapse opens other
pores that are constructed to allow potassium ions to flow out of the neuron. This
current flow increases the charge of the cell battery. An inhibitory input, then,
causes the neuron to retain or even increase its normal negative battery voltage.
Using these excitatory and inhibitory transmitters, the neuron is able to receive
messages and compute functions. The neuron computes by summing the total of all
the positive and negative currents induced in the dendrites and cell body by the
transmitter chemicals. In most cases this computation is not just a simple arithmetic
sum, because the voltage in the cell body is influenced by the relative strength of the
Figure 2.16: Stimulation of a neuron by synaptic inputs of different types located at many
points on dendrites produces a variety of electrical effects depending on the location and time
sequencing of the various inputs.
various synapses, the relative time of arrival of inputs from different sources, and
even the relative positions of the various synapses on the dendrites and cell body.
Thus, as shown in figure 2.16, the result of the neuronal computation may be a very
complex function of the totality of the inputs.
The electrical voltage in the cell body of a neuron at the point where the axon is
attached represents the result of its computation. It is its output. This voltage is a
piece of information, a scalar variable representing the state of the neuron. It is the
value of a parameter that may indicate some condition or event or the presence of
some pattern or relationship.
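A drastically simplified cartoon of this computation, ignoring the synapse positions and arrival times that we have just noted also matter, treats the cell body potential as a signed sum of weighted inputs. The numbers in the sketch below are arbitrary.

```python
# Drastically simplified neuron: the output potential is the sum of excitatory
# (positive) and inhibitory (negative) synaptic currents, each weighted by
# synapse strength. Timing and dendritic position are ignored here.

def cell_body_potential(inputs, weights, resting=-70.0):
    """inputs: presynaptic activity (0..1); weights: + for excitatory, - for inhibitory."""
    synaptic_current = sum(x * w for x, w in zip(inputs, weights))
    return resting + synaptic_current          # millivolts at the axon hillock

firing = [1.0, 0.5, 1.0]
strengths = [15.0, 20.0, -12.0]                # two excitatory synapses, one inhibitory
v = cell_body_potential(firing, strengths)
print(v, "fires" if v > -50.0 else "silent")   # -57.0 mV -> below threshold, silent
```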
ACTION POTENTIAL
Once the neuron has computed its result, it must then communicate this infor¬
mation to its destination, which can be another neuron, a muscle, or a gland.
Transmission is not a simple problem, because the signal voltage is small (less than a
tenth of a volt), and the distance may be quite far. The axon that must carry this in¬
formation is very long compared to its diameter so that its electrical resistance is
high. This means that the signal voltage of the neuron will be dissipated in transmis¬
sion unless some means can be provided for boosting the signal strength along the
way. This is the purpose served by the action potential. The action potential allows
the signal voltage of the neurons to be transmitted over long distances by encoding it
as a string of pulses.
The walls of the axon tube have the ability to produce a dramatic and rapid
momentary reversal of the battery voltage in the axon. If the signal voltage of the
neuron cell body rises above about −50 millivolts, the membrane walls of the axon
that connect to the cell body suddenly drop their resistance to the passage of sodium
ions. This allows sodium ions to rush into the axon causing its voltage to completely
reverse polarity and go positive to about +50 millivolts. However, this is a momen¬
tary effect that disappears in about 0.5 millisecond to be replaced by an equally sud¬
den drop in the resistance to potassium. This causes potassium ions to rush out of
the axon, which in turn causes the axon to return to a voltage even more negative
than before the inrush of sodium. This entire sequence of events, illustrated in figure
2.17, happens in about one millisecond. The result is an electrical impulse that is
called an “action potential.”
The action potential propagates down the axon like a spark down a fuse, as il¬
lustrated in figure 2.18. As each section of the axon generates an action potential, it
depolarizes the section next to it so that it also generates an action potential. Thus,
no matter how long the axon may be or how many times it branches, an action
potential can be transmitted without loss in intensity.
In vertebrates, most axons that travel any significant distance are covered with
an insulating sheet of myelin. The role of the myelin is to increase the speed and
efficiency of the axon transmission. The myelin is interrupted every tenth of a
millimeter or so by a patch of bare axon. These patches, called the nodes of Ranvier,
are shown in figure 2.19. The current flow necessary for generating the action poten¬
tial in the axon wall cannot pass through the myelin insulator as it does while
propagating in an uninsulated axon. Instead it must flow around the myelin insula¬
tion, which causes the action potential to jump from one node to the next. This
greatly increases the speed of propagation from about 30 meters per second in an
unmyelinated axon to about 300 meters per second in a myelinated axon. It also
decreases the amount of energy expended in each action potential, because the ionic
currents of the action potential flow in only a tiny fraction of the axon surface.
The myelin is composed of the insulating membrane of a special type of cell
called the Schwann cell that wraps itself around and around the neuron axon, as shown in figure 2.19.
Figure 2.19: An insulating sheath of myelin is formed by Schwann cells which wrap themselves around the axon. Uninsulated patches between myelinated sections are called nodes of Ranvier.
Signal values are encoded as pulse frequency, or pulse spacing, or, in some instances such as
in the localization of audio signals, as the phase, or relative time of arrival of action
potentials from two locations.
Signal encoding by action potentials unfortunately introduces quantization
noise into the information channel. This is because the action potential is a discrete
event, as is the pulse spacing between action potentials. The encoding of a con¬
tinuous voltage as a string of pulses is a form of quantization. The brain overcomes
this noise by redundancy; many neurons transmit the same message, each encoded
slightly differently. The average of a large number of neurons produces the accuracy
needed for precise control. This redundancy also provides improved reliability, im¬
portant in a structure in which approximately ten thousand neurons die every day of
disease, injury, or old age.
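The quantization argument can be made concrete with a toy numerical sketch (not a model of real spike trains; the window length, offsets, and population size are arbitrary): a single pulse-count channel reports only coarse steps, but the average over many slightly different encoders recovers a much finer value.

```python
# Toy illustration of quantization in pulse-rate encoding and its reduction by
# redundancy. Counting spikes over a short window quantizes the value; averaging
# many neurons that encode the same value with slightly different offsets
# recovers a finer estimate. All parameters are arbitrary.
import random

def rate_encode(value, window=0.1, offset=0.0):
    """Encode a value (spikes per second) as a whole number of spikes in a window."""
    return round((value + offset) * window)     # only integer spike counts exist

signal = 123.4                                   # the "true" value to be transmitted

single = rate_encode(signal) / 0.1               # one neuron alone: 120.0, coarse
population = [rate_encode(signal, offset=random.uniform(-5, 5)) / 0.1
              for _ in range(1000)]              # many neurons, each slightly different
averaged = sum(population) / len(population)

print(single, round(averaged, 1))                # the average lands much nearer 123.4
```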
Neurons are the transistors, resistors, capacitors, and wires of the brain. Like
the individual circuit elements in a computer, neurons need to be interconnected in
specific ways to produce the computational power of the brain. In the next two
chapters we will briefly discuss the basic structure of the neuronal circuitry and the
networks of interconnecting pathways that collect sensory data, analyze informa¬
tion, and generate motor behavior.
CHAPTER 3
Sensory Input
When the head undergoes an angular acceleration, fluid in one or more of the semicircular canals is set in motion, causing the respective
cupulae to be deflected. The hair cells attached to each cupula are deflected, and
their neuronal firing rates are a measure of angular acceleration. Two other
chambers called the utricle and saccule contain a set of weights called otoconia that
is suspended on a gelatinous layer also containing hair cells. These are shown in
figure 3.6. If the position of the head changes with respect to gravity or if a linear ac¬
celeration occurs, the otoconia are displaced so that the hair cells buried in them can
report the direction of tilt or acceleration.
The vestibular system is the inertial guidance reference. Information from the
vestibular system is combined with force, velocity, tension, and position informa¬
tion from tendon, stretch, and joint receptors. Together, these form the vestibular
reflex that functions like an autopilot to maintain balance during a variety of bodily
motions. The neuronal computational mechanisms for the vestibular and stretch
reflexes are discussed in the next chapter. Circuit diagrams for the stretch reflex are
shown in figure 4.6; diagrams for the vestibular reflex appear in figure 4.13.
Figure 3.5: Two types of vestibular hair cells. Deflection of the hairs in one direction slows the
normal firing rate of action potentials in the vestibular afferent axons. Deflection of the hairs
in the opposite direction increases the firing rate.
Figure 3.6: Hair tufts from hair cells are also embedded in the gelatinous layer beneath the
calcium carbonate crystals making up the otoconia. Linear acceleration of the head causes a
shearing force that deflects the gelatinous layer, exciting the hair cells.
TOUCH
There are at least seven different types of touch sensors. These are shown in
figure 3.7. First, there are free nerve endings that are found in the skin as well as in
many other places in the body. These can detect very slight pressure and provide an
extremely sensitive sense of touch. Second are Meissner’s corpuscles, which have
localized pressure-sensing capabilities. Abundant in the lips and fingertips, these
provide a high degree of spatial localization of the sense of touch and make it possi¬
ble to discern the shape of objects by touch. Third are the hair end-organs that
detect the mechanical deflection of the hairs to which they are attached. Fourth are the Pacinian corpuscles, which are particularly sensitive to vibration or rapid changes in
pressure such as might result from a blow. They are also found in the joints where
they are thought to signal the rate of motion of the joints. Fifth are the Ruffini end-
organs that signal the continuous deformation of the skin and deep tissues. These
are also found in joints where they signal the degree or position of joint rotation.
Figure 3.8 illustrates the type of information that is transmitted to the brain from
joint angle receptors. The sixth and seventh types of touch receptors, Merkel's discs and Krause's corpuscles, are located in various parts of the body and report still other kinds of tactile information.
There are also specific sensory nerves for heat, cold, pain, and itch, all of the
free nerve ending type. The pain sensors are calibrated so that they begin to signal
pain precisely when tissue damage begins to occur.
In addition, there are visceral sensors that measure the condition of the internal
organs, the pressure in arteries, the degree of expansion of the lungs, the digestion of
food, the operation of the kidneys, the regulation of temperature, etc. An entire
subdivision of the nervous system, called the autonomic nervous system, is
dedicated to the control of the innumerable regulatory and control functions of the
body’s life support system. Figure 3.9 illustrates the basic structure of the autonomic
nervous system and the type of organs it controls. We will not deal further with this
portion of the nervous system as it has little direct connection to the part of the brain
that generates and controls intentional behavior.
Figure 3.8: Response of several different nerve fibers from sensory receptors reporting the
position of the knee joint in a cat.
Figure 3.9: A schematic overview of the autonomic nervous system and the organs it energizes.
For every sensory input there are specific nerves for each of the different sensa¬
tions. This feature is called the law of specific nerve energies or “place encoding.”
For example, the brain knows that touch has occurred at a particular point on the
body because a particular nerve fiber connected to a touch sensor at that point is car¬
rying a string of action potentials. The strength of the touch is encoded by the firing
rate. The location of the touch and the fact that a touch has occurred (as opposed to
heat or pain) is encoded in the particular touch fiber that is firing. There are also
several specific sets of fibers that report painful stimuli. The location of the pain is
communicated by the particular set of active pain fibers, and the intensity is com¬
municated by the number of pain fibers that are firing as well as by their rate of fir¬
ing. Similarly, there are nerve fibers that are sensitive to heat and cold. Their respec¬
tive rates are an indication of how hot or how cold it is.
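Expressed as a data structure (an invented illustration, with made-up fiber numbers), place encoding means that which channel is active tells the brain what and where, while the firing rate tells it only how much:

```python
# Illustrative sketch of place encoding ("labeled lines"): the meaning of a
# fiber's activity is fixed by which fiber it is; its firing rate carries only
# the intensity. The fiber table below is invented for the example.

FIBER_LABELS = {
    17: ("touch", "left index fingertip"),
    18: ("pain",  "left index fingertip"),
    42: ("cold",  "right forearm"),
}

def interpret(active_fibers):
    """active_fibers: {fiber_id: firing_rate in spikes per second}."""
    for fiber_id, rate in active_fibers.items():
        modality, location = FIBER_LABELS[fiber_id]
        print(f"{modality} at {location}, intensity ~ {rate} spikes/s")

interpret({17: 80, 42: 15})   # touch at the fingertip, mild cold on the forearm
```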
As a general rule, the intensity of sensory input is encoded by the rate of firing
such that each increase of stimulus intensity by a certain percent tends to cause an in¬
crease in firing rate by a fixed amount. This leads to a logarithmic, or power law,
relationship between stimulus intensity and firing rate. This is called the Weber-
Fechner Law after the two gentlemen who first carefully measured the effect; see
figure 3.10. However, this power law is only a steady-state approximation, con¬
siderably modified by the temporal relationships between stimulus and firing rate.
The firing rate of a sensory neuron is typically quite high at the onset of the stimulus
and then decays rapidly to a much lower value as the stimulus remains constant. The
rate of decay varies from one type of sensory cell to another and is affected by an in¬
hibitory influence of the receptor circuits adjacent to the one being stimulated. As
shown in figure 3.11, the activity of a Pacinian corpuscle pressure sensor decays to
zero in about a tenth of a second; a hair-receptor touch sensor activation also decays
to zero in about a second. A muscle spindle or a Golgi tendon organ firing decays
more slowly, over five or ten seconds to about 50 percent of its initial value.
Pain-receptor input, however, does not decay. In fact, under some conditions, the
threshold for excitation of the pain receptors becomes progressively lower as the
painful stimulus continues.
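Both of these properties, the roughly logarithmic steady-state rate and the transient that decays after stimulus onset, are easy to express computationally. The following is a minimal sketch, not a physiological model; the rate constant, threshold, decay time constant, and adaptation floor are invented for illustration.

```python
import numpy as np

def steady_state_rate(stimulus, k=20.0, threshold=1.0):
    """Weber-Fechner-style rate code: each multiplicative increase in stimulus
    intensity adds roughly the same increment to the firing rate."""
    s = np.maximum(stimulus, threshold)
    return k * np.log(s / threshold)

def adapted_rate(stimulus, t, tau=0.5, floor=0.5):
    """Transient response: the rate is highest at stimulus onset and decays
    toward a fraction (floor) of its initial value with time constant tau.
    A non-adapting pain fiber would correspond to floor = 1.0."""
    r0 = steady_state_rate(stimulus)
    return r0 * (floor + (1.0 - floor) * np.exp(-t / tau))

# Doubling the stimulus adds a fixed amount to the steady-state rate:
print(steady_state_rate(2.0) - steady_state_rate(1.0))   # ~13.9
print(steady_state_rate(4.0) - steady_state_rate(2.0))   # ~13.9
# The onset burst decays toward half its initial value:
print(adapted_rate(4.0, t=0.0), adapted_rate(4.0, t=2.0))
```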
Figure 3.10: A plot of nerve firing rate versus stimulus strength demonstrating the Weber-
Fechner logarithmic or power law. Note that the logarithmic relationship does not hold at
either very weak or very strong stimulus strengths.
Figure 3.11: A plot of the firing rate of several types of sensory neurons versus time. Muscle
spindle and joint receptors are most sensitive at the onset of a stimulus, and the firing rate
decays with time. Hair receptors and Pacinian corpuscles sense only transitory stimuli.
VISION
One of the most important and widely studied of the senses is vision. As can be
seen from figure 3.12, the mechanical features of the eye are very like those of a
camera, with a lens that focuses the incoming light to form an image on a photosen¬
sitive surface, the retina. There is also an iris that can change the f-stop to regulate
the amount of light reaching the retina; adjustment of the iris increases the dynamic
range of illumination over which the eye can function by 30 times. However, the
largest portion of the eye’s dynamic range comes from photochemical changes in the
retina that adjust the sensitivity of the rods and cones. This latter process has a time
constant of several minutes, whereas the iris can adjust in a fraction of a second as
the gaze shifts from bright sunlight to shadow and back again. The muscles of the
iris are controlled by a reflex feedback system that is sensitive to the level of il¬
lumination on the retina. The neuronal pathway for this reflex is illustrated in figure
3.13.
The muscles that adjust the focus are controlled by a much more complex reflex
composed of several components. One component comes from the convergence of
the two eyes, i.e., the amount the two eyes are turned inward to point at a target.
The second comes from the fact that red and blue light focus at slightly different
distances through the same lens. The third is a slight oscillation in the focusing
muscles at a rate of one-half to two times per second. This causes the focus to
“hunt” around the point of maximum sharpness.
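One way to read the “hunting” component is as a dither-based search for maximum sharpness: oscillate the focus slightly, compare the image sharpness on either side of the current setting, and drift toward the sharper side. The sketch below illustrates only that reading; the sharpness function, dither amplitude, and gain are made up.

```python
def sharpness(focus, best=5.0):
    """Hypothetical image-sharpness measure, peaked at the in-focus setting."""
    return 1.0 / (1.0 + (focus - best) ** 2)

focus, dither, gain = 2.0, 0.05, 2.0
for _ in range(300):                     # the eye oscillates about 0.5 to 2 times per second
    plus = sharpness(focus + dither)     # probe slightly beyond the current focus
    minus = sharpness(focus - dither)    # and slightly short of it
    focus += gain * (plus - minus)       # drift toward whichever probe was sharper
print(round(focus, 2))                   # settles near 5.0, the point of maximum sharpness
```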
The fact that there are two eyes can be used to advantage in different ways
depending on the requirements of the particular species. For creatures of prey, like
rabbits and deer, the eyes are located on opposite sides of the head so as to cover the
entire hemisphere of the environment and warn of impending predators. For hunt¬
ing species, the eyes are located in the front of the head so that the visual fields
overlap, providing stereo depth vision. Among other things, this enables the brain to
continuously compute the distance to objects in the visual field. Stereo depth vision
measurements are highly accurate for distances within grasping or jumping range.
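The computation behind stereo depth is ordinary triangulation: the nearer an object, the larger the disparity between where its image falls on the two retinas. A minimal sketch follows; the baseline and focal length are stand-in numbers rather than measurements taken from the text.

```python
def depth_from_disparity(disparity_m, baseline_m=0.065, focal_length_m=0.017):
    """Triangulation: depth = baseline * focal_length / disparity. As disparity
    shrinks with distance, the same measurement error produces a much larger
    depth error, which is why stereo is most useful at close range."""
    return baseline_m * focal_length_m / disparity_m

print(depth_from_disparity(0.0022))    # about 0.5 m, within grasping range
print(depth_from_disparity(0.00011))   # about 10 m, where the estimate is far less reliable
```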
The resolution of the eyes is extremely high in the center of the field of view (the
fovea) and falls off rapidly in the peripheral regions. In humans the area of high
resolution is only one half of one degree in diameter, or roughly the size of the image
of the full moon. This is a very small area, but it is exactly in the center of the field
of view and is kept pointed at the center of attention at all times by an extraordinari¬
ly rapid and precise servo-like system. The photoreceptors in the fovea primarily
send signals to shape recognizing circuits in the higher level visual-processing centers
of the brain. The photoreceptors in the surrounding regions of the retina primarily
send signals to the areas that detect motion and position of objects and which direct
the eye muscles to point the fovea at points of interest. The pointing of the eyes is
controlled by three pairs of extraocular muscles: the lateral and medial recti, which
move the eye from side to side; the superior and inferior recti, which move it up
and down; and the superior and inferior oblique, which rotate the eyes, maintaining
the visual fields upright and keeping the right and left fields in registration. The cir¬
cuitry for the control of eye-pointing movements is shown in figure 3.14a and b.
Figure 3.14a: The extraocular muscles that control the position of the eyes and the neuronal
nuclei that control them, b: The computing centers and neuronal pathways involved in the
control of eye position.
THE RETINA
The retina is composed of several layers as shown in figure 3.15. First is the pig¬
ment layer that lies on the inside surface of the eyeball farthest from the lens. Second
is the layer of rods and cones, or the photosensors, and third is the outer limiting
membrane. The outer nuclear layer, which contains the cell bodies of the rods and
cones, is fourth. Fifth is the outer plexiform layer, which is the location of the first
synapses in the visual pathway. Sixth is the inner nuclear layer, which contains the
cell bodies of the horizontal, bipolar, and amacrine cells. These provide the first
layer of processing of the visual image. The inner plexiform layer, which is the loca-
Figure 3.15: A diagram of the layered architecture of the retina. See text for a description of
the function of the different layers.
tion of the second level of synapses, is seventh. Eighth is the layer of ganglion cells
whose axons make up the optic nerve; these axons leave the eyeball and terminate in
the lateral geniculate body of the thalamus and in the superior colliculus. The ninth
layer is the optic nerve fibers on their way out of the eye. The inner limiting mem¬
brane, the tenth layer, separates the retina from the vitreous humor, which fills the
interior of the eyeball. A more detailed diagram of the retina illustrating the fine
structure of the various cells and synapses is shown in figure 3.16.
It is apparent from this anatomical structure that a great deal of processing of
the visual image must take place on the retina itself, a fact that has been confirmed
by neurophysiological experiments. The bipolar cells transmit excitatory informa¬
tion directly from the rods and cones to the ganglion cells immediately beneath
them. The horizontal cells transmit inhibitory information to ganglion cells from
rods and cones in the surrounding neighborhood. This produces the “center-on,
surround-off” response of many of the ganglion cells. A model of the circuit con¬
nections that produce this response is shown in figure 3.17. A structurally similar
but functionally inverse set of interconnections produces the “center-off, surround-
on” response. There are about the same number of ganglion cells with the “center-
off” response as with “center-on.”
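The center-on, surround-off response amounts to subtracting a local neighborhood average from the signal directly beneath each ganglion cell. The sketch below mimics that idea on a one-dimensional row of photoreceptor values; the weights are arbitrary, and the wrap-around at the ends is only a convenience.

```python
import numpy as np

def ganglion_responses(photoreceptors, center_w=1.0, surround_w=0.5):
    """Center-on, surround-off: each output is excited by the receptor beneath it
    (the bipolar-cell path) and inhibited by the average of its neighbors (the
    horizontal-cell path). Reversing the signs gives the center-off variety."""
    x = np.asarray(photoreceptors, dtype=float)
    surround = (np.roll(x, 1) + np.roll(x, -1)) / 2.0
    return center_w * x - surround_w * surround

print(ganglion_responses([1, 1, 1, 1, 1]))   # uniform light: weak, flat response
print(ganglion_responses([1, 1, 1, 5, 5]))   # an edge: the response peaks where the intensity changes
```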
The amacrine cells are also inhibitory cells, but their response is transitory in
contrast to the continuous response of the bipolar and horizontal cells. When the
photoreceptors are first stimulated, the amacrine response is intense, but this signal
dies away to almost nothing in a fraction of a second. This transient response pro¬
duces a sensitivity to changing light intensities, hence serving to detect motion in the
visual scene. Many of the ganglion cells that exhibit the on-center or off-center
response also have a transient response. Some produce a burst of action potentials
when the light stimulus is first turned on. Others produce a burst when the light is
turned off. This information is used at higher levels in the visual-processing system
to detect an object’s direction of motion.
The percentage of ganglion cells that are sensitive to motion is much higher in
the peripheral field of view than in the central. This is why moving objects in the
peripheral field can be readily detected, whereas stationary objects are not. The sur¬
vival advantage of this is obvious. Moving objects are much more likely to be of im¬
mediate importance than stationary ones: a moving object may be an approaching
enemy, or it may be a fleeing prey. When the eye is moving linearly through the en¬
vironment, the apparent motion of stationary objects is inversely proportional to
their distance. An object that appears to be moving rapidly represents a potential
collision. The ability of the vision system to detect moving objects over a wide field
of view directs the attention to important areas of the environment. This guidance
information sets the extraocular muscles to point the high-resolution part of the
visual field in that direction. Thus, the eyes tend to be kept pointing at the portion of
the visual field where the “action” is.
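Functionally, the transient, motion-sensitive periphery behaves like a frame-differencing stage followed by a decision to point the fovea at the largest change. The toy sketch below captures only that functional reading, not the retinal circuitry itself.

```python
import numpy as np

def locate_motion(previous_frame, current_frame):
    """Return the location of the largest intensity change between two frames.
    Transient (amacrine-like) responses are strong only where something changed."""
    change = np.abs(np.asarray(current_frame, float) - np.asarray(previous_frame, float))
    return tuple(int(i) for i in np.unravel_index(np.argmax(change), change.shape))

prev = np.zeros((4, 4))
curr = np.zeros((4, 4))
curr[2, 3] = 1.0                   # something appears in the periphery at row 2, column 3
print(locate_motion(prev, curr))   # (2, 3): where the extraocular muscles should aim the fovea
```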
The recognition of motion is a relatively simple computation compared to the
recognition of shapes. Many lower forms can discriminate moving objects quite
well, but have virtually no capacity for recognizing stationary objects. For example,
motion discrimination is particularly well developed in the retina of the frog. In the
famous paper “What the Frog’s Eye Tells the Frog’s Brain,” Lettvin, Maturana,
McCulloch, and Pitts first discovered the extensive processing of motion signals
that takes place in the retina. The frog retina processes the visual signal to the extent
that there are neurons in the optic nerve of the frog that reveal the position and tra¬
jectory of flying bugs accurately enough so that the frog can snap them out of the air
with one flick of its tongue. Yet a frog will starve to death while looking at dead
bugs apparently because it cannot see them.
Higher forms, such as mammals, typically perform more sophisticated analyses
of shape and delay the detailed processing of motion data until higher levels in the
brain where other information can be integrated into the processing. However, even
in humans, the detection of motion begins in the retina itself, and this analysis re¬
mains an important component of the visual data-processing system.
Figure 3.17: A neural model that can account for the behavior of a ganglion cell with “center-
on, surround-off ” response.
COLOR VISION
There are three kinds of cones that have differential sensitivity to red, green,
and blue light. The rods are most sensitive to green. When all three types of cones
are equally stimulated, the perceived color is white. Different amounts of stimulus
of the various cones produce the perception of color. All the different colors,
shades, and hues that can be perceived arise from various combinations of stimuli of
the three types of cones.
Both rods and cones can change sensitivity to adapt to differing light levels. As
the light level decreases, the cones can increase sensitivity by about 60 times.
However, the rods can increase sensitivity by more than 25,000 times. Thus, in dim
light, vision is primarily generated by the rods that can’t discriminate color. In
brighter light, vision is mediated primarily by cones, and colors are perceived.
A single ganglion cell may be stimulated by a number of cones or only a very
few. When all three types of cones stimulate the same ganglion cell, the signal
transmitted is the same for any color. Such a ganglion cell is sensitive only to light
intensity, but not to color. Many other ganglion cells are excited by one color cone
and inhibited by another. Thus, color discrimination and analysis also begin in the
retina.
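Both kinds of ganglion signal are simple combinations of the three cone outputs: a sum that carries only intensity, and a difference that carries opponent color. The numbers in the sketch below are made up and are not real cone spectral sensitivities.

```python
def ganglion_signals(red_cone, green_cone, blue_cone):
    """Summing all three cone signals gives an intensity-only response (the same
    for any hue); differencing two cone types gives a color-opponent response."""
    intensity = red_cone + green_cone + blue_cone
    red_minus_green = red_cone - green_cone
    return intensity, red_minus_green

print(ganglion_signals(0.9, 0.2, 0.1))  # reddish light:  (1.2, +0.7)
print(ganglion_signals(0.2, 0.9, 0.1))  # greenish light: (1.2, -0.7), same intensity, opposite sign
```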
The retinal ganglion cells send most of their axons to the lateral geniculate body
of the thalamus. From there the visual information is relayed to the visual cortex
through the pathways known as the optic radiations, shown in figure 3.18. In the
lateral geniculate, the optic fibers terminate in six layers as shown in figure 3.19.
Visual fields
Lower quadrant
\ ' \ /
\ / Nasal \ /
\ / retinae \ /
\i W\— U
Optic tract
Temporal retina
Oculomotor
Optic nerve
Optic chiasm
Superior colliculus
Optic radiations
Visual cortex
Left occipitaI cortex
( area 17)
Figure 3.18: A diagram of the visual pathways viewed from the underside of the brain. Note
that input from the left visual field falls on the right half of the retina in both eyes and projects
to the visual cortex on the right side of the brain. Input from the right visual field falls on the
left half of the retina in both eyes and projects to the visual cortex on the left side of the brain.
Figure 3.19: A section through the lateral geniculate nucleus showing the layered structure.
Cells in layers 1, 4, and 6 (numbered from bottom to top) receive input from the eye on the
opposite side. Cells in layers 2, 3, and 5 receive input from the eye on the same side. The maps
are in register, so that the neurons along any radius (black line) receive signals from the same
part of the visual scene.
Layers 2, 3, and 5 (counting from the surface of the thalamus inward) receive inputs
from the outside half of the visual field of the eye on the same side of the head.
Layers 1, 4, and 6 receive signals from the inside of the visual field from the eye on
the opposite side of the head. Thus, the lateral geniculate body on the left side con¬
tains input from the right visual field of both eyes and the lateral geniculate on the
right contains input from the left visual field of both eyes. The registration of two
images from the two eyes in the lateral geniculate bodies establishes the basis for
stereo depth perception. Signals generated by black and white ganglion cells are
found mainly in layers 1 and 2, while signals carrying color information occur main¬
ly in layers 3 through 6. The receptive fields in the geniculate bodies have
on-center or off-center shapes similar to those found in the retina, although a much
higher percentage of geniculate cells respond to movement.
The lateral geniculate also receives a large number of fibers coming back from
the visual cortex and from the brain stem. This means that the cortex has the capaci¬
ty to modify the functions performed by the lateral geniculate so as to filter and
manipulate the incoming visual data. This looping structure and interaction between
incoming data and higher processing levels is characteristic of all the input pathways
in the brain.
Axons from cells in the lateral geniculate travel primarily to area 17 of the cor¬
tex, the primary visual cortex. Neurons that respond to lines at specific orientations
are found here. Some neurons respond to dark lines on light background, others to
bright lines on dark background. Still other neurons respond to edges between dark
and light regions. Figure 3.20 shows the different kinds of responses for neurons in
the lateral geniculate and the cortex. These line and edge detectors were first ob¬
served by David Hubel and Torsten Wiesel. These neurons are termed simple if they
are sensitive to edges and lines in a particular orientation and position. Neurons that
have the same sensitivity to lines and edges at particular orientations but respond if
the stimulus is anywhere within a large area of the visual field are termed complex.
Still other neurons are sensitive to angles and corners; these are termed hyper¬
complex.
Figure 3.20: Common arrangements of lateral geniculate and cortical receptive fields. (A) On-
center geniculate receptive field. (B) Off-center receptive fields. (C-G) Various arrangements
of simple cortical receptive fields. “X” areas are excitatory “on” responses; “Δ” areas are
inhibitory “off” responses. Receptive fields shown here are all at the same angle, but each
type occurs in all orientations.
The visual cortex is markedly layered as can be seen in figure 3.21. Neurons
with the center-surround response tend to be located in layer IV. Simple line and
edge detectors lie just above them, and complex neurons are located in layers II, III,
V, and VI. The complex neurons can be further categorized; the ones found in each
layer are very different in a number of ways.
The neurons in the various layers also transmit their outputs to different
destinations. Layer VI projects back to the lateral geniculate; layer V projects to the
superior colliculus; layers II and III send their outputs to other parts of the cortex.
The visual cortex is also organized in columns that are arranged perpendicularly
to the cortical surface. All the neurons in a particular column are sensitive to lines
and edges at a particular orientation. If an electrode is driven through the cortex at
an angle so as to successively sample neurons from adjacent columns, the preferred
orientation gradually shifts as each new column is encountered. This is illustrated in
figure 3.22. Recent experiments, using radioactively labeled cell nutrients that can
reveal which neurons have recently been more active than their neighbors, have
shown that this pattern of shifting preferred orientation repeats every millimeter or
so. Every square millimeter of cortex corresponds to one resolution element of the
visual field. The high-resolution region of the fovea thus occupies a relatively large
percentage of the visual cortex.
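A simple cell of this kind behaves much like a small oriented template slid over the image, with one column of cells for each preferred orientation. The sketch below uses tiny hand-made 3 by 3 kernels as stand-ins for receptive fields; they are illustrative only and not derived from the recordings described here.

```python
import numpy as np

# Toy "simple cell" receptive fields: a bright line on a dark background
# at four preferred orientations (degrees).
KERNELS = {
    0:   np.array([[-1, -1, -1], [ 2,  2,  2], [-1, -1, -1]], float),
    45:  np.array([[-1, -1,  2], [-1,  2, -1], [ 2, -1, -1]], float),
    90:  np.array([[-1,  2, -1], [-1,  2, -1], [-1,  2, -1]], float),
    135: np.array([[ 2, -1, -1], [-1,  2, -1], [-1, -1,  2]], float),
}

def preferred_orientation(patch):
    """Return the orientation whose template responds most strongly to a 3x3 patch,
    the way a column of simple cells signals the orientation of a local line."""
    responses = {angle: float((patch * k).sum()) for angle, k in KERNELS.items()}
    return max(responses, key=responses.get)

vertical_line = np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]], float)
print(preferred_orientation(vertical_line))   # 90
```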
Sensory data from the primary visual cortex go to a number of other regions
where they are processed further and integrated with information from other sen¬
sory pathways. We will deal with these higher level functions in a later section.
HEARING
Next to vision, hearing is the most extensively studied sense. The neural input to
the audio system begins in the cochlea. Sound enters the ear and the vibrations are
coupled from the ear drum into the fluid-filled cochlea by means of a mechanical
impedance transformer made up of three tiny bones called the hammer, anvil, and
stirrup. The cochlea is a snail-shaped compartment consisting of two and three-quarter
turns of a gradually tapering cylinder as can be seen in figures 3.2 and 3.3.
This cylindrical tube is divided along its length into three sections. A cross-sectional
view is shown in figure 3.23. Pressure changes caused by sound vibrations enter the
oval window and cause the fluid in the scala vestibuli to move back and forth. This
motion is transmitted through the two membranes to the fluid in the scala tympani,
which causes the round window to move in and out. The result is that the basilar
membrane flexes up and down.
Figure 3.23: Cross-section of the cochlea. Pressure changes caused by sound vibrations are
transmitted through the oval window into the fluid in the scala vestibuli. This produces mo¬
tion of the fluid filling the cochlea and causes the basilar membrane to flex up and down.
Tiny hairs are stretched between the tectorial membrane and the hair cells on
the basilar membrane, as shown in figure 3.24. These are deflected by mechanical
motion of the basilar membrane. The hair cells are sensitive to mechanical deflection
of the hairs and generate electrical signals as a result of the vibrations produced by
sound. The hair cells do not produce action potentials because they do not need to
transmit their information over any distance; rather, dendrites from the bipolar
neurons of the spiral ganglion make synaptic contact with the hair cells. These
bipolar neurons produce action potentials that encode the auditory information for
transmission to the cochlear nuclei.
The fluid in the center section, the ductus cochlearis, has an electrical potential
of +80 millivolts relative to the rest of the cochlea. Because the hair cells have a
-70 millivolt potential, there exists a -150 millivolt potential across the membrane
of the hair cells. The high electrical potential is thought to increase the sensitivity of
the cell to small movements of the hairs.
Figure 3.24: Enlarged cross-section of basilar membrane showing the relationship of the
cochlear hair cells to the tectorial membrane. As the basilar membrane flexes up and down in
response to the pressure waves of sound, the hairs connected to the tectorial membrane are
deflected back and forth. This produces an electrical signal in the hair cells and causes action
potentials to be produced on the neurons of the spiral ganglion.
There are about 20,000 outer and 3500 inner hair cells. Neurons in the spiral
ganglion synapse with these outer hair cells on a ten-receptor/one-neuron basis.
Though the inner hair cells are less numerous, neurons in the spiral ganglion synapse
with these inner hair cells on a one-receptor/one-neuron basis. Thus, the inner hair
cells are more heavily represented in the cochlear nerve.
Recordings made from the cochlear nerve indicate that different bipolar
neurons are sensitive to sounds with different pitch. Figure 3.25 shows the so-called
“tuning curves” for single auditory nerve fibers. The frequency discrimination
evidenced in these neurons is much better than would be predicted from the
mechanical tuning of the resonant cavity of the cochlea itself. Lateral inhibition of
the type which produces the center-surround effect in the visual system may account
for this phenomenon.
It may also be that feedback from higher levels in the auditory system can
sharpen the frequency sensitivity through some form of a phase lock loop or
autocorrelation effect. The cochlear nerve also contains information sent from the
superior olivary nuclei to the point of origin of the auditory signal, shown in figure
3.26. These outward-conducting (efferent) axons synapse directly on the cell bodies
of the outer hair cells and on the dendrites of the bipolar neurons connected to the
inner hair cells. These efferent fibers convey inhibitory signals, but their exact func¬
tion is unknown. In any case, the cochlear ganglion neurons produce outputs similar
to the output of a comb filter. The firing rate on any particular neuron is analogous
to the value of a Fourier coefficient in a frequency spectrum.
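The comb-filter analogy can be made concrete by treating each auditory fiber as one narrow band-pass channel whose output tracks the sound energy near its best frequency. The sketch below uses a Fourier transform as a crude stand-in for the cochlea's mechanical and neural filtering; it is not a cochlear model, and the bandwidth is arbitrary.

```python
import numpy as np

def fiber_rates(sound, sample_rate, best_frequencies, bandwidth=50.0):
    """Approximate each auditory 'fiber' as the spectral energy within a narrow
    band around its best frequency, analogous to one Fourier coefficient."""
    spectrum = np.abs(np.fft.rfft(sound))
    freqs = np.fft.rfftfreq(len(sound), d=1.0 / sample_rate)
    return [float(spectrum[np.abs(freqs - f) < bandwidth].sum()) for f in best_frequencies]

sample_rate = 8000
t = np.arange(0, 0.5, 1.0 / sample_rate)
tone = np.sin(2 * np.pi * 440 * t)                        # a 440 hertz tone
print(fiber_rates(tone, sample_rate, [220, 440, 880]))    # only the 440 hertz channel responds strongly
```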
Figure 3.25: Tuning curves of single auditory nerve fibers. Each individual nerve fiber is max¬
imally sensitive to a single frequency and moderately sensitive to a band of nearby frequencies.
Figure 3.26: Pathways of the auditory system. Note that signals flow in both directions in the
auditory nerve. Sensory information originating in the cochlear hair cells flows to the cochlear
nuclei and superior olivary nuclei on the way to the inferior colliculi, the medial geniculate,
and the auditory cortex. However, signals originating in the superior olivary nucleus also
travel back to the cochlea where they modify the behavior of the hair cell receptors.
Figure 3.27: Cochlear neurons tend to fire in synchrony with the mechanical displacement of
the hair cells. However, at frequencies above about 500 hertz, neuron firing rates cannot keep
up with every cycle of the mechanical motion. Thus, different neurons fire on different sub¬
multiples of the sound frequency. A combination of many fibers contains the frequency and
phase information of the original sound signal.
The neuronal pathways of the auditory information are shown in figure 3.26.
Nerve fibers from the spiral ganglion enter the cochlear nuclei located in the upper
part of the medulla. At this point, all the fibers synapse. From here the signals pass
mainly to the opposite side of the brain stem through the trapezoid body to the
superior olivary nucleus. Some fibers go to the superior olive on the same side. The
superior olive also sends some fibers back into the cochlea to synapse on the hair
cells and on the dendrites of the spiral ganglion cells near where they pick up their in¬
put from the hair cells. This architecture has many similarities to the design of a
phase lock loop or to an autocorrelation mechanism.
From the superior olivary nucleus, the auditory pathway passes upward to the
nucleus of the lateral lemniscus. Many fibers pass through this nucleus on their way
to the inferior colliculus where most terminate. From here neurons send their axons
to the medial geniculate body of the thalamus where all the fibers synapse. From
here the auditory tract spreads by way of the audio radiations to the auditory cortex
located mainly in the superior temporal lobe.
It should be noted that fibers from both ears are transmitted through auditory
pathways on both sides of the brain. Also many fibers enter the reticular system.
There are inputs to the cerebellum from several different levels of the auditory
system.
The auditory system is complex. The pathway from the cochlea to the cortex
passes through at least four and as many as six neurons. Yet a high degree of tonal
organization is maintained. Figure 3.28 shows the representation of frequency and
intensity on the surface of the auditory cortex. This type of mapping exists at all
levels of the auditory pathway. However, the synchrony of action potentials with
sound waves is not preserved beyond the superior olivary nucleus except for sounds
below 200 hertz.
TASTE AND SMELL
For primitive creatures, taste and smell are the most important senses. Certain¬
ly, from an information processing standpoint, these senses are the simplest.
Molecules entering the mouth or nose react directly with chemically sensitive
neuronal detectors to create the sensation. Specific chemical reactions with specific
incoming molecules create the basis for taste and smell discrimination. Very little ad¬
ditional processing is necessary to relate the neuronal signals to specific objects or
events in the external environment.
The tongue, palate, pharynx, and larynx contain approximately 10,000 taste
buds. Most of these are located such that they open into tiny trenches that trap
saliva. The taste receptors are connected with the free endings of unipolar sensory
neurons whose cell bodies are located in the ganglia of the seventh, ninth, and tenth
cranial nerves. These are, respectively, the facial, glossopharyngeal, and vagus
nerves. These pathways are shown in figure 3.29.
There are four types of taste buds that respond differently to the four types of
taste: sour, salty, bitter, and sweet. The response of these four types of cells to the
four types of taste is shown in figure 3.30. The various complex tastes that the
brain can recognize arise from various combinations of these four sensors. This is
analogous to the eye where all the colors result from various combinations of the
three-color receptors in the retina.
The nerve fibers from the taste receptors synapse in the nuclei of the tractus
solitarius. Cells from these nuclei synapse in the thalamus, and from the thalamus
taste fibers are transmitted to the opercular-insular area of the cerebral cortex. This
is next to the area that receives touch, pressure, and pain sensations from the
tongue.
The smell sensors reside within two patches of mucous membrane inside the
nasal cavity. Receptor cells have tiny cilia that protrude into the mucus. These recep¬
tors are neurons whose axons pass through tiny holes in the bone on the roof of the
nasal cavity into the olfactory bulb. The precise details of the chemistry and
neurophysiology by which odor-producing molecules interact with the cilia of the
olfactory receptors to produce action potentials is unknown.
The senses of smell and taste are often confused. Persons with bad colds often
claim they have lost their sense of taste. However, tests have shown that the loss is
that of smell rather than taste.
Output from groups of olfactory cells makes synaptic contact with mitral cells in
synaptic clusters called glomeruli. The mitral cells make up the olfactory bulbs, and
their axons give rise to the olfactory tract as shown in figure 3.31. The olfactory
tract then enters the limbic lobe of the brain making synaptic contact in several lim¬
bic nuclei as shown in figure 3.32.
""-Olfactory hairs
.Olfactory cell
L—-Sustentacular cells
‘“'-Bowman's gland
MEDIAL
Hypothalamus OLFACTORY
‘"-Glomerulus
Brain stem x Habenular AREA
. ' nuclei
“""Mitral cell
^Olfactory
Olfactory /' /bulb
tract , ^Mitral
Olfactory bulb cell
Olfactory
'Olfactory tract membrane
It is significant that the olfactory tract enters the limbic system. In lower
animals the olfactory limbic cortex makes up the bulk of the cortical regions. The
limbic system is the seat of the emotions and functions as the evaluation system that
distinguishes good from bad. In lower forms the primary good-bad judgments con¬
cern whether to eat something or not, a decision based mostly on the sense of smell.
In higher forms the good-bad decision mechanisms are much more complex and
must deal with a much wider range of sensory inputs and behavioral choices. Hence,
the limbic system in higher forms is much larger and receives highly processed input
from all the senses.
CONCLUSIONS
This brief survey of the sensory input channels makes it clear that a great deal
of information processing goes on in the sensory pathways, starting in the sensory
receptor cells themselves. Each sensory pathway has a number of computational
modules dedicated to the processing of that particular input. Many, if not most, of
these modules receive input from the higher centers of the brain as well as from
the sensory input receptors. Outputs from the sensory-processing modules often
travel both to low-level, behavior-generating modules as well as to higher levels in
the sensory-processing system. Gradually, as the sensory data makes its way toward
the higher levels of the brain, it becomes integrated with data from other senses.
How does this complex interaction of sensory input and higher level signals generate
the phenomenon of perception? How is the perceived sensory information
translated into behavioral actions? Before addressing these fundamental questions
in the study of the brain, it will be useful to examine the neurological structures
where this translation is performed.
CHAPTER 4
The Central Nervous System
The human central nervous system, the most complex structure in nature, con-
sists of trillions of neurons connected together in such a way as to produce the
phenomenon of conscious cognitive behavior and imagination. In the peripheral
regions of the sensory and motor systems, specific neurons and clusters of neurons
tend to have specific functions. However, at the higher levels in the neuronal hier¬
archy, the various modalities become intermixed, and it is not possible to relate a
specific neuron with a specific muscle or sensory receptor. Areas in the higher
regions of the brain tend to be related to specific behavioral actions, such as eating,
articulating thoughts, or thinking about spatial relationships.
This transformation from specific effectors and receptors to specific behavioral
actions takes place gradually, as the number of synapses from the periphery grows.
It is the natural result of a hierarchical goal-directed system. As will be seen later in
figure 9.1, the computational modules at each level in a hierarchy have more general
concerns than do the modules beneath them. At the very bottom, the computations
concern only a single muscle group. At slightly higher levels, computations concern
coordination of several muscle groups. At higher levels still, the motion of an entire
limb or concerted actions between limbs are computed. Finally, at the highest levels
of the brain, the entire body must be coordinated in a single activity directed toward
a future goal which may be expressed symbolically or even philosophically. Hierar¬
chical command and control structures which extend beyond the single individual
allow groups of individuals or entire societies to be coordinated in the pursuit of
family, tribal, or national goals.
We will begin this progression from the bottom up, starting with the motor
neurons of the spinal cord. As we go, we will note significant features of the struc¬
ture of the brain that are pertinent to the computing architecture we’ll later propose
for robot-control systems.
The most obvious hierarchical partitioning of the central nervous system
divides it into three levels. At the lowest level is the spinal cord; above that is the
Figure 4.1: The central nervous system is a hierar¬
chical structure with three main levels: the spinal
cord, the brain stem, and the forebrain.
brain stem, and finally at the top is the forebrain, shown in figure 4.1. Within each
of these three main subdivisions, there are many intermediate levels.
The spinal cord is much more than a bunch of axons carrying commands from
the brain to the muscles and sensory information from the sensor organs to the
brain. The cord contains a number of computing centers that coordinate extensor
and flexor muscles to facilitate standing, walking, running, and jumping. It
generates complex reflexes such as the one that causes a falling cat to land on its feet
and those that generate rhythmic stepping motions, reciprocal stepping of opposite
limbs, diagonal stepping of all four limbs, and even scratching movements. These
computing centers constitute the first and second levels in the sensory-motor hier¬
archy controlling the limbs, hands, feet, and digits.
The spinal cord also houses command centers for a number of autonomic functions.
Figure 4.2: Primitive nervous systems: (a) planaria (b) earthworm (c) bee (d) amphioxus.
This does not mean, however, that the motor patterns of the insects are simple.
The muscle coordination required to produce the flight patterns of a bee, dragon fly,
or mosquito, or the climbing skills of an ant or beetle are considerable. Even though
these types of movements are apparently computed in a one- or two-level computing
structure, the amount of multivariable computation that goes on in the insect brain
is astonishing. A close examination of the manipulatory dexterity in the leg of an ant
reveals a degree of control sophistication not found in any existing laboratory robot.
The spinal cord is thus a formidable computational machine and has been so from
the beginning. It is, in fact, the basic building block of the brain. It is the first and
second level input-output computational module.
If the spinal cord of a higher mammal is cut in two, the exposed face of the cut
has a cross-sectional appearance, as illustrated in figure 4.3. The dark regions in the
diagrams correspond to the grey matter at different levels in the spinal cord. Notice
the resemblance to a pair of horns. The upper part of these figures is toward the
back. The upper horns are thus the dorsal or posterior horns. The lower part is
toward the front. These are the ventral or anterior horns.
The motor neurons are located in the anterior horns as shown in figure 4.4 and
the axons of the motor neurons leave the cord in a series of little bundles along both
sides of the front of the cord. These bundles gather together to form the ventral
roots. A similar series of little bundles, consisting of axons from sensory neurons in
the periphery, enter the spinal cord from the rear. These are called the dorsal, or
posterior, roots and are shown in figure 4.5.
The sensory neurons are unipolar neurons. One unipolar neuron is shown in
figure 2.7a. The cell bodies of the unipolar sensory neurons reside in the bulges
called the dorsal root ganglia. The axons from the dorsal roots may synapse on sen¬
sory-processing neurons in the posterior horns, or they may travel into the anterior
horns to synapse on the motor neurons. The posterior horn neurons send axons up
the cord toward the brain.
The dorsal and ventral roots merge into single bundles called spinal nerves
before leaving the protection of the bony channel of the spinal vertebrae. The spinal
nerves often follow blood vessels and branch repeatedly, as the motor axons trace
out pathways to the muscles they control and the sensory axons fan out to the sen¬
sory endings by which they are stimulated.
Figure 4.4: Diagram of the position of motor neurons in the forward grey horn of a lower cer¬
vical segment of the spinal cord. On the left are shown the locations of motor neurons control¬
ling specific muscle groups. On the right are the axon pathways leaving the cord. Note that
some collaterals from outgoing axons return to synapse on intermediate cells.
Figure 4.5: The axons of motor neurons leave the front of the cord in a series of small nerve
bundles that gather together to form the anterior roots. Sensory neurons enter the cord from
the rear through the dorsal roots, which split into similar bundles. The anterior and dorsal
roots join to form the spinal nerves, which leave the protection of the spinal column and travel
to the muscles and sensory receptors in the periphery. All neural tissue in the central nervous
system is covered by two membranes: the arachnoid and the dura.
It is perhaps not surprising that the first computational level in the sensory-
motor system is the best understood. The servo level control system in the vertebrate
motor system is primarily composed of the stretch reflex. Figure 4.6 illustrates the
essential components of the stretch reflex. The muscle spindle attached to a muscle
bundle contains a sensory ending that sends out a stream of action potentials
whenever it is stretched. The rate of the pulses is proportional to the amount of
stretching. The axon carrying this signal from the muscle spindle enters the spinal
cord through the dorsal root. There it terminates with an excitatory synapse on the
motor neuron which controls the muscle bundle to which the spindle is attached.
When the muscle is stretched, the spindle increases its firing rate. This excites the
motor neuron, which commands the muscle to contract, thereby counteracting the
stretch. The net result tends to keep the muscle at a constant length despite varia¬
tions in external load.
The advantages of this system are clear: the stretch reflex in the muscles of the
legs tends to hold the body upright. If the body starts to fall forward, the muscles in
the back of the legs are stretched; this activates the stretch reflex to contract these
same muscles and pull the body back erect.
The spindle also has a tiny auxiliary muscle of its own that can be used to con¬
tract and hence shorten the tissue by which it is attached to the much larger muscle
bundle. The relationship between the muscle spindle and the larger muscle bundle is
shown in figure 3.1. When the tiny attachment muscles contract, they shorten the
overall length of the spindle, and the spindle then fires at a shorter length. The
neurons that control these shortening muscles, called gamma neurons, essentially set
Figure 4.6: The essential components of the stretch reflex. A sensory neuron from a spindle
stretch receptor enters the cord through a dorsal root and makes excitatory synapses on an
alpha motor neuron. When the muscle to which the spindle is attached is stretched, the
resulting activity on the sensory neurons excites the motor neuron to counteract the stretch.
The gamma motor neuron can shorten the spindle organ to increase the sensitivity of the
stretch receptor.
the length of the spindle at which a certain firing rate will occur. If the gamma
neurons fire rapidly, the spindle will start firing when the muscle bundle to which it
is attached is stretched only a little. If the gamma neuron is firing slowly or not at
all, the spindle will not respond until the muscle bundle is stretched to a much longer
length. Figure 4.7 is a set of curves illustrating the input-output of the spindle at
various firing rates on the gamma neuron.
The output of the spindle sensor travels to the spinal cord where it enters
through the dorsal roots and terminates with excitatory synapses on the dendrites of
the alpha motor neurons as shown in figure 4.6. When the gamma neuron fires at a
rate g1, it shortens the ends of the spindle to a length L(g1). If the muscle bundle at¬
tached to the spindle is stretched by more than an amount L(g1), the spindle will fire,
sending a signal to the motor neuron that controls the muscle bundle commanding it
to resist further stretching. Thus, the gamma neuron can set the point at which the
stretch reflex resists further movement. The effect is that the limb moves to a posi¬
tion set by the firing rate on the gamma motor neuron. The gamma neuron, muscle
spindle, and motor neuron thus comprise a position servo. A particular firing rate
on the gamma neuron tends to produce a particular length of a muscle and hence a
particular position in the joint angle.
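The servo loop just described can be simulated in a few lines: the gamma rate fixes a desired muscle length, the spindle reports how far the muscle is stretched beyond that length, and the alpha motor neuron contracts the muscle in proportion to the error. The sketch below is schematic only; the mapping from gamma rate to length, the gain, and the units are invented.

```python
def commanded_length(gamma_rate, resting_length=10.0):
    """Higher gamma activity shortens the spindle, so the reflex holds the muscle at a shorter length."""
    return resting_length - 0.05 * gamma_rate

def run_position_servo(gamma_rate, muscle_length=10.0, external_stretch=0.0, gain=0.2, steps=200):
    """Closed loop: spindle error excites the alpha motor neuron, whose activity
    contracts the muscle and counteracts the stretch."""
    target = commanded_length(gamma_rate)
    for _ in range(steps):
        spindle_error = (muscle_length + external_stretch) - target  # spindle fires when stretched past its set length
        alpha_drive = max(0.0, spindle_error)                        # the reflex can contract the muscle but not lengthen it
        muscle_length -= gain * alpha_drive
    return muscle_length

print(run_position_servo(gamma_rate=40.0))                        # settles near the commanded length of 8.0
print(run_position_servo(gamma_rate=40.0, external_stretch=1.0)) # the muscle shortens further, cancelling the imposed stretch
```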
Figure 4.8 shows a schematic diagram of the computing modules involved in the
gamma position servo. The position command enters the motor-output module and
generates a particular firing rate on a gamma neuron. The gamma neuron sends its
indication of what the spindle length should be to the sensory spindle where it
becomes an expected position. The spindle compares the actual position with the ex¬
pected and puts out an error signal, which comes back to the motor-output module.
Figure 4.7: A plot of spindle firing rate vs. length of motor muscle for various levels of gamma
neuron activity g0, g1, g2, and g3.
Figure 4.8: A schematic diagram of the relationship between the motor neurons, muscles,
stretch sensors, and input commands from higher motor centers.
RECIPROCAL INHIBITION
There are also reflex pathways that travel up and down the cord on both sides.
These are the mechanisms that produce the righting reflexes and the diagonal step¬
ping of all four limbs.
The spinal cord possesses a large computing capacity just related to motor
reflexes. It also contains a number of sensory-processing modules, or nuclei: the
nucleus dorsalis, sometimes called Clark’s column, the intermediomedial nucleus
and several others. The neurons in these nuclei receive input from the sensory
neurons of the dorsal roots, as well as collaterals from the motor-command neurons
of the pyramidal and extrapyramidal tracts. They send their axons upward to the
cerebellum, the thalamus, and other spinal cord neurons. These nuclei are certainly
involved in some of the spinal reflexes as well as in other sensory-processing com¬
putations which are not as well understood as those of the spinal reflexes.
SPINAL TRACTS
In addition to the computing centers located in the grey matter of the horns of
the spinal cord, there are also a great number of axon pathways in the spinal cord
that carry motor command signals from the brain and sensory signals to the brain.
Figure 4.10 illustrates some of the sensory pathways to the cerebral cortex from the
touch and pressure sensors and from the joint receptors for position and movement.
Data carried by these pathways is processed through two sets of computing centers:
one set in the medulla consists of the nucleus gracilis and the nucleus cuneatus, and
the other in the thalamus is the ventral posterolateral nucleus. However, there is
another pathway for touch sensors that goes directly to the thalamus and from there
to the cortex. A third pathway for heat, cold, and pain receptors also travels directly
to the thalamus and on to the cortex. Finally, there is a pathway for data from mus¬
cle spindle and tendon organ information that travels to the cerebellum. Most of
these fiber bundles give off collaterals to various nuclei in the spinal cord and brain
stem as they pass by.
An equally diverse set of axon pathways travels downward. For example, figure
4.11 shows the pyramidal motor pathways that travel directly from the motor cortex
to the motor neurons in the ventral horns. These fibers, named pyramidal fibers,
pass through the triangular-shaped regions called the pyramids in the medulla.
There also exists a second set of motor-command fiber pathways from the vestibular
system to the motor neurons, and a third set from the red nucleus and the tectum to
the motor neurons.
Each of these pathways contains many hundreds of thousands of individual ax¬
ons, far more than necessary simply to move the limbs and report the results. The
purpose of such large numbers is twofold. Redundancy is the first: large numbers of
nerve fibers can be rendered inoperative through injury, yet the system can still func¬
tion, or at least can recover its function. The second is precision: an individual
neuron is a noisy and unreliable information channel; if the same information is en-
coded by many neurons, the statistical average of all the signals is much more precise
than any one signal by itself. A large number of motor neurons makes possible very
precise movements, and a large number of sensory neurons makes possible a very
fine degree of sensory discrimination.
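The precision argument is the familiar statistical one: averaging N independent noisy channels reduces the noise by roughly the square root of N. A quick numerical illustration follows; the noise level and fiber count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
true_value = 1.0
fibers = true_value + rng.normal(0.0, 0.5, size=(2000, 1000))   # 1000 noisy fibers, 2000 trials

single_fiber_error = np.abs(fibers[:, 0] - true_value).mean()
pooled_error = np.abs(fibers.mean(axis=1) - true_value).mean()
print(single_fiber_error, pooled_error)   # the pooled estimate is roughly sqrt(1000), about 30 times, more precise
```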
Figure 4.10: Some of the upward traveling sensory pathways from touch and pressure sensors
and joint receptors to the cerebral cortex. Data carried by these pathways is processed through
the nuclei gracilis and cuneatus and then through the ventral posterolateral nucleus of the
thalamus. There are three other sensory pathways: for touch sensors; for heat, cold, and pain
sensors; and for stretch and tension sensors.
Figure 4.11: The principal downward-traveling motor pathways from the cerebral cortex to
the spinal motor neurons. These are the corticospinal pyramidal tracts. Other motor
pathways originate in the vestibular system and in the red nucleus.
At the top of the spinal cord just above where the cord enters the skull is the im¬
posing structure of the brain stem shown in figure 4.12. In lower forms the computa¬
tional centers of the brain stem are the highest levels that exist. But in humans, the
brain stem is subordinate to many layers of higher level computational centers. The
relationship of the brain stem to the rest of the brain in the human can be seen in
figure 4.1. The bulges of the cuneate and gracilis nuclei can be seen in figure 4.12 at
the top of the spinal cord where it joins the medulla oblongata.
A number of nerve bundles enter and leave the brain stem. The accessory nerves
receive from and transmit to the neck and upper torso musculature. The hypoglossal
nerve carries taste information and controls the tongue muscles. The vagus and
glossopharyngeal nerves send and receive from the larynx, trachea, esophagus,
heart, and chest and abdominal viscera. The abducens, trochlear, and oculomotor
nerves control the muscles that position the eyes. The vestibulocochlear nerve con¬
veys information from the ears and vestibular sensors. The facial nerve plus the
three nerves of the trigeminal ganglion, the ophthalmic, maxillary, and mandibular
carry sensory information from and motor commands to the face and mouth and
control the tear and salivary glands. The largest nerve bundle projecting forward is
the optic nerve that carries visual information from the eye to the lateral geniculate
body of the thalamus. The anterior commissure is the smaller of two nerve bundles
that connect the two sides of the cortex. The larger is the corpus callosum, which
crosses just above the thalamus. Shown next to the anterior commissure is the
cerebral peduncle, the bundle of neurons from the motor cortex down through the
pyramids to the motor neurons in the spinal cord. At the back of the brain stem are
the cerebellar peduncles, nerve bundles carrying information into and out of the
cerebellum. Just above the cerebellar peduncles are the two bumps of the inferior
and superior colliculi.
THE MEDULLA
The lowest major segment of the brain stem is called the medulla oblongata.
Among the most important structures in the medulla are the vestibular nuclei.
Figure 4.13 illustrates the pathways of the vestibular system. The information from
the hair cells in the vestibular sensors is transmitted via the vestibulocochlear nerve
to the vestibular nuclei where the computations necessary to maintain equilibrium
are performed. Some fibers also go directly to the cerebellum. As can be seen in
figure 4.13, output from the vestibular nuclei goes to the motor computational
centers for the eyes, the neck muscles, and the body and limb muscles. Outputs from
the vestibular nuclei also go to the cerebellum, the computational center for rapid,
precise motor activity.
An example of one of the many functions of the vestibular system related to
balance and equilibrium is the automatic stabilization of the gaze of the eyes. As the
head turns, the vestibular reflex causes the eyes to turn the opposite direction by an
equal amount so that the gaze remains fixed. If the head continues to turn, the eyes
jump rapidly ahead and then rotate back at the rate of head turning, thus fixing the
gaze on another spot. This action is known as “nystagmus.”
The vestibular nuclei also receive input from the joint receptors in the neck.
This allows the vestibular nuclei to subtract a tilt of the head in the computation of
tilt of the body. The commands sent to the body muscles thus can maintain the
balance of the body in spite of motions of the head.
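Both stabilizing computations can be written down almost literally: counter-rotate the eyes by the head's rotation, and subtract the head-on-neck angle reported by the neck joint receptors from the vestibular tilt to recover the tilt of the body. The sketch below assumes sign conventions and units that are not given in the text.

```python
def stabilizing_eye_velocity(head_angular_velocity):
    """Vestibulo-ocular reflex: rotate the eyes opposite to the head so the gaze
    stays fixed (until the eyes jump ahead, the nystagmus described above)."""
    return -head_angular_velocity

def body_tilt(vestibular_tilt, neck_joint_angle):
    """The vestibular sensors report head tilt; subtracting the head-on-neck angle
    gives the tilt of the body itself."""
    return vestibular_tilt - neck_joint_angle

print(stabilizing_eye_velocity(30.0))   # head turning at +30 deg/s, eyes commanded to -30 deg/s
print(body_tilt(10.0, 10.0))            # head tilted 10 deg on an upright body: body tilt is 0
```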
The center core of the brain stem from the medulla up to the thalamus consists
of a diffuse mass of cells and fibers called the reticular formation, one of the most
primitive brain structures. In the course of phylogenetic development, distinct cell
masses and fiber bundles have arisen to overlay and surround the reticular forma¬
tion. To a large extent these have taken over the command and control duties of the
more primitive reticular formation. Nevertheless, many specific functions remain,
including some basic motor responses. Stimulation of the reticular formation may
elicit movements and even affect complete postural adjustments. More typically, the
reticular formation acts to facilitate or inhibit spinal motor mechanisms and thus
control the intensity rather than initiate or direct movements. The reticular forma¬
tion is also involved in the processing of sensory information from a variety of
sources. It controls habituation and arousal and has the ability to filter, modify, and
direct attention to sensory input. It has a profound influence on sleep and waking
and even performs some higher level integrative processes related to motivation,
emotion, and learning. Specific groups of cells in the reticular formation are known
to control such activities as gastrointestinal secretions, vasoconstrictor tone, and
respiration.
Surrounded on all sides by the fiber tracts and nuclei of the specific sensory
pathways as well as by the pyramidal and extrapyramidal motor systems, the
reticular formation receives input from the basal ganglia and from the cerebellum. It
also receives a profusion of collaterals from many ascending sensory pathways. In
addition, some axons from the spinal sensory neurons travel directly to the reticular
formation and synapse on cells scattered throughout its longitudinal extent. It is not
uncommon to find a single cell of the reticular formation that will fire or stop firing
in response to sensory input from a variety of modalities.
The pons is a large bulge on the front of the brain stem, containing a number of
computing centers that service the facial muscles, including the jaws and lips. It is
perhaps significant that the pons is well-developed in primitive creatures, whose
jaws and lips are principal manipulatory organs.
The cerebellum is attached to the brain stem just behind the pons by three large
fiber tracts called the cerebellar peduncles. The cerebellum is involved in the control
of rapid, precise muscular activities such as running, jumping, typing, or playing the
piano. It is not surprising that birds have a large cerebellum because of the require¬
ment for rapid, precise movements involved in flying.
The cerebellum receives motor-command inputs from the motor cortex by way
of the pons as well as feedback inputs from muscle spindles, Golgi tendon organs,
and tactile receptors in the skin and joints. These convey information as to the status
of muscle contraction, degree of tension in muscles, positions of the limbs, and
forces acting on the surface of the body.
The cerebellum transmits its outputs to the motor cortex through the thalamus,
as well as to the basal ganglia, the red nucleus, the reticular formation of the brain
stem, and the vestibular nuclei. A more complete description of the neural structure
and computational properties of the cerebellum is presented in Chapter 6.
THE MIDBRAIN
Just above the pons and cerebellum is a region called the midbrain. The mid¬
brain is the smallest portion of the brain stem. Located on the back of the midbrain
are two pairs of small bumps called the inferior and superior colliculi. The inferior
colliculi are among the principal computing centers in the auditory pathway. The in¬
ferior colliculi receive input from the cochlear nuclei and transmit output to the
medial geniculate nuclei of the thalamus. Neurons in the inferior colliculi are ar¬
ranged in an orderly manner with respect to frequencies. It is thought that the in¬
ferior colliculi play a significant role in the localization of sound.
The superior colliculi are related to vision. In submammalian vertebrates the
primary visual processing center is the optic tectum, a precursor to the mammalian
superior colliculi. The superior colliculi have a complex, laminated structure resem¬
bling the cerebral cortex. Beginning with reptiles, the importance of the optic tectum
(superior colliculi) diminishes progressively as an increasing number of optic fibers
establish more extensive connections with the thalamus and cortex. In man the
superior colliculi have become greatly reduced in size, receiving only about 20 per¬
cent of the fibers from the optic nerve. They serve primarily as reflex centers con¬
cerned with eye movements. The superior colliculi receive input from the lateral
geniculate, the retina, and the visual cortex. Output goes to the oculomotor nucleus
and the other midbrain structures that control the muscles moving the eyes. The
superior colliculi are thought to compute the functions necessary to fixate the eyes
on an object of attention. The part of the visual scene of special interest is computed
in the visual areas of the occipital cortex, primarily in the visual association areas.
From there it passes to the visual pointing centers of the superior colliculi. Decisions
made in the superior colliculi then travel to the reticular areas around the
oculomotor nuclei and thence into the motor nuclei themselves.
Red Nucleus
Forward of the superior colliculi, in the center of the midbrain, is the red
nucleus. The red nucleus is an important part of the extrapyramidal motor system. It
receives input from the caudate nucleus and the putamen as well as from the
prestitial nucleus and the nucleus commissuralis. It also receives input from the
cerebellum. Output from the red nucleus goes to the spinal motor neurons and to the
reticular nuclei of the brain stem, which then project to the spinal motor neurons.
Next to the red nucleus is the substantia nigra, a large pigmented region that is
only partially understood. The substantia nigra is thought to have an important role
in the extrapyramidal motor system. It receives input from the basal ganglia and the
cerebral cortex. Output goes primarily to the thalamus.
THE THALAMUS
The thalamus, the gateway to the cerebral cortex, is technically not part of the
brain stem but of the forebrain. It is the computational and switching center through
which all sensory information, with the exception of smell, is routed. From the
thalamus, sensory information is passed to the specific cortical regions dedicated to
the early and intermediate stages of sensory processing.
Every thalamic nucleus that sends fibers to a particular cortical region also
receives in turn some fibers from that same region. Thus, there is a great deal of
looping circuitry between the thalamus and cortical regions to which it sends input.
The consequence of this is that the cortex can send the thalamus instructions for
filtering the incoming data, separating the important from the inconsequential.
Signals, or portions of images that contain priority information, can be amplified,
and other regions can be filtered to suppress that which is not relevant to current
behavioral activity. In some cases, the cortex may even supply the thalamus with ex¬
pectations or predictions of what sensory data to expect, so that differences between
the expected and the observed can be passed to the motor-generating modules in the
forebrain.
The thalamus consists of a cluster of 29 separate nuclei. Figure 4.14 illustrates
the origin of input and destination of output of some of the major thalamic nuclei.
Among the most important thalamic nuclei is the lateral geniculate nucleus, the relay
station of the optic tract. The lateral geniculate, like all the other thalamic relay sta¬
tions, also receives inputs from the cortical area to which it projects, in this case, the
visual cortex. This allows visual perceptions made in the cortex to influence and
modify the incoming visual data.
A second important thalamic nucleus is the medial geniculate. This is the relay
station for audio data. The medial geniculate receives audio information from the
inferior colliculus as well as directly from the cochlear nucleus. It sends its output to
the auditory cortex located mainly in the upper part of the temporal lobe.
A third principal thalamic region is the ventral posterolateral nucleus. This is
the relay station for most of the sensory information from the spinal cord. Touch,
position, pain, heat, and cold are all relayed to the sensory cortex through the ven¬
tral posterolateral nucleus.
The ventral lateral nucleus is the relay station for information from the
cerebellum on its way to the premotor cortex. The anterior nucleus passes com¬
munications between the hypothalamus and the limbic cortex. The ventral anterior
nucleus transmits signals from the globus pallidus and the substantia nigra to
various regions in the frontal and temporal cortex.
In addition to these specific relay nuclei of the thalamus, there are a number of
so-called association nuclei that receive no direct fibers from the sensory systems,
but have abundant connections with other structures in the forebrain. These send
fibers to the association areas of the cerebral cortex in the frontal and parietal lobes
and, to a lesser extent, in the occipital and temporal lobes.
[Figure labels: to and from the precuneus; to and from the superior parietal lobule; internal medullary lamina (intralaminar nuclei, diffuse cortical projection); to and from the inferior parietal lobule and areas 18 and 19; to and from the prefrontal cortex; mammillothalamic tract; brachium of the inferior colliculus and lateral lemniscus; to the premotor cortex (areas 6 and 8); optic tract; brachium conjunctivum; trigeminothalamic tracts; medial lemniscus and spinothalamic tracts; to the sensory cortex (neck, trunk, and extremities area).]
Figure 4.14: Major nuclei of the thalamus showing the origin and destination of fibers enter¬
ing and leaving. The thalamus is the final processing station for all sensory information (ex¬
cept smell) before it reaches the cerebral cortex.
THE FOREBRAIN
The forebrain contains the highest levels of the computational hierarchy we call
the brain. The forebrain contains the thalamus, the basal ganglia, the cerebral cor¬
tex, and the limbic system, including the hypothalamus.
Figure 4.15: A cutaway section of the brain of the macaque monkey showing the relative position of the thalamus to the three components of the basal ganglia: the putamen, the globus pallidus, and the caudate nucleus. [From "Brain Mechanisms of Movement," by E. V. Evarts. Copyright
1979 by Scientific American, Inc. All rights reserved.]
The basal ganglia are the highest levels in the motor systems of the lower
vertebrates. For example, in birds, where the cerebral cortex is poorly developed, the
basal ganglia perform all the higher motor functions. Even in cats, and to a lesser ex¬
tent in dogs, removal of the cerebral cortex does not interfere with the ability to per¬
form many complex behavioral patterns such as walking, eating, fighting, showing
anger, and performing sexual activities. Only if large portions of the basal ganglia
are also destroyed is the animal reduced to the simple stereotyped movements
generated in the brain stem. Even in very young humans, destruction of the cortex
does not destroy the ability to walk crudely, to control equilibrium, or to perform
simple “unconscious” movements.
In humans, the basal ganglia, together with the premotor cortex, the thalamus,
the substantia nigra, and the red nucleus, make up the principal part of the extra-
pyramidal motor system. Figure 4.16 illustrates functional pathways of the extra-
pyramidal system.
The basal ganglia appear to be able to control the initiation and cessation of ac¬
tion primarily through the inhibition and modulation of muscle tone produced by
the lower motor centers of the brain stem. Destruction of the basal ganglia releases
these lower centers from inhibition and produces muscle rigidity.
The effects of lesions and electrical stimulation of the basal ganglia are com¬
plex. In some cases stimulation elicits complete motor sequences of coordinated
movement. In other cases, stimulation causes an animal to cease its ongoing
behavior and hold its position for many seconds while the stimulation continues.
Destruction of the globus pallidus results in a severe decrease of motor activity; the
subject remains passive and immobile. Lesions in the caudate nucleus can lead to
hyperactivity such as the “obstinate progression” phenomenon where an animal will
continue the effort to make walking movements even after encountering a wall, if
the floor is slippery enough to permit its feet to slide backward.
At the very top of the brain, covering all the lower level structures, is the
cerebral cortex, or cerebrum. Cortex means “bark,” and the cerebral cortex covers
the cerebral hemispheres like the bark of a tree. In humans the cortex is convoluted,
or wrinkled like the surface of a prune, which greatly enlarges the surface area of the
cortex: a human brain contains approximately 20 square feet of cortical area. The
cerebrum is the most recent phylogenetic area in the brain. As can be seen in figure
4.17, it is largest in humans, smaller in chimpanzees and monkeys, smaller still and
less convoluted in cats and opossums, and decreasingly apparent in birds, reptiles,
amphibians, and fish.
Figure 4.17: Progressive increase in the size of the cerebrum in vertebrates, all drawn to the
same scale. In carnivores, and particularly in primates, the cerebrum increases dramatically in
size and complexity. [From “The Brain,” by D. H. Hubei. Copyright © 1979 by Scientific American, Inc. All
rights reserved.]
Figure 4.18: The five major regions of the cerebral cortex. At the left is the left side of the
brain seen from the left side. At the right is the right side seen from the left.
Figure 4.19: A map of the cerebral cortex based on the differences in cell architecture com¬
piled by Brodmann in 1914. This map is in remarkably good agreement with functional
regions discovered in subsequent years.
In humans the cortex is divided into five major regions, or lobes: frontal,
parietal, temporal, occipital, and limbic as shown in figure 4.18. These divisions are
based partly on function and partly on anatomical features. A much more detailed
map of the cerebral cortex that delineates over fifty different regions is shown in
figure 4.19.
The cells in the motor cortex are arranged in columns normal to the surface of
the cortex. These columns are about 1 millimeter in diameter and have several thou¬
sand neurons in each column. The cells in each column are themselves arranged in
six distinct layers, distinguished by the origin of the axonal input to them. Each col¬
umn of cells seems to perform a specific motor function, such as stimulating a par¬
ticular muscle or several synergistic muscles. Specific cells within a column are
responsive to different types of input signals. Some cells respond to sensory signals
reporting joint movement, others to touch stimuli, and still others to signals from
the cerebellum, basal ganglia, and premotor cortex.
Direct electrical stimulation of a single output neuron in the motor cortex will
almost never excite a muscle contraction. At least several output neurons need to be
stimulated simultaneously. When barely threshold stimuli are used, only small
segments of the peripheral musculature contract at one time. In the "finger" and
"thumb" regions, threshold stimuli can sometimes cause single muscles to contract.
Conversely, in the “leg” region a threshold stimulus may cause some gross move¬
ment of the leg.
The area just forward of the primary motor cortex is called the premotor cortex
(area 6). Stimulation of this area will often elicit complex contractions of groups of
muscles. Vocalization or rhythmic movements, such as alternate thrusting of a leg
forward and backward, coordinated moving of the eyes, chewing, swallowing, or
contortion of parts of the body into different postural positions, may occasionally
occur.
The premotor area next to the mouth, lips, and tongue area on the left side of
the brain is known as Broca’s speech area. Damage to this region results in speech
defects related to the ordering of sequences of sounds and with the transition from
one sound to another. Damage does not prevent a person from vocalizing or re¬
sponding to questions with answers that are semantically meaningful. However, the
victim usually cannot encode thoughts into well-formed grammatical sentences.
Words may be uttered out of order, or the victim may be unable to move from one
word to another, resulting in the repetition of sounds. There is particular difficulty
with the inflection of verbs, with pronouns and connective words, and with complex
grammatical constructions.
Just above Broca’s area is the premotor area for the eyes. Damage here will
prevent a person from voluntarily moving the eyes to a new target once they fixate
on an object. The premotor cortex has direct input connections from the sensory
association areas of the parietal lobe and sends output to the primary motor cortex
and the basal ganglia.
The area forward of the premotor cortex is known to have functions related to
the brain’s ability to sequentially organize complex motor tasks and to deal with
complex spatial relationships. It is also deeply involved in the formulation of long-
range plans and conceptual abstractions. Damage to this area can cause disruption
in the ability to plan and execute extended sequences of action. Fragmented motor
sequences can appear but may be out of order, or they might show repetition or in¬
ability to proceed to the next part of a sequence of conceptual abstractions. This
part of the brain is the youngest phylogenetically. The high forehead of humans is
the result of the skull expanding to accommodate this latest addition to the brain. As
a general rule, damage to the more forward areas of the frontal cortex produces
more global defects in planning and symbolic reasoning, while damage to regions
nearer the primary motor cortex tends to interfere with more primitive elements of
motor behavior.
The premotor area just forward of Broca’s area appears to be involved with the
initiation of sentences rather than with the sequential organization of words.
Damage here may interfere with the patient’s ability to initiate speech.
recognize the meaning of written words. Damage in the outer layers of area 18
eliminates the ability to perceive visual spatial relationships. Damage to the inner
layers produces defects in object recognition. In area 19 there is no point-to-point
mapping of the retina. At this level in the visual-processing system, only the nature
of the stimulus is important, not the position. Damage in area 19 may impair the
ability to perceive the relationship between objects or to recognize more than one
object at a time.
The output of areas 18 and 19 projects to 20 and 21 of the temporal cortex.
Figure 4.21: A schematic diagram of the major computational centers and data-flow pathways
in the visual perceptual system. Two major pathways, which diverge at subcortical levels, are
apparent.
Figure 4.22: Anatomy of the limbic system. The regions in the shaded area make up the limbic
system.
[Figure labels: posterior hypothalamus (increased blood pressure, pupillary dilation, shivering); paraventricular nucleus (oxytocin release, water conservation); corticotropin release; medial preoptic area (bladder contraction, decreased heart rate, decreased blood pressure); dorsomedial nucleus (G.I. stimulation); perifornical nucleus (hunger, increased blood pressure, rage); supraoptic nucleus (water conservation); ventromedial nucleus (satiety); posterior preoptic and anterior hypothalamic area (body temperature regulation, panting, sweating, thyrotropin inhibition); mammillary body (feeding reflexes); lateral hypothalamic area, not shown (thirst and hunger); optic chiasm; infundibulum.]
Figure 4.23: The principal nuclei of the hypothalamus and the functions they affect.
These regions of the brain provide the value judgments as to whether the results
reported by the sensory-processing regions of the brain are good or bad. These are
the centers that tell us whether what we are doing (or are thinking of doing) is
rewarding or punishing. If the evaluation is positive, we will tend to continue our
ongoing action (or begin our contemplated action). If the evaluation is negative, we
will stop what we are doing or refrain from what we had planned to do.
Such evaluations are also useful in the control of memory storage. Some events
are very important to remember; others are not. The emotions tell us what is worth
remembering. Animal experiments have shown that sensory experiences causing
neither reward nor punishment will hardly be remembered at all, even if repeated a
number of times. However, a single event that arouses extreme pain, pleasure, or
fear will be remembered very clearly.
The part of the limbic system called the hippocampus is involved with memory
storage. Destruction of the hippocampus on both sides of the brain leads to loss of
the ability to remember anything new. Old memories stored before the hippocampal
damage are not affected, but there is a total loss of the ability to remember anything
afterwards, even events only a few minutes old. The hippocampus has numerous
connections with almost all parts of the limbic system. Stimulation of the hippocam¬
pus at times causes rage or other emotional reactions. At other times it causes a total
loss of attention. For example, stimulation of the hippocampus in people while they
are talking can result in their completely losing contact with the conversation.
The hippocampus is believed to make the emotional judgments as to what is
worth remembering. This allows the brain to be selective in what it stores, carefully
recording those experiences that are memorable and discarding those that are in¬
significant. Destruction of this selection center would therefore result in everything
being forgotten as if unimportant.
There are other areas of the limbic system, particularly in the hypothalamus,
that perform a somewhat different type of evaluation. For example, the ven¬
tromedial nucleus of the hypothalamus tells the brain when the body has had enough
to eat. When this center is stimulated, an animal eating food suddenly stops eating
and shows complete indifference to food. On the other hand if this area is destroyed,
the animal cannot be satiated and will eat voraciously, becoming quickly obese. The
lateral hypothalamus produces feelings of hunger and thirst.
Other areas in the hypothalamus produce visceral responses such as increased
blood pressure, pupillary dilation, shivering, changes in heart rate and body
temperature, panting, and sweating. Figure 4.23 illustrates the various bodily func¬
tions controlled or influenced by stimulation of various regions in the
hypothalamus.
This concludes our review of the structure and function of the brain. Perhaps
the most obvious feature of this amazing organ is that many different computations
are going on simultaneously in many different places: each sensory-motor system is
a separate computational structure; each neuron is a separate computing device;
each nucleus or patch of cortex is a computing module capable of calculating a
mathematical function on the multiple variables that are its inputs. The brain is not
a computer; it is a network of millions, even billions of computers, each operating
on its own set of inputs and transmitting its outputs to a specific set of other com¬
puters in the net. The computers of the brain are slow compared to modern digital
computers, but there are so many of them operating in parallel that the number of
computations per second far exceeds the capacity of the fastest electronic computer
ever built.
Attempting to model the entire brain in any single computer is hopeless. There
are limits to the speed of computation and the complexity of software that certainly
would doom any such effort. However, the emerging technology of large-scale in¬
tegrated circuitry that makes it possible to build entire computers, or even arrays of
computers, on a single chip of silicon raises the possibility of constructing networks
of hundreds, or even hundreds of thousands, of computers operating in parallel.
These would operate independent of each other except for the input and output in¬
formation shared by a number of computing centers. Such modularity would permit
the programs in the various computers to be debugged and optimized separately.
This would limit the complexity of the software in any single computer to a
manageable level. No computer would have to manage more than one task or sub¬
task at a time. Eventually, as more and more computers were added, the rate of
computation in such a structure might well approach that of the human brain.
A structure with the computational power of the brain will probably need to be
almost as complex as the brain. We have often heard that we use only X percent of our
brains (where X may be any number from 10 to 80). This is an old wives’ tale, often
retold in the context of self-fulfillment lectures encouraging people to more fully
utilize their mental powers. It may have originated with frustrated parents or
schoolmasters in response to the perennial reluctance of children to do their studies.
Whatever its origin and regardless of its popular acceptance, this notion is clearly
absurd. The brain consumes approximately 20 percent of the heart’s blood flow on a
top-priority basis. The demands of survival in a hostile world would not permit in¬
dividuals of any species to prosper or survive very long while wasting so much of
their most precious resources. The brain is complex because the computational tasks
it performs are complex and multitudinous. If we are to duplicate or model the performance of even a small brain, we must devise a complex structure.
The entire brain is made up of neurons. Each cluster of neurons computes a
relatively simple function on a well-defined set of variables. Thus, no part of the
model need be very complex, certainly no more complex than a single board
microcomputer, perhaps backed up by a disc or bubble memory. The problem is
how to partition the functions of the brain so that they can be modeled, and how to
interconnect a network of computers so that it corresponds to that partition.
In order to even begin thinking about these questions, we must first devise some
concise and precise way of describing the behavior we want our model to produce
and the computations required to produce it. We need a mathematical notation that
can deal with many variables, indeed, many thousands of variables, and that can ex¬
plicitly represent the flow of time. We also need a graphical notation that can help us
visualize the stream of consciousness and the behavior that arises from its swirling
currents. The creation of such mathematical tools is a task we’ll examine in the next
chapter.
CHAPTER 5
Hierarchical Goal-Directed Behavior
VECTORS
One way to describe many variables and deal with many simultaneous
multivariant computations is to use vector notation. A vector is simply an ordered
set, or list of variables. The vector V in figure 5.1b has two components, vx along the
X axis and vy along the Y axis. The ordered set, or list of components, defines the
vector so that we can write V = (vx, vy).
The components of a vector can also be considered as the coordinates of a point
(vx, vy) that correspond to the tip of the vector. The locus of all pairs of components
that can exist defines a vector space (for two dimensions the vector space is a sur¬
face). A vector can have more than two components. A vector with three com¬
ponents defines a volume (figure 5.1c), and a vector with four or more components
defines a hyperspace (figure 5.1d). A hyperspace is impossible to visualize, but is an
essential concept for our discussion.
A vector in a higher dimensional space can usually be visualized as a projection
onto a lower dimensional space. For example, typical mechanical drawings portray
front, side, and top views of a three-dimensional form projected onto a two-
dimensional sheet of paper. Each projection can either illustrate a cut through the
object at a particular plane along the projection axis, or a superposition of all the
salient features of the object collapsed into the plane of the illustration. In the col¬
lapsed version, the fact that two points or lines intersect in the projected image does
not necessarily mean that they coincide or intersect in the higher dimensional
space—they may simply lie behind each other along the projection axis. The projec¬
tion operator ignores variable differences that correspond to distance along the pro¬
jection axis.
It is not necessary to make the projection axis coincident with any of the coor¬
dinate axes. For example, in the oblique projection (perspective drawing) of figure
5.1c, the projection axis (the normal line to the paper through the origin of the coor¬
dinate system) is not aligned with any of the coordinate axes. The lines in the draw¬
ing represent the projections of lines in a three-dimensional space onto the two-
dimensional surface of the paper. In a similar way we can project higher dimen¬
sional vectors and hyperspaces of any dimension onto a two-dimensional drawing.
Figure 5.1d illustrates a four-dimensional vector projected onto a two-dimensional
drawing.
Figure 5.1: Defining space with vectors. A vector is an ordered list of variables which defines a
point in space; (a), (b), (c), and (d) depict vectors representing 1, 2, 3, and 4 dimensions,
respectively. The number of dimensions in the space is equal to the number of variables in the
list. (The illustration in (d) is meant only to be symbolic of a four-dimensional vector, which
cannot be visualized in three dimensions.)
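As an illustrative aside (not from the original text), the simplest form of projection described above, collapsing a vector along coordinate axes, can be sketched in a few lines of Python; the numbers are invented:

# A sketch of projection by dropping components: a four-dimensional vector
# collapsed onto a plane by ignoring the components along the projection axes.

v4 = (3.0, 1.5, -2.0, 0.7)   # a point in a four-dimensional space

def project(vector, kept_axes):
    """Keep only the listed components; differences along all other axes
    are ignored, as in a collapsed mechanical drawing."""
    return tuple(vector[i] for i in kept_axes)

print(project(v4, (0, 1)))   # -> (3.0, 1.5), its shadow in the (x, y) plane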
A vector can specify a state. This is the primary use we will make of vectors in
this discussion. A state is defined by an ordered set of variables. For example, the
state of the weather might be characterized by a state vector W = (w1, w2, w3, w4)
where:
w1 = temperature
w2 = humidity
w3 = wind speed
w4 = rate of precipitation
The state vector W exists in a space that consists of all possible combinations of
values of variables in the ordered set (w1, w2, w3, w4). We can thus say that the vector
W defines a space Sw. Every point in the space corresponds to a particular, unique
weather condition. The entire space corresponds to all possible weather conditions.
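To make the notation concrete, the weather state vector can be written as an ordered list in Python (the numeric values here are invented purely for illustration):

# The weather state vector of the text, written as an ordered tuple.
weather_state = (
    22.5,   # w1: temperature (degrees C)
    0.60,   # w2: humidity (relative, as a fraction)
    4.2,    # w3: wind speed (meters per second)
    0.0,    # w4: rate of precipitation (millimeters per hour)
)

# Each distinct tuple of values is a distinct point in the four-dimensional
# space of all possible weather conditions.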
The weather, like many things, is not constant; it varies with time. Each of the
state variables (temperature, humidity, wind speed, and rate of precipitation) is
time-dependent. Thus, as time passes, the point defined by W will move through the
four-dimensional space. Figure 5.2 illustrates the locus of the point traced out by W
as it moves to define a trajectory Tw.
Figure 5.2: As time progresses, if one or more of the components of a vector W change, the
vector will move through space, tracing out a trajectory Tw.
Figure 5.3: If the ordered list of variables which define a vector includes time, the space de¬
fined by the vector will have time as one of its axes. As time progresses, the vector will move
along the time axis. If none of the other variables is time dependent, the trajectory will be a
straight line parallel to the time axis, as in (a). If any of the other variables change with time,
the trajectory will be some curve with a component along the time axis as in (b).
If we project the state space of all the variables except time onto a two-
dimensional surface, we can represent the passage of time by the motion of this two-
dimensional plane along the time axis normal to it, as in figure 5.4. The state trajec¬
tory Ts is the locus of points traced out by the state vector as time passes.
A large variety of things can be represented as vectors. For example, we can
represent a picture as a vector. Any picture can be represented as a two-dimensional
array of points, each with a particular brightness and color hue. Thus each point can
be represented as three numbers corresponding to three primary color brightnesses:
the first for red, the second for blue, and the third for green. If all three brightnesses
are zero, the color is black. If all three are large and equal, the color is white. As
long as the number of points is large and the spacing is closer than the eye can readi¬
ly make out, the eye cannot distinguish such an array from a scene made up of the
real object. This is, of course, the principle by which pictures are printed in books
and transmitted over television. If the numbers corresponding to the color and
brightness of each resolution element are simply arranged in a list, then that list is a
vector. Any picture can be represented by a vector, and any series of pictures, like
those on a motion picture film, can be represented by a trajectory. The space is
defined by the set of all possible pictures capable of being printed or projected by a
two-dimensional array of brightness elements with a particular resolution.
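As a hedged sketch of this idea (my own illustration, not the author's), a tiny picture can be flattened into a single vector in Python; the image values are invented:

# A 2 x 2 "picture" flattened into a single vector.  Each resolution element
# contributes three brightness components in the order used in the text:
# red, blue, green.
image = [
    [(255, 255, 255), (0, 0, 0)],      # top row: white pixel, black pixel
    [(200, 10, 10),   (10, 200, 10)],  # bottom row: two colored pixels
]

picture_vector = [brightness
                  for row in image
                  for pixel in row
                  for brightness in pixel]

print(len(picture_vector))   # 12 components: a point in a 12-dimensional space
# A motion picture is then a sequence of such vectors, i.e., a trajectory.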
Figure 5.4: If the vector space defined by all of the vector components except time is projected
upon a two-dimensional surface, then the passage of time can be represented as the movement
of the two-dimensional surface along the time axis normal to it.
Figure 5.5: A vector can represent a symbol. Here two symbols from the ASCII character set,
an uppercase A and a lowercase a, are represented as vectors (or points) in an eight¬
dimensional space. The values of the eight bits in the ASCII code are plotted along the eight
axes (b8 is the even parity bit).
1. The difference between the neighborhood points and the exact symbol point
derives from noise on the channel transmitting variables denoting the vector
components. This is useful in signal detection theory, where the detection of
a vector within some neighborhood of a symbol vector corresponds to the
recognition of that symbol against a noisy background.
2. The difference from the exact symbol derives from distortions or variations
in the symbol itself. This makes the best sense if the components of the sym¬
bol’s vector are values of attributes or features of the symbol, rather than ar¬
bitrary digits as in the ASCII convention. In this case, a neighborhood of
points corresponds to a cluster of feature vectors from a symbol set which is
not identical, but very nearly so.
For example, a vector of features from the printed character e will be slightly dif¬
ferent for each instance of that symbol on a page due to variations in the paper on
which it is printed. However, if these e feature vectors fall in compact clusters far
Figure 5.6: Each point in hyperspace, corresponding to a particular symbol such as a or e, has
some neighborhood of points around it which are closer to it than to any other symbol. Varia-
tions from the exact or ideal position of a symbol vector may derive from noise in a transmis¬
sion channel or from differences between the observed symbol and the ideal.
from the feature vectors of other symbols, the letter e will be easily recognized,
despite the fact that no two specimens are exactly alike.
This is a fundamental concept in pattern-recognition theory. Hyperspace is par¬
titioned into regions, and the existence of a feature vector in a particular region cor¬
responds to the recognition of a pattern or symbol. By definition, the best set of
features is the one that maximizes the separability of pattern vectors. In the design
of pattern recognizers it is important to select a set of features that is easily measured
and that produces widely separated and compact clusters in feature space.
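The partitioning of feature space into named regions can be sketched, under the simplifying assumption of one prototype vector per symbol, as a nearest-prototype rule; the feature values below are invented:

import math

# Prototype (ideal) feature vectors for two symbols.
prototypes = {
    "e": (0.8, 0.2, 0.5),
    "a": (0.3, 0.7, 0.4),
}

def recognize(features):
    """Name the symbol whose prototype lies closest in feature space."""
    return min(prototypes, key=lambda name: math.dist(features, prototypes[name]))

# A slightly distorted 'e' still falls within the 'e' neighborhood.
print(recognize((0.75, 0.25, 0.55)))   # -> "e"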
A vector can also be used to describe the state of a set of neurons. At any in¬
stant, each neuron has some voltage value that produces some firing rate. If we
select a set of neurons and make a list of their firing rates, that list defines a vector.
Thus, we can describe the state of any set of neurons by a vector whose components
are the voltage values or firing rates of the individual neurons in the set.
This can be done for any set of neurons. If, for example, we pick the set of
neurons that make up the retina of the eye, then the pattern of neural activity
generated by a particular visual image can be described as a vector. Any particular
visual image will generate a particular pattern of neural activity on the retina and
hence a particular vector corresponding to the state of the retinal neurons. As time
progresses and the image changes, the vector traces out a trajectory. This trajectory
corresponds to a visual experience. Of course, the incoming image itself can be
described by a set of values, such as an array of brightness values, or even of color,
texture, or depth values.
Similarly, we can describe the state of any other set of neurons in the brain as a
vector and the sequence of states of these neurons over a period of time as a trajec¬
tory of that vector through the vector space of all possible states of those neurons.
We can, for example, describe the state of the neurons in the motor system by a vec¬
tor and a sequence of neuronal activity producing a behavioral action as a trajec¬
tory. We can describe the state of the neurons in the pain system as a vector and the
experience of pain from an injury as a trajectory. We can describe the state of the
neurons in the hypothalamus that produce the feeling of hunger as a vector, or the
neurons in the septum that produce the feeling of joy by a vector. Thus we can
define the mental feelings corresponding to pain and pleasure, the sensory ex¬
periences, and behavioral activities as trajectories of state vectors through
multidimensional space.
We can, in fact, define a vector containing the value of every neuron in the en¬
tire central nervous system. Such a vector then describes the state of the mind, and
the trajectory of that vector corresponds to the stream of consciousness.
geographical region is a function of the heat input, the prevailing wind conditions,
and other factors. The seasons are a function of the position and orientation of the
earth relative to the sun. Similarly, we can say that the level of our hunger is a func¬
tion of the signals on nerve fibers reporting on the state of the stomach, chemistry of
the blood, the time of day as indicated by internal biological rhythms, and so on.
In mathematics, a function defines, and is defined by, a relationship between
symbols. Sometimes the relationship can be set in one-to-one correspondence to
physical variables. A function often implies a directional relationship. For example,
in the physical world there is a one-way direction in the relationship between cause
and effect. In traditional terms a function can be expressed as an equation, such as
y = f(x)
y = 2x² + 3x + 6
Figure 5.7: Functions can be expressed in a number of different ways. Here the functional relationship between y and x is expressed as an equation and a graph.
Figure 5.8: Functions can also be expressed as tables and circuits. Here the Boolean function z = x · y is expressed as a table, a circuit, and an equation.
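The same two relationships can also be expressed as code, which is yet another equivalent representation (this sketch is mine, not the book's):

# The quadratic equation of figure 5.7 as a Python function, and the
# Boolean function z = x AND y of figure 5.8 as an explicit table.

def f(x):
    """y = f(x) = 2x^2 + 3x + 6."""
    return 2 * x**2 + 3 * x + 6

AND_TABLE = {
    (0, 0): 0,
    (0, 1): 0,
    (1, 0): 0,
    (1, 1): 1,
}

assert f(1) == 11            # 2 + 3 + 6
assert AND_TABLE[(1, 0)] == 0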
f: C → E
which reads, "f is a relationship which maps the set of causes C into the set of effects
E." It means that for any particular state in the set C, the relationship f will compute
a state in the set E. This is shown in figure 5.9.
We have already shown that states can be denoted by vectors and sets of states
by sets of points in vector hyperspaces. Thus, the notion of a function being a map¬
ping from one set of states to another naturally extends to a mapping of points in
one vector hyperspace onto points in another.
Suppose, for example, we define an operator h as a function which maps the in¬
put S = (s1, s2, s3, ..., sN) onto the output scalar variable p. We can write this as
p = h(S)
or
h: S → p
We can also draw the functional operator as a circuit element or “black box” as
in figure 5.10. (A black box is an engineering concept sometimes used to depict a
process with inputs and outputs. The viewer sees the effects on the output of changes
to the input, but the internal workings of the process remain hidden in a black box.)
If we assume that we have L such operators, h1, h2, ..., hL, each operating on
the input vector S as in figure 5.11, we have a mapping
H: S → P or P = H(S)
where the operator H = (h1, h2, ..., hL) maps every input vector S into an output
vector P. Now since S is a vector (or point) in input space, we can represent the func¬
tion H as a mapping from input space onto output space.
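A minimal sketch of this construction, with two invented h operators standing in for h1 through hL, might look as follows in Python:

# H = (h1, h2, ..., hL): scalar-valued operators, each applied to the same
# input vector S, together producing the output vector P = H(S).

def h1(S):
    return sum(S)                 # one scalar component of the output

def h2(S):
    return max(S) - min(S)        # another scalar component

H = [h1, h2]                      # here L = 2

def apply_H(S):
    """Map a point S in input space to a point P in output space."""
    return [h(S) for h in H]

print(apply_H([0.2, 0.9, 0.4]))   # one point P on the output trajectory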
Figure 5.9: A function can also be expressed as a mapping from one set onto another. Here the
function f maps the set of causes C onto the set of effects E such that for every cause in C there
is an effect in E. In our discussion we will be concerned only with single-valued functions such
that there is only one effect for each cause. We will, however, allow more than one cause to
have the same effect (i.e., more than one point in C can map onto the same point in E).
Figure 5.10: We will define the operator h as a function which maps the input vector S into the
output scalar variable p.
Figure 5.11: We will define the set of operators H = (h1, h2, ..., hL) as a function which
maps the input vector S into the output vector P.
For the purposes of our discussion we require that both the input and output
space be bounded and that each S will map into one and only one P. Several dif¬
ferent S vectors may map into the same P vector, however. Of course, if any of the
variables in S are time-dependent, S will trace out a trajectory Ts through input
space. The operator H will map each point S on Ts into a point P on a trajectory TP
in output space, as shown in figure 5.12.
A function can also describe the operation performed by a neuron on its inputs
or by a cluster of neurons on their inputs. If we define the vector S such that (s1, s2,
..., sN) are the firing rates on the synaptic inputs of a particular neuron, then we can
say the output p = h(S) is the firing rate on the axon of that neuron. The function h
is defined by the strengths and types of the various synaptic connections and their
position on the dendrites and cell body of the neuron. If we define the vector S such
that (s1, s2, ..., sN) are the firing rates on all the input fibers to a cluster of neurons,
then we can say the output P = H(S) is the firing rate of all the axons leaving that
cluster; that is, p1 is the firing rate of the first neuron in the cluster, p2 is the firing
rate of the second neuron, and so on to pL, the firing rate of the last neuron in the
cluster.
Thus we can define a function H as the mathematical transformation per¬
formed by a cluster of neurons on a set of input fibers. As the input vector S traces
out a trajectory Ts through input space, the function H performs a transformation
on each input S so that the output vector P traces out a trajectory TP through output
space. For example, if we label all the inputs to a motor control module in the spinal
cord so that they define a vector S, then the output firing rates to the muscles can be
labeled to define the vector P. The transformation performed by the spinal motor
control module on the input vector S is the function H, which computes the output
P = H(S).
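One common, though here purely illustrative, stand-in for the h functions of such a cluster is a weighted sum of input firing rates, with the weights playing the role of the synaptic strengths mentioned above (all values invented):

# Each output firing rate modeled as a weighted sum of the input firing
# rates, clipped at zero so that rates cannot be negative.

weights = [
    [ 0.5, -0.2,  0.8],   # synaptic weights of neuron 1
    [-0.4,  0.9,  0.1],   # synaptic weights of neuron 2
]

def H(S):
    """Map input firing rates S to output firing rates P = H(S)."""
    return [max(0.0, sum(w * s for w, s in zip(row, S))) for row in weights]

S = [10.0, 5.0, 2.0]      # firing rates on the input fibers
print(H(S))               # firing rates on the axons leaving the cluster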
Figure 5.12: The operator H maps every input vector S in input space into an output vector P
in output space. H thus maps the trajectory Ts into the trajectory TP.
This notation gives us a means of talking about the states, computations, and
sequences of states (or trajectories) which make up the activity of the brain. We can
define the instantaneous state of any neuron, or cluster of neurons, or even of the
entire brain as a vector. We can describe any experience, behavior pattern, or any
thought or idea as a trajectory. This is an extremely powerful concept, for it enables
us to visualize graphically the internal activity of the brain, as well as the external ac¬
tivity of behavior. It gives us a mathematically precise and concise notation for
describing the activity of individual neurons, clusters of neurons, and the entire
brain. Furthermore, it gives us a way to partition the activities of different areas of
the brain into different vector spaces and represent them simultaneously along the
time axis. Thus, we can decompose the enormous complexity of many simultaneous
computations into intellectually comprehensible modules and then systematically
recombine them into an integrated whole.
We are now ready to consider the structure of control systems for sensory-
interactive goal-directed behavior. The simplest form of goal-seeking device is the
servomechanism. The setpoint, or reference input to the servomechanism, is a sim¬
ple form of command. Feedback from a sensing device, which monitors the state of
the output or the results of action produced by the input, is compared with the com¬
mand. If there is any discrepancy between commanded action and the results, an er¬
ror signal is generated that acts on the output in the proper direction and by the pro¬
per amount to reduce the error. The system thus follows the setpoint, or, put
another way, it seeks the goal set by the input command.
Almost all servomechanism theory deals with a one-dimensional command, a
one-dimensional feedback, and a one-dimensional output. Our vector notation will
allow us to generalize from this one-dimensional case to the multidimensional case
with little difficulty.
Assume we have the multivariable servomechanism shown in figure 5.13. The
function H operates on the input variables in S and computes an output P = H(S).
Note that we have partitioned the input vector S into two vectors: C = (s1, s2, ..., si)
and F = (si+1, ..., sN), such that S = C + F. If i = 1, N = 2, L = 1, and H
computes some function of the difference between C and F, we have a classical
servomechanism.
In our more general case, C may be any vector, and in some cases it may be a
symbolic command. The feedback vector may contain information of many dif¬
ferent types. It may simply report position or velocity of the controlled outputs, but
for a complicated system such as a robot manipulator or the limb of an animal, it
may also report the resistance to movement by the environment, the inertial con¬
figuration of the manipulator structure, and other parameters relevant to the prob¬
lem of making rapid and precise movements.
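The classical one-dimensional case just described can be sketched in a few lines; the gain value and the simple error-times-gain form of H are assumptions for illustration, not the book's design:

# Classical servomechanism: i = 1, N = 2, L = 1, and H computes a function
# of the difference between the command C and the feedback F.

def H(S, gain=0.5):
    command, feedback = S
    error = command - feedback        # discrepancy between goal and result
    return gain * error               # output acting to reduce the error

position = 0.0                        # the controlled output
setpoint = 10.0                       # the goal set by the command input

for _ in range(20):
    output = H((setpoint, position))  # P = H(S), with S = (C, F)
    position += output                # the output acts on the system
print(round(position, 4))             # the system has sought the goal: ~10.0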
Figure 5.13: A multivariable servomechanism. The reference or command input is the vector
C consisting of the variables s1 through si. The feedback is the vector F consisting of sensory
variables si+1 through sN. The function H computes an output vector P consisting of p1
through pL that drives actuators, thus affecting the physical environment.
Figure 5.14: A stationary C vector establishes a setpoint, and as time progresses the feedback
vector varies from F1 to F2 to F3. The S vector thus traces out a trajectory Ts. The H operator
computes an output P for each input S and so produces an output trajectory TP. The result is
that the input command C is decomposed into a sequence of output subcommands P1, P2, P3.
HIERARCHICAL CONTROL
Assume that the command vector C in figure 5.14 changes such that it steps
along the trajectory Tc as shown in figure 5.15. The result is that the sequence of in¬
put commands C1, C2, C3, followed by the sequence C4, C5, produces the sequence
of output vectors P1, P2, P3, P4, P5. In this case the subsequence P1, P2, P3 is called
by the commands C1-3 and driven by the feedback F1, F2, F3. The subsequence P4, P5
is called by C4-5 and driven by F4, F5, etc.
If we now represent time explicitly, the C, F, and P vectors and trajectories of
figure 5.15 appear as shown in figure 5.16. The fact that C remains constant while
the feedback changes from F1 to F2 to F3 means that the trajectory Tc is parallel to
the time axis over that interval. The jump from C1-3 to C4-5 causes an abrupt shift in
the Tc trajectory in the time interval between F3 and F4.
Note that each instant can be represented by a plane (or set of coplanar regions)
perpendicular to the time axis. Each plane contains a point from each trajectory and
represents a snapshot of all the vectors simultaneously at a specific instant in time.
level in the hierarchy. The lower level loops are simple and fast-acting. The higher
level loops are more sophisticated and slower.
At each level the feedback vector F drives the output vector P along its trajec¬
tory. Thus, at each level of the hierarchy, the time rate of change of the output vec¬
tor Pi will be of the same order of magnitude as the feedback vector Fi, and con-
siderably more rapid than the command vector Ci. The result is that each stage of
the behavior-generating hierarchy effectively decomposes an input task represented
by a slowly changing Ci into a string of subtasks represented by a more rapidly
changing Pi.
At this point we should emphasize that the difference in time rate of change of
the vectors at various levels in the hierarchy does not imply that the H operators are
computing more slowly at the higher levels than at the lower. We will, in fact,
assume that every H operator transforms S into P with the same computational
delay Δt at every level of the hierarchy. The slower time rate of change of P vectors at the higher levels stems
from the fact that the F vectors driving the higher levels convey information about
events that occur less frequently. In some cases certain components of higher level F
vectors may require the integration of information over long time intervals or the
recognition of symbolic messages with long word lengths.
When we represent time explicitly as in figure 5.17b, we can label the relatively
straight segments of the Tc trajectories as tasks and subtasks. Transitions between
the subtasks in a sequence correspond to abrupt changes in Tc.
If we do not represent time explicitly, the relatively constant C vectors corre¬
spond to nodes, as in figure 5.15. The resulting tree structure represents a classical
AND/OR decomposition of a task into sequences of subtasks, where the discrete Ci
vectors correspond to OR nodes and the rapidly changing sequences of Pi vectors
become sets of AND nodes under those OR nodes.
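A two-level sketch of this decomposition, with task and subtask names invented purely for illustration, shows how a slowly changing command is turned into a faster-changing string of subcommands by feedback:

# A high-level command C stays constant while feedback drives a sequence
# of subtask commands; each subtask is decomposed into an elemental action.

PLAN = {"make_coffee": ["boil_water", "grind_beans", "pour"]}
MOVES = {"boil_water": "switch_on_kettle",
         "grind_beans": "press_grinder",
         "pour": "tilt_kettle"}

def H_high(command, feedback):
    """Higher level: select the current subtask (OR node) for the task."""
    return PLAN[command][feedback["steps_done"]]

def H_low(subtask):
    """Lower level: decompose the subtask into an elemental action."""
    return MOVES[subtask]

command = "make_coffee"                   # slowly changing C
feedback = {"steps_done": 0}              # rapidly changing F
while feedback["steps_done"] < len(PLAN[command]):
    subtask = H_high(command, feedback)   # P at the higher level
    print(subtask, "->", H_low(subtask))  # P at the lower level
    feedback["steps_done"] += 1           # feedback advances the trajectory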
GOAL-DIRECTED BEHAVIOR
The fifth [argument for the existence of God] begins from the guidedness of
things. For we observe that some things which lack knowledge, such as natural
bodies, work towards an end. This is apparent from the fact that they always or
most usually work in the same way and move towards what is best. From which it
is clear that they reach their end not by chance but by intention. For those things
which do not have knowledge do not tend to an end, except under the direction of
someone who knows and understands: the arrow, for example, is shot by the
archer. There is therefore an intelligent personal being by whom everything in
nature is ordered to this end, and this we call God.
In more recent times, Descartes fused mind, soul, and rationality into a single
entity, which like God, is of a substance that is nowhere and unextended. This kind
of thinking does indeed place the notion of intent and purpose outside the realm of
scientific investigation. Against such a historical backdrop, it is not surprising that
the behaviorist school fought to purge the concept of purpose and intent from the
science of behavioral psychology. The point was to divorce the study of behavior
from the supernatural and establish it as a natural science.
Yet such nonphysical things as purposes, intents, goals, plans, and values have
objective reality. They are a part of our everyday experience and the fact that the
Greeks or medieval philosophers attributed them to supernatural causes is no reason
for modern science to pretend that they do not exist. Ideas, goals, and dreams play a
great part in the generation of behavior, and any theory of behavior that does not
take these nonphysical realities into account is very limited in scope.
Ever since the development of modern servo-control theory during World War
II, it has been clear that there is no need to appeal to the supernatural or to any life
principle in order to explain goal-directed behavior. Goal-seeking is the natural
behavior of any system that uses feedback information to steer behavior along a
course leading to a goal. The simplest case of this is the servomechanism. No one to¬
day would cite the operation of a thermostat, or even a guided missile, as proof for
the existence of God. We do not need to appeal to any life principle in order to ex¬
plain servo-control theory. Neither should we feel any need to invoke the super¬
natural to explain the behavior of hierarchical goal-seeking systems with several
levels in biological systems. To be sure, the addition of complex sensors,
sophisticated sensory processing, internal world models, and multiple levels of feed¬
back into the hierarchy produces an exponential increase in the complexity of possi¬
ble behavior patterns; but this is not magic. It is predictable and deterministic.
In the following pages we will try to show how the study of robotics can provide
an experimental tool for the study of hierarchical control systems and their relation¬
ship to goal-seeking and purposive behavior. With the advent of inexpensive and
powerful microprocessors, robot-control systems can be made sufficiently complex
so their behavior casts considerable light on many difficult issues of intention,
perception, and cognition. Yet the robot is a machine; the variables that control it
can be isolated and measured. Thus, robotics offers a way to make the study of pur¬
posive behavior into a hard science. Mathematically precise theories can be for¬
mulated and quantitatively tested with a rigor not achievable with biological sub¬
jects. By synthesizing purposive behavior, robotics may aid our understanding of it.
Figure 5.18: Around each trajectory representing an ideal task performance there exists an
envelope of nearly ideal trajectories which correspond to successful, but not perfect, task per¬
formance. If the H functions are defined throughout these envelopes such that the system can
drive back toward the ideal whenever it deviates, then the trajectory will be stable and task
performance can be successful despite perturbations and unexpected events.
Figure 5.19: If the H functions at the lower levels are sufficiently well defined, small perturba¬
tions from the ideal performance can be corrected by low-level feedback without requiring any
change in the command from higher levels.
Figure 5.20: If the lower level H functions are not adequately defined, or if the perturbations
are too large for the lower level to cope, then feedback to the higher levels produces changes in
the task decomposition at a higher level. The result is an alternative strategy.
Note that it is not necessary for the feedback vector F to explicitly represent an
error signal. It is merely necessary for it to convey information that indicates the
state of the current task performance. The H function at each hierarchical level is
defined on the multidimensional space of all possible commands and all possible
task states. In order to generate successful behavior, the H function must compute
the correct control signals for every S vector near the ideal trajectory to steer the
behavior trajectory back toward the center of the success envelope.
The F vector can, of course, explicitly represent an error signal. In this case, the
H function must be defined over the space of all possible commands and all possible
error signals. In either case, the dimensionality of the S vector is the same. However,
in the second case the range of the variables in the F vector is reduced and centered
at zero. This is sometimes computationally convenient, but not logically significant.
Overlearned tasks correspond to those for which the H functions at the lower
levels are sufficiently well defined so as to maintain the terminal trajectory suc¬
cessfully without requiring intervention by the higher levels for strategy modifica¬
tion. Thus, a highly skilled and well-practiced performer, such as a water skier, can
execute extremely difficult maneuvers with apparent ease despite large perturba¬
tions, such as waves. His lower level H functions are well defined over large regions
of space corresponding to large perturbations in the environment. He is thus capable
of compensating for these perturbations quickly and precisely to maintain successful
performance without intervention by higher levels. Such a performance is
characterized by a minimum amount of physical and mental effort.
We say, “He skis effortlessly without even thinking.” What we mean is that his
lower level corrections are so quick and precise that his performance never deviates
significantly from the ideal. There is never any need for higher level loops to make
emergency changes in strategy. On the other hand, a novice skier (whose H func¬
tions are poorly defined, even near the ideal trajectory, and completely undefined
elsewhere) may have great difficulty maintaining a successful performance at all. He
is continually forced to bring higher levels into play to prevent failure, and even the
slightest perturbation from the ideal is likely to result in a watery catastrophe. He
works very hard and fails often, because his responses are late and often
misdirected. His performance is erratic and hardly ever near the ideal.
However, practice makes perfect, at least in creatures with the capacity to learn.
Each time a trajectory is traversed, if there is some way of knowing what mistakes
were made, corrections can be made to the H functions in those regions of input
spaces that are traversed. The degree and precision of these corrections and the
algorithm by which they are computed determine the rate of convergence (if any) of
the learning process to a stable and efficient success trajectory.
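A hedged sketch of such error-driven correction, assuming for illustration that H is stored as a table indexed by crude regions of a one-dimensional input space and that a teacher supplies the desired output, might be:

# Each traversal nudges the stored output for the traversed region toward
# the value that would have kept the trajectory on the ideal.  The learning
# rate and region size are invented.

H_table = {}                    # region of input space -> stored output
LEARNING_RATE = 0.3

def region(s):
    return round(s, 1)          # crude partition of a one-dimensional input

def H(s):
    return H_table.get(region(s), 0.0)

def correct(s, desired_output):
    """Adjust the stored output for the traversed region toward the ideal."""
    H_table[region(s)] = H(s) + LEARNING_RATE * (desired_output - H(s))

for _ in range(10):             # repeated practice over the same region
    correct(0.42, 1.0)
print(round(H(0.38), 3))        # -> 0.972, converging toward the ideal 1.0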
There are many interesting questions about learning, generalization, and the
mechanisms by which H functions are created and modified at the various hierar¬
chical levels in biological brains. However, we will defer these issues until later.
ALTERNATIVE TRAJECTORIES
Note that figure 5.17 illustrates only a single specific performance of a par¬
ticular task. None of the alternative trajectories that might have occurred under dif¬
ferent circumstances with a different set of F vectors is indicated. Alternatives that
might have occurred can be illustrated in a plane orthogonal to the time axis.
Figure 5.21 illustrates the set of alternative C vectors available at various levels
in the behavior-generating hierarchy of the male three-spined stickleback fish. This
figure represents a snapshot, or single cut through space orthogonal to the time axis.
C6, the highest level goal, is survival. The feedback F6 consists of variables in¬
dicating water temperature and depth, blood chemistry, and hormone levels
generated by length-of-day detectors. When the hormone levels indicate the proper
time of year and the blood chemistry does not call for feeding behavior, then
migratory behavior will be selected until warm, shallow water is detected. The F6
vector will then trigger the reproduction subgoal.
Figure 5.21: The command and control hierarchy proposed by Tinbergen to account for the
behavior of the male three-spined stickleback fish. The heavy line indicates the particular type
of behavior vector actually selected by the feedback shown at the various levels of the hierar¬
chy on the left. This figure represents a snapshot in time corresponding to one of the two-
dimensional surfaces shown in figure 5.16.
The sensory feedback that enters each level of the behavior-generating hierar¬
chy comes from a parallel sensory-processing hierarchy, as shown in figure 5.24.
Sensory data enters this hierarchy at the bottom and is filtered through a series of
sensory-processing and pattern-recognition modules arranged in a hierarchical
structure that runs parallel to the behavior-generating hierarchy. Each level of this
sensory-processing hierarchy processes the incoming sensory data stream, extracting
features, recognizing patterns, and applying various types of filters to the sensory
data. Information relevant to the control decisions being made at each level is ex¬
tracted and sent to the appropriate behavior-generating modules. The partially pro¬
cessed sensory data that remains is then passed to the next higher level for further
processing.
As was discussed in the section on vectors, any spatial pattern such as a picture,
a sound, or a symbol can be represented as a vector; any visual or auditory sequence
or any string of symbols can be represented by a trajectory.
The fundamental problem in pattern recognition is to name the patterns. All
patterns called by the same name are in the same class. When a pattern has been
given a name, we say it has been recognized. For example, when the image of a
familiar face falls on my retina and I say “That’s George,” I have recognized the
visual pattern by naming it.
Many patterns have the same name or fall in the same class, as shown in figure
5.25. For example, the same face casts many identifiable images. George’s profile
casts quite a different image on our retina than George’s face seen straight on. Two
identical positions of George’s face will cast different images if the lighting is dif¬
ferent, or if George is closer or further away. Yet these are all recognized as George.
Figure 5.25: The process of pattern recognition is one of naming the patterns. Typically many
different patterns will be classified by the same name.
At this point we need to introduce some new symbols to clearly distinguish be¬
tween vectors in the sensory-processing hierarchy and those in the behavior¬
generating hierarchy. We will define the input vector to a pattern-recognizer module
as
D = E + R
where E = (d1, d2, ..., di) is a vector, or list, of data variables derived from sen-
sory input from the external environment
and R = (di+1, ..., dN) is a vector of data variables derived from recalled ex-
periences, or internal context.
The functional operator in the sensory-processing hierarchy will be denoted G and
the output Q such that
Q = G(D)
The D vector represents a sensory pattern plus context, such that each compo¬
nent di represents a feature of the pattern or the context. The existence of the D vec¬
tor within a particular region of space therefore corresponds to the occurrence of a
particular set of features or a particular pattern in a particular context. The recogni¬
tion problem then is to find a G function that computes an output vector Q naming the pattern.
In other words, G can recognize the existence of a particular pattern and context
(i.e., the existence of D in a particular region of input space) by outputting the name
Q. For example, in figure 5.26
Figure 5.27: A time-varying D vector traces out a trajectory TD which represents a sensory experience TE taking place in the context TR. A section of a TD trajectory which maps into a small region of Q space corresponds to the recognition of an extended temporal pattern as a single event.
In such cases the ambiguity can often be resolved or the missing data filled in if the context can be taken into account or if the classification decision can make use of some additional knowledge or well-founded prediction regarding what patterns are expected.
The addition of context or prediction variables R to the sensory input E such that D = E + R increases the dimensionality of the pattern input space. The context variables thus can shift the total input (pattern) vector D to different parts of input space depending on the context. Thus, as shown in figure 5.28, the ambiguous patterns E1 and E2, too similar to be reliably recognized as in separate classes, can be easily distinguished when accompanied by context R1 and R2.
Figure 5.28: In (a) the two pattern vectors E1 and E2 are too close together in pattern space to be reliably recognized (i.e., named) as in different classes. In (b) the addition of context R1 to E1 and R2 to E2 makes the vectors D1 and D2 far enough apart in pattern + context space to be easily recognized as in separate classes.
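To make the geometric point concrete, here is a minimal sketch in Python (the specific vectors E1, E2, R1, R2 and the use of Euclidean distance are illustrative assumptions, not anything specified in the text): appending context components to two nearly identical pattern vectors moves the resulting D vectors far apart.

    import math

    def distance(a, b):
        # Euclidean distance between two equal-length vectors
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # Two ambiguous patterns: nearly identical in pattern space
    E1 = [0.50, 0.52]
    E2 = [0.51, 0.50]
    print(distance(E1, E2))       # small: hard to assign to separate classes

    # Context vectors, e.g., from another modality or from the
    # behavior-generating hierarchy
    R1 = [1.0, 0.0]
    R2 = [0.0, 1.0]
    D1 = E1 + R1                  # D = E + R: pattern plus context
    D2 = E2 + R2
    print(distance(D1, D2))       # large: easily recognized as separate classes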
In the brain many variables can serve as context variables. In fact, any fiber carrying information about anything occurring simultaneously with the input pattern can be regarded as context. Thus context can be data from other sensory modalities as well as information regarding what is happening in the behavior-generating hierarchy. In many cases, data from this latter source is particularly relevant to the pattern-recognition task, because the sensory input at any instant of time depends heavily upon what action is currently being executed.
For example, information from the behavior-generating hierarchy provides contextual information necessary for the visual-processing hierarchy to distinguish between motion of the eyes and motion of the room about the eyes. In a classic experiment, von Holst and Mittelstaedt demonstrated that this kind of contextual data pathway actually exists in insects. They observed that a fly placed in a chamber with rotating walls will tend to turn in the direction of rotation so as to null the visual motion. They then rotated the fly’s head 180° around its body axis (a procedure which for some reason is not fatal to the fly) and observed that the fly now circled endlessly because by attempting to null the visual motion it was now actually increasing it. Later experiments with motion perception in humans showed that the perception of a stationary environment despite motion of the retinal image caused by moving the eyes is dependent on contextual information derived from the behavior-generating hierarchy. The fact that the context is actually derived from the behavior-generating hierarchy rather than from sensory feedback can be demonstrated by anesthetizing the eye muscles and observing that the effect depends on the intent to move the eyes and not on the physical act of movement. The perceptual correction occurs even when the eye muscles are paralyzed so that no motion actually results from the conscious intent to move.
Contextual information can also provide predictions of what sensory data can
be expected. This allows the sensory-processing modules to do predictive filtering, to
compare incoming data with predicted data, and to detect patterns obscured by
noise or data dropouts.
The mechanism by which such predictions or expectations can be generated is illustrated in figure 5.24. Here contextual input for the sensory-processing hierarchy is shown as being processed through an M module before being presented to the sensory pattern-recognition G modules at each level. Input to the M modules derives from the P vector of the corresponding behavior-generating hierarchy at the same level as well as an X vector, which includes context derived from other areas of the brain such as other sensory modalities or other behavior-generating hierarchies. These M modules compute R = M(P + X). Their position in the links from the behavior-generating to the sensory-processing hierarchies allows them to function as a predictive memory. They are in a position to store and recall (or remember) sensory experiences (E vector trajectories) which occur simultaneously with P and X vector trajectories in the behavior-generating hierarchy and other locations within the brain. For example, data may be stored in each Mi module by setting the desired output Ri equal to the sensory experience vector Ei. At each instant of time t = k, the sensory data represented by Ei will then be stored in an address selected by the value of the Pi + Xi vector at that instant. The result will be that the sensory experience represented by the sensory data trajectory TEi will be stored in association with the context trajectory T(Pi+Xi).
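A minimal sketch of such a predictive memory, written in Python, is given below. It files each observed E vector under a coarsely quantized P + X address and recalls it when a similar context recurs. The dictionary representation, the quantization step, and all numerical values are assumptions made for illustration only.

    def quantize(context, step=0.5):
        # Coarse quantization: nearby contexts map to the same address,
        # giving a crude form of generalization
        return tuple(round(c / step) for c in context)

    class MModule:
        def __init__(self):
            self.memory = {}                      # address -> stored E vector

        def store(self, P, X, E):
            # R is trained toward E: the experience is filed under the P + X context
            self.memory[quantize(P + X)] = list(E)

        def recall(self, P, X):
            # R = M(P + X): the expectation stored for this context, if any
            return self.memory.get(quantize(P + X))

    m = MModule()
    m.store(P=[1.0, 0.0], X=[0.2], E=[5.0, 7.0])    # record experience E in context P + X
    print(m.recall(P=[1.1, 0.0], X=[0.2]))           # a similar context recalls the stored E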
If we assume, as shown in figure 5.24, that predictive recall modules exist at all levels of the processing-generating hierarchy, then it is clear that the memory trace itself is multileveled. In order to recall an experience precisely at all levels, it is necessary to generate the same Pi + Xi address at all levels as existed when the experience was recorded.
If the Mi modules have the property of generalization, they will produce a recall vector Ri which is similar to the stored experience as long as the context vector Pi + Xi is within the neighborhood of the context vector during storage. The more the context vector during recall resembles the context vector during storage, the more the recall vector Ri will resemble the experience vector Ei which was stored. We will examine this property of generalization in greater detail in the next chapter and show how a neurological model of an M module can produce it.
We can say that the predictive memory modules Mi define the brain’s internal model of the external world. They provide answers to the question, “What will happen if I do such and such?” The answer is that whatever happened before when such and such was done will probably happen again. What happened before is what is stored in the Mi modules. In short, IF I do Y, THEN Z will happen; Z is whatever was stored in predictive memory the last time (or some statistical average over the last N times) that I did Y, and Y is some action such as performing a task or pursuing a goal in a particular environment or situation. This is represented internally by the P vectors at the various different levels of the behavior-generating hierarchy and the X vectors describing the states of various other sensory-processing, behavior-generating hierarchies.
Any creature with such a memory structure can hypothesize an action and receive a mental image of the results of that action before it is performed. Furthermore, while the activity of behavior is proceeding, the M modules provide expectations to be compared with the observed sensory data. This allows the sensory-processing modules to detect deviations between the expected and the observed and to modify behavior on the basis of the difference between the current experience and previously stored memories.
The Mi modules (as well as the Hi or Gi modules) can be thought of as storing knowledge in the form of IF/THEN rules. The Pi + Xi input is the IF premise, and the recalled Ri vector is the THEN consequent. Much of the best and most exciting work now going on in the field of artificial intelligence revolves around IF/THEN production rules and how to represent knowledge in large computer programs based on production rules. Practically any kind of knowledge, or set of beliefs, or rules of behavior can be represented as a set of production rules. We will explore this topic in greater depth in later chapters.
CONCLUSIONS
By defining a vector and trajectory notation for describing states and events, we have completed the first major step in our development. We have suggested a hierarchical computing structure that can execute goals, or intended tasks, in an unpredictable environment. We have applied our vector notation to this hierarchy to mathematically and graphically describe the resulting goal-directed behavior. We have shown how a sensory-processing hierarchy that runs parallel to the behavior-generating hierarchy can recognize patterns and detect errors, thus steering behavior along trajectories that lead to success. Finally, we have shown how memory modules, addressed by the state of the behavior hierarchy, can recall previous experiences and generate expectations and predictions of future results.
In the next chapter we will examine a neurological model that has the properties
of the H, G, and M modules we have discussed here. In later chapters we will suggest
how the brain might use such modules to create plans, solve problems, imagine the
future, understand knowledge, and produce language.
CHAPTER 6
A Neurological Model
some of which are confined to small, local clusters of neurons, and others which
may thread through several entirely different regions of the brain. As a result, no
one has yet been able to construct a clear picture of the overall information-processing architecture in the brain. At present, there is no widely accepted theory
that can bridge the gap between hard neurophysiological measurements and
psychological concepts, such as perception and cognition.
Nevertheless, much is known about the structure and function of at least some
parts of the brain, particularly in the periphery of the sensory and motor systems,
and a great deal can be inferred from this knowledge. In one area, the cerebellar cortex, the geometry is sufficiently regular to enable researchers to identify positively a
number of important neurophysiological relationships.
The cerebellum, which is attached to the midbrain and nestles up under the
visual cortex as shown in figure 6.1, is intimately involved with control of rapid,
precise, coordinated movements of limbs, hands, and eyes. Injury to the cerebellum
results in motor deficiencies such as overshoot in reaching for objects, lack of coordination, and the inability to execute delicate tasks or track precisely with the eyes.
During the 1960s, advances in the technology of single-cell recordings and electron microscopy made possible an elegant series of experiments by Sir John Eccles
and a number of others that identified the functional interconnections between the
principal components in the cerebellar cortex. A brief outline of the structure and
function of the cerebellar cortex appears in figure 6.2.
Figure 6.1: Side view of human brain showing the cerebellum attached to the brain stem and
partially hidden by the visual cortex.
• The principal input to the cerebellar cortex arrives via mossy fibers (so named
because they looked like moss to the early workers who first observed them through
a microscope). Mossy fibers carry information from a number of different sources
such as the vestibular system (balance), the reticular formation (alerting), the
cerebral cortex (sensory-motor activity), and sensor organs that measure such quantities as position of joints, tension in tendons, velocity of contraction of muscles,
and pressure on skin. Mossy fibers can be categorized into at least two classes based
on their point of origin: those carrying information that may include commands
from higher levels in the motor system, and those carrying feedback information
about the results of motor outputs. Once these two sets of fibers enter the
cerebellum, however, they intermingle and become virtually indistinguishable.
Figure 6.2: The principal cells and fiber systems of the cerebellar cortex. Command and feedback information arrives via mossy fibers, each of which makes excitatory (+) contact with
several hundred granule cells. Golgi cells sample the response of the granule cells via the
parallel fibers and suppress by inhibitory (-) contacts all but the most highly excited granule
cells. Purkinje cells are the output of the cerebellar cortex. They sum the excitatory (+) effect
of parallel fibers through weighted connections. They also receive inhibitory (-) input from
parallel fibers via basket cell inverters. The strengths of these weights determine the transfer
function of the cerebellar cortex. Climbing fibers are believed to adjust the strength of these
weights so as to train the cerebellum.
The feedback mossy fibers tend to exhibit a systematic regularity in the mapping from point of origin of their information to their termination in the cerebellum. It is thus possible to sketch a map of the body on the surface of the cerebellum corresponding to the origins of feedback mossy fiber information as shown in figure 6.3. This map is not sharply defined, however, and has considerable overlap between regions, due in part to extensive intermingling and multiple overlapping of terminations of the mossy fibers in the cerebellar granule cell layer. Each mossy fiber branches many times and makes excitatory (+) contact with several hundred granule cells spaced over a region several millimeters in diameter.
Granule cells, the most numerous cells in the brain, are estimated to number more than 10^10 in the human cerebellum alone. There are 100 to 1000 times as many
granule cells as mossy fibers. Each granule cell is contacted by one to twelve mossy
fibers and gives off a single output axon which rises toward the surface of the
cerebellum. When it nears the surface this axon splits into two parts which run about
1.5 millimeters in opposite directions along the folded ridges of the cerebellum,
making contact with a number of different kinds of cells in passage. These axons
from the granule cells run parallel to each other in a densely packed sheet—hence the
name “parallel fibers.”
Figure 6.3: A map of the surface of the cerebellar cortex showing the point of origin of mossy
fiber feedback and ultimate destination of Purkinje cell output.
One of the cell types contacted by parallel fibers is the Golgi cell, named for its discoverer. These cells have a widely spread dendritic tree and are excited by parallel fibers over a region about 0.6 millimeters in diameter. Each Golgi cell puts out an axon that branches extensively, making inhibitory (-) contact with up to 100,000 granule cells in its immediate vicinity, including many of the same granule cells that excited it. The dendritic trees and axons of neighboring Golgi cells intermingle, blanketing the entire granular layer with negative feedback. The general effect is that of an automatic gain control on the level of activity in the parallel fiber sheet. The Golgi cells are thought to operate such that only a small and controlled percentage (perhaps one percent or less) of the granule cells are allowed above threshold at any one time regardless of the level of activity of the mossy fiber input. Any particular pattern of activity on the mossy fiber input will produce a few granule cells that are maximally excited and many others that are less than maximally stimulated. The Golgi cells suppress the outputs of all but the few maximally stimulated granule cells. The result is that every input pattern, or vector, is transformed by the granule layer into a small and relatively fixed percentage, or subset, of active parallel fibers.
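The gain-control effect can be sketched as a simple winner-take-all selection: of a large population of granule-cell excitation levels, only a fixed small fraction is allowed above threshold, whatever the overall level of mossy fiber drive. The one-percent figure comes from the text; the sorting method and random inputs below are assumptions of the illustration.

    import random

    def golgi_select(excitations, fraction=0.01):
        # Keep only the most highly excited fraction of granule cells,
        # regardless of the absolute level of mossy fiber input
        n_active = max(1, int(len(excitations) * fraction))
        threshold = sorted(excitations, reverse=True)[n_active - 1]
        return [1 if e >= threshold else 0 for e in excitations]

    excitations = [random.random() for _ in range(10000)]   # granule-cell drive
    active = golgi_select(excitations)
    print(sum(active))    # roughly 100 of the 10000 cells remain above threshold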
The parallel fibers not only contact Golgi cells, but also make excitatory contact with
Purkinje cells and basket and stellate cells (named for their shapes) through
weighted connections (synapses). Each Purkinje cell performs a summation over its
inputs and produces an output that is the output of the cerebellar cortex. The basket
and stellate cells are essentially inverters that provide the Purkinje with negative
weights that are summed along with the positive weights from parallel fibers.
A second set of fibers entering the cerebellar cortex are the climbing fibers, so
named because they climb the Purkinje cells like ivy on a tree. Typically, there is one
climbing fiber for each Purkinje cell, and these climbing fibers are believed to have
some role in adjusting the strength of the weighted synaptic connections with the
parallel fibers so as to alter the Purkinje output. Climbing fibers are thus
hypothesized to provide the information required for learning.
[Figure: CMAC block diagram showing the chain of mappings S → M → A → p, with the climbing fiber input; the overall input-output relations are p = h(S) and, for a set of outputs, P = H(S).]
The input vector S is the sum of a command component and a feedback component; that is,

S = C + F
Some of the elements of the command vector C may define symbolic motor commands such as <REACH>, <PULL BACK>, or <PUSH>. The remainder of the elements in C define arguments, or modifiers, such as the velocity of motion desired, the force required, or the position of the terminal point of a motion. Elements of the feedback vector F may represent physical parameters such as the position of a particular joint, the tension in a tendon, the velocity of contraction of a muscle, or the pressure on a patch of skin.
The S → M Mapping
The vector components of S must be transmitted from their various points of origin to their destination in the cerebellar granular layer. Distances may range from a few inches to more than a foot. This presents a serious engineering problem because, like all nerve axons, mossy fibers are noisy, unreliable, and imprecise information channels with limited dynamic range. Pulse frequency and pulse phase modulation (which the brain uses for data transmission over long distances) are subject to quantization noise and are bandwidth limited. Nerve axons typically cannot transmit pulse rates above five hundred pulses per second. Nevertheless, high-resolution, high-bandwidth data is required for precise control of skilled actions.
The brain solves this problem by encoding each of the high-precision variables to be transmitted so that it can be carried on a large number of low-precision channels. Many mossy fibers are assigned to each input variable such that any one fiber conveys only a small portion of the information content of a single variable.
The nature of this encoding is that any particular mossy fiber will be maximally active over some limited range of the variable that it encodes and less than maximally active over the rest of its variable’s range. For example, the mossy fiber labeled a in figure 6.6 is maximally active whenever the elbow joint is between 90° and 120° and is less than maximally active for all other elbow positions. The mossy fiber labeled b in figure 6.6 is maximally active whenever the elbow angle is greater than 160°. If there are a large number of mossy fibers whose responses have a single maximum but which are maximally active over different intervals, then it is possible to tell the position of the elbow quite precisely by knowing which mossy fibers are maximally active. For example, in figure 6.7, the fact that mossy fibers a, b, and c are maximally active indicates that the elbow joint is between 118° and 120°.

Figure 6.6: Typical responses of mossy fibers to the sensory variable they encode.

Figure 6.7: Three different mossy fibers encoding a single sensory variable (elbow position). All three fibers maximally active simultaneously indicate that the elbow lies between 118° and 120°.
CMAC models this encoding scheme in the following way. Define mi to be the set of mossy fibers assigned to convey the value of the variable si. Define mi* to be the mossy fibers in mi which are maximally stimulated by a particular value of si. If for every value of si over its range there exists a unique set mi* of maximally active mossy fibers, then there is a mapping si → mi* such that knowing mi* (i.e., which fibers in mi are maximally active) tells us what is the value of si. If such a mapping is defined for every component si in the vector S, then we have a mapping

S → M* = (m1*, m2*, . . ., mN*)

where M is the set of all mossy fibers in all of the sets mi, where i = 1, . . ., N. In other words, M is the set of all mossy fibers which encode the variables in the vector S.
In CMAC each of the si → mi* mappings may be defined by a set of K quantizing functions 1C1, 1C2, . . ., 1CK, each of which is offset by 1/Kth of the quantizing interval. An example of this is given in figure 6.8 where K = 4 and N = 2. s1 is represented along the horizontal axis and the range of s1 is covered by four quantizing functions

1C1 = {A, B, C, D, E}
1C2 = {F, G, H, J, K}
1C3 = {M, N, P, Q, R}
1C4 = {S, T, V, W, X}

Each quantizing function is offset from the previous one by one resolution element. For every possible value of s1 there exists a unique set m1* consisting of the set of values produced by the K quantizing functions. For example, in figure 6.8 the value s1 = 7 maps into the set m1* = {B, H, P, V}.
A similar mapping is also performed on s2 by the set of quantizing functions

2C1 = {a, b, c, d, e}
2C2 = {f, g, h, j, k}
2C3 = {m, n, p, q, r}
2C4 = {s, t, v, w, x}

For example, the value s2 = 10 maps into the set m2* = {c, j, q, v}. If the s1 axis in figure 6.8 corresponds to the position of the elbow joint, the mossy fiber labeled B will be maximally active whenever the elbow is between 4 and 7, and less than maximally active whenever the elbow position is outside that region. Similarly, the mossy fiber labeled H is maximally active when the elbow is between 5 and 8; the fiber P is maximally active between 6 and 9, and V between 7 and 10, etc. The combination of mossy fibers in the set m1* = {B, H, P, V} thus indicates that the variable s1 = 7. If s1 changes one resolution element, from 7 to 8 for example, the mossy fiber labeled B will drop out of the maximally active set m1* to be replaced by another labeled C.
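A sketch of this mapping in Python follows. It uses a common CMAC convention in which quantizing function j has its bin boundaries offset by j resolution elements, and it names fibers by (layer, bin) pairs rather than by the letters of figure 6.8; the names therefore differ from the figure, but the behavior is the same: exactly one element of m1* changes per unit step of s1.

    def quantize_variable(s, K=4):
        # Return the K "maximally active mossy fibers" for integer input s.
        # Fiber names are (layer, bin) pairs; layer j is offset by j resolution elements.
        return [(j, (s + j) // K) for j in range(K)]

    m1_star_7 = quantize_variable(7)
    m1_star_8 = quantize_variable(8)
    print(m1_star_7)                          # the four active fibers for s1 = 7
    print(m1_star_8)                          # the four active fibers for s1 = 8
    print(set(m1_star_7) & set(m1_star_8))    # three of the four are shared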
This encoding scheme has a number of advantages. The most obvious is that a single precise variable can be transmitted reliably over a multiplicity of imprecise information channels. The resolution, or information content, of the transmitted variable depends on the number of channels. The more mossy fibers dedicated to a particular variable, the greater the precision with which it is represented.

A second equally important result is that small changes in the value of the input variable si have no effect on most of the elements in mi*. This leads to a property known as generalization, crucial for learning and recall in a world where no two situations are ever exactly the same. In CMAC the extent of the neighborhood of generalization along each variable axis depends on the resolution of the CMAC quantizing functions. In the brain this corresponds to the width of the maximally active region of the mossy fibers.

Figure 6.8: A simple two-variable CMAC with four quantizing functions on each variable. A detailed explanation is in the text.
The M → A Mapping
Just as we can identify mossy fibers by the input variables they encode, so can we identify granule cells by the mossy fibers that provide them input. Each granule cell receives input from several different mossy fibers, and no two granule cells receive input from the same combination of mossy fibers. This means that we can compute a unique name, or address, for each granule cell by simply listing the mossy fibers that contact it. For example, a granule cell contacted by two mossy fibers B and c can be named, or addressed, Bc.

In the CMAC example in figure 6.8, 25 granule cells are identified by their contacts with mossy fibers from the quantizing functions 1C1 and 2C1. Another 25 granule cells are identified by 1C2 and 2C2, 25 by 1C3 and 2C3, and 25 more by 1C4 and 2C4. There are, of course, many other possible combinations of mossy fiber names
that might be used to identify a much larger number of granule cells. For this simple example, however, we will limit our selection to the permutation of corresponding quantizing functions along each of the coordinate axes. This provides a large and representative sample that uniformly spans the input space. Furthermore, this particular naming algorithm is simple to implement in either software or hardware.

We can define A to be the set of all granule cells identified by their mossy fiber inputs. Not all of the granule cells in A are active at the same time. As was previously noted, most granule cells are inhibited from firing by Golgi-cell gain control feedback. Only the small percentage of granule cells whose input mossy fibers are all maximally active can rise above threshold. We will define the set of active granule cells as A*.
Figure 6.9: The weight Bc will be selected as long as the CMAC input vector lies in the region bounded by 4 ≤ s1 ≤ 7, 8 ≤ s2 ≤ 11.
Since we already know which mossy fibers are maximally active (i.e., those mossy fibers in the sets mi*), we can compute names of granule cells in A*. For example, in figures 6.8 and 6.10, if s1 = 7 and s2 = 10, then m1* = {B, H, P, V} and m2* = {c, j, q, v}. The active granule cells in A* can now be computed directly as A* = {Bc, Hj, Pq, Vv}. All other granule cell names in the larger set A involve at least one mossy fiber which is not maximally active, i.e., not in m1* or m2*.
Note that, as illustrated in figure 6.9, the granule cell Bc will be active as long as the input vector remains in the region of input space 4 ≤ s1 ≤ 7 and 8 ≤ s2 ≤ 11. Thus, the generalizing property introduced by the S → M mapping carries through to the naming of active granule cells. A particular granule cell is active whenever the input vector S lies within some extended region, or neighborhood, of input space.
Figure 6.10: The input vector (s1, s2) = (7, 10) selects weights Bc, Hj, Pq, and Vv. These all overlap only at the point (7, 10). If the input vector (s1, s2) moves to (8, 10) the weight Bc will drop out to be replaced by Cc.
Other granule cells are active over other neighborhoods. These neighborhoods overlap, but each is offset from the others so that for any particular input S, the neighborhoods in A* all overlap at only one point, namely the point defined by the input vector. This is illustrated in figure 6.10. If the input vector moves one resolution element in any direction, for example, from (7,10) to (8,10), one active granule cell (Bc) drops out of A* to be replaced by another (Cc).
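Continuing the sketch, granule-cell names for a two-variable input can be formed by pairing corresponding quantizing functions of s1 and s2, just as the text limits the selection to corresponding layers. The tuple-valued addresses are again assumptions standing in for the letter labels of figure 6.8.

    def active_granule_cells(s, K=4):
        # s is a tuple of integer input variables, e.g. (s1, s2).
        # One granule-cell address is formed per layer by combining, for each
        # variable, the bin selected by that layer's quantizing function.
        return [tuple((j, (v + j) // K) for v in s) for j in range(K)]

    A_star_1 = active_granule_cells((7, 10))
    A_star_2 = active_granule_cells((8, 10))
    print(A_star_1)                                 # the |A*| = 4 selected addresses
    print(len(set(A_star_1) & set(A_star_2)))       # 3: neighboring inputs share most addresses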
The A → p Mapping
Granule cells give rise to parallel fibers which act through weighted connections on the Purkinje output cell, varying its firing rate. To each cell in A, then, is associated a weight which may be positive or negative in value. Only the cells in A* have any effect on the Purkinje output cell. Thus, the Purkinje output sums only the weights selected, or addressed, by A*. This sum is the CMAC output scalar variable p. For example, in figure 6.8, S = (7,10) maps into A* = {Bc, Hj, Pq, Vv} which selects the weights

WBc = 1.0
WHj = 2.0
WPq = 1.0
WVv = 0.0

whose sum is the output p = 4.0.
In figure 6.8 four weights are selected for every S vector in input space. Their sum is the value of the output p. As the input vector moves from any point in input space to an adjacent point, one weight drops out to be replaced by another. The difference in value of the new weight minus the old is the difference in value of the output at the two adjacent points. Thus, the difference in adjacent weights is the partial derivative, or partial difference, of the function at that point.

As the input vector S moves over the input space, a value p is output at each point. We can therefore say that the CMAC computes the function

p = h(S)
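Putting the two mappings together, the output is simply the sum of the weights addressed by A*. The sketch below keeps the weights in a dictionary that defaults to zero and assigns the four illustrative values of the example above; the data structure and address names are assumptions of the sketch, not a description of the physical memory.

    from collections import defaultdict

    K = 4

    def active_granule_cells(s):
        # Same addressing scheme as in the earlier sketches
        return [tuple((j, (v + j) // K) for v in s) for j in range(K)]

    weights = defaultdict(float)

    # Give the four cells selected by S = (7, 10) the example values 1.0, 2.0, 1.0, 0.0
    for addr, w in zip(active_granule_cells((7, 10)), [1.0, 2.0, 1.0, 0.0]):
        weights[addr] = w

    def cmac_output(s):
        # p = h(S): the sum of the weights addressed by A*
        return sum(weights[a] for a in active_granule_cells(s))

    print(cmac_output((7, 10)))    # 4.0, the sum of the selected weights
    print(cmac_output((8, 10)))    # differs only by the one weight that changed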
Figure 6.11: The particular set of weights shown in figure 6.8 will compute the function shown
here.
In the cerebellum there are many Purkinje cells that receive input from essentially the same mossy fibers. Thus, there are many CMACs all computing on the same input vector S. We can therefore say that a set of L CMACs computing on the same input vector produces a vector mapping

P = H(S)
One of the most fascinating, intensively studied, and least understood features of the brain is memory, and how data is stored in memory. In the cerebellum each Purkinje cell has a unique fiber, a climbing fiber, which is believed to be related to learning. Recently discovered fibers from an area called the locus coeruleus also appear to be related to learning. In addition, a number of hormones have profound effects on learning and retention of learned experiences.
Although the exact mechanism, or mechanisms, for memory storage are as yet unknown, the cerebellar model upon which CMAC is based hypothesizes that climbing fibers carry error-correction information. The coincidence of climbing fiber error signals with Purkinje cell output punishes synapses that participate in erroneous firing of the Purkinje cell. The amount of error correction occurring at any one experience may depend on factors such as the state of arousal or the emotional importance attached by the brain’s evaluation centers to the data being stored during the learning process. Other plausible assumptions, such as that input from the locus coeruleus might mediate learning by reward/punishment signals, that some chemical or hormonal input might be involved, or that all of the above are involved, await the future attention of serious investigation.
In the work to date, cerebellar learning has been modeled in CMAC by the following procedure:

If |p̂ - p| ≤ ε, where |p̂ - p| is the absolute value of the difference between the desired output p̂ and the actual output p, and ε is an acceptable error, then do nothing; the desired value is already stored.

However, if |p̂ - p| > ε, then add Δ to every weight which was summed to produce p,

where Δ = g (p̂ - p) / |A*|     (1)

and g is a gain factor which controls the amount of error correction produced by one learning experience.
An example of how the function

p = sin x sin y

where x = 2πs1/360
and y = 2πs2/360

can be stored in CMAC is shown in figure 6.12. In this example the input is defined with unity resolution over the space 0 < s1 < 360 and 0 < s2 < 180 and the number of weights selected by each input is |A*| = 32.
All the weights were initially equal to zero. The point S1 = (90,90) was chosen for the first data entry. The value of the desired function p = h(90,90) is 1. By formula (1), where g = 1, each of the weights selected by S = (90,90) is set to 1/32 causing the proper value to be stored at S = (90,90) as shown in figure 6.12a. After two data storage operations, one at (90,90), the other at (270,90), the contents of the CMAC memory are as shown in figure 6.12b. The results of 16 storage operations along the s2 = 90 axis are shown in figure 6.12c. Figure 6.12d shows the contents of the CMAC memory after 175 storage operations scattered over the entire input space.
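The whole procedure can be sketched end to end as follows. For brevity the sketch uses K = 8 overlapping quantizing layers instead of the |A*| = 32 of figure 6.12, and the same tuple-style addresses as the earlier sketches; only the error-correction rule itself, formula (1) with g = 1, is taken from the text.

    import math
    from collections import defaultdict

    K = 8                                 # |A*|: number of weights selected per input
    weights = defaultdict(float)

    def addresses(s):
        # Active granule-cell names for an integer input tuple s = (s1, s2)
        return [tuple((j, (v + j) // K) for v in s) for j in range(K)]

    def output(s):
        return sum(weights[a] for a in addresses(s))

    def train(s, desired, g=1.0, eps=1e-3):
        # Formula (1): if the error exceeds eps, add g * (p_hat - p) / |A*|
        # to every weight that contributed to the output
        error = desired - output(s)
        if abs(error) > eps:
            delta = g * error / K
            for a in addresses(s):
                weights[a] += delta

    def target(s):
        # The function being stored: p = sin x sin y, with x and y in degrees
        return math.sin(math.radians(s[0])) * math.sin(math.radians(s[1]))

    train((90, 90), target((90, 90)))     # one storage operation at (90, 90)
    print(output((90, 90)))               # 1.0: the desired value is now stored
    print(output((91, 90)))               # slightly less: generalization to a neighbor
    print(output((200, 90)))              # 0.0: outside the neighborhood of generalization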
MEMORY SIZE REQUIREMENTS IN CMAC
Figure 6.13b: In CMAC each input selects a unique set of memory locations. The number of
unique sets which can be selected from M locations is much larger than M.
The fact that each possible CMAC input vector selects a unique set of memory
locations rather than a single location implies that any particular location may be
selected by more than one input vector. In fact, the S → A* mapping insures that any two input vectors which are similar (i.e., close together in input space) will activate many of the same granule cells and hence select many of the same weights.
This is what causes CMAC to generalize.
In figure 6.14a the input vector S2 selects three out of four of the same memory locations as S1. Thus, the output h(S2) will be similar to h(S1), differing only by the contents of the single location which is not in common. The S → A* mapping controls the amount of overlap between sets of selected memory locations such that as the input space distance between two input vectors increases, the amount of overlap decreases. Finally, at some distance the overlap becomes zero (except for random hashing collisions) as in figure 6.14b, and the sets of selected memory locations are disjoint. At that point, input S2 can be said to be outside the neighborhood of generalization of S1. The value of the output h(S2) is thus independent of h(S1).
The extent of the neighborhood of generalization depends upon both the number of elements in the set |A*| and the resolution of the si → mi* mappings. It is possible in CMAC to make the neighborhood of generalization broad along some variable axes and limited along others by using different resolution quantizing functions for different input variables. This corresponds to the effect in the cerebellum where some input variables are finely resolved by many mossy fibers and others resolved more coarsely by fewer mossy fibers.

Figure 6.14a: The CMAC memory generalizes. S2 selects three out of four of the same weights as S1. Thus output h(S2) will be similar to h(S1), differing only by the contents of the location not in common.

Figure 6.14b: When S2 is outside of the neighborhood of generalization of S1, the overlap goes to zero (except for random hashing collisions).
A good example of generalization can be seen in figure 6.12a. Following a single data storage operation at S1 = (90,90) we find that an input vector S2 = (91,90) will produce the output p = 31/32 even though nothing had ever been explicitly stored at (91,90). This occurs because S2 selects 31 of the same weights as S1. A third vector S3 = (92,90) or a fourth S4 = (90,92) will produce p = 30/32 because of sharing 30 weights with S1. Not until two input vectors are more than 32 resolution elements apart do they map into disjoint sets of weights.
As a result of generalization, CMAC memory addresses in the same
neighborhood are not independent. Data storage at any point alters the values stored
at neighboring points. Pulling one point to a particular value as in figure 6.12a produces the effect of stretching a rubber sheet.
Generalization has the advantage that data storage, or training, is not required
at every point in the input space in order for an approximately correct response to be obtained. This means that a good first approximation to the correct H function can be stored for a sizable envelope around a TS trajectory by training at only a few
points along that trajectory. For example, figure 6.12c demonstrates that training at
only 16 points along the trajectory defined by s2 = 90 generalizes to approximately
the correct function for all 360 points along that trajectory plus a great many more
points in an envelope around that trajectory. Further training at 175 points scattered
over the entire space generalizes to approximately the correct response for all
360 x 180 (over 64,000) points in the input space as shown in figure 6.12d.
Generalization enables CMAC to predict on the basis of a few representative
learning experiences what the appropriate behavioral response should be for similar
situations. This is essential in order to cope with the complexities of real-world environments where identical TS trajectories seldom, if ever, recur.
Figure 6.15: Information flow diagram for a robot arm controlled by seven CMACs.
Needless to say, predictions based on generalization are not always correct and
sometimes need to be refined by further learning. The ability of CMAC to
discriminate (i.e., to produce different outputs for different inputs S1 and S2) depends upon how many weights selected by S1 are not also selected by S2, and how
different in value those weights are. If two inputs which are close together in input
space are desired to produce significantly different outputs, then repeated training
may be required to overcome the (in this case erroneous) tendency of CMAC to
generalize by building up large differences in the few weights not in common.
Figure 6.16: Two similar trajectories TPa and TPb which have different starting points but the same endpoint. Both trajectories define a version of an Elemental Movement <SLAP> which was taught to the CMACs of figure 6.15.
In most behavioral control situations, sharp discontinuities requiring radically different outputs for highly similar inputs do not occur. Indeed, most servo-control functions have simple S-shaped characteristics along each variable axis. The complexity in control computation in multivariant servo systems typically derives from cross-products which affect the slope of the function or produce skewness and nonsymmetrical hills and valleys in various corners of the N-dimensional space. As can be seen from figure 6.11, these are the types of functions CMAC can readily store and hence compute. Nevertheless, even on smooth functions, generalization may sometimes introduce errors by altering values stored at neighboring locations which were already correct. This type of error corresponds to what psychologists call learning interference, or retroactive inhibition.
Figure 6.17: CMAC learning and generalization performance on the <SLAP> motion. Curve i is with no previous training. Curve ii is after 20 training sessions on the similar trajectory TPa. The improvement of ii over i is due to generalization.
For example, in the learning of the two similar trajectories in figure 6.16, training on TPb causes degradation or interference with what was previously learned on TPa. This can be seen in figure 6.18 where, after 20 training sessions on TPa, the CMAC is trained 20 sessions on TPb. Following this, the performance on TPa is degraded. However, the error rate on TPa quickly improves over another 20 training sessions. Following this, another 20 training sessions are conducted on TPb. Again, degradation in TPa due to learning interference occurs, but not as severely as before. Another set of 20 training sessions on TPa followed by another 20 on TPb shows that the amount of learning interference is declining due to the buildup of values in the few weights which are not common to both TPa and TPb. Thus, learning interference, or retroactive inhibition, is overcome by iterative repetition of the learning process.
The type of learning thus described might be called learning by error correction.
In this case, the H function is learned by comparing its output with a desired result.
When the behavior produced is correct, no change is made. When the behavior pro¬
duced deviates from the ideal, changes are made in synaptic strengths to move the
output toward the ideal. Learning by error correction requires the existence of an
ideal vector P̂ which traces out an ideal trajectory TP̂.
What provides this ideal vector in a biological system? One possibility is the existence of an external teacher. In many learning situations we have a teacher who, like a golf instructor, tells us what we are doing wrong: hold your shoulder down, keep your arm straight, lean over more, etc. In these cases the teacher provides the P̂ vectors and the TP̂ trajectories. The promise of reward or threat of punishment provides the alerting motivation to set the learning gain coefficient to a high value.
A simple example of this type of learning is the programming of a first-generation industrial robot. The robot is led through a series of points corresponding to a desired trajectory TP̂, and a set of selected points is recorded. This training procedure defines an H function on a two-dimensional space where s1 = the task name and s2 = the program step counter. H is defined such that P = P̂ at every point in the table defining the space as shown in figure 6.19.
Even if we assume an external teacher, however, there is still a problem of how the individual components of the P̂ vectors could be generated in the brain. The simplest case to understand is the storage of R vectors in the M modules. The M modules record memories of events that occur in the sensory data stream.
H

TASK NAME (s1)    PROGRAM STEP COUNT (s2)    P̂
C                 1                          p̂1
C                 2                          p̂2
C                 3                          p̂3
C                 4                          p̂4
C                 5                          p̂5
C                 6                          p̂6
.                 .                          .
.                 .                          .
Figure 6.19: A simple controller for an industrial robot. Input s1 defines the name of a task, and input s2 defines the step in the task. The H function is defined by the table. For every set of inputs (s1, s2) there is some point to which the robot should go. The entire set of points in the order defined constitutes the robot's programmed trajectory. The robot is taught by leading it through a desired trajectory defined by the set of desired points p̂j and recording them.
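The record-and-playback scheme of figure 6.19 amounts to a table indexed by task name and step count. The short sketch below records a taught trajectory and plays it back; the task name "C" and the joint values are placeholders invented for the example.

    class TeachableRobot:
        def __init__(self):
            self.table = {}                  # (task name, step count) -> taught point

        def teach(self, task, trajectory):
            # Lead the robot through the desired points and record them
            for step, point in enumerate(trajectory, start=1):
                self.table[(task, step)] = point

        def run(self, task):
            # Play the recorded points back in order: H(s1, s2) = taught point
            step = 1
            while (task, step) in self.table:
                yield self.table[(task, step)]
                step += 1

    robot = TeachableRobot()
    robot.teach("C", [(0.0, 0.0), (0.3, 0.1), (0.6, 0.4), (0.9, 0.4)])
    for point in robot.run("C"):
        print(point)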
This is represented by the E vectors and the TE trajectories. Thus, the E vector can provide
the R teacher for the learning that takes place in the M modules as shown in figure
6.20. Memories of past experiences E are recorded in the M modules as R vectors.
Errors in recalled R vectors are corrected by repeated learning experiences.
Learning in the M modules can be rapid and precise. Often a single learning experience is sufficient to remember a lengthy and detailed record of a complex experience. Repeated learning experiences refine what is stored in the manner suggested by figures 6.17 and 6.18. Memories of sensory experiences are addressed by the state of the brain and particularly by the state of the behavior-generating hierarchy that existed when the memory was stored. Learning interference is dependent upon the degree of similarity between the states of the brain during two learning experiences.
For the H function in the behavior-generating hierarchy, on the other hand, there is no readily discernible internal mechanism that can provide the P̂ vectors and the TP̂ trajectories for error-correction learning. How is it, then, that learning takes place in the H modules so that input commands can produce strings of outputs strung together in such a way that skilled movements result?
One possible mechanism for training the H modules is instrumental conditioning, the type of learning which is most closely associated with B. F. Skinner. In this kind of learning there is no example: the teacher gives no information as to what should be done, what the error is, or what should be done to correct it. The teacher (or the contingencies of the environment) simply rewards whatever behavior is desirable or successful when it occurs and punishes whatever behavior is undesirable.
Figure 6.20: Learning in the M modules is accomplished by using the observed experience vector E as the desired output R of the M module. Thus, memories of experiences are stored at addresses provided by the action decomposition vector P plus the whole brain context vector X. Recall of the stored experience is most precise when the P + X vector is the same as when the experience was stored.
This suggests that the learning of lower level skills in the H modules may occur
by making all the outputs fire when an unlearned input command first occurs, and
then finding out by trial and error which outputs improve performance by being
reduced in value. This also suggests that the M modules in the vision system may
have first stored an example of how an action of the arm or leg should look, and that
the sensory-processing system has available some mechanism for evaluating how
well the visual observation compares with the ideal. Thus, the M modules of one
modality, such as vision, may store ideal sensory trajectories and teach the H
modules of another modality, such as a limb, to produce actions that result in sensory observations that match the stored ideals.
This is a form of error-correction learning. However, it does not necessarily require that the sensory system produce P̂ vectors for training the H modules. In this type of learning, we can assume the sensory modules can only detect whether the results are good or bad and by how much. They don't need to have a way of computing which neurons in the H modules are firing too fast and which are firing too slow. Thus, the sensory system of vision can only provide instrumental conditioning reinforcement signals to the H modules of the limb, and if these signals are primarily negative reinforcers upon the detection of errors, the learning process will terminate when the errors are reduced to zero. This prevents the saturation effects which are otherwise associated with instrumental conditioning.
Even at the higher levels, however, instrumental conditioning would not work without the assumption of a hierarchy. This is because the reinforcing signal usually comes only at the end of a lengthy and complex sequence. For example, in figure 6.21, we see diagrammed the set of trajectories generated by the actions of a cat, which upon seeing a wind-up toy mouse, pursues it, captures it, and only then learns that it is not a real mouse. The disappointment is a negative reinforcement. If there were no hierarchy, the negative reinforcement would be applied only to the set of synapses involved in the capture phase of the activity. In this case, the cat would never learn to ignore the fake mouse. This is because the set of synapses that are active during the capture phase are different from the set which are active in the goal-selection process. Even if the negative reinforcement were applied to every action over the entire <HUNT> task, then all the activities such as <TRACK> and <PURSUE> which occurred during the <HUNT> activity would be inhibited indiscriminately.
However, if the activity is generated in a hierarchy, there can be different time constants for reinforcement at the different levels. A long time constant at the higher level will cause the discovery of the fake to affect the set of weights which are active when the goal <HUNT> is selected. At this level the weights generating the <HUNT> action are activated during the entire <HUNT> sequence. Therefore, the corresponding synapses are well marked by actually being active when the reinforcing signal is received. These same synapses will again be active the next time the mouse runs. A short time constant at the lower level will leave the <TRACK> and <PURSUE> subgoals unaffected. The learning of the proper behavioral response
Figure 6.21: A pair of trajectories generated in the behavior of a cat upon being fooled by a
wind-up toy mouse. The cat tracks, pursues, and captures it, only then to learn that it is not a
real mouse. This diagram illustrates how the cat can learn not to pursue a toy mouse even
though the reinforcer occurred during the capture phase, and not immediately following the
pursuit phase. Explanation is in text. Learning occurs at many different levels in the hierarchy.
At all levels, learning works by storing what just occurred in the recent past. Memory storage
is accomplished by altering synaptic sites that were active shortly before the storage signal, or
learning reinforcer, occurred. Different levels have different time constants. At higher levels,
the time rate of change of behavior and sensory vectors is slower. Thus, the learning reinforcer
at higher levels can operate over longer time intervals. Learning of high-level behavioral pat¬
terns can therefore be effective even when reinforcement occurs after a lengthy delay.
for the real and fake mouse is made at the higher level on the set of synapses that are
different in the real and fake situations. Those evoked only by the real mouse are increased by the reward of capturing the prey. Those evoked only by the fake mouse
are decreased by the disappointment of capturing a fake. Thus, the cat quickly
learns to select the <HUNT> goal only in the case of the real mouse.
This, of course, assumes that the cat can distinguish between the real and the
fake. In learning situations where there is no distinguishable difference between
stimuli leading to reward and punishment, the subject becomes uncertain and incapable of decisive action.
A third kind of learning is stimulus-response, or classical conditioning. This
type of learning is most often associated with Pavlov. It consists of the procedure of
presenting a conditioned stimulus (CS) such as the ringing of a bell just before an
unconditioned stimulus (US) such as the taste of food. The unconditioned stimulus
interconnections that produce behavior are predetermined and fixed. For higher
forms, at least some of the predetermined synaptic interconnection patterns are
modifiable. Even in humans, there are basic prewired reflexes which form the basic
neuronal substrate in which learning takes place. The learning process doesn’t begin
with a blank slate, but with a set of genetically prewired computing modules that are
modifiable in some creatures.
In a hierarchy, learning must begin from the bottom up. In a newborn infant,
the neuronal computation centers in the motor hierarchy are defined only by a few
basic inborn reflexes, which themselves have developed in the context of the
environment of the womb. Somehow a change occurs in the H, M, and G modules
so that input commands can produce outputs strung together in such a way that
skilled movements result.
Studies in child development indicate that simple motion primitives are learned
first. Piaget has categorized the sequence of motor development in children into 11
stages over the first six years. Table I shows the sequential series of motor and
perceptual skills that are acquired by a child in sequence as learning and maturation
progress. At birth the child exhibits only reflexive grasping and arm-waving. Soon
it develops the ability to intentionally control these. At about four months the child
can track hand movements with the eyes and shortly thereafter can direct its hand to
a target selected by the eyes. Soon thereafter the child learns to manipulate objects
and distinguish between the changing shapes projected onto the retina by a rigid
object as it is rotated or moved and the changes wrought by the mechanical deformation of a plastic object. Primitive movements, which are simple adaptations of
prewired reflexes, are learned first. Then sequences of primitives are strung together
into elemental movements. Strings of elemental movements are put together into
simple skills and strings of simple skills become complex skills, etc. In the early
years, instant reinforcement is important because the time constants of the lower
levels of the behavioral hierarchy are short. As learning moves to the higher levels,
delayed reinforcement is adequate.
A similar progression in language skills can be observed. The newborn infant is
born with only the most basic verbal reflexes. At first speech primitives consisting of
coos, gurgles, cries, and various phonetic sounds are learned. These are followed by
strings of primitives formed into words and then strings of words combined into
phrases. At each level, the M modules store sounds from the environment as TR trajectories. Later the behavior-generating system learns to produce verbal outputs which mimic or duplicate these stored trajectories.
This sequence of events has also been demonstrated in birds. Marler, Tamura
and Konishi have shown that a young white-crowned sparrow must hear the song of
an adult at a particular time during its maturation in order to learn to sing. The
young bird does not actually sing until a year after this learning experience. If the
bird is deafened after hearing the adult song (i.e., after storage of the song in M) but
before hearing itself sing (i.e., before transfer of the memory in Mto the behavior in
//), the result is the same as if the bird had been deafened at birth. However, once a
bird has had an opportunity to compare its own song with the remembered song, it
can continue to sing normally even if it is subsequently deafened.
Similarly in human children, the sound of adult speech is heard and
remembered and at some future date is reproduced. Children are known to practice
making sounds and to repeat phrases to themselves, testing the sounds and matching
them to remembered experiences. By this means the child learns pronunciation and
acquires not only words and idioms, but also dialects and accents.
This theory of learning assumes that there is some mechanism that has the
capacity for measuring the degree of correspondence between the currently observed
experience and the recalled memory. This is the function of the G modules, which
compare the sensory data stream with the recalled expectations from the M modules.
Since this function is a prerequisite to learning we can assume that the G functions
are primarily prewired. However, it seems probable that the G functions are also
subsequently modified by learning. This could occur by the mechanism of reinforcing the detection of certain features particularly advantageous for distinguishing
between situations leading to reward and those leading to punishment.
To a very large extent the process of recognition and perception consists of
creating the correct set of hypotheses in the behavior-generating hierarchy that
addresses the M modules in a way that generates a set of expectations to match the
current sensory data stream. The computation of match, or the filtering of data
through a mask provided by the M modules, is a more or less standard type of
computation that might well be genetically predetermined. Thus, learning can play
some role in the ability of the G functions to extract features but not in the correlation of the R vectors with the E vectors.
CMAC AS A COMPUTER
The ability of CMAC to store and recall (and hence compute) a general class of
multivariant mathematical functions of the form P = H(S) suggests that a relatively
small cluster of neurons might also be able to calculate the type of mathematical
functions required for multivariant servos, coordinate transformations, conditional
branches, task-decomposition operators, and IF/THEN production rules. These are
the types of functions discussed in Chapter 5 that were required for generating goal-
directed behavior (i.e., the purposive strings of behavioral patterns such as running,
jumping, flying, hunting, fleeing, fighting, and mating which are routinely accomplished with apparent ease by the tiniest rodents, birds, and even insects).
In the case of multivariant servos the S vector corresponds to commands plus
feedback (i.e., S = C + F). For coordinate transformations the S vector contains
the arguments as well as the variables in the transformation matrix.
TABLE I
Eleven distinctly separate hierarchical stages of sensory-motor development in human children between birth and about seven years of age (after Piaget and Inhelder).
p = sin x sin y     if s3 = 0
p = 3x + 5y^2       if s3 = 50

In the interval 0 < s3 < 50, the function would change smoothly from p = sin x sin y to p = 3x + 5y^2. Additional functions could be stored for other values of s3, or other conditional variables s4, s5, . . . might be used for additional branching capabilities.
If these conditional variables are part of a command vector, then each different input command can select a different subgoal generator. If they are part of the feedback, then different environmental conditions can trigger different behavioral patterns for accomplishing the subgoals.
Finite-State Automata
If some of the variables in the P output vector loop directly back to become part
of the S input vector (as frequently happens in the cerebellum as well as in other
parts of the brain), then CMAC becomes a type of finite-state automaton, string
generator, or task-decomposition operator. For example, the CMAC in figure 6.22a
behaves like the finite-state automaton in 6.22b. The loop-back inputs s1 and s2 define the state of the machine and s3 is the input. The H function defines the state
transition table. In general, it is possible to construct a CMAC equivalent of any
finite-state automaton. Of course, CMAC can accept inputs and produce outputs
which are non-binary. Furthermore, the outputs generalize. Thus, CMAC is a sort
of “fuzzy-state automaton.”
a) H

s1 s2 s3 | p1 p2
 0  0  0 |  1  0
 1  0  0 |  0  0
 0  1  0 |  0  1
 1  1  0 |  .  .
 0  0  1 |  1  1
 1  0  1 |  0  1
 0  1  1 |  0  0
 1  1  1 |  1  0

b) [State-transition diagram corresponding to the table in (a).]
Figure 6.22: A CMAC with feedback directly from output to input behaves like a finite-state automaton for binary inputs and outputs. It behaves like a "fuzzy-state automaton" for non-binary S and P variables. (s1, s2) = the state; (p1, p2) = the next state.
A CMAC with direct feedback from output to input demonstrates how a neural
cluster can generate a string of outputs (subgoals) in response to a single input, or
unchanging string of inputs. Additional variables added to F from an external
source increase the dimensionality of the input space and can thus alter the output
string (task decomposition) in response to environmental conditions.
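The string-generator idea can be sketched in a few lines: a table-lookup H function whose output is fed directly back as its state input steps through a string of outputs even though the external input never changes. The transition table below is invented for illustration and is not the one shown in figure 6.22.

# Outputs (p1, p2) are fed straight back as the state inputs (s1, s2), so a
# constant input s3 produces a string of outputs.
H = {
    (0, 0, 0): (0, 1),
    (0, 1, 0): (1, 0),
    (1, 0, 0): (1, 1),
    (1, 1, 0): (0, 0),
}

state = (0, 0)            # (s1, s2): the fed-back state
s3 = 0                    # the external input, held constant
subgoals = []
for _ in range(6):
    p = H[(state[0], state[1], s3)]   # one table recall per step
    subgoals.append(p)                # the generated string of subgoals
    state = p                         # direct feedback: output becomes next state
print(subgoals)           # (0,1) -> (1,0) -> (1,1) -> (0,0) -> (0,1) -> (1,0)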
The different possible feedback pathways to a CMAC control module cast light
on a long-standing controversy in neurophysiology: are behavioral patterns
generated by “stimulus-response chaining” (i.e., a sequence of actions in which
feedback from sensory organs is required to step from one action to the next) or by
“central patterning” (i.e., a sequence which is generated by internal means alone)?
A CMAC hierarchy can include tight feedback loops from the output of one level
back to its own input to generate central patterns, and longer internal loops from
one level to another to cycle through a sequence of central patterns, as well as feed¬
back from the environment to select or modify central patterns or their sequence in
accordance with environmental conditions.
Computing Integrals
Direct feedback from output to input can also be used to compute the integral
of a function. If an input carrying feedback from the output is simply added to the
other inputs, the resultant function is the integral of the function computed on the
non-feedback inputs. This is illustrated in figure 6.23. If the function p(k) = h(s1(k−1),
. . . , sN(k−1)) is computed by the module in figure 6.23a without feedback, then the
function q(k) = h(s1(k−1), . . . , sN(k−1)) + q(k−1) computed by the module in figure 6.23b is
the integral of p. For example, if an h module computes the function p = s1s2, then
an h′ module computing q = s1s2 + q computes the integral of that function.
Figure 6.23: If the output of a CMAC is fed back to an input which simply adds it to the h
function otherwise computed, then the CMAC computes the integral of that h function.
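A minimal numerical sketch of this integrator follows. The sample interval dt and the particular h function are illustrative; the text's formulation is the pure running sum q(k) = h(s(k−1)) + q(k−1), which the sketch scales by dt so that the sum approximates a time integral.

def h(s1, s2):
    return s1 * s2                 # the function computed on the non-feedback inputs

q = 0.0                            # the fed-back output
dt = 0.01                          # illustrative sample interval
for k in range(100):
    s1, s2 = k * dt, 2.0           # s1 ramps up, s2 is held constant
    q = h(s1, s2) * dt + q         # q(k) = h(s(k-1)) * dt + q(k-1)
print(q)                           # approx. the integral of 2t over [0, 1), i.e. about 1.0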
IF/THEN Productions
The ability of CMAC to compute simple arithmetic functions also suggests how
CMAC might implement IF/THEN production rules. If the S vector (or the TS
trajectory) corresponds to a set of conditions making up an IF premise, then the P
vector (or TP trajectory) output is the THEN consequent. We have already shown
how symbols can be represented as vectors and vice versa. Thus, the computation of
an IF/THEN rule reduces to an arithmetic computation of the form P = H(S).
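A hedged sketch of such a production rule as a single table recall follows, with invented symbol encodings standing in for the vector representations discussed earlier.

PREMISE = {"hungry": (1, 0), "predator_near": (0, 1)}     # conditions encoded as S vectors
CONSEQUENT = {(0, 1): "seek food", (1, 1): "flee"}        # P vectors decoded back to symbols
H = {(1, 0): (0, 1), (0, 1): (1, 1)}                      # the stored premise-to-consequent mapping

def fire_rule(condition):
    S = PREMISE[condition]   # symbol -> vector (the IF premise)
    P = H[S]                 # P = H(S): the production rule as an arithmetic recall
    return CONSEQUENT[P]     # vector -> symbol (the THEN consequent)

print(fire_rule("predator_near"))   # prints "flee"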
The capability of CMAC to simulate a finite-state automaton, to execute the
equivalent of a conditional branch, and to compute a broad class of multivariant
arithmetic and integral functions implies that it is possible to construct a goal¬
seeking hierarchy of H, M, and G modules using nothing but CMACs. Conversely,
it is possible to construct a hierarchy of computing modules, perhaps implemented
on a network of microcomputers, which is the equivalent of a CMAC hierarchy.
This has profound implications regarding the type of computing architecture which
might be used to build a model of the brain for robot control. It suggests how we
might structure control systems so as to give robots the skills and intellectual
capabilities of biological organisms. We will return to these and other practical
issues for robot-control systems in Chapter 9.
CHAPTER 7
Modeling the Higher Functions
The CMAC H function can be chosen to mimic the neuronal characteristics of different areas in the brain.
It’s an old idea that the central nervous system, which generates behavior in
biological organisms, is hierarchically structured. The idea dates back considerably
more than a century. The analogy is often made to a military command structure,
with many hundreds of operational units and thousands, even millions, of in¬
dividual soldiers coordinated in the execution of complex tasks or goals. In this
analogy each computing center in the behavior-generating hierarchy is like a military
command post, receiving commands from immediate superiors and issuing se¬
quences of subcommands which carry out those commands to subordinates.
Feedback is provided to each level by a sensory-processing hierarchy that
ascends parallel to the behavior-generating hierarchy, and that operates on a data
stream derived from sensory units that monitor the external environment as well as
from lower level command centers that report on the progress being made in carry¬
ing out their subcommands. Feedback is processed at many levels in this ascending
hierarchy by intelligence analysis centers that extract data relevant to the command
and control function being performed by the behavior-generating module at that
level.
Each of these intelligence analysis centers makes predictions based on the
results expected (i.e., casualties, rewards, sensory data patterns) because of actions
taken. The intelligence centers then interpret the sensory data they receive in the con¬
text of these predictions. For example, in the military intelligence analogy, a loss of
60 men in an operation where losses had been predicted at 600 implies an unex¬
pectedly easy success and perhaps indicates a weakness in the enemy position that
should be further exploited. In the brain the observation of 60 nerve impulses on an
axon where 600 had been anticipated may imply an unexpectedly weak branch in a
tree which, if used for support, could result in a fatal fall.
The response of each command post (or data analysis center) in the hierarchy to
its input depends on how it has been trained. Basic training teaches each soldier how
to do things the “army way” (i.e., what each command means and how it should be
carried out). Each operational unit in the military has a field manual that defines the
proper or ideal response of that unit to every foreseeable battlefield situation. Each
field manual is essentially a set of IF/THEN production rules or case statements,
corresponding to a set of CMAC functions, P = H(S) or Q = G(D). At the lowest
level in the military analogy these rules define the proper procedures for maintaining
and operating weapons, as well as the proper behavioral patterns for surviving and
carrying out assignments under battlefield conditions. At higher levels they define
the proper tactics for executing various kinds of maneuvers. At the highest level,
they define the proper strategy for deployment of resources and achievement of ob¬
jectives.
In the case where each unit carries out its assignment “according to the book,”
the overall operation runs smoothly and the goal is achieved on schedule as ex¬
pected. To the extent that various units do not follow their ideal trajectories, either
There is, in fact, some evidence to suggest that the human brain is topologically
similar to three (or more) concentric paraboloid hierarchies as illustrated in figure
7.2. Paul MacLean and others have hypothesized a triune brain wherein the inner
Figure 7.2: The human brain is hypothesized to be a composite structure consisting of at least
three layers: (1) a reptilian brain which provides basic reflexes and instinctive responses; (2) an
old mammalian brain which is more sophisticated and capable of emotions and delayed
responses; and (3) a new mammalian brain which can imagine, plan, and manipulate abstract
symbols. The outer layers inhibit and modulate the more primitive tendencies of the inner
layers.
core is a primitive structure (i.e., the reptilian brain) which provides vital functions
such as breathing and basic reflexive or instinctive responses such as eating, fighting,
fleeing, and reproductive activities. Superimposed on this inner core is a second
layer (i.e., the old mammalian brain) that is capable of more sophisticated sensory
analysis and control. This second layer tends to inhibit the simple and direct
responses of the first so they can be applied more selectively and responses can be
delayed until opportune moments. This second brain thus provides the elements of
planning and problem-solving, prediction, expectation, emotional evaluation, and
delayed response to stimuli that have disappeared from direct observation. These are
the characteristics of behavior which make possible the complex hunting strategies
and social interactions of the mammals.
On top of this is yet a third layer (i.e., the new mammalian brain) which
possesses the capacity to manipulate the other two layers in extremely subtle ways:
to conceive elaborate plans, to imagine the unseen, to scheme and connive, to
generate and recognize signs and symbols, to speak and understand what is spoken.
The outer layers employ much more sophisticated sensory analysis and control
algorithms that detect greater subtleties and make more complex decisions than the
inner more primitive layers are capable of performing. Under normal conditions the
outer layers modify, modulate, and sometimes even reverse the sense of the more
primitive responses of the inner layers. However, during periods of stress, the highly
sophisticated outer layers may encounter computational overload and become con¬
fused or panicked. When this happens, the inner core hierarchy may be released
from inhibition and execute one of the primitive survival procedures stored in it
(i.e., fight, flee, or freeze). A similar takeover by the inner hierarchy can occur if the
more delicate circuitry of the outer is disrupted by physical injury or other trauma.
Thus the brain uses its redundancy to increase reliability in a hostile environment.
Of course, all three layers of the behavior-generating hierarchy come together
at the bottom level in the motor neurons—the final common pathway.
In the military hierarchy analogy, the motor neurons are the foot soldiers. They
actually drive the muscles and glands to produce action. Their output firing rate
defines the output trajectory of the behavior-generating hierarchy. A CMAC
representing a spinal motor neuron and its associated interneurons receives com¬
mands C from higher motor centers as well as feedback F from stretch receptors and
tendon organs via the dorsal roots. Additional components of the F vector come
from other motor neurons reporting ongoing activity in related muscles. Com¬
ponents of the command vector to this lowest level come from the vestibular system,
which provides inertial reference signals necessary for posture and balance. Other
components of C come from the reticular formation, the red nucleus, and in
primates, also directly from the motor cortex. A more detailed description of the ac¬
tions of this lowest level motor computational center is contained in the section on
the stretch reflex in Chapter 4. See particularly figures 4.6 through 4.9.
There is nothing analogous to climbing fibers for the motor neurons, but this is
not surprising since there is evidence that little or no learning takes place at this first
level in the behavior-generating hierarchy.
Much of the vestibular system input passes through, or is modulated by, the
cerebellum, which receives feedback from joint position sensors, tendon tension sen¬
sors, and skin touch sensors. Thus, parts of the cerebellum, together with the
primary motor cortex, the red nucleus, and the reticular formation represent a sec¬
ond level in the motor hierarchy.
The second level of the new mammalian motor hierarchy includes some neurons
in the cortex. The motor cortex contribution to the second level has been called the
transcortical servo-loop by Phillips. Evarts and Tanji have observed cells in the
motor cortex whose response P to a stretch stimulus F can be altered (indeed com¬
pletely reversed) by different command inputs C. As shown in figure 7.3, an ex¬
perimental animal was trained to pull a lever upon feeling a jerk if a red light pre¬
ceded the stimulus and push the lever if a green light preceded the stimulus. Both the
command C (low firing rate = red, high = green) and the altered response P (pull
if C = low, push if C = high) are observed. There is a measurable time delay which
clearly separates the effect of feedback to the lowest level (10-20 milliseconds), feed¬
back to the second level (30-50 milliseconds), and changes in command inputs to the
second level (100-200 milliseconds).
Other experiments by Evarts and Thach have shown that neurons in the
cerebellum, thalamus, and motor cortex alter their firing rates at various intervals
prior to learned movements, and well in advance of any response feedback. This is
the propagation of goals and subgoals down the motor hierarchy as the various
levels receive commands and issue subcommands in preparation for the initiation of
a task.
Further evidence that hierarchical structures exist and function as AND/OR
task-decomposition operators in the generation and control of motor behavior can
be found in almost any neurophysiological textbook. For example, brain stem
transection experiments with animals and observations of injured humans where the
spinal cord is severed at different levels have demonstrated a consistent hierarchical
structuring of the sensory-motor system. If, as is shown in figure 7.4, the cord is
severed from the brain along the line A-A, most of the basic motor patterns such as
the flexor reflex and the reflexes that control the basic rhythm and patterns of
locomotion remain intact. However, coordinated activation of these patterns to
stand up and support the body against gravity requires that the regions below B-B be
intact.
The stringing together of different postures to permit walking and turning
movements requires the regions below C-C to be undamaged. In particular it is
known that the rotational movements of the head and eyes are generated in the in-
Figure 7.3: Firing rate of a motor-cortex neuron in an experiment designed to examine the
relation between voluntary and reflex responses. A red or green “get-set” light was turned on
from 1-5 s before a “go” signal, a mechanical displacement of the handle from its neutral
position. The change in firing rate in response to the get-set signals required about 200 ms to
appear. Within about 10 ms of the “go” signal, a reflex response to the mechanical displace¬
ment can be seen. After about 40 ms, the voluntary response of push (increase firing) or pull
(stop firing) can be seen to appear. The silence at the top far right and the activity at the bot¬
tom far right are due to the subject returning the handle to neutral after pushing (top) or pull¬
ing (bottom). [From “Brain Mechanisms of Movement,” by E. V. Evarts. Copyright © 1979 by Scientific American,
Inc. All rights reserved.]
terstitial nucleus; the raising and lowering of the head in the prestitial nucleus; and
the flexing movements of the head and body in the nucleus precommissuralis.
Stimulation of the subthalamic nuclei can cause rhythmic motions including walk¬
ing. A cat with its brain sectioned along C-C can walk almost normally. However, it
cannot vary its walking patterns to avoid obstacles.
Figure 7.4: The hierarchy of motor control that exists in the extrapyramidal motor system.
Basic reflexes remain even if the brain stem is cut at A-A. Coordination of these reflexes for
standing is possible if the cut is at B-B. The sequential coordination required for walking re¬
quires the area below C-C to be operable. Simple tasks can be executed if the region below
D-D is intact. Lengthy tasks and complex goals require the cerebral cortex.
Animals whose brains are cut along the line D-D can walk, avoid obstacles, eat,
fight, and carry on normal sexual activities. However, they lack purposiveness. They
cannot execute lengthy tasks or goals. Humans with brain disease in the basal
ganglia might perform an apparently normal pattern of movements for a few
seconds and then abruptly switch to a different pattern, and then another. One form
of this disease is called St. Vitus’ dance.
Higher levels of the behavior-generating hierarchy become increasingly difficult
to identify and localize, but there is much to indicate that many additional levels
exist in the cerebral cortex. For example, the motor cortex appears to be responsible
for initiating commands for complex tasks. The ability to organize lengthy se¬
quences of tasks, such as the ability to arrange words into a coherent thought or to
recall the memory of a lengthy past experience, seems to reside in the posterior tem¬
poral lobe. Interactions between emotions and intentional behavior appear to take
place in the mediobasal cortex, and long term plans and goals are believed to derive
from activity in the frontal cortex. Hierarchies of different systems (i.e., vision,
hearing, manipulation, and locomotion) merge together in the association areas.
Figure 7.5: A two-dimensional array of sensory-processing CMACs such as might exist in the
visual system. The observed sensory image E1 plus the prediction vector R1 enters and is
recognized by the operator G1 as a pattern. The vector R1 may select one of many filter func¬
tions or provide an expected image or map to be compared against the observed image.
light intensity (perhaps in a particular color band) together with predicted variables
R, which select a particular filter function. The output Q1 = G1(D1) then might
define a pattern of edges or line segments. This output forms part of the input E2 to
the second level. Output from the second level, Q2 = G2(D2), might define patterns
of connected regions or segments.
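The sort of first-level filter function G1 described here can be sketched as a small convolution applied to an intensity image; the kernel and test image below are illustrative, and the gradient filter merely stands in for whichever filter function the prediction vector R might select.

import numpy as np

def g1_edges(image):
    # Simple horizontal and vertical gradient filters applied by convolution.
    kx = np.array([[-1, 0, 1]])
    ky = kx.T
    pad = np.pad(image, 1, mode="edge")
    gx = sum(kx[0, j] * pad[1:-1, j:j + image.shape[1]] for j in range(3))
    gy = sum(ky[j, 0] * pad[j:j + image.shape[0], 1:-1] for j in range(3))
    return np.hypot(gx, gy)          # edge-strength map, one value per pixel

image = np.zeros((8, 8))
image[:, 4:] = 1.0                   # a vertical step edge in intensity
print(g1_edges(image))               # large values along the column where intensity changes

The resulting edge map would form part of the input E2 passed up to the second level.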
Recent work by David Marr at the Massachusetts Institute of Technology and
Jay Tennenbaum at SRI International suggests that the output vectors Qi at various
levels may define more than one type of feature. For example, a single level in the
visual-processing system might contain a depth image (derived from stereo disparity,
light gradients, local edge-interaction cues, etc.), a velocity image (derived from mo¬
tion detectors), and an outline-drawing image (derived from edge detectors, line,
and corner finders) in addition to brightness, color, and texture images of the visual
field as shown in figure 7.6. These and many other kinds of information appear to
Figure 7.6: A set of intrinsic images (b) through (e) derived from the single monochrome in¬
tensity image (a). The images are depicted as line drawings, but, in fact, contain values at
every point. The solid lines in the intrinsic images represent discontinuities in the scene
characteristic; the dashed lines represent discontinuities in its derivative. The distance image
(b) gives the range along the line of sight to each visible point in the scene. The reflection im¬
age (c) gives the ratio of reflected light to illumination at every point. The orientation image
(d) gives a vector representing the direction of the surface normal at every point. The illumina¬
tion image (e) gives the amount of light falling on the scene at every point.
Figure 7.7: The various intrinsic images can be developed at various hierarchical levels and
brought into registration for sophisticated parallel computations related to image understand¬
ing.
CROSS-COUPLING
In the auditory system, cochlear hair cells are excited by mechanical and electrical stimuli with fre¬
quencies ranging from about 20 Hz to 20,000 Hz. These sensory inputs thus have
periodicities from 0.00005 to 0.05 seconds.
The highest frequency a nerve axon can transmit is about 500 Hz, but the brain
handles higher frequencies in a manner somewhat reminiscent of the cerebellum’s
encoding of precise position. It encodes pieces of information about the phase of a
wavefront on a number of different fibers. (See figure 3.27.) This means that by
knowing which fibers are firing in which combinations at which instants, one can
compute not only what is the fundamental pitch of the temporal pattern but what
are all of its overtones. Thus, the CMAC G function at the lowest level (or really the
loop comprised of the lowest level G, H, and M modules) can compute the Fourier
transform, or the autocorrelation function, and presumably even the Bessel function
describing the modes of vibration of the cochlear membrane.
Assume, for example, that the G, H, and M modules in figure 7.8 constitute a
phase-lock loop such that the input PATTERN is a signal f(t) and the PREDIC¬
TION is another signal f(t − τ). If the processing module G computes the integral
Figure 7.8: A phase-lock loop consisting of a G, H, and M module. If the H and M modules
produce a set of signals with nearly the same periodicity as the incoming signal E, the G func¬
tion can compute a phase error signal F which pulls the R prediction into lock with the E
observation. The G module can then also compute an autocorrelation function which gives a
perception of pitch.
of the product of the PATTERN • PREDICTION, then the output NAME is
∫f(t) • f(t − τ) dt. When τ corresponds to 1/4 of the period of the input f(t), the in¬
tegral of the output will produce a phase ERROR signal which, when applied to the
H module, can enable the PREDICTION signal f(t − τ) to track and lock onto the
input PATTERN f(t). If the loop consists of a multiplicity of pathways with dif¬
ferent delays (τ > 0), the output, when integrated, will produce an autocorrelation
function
φ(τ) = lim(T→∞) (1/2T) ∫−T to +T f(t) • f(t − τ) dt

such that q1 = φ(τ1), q2 = φ(τ2), . . . , qn = φ(τn).
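A numerical sketch of this computation follows: delayed copies of the input stand in for the PREDICTION signals, and the averaged products qi = φ(τi) peak at the delay matching the period of the input, which corresponds to the perception of pitch. The sample rate, test signal, and lag range are illustrative.

import numpy as np

fs = 8000                                    # sample rate in Hz (illustrative)
t = np.arange(0, 0.05, 1.0 / fs)
signal = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 400 * t)  # 200 Hz fundamental plus one overtone

def phi(f, lag):
    # finite-window estimate of phi(tau): the average of f(t) * f(t - tau)
    return np.mean(f[lag:] * f[:-lag])

lags = np.arange(16, 80)                     # delays corresponding to roughly 100-500 Hz
q = np.array([phi(signal, k) for k in lags])
best = lags[np.argmax(q)]                    # the delay at which the products lock on
print("estimated pitch:", fs / best, "Hz")   # close to the 200 Hz fundamental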
Figure 7.9 suggests how a hierarchy of phase locked loops might interact to
recognize the variety of periodicities which provide the information content in
spoken language and music. The coefficients qi obtained from the lowest level loop
form the input (together with other variables) to the second level.
If we assume that the sensory input to the first level consists of a pattern rich in
information, such as music or speech, then as time progresses the trajectory of the
input vector to the second level will also contain many periodicities. The principal
difference from the standpoint of information theory is that the periodicity is now
on the order of 0.05 seconds to 0.5 seconds. The trajectory input to the second level
can, of course, be subjected to a quite similar mathematical analysis as were the tra¬
jectories of hair cell distortions and cochlear electrical stimulation which were input
to the first level.
The principal difference is that at the second level and higher, information can
be encoded for neural transmission by pulse frequency rather than pulse-phase
modulation. Also, some of the mechanisms by which time integrals are computed
Figure 7.9: A cross-coupled hierarchy in the hearing-speech system. The generating hierarchy
decomposes language goals into strings of verbal output. When speech is being generated, the
sensory-processing hierarchy provides feedback to control intensity and modulation. For
listening only, the generating hierarchy provides hypotheses and predictions for use in detect¬
ing, recognizing, following, and understanding the sensory input.
computes names of strings of words which it calls sentences (or phrases), strings of
tunes which it calls musical passages, etc. In music, the pattern in which the different
periodicities match up as multiples and submultiples (i.e., the beat, notes, various
voices, melodies, and chord sequences) comprise the inner structure, harmony, or
“meaning.” The ability of the sensory-processing generating hierarchy of the
listener to lock onto the periodicities and harmonies at many different levels (and
hence many different periodic intervals) is the ability to “appreciate” or “under¬
stand” the music.
Similarly, in speech the ability of the audio-processing hierarchy to lock on to
periodicities at each level and to detect or recognize and pass on to the next level the
information bearing modulations or deviations in those periodicities constitutes the
ability to “understand” what is spoken. If the audio system locks on only at the first
level, it detects phonetic sounds but not words. If it locks on the first two levels but
no higher, it detects words but not meaningful phrases. If, however, the audio
hierarchy locks on at the third, fourth, fifth, and higher levels, there is excited in the
mind of the listener many of the same trajectories and sequences of interrelated and
harmonious patterns (i.e., goals, hypotheses, sensory experiences) as exist in the
mind of the speaker.
This suggests that we can define understanding to be the lock-on phenomenon
that occurs at many levels in the processing-generating hierarchy of the one who
understands. The depth of understanding depends on how many levels lock onto the
sensory data stream, as well as on the degree of precision with which the various
hypotheses generated at the different levels can track and predict the incoming sen¬
sory data stream.
In general, it is easier to follow a trajectory than to reproduce it. When observ¬
ing a procedure, the generating hierarchy of the observer merely needs to produce
hypotheses that are in the right vicinity so that they can be synchronized with the
sensory input. Uncertainties at branch points in TP do not matter greatly because er¬
rors are quickly corrected by comparing TR with TE.
On the other hand, reproducing a procedure requires that the H functions be
capable of generating TP trajectories that are precise over their entire length. They
must not wander outside of the success envelope or miss any critical branch points.
This is a much more exacting computational program. This suggests why a student
might be able to follow the reasoning of a professor’s lecture, but be unable to
reproduce it on an exam. It explains why deep understanding requires drill and prac¬
tice. Understanding is the product of an intimate interlocking interaction between
the sensory-processing and behavior-generating hierarchies at many different levels.
If we define understanding to be the generation of hypotheses that can recall R
vectors and TR trajectories that track and predict the incoming sensory data, then a
great number of phenomena related to perception become clear. For example, the
spontaneous reversal of ambiguous figures, such as the wire-frame Necker cube, the
staircase, or the face-goblet figure shown in figures 7.10 through 7.12, can be ex¬
plained by assuming that each of the two possible interpretations corresponds to a
hypothesis that produces R vector predictions matching the incoming sensory data.
The fact that either hypothesis produces a lock-on makes it possible for the percep¬
tion of the figure to flip from one interpretation to another. The fact that the two
hypotheses are quite different forces the perceptual understanding to make one
assumption or the other. The two hypotheses can’t exist simultaneously.
Figure 7.10: A wire-frame cube is a classic example of perspective reversal. When gazed at
steadily, the corner A alternates from outside to inside. Either hypothesis is equally valid, so
whichever is chosen will be confirmed.
Figure 7.11: An ambiguous staircase. The staircase appears to either sit on the floor or hang
from the ceiling, depending on which internal mental hypothesis is chosen.
Figure 7.12: This figure may be either a goblet or two faces staring at each other. Again, what
is perceived depends upon what is hypothesized. Either of the two hypotheses will be con¬
firmed by the sensory input.
A strange noise can elicit the expectation of a nonexistent burglar or ghost. On a dimly
lit street, a moving shadow can call forth any one of a number of imaginary
creatures that reside in our internal world model. In situations where sensory input is
clear and unambiguous, direct observations correct erroneous expectations
generated by the world model. This steers the hypotheses of the behavior-generating
hierarchy to a correct interpretation of the external world. But when the sensory in¬
put is fragmentary or ambiguous, the uncontradicted incorrect predictions of the
world model can produce many types of illusions.
Figure 7.13: Subjective contours are generated in the imagination of the viewer. The internal world model postulates simple geometrical shapes superimposed on each other to account for the otherwise complicated figures and unlikely coincidence of edge alignments. The internal expectations are inserted into the sensory data stream where they are perceived as if they originated in the sensory input.
The cross-coupled hierarchy of figure 7.9 also goes far toward explaining the
peculiar affinity of the ear for the rhythmic character of poetry and the numerical
relationships involved in musical harmony. Poetic verse has a rhythmic meter which
periodically terminates phrases with words that rhyme, i.e., that have the same end¬
ing sound. This corresponds to an interlocking harmonic periodicity at several
different levels of the auditory-speech hierarchy. Great poetry has deep and rich
harmonies that extend beyond the meter and word sounds to the higher levels where
meaning and emotional responses are produced in the mind of the listener.
In music the most basic feature is the fundamental rhythm. The melody consists
of a series of harmonically related notes played in phrases of a regularly recurring
number of beats. If there are words, they usually are written in verse that matches
the phrasing of the music. The number of interlocking rhythmic harmonies is a
measure of the richness of the music. Most simple ballads have only a beat, one
melody, and one verse. Polyphonic choral music might have many harmonically
related melodies and many verses, all of which interlock in regularly recurring pat¬
terns. Symphonic music has an extremely complex array of instruments all playing
different, but harmonically related, musical sequences.
The locked-loop concept may also explain the ability of the ear to ignore bursts
of noise and to “flywheel” through auditory dropouts with apparent ease. It is a
well-known phenomenon that sections of several tenths of a second can be cut out of
a tape recording of speech or music without the listener being able to notice. The
predictive capability of the cross-coupled hierarchy simply fills in the gaps, in many
cases without the higher levels even noticing that anything is missing.
Cross-coupled hierarchies, of course, exist in all the sensory-motor systems, not
just in the auditory-speech system. Thus, periodic patterns are intimately involved in
all types of behavior.
Nature is full of periodicities: the pitch of a musical note, the beat of the heart,
the rhythm of breathing. Many behavioral patterns such as walking, running, danc¬
ing, singing, speaking, and gesturing all have a distinctly rhythmic and sometimes
strictly periodic character. All of life’s activities are synchronized to the daily
rhythms of daylight and darkness, as well as to the longer term cycles set by the
phases of the moon and the seasons of the year. These are all regularly recurring pat¬
terns producing social as well as individual sensory-motor trajectories in the brain
synchronized to these rhythms.
As a result there are rhythmically recurring addresses input to the associative
memory modules in the internal world model. These produce rhythmically recurring
expectations to be compared with rhythmically recurring sensory experiences. Thus,
there exists a background of rhythmic patterns which permeate the entire
processing-generating hierarchy of the whole brain.
One of the most important properties of regularly recurring temporal relation¬
ships is that they are predictable. This permits efficient learning and optimization of
behavioral patterns that consistently produce pleasing results, so that we are able to cope
with our environment. It is thus not surprising that in a learning environment repeti¬
tion is rewarding in and of itself. For example, children are fascinated by repetition.
Ample evidence may be found in children’s songs and games, and in the
circumstances that accompany a child’s familiar request, “Do it again, do it again.”
Adolescents tend to listen to the same recorded song or attend the same movie over
and over until they have memorized every word and phrase. Even for adults in a
potentially hostile world there is great survival advantage in being able to predict the
results of future action. If the environment is periodic, this is much easier. Thus, any
environment that exhibits a periodic character tends to be rewarding. For example,
why are the rhythmic movements of dancing and marching to music so compelling?
Isn't it the correlations and harmonic relationships that arise between trajectories in
the behavior-generating and sensory-processing hierarchies? And why are daily
routines and habits so comfortable, and the disruptions of an accustomed schedule
so upsetting? Isn’t it the secure feeling that comes from predictability and the lock-
on that comes from a correspondence between the stored internal model and the
observed sensory data stream?
Of course, we can’t always accurately predict the future. Abrupt deviations
from the predictable produce the sensation of surprise, which arouses attention and
produces alerting signals. Events that are surprisingly pleasant are particularly
rewarding, because they first alert and then reward beyond expectation. Pleasant
surprises often make us laugh—a behavioral response.
The essence of humor seems to be pleasant surprise. The humorist can tell in¬
teresting stories that end with an unexpectedly clever twist—a double meaning or
inverted logic. Risque humor relies on the attention-alerting effect of skirting the
limits of conventional morality.
On the other hand, unpleasant surprises evoke the emotional response of fear
or anger. In a hostile environment, novel or unexpected events are fraught with
danger. Deviations from the norm can result in disaster. The inability to predict, and
hence to be surprised, constitutes a serious disadvantage. Without predictability, not
only learning but survival itself is threatened. Continued or prolonged disruption of
regular patterns, either in the internal rhythms or in external stimuli, destroys
predictability, frustrates learning, and brings on punishing emotional stress that can
produce neuroses.
WRITING
Written language very likely had its origins in goal-seeking activities. For exam¬
ple, the earliest writing in China began around 2000 B.C. as ideograms or symbols
engraved on bones and shells for the purpose of asking questions of heaven. Each
stroke or series of strokes asks a certain question or seeks guidance for a particular
branch point in the behavioral trajectory of the life of the asker.
The earliest of all known writing is the Uruk tablets discovered in the Middle
East and dated about 3100 B.C. This writing appears to be almost exclusively a
mechanism for recording business transactions and land sales. These written sym¬
bols are now thought to be pictorial lists of tokens used for keeping track of
merchandise or livestock. The tokens themselves first appeared 5000 years earlier
during the beginning of the Neolithic period in Mesopotamia when human behavior
patterns related to hunting and gathering were being replaced by others related to agriculture and the keeping of animals.
SPEECH
The origin of speech is less certain since it dates from a much earlier period. In
fact, if we include the sounds of whales, animals, birds, and even insects as a form
of speech, spoken language predates the origin of humanity itself. Surely any
behavioral pattern which communicates a threat, signals submission, or expresses
fear or acceptance is a form of language whether it be audible speech or sign
language, or whether it be expressed by a mouse or a human. By this definition,
some speech is very simple—a single facial expression, gesture, chirp, growl, or
squeak for each emotional state encoded or intent expressed. Throughout the animal
kingdom there exists a great variety of modes of expression and many different
levels of complexity. Sounds such as the growls, whines, barks, and howls of the
wolf express an extremely complex variety of social communications. One can easily
feel caught up in a primitive community sing-along when listening to a recording of
a wolf-pack chorus.
STORYTELLING
The telling of tales and stories is a primitive and fundamental aspect of human
language that has received relatively little attention from language researchers.
Most work on language has centered on the rules of syntax and grammar and the in¬
formational structure of semantics. This would seem to be a classic case of failing to
see the forest for the trees. Concentration on the mechanical details of vocabulary
and sentence structure has largely obscured the more important capacity of the
human mind to remember and relate tales of adventure and drama. In cultures
where written language is unknown, storytellers are able to relate epic tales many
hours long with the precision of a Broadway actor reciting a script. Persons familiar
with such stories can detect the omission of a single word or the substitution of a
single phrase.
In fact, the fundamental component of all literature is the story—the relating of
the behavioral actions and experiences of a cast of characters. The storyteller creates
in the mind of the listening audience a set of trajectories that approximate what
would be felt and experienced if the listener were actually acting out the behavioral
patterns of the characters in the story.
The close relationship of the story to behavior can best be understood by
considering the nature of the many trajectories that comprise the deep structure of
behavior. Consider once more the set of trajectories in figure 5.17. Each trajectory
consists of a sequence of state vectors which can be put in one-to-one cor¬
respondence to a vocabulary of words. Thus, each trajectory defines a string of
words that can be interpreted either as a program or as a story. If we choose to call it
a program, each trajectory consists of a string of commands, or program
statements, which are executed to generate behavior. However, we can just as easily
interpret the string of words as a story. Each trajectory tells its own story. The low-
level trajectories are very detailed stories, relating every movement of a particular
muscle, or every behavioral primitive of a single limb. The high-level trajectories tell
stories in a richer vocabulary, but with less detail. Trajectories in the behavior¬
generating hierarchy describe action. Trajectories in the sensory-processing hierar¬
chy describe experiences and feelings. Trajectories in the world-model hierarchy
describe hopes, expectations, and dreams. Thus, any behavioral sequence consists of
hundreds of stories, all being related simultaneously with different levels of detail
and describing different aspects of the behavior.
In the normal course of events, these trajectories (or stories) of the deep struc¬
ture are played out in behavior, in experience, or in imagination. The translation of
these trajectories into words gives rise to language. The narrator chooses a single
string of words from the many available trajectories. Of course, the dramatic effect
of his tale can be enhanced by skipping from one trajectory to another and from one
level to another. He can drop down to a low-level trajectory to expand the details of
the exciting parts of a story and then jump up to a high-level trajectory to skip
quickly over boring or routine events. The storyteller can jump from a behavioral
trajectory to an experience trajectory to an emotional trajectory to a belief trajec¬
tory. He can even jump back and forth between trajectories in different characters
as he spins his tale.
The mind of the listener fills in the missing trajectories that the storyteller leaves
out just as the vision system fills in the subjective contours of the missing parts of
images such as are shown in figure 7.13. The storyteller’s string of words generates
hypotheses in the minds of the listening audience. These elicit memories and trigger
the imagination to produce a full range of sensory and emotional experiences. Thus,
the words of the storyteller pull the mind of the listener along the main experience
trajectory of the story, and the imagination of the listener fills in the background.
Among the most ancient forms of human speech that survive today are the
tribal dances of the few remaining stone-age peoples. In such rites, information of
vital subjects such as hunting (including the habits, ferocity, and vulnerable areas of
the prey), stalking, and using weapons is conveyed by dance, symbolic gestures,
pantomime, songs, and shouts, as the hunters relate (indeed reenact) the exploits of
the hunt. The storytellers replay the behavioral trajectories of their own hunting ex¬
perience and attach verbal symbols and gestures to the portions which cannot be
literally acted out.
Indeed, a great deal of human language behavior must have developed as a
result of sitting around the fire relating tales of the day’s adventures and making
plans for tomorrow. Even in modern cultures, the majority of everyday speech con¬
sists of relating personal experiences. This is simply the straightforward encoding of
behavior trajectories, or the recalled sensory experiences addressed by those
behavioral trajectories, into a string of language tokens or symbols such as gestures,
vocal cord, tongue, and lip manipulations. Thus, in the final analysis, all language is
a form of goal-directed manipulation of tokens and symbols. The ultimate result is a
manipulation of the minds and hence the actions of other members of the society.
Language is a tool by which a speaker can arouse or implant in the listener a great
variety of behavioral goals, hypotheses, and belief structures. By the use of these
means, a speaker can command, instruct, threaten, entertain, or chastise other per¬
sons in his group to his own benefit and for his own ends.
The implication for research in language understanding is that there is much to
be learned from the relationship between language and other forms of behavior.
How, for example, can behavioral goals and trajectories be encoded into strings of
language symbols for making requests, issuing commands, and relating sensory ex¬
periences? How can patterns of trajectories be encoded and transmitted by one
processing-generating hierarchy so as to be received and reconstructed by another?
Language generation and recognition depend upon many of the same
mechanisms by which the rhythms, periodicities, and harmonic patterns of music,
song, and poetry are recognized, tracked and predicted at many different levels.
The relatively simple and well-structured domains of music and pentameter
may be particularly fertile, unexplored areas for research in language generation and
understanding. The rhythmic character of the time-dependent interactions between
stored models and sensory input should make the study of music recognition by
computer an interesting and rewarding research topic. Coupled with the study of
mechanisms for generating complex behavior in general, this provides a fresh new
approach to the study of language.
MECHANISMS OF CHOICE
Emotions
Emotions play a crucial role in the selection of behavior. We tend to practice
what makes us feel comfortable and avoid what we dislike. Our behavior-generating
hierarchy normally seeks to prolong, intensify, or repeat those behaviors that give us
pleasure or make us feel happy or contented. We normally seek to terminate,
diminish, or avoid those behavior patterns that cause us pain or arouse fear or
disgust.
Figure 7.15: The highest levels in the processing-generating hierarchy are the value-judging
and goal-selecting mechanisms of the emotions and will. The emotions are the place where
events, objects, and relationships are judged as lovable, disgusting, happy, sad, joyful, fear¬
ful, and so on. The will is where the decisions are made that commit an organism to a unified
pattern of behavior directed toward a specific goal.
In the past 25 years research has shown that the emotions are generated in
localized areas, or computing centers, in the brain. For example, the posterior
hypothalamus produces fear, the amygdala generates anger and rage, the insula
computes feelings of contentment, and the septal regions produce joy and elation.
The perifornical nucleus of the hypothalamus produces punishing pain, the septum
pleasure, the anterior hypothalamus sexual arousal, and the pituitary computes the
body’s response to danger and stress. These emotional centers, along with many
others, make up a complex of about 53 regions linked together by 35 major nerve
bundles. This entire network is called the limbic system. Additional functions car¬
ried out in the limbic system are the regulation of hunger and thirst performed by the
medial and lateral hypothalamus, the control of body rhythms such as sleep-awake
cycles performed by the pineal gland, and the production of signals which con¬
solidate (i.e., make permanent) the storage of sensory experiences in memory
performed by the hippocampus. This last function allows the brain to be selective in what it stores in long-term memory.
In simple creatures the emotional output vector can be restricted to a few com¬
ponents such as good-bad and pleasure-pain. In higher forms the emotional output
is a highly multidimensional vector with many faceted components such as love,
hate, jealousy, guilt, pride, and disgust. Part of this Q output may simply produce
feelings (i.e., joy, sadness, excitement, and fear). However, most of the Q output
directly or indirectly provides F input to the highest level H function, the will.
Figure 7.16: An external event (such as a person talking to a flower) may be recognized as de¬
viant or normal. If deviant, action may be selected appropriate to the emotional valuation of
fear, pity, or amusement. If the event is recognized as normal and evaluated as unnoteworthy,
no change in on-going activity is called for.
Will
For centuries philosophers and theologians have debated the nature of the will.
For the most part, this argument has centered on the question of whether humans
have free will (i.e., the freedom to choose goals) or whether all choice is merely a
reflexive or predestined response to the environment. Debates over free will revolve
around the question of responsibility and guilt. The theology of sin and the legal
questions of crime and punishment turn on the question of the individual’s respon¬
sibility for his or her own personal behavior. If the individual is free to choose right
from wrong, then when he chooses right he should be praised and when he chooses
wrong he should be punished. If, however, the choice of the individual is predes¬
tined by God, or by fate, or is largely determined by the contingencies (i.e., the
rewarding and punishing reinforcements of the environment), then the responsibility
for the behavior of the individual is at least shared by, if not totally thrust upon, the
external environment, be that society or the Divinity.
Most people would agree that the behavior of individuals is constrained by the
range of role models made available to them as a result of the prevailing social struc¬
ture and by accident of birth. To a large extent individual behavior is influenced by
the amount and quality of the training received and by the degree of health,
strength, intellect, and talent that a person is born with. Certainly, the really big
events in life—whether there is war or peace, whether society is civilized or bar¬
barous, whether one’s parents are prosperous or poverty-stricken—are matters not
much influenced by the will of the individual. These are the type of events that
people ascribe to the will of God or to the fates.
The question of free will of an individual is on another scale. Free will implies
an ability to choose from the variety of behavioral patterns available to the
individual in the immediate environment. Free will involves many implicit assump¬
tions about the rules of right and wrong, about motivation, about knowledge of
what is possible, and about what the consequences of various actions might be.
If we define the will to be the highest level in the behavior-generating hierarchy,
then the choices made are determined by the H function stored in this highest level
module. Some may interpret this to mean that the choices made by the will are not
free, because they are determined, even predestined, by the mathematical transfor¬
mations of the H function. But the H function of the will merely embodies the rules
of choice: If such and such is the state of the world, and if my emotions make me feel
so and so, then I will do thus and thus. Certainly, the fact that there exists a com¬
puting module wherein resides an algorithm or set of rules for making these types of
decisions does not negate the “freedom” of the decision. There are few restrictions
on the set of rules that may be embodied in the H module of the will. The will
receives input variables from literally hundreds of sources, including many from the
emotions, as well as from internally generated chemicals and hormones. Since both
emotions and hormone levels affect and are affected by what we call feelings and
moods, the decisions made by the highest level H function are profoundly influ¬
enced by these variables.
Furthermore, the will has a great deal of control over what inputs it will enter¬
tain. This is evident in the fact that we tend to see what we want to see and hear what
we want to hear. The emotions sit at the top of the sensory-processing hierarchy. In¬
puts to the emotions, and thus the emotional inputs to the will, are heavily influ¬
enced by the various processing and filtering functions that are selected by the
behavioral choices of the H function of the will itself. In short, we can suppress in¬
puts which are evaluated as immoral. Alternatively, we can execute behavioral ac¬
tions which avoid temptation or which remove its input from our sensory channels.
Finally, the H function of the will, as well as the M functions of the world
model, and the G functions of the entire sensory-processing hierarchy including the
emotions can be altered as the result of learning and/or teaching. Thus, even though
the decisions made by the H function of the will are theoretically deterministic
and the resultant behavior patterns therefore predestined, the range of inputs is so
large and the variability of the H function itself so wide that for all practical pur¬
poses the decisions made by the will are quite nondeterministic. They certainly seem
so to the individual. The influence of emotional states, moods, and feelings are pro¬
found, and the H function itself has the capacity to change with experience. The H
function is both culturally and individually determined. Thus, the model proposed
here provides all the variability needed to satisfy the most ardent advocate of the
doctrine of free will.
Yet there is a clear role played by the contingencies of the environment, by the
reinforcements of reward and punishment, by the family, clan, and community, and
by the national and religious heritage in the formation of the H function of the will;
and not only of the will, but of the emotions, and the H, M, and G functions of the
entire processing-generating hierarchy as well.
What the G and H functions of the emotions and will are and where they
originate is a matter of hot dispute. One recent theory proposed by sociobiology is
that they are genetically determined, derived from information stored in the DNA
molecule, as the result of millions of years of natural selection. This theory argues
that innate behavior-selecting mechanisms have evolved so as to maximize the Dar¬
winian fitness (the expected number of surviving offspring) of their possessors.
The incidence of behavior in many different species from insects to birds to
mammals corresponds closely to mathematical predictions derived from genetics
and game-theory analyses of strategies for maximizing the probability of gene prop¬
agation. Even cooperative or altruistic behavior such as that of the worker bee and
ritualized behavior in animal contests and courtship can in many cases be explained
by genetic arguments. However, the evidence for this theory is much stronger for
insects than for higher forms, and the opinion that human emotions are transmitted
genetically is not widely held.
If the top level of the behavior-generating hierarchy is the will, and the top level
of the sensory-processing hierarchy is the emotions, then the upper levels of the
world model are the philosophical beliefs that shape our thoughts and control our
behavior. A diagram of the types of beliefs contained at various levels of the world
model is shown in figure 7.17. The memories and predictions of the world model at
all levels are essentially beliefs, which are accumulated as a result of experience and
modified by new types of experience. At the higher levels, however, the flow of in¬
formation in the sensory-processing system is highly processed and abstract and may
Figure 7.17: The world model is the brain's mechanism for generating predictions and expec¬
tations for contemplated actions or recognized situations. At the lowest levels the world model
generates expectations for simple actions and physical events. At a higher level, an internal
model of peer group attitudes generates expectations for social behavior. At the highest levels,
an internal model consisting of philosophical and religious beliefs generates expectations for
consequences of moral and immoral behavior. Value judgments of what is good or bad are
made by the emotions.
come from a great variety of sensory sources. For example, higher level beliefs about
many things are acquired from the experience and beliefs of others through the
mechanism of language. Many beliefs are acquired from sayings, traditions, old
wives’ tales, legends, and myths. These are transmitted from parent to child and
from authority figures such as chiefs, elders, and priests to the common people. In
primitive tribal cultures, many of the beliefs concern gods, devils, ghosts, and spirits
and consist of elaborate tales about what these disembodied creatures will do or feel
in response to the behavioral choices of the individual or the society.
The fact that such higher level beliefs cannot be verified by comparison with
direct sensory experience is often of little consequence. At these levels in the hierar¬
chy of the brain, all sensory data is highly abstract and subject to filtering by
expectations generated by the world model itself. Thus, the difference between
information derived from a physical experience and a verbal description of such an
experience is not large. Repeated listening to stories from authoritative sources such
as textbooks or the Holy Scriptures and acting out solemn rituals such as scholastic
examinations or religious ceremonies provide most of the experience needed to
verify the predictions of the world model beliefs and solidify the conviction of their
truth.
From a survival standpoint it is quite immaterial whether the beliefs imbedded in
the world model are true or false. It really doesn’t matter much whether beliefs
about demons, fairies, witches, and leprechauns have any correspondence to reality.
All that is important is whether belief in such things gives the individual a basis for
confidently selecting behavior that leads to happy and successful results, and for
swiftly rejecting behavior that leads to punishment or disaster. For survival, it is on¬
ly important that the resulting behavioral choices be, on the whole, beneficial to the
individual and the society.
For learning and reinforcement, all that is necessary is that the predictions and
expectations generated by the world model be perceived to be useful in selecting and
guiding behavior that works to the advantage of the individual and society. If this is
so, the world model will be reinforced. If not, then the contents of the world model
will be modified or replaced.
It is important to realize that there is no way that the mind can ever really know
the external world. The interface between neuronal activity and the physical en¬
vironment is an impenetrable barrier. That boundary is like a mirror in which the
mind sees the world as the reflection of its own internal beliefs. We can, of course,
test our beliefs against observations. However, direct testing is possible only with
those expectations stored at the lowest levels in the hierarchy that are related to im¬
mediate interactions with the physical and social environment. We construct the
lower levels of our world model primarily through direct sensory experience. We test
the expectations generated by those lower levels against everyday experiences: every
time we throw a rock and observe its trajectory, we test our expectations concerning
the effects of gravity and inertia. We compare our expectations concerning the wind
and clouds and seasonal variations in the temperature and precipitation every time
we observe the weather. A great deal of human conversation and thought is, in fact,
dedicated to just such comparisons. We test our expectations concerning the
behavior of animals and other humans every time we observe their habits or interact
with them socially.
However, when we reach beyond everyday experience to philosophy and
abstract scientific principles, it is not so easy to compare observation against belief.
For example, how can we test our belief that the world is round? This is not directly
observable except from outer space. How do we know that matter is made up of
molecules and atoms and electrons and quarks? This is not observable by any direct
measurement. In fact, most people don’t have even the slightest notion of the
evidence that substantiates these theories. They simply believe them on the authority
of teachers and supposedly knowledgeable persons. Thus, modern science is itself a
belief structure propagated not very differently from the myths of ancient religions.
It is taught by a class of authoritative experts, who have much in common with
priests and theologians.
Of course, scientific beliefs are not generally accepted unless a sufficiently large
number of eminent scientists agree that the comparison between belief and observa¬
tion can, in fact, be made and that under repeated trials the model always predicts
exactly what is observed. This is the essence of the scientific method. It provides a
systematic procedure for discovering and refining world model beliefs that accurate¬
ly predict the results of behavioral experiments.
There will, however, always be some beliefs that can’t be tested either by direct
observation or by the scientific method. There will always be some questions that re¬
main cloaked with mystery. There is always some point at which it becomes unclear
whether what we are modeling is myth or reality. The belief model itself is always
imaginary. Whether it has a counterpart in the external environment can’t always be
known for sure. It is here that faith enters. In the words of the apostle Paul, faith is
"the conviction of things not seen”; it is the confidence we have that our model of
the external world is a reliable guide for behavior. As we have said earlier, this is
really all that is necessary.
Only two critical features are necessary for a set of IF/THEN rules embedded
in the belief structure to be successful:
behavior is preprogrammed. The extent to which the feedback pulls the TP. trajec¬
tories along predictable paths to the goal state is the extent to which behavior is
adaptive. For some goals, such as hunting for prey or searching for breeding ter¬
ritory, the selection of the goal merely triggers migratory searching behavior which
continues until feedback indicates that the goal is near at hand. For such goals,
behavior is indefinite and highly feedback dependent. For other goals, such as
building a nest, making a tool, courting a mate, or defending a territory, behavior is
more inner-directed, requiring only a few sensory cues for triggers.
In either case, while the brain is in the acting mode the sensory data flowing in
the sensory-processing hierarchy is highly dependent on (if not directly caused by)
the action itself. If the action is speech, the sensory-processing hierarchy is analyzing
what is spoken and provides feedback for control of loudness, pitch, and modula¬
tion. If the action is physical motion, data from vision, proprioception, and touch
sensors are all highly action dependent, and the sensory analysis is primarily directed
toward servo-control of the action itself.
In the action mode, the Mi associative memory modules provide context in the
form of predicted data to the sensory-processing modules in order to distinguish be¬
tween sensory data caused by motion of the sensors and that caused by motion of
the environment. What is predicted is whatever was stored on previous experiences
when the same action was generated under similar circumstances. This allows the
sensory-processing hierarchy to anticipate the sensory input and to detect more
sophisticated patterns in the sensory data than would otherwise be possible.
Attention
The directing or focusing of attention is essentially a purposive action whose
goal is to optimize the quality of the sensory data. The basic elements of attention
are orienting—positioning the body and sensory organs to facilitate the gathering of
data—and focusing—blocking out extraneous or peripheral information so that the
sensory-processing system can bring all of its capacities to bear on data that are rele¬
vant to the object of attention. The orienting element is simply a behavioral task or
goal to acquire and track a target. The focusing element is a filtering problem that
can be solved by a hypothesis or goal decomposition that evokes the appropriate
masks or filter functions from the Ri modules so as to block out all but the relevant
sensory input data. Figure 7.18 illustrates the filtering aspects of attention.
Thus, attending is a combination of observing and acting. It is primarily a
sensory-analysis mode activity, strongly assisted from the task-execution mode.
Figure 7.18: Direct recording of click responses in the cochlear nucleus during three periods.
(Top and bottom) Cat is relaxed, and the click responses are large. (Middle) The cat is visually
attentive to the mice in the jar, and the click responses are diminished in amplitude. This il¬
lustrates the filtering of sensory input controlled by activity in the behavior-generating hierar¬
chy.
Planning
Imagination gives us the ability to think about what we are going to do before
committing ourselves to action. We can try out, or hypothesize prospective
behavioral patterns, and predict the probable results. The emotions enable us to
evaluate these predicted results as good or bad, desirable or undesirable.
Imagination and emotional evaluators together give us the capability to conduct
a search over a space of potential goal decompositions and to find the best course of
action. This type of search is called planning.
When we plan, we hypothesize various alternative behavior trajectories and
attempt to select the one that takes us from our present state to the goal state by the
most desirable route. Imagined scenarios that produce positive emotional outputs
are flagged as candidate plans. Favorably evaluated scenarios or plans can be
repeatedly rehearsed, reevaluated, and refined prior to initiation of behavior-
producing action.
Imagined scenarios that produce negative evaluation outputs will be avoided if
possible. In some situations it might not be possible to find a path from our present
state to a goal state, or at least not one that produces a net positive evaluation.
Repeated unsuccessful attempts to find a satisfactory, nonpunishing plan, par¬
ticularly in situations recognized as critical to one’s well-being, correspond to worry.
One of the central issues in the study of planning is the search strategy, or
procedure, that dictates which of the many possible hypotheses should be evaluated
first. In most cases, the search space is much too large to permit an exhaustive search
of all possible plans, or even any substantial fraction of them. The set of rules for
deciding which hypotheses to evaluate, and in which order, are called heuristics.
Heuristics are usually derived in an ad hoc way from experience, accident,
analogy, or guesswork. Once discovered, they may be passed from one individual to
another and from one generation to another by teaching.
Historically, artificial intelligence researchers have been fascinated by the sub¬
ject of heuristics. At least a portion of this interest is a result of its recursive nature.
A heuristic is a procedure for finding a procedure. When this recursion is embedded
in a cross-coupled, processing-generating hierarchy with the rich complexity of the
human brain, it becomes clear why the thoughts and plans of humans are filled with
such exquisite subtleties and curious, sometimes insidious, reasoning. It also pro¬
vides some insight into the remarkable phenomenon of self-consciousness (i.e., a
computing structure with the capacity to observe, take note of, analyze, and, to
some extent, even understand itself).
Much of the artificial intelligence research in planning and problem-solving has
its origins and theoretical framework based on simple board games where there are a
finite (although sometimes very large) number of possible moves. The discrete
character of such games, together with the digital nature of computers, led naturally
to the analysis of discrete trees, graphs, and search strategies for such structures.
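The flavor of such a search can be suggested with a short sketch. The fragment below is an illustration of the general idea only, not an algorithm taken from this book; it performs a greedy best-first search of a discrete graph, with a caller-supplied heuristic deciding which hypothesis is evaluated next, and all of the names are invented for the example.

```python
import heapq
import itertools

def best_first_search(start, is_goal, successors, heuristic):
    """Greedy best-first search: the heuristic decides which hypothesis
    (state) to evaluate next.  Returns a list of states from the start to a
    goal, or None if the frontier is exhausted without success."""
    counter = itertools.count()                    # tie-breaker for the heap
    frontier = [(heuristic(start), next(counter), start)]
    came_from = {start: None}
    visited = set()

    while frontier:
        _, _, state = heapq.heappop(frontier)      # most promising hypothesis first
        if is_goal(state):
            plan = []                              # reconstruct the chosen trajectory
            while state is not None:
                plan.append(state)
                state = came_from[state]
            return list(reversed(plan))
        if state in visited:
            continue
        visited.add(state)
        for nxt in successors(state):
            if nxt not in came_from:
                came_from[nxt] = state
                heapq.heappush(frontier, (heuristic(nxt), next(counter), nxt))
    return None                                    # no satisfactory plan was found

# Example: a four-node graph searched with a heuristic that ranks states by
# their estimated distance from the goal.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
rank = {"A": 3, "B": 2, "C": 1, "D": 0}
print(best_first_search("A", lambda s: s == "D", lambda s: graph[s], lambda s: rank[s]))
```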
Planning in a natural environment is much more complex than searching
discrete trees and graphs. In the study of planning in the brain it is necessary to deal
with the continuous time-dependent nature of real world variables and situations.
States are not accurately represented as nodes in a graph or tree; they are more like
points in a tensor field. Transitions between states are not lines or edges, but
multidimensional trajectories, fuzzy and noisy at that. In a natural environment, the
space of possible behaviors is infinite. It is clearly impossible to exhaustively search
any significant portion of it. Furthermore, the real world is much too unpredictable
and hostile, and wrong guesses are far too dangerous to make exploration practical
outside of a few regions in which behavior patterns have had a historical record of
success. Thus behavior, and hence imagination and planning, is confined to a
relatively small range of possibilities, namely those behavioral and thought patterns
that have been discovered to be successful through historical accident or painful trial
and error. Both the potential behavioral patterns and the heuristics for selecting
them are passed from one generation to another by parents, educators, and civil and
religious customs.
Daydreaming or Fantasizing
The fact that the imagination can generate hypothetical scenarios with
pleasurable emotional evaluations makes it inevitable that such scenarios will
sometimes be rehearsed for their pleasure-producing effect alone. This is a pro¬
cedure that can only be described as daydreaming or fantasizing.
When we daydream we allow our hypothesis generators to drift wherever our
emotional evaluators pull them. Our imagination gravitates toward those trajec¬
tories that are emotionally most rewarding. Some of the most pleasurable scenarios
we can imagine are physically impossible, impractical, or socially taboo. Most of us
recognize these as fantasies and never attempt to carry them out. However, once a
person adopts the intent to carry out a fantasy, it ceases to be a dream and becomes
a plan.
Thus, planning and daydreaming are closely related activities, differing prin¬
cipally in that planning has a serious purpose and involves an intent to execute what
is finally selected as the most desirable of the alternative hypotheses.
This model suggests that dreaming while sleeping is similar in many respects to
daydreaming. The principal difference in night dreaming seems to be that the trajec¬
tories evoked are more spasmodic and random, and are not always under the control
of the emotions and will.
CREATIVITY
tion that accompanies the moment of insight, i.e., the moment when a hypothesis is
selected that generates a prediction matching the observed facts. The recognition of
lock-on and the emotional evaluation that that particular lock-on is significant are the
“Aha!” This, of course, is as much a part of the act of genius as the selection of the
successful hypothesis.
There were millions of persons in James Watt’s day who had observed the
motion of a lid on a boiling kettle. There may even have been many who connected
the heat of the flame with the mechanical motion. However, it was only Watt who
recognized the implications of what he saw and who hypothesized the construction
of a machine to do industrial work. In fact, Watt’s contribution to the steam engine
was not in the design of the cylinder and pistons but in the concept of injecting a
spray of cold water into the steam-filled cylinder to create a vacuum on the return
stroke.
This was an act of creative genius. But what went on in the mind of the genius?
First was the recognition, then the hypothesis, and then the hard work of turning the
moment of insight into a piece of working machinery. There was no great leap
through uncharted regions of thought space. There was at most a tiny evolutionary
step, an almost accidental superposition of the recalled memory of a piston on the
observed image of a moving kettle lid. Watt could make this superposition because
he already had imbedded in the H, M, and G functions of his mind the learned skills
of making and working with pistons and cylinders.
The implication is that we need no elaborate mechanisms to account for
creativity beyond what we have already set forth for explaining goal-seeking
behavior. What is truly remarkable about the creative person is the ability to select
hypotheses, to recall realistic expectations, and to accurately process and evaluate
these expectations according to a set of logical rules. These are the mechanisms re¬
quired for sophisticated goal-directed behavior. Nothing more, other than habits of
careful investigation and observation and a fortunate choice of attentional goals, is
needed to account for even the greatest acts of creative genius.
Consider the fact that it took the human race many millennia to learn to start a
fire, to grow a crop, to build a wheel, to write a story, to ride a horse. Even the
Greeks did not know how to build an arch. Yet these are all simple procedures that
any child can understand and more or less master. Surely our ancestors as adults
were as intelligent and creative as today’s children. Why did they fail for hundreds
of years to discover these simple yet highly useful procedures?
Because they had no one to teach them. A modern child knows about wheels
because he is taught. He plays with toys that have wheels. He rides in vehicles with
wheels. If a modern child grew up in a culture where he never saw a wheel, he would
never think of one, nor would his children, or his grandchildren, any more than his
ancestors did for thousands of years before him.
This is not to say that there is not a creative aspect to genius, whether it be
artistic or scientific genius. We reserve our highest acclaim for the person who
discovers or devises something new. There are many artisans for every creative
genius. Even so, the creation of a new art form, or even a new scientific concept, is
seldom appreciated unless the details are worked out. The acclaim to genius is heavi¬
ly dependent on how skillfully the work is performed, how useful or entertaining it
is, and how well it harmonizes with the prevailing knowledge base and belief struc¬
ture of the peer group.
The reason we value creativity so highly is that it is so rare and so highly
advantageous. It is rare precisely because there are no mechanisms for creativity in
the brain. There are only mechanisms for sophisticated behavior. Creativity is so ad¬
vantageous because once a new and useful procedure like navigating a ship, making
steel, or flying an airplane is discovered, it can easily be taught to others. We all
possess the most remarkable mechanisms for learning and executing complex
behavior. Once a new invention is developed, it can be taught in schools and a whole
society can benefit from the results.
Thus, we learn to solve problems, to invent, and to be creative in much the same
way as we learn any other goal-directed behavior pattern such as hunting, dancing,
speaking, or behaving in a manner that is acceptable to and approved by our peers:
we learn from a teacher. The beauty and the sense of awe and wonder we experience
when confronted by work of creative genius derives much more from the skill and
precision with which it is executed than from the novelty of the creation.
If there are no specific mechanisms in the brain for creativity, then it would
seem foolish to attempt to design creative robots. This is certainly true at least until
our robots become as skilled and dexterous and adept at complex sensory-interactive
goal-directed behavior as the lower mammals.
If we design systems with sufficient skill in executing tasks and seeking goals,
and sufficient sophistication in sensory analysis and context sensitive recall, and if
we teach these systems procedures for selecting behavior patterns that are ap¬
propriate to the situation, then they will appear to be both intelligent and creative.
But there will never be any particular part of such a device to which one can point
and say “Here is the intelligence,” or “Here is the creativity.” Skills and knowledge
will be distributed as functional operators throughout the entire hierarchy. To the
degree that we are successful, intelligence and creativity will be evidenced in the pro¬
cedures that are generated by such systems.
Above all, we should not expect our robots to be more clever than ourselves, at
least not for many decades. In particular we should not expect our machines to pro¬
gram themselves or to discover for themselves how to do what we do not know how
to teach them. We teach our children for years. It will take at least as much effort to
teach our machines.
We must show our robots what each task is and how to do it. We must lead
them through in explicit detail and teach them the correct response for almost every
situation. This is how industrial robots are programmed today at the very lowest
levels, and this is, for the most part, how children are taught in school. It is the way
that most of us learned everything we know, and there is no reason to suspect that
robots will be programmed very differently. Surely it is as unreasonable to expect a
robot to program itself as it is to expect a child to educate himself. We should not ex¬
pect our robots to discover new solutions to unsolved problems or to do anything
that we, in all the thousands of generations we have been on this earth, have not
learned how to do ourselves.
This does not mean that once we have trained our robots to a certain level of
competence they can’t learn many things on their own. We can certainly write
programs to take the routine and the tedium out of teaching robots. Many different
laboratories are developing high-level robot programming languages. We already
know something about how to represent in computers knowledge about
mathematics, physics, chemistry, geology, and even medical diagnosis. We know
how to program complex control systems and to model complicated processes, and
we are rapidly learning how to do it better, more quickly, and more reliably. Soon,
perhaps, it will even be possible to translate knowledge from natural language into
robot language so that we will be able to teach our robots from textbooks or tape
recordings more quickly and easily than humans. We can even imagine robots learn¬
ing by browsing through libraries or reading scientific papers.
But it is a mistake to attempt to build creative robots. We are not even sure
what a creative human is, and we certainly have no idea what makes a person
creative, aside from contact with other creative humans—or time alone to think. Is it
both? Or neither?
We should first learn how to build skilled robots—skilled in manipulation, in
coping with an uncertain or even hostile environment, in hunting and escaping, in
making and using tools, in encoding behavior and knowledge into language, in
understanding music and speech, in imagining, and in planning. Once we have
accomplished these objectives, then perhaps we will understand how to convert such
skills into creativity. Or perhaps we will understand that robots with such skills
already possess the creativity and the wisdom that springs naturally from the
knowledge of the skills themselves.
CHAPTER 8
Robots
We come finally to the subject of robotics. From this point on, we shall at¬
tempt to apply our knowledge of how the brain produces goal-directed behavior in
biological organisms to the problem of how computers can be made to produce
similar behavior in mechanical machines.
Man's fascination with machines that move under their own power and with
internal control is at least as old as recorded history. As early as 3000 B.C., the
Egyptians are said to have built water clocks and articulated figures, some of which
served as oracles. The ancient Greeks, Ethiopians, and Chinese constructed a great
variety of statues and figures that acted out sequences of motions powered by falling
water or steam. Hero of Alexandria amused Greek audiences around 100 B.C. with
plays in several acts performed entirely by puppets driven by weights hung on
twisted cords. Much later, a great number of timepieces were contrived that per¬
formed elaborate scenarios on striking the hour. Some of these clocks still exist
today. In Piazza San Marco in Venice there is a clock tower, built in 1496, with two
enormous bronze figures on top that strike a bell with hammers on the hour. See
figure 8.1. The clock itself not only tells the time but indicates the position of the sun
and moon in the zodiac. In the Frauenkirche in Nuremberg, there is a famous clock
built in 1509. On the hour, a whole troupe of figures appears in procession, ringing
bells, playing instruments, and summoning passersby to worship.
During the latter half of the 18th century, a number of Swiss craftsmen, most
notably Pierre and Henri-Louis Jaquet-Droz, constructed a number of lifelike
automata that could write, draw pictures, and play musical instruments. The Scribe,
shown in figure 8.2, was built in 1770. It is an elegantly dressed figure of a child that
writes with a quill pen that it dips in ink and moves over the paper with graceful
strokes. This amazing android is controlled by an elaborate set of precision cams
driven by a spring-powered clock escapement, shown in figure 8.3. A similar
automaton, known as the Draughtsman, was built three years later. It has a reper¬
toire of four drawings, one of which is shown in figure 8.4. The action patterns for
these drawings are stored on three interchangeable sets of twelve cams. During
pauses between drawings, while the cams are changing their positions, the puppet
blows the dust off his drawing paper using a bellows placed in his head for this pur¬
pose. The Musician, shown in figure 8.5, actually plays a miniature organ. The
fingers strike the keys in the proper sequence to produce the notes. The breast rises
and falls in simulated breathing, the body and head sway in rhythm with the music,
and the eyes glance about in a natural way. The Scribe, Draughtsman, and Musician
still exist in working condition in the Musée d’Art et d’Histoire in Neuchatel,
Switzerland, where they are operated occasionally. A similar picture-drawing au¬
tomaton, the Philadelphia Doll constructed by Henri Maillardet in 1811, can be seen
at the Franklin Institute in Philadelphia.
Figure 8.3: The mechanism that drives the Scribe. Cams that control the
movement of the hands in forming letters can be seen arranged in a
stack in the upper and middle parts of the mechanism. There are three
cams for each letter, one for each of the three degrees of freedom of the
puppet's hand. Each turn of the stack forms a single letter. The disk at
the lower part of the mechanism selects the vertical position of the stack
and, hence, the letters to be formed. There are also sets of cams for
various other actions to be performed: begin a new line, dip the pen in
the ink, etc. The 40 positions on the lower disk are the program. The set¬
ting of the levers on this disk selects the specific letters or actions to be
performed at each step in the program. The mechanism can be pro¬
grammed to execute any desired text of up to 40 letters and actions.
The fascination, awe, and sometimes fear that surround the subject of robotics
center on the notion of creating artificial life. The potentially threatening and un¬
controllable consequences of this possibility have been the theme of many books and
movies. One of the first and most influential works in this area was Mary Shelley’s
Frankenstein, published in England in 1817. Only a year before the book appeared,
Mary Shelley had visited Neuchatel where the Jaquet-Droz automata were then, as
now, on display. The theme of Frankenstein is the danger of artificial life run
amuck. Dr. Frankenstein, a well-intentioned scientist, creates an uncontrollable
monster with superhuman strength and a defective intellect.
The word “robot” was coined a century later, in 1921, by Czechoslovakian
playwright Karel Capek in his play R.U.R. (Rossum’s Universal Robots). Robot
derives from the Czech word for “worker.” Capek’s R.U.R. echoes the Franken¬
stein theme through a melodramatic plot involving a brilliant scientist named
Rossum who manufactures a line of robots designed to save mankind from work.
Rossum’s robot project is marvelously successful in the beginning, but the plot turns
sinister when the robots are used in war to kill humans. Eventually, after the robots
are given emotions and feelings by an irresponsible scientist in the Rossum
laboratory, disaster strikes. The mechanically perfect robots no longer tolerate being
treated as slaves by imperfect humans. A rebellion ensues and soon all human life is
exterminated. Figure 8.6 shows the last human survivor, a clerk in Rossum’s
laboratory, meeting his end. Variations on these motifs have dominated science fic¬
tion and movie literature on robots for decades.
Figure 8.6: A scene from the play R.U.R. where rebellious worker robots turn on their human
creators and kill them. At the end of the play all human life is exterminated.
Two notable exceptions to this trend are stories by Isaac Asimov and the recent
series of motion pictures spawned by Star Wars. In Star Wars, robots are depicted as
lovable friends and loyal companions to humans. In Asimov’s stories, robots are
constructed and programmed with the idealism of Asimov’s Three Laws of Robotics
which state:
1. A robot may not injure a human being or, through inaction, allow a human
to be harmed.
2. A robot must obey orders given by humans except when that conflicts with
the First Law.
3. A robot must protect its own existence unless that conflicts with the First or
Second Laws.
ROBOT REALITY
dexterity with the clumsy lumberings of the walking truck shown in figure 8.8 to
understand how far we are from creating a robot with the physical skills of an
ant.
Some might even contend that it will never be possible to create a robot as
marvelous as an ant. In some ways this is undoubtedly true; certainly it will not happen for a
hundred years, or perhaps a thousand. The truly interesting question is, “What is
possible in five years, in ten, in twenty, in fifty—the remaining years of our lives?”
Clearly it is possible to build vehicles that move about at many times the com¬
parative speed of insects, whether they walk, fly, or swim. We know how to build
and fuel efficient power plants and how to build transmission systems to transport
and modulate the power.
What we do not know is how to build the sensory-interactive control systems
necessary to direct that power to accomplish goals and execute skilled tasks in an
unstructured, uncooperative, and even hostile environment. Furthermore, we can¬
not yet build the mechanical structures inexpensively. The cheapest computer-
controlled industrial robot with five servoed degrees of freedom, one parallel jaw
gripper, and a crude sense of touch and vision costs about $60,000, and it cannot lift
one-tenth of its own weight.
Yet the problem is much more fundamental than merely the cost of mechanical
hardware. The software does not exist at any price that could control a six-legged
walking machine with two arms and a full set of force, touch, and vision senses in
the performance of tasks like building a brick fireplace, laying a hardwood floor, in¬
stalling a bathtub, or painting the front of a house.
Nevertheless, in spite of the present profound inadequacies in robot sensory-
motor skills, robot technology will very soon play a major economic, scientific, and
military role in human affairs. As crude as they are, industrial robots are already
beginning to make a significant contribution to several manufacturing processes
such as spray painting, unloading die-casting and injection-molding machines, tend¬
ing presses, spot welding automobile bodies, handling materials, and arc welding.
These are important and expensive operations in the manufacture of many valuable
articles such as automobiles, tractors, trucks, and earth-moving equipment. Much
of the computer and control technology that was and is being developed in artificial
intelligence and robot laboratories is crucial to the performance of modern missile
guidance systems, particularly in the smart bomb and cruise missile systems, and
soon will undoubtedly be incorporated into many other weapons and electronics
warfare systems as well.
Over the next two centuries, many, if not most, jobs in factories and offices will
be performed by a robot labor force. Robots will surely play a major role in
planetary exploration and in the exploitation of the two-thirds of the Earth's surface
covered by oceans. Robots will eventually appear in the household, although the
cost of general-purpose mechanical servants will probably limit their use for several
decades.
At present, robot technology has two major branches, one technological and
the other scientific. In the development of practical industrial robots the primary
criteria are reliability and cost-effectiveness. In the scientific study of robotics
(often conducted under the heading of artificial intelligence) the emphasis is on ex¬
ploring fundamental questions of sensory perception, motor control, and intelligent
behavior.
INDUSTRIAL ROBOTS
Most of the industrial robots used in factories throughout the world exhibit few
of the characteristics that the average person would associate with the term
“robot.” Many are “pick-and-place” machines that are capable of only the simplest
kinds of motion. These machines have little or no ability to sense conditions in their
environment. When they are switched on, they simply execute a preprogrammed se¬
quence of operations. The limits of motion of each joint of the machine are fixed by
mechanical “stops,” and each detail of movement must be guided by means of an
electric or pneumatic impulse originating at a plugboard control panel.
Figure 8.9 shows a popular variety of a pick-and-place robot. Programming of
this machine is accomplished by connecting pieces of plastic tubing to the ap¬
propriate nipples on the pneumatic control unit shown in figure 8.10. The bottom
part of the control unit is a sequencer which provides pressure and vacuum to a set
of nipples in a timed sequence. The upper part of the control unit contains a set of
nipples which activate each of the joint actuators. Connections made between the
various nipples determine which joints are actuated, and in which direction, at each
step. Whenever a new program is needed, the programming connections are
relocated and the mechanical stops repositioned to set up a new sequence of
movements.
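The logic of such a plugboard program is easy to picture in software terms. The sketch below is only an analogy, since the actual machine is pneumatic and contains no computer at all; the step table, joint names, and function names are invented for the illustration.

```python
# Each program step names the joints to actuate and the direction of travel;
# the "sequencer" simply steps through the table, and the mechanical stops
# determine where each motion ends.
PICK_AND_PLACE_PROGRAM = [
    {"reach": "extend"},
    {"grasp": "close"},
    {"lift": "up", "rotate": "clockwise"},
    {"reach": "retract"},
    {"grasp": "open"},
    {"lift": "down", "rotate": "counterclockwise"},
]

def run_sequence(program, actuate):
    """Step through the program, driving each listed joint until its stop."""
    for step_number, step in enumerate(program):
        for joint, direction in step.items():
            actuate(joint, direction)
        print(f"step {step_number} complete")

if __name__ == "__main__":
    run_sequence(PICK_AND_PLACE_PROGRAM, lambda j, d: print(f"  {j} -> {d}"))
```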
Figure 8.9: Pick-and-place manipulators are the simplest of present-day industrial robots.
This table-mounted version, made by Auto-Place, Inc., of Troy, Michigan, is powered by air.
Six double-action air cylinders enable the robot to slide back and forth, lift, rotate, reach,
grasp, and turn objects over. The sequence of operations is programmed by means of the se¬
quencing module shown at the right of the table.
Figure 8.11: A Unimate 2000 robot picking up a metal plate to be loaded into a machine tool
shown in the background. This robot is controlled by a program stored in a digital electronic
memory. Each point in the robot’s program consists of six binary numbers specifying the loca¬
tion of the six degrees of freedom of the robot gripper.
Figure 8.12: The Versatran Model FA robot. The Versatran line, formerly manufactured by
AMF, is now produced by Prab Conveyors, Inc. Points in this robot's program can be
specified by potentiometers or can be stored in digital memory.
Figure 8.13: The hand controller used to program the Unimate robot. This device has rate-
control buttons for moving each joint in one direction or the other. The operator uses these
buttons to drive the robot to the desired position for each program point. He then pushes the
record button to store the six values which specify the positions of the six axes at that point.
When the program is played back, the control system simply commands each
joint to move to the positions recorded for each step. Once the robot goes into
operation on the production line it repeats the recorded program over and over
again, moving from one recorded point to the next according to a fixed timing cycle,
either on completion of the last step or in response to an interlock signal from exter¬
nal machinery. This type of robot is called a “point-to-point” robot because the
exact path of the robot is defined only at a few selected points. Most point-to-point
robots have programs of only a few hundred steps. Figure 8.14 shows a schematic
diagram of the record and play-back circuitry.
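In software terms, playing back a point-to-point program amounts to stepping through a list of recorded joint positions. The following fragment is illustrative only; the names are invented, and a real controller would add timing, velocity, and safety details.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ProgramPoint:
    joints: List[float]                  # one recorded value per servoed axis
    wait_for_interlock: bool = False     # pause until external machinery signals

def play_back(program: List[ProgramPoint],
              move_joints_to: Callable[[List[float]], None],
              interlock_ready: Callable[[], bool],
              cycles: int = 1) -> None:
    """Repeat a taught point-to-point program: command every joint to the
    recorded position for each step, in order, cycle after cycle."""
    for _ in range(cycles):
        for point in program:
            while point.wait_for_interlock and not interlock_ready():
                pass                     # wait for the external machine
            move_joints_to(point.joints) # servo all axes to the recorded values

# Example: a three-point program played back once with stand-in I/O functions.
taught = [ProgramPoint([0, 45, 90, 0, 0, 0]),
          ProgramPoint([30, 45, 90, 0, 0, 0], wait_for_interlock=True),
          ProgramPoint([30, 10, 90, 0, 0, 0])]
play_back(taught, move_joints_to=print, interlock_ready=lambda: True)
```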
An electronic memory enables a robot to store several programs and to select
one or another depending on different input commands or on feedback from exter¬
nal sensors. For example, robots that spot weld automobile bodies can be program¬
med to handle a variety of automobile models intermixed on an assembly line. A
coded signal indicating which body model is positioned in front of the robot is used
to select the appropriate entry point in the robot’s program memory.
If smooth, continuous motion is required, a magnetic tape capable of storing
many thousands of closely spaced steps can be used for a control system.
Continuous-path programs allow the robot to move smoothly through space along a
completely defined trajectory. This is most often used for paint spraying. Figure
8.15 shows a continuous-path, paint-spraying robot. Typically, the continuous-path
robot is led through its task by a human who performs the job once. This path is
recorded on tape and replayed each time the robot is called upon to perform the
same job.
Figure 8.14: Programming an industrial robot is usually done by using a hand control unit to
teach the robot by guiding it through a sequence of positions which are recorded in memory.
Once the teach operation is completed, the control system is switched to the playback mode.
The robot then repeats the recorded sequence of operations automatically.
Figure 8.15: An Italian-made paint-spraying robot. The wrist of this robot is controlled by
cables and can flex both side-to-side and up-and-down like an elephant's trunk. This enables
the spray nozzle on the tip to paint hard-to-reach spots.
Figure 8.16: A Cincinnati Milicron T3 robot with its computer control unit and display con¬
sole. This robot can be programmed to move its gripper in straight lines from point to point.
The computer calculates the velocities and accelerations needed for each joint to produce
coordinated motions.
Figure 8.17: Even computer-controlled robots are typically programmed by the teach method.
Specific points in space where parts are to be placed, welds are to be made, or other operations
are to be performed are specified by leading the robot to the points and pressing a “record”
button. A computer control system makes this programming task easier by allowing the pro¬
grammer to move the robot hand along axes defined in the coordinate systems of the work
space or the fingertips or tool point of the robot itself. This robot stores the program points in
x,y,z coordinates and gripper orientation.
Figure 8.18: Programming of a laboratory robot through a computer terminal. Data specify¬
ing points in space where operations are to be performed may be defined by Computer-Aided-
Design (CAD) data bases or from sensors such as video cameras and touch detectors. Pro¬
gramming by teaching may still be used for some elemental moves, but automatic optimiza¬
tion programs will refine these preliminary trajectories into graceful, efficient motions.
Figure 8.19: Two coordinate systems. One, the x,y,z system, is defined with the table top as the x-y plane, the origin at the center of the robot pedestal, and the x axis aligned parallel to one edge of the table. A second coordinate system, the x', y', z' system, is defined with the y' axis along the direction the fingertips are pointing, the x' axis parallel to the line joining the two fingertips, and the origin at the tip of the fingers (or at the tip of a tool held in the fingers). A computer can allow robot motions to be specified as vectors in either of these two coordinate systems.
Figure 8.20: Coordinate systems may also be defined relative to a moving point on a conveyor belt or relative to an object to be handled. Here the x,y,z system is defined with the x-y plane in the plane of the belt, the x axis along the direction of the belt's motion, and the origin fixed at a specific point on the belt. The x', y', z' system is defined with the origin at one vertex of an object that the robot will pick up. The x', y', and z' axes are defined along the edges emanating from that vertex.
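The bookkeeping a computer performs to let the programmer work in either coordinate system is a simple matrix transformation. The sketch below is illustrative only; the particular pose of the fingertip frame, the axis choices, and all of the names are invented for the example.

```python
import numpy as np

# Rotation whose columns are the fingertip axes (x', y', z') expressed in the
# table frame: x' along table +y, y' (the pointing direction) along table +x,
# and z' along table -z to keep the frame right-handed.
R_tool_in_table = np.array([[0.0, 1.0, 0.0],
                            [1.0, 0.0, 0.0],
                            [0.0, 0.0, -1.0]])
# Hypothetical fingertip position: 0.5 m out along x and 0.3 m above the table.
p_tool_in_table = np.array([0.5, 0.0, 0.3])

def tool_point_to_table(point_in_tool: np.ndarray) -> np.ndarray:
    """Re-express a point given in fingertip coordinates in table coordinates."""
    return R_tool_in_table @ point_in_tool + p_tool_in_table

# "A target 10 cm straight ahead of the fingers" becomes a table-frame point.
print(tool_point_to_table(np.array([0.0, 0.10, 0.0])))   # [0.6, 0.0, 0.3]
```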
ROBOT SENSES
The great majority of industrial robots, even those with computers, have little
or no sensory capabilities. Feedback is limited to information about joint position,
combined with a few interlock and timing signals. Most robots can function only in
environments where the objects to be manipulated are precisely located in the proper
position for the robot to grasp.
For many industrial applications, this level of performance is quite adequate.
Until recently, the majority of robot applications consisted of taking parts out of
die-casting and injection-molding machines as shown in figure 8.21. In this task, the
parts produced are always in exactly the same position in the mold so that the robot
needs no sensory capability to find the part or compensate for misalignments.
Another principal application is the spot welding of automobile bodies, shown in
figure 8.22. Here, the car bodies are positioned and clamped so that each body is
always exactly in the same place as the one before. Thus, the robot needs no sensory
capability to find where to place the welds.
Figure 8.22: Unimate 2000 robots spot welding automobile bodies on a General Motors Vega
line. Spot welding is one of the largest applications of industrial robots today.
Only since the advent of robots with computers has it become possible for robot
spot welders to operate on moving auto bodies, shown in figure 8.23. In this applica¬
tion, an encoder is attached to the moving line to indicate to the robot how fast the
car body is moving. An optical sensor indicates when each car moves into the work
area so that the robot can begin its programmed routine. The robot’s computer then
transforms the coordinate system of the program to follow the conveyor line.
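The coordinate offset itself is simple to state. The sketch below illustrates the idea rather than the manufacturer's actual software; purely for the example, it assumes the conveyor carries the body along the +x axis of the robot's world frame.

```python
def offset_program_for_conveyor(weld_points, encoder_distance):
    """Shift weld points taught in body coordinates by the distance the
    conveyor has carried the body since the optical sensor fired.

    weld_points      -- list of (x, y, z) points taught with the body at the sensor
    encoder_distance -- belt travel, in metres, reported by the line encoder
    """
    return [(x + encoder_distance, y, z) for (x, y, z) in weld_points]

# Example: a weld taught at x = 1.20 m, with the body now 0.35 m further along.
print(offset_program_for_conveyor([(1.20, 0.40, 0.90)], 0.35))
```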
Figure 8.23: Cincinnati Milicron robots spot welding automobile bodies on a moving line at
the General Motors Assembly Division in Lakewood, Georgia. Four T3 robots place 200 spot
welds on each of 48 car bodies per hour as they move past on a conveyor. This robot is pro¬
grammed in a coordinate system fixed in the auto body, and the computer offsets that
program as the auto body moves past the robot. The ability to work on moving objects makes
it possible to use a much simpler and less expensive transfer mechanism for moving the car
bodies past the robots.
Over the last decade, robot spot welding has become a significant factor in
automotive production. Over 50 percent of all the automobiles built today are
welded by robots.
Of course, in the industrial world, the question of whether to use sensory
capabilities such as vision is purely one of economics. There are three possible ap¬
proaches:
Which is cheaper and easier? In most cases, the answer is either method number
one or two. Robot sensors, particularly vision sensors, are expensive and the pro¬
cessing of sensory data can be extremely complex. Typically, powerful computers
and sophisticated software are required, and often programs must be specially writ¬
ten for each specific application. Thus, although robot vision is an interesting and
challenging field with great promise for the future, it is still almost entirely confined
to the research laboratory. The exception is in industrial applications
where vision is used for inspection tasks, such as verifying whether bottles are filled,
labels are correctly placed, etc. Nevertheless, robot vision research is progressing
rapidly. Computing power is becoming less expensive, and robot control systems are
being designed to use visual information in directing the robot in its task.
Robot Assembly
Even the relatively difficult task of robot assembly can often be accomplished
with little or no sensing of events in the environment. For example, engineers at the
Kawasaki laboratories in Japan have shown that robots can put together complex
assemblies of motors and gearboxes with no more than high-precision position feed¬
back, cleverly designed grippers, and fixtures for holding parts that flex by a slight
amount when the parts are brought together. Other experiments have shown that a
small amount of vibration or jiggling, together with properly designed tapers and
bevels, can accommodate slight misalignments and prevent jamming when two
pieces with close tolerances are assembled.
Figure 8.24: A commercial Remote Center Compliance (RCC) device. This device, when
mounted between the robot wrist and gripper, compensates for misalignments and thus can
minimize assembly forces and the possibility of jamming.
Figure 8.26: An exploded view of the alternator being assembled in figure 8.25 shows the se¬
quence in which its parts are put together by the robot and identifies the tools that perform
each subtask. Time and motion studies indicate that alternators could be assembled in a fac¬
tory by a robot similar to this one in approximately one minute and five seconds. [From
"Computer-controlled Assembly," by J. L. Nevins and D. E. Whitney. Copyright © 1978 by Scientific American, Inc.
All rights reserved.]
Other research laboratories have also experimented with robot assembly. Ex¬
periments at Stanford University as well as in a number of other labs have used
vision to acquire parts that are not precisely positioned. In one of the first robot
assembly experiments ever performed, that of the water pump shown in figure
8.27a, the robot used a TV camera to locate the various parts. Figure 8.27b shows
the TV image seen by the robot eye. Similar assembly research has been performed in
Artificial Intelligence Laboratories at MIT, Edinburgh University, S.R.I. Interna¬
tional, and in a number of industrial laboratories including IBM, Westinghouse,
and General Electric.
In recent months there have been a number of new robot companies formed.
For the most part these new ventures are targeted for assembly. Many large corpora¬
tions such as General Motors, Texas Instruments, International Harvester,
Volkswagen, Fiat, Renault, Hitachi, and General Numeric are entering the robot
research arena, hoping to use robots in their manufacturing operations.
It is important to note that despite extensive research, robot assembly has yet to
prove itself to be economically practical in more than a very few applications. There
are a number of reasons for this. First of all, robots are very expensive—almost as
expensive as special-purpose assembly machines. There are only a few jobs where
the number of items to be assembled is too small to justify a special-purpose,
high-speed assembly machine, yet large enough to justify the cost of the robot and
the tooling required for robot assembly.
A second reason is that robots are slow and clumsy compared either with a
special-purpose assembly machine or with a human worker. The human is incredibly
dexterous and adaptable to many different jobs. Human hands and fingers can
literally fly over the work, handle limp goods, and work in cramped quarters with
little difficulty.
Third, a human comes equipped with a vision system that far surpasses any
robot vision system likely to be marketed in this century. The human also has a fan¬
tastic sense of touch discrimination. He or she can take verbal instructions from
speakers of many dialects, can employ sophisticated reasoning powers, can notice
defects in parts or products, and can take corrective action that is far beyond the
capacities of the best robot.
Finally, the human works relatively cheaply, can be hired one week and fired
the next, can be easily moved from one job to another, and can be supervised by a
person with no special skills in computer science or electrical or mechanical
engineering.
Yet robots are very cost-effective in a number of applications. Besides the
injection-molding, die-casting, and spot-welding applications, robots have
demonstrated their abilities in forging, materials handling, machine tool loading, in¬
vestment casting, arc welding, and inspection. The cost of purchasing and maintain¬
ing a robot on the job is about $5.00 per hour. In many applications, this is half or
even a third of the cost of human labor.
Industrial robots usually work no faster than humans can in the same job. But
they do work more steadily, without coffee or lunch breaks, or trips to the rest
room. This can be very significant. For example, in arc welding, a very tedious, hot,
smoky job, the human must wear a face mask and heavy protective clothing. A
human welder has difficulty in keeping the welding torch applied to the work for
more than about 30 percent of the time. However, a robot arc welder, such as shown
in figure 8.28, can usually keep its torch on the work 90 percent of the time. Thus,
even though the robot can weld no faster than a human, it produces about three
times as much output per shift. If the robot works for one third the wages, the pro¬
ductivity gain for only one shift is about 900 percent. When the same robot can work
two, three, or even four (Saturdays and Sundays) shifts per week, the economic
payoff is that much greater.
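One way to read this figure (our reading of the arithmetic, not a calculation spelled out in the text) is as output per unit of labor cost:

$$\frac{3 \times \text{output per shift}}{\frac{1}{3} \times \text{labor cost per shift}} \;=\; 9 \times \frac{\text{output per shift}}{\text{labor cost per shift}}$$

that is, roughly nine times, or on the order of 900 percent of, the output obtained per wage dollar from a human welder working a single shift.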
Figure 8.29: Shakey, the S.R.I. robot designed as a research tool for studying artificial in¬
telligence issues in vision, navigation, planning, and problem-solving. Shakey was controlled
by a PDP-10 computer through a radio link. It carried a TV camera, an optical rangefinder,
and touch sensors so that it could know when it had bumped into something. Its vision system
could detect the dark baseboards around the walls and could discriminate the shapes of
various rectangular and triangular boxes in the room.
Sensory Interaction
Despite the enormous economic potential, present-day robots are simply
incapable of performing most industrial jobs. The principal impediment is that they
lack the sensory capabilities to enable them to compensate for unstructured condi¬
tions in the work environment. In most arc-welding applications, for example, it is
necessary for the welder to work in a number of different positions. To do this he
must be able to see where the weld is to be made and to adjust the welding process to
the slight differences in the conditions of the work. Similarly in assembly, machine
tool loading, and materials handling, the acquisition of parts often requires an abil¬
ity to measure where the parts are. The robot’s program must be adjusted to take
into account the difference between ideal and actual conditions.
As vision research progresses, as cheaper cameras and computers are designed
and manufactured (prices on both are falling dramatically), and as software is
developed, robots will eventually be able to see, and rather well, especially under
controlled conditions. Within the 1980s, industrial robots will be equipped with
many different kinds of sensors that will enable them to measure the state of the
environment. Control systems will be designed to act and react to the sensory infor¬
mation in a successful goal-seeking manner. Sensors will be designed to measure the
three-dimensional space around the robot and to recognize and measure the position
and orientation of objects and relationships between the objects in that space.
Sensory-processing systems will be able to analyze and interpret the raw sensory data
and compare it against an internal model of the external world to detect errors,
omissions, and unexpected events. Finally, there will be a control system that can
accept high-level commands and break down those commands into effective
behavior within the context of the conditions reported by the sensory-processing
system.
To be economically practical, the software for all these capabilities must fit in a
small minicomputer, or a network of microcomputers, and it must run fast enough
so that the robot can react to sensory data within a small fraction of a second. Final¬
ly, the entire robot system, including the sensors, computers, mechanical structure,
and power system, must cost less than what is normally paid to human labor for the
same work.
Given these very stringent requirements, it is almost surprising that robots are
practical at all. Yet, the progress in industrial robotics is rapid, and robots are being
successfully used in a growing number of applications. As the number and
sophistication of their sensory capabilities increases and the cost of computing
power continues to decline, the number of potential applications will grow. As that
happens, robots will become an increasingly important economic factor, and more
and more money will become available for research and development.
A good strategy for the would-be roboticist would be to start from the present
state-of-the-art as practiced in industrial robotics and gradually expand the sensory
and control capabilities until the more difficult tasks become tractable. In fact, this
strategy has been pursued successfully by a number of robotics research
laboratories. For example, the robotics group at S.R.I. International, directed for a
number of years by Charles Rosen and afterward by David Nitzan, began their
robotics work as an outgrowth of the Shakey project. Shakey, shown in figures 8.29
and 8.30, was an early robot, conceived as a demonstration project for the Ad¬
vanced Research Projects Agency (ARPA) artificial intelligence program. Shakey
could be given a task such as finding a box of a given size, shape, and color and
moving it to a designated position. Shakey was able to search for the box in various
rooms, cope with obstacles, and plan a suitable course of action. In some situations,
the performance required the implementation of a preliminary action before the
principal goal could be achieved. In one instance, illustrated in figure 8.30, Shakey
figured out that by moving a ramp a few feet it could climb up on a platform where
the box had been placed.
Figure 8.30: Shakey was able to find its way around a suite of several rooms connected by a
number of doorways. It could detect which pathways were clear and which were blocked and
plan an optimum pathway on command. It was even able to accept a task such as "push the
box off the platform." This was accomplished by developing a strategy of pushing the ramp to
the platform so that it could climb up and push the box off.
Figure 8.31: A side view of the vision system used on the National Bureau of Standards research robot. A stroboscopic flash unit projects a plane of light into the region in front of the robot fingertips. A camera mounted on the robot wrist measures the apparent position of the light reflected from an object and computes the position and orientation of the reflecting surface. If the camera sees a bright mark at angle a1, the reflecting object must be located at distance d1. If the bright mark is seen at angle a2, the reflecting object is at distance d2. The known value of h makes the distance calculation a simple problem in trigonometry.
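The trigonometry can be sketched in a few lines. The fragment below is a simplified illustration, not the NBS code: it assumes the camera's optical axis is parallel to the plane of light and displaced from it by the baseline h, so that a bright mark seen at angle a off the axis lies at a distance d = h / tan(a) along the plane.

```python
import math

def distance_from_streak_angle(baseline_h: float, angle_a: float) -> float:
    """Distance to the illuminated surface from the angle at which the camera
    sees the bright streak, under the simplified geometry described above:
    tan(a) = h / d, so d = h / tan(a)."""
    return baseline_h / math.tan(angle_a)

# Example: a 10 cm baseline and a mark seen 5 degrees off the optical axis.
print(distance_from_streak_angle(0.10, math.radians(5.0)))   # about 1.14 m
```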
When the Shakey project ended, the S.R.I. robotics group turned their atten¬
tion to the problems of industrial automation. They established an industrial
affiliates program, whereby any company desiring to get expert advice and share in
state-of-the-art research in industrial robotics could become a partner in research.
The S.R.I. research program has produced a number of significant contribu¬
tions to the field. One of the most notable of these is a robot vision system which
includes a TV camera, a computer interface, a microcomputer, and a software
package with a number of binary image processing capabilities. The S.R.I. vision
system converts the TV image to a binary (black-white) image and computes a
number of features on the resulting image regions. For example, the system com¬
putes the area and perimeter of connected regions, their set inclusion relationships,
their degree of elongation along various axes, and their position and orientation in
the visual field. This vision system, used in a number of industrial applications, is
now marketed commercially. S.R.I. has also explored other industrial applications
in parts acquisition, welding, three-dimensional vision, and force and touch sensing.
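The kinds of region measurements such a system reports can be illustrated with a short fragment. The sketch below is not the S.R.I. code, whose algorithms and data structures certainly differ; it simply computes the area, a crude perimeter estimate, the centroid, and the principal-axis orientation of one already-isolated region of a binary image.

```python
import numpy as np

def region_features(mask: np.ndarray) -> dict:
    """Simple features of one connected region given as a 0/1 pixel mask."""
    ys, xs = np.nonzero(mask)
    area = len(xs)

    # Perimeter: count region pixels that touch at least one background pixel.
    padded = np.pad(mask, 1)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:]).astype(bool)
    perimeter = int(area - np.count_nonzero(mask.astype(bool) & interior))

    # Centroid (position in the visual field).
    cx, cy = xs.mean(), ys.mean()

    # Orientation of the principal axis from the second moments of the pixels.
    mu_xx = ((xs - cx) ** 2).mean()
    mu_yy = ((ys - cy) ** 2).mean()
    mu_xy = ((xs - cx) * (ys - cy)).mean()
    orientation = 0.5 * np.arctan2(2 * mu_xy, mu_xx - mu_yy)

    return {"area": area, "perimeter": perimeter,
            "centroid": (cx, cy), "orientation": orientation}

# Example: a 3-pixel-high by 8-pixel-wide horizontal bar.
bar = np.zeros((10, 12), dtype=int)
bar[4:7, 2:10] = 1
print(region_features(bar))
```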
Artificial intelligence techniques have also been applied to a number of in¬
dustrial applications at MIT. Professor Berthold Horn developed a method for
aligning integrated circuit chips under a microscope so that leads could be
automatically bonded and test probes automatically positioned. Machinery based on
these principles is widely used in the commercial manufacture of electronic com¬
ponents.
A research project at the University of Rhode Island under the direction of Pro¬
fessors John Birk and Robert Kelly has demonstrated the use of robot vision to ac¬
quire parts from a bin. In this case, a TV camera mounted on the robot looks into
the bin and finds a region of uniform brightness. This, it assumes, is a flat surface of
a part. Then it places a vacuum gripper in the center of this region, lifts the part out
of the bin, and holds it up in front of a second TV camera that analyzes the shape of
the now isolated part. This determines the orientation of the part relative to the
robot gripper, and the control system uses this information to place the part in a
desired position in a work fixture.
Figure 8.32: A calibration chart for the vision system shown in Figure 8.31. The pixel row and
column of any illuminated point in the TV image can be immediately converted to x,y position
in a coordinate system defined in the robot fingertips. The x-axis passes through the two
fingertips and the y-axis points in the same direction as the fingers. The plane of the projected
light is coincident with the x-y plane so that the z coordinate of every illuminated point is zero.
Figure 8.33: The NBS experimental robot approaching a cylindrical object. The bright streak
across the cylinder and table top is produced by the plane of the light projector. The image of
this streak of light is processed by the vision system to measure the three-dimensional position
and orientation of the object and to assist in recognizing the type of object being viewed. This
projection technique makes it possible to directly measure the three-dimensional shape of the
object regardless of its reflectance properties or the nature of the background.
Figure 8.34: A series of pictures made at the Jet Propulsion Laboratory by projecting a pair of
vertical planes of light onto objects on a table top. The pictures shown in (e), (f), (g), and (h)
are thresholded versions of (a), (b), (c), and (d) respectively. If the thresholded images are
scanned from left to right, the distances of the bright pixels from the edges of the frame are
directly proportional to the distances of the illuminated points from the camera. Thus, the
distance to, and height of, any object illuminated by either of the lines of light can be
calculated by simple trigonometry or from a look-up table. The planes of light can be scanned
back and forth to build up a depth map of the entire region in front of the camera.
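As a concrete illustration of the caption's last sentence, the following short Python sketch (not part of the original text) converts a measured pixel offset into a range, first by the direct proportionality the caption describes and then by the equivalent look-up table. The scale factor and all numeric values are invented placeholders that would come from calibration.

def range_from_offset(offset_px, metres_per_pixel=0.01, zero_offset_px=0):
    """Range of an illuminated point, assuming range is directly
    proportional to the pixel offset of the bright streak."""
    return metres_per_pixel * (offset_px - zero_offset_px)

# The equivalent look-up table trades the arithmetic for a precomputed array
# indexed by pixel offset, as the caption suggests.
lookup = [range_from_offset(px) for px in range(128)]
print(range_from_offset(50), lookup[50])   # both 0.5 (metres, illustrative)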
CONCLUSIONS
Robotics has come a long way from the great clock tower in Piazza San Marco.
The technologies of electronics, computers, and servomechanisms have made enor¬
mous strides in recent years; the technologies of semiconductor sensors, memories,
and microcomputers are still in their ascendancy. Yet paradoxically, the actual state
of the art in robotics falls far short of the popular imagination. For years, science
fiction robots have been the heroes and villains of countless spellbinding tales. The
fictional robot is as familiar to most Americans as Superman or Wonder Woman.
Unfortunately, it is just as unreal. Industrial robots costing $50,000 apiece seem
pedestrian compared to the popular image of walking, talking, mechanical
humanoids with superhuman skills and strength. Today only the superhuman
strength is a reality.
But tomorrow will be different. We have only just learned to build good
numerically controlled machines. Only in the past ten years have computers become
inexpensive enough to dedicate a single computer to a robot. It will soon be practical
to use five or ten powerful computers for controlling a single robot. Microcomputer
technology is making this not only conceivable, but simple and inexpensive. Already
computers for robot-control systems cost only a fraction of what the mechanical
hardware costs.
The technology of robotics has come to the edge of a historical breakthrough.
Within this decade industrial robots will begin to play a major role in the process of
industrial manufacturing. By the turn of the century, robots will have fundamental¬
ly altered the entire industrial process. In the long term, this cannot but profoundly
affect the course of human civilization.
CHAPTER 9
Hierarchical Robot-Control Systems
Figure 9.2: A state-space graph representing a simple robot program. Each state Qi corresponds to a single instruction or primitive action in the robot's program. fi,j represents the logical conditions required for the state transition from Qi to Qj. The state transition corresponds to the program counter stepping from one instruction to the next.
Figure 9.3: Conditional branching enables a robot to select one of several different state trajectories (or program pathways) depending on sensed conditions.
Figure 9.4: State trajectories corresponding to robot programs at two hierarchical levels.
For example, the command set shown in figure 9.5 comprising the programming
language VAL (supplied with the Unimate PUMA robot) is essentially a set of
elemental-move commands.
Of course, strings of macro statements at the elemental move level can
themselves be partitioned into consistently recurring groups to form second-level
macros. These correspond to VAL subroutines. A well-designed set of such
subroutines can become statements in a programming language at the simple task
level. Similarly, recurring groups of simple task commands can be defined as third
level macros, or complex task commands. In principle, this process can be repeated
any number of times to create a hierarchy wherein modules at each level contain a
library of programs, written in a programming language peculiar to that level. Each
statement at a particular level consists of an input command which, in the context of
the feedback at that level, generates a sequence of subcommands to the next lower
level. At the very top is a single command, or task name, which is decomposed
through a succession of hierarchical levels, until at the lowest level a string of action
primitives produces the forces and motions to accomplish the top-level goal.
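The decomposition just described can be made concrete with a small sketch (Python; the macro libraries, task names, and expansions below are invented for illustration, not taken from any particular robot language). Each level is simply a library of macros that expands a command into a string of commands for the level below, until only action primitives remain.

libraries = {
    "complex task": {"ASSEMBLE": ["FETCH part", "MATE part", "FASTEN part"]},
    "simple task":  {"FETCH part":  ["reach", "grasp", "lift", "transport"],
                     "MATE part":   ["position", "insert"],
                     "FASTEN part": ["twist", "release"]},
}
primitives = {"reach", "grasp", "lift", "transport", "position",
              "insert", "twist", "push", "pull", "release"}

def decompose(command):
    """Expand a top-level command into a flat string of action primitives."""
    if command in primitives:
        return [command]
    for level in ("complex task", "simple task"):
        if command in libraries[level]:
            result = []
            for subcommand in libraries[level][command]:
                result.extend(decompose(subcommand))
            return result
    raise KeyError(f"no macro defined for {command!r}")

print(decompose("ASSEMBLE"))
# ['reach', 'grasp', 'lift', 'transport', 'position', 'insert', 'twist', 'release']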
The number of macros required at each level depends upon the breadth and
degree of generality of the robot skill mix. In a restricted domain such as manufac¬
turing, the number of skills is limited. Time and motion studies of human workers
and robots in manufacturing indicate that the number of different types of elemen¬
tal moves required to perform routine mechanical manufacturing tasks is not large.
There is a reasonably small set of different types of action at the elemental move
level, such as reach, grasp, lift, transport, position, insert, twist, push, pull, and
release. A list of modifier variables, derived in part from feedback, can specify
where to reach, when to grasp, how far to twist, and how hard to push.
Program instructions:
AB ABOVE
BE BELOW
FL FLIP
LE LEFTY
NOF NOFLIP
RI RIGHTY
CO COARSE [ALWAYS]
FI FINE [ALWAYS]
INTOF INTOFF [ALWAYS]
INTON INTON [ALWAYS]
NON NONULL [ALWAYS]
NU NULL [ALWAYS]
SP SPEED <value> [ALWAYS]
Figure 9.5: The instruction set for the VAL robot programming language. This is the language
provided with the Unimation PUMA robot.
cess at any level at any time, altering it and so making the system responsive in real
time to events in the environment.
The sophisticated real-time use of sensory measurement information for coping
with uncertainty and recovering from errors requires that sensory data be able to
interact with the control system at many different levels with many different con¬
straints on speed and timing. For example, as shown in figure 9.6, joint position,
velocity, and sometimes force measurements are required at the lowest level in the
hierarchy for servo feedback. This data requires very little processing, but must be
supplied with time delays of only a few milliseconds. Visual depth, proximity data,
Figure 9.6: A cross-coupled sensory-control hierarchy. The links from left to right provide
feedback for controlling behavior. The links from right to left provide context and expecta¬
tions for processing sensory data. This figure illustrates the type of feedback information
provided to the task-decomposition hierarchy at each level. In this diagram the lowest level of
the hierarchy of figure 9.1 has been split into two levels: one for coordinate transformations
between work space (or end effector space) and joint space; the other for servo-computations
on the joint positions and velocities.
and information related to edges and surfaces are needed at the primitive action level
of the hierarchy to compute offsets for gripping points. This data requires a modest
amount of processing and must be supplied within a few tenths of a second.
Recognition of part position and orientation requires more processing and is needed
at the elemental move level where time constraints are on the order of seconds.
Recognition of parts and/or relationships between parts that may take several
seconds is required for conditional branching at the simple task level.
Attempting to deal with this full range of sensory feedback in all of its possible
combinations at a single level leads to extremely complex and inefficient programs.
An obvious strategy might be to use structured programming with layers of
subroutines. But using conventional macros and subroutines does not help in the
case of complex sensory interaction, because of the hopeless complication of multi¬
ple interrupt servicing at many levels of subroutine calls simultaneously. Interrupt-
driven sensory interaction has the additional disadvantage of making complex
programs very difficult to debug.
This suggests that conventional programming approaches to sensory-
interactive robot-control systems are fatally flawed. Only if the various modules in
the control hierarchy are treated like state machines does it become simple to write
and debug programs for sensory-interactive behavior.
SOFTWARE DESIGN
The design of a software system that can implement the multilevel branching of
the cross-coupled, processing-generating hierarchy is conceptually straightforward.
At each level of the generating hierarchy, the H function represents a table, or list,
of states and state mappings, one of which is selected by every possible combination
of input vectors C + F. At each time tick k, the input C + F constitutes an ad¬
dress, or pointer, to a node which contains either the output P itself or a procedure
for computing P. This is illustrated in figure 9.7.
Theoretically, all H, M, and G modules can be implemented by program statements of the form IF <state, input> THEN <output, next-state>. This is equivalent to P = H(S). It is a canonical form for representing knowledge, rules of behavior, and algorithms for perception.
One method of implementing the H, M, and G modules is by a state-table as
shown in figure 9.8. Here the simple task <FETCH(X)> is defined.
[Figure 9.7: diagram of H modules at two levels. At each level, the command C and the feedback F from the previous tick (which includes the internal state S) address the state-table; the outputs are the command P sent to the next lower level, the next-state information, and the sensory context sent to the M module.]
The left-hand
side of the table consists of a command vector C and a feedback vector F. The C vec¬
tor consists of a command FETCH and an argument X. The F vector consists of a
state defined by the previous output plus feedback consisting of processed sensory
data from the external environment as well as information from lower-level
modules. The right-hand side of the state-table defines (or points to procedures that
define) a P vector that consists of an output command to lower-level H modules, the
next state information to be used internally, and sensory context information to be
Figure 9.8: A state-table representation of the task-decomposition function at the simple task level. The left-hand columns hold the command C and the feedback F at tick k-1; the right-hand column holds the output P at tick k. A state-table such as this is one method of implementing the function P = H(S).
sent to M modules. The sensory context information addresses the M module that
retrieves an R vector to be sent to the G module. In this example, when the sensory
context output is g2, the sensory-processing module is instructed by the M module to
use the G function that computes the orientation of X. The M module may also be
instructed to generate an expected value for Orientation(X). The feedback information then indicates whether Orientation(X) < 0 or Orientation(X) > 0. When the sensory context output is g1, the G function which computes the distance to X is evoked.
Figure 9.9 illustrates an entire library of procedures in an H module and the
state-table for accessing them. At each clock tick k, the left-hand side of the state-
table is searched for an entry corresponding to the input C + F. If an entry is
found, the pointer is set to that location, and the first node in the right side of the
state-table is used as a pointer to a procedure that computes an output P = H(S). If
no entry can be found, the pointer is set to an error condition and a procedure is
evoked to output the appropriate failure activities. In most cases, a failure condition
will output a <STOP> command to the H module below and a failure flag to the H
module above.
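A minimal sketch of this lookup, written in Python rather than in the table notation of figure 9.9, is given below. The command, feedback values, and outputs are invented placeholders; the point is only the addressing of the table by C + F and the default failure behavior.

state_table = {
    # (command, state:feedback) -> (output to the lower level, next state)
    ("FETCH X", "s0:far"):     ("TRANSPORT to X", "s1"),
    ("FETCH X", "s1:arrived"): ("GRASP X",        "s2"),
    ("FETCH X", "s2:grasped"): ("LIFT X",         "done"),
}

def H(command, feedback, state):
    """P = H(S): the input C + F addresses an entry in the state-table.
    An unmatched input evokes the failure procedure."""
    key = (command, f"{state}:{feedback}")
    if key in state_table:
        return state_table[key]
    return ("<STOP>", "failure")          # failure flag goes to the level above

print(H("FETCH X", "far", "s0"))          # ('TRANSPORT to X', 's1')
print(H("FETCH X", "dropped", "s1"))      # ('<STOP>', 'failure')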
Each entry in the state-table represents an IF/THEN rule, or production. With
this construction, it becomes possible to define behavior of arbitrary complexity. An
ideal task performance can be defined in terms of the list of events that take place
during the ideal performance. Deviations from the ideal can be incorporated by
simply adding the deviant conditions to the left-hand side of the state-table and the
appropriate action to be taken to the right-hand side. Any conditions not explicitly
covered by the table result in a failure routine being executed.
Figure 9.9: The entire library of procedures that comprise an H module can be represented as
an extended state-table. This extended table is simply the union of all the state-tables that
represent all of the tasks that can be decomposed by the H module.
Figure 9.10: A microcomputer network developed at the National Bureau of Standards for
implementing a hierarchical robot control system.
[Figure 9.11 diagram: on each start pulse, all microcomputers read from common memory and compute an output; then all microcomputers write into common memory and wait for the next start pulse.]
Figure 9.11: A timing diagram of the activity in the microcomputer network shown in figure
9.10.
Each logical module is thus a state-machine whose outputs depend only on its
present inputs and its present internal state. None of the logical modules admit any
interrupts. Each starts its read cycle on a clock signal, computes and writes its out¬
put, and waits for the next clock signal. Thus, each logical module is a state-machine
with the IF/THEN or P = H(S) properties of a CMAC function.
The common memory “mail drop” communication system has a number of
advantages and disadvantages. One disadvantage is that it takes two data transfers
to get information from one module to another. However, this is offset by the
simplicity of the communication protocol. No modules talk to each other so there is
no handshaking required. In each 28-millisecond time slice, all modules read from
common memory before any are allowed to write their outputs back in.
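The read-before-write discipline can be sketched in a few lines of Python (the module functions and state variables below are invented placeholders; the real system's modules, of course, run on separate microcomputers). Every module computes from the same snapshot of common memory, and no output is written back until all modules have finished reading.

common = {"command": "FETCH X", "primitive": None, "joint_pos": 0.0}

def task_module(mem, state):
    # decompose the current command into a primitive for the level below
    return {"primitive": "REACH"}, state

def servo_module(mem, state):
    # servo the joint toward a setpoint implied by the current primitive
    target = 1.0 if mem["primitive"] == "REACH" else 0.0
    return {"joint_pos": mem["joint_pos"] + 0.1 * (target - mem["joint_pos"])}, state

modules = [(task_module, None), (servo_module, None)]

for tick in range(3):                     # one pass per time slice
    snapshot = dict(common)               # every module reads the same snapshot
    pending_writes = []
    for i, (fn, st) in enumerate(modules):
        output, st = fn(snapshot, st)
        modules[i] = (fn, st)
        pending_writes.append(output)
    for output in pending_writes:         # only now are outputs written back
        common.update(output)
    print(tick, common)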
The use of common-memory data transfer means that the addition of each new
state variable requires only a definition of where the newcomer is to be located in
common memory. This information is needed only by the module that generates it
so that it knows where to write, and by the modules that read the information so that
they know where to look. None of the other modules need know, or care, when such
a change is implemented. Thus, new microcomputers can easily be added, logical
modules can be shifted from one microcomputer to another, new functions can be
added, and even new sensor systems can be introduced with little or no effect on the
rest of the system. As long as the bus has surplus capacity, the physical structure of
the system can be reconfigured with no changes required in the software resident in
the logical modules not directly involved in the change.
Furthermore, the common memory always contains a readily accessible map of
the current state of the system. This makes it easy for a system monitor to trace the
history of any or all of the state variables, to set break points, and to reason
backwards to the source of program errors or faulty logic.
The read-compute-write-wait cycle, with each module a state-machine, makes it
possible to stop the process at any point, to single step through a task, and to
observe the performance of the control system in detail. This is extremely important
for program development and verification in a sophisticated, real-time, sensory-
interactive system in which many processes are going on in parallel at many different
hierarchical levels.
storage until all the processes have completed their read cycle. Then each of the pro¬
cesses is allowed to write its output variables into common memory. The program
then cycles back to the beginning and restarts. Any process that cannot finish its
computation in a single cycle can hold temporary results until its next turn in the
next cycle. When it finally does complete its computation, it writes its output during
the allotted time slot and waits for the next read period. This is a programming
technique that is often used in process-control, systems-simulation, and multitask
modeling.
Thus, there are many ways to implement the hierarchical control structure
described above. It can be done with a large main-frame computer, a powerful
minicomputer, a network of microcomputers, or even by a network of CMACs. The
modular, state-machine approach separates the H, M, or G functions into simple,
understandable blocks of code which can be written, debugged, and optimized in¬
dependently. The modules have a simple canonical form that makes them
understandable and the code readable. It forces a partitioning of the problem into
manageable chunks, which can be independently analyzed, reduced to algorithms,
and then reassembled into a complex intelligent system. This provides a systematic
approach to the synthesis of intelligent behavior. It can start with the simple tasks
that are within the capacity of present-day robots, gradually adding new modules to
increase the computing power and sensory capability of the system. Each additional
sensory module increases the sophistication of the sensory interaction, and each new
control module adds a new motor skill. Each new memory or sensory-processing
module increases the perceptual capacities. Most important, each new level in the
hierarchy produces a quantum jump in the intelligence of the system. This modular
approach thus provides the beginnings of an evolutionary framework for upgrading
the simple behavioral patterns of the pick-and-place robot to eventually approach
the intellectually sophisticated and enormously complex behavioral patterns pro¬
duced by the human brain.
FUTURE DEVELOPMENTS
materials transport systems, inventory control, safety, and inspection systems into a
sensory-interactive, goal-seeking hierarchical computing structure for a totally
automatic factory.
Of course, it is at the higher levels that the implementation of the hierarchy in
small modular state-machine computational modules has not yet been proven feas¬
ible. This is still an open issue about which one can only speculate at this time.
CHAPTER 10
Artificial Intelligence
Planning and problem-solving are two of the central topics in artificial in¬
telligence. The principal issue in these areas is the search for, and optimization of, a
success trajectory through the space of all possible states of the world. Since this is
obviously an infinite space, the first step is to limit the problem domain to some
subset of all possible states of the world. One way to do this is to select a finite prob¬
lem. A game like tick-tack-toe has a relatively small number of possible states, and
the entire state space can be exhaustively searched rather easily. Figure 10.1 shows
part of the graph that completely describes the space of all possible states in the
game of tick-tack-toe. On the other hand, a game like chess has a space of all possi-
ble states that is so large that it could not be exhaustively searched by the fastest
computer in the world during the life span of the universe.
Problems in the real world often have an infinite space of possible states, yet
many of these are obviously solvable. Real-world problems are routinely solved by
persons of modest intellect and training. In fact, real problems of enormous com¬
plexity such as hunting for food, escaping from danger, winning a sexual partner, or
rearing offspring are regularly solved, not only by humans, but also by animals and
insects. Many everyday problems are apparently nowhere near as subtle or difficult
as board games like chess. For example, the simple everyday problem described in
Chapter 1 of buying a record at the shopping center clearly involves an infinite space
of possible states. Each step taken could be performed by a continuum of muscle
contraction forces. The problem could be solved in a wheelchair, on a bicycle, or
even by rolling one's body along the ground. An interesting feature of human
problem-solving is that we rarely think of all the possible alternatives—we just do
the simplest thing without much thought.
The traditional artificial intelligence approach to computerized problem¬
solving is to define a set of states (i.e., configurations of the world) and rules for
transforming one state into another. In a board game, there is usually an initial state
(or set of states) from which the game begins and a final state (or set of states) that
signals the end. The rules of the game are the rules for moving pieces that transform
the game from one state to the next. One can construct a graph, such as shown in
figure 10.1, that represents all possible trajectories through the state space of the
game. Any single playing of the game is represented by a single trajectory through
this graph.
Much AI (artificial intelligence) research has concentrated on the question of
finding a winning trajectory. This is usually accomplished by defining an evaluation
function, which computes some scalar value of goodness, or advantageousness, of
any state. From any current position, or state, all possible states that can be reached
by a given number of legal moves are evaluated for goodness, and the path leading
to the state with the best evaluation is chosen for the next move. The evaluation
function used in AI corresponds to the emotions in the biological brain. Both are
used by their respective systems to select optimum behavioral trajectories.
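A minimal sketch of this kind of search follows (Python; the game interface, that is, the functions moves, apply_move, and evaluate, is an assumed placeholder rather than any particular game program). All states reachable within a fixed number of moves are scored, and the first move on the best-scoring path is chosen.

def best_move(state, moves, apply_move, evaluate, depth=2):
    """Return (move, score) for the move whose best reachable descendant
    receives the highest evaluation."""
    def best_score(s, d):
        options = moves(s)
        if d == 0 or not options:
            return evaluate(s)
        return max(best_score(apply_move(s, m), d - 1) for m in options)
    return max(((m, best_score(apply_move(state, m), depth - 1))
                for m in moves(state)), key=lambda pair: pair[1])

# Toy usage: states are integers, moves add one or double, and the evaluation
# prefers states close to 10.
print(best_move(3,
                moves=lambda s: ["+1", "*2"],
                apply_move=lambda s, m: s + 1 if m == "+1" else 2 * s,
                evaluate=lambda s: -abs(10 - s)))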
Research in AI has also focused on methods for finding effective evaluation
functions. Some efforts have attempted to devise evaluation functions that learn
from mistakes and thus improve their performance. The checker-playing program
of A. L. Samuel is an early and notable example of this strategy. Other research
areas have attempted to apply different evaluation functions under different board
conditions, such as opening, mid-game, and end-game phases. This is a simple form
of hierarchical decomposition.
Still other strategies have concentrated on the order of the search (i.e., which
states to evaluate and in which order). It is clear from figure 10.1 that a great major¬
ity of the possible moves are strategically poor and are apparently not even
considered by human tick-tack-toe players. How does one find the good moves and
avoid the bad? For example, should one evaluate all possible single moves from the
current position, or should one follow promising pathways through several moves
before evaluating the next possible move from the current position? There is a com¬
putational cost associated with each evaluation and a finite amount of computation
that can be performed between each move. Thus, the program that uses its computa¬
tional resources most efficiently is most likely to be successful.
Figure 10.1: A graph that describes part of the space of all possible states of the game tick-
tack-toe. The dark trace indicates the sequence of moves actually made during one playing of
the game. Strategies for searching such graphs to find trajectories leading to winning states are a classic topic in artificial intelligence research.
Perhaps the major difference between human and machine approaches to
problem-solving is that the brain performs many parallel operations while com¬
puters do not easily represent parallel processes in search strategies. This is par¬
ticularly true for computers programmed in LISP, the principal programming tool
of AI research. Thus, much effort has been expended on making early decisions
about which moves are unpromising so that the decision tree can be pruned.
Procedures for deciding which search strategies and which evaluation functions
to apply in which situations are called heuristics. Heuristics are essentially a set of
rules that reside one hierarchical level above the move selection and evaluation func¬
tions of the search procedure. A heuristic is a strategy for selecting rules, i.e., a
higher level rule for selecting lower level rules.
Attempts have been made to duplicate human problem-solving strategies.
Perhaps the best known example is the General Problem Solver (GPS) developed by
Allen Newell and Herbert A. Simon. Newell and Simon recorded the human thought
processes reported by students as they struggled with problems in abstract
mathematics. From these observations they developed for GPS a technique called
means-ends analysis. Simply stated, this consists of observing the goal state, com¬
paring it with the existing state, and then searching for a transformation rule that
can change the existing state into a form closer to the goal state.
This process implies an ability to measure the state-space distance between the
existing state and the goal state and to associate a transformation with each possible
difference condition. In complex problems, the number of possible difference condi¬
tions can be astronomically large.
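In outline, means-ends analysis can be sketched as follows (Python; the state encoding, the difference function, and the operator table are invented for illustration and are far simpler than anything GPS faced).

def means_ends(state, goal, difference, operators, limit=20):
    """Repeatedly find the difference between state and goal and apply the
    transformation associated with that difference."""
    for _ in range(limit):
        d = difference(state, goal)
        if d is None:                     # no remaining difference: solved
            return state
        if d not in operators:
            raise RuntimeError(f"no operator for difference {d!r}")
        state = operators[d](state, goal)
    raise RuntimeError("step limit reached")

# Toy problem: move a point to a goal location one axis at a time.
def difference(s, g):
    if s[0] != g[0]: return "wrong-x"
    if s[1] != g[1]: return "wrong-y"
    return None

operators = {
    "wrong-x": lambda s, g: (s[0] + (1 if s[0] < g[0] else -1), s[1]),
    "wrong-y": lambda s, g: (s[0], s[1] + (1 if s[1] < g[1] else -1)),
}

print(means_ends((0, 0), (5, 3), difference, operators))   # (5, 3)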
Attempts to deal with this led eventually to the concept of problem reduction
using AND/OR graphs. The basic notion here is a hierarchical decomposition of a
single difficult (or high-level) problem into a sequence of simpler subproblems. If all
Figure 10.2: An AND/OR graph. Nodes B and C are OR nodes under A. Nodes D and E are
AND nodes under B. F and G are AND nodes under C. AND nodes are indicated by an arc
joining them.
of the subproblems need to be solved in order to solve the higher level problem, the
decomposition is represented by an AND node. If only one of several subproblems
needs to be solved in order to solve the higher level problem, the decomposition is
represented by an OR node. Figure 10.2 shows a simple AND/OR graph. Node A
represents a problem that can be solved by solving either subproblem B or C. Sub¬
problem B can be solved by solving both sub-subproblems D and E. The strategy is
to repeatedly decompose problems into subproblems until finally at the bottom of
the AND/OR graph there is a sequence of primitive subproblems for which there
exist transformation rules corresponding to one-step solutions.
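The decomposition strategy can be sketched directly from figure 10.2 (Python; which leaf subproblems have one-step solutions is an invented illustration): a node is solved if any of its OR successors is solved, or if all of its AND successors are solved.

graph = {
    "A": ("OR",  ["B", "C"]),    # A is solved by solving either B or C
    "B": ("AND", ["D", "E"]),    # B requires both D and E
    "C": ("AND", ["F", "G"]),    # C requires both F and G
}
solvable_leaves = {"D", "E"}     # primitive subproblems with one-step solutions

def solved(node):
    if node not in graph:                      # a leaf: primitive subproblem
        return node in solvable_leaves
    kind, children = graph[node]
    results = [solved(child) for child in children]
    return any(results) if kind == "OR" else all(results)

print(solved("A"))   # True, because B's subproblems D and E are both solvable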
AND/OR graphs were first used by James Slagle in a program called SAINT
that solved freshman calculus problems in symbolic integration. Soon afterward,
this technique was used by Rigney and Towne for analyzing the structure of serial
action schedules for industrial tasks. It has subsequently been used for many
different types of problem-solving, including task decomposition for robots. The
task-decomposition hierarchies discussed extensively in Chapters 5 through 7 are
essentially AND/OR graphs with time and feedback inputs explicitly represented.
The trajectories shown in figure 5.17 correspond to a string of AND nodes that
decompose the complex task < ASSEMBLE AB> . Figure 5.21 corresponds to a set
of alternative OR nodes that are selected under different external conditions.
PRODUCTION SYSTEMS
the molecule producing the spectrum has a particular structure. The premise can be
a set of symptoms and results of blood tests and the consequent a probability that a
certain infectious disease is present.
One of the first successful production-based systems is a program called DEN-
DRAL. The DENDRAL system works out the structure of molecules from chemical
formulas and mass spectrograms. First a set of production rules is applied to the
mass spectrogram to create lists of required and forbidden chemical substructures.
These rules are of the form:
IF there is
a high peak at mass/charge point 71, and
a high peak at mass/charge point 43, and
a high peak at mass/charge point 86, and
any peak at mass/charge point 58,
THEN
there must be an N-PROPYL-KETONE3 substructure.
DENDRAL contains about 10 such production rules for any given category of
chemical compounds. The application of these rules to the mass spectrogram
reduces the number of possible chemical structures from several hundreds or
thousands to several tens. Once this is done, a second set of production rules
operates on the remaining candidate compounds to predict what the mass spec¬
trogram would look like if that particular compound were the experimental one.
Then each of the predicted spectrograms is compared with the observed spec¬
trogram, and the best match is determined. There are about 100 production rules in
this second set.
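The first step can be sketched as a handful of Python functions (the spectrum representation, thresholds, and helper names below are invented for illustration; only the rule itself paraphrases the example quoted above). Each production tests the spectrogram and, if its premise holds, asserts a required substructure.

def high_peak(spectrum, mz, threshold=0.5):
    return spectrum.get(mz, 0.0) >= threshold

def any_peak(spectrum, mz):
    return spectrum.get(mz, 0.0) > 0.0

production_rules = [
    (lambda s: high_peak(s, 71) and high_peak(s, 43)
               and high_peak(s, 86) and any_peak(s, 58),
     "N-PROPYL-KETONE3"),
]

def required_substructures(spectrum):
    """Apply every production; collect substructures whose premises hold."""
    return [conclusion for premise, conclusion in production_rules
            if premise(spectrum)]

spectrum = {71: 0.9, 43: 0.8, 86: 0.7, 58: 0.1}      # mass/charge -> peak height
print(required_substructures(spectrum))              # ['N-PROPYL-KETONE3']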
Grammar G

Symbols:
S = Sentence
NP = Noun Phrase
VP = Verb Phrase
V = Verb
N = Noun
PP = Prepositional Phrase
P = Preposition

Rules:
S → NP VP
VP → V NP
VP → V NP PP
PP → P NP
P → to
V → gave
NP → N
NP → the N
N → Sam
N → John
N → apple
Figure 10.3a: A grammar consisting of a set of symbols and a set of rules for transforming
symbols into other symbols.
The DENDRAL program has been tested extensively and found to perform
about as well as an experienced chemist on the particular type of compounds
covered by the set of rules in the program. The success of the DENDRAL program
has led to further research and the development of several other programs with
names such as meta-DENDRAL, CONGEN, and SECS.
A similar program called MYCIN was developed to deal with the diagnosis of
bacterial infections. MYCIN currently contains about 500 production rules of the
following type:
IF
the infection type is primary-bacteremia, and
the suggested entry point is the gastrointestinal tract, and
the site of the culture is one of the sterile sites,
THEN
there is probability (p) that the organism is bacteroides.
Each rule returns a probability. MYCIN applies all 500 rules to the set of symp¬
toms and then combines the resulting probabilities in an AND/OR tree to select the
most likely type of infection.
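In skeletal form, such a rule set looks like the Python sketch below. The rule paraphrases the example above; the list of sterile sites, the probability value, and especially the way the probabilities are pooled (p <- p1 + p2 - p1*p2) are invented simplifications, not MYCIN's actual certainty-factor calculus or its AND/OR tree.

def rule_primary_bacteremia(findings):
    if (findings.get("infection") == "primary-bacteremia"
            and findings.get("entry") == "gastrointestinal tract"
            and findings.get("site") in {"blood", "csf"}):   # illustrative sterile sites
        return ("bacteroides", 0.7)                          # illustrative probability
    return None

rules = [rule_primary_bacteremia]

def diagnose(findings):
    belief = {}
    for rule in rules:
        result = rule(findings)
        if result is not None:
            organism, p = result
            prior = belief.get(organism, 0.0)
            belief[organism] = prior + p - prior * p         # pool the evidence
    return max(belief.items(), key=lambda kv: kv[1]) if belief else None

print(diagnose({"infection": "primary-bacteremia",
                "entry": "gastrointestinal tract",
                "site": "blood"}))                           # ('bacteroides', 0.7)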
One of the interesting features of this type of system is the canonical form of
the knowledge. Each rule is a modular chunk of knowledge that is applied whenever
the appropriate premise is present. It is the type of reasoning that humans are used
to, and the form of the knowledge makes it particularly easy to translate from
natural language into the production rules and vice versa. The program includes a
language translator so that it can be used interactively by doctors who are not com¬
puter scientists. A typical MYCIN print-out is shown in figure 10.4.
**HOW DO YOU USE THE TYPE OF THE INFECTION?
Would you like a summary of all the parameters concluded by these rules?
**NO
Which of the rules do you wish to see?
**280
RULE 280
1) The infection which requires therapy is meningitis, and
2) The type of the infection is fungal, and
3) Organisms were not seen on the stain of the culture, and
4) The patient is not a compromised host, and
5) The patient has been to an area that is endemic for
coccidiomycoses, and
6) The race of the patient is one of: black asian Indian, and
7) The cryptococcal antigen in the csf was not positive
Author: YU
Justification: Dark-skinned races, especially Filipino, Asian, and Black
(in that order) have an increased susceptibility to coccidiomycoses
meningitis.
Literature: Stevens, D. A. et al. Miconazole in Coccidiomycosis. Am J Med,
60:191-202, Feb 1976.
Figure 10.4: A record of a dialogue between the MYCIN program and a user. The inputs
typed by the human are preceded by the prompt characters **.
By looking at a trace of which productions were used in which order, the pro¬
gram can answer why a certain question was asked or how a certain conclusion was
derived. For example, the following type of exchange is typical:
Doctor: How do you know that the culture was taken from a sterile source?
Mycin: Rule001 and Rule022 gave suggestive evidence.
Doctor: How was Rule001 triggered?
Mycin: It was known that the site is one for which sterility depends on method.
The method is one of the recognized ones for the site, and it is not known if care
was taken; therefore there is strongly suggestive evidence that the culture was
taken from a sterile source.
If the doctor disagrees with MYCIN’s conclusion, he can walk back from the
conclusion through the AND/OR tree examining each production rule invoked to
see where his reasoning differs from that of the program. This may lead to the
modification of some rule or to the addition of a new rule. This is a classic case of
learning from a teacher.
The performance of the MYCIN program has been tested against the perfor¬
mance of specialists with varying degrees of expertise. The results of two series of
tests on 80 patients in one case and 10 patients in another showed the MYCIN
program scoring higher than any of the human prescribers. The scoring was done by
a panel of eight prominent specialists of infectious diseases at institutions not con¬
nected with the MYCIN project.
MYCIN has not yet been used in clinical settings. To be practical for everyday
use it must be extended to cover all the major infections likely to be found in a
hospital. This implies that additional hierarchical levels of production rules are
needed to select the set of lower level rules that apply to the particular type of infec¬
tion encountered.
The ability of production rule-based systems to perform expertly in a wide
variety of fields, from organic chemistry to diagnostic medicine to geology, is well-
documented. Production rules are exactly the type of functional operators required
at the upper levels of a hierarchical processing-control system. At these levels of the
hierarchy where this type of behavioral decision needs to be made, the requirements
for speed are not demanding. Computation times of several seconds, or even
minutes, are acceptable. Even the most sophisticated production-based systems have
a reasonably small set of production rules (i.e., less than a thousand). Such a system
can easily return a decision, or recommend an action, in a few seconds even if im¬
plemented on a microcomputer.
Thus, the application of rule-based expert systems to the high-level, goal-
decomposition modules of a hierarchical control system for an intelligent robot or
an automatic factory is a likely prospect within the next decade. Before the turn of
the century this type of production-based system may be capable of performing all
of the day-to-day operating decisions required in most factories and offices.
LANGUAGE UNDERSTANDING
Figure 10.5a: A computer graphics display of the “blocks world” that is the subject of conversation between a “robot” and a human programmer using Terry Winograd’s language-understanding program. Above is the configuration of the computer’s internal model of the blocks world at the time when the <PICK UP A BIG RED BLOCK> command is received.
Figure 10.5b: The internal model after the computer: 1) finds it must move the green block
before it can reach the big red one, 2) finds a place to put the green block, and 3) moves the
green block to the empty space.
Figure 10.5c: The configuration of the blocks world model after the big red block has been
successfully picked up and the computer has answered “OK. ”
Figure 10.6 is an example of the dialogue. As is obvious from this example, the level
of understanding of the computer is considerable. The computer can carry on a pro¬
tracted conversation, recognizing and using colloquial sentence fragments
appropriately in place of complete sentences, correctly interpreting pronouns, and
automatically selecting the most meaningful interpretation of ambiguous words or
phrases. This is not a trivial level of understanding.
The key to the Winograd program is that the computer memory contains a
model of the world that is shared by both the computer and the human programmer.
The discourse is about objects, actions, and relationships that have some specific
representation in the computer’s internal model. Furthermore, the knowledge is
mostly represented in the form of procedures, rather than static data structures. For
example, the dictionary contains separate little programs for each word, the execu¬
tion of which check whether the words are being correctly used. The part of the
system that analyzes the syntactic form of sentences contains separate programs that
encode the rules of each grammatical structure.
Finally, the computer program has the ability to “imagine” what would happen
IF certain actions were carried out. For example, when it receives the command
< PICK UP THE RED BOX >, it hypothesizes the attempt only to find that it is
blocked by the fact of a green box sitting on top of the red one. It then backs up,
hypothesizes moving the green box first, finds out it must discover an empty space to
put it down, etc. Finally, once it finds a way to pick up the red box, it actually
performs the entire sequence of actions in the proper order to accomplish the com¬
manded goal. Thus the program can hypothesize an action and evaluate the result of
that action; hypothesize another action and evaluate that; and so on. It can then
select the hypothesis that results in the commanded goal being realized and imple¬
ment it in overt action. In the case where the results of its internal (mental)
experiments lead to ambiguous, unclear, or impossible situations, the program can
ask questions or request further instructions of the human operator. There have
been several programming languages written to facilitate the writing of the kind of
software that is used in the Winograd program.
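The hypothesize-and-test loop can be caricatured in a few lines of Python (the world representation and planning routine below are invented; they are far cruder than Winograd's procedural knowledge). Before grasping a block, the program checks its internal model, and anything found resting on the block is first planned out of the way.

world_on = {"green": "red", "red": "table"}      # what rests on what

def blocks_on(block):
    return [b for b, under in world_on.items() if under == block]

def plan_pickup(block, plan):
    """Append to plan the steps needed before block can be grasped."""
    for obstruction in blocks_on(block):         # hypothesize: anything in the way?
        plan_pickup(obstruction, plan)           # first get hold of the obstruction
        plan.append(f"put {obstruction} on an empty space")
        world_on[obstruction] = "table"          # update the internal model
    plan.append(f"grasp {block}")

plan = []
plan_pickup("red", plan)
print(plan)   # ['grasp green', 'put green on an empty space', 'grasp red']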
I'M NOT SURE WHAT YOU MEAN BY "ON TOP OF" IN THE
PHRASE "ON TOP OF GREEN CUBES."
DO YOU MEAN:
1— DIRECTLY ON THE SURFACE
2— ANYWHERE ON TOP OF?
2
THREE OF THEM.
Had you touched any pyramid before you put the green
one on the little cube?
YES, THE GREEN ONE.
When did you pick it up?
WHILE I WAS STACKING UP THE RED CUBE, A LARGE RED
BLOCK AND A LARGE GREEN CUBE.
Why?
TO GET RID OF IT.
Why did you do that?
TO CLEAN OFF THE RED CUBE.
Why did you clear off that cube?
TO PUT IT ON A LARGE GREEN CUBE.
Why did you do that?
TO STACK UP THE RED CUBE, A LARGE RED BLOCK AND A
LARGE GREEN CUBE.
Why did you do that?
BECAUSE YOU ASKED ME TO.
Figure 10.6: An example of the dialogue between a human and the robot in Winograd’s pro¬
gram. Lines in capitals are generated by the computer.
Hello.
HOW DO YOU DO. PLEASE STATE YOUR PROBLEM.
I want a good example for my book.
YOUR BOOK.
Yes, my book about computers. Can you help me?
YOU'RE BEING SOMEWHAT SHORT WITH ME.
I didn't mean to be.
YOU SAY YOU DIDN'T MEAN TO BE.
That's right. What do you know about computers?
DO COMPUTERS WORRY YOU?
Not at all. I think they can be very useful.
CAN YOU ELABORATE ON THAT?
Computers play a vital role in the operation of our
society.
WHY DO YOU MENTION COMPUTERS?
Because that is what the book is all about.
IS THAT THE REAL REASON?
Of course. Sometimes you are difficult to talk to.
WHAT MAKES YOU THINK I AM DIFFICULT TO TALK TO?
You answer in such short, stylized phrases.
WE WERE DISCUSSING YOU - NOT ME.
I thought we were discussing computers.
WHAT DO YOU THINK MACHINES HAVE TO DO WITH YOUR
PROBLEM?
My only problem is to get you to carry on a
reasonable conversation.
WHY DO YOU SAY YOUR ONLY PROBLEM IS TO GET I TO CARRY
ON A RESONABLE CONVERSATION?
You don't use very good grammar.
OH, I DON'T USE VERY GOOD GRAMMAR.
That's right Don't you know the difference between
nominative and accusative?
YOU'RE NOT REALLY TALKING ABOUT ME - ARE YOU?
I really was, but I'm willing to change the subject.
DO YOU SAY YOU REALLY WAS FOR SOME SPECIAL REASON?
The reason is that I would like an interesting
conversation in the book.
IN WHAT WAY?
So that the readers will not get bored with it.
I AM NOT SURE I UNDERSTAND YOU FULLY.
You don't understand very much, do you!
WHAT ARE YOUR FEELINGS NOW?
I feel that I'm about ready to give up.
TELL ME MORE ABOUT SUCH FEELINGS.
The page is almost full now, so I can end this
interview .
SUPPOSE THE PAGE WERE NOT ALMOST FULL NOW?
Then we would have to continue this silliness a while
longer.
PLEASE GO ON.
That's all there is to say. Don't you ever get tired?
PERHAPS THAT'S ALL THERE REALLY IS TO SAY.
Yes. Goodbye.
YOU SEEM QUITE POSITIVE.
hierarchy. Thus, as the number of levels in the hierarchy grows linearly, the size of
the domain of discourse grows exponentially—which seems to suggest that robot
intelligence can indeed someday equal or even surpass human intelligence!
There is no reason to suppose that the present human brain represents any
theoretical maximum of intellect. Quite to the contrary, the enormous effort re¬
quired to educate the human mind in the techniques of logic and reason and to instill
the lessons of history suggests that other methods for computing and remembering
could be much superior. It might be possible, for example, to build a robot brain
that remembers all the information stored in all the libraries of the world. If it’s
possible to create robots that are as intelligent as humans, there is no reason to sup¬
pose that we couldn’t, with the simple addition of another hierarchical level, make
robots more intelligent than humans. Nevertheless, because of their very different
physical forms and prior learning experiences, robots and humans will forever
remain very different creatures.
The subject of comparative intelligence is full of logical pitfalls. To begin with,
intelligence is not a scalar value, as is suggested by IQ ratings. Intelligence is as
multi-valued as is behavioral skill. Who is more physically skilled, the baseball
player Reggie Jackson, the piano player Monty Alexander, or the gymnast Nadya
Comaneci? That these skills are so different makes the question meaningless. So
also, the ranking of intelligence on a linear scale is extremely misleading. The vast
differences in the organisms in which the intellect resides will prevent us from ever
being able to definitively answer the question of whether robots can be as intelligent
as humans.
Intelligence is only one aspect of the human mind. What about the emotions:
what of love, hate, hope, fear, awe, and wonder? What about religious experience?
Can a robot feel a sense of duty? Could it have a conscience? We don’t know. Even
in the human mind, such feelings and motives are far from understood. One can
only speculate whether similar phenomena could ever be duplicated in robots. In
order for any autonomous organism to survive and prosper in a hostile world, there
must be mechanisms for detecting and avoiding danger, seeking protection, and
carrying on in the midst of adversity. If future robots are ever designed to behave
successfully in an uncooperative, even hostile, environment, then something
analogous to human emotional evaluators will be needed to select and optimize
behavior. There must be a world model that enables it to predict the future, lay
plans, and take action in anticipation of difficult circumstances. The world must be
made predictable. In a world populated by intelligent rivals, one must be able to
construct internal models sufficiently complex to predict the behavior of those
rivals. In a world where survival depends on predicting the weather and coping with
fire, flood, earthquakes, storms, droughts and a natural order shrouded in mystery,
an internal world model might well require a sense of magic, a sense of religion to
give it predictability.
As long as robots are confined to the laboratory or factory where they do not
have to hunt for food, flee from predators, compete for sex, or survive in war, it
seems unlikely that they will develop the internal models required for survival under
such conditions. Laboratory or industrial robots will survive and multiply in
numbers by simply performing useful services or feats of intellect for their human
makers. Like laboratory rats, they will be most useful if they remain docile and
domesticated. Presumably they could be programmed to exhibit jealousy, envy,
greed, or any of the other emotional traits of humans. But whether there would be
any economic or entertainment benefit to us, and hence a survival benefit to the
robots, in such characteristics is questionable. There is no reason for such disruptive
characteristics to be implanted in robots designed for industrial or domestic applica¬
tions.
There is even a sense in which it can be argued that robots are an evolving life
form. Already robots are used to make robots. Within a few decades robots and
automatic factories will achieve a significant degree of self-reproduction. This will
create at least a precursor to a new life form, a life form based on silicon rather than
carbon, which will draw energy from the electrical power grid rather than from
photosynthesis or metabolism of carbohydrates. Robot evolution may be very rapid,
because robots have human designer-creators who are much more discriminating
than the mechanisms of natural selection.
Such is the stuff of lively conversation and philosophical disputation. All such
arguments are speculation and reasoning by analogy. The answer to the question of
whether machines ever will, or even can, possess a general level of intelligence com¬
parable to humans’ is unknown and may be unknowable.
Nevertheless, in the practical world, robots already have the capacity to per¬
form useful and valuable jobs, and those capabilities are being rapidly expanded.
Within limited domains, such as manufacturing, robots will soon be able to perform
the perceptual analysis and control functions necessary to operate machines and in¬
dustrial processes for long periods of time without human intervention. This might
not speak to the philosophical question of whether machines will ever rival human
intellect or duplicate human feelings, but it does indicate that totally automatic fac¬
tories are technically feasible. Then, the truly interesting questions become: can
robot factories be made economically practical, and, if so, what will be the effects
on the social order? How will the creation of an industrial economy based on robot
factories and offices impact the way we live and earn our income?
CHAPTER 11
Future Applications
Figure 11.1: A hierarchical control structure being designed at the National Bureau of Stan¬
dards for controlling an automatic machine shop. On the right is a hierarchy of data bases
containing part designs and process plans generated interactively. On the left is another hierar¬
chy of data bases that contain a complete state description of all the machines and parts in
progress in the factory. In the center is a hierarchy of control modules that use the process
plans, part data, and factory status data to decompose high-level tasks of part manufacture in¬
to low-level actions by numerically controlled machine tools and robots. The feedback pro¬
cessors extract information from the lower levels relevant to the decisions being made at the
higher levels. The management-information system enables human managers to query the
status of the system and set priorities. Each of the computing modules acts as a state-machine
making possible the dynamic interaction of real-time feedback with the scheduling and
routing of work through the shop.
But there are numerous large-scale attempts being made right now to do just
that. Japan, East and West Germany, Norway, and Sweden have all made the
development of completely automatic robot factories a high priority item of na¬
tional policy. Firmly committed to the creation of automatic factories for over a
decade, the Japanese Ministry of International Trade and Industry (MITI) has invested hundreds
of millions of dollars in research and development for the automatic factory. The
principal focus of this effort is a project called Methodology for Unmanned
Manufacturing (MUM).
The economic rationale for the automatic factory is that integrating many different types of automatic systems multiplies the productivity gains from the individual systems by a factor of two to four. For example, once a factory is able to run overnight or over the weekend
without human labor, productivity immediately takes a quantum leap. There are 168
hours in a week. A factory that can operate continuously can be producing output
for four forty-hour shifts a week with eight hours left over for maintenance. This
must be compared with factories that employ human labor that usually operate only
one or two shifts a week. Most people do not like to work nights and weekends.
Thus, premium pay is required for the third and fourth shifts. Robots, however, do
not care whether it is day or night, weekday or weekend.
The first robot factories will probably be somewhat more expensive than con¬
ventional manned factories. Large initial investment is needed for novel and untried
technologies, and robot technology is in its infancy. Microcomputers are less than
ten years old. Robot vision, microcomputer networks, and hierarchical, sensory-
interactive goal-directed control theory are all just beginning to be investigated by a
few researchers in a few under-funded laboratories. Very few persons are skilled in
these matters, and no one has fully mastered them. There exists no significant body
of theory and engineering practice in robotics; much remains to be done.
Robotics, like space exploration, is a journey into the unknown. It will require
our best brains, a large investment in research, huge outlays of money for develop¬
ment, and, eventually, enormous capital resources for new plants and facilities.
These are the “front end” costs that will make the first robot factories very costly.
But the long-term economic benefits are clear. Knowledge, once acquired, is in¬
expensive to reproduce. Once we know how to build a robot factory, we can build a
hundred or a thousand robot factories at a fraction of the cost of the first. When we
have reduced robotics to an engineering practice, robot factories will not cost any
more than conventional factories. In fact, they may be less expensive: robots don’t
need air conditioning in the summer or heat in the winter; they don’t need cafeterias,
rest rooms, or parking lots; they don’t need costly equipment to protect them from
smoke, dust, noise, toxic fumes, or dangerous machinery.
Thus eventually, the robot factory may produce productivity improvements of
ten to a hundred times over a conventional factory. This means that products pro¬
duced in robot factories can be many times less expensive and profits higher.
Robot factories will eventually be able to manufacture, assemble, and test the
essential components for other robot factories. This will initiate a regenerative, or
reproductive, process similar to that which already exists in the computer industry.
The results might be similar to the 20 percent annual reduction in the cost per unit
performance that has been going on for at least 25 years in the computer industry.
See figure 11.2. A similar price/performance trend in industrial robots would mean
that the price of a sophisticated industrial robot might fall to several hundred 1980
dollars by the year 2000. Such a cost, when prorated for a 168-hour week, would
amount to an effective robot labor cost of only pennies per hour. Prices of products
produced in robot factories could eventually spiral downward by a factor of two
every three years.
Eventually the flow of material wealth from automated production lines will
give overwhelming economic power to the owners of the robot factories. Once
automatic factories become common, the economic advantage will be so large as to
be irresistible. No industry using conventional production techniques and human
labor will be able to survive in head-to-head competition with industries using the
automatic factory concept.
Figure 11.2: A graph of the cost of small computers over a period from 1963 to 1973. This
downward trend in costs has continued throughout the decade of the 1970s and shows every
indication that it will continue to do so for at least another ten years.
We have not yet entered the age of the robot factory. Even the Japanese, the
acknowledged leaders in the development of robot factories, have not yet achieved
their ultimate goal of the completely automatic factory. But they are getting close
enough to seriously influence the balance of world trade. Already the Japanese pro¬
duce twice as many cars per man-day of labor as American automobile manufac¬
turers. Each day, the Nissan Zama automobile plant assembles 1300 cars with only
67 human workers on the assembly line—the rest of the work is done by robots and
other types of automatic machinery. This type of technology is partially responsible
for the enormous influx of inexpensive, high-quality steel, automobiles, motorcycles,
and other consumer products into the American and world markets. The Japanese
economic miracle is largely attributable to their enormous investment in
productivity-enhancing technologies in general, and to computer-aided manufactur¬
ing and robot technology in particular.
Some experts may disagree that the present cost of $30,000 to $100,000 for an
industrial robot will ever be reduced to a few hundred dollars. In fact, the recent an¬
nouncement of several European robots in the $150,000 to $200,000 price range
seems to run counter to this prediction. Nevertheless, there are a number of reasons
to believe that the long-term trend of robot costs will be downward. First, we are still
just on the threshold of the age of robot factories. Today only pieces and fragments
of the technology have entered the industrial arena. The full self-reproductive power
of the totally automated factory is still in the future. Today, only one or two robot
manufacturers use robots in the production of robots, and only for a few opera¬
tions. Surely, this will not continue. By the end of the 1980s it will be common prac¬
tice for robots to be used extensively in the manufacture of robots.
There are two types of costs for robots. One is the software costs: the program¬
ming language, the memory-management software, the operating system, and the
applications programs. The other is the hardware costs: the cost of mechanical
structures, of gears, bearings, motors, pistons, valves, encoders, sensors, power sup¬
plies, electronics, computers, and memory systems. Today it is possible to put
together an advanced laboratory robot with extensive sensory capabilities for about
$150,000 together with computer hardware to run it for another $150,000. However,
the cost to develop the software for such a system might easily run 5 to 50 times as
much. As the market grows, economies of scale will drastically reduce unit costs
both for software and hardware.
For example, once the robot market grows to a certain size it will become
economical to build robots from injection-molded plastic. A possible plastic design
is shown in figure 11.3. This particular design is powered by pressurized water.
Injection-molded plastic parts can be made cheaply and very precisely. Hydraulic
motors and pistons using water as the hydraulic fluid at less than 100 psi can easily
be constructed from plastic injection-molded parts and can use rubber gaskets and
O-rings for seals. Plastic motors, ranging in size and power from more than one
horsepower to less than 1/100th horsepower, can be constructed for between one
and ten dollars. Plastic pistons can be made even more cheaply. For example, plastic
hypodermic syringes can be bought in large quantities for only pennies apiece.
Plastic tubing is only a few cents a foot, and 5/8th-inch garden hose costs con¬
siderably less than a dollar a foot.
Gear boxes made of plastic gears and nylon bearings can be made for a few
dollars with a wide range of speeds and power ratings. Valves can be made from rub¬
ber diaphragms and controlled by solenoid coils, a common practice in automatic
washing machines and dishwashers. Double valves for automatic washing machines
can be purchased in large quantities for a dollar or two apiece.
An array of N valves controlling a set of orifices of exponentially increasing
size, shown in figure 11.4, can control the rate of flow to a precision of one part in
2^N. A set of four valves arranged in a bridge configuration as shown in figure 11.5
can control the direction of flow of water through a hydraulic motor or to a bidirec¬
tional piston. A set of four valves, arranged in the configuration of figure 11.6, can
drive a pair of push-push pistons. The combination of flow-rate control and direc¬
tion control can be used to operate hydraulic actuators in a robot.
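The valve selection itself is just the binary decomposition of the desired flow, as this short Python sketch shows (for the five-orifice valve of figure 11.4; the function name and units are illustrative).

def valve_pattern(desired_flow_units, n_valves=5):
    """Return which valves to open (True = open), largest orifice first,
    to pass a total open area of desired_flow_units (0 .. 2**n_valves - 1)."""
    if not 0 <= desired_flow_units < 2 ** n_valves:
        raise ValueError("flow out of range")
    pattern = []
    for bit in reversed(range(n_valves)):
        area = 2 ** bit
        open_valve = desired_flow_units >= area
        pattern.append(open_valve)
        if open_valve:
            desired_flow_units -= area
    return pattern

print(valve_pattern(21))   # [True, False, True, False, True] -> areas 16 + 4 + 1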
Figure 11.3: A design for a robot made completely of plastic parts. Plastic pistons and motors
using a pressurized hydraulic fluid such as water provide power to the various joints.
Figure 11.4: A five-bit digital flow-rate hydraulic control valve. Orifices with areas of 1, 2, 4,
8, and 16 allow 32 different flow rates to be generated by opening or closing various combina¬
tions of valves. Such valves can be manufactured from plastic and rubber parts similar to
those presently used in automatic washing machines.
Figure 11.5: A set of direction-control valves for a hydraulic motor. When valves A and D are
open, the motor is driven in one direction. When valves B and C are open, the motor is driven
in the other direction. If all the valves are closed, the motor is locked. If valves A and B are
open with C and D closed, the motor is free to turn without drawing any power.
Figure 11.6: A set of direction-control valves for a pair of push-push pistons. The pistons can
be driven in either direction, locked, or left free to slide depending on which combinations of
valves are opened and closed.
Figure 11.7: A possible design for a plastic linear-motion detector using moire fringe patterns.
A grating pattern on the clear plastic slide creates moire fringes when pulled past a similar pat¬
tern embedded in the housing. Light emitting diodes (LEDs) on the back of the housing
transmit light through the gratings to be detected by the phototransistors shown on the front
of the housing. The two phototransistors at the bottom produce phase quadrature square
wave signals that are sent to an up-down counter such as shown in figure 11.8. The top photo¬
transistor produces a reset signal for initializing the up-down counter to a known value.
A chamber partially filled with air and containing a float can be used to
measure hydraulic pressure, and orifices of different sizes can be used to control
the relationship between pressure and flow rate, which is useful in servoing a
robot in position, velocity, and force.
Gratings printed on a pair of clear plastic plates, shown in figure 11.7, can be
used to generate moiré fringe patterns that alternate from transparent to opaque
with each increment of motion equal to the line spacing of the gratings. An electrical
pulse for each increment of motion can then be obtained by putting a light-emitting
diode on one side and a light-sensitive detector on the other. By placing two emit¬
ter-detector pairs one-quarter wavelength of the moiré fringe pattern apart, a pair
of phase-quadrature square wave signals can be produced to indicate the direction of
motion. The electrical pulses from these signals can be accumulated in registers as
shown in the circuit diagram of figure 11.8 and used to measure position and veloci¬
ty of motion.
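A minimal software emulation of the up-down counting performed in hardware by the circuit of figure 11.8 might look like the following. The class name and sampling interface are illustrative assumptions, but the transition table is the standard quadrature-decoding rule implied by two square waves 90 degrees out of phase.

```python
# Sketch: software emulation of the up-down counter of figure 11.8.
# The moire-fringe detector produces two square waves, A and B, in phase
# quadrature; each valid change of the pair (A, B) moves the count by +/-1.

# Map (previous state, new state) -> count change for a standard
# quadrature sequence 00 -> 01 -> 11 -> 10 -> 00 ... (one direction).
_TRANSITIONS = {
    ((0, 0), (0, 1)): +1, ((0, 1), (1, 1)): +1,
    ((1, 1), (1, 0)): +1, ((1, 0), (0, 0)): +1,
    ((0, 0), (1, 0)): -1, ((1, 0), (1, 1)): -1,
    ((1, 1), (0, 1)): -1, ((0, 1), (0, 0)): -1,
}

class QuadratureCounter:
    def __init__(self) -> None:
        self.state = (0, 0)   # last sampled (A, B) pair
        self.count = 0        # accumulated position in fringe increments

    def sample(self, a: int, b: int, reset: bool = False) -> int:
        """Feed one sample of the two phototransistor signals.

        'reset' models the third phototransistor that initializes the
        counter to a known value at a reference position.
        """
        if reset:
            self.count = 0
        new_state = (a, b)
        self.count += _TRANSITIONS.get((self.state, new_state), 0)
        self.state = new_state
        return self.count

if __name__ == "__main__":
    qc = QuadratureCounter()
    for a, b in [(0, 1), (1, 1), (1, 0), (0, 0), (0, 1)]:   # forward motion
        qc.sample(a, b)
    print(qc.count)   # 5 increments in the positive direction
```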
Figure 11.8: An electronic circuit for driving an up-down counter from the phase quadrature
signal generated by the phototransistors of the motion detector in figure 11.7.
In industrial quantities (less than a million units per year), costs of this order, or
lower, could probably be achieved for each of these components.
For those industrial tasks where the cost of robot labor falls below that of
human labor or where the capabilities of robots rise above those of human labor,
industrial productivity will leap forward as fast as the resources are committed to
investment in robot technology. This will undoubtedly happen first in the manufac¬
turing industries, resulting in profound effects on the economic strength of whatever
nations adopt this new technology.
Manufacturing is the foundation of industrialized civilization. Manufacturing
productivity growth is dominant among the factors producing growth in real income
and real economic prosperity. The lack of productivity growth towers over all other
factors that influence peacetime inflation (including government deficit spending).
Industrial productivity affects the cost of food through the cost of farm equipment
and supplies, the cost of construction through the cost of building materials and
construction equipment, and the cost of transportation through the cost of vehicles
and road-building machinery. It affects the cost of energy through the cost of dril¬
ling equipment and pipelines. It directly affects the cost of furniture, clothing,
appliances, automobiles, trucks, and trains, and indirectly affects the cost of
everything else, right down to church and school buildings, books, and the
pollution-control equipment required to keep the air and water clean. Industrial pro¬
ductivity growth is the principal source of prosperity and brings a rising standard of
living. High productivity is the reason that Americans have traditionally enjoyed a
high standard of living. The potential of robot technology to raise industrial produc¬
tivity by hundreds or even thousands of percent thus makes it a matter of the highest
importance to the future of this country, and indeed to the future of modern civiliza¬
tion itself.
A more exotic and versatile design would be to upgrade the stabilizing legs into
walking legs. Consider, for example, a construction robot designed as a hydraulical¬
ly powered six-legged walking machine using the power plant and carriage of a small
truck, like the one shown in figure 11.9. The legs would consist of three-degree-of-
freedom manipulators with a polar coordinate system. The first degree of freedom is
a rotary actuator about the vertical axis. The second and third degrees of freedom
are lifting and flexing motions controlled by hydraulic pistons.
Figure 11.10: Front view of a six-legged walking vehicle designed and built at Ohio State
University by Professor Robert B. McGhee.
Additional ranging sensors could keep the legs from bumping into anything or
anyone. Position inputs from each joint of the legs and sensors to measure the forces
exerted downward by each foot would also be necessary. Gaits for six-legged walk¬
ing devices have been worked out in detail and are well-known. A seventh
microcomputer would have coordination control over the six-leg computers and
would compute the roll, pitch, and yaw of the vehicle as well as the desired heading
and speed, so it could be steered like a car.
Figure 11.11: Structure of interactive computer software system for control of the Ohio State
University hexapod of figure 11.10.
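As a rough illustration of how the coordinating computer might pass body-level commands down to the six leg computers, consider the sketch below. The gait numbers and data structures are invented for the example and are not taken from the Ohio State software.

```python
# Illustrative sketch only: how a coordinating computer might pass body-level
# commands (heading, speed, body attitude) down to six per-leg computers,
# which in turn command three joints each.  Not the actual OSU hexapod code.

from dataclasses import dataclass

@dataclass
class BodyCommand:
    heading_deg: float   # desired direction of travel
    speed: float         # desired forward speed
    roll: float = 0.0    # desired body attitude
    pitch: float = 0.0
    yaw: float = 0.0

@dataclass
class LegCommand:
    rotation: float      # first degree of freedom: rotation about the vertical axis
    lift: float          # second degree of freedom: lifting piston
    flex: float          # third degree of freedom: flexing piston

def coordinate(cmd: BodyCommand, phases: list[float]) -> list[LegCommand]:
    """Convert one body-level command into six leg-level commands.

    'phases' gives each leg's position (0..1) in the gait cycle; legs in the
    first half of the cycle are on the ground pushing, the rest are swinging.
    The numbers here are placeholders for real gait and kinematic equations.
    """
    legs = []
    for phase in phases:
        on_ground = phase < 0.5
        legs.append(LegCommand(
            rotation=cmd.heading_deg + (0.0 if on_ground else 10.0),
            lift=0.0 if on_ground else 5.0,          # raise swinging legs
            flex=cmd.speed * (1.0 if on_ground else -1.0),
        ))
    return legs

if __name__ == "__main__":
    # Alternating tripod gait: legs 1, 3, 5 support while 2, 4, 6 swing.
    tripod_phases = [0.0, 0.5, 0.0, 0.5, 0.0, 0.5]
    for leg in coordinate(BodyCommand(heading_deg=15.0, speed=0.3), tripod_phases):
        print(leg)
```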
On the front of the vehicle would be a pair of slit projectors and a solid-state
TV camera configured as a three-dimensional vision system similar to the one il¬
lustrated in figure 8.34. The computations of a depth image for the region in front
of the vehicle would be handled by an eighth microcomputer controlling the vision
interface and processing the visual data. The planes of light would be mechanically
scanned back and forth across the territory in front of the vehicle in an interlaced
pattern so the entire region would be covered every one-half second. A ninth
microcomputer would compare the processed vision data with a stored topographic
map of the construction site so that the vehicle could find its way around.
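The depth computation for such a structured-light system reduces to triangulation between the slit projector and the camera. The following sketch assumes a simple planar geometry with the camera axis perpendicular to the projector-camera baseline; the function and parameter names are illustrative.

```python
# Sketch of the depth computation behind a slit-projector / camera ranging
# system.  A plane of light is projected at a known angle; where its stripe
# appears in the camera image depends on the range to the surface, so depth
# follows from simple triangulation.  Geometry and names are illustrative.

import math

def stripe_depth(pixel_angle_rad: float,
                 projector_angle_rad: float,
                 baseline_m: float) -> float:
    """Range (in meters) to the point where the light plane hits a surface.

    pixel_angle_rad    : angle of the observed stripe, measured from the camera axis
    projector_angle_rad: angle of the projected light plane, measured from the baseline
    baseline_m         : distance between projector and camera

    Classic triangulation: the camera, projector, and illuminated point form
    a triangle with one known side (the baseline) and two known angles.
    """
    camera_angle = math.pi / 2 - pixel_angle_rad        # interior angle at the camera
    third_angle = math.pi - camera_angle - projector_angle_rad
    # Law of sines gives the camera-to-point distance; its component along
    # the camera axis is the depth.
    camera_to_point = baseline_m * math.sin(projector_angle_rad) / math.sin(third_angle)
    return camera_to_point * math.cos(pixel_angle_rad)

if __name__ == "__main__":
    # Projector 0.5 m from the camera, light plane at 60 degrees, stripe seen
    # 5 degrees off the camera axis.
    print(round(stripe_depth(math.radians(5), math.radians(60), 0.5), 3))
```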
On top of the vehicle could be a hydraulically powered cherry-picker boom,
with a three-axis device for orienting the shoulders of the two robot arms. The
shoulder structure would position two manipulator arms, a TV camera, a light pro¬
jector, and perhaps a tool manipulator (not shown in the figure).
The potential selling price, assuming present-day technology, might total about
$200,000.
This is too expensive for most applications today. However, technological ad¬
vances will eventually reduce the cost by three to ten times. As the cost of human
construction labor rises, the economic advantages of robot construction labor will
grow more and more significant.
A robot like the one shown in figure 11.10 could serve as an apprentice to a
master craftsman in constructing cinder block or brick walls, laying roofing
shingles, lifting, positioning, and fastening siding, insulation, window and door
frames, setting roof and floor joists, assembling forms for concrete, pouring and
dressing concrete, painting, and performing a large number of other specialized
tasks. A smaller version of this basic design would be suitable for inside work, for
setting studs, installing plumbing, laying tile, building brick or stone fireplaces, put¬
ting up wallboard, painting, sanding, polishing, etc. Eventually, such machines
would be able to work unsupervised for extended periods of time.
OUTSTANDING PROBLEMS
What are the technical problems in developing such devices? We know how to
build the mechanical apparatus, and there are a number of powered actuators on the
market. There are backhoes of many different designs that can execute very precise
motions of a powerful digging claw. A well-trained backhoe operator can dig a hole
of precise dimensions, can lift huge volumes of dirt or large boulders, or can pick up
a single pebble. There is a whole family of cranes that can lift heavy loads and
precisely position them. Why are these not made into robots, outfitted with a variety
of vacuum and fingered grippers, and provided with tools—automatic nailers, mor¬
tar dispensers, paint brushes and spray nozzles, trowels and spatulas, picks and
shovels, buckets and mops, grinders and sanders, scrapers and caulkers, drills, saws,
chisels, glue guns, and a hundred other varieties of instruments?
There are two reasons: first, at present there are no sensors or sensory-
processing devices that can make a robot see and feel well enough to know what it is
doing. Cranes and backhoes need the human operator to see where to put the shovel
or the load. Even a human has no way of sensing the forces developed at the point of
action except by such crude methods as seeing the deflection of the structures being
operated upon or hearing the power train being loaded down. Robot vision is still
crude, slow, unreliable, and expensive, and no software exists for rapidly and
reliably recognizing objects, relationships, and situations, or for measuring the posi¬
tion and orientation of objects and their relationship to one another and the
environment. Furthermore, no one knows how to write such software in an
economically practical manner.
Second, control systems or control system architectures that can enable a robot
to act and react using sensory information in a timely and intelligent way are still
nonexistent. Construction sites are notoriously dirty and cluttered places, and even
the simplest job (such as picking up a brick or board, deciding where to drive a nail,
or how to open a paint can) requires a great deal of knowledge and skill. In order for
a robot to perform such jobs efficiently and reliably, a great deal of additional
knowledge about writing software for accomplishing such tasks will be required.
Few if any robots, even under ideal conditions in the most advanced research
laboratories, can presently do these things. The software required to demonstrate
complex assembly tasks requires many person-hours of programming effort, often
by doctorate-level researchers with wide experience and formal training in computer
science. No one has yet perfected a person-machine interface by which an average
construction foreman, in a few words, could instruct a robot to perform a job like
<LAY A TILE FLOOR>, <BUILD A BRICK WALL>, <INSTALL A
SINK>, or <PAINT A CEILING>.
These are enormous yet solvable problems. Any complex task is nothing more
than a sequence of simpler subtasks. Each subtask can itself be decomposed into a
sequence of yet simpler sub-subtasks. At the very bottom, each actuator needs only
the information of how far to move in one of two directions, how fast to proceed,
and how much force to apply. For most tasks in factories or construction sites, there
is a well-known procedure to be followed. A robot would seldom, if ever, have to
figure out for itself how to perform a task. Many tasks can be described in terms
such as <PICK THAT UP AND HOLD IT HERE>, <FETCH THOSE>,
<CARRY THESE AND PLACE THEM OVER THERE>, etc.
One problem is to be able to embed procedures to execute such commands in a
set of rules that reside at the various levels of a control hierarchy and to provide sen¬
sory information to the control modules that invoke the proper rules at the proper
times. Another problem is to devise a means by which the writing of software can be
similarly partitioned, so that the robot programmer can isolate and address a limited
problem with a limited set of parameters in a modular fashion. A third problem is to
make the various program modules fit together and function in a coordinated goal-
directed manner when they are assembled.
These are not small problems, but they can also be solved. The first step is to
analyze each task the robot is to perform and partition the task into a set of sub¬
tasks, and then sub-subtasks, in a hierarchical manner similar to what was described
in Chapter 9.
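A hierarchical decomposition of this kind can be sketched as a small task library in which each named task expands into simpler ones until only primitive actuator commands remain. The tasks listed below are invented for illustration and do not reproduce the hierarchy of Chapter 9.

```python
# Illustrative sketch of hierarchical task decomposition: a task is expanded
# into subtasks, each subtask into sub-subtasks, until only primitive
# actuator commands (direction, speed, force) remain.  The task library
# below is invented for illustration; it is not from the book.

TASK_LIBRARY = {
    "LAY A BRICK": ["FETCH BRICK", "SPREAD MORTAR", "PLACE BRICK", "TAP LEVEL"],
    "FETCH BRICK": ["MOVE ARM TO PILE", "GRASP BRICK", "MOVE ARM TO WALL"],
    "SPREAD MORTAR": ["MOVE TROWEL TO BOARD", "LOAD TROWEL", "SPREAD ON COURSE"],
}

def decompose(task: str, depth: int = 0) -> None:
    """Print a task tree; the leaves stand for primitive actuator commands."""
    print("  " * depth + task)
    for subtask in TASK_LIBRARY.get(task, []):
        decompose(subtask, depth + 1)

if __name__ == "__main__":
    decompose("LAY A BRICK")
```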
The second step is to configure a network of computers so that the separate
computing modules of the hierarchy can be implemented on separate computers in
the network. A separate computer (or separate time slot on a shared computer) for
each computational module in the processing-generating hierarchy means that the
different computing tasks never need to interrupt each other. This enormously
simplifies the problem of writing software because of the regular structure and
precise modularity it imposes on the process. It also simplifies the debugging and
verification of software, which is critical when robot programs must interact intimately
with many external variables.
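The flavor of this one-computer-per-module arrangement can be suggested by a sketch in which each module runs in its own slot of every control cycle and communicates only through a shared table of state variables. The module names and numbers are illustrative assumptions.

```python
# Sketch of the "one computer (or time slot) per module" idea: each module
# of the hierarchy runs its own fixed cycle and communicates with the others
# only through a shared table of state variables, so no module ever has to
# interrupt another.  Module names and variables are illustrative.

state = {"command": "MOVE TO BRICK", "joint_angle": 0.0, "joint_goal": 0.0}

def task_module() -> None:
    """Upper level: translate the current command into a joint goal."""
    if state["command"] == "MOVE TO BRICK":
        state["joint_goal"] = 1.2          # radians; placeholder value

def servo_module() -> None:
    """Lower level: drive the joint a step toward its goal each cycle."""
    error = state["joint_goal"] - state["joint_angle"]
    state["joint_angle"] += 0.1 * error    # simple proportional step

if __name__ == "__main__":
    # Each module gets its own slot in every control cycle.
    for cycle in range(5):
        task_module()
        servo_module()
        print(cycle, round(state["joint_angle"], 3))
```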
A third step is to fit the robot with visual systems using structured light in order
to recognize and measure the position and orientation in three-dimensional space of
the objects to be manipulated. The use of structured light simplifies the visual-
processing algorithms sufficiently that unambiguous conclusions can be derived in a
small fraction of a second. This is essential if visual information is to be used for
controlling the actions of the robot in real time.
A fourth step is to equip the robot with force and touch sensors so that it can
feel the presence of objects and exert forces on them in a controlled and goal-
directed way.
The fifth and final step is to provide the robot with a world model that can
enable it to interpret both commands and sensory information in the context of a
known environment. One type of world model is a photographic image, or set of im¬
ages, of the work environment that can be used as a map or template to compare
against incoming sensory data. Consider, for example, if a painting robot were to be
provided with a photographic image of the object, such as the front of a house, to be
painted. This photograph could be used as a world model for interpreting com¬
mands such as <PAINT THE FRONT DOOR> or <PAINT THE UPSTAIRS
WINDOW FRAMES>. The photograph would show where the paint should be ap¬
plied. It would tell the vision system where to look for edges and corners and the
control system where to watch for obstructions such as posts and shrubs.
This assumes that the robot has the ability to scan and interpret the photograph
and compare the sensory information obtained from a vision or touch system with
the world model expectations derived from the photograph. The complexity of this
problem can be substantially reduced by using a map derived from the photograph,
instead of the photograph itself. In this case, the map could be produced from
photographs and graphic overlays developed off-line in a sophisticated computer-
vision-graphics laboratory or by manual input on an interactive graphics terminal.
The map would have numerical codes attached to each region identifying the
physical object represented by that region and its three-dimensional position, orien¬
tation, shape, and surface characteristics. The on-site robot would then merely need
the ability to scan the map, which could reside on a programmable read-only
memory (PROM) chip, a floppy disk, or a bubble memory, and compare the stored
data against incoming observations from the robot’s sensory system.
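Such a numerically coded map might be represented as a table keyed by region number, with each entry recording the object's identity, pose, and surface properties. The sample entries and matching test below are assumptions made for illustration.

```python
# Sketch of a numerically coded world-model map, as described above: each
# region number identifies a physical object and records its pose and
# surface properties.  The entries and the matching test are illustrative.

from dataclasses import dataclass

@dataclass
class Region:
    object_name: str
    position: tuple[float, float, float]   # x, y, z in site coordinates (meters)
    orientation_deg: float                 # rotation about the vertical axis
    surface: str                           # e.g. paint color or material

WORLD_MAP = {
    12: Region("front door", (4.2, 0.0, 1.0), 90.0, "white gloss"),
    27: Region("upstairs window frame", (3.1, 0.0, 4.5), 90.0, "white gloss"),
}

def matches(region_id: int, observed_xyz: tuple[float, float, float],
            tolerance_m: float = 0.2) -> bool:
    """Check whether an observed feature lies where the map says it should."""
    expected = WORLD_MAP[region_id].position
    return all(abs(o - e) <= tolerance_m for o, e in zip(observed_xyz, expected))

if __name__ == "__main__":
    print(matches(12, (4.15, 0.05, 1.02)))   # True: observation agrees with the map
    print(matches(27, (3.1, 0.0, 2.0)))      # False: frame not where the map expects
```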
Programs written to implement robot tasks at various levels in the hierarchy
could then refer to various portions of the map image. For example, the areas to be
painted can be specified by number, and the color and dimensions of each area
defined by the map in the robot’s world model. The program then consists merely of
specifying the order in which each area is to be painted or sanded, where nails are to
be positioned, where tile is to be laid, where roofing materials should be positioned,
etc. Standard patterns can be specified for laying bricks or blocks of stone in con¬
structing walls, fireplaces, floors, and arches. Using such a map, or set of maps, and
programs referring to the maps, the robot can compare the existing state of the work
with the desired state of the completed project, selecting the appropriate operation
to be performed in the appropriate sequence.
Such functioning requires a significant amount of software to be embedded in
the computational modules of the control hierarchy. The software may be either
resident on read-only memory (ROM) in the on-site robot or in a mass storage device
that can be accessed by the robot. The mass storage device could be a large disk
system located in a van, or one that is accessible through phone lines to a remote
facility. The set of programs required to perform a large variety of tasks thus can be
loaded into the robot-control system whenever those specific tasks are called for by
the robot’s job supervisor. Given this type of a priori information, a construction
robot should be able to accept instructions from a foreman on a construction site in
a simple subset of English. Given recent progress in the state-of-the-art of automatic
voice recognition, such verbal commands may soon be possible.
It will probably be many decades before robots are able to do all the tasks
necessary to build a house. Nevertheless, enough of the value added in construction
may be amenable to robot labor by 1990 to significantly affect the cost of new hous¬
ing.
Once construction robots become dexterous, inexpensive, and simple to use in a
large variety of practical construction tasks, labor-intensive construction techniques
that have been abandoned because of prohibitive costs will once again be
economically practical.
HOUSEHOLD ROBOTS
The 1990s could be the decade during which the household robot becomes prac¬
tical. Once plastic robots become highly developed and inexpensive in industrial ap¬
plications, and once advanced software and sensory systems are developed for con¬
struction robots, the same technology can be applied to the problems of household
robots. The environment of the home is as variable and complex as that of the con¬
struction site. Each home has a different floor plan and arrangement of furniture. In
order for a household robot to negotiate through the average room, it must have an
internal map of the furniture placement and permissible pathways. In order for the
robot to set the table, it needs to have an internal map of the proper placement of
plates, silverware, cups, glasses, and napkins. In order for it to dust, it must have an
internal program that recognizes each piece of furniture and each fragile item and
how it should be handled. A household robot must know where the dishwasher is,
where the cabinets are located, and where each of the various types of plates and
utensils is to be stored. If it is to vacuum, it must have a map of what to vacuum as
well as a sensory system that can recognize the difference between patches of dirt
and patterns in the Oriental rugs. If it is to clean windows, it must know where they
are as well as how to reach them without bumping into the furniture or becoming en¬
tangled in the drapes. If it is to scrub the bathroom, it must have a model of the
shape of the fixtures as well as procedures for wetting, rubbing, rinsing, and dry¬
ing them.
These are extremely complicated problems, yet they are similar to those that
must be solved by construction robots. Once the software techniques are developed
for construction applications, they should be adaptable to the household environ¬
ment. The mass market of household robots means that the software costs can be
amortized over a very large number of robots. Again, the driving cost factor will be
the mechanical structure.
In mass consumer quantities, the mechanical parts for a robot may cost one-
half, or even one-third, of what they cost in industrial quantities. A household robot, such
as the one shown in figure 11.12, with two arms, wheels, and a shoulder-height-
adjustment capability might have twenty degrees of freedom. Not all would require
the full complement of rate control, pressure regulation, and position indicators.
The average cost per axis for a household robot might be between $20 and $30. This
translates into $400 to $600 for actuators and controls.
Add $1500 for a computer, $800 for sensor systems including vision, $600 for a
structure, and the result is a household robot costing around $3500. Such a product
could be marketed for $4000 to $6000 (1980 dollars)—about the price of an inexpen¬
sive automobile. If the capabilities are extensive, such as the ability to set and clear
the table, load and unload the dishwasher, prepare meals, sort clothes and do the
laundry, vacuum, dust, and wash windows, this price is not unreasonable and a
mass market would exist.
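The cost estimate above chains together directly; the following sketch simply restates that arithmetic so the assumed per-axis cost, axis count, and fixed items are explicit.

```python
# Reproducing the cost estimate above: twenty axes at $20-$30 each, plus the
# quoted figures for computer, sensors, and structure (all in 1980 dollars).

AXES = 20
PER_AXIS = (20, 30)          # low and high estimates per degree of freedom
COMPUTER, SENSORS, STRUCTURE = 1500, 800, 600

low = AXES * PER_AXIS[0] + COMPUTER + SENSORS + STRUCTURE
high = AXES * PER_AXIS[1] + COMPUTER + SENSORS + STRUCTURE
print(low, high)   # 3300 3500 -- roughly the $3500 figure quoted in the text
```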
Hydraulic power could be provided simply and inexpensively by a single electric
motor driving a plastic hydraulic pump carried in the bottom of the robot. Such a
robot could obtain power as well as access to large external computers and data
bases by plugging itself into wall sockets. The robot would carry a small storage bat¬
tery to enable it to travel between one wall socket and another. Alternatively, it
might use two lengths of cord to enable it to maneuver from one electrical outlet to
another by plugging itself in at a new outlet before releasing and recoiling its cord
from the last outlet.
With some modifications, a household robot might be able to perform a variety
of jobs in the yard and garden. An internal map of the yard together with sensors to
detect landmarks like the corners of the house, walkways, trees, and shrubs could
enable the robot to navigate in the yard. A mowing attachment could cut and trim
the grass, and a map of the garden may even make it capable of weeding the flower
beds or garden. Such an outdoor device could be electrically powered through an ex¬
tension cord or mechanically powered by a small gasoline engine.
Besides the use of robots in manufacturing and construction and their potential
as household servants, there will be many other applications as well. Over the next
few decades, robot technology could greatly improve the safety and reduce the cost
of nuclear power. Professor Marvin Minsky of MIT has proposed that robots and
remote manipulators completely replace human workers inside nuclear power
plants. This could make it possible for nuclear fuel reprocessing plants and breeder
reactors to be permanently sealed. As a result, the threat of sabotage or the possibili¬
ty of theft of nuclear materials would be totally eliminated. In the event of a nuclear
power plant accident or malfunction, robots could work in radioactive areas for
emergency repairs and clean-up operations.
Robots are also ideally suited for underwater applications. Underwater robots
using water hydraulics could operate at any depth. The computer could be housed in
a spherical pressure-resistant chamber together with a fuel cell that would generate
electricity to drive an electric motor connected to a hydraulic pump. Such a system
could easily provide one-horsepower working capacity in a very compact package.
Larger systems with ten to a hundred times as much power are well within the
capabilities of current technology.
Underwater robots could be weightless, and their buoyancy could be controlled
by expanding or contracting air-filled chambers. They could be maneuvered by pro¬
pellers or by jets of water. Walking underwater could easily be accomplished by a
two-legged (or two-armed) robot. Such robots could explore the depths for
minerals, operate underwater mining and drilling equipment, and perform under¬
water construction.
Underwater robots might eventually be capable of introducing completely new
approaches to solar energy. One possibility is to use underwater construction robots
to build and service huge turbines for capturing energy from deep ocean currents.
Although the energy density in currents such as the Gulf Stream is not high (the flow
is only about five knots), the total energy available is enormous. The Gulf Stream is
thousands of feet deep and many miles wide. Estimates are that underwater turbines
could produce electricity at commercially competitive prices, even using present
construction technology. Undersea construction and maintenance robots would
considerably reduce the cost of such structures.
Figure 11.13: A giant plastic floating lily pad for farming algae and converting it into alcohol.
Floating tubes support plastic-bottom ponds filled with fertilized water. Circulation from the
outer rim inward carries the growing algae toward the center where it is processed by bacteria
and distilled into fuel-grade alcohol. The center is covered by a clear plastic bubble that traps
and condenses the alcohol vapor produced by solar heating.
Figure 11.14: Cross-section of one of the floating tubes that support the plastic bottom of the
lily pad algae farm and circulate enriched water. Sails and keels are controlled by a network of
microcomputers to maintain the shape and structural integrity of the lily pad and permit it to
navigate on the open ocean. Mechanical power to adjust sails and keels is derived from wind
or wave energy.
The algae would grow and multiply in the outer portions of the lily pad that
would be fed by fertilized water recycled from the central processing region. The
algae would be concentrated through a series of one-way valves and sieves as the
algae flowed gradually toward the central processing region. At a point near the
center, algae-eating bacteria would be injected. The resulting product would be
alcohol. The central portion would be covered with a clear plastic bubble supported
by a pocket of air. Sunlight shining through the clear plastic would evaporate the
alcohol, which would then condense on the inside of the plastic and run into collect¬
ing troughs.
At present, only methyl alcohol could be economically produced by such a pro¬
cess. However, future research in genetic engineering should be able to produce a
strain of bacteria that can convert algae to ethyl alcohol, which has about twice the
energy density of methanol.
The alcohol fuel produced would be stored in an underwater bladder, collected
periodically by a robot tanker, transported safely, and burned without pollution in
home furnaces, automobiles, and electric utilities. The robot tankers could be giant
submarine blimps powered by alcohol fuel.
There are good reasons to believe that lily pads might be practical sources of
fuel in the near future. If we assume a 2 percent photosynthetic efficiency and a
25 percent chemical conversion efficiency, the output for a single lily pad ten
kilometers in diameter would be about 42,000 gallons of methanol an hour. Assum¬
ing eight hours a day, that leads to over 100 million gallons per year. A fleet of 2500
such lily pads would provide the equivalent of all the oil presently consumed in the
United States. This number of lily pads would fit conveniently in a square 500 miles
on a side. In the future, genetic engineering of algae and of the microorganisms that
feed on algae should be able to increase the efficiency of the sunlight-to-alcohol con¬
version process. This suggests that the alcohol fuel available from robot lily pads
might eventually be sufficient to provide the entire energy needs of the world for the
foreseeable future without any need for nuclear power in any form. The percentage
of the equatorial oceans that would need to be covered with lily pads in order to
supply the entire world's energy needs for the next thousand years would be small.
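The scaling steps behind these figures can be restated explicitly; the per-pad output and the eight-hour effective solar day are taken from the text above, not recomputed from the efficiency assumptions.

```python
# Reproducing the scaling arithmetic above for the lily pad estimate.
# The per-pad output figure (42,000 gallons of methanol per hour) and the
# 8-hour effective solar day are taken from the text, not recomputed here.

GALLONS_PER_HOUR = 42_000
HOURS_PER_DAY = 8
PADS = 2_500
PAD_DIAMETER_KM = 10

per_pad_per_year = GALLONS_PER_HOUR * HOURS_PER_DAY * 365
fleet_per_year = per_pad_per_year * PADS

# Packing 2500 pads, each fitting in a 10 km x 10 km square, inside a square
# 500 miles (about 800 km) on a side:
area_needed_km2 = PADS * PAD_DIAMETER_KM ** 2
area_available_km2 = (500 * 1.609) ** 2

print(f"{per_pad_per_year:,} gallons per pad per year")      # about 122,640,000
print(f"{fleet_per_year:,} gallons per year for the fleet")  # about 3.1e11
print(area_needed_km2 <= area_available_km2)                 # True
```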
An important feature of lily pad technology is that it would not divert any ex¬
isting food-producing land into energy production. In fact, sea-going lily pads
would themselves become complex ecosystems wherein many higher creatures might
live and reproduce. Plankton, shrimp, fish, and birds would thrive in the environ¬
ment created by the lily pads. The fish crop might eventually become a significant
source of food for the world’s growing population as the competition for arable
farmland grows and the natural fishing grounds become overharvested.
Finally, these havens of life and energy in the open ocean might one day become
desirable places for people to live and work and play. Such colonies are much more
likely to be economically and technologically practical than the space colonies pro¬
posed in the NASA-sponsored studies of Brian O’Leary, and the stepping stones to
reaching them are much closer at hand.
Structures similar to the ocean-going lily pads could also be constructed on land
in the desert. Dry-land lily pads would require two sheets of plastic, one dark to
cover the bottom, the other clear to cover the top and prevent the water from
evaporating. Water is, by definition, scarce in the desert, and evaporated water is
expensive to replace. The sea-going variety would not need to worry about evaporation.
It is important to add, however, that a major energy switch from oil to alcohol
produced by lily pad technology is probably not possible in this century. Even if
there were no remaining technical problems, which there are, and even if the
technology had already proved itself price-competitive, which it has not, the time
needed for conversion of a large fraction of the world’s energy industry to alcohol
produced with lily pad technology would probably be 20 to 30 years.
These are only a few of the potential applications of robotics. Robots will be
able to accomplish many other interesting and valuable tasks. For example, robots
will be able to explore the bottom of the oceans and eventually mine the deep ocean
trenches where untold mineral treasures lie hidden. They also will be capable of ex¬
ploring and even colonizing the surface of the Moon, Mars, and the satellites of
Jupiter.
Robot space exploration and robot construction of large space structures will be
much less expensive than similar work performed by human astronauts. Robot
space voyagers will not need the elaborate life-support and safety equipment re¬
quired by humans. Robots can fly aboard launch vehicles that aren’t certified safe
for humans and can embark on one-way missions with no provisions for returning
to Earth. Robot explorers, if equipped with the proper sensory systems, can transmit
visual and other sensory information back to Earth where the sensory experiences
can be reconstructed for human observers. By this means, robot vehicles on Mars or
robot gliders riding updrafts in the atmosphere of Jupiter could transport us all on
fantastic voyages of exploration to new worlds. By telepresence (i.e., the transmis¬
sion and reconstruction of sensory experiences over long distances) we might all
experience the sights, sounds, and feelings of unearthly regions throughout the solar
system.
CHAPTER 12
Economic, Social, and Political Implications
These are revolutionary times. The introduction of the computer into the
manufacturing process and the spectacular steady decline in the cost of computing
power are historical events that will someday rank with the invention of the steam
engine and the discovery of electricity. The human race is now poised on the brink of
a new industrial revolution that will at least equal, if not far exceed, the impact of
the first industrial revolution. Changes as profound as those resulting from the
development of agriculture and the domestication of wild animals are rushing us
toward a new world.
The first industrial revolution substituted mechanical energy for muscle power
in the manufacture of goods and the production of food. This brought about an
enormous increase in productivity, put an end to slavery, and freed a great mass of
human beings from a life of poverty, ignorance, and endless physical toil.
The next industrial revolution will substitute computer power for brain power
in the control of machines and industrial processes and will be based on robot labor.
Automatic factories, offices, and farms will be able to produce material
goods—automobiles, appliances, furniture, and food—in almost unlimited quan¬
tities at very low cost and without human intervention. Robot construction workers
will be able to build homes, roads, and commercial buildings. Robots will be able to
mine the sea beds and farm the ocean surface for fuel as well as food. Robots and
automatic factories have the potential to create material wealth in virtually
unlimited quantities and eventually to reproduce themselves in any numbers we
choose.
The next industrial revolution—the robot revolution—could free the human
race from the regimentation and mechanization imposed by the requirement for
manual labor and human decision-making in factories and offices. It has the capaci¬
ty to provide us all with material wealth, clean energy, and the personal freedom to
enjoy what could become a golden age of mankind.
BARRIERS TO A ROBOT LABOR FORCE
POTENTIAL SOLUTIONS
old-fashioned kind. How much better if farm worker families could own and
operate a few robots rather than having to sell their own sweat to earn their daily
bread? This suggests that the farm unions might do their workers more good by sup¬
porting farm robot research and concentrating their organizational efforts on ob¬
taining the financial resources to make it possible for farm workers to become robot
owner-entrepreneurs.
Appealing as this possibility is, it suffers from the problem that it would be
practical in only a few instances. It might possibly work in farm applications and
perhaps to some extent in construction work. However, the structure of the working
relationships in factories and offices makes it unlikely that it would work there.
Such an arrangement would require an unprecedented degree of cooperation, vision,
and mutual good faith between unions and management, as well as between workers
and capital financing institutions.
Another possibility, more amenable to manufacturing industries, would be to
introduce a massive program of Employee Stock Ownership Plans (ESOP) such as
have been suggested by Lewis O. Kelso. This would allow most of the presently
employed industrial and business labor force to benefit from the robot revolution.
However, it would exclude most of the rest of the population and would tie each
worker’s fortune very tightly to the future profitability of his or her particular com¬
pany. Thus, some would fare very well while others might end up with nothing
because of a company bankruptcy.
Another possible solution would be to finance the development and construc¬
tion of robot factories out of public money (not tax money, but credit from the
Federal Reserve System) and pay dividends on the profits from those investments to
everyone on an equal per-capita basis. To be more specific, a semi-private invest¬
ment corporation, which we might call a National Mutual Fund, could be created
for the purpose of financing capital investment for increasing productivity in private
industry. This investment corporation would be authorized by the Congress each
year to draw up to a specified amount from the Federal Reserve which it would use
to purchase stock from private industry. This would provide equity financing for the
modernization of plants and machinery and the introduction of advanced computer-
based automation. Profits from these investments would then be paid in the form of
dividends by the National Mutual Fund to all adults on an equal per capita basis. By
this means each citizen would receive income from the industrial sector of the
economy independent of employment in factories and offices. Every adult citizen
would become a capitalist in the sense of deriving substantial income from invested
capital.
A new economic philosophy based on the concept of a National Mutual Fund is
outlined in some detail in my previous book: Peoples' Capitalism: The Economics
of The Robot Revolution. This book attempts to show how America could finance a
rebuilding of its industrial plant and a massive construction program for robot fac¬
tories. The suggestion is to begin the National Mutual Fund with the modest sum of
$10 million for the first year and increase this amount by a factor of three every year
for 25 years until the investment rate for the National Mutual Fund equals the total
private investment rate. Once this level is reached, annual public dividends for every
adult citizen would amount to about $8000 in 1980 dollars.
Figure 12.1 suggests that doubling the nation’s investment rate would lead to a
real annual growth rate of about ten percent. An enormous difference exists between
ten percent real growth and the present rate of GNP increase. Figure 12.2 shows the
implications over the next 25 years.

Figure 12.1: The relationship between capital investment and productivity growth in ten
industrialized countries. Countries with a high rate of investment have high productivity
growth and vice versa. This dependence of productivity growth on capital investment is not a
transitory phenomenon or one confined to a few countries. It is a fundamental relationship
inherent in all industrialized societies. It implies that productivity growth can be controlled,
that it is the direct result of economic policies that promote investments in new technology
and in more efficient plants and equipment. The data shown here suggest that a given amount
of investment will yield a given amount of productivity growth; for example, a doubling of
America's investment rate would produce an annual productivity growth rate of eight to ten
percent. This chart was compiled from data taken during the 1960s; the addition of data from
the 1970s makes little change in the graph.

Figure 12.2: Effects of various productivity growth rates on real GNP.

At the current annual growth rate of 1.5 percent,
the GNP will barely rise from its present level of about $2 trillion to about $3
trillion. This hardly keeps up with the population growth. Even if the U.S. were to
achieve its historic growth rate of three percent per year, the GNP would rise to only
$4 trillion by 2005. However, a ten percent growth rate would result in a GNP of
over $20 trillion in 25 years, or more than six times the amount achievable at our
present rate!
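These projections are simple compound growth over 25 years from a $2 trillion base, as the following restatement shows.

```python
# The GNP projections above are compound growth from a $2 trillion base
# over 25 years at 1.5, 3, and 10 percent annual real growth.

BASE_GNP_TRILLIONS = 2.0
YEARS = 25

for rate in (0.015, 0.03, 0.10):
    gnp = BASE_GNP_TRILLIONS * (1 + rate) ** YEARS
    print(f"{rate:.1%} growth -> about ${gnp:.1f} trillion in {YEARS} years")
# 1.5% -> about $2.9T, 3% -> about $4.2T, 10% -> about $21.7T,
# matching the figures quoted in the text.
```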
Clearly, this is a matter of tremendous importance. A GNP surplus of $16
trillion over what otherwise would be considered normal would mean that even the
most exotic solutions to the problems of the environment would become
economically feasible. We could afford to collect solar energy or dig for geothermal
power anywhere on earth. We could afford to convert all industry, homes, and
transportation to alcohol or hydrogen fuel. We could process all sewage and farm
drainage to the purity of rainwater. At the same time, we could afford to rebuild our
cities, modernize our transportation systems, and provide the best in health care for
everyone. The military budget could support our defense needs on a smaller fraction
of the GNP, and we could embark on a much more exciting program of space ex¬
ploration.
There are, of course, many economists who would dispute the possibility of the
United States increasing its real growth rate to ten percent per year even if the invest¬
ment rate were doubled. Many would claim that Japan’s experience is unique and
that even Japan will slow down once she becomes the world leader. Perhaps so. But
the curve in figure 12.1 does not come from a single country or represent the ex¬
perience of only a few years. It reflects the combined experience of all the industrial
countries in the world over the past 20 years.
Productivity growth is positively correlated with the investment rate. The new
technology of robotics, computer-aided manufacturing, and particularly the pro¬
spect of self-reproducing factories certainly provide the technical basis for a ten per¬
cent annual productivity growth rate. What is needed is the capital investment to
completely rebuild the present industrial base using the latest computer and robot
technology. That involves a minimum of $6 trillion in new investment, which, spread
over two decades, amounts to about $300 billion in additional annual investment, or
about double the present rate.
Figure 12.3: A model of a national economy operating with the National Mutual Fund (NMF)
and a mandatory savings program (Industrial Development Bonds). NMF investments and
public dividends find their way into consumer pockets creating demand for new products pro¬
duced by modernized plants. The balance between demand and supply is reflected in prices
that regulate the amount of consumer purchasing power temporarily diverted into savings.
The savings rate is adjusted monthly to control inflation despite massive investments of newly
created money by the National Mutual Fund.
the government. The National Mutual Fund would be a semi-private investment cor¬
poration that would be disallowed from owning a controlling share of any company
and would never be allowed to invest more than private investors in the economy as
a whole. All National Mutual Fund investments would finance privately owned in¬
dustries operated for profit in a free market. Eventually public dividends would
make everyone financially independent. All, regardless of their beliefs or attitudes
toward industrialization or the work ethic, would have sufficient income to survive.
Everyone would be assured of enough cash money to purchase food, shelter,
clothing, and health care.
This would allow many new and previously untried lifestyles to develop and
survive. It would even allow the revival of many ancient lifestyles that have become
extinct because of competition with industrialized job economies. It would provide
freedom from want as a birthright.
The equal distribution of wealth earned by robot labor might even lead even¬
tually to the repeal of the graduated income tax. If robots were to provide for the
basic needs of everyone, there would be no further moral justification for taxing the
rich to subsidize the poor. There would be no need to penalize success and put great
riches beyond the reach of all but the very few. The payment of equal dividends to
all would make it possible for society, in good conscience, to reward excellence and
encourage ambition.
It is also important to distinguish public dividends from welfare. To begin with,
the dividends would not come out of taxes in any form. They would not be based on
the need of recipients, but on the profitability of the companies in which the investments
were made. This proposal is not just another “Robin Hood” scheme to take from
the rich and give to the poor. Dividends would be paid to rich and poor alike. The
healthy would benefit as well as the sick, the industrious as well as the lazy. This
means that there would be no need for government welfare inspectors to intrude on
privacy to determine need or for a large bureaucracy to certify eligibility or ad¬
minister payments.
Equal payments would, of course, mean more to the poor than to the rich. They
would provide an absolute income floor below which no one could sink, but there
would be no reduction of payments for persons who chose to work and no limit to
how much anyone could earn in additional income. Eventually, once annual per
capita dividend payments rose to $8000 or more (in 1980 dollars), there would be no
further need for welfare. Everyone would be financially independent regardless of
job employment or lack of it.
To be sure, some will object to this proposal on the grounds that it would
destroy the Puritan work ethic that made this country great, or that it would fly in
the face of the Biblical curse placed on Adam and Eve, “In the sweat of thy face
shalt thou eat bread, till thou return unto the ground.” The work ethic is a culturally
derived behavioral rule created by the requirements for economic success in a pre¬
robot economic system. It is not an indispensable component of human society, nor
is it essential to mental or physical health or moral stability. A large segment of
human society (namely, the rich) has survived and prospered throughout history
with very little recourse to the work ethic as most of us know it.
Before the first industrial revolution, physical slavery formed the backbone of
every high civilization from the ancient Egyptians, Babylonians, Chinese, Greeks,
Syrians, Turks, and Romans, down to the antebellum southern American states.
The first industrial revolution substituted machine power for muscle power in the in¬
dustrial and agricultural processes. This made slavery uneconomical. In a very real
sense, the invention of the water wheel and the steam engine put slaves out of a job.
However, the machinery of the first industrial revolution still requires human labor
for its control and servicing. Thus, present industrial technology still re¬
quires wage slavery in order for advanced civilization to flourish. If we wisely use
the available intellect, knowledge, and socio-political institutions, we can hope that
the second industrial revolution will free mankind from wage slavery and the
regimentation of the Puritan work ethic central to all economic philosophies born of
industrialization.
REFERENCES

Feigenbaum, E.A., and Feldman, J. Computers and Thought. New York: McGraw-Hill, 1963.
Fikes, R.E., and Nilsson, N. "STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving." Artificial Intelligence 2 (1971): 189-208.
Flora, P.C.; Thompson, A.M.; and Wilf, J.M., eds. Robotics Industry Directory. La Canada, CA: Robotics Publishing Co., 1981.
Fu, K.S. Syntactic Methods in Pattern Recognition. New York: Academic Press, 1974.
Gonzales, R.C., and Wintz, P. Digital Image Processing and Recognition. Reading, MA: Addison-Wesley, 1977.
Guzman, A. "Decomposition of a Visual Scene into Three-Dimensional Bodies." Proceedings Fall Joint Computer Conference 33. Washington: Thompson Book Co., 1968.
Harrington, J. Computer Integrated Manufacturing. New York: Industrial Press, 1973.
Heginbotham, W.B., and Rooks, B.W., eds. The Industrial Robot. Oxford: Cotswold Press.
Hohn, R.E. "Computed Path Control for an Industrial Robot." Proc. 8th Int'l Symp. on Indust. Robots (Vol. 1), Int'l Fluidics Services, Ltd., Bedford, England, 1978.
Issaman, P.B.S. "Tapping the Ocean's Vast Energy with Undersea Turbines." Popular Science 217 (1980): 72-158.
Kelso, L.O., and Hetter, P. Two-Factor Theory: The Economics of Reality. New York: Random House, 1967.
Klix, F., ed. Human and Artificial Intelligence. New York: North-Holland, 1979.
Lewis, R.A., and Johnston, A.R. "A Scanning Laser Rangefinder for a Robotic Vehicle." 5th Int'l Jt. Conf. on Artificial Intelligence, Cambridge, MA, 1977.
Malone, R. The Robot Book. New York: Harcourt Brace Jovanovich, 1978.
"Manufacturing Technology — A Changing Challenge to Improve Productivity." Report to the Congress by the Comptroller General of the United States, June 1976.
McCorduck, P. Machines Who Think. San Francisco: W.H. Freeman, 1979.
Meltzer, B., and Michie, D. Machine Intelligence. Halsted Press, 1973.
Minsky, M., ed. Semantic Information Processing. Cambridge, MA: M.I.T. Press, 1968.
Nevins, J.L., et al. "Exploring Research in Industrial Modular Assembly." Report R-1111, C.S. Draper Labs, 1977.
Nevins, J.L., and Whitney, D. "Computer-Controlled Assembly." Scientific American 238 (1979): 62.
Newell, A., and Simon, H.A. "GPS, A Program That Simulates Human Thought." In Computers and Thought. Edited by E.A. Feigenbaum and J. Feldman. New York: McGraw-Hill, 1963.
Nilsson, N.J. "A Hierarchical Robot Planning and Execution System." Artificial Intelligence Center Tech. Note 6, Stanford Research Institute, 1973.
Nilsson, N.J. Principles of Artificial Intelligence. Palo Alto, CA: Tioga Publishing Co., 1980.
Nilsson, N.J. Problem-Solving Methods in Artificial Intelligence. New York: McGraw-Hill, 1971.
Nilsson, N.J. (narrator). "Shakey: A First Experiment in Robot Planning and Learning." A film by Stanford Research Institute AI Center, 1972.
Nilsson, N.J., and Raphael, B. "Preliminary Design of an Intelligent Robot." Computer and Information Sciences. New York: Academic Press, 1967.
Nitzan, D. "Robotic Automation Program at SRI." Proc. MIDCON/79, Chicago, 1979.
Nitzan, D., and Rosen, C.A. "Programmable Industrial Automation." NSF Grant GI-38100X1, Tech. Note 133, SRI International AI Center, Menlo Park, CA, 1979.
Park, W.T. "Robotics Research Trends." Tech. Note 160, SRI International AI Center, Menlo Park, CA, 1978.
Paul, R.L. "Modelling, Trajectory Calculation and Servoing of a Computer Controlled Arm." Memo AIM-177, Report STAN-CS-72-311, Stanford AI Project, 1972.
Paul, R.L. "WAVE: A Model-Based Language for Manipulator Control." The Industrial Robot 4 (1977): 10-17.
Pavlidis, T. Structural Pattern Recognition. New York: Springer-Verlag, 1977.
Perkins, W.A. "A Model-Based Vision System for Industrial Parts." IEEE Transactions on Computers 27 (1978): 126-143.
Pratt, W.K. Digital Image Processing. New York: John Wiley & Sons, 1978.
Raphael, B. The Thinking Computer: Mind Inside Matter. San Francisco: W.H. Freeman, 1976.
Reichardt, J. Robots: Fact, Fiction, and Prediction. New York: Penguin Books, 1978.
Reiger, C. "Artificial Intelligence Programming Languages for Computer-Aided Manufacturing." Report TR-595, Computer Science Department, University of Maryland, 1977.
Rigney, J.W., and Towne, D.M. "Computer Techniques for Analyzing the Microstructure of Serial-Action Work in Industry." Human Factors 11 (1969): 113-122.
Rosen, C.A., and Nilsson, N. "An Intelligent Automaton." IEEE International Convention Record, 1967.
Rosen, C.A.; Nitzan, D.; et al. "Machine Intelligence Research Applied to Industrial Automation." Reports 1-8, SRI International AI Center, Menlo Park, CA, 1973-1978.
Rosenfeld, A., and Kak, A. Digital Picture Processing. New York: Academic Press, 1976.
Ruoff, C.F. "PACS—An Advanced Multitasking Robot System." The Industrial Robot 1 (1980): 87-98.
Samuel, A.L. "Some Studies in Machine Learning Using the Game of Checkers." In Computers and Thought. Edited by E.A. Feigenbaum and J. Feldman. New York: McGraw-Hill, 1963.
Schank, R.C., and Abelson, R.P. Scripts, Plans, Goals, and Understanding: An Inquiry into Human Knowledge Structures. Halsted Press, 1977.
The Seeds of Artificial Intelligence. NIH Publications 80-2071, U.S. Dept. of HEW, 1980.
Shimano, B. "User's Guide to VAL." Unimation, Inc., Danbury, CT 06810.
Shirai, Y., and Suwa, M. "Recognition of Polyhedra with a Range Finder." Proc. of the IJCAI-71, British Computer Society, 1971.
Simon, H., and Newell, A. Human Problem Solving. New York: Prentice-Hall, 1972.
Simon, J.C., and Rosenfeld, A. Digital Image Processing and Analysis. Sijthoff & Noordhoff, 1978.
Slagle, J.R. "A Heuristic Program That Solves Symbolic Integration Problems in Freshman Calculus." In Computers and Thought. Edited by E.A. Feigenbaum and J. Feldman. New York: McGraw-Hill, 1963.
Stauffer, R.N., ed. Robotics Today. Robotics International of the Society of Manufacturing Engineers.
Stucki, ed. Advances in Digital Image Processing: Theory, Application, Implementation. New York: Plenum Press, 1979.
Sussman, G.J. A Computer Model of Skill Acquisition. Elsevier, 1975.
Tanner, W., ed. Industrial Robots: Vol. 1, Fundamentals; Vol. 2, Applications. Dearborn, MI: Society of Manufacturing Engineers, 1978.
Waterman, D.A., and Hayes-Roth, F. Pattern-Directed Inference Systems. New York: Academic Press, 1978.
Weekley, T.L. "The UAW Speaks Out on Industrial Robots." Robotics Today (1979-80): 25-27.
Weizenbaum, J. Computer Power and Human Reason: From Judgement to Calculation. San Francisco: W.H. Freeman, 1976.
Whitney, D. "Resolved Motion Rate Control of Manipulators and Human Prostheses." IEEE Trans. Man-Machine Systems, Vol. MMS-10, June 1969: 47-53.
Winograd, T. Understanding Natural Language. New York: Academic Press, 1972.

INDEX
tactile receptors, 81
tactile sensory areas, 91-92
taste, 61-63
telepresence, 325
temporal patterns, 192
terminal buttons, 16
thalamus, 47, 83-84
touch sensors, 39
trajectories:
optimum behavioral, 282
state, 103-108
transcortical servo-loop, 186
transformational rules, 285
translation, automatic, 289-290
transmitter chemicals, 16
tropism, 127
tuning curves, 58
VAL, 265
vector:
binary, 106
notation, 101-102
state, 103-108
velocity image, 190
vestibular system, 35-36, 79-80
vision:
association areas, 82
color, 50
field, 92
flow, 8
stereo depth, 44
vocalization, 91