The Origins of Psychological Testing


CHAPTER 1
The History of Psychological Testing

TOPIC 1A The Origins of Psychological Testing


The Importance of Testing
Case Exhibit 1.1 The Consequences of Test Results
Rudimentary Forms of Testing in China in 2200 B.C.
Psychiatric Antecedents of Psychological Testing
The Brass Instruments Era of Testing
Changing Conceptions of Mental Retardation in the 1800s
Influence of Binet’s Early Research upon His Test
Binet and Testing for Higher Mental Processes
The Revised Scales and the Advent of IQ
Summary

The history of psychological testing is a fascinating story and has abundant
relevance to present-day practices. After all, contemporary tests did not
spring from a vacuum; they evolved slowly from a host of precursors introduced
over the last one hundred years. Accordingly, Chapter 1 features a review of
the historical roots of present-day psychological tests. In Topic 1A, The
Origins of Psychological Testing, we focus largely on the efforts of European
psychologists to measure intelligence during the late nineteenth century and
pre–World War I era. These early intelligence tests and their successors often
exerted powerful effects on the examinees who took them, so the first topic
also incorporates a brief digression documenting the pervasive importance of
psychological test results. Topic 1B, Early Testing in the United States,
catalogues the profusion of tests developed by American psychologists in the
first half of the twentieth century.

Psychological testing in its modern form originated little more than one
hundred years ago in laboratory studies of sensory discrimination, motor
skills, and reaction time. The British genius Francis Galton (1822–1911)
invented the first battery of tests, a peculiar assortment of sensory and motor
measures, which we review in the following. The American psychologist James
McKeen Cattell (1860–1944) studied with Galton and then, in 1890, proclaimed
the modern testing agenda in his classic paper entitled “Mental Tests and
Measurements.” He was tentative and modest when describing the purposes and
applications of his instruments:

    Psychology cannot attain the certainty and exactness of the physical
    sciences, unless it rests on a foundation of experiment and measurement.
    A step in this direction could be made by applying a series of mental
    tests and measurements to a large number of individuals. The results
    would be of considerable scientific value in discovering the constancy of
    mental processes, their interdependence, and their variation under
    different circumstances. Individuals, besides, would find their tests
    interesting, and, perhaps, useful in regard to training, mode of life or
    indication of disease. The scientific and practical value of such tests
    would be much increased should a uniform system be adopted, so that
    determinations made at different times and places could be compared and
    combined. (Cattell, 1890)

Cattell’s conjecture that “perhaps” tests would be useful in “training, mode
of life or indication of disease” must certainly rank as one of the prophetic
understatements of all time. Anyone reared in the Western world knows that
psychological testing has emerged from its timid beginnings to become a big
business and a cultural institution that permeates modern society. To cite
just one example, consider the number of standardized achievement and ability
tests administered in the school systems of the United States. Although it is
difficult to obtain exact data on the extent of such testing, an estimate of
200 million per year is probably not extreme (Medina & Neill, 1990). Of
course, the total number of tests administered yearly also includes millions
of personality tests and untold numbers of the thousands of other kinds of
tests now in existence (Conoley & Kramer, 1989, 1992; Mitchell, 1985;
Sweetland & Keyser, 1987). There is no doubt that testing is pervasive. But
does it make a difference?

THE IMPORTANCE OF TESTING

Tests are used in almost every nation on earth for counseling, selection, and
placement. Testing occurs in settings as diverse as schools, civil service,
industry, medical clinics, and counseling centers. Most persons have taken
dozens of tests and thought nothing of it. Yet, by the time the typical
individual reaches retirement age, it is likely that psychological test
results will help shape his or her destiny. The deflection of the life course
by psychological test results might be subtle, such as when a prospective
mathematician qualifies for an accelerated calculus course based on
tenth-grade achievement scores. More commonly, psychological test results
alter individual destiny in profound ways. Whether a person is admitted to one
college and not another, offered one job but refused a second, diagnosed as
depressed or not: all such determinations rest, at least in part, on the
meaning of test results as interpreted by persons in authority. Put simply,
psychological test results change lives. For this reason it is prudent
(indeed, almost mandatory) that students of psychology learn about the
contemporary uses and occasional abuses of testing. In Case Exhibit 1.1, the
life-altering aftermath of psychological testing is illustrated by means of
several true case history examples.

CASE EXHIBIT 1.1
THE CONSEQUENCES OF TEST RESULTS

The importance of psychological testing is best illustrated by example.
Consider these brief vignettes:
• A shy, withdrawn 7-year-old girl is administered an IQ test by a school
psychologist. Her score is phenomenally higher than the teacher expected. The
student is admitted to a gifted and talented program where she blossoms into a
self-confident and gregarious scholar.
• Three children in a family living near a lead smelter are exposed to the
toxic effects of lead dust and suffer neurological damage. Based in part on
psychological test results that demonstrate impaired intelligence and
shortened attention span in the children, the family receives an $8 million
settlement from the company that owns the smelter.
• A candidate for a position as police officer is administered a personality
inventory as part of the selection process. The test indicates that the
candidate tends to act before thinking and resists supervision from authority
figures. Even though he has excellent training and impresses the interviewers,
the candidate does not receive a job offer.
• A student, unsure of what career to pursue, takes a vocational interest
inventory. The test indicates that she would like the work of a pharmacist.
She signs up for a prepharmacy curriculum but finds the classes to be both
difficult and boring. After three years, she abandons pharmacy for a major in
dance, frustrated that she still faces three more years of college to earn a
degree.
• An applicant to graduate school in clinical psychology takes the Minnesota
Multiphasic Personality Inventory (MMPI). His recommendations and grade point
average are superlative, yet he must clear the final hurdle posed by the MMPI.
His results are reasonably normal but slightly defensive; by a narrow vote,
the admissions committee extends him an invitation. Ironically, this is the
only graduate school to admit him; nineteen others turn him down. He accepts
the invitation and becomes enchanted with the study of psychological
assessment. Many years later, he writes this book.

The importance of testing is also evident from historical review. Students of
psychology generally regard historical issues as dull, dry, and pedantic, and
sometimes these prejudices are well deserved. After all, many textbooks fail
to explain the relevance of historical matters and provide only vague sketches
of early developments in mental testing. As a result, students of psychology
often conclude incorrectly that historical issues are boring and irrelevant.

In reality, the history of psychological testing is a captivating story that
has substantial relevance to present-day practices. Historical developments
are pertinent to contemporary testing for the following reasons:

1. A review of the origins of psychological testing helps explain current
practices that might otherwise seem arbitrary or even peculiar. For example,
why do many current intelligence tests incorporate a seemingly nonintellective
capacity, namely, short-term memory for digits? The answer is, in part,
historical inertia: intelligence tests have always included a measure of digit
span.
2. The strengths and limitations of testing also stand out better when tests
are viewed in historical context. The reader will discover, for example, that
modern intelligence tests are exceptionally good at predicting school failure,
precisely because this was the original and sole purpose of the first such
instrument developed in Paris, France, at the turn of the twentieth century.
3. Finally, the history of psychological testing contains some sad and
regrettable episodes that help remind us not to be overly zealous in our
modern-day applications of testing. For example, based on the misguided and
prejudicial
application of intelligence test results, several prominent psychologists
helped ensure passage of the Immigration Restriction Act of 1924.

In later chapters, we examine the principles of psychological testing,
investigate applications in specific fields (e.g., personality, intelligence,
neuropsychology), and reflect on the social and legal consequences of testing.
However, the reader will find these topics more comprehensible when viewed in
historical context. So, for now, we begin at the beginning by reviewing
rudimentary forms of testing that existed over four thousand years ago in
imperial China.

RUDIMENTARY FORMS OF TESTING IN CHINA IN 2200 B.C.

Although the widespread use of psychological testing is largely a phenomenon
of the twentieth century, historians note that rudimentary forms of testing
date back to at least 2200 B.C. when the Chinese emperor had his officials
examined every third year to determine their fitness for office (Bowman, 1989;
Chaffee, 1985; DuBois, 1970; Franke, 1963; Lai, 1970; Teng, 1942–43). Such
testing was modified and refined over the centuries until written exams were
introduced in the Han dynasty (202 B.C.–A.D. 200). Five topics were tested:
civil law, military affairs, agriculture, revenue, and geography. The Chinese
examination system took its final form about 1370 when proficiency in the
Confucian classics was emphasized. In the preliminary examination, candidates
were required to spend a day and a night in a small isolated booth, composing
essays on assigned topics and writing a poem. The 1 to 7 percent who passed
moved up to the district examinations, which required three separate sessions
of three days and three nights.

The district examinations were obviously grueling and rigorous, but this was
not the final level. The 1 to 10 percent who passed were allowed the privilege
of going to Peking for the final round of examinations. Perhaps 3 percent of
this final group passed and became mandarins, eligible for public office.

Although the Chinese developed the external trappings of a comprehensive civil
service examination program, the similarities between their traditions and
current testing practices are, in the main, superficial. Not only were their
testing practices unnecessarily grueling, the Chinese also failed to validate
their selection procedures. Nonetheless, it does appear that the examination
program incorporated relevant selection criteria. For example, in the written
exams beauty of penmanship was weighted very heavily. Given the highly
stylistic features of Chinese written forms, good penmanship was no doubt
essential for clear, exact communication. Thus, penmanship was probably a
relevant predictor of suitability for civil service employment. In response to
widespread discontent, the examination system was abolished by royal decree in
1906 (Franke, 1963).

PSYCHIATRIC ANTECEDENTS OF PSYCHOLOGICAL TESTING

Most historians trace the beginnings of psychological testing to the
experimental investigation of individual differences that flourished in
Germany and Great Britain in the late 1800s. There is no doubt that early
experimentalists such as Wilhelm Wundt, Francis Galton, and James McKeen
Cattell laid the foundations for modern-day testing, and we will review their
contributions in detail. But psychological testing owes as much to early
psychiatry as it does to the laboratories of experimental psychology. In fact,
the examination of the mentally ill around the middle of the nineteenth
century resulted in the development of numerous early tests (Bondy, 1974).
These early tests lacked standardization and were consequently relegated to
oblivion. They were nonetheless influential in determining the course of
psychological testing, so it is important to mention a few typical
developments from this era.

In 1885, the German physician Hubert von Grashey developed the antecedent of
the memory drum as a means of testing brain-injured patients. His subjects
were shown words, symbols, or pictures through a slot in a sheet of paper that
was moving slowly over the stimuli. Grashey found that many patients could
recognize stimuli in their totality but could not identify them when shown
through the moving slot. Shortly thereafter, the German psychiatrist Conrad
Rieger developed an excessively ambitious test battery for brain damage. His
battery took over 100 hours to administer and soon fell out of favor.

In summary, early psychiatry contributed to the mental test movement by
showing that standardized procedures could help reveal the nature and extent
of symptoms in the mentally ill and brain-injured patients. Most of the early
tests developed by psychiatrists faded into oblivion, but a few procedures
were standardized and perpetuate themselves in modern variations (Bondy,
1974).

THE BRASS INSTRUMENTS ERA OF TESTING

Experimental psychology flourished in the late 1800s in continental Europe and
Great Britain. For the first time in history, psychologists departed from the
wholly subjective and introspective methods that had been so fruitlessly
pursued in the preceding centuries. Human abilities were instead tested in
laboratories. Researchers used objective procedures that were capable of
replication. Gone were the days when rival laboratories would have raging
arguments about “imageless thought,” one group saying it existed, another
group saying that such a mental event was impossible.

Even though the new emphasis on objective methods and measurable quantities
was a vast improvement over the largely sterile mentalism that preceded it,
the new experimental psychology was itself a dead end, at least as far as
psychological testing was concerned. The problem was that the early
experimental psychologists mistook simple sensory processes for intelligence.
They used assorted brass instruments to measure sensory thresholds and
reaction times, thinking that such abilities were at the heart of
intelligence. Hence, this period is sometimes referred to as the Brass
Instruments era of psychological testing.

In spite of the false start made by early experimentalists, at least they
provided psychology with an appropriate methodology. Such pioneers as Wundt,
Galton, Cattell, and Clark Wissler showed that it was possible to expose the
mind to scientific scrutiny and measurement. This was a fateful change in the
axiomatic assumptions of psychology, a change that has stayed with us to the
current day.

Most sources credit Wilhelm Wundt (1832–1920) with founding the first
psychological laboratory in 1879 in Leipzig, Germany. It is less well
recognized that he was measuring mental processes years before, at least as
early as 1862, when he experimented with his thought meter (Diamond, 1980).
This device was a calibrated pendulum with needles sticking off from each
side. The pendulum would swing back and forth, striking bells with the
needles. The observer’s task was to take note of the position of the pendulum
when the bells sounded. Of course, Wundt could adjust the needles beforehand
and thereby know the precise position of the pendulum when each bell was
struck. Wundt thought that the difference between the observed pendulum
position and the actual position would provide a means of determining the
swiftness of thought of the observer.

Wundt’s analysis was relevant to a longstanding problem in astronomy. The
problem was that two or more astronomers simultaneously using the same
telescope (with multiple eyepieces) would report different crossing times as
the stars moved across a grid line on the telescope. Even in Wundt’s time, it
was a well-known event in the history of science that Kinnebrook, an assistant
at the Royal Observatory in England, had been dismissed in 1796 because his
stellar crossing times were nearly a full second too slow (Boring, 1950).
Wundt’s analysis offered another explanation that did not assume incompetence
on the part of anyone. Put simply, Wundt believed that the speed of thought
might differ from one person to the next:

    For each person there must be a certain speed of thinking, which he can
    never exceed with his given mental constitution. But just as one steam
    engine can go faster than another, so this speed of thought will probably
    not be the same in all persons. (Wundt, 1862, as translated in Rieber,
    1980)

This analysis of telescope reporting times seems simplistic by present-day
standards and overlooks the possible contribution of such factors as
attention,
motivation, and self-correcting feedback from prior trials. On the positive
side, this was at least an empirical analysis that sought to explain
individual differences instead of trying to explain them away. And that is the
relevance to current practices in psychological testing. However crudely,
Wundt measured mental processes and begrudgingly acknowledged individual
differences.¹

1. This emphasis upon individual differences was rare for Wundt. He is more
renowned for proposing common laws of thought for the average adult mind.

Galton and the First Battery of Mental Tests

Sir Francis Galton (1822–1911) pioneered the new experimental psychology in
nineteenth-century Great Britain. Galton was obsessed with measurement, and
his intellectual career seems to have been dominated by a belief that
virtually anything was measurable. His attempts to measure intellect by means
of reaction time and sensory discrimination tasks are well known. Yet, to
appreciate his wide-ranging interests, the reader should be apprised that
Galton also devised techniques for measuring beauty, personality, the
boringness of lectures, and the efficacy of prayer, to name but a few of the
endeavors that his biographer has catalogued in elaborate detail (Pearson
1914, 1924, 1930ab).

Galton was a genius who was more interested in the problems of human evolution
than in psychology per se (Boring, 1950). His two most influential works were
Hereditary Genius (1869), an empirical analysis purporting to prove that
genetic factors were overwhelmingly important for the attainment of eminence,
and Inquiries into Human Faculty and Its Development (1883), a disparate
series of essays that emphasized individual differences in mental faculties.

Boring (1950) regards Inquiries as the beginning of the mental test movement
and the advent of the scientific psychology of individual differences. The
book is a curious mixture of empirical research and speculative essays on
topics as diverse as “just perceptible differences” in lifted weight and
diminished fertility among inbred animals. There is, nonetheless, a common
theme uniting these diverse essays; Galton demonstrates time and again that
individual differences not only exist but are objectively measurable.

Galton borrowed the time-consuming psychophysical procedures practiced by
Wundt and others on the European continent and adapted them to a series of
simple and quick sensorimotor measures. Thus, he continued the tradition of
brass instruments mental testing but with an important difference: his
procedures were much more amenable to the timely collection of data from
hundreds if not thousands of subjects. Because of his efforts in devising
practicable measures of individual differences, historians of psychological
testing usually regard Galton as the father of mental testing (Goodenough,
1949; Boring, 1950).

To further his study of individual differences, Galton set up a psychometric
laboratory in London at the International Health Exhibition in 1884. It was
later transferred to the London Museum, where it was maintained for six years.
Various anthropometric and psychometric measures were arranged on a long table
at one side of a narrow room. Subjects were admitted at one end for threepence
and given successive tests as they moved down the table. At least 17,000
individuals were tested during the 1880s and 1890s. About 7,500 of the
individual data records have survived to the present day (Johnson et al.,
1985).

The tests and measures involved both the physical and behavioral domains.
Physical characteristics assessed were height, weight, head length, head
breadth, arm span, length of middle finger, and length of lower arm, among
others. The behavioral tests included strength of hand squeeze determined by
dynamometer, vital capacity of the lungs measured by spirometer, visual
acuity, highest audible tone, speed of blow, and reaction time (RT) to both
visual and auditory stimuli.

Ultimately, Galton’s simplistic attempts to gauge intellect with measures of
reaction time and sensory discrimination proved fruitless. Nonetheless, he did
provide a tremendous impetus to the testing movement by demonstrating that
objective tests could be devised and that meaningful scores could be obtained
through standardized procedures.

Cattell Imports Brass Instruments to the United States

James McKeen Cattell (1860–1944) studied the new experimental psychology with
both Wundt and Galton before settling at Columbia University where, for
twenty-six years, he was the undisputed dean of American psychology. With
Wundt, he did a series of painstakingly elaborate RT studies (1880–1882),
measuring with great precision the fractions of a second presumably required
for different mental reactions. He also noted, almost in passing, that he and
another colleague had small but consistent differences in RT. Cattell proposed
to Wundt that such individual differences ought to be studied systematically.
Although Wundt acknowledged individual differences, he was philosophically
more inclined to study general features of the mind, and he offered no support
for Cattell’s proposal (Fancher, 1985).

But Cattell received enthusiastic support for his study of individual
differences from Galton, who had just opened his psychometric laboratory in
London. After corresponding with Galton for a few years, Cattell arranged for
a two-year fellowship at Cambridge so that he could continue the study of
individual differences. Cattell opened his own research laboratory and
developed a series of tests that were mainly extensions and additions to
Galton’s battery. Cattell (1890) invented the term mental test in his famous
paper entitled “Mental Tests and Measurements.” This paper described his
research program, detailing ten mental tests he proposed for use with the
general public. These tests were clearly a reworking and embellishment of the
Galtonian tradition:

• Strength of hand squeeze as measured by dynamometer
• Rate of hand movement through a distance of 50 centimeters
• Two-point threshold for touch: minimum distance at which two points are
still perceived as separate
• Degree of pressure needed to cause pain: rubber tip pressed against the
forehead
• Weight differentiation: discern the relative weights of identical-looking
boxes varying by one gram from 100 to 110 grams
• Reaction time for sound: using a device similar to Galton’s
• Time for naming colors
• Bisection of a 50-centimeter line
• Judgment of 10 seconds of time
• Number of letters repeated on one hearing

Strength of hand squeeze seems a curious addition to a battery of mental
tests, a point that Cattell (1890) addressed directly in his paper. He was of
the opinion that it was impossible to separate bodily energy from mental
energy. Thus, in Cattell’s view, an ostensibly physiological measure such as
dynamometer pressure was an index of one’s mental power as well. Clearly, the
physiological and sensory bias of the entire test battery reflects its
strongly Galtonian heritage (Fancher, 1985).

In 1891, Cattell accepted a position at Columbia University, at that time the
largest university in the United States. His subsequent influence on American
psychology was far in excess of his individual scientific output and was
expressed in large part through his numerous and influential students (Boring,
1950). Among his many famous doctoral students and the years of their degrees
were E. L. Thorndike (1898) who made monumental contributions to learning
theory and educational psychology; R. S. Woodworth (1899) who was to author
the very popular and influential Experimental Psychology (1938); and E. K.
Strong (1911) whose Vocational Interest Blank (since revised) is still in wide
use. But among Cattell’s students, it was probably Clark Wissler (1901) who
had the greatest influence on the early history of psychological testing.

Wissler obtained both mental test scores and academic grades from more than
300 students at Columbia University and Barnard College. His goal was to
demonstrate that the test results could predict academic performance. With our
early twenty-first-century perspective on research and testing, it seems
amazing that the early experimentalists waited so long to do such basic
validational research. Wissler’s (1901) results showed virtually no tendency
for the mental test scores to correlate with academic achievement. For
example, class standing correlated .16 with memory for number lists, –.08
with dynamometer strength, .02 with color naming, and –.02 with reaction time.
The highest correlation (.16) was statistically significant because of the
large sample size. However, so humble a correlation carries with it very
little predictive utility.²

Also damaging to the brass instruments testing movement was the very modest
correlations between the mental tests themselves. For example, color naming
and hand movement speed correlated only .19, while RT and color naming
correlated –.15. Several physical measures such as head size (a holdover
measure from the Galton era) were, not surprisingly, also uncorrelated with
the various sensory and RT measures.

With the publication of Wissler’s (1901) discouraging results, experimental
psychologists largely abandoned the use of RT and sensory discrimination as
measures of intelligence. From one standpoint, this turning away from the
brass instruments approach was a desirable development in the history of
psychological testing. The way was thereby paved for immediate acceptance of
Alfred Binet’s more sensible and useful measures of higher mental processes.

But in other respects, the abandonment of RT and sensory measures was
premature and unfortunate. After all, by contemporary standards Wissler’s
research methods revealed an extraordinary psychometric naivete. By using only
bright college students as subjects, Wissler had inadvertently introduced an
extreme restriction of range, which would invariably reduce the size of his
correlations. If a more heterogeneous sample of subjects had been used, the
correlations would have been substantially larger. In addition, certain
measures such as RT were inherently unreliable because of the small number of
trials per subject. Such unreliability in a measure also places a severe
restriction on the upper bounds of correlation coefficients.

If Wissler’s (1901) negative findings had been more skeptically scrutinized,
it might not have been a full 70 years later until RT was resurrected as a
potentially useful intellectual measure. Correlations of –.40 between complex
forms of RT and intelligence are not at all uncommon (Jensen, 1982).³ But that
is getting ahead of the story. The more common reaction among psychologists in
the early 1900s was to begrudgingly conclude that Galton had been wrong in
attempting to infer complex abilities from simple ones. Goodenough (1949) has
likened Galton’s approach to “inferring the nature of genius from the nature
of stupidity or the qualities of water from those of the hydrogen and oxygen
of which it is composed.” The academic psychologists apparently agreed with
her, and American attempts to develop intelligence tests virtually ceased at
the turn of the twentieth century. For his own part, Wissler was apparently so
discouraged by his results that he immediately switched to anthropology, where
he became a strong environmentalist in explaining differences between ethnic
groups.

The void created by the abandonment of the Galtonian tradition did not last
for long. In Europe, Alfred Binet was on the verge of a major breakthrough in
intelligence testing. Binet introduced his scale of intelligence in 1905, and
shortly thereafter H. H. Goddard imported it to the United States, where it
was applied in a manner that Gould (1981) has described as “the dismantling of
Binet’s intentions in America.” Whether early twentieth-century American
psychologists subverted Binet’s intentions is an important question that we
review in the next topic. First, we examine the social changes in
nineteenth-century Europe that created the necessity for practical
intelligence tests.
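
The statistical points raised about Wissler's study (a weak correlation that is nonetheless significant in a large sample, the shrinking effect of restriction of range, and the ceiling that unreliability places on a correlation) can be illustrated with a short simulation. The following Python sketch is only an illustration under stated assumptions: the simulated scores, the built-in correlation of 0.5, the 10 percent selection cutoff, the reliabilities of .50 and .90, and the sample size of 300 (standing in for the "more than 300" students) are hypothetical values chosen for the example, not figures from Wissler (1901). The ceiling formula is the standard one from classical test theory.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def t_for_r(r, n):
    """t statistic for testing whether a correlation r from n cases differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def max_observable_r(reliability_x, reliability_y):
    """Classical test theory ceiling on the correlation between two fallible measures."""
    return math.sqrt(reliability_x * reliability_y)


def simulate_restriction(true_r=0.5, n=20000, keep_fraction=0.10, seed=1901):
    """Compare r in a full simulated population with r among the top scorers on y.

    All values here are hypothetical; x plays the role of a mental test score
    and y the role of academic standing.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        noise = rng.gauss(0, 1)
        y = true_r * x + math.sqrt(1 - true_r ** 2) * noise
        pairs.append((x, y))
    full_r = pearson_r([p[0] for p in pairs], [p[1] for p in pairs])
    # Keep only the highest scorers on y, mimicking a sample of bright students.
    pairs.sort(key=lambda p: p[1], reverse=True)
    selected = pairs[: int(n * keep_fraction)]
    restricted_r = pearson_r([p[0] for p in selected], [p[1] for p in selected])
    return full_r, restricted_r


if __name__ == "__main__":
    full_r, restricted_r = simulate_restriction()
    print(f"full-range r = {full_r:.2f}, restricted-range r = {restricted_r:.2f}")
    print(f"t for r = .16 with an assumed n = 300: {t_for_r(0.16, 300):.2f}")
    print(f"share of variance explained by r = .16: {0.16 ** 2:.3f}")
    print(f"ceiling on r with reliabilities .50 and .90: {max_observable_r(0.5, 0.9):.2f}")
```

In this sketch the full-sample correlation should come out close to the 0.5 built into the simulation, while the correlation among the selected top scorers is markedly smaller. With the assumed sample of 300, a correlation of .16 gives a t statistic of about 2.8, yet it accounts for under 3 percent of the variance, which is consistent with calling it significant but of little predictive use.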

2. We discuss the correlation coefficient in more detail in Topic 3B, Concepts
of Reliability. By way of quick preview, correlations can range from –1.0 to
+1.0. Values near zero indicate a weak, negligible linear relationship between
the two variables. For example, correlations between –.20 and +.20 are
generally of minimal value for purposes of individual prediction. Note also
that negative correlations indicate an inverse relationship.
3. The correlations are negative because low scores on RT are associated with
high scores on intelligence tests.

CHANGING CONCEPTIONS OF MENTAL RETARDATION IN THE 1800S

Many great inventions have been developed in response to the practical needs
created by changes in
TOPIC 1A THE ORIGINS OF PSYCHOLOGICAL TESTING 9

societal values. Such is the case with intelligence tests. To be specific, the first such tests were developed by Binet in the early 1900s to help identify children in the Paris school system who were unlikely to profit from ordinary instruction. Prior to this time, there was little interest in the educational needs of children with mental retardation. A new humanism toward those with mental retardation thus created the practical problem (identifying those with special needs) that Binet’s tests were to solve. The Western world of the late 1800s was just emerging from centuries of indifference and hostility toward the psychiatrically and mentally impaired. Medical practitioners were just beginning to acknowledge a distinction between individuals with emotional disabilities and mental retardation. For centuries, all such social outcasts were given similar treatment. In the Middle Ages, they were occasionally “diagnosed” as witches and put to death by burning. Later on, they were alternately ignored, persecuted, or tortured. In his comprehensive history of psychotherapy and psychoanalysis, Bromberg (1959) has an especially graphic chapter on the various forms of maltreatment toward those with mental and emotional disabilities, from which only one example will be provided here. In 1698, a prominent physician wrote a gruesome book, Flagellum Salutis, in which beatings were advocated as treatment “in melancholia; in frenzy; in paralysis; in epilepsy; in facial expression of feebleminded” (Bromberg, 1959).

By the early 1800s, saner minds began to prevail. Medical practitioners realized that some of those with psychiatric impairment had reversible illnesses that did not necessarily imply diminished intellect, whereas other exceptional persons, those with mental retardation, showed a greater developmental continuity and invariably had impaired intellect. In addition, a newfound humanism began to influence social practices toward individuals with psychological and mental disabilities. With this humanism there arose a greater interest in the diagnosis and remediation of mental retardation. At the forefront of these developments were two French physicians, J. E. D. Esquirol and O. E. Seguin, each of whom revolutionized thinking about those with mental retardation, thereby helping to create the necessity for Binet’s tests.

Esquirol and Diagnosis in Mental Retardation

Around the beginning of the nineteenth century, many physicians had begun to perceive the difference between mental retardation (then called idiocy) and mental illness (often referred to as dementia). J. E. D. Esquirol (1772–1840) was the first to formalize the difference in writing. His diagnostic breakthrough was noting that mental retardation was a lifelong developmental phenomenon whereas mental illness usually had a more abrupt onset in adulthood. He thought that mental retardation was incurable, whereas mental illness might show improvement (Esquirol, 1845/1838).

Esquirol placed great emphasis upon language skills in the diagnosis of mental retardation. This may offer a partial explanation as to why Binet’s later tests and the modern-day descendants from them are so heavily loaded on linguistic abilities. After all, the original use of the Binet scales was, in the main, to identify children with mental retardation who would not likely profit from ordinary schooling.

Esquirol also proposed the first classification system in mental retardation and it should be no surprise that language skills were the main diagnostic criteria. He recognized three levels of mental retardation: (1) those using short phrases, (2) those using only monosyllables, and (3) those with cries only, no speech. Apparently, Esquirol did not recognize what we would now call mild mental retardation, instead providing criteria for the equivalents of the modern-day classifications of moderate, severe, and profound mental retardation.

Seguin and Education of Individuals with Mental Retardation

Perhaps more than any other pioneer in the field of mental retardation, O. Edouard Seguin (1812–1880) helped establish a new humanism toward those with mental retardation in the late 1800s. He had been a student of Esquirol and had also studied with J. M. G. Itard (1774–1838), who is well known
for his five-year attempt to train the Wild Boy of Aveyron, a feral child who had lived in the woods for his first 11 or 12 years (Itard, 1932/1801).

Seguin borrowed from techniques used by Itard and devoted his life to developing educational programs for persons with mental retardation. As early as 1838, he had established an experimental class for such individuals. His treatment efforts earned him international acclaim and he eventually came to the United States to continue his work. In 1866, he published Idiocy, and Its Treatment by the Physiological Method, the first major textbook on the treatment of mental retardation. This book advocated a surprisingly modern approach to education of individuals with mental retardation and even touched on what would now be called behavior modification.

Such was the social and historical background that allowed intelligence tests to flourish. We turn now to the invention of the modern-day intelligence test by Alfred Binet. We begin with a discussion of the early influences that shaped his famous test.

INFLUENCE OF BINET’S EARLY RESEARCH UPON HIS TEST

As most every student of psychology knows, Alfred Binet (1857–1911) invented the first modern intelligence test in 1905. What is less well known, but equally important for those who seek an understanding of his contributions to modern psychology, is that Binet was a prolific researcher and author long before he turned his attentions to intelligence testing. The character of his early research had a material bearing on the subsequent form of his well-known intelligence test. For those who seek a full understanding of his pathbreaking influence, brief mention of Binet’s early career is mandatory. For more details the reader can consult DuBois (1970), Fancher (1985), Goodenough (1949), Gould (1981), and Wolf (1973).

Binet began his career in medicine, but was forced to drop out because of a complete emotional breakdown. He switched to psychology, where he studied the two-point threshold and dabbled in the associationist psychology of John Stuart Mill (1806–1873). Later, he selected an apprenticeship with the neurologist J. M. Charcot (1825–1893) at the famous Salpetriere Hospital. Thus, for a brief time Binet’s professional path paralleled that of Sigmund Freud, who also studied hysteria under Charcot. At the Salpetriere Hospital, Binet co-authored (with C. Fere) four studies supposedly demonstrating that reversing the polarity of a magnet could induce complete mood changes (e.g., from happy to sad) or transfer of hysterical paralysis (e.g., from left to right side) in a single hypnotized subject. In response to public criticism from other psychologists, Binet later published a recantation of his findings. This was a painful episode for Binet, and it sent his career into a temporary detour. Nonetheless, he learned two things through his embarrassment. First, he never again used sloppy experimental procedures that allowed for unintentional suggestion to influence his results. Second, he became skeptical of the zeitgeist (spirit of the times) in experimental psychology. Both of these lessons were applied when he later developed his intelligence scales.

In 1891, Binet went to work at the Sorbonne as an unpaid assistant and began a series of studies and publications that were to define his new “individual psychology” and ultimately to culminate in his intelligence tests. Binet was an ardent experimentalist, often using his two daughters to try out existing and new tests of intelligence. Early on, he flirted with a Cattellian approach to intelligence testing, using the standard measures of reaction time and sensory acuity on his two daughters. The results were annoyingly inconsistent and difficult to interpret. As might be expected, he found that the reaction times of his children were, on average, much slower than for adults. But on some trials his daughters’ performance approached or exceeded adult levels. From these findings, Binet concluded that attention was a key component of intelligence, which was itself a very multifaceted entity. Furthermore, he became increasingly disenchanted with the brass instruments approach to measuring intelligence, which probably explains his subsequent use of measures of higher mental processes. In addition, Binet’s sensory-perceptual experiments with his children greatly influenced his views on proper testing procedures:
The experimenter is obliged, to a point, to adjust his method to the subject he is addressing. There are certain rules to follow when one experiments on a child, just as there are certain rules for adults, for hysterics, and for the insane. These rules are not written down anywhere; each one learns them for himself and is repaid in great measure. By making an error and later accounting for the cause, one learns not to make the mistake a second time. In regard to children, it is necessary to be suspicious of two principal causes of error: suggestion and failure of attention. This is not the time to speak on the first point. As for the second, failure of attention, it is so important that it is always necessary to suspect it when one obtains a negative result. One must then suspend the experiments and take them up at a more favorable moment, restarting them 10 times, 20 times, with great patience. Children, in fact, are often little disposed to pay attention to experiments which are not entertaining, and it is useless to hope that one can make them more attentive by threatening them with punishment. By particular tricks, however, one can sometimes give the experiment a certain appeal. (Binet, 1895, quoted in Pollack, 1971)

It is interesting to contrast modern-day testing practices, which go so far as to specify the exact wording the examiner should use, with Binet’s advice to exercise nearly endless patience and use entertaining tricks when testing children.

BINET AND TESTING FOR HIGHER MENTAL PROCESSES

In 1896, Binet and his Sorbonne assistant, Victor Henri, published a pivotal review of German and American work on individual differences. In this historically important paper, they argued that intelligence could be better measured by means of the higher psychological processes rather than the elementary sensory processes such as reaction time. After several false starts, Binet and Simon eventually settled on the straightforward format of their 1905 scales, discussed subsequently.

The character of the 1905 scale owed much to a prior test developed by Dr. Blin (1902) and his pupil, M. Damaye. They had attempted to improve the diagnosis of mental retardation by using a battery of assessments in 20 areas such as spoken language; knowledge of parts of the body; obedience to simple commands; naming common objects; and ability to read, write, and do simple arithmetic. Binet criticized the scale for being too subjective, for having items reflecting formal education, and for using a yes or no format on many questions (DuBois, 1970). But he was much impressed with the idea of using a battery of tests, a feature which he adopted in his 1905 scales.

In 1904, the Minister of Public Instruction in Paris appointed a commission to decide upon the educational measures that should be undertaken with those children who could not profit from regular instruction. The commission concluded that medical and educational examinations should be used to identify those children who could not learn by the ordinary methods. Furthermore, it was determined that these children should be removed from their regular classes and given special instruction suitable to their more limited intellectual prowess. This was the beginning of the special education classroom.

It was evident that a means of selecting children for such special placement was needed, and Binet and his colleague Simon were called upon to develop a practical tool for just this purpose. Thus arose the first formal scale for assessing the intelligence of children.

Goodenough (1949) has outlined the four ways in which the 1905 scale differed from those which had been previously constructed.

1. It made no pretense of measuring precisely any single faculty. Rather, it was aimed at assessing the child’s general mental development with a heterogeneous group of tasks. Thus, the aim was not measurement, but classification.
2. It was a brief and practical test. The test took less than an hour to administer and required little in the way of equipment.
3. It measured directly what Binet and Simon regarded as the essential factor of intelligence, practical judgment, rather than wasting time with lower-level abilities involving sensory, motor, and perceptual elements. They took a pragmatic view of intelligence:
There is in intelligence, it seems to us, a fundamental agency the lack or alteration of which has the greatest importance for practical life; that is judgement, otherwise known as good sense, practical sense, initiative, or the faculty of adapting oneself. To judge well, to understand well, to reason well—these are the essential wellsprings of intelligence. (Binet and Simon, 1905; as translated in Fancher, 1985)

4. The items were arranged by approximate level of difficulty instead of content. A rough standardization had been done with 50 normal children ranging in age from three to 11 years and several subnormal and retarded children as well.

The 30 tests on the 1905 scale ranged from utterly simple sensory tests to quite complex verbal abstractions. Thus, the scale was appropriate for assessing the entire gamut of intelligence, from severe mental retardation to high levels of giftedness. The entire scale is outlined in Table 1.1.

TABLE 1.1 The 1905 Binet-Simon Scale

1. Follows a moving object with the eyes.
2. Grasps a small object which is touched.
3. Grasps a small object which is seen.
4. Recognizes the difference between a square of chocolate and a square of wood.
5. Finds and eats a square of chocolate wrapped in paper.
6. Executes simple commands and imitates simple gestures.
7. Points to familiar named objects, e.g., “Show me the cup.”
8. Points to objects represented in pictures, e.g., “Put your finger on the window.”
9. Names objects in pictures, e.g., “What is this?” [examiner points to a picture of a sign].
10. Compares two lines of markedly unequal length.
11. Repeats three spoken digits.
12. Compares two weights.
13. Shows susceptibility to suggestion.
14. Defines common words by function.
15. Repeats a sentence of 15 words.
16. Tells how two common objects are different, e.g., “paper and cardboard.”
17. Names from memory as many as possible of 13 objects displayed on a board for 30 seconds. [This test was later dropped because it permitted too many possibilities for distraction.]
18. Reproduces from memory two designs shown for 10 seconds.
19. Repeats a longer series of digits than in item 11 to test immediate memory.
20. Tells how two common objects are alike, e.g., “butterfly and flea.”
21. Compares two lines of slightly unequal length.
22. Compares five blocks to put them in order of weight.
23. Indicates which of the previous five weights the examiner has removed.
24. Produces rhymes, e.g., “What rhymes with ‘school’?”
25. A word completion test based on those proposed by Ebbinghaus.
26. Puts three nouns, e.g., “Paris, river, fortune” (or three verbs) in a sentence.
27. Responds to 25 abstract (comprehension) questions, e.g., “When a person has offended you, and comes to offer his apologies, what should you do?”
28. Reverses the hands of a clock.
29. After paper folding and cutting, draws the form of the resulting holes.
30. Defines abstract words by designating the difference between, e.g., “boredom and weariness.”

Source: Based on translations in Jenkins and Paterson (1961) and Jensen (1980).
Except for the very simplest tests, which were designed for the classification of very low-grade idiots (an unfortunate diagnostic term that has since been dropped), the tests were heavily weighted toward verbal skills, reflecting Binet’s departure from the Galtonian tradition.

An interesting point that is often overlooked by contemporary students of psychology is that Binet and Simon did not offer a precise method for arriving at a total score on their 1905 scale. It is well to remember that their purpose was classification, not measurement, and that their motivation was entirely humanitarian, namely, to identify those children who needed special educational placement. By contemporary standards, it is difficult to accept the fuzziness inherent in such an approach, but that may reflect a modern penchant for quantification more than a weakness in the 1905 scale. In fact, their scale was popular among educators in Paris. And, even with the absence of precise quantification, the approach was successful in selecting candidates for special classes.

THE REVISED SCALES AND THE ADVENT OF IQ

In 1908, Binet and Simon published a revision of the 1905 scale. In the earlier scale, more than half the items had been designed for the very retarded, yet the major diagnostic decisions involved older children and those with borderline intellect. To remedy this imbalance, most of the very simple items were dropped and new items were added at the higher end of the scale. The 1908 scale had 58 problems or tests, almost double the number from 1905. Several new tests were added, many of which are still used today: reconstructing scrambled sentences, copying a diamond, and executing a sequence of three commands. Some of the items were absurdities that the children had to detect and explain. One such item was amusing to French children: “The body of an unfortunate girl was found, cut into 18 pieces. It is thought that she killed herself.” However, this item was very upsetting to some American subjects, demonstrating the importance of cultural factors in intelligence (Fancher, 1985).

The major innovation of the 1908 scale was the introduction of the concept of mental level. The tests had been standardized on about 300 normal children between the ages of 3 and 13 years. This allowed Binet and Simon to order the tests according to the age level at which they were typically passed. Whichever items were passed by 80 to 90 percent of the 3-year-olds were placed in the 3-year level, and similarly on up to age 13. Binet and Simon also devised a rough scoring system whereby a basal age was first determined from the age level at which not more than one test was failed. For each five tests that were passed at levels above the basal, a full year of mental level was granted. Insofar as partial years of mental level were not credited and the various age levels had anywhere from three to eight tests, the method left much to be desired.

In 1911, a third revision of the Binet-Simon scales appeared. Each age level now had exactly five tests. The scale was also extended into the adult range. And with some reluctance, Binet introduced new scoring methods that allowed for one-fifth of a year for each subtest passed beyond the basal level. In his writings, Binet emphasized strongly that the child’s exact mental level should not be taken too seriously as an absolute measure of intelligence.

Nonetheless, the idea of deriving a mental level was a monumental development that was to influence the character of intelligence testing throughout the twentieth century. Within months, what Binet called mental level was being translated as mental age. And testers everywhere, including Binet himself, were comparing a child’s mental age with the child’s chronological age. Thus, a 9-year-old who was functioning at the mental level (or mental age) of a 6-year-old was retarded by three years. Very shortly, Stern (1912) pointed out that being retarded by three years had different meanings at different ages. A 5-year-old functioning at the 2-year-old level was more impaired than a 13-year-old functioning at the 10-year-old level. Stern suggested that an intelligence quotient computed from the mental age divided by the chronological age would give a better measure of the relative functioning of a subject compared to his or her same-aged peers.
In 1916, Terman and his associates at Stanford revised the Binet-Simon scales, producing the Stanford-Binet, a successful test that is discussed in a later chapter. Terman suggested multiplying the intelligence quotient by 100 to remove fractions; he was also the first person to use the abbreviation IQ. Thus was born one of the most popular and controversial concepts in the history of psychology. Binet died in 1911 before the IQ swept American testing, so we will never know what he would have thought of this new development based on his scales. However, Simon, his collaborator, later called the concept of IQ a “betrayal” of their scale’s original objectives (Fancher, 1985, p. 104), and we can assume from Binet’s humanistic concern that he might have held a similar opinion.
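
Stern's point about equal lags at different ages, and Terman's rescaling, can be checked against the two children used as examples earlier. The sketch below is illustrative only: the function name ratio_iq is an assumption, and rounding to a whole number is a display choice rather than something the text specifies.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Stern's quotient (mental age / chronological age), multiplied by 100
    as Terman suggested."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    return 100 * mental_age / chronological_age


# Both children lag three years behind their age peers, yet the quotients differ.
for mental_age, chronological_age in [(2, 5), (10, 13)]:
    iq = ratio_iq(mental_age, chronological_age)
    print(f"age {chronological_age}, mental age {mental_age}: IQ about {round(iq)}")
```

The 5-year-old functioning at the 2-year-old level comes out at 40, while the 13-year-old functioning at the 10-year-old level comes out at about 77, which is the difference in relative impairment that a simple three-year lag conceals.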

SUMMARY

1. For better or for worse, psychological test results possess the power to alter lives. A review of historical trends is crucial if we desire to comprehend the contemporary influence of psychological tests.
2. Rudimentary forms of testing date back to 2200 B.C. in China. The Chinese emperors used grueling written exams to select officials for civil service.
3. In the mid- to late 1800s, several physicians and psychiatrists developed standardized procedures to reveal the nature and extent of symptoms in the mentally ill and brain-injured. For example, in 1885, Hubert von Grashey developed the precursor to the memory drum to test the visual recognition skill of brain-injured patients.
4. Modern psychological testing owes its inception to the era of brass instruments psychology that flourished in Europe during the late 1800s. By testing sensory thresholds and reaction times, pioneer test developers such as Sir Francis Galton demonstrated that it was possible to measure the mind in an objective and replicable manner.
5. Wilhelm Wundt founded the first psychological laboratory in 1879 in Leipzig, Germany. Included among his earlier investigations was his 1862 attempt to measure the speed of thought with the thought meter, a calibrated pendulum with needles sticking off from each side.
6. The first reference to mental tests occurred in 1890 in a classic paper by James McKeen Cattell, an American psychologist who had studied with Galton. Cattell imported the brass instruments approach to the United States.
7. One of Cattell’s students, Clark Wissler, showed that reaction time and sensory discrimination measures did not correlate with college grades, thereby redirecting the mental-testing movement away from brass instruments.
8. In the late 1800s, a newfound humanism toward the mentally retarded, reflected in the diagnostic and remedial work of French physicians Esquirol and Seguin, helped create the necessity for early intelligence tests.
9. Alfred Binet, who was to invent the first true intelligence test, began his career by studying hysterical paralysis with the French neurologist Charcot. Binet’s claim that magnetism could cure hysteria was, to his pained embarrassment, disproved. Shortly thereafter, he switched interests and conducted sensory-perceptual studies, using his children as subjects.
10. In 1905, Binet and Simon developed the first useful intelligence test in Paris, France. Their simple 30-item measure of mainly higher mental functions helped identify schoolchildren who could not profit from regular instruction. Curiously, there was no method for scoring the test.
11. In 1908, Binet and Simon published a revised 58-item scale that incorporated the concept of mental level. In 1911, a third revision of the Binet-Simon scales appeared. Each age level now had exactly five tests; the scale extended into the adult range.
12. In 1912, Stern proposed dividing the mental age by the chronological age to obtain an intelligence quotient. In 1916, Terman suggested multiplying the intelligence quotient by 100 to remove fractions. Thus was born the concept of IQ.

TOPIC 1B Early Testing in the United States
Early Uses and Abuses of Tests in the United States
The Invention of Nonverbal Tests in the Early 1900s
The Stanford-Binet: The Early Mainstay of IQ
Group Tests and the Classification of WWI Army Recruits
Early Educational Testing
The Development of Aptitude Tests
Personality and Vocational Testing After WWI
The Origins of Projective Testing
The Development of Interest Inventories
Summary of Major Landmarks in the History of Testing
Summary

The Binet-Simon scales helped solve a practical social quandary, namely, how to identify children who needed special schooling. With this successful application of a mental test, psychologists realized that their inventions could have pragmatic significance for many different segments of society. Almost immediately, psychologists in the United States adopted a utilitarian focus. Intelligence testing was embraced by many as a reliable and objective response to perceived social problems such as the identification of immigrants with mental retardation and the quick, accurate classification of Army recruits (Boake, 2002).

Whether these early tests really solved social dilemmas, or merely exacerbated them, is a fiercely debated issue reviewed in the following sections. One thing is certain: The profusion of tests developed early in the twentieth century helped shape the character of contemporary tests. A review of these historical trends will aid in the comprehension of the nature of modern tests and a better appreciation of the social issues raised by them.

EARLY USES AND ABUSES OF TESTS IN THE UNITED STATES

First Translation of the Binet-Simon Scale

In 1906, Henry H. Goddard was hired by the Vineland Training School in New Jersey to do research on the classification and education of “feebleminded” children. He soon realized that a diagnostic instrument would be required and was therefore pleased to read of the 1908 Binet-Simon scale. He quickly set about translating the scale, making minor changes so that it would be applicable to American children (Goddard, 1910a).

Goddard (1910b) tested 378 residents of the Vineland facility and categorized them by diagnosis and mental age. He classified 73 residents as idiots because their mental age was 2 years or lower; 205 residents were termed imbeciles with mental age of 3 to 7 years; and 100 residents were deemed feebleminded with mental age of 8 to 12 years. It is instructive to note that originally neutral and descriptive terms for portraying levels of mental retardation (idiot, imbecile, and feebleminded)
have made their way into the everyday lexicon of pejorative labels. In fact, Goddard made his own contribution by coining the diagnostic term moron (from the Greek moronia, meaning “foolish”).

Goddard (1911) also tested 1,547 normal children with his translation of the Binet-Simon scales. He considered children whose mental age was four or more years behind their chronological age to be feebleminded; these constituted 3 percent of his sample. Considering that all of these children were found outside of institutions for the retarded, 3 percent is rather an alarming rate of mental deficiency. Goddard (1911) was of the opinion that these children should be segregated so that they would be prevented from “contaminating society.” These early studies piqued Goddard’s curiosity about “feebleminded” citizenry and the societal burdens they imposed. He also gained a reputation as one of the leading experts on the use of intelligence tests to identify persons with impaired intellect. His talents were soon in heavy demand.

The Binet-Simon and Immigration

In 1910, Goddard was invited to Ellis Island by the commissioner of immigration to help make the examination of immigrants more accurate. A dark and foreboding folklore had grown up around mental deficiency and immigration in the early 1900s:

It was believed that the feebleminded were degenerate beings responsible for many if not most social problems; that they reproduced at an alarming rate and menaced the nation’s overall biological fitness; and that their numbers were being incremented by undesirable “new” immigrants from southern and eastern European countries who had largely supplanted the “old” immigrants from northern and western Europe. (Gelb, 1986)

Initially, Goddard was unconcerned about the supposed threat of feeblemindedness posed by the immigrants. He wrote that adequate statistics did not exist and that the prevalent opinions about undue percentages of mentally defective immigrants were “grossly overestimated” (Goddard, 1912). However, with repeated visits to Ellis Island, Goddard became convinced that the rates of feeblemindedness were much higher than estimated by the physicians who staffed the immigration service. Within a year, he reversed his opinions entirely and called for congressional funding so that Ellis Island could be staffed with experts trained in the use of intelligence tests. In the following decade, Goddard became an apostle for the use of intelligence tests to identify feebleminded immigrants. Although he wrote that the rates of mentally deficient immigrants were “alarming,” he did not join the popular call for immigration restriction (Gelb, 1986).

The story of Goddard and his concern for the “menace of feeblemindedness,” as Gould (1981) has satirically put it, is often ignored or downplayed in books on psychological testing. The majority of textbooks on testing do not mention or refer to Goddard at all. The few books that do mention him usually state that Goddard “used the tests in institutions for the retarded,” which is surely an understatement. In his influential History of Psychological Testing, DuBois (1970) has a portrait of Goddard but devotes less than one line of text to him.

The fact is that Goddard was one of the most influential American psychologists of the early 1900s. Any thoughtful person must therefore wonder why so many contemporary authors have ignored or slighted the person who first translated and applied Binet’s tests in the United States. We will attempt an answer here, based in part on Goddard’s original writing, but also relying upon Gould’s (1981) critique of Goddard’s voluminous writings on mental deficiency and intelligence testing. We refer to Gelb’s (1986) more sympathetic portrayal of Goddard as well.

Perhaps Goddard has been ignored in the textbooks because he was a strict hereditarian who conceived of intelligence in simple-minded Mendelian terms. No doubt his call for colonization of “morons” so as to restrict their breeding has won him contemporary disfavor as well. And his insistence that much undesirable behavior (crime, alcoholism, prostitution) was due to inherited mental deficiency also does not sit well with the modern environmentalist position.

However, the most likely reason that modern authors have ignored Goddard is that he exemplified a large number of early, prominent psychologists who engaged in the blatant misuse of intelligence testing. In his efforts to demonstrate that high rates of immigrants with mental retardation were entering the United States each day, Goddard sent his assistants to Ellis Island to administer his English translation of the Binet-Simon tests to newly arrived immigrants. The tests were administered through a translator, not long after the immigrants walked ashore. We can guess that many of the immigrants were frightened, confused, and disoriented. Thus, a test devised in French and then translated to English was, in turn, translated into Yiddish, Hungarian, Italian, or Russian; administered to bewildered farmers and laborers who had just endured an Atlantic crossing; and interpreted according to the original French norms.

What did Goddard find and what did he make of his results? In small samples of immigrants (22 to 50), his assistants found 83 percent of the Jews, 80 percent of the Hungarians, 79 percent of the Italians, and 87 percent of the Russians to be feebleminded, that is, below age 12 on the Binet-Simon scales (Goddard, 1917). His interpretation of these findings is, by turns, skeptically cautious and then provocatively alarmist. In one place he claims that his study “makes no determination of the actual percentage, even of these groups, who are feebleminded.” Yet, later in the report he states that his figures would only need to be revised by “a relatively small amount” in order to find the actual percentages of feeblemindedness among immigrant groups. Further, he concludes that the

enced by the social ideologies of his time. Finally, Goddard was a complex scholar who refined and contradicted his professional opinions on numerous occasions. One ironic example: After the damage was done and his writings had helped restrict immigration, Goddard (1928) recanted, concluding that feeblemindedness was not incurable, and that the feebleminded did not need to be segregated in institutions.

The Goddard chapter in the history of testing serves as a reminder that even well-meaning persons operating within generally accepted social norms can misuse psychological tests. We need be ever mindful that disinterested “science” can be harnessed to the goals of a pernicious social ideology.

THE INVENTION OF NONVERBAL TESTS IN THE EARLY 1900S

Because of the heavy emphasis of the Binet-Simon scales upon verbal skills, many psychologists realized that this new measuring device was not entirely appropriate for non-English-speaking subjects, illiterates, and those with speech and hearing impairments. A spate of performance scales therefore arose in the decade following Goddard’s 1908 translation of the Binet-Simon. Only a brief chronology of nonverbal tests will be supplied here. The interested reader should consult DuBois (1970). In this listing of early performance tests, the reader will surely recognize many instruments and subtests that are still used today.
intelligence of the aver- age immigrant is low, The earliest of the performance measures was
“perhaps of moron grade,” but then goes on to cite the Seguin form board, an upright stand with de-
environmental deprivation as the primary culprit. pressions into which ten blocks of varying shapes
Simultaneously, Goddard appears to favor could be fitted. This had been used by Seguin as a
deportation for low IQ immigrants but also training device for individuals with mental
provides the humanitarian perspective that we retarda- tion, but was subsequently developed as a
might be able to use “moron laborers” if only “we test by Goddard, and then standardized by R. H.
are wise enough to train them properly.” Sylvester (1913). This identical board is still
There is much, much more to the Goddard era used, with the subject blindfolded, in the
of early intelligence testing, and the interested Halstead-Reitan neu- ropsychological test battery
reader is urged to consult Gould (1981) and Gelb (Reitan & Wolfson, 1985).
(1986). The most important point that we wish to Knox (1914) devised several performance
stress here is that—like many other early psychol- tests for use with Ellis Island immigrants. His
ogists—Goddard’s scholarly views were influ- tests
required absolutely no verbal responses from subjects. The examiner demonstrated each task nonverbally to ensure that the subjects understood the instructions. Included in his tests were a simple wooden puzzle (which Knox referred to as the “moron” test) and the same digit-symbol substitution test that is now found on most of the Wechsler scales of intelligence.

Several other early performance tests are worthy of brief mention because they have survived to the present day in revised form. Pintner and Paterson (1917) invented a 15-part scale of performance tests that used several form boards, puzzles, and object assembly tests. The object assembly test (reassembling cut-up cardboard versions of common objects such as a horse) is a mainstay of several contemporary intelligence tests. The Kohs Block Design test (Kohs, 1920), which required the subject to assemble painted blocks to resemble a pattern, is well known to any modern tester who uses the Wechsler scales. The Porteus Maze Test (Porteus, 1915) is a graded series of mazes for which the subject must avoid dead ends while tracing a path from beginning to end. This is a fine instrument that is still available today, but underused.

THE STANFORD-BINET: THE EARLY MAINSTAY OF IQ

While it was Goddard who first translated the Binet scales in the United States, it was Stanford professor Lewis M. Terman (1877–1956) who popularized IQ testing with his revision of the Binet scales in 1916. The new Stanford-Binet, as it was called, was a substantial revision, not just an extension, of the earlier Binet scales. Among the many changes that led to the unquestioned prestige of the Stanford-Binet was the use of the now familiar IQ for expressing test results. The number of items was increased to 90, and the new scale was suitable for those with mental retardation, children, and both normal and “superior” adults. In addition, the Stanford-Binet had clear and well-organized instructions for administration and scoring. Great care had been taken in securing a representative sample of subjects for use in the standardization of the test. As Goodenough (1949) notes: “The publication of the Stanford Revision marked the end of the initial period of experimentation and uncertainty. Once and for all, intelligence testing had been put on a firm basis.”

The Stanford-Binet was the standard of intelligence testing for decades. New tests were always validated in terms of their correlations with this measure. It continued its preeminence through revisions in 1937 and 1960, by which time the Wechsler scales (Wechsler, 1949, 1955) had begun to compete with it. The latest revision of the Stanford-Binet was completed in 2003. This test and the Wechsler scales are discussed in detail in a later chapter. It is worth mentioning here that the Wechsler scales became a quite popular alternative to the Stanford-Binet mainly because they provided more than just an IQ score. In addition to Full Scale IQ, the Wechsler scales provided ten to twelve subtest scores, as well as a Verbal and a Performance IQ. By contrast, the earlier versions of the Stanford-Binet supplied only a single overall summary score, the global IQ.

GROUP TESTS AND THE CLASSIFICATION OF WWI ARMY RECRUITS

Given the American penchant for efficiency, it was only natural that researchers would seek group mental tests to supplement the relatively time-consuming individual intelligence tests imported from France. Among the first to develop group tests was Pyle (1913), who published schoolchildren norms for a battery consisting of such well-worn measures as memory span, digit-symbol substitution, and oral word association (quickly writing down words in response to a stimulus word). Pintner (1917) revised and expanded Pyle’s battery, adding to it a timed cancellation test in which the child crossed out the letter a wherever it appeared in a body of text.

But group tests were slow to catch on, partly because the early versions still had to be scored
laboriously by hand. The idea of a completely objective test with a simple scoring key was inconsistent with tests such as logical memory for which the judgment of the examiner was required in scoring. Most amazing of all, at least to anyone who has spent any time as a student in American schools, the multiple-choice question was not yet in general use.

The slow pace of developments in group testing picked up dramatically as the United States entered World War I in 1917. It was then that Robert M. Yerkes, a well-known psychology professor at Harvard, convinced the U.S. government and the Army that all of its 1.75 million recruits should be given intelligence tests for purposes of classification and assignment (Yerkes, 1919). Immediately upon being commissioned into the Army as a colonel, Yerkes assembled a Committee on the Examination of Recruits, which met at the Vineland school in New Jersey to develop the new group tests for the assessment of Army recruits. Yerkes chaired the committee; other famous members included Goddard and Terman.

Two group tests emerged from this collaboration: the Army Alpha and the Army Beta. It would be difficult to overestimate the influence of the Alpha and Beta upon subsequent intelligence tests. The format and content of these tests inspired developments in group and individual testing for decades to come. We discuss these tests in some detail so that the reader can appreciate their influence on modern intelligence tests.

The Army Alpha and Beta Examinations

The Alpha was based on the then unpublished work of Otis (1918) and consisted of eight verbally loaded tests for average and high-functioning recruits. The eight tests were: (1) following oral directions, (2) arithmetical reasoning, (3) practical judgment, (4) synonym–antonym pairs, (5) disarranged sentences, (6) number series completion, (7) analogies, and (8) information. Figure 1.1 lists some typical items from the Army Alpha examination.

The Army Beta was a nonverbal group test designed for use with illiterates and recruits whose first language was not English. It consisted of various visual-perceptual and motor tests such as tracing a path through mazes and visualizing the correct number of blocks depicted in a three-dimensional drawing. Figure 1.2 depicts the blackboard demonstrations for all eight parts of the Beta examination.

In order to accommodate illiterate subjects and recent immigrants who did not comprehend English, Yerkes instructed the examiners to use largely pictorial and gestural methods for explaining the tests to the prospective Army recruits. The examiner and an assistant stood atop a platform at the front of the class and engaged in pantomime to explain each of the eight tests. We reproduce here the exact instructions for one test so that the reader can appraise the likely effects of the testing procedures upon Beta results. Keep in mind that many recruits could not see or hear the examiner well, and that some had never taken a test before. Here is how the examiners introduced test 6, picture completion, to each new roomful of potential recruits:

“This is test 6 here. Look. A lot of pictures.” After everyone has found the place, “Now watch.” Examiner points to hand and says to demonstrator, “Fix it.” Demonstrator does nothing, but looks puzzled. Examiner points to the picture of the hand, and then to the place where the finger is missing and says to demonstrator, “Fix it; fix it.” Demonstrator then draws in finger. Examiner says “That’s right.” Examiner then points to fish and place for eye and says, “Fix it.” After demonstrator has drawn missing eye, examiner points to each of the four remaining drawings and says, “Fix them all.” Demonstrator works samples out slowly and with apparent effort. When the samples are finished examiner says, “All right. Go ahead. Hurry up!” During the course of this test the orderlies walk around the room and locate individuals who are doing nothing, point to their pages and say, “Fix it. Fix them,” trying to set everyone working. At the end of 3 minutes examiner says, “Stop! But don’t turn over the page.” (Yerkes, 1921)

The Army testing was intended to help segregate and eliminate the mentally incompetent, to classify men according to their mental ability, and to assist in the placement of competent men in
FOLLOWING ORAL DIRECTIONS


Mark a cross in the first and also the third circle:
O O O O O
ARITHMETICAL REASONING
Solve each problem:
How many men are 5 men and 10 men? Answer ( )
If 3 1/2 tons of coal cost $21, what will 5 1/2 tons cost? Answer ( )
PRACTICAL JUDGMENT
Why are high mountains covered with snow? Because
# they are near the clouds
# the sun shines seldom on them
# the air is cold there
SYNONYM–ANTONYM PAIRS
Are these words the same or opposite?
largess—donation same? or opposite?
accumulate—dissipate same? or opposite?
DISARRANGED SENTENCES
Can these words be rearranged to form a sentence?
envy bad malice traits are and true? or false?
NUMBER SERIES COMPLETION
Complete the series: 3 6 8 16 18 36 . . . ...
ANALOGIES
Which choice completes the analogy?
tears—sorrow :: laughter— joy smile girls grin
granary—wheat :: library— desk books paper librarian
INFORMATION
Choose the best alternative:
The pancreas is in the abdomen head shoulder neck
The Battle of Gettysburg was fought in 1863 1813 1778 1812
Note: Examinees received verbal instructions for each subtest.

FIGURE 1.1 Sample Items from the Army Alpha Examination


Source: Reprinted from Yerkes, R. M. (Ed.). (1921). Psychological examining in the United States Army.
Memoirs of the National Academy of Sciences, Volume 15. With permission from the National Academy
of Sciences, Washington, DC.

responsible positions (Yerkes, 1921). However, it is not really clear whether the Army made much use of the masses of data supplied by Yerkes and his eager assistants. A careful reading of his memoirs reveals that Yerkes did little more than produce favorable testimonials from high-ranking officers. In the main, his memoirs say that the Army could have saved millions of dollars and increased its efficiency, if the testing data had been used.

To some extent, the mountains of test data had little practical impact on the efficiency of the Army
FIGURE 1.2 The Blackboard Demonstrations for All Eight Parts of the Beta Examination
Source: Reprinted from Yerkes, R. M. (Ed.). (1921). Psychological examining in the United States Army. Memoirs of the National Academy of Sciences, Volume 15. With permission from the National Academy of Sciences, Washington, DC.

because of the resistance of the military mind to scientific innovation. However, it is also true that the Army brass had good reason to doubt the validity of the test results. For example, an internal memorandum described the use of pantomime in the instructions to the nonverbal Beta examination:
For the sake of making results from the various camps comparable, the examiners were ordered to follow a certain detailed and specific series of ballet antics, which had not only the merit of being perfectly incomprehensible and unrelated to mental testing, but also lent a highly confusing and distracting mystical atmosphere to the whole performance, effectually preventing all approach to the attitude in which a subject should be while having his soul tested. (cited in Samelson, 1977)

In addition, the testing conditions left much to be desired, with wave upon wave of recruits ushered in one door, tested, and virtually shoved out the other side. Tens of thousands of recruits received a literal zero for many subtests, not because they were retarded but because they couldn’t fathom the instructions to these enigmatic new instruments. Many recruits fell asleep while the testers gave esoteric and mysterious pantomime instructions.

On the positive side, the Army testing provided psychologists with a tremendous amount of experience in the psychometrics of test construction. Thousands of correlation coefficients were computed, including the prominent use of multiple correlations in the analysis of test data. Test construction graduated from an art to a science in a few short years.

The Army Tests and Ethnic Differences

Unfortunately, the Army test results were sometimes used to substantiate prejudices about various racial and ethnic groups rather than to dispassionately investigate the causes of group differences. For example, in his influential book A Study of American Intelligence, Brigham (1923) undertook a massive analysis of Alpha and Beta scores for Nordic, Mediterranean, and Alpine immigrants. The text is stuffed with ostensibly objective tables and charts comparing racial and ethnic groups. For example, one curious figure in his book depicts the proportion of each immigration sample at or below the average of the African American draft. Brigham concluded that African Americans, Mediterranean immigrants, and Alpine immigrants were intellectually inferior. He sounded a dire warning that racial intermixture would inevitably cause a deterioration of American intelligence. For example, the caption to one graph reads, in part:

The distributions of the intelligence scores of the entire Nordic group, the combined Mediterranean and Alpine groups, and the negro draft. The process of racial intermixture cannot result in anything but an average of these elements, with the resulting deterioration of American intelligence. (Brigham, 1923)

Seven years later, Brigham (1930) forthrightly disavowed his earlier views. He cited cultural and language differences as the likely cause of ethnic and racial disparities on the Army tests. He asserted that comparative studies of national and racial groups could not be made with existing tests and concluded that his earlier findings were “without foundation” (Brigham, 1930).

EARLY EDUCATIONAL TESTING

For good or for ill, Yerkes’s grand scheme for testing Army recruits helped to usher in the era of group tests. After WWI, inquiries rushed in from industry, public schools, and colleges about the potential applications of these straightforward tests that almost anyone could administer and score (Yerkes, 1921). The psychologists who had worked with Yerkes soon left the service and carried with them to industry and education their newfound notion of paper-and-pencil tests of intelligence.

The Army Alpha and Beta were also released for general use. These tests quickly became the prototypes for a large family of group tests and influenced the character of intelligence tests, college entrance examinations, scholastic achievement tests, and aptitude tests. To cite just one specific consequence of the Army testing, the National Research Council, a government organization of scientists, devised the National Intelligence Test, which was eventually given to 7 million children in the United States during the 1920s. Thus, such well-known tests as the Wechsler scales, the Scholastic Aptitude Tests, and the Graduate Record Exam actually have roots that reach back to Yerkes,
Otis, and the mass testing of Army recruits during WWI.

The College Entrance Examination Board (CEEB) was established at the turn of the twentieth century to help avoid duplication in the testing of applicants to U.S. colleges. The early exams had been of the short answer essay format, but this was to change quickly when C. C. Brigham, a disciple of Yerkes, became CEEB secretary after WWI. In 1925, the College Board decided to construct a scholastic aptitude test for use in college admissions (Goslin, 1963). The new tests reflected the now familiar objective format of unscrambling sentences, completing analogies, and filling in the next number in a sequence. Machine scoring was introduced in the 1930s, making objective group tests even more efficient than before. These tests then evolved into the present College Board tests, in particular, the Scholastic Aptitude Tests, now known as the Scholastic Assessment Tests.

The functions of the CEEB were later subsumed under the nonprofit Educational Testing Service (ETS). The ETS directed the development, standardization, and validation of such well-known tests as the Graduate Record Examination, the Law School Admission Test, and the Peace Corps Entrance Tests.

Meanwhile, Terman and his associates at Stanford were busy developing standardized achievement tests. The Stanford Achievement Test (SAchT) was first published in 1923; a modern version of it is still in wide use today. From the very beginning, the SAchT incorporated such modern psychometric principles as norming the subtests so that within-subject variability could be assessed and selecting a very large and representative standardization sample.

THE DEVELOPMENT OF APTITUDE TESTS

Aptitude tests measure more specific and delimited abilities than intelligence tests. Traditionally, intelligence tests assess a more global construct such as general intelligence, although there are exceptions to this trend that will be discussed later. By contrast, a single aptitude test will measure just one ability domain, and a multiple aptitude test battery will provide scores in several distinctive ability areas.

The development of aptitude tests lagged behind that of intelligence tests for two reasons, one statistical, the other social. The statistical problem was that a new technique, factor analysis, was often needed to discern which aptitudes were primary and therefore distinct from each other. Research on this question had been started quite early by Spearman (1904) but was not refined until the late 1920s and 1930s (Spearman, 1927; Kelley, 1928; Thurstone, 1938). This family of techniques allowed Thurstone to conclude that there were specific factors of primary mental ability such as verbal comprehension, word fluency, number facility, spatial ability, associative memory, perceptual speed, and general reasoning (Thurstone, 1938; Thurstone & Thurstone, 1941). More will be said about this in the later chapters on intelligence and ability testing. The important point here is that Thurstone and his followers thought that global measures of intelligence did not, so to speak, “cut nature at its joints.” As a result, it was felt that such measures as the Stanford-Binet were not as useful as multiple aptitude test batteries in determining a person’s intellectual strengths and weaknesses.

The second reason for the slow growth of aptitude batteries was the absence of a practical application for such refined instruments. It was not until WWII that a pressing need arose to select candidates who were highly qualified for very difficult and specialized tasks. The job requirements of pilots, flight engineers, and navigators were very specific and demanding. A general estimate of intellectual ability, such as provided by the group intelligence tests used in WWI, was not sufficient to choose good candidates for flight school. The armed forces solved this problem by developing a specialized aptitude battery of 20 tests that was administered to men who passed preliminary screening tests. These measures proved invaluable in selecting pilots, navigators, and bombardiers, as reflected in the much lower washout rates of men
selected by test battery instead of the old methods (Goslin, 1963). Such tests are still used widely in the armed services.

PERSONALITY AND VOCATIONAL TESTING AFTER WWI

While such rudimentary assessment methods as the free association technique had been used before the turn of the twentieth century by Galton, Kraepelin, and others, it was not until WWI that personality tests emerged in a form resembling their contemporary appearance. As has happened so often in the history of testing, it was once again a practical need that served as the impetus for this new development. Modern personality testing began when Woodworth attempted to develop an instrument for detecting Army recruits who were susceptible to psychoneurosis. Virtually all the modern personality inventories, schedules, and questionnaires owe a debt to Woodworth’s Personal Data Sheet (1919).

The Personal Data Sheet consisted of 116 questions that the subject was to answer by underlining Yes or No. The questions were exclusively of the “face obvious” variety and, for the most part, involved fairly serious symptomatology. Representative items included:
• Do ideas run through your head so that you cannot sleep?
• Were you considered a bad boy?
• Are you bothered by a feeling that things are not real?
• Do you have a strong desire to commit suicide?

Readers familiar with the Minnesota Multiphasic Personality Inventory (MMPI) must surely recognize the debt that this more recent inventory owes to Woodworth’s instrument.

From his account of how the Personal Data Sheet was developed (Woodworth, 1951), it is clear that Woodworth took great care in the selection of items. In other respects, though, this instrument embodies a large dose of psychometric credulity. The most serious problem is simply that a disturbed subject motivated to look good could do so without detection; likewise, a normal subject with a fake bad mentality might be categorized as unfit for service. Modern instruments such as the MMPI have incorporated various validity scales for detecting such response tendencies. The Personal Data Sheet, by contrast, was predicated on the assumption that subjects would be honest when responding to the questions.

The next major development was an inventory of neurosis, the Thurstone Personality Schedule (Thurstone & Thurstone, 1930). After first culling hundreds of items answerable in the yes-no-? manner from Woodworth’s inventory and other sources, Thurstone rationally keyed items in terms of how the neurotic would typically answer them. Reflecting Thurstone’s penchant for statistical finesse, this inventory was one of the first to use the method of internal consistency whereby each prospective item was correlated with the total score on the tentatively identified scale to determine whether it belonged on the scale.

From the Thurstone test sprang the Bernreuter Personality Inventory (Bernreuter, 1931). It was a little more refined than its Thurstone predecessor, measuring four personality dimensions: neurotic tendency, self-sufficiency, introversion-extroversion, and dominance-submission. A major innovation in test construction was that a single test item could contribute to more than one scale.

The Allport-Vernon Study of Values was also published in 1931 (Allport & Vernon, 1931). This test was quite different from the others in that it measured values instead of psychopathology. Furthermore, it adopted a new scoring method, the ipsative approach, in which the respondent was compared only with himself or herself regarding the balance of importance given to six basic values: theoretical, economic, aesthetic, social, political, and religious. The test was devised in such a manner that subjects were required to make choices among the six values in specific situations. As a consequence, the average on the six scales was always the same for each subject. A weakness in one value was compensated for by a strength in some other value. Thus, only the relative peaks and valleys were of interest.
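The ipsative idea can be made concrete with a small computation. The sketch below is only an illustration under assumed rules, not the actual scoring key of the Study of Values: each hypothetical item asks the respondent to split a fixed number of points (here 3, an assumption) between two of the six values, and all names and responses are invented. Because every item hands out the same number of points, every respondent ends up with the same total and the same average, so only the shape of the profile differs from person to person.

```python
# Illustrative sketch of ipsative (forced-choice) scoring.
# The item format, the point total, and the responses are hypothetical;
# this is NOT the published Study of Values scoring procedure.

VALUES = ["theoretical", "economic", "aesthetic", "social", "political", "religious"]
POINTS_PER_ITEM = 3  # assumed: each item splits 3 points between two values


def ipsative_scores(responses):
    """Sum the points a respondent gave to each value across all items."""
    totals = {value: 0 for value in VALUES}
    for item in responses:
        if sum(item.values()) != POINTS_PER_ITEM:
            raise ValueError("each item must distribute exactly 3 points")
        for value, points in item.items():
            totals[value] += points
    return totals


def profile(totals):
    """Express each scale as a deviation from the respondent's own mean."""
    mean = sum(totals.values()) / len(totals)
    return {value: score - mean for value, score in totals.items()}


# Two invented respondents answering the same four items.
respondent_a = [
    {"theoretical": 3, "economic": 0},
    {"aesthetic": 2, "social": 1},
    {"political": 0, "religious": 3},
    {"theoretical": 2, "social": 1},
]
respondent_b = [
    {"theoretical": 0, "economic": 3},
    {"aesthetic": 1, "social": 2},
    {"political": 3, "religious": 0},
    {"theoretical": 1, "social": 2},
]

scores_a = ipsative_scores(respondent_a)
scores_b = ipsative_scores(respondent_b)

# Both totals equal 4 items x 3 points, so the averages are identical.
print(sum(scores_a.values()), sum(scores_b.values()))
print(profile(scores_a))
print(profile(scores_b))
```

Since the deviations in each profile sum to zero, a high score on one value is necessarily offset by lower scores elsewhere, which is why ipsative scores describe the balance within a person rather than how that person compares with others.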
Any chronology of self-report inventories must surely include the Minnesota Multiphasic Personality Inventory, or MMPI (Hathaway & McKinley, 1940). This test and its revision, the MMPI-2, are discussed in detail later. It will suffice for now to point out that the scales of the MMPI were constructed by the method that Woodworth pioneered, contrasting the responses of normal and psychiatrically disturbed subjects. In addition, the MMPI introduced the use of validity scales to determine fake bad, fake good, and random response patterns.

THE ORIGINS OF PROJECTIVE TESTING

The projective approach originated with the word association method pioneered by Francis Galton in the late 1800s. Galton gave himself four seconds to come up with as many associations as possible to a stimulus word, and then categorized his associations as parrotlike, image-mediated, or histrionic representations. This last category convinced him that mental operations “sunk wholly below the level of consciousness” were at play. Some historians have even speculated that Freud’s application of free association as a therapeutic tool in psychoanalysis sprang from Galton’s paper published in Brain in 1879 (Forrest, 1974).

Galton’s work was continued in Germany by Wundt and Kraepelin, and finally brought to fruition by Jung (1910). Jung’s test consisted of 100 stimulus words. For each word, the subject was to reply as quickly as possible with the first word coming to mind. Kent and Rosanoff (1910) gave the association method a distinctively American flavor by tabulating the reactions of 1,000 normal subjects to a list of 100 stimulus words. These tables were designed to provide a basis for comparing the reactions of normal and “insane” subjects.

While the Americans were pursuing the empirical approach to objective personality testing, a young Swiss psychiatrist, Hermann Rorschach (1884–1922), was developing a completely different vehicle for studying personality. Rorschach was strongly influenced by Jungian and psychoanalytic thinking, so it was natural that his new approach focused on the tendency of patients to reveal their innermost conflicts unconsciously when responding to ambiguous stimuli. The Rorschach and other projective tests discussed subsequently were predicated upon the projective hypothesis: When responding to ambiguous or unstructured stimuli, we inadvertently disclose our innermost needs, fantasies, and conflicts.

Rorschach was convinced that people revealed important personality dimensions in their responses to inkblots. He spent years developing just the right set of ten inkblots and systematically analyzed the responses of personal friends and different patient groups (Rorschach, 1921). Unfortunately, he died only a year after his monograph was published, and it was up to others to complete his work. Developments in the Rorschach are reviewed later in the text.

While Rorschach’s test was originally developed to reveal the innermost workings of the abnormal subject, the TAT, or Thematic Apperception Test (Morgan & Murray, 1935), was developed as an instrument to study normal personality. Of course, both have since been expanded for testing with the entire continuum of human behavior.

The TAT consists of a series of pictures that largely depict one or more persons engaged in an ambiguous interaction. The subject is shown one picture at a time and told to make up a story about it. He or she is instructed to be as dramatic as possible, to discuss thoughts and feelings, and to describe the past, present, and future of what is depicted in the picture.

Murray (1938) believed that underlying personality needs, such as the need for achievement, would be revealed by the contents of the stories. Although numerous scoring systems were developed, clinicians in the main have relied upon an impressionistic analysis to make sense of TAT protocols. Modern applications of the TAT are discussed in a later chapter.

The sentence completion technique was also begun during this era with the work of Payne (1928). There have been numerous extensions and
variations on the technique, which consists of giving subjects a stem such as “I am bored when _____,” and asking them to complete the sentence. Some modern applications are discussed later, but it can be mentioned now that the problem of scoring and interpretation, which vexed early sentence completion test developers, is still with us today.

An entirely new approach to projective testing was taken by Goodenough (1926), who tried to determine not just intellectual level, but also the interests and personality traits of children by analyzing their drawings. Buck’s (1948) test, the House-Tree-Person, was a little more standardized and structured and required the subject to draw a house, a tree, and a person. Machover’s (1949) Personality Projection in the Drawing of the Human Figure was the logical extension of the earlier work. Figure drawing as a projective approach to understanding personality is still used today, and a later chapter discusses modern developments in this practice.

Meanwhile, projective testing in Europe was dominated by the Szondi Test, a wacky instrument based on wholly faulty premises. Lipot Szondi was a Hungarian-born Swiss psychiatrist who believed that major psychiatric disorders were caused by recessive genes. His test consisted of 48 photographs of psychiatric patients divided into six sets of the following eight types: homosexual, epileptic, sadistic, hysteric, catatonic, paranoiac, manic, and depressive (Deri, 1949). From each set of eight pictures, the subject was instructed to select the two pictures he or she liked best and the two disliked most. A person who consistently preferred one kind of picture in the six sets was presumed to have some recessive genes that made him or her have sympathy for the pictured person. Thus, projective preferences were presumed to reveal recessive genes predisposing the individual to specific psychiatric disturbances.

Deri (1949) imported the test to the United States and changed the rationale. She did not argue for a recessive genetic explanation of picture choice but explained such preferences on the basis of unconscious identification with the characteristics of the photographed patients. This was a more palatable theoretical basis for the test than the dubious genetic theories of Szondi. Nonetheless, empirical research cast doubt on the validity of the Szondi Test, and it shortly faded into oblivion (Borstelmann & Klopfer, 1953).

THE DEVELOPMENT OF INTEREST INVENTORIES

While the clinicians were developing measures for analyzing personality and unconscious conflicts, other psychologists were devising measures for guidance and counseling of the masses of more normal persons. Chief among such measures was the interest inventory, which has roots going back to Thorndike’s (1912) study of developmental trends in the interests of 100 college students. In 1919–1920, Yoakum developed a pool of 1,000 items relating to interests from childhood through early maturity (DuBois, 1970). Many of these items were incorporated in the Carnegie Interest Inventory. Cowdery (1926–27) improved and refined previous work on the Carnegie instrument by increasing the number of items, comparing responses of three criterion groups (doctors, engineers, and lawyers) with control groups of nonprofessionals, and developing a weighting formula for items. He was also the first psychometrician to realize the importance of cross validation. He tested his new scales on additional groups of doctors, engineers, and lawyers to ensure that the discriminations found in the original studies were reliable group differences rather than capitalizations on error variance.

Edward K. Strong (1884–1963) revised Cowdery’s test and devoted 36 years to the development of empirical keys for the modified instrument known as the Strong Vocational Interest Blank (SVIB). Persons taking the test could be scored on separate keys for several dozen occupations, providing a series of scores of immeasurable value in vocational guidance. The SVIB became one of the most widely used tests of all time (Strong, 1927). Its modern version, the Strong Interest Inventory, is still widely used by guidance counselors.

For decades the only serious competitor to the SVIB was the Kuder Preference Record (Kuder, 1934). The Kuder differed from the Strong by forcing choices within triads of items. The Kuder was an ipsative test; that is, it compared the relative strength of interests within the individual, rather than comparing his or her responses to various professional groups. More recent revisions of the Kuder Preference Record include the Kuder General Interest Survey and the Kuder Occupational Interest Survey (Kuder, 1966; Kuder & Diamond, 1979; Zytowski, 1985).

SUMMARY OF MAJOR LANDMARKS IN THE HISTORY OF TESTING

We conclude our historical survey of psychological testing with a brief tabular summary of landmark events up to 1950 (Table 1.2). The interested reader can find a more detailed listing, including a chronology of post-1950 developments, in Appendix A.

TABLE 1.2 A Summary of Early Landmarks in the History of Testing

2200 B.C.   Chinese begin civil service examinations.
A.D. 1862   Wilhelm Wundt uses a calibrated pendulum to measure the “speed of thought.”
1884        Francis Galton administers the first test battery to thousands of citizens at the International Health Exhibit.
1890        James McKeen Cattell uses the term mental test in announcing the agenda for his Galtonian test battery.
1901        Clark Wissler discovers that Cattellian “brass instruments” tests have no correlation with college grades.
1905        Binet and Simon invent the first modern intelligence test.
1914        Stern introduces the IQ, or intelligence quotient: the mental age divided by chronological age.
1916        Lewis Terman revises the Binet-Simon scales, publishes the Stanford-Binet. Revisions appear in 1937, 1960, and 1986.
1917        Robert Yerkes spearheads the development of the Army Alpha and Beta examinations used for testing WWI recruits.
1917        Robert Woodworth develops the Personal Data Sheet, the first personality test.
1921        Rorschach Inkblot test published.
1921        Psychological Corporation, the first major test publisher, founded by Cattell, Thorndike, and Woodworth.
1927        First edition of the Strong Vocational Interest Blank published.
1939        Wechsler-Bellevue Intelligence Scale published. Revisions published in 1955, 1981, and 1997.
1942        Minnesota Multiphasic Personality Inventory published.
1949        Wechsler Intelligence Scale for Children published. Revisions published in 1974, 1991.
SUMMARY
1. In 1910, Henry Goddard translated the 1908 Binet-Simon scale. In 1911, he tested more than a thousand schoolchildren with the test, relying upon the original French norms. He was disturbed to find that 3 percent of the sample was “feebleminded” and recommended segregation from society for these children.

2. Nonverbal intelligence tests were invented in the early 1900s to facilitate testing of non-English-speaking immigrants. For example, Knox published a wooden puzzle test in 1914 and also used the now familiar digit-symbol substitution test.

3. In 1916, Lewis Terman released the Stanford-Binet, a revision of the Binet scales. This well-designed and carefully normed test placed intelligence testing on a firm footing once and for all.

4. During WWI, Robert Yerkes headed a team of psychologists who produced the Army Alpha, a verbally loaded group test for average and superior recruits, and the Army Beta, a nonverbal group test for illiterates and non-English-speaking recruits.

5. Early testing pioneers such as C. C. Brigham used results of individual and group intelligence tests to substantiate ethnic differences in intelligence and thereby justify immigration restrictions. Later, some of these testing pioneers disavowed their prior views.

6. Educational testing fell under the purview of the College Entrance Examination Board (CEEB), founded at the turn of the twentieth century. In 1947, the CEEB was replaced by the Educational Testing Service (ETS), which supervised the release of such well-known tests as the Scholastic Aptitude Tests and the Graduate Record Exam.

7. The advent of multiple aptitude test batteries was made possible with the development of factor analysis by L. L. Thurstone and others. Later, the improvement of these test batteries was spurred on by the practical need for selecting WWII recruits for highly specialized positions.

8. Personality testing began with Woodworth’s Personal Data Sheet, a simple yes-no checklist of symptoms used to screen WWI recruits for psychoneurosis. Many later inventories, including the popular Minnesota Multiphasic Personality Inventory, borrowed content from the Personal Data Sheet.

9. Projective testing began with the word association technique pioneered by Francis Galton and brought to fruition by C. G. Jung in 1910. Hermann Rorschach published his famous inkblot test in 1921.

10. The Thematic Apperception Test (TAT), a picture storytelling test introduced in 1935 by Morgan and Murray, was based upon the projective hypothesis: When responding to ambiguous or unstructured stimuli, examinees inadvertently disclose their innermost needs, fantasies, and conflicts.

11. The assessment of vocational interest began with Yoakum’s Carnegie Interest Inventory developed in 1919–1920. After several revisions and extensions, this instrument emerged as E. K. Strong’s Vocational Interest Blank.
