
Conditions for Intuitive Expertise
A Failure to Disagree
Daniel Kahneman Princeton University

Gary Klein Applied Research Associates
This article reports on an effort to explore the differences between two approaches to intuition and expertise that are often viewed as conflicting: heuristics and biases (HB) and naturalistic decision making (NDM). Starting from the obvious fact that professional intuition is sometimes marvelous and sometimes flawed, the authors attempt to map the boundary conditions that separate true intuitive skill from overconfident and biased impressions. They conclude that evaluating the likely quality of an intuitive judgment requires an assessment of the predictability of the environment in which the judgment is made and of the individual’s opportunity to learn the regularities of that environment. Subjective experience is not a reliable indicator of judgment accuracy.

Keywords: intuition, expertise, overconfidence, heuristics, judgment

September 2009 ● American Psychologist
© 2009 American Psychological Association 0003-066X/09/$12.00
Vol. 64, No. 6, 515–526 DOI: 10.1037/a0016755

Daniel Kahneman, Woodrow Wilson School of Public and International Affairs, Princeton University; Gary Klein, Applied Research Associates, Fairborn, Ohio.
We thank Craig Fox, Robin Hogarth, and James Shanteau for their helpful comments on earlier versions of this article.
Correspondence concerning this article should be addressed to Daniel Kahneman, Woodrow Wilson School of Public and International Affairs, Princeton University, Princeton, NJ 08544-0001. E-mail: kahneman@princeton.edu

In this article we report on an effort to compare our views on the issues of intuition and expertise and to discuss the evidence for our respective positions. When we launched this project, we expected to disagree on many issues, and with good reason: One of us (GK) has spent much of his career thinking about ways to promote reliance on expert intuition in executive decision making and identifies himself as a member of the intellectual community of scholars and practitioners who study naturalistic decision making (NDM). The other (DK) has spent much of his career running experiments in which intuitive judgment was commonly found to be flawed; he is identified with the “heuristics and biases” (HB) approach to the field.

A surprise awaited us when we got together to consider our joint field of interest. We found ourselves agreeing most of the time. Where we initially disagreed, we were usually able to converge upon a common position. Our shared beliefs are much more specific than the commonplace that expert intuition is sometimes remarkably accurate and sometimes off the mark. We accept the commonplace, of course, but we also have similar opinions about more specific questions: What are the activities in which skilled intuitive judgment develops with experience? What are the activities in which experience is more likely to produce overconfidence than genuine skill? Because we largely agree about the answers to these questions we also favor generally similar recommendations to organizations seeking to improve the quality of judgments and decisions. In spite of all this agreement, however, we find that we are still separated in many ways: by divergent attitudes, preferences about facts, and feelings about fighting words such as “bias.” If we are to understand the differences between our respective communities, such emotions must be taken into account.

We begin with a brief review of the origins and precursors of the NDM and HB approaches, followed by a discussion of the most prominent points of contrast between them (NDM: Klein, Orasanu, Calderwood, & Zsambok, 1993; HB: Gilovich, Griffin, & Kahneman, 2002; Tversky & Kahneman, 1974). Next we present some claims about the conditions under which skilled intuitions develop, followed by several suggestions for ways to improve the quality of judgments and choices.

Two Perspectives

Origins of the Naturalistic Decision Making Approach

The NDM approach, which focuses on the successes of expert intuition, grew out of early research on master chess players conducted by deGroot (1946/1978) and later by Chase and Simon (1973). DeGroot showed that chess grand masters were generally able to identify the most promising moves rapidly, while mediocre chess players often did not even consider the best moves. The chess grand masters mainly differed from weaker players in their unusual ability to appreciate the dynamics of complex positions and quickly judge a line of play as promising or fruitless. Chase and Simon (1973) described the performance of chess experts as a form of perceptual skill in which complex patterns are recognized. They estimated that chess masters acquire a repertoire of 50,000 to 100,000 immediately recognizable patterns, and that this repertoire enables them to identify a good move without having to calculate all possible contingencies. Strong players need a decade of serious play to assemble this large collection of basic patterns, but of course they achieve impressive levels
of skill even earlier. On the basis of this work, Simon defined intuition as the recognition of patterns stored in memory.

The early work that led to the approach that is now called NDM was an attempt to describe and analyze the decision making of commanders of firefighting companies. Fireground commanders are required to make decisions under conditions of uncertainty and time pressure that preclude any orderly effort to generate and evaluate sets of options. Klein, Calderwood, and Clinton-Cirocco (1986) investigated how the commanders could make good decisions without comparing options. The initial hypothesis was that commanders would restrict their analysis to only a pair of options, but that hypothesis proved to be incorrect. In fact, the commanders usually generated only a single option, and that was all they needed. They could draw on the repertoire of patterns that they had compiled during more than a decade of both real and virtual experience to identify a plausible option, which they considered first. They evaluated this option by mentally simulating it to see if it would work in the situation they were facing, a process that deGroot (1946/1978) had described as progressive deepening. If the course of action they were considering seemed appropriate, they would implement it. If it had shortcomings, they would modify it. If they could not easily modify it, they would turn to the next most plausible option and run through the same procedure until an acceptable course of action was found. This recognition-primed decision (RPD) strategy was effective because it took advantage of the commanders’ tacit knowledge (Klein et al., 1986). The fireground commanders were able to draw on their repertoires to anticipate how flames were likely to spread through a building, to notice signs that a house was likely to collapse, to judge when to call for additional support, and to make many other critical decisions. The RPD model is consistent with the work of deGroot (1946/1978) and Simon (1992) and has been replicated in multiple domains, including system design, military command and control, and management of offshore oil installations (see Klein, 1998, for a review). In each of these domains, the RPD model offers a generally encouraging picture of expert performance. It would be a caricature of the NDM approach, however, to describe it as being solely dedicated to praising expertise. NDM researchers have also tried to document and analyze failures in the performance of experts (Cannon-Bowers & Salas, 1998; Klein, 1998; Woods, O’Brien, & Hanes, 1987). In fact, the NDM movement was crystallized by an event that resulted from a catastrophic failure in expert decision making.

In 1988, an international tragedy occurred after the USS Vincennes accidentally shot down an Iranian Airbus (Fogarty, 1988). The USS Vincennes was an Aegis cruiser, one of the most technologically advanced systems in the Navy inventory, but the technology was not sufficient to stave off the disaster. The incident has been the subject of detailed investigation by NDM researchers (Collyer & Malecki, 1998; Klein, 1998). As a result of the disastrous error and subsequent political fallout, the U.S. Navy decided to initiate a program of research on decision making, the Tactical Decision Making Under Stress (TADMUS) program (Cannon-Bowers & Salas, 1998).

Thus it was that in 1989 a group of 30 researchers who studied decision making in natural settings met for several days in an effort to find commonalities between the decision-making processes of firefighters, nuclear power plant controllers, Navy officers, Army officers, highway engineers, and other populations. Several researchers from the judgment and decision making tradition participated in this meeting and in the preparation of a book describing the NDM perspective (Klein et al., 1993). Lipshitz (1993) identified several decision-making models that were developed to describe the strategies used in field settings, including the recognition-primed decision model (Klein, 1993), the cognitive continuum model (Hammond, Hamm, Grassia, & Pearson, 1987), image theory (Beach, 1990), the search for dominance structures (Montgomery, 1993), and the skills/rules/knowledge framework and decision ladder (Rasmussen, 1986). The NDM movement that emerged from this meeting focuses on field studies of subject-matter experts who make decisions under complex conditions. These experts are expected to successfully attain vaguely defined goals in the face of uncertainty, time pressure, high stakes, team and organizational constraints, shifting conditions, and action feedback loops that enable people to manage disturbances while trying to diagnose them (Orasanu & Connolly, 1993).

A central goal of NDM is to demystify intuition by identifying the cues that experts use to make their judgments, even if those cues involve tacit knowledge and are difficult for the expert to articulate. In this way, NDM researchers try to learn from expert professionals. Many NDM researchers use cognitive task analysis (CTA) methods to investigate the cues and strategies that skilled decision makers apply (Crandall, Klein, & Hoffman, 2006; Schraagen, Chipman, & Shalin, 2000). CTA methods are semi-structured interview techniques that elicit the cues and contextual considerations influencing judgments and decisions. Researchers cannot expect decision makers to accurately explain why they made decisions (Nisbett & Wilson, 1977); CTA methods provide a basis for making inferences about the judgment and decision process. For example, Crandall and Getchell-Reiter (1993) studied nurses in a neonatal intensive care unit (NICU) who could detect infants developing life-threatening infections even before blood tests came back positive. When asked, the nurses were at first unable to describe how they made their judgments. The researchers used CTA methods to probe specific incidents and identified a range of cues and patterns, some of which had not yet appeared in the nursing or medical literature. A few of these cues were opposite to the indicators of infection in adults. Crandall and Gamblian (1991) extended the NICU work. They confirmed the findings with nurses from a different hospital and then created an instructional program to help new NICU nurses learn how to identify the early signs of sepsis in neonates. That program has been widely disseminated throughout the nursing community.

Origins of the Heuristics and Biases Approach

In sharp contrast to NDM, the HB approach favors a skeptical attitude toward expertise and expert judgment. The origins of this attitude can be traced to a famous monograph published by Paul Meehl in 1954. Meehl (1954) reviewed approximately 20 studies that compared the accuracy of forecasts made by human judges (mostly clinical psychologists) and those predicted by simple statistical models. The criteria in the studies that Meehl (1954) discussed were diverse, with outcome measures ranging from academic success to patient recidivism and propensity for violence. Although the algorithms were based on a subset of the information available to the clinicians, statistical predictions were more accurate than human predictions in almost every case. Meehl (1954) believed that the inferiority of clinical judgment was due in part to systematic errors, such as the consistent neglect of the base rates of outcomes in discussion of individual cases. In a well-known article, he later explained his reluctance to attend clinical conferences by citing his annoyance with the clinicians’ uncritical reliance on their intuition and their failure to apply elementary statistical reasoning (Meehl, 1973).

Inconsistency is a major weakness of informal judgment: When presented with the same case information on separate occasions, human judges often reach different conclusions. Goldberg (1970) reported a “bootstrapping effect,” which provides the most dramatic illustration of the effect of inconsistency on the validity of judgments. Goldberg required a group of 29 clinicians to make diagnostic judgments (psychotic vs. neurotic) in a set of cases, based on personality test profiles of 861 patients who had been independently assigned to one of these categories. He constructed an individual model of the predictions of each judge, using multiple regression to estimate the weights that the judge assigned to each of the 11 scales in the Minnesota Multiphasic Personality Inventory. Judges were then required to make predictions for a new set of cases; Goldberg also used the individual statistical model of each judge to generate a prediction for these new cases. The bootstrap models were almost always more accurate than the judges they modeled. The only plausible explanation of this remarkable result is that human judgments are noisy to an extent that substantially impairs their validity. In an extensive meta-analysis of judgment studies using the lens model, Karelaia and Hogarth (2008) reported strong support for the generality of the bootstrap effect and for the crucial importance of lack of consistency in explaining this effect.

Kahneman read Meehl’s book in 1955 while serving in the Psychological Research Unit of the Israel Defense Forces, and the book helped him make sense of his own encounters with the difficulties of clinical judgment. One of Kahneman’s duties was to assess candidates for officer training, using field tests and other observations as well as a personal interview. Kahneman (2003) described the powerful sense of getting to know each candidate and the accompanying conviction that he could foretell how well the candidate would do in further training and eventually in combat. The subjective conviction of understanding each case in isolation was not diminished by the statistical feedback from officer training school, which indicated that the validity of the assessments was negligible. Kahneman coined the term illusion of validity for the unjustified sense of confidence that often comes with clinical judgment. His early experience with the fallibility of intuitive impressions could hardly be more different from Klein’s formative encounter with the successful decision making of fireground commanders.

The first study in the HB tradition was conducted in 1969 (Tversky & Kahneman, 1971). It described performance in a task that researchers often perform without recourse to computation: choosing the number of cases for a psychological experiment. The participants in the study were sophisticated methodologists and statisticians, including two authors of statistics textbooks. They answered realistic questions about the sample size they considered appropriate in different situations. The conclusion of the study was that sophisticated scientists reached incorrect conclusions and made inferior choices when they followed their intuitions, failing to apply rules with which they were certainly familiar. The article offered a strongly worded recommendation that researchers faced with the task of choosing a sample size should forsake intuition in favor of computation. This initial study of professionals reinforced Tversky and Kahneman (1971) in their belief (originally based on introspection) that faulty statistical intuitions survive both formal training and actual experience. Many studies in the intervening decades have confirmed the persistence of a diverse set of intuitive errors in the judgments of some professionals.

Contrasts Between the Naturalistic Decision Making and Heuristics and Biases Approaches

The intellectual traditions that we have traced to deGroot’s (1946/1978) studies of chess masters (NDM) and to Meehl’s (1954) research on clinicians (HB) are alive and well today. They are reflected in the approaches of our respective intellectual communities. In this section we consider three important contrasts between the two approaches: the stance taken by the NDM and HB researchers toward expert judgment, the use of field versus laboratory settings for decision-making research, and the application of different standards of performance, which leads to different conclusions about expertise.

Stance Regarding Expertise and Decision Algorithms

There is no logical inconsistency between the observations that inspired the NDM and HB approaches to professional judgment: The intuitive judgments of some professionals are impressively skilled, while the judgments of other professionals are remarkably flawed. Although not contradictory, these core observations suggest conflicting generalizations about the utility of expert judgment. Members of the HB community are of course aware of the existence of skill and expertise, but they tend to focus on flaws in human cognitive performance. Members of the NDM community know that professionals often err, but they tend to stress the marvels of successful expert performance.

The basic stance of HB researchers, as they consider experts, is one of skepticism. They are trained to look for opportunities to compare expert performance with performance by formal models or rules and to expect that experts will do poorly in such comparisons. They are predisposed to recommend the replacement of informal judgment by algorithms whenever possible. Researchers in the NDM tradition are more likely to adopt an admiring stance toward experts. They are trained to explore the thinking of experts, hoping to identify critical features of the situation that are obvious to experts but invisible to novices and journeymen, and then to search for ways to pass on the experts’ secrets to others in the field. NDM researchers are disposed to have little faith in formal approaches because they are generally skeptical about attempts to impose universal structures and rules on judgments and choices that will be made in complex contexts.

We found that the sharpest differences between the two of us were emotional rather than intellectual. Although DK is thrilled by the remarkable intuitive skills of experts that GK and others have described, he also takes considerable pleasure in demonstrations of human folly and in the comeuppance of overconfident pseudo-experts. For his part, GK recognizes that formal procedures and algorithms sometimes outdo human judgment, but he enjoys hearing about cases in which the bureaucratization of decision making fails. Further, the nonoverlapping sets of colleagues with whom we interact generally share our attitudes and reinforce our differences. Nevertheless, as this article shows, we agree on most of the issues that matter.

Field Versus Laboratory

There is an obvious difference in the primary form of research conducted by the respective research communities. The members of the HB community are mostly based in academic departments, and they tend to favor well-controlled experiments in the laboratory. The members of the NDM community are typically practitioners who operate in “real-world” organizations. They have a natural sympathy for the ecological approach, first popularized in the late 1970s, which questions the relevance of laboratory experiments to real-world situations. NDM researchers use methods such as cognitive task analysis and field observation to investigate judgments and decision making under complex conditions that would be difficult to recreate in the laboratory.

There is no logically necessary connection between these methodological choices and the nature of the hypotheses and models being tested. As the examples of the preceding section illustrate, the view that heuristics and biases are only studied and found in the laboratory is a caricature.¹ Similarly, the RPD model could have emerged from the laboratory, and it has been tested there (Johnson & Raab, 2003; Klein, Wolf, Militello, & Zsambok, 1995). In addition, a number of NDM researchers have reported studies of the performance of proficient decision makers in realistically simulated environments (e.g., Smith, Giffin, Rockwell, & Thomas, 1986).

¹ Among many other examples, see Slovic (2000) for applications to the study of responses to risk; Guthrie, Rachlinski, and Wistrich (2007) and Sunstein (2000) for applications in the legal domain; Croskerry and Norman (2008) for medical judgment; Bazerman (2005) for managerial judgments and decision making; and Kahneman and Renshon (2007) for political decision making. The collection assembled by Gilovich, Griffin, and Kahneman (2002) includes other examples.

The Definition of Expertise

NDM researchers cannot use the same kinds of optimality criteria as the HB community to define expertise. In rare cases (e.g., the ratings of chess players based on their record of wins and losses against other rated players) the performance level of experts is determined using standardized measures. However, in most of the situations studied by NDM researchers, the criteria for judging expertise are based on a history of successful outcomes rather than on quantitative performance measures. The most common method for defining expertise in NDM research is to rely on peer judgments. The conditions for defining expertise are the existence of a consensus and evidence that the consensus reflects aspects of successful performance that are objective even if they are not quantified explicitly. If the performance of different professionals can be compared, the best practitioners define the standard. As Shanteau (1992) suggested, “Experts are operationally defined as those who have been recognized within their profession as having the necessary skills and abilities to perform at the highest level” (p. 255). For example, captains of firefighting companies are evaluated not only by their ability to extinguish fires, but also by other criteria, such as the amount of damage created before the fire is controlled. When colleagues say, “If Person X had been there instead of Person Y, the fire would not have spread as far,” then Person X counts as an expert within that organization. The use of peer judgments can distinguish highly competent decision makers from mediocre ones who may have the same amount of experience and from novices who have little experience. This level of differentiation is sufficient for most NDM studies.

In several of the studies that Meehl (1954) reviewed, the quality of expert performance was evaluated by comparing the accuracy of decisions made by experts with the accuracy of optimal linear combinations. If the predictions generated by a linear combination of a few variables are more accurate (in a new sample) than those of a professional who has access to the same information, the performance of the professional is certainly suboptimal. Note that the optimality criterion is significantly more demanding than the criteria by which expertise is evaluated in NDM research. NDM researchers compare the performance of professionals with that of the most successful experts in their field, whereas HB researchers prefer to compare the judgments of professionals with the outcome of a model that makes the best possible use of available information. It is entirely possible for the predictions of experienced clinicians to be superior to those of novices but inferior to a linear model or an intelligent system.

Sources of Intuition

The judgments and decisions that we are most likely to call intuitive come to mind on their own, without explicit awareness of the evoking cues and of course without an explicit evaluation of the validity of these cues. The firefighter feels that the house is very dangerous, the nurse feels that an infant is ill, and the chess master immediately sees a promising move. Intuitive skills are not restricted to professionals: Anyone can recognize tension or fatigue in a familiar voice on the phone. In the language of the two-system (or dual process) models that have recently become popular (Evans & Frankish, 2009; see Evans, 2007, for a review of the origins of these ideas), intuitive judgments are produced by “System 1 operations,” which are automatic, involuntary, and almost effortless. In contrast, the deliberate activities of System 2 are controlled, voluntary, and effortful; they impose demands on limited attentional resources. System 2 is involved, for example, when one performs a calculation (17 × 24 = ?), completes a tax form, reads a map, makes a left turn into heavy traffic, or parks in a narrow space. Self-monitoring is also a System 2 operation, which is impaired by concurrent effortful tasks.

The distinction between Systems 1 and 2 plays an important role in both the HB and NDM approaches. In the RPD model, for example, the performance of experts involves both an automatic process that brings promising solutions to mind and a deliberate activity in which the execution of the candidate solution is mentally simulated in a process of progressive deepening. In the HB approach, System 2 is involved in the effortful performance of some reasoning and decision-making tasks as well as in the continuous monitoring of the quality of reasoning. When there are cues that an intuitive judgment could be wrong, System 2 can impose a different strategy, replacing intuition by careful reasoning.²

The NDM and HB approaches share the assumption that intuitive judgments and preferences have the characteristics of System 1 activity: They are automatic, arise effortlessly, and often come to mind without immediate justification. However, the two approaches focus on different classes of intuition. Intuitive judgments that arise from experience and manifest skill are the province of NDM, which explores the cues that guided such judgments and the conditions for the acquisition of skill. In contrast, HB researchers have been mainly concerned with intuitive judgments that arise from simplifying heuristics, not from specific experience. These intuitive judgments are less likely to be accurate and are prone to systematic biases.

We discuss the two classes of judgment in sequence. First, we describe the process of skill acquisition that supports the intuitive judgments and preferences of genuine experts. In particular, we explore two necessary conditions for the development of skill: high-validity environments and an adequate opportunity to learn them. Next, we discuss heuristic-based intuitions and some of the biases to which they are prone. Finally, we address the question of the critique of intuition: How can skilled intuitions be distinguished from heuristic-based intuitions?

² The contrast between System 1 and System 2 has given rise to its own literature. For example, J. St. B. T. Evans (2007) has asserted that System 1 is affected by the tendency to contextualize problems in the light of prior knowledge and belief and that System 2 is affected by the tendency to satisfice without considering alternatives.

Skilled Intuition as Recognition

Simon (1992) offered a concise definition of skilled intuition that we both endorse: “The situation has provided a cue: This cue has given the expert access to information stored in memory, and the information provides the answer. Intuition is nothing more and nothing less than recognition” (p. 155). The model of intuition as recognition is helpful in several ways. First, it demystifies intuition. Many experts who have intuitions (and some authors who study them) endow intuition with an almost magic aura: knowledge that is not acquired by a rational process. In Simon’s definition, the process by which the pediatric nurse recognizes that an infant may be gravely ill is not different in principle from the process by which she would notice that a friend looks tired or angry or from the way in which a small child recognizes that an animal is a dog, not a cat. It may be worth noting that this description of pattern recognition and the skilled pattern recognition described in the RPD model are different from the recognition heuristic discussed by Goldstein and Gigerenzer (1999), which is a special-purpose rule of thumb.

The recognition model implies two conditions that must be satisfied for an intuitive judgment (recognition) to be genuinely skilled: First, the environment must provide adequately valid cues to the nature of the situation. Second, people must have an opportunity to learn the relevant cues. For the first condition, valid cues must be specifiable, at least in principle, even if the individual does not know what they are. The child relies on valid cues to identify a dog, without any ability to state what the cues are. Similarly, the nurse and the firefighter are also guided by valid cues they find in the environment. No magic is involved. A crucial conclusion emerges: Skilled intuitions will only develop in an environment of sufficient regularity, which provides valid cues to the situation. The ways in which skilled judgments take advantage of environmental regularities have been discussed by, among others, Brunswik (1957) and Hertwig, Hoffrage, and Martignon (1999).

Validity, as we use the term, describes the causal and statistical structure of the relevant environment. For example, it is very likely that there are early indications that a building is about to collapse in a fire or that an infant will soon show obvious symptoms of infection. On the other hand, it is unlikely that there is publicly available information that could be used to predict how well a particular stock will do: if such valid information existed, the price of the stock would already reflect it. Thus, we have more reason to trust the intuition of an experienced fireground commander about the stability of a building, or the intuitions of a nurse about an infant, than to trust the intuitions

not obvious. Fifteen years later it was quite clear that the highly educated and experienced experts that he studied were not superior to untrained readers of newspapers in their ability to make accurate long-term forecasts of political events. The depressing consistency of the experts’ failure to outdo the novices in this task suggests that the problem is in the environment: Long-term forecasting must fail because large-scale historical developments are too complex to be forecast. The task is simply impossible. A thought experiment can help. Consider what the history of the 20th century might have been if the three fertilized eggs that became Hitler, Stalin, and Mao had been female. The century would surely have been very different, but can one know how?

In other environments, the regularities that can be observed are misleading. Hogarth (2001) introduced the useful notion of wicked environments, in which wrong intuitions are likely to develop. His most compelling example (borrowed from Lewis Thomas) is the early 20th-century physician who frequently had intuitions about patients in the ward who were about to develop typhoid. He confirmed his intuitions by palpating these patients’ tongues, but because he did not wash his hands the intuitions were disastrously self-fulfilling.

High validity does not imply the absence of uncertainty, and the regularities that are to be discovered are sometimes statistical. Games such as bridge or poker count as high-validity situations. The mark of these situations is that skill, the ability to identify favorable bets, improves without guaranteeing that every attempt will succeed. The challenge of learning bridge and poker is not essentially different from the challenge of learning chess, where the uncertainty arises from the enormous number of possible developments.

As the examples of competitive games illustrate, the second necessary condition for the development of recognition (and of skilled intuition) is an adequate opportunity to learn the relevant cues. It has been estimated that chess masters must invest 10,000 hours to acquire their skills (Chase & Simon, 1973). Fortunately, most of the skills can be acquired with less practice. A child does not need thousands of examples to learn to discriminate dogs from cats. The skilled pediatric nurse has seen a sufficient number of sick infants to recognize subtle signs of disease, and the experienced fireground commander has experienced numerous fires and probably imagined many more, during years of thinking and conversing about firefighting. Without these opportunities to learn, a valid intuition can only be due to a lucky accident or to magic, and we do not believe in magic.

Two conditions must be satisfied for skilled intuition
of a trader about a stock. We can confidently expect that a to develop: an environment of sufficiently high validity and
detailed study of how professionals think is more likely to adequate opportunity to practice the skill. Ericsson, Char-
reveal useful predictive cues in the former cases than in the ness, Hoffman, and Feltovich (2006) have described a
latter. range of factors that influence the rate of skill development.
Determining the validity of an environment is not These include the type of practice people employ, their
always easy. When Tetlock (2005) embarked on his ambi- level of engagement and motivation, and the self-regula-
tious study of long-term forecasts of strategic and eco- tory processes they use. Even when the circumstances are
nomic events by experts, the outcome of his research was favorable, however, some people will develop skilled in-

tuitions more quickly than others. Talent surely matters. discussed in the HB literature illustrate the sources of
Every normal child can recognize a cat or a dog, but not all flawed intuitive judgments.
dedicated chess players become grand masters. Extraordi- Frederick (2005) has studied problems such as the
nary players such as Fischer and Kasparov were able to following: “A ball and a bat together cost $1.10. The bat
recognize patterns that other grand masters could not see on costs a dollar more than the ball. How much does the ball
their own—although the weaker players could recognize cost?” The question invariably evokes an immediate tenta-
the validity of the star’s intuition when led through it. tive solution: 10 cents. But the intuitive response is wrong
Intuitions that are available only to a few exceptional in this problem: The correct response is 5 cents. Further-
individuals are often called creative. Like other intuitions, more, an easy check will quickly show that the answer is
however, creative intuitions are based on finding valid wrong: If the ball is worth 10 cents, then the bat is worth
patterns in memory, a task that some people perform much $1.10 and the total is $1.20, which is not correct. The
better than others. There are large individual differences in surprising finding of Frederick’s research is that many
performance on the Remote Associations Test (RAT), intelligent people adopt the intuitively compelling response
which has a long history as a test of creativity. Participants without checking it. The incidence of intuitive errors in this
in that test are instructed to search for a common associate question ranges from approximately 50% in top undergrad-
of three words. The task has a wide range of difficulty: The uate schools (MIT, Princeton, Harvard) to 90% in some-
item cottage/swiss/cake is easy, but few people can quickly what less selective schools. It can be argued that the setting
find the answer to the item dive/light/rocket—although of this problem is not typical of the challenges that people
everyone recognizes the answer as valid (it is above us and face in the real world, but the phenomenon that Frederick
is blue in good weather; Mednick, 1962). The RAT brings studied is hardly restricted to puzzles. A common genre of
us back to Simon’s observation that the regularities on business literature celebrates successful leaders who made
which intuitions depend are represented in memory. The strategic decisions on the basis of gut feelings and intui-
situation of the RAT has high validity: Widely shared tions that they did not adequately check, but many of these
patterns of associations exist, which everyone can recog- successes owe more to luck than to genius (Rosenzweig,
nize although few can find them without prompting. 2007).
The anchoring phenomenon is another case in which a
Imperfect Intuition bias in the operations of memory causes intuitions to go
astray. Suppose some participants in an experiment are first
We have seen that reliably skilled intuitions are likely to asked “Is the average price of German cars more or less
develop when the individual operates in a high-validity than $100,000?” before they are required to provide a
environment and has an opportunity to learn the rules of numerical estimate of the average cost of German cars.
that environment. These conditions often remain unmet in Other respondents encounter a different anchoring ques-
professional contexts, either because the environment is tion: They are first asked whether the average cost of
insufficiently predictable (as in the long-term forecasting of German cars is more or less than $30,000, and then they are
political events) or because of the absence of opportunities to give an estimate of the average. We can expect the
to learn its rules (as in the case of firefighters exposed to a estimates of the two groups to differ by as much as half the
fire in a skyscraper with unexpected damage to the heat difference between the anchors—in this case the expected
shielding of its structural support). We both agree that most anchoring effect would be $35,000 (Jacowitz & Kahneman,
of the intuitive judgments and decisions that System 1 1995). The mechanism of anchoring is well understood
produces are skilled, appropriate, and eventually success- (Mussweiler & Strack, 2000). The original question with
ful. But we also agree that not all intuitive judgments are the high anchor brings expensive cars to the respondents’
skilled, although our hunches about the frequency of ex- mind: Mercedes, BMWs, Audis. The lower anchor is more
ceptions differ. People, including experienced profession- likely to evoke the image of a beetle and the name Volks-
als, sometimes have subjectively compelling intuitions wagen. The initial question therefore biases the sample of
even when they lack true skill, either because the environ- cars that come to mind when people next attempt to esti-
ment is insufficiently regular or because they have not mate the average price of German cars. The process of
mastered it. Lewis (2003) described the weaknesses in the estimating the average is a deliberate, System 2 operation,
ability of baseball scouts and managers to judge the capa- but the bias occurs in the automatic phase in which in-
bilities, contributions, and potential of players. Despite stances are retrieved from memory. The resulting anchor-
ample opportunities to acquire judgment skill, scouts and ing effect is large and robust. The answers that come to
managers were often insensitive to important variables and mind are typically held with substantial confidence, and the
overly influenced by such factors as the player’s appear- victims of anchoring manipulations confidently deny any
ance—a clear case of prediction by representativeness. effect of the anchor. The common criticism of laboratory
When intuitive judgments do not come from skill, experiments hardly applies here, because large anchoring
where do they come from? This is the question that stu- effects have been demonstrated in the courtroom, in real
dents of heuristics and biases have explored, mostly in estate transactions, and in other real-world contexts.
laboratory experiments. The answer, of course, is that For a final example, consider this question: “Julie is a
incorrect intuitions, like valid ones, also arise from the graduating senior. She read fluently at age 4. What is your
operations of memory. Three phenomena that have been best guess of her GPA [grade point average]?” Most people

who think about this question report having an immediate intuitive impression of the best-fitting GPA. The value that comes to their mind is a GPA that is as impressive as Julie’s precocity in reading—roughly a match of percentile scores. This intuitive prediction is clearly wrong because it is not regressive. The correlation between early reading and graduating GPA is not high and certainly does not justify nonregressive matching. The process that generates this intuitive answer has been called attribute substitution. The attribute that is to be assessed is GPA, but the answer is simply a projection onto the GPA scale of an evaluation of reading precocity. Attribute substitution has been described as an automatic process. It produces intuitive judgments in which a difficult question is answered by substituting an easier one—the essence of heuristic thinking (Kahneman & Frederick, 2002).

Of course, the mechanisms that produce incorrect intuitions will only operate in the absence of skill. If people have a skilled response to the task with which they are charged, they will apply their skill. But even in the absence of skill an intuitive response may come to their minds. The difficulty is that people have no way to know where their intuitions came from. There is no subjective marker that distinguishes correct intuitions from intuitions that are produced by highly imperfect heuristics. An important characteristic of intuitive judgments, which they share with perceptual impressions, is that a single response initially comes to mind. Most of the time we have to trust this first impulse, and most of the time we are right or are able to make the necessary corrections if we turn out to be wrong, but high subjective confidence is not a good indication of validity (Einhorn & Hogarth, 1978). Checking one’s intuition is an effortful operation of System 2, which people do not always perform—sometimes because it is difficult to do so and sometimes because they do not bother.

Intuitions that originate in heuristics are not necessarily wrong. Indeed, the original statement of the HB approach asserted, “In general these heuristics are quite useful, but sometimes they lead to severe and systematic errors” (Tversky & Kahneman, 1974, p. 1124). The HB claim is not that intuitions that arise in heuristics are always incorrect, only that they are less trustworthy than intuitions that are rooted in specific experiences. Unfortunately, people are not normally aware of the origins of the thoughts that come to their minds, and the correlation between the accuracy of their judgments and the confidence they experience is not consistently high (Arkes, 2001; Griffin & Tversky, 1992). Subjective confidence is often determined by the internal consistency of the information on which a judgment is based, rather than by the quality of that information (Einhorn & Hogarth, 1978; Kahneman & Tversky, 1973). As a result, evidence that is both redundant and flimsy tends to produce judgments that are held with too much confidence. These judgments will be presented too assertively to others and are likely to be believed more than they deserve to be. The safe way to evaluate the probable accuracy of a judgment (our own or someone else’s) is by considering the validity of the environment in which the judgment was made as well as the judge’s history of learning the rules of that environment.

Professional Intuitions

We are of course not the first to have identified a regular environment and an adequate opportunity to learn it as preconditions for the development of skills, including intuitive skills (see, e.g., Hogarth, 2001). Other investigators have focused on attitude, motivation, talent, and deliberate practice as crucial to skill development (Ericsson, 2006; Ericsson et al., 2006).

The importance of predictable environments and opportunities to learn them was apparent in an early review of professions in which expertise develops. Shanteau (1992) reviewed evidence showing that expertise was found in livestock judges, astronomers, test pilots, soil judges, chess masters, physicists, mathematicians, accountants, grain inspectors, photo interpreters, and insurance analysts. In contrast, Shanteau noted poor performance by experienced professionals in another large set of occupations: stockbrokers, clinical psychologists, psychiatrists, college admissions officers, court judges, personnel selectors, and intelligence analysts. Shanteau searched for task characteristics that distinguished the domains in which experts did well from those in which experts did poorly. The factors that we identified—the predictability of outcomes, the amount of experience, and the availability of good feedback—were included in his list. In addition, Shanteau pointed to static (as opposed to dynamic) stimuli as favorable to good performance.

Three professions—nurses, physicians, and auditors—appeared on both of Shanteau’s (1992) lists. These professionals exhibited genuine expertise in some of their activities but not in others. We refer to such mixed grades for professionals as “fractionated expertise,” and we believe that the fractionation of expertise is the rule, not an exception. For example, auditors who have expertise in “hard” data such as accounts receivable may do much less well with “soft” data such as indications of fraud (J. Shanteau, personal communication, February 12, 2009).

There are a few activities, such as chess, in which a master will not encounter challenges that are genuinely new. In most domains, however, professionals will occasionally have to deal with situations and tasks that they have not had an opportunity to master. Physicians, as is well known, encounter from time to time diagnostic problems that are entirely new to them—they have expertise in some diagnoses but not in others. Similarly, weather forecasters are more successful in the routine prediction of temperature and precipitation than in forecasting hail (Stewart, Roebber, & Bosart, 1997).

Characteristically, we came to the topic of fractionated expertise with different examples in mind. GK focuses on the experts who perform a constant task (e.g., putting out fires; establishing a diagnosis) but encounter unfamiliar situations. The ability to recognize that a situation is anomalous and poses a novel challenge is one of the manifestations of authentic expertise. Descriptions of diagnostic thinking in medicine emphasize the intuitive ability of

some physicians to realize that the characteristics of a case do not fit into any familiar category and call for a deliberate search for the true diagnosis (Gawande, 2002; Groopman, 2007).

DK is particularly interested in cases in which professionals who know how to use their knowledge for some purposes attempt to use the same knowledge for other purposes. He views the fractionation of expertise as one element in the explanation of the illusion of validity: the overconfidence that professionals sometimes experience in dealing with problems in which they have little or no skill. Finance professionals, psychotherapists, and intelligence analysts may know a great deal about a particular company, patient, or international conflict, and they may have received ample feedback supporting their confidence in the performance of some tasks—typically those that deal with the short term—but the feedback they receive from their failures in long-term judgments is delayed, sparse, and ambiguous. The experience of the professionals that DK has thought about is therefore conducive to overconfidence.

These professionals may have strong subjective confidence in their judgments, but we do not believe that subjective confidence reliably indicates whether intuitive judgments or decisions are valid. When experts recognize anomalies, using judgments of typicality and familiarity, they are detecting violations of patterns in the external situation. In contrast, people do not have a strong ability to distinguish correct intuitions from faulty ones. People, even experts, do not appear to be skilled in detecting patterns in the internal situation in order to identify the basis for their judgments. Therefore, reliance on subjective confidence may contribute to overconfidence.

The experts that GK has studied seem less susceptible to overconfidence, perhaps in part because of the direct personal risks it poses. Weather forecasters, engineers, and logistics specialists typically resist requests to make judgments about matters that fall outside their area of competence. People in professions marked by standard methods, clear feedback, and direct consequences for error appear to appreciate the boundaries of their expertise. These experts know more knowledgeable experts exist. Weather forecasters know there are people in another location who better understand the local dynamics. Structural engineers know that chemical engineers, or even structural engineers working with different types of models or materials, are the true experts who should be consulted.

As in the other topics that we have considered, we find no reason to disagree about either fractionation of expertise or overconfidence. As usual, different rules apply to different tasks.

Augmenting Professional Judgment: The Use of Algorithms

The attitude toward the Meehl paradigm, in which intuitions and professional judgments are set in competition, is a sore point in conversations between adherents of NDM and HB. The idea of algorithms that outdo human judges is a source of pride and joy for members of the HB tribe, but algorithms are usually distrusted by the NDM community.

There is compelling evidence that under certain conditions mechanical and analytical judgments outperform human judgment. Grove, Zald, Lebow, Snitz, and Nelson (2000) reported a meta-analysis of 136 studies that compared the accuracy of clinical and mechanical judgments, most within the domains of clinical psychology and medicine. Their review excluded studies involving nonhuman outcomes such as horse races and weather. The preponderance of data favored the algorithms (i.e., the “mechanical” judgments), which were superior in about half the studies (n = 63). The other half of the studies showed no difference (n = 65), and only a few studies showed better performance by the clinical judgments (n = 8). For example, the tasks for which there was at least a 17-point difference in effect size favoring mechanical over clinical judgments included the following: college academic performance, presence of throat infections, diagnosis of gastrointestinal disorders, length of psychiatric hospitalization, job turnover, suicide attempts, juvenile delinquency, malingering, and occupational choice.

Findings in which the performance of human judges is inferior to that of simple algorithms are often cited as evidence of cognitive ineptitude, but this conclusion is unwarranted. The correct conclusion is that people perform significantly more poorly than algorithms in low-validity environments. The tasks reviewed by Grove et al. (2000) generally involved noisy and/or highly complex situations. The forecasts made by the algorithms were often wrong, albeit less often than the clinical predictions. The studies in the Meehl paradigm have not produced “smoking gun” demonstrations in which clinicians miss highly valid cues that the algorithm detects and uses. Indeed, such an outcome would be unlikely, because human learning is normally quite efficient. Where simple and valid cues exist, humans will find them if they are given sufficient experience and enough rapid feedback to do so—except in the environments that Hogarth (2001) labeled “wicked,” in which the feedback is misleading. A statistical approach has two crucial advantages over human judgment when available cues are weak and uncertain: Statistical analysis is more likely to identify weakly valid cues, and a prediction algorithm will maintain above-chance accuracy by using such cues consistently. The meta-analysis performed by Karelaia and Hogarth (2008) showed that consistency accounted for much of the advantage of algorithms over humans.

The evaluation and approval of personal loans by loan officers is an example of a situation in which algorithms should be used to replace human judgment. Identifying the relatively small number of defaulting loans is a low-validity task because of the low base rate of the critical outcome. Algorithms have largely replaced human judges in this task, using as inputs objective demographic and personal data rather than subjective impression of reliability. The result is an unequivocal improvement: We have fairer loan judgments (i.e., judgments that are not improperly influenced by gender or race), faster decisions, and reduced expenses.
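
The consistency advantage attributed to algorithms in low-validity environments can be illustrated with a small simulation. The sketch below is not drawn from any of the studies cited here: the cue weights, the noise levels, the sample size, and the function name simulate are all illustrative assumptions. It models an environment in which two weak cues plus a large random component determine a binary outcome. The "algorithm" applies the cue weights identically on every case, whereas the simulated "judge" uses the same weights but adds random case-to-case variation to each judgment.

```python
import random


def simulate(n_cases=20000, seed=0):
    """Compare a consistent rule with an inconsistent judge on weak cues.

    All numeric settings below are assumptions chosen for illustration.
    Returns (algorithm_accuracy, judge_accuracy) as fractions of cases.
    """
    rng = random.Random(seed)
    w1, w2 = 0.3, 0.2        # assumed weights of two weakly valid cues
    outcome_noise_sd = 1.0   # assumed unpredictable part of the environment
    judge_noise_sd = 0.5     # assumed inconsistency of the human judge

    algo_hits = 0
    judge_hits = 0
    for _ in range(n_cases):
        c1 = rng.gauss(0, 1)
        c2 = rng.gauss(0, 1)
        signal = w1 * c1 + w2 * c2

        # The outcome depends on the cues only weakly; noise dominates.
        outcome = (signal + rng.gauss(0, outcome_noise_sd)) > 0

        # The algorithm uses the cues the same way on every case.
        algo_pred = signal > 0

        # The judge uses the same cues but wavers from case to case.
        judge_pred = (signal + rng.gauss(0, judge_noise_sd)) > 0

        algo_hits += algo_pred == outcome
        judge_hits += judge_pred == outcome

    return algo_hits / n_cases, judge_hits / n_cases


algo_acc, judge_acc = simulate()
print(f"algorithm accuracy: {algo_acc:.3f}")
print(f"judge accuracy:     {judge_acc:.3f}")
```

Under these assumptions both predictors stay above chance and both are frequently wrong, but the consistent rule is correct more often than the judge who has access to exactly the same cues. The gap comes entirely from the judge's added variability, which is the sense in which consistency alone can account for an advantage.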

Our analysis suggests that algorithms significantly outperform humans under two quite different conditions: (a) when validity is so low that human difficulties in detecting weak regularities and in maintaining consistency of judgment are critical and (b) when validity is very high, in highly predictable environments, where ceiling effects are encountered and occasional lapses of attention can cause humans to fail. Automatic transportation systems in airports are an example in that class.

NDM proponents correctly emphasize that the conditions necessary for the construction and use of an algorithm are stringent. These conditions include (a) confidence in the adequacy of the list of variables that will be used, (b) a reliable and measurable criterion, (c) a body of similar cases, (d) a cost/benefit ratio that warrants the investment in the algorithmic approach, and (e) a low likelihood that changing conditions will render the algorithm obsolete. We also agree that algorithms that substitute for human judgment must remain under human supervision, to provide continuous monitoring of their performance and of relevant changes in the environment. Maintaining adequate supervision of algorithms can be difficult, because there is evidence that human operators become more passive and less vigilant when algorithms are in charge—a phenomenon that has been labeled “automation bias” (Skitka, Mosier & Burdick, 1999, 2000).

We agree that the introduction of algorithms and other formal decision aids in organizations will often encounter opposition and unexpected problems of implementation. Few people enjoy being replaced by mechanical devices or by mathematical algorithms, and many devices and algorithms function less well in the real world than on the planning board (Yates, Veinott, & Patalano, 2003). Even decision aids and procedures that leave the authority of the decision maker intact—decision analysis is a salient example—are often resisted, for both good and bad reasons. Naturally, we have somewhat different attitudes toward these problems of implementation, with DK usually viewing them as obstacles to be overcome and GK seeing them as reasons to be skeptical about the value of formal methods.

Despite our different attitudes toward formal methods, we agree on the potential of semi-formal strategies. An example is the premortem method (Klein, 2007) for reducing overconfidence and improving decisions. Project teams using this method start by describing their plan. Next they imagine that their plan has failed and the project has been a disaster. Their task is to write down, in two minutes, all the reasons why the project failed. The facilitator goes around the table, getting reasons from each of the team members, starting with the leader. The rationale for the method is the concept of prospective hindsight (Mitchell, Russo, & Pennington, 1989)—that people can generate more criticisms when they are told that an outcome is certain. It also offers a solution to one of the major problems of decision making within organizations: the gradual suppression of dissenting opinions, doubts, and objections, which is typically observed as an organization commits itself to a major plan. The premortem method is consistent with the HB concern for overconfidence while drawing on and respecting the expertise of decision makers, a hallmark of the NDM approach. We expect that there are additional methods that can synthesize the strengths of the two traditions.

Conclusions

In an effort that spanned several years, we attempted to answer one basic question: Under what conditions are the intuitions of professionals worthy of trust? We do not claim that the conclusions we reached are surprising (many were anticipated by Shanteau, 1992, Hogarth, 2001, and Myers, 2002, among others), but we believe that they add up to a coherent view of expert intuition, which is more than we expected to achieve when we began.

● Our starting point is that intuitive judgments can arise from genuine skill—the focus of the NDM approach—but that they can also arise from inappropriate application of the heuristic processes on which students of the HB tradition have focused.
● Skilled judges are often unaware of the cues that guide them, and individuals whose intuitions are not skilled are even less likely to know where their judgments come from.
● True experts, it is said, know when they don’t know. However, nonexperts (whether or not they think they are) certainly do not know when they don’t know. Subjective confidence is therefore an unreliable indication of the validity of intuitive judgments and decisions.
● The determination of whether intuitive judgments can be trusted requires an examination of the environment in which the judgment is made and of the opportunity that the judge has had to learn the regularities of that environment.
● We describe task environments as “high-validity” if there are stable relationships between objectively identifiable cues and subsequent events or between cues and the outcomes of possible actions. Medicine and firefighting are practiced in environments of fairly high validity. In contrast, outcomes are effectively unpredictable in zero-validity environments. To a good approximation, predictions of the future value of individual stocks and long-term forecasts of political events are made in a zero-validity environment.
● Validity and uncertainty are not incompatible. Some environments are both highly valid and substantially uncertain. Poker and warfare are examples. The best moves in such situations reliably increase the potential for success.
● An environment of high validity is a necessary condition for the development of skilled intuitions. Other necessary conditions include adequate opportunities for learning the environment (prolonged practice and feedback that is both rapid and unequivocal). If an environment provides valid cues and good feedback, skill and expert intuition will eventually develop in individuals of sufficient talent.
● Although true skill cannot develop in irregular or unpredictable environments, individuals will sometimes make judgments and decisions that are successful by chance. These “lucky” individuals will be susceptible to an illusion of skill and to overconfidence (Arkes, 2001). The financial industry is a rich source of examples.
● The situation that we have labeled fractionation of skill is another source of overconfidence. Professionals who have expertise in some tasks are sometimes called upon to make judgments in areas in which they have no real skill. (For example, financial analysts may be skilled at evaluating the likely commercial success of a firm, but this skill does not extend to the judgment of whether the stock of that firm is underpriced.) It is difficult both for the professionals and for those who observe them to determine the boundaries of their true expertise.
● We agree that the weak regularities available in low-validity situations can sometimes support the development of algorithms that do better than chance. These algorithms only achieve limited accuracy, but they outperform humans because of their advantage of consistency. However, the introduction of algorithms to replace human judgment is likely to evoke substantial resistance and sometimes has undesirable side effects.

Another conclusion that we both accept is that the approaches of our respective communities have built-in limitations. For historical and methodological reasons, HB researchers generally find errors more interesting and instructive than correct performance; but a psychology of judgment and decision making that ignores intuitive skill is seriously blinkered. Because their intellectual attitudes developed in reaction to the HB tradition, members of the NDM community have an aversion to the word bias and to the corresponding concept; but a psychology of professional judgment that neglects predictable errors cannot be adequate. Although we agree with both of these conclusions, we have yet to move much beyond recognition of the problem. DK is still fascinated by persistent errors, and GK still recoils when biases are mentioned. We hope, however, that our effort may help others do more than we have been able to do in bringing the insights of both communities to bear on their common subject.

REFERENCES

Chase (Ed.), Visual information processing (pp. 215–281). New York: Academic Press.
Collyer, S. C., & Malecki, G. S. (1998). Tactical decision making under stress: History and overview. In J. A. Cannon-Bowers & E. Salas (Eds.), Making decisions under stress: Implications for individual and team training (pp. 3–15). Washington, DC: American Psychological Association.
Crandall, B., & Gamblian, V. (1991). Guide to early sepsis assessment in the NICU [Instruction manual prepared for the Ohio Department of Development under the Ohio Small Business Innovation Research Bridge Grant program]. Fairborn, OH: Klein Associates.
Crandall, B., & Getchell-Reiter, K. (1993). Critical decision method: A technique for eliciting concrete assessment indicators from the “intuition” of NICU nurses. Advances in Nursing Sciences, 16(1), 42–51.
Crandall, B., Klein, G., & Hoffman, R. R. (2006). Working minds: A practitioner’s guide to cognitive task analysis. Cambridge, MA: MIT Press.
Croskerry, P., & Norman, G. (2008). Overconfidence in clinical decision making. The American Journal of Medicine, 121, S24–S29.
deGroot, A. D. (1978). Thought and choice in chess. The Hague: Mouton. (Original work published 1946)
Einhorn, H. J., & Hogarth, R. M. (1978). Confidence in judgment: Persistence of the illusion of validity. Psychological Review, 85, 395–416.
Ericsson, K. A. (2006). The influence of experience and deliberate practice on the development of superior expert performance. In K. A. Ericsson, N. Charness, R. R. Hoffman, & P. J. Feltovich (Eds.), The Cambridge handbook of expertise and expert performance (pp. 39–68). New York: Cambridge University Press.
Ericsson, K. A., Charness, N., Hoffman, R. R., & Feltovich, P. J. (Eds.). (2006). The Cambridge handbook of expertise and expert performance. New York: Cambridge University Press.
Evans, J. St. B. T. (2007). Hypothetical thinking: Dual processes in reasoning and judgment. Hove, East Sussex, England: Psychology Press.
Evans, J. St. B. T., & Frankish, K. (Eds.). (2009). In two minds: Dual processes and beyond. New York: Oxford University Press.
Fogarty, W. M. (1988). Formal investigation into the circumstances surrounding the downing of a commercial airliner by the U.S.S. Vincennes (CG 49) on 3 July 1988 [Unclassified Letter Ser. 1320 of 28 July 1988, to Commander in Chief, U.S. Central Command]. Washington, DC: U.S. Department of the Navy.
Frederick, S. (2005). Cognitive reflection and decision making. Journal of Economic Perspectives, 19(4), 25–42.
Gawande, A. (2002). Complications: A surgeon’s notes on an imperfect science. London: Profile Books.
Gilovich, T., Griffin, D., & Kahneman, D. (Eds.). (2002). Heuristics and biases: The psychology of intuitive judgment. New York: Cambridge University Press.
Goldberg, L. R. (1970). Man versus model of man: A rationale, plus some evidence, for a method of improving on clinical inferences. Psychological Bulletin, 73, 422–432.
Goldstein, D. G., & Gigerenzer, G. (1999). The recognition heuristic: How ignorance makes us smart. In G. Gigerenzer, P. M. Todd, & A. B. C. Research Group (Eds.), Simple heuristics that make us smart (pp. 37–58). New York: Oxford University Press.
Griffin, D. W., & Tversky, A. (1992). The weighing of evidence and the determinants of confidence. Cognitive Psychology, 24, 411–435.
Arkes, H. R. (2001). Overconfidence in judgmental forecasting. In J. S.
Armstrong (Ed.), Principles of forecasting: A handbook for researchers Groopman, J. (2007). How doctors think. New York: Houghton Mifflin.
and practitioners (pp. 495–516). Boston: Kluwer Academic. Grove, W. M., Zald, D. H., Lebow, B. S., Snitz, B. E., & Nelson, C.
Bazerman, M. H. (2005). Judgment in managerial decision making (6th (2000). Clinical versus mechanical prediction: A meta-analysis. Psy-
ed.). Hoboken, NJ: Wiley. chological Assessment, 12, 19 –30.
Beach, L. R. (1990). Image theory: Decision making in personal and Guthrie, C., Rachlinski, J. J., & Wistrich, A. J. (2007). Blinking on the
organizational contexts. Chichester, England: Wiley. bench: How judges decide cases. Cornell Law Review, 93, 1– 43.
Brunswik, E. (1957). Scope and aspects of the cognitive problem. In H. Hammond, K. R., Hamm, R. M., Grassia, J., & Pearson, T. (1987). Direct
Gruber, K. R. Hammond, & R. Jessor (Eds.), Contemporary ap- comparison of the efficacy of intuitive and analytical cognition in
proaches to cognition (pp. 5–31). Cambridge, MA: Harvard University expert judgment. IEEE Transactions on Systems, Man, and Cybernet-
Press. ics, SMC-17(5), 753–770.
Cannon-Bowers, J. A., & Salas, E. (Eds.). (1998). Making decisions under Hertwig, R., Hoffrage, U., & Martingnon, L. (1999). Quick estimation:
stress: Implications for individual and team training. Washington, DC: Letting the environment do the work. In G. Gigerenzer, P. M. Todd, &
American Psychological Association. A. B. C. Research Group (Eds.), Simple heuristics that make us smart
Chase, W. G., & Simon, H. A. (1973). The mind’s eye in chess. In W. G. (pp. 37–58). New York: Oxford University Press.

Hogarth, R. M. (2001). Educating intuition. Chicago: University of Chicago Press.
Jacowitz, K. E., & Kahneman, D. (1995). Measures of anchoring in estimation tasks. Personality and Social Psychology Bulletin, 21, 1161–1166.
Johnson, J. G., & Raab, M. (2003). Take the first: Option generation and resulting choices. Organizational Behavior and Human Decision Processes, 91(2), 215–229.
Kahneman, D. (2003). Autobiography. In T. Frangsmyr (Ed.), Les Prix Nobel 2002 [Nobel Prizes 2002]. Stockholm, Sweden: Almqvist & Wiksell International.
Kahneman, D., & Frederick, S. (2002). Representativeness revisited: Attribute substitution in intuitive judgment. In T. Gilovich, D. Griffin, & D. Kahneman (Eds.), Heuristics and biases: The psychology of intuitive judgment (pp. 49–81). New York: Cambridge University Press.
Kahneman, D., & Renshon, J. (2007). Why hawks win. Foreign Policy, 158, 34–38.
Kahneman, D., & Tversky, A. (1973). On the psychology of prediction. Psychological Review, 80, 237–251.
Karelaia, N., & Hogarth, R. M. (2008). Determinants of linear judgment: A meta-analysis of lens model studies. Psychological Bulletin, 134, 404–426.
Klein, G. A. (1993). A recognition-primed decision (RPD) model of rapid decision making. In G. A. Klein, J. Orasanu, R. Calderwood, & C. E. Zsambok (Eds.), Decision making in action: Models and methods (pp. 138–147). Norwood, NJ: Ablex.
Klein, G. (1998). Sources of power: How people make decisions. Cambridge, MA: MIT Press.
Klein, G. (2007, September). Performing a project premortem. Harvard Business Review, pp. 18–19.
Klein, G. A., Calderwood, R., & Clinton-Cirocco, A. (1986). Rapid decision making on the fireground. In Proceedings of the Human Factors and Ergonomics Society 30th Annual Meeting (Vol. 1, pp. 576–580). Norwood, NJ: Ablex.
Klein, G. A., Orasanu, J., Calderwood, R., & Zsambok, C. E. (1993). Decision making in action: Models and methods. Norwood, NJ: Ablex.
Klein, G., Wolf, S., Militello, L., & Zsambok, C. (1995). Characteristics of skilled option generation in chess. Organizational Behavior and Human Decision Processes, 62, 63–69.
Lewis, M. (2003). Moneyball: The art of winning an unfair game. New York: Norton.
Lipshitz, R. (1993). Converging themes in the study of decision making in realistic settings. In G. A. Klein, J. Orasanu, R. Calderwood, & C. E. Zsambok (Eds.), Decision making in action: Models and methods (pp. 103–137). Norwood, NJ: Ablex.
Mednick, S. A. (1962). The associative basis of the creative process. Psychological Review, 69, 220–232.
Meehl, P. E. (1954). Clinical vs. statistical prediction: A theoretical analysis and a review of the evidence. Minneapolis: University of Minnesota Press.
Meehl, P. E. (1973). Why I do not attend case conferences. In P. E. Meehl (Ed.), Psychodiagnosis: Selected papers (pp. 225–302). Minneapolis: University of Minnesota Press.
Mitchell, D., Russo, J., & Pennington, N. (1989). Back to the future: Temporal perspective in the explanation of events. Journal of Behavioral Decision Making, 2, 25–38.
Montgomery, H. (1993). The search for a dominance structure in decision making: Examining the evidence. In G. A. Klein, J. Orasanu, R. Calderwood, & C. E. Zsambok (Eds.), Decision making in action: Models and methods (pp. 182–187). Norwood, NJ: Ablex.
Mussweiler, T., & Strack, F. (2000). The use of category and exemplar knowledge in the solution of anchoring tasks. Journal of Personality and Social Psychology, 78, 1038–1052.
Myers, D. G. (2002). Intuition: Its powers and perils. New Haven, CT: Yale University Press.
Nisbett, R. E., & Wilson, T. D. (1977). Telling more than we can know: Verbal reports on mental processes. Psychological Review, 84, 231–259.
Orasanu, J., & Connolly, T. (1993). The reinvention of decision making. In G. A. Klein, J. Orasanu, R. Calderwood, & C. E. Zsambok (Eds.), Decision making in action: Models and methods (pp. 3–20). Norwood, NJ: Ablex.
Rasmussen, J. (1986). Information processing and human–machine interaction: An approach to cognitive engineering. Amsterdam: North-Holland.
Rosenzweig, P. (2007). The halo effect . . . and the eight other business delusions that deceive managers. New York: Free Press.
Schraagen, J. M. C., Chipman, S. F., & Shalin, V. J. (Eds.). (2000). Cognitive task analysis. Mahwah, NJ: Erlbaum.
Shanteau, J. (1992). Competence in experts: The role of task characteristics. Organizational Behavior and Human Decision Processes, 53, 252–262.
Simon, H. A. (1992). What is an explanation of behavior? Psychological Science, 3, 150–161.
Skitka, L. J., Mosier, K., & Burdick, M. D. (1999). Does automation bias decision-making? International Journal of Human–Computer Studies, 51, 991–1006.
Skitka, L. J., Mosier, K., & Burdick, M. D. (2000). Accountability and automation bias. International Journal of Human–Computer Studies, 52, 701–717.
Slovic, P. (Ed.). (2000). The perception of risk. London: Earthscan.
Smith, P. J., Giffin, W. C., Rockwell, T. H., & Thomas, M. (1986). Modeling fault diagnosis as the activation and use of a frame system. Human Factors, 28, 703–716.
Stewart, T. R., Roebber, P. J., & Bosart, L. F. (1997). The importance of the task in analyzing expert judgment. Organizational Behavior and Human Decision Processes, 69, 205–219.
Sunstein, C. R. (Ed.). (2000). Behavioral law and economics. New York: Cambridge University Press.
Tetlock, P. E. (2005). Expert political judgment: How good is it? How can we know? Princeton, NJ: Princeton University Press.
Tversky, A., & Kahneman, D. (1971). Belief in the law of small numbers. Psychological Bulletin, 76, 105–110.
Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185, 1124–1131.
Woods, D. D., O’Brien, J., & Hanes, L. F. (1987). Human factors challenges in process control: The case of nuclear power plants. In G. Salvendy (Ed.), Handbook of human factors/ergonomics (pp. 1724–1770). New York: Wiley.
Yates, J. F., Veinott, E. S., & Patalano, A. L. (2003). Hard decisions, bad decisions: On decision quality and decision aiding. In S. L. Schneider & J. C. Shanteau (Eds.), Emerging perspectives on judgment and decision research (pp. 13–63). New York: Cambridge University Press.

September 2009 ● American Psychologist 515
