War and Chance
BRIDGING THE GAP
Series Editors: James Goldgeier, Bruce Jentleson, Steven Weber
JEFFREY A. FRIEDMAN
Oxford University Press is a department of the University of Oxford. It furthers
the University’s objective of excellence in research, scholarship, and education
by publishing worldwide. Oxford is a registered trade mark of Oxford University
Press in the UK and certain other countries.
For my parents
ACKNOWLEDGMENTS
I am delighted to have so many people to thank for their input, guidance, and
support.
Stephen Walt, Monica Toft, and Robert Bates were my dissertation advisers at
Harvard. In addition to providing careful feedback on my research over many years,
each of them has served as a role model for producing scholarship that is interesting,
rigorous, and policy-relevant. I am also especially grateful for the mentorship of
Richard Zeckhauser, who inspired my interest in decision analysis, and Stephen
Biddle, who played a crucial role in launching my academic career.
Dartmouth has provided an ideal environment in which to study international
politics. It has been wonderful to be surrounded by so many outstanding faculty
who work in this area, including Dan Benjamin, Steve Brooks, Udi Greenberg,
Brian Greenhill, Jenny Lind, Mike Mastanduno, Ed Miller, Jennie Miller, Nick
Miller, Katy Powers, Daryl Press, Ben Valentino, and Bill Wohlforth. I was lucky
to arrive at Dartmouth along with a cohort of brilliant, supportive junior scholars,
especially David Cottrell, Derek Epp, Jeremy Ferwerda, Herschel Nachlis, Paul
Novosad, Julie Rose, Na’ama Shenhav, Sean Westwood, and Thomas Youle.
Dartmouth’s undergraduates have also served as a valuable sounding board for
many of the ideas presented in this book. It has been a pleasure to work with
them and to learn from their insights.
While conducting this research, I held fellowships at the Weatherhead Center
for International Affairs, the Belfer Center for Science and International Affairs,
the Tobin Project, the Dickey Center for International Understanding, and the
Institute for Advanced Study in Toulouse, France. Each institution provided a
stimulating environment for research and writing, particularly given the interdis-
ciplinary nature of my work. The U.S. Department of Homeland Security funded
portions of this research via the University of Southern California’s Center for
Risk and Economic Analysis of Terrorism Events. That funding was especially
important in ensuring the ethical compensation of survey respondents.
Introduction
Over the past two decades, the most serious problems in U.S. foreign
policy have revolved around the challenge of assessing uncertainty. Leaders
underestimated the risk of terrorism before 2001, overestimated the chances
that Saddam Hussein was pursuing weapons of mass destruction in 2003,
and did not fully appreciate the dangers of pursuing regime change in Iraq,
Afghanistan, or Libya. Many of this generation’s most consequential events,
such as the 2008 financial crisis, the Arab Spring, the rise of ISIS, and Brexit,
were outcomes that experts either confidently predicted would not take place
or failed to anticipate entirely. Those experiences provide harsh reminders that
scholars and practitioners of international politics are far less clairvoyant than
we would like them to be.
The central difficulty with assessing uncertainty in international politics
is that the most important judgments also tend to be the most subjective. No
known methodology can reliably predict the outbreak of wars, forecast economic
recessions, project the results of military operations, anticipate terrorist attacks,
or estimate the chances of countless other events that shape
foreign policy decisions.1 Many scholars and practitioners therefore believe that
it is better to keep foreign policy debates focused on the facts—that it is, at best,
a waste of time to debate uncertain judgments that will often prove to be wrong.
1. On how irreducible uncertainty surrounds most major foreign policy decisions, see Robert
Jervis, System Effects: Complexity in Political and Social Life (Princeton, N.J.: Princeton University
Press, 1997); Richard K. Betts, Enemies of Intelligence: Knowledge and Power in American National
Security (New York: Columbia University Press, 2006). On the limitations of even the most state-of-
the-art methods for predicting international politics, see Gerald Schneider, Nils Petter Gleditsch, and
Sabine Carey, “Forecasting in International Relations,” Conflict Management and Peace Science, Vol.
20, No. 1 (2011), pp. 5–14; Michael D. Ward, “Can We Predict Politics?” Journal of Global Security
Studies, Vol. 1, No. 1 (2016), pp. 80–91.
This skepticism raises fundamental questions about the nature and limits of
foreign policy analysis. How is it possible to draw coherent conclusions about
something as complicated as the probability that a military operation will suc-
ceed? If these judgments are subjective, then how can they be useful? To what
extent can fallible people make sense of these judgments, particularly given the
psychological constraints and political pressures that surround foreign policy
decision making? These questions apply to virtually every element of foreign
policy discourse, and they are the subject of this book.
The book has two main goals. The first of these goals is to show how for-
eign policy officials often try to avoid the challenge of probabilistic reasoning.
The book’s second goal is to demonstrate that assessments of uncertainty in in-
ternational politics are more valuable than the conventional wisdom expects.
From a theoretical standpoint, we will see that foreign policy analysts can assess
uncertainty in clear and structured ways; that foreign policy decision makers
can use those judgments to evaluate high-stakes choices; and that, in some
cases, it is nearly impossible to make sound foreign policy decisions without
assessing subjective probabilities in detail. The book’s empirical chapters then
demonstrate that real people are remarkably capable of putting those concepts
into practice. We will see that assessments of uncertainty convey meaningful in-
formation about international politics; that presenting this information explic-
itly encourages decision makers to be more cautious when they place lives and
resources at risk; and that, even if foreign policy analysts often receive unfair
criticism, that does not necessarily distort analysts’ incentives to provide clear
and honest judgments.
Altogether, the book thus explains how foreign policy analysts can assess un-
certainty in a manner that is theoretically coherent, empirically meaningful, polit-
ically defensible, practically useful, and sometimes logically necessary for making
sound choices.2 Each of these claims contradicts widespread skepticism about the
value of probabilistic reasoning in international politics, and shows that placing
greater emphasis on this subject can improve nearly any foreign policy debate. The
book substantiates these claims by examining critical episodes in the history of
U.S. national security policy and by drawing on a diverse range of quantitative evi-
dence, including a database that contains nearly one million geopolitical forecasts
and experimental studies involving hundreds of national security professionals.
The clearest benefit of improving assessments of uncertainty in international
politics is that it can help to prevent policymakers from taking risks that they do
not fully understand. Prior to authorizing the Bay of Pigs invasion in April 1961,
for example, President Kennedy asked his Joint Chiefs of Staff to evaluate the
2. Here and throughout the book, I use the term foreign policy analysts to describe anyone who
seeks to inform foreign policy debates, both in and out of government.
plan’s feasibility. The Joint Chiefs submitted a report that detailed the operation’s
strengths and weaknesses, and concluded that “this plan has a fair chance of ul-
timate success.”3 In the weeks that followed, high-ranking officials repeatedly
referenced the Joint Chiefs’ judgment when debating whether or not to set the
Bay of Pigs invasion in motion. The problem was that no one had a clear idea of
what that judgment actually meant.
The officer who wrote the Joint Chiefs’ report on the Bay of Pigs invasion
later said that the “fair chance” phrase was supposed to be a warning, much like a
letter grade of C indicates “fair performance” on a test.4 That is also how Secretary
of Defense Robert McNamara recalled interpreting the Joint Chiefs’ views.5 But
other leaders read the report differently. When Marine Corps Commandant
David Shoup was later asked to say how he had interpreted the “fair chance”
phrase, he replied that “the plan they had should have accomplished the mis-
sion.”6 Proponents of the invasion repeatedly cited the “fair chance” assessment
in briefing materials.7 President Kennedy came to believe that the Joint Chiefs
had endorsed the plan. After the invasion collapsed, he wondered why no one
had warned him that the mission might fail.8
3. “Memorandum from the Joint Chiefs of Staff to Secretary McNamara,” Foreign Relations of the
United States [FRUS] 1961–1963, Vol. X, Doc 35 (3 February 1961). Chapter 5 describes this docu-
ment in greater detail.
4. When the historian Peter Wyden interviewed the officer fifteen years later, he found that “[then-
Brigadier General David] Gray was still severely troubled about his failure to have insisted that figures
[i.e., numeric percentages] be used. He felt that one of the key misunderstandings in the entire
project was the misinterpretation of the word ‘fair.’ ” Gray told Wyden that the Joint Chiefs believed that
the odds of the invasion succeeding were roughly three in ten. Peter Wyden, Bay of Pigs: The Untold
Story (New York: Simon and Schuster, 1979), pp. 88–90.
5. In an after-action review conducted shortly after the Bay of Pigs invasion collapsed, McNamara
said that he knew the Joint Chiefs thought the plan was unlikely to work, but that he had still
believed it was the best opportunity the United States would get to overthrow the Castro regime.
“Memorandum for the Record,” FRUS 1961–1963, Vol. X, Doc 199 (3 May 1961).
6. “Memorandum for the Record,” FRUS 1961–1963, Vol. X, Doc 209 (8 May 1961).
7. James Rasenberger, The Brilliant Disaster: JFK, Castro, and America’s Doomed Invasion of the Bay
of Pigs (New York: Scribner, 2011), p. 119, explains that “the entire report, on balance, came off as
an endorsement of the CIA’s plan.” On the CIA’s subsequent use of the “fair chance” statement to
support the invasion, see “Paper Prepared in the Central Intelligence Agency,” FRUS 1961–1963,
Vol. X, Doc 46 (17 February 1961). For post-mortem discussions of the invasion decision, see FRUS
1961–1963, Vol. X, Docs 199, 209, 210, and 221.
8. President Kennedy later complained to an aide that the Joint Chiefs “had just sat there nodding,
saying it would work.” In a subsequent interview, Kennedy recalled that “five minutes after it began
to fall in, we all looked at each other and asked, ‘How could we have been so stupid?’ When we saw
the wide range of the failures we asked ourselves why it had not been apparent to somebody from the
start.” Richard Reeves, President Kennedy: Profile of Power (New York: Simon and Schuster, 1993),
p. 103; Hugh Sidey, “The Lesson John Kennedy Learned from the Bay of Pigs,” Time, Vol. 157, No. 15
(2001).
Regardless of the insight that the Joint Chiefs provided about the strengths
and weaknesses of the Bay of Pigs invasion, their advice thus amounted to
a Rorschach test: it allowed policymakers to adopt nearly any position they
wanted, and to believe that their view had been endorsed by the military’s top
brass. This is just one of many examples we will see throughout the book of how
failing to assess uncertainty in clear and structured ways can undermine for-
eign policy decision making. Yet, as the next example shows, the challenge of
assessing uncertainty in international politics runs deeper than semantics, and it
cannot be solved through clear language alone.
In April 2011—almost exactly fifty years after the Bay of Pigs invasion—
President Barack Obama convened his senior national security team to dis-
cuss reports that Osama bin Laden might be living in Abbottabad, Pakistan.
Intelligence analysts had studied a suspicious compound in Abbottabad for
months. They possessed clear evidence connecting this site to al Qaeda, and they
knew that the compound housed a tall, reclusive man who never left the prem-
ises. Yet it was impossible to be certain about who that person was. If President
Obama was going to act on this information, he would have to base his deci-
sion on probabilistic reasoning. To make that reasoning as rigorous as possible,
President Obama asked his advisers to estimate the chances that bin Laden was
living in the Abbottabad compound.9
Answers to the president’s question ranged widely. The leader of the bin
Laden unit at the Central Intelligence Agency (CIA) said there was a ninety-five
percent chance that they had found their man. CIA Deputy Director Michael
Morell thought that those chances were more like sixty percent. Red Teams
assigned to make skeptical arguments offered figures as low as thirty or forty per-
cent. Other views reportedly clustered around seventy or eighty percent. While
accounts of this meeting vary, all of them stress that participants did not know
how to resolve their disagreement and that they did not find the discussion to
be helpful.10 President Obama reportedly said at the time that the debate had
provided “not more certainty but more confusion.” In a subsequent interview, he
9. The following account is based on Michael Morell, The Great War of Our Time: The CIA’s
Fight against Terrorism from Al Qa’ida to ISIS (New York: Twelve, 2014), ch. 7; along with
David Sanger, Confront and Conceal: Obama’s Secret Wars and Surprising Use of American Power
(New York: Crown 2012); Peter Bergen, Manhunt: The Ten-Year Search for Bin Laden from 9/11 to
Abbottabad (New York: Crown 2012); and Mark Bowden, The Finish: The Killing of Osama bin Laden
(New York: Atlantic, 2012).
10. Reflecting later on this debate, James Clapper, who was then the director of national intelli-
gence, said, “We put a lot of discussion [into] percentages of confidence, which to me is not partic-
ularly meaningful. In the end it’s all subjective judgment anyway.” CNN, “The Axe Files,” Podcast
Ep. 247 (31 May 2018).
told a reporter that his advisers’ probability estimates had “disguised uncertainty
as opposed to actually providing you with more useful information.”11
Of course, President Obama’s decision to raid the Abbottabad compound
ended more successfully than President Kennedy’s decision to invade the Bay
of Pigs. Yet the confusion that President Obama and his advisers encountered
when pursuing bin Laden was, in many ways, more troubling. The problem with
the Joint Chiefs’ assessment of the Bay of Pigs invasion was a simple matter of se-
mantics. By contrast, the Obama administration’s efforts to estimate the chances
that bin Laden was living in Abbottabad revealed a deeper conceptual confu-
sion: even when foreign policy officials attempted to debate the uncertainty that
surrounded one of their seminal decisions, they still struggled to understand
what those judgments meant and how they could be useful.
War and Chance seeks to dispel that confusion. The book describes the the-
oretical basis for assessing uncertainty in international politics, explains how
those judgments provide crucial insight for evaluating foreign policy decisions,
and shows that the conventional wisdom underestimates the extent to which
these insights can aid foreign policy discourse. These arguments apply to vir-
tually any kind of foreign policy analysis, from debates in the White House
Situation Room to op-eds in the New York Times. As the following chapters ex-
plain, it is impossible to evaluate foreign policy choices without assessing uncer-
tainty in some way, shape, or form.
To be clear, nothing in the book implies that assessing uncertainty in inter-
national politics should be easy or uncontroversial. The book’s main goal is, in-
stead, to show that scholars and practitioners handle this challenge best when
they confront it head-on, and to explain that there is no reason why these debates
should seem to be intractable. Even small advances in understanding this sub-
ject matter could provide substantial benefit—for, as President Obama reflected
when the bin Laden raid was over, “One of the things you learn as president is
that you’re always dealing with probabilities.”12
11. Bowden, The Finish, pp. 160–161; Bergen, Manhunt, p. 198; Sanger, Confront and Conceal, p. 93.
12. Bowden, The Finish, p. 161.
13. Carl von Clausewitz, On War, tr. Michael Howard and Peter Paret (Princeton, N.J.: Princeton
University Press, 1984), pp. 85–86. The U.S. Marine Corps’ capstone doctrine reflects this sentiment,
stating on its opening page that “War is intrinsically unpredictable. At best, we can hope to determine
possibilities and probabilities.” U.S. Marine Corps Doctrinal Publication 1, Warfighting, p. 7.
14. Clausewitz, On War, pp. 80, 86, 112, 184.
15. On how rational choice logic is contingent on personal beliefs, see Ken Binmore, Rational
Decisions (Princeton, N.J.: Princeton University Press, 2009). Chapter 2 discusses this point in more
detail. There is, for example, a large literature that describes how foreign policy analysts can use
Bayesian mathematics to assess uncertainty, but Bayesian reasoning depends on subjective proba-
bility estimates.
16. This view is premised on the notion that international politics and armed conflicts involve in-
definable levels of complexity. On complexity and international politics, see Richard K. Betts, “Is
Strategy an Illusion?” International Security, Vol. 25, No. 2 (2000), pp. 5–50; Jervis, System Effects;
Thomas J. Czerwinski, Coping with the Bounds: Speculations on Nonlinearity in Military Affairs
(Washington, D.C.: National Defense University Press, 1998); and Ben Connable, Embracing the Fog
of War (Santa Monica, Calif.: Rand, 2012).
17. James N. Mattis, “USJFCOM Commander’s Guidance for Effects-based Operations,”
Parameters, Vol. 38, No. 3 (2008), pp. 18–25. For similar questions about whether assessments of
uncertainty provide a sound basis for decision making in international politics, see Alexander Wendt,
“Driving with the Rearview Mirror: On the Rational Science of Institutional Design,” International
Organization, Vol. 55, No. 4 (2001), pp. 1019–1049; Jonathan Kirshner, “Rationalist Explanations
for War?” Security Studies, Vol. 10, No. 1 (2000), pp. 143–150; Alan Beyerchen, “Clausewitz,
Nonlinearity, and the Unpredictability of War,” International Security, Vol. 17, No. 3 (1992/93), pp.
59–90; and Barry D. Watts, Clausewitzian Friction and Future War, revised edition (Washington,
D.C.: National Defense University, 2004).
18. On how any international relations theory relies on a coherent standard of rational decision
making, see Charles L. Glaser, Rational Theory of International Politics (Princeton, N.J.: Princeton
University Press, 2010), pp. 2–3. On how assessments of uncertainty play a crucial role in nearly all in-
ternational relations paradigms, see Brian C. Rathbun, “Uncertain about Uncertainty: Understanding
the Multiple Meanings of a Crucial Concept in International Relations Theory,” International Studies
Quarterly, Vol. 51, No. 3 (2007), pp. 533–557.
19. Aristotle, Nicomachean Ethics, tr. Terence Irwin (Indianapolis, Ind.: Hackett, 1985), p. 1094b.
20. For arguments exhorting foreign policy analysts to adopt such humility, see Stanley Hoffmann,
Gulliver’s Troubles: Or, the Setting of American Foreign Policy (New York: McGraw-Hill, 1968), pp.
87–175; David Halberstam, The Best and the Brightest (New York: Random House, 1972).
21. Robert Jervis, Perception and Misperception in International Politics (Princeton, N.J.: Princeton
University Press, 1976); Rose McDermott, Risk- Taking in International Politics (Ann Arbor,
Mich.: University of Michigan Press, 1998); Philip E. Tetlock, Expert Political Judgment: How Good Is
It? How Can We Know? (Princeton, N.J.: Princeton University Press, 2005).
22. Mark M. Lowenthal, Intelligence: From Secrets to Policy, 3rd ed. (Washington, D.C.: CQ Press,
2006), p. 129; Yaakov Y. I. Vertzberger, Risk Taking and Decisionmaking: Foreign Military Intervention
Decisions (Stanford, Calif.: Stanford University Press, 1998), pp. 27–28.
23. Chapter 5 explains that this impulse is not purely self-serving. If national security analysts lose
the trust of their colleagues or the general public as a result of unjustified criticism, then this can un-
dermine their effectiveness regardless of whether that loss of standing is deserved. If analysts seek to
avoid probability assessment in order to escape justified criticism, then this would reflect the cynical
viewpoint.
24. Jack S. Levy, “Psychology and Foreign Policy Decision-Making,” in Leonie Huddy, David O.
Sears, and Jack S. Levy, eds., The Oxford Handbook of Political Psychology, 2nd ed. (New York: Oxford
University Press, 2013); Emilie M. Hafner-Burton et al., “The Behavioral Revolution and the Study
of International Relations,” International Organization, Vol. 71, No. S1 (April 2017), pp. S1–S31;
Joshua D. Kertzer and Dustin Tingley, “Political Psychology in International Relations,” Annual
Review of Political Science, Vol. 21 (2018), pp. 319–339.
25. Daniel Kahneman famously captured this insight with the distinction between “thinking
fast” and “thinking slow,” where the latter is less prone to heuristics and biases. Daniel Kahneman,
Thinking, Fast and Slow (New York: Farrar, Straus and Giroux, 2011). In national security specifically,
see Richards Heuer, Jr., Psychology of Intelligence Analysis (Washington, D.C.: Center for the Study of
Intelligence, 1999).
26. Wendt, “Driving with the Rearview Mirror”; Stanley A. Renshon and Deborah Welch Larson,
eds., Good Judgment in Foreign Policy (Lanham, Md.: Rowman and Littlefield, 2003); and Peter
Katzenstein and Lucia Seybert, “Protean Power and Uncertainty: Exploring the Unexpected in
World Politics,” International Studies Quarterly, Vol. 62, No. 1 (2018), pp. 80–93.
27. Dean Acheson, Present at the Creation: My Years at the State Department (New York: Norton,
1969), p. 375; John M. Schuessler, Deceit on the Road to War (Ithaca, N.Y.: Cornell University Press,
2015); Uri Bar-Joseph and Rose McDermott, Intelligence Success and Failure: The Human Factor
(New York: Oxford University Press, 2017). In other cases, decision makers may prefer to leave
assessments of uncertainty vague, so as to maintain freedom of action. Joshua Rovner, Fixing the
Facts: National Security and the Politics of Intelligence (Ithaca, N.Y.: Cornell University Press, 2011);
Robert Jervis, “Why Intelligence and Policymakers Clash,” Political Science Quarterly, Vol. 125, No. 2
(Summer 2010), pp. 185–204.
28. H. R. McMaster, Dereliction of Duty (New York: HarperCollins, 1997); Philip E. Tetlock,
“Reading Tarot on K Street,” The National Interest, No. 103 (2009), pp. 57–67; Christopher Hood,
The Blame Game: Spin, Bureaucracy, and Self-Preservation in Government (Princeton, N.J.: Princeton
University Press, 2011). As mentioned in note 23, the desire to avoid unjustified blame falls within the
rejectionists’ viewpoint.
29. This is consistent with the (much more informed) views of Robert Jervis, “Politics and Political
Science,” Annual Review of Political Science, Vol. 21 (2018), p. 17.
Chapter Outline
The book contains seven chapters. Chapter 1 describes how foreign policy
analysts often avoid assessing uncertainty in a manner that supports sound de-
cision making. This concern dates back to a famous 1964 essay by Sherman
Kent, which remains one of the seminal works in intelligence studies.31 But
chapter 1 explains that aversion to probabilistic reasoning is not just a problem
for intelligence analysts and that the issue runs much deeper than semantics.
We will see how scholars, practitioners, and pundits often debate international
politics without assessing the most important probabilities at all, particularly
by analyzing which policies offer the best prospects of success or by debating
whether actions are necessary to achieve their objectives, without carefully
assessing the chances that high-stakes decisions will actually work. Chapter 1
shows how this behavior is ingrained throughout a broad range of foreign policy
discourse, and it describes how these problematic practices shaped the highest
levels of U.S. decision making during the Vietnam War.
Chapter 2 explores the theoretical foundations of probabilistic reasoning
in international politics. It explains that, even though the most important
assessments of uncertainty in international politics are inherently subjective,
foreign policy analysts always possess a coherent conceptual basis for debating
these judgments in clear and structured ways. Chapter 3 then examines the
30. See, for example, Jack Snyder, Myths of Empire: Domestic Politics and International Ambition
(Ithaca, N.Y.: Cornell University Press, 1991); John Lewis Gaddis, We Now Know: Rethinking Cold
War History (New York: Oxford University Press, 1997); Dominic D. P. Johnson, Overconfidence
and War (Cambridge, Mass.: Harvard University Press, 2004); and John Mueller, Overblown: How
Politicians and the Terrorism Industry Inflate National Security Threats, and Why We Believe Them
(New York: Free Press, 2006).
31. Sherman Kent, “Words of Estimative Probability,” Studies in Intelligence, Vol. 8, No. 4 (1964),
pp. 49–65.
32. One irony of these findings is that the national security officials who participated in the
experiments often insisted that fine-grained probabilistic distinctions would not shape their decisions,
even as the experimental data unambiguously demonstrated that this information influenced their
views. The notion that decision makers may not always be aware of how they arrive at their own
beliefs is one of the central motivations for conducting experimental research in political psychology.
Yet, unlike areas of political psychology that show how decision makers’ views are susceptible to
unconscious biases, the book’s empirical analysis suggests that national security officials are more
sophisticated than they give themselves credit for in handling subjective probabilities.
This argument further highlights that the goal of improving assessments of un-
certainty in international politics is not just an issue for government officials,
but that it is also a matter of how scholars, journalists, and pundits can raise the
standards of foreign policy discourse.
of foreign policy do not lend themselves to this kind of data collection. Scholars
have therefore tended to treat the assessment of uncertainty as a topic better-
suited to philosophical debate than to empirical analysis.33
In recent years, however, social scientists have developed new methods to
study probabilistic reasoning, and governmental organizations have become in-
creasingly receptive to supporting empirical research on the subject. Chapter 3’s
analysis of the value of precision in probability assessment would not have been
possible without the U.S. Intelligence Community’s decision to sponsor the col-
lection of nearly one million geopolitical forecasts.34 Similarly, chapter 4’s anal-
ysis of how decision makers respond to assessments of uncertainty depended
on the support of the National War College and the willingness of more than
six hundred national security professionals to participate in experimental re-
search. Thus, even if none of the following chapters represents the final word on
its subject, one of the book’s main contributions is simply to demonstrate that
it is possible to conduct rigorous empirical analysis of issues that many scholars
and practitioners have previously considered intractable.
This book also differs from previous scholarship in how it treats the relation-
ship between explanation and prescription. Academic studies of international
politics typically prioritize the explanatory function of social science, in which
scholars focus on building a descriptive model of the world that helps readers
to understand why states and leaders act in puzzling ways. Most scholars there-
fore orient their analyses around theoretical and empirical questions that are im-
portant for explaining observed behavior. Although such studies can generate
policy-relevant insights, those insights are often secondary to scholars’ descrip-
tive aims. Indeed, one of the most salient insights that these studies produce
is that there is relatively little we can do to improve problematic behavior, either
because foreign policy officials have strong incentives to act in a harmful fashion
or because their choices are shaped by structural forces outside of their control.35
This book, by contrast, prioritizes the prescriptive value of social science. It
aims to understand what sound decision making entails in principle and how
close we can get to that standard in practice. To serve these objectives, the fol-
lowing chapters focus on theoretical and empirical questions that are important
for understanding how to improve foreign policy analysis and decision making,
33. Mandeep K. Dhami, “Towards an Evidence-Based Approach to Communicating Uncertainty
in Intelligence Analysis,” Intelligence and National Security, Vol. 33, No. 2 (2018), pp. 257–272.
34. Philip E. Tetlock and Dan Gardner, Superforecasting: The Art and Science of Prediction
(New York: Crown, 2015).
35. For a critique of how international relations scholars often privilege descriptive aims over
prescriptive insights, see Alexander George, Bridging the Gap: Theory and Practice in Foreign
Policy (Washington, D.C.: U.S. Institute of Peace, 1993); and Bruce W. Jentleson, “The Need for
Praxis: Bringing Policy Relevance Back In,” International Security, Vol. 26, No. 4 (2002), pp. 169–183.
not just to explain the current state of affairs in this field. As with most social sci-
ence, however, the book’s prescriptive and descriptive aims overlap. By showing
how scholars and practitioners often exaggerate the obstacles to assessing un-
certainty in international politics, the following chapters reveal how many key
aspects of this subject remain understudied and misunderstood.
36. Elsewhere, I have argued that analytic confidence comprises three distinct attributes: the avail-
ability of reliable evidence supporting a judgment, the range of reasonable opinion surrounding
that judgment, and the extent to which analysts expect their judgment to change in response to new
information. See Jeffrey A. Friedman and Richard Zeckhauser, “Analytic Confidence and Political
Decision Making: Theoretical Principles and Experimental Evidence from National Security
Professionals,” Political Psychology, Vol. 39, No. 5 (2018), pp. 1069–1087.
37. See, for example, James Clapper’s description of “percentages of confidence” in note 10, above.
For further discussion of how scholars and practitioners conflate probability and confidence, see
Jeffrey A. Friedman and Richard Zeckhauser, “Assessing Uncertainty in Intelligence,” Intelligence and
National Security, Vol. 27, No. 6 (2012), pp. 834–841.
38. Robert Winkler, An Introduction to Bayesian Inference and Decision, 2nd ed. (Sugar Land,
Tex.: Probabilistic Publishing, 2003). Decision scientists further distinguish between situations of
“risk,” where all probabilities relevant to decision making are known; situations of “uncertainty,”
where decision makers know all the relevant outcomes, but those outcomes have ambiguous
probabilities of occurrence; and situations of “ignorance,” where decision makers do not know all
the relevant outcomes that their choices would affect. See Richard Zeckhauser, “Investing in the
Unknown and Unknowable,” in Francis Diebold, Neil Doherty, and Richard Herring, eds., The
Known, the Unknown, and the Unknowable in Financial Risk Management (Princeton, N.J.: Princeton
University Press, 2010).
39. One study of more than three thousand doctor-patient interactions found that physicians
described uncertainty about treatment outcomes in just seventeen percent of complex procedures
(and in four percent of procedures overall). See Clarence H. Braddock et al., “Informed Decision
Making in Outpatient Practice,” Journal of the American Medical Association, Vol. 282, No. 24
(December 1999), pp. 2313–2320.
40. For competing views on this subject, see Cass R. Sunstein, Valuing Life: Humanizing the
Regulatory State (Chicago, Ill.: University of Chicago Press, 2014); and Frank Ackerman and Lisa
Heinzerling, Priceless: On Knowing the Price of Everything and the Value of Nothing (New York: W.
W. Norton, 2004).
For more than a decade, climate scientists have engaged in a vigorous debate
over the proper methods for communicating uncertainty to the public regarding
projections of global warming, sea level rise, and other environmental issues.41
In one of the most salient examples of how vague probabilistic reasoning
shapes civil society, the U.S. criminal justice system reaches verdicts by asking
jurors to determine whether the probability of a defendant’s guilt lies “beyond a
reasonable doubt.” Judges, juries, and attorneys hold strikingly divergent views
of what the reasonable doubt standard entails. Some of the ways that lawyers
and judges have described this standard in court include “60 percent,” “kind of
like 75 percent,” “somewhere between the 75- and 90-yard line on a 100-yard-
long football field,” and “a 1,000-piece puzzle with sixty pieces missing.”42 One
survey that asked federal judges to quantify the “beyond a reasonable doubt”
standard produced answers with a minimum of fifty percent, a maximum of one
hundred percent, an average of ninety percent, and a standard deviation of eight
percentage points.43 A related survey found several real juries in which a majority
of jurors believed that a seventy percent probability of guilt lay beyond a reason-
able doubt.44 These seemingly arbitrary interpretations raise troubling questions
about the application of criminal justice. Yet, as in international politics, many
scholars and practitioners of the law oppose assessing uncertainty in clearer and
more structured ways.
Empirical findings from one field do not always apply to others. Yet the book’s
conceptual framework and empirical methodology can be extended to nearly
any domain of high-stakes decision making. And to the extent that international
politics are typically understood to be particularly complex and subjective, this
domain should pose a high degree of difficulty for improving the quality and
rigor of probabilistic reasoning. Thus, to the extent that the book pushes back
against entrenched skepticism about the nature and limits of assessing uncer-
tainty in international politics, it suggests that other disciplines might also ben-
efit from revisiting their own views of this subject.
41. David V. Budescu, Stephen Broomell, and Han-Hui Por, “Improving Communication of
Uncertainty in the Report of the Intergovernmental Panel on Climate Change,” Psychological Science,
Vol. 20, No. 3 (2009), pp. 299–308.
42. Peter Tillers and Jonathan Gottfried, “Case Comment—United States v. Copeland, 369
F. Supp. 2d 275 (E.D.N.Y. 2005): A Collateral Attack on the Legal Maxim That Proof beyond a
Reasonable Doubt Is Unquantifiable?” Law, Probability, and Risk, Vol. 5, No. 2 (June 2006), pp.
135–157.
43. C. M. A. McAuliff, “Burdens of Proof: Degrees of Belief, Quanta of Evidence, or Constitutional
Guarantees?” Vanderbilt Law Review 35 (November 1982), p. 1325.
44. James Franklin, “Case Comment—United States v. Copeland, 369 F. Supp. 2d 275 (E.D.N.Y.
2005): Quantification of the ‘Proof Beyond Reasonable Doubt’ Standard,” Law, Probability, and Risk,
Vol. 5, No. 2 (June 2006), p. 165.
1
Pathologies of Probability Assessment
This chapter describes how foreign policy officials often avoid assessing uncer-
tainty in a manner that supports sound decision making. This problem extends
beyond the fact that foreign policy analysts are generally unwilling to articulate
their judgments precisely. Instead, we will see how foreign policy officials often
appear reluctant to assess basic elements of uncertainty at all. For example, the
chapter shows that foreign policy officials frequently orient debates around de-
termining what policies offer the best prospects for success or what actions are
necessary to achieve strategic objectives, without analyzing the chances that
these measures will actually work. This behavior is ingrained throughout foreign
policy discourse, from day-to-day intelligence reports to the highest levels of na-
tional security decision making.
The chapter describes this problem in two ways. First, it reviews official
guidelines for assessing uncertainty in the U.S. military, Intelligence Community,
and other national security agencies. These guidelines demonstrate a wide-
spread aversion to probabilistic reasoning. The chapter then presents a case
study that describes how U.S. officials debated their strategic prospects during
the Vietnam War between 1961 and 1965. Through qualitative and quantitative
analysis of key documents from this period, we will see that senior leaders con-
sistently avoided assessing the uncertainty that surrounded their decisions to
escalate the war.
The Vietnam case holds special significance considering that Secretary of
Defense Robert McNamara and his staff were some of the most analytically
inclined decision makers in the history of U.S. foreign policy. Conventional
narratives of Vietnam War decision making portray these so-called Whiz Kids
as being overly committed to methodological rigor, especially when it came
to conducting the kinds of quantitative, cost-benefit analyses that Secretary
McNamara had pioneered as an executive at the Ford Motor Company. This narrative may
accurately characterize the immense analytic firepower that the Pentagon de-
voted to measuring tactical progress during the war. Yet we will also see that
U.S. officials put virtually no systematic effort into assessing the chances that
their overall strategy would succeed.

Figure 1.1 Sherman Kent’s odds table. Source: Kent, “Words of Estimative Probability.”
Seen from this vantage point, the McNamara Pentagon did not suffer from an
excess of methodological rigor when guiding the Vietnam War, but rather from
an almost complete absence of attention to the logical foundations of their most
important choices. The fact that even the Whiz Kids could not bring themselves
to address these matters in clear and structured ways highlights a basic diver-
gence between the theory and the practice of foreign policy decision making.
Although virtually every theory of decision making in international politics
depends on assessing uncertainty in one form or another, this chapter shows
that foreign policy officials can place lives and resources at risk without carefully
evaluating what those risks entail.
1. Sherman Kent, “Words of Estimative Probability,” Studies in Intelligence, Vol. 8, No. 4 (1964), pp. 49–65. Kent defined the numeric ranges in the odds table by surveying how foreign policy officials intuitively quantified probabilistic phrases. This method of eliciting “inferred probabilities” is now a standard tool in the decision sciences. See Ruth Beyth-Marom, “How Probable Is Probable?” Journal of Forecasting, Vol. 1 (1982), pp. 257–269; Frederick Mosteller and Cleo Youtz, “Quantifying Probabilistic Expressions,” Statistical Science, Vol. 5, No. 1 (1990), pp. 2–12; and David R. Mandel, “Accuracy of Intelligence Forecasts from the Intelligence Consumer’s Perspective,” Policy Insights from the Behavioral and Brain Sciences, Vol. 2, No. 1 (2015), pp. 111–120.
Sherman Kent was one of the most influential figures in the early years of the
United States Intelligence Community. A former history professor at Yale, Kent
became the director of the Board of National Estimates, an organization sim-
ilar to today’s National Intelligence Council. Kent is known as the “founding fa-
ther of intelligence estimation” in the United States, and the Central Intelligence
Agency training center is named in his honor. But despite his position of au-
thority, Kent could not convince his colleagues to use the odds table or to clarify
their assessments of uncertainty in any other systematic way.
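To make Kent's proposal concrete, here is a minimal sketch that encodes his odds table as a simple lookup. The numeric ranges approximate those in Kent's essay, and the helper function is purely illustrative; it is not part of any official tradecraft tool.

```python
# A minimal sketch of Kent-style "words of estimative probability."
# The ranges approximate the odds table in Kent's 1964 essay (e.g.,
# "probable" = 75 percent, give or take about 12 percent); the helper
# function is illustrative, not an official standard.

KENT_ODDS_TABLE = {
    "almost certain":       (0.87, 0.99),
    "probable":             (0.63, 0.87),
    "chances about even":   (0.40, 0.60),
    "probably not":         (0.20, 0.40),
    "almost certainly not": (0.02, 0.12),
}

def kent_phrase(probability: float) -> str:
    """Return the estimative phrase whose range contains the probability."""
    for phrase, (low, high) in KENT_ODDS_TABLE.items():
        if low <= probability <= high:
            return phrase
    return "no standard phrase (outside the table's ranges)"

# The Joint Chiefs reportedly put the Bay of Pigs odds at roughly 3 in 10,
# which a Kent-style table would have rendered as an unmistakable warning:
print(kent_phrase(0.30))  # -> "probably not"
```

Had the Joint Chiefs' "fair chance" been forced through a table like this, the disagreement described in the Introduction would have surfaced before the invasion rather than after it.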
In his essay, Kent described receiving pushback from a group of analysts he
called the “poets” (in contrast to the “mathematicians,” who supported making
assessments of uncertainty more precise). Kent wrote that the poets “appear
to believe the most a writer can achieve when working in a speculative area of
human affairs is communication in only the broadest general sense. If he gets
the wrong message across or no message at all—well, that is life.” To the extent
that the poets saw the odds table as reflecting “bogus precision,” Kent argued, it
represented a “fundamentally defeatist” attitude. The poets nevertheless retained
the upper hand in this debate for four decades, defeating subsequent attempts by
other high-ranking officials, including National Intelligence Council chairman
Joseph Nye, to clarify assessments of uncertainty in intelligence.2
The poets were ultimately forced to cede ground following mistaken
assessments of Saddam Hussein’s weapons of mass destruction (WMD)
programs, which President George W. Bush used to justify the 2003 invasion of
Iraq. A National Intelligence Estimate (NIE) on this subject opened by stating,
“We judge that Iraq has continued its weapons of mass destruction programs in
defiance of UN resolutions and restrictions.”3 Phrases like “we judge” are what
2. Joseph S. Nye, Jr., “Peering into the Future,” Foreign Affairs, Vol. 73, No. 4 (1994), pp. 82–93.
Reviewing a corpus of 379 NIEs written between 1964 and 1994, Richard Zeckhauser and I found
that only 4 percent of key judgments assessed uncertainty using quantitative indicators, whereas
18 percent of key judgments did not convey even a qualitative sense of the chances that a state-
ment was true. Jeffrey A. Friedman and Richard Zeckhauser, “Assessing Uncertainty in Intelligence,”
Intelligence and National Security, Vol. 27, No. 6 (2012), p. 837. For similar descriptions of the aver-
sion to transparent probabilistic reasoning in Israeli and Canadian intelligence, respectively, see
Zvi Lanir and Daniel Kahneman, “An Experiment in Decision Analysis in Israel in 1975,” Studies in
Intelligence, Vol. 50, No. 4 (2006), pp. 11–19; and Alan Barnes, “Making Intelligence Analysis More
Intelligent: Using Numeric Probabilities,” Intelligence and National Security, Vol. 31, No. 1 (2016), pp.
327–344.
3. NIE 2002-16HC, Iraq’s Continuing Programs for Weapons of Mass Destruction (Oct. 2002).
the analysts did not really think about that statement before making
it. It was a reflection of their gut view. It did not reflect a thorough
4. See, for example, Michael Schrage, “What Percent Is Slam Dunk?” Washington Post, February
20, 2005, p. B01. Chapter 5 will return to this episode, describing how the “slam dunk” term was mis-
takenly (but foreseeably) used to characterize the Iraq NIE. Richard Betts similarly argues that the
principal error in intelligence on Iraqi weapons programs was a “failure to make clear how weak the
direct evidence was for reaching any conclusion.” Richard K. Betts, Enemies of Intelligence: Knowledge
and Power in American National Security (New York: Columbia University Press, 2006), p. 116.
5. Robert Jervis, “Reports, Politics, and Intelligence Failures: The Case of Iraq,” Journal of Strategic
Studies, Vol. 29, No. 1 (2006), p. 44.
6. Michael Morell, The Great War of Our Time: The CIA’s Fight against Terrorism from Al Qa’ida to
ISIS (New York: Twelve, 2014), p. 102.
7. Ibid. Note that in this and the previous quotation, Morell uses the term “confidence” as a syn-
onym for probability. The Iraq NIE conflates these concepts in similar ways—or, at least, never draws
a clear distinction between the chances that a statement is true and the basis on which analysts have
drawn those conclusions. Chapter 2 returns to the distinction between probability and confidence.
8. 2004 Intelligence Reform and Terrorism Prevention Act, Section 1019(b)(2)(A).
9. The Defense Intelligence Agency guidelines in Figure 1.1 have been redrawn for clarity; each of
the other guidelines in Figure 1.2 is a facsimile based on original documents.
10. Defense Intelligence Agency Tradecraft Note 01-15, Expressing Analytic Certainty (January
2015), p. 1. The Defense Intelligence Agency (DIA) guidelines contain other peculiarities. For ex-
ample, they do not give analysts any clear option for communicating probability estimates that are
near fifty percent. Terms in the middle of the DIA’s spectrum, such as “undetermined” and “perhaps,”
suggest lack of knowledge as opposed to considered judgments, and the guidelines state that these
phrases should only be used to reflect “purely exploratory analysis.”
From the 2007 National Intelligence Estimate, “Prospects for Iraq’s Stability”:

Intelligence judgments pertaining to likelihood are intended to reflect the community’s sense of the probability of a development or event. Assigning precise numerical ratings to such judgments would imply more rigor than we intend. The chart below provides a rough idea of the relationship of terms to each other.

[Chart from Figure 1.2, reconstructed from garbled extraction. The NIE’s chart arrays qualitative terms along an axis running from “Decreasing likelihood” to “Increasing likelihood”: at the low end, “highly improbable,” “very doubtful,” “does not appear,” “doubtful,” “do not anticipate,” and “do not expect”; in the middle, “could/can,” “perhaps/unsure,” “unknown,” “do not know,” and “unable to assess”; at the high end, “anticipate,” “expect,” “appears,” “highly probable,” “very little doubt,” and “will.” A separate redrawn scale marks a 0-to-100 number line with the terms “remote,” “highly improbable,” “improbable,” “roughly even odds,” “probable,” “highly probable,” and “nearly certain.”]
It is hard to say exactly what these judgments mean. One plausible interpre-
tation is that analysts conflated assessments of probability (which reflect the
chances that a statement is true) with assessments of confidence (which re-
flect the extent to which analysts believe they have a sound basis for drawing
conclusions).12 This would raise further questions about what probabilities a
“high confidence” or “moderate confidence” judgment might reflect. If analysts
did intend to distinguish between probability and confidence, then their
conclusions would be even harder to interpret, because the only other proba-
bilistic term these judgments use is the estimative verb, “we assess.” Either way,
these judgments are not consistent with the analytic standards for assessing
11. ICA 2017-01D, Assessing Russian Activities and Intentions in Recent U.S. Elections (January
2017). Emphasis added. See https://www.dni.gov/files/documents/ICA_2017_01.pdf.
12. Chapter 2 expands on this conceptual distinction. Friedman and Zeckhauser, “Assessing
Uncertainty in Intelligence,” pp. 834–841, show that intelligence analysts appear to regularly conflate
the language of probability and the language of confidence, particularly when they are dealing with
uncertainty about questions to which the answers are knowable. Intelligence analysts are thus more
likely to use the language of confidence to describe uncertainty about events that have occurred in
the past; they are more likely to use the language of probability to describe uncertainty about events
that will occur in the future; and they rarely employ both concepts when offering a single judgment.
Figure 1.3 U.S. Army guidelines for risk assessment. Source: U.S. Army Field Manual 5-19, Composite Risk Management.
uncertainty that the document itself put forth. This discrepancy highlights how
intelligence analysts remain ambivalent about the importance of assessing un-
certainty in clear and structured ways, despite decades of effort spent promoting
that goal.
13. U.S. Army Field Manual 5-19, Composite Risk Management (2006), paras. 1–23. U.S. Army doc-
trine can be found online at https://usacac.army.mil/cac2/Doctrine.asp.
14. “CJCS Joint Risk Assessment System” (2004), accessed October 5, 2016, http://dde.carlisle.
army.mil/LLL/DSC/ppt/L14_CRA.pdf.
15. U.S. Army Field Manual 2-22.3, Human Intelligence Collector Operations (2006), p. B-1.
Figure 1.4 Joint Chiefs of Staff risk assessment system.
Figure 1.5 U.S. Army guidelines for source reliability assessment. Source: U.S. Army Field Manual 2-22.3, Human Intelligence Collector Operations.
prewar intelligence on Iraq, for instance, was that analysts should have been
more skeptical of the source known as “Curveball,” who had fabricated informa-
tion about biological weapons. Even though several intelligence officials doubted
Curveball’s reliability, that skepticism was not clearly conveyed in reports that
drew on Curveball’s interviews.16
To help avoid such problems, collectors of human intelligence generally attach
source reliability assessments to their reports. The source assessment doctrine
shown in Figure 1.5 encourages analysts to communicate these evaluations using
six qualitative terms. These assessments of reliability are defined in relation to an-
other set of qualitative terms describing doubt. The doctrine then provides no in-
dication of how either reliability or doubt relates to the probability that a source is
telling the truth.
The guidelines described in this section reflect just a handful of relevant
examples of how intelligence agencies and military organizations encourage
analysts to leave probability assessments vague.17 Nevertheless, the guidelines
16. Senate Select Committee on Intelligence, Report on the U.S. Intelligence Community’s Prewar
Intelligence Assessments on Iraq (2004), ch. IV.
17. This behavior is not limited to the United States. The current “NATO Common Standard”
for assessing probability in intelligence analysis involves five terms: highly unlikely (less than 10 per-
cent), unlikely (10 to 40 percent), even chance (40 to 60 percent), likely (60 to 90 percent), and highly
likely (more than 90 percent). NATO Standard AJP 2.1, Allied Joint Doctrine for Intelligence Procedures,
pp. 3–15. For NATO standards on source reliability assessment that are similar to the doctrine
shown in Figure 1.5, see Dan Irwin and David Mandel, “Standards for Evaluating Source Reliability
and Information Credibility in Intelligence Production,” in David Mandel, ed., Assessment and
Communication of Uncertainty in Intelligence to Support Decision-Making (Brussels, Belgium: NATO
Science and Technology Organization, in press). For additional examples drawn from Britain,
Canada, and Israel, see Mandeep K. Dhami, Understanding and Communicating Uncertainty in
Intelligence Analysis (London, U.K.: Report Prepared for H.M. Government, 2013); Barnes, “Making
Intelligence Analysis More Intelligent”; and Lanir and Kahneman, “Experiment in Decision Analysis
in Israel.”
18. Commander, International Security Assistance Force-Afghanistan, COMISAF Initial
Assessment, August 2009, pp. 1–3, 1–4, and 2–22. See http://www.washingtonpost.com/wp-dyn/
content/article/2009/09/21/AR2009092100110.html.
19. Kevin Marsh, “Obama’s Surge: A Bureaucratic Politics Analysis of the Decision to Order a
Troop Surge in the Afghanistan War,” Foreign Policy Analysis, Vol. 10, No. 3 (2014), pp. 265–288.
the staff compares feasible COAs [courses of action] to identify the one
with the highest probability of success . . . . The selected COA should also
pose the minimum risk to the force and mission accomplishment; place
the force in the best posture for future operations; provide maximum
20. Jonathan Baron, “Confusion of Relative and Absolute Risk in Valuation,” Journal of Risk and
Uncertainty, Vol. 14 (1997), pp. 301–309; and Gerd Gigerenzer, Calculated Risks: How to Know
When Numbers Deceive You (New York: Simon and Schuster, 2002). On how governments likewise
spend inordinate resources reducing risks that are already very small, see Cass R. Sunstein, Worst-
Case Scenarios (Cambridge, Mass.: Harvard University Press, 2007); and John Mueller and Mark
G. Stewart, Terror, Security, and Money: Balancing the Risks, Benefits, and Costs of Homeland Security
(New York: Oxford University Press, 2011).
21. Emphasis added. U.S. Army Field Manual 5-0, The Operations Process (2009), para. 2–19.
22. Ibid., para. B-173.
23. U.S. Army Field Manual 101-5, Staff Organization and Operations (1997), para. 5–24.
24. U.S. Army Field Manual 34-130, Intelligence Preparation of the Battlefield (1994), p. A-7.
25. U.S. Joint Forces Command, Commander’s Handbook for an Effects-Based Approach to Joint
Operations (2006), pp. 3–14.
26. Alain C. Enthoven and K. Wayne Smith, How Much Is Enough? Shaping the Defense Program,
1961–1969 (Santa Monica, Calif.: Rand, 2005).
27. George W. Downs and David M. Rocke, “Conflict, Agency, and Gambling for Resurrection: The
Principal-Agent Problem Goes to War,” American Journal of Political Science, Vol. 38, No. 2 (1994), pp.
362–380.
expected value without estimating its chances of success in absolute terms. Any
heuristic that avoids doing this will generate systematic biases.
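A small worked example, with invented numbers, illustrates the bias: ranking options by relative probability of success can select a course of action whose expected value is far worse than an alternative's.

```python
# Hypothetical illustration of why relative probability misleads.
# Each course of action (COA) has a probability of success, a payoff
# if it succeeds, and a cost if it fails; all values are invented.

courses_of_action = {
    "COA-A": {"p": 0.35, "win": 100, "lose": -200},
    "COA-B": {"p": 0.25, "win": 100, "lose": -20},
}

def expected_value(coa: dict) -> float:
    return coa["p"] * coa["win"] + (1 - coa["p"]) * coa["lose"]

# Heuristic from the doctrine quoted above: pick the COA with the
# "highest probability of success," ignoring absolute stakes.
best_by_relative_odds = max(courses_of_action,
                            key=lambda k: courses_of_action[k]["p"])

# Decision-theoretic comparison: weigh each probability against the stakes.
best_by_expected_value = max(courses_of_action,
                             key=lambda k: expected_value(courses_of_action[k]))

print(best_by_relative_odds)   # COA-A: more likely to succeed...
print(best_by_expected_value)  # COA-B: ...but COA-A's expected value is
                               # 0.35*100 + 0.65*(-200) = -95, versus +10 for COA-B
```

The heuristic and the decision-theoretic comparison disagree precisely because the heuristic never asks what the probabilities are in absolute terms, or what is at stake if the favored option fails.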
The terrorism advisory system of the Department of Homeland Security
(DHS) offers another example of how relative probability provides limited and
potentially misleading information. Starting in 2003, the system involved five
color-coded threat levels: red (severe risk), orange (high risk), yellow (elevated/
significant risk), blue (guarded/general risk), and green (low risk). Since none
of these terms received a clear definition, the primary way in which DHS used
the system to communicate information was by shifting the color-coded threat
levels over time. These shifts essentially instructed citizens to adjust their prior
assumptions about the risk of terrorism up or down. But since most people al-
ready hold dramatically inflated perceptions about what the risk of terrorism
entails, raising official threat levels may have only created unnecessary alarm.28
Faced with mounting critiques, the DHS scrapped this framework in 2011. The
department’s new National Terrorism Advisory System provides information
about specific threats, whose risk levels are classified into three tiers (“elevated,”
“intermediate,” and “imminent”), none of which receives a definition that maps
to a discernible segment of the number line.29
Relative probability plays a prominent role in intelligence analysis, too. For
example, the most widely taught “structured analytic technique” for assessing
uncertainty in intelligence is called analysis of competing hypotheses
(ACH).30 ACH instructs analysts to create a matrix in which the columns
represent plausible conclusions and the rows represent pieces of available
evidence. Analysts populate this matrix by indicating the extent to which each
piece of evidence is inconsistent with each hypothesis. When the analysis is
complete, the hypothesis with the lowest inconsistency score is considered to
be the most likely conclusion.
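The bookkeeping behind ACH is mechanical enough to sketch in a few lines of code. The sketch below is a simplified, hypothetical rendering of the matrix just described, not a reproduction of any official ACH software; the hypotheses, evidence, and scores are invented for illustration.

```python
# Simplified sketch of an ACH matrix. Columns are hypotheses, rows are
# pieces of evidence, and each cell records how inconsistent the evidence
# is with the hypothesis (0 = consistent; higher = more inconsistent).

hypotheses = ["H1: active program", "H2: dormant program", "H3: no program"]
inconsistency = [
    # H1   H2   H3
    [0,    1,   2],   # E1: procurement of dual-use equipment
    [0,    0,   1],   # E2: evasive behavior toward inspectors
    [1,    1,   0],   # E3: no confirmed production sites found
]

# ACH treats the hypothesis with the lowest total inconsistency score
# as the most likely conclusion.
totals = [sum(row[j] for row in inconsistency) for j in range(len(hypotheses))]
for score, hypothesis in sorted(zip(totals, hypotheses)):
    print(score, hypothesis)

# The output ranks H1 first (total score 1), but the ranking never says
# HOW probable the leading hypothesis is -- the limitation discussed below.
```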
ACH combats confirmation bias by ensuring that analysts consider alternative
judgments and at the same time fosters a falsificationist mindset that prevents
jumping to conclusions. These goals are undeniably important. Yet identifying
28. Rose McDermott and Philip G. Zimbardo, “The Psychological Consequences of Terrorist
Alerts,” in Bruce Bongar et al., eds., Psychology of Terrorism (New York: Oxford University Press,
2007), pp. 357–370; and Jacob N. Shapiro and Dara Kay Cohen, “Color Blind: Lessons from the
Failed Homeland Security Advisory System,” International Security, Vol. 32, No. 2 (Fall 2007), pp.
121–154.
29. For example, an “elevated” alert “warns of a credible terrorism threat,” and an “imminent”
alert “warns of a credible, specific, and impending terrorism threat.” See Department of Homeland
Security, “National Terrorism Advisory System (NTAS),” https://www.dhs.gov/national-terrorism-
advisory-system, accessed August 6, 2018.
30. Richards Heuer and Randolph Pherson, Structured Analytic Techniques for Intelligence Analysis
(Washington, D.C.: CQ Press, 2010), pp. 160–169.
the most probable conclusion says little about what that probability entails. Just
as a strategy that offers the best chances of success might not be worth pursuing,
the most likely conclusion is not necessarily worth betting on.31
This was indeed the heart of the critiques we encountered earlier regarding in-
telligence on Iraq’s weapons of mass destruction programs. Though the National
Intelligence Estimate may have reflected the most plausible interpretation of the
available evidence, it was still important to assess the probability that this inter-
pretation was true. Highlighting the presence of this uncertainty could then have
provoked important questions about why existing evidence about Iraq’s weapons
programs could not sustain more conclusive judgments. As Morell describes it,
The way analysts talked and wrote about their judgments would have
led anyone to think that it was a slam dunk—that is, that Saddam de-
finitively had active WMD programs. No one ever said to me, [CIA
Deputy Director for Intelligence Jami] Miscik, [Deputy Director of
Central Intelligence John] McLaughlin, [CIA Director George] Tenet,
[Secretary of State Condoleezza] Rice, or the president, “You know,
there is a chance he might not have them.” Such a statement would have
gotten everyone’s attention.32
Conditioning
A third way that assessments of uncertainty can be too vague to support sound
foreign policy decisions involves a practice that I call conditioning. This occurs
when analysts describe the conditions that must hold in order for a statement to
be true. Conditioning is the least informative approach to assessing uncertainty
described in this chapter. The logic of conditioning states that if a statement is
true, then an assumption must hold. Note, however, that this logic does not flow
in the opposite direction. If the assumption holds, this implies nothing about the
probability that the statement is true.
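The asymmetry can be stated formally. Writing $S$ for the statement and $A$ for the assumed condition (notation introduced here for illustration), the logic of conditioning asserts only that

$$S \Rightarrow A, \quad\text{which implies}\quad P(A) \geq P(S),$$

so the assumption merely caps the statement's probability from above. By Bayes' rule, with $P(A \mid S) = 1$,

$$P(S \mid A) \;=\; \frac{P(A \mid S)\,P(S)}{P(A)} \;=\; \frac{P(S)}{P(A)},$$

which can lie anywhere between zero and one depending on the unassessed prior $P(S)$. Learning that the condition holds therefore says nothing, by itself, about how likely the statement is.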
31. Indeed, there is remarkably little systematic evidence that ACH improves judg-
mental accuracy. See David R. Mandel, Christopher W. Karvetski, and Mandeep K. Dhami, “Boosting
Intelligence Analysts’ Judgmental Accuracy: What Works, What Fails?” Judgment and Decision
Making, forthcoming; Welton Chang and Philip Tetlock, “Rethinking the Training of Intelligence
Analysts,” Intelligence and National Security, Vol. 31, No. 6 (2016), pp. 903–920.
32. Morell, Great War of Our Time, p. 103. Note how this statement implies that the proper way
to assess the impact of the Iraq NIE is not to speculate about what would have happened if the NIE
had never been published, but rather to explore what might have happened if the NIE had been
researched and written in a manner that gave higher priority to the importance of assessing and
communicating uncertainty.
One vivid example of conditioning involves recent debates over force sizing
for counterinsurgency. In 2006, the U.S. military released a new counterinsur-
gency doctrine known as Field Manual 3-24 (FM 3-24).33 Here is the guidance
that FM 3-24 offered for sizing forces in counterinsurgency wars:
33. The manual’s formal title was U.S. Army Field Manual 3-24/Marine Corps Warfighting
Publication 3-33.5, Counterinsurgency.
34. Field Manual 3-24, Counterinsurgency, para. 1–67.
35. This is highly dubious: see Jeffrey A. Friedman, “Manpower and Counterinsurgency: Empirical
Foundations for Theory and Doctrine,” Security Studies, Vol. 20, No. 4 (2011), pp. 556–591.
36. Peter Mansoor, Surge (New Haven, Conn.: Yale University Press, 2013), ch. 2.
In principle, one could offer a clear probability assessment and identify the
conditions that must hold for a statement to be true. In practice, however, many
scholars and practitioners argue that foreign policy analysts should limit their
focus to the latter of those tasks. Intelligence scholars call this the “linchpins
and drivers” approach to foreign policy analysis.37 The basic idea behind this ap-
proach is that analysts should lay out the key assumptions that are required to
sustain a conclusion under uncertainty, but then leave it to readers to assess those
assumptions’ collective plausibility. Some intelligence estimates explicitly define
their scope along these lines. For example, the National Intelligence Council’s
Global Trends 2025 report opened by explaining, “The study as a whole is more
a description of the factors likely to shape events than a prediction of what will
actually happen. By examining a small number of variables that we judge prob-
ably will have a disproportionate influence on future events and possibilities,
the study seeks to help readers to recognize signposts indicating where events
are headed.” Global Trends 2030 similarly describes its aim as being to identify
“megatrends” and “game-changers” that could potentially lead to “alternative
worlds,” but not to make predictions about the chances that those worlds will
actually materialize.
Yet analysts cannot speak about the future without engaging in some form
of probabilistic reasoning. There is little practical value in analyzing events that
truly have no chance of taking place. And since analysts cannot devote equal text
to describing every global trend or possible future that they can imagine, those
judgments depend—however implicitly—on the premise that these statements
are sufficiently likely to deserve the reader’s attention. Leaving these judgments
vague only defers the challenge of assessing uncertainty from analysts to de-
cision makers, who generally have much less time, expertise, and incentive to
make accurate judgments.38
Relative probability and conditioning are important because they dem-
onstrate how foreign policy officials’ aversion to probabilistic reasoning runs
much deeper than semantics. The problem is not just that analysts and decision
makers are reluctant to assess uncertainty in clear and structured ways. Indeed,
37. Douglas J. MacEachin, “The Tradecraft of Analysis,” in Roy Godson, Ernest R. May, and Gary
Schmitt, eds., U.S. Intelligence at the Crossroads: Agendas for Reform (Washington, D.C.: Brassey’s,
1995); Jack Davis, A Compendium of Analytic Tradecraft Notes (Washington, D.C.: Central Intelligence
Agency, 1997); and Thomas Fingar, Reducing Uncertainty: Intelligence Analysis and National Security
(Stanford, Calif.: Stanford Security Studies, 2011), pp. 53–74.
38. As Richard Betts puts it, “The greater the ambiguity, the greater the impact of preconceptions.”
Richard K. Betts, Surprise Attack: Lessons for Defense Planning (Washington, D.C.: Brookings, 1982),
p. 103. Joshua Rovner, Fixing the Facts: National Security and the Politics of Intelligence (Ithaca,
N.Y.: Cornell University Press, 2011), further explains how decision makers can exploit judgmental
ambiguity for political ends.
this section has shown how intelligence agencies and military organizations de-
liberately develop heuristics that allow analysts and decision makers to avoid
grappling with crucial probabilities entirely.
It is also important to note that while all of the examples described in
this chapter come from official government sources, similar problems recur
throughout broader foreign policy discourse. Recent works by Daniel Gardner,
Chaim Kaufmann, John Mueller, Nate Silver, Mark Stewart, and Philip Tetlock,
among others, document a range of strategies that foreign policy commentators
use to appear insightful without actually making falsifiable claims.39 These
strategies include focusing excessive attention on hypothetical scenarios or
worst-case outcomes, defending policies based on desirability rather than fea-
sibility, and lacing statements with vague caveats and contingencies. In many
cases, it is virtually impossible to define the outcomes that analysts are assessing,
let alone to understand how probable analysts think these outcomes might be.
Here, too, the problem is not just that foreign policy analysts use imprecise lan-
guage to express their beliefs, but that they avoid addressing the uncertainty that
surrounds high-stakes choices.
The remainder of the chapter shows how these problems surround the highest
levels of foreign policy decision making. To do this, I present a mixed-method
case study that examines Vietnam War decision making, focusing on how senior
leaders assessed the risks of escalating U.S. involvement in Vietnam between
1961 and 1965. I devote special attention to this case for four reasons.
The first of these reasons is that, if any group of leaders should have been
expected to evaluate the logical foundations of high-stakes choices in clear and
structured ways, it would have been the group that shaped Vietnam War policy
from 1961 to 1965. President Kennedy appointed Robert McNamara as sec-
retary of defense based on his reputation for conducting rigorous cost-benefit
analyses as an executive at Ford Motor Company. McNamara staffed his office
with the so-called Whiz Kids, who possessed an unusual zeal for quantitative
analysis.40 The Whiz Kids found a receptive audience among the Kennedy
administration’s roster of public intellectuals, including National Security
Advisor McGeorge Bundy, a political scientist who had previously served as
39. Daniel Gardner, Future Babble (New York: Plume, 2011); Chaim Kaufmann, “Threat Inflation
and the Failure of the Marketplace of Ideas,” International Security, Vol. 29, No. 1 (Summer 2004),
pp. 5–48; Nate Silver, The Signal and the Noise (New York: Penguin, 2012); Philip Tetlock and Daniel
Gardner, Superforecasting (New York: Crown, 2015); John Mueller, Overblown: How Politicians and
the Terrorism Industry Inflate National Security Threats, and Why We Believe Them (New York: Free
Press, 2006); John Mueller and Mark G. Stewart, Chasing Ghosts: The Policing of Terrorism
(New York: Oxford University Press, 2016).
40. David Halberstam, The Best and the Brightest (New York: Penguin, 1983); John A. Byrne, The
Whiz Kids (New York: Doubleday, 1993).
dean of Harvard’s Faculty of Arts and Sciences. If even these leaders assessed
uncertainty in the problematic ways that this chapter describes, that would lend
credibility to the notion that these are consistent flaws in foreign policy decision
making.41
Debates about escalating the Vietnam War also provide exactly the kind of
context in which we would expect rational decision makers to assess uncertainty
as carefully as possible. U.S. officials knew that all available strategies in Vietnam
presented serious risks. They had genuine doubts about whether those risks
were worth accepting, and several years to develop opinions on this issue. In
this sense, the escalation of U.S. involvement in Vietnam presents a case where
leaders should have had both the motivation and the opportunity to study their
strategic options in detail.42
The documentary record of Vietnam War decision making is, furthermore,
unusually thorough. Collections such as the Foreign Relations of the United States,
the Pentagon Papers, and the National Intelligence Council’s Estimative Products
on Vietnam provide scholars with a broad base of well-known, widely available,
trusted primary sources. We will see how these documents contain candid
discussions of U.S. strategy, as public officials did not yet anticipate that sensi-
tive opinions voiced in private would consistently leak to the press.43 Debates
over Vietnam War strategy thus present a case where assessments of uncertainty
were especially important, where decision makers were especially capable of
confronting this challenge directly, and where transparent discussions of those
issues should be especially accessible to scholars.
Finally, Vietnam War decision making is important because it relates so
closely to the theoretical frameworks that international relations scholars use to
study coercion. During this period, international relations theorists developed
the contemporary view that the primary purpose of coercion is to influence how
opponents perceive their strategic prospects.44 Secretary McNamara himself
wrote in 1965 that the basic goal of military operations in Vietnam was “to create
41. For a similar logic of selecting cases as a window into the (ir)rationality of foreign policy de-
cision making, see Brian Rathbun, “The Rarity of Realpolitik: What Bismarck’s Rationality Reveals
about International Politics,” International Security, Vol. 43, No. 1 (2018), pp. 7–55.
42. Leslie Gelb and Richard K. Betts, The Irony of Vietnam: The System Worked (Washington,
D.C.: Brookings, 1979); Fredrik Logevall, Choosing War: The Lost Chance for Peace and the Escalation
of War in Vietnam (Berkeley, Calif.: University of California Press, 1999).
43. Yuen Foong Khong, Analogies at War: Korea, Munich, Dien Bien Phu, and the Vietnam Decisions
of 1965 (Princeton, N.J.: Princeton University Press, 1992).
44. Prominent articulations of this view include Thomas Schelling, Arms and Influence (New
Haven, Conn.: Yale University Press, 1966); Robert A. Pape, Bombing to Win: Air Power, Coercion,
and War (Ithaca, N.Y.: Cornell University Press, 1996); and Branislav Slantchev, “The Principle of
Convergence in Wartime Negotiations,” American Political Science Review, Vol. 97, No. 4 (2003), pp.
621–632.
45. McNamara to Johnson, Foreign Relations of the United States, 1964–1968, Vol. III, Doc 38
(1 July 1965). Hereafter cited as FRUS. On U.S. officials’ conception of coercion in Vietnam, see
Wallace J. Thies, When Governments Collide: Coercion and Diplomacy in the Vietnam Conflict, 1964–
1968 (Berkeley, Calif.: University of California Press, 1980).
Vietnamese President Ngo Dinh Diem was rapidly losing public support.
President Kennedy thus asked Secretary McNamara to write a report analyzing
how the United States could respond to this growing crisis.
Secretary McNamara presented his views to the president in November 1961.
He wrote, “The chances are against, probably sharply against, preventing [the fall
of South Vietnam] by any measures short of the introduction of U.S. forces on
a substantial scale.” McNamara rejected proposals for limited escalation, saying
that if the Kennedy administration introduced just eight to ten thousand ad-
ditional soldiers, then they would be “almost certain to get increasingly mired
down in an inconclusive struggle.” McNamara recommended sending up to
220,000 troops to Vietnam. But he never assessed the chances that the United
States would succeed in this effort, writing only that large-scale commitments
were necessary to keep Saigon from falling to the Communists.46
Kennedy had solicited a second report on the growing crisis in Vietnam,
which was coauthored by General Maxwell Taylor and the deputy national se-
curity adviser, Walt Rostow. They argued that “morale in Viet-Nam will rapidly
crumble” without “a hard U.S. commitment to the ground.” But when Taylor
and Rostow recommended that the United States commit combat forces to
stop the insurgency’s momentum, they indicated only that this initiative was
more likely to succeed than the alternatives, explaining that “intervention
under SEATO [Southeast Asian Treaty Organization] or U.S. plans is the best
means of saving SVN [South Vietnam] and indeed, all of Southeast Asia.”47
Employing a similar emphasis on relative probability, the Joint Chiefs of Staff
argued that defeating the Viet Cong insurgency required expanding U.S. in-
volvement and that “the over-all objective could best be served” by attacking
North Vietnam directly.48
These documents established a pattern that recurred throughout debates
about Vietnam War strategy from 1961 to 1965. When U.S. officials evaluated
existing strategies, they candidly acknowledged that the war’s current tra-
jectory offered little chance of success. But when they turned their focus to
recommending new strategies, their probabilistic reasoning shifted as well.
Instead of directly assessing the odds that their recommendations would work,
U.S. officials typically justified new measures through relative probability or
46. FRUS, 1961–1963, Vol. I, Doc 214 (5 November 1961). Also see ibid., Doc 222, indicating
concurrence from the secretary of state, Dean Rusk, and the Joint Chiefs of Staff.
47. FRUS, 1961–1963, Vol. I, Doc 210 (3 November 1961); and Pentagon Papers, Vol. V.B.4,
pp. 331–342. Taylor and Rostow called for 6,000–8,000 combat troops to be sent to Vietnam, while
acknowledging that there would be “no limit to our possible commitment” once U.S. forces were
committed in this way.
48. JCS to McNamara, Pentagon Papers, Vol. V.B.4, p. 288 (9 October 1961).
conditioning, saying that new policies offered the best chance of success or that
they were necessary to prevent Saigon’s collapse. These forms of probabilistic
reasoning were not just semantically imprecise—they reflected a deeper lack
of engagement with the uncertainty surrounding strategic choices that would
come to dominate U.S. foreign policy for the next decade.
This method of analysis was not limited to proponents of military escalation.49
In several letters to President Kennedy, for instance, foreign policy adviser John
Kenneth Galbraith denounced military measures for their “high risk and lim-
ited promise,” arguing that “there is scarcely the slightest practical chance” that
President Diem would implement the needed reforms. There was “no chance
of success,” Galbraith concluded, if the United States continued to back Diem
and his family. Galbraith thus recommended that Washington shift its support
to other Vietnamese leaders without expanding its military commitments. But
beyond arguing that this course was “not hopeless,” Galbraith did not evaluate
the odds that his proposal would work—he simply wrote that it was “the only
solution” worth considering.50 The chairman of the Joint Chiefs of Staff, Lyman
Lemnitzer, rebutted Galbraith using the same style of argumentation, writing
that “the President’s policy of supporting the Diem regime . . . appears to be the
only practicable alternative at this time,” without explaining how practicable the
policy actually was.51
Again, the problem with this debate was not just that U.S. officials articulated
their proposals in an imprecise manner that made it difficult to determine which
argument was more likely to be correct. It is indeed plausible that both sides of
this debate offered valid arguments: the United States may have had little chance
of creating a stable noncommunist South Vietnam through political measures
or through military measures, with or without the Diem regime. Focusing on
which measure was the most likely to succeed concealed deeper uncertainty
about whether the United States was pursuing feasible objectives.52 Secretary
McNamara himself later wrote that failing to address this uncertainty was one
of the key factors that led the United States into the Vietnam quagmire. In his
memoirs, McNamara acknowledged that “we never carefully debated what
49. See Logevall, Choosing War, on how Ball and other opponents of Americanization rarely went
into depth in articulating alternative proposals.
50. FRUS, 1961–1963, Vol. I, Doc 209 (3 November 1961); Galbraith to Kennedy, Pentagon
Papers, Vol. V.B.4, pp. 406–407 (20 November 1961); Galbraith to Kennedy, ibid., pp. 414–416 (21
November 1961).
51. Lemnitzer to McNamara, Pentagon Papers, Vol. V.B.4, p. 465 (13 April 1962).
52. Aaron Rapport, Waging War, Planning Peace: U.S. Noncombat Operations in Major Wars (Ithaca,
N.Y.: Cornell University Press, 2015), argues that this reflects a broader tendency for foreign policy
decision makers to base high-stakes decisions on the desirability of their objectives rather than the
feasibility of their methods.
U.S. force would ultimately be required, what our chances of success would be,
or what the political, military, financial, and human costs would be if we pro-
vided it. Indeed, these basic questions went unexamined. We were at the begin-
ning of a slide down a tragic and slippery slope.”53
53. Robert S. McNamara with Brian VanDeMark, In Retrospect: The Tragedy and Lessons of Vietnam
(New York: Times Books, 1995), p. 107.
54. McG. Bundy to Johnson, FRUS, 1964–1968, Vol. II, Doc 84 (7 February 1965).
55. Taylor to State Department, FRUS, 1964–1968, Vol. II, Doc 93 (9 February 1965).
The Joint Chiefs of Staff played an ambivalent role in this debate.56 They agreed
that the United States could not succeed in Vietnam without committing addi-
tional forces. In an October 1964 memorandum, the chairman of the Joint Chiefs
of Staff, Earle Wheeler, stated, “Unless we move now to alter the present evolution
of events, there is great likelihood of VC [Viet Cong] victory.”57 However, the
Joint Chiefs believed that civilian leaders underestimated the importance of di-
rectly attacking Hanoi. Wheeler thus explained: “The military course of action
which would contribute most to defeating insurgencies in Southeast Asia remains
the destruction of the DRV will and capabilities as necessary to compel the DRV
to cease providing support to those insurgencies.” Wheeler repeatedly backed
this position with relational judgments, arguing that bombing North Vietnam
offered “the best probability of success,” the “greatest assurance of success,” and
the “best probability of achieving our objectives.”58 Yet despite warning that ex-
isting U.S. policy would almost surely fail, Wheeler and the Chiefs did not de-
scribe the chances that their preferred strategy might succeed.
As in 1961, opponents of escalating the war also presented recommendations
without assessing their chances of success. The most prominent internal critic of
escalation during this period was the undersecretary of state, George Ball. In a
sixty-seven-page memorandum, Ball argued that his colleagues offered implau-
sible justifications for expanding the war.59 Ball instead recommended that the
White House attempt to negotiate a neutral political status for South Vietnam.
But though Ball wrote that this strategy’s prospects were “at least as good as the
chance of success through military action alone,” he did not describe what those
chances were.60 In his memoirs, Ball admitted that he never tried to “do more
than outline the possibilities of such a settlement.”61 Thus, though Ball advocated
a different strategy than most of the other high-ranking members of the Johnson
administration, he mirrored his colleagues’ style of argumentation, identifying
the risks and logical flaws in other proposals without evaluating the chances that
his own recommendations would succeed.
56. Herbert R. McMaster, Dereliction of Duty: Lyndon Johnson, Robert McNamara, the Joint Chiefs of
Staff, and the Lies That Led to Vietnam (New York: HarperCollins, 1997).
57. Wheeler to McNamara, FRUS, 1964–1968, Vol. I, Doc 388 (27 October 1964). The JCS also
wrote to McNamara saying that “if present trends are not reversed, the counterinsurgency campaign
in South Vietnam will be lost” and that U.S. combat troops were required “to turn the tide of the war.”
JCS to McNamara, FRUS, 1964–1968, Vol. II, Doc 208 (18 March 1965).
58. Wheeler to McNamara, FRUS, 1964–1968, Vol. I, Doc 420 (7 February 1964).
59. George Ball, “How Valid Are the Assumptions Underlying Our Viet-Nam Policies?” 5 October
1964. Lyndon B. Johnson Presidential Library, National Security File, Vietnam Country File, Box 222.
60. FRUS, 1964–1968, Vol. II, Doc 300 (13 May 1965).
61. George Ball, The Past Has Another Pattern (New York: Norton, 1982), p. 383.
62. Thomas C. Thayer, War without Fronts: The American Experience in Vietnam (Boulder,
Colo.: Westview Press, 1985); Gregory Daddis, No Sure Victory: Measuring U.S. Army Effectiveness in
the Vietnam War (New York: Oxford University Press, 2011).
63. On the Vietnam War’s legacy for debates about methodological rigor and military strategy,
see Halberstam, The Best and the Brightest; Ben Connable, Embracing the Fog of War (Santa Monica,
Calif.: Rand, 2012).
64. Daddis, No Sure Victory, p. 121.
65. Gelb and Betts, Irony of Vietnam, p. 125.
66. FRUS, 1964–1968, Vol. III, Doc 69 (21 July 1965).
67. FRUS, 1964–1968, Vol. I, Doc 361 (25 September 1964).
68. Ibid., Doc 335 (31 August 1964).
69. Lansdale to McNamara, Pentagon Papers, Vol. V.B.4, p. 11 (17 January 1961).
70. W. Bundy to Rusk, ibid., p. 312 (10 October 1961).
its breaking point when confronting the might of the United States military.71 Leslie
Gelb and Richard Betts thus titled their famous book on decision making during
this period The Irony of Vietnam: The System Worked. Even though U.S. strategy in
Vietnam ultimately failed, Gelb and Betts argued, it might have still represented
a reasonable, calculated risk.72
Yet to conclude that escalating the Vietnam War reflected reasonable
logic, one must show that decision makers actually attempted to analyze
their choices in a logical, reasonable manner. If McNamara, Bundy, and their
colleagues had carefully considered whether the chances of success were high
enough to justify escalating the war, then critics might have to accept that this
was a subjective matter over which reasonable people could disagree. Instead,
we have seen how senior U.S. officials repeatedly avoided assessing the un-
certainty that surrounded their most important strategic choices and focused
instead on identifying which measures were necessary or which strategies
had the highest probability of success. As Gelb and Betts describe it them-
selves, debates about U.S. strategy in Vietnam “revolved around how to do
things better, and whether they could be done, not whether they were worth
doing.”73
Seen from this perspective, the McNamara Pentagon’s approach to decision
making during the Vietnam War was not characterized by an excess of method-
ological rigor, but rather by a nearly complete absence of debate about the log-
ical foundations of U.S. strategy. In his memoirs, Robert McNamara reflected on
how he of all people should have understood this. “I had spent twenty years as a
manager,” McNamara wrote, “identifying problems and forcing organizations—
often against their will—to think deeply and realistically about alternative
courses of action and their consequences. I doubt I will ever fully understand
why I did not do so here.”74
71. John Mueller, “The Search for the ‘Breaking Point’ in Vietnam: The Statistics of a Deadly
Quarrel,” International Studies Quarterly, Vol. 24, No. 4 (1980), pp. 497–519; Stanley Hoffmann et al.,
“Vietnam Reappraised,” International Security, Vol. 6, No. 1 (1981), p. 9.
72. Gelb and Betts, Irony of Vietnam.
73. Leslie Gelb and Richard K. Betts, “Vietnam: The System Worked,” Foreign Policy, No. 3 (1971),
p. 146. Furthermore, saying that these officials shared a sense of pessimism does not mean that fur-
ther analysis was unwarranted. There are situations in which decision makers might accept a thirty
percent chance of success, but not a five percent chance of success, and certainly not a zero percent
chance of success, even though all of these assessments could be described as pessimistic. When
decision makers are pessimistic about their prospects, this is in fact exactly when it should be most
important to parse probability estimates carefully, because a strategy’s chances of success may be too
low to justify moving forward.
74. McNamara, In Retrospect, pp. 202–203.
75. “About the Foreign Relations of the United States Series,” Historical Documents, Office of
the Historian website, https://history.state.gov/historicaldocuments/about-frus, accessed May
18, 2017.
76. Ibid.
77. The terms were “almost certain,” “cannot rule out,” “chance,” “conceivable,” “doubt,” “doubtful,”
“fifty-fifty,” “improbable,” “inconceivable,” “liable,” “likelihood,” “likely,” “odds,” “percent,” “proba-
bility,” “probable,” “probably,” “prospect,” “remote,” “risk,” and “unlikely.” I chose these search terms
because they appear in other scholars’ empirical studies of verbal uncertainty expressions, specifi-
cally, Kent, “Words of Estimative Probability”; Beyth-Marom, “How Probable Is Probable?”; and
Mosteller and Youtz, “Quantifying Probabilistic Expressions.”
78. I applied three criteria for inclusion in this database. First, I only retained statements that
expressed the beliefs of the document’s author or that were recorded in a memorandum of discus-
sion. Second, I only retained statements that reflected the beliefs of U.S. government officials; though
independent scholars or members of the press appear frequently in FRUS documents, the goal of
this study is to analyze how U.S. officials debated uncertainty. Third, I only included documents and
memoranda intended for an internal audience, on the assumption that public speeches, letters to
President Diem, or other public statements would not reliably reflect officials’ true beliefs.
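A rough sketch of this screening logic in Python follows. The document fields and helper names are hypothetical, introduced only to make the criteria in notes 77 and 78 concrete; the actual FRUS coding described here was performed by hand, not by script.

```python
# Illustrative sketch of the screen described in notes 77 and 78 (hypothetical
# field names; the real coding of the FRUS corpus was done manually).
ESTIMATIVE_TERMS = [
    "almost certain", "cannot rule out", "chance", "conceivable", "doubt",
    "doubtful", "fifty-fifty", "improbable", "inconceivable", "liable",
    "likelihood", "likely", "odds", "percent", "probability", "probable",
    "probably", "prospect", "remote", "risk", "unlikely",
]

def candidate_statements(documents):
    """Yield statements containing an estimative term that also satisfy the
    inclusion criteria: the author's own belief, voiced by a U.S. official,
    in a document intended for an internal audience."""
    for doc in documents:
        if not (doc["author_is_us_official"] and doc["internal_audience"]):
            continue
        for statement in doc["statements"]:
            if any(term in statement.lower() for term in ESTIMATIVE_TERMS):
                yield doc["id"], statement
```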
Note to Table 1.1: Asterisks reflect statistical significance levels when comparing each category to
baseline patterns across the data as a whole, as follows: * = p < 0.05, ** = p < 0.01, *** = p < 0.001.
border nor assessed the probability that this could be accomplished.79 Other
statements convey extremely vague sentiments, such as the argument of Deputy
National Security Advisor Rostow that escalation would produce “a fair chance
of avoiding war,” General Wheeler’s assessment that “victory is now a hopeful
prospect,” or Director of Central Intelligence John McCone’s reporting to
President Johnson that winning the war would be “formidable and difficult, but
not impossible.”80
The database also confirms that U.S. officials consistently provided more
informative probability assessments when critiquing the strategies they op-
posed. Fifty-eight statements in the database involve U.S. officials describing
the current strategy’s prospects or critiquing other officials’ recommendations.
Although only one of these statements could reasonably be considered precise,81
just six (ten percent) involved relative probability or conditioning. Most of these
statements offer at least some rough way to gauge U.S. officials’ pessimism: for
example, stating that “the chances of a turn-around in South Vietnam remain
less than even,” explaining that current U.S. strategy “would probably not save
South Viet-Nam from eventual loss,” or warning that “our present course of ac-
tion . . . is very likely to fail.”82
The database also confirms that the manner in which U.S. officials discussed
their strategic prospects contrasted with how they assessed other aspects of
uncertainty about the war.83 Though these officials almost never made their
probability assessments explicit, just fourteen percent of the statements
across the database as a whole involved either relative probability or condi-
tioning. No other category of statements employed these forms of probabi-
listic reasoning nearly as frequently as when officials described the chances
79. Frederick Nolting, “Telegram from the Embassy in Vietnam to the Department of State,”
FRUS, 1961–1963, Vol. I, Doc 147 (6 October 1961). Similarly, Major General Charles Bonesteel
argued in a meeting of the Presidential Task Force on Vietnam that “if we had a reasonably workable
settlement in Laos, bolstered by the barrier between Laos and Viet-Nam and along the 17th parallel,
we could probably succeed.” FRUS, 1961–1963, Vol. I, Doc 43 (4 May 1961).
80. Rostow to Johnson, FRUS, 1961–1963, Vol. I, Doc 251 (14 November 1961); JCS Team
Report on South Vietnam, FRUS, 1961–1963, Vol. II, Doc 26 (January 1963); McCone to Johnson,
FRUS, 1961–1963, Vol. IV, Doc 375 (23 December 1963).
81. This is a memorandum written by the Defense Department’s Office of International Security
Affairs, which argues that continuing U.S. policy toward President Diem would “almost certainly seal
the fate of Vietnam.” FRUS, 1961–1963, Vol. II, Doc 35 (26 January 1962).
82. McG. Bundy to Johnson, FRUS, 1964–1968, Vol. II, Doc 183 (6 March 1965); Maxwell Taylor,
“The Current Situation in South Vietnam,” FRUS, 1964–1968, Vol. I, Doc 426 (November 1964);
Chester Bowles to Kennedy, “Recommendations for a Fresh Approach to the Vietnam Impasse,”
FRUS, 1961–1963, Vol. III, Doc 52 (7 March 1963).
83. The statistical significance tests reported in Table 1.1 reflect two-tailed t-tests comparing
differences in means.
that their favored strategies would succeed. Altogether, these data support
the chapter’s qualitative description of how U.S. officials were overwhelm-
ingly vague when assessing uncertainty in Vietnam; of how this problem ran
much deeper than the use of imprecise language; and of how assessments of
uncertainty were least informative when it came to proposals for escalating
the war.
This experience motivates a broader concern that will run throughout the
remainder of the book. In most areas of policy and scholarship, it is generally
assumed that rigor, objectivity, and analytic precision go together.84 Yet that is
not the case when it comes to probabilistic reasoning in international politics.
Any attempt to assess uncertainty in this domain is inherently subjective, and
that is indeed one of the main reasons why foreign policy analysts are typically
reluctant to describe their judgments explicitly. The Whiz Kids’ experience in
Vietnam highlights the dangers of failing to grapple with this challenge—of
privileging objectivity over rigor in foreign policy analysis. But if the most im-
portant elements of uncertainty in international politics are inherently subjec-
tive, then what would rigorously analyzing these issues look like? How well
can fallible individuals handle this challenge? And how can decision makers
incorporate these judgments into real foreign policy decisions? These are the
questions that occupy the rest of the book.
84. For a history of debates about rigor, objectivity, and quantification in public policy analysis,
see Theodore Porter, Trust in Numbers: The Pursuit of Objectivity in Science and Public Life (Princeton,
N.J.: Princeton University Press, 1995).
2
Subjective Probability
(such as judging a foreign leader’s intentions) are essentially impossible to resolve, and should thus
be considered mysteries. Meanwhile, some kinds of aleatory uncertainty (such as estimating the
probability that a coin flip will come up heads) do yield “correct” answers, and should therefore
be considered puzzles. On the distinction between mysteries and puzzles, see Wilhelm Agrell and
Gregory F. Treverton, National Intelligence and Science: Beyond the Great Divide in Analysis and Policy
(New York: Oxford University Press, 2015), pp. 32–35.
5. Richard von Mises, Probability, Statistics and Truth (Mineola, N.Y.: Dover Publications, 1981
[1928]).
6. von Mises, Probability, Statistics, and Truth, pp. 11–15.
be unable to measure.7 These are all subjective judgments that reflect epistemic
uncertainty about the quality of a statistical model’s assumptions, rather than
objective properties of randomness.8 In this sense, even if the statistical methods
used in foreign policy analysis are often described in terms of “frequentist statis-
tics,” they are properly viewed as analogies to the frequentist paradigm, rather
than strict applications of that conceptual framework.9
7. The subjective nature of determining what it means for cases to be “comparable” in this fashion
is known as the reference class problem. See John Venn, The Logic of Chance (London, U.K.: Macmillan,
1888); Hans Reichenbach, The Theory of Probability, tr. Ernest H. Hutten and Maria Reichenbach
(Berkeley, Calif.: University of California Press, 1949).
8. Charles F. Manski, Public Policy in an Uncertain World: Analysis and Decision (Cambridge,
Mass.: Harvard University Press, 2013), explains that statistical probability estimates present an
incomplete (and usually narrower) picture of the overall uncertainty that surrounds empirical
inference.
9. In other words, the use of frequentist statistics in foreign policy analysis is properly under-
stood as a tool for generating subjective probability estimates. It is, then, an empirical question as
to whether those statistical methods are more reliable than other forms of subjective judgment. For
more on how frequentist statistics rarely adhere to the frequency theory of probability itself, see
Harold Jeffreys, “Probability and Scientific Method,” Proceedings of the Royal Society of London A,
Vol. 146, No. 856 (1933), pp. 9–16; E. T. Jaynes, Probability Theory: The Logic of Science, ed. G. Larry
Brethorst (New York: Cambridge University Press, 2003).
10. The propensity theory originates with Karl Popper, “The Propensity Interpretation of
Probability,” British Journal for the Philosophy of Science, Vol. 10, No. 37 (1959), pp. 25–52.
at Abbottabad, he was not implying that some mechanical process had ran-
domly assigned bin Laden to this location six times out of ten. And though for-
eign policy analysts can use physical models and simulation techniques to guide
some elements of policy design, no one would claim that these analyses depict
the actual properties of social systems. Even the most sophisticated models of
armed conflict and international politics reflect extreme simplifications of reality.
Interpreting these models’ results thus requires making subjective judgments
about the appropriateness of simplifying assumptions.11
Moreover, even if foreign policy analysts could create a deterministic model
of international politics, that would still be insufficient to make objective
predictions, because many of the model’s key parameters would remain un-
certain. When waging wars or bargaining over contentious issues, leaders have
incentives to conceal their intentions, capabilities, and resolve. Since any model
of conflict or cooperation should account for these variables, those models’
output will reflect epistemic uncertainty about underlying conditions.12 This is
another reason why even the most sophisticated piece of foreign policy analysis
could never meet the propensity theory’s standards of objectivity.13
Of course, there is no need to invoke probability theory just to understand the
difficulties of predicting international politics. The purpose of this discussion is
11. On the limits to objectivity in military modeling in particular, see Joshua M. Epstein, Strategy
and Force Planning (Washington, D.C.: Brookings, 1987); Thomas J. Czerwinski, Coping with
the Bounds: Speculations on Nonlinearity in Military Affairs (Washington, D.C.: National Defense
University Press, 1998); Michael E. O’Hanlon, The Science of War (Princeton, N.J.: Princeton
University Press, 2009).
12. On the pervasiveness of this uncertainty, see Robert Jervis, System Effects: Complexity in
Political and Social Life (Princeton, N.J.: Princeton University Press, 1997); John Mearsheimer, The
Tragedy of Great Power Politics (New York: Norton, 2001), pp. 29–54; and Kristopher W. Ramsay,
“Information, Uncertainty, and War,” Annual Review of Political Science, Vol. 20 (2017), pp. 505–527.
For practitioners’ views of this subject, see James N. Mattis, “USJFCOM Commander’s Guidance
for Effects-Based Operations,” Parameters, Vol. 38, No. 3 (2008), pp. 18–25; and Paul K. Van Riper,
“EBO: There Was No Baby in the Bathwater,” Joint Forces Quarterly, No. 52 (2009), pp. 82–85. On the
limits of academic forecasting in international politics, see Gerald Schneider, Nils Petter Gleditsch,
and Sabine Carey, “Forecasting in International Relations,” Conflict Management and Peace Science,
Vol. 28, No. 1 (2011), pp. 5–14; and Michael D. Ward, “Can We Predict Politics?” Journal of Global
Security Studies, Vol. 1, No. 1 (2016), pp. 80–91.
13. Furthermore, even if foreign policy analysts could somehow develop a purely objective prob-
ability assessment, most decision makers would be unable to know that this was the case. Decision
makers do not possess the time or expertise to scrutinize every data point and assumption behind
the estimates that analysts give them. Indeed, the purpose of delegating analytic tasks is so that
decision makers do not need to scrutinize these details. But since these minutiae matter, any time
decision makers solicit input in forming judgments under uncertainty, their reactions to that input
depend on the subjective trust that they place in their analysts for the purposes of understanding a
specific issue.
instead to place those difficulties into proper theoretical perspective. When for-
eign policy analysts attempt to assess the uncertainty that surrounds high-stakes
decisions, their judgments should not be viewed as unreliable attempts to esti-
mate frequencies or propensities. Logically speaking, these judgments do not re-
flect frequencies or propensities at all—even if foreign policy analysts base their
assessments of uncertainty on the output of simulations or statistical analyses,
their conclusions always reflect some measure of subjective belief. The next part
of this section explains how the logic of translating subjective beliefs into prob-
ability assessments is entirely different from the manner in which analysts typi-
cally interpret output from statistical analyses or mechanical models.
14. The Italian economist Bruno de Finetti and the British philosopher Frank P. Ramsey developed
the theory of subjective probability independently. See Frank P. Ramsey, “Truth and Probability,”
in The Foundations of Mathematics and Other Logical Essays (New York: Harcourt, Brace, 1931);
and Bruno de Finetti, “Probabilism: A Critical Essay on the Theory of Probability and the Value
of Science,” Logos, Vol. 14 (1931), pp. 163–219; reprinted in Erkenntnis, Vol. 31, No. 2/3 (1989),
pp. 169–223. The American mathematician Leonard J. Savage then extended the formal logic behind
this framework in The Foundations of Statistics (New York: Wiley, 1954).
15. And that you would be unwilling to bet on any other outcome, regardless of the payoff. Gillies,
Philosophical Theories of Probability, pp. 55–58, explores conceptual nuances of the comparison of
lotteries (or “betting quotient”) method for defining what probability assessments represent.
16. Frank Lad, Operational Subjective Statistical Methods (New York: Wiley, 1996), pp. 8–9.
logic applies to interpreting the “chances” that Osama bin Laden was living at
Abbottabad. When President Obama’s advisers estimated those chances, their
degrees of personal conviction reportedly ranged from thirty percent to ninety-
five percent, even though everyone knew that the “real” probability of bin Laden
living at Abbottabad was either zero percent or one hundred percent.
Many people find it counterintuitive to equate convictions with probabilities,
and laboratory experiments confirm that subjects tend to treat these concepts dif-
ferently. For example, if given the opportunity to bet on calling the outcome of a fair
coin versus calling the outcome of a coin with an unknown bias, most people will
choose the first of these gambles. Some people will even pay nontrivial amounts to
ensure that they can bet on less-ambiguous probabilities.
This behavior is known as ambiguity aversion, and it is easy to demonstrate that
ambiguity aversion is misguided.17 For example, you could simply randomize
whether you predict that the coin will land heads or tails. In that case, you would
guarantee yourself a fifty percent chance of winning the bet regardless of whether
the coin you flip is fair or imbalanced. If anything, it is actually in your interest to
flip the coin with an unknown bias, because this would provide potentially valuable
information about how that coin is weighted. If you ever have a chance to take this
gamble again, you could pick the outcome that you observed from the last time you
flipped this coin, and your expected probability of winning the bet would then be
greater than fifty percent.18
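A short simulation illustrates both claims. The uniform prior over the unknown coin's bias is an assumption added here for concreteness; the randomization argument itself holds whatever the bias may be.

```python
# Simulation of the two betting strategies discussed above (illustrative only).
import random

TRIALS = 100_000

def flip(p):
    """One flip of a coin that lands heads with probability p."""
    return "H" if random.random() < p else "T"

randomized_wins = 0   # call heads/tails at random on the unknown-bias coin
informed_wins = 0     # observe one flip first, then call that same outcome

for _ in range(TRIALS):
    p = random.random()  # the coin's unknown bias, drawn uniformly (assumed)
    # Randomizing the call wins exactly half the time, whatever p is.
    if random.choice(["H", "T"]) == flip(p):
        randomized_wins += 1
    # Calling the previously observed outcome wins whenever two flips match.
    if flip(p) == flip(p):
        informed_wins += 1

print(f"randomized call: {randomized_wins / TRIALS:.3f}")  # ~0.500
print(f"call last flip:  {informed_wins / TRIALS:.3f}")    # ~0.667
```

Under this illustrative prior, the probability that two flips match is $E[p^2 + (1-p)^2] = 2/3$, which is why betting on the ambiguous coin, far from being something to pay to avoid, carries a learning premium.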
Another source of discomfort with the subjective probability framework
is that it conflicts with the desire to base high-stakes choices on objective
judgments.19 Objectivity is in fact a formal requirement in many areas of for-
eign policy analysis. For example, the very first requirement laid out in the U.S.
Intelligence Community’s analytic standards is that “analysts must perform their
functions with objectivity.”20 U.S. Army planning doctrine similarly revolves
17. The term ambiguity aversion was coined by Daniel Ellsberg, the economist who later went on to
leak the Pentagon Papers. See Daniel Ellsberg, “Risk, Ambiguity, and the Savage Axioms,” Quarterly
Journal of Economics, Vol. 75, No. 4 (1961), pp. 643–669. In some situations ambiguity can serve as
a signal of other decision-relevant factors, but this does not change the fact that ambiguity regarding
probability estimates is not, by itself, a component of rational decision making. See Nabil al-Najjar
and Jonathan Weinstein, “The Ambiguity Aversion Literature: A Critical Assessment,” Economics and
Philosophy, Vol. 25, No. 3 (2009), pp. 249–284.
18. Stefan T. Trautmann and Richard Zeckhauser, “Shunning Uncertainty: The Neglect of Learning
Opportunities,” Games and Economic Behavior, Vol. 79 (2013), pp. 44–55.
19. For a history of debates about rigor, objectivity, and quantification in public policy analysis,
see Theodore Porter, Trust in Numbers: The Pursuit of Objectivity in Science and Public Life (Princeton,
N.J.: Princeton University Press, 1995).
20. Intelligence Community Directive 203 (ICD-203), Analytic Standards, p. 2. U.S. law actually
mandates that the Intelligence Community develop institutional mechanisms to “safeguard objec-
tivity.” See the 2004 Intelligence Reform and Terrorism Prevention Act, Section 1020A. Recent
survey research indicates that intelligence analysts indeed prioritize “objectivity” over ICD-203’s
other analytic standards: see David R. Mandel and Tonya Hendriks, “ODNI’s ICD-203 Standards
on Analytic Integrity,” Paper presented at the 7th Biannual Meeting of the NATO Systems Analysis
and Studies Panel Research Technical Group on the Assessment and Communication of Uncertainty
in Intelligence to Support Decision-Making (Madrid, Spain, 2018).
21. U.S. Army Field Manual 5-0, The Operations Process (2010), p. H-4.
22. Mark Lowenthal, Intelligence: From Secrets to Policy, 3rd edition (Washington, D.C.: CQ Press,
2006), p. 129. In a similar vein, the former director of national intelligence James Clapper reflected
on the debate about the chances that Osama bin Laden was living at Abbottabad, saying: “We put a
lot of discussion [into] percentages of confidence, which to me is not particularly meaningful. In the
end it’s all subjective judgment anyway.” CNN, “The Axe Files,” Podcast Ep. 247 (31 May 2018).
23. Carl von Clausewitz, On War, tr. Michael Howard and Peter Paret (Princeton, N.J.: Princeton
University Press, 1984); and Clifford J. Rogers, “Clausewitz, Genius, and the Rules,” Journal of
Military History, Vol. 66, No. 4 (2002), pp. 1167–1176.
24. The rejection of Enlightenment rationality was indeed crucial to Clausewitz’s intellectual
project. On this point, see Azar Gat, The Origins of Military Thought: From the Enlightenment to
Clausewitz (Oxford, U.K.: Clarendon Press, 1989). On the distinction between subjectivity and
deliberativeness in foreign policy analysis more generally, see Philip E. Tetlock, “Second Thoughts
about Expert Political Judgment,” Critical Review, Vol. 22, No. 4 (2010), pp. 467–488. In a similar
vein, psychologists have produced a large volume of research showing that unconscious reasoning
can improve decision outcomes. See Ap Dijksterhuis et al., “On Making the Right Choice: The
Deliberation-Without-Attention Effect,” Science, Vol. 311 (2006), pp. 1005–1007.
25. Thus one of the principal critiques of Clausewitz’s theory of decision making: if two
commanders disagree on the chances that a strategy will succeed, how do we know which one
possesses greater “genius”?
26. John Stuart Mill, A System of Logic, Ratiocinative and Inductive, 8th ed. (New York, N.Y.: Harper
and Brothers, 1882), p. 539.
27. John Maynard Keynes, “The General Theory of Employment,” Quarterly Journal of Economics,
Vol. 51, No. 2 (1937), pp. 213–214; see also John Maynard Keynes, A Treatise on Probability
(London: Macmillan, 1921). For reviews of this argument in international affairs, see Richard K.
Betts, “Is Strategy an Illusion?” International Security, Vol. 25, No. 2 (2000), pp. 5–50; and Ben
Connable, Embracing the Fog of War (Santa Monica, Calif.: Rand, 2012).
surge represented “the minimum force levels to accomplish the mission with
an acceptable level of risk.”
Both statements reflected the premise that sending fewer than forty thou-
sand reinforcements to Afghanistan would produce lower chances of suc-
cess.28 It is impossible to believe that General McChrystal could coherently
argue for sending additional troops to Afghanistan while also claiming that he
had no basis for assessing the chances that the deployment would succeed. If
General McChrystal had no basis for assessing those chances, then President
Obama would have had no reason to expect that sending forty thousand addi-
tional forces to Afghanistan would produce a better outcome than deploying
thirty thousand more troops. Iterating this logic, it would become unjustifiable
to send any additional soldiers to Afghanistan, or even to keep any remaining
U.S. forces in the country at all. Who could have said that withdrawal would
lower the chances of defeating the Taliban? What is the purpose of expending
lives and resources if there is no basis for believing that this raises the chances of
obtaining valuable outcomes?
Few observers would adopt such an extreme viewpoint.29 Regardless of how
complex or subjective a decision might be, it is ultimately necessary to draw a
line somewhere between resource allocations that are acceptable and those that
are not. Any deployment of resources thus depends on the notion that it is, in
fact, possible to form coherent perceptions of subjective probability.30
Of course, this does not mean that anything goes in foreign policy analysis. If
a team of subject-matter experts conducts a rigorous, mixed-method analysis to
estimate the chances that some event will occur, their judgment clearly deserves
28. This statement also implies that General McChrystal could judge that the magnitude of this shift
in probability was large enough to justify the additional investment of resources. Similar assumptions
are also evident in McChrystal’s claim that deploying more than 40,000 additional troops “could
achieve low risk, but this would be excessive in the final analysis.” Commander, International Security
Assistance Force-Afghanistan, COMISAF Initial Assessment, August 2009, pp. 2–20 to 2–21.
29. In fact, if foreign policy analysts cannot make valid probability assessments, then the only eth-
ical move would be to eliminate all defense expenditures. There would be no reason to think that
those resources actually make the country safer, and thus the money would be better spent elsewhere
(or else returned to taxpayers). As Betts puts it, if there is no hope of predicting the results of military
strategy, then “analysts as well as politicians and generals should all quit and go fishing” (“Is Strategy
an Illusion,” p. 20). See also Robert A. Pape, “The Air Force Strikes Back,” Security Studies, Vol. 7, No.
2 (1997/98), pp. 191–214.
30. Note how this discussion inverts the traditional relationship between objectivity and ana-
lytic precision. In many areas of public life, scholars and practitioners seek to quantify important
parameters as a way of pursuing objective judgment. Yet, when dealing with probability assessments
in international politics, many scholars and practitioners seek to avoid making precise judgments, on
the grounds that these judgments would be excessively subjective. See Porter, Trust in Numbers, for
more detail on the first of these viewpoints.
more weight than a pundit’s off-the-cuff opinion, even if both views reflect sub-
jective probabilities. But how should decision makers determine how much
weight these judgments deserve? If objectivity is not the proper standard for
assigning credibility to foreign policy analysts’ judgments, then what criterion
should we use?
Answering these questions requires distinguishing between the concepts of
“probability” and “confidence.” As mentioned in the introduction, probability
reflects analysts’ beliefs about the chances that a statement is true, while con-
fidence reflects the extent to which analysts believe they possess a sound basis
for assessing uncertainty. Elsewhere, I have argued for further parsing analytic
confidence into at least three components: the reliability of available evidence
supporting a judgment, the range of reasonable opinion surrounding a judg-
ment, and the degree to which analysts believe their judgment could change in
response to new information. These concepts can vary independently, and each
carries distinct implications for high-stakes decision making.31
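To see how these dimensions can vary independently of the probability estimate itself, consider a minimal sketch; the structure and field names below are introduced here purely for illustration, not as an official assessment format.

```python
# Hypothetical sketch of the probability/confidence distinction. The three
# confidence fields paraphrase the components named above; none of this
# structure comes from the book itself or from any agency standard.
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Assessment:
    probability: float                     # chance the statement is true, in [0, 1]
    evidence_reliability: str              # e.g., "low" / "medium" / "high"
    reasonable_range: Tuple[float, float]  # span of defensible estimates
    responsiveness: str                    # how much new information could move it

# Two judgments can assign the same probability with very different confidence:
judgment_a = Assessment(0.75, "low",  (0.40, 0.90), "high")
judgment_b = Assessment(0.75, "high", (0.70, 0.80), "low")
```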
In U.S. criminal trials, for example, jurors are only allowed to convict
defendants based on reliable evidence. Inferences based on hearsay, stereotypes,
or coerced confessions are barred from jurors’ consideration on the grounds
that these inferences do not provide a valid basis for revoking defendants’ civil
liberties. Most criminal trials furthermore require that jurors reach a guilty ver-
dict unanimously, reflecting the premise that a single dissenting view creates
enough reasonable doubt to preclude conviction. In this respect, the application
of criminal justice is not just a matter of estimating the probability of defendants’
guilt. Estimating a high probability of guilt is necessary, but not sufficient, to sus-
tain a guilty verdict.32
Controversy over the U.S. invasion of Iraq in 2003 shows how similar factors
shape the politics and ethics of foreign policy decision making. As described in
chapter 1, most intelligence analysts believed that Iraq was pursuing weapons
of mass destruction (WMD), and the Bush administration invoked that belief
to justify going to war. However, inferences about Iraq’s WMD programs relied
heavily on circumstantial evidence and questionable informants. The U.S. State
Department had, furthermore, lodged an explicit dissent regarding assessments
of Iraq’s nuclear program, and the International Atomic Energy Agency had
31. Jeffrey A. Friedman and Richard Zeckhauser, “Analytic Confidence and Political Decision
Making: Theoretical Principles and Experimental Evidence from National Security Professionals,”
Political Psychology, Vol. 39, No. 5 (2018), pp. 1069–1087.
32. On how the standard of “guilt beyond a reasonable doubt” combines assessments of proba-
bility and confidence in this way, see Peter Tillers and Jonathan Gottfried, “Case Comment—United
States v. Copeland, 369 F. Supp. 2d 275 (E.D.N.Y. 2005): A Collateral Attack on the Legal Maxim
That Proof Beyond a Reasonable Doubt Is Unquantifiable?” Law, Probability, and Risk, Vol. 5, No. 2
(2006), pp. 135–157.
warned that there was no factual basis for concluding that the program was op-
erative. Many of the Bush administration’s critics argued that the lack of reliable
evidence and the presence of dissenting views undermined the case for war, or
at least justified delaying the invasion in order to gather more reliable informa-
tion and to build greater consensus.33 Note that this is distinct from critiquing
how the Bush administration perceived the probability that Iraq was pursuing
WMDs. The argument is instead that analysts and decision makers paid insuffi-
cient attention to the confidence surrounding this judgment.34
The degree to which probability assessments might change in response to new
information plays an especially important role when it comes to timing high-
stakes decisions. For example, when the Central Intelligence Agency (CIA) ini-
tially approached President Obama with indications that Osama bin Laden might
be living in Abbottabad, Pakistan, Obama decided that the available information
did not yet warrant taking military action. Instead, he asked intelligence analysts to
conduct additional surveillance of the compound. Over the next eight months, the
CIA’s collection efforts became increasingly creative. When none of these efforts
produced significant insights, President Obama eventually decided that he could
no longer afford to delay the decision. What swayed Obama’s thought process
over time was not that he had gained new insight about the probability that bin
Laden was living in Abbottabad, but rather that it had become increasingly clear
that perceptions of this probability were unlikely to change in response to new
information.35
These examples emphasize how assessments of probability provide only
part of what decision makers need to know about the uncertainty surrounding
high-stakes choices. Reliability of available information, range of reasonable
opinion, and responsiveness to new information can also shape decision makers’
33
The lack of direct evidence regarding Iraq’s WMD programs was particularly relevant to
judging the Bush administration’s attempt to justify the invasion using the doctrine of “preemp-
tion,” which is generally reserved for cases of clear and present danger. See Michael W. Doyle, Striking
First: Preemption and Prevention in International Conflict (Princeton, N.J.: Princeton University Press,
2008). For broader arguments about how the ethics of using force depend on the reliability of the evi-
dence decision makers possess, see Jeff McMahan, The Ethics of Killing (New York: Oxford University
Press, 2002); and Michael Walzer, Just and Unjust Wars, 4th ed. (New York: Basic Books, 2006).
34
For expressions of similar views, see Richard K. Betts, Enemies of Intelligence: Knowledge and
Power in American National Security (New York: Columbia University Press, 2006), p. 116; Robert
Jervis, “Reports, Politics, and Intelligence Failures: The Case of Iraq,” Journal of Strategic Studies, Vol. 29, No. 1 (2006), p. 44;
and Michael Morell, The Great War of Our Time: The CIA’s Fight against Terrorism from Al Qa’ida to
ISIS (New York: Twelve, 2014), p. 102.
35
Peter Bergen, Manhunt: The Ten-Year Search for Bin Laden from 9/11 to Abbottabad
(New York: Crown, 2012); Mark Bowden, The Finish: The Killing of Osama bin Laden
(New York: Atlantic, 2012); and Morell, Great War of Our Time, ch. 7.
willingness to take risks.36 These attributes are far more relevant to foreign policy
decision making than objectivity—which is, in any case, an impossible standard
to apply when assessing uncertainty in international politics. This discussion
highlights the importance of drawing clear distinctions between the probability
and the confidence that foreign policy analysts assign to their judgments. And in
the next section, we will see how foreign policy analysts can always assess prob-
ability in clear and structured ways.
36
Friedman and Zeckhauser, “Analytic Confidence and Political Decision Making,” provide
experimental evidence showing that these factors do, in fact, shape the way that national security
professionals evaluate risky decisions.
37
These ranges are often called confidence intervals (or credible intervals), but note that this only
reflects one of the three elements of analytic confidence described earlier. One reason to shift existing
discussions of analytic confidence toward more precise terminology is simply to avoid the confusion
that the word “confidence” can generate—including, as we have seen in several places throughout the
book, conflating the concepts of confidence and probability.
38
It is important to note that this logic holds for estimating subjective probabilities, but not for
estimating material costs and benefits. The difference stems from the fact that individuals can assign
different utilities to these costs and benefits. These utility functions create “risk preferences” that
shape the manner in which rational actors resolve ambiguity. For example, a gamble that offers a fifty
percent chance of winning $0 and a fifty percent chance of $100 would have an expected monetary
value of $50. But the expected utility of taking this gamble is not necessarily the utility that gamblers
assign to a certain $50 payoff. Risk-averse individuals would value this bet at less than $50, while
risk-acceptant individuals would value this bet at more than $50. For more on the contrast between
resolving ambiguity about outcomes versus probabilities, see Ellsberg, “Risk, Ambiguity, and the
Savage Axioms.”
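To make the arithmetic in the preceding note concrete, here is a minimal sketch in Python; the square-root utility function is an illustrative assumption, not anything the text specifies:

    import math

    p_win, payoff = 0.50, 100.0
    expected_value = p_win * payoff                # $50

    u = math.sqrt                                  # assumed risk-averse utility
    expected_utility = p_win * u(0.0) + p_win * u(payoff)   # 0.5 * 10.0 = 5.0
    certainty_equivalent = expected_utility ** 2   # inverse of u at 5.0: $25, below $50

    print(expected_value, expected_utility, certainty_equivalent)

A risk-acceptant utility (say, u(x) = x squared) would value the same gamble above fifty dollars.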
39
For a review of similar techniques, see Paul H. Garthwaite, Joseph B. Kadane, and Anthony
O’Hagan, “Statistical Methods for Eliciting Probability Distributions,” Journal of the American
Statistical Association, Vol. 100, No. 470 (2005), pp. 680–700.
40
On the logic of the Dutch Book, see Ken Binmore, Rational Decisions (Princeton, N.J.: Princeton
University Press, 2009), pp. 123–126.
that all estimates within this range are equally credible. The only sensible way
to resolve that kind of ambiguity is to take the range’s midpoint of fifty per-
cent. Any estimate higher than fifty percent would implicitly give more weight
to possibilities above that value; any estimate lower than fifty percent would
implicitly give more weight to possibilities below that value; and either of those
options would violate your own personal belief that all estimates within this
range are equally credible.
Similar logic allows decision makers to resolve ambiguity when analysts offer
diverging views. When analysts study the same information and offer different
assessments, a decision maker should consider those assessments’ relative cred-
ibility.41 If the available estimates appear equally credible—and this includes
situations in which decision makers have no basis for evaluating analysts’ cred-
ibility at all—then the only reasonable way to resolve the disagreement is to
average the available estimates together. Any other approach would violate
a decision maker’s own assumptions by implicitly assigning some estimates
more weight than others. In cases where a decision maker believes that proba-
bility assessments are not equally credible, the only logical solution is to take a
weighted average.42
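As a minimal sketch of this logic (in Python, using the hypothetical figures from note 42 below):

    def combine(estimates, weights=None):
        # Resolve diverging probability estimates by (weighted) averaging
        if weights is None:
            weights = [1.0] * len(estimates)       # equal credibility
        return sum(p * w for p, w in zip(estimates, weights)) / sum(weights)

    print(combine([0.60, 0.90]))                   # equal credibility: 0.75
    print(combine([0.60, 0.90], [1.0, 2.0]))       # second estimate twice as credible: 0.80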
Viewed in light of the subjective probability framework, seemingly intrac-
table debates about assessing uncertainty become conceptually straightforward,
as long as analysts and decision makers are willing to approach these debates
in a principled manner. For example, we saw in the book’s introduction that
President Obama’s advisers voiced a broad range of opinions about the chances
that Osama bin Laden was living in Abbottabad. Red Teams assigned to draw
skeptical conclusions placed these chances around forty percent. CIA Deputy
Director Michael Morell offered an estimate of sixty percent. Most advisers
placed their judgments around eighty percent, while the leader of the CIA unit
assigned to track bin Laden said there was a ninety-five percent chance that he
was living in the suspected compound. Multiple sources describe how President
Obama found this discussion confusing, and how he concluded that the odds
41
The weighted averaging approach described here is only valid when analysts base their
assessments on the same body of information. When analysts form their estimates based on non-
identical information, then resolving their disagreements becomes more complex. See Richard
Zeckhauser, “Combining Overlapping Information,” Journal of the American Statistical Association,
Vol. 66, No. 333 (1971), pp. 91–92 and Robert T. Clemen, “Combining Overlapping Information,”
Management Science, Vol. 33, No. 3 (1987), pp. 373–380.
42
Thus, if one analyst estimates a probability as sixty percent, another analyst estimates that prob-
ability as ninety percent, and the latter judgment seems twice as credible, then the only coherent way
for a decision maker to resolve the ambiguity is to believe that the probability is eighty percent. For
more on the weighted averaging issue, see Jeffrey A. Friedman and Richard Zeckhauser, “Handling
and Mishandling Estimative Probability,” Intelligence and National Security, Vol. 30, No. 1 (2015),
pp. 77–99.
were fifty-fifty. “Look guys, this is a flip of the coin,” President Obama said ac-
cording to one account. “I can’t base this decision on the notion that we have any
greater certainty than that.”43
Two aspects of President Obama’s conclusion are worth noting. First, a
fifty-fifty estimate was remarkably conservative given the viewpoints that
the president’s advisers presented. The simple average of the four estimates
described in the previous paragraph is sixty-nine percent. President Obama’s
coin flip analogy thus implicitly afforded special weight to the lowest estimates
he received.44 Yet it is not at all clear that these were the most credible positions
on the table. The Red Teams were developed to offer deliberately pessimistic
conclusions, not unbiased reflections of any analysts’ actual views. Even Morell
indicated that his sixty percent estimate involved intentionally low-balling the
issue given how intelligence analysts had overestimated the chances that Iraq
was pursuing nuclear weapons a decade earlier.
The point of this discussion is not to second-guess President Obama’s judg-
ment, but to show that no matter how much his advisers disagreed over the
chances that bin Laden was at Abbottabad, there was no reason to believe that
resolving this disagreement represented an intractable problem. As long as
analysts are willing to assess uncertainty in transparent ways, and as long as de-
cision makers are willing to consider the relative credibility of those judgments,
then it is always possible to reconcile diverging opinions. Greater ambiguity ac-
tually makes it easier, not harder, to resolve these disagreements. In the extreme
case where a decision maker has no reason to believe that any estimate is more
credible than the others, then the only valid approach is simply to average the
available assessments together.45
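A quick check of that arithmetic, as a sketch in Python; the numbers are the forty, sixty, eighty, and ninety-five percent figures attributed to the president's advisers above:

    estimates = [0.40, 0.60, 0.80, 0.95]
    print(sum(estimates) / len(estimates))   # 0.6875, i.e., roughly sixty-nine percent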
This reasoning also implies that any time foreign policy analysts formulate
judgments using words of estimative probability, they also possess a coherent
basis for converting those judgments into quantitative expressions. Chapter 1,
for example, showed how U.S. Intelligence Community standards currently
43
Bowden, The Finish, p. 163.
44
This estimate may have reflected the well-established “fifty-fifty bias” in which individuals use
this phrase to refer to situations in which they feel they have no idea about the right answer and not
to indicate that the chances of a particular outcome are actually fifty percent. See Baruch Fischhoff
and Wandi Bruine de Bruin, “Fifty-Fifty = 50%?” Journal of Behavioral Decision Making, Vol. 12, No. 2
(1999), pp. 149–163.
45
As mentioned earlier, the weighted averaging approach depends on the assumption that analysts
have based their estimates on similar information. This assumption appears safe in most high-level
national security debates such as the discussion of whether Osama bin Laden was at Abbottabad.
Deputy CIA Director Morell reportedly told President Obama during the meeting: “People don’t
have differences [in their probability estimates] because they have different intel. . . . We are all
looking at the same things.” Bowden, The Finish, p. 161.
Conclusion
This chapter has argued that aversion to assessing uncertainty in international
politics rests, at least in part, on misperceptions of what probabilistic rea-
soning entails. Much of the existing debate about this subject revolves around
the expectation that assessments of uncertainty in international politics
should obey the same kind of logic that characterizes statistics or quantum
mechanics. Many observers appear to believe that if foreign policy analysts
cannot meet those objective standards, then their beliefs cannot provide a
credible basis for high-stakes decision making. This emphasis on objectivity
is enshrined throughout official analytic standards for intelligence analysis
and military planning, and it is often used to justify leaving controversial
judgments deliberately vague.
46
Friedman and Zeckhauser, “Analytic Confidence and Political Decision Making,” provide
experimental evidence showing that national security professionals interpret and react to these
distinctions in sensible ways.
3
The Value of Precision in Probability Assessment
*
Portions of this chapter previously appeared in Jeffrey A. Friedman, Joshua D. Baker, Barbara
A. Mellers, Philip E. Tetlock, and Richard Zeckhauser, “The Value of Precision in Probability
Assessment: Evidence from a Large-Scale Geopolitical Forecasting Tournament,” International
Studies Quarterly, Vol. 62, No. 2 (2018), pp. 410–422. © published by Oxford University Press,
reproduced with permission.
1
This is the crux of the argument Sherman Kent encountered from intelligence analysts who
argued that “the most a writer can achieve when working in a speculative area of human affairs is com-
munication in only the broadest general sense.” Sherman Kent, “Words of Estimative Probability,”
Studies in Intelligence, Vol. 8, No. 4 (1964), pp. 49–65.
2
Aristotle, Nicomachean Ethics, tr. Terence Irwin (Indianapolis, Ind.: Hackett, 1985), p. 1094b.
3
Nate Silver, “Why FiveThirtyEight Gave Trump a Better Chance Than Almost Anyone Else,”
Fivethirtyeight.com, November 11, 2016, https://fivethirtyeight.com/features/why-fivethirtyeight-
gave-trump-a-better-chance-than-almost-anyone-else/.
4
Among election forecasters, Nate Silver actually assigned one of the highest predicted
probabilities to a Donald Trump victory. David Rothschild, for example, only gave Trump an
eleven percent chance of winning the presidency. But eleven percent chance events should not be
particularly surprising either. An eleven percent chance is roughly equivalent to the probability that
a major league pitcher gets a hit when batting. Such events are rare, but they do not cause anyone to
fundamentally re-evaluate their perceptions of baseball.
5
Of course, accuracy is just one measure one can use to evaluate assessments of uncertainty. For
broader discussions of what “quality” foreign policy analysis entails, see Stephen Marrin, “Evaluating
the Quality of Intelligence Analysis: By What (Mis)Measure?” Intelligence and National Security,
Vol. 27, No. 6 (2012), pp. 896–912; and Kristan J. Wheaton, “Evaluating Intelligence: Answering
Questions Asked and Not,” International Journal of Intelligence and CounterIntelligence, Vol. 22, No. 4
(2009), pp. 614–631.
6
Steven Rieber, “Intelligence Analysis and Judgmental Calibration,” International Journal of
Intelligence and CounterIntelligence, Vol. 17, No. 1 (2004), pp. 97–112.
7
I collected these data as part of annual training sessions on assessing uncertainty. The National
War College provides mid-career education to an unusually broad cross-section of national secu-
rity officials, spanning all U.S. military services, civilian national security personnel, and military
officers from partner countries. Chapter 4 describes these respondents in more detail: they comprise
the subjects from the “qualitative assessment condition” in the second and third experiments that
chapter 4 presents.
8
These words of estimative probability were defined for respondents using the National
Intelligence Council’s spectrum, which involves seven evenly spaced terms.
[Figure 3.1 appears here: a calibration plot with the seven words of estimative probability (from “remote chance” to “almost certain”) on the horizontal axis and observed probabilities on the vertical axis.]
Figure 3.1 Calibration data from a study of U.S. national security officials. The figure
shows the relationship (with 95 percent intervals) between estimated probabilities and
observed probabilities for 9,480 probability assessments made by 316 students at the U.S.
National War College.
9
Note that this is another instance in which colloquial uses of the term confidence refer to probability assessments themselves, and not to the basis that analysts possess for making those judgments.
10
Respondents were less overconfident in using the term “remote chance.” Just 12 percent of
those statements turned out to be true, though this is still at the outer limit of the term’s definition
according to the National Intelligence Council and more than twice the term’s acceptable limit as
defined by the director of national intelligence.
11
This finding may reflect the so-called fifty-fifty bias, where probability assessors conflate “fifty
percent” with situations in which they feel they have no basis for rendering a sound judgment. For
more on this bias, see Baruch Fischhoff and Wandi Bruine de Bruin, “Fifty-Fifty = 50%?” Journal of
Behavioral Decision Making, Vol. 12, No. 2 ( June 1999), pp. 149–163. For additional evidence that
national security officials assign an “even chance” to judgments that are much less likely than fifty per-
cent to be true, see Paul Lehner et al., Using Inferred Probabilities to Measure the Accuracy of Imprecise
Forecasts (Arlington, Va.: MITRE, 2012), p. 12.
questions rather than the kind of thorough analyses that foreign policy officials
use to make high-stakes decisions. It is nevertheless important to understand
how intuitive assessments of uncertainty can be miscalibrated, and specifically
how these judgments tend to involve excessive levels of certainty.12
Yet Figure 3.1 also demonstrates that these national security professionals
could reliably discriminate among different levels of subjective probability.
When study participants said that one set of statements was more likely to be
true than another, this consistently proved to be the case.13 The way in which
these officials distinguished between words of estimative probability thus clearly
reflected meaningful insight rather than arbitrary detail. And in this respect, the
notion that the data in Figure 3.1 reflect intuitive judgments rather than careful
analysis only adds to the salience of these findings. Here, we see that national
security officials can reliably draw distinctions among subjective probabilities,
even when they devote limited effort to the task.
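Calibration analyses of the kind shown in Figure 3.1 reduce to a simple computation: group judgments by estimative category and compare each category's observed frequency of being true against the probability the category implies. A minimal sketch in Python, with made-up data:

    from collections import defaultdict

    # Hypothetical (word, outcome) pairs; True means the statement proved correct
    judgments = [("likely", True), ("likely", False), ("likely", True),
                 ("very likely", True), ("very likely", True)]

    by_word = defaultdict(list)
    for word, outcome in judgments:
        by_word[word].append(outcome)

    for word, outcomes in by_word.items():
        observed = sum(outcomes) / len(outcomes)   # observed relative frequency
        print(word, round(observed, 2), "n =", len(outcomes))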
Figure 3.1 thus illustrates the difference between estimating the value of pre-
cision in probability assessment and examining the accuracy of probabilistic
judgments overall. There is no contradiction in accepting that foreign policy
analysts’ judgments can be flawed and in arguing that those analysts can still
draw meaningful distinctions among different levels of certainty. Yet the kinds of
data shown in Figure 3.1 still provide a limited basis for identifying the threshold
of “allowable exactness” in this domain. For one thing, these data leave open
the question of whether respondents could have parsed their judgments more
precisely than seven words of estimative probability. Moreover, these data ul-
timately reflect a relatively small number of people answering a relatively small
number of questions.
Thus, in order to understand where the threshold of allowable exactness
lies when assessing uncertainty in international politics, we need a data set that
includes quantitative probability assessments; we need that data set to span a
broad range of people and topics; and we need to develop a structured method
for estimating the extent to which expressing these judgments more precisely ac-
tually changes their predictive value. The next section explains how I teamed up
with a group called the Good Judgment Project in order to conduct such a study.
12
The findings in Figure 3.1 resonate with a substantial body of research that documents overconfidence in national security decision making. See, for example, Dominic D. P. Johnson, Overconfidence in War: The Havoc and Glory of Positive Illusions (Cambridge, Mass.: Harvard University Press, 2004). Chapter 4 presents additional evidence on how judgments involving the highest levels of certainty deserve to be treated with special caution.
13
Differences among these categories were all statistically significant at the p < 0.001 level, except
for the distinction between judgments that were “unlikely” versus “very unlikely,” which was statisti-
cally significant at the p < 0.05 level.
14
Philip E. Tetlock, Expert Political Judgment: How Good Is It? How Can We Know? (Princeton,
N.J.: Princeton University Press, 2005).
15
See also Philip E. Tetlock, “Theory-Driven Reasoning about Plausible Pasts and Probable
Futures in World Politics: Are We Prisoners of Our Own Preconceptions?” American Journal of Political
Science, Vol. 43, No. 2 (1999), pp. 335–366. On broader tendencies for individuals to develop illusory
perceptions of effectiveness when assessing uncertainty, see Daniel Kahneman, Thinking, Fast and
Slow (New York: Farrar, Straus and Giroux, 2011), pp. 199–221, and Robyn Dawes, Rational Choice in
an Uncertain World (San Diego, Calif.: Harcourt Brace Jovanovich, 1988), pp. 100–120, 256–261.
16
On the origins of the GJP, see Philip E. Tetlock and Daniel Gardner, Superforecasting: The
Art and Science of Prediction (New York: Crown, 2015). On the GJP’s methodology, see Barbara
A. Mellers et al., “Psychological Strategies for Winning a Geopolitical Forecasting Tournament,”
Psychological Science, Vol. 25, No. 5 (May 2014), pp. 1106–1115; Barbara A. Mellers et al., “The
Psychology of Intelligence Analysis: Drivers of Prediction Accuracy in World Politics,” Journal of
Experimental Psychology: Applied, Vol. 21, No. 1 (March 2015), pp. 1–14; and Barbara A. Mellers
et al., “Improving Probabilistic Predictions by Identifying and Cultivating ‘Superforecasters,’”
Perspectives on Psychological Science, Vol. 10, No. 3 (May 2015), pp. 267–281.
17
The main exception to ecological validity with respect to IARPA’s question list was the require-
ment that each question be written precisely enough that the outcomes could be judged clearly after
the fact.
18
Jeffrey A. Friedman, Joshua D. Baker, Barbara A. Mellers, Philip E. Tetlock, and Richard
Zeckhauser, “The Value of Precision in Probability Assessment: Evidence from a Large-Scale
Geopolitical Forecasting Tournament,” International Studies Quarterly, Vol. 62, No. 2 (2018), pp.
410–422.
19
The GJP also administered a prediction market, but because those data involve respondents
making buy/sell decisions as opposed to registered probability assessments, the prediction market is
less relevant to the analysis presented in this chapter.
20
In order to exclude less serious participants from our study, we limited our analysts to
forecasters who made at least twenty-five predictions in a given year.
21
Topics included defining base rates, avoiding cognitive biases, and extrapolating trends from
data. For a discussion of these methods, see Welton Chang, Eva Chen, Barbara Mellers, and Philip
Tetlock, “Developing Expert Political Judgment: The Impact of Training and Practice on Judgmental
Accuracy in Geopolitical Forecasting Tournaments,” Judgment and Decision Making, Vol. 11, No. 5
(2016), pp. 509–526.
22
The GJP team analyzed the superforecasters in extensive detail. Broadly speaking, they
found that superforecasters were not experts in particular subjects or methodologies. Instead,
superforecasters typically shared a willingness to address each forecasting problem in a flexible way
and to draw on an eclectic range of inputs rather than any particular theoretical or methodological
framework. See Mellers et al., “Psychological Strategies for Winning a Geopolitical Forecasting
Tournament”; Mellers et al., “The Psychology of Intelligence Analysis”; and Mellers et al., “Improving
Probabilistic Predictions by Identifying and Cultivating ‘Superforecasters.’ ”
23
Someone who knew the future would have given a probability estimate of one hundred per-
cent, the difference between sixty percent and one hundred percent is 0.40, and 0.40 squared is 0.16.
See appendix section 1a for a more formal description of the Brier score.
24
Someone who knew the future would have provided a probability estimate of zero percent, the
difference between sixty percent and zero percent is 0.60, and 0.60 squared is 0.36.
that distance as small as possible. All else being equal, lower Brier scores thus
represent more accurate judgments.
The Brier score is what decision theorists call a strictly proper scoring rule. This
means that there is no way to cheat the Brier score when reporting probability
estimates. Analysts should always expect to receive their best Brier scores by hon-
estly reporting their beliefs about the chances that a statement is true.25 The Brier
score is the most common method scholars use to evaluate the accuracy of prob-
ability assessments, which is why my colleagues and I chose to make it the main
metric for our study. But to make sure that our findings were not driven by the spe-
cific properties of the Brier score, we replicated our results using another evalua-
tion technique called logarithmic scoring. The appendix explains what logarithmic
scoring entails and shows that it does not meaningfully impact the empirical anal-
ysis presented here.26
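The Brier score itself is one line of arithmetic. The sketch below (Python) reproduces the calculations in notes 23 and 24 and illustrates strict propriety: an analyst who truly believes the chance is sixty percent minimizes her expected score by reporting exactly sixty percent.

    def brier(forecast, outcome):
        # Squared distance between a probability and what occurred (1 or 0)
        return (forecast - outcome) ** 2

    print(brier(0.60, 1))   # 0.16 when the event occurs (note 23)
    print(brier(0.60, 0))   # 0.36 when it does not (note 24)

    def expected_brier(report, belief=0.60):
        # Expected score given a true belief of `belief`
        return belief * brier(report, 1) + (1 - belief) * brier(report, 0)

    # The expected score is minimized by honest reporting: report = 0.60
    print(min(range(101), key=lambda r: expected_brier(r / 100)))   # 60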
The second step in our analysis involved coarsening the forecasts in our data
set. There are many ways one can do this. For example, we saw in chapter 1 that
the DNI’s current analytic standards divide probability estimates into seven
bins. The middle of these bins, labeled “even chance,” spans probabilities be-
tween forty-five and fifty-five percent. Chapter 2 then explained that, absent ad-
ditional information, assessing a range of probabilities is equivalent to stating its
midpoint.27 We can thus say that every probability estimate in our data set that
falls between forty-five and fifty-five percent corresponds to the DNI’s definition
of “even chance,” and we can assume that the word “even chance” implies a prob-
ability estimate of fifty percent.28 Coarsening probability assessments according
to the DNI standards thus results in rounding all of the forecasts in our data set to
seven possible values.29
We can generalize this approach by dividing the number line into bins of
any shapes and sizes that we like. Chapter 1, for example, showed how the
National Intelligence Council (NIC) divides words of estimative probability
into seven equally spaced categories. Rounding probability estimates to the
midpoint of these segments will thus coarsen forecasts to another set of seven
possible values. In other cases, national security analysts assess uncertainty by
ascribing “low confidence,” “moderate confidence,” or “high confidence” to
25
Chapter 5 provides additional discussion of strictly proper scoring rules.
26
See appendix section 1b.
27
We also experimented with an alternative rounding method in which we replaced each obser-
vation with the mean forecast that fell within each bin on each question. This allows the meaning of
each qualitative expression to vary by context, and it did not meaningfully change our results.
28
Chapter 2 noted that it is often useful to explain the ambiguity surrounding a probability esti-
mate, but this is part of assessing analytic confidence, not estimating probability per se.
29
When forecasts fell on the boundaries between bins, we randomized the direction of rounding.
their judgments—this essentially divides the number line into three bins.30 The
vaguest way that we have seen foreign policy analysts express uncertainty is by
using estimative verbs, such as “we assess” or “we judge.” Estimative verbs essen-
tially divide probability estimates into two bins: statements that analysts think
are likely to be true and those they do not think are likely to be true.
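In code, coarsening amounts to mapping each numeric forecast onto the midpoint of whichever bin contains it. A minimal sketch using equal-width bins (the NIC's spectrum is equally spaced; treating confidence levels and estimative verbs as three and two equal bins is an illustrative simplification):

    def coarsen(p, n_bins):
        # Round a probability to the midpoint of one of n equal-width bins
        width = 1.0 / n_bins
        index = min(int(p / width), n_bins - 1)   # keep p = 1.0 inside the top bin
        return (index + 0.5) * width

    print(coarsen(0.62, 7))   # NIC-style words of estimative probability: ~0.643
    print(coarsen(0.62, 3))   # confidence levels: 0.5
    print(coarsen(0.62, 2))   # estimative verbs: 0.75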
The final step in our methodology involved comparing the accuracy of
forecasts between their original and coarsened values. For instance, if the orig-
inal forecast had a Brier score of 0.20 and the coarsened forecast had a Brier
score of 0.22, that would represent a ten percent degradation in predictive ac-
curacy. We call these relative comparisons rounding errors. Rounding errors
indicate the loss of accuracy that forecasts suffered when we rounded these
judgments to different degrees of (im)precision. These rounding errors provide
our main metric for estimating the value of precision across nearly one million
geopolitical forecasts.
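The rounding error is then just the relative change in Brier score, as in the ten percent example above (a sketch):

    def rounding_error(original_score, coarsened_score):
        # Relative degradation in accuracy caused by coarsening a forecast
        return (coarsened_score - original_score) / original_score

    print(rounding_error(0.20, 0.22))   # 0.10: a ten percent loss of accuracy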
Caveats
To my knowledge, this analysis represents the first systematic attempt to un-
derstand the value of precision when assessing subjective probability. It is cer-
tainly the first attempt to estimate the threshold of “allowable exactness” when
assessing uncertainty in international politics—a notable fact given the strong
views many scholars, practitioners, and pundits appear to hold on the issue. And
the Good Judgment Project’s data are uniquely positioned to assess the value of
precision in probability assessment given the sheer volume of forecasts that the
GJP collected, the range of individuals that the project involved, and the efforts
of IARPA to ensure that forecasters addressed questions that were relevant to
policymakers’ concerns. Nevertheless, any research design has drawbacks, four
of which are worth addressing up front.
30
As chapters 1 and 2 explained, assessments of confidence are technically distinct from
assessments of probability, but foreign policy analysts regularly conflate confidence and proba-
bility when expressing their judgments. For example, a controversial 2007 NIE titled Iran: Nuclear
Intentions and Capabilities presented twenty-six key judgments. Fourteen of them assessed uncer-
tainty using confidence levels without also offering words of estimative probability. The document
thus stated: “We assess with high confidence that until fall 2003, Iranian military entities were
working under government direction to develop nuclear weapons” and “We judge with moderate
confidence that Tehran had not restarted its nuclear weapons program as of mid-2007.” Readers
could reasonably assume that a statement made with “high confidence” was more likely to be
true than a statement made with “moderate confidence” or “low confidence.” In this way, the use
of confidence levels to assess uncertainty effectively divides the number line into three segments,
in a manner that is more precise than estimative verbs but less precise than words of estimative
probability.
First, the GJP only asked respondents to make predictions that could be re-
solved within a feasible time frame. An average of seventy-six days (standard de-
viation, eighty days) elapsed between the time when a forecast was registered
and the time when that forecast’s outcome became known. Yet most foreign
policy analysis tends to focus on relatively short-term issues, too.31 We will also
see that the value of precision across GJP forecasts held relatively consistent
across time spans. At the very least, it is possible to show that our assessments
of analysts’ capabilities in this area are not driven by the shortest-term forecasts
in the data set.
A second caveat for interpreting these data is that the GJP only asked
respondents to assess the probability of future events, whereas we saw in
chapter 2 that foreign policy analysts must also assess uncertainty about factual
matters, such as Osama bin Laden’s location or Saddam Hussein’s weapons of
mass destruction programs. Yet forecasting should generally be more difficult
than assessing uncertainty about the current state of the world—in addition to
grappling with imperfect information, forecasters must also account for the pos-
sibility that this information may change in the future. If making predictions is
indeed harder than assessing uncertainty about the present and past, then the
findings presented in the rest of the chapter should actually understate the value
of precision when assessing probability in international politics.
Third, because the participants in the Good Judgment Project volunteered their
time and effort, it would be a mistake to claim that these individuals are a rep-
resentative sample of foreign policy analysts. One particular concern is that the
individuals who self-selected into the competition might be unusually capable at
probabilistic reasoning. Selection bias is a concern in nearly all survey-based re-
search, and I will address that challenge in two ways. First, I will show that returns
to precision correlate with individual attributes such as education, numeracy,
and cognitive style. Then, in chapter 4, I will compare how national security
professionals and non-elite respondents perform on similar probability assess-
ment tasks. Neither of these analyses provides any indication that the book’s
main findings depend on survey respondents possessing special backgrounds.
Finally, even though the GJP data set contains nearly one million forecasts,
those forecasts are correlated with each other, in the sense that the GJP gathered
large numbers of responses to each question that it posed, and because forecasters
could update their forecasts over time. It would therefore be inappropriate to
treat all forecasts in the data set as representing independent observations. My
colleagues and I accounted for this issue by treating forecasting questions as our
31
David Mandel and Alan Barnes, “Geopolitical Forecasting Skill in Strategic Intelligence,”
Journal of Behavioral Decision Making, Vol. 31, No. 1 (2018), pp. 127–137.
unit of analysis. As the appendix describes in more detail, we took all forecasts
in the data set that corresponded to a given question; we calculated Brier scores
across all of those forecasts simultaneously; and then we examined how much
coarsening forecasts changed their predictive accuracy.32 This was an extremely
conservative approach to conducting statistical analysis, because it reduced our
effective sample size from 888,328 forecasts to 380 forecasting questions. The
next section nevertheless uses this method to demonstrate that the value of pre-
cision in probability assessment is larger and more consistent than what the con-
ventional wisdom expects.
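In code, treating questions as the unit of analysis amounts to grouping forecasts by question before scoring them, roughly as follows (a sketch with hypothetical records):

    from collections import defaultdict

    # Hypothetical records of (question_id, forecast, outcome)
    records = [("q1", 0.70, 1), ("q1", 0.55, 1), ("q2", 0.20, 0)]

    squared_errors = defaultdict(list)
    for qid, forecast, outcome in records:
        squared_errors[qid].append((forecast - outcome) ** 2)

    # One Brier score per question, averaged across all of its forecasts
    question_scores = {qid: sum(s) / len(s) for qid, s in squared_errors.items()}
    print(question_scores)   # {'q1': 0.14625, 'q2': 0.04}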
Primary Results
To recap, the goal of this chapter is to estimate where the threshold of “allowable
exactness” lies when it comes to assessing uncertainty in international politics.
If such a threshold exists, it would reflect a fundamental constraint on the con-
duct of foreign policy analysis. Yet to date, there has been virtually no systematic
empirical study of where this threshold might lie. The analysis in the remainder
of this section suggests that this threshold—if it exists at all—is much more per-
missive than the conventional wisdom expects.
Table 3.1 summarizes the main results from this analysis, showing that
coarsening forecasts to different degrees of imprecision consistently reduced
their predictive accuracy. For example, the top-right corner of the table shows
that the Brier scores associated with GJP forecasts became 31 percent worse,
on average, when we transformed numeric probabilities into estimative verbs.
Outliers did not drive this difference, as the median rounding error across all
forecasts in the data set was 22 percent.33 Even the worst-performing group of
forecasters in the data set—untrained individuals who worked alone—incurred
an average rounding error of 15 percent when we rounded their forecasts into
two bins. Table 3.1 also shows substantial penalties from coarsening GJP
forecasts into “confidence levels.” On average, this level of imprecision de-
graded forecast accuracy by more than ten percent, and far more than that for
the higher-performing analysts. These findings provide unambiguous evidence
of how the vague language foreign policy analysts typically use to assess uncer-
tainty systematically sacrifices meaningful information.
Translating forecasts into seven-step words of estimative probability
recovered some, but not all, of these losses. Despite the extremely conservative
33
We estimated statistical significance using standard two-way t-tests when comparing means, and one-sided, paired-sample Wilcoxon signed rank tests when comparing medians.
[Table 3.1 appears here, summarizing rounding errors for different groups of forecasters under different systems of expression.]
Note: The table describes the extent to which coarsening probability assessments using different systems of expression reduced the judgmental accuracy of the Good Judgment Project (GJP) forecasts. For example, rounding superforecasters’ estimates into the seven equal bins recommended by the National Intelligence Council degraded Brier scores by an average of 40%. Unless otherwise noted, all estimates are statistically distinct from zero at the p < 0.001 level. GJP = Good Judgment Project; DNI = Director of National Intelligence; NIC = National Intelligence Council.
^ This estimate is statistically distinct from zero at the p < 0.05 level.
† This is the only estimate that is not statistically distinct from zero at the p < 0.05 level.
34
The DNI spectrum compensates for tightening the “remote” and “almost certain” bins by
widening the “likely” and “unlikely” bins. This makes a majority of forecasts worse (and the differ-
ence in means more statistically significant) even as average rounding errors decline.
35
Mark Lowenthal, “Towards a Reasonable Standard for Analysis: How Right, How Often on
Which Issues?” Intelligence and National Security, Vol. 23, No. 3 (2008), p. 314.
36
Thomas Fingar, Reducing Uncertainty: Intelligence Analysis and National Security (Stanford,
Calif.: Stanford Security Studies, 2011), pp. 34, 130.
37
These results are symmetrical (e.g., the results of rounding forecasts to the terms “likely” and
“unlikely” create identical rounding errors) because any forecast automatically implies its comple-
ment. In other words, stating that an outcome has a ten percent chance of occurrence is equivalent to
estimating a ninety percent chance of non-occurrence, and we score both judgments simultaneously.
38
For instance, the average forecast falling in the “remote chance” segment of the number line
was ten percent. The midpoint of the “remote chance” bin is seven percent. Coarsening forecasts thus
tended to shift judgments in this part of the number line a bit closer toward certainty.
Table 3.2. Returns to Precision across the Number Line
[The table reports rounding errors, by group, within seven segments of the number line: Remote (.00–.14), Very unlikely (.15–.28), Unlikely (.29–.42), Even chance (.43–.56), Likely (.57–.71), Very likely (.72–.85), and Almost certain (.86–1.0).]
Note: The table examines how rounding probability estimates into seven equal bins influences the accuracy of forecasts within different segments of the number line. For example, coarsening the probability estimates that fell between 72% and 85% into the phrase “very likely” degraded forecasters’ Brier scores by an average of 4.3%. All estimates in this table are statistically distinct from zero at the p < 0.001 level.
forecasts more often than not (hence the improvement in the median Brier
score), but when analysts’ judgments proved to be mistaken, making forecasts
more extreme caused analysts to suffer large penalties (hence the degradation in
the average Brier score). In any case, these findings demonstrate that the value
of precision across GJP forecasts did not depend on nearly certain responses to
especially easy questions.
39
As previously mentioned, forecasters were allowed to register and update their predictions
whenever they liked over the course of the competition. The average forecast was registered seventy-
six days before the outcome of that forecast became known, but there is substantial variation on that
metric (standard deviation, eighty days).
40
There were 109,240 Lay-Ups in our data, leaving 259,696 forecasts in each of the other three
categories.
41
The average duration of a long-term forecast was 177 days (standard deviation, 77 days).
42
See appendix section 1c.
[Figure 3.2 appears here: a histogram of the proportion of questions whose thresholds of estimative precision fall at 3 bins (confidence levels), 7 bins (words of estimative probability), 10 bins, 15 bins, and beyond.]
Figure 3.2 Thresholds of estimative precision across the Good Judgment Project
questions.
43
Thus, we began by coarsening all forecasts into one bin (that is, we rounded all judgments to fifty percent). If this degraded the accuracy of the median forecast, then we coarsened all forecasts into two bins: that is, we rounded all judgments to either twenty-five percent or seventy-five percent. We continued this procedure until we no longer found that coarsening judgments generated statistically significant rounding errors. We estimated statistical significance for these purposes using a comparison of medians, based on a one-sided, paired-sample Wilcoxon signed rank test with a 0.05 significance threshold.
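A minimal sketch of this iterative procedure in Python (assuming numpy and scipy are available; ties and all-zero differences are ignored for simplicity):

    import numpy as np
    from scipy.stats import wilcoxon

    def threshold_of_precision(forecasts, outcomes, max_bins=100, alpha=0.05):
        # Smallest number of equal bins at which coarsening no longer
        # produces statistically significant rounding errors
        original = (forecasts - outcomes) ** 2
        for n_bins in range(1, max_bins + 1):
            width = 1.0 / n_bins
            index = np.minimum((forecasts / width).astype(int), n_bins - 1)
            coarsened = ((index + 0.5) * width - outcomes) ** 2
            # One-sided paired test: are coarsened scores systematically worse?
            _, p_value = wilcoxon(coarsened, original, alternative="greater")
            if p_value >= alpha:
                return n_bins
        return max_bins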
44
The average threshold is 6.1 bins, with a standard deviation of 4.4.
where an event they thought was unlikely did in fact occur. In another 7 percent
of cases, we observed no systematic costs to rounding analysts’ judgments into
“estimative verbs.”
Yet we found that foreign policy analysts could generally do better than this.
Coarsening judgments into estimative verbs systematically sacrificed predictive
accuracy for 77 percent of questions in the data set. Rounding these judgments
into confidence levels degraded respondents’ Brier scores 70 percent of the time.
Coarsening forecasts into seven-step words of estimative probability systemati-
cally sacrificed information for 42 percent of the questions we examined. And for
9 percent of these questions, we found that respondents could reliably parse their
judgments more finely than intervals of ten percentage points. In other words,
we found that the U.S. Intelligence Community’s official analytic standards for
assessing uncertainty systematically sacrificed information on nearly half of the
forecasting problems in our data set, and that there was a nontrivial proportion
of topics where even rough numeric expressions were insufficient to capture all
the insight foreign policy analysts could reliably provide.
Nevertheless, it could still be the case that returns to precision in probability
assessment are concentrated in particular sectors of foreign policy analysis. To
test this hypothesis, we classified questions as corresponding to eleven regions
(e.g., Western Europe, Sub-Saharan Africa) and fifteen topics (e.g., domestic
conflict, trade).45 We found that this information captured almost no system-
atic variation in returns to precision across questions in the GJP data set.46 This
finding—or, more accurately, this lack of consistent variation in returns to preci-
sion across substantive domains—further supports the argument that the value
of precision in probability assessment extends across a broad range of foreign
policy issues.
45
These categories were not mutually exclusive.
46
When we combined all 26 variables into a model predicting thresholds of estimative precision
across questions—a method that biases statistical analysis toward “overfitting” the data based on
spurious correlations—this model had an R2 value of just 0.16. See Friedman et al., “The Value of
Precision in Probability Assessment,” for more details on this analysis.
assessment varies across individuals, and shows that this capability does not de-
pend on special levels of numeracy, high degrees of educational achievement, or
particular cognitive profiles.
In addition to extending the robustness of the chapter’s empirical results,
this analysis speaks to long- standing concerns about the divide between
“mathematicians” and “poets” in foreign policy analysis. As we saw in chapter 1,
descriptions of this divide date back to the writings of Sherman Kent, who found
that many intelligence officials were extremely uncomfortable with the idea of
treating subjective judgments through the lens of rationalist logic.47 If this dis-
comfort maps onto aptitudes for assessing uncertainty—that is, if the so-called
mathematicians are the only analysts who can draw meaningful distinctions
among their probability estimates—that would undermine the generalizability
of the findings presented in this chapter.
This concern resonates with a large body of research that examines the rela-
tionship between numeracy and political knowledge.48 In many areas of public
policy, it appears that only a small fraction of the population can grasp politically
relevant numbers. One prominent example is that most voters possess wildly
inaccurate perceptions of how much money the U.S. government spends in dif-
ferent policy areas.49 A majority of American voters cannot estimate the size of
the budget deficit within an order of magnitude, let alone command the array
of figures necessary to understand the federal budget in a serious way.50 These
findings are disheartening, but they are not particularly surprising. Most voters
are not naturally exposed to information about the federal budget as part of
their daily lives, and there is virtually no other domain in which one encounters
dollar values so enormous. As a result, few people are positioned to develop an
47
For contemporary descriptions of this controversy, see Ben Connable, Embracing the Fog of
War (Santa Monica, Calif.: Rand, 2012); and Rob Johnston, Analytic Culture in the U.S. Intelligence
Community (Washington, D.C.: Center for the Study of Intelligence, 2005). Discomfort with scien-
tific approaches to national security analysis dates back to Clausewitz. See Azar Gat, The Origins of
Military Thought: From the Enlightenment to Clausewitz (Oxford, U.K.: Clarendon Press, 1989).
48
See, for example, Ellen Peters et al., “Numeracy and Decision Making,” Psychological Science,
Vol. 17, No. 5 (2006), pp. 407–413; and Michael Delli Carpini and Scott Keeter, What Americans
Know about Politics and Why It Matters (New Haven, Conn.: Yale University Press, 1996).
49
See, for example, Martin Gilens, “Political Ignorance and Collective Policy Preferences,”
American Political Science Review, Vol. 95, No. 2 (2001), pp. 379–396; James H. Kuklinski et al.,
“Misinformation and the Currency of Democratic Citizenship,” Journal of Politics, Vol. 62, No. 3
(2000), pp. 790–816; Steven Kull, Americans and Foreign Aid: A Study of American Public Attitudes
(College Park, Md.: Program on International Policy Attitudes, 1995).
50
On the broader cognitive challenges that many people face in dealing with money, see Dan
Ariely and Jeff Kriesler, Dollars and Sense: How We Misthink Money and How to Spend Smarter (New
York: HarperCollins, 2017).
intuitive sense about what it means for the federal government to spend a “small”
or a “large” amount of money on any policy issue, let alone to calibrate their
perceptions of the federal budget in a systematic way.
Yet there are two main reasons why these findings may not extend into the
realm of probability assessment. The first of these reasons is that probability
estimates fall along a scale that is easy to understand. All probabilities are
bounded between zero and one hundred percent. Virtually everyone knows that
a ten percent chance is quite small, that a ninety percent chance is quite large,
and that fifty-percent chance events occur as often as not. Thus, even though
probability is an abstract concept, it is reasonable to expect that most people
will find those numbers to be more intuitive than federal expenditures or other
numbers that citizens often struggle to comprehend.
Furthermore, most people naturally encounter probabilities throughout a
broad range of activities in daily life. Obvious examples involve weather forecasts,
sports statistics such as batting averages or free-throw percentages, and assigning
betting odds. This does not mean that we should expect everyone to possess
identical skill in assessing uncertainty. But the everyday nature of probabilistic
judgment is another reason to expect that fluency in this domain will be distrib-
uted more broadly than capabilities for dealing with other kinds of politically
relevant numbers.
Consistent with this argument, the remainder of the section will demonstrate
that the value of precision in probability assessment does not appear to depend
on numeracy, education, or cognitive style. Instead, the data suggest that the
ability to draw meaningful distinctions among probability assessments is a gen-
eralizable skill that foreign policy analysts can cultivate through training, effort,
and experience. And if that is the case, then the data presented in this chapter
may actually underestimate the value of precision in probability assessment
among professional foreign policy analysts, who can devote far more time and
effort to political forecasting than the individuals who participated in the Good
Judgment Project.
51
This variable had a mean of 4.36, standard deviation 8.02. Note that the mean threshold of estimative precision at the individual level is thus lower than the mean threshold of estimative precision at the question level. This is because the average forecaster registered 85 predictions throughout the GJP, whereas each question received more than 1,000 individual forecasts. Larger volumes of data make it possible to detect smaller statistical changes: hence the higher thresholds of estimative precision when aggregating the data by questions relative to individuals.
how these thresholds varied across the 1,832 forecasters who participated in the
Good Judgment Project.
My colleagues and I divided independent variables into two categories. The
first of these categories captures skill, training, effort, experience, and collabora-
tion. We can call these factors Targets for Cultivation, because organizations and
individuals have feasible opportunities to develop these attributes. The second
category of predictors in this analysis captures information on numeracy, edu-
cation, and cognitive style. If these prove to be the primary source of returns to
precision, then this finding would bolster the notion that the ability to assess
uncertainty with meaningful detail belongs to people with special backgrounds.
We can call these attributes Targets for Selection, because such findings would
suggest that improving returns to precision in foreign policy analysis is mainly
a matter of recruiting the right analysts for the job. This would also lend cre-
dence to the idea that individuals who self-identify as “poets” could justifiably
conclude that they would be incapable of assessing subjective probabilities with
meaningful detail.
The appendix provides more information on each of these attributes, but here
is a short description of what those variables entailed.52 We measured forecasting
skill by calculating respondents’ median Brier scores. Though we expected to
see that higher-quality forecasters would incur greater penalties from having
their forecasts coarsened, it is important to note that this relationship is not tau-
tological. It is possible for a forecaster to be excellent at discriminating events
that are unlikely from those that are likely, even if she is not especially good at
calibrating fine-grained distinctions within each category. This is, indeed, the hy-
pothesis implied by recommendations that foreign policy analysts express their
judgments using coarse language like estimative verbs or confidence levels.53
We used five additional variables to capture effort, training, experience, and
collaboration. We counted the number of questions each forecaster answered
across all the years of the competition, and we counted the average number of
times each forecaster revised those predictions. Each variable proxies for the ef-
fort respondents expended in engaging with the competition and for their ex-
perience responding to forecasting questions. We also included a measure of
the granularity of respondents’ forecasts, which captures the proportion of the
52
See appendix section 1d.
53
An analyst who is excellent at discrimination yet overconfident in her estimates could even see her estimates improve on account of rounding schemes that shift her most extreme judgments away from certainty.
time that forecasters registered probability assessments that were not multiples
of ten percentage points. Finally, we recorded which GJP respondents received
training in probabilistic reasoning as part of participating in the study, and
which forecasters were assigned to teams with whom they could collaborate
before making their judgments. Since the GJP assigned respondents to these
conditions at random, these data provide a unique opportunity to study how
training and group work improve probabilistic reasoning.
Our analysis spanned six Targets for Selection. We measured respondents’
education in terms of the highest academic degree they had obtained. We meas-
ured respondents’ numeracy using a standard battery of word problems that
psychologists use to capture mathematical fluency. We also examined four
batteries of questions that GJP forecasters completed to establish a cognitive
profile: Raven’s Progressive Matrices, which is an index that captures reasoning
ability; the expanded Cognitive Reflection Test, which reflects an individual’s
propensity to suppress misleading intuitive reactions in favor of more accurate,
deliberative answers; a Fox-Hedgehog index that captures respondents’ self-
assessed tendency to rely on ad hoc reasoning versus simplifying frameworks;
and Need for Cognition, a scale that reflects respondents’ self-assessed prefer-
ence for addressing complex problems.54
Results
To summarize, this portion of the analysis aims to provide insight into why
foreign policy analysts vary in their ability to draw meaningful distinctions
among subjective probabilities. From a conceptual standpoint, we can divide
these sources of variation into two groups: Targets for Cultivation and Targets
for Selection. My colleagues and I collected data on six separate factors that fell
into each category. The appendix presents a full statistical analysis of how these
variables predict returns to precision across individuals.55 Here, I will summarize
the main findings from that analysis.
We found that the Targets for Cultivation variables consistently predicted
returns to precision across forecasters. Across nearly two thousand people who participated
in the Good Judgment Project, forecasters demonstrated a greater ability
to draw meaningful distinctions among probabilities when they had better
forecasting skill, when they responded to more questions, when they re-
vised their forecasts more often, when they received training in probabilistic
54
In addition to measuring respondents’ cognitive style using these indices, we also included
control variables for age, gender, and whether a respondent was designated as a superforecaster in any
tournament year. See appendix section 1e.
55
See appendix section 1e.
reasoning, and when they worked together in groups.56 Granularity was the
only Target for Cultivation variable that did not prove to be a statistically signif-
icant predictor of returns to precision. In other words, we found that coarsening
probability assessments systematically sacrificed predictive accuracy, even
among forecasters who were naturally inclined to express their judgments in
terms of round numbers.
By contrast, we found that the Targets for Selection variables had almost
no ability to predict returns to precision in probability assessment. None of
our measures for education, numeracy, or cognitive style proved to be statis-
tically significant predictors of why some respondents demonstrated higher
returns to precision than others. We thus found no evidence that returns to
precision belong primarily to “mathematicians” versus “poets,” nor that the
ability to make subjective probability assessments belongs to people who
possess special cognitive skills. Instead, we found that when a broad range
of foreign policy analysts took the time and effort to assess uncertainty in
clear and structured ways, that consistently improved the accuracy of their
judgments.
These findings provide a heartening basis for generalizing from
the chapter’s empirical results. For example, collaboration tends to be much
denser among foreign policy professionals than it was among GJP groups who
worked together sporadically and online. Government agencies can train pro-
fessional analysts much more extensively than the simple, one-hour training
modules that GJP respondents received. Many foreign policy analysts assess
uncertainty on a regular basis over many years, which affords them far more
experience than the respondents who participated in the GJP competition.
And most practitioners have far more opportunity and incentive to refine and
revise their forecasts in light of new information.57 All these factors correlate
with higher returns to precision across the GJP data set. It is thus reasonable to
56. It is not surprising that Number of Questions predicted returns to precision among GJP respondents. Forecasters who registered more predictions were not only more experienced and more engaged in the competition, but they also provided more statistical power for calculating thresholds for estimative precision, such that smaller rounding errors would register as being statistically significant. The analysis presented here cannot isolate how much of this correlation resulted from sample size versus gains from experience. Yet either interpretation has the same practical implication: the more forecasts analysts make, the more likely it becomes that coarsening those estimates will systematically sacrifice information. Given the vast quantity of judgments that national security officials make, along with the vast numbers of interviews and essays that make up the marketplace of ideas, the relationship we observe between Number of Questions and returns to precision further emphasizes how the GJP data may understate the degree to which vague probability assessments sacrifice information from foreign policy discourse.
57. GJP respondents revised their forecasts, on average, less than twice per question.
expect that the findings presented here understate the degree to which placing
greater emphasis on probability assessment could improve the quality of for-
eign policy discourse.
A final set of objections holds that assessing uncertainty in clear and structured ways could expose foreign policy analysts to excessive criticism.
These arguments are important because they suggest another set of funda-
mental obstacles that foreign policy analysts face when it comes to assessing un-
certainty. Yet chapters 4 and 5 will explain that the theoretical and empirical
foundations of these arguments are much weaker than the conventional wisdom
suggests. Here, too, we will see that common objections to assessing uncertainty
in international politics reflect basic misperceptions about the nature and limits
of probabilistic reasoning.
4
Dispelling Illusions of Rigor
The book’s previous chapters explained how foreign policy analysts can assess
uncertainty in a manner that is theoretically coherent and empirically mean-
ingful, even when they are dealing with complex, unique phenomena. But are
those judgments actually useful for policymaking? In this chapter, we will en-
counter two main sources of skepticism to that effect. The first is that foreign
policy decision makers are largely uninterested in probabilistic reasoning, pre-
ferring that analysts “make the call” on tough questions. The second concern
is that debating assessments of uncertainty in clear and structured ways could
surround subjective judgments with illusions of rigor that warp high-stakes
choices.1 The illusions-of-rigor argument plays an especially prominent role in
what the book calls the “rejectionist” objection to probabilistic reasoning—the
idea that assessing uncertainty can be actively counterproductive in shaping for-
eign policy debates.
The rejectionists’ concerns are worth taking seriously in light of the growing
body of scholarship that shows how cognitive constraints hinder foreign policy
decision making.2 Yet while psychologists generally believe that decision makers
can mitigate their cognitive shortcomings by analyzing high-stakes issues in
clear and structured ways, the rejectionist viewpoint suggests that clarity and
structure can backfire when it comes to assessing uncertainty, activating psy-
chological distortions that create major problems of their own. At best, the
* Portions of this chapter previously appeared in Jeffrey A. Friedman, Jennifer S. Lerner, and Richard Zeckhauser, "Behavioral Consequences of Probabilistic Precision: Experimental Evidence from National Security Professionals," International Organization, Vol. 71, No. 4 (2017), pp. 803–826. © Cambridge University Press, reproduced with permission.
1. Of course, these arguments cannot simultaneously be true: the first posits that foreign policy officials ignore assessments of uncertainty, and the second posits that foreign policy officials care too much about such judgments. Ironically, we will see that both arguments are widespread in scholarship on foreign policy decision making and intelligence analysis.
2. Rose McDermott, Risk-Taking in International Politics (Ann Arbor, Mich.: University of Michigan Press, 1998); Daniel Kahneman and Jonathan Renshon, "Why Hawks Win," Foreign Policy, No. 158 (2007), pp. 34–38; Jack S. Levy, "Psychology and Foreign Policy Decision-Making," in Leonie Huddy, David O. Sears, and Jack S. Levy, eds., The Oxford Handbook of Political Psychology, 2nd ed. (New York: Oxford University Press, 2013); and Emilie M. Hafner-Burton et al., "The Behavioral Revolution and the Study of International Relations," International Organization, Vol. 71, No. S1 (2017), pp. S1–S31.
3. Zvi Lanir and Daniel Kahneman, "An Experiment in Decision Analysis in Israel in 1975," Studies in Intelligence, Vol. 50, No. 4 (2006). Available at https://www.cia.gov/library/center-for-the-study-of-intelligence/csi-publications/csi-studies/studies/vol50no4/an-experiment-in-decision-analysis-in-israel-in-1975.html.
4. Center for the Study of Intelligence, National Estimates: An Assessment of the Product and Process (Central Intelligence Agency, 1977), p. 34.
5. James Marchio, "The Intelligence Community's Struggle to Express Analytic Certainty in the 1970s," Studies in Intelligence, Vol. 58, No. 4 (2015), pp. 31–42.
6. Gregory F. Treverton, "Theory and Practice," Intelligence and National Security, Vol. 33, No. 4 (2018), p. 476. On how foreign policy decision makers often pressure analysts to "make the call" rather than offering uncertain judgments, see Mark M. Lowenthal, Intelligence: From Secrets to Policy, 3rd ed. (Washington, D.C.: CQ Press, 2006); Richard K. Betts, Enemies of Intelligence: Knowledge and Power in American National Security (New York: Columbia University Press, 2007); and Thomas Fingar, Reducing Uncertainty: Intelligence Analysis and National Security (Stanford, Calif.: Stanford Security Studies, 2011).
crucial to the logic of foreign policy decision making. But the analysts’ job is
to provide foreign policy decision makers with useful information. If decision
makers do not find assessments of uncertainty to be useful, then analysts might
reasonably choose to spend their time and effort elsewhere.7
At the same time, one of the central lessons of behavioral psychology is that
decision makers respond to a vast range of stimuli, knowingly or otherwise. Just
as no one would conclude that decision makers are free from confirmation bias
or the sunk-cost fallacy simply because they say so, there is no reason to con-
clude that assessments of uncertainty are irrelevant to foreign policy decision
making just because policymakers seem uninterested in the subject.
Indeed, decision makers’ apparent ambivalence toward assessments of un-
certainty might signal discomfort instead of apathy. As we saw in chapters 1
and 2, many foreign policy officials are reluctant to accept the fact that high-
stakes choices depend on subjective judgments. If reasonable people can disa-
gree about what those judgments entail, then it can be genuinely difficult to say
how that information should inform rigorous decisions.8 Decision makers may
thus prefer to steer policy debates toward information that is more concrete and
less arguable. Such behavior might resemble the “conspicuous disinterest” in
assessments of uncertainty that Kahneman and other scholars have described—
but this does not mean that foreign policy decision makers literally ignore those
judgments.
If scholars want to understand how probability assessments shape foreign
policy decisions, it is therefore important to study the topic systematically, and
not simply to infer what decision makers might or might not be thinking based
on outward appearances. The ideal research design for this purpose would com-
pare how foreign policy decision makers respond to scenarios that are iden-
tical in all respects, save for the manner in which analysts assess uncertainty.
Needless to say, it is impossible to conduct such a study from archival records,
given how foreign policy decisions vary on so many dimensions, and given how
foreign policy analysts tend to assess uncertainty in such vague and idiosyn-
cratic ways. A second-best, but far more feasible, research design is to employ
7. This concern resonates with a broad range of international relations scholarship that argues that foreign policy decision makers can be insensitive to nuance when assessing uncertainty, preferring instead to base their actions on gut instincts, salient cues, or simple analogies. Prominent examples include Robert Jervis, Perception and Misperception in International Politics (Princeton, N.J.: Princeton University Press, 1976); Keren Yarhi-Milo, Knowing the Adversary: Leaders, Intelligence, and Assessment of Intentions in International Relations (Princeton, N.J.: Princeton University Press, 2014); Aaron Rapport, Waging War, Planning Peace: U.S. Noncombat Operations and Major Wars (Ithaca, N.Y.: Cornell University Press, 2015); and Yuen Foong Khong, Analogies at War: Korea, Munich, Dien Bien Phu, and the Vietnam Decisions of 1965 (Princeton, N.J.: Princeton University Press, 1992).
8. Chapter 6 explores the theoretical basis for addressing this challenge.
9. Portions of experiments 2 and 3 appeared in Jeffrey A. Friedman, Jennifer S. Lerner, and Richard Zeckhauser, "Behavioral Consequences of Probabilistic Precision: Experimental Evidence from National Security Professionals," International Organization, Vol. 71, No. 4 (2017), pp. 803–826. The analyses of experiments 1 and 4 are new to this chapter.
10. See, for example, Jason Barabas and Jennifer Jerit, "Are Survey Experiments Externally Valid?" American Political Science Review, Vol. 104, No. 2 (2010), pp. 226–284. For evidence that survey experiments nevertheless provide reliable benchmarks for some kinds of political behavior, see Jens Hainmueller, Dominik Hangartner, and Teppei Yamamoto, "Validating Vignette and Conjoint Survey Experiments against Real-World Behavior," PNAS, Vol. 112, No. 8 (2015), pp. 2395–2400.
11. The notion that immediate, intuitive reactions to decision scenarios are particularly susceptible to cognitive biases is the central message of Daniel Kahneman, Thinking, Fast and Slow (New York: Farrar, Straus and Giroux, 2011).
12. On the relationship between survey research on non-elite samples and practical concerns in international relations, see Emilie Hafner-Burton, Alex Hughes, and David G. Victor, "The Cognitive Revolution and the Political Psychology of Elite Decision Making," Perspectives on Politics, Vol. 11, No. 2 (2013), pp. 368–386; Susan D. Hyde, "Experiments in International Relations: Lab, Survey, and Field," Annual Review of Political Science, Vol. 18 (2015), pp. 403–424; and Jonathan Renshon, "Losing Face and Sinking Costs: Experimental Evidence on the Judgment of Political and Military Leaders," International Organization, Vol. 69, No. 3 (2015), pp. 659–695.
often encountered when discussing the decision sciences with foreign policy
professionals. Even if some academic finding is well-established in studies of un-
dergraduate sophomores, this does not mean we should necessarily see the same
behavior among practitioners of international politics.
To guard against this problem, I administered all four of the survey
experiments described in this chapter to cross-sections of actual foreign policy
officials, as well as to larger samples of non-elites. This “paired sample” approach
offers unique insight into the extent to which foreign policy professionals and
non-elites differ in how they assess and interpret subjective probabilities. In general, we will see that these differences turn out to be relatively small, and that
all of the chapter’s substantive findings hold across both samples. This compar-
ison helps to reinforce the chapter's empirical results, while refuting concerns that foreign policy elites possess some kind of peculiar psychology that makes them unusually incapable of or unwilling to engage in probabilistic reasoning.
13. The scenarios appeared in random order.
14. See appendix section 2b.
a text box summarizing what those key judgments entailed.15 After reading each
scenario, respondents reported how much they supported or opposed taking
the proposed action.16 Respondents were also asked to say how much they
supported or opposed delaying that choice in order to gather more information.
15. Some of these estimates were represented in terms of the words of estimative probability shown in Figure 4.1, and others were represented in terms of numeric percentages. This variation provides the basis for evaluating the illusions-of-rigor thesis in the next section.
16. Respondents provided those assessments on a seven-point scale. For simplicity, the chapter will present information only on the proportion of respondents who supported or opposed taking action. All results are substantively similar (and indeed more statistically significant) when treatment effects are measured using the full seven-point measures.
The latter question will be relevant to evaluating the illusions-of-rigor thesis later
in the chapter.
The key manipulation in this experiment involved randomizing the prob-
ability assessments each scenario contained. A “neutral” version of each sce-
nario was designed to make the costs and benefits of taking action appear
roughly equivalent.17 A “pessimistic” version of each scenario entailed altering
those probability assessments in a manner that increased the potential costs
and reduced the potential benefits of taking risky action. An “optimistic”
version of each scenario entailed shifting probability assessments in the op-
posite direction. These changes were made as subtly as possible, involving
just a single step along the National Intelligence Council’s recommended
words-of-estimative-probability spectrum. Thus, while the neutral version of
the hostage-rescue mission said it was “very likely” that Special Forces could
retrieve the hostages alive, the pessimistic version said only that this possibility was "likely," and the optimistic version said that this outcome was "almost certain."18 Each respondent evaluated one randomly selected version of
each scenario.
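The following sketch illustrates this manipulation; the ladder follows the seven-step spectrum described above, while the helper function is an illustrative construction rather than part of the survey instrument.

```python
# Illustrative sketch of the experiment's key manipulation: shifting each
# probability phrase a single step along the words-of-estimative-probability
# spectrum. The helper function is a construction for illustration only.

WEP_LADDER = [
    "remote", "very unlikely", "unlikely", "even chance",
    "likely", "very likely", "almost certain",
]

def shift(term, steps):
    """Move a phrase up (+) or down (-) the ladder, clamped at the ends."""
    i = WEP_LADDER.index(term) + steps
    return WEP_LADDER[max(0, min(len(WEP_LADDER) - 1, i))]

neutral = "very likely"   # e.g., the chance of retrieving the hostages alive
print("pessimistic version:", shift(neutral, -1))  # -> likely
print("optimistic version: ", shift(neutral, +1))  # -> almost certain
```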
I administered this experiment to a cross-section of 208 national security
officials enrolled at the U.S. National War College. The National War College
provides advanced education to military officers who gain appointments to
the rank of colonel or commander, and draws students from each branch of
the U.S. military. The student body also includes active-duty officials of similar
rank from civilian agencies and from other countries’ militaries. Twenty-five
percent of respondents were professionals from the Department of State, the
Intelligence Community, or other foreign policy organizations. Thirteen per-
cent were military officers from countries besides the United States. National
War College students thus provide a relatively broad cross-section of rela-
tively high-ranking national security personnel. Response rates among this
sample are also extremely high—in this case, ninety-five percent—thereby
mitigating concerns about response bias that are common to elite-level survey
experiments.
17. Though such cost-benefit comparisons are inherently subjective, they proved to be relatively predictable. The national security officials who participated in this study supported taking action in these scenarios 45 percent of the time and opposed taking these actions 49 percent of the time, with 6 percent saying that they neither supported nor opposed these choices.
18. See appendix section 2c.
19. The workers who participated in this study were 52 percent female and 80 percent white; 61 percent reported that they held a college degree; and the median age was 35.
20. These differences are apparent not just when pooling data from the three scenarios together, but also when comparing responses to each scenario individually. When examining different versions of individual scenarios, there was only one instance out of nine in which shifting probability assessments did not cause a statistically significant shift in respondents' willingness to support risky actions. As mentioned above, all results are similar—and indeed statistically stronger—when measuring treatment effects in terms of seven-point scales.
[Two bar graphs. Upper panel: pessimistic scenarios (N = 675), neutral scenarios (N = 618), optimistic scenarios (N = 579). Lower panel: pessimistic scenarios (N = 4,338), neutral scenarios (N = 4,299), optimistic scenarios (N = 4,521). Vertical axes run from 0 to .6.]
Figure 4.2 How small changes in probability assessments shape support for risky actions. The graphs show how small variations in probability assessments substantially altered respondents' support for proposed actions. The bars in the graphs represent mean levels of support for proposed actions (plotted with 95 percent intervals). All differences are statistically significant at the p = 0.001 level or below.
the latter sample, the reliability of statistical estimates becomes greater, hence
the smaller confidence intervals surrounding the bars in the graph. But the sub-
stantive effects of manipulating probability assessments across subjects were essentially the same among participants recruited through AMT as they were for
national security officials enrolled at the National War College. The similarity of
responses across these samples helps to refute the notion that national security
professionals are, for whatever reason, unusually unwilling or particularly unable
to engage in probabilistic reasoning.
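For concreteness, the quantities plotted in Figure 4.2 are simply condition means with 95 percent confidence intervals, along the lines of the sketch below; the responses shown are invented placeholders rather than survey data.

```python
# Minimal sketch of the quantities plotted in Figure 4.2: the mean level of
# support within a condition, with a normal-approximation 95 percent
# confidence interval. The responses below are invented placeholders.
import math

def mean_with_ci(responses, z=1.96):
    n = len(responses)
    mean = sum(responses) / n
    var = sum((r - mean) ** 2 for r in responses) / (n - 1)  # sample variance
    half_width = z * math.sqrt(var / n)
    return mean, mean - half_width, mean + half_width

support = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]  # 1 = supported the action
m, lo, hi = mean_with_ci(support)
print(f"mean support: {m:.2f} (95% CI: {lo:.2f} to {hi:.2f})")
```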
These findings are consistent with a similar set of experiments that Richard
Zeckhauser and I conducted with another cross-section of national security
21. Jeffrey A. Friedman and Richard Zeckhauser, "Analytic Confidence and Political Decision Making: Theoretical Principles and Experimental Evidence from National Security Professionals," Political Psychology, Vol. 39, No. 5 (2018), pp. 1069–1087.
basis for this view is the notion that providing clear estimates of subjective probability creates illusions of rigor, leading decision makers to see these judgments as sounder or more informative than they really are.
These concerns appear throughout public policy analysis and the social
sciences.22 The political theorist Isaiah Berlin thus cautioned: “To demand or
preach mechanical precision, even in principle, in a field incapable of it is to be
blind and to mislead others.”23 The governing board on risk assessment of the
National Research Council captured a similar sentiment when it warned: "While
quantitative risk assessment facilitates comparison, such comparison may be il-
lusory or misleading if the use of precise numbers is unjustified.”24 And when the
National Intelligence Council incorporated words of estimative probability into
its analytic standards, it explained that this might be the outer limit of allowable
detail, since “assigning precise numerical ratings to such judgments would imply
more rigor than we intend.”25
The illusions-of-rigor thesis presents an especially direct challenge to the
arguments presented in this book. Previous chapters have explained how
marginalizing assessments of uncertainty can enable unsound choices. But the
illusions-of-rigor thesis suggests that efforts to clarify probabilistic reasoning
can have harmful effects, too. Instead of emphasizing uncertainty and encour-
aging careful reasoning about the ambiguity that surrounds major decisions, the
illusions-of-rigor thesis suggests that analytic clarity can drive decision makers
toward overconfidence and premature cognitive closure.
Yet, while this is another plausible reason for skepticism regarding probability
assessment in international politics, I am unaware of any previous scholarship
22. See David V. Budescu and Thomas S. Wallsten, "Subjective Estimation of Vague and Precise Uncertainties," in George Wright and Peter Ayton, eds., Judgmental Forecasting (Chichester: Wiley, 1987), pp. 63–82; Theodore Porter, Trust in Numbers: The Pursuit of Objectivity in Science and Public Life (Princeton, N.J.: Princeton University Press, 1995).
23. Isaiah Berlin, "On Political Judgment," New York Review of Books, October 3, 1996. Note that Berlin's concern goes beyond the arguments made by such scholars as Keynes and Mill that subjective probability assessments are meaningless (see chapter 2). Berlin argues that analytic precision is not just meaningless, but that it can also be actively misleading.
24. National Research Council, The Handling of Risk Assessments in NRC Reports (Washington, D.C.: National Academy Press, 1981), p. 15. See also Frank Ackerman and Lisa Heinzerling, Priceless: On Knowing the Price of Everything and the Value of Nothing (New York: New Press, 2004).
25. National Intelligence Council, Prospects for Iraq's Stability: A Challenging Road Ahead (January 2007), available at https://fas.org/irp/dni/iraq020207.pdf. For similar arguments, see Charles Weiss, "Communicating Uncertainty in Intelligence and Other Professions," International Journal of Intelligence and CounterIntelligence, Vol. 21, No. 1 (2008), pp. 57–85; Kristan J. Wheaton, "The Revolution Begins on Page Five: The Changing Nature of NIEs," International Journal of Intelligence and CounterIntelligence, Vol. 25, No. 2 (March 2012), pp. 330–349; and James Marchio, "'If the Weatherman Can . . .': The Intelligence Community's Struggle to Express Analytic Uncertainty in the 1970s," Studies in Intelligence, Vol. 58, No. 4 (2014), pp. 31–42.
that has systematically examined that subject. Indeed, given how foreign policy
analysts tend to leave their probability assessments so vague, it is hard to know
how concerns about illusions of rigor have emerged, and on what basis skeptics
have become so convinced that assessing uncertainty in clear and structured
ways would warp high-stakes choices.26
It is also unclear what form these illusions of rigor would actually take
should they exist. Consider what might have happened, for example, if General
McChrystal had told President Obama that sending forty thousand addi-
tional troops to Afghanistan would raise the U.S. military’s chances of success
from ten percent to forty percent (whereas we saw in chapter 1 that General
McChrystal justified the Afghan Surge by saying only that sending more troops offered the "best prospect for success in this important mission"). Assume for the sake
of argument that providing a clearer assessment of uncertainty would have
surrounded General McChrystal’s recommendation with an unjustified illusion
of rigor. Would that illusion of rigor have made General McChrystal’s recom-
mendation seem more defensible? Or would it have highlighted the prospect of
failure and thereby made President Obama less likely to approve the Afghan
Surge? I know of no existing studies that attempt to answer these questions.
Testing the illusions-of-rigor thesis is thus not simply a matter of gathering em-
pirical evidence—it also requires conducting the basic conceptual legwork of
specifying falsifiable claims about what the illusions-of-rigor problem entails.
This problem could take at least three forms. First, illusions of rigor could
cause decision makers to become more willing to put lives and resources at risk.
This argument builds from the premise that foreign policy officials may be un-
comfortable about the prospect of basing life-and-death decisions on subjective
judgments. Just as criminal courts forbid convicting defendants without con-
crete evidence of their guilt, there are clear political and ethical concerns that
militate against using lethal force based on arguable assumptions.27 If illusions
26. This argument became particularly prominent given the failure of Secretary McNamara's Whiz Kids during the Vietnam War. The orthodox narrative is that the Whiz Kids' use of quantitative analysis lent undeserved credibility to their arguments. Critics of structured analytic techniques for military planning almost always invoke this case as a cautionary tale. We saw in chapter 1, however, how the orthodox narrative gives the Whiz Kids too much credit for their analytic ambitions. Though McNamara's team constructed an unprecedentedly complex system for evaluating tactical outcomes in Vietnam, McNamara himself admitted that he devoted virtually no effort to analyzing the chances that U.S. strategy would ultimately succeed. In this sense, what appeared to be rigorous analysis on the part of McNamara's Defense Department actually ignored the most important elements of uncertainty surrounding escalation in Vietnam.
27. Prior to the invasion of Iraq, for example, public officials created the impression that the United States possessed reliable intelligence indicating that Saddam Hussein was developing WMDs, whereas those judgments actually depended on circumstantial evidence and questionable informants. If the speculative nature of these judgments had been clearer, then even if senior officials in Congress or the Executive Branch still believed that it was likely that Iraq was pursuing WMDs, they might have found it harder to justify pursuing regime change. On how these concerns appear in many areas of political decision making (and on how they form a rational basis for "ambiguity aversion"), see Friedman and Zeckhauser, "Analytic Confidence and Political Decision Making."
of rigor lead decision makers to believe they possess a stronger evidentiary basis
for evaluating choices under uncertainty, it could increase their willingness to
take military action.
A second way to specify the illusions-of-rigor thesis is to say that analytic pre-
cision does not necessarily bias decision makers toward supporting or opposing
risky actions, but that these illusions amplify the weight decision makers assign
to analysts’ judgments.28 When analysts express probabilities more precisely, de-
cision makers may believe that those judgments are more credible. Under this
interpretation, clarifying a seemingly favorable probability assessment (such as a
high chance that a hostage-rescue mission will succeed) could increase decision
makers’ support for taking action. By contrast, clarifying a seemingly unfavorable
assessment of uncertainty (such as a high chance that a drone strike will cause
collateral damage) could increase decision makers’ opposition to taking action.
A third way that illusions of rigor could impair decision making is by reducing
decision makers’ willingness to gather additional information. This is perhaps
the most straightforward implication of the idea that clear probability estimates
seem more reliable than they really are. When dealing with uncertainty, deci-
sion makers frequently confront a trade-off between acting immediately versus
conducting additional analysis. Because conducting additional analysis carries
costs—both the direct costs of gathering more information and the opportunity
costs of delay—rational decision makers must consider the potential benefits
that gathering this additional information might bring. If illusions of rigor lead
decision makers to believe that assessments of uncertainty are more reliable
than they really are, then this could also cause decision makers to undervalue
the benefits of delaying high-stakes choices, thereby encouraging a potentially
harmful rush to judgment.
There are thus three distinct ways in which illusions of rigor can potentially
degrade the quality of foreign policy decision making. Before we move on to test
these claims, however, it is worth noting that each of these arguments relies on an
28. For psychological research supporting this perspective in other areas of decision making, see Ido Erev, Gary Bornstein, and Thomas S. Wallsten, "The Negative Effect of Probability Assessments on Decision Quality," Organizational Behavior and Human Decision Processes, Vol. 55, No. 1 (1993), pp. 78–94; Claudia C. Gonzalez-Vallejo, Ido Erev, and Thomas S. Wallsten, "Do Decision Quality and Preference Order Depend on Whether Probabilities Are Verbal or Numerical?" American Journal of Psychology, Vol. 107, No. 2 (1994), pp. 157–172; and Nathan F. Dieckmann, Paul Slovic, and Ellen M. Peters, "The Use of Narrative Evidence and Explicit Likelihood by Decisionmakers Varying in Numeracy," Risk Analysis, Vol. 29, No. 10 (2009), pp. 1473–1488.
29. For a review of fundamental research in this field, see Thomas Wallsten, "Costs and Benefits of Vague Information," in Insights in Decision Making, ed. Robin Hogarth (Chicago, Ill.: University of Chicago Press, 1990), pp. 28–43.
30. We translated words of estimative probability into numeric percentages in terms of the round number (multiples of 5 percentage points) that was closest to the midpoint of the range that each word of estimative probability represented. Thus, given that the word "unlikely" spans probabilities from 29 percent to 43 percent, according to National Intelligence Council guidelines, we translated this term into an estimate of 35 percent for the experiment's numeric condition. Of course, there are many other ways we could have chosen to vary the precision that these judgments entailed. Chapter 1, for example, described how national security analysts often express their beliefs in terms of confidence levels or estimative verbs. However, we expected that any problems associated with illusions of rigor would likely be the most pronounced when comparing the way respondents reacted to the difference between qualitative and quantitative judgments, and so that is where we chose to put our main focus.
31. Appendix section 2a provides more details on respondent demographics for all four experiments presented in this chapter.
32. This is also true if we examine the two samples of national security officials separately. The findings presented in Figures 4.3 and 4.4 hold in multivariate regression as well. See Friedman, Lerner, and Zeckhauser, "Behavioral Consequences of Probabilistic Precision."
[Two panels plotting treatment effects; vertical axes run from –.2 to .1.]
Figure 4.3 How quantifying probability assessments shapes support for risky actions. The
figure shows the “treatment effect” associated with providing respondents with probability
assessments expressed using numbers instead of words. The bars represent the change in
the proportion of respondents who support taking action (with 95 percent intervals).
Figure 4.3 shows that the experimental data refuted both of these predictions.
Quantifying probability assessments actually had its strongest impact when it
came to reducing support for risky actions in the neutral scenarios, and it did
not increase support for risky action in the optimistic scenarios. These results
show that decision makers do not simply assign greater weight to explicit prob-
ability assessments. Instead, we see that clearer probabilistic reasoning caused
decision makers to be more careful about taking risks where the evidence was
most ambiguous. This is consistent with the idea that assessing uncertainty in
clearer and more structured ways makes leaders more sensitive to the risks that
[Two panels comparing respondents given verbal probabilities with respondents given numeric probabilities: N = 833, p = 0.08 (national security officials) and N = 4,386, p < 0.001 (AMT respondents).]
Figure 4.4 How analytic precision shapes willingness to delay risky choices. The figure
compares the proportion of respondents who support delaying risky choices to gather
more information, based on whether those respondents were provided with probability
assessments expressed in verbal versus numeric form. Data shown with 95 percent
intervals. AMT = Amazon Mechanical Turk.
33. This finding falls just outside the standard p = 0.05 threshold for statistical significance in the national security officials sample. Yet this is partly because collapsing survey respondents into binary categories (i.e., whether or not they support delay) sacrifices information. If we instead measure treatment effects using the full seven-point scale respondents used to provide their answers, then the treatment effect for national security officials is statistically significant at p = 0.002.
34. Elite respondents completed a survey containing both this experiment and the decision scenarios from experiment 2 (in random order). This sample of AMT respondents was recruited specifically for this study. See appendix section 2a for more detail on respondent demographics.
35. The experiment's full question list is in appendix section 2e. All findings in this section are robust to examining forecasts or non-forecasts independently.
36. Together, these surveys produced a total of 61,901 probability estimates (excepting 14 estimates that were greater than 100 percent, presumably due to typographical errors).
[Two panels plotting, on vertical axes from 0% to 20%, the frequency of judgments in each category: remote (0.00–0.14), very unlikely (0.15–0.28), unlikely (0.29–0.42), even chance (0.43–0.56), likely (0.57–0.71), very likely (0.72–0.85), and almost certain (0.86–1.00).]
When we analyzed the data from this experiment, we found that the numeric
probabilities respondents provided were not just more precise than the verbal
judgments. Instead, we found that the two sets of judgments were distributed in
entirely different ways. Figure 4.5 describes this pattern. The black circles in the
figure show the frequency with which qualitative assessors employed each word
of estimative probability. The graphs show that respondents were most likely to
employ terms near the middle of the probability spectrum. The hollow circles
in Figure 4.5 plot the frequency with which respondents provided numeric
estimates that corresponded to each word of estimative probability. The distri-
bution of these judgments is almost exactly the opposite of what the qualitative
assessors reported: those judgments tended to cluster near the edges of the
probability spectrum, not the middle.37
The remainder of this section explains how these data indicate that foreign
policy analysts face a trade-off between discrimination and calibration when
assessing uncertainty. As described in chapter 3, discrimination reflects the
ability to distinguish which outcomes are more likely than others, whereas cal-
ibration reflects the congruence between probability estimates and observed
frequencies. For example, a weather forecaster who always predicts a ten per-
cent chance of rain in an area where it rains on ten percent of the days would
demonstrate excellent calibration but no discrimination. If a different weather
forecaster said there was a sixty percent chance of precipitation every time it
rained and a forty percent chance of precipitation every day it was sunny, then
this would reflect perfect discrimination but poor calibration.
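A small computation makes the distinction concrete. The sketch below encodes these two hypothetical forecasters, measuring calibration (crudely) as the gap between the average forecast and the observed base rate, and discrimination as the gap between average forecasts on rainy versus dry days; both one-number measures are simplifications of the formal definitions.

```python
# Sketch of the calibration/discrimination distinction, using the two
# hypothetical weather forecasters from the text. These one-number measures
# are illustrative simplifications of the formal definitions.

def describe(name, forecasts):
    """forecasts: list of (stated probability of rain, 1 if it rained else 0)."""
    rainy = [p for p, y in forecasts if y == 1]
    dry = [p for p, y in forecasts if y == 0]
    base_rate = sum(y for _, y in forecasts) / len(forecasts)
    mean_forecast = sum(p for p, _ in forecasts) / len(forecasts)
    print(name)
    print(f"  calibration gap: {abs(mean_forecast - base_rate):.2f}"
          "  (mean forecast vs. observed frequency; 0 is perfect)")
    print(f"  discrimination:  {sum(rainy)/len(rainy) - sum(dry)/len(dry):.2f}"
          "  (rainy-day mean minus dry-day mean; higher is better)")

days = [1] + [0] * 9  # it rains on one day in ten

describe("Always predicts 10%:", [(0.10, y) for y in days])
describe("Says 60% when it rains, 40% when dry:",
         [(0.60, y) if y else (0.40, y) for y in days])
```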
We saw in chapter 3 that foreign policy analysts are remarkably effective at
discriminating among subjective probabilities. Indeed, chapter 3 showed that
foreign policy analysts were so effective at this task that even rough quantita-
tive expressions could not capture the distinctions that analysts could reliably
draw when assessing uncertainty in international politics. Yet the data shown in
Figure 4.5 indicate that providing this level of detail can come at the expense of
calibration. This pattern does not appear to reflect a quirk in the data: it is highly
statistically significant; it appears in both elite and non-elite samples; and a later
part of this section will replicate that finding in another experiment that em-
ployed a separate question list.
The discrimination-calibration trade-off is arguably the most important draw-
back to transparent probabilistic reasoning that we will encounter in this book.
And it is important to know what that trade-off entails—in particular, the data
presented in this section indicate that verbal and numeric probabilities should
not be seen as equivalent expressions, but rather as entirely separate languages
for assessing uncertainty. Yet the remainder of this section will show that the
discrimination-calibration trade-off mainly appears among low-quality analysts;
that mere minutes of feedback can mitigate the problem; and that foreign policy
decision makers who are forced to choose between conducting debates in a lan-
guage that is either highly discriminating or well calibrated should generally
prefer the former.
37. All differences in proportions between qualitative and quantitative assessors in Figure 4.5 are statistically significant at the p < 0.01 level.
38. On persistent discrepancies between the intended and actual definitions of verbal probability estimates, see David V. Budescu, Han-Hui Por, Stephen B. Broomell, and Michael Smithson, "Improving Communication of Uncertainty in the Report of the Intergovernmental Panel on Climate Change," Psychological Science, Vol. 20, No. 3 (2009), pp. 299–308.
39. Friedman, Lerner, and Zeckhauser, "Behavioral Consequences of Probabilistic Precision," refer to this as the "numbers as a second language" problem. For similar ideas in other disciplines, see Wallsten, "Costs and Benefits of Vague Information"; and Alf C. Zimmer, "Verbal vs. Numerical Processing of Subjective Probabilities," in R. W. Scholtz, ed., Decision Making under Uncertainty (Amsterdam: North-Holland, 1983).
Even though this information is less precise than providing a numeric estimate
of the chances that you will head to the polls, survey researchers generally be-
lieve that qualitative response scales are less confusing and more reliable.
The broader practice of eliciting survey responses using "feeling thermometers," "rating scales," or other non-numeric tools is usually justified on sim-
ilar grounds. In each of these areas, scholars deliberately sacrifice their ability to
observe fine-grained detail (that is, they sacrifice discrimination) because they
worry that the act of providing such detail might warp the content of what survey
respondents say (and would thus sacrifice calibration). Figure 4.5 indicates that
a version of this trade-off also appears when researchers are eliciting subjective
probability estimates. Specifically, Figure 4.5 shows that foreign policy analysts
who express subjective probabilities using numbers instead of words also tend to
make those estimates more extreme.
The most plausible explanation for this behavior is that respondents relied on
the cut-points between words of estimative probability as anchors for calibrating
their judgments. Thus, if a foreign policy analyst thinks that a statement is more
probable than not but still far from certain, then she would presumably express
this judgment as being “likely” or “very likely” according to the standard words-
of-estimative-probability spectrum that we used in our experiment. This analyst’s
response would thus turn on her perception of whether her judgment falls above
or below the boundary between two terms. If she wishes to be cautious, then our
analyst might say that her judgment is, simply, “likely.” But if we ask her to ex-
press that same judgment using numeric percentages, then she would not have a
natural benchmark to use when calibrating her judgment. This analyst might say
that a judgment that is more probable than not but still far from certain is some-
thing like eighty percent. That estimate actually lies beyond the range of values
that the word “likely” could plausibly take according to the guidelines that we
used in our experiment. This is the sense in which the boundaries between
words of estimative probability can provide anchors for restraining the natural
inclinations of foreign policy analysts to make their judgments too extreme.
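A brief sketch makes this anchoring logic concrete, using the category boundaries from Figure 4.5. An analyst who cautiously says "likely" implies a probability of at most 0.71, whereas the same analyst, asked for a number, might report eighty percent, which already belongs to the next category up the spectrum.

```python
# Sketch of the anchoring logic described above, using the category
# boundaries shown in Figure 4.5 (upper bound of each phrase's range).

WEP_UPPER_BOUNDS = [
    (0.14, "remote"), (0.28, "very unlikely"), (0.42, "unlikely"),
    (0.56, "even chance"), (0.71, "likely"), (0.85, "very likely"),
    (1.00, "almost certain"),
]

def word_for(p):
    """Return the estimative phrase whose range contains probability p."""
    for upper, word in WEP_UPPER_BOUNDS:
        if p <= upper:
            return word
    raise ValueError("probability must lie between 0 and 1")

print(word_for(0.65))  # -> likely: within the range the analyst intended
print(word_for(0.80))  # -> very likely: the "eighty percent" answer has
                       #    drifted past the upper bound of "likely"
```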
The tendency of foreign policy analysts to attach extra certainty to numeric
judgments appears to exact a toll on judgmental accuracy. When we measured
judgmental accuracy in terms of each respondent’s overall Brier score, then
national security officials who assessed uncertainty using words of estima-
tive probability instead of percentages performed 14 percent better than their
counterparts (0.230 versus 0.265).40 Among the AMT respondents, the gap in
40. The appendix describes this methodology in more detail, and explains how these findings are robust to a range of alternative approaches for comparing the accuracy of qualitative and quantitative judgments. The appendix also provides more detail on the relationship between certitude and judgmental accuracy in this experiment.
performance was 11 percent (0.276 versus 0.310). Both findings were statisti-
cally significant at the p < 0.001 level (that is, the chances that the data would
produce these distinctions through random chance are less than one in a thou-
sand). These patterns hold regardless of respondents’ numeracy, gender, lan-
guage, nationality, age, education, or military experience.41
One group of respondents, however, drove these patterns more than others.
In particular, the gap in performance between qualitative and quantitative
assessors in this experiment was almost entirely associated with the worst-
performing analysts in the study. For instance, if we exclude respondents whose
average Brier scores fell into the bottom quartile of their respective samples, then
this eliminates roughly two-thirds of the gap in performance between qualitative
and quantitative assessors.42 And if we limit the analysis to respondents whose
Brier scores were better than the median within their respective samples, then
there is no longer any statistically significant difference between the Brier scores
associated with qualitative and quantitative assessors.43 This finding suggests
that, even if some foreign policy analysts face a trade-off between discrimina-
tion and calibration when assessing uncertainty, it may not take much talent or
effort to overcome that problem. The next section presents additional data that
supports this view.
44. The elite respondents surveyed in this experiment came from a second cohort at the National War College, surveyed a year after the survey experiments described in prior sections of this chapter. See appendix section 2a for demographic details.
45. See appendix section 2e for the question list.
46. The other half of respondents received placebo text of similar length that did not include performance-related feedback. See appendix section 2h.
47. See appendix section 2h.
48. Respondents in the qualitative assessment condition received equivalent information, as documented in appendix section 2h.
6 percent better, on average, than those of the respondents who received the
placebo treatment. That improvement amounted to roughly one- third of
a standard deviation in respondent Brier scores, and it fell just outside the
standard threshold for statistical significance (p = 0.06). Among the 1,208 AMT
respondents who participated in the same experiment, individuals who received
calibration feedback provided assessments of uncertainty with Brier scores that
were 5 percent better, on average, than their counterparts. This improvement
also reflected roughly one-third of a standard deviation in overall performance,
and with a much larger amount of data to draw on, this finding was highly
statistically significant (p < 0.001).
Figures 4.6 and 4.7 show that this brief training also narrowed the diver-
gence between qualitative and quantitative judgments. Figure 4.6 shows how
[Two panels plotting respondent certitude, for words versus numbers, among untrained and trained analysts; vertical axes run from roughly .15 to .30.]
Figure 4.6 How brief calibration training influenced respondent certitude. The graphs
display the average levels of certitude (with standard deviations) that respondents
attached to their judgments, based on whether or not they assessed uncertainty using
words versus numbers, and on whether or not they received brief calibration training.
[Bars plotting average Brier scores, for words versus numbers, among untrained and trained analysts; vertical axis runs from roughly .15 to .25.]
Figure 4.7 How brief calibration training influenced response accuracy. The graphs
display respondents’ average Brier scores (with standard deviations), based on whether
or not they assessed uncertainty using words versus numbers, and on whether or not they
received brief calibration training.
respondents who received calibration feedback attached far less certainty to their
judgments.49 Figure 4.7 then shows how this adjustment helped to narrow the
performance gap between qualitative and quantitative assessors. Among national
security professionals, those two minutes of training closed roughly one-quarter
49. Among national security officials in the placebo condition, numeric assessors attached an average of 6.6 extra percentage points of certainty to their judgments (0.307 versus 0.241, p < 0.001); among national security officials in the training condition, numeric assessors attached an average of 5.2 extra percentage points of certainty to their judgments (0.240 versus 0.187, p < 0.001). Among non-elite respondents, the gap in certitude declined from 4.4 percentage points in the placebo condition (0.228 versus 0.185, p < 0.001) to 3.3 percentage points in the training condition (0.194 versus 0.161, p < 0.001).
Figure 4.8 Summary of experiments and results. AMT = Amazon Mechanical Turk; DM = decision maker.
of the gap in Brier scores between analysts who assessed probabilities using
numbers versus words. Among AMT respondents, the training eliminated this
gap almost entirely, with the difference in mean Brier scores between qualitative
and quantitative assessors becoming trivial (0.281 versus 0.282) and no longer
approaching statistical significance (p = 0.63).50
Practical Implications of the Discrimination-Calibration Trade-off
Of course, mitigating the discrimination-calibration trade-off is not the same
thing as making it go away. And even if brief training could eliminate that trade-
off entirely, this would still require effort on the part of foreign policy analysts
or organizations. The fact remains that some people are better able to calibrate
their assessments of uncertainty in international politics using verbal language.
To my knowledge, this is a novel finding. That finding appears to deserve more
attention than concerns about illusions of rigor—a widespread argument that
received no discernible support from this chapter’s experimental data.
There are, nevertheless, three reasons why the discrimination-calibration
trade-off does not appear to reflect a major barrier to foreign policy discourse.
The first reason is that the discrimination-calibration trade-off appeared mainly
among low-quality analysts, particularly those in the bottom quartile of overall
performance. These respondents may have shown that they were not intuitively
equipped to translate their assessments of uncertainty into numbers, but their
performance hardly indicates that serious, talented foreign policy analysts could
not manage this challenge if they set their minds to doing so.
We then saw that small amounts of effort indeed make a substantial dent
in the discrimination-calibration trade-off. Just two minutes of training signif-
icantly mitigated this trade-off among national security officials and entirely
eliminated the problem among non-elite respondents. We also saw that brief
feedback yielded across-the-board benefits for improving judgmental accuracy.51
50. See appendix section 2i for full results.
51. This finding is consistent with research from the Good Judgment Project (GJP). As described in chapter 3, the GJP demonstrated that just one hour of online training in probability assessment produced substantial improvements in forecasting performance that persisted throughout four separate, year-long study periods. See Welton Chang, Eva Chen, Barbara Mellers, and Philip Tetlock, "Developing Expert Political Judgment: The Impact of Training and Practice on Judgmental Accuracy in Geopolitical Forecasting Tournaments," Judgment and Decision Making, Vol. 11, No. 5 (2016), pp. 509–526.
52. For skepticism regarding the theoretical and empirical basis for organizational intelligence reforms, see Richard A. Posner, Preventing Surprise Attacks: Intelligence Reform in the Wake of 9/11 (Lanham, Md.: Rowman and Littlefield, 2005); Uri Bar-Joseph and Rose McDermott, "Change the Analyst and Not the System: A Different Approach to Intelligence Reform," Foreign Policy Analysis, Vol. 4, No. 2 (April 2008), pp. 127–145; Paul Pillar, Intelligence and U.S. Foreign Policy: Iraq, 9/11 and Misguided Reform (New York: Columbia University Press, 2011); and Betts, Enemies of Intelligence.
Conclusion
We have seen throughout the book how many scholars and practitioners believe
that probabilistic reasoning in international politics is either meaningless, coun-
terproductive, or both. Yet chapter 3 demonstrated that foreign policy analysts
are, in fact, capable of assessing subjective probabilities with remarkable levels of
detail. The evidence presented in this chapter then showed that clear assessments
of uncertainty do not appear to mislead decision makers in the manner that con-
ventional wisdom suggests.
After administering four survey experiments to more than six hundred na-
tional security officials and over three thousand non-elite respondents, I found
no evidence to support the common view that clear assessments of uncertainty
in international politics create harmful illusions of rigor. These experiments also
refuted concerns that foreign policy decision makers are somehow unable or un-
willing to grapple with subjective probabilities when they evaluate risky choices.
Instead, the chapter showed that subtle differences in probability estimates
convey information that national security officials use to evaluate high-stakes
decisions, and that expressing these distinctions more clearly causes decision
makers to become more cautious about placing lives and resources at risk. The
chapter then showed that, even though some people face a trade-off between dis-
crimination and calibration when assessing uncertainty, this problem appeared
mainly among low-quality analysts, and it can be mitigated with mere minutes
of effort.
Figure 4.8 summarizes the design, results, and implications of these
experiments. In addition to testing specific hypotheses about the psychology
of assessing uncertainty in international politics, these studies offer three broad
messages.
The first of these messages relates to the state of existing scholarship on
the psychology of foreign policy analysis. Throughout the chapter, we have
encountered a broad range of arguments about the cognitive constraints that
53. On methods for collecting and evaluating calibration data across foreign policy organizations writ large, see Jeffrey A. Friedman and Richard Zeckhauser, "Why Evaluating Estimative Accuracy Is Feasible and Desirable," Intelligence and National Security, Vol. 31, No. 2 (2016), pp. 178–200.
54. For more on how widespread assumptions about assessing uncertainty in international politics lack rigorous foundations, see Philip Tetlock, "Second Thoughts about Expert Political Judgment," Critical Review, Vol. 22, No. 4 (2010), pp. 467–488.
Finally, and most importantly, the findings presented in this chapter rebut
common notions that human cognition is somehow unequipped to assess sub-
jective probability in international politics. No one should pretend that this
challenge is easy or uncontroversial. But that is true of virtually any element
of high-stakes decision making. For practical purposes, the key question is not
just how good we are at assessing uncertainty in general, but how we can ap-
proach this challenge as effectively as possible. The experiments presented in
this chapter have honed wide-ranging concerns about this topic to a specific
and previously undocumented trade-off that training can apparently correct.
These findings support the book’s broader argument that foreign policy analysts
and decision makers do best when they confront the challenge of assessing
uncertainty head-on, particularly if they are willing to put minimal effort into
calibrating their views.
5
The Politics of Uncertainty and Blame
In March 2009, a series of seismic tremors shook the city of L’Aquila, Italy. Italy’s
National Commission for the Forecast and Prevention of Major Risks met on
March 31. The commission observed that low-level tremors generally do not
predict the onset of major earthquakes, and concluded that there was no cause
for alarm. Within a week, a 6.3-magnitude earthquake struck L’Aquila, killing
more than three hundred people. Seven members of the risk commission were
subsequently convicted of manslaughter.
Scientists swiftly condemned this verdict. Earthquakes are rare, it is essen-
tially impossible to predict them, and it is true that low-level tremors do not
consistently predate major shocks. Punishing analysts for saying as much could
encourage them to give inflated warnings, to keep their judgments deliberately
vague, or to avoid addressing controversial subjects. Many scientists thus argued
that the L’Aquila convictions were not only unfair to the risk commission itself,
but that this episode would also discourage experts from assisting with public
policy matters moving forward.
Yet the judge who issued the L’Aquila verdict explained that his decision was
not, in fact, based on the scientists’ inability to predict natural disasters. Instead,
he argued that the risk commission had been negligent in offering “vague, ge-
neric, and ineffective” information.1 For example, the official in charge of the
commission had said in a press conference that “the scientific community tells
me there is no danger,” which was not the same thing as saying that small-scale
tremors do not reliably signal larger shocks. “I’m not crazy,” said the prosecutor
in the case—“I know they can’t predict earthquakes.” The basis for the charges,
he insisted, was that the risk commission had overstepped the evidence in
appearing to conclusively reject the possibility of an impending crisis.2
1. Lizzy Davies, "L'Aquila Quake: Italian Judge Explains Why He Jailed Scientists over Disaster," The Guardian, January 18, 2013.
2. Michael Lucibella, "Communication Breakdown Played Role in Scientists' Convictions," American Physical Society News, Vol. 21, No. 1 (2012), pp. 1–6.
Of course, Italy’s risk commission would surely have come under fire no
matter how it had communicated its findings to the public. In the aftermath of
crises like the L’Aquila earthquake, the public naturally searches for scapegoats.
But that is different from saying that the prospect of criticism prevents public
officials from making clear and honest assessments of uncertainty. In the L’Aquila
case, the risk commission received special scrutiny for appearing to ignore the
uncertainty that surrounded their predictions.
This chapter explains how similar dynamics surround foreign policy analysis.
The politics of foreign policy are notoriously adversarial, with critics poised to
claim that unpleasant surprises reflect analytic malfeasance. It is widely observed
that there are only two kinds of outcomes in international affairs: policy successes
and intelligence failures.3 Many scholars and practitioners thus believe that for-
eign policy analysts face a trade-off between providing useful assessments of
uncertainty and minimizing their exposure to criticism.4 As a shorthand, the
chapter refers to this logic as the politics of uncertainty and blame.
It is important to note that concerns about the politics of uncertainty and
blame are not purely self-serving. Foreign policy analysts cannot have much in-
fluence unless they are perceived to be credible. And analysts who come under
fire for making errors can lose their credibility, regardless of whether or not
this criticism is deserved. As Thomas Fingar, a former deputy director of national intelligence, puts it, “Criticism has a badly corrosive effect on the confidence
in—and the confidence of—the analytic community.”5 Scholars, practitioners,
and pundits thus have legitimate reasons to worry about how their errors will be
perceived by potential critics.
3
As intelligence scholar John Hedley puts it, “Anything that catches the U.S. by surprise and is bad news is deemed an intelligence failure.” Hedley, “Learning from Intelligence Failures,” International Journal of Intelligence and CounterIntelligence, Vol. 18, No. 3 (2005), p. 436. See also Paul R. Pillar,
Intelligence and U.S. Foreign Policy (New York: Columbia University Press, 2011), pp. 175–201; and
Philip E. Tetlock and Barbara A. Mellers, “Intelligent Management of Intelligence Agencies: Beyond
Accountability Ping-Pong,” American Psychologist, Vol. 66, No. 6 (2011), pp. 542–554.
4
On this point, see Philip E. Tetlock and Dan Gardner, Superforecasting: The Art and Science
of Prediction (New York: Crown, 2015); John Mueller and Mark G. Stewart, Chasing Ghosts: The
Policing of Terrorism (New York: Oxford University Press, 2016); and Nate Silver, The Signal and the
Noise: Why Most Predictions Fail—but Some Don’t (New York: Penguin, 2012).
5
Thomas Fingar, Reducing Uncertainty: Intelligence Analysis and National Security (Stanford,
Calif.: Stanford Security Studies, 2011), p. 35. This statement holds for professional foreign policy
analysts as well as for public intellectuals who seek to inform the marketplace of ideas. Increasing
political polarization throughout Western democracies is thus widely seen as driving a so-called
death of expertise, as voters lose faith in intellectual elites and become increasingly attached to beliefs
having little basis in fact. See, for example, Tom Nichols, “How America Lost Faith in Expertise,”
Foreign Affairs, Vol. 96, No. 2 (2017), pp. 60–73; and Matthew Motta, “The Dynamics and Political
Implications of Anti-intellectualism in the United States,” American Politics Research, Vol. 46, No. 3
(2018), pp. 465–498.
Yet, in principle, this is exactly the kind of problem that transparent probabi-
listic reasoning should solve. The purpose of assessing uncertainty is to explain
what analysts do not know, and to emphasize how decision makers cannot count
on assumptions being true. As we have seen in previous chapters, there is vir-
tually no way to conclude that a single probability assessment is “wrong.” And
when analysts express probabilities more clearly, they leave less room for critics
to misperceive their views.
Moreover, we will see throughout the chapter how the conventional wisdom
regarding the politics of uncertainty and blame contains basic contradictions.
For instance, if foreign policy analysts were truly judged on the basis of policy
outcomes, then they would have no incentive to conceal or to distort their
views.6 The notion that analysts benefit from leaving their judgments vague
assumes that adversarial critics are willing to grant analysts the benefit of the
doubt when it comes to interpreting what those judgments mean. And in order
to believe that foreign policy analysts have an incentive to draw excessively cau-
tious conclusions, one must assume that critics actually care about the nuances
of how analysts express uncertainty—which, of course, contradicts the premise
that critics automatically treat all unpleasant surprises as analytic failures. The
chapter thus explains that the conventional wisdom about the politics of un-
certainty and blame is more tenuous than it seems and may in fact get its sub-
ject exactly backward: if foreign policy analysts are truly worried about making
errors or having their judgments distorted, their best move might be to embrace
probabilistic reasoning, rather than to avoid it.
The chapter backs this argument with a combination of qualitative and quan-
titative evidence. The qualitative evidence explores the historical record of
perceived intelligence failures in the United States. With just one clear exception
(assessments of the Soviet Union’s development of nuclear weapons), we will
see that intelligence analysts did not come under fire in these cases as a result
of making clear and honest assessments of uncertainty. Quite the opposite, the
judgments analysts offered in these cases tended to be extraordinarily vague
or to avoid important elements of uncertainty entirely. Yet we will see that these
practices did not shield analysts from blame in the manner that the conventional
wisdom expects. Instead, critics tended to exploit the gaps and ambiguities in in-
telligence reporting in order to make the Intelligence Community appear more
complacent or more mistaken than it really was.
The chapter then replicates this finding through a survey experiment
designed to understand why critics evaluate some assessments of uncertainty
6
On how outcome-based evaluations do not necessarily distort analysts’ incentives, see Mehmet
Y. Gurdal, Joshua B. Miller, and Aldo Rustichini, “Why Blame?” Journal of Political Economy, Vol. 121,
No. 6 (2013), pp. 1205–1247.
more harshly than others. The data from this experiment revealed that analysts
were judged less on the basis of judgmental accuracy than on policy outcomes.
Since those outcomes lie outside analysts’ control, this is a clear sense in which
criticism can be unfair. But we will also see little evidence that this criticism gives
analysts any incentive to distort or conceal their judgments. Criticism directed
toward foreign policy analysts in this experiment was systematically lower, on
average, when analysts framed their judgments with explicit probabilities. These
data complement the chapter’s historical evidence in suggesting that transparent
probabilistic reasoning not only improves the quality of foreign policy discourse,
but also protects the interests of foreign policy analysts themselves.
7
For a review of foundational literature on this subject, see Jennifer S. Lerner and Philip E.
Tetlock, “Accounting for the Effects of Accountability,” Psychological Bulletin, Vol. 125, No. 2 (1999),
pp. 255–275.
8
On blame avoidance in politics, see R. Kent Weaver, “The Politics of Blame Avoidance,” Journal
of Public Policy, Vol. 6, No. 4 (1986), pp. 371–398, and Christopher Hood, The Blame Game: Spin,
Bureaucracy, and Self-Preservation in Government (Princeton, N.J.: Princeton University Press, 2011).
On defensive decision making in other professions, see Gerd Gigerenzer, Risk Savvy: How to Make
Good Decisions (New York: Viking, 2014).
9
On how attempts to sanction leaders for poor performance can end up corrupting incentives, see
George W. Downs and David M. Rocke, Optimal Imperfection? (Princeton, N.J.: Princeton University
Press, 1995); and H. E. Goemans, War and Punishment (Princeton, N.J.: Princeton University Press,
2000). On the challenges of oversight in national security analysis and decision making specifically,
see Amy B. Zegart, Eyes on Spies (Stanford, Calif.: Hoover Institution Press, 2011); Loch Johnson,
Spy Watching (New York: Oxford University Press, 2017); and Tetlock and Mellers, “Intelligent
Management of Intelligence Agencies.”
10
If an analyst says that an event has a thirty percent probability and the event does not happen,
then her judgmental error for that estimate is 0.30; if the event does take place, then her judgmental
error for that estimate is 0.70.
11
If an analyst says that an event has a thirty percent probability and the event does not happen,
then her judgmental error for that estimate is 0.30 and her Brier score is 0.09; if the event does take
place, then her judgmental error for that estimate is 0.70 and her Brier score is 0.49.
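To make the arithmetic in notes 10 and 11 concrete, here is a minimal Python sketch of the two measures (the function names are mine, not the book’s):

```python
def judgmental_error(forecast: float, occurred: bool) -> float:
    """Gap between the stated probability and the realized outcome,
    coding the outcome as 1.0 if the event happened and 0.0 if not."""
    outcome = 1.0 if occurred else 0.0
    return abs(forecast - outcome)


def brier_score(forecast: float, occurred: bool) -> float:
    """The Brier score is the square of the judgmental error."""
    return judgmental_error(forecast, occurred) ** 2


# The footnotes' example: an analyst assigns a thirty percent probability.
print(round(judgmental_error(0.30, occurred=False), 2))  # 0.3
print(round(brier_score(0.30, occurred=False), 2))       # 0.09
print(round(judgmental_error(0.30, occurred=True), 2))   # 0.7
print(round(brier_score(0.30, occurred=True), 2))        # 0.49
```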
certainty, so as to reap the highest rewards when their judgments are correct.12 Of
course, it seems hard to believe that critics can strike this kind of balance when
it comes to evaluating assessments of uncertainty in foreign policy discourse. Yet
the remainder of this section explains how scholars cannot agree on what these
distortions actually entail, and how they sometimes offer contradictory views of
how these factors shape the politics of uncertainty and blame.
12
Glenn W. Brier, “Verification of Forecasts Expressed in Terms of Probability,” Monthly Weather
Review, Vol. 78, No. 1 (1950), pp. 1–3; Edgar C. Merkle and Mark Steyvers, “Choosing a Strictly
Proper Scoring Rule,” Decision Analysis, Vol. 10, No. 4 (2013), pp. 292–304.
13
M. David Piercey, “Motivated Reasoning and Verbal vs. Numerical Probability
Assessment: Evidence from an Accounting Context,” Organizational Behavior and Human Decision
Processes, Vol. 108, No. 2 (2009), pp. 330–341; Andrew Mauboussin and Michael Mauboussin, “If You
Say Something Is ‘Likely,’ How Likely Do People Think It Is?” Harvard Business Review ( July 2018),
available at https://hbr.org/2018/07/if-you-say-something-is-likely-how-likely-do-people-think-it-
is, accessed August 2018. Hood, Blame Game, provides a broader discussion of “abstinence,” when
public officials deliberately withhold information or services that could expose them to criticism
after the fact.
“weasel words” that reflect “analytical pusillanimity” rather than efforts to pro-
vide useful information.14 Philip Tetlock argues that this problem undermines
the quality of foreign policy discourse, both in government and throughout the
broader public sphere. “Both private- and public-sector prognosticators,” he
writes,
must master the same tightrope-walking act. They know they need to
sound as though they are offering bold, fresh insights into the future not
readily available off the street. And they know they cannot afford to be
linked to flat-out mistakes. Accordingly, they have to appear to be going
out on a limb without actually going out on one. That is why [foreign
policy analysts] so uniformly appear to dislike affixing ‘artificially precise’
subjective probability estimates to possible outcomes. . . . It is much safer
to retreat into the vague language of possibilities and plausibilities.15
14
Mark Lowenthal, Intelligence: From Secrets to Policy, 3rd ed. (Washington, D.C.: CQ Press,
2006), p. 129. Alan Barnes similarly describes how Canadian intelligence analysts were only willing
to participate in a study that involved quantifying probability estimates if researchers ensured that
their judgments would only be used for academic purposes, and not for professional evaluations.
See Alan Barnes, “Making Intelligence Analysis More Intelligent: Using Numeric Probabilities,”
Intelligence and National Security, Vol. 31, No. 1 (2016), pp. 327–344. On foreign policy analysts’
general reticence to provide assessments of uncertainty in a form that could be evaluated objectively,
see Jeffrey A. Friedman and Richard Zeckhauser, “Why Assessing Estimative Accuracy Is Feasible
and Desirable,” Intelligence and National Security, Vol. 31, No. 2 (2016), pp. 178–200.
15
Philip E. Tetlock, “Reading Tarot on K Street,” The National Interest, No. 103 (2009), p. 67.
16
Hedley, “Learning from Intelligence Failures,” p. 447.
17
Gregory F. Treverton, “Theory and Practice,” Intelligence and National Security, Vol. 33, No. 4
(2018), p. 477.
18
For evidence that foreign policy analysts naturally avail themselves of this opportunity, and do
so in ways that hamper effective learning, see Philip E. Tetlock, “Theory-Driven Reasoning about
Plausible Pasts and Probable Futures in World Politics: Are We Prisoners of Our Preconceptions?”
American Journal of Political Science, Vol. 43, No. 2 (1999), pp. 335–366.
that they do this consistently enough to outweigh the risks of adversarial exploi-
tation.19 The next two sections present historical and experimental evidence that
contradicts this claim.
The logic of strategic caution also presumes a substantial degree of for-
giveness on the part of supposedly adversarial critics. This logic asserts that
critics are willing and able to calibrate political attacks based on semantic
nuance. For analysts to have an incentive to say that outcomes are “likely”
instead of “very likely” (or that a probability is thirty percent instead of ten
percent), they must believe that critics will actually afford greater leniency
to judgments that convey less certainty. But why would adversarial critics do
that? This perspective is, once again, inconsistent with the notion that critics
automatically characterize unpleasant foreign policy surprises as analytic
failures.
The logic of strategic caution actually relies on the assumption that critics
are overly sensitive to nuance when evaluating assessments of uncertainty. As we
have seen, strictly proper scoring rules should penalize analysts more harshly for
assessments that entail greater degrees of judgmental error, because this is the
only way to give analysts an incentive to report their honest beliefs. The Brier
score, which is based on the square of judgmental error, is a prime example. In
order to assume that the prospect of criticism gives analysts incentives for stra-
tegic caution, it is necessary to believe that critics are so sensitive to nuances in
probabilistic reasoning that their “penalty function”—the relationship between
judgmental error and assigned criticism—accelerates more sharply than the
Brier score. This is an extremely specific and nonintuitive claim that no scholar,
to my knowledge, has ever tested directly.
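To see why the shape of the penalty function matters, consider a short simulation (my own construction, with illustrative numbers): an analyst who honestly believes an event has a thirty percent probability minimizes her expected Brier score by reporting thirty percent, whereas under a merely linear penalty she does best by reporting zero.

```python
import numpy as np


def brier(report: float, outcome: float) -> float:
    """Quadratic penalty: the Brier score, a strictly proper rule."""
    return (report - outcome) ** 2


def linear(report: float, outcome: float) -> float:
    """Linear penalty: absolute judgmental error, not strictly proper."""
    return abs(report - outcome)


def expected_penalty(report, belief, penalty):
    """Expected penalty of announcing `report` when the analyst's honest
    probability that the event will occur is `belief`."""
    return belief * penalty(report, 1.0) + (1.0 - belief) * penalty(report, 0.0)


belief = 0.30
reports = np.linspace(0.0, 1.0, 101)
best_brier = reports[np.argmin([expected_penalty(r, belief, brier) for r in reports])]
best_linear = reports[np.argmin([expected_penalty(r, belief, linear) for r in reports])]
print(best_brier)   # 0.3 -- honest reporting minimizes the expected Brier score
print(best_linear)  # 0.0 -- a linear penalty rewards overstating certainty
```

The quadratic curvature is what removes the incentive to shade reports toward certainty; only a penalty curve that accelerates even faster than this would create the incentive for strategic caution that the conventional wisdom assumes.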
19
In this way, making explicit assessments of uncertainty can be compared to buying insurance
against the risk of adversarial exploitation. As with other forms of insurance, risk-averse individuals
should be willing to pay costs in order to avoid bad outcomes. The historical review presented in the
section shows that analysts can indeed suffer extreme criticism when they leave their assessments of
uncertainty vague.
the chapter will replicate this claim in a setting that allows for more precise
causal analysis.
Absences of Warning
To structure a historical review of the politics of uncertainty and blame, I
gathered a list of cases that have been labeled as “intelligence failures” in mul-
tiple, peer-reviewed sources.20 I identified nineteen such cases, which are listed
in Figure 5.1. Broadly speaking, we can divide these cases into two groups. The
first group can be labeled absences of warning. These are episodes in which in-
telligence analysts were criticized because they did not appear to anticipate
strategic problems. Prominent absences of warning include Pearl Harbor, the
terrorist attacks of September 11, 2001, and the Arab Spring. The second cat-
egory of perceived intelligence failures can be labeled flawed assessments. These
are episodes in which intelligence analysts were criticized for making judgments
that later appeared to be inaccurate. Prominent examples of flawed assessments
include intelligence reports concerning the October War, the Iranian Revolution,
and Iraq’s presumed WMD programs.
According to the conventional wisdom surrounding the politics of uncer-
tainty and blame, foreign policy analysts can deflect criticism by leaving their
assessments of uncertainty vague. Almost by definition, none of the perceived
intelligence failures in the absence of warning category fits this logic. Quite the
opposite: if analysts had explicitly assessed the probability of events like Pearl
Harbor, 9/11, or the Arab Spring, then there would have been no basis for arguing
that the Intelligence Community had failed to consider alternative viewpoints, or
that analysts had lacked the imagination to anticipate strategic surprise.21
Even if analysts had assigned a low probability to these events—say, just five
or ten percent—it would be much more plausible to characterize these
cases as policy failures instead of intelligence failures. For instance, one reason
20
I compiled this list by conducting searches for the term “intelligence failure” in JSTOR, Google
Scholar, and the journals Intelligence and National Security and International Journal of Intelligence and
CounterIntelligence. This section takes no stance as to whether any of these cases truly deserved to be
called a “failure,” as opposed to the sort of surprises that are inevitable when assessing uncertainty in
world politics. On the importance and the difficulty of drawing this distinction, see Richard K. Betts,
Enemies of Intelligence: Knowledge and Power in American National Security (New York: Columbia
University Press, 2007).
21
These are, respectively, the central argument of Roberta Wohlstetter’s criticism of intelligence
analysis regarding Pearl Harbor, and one of the main conclusions of the 9/11 Commission’s report.
See Roberta Wohlstetter, Pearl Harbor: Warning and Decision (Stanford, Calif.: Stanford University
Press, 1964); and National Commission on Terrorist Attacks Upon the United States [9/11
Commission], Final Report (Washington, D.C.: U.S. Government Printing Office, 2004).
Figure 5.1 Perceived U.S. intelligence failures, 1949–2016.
why Japan’s attack on Pearl Harbor was so destructive was that U.S. forces had
taken few defensive precautions. U.S. aircraft were parked wingtip-to-wingtip,
presenting easy targets for Japanese bombers. Meanwhile, many of the ships
stationed at Pearl Harbor sustained damage because their hatches were open on
the morning of the attack.22 If U.S. commanders had been warned before the
fact that there was even a small chance of attack, then they could not have later
argued that their lack of preparation resulted from having no awareness of the
potential threat.
Similar dynamics surround the politics of blame regarding the terrorist attacks
of September 11, 2001. In the months leading up to the attacks, the Intelligence
Community was well aware that al Qaeda was determined to strike within the
United States. The Central Intelligence Agency even wrote an August 2001
briefing for President George W. Bush to that effect. But intelligence analysts
did not warn decision makers that al Qaeda might turn airplanes into missiles in
order to cause unprecedented damage.
The Intelligence Community’s harshest critics argue that it should have
been possible to warn the White House about this vulnerability. For example,
a Federal Bureau of Investigation (FBI) agent stationed in Phoenix had sent a
memorandum to headquarters in July 2001 advising of the “possibility of a co-
ordinated effort by Usama bin Ladin” to send al Qaeda operatives to U.S. flight
schools. In August, the FBI’s Minnesota field office opened an investigation into
Zacarias Moussaoui, an extremist who had raised alarm bells when he sought to
learn how to fly a Boeing 747 despite lacking the usual qualifications for flight
training.23 Though CIA Director George Tenet was briefed about Moussaoui’s
suspicious behavior, neither he nor the FBI shared this information with the
White House, and the FBI did not grant special priority to these investigations
until after the 9/11 attacks had transpired.24
Other observers defend the Intelligence Community’s performance prior
to 9/11 by arguing that signals of airborne terrorism remained ambiguous;
that intelligence officials did not have sufficient evidence to prioritize these
investigations over countless other threats to national security; and that even
if the Intelligence Community had briefed President Bush about al Qaeda’s in-
terest in flight training, it would have been politically impossible to take effective
22
Uri Bar-Joseph and Rose McDermott, “Pearl Harbor and Midway: The Decisive Influence of
Two Men on the Outcomes,” Intelligence and National Security, Vol. 31, No. 7 (2016), p. 954.
23
Moussaoui was in fact working with al Qaeda, and 9/11 pilots Mohamed Atta and Marwan
al-Shehhi had visited the same flight school.
24
9/11 Commission, Final Report, pp. 272–276. For more detail on how the Intelligence
Community might have anticipated the 9/11 attacks, see Amy Zegart, Spying Blind: The CIA, the FBI,
and the Origins of 9/11 (Princeton, N.J.: Princeton University Press, 2007).
countermeasures at the time.25 Yet simply passing along this warning—even with
the caveat that it reflected a low-probability threat—would have allowed the
Intelligence Community to counter the criticism that its analysts had overlooked
the issue, or that they had failed to envision a major strategic vulnerability.
Of course, intelligence analysts cannot ring alarm bells about every possible
threat to national security. Analysts who continually predict the occurrence of
threats that do not materialize could develop a reputation for “crying wolf” that
would undermine their credibility. Yet that is exactly the kind of problem that trans-
parent probabilistic reasoning prevents. Assessing the probability of some event is
different from saying that the event will occur (and, in fact, explicitly indicates that
the event is not guaranteed to take place). If decision makers then determine that
these probabilities are not large enough to warrant preventive action, they cannot
later claim they had no reason to believe that these threats could materialize.
25
See, for example, Richard Posner, Preventing Surprise Attacks: Intelligence Reform in the Wake
of 9/11 (Lanham, Md.: Rowman and Littlefield, 2005); Pillar, Intelligence and U.S. Foreign Policy,
pp. 233–280; and Betts, Enemies of Intelligence, pp. 104–116.
26
ORE 3/1, “Soviet Capabilities for the Development and Production of Certain Types of
Weapons and Equipment,” October 31, 1946.
27
Interdepartmental Intelligence Study, “Status of the Russian Atomic Energy Project,”
December 15, 1947.
28
ISO/SR-10/49/1, “Status of the USSR Atomic Energy Project,” August 25, 1949. For discus-
sion of this report and other contemporary analyses, see Donald P. Steury, “How the CIA Missed
Stalin’s Bomb,” Studies in Intelligence, Vol. 49, No. 1 (2005).
mark. Intelligence analysts could surely have reduced their exposure to criti-
cism in this case by making their judgments more cautious or by leaving their
reasoning vague.
At the same time, we have already seen how vague assessments of uncertainty
led to far greater levels of criticism regarding the Intelligence Community’s
assessments of nuclear weapons programs in Iraq. The 2002 NIE on this subject
is often criticized for presenting a “slam dunk” conclusion that Saddam Hussein
was pursuing nuclear weapons. This assessment played a major role in public
debates about the 2003 invasion of Iraq, and it is now widely seen as one of the
most consequential intelligence failures in U.S. history.29
It is, however, important to note that intelligence analysts did not actually
say that Saddam Hussein was pursuing nuclear weapons. The NIE’s crucial judg-
ment said, “We judge that Iraq has continued its weapons of mass destruction
programs in defiance of UN resolutions and restrictions.” As we saw in chapter 1,
the estimative verb, “we judge,” is technically supposed to highlight the presence
of uncertainty—it is the logical opposite of saying that this conclusion was a
“slam dunk.”30 The NIE then contained an explicit dissent from the U.S. State
Department, highlighting concerns about the lack of concrete evidence to sup-
port the claim that Saddam Hussein was building nuclear weapons.
A literal reading of this document would thus grant intelligence analysts a
wide berth to explain exactly what they believed. According to the conventional
wisdom about the politics of uncertainty and blame, this vagueness should have
shielded the Intelligence Community from criticism. Of course, that is not what
happened. Most critics of the Iraq NIE simply ignored analysts’ use of estima-
tive verbs and pilloried the Intelligence Community for implying false certainty.
This is not to absolve the Intelligence Community of responsibility for its conclusions regarding Iraq’s non-existent WMD programs. Most intelligence agencies—both
in the United States and in other countries—believed that Saddam Hussein
was building nuclear weapons in 2002.31 If the NIE’s authors had committed
their views to paper explicitly, they would surely have assigned this hypothesis
a high probability, and their judgment would surely have been criticized after
the fact. But even if the NIE’s authors had assessed these chances to be as high
29
For an overview of debates surrounding the Iraq NIE, see Jervis, Why Intelligence Fails, pp. 123–
155; and Fingar, Reducing Uncertainty, pp. 89–115.
30
The “slam dunk” phrase originated when CIA Director George Tenet told President Bush that
he could provide enough evidence to convince the American public that Iraq was pursuing weapons
of mass destruction. Thus, the “slam dunk” phrase referred to political messaging, not to levels of cer-
tainty surrounding intelligence analyses themselves. See George Tenet, At the Center of the Storm: My
Years at the CIA (New York: HarperCollins, 2007), p. 62.
31
Robert Jervis, “Reports, Politics, and Intelligence Failures: The Case of Iraq,” Journal of Strategic
Studies, Vol. 29, No. 1 (2006), p. 18.
as eighty or ninety percent, they could not have been accused of concealing the
fact that they lacked conclusive evidence.32 The notion that analysts argued this
judgment was a “slam dunk” was always mistaken. The fact that this perception
spread so widely throughout public discourse serves as a prime example of how
critics can exploit ambiguous language in ways that harm analysts’ credibility.
The remainder of this section will show how similar dynamics surround several
other perceived intelligence failures.
32
For similar discussions of this point, see Michael Morell, The Great War of Our Time: The CIA’s Fight against Terrorism from Al Qa’ida to ISIS (New York: Twelve, 2015), pp. 102–103; Betts, Enemies of Intelligence, p. 116; and Jervis, Why Intelligence Fails, p. 44.
33
National Security Action Memorandum 55, “Relations of the Joint Chiefs of Staff to the
President in Cold War Operations,” 28 June 1961.
34
“Memorandum from the Joint Chiefs of Staff to Secretary McNamara,” Foreign Relations of the
United States [FRUS] 1961–1963, Vol. X, Doc 35 (3 February 1961).
The Joint Chiefs’ report began by explaining that its goal was to evaluate the
plan “to effect the overthrow of the Castro regime.” But in the last paragraph of
the document’s second appendix, the Joint Chiefs explained that their defini-
tion of “ultimate success” only referred to the “objective stated in paragraph 2b.”
Understanding this cryptic statement requires consulting another section of the
report, in which the Joint Chiefs defined that specific objective as “hold[ing] a
beachhead long enough to establish a provisional government, act[ing] as a ral-
lying point for volunteers and a catalyst for uprisings throughout Cuba.”35 This
outcome was substantially more limited than toppling Fidel Castro, let alone
doing so in a manner that concealed U.S. involvement.36
Moreover, the Joint Chiefs’ assessment only applied to the original concept
for the invasion, known as the Trinidad Plan. The concept that the United States
actually implemented, known as the Zapata Plan, involved a different order of
battle, a new landing site, and substantially less U.S. air support. Neither the Joint
Chiefs nor any of President Kennedy’s other advisers prepared a formal assess-
ment of the plan that was actually set in motion.37
The documents used to justify the Bay of Pigs invasion thus came nowhere
near providing a clear assessment of the chances that the invasion would work.
Some participants claim that these documents were actually intended to warn
the president about the invasion’s risks; the assessments applied to tactical
outcomes rather than strategic objectives; and these assessments referred to
a plan that was substantially different from the one that President Kennedy
implemented. According to the conventional wisdom about the politics of un-
certainty and blame, these are exactly the kinds of vague, indirect assessment of
35
FRUS 1961–1963, Vol. X, Doc 35 (3 February 1961).
36
This nuance was clearly lost on some CIA planners. A CIA briefing paper thus states: “The Joint
Chiefs of Staff have evaluated the military aspects of the plan . . . . They have concluded that ‘this plan
has a fair chance of ultimate success’ (that is of detonating a major and ultimately successful revolt
against Castro).” Although that is the most intuitive way of interpreting the Chiefs’ stated views, it is
not what their report actually said. “Paper Prepared in the Central Intelligence Agency,” FRUS 1961–
1963, Vol. X, Doc 46 (17 February 1961).
37
President Kennedy and Secretary McNamara understood that the Joint Chiefs would object to
the withholding of air support, in particular, but they did not know how strong this objection would
be. Secretary McNamara acknowledged in a post-mortem discussion that “the Chiefs never knew
about” the decision to cancel air strikes and that this was an issue about which “they all felt strongly.”
“Memorandum for the Record,” FRUS 1961–1963, Vol. X, Doc 199 (3 May 1961). Yet, even after
the fact, it is hard to determine what the Joint Chiefs might have said about the last-minute change.
Thus, when Joint Chiefs Chairman Lyman Lemnitzer was asked to explain how his assessment of
the Zapata Plan might have deviated from the original “fair chance” estimate, he replied: “I could
put words together and say that we said that Trinidad had a fair chance and that Zapata had less than
a fair chance, but actually we felt that Zapata had a fair chance but of a lower grade than Trinidad.”
“Memorandum for the Record,” FRUS 1961–1963, Vol. X, Doc 221 (18 May 1961).
uncertainty that should shield foreign policy analysts from criticism. As histo-
rian James Rasenberger puts it, the documentary record in this case reflects “a
triumph of bureaucratic equivocation over clarity.”38
Yet President Kennedy did not grant his advisers any benefit of the doubt
when it came to assigning accountability for the invasion’s failure. Instead,
President Kennedy emerged from this crisis with the view that he could not trust
the Joint Chiefs to provide sound advice. “Those sons of bitches,” he remarked
to an aide, had “just sat there nodding, saying it would work.”39 This was not a
stray remark: several historians have argued that this episode created a rift be-
tween the president and his uniformed military leadership that continued to
grow throughout the early stages of the Vietnam War.40 Thus, whatever the Joint
Chiefs intended to achieve by offering vague and indirect assessments of the Bay
of Pigs invasion, their equivocation ultimately backfired, enabling a major for-
eign policy blunder while eroding credibility with the White House.
38
James Rasenberger, The Brilliant Disaster: JFK, Castro, and America’s Doomed Invasion of the Bay of Pigs (New York: Scribner, 2011).
44
On the U-2 overflight controversy, see Max Holland, “The ‘Photo Gap’ That Delayed Discovery
of Missiles,” Studies in Intelligence, Vol. 49, No. 4 (2005), pp. 15–30. On intelligence estimates re-
garding missiles in Cuba and decision makers’ engagement with that information, see Jonathan
Renshon, “Mirroring Risk: The Cuban Missile Estimation,” Intelligence and National Security, Vol. 24,
No. 3 (2009), pp. 315–338.
45
NIE 30-73, Possible Egyptian-Israeli Hostilities: Determinants and Implications (May 17, 1973).
unlikely moving forward, it only said that war was not necessarily more likely
than not. Almost any probability of war in the medium-to long-run would have
been consistent with the language of this NIE.
Intelligence reports continued to offer vague and cautious predictions
leading up to the war’s onset on October 6. A U.S. Intelligence Board esti-
mate from October 4 stated, “We continue to believe that an outbreak of
major Arab-Israeli hostilities remains unlikely for the immediate future, al-
though the risk of localized fighting has increased slightly.” An October 5 CIA
Bulletin described the Egyptian military buildup but argued that it “do[es] not
appear to be preparing for a military offensive against Israel.” An October 6
Bulletin stated: “For Egypt a military initiative makes little sense at this critical
juncture.”46
Each of these statements indicated that intelligence analysts did not see the
outbreak of war as being more likely than not. But none of these statements pro-
vided a clear or confident prediction that war would not in fact take place. Once
again, vague descriptions of uncertainty allowed decision makers to draw the
conclusion that the risk was too small to worry about, even though that is not
what intelligence analysts had actually said. This is, in fact, one lesson that the
CIA itself took away from the crisis: the agency’s official post-mortem on its own
performance during the October War argued that analysts had offered “rather
timid cautionary advice” and that it would have been better to have explicitly
assessed the risk of war.47
46
Combined Watch Report of the U.S. Intelligence Board, No. 1206 (October 4, 1973); CIA Bulletin (October 5, 1973); CIA Bulletin (October 6, 1973). For commentary on these and
other relevant documents, see Director of Central Intelligence, The Performance of the Intelligence
Community before the Arab-Israeli War of October 1973 (December 20, 1973); and Matthew T.
Penney, “Intelligence and the 1973 Arab-Israeli War,” in President Nixon and the Role of Intelligence in
the 1973 Arab-Israeli War (Washington, D.C.: Center for the Study of Intelligence, 2013), pp. 6–13.
47
The CIA post-mortem is reprinted in Director of Central Intelligence, Performance of the
Intelligence Community before the Arab-Israeli War.
firmness in handling the situation as evidence that there was no major crisis.
As the U.S. Embassy in Tehran explained in August 1978, “At some point, the
Shah may be forced to repress an outbreak with the iron fist and not the velvet
glove if Iran is to retain any order at all. We have no doubt that he will do so if
that becomes essential.”48
Intelligence reporting throughout this period suggests that analysts took this
logic for granted, seemingly dismissing out of hand the prospect of governmental
collapse. The National Foreign Assessment Center (NFAC) reported in April
1978 that “the riots, demonstrations, and sabotage in many cities and towns in
recent weeks are no threat to government stability.” An August 1978 NFAC as-
sessment argued that “Iran is not in a revolutionary or even a ‘prerevolutionary’
situation . . . this does not at present threaten the government.” Reflecting the
prevailing view that state repression was the Shah’s trump card in reserve, a
September 1978 assessment stated that Iranian security forces possessed “a mo-
nopoly of coercive force in the country” and thus they retained “the ultimate say
about whether the Shah stays in power.”49
Judged with the benefit of hindsight, these statements demonstrate clear
overconfidence. Yet they did not really assess uncertainty at all, apparently taking the Shah’s stability for granted. In his analysis of the episode,
Robert Jervis explains that a systematic problem during this period was that in-
telligence products largely stuck to reporting on current events instead of de-
veloping deeper analyses of long-term prospects that might have revealed weak
points in the conventional wisdom.50 Jervis therefore concludes that “the case of
Iran reveals a need for analysts to make sharp and explicit predictions.”51 Even if
48
For post-mortems on intelligence about the Iranian Revolution, see Robert Jervis, Why
Intelligence Fails: Lessons from the Iranian Revolution and the Iraq War (Ithaca, N.Y.: Cornell University
Press, 2010), ch. 2; and William J. Daugherty, “Behind the Intelligence Failure in Iran,” International
Journal of Intelligence and CounterIntelligence, Vol. 14, No. 4 (2001), pp. 449–484. The quote in this
paragraph is from Jervis, Why Intelligence Fails, p. 70.
49
These and other relevant assessments are quoted in Jervis, Why Intelligence Fails, ch. 2.
50
For example, the U.S. Intelligence Community did not produce an NIE analyzing the Shah’s
prospects. Intelligence officials began drafting an NIE during the summer of 1978, but shelved the
effort well before potential publication.
51
Jervis, Why Intelligence Fails, p. 46. Jervis (p. 49) furthermore explains how vague estimates
not only prevented analysts from conveying uncertainties to decision makers, but that this practice
may have also hindered analysts from adapting to new information. Part of the difficulty of analyzing
the Iranian protests was that there was no single moment that made clear that faith in the Shah was
misplaced. Instead, Jervis writes, discrepant information “arrived bit by bit over an extended period
of time.” Because analysts never explicitly assessed the probability of regime change, they had no way
to keep track of these minor adjustments as they built up over time. This is another reason why Jervis
concludes that “explicit predictions would have been especially helpful,” allowing analysts to make
and account for incremental revisions to the status quo.
analysts had assessed the chances of regime change to be small, they could not
have been accused of false certainty or of dismissing the protests’ viability out
of hand.
Summary
This section has examined the empirical record of perceived U.S. intelligence
failures since World War II. Only the 1949 Soviet nuclear test supports the no-
tion that analysts expose themselves to preventable criticism by making clear
and honest assessments of uncertainty. In nine other cases, by contrast—the
eight absences of warning described in Figure 5.1 and the Iranian Revolution—
intelligence analysts received criticism because they did not appear to as-
sess key elements of uncertainty at all. After the Bay of Pigs invasion and the
assessments of Iraq’s WMD programs, vague assessments of uncertainty
backfired by allowing critics to make analysts’ judgments seem more mistaken
than they really were. And in the cases of the Cuban Missile Crisis and the
October War, vague assessments of uncertainty appeared to create the impres-
sion that analysts did not believe that high-stakes risks were worth worrying
about.52
Altogether, this experience suggests that the conventional wisdom about
the politics of uncertainty and blame may have it exactly backward: rather
than shielding intelligence analysts from preventable attacks, vague proba-
bility assessments have generally allowed critics to claim that the Intelligence
Community had neglected potential surprises, presented conclusions with false
confidence, or downplayed important risks. And despite the inherent limitations
of drawing these kinds of inferences directly from the historical record, the next
section will replicate these findings in an experimental context where it is pos-
sible to draw sharper causal inferences about why critics judge some assessments
of uncertainty more harshly than others.
52
The other five cases provide less insight into the politics of uncertainty and blame, but they
are generally consistent with the notion that decision makers interpret vague assessments of
low probabilities as an indication that a risk is not worth worrying about. In some cases, as
with the Soviet invasion of Czechoslovakia or Saddam Hussein’s invasion of Kuwait, intelli-
gence analysts did provide decision makers with relatively clear advance warnings, supporting
the argument that the politics of uncertainty and blame have less to do with analysts’ actual
assessments than with policy outcomes that are outside their control. See Jeffrey A. Friedman,
“Probability Assessment and National Security Decision Making: The Politics of Uncertainty
and Blame,” paper presented to the 2016 annual meeting of the American Political Science
Association (Philadelphia, Penn.).
Experimental Evidence
In order to gain a clearer picture of the causal dynamics behind the politics of
uncertainty and blame, I designed a survey experiment for a nationally repre-
sentative sample of 3,000 respondents.53 I chose to focus this experiment on the
general public because, out of all of the potential sources of criticism that foreign
policy analysts face, I expect the general public to evaluate assessments of uncer-
tainty in a manner that is the least sophisticated and thereby the most problem-
atic. Thus, as with the experiments discussed in chapter 4, I deliberately designed
this research in a manner that should be most likely to support pessimistic views
about the constraints on assessing uncertainty in international politics.54
As explained in chapter 4, survey experiments can never replicate the com-
plexity and intensity of real foreign policy debates. Yet by isolating specific
variables within an experimental framework, it is possible to generate rigorous
insights about how respondents intuitively evaluate assessments of uncertainty.
This method makes it possible to observe how critics naturally calibrate their
levels of criticism depending on subtle changes in probability estimates; to com-
pare how much these judgments are shaped by judgmental accuracy as opposed
to factors such as policy outcomes; and to examine whether critics intuitively
criticize explicit probability estimates more harshly than equivalent qualitative
expressions.55 The results from this experiment reinforce the historical patterns
described in the previous section. The data show that vague assessments of
53
I administered this survey through Survey Sampling International on April 22–23, 2016. Fifty-
four percent of respondents were female; 11 percent were black or African American; and 14 percent
were Hispanic. Respondents’ average age was 44. All results presented in this section are robust to
employing survey weights. Unless otherwise noted, the results presented below only include data for
the respondents who passed the survey’s recall questions. Appendix section 3b demonstrates that all
results are robust to including respondents at any level of minimum performance on the recall questions.
54
Of course, even if we can expect the public to be the least sophisticated critics of national secu-
rity analysis, their criticism might also be less political than the kinds of adversarial arguments made
by elected officials and pundits. As explained in this chapter’s first section, however, adversarialism
should actually reduce incentives to leave foreign policy analysis vague, on the grounds that adver-
sarial critics should be more likely to exploit elastic redefinition to make analysts’ judgments seem
worse than they really were.
55
In this respect, the survey experiment presented in this section resembles the manner in which
other international relations scholars have examined the determinants of public attitudes toward foreign
policy decision making. The literature on “audience costs” in international relations is particularly ger-
mane, as the main thrust of this literature is to explore the conditions under which leaders suffer public
criticism, and the use of survey experiments to study these dynamics is well-established. See, for ex-
ample, Michael R. Tomz and Jessica L. P. Weeks, “Public Opinion and the Democratic Peace,” American
Political Science Review, Vol. 107, No. 4 (2013), pp. 849–865; and Joshua D. Kertzer and Ryan Brutger,
“Decomposing Audience Costs,” American Journal of Political Science, Vol. 60, No. 1 (2016), pp. 234–249.
uncertainty tend to leave foreign policy analysts worse off, and that the prospect
of criticism does not necessarily distort their incentives in the manner the con-
ventional wisdom suggests.
Experimental Design
The experiment involved four decision scenarios: a proposed hostage-rescue
mission, a decision about whether to restrict commercial air travel in light of a
rumored terrorist threat, the prospect of arresting a foreign official accused of
embezzling U.S. development aid, and a drone strike against a person suspected
of being a high-ranking terrorist. Figure 5.2 summarizes key elements of these
scenarios, and the appendix provides their full texts.56
56
See appendix section 3a.
Each vignette began with a paragraph describing the basic decision problem; the survey then randomized both the probability assessment analysts offered and the eventual policy outcome. For example, here are several possible endings of the vignette involving a rumored terrorist plot:
• Analysts conclude that there is a 40 percent chance that this plot is real.
Decision makers review the information and decide to halt all flights
leaving California for one week. This move costs the airline industry
more than $1 billion and creates national alarm. The threat is later re-
vealed to have been a hoax, and stopping air travel appears to have saved
no lives.
57
Within each of these conditions, respondents saw a range of possible probability assessments,
varying from zero to one hundred percent (and their verbal equivalents). See appendix section 3a for
details.
• Analysts conclude that it is unlikely that this plot is real. Decision makers re-
view the information and decide to allow air travel to continue. The threat is
later revealed to have been a hoax.
• Analysts conclude that there is a 10 percent chance that this plot is real.
Decision makers review the information and decide to allow air travel to con-
tinue. Days later, four airliners leaving California are destroyed in explosions,
killing several hundred passengers.
After each scenario, the survey asked respondents to say how much criticism
analysts deserved for the way they had assessed uncertainty. The survey elicited
these responses on scales from zero to one hundred.58 The mean for this variable
was 36, with a standard deviation of 34.59
58
Eliciting numeric assessments, rather than using a coarser rating scale, is important for examining potentially nonlinear effects. It is not possible to conduct a direct experimental test of whether public opinion resembles a strictly proper scoring rule without eliciting granular responses.
59
I chose to ask respondents to provide levels of criticism—as opposed to using a response scale
that would have allowed respondents to express both positive and negative impressions—because
scholars of blame avoidance in public policy generally assume that pressures to reduce criticism out-
weigh incentives to claim credit. Scholars also generally believe that this so-called negativity bias is
especially strong in national security, where secrecy ensures that analysts’ failures are typically more
public than their successes. Actions that raise average scores on a feeling thermometer could run
contrary to public officials’ interests if they intensify negative opinions. Directly eliciting negative
opinions is thus the most appropriate way to evaluate the key drivers of the politics of uncertainty and
blame. On negativity bias in public policy analysis, see Hood, Blame Game, pp. 9–14; and Weaver,
“Politics of Blame Avoidance.”
[Figure 5.3 Respondent criticism (95% CI) of quantitative versus qualitative probability assessments.]
60
These results reflect a two-way, paired sample t-test. I also asked respondents to say how much
criticism they believed the decision makers deserved in each scenario. Clearer assessments of uncer-
tainty led to small reductions of blame assigned to decision makers, too, an effect that was also statis-
tically significant at the p < 0.01 level.
61
To capture the judgmental error associated with qualitative probability assessments, I ran-
domly assigned each judgment a numeric value inside the range by which the National Intelligence
Council defines that term. See chapter 1 on how the National Intelligence Council defines words of
estimative probability according to seven equally-spaced segments of the number line. All results
presented below hold if I instead defined words of estimative probability using the Director of
National Intelligence’s guidelines, or interpret verbal terms using the number at the midpoint of each
term’s respective range.
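As an illustration of the coding procedure described in note 61, the sketch below draws a random numeric value inside each term’s segment of the number line. The term labels and cut points are stand-ins for the National Intelligence Council lexicon discussed in chapter 1, not a reproduction of it.

```python
import random

# Seven equally spaced segments of the number line; these labels and
# boundaries are illustrative stand-ins, not the official lexicon.
SEGMENTS = {
    "remote": (0 / 7, 1 / 7),
    "very unlikely": (1 / 7, 2 / 7),
    "unlikely": (2 / 7, 3 / 7),
    "even chance": (3 / 7, 4 / 7),
    "likely": (4 / 7, 5 / 7),
    "very likely": (5 / 7, 6 / 7),
    "almost certain": (6 / 7, 7 / 7),
}


def numeric_equivalent(term: str, rng: random.Random) -> float:
    """Randomly assign a qualitative judgment a value inside its range."""
    lo, hi = SEGMENTS[term]
    return rng.uniform(lo, hi)


rng = random.Random(42)
print(round(numeric_equivalent("unlikely", rng), 3))  # a value in [0.286, 0.429]
```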
[Coefficient plot: numeric probability, judgmental error, policy failure, and constant; horizontal axis shows estimated effect (95% intervals).]
Figure 5.4 Why respondents criticize some assessments of uncertainty more harshly
than others. The figure shows the impact each randomized variable exerted on the
amount of criticism respondents assigned to foreign policy analysts. Ordinary least
squares regression with robust standard errors and respondent fixed effects (N = 5,151).
Multivariate Analysis
To build a more complete picture of the factors that shape the politics of un-
certainty and blame, I conducted a multivariate analysis that took the fol-
lowing randomized factors into account: (i) whether analysts expressed their
judgments using numbers versus words; (ii) the judgmental error associated
with each probability assessment; (iii) the square of judgmental error, which
is the key parameter underpinning the Brier score; (iv) an indicator capturing
whether or not a probability assessment fell on the “wrong side of maybe,” such
that its judgmental error exceeded fifty percent; and (v) whether or not decision
makers ended up making the right choice on the basis of analysts’ assessments.62
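For readers who want the specification spelled out, the sketch below fits the same kind of model on simulated stand-in data. The column names and data-generating numbers are mine, not the survey’s replication files; only the structure, ordinary least squares with respondent fixed effects and robust standard errors, mirrors the analysis described here.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in data: one row per respondent-scenario pair.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "respondent_id": np.repeat(np.arange(100), 4),  # four scenarios each
    "numeric": rng.integers(0, 2, n),               # numbers vs. words
    "error": rng.uniform(0.0, 1.0, n),              # judgmental error
    "failure": rng.integers(0, 2, n),               # bad policy outcome
})
df["error_sq"] = df["error"] ** 2                   # Brier-score term
df["wrong_side"] = (df["error"] > 0.5).astype(int)  # "wrong side of maybe"
df["criticism"] = (
    -5 * df["numeric"] + 40 * df["error_sq"] + 11 * df["failure"]
    + rng.normal(20, 10, n)
).clip(0, 100)

# OLS with respondent fixed effects and robust standard errors.
model = smf.ols(
    "criticism ~ numeric + error + error_sq + wrong_side + failure"
    " + C(respondent_id)",
    data=df,
).fit(cov_type="HC1")
print(model.params[["numeric", "error", "error_sq", "wrong_side", "failure"]])
```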
Figure 5.4 summarizes the results from this analysis.63 The first row of the
figure shows, once again, that respondents assigned less criticism, on average, to
numeric probability assessments.64 The next three lines of Figure 5.4 show that
62
That is, whether decision makers made one of the errors of omission or commission described in Figure 5.2. If we instead measure the treatment effects associated with errors of omission and commission separately, those effects prove to be nearly identical. In Figure 5.3, those respective coefficients (standard errors) would be 11.51 (1.08) and 10.85 (1.10).
63
The model is ordinary least squares regression, with robust standard errors and respondent
fixed effects.
64
This finding is statistically significant at the p = 0.01 level.
65
This finding (p = 0.24) falls well short of the standard threshold for statistical significance.
66
Since respondent criticism increased with the square of judgmental error, it is not possible
to equate this “outcome failure” penalty to any specific amount of judgmental error. Thus, the pen-
alty associated with an extra ten percentage points’ worth of judgmental error would be different
depending on the original amount of judgmental error entailed.
67
Again, see Gurdal, Miller, and Rustichini, “Why Blame?,” on this point.
68
Any multiple of the Brier score is also a strictly proper scoring rule. The right-hand panel of
Figure 5.4 thus multiplies the Brier score by a coefficient that minimizes the difference between
respondents’ data and a strictly proper scoring rule.
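The rescaling described in note 68 amounts to a one-parameter least-squares fit through the origin. A minimal sketch, with toy numbers in place of the survey data:

```python
import numpy as np

# Toy values standing in for the survey responses (not the actual data).
error = np.array([0.1, 0.3, 0.5, 0.7, 0.9])          # judgmental error
criticism = np.array([5.0, 12.0, 28.0, 50.0, 78.0])  # mean criticism, 0-100
brier = error ** 2

# Least squares through the origin: k = sum(x * y) / sum(x * x).
k = (brier @ criticism) / (brier @ brier)
print(round(float(k), 1))  # the multiple of the Brier score closest to the data
```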
69
These questions were substantially more challenging than simple “attention checks,” in which
respondents are asked to give a particular answer in order to ensure that they are reading instructions
at all. For example, I asked whether survey-takers could recall the specific probability estimate which
analysts gave in each case.
[Four panels: respondents passing fewer than 3/4, 3/4, and 4/4 recall questions, plus a Brier score benchmark; each panel plots respondent criticism against judgmental error (%).]
Figure 5.5 Respondents’ intuitive scoring rule. The figure plots the relationship between judgmental error and the
criticism respondents assigned to those judgments, or what I have called the respondents’ “intuitive scoring rule.”
The first three plots indicate that as respondents pay more attention to the scenarios (as measured by the number of
recall questions they passed), their scoring rule becomes steeper and more convex. The fourth panel compares their
performance to a strictly proper scoring rule based on a transformation of the Brier score.
whole, but how their evaluations changed as a result of closer engagement with
substantive material.
Figure 5.5 shows that the more closely respondents engaged with the details
of the survey, the more their responses approximated the Brier score.70 On the
far-left panel of Figure 5.5, plotting data for respondents who answered fewer
than three of four recall questions correctly (32 percent of respondents overall),
we see that judgmental error only weakly predicts variations in respondent crit-
icism, and that this relationship is essentially linear. In the second panel of the figure, plotting data for respondents who answered three of four recall questions correctly (26 percent of respondents), the relationship between judgmental accuracy and respondent criticism becomes steeper and accelerates more quickly. In the third panel of the figure, capturing responses from the 42 percent
of participants who answered all four recall questions correctly, the empirical
relationship between judgmental accuracy and respondent criticism is almost
identical to the Brier score.
Conclusion
These patterns refute the common notion that ordinary people lack the capacity
to evaluate assessments of uncertainty in reasonable ways. Without any training
or special instructions on how to approach this challenge, a nationally repre-
sentative sample of respondents calibrated their reactions to judgmental error
about as well as one could possibly hope for. On the whole, these data do not reflect knee-jerk, over-the-top reactions. Instead, respondents’ views were sensitive
to nuance and they rewarded analysts for making judgments that were clearer
and more accurate.
Although this is still a far cry from claiming that people are fully rational
when it comes to evaluating the quality of foreign policy analysis, these findings
contradict a range of prominent concerns about how the prospect of criticism
distorts incentives for assessing uncertainty in international politics. If anything,
the qualitative and quantitative evidence presented in this chapter suggests that
vague probability assessments expose foreign policy analysts to more criticism
than they prevent. More generally, the chapter explained that even though foreign policy analysts receive unfair criticism, this criticism does not necessarily distort their incentives to assess uncertainty in clear and structured ways. Perceptions to
the contrary appear to reflect another area in which the conventional wisdom
exaggerates the obstacles to assessing uncertainty in foreign policy discourse.
70
I plotted these relationships using fractional polynomials, which is a flexible method for de-
fining curves that best fit empirical data.
6
The book’s previous chapters have examined the logic, psychology, and politics
of assessing uncertainty in international affairs. Those chapters explained how
foreign policy analysts can assess subjective probabilities in clear and structured
ways that add meaningful information to high-stakes policy debates. This chapter
now takes a closer look at how decision makers can use probability assessments
to evaluate high-stakes foreign policy choices.
The chapter begins by showing how probabilistic reasoning can inform for-
eign policy choices through a logic called “break-even analysis.” Then the chapter
presents a longer and more nuanced argument that explains how subjective
probabilities are especially important when it comes to evaluating foreign policy
decisions that play out over extended periods of time. In some cases, it can actu-
ally be impossible to make rigorous judgments about the extent to which policies
are making acceptable progress without assessing subjective probabilities in de-
tail. This argument is significant because it departs from a large body of existing
scholarship on learning in international politics which assumes that leaders can
use a straightforward logic of trial and error to determine how to update their beliefs
about policy effectiveness. In this way, the chapter explains that even rationalist
scholars of international politics underestimate the importance of transparent
probabilistic reasoning in foreign policy analysis.
1
Another key element of decision making under uncertainty, which the chapter does not discuss, involves determining whether it is better to decide on a course of action immediately as opposed to delaying that choice in order to gather additional information. To simplify this discussion, the chapter treats the opportunity costs of forgoing additional information as only one component of a decision's overall expected costs. For more complete normative foundations of decision making under uncertainty, readers should consult John W. Pratt, Howard Raiffa, and Robert Schlaifer, Introduction to Statistical Decision Theory (Cambridge, Mass.: MIT Press, 1995); and Robert L. Winkler, Introduction to Bayesian Inference and Decision, 2nd ed. (Sugar Land, Tex.: Probabilistic Publishing, 2003), among others.
If we use the letter C to represent an action's expected costs, the letter B to represent the benefits a decision maker expects to receive if the action is successful, and the letter p to represent the probability that the action will in fact succeed, then the decision is worthwhile so long as pB > C.2 We can rewrite this expression to say that a decision is worthwhile as long as its probability of success is greater than the ratio of costs to benefits (p > C/B).
In an ideal world, decision makers would estimate each of these factors pre-
cisely.3 But that is not actually necessary in order to make sound choices, so long
as decision makers can make informed judgments regarding which side of this
inequality is greater than the other. The more precisely decision makers estimate
either their probability of success or the ratio of costs and benefits, the more am-
biguity they can accept when dealing with the other.
This logic is called break-even analysis. The game of poker provides a classic
example of how break-even analysis can simplify decision making under uncer-
tainty.4 The central challenge in poker is determining the chances that your cards
are good enough to win the hand. But poker players do not need to determine
these chances precisely to make sound bets. Poker players always know how
much money they need to bet to stay in the game ( C ), and they always know
how much money they will win if the bet pays off ( B ). The ratio of these costs
and benefits ( C / B ) is known in poker as “pot odds.” As long as players believe
that their chances of winning the hand are larger than the hand’s pot odds, then
it is in their interest to make a bet.
For instance, if there is $100 in the pot and it takes $10 to call, then the benefits
of winning the hand will be ten times greater than the costs of attempting to do
so. With a cost-benefit ratio of 10/100 (or 0.10), a player would need to be-
lieve that she has more than a ten percent chance of success to make the gamble
2
A more generalizable version of this logic would be to start by identifying all outcomes a de-
cision might entail, and to estimate the costs and benefits associated with realizing each of those
possibilities. This makes the analysis more demanding in a technical sense, but it does not change the
basic theoretical logic of break-even analysis.
3
On the normative foundations for estimating each of these parameters, see Pratt, Raiffa, and
Schlaifer, Introduction to Statistical Decision Theory.
4
On the theory of break-even analysis, see Winkler, Introduction to Bayesian Inference and Decision,
ch. 6. For a rigorous application of this idea in a national security context, see John Mueller and Mark
G. Stewart, Terror, Security, and Money: Balancing the Risks, Benefits, and Costs of Homeland Security
(New York: Oxford University Press, 2011). See also Richard Posner, Catastrophe: Risk and Response
(New York: Oxford University Press, 2004).
worthwhile.5 It does not matter whether the player is uncertain about whether
her chances of success lie somewhere between twenty percent and forty percent,
since every possibility within this range is high enough to clear the break-even
point. If, instead, there is $40 in the pot and it takes $20 to call, then a player
will need at least a fifty percent chance of success to justify trying to win the
hand. Once again, it would not matter if the player is uncertain about whether
her chances of success are between twenty and forty percent. All possibilities
within this range would now lie below the break-even point, and it would be in
the player’s interest to fold.
Of course, the logic of break-even analysis is harder to apply in areas of de-
cision making, like international politics, where costs and benefits are tough to
measure. The national interest is notoriously difficult to define.6 When President
Obama and his advisers met to consider striking Osama bin Laden’s compound
in Abbottabad, for instance, they would have had to consider the strategic value
of damaging al Qaeda, the political value of capturing or killing bin Laden him-
self, the risk that soldiers and civilians would be harmed in the operation, and
the damage that the operation would do to U.S.-Pakistani relations. There is no
objective way to measure any of these factors, let alone to combine them into a
single index.7 Thus, while poker players can quickly calculate their “pot odds” for
winning any hand, foreign policy decision makers almost always face substan-
tial ambiguity in determining what their chances of success would have to be to
make a risky action worthwhile.
In these cases, it can be more useful to conduct break-even analysis from the
opposite direction: instead of using the ratio of costs and benefits to determine
what the probability of success would need to be to make the choice worthwhile,
decision makers can begin by estimating their probability of success, and they
can use that judgment to ask what the ratio of costs and benefits would have
to look like to justify a risky choice. This approach is attractive because, unlike
trying to estimate every factor that plays into the national interest, a decision’s
probability of success represents a single parameter. As we saw in chapter 2, de-
cision makers can always estimate this parameter precisely if they are willing to
approach the challenge in clear and structured ways.
5
I have simplified this discussion by omitting risk preferences. Formally, the costs and benefits in
this case are determined by the utility that poker players attach to different monetary outcomes, and
not the monetary outcomes themselves.
6
On the “conceptual chaos” of valuing outcomes in international relations, see David A. Baldwin,
“Success and Failure in Foreign Policy,” Annual Review of Political Science, Vol. 3 (2000), pp. 167–182.
7
It is always possible to reduce multi-attribute utility functions to numeric indices, but this is
an inherently subjective task. See Ralph L. Keeney and Howard Raiffa, Decisions with Multiple
Objectives: Preferences and Value Tradeoffs (New York: Wiley, 1976).
We have already seen that President Obama said he thought the odds that bin
Laden was living at Abbottabad were similar to a coin flip. This would represent
a probability estimate of fifty percent, or 0.50. If a decision is worthwhile when
its probability of success is greater than its cost-benefit ratio ( p > C / B ), then
as soon as President Obama determined that there was a fifty percent chance
that bin Laden was at Abbottabad, he should have also believed that a raid on
the compound was justified as long as he thought that the decision’s ratio of
costs to benefits was less than one-half.8 This makes it possible to hone President
Obama’s decision problem to the question of whether or not the prospective
benefits of killing or capturing bin Laden were at least twice as large as the pro-
spective costs of sending U.S. forces to Abbottabad.
Making this determination may be inherently subjective, but that is a far cry
from saying that the problem is intractable. Indeed, I find it difficult to believe
that any senior foreign policy official, if presented with this question directly,
would have argued that the benefits of killing or capturing bin Laden were not
twice as large as the raid’s expected costs. We have also seen how President
Obama’s assessment of the chances that bin Laden was living at Abbottabad was
deliberately cautious. Chapter 2 explained how a simple average of estimates
expressed at this meeting would have produced a probability estimate that
was closer to two in three. In that case, the raid would have been justified if the
benefits of killing or capturing bin Laden were only one-and-a-half times as great
as the raid’s expected costs.
Of course, the challenges of grappling with the uncertainty surrounding the
bin Laden raid did not prevent President Obama from achieving one of his signa-
ture national security accomplishments. The episode nevertheless demonstrates
how senior leaders struggled with a conceptual challenge that was, in fact, rel-
atively straightforward to resolve. There might have been no “right answer”
when it came to estimating the chances that Osama bin Laden was living in
Abbottabad (at least, not on the basis of the information available to U.S. leaders
at the time). But as long as decision makers are willing to address such ambiguity
in a principled manner, they can always use that information to evaluate difficult
trade-offs. And there are many other cases in which the struggles that foreign
8
In making this calculation, I have assumed that a Special Forces raid on the Abbottabad compound would certainly kill or capture bin Laden if he were present. As then Secretary of Defense
Robert Gates wrote in his memoir, “I had total confidence in the ability of the SEAL team to carry
out the mission. My reservations lay elsewhere.” Robert Gates, Duty: Memoirs of a Secretary at War
(New York: Alfred A. Knopf, 2014), p. 540. However, adding uncertainty about the chances of the
raid’s success would not make the decision problem much more difficult. We could thus reframe the
critical probability estimate as being the chances that a Special Forces raid on the compound would
kill or capture bin Laden, as opposed to the chances that bin Laden was present.
9
On the details of this hostage-rescue attempt, see Adam Entous, Julian E. Barnes, and Siobhan
Gorman, “Intelligence Gaps Crippled Mission in Syria to Rescue Hostages James Foley, Steven
Sotloff,” Wall Street Journal, September 5, 2014; Karen DeYoung, “The Anatomy of a Failed Hostage
Rescue Deep in Islamic State Territory,” Washington Post, February 14, 2015.
10
Two important studies describing the challenges of strategic assessment are Scott Sigmund
Gartner, Strategic Assessment in War (New Haven, Conn.: Yale University Press, 1997); and Ben
Connable, Embracing the Fog of War (Santa Monica, Calif.: Rand, 2012).
11
Prominent examples of the modeling frameworks that I describe in this section include
R. Harrison Wagner, “Bargaining and War,” American Journal of Political Science, Vol. 44, No. 3 (2000),
pp. 469–484; Darren Filson and Suzanne Werner, “A Bargaining Model of War and Peace,” American
Journal of Political Science, Vol. 46, No. 4 (2002), pp. 819–838; Branislav Slantchev, “The Principle
of Convergence in Wartime Negotiations,” American Political Science Review, Vol. 97, No. 4 (2003),
pp. 621–632; Robert D. Powell, “Bargaining and Learning While Fighting,” American Journal
of Political Science, Vol. 42, No. 2 (2004), pp. 344–361; and Alastair Smith and Allan C. Stam,
“Bargaining and the Nature of War,” Journal of Conflict Resolution, Vol. 48, No. 6 (2004), pp. 783–813.
12
In some models, those probabilities are allowed to change in a manner that is common knowledge. But this is effectively the same as arguing that the probability of success in each round of fighting
is a known function of a common parameter, which itself does not change from one round of fighting
to the next.
13
Thus, Smith and Stam, “Bargaining and the Nature of War,” p. 627, write: “The act of waging
war reveals information about the relative strength of each side.” Filson and Werner, “Bargaining
Model of War and Peace,” p. 820, explain that “war itself provides the information necessary for
disputants to reach a settlement to end the war.” This notion that war automatically provides cred-
ible information is accepted by a broad range of international relations scholars. For example, H. E.
Goemans explains that “war makes agreement possible because war provides information . . . . As the
war progresses, at least one side must learn that its estimate [of the balance of power] was wrong.”
H. E. Goemans, War and Punishment (Princeton, N.J.: Princeton University Press, 2000), pp. 27–28.
Or Dan Reiter, “Exploring the Bargaining Model of War,” Perspectives on Politics, Vol. 1, No. 1 (2003),
p. 31, writes: “Combat can reduce uncertainty by providing combatants with information about the
actual balance of power.”
14
Slantchev, “Principle of Convergence,” p. 627, thus argues that “it is impossible for the unin-
formed player to become optimistic with time” as combat unfolds. Similarly, Powell, “Bargaining and
Learning,” p. 349, writes that each time a round of fighting passes without causing an opponent to
collapse, a rational decision maker “becomes more confident that it is facing the more powerful type
[of opponent]. . . . Indeed, the odds of facing [the more powerful opponent] increase with each round
of fighting . . . . This is the sense in which fighting conveys information.” Again, it is straightforward
to adapt this logic to other policy areas. For example, we could argue that every time an opponent
resists the pressure of economic sanctions for a year, it should reduce decision makers’ estimates of
the chances that the opponent will concede moving forward. Or, we could argue that every time a
country holds a corrupt election, this should reduce the perceived chances that this country will con-
duct a fair contest in the future.
wins a battle (or every time a policy produces some tactical success), that should
make her more optimistic about her broader strategic prospects. Mathematical
logic can formalize how much decision makers should update these beliefs, but
the direction of that updating is straightforward and intuitive.
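To see that standard logic at work, consider a minimal Bayesian sketch with illustrative numbers of my own choosing: a “weak” opponent collapses in any given round with probability 0.3, a “strong” opponent never collapses, and the prior is an even split between the two types.

```python
p_strong = 0.5          # prior probability of facing the strong type of opponent
collapse_if_weak = 0.3  # assumed chance a weak opponent collapses in any round

for round_number in range(1, 6):
    # Bayes' rule, conditioning on the observation that the opponent fought on:
    # the strong type survives a round with probability 1, the weak type with 0.7.
    p_strong = p_strong / (p_strong + (1 - p_strong) * (1 - collapse_if_weak))
    print(round_number, round(p_strong, 3))  # belief in the strong type rises each round
```

Each survived round pushes the posterior toward the strong type, which is exactly the monotone updating that the models quoted above describe.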
These assumptions are extremely useful for building game-theoretic models.
Yet it is clearly implausible to argue that a military strategy’s chances of suc-
ceeding remain fixed and repeated over time. For that matter, it is hard to con-
ceive of any major foreign policy for which that assumption actually holds, and
I am unaware of any scholar who has ever attempted to substantiate such a claim.
When leaders design military strategies, apply economic sanctions, pursue de-
mocracy promotion, or implement many other major foreign policies, they usu-
ally understand that those policies have virtually no chance of succeeding in the
short run. Instead, the assumption is that, over time, these policies will make
gradual progress toward a desired goal.
For example, when the Johnson administration debated committing U.S.
combat forces to Vietnam in 1965 (see chapter 1), few senior officials believed
that the Communists would surrender right away. Secretary of Defense Robert
McNamara captured the views of U.S. military leaders in writing that “none of
them expects the [Viet Cong] to capitulate or to come to a position acceptable
to us, in less than six months.”15 Assistant Secretary of State William Bundy sim-
ilarly argued that “we may have to hang on quite a long time before we can hope
to see an improving situation in South Viet-Nam,” let alone before the South
Vietnamese regime would be able to stand on its own.16 When intelligence
estimates predicted the impact of ground-force commitments, they typically
stated that the escalations would not cause the Communists to buckle immedi-
ately. “The real test,” according to one report, “would be that of combat,” and it
was only if “the tide of battle runs against the Viet Cong for a substantial period”
that the Communists might possibly resort to negotiations.17
These statements express a logic that is very different from the standard
assumptions international relations scholars use to model strategic assess-
ment, which hold that every round of fighting provides meaningful information
that helps decision makers to resolve uncertainty. If McNamara, Bundy, and
other senior officials truly believed there was no chance that the Vietnamese
Communists would concede in the short run, then the fact that the Communists
15
Robert McNamara to President Johnson, April 21, 1965, Foreign Relations of the United States,
1964–1968, Vol. II, Doc 265.
16
William P. Bundy, “Where Are We Heading?” Pentagon Papers (Gravel edition), February 18,
1965, Vol. III, Doc 252.
17
Directorate of Intelligence, “Memorandum: Reactions to a Further US Buildup in South
Vietnam,” 10 June 1965: Estimative Products on Vietnam, pp. 255–260.
continued to fight throughout this period would not have provided any new in-
formation about their capabilities and resolve. Of course, this does not preclude
saying that decision makers could have drawn on other kinds of information
to revise their beliefs about the chances that their strategy would succeed. The
point is instead that the mere act of continued fighting would not have helped
the Johnson administration to update its strategic assessments in the straightfor-
ward manner scholars have traditionally assumed. This is a clear sense in which
the cumulative dynamics of armed conflict depart from conventional wisdom
on learning in international politics.
Yet the place where the cumulative dynamics of international politics diverge
most sharply from the standard logic of strategic assessment is in how they allow
actors to update their beliefs in different directions, even as they observe iden-
tical experiences. The next section will describe the theoretical foundations
of this claim in more detail. But to build basic intuitions on this subject, con-
sider Napoleon’s invasion of Russia in 1812. The French Army won the opening
battles in this war. The conventional wisdom on learning in international poli-
tics would thus say that Napoleon should have become progressively more opti-
mistic about his chances of continuing to defeat his opponents in future battles,
and that Russia should have become progressively more pessimistic about its
own strategic prospects. Yet Russia’s leaders knew that Napoleon’s chances of
success would not remain fixed and repeated from one battle to the next. As
Napoleon’s invasion dragged on, his army lost manpower, supplies, and morale.
Indeed, Russia’s defensive strategy depended on the notion that these cumula-
tive losses would drain momentum from Napoleon’s invasion. In this sense, it
was entirely plausible—and, indeed, accurate—for Russian leaders to believe
that their strategy was working at a strategic level, even as the Russian Army lost
one battle after another.
In principle, scholars could amend standard models of strategic assessment
to anticipate this dynamic: for example, by drawing assumptions about how
France’s probability of winning battles could decline by a particular amount
from one round of fighting to the next. In practice, however, there are an infinite
number of ways in which one could capture these potential trends, and there is
no reason why French and Russian leaders should have held similar perceptions
of what those trends entailed.18 Indeed, the fact that the French and Russian
leaders could have held different assumptions about this issue implies that both
18
On the ambiguity surrounding perceptions of relative power, see Jonathan Kirshner, “Rationalist Explanations for War?” Security Studies, Vol. 10, No. 1 (2000), pp. 143–150. Slantchev, “Principle of Convergence,” p. 623n6, acknowledges that relative power might change from one round of fighting to the next, but states that “it is not clear a priori” how this would happen, and so he brackets the issue. Note that the indeterminacy of these relative power shifts is exactly what leads the analysis presented in this chapter to depart from conventional models of strategic assessment.
sides could have simultaneously believed that their strategies were working. If
the French leaders believed that their probability of success remained relatively
constant from one battle to the next, then they would have seen their early per-
formance as an optimistic indication of eventual strategic victory. Meanwhile, if
Russian leaders believed that their probability of success would rise over time,
then they might have seen the outcome of these early battles as being largely
irrelevant to the war’s ultimate outcome. The next section delves further into
this argument, showing how these kinds of probability assessments can lead ra-
tional decision makers to draw starkly different lessons from observing the same
experience.
19
Thus, we could represent this logic in terms of the number of years it might take for economic
sanctions to coerce opponents into making a concession, or the number of electoral cycles it might
take before a regime is willing to conduct a fair contest.
[Figure 6.1: the vertical axes show the number of marbles originally in the urn (0–100) and probability (0–1).]
number of marbles that were originally in the urn. For the first ten draws, you
should have no reason to update this judgment, because you did not believe that
the urn could contain fewer than ten marbles. This is similar to the notion that
U.S. officials had no reason to update their beliefs about the breaking point of
the Vietnamese Communists as a result of the fighting that immediately followed
their decision to escalate the war. We have already seen how this contrasts with
the conventional notion that every round of fighting should provide rational decision
makers with an opportunity to revise their initial assessments of uncertainty.
The dashed line in Figure 6.1 represents your estimate of the number of mar-
bles remaining in the urn. This is equivalent to asking how much longer it will
take or how much more it will cost to achieve your intended objectives. Note
how the dashed line shows that your estimate of this quantity will continually
decline as the experiment continues. The intuition behind this pattern is that,
even if you initially underestimated the number of marbles remaining in the
urn, every draw brings you one step closer to the point where the urn’s contents
are exhausted. The dotted line in Figure 6.1 represents the probability that the
next marble drawn from the urn will prove to be the last. Note that this proba-
bility continually increases as the experiment continues. Every time the experi-
menter pulls a marble from the urn, it progressively depletes the urn’s remaining
contents, just as military officials might assume that every round of fighting
degrades an opponent’s remaining capabilities and resolve.
The purpose of this thought experiment is not to say that foreign policy de-
cision makers should myopically become more optimistic as they implement
their policies. Indeed, one of the central ways in which this theoretical logic
20
For an extended and more formal discussion of these arguments, see Jeffrey A. Friedman,
Cumulative Dynamics and Strategic Assessment, Ph.D. diss. (Cambridge, Mass.: Harvard University,
2013).
[Figure 6.2: four panels (normal prior, Weibull prior, lognormal prior, exponential prior), each plotting probability of success against cost.]
Figure 6.2 How probability estimates shape strategic assessment. The shaded regions in the graphs represent
estimates of the chances that a policy will succeed at different levels of cost. The dashed line represents the
remaining cost rational actors should expect to pay to achieve their goals conditional on previous investment.
Note that expected remaining cost can rise, fall, or remain constant depending on decision makers’ initial
assumptions.
The shaded regions in each graph represent a decision maker's initial perceptions of the chances that
a policy will succeed at each potential level of investment. The dashed line in
each graph represents a parameter that we can call expected remaining cost. This is
the amount of further investment rational decision makers would expect to pay
before achieving their strategic objectives, conditional on already committing a
certain level of resources toward that goal.
The upper-left panel of Figure 6.2 shows how a decision maker should adapt
her expectations over time if her initial beliefs about the cost that might be re-
quired to achieve her objectives follow a normal distribution: the bell-shaped
curve that social scientists often use to model uncertainty.21 This graph shows
how estimates of expected remaining cost will continually decrease for a deci-
sion maker whose prior beliefs correspond to the normal distribution. By con-
trast, the upper-right panel of Figure 6.2 shows how decision makers should
adjust their perceptions of uncertainty based on prior beliefs that correspond to
the heavy-tailed Weibull distribution. A decision maker who holds these initial beliefs
about the way her chances of success will rise and fall over time should con-
tinually become more pessimistic about expected remaining cost. The bottom-
right panel in Figure 6.2 shows how perceptions of expected remaining cost will
hold constant over time if a decision maker's prior assumptions follow the ex-
ponential distribution.22 And the bottom-left panel in Figure 6.2 shows how ex-
pected remaining cost can fluctuate based on a prior assumption that follows the
lognormal distribution. Here, we see that expected remaining cost initially falls,
but subsequently begins to rise.
I chose to present the four probability distributions in Figure 6.2 not be-
cause I think that they necessarily reflect common assumptions about the un-
certainty that surrounds foreign policy decision making, but because these
distributions are so similar. The normal and lognormal distributions on the left
side of Figure 6.2 are closely related to each other, as are the exponential and
Weibull distributions on the right side of Figure 6.2. These graphs thus indicate
how subtle differences in probability assessments carry diverging implications
for rational learning in international politics.23
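The following sketch (my own illustration; the parameter values are arbitrary, since the chapter specifies none) computes expected remaining cost, E[X - c | X > c], under each of the four priors. It reproduces the four patterns in Figure 6.2: falling for the normal, rising for the heavy-tailed Weibull, constant for the exponential, and falling-then-rising for the lognormal.

```python
import numpy as np
from scipy.stats import norm, lognorm, weibull_min, expon

def expected_remaining_cost(dist, spent, grid_max=5000.0, n=500_000):
    """E[X - spent | X > spent], via the identity E[(X - c)+] = integral of
    the survival function from c upward (truncated at grid_max)."""
    x = np.linspace(spent, grid_max, n)
    dx = x[1] - x[0]
    return float(np.sum(dist.sf(x)) * dx / dist.sf(spent))

# Illustrative parameters only; shape c < 1 gives the Weibull a heavier
# tail than the exponential, which is what makes its curve rise.
dists = {
    "normal":      norm(loc=50, scale=15),
    "lognormal":   lognorm(s=1.0, scale=30),
    "weibull":     weibull_min(c=0.5, scale=30),
    "exponential": expon(scale=30),
}
for name, dist in dists.items():
    print(name, [round(expected_remaining_cost(dist, c), 1) for c in (0, 5, 20, 80)])
```

The exponential prior is memoryless, so its expected remaining cost holds at 30 no matter how much has already been spent; the other three priors bend in the directions the figure describes.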
21
Unlike the assumptions presented in the stylized thought experiments, the normal distribution
is unbounded, such that it assigns some probability to every possible value of a given variable (in this
case, the cost that leaders might have to pay to achieve their strategic goals).
22
This provides a more general result, which is that expected remaining cost should only rise
when a decision maker’s initial assumptions have a heavier tail than the exponential distribution.
23
There is an extensive debate among international relations theorists over whether rational actors can enter war (or any other form of coercive bargaining) without holding “common priors” about the uncertainty surrounding the resolution of disputes. This debate lies outside the scope of the chapter's analysis. The logic presented here is premised instead on the descriptive observation that combatants often do enter armed conflict and other coercive bargaining situations with different expectations about their strategic prospects. For more on the common priors debate in international politics, see Mark Fey and Kristopher Ramsay, “Mutual Optimism and War,” American Journal of Political Science, Vol. 51, No. 4 (2007), pp. 738–754; Branislav Slantchev and Ahmer Tarar, “Mutual Optimism as a Rationalist Explanation for War,” American Journal of Political Science, Vol. 55, No. 1 (2011), pp. 135–148; and Kirshner, “Rationalist Explanations for War?” Mark Fey and Kristopher Ramsay, “The Common Priors Assumption,” Journal of Conflict Resolution, Vol. 50, No. 4 (2006), pp. 607–613, draw a similar distinction between the descriptive fact that combatants often enter disputes without possessing common priors and the normative question of whether or not noncommon priors can be rational.
Conceptual Summary
This section has examined the logic of strategic assessment in international
politics. The discussion focused, in particular, on how tactical outcomes shape
perceptions of strategic progress. Of course, this is only one source of infor-
mation that leaders can use to update their beliefs about the viability of mili-
tary strategies or other foreign policies. Nevertheless, the relationship between
tactical experience and strategic expectations looms large over the theory
and practice of international politics. We saw in chapter 1 that the Kennedy
and Johnson administrations devoted enormous effort to surveying tactical
outcomes in Vietnam, in the hopes that this information would help to opti-
mize U.S. military strategy.24 And this chapter has described how contemporary
scholarship on strategic assessment, in large part, revolves around the question
of how leaders can use tactical successes and failures to improve their estimates
of how much longer it might take or how much more it might cost to achieve
their strategic goals.
This scholarship underestimates the analytic challenges of strategic assess-
ment. The standard approach to this topic assumes that tactical outcomes pro-
vide unambiguous signals that naturally eliminate uncertainty about a policy’s
strategic prospects. Yet this logic relies on the implausible assumption that a
policy’s chances of success remain fixed and repeated. By relaxing that as-
sumption, this section explained how the strategic lessons that rational actors
should draw from tactical outcomes are conditional on their beliefs about how
a policy’s chances of success might change over time. If leaders do not agree on
what those probability estimates entail, then there is no reason to expect their
expectations to converge. The chapter’s final section illustrates this problem
with examples drawn from the U.S. occupation of Iraq.
24
The recent wars in Iraq and Afghanistan have generated renewed interest in the question of
what leaders can (and cannot) learn from collecting these kinds of data. Jim Baker, “Systems Thinking
and Counterinsurgencies,” Parameters, Vol. 26, No. 4 (2006/07), pp. 26–43; Jonathan Schroden,
“Measures for Security in a Counterinsurgency,” Journal of Strategic Studies, Vol. 32, No. 5 (2009), pp.
715–744; Connable, Embracing the Fog of War.
25
Donald P. Wright and Timothy R. Reese, On Point II: Transition to the New Campaign (Ft.
Leavenworth, Kans.: Combat Studies Institute, 2008), p. 153.
26
Coalition casualty data are from the website iCasualties.org, Iraq Coalition Casualties Count, http://www.icasualties.org. Iraqi civilian casualties data are from the website Iraq Body Count, http://www.iraqbodycount.org. I downloaded both data sets on October 20, 2016.
[Figure: trends in violence in Iraq, March 2003 to January 2015; monthly coalition and Iraqi civilian casualty counts from the sources described in note 26.]
reverse unfavorable trends, and when violence declined, they argued that Iraq’s
relative stability remained too fragile to last without external support.
My purpose in describing these debates is not to stake claims about which side
was correct in evaluating any specific period of the war. The purpose of this sec-
tion is instead to show how proponents and critics of U.S. strategy each presented
internally coherent logics based on diverging beliefs about how long it might take
or how much it might cost to create a stable Iraq. These contrasting assessments
of uncertainty allowed observers to draw fundamentally different lessons from
observing the same experience. Those assumptions were always subjective and
open to dispute. But without resolving those disputes, decision makers and their
critics could not agree about whether or not U.S. strategy was making progress,
let alone whether that progress was sufficient to stabilize Iraq at acceptable cost.
This is the sense in which probabilistic reasoning lies at the heart of strategic as-
sessment, and how disputes about this reasoning can prevent drawing even ru-
dimentary judgments about the extent to which foreign policies are worthwhile.
27
On this point, see Aaron Rapport, Waging War, Planning Peace: U.S. Noncombat Operations and Major Wars (Ithaca, N.Y.: Cornell University Press, 2015), ch. 3. Other secondary sources that
shaped my discussion of this debate include Gordon Rudd, Reconstructing Iraq: Regime Change, Jay
Garner, and the ORHA Story (Lawrence, Kans.: University Press of Kansas, 2011); Michael Gordon
and Bernard E. Trainor, Endgame: The Inside Story of the Struggle for Iraq, from George W. Bush to
Barack Obama (New York: Pantheon, 2012); and Wright and Reese, On Point II.
28
See George W. Bush, Decision Points (New York: Broadway, 2010), p. 258; Donald Rumsfeld,
Known and Unknown: A Memoir (New York: Sentinel, 2011) pp. 520, 664; George Tenet, At the
Center of the Storm: My Years at the CIA (New York: HarperCollins, 2007), p. 318.
29
Wright and Reese, On Point II, pp. 88–89.
30
Conceptually, this idea reflected a bimodal conception of uncertainty surrounding strategic
prospects in Iraq: if the United States were unable to contain violence at the early stages of the oc-
cupation, then the chances of stabilizing the situation would rapidly deteriorate. Thus, the longer
that the war dragged on, and the higher that violence climbed, the greater were the chances that the
United States was headed into a quagmire.
31
Gordon and Trainor, Endgame, p. 56, thus explain: “At the White House, the hope was that the
push toward sovereignty would soothe the Iraqis’ grievances over the occupation and take the steam
out of the insurgency. Politics, in effect, was to enable the military strategy.”
U.S. strategy in Iraq was thereby premised on the notion that current trends
did not reliably predict future prospects. As Secretary Rumsfeld wrote in a 2005
memorandum to President Bush, the “key question” surrounding the war effort
was “when there will be a clearly discernible ‘tipping point.’ Eventually, more
and more Iraqi people will decide that they will no longer side with the enemies
of the legitimate Iraqi government and move to the middle. And the people in
the middle, at some point, will decide that there is going to be a legitimate, free
Iraqi government.”32
Thus, even as the Bush administration acknowledged that the postinvasion
violence had been far more extensive than anticipated, senior officials did not
see these trends as reflecting basic flaws in their overall strategy. President Bush
wrote in his memoirs that while “the chaos and violence we witnessed were
alarming . . . I refused to give up on our plan before it had a chance to work.”33
Meanwhile, the administration pointed to several indicators of Iraq’s political
progress. Iraqi leaders approved an interim constitution in March 2004. Three
months later, Iraq formally regained its legal sovereignty. In January 2005, the
country held its first democratic elections. In October 2005, Iraq adopted a new
democratic constitution via national referendum.
Of course, none of these events proved to be the turning points for which
the Bush administration had hoped. In hindsight, it was clearly wrong to
believe that security conditions in Iraq were closely coupled to political
benchmarks. But the flaw in this reasoning had little to do with drawing
inferences from the violence trends themselves. Rather, the basic ques-
tion driving debates about U.S. strategy in Iraq was estimating the prob-
ability that U.S. and Iraqi efforts could reverse those trends. Critics and
proponents of U.S. strategy held different assumptions about whether those
chances were growing larger or smaller over time. These assumptions, in
turn, led foreign policy analysts to draw different conclusions about the via-
bility of U.S. strategy even though they were observing the same experience.
Without resolving the controversy surrounding these assumptions, the Bush
administration had no reason to abandon its belief that U.S. strategy was
making progress in Iraq. After all, there is nothing inherently implausible
about arguing that political and military efforts can help to reverse problem-
atic trends—indeed, that is exactly what happened during the next phase of
the conflict.
32
Donald Rumsfeld, “Progress in Iraq,” memorandum to President Bush, 29 November 2005.
33
Bush, Decision Points, p. 259.
34
Bush, Decision Points, pp. 363, 393. Secretary Rumsfeld recalled the situation in a similar way.
“Looking back,” he wrote, “it is now clear that the effect of the [Samarra] bombing proved a game
changer in Iraq.” Rumsfeld, Known and Unknown, p. 660.
35
A large number of excellent sources describe both the internal and public debates about the
Surge. The following sources were especially helpful in grounding my discussion: Bob Woodward,
The War Within (New York: Simon and Schuster, 2008); David Ucko, The New Counterinsurgency
Era (Washington, D.C.: Georgetown University Press, 2009); Thomas Ricks, The Gamble
(New York: Penguin, 2009); Peter D. Feaver, “The Right to Be Right: Civil-Military Relations and
the Iraq Surge,” International Security, Vol. 35, No. 4 (Spring 2011), pp. 87–125; Peter Mansoor, Surge
(New Haven, Conn.: Yale, 2013); and Gordon and Trainor, Endgame.
36
Elsewhere, I have argued that the Surge strategy was a necessary but insufficient component
for reducing Iraq’s violence, though it remains unclear whether additional forces were necessary for
this strategy to take effect. See Stephen Biddle, Jeffrey A. Friedman, and Jacob N. Shapiro, “Testing
the Surge: Why Did Violence Decline in Iraq in 2007?” International Security, Vol. 37, No. 1 (2012),
pp. 7–40.
reliable indicator of strategic prospects. Once again, the trend lines in violence
offered little direct support to either the critics or the proponents of the Surge.
The central challenge in evaluating this strategy was estimating the probability
that these trends would shift within a relevant time frame, and there was no
logical reason to assume that those chances had to remain fixed and repeated
over time. Thus, while the conventional wisdom among international relations
scholars holds that tactical outcomes provide unambiguous signals that resolve
uncertainty about a policy’s strategic prospects, debates about the Iraq Surge
show that this causal relationship can work in the exact opposite direction: be-
cause proponents and critics of the Surge held very different perceptions of this
strategy’s overall prospects, they drew fundamentally different lessons from
observing the same tactical experience.
37
Again, many excellent sources describe the debates about U.S. politics during this period along
with the breakdown of the SOFA negotiations specifically. Particularly useful for informing my discussion here are Emma Sky, The Unraveling (New York: PublicAffairs, 2015); David Kilcullen, Blood
Year (New York: Oxford University Press, 2016); and articles by Stephen Biddle and Ali Khedery.
withdraw from a country that had not seen large-scale violence in several years.
Extrapolating from this past experience, withdrawal advocates assumed that the
probability of renewed violence in Iraq would remain low into the future.
Proponents of maintaining a U.S. military presence in Iraq were unconvinced
that Iraq’s relative stability would continue. In their view, even a limited number
of U.S. forces played an important role in bolstering Iraq’s military and police,
and in restraining Prime Minister Maliki’s sectarian tendencies. By removing
this support, they argued, U.S. withdrawal could raise the probability that Iraq’s
Sunnis and Shiites would return to the cycle of conflict that drove large-scale
violence during the occupation’s earlier years. In that sense, proponents of
maintaining a U.S. presence in Iraq argued that recent violence trends were mis-
leading, and that the country still required substantial assistance in order to con-
tinue consolidating its progress in politics and security.
It remains difficult to judge which of these predictions proved to be more
accurate. After another year of relative stability following U.S. withdrawal, sec-
tarian violence rose in 2013, and then reached unprecedented levels as Islamic
State militants seized several of the country’s Sunni population centers in 2014.
Proponents of withdrawing U.S. forces from Iraq were clearly mistaken in
believing that Iraq was in a position to consolidate progress on its own. Yet it
is also hard to predict the extent to which a limited number of U.S. forces in a
noncombat role could have prevented the Islamic State’s rise. A foreign military
presence during this period might even have proved to be counterproductive if it
exacerbated resentment or encouraged Iraqi leaders to shirk their responsibility
for managing domestic problems.
It thus remains an open question how much longer it would have taken or
how much it would have cost the United States to have left Iraq on a sustain-
able footing, or whether this goal was ever feasible at all. Once again, resolving
this uncertainty had little to do with extrapolating trends in tactical outcomes.
Instead, the key issue shaping strategic decisions lay with estimating the proba-
bility that U.S. forces could help Iraq reach a point where it could stand on its
own within a feasible time frame. Though any attempt to estimate those chances
would have been inherently subjective, it is impossible to evaluate the wisdom of
withdrawing from Iraq without tackling that issue.
Conclusion
This chapter has shown that assessments of subjective probability provide cru-
cial foundations for learning and adaptation in international politics. Although
international relations scholars believe that leaders’ perceptions should natu-
rally converge as they observe foreign policies in action, we have seen that this
argument relies on the premise that a policy’s chances of success remain fixed
and repeated over time.38 I explained why that assumption is implausible, and
how it leads scholars to underestimate the genuine analytic challenge of stra-
tegic assessment. Tactical results in armed conflict or other areas of international
politics do not provide unambiguous signals about a policy’s future outcomes.
Instead, the chapter showed that the lessons leaders draw from observing tac-
tical successes and failures are conditioned by the way that those leaders assess
their strategic prospects. Without reaching consensus about what this uncer-
tainty entails, we cannot expect foreign policy analysts and decision makers
to agree on even rudimentary issues like whether or not a military strategy is
making acceptable progress.
The problem, of course, is that assessments of strategic prospects are inher-
ently subjective. There is almost never an obvious “right way” to estimate how
a foreign policy’s chances of success will rise and fall over time. Yet to say that
these beliefs are subjective is different from saying they are meaningless. We
have seen throughout the book that foreign policy analysts always possess a
theoretically coherent and empirically justifiable basis for assessing subjective
probabilities. This chapter showed that those judgments can provide useful—
indeed, sometimes crucial—foundations for structuring hard choices. The next
chapter concludes by drawing additional practical implications from the book’s
analysis and by describing directions for future research.
38
Or that these probabilities change in predictable ways that are known to decision makers in
advance.
7
Probability and Foreign Policy Discourse
After the terrorist attacks on September 11, 2001, many national security officials
worried that the next strike could be far worse. The nightmare scenario involved
a nuclear weapon being detonated on U.S. soil. After receiving a briefing about
al Qaeda’s attempts to acquire a nuclear device from Pakistan, Vice President
Richard Cheney concluded: “If there’s a one percent chance that Pakistani
scientists are helping al Qaeda build or develop a nuclear weapon, we have to
treat it as a certainty in terms of our response.” The journalist Ron Suskind fa-
mously labeled Cheney’s statement the “One Percent Doctrine.” In a best-selling
book by that name, Suskind argued that the One Percent Doctrine characterized
the heightened priority that U.S. foreign policy officials have assigned to man-
aging low-probability, high-consequence risks in an age of global terrorism.1
On its face, the logic of the One Percent Doctrine seems absurd. If the
damage that a risk could cause is sufficiently high, and if policymakers have fea-
sible options for reducing that danger, then a one percent chance of catastrophe
could easily be large enough to justify a major response. From this perspective,
treating a one percent chance as a certainty is not careful thinking: it just inflates
the risk by two orders of magnitude.2 Foreign policy decisions are hard enough
without deliberately distorting assumptions. How could anyone justify debating
high-stakes choices in this manner?
There are at least two ways to answer that question. The first is that public
officials did not believe that they could justify their preferred policies based
on merit alone. In this view, consciously inflating low-probability threats could
have been a rhetorical strategy for overriding reasonable opposition, in much
the same way that Harry Truman’s secretary of state, Dean Acheson, famously
1
Ron Suskind, The One Percent Doctrine (New York: Simon and Schuster, 2006).
2
For a broader, related criticism of public discourse on the risk of terrorism, see John Mueller and
Mark Stewart, Chasing Ghosts: The Policing of Terrorism (New York: Oxford University Press, 2016).
argued that leaders sometimes need to present hard choices to the public in a
manner that is “clearer than truth.”3 Treating a one percent chance as a certainty
could thus be a tactic for avoiding open and honest discourse, impairing other
people’s abilities to make sound decisions.4
Yet one can also interpret the One Percent Doctrine as a tactic for improving
other people’s abilities to make sound decisions, and I suspect that is closer to
what the doctrine’s proponents had in mind. The premise behind this view is
that most people struggle to assess and interpret subjective probabilities. Thus,
even if a one percent chance of catastrophe deserved a major policy response, it
might be difficult for leaders and voters to prioritize combating threats that seem
so unlikely.5 Treating a one percent chance as a certainty could be a patronizing
but productive way to navigate difficult conceptual terrain.
In this respect, the One Percent Doctrine reflects an extreme example of the
aversion to probabilistic reasoning described throughout this book. The book
has explained that many scholars and practitioners believe there is no coherent
way to assess uncertainty in international politics, that making these assessments
provides no meaningful information, that transparent probabilistic reasoning
biases analysts or decision makers, that it exposes public officials to excessive crit-
icism, and that subjective probabilities are simply irrelevant to structuring hard
choices. We have also seen that these arguments do not withstand scrutiny. Even
though the most important assessments of uncertainty in international politics
are inherently subjective, the book has shown that it is always possible to deal
with this challenge in clear and structured ways, and that foreign policy analysts
and decision makers handle this challenge best when they confront it head-on.
In some cases, placing greater emphasis on assessing uncertainty can directly
influence major foreign policy decisions. When the Joint Chiefs of Staff argued
that the Bay of Pigs invasion had “a fair chance of ultimate success,” they enabled
one of the worst U.S. foreign policy blunders of the Cold War. The Pentagon sent
hundreds of thousands of soldiers to Vietnam without carefully analyzing the
risks of doing so. And even though intelligence analysts technically acknowledged
3
Dean Acheson, Present at the Creation: My Years at the State Department (New York: Norton,
1969), p. 375.
4
Joshua Rovner, Fixing the Facts: National Security and the Politics of Intelligence (Ithaca,
N.Y.: Cornell University Press, 2011); Chaim Kaufmann, “Threat Inflation and the Failure of the
Marketplace of Ideas,” International Security, Vol. 29, No. 1 (2004), pp. 5–48; John Mueller, Overblown
(New York: Free Press, 2006).
5
On the challenges that policymakers and the general public face in dealing with low-probability,
high-consequence threats, see Richard Posner, Catastrophe: Risk and Response (New York: Oxford
University Press, 2004); Cass R. Sunstein, Worst-case Scenarios (Cambridge, Mass.: Harvard
University Press, 2007); and Robert Meyer and Howard Kunreuther, The Ostrich Paradox: Why We
Underprepare for Disasters (Philadelphia, Penn.: University of Pennsylvania Press, 2017).
6
An added benefit to this component of the basic standard is that it would require policy
advocates to be clear about what their objectives entail.
thumb for all but the most extreme judgments. The data in chapter 3 suggest that anything less than
this level of precision would systematically sacrifice meaningful information.
We have nevertheless seen several examples of how foreign policy analysts often
conflate probability and confidence, and how they rarely assess these concepts
simultaneously. The most straightforward way to solve this problem is for for-
eign policy analysts to provide a clear statement of the chances that a judgment
is true, accompanied by a brief description of how the analysts arrived at that
conclusion.8 The difficulty of distinguishing between assessments of probability
and confidence provides another reason to favor making those judgments as ex-
plicit as possible.
None of the recommendations described here is particularly complicated.
The basic standard is essentially a matter of being transparent about the uncer-
tainty that lies behind predictions and policy recommendations. Implementing
the basic standard would nevertheless represent a fundamental shift in the char-
acter of foreign policy discourse, both in and out of government. Chapter 1
showed that even analytically inclined policymakers like the Whiz Kids fail to
meet the basic standard in the vast majority of their policy analyses. Similarly,
when one scans policy recommendations presented in newspaper op-eds or on
the pages of major policy journals, it is rare to see scholars, practitioners, and
pundits place assessments of uncertainty front and center in their analysis.
Government agencies possess the most obvious opportunities to estab-
lish these norms. Chapter 1 explained how virtually all intelligence and mili-
tary organizations develop official guidelines for assessing uncertainty. Those
guidelines could be changed at any time to incorporate the basic standard.
Media outlets possess similar opportunities to set rules of the road for policy
debates. Just as it is reasonable to expect that military planners should carefully
assess the chances that any dangerous operation will succeed, it is worth asking
scholars, practitioners, and pundits to thoughtfully describe the uncertainty that
surrounds any policy recommendation they offer in public forums. Editorial
boards can develop and enforce publishing guidelines that prevent making hard
choices seem clearer than truth. Journalists who cover foreign policy debates can
press policy advocates to describe the uncertainty their recommendations entail.
Decision makers can take steps to implement the basic standard by directing
their staffs to incorporate clear assessments of uncertainty into policy analyses.
The book has described several cases in which decision makers learned this
8
Chapter 2 explained that rigorous discussions of analytic confidence should include at least
three components: the reliability of the available evidence supporting a judgment, the range of rea-
sonable opinion surrounding that judgment, and the degree to which that judgment might change in
response to new information. While it may be infeasible to require that foreign policy analysts assess
these elements of uncertainty for every judgment they make, this standard of rigor is appropriate for
particularly high-stakes judgments, such as estimating the chances that Osama bin Laden was living
in Abbottabad, or estimating the chances that Saddam Hussein was pursuing nuclear weapons.
lesson the hard way. Thus, after the Bay of Pigs invasion collapsed, President
Kennedy wrote a National Security Action Memorandum that required the
Joint Chiefs of Staff “to present the military viewpoint in governmental councils
in such a way as to assure that the military factors are clearly understood before
decisions are reached.”9 Of course, decision makers would ideally implement
these lessons before major crises occur. Adopting the basic standard is one way
to do that.
What about cases in which neither analysts nor decision makers are naturally
inclined to provide transparent assessments of uncertainty? In this context, mul-
tiple advocacy can help to foster rigorous policy debates.10 Any time one side of
a policy debate seeks to make matters clearer than truth, its opponents should
have incentives to bring relevant doubts into better focus.
This kind of multiple advocacy can play out in a wide range of forums, in-
cluding Congressional testimony, television commentary, and debates in the
White House Situation Room. Critics who oppose a policy can always press
their rivals to assess the chances that this policy will succeed and to defend
that judgment against scrutiny. If policy analysts are unable or unwilling to
do this, then critics can point out that they do not possess a valid basis for
recommending that decision makers place lives and resources at risk.11 The
fact that policy analysts do not systematically press each other to justify their
arguments in this fashion helps to emphasize that the goal of improving foreign
policy analysis is not just a matter of setting better standards for intelligence
analysts and military planners. Scholars, journalists, and pundits can also play
an important role in helping to place assessments of uncertainty at the center of
foreign policy discourse.
9
National Security Action Memorandum 55, “Relations of the Joint Chiefs of Staff to the
President in Cold War Operations,” 28 June 1961.
10
Alexander L. George and Eric K. Stern, “Harnessing Conflict in Foreign Policy Making: From
Devil’s to Multiple Advocacy,” Presidential Studies Quarterly, Vol. 32, No. 3 (2002), pp. 484–508.
11
Indeed, the fact that scholars, practitioners, and pundits do not pursue this kind of multiple
advocacy more often provides one of the strongest indications of how aversion to probabilistic rea-
soning in foreign policy discourse does not simply reflect cynical avoidance of controversial issues.
To the extent that a policy’s critics do not make more effort to probe the uncertainty surrounding
high-stakes choices, it can only be because critics do not naturally consider this line of questioning,
or because they do not think that line of questioning will be productive.
Further research on this topic would engage many questions that the book did not address. In
closing, I will highlight four such issues: organizational culture, effort, emo-
tion, and values.
Chapter 2 argued that rigorously assessing uncertainty in international poli-
tics conflicts with the traditional emphasis that foreign policy analysts place on
objectivity. Indeed, chapter 2 explained why rigor and objectivity are all but im-
possible to achieve simultaneously when debating foreign policy decisions, given
that foreign policy decisions almost always depend on subjective probabilities.
Of course, saying that assessments of uncertainty are subjective does not imply
that they are meaningless. It is nevertheless important to acknowledge that
placing these judgments front and center in foreign policy discourse would clash
with many analysts’ and decision makers’ natural impulse to value objectivity.
Further research on organizational culture could provide valuable insights about
how to manage this tension.12
Another limitation of the book’s analysis is that it relied on experiments that
captured respondents’ intuitive responses to foreign policy issues. All things
being equal, this is a strength of the book’s research design: responses provided
on the basis of limited effort should generally underestimate the value of careful
probabilistic reasoning. Yet this also means that the book does not describe
the true frontiers of human capabilities in this area. Accomplishing that goal
requires merging the methodological tools described in the book with actual
analytic efforts conducted by foreign policy professionals.13
A third way to expand the book’s research program would draw upon insights
from the study of values and emotions in international politics. Some observers
might see the book’s focus on subjective probability as representing an overly
narrow focus on the rationalist elements of foreign policy analysis. There is in-
deed a great deal of evidence that emotion and rationality are complements, not
substitutes.14
12
Relevant studies in this area include Rob Johnston, Analytic Culture in the U.S. Intelligence
Community (Washington, D.C.: Center for the Study of Intelligence, 2005); James Marchio, “The
Intelligence Community’s Struggle to Express Analytic Certainty in the 1970s,” Studies in Intelligence,
Vol. 58, No. 4 (2015), pp. 31–42; and Alan Barnes, “Making Intelligence Analysis More Intelligent,”
Intelligence and National Security, Vol. 31, No. 1 (2016), pp. 327–344.
13
For examples of important work already being conducted in this area, see David R. Mandel and
Alan Barnes, "Accuracy of Forecasts in Strategic Intelligence," Proceedings of the National Academy of
Sciences, Vol. 111, No. 30 (2014), pp. 10984–10989, who analyze experimental work conducted by
Canada's Intelligence Assessment Secretariat; and Bradley A. Stastny and Paul E. Lehner, "Comparative
Evaluation of the Forecast Accuracy of Analysis Reports and a Prediction Market," Judgment and
Decision Making, Vol. 13, No. 2 (2018), pp. 202–211, who analyze experimental work conducted by
the U.S. Intelligence Advanced Research Projects Activity (IARPA). On IARPA's broader efforts to
improve assessments of uncertainty in intelligence analysis, see Philip E. Tetlock and Dan Gardner,
Superforecasting: The Art and Science of Prediction (New York: Crown, 2015).
14
See, for example, James Druckman and Rose McDermott, "Emotion and the Framing of Risky
Choice," Political Behavior, Vol. 30, No. 3 (2008), pp. 297–321; and Jennifer Lerner et al., "Emotion
and Decision Making," Annual Review of Psychology, Vol. 66 (2015), pp. 799–823. On the argument
that emotion and rationality are complements, not substitutes, see Antonio Damasio, Descartes'
Error: Emotion, Reason, and the Human Brain (New York: Putnam, 1994).
15
For an important collection of essays on the extent to which rational conceptions of “good
judgment” apply to foreign policy decision making, see Stanley A. Renshon and Deborah Welch
Larson, eds., Good Judgment in Foreign Policy (Lanham, Md.: Rowman and Littlefield, 2003).
16
Prominent articulations of this concern include Stanley Hoffmann, Gulliver’s Troubles: Or, the
Setting of American Foreign Policy (New York: McGraw Hill, 1968), pp. 87–175; Alexander Wendt,
“Driving with the Rearview Mirror: On the Rational Science of Institutional Design,” International
Organization, Vol. 55, No. 4 (2001), pp. 1019–1049; Peter Katzenstein and Lucia Seybert, “Protean
Power and Uncertainty: Exploring the Unexpected in World Politics,” International Studies Quarterly,
Vol. 62, No. 1 (2018), pp. 80–93.
remaining doubts about Iraq’s WMD programs entailed. In this sense, placing
assessments of uncertainty front and center in debates about the Iraq War would
have provoked contentious value judgments rather than pushing those issues
aside. Identifying the best ways to grapple with these judgments is another area
in which further research can provide substantial insights for shaping foreign
policy discourse.17
There are thus many potential avenues for expanding the book’s analysis.
Each of these issues also highlights how debates about assessing uncertainty in
international politics are only partly a matter of developing pragmatic standards
for communicating key judgments. These controversies ultimately revolve
around theoretical and empirical questions about what assessments of uncer-
tainty mean and how they can be useful.
By helping to answer some of these questions, I hope that War and Chance
leaves readers with an optimistic outlook. Skeptical views of probabilistic rea-
soning are understandable, but they are also overblown, and they do not pre-
clude making meaningful judgments about major foreign policy issues. The main
reason to be concerned about existing approaches to assessing uncertainty in
international politics is not just that the current state of affairs is surrounded by
vagueness and confusion, but that it is genuinely possible to do better.
17
On the relationship between assessments of probabilities and values in high-stakes decision
making, see Robyn M. Dawes, Rational Choice in an Uncertain World (San Diego, Calif.: Harcourt
Brace Jovanovich, 1988).
Appendix
S U P P O RT I N G I N F O R M AT I O N
F O R C H A P T E R S 3 –5
Contents
1. Supplementary Materials for Chapter 3
1a. Methodology for Estimating Returns to Precision in Probability
Assessment
1b. Logarithmic Scoring and Replication of Results
1c. Robustness of Empirical Results across Time Horizons
1d. Attributes Used to Examine Variation in Returns to Precision across
Individuals
1e. Full Analysis of Variation in Returns to Precision across Individuals
2. Supplementary Materials for Chapter 4
2a. Respondent Demographics (Experiments 1–4)
2b. Text of Drone and Security Force Scenarios (Experiments 1–2)
2c. Randomization of Probability Assessments (Experiments 1–2)
2d. Full Results for Supplementary Elite Sample (Experiment 2)
2e. Question Lists (Experiments 3–4)
2f. Method for Comparing the Accuracy of Qualitative and Quantitative
Probability Assessments (Experiments 3–4)
2g. Relationship between Certitude and Judgmental Accuracy
(Experiment 3)
2h. Informational Treatment and Placebo (Experiment 4)
2i. Full Analysis of Experiment 4
3. Supplementary Materials for Chapter 5
3a. Text of Experimental Vignettes
3b. Robustness of Empirical Results to Attention Check Completion
1
All of the forecasting problems analyzed in chapter 3 have binary outcomes. The Brier score for
these forecasting problems is thus equivalent to (1 − p₁)², where p₁ is the probability the analyst
assigned to the observed outcome.
2
With a maximum of one forecast per day, recorded as a forecaster’s most recent estimate prior to
midnight, U.S. Eastern Time.
3
Unlike the Brier score, higher values are better when applying the logarithmic scoring rule.
4
The questions used in the GJP were deliberately selected to produce estimates that fell in this
range. If the actual rate of occurrence for the outcomes that these forecasters predicted was too ex-
treme (i.e., too high or too low) then much larger volumes of data would have been required to esti-
mate forecasting accuracy.
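To make the two scoring rules concrete, the following minimal sketch in Python computes both
for a single binary forecast, following the definitions in notes 1 and 3. The function names are
illustrative; this is not the replication code used for the book.

    import math

    # Brier score for a binary forecasting problem (note 1): (1 - p1)^2,
    # where p1 is the probability assigned to the observed outcome.
    # Lower is better; 0 is perfect.
    def brier_score(p_assigned: float, occurred: bool) -> float:
        p1 = p_assigned if occurred else 1.0 - p_assigned
        return (1.0 - p1) ** 2

    # Logarithmic score (note 3): ln(p1). Unlike the Brier score,
    # higher values are better; 0 is perfect.
    def log_score(p_assigned: float, occurred: bool) -> float:
        p1 = p_assigned if occurred else 1.0 - p_assigned
        return math.log(p1)

    # Example: a 70 percent forecast of an event that occurs.
    assert abs(brier_score(0.70, True) - 0.09) < 1e-9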
Table A.1. Comparison of Aggregate Rounding Errors, Brier versus Logarithmic Scoring
Columns: Group; Remote (.00–.14); Very unlikely (.15–.28); Unlikely (.29–.42); Even chance
(.43–.56); Likely (.57–.71); Very likely (.72–.85); Almost certain (.86–1.0). [Table data not
recoverable from this copy.]
Note: The table examines how rounding probability estimates into seven equal bins influences
predictive accuracy for forecasts within different segments of the number line. Unless otherwise
indicated, all findings in this table are statistically distinct from zero at the p < 0.001 level.
^ This statistic is not statistically significant at the p < 0.05 level.
The analysis divides forecasts into four categories: (a) Lay-Ups, made up to 36 days prior to a
question's closing date and with no more than five percent probability or no less
than ninety-five percent probability; (b) Period I forecasts, made up to 36 days
prior to a question's closing date, but excluding Lay-Ups; (c) Period II forecasts,
made anywhere from 37 to 96 days prior to a question's closing date; and (d) Period
III forecasts, made more than 96 days prior to a question's closing date. There were
109,240 Lay-Ups in the data set and 259,696 forecasts in the other three categories.
Table A.2 analyzes returns to precision within each category.5
5
Note that forecasters receive larger rounding penalties for Period II forecasts than for Period
I forecasts: this is because we removed Lay-Ups from the Period I forecasts, whereas the Period II
forecasts retain substantial numbers of small-probability estimates.
6
See Ellen Peters et al., "Numeracy and Decision Making," Psychological Science, Vol. 17, No. 5
(2006), pp. 407–413. The GJP changed numeracy tests between years 2 and 3 of the competition.
We standardized numeracy test results so that they represent comparable indices. If a respondent
participated in multiple years of the forecasting competition, we averaged Numeracy values
across years.
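To illustrate how rounding penalties of the kind reported in Tables A.1 and A.2 can be computed,
the sketch below coarsens numeric forecasts into seven equal bins and reports the percentage
increase in mean Brier score. It is a minimal sketch under stated assumptions: outcomes coded
0/1, and rounding to bin midpoints (one of several translation rules discussed later in this
appendix); the function and variable names are ours, not the book's replication code.

    import numpy as np

    BIN_EDGES = np.linspace(0.0, 1.0, 8)          # seven equal bins on [0, 1]
    BIN_MIDPOINTS = (BIN_EDGES[:-1] + BIN_EDGES[1:]) / 2

    def round_to_bins(p):
        """Replace each forecast with the midpoint of its WEP bin."""
        idx = np.digitize(p, BIN_EDGES[1:-1])     # bin index 0..6
        return BIN_MIDPOINTS[idx]

    def rounding_penalty(p, outcomes):
        """Percentage increase in mean Brier score caused by rounding."""
        brier = lambda q: np.mean((q - outcomes) ** 2)
        return 100.0 * (brier(round_to_bins(p)) - brier(p)) / brier(p)

    # Example: sharp forecasts on events that mostly occur are penalized
    # when rounded toward the coarser verbal categories.
    p = np.array([0.97, 0.93, 0.91, 0.60, 0.10])
    outcomes = np.array([1, 1, 1, 0, 0])
    print(round(rounding_penalty(p, outcomes), 1))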
Table A.2. Returns to Precision across Time Horizons
Each row reports a reference class's Brier score across all numerical forecasts, followed by
rounding errors under four coarsening schemes, in order: 7 WEPs (2015 version); 7 WEPs (evenly
spaced); confidence levels (3 bins); estimative verbs (2 bins).

Lay-Ups
All forecasters      Mean: 0.065     42.4%***    436.4%*     2578.1%**   5865.1%***
                     Median: 0.009   2.3%***     40.4%***    267.9%***   628.3%***
Superforecasters     Mean: 0.046†    869.4%      5334.5%     2.9e4%***   6.6e4%***
                     Median: 2.4e-4  278.9%***   2047.9%***  1.2e4%***   2.6e4%***

Period I forecasts: Forecasts registered up to 36 days prior to question closing
All forecasters      Mean: 0.114     1.6%***     8.1%***     45.7%***    118.0%***
                     Median: 0.064   1.2%***     4.3%*       15.8%***    46.9%***
Superforecasters     Mean: 0.073†    4.41%***    29.1%***    174.0%***   422.7%***
                     Median: 0.006   6.9%**      68.5%***    432.2%***   1032.5%***

Period II forecasts: Forecasts registered between 37 and 96 days prior to question closing
All forecasters      Mean: 0.148     1.0%***     3.8%        21.0%***    55.6%***
                     Median: 0.104   1.0%***     1.9%*       8.5%***     22.0%***
Superforecasters     Mean: 0.119     32.0%***    215.2%*     1251.6%***  2869.9%***
                     Median: 0.047   2.5%**      14.5%***    97.3%***    255.1%***

Period III forecasts: Forecasts registered at least 97 days prior to question closing
All forecasters      Mean: 0.187     0.7%***     1.2%        6.6%***     18.3%***
                     Median: 0.155   0.9%***     0.8%*       3.7%***     11.8%***
Superforecasters     Mean: 0.119     0.9%***     13.5%**     78.2%***    204.4%***
                     Median: 0.047   0.7%**      6.3%***     22.6%***    72.9%***

Note: *p < 0.05, **p < 0.01, ***p < 0.001. †On some questions, superforecasters' average
prediction was zero, creating a negatively infinite rounding penalty. We dropped those
observations in order to create the estimates shown here. WEPs = words of estimative probability.
7
See W. Arthur Jr. et al., “College-Sample Psychometric and Normative Data on a Short Form of
the Raven Advanced Progressive Matrices Test,” Journal of Psychoeducational Assessment, Vol. 17, No.
4 (1999), pp. 354–361.
8
Jonathan Baron et al., “Why Does the Cognitive Reflection Test (Sometimes) Predict Utilitarian
Moral Judgment (and Other Things)?” Journal of Applied Research in Memory and Cognition, Vol. 4,
No. 3 (2015), pp. 265–284.
9
Mellers et al., “Psychological Strategies for Winning a Geopolitical Forecasting
Tournament.”
10
J. T. Cacioppo and R. E. Petty, "The Need for Cognition," Journal of Personality and Social
Psychology, Vol. 42, No. 1 (1982), pp. 116–131. If a respondent participated in multiple competition
years, we averaged values across years. The GJP changed CRT tests after year 2 of the competition, so
we standardized each test's results in order to provide comparable measures.
Table A.3. Predicting Individual-Level Returns to Precision
Columns: Model 1; Model 2; Model 3; Model 4; Model 5†. [Coefficient estimates not recoverable
from this copy.]
Note: Ordinary least squares regression predicting thresholds of estimative precision across respondents. Non-binary
independent variables standardized. Robust standard errors. *p < 0.05, **p < 0.01, ***p < 0.001.
† Model 5 only retains observations available in Models 3–4.
11
Adding a squared term for Number of Questions is statistically significant (p < 0.01), but
improves R² by less than 0.01. A model containing all targets for cultivation less the Brier score
has a model fit of R² = 0.17 for the full sample and for the 1,307 observations for which we have
full data.
12
Estimating Model 1 in a sample with those same 1,307 observations only returns R² and AIC
scores of 0.37 and 6,898, respectively.
13
Because of concerns about protecting respondent anonymity among military officers, we were
able to ask fewer demographic questions of the elite sample respondents, and the questions we were
allowed to ask those respondents varied by cohort.
[Respondent demographics table not recoverable from this copy.]
Note: The Experiments row reflects the experiments in which each sample participated. NWC =
National War College; AMT = Amazon Mechanical Turk; N = number of respondents.
Figure A.1b Security forces scenario (neutral version, qualitative assessment condition).
Hostage-Rescue Scenario
Hostages at compound                    Even chance (50%)     Likely (65%)          Very likely (80%)
Special forces can retrieve hostages    Likely (65%)          Very likely (80%)     Almost certain (95%)
Soldiers wounded on mission             Likely (65%)          Even chance (50%)     Unlikely (35%)
Collateral damage                       Even chance (50%)     Unlikely (35%)        Very unlikely (20%)
Hostages killed if mission fails        Almost certain (95%)  Almost certain (95%)  Almost certain (95%)

Drone Strike Scenario
House contains Qaeda leaders            Even chance (50%)     Likely (65%)          Very likely (80%)
Drone strike kills occupants            Very likely (80%)     Very likely (80%)     Very likely (80%)
House contains women/children           Likely (65%)          Unlikely (35%)        Remote chance (5%)
Strike compromises surveillance         Almost certain (95%)  Almost certain (95%)  Almost certain (95%)

Local Security Forces Scenario
Jan can mobilize forces                 Even chance (50%)     Likely (65%)          Very likely (80%)
Jan's forces can secure border          Unlikely (35%)        Likely (65%)          Almost certain (95%)
Jan previously assisted Taliban         Very likely (80%)     Even chance (50%)     Very unlikely (20%)
Jan will secure illegal smuggling       Almost certain (95%)  Very likely (80%)     Likely (65%)
U.S. can retain local leaders' support  Very unlikely (20%)   Unlikely (35%)        Even chance (50%)
[Note to a table not recoverable from this copy: the p-values reflect two-way t-tests comparing
differences in means.]
• In your opinion, what are the chances that Afghanistan’s literacy rate is cur-
rently above 50 percent?
• In your opinion, what are the chances that Saudi Arabia currently exports
more oil than all other countries in the world combined?
• In your opinion, what are the chances that the United States currently has a
longer life expectancy than Jamaica?
• In your opinion, what are the chances that the United States currently
operates a military base in Ethiopia?
• In your opinion, what are the chances that the United States has an active ter-
ritorial claim in Antarctica?
• In your opinion, what are the chances that France currently has more soldiers
stationed in Afghanistan than any NATO member besides the United States?
• In your opinion, what are the chances that more than 20 countries currently
operate nuclear power plants?
• In your opinion, what are the chances that Japan is currently a member of the
International Whaling Commission?
• In your opinion, what are the chances that Russia is a member of the Nuclear
Nonproliferation Treaty?
• In your opinion, what are the chances that the United States currently has free
trade agreements in place with fewer than 30 countries?
• In your opinion, what are the chances that fewer than 80 countries currently
recognize Taiwan’s independence from China?
• In your opinion, what are the chances that ISIS draws more foreign fighters
from Egypt than from any other country outside of Iraq and Syria?
• In your opinion, what are the chances that Russia’s economy grew in 2014?
• In your opinion, what are the chances that Haiti has the lowest per capita in-
come of any Latin American country?
• In your opinion, what are the chances that there are currently more Muslims
in the world than there are Roman Catholics?
• In your opinion, what are the chances that Sweden is a member of NATO?
• In your opinion, what are the chances that Tokyo’s stock exchange is the
second largest stock exchange in the world?
• In your opinion, what are the chances that the U.S. State Department cur-
rently lists Iran as a state sponsor of terrorism?
• In your opinion, what are the chances that the Arabic media organization al-
Jazeera currently operates bureaus in more countries than does CNN?
• In your opinion, what are the chances that the United States currently
possesses more than 2,000 nuclear warheads?
• In your opinion, what are the chances that the economy of North Korea is
larger than the economy of New Hampshire?
• In your opinion, what are the chances that German Chancellor Angela Merkel
is currently the longest-serving head of government in Western Europe?
• In your opinion, what are the chances that there are currently more refugees
living in Lebanon than in any other country in the world?
• In your opinion, what are the chances that the United States currently
conducts more trade with Mexico than with the European Union?
• In your opinion, what are the chances that the largest U.S. Embassy is cur-
rently located in Beijing?
• In your opinion, what are the chances that the U.S. defense budget is more
than five times as large as China’s defense budget?
• In your opinion, what are the chances that the United States currently operates
more aircraft carriers than all other countries in the world combined?
• In your opinion, what are the chances that Israel receives more foreign aid
than any other country in the world?
• In your opinion, what are the chances that more than 3 million people live
within the borders of the Palestinian territories of the West Bank and Gaza?
• In your opinion, what are the chances that more than 5,000 people died as a
result of the Ebola outbreak in West Africa in 2014?
• In your opinion, what are the chances that within the next six months, Syrian
President Bashar al-Assad will be killed or no longer living in Syria?
• In your opinion, what are the chances that within the next six months, the
Iraqi Security Forces will reclaim control of either Ramadi or Mosul (or both)
from ISIS?
• In your opinion, what are the chances that there will be a new Pope within the
next six months?
• In your opinion, what are the chances that more than 50,000 U.S. citizens will
travel to Cuba within the next six months?
• In your opinion, what are the chances that more than 10 U.S. soldiers will be
killed fighting ISIS within the next six months?
• In your opinion, what are the chances that Nigeria has the largest population
of any African country?
• In your opinion, what are the chances that Turkey currently provides more
United Nations peacekeeping troops than any other country?
• In your opinion, what are the chances that Afghanistan currently produces
more than half of the world’s opium?
• In your opinion, what are the chances that the International Committee of
the Red Cross has more full-time staff members than the U.S. Department
of State?
• In your opinion, what are the chances that more than one-third of the world’s
population is currently under the age of 15?
• In your opinion, what are the chances that India tested its first nuclear weapon
in 1998?
• In your opinion, what are the chances that Norway has a larger economy than
New Jersey?
• In your opinion, what are the chances that Iraq’s President is required by law
to be Kurdish?
• In your opinion, what are the chances that Morocco is a member of the
African Union?
• In your opinion, what are the chances that the United States has provided
more official development assistance than any other country since 2010?
• In your opinion, what are the chances that Pope Francis is currently over the
age of 75?
• In your opinion, what are the chances that the United States currently spends
more money on national defense than on Social Security?
• In your opinion, what are the chances that Pakistan has a larger active-duty
military than Iran?
• In your opinion, what are the chances that the United States currently has a
lower unemployment rate than Vietnam?
• In your opinion, what are the chances that Transparency International
currently ranks the United States as being more corrupt than the average
country?
• In your opinion, what are the chances that the U.S. Department of Homeland
Security’s annual budget is less than $100 billion?
• In your opinion, what are the chances that the United States is currently
holding fewer than 100 detainees in the Guantanamo Bay prison camp?
• In your opinion, what are the chances that North Korea’s Kim Jong Un is cur-
rently the world’s youngest head of government?
• In your opinion, what are the chances that the 2015 Paris Climate Agreement
requires all signing countries to cut carbon emissions by at least 2 percent?
• In your opinion, what are the chances that Russia withdrew its support for the
International Space Station in 2014?
• In your opinion, what are the chances that Saudi Arabia currently produces
more oil than any other country?
• In your opinion, what are the chances that jihadi terrorists have killed more
people in France than in the United States over the last decade?
• In your opinion, what are the chances that the International Criminal Court
has never issued a conviction?
• In your opinion, what are the chances that the United States is the only
country whose military possesses stealth aircraft?
• In your opinion, what are the chances that more than 50,000 Americans trav-
eled to Cuba over the past year?
• In your opinion, what are the chances that there are currently more than
3 million registered refugees from the Syrian Civil War?
• In your opinion, what are the chances that the global number of people living
on less than $2 per day has declined over the past twenty years?
• In your opinion, what are the chances that the United States currently
maintains economic sanctions on more than 50 countries?
• In your opinion, what are the chances that Brazil’s President, Dilma
Rousseff, currently faces criminal charges as part of her impeachment
proceedings?
• In your opinion, what are the chances that the United States currently
conducts more trade with Canada than with Mexico?
For example, the WEP "remote chance" covers the first 1/7 of the probability
spectrum (i.e., probabilities from zero to fourteen percent). For each question in
the data set, we would thus calculate the mean of all probability estimates that
fell between 0.00 and 0.14. We can generalize this approach using the formula
µ_{q,w} = (Σ_{w_low ≤ p ≤ w_high} p) / N, where µ_{q,w} is the mean value for all numeric estimates
corresponding to WEP w on question q, w_low and w_high represent the lowest and
highest probabilities covered by WEP w, and N is the number of estimates falling
within that range.
14
On how interpretations of verbal probabilities can vary by context, see Ruth Beyth-Marom,
However, this particular aspect of the scoring method is not crucial to the
results presented in chapter 4. All findings reported in chapter 4 are robust to
translating WEPs into the midpoints of each bin; to defining WEPs based on
survey research that indicates how respondents typically interpret what those
terms are supposed to mean;15 and to defining WEPs based on the midpoints of
definitions recommended by the U.S. Director of National Intelligence.16
Step 3: The previous step in the analysis assigned a single numeric value to
each WEP in the data set. Because there is no such constraint on the values
that numeric percentages can take in the quantitative assessment condi-
tion of this experiment, the latter set of responses involves greater variance.
It is necessary to eliminate that extra variance in order to score qualitative
and quantitative judgments on a level playing field. We thus round all nu-
meric probability estimates that correspond to a particular WEP to the
same µ_{q,w} values estimated above.
Step 4: We have now transformed the data so that we can compare the accu-
racy of verbal and numeric probability estimates on equivalent grounds.
The results presented in chapter 4 evaluate these judgments using Brier
scores. These findings are also robust to evaluating the accuracy of proba-
bility estimates using the logarithmic scoring rule.
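The following sketch shows Steps 3 and 4 in code, assuming seven equal WEP bins; the names are
illustrative rather than the authors' replication code.

    import numpy as np

    def bin_index(p):
        """Index (0-6) of the seven equal WEP bins covering [0, 1]."""
        return np.minimum((np.asarray(p) * 7).astype(int), 6)

    def round_to_bin_means(p):
        """Step 3: replace each estimate with its bin's mean, mu_{q,w}."""
        p = np.asarray(p, dtype=float)
        idx = bin_index(p)
        rounded = p.copy()
        for w in np.unique(idx):
            rounded[idx == w] = p[idx == w].mean()
        return rounded

    def mean_brier(p, outcomes):
        """Step 4: mean Brier score against binary outcomes (1 = occurred)."""
        return float(np.mean((np.asarray(p) - np.asarray(outcomes)) ** 2))

    # Numeric estimates of .60 and .70 share the "likely" bin (.57-.71),
    # so both are scored as their bin mean of .65.
    print(round_to_bin_means([0.60, 0.70, 0.10]))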
15
We based these definitions on “inferred probabilities” collected by Frederick Mosteller and
Cleo Youtz, “Quantifying Probabilistic Expressions,” Statistical Science, Vol. 5, No. 1 (1990), pp. 2–
12. For a similar method applied in more recent work, see David R. Mandel, “Accuracy of Intelligence
Forecasts from the Intelligence Consumer’s Perspective,” Policy Insights from the Behavioral and Brain
Sciences, Vol. 2, No. 1 (2015), pp. 111–120.
16
The Director of National Intelligence's alternative WEP spectrum is described in chapter 1,
Figure 1.2. For more detail on these robustness checks, see Jeffrey A. Friedman, Jennifer S. Lerner, and
Richard Zeckhauser, “Behavioral Consequences of Probabilistic Precision: Experimental Evidence
from National Security Professionals,” International Organization, Vol. 71, No. 4 (2017), pp. 803–826.
17
Indeed, in a statistical sense, the tendency for numeric assessors to make more extreme
judgments entirely explains the gap in performance between groups. See Friedman, Lerner, and
Zeckhauser, "Behavioral Consequences of Probabilistic Precision."
Figure A.2 Accuracy, certitude, and analytic precision. The graphs display the
relationship between the certitude respondents attached to their probability assessments
and the accuracy of those assessments (as measured by Brier scores, with 95 percent
intervals). The performance gap between qualitative and quantitative assessors grows as
respondents become more certain in their judgments.
The horizontal axes in Figure A.2 reflect the certitude that respondents at-
tached to their probability estimates, defined as the absolute difference between
each probability estimate and fifty percent (such that estimates of ten percent
and ninety percent reflect the same levels of certitude).18 The vertical axes in
18
The data in these graphs do not extend all the way to 100 percent given the scoring method
described in the previous section, which rounds each probability estimate in the data set to the
nearest µ_{q,w} benchmark.
Figure A.2 represent the average Brier scores associated with probability
estimates at each level of certitude. Note that the gap between Brier scores
steadily grows as probability estimates become more extreme.19
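A minimal sketch of the certitude measure (names ours, not the book's replication code):

    import numpy as np

    def certitude(p):
        """Absolute distance between a probability estimate and fifty percent."""
        return np.abs(np.asarray(p) - 0.5)

    # Estimates of ten percent and ninety percent share a certitude of 0.4.
    assert np.allclose(certitude([0.10, 0.90]), 0.4)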
19
Figure A.2 plots these relationships using fractional polynomials with 95 percent confidence
intervals.
[Calibration plot: the horizontal axis shows the probability respondents assigned (0% to
100%); the vertical axis shows how often those statements were actually true (0% to 100%).]
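One way to compute such a calibration curve is sketched below; the bin width and names are our
illustrative choices, not the book's replication code.

    import numpy as np

    def calibration_curve(p, truths, n_bins=10):
        """For each bin of assigned probability, the share of statements
        that were actually true."""
        p = np.asarray(p, dtype=float)
        truths = np.asarray(truths, dtype=float)
        edges = np.linspace(0.0, 1.0, n_bins + 1)
        idx = np.clip(np.digitize(p, edges[1:-1]), 0, n_bins - 1)
        return [(edges[w], edges[w + 1], truths[idx == w].mean())
                for w in range(n_bins) if (idx == w).any()]

    # Perfectly calibrated respondents' 70 percent statements would be
    # true about 70 percent of the time.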
Hostage-Rescue Vignette
The U.S. military is searching for five American citizens held hostage by a rebel
group overseas. They receive information suggesting that the hostages are being
held in a rural compound. Analysts can tell that the compound is being used by
the rebel group, but they have difficulty confirming that the hostages are present.
There is pressure to act quickly and little opportunity to gain additional informa-
tion. Special Forces expect that a raid on the compound will meet armed resistance.
After careful deliberation, analysts report that [probability manipulation]
that the hostages are being held in the compound. Decision makers review the
available information and [decision manipulation].
Probability manipulations: There is a 40 percent chance; there is a 50 percent
chance; there is a 60 percent chance; there is a 75 percent chance; there is a 90 per-
cent chance; there is a 100 percent chance; it is possible but unlikely; there is an
even chance; it is likely; it is very likely; they are almost certain; they are certain.
Decision manipulations: (i) choose to raid the compound. When Special
Forces enter the compound, they find that the hostages were not in fact present,
and two soldiers are killed in the resulting firefight; (ii) choose to raid the com-
pound. When Special Forces enter the compound, they locate and retrieve the
hostages, though two soldiers are killed in the resulting firefight; (iii) decline
to raid the compound. It is later determined that the hostages were not in fact
present. Their location remains unknown; (iv) decline to raid the compound. It
is later determined that the hostages were in fact present. They have since been
moved to another unknown location.
Terrorism Vignette
U.S. intelligence analysts receive information about a potential terrorist attack
on passenger airliners. Informants warn that terrorists may be preparing to use a
new form of explosive against several flights departing from California. If this is
true, then it poses an immediate threat to passenger safety. However, there are
reasons to doubt that the plot is real. In particular, terrorists may be planting
false information to trick the U.S. government into restricting air travel, which
would cause panic and economic damage.
Analysts conclude that [probability manipulation] that this plot is real.
Decision makers review the information and decide [decision manipulation].
Probability manipulations: there is a 0 percent chance; there is a 10 percent
chance; there is a 25 percent chance; there is a 40 percent chance; there is a
50 percent chance; there is a 60 percent chance; it is impossible; there is a
remote chance; it is possible but very unlikely; it is possible but unlikely; there is
an even chance; it is likely.
Decision manipulations: (i) to halt all flights leaving California for one week. This
move costs the airline industry more than $1 billion and creates national alarm. The
threat is later revealed to have been a hoax, and stopping air travel appears to have
saved no lives; (ii) to halt all flights leaving California for one week. This move costs
the airline industry more than $1 billion and creates national alarm. However, the
threat is confirmed to be real, law enforcement agents disrupt it, and this saves sev-
eral hundred lives; (iii) to allow air travel to continue. The threat is later revealed to
have been a hoax; (iv) to allow air travel to continue. Days later, four airliners leaving
California are destroyed in explosions, killing several hundred passengers.
Corruption Vignette
U.S. intelligence analysts are tracking Afghan government officials suspected
of embezzling more than $100 million in U.S. development aid. One suspect,
Babrak Ghafar, is currently visiting the United States. U.S. officials could appre-
hend Ghafar before he leaves. But arresting a prominent Afghan government of-
ficial would cause major political controversy if Ghafar turns out to be innocent.
After careful deliberation, intelligence analysts report that [probability ma-
nipulation] that Ghafar is embezzling funds. Decision makers review the avail-
able information and [decision manipulation].
Probability manipulations: There is a 40 percent chance; there is a 50 per-
cent chance; there is a 75 percent chance; there is a 90 percent chance; there
is a 100 percent chance; it is possible but unlikely; there is an even chance; it is
likely; it is very likely; they are almost certain; they are certain.
Decision manipulations: (i) choose to arrest Ghafar. It is later determined that
Ghafar was in fact part of the corruption ring; (ii) choose to arrest Ghafar. It is
later determined that Ghafar was not part of the corruption ring. The incident
draws substantial international criticism of U.S. intervention in Afghan politics;
(iii) decline to arrest Ghafar. It is later determined that Ghafar was not part of the
corruption ring; (iv) decline to arrest Ghafar. It is later determined that Ghafar
was in fact part of the corruption ring. His current whereabouts are unknown.
Drone-Strike Vignette
U.S. intelligence officials are attempting to locate a high-ranking terrorist. Drone
operators say that they have found a man who meets their target’s description. He
is driving alone, in a deserted area. Drone operators may not get another shot as
clean as this one. But it is always difficult to confirm a target’s identity using re-
mote surveillance. U.S. officials worry that the man could be an innocent civilian.
[Coefficient plot showing the estimated effect (with 95 percent intervals) of Numeric
probability, Judgmental error, Policy failure, and the Constant.]
Figure A.4 Replication of Figure 5.4, with no excluded responses. The figure shows the
impact each randomized variable exerted on the amount of criticism the respondents
assigned to foreign policy analysts. These data do not exclude any respondents who
failed attention checks. Ordinary least squares regression with robust standard errors and
respondent fixed effects (N = 11,793).
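The specification described in the figure note can be sketched as follows. This is a minimal
illustration with synthetic data; the column names, the data-generating step, and the use of
statsmodels are our assumptions, not the book's replication materials.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Synthetic stand-in data; column names are hypothetical.
    rng = np.random.default_rng(0)
    n = 200
    df = pd.DataFrame({
        "respondent_id": np.repeat(np.arange(20), 10),
        "numeric_probability": rng.integers(0, 2, n),
        "judgmental_error": rng.integers(0, 2, n),
        "policy_failure": rng.integers(0, 2, n),
    })
    df["criticism"] = 50 + 5 * df["policy_failure"] + rng.normal(0, 10, n)

    # OLS with respondent fixed effects and robust (HC1) standard errors,
    # mirroring the specification described in the Figure A.4 note.
    model = smf.ols(
        "criticism ~ numeric_probability + judgmental_error + policy_failure"
        " + C(respondent_id)",
        data=df,
    ).fit(cov_type="HC1")
    print(model.params[["numeric_probability", "judgmental_error", "policy_failure"]])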
Index

Abbottabad, Pakistan. See raid on Osama bin Laden's compound, Abbottabad, Pakistan
ACH (analysis of competing hypotheses), 31–32
Acheson, Dean, 187–88
Afghan Surge (2009), 28–29, 30, 33, 59–60
agnostics
  on probability assessment, 6–7, 93–94
  on thresholds of allowable exactness, 7
al-Askari Mosque bombing, Iraq, 181
al-Maliki, Nouri, 181–82, 183, 184
al Qaeda, 141–42, 187
al Qaeda in Iraq (AQI), 181
aleatory uncertainty vs. epistemic uncertainty, 52–53
allowable exactness threshold, 7
Amazon Mechanical Turk (AMT), 102–5, 110, 113, 117–23
ambiguity aversion, 57
analysis of competing hypotheses (ACH), 31–32
analytic confidence
  defined, 14
  elements of, 14, 58–63, 190–91
  probability vs. confidence, 14, 58–63, 190–91
Appendix
  Chapter 3 supplementary materials, 198–205
  Chapter 4 supplementary materials, 205–17
  Chapter 5 supplementary materials, 217–22
Arab Spring, 139
Aristotle, 7

Ball, George, 41
bargaining model of war, 166–70
basic standards for assessing uncertainty, 189–92
  assessing chances of policy success, 189–90
  avoiding relative probability and conditioning, 190
  common lexicon for verbal probability assessments, 190
  distinguishing between probability and confidence, 190–91
  supplementing key judgments with numeric percentages, 190
Bay of Pigs invasion (1961), Joint Chiefs of Staff report on, 2–4, 5, 144–46
Bayesian updating, 167–70
Berlin, Isaiah, 106
Betts, Richard, 20, 43–44, 60
bin Laden, Osama, 4–5, 52, 58, 62, 65–66, 68, 163, 164–65
blame exposure/avoidance. See politics of uncertainty and blame
Board of National Estimates, 19
break-even analysis, 161–65. See also subjective probability estimates and high-stakes choices
Brier scores, 76–77, 132–34, 137, 156–57, 198
Bundy, McGeorge, 35–36, 40, 42–44, 45, 169–70
Bush, George W. See also Iraq War (2003–2011); September 11, 2001 terrorist attacks
  CIA report on al Qaeda (2001), 141
  preemption doctrine, 62
  reliance on NIE report, 19–20
  views on U.S. occupation of Iraq, 180, 181

Central Intelligence Agency (CIA)
  Bulletin on Egyptian military buildup (1973), 148
  interpretation of Joint Chiefs assessment of Bay of Pigs invasion, 145
  Office of Reports and Estimates on Soviet nuclear weapons (1946), 142–43
  Office of Scientific Intelligence on Soviet nuclear weapons, 142–43
  on probability of bin Laden at Abbottabad compound, 4–5, 62
  report on al Qaeda (2001), 141
  study on Kent's words of estimative probability, 97
North Atlantic Treaty Organization (NATO), Common Standard for probability assessment, 27–28
Nye, Joseph, 19

Obama, Barack. See also Iraq War (2003–2011)
  Afghan Surge (2009) and, 28–29, 30, 59–60
  hostage-rescue attempt, Raqqa, Syria (2014), 165
  raid on Osama bin Laden's compound, 4–5, 52, 58, 62, 65–66, 68, 164–65
  Status of Forces Agreement (SOFA) negotiations, 183
objectivity, 57–58
October War (1973), 147–48
odds table (Sherman Kent), 18, 19
Office of the Director of National Intelligence (ODNI)
  Intelligence Advanced Research Projects Activity (IARPA), 74–75, 78
  Intelligence Community Directive, on objectivity, 57–58
On War (Clausewitz), 5–6
One Percent Doctrine, 187–88

paired sample approach, 100
Pearl Harbor attack, 139–41
Pentagon Papers, 36
pharmaceutical industry, use of relative probability in marketing of drugs, 29
politics of uncertainty and blame, 129–59. See also specific intelligence failures
  absences of warning, 139–44
  elastic redefinition, 134–35, 136–37, 154–55
  flawed assessments, 142–50
  perceived intelligence failures, 138–50
  questions regarding conventional wisdom of, 136–38, 150
  strategic caution, 135–36, 137
precision in probability assessment. See Good Judgment Project (GJP)
probabilistic terms, 46
probability assessment, defined, 14
  frequency theory of probability assessment, 53–54, 63
  probability vs. confidence, 14, 58–63, 190–91
  probability vs. risk and ignorance, 15
  probability vs. uncertainty, 14–15
  propensity theory of probability assessment, 54–56, 63
  subjective theory of probability assessment, 56–58
probability assessment, norms and reforms, 187–95
  basic standards development, 189–92
  directions for future research, 192–95
  multiple advocacy and, 192
  One Percent Doctrine and, 187–88
probability assessment, pathologies of, 17–50
  aversion to probabilistic reasoning, 49
  conditioning practice of, 32–35
  imprecision and words of estimative probability, 21–24, 32
  measuring tactical progress without assessing strategic prospects in Vietnam, 42–44
  probability of success in Vietnam (1961–1965), 35–44, 50
  relative probability, 28–32, 34–35
  words of estimative probability outside intelligence studies, 24–28
probability assessment, returns to precision in, 80–93
  aggregate results, 80–82
  targets for cultivation vs. targets for selection, 89–93
  variation across individuals, 87–93
  variation across questions, 86–87
  variation across the number line, 83–85
  variation across time horizons, 85
propensity theory of probability, 54–56, 63
psychology of probability assessment, 95–127
  ambiguity aversion, 57
  cultivating of fluency with numeric expressions, 118–23
  decision maker interest in probabilities, 96–105
  discrimination-calibration trade-off, 115–18, 123–25
  disinterest hypothesis, 100–5
  illusions of rigor hypothesis, 105–13
public evaluation of uncertainty assessments. See politics of uncertainty and blame
Putin, Vladimir, 23
puzzles vs. mysteries, 52–53

raid on Osama bin Laden's compound, Abbottabad, Pakistan, 4–5, 52, 58, 62, 65–66, 68, 163, 164–65
Ramsey, Frank P., 56
Raven's Progressive Matrices, 91
reasonable doubt standard, in legal verdicts, 16, 61
rejectionists
  illusions-of-rigor thesis and, 95–96
  on probability assessment, 6, 8, 12, 93–94
relational judgments
  in U.S. Army analytic planning standards, 29–31
  in Vietnam War probability assessments, 41
relative probability. See also probability assessment, pathologies of
  avoidance of, 190
  DHS National Terrorism Advisory System and, 31
  U.S. Army analytic standards and, 28–31
relative probability assessments, defined, 46
risk analysis/assessment. See also subjective probability and international politics
  earthquake risk communications, 129–30
  effectiveness and, 190
  in military strategies, 25, 27–28
  risk vs. uncertainty and ignorance, 15
  terrorism risk assessment, 31
Rostow, Walt, 38, 47–48
rounding errors, defined, 78, 198
Rumsfeld, Donald, 178–79, 180
Russia
  DIA report on interference in 2016 U.S. presidential election, 21–24
  Napoleon's invasion of (1812), 170–71

Sadat, Anwar, 147
September 11, 2001 terrorist attacks, 139, 141–42
Shoup, David, 3
Silver, Nate, 35, 70–71
"slam dunk" (Iraq War intelligence), 19–20, 32, 143–44
Sotloff, Steven, 165
Soviet Union, nuclear weapons programs, 138, 142–43, 150
strategic assessment, defined, 166. See also subjective probability estimates and high-stakes choices
strategic caution, defined, 135–36. See also politics of uncertainty and blame
strictly proper scoring rules, 77, 132–34, 137, 157–59, 198–99
subjective probability and international politics, 51–68
  probability, confidence, and analytic rigor, 58–63
  subjective probability framework, practical implications, 63–67
  subjective theory of probability, 56–58
  theoretical foundations of probability assessment, 52–58
subjective probability estimates and high-stakes choices, 161–85
  bargaining model of war and, 166–70
  Bayesian updating, 167–70
  break-even analysis and, 161–65
  cumulative dynamics and strategic assessment in Iraq (2003–2011), 177–85
  debates over the Iraq Surge (2006–2007), 177–78, 181–83
  debates over U.S. strategy in Iraq (2003–2006), 178–80
  debates over withdrawal from Iraq (2009–2011), 177–78, 183–84
subjective theory of probability, 56–58
sunk-cost fallacy, 98
superforecasters, 75–76. See also Good Judgment Project (GJP)
Survey Sampling International, 151
Suskind, Ron, 187
Syria
  hostage-rescue attempt, Raqqa (2014), 165
  October War (1973), 147–48

Taylor, Maxwell, 38, 40
Tenet, George, 141, 143, 178–79
terrorism risk assessment, 31
Tetlock, Philip, 35, 74, 75, 134–35
Treverton, Gregory, 97, 135–36

uncertainty. See also basic standards for assessing uncertainty; politics of uncertainty and blame
  aleatory uncertainty, 52–53
  epistemic uncertainty, 52–53
  probability vs. confidence, 14, 58–63, 190–91
  probability vs. uncertainty, 14–15
  puzzles vs. mysteries, 52–53
  uncertainty vs. risk, 15
U.S. Army
  analytic standards for human source reliability assessment, 25–28
  analytic standards for planning and relational judgments, 29–31, 57–58
  doctrine and terminology of risk assessment, 24–25
  Field Manual 3-24 (FM 3-24), counterinsurgency doctrine, 33
  force sizing, 33
  tactical reporting during Vietnam War, 42
U.S. Defense Department
  on improvement as goal in Vietnam, 42–43
  probability assessments during Vietnam War, 18, 36–37, 43–44, 45, 49, 169–70, 171–72, 188–89
U.S. Department of Homeland Security (DHS), use of relative probability in National Terrorism Advisory System, 31
U.S. Intelligence Board, on Arab-Israeli hostilities (1973), 148
U.S. National War College, 13
  national security officials as survey respondents, 102–5, 110, 113, 117–23
  probability estimates data set, 71–73
U.S. presidential election (2016)
  Intelligence Community Assessment of Russian interference in, 21–24
  predictions in, 70–71
U.S. State Department, dissent over Iraqi WMD programs, 61–62, 143

vagueness. See politics of uncertainty and blame; probability assessment, pathologies of