Axelrod 1980

Effective Choice in

the Prisoner's Dilemma

Institute of Public Policy Studies
University of Michigan

This is a "primer" on how to play the iterated Prisoner's Dilemma game effectively.
Existing research approaches offer the participant limited help in understanding how to
cope effectively with such interactions. To gain a deeper understanding of how to be
effective in such a partially competitive and partially cooperative environment, a computer
tournament was conducted for the iterated Prisoner's Dilemma. Decision rules were
submitted by entrants who were recruited primarily from experts in game theory from
a variety of disciplines: psychology, political science, economics, sociology, and mathematics.
The results of the tournament demonstrate that there are subtle reasons for an
individualistic pragmatist to cooperate as long as the other side does, to be somewhat
forgiving, and to be optimistic about the other side's responsiveness.



This article is a "primer" on how to play the Prisoner's Dilemma
game effectively.
International politics is just one of the arenas which offer numerous
occasions of the Prisoner's Dilemma iterated for many moves. For
example, between the United States and the Soviet Union there are
aspects of the Prisoner's Dilemma in such sequences of events as armaments
budgets, trade concessions, deployment of troops, and escalation
in limited or proxy wars (see, for example, Snyder, 1971).
The distinguishing feature of the Prisoner’s Dilemma is that in the
short run, neither side can benefit itself with a selfish choice enough
to make up for the harm done to it from a selfish choice by the other.
Thus, if both cooperate, both do fairly well. But if one defects while
the other cooperates, the defecting side gets its highest payoff, and the
cooperating side is the sucker and gets its lowest payoff. This gives
both sides an incentive to defect. The catch is that if both do defect, both
do poorly. Therefore the Prisoner’s Dilemma embodies the tension
between individual rationality (reflected in the incentive of both sides
to be selfish) and group rationality (reflected in the higher payoff to both
sides for mutual cooperation over mutual defection). The payoff struc-
ture of each move in a typical Prisoner’s Dilemma is given in Table 1.
The fact that interactions are repeated is very important to the
dynamics of the situation. If the two sides knew that there would be
only a single choice, there would be every incentive to defect, since no
matter what the other player chooses, defection yields a higher payoff
than cooperation. But actors often have ongoing relationships with
both an informative history and an important future. Making effective
choices in such an ongoing relationship requires insight into the struc-
tural implications of strategic interaction.
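
The one-move argument above can be checked mechanically against the payoffs of Table 1. The following is a minimal sketch in Python; the names `PAYOFF` and `row_payoff` are illustrative choices made here and are not part of the original study.

```python
# Payoffs from Table 1, keyed by (row choice, column choice).
# "C" = cooperate, "D" = defect. All names here are illustrative.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def row_payoff(row_choice, column_choice):
    """Payoff to the row player for a single move."""
    return PAYOFF[(row_choice, column_choice)][0]


# In a single move, defection pays more than cooperation
# whatever the other side chooses.
defection_dominates = all(
    row_payoff("D", other) > row_payoff("C", other) for other in ("C", "D")
)

# Yet mutual cooperation pays each side more than mutual defection.
cooperation_beats_defection = row_payoff("C", "C") > row_payoff("D", "D")

print(defection_dominates, cooperation_beats_defection)
```

Both checks hold for these payoffs, which is exactly the tension between individual and group rationality described above.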
Suppose, for example, that in an interaction between the United
States and the Soviet Union, both sides are following a strategy of TIT
FOR TAT: cooperate initially, and thereafter cooperate if the other
side cooperated last time and defect if the other side defected last time.
This pair of strategies would lead to an unending series of mutual
cooperation. But now suppose that one side or the other decided to
make a minor change in its strategy to seek a slightly greater measure
of success. One such change would be to cooperate 90% rather than
100% of the time after the other side has just cooperated. This change
in strategy may seem promising, because it occasionally yields the
highest possible payoff, while still punishing the other side for any de-
fections it might undertake. In fact, as we shall see, this slight and
seemingly advantageous variation on TIT FOR TAT actually performs
much worse than the TIT FOR TAT strategy in a variety of settings, and
it frequently leads to an uninterrupted series of unrewarding mutual
defections.

Table 1: Payoff Matrix for Each Move of the Prisoner’s Dilemma

                              Column Player
                           Cooperate    Defect
Row Player    Cooperate       3,3         0,5
              Defect          5,0         1,1

(The payoff to the row player is given first in each pair of numbers.)

Of course, the elegant formulation of an interaction as a Prisoner’s
Dilemma puts aside many vital features which make any actual event
unique. Examples of what is left out by this formal abstraction include
the possibility of messages to go along with the choices, the presence
of third parties, the problems of implementing a choice, and uncertainty
about the actual prior choice of the other side. The list could be extended
indefinitely. Of course, no intelligent actor should make a choice with-
out taking such complicating factors into account. The value of an
analysis without them is that it can help the decision maker see some
subtle features of the interaction, features which might otherwise be
lost in the maze of complexities of the highly particular circumstances
in which choice must actually be made. It is the very complexity of
reality which makes the analysis of an abstract interaction so helpful
as an aid to perception.

The Study of Effective Choice

Psychologists using experimental subjects have found that in the
iterated Prisoner’s Dilemma, the amount of cooperation attained and
the specific pattern for attaining it depend on a wide variety of factors
relating to the context of the game, the attributes of the individual
players, and the relationship between the players. Since behavior in
the game reflects so many important factors about people, it has become
a standard way to explore questions in social psychology from the
effects of Westernization in Central Africa (Bethlehem, 1975) to the
existence (or nonexistence) of aggression in career-oriented women
(Baefsky and Berger, 1974) and the differential consequences of abstract
vs. concrete thinking styles (Nydegger, 1974). Just in the last 10 years,
there have been over 350 articles on the Prisoner’s Dilemma cited in
Psychological Abstracts. The iterated Prisoner’s Dilemma has become
the E. coli of social psychology.1
Equally important as its use as an experimental test bed is its use
as the conceptual foundation for models of important social processes.
Richardson’s model of the arms race is based on an interaction which is
essentially a Prisoner’s Dilemma played once a year with the budgets
of the competing nations (Richardson, 1960; Zinnes, 1976: 33-440).
Oligopolistic competition can also be modeled as Prisoner’s Dilemma
(Samuelson, 1973: 503-505). The ubiquitous problems of collective
action to produce a collective good are analyzable as an n-person
Prisoner’s Dilemma (Hardin, 1971). Even vote trading has been mo-
deled as a Prisoner’s Dilemma (Riker and Brams, 1973). In fact, many
of the best developed models of important political, social, and eco-
nomic processes have the Prisoner’s Dilemma as their foundation.
There is yet a third literature about the Prisoner’s Dilemma. This is
the literature which goes beyond the empirical questions of the labora-
tory or the real world, and instead uses the abstract game to analyze
the features of some fundamental strategic issues, such as the meaning
of rationality (Luce and Raiffa, 1957), choices which have externalities
(Schelling, 1973), and cooperation with enforcement (Taylor, 1976).
Unfortunately, none of these three literatures on the Prisoner’s
Dilemma tells us much about how to play the game well. The experimental
literature is not much help, because virtually all of it is based
on analyzing the choices made by inexperienced players who are seeing
the game for the first time. Their appreciation of the strategic subtleties
is bound to be restricted. The choices of experienced economic and
political elites are studied in some of the applied literature of Prisoner’s
Dilemma, but the evidence is of limited help, because of the relatively
slow pace of most high level interactions and the difficulty of controlling
for changing circumstances. All together, no more than a few dozen
choices have been identified and analyzed this way. Finally, the abstract
literature of strategic interaction usually studies variants of the iterated
Prisoner’s Dilemma designed to eliminate the dilemma itself by intro-
ducing changes in the game such as allowing interdependent choices
(Howard, 1966; Rapoport, 1967) or putting a tax on defection (Tideman
and Tullock, 1976; Clarke, 1978).
To learn more about how to choose effectively in an iterated Prisoner’s
Dilemma, a new approach is needed. Such an approach would have
to draw on people who had a rich understanding of the strategic possibilities.
It would also have to take into account two important facts
about strategic interaction in a non-zero sum setting. First, what is
effective is likely to depend not only upon the characteristics of a particular
strategy, but also upon the nature of the other strategies with
which it must interact. The second point follows directly from the
first. An effective strategy must be able at any point to take into account
the history of the interaction as it has developed so far.

1. A helpful article reviewing the effects of programmed strategies is Oskamp (1971).

A computer tournament for the study of effective choice in the
iterated Prisoner’s Dilemma meets these needs. In a computer tourna-
ment, each entrant writes a program which embodies a decision rule
to select the cooperative or noncooperative choice on each move. The
program has available to it the history of the game so far, and may use
this history in making a choice. Because the participants are recruited
primarily from those who have written on game theory and especially
the Prisoner’s Dilemma, the entrants are assured that their decision
rule will be facing rules of other experts. Such recruitment also guarantees
that the state of the art is represented in the tournament.
Just such a tournament was conducted. The results contain some real
surprises. These surprises in turn offer new insights into the questions
of how to understand and how to cope with an environment which
contains aspects of the Prisoner’s Dilemma.


The Winner

TIT FOR TAT, submitted by Professor Anatol Rapoport of the
University of Toronto, won the Computer Tournament for the Iterated
Prisoner’s Dilemma. This simplest of all submitted programs turned out
to be the best of the fourteen entries from economists, mathematicians,
political scientists, psychologists, and a sociologist.
TIT FOR TAT starts with a cooperative choice, and thereafter does
what the other player did on the previous move. This decision rule is
probably the most widely known and most discussed rule for playing
the Prisoner’s Dilemma. It is easily understood and easily programmed.
It is known to elicit a good degree of cooperation when played with
humans (Oskamp, 1971). As an entry to a computer tournament, it has
the desirable properties that it is not very exploitable and that it does
well with its own twin. It has the disadvantage that it is too generous
with the RANDOM rule, which was known by the participants to be
entered in the tournament.
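
The two desirable properties just named can be illustrated with a short simulation. This is a sketch under stated assumptions: the function names, the history-list interface, and the `always_defect` opponent are hypothetical devices introduced here for illustration; they are not taken from the tournament programs, and no always-defecting rule is reported among the entries.

```python
C, D = "C", "D"
# Payoffs per move: (payoff to first player, payoff to second player).
PAYOFF = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}


def tit_for_tat(own_history, other_history):
    """Cooperate first, then repeat the other player's previous move."""
    return other_history[-1] if other_history else C


def always_defect(own_history, other_history):
    """Hypothetical opponent, used only to probe exploitability."""
    return D


def play(rule_a, rule_b, moves=200):
    """Play one game of `moves` moves and return the two total scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(moves):
        a = rule_a(hist_a, hist_b)
        b = rule_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFF[(a, b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b


twin_scores = play(tit_for_tat, tit_for_tat)        # (600, 600)
exploit_scores = play(tit_for_tat, always_defect)   # (199, 204)
print(twin_scores, exploit_scores)
```

With its own twin, TIT FOR TAT earns the full 600 points. Against the hypothetical unconditional defector it is exploited only once, on the first move, and ends just 5 points behind.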
In addition, TIT FOR TAT was known to be a powerful competitor.
In a preliminary tournament, TIT FOR TAT scored second place, and
in a variant of that preliminary tournament, TIT FOR TAT won
first place. All of these properties were known to most of the people
designing programs for the Computer Prisoner’s Dilemma Tourna-
ment, because they were sent copies of the description of a preliminary
tournament. And not surprisingly, many of them used the TIT FOR
TAT principle and tried to improve upon it.
The striking fact is that none of the more complex programs submitted
were able to perform as well as the original, simple TIT FOR TAT.
This result contrasts with computer chess tournaments where complexity
is obviously needed. For example, in the Second World Computer
Chess Championships, the least complex program came in last
(Jennings, 1978). It was submitted by Johann Joss of the Eidgenössische
Technische Hochschule of Zurich, Switzerland, who also submitted an
entry to the Computer Prisoner’s Dilemma Tournament. His entry to
the Prisoner’s Dilemma Tournament was the small modification of
TIT FOR TAT which has already been mentioned. But his modifica-
tion, like the others, just lowered the performance of the decision rule.

The Form of the Tournament

The tournament was run as a round robin so that each entry was
paired with each other entry. As announced in the rules of the tournament,
each entry was also paired with its own twin and with RANDOM,
a program that randomly cooperates and defects with equal probability.
Each game consisted of exactly 200 moves. The payoff matrix for
each move awards both players 3 points for mutual cooperation, and
1 point for mutual defection. If one player defects while the other player
cooperates, the defecting player receives 5 points and the cooperating
player receives 0 points (see Table 1).
No entry was disqualified for exceeding the allotted time. In fact,
the entire round robin tournament was run five times to get a more
stable estimate of the scores for each pair of players. In all, there were
120,000 moves, making for 240,000 separate choices.
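
The totals in the last sentence can be reproduced with a little arithmetic. The sketch below assumes that RANDOM is counted as a fifteenth player which, like the submitted entries, also plays a game against its own twin; that reading is an assumption made here because it yields the reported figure, and the variable names are illustrative.

```python
entries = 14
players = entries + 1        # the submitted entries plus RANDOM (assumed)
moves_per_game = 200
repetitions = 5              # the round robin was run five times

# Every unordered pair of distinct players, plus each player with its twin.
games_per_round_robin = players * (players - 1) // 2 + players

total_moves = games_per_round_robin * moves_per_game * repetitions
total_choices = 2 * total_moves   # two players choose on every move

print(games_per_round_robin, total_moves, total_choices)
```

Under this counting there are 120 games per round robin, giving 120,000 moves and 240,000 separate choices.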


The Contestants

The fourteen submitted entries came from three countries and five
disciplines. The Appendix lists the names and affiliations of the sixteen
people who submitted these entries, and it gives the rank and score
of their entries.
One remarkable aspect of the tournament is that it allowed people
from different disciplines to interact with each other in a common
format and language. Most of the entrants were recruited from those
who had published articles on game theory in general or the Prisoner’s
Dilemma in particular.
Analysis of the results shows that neither the discipline of the author,
the brevity of the program, nor its length accounts for a rule’s relative
success. What does?
Before answering this question, a remark on the interpretation of
numerical scores is in order. A useful benchmark for very good performance
is 600 points, which is equivalent to the score attained by a
player when both sides always cooperate with each other. A useful
benchmark for very poor performance is 200 points, which is equivalent
to the score attained by a player when both sides never cooperate with
each other. As we shall see, most scores range between 200 and 600
points, although scores from 0 to 1000 points are possible. The winner,
TIT FOR TAT, averaged 504 points per game.
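
These benchmarks follow directly from the game length and the per-move payoffs. A brief sketch, using the conventional letters R, P, T, and S for the four payoffs (the letters are a labeling assumption made here, not a quotation of the tournament rules):

```python
MOVES = 200
R, P, T, S = 3, 1, 5, 0   # reward, punishment, temptation, sucker's payoff

always_mutual_cooperation = MOVES * R   # benchmark for very good performance
always_mutual_defection = MOVES * P     # benchmark for very poor performance
best_possible = MOVES * T               # defecting against a constant cooperator
worst_possible = MOVES * S              # cooperating against a constant defector

print(always_mutual_cooperation, always_mutual_defection,
      best_possible, worst_possible)
```

The winner's average of 504 points therefore corresponds to about 2.5 points per move.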



Surprisingly, there is a single property which distinguishes the
relatively high scoring entries from the relatively low scoring entries.
I will call this the property of being nice. A decision rule is nice if it will
not be the first to defect, or if at least it will not be the first to defect
before the last few moves (say before move 199).
Each of the top eight ranking entries is nice. None of the other
entries are nice. There is even a substantial gap in the score between
the nice entries and the others. The nice entries received tournament
averages between 472 and 504, while the best of the entries that were not
nice received only 401 points. Thus, not being the first to defect, at least
until virtually the end of the game, was a property which all by itself
separated the more successful rules from the less successful rules in this
Computer Prisoner’s Dilemma Tournament.
Each of the nice rules got about 600 points with each of the 7 other
nice rules and with its own twin. This is because when 2 nice rules play,
they are sure to cooperate with each other until virtually the end of
the game. Actually the minor variations in end-game tactics did not
account for much variation in the scores.
The scores of each player with each of the others are displayed in
Table 2. Notice that the numbers in the upper left are all near 600, due
to the fact that the top ranking entries are all nice and therefore cooperate
with each other.
The bottom left section of Table 2 shows that the rules which are not
nice rarely got more than 400 points with any of the nice rules. Therefore,
the nice rules had a tremendous advantage in that they cooperated
with each other, but were not very cooperative with the rules which
were the first to defect.
So far we have seen that niceness accounts for which rules are in the
top eight ranks and which rules are below. But what accounts for the
relative ranking among the top eight?

Effectiveness with the Kingmaker

Since the nice rules all got within a few points of 600 with each other,
the thing which distinguishes the relative rankings among the nice
rules is their scores with the rules which are not nice. This much is
obvious.
What is not obvious is that the relative ranking of the eight top rules
was largely determined by just two of the other seven rules. These two
rules are kingmakers because they do not do so well for themselves,
but they largely determine the rankings among the top contenders.
The two kingmakers are GRAASKAMP and DOWNING. (Entries
are indicated by capital letters. All entries except the familiar TIT FOR
TAT are named after their authors.)
DOWNING is the most important kingmaker because the range of
scores achieved with it by the nice rules was the largest of any rule.
The scores of the nice rules playing DOWNING ranged from a high
of 601 by TIDEMAN AND CHIERUZZI to a low of 158 by NYDEGGER.

[Table 2: Tournament scores of each entry with each of the others]

DOWNING is a particularly interesting rule in its own right, because,
unlike most of the others, its logic is not based on a variant of TIT FOR
TAT. DOWNING assumes that the other side bases its choice on
DOWNING'S own previous choice. DOWNING estimates the pro-
bability that the other player cooperates after it (DOWNING) cooper-
ates, and also the probability that the other player cooperates after
DOWNING defects. Each move, it updates its estimate of these two
conditional probabilities and then selects the choice which will maximize
its own long-term payoff under the assumption that it has correctly
modeled the other player. The decision rule is based on an "outcome
maximization" principle originally developed as a possible interpretation
of what human subjects do in Prisoner's Dilemma laboratory
experiments (Downing, 1975). If the two conditional probabilities
have similar values, DOWNING determines that it pays to defect, since
the other player seems to be doing the same thing whether DOWNING
cooperates or not. Conversely, if the other player tends to cooperate
after a cooperation but not after a defection by DOWNING, then the
other player seems responsive, and DOWNING will calculate that the
best thing to do with a responsive player is to cooperate. Under certain
circumstances, DOWNING will even determine that the best strategy
is to alternate cooperation and defection.
At the start of a game, DOWNING does not know the values of these
conditional probabilities for the other players. It assumes that they are
both .5, but gives no weight to this estimate when information actually
does come in during the play of the game.
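
The reasoning attributed to DOWNING can be made concrete with a simplified calculation. The source does not give DOWNING's actual computation, so what follows is only a sketch under an assumption made here: that the rule compares the long-run average payoff of three fixed plans (always cooperate, always defect, alternate), given its two estimated conditional probabilities. The function and parameter names are hypothetical.

```python
def expected_payoffs(p_coop_after_c, p_coop_after_d):
    """Average payoff per move for three simple plans, given estimates of
    how often the other side cooperates after our cooperation and after
    our defection. Payoffs per move are 3, 0, 5, and 1 as in Table 1."""
    always_cooperate = 3 * p_coop_after_c
    always_defect = 5 * p_coop_after_d + 1 * (1 - p_coop_after_d)
    # When alternating, our cooperations follow our own defections,
    # and our defections follow our own cooperations.
    alternate = (3 * p_coop_after_d
                 + 5 * p_coop_after_c + 1 * (1 - p_coop_after_c)) / 2
    return {
        "cooperate": always_cooperate,
        "defect": always_defect,
        "alternate": alternate,
    }


def best_plan(p_coop_after_c, p_coop_after_d):
    """Name of the plan with the highest estimated average payoff."""
    payoffs = expected_payoffs(p_coop_after_c, p_coop_after_d)
    return max(payoffs, key=payoffs.get)


print(best_plan(0.5, 0.5))   # initial beliefs: an unresponsive partner
print(best_plan(1.0, 0.0))   # a fully responsive partner
print(best_plan(1.0, 0.4))   # a partner that forgives fairly often
```

With both estimates at .5 the comparison favors defection; with a fully responsive partner it favors cooperation; and with a partner that cooperates after a defection often enough, alternation comes out ahead. These are the three cases described in the text, although the true program may have weighed them differently.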
This is a fairly sophisticated decision rule, but its implementation
does have one flaw. By initially assuming that the other player is un-
responsive, DOWNING is doomed to defect on the first two moves.
These first two defections led many other rules to punish DOWNING,
so things usually got off to a bad start. But this is precisely why DOWNING
served so well as a kingmaker. First ranking TIT FOR TAT
and second ranking TIDEMAN AND CHIERUZZI both reacted in
such a way that DOWNING learned to expect that defection does
not pay but that cooperation does. All of the other nice rules went
downhill with DOWNING.
The other kingmaker is GRAASKAMP. This is a very clever pro-
gram. It plays tit for tat for 50 moves, defects once, plays tit for tat for
another 5 moves, and then examines the history of the game so far.
Its defection on move 51 allows it to recognize its own twin and be
cooperative with it. Similarly, it can check to see if the other player
seems to be TIT FOR TAT or another program it recognizes. If so, it
plays the rest of the game in an appropriate way. If its score is not very
good, it suspects (perhaps incorrectly) that it is playing RANDOM
and defects for the rest of the game. Otherwise it continues to play
tit for tat, but throws in a defection every five to fifteen moves. Not
being nice, GRAASKAMP itself did not do very well in the tournament,
but it was one of the two kingmakers. TIT FOR TAT did well with
GRAASKAMP, getting 597 points. Third ranking NYDEGGER also
got a boost by doing next best among the nice programs when playing
GRAASKAMP.
The importance of the kingmakers is that they largely determined the
rank order of the top programs. TIT FOR TAT did very well with
both, thereby winning the tournament. TIDEMAN AND CHIERUZZI
did very well with one of them and came in second. In fact, the rank
order of the nice programs in the tournament exactly matched their
rank order of scores with just the two kingmakers, with only a single
exception. The exception was NYDEGGER, which moved up to third
in the tournament ranking because it not only did moderately well with
the two kingmakers, but also did about a hundred points better than
the other nice programs with both FELD and JOSS.
It is noteworthy that RANDOM was not a kingmaker since all
entrants knew that RANDOM would be in the tournament. The range
of scores achieved by the nice programs with RANDOM was only 157,
compared to a range of 443 achieved with DOWNING and 290 achieved
with GRAASKAMP. In fact, among the nice programs, there is actually
a negative relationship between how well they did with RANDOM
and how well they did in the full tournament. The reason is not hard
to find and it carries an important lesson. To do well with RANDOM
requires giving up on it early. But being too ready to give up on a seemingly
random player will mean often mistakenly giving up on a potentially
responsive player.
FRIEDMAN provides an extreme example of this. FRIEDMAN is
never the first to defect, but once the other player defects even once,
FRIEDMAN defects from then on. FRIEDMAN did very well with
RANDOM, and since it is nice, it did well with all the other nice rules.
But it did not do well with the kingmakers, because it did not let them
recover from the consequences of their first defection. This brings us
to the important property of forgiveness of a decision rule.

We have seen that the nice rules did well in the tournament largely
because they did so well with each other, and because there were enough
of them to substantially raise each other’s average score. As long as the
other player did not defect, each of the nice rules was certain to continue
cooperating until virtually the end of the game.
But what happened if there was a defection? As we shall see, different
rules responded quite differently, and their response was important
in determining their overall success. A key concept in this regard is
the forgiveness of a decision rule. Forgiveness of a rule is its propensity
to cooperate in the moves after the other player has defected.
The least forgiving rule is also a nice one. This is FRIEDMAN, which
as we have just seen is cooperative until the other player defects; then
it never cooperates again. TIT FOR TAT is unforgiving for one move,
but then is totally forgiving of an isolated defection. SHUBIK, on the
other hand, gets progressively less forgiving: It punishes the first defec-
tion with one defection, then punishes the next with two defections,
and so on.
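
The contrast between the least forgiving rule and TIT FOR TAT can be seen by feeding each a partner that slips exactly once. The following sketch is illustrative only: the scripted partner (which cooperates on every move except move 10 and ignores what the rule does) is a hypothetical device, and the function names are assumptions rather than the entrants' code.

```python
C, D = "C", "D"


def tit_for_tat(own_history, other_history):
    """Cooperate first, then repeat the other player's previous move."""
    return other_history[-1] if other_history else C


def friedman(own_history, other_history):
    """Cooperate until the other player defects once, then defect forever."""
    return D if D in other_history else C


def responses_to_single_defection(rule, slip_move=10, moves=200):
    """Play `rule` against a scripted partner that defects only on
    `slip_move`, and return the rule's own sequence of choices."""
    own, other = [], []
    for move in range(1, moves + 1):
        own_choice = rule(own, other)
        other_choice = D if move == slip_move else C
        own.append(own_choice)
        other.append(other_choice)
    return own


tft_defections = responses_to_single_defection(tit_for_tat).count(D)
friedman_defections = responses_to_single_defection(friedman).count(D)
print(tft_defections, friedman_defections)
```

TIT FOR TAT answers the isolated defection with a single defection, while FRIEDMAN defects on all 190 remaining moves.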
One of the main reasons why the rules which are not nice did not do
well in the tournament is that most of the rules in the tournament were
not very forgiving. A concrete illustration will help. Consider the case
of JOSS. This decision rule is a variation of TIT FOR TAT. Like
TIT FOR TAT, it always defects immediately after the other player
defects. But instead of always cooperating after the other player cooper-
ates, 10% of the time it defects after the other player cooperates. Thus
it tries to get away with an occasional exploitation of the other player.
This decision rule seems like a fairly small variation of TIT FOR
TAT, but in fact its overall performance was much worse, and it is
interesting to see exactly why. Table 3 shows the move-by-move history
of a game between JOSS and TIT FOR TAT. At first both players
cooperated, but on the sixth move, JOSS selected one of its probabilistic
defections. On the next move JOSS cooperated again, but TIT FOR
TAT defected in response to JOSS’s previous move. Then JOSS defected
in response to TIT FOR TAT’s defection. In effect, the single
defection of JOSS on the sixth move created an echo back and forth
between JOSS and TIT FOR TAT. This echo resulted in JOSS defecting
on all the subsequent even numbered moves and TIT FOR TAT
defecting on all the subsequent odd numbered moves.

Table 3: Illustrative Game Between TIT FOR TAT and JOSS

moves   1- 20   11111 23232 32323 23232
moves  21- 40   32324 44444 44444 44444
moves  41- 60   44444 44444 44444 44444
moves  61- 80   44444 44444 44444 44444
moves  81-100   44444 44444 44444 44444
moves 101-120   44444 44444 44444 44444
moves 121-140   44444 44444 44444 44444
moves 141-160   44444 44444 44444 44444
moves 161-180   44444 44444 44444 44444
moves 181-200   44444 44444 44444 44444

Score in this game: TIT FOR TAT 236, JOSS 241
Legend: 1 both cooperated
        2 TIT FOR TAT only cooperated
        3 JOSS only cooperated
        4 neither cooperated

During this alternation of cooperation and defection by the two
players, each was getting an average of 2.5 points per turn (5 when the
other cooperated and 0 when the other defected). This is not too much
below the average of 3 points per turn when both were cooperating.
However, worse was to come. JOSS was still cooperating on moves
7, 9, 11, 13, and so on. But on the twenty-fifth move, JOSS selected
another of its probabilistic defections. Of course TIT FOR TAT de-
fected on move 26 and another reverberating echo began. This echo
had JOSS defecting on the odd numbered moves and TIT FOR TAT
defecting on the even numbered moves. Together these two echoes
resulted in both players defecting on every move after move 25. This
meant that for the next 175 moves they both got only one point per
turn. The final score of this game was 236 for TIT FOR TAT and 241
for JOSS. Notice that while JOSS did a little better than TIT FOR TAT,
both did poorly.2
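
The illustrative game can be replayed step by step. In the sketch below, JOSS's probabilistic defections are not drawn at random; they are scripted to occur on moves 6 and 25, the two moves on which they happened in the game described above. That scripting, and all function names, are assumptions made here for illustration.

```python
C, D = "C", "D"
PAYOFF = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}


def tit_for_tat(own_history, other_history):
    """Cooperate first, then repeat the other player's previous move."""
    return other_history[-1] if other_history else C


def make_scripted_joss(unprovoked_moves):
    """JOSS-like rule: defect after a defection, and also defect after a
    cooperation on the listed move numbers (standing in for the 10% chance)."""
    def joss(own_history, other_history):
        move_number = len(own_history) + 1
        if other_history and other_history[-1] == D:
            return D
        return D if move_number in unprovoked_moves else C
    return joss


def play(rule_a, rule_b, moves=200):
    """Play one game; return both scores and both move histories."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(moves):
        a = rule_a(hist_a, hist_b)
        b = rule_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFF[(a, b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b, hist_a, hist_b


tft_score, joss_score, tft_moves, joss_moves = play(
    tit_for_tat, make_scripted_joss({6, 25})
)
print(tft_score, joss_score)
```

The replay gives 236 for TIT FOR TAT and 241 for JOSS, and from move 25 onward both histories consist of nothing but defections.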

The problem was a combination of an occasional defection after
the other's cooperation by JOSS, combined with a short-term lack of
forgiveness by both sides. The moral is that if both sides are unforgiving
in the way that JOSS and TIT FOR TAT were, it does not pay to be as
greedy as JOSS was.

2. In the five games between them, the average scores were 225 for TIT FOR TAT
and 230 for JOSS.

In the tournament as a whole, TIT FOR TAT won with an average
of 504 points, but JOSS came in twelfth with an average of only 304
points. They did about the same with the rules which were not nice.
The difference was that JOSS did much worse than TIT FOR TAT
with the rules which were nice. The reason is simply that when playing
with a nice rule JOSS would sooner or later throw in a defection, and
then the other player would punish it, setting off mutual recriminations.
Without sufficient forgiveness by either side, these recriminations
echoed throughout the game, often being magnified by the specific
pattern of retaliation of the other player.
A similar story could be told for FELD. FELD is a rule which also
defects after the other player defects, but does not always cooperate
after the other player cooperates. In FELD’s case, the probability of
cooperation after the other player cooperates is not a constant 90%
as it was for JOSS. Instead it starts at 100% and declines steadily so
that it is 50% by the end of the game at move 200. FELD ran into the
same problem of having its defections echoed back to it, and overall
FELD ranked eleventh with an average score of 328.
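
FELD's declining willingness to cooperate can be written as a simple schedule. The text says only that the probability "declines steadily" from 100% to 50%; the straight-line form used below is an assumption made here, as are the function and variable names.

```python
def feld_cooperation_probability(move, total_moves=200):
    """Probability that FELD cooperates after the other player's cooperation,
    assuming a straight-line decline from 1.0 on move 1 to 0.5 on the
    final move."""
    return 1.0 - 0.5 * (move - 1) / (total_moves - 1)


# Average chance of an unprovoked defection over a whole game, if the
# other player were to cooperate on every move.
average_defection_rate = sum(
    1.0 - feld_cooperation_probability(m) for m in range(1, 201)
) / 200

print(feld_cooperation_probability(1),
      feld_cooperation_probability(200),
      round(average_defection_rate, 3))
```

Under this linear assumption the average rate of unprovoked defection over a game is one quarter, which is in line with the roughly 25% defections FELD is reported to have achieved against DOWNING.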
FELD does, however, reveal how maximization rules, like DOWNING,
are subject to exploitation. In their 5 games together, FELD
averaged 704 points (including the highest single score in the entire
tournament), while DOWNING attained a middling score of 436.
Recall that DOWNING is the rule which estimates what the other player
does after its own cooperation, and what the other player does after its
own defection, and then selects the strategy which gives it the maximum
long-term gains if these estimates are accurate. What happened in these
games was that DOWNING defected on the first move, and FELD
defected on the second move in response. Then DOWNING drew the
conclusion that if it ever defected, the other player was sure to defect.
So DOWNING cooperated on move 3. DOWNING’S cooperations
were usually met with cooperations from FELD, so DOWNING determined
that the best thing would be to cooperate. Then when FELD
threw in an occasional defection, DOWNING was totally forgiving
because DOWNING (correctly) estimated that the other player would
be sure to defect if DOWNING defected but would be likely to cooperate
if DOWNING cooperated. FELD’s tactic of increasing its probability
of defection following the other side’s cooperation was sufficiently
gradual so that from DOWNING’S point of view the best thing to do
was always to cooperate. Thus FELD got away with about 25% defections,
while DOWNING sat still for it and never defected after the
second move. This gave FELD an exceptionally high score when paired
with DOWNING.


Actually, no other rule was very successful at exploiting DOWNING'S
reactive maximization approach. DOWNING'S main problem in the
tournament was simply that its initial beliefs led it to defect on the
very first move. As we have seen, its initial expectations were that the
other side would defect with equal probability whether DOWNING
defected or not. This proved to be an inaccurate estimate. In actual
play, a cooperation by one player on the first move of the game was
followed by a cooperation by the other player on the second move 83%
of the time. Conversely, a defection by one player on the first move
was followed by a cooperation by the other player on the second move
only 29% of the time. So DOWNING'S initial belief in the unresponsiveness
of the other players was not well-founded and led to its own
downfall.
The very best rules were not only nice, but they were relatively for-
giving. TIT FOR TAT, for example, has a very short memory. It pun-
ishes each defection once, but then it is willing to let bygones be bygones.
This is why it did so well with the two kingmakers.
Second place TIDEMAN AND CHIERUZZI was also forgiving,
but in its own special way. Like SHUBIK, it punishes the other player's
first defection with one defection, punishes the second defection with
two defections, and so on. But it does not give up entirely on an uncooperative
player. Under certain conditions, it gives the other player
a fresh start. When a fresh start is decided upon, TIDEMAN AND
CHIERUZZI cooperates twice and then plays as if the game had just
started. TIDEMAN AND CHIERUZZI is no pushover: It only forgives
the other player if five separate conditions are simultaneously met.
These conditions even include a statistical test to check the hypothesis
that the other player is not RANDOM. Another test is that TIDEMAN
AND CHIERUZZI is more than ten points ahead. But even this controlled
degree of forgiveness was enough to get the program out of some
ruts of mutual defection and thereby achieve second place in the tournament.3

3. As might be expected, the results could depend on the payoff matrix. For example,
if P is changed from 1 to 2, then TIDEMAN AND CHIERUZZI would have come in
first, followed by STEIN AND RAPOPORT, FRIEDMAN, and TIT FOR TAT. Of
course, the entrants knew the actual payoff matrix. Had the announced payoff matrix
been different, some of them might have submitted different programs, and this would
have affected everyone's score.

Much of the reason why NYDEGGER achieved third place in the
tournament was the fact that after three mutual defections, it was willing
to try a cooperation. This direct kind of forgiveness was especially
helpful in getting out of mutual recriminations when playing the variations
of the tit for tat principle used by FELD and JOSS. And among
the nice rules, the least forgiving (FRIEDMAN and DAVIS) did the
least well.
In contrast to the successful forgiveness of the top entries, TULLOCK
demonstrates the costs of being not nice and deliberately mean.
TULLOCK cooperates for the first eleven moves, but from then on,
it has a probability of cooperation which is 10% less than the level of
cooperation shown by the other side in the preceding ten moves. Other
players tended to echo TULLOCK’s defections, and then TULLOCK
would amplify the echo. Often games with TULLOCK would end with
a long string of mutual defection. This lack of forgiveness along with
the willingness to be the first to defect was largely to blame for TULLOCK’s
relatively low score of 301 points and thirteenth rank.4


Many surprises were produced by the Computer Tournament for
the Iterated Prisoner’s Dilemma. Here is a review of these surprises,
with some remarks on their implications.
(1) TIT FOR TAT won the tournament. This was the simplest rule
submitted: Cooperate on the first move and then do whatever the other
player did on the previous move. One implication is that reciprocity
is not only a social norm, but can also be an extremely successful operating
rule for an individualistic pragmatist.
(2) Despite the fact that most entrants tried to improve on TIT FOR
TAT, none of the submitted attempts actually performed as well. As
the British would say, “they were too clever by half.” It seems that the
main thing overlooked by many of these entries was the significance
of echo effects. While many of the rules would work well with a player
who tolerated their occasional exploitative moves, they did less well
in an environment in which their defections were usually echoed back
at them. In many games, even two unprovoked defections were enough
to set up echoes leading to a pattern of unending mutual punishment.

4. TULLOCK was also involved in a cycle of scores. TULLOCK did about fifty
points better than SHUBIK in their games together. And SHUBIK did about ten points
better than DOWNING in their games together. But DOWNING did fifty points better

Perhaps the primary lesson of the tournament is the importance of
minimizing echo effects in an environment of mutual power. A sophis-
ticated analysis must go at least three levels deep. First is the direct
effect of a choice. This is easy, since a defection always earns more than
a cooperation. Second are the indirect effects, taking into account that
the other side may or may not punish a defection. This much was cer-
tainly appreciated by many of the entrants. But third is the fact that
in responding to the defections of the other side, one may be repeating
or even amplifying one’s own previous exploitative choice. Thus a
single occasional defection may be successful when analyzed for its
direct effects, and perhaps even when its secondary effects are taken
into account. But the real costs may be in the tertiary effects when one’s
own isolated defections turn into unending mutual recriminations.
Without realizing it, many of these rules actually wound up punishing
themselves. With the other player serving as a mechanism to delay
the self-punishment by a few moves, this aspect of self-punishment was
not perceived by the decision rules.
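The three levels of analysis can be traced in a small simulation. The following Python sketch pairs two players who both follow the tit for tat principle and injects one unprovoked defection by the first player. The helper name `echo_demo`, its parameters, and the 20-move length are hypothetical choices made for illustration; the payoffs are assumed to be the tournament's values (T = 5, R = 3, P = 1, S = 0).

```python
# Payoffs to (player A, player B) for each pair of moves (assumed tournament values).
PAYOFFS = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
           ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def echo_demo(moves=20, defect_at=5):
    """Two tit-for-tat players; player A defects once, unprovoked, at move
    `defect_at` (1-indexed). Returns the move pairs and both total scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for move_number in range(1, moves + 1):
        # Each side repeats the other's previous move, cooperating at the start.
        move_a = hist_b[-1] if hist_b else "C"
        move_b = hist_a[-1] if hist_a else "C"
        if move_number == defect_at:
            move_a = "D"  # the single exploitative choice
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return list(zip(hist_a, hist_b)), score_a, score_b

pairs, score_a, score_b = echo_demo()
print("A:", "".join(a for a, _ in pairs))
print("B:", "".join(b for _, b in pairs))
print("scores:", score_a, score_b)
```

Under these assumptions the defection gains two points on the move where it occurs, but the two players then alternate between exploiting and being exploited for the rest of the game. Both finish the 20 moves with 52 points rather than the 60 that uninterrupted cooperation would have given.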
(3) The clearest pattern in the tournament data was certainly a
surprise: A single attribute, “niceness,” accounted completely for which
rules were in the top eight ranks and which were in the bottom seven
ranks. Niceness is the property of never being the first to defect, at least
until the last few moves. Every rule which had it did much better than
any rule which did not. Each of the nice rules did well with each of the
other nice rules, but in addition few of the rules which were not nice
did very well with any of the nice rules. It is remarkable that a single
property would so clearly separate the successful rules from the unsuc-
cessful rules. At least in this first tournament, it paid to be nice.
(4) It also paid to be forgiving. TIT FOR TAT did so well because
it punished each defection by the other side once, but only once. There-
fore its forgiveness prevented it from amplifying the echoes of the other
player’s defections. Just as the presence or absence of niceness ac-
counted for whether a program was in the top or bottom half of the
tournament, the degree of forgiveness largely accounted for how a nice
program ranked within the top half. The more forgiving, the better the
(5) The very existence of two kingmakers is surprising. The relative
ranking among the top rules was almost totally accounted for by their
performance with just two of the fifteen rules. Given that there were
kingmakers at all, the fact that RANDOM was not one of them is also
surprising since there was a lot to be gained from discriminating promptly
against RANDOM. In retrospect, we can see that giving up quickly
on RANDOM was only done at the cost of being too uncooperative
with some other rules which may appear to be either RANDOM or
untrainable. Thus another lesson of the tournament is that it pays to
be wary of regarding the other side's seemingly inexplicable behavior
as hopelessly unresponsive.
(6) Despite the fact that none of the attempts at more or less sophis-
ticated decision rules were improvements on TIT FOR TAT, it was
easy to find several rules which performed substantially better than
TIT FOR TAT in the environment of the tournament. This should
serve as a warning against the facile belief that an eye for an eye is
necessarily the best strategy. Here are three rules that would have won
the tournament if submitted.

(a) The sample program sent to prospective contestants to show them how to make
a submission would in fact have won the tournament if anyone had simply clipped
it and mailed it in! But no one did. This rule is to defect only if the other player
defected on the previous two moves. It is a more forgiving version of TIT FOR TAT
in that it does not punish isolated defections. The excellent performance of the TIT
FOR TWO TATS rule highlights the fact that a common error of the contestants
was to expect that gains could be made from being relatively less forgiving than
TIT FOR TAT, whereas in fact there were big gains to be made from being even
more forgiving. The implication of this finding is striking, since it suggests that
expert strategists do not give sufficient weight to the importance of forgiveness.
(b) Another rule which would have won the tournament was also available to most
of the contestants. This was the rule which won the preliminary tournament, a
report of which was used in recruiting the contestants (Axelrod, 1978). Called
LOOK AHEAD, it was inspired by the tree searching techniques used in the
artificial intelligence programs to play chess. It is interesting that artificial
intelligence techniques could have inspired a rule which was in fact better than any
of the rules designed by game theorists specifically for the Prisoner's Dilemma.
(c) A third rule which would have won the tournament was a slight modification of
DOWNING. If DOWNING had started with initial beliefs that the other players
would be responsive rather than unresponsive, it too would have won and won
by a large margin.5 A kingmaker could have been king. DOWNING's initial
beliefs about the other players were pessimistic. It turned out that optimism
about their responsiveness would not only have been more accurate but would also
have led to more successful performance. It would have resulted in first place
rather than tenth place.
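
The difference in forgiveness between rule (a) and TIT FOR TAT can be seen by running both against the same scripted sequence of moves. The Python sketch below is an illustration only: the function names and the scripted opponent are hypothetical, and it is not the sample program that was mailed to contestants.

```python
def tit_for_tat(other_history):
    """Cooperate first, then repeat the other player's previous move."""
    return "C" if not other_history else other_history[-1]

def tit_for_two_tats(other_history):
    """Defect only if the other player defected on both of the previous two moves."""
    return "D" if other_history[-2:] == ["D", "D"] else "C"

def responses(rule, other_moves):
    """Moves that `rule` makes against a fixed, scripted sequence of opponent moves."""
    return [rule(list(other_moves[:i])) for i in range(len(other_moves))]

scripted = "CCDCCDDC"  # hypothetical opponent: one isolated defection, then two in a row
print("other:           ", scripted)
print("TIT FOR TAT:     ", "".join(responses(tit_for_tat, scripted)))
print("TIT FOR TWO TATS:", "".join(responses(tit_for_two_tats, scripted)))
```

Against this script, TIT FOR TAT answers every defection, while TIT FOR TWO TATS lets the isolated defection pass and retaliates only after the pair of consecutive defections.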

5. In the environment of the 15 rules of the tournament, REVISED DOWNING
averages 542 points. This compares to TIT FOR TAT which won with 504 points. TIT
FOR TWO TATS averages 532 in the same environment, and LOOK AHEAD averages
520 points.


These results from supplementary rules reinforce a theme from the
analysis of the tournament entries themselves: The entries were too
competitive for their own good. In the first place, many of them defected
early in the game without provocation, a characteristic which was very
costly in the long run. In the second place, the optimal amount of forgiveness
was considerably greater than displayed by any of the entries
(except possibly DOWNING). And in the third place, the entry which
was most different from the others, DOWNING, floundered on its own
misplaced pessimism regarding the initial responsiveness of the others.
(7) Finally, and most impressively, the sheer number of surprises
was itself a surprise. What this indicates is that there is a lot to be learned
about coping with an environment of mutual power. Even expert
strategists from political science, sociology, economics, psychology,
and mathematics made the systematic errors of being too competitive
for their own good, not forgiving enough, and too pessimistic about
the responsiveness of the other side.
The effectiveness of a particular strategy depends not only on its own
characteristics, but also on the nature of the other strategies with which
it must interact. For this reason, the results of a single tournament are
not definitive. What this tournament does provide are concepts and
examples which allow a deeper appreciation of some of the subtle con-
sequences of choice in an environment which contains aspects of the
iterated Prisoner’s Dilemma.
The application of these concepts and examples can be widespread
since the Prisoner’s Dilemma is so common. The discovery of subtle
reasons for the individualistic pragmatist to be nice, forgiving, and
optimistic is an unexpected bonus.


First Place with 504.5 points is TIT FOR TAT submitted by Anatol Rapo-
port of the Department of Psychology, University of Toronto. This rule is
only four lines long in FORTRAN. It cooperates on the first move, and then
does whatever the other player did on the previous move. It has a long history,
since it can be identified with the ancient lex talionis or an eye for an eye. A
recent mathematical treatment of its strategic properties is given in Taylor
(1976). Discussions of its successful performance with human subjects include
Oskamp (1971) and Wilson (1971).
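
A minimal Python sketch of this rule, together with a small match loop, may make the description concrete. The function names, the "C"/"D" encoding, the `always_defect` opponent, and `play_match` are illustrative assumptions. The payoffs (T = 5, R = 3, P = 1, S = 0) and the 200-move game length are assumed here to be the tournament's values. The original entry was written in FORTRAN, and this is not a transcription of it.

```python
# Payoffs to (first player, second player) for each pair of moves.
PAYOFFS = {
    ("C", "C"): (3, 3),  # R: reward for mutual cooperation
    ("C", "D"): (0, 5),  # S for the cooperator, T for the defector
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),  # P: punishment for mutual defection
}

def tit_for_tat(own_history, other_history):
    """Cooperate first, then repeat the other player's previous move."""
    return "C" if not other_history else other_history[-1]

def always_defect(own_history, other_history):
    """Hypothetical opponent, used only for illustration."""
    return "D"

def play_match(rule_a, rule_b, moves=200):
    """Play two rules against each other and return their total scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(moves):
        move_a = rule_a(hist_a, hist_b)
        move_b = rule_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

print(play_match(tit_for_tat, tit_for_tat))
print(play_match(tit_for_tat, always_defect))
```

Under these assumptions two copies of the rule cooperate throughout, and against an unconditional defector the rule is exploited only on the first move.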

Second Place with 500.4 points is a program of 41 lines by T. Nicolas TIDEMAN
and Paula CHIERUZZI of the Department of Economics, Virginia
Polytechnic Institute and State University. This rule begins with cooperation
and tit for tat. However, when the other player finishes his second run of defec-
tions, an extra punishment is instituted, and the number of punishing defections
is increased by one with each run of the other’s defections. The other player
is given a fresh start if he is 10 or more points behind, if he has not just started a
run of defections, if it has been at least 20 moves since a fresh start, if there are
at least 10 moves remaining, and if the number of defections differs from a
50-50 random generator by at least 3.0 standard deviations. A fresh start in-
volves two cooperations and then play as if the game had just started. The
program defects automatically on the last two moves.
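
The five conditions for a fresh start can be written as a single predicate. The Python sketch below is one reading of the description, not the 41-line program itself. The function name and arguments are hypothetical. "Just started a run of defections" is assumed to mean that the other player's latest move is a defection that follows a cooperation. The statistical test is assumed to use a binomial model, under which a 50-50 random player produces n/2 defections in n moves with a standard deviation of sqrt(n)/2.

```python
import math

def deserves_fresh_start(own_score, other_score, other_history,
                         moves_since_fresh_start, moves_remaining):
    """Return True when all five stated conditions for a fresh start hold."""
    n = len(other_history)
    if n == 0:
        return False
    # 1. The other player is 10 or more points behind.
    far_behind = own_score - other_score >= 10
    # 2. The other player has not just started a run of defections
    #    (assumed: its latest move is not a D that follows a C).
    just_started_run = (other_history[-1] == "D"
                        and (n == 1 or other_history[-2] == "C"))
    # 3. At least 20 moves have passed since the last fresh start.
    rested = moves_since_fresh_start >= 20
    # 4. At least 10 moves remain in the game.
    enough_left = moves_remaining >= 10
    # 5. The defection count lies at least 3.0 standard deviations from the
    #    count expected of a 50-50 random player (assumed binomial model).
    defections = other_history.count("D")
    deviation = abs(defections - n / 2) / (math.sqrt(n) / 2)
    not_random = deviation >= 3.0
    return (far_behind and not just_started_run and rested
            and enough_left and not_random)

# Hypothetical situation: 36 defections followed by 4 cooperations.
history = list("D" * 36 + "C" * 4)
print(deserves_fresh_start(100, 85, history, 25, 100))
```

Requiring all five conditions at once is what makes the rule "no pushover": in this sketch, a player who looks random, or who has only recently been forgiven, is refused a fresh start even when it is far behind.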
Third Place with 485.5 points is a 23-line program by Rudy NYDEGGER
of the Department of Psychology, Union College, Schenectady, New York.
The program begins with tit for tat for the first three moves, except that if it
was the only one to cooperate on the first move and the only one to defect on
the second move, it defects on the third move. After the third move, its choice
is determined from the 3 preceding outcomes in the following manner. Let A
be the sum formed by counting the other’s defection as 2 points and one’s own
as 1 point, and giving weights of 16, 4, and 1 to the preceding three moves in
chronological order. The choice can be described as defecting only when A
equals 1, 6, 7, 17, 22, 23, 26, 29, 30, 31, 33, 38, 39, 45, 49, 54, 55, 58, or 61. Thus
if all three preceding moves are mutual defection, A = 63 and the rule cooperates.
This rule was designed for use in laboratory experiments as a stooge which had
a memory and appeared to be trustworthy, potentially cooperative, but not
gullible (Nydegger, 1978).
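
The computation of A can be sketched in a few lines of Python. The function names and the "C"/"D" encoding are hypothetical, and the sketch assumes that "chronological order" means the oldest of the three moves receives weight 16 and the most recent receives weight 1. Only the choice after the third move is covered, not the special handling of the first three moves.

```python
# Values of A for which the rule defects, as listed in the description.
DEFECT_VALUES = {1, 6, 7, 17, 22, 23, 26, 29, 30, 31,
                 33, 38, 39, 45, 49, 54, 55, 58, 61}

def outcome_points(own_move, other_move):
    """The other's defection counts 2 points, one's own defection counts 1."""
    return (1 if own_move == "D" else 0) + (2 if other_move == "D" else 0)

def nydegger_a(last_three):
    """last_three: three (own, other) move pairs, oldest first (assumed order)."""
    weights = (16, 4, 1)
    return sum(w * outcome_points(own, other)
               for w, (own, other) in zip(weights, last_three))

def nydegger_choice(last_three):
    """Defect only when A falls in the listed set of values."""
    return "D" if nydegger_a(last_three) in DEFECT_VALUES else "C"

mutual_defection = [("D", "D")] * 3
print(nydegger_a(mutual_defection), nydegger_choice(mutual_defection))
```

Each outcome contributes between 0 and 3 points, so A is in effect a base-4 encoding of the last three outcomes and ranges from 0 to 63. Three mutual defections give 16(3) + 4(3) + 1(3) = 63, which is not in the list, so the rule cooperates as the text states.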
Fourth Place with 481.9 points is an eight-line program by Bernard GROF-
MAN of the Public Policy Research Organization, University of California,
Irvine. If the players did different things on the previous move, this rule cooper-
ates with probability 2/7. Otherwise this rule always cooperates.
Fifth Place with 480.7 points is a sixteen-line program by Martin SHUBIK
of the Department of Economics, Yale University. This rule cooperates until
the other defects, and then defects once. If the other defects again after the
rule’s cooperation is resumed, the rule defects twice. In general, the length of
retaliation is increased by one for each departure from mutual cooperation.
This rule is described with its strategic implications in Shubik (1970). Further
treatment of it is given in Taylor (1976).
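
The escalating retaliation can be sketched as a small stateful rule. The Python below is an illustration under stated assumptions, not the sixteen-line entry: the class and helper names are hypothetical, a "departure from mutual cooperation" is taken to be a defection by the other player on a move where this rule cooperated, and defections by the other player during a retaliation are assumed to be ignored.

```python
class Shubik:
    """Escalating retaliation: the k-th provocation is answered with k defections."""

    def __init__(self):
        self.provocations = 0      # departures from mutual cooperation so far
        self.retaliation_left = 0  # defections still owed in the current retaliation

    def move(self, other_last, own_last):
        """Choose a move given both players' previous moves (None on the first move)."""
        if self.retaliation_left > 0:
            self.retaliation_left -= 1
            return "D"
        # Assumed trigger: the other defected while this rule was cooperating.
        if other_last == "D" and own_last == "C":
            self.provocations += 1
            self.retaliation_left = self.provocations - 1
            return "D"
        return "C"

def respond_to(rule, other_moves):
    """Moves the rule makes against a fixed, scripted sequence of opponent moves."""
    own = []
    for i in range(len(other_moves)):
        other_last = other_moves[i - 1] if i > 0 else None
        own_last = own[-1] if own else None
        own.append(rule.move(other_last, own_last))
    return "".join(own)

print(respond_to(Shubik(), "CDCCDCCCC"))
```

Against this hypothetical script, the first defection draws one defection in reply and the second draws two, after which cooperation resumes.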
Sixth Place with 477.8 is a fifty-line program by William STEIN of the
Mathematics Department, Texas Christian University and Amnon RAPOPORT
of the Department of Psychology, University of North Carolina. This
rule plays tit for tat except that it cooperates on the first four moves, it defects
on the last two moves, and every fifteen moves it checks to see if the opponent
seems to be playing randomly. This check uses a chi-squared test of the other’s
transition probabilities and also checks for alternating moves of CD and DC.

Seventh Place with 473.4 points is a thirteen-line program by James W.
FRIEDMAN of the Department of Economics, University of Rochester. This
rule cooperates until the other player defects, and then defects until the end of
the game. This strategy was described in the context of the Prisoner’s Dilemma
by Harris (1969). Its properties in a broader class of games have been developed
by Friedman (1971).
Eighth Place with 471.8 points is a six-line program by Morton DAVIS of the
Department of Mathematics, City College, CUNY. This rule cooperates on the
first ten moves, and then if there is a defection it defects until the end of the game.
Ninth Place with 400.7 points is a 63-line program by James GRAASKAMP,
an undergraduate at Beloit College. This rule plays tit for tat for 50 moves,
defects on move 51, and then plays 5 more moves of tit for tat. A check is then
made to see if the player seems to be RANDOM, in which case it defects from
then on. A check is also made to see if the other is TIT FOR TAT, ANALOGY
(a program from the preliminary tournament), or its own twin, in which case
it plays tit for tat. Otherwise it randomly defects every 5 to 15 moves, hoping
that enough trust has been built up so that the other player will not notice these
defections.
Tenth Place with 390.6 points is a 33-line program based on an idea sub-
mitted by Leslie DOWNING of the Department of Psychology, Union College,
Schenectady, New York. This rule selects its choice to maximize its own long-
term expected payoff on the assumption that the other rule cooperates with a
fixed probability which depends only on whether the other player cooperated
or defected on the previous move. These two probability estimates are con-
tinuously updated as the game progresses. Initially, they are both assumed to
be .5, which amounts to the pessimistic assumption that the other player is not
responsive. This rule is based on an outcome maximization interpretation of
human performances proposed by Downing (1975).
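
Why the initial estimates of .5 lead to defection can be shown with a short calculation. The Python sketch below is a deliberate simplification and not the 33-line program: it compares only two stationary policies, always cooperating and always defecting, and the function names are hypothetical. The payoffs are assumed to be the tournament's values (T = 5, R = 3, P = 1, S = 0). The second call uses the first-move response rates reported in the main text (83% and 29%) as if they were stable probabilities, which is also an assumption.

```python
T, R, P, S = 5, 3, 1, 0  # assumed tournament payoffs

def expected_payoffs(p_coop_after_c, p_coop_after_d):
    """Expected per-move payoff of always cooperating and of always defecting,
    given the other's probability of cooperating after each of our moves."""
    always_c = p_coop_after_c * R + (1 - p_coop_after_c) * S
    always_d = p_coop_after_d * T + (1 - p_coop_after_d) * P
    return always_c, always_d

def preferred_move(p_coop_after_c, p_coop_after_d):
    """Pick the policy with the higher expected payoff (defect on ties)."""
    always_c, always_d = expected_payoffs(p_coop_after_c, p_coop_after_d)
    return "C" if always_c > always_d else "D"

print(preferred_move(0.5, 0.5))    # pessimistic initial beliefs
print(preferred_move(0.83, 0.29))  # beliefs matching the observed responsiveness
```

With both probabilities at .5, defecting is expected to yield 3 points per move against 1.5 for cooperating, so the pessimistic prior recommends defection. With the observed rates, cooperating is expected to yield about 2.49 against about 2.16, which is consistent with the finding that more optimistic initial beliefs would have served the rule better.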
Eleventh Place with 327.6 points is a six-line program by Scott FELD of
the Department of Sociology, University of California, Riverside. This rule
starts with tit for tat and gradually lowers its probability of cooperation fol-
lowing the other’s cooperation to .5 by the two hundredth move. It always
defects after a defection by the other.
Twelfth Place with 304.4 points is a five-line program by Johann JOSS of
the Eidgenossishe Technische Hochschule, Zurich. This rule cooperates 90%
of the time after a cooperation by the other. It always defects after a defection
by the other.
Thirteenth Place with 300.5 points is an eighteen-line program entered by
Gordon TULLOCK of the Center for Study of Public Choice, Virginia Poly-
technic Institute and State University. This rule cooperates on the first eleven
moves. It then cooperates 10% less than the other player has cooperated on the
preceding ten moves. This rule is based on an idea developed in Overcast and
Tullock (1971). Professor Tullock was invited to specify how the idea could be
implemented; and he did so out of scientific interest rather than an expectation
that it would be a likely winner.
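
The cooperation probability described here can be sketched as follows. The function name is hypothetical, and the phrase "10% less" is read as ten percentage points below the other player's cooperation rate, with a floor at zero. A reading as a 10% relative reduction is also possible, and the eighteen-line entry may have differed from this Python sketch in other details.

```python
def tullock_cooperation_probability(other_history):
    """Probability that the rule cooperates on the next move."""
    if len(other_history) < 11:
        return 1.0  # unconditional cooperation on the first eleven moves
    recent = other_history[-10:]
    rate = recent.count("C") / 10
    # Assumption: "10% less" means ten percentage points, floored at zero.
    return max(0.0, rate - 0.10)

# Hypothetical history: six cooperations, then five defections by the other player.
print(tullock_cooperation_probability(["C"] * 6 + ["D"] * 5))
```

Because the rule always cooperates somewhat less than the other side has just done, a partner who echoes its defections lowers the rate on which the next probability is based. This is the amplification that the main text describes.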
Fourteenth Place with 282.2 points is a 77-line program by a graduate
student of political science whose dissertation is in game theory. This rule has
a probability of cooperating, P, which is initially 30% and is updated every 10
moves. P is adjusted if the other player seems random, very cooperative, or
very uncooperative. P is also adjusted after move 130 if the rule has a lower
score than the other player. Unfortunately, the complex process of adjustment
frequently left the probability of cooperation in the 30% to 70% range, and
therefore the rule appeared random to many other players.
Fifteenth Place with 276.3 points was the five-line RANDOM program
which cooperates and defects with equal probability.

Postscript: A second round of the Computer Tournament for the
Iterated Prisoner's Dilemma has now been conducted. The results will
be reported in this journal later this year.


