Axelrod 1980
ROBERT AXELROD
Institute of Public Policy Studies
University of Michigan
This is a "primer" on how to play the iterated Prisoner's Dilemma game effectively.
Existing research approaches offer the participant limited help in understanding how to
cope effectively with such interactions. To gain a deeper understanding of how to be
effective in such a partially competitive and partially cooperative environment, a computer
tournament was conducted for the iterated Prisoner's Dilemma. Decision rules were
submitted by entrants who were recruited primarily from experts in game theory from
a variety of disciplines: psychology, political science, economics, sociology, and
mathematics. The results of the tournament demonstrate that there are subtle reasons for an
individualistic pragmatist to cooperate as long as the other side does, to be somewhat
forgiving, and to be optimistic about the other side's responsiveness.
INTRODUCTION
AUTHOR'S NOTE: I would like to thank those who made this project possible, the
participants whose names and affiliations are given in the Appendix. I would also like to
thank Jeff Pynnonen, my research assistant, for helping to prepare the executive program
for the tournament. And for their helpful suggestions concerning the design and analysis
of the tournament, I would like to thank John Chamberlin, Michael Cohen, and Serge
Taylor. For its support of this research, I owe thanks to the Institute of Public Policy
Studies of the University of Michigan.
TABLE 1
Payoff Matrix for Each Move of the Prisoner’s Dilemma
                          Column Player
                          Cooperate    Defect
Row Player   Cooperate    3, 3         0, 5
             Defect       5, 0         1, 1
NOTE: The payoff to the row player is listed first.
bilities. It would also have to take into account two important facts
about strategic interaction in a non-zero sum setting. First, what is
effective is likely to depend not only upon the characteristics of a
particular strategy, but also upon the nature of the other strategies with
which it must interact. The second point follows directly from the
first. An effective strategy must be able at any point to take into account
the history of the interaction as it has developed so far.
A computer tournament for the study of effective choice in the
iterated Prisoner’s Dilemma meets these needs. In a computer
tournament, each entrant writes a program which embodies a decision rule
to select the cooperative or noncooperative choice on each move. The
program has available to it the history of the game so far, and may use
this history in making a choice. Because the participants are recruited
primarily from those who have written on game theory and especially
the Prisoner’s Dilemma, the entrants are assured that their decision
rule will be facing rules of other experts. Such recruitment also
guarantees that the state of the art is represented in the tournament.
Just such a tournament was conducted. The results contain some real
surprises. These surprises in turn offer new insights into the questions
of how to understand and how to cope with an environment which
contains aspects of the Prisoner’s Dilemma.
THE TOURNAMENT
The Winner
The tournament was run as a round robin so that each entry was
paired with each other entry. As announced in the rules of the
tournament, each entry was also paired with its own twin and with RANDOM,
a program that randomly cooperates and defects with equal probability.
Each game consisted of exactly 200 moves. The payoff matrix for
each move awards both players 3 points for mutual cooperation, and
1 point for mutual defection. If one player defects while the other player
cooperates, the defecting player receives 5 points and the cooperating
player receives 0 points (see Table 1).
No entry was disqualified for exceeding the allotted time. In fact,
the entire round robin tournament was run five times to get a more
stable estimate of the scores for each pair of players. In all, there were
120,000 moves, making for 240,000 separate choices.
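
The move count above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the placeholder names `ENTRY_1` ... `ENTRY_14` are hypothetical, and it assumes that RANDOM, like every entry, also played its own twin, which is what makes the total come out to 120 pairings.

```python
from itertools import combinations_with_replacement

N_ENTRIES = 14        # submitted entries, as stated in the text
MOVES_PER_GAME = 200  # each game had exactly 200 moves
REPETITIONS = 5       # the round robin was run five times

# Hypothetical placeholder names for the 14 entries, plus RANDOM.
rules = [f"ENTRY_{i}" for i in range(1, N_ENTRIES + 1)] + ["RANDOM"]

# Every unordered pairing, including each rule with its own twin
# (assumption: RANDOM also meets its own twin).
pairings = list(combinations_with_replacement(rules, 2))

total_moves = len(pairings) * MOVES_PER_GAME * REPETITIONS
total_choices = 2 * total_moves  # two players choose on every move

print(len(pairings), total_moves, total_choices)
```

With 15 rules there are 15 x 16 / 2 = 120 pairings, which gives the 120,000 moves and 240,000 choices reported in the text.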
The Contestants
The fourteen submitted entries came from three countries and five
disciplines. The Appendix lists the names and affiliations of the sixteen
people who submitted these entries, and it gives the rank and score
of their entries.
One remarkable aspect of the tournament is that it allowed people
from different disciplines to interact with each other in a common
format and language. Most of the entrants were recruited from those
who had published articles on game theory in general or the Prisoner’s
Dilemma in particular.
Analysis of the results shows that neither the discipline of the author,
the brevity of the program, nor its length accounts for a rule’s relative
success. What does?
Before answering this question, a remark on the interpretation of
numerical scores is in order. A useful benchmark for very good
performance is 600 points, which is equivalent to the score attained by a
player when both sides always cooperate with each other. A useful
benchmark for very poor performance is 200 points, which is equivalent
to the score attained by a player when both sides never cooperate with
each other. As we shall see, most scores range between 200 and 600
points, although scores from 0 to 1000 points are possible. The winner,
TIT FOR TAT, averaged 504 points per game.
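
These benchmarks follow directly from the payoff matrix and the 200-move game length. The following is a minimal sketch of a game loop that reproduces them. It is not the tournament's executive program: the function names and the `(own_history, other_history)` rule signature are assumptions made for illustration.

```python
# Payoffs to (first player, second player) for each pair of choices,
# using the values given in the text: 3/3, 0/5, 5/0, 1/1.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def always_cooperate(own_history, other_history):
    """Illustrative rule: cooperate on every move."""
    return "C"


def always_defect(own_history, other_history):
    """Illustrative rule: defect on every move."""
    return "D"


def play_game(rule_a, rule_b, moves=200):
    """Play one game and return (score_a, score_b)."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(moves):
        # Each rule sees the history so far, its own moves first.
        choice_a = rule_a(hist_a, hist_b)
        choice_b = rule_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFF[(choice_a, choice_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(choice_a)
        hist_b.append(choice_b)
    return score_a, score_b


print(play_game(always_cooperate, always_cooperate))  # mutual cooperation
print(play_game(always_defect, always_defect))        # mutual defection
print(play_game(always_cooperate, always_defect))     # the two extremes
```

Mutual cooperation yields 200 x 3 = 600 points each, mutual defection 200 x 1 = 200 each, and the extremes of 0 and 1000 arise when one side always cooperates against a side that always defects.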
Niceness
separated the more successful rules from the less successful rules in this
Computer Prisoner’s Dilemma Tournament.
Each of the nice rules got about 600 points with each of the 7 other
nice rules and with its own twin. This is because when 2 nice rules play,
they are sure to cooperate with each other until virtually the end of
the game. Actually the minor variations in end-game tactics did not
account for much variation in the scores.
The scores of each player with each of the others are displayed in
Table 2. Notice that the numbers in the upper left are all near 600, due
to the fact that the top ranking entries are all nice and therefore
cooperate with each other.
The bottom left section of Table 2 shows that the rules which are not
nice rarely got more than 400 points with any of the nice rules.
Therefore, the nice rules had a tremendous advantage in that they cooperated
with each other, but were not very cooperative with the rules which
were the first to defect.
So far we have seen that niceness accounts for which rules are in the
top eight ranks and which rules are below. But what accounts for the
relative ranking among the top eight?
Since the nice rules all got within a few points of 600 with each other,
the thing which distinguishes the relative rankings among the nice
rules is their scores with the rules which are not nice. This much is
obvious.
What is not obvious is that the relative ranking of the eight top rules
was largely determined by just two of the other seven rules. These two
rules are kingmakers because they do not do so well for themselves,
but they largely determine the rankings among the top contenders.
The two kingmakers are GRAASKAMP and DOWNING. (Entries
are indicated by capital letters. All entries except the familiar TIT FOR
TAT are named after their authors.)
DOWNING is the most important kingmaker because the range of
scores achieved with it by the nice rules was the largest of any rule.
The scores of the nice rules playing DOWNING ranged from a high
of 601 by TIDEMAN AND CHIERUZZI to a low of 158 by
NYDEGGER.
DOWNING is a particularly interesting rule in its own right, because,
unlike most of the others, its logic is not based on a variant of TIT FOR
TAT. DOWNING assumes that the other side bases its choice on
DOWNING's own previous choice. DOWNING estimates the
probability that the other player cooperates after it (DOWNING)
cooperates, and also the probability that the other player cooperates after
DOWNING defects. Each move, it updates its estimate of these two
conditional probabilities and then selects the choice which will
maximize its own long-term payoff under the assumption that it has correctly
modeled the other player. The decision rule is based on an "outcome
maximization" principle originally developed as a possible
interpretation of what human subjects do in Prisoner's Dilemma laboratory
experiments (Downing, 1975). If the two conditional probabilities
have similar values, DOWNING determines that it pays to defect, since
the other player seems to be doing the same thing whether DOWNING
cooperates or not. Conversely, if the other player tends to cooperate
after a cooperation but not after a defection by DOWNING, then the
other player seems responsive, and DOWNING will calculate that the
best thing to do with a responsive player is to cooperate. Under certain
circumstances, DOWNING will even determine that the best strategy
is to alternate cooperation and defection.
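
One way to make this reasoning concrete is the sketch below. It is an assumption-laden reading, not Downing's actual program: it interprets "long-term payoff" as the expected payoff per move of three fixed plans (always cooperate, always defect, alternate), and the function name and the labels R, S and T are introduced here for illustration.

```python
# Payoffs from the text: R = mutual cooperation, S = cooperating against a
# defector, T = defecting against a cooperator, P = mutual defection.
R, S, T, P = 3, 0, 5, 1


def downing_style_choice(p_after_c, p_after_d):
    """Pick the plan with the best expected payoff per move.

    p_after_c: estimated probability the other cooperates after I cooperate.
    p_after_d: estimated probability the other cooperates after I defect.
    Returns "C" (always cooperate), "D" (always defect) or "ALT" (alternate).
    """
    # If I always cooperate, the other is always reacting to a cooperation.
    always_c = p_after_c * R + (1 - p_after_c) * S
    # If I always defect, the other is always reacting to a defection.
    always_d = p_after_d * T + (1 - p_after_d) * P
    # If I alternate, my cooperations meet reactions to my defections,
    # and my defections meet reactions to my cooperations.
    coop_turn = p_after_d * R + (1 - p_after_d) * S
    defect_turn = p_after_c * T + (1 - p_after_c) * P
    alternate = (coop_turn + defect_turn) / 2
    plans = {"C": always_c, "D": always_d, "ALT": alternate}
    return max(plans, key=plans.get)


print(downing_style_choice(0.5, 0.5))  # other looks unresponsive
print(downing_style_choice(0.9, 0.1))  # other looks responsive
print(downing_style_choice(1.0, 0.5))  # a case where alternating scores best
```

Under this reading, equal conditional probabilities always favor defection, a clearly responsive opponent favors cooperation, and intermediate cases such as (1.0, 0.5) favor alternation, which matches the three behaviors described in the text.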
At the start of a game, DOWNING does not know the values of these
conditional probabilities for the other players. It assumes that they are
both .5, but gives no weight to this estimate when information actually
does come in during the play of the game.
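
The handling of the initial estimates can be sketched as follows. This is an illustration under an assumption: it reads "gives no weight to this estimate" as meaning that the .5 prior is used only while no observations exist, and is replaced by the plain observed frequency afterwards. The class and method names are hypothetical.

```python
class ResponseEstimate:
    """Tracks how often the other player cooperates after each of my choices."""

    def __init__(self):
        # counts[my_previous_choice] = [times other cooperated, observations]
        self.counts = {"C": [0, 0], "D": [0, 0]}

    def observe(self, my_previous_choice, other_choice):
        """Record the other player's choice following my previous choice."""
        record = self.counts[my_previous_choice]
        if other_choice == "C":
            record[0] += 1
        record[1] += 1

    def probability(self, my_previous_choice):
        """Estimated probability that the other cooperates after my choice."""
        cooperated, total = self.counts[my_previous_choice]
        if total == 0:
            return 0.5  # initial assumption, dropped once data arrives
        return cooperated / total


estimate = ResponseEstimate()
print(estimate.probability("C"), estimate.probability("D"))  # both start at .5
estimate.observe("D", "D")  # the other defected after my defection
print(estimate.probability("D"))
```

Because both estimates start at the same value, the other player initially looks unresponsive.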
This is a fairly sophisticated decision rule, but its implementation
does have one flaw. By initially assuming that the other player is
unresponsive, DOWNING is doomed to defect on the first two moves.
These first two defections led many other rules to punish DOWNING,
so things usually got off to a bad start. But this is precisely why
DOWNING served so well as a kingmaker. First ranking TIT FOR TAT
and second ranking TIDEMAN AND CHIERUZZI both reacted in
such a way that DOWNING learned to expect that defection does
not pay but that cooperation does. All of the other nice rules went
downhill with DOWNING.
The other kingmaker is GRAASKAMP. This is a very clever
program. It plays tit for tat for 50 moves, defects once, plays tit for tat for
another 5 moves, and then examines the history of the game so far.
Its defection on move 51 allows it to recognize its own twin and be
cooperative with it. Similarly, it can check to see if the other player
TABLE 3
Illustrative Game Between TIT FOR TAT and JOSS
moves 1- 20 11111 23232 32323 23232
moves 21- 40 32324 44444 44444 44444
moves 41- 60 44444 44444 44444 44444
moves 61- 80 44444 44444 44444 44444
moves 81-100 44444 44444 44444 44444
moves 101-120 44444 44444 44444 44444
moves 121-140 44444 44444 44444 44444
moves 141-160 44444 44444 44444 44444
moves 161-180 44444 44444 44444 44444
moves 181-200 44444 44444 44444 44444
Score in this game: TIT FOR TAT 236, JOSS 241
Legend: 1 both cooperated
2 TIT FOR TAT only cooperated
3 JOSS only cooperated
4 neither cooperated
other cooperated and 0 when the other defected). This is not too much
below the average of 3 points per turn when both were cooperating.
However, worse was to come. JOSS was still cooperating on moves
7, 9, 11, 13, and so on. But on the twenty-fifth move, JOSS selected
another of its probabilistic defections. Of course TIT FOR TAT de-
fected on move 26 and another reverberating echo began. This echo
had JOSS defecting on the odd numbered moves and TIT FOR TAT
defecting on the even numbered moves. Together these two echoes
resulted in both players defecting on every move after move 25. This
meant that for the next 175 moves they both got only one point per
turn. The final score of this game was 236 for TIT FOR TAT and 241
for JOSS. Notice that while JOSS did a little better than TIT FOR TAT,
both did poorly.*
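
The final scores can be recomputed from the outcome codes in Table 3 and the payoff values given in the text. The short sketch below does only that tally. The variable names are illustrative.

```python
# Outcome codes for moves 1-200, copied row by row from Table 3.
rows = [
    "11111 23232 32323 23232",  # moves 1-20
    "32324 44444 44444 44444",  # moves 21-40
] + ["44444 44444 44444 44444"] * 8  # moves 41-200
codes = "".join(rows).replace(" ", "")

# (TIT FOR TAT payoff, JOSS payoff) for each legend code.
PAYOFF_BY_CODE = {
    "1": (3, 3),  # both cooperated
    "2": (0, 5),  # TIT FOR TAT only cooperated
    "3": (5, 0),  # JOSS only cooperated
    "4": (1, 1),  # neither cooperated
}

tit_for_tat_score = sum(PAYOFF_BY_CODE[c][0] for c in codes)
joss_score = sum(PAYOFF_BY_CODE[c][1] for c in codes)
print(len(codes), tit_for_tat_score, joss_score)
```

The tally gives 236 for TIT FOR TAT and 241 for JOSS over 200 moves, in agreement with the scores reported for this game.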
2. In the five games between them, the average scores were 225 for TIT FOR TAT
and 230 for JOSS.
3. As might be expected, the results could depend on the payoff matrix. For example,
if P is changed from 1 to 2, then TIDEMAN AND CHIERUZZI would have come in
first, followed by STEIN AND RAPOPORT, FRIEDMAN, and TIT FOR TAT. Of
course, the entrants knew the actual payoff matrix. Had the announced payoff matrix
been different, some of them might have submitted different programs, and this would
have affected everyone's score.
4. TULLOCK was also involved in a cycle of scores. TULLOCK did about fifty
points better than SHUBIK in their games together. And SHUBIK did about ten points
better than DOWNING in their games together. But DOWNING did fifty points better
(a) The sample program sent to prospective contestants to show them how to make
a submission would in fact have won the tournament if anyone had simply clipped
it and mailed it in! But no one did. This rule is to defect only if the other player
defected on the previous two moves. It is a more forgiving version of TIT FOR TAT
in that it does not punish isolated defections. The excellent performance of the TIT
FOR TWO TATS rule highlights the fact that a common error of the contestants
was to expect that gains could be made from being relatively less forgiving than
TIT FOR TAT, whereas in fact there were big gains to be made from being even
more forgiving. The implication of this finding is striking, since it suggests that
expert strategists do not give sufficient weight to the importance of forgiveness.
(b) Another rule which would have won the tournament was also available to most
of the contestants. This was the rule which won the preliminary tournament, a
report of which was used in recruiting the contestants (Axelrod, 1978). Called
LOOK AHEAD, it was inspired by the tree searching techniques used in the
artificial intelligence programs to play chess. It is interesting that artificial
intelligence techniques could have inspired a rule which was in fact better than any
of the rules designed by game theorists specifically for the Prisoner's Dilemma.
(c) A third rule which would have won the tournament was a slight modification of
DOWNING. If DOWNING had started with initial beliefs that the other players
would be responsive rather than unresponsive, it too would have won and won
by a large margin. A kingmaker could have been king. DOWNING's initial
beliefs about the other players were pessimistic. It turned out that optimism
about their responsiveness would not only have been more accurate but would also
have led to more successful performance. It would have resulted in first place
rather than tenth place.
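
The rule described in item (a), TIT FOR TWO TATS, can be written down directly from its definition. The sketch below is an illustration in Python rather than the sample program actually sent to contestants. The `(own_history, other_history)` signature is an assumption.

```python
def tit_for_two_tats(own_history, other_history):
    """Defect only if the other player defected on both of the last two moves."""
    last_two = list(other_history[-2:])
    if last_two == ["D", "D"]:
        return "D"
    return "C"


# An isolated defection goes unpunished ...
print(tit_for_two_tats(["C", "C"], ["C", "D"]))
# ... but two defections in a row draw a defection in reply.
print(tit_for_two_tats(["C", "C"], ["D", "D"]))
```

Because a single defection never triggers retaliation, a rule of this kind cannot be drawn into the alternating echo of defections that a one-move retaliator can fall into after an isolated defection.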
Appendix
THE DECISION RULES
First Place with 504.5 points is TIT FOR TAT submitted by Anatol
Rapoport of the Department of Psychology, University of Toronto. This rule is
only four lines long in FORTRAN. It cooperates on the first move, and then
does whatever the other player did on the previous move. It has a long history,
since it can be identified with the ancient lex talionis or an eye for an eye. A
recent mathematical treatment of its strategic properties is given in Taylor
(1976). Discussions of its successful performance with human subjects include
Oskamp (1971) and Wilson (1971).
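
The rule is just as short in a modern language. The following Python sketch is an illustration of the rule as described above, not a transcription of the submitted FORTRAN. The `(own_history, other_history)` signature is an assumption.

```python
def tit_for_tat(own_history, other_history):
    """Cooperate first, then repeat the other player's previous move."""
    if not other_history:
        return "C"  # first move: cooperate
    return other_history[-1]


print(tit_for_tat([], []))                    # first move
print(tit_for_tat(["C"], ["D"]))              # answers a defection in kind
print(tit_for_tat(["C", "D"], ["D", "C"]))    # returns to cooperation
```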
REFERENCES