Abstract—We modeled the card game Coup as a Markov Decision Process and attempted to solve it using various methods learned in CS238. Due to our large state space, we focused on online methods. Since Coup is a multi-agent game, we generated optimal policies against players with specific strategies. We first modeled the game as an MDP where we knew everything about the game state and developed policies against a player taking random actions. We used Forward Search, Sparse Sampling, and Monte Carlo Tree Search. We then modeled the game as a POMDP with state uncertainty where we did not know our opponents' cards. We implemented Monte Carlo Tree Search, Sparse Sampling, and Forward Search with both incomplete and complete information. Finally, to try to beat our Monte Carlo Tree Search player, we implemented Forward Search with Discrete State Filtering for updating our belief.

Index Terms—MDP, POMDP, Coup, multi-agent

I. INTRODUCTION

Fig. 1. Coup Rules

...the game is to deceive the other players by lying and claiming you have whatever card suits you best. Because lying can give a significant advantage to the player, the other players try to determine when a player is lying and call bluff. If they call bluff and you were not lying, the player who called bluff must flip over one of their cards (and cannot use it anymore). If the other players catch you lying, you must flip over one of your cards.

B. Sources of Uncertainty

There are several sources of uncertainty in the game:
• Players are uncertain what roles (cards) other players have until they are eliminated
• Players are uncertain what actions/claims their opponents will make

C. Related Work

To the best of our knowledge, there isn't any previous work that has tried to compute an optimal policy or online planning strategies for the board game Coup. We review work done on related games here. A similar game, One Night Werewolf, has the objective of discerning which players are lying. It was a topic at the Computer Games Workshop at IJCAI, where Monte Carlo Tree Search (MCTS), reinforcement learning, alpha-beta search, and nested rollout policy adaptation were discussed [1]; the report notes that the most popular methods were MCTS and deep learning. Similarly, in our project we use MCTS to decide the best action from a given game state.
II. MODELING COUP
We can represent Coup as an MDP with the following
states, actions, transitions, and rewards.
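
For concreteness, the following is a minimal sketch (in Python) of how such a state and reward could be encoded. The field names, action list, and reward values here are illustrative assumptions for exposition, not necessarily the representation used in our implementation.

    from dataclasses import dataclass
    from typing import Tuple

    # Illustrative encoding of a Coup game state; names and fields are
    # assumptions for exposition, not the project's actual code.
    ROLES = ("Duke", "Assassin", "Captain", "Ambassador", "Contessa")

    @dataclass(frozen=True)
    class PlayerState:
        coins: int                 # coins held by the player
        cards: Tuple[str, ...]     # hidden influence cards, e.g. ("Duke", "Captain")
        revealed: Tuple[str, ...]  # face-up (lost) influence cards

    @dataclass(frozen=True)
    class CoupState:
        players: Tuple[PlayerState, ...]  # one entry per player
        turn: int                         # index of the player to act

    # A subset of the actions available on a turn; claims may be bluffs.
    ACTIONS = ("income", "foreign_aid", "coup", "tax",
               "assassinate", "steal", "exchange")

    def reward(state: CoupState, player: int) -> float:
        """+1 if `player` is the last one with influence, -1 if eliminated, else 0."""
        alive = [i for i, p in enumerate(state.players) if p.cards]
        if not state.players[player].cards:
            return -1.0
        return 1.0 if alive == [player] else 0.0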
In Table 5, we have an MCTS agent with access to the partially observable state competing against two random agents. The same reasoning for the variation in speed and win percentage holds in Tables 4 and 5 as in Tables 2 and 3: MCTS agents take less time than Sparse Sampling to choose an action but have a reduced win percentage.
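
This speed gap is inherent to the algorithms: Sparse Sampling expands every action n times at every level of a depth-d lookahead tree, making on the order of (|A| * n)^d generative-model calls per decision, whereas MCTS spends a fixed budget of simulations. Below is a generic sketch of Sparse Sampling in Python; the generate model and discount factor are placeholder assumptions, not our implementation.

    def sparse_sampling(s, depth, n, actions, generate, gamma=0.95):
        """Return (best action, value estimate) at state s.

        generate(s, a) -> (next_state, reward) is a generative model of
        the game. Each call expands len(actions) * n children, so the
        total number of model calls grows as (len(actions) * n) ** depth.
        """
        if depth == 0:
            return None, 0.0
        best_action, best_value = None, float("-inf")
        for a in actions:
            total = 0.0
            for _ in range(n):
                s2, r = generate(s, a)
                _, v = sparse_sampling(s2, depth - 1, n, actions, generate, gamma)
                total += r + gamma * v
            q = total / n
            if q > best_value:
                best_action, best_value = a, q
        return best_action, best_value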
TABLE VI
SPARSE SAMPLING COMPLETE VS. 2 SPARSE SAMPLING INCOMPLETE PLAYERS (10 GAMES)

Depth  # samples  Win %
1      1          0
1      5          0
2      1          20
2      2          40
4      1          60
4      5          70

TABLE VII
SPARSE SAMPLING INCOMPLETE VS. 2 SPARSE SAMPLING COMPLETE PLAYERS (10 GAMES)

Depth  # samples  Win %
1      1          0
1      5          0
2      1          20
2      2          30
4      1          10
4      5          10

In Table 6, we have a Sparse Sampling agent with complete state competing against two Sparse Sampling agents with incomplete state. In Table 7, we have the opposite: a Sparse Sampling agent with incomplete state competing against two Sparse Sampling agents with complete state. Notice that if the depth is less than four, then our agent never wins. This is likely because a shallower search does not look far enough ahead to plan winning lines of play.

TABLE IX
MCTS INCOMPLETE VS. 2 MCTS COMPLETE PLAYERS

Depth  # sims  Win %
1      4       0
1      10      0
4      1       0
4      4       0
10     10      10
10     20      0
20     10      40

In Table 8, we have an MCTS agent with complete state information competing against two MCTS agents with incomplete state. In Table 9, we have the opposite: an MCTS agent with incomplete state competing against two MCTS agents with complete state. The same reasoning applies to this pair of tables as to Tables 6 and 7.
For Forward Search with Discrete State Filtering, we were able to implement an initial version, but it ran too slowly for us to gather meaningful results. However, we believe that if we refactored our code to avoid redundant computation, Forward Search with Discrete State Filtering would be a feasible method that would work in practice. We could also explore different approximation methods, such as particle filtering, among others.
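
For reference, the exact belief update behind Discrete State Filtering is compact; what makes it slow is enumerating every reachable state at every step. A generic sketch follows, where the transition and observation_model callables stand in for the game's dynamics and are assumptions for exposition, not our code.

    from collections import defaultdict

    def update_belief(belief, action, observation, transition, observation_model):
        """Exact discrete-state filter:
        b'(s2) proportional to O(o | s2, a) * sum_s T(s2 | s, a) * b(s).

        belief: dict mapping state -> probability
        transition(s, a): dict mapping successor state -> probability
        observation_model(o, s2, a): likelihood of observing o in s2 after a
        """
        new_belief = defaultdict(float)
        for s, p in belief.items():
            if p == 0.0:
                continue
            for s2, t in transition(s, action).items():
                new_belief[s2] += p * t * observation_model(observation, s2, action)
        total = sum(new_belief.values())
        if total == 0.0:
            raise ValueError("observation has zero likelihood under the current belief")
        return {s2: p / total for s2, p in new_belief.items()}

The inner loop visits every (state, successor) pair with nonzero probability, which is exactly the redundant computation a refactor would need to prune or cache.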
IX. DISCUSSION

Because this is a friendly board game with no monetary incentive to win, it is better to sacrifice a little winning probability for a quicker game. Forward Search with a partially observable state is more accurate than any of the methods that use a generative function, but it costs far more time. Therefore, it might be more valuable to use either Sparse Sampling or MCTS to compute the best action to take.
We believed, initially, that we could use offline methods to generate an optimal policy to play the game. We realized that the state space is too large to feasibly iterate through and update all state-action combinations. We then tried to reduce our state space by removing an entire role, creating qualitative classes to represent the quantitative number of coins, and removing the opponents' cards from the state. It turns out that the state space is still orders of magnitude too large to run value iteration or Q-learning.
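
To make the scale concrete, here is a back-of-the-envelope count under simplifying assumptions of our own (three players, five roles, unordered two-card hands, coins capped at 12, and ignoring deck-composition constraints, so it both overcounts some configurations and omits claim history entirely):

    from math import comb

    ROLES = 5
    HANDS = comb(ROLES, 2) + ROLES   # 15 unordered two-card hands
    LOST = 1 + ROLES + HANDS         # 0, 1, or 2 influence cards already revealed
    COINS = 13                       # 0 through 12 coins
    PLAYERS = 3

    per_player = HANDS * LOST * COINS        # 4,095 configurations per player
    total = per_player ** PLAYERS * PLAYERS  # joint configurations x whose turn it is

    print(f"{total:.2e}")                    # about 2.06e11 states

Even this crude bound already rules out tabulating a value for every state-action pair.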
To extend this project, it would be interesting to consider
how we can create players to challenge our player that uses
forward search with particle filtering. Inspired by the level-K
model, if we know that all our opponents are implementing
forward search, can we leverage that knowledge?
X. WORK BREAKDOWN
Adrien is taking this class for 3 units and tackled represent-
ing Coup as an MDP. He also implemented Forward Search,
contributed to debugging Sparse Sampling and MCTS, and
helped analyze the results. David is also taking this class for 3
units and was integral in the development of optimizations for
Forward Search with Discrete State Filtering and in presenting our results in a clear form. Semir is taking the
class for 4 units and combed through the literature. He also
contributed to the initial design of Coup as an MDP and
creating the simulator. He implemented Sparse Sampling and MCTS, and contributed to Forward Search with Discrete State Filtering. He helped outline the report and ran the simulations
to generate the results and interpreted the findings.
REFERENCES
[1] B. Srivastava and G. Sukthankar, "Reports on the 2016 IJCAI workshop series," AI Magazine, vol. 37, no. 4, p. 94, 2016.
[2] M. J. Kochenderfer, C. Amato, G. Chowdhary, J. P. How, H. J. Davison Reynolds, J. R. Thornton, P. A. Torres-Carrasquillo, N. K. Ure, and J. Vian, "State Uncertainty," in Decision Making Under Uncertainty: Theory and Application, MIT Press, 2015.