Games and Search: Lecture 3 - 580667 Artificial Intelligence (4ov / 8op)
4.1 Search problems
●
State space search — find a (shortest) path
from the initial state to the goal state.
●
Constraint satisfaction — find a value
assignment to a set of variables so that given
constraints are met.
●
Combinatorial optimization — find a value
assignment to a set of variables so that an
objective function is minimized (or maximized).
●
Games — find an optimal strategy to beat the
opponent.
4.2 Games of interest
●
Two-person zero-sum games; if one player wins, the
other must lose.
●
Players are adversarial: each tries not only to win but
also to make the opponent lose.
●
Players are called:
– MAX: maximizes its own payoff
– MIN: minimizes the other's (MAX's) payoff
●
No chance involved
●
Complete information
– Available actions
– Payoffs
4.3 Game tree
●
Root of the tree is the starting position, with an indication
of who moves first.
●
Nodes represent possible states of the game.
●
Operators determine the legal moves.
●
Terminal test tells when the game is over.
●
Utility function gives a numeric value to the outcome of the
game.
●
Evaluation function enables the player to estimate if a
given state is good or bad.
●
Moves by two players are represented as alternate levels
in the tree.
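The components listed above can be sketched in code. This is only an illustration under stated assumptions: the game (a pile of stones, take 1 or 2 per move, whoever takes the last stone wins) and all names (`State`, `operators`, `terminal_test`, `utility`, `evaluation`) are hypothetical and not from the lecture, and the evaluation function is a made-up guess.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class State:
    stones: int   # stones left in the pile
    to_move: str  # "MAX" or "MIN"; at the root this records who moves first


def operators(state: State) -> List[State]:
    """Legal moves: take 1 or 2 stones, then the turn passes to the other player."""
    other = "MIN" if state.to_move == "MAX" else "MAX"
    return [State(state.stones - k, other) for k in (1, 2) if k <= state.stones]


def terminal_test(state: State) -> bool:
    """The game is over when the pile is empty."""
    return state.stones == 0


def utility(state: State) -> int:
    """Payoff for MAX at a terminal state: whoever took the last stone wins."""
    # At a terminal state, the player to move is the one who did NOT take the last stone.
    return -1 if state.to_move == "MAX" else +1


def evaluation(state: State) -> float:
    """Estimate for a non-terminal state (a made-up heuristic, for illustration only)."""
    # Guess: having the move is a mild advantage.
    return 0.5 if state.to_move == "MAX" else -0.5


# Generate the tree level by level; the player to move alternates between levels.
level = [State(stones=3, to_move="MAX")]
depth = 0
while level:
    print(depth, level[0].to_move, [s.stones for s in level])
    level = [child for s in level for child in operators(s)]
    depth += 1
```

Each printed line is one level of the tree, so the alternation between MAX and MIN levels is visible directly.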
4.4 Example tree
[Figure: example game tree with levels alternating MAX, MIN, MAX, MIN, ending in FINAL states.]
4.5 Expected utility
●
Game theoretic concept (Bernoulli, 1738; von Neumann &
Morgenstern, 1944)
●
The subjective value (utility) of an uncertain outcome is weighted by
its probability.
●
Decision makers try to maximize their expected utility.
●
Lottery example:
(a) Win $1000 with probability 1.0
(b) Win
  ✔ $101,000 with probability 0.001, or
  ✔ $900.90 with probability 0.999.
●
Which one would you choose?
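The lottery can be checked numerically. The sketch below computes the expected monetary value of both options and then, as an assumption not stated on the slide, the expected utility under a logarithmic utility function of the kind Bernoulli proposed. The names `expected_value` and `log_utility` are illustrative.

```python
import math


def expected_value(lottery, utility=lambda x: x):
    """Sum of utility(payoff) * probability over all outcomes of a lottery."""
    return sum(utility(payoff) * prob for payoff, prob in lottery)


def log_utility(x):
    """Assumed concave utility of money; log(1 + x) keeps a zero payoff finite."""
    return math.log(1.0 + x)


lottery_a = [(1000.0, 1.0)]
lottery_b = [(101_000.0, 0.001), (900.9, 0.999)]

ev_a = expected_value(lottery_a)
ev_b = expected_value(lottery_b)
eu_a = expected_value(lottery_a, log_utility)
eu_b = expected_value(lottery_b, log_utility)

print(f"expected money:   (a) {ev_a:.2f}  (b) {ev_b:.2f}")
print(f"expected utility: (a) {eu_a:.4f}  (b) {eu_b:.4f}")
```

In money terms (b) is worth about one dollar more than (a), but under the assumed logarithmic utility (a) has the higher expected utility, which is one way to explain why many people prefer the sure win.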
4.6 Minimax strategy
●
Proceeds depth-first
●
Determines the optimal strategy for MAX:
– Generate the whole game tree
– Propagate utility values from leaves toward the root.
              A = 3                MAX
      B1 = 2        B2 = 3         MIN
     2  12  8      3   4   6
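A minimal sketch of the propagation step on the example tree above, assuming the tree is given as nested lists whose integer leaves are utility values (this representation is an assumption made for illustration):

```python
def minimax(node, is_max):
    """Return the minimax value of a node, searching depth-first."""
    if isinstance(node, int):  # leaf: its utility is the value
        return node
    values = [minimax(child, not is_max) for child in node]
    return max(values) if is_max else min(values)


# Root A (MAX) with MIN children B1 = [2, 12, 8] and B2 = [3, 4, 6].
tree = [[2, 12, 8], [3, 4, 6]]
print(minimax(tree[0], False))  # value of B1 -> 2
print(minimax(tree[1], False))  # value of B2 -> 3
print(minimax(tree, True))      # value of A  -> 3
```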
4.7 Searching game trees
●
One cannot use exhaustive search
– The tree is potentially huge.
– The opponent complicates the search.
●
One can use depth-first or breadth-first search
to generate the tree, but other methods need to
be used to choose good moves:
– Bounded look-ahead
– Alpha-beta pruning
4.8 Bounded look-ahead
●
At each move the search tree is examined to a
particular depth.
●
Difficulty of choosing a fixed cut-off point:
– Non-quiescent positions: the value may change sharply in the
near future, so cut the search only at points that are safe.
– Horizon problem: the consequences of a bad move are
postponed beyond the search depth; no general
solution exists.
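A minimal sketch of bounded look-ahead, assuming each node is a pair (static evaluation, list of children) and that the evaluation of a leaf is its true payoff. The tree and its numbers are invented to show how a shallow cut-off can hide the consequences of a bad move.

```python
def bounded_minimax(node, depth, is_max):
    """Minimax that stops at a fixed depth and falls back on the static evaluation."""
    estimate, children = node
    if depth == 0 or not children:
        return estimate
    values = [bounded_minimax(child, depth - 1, not is_max) for child in children]
    return max(values) if is_max else min(values)


def best_move(root, depth):
    """Index of the move MAX would choose at the root with the given look-ahead."""
    _, children = root
    scores = [bounded_minimax(child, depth - 1, False) for child in children]
    return scores.index(max(scores))


# Move 0 looks good statically (5) but MIN has a reply worth -10 for MAX.
# Move 1 looks modest (1) and stays safe after MIN's reply.
root = (0, [
    (5, [(-10, []), (6, [])]),
    (1, [(1, []), (2, [])]),
])

print(best_move(root, depth=1))  # 0: the bad consequence lies beyond the horizon
print(best_move(root, depth=2))  # 1: one more ply reveals it
```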
4.9 Alpha-beta pruning
●
Remove sections of the game tree that are not worth examining.
●
In other words, if a better outcome is already guaranteed after
examining one move or its parents, the other moves need not be examined.
●
Does not change the outcome of the game, if both players play
optimally.
●
Effectiveness depends on the order in which the nodes are evaluated.
●
For a MAX node
– α = maximum value found so far in its descendants
– β = minimum beta value found in its MIN ancestors
●
For a MIN node
– β = minimum value found so far in its descendants
– α = maximum alpha value found in its MAX ancestors
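The effect of evaluation order can be illustrated by counting how many leaves are examined. The sketch below assumes a nested-list tree with integer leaves and a mutable one-element counter; both are illustrative choices, not part of the lecture.

```python
import math


def alpha_beta(node, alpha, beta, is_max, counter):
    """Alpha-beta search that also counts the leaves it examines."""
    if isinstance(node, int):
        counter[0] += 1
        return node
    for child in node:
        value = alpha_beta(child, alpha, beta, not is_max, counter)
        if is_max:
            alpha = max(alpha, value)
        else:
            beta = min(beta, value)
        if alpha >= beta:  # the remaining children cannot change the result
            break
    return alpha if is_max else beta


def leaves_examined(tree):
    """Return (root value, number of leaves examined) for a MAX root."""
    counter = [0]
    value = alpha_beta(tree, -math.inf, math.inf, True, counter)
    return value, counter[0]


good_order = [[3, 4, 6], [2, 12, 8]]  # the best move for MAX is searched first
bad_order = [[2, 12, 8], [3, 4, 6]]   # the best move for MAX is searched last

print(leaves_examined(good_order))  # (3, 4)
print(leaves_examined(bad_order))   # (3, 6)
```

Both orders give the same root value, but with the better move first two of the six leaves are never examined.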
4.10 Alpha-beta algorithm
●
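Function alpha_beta(current_node, alpha, beta)
  If ROOT(current_node)
    alpha = -inf
    beta = inf
  If LEAF(current_node)
    return payoff(current_node)
  If MAX_node(current_node)
    For each child of current_node
      alpha = max(alpha, alpha_beta(child, alpha, beta))
      If alpha >= beta
        cut_off(current_node)   // skip the remaining children
    return alpha
  If MIN_node(current_node)
    For each child of current_node
      beta = min(beta, alpha_beta(child, alpha, beta))
      If beta <= alpha
        cut_off(current_node)   // skip the remaining children
    return beta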
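One possible runnable reading of the pseudocode above is sketched here. It assumes that an internal node is a pair (name, children), that a leaf is a number, and that the comparisons lost from the slide are alpha >= beta and beta <= alpha; the tree and the `cut_offs` list are hypothetical additions for illustration.

```python
import math


def alpha_beta(node, is_max, alpha=-math.inf, beta=math.inf, cut_offs=None):
    """Return the minimax value of node; record the names of nodes where a cut-off occurs."""
    if cut_offs is None:
        cut_offs = []
    if isinstance(node, (int, float)):  # LEAF: return the payoff
        return node
    name, children = node
    if is_max:
        for child in children:
            alpha = max(alpha, alpha_beta(child, False, alpha, beta, cut_offs))
            if alpha >= beta:
                cut_offs.append(name)  # remaining children are skipped
                break
        return alpha
    for child in children:
        beta = min(beta, alpha_beta(child, True, alpha, beta, cut_offs))
        if beta <= alpha:
            cut_offs.append(name)  # remaining children are skipped
            break
    return beta


tree = ("A", [("B1", [3, 12, 8]), ("B2", [2, 4, 6]), ("B3", [14, 1, 9])])
pruned = []
value = alpha_beta(tree, True, cut_offs=pruned)
print(value, pruned)  # 3 ['B2', 'B3']
```

After B1 returns 3, the first leaf of B2 (2) already shows that B2 cannot beat it, so leaves 4 and 6 are skipped; B3 is cut after its second leaf.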
4.11 Alpha-beta example
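[Figure: alpha-beta example on a tree with MAX node a and MIN nodes b and c. The annotations show a bound at a starting at −∞ and updated to 3, and bounds of 3 at b and c. Leaf payoffs, left to right: 1 2 3 4 5 7 1 0 2 6 1 5.]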
4.12 Other pruning methods
●
Null-move pruning
– Speeds up alpha-beta
– If the position after a skipped move is still strong enough to produce a
cut-off, the current position is likely strong enough to produce one even if
the player actually moves.
●
Forward pruning
– A node is discarded without searching beyond it if it is unlikely to lead
to better moves.
– Non-zero probability of errors
– Lim & Lee (2006)
●
Errors are more severe if the opponent's moves are pruned than if one's own
are, so prune in MAX nodes, especially if winning.
●
The probability that an error propagates to the root decreases as the depth
of the error's location increases.
●
However, the number of leaves increases at each depth faster than errors
can be filtered out by minimax, so prune less at deeper levels.
4.13 Solving games
●
Finding the game-theoretic value of the game (van
den Herik et al., 2002): the value indicates whether the
first mover wins, loses, or the game ends in a draw.
●
Ultra-weakly solved, weakly solved and strongly
solved
●
Why? To explore if the knowledge from solving games
can be translated to rules and strategies that
– can be applied by humans.
– are general and not ad hoc.
– are transferable between games.
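For a very small game the game-theoretic value can be computed directly. The sketch below uses a hypothetical subtraction game (one pile, take 1 to 3 stones, whoever takes the last stone wins) that is not from the lecture. Because it evaluates every position and not only the initial one, it corresponds to what is usually called a strong solution.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def value(stones: int) -> int:
    """+1 if the player to move wins with optimal play, -1 if that player loses."""
    if stones == 0:
        return -1  # the previous player took the last stone and won
    # A move is good for the mover if it leaves the opponent in a lost position.
    return max(-value(stones - k) for k in (1, 2, 3) if k <= stones)


for n in range(1, 9):
    print(n, "first mover wins" if value(n) == 1 else "first mover loses")
```

The computed table shows a simple rule that a human can apply: the player to move loses exactly when the number of stones is a multiple of 4.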
4.14 Game space
State-space complexity (vertical axis) vs. game-tree complexity (horizontal axis):

                      low game-tree complexity      high game-tree complexity
high state-space      Category 3                    Category 4
complexity            if solvable, then by          not solvable by
                      knowledge-based methods       any method
                      (e.g., Go end games)          (e.g., full Chess)

low state-space       Category 1                    Category 2
complexity            solvable by any method        if solvable, then by brute-force
                                                    methods (e.g., Go, Othello,
                                                    endgame Chess and Checkers)

– State-space complexity = number of legal positions reachable from the
initial position.
– Game-tree complexity = number of leaf nodes in the solution
search tree of the initial position.
– Convergent (Chess, Checkers) vs. divergent (Go, Othello) games.
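Both measures can be made concrete on a game small enough to enumerate. The sketch below uses tic-tac-toe, which is not discussed in the lecture, and makes one simplifying assumption: it counts the leaves of the full game tree, which is not the same thing as the solution search tree in the definition above, so the second number is only an illustration of the idea.

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


def explore(board, player, seen):
    """Record every reachable position in seen; return the number of finished games below board."""
    seen.add(board)
    if winner(board) or "." not in board:
        return 1
    other = "O" if player == "X" else "X"
    leaves = 0
    for i, cell in enumerate(board):
        if cell == ".":
            leaves += explore(board[:i] + player + board[i + 1:], other, seen)
    return leaves


seen = set()
leaves = explore("." * 9, "X", seen)
print("state-space size:", len(seen))     # 5478 distinct reachable positions
print("full game-tree leaves:", leaves)   # 255168 finished games
```

The same position is reached along many different move sequences, which is why the tree has far more leaves than the game has positions.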
4.15 Other kinds of games
●
Two-person perfect-information games
(e.g., Chess, Othello)
●
Multiple-player, stochastic, incomplete or
imperfect information games (e.g., Poker,
Backgammon)
●
Interactive games, such as action games, role-playing
games, adventure games, and sports games,
are the topic of a whole other course!
4.16 Availability of information
●
Complete information: every player knows the payoffs
and the strategies available to other players (type of
players, structure of the game)
●
Perfect information: every player knows the actions of the
other players (what happens within the game)
●
Certain information: players know which game they
are playing, i.e., what the payoff from a certain
strategy will be given the strategies played by others.
●
Games of incomplete and imperfect information pose
problems to search-based methods.
4.17 How to study these games?
●
Simplified versions
– Subset of the game
– Address each sub-problem separately
– Abstractions: collect similar sub-problems into the same class
– Two players only
●
Tackle the whole problem at once (Billings et al., 2002).
●
For instance, for poker this includes
– Betting strategy
– Opponent modeling
– Learning
– Performance evaluation
4.18 Poki: architecture
●
Plays Texas Hold'em (Billings et al., 1999, 2002)
●
Architecture (also includes a dealer)
[Figure: Poki architecture. Labeled parts: betting rulebase, simulator, probability triple p(fold), p(call), p(raise), and action selector.]
4.19 Poki: Learning and decision making
●
Basic betting strategy:
1. Compute effective hand strength = current hand strength + potential
to improve
2. Calculate probabilities of actions: fold, call, and raise
3. Choose action stochastically
●
Simulation-based betting strategy: play out many likely
scenarios to get the expected value of each betting action.
●
Opponent modeling
– Deduce the strength of the hand from actions
– Predict future actions
●
General Opponent Model (GOM): fixed strategy based on rational
choice
●
Specific Opponent Model (SOM): based on the opponent's personal history
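Step 3 of the basic betting strategy above, choosing an action stochastically, can be sketched as sampling from the probability triple. This is a generic illustration and not Poki's actual code: the function name `choose_action` and the example triple are assumptions, and how the triple is derived from the effective hand strength is left out.

```python
import random


def choose_action(triple, rng=random):
    """Sample "fold", "call" or "raise" according to (p_fold, p_call, p_raise)."""
    actions = ("fold", "call", "raise")
    return rng.choices(actions, weights=triple, k=1)[0]


# Hypothetical triple for a fairly strong hand.
triple = (0.1, 0.6, 0.3)
rng = random.Random(0)  # fixed seed so the run is repeatable
picks = [choose_action(triple, rng) for _ in range(1000)]
print({action: picks.count(action) for action in ("fold", "call", "raise")})
```

Sampling, as opposed to always taking the most probable action, keeps the betting unpredictable, so the same hand strength does not always produce the same action.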