AIPython
http://aipython.org http://artint.info
©David L Poole and Alan K Mackworth 2017-2023.
All code is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. See: http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_US
This document and all the code can be downloaded from
http://artint.info/AIPython/ or from http://aipython.org
The authors and publisher of this book have used their best efforts in preparing this book. These efforts include the development, research and testing of the theories and programs to determine their effectiveness. The authors and publisher make no warranty of any kind, expressed or implied, with regard to these programs or the documentation contained in this book. The authors and publisher shall not be liable in any event for incidental or consequential damages in connection with, or arising out of, the furnishing, performance, or use of these programs.
Contents
2.2.3 Plotting
2.3 Hierarchical Controller
2.3.1 Environment
2.3.2 Body
2.3.3 Middle Layer
2.3.4 Top Layer
2.3.5 Plotting
5.5 Assumables
11 Causality
11.1 Do Questions
11.2 Counterfactual Example
Bibliography
Index
in a terminal shell (not in Python). That should “just work”. If not, try using
pip instead of pip3.
The command python or python3 should then start the interactive Python shell. You can quit Python with a control-D or with quit().
1. Python for Artificial Intelligence
You can then interact at the last prompt.
There are many textbooks for Python. The best source of information about Python is https://www.python.org/. We will be using Python 3; please download the latest release. The documentation is at https://docs.python.org/3/.
The rest of this chapter is about what is special about the code for AI tools. We will only use the Standard Python Library and matplotlib. All of the exercises can be done (and should be done) without using other libraries; the aim is for you to spend your time thinking about how to solve the problem rather than searching for pre-existing solutions.
1.4 Pitfalls
It is important to know when side effects occur. Often AI programs consider what would happen or what may have happened. In many such cases, we don't want side effects. When an agent acts in the world, side effects are appropriate.
In Python, you need to be careful to understand side effects. For example,
the inexpensive function to add an element to a list, namely append, changes the
list. In a functional language like Haskell or Lisp, adding a new element to a
list, without changing the original list, is a cheap operation. For example if x is
a list containing n elements, adding an extra element to the list in Python (using
append) is fast, but it has the side effect of changing the list x. To construct a new
list that contains the elements of x plus a new element, without changing the
value of x, entails copying the list, or using a different representation for lists.
In the searching code, we will use a different representation for lists for this
reason.
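For example (a small standalone sketch, not part of the aipython code):

```python
# Demonstrating the side effect of append versus building a new list.
x = [1, 2, 3]
y = x            # y is another name for the same list, not a copy
x.append(4)      # fast, but mutates the list that both names refer to
print(y)         # [1, 2, 3, 4] -- y changed too

# To get a new list without changing x, copy it (an O(n) operation):
x = [1, 2, 3]
z = x + [4]      # builds a fresh list
print(x)         # [1, 2, 3]
print(z)         # [1, 2, 3, 4]
```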
The Python comprehension [fe for e in iter if cond] enumerates the values fe for each e in iter for which cond is true. The "if cond" part is optional, but the "for" and "in" are not optional. Here e has to be a variable, iter is an iterator, which can generate a stream of data, such as a list, a set, a range object (to enumerate integers between ranges) or a file. cond
called, not the value of the variable when the function was defined (this is called "late binding"). This means that if you want to use the value a variable has when the function is created, you need to save the current value of that variable. Whereas Python uses "late binding" by default, the alternative that newcomers often expect, "early binding", where a function uses the value a variable had when the function was defined, can easily be implemented.
Consider the following programs designed to create a list of 5 functions, where the ith function in the list is meant to add i to its argument:1
Try to predict, and then test, the output of the following calls, remembering that the function uses the latest value of any variable that is not bound in the function call:
pythonDemo.py — (continued)
29 # in Shell do
30 ## ipython -i pythonDemo.py
31 # Try these (copy text after the comment symbol and paste in the Python prompt):
32 # print([f(10) for f in fun_list1])
33 # print([f(10) for f in fun_list2])
34 # print([f(10) for f in fun_list3])
35 # print([f(10) for f in fun_list4])
In the first for-loop, the function fun1 uses i, whose value is the last value it was assigned. In the second loop, the function fun2 uses iv. There is a separate iv variable for each function, and its value is the value of i when the function was defined. Thus fun1 uses late binding, and fun2 uses early binding. fun_list3
1 Numbered lines are Python code available in the code-directory, aipython. The name of
the file is given in the gray text above the listing. The numbers correspond to the line numbers
in that file.
and fun_list4 are equivalent to the first two (except fun_list4 uses a different i variable).
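The definitions of fun_list1 and fun_list2 are elided from this excerpt; based on the description above, they presumably resemble the following sketch (the names fun1, fun2, and iv are taken from the text):

```python
# Late binding: every fun1 closes over the same i, whose final value is 4.
fun_list1 = []
for i in range(5):
    def fun1(e):
        return e + i
    fun_list1.append(fun1)

# Early binding: iv's default value captures i at definition time.
fun_list2 = []
for i in range(5):
    def fun2(e, iv=i):
        return e + iv
    fun_list2.append(fun2)

print([f(10) for f in fun_list1])   # [14, 14, 14, 14, 14] -- all use the final i
print([f(10) for f in fun_list2])   # [10, 11, 12, 13, 14] -- each uses its own iv
```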
One of the advantages of using the embedded definitions (as in fun1 and fun2 above) over the lambda is that it is possible to add a __doc__ string, the standard way of documenting functions in Python, to the embedded definitions.
49 def ga(n):
50 """generates square of even nonnegative integers less than n"""
51 for e in range(n):
52 if e%2==0:
53 yield e*e
54 a = ga(20)
The sequence of next(a) calls, followed by list(a), gives exactly the same results as the comprehension in Section 1.5.1.
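For example, interleaving next with list on the generator (ga reproduced from the listing above):

```python
def ga(n):
    """generates square of even nonnegative integers less than n"""
    for e in range(n):
        if e % 2 == 0:
            yield e * e

a = ga(20)
print(next(a))   # 0
print(next(a))   # 4
print(list(a))   # [16, 36, 64, 100, 144, 196, 256, 324] -- the remaining values
```

Note that a generator is consumed as it is used: once list(a) has drained it, further next(a) calls raise StopIteration.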
It is straightforward to write a version of the built-in enumerate. Let’s call it
myenumerate:
pythonDemo.py — (continued)
56 def myenumerate(enum):
57 for i in range(len(enum)):
58 yield i,enum[i]
Exercise 1.2 Write a version of enumerate where the only iteration is “for val in
enum”. Hint: keep track of the index.
pythonDemo.py — (continued)
At the end of the code are some commented-out commands you should try in interactive mode. Cut from the file and paste into Python (and remember to remove the comment symbol and leading space).
1.7 Utilities
1.7.1 Display
In this distribution, to keep things simple and to only use standard Python, we use a text-oriented tracing of the code. A graphical depiction of the code could
where the level is less than or equal to the value for max_display_level will be printed. The to_print ... can be anything that is accepted by the built-in print (including any keyword arguments).
The definition of display is:
Note that args gets a tuple of the positional arguments, and nargs gets a dictionary of the keyword arguments. This will not work in Python 2, and will give an error.
Any class that wants to use display can be made a subclass of Displayable.
To change the maximum display level to, say, 3 for a class, do:
which will make calls to display in that class print when the value of level is less than or equal to 3. The default display level is 1. It can also be changed for individual objects (the object value overrides the class value).
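The code for changing the level is elided from this excerpt; a minimal sketch of how the mechanism described above works (Displayable is named in the text; the Searcher subclass here is hypothetical, for illustration only):

```python
class Displayable(object):
    """a sketch of the Displayable superclass described in the text"""
    max_display_level = 1   # the default display level

    def display(self, level, *args, **nargs):
        # print only when level is <= the (object or class) display level
        if level <= self.max_display_level:
            print(*args, **nargs)

class Searcher(Displayable):   # hypothetical subclass for illustration
    pass

Searcher.max_display_level = 3   # change the level for the whole class
s = Searcher()
s.max_display_level = 2          # override for this individual object
s.display(2, "printed: 2 <= 2")
s.display(3, "not printed: 3 > 2 for this object")
```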
The value of max_display_level by convention is:
0 display nothing
2 also display the values as they change (little detail through a loop)
26 def visualize(func):
27 """A decorator for algorithms that do interactive visualization.
28 Ignored here.
29 """
30 return func
1.7.2 Argmax
Python has a built-in max function that takes a generator (or a list or set) and re-
turns the maximum value. The argmax method returns the index of an element
that has the maximum value. If there are multiple elements with the maxi-
mum value, one if the indexes to that value is returned at random. argmaxe
assumes an enumeration; a generator of (element, value) pairs, as for example
is generated by the built-in enumerate(list) for lists or dict.items() for dicts.
utilities.py — AIPython useful utilities
11 import random
12 import math
13
14 def argmaxall(gen):
15 """gen is a generator of (element,value) pairs, where value is a real.
16 argmaxall returns a list of all of the elements with maximal value.
17 """
18 maxv = -math.inf # negative infinity
19 maxvals = [] # list of maximal elements
20 for (e,v) in gen:
21 if v>maxv:
22 maxvals,maxv = [e], v
23 elif v==maxv:
24 maxvals.append(e)
25 return maxvals
26
27 def argmaxe(gen):
28 """gen is a generator of (element,value) pairs, where value is a real.
29 argmaxe returns an element with maximal value.
30 If there are multiple elements with the max value, one is returned at random.
31 """
32 return random.choice(argmaxall(gen))
33
34 def argmax(lst):
35 """returns the index of a maximum value in lst"""
36 return argmaxe(enumerate(lst))
37 # Try:
38 # argmax([1,6,3,77,3,55,23])
39
40 def argmaxd(dct):
41 """returns the arg max of a dictionary dct"""
42 return argmaxe(dct.items())
43 # Try:
44 # argmaxd({2:5,5:9,7:7})
Exercise 1.3 Change argmax to have an optional argument that specifies whether
you want the “first”, “last” or a “random” index of the maximum value returned.
If you want the first or the last, you don’t need to keep a list of the maximum
elements.
1.7.3 Probability
For many of the simulations, we want to make a variable True with some prob-
ability. flip(p) returns True with probability p, and otherwise returns False.
utilities.py — (continued)
45 def flip(prob):
46 """return True with probability prob"""
47 return random.random() < prob
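A quick check of flip (a standalone sketch; the seed is only to make this demonstration reproducible):

```python
import random

def flip(prob):
    """return True with probability prob"""
    return random.random() < prob

random.seed(0)   # for reproducibility of this demo
n = 10000
count = sum(flip(0.7) for _ in range(n))
print(count / n)   # close to 0.7
```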
49 def dict_union(d1,d2):
50 """returns a dictionary that contains the keys of d1 and d2.
51 The value for each key that is in d2 is the value from d2,
52 otherwise it is the value from d1.
53 This does not have side effects.
54 """
55 d = dict(d1) # copy d1
56 d.update(d2)
57 return d
The following code tests argmax and dict_union, but only if utilities is loaded in the top level. If it is loaded in a module the test code is not run.
In your code you should do more substantial testing than we do here, in
particular testing the boundary cases.
utilities.py — (continued)
59 def test():
60 """Test part of utilities"""
61 assert argmax(enumerate([1,6,55,3,55,23])) in [2,4]
62 assert dict_union({1:4, 2:5, 3:4},{5:7, 2:9}) == {1:4, 2:9, 3:4, 5:7}
63 print("Passed unit test in utilities")
64
65 if __name__ == "__main__":
66 test()
2. Agent Architectures and Hierarchical Control
13 class Agent(object):
14 def __init__(self,env):
15 """set up the agent"""
16 self.env=env
17
18 def go(self,n):
19 """acts for n time steps"""
20 raise NotImplementedError("go") # abstract method
agents.py — (continued)
33 class TP_env(Environment):
34 prices = [234, 234, 234, 234, 255, 255, 275, 275, 211, 211, 211,
35 234, 234, 234, 234, 199, 199, 275, 275, 234, 234, 234, 234, 255,
36 255, 260, 260, 265, 265, 265, 265, 270, 270, 255, 255, 260, 260,
37 265, 265, 150, 150, 265, 265, 270, 270, 255, 255, 260, 260, 265,
38 265, 265, 265, 270, 270, 211, 211, 255, 255, 260, 260, 265, 265,
39 260, 265, 270, 270, 205, 255, 255, 260, 260, 265, 265, 265, 265,
40 270, 270]
41 max_price_addon = 20 # maximum of random value added to get price
42
43 def __init__(self):
44 """paper buying agent"""
45 self.time=0
46 self.stock=20
47 self.stock_history = [] # memory of the stock history
48 self.price_history = [] # memory of the price history
49
50 def initial_percepts(self):
51 """return initial percepts"""
52 self.stock_history.append(self.stock)
53 price = self.prices[0]+random.randrange(self.max_price_addon)
54 self.price_history.append(price)
55 return {'price': price,
56 'instock': self.stock}
57
58 def do(self, action):
59 """does action (buy) and returns percepts (price and instock)"""
60 used = pick_from_dist({6:0.1, 5:0.1, 4:0.2, 3:0.3, 2:0.2, 1:0.1})
61 bought = action['buy']
62 self.stock = self.stock+bought-used
63 self.stock_history.append(self.stock)
64 self.time += 1
65 price = (self.prices[self.time%len(self.prices)] # repeating pattern
66 +random.randrange(self.max_price_addon) # plus randomness
67 +self.time//2) # plus inflation
68 self.price_history.append(price)
69 return {'price': price,
70 'instock': self.stock}
The pick_from_dist method takes in an item:probability dictionary, and returns one of the items in proportion to its probability.
agents.py — (continued)
72 def pick_from_dist(item_prob_dist):
73 """ returns a value from a distribution.
74 item_prob_dist is an item:probability dictionary, where the
75 probabilities sum to 1.
76 returns an item chosen in proportion to its probability
77 """
78 ranreal = random.random()
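The listing breaks off after line 78. A plausible completion of pick_from_dist (a sketch, not necessarily the file's exact code) walks through the items, subtracting each probability from the random number until it is exhausted:

```python
import random

def pick_from_dist(item_prob_dist):
    """returns an item from item_prob_dist (an item:probability dictionary,
    where the probabilities sum to 1) in proportion to its probability."""
    ranreal = random.random()
    for (it, prob) in item_prob_dist.items():
        if ranreal < prob:
            return it
        else:
            ranreal -= prob
    raise RuntimeError(str(item_prob_dist) + " is not a probability distribution")

# e.g. pick_from_dist({6:0.1, 5:0.1, 4:0.2, 3:0.3, 2:0.2, 1:0.1})
```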
agents.py — (continued)
86 class TP_agent(Agent):
87 def __init__(self, env):
88 self.env = env
89 self.spent = 0
90 percepts = env.initial_percepts()
91 self.ave = self.last_price = percepts['price']
92 self.instock = percepts['instock']
93
94 def go(self, n):
95 """go for n time steps
96 """
97 for i in range(n):
98 if self.last_price < 0.9*self.ave and self.instock < 60:
99 tobuy = 48
100 elif self.instock < 12:
101 tobuy = 12
102 else:
103 tobuy = 0
104 self.spent += tobuy*self.last_price
105 percepts = self.env.do({'buy': tobuy})
106 self.last_price = percepts['price']
107 self.ave = self.ave+(self.last_price-self.ave)*0.05
108 self.instock = percepts['instock']
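The running-average update in go is an exponential moving average: each new price nudges the estimate 5% of the way toward itself. In isolation (ema_update is a hypothetical helper name, not from the file):

```python
def ema_update(ave, new_value, alpha=0.05):
    """nudge the running average alpha of the way toward new_value,
    as in: self.ave = self.ave + (self.last_price - self.ave)*0.05"""
    return ave + (new_value - ave) * alpha

ave = 234.0
for price in [255, 255, 275, 275, 211]:
    ave = ema_update(ave, price)
print(round(ave, 2))   # 238.4 -- the average drifts slowly toward recent prices
```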
Set up an environment and an agent. Uncomment the last lines to run the agent
for 90 steps, and determine the average amount spent.
agents.py — (continued)
2.2.3 Plotting
The following plots the price and number in stock history:
agents.py — (continued)
In this implementation, each layer, including the top layer, implements the environment class, because each layer is seen as an environment from the layer above.
We arbitrarily divide the environment and the body, so that the environment just defines the walls, and the body includes everything to do with the agent. Note that the named locations are part of the (top level of the) agent, not part of the environment, although they could have been.
2.3.1 Environment
The environment defines the walls.
agentEnv.py — Agent environment
11 import math
12 from agents import Environment
13
14 class Rob_env(Environment):
15 def __init__(self,walls = {}):
16 """walls is a set of line segments
17 where each line segment is of the form ((x0,y0),(x1,y1))
18 """
19 self.walls = walls
2.3.2 Body
The body defines everything about the agent body.
agentEnv.py — (continued)
21 import math
22 from agents import Environment
23 import matplotlib.pyplot as plt
24 import time
25
26 class Rob_body(Environment):
27 def __init__(self, env, init_pos=(0,0,90)):
28 """ env is the current environment
29 init_pos is a triple of (x-position, y-position, direction)
30 direction is in degrees; 0 is to right, 90 is straight-up, etc
31 """
32 self.env = env
33 self.rob_x, self.rob_y, self.rob_dir = init_pos
34 self.turning_angle = 18 # degrees that a left makes
35 self.whisker_length = 6 # length of the whisker
36 self.whisker_angle = 30 # angle of whisker relative to robot
37 self.crashed = False
38 # The following control how it is plotted
39 self.plotting = True # whether the trace is being plotted
40 self.sleep_time = 0.05 # time between actions (for real-time plotting)
41 # The following are data structures maintained:
42 self.history = [(self.rob_x, self.rob_y)] # history of (x,y) positions
43 self.wall_history = [] # history of hitting the wall
44
45 def percepts(self):
46 return {'rob_x_pos':self.rob_x, 'rob_y_pos':self.rob_y,
47 'rob_dir':self.rob_dir, 'whisker':self.whisker(), 'crashed':self.crashed}
48 initial_percepts = percepts # use percept function for initial percepts too
49
50 def do(self,action):
51 """ action is {'steer':direction}
This detects whether the whisker and a wall intersect. Its value is returned as a percept.
agentEnv.py — (continued)
75 def whisker(self):
76 """returns true whenever the whisker sensor intersects with a wall
77 """
78 whisk_ang_world = (self.rob_dir-self.whisker_angle)*math.pi/180
79 # angle in radians in world coordinates
80 wx = self.rob_x + self.whisker_length * math.cos(whisk_ang_world)
81 wy = self.rob_y + self.whisker_length * math.sin(whisk_ang_world)
82 whisker_line = ((self.rob_x,self.rob_y),(wx,wy))
83 hit = any(line_segments_intersect(whisker_line,wall)
84 for wall in self.env.walls)
85 if hit:
86 self.wall_history.append((self.rob_x, self.rob_y))
87 if self.plotting:
88 plt.plot([self.rob_x],[self.rob_y],"ro")
89 plt.draw()
90 return hit
91
92 def line_segments_intersect(linea,lineb):
93 """returns true if the line segments, linea and lineb intersect.
94 A line segment is represented as a pair of points.
95 A point is represented as a (x,y) pair.
96 """
97 ((x0a,y0a),(x1a,y1a)) = linea
98 ((x0b,y0b),(x1b,y1b)) = lineb
99 da, db = x1a-x0a, x1b-x0b
100 ea, eb = y1a-y0a, y1b-y0b
101 denom = db*ea-eb*da
102 if denom==0: # line segments are parallel
103 return False
104 cb = (da*(y0b-y0a)-ea*(x0b-x0a))/denom # position along line b
105 if cb<0 or cb>1:
106 return False
107 ca = (db*(y0b-y0a)-eb*(x0b-x0a))/denom # position along line a
108 return 0<=ca<=1
109
110 # Test cases:
111 # assert line_segments_intersect(((0,0),(1,1)),((1,0),(0,1)))
112 # assert not line_segments_intersect(((0,0),(1,1)),((1,0),(0.6,0.4)))
113 # assert line_segments_intersect(((0,0),(1,1)),((1,0),(0.4,0.6)))
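The commented test cases can be run directly; the following standalone copy of line_segments_intersect (reproduced from the listing above, without file line numbers) executes them:

```python
def line_segments_intersect(linea, lineb):
    """returns true if the line segments linea and lineb intersect.
    A line segment is represented as a pair of (x,y) points."""
    ((x0a, y0a), (x1a, y1a)) = linea
    ((x0b, y0b), (x1b, y1b)) = lineb
    da, db = x1a - x0a, x1b - x0b
    ea, eb = y1a - y0a, y1b - y0b
    denom = db * ea - eb * da
    if denom == 0:          # line segments are parallel
        return False
    cb = (da * (y0b - y0a) - ea * (x0b - x0a)) / denom  # position along line b
    if cb < 0 or cb > 1:
        return False
    ca = (db * (y0b - y0a) - eb * (x0b - x0a)) / denom  # position along line a
    return 0 <= ca <= 1

# The unit diagonal crosses the anti-diagonal at (0.5, 0.5)...
assert line_segments_intersect(((0, 0), (1, 1)), ((1, 0), (0, 1)))
# ...a segment from (1,0) to (0.6,0.4) stops short of that crossing...
assert not line_segments_intersect(((0, 0), (1, 1)), ((1, 0), (0.6, 0.4)))
# ...but one extended to (0.4,0.6) passes through it.
assert line_segments_intersect(((0, 0), (1, 1)), ((1, 0), (0.4, 0.6)))
print("all intersection tests passed")
```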
31 """
32 if 'timeout' in action:
33 remaining = action['timeout']
34 else:
35 remaining = -1 # will never reach 0
36 target_pos = action['go_to']
37 arrived = self.close_enough(target_pos)
38 while not arrived and remaining != 0:
39 self.percepts = self.env.do({"steer":self.steer(target_pos)})
40 remaining -= 1
41 arrived = self.close_enough(target_pos)
42 return {'arrived':arrived}
This determines how to steer depending on whether the goal is to the right or
the left of where the robot is facing.
agentMiddle.py — (continued)
44 def steer(self,target_pos):
45 if self.percepts['whisker']:
46 self.display(3,'whisker on', self.percepts)
47 return "left"
48 else:
49 gx,gy = target_pos
50 rx,ry = self.percepts['rob_x_pos'],self.percepts['rob_y_pos']
51 goal_dir = math.acos((gx-rx)/math.sqrt((gx-rx)*(gx-rx)
52 +(gy-ry)*(gy-ry)))*180/math.pi
53 if ry>gy:
54 goal_dir = -goal_dir
55 goal_from_rob = (goal_dir - self.percepts['rob_dir']+540)%360-180
56 assert -180 < goal_from_rob <= 180
57 if goal_from_rob > self.straight_angle:
58 return "left"
59 elif goal_from_rob < -self.straight_angle:
60 return "right"
61 else:
62 return "straight"
63
64 def close_enough(self,target_pos):
65 gx,gy = target_pos
66 rx,ry = self.percepts['rob_x_pos'],self.percepts['rob_y_pos']
67 return (gx-rx)**2 + (gy-ry)**2 <= self.close_threshold_squared
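The expression (goal_dir - rob_dir + 540) % 360 - 180 in steer maps any angle difference into the range [-180, 180), so "left of" and "right of" can be read off the sign. A quick standalone check:

```python
def angle_diff(goal_dir, rob_dir):
    """signed difference goal_dir - rob_dir, normalized into [-180, 180)"""
    return (goal_dir - rob_dir + 540) % 360 - 180

print(angle_diff(10, 350))    # 20  -- goal slightly to the left
print(angle_diff(350, 10))    # -20 -- goal slightly to the right
print(angle_diff(90, 0))      # 90
print(angle_diff(-170, 170))  # 20  -- wraps correctly across the +/-180 seam
```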
2.3.5 Plotting
The following is used to plot the locations, the walls and (eventually) the movement of the robot. It can either plot the movement of the robot as it is going (with the default env.plotting = True), or not plot it as it is going (setting env.plotting = False; in this case the trace can be plotted using pl.plot_run()).
agentTop.py — (continued)
49 plt.plot([x0,x1],[y0,y1],"-k",linewidth=3)
50 for loc in top.locations:
51 (x,y) = top.locations[loc]
52 plt.plot([x],[y],"k<")
53 plt.text(x+1.0,y+0.5,loc) # print the label above and to the right
54 plt.plot([body.rob_x],[body.rob_y],"go")
55 plt.draw()
56
57 def plot_run(self):
58 """plots the history after the agent has finished.
59 This is typically only used if body.plotting==False
60 """
61 xs,ys = zip(*self.body.history)
62 plt.plot(xs,ys,"go")
63 wxs,wys = zip(*self.body.wall_history)
64 plt.plot(wxs,wys,"ro")
65 #plt.draw()
Exercise 2.1 The following code implements a robot trap. Write a controller that
can escape the “trap” and get to the goal. See textbook for hints.
agentTop.py — (continued)
90 # pl=Plot_env(trap_body,trap_top)
91 # trap_top.do({'visit':['goal']})
3. Searching for Solutions
The neighbors is a list of arcs. A (directed) arc consists of a from_node and a to_node. The arc is the pair ⟨from_node, to_node⟩, but can also contain a non-negative cost (which defaults to 1) and can be labeled with an action.
searchProblem.py — (continued)
36 class Arc(object):
37 """An arc has a from_node and a to_node node and a (non-negative) cost"""
38 def __init__(self, from_node, to_node, cost=1, action=None):
39 assert cost >= 0, ("Cost cannot be negative for"+
40 str(from_node)+"->"+str(to_node)+", cost: "+str(cost))
41 self.from_node = from_node
42 self.to_node = to_node
43 self.action = action
44 self.cost=cost
45
46 def __repr__(self):
47 """string representation of an arc"""
48 if self.action:
49 return str(self.from_node)+" --"+str(self.action)+"--> "+str(self.to_node)
50 else:
51 return str(self.from_node)+" --> "+str(self.to_node)
• a start node
To define a search problem, we need to define the start node, the goal predicate,
the neighbors function and the heuristic function.
searchProblem.py — (continued)
53 class Search_problem_from_explicit_graph(Search_problem):
54 """A search problem consists of:
55 * a list or set of nodes
56 * a list or set of arcs
57 * a start node
58 * a list or set of goal nodes
59 * a dictionary that maps each node into its heuristic value.
60 * a dictionary that maps each node into its (x,y) position
61 """
62
63 def __init__(self, nodes, arcs, start=None, goals=set(), hmap={}, positions={}):
64 self.neighs = {}
65 self.nodes = nodes
66 for node in nodes:
67 self.neighs[node]=[]
68 self.arcs = arcs
69 for arc in arcs:
70 self.neighs[arc.from_node].append(arc)
71 self.start = start
72 self.goals = goals
73 self.hmap = hmap
74 self.positions = positions
75
76 def start_node(self):
77 """returns start node"""
78 return self.start
79
80 def is_goal(self,node):
81 """is True if node is a goal"""
82 return node in self.goals
83
84 def neighbors(self,node):
85 """returns the neighbors of node"""
86 return self.neighs[node]
87
88 def heuristic(self,node):
89 """Gives the heuristic value of node n.
90 Returns 0 if not overridden in the hmap."""
91 if node in self.hmap:
92 return self.hmap[node]
93 else:
94 return 0
95
96 def __repr__(self):
97 """returns a string representation of the search problem"""
98 res=""
99 for arc in self.arcs:
100 res += str(arc)+". "
101 return res
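The constructor above indexes the arcs by their from_node once, so neighbors is a dictionary lookup. A miniature standalone version of the same idea (simplified names, not the file's code):

```python
class Arc:
    """a pared-down arc with a from_node, to_node and cost"""
    def __init__(self, from_node, to_node, cost=1):
        self.from_node, self.to_node, self.cost = from_node, to_node, cost
    def __repr__(self):
        return str(self.from_node) + " --> " + str(self.to_node)

arcs = [Arc('A', 'B', 3), Arc('A', 'C', 1), Arc('B', 'G', 5), Arc('C', 'G', 4)]

# Group arcs by from_node once, at construction time.
neighs = {n: [] for n in ['A', 'B', 'C', 'G']}
for arc in arcs:
    neighs[arc.from_node].append(arc)

print(neighs['A'])   # [A --> B, A --> C]
print(neighs['G'])   # [] -- G has no outgoing arcs
```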
3.1.2 Paths
A searcher will return a path from the start node to a goal node. A Python list
is not a suitable representation for a path, as many search algorithms consider
multiple paths at once, and these paths should share initial parts of the path.
If we wanted to do this with Python lists, we would need to keep copying the
list, which can be expensive if the list is long. An alternative representation is
used here in terms of a recursive data structure that can share subparts.
A path is either:
• a path, initial and an arc, where the from node of the arc is the node at the
end of initial.
These cases are distinguished in the following code by having arc = None if the path has length 0, in which case initial is the node of the path. Python's yield is used for enumerations only.
searchProblem.py — (continued)
114 self.arc=arc
115 if arc is None:
116 self.cost=0
117 else:
118 self.cost = initial.cost+arc.cost
119
120 def end(self):
121 """returns the node at the end of the path"""
122 if self.arc is None:
123 return self.initial
124 else:
125 return self.arc.to_node
126
127 def nodes(self):
128 """enumerates the nodes for the path.
129 This starts at the end and enumerates nodes in the path backwards."""
130 current = self
131 while current.arc is not None:
132 yield current.arc.to_node
133 current = current.initial
134 yield current.initial
135
136 def initial_nodes(self):
137 """enumerates the nodes for the path before the end node.
138 This starts at the end and enumerates nodes in the path backwards."""
139 if self.arc is not None:
140 yield from self.initial.nodes()
141
142 def __repr__(self):
143 """returns a string representation of a path"""
144 if self.arc is None:
145 return str(self.initial)
146 elif self.arc.action:
147 return (str(self.initial)+"\n --"+str(self.arc.action)
148 +"--> "+str(self.arc.to_node))
149 else:
150 return str(self.initial)+" --> "+str(self.arc.to_node)
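The start of the Path class is elided above; its constructor presumably takes (initial, arc=None). A standalone sketch with the same fields shows how paths share their initial parts instead of copying lists:

```python
class Arc:
    """a pared-down arc for this sketch"""
    def __init__(self, from_node, to_node, cost=1):
        self.from_node, self.to_node, self.cost = from_node, to_node, cost

class Path:
    """a path is a node (when arc is None) or a Path extended by an arc"""
    def __init__(self, initial, arc=None):
        self.initial = initial
        self.arc = arc
        self.cost = 0 if arc is None else initial.cost + arc.cost

    def nodes(self):
        """enumerates the nodes of the path, backwards from the end"""
        current = self
        while current.arc is not None:
            yield current.arc.to_node
            current = current.initial
        yield current.initial

p0 = Path('A')                     # zero-length path at node A
p1 = Path(p0, Arc('A', 'B', 3))    # A --> B
p2 = Path(p1, Arc('B', 'C', 1))    # A --> B --> C
p3 = Path(p1, Arc('B', 'D', 2))    # shares the prefix A --> B with p2
print(list(p2.nodes()))   # ['C', 'B', 'A'] -- backwards from the end
print(p2.cost, p3.cost)   # 4 5
```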
[Figure: the first search problem graph (nodes A, B, C, D, G with arc costs)]
[Figure 3.2: the second search problem graph (nodes A, B, C, D, E, G, H, J with arc costs)]
The second search problem is one with 8 nodes where many paths do not lead
to the goal. See Figure 3.2.
searchProblem.py — (continued)
[Figure 3.3: simp_delivery_graph with arc costs and h values of nodes]
The third search problem is a disconnected graph (contains no arcs), where the
start node is a goal node. This is a boundary case to make sure that weird cases
work.
searchProblem.py — (continued)
[Figure 3.4: cyclic_simp_delivery_graph with arc costs]
194 'F': 5,
195 'G': 0,
196 'H': 3,
197 'J': 4,
198 })
cyclic_simp_delivery_graph is the graph shown in Figure 3.4. This is the graph of Figure 3.10 in the third edition of the textbook. The heuristic values are the same as in simp_delivery_graph.
searchProblem.py — (continued)
223 'B': 5,
224 'C': 9,
225 'D': 6,
226 'E': 3,
227 'F': 5,
228 'G': 0,
229 'H': 3,
230 'J': 4,
231 })
searchProblem.py — (continued)
316 'c1' : 6,
317 'c2' : 10,
318 'c3' : 12,
319 'storage' : 12
320 }
321 )
3.2.1 Searcher
A Searcher for a problem can be asked repeatedly for the next path. To solve a
problem, we can construct a Searcher object for the problem and then repeatedly
ask for the next path using search. If there are no more paths, None is returned.
37 def search(self):
38 """returns (next) path from the problem's start node
39 to a goal node.
40 Returns None if no path exists.
41 """
42 while not self.empty_frontier():
43 path = self.frontier.pop()
44 self.display(2, "Expanding:",path,"(cost:",path.cost,")")
45 self.num_expanded += 1
46 if self.problem.is_goal(path.end()): # solution found
47 self.display(1, self.num_expanded, "paths have been expanded and",
48 len(self.frontier), "paths remain in the frontier")
49 self.solution = path # store the solution found
50 return path
51 else:
52 neighs = self.problem.neighbors(path.end())
53 self.display(3,"Neighbors are", neighs)
54 for arc in reversed(list(neighs)):
55 self.add_to_frontier(Path(path,arc))
56 self.display(3,"Frontier:",self.frontier)
57 self.display(1,"No (more) solutions. Total of",
58 self.num_expanded,"paths expanded.")
Note that this reverses the neighbors so that it implements depth-first search in an intuitive manner (expanding the first neighbor first). The call to list is for the case when the neighbors are generated (and not already in a list). Reversing the neighbors might not be required for other methods. The calls to reversed and list can be removed, and the algorithm still implements depth-first search.
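The interaction between reversed and pop() can be checked in isolation with a plain list used as a stack:

```python
frontier = []
neighbors = ['n1', 'n2', 'n3']        # the order the problem generates them in

for n in reversed(list(neighbors)):   # push n3, n2, n1 ...
    frontier.append(n)

print(frontier.pop())   # n1 -- so the first neighbor is expanded first
```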
To use depth-first search to find multiple paths for problem1 and simp_delivery_graph, copy and paste the following into Python's read-evaluate-print loop; keep finding next solutions until there are no more:
searchGeneric.py — (continued)
Exercise 3.1 Implement breadth-first search. Only add_to_frontier and/or pop need to be modified to implement a first-in first-out queue.
searchGeneric.py — (continued)
The following methods are used for finding and printing information about
the frontier.
searchGeneric.py — (continued)
3.2.3 A∗ Search
For an A∗ Search the frontier is implemented using the FrontierPQ class.
searchGeneric.py — (continued)
138
139 def test(SearchClass, problem=searchProblem.problem1, solutions=[['G','D','B','C','A']]):
140 """Unit test for aipython searching algorithms.
141 SearchClass is a class that takes a problem and implements search()
142 problem is a search problem
143 solutions is a list of optimal solutions
144 """
145 print("Testing problem 1:")
146 schr1 = SearchClass(problem)
147 path1 = schr1.search()
148 print("Path found:",path1)
149 assert path1 is not None, "No path is found in problem1"
150 assert list(path1.nodes()) in solutions, "Shortest path not found in problem1"
151 print("Passed unit test")
152
153 if __name__ == "__main__":
154 #test(Searcher)
155 test(AStarSearcher)
156
157 # example queries:
158 # searcher1 = Searcher(searchProblem.acyclic_delivery_problem) # DFS
159 # searcher1.search() # find first path
160 # searcher1.search() # find next path
161 # searcher2 = AStarSearcher(searchProblem.acyclic_delivery_problem) # A*
162 # searcher2.search() # find first path
163 # searcher2.search() # find next path
164 # searcher3 = Searcher(searchProblem.cyclic_delivery_problem) # DFS
165 # searcher3.search() # find first path with DFS. What do you expect to happen?
166 # searcher4 = AStarSearcher(searchProblem.cyclic_delivery_problem) # A*
167 # searcher4.search() # find first path
Exercise 3.2 Change the code so that it implements (i) best-first search and (ii)
lowest-cost-first search. For each of these methods compare it to A∗ in terms of the
number of paths expanded, and the path found.
Exercise 3.3 In the add method in FrontierPQ what does the "-" in front of frontier_index do? When there are multiple paths with the same f-value, which search method does this act like? What happens if the "-" is removed? When there are multiple paths with the same value, which search method does this act like? Does it work better with or without the "-"? What evidence did you base your conclusion on?
Exercise 3.4 The searcher acts like a Python iterator, in that it returns one value (here a path) and then returns other values (paths) on demand, but does not implement the iterator interface. Change the code so it implements the iterator interface. What does this enable us to do?
49 if __name__ == "__main__":
50 test(SearcherMPP)
51
52 import searchProblem
53 # searcherMPPcdp = SearcherMPP(searchProblem.cyclic_delivery_problem)
54 # print(searcherMPPcdp.search()) # find first path
Depth-first search methods do not need a priority queue, but can use a list as a stack. In this implementation of branch-and-bound search, we call search to find an optimal solution with cost less than bound. This uses depth-first search to find a path to a goal that extends path with cost less than the bound. Once a path to a goal has been found, that path is remembered as the best path, the bound is reduced, and the search continues.
searchBranchAndBound.py — Branch and Bound Search
11 from searchProblem import Path
12 from searchGeneric import Searcher
13 from display import Displayable, visualize
14
15 class DF_branch_and_bound(Searcher):
16 """returns a branch and bound searcher for a problem.
17 An optimal path with cost less than bound can be found by calling search()
18 """
19 def __init__(self, problem, bound=float("inf")):
20 """creates a searcher that can be used with search() to find an optimal path.
21 bound gives the initial bound. By default this is infinite - meaning there
22 is no initial pruning due to depth bound
23 """
24 super().__init__(problem)
25 self.best_path = None
26 self.bound = bound
27
28 @visualize
29 def search(self):
30 """returns an optimal solution to a problem with cost less than bound.
31 returns None if there is no solution with cost less than bound."""
32 self.frontier = [Path(self.problem.start_node())]
33 self.num_expanded = 0
34 while self.frontier:
35 path = self.frontier.pop()
36 if path.cost+self.problem.heuristic(path.end()) < self.bound:
37 # if path.end() not in path.initial_nodes(): # for cycle pruning
38 self.display(3,"Expanding:",path,"cost:",path.cost)
39 self.num_expanded += 1
40 if self.problem.is_goal(path.end()):
41 self.best_path = path
42 self.bound = path.cost
43 self.display(2,"New best path:",path," cost:",path.cost)
44 else:
45 neighs = self.problem.neighbors(path.end())
46 self.display(3,"Neighbors are", neighs)
47 for arc in reversed(list(neighs)):
48 self.add_to_frontier(Path(path, arc))
49 self.display(1,"Number of paths expanded:",self.num_expanded,
50 "(optimal" if self.best_path else "(no", "solution found)")
51 self.solution = self.best_path
52 return self.best_path
Note that this code uses reversed in order to expand the neighbors of a node
in the left-to-right order one might expect. It does this because pop() removes
the rightmost element of the list. The call to list is there because reversed
requires a sequence, but the neighbors may be returned as a generator.
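A quick standalone check of this point, with plain integers standing in for arcs:

```python
def gen_neighbors():
    """A stand-in for problem.neighbors(node) that yields values lazily."""
    yield from [1, 2, 3]

# reversed() needs a sequence; applying it to a generator raises TypeError
try:
    reversed(gen_neighbors())
    needs_list = False
except TypeError:
    needs_list = True

# converting to a list first works; pushing the reversed neighbors onto a
# stack means pop() then expands them in left-to-right order
stack = list(reversed(list(gen_neighbors())))
expansion_order = [stack.pop() for _ in range(len(stack))]
```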
Here is a unit test and some queries:
searchBranchAndBound.py — (continued)
Exercise 3.7 After the branch-and-bound search found a solution, Sam ran search
again, and noticed a different count. Sam hypothesized that this count was related
to the number of nodes that an A∗ search would use (either expanded or added to
the frontier). Or maybe, Sam thought, the count when the bound starts slightly
above the cost of an optimal path is related to how A∗ would behave.
Is there a relationship between these counts? Are there different things that it could
count so they are related? Try to find the most specific statement that is true, and
explain why it is true.
To test the hypothesis, Sam wrote the following code, but isn’t sure it is helpful:
46
47 print("\nDepth-first search: (Use ^C if it goes on forever)")
48 tsearcher = Searcher(problem)
49 print("Path found:",tsearcher.search()," cost=",tsearcher.solution.cost)
50
51
52 import searchProblem
53 from searchTest import run
54 if __name__ == "__main__":
55 run(searchProblem.problem1,"Problem 1")
56 # run(searchProblem.acyclic_delivery_problem,"Acyclic Delivery")
57 # run(searchProblem.cyclic_delivery_problem,"Cyclic Delivery")
58 # also test some graphs with cycles, and some with multiple least-cost paths
4. Reasoning with Constraints
32 def __str__(self):
33 return self.name
34
35 def __repr__(self):
36 return self.name # f"Variable({self.name})"
4.1.2 Constraints
A constraint consists of:
• scope, a tuple of variables
• condition, a Boolean function that can be applied to a tuple of values for the variables in the scope
• an optional name (string), used when printing the constraint
• an optional position, used when drawing the constraint network
cspProblem.py — (continued)
38 class Constraint(object):
39 """A Constraint consists of
40 * scope: a tuple of variables
41 * condition: a Boolean function that can be applied to a tuple of values for the variables in scope
42 * string: a string for printing the constraints. All of the strings
43 must be unique for the variables
44 """
45 def __init__(self, scope, condition, string=None, position=None):
46 self.scope = scope
47 self.condition = condition
48 if string is None:
49 self.string = self.condition.__name__ + str(self.scope)
50 else:
51 self.string = string
52 self.position = position
53
54 def __repr__(self):
55 return self.string
An assignment is a variable:value dictionary.
If con is a constraint, con.holds(assignment) returns True or False depending
on whether the condition is true or false for that assignment. The assignment
must assign a value to every variable in the scope of the constraint con (and
could also assign values to other variables); con.holds gives an error if not
all variables in the scope of con are assigned in the assignment. It ignores
variables in the assignment that are not in the scope of the constraint.
In Python, the ∗ notation is used for unpacking a tuple. For example,
F(∗(1, 2, 3)) is the same as F(1, 2, 3). So if t has value (1, 2, 3), then F(∗t) is
the same as F(1, 2, 3).
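For example, this is how a condition over a scope can be applied to a tuple of values:

```python
def F(a, b, c):
    return a + b + c

t = (1, 2, 3)
# F(*t) unpacks t into three positional arguments, i.e. F(1, 2, 3)
unpacked = F(*t)

# the same mechanism lets a Boolean condition be applied to the tuple
# of values for the variables in a constraint's scope
def lt(x, y):
    return x < y

holds = lt(*(1, 2))
```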
cspProblem.py — (continued)
4.1.3 CSPs
A constraint satisfaction problem (CSP) requires:
• a title (a string)
• a set of variables
• a list of constraints
cspProblem.py — (continued)
71 class CSP(object):
72 """A CSP consists of
73 * a title (a string)
74 * variables, a set of variables
75 * constraints, a list of constraints
76 * var_to_const, a variable to set of constraints dictionary
77 """
78 def __init__(self, title, variables, constraints):
79 """title is a string
80 variables is a set of variables
81 constraints is a list of constraints
82 """
83 self.title = title
84 self.variables = variables
85 self.constraints = constraints
86 self.var_to_const = {var:set() for var in self.variables}
87 for con in constraints:
88 for var in con.scope:
89 self.var_to_const[var].add(con)
90
91 def __str__(self):
92 """string representation of CSP"""
93 return str(self.title)
94
95 def __repr__(self):
96 """more detailed string representation of CSP"""
97 return f"CSP({self.title}, {self.variables}, {([str(c) for c in self.constraints])})"
99 def consistent(self,assignment):
100 """assignment is a variable:value dictionary
101 returns True if all of the constraints that can be evaluated
102 evaluate to True given assignment.
103 """
104 return all(con.holds(assignment)
105 for con in self.constraints
106 if con.can_evaluate(assignment))
The show method uses matplotlib to show the graphical structure of a con-
straint network.
cspProblem.py — (continued)
4.1.4 Examples
In the following code ne_, when given a number, returns a function that is true
when its argument is not that number. For example, if f = ne_(3), then f(2)
is True and f(3) is False. That is, ne_(x)(y) is true when x ≠ y. Allowing
a function of multiple arguments to use its arguments one at a time is called
currying, after the logician Haskell Curry. Functions used as conditions in
constraints require names (so they can be printed).
cspExamples.py — Example CSPs
11 from cspProblem import Variable, CSP, Constraint
12 from operator import lt,ne,eq,gt
13
14 def ne_(val):
15 """not equal value"""
16 # nev = lambda x: x != val # alternative definition
17 # nev = partial(ne,val) # another alternative definition (needs functools.partial)
18 def nev(x):
19 return val != x
20 nev.__name__ = str(val)+"!=" # name of the function
21 return nev
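ne_ can be exercised directly (restated here so the snippet is self-contained); note the assigned __name__, which is what appears when a constraint is printed:

```python
def ne_(val):
    """not equal value (restated from cspExamples.py)"""
    def nev(x):
        return val != x
    nev.__name__ = str(val) + "!="   # name of the function
    return nev

f = ne_(3)   # curried use: ne_(3)(2) is True, ne_(3)(3) is False
```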
Similarly, is_(x)(y) is true when x = y.
cspExamples.py — (continued)
23 def is_(val):
24 """is a value"""
25 # isv = lambda x: x == val # alternative definition
26 # isv = partial(eq,val) # another alternative definition
27 def isv(x):
28 return val == x
29 isv.__name__ = str(val)+"=="
30 return isv
The CSP csp0 has variables X, Y, and Z, each with domain {1, 2, 3}. The con-
straints are X < Y and Y < Z.
cspExamples.py — (continued)
32 X = Variable('X', {1,2,3})
33 Y = Variable('Y', {1,2,3})
[figure omitted: the constraint network for csp1, with variables A, B, C and constraints A < B, B != 2, B < C]
34 Z = Variable('Z', {1,2,3})
35 csp0 = CSP("csp0", {X,Y,Z},
36 [ Constraint([X,Y],lt),
37 Constraint([Y,Z],lt)])
The CSP csp1 has variables A, B, and C, each with domain {1, 2, 3, 4}. The con-
straints are A < B, B ≠ 2, and B < C. This is slightly more interesting than csp0
as it has more solutions. This example is used in the unit tests, and so if it is
changed, the unit tests need to be changed.
cspExamples.py — (continued)
[figure omitted: the constraint network for csp2, with variables A, B, C, D, E and constraints A != B, B != 3, A = D, B != D, A != C, A > E, B > E, C < D, D > E, C > E, C != 2]
The following example is another scheduling problem (but with multiple an-
swers). This is the same as scheduling problem 2 in the original AIspace.org
consistency app.
cspExamples.py — (continued)
[figure omitted: the constraint network for csp3, with constraints A != B, A < D, D != E, C != E]
cspExamples.py — (continued)
72 def adjacent(x,y):
73 """True when x and y are adjacent numbers"""
74 return abs(x-y) == 1
75
76 csp4 = CSP("csp4", {A,B,C,D,E},
77 [Constraint([A,B], adjacent, "adjacent(A,B)"),
78 Constraint([B,C], adjacent, "adjacent(B,C)"),
79 Constraint([C,D], adjacent, "adjacent(C,D)"),
80 Constraint([D,E], adjacent, "adjacent(D,E)"),
81 Constraint([A,C], ne, "A != C"),
82 Constraint([B,D], ne, "B != D"),
83 Constraint([C,E], ne, "C != E")])
[figure omitted: the constraint network for csp4, with constraints adjacent(A,B), adjacent(B,C), adjacent(C,D), adjacent(D,E), A != C, B != D, C != E]
[figure omitted: a crossword grid with slots one_across, one_down, two_down, three_across, four_across. Words: ant, big, bus, car, has, book, buys, hold, lane, year, ginger, search, symbol, syntax]
[figure omitted: the constraint network crossword1, with constraints meet_at(0,0)[one_across, one_down], meet_at(2,0)[one_across, two_down], meet_at(0,2)[three_across, one_down], meet_at(2,2)[three_across, two_down], meet_at(0,4)[four_across, two_down]]
The constraint meet_at(2,0)[one_across, two_down] means that the third letter
(at position 2) of the first argument is the same as the first letter of the second
argument. This is shown in Figure 4.6.
cspExamples.py — (continued)
85 def meet_at(p1,p2):
86 """returns a function of two words that is true
87 when the words intersect at positions p1, p2.
88 The positions are relative to the words; starting at position 0.
89 meet_at(p1,p2)(w1,w2) is true if the same letter is at position p1 of word w1
90 and at position p2 of word w2.
91 """
92 def meets(w1,w2):
93 return w1[p1] == w2[p2]
94 meets.__name__ = "meet_at("+str(p1)+','+str(p2)+')'
95 return meets
96
97 one_across = Variable('one_across', {'ant', 'big', 'bus', 'car', 'has'}, position=(0.3,0.9))
98 one_down = Variable('one_down', {'book', 'buys', 'hold', 'lane', 'year'}, position=(0.1,0.7))
99 two_down = Variable('two_down', {'ginger', 'search', 'symbol', 'syntax'}, position=(0.9,0.8))
100 three_across = Variable('three_across', {'book', 'buys', 'hold', 'land', 'year'}, position=(0.1,0.3))
101 four_across = Variable('four_across', {'ant', 'big', 'bus', 'car', 'has'}, position=(0.7,0.0))
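meet_at can be checked on its own (restated here so the snippet runs standalone):

```python
def meet_at(p1, p2):
    """restated from cspExamples.py: true when w1[p1] == w2[p2]"""
    def meets(w1, w2):
        return w1[p1] == w2[p2]
    meets.__name__ = "meet_at(" + str(p1) + ',' + str(p2) + ')'
    return meets

# the third letter of 'bus' is the first letter of 'search'
crosses = meet_at(2, 0)('bus', 'search')
```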
[figure omitted: the constraint network crossword1d, with letter variables p00 ... p25 and an is_word constraint over each word slot, e.g. is_word[p00, p10, p20]]
cspExamples.py — (continued)
119 "z"}
120
121 # pij is the variable representing the letter i from the left and j down (starting from 0)
122 p00 = Variable('p00', letters, position=(0.1,0.85))
123 p10 = Variable('p10', letters, position=(0.3,0.85))
124 p20 = Variable('p20', letters, position=(0.5,0.85))
125 p01 = Variable('p01', letters, position=(0.1,0.7))
126 p21 = Variable('p21', letters, position=(0.5,0.7))
127 p02 = Variable('p02', letters, position=(0.1,0.55))
128 p12 = Variable('p12', letters, position=(0.3,0.55))
129 p22 = Variable('p22', letters, position=(0.5,0.55))
130 p32 = Variable('p32', letters, position=(0.7,0.55))
131 p03 = Variable('p03', letters, position=(0.1,0.4))
132 p23 = Variable('p23', letters, position=(0.5,0.4))
133 p24 = Variable('p24', letters, position=(0.5,0.25))
134 p34 = Variable('p34', letters, position=(0.7,0.25))
135 p44 = Variable('p44', letters, position=(0.9,0.25))
136 p25 = Variable('p25', letters, position=(0.5,0.1))
137
138 crossword1d = CSP("crossword1d",
139 {p00, p10, p20, # first row
140 p01, p21, # second row
141 p02, p12, p22, p32, # third row
142 p03, p23, #fourth row
143 p24, p34, p44, # fifth row
144 p25 # sixth row
145 },
146 [Constraint([p00, p10, p20], is_word, position=(0.3,0.95)), # 1-across
147 Constraint([p00, p01, p02, p03], is_word, position=(0,0.625)), # 1-down
148 Constraint([p02, p12, p22, p32], is_word, position=(0.3,0.625)), # 3-across
149 Constraint([p20, p21, p22, p23, p24, p25], is_word, position=(0.45,0.475)), # 2-down
150 Constraint([p24, p34, p44], is_word, position=(0.7,0.325)) # 4-across
151 ])
Exercise 4.1 How many assignments of a value to each variable are there for
each of the representations of the above crossword? Do you think an exhaustive
enumeration will work for either one?
The queens problem is a puzzle on a chess board, where the idea is to place
a queen on each column so the queens cannot take each other: no two queens
are on the same row, column or diagonal. The n-queens problem is a
generalization where the size of the board is n × n, and n queens have to be
placed.
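The pairwise representation described above can be sketched independently of the CSP classes (an illustrative helper, not the module's code); variable i holds the row of the queen in column i:

```python
def queens_conditions(n):
    """one no-attack condition per pair of columns i < j:
    different rows, and row difference != column difference (diagonals)"""
    def no_attack(i, j):
        def ok(ri, rj):
            return ri != rj and abs(ri - rj) != j - i
        return ok
    return [((i, j), no_attack(i, j)) for i in range(n) for j in range(i + 1, n)]

def is_solution(rows, conditions):
    """rows[i] is the row of the queen in column i"""
    return all(cond(rows[i], rows[j]) for (i, j), cond in conditions)
```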
cspExamples.py — (continued)
Exercise 4.2 How many constraints does this representation of the n-queens
problem produce? Can it be done with fewer constraints? Either explain why it
can’t be done with fewer constraints, or give a solution using fewer constraints.
Unit tests
The following defines a unit test for csp solvers, by default using example csp1.
cspExamples.py — (continued)
Exercise 4.3 Modify test so that instead of taking in a list of solutions, it checks
whether the returned solution actually is a solution.
Exercise 4.4 Propose a test that is appropriate for CSPs with no solutions. As-
sume that the test designer knows there are no solutions. Consider what a CSP
solver should return if there are no solutions to the CSP.
Exercise 4.5 Write a unit test that checks whether all solutions (e.g., for the search
algorithms that can return multiple solutions) are correct, and whether all solu-
tions can be found.
42 return next(gen)
43 except StopIteration:
44 return None
45
46 if __name__ == "__main__":
47 test_csp(dfs_solve1)
48
49 #Try:
50 # dfs_solve_all(csp1)
51 # dfs_solve_all(csp2)
52 # dfs_solve_all(crossword1)
53 # dfs_solve_all(crossword1d) # warning: may take a *very* long time!
Exercise 4.6 Instead of testing all constraints at every node, change it so each
constraint is only tested when all of its variables are assigned. Given an elimina-
tion ordering, it is possible to determine when each constraint needs to be tested.
Implement this. Hint: create a parallel list of sets of constraints, where at each po-
sition i in the list, the constraints at position i can be evaluated when the variable
at position i has been assigned.
Exercise 4.7 Estimate how long dfs_solve_all(crossword1d) will take on your
computer. To do this, reduce the number of variables that need to be assigned,
so that the simplified problem can be solved in a reasonable time (between 0.1
second and 10 seconds). This can be done by reducing the number of variables in
var_order, as the program only splits on these. How much more time will it take
if the number of variables is increased by 1? (Try it!) Then extrapolate to all of the
variables. See Section 1.6.1 for how to time your code. Would making the code 100
times faster or using a computer 100 times faster help?
The next solver constructs a search space that can be solved using the search
methods of the previous chapter. This takes in a CSP problem and an optional
variable ordering, which is a list of the variables in the CSP. In this search space:
• A node is a variable : value dictionary which does not violate any con-
straints (so that dictionaries that violate any constraints are not added).
Exercise 4.8 What would happen if we constructed the new assignment by as-
signing node[var] = val (with side effects) instead of using dictionary union? Give
an example of where this could give a wrong answer. How could the algorithm be
changed to work with side effects? (Hint: think about what information needs to
be in a node).
Exercise 4.9 Change neighbors so that it returns an iterator of values rather than
a list. (Hint: use yield.)
The following selects an arc. Any element of to do can be selected. The selected
element needs to be removed from to do. The default implementation just se-
lects whichever element the pop method for sets returns. A user interface could
allow the user to select an arc. Alternatively, a more sophisticated selection
could be employed (or just a stack or a queue).
cspConsistency.py — (continued)
The value of new_domain is the subset of the domain of var that is consistent
with the assignment to the other variables. It might be easier to understand the
following code, which treats unary (with no other variables in the constraint)
and binary (with one other variable in the constraint) constraints as special
cases (this can replace the assignment to new_domain in the above code):
if len(other_vars)==0: # unary constraint
new_domain = {val for val in domains[var]
if const.holds({var:val})}
elif len(other_vars)==1: # binary constraint
other = other_vars[0]
new_domain = {val for val in domains[var]
if any(const.holds({var: val,other:other_val})
for other_val in domains[other])}
else: # general case
new_domain = {val for val in domains[var]
if self.any_holds(domains, const, {var: val}, other_vars)}
any holds is a recursive function that tries to find an assignment of values to the
other variables (other vars) that satisfies constraint const given the assignment
in env. The integer variable ind specifies which index of other vars needs to be
checked next. As soon as one assignment returns True, the algorithm returns
True. Note that it has side effects with respect to env; it changes the values of
the variables in other vars. It should only be called when the side effects have
no ill effects.
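The recursion can be sketched standalone, with a plain condition function standing in for const.holds (the names here are illustrative):

```python
def any_holds(domains, condition, scope, env, other_vars, ind=0):
    """tries assignments to other_vars[ind:]; mutates env (side effect);
    returns True as soon as some completion satisfies the condition"""
    if ind == len(other_vars):
        return condition(*(env[v] for v in scope))
    var = other_vars[ind]
    for val in domains[var]:
        env[var] = val
        if any_holds(domains, condition, scope, env, other_vars, ind + 1):
            return True
    return False

domains = {'X': {1, 2}, 'Y': {1, 2, 3}}
# is there a value for Y making X < Y, given X = 2?
found = any_holds(domains, lambda x, y: x < y, ('X', 'Y'), {'X': 2}, ['Y'])
```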
cspConsistency.py — (continued)
111
112 def select_var(self, iter_vars):
113 """return the next variable to split"""
114 return select(iter_vars)
115
116 def partition_domain(dom):
117 """partitions domain dom into two.
118 """
119 split = len(dom) // 2
120 dom1 = set(list(dom)[:split])
121 dom2 = dom - dom1
122 return dom1, dom2
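For example (only the sizes are guaranteed, since the order of iterating over a set is arbitrary):

```python
def partition_domain(dom):
    """restated from cspConsistency.py: splits dom into two halves"""
    split = len(dom) // 2
    dom1 = set(list(dom)[:split])
    dom2 = dom - dom1
    return dom1, dom2

d1, d2 = partition_domain({1, 2, 3, 4})
# d1 and d2 are disjoint, cover the domain, and have two elements each
```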
cspConsistency.py — (continued)
cspConsistency.py — (continued)
Exercise 4.10 Implement of solve all that is like solve one but returns the set of all
solutions.
Exercise 4.11 Implement solve enum that enumerates the solutions. It should use
Python’s yield (and perhaps yield from).
Unit test:
cspConsistency.py — (continued)
cspConsistency.py — (continued)
Exercise 4.12 When splitting a domain, this code splits the domain into half,
approximately in half (without any effort to make a sensible choice). Does it work
better to split one element from a domain?
Unit test:
cspConsistency.py — (continued)
Testing:
cspConsistency.py — (continued)
199 from cspExamples import csp1, csp2, csp3, csp4, crossword1, crossword1d
200
201 ## Test Solving CSPs with Arc consistency and domain splitting:
202 #Con_solver.max_display_level = 4 # display details of AC (0 turns off)
203 #Con_solver(csp1).solve_one()
204 #searcher1d = Searcher(Search_with_AC_from_CSP(csp1))
205 #print(searcher1d.search())
206 #Searcher.max_display_level = 2 # display search trace (0 turns off)
207 #searcher2c = Searcher(Search_with_AC_from_CSP(csp2))
208 #print(searcher2c.search())
209 #searcher3c = Searcher(Search_with_AC_from_CSP(crossword1))
210 #print(searcher3c.search())
211 #searcher4c = Searcher(Search_with_AC_from_CSP(crossword1d))
212 #print(searcher4c.search())
This implements the two-stage choice, the any-conflict algorithm, and
a random choice of variable (and a probabilistic mix of the three).
Given a CSP, the stochastic local searcher (SLSearcher) creates the data struc-
tures:
• variables to select is the set of all of the variables with domain size greater
than one. For a variable not in this set, we cannot pick another value for
that variable.
• var to constraints maps from a variable into the set of constraints it is in-
volved in. Note that the inverse mapping from constraints into variables
is part of the definition of a constraint.
restart creates a new total assignment, and constructs the set of conflicts (the
constraints that are false in this assignment).
cspSLS.py — (continued)
29 def restart(self):
30 """creates a new total assignment and the conflict set
31 """
32 self.current_assignment = {var:random_choice(var.domain) for
33 var in self.csp.variables}
34 self.display(2,"Initial assignment",self.current_assignment)
35 self.conflicts = set()
36 for con in self.csp.constraints:
37 if not con.holds(self.current_assignment):
38 self.conflicts.add(con)
39 self.display(2,"Number of conflicts",len(self.conflicts))
40 self.variable_pq = None
The search method is the top-level searching algorithm. It can either be used
to start the search or to continue searching. If there is no current assignment,
it must create one. Note that, when counting steps, a restart is counted as one
step.
This method selects one of two implementations. The argument prob best
is the probability of selecting a best variable (one involving the most conflicts).
When the value of prob best is positive, the algorithm needs to maintain a prior-
ity queue of variables and the number of conflicts (using search with var pq). If
the probability of selecting a best variable is zero, it does not need to maintain
this priority queue (as implemented in search with any conflict).
The argument prob anycon is the probability that the any-conflict strategy is
used (which selects a variable at random that is in a conflict), assuming that
it is not picking a best variable. Note that for the probability parameters, any
value less than zero acts like probability zero and any value greater than 1 acts
like probability 1. This means that when prob anycon = 1.0, a best variable is
chosen with probability prob best, otherwise a variable in any conflict is chosen.
A variable is chosen at random with probability 1 − prob anycon − prob best as
long as that is positive.
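The variable-selection mix can be sketched as a pure function of a single uniform draw r in [0, 1), which makes the clamping behaviour easy to test (this helper is illustrative, not part of cspSLS.py):

```python
def pick_strategy(prob_best, prob_anycon, r):
    """returns which selection strategy a draw r falls into,
    clamping each probability into [0, 1] as described above"""
    pb = min(max(prob_best, 0.0), 1.0)
    pa = min(max(prob_anycon, 0.0), 1.0)
    if r < pb:
        return "best"            # a variable involving the most conflicts
    elif r < pb + pa:
        return "any_conflict"    # a random variable in some conflict
    else:
        return "random"          # any variable, chosen at random
```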
This returns the number of steps needed to find a solution, or None if no
solution is found. If there is a solution, it is in self .current assignment.
cspSLS.py — (continued)
Exercise 4.13 This does an initial random assignment but does not do any ran-
dom restarts. Implement a searcher that takes in the maximum number of walk
steps (corresponding to existing max steps) and the maximum number of restarts,
and returns the total number of steps for the first solution found. (As in search, the
solution found can be extracted from the variable self .current assignment).
4.5.1 Any-conflict
If the probability of picking a best variable is zero, the implementation needs
to keep track of which variables are in conflicts.
cspSLS.py — (continued)
Exercise 4.14 This makes no attempt to find the best alternative value for a vari-
able. Modify the code so that, after selecting a variable, it selects a value that reduces
the number of conflicts the most. Have a parameter that specifies the probabil-
ity that the best value is chosen.
cspSLS.py — (continued)
132 var_differential[v] = var_differential.get(v,0)-1
133 else:
134 if varcon not in self.conflicts: # was consis, not now
135 self.display(3,"Became inconsistent",varcon)
136 self.conflicts.add(varcon)
137 for v in varcon.scope: # v is in one more conflict
138 var_differential[v] = var_differential.get(v,0)+1
139 self.variable_pq.update_each_priority(var_differential)
140 self.display(2,"Number of conflicts",len(self.conflicts))
141 if not self.conflicts: # no conflicts, so solution found
142 self.display(1,"Solution found:", self.current_assignment, "in",
143 self.number_of_steps,"steps")
144 return self.number_of_steps
145 self.display(1,"No solution in",self.number_of_steps,"steps",
146 len(self.conflicts),"conflicts remain")
147 return None
cspSLS.py — (continued)
cspSLS.py — (continued)
Exercise 4.15 This makes no attempt to find the best alternative value for a vari-
able. Modify the code so that, after selecting a variable, it selects a value that reduces
the number of conflicts the most. Have a parameter that specifies the probabil-
ity that the best value is chosen.
Exercise 4.16 These implementations always select a value for the variable se-
lected that is different from its current value (if that is possible). Change the code
so that it does not have this restriction (so it can leave the value the same). Would
you expect this code to be faster? Does it work worse (or better)?
193
194 def remove(self,elt):
195 """remove the element from the priority queue"""
196 if elt in self.elt_map:
197 self.elt_map[elt][2] = self.REMOVED
198 del self.elt_map[elt]
199
200 def update_each_priority(self,update_dict):
201 """update values in the priority queue by subtracting the values in
202 update_dict from the priority of those elements in priority queue.
203 """
204 for elt,incr in update_dict.items():
205 if incr != 0:
206 newval = self.elt_map.get(elt,[0])[0] - incr
207 assert newval <= 0, str(elt)+":"+str(newval+incr)+"-"+str(incr)
208 self.remove(elt)
209 if newval != 0:
210 self.add(elt,newval)
211
212 def pop(self):
213 """Removes and returns the (elt,value) pair with minimal value.
214 If the priority queue is empty, IndexError is raised.
215 """
216 self.max_size = max(self.max_size, len(self.pq)) # keep statistics
217 triple = heapq.heappop(self.pq)
218 while triple[2] == self.REMOVED:
219 triple = heapq.heappop(self.pq)
220 del self.elt_map[triple[2]]
221 return triple[2], triple[0] # elt, value
222
223 def top(self):
224 """Returns the (elt,value) pair with minimal value, without removing it.
225 If the priority queue is empty, IndexError is raised.
226 """
227 self.max_size = max(self.max_size, len(self.pq)) # keep statistics
228 triple = self.pq[0]
229 while triple[2] == self.REMOVED:
230 heapq.heappop(self.pq)
231 triple = self.pq[0]
232 return triple[2], triple[0] # elt, value
233
234 def empty(self):
235 """returns True iff the priority queue is empty"""
236 return all(triple[2] == self.REMOVED for triple in self.pq)
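The lazy-deletion idea behind remove, pop, and top can be seen in a minimal standalone version (simplified; the book's class also keeps statistics and supports priority updates):

```python
import heapq

REMOVED = "*removed*"   # sentinel marking stale heap entries

class MinPQ:
    """a minimal updatable min-priority queue using lazy deletion"""
    def __init__(self):
        self.pq = []        # heap of [value, count, elt] triples
        self.elt_map = {}   # elt -> its triple on the heap
        self.count = 0      # tie-breaker so elts are never compared

    def add(self, elt, val):
        triple = [val, self.count, elt]
        self.count += 1
        self.elt_map[elt] = triple
        heapq.heappush(self.pq, triple)

    def remove(self, elt):
        if elt in self.elt_map:
            self.elt_map[elt][2] = REMOVED   # mark in place; pop skips it
            del self.elt_map[elt]

    def pop(self):
        triple = heapq.heappop(self.pq)
        while triple[2] == REMOVED:          # discard stale entries lazily
            triple = heapq.heappop(self.pq)
        del self.elt_map[triple[2]]
        return triple[2], triple[0]          # elt, value

q = MinPQ()
q.add('a', -3)
q.add('b', -5)
q.add('c', -1)
q.remove('b')   # marked, not physically deleted until popped over
```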
cspSLS.py — (continued)
4.5.5 Testing
cspSLS.py — (continued)
cspSoft.py — (continued)
98 else:
99 var = next(var for var in self.csp.variables if var not in asst)
100 for val in var.domain:
101 self.cbsearch({var:val}|asst, newcost, rem_cons)
102
103 # bnb = DF_branch_and_bound_opt(scsp1)
104 # bnb.max_display_level=3 # show more detail
105 # bnb.optimize()
5. Propositions and Inference
An askable atom can be asked of the user. The user can respond in English or
French or just with a “y”.
logicProblem.py — (continued)
27 class Askable(object):
28 """An askable atom"""
29
30 def __init__(self,atom):
31 """atom that can be asked of the user"""
32 self.atom=atom
33
34 def __str__(self):
35 """returns the string representation of a clause."""
36 return "askable " + self.atom + "."
37
38 def yes(ans):
39 """returns true if the answer is yes in some form"""
40 return ans.lower() in ['yes', 'yes.', 'oui', 'oui.', 'y', 'y.'] # bilingual
A knowledge base is a list of clauses and askables. In order to make top-down
inference faster, this creates a dictionary that maps each atom into the set of
clauses with that atom in the head.
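The index can be sketched with (head, body) tuples standing in for Clause objects:

```python
clauses = [("i_am", ["i_think"]), ("i_think", []),
           ("i_smell", ["i_exist"]), ("i_am", ["i_exist"])]

# map each atom to the list of clauses with that atom in the head,
# so top-down inference can find candidate clauses in one lookup
atom_to_clauses = {}
for head, body in clauses:
    atom_to_clauses.setdefault(head, []).append((head, body))
```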
logicProblem.py — (continued)
71 triv_KB = KB([
72 Clause('i_am', ['i_think']),
73 Clause('i_think'),
74 Clause('i_smell', ['i_exist'])
75 ])
Here is a representation of the electrical domain of the textbook:
logicProblem.py — (continued)
77 elect = KB([
78 Clause('light_l1'),
79 Clause('light_l2'),
80 Clause('ok_l1'),
81 Clause('ok_l2'),
82 Clause('ok_cb1'),
83 Clause('ok_cb2'),
84 Clause('live_outside'),
85 Clause('live_l1', ['live_w0']),
86 Clause('live_w0', ['up_s2','live_w1']),
87 Clause('live_w0', ['down_s2','live_w2']),
88 Clause('live_w1', ['up_s1', 'live_w3']),
89 Clause('live_w2', ['down_s1','live_w3' ]),
90 Clause('live_l2', ['live_w4']),
91 Clause('live_w4', ['up_s3','live_w3' ]),
92 Clause('live_p_1', ['live_w3']),
93 Clause('live_w3', ['live_w5', 'ok_cb1']),
94 Clause('live_p_2', ['live_w6']),
95 Clause('live_w6', ['live_w5', 'ok_cb2']),
96 Clause('live_w5', ['live_outside']),
97 Clause('lit_l1', ['light_l1', 'live_l1', 'ok_l1']),
98 Clause('lit_l2', ['light_l2', 'live_l2', 'ok_l2']),
99 Askable('up_s1'),
100 Askable('down_s1'),
101 Askable('up_s2'),
102 Askable('down_s2'),
103 Askable('up_s3'),
104 Askable('down_s3')
105 ])
106
107 # print(kb)
The following knowledge base is false in the intended interpretation. One of
the clauses is wrong; can you see which one? We will show how to debug it.
logicProblem.py — (continued)
15 """
16 fp = ask_askables(kb)
17 added = True
18 while added:
19 added = False # added is true when an atom was added to fp this iteration
20 for c in kb.clauses:
21 if c.head not in fp and all(b in fp for b in c.body):
22 fp.add(c.head)
23 added = True
24 kb.display(2,c.head,"added to fp due to clause",c)
25 return fp
26
27 def ask_askables(kb):
28 return {at for at in kb.askables if yes(input("Is "+at+" true? "))}
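The same fixed-point loop can be run standalone on (head, body) pairs (no askables) to see which atoms are derivable:

```python
def fixed_point(clauses):
    """clauses is a list of (head, body) pairs; returns the set of
    atoms derivable bottom-up (the least fixed point)"""
    fp = set()
    added = True
    while added:
        added = False
        for head, body in clauses:
            if head not in fp and all(b in fp for b in body):
                fp.add(head)
                added = True
    return fp

# mirrors triv_KB: i_think is a fact, i_am follows, i_smell does not
derivable = fixed_point([("i_am", ["i_think"]), ("i_think", []),
                         ("i_smell", ["i_exist"])])
```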
The following provides a trivial unit test, by default using the knowledge base
triv_KB:
logicBottomUp.py — (continued)
Exercise 5.1 It is not very user-friendly to ask all of the askables up-front. Imple-
ment ask-the-user so that questions are only asked if useful, and are not re-asked.
For example, if there is a clause h ← a ∧ b ∧ c ∧ d ∧ e, where c and e are askable,
c and e only need to be asked if a, b, d are all in fp and they have not been asked
before. Askable e only needs to be asked if the user says “yes” to c. Askable c
doesn’t need to be asked if the user previously replied “no” to e.
This form of ask-the-user can ask a different set of questions than the top-
down interpreter that asks questions when encountered. Give an example where
they ask different questions (neither set of questions asked is a subset of the other).
Exercise 5.2 This algorithm runs in time O(n²), where n is the number of clauses,
for a bounded number of elements in the body; each iteration goes through each
of the clauses, and in the worst case, it will do an iteration for each clause. It is
possible to implement this in O(n) time by creating an index that maps an
atom to the set of clauses with that atom in the body. Implement this. What is its
complexity as a function of n and b, the maximum number of atoms in the body of
a clause?
Exercise 5.4 This code can re-ask a question multiple times. Implement this code
so that it only asks a question once and remembers the answer. Also implement a
function to forget the answers.
Exercise 5.5 What search method is this using? Implement the search interface
so that it can use A∗ or other searching methods. Define an admissible heuristic
that is not always 0.
41 proofs.append(proof_at)
42 return proofs
The following provides a simple unit test that is hard wired for triv_KB:
logicExplain.py — (continued)
logicExplain.py — (continued)
76 try:
77 command = inps[0]
78 if command == "quit":
79 going = False
80 elif command == "ask":
81 proof = prove_atom(kb, inps[1])
82 if proof == "fail":
83 print("fail")
84 else:
85 print("yes")
86 elif command == "how":
87 if proof=="fail":
88 print("there is no proof")
89 elif len(inps)==1:
90 print_rule(proof)
91 else:
92 try:
93 ups.append(proof)
94 proof = proof[1][int(inps[1])] #nth argument of rule
95 print_rule(proof)
96 except:
97 print('In "how n", n must be a number between 0 and',len(proof[1])-1,"inclusive.")
98 elif command == "up":
99 if ups:
100 proof = ups.pop()
101 else:
102 print("No rule to go up to.")
103 print_rule(proof)
104 elif command == "kb":
105 print(kb)
106 elif command == "help":
107 print(helptext)
108 else:
109 print("unknown command:", inp)
110 print("use help for help")
111 except:
112 print("unknown command:", inp)
113 print("use help for help")
114
115 def print_rule(proof):
116 (head,body) = proof
117 if body == "answered":
118 print(head,"was answered yes")
119 elif body == []:
120 print(head,"is a fact")
121 else:
122 print(head,"<-")
123 for i,a in enumerate(body):
124 print(i,":",a[0])
125
126 # try
127 # interact(elect)
128 # Which clause is wrong in elect_bug? Try:
129 # interact(elect_bug)
130 # logicExplain: ask lit_l1
>>> interact(elect)
logicExplain: ask lit_l1
Is up_s2 true? no
Is down_s2 true? yes
Is down_s1 true? yes
yes
logicExplain: how
lit_l1 <-
0 : light_l1
1 : live_l1
2 : ok_l1
logicExplain: how 1
live_l1 <-
0 : live_w0
logicExplain: how 0
live_w0 <-
0 : down_s2
1 : live_w2
logicExplain: how 0
down_s2 was answered yes
logicExplain: up
live_w0 <-
0 : down_s2
1 : live_w2
logicExplain: how 1
live_w2 <-
0 : down_s1
1 : live_w3
logicExplain: quit
>>>
Exercise 5.6 The above code only ever explores one proof – the first proof found.
Change the code to enumerate the proof trees (by returning a list of all proof trees,
or preferably using yield). Add the command "retry" to the user interface to try
another proof.
5.5 Assumables
Atom a can be made assumable by including Assumable(a) in the knowledge
base. A knowledge base that can include assumables is declared with KBA.
logicAssumables.py — Definite clauses with assumables
11 from logicProblem import Clause, Askable, KB, yes
12
13 class Assumable(object):
14 """An assumable atom"""
15
16 def __init__(self,atom):
17 """atom that can be assumed"""
18 self.atom = atom
19
20 def __str__(self):
21 """returns the string representation of an assumable.
22 """
23 return "assumable " + self.atom + "."
24
25 class KBA(KB):
26 """A knowledge base that can include assumables"""
27 def __init__(self,statements):
28 self.assumables = [c.atom for c in statements if isinstance(c, Assumable)]
29 KB.__init__(self,statements)
The top-down Horn clause interpreter, prove_all_ass, returns a list of the sets of
assumables that imply ans_body. This list contains all of the minimal sets of
assumables, but can also include non-minimal sets, and repeated sets, if they
can be generated with separate proofs. The set assumed is the set of assumables
already assumed.
logicAssumables.py — (continued)
48 for cl in self.clauses_for_atom(selected)
49 for ass in self.prove_all_ass(cl.body+ans_body[1:],assumed)
50 ] # union of answers for each clause with head=selected
51 else: # empty body
52 return [assumed] # one answer
53
54 def conflicts(self):
55 """returns a list of minimal conflicts"""
56 return minsets(self.prove_all_ass(['false']))
Given a list of sets, minsets returns a list of the minimal sets in the list. For
example, minsets([{2, 3, 4}, {2, 3}, {6, 2, 3}, {2, 3}, {2, 4, 5}]) returns [{2, 3}, {2, 4, 5}].
logicAssumables.py — (continued)
58 def minsets(ls):
59 """ls is a list of sets
60 returns a list of minimal sets in ls
61 """
62 ans = [] # elements known to be minimal
63 for c in ls:
64 if not any(c1<c for c1 in ls) and not any(c1 <= c for c1 in ans):
65 ans.append(c)
66 return ans
67
68 # minsets([{2, 3, 4}, {2, 3}, {6, 2, 3}, {2, 3}, {2, 4, 5}])
Warning: minsets works for a list of sets or for a set of (frozen) sets, but it does
not work for a generator of sets. For example, try to predict and then test:
minsets(e for e in [{2, 3, 4}, {2, 3}, {6, 2, 3}, {2, 3}, {2, 4, 5}])
The diagnoses can be constructed from the (minimal) conflicts as follows.
This also works if there are non-minimal conflicts, but is not as efficient.
logicAssumables.py — (continued)
69 def diagnoses(cons):
70 """cons is a list of (minimal) conflicts.
71 returns a list of diagnoses."""
72 if cons == []:
73 return [set()]
74 else:
75 return minsets([({e}|d) # | is set union
76 for e in cons[0]
77 for d in diagnoses(cons[1:])])
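As a sketch of how diagnoses relate to conflicts, consider two hypothetical conflicts {1, 2} and {2, 3} (these sets are made up for illustration): a diagnosis must contain at least one element of every conflict, so the minimal diagnoses are {2} and {1, 3}. The definitions from the listings above are repeated so the snippet is self-contained:

```python
def minsets(ls):
    """ls is a list of sets; returns a list of the minimal sets in ls"""
    ans = []  # elements known to be minimal
    for c in ls:
        if not any(c1 < c for c1 in ls) and not any(c1 <= c for c1 in ans):
            ans.append(c)
    return ans

def diagnoses(cons):
    """cons is a list of (minimal) conflicts; returns a list of diagnoses"""
    if cons == []:
        return [set()]
    else:
        return minsets([({e} | d)          # | is set union
                        for e in cons[0]
                        for d in diagnoses(cons[1:])])

# each diagnosis is a minimal hitting set of the conflicts
print(diagnoses([{1, 2}, {2, 3}]))   # [{1, 3}, {2}]
```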
Test cases:
logicAssumables.py — (continued)
80 electa = KBA([
81 Clause('light_l1'),
82 Clause('light_l2'),
83 Assumable('ok_l1'),
84 Assumable('ok_l2'),
85 Assumable('ok_s1'),
86 Assumable('ok_s2'),
87 Assumable('ok_s3'),
88 Assumable('ok_cb1'),
89 Assumable('ok_cb2'),
90 Assumable('live_outside'),
91 Clause('live_l1', ['live_w0']),
92 Clause('live_w0', ['up_s2','ok_s2','live_w1']),
93 Clause('live_w0', ['down_s2','ok_s2','live_w2']),
94 Clause('live_w1', ['up_s1', 'ok_s1', 'live_w3']),
95 Clause('live_w2', ['down_s1', 'ok_s1','live_w3' ]),
96 Clause('live_l2', ['live_w4']),
97 Clause('live_w4', ['up_s3','ok_s3','live_w3' ]),
98 Clause('live_p_1', ['live_w3']),
99 Clause('live_w3', ['live_w5', 'ok_cb1']),
100 Clause('live_p_2', ['live_w6']),
101 Clause('live_w6', ['live_w5', 'ok_cb2']),
102 Clause('live_w5', ['live_outside']),
103 Clause('lit_l1', ['light_l1', 'live_l1', 'ok_l1']),
104 Clause('lit_l2', ['light_l2', 'live_l2', 'ok_l2']),
105 Askable('up_s1'),
106 Askable('down_s1'),
107 Askable('up_s2'),
108 Askable('down_s2'),
109 Askable('up_s3'),
110 Askable('down_s3'),
111 Askable('dark_l1'),
112 Askable('dark_l2'),
113 Clause('false', ['dark_l1', 'lit_l1']),
114 Clause('false', ['dark_l2', 'lit_l2'])
115 ])
116 # electa.prove_all_ass(['false'])
117 # cs=electa.conflicts()
118 # print(cs)
119 # diagnoses(cs) # diagnoses from conflicts
Exercise 5.8 Implement explanations(self, body), where body is a list of atoms, that
returns a list of the minimal explanations of body. This does not require
modification of prove_all_ass.
6 Deterministic Planning
• effects: a dictionary of feature:value pairs that are made true by this action.
In particular, a feature in the dictionary has the corresponding value (and
not its previous value) after the action, and a feature not in the dictionary
keeps its old value.
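The effect semantics described above can be sketched with a dictionary update (the state features and the hypothetical "pick up coffee" action here are made up for illustration):

```python
# a state maps each feature to its value
state = {'RLoc': 'lab', 'RHC': False, 'SWC': True}

# effects of a hypothetical "pick up coffee" action
effects = {'RHC': True}

# features in effects get their new value; all other features keep their old value
new_state = {**state, **effects}
print(new_state)   # {'RLoc': 'lab', 'RHC': True, 'SWC': True}
```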
23 self.name = name
24 self.preconds = preconds
25 self.effects = effects
26 self.cost = cost
27
28 def __repr__(self):
29 return self.name
• A set of actions.
• A dictionary that maps each feature into a set of possible values for the
feature.
stripsProblem.py — (continued)
31 class STRIPS_domain(object):
32 def __init__(self, feature_domain_dict, actions):
33 """Problem domain
34 feature_domain_dict is a feature:domain dictionary,
35 mapping each feature to its domain
36 actions
37 """
38 self.feature_domain_dict = feature_domain_dict
39 self.actions = actions
stripsProblem.py — (continued)
41 class Planning_problem(object):
42 def __init__(self, prob_domain, initial_state, goal):
43 """
44 a planning problem consists of
45 * a planning domain
46 * the initial state
47 * a goal
48 """
49 self.prob_domain = prob_domain
50 self.initial_state = initial_state
51 self.goal = goal
Figure: the delivery robot domain, with four locations: the coffee shop (cs), Sam's office (off), the mail room (mr), and the lab (lab).
stripsProblem.py — (continued)
stripsProblem.py — (continued)
Figure: blocks-world actions: move(b,c,a) moves block b from c onto a, and move(b,c,table) moves b from c onto the table.
71 problem0 = Planning_problem(delivery_domain,
72 {'RLoc':'lab', 'MW':True, 'SWC':True, 'RHC':False,
73 'RHM':False},
74 {'RLoc':'off'})
75 problem1 = Planning_problem(delivery_domain,
76 {'RLoc':'lab', 'MW':True, 'SWC':True, 'RHC':False,
77 'RHM':False},
78 {'SWC':False})
79 problem2 = Planning_problem(delivery_domain,
80 {'RLoc':'lab', 'MW':True, 'SWC':True, 'RHC':False,
81 'RHM':False},
82 {'SWC':False, 'MW':False, 'RHM':False})
Figure: initial and goal block configurations for a blocks-world planning problem.
27 def zero(*args,**nargs):
28 """always returns 0"""
29 return 0
30
31 class Forward_STRIPS(Search_problem):
32 """A search problem from a planning problem where:
33 * a node is a state object.
34 * the dynamics are specified by the STRIPS representation of actions
35 """
36 def __init__(self, planning_problem, heur=zero):
37 """creates a forward search space from a planning problem.
38 heur(state,goal) is a heuristic function,
39 an underestimate of the cost from state to goal, where
40 both state and goals are feature:value dictionaries.
41 """
42 self.prob_domain = planning_problem.prob_domain
43 self.initial_state = State(planning_problem.initial_state)
44 self.goal = planning_problem.goal
45 self.heur = heur
46
47 def is_goal(self, state):
48 """is True if node is a goal.
49
50 Every goal feature has the same value in the state and the goal."""
51 return all(state.assignment[prop]==self.goal[prop]
52 for prop in self.goal)
stripsHeuristic.py — (continued)
21 def h1(state,goal):
22 """ the distance to the goal location, if there is one"""
23 if 'RLoc' in goal:
24 return dist(state['RLoc'], goal['RLoc'])
25 else:
26 return 0
27
28 def h2(state,goal):
29 """ the distance to the coffee shop plus getting coffee and delivering
it
30 if the robot needs to get coffee
31 """
32 if ('SWC' in goal and goal['SWC']==False
33 and state['SWC']==True
34 and state['RHC']==False):
35 return dist(state['RLoc'],'cs')+3
36 else:
37 return 0
The maximum of the values of a set of admissible heuristics is also an admissible
heuristic. The function maxh takes a number of heuristic functions as arguments,
and returns a new heuristic function that takes the maximum of the values of the
heuristics. For example, h1 and h2 are heuristic functions and so maxh(h1,h2) is
also. maxh can take an arbitrary number of arguments.
stripsHeuristic.py — (continued)
39 def maxh(*heuristics):
40 """Returns a new heuristic function that is the maximum of the
functions in heuristics.
41 heuristics is the list of arguments which must be heuristic functions.
42 """
43 # return lambda state,goal: max(h(state,goal) for h in heuristics)
44 def newh(state,goal):
45 return max(h(state,goal) for h in heuristics)
46 return newh
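The behavior of maxh can be checked with toy heuristics (ha and hb are made up for illustration; in the planner, the arguments would be state and goal dictionaries):

```python
def maxh(*heuristics):
    """returns a heuristic that is the pointwise maximum of heuristics"""
    def newh(state, goal):
        return max(h(state, goal) for h in heuristics)
    return newh

# two toy admissible heuristics (made up for illustration)
def ha(state, goal): return 2
def hb(state, goal): return 3

h = maxh(ha, hb)
print(h({}, {}))   # 3, the larger of the two values
```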
The following runs the example with and without the heuristic.
stripsHeuristic.py — (continued)
Exercise 6.4 Try the forward planner with a heuristic function of just h1, with
just h2 and with both. Explain how each one prunes or doesn’t prune the search
space.
Exercise 6.5 Create a better heuristic than maxh(h1, h2). Try it for a number of
different problems. In particular, try to include the following costs:
i) h3 is like h2 but also takes into account the case when RLoc is in goal.
ii) h4 uses the distance to the mail room plus getting mail and delivering it if
the robot needs to deliver mail.
iii) h5 is for getting mail when the goal is for the robot to have mail, and then getting
to the goal destination (if there is one).
44
45 def is_goal(self, subgoal):
46 """if subgoal is true in the initial state, a path has been found"""
47 goal_asst = subgoal.assignment
48 return all(self.initial_state[g]==goal_asst[g]
49 for g in goal_asst)
50
51 def start_node(self):
52 """the start node is the top-level goal"""
53 return self.top_goal
54
55 def neighbors(self,subgoal):
56 """returns a list of the arcs for the neighbors of subgoal in this problem"""
57 goal_asst = subgoal.assignment
58 return [ Arc(subgoal, self.weakest_precond(act,goal_asst), act.cost, act)
59 for act in self.prob_domain.actions
60 if self.possible(act,goal_asst)]
61
62 def possible(self,act,goal_asst):
63 """True if act is possible to achieve goal_asst.
64
65 the action achieves an element of the effects and
66 the action doesn't delete something that needs to be achieved and
67 the preconditions are consistent with other subgoals that need to be achieved
68 """
69 return ( any(goal_asst[prop] == act.effects[prop]
70 for prop in act.effects if prop in goal_asst)
71 and all(goal_asst[prop] == act.effects[prop]
72 for prop in act.effects if prop in goal_asst)
73 and all(goal_asst[prop]== act.preconds[prop]
74 for prop in act.preconds if prop not in act.effects and prop in goal_asst)
75 )
76
77 def weakest_precond(self,act,goal_asst):
78 """returns the subgoal that must be true so goal_asst holds after act
79 should be: act.preconds | (goal_asst - act.effects)
80 """
81 new_asst = act.preconds.copy()
82 for g in goal_asst:
83 if g not in act.effects:
84 new_asst[g] = goal_asst[g]
85 return Subgoal(new_asst)
86
87 def heuristic(self,subgoal):
88 """in the regression planner a node is a subgoal.
stripsRegressionPlanner.py — (continued)
Exercise 6.7 Multiple path pruning could be used to prune more than the current
code does. In particular, if the current node contains more conditions than a previously
visited node, it can be pruned. For example, if {a : True, b : False} has been visited,
then any node that is a superset, e.g., {a : True, b : False, d : True}, need not be
expanded. If the simpler subgoal does not lead to a solution, the more complicated
one won't either. Implement this more severe pruning. (Hint: This may require
modifications to the searcher.)
Exercise 6.8 It is possible, as knowledge of the domain, that some assignments
of values to variables can never be achieved. For example, the robot
cannot be holding mail when there is mail waiting (assuming it isn't holding
mail initially). An assignment of values to (some of the) variables is incompatible
if no possible (reachable) state can include that assignment. For example,
{'MW': True, 'RHM': True} is an incompatible assignment. This information may
be useful for a planner; there is no point in trying to achieve these
together. Define a subclass of STRIPS_domain that can accept a list of incompatible
assignments. Modify the regression planner code to use such a list of incompatible
assignments. Give an example where the search space is smaller.
Exercise 6.9 After completing the previous exercise, design incompatible assign-
ments for the blocks world. (This should result in dramatic search improvements.)
71
72 def test_regression_heuristic(thisproblem=problem1):
73 print("\n***** REGRESSION NO HEURISTIC")
74 print(SearcherMPP(Regression_STRIPS(thisproblem)).search())
75
76 print("\n***** REGRESSION WITH HEURISTICs h1 and h2")
77 print(SearcherMPP(Regression_STRIPS(thisproblem,maxh(h1,h2))).search())
78
79 if __name__ == "__main__":
80 test_regression_heuristic()
Exercise 6.10 Try the regression planner with a heuristic function of just h1 and
with just h2 (defined in Section 6.2.1). Explain how each one prunes or doesn’t
prune the search space.
Exercise 6.11 Create a better heuristic than heuristic fun defined in Section 6.2.1.
29 for (feat,dom) in prob_domain.feature_domain_dict.items()}
30
31 # initial state constraints:
32 constraints = [Constraint((feat_time_var[feat][0],), is_(val))
33 for (feat,val) in initial_state.items()]
34
35 # goal constraints on the final state:
36 constraints += [Constraint((feat_time_var[feat][number_stages],),
37 is_(val))
38 for (feat,val) in goal.items()]
39
40 # precondition constraints:
41 constraints += [Constraint((feat_time_var[feat][t], self.action_vars[t]),
42 if_(val,act)) # feat@t==val if action@t==act
43 for act in prob_domain.actions
44 for (feat,val) in act.preconds.items()
45 for t in range(number_stages)]
46
47 # effect constraints:
48 constraints += [Constraint((feat_time_var[feat][t+1], self.action_vars[t]),
49 if_(val,act)) # feat@t+1==val if action@t==act
50 for act in prob_domain.actions
51 for feat,val in act.effects.items()
52 for t in range(number_stages)]
53 # frame constraints:
54
55 constraints += [Constraint((feat_time_var[feat][t], self.action_vars[t], feat_time_var[feat][t+1]),
56 eq_if_not_in_({act for act in prob_domain.actions
57 if feat in act.effects}))
58 for feat in prob_domain.feature_domain_dict
59 for t in range(number_stages) ]
60 variables = set(self.action_vars) | {feat_time_var[feat][t]
61 for feat in prob_domain.feature_domain_dict
62 for t in range(number_stages+1)}
63 CSP.__init__(self, variables, constraints)
64
65 def extract_plan(self,soln):
66 return [soln[a] for a in self.action_vars]
The following functions return functions that can be applied to particular
values. For example, is_(3) returns a function that, when applied to 3, returns True,
and when applied to any other value returns False. So is_(3)(3) returns True
and is_(3)(7) returns False.
68 def is_(val):
69 """returns a function that is true when it is it applied to val.
70 """
71 #return lambda x: x == val
72 def is_fun(x):
73 return x == val
74 is_fun.__name__ = "value_is_"+str(val)
75 return is_fun
76
77 def if_(v1,v2):
78 """if the second argument is v2, the first argument must be v1"""
79 #return lambda x1,x2: x1==v1 if x2==v2 else True
80 def if_fun(x1,x2):
81 return x1==v1 if x2==v2 else True
82 if_fun.__name__ = "if x2 is "+str(v2)+" then x1 is "+str(v1)
83 return if_fun
84
85 def eq_if_not_in_(actset):
86 """first and third arguments are equal if action is not in actset"""
87 # return lambda x1, a, x2: x1==x2 if a not in actset else True
88 def eq_if_not_fun(x1, a, x2):
89 return x1==x2 if a not in actset else True
90 eq_if_not_fun.__name__ = "first and third arguments are equal if action is not in "+str(actset)
91 return eq_if_not_fun
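A quick sanity check of these three constructors (the values 'on', 'flip', and 'wait' are made up for illustration; minimal reimplementations are included so the snippet is self-contained):

```python
def is_(val):
    """returns a function that is true when applied to val"""
    def is_fun(x): return x == val
    return is_fun

def if_(v1, v2):
    """if the second argument is v2, the first argument must be v1"""
    def if_fun(x1, x2): return x1 == v1 if x2 == v2 else True
    return if_fun

def eq_if_not_in_(actset):
    """first and third arguments are equal if the action is not in actset"""
    def eq_if_not_fun(x1, a, x2): return x1 == x2 if a not in actset else True
    return eq_if_not_fun

print(is_(3)(3), is_(3)(7))                       # True False
print(if_('on', 'flip')('on', 'flip'))            # True: action matches, value matches
print(if_('on', 'flip')('off', 'wait'))           # True: action differs, so no constraint
print(eq_if_not_in_({'flip'})('a', 'wait', 'a'))  # True: value persists under other actions
```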
Putting it together, con_plan returns a list of actions that solves the problem prob
for a given horizon. If you want to do more than just return the list of actions,
you might want to get it to return the solution, or even to enumerate the solutions
(by using Search_with_AC_from_CSP).
stripsCSPPlanner.py — (continued)
93 def con_plan(prob,horizon):
94 """finds a plan for problem prob given horizon.
95 """
96 csp = CSP_from_STRIPS(prob, horizon)
97 sol = Con_solver(csp).solve_one()
98 return csp.extract_plan(sol) if sol else sol
The following are some example queries.
stripsCSPPlanner.py — (continued)
• agenda: a list of (s, a) pairs, where s is a (var, val) pair and a is an action
instance. This means that variable var must have value val before a can
occur.
• causal_links: a set of (a0, g, a1) triples, where a0 and a1 are action instances
and g is a (var, val) pair. This holds when action a0 makes g true for action
a1.
stripsPOP.py — (continued)
28 class POP_node(object):
29 """a (partial) partial-order plan. This is a node in the search space."""
30 def __init__(self, actions, constraints, agenda, causal_links):
31 """
extract_plan constructs a total order of action instances that is consistent with
the partial order.
stripsPOP.py — (continued)
54 def extract_plan(self):
55 """returns a total ordering of the action instances consistent
56 with the constraints.
57 raises IndexError if there is no choice.
58 """
59 sorted_acts = []
60 other_acts = set(self.actions)
61 while other_acts:
62 a = random.choice([a for a in other_acts if
63 all(((a1,a) not in self.constraints) for a1 in other_acts)])
64 sorted_acts.append(a)
65 other_acts.remove(a)
66 return sorted_acts
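The topological-sort idea in extract_plan can be seen on a tiny precedence relation (the action names a1, a2, a3 are made up for illustration): repeatedly pick an action none of whose unsorted predecessors remain.

```python
import random

# a tiny precedence relation: a1 before a2, a2 before a3, a1 before a3
constraints = {('a1', 'a2'), ('a2', 'a3'), ('a1', 'a3')}
other_acts = {'a1', 'a2', 'a3'}
sorted_acts = []
while other_acts:
    # pick any action with no unsorted predecessor
    a = random.choice([a for a in other_acts
                       if all((a1, a) not in constraints for a1 in other_acts)])
    sorted_acts.append(a)
    other_acts.remove(a)
print(sorted_acts)   # ['a1', 'a2', 'a3'] (the only consistent total order)
```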
stripsPOP.py — (continued)
72 Search_problem.__init__(self)
73 self.planning_problem = planning_problem
74 self.start = Action_instance("start")
75 self.finish = Action_instance("finish")
76
77 def is_goal(self, node):
78 return node.agenda == []
79
80 def start_node(self):
81 constraints = {(self.start, self.finish)}
82 agenda = [(g, self.finish) for g in self.planning_problem.goal.items()]
83 return POP_node([self.start,self.finish], constraints, agenda, [] )
stripsPOP.py — (continued)
The following methods check whether an action (or action instance) achieves
or deletes some subgoal.
stripsPOP.py — (continued)
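The listing for these checks is not reproduced in this excerpt; the intended logic can be sketched as follows (the helper signatures here are simplified stand-ins, not the book's exact API):

```python
# sketch: a subgoal is a (var, val) pair; effects maps variables to values
def achieves(effects, subgoal):
    """the action makes the subgoal true"""
    var, val = subgoal
    return var in effects and effects[var] == val

def deletes(effects, subgoal):
    """the action makes the subgoal false"""
    var, val = subgoal
    return var in effects and effects[var] != val

eff = {'RHC': True}
print(achieves(eff, ('RHC', True)))   # True
print(deletes(eff, ('RHC', False)))   # True: the action makes RHC True, deleting RHC=False
```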
7 Supervised Machine Learning

This chapter is the first on machine learning. It covers the following topics:
• Features: many of the features come directly from the data. Sometimes
it is useful to construct features, e.g., height > 1.9m might be a Boolean
feature constructed from the real-valued feature height. The next chapter
is about neural networks and how to learn features; in this chapter features are
constructed explicitly, in what is often known as feature engineering.
• Learning with no input features: this is the base case of many methods.
What should we predict if we have no input features? This provides the
base cases for many algorithms (e.g., decision tree algorithm) and baselines
that more sophisticated algorithms need to beat. It also provides
ways to test various predictors.
• Decision tree learning: one of the classic and simplest learning algorithms,
which is the basis of many other algorithms.
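The feature-construction idea in the first bullet can be sketched directly (the 1.9m threshold and the example tuples are illustrative):

```python
# an example is a tuple; suppose index 0 holds height in metres
def height(e):
    return e[0]

# a constructed Boolean feature: height > 1.9
def tall(e):
    return height(e) > 1.9

e1 = (2.01, 'basketball')
e2 = (1.65, 'chess')
print(tall(e1), tall(e2))   # True False
```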
Figure 7.1: Some of the datasets used here. MLR is the UCI Machine Learning
Repository.
• A feature is a function from examples into the range of the feature. Each
feature f also has the following attributes:
Thus, for example, a Boolean feature is a function from the examples into
{False, True}. So, if f is a Boolean feature, f.frange == [False, True], and if
e is an example, f(e) is either True or False.
learnProblem.py — (continued)
18 class Data_set(Displayable):
19 """ A data set consists of a list of training data and a list of test data.
20 """
21
22 def __init__(self, train, test=None, prob_test=0.20, target_index=0,
23 header=None, target_type= None, seed=None): #12345):
24 """A dataset for learning.
25 train is a list of tuples representing the training examples
26 test is the list of tuples representing the test examples
27 if test is None, a test set is created by selecting each
28 example with probability prob_test
29 target_index is the index of the target.
30 If negative, it counts from right.
31 If target_index is larger than the number of properties,
32 there is no target (for unsupervised learning)
33 header is a list of names for the features
34 target_type is either None for automatic detection of target type
35 or one of "numerical", "boolean", "cartegorical"
36 seed is for random number; None gives a different test set each time
37 """
38 if seed: # given seed makes partition consistent from run-to-run
39 random.seed(seed)
40 if test is None:
41 train,test = partition_data(train, prob_test)
42 self.train = train
43 self.test = test
44
45 self.display(1,"Training set has",len(train),"examples. Number of
columns: ",{len(e) for e in train})
46 self.display(1,"Test set has",len(test),"examples. Number of
columns: ",{len(e) for e in test})
47 self.prob_test = prob_test
48 self.num_properties = len(self.train[0])
49 if target_index < 0: #allows for -1, -2, etc.
50 self.target_index = self.num_properties + target_index
51 else:
52 self.target_index = target_index
53 self.header = header
54 self.domains = [set() for i in range(self.num_properties)]
55 for example in self.train:
56 for ind,val in enumerate(example):
57 self.domains[ind].add(val)
58 self.conditions_cache = {} # cache for computed conditions
59 self.create_features()
60 if target_type:
61 self.target.ftype = target_type
62 self.display(1,"There are",len(self.input_features),"input
features")
63
64 def __str__(self):
65 if self.train and len(self.train)>0:
66 return ("Data: "+str(len(self.train))+" training examples, "
67 +str(len(self.test))+" test examples, "
68 +str(len(self.train[0]))+" features.")
69 else:
70 return ("Data: "+str(len(self.train))+" training examples, "
71 +str(len(self.test))+" test examples.")
73 def create_features(self):
74 """create the set of features
75 """
76 self.target = None
77 self.input_features = []
78 for i in range(self.num_properties):
79 def feat(e,index=i):
80 return e[index]
81 if self.header:
82 feat.__doc__ = self.header[i]
83 else:
84 feat.__doc__ = "e["+str(i)+"]"
85 feat.frange = list(self.domains[i])
86 feat.ftype = self.infer_type(feat.frange)
87 if i == self.target_index:
88 self.target = feat
89 else:
90 self.input_features.append(feat)
We try to infer the type of each feature. Sometimes this can be wrong (e.g.,
when the numbers are really categorical), and so the type needs to be set explicitly.
learnProblem.py — (continued)
92 def infer_type(self,domain):
93 """Infers the type of a feature with domain
94 """
95 if all(v in {True,False} for v in domain):
96 return "boolean"
97 if all(isinstance(v,(float,int)) for v in domain):
98 return "numeric"
99 else:
100 return "categorical"
• When the range only has two values, we designate one to be the “true”
value.
• When the values are all numeric, we assume they are ordered (as opposed
to just being some classes that happen to be labelled with numbers) and
construct Boolean features for splits of the data. That is, the feature is
e[ind] < cut for some value cut. We choose a number of cut values, up to
a maximum number of cuts, given by max_num_cuts.
• When the values are not all numeric, we create an indicator function for
each value. An indicator function for a value returns true when that value
is given and false otherwise. Note that we can't create an indicator function
for values that appear in the test set but not in the training set because
we haven't seen the test set. For the examples in the test set with a
value that doesn't appear in the training set for that feature, the indicator
functions all return false.
There is also an option to only create Boolean features from categorical input
features.
learnProblem.py — (continued)
117 if self.header:
118 feat.__doc__ = f"{self.header[ind]}=={true_val}"
119 else:
120 feat.__doc__ = f"e[{ind}]=={true_val}"
121 feat.frange = boolean
122 feat.ftype = "boolean"
123 conds.append(feat)
124 elif all(isinstance(val,(int,float)) for val in frange):
125 if categorical_only: # numerical, don't make cuts
126 def feat(e, i=ind):
127 return e[i]
128 feat.__doc__ = f"e[{ind}]"
129 conds.append(feat)
130 else:
131 # all numeric, create cuts of the data
132 sorted_frange = sorted(frange)
133 num_cuts = min(max_num_cuts,len(frange))
134 cut_positions = [len(frange)*i//num_cuts for i in range(1,num_cuts)]
135 for cut in cut_positions:
136 cutat = sorted_frange[cut]
137 def feat(e, ind_=ind, cutat=cutat):
138 return e[ind_] < cutat
139
140 if self.header:
141 feat.__doc__ = self.header[ind]+"<"+str(cutat)
142 else:
143 feat.__doc__ = "e["+str(ind)+"]<"+str(cutat)
144 feat.frange = boolean
145 feat.ftype = "boolean"
146 conds.append(feat)
147 else:
148 # create an indicator function for every value
149 for val in frange:
150 def feat(e, ind_=ind, val_=val):
151 return e[ind_] == val_
152 if self.header:
153 feat.__doc__ = self.header[ind]+"=="+str(val)
154 else:
155 feat.__doc__= "e["+str(ind)+"]=="+str(val)
156 feat.frange = boolean
157 feat.ftype = "boolean"
158 conds.append(feat)
159 self.conditions_cache[(max_num_cuts, categorical_only)] = conds
160 return conds
Exercise 7.1 Change the code so that it splits using e[ind] ≤ cut instead of e[ind] <
cut. Check boundary cases, such as 3 elements with 2 cuts. As a test case, make
sure that when the range is the 30 integers from 100 to 129, and you want 2 cuts,
the resulting Boolean features should be e[ind] ≤ 109 and e[ind] ≤ 119 to make
Why might Sam have suggested this? Does this work better? (Try it on a few data
sets).
The following class is used for datasets where the training and test sets are in
different files.
learnProblem.py — (continued)
When reading from a file, all of the values are strings. This next method
tries to convert each value into a number (an int or a float) or a Boolean, if
possible.
learnProblem.py — (continued)
learnProblem.py — (continued)
337 """
338 self.orig_dataset = dataset
339 self.unary_functions = unary_functions
340 self.binary_functions = binary_functions
341 self.include_orig = include_orig
342 self.target = dataset.target
343 Data_set.__init__(self,dataset.train, test=dataset.test,
344 target_index = dataset.target_index)
345
346 def create_features(self):
347 if self.include_orig:
348 self.input_features = self.orig_dataset.input_features.copy()
349 else:
350 self.input_features = []
351 for u in self.unary_functions:
352 for f in self.orig_dataset.input_features:
353 self.input_features.append(u(f))
354 for b in self.binary_functions:
355 for f1 in self.orig_dataset.input_features:
356 for f2 in self.orig_dataset.input_features:
357 if f1 != f2:
358 self.input_features.append(b(f1,f2))
The following are useful unary feature constructors and binary feature combiners.
learnProblem.py — (continued)
Example:
learnProblem.py — (continued)
Exercise 7.3 For symmetric properties, such as product, we don't need both
f1 ∗ f2 and f2 ∗ f1 as extra properties. Allow the user to declare
feature constructors as symmetric (by associating a Boolean feature with them).
Change construct_features so that it does not create both versions for symmetric
combiners.
learnProblem.py — (continued)
414
415 def learn(self):
416 """returns a predictor, a function from a tuple to a value for the
target feature
417 """
418 raise NotImplementedError("learn") # abstract method
• a point prediction, where we are only allowed to predict one of the values
of the feature. For example, if the values of the feature are {0, 1} we are
only allowed to predict 0 or 1, or if the values are ratings in {1, 2, 3, 4, 5},
we can only predict one of these integers.
• a point prediction, where we are allowed to predict any value. For example,
if the values of the feature are {0, 1} we may be allowed to predict 0.3,
1, or even 1.7. For all of the criteria we can imagine, there is no point in
predicting a value greater than 1 or less than zero (but that doesn't mean
we can't), but it is often useful to predict a value between 0 and 1. If the
values are ratings in {1, 2, 3, 4, 5}, we may want to predict 3.4.
• a probability distribution over the values of the feature. For each value v,
we predict a non-negative number pv, such that the sum over all predictions
is 1.
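For a concrete contrast between the three kinds of prediction, suppose the training data for a {0, 1} target contains 3 ones out of 10 examples (the counts here are made up for illustration):

```python
n, n1 = 10, 3   # 10 training examples, 3 of them are 1

# value-restricted point prediction: the most common value
mode = 1 if n1 > n - n1 else 0

# unrestricted point prediction: e.g., the empirical mean
mean = n1 / n

# probability distribution over the two values
dist = {0: (n - n1) / n, 1: n1 / n}

print(mode, mean, dist)   # 0 0.3 {0: 0.7, 1: 0.3}
```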
7.3.1 Evaluation
To evaluate a point prediction, we first generate some data from a simple (Bernoulli)
distribution, where there are two possible values, 0 and 1, for the target feature.
Given prob, a number in the range [0, 1], this generates some training and test
data where prob is the probability of each example being 1. To generate a 1 with
probability prob, we generate a random number in the range [0,1] and return 1 if
that number is less than prob. A prediction is computed by applying the predictor
to the training data, and is evaluated on the test set. This is repeated
num_samples times.
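The data generation described above can be sketched as follows (the function and parameter names here are illustrative, not necessarily the book's):

```python
import random

def flip(prob):
    """returns 1 with probability prob, else 0"""
    return 1 if random.random() < prob else 0

def bernoulli_data(prob, num_train, num_test):
    """generates Bernoulli training and test data with P(1) = prob"""
    train = [flip(prob) for _ in range(num_train)]
    test = [flip(prob) for _ in range(num_test)]
    return train, test

random.seed(0)
train, test = bernoulli_data(0.7, num_train=8, num_test=100)
print(sum(test) / len(test))   # empirical fraction of 1s (close to 0.7 for large num_test)
```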
Let’s evaluate the predictions of the possible selections according to the
different evaluation criteria, for various training sizes.
learnNoInputs.py — (continued)
Exercise 7.4 Which predictor works best for low counts when the error is
You may need to try this a few times to make sure your answer is supported by
the evidence. Does the difference from the other methods get more or less as the
number of examples grows?
Exercise 7.5 Suggest some other predictions that only take the training data.
Does your method do better than the given methods? A simple way to get other
predictors is to vary the threshold of bounded average, or to change the pseudo-counts
of the Laplace method (use other numbers instead of 1 and 2).
The decision tree algorithm does binary splits, and assumes that all input
features are binary functions of the examples. It stops splitting if there are
no input features, the number of examples is less than a specified number,
or all of the examples agree on the target feature.
learnDT.py — Learning a binary decision tree
11 from learnProblem import Learner, Evaluate
12 from learnNoInputs import Predict
13 import math
14
15 class DT_learner(Learner):
16 def __init__(self,
17 dataset,
18 split_to_optimize=Evaluate.log_loss, # to minimize at each split
19 leaf_prediction=Predict.empirical, # what to use for value at leaves
20 train=None, # used for cross validation
21 max_num_cuts=8, # maximum number of conditions to split a numerical feature into
22 gamma=1e-7, # minimum improvement needed to expand a node
23 min_child_weight=10):
24 self.dataset = dataset
25 self.target = dataset.target
26 self.split_to_optimize = split_to_optimize
27 self.leaf_prediction = leaf_prediction
28 self.max_num_cuts = max_num_cuts
29 self.gamma = gamma
30 self.min_child_weight = min_child_weight
31 if train is None:
32 self.train = self.dataset.train
33 else:
34 self.train = train
35
36 def learn(self, max_num_cuts=8):
37 """learn a decision tree"""
38 return self.learn_tree(self.dataset.conditions(self.max_num_cuts), self.train)
The main recursive algorithm takes in a set of input features and a set of
training data. It first decides whether to split. If it doesn't split, it makes a point
prediction, ignoring the input features.
It only splits if the best split decreases the error by at least gamma. This implies
it does not split when:
If it splits, it selects the best split according to the evaluation criterion (as-
suming that is the only split it gets to do), and returns the condition to split on
(in the variable split) and the corresponding partition of the examples.
learnDT.py — (continued)
learnDT.py — (continued)
Test cases:
learnDT.py — (continued)
Note that different runs may provide different values as they split the train-
ing and test sets differently. So if you have a hypothesis about what works
better, make sure it is true for different runs.
Exercise 7.6 The current algorithm does not have a very sophisticated stopping
criterion. What is the current stopping criterion? (Hint: you need to look at both
learn_tree and select_split.)
Exercise 7.7 Extend the current algorithm to include in the stopping criterion
(a) A minimum child size; don't use a split if one of the children has fewer
elements than this.
(b) A depth-bound on the depth of the tree.
(c) An improvement bound such that a split is only carried out if the error with the
split is better than the error without the split by at least the improvement
bound.
Which values for these parameters make the prediction errors on the test set the
smallest? Try it on more than one dataset.
Exercise 7.8 Without any input features, it is often better to include a pseudo-count
that is added to the counts from the training data. Modify the code so that
it includes a pseudo-count for the predictions. When evaluating a split, including
pseudo-counts can make the split worse than no split. Does pruning with an improvement
bound and pseudo-counts make the algorithm work better than with
an improvement bound by itself?
Exercise 7.9 Some people have suggested using information gain (which is equivalent
to greedy optimization of log loss) as the measure of improvement when
building the tree, even if they want to have non-probabilistic predictions in the
final tree. Does this work better than myopically choosing the split that is best for
the evaluation criteria we will use to judge the final prediction?
The above decision tree overfits the data. One way to determine whether
the prediction is overfitting is by cross validation. The code below implements
k-fold cross validation, which can be used to choose the value of parameters
to best fit the training data. If we want to use parameter tuning to improve
predictions on a particular data set, we can only use the training data (and not
the test data) to tune the parameter.
In k-fold cross validation, we partition the training set into k approximately
equal-sized folds (each fold is an enumeration of examples). For each fold, we
train on the other examples, and determine the error of the prediction on that
fold. For example, if there are 10 folds, we train on 90% of the data, and then
test on remaining 10% of the data. We do this 10 times, so that each example
gets used as a test set once, and in the training set 9 times.
The code below creates one copy of the data, and multiple views of the data.
For each fold, fold enumerates the examples in the fold, and fold_complement
enumerates the examples not in the fold.
learnCrossValidation.py — Cross Validation for Parameter Tuning
11 from learnProblem import Data_set, Data_from_file, Evaluate
12 from learnNoInputs import Predict
13 from learnDT import DT_learner
14 import matplotlib.pyplot as plt
15 import random
16
17 class K_fold_dataset(object):
18 def __init__(self, training_set, num_folds):
19 self.data = training_set.train.copy()
20 self.target = training_set.target
21 self.input_features = training_set.input_features
22 self.num_folds = num_folds
23 self.conditions = training_set.conditions
24
25 random.shuffle(self.data)
26 self.fold_boundaries = [(len(self.data)*i)//num_folds
27 for i in range(0,num_folds+1)]
28
29 def fold(self, fold_num):
30 for i in range(self.fold_boundaries[fold_num],
31 self.fold_boundaries[fold_num+1]):
32 yield self.data[i]
33
34 def fold_complement(self, fold_num):
35 for i in range(0,self.fold_boundaries[fold_num]):
36 yield self.data[i]
37 for i in range(self.fold_boundaries[fold_num+1],len(self.data)):
38 yield self.data[i]
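The fold boundaries computed in __init__ can be checked in isolation; this sketch reproduces just that arithmetic:

```python
def fold_boundaries(n_examples, num_folds):
    """Indices splitting n_examples into num_folds nearly equal folds,
    as computed in K_fold_dataset.__init__."""
    return [(n_examples * i) // num_folds for i in range(num_folds + 1)]

boundaries = fold_boundaries(10, 3)   # folds of sizes 3, 3, 4
```

Because the boundaries are computed with integer division of the cumulative count, every example lands in exactly one fold and fold sizes differ by at most one.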
The validation error is the average error for each example, where we test on
each fold, and learn on the other folds.
learnCrossValidation.py — (continued)
Note that different runs for the same data will have the same test error, but
different validation errors. If you rerun Data_from_file, you will get new
test and training sets, and so the graph will change.
Exercise 7.10 Change the error plot so that it can evaluate the stopping criteria
of Exercise 7.7. Which criterion makes the most difference?
predictor predicts the value of an example from the current parameter settings.
predictor_string gives a string representation of the predictor.
learnLinear.py — (continued)
40
41 def predictor(self,e):
42 """returns the prediction of the learner on example e"""
43 linpred = sum(w*f(e) for f,w in self.weights.items())
44 if self.squashed:
45 return sigmoid(linpred)
46 else:
47 return linpred
48
49 def predictor_string(self, sig_dig=3):
50 """returns the doc string for the current prediction function
51 sig_dig is the number of significant digits in the numbers"""
52 doc = "+".join(str(round(val,sig_dig))+"*"+feat.__doc__
53 for feat,val in self.weights.items())
54 if self.squashed:
55 return "sigmoid("+ doc+")"
56 else:
57 return doc
learn is the main algorithm of the learner. It does num_iter steps of stochastic
gradient descent with batch size = 1. The other parameters it gets from the
class.
learnLinear.py — (continued)
59 def learn(self,num_iter=100):
60 for it in range(num_iter):
61 self.display(2,"prediction=",self.predictor_string())
62 for e in self.train:
63 predicted = self.predictor(e)
64 error = predicted - self.target(e)
65 update = self.learning_rate*error
66 for feat in self.weights:
67 self.weights[feat] -= update*feat(e)
68 return self.predictor
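The update in learn can be seen in isolation in a minimal sketch, not using the book's classes, that fits y = 2x + 1 with batch size 1 (the training data and learning rate here are illustrative):

```python
import random

# Fit y = 2x + 1 with the same update as learn(): for each example,
# w <- w - learning_rate * error * feature(e), with features x and "one".
random.seed(0)
train = [(x, 2 * x + 1) for x in [0.0, 0.5, 1.0, 1.5, 2.0]]
w_one, w_x, lr = 0.0, 0.0, 0.05
for _ in range(2000):
    x, y = random.choice(train)
    error = (w_one + w_x * x) - y
    w_one -= lr * error * 1        # the constant feature "one"
    w_x -= lr * error * x
```

Because the data is noiseless and the problem convex, the weights converge close to the true intercept 1 and slope 2.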
one is a function that always returns 1. This is used for one of the input prop-
erties.
learnLinear.py — (continued)
70 def one(e):
71 "1"
72 return 1
74 def sigmoid(x):
75 return 1/(1+math.exp(-x))
76
77 def logit(x):
78 return -math.log(1/x-1)
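logit is the inverse of sigmoid, which can be checked numerically (the two functions are restated so the snippet stands alone):

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def logit(p):
    return -math.log(1 / p - 1)

# logit undoes sigmoid: logit(sigmoid(x)) recovers x
round_trip = logit(sigmoid(2.0))
```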
softmax([x0 , x1 , . . . ]) returns [v0 , v1 , . . . ] where

    vi = exp(xi) / ∑j exp(xj)
80 def softmax(xs,domain=None):
81 """xs is a list of values, and
82 domain is the domain (a list) or None if the list should be returned
83 returns a distribution over the domain (a dict)
84 """
85 m = max(xs) # use of m prevents overflow (and all values underflowing)
86 exps = [math.exp(x-m) for x in xs]
87 s = sum(exps)
88 if domain:
89 return {d:v/s for (d,v) in zip(domain,exps)}
90 else:
91 return [v/s for v in exps]
92
93 def indicator(v, domain):
94 return [1 if v==dv else 0 for dv in domain]
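Subtracting the maximum before exponentiating is what keeps softmax numerically stable; a quick check with inputs large enough that the naive computation would overflow:

```python
import math

def softmax(xs):
    m = max(xs)                       # subtract max to avoid overflow
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [v / s for v in exps]

probs = softmax([1000.0, 1001.0, 1002.0])  # naive math.exp(1000.0) overflows
```

Because softmax is invariant to adding a constant to all inputs, the result is the same as softmax([0, 1, 2]).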
The following tests the learner on a data set. Uncomment the other data
sets for different examples.
learnLinear.py — (continued)
The following plots the errors on the training and test sets as a function of
the number of steps of gradient descent.
learnLinear.py — (continued)
Exercise 7.11 The squashed learner only makes predictions in the range (0, 1).
If the output values are {1, 2, 3, 4} there is no point in predicting less than 1 or greater
than 4. Change the squashed learner so that it can learn values in the range (1, 4).
Test it on the file 'data/car.csv'.
The following plots the prediction as a function of the number of steps of
gradient descent. We first define a version of range that allows
for real numbers (integers and floats).
learnLinear.py — (continued)
learnLinear.py — (continued)
35
36 # from learnLinear import plot_steps
37 # from learnProblem import Data_from_file
38 # data = Data_from_file('data/holiday.csv', target_index=-1)
39 # learner = Linear_learner_bsgd(data)
40 # plot_steps(learner = learner, data=data)
41
42 # to plot polynomials with batching (compare to SGD)
43 # from learnLinear import plot_polynomials
44 # plot_polynomials(data, learner_class = Linear_learner_bsgd)
7.7 Boosting
The following code implements functional gradient boosting for regression.
A Boosted_dataset is created from a base dataset by subtracting the pre-
diction of the offset function from each example. This does not save the new
dataset, but generates it as needed. The amount of space used is constant, in-
dependent of the size of the data set.
learnBoosting.py — Functional Gradient Boosting
11 from learnProblem import Data_set, Learner, Evaluate
12 from learnNoInputs import Predict
13 from learnLinear import sigmoid
14 import statistics
15 import random
16
17 class Boosted_dataset(Data_set):
18 def __init__(self, base_dataset, offset_fun, subsample=1.0):
19 """new dataset which is like base_dataset,
20 but offset_fun(e) is subtracted from the target of each example e
21 """
22 self.base_dataset = base_dataset
23 self.offset_fun = offset_fun
24 self.train = random.sample(base_dataset.train, int(subsample*len(base_dataset.train)))
25 self.test = base_dataset.test
26 #Data_set.__init__(self, base_dataset.train, base_dataset.test,
27 # base_dataset.prob_test, base_dataset.target_index)
28
29 #def create_features(self):
30 """creates new features - called at end of Data_set.init()
31 defines a new target
32 """
33 self.input_features = self.base_dataset.input_features
34 def newout(e):
35 return self.base_dataset.target(e) - self.offset_fun(e)
36 newout.frange = self.base_dataset.target.frange
37 newout.ftype = self.infer_type(newout.frange)
38 self.target = newout
39
40 def conditions(self, *args, colsample_bytree=0.5, **nargs):
41 conds = self.base_dataset.conditions(*args, **nargs)
42 return random.sample(conds, int(colsample_bytree*len(conds)))
A boosting learner takes in a dataset and a base learner, and returns a new
predictor. The base learner takes a dataset and returns a Learner object.
learnBoosting.py — (continued)
44 class Boosting_learner(Learner):
45 def __init__(self, dataset, base_learner_class, subsample=0.8):
46 self.dataset = dataset
47 self.base_learner_class = base_learner_class
48 self.subsample = subsample
49 mean = sum(self.dataset.target(e)
50 for e in self.dataset.train)/len(self.dataset.train)
51 self.predictor = lambda e: mean # function that returns mean for each example
52 self.predictor.__doc__ = "lambda e:"+str(mean)
53 self.offsets = [self.predictor] # list of base learners
54 self.predictors = [self.predictor] # list of predictors
55 self.errors = [self.dataset.evaluate_dataset(self.dataset.test, self.predictor, Evaluate.squared_loss)]
56 self.display(1,"Predict mean test set mean squared loss=",
self.errors[0] )
57
58
59 def learn(self, num_ensembles=10):
60 """adds num_ensemble learners to the ensemble.
61 returns a new predictor.
62 """
63 for i in range(num_ensembles):
64 train_subset = Boosted_dataset(self.dataset, self.predictor,
subsample=self.subsample)
65 learner = self.base_learner_class(train_subset)
66 new_offset = learner.learn()
67 self.offsets.append(new_offset)
68 def new_pred(e, old_pred=self.predictor, off=new_offset):
69 return old_pred(e)+off(e)
70 self.predictor = new_pred
71 self.predictors.append(new_pred)
72 self.errors.append(self.dataset.evaluate_dataset(self.dataset.test, self.predictor, Evaluate.squared_loss))
73 self.display(1, f"Iteration {len(self.offsets)-1}, treesize = {new_offset.num_leaves}, mean squared loss = {self.errors[-1]}")
74 return self.predictor
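The residual-fitting loop in learn can be illustrated independently of the book's classes, using a one-split regression stump as a hypothetical base learner (the data and helper names are illustrative):

```python
def fit_stump(xs, ys):
    """Fit a one-split regression stump minimizing squared error."""
    best_err, best = float('inf'), None
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        err = (sum((y - ml) ** 2 for y in left)
               + sum((y - mr) ** 2 for y in right))
        if err < best_err:
            best_err, best = err, (t, ml, mr)
    t, ml, mr = best
    return lambda x: ml if x <= t else mr

def boost(xs, ys, rounds=5):
    """Start from the mean; each round fits a stump to the residuals
    and adds it to the ensemble, as in Boosting_learner.learn."""
    mean = sum(ys) / len(ys)
    preds = [lambda x: mean]
    for _ in range(rounds):
        residuals = [y - sum(p(x) for p in preds)
                     for x, y in zip(xs, ys)]
        preds.append(fit_stump(xs, residuals))
    return lambda x: sum(p(x) for p in preds)

predict = boost([1, 2, 3, 4], [1, 1, 3, 3])
```

Each new stump is fit to what the current ensemble gets wrong, so the training error is non-increasing round by round; on this toy data a single stump at x ≤ 2 removes all the residual error.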
For testing, sp_DT_learner returns a learner that predicts the mean at the leaves
and is evaluated using squared loss. It can also take arguments to change the
default arguments for the trees.
learnBoosting.py — (continued)
76 # Testing
77
78 from learnDT import DT_learner
79 from learnProblem import Data_set, Data_from_file
80
81 def sp_DT_learner(split_to_optimize=Evaluate.squared_loss,
82 leaf_prediction=Predict.mean,**nargs):
83 """Creates a learner with different default arguments replaced by
**nargs
84 """
85 def new_learner(dataset):
86 return DT_learner(dataset,split_to_optimize=split_to_optimize,
87 leaf_prediction=leaf_prediction, **nargs)
88 return new_learner
89
90 #data = Data_from_file('data/car.csv', target_index=-1) regression
91 data = Data_from_file('data/student/student-mat-nq.csv', separator=';',
        has_header=True, target_index=-1, seed=13,
        include_only=list(range(30))+[32]) #2.0537973790924946
92 #data = Data_from_file('data/SPECT.csv', target_index=0, seed=62) #123)
93 #data = Data_from_file('data/mail_reading.csv', target_index=-1)
94 #data = Data_from_file('data/holiday.csv', num_train=19, target_index=-1)
95 #learner10 = Boosting_learner(data,
sp_DT_learner(split_to_optimize=Evaluate.squared_loss,
leaf_prediction=Predict.mean, min_child_weight=10))
96 #learner7 = Boosting_learner(data, sp_DT_learner(0.7))
97 #learner5 = Boosting_learner(data, sp_DT_learner(0.5))
98 #predictor9 =learner9.learn(10)
99 #for i in learner9.offsets: print(i.__doc__)
100 import matplotlib.pyplot as plt
101
102 def plot_boosting_trees(data, steps=10, mcws=[30,20,20,10], gammas=
[100,200,300,500]):
103 # to reduce clutter uncomment one of following two lines
104 #mcws=[10]
105 #gammas=[200]
106 learners = [(mcw, gamma, Boosting_learner(data,
sp_DT_learner(min_child_weight=mcw, gamma=gamma)))
107 for gamma in gammas for mcw in mcws
108 ]
109 plt.ion()
110 plt.xscale('linear') # change between log and linear scale
111 plt.xlabel("number of trees")
112 plt.ylabel("mean squared loss")
113 markers = (m+c for c in ['k','g','r','b','m','c','y'] for m in
['-','--','-.',':'])
114 for (mcw,gamma,learner) in learners:
115 data.display(1,f"min_child_weight={mcw}, gamma={gamma}")
116 learner.learn(steps)
8.1 Layers
A neural network is built from layers.
This provides a modular implementation of layers. Layers can easily be
stacked in many configurations. A layer needs to implement a function to com-
pute the output values from the inputs, a way to back-propagate the error, and
perhaps update its parameters.
8. Neural Networks and Deep Learning
learnNN.py — (continued)
52 class Linear_complete_layer(Layer):
One of the old standards for the activation function for hidden layers is the
sigmoid. It is included here to experiment with.
learnNN.py — (continued)
learnNN.py — (continued)
8.3.2 RMS-Prop
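Schematically, an RMS-Prop step keeps a running mean of squared gradients and divides the step by its square root (a hypothetical standalone sketch; the parameter defaults are illustrative, not the book's):

```python
def rmsprop_step(w, grad, mean_sq, lr=0.01, rho=0.9, eps=1e-7):
    """One RMS-Prop update: the step is the gradient scaled by the
    root of a decaying mean of squared gradients."""
    mean_sq = rho * mean_sq + (1 - rho) * grad * grad
    w = w - lr * grad / (mean_sq + eps) ** 0.5
    return w, mean_sq

# minimize (w - 3)^2 starting from w = 0; gradient is 2*(w - 3)
w, ms = 0.0, 0.0
for _ in range(1000):
    w, ms = rmsprop_step(w, 2 * (w - 3), ms)
```

Because the step is normalized by the gradient magnitude, progress per step is roughly the learning rate regardless of how steep the loss is.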
learnNN.py — (continued)
8.4 Dropout
Dropout is implemented as a layer.
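Schematically, a dropout layer's forward pass during training zeroes each value with probability rate and rescales the survivors so the expected output is unchanged (inverted dropout); a hypothetical standalone sketch, not the book's layer:

```python
import random

def dropout_forward(values, rate=0.5, training=True):
    """Inverted dropout: zero each value with probability rate and scale
    the survivors by 1/(1-rate), keeping the expected output the same."""
    if not training:
        return list(values)
    return [0.0 if random.random() < rate else v / (1 - rate) for v in values]

random.seed(0)
out = dropout_forward([1.0] * 1000, rate=0.5)
```

At test time the layer is the identity, because the rescaling during training already accounts for the dropped units.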
learnNN.py — (continued)
8.4.1 Examples
The following constructs a neural network with one hidden layer. The hidden
layer has width 2 with a ReLU activation function. The output layer uses a
sigmoid.
learnNN.py — (continued)
340 nn1do.add_layer(Linear_complete_layer(nn1do,3))
341 #nn1.add_layer(Sigmoid_layer(nn1)) # comment this or the next
342 nn1do.add_layer(ReLU_layer(nn1do))
343 nn1do.add_layer(Dropout_layer(nn1do, rate=0.5))
344 #nn1.add_layer(Linear_complete_layer(nn1do,1)) # when using
output_type="boolean"
345 nn1do.add_layer(Linear_complete_layer(nn1do,1)) # when using
output_type="categorical"
346 #nn1do.learn(epochs = 100)
347
348
349 nn_r1 = NN(data)
350 nn_r1.add_layer(Linear_complete_layer_RMS_Prop(nn_r1,3))
351 #nn_r1.add_layer(Sigmoid_layer(nn_r1)) # comment this or the next
352 nn_r1.add_layer(ReLU_layer(nn_r1))
353 #nn_r1.add_layer(Linear_complete_layer(nn_r1,1)) # when using
output_type="boolean"
354 nn_r1.add_layer(Linear_complete_layer_RMS_Prop(nn_r1,1)) # when using
output_type="categorical"
355 #nn_r1.learn(epochs = 100)
356
357
358 nnm1 = NN(data)
359 nnm1.add_layer(Linear_complete_layer_momentum(nnm1,3))
360 #nnm1.add_layer(Sigmoid_layer(nnm1)) # comment this or the next
361 nnm1.add_layer(ReLU_layer(nnm1))
362 #nnm1.add_layer(Linear_complete_layer(nnm1,1)) # when using
output_type="boolean"
363 nnm1.add_layer(Linear_complete_layer_momentum(nnm1,1)) # when using
output_type="categorical"
364 #nnm1.learn(epochs = 100)
365
366
367 nn2 = NN(data) #"boolean") #
368 nn2.add_layer(Linear_complete_layer_RMS_Prop(nn2,2))
369 nn2.add_layer(ReLU_layer(nn2))
370 nn2.add_layer(Linear_complete_layer_RMS_Prop(nn2,1)) # when using
output_type="categorical"
371
372 nn3 = NN(data) #"boolean") #
373 nn3.add_layer(Linear_complete_layer_RMS_Prop(nn3,5))
374 nn3.add_layer(ReLU_layer(nn3))
375 nn3.add_layer(Linear_complete_layer_RMS_Prop(nn3,1)) # when using
output_type="categorical"
376
377 nn0 = NN(data,learning_rate=0.05)
378 nn0.add_layer(Linear_complete_layer(nn0,1)) # categorical linear regression
379 #nn0.add_layer(Linear_complete_layer_RMS_Prop(nn0,1)) # categorical linear
regression
Plotting.
learnNN.py — (continued)
420 else:
421 error(f"Not implemented: {data.output_type}")
422 nn.learn(epochs)
The following tests on MNIST. The original files are from http://yann.lecun.com/
exdb/mnist/. This code assumes you use the csv files from https://pjreddie.com/
projects/mnist-in-csv/, and put them in the directory ../MNIST/. Note that this
is very inefficient; you would be better off using Keras or PyTorch. There are
28 ∗ 28 = 784 input units and 512 hidden units, which makes 401,408 parameters
for the lowest linear layer. So don’t be surprised when it takes many hours in
AIPython (even if it only takes a few seconds in Keras).
learnNN.py — (continued)
Exercise 8.1 In the definition of nn1 above, for each of the following, first hy-
pothesize what will happen, then test your hypothesis, then explain whether your
testing confirms your hypothesis or not. Test it on more than one data set, and use
more than one run for each data set.
(a) Which fits the data better, having a sigmoid layer or a ReLU layer after the
first linear layer?
(b) Which is faster, having a sigmoid layer or a ReLU layer after the first linear
layer?
(c) What happens if you have both the sigmoid layer and then a ReLU layer
after the first linear layer and before the second linear layer?
(d) What happens if you have a ReLU layer then a sigmoid layer after the first
linear layer and before the second linear layer?
(e) What happens if you have neither the sigmoid layer nor a ReLU layer after
the first linear layer?
Exercise 8.2 Do some
9. Reasoning with Uncertainty
34 def __repr__(self):
35 return self.name # f"Variable({self.name})"
36 def __str__(self):
37 return self.name
63 class CPD(Factor):
64 def __init__(self, child, parents):
65 """represents P(variable | parents)
66 """
67 self.parents = parents
68 self.child = child
69 Factor.__init__(self, parents+[child])
70
71 def __str__(self):
72 """A brief description of a factor used in tracing"""
73 if self.parents:
74 return f"P({self.child}|{','.join(str(p) for p in
self.parents)})"
75 else:
76 return f"P({self.child})"
77
78 __repr__ = __str__
The simplest CPD is the constant that has probability 1 when the child has the
value specified.
probFactors.py — (continued)
80 class ConstantCPD(CPD):
81 def __init__(self, variable, value):
82 CPD.__init__(self, variable, [])
83 self.value = value
84 def get_value(self, assignment):
85 return 1 if self.value==assignment[self.child] else 0
9.3.2 Noisy-or
A noisy-or, for Boolean variable X with Boolean parents Y1 . . . Yk, is parametrized
by k + 1 parameters p0 , p1 , . . . , pk , where each 0 ≤ pi ≤ 1. The semantics is de-
fined as though there are k + 1 hidden variables Z0 , Z1 . . . Zk , where P(Z0 ) = p0
and P(Zi | Yi ) = pi for i ≥ 1, and where X is true if and only if Z0 ∨ Z1 ∨ · · · ∨ Zk
(where ∨ is “or”). Thus X is false if all of the Zi are false. Intuitively, p0 is the
probability of X when all Yi are false, each Zi is a noisy (probabilistic) mea-
sure that Yi makes X true, and X only needs one Zi to make it true.
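This semantics gives a closed form for P(X = true): X is false only when Z0 and every Zi whose Yi is true are all false. A hypothetical standalone sketch (the function name and interface are not the book's):

```python
def noisy_or_prob_true(p0, ps, parent_values):
    """P(X=true) under a noisy-or with leak probability p0, parent
    parameters ps, and Boolean parent_values."""
    prob_all_z_false = 1 - p0
    for p, y in zip(ps, parent_values):
        if y:
            prob_all_z_false *= 1 - p
    return 1 - prob_all_z_false
```

For example, with p0 = 0.1 and a single parent with p1 = 0.5, P(X=true) is 0.1 when the parent is false and 1 − 0.9 × 0.5 = 0.55 when it is true.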
probFactors.py — (continued)
18
19 vars is a set of variables
20 factors is a set of factors
21 """
22 def __init__(self, title, variables=None, factors=None):
23 self.title = title
24 self.variables = variables
25 self.factors = factors
A belief network (also known as a Bayesian network) is a graphical model
where all of the factors are conditional probabilities, and every variable has a
conditional probability given its parents. This class only checks the first condi-
tion, and builds some useful data structures.
probGraphicalModels.py — (continued)
27 class BeliefNetwork(GraphicalModel):
28 """The class of belief networks."""
29
30 def __init__(self, title, variables, factors):
31 """vars is a set of variables
32 factors is a set of factors. All of the factors are instances of
CPD (e.g., Prob).
33 """
34 GraphicalModel.__init__(self, title, variables, factors)
35 assert all(isinstance(f,CPD) for f in factors)
36 self.var2cpt = {f.child:f for f in factors}
37 self.var2parents = {f.child:f.parents for f in factors}
38 self.children = {n:[] for n in self.variables}
39 for v in self.var2parents:
40 for par in self.var2parents[v]:
41 self.children[par].append(v)
42 self.topological_sort_saved = None
The following creates a topological sort of the nodes, where the parents of
a node come before the node in the resulting order. This is based on Kahn’s
algorithm from 1962.
probGraphicalModels.py — (continued)
44 def topological_sort(self):
45 """creates a topological ordering of variables such that the
parents of
46 a node are before the node.
47 """
48 if self.topological_sort_saved:
49 return self.topological_sort_saved
50 next_vars = {n for n in self.var2parents if not self.var2parents[n]
}
51 self.display(3,'topological_sort: next_vars',next_vars)
52 top_order=[]
53 while next_vars:
54 var = next_vars.pop()
55 self.display(3,'select variable',var)
56 top_order.append(var)
57 next_vars |= {ch for ch in self.children[var]
58 if all(p in top_order for p in
self.var2parents[ch])}
59 self.display(3,'var_with_no_parents_left',next_vars)
60 self.display(3,"top_order",top_order)
61 assert set(top_order)==set(self.var2parents), (top_order,self.var2parents)
62 self.topological_sort_saved=top_order
63 return top_order
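The same algorithm over a plain dictionary of parents, as a self-contained sketch:

```python
def topological_sort(parents):
    """Kahn's algorithm. parents maps each node to a list of its parents.
    Returns an order in which every node follows all of its parents."""
    children = {n: [] for n in parents}
    for n, ps in parents.items():
        for p in ps:
            children[p].append(n)
    order, done = [], set()
    next_nodes = {n for n, ps in parents.items() if not ps}
    while next_nodes:
        n = next_nodes.pop()
        order.append(n)
        done.add(n)
        next_nodes |= {c for c in children[n]
                       if c not in done and all(p in done for p in parents[c])}
    assert set(order) == set(parents), "graph has a cycle"
    return order

order = topological_sort({'A': [], 'B': ['A'], 'C': ['B'], 'D': ['B', 'C']})
```

A node becomes available only once all of its parents are in the order, so the assertion fails exactly when the graph has a cycle.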
The show method uses matplotlib to show the graphical structure of a belief
network.
probGraphicalModels.py — (continued)
65 def show(self):
66 plt.ion() # interactive
67 ax = plt.figure().gca()
68 ax.set_axis_off()
69 plt.title(self.title)
70 bbox = dict(boxstyle="round4,pad=1.0,rounding_size=0.5")
71 for var in reversed(self.topological_sort()):
72 if self.var2parents[var]:
73 for par in self.var2parents[var]:
74 ax.annotate(var.name, par.position, xytext=var.position,
75 arrowprops={'arrowstyle':'<-'},bbox=bbox,
76 ha='center')
77 else:
78 x,y = var.position
79 plt.text(x,y,var.name,bbox=bbox,ha='center')
[Figure 9.1: The report-of-leaving belief network, with variables Tamper, Fire,
Alarm, Smoke, Leaving, and Report]
91 f_b = Prob(B,[A],[[0.9,0.1],[0.2,0.8]])
92 f_c = Prob(C,[B],[[0.6,0.4],[0.3,0.7]])
93 f_d = Prob(D,[C],[[0.1,0.9],[0.75,0.25]])
94
95 bn_4ch = BeliefNetwork("4-chain", {A,B,C,D}, {f_a,f_b,f_c,f_d})
Report-of-Leaving Example
The second belief network, bn_report, is Example 8.15 of Poole and Mack-
worth [2017] (http://artint.info). The output of bn_report.show() is shown
in Figure 9.1 of this document.
probGraphicalModels.py — (continued)
[Figure 9.2: Pearl’s sprinkler belief network]
Sprinkler Example
The third belief network is the sprinkler example from Pearl. The output of
bn_sprinkler.show() is shown in Figure 9.2 of this document.
probGraphicalModels.py — (continued)
123
124 f_season = Prob(Season,[],{'summer':0.5, 'winter':0.5})
125 f_sprinkler = Prob(Sprinkler,[Season],{'summer':{'on':0.9,'off':0.1},
126 'winter':{'on':0.01,'off':0.99}})
127 f_rained = Prob(Rained,[Season],{'summer':[0.9,0.1], 'winter': [0.2,0.8]})
128 f_wet = Prob(Grass_wet,[Sprinkler,Rained], {'on': [[0.1,0.9],[0.01,0.99]],
129 'off':[[0.99,0.01],[0.3,0.7]]})
130 f_shiny = Prob(Grass_shiny, [Grass_wet], [[0.95,0.05], [0.3,0.7]])
131 f_shoes = Prob(Shoes_wet, [Grass_wet], [[0.98,0.02], [0.35,0.65]])
132
133 bn_sprinkler = BeliefNetwork("Pearl's Sprinkler Example",
134 {Season, Sprinkler, Rained, Grass_wet, Grass_shiny,
Shoes_wet},
135 {f_season, f_sprinkler, f_rained, f_wet, f_shiny,
f_shoes})
136
137 bn_sprinkler_soff = BeliefNetwork("Pearl's Sprinkler Example
(do(Sprinkler=off))",
138 {Season, Sprinkler, Rained, Grass_wet, Grass_shiny,
Shoes_wet},
139 {f_season, f_rained, f_wet, f_shiny, f_shoes,
140 Prob(Sprinkler,[],{'on':0,'off':1})})
probGraphicalModels.py — (continued)
probGraphicalModels.py — (continued)
169
170 p_cold_lr = Prob(Cold,[],[0.9,0.1])
171 p_flu_lr = Prob(Flu,[],[0.95,0.05])
172 p_covid_lr = Prob(Covid,[],[0.99,0.01])
173
174 p_cough_lr = LogisticRegression(Cough, [Cold,Flu,Covid], [-2.2, 1.67,
1.26, 3.19])
175 p_fever_lr = LogisticRegression(Fever, [ Flu,Covid], [-4.6, 5.02,
5.46])
176 p_sneeze_lr = LogisticRegression(Sneeze, [Cold,Flu ], [-2.94, 3.04, 1.79
])
177
178 bn_lr1 = BeliefNetwork("Bipartite Diagnostic Network - logistic
regression",
179 {Cough, Fever, Sneeze, Cold, Flu, Covid},
180 {p_cold_lr, p_flu_lr, p_covid_lr, p_cough_lr,
p_fever_lr, p_sneeze_lr})
181
182 # to see the conditional probability table of the logistic regression do:
183 #print(p_cough_lr.to_table())
184
185 # example from box "Noisy-or compared to logistic regression"
186 # from learnLinear import sigmoid, logit
187 # w0=logit(0.01)
188 # X = Variable("X",boolean)
189 # print(LogisticRegression(X,[A,B,C,D],[w0, logit(0.05)-w0, logit(0.1)-w0,
logit(0.2)-w0, logit(0.2)-w0]).to_table(given={X:True}))
190 # try to predict what would happen (and then test) if we had
191 # w0=logit(0.01)
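A logistic-regression CPD computes P(child=true | parents) as the sigmoid of a bias plus the weights of the parents that are true; a hypothetical standalone sketch using the cough weights above:

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def lr_prob_true(bias, weights, parent_values):
    """sigmoid of the bias plus the weights of the true parents"""
    return sigmoid(bias + sum(w for w, v in zip(weights, parent_values) if v))

# P(Cough=true | Cold=false, Flu=false, Covid=true), weights as in p_cough_lr
p = lr_prob_true(-2.2, [1.67, 1.26, 3.19], [False, False, True])
```

Here only the Covid weight is added to the bias, giving sigmoid(−2.2 + 3.19) ≈ 0.73.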
[Figure 9.4: Posterior probabilities given Report=True: Tamper (False: 0.601,
True: 0.399), Fire (False: 0.769, True: 0.231), Alarm (False: 0.372, True: 0.628),
Smoke (False: 0.785, True: 0.215), Leaving (False: 0.347, True: 0.653)]
We use bn_4ch as the test case, in particular P(B | D = true). This needs an
error threshold, particularly for the approximate methods, where the default
threshold is much too accurate.
probGraphicalModels.py — (continued)
The following draws the posterior distribution of all variables. Figure 9.4
shows the result of bn_reportRC.show_post({Report:True}) when run after
loading probRC.py (see below).
probGraphicalModels.py — (continued)
217 """
218 plt.ion() # interactive
219 ax = plt.figure().gca()
220 ax.set_axis_off()
221 plt.title(self.gm.title+" observed: "+str(obs))
222 bbox = dict(boxstyle="round4,pad=1.0,rounding_size=0.5")
223 for var in reversed(self.gm.topological_sort()):
224 distn = self.query(var, obs=obs)
225 if var in obs:
226 text = var.name + "=" + str(obs[var])
227 else:
228 text = var.name + "\n" + "\n".join(str(d)+":
"+format_string.format(v) for (d,v) in distn.items())
229 if self.gm.var2parents[var]:
230 for par in self.gm.var2parents[var]:
231 ax.annotate(text, par.position, xytext=var.position,
232 arrowprops={'arrowstyle':'<-'},bbox=bbox,
233 ha='center')
234 else:
235 x,y = var.position
236 plt.text(x,y,text,bbox=bbox,ha='center')
probRC.py — (continued)
The recursive conditioning algorithm adds forgetting, caching, and the recog-
nition of disconnected components. We do this by adding a cache and redefin-
ing the recursive search algorithm. It inherits the query method.
probRC.py — (continued)
66 class ProbRC(ProbSearch):
67 def __init__(self,gm=None):
68 self.cache = {(frozenset(), frozenset()):1}
69 ProbSearch.__init__(self,gm)
70
71 def prob_search(self, context, factors, split_order):
72 """ returns the number \sum_{split_order} \prod_{factors} given
assignments in context
73 context is a variable:value dictionary
74 factors is a set of factors
75 split_order is a list of variables in factors that are not assigned
in context
76 returns sum over variable assignments to variables in split_order
77 of the product of factors
78 """
79 self.display(3,"calling rc,",(context,factors))
80 ce = (frozenset(context.items()), frozenset(factors)) # key for the
cache entry
81 if ce in self.cache:
82 self.display(3,"rc cache lookup",(context,factors))
83 return self.cache[ce]
84 # if not factors: # no factors; needed if you don't have forgetting
and caching
85 # return 1
86 elif vars_not_in_factors := {var for var in context
87 if not any(var in fac.variables for
fac in factors)}:
88 # forget variables not in any factor
89 self.display(3,"rc forgetting variables", vars_not_in_factors)
90 return self.prob_search({key:val for (key,val) in
context.items()
91 if key not in vars_not_in_factors},
92 factors, split_order)
93 elif to_eval := {fac for fac in factors if
fac.can_evaluate(context)}:
94 # evaluate factors when all variables are assigned
95 self.display(3,"rc evaluating factors",to_eval)
96 val = math.prod(fac.get_value(context) for fac in to_eval)
97 if val == 0:
98 return 0
99 else:
100 return val * self.prob_search(context, {fac for fac in factors
101 if fac not in to_eval},
split_order)
102 elif len(comp := connected_components(context, factors,
split_order)) > 1:
103 # there are disconnected components
104 self.display(3,"splitting into connected components",comp,"in
context",context)
105 return(math.prod(self.prob_search(context,f,eo) for (f,eo) in
comp))
106 else:
107 assert split_order, "split_order should not be empty to get
here"
108 total = 0
109 var = split_order[0]
110 self.display(3, "rc branching on", var)
111 for val in var.domain:
112 total += self.prob_search(dict_union({var:val},context),
factors, split_order[1:])
113 self.cache[ce] = total
114 self.display(2, "rc branching on", var,"returning", total)
115 return total
connected_components returns a list of connected components, where a con-
nected component is a set of factors and a set of variables, where the graph that
connects variables and factors that involve them is connected. The connected
components are built one at a time, maintaining a current connected component. At
all times factors is partitioned into 3 disjoint sets:
• other_factors the other factors that are not (yet) in the connected com-
ponent
probRC.py — (continued)
127 component_factors.add(next_fac)
128 new_vars = set(next_fac.variables) - component_variables -
context.keys()
129 component_variables |= new_vars
130 for var in new_vars:
131 factors_to_check |= {f for f in other_factors if var in
f.variables}
132 other_factors -= factors_to_check # set difference
133 if other_factors:
134 return ([(component_factors, [e for e in split_order if e in component_variables])]
135 + connected_components(context, other_factors,
136 [e for e in split_order if e not in component_variables]))
137 else:
138 return [(component_factors, split_order)]
Testing:
probRC.py — (continued)
163 ## bn_sprinklerv.query(Shoes_wet,{})
164 ## bn_sprinklerv.query(Shoes_wet,{Rained:True})
165 ## bn_sprinklerv.query(Shoes_wet,{Grass_shiny:True})
166 ## bn_sprinklerv.query(Shoes_wet,{Grass_shiny:False,Rained:True})
167
168 from probGraphicalModels import bn_no1, bn_lr1, Cough, Fever, Sneeze,
Cold, Flu, Covid
169 bn_no1v = ProbRC(bn_no1)
170 bn_lr1v = ProbRC(bn_lr1)
171 ## bn_no1v.query(Flu, {Fever:1, Sneeze:1})
172 ## bn_lr1v.query(Flu, {Fever:1, Sneeze:1})
173 ## bn_lr1v.query(Cough,{})
174 ## bn_lr1v.query(Cold,{Cough:1,Sneeze:0,Fever:1})
175 ## bn_lr1v.query(Flu,{Cough:0,Sneeze:1,Fever:1})
176 ## bn_lr1v.query(Covid,{Cough:1,Sneeze:0,Fever:1})
177 ## bn_lr1v.query(Covid,{Cough:1,Sneeze:0,Fever:1,Flu:0})
178 ## bn_lr1v.query(Covid,{Cough:1,Sneeze:0,Fever:1,Flu:1})
179
180 if __name__ == "__main__":
181 InferenceMethod.testIM(ProbRC)
32 elim_order = self.gm.variables
33 projFactors = [self.project_observations(fact,obs)
34 for fact in self.gm.factors]
35 for v in elim_order:
36 if v != var and v not in obs:
37 projFactors = self.eliminate_var(projFactors,v)
38 unnorm = factor_times(var,projFactors)
39 p_obs=sum(unnorm)
40 self.display(1,"Unnormalized probs:",unnorm,"Prob obs:",p_obs)
41 return {val:pr/p_obs for val,pr in zip(var.domain, unnorm)}
A FactorObserved is a factor that is the result of some observations on an-
other factor. We don’t store the values in a list; we just look them up as needed.
The observations can include variables that are not in the list, but should have
some intersection with the variables in the factor.
probFactors.py — (continued)
A FactorSum is a factor that represents ∑var ∏f∈factors f.
We store the values lazily; if they are already computed, we
use the stored values. If they are not already computed we compute and
store them.
probFactors.py — (continued)
182
183 def get_value(self,assignment):
184 """lazy implementation: if not saved, compute it. Return saved
value"""
185 asst = frozenset(assignment.items())
186 if asst in self.values:
187 return self.values[asst]
188 else:
189 total = 0
190 new_asst = assignment.copy()
191 for val in self.var_summed_out.domain:
192 new_asst[self.var_summed_out] = val
193 total += math.prod(fac.get_value(new_asst) for fac in
self.factors)
194 self.values[asst] = total
195 return total
The method factor_times multiplies a set of factors that are all factors on the same
variable (or on no variables). This is the last step in variable elimination before
normalizing. It returns an array giving the product for each value of the variable.
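With factors simplified to dictionaries mapping each value of the variable to a number, the operation is (a hypothetical sketch, not the book's implementation):

```python
import math

def factor_times(domain, factors):
    """Pointwise product of factors that all depend on the same single
    variable, each represented here as a dict from value to number."""
    return [math.prod(f[val] for f in factors) for val in domain]

unnorm = factor_times([True, False],
                      [{True: 0.2, False: 0.8}, {True: 0.5, False: 0.5}])
```

Dividing the resulting array by its sum gives the normalized posterior over the variable, which is exactly how query finishes.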
probFactors.py — (continued)
43 def project_observations(self,factor,obs):
44 """Returns the resulting factor after observing obs
45
46 obs is a dictionary of {variable:value} pairs.
47 """
48 if any((var in obs) for var in factor.variables):
49 # a variable in factor is observed
50 return FactorObserved(factor,obs)
51 else:
52 return factor
53
54 def eliminate_var(self,factors,var):
55 """Eliminate a variable var from a list of factors.
56 Returns a new set of factors that has var summed out.
57 """
58 self.display(2,"eliminating ",str(var))
59 contains_var = []
60 not_contains_var = []
61 for fac in factors:
62 if var in fac.variables:
63 contains_var.append(fac)
64 else:
65 not_contains_var.append(fac)
66 if contains_var == []:
67 return factors
68 else:
69 newFactor = FactorSum(var,contains_var)
70 self.display(2,"Multiplying:",[str(f) for f in contains_var])
71 self.display(2,"Creating factor:", newFactor)
72 self.display(3, newFactor.to_table()) # factor in detail
73 not_contains_var.append(newFactor)
74 return not_contains_var
75
76 from probGraphicalModels import bn_4ch, A,B,C,D
77 bn_4chv = VE(bn_4ch)
78 ## bn_4chv.query(A,{})
79 ## bn_4chv.query(D,{})
80 ## InferenceMethod.max_display_level = 3 # show more detail in displaying
81 ## InferenceMethod.max_display_level = 1 # show less detail in displaying
82 ## bn_4chv.query(A,{D:True})
83 ## bn_4chv.query(B,{A:True,D:False})
84
85 from probGraphicalModels import
bn_report,Alarm,Fire,Leaving,Report,Smoke,Tamper
86 bn_reportv = VE(bn_report) # answers queries using variable elimination
87 ## bn_reportv.query(Tamper,{})
88 ## InferenceMethod.max_display_level = 0 # show no detail in displaying
89 ## bn_reportv.query(Leaving,{})
90 ## bn_reportv.query(Tamper,{},elim_order=[Smoke,Report,Leaving,Alarm,Fire])
91 ## bn_reportv.query(Tamper,{Report:True})
92 ## bn_reportv.query(Tamper,{Report:True,Smoke:False})
93
94 from probGraphicalModels import bn_sprinkler, Season, Sprinkler, Rained, Grass_wet, Grass_shiny, Shoes_wet
95 bn_sprinklerv = VE(bn_sprinkler)
96 ## bn_sprinklerv.query(Shoes_wet,{})
97 ## bn_sprinklerv.query(Shoes_wet,{Rained:True})
98 ## bn_sprinklerv.query(Shoes_wet,{Grass_shiny:True})
99 ## bn_sprinklerv.query(Shoes_wet,{Grass_shiny:False,Rained:True})
100
101 from probGraphicalModels import bn_lr1, Cough, Fever, Sneeze, Cold, Flu, Covid
102 vediag = VE(bn_lr1)
103 ## vediag.query(Cough,{})
104 ## vediag.query(Cold,{Cough:1,Sneeze:0,Fever:1})
105 ## vediag.query(Flu,{Cough:0,Sneeze:1,Fever:1})
106 ## vediag.query(Covid,{Cough:1,Sneeze:0,Fever:1})
107 ## vediag.query(Covid,{Cough:1,Sneeze:0,Fever:1,Flu:0})
108 ## vediag.query(Covid,{Cough:1,Sneeze:0,Fever:1,Flu:1})
109
110 if __name__ == "__main__":
111 InferenceMethod.testIM(VE)
26 """
27 total = sum(dist.values())
28 rands = sorted(random.random()*total for i in range(num_samples))
29 result = []
30 dist_items = list(dist.items())
31 cum = dist_items[0][1] # cumulative sum
32 index = 0
33 for r in rands:
34 while r>cum:
35 index += 1
36 cum += dist_items[index][1]
37 result.append(dist_items[index][0])
38 return result
Exercise 9.1
What is the time and space complexity of the following 4 methods to generate n
samples, where m is the length of dist:
The test sampling method can be used to generate the statistics from a num-
ber of samples. It is useful to see the variability as a function of the number of
samples. Try it for a few samples and also for many samples.
probStochSim.py — (continued)
53 class SamplingInferenceMethod(InferenceMethod):
54 """The abstract class of sampling-based belief network inference methods"""
55
56 def __init__(self,gm=None):
57 InferenceMethod.__init__(self, gm)
58
59 def query(self,qvar,obs={},number_samples=1000,sample_order=None):
60 raise NotImplementedError("SamplingInferenceMethod query") # abstract
probStochSim.py — (continued)
62 class RejectionSampling(SamplingInferenceMethod):
63 """The class that queries Graphical Models using Rejection Sampling.
64
65 gm is a belief network to query
66 """
67 method_name = "rejection sampling"
68
69 def __init__(self, gm=None):
70 SamplingInferenceMethod.__init__(self, gm)
71
72 def query(self, qvar, obs={}, number_samples=1000, sample_order=None):
73 """computes P(qvar | obs) where
74 qvar is a variable.
75 obs is a {variable:value} dictionary.
76 sample_order is a list of variables where the parents
77 come before the variable.
78 """
79 if sample_order is None:
80 sample_order = self.gm.topological_sort()
81 self.display(2,*sample_order,sep="\t")
82 counts = {val:0 for val in qvar.domain}
83 for i in range(number_samples):
84 rejected = False
85 sample = {}
86 for nvar in sample_order:
87 fac = self.gm.var2cpt[nvar] #factor with nvar as child
Exercise 9.2 Change this algorithm so that it does importance sampling using a
proposal distribution. It needs to sample using a different distribution and then
update the weight of the current sample. For testing, use a proposal distribution
that only specifies probabilities for some of the variables (and have the algorithm
use the probabilities from the network in the other cases).
Resampling
Resample is based on sample multiple, but works with an array of particles.
(Aside: Python doesn't let us use sample multiple directly, as it uses a dictionary,
and particles, represented as dictionaries, can't be keys of dictionaries.)
probStochSim.py — (continued)
9.8.6 Examples
probStochSim.py — (continued)
Exercise 9.3 This code keeps regenerating the distribution of a variable given
its parents. Implement one or both of the following, and compare them to the
original.
(a) Make cond dist return a slice that corresponds to the distribution, and then
use the slice instead of the dictionary (a list slice does not generate new data
structures).
(b) Make cond dist remember values it has already computed, and return these
rather than recomputing them.
Exercise 9.4 Change the code so that it can have multiple query variables. Make
the list of query variables an input to the algorithm, with the default value being
the list of all non-observed variables.
Exercise 9.5 In this algorithm, explain where it computes the probability of a
variable given its Markov blanket. Instead of returning the average of the samples
for the query variable, it is possible to return the average estimate of the probabil-
ity of the query variable given its Markov blanket. Does this converge to the same
answer as the given code? Does it converge faster, slower, or the same?
samples approaches infinity (as do all of these algorithms), the algorithms can
be compared by comparing the accuracy for multiple runs. Summary statistics
like the variance may provide some information, but the assumptions behind
the variance being appropriate (namely that the distribution is approximately
Gaussian) may not hold for cases where the predictions are bounded and often
skewed.
It is more appropriate to plot the distribution of predictions over multiple
runs. The plot stats method plots the prediction of a particular variable (or of
the partition function) for a number of runs of the same algorithm. On the x-
axis is the prediction of the algorithm. On the y-axis is the number of runs
with prediction less than or equal to the x value. This is thus like a cumulative
distribution over the predictions, but with counts on the y-axis.
Note that for runs where no samples are consistent with the observations (as
can happen with rejection sampling), the predicted probability is 1.0 (as a
convention for 0/0).
The variable what contains the query variable, or what is “prob ev”, the
probability of evidence.
probStochSim.py — (continued)
337 # plot_stats(bn_reportr,Tamper,True,{Report:True,Smoke:True},number_samples=1000, number_runs=1000)
338 # plot_stats(bn_reportL,Tamper,True,{Report:True,Smoke:True},number_samples=1000, number_runs=1000)
339 # plot_stats(bn_reportp,Tamper,True,{Report:True,Smoke:True},number_samples=1000, number_runs=1000)
340 # plot_stats(bn_reportr,Tamper,True,{Report:True,Smoke:True},number_samples=100, number_runs=1000)
341 # plot_stats(bn_reportL,Tamper,True,{Report:True,Smoke:True},number_samples=100, number_runs=1000)
342 # plot_stats(bn_reportg,Tamper,True,{Report:True,Smoke:True},number_samples=1000, number_runs=1000)
343
344 def plot_mult(methods, example, qvar, qval, obs, number_samples=1000, number_runs=1000):
345 for method in methods:
346 solver = method(example)
347 if isinstance(solver,SamplingInferenceMethod):
348 plot_stats(solver, qvar, qval, obs, number_samples, number_runs)
349 else:
350 plot_stats(solver, qvar, qval, obs, number_runs)
351
352 from probRC import ProbRC
353 # Try following (but it takes a while..)
354 methods = [ProbRC,RejectionSampling,LikelihoodWeighting,ParticleFiltering,GibbsSampling]
355 # plot_mult(methods,bn_report,Tamper,True,{Report:True,Smoke:False},number_samples=100, number_runs=1000)
356 # plot_mult(methods,bn_report,Tamper,True,{Report:False,Smoke:True},number_samples=100, number_runs=1000)
357
358 # Sprinkler Example:
359 # plot_stats(bn_sprinklerr,Shoes_wet,True,{Grass_shiny:True,Rained:True},number_samples=1000)
360 # plot_stats(bn_sprinklerL,Shoes_wet,True,{Grass_shiny:True,Rained:True},number_samples=1000)
models, and more generally, dynamic belief networks, using the graphical mod-
els code.
This HMM code assumes there are multiple Boolean observation variables
that depend on the current state and are independent of each other given the
state.
probHMM.py — Hidden Markov Model
11 import random
12 from probStochSim import sample_one, sample_multiple
13
14 class HMM(object):
15 def __init__(self, states, obsvars, pobs, trans, indist):
16 """A hidden Markov model.
17 states - set of states
18 obsvars - set of observation variables
19 pobs - probability of observations, pobs[i][s] is P(Obs_i=True | State=s)
20 trans - transition probability - trans[i][j] gives P(State=j | State=i)
21 indist - initial distribution - indist[s] is P(State_0 = s)
22 """
23 self.states = states
24 self.obsvars = obsvars
25 self.pobs = pobs
26 self.trans = trans
27 self.indist = indist
Consider the following example. Suppose you want to unobtrusively keep
track of an animal in a triangular enclosure using sound. Suppose you have
3 microphones that provide unreliable (noisy) binary information at each time
step. The animal is either close to one of the 3 points of the triangle or in the
middle of the triangle.
probHMM.py — (continued)
29 # state
30 # 0=middle, 1,2,3 are corners
31 states1 = {'middle', 'c1', 'c2', 'c3'} # states
32 obs1 = {'m1','m2','m3'} # microphones
The observation model is as follows. If the animal is in a corner, it will
be detected by the microphone at that corner with probability 0.6, and will be
independently detected by each of the other microphones with a probability of
0.1. If the animal is in the middle, it will be detected by each microphone with
a probability of 0.4.
probHMM.py — (continued)
96 hmm1f1 = HMMVEfilter(hmm1)
97 # hmm1f1.filter([{'m1':0, 'm2':1, 'm3':1}, {'m1':1, 'm2':0, 'm3':1}])
98 ## HMMVEfilter.max_display_level = 2 # show more detail in displaying
99 # hmm1f2 = HMMVEfilter(hmm1)
100 # hmm1f2.filter([{'m1':1, 'm2':0, 'm3':0}, {'m1':0, 'm2':1, 'm3':0}, {'m1':1, 'm2':0, 'm3':0},
101 #               {'m1':0, 'm2':0, 'm3':0}, {'m1':0, 'm2':0, 'm3':0}, {'m1':0, 'm2':0, 'm3':0},
102 #               {'m1':0, 'm2':0, 'm3':0}, {'m1':0, 'm2':0, 'm3':1}, {'m1':0, 'm2':0, 'm3':1},
103 #               {'m1':0, 'm2':0, 'm3':1}])
104 # hmm1f3 = HMMVEfilter(hmm1)
Exercise 9.6 The representation assumes that there is a list of Boolean obser-
vations. Extend the representation so that each observation variable can have
multiple discrete values. You need to choose a representation for the model, and
change the algorithm.
9.9.2 Localization
The localization example in the book is a controlled HMM, where there is a
given action at each time and the transition depends on the action. In this
class, the transition is set to None initially, and needs to be provided with an
action to determine the transition probability.
40 class HMM_Local(HMMVEfilter):
41 """VE filter for controlled HMMs
42 """
43 def __init__(self, hmm):
44 HMMVEfilter.__init__(self, hmm)
45
46 def go(self, action):
47 self.hmm.trans = self.hmm.act2trans[action]
48 self.advance()
49
50 loc_filt = HMM_Local(hmm_16pos)
51 # loc_filt.observe({'door':True}); loc_filt.go("right"); loc_filt.observe({'door':False}); loc_filt.go("right"); loc_filt.observe({'door':True})
52 # loc_filt.state_dist
The following lets us interactively move the agent and provide observa-
tions. It shows the distribution over locations.
probLocalization.py — (continued)
54 class Show_Localization(Displayable):
55 def __init__(self,hmm):
56 self.hmm = hmm
57 self.loc_filt = HMM_Local(hmm)
58 fig,(self.ax) = plt.subplots()
59 plt.subplots_adjust(bottom=0.2)
60 left_butt = Button(plt.axes([0.05,0.02,0.1,0.05]), "left")
61 left_butt.on_clicked(self.left)
62 right_butt = Button(plt.axes([0.25,0.02,0.1,0.05]), "right")
63 right_butt.on_clicked(self.right)
64 door_butt = Button(plt.axes([0.45,0.02,0.1,0.05]), "door")
65 door_butt.on_clicked(self.door)
66 nodoor_butt = Button(plt.axes([0.65,0.02,0.1,0.05]), "no door")
67 nodoor_butt.on_clicked(self.nodoor)
68 reset_butt = Button(plt.axes([0.85,0.02,0.1,0.05]), "reset")
69 reset_butt.on_clicked(self.reset)
70 # this makes sure y-axis goes to 1, graph overwritten in draw_dist
71 self.draw_dist()
72 plt.show()
73
74 def draw_dist(self):
75 self.ax.clear()
76 plt.ylim(0,1)
77 self.ax.set_ylabel("Probability")
78 self.ax.set_xlabel("Location")
79 self.ax.set_title("Location Probability Distribution")
80 self.ax.set_xticks(self.hmm.states)
81 vals = [self.loc_filt.state_dist[i] for i in self.hmm.states]
82 self.bars = self.ax.bar(self.hmm.states, vals, color='black')
83 self.ax.bar_label(self.bars,["{v:.2f}".format(v=v) for v in vals], padding = 1)
84 plt.draw()
85
86 def left(self,event):
87 self.loc_filt.go("left")
88 self.draw_dist()
89 def right(self,event):
90 self.loc_filt.go("right")
91 self.draw_dist()
92 def door(self,event):
93 self.loc_filt.observe({'door':True})
94 self.draw_dist()
95 def nodoor(self,event):
96 self.loc_filt.observe({'door':False})
97 self.draw_dist()
98 def reset(self,event):
99 self.loc_filt.state_dist = {i:1/16 for i in range(16)}
100 self.draw_dist()
101
102 # sl = Show_Localization(hmm_16pos)
probHMM.py — (continued)
• Rolling out the DBN for some time period, and using standard belief net-
work inference. The latest time that needs to be in the rolled out network
is the time of the latest observation or the time of a query (whichever is
later). This allows us to observe any variables at any time and query any
variables at any time. This is covered in Section 9.10.2.
• An unrolled belief network may be very large, and we might only be in-
terested in asking about “now”. In this case we can just represent the
variables “now”. In this approach we can observe and query the current
variables, and then move to the next time. This does not allow for
arbitrary historical queries (about the past or the future), but can be much
simpler. This is covered in Section 9.10.3.
be created together.
A dynamic belief network consists of:
• An initial distribution over the features “now” (time 1). This is a belief
network with all variables being time 1 variables.
48 class FactorRename(Factor):
49 def __init__(self,fac,renaming):
50 """A renamed factor.
51 fac is a factor
52 renaming is a dictionary of the form {new:old} where old and new are variables,
53 where the variables in fac appear exactly once in the renaming
54 """
55 Factor.__init__(self,[n for (n,o) in renaming.items() if o in fac.variables])
56 self.orig_fac = fac
57 self.renaming = renaming
58
59 def get_value(self,assignment):
60 return self.orig_fac.get_value({self.renaming[var]:val
61 for (var,val) in assignment.items()
62 if var in self.variables})
probDBN.py — (continued)
71 class DBN(Displayable):
72 """The class of stationary Dynamic Belief networks.
73 * name is the DBN name
74 * vars_now is a list of current variables (each must have
75 previous variable).
76 * transition_factors is a list of factors for P(X|parents) where X
77 is a current variable and parents is a list of current or previous variables.
78 * init_factors is a list of factors for P(X|parents) where X is a
79 current variable and parents can only include current variables
80 The graph of transition factors + init factors must be acyclic.
81
82 """
83 def __init__(self, title, vars_now, transition_factors=None, init_factors=None):
84 self.title = title
85 self.vars_now = vars_now
86 self.vars_prev = [v.previous for v in vars_now]
87 self.transition_factors = transition_factors
88 self.init_factors = init_factors
89 self.var_index = {} # var_index[v] is the index of variable v
90 for i,v in enumerate(vars_now):
91 self.var_index[v]=i
Here is a 3 variable DBN:
probDBN.py — (continued)
109 from probHMM import closeMic, farMic, midMic, sm, mmc, sc, mcm, mcc
110
111 Pos_0,Pos_1 = variable_pair("Position",domain=[0,1,2,3])
112 Mic1_0,Mic1_1 = variable_pair("Mic1")
113 Mic2_0,Mic2_1 = variable_pair("Mic2")
114 Mic3_0,Mic3_1 = variable_pair("Mic3")
115
116 # conditional probabilities - see hmm for the values of sm,mmc, etc
117 ppos = Prob(Pos_1, [Pos_0],
118 [[sm, mmc, mmc, mmc], #was in middle
119 [mcm, sc, mcc, mcc], #was in corner 1
120 [mcm, mcc, sc, mcc], #was in corner 2
121 [mcm, mcc, mcc, sc]]) #was in corner 3
122 pm1 = Prob(Mic1_1, [Pos_1], [[1-midMic, midMic], [1-closeMic, closeMic],
123 [1-farMic, farMic], [1-farMic, farMic]])
124 pm2 = Prob(Mic2_1, [Pos_1], [[1-midMic, midMic], [1-farMic, farMic],
125 [1-closeMic, closeMic], [1-farMic, farMic]])
126 pm3 = Prob(Mic3_1, [Pos_1], [[1-midMic, midMic], [1-farMic, farMic],
probDBN.py — (continued)
158 # Try
159 #from probRC import ProbRC
160 #bn = BNfromDBN(dbn1,2) # construct belief network
161 #drc = ProbRC(bn) # initialize recursive conditioning
162 #B2 = bn.name2var['B'][2]
10.1 K-means
The k-means learner maintains two lists that form the sufficient statistics needed
to classify examples, and to learn the classification:
• class counts is a list such that class counts[c] is the number of examples in
the training set with class = c.
• feature sum is a list such that feature sum[i][c] is the sum of the values of the
ith feature for members of class c. The average value of the ith feature
in class c is feature sum[i][c] / class counts[c].
35 def distance(self,cl,eg):
36 """distance of the eg from the mean of the class"""
37 return sum( (self.class_prediction(ind,cl)-feat(eg))**2
38 for (ind,feat) in enumerate(self.dataset.input_features))
39
40 def class_prediction(self,feat_ind,cl):
41 """prediction of the class cl on the feature with index feat_ind"""
42 if self.class_counts[cl] == 0:
43 return 0 # there are no examples so we can choose any value
44 else:
45 return self.feature_sum[feat_ind][cl]/self.class_counts[cl]
46
47 def class_of_eg(self,eg):
48 """class to which eg is assigned"""
49 return (min((self.distance(cl,eg),cl)
50 for cl in range(self.num_classes)))[1]
51 # second element of tuple, which is a class with minimum distance
One step of k-means updates the class counts and feature sum. It uses the old
values to determine the classes, and hence the new values for class counts and
feature sum. At the end it determines whether these values have changed,
and then replaces the old ones with the new ones. It returns an indicator of
whether the values are stable (have not changed).
learnKMeans.py — (continued)
53 def k_means_step(self):
54 """Updates the model with one step of k-means.
55 Returns whether the assignment is stable.
56 """
57 new_class_counts = [0]*self.num_classes
58 # feature_sum[i][c] is the sum of the values of feature i for class c
59 new_feature_sum = [[0]*self.num_classes
60 for feat in self.dataset.input_features]
61 for eg in self.dataset.train:
62 cl = self.class_of_eg(eg)
63 new_class_counts[cl] += 1
64 for (ind,feat) in enumerate(self.dataset.input_features):
65 new_feature_sum[ind][cl] += feat(eg)
66 stable = (new_class_counts == self.class_counts) and (self.feature_sum == new_feature_sum)
67 self.class_counts = new_class_counts
68 self.feature_sum = new_feature_sum
69 self.num_iterations += 1
70 return stable
71
72
73 def learn(self,n=100):
74 """do n steps of k-means, or until convergence"""
75 i=0
76 stable = False
77 while i<n and not stable:
78 stable = self.k_means_step()
79 i += 1
80 self.display(1,"Iteration",self.num_iterations,
81 "class counts: ",self.class_counts," Stable=",stable)
82 return stable
83
84 def show_classes(self):
85 """sorts the data by the class and prints in order.
86 For visualizing small data sets
87 """
88 class_examples = [[] for i in range(self.num_classes)]
89 for eg in self.dataset.train:
90 class_examples[self.class_of_eg(eg)].append(eg)
91 print("Class","Example",sep='\t')
92 for cl in range(self.num_classes):
93 for eg in class_examples[cl]:
94 print(cl,*eg,sep='\t')
95
96 def plot_error(self, maxstep=20):
97 """Plots the sum-of-squares error as a function of the number of steps"""
98 plt.ion()
99 plt.xlabel("step")
100 plt.ylabel("Ave sum-of-squares error")
101 train_errors = []
102 if self.dataset.test:
103 test_errors = []
104 for i in range(maxstep):
105 self.learn(1)
106 train_errors.append( sum(self.distance(self.class_of_eg(eg),eg)
107 for eg in self.dataset.train)
108 /len(self.dataset.train))
109 if self.dataset.test:
110 test_errors.append( sum(self.distance(self.class_of_eg(eg),eg)
111 for eg in self.dataset.test)
112 /len(self.dataset.test))
113 plt.plot(range(1,maxstep+1),train_errors,
114 label=str(self.num_classes)+" classes. Training set")
115 if self.dataset.test:
116 plt.plot(range(1,maxstep+1),test_errors,
117 label=str(self.num_classes)+" classes. Test set")
118 plt.legend()
119 plt.draw()
120
121 # data = Data_from_file('data/emdata1.csv', num_train=10, target_index=2000) # trivial example
122 data = Data_from_file('data/emdata2.csv', num_train=10, target_index=2000)
123 # data = Data_from_file('data/emdata0.csv', num_train=14, target_index=2000) # example from textbook
124 kml = K_means_learner(data,2)
125 num_iter=4
126 print("Class assignment after",num_iter,"iterations:")
127 kml.learn(num_iter); kml.show_classes()
128
129 # Plot the error
130 # km2=K_means_learner(data,2); km2.plot_error(20) # 2 classes
131 # km3=K_means_learner(data,3); km3.plot_error(20) # 3 classes
132 # km13=K_means_learner(data,13); km13.plot_error(20) # 13 classes
133
134 # data = Data_from_file('data/carbool.csv', target_index=2000,boolean_features=True)
135 # kml = K_means_learner(data,3)
136 # kml.learn(20); kml.show_classes()
137 # km3=K_means_learner(data,3); km3.plot_error(20) # 3 classes
138 # km3=K_means_learner(data,30); km3.plot_error(20) # 30 classes
Exercise 10.1 Change the boolean features = True flag to allow for numerical features.
K-means assumes the features are numerical, so we want to make non-numerical
features into numerical features (using characteristic functions), but we probably
don't want to change numerical features into Boolean ones.
Exercise 10.2 If there are many classes, some of the classes can become empty
(e.g., try 100 classes with carbool.csv). Implement a way to put some examples
into a class, if possible. Two ideas are:
(a) Initialize the classes with actual examples, so that the classes will not start
empty. (Do the classes become empty?)
(b) In class prediction, we test whether the class is empty, and make a prediction
of 0 for an empty class. It is possible to make a different prediction to “steal”
an example (but you should make sure that a class has a consistent value for
each feature in a loop).
Make your own suggestions, and compare them with the original, and with whichever
of these you think may work better.
10.2 EM
In the following definition, a class, c, is an integer in range [0, num classes). i is
an index of a feature, so feat[i] is the ith feature, and a feature is a function from
tuples to values. val is a value of a feature.
A model consists of 2 lists, which form the sufficient statistics:
• class counts is a list such that class counts[c] is the number of tuples with
class = c, where each tuple is weighted by its probability, i.e.,
class counts[c] = ∑t P(c | t)
• feature counts is a list such that feature counts[i][val][c] is the weighted count
of the number of tuples t with feat[i](t) = val and class(t) = c, each tuple
weighted by its probability, i.e., feature counts[i][val][c] = ∑t:feat[i](t)=val P(c | t)
learnEM.py — EM Learning
11 from learnProblem import Data_set, Learner, Data_from_file
12 import random
13 import math
14 import matplotlib.pyplot as plt
15
16 class EM_learner(Learner):
17 def __init__(self,dataset, num_classes):
18 self.dataset = dataset
19 self.num_classes = num_classes
20 self.class_counts = None
21 self.feature_counts = None
The function em step goes through the training examples, and updates these
counts. The first time it is run, when there is no model, it uses random distri-
butions.
learnEM.py — (continued)
The last step is because len(self.dataset) is a constant (independent of c). class counts[c]
can be taken out of the product, but needs to be raised to the power of the num-
ber of features, and one of them cancels.
learnEM.py — (continued)
51 def learn(self,n):
52 """do n steps of em"""
53 for i in range(n):
54 self.class_counts,self.feature_counts = self.em_step(self.class_counts,
55 self.feature_counts)
The following is for visualizing the classes. It prints the dataset ordered by the
probability of class c.
learnEM.py — (continued)
57 def show_class(self,c):
58 """sorts the data by the class and prints in order.
59 For visualizing small data sets
60 """
61 sorted_data = sorted((self.prob(tpl,self.class_counts,self.feature_counts)[c],
62 ind, # preserve ordering for equal probabilities
63 tpl)
64 for (ind,tpl) in enumerate(self.dataset.train))
65 for cc,r,tpl in sorted_data:
66 print(cc,*tpl,sep='\t')
The following are for evaluating the classes.
The probability of a tuple can be evaluated by marginalizing over the classes:

P(tple) = ∑c (cc[c] / len(self.dataset)) ∗ ∏i (fc[i][feati(tple)][c] / cc[c])

where cc is the class count and fc is the feature count. len(self.dataset) can be dis-
tributed out of the sum, and cc[c] can be taken out of the product:

P(tple) = (1 / len(self.dataset)) ∗ ∑c (1 / cc[c]^(#feats−1)) ∗ ∏i fc[i][feati(tple)][c]
Given the probability of each tuple, we can evaluate the logloss, as the negative
of the log probability:
learnEM.py — (continued)
68 def logloss(self,tple):
69 """returns the logloss of the prediction on tple, which is -log(P(tple))
70 based on the current class counts and feature counts
71 """
72 feats = self.dataset.input_features
73 res = 0
74 cc = self.class_counts
75 fc = self.feature_counts
76 for c in range(self.num_classes):
77 res += prod(fc[i][feat(tple)][c]
78 for (i,feat) in enumerate(feats))/(cc[c]**(len(feats)-1))
79 if res>0:
80 return -math.log2(res/len(self.dataset.train))
81 else:
82 return float("inf") #infinity
83
84 def plot_error(self, maxstep=20):
85 """Plots the logloss error as a function of the number of steps"""
86 plt.ion()
87 plt.xlabel("step")
88 plt.ylabel("Ave Logloss (bits)")
89 train_errors = []
90 if self.dataset.test:
91 test_errors = []
92 for i in range(maxstep):
93 self.learn(1)
94 train_errors.append( sum(self.logloss(tple) for tple in self.dataset.train)
95 /len(self.dataset.train))
96 if self.dataset.test:
97 test_errors.append( sum(self.logloss(tple) for tple in self.dataset.test)
98 /len(self.dataset.test))
99 plt.plot(range(1,maxstep+1),train_errors,
100 label=str(self.num_classes)+" classes. Training set")
101 if self.dataset.test:
102 plt.plot(range(1,maxstep+1),test_errors,
103 label=str(self.num_classes)+" classes. Test set")
104 plt.legend()
105 plt.draw()
106
107 def prod(L):
108 """returns the product of the elements of L"""
109 res = 1
110 for e in L:
111 res *= e
112 return res
113
114 def random_dist(k):
115 """generate k random numbers that sum to 1"""
116 res = [random.random() for i in range(k)]
117 s = sum(res)
118 return [v/s for v in res]
119
120 data = Data_from_file('data/emdata2.csv', num_train=10, target_index=2000)
121 eml = EM_learner(data,2)
122 num_iter=2
Exercise 10.3 For the EM data, where there are naturally 2 classes, 3 classes does
better on the training set after a while than 2 classes, but worse on the test set.
Explain why. Hint: look at what the 3 classes are. Use em3.show class(i) for each
of the classes i ∈ [0, 3).
Exercise 10.4 Write code to plot the logloss as a function of the number of classes
(from 1 to say 15) for a fixed number of iterations. (From the experience with the
existing code, think about how many iterations is appropriate.)
Causality
11.1 Do Questions
A causal model can answer “do” questions.
The following adds the queryDo method to the InferenceMethod class, so it
can be used with any inference method.
probDo.py — (continued)
30
31 from probGraphicalModels import bn_sprinkler, Season, Sprinkler, Rained, Grass_wet, Grass_shiny, Shoes_wet, bn_sprinkler_soff
32 bn_sprinklerv = ProbRC(bn_sprinkler)
33 ## bn_sprinklerv.queryDo(Shoes_wet)
34 ## bn_sprinklerv.queryDo(Shoes_wet,obs={Sprinkler:"off"})
35 ## bn_sprinklerv.queryDo(Shoes_wet,do={Sprinkler:"off"})
36 ## ProbRC(bn_sprinkler_soff).query(Shoes_wet) # should be same as previous case
37 ## bn_sprinklerv.queryDo(Season, obs={Sprinkler:"off"})
38 ## bn_sprinklerv.queryDo(Season, do={Sprinkler:"off"})
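The difference between conditioning and intervening can be seen in a tiny stand-alone example, a two-variable network A → B with invented numbers (not one of the book's networks): observing B changes the belief in its cause A, while do(B = b) replaces B's mechanism and leaves A's marginal unchanged.

```python
# a toy two-variable network A -> B (probabilities are made up)
pA = {True: 0.3, False: 0.7}
pB_A = {True: {True: 0.9, False: 0.1}, False: {True: 0.2, False: 0.8}}

def observe_A_given_B(b):
    """P(A | B=b) by Bayes' rule: evidence on B flows back to its cause A."""
    joint = {a: pA[a] * pB_A[a][b] for a in pA}
    z = sum(joint.values())
    return {a: v / z for a, v in joint.items()}

def do_A_given_B(b):
    """P(A | do(B=b)): the intervention replaces B's mechanism with a
    constant, cutting the arc from A, so A's marginal is untouched."""
    return dict(pA)

obs = observe_A_given_B(True)   # P(A=True | B=True) = 0.27/0.41
do = do_A_given_B(True)         # P(A=True | do(B=True)) stays at 0.3
```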
probDo.py — (continued)
[Figure: part of the counterfactual network, with nodes B, C, B′, C′, “C if b”, and “C if not b”]
29
30 # as a deterministic system with independent noise
31 A = Variable("A", boolean, position=(0.2,0.8))
32 B = Variable("B", boolean, position=(0.2,0.4))
33 C = Variable("C", boolean, position=(0.2,0.0))
34 Aprime = Variable("A'", boolean, position=(0.8,0.8))
35 Bprime = Variable("B'", boolean, position=(0.8,0.4))
36 Cprime = Variable("C'", boolean, position=(0.8,0.0))
37 BifA = Variable("B if a", boolean, position=(0.4,0.8))
38 BifnA = Variable("B if not a", boolean, position=(0.6,0.8))
39 CifB = Variable("C if b", boolean, position=(0.4,0.4))
40 CifnB = Variable("C if not b", boolean, position=(0.6,0.4))
41
42 p_A = Prob(A, [], [0.5,0.5])
43 p_B = Prob(B, [A, BifA, BifnA], [[[[1,0],[0,1]],[[1,0],[0,1]]], # A=0
44 [[[1,0],[1,0]],[[0,1],[0,1]]]]) # A=1
45 p_C = Prob(C, [B, CifB, CifnB], [[[[1,0],[0,1]],[[1,0],[0,1]]], # B=0
46 [[[1,0],[1,0]],[[0,1],[0,1]]]]) # B=1
47 p_Aprime = Prob(Aprime,[], [0.6,0.4])
48 p_Bprime = Prob(Bprime, [Aprime, BifA, BifnA], [[[[1,0],[0,1]],[[1,0],[0,1]]], # A=0
49 [[[1,0],[1,0]],[[0,1],[0,1]]]]) # A=1
50 p_Cprime = Prob(Cprime, [Bprime, CifB, CifnB], [[[[1,0],[0,1]],[[1,0],[0,1]]], # B=0
51 [[[1,0],[1,0]],[[0,1],[0,1]]]]) # B=1
52 p_bifa = Prob(BifA, [], [0.6,0.4]) # Does not actually depend on A!!!
53 p_bifna = Prob(BifnA, [], [0.6,0.4])
54 p_cifb = Prob(CifB, [], [0.9,0.1])
55 p_cifnb = Prob(CifnB, [], [0.2,0.8])
56
57 abcCounter = BeliefNetwork("ABC Counterfactual Example",
58 [A,B,C,Aprime,Bprime,Cprime,BifA, BifnA, CifB, CifnB],
59 [p_A,p_B,p_C,p_Aprime,p_Bprime, p_Cprime, p_bifa, p_bifna, p_cifb, p_cifnb])
60
61 abcq = ProbRC(abcCounter)
62 # abcq.queryDo(Cprime, obs = {Aprime:False, A:True})
63 # abcq.queryDo(Cprime, obs = {C:True, Aprime:False})
64 # abcq.queryDo(Cprime, obs = {A:True, C:True, Aprime:False})
65 # abcq.queryDo(Cprime, obs = {A:True, C:True, Aprime:False})
66 # abcq.queryDo(Cprime, obs = {A:False, C:True, Aprime:False})
67 # abcq.queryDo(CifB, obs = {C:True,Aprime:False})
68 # abcq.queryDo(CifnB, obs = {C:True,Aprime:False})
69
70 # abcq.show_post(obs = {})
71 # abcq.show_post(obs = {Aprime:False, A:True})
72 # abcq.show_post(obs = {A:True, C:True, Aprime:False})
73 # abcq.show_post(obs = {A:True, C:True, Aprime:True})
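The nested tables for p_B (and p_C, and their primed variants) encode a deterministic function: B is BifA when A is true and BifnA otherwise, with all randomness pushed into the "if" variables. Assuming the usual convention that the table is indexed by the parents in order, with the final index ranging over the child's values, this can be checked directly:

```python
# p_B's table, indexed as table[A][BifA][BifnA] -> distribution over B
# (False = 0, True = 1, following the parent order [A, BifA, BifnA])
p_B_table = [[[[1,0],[0,1]],[[1,0],[0,1]]],   # A=0: B copies BifnA
             [[[1,0],[1,0]],[[0,1],[0,1]]]]   # A=1: B copies BifA

def b_value(a, bifa, bifna):
    """the deterministic function the table is meant to encode"""
    return bifa if a else bifna

# every entry should put probability 1 on the deterministic value
ok = all(p_B_table[a][bifa][bifna][b_value(a, bifa, bifna)] == 1
         for a in (0, 1) for bifa in (0, 1) for bifna in (0, 1))
```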
The following is the firing squad example of Pearl. See Figure 11.2.
[Figure: posterior distribution for Dead: False 0.882, True 0.118]
probCounterfactual.py — (continued)
Planning with Uncertainty
decnNetworks.py — (continued)
29 class DecisionVariable(Variable):
30 def __init__(self, name, domain, parents, position=None):
31 Variable.__init__(self, name, domain, position)
32 self.parents = parents
33 self.all_vars = set(parents) | {self}
A decision network is a graphical model where the variables can be random
variables or decision variables. Among the factors we assume there is one util-
ity factor.
decnNetworks.py — (continued)
35 class DecisionNetwork(BeliefNetwork):
36 def __init__(self, title, vars, factors):
37 """vars is a list of variables
38 factors is a list of factors (instances of CPD and Utility)
39 """
40 GraphicalModel.__init__(self, title, vars, factors) # don't call init for BeliefNetwork
41 self.var2parents = ({v : v.parents for v in vars if isinstance(v,DecisionVariable)}
42 | {f.child:f.parents for f in factors if isinstance(f,CPD)})
43 self.children = {n:[] for n in self.variables}
44 for v in self.var2parents:
45 for par in self.var2parents[v]:
46 self.children[par].append(v)
47 self.utility_factor = [f for f in factors if isinstance(f,Utility)][0]
48 self.topological_sort_saved = None
The split order ensures that the parents of a decision node are split before
the decision node, and, if possible, that no other variables are split before it.
decnNetworks.py — (continued)
50 def split_order(self):
51 so = []
52 tops = self.topological_sort()
53 for v in tops:
54 if isinstance(v,DecisionVariable):
55 so += [p for p in v.parents if p not in so]
56 so.append(v)
57 so += [v for v in tops if v not in so]
58 return so
decnNetworks.py — (continued)
60 def show(self):
61 plt.ion() # interactive
62 ax = plt.figure().gca()
63 ax.set_axis_off()
64 plt.title(self.title)
[Figure: decision network diagrams, with nodes Weather, Forecast, Umbrella, Utility, Report, and Call]
decnNetworks.py — (continued)
[Figure: the cheating decision network, with nodes Cheat Decision, Watched, Punish, Caught1, Caught2, Grade_1, Grade_2, and Fin_Grd]
Chain of 3 decisions
The following example is a finite-stage fully-observable Markov decision process with a single reward (utility) at the end. It is interesting because the parents do not include all predecessors. The methods we use will work without change on this, even though the agent does not condition on all of its previous observations and actions. The output of ch3.show() is shown in Figure 12.4.
decnNetworks.py — (continued)
[Figure 12.4: the 3-chain decision network, with chance nodes S0–S3, decision nodes D0–D2, and a single Utility node]
184
185 ch3U = UtilityTable([S3],[0,1], position=(7/7,0.9))
186
187 ch3 = DecisionNetwork("3-chain", {S0,D0,S1,D1,S2,D2,S3},{p_s0,p_s1,p_s2,p_s3,ch3U})
188 #rc3 = RC_DN(ch3)
189 #rc3.optimize()
190 #rc3.opt_policy
200
201     gm is the graphical model to query
202 """
203
204 def __init__(self,gm=None):
205 self.gm = gm
206 self.cache = {(frozenset(), frozenset()):1}
207 ## self.max_display_level = 3
208
209 def optimize(self, split_order=None):
210         """computes the expected utility and creates optimal decision functions, where
211         split_order is a list of the non-observed non-query variables in gm
212 """
213 if split_order == None:
214 split_order = self.gm.split_order()
215 self.opt_policy = {}
216 return self.rc({}, self.gm.factors, split_order)
The following is the simplest search-based algorithm. It is exponential in the number of variables, so it is not very useful. However, it is simple, and useful to understand before looking at the more complicated algorithm. Note that the above code does not call rc0; to use it, you need to change self.rc to self.rc0 in the above code.
decnNetworks.py — (continued)
We can combine the optimization for decision networks above with the improvements of recursive conditioning used for graphical models (Section 9.6, page 193).
decnNetworks.py — (continued)
356
357 def __init__(self, dvar, factor):
358 """dvar is a decision variable.
359 factor is a factor that contains dvar and only parents of dvar
360 """
361 self.dvar = dvar
362 self.factor = factor
363 vars = [v for v in factor.variables if v is not dvar]
364 Factor.__init__(self,vars)
365 self.values = [None]*self.size
366 self.decision_fun = FactorDF(dvar,vars,[None]*self.size)
367
368 def get_value(self,assignment):
369         """lazy implementation: if saved, return saved value, else compute it"""
370 index = self.assignment_to_index(assignment)
371         if self.values[index] is not None:  # a stored 0 should also be reused
372 return self.values[index]
373 else:
374 max_val = float("-inf") # -infinity
375 new_asst = assignment.copy()
376 for elt in self.dvar.domain:
377 new_asst[self.dvar] = elt
378 fac_val = self.factor.get_value(new_asst)
379 if fac_val>max_val:
380 max_val = fac_val
381 best_elt = elt
382 self.values[index] = max_val
383 self.decision_fun.values[index] = best_elt
384 return max_val
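The memoization pattern in get_value can be isolated into a small standalone sketch (LazyMax is a hypothetical class, not part of decnNetworks.py): it caches the max over a domain and records which element achieved it, the way MaxFactor fills in decision_fun.

```python
class LazyMax:
    """memoizes max-over-domain values, in the style of MaxFactor.get_value"""
    def __init__(self, domain, fun, size):
        self.domain = domain          # elements to maximize over
        self.fun = fun                # fun(index, elt) -> value
        self.values = [None] * size   # None marks "not yet computed"
        self.best = [None] * size     # element achieving the max

    def get(self, index):
        if self.values[index] is None:   # compare to None, so a cached 0 is reused
            best_elt = max(self.domain, key=lambda elt: self.fun(index, elt))
            self.values[index] = self.fun(index, best_elt)
            self.best[index] = best_elt
        return self.values[index]
```

Repeated calls with the same index do no recomputation, which is what makes the factor "lazy".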
A decision function is a stored factor.
decnNetworks.py — (continued)
34 class MDPtiny(GridMDP):
35 def __init__(self, discount=0.9):
36 actions = ['right', 'upC', 'left', 'upR']
37 self.x_dim = 2 # x-dimension
38 self.y_dim = 3
39         states = [(x,y) for x in range(self.x_dim) for y in range(self.y_dim)]
40 # for GridMDP
41 self.xoff = {'right':0.25, 'upC':0, 'left':-0.25, 'upR':0}
42 self.yoff = {'right':0, 'upC':-0.25, 'left':0, 'upR':0.25}
43 GridMDP.__init__(self, states, actions, discount)
44
45 def P(self,s,a):
46         """return a dictionary of {s1:p1} if P(s1 | s,a)=p1. Other probabilities are zero.
47 """
48 (x,y) = s
49 if a == 'right':
50 return {(1,y):1}
51 elif a == 'upC':
52 return {(x,min(y+1,2)):1}
53 elif a == 'left':
54 if (x,y) == (0,2): return {(0,0):1}
55 else: return {(0,y): 1}
56 elif a == 'upR':
57 if x==0:
58 if y<2: return {(x,y):0.1, (x+1,y):0.1, (x,y+1):0.8}
59 else: # at (0,2)
60 return {(0,0):0.1, (1,2): 0.1, (0,2): 0.8}
61 elif y < 2: # x==1
62 return {(0,y):0.1, (1,y):0.1, (1,y+1):0.8}
63 else: # at (1,2)
64 return {(0,2):0.1, (1,2): 0.9}
65
66 def R(self,s,a):
67 (x,y) = s
68 if a == 'right':
69 return [0,-1][x]
70 elif a == 'upC':
71 return [-1,-1,-2][y]
72 elif a == 'left':
73 if x==0:
74 return [-1, -100, 10][y]
75 else: return 0
76 elif a == 'upR':
77 return [[-0.1, -10, 0.2],[-0.1, -0.1, -0.9]][x][y]
78 # at (0,2) reward is 0.1*10+0.8*-1=0.2
Here is the domain of Example 9.28 of Poole and Mackworth [2017]. The state is represented as (x, y), where x counts from zero from the left and y counts from zero upwards, so (0, 0) is the bottom-left state.
mdpExamples.py — (continued)
80 class grid(GridMDP):
81 """ x_dim * y_dim grid with rewarding states"""
82 def __init__(self, discount=0.9, x_dim=10, y_dim=10):
83 self.x_dim = x_dim # size in x-direction
84 self.y_dim = y_dim # size in y-direction
85 actions = ['up', 'down', 'right', 'left']
86         states = [(x,y) for x in range(x_dim) for y in range(y_dim)]
87 self.rewarding_states = {(3,2):-10, (3,5):-5, (8,2):10, (7,7):3}
88 self.fling_states = {(8,2), (7,7)}
89 self.xoff = {'right':0.25, 'up':0, 'left':-0.25, 'down':0}
90 self.yoff = {'right':0, 'up':0.25, 'left':0, 'down':-0.25}
91 GridMDP.__init__(self, states, actions, discount)
92
93 def intended_next(self,s,a):
94 """returns the next state in the direction a.
95         This is where the agent will end up if it goes in its intended direction
96 (which it does with probability 0.7).
97 """
98 (x,y) = s
99 if a=='up':
100 return (x, y+1 if y+1 < self.y_dim else y)
101 if a=='down':
102 return (x, y-1 if y > 0 else y)
103 if a=='right':
mdpProblem.py — (continued)
61 class GridMDP(MDP):
62 def __init__(self, states, actions, discount):
63 MDP.__init__(self, states, actions, discount)
64
65 def show(self):
66 #plt.ion() # interactive
67 fig,(self.ax) = plt.subplots()
68 plt.subplots_adjust(bottom=0.2)
69 stepB = Button(plt.axes([0.8,0.05,0.1,0.075]), "step")
70 stepB.on_clicked(self.on_step)
71 resetB = Button(plt.axes([0.65,0.05,0.1,0.075]), "reset")
72 resetB.on_clicked(self.on_reset)
73 self.qcheck = CheckButtons(plt.axes([0.2,0.05,0.35,0.075]),
74 ["show q-values","show policy"])
75 self.qcheck.on_clicked(self.show_vals)
76         self.font_box = TextBox(plt.axes([0.1,0.05,0.05,0.075]),"Font:", textalignment="center")
77 self.font_box.on_submit(self.set_font_size)
78 self.font_box.set_val(str(plt.rcParams['font.size']))
79 self.show_vals(None)
80 plt.show()
81
82 def set_font_size(self, s):
83 plt.rcParams.update({'font.size': eval(s)})
84 plt.draw()
85
86 def show_vals(self,event):
87 self.ax.cla()
88 array = [[self.v[(x,y)] for x in range(self.x_dim)]
89 for y in range(self.y_dim)]
90 self.ax.pcolormesh([x-0.5 for x in range(self.x_dim+1)],
91 [x-0.5 for x in range(self.y_dim+1)],
92 array, edgecolors='black',cmap='summer')
93         # for cmap see https://matplotlib.org/stable/tutorials/colors/colormaps.html
94 if self.qcheck.get_status()[1]: # "show policy"
95 for (x,y) in self.q:
96 maxv = max(self.q[(x,y)][a] for a in self.actions)
97 for a in self.actions:
98 if self.q[(x,y)][a] == maxv:
99 # draw arrow in appropriate direction
100 self.ax.arrow(x,y,self.xoff[a]*2,self.yoff[a]*2,
101                         color='red',width=0.05, head_width=0.2, length_includes_head=True)
102 if self.qcheck.get_status()[0]: # "show q-values"
103 self.show_q(event)
104 else:
105 self.show_v(event)
106 self.ax.set_xticks(range(self.x_dim))
107 self.ax.set_xticklabels(range(self.x_dim))
108 self.ax.set_yticks(range(self.y_dim))
109 self.ax.set_yticklabels(range(self.y_dim))
110 plt.draw()
111
112 def on_step(self,event):
113 self.vi(1)
114 self.show_vals(event)
115
116 def show_v(self,event):
117 """show values"""
118 for (x,y) in self.v:
119 self.ax.text(x,y,"{val:.2f}".format(val=self.v[(x,y)]),ha='center')
120
121 def show_q(self,event):
122 """show q-values"""
123 for (x,y) in self.q:
124 for a in self.actions:
125 self.ax.text(x+self.xoff[a],y+self.yoff[a],
126 "{val:.2f}".format(val=self.q[(x,y)][a]),ha='center')
127
128 def on_reset(self,event):
129 self.v = self.initv
130 self.q = self.initq
131 self.show_vals(event)
Figure 12.5 shows the user interface for the tiny domain, which can be obtained using
MDPtiny(discount=0.9).show()
then resizing the window, checking "show q-values" and "show policy", and clicking "step" a few times.
Figure 12.6 shows the user interface for the grid domain, which can be obtained using
grid(discount=0.9).show()
then resizing the window, checking "show q-values" and "show policy", and clicking "step" a few times.
Exercise 12.1 Computing q before v may seem like a waste of space because we don't need to store q in order to compute the value function or the policy. Change the algorithm so that it loops through the states and actions once per iteration, and only stores the value function and the policy. Note that to get the same results as before, you need to make sure that you use the previous value of v in the computation, not the current value of v. Does using the current value of v hurt the algorithm or make it better (in approaching the actual value function)?
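The computation the exercise describes can be sketched as a single sweep that stores only the value function and the policy (a standalone sketch with hypothetical names; P(s, a) is assumed to return a {s1: p1} dictionary and R(s, a) a reward, as in the MDP classes above):

```python
def vi_sweep_v(states, actions, P, R, discount, v):
    """one value-iteration sweep storing only the value function and policy.
    Uses the previous v throughout, matching synchronous value iteration."""
    new_v, pi = {}, {}
    for s in states:
        # compute each Q(s,a) on the fly instead of storing a q table
        qs = {a: R(s, a) + discount * sum(p * v[s1]
                                          for (s1, p) in P(s, a).items())
              for a in actions}
        pi[s] = max(qs, key=qs.get)   # action with the highest one-step value
        new_v[s] = qs[pi[s]]
    return new_v, pi
```

Iterating this to convergence and comparing with updating v in place is one way to investigate the question at the end of the exercise.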
Figure 12.5: Interface for tiny example, after a number of steps. Each rectangle represents a state. In each rectangle are the 4 Q-values for the state. The leftmost number is for the left action; the rightmost number is for the right action; the uppermost is for the upR (up-risky) action and the lowest number is for the upC action. The arrow points to the action(s) with the maximum Q-value. Use MDPtiny().show() after loading mdpExamples.py.
Figure 12.6: Interface for grid example, after a number of steps. Each rectangle represents a state. In each rectangle are the 4 Q-values for the state. The leftmost number is for the left action; the rightmost number is for the right action; the uppermost is for the up action and the lowest number is for the down action. The arrow points to the action(s) with the maximum Q-value. From grid(discount=0.9).show().
Exercise 12.2 Implement value iteration that stores the V-values rather than the
Q-values. Does it work better than storing Q? (What might better mean?)
Exercise 12.3 In asynchronous value iteration, try a number of different ways to choose the states and actions to update (e.g., sweeping through the state-action pairs, choosing them at random). Note that the best way may be to determine which states have had their Q-values change the most, and then update the previous ones, but that is not so straightforward to implement, because you need to find those previous states.
Reinforcement Learning
26 class Healthy_env(RL_env):
27 def __init__(self):
28 RL_env.__init__(self,["party","relax"], "healthy")
29
50 class Env_from_MDP(RL_env):
51 def __init__(self, mdp):
52 initial_state = mdp.states[0]
53 RL_env.__init__(self,mdp.actions, initial_state)
54 self.mdp = mdp
55         self.action_index = {action:index for (index,action) in enumerate(mdp.actions)}
56         self.state_index = {state:index for (index,state) in enumerate(mdp.states)}
57
58 def do(self, action):
59 """updates the state based on the agent doing action.
60 returns state,reward
61 """
62 action_ind = self.action_index[action]
63 state_ind = self.state_index[self.state]
64         self.state = pick_from_dist(self.mdp.trans[state_ind][action_ind], self.mdp.states)
65 reward = self.mdp.reward[state_ind][action_ind]
The monster game is played on the following 5 × 5 grid, where P1–P4 mark the possible prize locations, M the locations where a monster can appear, and R the repair station (x increases to the right, y upwards):

4  P1 R  .  .  P2
3  .  .  M  .  .
2  .  .  .  .  M
1  M  M  .  M  .
0  P3 .  .  .  P4
   0  1  2  3  4
22
23 prize_locs = [(0,0), (0,4), (4,0), (4,4)]
24 prize_apears_prob = 0.3
25 prize_reward = 10
26
27 monster_locs = [(0,1), (1,1), (2,3), (3,1), (4,2)]
28 monster_appears_prob = 0.4
29 monster_reward_when_damaged = -10
30 repair_stations = [(1,4)]
31
32 actions = ["up","down","left","right"]
33
34 def __init__(self):
35 # State:
36 self.x = 2
37 self.y = 2
38 self.damaged = False
39 self.prize = None
40 # Statistics
41 self.number_steps = 0
42 self.total_reward = 0
43 self.min_reward = 0
44 self.min_step = 0
45 self.zero_crossing = 0
46 RL_env.__init__(self, Monster_game_env.actions,
47 (self.x, self.y, self.damaged, self.prize))
48 self.display(2,"","Step","Tot Rew","Ave Rew",sep="\t")
49
50 def do(self,action):
51 """updates the state based on the agent doing action.
52 returns state,reward
53 """
54 reward = 0.0
55 # A prize can appear:
56 if self.prize is None and flip(self.prize_apears_prob):
57 self.prize = random.choice(self.prize_locs)
58 # Actions can be noisy
59 if flip(0.4):
60 actual_direction = random.choice(self.actions)
61 else:
62 actual_direction = action
63 # Modeling the actions given the actual direction
64 if actual_direction == "right":
65 if self.x==self.xdim-1 or (self.x,self.y) in self.vwalls:
66 reward += self.crashed_reward
67 else:
68 self.x += 1
69 elif actual_direction == "left":
70 if self.x==0 or (self.x-1,self.y) in self.vwalls:
71 reward += self.crashed_reward
72 else:
73 self.x += -1
74 elif actual_direction == "up":
75 if self.y==self.ydim-1:
76 reward += self.crashed_reward
77 else:
78 self.y += 1
79 elif actual_direction == "down":
80 if self.y==0:
81 reward += self.crashed_reward
82 else:
83 self.y += -1
84 else:
85             raise RuntimeError("unknown_direction "+str(actual_direction))
86
87 # Monsters
88         if (self.x,self.y) in self.monster_locs and flip(self.monster_appears_prob):
89 if self.damaged:
90 reward += self.monster_reward_when_damaged
91 else:
92 self.damaged = True
93 if (self.x,self.y) in self.repair_stations:
94 self.damaged = False
95
96 # Prizes
97 if (self.x,self.y) == self.prize:
98 reward += self.prize_reward
99 self.prize = None
100
101 # Statistics
102 self.number_steps += 1
103 self.total_reward += reward
104 if self.total_reward < self.min_reward:
105 self.min_reward = self.total_reward
106 self.min_step = self.number_steps
107 if self.total_reward>0 and reward>self.total_reward:
108 self.zero_crossing = self.number_steps
109 self.display(2,"",self.number_steps,self.total_reward,
110 self.total_reward/self.number_steps,sep="\t")
111
112 return (self.x, self.y, self.damaged, self.prize), reward
13.2 Q Learning
To run the Q-learning demo, in folder “aipython”, load “rlQTest.py”,
and copy and paste the example queries at the bottom of that file. This
assumes Python 3.
rlQLearner.py — Q Learning
11 import random
12 from display import Displayable
13 from utilities import argmaxe, flip
14
15 class RL_agent(Displayable):
16 """An RL_Agent
17 has percepts (s, r) for some state s and real reward r
18 """
rlQLearner.py — (continued)
20 class Q_learner(RL_agent):
21 """A Q-learning agent has
22 belief-state consisting of
23 state is the previous state
24 q is a {(state,action):value} dict
25         visits is a {(state,action):n} dict. n is how many times action was done in state
26 acc_rewards is the accumulated reward
27
28 it observes (s, r) for some world-state s and real reward r
29 """
rlQLearner.py — (continued)
46 self.discount = discount
47 self.explore = explore
48 self.fixed_alpha = fixed_alpha
49 self.alpha = alpha
50 self.alpha_fun = alpha_fun
51 self.qinit = qinit
52 self.label = label
53 self.restart()
restart is used to make the learner relearn everything. This is used by the plotter to create new plots.
rlQLearner.py — (continued)
55 def restart(self):
56 """make the agent relearn, and reset the accumulated rewards
57 """
58 self.acc_rewards = 0
59 self.state = self.env.state
60 self.q = {}
61 self.visits = {}
do takes in the number of steps.
rlQLearner.py — (continued)
63 def do(self,num_steps=100):
64 """do num_steps of interaction with the environment"""
65 self.display(2,"s\ta\tr\ts'\tQ")
66 alpha = self.alpha
67 for i in range(num_steps):
68 action = self.select_action(self.state)
69 next_state,reward = self.env.do(action)
70 if not self.fixed_alpha:
71                 k = self.visits[(self.state, action)] = self.visits.get((self.state, action),0)+1
72 alpha = self.alpha_fun(k)
73 self.q[(self.state, action)] = (
74 (1-alpha) * self.q.get((self.state, action),self.qinit)
75 + alpha * (reward + self.discount
76                       * max(self.q.get((next_state, next_act),self.qinit)
77 for next_act in self.actions)))
78 self.display(2,self.state, action, reward, next_state,
79 self.q[(self.state, action)], sep='\t')
80 self.state = next_state
81 self.acc_rewards += reward
select_action is used to select the next action to perform. This can be reimplemented to give a different exploration strategy.
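For example, an alternative to ε-greedy is softmax (Boltzmann) exploration, sketched below assuming a q dictionary keyed by (state, action) pairs as above; softmax_select is a hypothetical helper, not part of rlQLearner.py.

```python
import math
import random

def softmax_select(actions, q, state, temperature=1.0, qinit=0.0):
    """choose an action with probability proportional to exp(Q/temperature).
    Lower temperatures act more greedily; higher ones explore more."""
    weights = [math.exp(q.get((state, a), qinit) / temperature) for a in actions]
    r = random.random() * sum(weights)
    cum = 0.0
    for a, w in zip(actions, weights):
        cum += w
        if r <= cum:
            return a
    return actions[-1]   # guard against floating-point rounding
```

A learner could call this in place of the ε-greedy choice; annealing the temperature over time shifts from exploration toward exploitation.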
rlQLearner.py — (continued)
rlQExperienceReplay.py — (continued)
83 # plot_rl(sag1ar,steps_explore=100000,steps_exploit=100000,label="AR alpha="+str(sag1ar.alpha))
84 sag2ar = Q_AR_learner(senv,0.9,explore=0.2,fixed_alpha=False)
85 # plot_rl(sag2ar,steps_explore=100000,steps_exploit=100000,label="AR alpha=1/k")
86 sag3ar = Q_AR_learner(senv,0.9,explore=0.2,fixed_alpha=False,alpha_fun=lambda k:10/(9+k))
87 # plot_rl(sag3ar,steps_explore=100000,steps_exploit=100000,label="AR alpha=10/(9+k)")
• q[s, a] is a dictionary that, given a (s, a) pair, returns the Q-value, the estimate of the future (discounted) value of being in state s and doing action a.
• r[s, a] is a dictionary that, given a (s, a) pair, returns the average reward from doing a in state s.
• visits[s, a] is a dictionary that, given a (s, a) pair, returns the number of times action a was carried out in state s.
• res_states[s, a] is a dictionary that, given a (s, a) pair, returns the list of resulting states that have occurred when action a was carried out in state s. This is used in the asynchronous value iteration to determine the s′ states to sum over.
• visits_list is a list of the (s, a) pairs that have been carried out. This is used to ensure there is no divide-by-zero in the asynchronous value iteration. Note that this could be constructed from r, visits, or res_states by enumerating the keys, but it needs to be a list for random.choice, and we don't want to keep recreating it.
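Putting these data structures together, a single asynchronous value-iteration update from the learned model can be sketched as follows (a standalone sketch; avi_update is a hypothetical name, and r is assumed to hold the average reward, as described above):

```python
def avi_update(q, r, visits, t, res_states, actions, discount, s, a, qinit=0.0):
    """one asynchronous VI update of q[(s,a)] from the learned model:
    Q(s,a) = r(s,a) + discount * sum_s' [t(s,a,s')/visits(s,a)] * max_a' Q(s',a')"""
    n = visits[(s, a)]   # nonzero when (s, a) is drawn from visits_list
    exp_future = sum(t[(s, a, s1)] / n
                     * max(q.get((s1, a1), qinit) for a1 in actions)
                     for s1 in res_states[(s, a)])
    q[(s, a)] = r[(s, a)] + discount * exp_future
```

Because (s, a) is drawn from visits_list, visits[(s, a)] is at least 1, which is exactly the divide-by-zero concern mentioned above.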
rlModelLearner.py — (continued)
39 def restart(self):
40 """make the agent relearn, and reset the accumulated rewards
41 """
42 self.acc_rewards = 0
43 self.state = self.env.state
44 self.q = {} # {(st,action):q_value} map
45 self.r = {} # {(st,action):reward} map
46 self.t = {} # {(st,action,st_next):count} map
47 self.visits = {} # {(st,action):count} map
48 self.res_states = {} # {(st,action):set_of_states} map
49 self.visits_list = [] # list of (st,action)
50 self.previous_action = None
rlModelLearner.py — (continued)
52 def do(self,num_steps=100):
53 """do num_steps of interaction with the environment
54         for each action, do updates_per_step iterations of asynchronous value iteration
55 """
56 for step in range(num_steps):
rlModelLearner.py — (continued)
Exercise 13.3 If there were only one update per step, the algorithm could be made simpler and use less space. Explain how. Does it make it more efficient? Is it worthwhile having more than one update per step for the games implemented here?
32 f7 = 1-f6
33 # f8: damaged and prize ahead
34 f8 = 1 if d and f3 else 0
35 # f9: not damaged and prize ahead
36 f9 = 1 if not d and f3 else 0
37 features = [1,f1,f2,f3,f4,f5,f6,f7,f8,f9]
38 # the next 20 features are for 5 prize locations
39 # and 4 distances from outside in all directions
40 for pr in Monster_game_env.prize_locs+[None]:
41 if p==pr:
42 features += [x, 4-x, y, 4-y]
43 else:
44 features += [0, 0, 0, 0]
45 # fp04 feature for y when prize is at 0,4
46 # this knows about the wall to the right of the prize
47 if p==(0,4):
48 if x==0:
49 fp04 = y
50 elif y<3:
51 fp04 = y
52 else:
53 fp04 = 4-y
54 else:
55 fp04 = 0
56 features.append(fp04)
57 return features
58
59 def monster_ahead(x,y,action):
60 """returns 1 if the location expected to get to by doing
61 action from (x,y) can contain a monster.
62 """
63 if action == "right" and (x+1,y) in Monster_game_env.monster_locs:
64 return 1
65 elif action == "left" and (x-1,y) in Monster_game_env.monster_locs:
66 return 1
67 elif action == "up" and (x,y+1) in Monster_game_env.monster_locs:
68 return 1
69 elif action == "down" and (x,y-1) in Monster_game_env.monster_locs:
70 return 1
71 else:
72 return 0
73
74 def wall_ahead(x,y,action):
75 """returns 1 if there is a wall in the direction of action from (x,y).
76 This is complicated by the internal walls.
77 """
78     if action == "right" and (x==Monster_game_env.xdim-1 or (x,y) in Monster_game_env.vwalls):
79 return 1
80 elif action == "left" and (x==0 or (x-1,y) in Monster_game_env.vwalls):
81 return 1
82 elif action == "up" and y==Monster_game_env.ydim-1:
83 return 1
84 elif action == "down" and y==0:
85 return 1
86 else:
87 return 0
88
89 def towards_prize(x,y,action,p):
90 """action goes in the direction of the prize from (x,y)"""
91 if p is None:
92 return 0
93 elif p==(0,4): # take into account the wall near the top-left prize
94 if action == "left" and (x>1 or x==1 and y<3):
95 return 1
96 elif action == "down" and (x>0 and y>2):
97 return 1
98 elif action == "up" and (x==0 or y<2):
99 return 1
100 else:
101 return 0
102 else:
103 px,py = p
104 if p==(4,4) and x==0:
105             if (action=="right" and y<3) or (action=="down" and y>2) or (action=="up" and y<2):
106 return 1
107 else:
108 return 0
109 if (action == "up" and y<py) or (action == "down" and py<y):
110 return 1
111 elif (action == "left" and px<x) or (action == "right" and x<px):
112 return 1
113 else:
114 return 0
115
116 def towards_repair(x,y,action):
117 """returns 1 if action is towards the repair station.
118 """
119 if action == "up" and (x>0 and y<4 or x==0 and y<2):
120 return 1
121 elif action == "left" and x>1:
122 return 1
123 elif action == "right" and x==0 and y<3:
124 return 1
125 elif action == "down" and x==0 and y>2:
126 return 1
127 else:
128 return 0
129
38 self.get_features = get_features
39 self.actions = env.actions
40 self.discount = discount
41 self.explore = explore
42 self.step_size = step_size
43 self.winit = winit
44 self.label = label
45 self.restart()
restart() is used to make the learner relearn everything. This is used by the
plotter to create new plots.
rlFeatures.py — (continued)
47 def restart(self):
48 """make the agent relearn, and reset the accumulated rewards
49 """
50 self.acc_rewards = 0
51 self.state = self.env.state
52         self.features = self.get_features(self.state, list(self.env.actions)[0])
53 self.weights = [self.winit for f in self.features]
54 self.action = self.select_action(self.state)
do takes in the number of steps.
rlFeatures.py — (continued)
56 def do(self,num_steps=100):
57 """do num_steps of interaction with the environment"""
58 self.display(2,"s\ta\tr\ts'\tQ\tdelta")
59 for i in range(num_steps):
60 next_state,reward = self.env.do(self.action)
61 self.acc_rewards += reward
62 next_action = self.select_action(next_state)
63 feature_values = self.get_features(self.state,self.action)
64 oldQ = dot_product(self.weights, feature_values)
65             nextQ = dot_product(self.weights, self.get_features(next_state,next_action))
66 delta = reward + self.discount * nextQ - oldQ
67 for i in range(len(self.weights)):
68 self.weights[i] += self.step_size * delta * feature_values[i]
69 self.display(2,self.state, self.action, reward, next_state,
70                          dot_product(self.weights, feature_values), delta, sep='\t')
71 self.state = next_state
72 self.action = next_action
73
74 def select_action(self, state):
75 """returns an action to carry out for the current agent
76 given the state, and the q-function.
77 This implements an epsilon-greedy approach
78 where self.explore is the probability of exploring.
79 """
80 if flip(self.explore):
81 return random.choice(self.actions)
82 else:
83 return argmaxe((next_act, dot_product(self.weights,
84 self.get_features(state,next_act)))
85 for next_act in self.actions)
86
87 def show_actions(self,state=None):
88 """prints the value for each action in a state.
89 This may be useful for debugging.
90 """
91 if state is None:
92 state = self.state
93 for next_act in self.actions:
94             print(next_act,dot_product(self.weights, self.get_features(state,next_act)))
95
96 def dot_product(l1,l2):
97 return sum(e1*e2 for (e1,e2) in zip(l1,l2))
Test code:
rlFeatures.py — (continued)
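The SARSA LFA weight update in do can be traced by hand on a tiny example (values chosen arbitrarily; dot_product is repeated here so the snippet runs on its own):

```python
def dot_product(l1, l2):
    # repeated from rlFeatures.py so this example is self-contained
    return sum(e1 * e2 for (e1, e2) in zip(l1, l2))

weights = [0.0, 0.0]
features = [1.0, 0.5]        # feature values of (s, a)
next_features = [1.0, 0.0]   # feature values of (s', a')
reward, discount, step_size = 1.0, 0.9, 0.1

oldQ = dot_product(weights, features)        # current estimate of Q(s,a)
nextQ = dot_product(weights, next_features)  # estimate of Q(s',a')
delta = reward + discount * nextQ - oldQ     # TD error: 1.0 here
weights = [w + step_size * delta * f for (w, f) in zip(weights, features)]
# each weight moves in proportion to its feature value
```

Tracing a few such updates by hand is a useful sanity check before experimenting with step sizes in the exercises below.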
Exercise 13.6 How does the step size affect performance? Try different step sizes (e.g., 0.1, 0.001, and other sizes in between). Explain the behaviour you observe. Which step size works best for this example? Explain what evidence you are basing your conclusion on.
Exercise 13.7 Does having extra features always help? Does it sometimes help? Does whether it helps depend on the step size? Give evidence for your claims.
Exercise 13.8 For each of the following, first predict, then plot, then explain the behaviour you observed:
(a) SARSA LFA, Model-based learning (with 1 update per step) and Q-learning
for 10,000 steps 20% exploring followed by 10,000 steps 100% exploiting
(b) SARSA LFA, model-based learning and Q-learning for
i) 100,000 steps 20% exploring followed by 100,000 steps 100% exploit
ii) 10,000 steps 20% exploring followed by 190,000 steps 100% exploit
(c) Suppose your goal was to have the best accumulated reward after 200,000 steps. You are allowed to change the exploration rate at a fixed number of steps. For each of the methods, what is the best point at which to start exploiting more? Which method is better? What if you wanted to have the best reward after 10,000 or 1,000 steps?
Based on this evidence, explain when it is preferable to use SARSA LFA, the model-based learner, or Q-learning.
Important: you need to run each algorithm more than once. Your explanation
should include the variability as well as the typical behavior.
41 next_state,reward = self.env.do(self.action)
42             self.add_to_buffer((self.state,self.action,reward,next_state)) # remember experience
43 self.acc_rewards += reward
44 next_action = self.select_action(next_state)
45 feature_values = self.get_features(self.state,self.action)
46 oldQ = dot_product(self.weights, feature_values)
47             nextQ = dot_product(self.weights, self.get_features(next_state,next_action))
48 delta = reward + self.discount * nextQ - oldQ
49 for i in range(len(self.weights)):
50 self.weights[i] += self.step_size * delta * feature_values[i]
51 self.display(2,self.state, self.action, reward, next_state,
52                          dot_product(self.weights, feature_values), delta, sep='\t')
53 self.state = next_state
54 self.action = next_action
55 if self.number_added > self.burn_in:
56 for i in range(self.num_updates_per_action):
57                     (s,a,r,ns) = self.action_buffer[random.randrange(min(self.number_added,
58                                                             self.max_buffer_size))]
59 na = self.select_action(ns)
60 feature_values = self.get_features(s,a)
61 oldQ = dot_product(self.weights, feature_values)
62 nextQ = dot_product(self.weights, self.get_features(ns,na))
63                     delta = r + self.discount * nextQ - oldQ  # use the replayed reward r
64 for i in range(len(self.weights)):
65                         self.weights[i] += self.step_size * delta * feature_values[i]
Test code:
rlLinExperienceReplay.py — (continued)
Multiagent Systems
14.1 Minimax
Here we consider two-player zero-sum games, where one player wins only when the other loses. This can be modeled with a single utility that one agent (the maximizing agent) is trying to maximize and the other agent (the minimizing agent) is trying to minimize.
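Before looking at the Node class below, the core minimax computation can be sketched on bare nested lists (a standalone sketch, not the book's interface):

```python
def minimax_value(tree, is_max=True):
    """tree is either a number (a leaf value) or a list of subtrees;
    levels alternate between the maximizing and minimizing players"""
    if isinstance(tree, (int, float)):
        return tree
    vals = [minimax_value(sub, not is_max) for sub in tree]
    return max(vals) if is_max else min(vals)
```

For example, minimax_value([[7, 9], [6, 8]]) is 7: the minimizer holds each pair to its smaller value, and the maximizer picks the larger of 7 and 6.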
29
30 def children(self):
31 """returns the list of all children."""
32 return self.allchildren
33
34 def evaluate(self):
35 """returns the evaluation for this node if it is a leaf"""
36 return self.value
The following gives the tree from Figure 11.5 of the book. Note how 888 is used
as a value here, but never appears in the trace.
masProblem.py — (continued)
38 fig10_5 = Node("a",True,None, [
39 Node("b",False,None, [
40 Node("d",True,None, [
41 Node("h",False,None, [
42 Node("h1",True,7,None),
43 Node("h2",True,9,None)]),
44 Node("i",False,None, [
45 Node("i1",True,6,None),
46 Node("i2",True,888,None)])]),
47 Node("e",True,None, [
48 Node("j",False,None, [
49 Node("j1",True,11,None),
50 Node("j2",True,12,None)]),
51 Node("k",False,None, [
52 Node("k1",True,888,None),
53 Node("k2",True,888,None)])])]),
54 Node("c",False,None, [
55 Node("f",True,None, [
56 Node("l",False,None, [
57 Node("l1",True,5,None),
58 Node("l2",True,888,None)]),
59 Node("m",False,None, [
60 Node("m1",True,4,None),
61 Node("m2",True,888,None)])]),
62 Node("g",True,None, [
63 Node("n",False,None, [
64 Node("n1",True,888,None),
65 Node("n2",True,888,None)]),
66 Node("o",False,None, [
67 Node("o1",True,888,None),
68 Node("o2",True,888,None)])])])])
The magic-sum game is played on the following magic square, where every row, column, and diagonal sums to 15:

6 1 8
7 5 3
2 9 4
70
71 class Magic_sum(Node):
72 def __init__(self, xmove=True, last_move=None,
73 available=[1,2,3,4,5,6,7,8,9], x=[], o=[]):
74 """This is a node in the search for the magic-sum game.
75 xmove is True if the next move belongs to X.
76 last_move is the number selected in the last move
77 available is the list of numbers that are available to be chosen
78 x is the list of numbers already chosen by x
79 o is the list of numbers already chosen by o
80 """
81 self.isMax = self.xmove = xmove
82 self.last_move = last_move
83 self.available = available
84 self.x = x
85 self.o = o
86 self.allchildren = None #computed on demand
87 lm = str(last_move)
88         self.name = "start" if not last_move else "o="+lm if xmove else "x="+lm
89
90 def children(self):
91 if self.allchildren is None:
92 if self.xmove:
93 self.allchildren = [
94 Magic_sum(xmove = not self.xmove,
95 last_move = sel,
96                           available = [e for e in self.available if e != sel],
97 x = self.x+[sel],
98 o = self.o)
99 for sel in self.available]
100 else:
101 self.allchildren = [
102 Magic_sum(xmove = not self.xmove,
103 last_move = sel,
104                           available = [e for e in self.available if e != sel],
105 x = self.x,
106 o = self.o+[sel])
107 for sel in self.available]
108 return self.allchildren
109
110 def isLeaf(self):
111         """A leaf has no numbers available or is a win for one of the players.
112 We only need to check for a win for o if it is currently x's turn,
113 and only check for a win for x if it is o's turn (otherwise it would
114 have been a win earlier).
115 """
116 return (self.available == [] or
117 (sum_to_15(self.last_move,self.o)
118 if self.xmove
119 else sum_to_15(self.last_move,self.x)))
120
121 def evaluate(self):
122 if self.xmove and sum_to_15(self.last_move,self.o):
123 return -1
124 elif not self.xmove and sum_to_15(self.last_move,self.x):
125 return 1
126 else:
127 return 0
128
129 def sum_to_15(last,selected):
130 """is true if last, toegether with two other elements of selected sum
to 15.
131 """
132 return any(last+a+b == 15
133 for a in selected if a != last
134 for b in selected if b != last and b != a)
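As a quick sanity check (not part of the original listing), sum_to_15 can be exercised on its own; a player wins exactly when three of their numbers sum to 15:

```python
# Standalone check of the sum_to_15 logic (restated so it runs on its own).
def sum_to_15(last, selected):
    """is true if last, together with two other elements of selected, sums to 15."""
    return any(last + a + b == 15
               for a in selected if a != last
               for b in selected if b != last and b != a)

# 8 + 5 + 2 == 15, so a player holding [8, 5] wins by picking 2:
print(sum_to_15(2, [8, 5, 2]))   # True
# 4, 1, 2 contain no triple summing to 15 with last=4:
print(sum_to_15(4, [1, 2, 4]))   # False
```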
24 return max_score,max_path
25 else:
26 min_score = float("inf")
27 min_path = None
28 for C in node.children():
29 score,path = minimax(C,depth+1)
30 if score < min_score:
31 min_score = score
32 min_path = C.name,path
33 return min_score,min_path
The following is a depth-first minimax with α-β pruning. It returns the
value for a node as well as a best path for the agents.
masMiniMax.py — (continued)
35 def minimax_alpha_beta(node,alpha,beta,depth=0):
36 """node is a Node, alpha and beta are cutoffs, depth is the depth
37 returns value, path
38 where path is a sequence of nodes that results in the value
39 """
40 node.display(2," "*depth,"minimax_alpha_beta(",node.name,", ",alpha, ", ", beta,")")
41 best=None # only used if it will be pruned
42 if node.isLeaf():
43 node.display(2," "*depth,"returning leaf value",node.evaluate())
44 return node.evaluate(),None
45 elif node.isMax:
46 for C in node.children():
47 score,path = minimax_alpha_beta(C,alpha,beta,depth+1)
48 if score >= beta: # beta pruning
49 node.display(2," "*depth,"pruned due to beta=",beta,"C=",C.name)
50 return score, None
51 if score > alpha:
52 alpha = score
53 best = C.name, path
54 node.display(2," "*depth,"returning max alpha",alpha,"best",best)
55 return alpha,best
56 else:
57 for C in node.children():
58 score,path = minimax_alpha_beta(C,alpha,beta,depth+1)
59 if score <= alpha: # alpha pruning
60 node.display(2," "*depth,"pruned due to alpha=",alpha,"C=",C.name)
61 return score, None
62 if score < beta:
63 beta=score
64 best = C.name,path
65 node.display(2," "*depth,"returning min beta",beta,"best=",best)
66 return beta,best
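To watch the pruning outside the Node framework, the same alpha-beta logic can be applied to a small hand-built tree. The tuple representation and the name alpha_beta below are illustrative, not from masMiniMax.py:

```python
# Minimal alpha-beta sketch on a hand-built tree.
# A tree is (name, value) for a leaf, or (name, [children]) for an internal
# node; levels alternate max/min, starting with max at the root.
def alpha_beta(tree, alpha=float("-inf"), beta=float("inf"), is_max=True):
    name, rest = tree
    if isinstance(rest, (int, float)):       # leaf: rest is its evaluation
        return rest
    if is_max:
        for child in rest:
            alpha = max(alpha, alpha_beta(child, alpha, beta, False))
            if alpha >= beta:
                break                        # beta cutoff
        return alpha
    else:
        for child in rest:
            beta = min(beta, alpha_beta(child, alpha, beta, True))
            if beta <= alpha:
                break                        # alpha cutoff
        return beta

tree = ("a", [("b", [("d", 3), ("e", 5)]),
              ("c", [("f", 2), ("g", 9)])])
print(alpha_beta(tree))   # 3: once f=2 <= alpha=3, leaf g is never evaluated
```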
Testing:
masMiniMax.py — (continued)
28
29 def init_action(self):
30 """ The initial action.
31 Act randomly initially
32 Could be overridden (but I'm not sure why you would).
33 """
34 self.act = random.choice(self.actions)
35 self.dist[self.act] += 1
36 return self.act
37
38 def select_action(self, reward):
39 """
40 Select the action given the reward.
41 This implements "Act randomly" and should be overridden!
42 """
43 self.total_score += reward
44 self.act = random.choice(self.actions)
45 self.dist[self.act] += 1
46 return self.act
masLearn.py — (continued)
48 class SimpleQAgent(GameAgent):
49 """This agent just counts the number of times (it thinks) it has won
and does the
50 actions it thinks is most likely to win.
51 """
52 def __init__(self, actions, alpha=0.1, q_init=1, explore=0.01):
53 """
54 Actions is the set of actions the agent can do.
55 alpha is the Q step size
56 q_init is the initial q-values
57 explore is the probability of an exploratory (random) action
58 """
59 GameAgent.__init__(self, actions)
60 self.Q = {a:q_init for a in self.actions}
61 self.dist = {act:1 for act in actions} # unnormalized distribution
62 self.num_steps = 0
63 self.alpha = alpha
64 self.explore = explore
65
66 def select_action(self, reward):
67 self.total_score += reward
68 self.num_steps += 1
69 self.display(2,f"The reward for agent {self.id} was {reward}")
70 self.Q[self.act] += self.alpha*(reward-self.Q[self.act])
71 if random.random() < self.explore:
72 self.act = random.choice(self.actions) # act randomly
73 else:
74 self.act = utilities.argmaxd(self.Q)
75 self.dist[self.act] += 1
76 return self.act
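The update Q[act] += alpha*(reward - Q[act]) keeps an exponentially weighted average of past rewards: each step the old estimate decays by a factor of (1-alpha). A standalone illustration (not from masLearn.py):

```python
# Exponential moving average: the Q update used by SimpleQAgent.
alpha = 0.1
q = 1.0                       # q_init: optimistic initial value
for reward in [0.0] * 100:    # a stream of zero rewards
    q += alpha * (reward - q) # old estimate shrinks by (1 - alpha) per step
print(round(q, 4))            # 0.0: the initial optimism has decayed away
```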
masLearn.py — (continued)
79 class StochasticQAgent(GameAgent):
80 """This agent maintains the Q-function for each state.
81 (Or just the average reward, as the future state is always the same.)
82 Chooses actions stochastically, in proportion to how often each has been best.
83 """
84 def __init__(self, actions, alpha=0.1, q_init=10, p_init=5):
85 """
86 Actions is the set of actions the agent can do.
87 alpha is the Q step size
88 q_init is the initial q-values
89 p_init is the initial counts for q
90 """
91 GameAgent.__init__(self, actions)
92 self.Q = {a:q_init for a in self.actions}
93 self.dist = {a:p_init for a in self.actions} # start with random dist
94 self.alpha = alpha
95 self.num_steps = 0
96
97 def select_action(self, reward):
98 self.total_score += reward
99 self.display(2,f"The reward for agent {self.id} was {reward}")
100 self.Q[self.act] += self.alpha*(reward-self.Q[self.act])
101 a_best = utilities.argmaxall(self.Q.items())
102 for a in a_best:
103 self.dist[a] += 1
104 self.display(2,f"Distribution for agent {self.id} is {self.dist}")
105 self.act = select_from_dist(self.dist)
106 self.display(2,f"Agent {self.id} did {self.act}")
107 return self.act
108
109 def normalize(dist):
110 """unnorm dict is a {value:number} dictionary, where the numbers are
all non-negative
111 returns dict where the numbers sum to one
112 """
113 tot = sum(dist.values())
114 return {var:val/tot for (var,val) in dist.items()}
115
116 def select_from_dist(dist):
117 rand = random.random()
118 for (act,prob) in normalize(dist).items():
119 rand -= prob
120 if rand < 0:
121 return act
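A quick standalone check of these two helpers (restated here so the sketch runs on its own): normalize turns counts into probabilities, and select_from_dist samples an action in proportion to its count:

```python
import random
from collections import Counter

def normalize(dist):
    """dist is a {value: count} dictionary of non-negative numbers;
    returns a dictionary where the numbers sum to one."""
    tot = sum(dist.values())
    return {var: val / tot for (var, val) in dist.items()}

def select_from_dist(dist):
    # roulette-wheel selection over the normalized distribution
    rand = random.random()
    for (act, prob) in normalize(dist).items():
        rand -= prob
        if rand < 0:
            return act

print(normalize({"a": 3, "b": 1}))   # {'a': 0.75, 'b': 0.25}
counts = Counter(select_from_dist({"a": 3, "b": 1}) for _ in range(10000))
print(counts["a"] > counts["b"])     # True (with overwhelming probability)
```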
The simulator takes a game and simulates the game:
masLearn.py — (continued)
154
155 def plot_dynamics(self, x_action=0, y_action=0):
156 #plt.ion() # make it interactive
157 agents = self.agents
158 x_act = self.game.actions[0][x_action]
159 y_act = self.game.actions[1][y_action]
160 plt.xlabel(f"Probability {self.game.players[0]} {self.agents[0].actions[x_action]}")
161 plt.ylabel(f"Probability {self.game.players[1]} {self.agents[1].actions[y_action]}")
162 plt.plot([self.dist_history[t][0][x_act] for t in range(len(self.dist_history))],
163 [self.dist_history[t][1][y_act] for t in range(len(self.dist_history))],
164 color='k')
165 #plt.legend()
166 plt.savefig('soccerplot.pdf')
167 plt.show()
The following are some games from Poole and Mackworth [2017].
masLearn.py — (continued)
252 """
253 def __init__(self, actions, alpha=0.1, q_init=10, p_init=50,
beta=0.001):
254 """
255 Actions is the set of actions the agent can do.
256 alpha is the Q step size
257 q_init is the initial q-values
258 p_init is the initial counts for q
259 beta is the discount for older probabilities
260 """
261 GameAgent.__init__(self, actions)
262 self.Q = {a:q_init for a in self.actions}
263 self.dist = {a:p_init for a in self.actions} # start with random dist
264 self.alpha = alpha
265 self.beta = beta
266 self.num_steps = 0
267
268 def select_action(self, reward):
269 self.total_score += reward
270 self.display(2,f"The reward for agent {self.id} was {reward}")
271 self.Q[self.act] += self.alpha*(reward-self.Q[self.act])
272 a_best = utilities.argmaxall(self.Q.items())
273 for a in self.Q.keys():
274 self.dist[a] *= (1-self.beta)
275 for a in a_best:
276 self.dist[a] += 1
277 self.display(2,f"Distribution for agent {self.id} is {self.dist}")
278 self.act = select_from_dist(self.dist)
279 self.display(2,f"Agent {self.id} did {self.act}")
280 return self.act
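Decaying every count by (1-beta) before reinforcing the current best actions makes dist an exponentially discounted count of how often each action has recently been best, so the agent can track a drifting opponent. A tiny standalone illustration (not from masLearn.py):

```python
# Exponentially discounted counts: the dist update used above (a sketch).
beta = 0.001 * 100            # exaggerated discount so the effect shows quickly
dist = {"a": 5, "b": 5}       # p_init gives both actions equal initial weight
for _ in range(50):           # suppose action "a" keeps looking best
    for act in dist:
        dist[act] *= (1 - beta)   # old evidence fades geometrically
    dist["a"] += 1                # current best action is reinforced
print(dist["a"] > dist["b"])      # True: probability mass shifts toward "a"
```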
15. Relational Learning
27 ):
28 self.rating_set = rating_set
29 self.ratings = rating_subset or rating_set.training_ratings # whichever is not empty
30 if test_subset is None:
31 self.test_ratings = self.rating_set.test_ratings
32 else:
33 self.test_ratings = test_subset
34 self.step_size = step_size
35 self.reglz = reglz
36 self.num_properties = num_properties
37 self.num_ratings = len(self.ratings)
38 self.ave_rating = (sum(r for (u,i,r,t) in self.ratings)
39 /self.num_ratings)
40 self.users = {u for (u,i,r,t) in self.ratings}
41 self.items = {i for (u,i,r,t) in self.ratings}
42 self.user_bias = {u:0 for u in self.users}
43 self.item_bias = {i:0 for i in self.items}
44 self.user_prop = {u:[random.uniform(-property_range,property_range)
45 for p in range(num_properties)]
46 for u in self.users}
47 self.item_prop = {i:[random.uniform(-property_range,property_range)
48 for p in range(num_properties)]
49 for i in self.items}
50 self.zeros = [0 for p in range(num_properties)]
51 self.iter=0
52
53 def stats(self):
54 self.display(1,"ave sumsq error of mean for training=",
55 sum((self.ave_rating-rating)**2 for (user,item,rating,timestamp)
56 in self.ratings)/len(self.ratings))
57 self.display(1,"ave sumsq error of mean for test=",
58 sum((self.ave_rating-rating)**2 for (user,item,rating,timestamp)
59 in self.test_ratings)/len(self.test_ratings))
60 self.display(1,"error on training set",
61 self.evaluate(self.ratings))
62 self.display(1,"error on test set",
63 self.evaluate(self.test_ratings))
relnCollFilt.py — (continued)
65 def prediction(self,user,item):
66 """Returns prediction for this user on this item.
67 The use of .get() is to handle users or items not in the training set.
68 """
69 return (self.ave_rating
70 + self.user_bias.get(user,0) #self.user_bias[user]
71 + self.item_bias.get(item,0) #self.item_bias[item]
72 + sum([self.user_prop.get(user,self.zeros)[p]*self.item_prop.get(item,self.zeros)[p]
73 for p in range(self.num_properties)]))
74
75 def learn(self, num_iter = 50):
76 """ do num_iter iterations of gradient descent."""
77 for i in range(num_iter):
78 self.iter += 1
79 abs_error=0
80 sumsq_error=0
81 for (user,item,rating,timestamp) in random.sample(self.ratings,len(self.ratings)):
82 error = self.prediction(user,item) - rating
83 abs_error += abs(error)
84 sumsq_error += error * error
85 self.user_bias[user] -= self.step_size*error
86 self.item_bias[item] -= self.step_size*error
87 for p in range(self.num_properties):
88 self.user_prop[user][p] -= self.step_size*error*self.item_prop[item][p]
89 self.item_prop[item][p] -= self.step_size*error*self.user_prop[user][p]
90 for user in self.users:
91 self.user_bias[user] -= self.step_size*self.reglz*self.user_bias[user]
92 for p in range(self.num_properties):
93 self.user_prop[user][p] -= self.step_size*self.reglz*self.user_prop[user][p]
94 for item in self.items:
95 self.item_bias[item] -= self.step_size*self.reglz*self.item_bias[item]
96 for p in range(self.num_properties):
97 self.item_prop[item][p] -= self.step_size*self.reglz*self.item_prop[item][p]
98 self.display(1,"Iteration",self.iter,
99 "(Ave Abs,AveSumSq) training =",self.evaluate(self.ratings),
100 "test =",self.evaluate(self.test_ratings))
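The gradient steps in learn() fit the standard biased matrix-factorization predictor: predicted rating = average rating + user bias + item bias + dot product of the user and item property vectors. The following minimal self-contained sketch of one stochastic-gradient step mirrors those updates; the names predict and sgd_step are illustrative, not from relnCollFilt.py:

```python
# One SGD step for biased matrix factorization (a sketch of the updates above).
def predict(mu, b_u, b_i, p_u, p_i):
    """mu: global average; b_u, b_i: user/item biases; p_u, p_i: property vectors."""
    return mu + b_u + b_i + sum(pu * pi for pu, pi in zip(p_u, p_i))

def sgd_step(rating, mu, b_u, b_i, p_u, p_i, step=0.01):
    error = predict(mu, b_u, b_i, p_u, p_i) - rating
    b_u -= step * error
    b_i -= step * error
    # both property updates here use the pre-step values of the other factor
    p_u, p_i = ([pu - step * error * pi for pu, pi in zip(p_u, p_i)],
                [pi - step * error * pu for pu, pi in zip(p_u, p_i)])
    return b_u, b_i, p_u, p_i

mu = 3.5
b_u, b_i, p_u, p_i = 0.0, 0.0, [0.1], [0.2]
before = abs(predict(mu, b_u, b_i, p_u, p_i) - 5.0)
b_u, b_i, p_u, p_i = sgd_step(5.0, mu, b_u, b_i, p_u, p_i)
after = abs(predict(mu, b_u, b_i, p_u, p_i) - 5.0)
print(after < before)   # True: the step reduces the error on this rating
```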
relnCollFilt.py — (continued)
15.1.2 Plotting
relnCollFilt.py — (continued)
This plots a single property. Each (user, item, rating) is plotted where the
x-value is the value of the property for the user, the y-value is the value of the
property for the item, and the rating is plotted at this (x, y) position. That is,
rating is plotted at the (x, y) position (p(user), p(item)).
relnCollFilt.py — (continued)
Version History
• 2022-08-13 Version 0.9.5 major revisions including extra code for causality
and deep learning
• 2021-07-08 Version 0.9.1 updated the CSP code to have the same repre-
sentation of variables as used by the probability code
• 2020-10-20 Version 0.8.4 planning simplified, and gives error if goal not
part of state (by design). Fixed arc costs.
Bibliography
Dua, D. and Graff, C. (2017), UCI machine learning repository. URL http://archive.ics.uci.edu/ml. 127
Index
uncertainty, 179
unit test, 19, 46, 65, 93, 94, 96
unrolling
DBN, 225
updatable priority queue, 81
utility, 245
utility table, 245
XGBoost, 162
yield, 14