CP164 ExamNotes
Our stacks, queues, and priority queues have been implemented with the Python list that allows us to access individual elements with an index value. However,
these data structures can also be built upon linked data elements rather than indexed data elements. We can add and delete these linked data elements as required
and so replicate the methods of the array-based data structures. We will also discover that implementing non-linear data structures, such as trees, is actually easier
to do with a linked structure than with an array-based structure.
Linked data structures are based upon nodes. A node typically consists of two parts:
1. a data part, which may contain any kind of value
2. at least one pointer, or link, part, which links to another node
A data structure is built by linking together a series of these nodes. How they are linked together determines the type of data
structure that is built. We will look at a linked Stack data structure in order to understand how such a structure is built.
A node contains a single value and a link to the next node.
The _Stack_Node Class
A stack node is very simple: it needs only a _value attribute and a _next attribute. A node is created only when there is a value to add to the stack. There is no use
for an empty node. The _Stack_Node constructor takes care of that. Other than the constructor, the _Stack_Node class needs no methods as the Stack class takes
care of manipulating the nodes. Thus the _Stack_Node class code is very simple:
Note that there are no special cases to consider when pushing a new
value onto the stack. The new node always becomes the top node
whether the stack is empty or not.
Pop
Popping an item from a linked stack means extracting the value from the top
node then removing that node. The stack's _top pointer must now point to the
next node in the stack. This is easy: get the current top node's link to its next
node by referring to its _next attribute, then make that node the new top.
This is the reverse of push.
Is there a special case for when the last node is removed? No.
The _top attribute is set to the value of the _next attribute of the last node in
the stack, which is None, and the stack becomes empty again.
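Push and pop can be sketched together as follows. This is an illustration under assumptions, not the course's Stack code. In particular, the `is_empty` method and the choice to treat a pop from an empty stack as an assertion failure are our own.

```python
class _Stack_Node:
    def __init__(self, value, next_):
        self._value = value
        self._next = next_


class Stack:
    def __init__(self):
        # An empty stack has no top node
        self._top = None

    def is_empty(self):
        return self._top is None

    def push(self, value):
        # The new node always becomes the top: no special case for an empty stack
        self._top = _Stack_Node(value, self._top)

    def pop(self):
        # Assumption: popping an empty stack is a caller error
        assert self._top is not None, "Cannot pop from an empty stack"
        value = self._top._value
        # The next node becomes the new top; after the last node this is None
        self._top = self._top._next
        return value
```

Notice that neither method needs an `if` for the empty or one-node case. That is the "no special case" property described above.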
MOVING NODES
Combine
You were asked to write two different ways to combine two stacks into one. One version required a function that used the ADT methods to combine the stacks.
Because the ADT methods look identical from the outside for both the array-based and linked implementations of the stack, that version could use the linked
stack representation without change - you would simply import the linked implementation of the stack rather than the array-based implementation.
The second approach involved extending the ADT by adding a method that worked at the class level to combine two stacks into one. Although the method
signatures (the def lines of the methods) are identical, the implementation details of the two stacks are radically different. However, the general algorithm is not.
It requires looping through the two input stacks moving data to the target stack until one or the other of the input stacks is exhausted. Then the algorithm loops
through one of the remaining input stacks (if any) moving data to the target stack until the remaining input stack is exhausted:
The key difference between the two approaches is that in the array-based approach the actual data is moved around. In the linked version it
is the node connections that are changed and the data is left where it is.
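temp = source._top               # the node being moved
source._top = source._top._next  # source's next node becomes its new top
temp._next = self._top           # the moved node links to the target's old top
self._top = temp                 # the moved node becomes the target's new top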
Once the top is moved, we no longer care what happens to temp. In the combine algorithm, it is used repeatedly to move nodes, and when the method ends, it disappears,
leaving the combined stack behind.
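You can sketch the whole combine method by wrapping the four node-moving lines above in a helper and calling it from the loops described earlier. The sketch assumes the target stack starts empty and that the two sources are visited alternately. The helper names `_move_top_to_top` and `to_list` are hypothetical.

```python
class _Stack_Node:
    def __init__(self, value, next_):
        self._value = value
        self._next = next_


class Stack:
    def __init__(self):
        self._top = None

    def push(self, value):
        self._top = _Stack_Node(value, self._top)

    def _move_top_to_top(self, source):
        # Relink source's top node onto the top of self; no data is copied
        temp = source._top
        source._top = source._top._next
        temp._next = self._top
        self._top = temp

    def combine(self, source1, source2):
        # Alternate between the sources until one of them is exhausted
        while source1._top is not None and source2._top is not None:
            self._move_top_to_top(source1)
            self._move_top_to_top(source2)
        # Move whatever remains in either source
        while source1._top is not None:
            self._move_top_to_top(source1)
        while source2._top is not None:
            self._move_top_to_top(source2)

    def to_list(self):
        # Inspection helper: values from top to bottom
        values = []
        current = self._top
        while current is not None:
            values.append(current._value)
            current = current._next
        return values
```

For example, with `source1` holding 1, 2, 3 (3 on top) and `source2` holding 4, 5 (5 on top), the combined stack reads `[1, 4, 2, 5, 3]` from top to bottom, and both sources end up empty.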
Once the proper location has been found, you can then create the new node (you can't
create it before you know what its _next should be), and adjust the pointer on the
previous node to point to it, and the new node is now inserted into the proper place in
the sorted list:
The key to understanding this process is to understand that without keeping track of
the node before the location of the new node (i.e. previous), you will lose the
connections between the nodes.
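Inserting into a sorted linked list while tracking `previous` might look like this. The class names `_List_Node` and `Sorted_List` and the `to_list` helper are hypothetical. Placing a new value before any existing equal values is an arbitrary choice.

```python
class _List_Node:
    def __init__(self, value, next_):
        self._value = value
        self._next = next_


class Sorted_List:
    def __init__(self):
        self._front = None
        self._count = 0

    def insert(self, value):
        # Walk the list, remembering the node before the insertion point
        previous = None
        current = self._front
        while current is not None and current._value < value:
            previous = current
            current = current._next
        # Only now do we know what the new node's _next should be
        node = _List_Node(value, current)
        if previous is None:
            # Inserting at the front (this also covers an empty list)
            self._front = node
        else:
            previous._next = node
        self._count += 1

    def to_list(self):
        values = []
        current = self._front
        while current is not None:
            values.append(current._value)
            current = current._next
        return values
```

Without `previous`, the loop would stop at `current`, and the node before it could no longer be reached to relink it.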
These side-by-side code examples compare array-based code (on the left/top) against
linked code (on the right/bottom) to demonstrate that despite the differences in
details, the algorithms are often very similar. The details are, of course, absolutely
key, so you have to clearly understand those differences, but the algorithms are often
- not always, but often - the same.
LINKED RECURSION
Printing Contents of a Linked Structure
Printing the contents of a linked List from front to rear is very simple. Loop
through the nodes, following the links to the next node until None is reached:
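def print_i(self):
    # Walk the nodes from front to rear, printing each value
    current = self._front
    while current is not None:
        print(current._value)
        current = current._next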
The algorithm is simple, and one we shall see again and again when processing linked structures: assign a current variable to the front of the List and move
through the List node by node, updating current to the next node until it runs off the end of the links, i.e. until current becomes None.
What if we want to print the contents of the List from rear to front? (Call the method print_r.) Then we have a problem. The links go only one way:
Looping through the linked List from rear to front cannot be done simply. Because the nodes used are singly-linked, each node can point to only one other node.
For inserting and removing purposes, that node has to be the next node in the data structure. The node has no way to get back to the node before it. How, then, can
we print the contents of this List from rear to front?
In this example the function string_reverse works directly with a simple data structure, in this case a string. The function calls itself directly and passes the same
data type to its subcall. Problems arise, however, with more complex data structures such as Lists. The actual data is stored within a more complex structure.
Unfortunately, this outer 'layer' of the data structure is not amenable to being used recursively because, unlike the strings and lists of our previous examples, it is
made up of layers of elements. Our linked version of a List has this structure:
We wish to work recursively with the _front portion of the List definition - the part where the nodes begin - not the entire List.
Our recursive solution would look something like this:
This function works by recursively moving through the List node by node until it passes the last
node. Each call prints its own node's value only after its recursive call returns, so the values are
printed as the calls unwind, from the last node back to the first.
The problem is that the call to some_func requires a node as a parameter. However, the call
to print_r has no parameters other than self. Adding a node parameter to print_r at the top level violates the ADT and means that the function so written would
not work with a different implementation of the ADT. The array-based version of print_r doesn't require extra parameters.
The solution is to use an auxiliary function that works with the node portion of the List only. Although we can claim that print_r is recursive, the actual recursion
is done in its auxiliary function. This is print_r itself:
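Here is a sketch of print_r and its auxiliary function. It assumes a List whose first node is referenced by `_front`. The `prepend` method is a hypothetical helper included only to build a sample list. The auxiliary method is private and takes the node as its extra parameter.

```python
class _List_Node:
    def __init__(self, value, next_):
        self._value = value
        self._next = next_


class List:
    def __init__(self):
        self._front = None

    def prepend(self, value):
        # Hypothetical helper used only to build a sample list
        self._front = _List_Node(value, self._front)

    def print_r(self):
        # Public method: same signature as the array-based version (no node parameter)
        self._print_r_aux(self._front)

    def _print_r_aux(self, node):
        # Base case: past the last node, nothing to do
        if node is not None:
            # Recurse first, so later nodes are printed before this one
            self._print_r_aux(node._next)
            print(node._value)
```

Moving `print(node._value)` above the recursive call turns this into a front-to-rear print.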
Note that print_r has no extra parameters and does not recursively call itself. Instead, it calls a private
auxiliary function. This function is passed the front of the node list rather than the entire List.
This auxiliary function works directly with the node data:
This function makes a recursive call to itself, and with each call passes a link to the next node in
the list of nodes. (Note that printing from front to rear could also easily be done recursively by
moving the print line up so that the printing is done before the recursive call.)
Note that auxiliary recursive functions generally have more parameters and/or different parameters
than the base function that calls them. If you write an auxiliary function that uses the same number
and type of parameters as the base function that calls it, then you probably don't need the auxiliary
function (or you wrote it incorrectly).
The _BST_Node Class
Each BST node is only slightly more complex than a singly linked node. Instead of a next node pointer, each
node contains a copy of the value it stores, pointers to its left and right children, if any, and the height of its
subtree. (The height of any subtree is defined as the maximum height of its children, plus one.)
The BST Class
The BST class constructor is very simple, containing only a root node that is initialized to None and a node count.
Searching a BST
Searching a BST can be done either recursively or iteratively.
To understand how to search a BST we have to understand how nodes are ordered in the tree. As noted above:
- for each node N, the keys in the left subtree of N are less than the key of N, and the keys in the right subtree of N are
greater than the key of N
This means that each time we look at a node, if the value in the node does not match the key value we must look to
either the left or right children of the node. The direction chosen depends on the comparison between the current node
value and the key. The algorithm is:
- Get the root node.
- If there is no node, the search fails.
- Search the tree: compare the node value to the key:
If the key and node value match, the search ends successfully.
If the key < the node value, search the left subtree of the node.
If the key > the node value, search the right subtree of the node.
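The recursive version of this algorithm can be sketched as a function on nodes. For brevity, the node constructor here accepts optional children, so a sample tree can be built in one expression. That constructor signature and the name `retrieve_r` are assumptions, not the course's API. The tree built below is the sample tree used throughout these notes.

```python
class _BST_Node:
    # Simplified node: children may be passed in to build a sample tree quickly
    def __init__(self, value, left=None, right=None):
        self._value = value
        self._left = left
        self._right = right


def retrieve_r(node, key):
    # Base case: ran off the bottom of the tree, so the search fails
    if node is None:
        return None
    if key == node._value:
        return node._value
    if key < node._value:
        # Every key smaller than this node's value lives in its left subtree
        return retrieve_r(node._left, key)
    return retrieve_r(node._right, key)


# The sample tree from these notes
root = _BST_Node(11,
                 _BST_Node(7, _BST_Node(6), _BST_Node(9, _BST_Node(8))),
                 _BST_Node(15, _BST_Node(12), _BST_Node(18)))
```

Searching for 8 visits 11, 7, 9 and then 8.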
Remember that we can treat each node as the root of its own tree. The following diagram shows the route through
the tree taken to search for the value 8:
The iterative algorithm for the search works nicely because we can treat the path through the BST as a singly-linked list. We never have to search both the left
and right subtrees - the choice of the next subtree to search is determined strictly by the comparison of a node value to the search key.
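Here is an iterative sketch under the same assumptions (simplified node constructor, hypothetical function name `retrieve_i`). A single `node` variable walks down the tree, just as a `current` variable walks a linked list.

```python
class _BST_Node:
    def __init__(self, value, left=None, right=None):
        self._value = value
        self._left = left
        self._right = right


def retrieve_i(root, key):
    node = root
    # The path through the tree behaves like a singly-linked list
    while node is not None:
        if key == node._value:
            return node._value
        # Exactly one child is chosen at each step; the other subtree is ignored
        node = node._left if key < node._value else node._right
    return None


# The sample tree from these notes
root = _BST_Node(11,
                 _BST_Node(7, _BST_Node(6), _BST_Node(9, _BST_Node(8))),
                 _BST_Node(15, _BST_Node(12), _BST_Node(18)))
```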
What is the time complexity of this algorithm? The hint is to note that after each value-key comparison, potentially half of the remaining tree can be ignored. The
key word here, however, is potentially. In a well-balanced BST where each level is filled with nodes before the next level has nodes attached, the search method
looks like a binary search and requires O(log n) time. In a badly unbalanced BST (for example, one built by inserting values in sorted order) the tree degenerates
into what is effectively a linked list, and the search requires O(n) time in the worst case.
To insert into a BST we perform some of the same steps that we do when searching for a
key in the tree. However, nodes are always inserted as leaves, i.e. at the bottom of the tree
below an existing node, never in the middle of the tree. The important thing to understand,
then, is how to find the correct position of the new value within the existing tree: following
the search rules will always get us to the proper location within the tree. Once we arrive
there we create a new node that becomes the root of a new subtree within the BST.
In the following example we wish to insert the value 13 into the BST. The value 13 ends up in
a node that is the right child of the node containing the value 12:
It should be clear that the shape of a BST is determined by the insertion order of the values in the tree. A nicely balanced
tree requires that some thought be put into the order in which items are inserted into the tree. We will examine this in lab.
The last consideration is what to do with duplicate values. At first we will work with trees that do not
allow duplicates. Duplicate values are simply ignored. We still have to search the tree to determine if there is a duplicate,
but if we find one in the tree we can stop searching and exit the insertion method. There are a number of approaches to
working with trees that allow duplicates, and we will examine some of them later.
Why Recursion?
At first glance, it seems that an iterative algorithm would perform a node insertion quite nicely. It would look a great deal like the retrieve algorithm, except that
it would fail if a duplicate value were found, and add a node to the bottom of the tree if no duplicate existed. However, note that the heights of the nodes
containing 12 and 15 are incremented by one to take into account the fact that their maximum subtree heights have increased by one with the addition of the new
node. These heights cannot be updated from the top down, since the lower nodes must have their heights updated before their parent nodes. The iterative insertion
algorithm, which works from the top down, does not lend itself to an easy solution for this.
A recursive solution allows us to update a tree both from the top down and the bottom up. We can find the location of the new node on the way down the tree,
and update the node heights on the way back up the tree. The insertion code involves the following three methods:
_update_height is a private _BST_Node method that simply takes the maximum height of its two
child nodes and adds one to it to get its new height. It does not return a value as it is updating its
own private _height attribute.
insert is a public BST method that takes a value as a parameter and attempts to insert that value
into its proper place in the BST. If it succeeds, it returns True, and False otherwise. insert does
not do much work, as its only job is to call a private auxiliary method that does the actual work.
The auxiliary method is needed because it takes two parameters, a node and a value, rather than the
single parameter taken by insert. The auxiliary method also returns a node as well as the insertion
success flag. This node updates the root node of each subtree.
Because insert_aux is recursive it allows work to be performed on the way down the tree
as well as back up the tree. On the way down the tree the method searches for the proper
place to insert the new node. This search has two base cases: the first when it
reaches None, and is therefore at the bottom of the tree and ready to insert a new node;
and the second when it finds a node that already contains the value to insert, and therefore
the insertion fails. The two general cases simply move further down the tree, either to the
left or right depending on the value being inserted.
On the way back up the tree insert_aux updates the height of each node that it has already
traversed all the way back to the root node. Because it is performing these height updates
from the bottom up, the updated heights will correctly reflect the addition of the new
node at the bottom of the tree. Note that only the nodes traversed on the way down the
tree can possibly be affected by these new heights - the nodes on the bypassed halves of
the tree and subtrees remain unchanged.
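Putting the three methods together, a sketch might look like this. It assumes a leaf has height 1 and that `_insert_aux` returns a `(node, inserted)` tuple. The tuple return is one way to hand back both a node and the success flag; it is not necessarily the course's exact design.

```python
class _BST_Node:
    def __init__(self, value):
        self._value = value
        self._left = None
        self._right = None
        self._height = 1  # assumption: a leaf has height 1

    def _update_height(self):
        left_height = self._left._height if self._left is not None else 0
        right_height = self._right._height if self._right is not None else 0
        self._height = max(left_height, right_height) + 1


class BST:
    def __init__(self):
        self._root = None
        self._count = 0

    def insert(self, value):
        # The auxiliary method returns the (possibly new) subtree root and a success flag
        self._root, inserted = self._insert_aux(self._root, value)
        return inserted

    def _insert_aux(self, node, value):
        if node is None:
            # Base case 1: reached the bottom, so the new value becomes a leaf
            node = _BST_Node(value)
            self._count += 1
            inserted = True
        elif value < node._value:
            node._left, inserted = self._insert_aux(node._left, value)
        elif value > node._value:
            node._right, inserted = self._insert_aux(node._right, value)
        else:
            # Base case 2: duplicate value, so the insertion fails
            inserted = False
        # On the way back up, recompute this node's height from its children
        node._update_height()
        return node, inserted
```

Inserting 13 into the sample tree raises the heights of the 12 and 15 nodes by one, as described above. A second insert of 13 returns False.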
Binary Search Tree Deletion
Inserting nodes into a BST is fairly straightforward. Deleting a node from a BST is not.
Deleting a node may cause major shifts in the positions of the rest of the elements in the
tree. We will look at deleting nodes from various positions in the tree and how that affects
the structure of the tree. Starting with the BST:
If we wish to delete the node containing the value 18, it is fairly straightforward.
First, we have to find the node to delete, keeping track of its parent in much the same way that we kept track of the previous node when removing
a value from a singly-linked list. Note, however, that we have to keep track of whether we are removing the parent node's left or right child.
Finding the maximum value in the left subtree is simple. We move to the left child of the node to be deleted, then keep moving to the right of that child until
there are no nodes left, i.e. we reach a node that has no right child. This is the node that we will use to replace the deleted node.
In our sample tree, if we remove the root node with the value 11, it will be replaced by the node containing the value 9. We would find this node by moving to
the 7 node (the left child of 11), then moving to the right until we reach None. The 9 node is to the right of the 7 node, and the 9 node has no right child, so it
must be the node with the maximum value of the left subtree of the 11 node. All of the nodes above 9 may have to have their heights adjusted appropriately.
When the 9 node takes the place of the 11 node as the new root it is clear that the BST retains its key property: 9 is larger than
any value in its left subtree and smaller than any value in its right subtree.
The only difficulty is that the 9 node has a child: the 8 node. However, removing the 9 node is no different than removing a
node with a single child - the child node moves up to take the place of its parent. In this case the right pointer of the 7 node
must now point to the 8 node, while the 9 node must change its left and right pointers to point to the 7 and 15 nodes
respectively. (If the 9 node had no children, it would be even easier to move it.)
As in our other examples, the parent of the removed node must have its left or right pointer (as appropriate) updated to
point to the new node. In this particular case the node being removed is the root node, so there is no parent to update.
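The sketch below implements the same replacement rule (the maximum node of the left subtree). It swaps in recursion for explicit parent tracking: each call returns the new root of its subtree, and the caller stores it in the appropriate left or right pointer. The helper `_detach_max` and the simplified `_insert_aux` are hypothetical, and the heights assume a leaf has height 1.

```python
class _BST_Node:
    def __init__(self, value):
        self._value = value
        self._left = None
        self._right = None
        self._height = 1

    def _update_height(self):
        left_height = self._left._height if self._left is not None else 0
        right_height = self._right._height if self._right is not None else 0
        self._height = max(left_height, right_height) + 1


class BST:
    def __init__(self):
        self._root = None
        self._count = 0

    def insert(self, value):
        self._root = self._insert_aux(self._root, value)

    def _insert_aux(self, node, value):
        # Simplified insert (duplicates ignored), used only to build the sample tree
        if node is None:
            self._count += 1
            return _BST_Node(value)
        if value < node._value:
            node._left = self._insert_aux(node._left, value)
        elif value > node._value:
            node._right = self._insert_aux(node._right, value)
        node._update_height()
        return node

    def remove(self, key):
        # Returns the removed value, or None if the key is not in the tree
        self._root, value = self._remove_aux(self._root, key)
        return value

    def _remove_aux(self, node, key):
        if node is None:
            return None, None
        if key < node._value:
            node._left, value = self._remove_aux(node._left, key)
        elif key > node._value:
            node._right, value = self._remove_aux(node._right, key)
        else:
            value = node._value
            self._count -= 1
            if node._left is None:
                # Leaf, or right child only: the right subtree (possibly None) moves up
                return node._right, value
            if node._right is None:
                # Left child only: the left subtree moves up
                return node._left, value
            # Two children: detach the maximum node of the left subtree...
            new_left, replacement = self._detach_max(node._left)
            # ...and relink it in place of the deleted node
            replacement._left = new_left
            replacement._right = node._right
            node = replacement
        # Heights are fixed on the way back up
        node._update_height()
        return node, value

    def _detach_max(self, node):
        # Follow right links to the maximum; its left child (if any) takes its place
        if node._right is None:
            return node._left, node
        node._right, max_node = self._detach_max(node._right)
        node._update_height()
        return node, max_node
```

Removing 11 from the sample tree makes 9 the new root. The 8 node becomes the right child of 7, and the 9 node's pointers are set to the 7 and 15 nodes.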
The method is very simple and follows the algorithm noted above:
check to see if the node exists, and if it does count that node (1 +), and
then call the recursive method on the left (self._count_aux(node._left)) and right (self._count_aux(node._right))
children.
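Here is a sketch of the count method and its auxiliary. It uses a simplified node class and a BST constructor that accepts an existing root. Both are assumptions, made only so the sample tree can be passed in.

```python
class _BST_Node:
    def __init__(self, value, left=None, right=None):
        self._value = value
        self._left = left
        self._right = right


class BST:
    def __init__(self, root=None):
        # Assumption: accepting a prebuilt root is for demonstration only
        self._root = root

    def count(self):
        return self._count_aux(self._root)

    def _count_aux(self, node):
        # An empty subtree contains no nodes
        if node is None:
            return 0
        # Count this node, then everything to its left and right
        return 1 + self._count_aux(node._left) + self._count_aux(node._right)


# The sample tree from these notes
sample = _BST_Node(11,
                   _BST_Node(7, _BST_Node(6), _BST_Node(9, _BST_Node(8))),
                   _BST_Node(15, _BST_Node(12), _BST_Node(18)))
```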
Inorder, Preorder, Postorder, Levelorder Traversals
In the example above of the count method, the order in which the tree nodes are traversed makes no difference.
We could walk through the tree by recursing through the right children rather than the left children first, and the
resulting count would be the same. However, there are some traversal orders that give us information about the
data stored in the tree, or tell us something about the order in which data was stored in the tree.
Inorder Traversal
Inorder traversal allows us to walk through the tree and access node data in order - by that we mean we can either print or extract (i.e. copy the data to an array)
the data in value order. For our sample tree, an inorder traversal would give us the data back in the following order:
6, 7, 8, 9, 11, 12, 15, 18
In short, we retrieve the tree data in order, hence the name of the traversal. This is what makes BSTs useful for tasks such as sorting data.
Attempt to visit a node. If the node is not None:
- Visit the node's left child
- Print or extract the data.
- Visit the node's right child
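Here is a recursive sketch of the inorder traversal that extracts values into a Python list. The simplified node constructor and the function name `inorder` are assumptions.

```python
class _BST_Node:
    def __init__(self, value, left=None, right=None):
        self._value = value
        self._left = left
        self._right = right


def inorder(node, values):
    # Extract values into the list `values` in ascending order
    if node is not None:
        inorder(node._left, values)
        values.append(node._value)
        inorder(node._right, values)


# The sample tree from these notes
root = _BST_Node(11,
                 _BST_Node(7, _BST_Node(6), _BST_Node(9, _BST_Node(8))),
                 _BST_Node(15, _BST_Node(12), _BST_Node(18)))
```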
Preorder Traversal
Attempt to visit a node. If the node is not None:
- Print or extract the data.
- Visit the node's left child
- Visit the node's right child
Thus the data in our sample tree would be printed or extracted in preorder as:
11, 7, 6, 9, 8, 15, 12, 18
Preorder is useful in that if we insert data into a BST in preorder, we will produce a tree with a structure identical to that of the tree that we extracted the data
from. Try it with the preorder data above: you should end up with a tree that looks the same as the tree at the top of these notes.
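To check the preorder claim, the sketch below inserts the preorder sequence into an empty tree and then compares the rebuilt tree's shape and preorder with the original. It uses a simplified insert with no heights and no duplicates. The function names `preorder` and `bst_insert` are hypothetical.

```python
class _BST_Node:
    def __init__(self, value, left=None, right=None):
        self._value = value
        self._left = left
        self._right = right


def preorder(node, values):
    # Extract this node's value before visiting its children
    if node is not None:
        values.append(node._value)
        preorder(node._left, values)
        preorder(node._right, values)


def bst_insert(node, value):
    # Simplified recursive insert: no heights, duplicates ignored
    if node is None:
        return _BST_Node(value)
    if value < node._value:
        node._left = bst_insert(node._left, value)
    elif value > node._value:
        node._right = bst_insert(node._right, value)
    return node


# Rebuild a tree from the preorder data above
rebuilt = None
for v in [11, 7, 6, 9, 8, 15, 12, 18]:
    rebuilt = bst_insert(rebuilt, v)
```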
Postorder Traversal
Postorder moves the data processing to the end:
Attempt to visit a node. If the node is not None:
- Visit the node's left child
- Visit the node's right child
- Print or extract the data.
Thus the data in our sample tree would be printed or extracted in postorder as:
6, 8, 9, 7, 12, 18, 15, 11
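The postorder sketch differs from the inorder one only in where the value is extracted. It uses the same simplified node class and a hypothetical function name.

```python
class _BST_Node:
    def __init__(self, value, left=None, right=None):
        self._value = value
        self._left = left
        self._right = right


def postorder(node, values):
    # Both children are processed before this node's value is extracted
    if node is not None:
        postorder(node._left, values)
        postorder(node._right, values)
        values.append(node._value)


# The sample tree from these notes
root = _BST_Node(11,
                 _BST_Node(7, _BST_Node(6), _BST_Node(9, _BST_Node(8))),
                 _BST_Node(15, _BST_Node(12), _BST_Node(18)))
```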
Levelorder Traversal
A levelorder traversal returns data according to the level that the data occupies in the tree. Like preorder, inserting data into the tree in levelorder should
reproduce the tree. It also helps you visualize the tree. The algorithm is:
If the root node is not None:
Create a queue for nodes. Add the root node to this queue.
As long as the queue is not empty:
- Remove the front node from the queue.
- Extract / print the data in the extracted node
- If not None, add node's left child to the rear of the queue
- If not None, add node's right child to the rear of the queue
Thus the data in our sample tree would be printed or extracted in levelorder as:
11, 7, 15, 6, 9, 12, 18, 8
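Here is a levelorder sketch. It uses `collections.deque` from Python's standard library as a stand-in for the course's Queue ADT. That substitution is an assumption; any FIFO queue works.

```python
from collections import deque


class _BST_Node:
    def __init__(self, value, left=None, right=None):
        self._value = value
        self._left = left
        self._right = right


def levelorder(root):
    values = []
    if root is not None:
        queue = deque([root])
        while queue:
            # Remove the front node and extract its data
            node = queue.popleft()
            values.append(node._value)
            # Children join the rear, so each level is finished before the next begins
            if node._left is not None:
                queue.append(node._left)
            if node._right is not None:
                queue.append(node._right)
    return values


# The sample tree from these notes
root = _BST_Node(11,
                 _BST_Node(7, _BST_Node(6), _BST_Node(9, _BST_Node(8))),
                 _BST_Node(15, _BST_Node(12), _BST_Node(18)))
```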
SORTING
Analysis of Insertion Sort
Worst Case: In the worst case every element in the sorted part of the array must be moved to the right. If A[i] were the current key to be inserted, then
values A[0], A[1], ... A[i-1] must be moved and A[i] compared to all of them.
Since i comparisons are needed to insert the key value in A[i] into its proper
place, the total number of comparisons is 1 + 2 + ... + (n-1) = n(n-1)/2 <= n^2: thus
Insertion Sort is O(n^2) in the worst case.
Average Case: is also O(n^2).
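Here is an insertion sort sketch that also counts comparisons, so the worst-case figure can be checked. The comparison counter is there only for illustration.

```python
def insertion_sort(a):
    """Sort list a in place; return the number of key comparisons made."""
    comparisons = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        # Shift larger sorted values one place right to open a gap for key
        while j >= 0:
            comparisons += 1
            if a[j] > key:
                a[j + 1] = a[j]
                j -= 1
            else:
                break
        a[j + 1] = key
    return comparisons
```

On the reversed list `[5, 4, 3, 2, 1]` (n = 5) every key is compared with every sorted value, giving 1 + 2 + 3 + 4 = 10 = 5(4)/2 comparisons.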
Tree Sort
The Main Idea: let a BST do the work. Simply insert the keys into a BST then
do an inorder traversal of the tree to get the sorted elements back.
Analysis of Tree Sort
Worst Case: O(n^2). (Under what circumstances?)
Average Case: O(n log n).
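Here is a tree sort sketch under one assumption worth stating. The BST described earlier ignores duplicates, which would silently drop repeated keys, so this version sends equal keys to the right subtree instead. Insertion is iterative. Note that the recursive `_inorder` can exceed Python's default recursion limit on a very tall, unbalanced tree.

```python
class _Tree_Node:
    def __init__(self, value):
        self._value = value
        self._left = None
        self._right = None


def _inorder(node, result):
    if node is not None:
        _inorder(node._left, result)
        result.append(node._value)
        _inorder(node._right, result)


def tree_sort(values):
    root = None
    for value in values:
        new_node = _Tree_Node(value)
        if root is None:
            root = new_node
            continue
        # Iterative BST insert; equal keys go right so none are lost (assumption)
        node = root
        while True:
            if value < node._value:
                if node._left is None:
                    node._left = new_node
                    break
                node = node._left
            else:
                if node._right is None:
                    node._right = new_node
                    break
                node = node._right
    # An inorder traversal returns the keys in sorted order
    result = []
    _inorder(root, result)
    return result
```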
Selection Sort
The Main Idea: treat the unsorted array as if it were a priority queue. If we always move the item of highest priority remaining amongst the unsorted elements to
the front of the unsorted part by swapping it with the element that is currently there, then the items end up sorted. Example:
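Here is a selection sort sketch. It assumes "highest priority" means "smallest value", so the array ends up in ascending order. The function returns the number of swaps to illustrate that at most n - 1 swaps are made.

```python
def selection_sort(a):
    """Sort list a in place; return the number of swaps performed."""
    swaps = 0
    n = len(a)
    for i in range(n - 1):
        # Find the smallest value in the unsorted part a[i:]
        smallest = i
        for j in range(i + 1, n):
            if a[j] < a[smallest]:
                smallest = j
        # Swap it to the front of the unsorted part (at most one swap per pass)
        if smallest != i:
            a[i], a[smallest] = a[smallest], a[i]
            swaps += 1
    return swaps
```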
Analysis of Selection Sort
The analysis is similar to that of the Insertion Sort, including the calculation of the number of comparisons necessary to complete
the sort.
So far we have been looking at the number of comparisons as a good indicator of an algorithm's complexity. However, there are
other tasks that require CPU time. For example, the Selection Sort requires O(n^2) comparisons and O(n) swaps. (It is still
an O(n^2) algorithm because the n^2 term dominates.) How many swaps does Bubble Sort require in the worst case?
Quick Sort
The Main Idea: divide and conquer, but do so by picking a pivot value, where one subarray consists of values smaller than the pivot, and the other subarray
consists of values larger than the pivot.
Quick Sort is often the "fastest" comparison-based sorting method in practice. In actual trials it typically beats the other methods discussed.
Implementation of Quick Sort
Divide
- Partition the array A[m...n] into two (possibly empty) subarrays A[m...q-1] and A[q+1...n]. Each value in the left array must be smaller than the value in A[q],
and each value in the right array must be larger than the value in A[q].
- Sort these left and right subarrays by recursive calls to Quick Sort.
Partition
- Choose one element in the array as a pivot. This may be the first, last, middle, or some other arbitrary element. (We use the middle index in our sample code.)
- Define two indexes i and j that start at either end of the array (excluding the pivot location).
- Move i through the array to the right and j through the array to the left, swapping A[i] and A[j] if they are out of position with respect to the pivot value.
- At the end, swap A[i+1] with the pivot. Once positioned, pivot elements never move.
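Here is a Quick Sort sketch that uses the middle element as the pivot. The partition shown is one common two-index variant. The pivot is parked at the left end, i and j move toward each other swapping out-of-place pairs, and finally the pivot is swapped into position j. It is not necessarily the course's sample code, which the notes describe as swapping A[i+1] with the pivot. In this variant, values equal to the pivot go to the left part.

```python
def _partition(a, lo, hi):
    # Choose the middle element as the pivot and park it at the left end
    mid = (lo + hi) // 2
    a[lo], a[mid] = a[mid], a[lo]
    pivot = a[lo]
    i = lo + 1
    j = hi
    while True:
        # Advance i past values that belong on the left
        while i <= j and a[i] <= pivot:
            i += 1
        # Move j back past values that belong on the right
        while i <= j and a[j] > pivot:
            j -= 1
        if i > j:
            break
        # Both are out of position with respect to the pivot: swap them
        a[i], a[j] = a[j], a[i]
    # a[lo+1..j] <= pivot < a[j+1..hi]; place the pivot at its final index j
    a[lo], a[j] = a[j], a[lo]
    return j


def _quick_sort_aux(a, lo, hi):
    if lo < hi:
        q = _partition(a, lo, hi)
        # The pivot at q never moves again; sort each side recursively
        _quick_sort_aux(a, lo, q - 1)
        _quick_sort_aux(a, q + 1, hi)


def quick_sort(a):
    """Sort list a in place."""
    _quick_sort_aux(a, 0, len(a) - 1)
```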
Quick Sort Analysis
Quick sort is, on average, an O(n log n) algorithm. In its worst case it is O(n^2), though this is rare.
Notes - Types of Recursion
When considering how to write a recursive algorithm, it is useful to know some basic different approaches to recursion.
Fruitful Recursion
Fruitful recursion simply uses a function that returns one or more values. This is nothing new to us - we have written many functions that return something. In the
context of recursion, however, this means that the recursion relies on the fact that each call returns a value that will be used in the recursion.
We will demonstrate this approach by reversing the contents of a list. This algorithm solves the problem by starting at the ends of the list, swapping the values at
the ends, then working inwards with the next pair of values until the middle of the list is reached and there are no values left to swap. As part of these examples
we will use a swap function to 'swap' two elements of a list:
def swap(data, i, j):
    temp = data[i]
    data[i] = data[j]
    data[j] = temp
    return
In this example, we reverse the contents of a Python list by changing slices of the list:
def rev_list_f(data):
    n = len(data)
    if n > 1:
        swap(data, 0, n - 1)
        data[1:n - 1] = rev_list_f(data[1:n - 1])
    return data
Sample call:
>>> x = [1, 2, 3, 4, 5]
>>> x = rev_list_f(x)
>>> print(x)
[5, 4, 3, 2, 1]
rev_list_f first determines whether or not there are enough elements to swap - a list must have at least two elements in it to
perform a swap. It then updates the interior of the list (i.e. everything except the end values) with the result of the recursive call.
An ever smaller list is passed to the recursive call, guaranteeing that eventually the base case will be reached. Because the
function returns a value it is a fruitful function, and because the function is fruitful the result of the function must be used or the
changes to the list are lost.
The following diagram shows how the fruitful recursion is handled:
Note that Tree Recursion may be either fruitful or in-place.