Data Structures & Algorithms in Java - Robert Lafore - PPT
1
Objectives
Provide background in data structures (arrangements of data
in memory) and algorithms (methods for manipulating this
data)
Why is this important? The answer is there is far more to
programming than finding a solution that will work:
Execution time
Memory consumption
So by the end of the course, you should be better equipped not
just to develop a correct solution to a problem, but the best
solution possible.
2
Definition of a Data Structure
A data structure is an arrangement of data in a computer’s
memory (or disk).
Questions to ponder:
What are some examples of data structures you already know
about from your Java course?
3
Definition of an Algorithm
An algorithm provides a set of instructions for manipulating
data in structures.
Questions to ponder:
What’s an example of an algorithm?
4
Data Structure or Algorithm?
Linked List
Sort
Search
Stack
Vector
5
Real World Data Storage
Real-world data: data that describes physical entities external
to the computer. Can we think of some examples?
6
Real World Data Storage
Say we wanted to convert one of these systems to a computer
program. What must we consider?
Memory consumption:
Scalability:
7
Important: Data Structures can be
HUGE!!!
What are some example scenarios where we need large data
structures?
8
Programmer Tools
Do not store real-world data, but perform some function
internal to the machine.
Example - A ‘stack’ of data, where I can only insert or
remove elements from the top:
9
Real World Modeling
Effectively, ‘simulate’ a real-world situation.
For example, what could the following represent:
10
Real World Modeling
How about airline routes?
This type of structure is called an ‘undirected graph’
Example applications:
Grocery store lines
Traffic (Queues are actually used when determining timing of
traffic lights!! How? Let’s think about it)
12
Data Structure Trade-offs
A structure we have dealt with before: arrays
Requirement that is enforced:
Arrays store data sequentially in memory.
Let’s name the advantages (i.e., when is an array efficient?)
13
Overall Costs for Structures We’ll Study
Structure     Access   Search   Insert   Delete   Impl.      Memory
Array         Low      High     Med      High     Low        Low
Ord. Array    Low      Med      High     High     Med        Low
Linked List   High     High     Low      Low      Med        Med
Stack         Med      High     Med      Med      Med        Med
Queue         Med      High     Med      Med      Med        Med
Bin. Tree     Med      Low      Low      Low      High       High
R-B Tree      Med      Low      Low      Low      Very High  High
2-3-4 Tree    Med      Low      Low      Low      Very High  High
Hash Table    Med      Med      Low      High     Low        High
Heap          Med      Med      Low      Low      High       High
Graph         High     High     Med      Med      Med        Med
15
Databases
A database refers to all data that will be dealt with in a
particular situation. We can think of a database as a table of
rows and columns:
16
Database Records
A record is the unit into which a database is divided. How is a
record represented in a table? In a clothing catalogue?
18
Database Keys
Given a database (a collection of records), a common
operation is obviously searching. In particular we often want
to find a single particular record. But what exactly does this
mean? Each record contains multiple fields, i.e.:
20
Java.util Package
Includes Vector, Stack, Dictionary, and Hashtable. We won’t
cover these particular implementations but know they are
there and accessible through:
import java.util.*;
You may not use these on homeworks unless I explicitly say you
can.
Several other third-party libraries available
A central purpose of Java
21
Review of Object-Oriented
Programming
Procedural Programming Languages
Examples: C, Pascal, early BASIC
What is the main unit of abstraction?
Object-Oriented Languages:
Examples: C++, Ada, Java
What is the main unit of abstraction?
Obviously procedural languages weren’t good enough in all
cases. Let’s rediscover why.
22
Main Limitations of Procedural
Programming
1. Poor Real World Modeling. Let's discuss why.
2. Poor Organization of growing programs.
23
Idea of Objects
A programming unit which has associated:
Variables (data), and
Methods (functions to manipulate this data).
How does this address the two problems on the previous
slide?
Real World Modeling
Organization
24
Idea of Classes (Java, C++)
Objects by themselves can create lots of redundancy. Why?
class thermostat {
    private float currentTemp;
    private float desiredTemp;
}

thermostat therm1;                      // declare a reference (no object yet)
therm1 = new thermostat();              // create the object
thermostat therm2 = new thermostat();   // declare and create in one step
26
Invoking Methods of an Object
Parts of the program external to the class can access its
methods (provided they are declared public):
Dot operator:
therm2.furnace_on();
Can I access data members similarly?
therm2.currentTemp = 77;
What would I need to change to do so?
Is this change good programming practice?
How, ideally, should data members be accessed?
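One common way, sketched here with hypothetical accessor names (the thermostat class on the earlier slide doesn't define these):

class thermostat {
    private float currentTemp;
    // 'getter': read-only access to the private field
    public float getCurrentTemp() { return currentTemp; }
    // 'setter': the one controlled way to modify it (could validate first)
    public void setCurrentTemp(float t) { currentTemp = t; }
}

Now therm2.setCurrentTemp(77); works while currentTemp itself stays private.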
27
Another Example
If you have your book, look at the BankAccount class on page
18. If you don't have it, don't worry, I'll write it on the
board.
Look at the output. Let’s go over why this is generated.
28
Inheritance
Creation of one class, called the base class
Creation of another class, called the derived class
Has all features of the base, plus some additional features.
Example:
A base class Animal may have associated methods eat() and
run() and variable name, and a derived class Dog can inherit
from Animal, gaining these methods plus a new method
bark().
If name is private, does Dog have the attribute?
How do we enforce Dog to also have attribute name?
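A minimal sketch of that hierarchy (the method bodies are placeholders):

class Animal {
    protected String name = "some animal";  // protected: derived classes can use it directly
    public void eat() { System.out.println(name + " eats"); }
    public void run() { System.out.println(name + " runs"); }
}

class Dog extends Animal {                  // Dog inherits name, eat() and run()...
    public void bark() { System.out.println(name + " barks"); }  // ...and adds bark()
}

If name were private, every Dog object would still contain the field, but Dog's own methods could not access it directly; declaring it protected (as above) or providing accessors gives the derived class access.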
29
Polymorphism
Idea: Treat objects of different classes in the same way.
What’s the requirement?
30
Software Engineering
Bigger picture. How do the topics of this course relate to
software engineering?
The life cycle of a software project consists of the following:
Specification – purpose of the software, requirements, etc.
Design – components and interaction of the software
Verification – individual components and global functionality
Coding – actual writing of the software
Testing – validating proper functionality
Production – distribution to the community
Maintenance – updating the software (FYI in a poor design,
these costs can be high!!)
31
Stage 1: Specification
Here we answer the questions of what purpose the software
serves, and requirements. Very high level at this stage.
How do data structures fit in?
The specification _______________ the data structures.
Why?
32
Stage 2: Design
At this stage, we break software into components and
describe their interaction.
The data structures help _______________ the design.
Why?
33
Stage 3: Verification
Verification involves a review of all components, studying
their individual inputs and outputs and ensuring functionality
of the design.
In order to do this, we have to know the expected ________
and __________ of the data structures. Why?
34
Stage 4: Coding
Involves the actual writing of the software.
How can data structures save time at this stage? Think in
terms of reusable components.
35
Stage 5: Testing
Involves running the software package through a set of
benchmarks, which make up a test suite, and verifying proper
functionality. You can also test individual components.
How does understanding a data structure help in terms of
testing?
36
Stage 6: Production
Distributing the software to the community.
If we use reusable components from another source, what
must we do in this case? Think about how Java packages
work.
37
Stage 7: Maintenance
Updating the software while keeping the internal design
intact.
How do generic components come in handy here? How
might individual components (and thus their associated data
structures) have to scale with issues such as data size? How
about data types?
In a simple situation, think about maintenance on a software
package where every data structure was hardcoded to operate
on only integers. What happens when we extend to floats?
See why these costs can be high? How can software updates
propagate between components?
Dependencies can grow exponentially!
38
Final Review of some Java Concepts
Difference between a value and a reference:
int intVar;
BankAccount bc1;
39
Java Assignments
What must be noted about the following code snippet:
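For example (BankAccount as in the earlier slides; this particular snippet is illustrative):

BankAccount acct1 = new BankAccount(350.00);
BankAccount acct2 = acct1;   // copies the REFERENCE: both names refer to one object
int a = 5;
int b = a;                   // copies the VALUE: a and b are independent

After the second line, a deposit made through acct2 is visible through acct1, because there is only one object.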
40
Java Garbage Collection
When is the memory allocated to an object reclaimed in Java?
Code like this would leak memory in C++, but does not in Java
because of the garbage collector:
while (true) {
Integer tmp = new Integer(0);
…
}
41
Passing by Value vs. Reference
Same idea:
void method1() {
    BankAccount ba1 = new BankAccount(350.00);
    float num = 4;
    method2(ba1);   // a copy of the reference: method2 can modify the object itself
    method3(num);   // a copy of the value: method3 cannot affect num
}
void method2(BankAccount acct) { … }
void method3(float f) { … }
42
== vs. equals()
carPart cp1 = new carPart("fender");
carPart cp2 = cp1;
// What's the difference between this:
if (cp1 == cp2)
    System.out.println("Same");
// And this:
if (cp1.equals(cp2))
    System.out.println("Same");
Does “Same” print twice, once, or not at all?
43
Primitive Sizes and Value Ranges
Source: roseindia.net
44
Screen Output
System.out is an output stream which corresponds to
standard output, which is the screen:
45
Keyboard Input
Package: java.io
Read a string:
InputStreamReader isr = new InputStreamReader(System.in);
BufferedReader br = new BufferedReader(isr);
String s = br.readLine();   // readLine() throws IOException: declare or catch it
47
Data Structure #2: Arrays
88
The Array
Most commonly used data structure
Common operations
Insertion
Searching
Deletion
How do these differ for an ‘ordered array’?
How do these differ for an array which does not allow
duplicates?
89
Array Storage
An array is a collection of data of the same type
Stored linearly in memory:
90
Remember, value vs. reference…
In Java:
Data of a primitive type is a ____________.
All objects are ________________.
91
Defining a Java Array
Say, of 100 integers:
int[] intArray;
intArray = new int[100];
92
We said an array was a reference…
That means if we do this:
int[] intArray;
intArray = new int[100];
93
The Size
Size of an array cannot change once it’s been declared:
intArray = new int[100];
But, one nice thing is that arrays are objects. So you can
access its size easily:
int arrayLength = intArray.length;
94
Access
Done by using an index number in square brackets:
int temp = intArray[3]; // Gets 4th element
intArray[7] = 66; // Sets 8th element
95
Initialization
What do the elements of this array contain:
int[] intArray = new int[100];
How about this one:
BankAccount[] myAccounts = new BankAccount[100];
What happens if we attempt to access one of these values?
96
Look at a book example…
See the example on p. 41-42, where we do the following:
Insert 10 elements into an array of integers
Display them
Find item with key 66
Delete item with key 55
Display them
Ask ourselves:
How could we make the initialization shorter?
How could we save declaring nElems?
97
This did not use OOP
So our next task will be to divide it up (p. 45)
What will we want for the array class? Let’s think about the
purpose of classes. They have data, and functions to manipulate
that data.
98
The LowArray interface
Here’s what it looked like:
100
Abstraction
This illustrates the concept of abstraction
The way in which an operation is performed inside a class is
invisible
Client of HighArray performs more complex operations
through simple method invocations
Never directly accesses the private data in the array
101
The Ordered Array
An array in which the data items are arranged in ascending
order
Smallest value is at index:
Largest value is at index:
102
That’s right, searching!
We can still do a linear search, which is what we’ve seen.
Step through the elements
103
Binary Search: Idea
Ever see the Price is Right?
Guess the price on an item
If guess is too low, Bob Barker says “higher”
If guess is too high, Bob Barker says “lower”
104
Note what this can save!
Let’s take a simple case, where we search for an item in a
100-element array:
int[] arr = {1,2,3,4,5,6,…..,100}
105
Binary Search
Array has values 1-100
First search: Check element 50
50 > 33, so repeat on first half (1-49)
Second search: Check element 25
25 < 33, so repeat on second half (26-49)
Third search: Check element 37
37 > 33, so repeat on first half (26-36)
Fourth search: Check element 31
31 < 33, so repeat on second half (32-36)
Fifth search: Check element 34
34 > 33, so repeat on first half (32-33)
Sixth search: Check element 32
32 < 33, so repeat on second half (33)
Seventh search: Check element 33! Found.
So 7 comparisons. With linear search, it would've been 33.
106
Effect on Operations
We saw how binary search sped up the searching operation
Can it also speed up deletion?
107
Implementation
Let’s go through the Java implementation, on pages 56-57.
At any given time:
lowerBound holds the lower index of the range we are searching
upperBound holds the upper index of the range we are
searching
curIn holds the current index we are looking at
What if the element is not in the array? What happens?
108
Now, let’s implement the OrdArray
Data
The array itself
The number of occupied slots
Methods
Constructor
Size
Find (with binary search)
Insert (with binary search)
Delete (with binary search)
Display
109
Analysis
What have we gained by using ordered arrays?
Is searching faster or slower?
Is insertion faster or slower?
Is deletion faster or slower?
110
Ordered Array: Operation Counts
Maximum number of comparisons for an ordered array of n elements,
running binary search:
n            Comparisons
10           4
100          7
1,000        10
10,000       14
100,000      17
1,000,000    20
How does this compare with linear search, particularly for large arrays?
Whew.
111
A Deeper Analysis
How many comparisons would be required for an array of
256 elements? (2^8)
What about 512 (2^9)?
What do you think 1024 would be (2^10)?
See the pattern?
112
Computing log2n
On a calculator, if you use the “log” button, usually the base is
10. If you want to convert:
Multiply by 3.322
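For example, log2(1,000,000) = log10(1,000,000) * 3.322 = 6 * 3.322, or about 20, matching the table two slides back. In Java you can compute it directly (natural logs work the same way):

double log2n = Math.log(1000000.0) / Math.log(2.0);   // about 19.93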
113
Storing Objects
We’ve seen an example where we used arrays to store
primitive data. Now let’s look at an example which stores
objects. What’s our situation now with values and
references?
Implications?
114
Person Class
Let’s go through the Person class on page 65.
Data:
First name and last name (String objects), age (integer value)
Functions
Constructor which takes two strings and an integer
Function to display information
Function to return the last name (we’ll eventually use this for
searching)
115
Adapting our HighArray class
Rewrite the implementation on page 49
Change to operate on Persons instead of integers
Watch out for the ==!
In main() construct Person objects
116
Big-Oh Notation
Provides a metric for evaluating the efficiency of an
algorithm
Analogy: Automobiles
Subcompacts
Compacts
Midsize
etc.
117
How it’s done
It’s difficult to simply say: A is twice as fast as B
We saw with linear search vs. binary search, the comparison
can be different when you change the input size. For
example, for an array of size n:
n=16, linear search comparisons = 10, binary search
comparisons = 5
Binary search is 2x as fast
n=32, linear search comparisons = 32, binary search
comparisons = 6
Binary search is 5.3x as fast
118
Example: Insertion into Unordered
Array
Suppose we just insert at the next available position:
Position is a[nElems]
Increment nElems
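As a sketch, using the a and nElems members of the book's array classes:

public void insert(long value) {
    a[nElems] = value;   // drop it into the next free slot: no shifting
    nElems++;            // constant work regardless of n, hence O(1)
}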
119
Example: Linear search
You’ll require a loop which runs in the worst case n times
Each time, you have to:
Increment a loop counter
Compare the loop counter to n
Compare the current element to the key
Each of these operations take time independent of n, so let’s say
they consume a total time of K.
120
Example: Binary Search
We’ve already said that for an array of n elements, we need
log(n)+1 comparisons.
Each comparison takes time independent of n, call it K
121
Why this is useful
Useful to evaluate how well an algorithm scales with input
size n. For example:
O(1) scales better than…
O(log n), which scales better than…
O(n), which scales better than…
O(n log n), which scales better than…
O(n^2), etc.
122
Generally speaking…
For an input of size n and a function T(n), to compute the
Big-Oh value, you take the leading term and drop the
coefficient.
Examples – compute Big Oh values of the following
runtimes:
T(n) = 100*n^2 + n + 70000
T(n) = (n*log n) / n
T(n) = n^3 + 754,000*n^2 + 1
T(n) = (n + 2) * (log n)
123
But, these large constants must mean
something…
T(n) = n^3 + 754,000*n^2 + 1
This huge constant on the n^2 term has to have some effect,
right?
The answer is yes and no.
124
Algorithms we’ve discussed…
Linear search: O(n)
Binary search: O(log n)
Insertion, unordered array: O(1)
Insertion, ordered array: O(n)
Deletion, unordered array: O(n)
Deletion, ordered array: O(n)
125
Graph of Big O times.
See page 72.
126
Unordered/Ordered Array Tradeoffs
Unordered
Insertion is fast – O(1)
Searching is slow – O(n)
Ordered
Searching is fast – O(log n)
Insertion is slow – O(n)
Deletion is the same either way – O(n)
Memory can be wasted, or even misused. Let’s discuss.
127
What we will see…
There are structures (trees) which can insert, delete and
search in O(log n) time
Of course as you’d expect, they’re more complex
We will also learn about structures with flexible sizes
java.util has class Vector – what you should know:
Array of flexible size
Some efficiency is lost (why do you think?)
What happens when we try to go beyond the current size?
Why is this penalty very large at the beginning of array population?
128
Sorting Algorithms
129
Sorting in Databases
Many possibilities
Names in alphabetical order
Students by grade
Customers by zip code
Home sales by price
Cities by population
Countries by GNP
Stars by magnitude
130
Sorting and Searching
We saw with arrays they could work in tandem to improve
speed
What was the search method that required sorting an array?
131
Basic Sorting Algorithms
Bubble
Selection
Insertion
132
Example
Unordered:
Ordered:
134
Sort #1: The Bubble Sort
Way to envision:
Suppose you're ‘nearsighted’
You can only see two adjacent players at the same time
How would you sort them?
135
Bubble Sort:
First Pass
136
Bubble Sort: End of First Pass
139
How many operations total then?
First pass: n-1 comparisons, n-1 swaps
Second pass: n-2 comparisons, n-2 swaps
Third pass: n-3 comparisons, n-3 swaps
(n-1)th pass: 1 comparison, 1 swap
Then it’s sorted
141
Invariants
Algorithms tend to have invariants, i.e. facts which are true
all the time throughout its execution
In the case of the bubble sort, what is always true is….
142
Sort #2: Selection Sort
Purpose:
Improve the speed of the bubble sort
Number of comparisons: O(n^2)
Number of swaps: O(n)
143
What’s Involved
Make a pass through all the players
Find the shortest one
Swap that one with the player at the left of the line
At position 0
Now the leftmost is sorted
Find the shortest of the remaining (n-1) players
Swap that one with the player at position 1
And so on and so forth…
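A minimal sketch of those steps:

public static void selectionSort(long[] a) {
    for (int out = 0; out < a.length - 1; out++) {
        int min = out;
        for (int in = out + 1; in < a.length; in++)  // scan for the smallest remaining
            if (a[in] < a[min])
                min = in;
        long temp = a[out];                          // one swap per pass
        a[out] = a[min];
        a[min] = temp;
    }
}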
144
Selection Sort in
Action
145
Count Operations
First Pass, for an array of size n:
How many comparisons were made?
How many (worst case) swaps were made?
146
Let’s do an example ourselves…
Use a selection sort to sort an array of ten integers:
147
How many operations total then?
First pass: n-1 comparisons, 1 swap
Second pass: n-2 comparisons, 1 swap
Third pass: n-3 comparisons, 1 swap
(n-1)th pass: 1 comparison, 1 swap
Then it’s sorted
149
A method of an array class…
See page 93, we’ll go through it.
150
Sort #3: Insertion Sort
In most cases, the best one…
2x as fast as bubble sort
Somewhat faster than selection in MOST cases
151
Proceed..
A subarray to the left is
‘partially sorted’
Start with the first element
The player immediately to
the right is ‘marked’.
The ‘marked’ player is
inserted into the correct
place in the partially sorted
array
Remove first
Marked player ‘walks’ to
the left
Shift appropriate elements
until we hit a smaller one
152
Let’s do an example ourselves…
Use an insertion sort to sort an array of ten integers:
153
Count Operations
First Pass, for an array of size n:
How many comparisons were made?
How many swaps were made?
Were there any? What was there instead?
154
How many operations total then?
First pass: 1 comparison, 1 copy
Second pass: 2 comparisons, 2 copies
Third pass: 3 comparisons, 3 copies
(n-1)th pass: n-1 comparisons, n-1 copies
Then it’s sorted
155
Why are we claiming it’s better than
selection sort?
A swap is an expensive operation. A copy is not.
To see this, how many copies are required per swap?
Selection/bubble used swaps, insertion used copies.
156
Flow
157
Implementation
We’ll write the function (pages 99-100) together
Use the control flow on the previous slide
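For reference, a sketch in the same spirit as the book's version:

public static void insertionSort(long[] a) {
    for (int out = 1; out < a.length; out++) {   // everything left of out is partially sorted
        long marked = a[out];                    // remove the marked element
        int in = out;
        while (in > 0 && a[in - 1] > marked) {   // walk left, shifting larger items right
            a[in] = a[in - 1];                   // a copy, not a full swap
            in--;
        }
        a[in] = marked;                          // insert into the opened slot
    }
}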
158
And finally…
Encapsulate the functionality within an array class (p. 101-
102)
159
Invariant of Insertion Sort
At the end of each pass, the data items with indices smaller
than __________ are partially sorted.
160
Sorting Objects
Let’s modify our insertion sort to work on a Person class,
which has three private data members:
lastName (String)
firstName (String)
age (int)
161
Lexicographic Comparison
For Java Strings, you can lexicographically compare them
through method compareTo():
s1.compareTo(s2);
Returns an integer
If s1 comes before s2 lexicographically, returns a value < 0
If s1 is the same as s2, returns 0
If s1 comes after s2 lexicographically, returns a value > 0
162
Stable Object Sorts
Suppose you can have multiple persons with the same last
name.
Now given this ordering, sort by something else (first name)
A stable sort retains the first ordering when the second sort
executes.
163
Sort Comparison: Summary
Bubble Sort – hardly ever used
Too slow, unless data is very small
Selection Sort – slightly better
Useful if: data is quite small and swapping is time-consuming
compared to comparisons
Insertion Sort – most versatile
Best in most situations
Still for large amounts of highly unsorted data, there are better
ways – we’ll look at them
Memory requirements are not high for any of these
164
Stacks and Queues
165
New Structures
Stack
Queue
Priority Queue
What’s “new”?
Contrast with arrays
Usage
Access
Abstraction
166
Usage
Arrays are conducive for databases
Data which will be accessed and modified
Easy operations for insertion, deletion and searching
Although some of these are time consuming
167
Access
Arrays allow immediate access to any element
Takes constant time
Very fast
168
Abstraction
A bit higher than arrays. Why?
When a user indexes an array, they specify a memory address
Indirectly, because they say:
Array name -> address of the first element
Index -> offset from that address (index * size of an array element)
With stacks and queues, everything is done through methods
User has no idea what goes on behind the scenes
Also no initial size needed
BIGGEST THING
Stacks, queues and priority queues can use arrays as their underlying
structure
Or linked lists…
From the user’s perspective, they are one and the same
169
Stack
A stack only allows access to the last item inserted
To get the second-to-last, remove the last
Analogy: US Postal Service
170
Performance Implication
Note what we can already infer about stack performance!
It is critical that we are able to process mail efficiently
Otherwise what happens to the letters on the bottom?
171
Applications
Compilers
Balancing parentheses, braces, brackets
Symbol tables
Parsing arithmetic expressions
Traversing nodes of trees and graphs
Invoking methods
Pocket calculators
172
The ‘push’ operation
Pushing involves placing an element on the top of the stack
Analogy: Workday
You’re given a long-term project A (push)
A coworker interrupted for temporary help with project B
(push)
Someone in accounting stops by for a meeting on project C
(push)
Emergency call for help on project D (push)
Analogy: Workday
Finish the emergency call with project D (pop)
Finish the meeting on project C (pop)
Finish the help on project B (pop)
Complete the long-term project A (pop)
174
The ‘peek’ operation
Peek allows you to view the element on top of the stack
without removing it.
175
Stack Class
Java implementation, page 120
Let’s go through it
Note, we have to pick our internal data structure
For now we’ll stick with what we know: The Array
And analyze the main()
176
Stack class methods
Constructor:
Accepts a size, creates a new stack
Internally allocates an array of that many slots
push()
Increments top and stores a data item there
pop()
Returns the value at the top and decrements top
Note the value stays in the array! It’s just inaccessible (why?)
peek()
Return the value on top without changing the stack
isFull(), isEmpty()
Return true or false
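Condensed, the class looks roughly like this (cf. page 120):

class StackX {
    private long[] stackArray;
    private int top;                        // index of the top element
    public StackX(int size) { stackArray = new long[size]; top = -1; }
    public void push(long j)  { stackArray[++top] = j; }
    public long pop()         { return stackArray[top--]; }  // value stays in the array
    public long peek()        { return stackArray[top]; }
    public boolean isEmpty()  { return (top == -1); }
    public boolean isFull()   { return (top == stackArray.length - 1); }
}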
177
Pictorially, let's view the execution of
main()
StackX theStack =
new StackX(10);
178
Push
theStack.push(20);
179
theStack.push(40);
180
theStack.push(60);
181
theStack.push(80);
182
Pop
while (!theStack.isEmpty())
{
long value = theStack.pop();
…
183
Print
while (!theStack.isEmpty())
{
…
System.out.print(value);
System.out.print(" ");
184
Error Handling
When would it be sensible to perform error handling in
the case of the stack?
Which function would we add it to?
And how would we do it?
191
Example Application: Word Reversal
Let’s use a stack to take a string and reverse its characters
How could this work? Let’s look.
Reminder of the available operations with Strings:
If I have a string s
s.charAt(j) <- Return character with index j
s + “…” <- Append a string (or character) to s
What would we need to change about our existing stack class?
Reverser, page 125
192
Example Application: Delimiter Matching
This is done in compilers!
Parse text strings in a computer language
Sample delimiters in Java:
{, }
[, ]
(, )
193
Example Strings
c[d]
a{b[c]d}e
a{b(c]d}e
a[b{c}d]e}
a{b(c)
194
Algorithm
Read each character one at a time
If an opening delimiter, place on the stack
If a closing delimiter, pop the stack
If the stack is empty, error
Otherwise if the opening delimiter matches, continue
Otherwise, error
If the stack is not empty at the end, error
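A sketch of that algorithm, using a plain char array as the stack (a class like StackX with a char array inside works equally well):

public static boolean delimitersMatch(String s) {
    char[] stack = new char[s.length()];            // worst case: all openers
    int top = -1;
    for (int i = 0; i < s.length(); i++) {
        char ch = s.charAt(i);
        if (ch == '{' || ch == '[' || ch == '(') {
            stack[++top] = ch;                      // opening delimiter: push it
        } else if (ch == '}' || ch == ']' || ch == ')') {
            if (top < 0) return false;              // closer with an empty stack: error
            char open = stack[top--];               // pop and check the match
            if ((ch == '}' && open != '{') ||
                (ch == ']' && open != '[') ||
                (ch == ')' && open != '(')) return false;
        }
    }
    return top == -1;                               // leftover openers: error
}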
195
Example
Let’s look at a stack for a{b(c[d]e)f}
196
Example
Let’s do one that errors: a[b{c}d]e}
Together on the board
197
Java Implementation
Let’s implement the checker together
Page 129
We’ll write a function which accepts a string input
And returns true or false depending on if the string has all
delimiters matching
We can use the Stack class where the internal array held
characters
198
Stacks: Evaluation
For the tools we saw: reversing words and matching
delimiters, what about stacks made things easier?
i.e. What would have been difficult with arrays?
Why does using a stack make your program easier to
understand?
Efficiency
Push -> O(1) (Insertion is fast, but only at the top)
Pop -> O(1) (Deletion is fast, but only at the top)
Peek -> O(1) (Access is fast, but only at the top)
199
Queues
British for “line”
Somewhat like a stack
Except, first-in-first-out
Thus this is a FIFO structure.
200
Analogy:
Line at the movie theatre
Last person to line up is the last person to buy
201
Applications
Graph searching
Simulating real-world situations
People waiting in bank lines
Airplanes waiting to take off
Packets waiting to be transmitted over the internet
Hardware
Printer queue
Keyboard strokes
Guarantees the correct processing order
202
Queue Operations
insert()
Also referred to as put(), add(), or enqueue()
Inserts an element at the back of the queue
remove()
Also referred to as get(), delete(), or dequeue()
Removes an element from the front of the queue
peekRear()
Element at the back of the queue
peekFront()
Element at the front of the queue
203
Question
In terms of memory now, what about the queue do we need
to worry about?
That we did not have to worry about with the stack
Hint: Think in terms of the low-level representation
204
Insert and remove occur at opposite
ends!!!
Whereas with a stack,
they occurred at the
same end
That means that if we
remove an element we
can reuse its slot
With a queue, you
cannot do that
Unless….
205
Circular Queue
Indices ‘wraparound’
206
Java Implementation
Page 137-138 in textbook, which again uses an internal array
representation
We’ll construct that class
Then analyze the main function pictorially
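The wraparound lives in insert() and remove(); a condensed sketch along the book's lines:

class Queue {
    private long[] queArray;
    private int front = 0, rear = -1, nItems = 0;
    public Queue(int size) { queArray = new long[size]; }
    public void insert(long j) {
        if (rear == queArray.length - 1) rear = -1;   // wrap rear around to the start
        queArray[++rear] = j;
        nItems++;
    }
    public long remove() {
        long temp = queArray[front++];
        if (front == queArray.length) front = 0;      // wrap front around to the start
        nItems--;
        return temp;
    }
    public boolean isEmpty() { return (nItems == 0); }
}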
207
Queue theQueue = new Queue(5);
208
theQueue.insert(10);
209
theQueue.insert(20);
210
theQueue.insert(30);
211
theQueue.insert(40);
213
theQueue.remove();
214
theQueue.remove();
215
theQueue.remove();
216
theQueue.insert(50);
217
theQueue.insert(60);
218
theQueue.insert(70);
219
theQueue.insert(80);
220
Remove and print…
while (!theQueue.isEmpty()) {
    long n = theQueue.remove();
    System.out.print(n + " ");
}
221
Queues: Evaluation
Some implementations remove nItems
Front and rear indices alone then determine whether the queue is full
or empty, and its size
But a full queue and an empty queue can look the same (why?)
Additional overhead when determining size (why?)
Can remedy these by making the array one slot larger than the max number
of items
Efficiency
Same as stack:
Insert: O(1), only at the rear
Remove: O(1), only at the front
Access: O(1), only at the front
226
Priority Queues
Like a Queue
Has a front and a rear
Items are removed from the front
Difference
No longer FIFO
Items are ordered
We have seen ordered arrays. A priority queue is essentially
an ‘ordered queue’
227
Priority Queue Implementation
Almost NEVER use arrays. Why?
Application in Computing
Programs with higher priority, execute first
Print jobs can be ordered by priority
Nice feature: The min (or max) item can be found in O(1)
time
228
(Time Pending) Java Implementation
Page 147
Biggest difference will be the insert() function
Analysis
delete() - O(1)
insert() - O(n) (again, since arrays are used)
findMin() - O(1) if arranged in ascending order
findMax() – O(1) if arranged in descending order
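A sketch of that insert(), keeping the internal array in descending order so the minimum sits at the high end and remove() is simply return queArray[--nItems]:

public void insert(long item) {
    int j;
    for (j = nItems - 1; j >= 0; j--) {       // walk down from the small end
        if (item > queArray[j])
            queArray[j + 1] = queArray[j];    // shift smaller items toward the end
        else
            break;                            // found where item belongs
    }
    queArray[j + 1] = item;                   // drop it into the gap
    nItems++;
}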
229
Parsing Arithmetic Expressions
A task that must be performed by devices such as computers
and calculators
Parsing is another word for analyzing, that is, piece by piece
230
How it’s done…
1. Transform the arithmetic expression into postfix notation
Operators follow their two operands, i.e.
3+4 = 34+ (in postfix)
2*(3+4) = 234+* (in postfix)
May seem silly, but it makes the expression easier to evaluate
with a stack
231
Some practice
Convert the following to postfix:
3*5
3+8*4 (remember the rules of precedence!)
(3+4)*(4+6)
232
Translating infix to postfix
Think conceptually first. How do we evaluate something
like: 2*(3+4) to get 14?
Read left to right
When we’ve read far enough to evaluate two operands and an
operator - in the above case, 3+4
Evaluate them: 3+4=7
Substitute the result: 2*7 = 14
Repeat as necessary
233
Parsing in our Heads
2*(3+4)
We have to evaluate anything in parentheses before using it
Read Parsed
2 2
2* 2*
2*( 2*(
2*(3 2*(3
2*(3+ 2*(3+
2*(3+4) 2*(3+4)
2*7
14
234
Precedence
3+4*5
Note here we don’t evaluate the ‘+’ until we know what follows
the 4 (a ‘*’)
So the ‘parsing’ proceeds like this:
Read Parsed
3 3
+ 3+
4 3+4
* 3+4*
5 3+4*5
3+20
23
235
Summary
We go forward reading operands and operators
When we have enough information to apply an operator, go
backward and recall the operands, then evaluate
Sometimes we have to defer evaluation based on precedence
236
Infix to Postfix: Algorithm
Start with your infix expression, and an empty postfix string
Infix: 2*(3+4) Postfix:
Go through the infix expression character-by-character
For each operand:
Copy it to the postfix string
For each operator:
Copy it at the ‘right time’
When is this? We’ll see
237
Example: 2*(3+4)
Read   Postfix   Comment
2      2         Operand
*      2         Operator
(      2         Operator
3      23        Operand
+      23        Operator
4      234       Operand
)      234+      Saw ), copy +
       234+*     Copy remaining ops
238
Example: 3+4*5
Read   Postfix   Comment
3      3         Operand
+      3         Operator
4      34        Operand
*      34        Operator
5      345       Operand
       345*      Saw 5, copy *
       345*+     Copy remaining ops
239
Rules on copying operators
You cannot copy an operator to the postfix string if:
It is followed by a left parenthesis ‘(‘
It is followed by an operator with higher precedence (i.e., a ‘+’
followed by a ‘*’)
If neither of these are true, you can copy an operator once
you have copied both its operands
240
How can we use a stack?
Suppose we have our infix expression, empty postfix string
and empty stack S. We can have the following rules:
If we get an operand, copy it to the postfix string
If we get a ‘(‘, push it onto S
If we get a ‘)’:
Keep popping S and copying operators to the postfix string until either S
is empty or the item popped is a ‘(‘
Any other operator:
If S is empty, push it onto S
Otherwise, while S is not empty and the top of S is not a ‘(‘ or an
operator of lower precedence, pop S and copy to the postfix string
Push operator onto S
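Those rules translate into a short method; a sketch for single-digit operands and the four basic operators (prec() is a helper of our own, not from the book):

static int prec(char op) { return (op == '+' || op == '-') ? 1 : 2; }

public static String toPostfix(String infix) {
    StringBuilder out = new StringBuilder();
    char[] s = new char[infix.length()];             // the operator stack
    int top = -1;
    for (char ch : infix.toCharArray()) {
        if (Character.isDigit(ch)) out.append(ch);   // operand: copy straight out
        else if (ch == '(') s[++top] = ch;
        else if (ch == ')') {
            while (top >= 0 && s[top] != '(')
                out.append(s[top--]);                // copy operators back to the '('
            top--;                                   // discard the '(' itself
        } else {                                     // operator: + - * /
            while (top >= 0 && s[top] != '(' && prec(s[top]) >= prec(ch))
                out.append(s[top--]);                // pop equal/higher precedence first
            s[++top] = ch;
        }
    }
    while (top >= 0) out.append(s[top--]);           // copy remaining operators
    return out.toString();
}

toPostfix("2*(3+4)") returns "234+*" and toPostfix("3+4*5") returns "345*+", matching the tables above.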
242
Evaluating postfix expressions
If we go through the trouble of converting to postfix, there’s
got to be a reason, right?
Well, there is! The resulting expression is much easier to
evaluate, once again using a stack
244
Why easier?
It is clear what operators go with which operands
Order of operations is enforced – removed from our concern
No parentheses to worry about
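A sketch of the evaluation loop for single-digit operands (cf. pages 169-172): push operands; on an operator, pop two, apply it, push the result:

public static int evaluatePostfix(String postfix) {
    int[] s = new int[postfix.length()];      // the operand stack
    int top = -1;
    for (char ch : postfix.toCharArray()) {
        if (Character.isDigit(ch)) {
            s[++top] = ch - '0';              // operand: push its numeric value
        } else {
            int b = s[top--];                 // popped in reverse order:
            int a = s[top--];                 // b was pushed after a
            switch (ch) {
                case '+': s[++top] = a + b; break;
                case '-': s[++top] = a - b; break;
                case '*': s[++top] = a * b; break;
                case '/': s[++top] = a / b; break;
            }
        }
    }
    return s[top];                            // the final value left on the stack
}

For example, evaluatePostfix("234+*") returns 14.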
245
Java Implementations
Infix->Postfix, 161-165
Postfix Evaluator, 169-172
Time pending, let’s check them out
Otherwise, please read through them
246
Linked Lists
247
Recall Arrays
Advantages
Access is fast – O(1)
Insertion is fast in an unordered array O(1)
Searching is fast in an ordered array – O(log n)
Because we can apply the binary search
Disadvantages
Deletion is slow – O(n)
Searching is slow in an unordered array – O(n)
Insertion is slow in an ordered array – O(n)
248
Recall stacks and queues
Not generally used for real-world data storage
Why?
249
A versatile data structure
The linked list
Second most commonly used behind arrays
We also saw last week that for stacks and queues, you could use
arrays as an underlying data structure
You also can use linked lists!
250
Several Types
Simple
Double-ended
Sorted
Doubly-linked
Lists with iterators
251
A Link
Data in linked lists are embedded in links
Each link consists of:
The data itself
A reference to the next link in the list, which is null for the last
item
252
The Link class
It makes sense to make Link its own class, since a list can
then just be a collection of Link objects:
This is sometimes called a self-referential class. Any theories
why?
class Link {
public int iData;
public Link next; // Does this cause trouble? Why not?
}
253
References
Remember in Java, all objects are references
That means the variable 'next' in each link just holds a memory address
A 'magic number' which tells us where the object is
References are always the same size (so no problem)
254
Memory
How would this look in memory then? Let’s draw it on the
board.
255
Recall the implication!
Access for linked lists is slow compared to arrays
Arrays are like rows of houses
They are arranged sequentially
So it’s easy to just find, for example, the third house
With linked lists, you have to follow links in the chain
The next references
How do we get the third element here:
256
Links of Records
We can have a link of personnel records:
class Link{
public String name;
public String address;
public int ssn;
public Link next;
}
257
Operations
Insertion
At the beginning (fast)
In the middle (slower, although still better than arrays)
Deletion
At the beginning (fast)
In the middle (slower, although still better than arrays)
Search
Similar to arrays, worst case we have to check all elements
258
LinkedList class
Start with:
A private Link to the first element
A constructor which sets this reference to null
A method isEmpty() which returns true if the list is empty
259
insertFirst(): O(1)
Accept a new integer (p. 188)
Create a new link
Change the new link’s next reference to the current first
Change first to reference the new link
We could not execute these last two in reverse. Why?
260
deleteFirst(): O(1)
Remove the first integer in the list (p. 188)
Just reset the first reference to first.next
261
displayList() – p. 189 O(n)
Use a reference current to iterate through the elements
Print the value
Set current to current.next
Stop when current becomes null
Before setting current to current.next:
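Putting insertFirst(), deleteFirst() and displayList() together with the Link class from earlier, a condensed sketch:

class LinkedList {
    private Link first;                  // reference to the first link (null if empty)
    public boolean isEmpty() { return first == null; }
    public void insertFirst(int d) {
        Link newLink = new Link();
        newLink.iData = d;
        newLink.next = first;            // new link points at the old first...
        first = newLink;                 // ...THEN first points at the new link
    }
    public Link deleteFirst() {
        Link temp = first;
        first = first.next;              // unlink; the old first becomes garbage
        return temp;
    }
    public void displayList() {
        for (Link current = first; current != null; current = current.next)
            System.out.print(current.iData + " ");
        System.out.println();
    }
}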
262
main() function
LinkedList theList = new LinkedList();
theList.insertFirst(22);
theList.insertFirst(44);
theList.insertFirst(66);
theList.insertFirst(88);
theList.displayList();
while (!theList.isEmpty())
theList.deleteFirst();
theList.displayList();
263
find() – p. 194 O(n)
Essentially the same idea as displayList()
Linearly iterate through the elements with a reference current
Repeatedly set current to current.next
Except this time, stop when you find the item!
Before setting current to current.next:
264
delete() – p. 194 O(n)
Pass a value stored in the list, and remove it
First we have to find it, at that point it will be in current
Set the previous element’s next reference to current.next
When we find the value:
265
main() function - #2
LinkedList theList = new LinkedList();
theList.insertFirst(22);
theList.insertFirst(44);
theList.insertFirst(66);
theList.insertFirst(88);
theList.displayList();
Link f = theList.find(44);
theList.delete(66);
theList.displayList();
266
Double-Ended Lists
Just like a regular linked list, except there are now two
references kept
One to the beginning (first)
And one to the end (last)
Enables easy insertion at both ends
You still cannot delete the last element any easier. Why?
You cannot change find() to start from the end. Why?
267
insertLast() – p. 199 O(1)
What does this look like now? Let’s see:
Create the new link with the new value (next=null)
Set last.next to reference the new link
Set last to reference the new link
Might we also have to set first? When?
268
main() function - #3
LinkedList theList = new LinkedList();
theList.insertFirst(22);
theList.insertFirst(44);
theList.insertFirst(66);
theList.insertLast(11);
theList.insertLast(33);
theList.insertLast(55);
theList.displayList();
269
Double-Ended Lists
Would we also have to modify delete()?
When? Let’s do it.
270
Efficiency: Summary
Fast insertion/deletion at ends: O(1)
Searching: O(n)
Deleting a specific item: O(n)
BUT, faster than arrays
You have equal O(n) for the search
But then an array requires an O(n) shift, where a list requires
reference copies – O(1)
Insertion at a specific point can be done, with similar results
271
Memory: Summary
A linked list uses (more or less) memory than an array?
Why?
272
Abstract Data Types (ADT)
A central feature of the Java programming language
Let’s review
What is a datatype?
What do we mean by abstraction?
What is an interface?
273
Data Types
Examples of data types: int, float, double
These are called primitive data types
When we refer to a datatype:
Characteristics of the data
Operations which you can perform on that data
Object-oriented programming defines classes
Which are also datatypes. They fit the description above.
274
Abstraction
Abstract: Considered apart from detailed specifications or
implementation.
Let’s ponder the following questions:
What is an analogy of abstraction in the English language?
How does abstraction equate to datatypes and operations?
How can we describe abstraction in the context of object-
oriented programming?
What was an example of abstraction that we saw in stacks and
queues?
275
Abstract Data Types (ADTs)
Idea: Represent a Data Structure by focusing on what it does
and ignoring how it does it.
We’ve seen this already with stacks and queues
Internally, they stored data as an array
But the user didn’t know this! All they saw:
push(), pop(), peek() in the case of the stack
insert(), remove(), front(), rear() in the case of the queue
The set of functions that a client of the class can use, is called
the interface.
We can represent stacks and queues using linked lists instead
of arrays. Let’s look at how to do it.
276
Revisiting the stack….
LIFO structure
Items are inserted, removed and accessed from the top
278
When to use which?
List is clearly the better choice when you (know or do not
know?) the number of elements that the stack or queue will
hold
Analyze: what are the tradeoffs
In the case of the queue, a linked list saves us the concern of
wraparound
Keeping track of two references, front and rear
Watching if they move too far in one direction
279
Summary: ADTs
In Software Engineering
It’s always important to consider the operations you want before you
determine details, like:
Memory
Implementation of functions
For example, the operations you desire will strongly determine the
data structure you use
First item? Last item? Item in a certain position?
280
Sorted Lists
Linked list where the data is maintained in sorted order
Useful in some applications
Same applications where you’d use a sorted array
But, insertion will be faster!
And, memory will be used more efficiently
But, a tad more difficult to implement
Let’s check them out…
281
insert() p. 214 O(n)
We haven’t looked at inserting in the middle. Let’s see how
it will be done:
theList.displayList();
theList.insert(10);
theList.insert(30);
theList.insert(50);
theList.displayList();
theList.remove();
theList.displayList();
283
Sorted Linked List: Efficiency
Insertion and deletion are O(n) for the search worst case
Cannot do a binary search on a sorted linked list, like we could
with arrays! Why not?
Minimum value can be found in O(1) time
If list is double-ended, the maximum can as well (why?)
Thus, good if an application frequently accesses minimum (or
maximum) item
For which type of queue would this help us?
Also good for a sorting algorithm!
n insertions, each requiring O(n) comparisons, so still O(n^2)
However, O(n) copies as opposed to O(n^2) with insertion sort
But twice the memory (why?)
284
Limitation: Previous element
Numerous times, we found the inability to access the
previous element inconvenient
Double-ended list and deleting the last element
Could not search from both ends
285
Our new Link class
class Link
{
public int iData;
public Link previous;
public Link next;
}
286
Pictorially…
Single-ended (‘L’ references the List)
287
Reverse Traversal O(n)
Forward traversal is the same as before
Use current to reference a Link, and repeatedly set it to
current.next
Backward traversal is new
It can only be done conveniently if the list is double-ended
Now we repeatedly set current to current.previous
See page 222
288
Java Implementation, p. 226
Methods
isEmpty(), check if empty
insertFirst(), insert at beginning
insertLast(), insert at end
insertAfter(), insert in the middle
deleteFirst(), delete at beginning
deleteLast(), delete at end
deleteKey(), delete in the middle
displayForward(), forward traversal
displayBackward(), backward traversal
289
isEmpty() O(1)
Simple function
Returns true if first is null.
290
insertFirst() O(1)
Steps
Create a new link
Set its next reference to first
Set first’s previous reference to the new link
Set first (and last if empty) to reference the new link
Before
After
291
insertLast() O(1)
Steps
Create a new link
Set its previous reference to last
Set last’s next reference to the new link
Set last (and first if empty) to reference the new link
Before
After
292
insertAfter() O(n)
Steps
Find the element in the list to insert after (current)
Set current.next’s previous reference to the new link
Set link’s next reference to current.next
Set current.next to the new link
Set the link’s previous reference to current
Before
After
293
deleteFirst() O(1)
Steps
Set first.next’s previous reference to null
Remember first.next could be null!!
Set first to first.next
Before
After
294
deleteLast() O(1)
Steps
Set last.previous’ next reference to null
Remember last.previous could be null!!
Set last to last.previous
Before
After
295
deleteKey() O(n)
Steps
Find the key, call it current
Set current.previous’ next reference to current.next
Set current.next’s previous reference to current.previous
Be sure to handle the case when either is null!! This would be equivalent to
deleteFirst() or deleteLast()
Before
After
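A sketch of deleteKey() with the null checks called out (first and last are the list's end references):

public Link deleteKey(int key) {
    Link current = first;
    while (current != null && current.iData != key)
        current = current.next;                  // linear search for the key: O(n)
    if (current == null) return null;            // key not in the list
    if (current.previous == null)                // deleting the first link
        first = current.next;
    else
        current.previous.next = current.next;
    if (current.next == null)                    // deleting the last link
        last = current.previous;
    else
        current.next.previous = current.previous;
    return current;
}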
296
displayForward() O(n)
Use a reference current to iterate through the elements
Initially equal to first, print the value
Set current to current.next
Stop when current becomes null
Before setting current to current.next:
297
displayBackward() O(n)
Use a reference current to iterate through the elements
Initially equal to last, print the value
Set current to current.previous
Stop when current becomes null
Before setting current to current.previous:
298
Iterators
What have we seen?
Ability to linearly traverse a list and find an item
What have we been missing?
Control over the items we traverse
class ListIterator {
    private Link current;
    public ListIterator(Link l) { current = l; }
    public Link getCurrent() { return current; }
    public void nextLink() { current = current.next; }
}
300
main() function - #4
LinkedList theList = new LinkedList();
ListIterator iter = new ListIterator(theList.getFirst());
301
Pictorially…
I can create multiple instances of ListIterators, and have their
member current reference various points in the list
302
Bidirectional Iterators
If we have a doubly-linked list, it’s easy.
Let’s add two methods to our previous iterator class:
One to access the previous Link prevLink()
303
Encapsulation
Let’s connect the components for LinkedList and ListIterator
I want to be able to construct a LinkedList, and return a
ListIterator referencing itself, through a function
getIterator(), i.e.:
305
ListIterator: p. 237
Let’s now change the ListIterator
Make it bidirectional
Contain a reference to the list, as opposed to a single link
Methods
ListIterator(): Pass a LinkedList reference and store it
reset(): Reset iterator to the beginning of the list
atEnd(): Check if we are at the end
nextLink(): Move to the next link
getCurrent(): Return the current link
insertAfter(): Insert a link after the link referenced
insertBefore(): Insert a link before the element referenced
deleteCurrent(): Delete the currently referenced link
306
deleteCurrent(): Notes
If we delete an item, where should the iterator now point?
We’ve deleted the item it was pointed to!
We don’t want to move it to the beginning
Concept of ‘locality’
Can’t move it to the previous item
No way to reset ListIterator.previous
Could if we had a doubly linked list
Must move it to the next item
307
atEnd(): Notes
Our implementation checks if the iterator points to the last
element. Tradeoffs:
Looping becomes awkward
Iterator is always pointing at a valid link, which is good for
reliability
Must be careful with iteration
Let’s say we used a loop to display data
First reset, then display, THEN loop:
Go to the next, and display again
Need one extra, because if we simply looped till iter.atEnd()
was true, we’d hit a null reference
308
Another example
Deleting all links that contain values that are multiples of three.
How can we use the iterator to do this?
309
Recursion
310
Final Exam
In case anyone’s making travel plans now…
Wednesday, December 10
12 noon – 3 PM
Location: TBA
311
Recursion
Definition
A programming technique where a function calls itself
Very effective technique in programming
In Java:
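For example, a recursive method for the triangular numbers (1 + 2 + … + n), along the lines of the book's triangle():

int triangle(int n) {
    if (n == 1)
        return 1;                     // base case
    else
        return n + triangle(n - 1);   // recursive call on a smaller problem
}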
316
Scope
In Java, each function call
creates a new ‘scope’
Each of which declares a
new version of n, which is
visible
Suppose we call triangle(5)
317
Recursive Methods
Characteristics:
They call themselves
When they call themselves, they do so to solve a smaller
problem
A base case must exist
What happens if it doesn’t?
318
Iteration
Anything recursive can be made iterative, using a while or
for loop:
int triangle(int n) {
    int total = 0;
    while (n > 0) {
        total += n;    // add n, n-1, n-2, ...
        n--;
    }
    return total;      // the sum 1 + 2 + … + n
}
319
Efficiency of Recursion
Recursion is very often simpler and easier to read
But, often is slower. Why?
Overhead of function calls
Sometimes, you can make redundant recursive calls
We’ll see an example of this with Fibonacci
320
Mathematical Induction
This can be a convenient way to represent a recursive
problem:
tri(n) = { 1              n = 1
         { n + tri(n-1)   n > 1
321
Example: Factorials
Let’s start by representing fact(n) by mathematical induction:
322
Factorial Scope
In Java, each function call
creates a new ‘scope’
Each of which declares a
new version of n, which is
visible
Suppose we call factorial(4)
323
Fibonacci Numbers
Mathematical Induction:
fib(n) = { 1                     n = 0
         { 1                     n = 1
         { fib(n-1) + fib(n-2)   n > 1
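Translated directly into Java. Note the redundancy mentioned a few slides back: fib(n-1) recomputes everything fib(n-2) computes, so the number of calls grows exponentially:

int fib(int n) {
    if (n == 0 || n == 1)
        return 1;                      // base cases
    return fib(n - 1) + fib(n - 2);    // two recursive calls: lots of repeated work
}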
325
Binary Search
Array has values 1-100
First search: Check element 50
50 > 33, so repeat on first half (1-49)
Second search: Check element 25
25 < 33, so repeat on second half (26-49)
Third search: Check element 37
37 > 33, so repeat on first half (26-36)
Fourth search: Check element 31
31 < 33, so repeat on second half (32-36)
Fifth search: Check element 34
34 > 33, so repeat on first half (32-33)
Sixth search: Check element 32
32 < 33, so repeat on second half (33)
Seventh search: Check element 33! Found.
So 7 comparisons. With linear search, it would've been 33.
326
Our implementation before was
iterative…
public int find(long key) {
int lower = 0;
int upper = nElems-1;
int curIn;
while (true) {
curIn = (lower + upper) / 2;
if (a[curIn] == key) return curIn;
else if (lower > upper) return -1;
else {
if (a[curIn] < key) lower = curIn + 1;
else upper = curIn - 1;
}
}
}
327
But we can also do it recursively!
If we think of binary search in these terms:
Start lower at 0, and upper at n-1
Let mid = (lower + upper) / 2
If arr[mid] = key, return mid # we’re done
else if lower > upper, return -1 # not found
else if arr[mid] > key:
perform binarysearch on arr[lower…mid-1]
else if arr[mid] < key:
perform binarysearch on arr[mid+1…upper]
331
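A sketch of that recursion in Java (assuming the array a and count nElems from the earlier class; the bounds check comes first here so we never index past the subarray):

private int recFind(long key, int lower, int upper) {
    if (lower > upper) return -1;              // empty range: not found
    int mid = (lower + upper) / 2;
    if (a[mid] == key) return mid;             // we're done
    else if (a[mid] > key)
        return recFind(key, lower, mid - 1);   // search lower half
    else
        return recFind(key, mid + 1, upper);   // search upper half
}
// Initial call: recFind(key, 0, nElems - 1)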
Anagrams
Involves producing all possible combinations of the letters of
a word.
Example: Anagram the word cat:
cat
cta
atc
act
tca
tac
Six possible combinations
332
Anagrams: General
In general, for a word of n letters, there will be n!
combinations assuming all letters are distinct
We saw for cat (3 letters), there were 6 possible
If some letter(s) repeat themselves, this will reduce the
number of combinations. Example, tat only has 3:
tat
att
tta
333
Anagram Algorithm
Anagram a word with n letters:
Anagram the rightmost n-1 letters
If n=2, display the word
Rotate all n letters
Repeat these steps n times
335
rotate() and doAnagram() function
Java implementation, page 266
We will write:
A rotate() function which moves each character one slot to the
left, and the first character in the last position
A recursive anagram() function which invokes rotate().
Base case: n=1, just return
Recursive step (do n times):
anagram(n-1)
display if n=2
rotate(n)
336
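A runnable sketch along these lines (details may differ from the page-266 version):

class AnagramDemo {
    static char[] arr = "cat".toCharArray();

    static void doAnagram(int newSize) {
        if (newSize == 1) return;                  // base case: nothing to do
        for (int i = 0; i < newSize; i++) {
            doAnagram(newSize - 1);                // anagram the rightmost n-1 letters
            if (newSize == 2) display();           // innermost two letters set: show the word
            rotate(newSize);                       // rotate all newSize letters
        }
    }
    static void rotate(int newSize) {              // left-shift the last newSize letters,
        int pos = arr.length - newSize;            // first of them moves to the end
        char temp = arr[pos];
        for (int i = pos + 1; i < arr.length; i++) arr[i - 1] = arr[i];
        arr[arr.length - 1] = temp;
    }
    static void display() { System.out.println(new String(arr)); }

    public static void main(String[] args) { doAnagram(arr.length); }
}

Running it prints the six arrangements from the cat example: cat, cta, atc, act, tca, tac.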
Output Produced
337
Towers of Hanoi
An ancient puzzle consisting of disks on pegs A, B and C
Start all disks on peg A
341
Base Case?
For TOH(n,A,B,C):
Well if there’s just one
disk (n=1), move from A
to C!
Java implementation,
page 278
Note: It’s just a few lines!
342
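A sketch of those few lines (peg names passed as characters):

void doTowers(int n, char from, char inter, char to) {
    if (n == 1) {                                  // base case: one disk
        System.out.println("Disk 1: " + from + " -> " + to);
        return;
    }
    doTowers(n - 1, from, to, inter);              // move n-1 disks out of the way
    System.out.println("Disk " + n + ": " + from + " -> " + to);
    doTowers(n - 1, inter, from, to);              // move them onto the big disk
}
// doTowers(3, 'A', 'B', 'C') solves the 3-disk puzzle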
Complexity
For n disks:
(n disks) - 1st call, 2 recursive calls (n disks)
(n-1 disks) Two 2nd calls, 2 recursive calls
(n-2 disks) Four 3rd calls, 2 recursive calls
…
(1 disk) Many nth calls, base case
Let’s draw the tree
See why, this is too expensive for large numbers of disks?
Old legend: In remote India temple, monks continuously work
at solving this problem with 64 disks and 3 diamond towers
The world ends when they are finished
No worries, it will take forever anyway… :)
343
Number of Operations
Each recursive call generates two recursive calls, and a
constant number of operations (call it c)
First call: c
Two second calls, times c: 2*c
Four third calls, times c: 4*c
…
2^(n-1) nth calls, times c: 2^(n-1)*c
Total: (2^n - 1)*c -> O(2^n)
347
Merge…
Whichever one we chose, move one spot to the right in that
subarray and repeat
348
Keep going…
349
And going…
350
And going…
351
And going…
352
Get the idea?
353
Few more…
354
Put in the rest…
When we get to the end of one subarray, just insert the rest
of the other.
355
Finally…
We’re done when the temporary array is full
356
So now, we know…
If we have two sorted subarrays, we can merge them to sort
the entire array. And we can do it in O(n) time.
Just one comparison for each of the n elements
357
Pictorially…
358
So conceptually, what must we do?
mergesort(A, n): # Sort an array of size n
mergesort(first half of A, n/2)
mergesort(second half of A, n/2)
merge(first half of A, second half of A)
359
Let’s add a merge() procedure to
this class. (p. 289)
class DArray {
private long[] theArray;
private int nElems;
public DArray(int max) {
theArray = new long[max];
nElems = 0;
}
public void insert(long value) {
theArray[nElems] = value;
nElems++;
}
}
360
What merge() accepts
A workspace array of size n
The lower, middle and upper indices of theArray to merge
First half is index lower to index (middle-1)
Second half is index middle to index upper
So n=upper-lower+1
361
Now write mergesort()
Our recursive mergesort will accept:
Workspace of size n, and lower/upper indices of theArray to
sort
Initial call will pass an empty workspace, 0, and nElems-1.
362
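A sketch of the pair of methods, using theArray/nElems from the DArray class above (the book's version on pp. 289-291 differs in details):

private void recMergeSort(long[] workSpace, int lower, int upper) {
    if (lower == upper) return;                  // one element: already sorted
    int mid = (lower + upper) / 2;
    recMergeSort(workSpace, lower, mid);         // sort first half
    recMergeSort(workSpace, mid + 1, upper);     // sort second half
    merge(workSpace, lower, mid + 1, upper);     // merge the two halves
}
private void merge(long[] workSpace, int lowPtr, int highPtr, int upperBound) {
    int j = 0;                                   // workspace index
    int lowerBound = lowPtr;
    int mid = highPtr - 1;                       // end of the first half
    int n = upperBound - lowerBound + 1;         // number of items
    while (lowPtr <= mid && highPtr <= upperBound)         // take the smaller front item
        workSpace[j++] = (theArray[lowPtr] < theArray[highPtr])
                         ? theArray[lowPtr++] : theArray[highPtr++];
    while (lowPtr <= mid) workSpace[j++] = theArray[lowPtr++];          // leftovers: first half
    while (highPtr <= upperBound) workSpace[j++] = theArray[highPtr++]; // or second half
    for (j = 0; j < n; j++) theArray[lowerBound + j] = workSpace[j];    // copy back
}
// Initial call: recMergeSort(new long[nElems], 0, nElems - 1)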
Complexity
Every call makes two recursive calls, each with n/2 copies
First call: n copies, and generates:
Two recursive calls at (n/2) copies each, which generate:
Four recursive calls at (n/4) copies each
…
n recursive calls at (n/n) copies each
363
Total number of operations
n + 2(n/2) + 4(n/4) +…. + n(n/n)
= n + n + …. + n
= (log n + 1) * n
= n log n + n
O(n log n)
Best so far!!
364
Advanced Sorting
365
Radar
Midterm
Two weeks from this Friday on 3/27
In class
Closed book
366
Improved Sorting Techniques
What we have seen
Bubble, selection and insertion
Easy to implement
Slower: O(n^2)
Mergesort
Faster: O(n log n)
More memory (temporary array)
368
Recall Insertion Sort....
A subarray to the left is
‘partially sorted’
Start with the first element
The player immediately to
the right is ‘marked’.
The ‘marked’ player is
inserted into the correct
place in the partially sorted
array
Remove first
Marked player ‘walks’ to
the left
Shift appropriate elements
until we hit a smaller one
369
The problem
If a small item is very far
to the right
Like in this case ->
You must shift many
intervening large items
one space to the right
Almost N copies
Average case N/2
N items, N^2/2 copies
Better if:
Move a small item many spaces, without shifting
370
Remember
What made insertion
sort the best of the basic
sorts?
If the array is almost
sorted, O(n)
371
The “Almost Sort” step
Say we have a 10-element array:
60 30 80 90 0 20 70 10 40 50
Sort indices 0, 4, and 8:
0 30 80 90 40 20 70 10 60 50
Sort indices 1, 5, and 9:
0 20 80 90 40 30 70 10 60 50
Sort indices 2, 6:
0 20 70 90 40 30 80 10 60 50
Sort indices 3, 7
0 20 70 10 40 30 80 90 60 50
372
The “Almost Sort” step
This is called a “4-sort”:
60 30 80 90 0 20 70 10 40 50
0 30 80 90 40 20 70 10 60 50
0 20 80 90 40 30 70 10 60 50
0 20 70 90 40 30 80 10 60 50
0 20 70 10 40 30 80 90 60 50
Once we’ve done this, the array is almost sorted, and we can
run insertion sort on the whole thing
Should be about O(n) time
373
Interval Sequencing
4-sort was sufficient for a 10-element array
For larger arrays, you’ll want to do many of these to achieve
an array that is really almost sorted. For example for 1000
items:
364-sort
121-sort
40-sort
13-sort
4-sort
insertion sort
374
Knuth’s Interval Sequence
How to determine?
Knuth’s algorithm:
Start at h=1
Repeatedly apply the function h=3*h+1, until you pass the
number of items in the array:
h = 1, 4, 13, 40, 121, 364, 1093….
Thus the previous sequence for a 1000-element array
375
Why is it better?
When h is very large, you are sorting small numbers of
elements and moving them across large distances
Efficient
When h is very small, you are sorting large numbers of
elements and moving them across small distances
Becomes more like traditional insertion sort
But each successive sort, the overall array is more sorted
So we should be nearing O(n)
376
Let’s do our own example…
Sort these fifteen elements:
8 10 1 15 7 4 12 13 2 6 11 14 3 9 5
377
Java Implementation, page 322
What our function needs to do:
Initialize h properly
Have an outer loop start at outer=h and count up
Have an inner loop which sorts outer, outer-h, outer-2h, etc.
For example, if h is 4:
We must sort (8, 4, 0), and (9, 5, 1), etc.
378
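A sketch of the whole method, with Knuth's sequence computed up front (theArray/nElems assumed as before; the book version is on p. 322):

public void shellSort() {
    int h = 1;
    while (h <= nElems / 3) h = h * 3 + 1;       // 1, 4, 13, 40, ... up to n/3
    while (h > 0) {                              // h-sort, then shrink the interval
        for (int outer = h; outer < nElems; outer++) {
            long temp = theArray[outer];         // the 'marked' item
            int inner = outer;
            while (inner > h - 1 && theArray[inner - h] >= temp) {
                theArray[inner] = theArray[inner - h];   // shift larger items right
                inner -= h;
            }
            theArray[inner] = temp;              // insert the marked item
        }
        h = (h - 1) / 3;                         // previous Knuth interval
    }
}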
Other Interval Sequences
Original Shellsort:
h=h/2
Inefficient, leads to O(n^2)
Variation:
h = h / 2.2
Need extra effort to make sure we eventually hit h=1
Flamig’s Approach (yields similar to Knuth)
if (h < 5) h = 1; else h = (5*h-1) / 11;
381
Partitioning
Idea: Divide data into two groups, such that:
All items with a key value higher than a specified amount (the
pivot) are in one group
All items with a lower key value are in another
Applications:
Divide employees who live within 15 miles of the office with
those who live farther away
Divide households by income for taxation purposes
Divide computers by processor speed
384
Efficiency: Partitioning
O(n) time
left starts at 0 and moves one-by-one to the right
right starts at n-1 and moves one-by-one to the left
When left and right cross, we stop.
So we’ll hit each element just once
Example:
Unpartitioned: 42 89 63 12 94 27 78 10 50 36
Partitioned around pivot 36: 10 27 12 36 63 94 89 78 42 50
What does this imply about the pivot element after the
partition?
386
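A sketch of this scan in Java (theArray assumed as before; note this version doesn't place the pivot yet, which is what the next slides address):

public int partitionIt(int left, int right, long pivot) {
    int leftPtr = left - 1;                      // will move right
    int rightPtr = right + 1;                    // will move left
    while (true) {
        while (leftPtr < right && theArray[++leftPtr] < pivot) ;   // find item >= pivot
        while (rightPtr > left && theArray[--rightPtr] > pivot) ;  // find item <= pivot
        if (leftPtr >= rightPtr) break;          // pointers crossed: done
        swap(leftPtr, rightPtr);                 // exchange the misplaced pair
    }
    return leftPtr;                              // first index of the right group
}
private void swap(int i, int j) {
    long t = theArray[i]; theArray[i] = theArray[j]; theArray[j] = t;
}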
Placing the Pivot
Goal: Pivot must be in the leftmost position in the right
subarray
10 27 12 36 63 94 89 78 42 50
Our algorithm does not do this currently.
It currently will not touch the pivot
left increments till it finds an element < pivot
right decrements till it finds an element > pivot
So the pivot itself won’t be touched, and will stay on the right:
10 27 12 63 94 89 78 42 50 36
387
Options
We have this:
10 27 12 63 94 89 78 42 50 36
Our goal is the position of 36:
10 27 12 36 63 94 89 78 42 50
We could either:
Shift every element in the right subarray up (inefficient)
Just swap the leftmost element of the right subarray with the pivot! Better :)
We can do this because the right subarray is not in any
particular order
10 27 12 36 94 89 78 42 50 63
388
Swapping the Pivot
Just takes one more line to our Java method
Basically, a single call to swap()
Swaps A[end-1] (the pivot) with A[left] (the partition index)
389
Quicksort
The most popular sorting algorithm
For most situations, it runs in O(n log n)
Remember partitioning. It’s the key step. And it’s O(n).
391
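A sketch of the recursion, with the slide-389 pivot swap folded into partitionIt() so the pivot at the right end is placed between the two groups:

public void recQuickSort(int left, int right) {
    if (right - left <= 0) return;               // 0 or 1 elements: already sorted
    long pivot = theArray[right];                // rightmost element as the pivot
    int partition = partitionIt(left, right, pivot);
    recQuickSort(left, partition - 1);           // sort the smaller-than-pivot side
    recQuickSort(partition + 1, right);          // sort the larger-than-pivot side
}
public int partitionIt(int left, int right, long pivot) {
    int leftPtr = left - 1;
    int rightPtr = right;                        // the pivot itself sits at 'right'
    while (true) {
        while (theArray[++leftPtr] < pivot) ;              // pivot acts as a sentinel
        while (rightPtr > 0 && theArray[--rightPtr] > pivot) ;
        if (leftPtr >= rightPtr) break;
        swap(leftPtr, rightPtr);
    }
    swap(leftPtr, right);                        // the one extra line: place the pivot
    return leftPtr;
}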
Shall we try it on an array?
10 70 50 30 60 90 0 40 80 20
Let’s go step-by-step on the board
392
Best case…
We partition the array each time into two equal subarrays
Say we start with array of size n = 2i
We recurse until the base case, 1 element
Total: (i+1)*n = (log n + 1)*n -> O(n log n)
393
The VERY bad case….
If the array is inversely sorted.
Let’s see the problem:
90 80 70 60 50 40 30 20 10 0
What happens after the partition? This:
0 20 30 40 50 60 70 80 90 10
This is almost sorted, but the algorithm doesn’t know it.
It will then call itself on an array of zero size (the left
subarray) and an array of n-1 size (the right subarray).
Producing:
0 10 30 40 50 60 70 80 90 20
394
The VERY bad case…
In the worst case, we partition every time into an array of 0
elements and an array of n-1 elements
This yields O(n^2) time:
First call: Partition n elements, n operations
Second calls: Partition 0 and n-1 elements, n-1 operations
Third calls: Partition 0 and n-2 elements, n-2 operations
Draw the tree
Yielding:
Operations = n + n-1 + n-2 + … + 1 = n(n+1)/2 -> O(n^2)
395
Summary
What caused the problem was “blindly” choosing the pivot
from the right end.
In the case of a reverse sorted array, this is not a good choice
at all
396
Median-Of-Three Partitioning
Everytime you partition, choose the median value of the left,
center and right element as the pivot
Example:
44 11 55 33 77 22 00 99 101 66 88
Once you get to a very small subarray, you can just sort with
insertion sort
You can experiment a bit with ‘cutoff’ values
Knuth: n=9
399
(Time Pending) Java
Implementation
QuickSort with maximum optimization
Median-Of-Three Partitioning
Insertion Sort on arrays of size less than 9
400
Operation Count Estimates
For QuickSort
n=8: 30 comparisons, 12 swaps
n=12: 50 comparisons, 21 swaps
n=16: 72 comparisons, 32 swaps
n=64: 396 comparisons, 192 swaps
n=100: 678 comparisons, 332 swaps
n=128: 910 comparisons, 448 swaps
Assumptions
402
Assumption (for the radix sort examples that follow)
The assumption is: base 10, positive integers!
403
Example: On the Board
421 240 35 532 305 430 124
Remember: 35 has a 100s digit of zero!
404
Operations
n Elements
Copy each element once to a group, and once back again:
2n copies -> O(n)
Then you have to copy k times, where k is the maximum
number of digits in any value
So, 2*k*n copies -> O(kn)
Zero comparisons
407
Binary Trees
408
Binary Trees
A fundamental data structure
Combines advantages of arrays and linked lists
Fast search time
Fast insertion
Fast deletion
Moderately fast access time
409
Recall Ordered Arrays…
Their search time is faster, because there is some ‘ordering’
to the elements.
We can do binary search, O(log n)
Instead of linear search, O(n)
412
Traversing a Tree
Start at the root and traverse downward along its edges
Typically, edges represent some kind of relationship
We represent these by references
Just as in linked lists:
class Link {
int data;
Link next;
}
In a tree:
class Node {
int data;
Node child1;
Node child2;
…
}
413
Size of a Tree
Increases as you go down
Opposite of nature. :)
414
Binary Trees
A special type of tree
With this tree, nodes had varying numbers of children:
416
A Binary Tree
Each node thus has at most two children: a left child and a right child
What would the Java class look like?
417
Binary Trees: Terms
Path: Sequence of nodes connected by edges
Green line is a path from A to J
418
Binary Trees: Terms
Root: The node at the top of the tree
Can be only one (in this case, A)
419
Binary Trees: Terms
Parent: The node above. (B is the parent of D, A is the
parent of B, A is the grandparent of D)
420
Binary Trees: Terms
Child: A node below. (B is a child of A, C is a child of A, D
is a child of B and a grandchild of A)
421
Binary Trees: Terms
Leaf: A node with no children
In this graph: H, E, I, J, and G
422
Binary Trees: Terms
Subtree: A node's children, its children's children, etc.
The highlighted example is just one; there are many subtrees in this tree
423
Binary Trees: Terms
Visit: Access a node, and do something with its data
For example we can visit node B and check its value
424
Binary Trees: Terms
Traverse: Visit all the nodes in some specified order.
One example: A, B, D, H, E, C, F, I, J, G
425
Binary Trees: Terms
Levels: Number of generations a node is from the root
A is level 0, B and C are at level 1, D, E, F, G are level 2, etc.
426
Binary Trees: Terms
Key: The contents of a node
427
A Binary Search Tree
A binary tree, with the following characteristics:
The left child is always smaller than its parent
The right child is always larger than its parent
All nodes to the right are bigger than all nodes to the left
428
Integer Tree
Will use this class for individual nodes:
class Node {
public int data;
public Node left;
public Node right;
}
Let’s sketch the Java template for a binary search tree (page
375)
429
Example main() function
Page 275, with a slight tweak
Insert three elements: 50, 25, 75
Search for node 25
If it was found, print that we found it
If it was not found, print that we did not find it
430
Finding a node
What do we know?
For all nodes:
All elements in the left subtree are smaller
All elements in the right subtree are larger
431
Searching for a KEY
We’ll start at the root, and check its value
If the value = key, we’re done.
If the value is greater than the key, look at its left child
If the value is less than the key, look at its right child
Repeat.
432
Example
Searching for
element 57
433
Java Implementation – find()
Pages 377-378
434
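A sketch of find() in that spirit (root is assumed to be the tree class's reference to the top node):

public Node find(int key) {
    Node current = root;                          // start at the root
    while (current != null && current.data != key)
        current = (key < current.data) ? current.left    // smaller keys live left
                                       : current.right;  // larger keys live right
    return current;                               // the node, or null if not found
}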
Number of operations: Find
Typically about O(log n). Why?
436
Example
Inserting
element
45
437
Java Implementation – insert()
Page 380
438
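A sketch of insert(): walk down exactly as in find(), then attach the new node where the search runs off the tree:

public void insert(int key) {
    Node newNode = new Node();
    newNode.data = key;
    if (root == null) { root = newNode; return; }    // empty tree
    Node current = root, parent;
    while (true) {
        parent = current;
        if (key < current.data) {                    // go left
            current = current.left;
            if (current == null) { parent.left = newNode; return; }
        } else {                                     // go right
            current = current.right;
            if (current == null) { parent.right = newNode; return; }
        }
    }
}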
Traversing a Tree
Three Ways:
Inorder (most common)
Preorder
Postorder
439
Inorder Traversal
Visits each node of the tree in ascending order:
443
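A sketch of the recursion, just three lines of work: left subtree, then the node itself, then the right subtree. On a binary search tree this prints the keys in ascending order:

private void inOrder(Node localRoot) {
    if (localRoot == null) return;
    inOrder(localRoot.left);                   // 1. visit the left subtree
    System.out.print(localRoot.data + " ");    // 2. visit the node itself
    inOrder(localRoot.right);                  // 3. visit the right subtree
}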
Preorder Traversal
Prints all parents before children
Prints all left children before right children. So with this tree:
446
Postorder Traversal
Prints all children before parents
Prints all left children before right children. So with this tree:
449
Finding the Minimum
In a binary search tree, this is always the leftmost child of the
tree! Easy. Java?
Start at the root, and traverse until you have no more left children
450
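A sketch; maximum() is the mirror image, following right children instead:

public Node minimum() {
    Node current = root;
    Node last = null;
    while (current != null) {
        last = current;               // remember the last real node
        current = current.left;       // keep going left
    }
    return last;                      // leftmost node (null if the tree is empty)
}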
Finding the Maximum
In a binary search tree, this is also easy – it’s the rightmost
child in the tree
Start at the root, traverse until there are no more right children
Java?
451
Deletion
This is the challenging one
First, find the element you want to delete
Once you’ve found it, one of three cases:
1. The node has no children (easy)
2. The node has one child (decently easy)
3. The node has two children (difficult)
452
Case 1: No Children
To delete a node with no children:
Find the node
Set the appropriate child field in its parent to null
Example: Removing 7 from the tree below
453
Java Implementation
Start from page 390-391
Find the node first
As we go through, keep track of:
The parent
Whether the node is a left or right child of its parent
454
Case 2: One Child
Assign the deleted node’s
child as the child of its
parent
Essentially, ‘snip out’ the
deleted node from the
sequence
Example, deleting 71
from this tree:
455
Java Implementation
Pages 392-393
Two cases to handle. Either:
The right child is null
If the node is a left child, set its parent’s left child to the node’s left child
If the node is a right child, set its parent’s right child to the node’s left
child
The left child is null
If the node is a left child, set its parent’s left child to the node’s right child
If the node is a right child, set its parent’s right child to the node’s right
child
456
Case 3: Two Children
Here’s the tough case.
Let’s see an example of why it’s complicated…
457
Case 3: Two Children
What we need is the next highest node to replace 25.
For example, if we replaced 25 by 30, we’re set.
458
Case 3: Two Children
We call this the inorder successor of the deleted node
i.e., 30 is the inorder successor of 25. This replaces 25.
459
Inorder successor
The inorder successor is
always going to be the
smallest element in the
right subtree
In other words, the
smallest element that is
larger than the deleted
node.
460
Finding the inorder successor
Algorithm to find the inorder
successor of some node X:
First go to the right child of X
Then keep moving to left
children
Until there are no more
Then we are at the inorder
successor
461
Removing the successor
We must remove the
successor from its current
spot, and place it in the spot
of the deleted node
462
Removing the successor
If the successor is not the deleted node's right child, it's tougher
We must add two steps (1 and 2 below):
1. Set the successor's parent's left to the successor's right
2. Set the successor's right to the deleted node's right
3. Set the successor's left to the deleted node's left (as before)
4. Replace the deleted node by the successor (as before)
463
Java Implementation (Time
Pending)
getSuccessor() function, page 396
Accepts a node
First goes to its right child
Then goes to the left child
Does this until no more left children
464
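A sketch of just the traversal part (the p. 396 version also rewires links for the deletion case):

private Node getSuccessor(Node delNode) {
    Node current = delNode.right;          // step 1: go to the right child
    Node successor = delNode;
    while (current != null) {
        successor = current;               // step 2: then keep moving left
        current = current.left;
    }
    return successor;                      // smallest node in the right subtree
}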
Efficiency: Binary Search Trees
Note that:
Insertion, deletion, searching all involved visiting nodes of the
tree until we found either:
The position for insertion
The value for deletion
The value we were searching for
For any of these, we would not visit more than the number of
levels in the tree
Because for every node we visit, we check its value, and if we’re
not done, we go to one of its children
465
Efficiency: Binary Search Trees
So for a tree of n nodes, how many levels are there:
Nodes Levels
1 1
3 2
7 3
15 4
31 5
….
1,073,741,823 30
It's actually log(n) + 1!
466
So…
All three of our algorithms: insertion, deletion, and
searching take O(log n) time
We go through log n + 1 levels, each time with one
comparison.
At the point of insertion or deletion, we just manipulate a
constant number of references (say, c)
That’s independent of n
467
Compare to Arrays
Take 1 million elements and delete an element in the middle
Arrays -> Average case, 500 thousand shifts
Binary Search Trees -> 20 or fewer comparisons
Similar case when comparing with insertion into an ordered
array
468
Huffman Codes
An algorithm to ‘compress’ data
Purpose:
Apply a compression algorithm to take a large file and store it as
a smaller set of data
Apply a decompression algorithm to take the smaller
compressed data, and get the original back
469
Quick Lesson In Binary
Generally for an n-digit number in binary:
b(n-1) … b2 b1 b0 = b(n-1)*2^(n-1) + … + b2*2^2 + b1*2^1 + b0*2^0
Internal Storage
01001001 01001100 01001111 01010110 01000101
01010100 01010010 01000101 01000101 01010011
472
Underlying Motivation
Why use the same number of bits to store all characters?
For example, E is used much more often than Z
So what if we only used two bits to store E
And still used the eight to store Z
We should save space.
474
Most Used Characters
The most used characters will vary by file
Computing Huffman Codes first requires computing the
frequency of each character, for example for “SUSIE SAYS IT
IS EASY”:
CHAR COUNT
A 2
E 2
I 3
S 6
T 1
U 1
Y 2
Space 4
Linefeed 1
475
Computing Huffman Codes
Huffman Codes are varying bit lengths depending on
frequency (remember S had the highest freq at 6):
CHAR CODE
A 010
E 1111
I 110
S 10
T 0110
U 01111
Y 1110
Space 00
Linefeed 01110
476
Coding “SUSIE SAYS IT IS EASY”
CHAR CODE
A 010
E 1111
I 110
S 10
T 0110
U 01111
Y 1110
Space 00
Linefeed 01110
10 01111 10 110 1111 00 10 010 1110 10 00 110 0110 00
110 10 00 1111 010 10 1110 01110 (65 bits)
Before, it would've been 21*8 = 168 bits!
477
A Huffman Tree
Idea:
Each character appears as
a leaf in the tree
The higher the frequency
of a character, the higher
up in the tree it is
Number outside a leaf is
its frequency
Number outside a non-
leaf is the sum of all child
frequencies
478
A Huffman Tree
Decoding a message:
For each bit, go right (1)
or left (0)
Once you hit a character,
print it, go back to the
root and repeat
Example: 0100110
Start at root:
Go L(0), R(1), L(0), get A
Go back to root
Go L(0), R(1), R(1), L(0),
get T
479
Encoding
Decoding is thus easy
when you have this tree
However, we must
produce the tree
480
First step
Start from the leaves, which contain single characters and
their associated frequencies
Store these nodes in a priority queue, ordered by frequency
481
Next
Take the left two elements, and form a subtree
The two leaves are the two characters
The parent is empty, with a frequency as the sum of its two
children
Put this back in the priority queue, in the right spot
482
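A sketch of this combining loop, with java.util.PriorityQueue standing in for our own priority queue (HuffNode and its field names are mine):

class HuffNode {
    int freq;                                  // leaf: character count; internal: sum of children
    char ch;                                   // only meaningful for leaves
    HuffNode left, right;
    HuffNode(int f, char c) { freq = f; ch = c; }
}

static HuffNode buildTree(java.util.PriorityQueue<HuffNode> pq) {
    // pq was created ordered by frequency, e.g. new PriorityQueue<>((a, b) -> a.freq - b.freq),
    // and pre-loaded with one leaf per character
    while (pq.size() > 1) {
        HuffNode a = pq.poll();                // the two lowest-frequency trees
        HuffNode b = pq.poll();
        HuffNode parent = new HuffNode(a.freq + b.freq, '\0');
        parent.left = a;
        parent.right = b;
        pq.add(parent);                        // back into the queue, in the right spot
    }
    return pq.poll();                          // the single remaining tree is the Huffman tree
}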
Continue this process…
Again, adjoin the leftmost two elements (now we actually
adjoin a leaf and a subtree):
483
Keep going…
Adjoin leaves Y (2) and E (2), this forms a subtree with root
frequency of 4
484
Continue until we have one tree…
485
Continue until we have one tree…
486
Continue until we have one tree…
487
Continue until we have one tree…
488
Our final tree
Note we were able to construct this from the frequency table
489
Obtaining the Huffman Code from
the tree
Once we construct the tree,
we still need the Huffman
Code to encode the file
No way around this: we
have to start from the root
and traverse all possible
paths to leaf nodes
As we go along, keep track
of if we go left (0) or right
(1)
So A went left (0), then
right (1), then left (0)
490
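A sketch of that traversal, building the code table recursively (names are mine; every internal node of a Huffman tree has both children):

static void collectCodes(HuffNode node, String path, java.util.Map<Character, String> table) {
    if (node.left == null && node.right == null) {   // a leaf: its path is its code
        table.put(node.ch, path);
        return;
    }
    collectCodes(node.left, path + "0", table);      // left edge contributes a 0
    collectCodes(node.right, path + "1", table);     // right edge contributes a 1
}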
Code Table
When we get the Huffman Code for each character, we insert it into a Code Table, as shown to the right
Decoding a File
Read the compressed file bit-by-bit
Use the Huffman Tree to get each character
492
Red-Black Trees
493
Recall Binary Trees
What were the advantages?
494
Unbalanced binary trees
Let’s form two binary search trees
One inserting this sequence:
10 20 30 40 50 60 70 80 90 100
Another inserting this sequence:
100 90 80 70 60 50 40 30 20 10
495
Red-Black Trees
Binary search trees, with some added features
These ‘added features’ make sure that the tree is balanced
Which we’re never guaranteed with binary search trees!
Thus keeping:
Insertion
Deletion
Searching
all O(log n)
Rotations, in general:
Raise some nodes and lower others to help balance the tree
Ensure that we do not violate any characteristics of a binary
search tree
Thus all nodes to the left must still have values smaller
All nodes to the right must still have values larger
498
Rotations Involving Many Nodes
A three node rotation was easy.
Let’s look at a more complicated one.
499
Literally, this is what we must do…
For a right rotation, the
top node must have a left
child
505
Color? Really?
How would we include ‘color’ as a characteristic of a Node?
507
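One simple way (the field name is an assumption):

class Node {
    public int data;
    public Node left;
    public Node right;
    public boolean isRed = true;   // new nodes are inserted red
}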
Red-Black Rules: #2
The root of the tree MUST be black.
508
Red-Black Rules: #3
If a node is red, its children MUST be black
The converse is NOT true; black nodes can have black or red
children
509
Red-Black Rules: #4
Every path from the root to a leaf or null child must have the
same number of black nodes.
510
Summary
These are the four rules:
All nodes are either red or black.
The root must be black.
A red node can only have black children.
All paths from the root to a leaf or null child have the same number of black nodes
513
Color Flip
Now suppose we insert 15
(initially red). What rule is
broken?
516
Putting in (16)…
We initially have this
situation ->
If we make (16)
black, what is
violated?
517
Putting in (16)…
We initially have this
situation ->
If we make (15)
black, what is
violated?
518
Putting in (16)…
We initially have this
situation ->
519
Need rotations
To fix this situation:
We have to color flip, to get rid of the rule (3) violation
But we also have to rotate to fix other problems
So, we need both color flips and rotations.
520
How do we do it?
Let’s reduce it to a general case
We initially insert (16) as red
General:
Let X be a node that causes a rule violation
Let P be the parent of X
Let G be the grandparent of X (the parent of P)
521
This Example
X is (16)
P is (15) - the parent of X (16)
G is (17) – the grandparent of X (16), parent of P (15)
522
Insertion: Color Flips
To find the point of insertion, you have to start at the root
and go down the tree
If you encounter a black node with two red children:
Flip both children’s color to black
Flip the black node to red, unless it’s the root (then keep it
black)
523
Color Flip: Revisit
Again, let’s look at
inserting (15), red
We start at (13) which is
black, and see it has two
red children (8) and (17)
Flip (8) and (17) to black
Normally we’d flip (13) to
red, but it’s the root
Now go right to (17)
Go left, that’s for (15)
(17) is black, so we can just pop it in
524
Color Flip: Revisit
Again, let’s look at
inserting (16), red
Start at (13), go right
At (17), go left
At (15), go right – that’s
for (16)
525
When must we rotate…
So we’ve gone down the tree, flipped colors as necessary, and
gotten to the point of insertion for our new node, X
X is red
Call its parent P
526
Inside vs. Outside Grandchildren
X is an outside grandchild if:
X is a left child of P, and P is a left child of G, or…
X is a right child of P, and P is a right child of G
X is an inside grandchild if:
X is a right child of P, and P is a left child of G, or…
X is a left child of P, and P is a right child of G
527
Rotations Required
If X is an inside grandchild:
Flip the color of G
Flip the color of X
Rotate with P at the top, in the direction that raises X
Rotate with G at the top, in the direction that raises X
This is a perfect example, (16) is an inside grandchild of (17)
X is (16)
P is (15)
G is (17)
528
Step 1: Color Flips
Flip the color of X (16)
Flip the color of G (17)
529
Step 2: Rotate with P as the top
Rotate with P (15) as the top, in the direction that raises X
(16). In this case, it’s to the left
530
Step 3: Rotate with G as the top
Rotate with G (17) as the top, in the direction that raises X
(16). In this case, it’s to the right
532
Summary: Insertion
Start at the root, and find the point of insertion
Go right or left, just like an ordinary binary search tree
But as you descend, if you find a black node with two red children,
flip color
* IF YOU THEN HAVE A RED PARENT WITH RED CHILD, ROTATE
USING RULES BELOW *
At the point of insertion,
You’ll insert some node X as the child of P and grandchild of G
If P is black, done
If P is red, then:
If X is an outside grandchild, flip the colors of G and P and rotate with G as the
top in the direction that raises X
If X is an inside grandchild, flip the colors of G and X, and:
Rotate with P as the top in the direction that raises X
Rotate with G as the top in the direction that raises X
533
Example
Construct the red-black tree that results from inserting the
following elements:
10 20 30 40 50 60 70 80 90 100
Remember with binary search tree, this results in maximum
non-balance!
534
Another Example
Draw the red-black tree that results from inserting the
following elements:
1 6 8 11 13 15 17 22 25 27
535
Deletion: Red-Black Trees
This is very difficult to do
Remember: Deletion from a plain binary search tree is hard!
In Red-Black Trees:
You must delete just as you would from a binary search tree
PLUS, uphold the properties of Red-Black Trees
537
Efficiency: Searching
Because of the extra effort that we take with insertion and
deletion, a red black tree will always be balanced
For n nodes, no more than log(n)+1 levels
538
Efficiency: Insertion
Note: Insertion into a red-black tree will be (faster or
slower?) than a regular binary search tree
540
Final Comparison
Compare to binary search trees:
If your data is fairly random, a binary search tree will likely be
better
Of course, you’re playing the odds
But there is a penalty for inserting and deleting into a red-black tree, once
the point of insertion is found
Thus if the data would be fairly balanced in a binary search tree, it’s better
If your data is fairly sorted, a red-black tree will likely be better
May ask: why would our data ever be sorted?
What would be a structure for which a red-black tree would be good?
541
Other Balanced Trees
Just to be aware
AVL Trees
Instead of a color, each node stores the difference between the heights of
its left and right subtrees
This difference cannot be greater than 1
Similar penalties, advantages vs. binary search trees
A bit slower than red-black trees actually, so rarely used
Multiway or 2-3-4 Tree
Each node has left children and right children, with the same properties
as a binary search tree
Easier to keep balanced, but requires linear search through left children
when for example we ‘branch left’
However, if the number of left (or right) children is restricted to a
small number, not too bad
542
Hash Tables
543
Hash Tables: Overview
Provide very fast insertion and searching
Both are O(1)
Is this too good to be true?
Disadvantages
Based on arrays, so the size must be known in advance
Performance degrades when the table becomes full
No convenient way to sort data
Summary
Best structure if you have no need to visit items in order and you can predict the size of your database in advance.
544
Motivation
Let’s suppose we want to insert a key into a data structure,
where the key can fall into a range from 0 to m, where m is
very large
And we want to be able to find the key quickly
546
Hash Table
Idea
Provide easy searching and insertion by mapping keys to
positions in an array
This mapping is provided by a hash function
Takes the key as input
Produces an index as output
547
Hash Function: Example
The easiest hash function is the following:
H(key) = key % tablesize
H(key) now contains a value between 0 and tablesize-1
So if we inserted the following keys into a table of size 10: 13, 11456, 2001, 157
You probably already see potential for collisions
Patience, we'll come to it!
Index Value
0
1 2001
2
3 13
4
5
6 11456
7 157
8
9
548
What have we accomplished?
We have stored keys of an unpredictable, large range into a smaller data structure
And searching and inserting becomes easy!
Say we now insert element 207: H(207) = 7
We have a collision at position 7
Index Value
0
1 2001
2
3 13
4
5
6 11456
7 157
8
9
550
What have we learned?
If we use hash tables, we need the following:
Some way of handling collisions. We’ll study a couple ways:
Open addressing
Which has 3 kinds: linear probing, quadratic probing, and double
hashing
Separate chaining
551
Linear Probing
Presumably, you will have defined your hash table size to be 'safe'
As in, larger than the maximum amount of items you expect to
store
As a result, there should be some available cells
552
Linear Probing: Example
Again, say we insert element 207
H(207) = 207 % 10 = 7
This results in a collision with element 157
So we search linearly for the next available cell, which is at position 8
And put 207 there
Index Value
0
1 2001
2
3 13
4
5
6 11456
7 157
8 207
9
553
Linear Probing
Note: This complicates insertion and searching a bit!
For example, if we then inserted element 426, we would have to check three cells before finding a vacant one at position 9
You apply H(k), and probe!
Index Value
0
1 2001
2
3 13
4
5
6 11456
7 157
8 207
9
554
Linear Probing: Clusters
As the table to the right illustrates, linear probing also tends to result in the formation of clusters
Where long runs of cells in a row are populated
And other large stretches of cells are sparse
This becomes worse as the table fills up
Degrades performance
Index Value
0
1 2001
2
3 13
4
5
6 11456
7 157
8 207
9 426
555
Linear Probing: Clusters
Lafore: A cluster is like a 'faint scene' at a mall
Initially, the first arrivals come
Later arrivals come because they wonder why everyone was in one place
As the crowd gets bigger, more are attracted
Same thing with clusters!
Items that hash to a value in the cluster will add to its size
Index Value
0
1 2001
2
3 13
4
5
6 11456
7 157
8 207
9 426
556
Linear Probing
One option: use a bigger table
If the table has 20 slots instead of 10: H(k) = k % 20
More memory used
But, less clustering
Index Value
0
1 2001
2
…
17 157
18
19
557
Linear Probing
Linear probing is the simplest way to handle collisions, and is
thus worthy of explanation
Let’s look at the Java implementation on page 533
This assumes a class with member variables:
hashArray (the hash table)
arraySize (the size of the hash table)
Assume an empty slot contains -1
We’ll construct:
hashFunc()
find()
insert()
delete()
558
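A sketch of the first three under those assumptions (hashArray of long, arraySize its length, -1 meaning empty; assumes the table never completely fills):

private int hashFunc(long key) {
    return (int) (key % arraySize);              // simple modulo hash
}
public int find(long key) {
    int hashVal = hashFunc(key);
    while (hashArray[hashVal] != -1) {           // stop at the first empty cell
        if (hashArray[hashVal] == key) return hashVal;   // found: return the index
        hashVal = (hashVal + 1) % arraySize;     // step ahead, with wraparound
    }
    return -1;                                   // not found
}
public void insert(long key) {
    int hashVal = hashFunc(key);
    while (hashArray[hashVal] != -1)             // probe until a vacant cell
        hashVal = (hashVal + 1) % arraySize;
    hashArray[hashVal] = key;
}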
Quadratic Probing
The main problem with linear probing was its potential for
clustering
Quadratic probing attempts to address this
Instead of linearly searching for the next available cell
i.e. for hash x, search cell x+1, x+2, x+3, x+4….
Search quadratically
i.e. for hash x, search cell x+1, x+4, x+9, x+16, x+25…
Idea
On a collision, initially assume a small cluster and go to x+1
If that’s occupied, assume a larger cluster and go to x+4
If that’s occupied assume an even larger cluster, and go to x+9
559
Quadratic Probing: Example
Returning to our old example with inserting 207
H(207) = 207 % 10 = 7
This results in a collision with element 157
In this case, slot 7 is occupied but slot 7+1=8 is open, so we put it there
Index Value
0
1 2001
2
3 13
4
5
6 11456
7 157
8
9
560
Quadratic Probing
Now, if we insert 426
H(426) = 426 % 10 = 6
Which is occupied
So is 6+1=7; probing quadratically, 6+4=10 wraps around to slot 0, which is open
Index Value
0
1 2001
2
3 13
4
5
6 11456
7 157
8 207
9
561
Quadratic Probing
We have achieved a decrease in the cluster count
Clusters will tend to be smaller and more sparse
Instead of having large clusters and largely sparse areas
Thus quadratic probing got rid of what we call primary clustering.
Index Value
0 426
1 2001
2
3 13
4
5
6 11456
7 157
8 207
9
562
Quadratic Probing
Quadratic probing does, however, suffer from secondary clustering
Where, if you have several keys hashing to the same value
The first collision requires one probe
The second requires four
The third requires nine
The fourth requires sixteen
Index Value
0 426
1 2001
2
3 13
4
5
6 11456
7 157
8 207
9
563
Quadratic Probing
Secondary clustering would happen if we inserted, for example:
827, 10857, 707, 1117
Because they all hash to 7
Not as serious a problem as primary clustering
But there is a better solution that avoids both.
Index Value
0 426
1 2001
2
3 13
4
5
6 11456
7 157
8 207
9
564
Double Hashing
The problem thus far is that the probe sequences are always
the same
For example: linear probing always generates x+1, x+2, x+3...
Quadratic probing always generates x+1, x+4, x+9…
566
Double Hashing: Example
Returning to our old example with inserting 207
H(207) = 207 % 10 = 7
This results in a collision with element 157
So we hash again, to get the probe step
Suppose we choose c=5
Then:
P(207) = 5 - (207 % 5)
P(207) = 5 - 2 = 3
Index Value
0
1 2001
2
3 13
4
5
6 11456
7 157
8
9
567
Double Hashing: Example
So we insert 207 at position:
H(207) + P(207) = 7 + 3 = 10
Wrapping around, this will put 207 at position 0
Index Value
0 207
1 2001
2
3 13
4
5
6 11456
7 157
8
9
568
Double Hashing: Example
Now, let's again insert value 426
We run the initial hash:
H(426) = 426 % 10 = 6
We get a collision, so we probe:
P(426) = 5 - (426 % 5) = 5 - 1 = 4
And insert at location:
H(426) + P(426) = 10
Wrapping around, we get 0. Another collision!
Index Value
0 207
1 2001
2
3 13
4
5
6 11456
7 157
8
9
569
Double Hashing: Example
So, we probe again
P(426) = 4
So we insert at location 0+4 = 4, and this time there is no collision
Because the probe step depends on the key, clusters don't build up
Index Value
0 207
1 2001
2
3 13
4 426
5
6 11456
7 157
8
9
570
Java Implementation, page 547
Let’s try this again:
Again, we have our hash table stored in hashArray
And arraySize as the size of the hash table
Again, assume positive integers and all entries are initially -1
Let’s construct
hashFunc()
hashFunc2()
find()
insert()
delete()
571
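The new piece is hashFunc2(), which turns the key into a probe step; a sketch matching the example's c=5:

private int hashFunc2(long key) {
    return 5 - (int) (key % 5);      // yields 1..5, never 0
}
// On a collision: hashVal = (hashVal + hashFunc2(key)) % arraySize;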
Note…
What is a potential problem with choosing a hash table of size
10 and a c of 5 for the probe, as we just did?
572
Probe Sequence
The probe sequence may never find an open cell!
Because H(0) = 0, we’ll start at hash location 0
If we have a collision, P(0) = 5 so we’ll next check 0+5=5
If we have a collision there, we’ll next check 5+5=10, with
wraparound we get 0
We’ll infinitely check 0 and 5, and never find an open cell!
573
Double Hashing Requirement
The root of the problem is that the table size is not prime!
For example if the size were 11:
0, 5, 10, 4, 9, 3, 8, 2, 7, 1, 6
If there is even one open cell, the probing is guaranteed to find
it
Generally, for open addressing, double hashing is best
574
Separate Chaining
The alternative to open
addressing
Does not involve probing
to different locations in the
hash table
Rather, every location in
the hash table contains a
linked list of keys
575
Separate Chaining
Simple case, 7 element
hash table
H(k) = k % 7
So:
21, 77 each hash to
location 0
72 hashes to location 2
75, 5, 19 hash to location 5
577
Java Implementation
Let’s look at pages 555-557
Note: We will need a linked list and the hash table!
Will take a little time
578
A Good Hash Function
Has two properties:
Is computable quickly; so as not to degrade performance of
insertion and searching
Can take a range of key values and transform them into indices
such that the key values are distributed randomly across the
hash table
579
For example…
Data can be highly non-random
For example, a car-part ID:
033-400-03-94-05-0-535
580
Rule #1: Don't Use Non-Data Digits in the Code
Compress the key fields down enough until every bit counts
For example:
The category (digits 3-5, with restricted values 100, 150, 200, …, 850, counting by 50s) needs to be compressed down to run from 0 to 15
The checksum is not necessary, and should be removed. It is a
function of the rest of the code and thus redundant with respect
to the hash table
581
Rule #2: Use All of the Data
Every part of the key should contribute to the hash function
The more data portions that contribute to the key, the more likely it is that the keys will hash evenly
Avoiding collisions, which cause trouble no matter what algorithm you use
582
Rule #3: Use a Prime Number for
Modulo Base
This is a requirement for double hashing
Important for quadratic probing
Especially important if the keys may not be randomly
distributed
The more keys that share a divisor with the array size, the more
collisions
Example, non-random data which are multiples of 50
If the table size is 50, they all hash to the same spot
If the table size is 10, they all hash to the same spot
If the table size is 53, no keys divide evenly into the table size. Better!
583
Hashing Efficiency
Insertion and Searching are O(1) in the best case
This implies no collisions
If you minimize collisions, you can approach this runtime
If collisions occur:
Access times depend on resulting probe lengths
Every probe equals one more access
So every worst case insertion or search time is proportional to:
The number of required probes if you use open addressing
The number of links in the longest list if you use separate chaining
584
Efficiency: Linear Probing
Let's assume a load factor L, where L is the fraction of hash table slots which are occupied.
Average probe counts for linear probing (Knuth):
Successful: (1/2)(1 + 1/(1-L))
Unsuccessful: (1/2)(1 + 1/(1-L)^2)
At L = 2/3:
Successful: 2.0
Unsuccessful: 5.0
For separate chaining, the averages are:
Successful: 1 + (L/2)
Unsuccessful: 1 + L
588
Summary: When to use What
If the number of items that will be inserted is uncertain, use
separate chaining
Must create a LinkedList class
But performance degrades only linearly
With open addressing, major penalties
590
Heaps
591
Heaps: Motivation
Recall priority queues. What were they?
An ordered queue
Offered us O(1) removal, searching of:
The largest element if ordered from highest to lowest
The smallest element if ordered from lowest to highest
Insertion still takes O(n) time
593
Complete
A heap is a complete binary tree
In that, each row is completely filled in reading from left to right
The last row need not be
594
Array Implementation
Heaps are usually implemented with arrays
It will become clear why
595
Traversal
Note: An inorder traversal of the heap is very difficult!
Because the elements are weakly ordered
The heap condition is not as strong as the organizing principle
in the binary search tree
Thus this operation is not supported by heaps
596
Arbitrary search and deletion
Searching and deleting any element other than the maximum
is also not supported
For the same reasons, they are difficult and expensive
There are actually only two operations that a heap
supports….
597
Supported Operations
A heap only supports two operations:
Deletion of the maximum element
Insertion of a new element
These are actually the two required operations for a priority
queue!
598
Operation 1: Removing the max
We already know that the maximum element is:
At the root of the heap
At position 0 of the heap array
Generally, follow these steps:
Remove the root
Move the last node into the root
Trickle the last node down until it’s below a larger node and
above a smaller one
599
Example
Delete node 95 (the max) from the following heap:
600
Step 1
Remove the root and replace it by the last node
601
Step 2
Trickle the node down the tree, swap until it lies between
larger and smaller nodes
602
Step 2
Trickle the node down the tree, swap until it lies between
larger and smaller nodes
603
Step 2
Trickle the node down the tree, swap until it lies between
larger and smaller nodes
604
Step 2
Trickle the node down the tree, swap until it lies between
larger and smaller nodes
605
Implementation
Assuming we know the current size of the array (call it n),
removing the root and replacing it with the last node is easy
Just set heapArray[0] equal to heapArray[n-1]
And decrement n
606
Trickling Down
Once we’ve moved the last element to the root, if either or
both children are larger:
Find the bigger child and swap
607
Trickling Down
Note, given a node at index x:
Its parent is at (x-1)/2
Its children are at 2x+1 and 2x+2
608
Trickling Down
So generally, for trickling down:
Start x at 0
while (heapArray[x] < heapArray[2x+1] or
heapArray[x] < heapArray[2x+2])
largerChild = max(heapArray[2x+1], heapArray[2x+2])
swap(heapArray[x], largerChild)
if (largerChild was left child)
x = 2x+1
else
x = 2x+2
Of course, we need checks if we are at the bottom…
609
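With those checks added, a sketch of trickleDown() (heapArray of int, currentSize the number of items; it shifts nodes down instead of doing full swaps):

private void trickleDown(int index) {
    int top = heapArray[index];                    // save the node being moved
    while (index < currentSize / 2) {              // while the node has at least one child
        int leftChild = 2 * index + 1;
        int rightChild = leftChild + 1;
        int largerChild = leftChild;
        if (rightChild < currentSize && heapArray[rightChild] > heapArray[leftChild])
            largerChild = rightChild;              // pick the bigger child
        if (top >= heapArray[largerChild]) break;  // heap condition restored
        heapArray[index] = heapArray[largerChild]; // shift the child up
        index = largerChild;                       // follow the node down
    }
    heapArray[index] = top;                        // drop the node in place
}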
Java Implementation, page 592
We’ll avoid the extra Node class and just use integers
Let’s construct
A constructor, which takes a maximum heap size
A function to check if the heap is empty
A function to accept an index and perform a trickle down
A function to perform the deletion of the maximum element
610
Operation 2: Insertion
Generally, follow the following steps:
Insert the new node at the next available spot in the bottom row
If it violates the heap condition (translation: it’s bigger than its
parent)
Trickle the node upwards, until it’s smaller than the parent
611
Example
Add node 95 to the following tree:
612
Step 1
Put the new node in the next empty spot
613
Step 2
If the node is larger than the parent, swap it
614
Step 2
If the node is larger than the parent, swap it
615
Step 2
If the node is larger than the parent, swap it
616
Step 2
We’re done! We actually added a new maximum.
617
Implementation
Once again, step 1 is easy
If the current size of the heap is n
Set heapArray[n] to the new key
Increment n
618
Trickling Up
Again, given a node at index x:
Its parent is at (x-1)/2
Its children are at 2x+1 and 2x+2
619
Trickling Up
General approach will be:
while (x > 0 and heapArray[(x-1)/2] < heapArray[x])
swap(heapArray[x], heapArray[(x-1)/2])
x = (x-1)/2
620
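And a matching sketch of trickleUp(), with the at-the-root check made explicit (same shifting style as trickleDown()):

private void trickleUp(int index) {
    int bottom = heapArray[index];                 // save the new node
    int parent = (index - 1) / 2;
    while (index > 0 && heapArray[parent] < bottom) {
        heapArray[index] = heapArray[parent];      // shift the parent down
        index = parent;
        parent = (parent - 1) / 2;
    }
    heapArray[index] = bottom;                     // place the new node
}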
Java Implementation, page 592
Again, we’re avoiding the Node class and just using integers
Let’s implement
Constructor
Function to check if the heap is empty
A function which takes an index and trickles that node up
A function which performs the insertion
621
Let’s do our own example…
Begin with an initially empty heap, with a maximum size of
10. Perform the following operations, showing both the
array contents and the corresponding heap:
Insert 64
Insert 91
Insert 80
Insert 21
Insert 45
Remove the max
Insert 110
Insert 35
Remove the max
Remove the max
Insert 204
622
Efficiency
Swapping just takes O(1) time
Trickle up and trickle down each take O(log n) time
They each iteratively examine parent nodes
A heap is necessarily balanced because of its completeness
623
Why use arrays?
Actually you can use trees if you want to
This is called a tree heap
Let’s construct the Node class… will look similar to the binary
search tree (but remember the properties are different!)
Yes!
625
Heapsort Efficiency
Let’s look at each operation:
Insert all n into the heap
Remove the maximum element n times
Really? That easy?
627
New Insertion Process
The one we learned:
For each new node we insert:
Place it in the last available position O(1)
Trickle up O(log n)
Overall, O(n log n)
629
What we can note…
Trickling down
requires correct
subheaps
630
What we can note…
However, the leaf
nodes in the bottom
row, already must be
correct heaps
So we don’t have to
apply trickle down to
them
These comprise
roughly half the nodes
631 in the tree
So, summary with insertion
So with insertion, we actually can save operations
Instead of n operations of trickle up, we have n/2 operations of
trickle down
Overall
Randomly insert n elements -> n*O(1) = O(n)
Trickle down n/2 elements -> (n/2)*O(log n) = O(n log n)
So it’s still O(n log n), but you’re doing half as many trickles
Not a huge savings, but well worth it!
632
One way: Iteratively
Note that, the bottom row of nodes begins at index n/2
So, apply trickle down to nodes (n/2)-1 to 0 (the root)
Note we have to go in reverse, because trickle down works
properly only if subtrees are correct heaps
633
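In code, that's just a loop (a sketch, using the trickleDown() from before):

for (int j = currentSize / 2 - 1; j >= 0; j--)
    trickleDown(j);    // subtrees below j are already heaps, so this is safe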
Second Way: Recursion
Called ‘heapify’, and pass a
node index
Envision as:
If the index is larger than
(n/2)-1, do nothing
Otherwise:
Heapify the left subtree
Heapify the right subtree
Trickle down this node
635
Sharing Space
Note: We don’t necessarily
need two arrays for
heapsort!
n/2 trickle down
operations can be easily
done on the same array, it’s
just swapping contents of
cells
636
Sharing Space
When we remove the
maximum element, one
slot becomes open at the
end of the subarray we are
sorting
We can just insert the
maximum element there!
The result will be a sorted
array
637
(Time Pending)
Java Implementation, page 605
Summary:
Get the array size from the user -> 1
Fill with random data -> n
Turn array into a heap with n/2 applications of trickle down ->
(n/2)*(log n)
Remove items from the heap -> n log n
Write back to the end of the array -> n
639
Graphs
Graphs are a data structure which represent relationships
between entities
Vertices represent entities
Edges represent some kind of relationship
640
Example
The graph on the previous page could be used to model San
Jose freeway connections:
641
Adjacency
Two vertices are adjacent to one another if they are
connected by a single edge
For example:
I and G are adjacent
A and C are adjacent
I and F are not adjacent
642
Path
A path is a sequence of edges
644
Unconnected Graph
An unconnected graph consists of several connected
components:
646
Weighted Graphs
A graph where edges have weights, which quantifies the
relationship
For example, you may assign path distances between cities
Or airline costs
These graphs can be directed or undirected
647
Vertices: Java Implementation
We can represent a vertex as a Java class with:
Character data
A boolean data member to check if it has been visited
648
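A sketch:

class Vertex {
    public char label;           // e.g. 'A'
    public boolean wasVisited;   // marked during a search
    public Vertex(char lab) { label = lab; wasVisited = false; }
}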
Adjacency Matrix
An adjacency matrix for a graph with n nodes, is size n x n
Position (i, j) contains a 1 if there is an edge connecting node i
with node j
Zero otherwise
For example, here is a graph and its adjacency matrix:
649
Redundant?
This may seem a bit redundant:
651
Application: Searches
A fundamental operation for a graph is:
Starting from a particular vertex
Find all other vertices which can be reached by following paths
Example application
How many towns in the US can be reached by train from
Tampa?
Two approaches
Depth first search (DFS)
Breadth first search (BFS)
652
Depth First Search (DFS)
Idea
Pick a starting point
Follow a path to unvisited
vertices, as long as you can
until you hit a dead end
When you hit a dead end,
go back to a previous spot
and hit unvisited vertices
Stop when every path is a
dead end
653
Depth First Search (DFS)
Algorithm
Pick a vertex (call it A) as your starting point
Visit this vertex, and:
Push it onto a stack of visited vertices
Mark it as visited (so we don’t visit it again)
Visit any neighbor of A that hasn’t yet been visited
Repeat the process
When there are no more unvisited neighbors
Pop the vertex off the stack
Finished when the stack is empty
655
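A sketch of that algorithm over an adjacency matrix (vertexList, adjMat, and nVerts assumed as members of a Graph class):

public void dfs() {
    vertexList[0].wasVisited = true;               // visit the start vertex
    System.out.print(vertexList[0].label);
    java.util.Deque<Integer> stack = new java.util.ArrayDeque<>();
    stack.push(0);                                 // push it onto the stack
    while (!stack.isEmpty()) {
        int v = getAdjUnvisitedVertex(stack.peek());
        if (v == -1) stack.pop();                  // dead end: back up
        else {
            vertexList[v].wasVisited = true;       // visit the neighbor
            System.out.print(vertexList[v].label);
            stack.push(v);
        }
    }
}
private int getAdjUnvisitedVertex(int v) {         // any unvisited neighbor of v
    for (int j = 0; j < nVerts; j++)
        if (adjMat[v][j] == 1 && !vertexList[j].wasVisited) return j;
    return -1;
}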
Depth First Search: Complexity
Let |V| be the number of vertices in a graph
And let |E| be the number of edges
658
Example
Start from A, and execute breadth first search on this graph,
showing the contents of the queue at each step
Every step, we’ll either have a visit or a removal
659
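For reference while working the example, a BFS sketch: identical in shape to DFS, but a queue replaces the stack, so vertices are visited level by level (same assumed members):

public void bfs() {
    java.util.Queue<Integer> queue = new java.util.ArrayDeque<>();
    vertexList[0].wasVisited = true;               // visit the start vertex
    System.out.print(vertexList[0].label);
    queue.add(0);
    while (!queue.isEmpty()) {
        int v1 = queue.remove();                   // the removal step
        int v2;
        while ((v2 = getAdjUnvisitedVertex(v1)) != -1) {
            vertexList[v2].wasVisited = true;      // a visit step
            System.out.print(vertexList[v2].label);
            queue.add(v2);
        }
    }
}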
Breadth First Search: Complexity
Let |V| be the number of vertices in a graph
And let |E| be the number of edges
660
Minimum Spanning Trees (MSTs)
On that note of large numbers of edges slowing down our
precious search algorithms:
Let’s look at MSTs, which can help ameliorate this problem
It would be nice to take a graph and reduce the number of
edges to the minimum number required to span all vertices:
What's the number of edges now?
661
We’ve done it already…
Actually, if you execute DFS you’ve already computed the
MST!
Think about it: you follow a path for as long as you can, then
backtrack (visit every vertex at most once)
You just have to save edges as you go
662
Directed Graphs
A directed graph is a graph where the edges have direction,
signified by arrows:
663
Adjacency Matrix
The adjacency matrix for this graph does not contain
redundant entries
Because now each edge has a source and a sink
So entry (i, j) is only set to 1 if there is an edge going from i to j
0 otherwise
664
Topological Sort
Only works with DAGs (Directed Acyclic Graphs)
That is if the graph has a cycle, this will not work
667
Weighted Graph: Adjacency Matrix
The adjacency matrix for a weighted graph contains edge weights
Instead of 0 and 1 (INF marks a missing edge)
A B C D E F
A INF INF INF INF 0.1 0.9
B 0.3 INF 0.3 0.4 INF INF
C INF INF INF 0.6 0.4 INF
D INF INF INF INF 1 INF
E 0.55 INF INF INF INF 0.45
F INF INF INF 1 INF INF
669
Dijkstra’s Algorithm
Given a weighted graph, find the shortest path (in terms of
edge weights) between two vertices in the graph
Numerous applications
Cheapest airline fare between departure and arrival cities
Shortest driving distance in terms of mileage
670
Dijkstra’s Algorithm
Suppose in the graph below, we wanted the shortest path
from B to F
A C D E F
INF INF INF INF INF
672
Step 1
Take all edges leaving B,
and put their weights in
the table
Along with the source
vertex
A C D E F
0.3 (B) 0.3 (B) 0.4 (B) INF INF
673
Step 2
Pick the edge with
the smallest weight
and mark it as the shortest
path from B
(How do we know that?)
A C D E F
0.3* (B) 0.3 (B) 0.4 (B) INF INF
674
Step 3
Now choose one of the
edges with minimal weight
and repeat the process
(explore adj. vertices and
mark their total weight)
A C D E F
0.3* (B) 0.3 (B) 0.4 (B) INF INF
675
Step 4
In this case, we’ll look at A
Explore adjacent vertices
Enter the total weight
from B to those vertices
IF that weight is smaller than
the current entry in the table
Ignore the ones marked (*)
A C D E F
0.3* (B) 0.3 (B) 0.4 (B) 0.4 (A) 1.2 (A)
676
Step 5
Now, A is marked and
we’ve visited its neighbors
So pick the lowest entry
in the table (in this case C)
and repeat the process
677
Step 6
Visit C’s neighbors that
are unmarked
Insert their total weight
into the table, IF it’s
smaller than the current
entry
678
Step 7
Now we visit D
Which only contains
one edge to E
A C D E F
0.3* (B) 0.3* (B) 0.4* (B) 0.4 (A) 1.2 (A)
679
Step 8
Now we visit E
Has two outgoing edges
One to A (marked, ignore)
One to F, which changes
the table to
0.4 + 0.45 = 0.85
Which is smaller than the
current entry, 1.2
680
Step 9
Only one vertex left, so
we’re actually finished
Shortest path can be
obtained by starting from
the destination entry and
working backwards
F <- E <- A <- B
682
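A compact sketch of the table-filling process we just traced, using a weight matrix with Double.POSITIVE_INFINITY for missing edges (to recover the actual path, also record each vertex's predecessor whenever its table entry improves; reading those back gives F <- E <- A <- B):

static double[] dijkstra(double[][] adj, int source) {
    int n = adj.length;
    double[] dist = new double[n];                 // the table of best-known costs
    boolean[] marked = new boolean[n];             // the '*' entries
    java.util.Arrays.fill(dist, Double.POSITIVE_INFINITY);
    dist[source] = 0;
    for (int step = 0; step < n; step++) {
        int u = -1;                                // pick the cheapest unmarked vertex
        for (int v = 0; v < n; v++)
            if (!marked[v] && (u == -1 || dist[v] < dist[u])) u = v;
        if (dist[u] == Double.POSITIVE_INFINITY) break;   // the rest is unreachable
        marked[u] = true;
        for (int v = 0; v < n; v++)                // relax the edges leaving u
            if (adj[u][v] < Double.POSITIVE_INFINITY && dist[u] + adj[u][v] < dist[v])
                dist[v] = dist[u] + adj[u][v];
    }
    return dist;
}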