Lectures Handout
Contents
1 Outline
2 Background
2.1 Probability
2.2 Algorithm Analysis
3 Introduction
3.1 First example - quicksort
4 Generating Functions
4.1 GF definitions
4.2 Extra details on power series
4.3 Other types of generating functions
7 Randomized algorithms
7.1 Monte Carlo Algorithms
7.2 Las Vegas Algorithms
2 Background
2.1 Probability
Probability basics I
Probability basics II
• The mean of a discrete random variable T is μ := E[T] := Σ_{x∈X} T(x) Pr({x}) = Σ_y y Pr(T = y).
Asymptotic notation
• Let f, g : N → N. We say that f ∈ O(g) if there is C > 0 such that
f (n) ≤ Cg(n) for all sufficiently large n.
• f ∈ Ω(g) ⇔ g ∈ O(f ).
• f ∈ Θ(g) ⇔ f ∈ O(g) and f ∈ Ω(g).
• In particular, if lim_{n→∞} f(n)/g(n) exists and equals L, then
f ∈ Θ(g) if 0 < L < ∞;
f ∈ Ω(g) if L = ∞;
f ∈ O(g) if L = 0.
Our basic framework
• Let A be an algorithm for a given problem, and let I be the set of legal
inputs. The size of input ι ∈ I is denoted |ι|; we let In be the set of inputs
of size n. The running time of A on input ι is the number of elementary
operations T (ι).
• Worst-case analysis studies W (n) := maxι∈In T (ι). Example: for sorting
integers, In is the set of all sequences of integers of length n. For quicksort,
W (n) ∈ Θ(n2 ) but for mergesort, W (n) ∈ Θ(n log n).
• Here we prefer to study the distribution of T(ι) over I_n, since this is often more relevant in practice. This approach requires some probability model on the inputs; then T is a random variable whose mean, variance, etc. can be studied.
3 Introduction
Organizational matters

3.1 First example - quicksort
• If n = 0 or n = 1, do nothing. Otherwise, choose a pivot element x and
partition the array so that if y < x then y is to the left of x and if z > x
then z is to the right of x. Then call the algorithm recursively on the left
and right parts.
• Note that we have not specified what happens if x = y, but we must do
so in any implementation.
• The partitioning step requires at least n−1 comparisons, plus some swaps.
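As a concrete illustration, here is a minimal Python sketch of the scheme above (my sketch, not the lecture's pseudocode); it resolves the x = y case by grouping equal keys with the pivot:

    import random

    def quicksort(a):
        # Functional sketch: pivot chosen uniformly at random;
        # equal keys are grouped with the pivot.
        if len(a) <= 1:
            return a
        x = random.choice(a)
        left = [y for y in a if y < x]
        mid = [y for y in a if y == x]
        right = [y for y in a if y > x]
        return quicksort(left) + mid + quicksort(right)

    print(quicksort([3, 1, 4, 1, 5, 9, 2, 6]))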
Quicksort recurrence
• The expected number of comparisons a_n over uniformly random inputs satisfies a_0 = 0 and n a_n = n(n − 1) + 2 Σ_{j<n} a_j.
Quicksort recurrence solution details
• Subtracting the same identity at n − 1 gives n a_n − (n − 1) a_{n−1} = 2(n − 1) + 2 a_{n−1}, i.e. n a_n = 2(n − 1) + (n + 1) a_{n−1}.
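As a sanity check (mine, not part of the notes), a few lines of Python iterate the recurrence and compare against the 2n ln n growth derived later:

    from math import log

    # Iterate n*a_n = n(n-1) + 2*sum_{j<n} a_j with a_0 = 0.
    a, partial = [0.0], 0.0          # partial = sum of a_j for j < n
    for n in range(1, 10001):
        a.append((n * (n - 1) + 2 * partial) / n)
        partial += a[n]

    n = 10000
    print(a[n], 2 * n * log(n))      # same order of growth: a_n ~ 2 n ln n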
4 Generating Functions
4.1 GF definitions
Ordinary generating functions — OGFs
• Basic operations on sequences (sum, convolution, . . . ) correspond to those
on OGFs (sum, product, . . . ). See handout for more operations.
• The equality Σ_n z^n = 1/(1 − z) is purely formal at this stage, but it also makes sense for |z| < 1. So far OGFs are just a convenient shorthand for describing sequences, but soon they will be a powerful machine.
• Given an OGF, how do we extract its sequence if it is not available from the list above? The Taylor series definition always works, but is usually not computationally useful. We mainly use table lookup here.
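For instance, the convolution/product correspondence can be checked directly on truncated coefficient lists (a throwaway helper, not a library routine):

    def convolve(a, b):
        # c_n = sum_k a_k * b_{n-k}: coefficients of the product
        # of the OGFs with coefficient lists a and b.
        return [sum(a[k] * b[n - k]
                    for k in range(max(0, n - len(b) + 1), min(n, len(a) - 1) + 1))
                for n in range(len(a) + len(b) - 1)]

    ones = [1] * 8                   # 1/(1-z), truncated
    print(convolve(ones, ones)[:8])  # [1, 2, ..., 8]: 1/(1-z)^2 has n-th coefficient n+1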
Operations in the power series algebra
All operations take place in B; that is, computing h(n) always involves only
finite algebra.
• Basic operations on sequences (sum, convolution, . . . ) correspond to those
on EGFs (sum, product, . . . ). See handout.
• Sometimes OGFs are better for computation, sometimes EGFs.
• The PGF associated to a random variable X taking values in ℕ is G(z) = Σ_n Pr(X = n) z^n.
• The mean of X is just G′(1) and the variance is G″(1) + G′(1) − G′(1)².
• The PGF of the sum of independent RVs X and Y is the product of the individual PGFs.
• The OGF of the sequence Pr(X > n) of tail probabilities is (1 − G(z))/(1 − z).
• Main idea: recurrence gives an equation involving the GF. Try to solve
this, then extract the coefficients.
• Example (Fibonacci): a_0 = 0, a_1 = 1; a_n = a_{n−1} + a_{n−2} for n ≥ 2. Multiply each side by z^n, then sum over n ≥ 2. Let F(z) = Σ_n a_n z^n. Then we get F(z) − z = zF(z) + z²F(z). This gives F(z) = z/(1 − z − z²).
• To convert back to a sequence, we can use partial fractions. Get F(z) = A/(1 − φz) + B/(1 + φ^{−1}z) where φ = 1.618···, the golden ratio, A = 1/√5, B = −1/√5.
• Thus a_n = (φ^n − (−1)^n φ^{−n})/√5 ∈ Θ(φ^n).
• Note: this can be done automatically by a computer algebra system!
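Indeed, a few lines of Python confirm the closed form against the recurrence:

    from math import sqrt

    phi = (1 + sqrt(5)) / 2

    def fib_binet(n):
        # Closed form from the partial-fraction expansion above.
        return round((phi**n - (-1)**n * phi**(-n)) / sqrt(5))

    a, b = 0, 1                      # a_0 = 0, a_1 = 1
    for n in range(20):
        assert fib_binet(n) == a
        a, b = b, a + b
    print("closed form matches recurrence for n < 20")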
Quicksort recurrence with GFs
• From n a_n = n(n − 1) + 2 Σ_{j<n} a_j, a_0 = 0, we obtain zF′(z) = 2z²/(1 − z)³ + 2zF(z)/(1 − z).
Comtet’s algorithm outline
• Suppose P(z, F(z)) = 0 where P(z, y) ∈ ℂ[z, y] is irreducible. Differentiate and solve for F′ to obtain
F′(z) = A(z, y)/B(z, y) for relatively prime polynomials A, B ∈ ℂ[z, y] (with y = F(z)).
Note that B and P are relatively prime.
• The extended Euclidean algorithm yields polynomials u(z, y), v(z, y), g(z) such that uB + vP = g. Define C = Au mod P to get F′(z) = C(z, y)/g(z). Repeat as required.
• In the binary tree example, P = zy² − y + 1, B = 1 − 2zy, A = y²; one finds u = 1 − 2zy, v = −4z, g = 1 − 4z, and C = Au mod P = (2zy − y + 1)/z, so that F′(z) = (2zy − y + 1)/(z − 4z²). The algorithm terminates in one step.
• Hence the OGF satisfies T(z) = zT(z)² + 1. Thus T(z) = (1 − √(1 − 4z))/(2z) (how to choose the square root?).
• Can extract coefficients using the binomial theorem: get
T_n = C(2n, n)/(n + 1) (the Catalan number), where C(n, k) denotes the binomial coefficient.
• Asymptotics can be derived (later) and we will get T_n ∼ 4^n/√(πn³).
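Both the closed form and the asymptotic estimate are easy to check numerically (a sketch using only the formulas above):

    from math import comb, pi, sqrt

    def catalan(n):
        # T_n = C(2n, n)/(n+1), as extracted above.
        return comb(2 * n, n) // (n + 1)

    # T(z) = z T(z)^2 + 1 means T_n = sum_k T_k T_{n-1-k} for n >= 1.
    T = [1]
    for n in range(1, 12):
        T.append(sum(T[k] * T[n - 1 - k] for k in range(n)))
    assert T == [catalan(n) for n in range(12)]

    n = 50
    print(catalan(n) * sqrt(pi * n**3) / 4**n)   # ratio tends to 1 as n grows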
• Study this example carefully - it leads to the symbolic method for enumeration.
• Also the counting OGF for A ∪ B is A(z) + B(z) if the classes are disjoint.
Thus the OGF for the set of sequences of elements of A is 1 + A(z) +
A(z)2 + · · · = (1 − A(z))−1 .
• If a combinatorial class is constructed from atoms using only disjoint union, product and sequence constructions, then its counting OGF is rational.
• When forming the sum or product of labelled classes, we need to relabel so that a proper labelling is obtained and the structure of the components is preserved. If a has size k and b has size n − k, then the number of ways to properly label the ordered pair (a, b) is the binomial coefficient C(n, k).
• Thus it makes sense to consider EGFs, since
(Σ_n a_n z^n/n!)(Σ_n b_n z^n/n!) = Σ_n (Σ_k C(n, k) a_k b_{n−k}) z^n/n!.
• Similarly the EGF for sets (sequences where order does not matter) is
1 + A(z)/1! + A(z)2 /2! + · · · = exp(A(z)).
T = {ext} ∪ {int} × T × T, i.e.
⟨tree⟩ = ⟨ext⟩ + ⟨int⟩ × ⟨tree⟩ × ⟨tree⟩.
• Give ⟨ext⟩ weight a and ⟨int⟩ weight b to obtain T(z) = z^a + z^b T(z)². Special cases: a = 0, b = 1 counts trees by internal nodes; a = 1, b = 0 by external nodes; a = b = 1 by total nodes.
The symbolic method - labelled tree example
a_n ≈ C ρ^{−n} n^{p−1}
All are best proved using the Cauchy integral formula (complex analysis).
• Let T_Ω(z) be the enumerating GF of this class. The symbolic method immediately gives the equation
T_Ω(z) = z φ(T_Ω(z)), where φ(x) = Σ_{ω∈Ω} x^ω.
• Let Ω = {0, 3}. Then the counting (by external nodes) OGF T (z) satisfies
T (z) = z(1 + T (z)3 ).
• By lookup we obtain
a_n = (1/n) C(n, k) if n = 3k + 1, and a_n = 0 otherwise.
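The lookup can be verified by iterating the functional equation on truncated power series (an ad hoc check, not from the lecture):

    from math import comb

    N = 16
    T = [0] * N
    for _ in range(N):            # fixed point of T <- z(1 + T^3), truncated mod z^N
        T3 = [0] * N              # coefficients of T(z)^3
        for i in range(N):
            for j in range(N - i):
                for k in range(N - i - j):
                    T3[i + j + k] += T[i] * T[j] * T[k]
        T = [0] + [(1 if m == 0 else 0) + T3[m] for m in range(N - 1)]

    def a(n):                     # the closed form above
        return comb(n, (n - 1) // 3) // n if n % 3 == 1 else 0

    assert T[1:] == [a(n) for n in range(1, N)]
    print([(n, T[n]) for n in range(1, N) if T[n]])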
Some trees in analysis of algorithms
• Binary search tree: a binary tree with each internal node having a key, such that the key of each node v is ≤ all keys in its right subtree R_v and ≥ all keys in its left subtree L_v. Applications: database for comparable data, model for quicksort.
• Heap-ordered tree: a binary tree such that the key of each node is ≥ the
key of anything in its subtree. Applications: priority queue.
• Trie: an m-ary tree where each external node may contain data; children
of leaves must be nonempty. Applications: database for string data, model
for radix exchange sort, leader election in distributed computing.
Tree attributes
• The mean and variance are given by a standard computation. Note that F_u(z, 1) = 2zF(z, 1)[F_u(z, 1) + zF_z(z, 1)], and thus
μ_n := [z^n]( zF_z(z, 1)/(1 − 2zF(z, 1)) ) / [z^n]F(z, 1).
• The mean μ_n is asymptotic to √π n^{3/2}, so the mean level of a node is of order √n. The variance is also of order n^{3/2}.
• Suppose we insert n distinct keys into an initially empty BST. The uniform
distribution on permutations of size n induces the quicksort distribution
on BSTs of size n.
• The internal path length equals the construction cost of a binary search
tree of size n; dividing by n gives the expected cost of a successful search.
• Let F(z, u) = Σ_π (z^{|π|}/|π|!) u^{ℓ(π)} be the BGF of BSTs by size and internal path length ℓ. Note that [z^n]F(z, u) is the PGF of internal path length on BSTs with n nodes. Then
F_z(z, u) = F(zu, u)², F(0, u) = 1.
• Moments of the distribution are easily obtained as for the uniform model.
The mean is ∼ 2n log n and variance is in Θ(n2 ). Quite a different shape.
6.2 Strings
String basics
• Let A be a finite set called the alphabet. A string over A is a finite sequence
of elements of A. The set of all strings over A is written A∗ .
• A subset of A∗ is a language.
• If A = {0, 1} the strings are called bitstrings.
• Basic algorithmic questions: string matching (find a pattern in a given
string); search for a word in a dictionary; compress a string. Many appli-
cations in computational biology, computer security, etc.
Pattern avoidance
• A nice trick: let T be the position of the end of the first occurrence of the pattern, and X_n the event that the first n bits of a random bitstring do not contain the pattern. Then S(z) = Σ_{n≥0} a_n z^n (where a_n counts bitstrings of length n avoiding the pattern) implies that
S(1/2) = Σ_{n≥0} a_n/2^n = Σ_{n≥0} Pr(X_n) = Σ_{n≥0} Pr(T > n) = E[T].
Regular languages
• Consider language (over alphabet {a, b}) defined by (bb | a(bb)∗ aa | a(bb)∗ (ab |
ba)(bb)∗ (ab | ba))∗ (number of b’s is even, number of a’s divisible by 3).
• The symbolic method gives
S(z) = (1 − z²)² / (1 − 3z² − z³ + 3z⁴ − 3z⁵ + z⁶).
Hence a_n ≈ C·A^n, A ≈ 1.7998.
• Need to check that the expression is unambiguous.
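Coefficients of any rational GF can be generated from the linear recurrence encoded by its denominator; here is a generic helper (mine, not from the notes) applied to S(z):

    def rational_series(num, den, n_terms):
        # Power-series coefficients of num(z)/den(z); num and den are
        # coefficient lists [c_0, c_1, ...] with den[0] != 0.
        a = []
        for n in range(n_terms):
            s = num[n] if n < len(num) else 0
            for k in range(1, min(n, len(den) - 1) + 1):
                s -= den[k] * a[n - k]
            a.append(s / den[0])
        return a

    num = [1, 0, -2, 0, 1]             # (1 - z^2)^2
    den = [1, 0, -3, -1, 3, -3, 1]     # 1 - 3z^2 - z^3 + 3z^4 - 3z^5 + z^6
    a = rational_series(num, den, 40)
    print(a[:8])
    print(a[36] ** (1 / 36))           # n-th root of a_n estimates the growth constant A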
6.3 Tries
Tries
• More generally, we may stop branching as soon as the strings are all distin-
guished. This gives a trie, a binary tree such that all children of leaves are
nonempty. Each string is stored in an external node but not all external
nodes have strings. Can be described by symbolic method.
• A Patricia trie saves space, by collapsing one-way branches to a single
node.
• Relevant parameters: number of internal nodes In ; external path length
Ln ; length of rightmost branch Rn .
Trie recurrences
We assume that a trie is built from n infinite random bitstrings. Each bit of each string is independently 0 or 1 with probability 1/2 each. We have
• L_n = n + (1/2^n) Σ_k C(n, k) (L_k + L_{n−k}),
• I_n = 1 + (1/2^n) Σ_k C(n, k) (I_k + I_{n−k}),
• R_n = 1 + (1/2^n) Σ_k C(n, k) R_k.
• Thus we obtain
L_n = n Σ_{j≥0} [1 − (1 − 2^{−j})^{n−1}],
I_n = Σ_{j≥0} 2^j [1 − (1 − 2^{−j})^n − (n/2^j)(1 − 2^{−j})^{n−1}].
• How to derive an asymptotic approximation? See Flajolet & Sedgewick (pp. 211, 402) for elementary arguments. Answers: L_n ≈ n lg n, I_n ≈ n/ln 2. More precise answers are obtained by complex methods (Mellin transform).
• Iteration yields
φ̂(z) = Σ_{j≥0} 2^j â(2^{−j} z).
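The explicit sum for L_n is easy to evaluate numerically and matches the n lg n approximation (a quick check under the formulas above):

    from math import log2

    def trie_epl(n, terms=200):
        # L_n = n * sum_{j>=0} [1 - (1 - 2^-j)^(n-1)]; the summand
        # decays geometrically, so 200 terms are ample for these n.
        return n * sum(1 - (1 - 2.0 ** -j) ** (n - 1) for j in range(terms))

    for n in (100, 1000, 10000):
        print(n, round(trie_epl(n)), round(n * log2(n)))   # L_n ≈ n lg n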
Summary: tries
• A useful data structure for dictionary and pattern matching. Also a math-
ematical model for many algorithms.
• Asymptotically optimal (Θ(lg n)) expected search cost.
• Space wastage: about 44% extra nodes (1/ln 2 − 1 ≈ 0.44).
• Recurrences under the infinite random bitstring model yield GF equations
that are tricky. Solution involves infinite sums of functions.
• Explicit formulae for solutions are infinite sums. Mellin transforms or
Rice’s integrals give precise asymptotics; elementary methods can also be
used.
7 Randomized algorithms
Randomized Algorithms
• Incorporating random choices into an algorithm can improve its expected performance. The key idea is that no adversary can force us into a bad case, because our choices are unknown to the adversary.
• There are philosophical problems concerning random number generation
on a deterministic computer, or even what “random” means. Pseudoran-
dom generators are used in practice; their output passes most statistical
tests for “randomness”.
• We can also think of a randomized algorithm as an element randomly
chosen from a set of algorithms.
• There are two main types: Monte Carlo (answer may be wrong, running
time is deterministic) and Las Vegas (answer is correct, running time is
random).
Fingerprinting
• This means P(Y | N) ≤ 1 − p and P(N | Y) ≤ 1 − p.
• If, say, NO is always right, then we can improve our confidence in the answer by repeating the algorithm n times on the same instance. This is amplification of the stochastic advantage. If we ever get NO, report NO; else report YES. The probability of error is at most (1 − p)^n. To reduce this to ε requires a number of trials proportional to lg(1/ε) and inversely proportional to −lg(1 − p).
• For two-sided error we need p > 1/2. Repeat n (odd) times and return the more frequent answer. The analysis is more complicated.
• Let X_i = 1 if the ith run gives the correct answer, and 0 otherwise. Then X_i is a Bernoulli random variable and X = Σ_{i=1}^{n} X_i is binomial with parameters n and p. The probability of error of the repeated algorithm is P(X < n/2), which is
Σ_{j<n/2} C(n, j) p^j (1 − p)^{n−j}.
• This sum can be estimated directly, but it is easier to use the normal approximation: X is approximately normal with mean np and variance np(1 − p) for n large enough. Use a table of the normal distribution to work out the size of n for a given ε. The answer is proportional to lg(1/ε) and to (p − 1/2)^{−2}.
Primality testing
7.2 Las Vegas Algorithms
Las Vegas algorithms
• Nice properties: can find more than one solution even on the same input; breaks the link between input and worst-case runtime.
• Every Las Vegas algorithm can be converted to a Monte Carlo algorithm:
just report a random answer if the running time gets too long.
• Examples:
– randomized quicksort and quickselect;
– randomized greedy algorithm for n queens problem;
– integer factorization;
– universal hashing;
– linear time MST.
• If a particular run takes too long, we can just stop and run again on the same input. If the expected runtime is small, it is very unlikely that many successive runs will all be slow.
• To quantify this: let p be the probability of success, s the expected time of a successful run, and f the expected time of a failed run. Then the expected time t until we find a solution satisfies t = ps + (1 − p)(f + t), so t = s + (1 − p)f/p.
• We can use this to optimize the repeated algorithm.
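A simulation with hypothetical values of p, s, f (chosen only for illustration; runs are modelled as costing exactly s or f) matches the formula:

    import random

    p, s, f = 0.3, 1.0, 2.0      # success probability, success cost, failure cost

    def time_until_success():
        t = 0.0
        while random.random() > p:   # failed run: pay f and restart
            t += f
        return t + s                 # final successful run costs s

    trials = 100_000
    est = sum(time_until_success() for _ in range(trials)) / trials
    print(est, s + (1 - p) * f / p)  # simulation vs t = s + (1-p)f/p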
Randomized quickselect