Lectures Handout

The document discusses generating functions and their use in analyzing algorithms. It covers background topics like probability and algorithm analysis. Generating functions are defined and various types are discussed. Methods are presented for using generating functions to solve recurrences and extract coefficient information. Applications to problems involving trees, strings, and tries are provided. The document also discusses randomized algorithms and Monte Carlo and Las Vegas algorithms.
Contents
1 Outline

2 Background
  2.1 Probability
  2.2 Algorithm Analysis

3 Introduction
  3.1 First example - quicksort

4 Generating Functions
  4.1 GF definitions
  4.2 Extra details on power series
  4.3 Other types of generating functions

5 Generating functions and recurrences
  5.1 Extra details on recurrences
  5.2 Extras on the symbolic method
  5.3 Coefficient extraction from GFs

6 Combinatorial and Algorithmic Applications
  6.1 Trees
  6.2 Strings
  6.3 Tries

7 Randomized algorithms
  7.1 Monte Carlo Algorithms
  7.2 Las Vegas Algorithms

2 Background
2.1 Probability
Probability basics I

• A probability space is a set X with a probability measure Pr defined on a
  σ-algebra S.
• A σ-algebra is a collection of subsets of X that contains ∅ and is closed
  under complements and countable unions.
• A probability measure is a function Pr : S → [0, 1] such that Pr(∅) = 0 and
  Pr(∪_i A_i) = Σ_i Pr(A_i) if the A_i are pairwise disjoint.
• For us, usually X is finite of size n and S contains all 2^n subsets of X.
  The space is discrete.
• A random variable is a (measurable, real-valued) function on X. For us,
  the “measurable” can safely be ignored.

Probability basics II
• The mean of a discrete random variable T is µ := E[T] := Σ_{x∈X} T(x) Pr({x}) =
  Σ_y y Pr(T = y).

• The variance of T is σ 2 := E[T 2 ] − E[T ]2 .


• Chebyshev’s inequality says that Pr(|T − µ| > cσ) < c−2 for each c > 0.
In practice this probability is often exponentially small in c.
• The measure is uniform if the probability of a subset depends only on its
size. The probability of a k-element subset of an n-element space is then
k/n. In this case all the above quantities can be computed via counting.

2.2 Algorithm Analysis


Recalling ideas from previous courses
• We aim to compare resource use of algorithms for a given computational
problem. A mathematical model is needed.
• We measure asymptotic running time for large inputs. Small inputs are
not a challenge. We don’t care about constant factor speedups due to
faster machine, better programming, etc.
• We identify elementary operations and measure running time in terms of
these (e.g. comparisons, swaps for sorting).
• Every comparison-based sorting method must use Ω(n log n) comparisons
in the worst case for a file of size n.

Asymptotic notation
• Let f, g : N → N. We say that f ∈ O(g) if there is C > 0 such that
f (n) ≤ Cg(n) for all sufficiently large n.
• f ∈ Ω(g) ⇔ g ∈ O(f ).
• f ∈ Θ(g) ⇔ f ∈ O(g) and f ∈ Ω(g).
• In particular if lim_{n→∞} f(n)/g(n) exists and equals L, then

  f ∈ Θ(g) if 0 < L < ∞;
  f ∈ Ω(g) if L = ∞;
  f ∈ O(g) if L = 0.

Our basic framework

• Let A be an algorithm for a given problem, and let I be the set of legal
  inputs. The size of input ι ∈ I is denoted |ι|; we let I_n be the set of inputs
  of size n. The running time of A on input ι is the number of elementary
  operations T(ι).
• Worst-case analysis studies W(n) := max_{ι∈I_n} T(ι). Example: for sorting
  integers, I_n is the set of all sequences of integers of length n. For quicksort,
  W(n) ∈ Θ(n^2) but for mergesort, W(n) ∈ Θ(n log n).
• Here we prefer to study the distribution of T(ι) over I_n, since this is often
  more relevant in practice. This approach requires some probability model
  on the inputs, and then T is a random variable, whose mean, variance,
  etc., can be studied.

• We concentrate on systematic and powerful mathematical tools, avoiding


special tricks.

3 Introduction
Organizational matters

• Lecturer: Dr Mark Wilson; office 303.588; www.cs.auckland.ac.nz/~mcw.
  Office hours by appointment and whenever my door is open; email is
  preferred.
• Recommended reading:
– Flajolet and Sedgewick, An Introduction to the Analysis of Algo-
rithms (on reserve in library);
– Wilf, generatingfunctionology (in library, also freely available online
from my webpage);
– Graham, Knuth, Patashnik, Concrete Mathematics (on reserve).
– Lecture slides available online, but are continually being updated and
corrected, so don’t rely on them until the end of the course.
• One assignment worth 20% of course marks will be given.

3.1 First example - quicksort


Quicksort algorithm

• A recursive algorithm for sorting an array of n elements from a totally


ordered set.

• If n = 0 or n = 1, do nothing. Otherwise, choose a pivot element x and
partition the array so that if y < x then y is to the left of x and if z > x
then z is to the right of x. Then call the algorithm recursively on the left
and right parts.
• Note that we have not specified what happens if x = y, but we must do
so in any implementation.
• The partitioning step requires at least n−1 comparisons, plus some swaps.
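
To make the cost model concrete, here is a short Python sketch (mine, not
part of the handout); it counts the n − 1 key comparisons charged to each
partitioning step.

```python
import random

def quicksort_comparisons(a):
    """Sort list a of distinct keys in place; return the comparison count."""
    if len(a) <= 1:
        return 0
    pivot = a[0]                            # deterministic pivot choice
    left = [y for y in a[1:] if y < pivot]
    right = [z for z in a[1:] if z > pivot]
    cost = len(a) - 1                       # model cost: each non-pivot key
    cost += quicksort_comparisons(left)     # is compared with the pivot once
    cost += quicksort_comparisons(right)
    a[:] = left + [pivot] + right           # subfiles are sorted recursively
    return cost

# Sorted input exhibits the Theta(n^2) worst case; random input is ~ 2n ln n.
print(quicksort_comparisons(list(range(100))))                  # 4950
print(quicksort_comparisons(random.sample(range(10**6), 100)))  # about 650
```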

Quicksort recurrence

• Let C(F) be the number of comparisons done by quicksort on an n-element
  file F of distinct elements. Let F1, F2 be the subfiles consisting of elements
  less than, greater than the pivot. Then

  C(F) = 0                          if n = 0;
  C(F) = n − 1 + C(F1) + C(F2)      if n > 0.

• If the pivot is always the smallest element then F1 is empty, so iterating
  this recursion gives W(n) ∈ Θ(n^2). Quicksort has a bad worst case.
• Choose an input file (permutation of {1, . . . , n}) uniformly at random.
  After pivoting, the pivot is equally likely to be in any of the n positions,
  and each subfile is uniformly distributed, so

  E[C_n] = n − 1 + (1/n) Σ_{p=1}^{n} (E[C_{p−1}] + E[C_{n−p}]).

Quicksort recurrence solution


• Let a_n = E[C_n], so that a_n = n − 1 + (1/n) Σ_{1≤p≤n} (a_{p−1} + a_{n−p}). Collect
  common terms in the sums to obtain

  a_n = n − 1 + (2/n) Σ_{j=0}^{n−1} a_j.

• This full history recurrence can be solved by eliminating the history (a
  general form of telescoping).
• The solution is a_n = 2(n + 1)H_n − 4n ≈ 2n log n, where

  H_n := Σ_{1≤j≤n} 1/j,  the nth harmonic number.

• Quicksort is of optimal order on average.


• What about the variance?

Quicksort recurrence solution details

• We have for n > 1

  n a_n − (n − 1) a_{n−1} = n(n − 1) + 2 Σ_{0≤j<n} a_j
                            − (n − 1)(n − 2) − 2 Σ_{0≤j<n−1} a_j
                          = 2(n − 1) + 2 a_{n−1}.

• Thus n a_n = 2(n − 1) + (n + 1) a_{n−1}, so that

  a_n/(n + 1) = 2(n − 1)/(n(n + 1)) + a_{n−1}/n.

• We can iterate to obtain

  a_n/(n + 1) = a_1/2 + Σ_{j=2}^{n} 2(j − 1)/(j(j + 1))
              = a_1/2 + Σ_{j=2}^{n} (4/(j + 1) − 2/j).
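
A quick numerical sanity check (my sketch): iterate the full history
recurrence exactly and compare with the closed form 2(n + 1)H_n − 4n.

```python
from fractions import Fraction

N = 30
a = [Fraction(0)]                           # a_0 = 0
for n in range(1, N + 1):
    # full history recurrence: a_n = n - 1 + (2/n) * sum_{j<n} a_j
    a.append(n - 1 + Fraction(2, n) * sum(a))

H = Fraction(0)                             # harmonic numbers, incrementally
for n in range(1, N + 1):
    H += Fraction(1, n)
    assert a[n] == 2 * (n + 1) * H - 4 * n  # closed form matches exactly
print("a_n = 2(n+1)H_n - 4n verified for n <=", N)
```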

Issues arising from our analysis

1. How do we analyse suggested improvements, such as using a cutoff for
   small files, or median-of-3 pivoting, or a different partitioning method?
2. The method we used for solving the recurrences was somewhat specialized.
   How to do it more generally?
3. How do we get more information about the full distribution of the number
   of comparisons?
4. How to derive asymptotics for the sums occurring?
5. The above analysis relies on the fact that the subfiles are themselves
   uniformly distributed. What to do if this is not the case?
6. How to analyse input where keys can be equal?

We answer points 1–4 in this course.

4 Generating Functions
4.1 GF definitions
Ordinary generating functions — OGFs

• The OGF associated to a sequence a_0, a_1, . . . is the formal power series
  F(z) = Σ_n a_n z^n. Example: 1, 1, 1, . . . has OGF 1 + z + z^2 + · · · =
  1/(1 − z). See handout for more examples.

• Basic operations on sequences (sum, convolution, . . . ) correspond to those
on OGFs (sum, product, . . . ). See handout for more operations.
• The equality Σ_n z^n = 1/(1 − z) is purely formal at this stage but also
  makes sense for |z| < 1. So far OGFs are just a convenient short way to
  describe sequences, but soon they will be a powerful machine.
• Given an OGF, how do we extract its sequence if it is not available from
  the above list? The Taylor series definition always works, but is usually
  not computationally useful. We mainly use table lookup here.

Summary - power series

• Basic principle: if an identity between power series is true at the level
  of analytic functions, then it is true for formal series, provided all opera-
  tions concerned are formally valid (computation of each coefficient can be
  carried out in B).
• Thus we may freely use all our algebra and calculus knowledge, with ap-
  propriate caution about composition.
• An example of what is not allowed formally: Σ_n (z + 1)^n/n! = e Σ_n z^n/n!.
  This is true at the function level for every z ∈ C and says exp(z + 1) =
  exp(1) exp(z), but the constant term of the left side can’t be computed in
  B.

4.2 Extra details on power series


Formal power series

• A sequence is a function f : N → C. We sometimes write (f(0), f(1), . . .)
  instead of just f. The special sequences 1 := (1, 0, 0, . . .) and z :=
  (0, 1, 0, 0, . . .) will be useful.

• The set of all sequences we call A. We define operations + and · on A as
  follows.
  – The sum f + g of sequences is the sequence h such that h(n) =
    f(n) + g(n) for each n ∈ N.
  – The product f · g is the sequence h such that for each n, h(n) =
    Σ_{k=0}^{n} f(k) g(n − k).
• With these operations and the obvious multiplication by complex num-
  bers, A becomes a commutative associative algebra. It contains a sub-
  algebra B (consisting of all sequences with finitely many nonzero terms)
  isomorphic to the algebra of polynomials in one indeterminate.

Operations in the power series algebra

• We can define the derivative of f as the sequence h such that h(n) =
  (n + 1) f(n + 1).
• The antiderivative of f is the sequence h such that h(n) = f(n − 1)/n for
  n > 0, and h(0) = 0.
• The composition of f and g is the sequence h given by h(n) = Σ_{k=0}^{n} f(k) (g^k)(n),
  the nth coefficient of Σ_k f(k) g^k. This is only valid when g(0) = 0.
• The inverse of f is the series h such that f · h = 1; this exists if and only
  if f(0) ≠ 0.

All operations take place in B; that is, computing h(n) always involves only
finite algebra.

Analytic functions and power series


Some facts from a course in complex analysis (not proved here):
• For each f ∈ A there is a largest R ≤ ∞ such that for |z| < R, the infinite
  series Σ_n f(n) z^n converges absolutely and uniformly to a limit F(z).
• If R > 0, then F is analytic (has derivatives of all orders) for |z| < R, and
  can be integrated and differentiated term-by-term.
• Conversely, if R > 0 and F is an analytic function for |z| < R, then there
  is a unique sequence f so that F(z) = Σ_n f(n) z^n.
• Here f is essentially the sequence of Taylor coefficients of F at 0,

  f(n) = (1/n!) (d/dz)^n F(z) |_{z=0}

  and can also be computed by Cauchy’s integral formula

  f(n) = (1/2πi) ∫_C F(z)/z^{n+1} dz.

4.3 Other types of generating functions


Exponential generating functions — EGFs

• The EGF associated to a sequence a_0, a_1, . . . is the formal power series
  F(z) = Σ_n a_n z^n/n!. Example: 1, 1, 1, . . . has EGF 1 + z + z^2/2! + · · · =
  exp(z). See handout.
• The EGF of the sequence {a_n} is the OGF of {a_n/n!}.
• EGFs are often used for labelled constructions and for permutations (per-
  haps more later).

• Basic operations on sequences (sum, convolution, . . . ) correspond to those
on EGFs (sum, product, . . . ). See handout.
• Sometimes OGFs are better for computation, sometimes EGFs.

Probability generating functions — PGFs

• The PGF associated to a random variable X taking values in N is G(z) =
  Σ_n Pr(X = n) z^n.
• The mean of X is just G′(1) and the variance is G″(1) + G′(1) − G′(1)^2;
  see the sketch below.
• The PGF of the sum of independent RVs X and Y is the product of the
  individual PGFs.
• The OGF of the sequence of tail probabilities Pr(X > n) is
  (1 − G(z))/(1 − z).
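
For instance, a sketch using sympy (assumed available): with the PGF of a
fair six-sided die, the mean and variance formulas above become one-liners.

```python
import sympy as sp

z = sp.symbols('z')
G = sum(z**n for n in range(1, 7)) / 6             # PGF of a fair die

Gp = sp.diff(G, z)
mean = Gp.subs(z, 1)                               # G'(1) = 7/2
var = (sp.diff(G, z, 2) + Gp - Gp**2).subs(z, 1)   # G''(1) + G'(1) - G'(1)^2
print(mean, var)                                   # 7/2 35/12
```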

Multivariate generating functions — MGFs


Occur naturally in many contexts. An important one for probability is as
follows. Let a_{nk} be the number of objects of some type with size n and another
parameter χ equal to k. Let F(z, u) = Σ a_{nk} z^n u^k, the bivariate GF. Then
• c_n := [z^n] F(z, 1) = Σ_k a_{nk} is the number of objects of size n;
• µ_n := (1/c_n) [z^n] F_u(z, 1) = (1/c_n) Σ_k k a_{nk} is the mean of χ on a uniformly
  chosen object of size n;
• σ_n^2 := (1/c_n) [z^n] F_{uu}(z, 1) + µ_n − µ_n^2 is the variance of χ on a uniformly
  chosen object of size n.

5 Generating functions and recurrences


GFs solve recurrences

• Main idea: the recurrence gives an equation involving the GF. Try to solve
  this, then extract the coefficients.
• Example (Fibonacci): a_0 = 0, a_1 = 1; a_n = a_{n−1} + a_{n−2} for n ≥ 2.
  Multiply each side by z^n, then sum on n. Let F(z) = Σ_n a_n z^n. Then we
  get F(z) − z = zF(z) + z^2 F(z). This gives F(z) = z/(1 − z − z^2).
• To convert back to a sequence, we can use partial fractions. Get F(z) =
  A/(1 − φz) + B/(1 + φ^{−1}z) where φ = 1.618 · · · , the golden ratio,
  A = 1/√5, B = −1/√5.
• Thus a_n = (φ^n − (−1)^n φ^{−n})/√5 ∈ Θ(φ^n).
• Note: this can be done automatically by a computer algebra system!
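
A small check of this example (my sketch): generate the coefficients of
z/(1 − z − z^2) via the recurrence and compare with the closed form.

```python
from math import sqrt

phi = (1 + sqrt(5)) / 2
def closed_form(n):
    # a_n = (phi^n - (-1)^n phi^(-n)) / sqrt(5)
    return (phi**n - (-1)**n * phi**(-n)) / sqrt(5)

a = [0, 1]
for n in range(2, 20):
    a.append(a[-1] + a[-2])        # coefficients of z/(1 - z - z^2)

print(a[:10])                      # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
print(all(abs(closed_form(n) - a[n]) < 1e-9 for n in range(20)))  # True
```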

Quicksort recurrence with GFs

• From n a_n = n(n − 1) + 2 Σ_{j<n} a_j, a_0 = 0, we obtain
  zF′(z) = 2z^2/(1 − z)^3 + 2zF(z)/(1 − z) via the dictionary.
• This is a standard first order linear inhomogeneous differential equation
  that can be solved (how?) to obtain

  F(z) = (2/(1 − z)^2) (log(1/(1 − z)) − z).

  Thus by lookup we have a_n = 2(n + 1)H_n − 4n.
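
One way to answer the “how?”: multiply through by the integrating factor
(1 − z)^2. Alternatively, here is a sketch (assuming sympy) that simply
verifies the stated solution against the differential equation and reads off
coefficients.

```python
import sympy as sp

z = sp.symbols('z')
F = 2 / (1 - z)**2 * (sp.log(1 / (1 - z)) - z)    # claimed solution, F(0) = 0

# check z F'(z) = 2z^2/(1-z)^3 + 2z F(z)/(1-z)
lhs = z * sp.diff(F, z)
rhs = 2 * z**2 / (1 - z)**3 + 2 * z * F / (1 - z)
print(sp.simplify(lhs - rhs))      # 0

print(sp.series(F, z, 0, 6))       # z^2 + 8z^3/3 + 29z^4/6 + 37z^5/5 + O(z^6)
```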

GFs also yield recurrences

• Example: for fixed r ≥ 1, consider

  F(z) = (1 − z)/(1 − 2z + z^{r+1}) = Σ_n a_n z^n.

  Thus (Σ_n a_n z^n)(1 − 2z + z^{r+1}) = 1 − z. Compare coefficients to see
  that a_0 = 1, a_1 − 2a_0 = −1, and

  a_n − 2a_{n−1} + a_{n−r−1} = 0 for n ≥ 2

  (with a_j := 0 for j < 0). This allows linear time computation of a_n;
  see the sketch after this list.
• Every rational OGF gives a linear constant coefficient recurrence in this
  way, and vice versa.
• In the same way, linear recurrences with polynomial coefficients corre-
  spond to linear differential equations for the OGF.
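
Here is the promised linear-time sketch (mine) for the running example:

```python
def coeffs(r, N):
    """First N+1 coefficients a_n of (1 - z)/(1 - 2z + z^(r+1))."""
    a = []
    for n in range(N + 1):
        if n == 0:
            a.append(1)                    # a_0 = 1
        elif n == 1:
            a.append(1)                    # a_1 - 2 a_0 = -1
        else:                              # a_n = 2 a_{n-1} - a_{n-r-1}
            a.append(2 * a[n - 1] - (a[n - r - 1] if n >= r + 1 else 0))
    return a

# r = 2: compositions of n with parts in {1, 2} -- the Fibonacci numbers
print(coeffs(2, 10))    # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```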

5.1 Extra details on recurrences


GFs sometimes yield better recurrences
• The counting GF of binary trees by internal nodes satisfies T(z) = 1 +
  zT(z)^2 (later). This is equivalent to the quadratic recurrence a_0 = 1,
  a_n = Σ_{k<n} a_k a_{n−1−k} for the number a_n of binary trees with n nodes.
• There is an algorithm (Comtet 1964) which finds a linear differential equa-
  tion with polynomial coefficients for each algebraic GF.
• In this case the answer turns out to be

  (4z^2 − z) T′(z) + (2z − 1) T(z) + 1 = 0

  which is equivalent to the recurrence

  (n + 1) a_n = (4n − 2) a_{n−1},  a_0 = 1.

  This allows for much faster computation and makes it plain that a_n in-
  volves a quotient of factorials (in fact a_n = (1/(n+1)) C(2n, n), the nth
  Catalan number).
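
The speed-up is easy to see in code (a sketch, mine): the convolution costs
Θ(n^2) arithmetic operations in total, the linear recurrence Θ(n).

```python
def catalan_convolution(N):
    # quadratic recurrence from T(z) = 1 + z T(z)^2
    a = [1]
    for n in range(1, N + 1):
        a.append(sum(a[k] * a[n - 1 - k] for k in range(n)))
    return a

def catalan_linear(N):
    # linear recurrence (n + 1) a_n = (4n - 2) a_{n-1} from the ODE
    a = [1]
    for n in range(1, N + 1):
        a.append(a[-1] * (4 * n - 2) // (n + 1))   # division is always exact
    return a

assert catalan_convolution(20) == catalan_linear(20)
print(catalan_linear(10))   # [1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796]
```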

Comtet’s algorithm outline
• Suppose P(z, F(z)) = 0 where P(z, y) ∈ C[z, y] is irreducible. Differenti-
  ate and solve for F′ to obtain

  F′(z) = A(z, y)/B(z, y) at y = F(z), for relatively prime A, B ∈ C[z, y].

  Note that B and P are relatively prime.
• The extended Euclidean algorithm yields polynomials u(z, y), v(z, y), g(z)
  such that uB + vP = g. Define C = Au mod P to get F′(z) = C(z, y)/g(z).
  Repeat as required.
• In the above binary tree example, P = zy^2 − y + 1, B = 1 − 2zy, A = y^2,
  u = (2zy − 1)/(4z − 1), C = (2zy − y − z)/(z − 4z^2), g = 1/(4z − 1). The
  algorithm terminates in one step.

OGFs and counting I

• Can often set up a recursion and solve as above. Example: how many
  binary trees T_n with n (internal) nodes? A binary tree is a recursive object:
  either a single external node, or an internal node connected to an ordered
  pair of binary trees. Thus T_0 = 1 and for n > 0,

  T_n = Σ_{1≤k≤n} T_{k−1} T_{n−k} = Σ_{0≤k≤n−1} T_k T_{n−1−k}.

• Hence the OGF satisfies T(z) = zT(z)^2 + 1. Thus T(z) = (1 − √(1 − 4z))/(2z)
  (how to choose the square root?).
• Can extract coefficients using the binomial theorem: get

  T_n = (1/(n+1)) C(2n, n)  (Catalan number).

• Asymptotics can be derived (later) and we will get T_n ∼ 4^n/√(πn^3).

OGFs and counting II

• A nicer way to get the OGF is as follows. Let T be the set of all binary
  trees, |t| the size of tree t, T_n = {t ∈ T : |t| = n}. Then

  T(z) = Σ_n Σ_{t∈T_n} z^n = Σ_{t∈T} z^{|t|} = 1 + Σ_{t∈T\T_0} z^{|t_l|+|t_r|+1}
       = 1 + z Σ_{t_l∈T} z^{|t_l|} Σ_{t_r∈T} z^{|t_r|}
       = 1 + zT(z)^2

  where t_l, t_r are the left and right subtrees of t.

• Study this example carefully - it leads to the symbolic method for enumer-
  ation.

The symbolic method I

• Avoids explicit recurrences, goes straight from the recursive description of
  the structure to the counting GF. A great time-saver, and also more
  amenable to computer algebra implementation.
• A combinatorial class is a set X which is the disjoint union of finite sets
  X_n. The size |x| of an element x is the value of n for which x ∈ X_n.
• Let A be a combinatorial class, with |A_n| = a_n. The counting OGF for A
  is A(z) = Σ_n a_n z^n = Σ_{a∈A} z^{|a|}.
• Set operations such as disjoint union, cartesian product, sequence corre-
  spond to the GF operations sum, product, quasi-inverse.
• Main examples: plane trees, compositions, regular languages.

The symbolic method II

• Let A, B be combinatorial classes with counting OGFs A(z), B(z). The
  size of an ordered pair of objects (α, β) is defined to be |α| + |β|.
• The counting OGF of A × B is then

  Σ_{γ∈A×B} z^{|γ|} = Σ_{α∈A} Σ_{β∈B} z^{|α|+|β|} = A(z)B(z).

• Also the counting OGF for A ∪ B is A(z) + B(z) if the classes are disjoint.
  Thus the OGF for the set of sequences of elements of A is 1 + A(z) +
  A(z)^2 + · · · = (1 − A(z))^{−1}.
• If a combinatorial class is constructed from atoms using only disjoint
  union, product and sequence constructions, then its counting OGF is ra-
  tional.

5.2 Extras on the symbolic method


The symbolic method for EGFs

• EGFs are often used for labelled constructions. A labelled combinatorial
  object is one where each atom carries a positive integer label and all labels
  are distinct. The labelling is proper if the label set is {1, . . . , n}.
• Example: a permutation is a properly labelled sequence of atoms.
• When forming the sum or product of labelled classes, we need to relabel
  so that a proper labelling is obtained and the structure of the components
  is preserved. If a has size k and b size n − k, then the number of ways to
  properly label the ordered pair (a, b) is C(n, k).
• Thus it makes sense to consider EGFs, since

  (Σ_n a_n z^n/n!) (Σ_n b_n z^n/n!) = Σ_n (Σ_k C(n, k) a_k b_{n−k}) z^n/n!.

• Similarly the EGF for sets (sequences where order does not matter) is
  1 + A(z)/1! + A(z)^2/2! + · · · = exp(A(z)).

The symbolic method - binary tree example

• A binary tree is either a single external node or an internal node connected
  to a pair of binary trees. Let T be the class of binary trees:

  T = {ext} ∪ {int} × T × T.

  In terms of a formal grammar,

  <tree> = <ext> + <int> × <tree> × <tree>.

• Give <ext> weight a and <int> weight b to obtain T(z) = z^a + z^b T(z)^2.
  Special cases: a = 0, b = 1 counts trees by internal nodes; a = 1, b = 0 by
  external nodes; a = b = 1 by total nodes.

The symbolic method - composition examples

• An integer composition is a sequence of positive integers. The size is
  the sum of the sequence. The counting OGF for the positive integers
  is I(z) = z/(1 − z), so the counting OGF for compositions by size is
  (1 − z/(1 − z))^{−1} = (1 − z)/(1 − 2z), and there are 2^{n−1} compositions of n.
• Let S be any subset of the positive integers and let I(z) = Σ_{n∈S} z^n be its
  counting OGF. Then (1 − I(z))^{−1} enumerates compositions with parts
  restricted to S; see the sketch after this list. Example: S = {1, 2, . . . , r}
  gives (1 − z)/(1 − 2z + z^{r+1}).
• Similarly we can count the number of terms in the sequence with another
  variable. The bivariate GF for compositions is

  1/(1 − uz/(1 − z)) = (1 − z)/(1 − z − uz) = (1 − z)/(1 − (1 + u)z).
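
The promised sketch (mine): cross-check these OGFs by brute-force
enumeration of compositions.

```python
from itertools import product

def count_by_enumeration(n, parts):
    """Brute force: a composition of n has at most n parts."""
    return sum(1 for k in range(n + 1)
               for c in product(sorted(parts), repeat=k) if sum(c) == n)

def count_by_ogf(n, parts):
    # coefficient recurrence for 1/(1 - I(z)) with I(z) = sum of z^p:
    # a_m = sum over parts p <= m of a_{m-p}, and a_0 = 1
    a = [1] + [0] * n
    for m in range(1, n + 1):
        a[m] = sum(a[m - p] for p in parts if p <= m)
    return a[n]

parts = {1, 2, 3}
print([count_by_enumeration(n, parts) for n in range(8)])  # [1,1,2,4,7,13,24,44]
print([count_by_ogf(n, parts) for n in range(8)])          # same numbers
```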

The symbolic method - labelled tree example

• A properly labelled (unordered) tree is a connected acyclic graph with n
  vertices, each carrying one of the numbers 1, . . . , n, all labels distinct.
• By the symbolic method, the class T of rooted labelled unordered trees
  satisfies T = {•} × set(T) and so T(z) = z exp(T(z)).
• Lagrange inversion gives

  a_n = (n!/n) [y^{n−1}] exp(ny) = (n!/n) [y^{n−1}] Σ_k (ny)^k/k! = n^{n−1}.

• Hence the number of labelled trees is n^{n−2}.

5.3 Coefficient extraction from GFs


Coefficient extraction formulae (no proof)
Some basic results:
• n! ≈ √(2πn) (n/e)^n (Stirling). Hence

  (1/(n+1)) C(2n, n) ≈ 4^n/√(πn^3).

• Suppose F(z) = Σ_n a_n z^n = G(z)/H(z) is rational and H has a unique
  root ρ of smallest modulus. Then

  a_n ≈ C ρ^{−n} n^{p−1}

  where p is the order (multiplicity) of ρ and C is easily computable.
• If F(z) = zφ(F(z)) with φ(0) ≠ 0, then

  [z^n] F(z) = (1/n) [u^{n−1}] φ(u)^n  (Lagrange inversion formula).

All are best proved using the Cauchy integral formula (complex analysis).

Lagrange inversion example: degree-restricted trees

• Let 0 ∈ Ω ⊆ N. We consider the class T_Ω of all ordered plane trees such
  that the outdegree of each node is restricted to belong to Ω.
• Examples: Ω = {0, 1} gives paths; Ω = {0, 2} gives binary trees; Ω = {0, t}
  gives t-ary trees; Ω = N gives general ordered trees.
• Let T_Ω(z) be the enumerating GF of this class. The symbolic method
  immediately gives the equation

  T_Ω(z) = z φ(T_Ω(z))

  where φ(x) = Σ_{ω∈Ω} x^ω.
• Lagrange inversion is tailor-made for this situation. We have

  [z^n] T_Ω(z) = (1/n) [u^{n−1}] φ(u)^n.

Lagrange inversion example II - ternary trees

• Let Ω = {0, 3}. Then the counting (by nodes) OGF T(z) satisfies
  T(z) = z(1 + T(z)^3).
• By Lagrange inversion we get

  a_n = [z^n] T(z) = (1/n) [u^{n−1}] (1 + u^3)^n.

• By lookup (see the sketch below) we obtain

  a_n = (1/n) C(n, k)  if n = 3k + 1;
  a_n = 0              otherwise.

• Asymptotics are easily derived using Stirling’s approximation.
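
A sketch (mine) verifying the closed form: compute the series of
T(z) = z(1 + T(z)^3) by naive fixed-point iteration and compare with C(n, k)/n.

```python
from math import comb

def ternary_counts(N):
    """Coefficients of T(z) = z(1 + T(z)^3), by fixed-point iteration
    on truncated power series (each pass fixes more coefficients)."""
    t = [0] * (N + 1)
    for _ in range(N):
        cube = [0] * (N + 1)                  # truncated series T^3
        for i in range(N + 1):
            for j in range(N + 1 - i):
                for k3 in range(N + 1 - i - j):
                    cube[i + j + k3] += t[i] * t[j] * t[k3]
        t = [0] + [(1 if m == 0 else 0) + cube[m] for m in range(N)]
    return t

N = 16
t = ternary_counts(N)
for n in range(1, N + 1):
    pred = comb(n, (n - 1) // 3) // n if n % 3 == 1 else 0
    assert t[n] == pred                       # Lagrange inversion prediction
print(t[1::3])                                # [1, 1, 3, 12, 55, 273]
```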

6 Combinatorial and Algorithmic Applications


6.1 Trees
Types of trees I

• A free tree is a connected graph with no cycles.


• A rooted tree is a free tree with a distinguished node called the root.
• An ordered tree is a rooted tree where the order of subtrees is important;
recursively, a root connected to a sequence of ordered trees.
• An m-ary tree is an ordered tree where every node has 0 or m children.
• A labelled tree is a tree with n nodes such that each node is labelled by an
element of [n] and all labels are distinct.

Synonyms in literature: plane = ordered; oriented = rooted.

Some trees in analysis of algorithms

• Binary search tree: a binary tree with each internal node having a key,
  such that the key of each node n is ≤ all keys in its right subtree R_n and
  ≥ all keys in its left subtree L_n. Applications: database for comparable
  data, model for quicksort.
• Heap-ordered tree: a binary tree such that the key of each node is ≥ the
  key of anything in its subtree. Applications: priority queue.
• Trie: an m-ary tree where each external node may contain data; children
  of leaves must be nonempty. Applications: database for string data, model
  for radix exchange sort, leader election in distributed computing.

Tree attributes

• The size is the number of (external, internal, or just plain) nodes.


• The depth of a node in a rooted tree is the distance to the root.
• The maximum depth is the height. The sum of all depths of internal
(external) nodes is the internal (external) path length.

Path length in binary trees (uniform model)

• The bivariate generating function F(z, u) enumerating binary trees by
  number of nodes and internal path length satisfies the equation

  F(z, u) = 1 + zF(zu, u)^2.

• The mean and variance are given by a standard computation. Note that

  F_u(z, u) = 2zF(zu, u)[F_u(zu, u) + zF_z(zu, u)]

  and so F_u(z, 1) = 2zF(z, 1)[F_u(z, 1) + zF_z(z, 1)]. Solving,

  F_u(z, 1) = 2z^2 F(z, 1) F_z(z, 1)/(1 − 2zF(z, 1)),
  µ_n := [z^n] F_u(z, 1) / [z^n] F(z, 1).

• The mean µ_n is asymptotic to √π n^{3/2}, so the mean level of a node is of
  order √n. The standard deviation is also of order n^{3/2}.

Path length in binary search trees

• Suppose we insert n distinct keys into an initially empty BST. The uniform
  distribution on permutations of size n induces the quicksort distribution
  on BSTs of size n.
• The internal path length equals the construction cost of a binary search
  tree of size n; dividing by n gives the expected cost of a successful search.
• Let F(z, u) = Σ_π (z^{|π|}/|π|!) u^{ℓ(π)} be the BGF of BSTs by size and internal
  path length. Note that [z^n]F(z, u) is the PGF of internal path length on
  BSTs with n nodes. Then

  F_z(z, u) = F(zu, u)^2,  F(0, u) = 1.

• Moments of the distribution are easily obtained as for the uniform model.
  The mean is ∼ 2n log n and the variance is in Θ(n^2). Quite a different shape.
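
A simulation sketch (mine): build BSTs from uniform random permutations
and compare the empirical mean internal path length with 2(n + 1)H_n − 4n,
the quicksort comparison count.

```python
import random

def internal_path_length(keys):
    """Sum of node depths in the BST built by inserting keys in order."""
    root, total = None, 0
    for key in keys:
        if root is None:
            root = [key, None, None]          # node = [key, left, right]
            continue
        depth, node = 0, root
        while True:
            i = 1 if key < node[0] else 2
            depth += 1
            if node[i] is None:
                node[i] = [key, None, None]
                break
            node = node[i]
        total += depth
    return total

n, trials = 200, 400
emp = sum(internal_path_length(random.sample(range(10**6), n))
          for _ in range(trials)) / trials
H = sum(1 / j for j in range(1, n + 1))
print(emp, "vs", 2 * (n + 1) * H - 4 * n)     # the two are close
```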

6.2 Strings
String basics

• Let A be a finite set called the alphabet. A string over A is a finite sequence
of elements of A. The set of all strings over A is written A∗ .
• A subset of A∗ is a language.
• If A = {0, 1} the strings are called bitstrings.
• Basic algorithmic questions: string matching (find a pattern in a given
string); search for a word in a dictionary; compress a string. Many appli-
cations in computational biology, computer security, etc.

Hidden pattern occurrences

• The set of occurrences of the subsequence (hidden pattern) a_1 a_2 · · · a_k
  in a string of length n corresponds to A* a_1 A* a_2 A* a_3 . . . A* a_k A*.
• The counting OGF of A* is 1/(1 − mz), so the OGF for all pattern occur-
  rences is P(z) = z^k/(1 − mz)^{k+1}, where m = |A|.
• The expected number of occurrences in a random “word” of length n is
  [z^n]P(z)/m^n = m^{−k} C(n, k).
• The OGF for total occurrences of the substring a_1 a_2 . . . a_k is z^k/(1 − mz)^2
  and a similar analysis applies.
• Note the relevance to various conspiracy theories. We should expect a suf-
  ficiently long random text to contain any given hidden message.

Pattern avoidance

• We have already counted total occurrences of a given substring or pattern.
  Now we want to count the number a_n of words not containing a given
  pattern (a harder problem).
• A nice trick: let T be the position of the end of the first occurrence of the
  pattern, and X_n the event that the first n bits of a random bitstring do
  not contain the pattern. Then S(z) = Σ_{n≥0} a_n z^n implies that

  S(1/2) = Σ_{n≥0} a_n/2^n = Σ_{n≥0} Pr(X_n) = Σ_{n≥0} Pr(T > n) = E[T].

Pattern avoidance - simple example

• Given the substring σ = 00 · · · 0 of length k, let S(z) be the counting OGF
  for bitstrings without σ as a substring.
• Recursion/symbolic method gives

  S(z) = (Σ_{i<k} z^i)(1 + zS(z))

  so

  S(z) = (1 + z + z^2 + · · · + z^{k−1})/(1 − z − z^2 − · · · − z^k)
       = (1 − z^k)/(1 − 2z + z^{k+1}).

• Asymptotics: a_n ≈ Cρ^{−n} where ρ is the smallest modulus root of the
  denominator.
• Note that ρ = 1/2 + ρ^{k+1}/2 and 0 < ρ < 1. Thus 1/2 < ρ < 1/2 + 1/2^k,
  etc, and we can compute ρ quickly by iteration.
• Note S(1/2) = 2^{k+1} − 2, the expected waiting time for the first run of
  k zeros.
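
A sketch (mine) checking both claims for k = 3: coefficients of S(z) by
series division against brute force, and E[T] = 2^{k+1} − 2 by simulating the
waiting time for the first run of k zeros.

```python
from itertools import product
import random

k, N = 3, 12
num = [1] + [0] * (k - 1) + [-1]          # 1 - z^k
den = [1, -2] + [0] * (k - 1) + [1]       # 1 - 2z + z^(k+1)

a = []                                    # series division num/den (den[0]=1)
for n in range(N + 1):
    c = num[n] if n < len(num) else 0
    c -= sum(den[j] * a[n - j] for j in range(1, min(n, k + 1) + 1))
    a.append(c)

for n in range(10):                       # brute-force cross-check
    assert a[n] == sum(1 for s in product("01", repeat=n)
                       if "0" * k not in "".join(s))
print(a)                                  # [1, 2, 4, 7, 13, 24, 44, ...]

def waiting_time():                       # T = end of first run of k zeros
    run = t = 0
    while run < k:
        t += 1
        run = run + 1 if random.random() < 0.5 else 0
    return t

print(sum(waiting_time() for _ in range(10**5)) / 1e5)  # about 2^(k+1)-2 = 14
```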

Substring patterns - autocorrelation polynomial

• Consider an arbitrary binary string σ = σ_0 σ_1 · · · σ_{k−1} of length k.
• For 0 ≤ j < k, shift σ right j places. Define c_j = 1 if the overlap matches
  the tail σ^{(j)} of σ, and c_j = 0 otherwise. The autocorrelation polynomial
  is c(z) = Σ_j c_j z^j.
• Let S (resp. T) be the set of bitstrings not containing σ (resp. containing
  it exactly once, at the end). Then

  S ∪ T ≅ {ε} ∪ S × {0, 1}
  S × {σ} ≅ T × ∪_{j : c_j ≠ 0} {σ^{(j)}}

  and the symbolic method gives S(z) + T(z) = 1 + 2zS(z) and S(z)z^k =
  T(z)c(z).
• Thus

  S(z) = c(z)/(z^k + (1 − 2z)c(z)).

• Note [z^n]S(z/2) = Pr(X_n). Looking at the poles of S(z/2) we see that
  Pr(X_n) → 0 exponentially fast as n → ∞: every fixed substring is con-
  tained with high probability in a sufficiently long string.
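
A sketch (mine) computing c(z) and the avoidance counts for an arbitrary
pattern; for σ = 101 the autocorrelation polynomial is 1 + z^2.

```python
from itertools import product

def autocorrelation(sigma):
    """c_j = 1 iff sigma shifted right j places matches its own tail."""
    k = len(sigma)
    return [1 if sigma[j:] == sigma[:k - j] else 0 for j in range(k)]

def avoid_counts(sigma, N):
    """a_n = # bitstrings of length n avoiding sigma, from
    S(z) = c(z)/(z^k + (1 - 2z)c(z)) by series division."""
    k, c = len(sigma), autocorrelation(sigma)
    den = [(c[j] if j < k else 0)               # z^k + (1 - 2z)c(z)
           - 2 * (c[j - 1] if j >= 1 else 0)
           + (1 if j == k else 0) for j in range(k + 1)]
    a = []
    for n in range(N + 1):
        s = c[n] if n < k else 0                # numerator c(z)
        s -= sum(den[j] * a[n - j] for j in range(1, min(n, k) + 1))
        a.append(s)                             # den[0] = c_0 = 1 always
    return a

sigma = "101"
print(autocorrelation(sigma))                   # [1, 0, 1]
a = avoid_counts(sigma, 10)
for n in range(8):                              # brute-force cross-check
    assert a[n] == sum(1 for s in product("01", repeat=n)
                       if sigma not in "".join(s))
print(a)                                        # [1, 2, 4, 7, 12, 21, ...]
```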

Regular languages

• Rational GFs always arise from the transfer matrix method.


• Special case: the counting GF of an unambiguous regular language is
rational (Chomsky-Schützenberger, 1963).
• Recall that every regular language can be defined by an unambiguous
regular expression.
• Thus if we construct a combinatorial class iteratively using only disjoint
union, cartesian product, and sequence, the counting GF is rational.

Regular expression example

• Consider the language (over alphabet {a, b}) defined by (bb | a(bb)*aa |
  a(bb)*(ab | ba)(bb)*(ab | ba))* (number of b’s is even, number of a’s
  divisible by 3).
• The symbolic method gives

  S(z) = (1 − z^2)^2 / (1 − 3z^2 − z^3 + 3z^4 − 3z^5 + z^6).

  Hence a_n ≈ C A^n, with A ≈ 1.7998.
• Need to check that the expression is unambiguous.

Patterns in strings: summary

• The generating function for any regular expression is a rational function


and can be computed algorithmically.
• In certain special cases more efficient methods (such as autocorrelation
polynomial) exist.
• We have really only considered the case where all letters appear indepen-
dently and with equal probability. In real applications, letter probabilities
vary and letters are not independent. The methods above can be extended
to cope with this.

6.3 Tries
Tries

• Each binary tree corresponds to a set of binary strings (0 encodes left


branch, 1 encodes right branch, string is given by labels on path to external
node). This set of strings is prefix-free.
• Conversely a finite prefix-free set of strings corresponds to a unique binary
tree, a full trie.

• More generally, we may stop branching as soon as the strings are all distin-
guished. This gives a trie, a binary tree such that all children of leaves are
nonempty. Each string is stored in an external node but not all external
nodes have strings. Can be described by symbolic method.
• A Patricia trie saves space, by collapsing one-way branches to a single
node.
• Relevant parameters: number of internal nodes In ; external path length
Ln ; length of rightmost branch Rn .

Trie recurrences
We assume that a trie is built from n infinite random bitstrings. Each bit
of each string is independently either 0 or 1. We have

• L_n = n + 2^{−n} Σ_k C(n, k)(L_k + L_{n−k}),

• I_n = 1 + 2^{−n} Σ_k C(n, k)(I_k + I_{n−k}),

• R_n = 1 + 2^{−n} Σ_k C(n, k) R_k.

• Let L(z) = Σ_n L_n z^n/n!, etc. Then

  L(z) = 2L(z/2)e^{z/2} + ze^z − z;
  I(z) = 2I(z/2)e^{z/2} + e^z − z − 1;
  R(z) = R(z/2)e^{z/2} + e^z − z − 1;
  P(z) = P(z/2)e^{z/2}.

Solving the trie recurrences, I

• If φ(z) = 2e^{z/2}φ(z/2) + a(z), then by iteration we obtain

  φ(z) = Σ_{j≥0} 2^j e^{z(1−2^{−j})} a(2^{−j}z).

• Thus we obtain

  L_n = n Σ_{j≥0} [1 − (1 − 2^{−j})^{n−1}],
  I_n = Σ_{j≥0} 2^j [1 − (1 − 2^{−j})^n − (n/2^j)(1 − 2^{−j})^{n−1}].

• How to derive an asymptotic approximation? See Flajolet-Sedgewick p211,
  p402 for elementary arguments. Answers: L_n ≈ n lg n, I_n ≈ n/ln 2. More
  precise answers are obtained by complex methods (Mellin transform).
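
A numerical sketch (mine) for L_n: solve the recurrence (moving the k = 0
and k = n terms to the left side) and compare with the infinite-sum solution.

```python
from math import comb

def L_recurrence(N):
    """L_n = n + 2^-n sum_k C(n,k)(L_k + L_{n-k}), with L_0 = L_1 = 0."""
    L = [0.0, 0.0]
    for n in range(2, N + 1):
        s = n + 2.0**-n * sum(comb(n, k) * (L[k] + L[n - k])
                              for k in range(1, n))
        L.append(s / (1 - 2.0**(1 - n)))   # k = 0 and k = n contribute L_n
    return L

def L_sum(n, terms=200):
    # explicit solution: L_n = n * sum_j [1 - (1 - 2^-j)^(n-1)]
    return n * sum(1 - (1 - 0.5**j)**(n - 1) for j in range(terms))

L = L_recurrence(50)
for n in (2, 5, 10, 50):
    print(n, L[n], L_sum(n))               # the two solutions agree
```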

Solving the trie recurrences, II

• Define φ̂(z) = e^{−z}φ(z), etc. (this is the Poisson transform). Then we have

  φ̂(z) = 2φ̂(z/2) + â(z).

• Iteration yields

  φ̂(z) = Σ_{j≥0} 2^j â(2^{−j}z).

• This gives, on inverting the transform,

  L_n = Σ_{k≥2} (−1)^k C(n, k) k 2^{k−1}/(2^{k−1} − 1).

• Asymptotics for such alternating sums can be obtained by Rice’s method.

Summary: tries
• A useful data structure for dictionary and pattern matching. Also a math-
ematical model for many algorithms.
• Asymptotically optimal (lg n) expected search cost.
• Space wastage: about 44% extra nodes (1/ln 2 − 1 ≈ 0.44).
• Recurrences under the infinite random bitstring model yield GF equations
that are tricky. Solution involves infinite sums of functions.
• Explicit formulae for solutions are infinite sums. Mellin transforms or
Rice’s integrals give precise asymptotics; elementary methods can also be
used.

7 Randomized algorithms
Randomized Algorithms
• Incorporating random choices into an algorithm can improve its expected
performance. The key idea is that no adversary can force us into a bad
case, because our choices are unknown.
• There are philosophical problems concerning random number generation
on a deterministic computer, or even what “random” means. Pseudoran-
dom generators are used in practice; their output passes most statistical
tests for “randomness”.

• We can also think of a randomized algorithm as an element randomly
chosen from a set of algorithms.
• There are two main types: Monte Carlo (answer may be wrong, running
time is deterministic) and Las Vegas (answer is correct, running time is
random).

7.1 Monte Carlo Algorithms


Monte Carlo algorithms

• If the runtime on a given instance is deterministic and the answer is correct


with bounded probability, we have a Monte Carlo algorithm.
• A MC algorithm for a decision problem is biased if one of the answers
  (yes/no) is always correct when given. For example, if YES answers are
  always correct: P(Y|N) ≡ Pr(says Y given truth is N) = 0, P(N|N) = 1,
  P(Y|Y) = p, P(N|Y) = 1 − p.
• Examples: fingerprinting (verification of identities); primality testing.

Fingerprinting

• Suppose we have a set U and want to determine whether elements u, v are


equal. It is often easier to choose a fingerprinting function f and compute
whether f (u) = f (v), in which case we return yes. Such methods are
always biased: P (N |Y ) = 0.

• Example: X, Y, Z are n × n matrices, and we want to know whether
  XY = Z. Choose a random n-bit vector r and compute X(Y r) and Zr;
  see the sketch after this list. Can show P(Y|N) ≤ 1/2.
• Example: M is a symbolic matrix in variables x1 , . . . xn ; we want to know
whether det(M ) = 0. Choose a finite subset S of C and choose r1 , . . . rn
independently and uniformly from S. Substitute xi = ri for all i and
compute the determinant. Can show P (Y |N ) ≤ m/|S| where m is the
total degree of the polynomial det(M ).
• There are many specific applications of the above examples.
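
A sketch of the matrix example (Freivalds' verification, mine): t independent
trials drive the one-sided error below 2^{−t}.

```python
import random

def freivalds(X, Y, Z, trials=20):
    """Probabilistic check that XY = Z for n x n integer matrices
    (lists of lists). NO answers are always correct (biased MC)."""
    n = len(X)
    def matvec(M, v):
        return [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
    for _ in range(trials):
        r = [random.randint(0, 1) for _ in range(n)]
        # X(Yr) and Zr: three matrix-vector products, O(n^2) per trial
        if matvec(X, matvec(Y, r)) != matvec(Z, r):
            return False          # witness found: certainly XY != Z
    return True                   # probably XY = Z; error prob <= 2^-trials

X, Y = [[1, 2], [3, 4]], [[0, 1], [1, 0]]
Z = [[2, 1], [4, 3]]              # the true product
print(freivalds(X, Y, Z))         # True
Z[1][1] += 1
print(freivalds(X, Y, Z))         # False, almost surely
```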

Improving biased Monte Carlo algorithms

• Let 0 < p < 1. Say a MC algorithm is p-correct if its error probability


is at most 1 − p on every instance (sometimes p depends on the size, but
never on the instance itself).

• This means P (Y |N ) ≤ 1 − p, P (N |Y ) ≤ 1 − p.

• If, say, NO is always right then we can improve our confidence in the
  answer by repeating the algorithm n times on the same instance. This is
  amplification of the stochastic advantage. If we ever get NO, report NO.
  Else report YES. The probability of error is at most (1 − p)^n. To reduce
  this to ε requires a number of trials proportional to lg(1/ε) and inversely
  proportional to − lg(1 − p).
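
A one-line version of this calculation (my sketch):

```python
from math import ceil, log2

def trials_needed(p, eps):
    """Repetitions of a p-correct one-sided MC algorithm so that the
    probability that every run errs, (1 - p)^n, falls below eps."""
    return ceil(log2(1 / eps) / -log2(1 - p))

print(trials_needed(0.75, 1e-9))   # 15 rounds when errors occur w.p. <= 1/4
print(trials_needed(0.05, 1e-9))   # about 405 rounds for a weak advantage
```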

Improving unbiased Monte Carlo algorithms

• We need p > 1/2. Repeat n (odd) times and return the more frequent
  answer. The analysis is more complicated.
• Let X_i = 1 if the ith run gives the correct answer, 0 otherwise. Then X_i
  is a Bernoulli random variable and X = Σ_{i=1}^{n} X_i is binomial with
  parameter p. The probability of error of the repeated algorithm is
  P(X < n/2). This is just

  Σ_{j<n/2} C(n, j) p^j (1 − p)^{n−j}.

• Simplifying this could be done, but it is easier to use the normal approxi-
  mation: X is approximately normal with mean np and variance np(1 − p)
  for n large enough. Use a table of the normal distribution to work out the
  size of n for a given ε. The answer is proportional to lg(1/ε) and to
  (p − 1/2)^{−2}.

Primality testing

• We wish to know whether a given positive integer n is prime (has no
  proper factor other than 1).
• Fermat (1640) proved that if n is prime and 1 ≤ a ≤ n − 1, then a^{n−1} ≡ 1
  (mod n). However if n is composite (not prime) then this congruence may
  or may not hold.
• An obvious idea is: given n, choose a randomly and compute a^{n−1} mod n
  (recall that we can do this quickly, using a divide-and-conquer approach);
  see the sketch below. If the answer is not 1, we report NO (such an a is
  called a witness).
• This gives a biased Monte Carlo algorithm with P(Y|Y) = 1, P(N|Y) = 0.
  What about P(Y|N), P(N|N)?
• Unfortunately P(Y|N) can be made arbitrarily close to 1 (some composite
  n have a lot of false witnesses). So this algorithm is not p-correct for any
  p > 0.
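
A sketch of the Fermat test (mine); pow(a, n − 1, n) is Python's built-in
fast modular exponentiation, i.e. the divide-and-conquer computation
mentioned above.

```python
import random

def fermat_test(n, trials=20):
    """Biased MC primality test: NO (composite) is always correct."""
    if n < 4:
        return n in (2, 3)
    for _ in range(trials):
        a = random.randrange(2, n - 1)
        if pow(a, n - 1, n) != 1:     # a is a witness: n is surely composite
            return False
    return True                       # no witness found: probably prime

print(fermat_test(10**9 + 7))         # True (a well-known prime)
print(fermat_test(561))               # usually False, but ...
print(fermat_test(561, trials=1))     # ... often True: 561 = 3*11*17 is a
                                      # Carmichael number, rich in false witnesses
```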

Improving the primality testing algorithm

• There is a more complicated test (Miller-Rabin, 1976) based on Fermat’s
  result that gives P(Y|N) ≤ 1/4, and hence a 3/4-correct algorithm.
• There is an even more complicated test (Agrawal-Kayal-Saxena 2002) that
is in fact always correct, and this gives the first worst-case polynomial-time
algorithm for primality. But it is not as fast as the randomized algorithm
above.

7.2 Las Vegas Algorithms
Las Vegas algorithms

• Nice properties: can find more than one solution even on the same input;
  breaks the link between input and worst-case runtime.
• Every Las Vegas algorithm can be converted to a Monte Carlo algorithm:
just report a random answer if the running time gets too long.

• Examples:
– randomized quicksort and quickselect;
– randomized greedy algorithm for n queens problem;
– integer factorization;
– universal hashing;
– linear time MST.

Improving Las Vegas algorithms

• If a particular run takes too long, we can just stop and run again on the
same input. If the expected runtime is fast, it is very unlikely that many
iterations will fail to run fast.
• To quantify this: let p be probability of success, s the expected time to find
a solution, and f the expected time when failure occurs. Then expected
time t until we find a solution is given by t = ps + (1 − p)(f + t), so
t = s + (1 − p)f /p.
• We can use this to optimize the repeated algorithm.
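
A sketch (mine, with toy parameters) checking the restart formula by
simulation:

```python
import random

p, s, f = 0.2, 1.0, 3.0    # success probability, success time, failure time

def time_until_success():
    t = 0.0
    while random.random() >= p:
        t += f             # failed run: pay f, restart on the same input
    return t + s           # successful run: pay s

runs = 200000
emp = sum(time_until_success() for _ in range(runs)) / runs
print(emp, "vs", s + (1 - p) * f / p)   # both about 13.0
```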

Randomized quickselect
