ORIE 6334 Bridging Continuous and Discrete Optimization Nov 4, 2019

Lecture 15
Lecturer: David P. Williamson Scribe: Abhishek Shetty

1 Iterative Methods
For a graph G and a supply vector b, we would like to solve the linear system LG p = b
for the potential p. We would also like to construct algorithms that take advantage
of the sparsity of G; even writing down L†G explicitly takes O(n^2) time and space.
This motivates us to explore iterative methods for solving linear systems of
equations. To solve a linear system Ax = b, iterative algorithms only involve mul-
tiplication of the matrix A with vectors, and for a matrix A with m nonzero entries this
can be done in time O(m). One disadvantage of such methods is that, unlike direct
methods such as Gaussian elimination, they only return an approximate solution, with the
gap becoming smaller the longer the algorithm runs. However, they are quite fast
and require little space.
The basic idea behind iterative methods is that to solve a system of linear equa-
tions Ax = b, where A is symmetric and positive definite, we start with a vector x0 ,
perform the linear operation A on it (along with some vector additions) to get x1 ,
and iteratively keep performing these operations to get the sequence x0 , x1 , ..., xt and
we stop when xt is sufficiently close to the vector x∗ which satisfies Ax∗ = b. The
exact details will be outlined below, but we can see that the only expensive operation
is multiplying by the matrix A, which is fast if A is sparse.
Before we dive into the algorithm, we note that if Ax∗ = b, then for any scalar α,
we have αAx∗ = αb, which we can rearrange to get

x∗ = (I − αA)x∗ + αb.

This tells us that x∗ is the fixed point of the affine transformation indicated by
the equation, and naturally leads to an iterative algorithm, called the Richardson
Iteration. Formally, consider the following algorithm.
Algorithm 1: Richardson Iteration
x0 ← 0
for t ← 1 to k do
xt ← (I − αA)xt−1 + αb
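As a concrete illustration (a minimal sketch we add here, not part of the original notes; the function and variable names are ours), the iteration needs nothing more than matrix-vector products, so it can be implemented in a few lines:

import numpy as np

def richardson(A, b, alpha, num_iters):
    # Richardson iteration: x_t = (I - alpha*A) x_{t-1} + alpha*b, starting from x_0 = 0.
    # A only needs to support A @ x, so a sparse matrix works and each iteration
    # costs one matrix-vector product, i.e. O(m) for a matrix with m nonzeros.
    x = np.zeros_like(b, dtype=float)
    for _ in range(num_iters):
        x = x - alpha * (A @ x) + alpha * b
    return x

# Tiny example on a symmetric positive definite matrix.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
lam = np.linalg.eigvalsh(A)
alpha = 2.0 / (lam[0] + lam[-1])          # the step size analyzed below
print(np.allclose(richardson(A, b, alpha, 100), np.linalg.solve(A, b)))  # True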

This lecture is based on scribe notes by Shijin Rajakrishnan for a previous version of this course
and Lecture 12 of Daniel Spielman's course on Spectral Graph Theory.

Remark 1 Note that if we rewrite our linear system as A1/2 x = A−1/2 b (assuming
A is invertible), then the Richardson iteration is equivalent to gradient descent for the
square loss ½‖A1/2 x − A−1/2 b‖^2 with step size α.
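To spell out the equivalence (a short check we add here; it is not in the original notes): writing f(x) = ½‖A1/2 x − A−1/2 b‖^2 and expanding the square gives

    f(x) = ½ xT A x − bT x + ½ bT A−1 b,    so    ∇f(x) = Ax − b,

and a gradient step with step size α, namely xt = xt−1 − α(Axt−1 − b) = (I − αA)xt−1 + αb, is exactly the Richardson update.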

In order to analyze the algorithm, consider the following definition.

Definition 1 (Spectral Norm) Given a matrix M, define its spectral norm as

    ‖M‖ = sup_{x≠0} ‖Mx‖2 / ‖x‖2 .

When M is symmetric, this can also be equivalently defined as

    ‖M‖ = max_i |µi|,

where µi are the eigenvalues of M.

Suppose that λ1 ≤ λ2 ≤ · · · ≤ λn are the eigenvalues of the matrix A. Then
the eigenvalues of I − αA are 1 − αλ1 ≥ 1 − αλ2 ≥ · · · ≥ 1 − αλn , and thus

    ‖I − αA‖ = max_i |1 − αλi| = max{ |1 − αλ1|, |1 − αλn| }.

This is minimized when we take α = 2/(λ1 + λn), at which point both terms equal
(λn − λ1)/(λn + λ1), yielding ‖I − αA‖ = 1 − 2λ1/(λ1 + λn).
Now we turn to the analysis of the convergence of the Richardson Iteration.

    x∗ − xt = ((I − αA)x∗ + αb) − ((I − αA)xt−1 + αb)
            = (I − αA)(x∗ − xt−1)
            = (I − αA)^2 (x∗ − xt−2)
            = · · ·
            = (I − αA)^t (x∗ − x0)
            = (I − αA)^t x∗.

We define xt to be close to x∗ when the norm of their difference is a small
fraction ε of the norm of x∗. Then

    ‖x∗ − xt‖ = ‖(I − αA)^t x∗‖
              ≤ ‖(I − αA)^t‖ ‖x∗‖
              = (1 − 2λ1/(λ1 + λn))^t ‖x∗‖
              ≤ exp(−2λ1 t/(λ1 + λn)) ‖x∗‖,

where the final step used the fact that 1 − x ≤ e^{−x}. We set

    t = ⌈((λ1 + λn)/(2λ1)) ln(1/ε)⌉ = ⌈(λn/(2λ1) + 1/2) ln(1/ε)⌉,

so that we obtain

    ‖x∗ − xt‖ ≤ ε ‖x∗‖.
We can see that the speed of convergence, i.e., the number of iterations required to
get close to the solution to Ax = b, depends on the ratio λn/λ1 of the largest and smallest
eigenvalues, and that the larger this ratio is, the longer it takes for the algorithm to
converge to the approximate solution.
Definition 2 For a symmetric, positive definite matrix A with eigenvalues λ1 ≤ λ2 ≤
· · · ≤ λn, its condition number is defined as

    κ(A) = λn/λ1 .
This algorithm was just one example of an iterative method for finding an
approximate solution to a linear system. There are other, faster methods (such as
the Chebyshev method and Conjugate Gradient) that find an ε-approximate solution
in O(√(κ(A)) ln(1/ε)) iterations.
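To get a sense of the difference, here is a small back-of-the-envelope computation (our illustration, not from the notes); it simply evaluates the bound t = ((κ+1)/2) ln(1/ε) derived above for the Richardson iteration against the √(κ(A)) ln(1/ε) bound for the faster methods:

import math

def richardson_iters(kappa, eps):
    # Richardson bound from above: t = (lambda_n/(2*lambda_1) + 1/2) * ln(1/eps) = ((kappa+1)/2) * ln(1/eps)
    return math.ceil((kappa + 1) / 2 * math.log(1 / eps))

def chebyshev_cg_iters(kappa, eps):
    # O(sqrt(kappa) * ln(1/eps)) bound for Chebyshev / conjugate gradient (constant omitted)
    return math.ceil(math.sqrt(kappa) * math.log(1 / eps))

eps = 1e-6
for kappa in (10, 100, 10000):
    print(kappa, richardson_iters(kappa, eps), chebyshev_cg_iters(kappa, eps))
# For kappa = 10000 this is roughly 69,000 iterations versus roughly 1,400.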

2 Preconditioning
We can see that if we modify the initial problem so that the condition number de-
creases, then the algorithms will run faster. Of course, since we change the problem,
we need to worry about how far the new solution is from the old, and about how the
algorithm changes when we change the initial matrix. First, let us set up
some notation.

Definition 3 Given a matrix M, we say that M ⪰ 0 iff M is positive semidefinite.
Given a pair of matrices A and B, we say B ⪰ A iff B − A ⪰ 0.

One such idea is to precondition the matrix. For a matrix B ⪰ 0 (or B ≻ 0) that
is symmetric and has the same nullspace as A, instead of solving Ax = b, solve
B†Ax = B†b. Now we apply the iterative methods to the matrix B†A. This provides
an improvement because we will prove that for a careful choice of B, we can reduce
the condition number of the new matrix, and thus approximate the solution faster.
For solving LG p = b, we precondition by L†H , where H is a subgraph of G. In
particular, we precondition by L†T where T is a spanning tree of G. This idea is at-
tributed to Vaidya ([1]). Now the relevant condition number is λn(L†T LG)/λ2(L†T LG):
we know that the smallest eigenvalue is zero, and thus look at the smallest positive
eigenvalue for the condition number, which, assuming the graph is connected, is λ2.
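As a sketch of how such a preconditioner is used (our illustration; the notes do not commit to a particular iterative scheme at this point), a Richardson-style iteration for B†Ax = B†b only ever needs to apply A and to apply B† to a vector:

import numpy as np

def preconditioned_richardson(apply_A, solve_B, b, alpha, num_iters):
    # Richardson applied to B^+ A x = B^+ b:
    #   x <- (I - alpha * B^+ A) x + alpha * B^+ b  =  x + alpha * B^+ (b - A x).
    # apply_A(v) should return A @ v; solve_B(v) should return B^+ @ v.
    x = np.zeros_like(b, dtype=float)
    for _ in range(num_iters):
        x = x + alpha * solve_B(b - apply_A(x))
    return x

The point of choosing B = LT is that solve_B then costs only O(n) per call, as described at the end of the next section.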

Claim 1 For any subgraph H of G, LH ⪯ LG.

Proof: For all x,

    xT LH x = Σ_{(i,j)∈H} (x(i) − x(j))^2
            ≤ Σ_{(i,j)∈E} (x(i) − x(j))^2
            = xT LG x.

Thus, xT (LG − LH)x ≥ 0. From this, we infer that LH ⪯ LG. □


Claim 2 L†T LG has the same spectrum as L†/2T LG L†/2T, where L†/2T = Σ_{i:λi≠0} (1/√λi) xi xiT
and λi, xi are corresponding eigenvalues and eigenvectors of LT.

Proof: Consider an eigenvector x of L†T LG of eigenvalue λ such that ⟨x, e⟩ =
0. Then since L†T LG x = λx, on setting x = L†/2T y, we get L†T LG L†/2T y = λ L†/2T y.
Premultiplying both sides by L1/2T = Σ_{i:λi≠0} √λi xi xiT, we obtain L†/2T LG L†/2T y = λy,
implying that λ is an eigenvalue of L†/2T LG L†/2T as well. □
Using these results, we can prove a bound on the smallest positive eigenvalue of
L†T LG .

Lemma 3 λ2(L†T LG) ≥ 1.

Proof:

    λ2(L†T LG) = λ2(L†/2T LG L†/2T)
               = min_{x:⟨x,e⟩=0} (xT L†/2T LG L†/2T x)/(xT x)
               = min_{y=L†/2T x, ⟨x,e⟩=0} (yT LG y)/(yT LT y)
               ≥ 1,

where the final step used the fact that LT ⪯ LG. □


So we have bounded the denominator of the condition number of L†T LG , and we
now turn to upper-bounding the numerator.

3 Connection to Low Stretch Spanning Trees
Suppose that G is a weighted graph, with weights 1/w(i,j) ≥ 0, for each edge (i,j) ∈ E.
The above proof for bounding λ2(L†T LG) can be used to prove the same bound even in the
weighted case. From the last lecture, recall that for a spanning tree T of G, and an
edge e = (k,l) ∈ E, the stretch of e is defined as

    stT(e) = ( Σ_{(i,j) on k-l path in T} w(i,j) ) / w(k,l),

and that the total stretch of the graph is

    stT(G) = Σ_{e∈E} stT(e).
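As a small illustration of these definitions (a sketch we add; the tree is stored as an adjacency list, and w is a dictionary of the quantities w(i,j) from above keyed by edge), the total stretch can be computed by walking the unique tree path between the endpoints of every edge of G:

def tree_path(tree_adj, k, l):
    # Return the unique k-l path in the tree as a list of vertices (iterative DFS from k).
    stack, parent = [k], {k: None}
    while stack:
        u = stack.pop()
        if u == l:
            break
        for v in tree_adj[u]:
            if v not in parent:
                parent[v] = u
                stack.append(v)
    path, u = [], l
    while u is not None:
        path.append(u)
        u = parent[u]
    return path[::-1]

def total_stretch(edges, w, tree_adj):
    # st_T(G) = sum over edges (k,l) of (sum of w(i,j) along the k-l tree path) / w(k,l).
    total = 0.0
    for (k, l) in edges:
        path = tree_path(tree_adj, k, l)
        path_weight = sum(w[frozenset((path[i], path[i + 1]))] for i in range(len(path) - 1))
        total += path_weight / w[frozenset((k, l))]
    return total

# Example: the 4-cycle with w = 1 on every edge, and T the path 0-1-2-3.
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
w = {frozenset(e): 1.0 for e in edges}
tree_adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(total_stretch(edges, w, tree_adj))   # 1 + 1 + 1 + 3 = 6.0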

Lemma 4 ([2]) tr(L†T LG ) = stT (G).

Proof:

    tr(L†T LG) = tr( L†T Σ_{(k,l)∈E} (1/w(k,l)) (ek − el)(ek − el)T )
               = Σ_{(k,l)∈E} (1/w(k,l)) tr( L†T (ek − el)(ek − el)T )
               = Σ_{(k,l)∈E} (1/w(k,l)) tr( (ek − el)T L†T (ek − el) )        (a)
               = Σ_{(k,l)∈E} (1/w(k,l)) reff(k,l)
               = Σ_{(k,l)∈E} (1/w(k,l)) Σ_{(i,j) on k-l path in T} w(i,j)
               = stT(G),

where (a) used the cyclic property of the trace; that is, the trace is invariant under
cyclic permutations, and thus tr(ABC) = tr(BCA) = tr(CAB) (this is equivalent
to the fact that tr(AB) = tr(BA)), and reff(k,l) is the effective resistance in the tree
T for sending one unit of current from k to l, with conductances 1/w(i,j). □
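As a quick numerical sanity check of this lemma (our example, not from the notes), one can build the two Laplacians directly and compare the trace to the stretch; here we reuse the 4-cycle with w = 1 on every edge and the path tree 0-1-2-3, whose total stretch is 6:

import numpy as np

def laplacian(n, conductances):
    # Build the Laplacian with the given edge conductances (here 1/w(i,j) = 1).
    L = np.zeros((n, n))
    for (i, j), c in conductances.items():
        L[i, i] += c
        L[j, j] += c
        L[i, j] -= c
        L[j, i] -= c
    return L

LG = laplacian(4, {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (0, 3): 1.0})
LT = laplacian(4, {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0})
print(np.trace(np.linalg.pinv(LT) @ LG))   # approximately 6.0 = st_T(G)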
From this lemma, we can arrive at the required bound on the largest eigenvalue:
since all the eigenvalues of L†T LG are nonnegative, λn(L†T LG) ≤ tr(L†T LG) = stT(G).
Thus, from the previous two lemmas, we can see that the condition number of L†T LG is
at most stT(G), and thus the linear system L†T LG p = L†T b can be ε-approximately
solved for p in O(√(stT(G)) ln(1/ε)) iterations.

But now each iteration consists of multiplying by the matrix L†T LG , and initially,
we need to compute L†T b as well. Thus we can see that we need to be able to compute
the product of a vector with L†T in an efficient way. Suppose that we have to compute
z = L†T y, equivalently, solve LT z = y, then it turns out that since T is not just any
subgraph but rather a spanning tree, this computation can be done in time O(n).
To see this, we write down the equations in the system LT z = y:
    dT(i) z(i) − Σ_{j:{i,j}∈T} z(j) = y(i)    ∀ i ∈ V.

Suppose that i is a leaf in T, with an incident edge (i,j). Then the relevant equation
for this node is z(i) − z(j) = y(i), i.e., z(i) = z(j) + y(i). Note that since i is a leaf,
the only equations in which the variable z(i) appears are this one and the equation for
z(j). Thus we can substitute z(j) + y(i) for z(i) and recurse on the smaller tree
excluding the vertex i. This recursion will continue until we end up with a single edge
(k, l). In this case, we set z(k) = 0, and back substitute to find the values of z for all
the other vertices. It can be seen that this process takes O(n) time, as in each step
of the recursion, we do constant work and there are n − 1 recursive steps.
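A minimal sketch of this leaf-elimination procedure (our code, with names of our choosing; the tree is given as an adjacency list, and the entries of y are assumed to sum to zero so that the system LT z = y is consistent):

def solve_tree_laplacian(tree_adj, y):
    # Solve L_T z = y on a tree {vertex: list of neighbors} in O(n) time by
    # repeatedly eliminating a leaf i via z(i) = z(j) + y(i), folding y(i) into y(j),
    # and then back-substituting, with z = 0 at the last remaining vertex.
    y = dict(y)
    deg = {u: len(nbrs) for u, nbrs in tree_adj.items()}
    leaves = [u for u in tree_adj if deg[u] == 1]
    removed, order = set(), []              # order records (leaf, neighbor at elimination time)
    while len(removed) < len(tree_adj) - 1:
        i = leaves.pop()
        (j,) = set(tree_adj[i]) - removed   # the unique remaining neighbor of i
        order.append((i, j))
        y[j] += y[i]                        # substitute z(i) = z(j) + y(i) into j's equation
        removed.add(i)
        deg[j] -= 1
        if deg[j] == 1:
            leaves.append(j)
    z = {u: 0.0 for u in tree_adj if u not in removed}   # the single remaining vertex gets z = 0
    for i, j in reversed(order):            # back-substitution
        z[i] = z[j] + y[i]
    return z

# Example: the path 0-1-2 with y = (1, 0, -1) returns z = (0, -1, -2), one valid potential.
print(solve_tree_laplacian({0: [1], 1: [0, 2], 2: [1]}, {0: 1.0, 1: 0.0, 2: -1.0}))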
Thus we can compute the matrix product with L†T LG in time O(m), and recalling
that for a graph G, we can find a low stretch spanning tree of stretch stT(G) =
O(m log n log log n) in time O(m log n log log n), we can see that given the system
LG p = b, in O(m log n log log n + m √(stT(G)) ln(1/ε)) = Õ(m^{3/2} ln(1/ε)) time, we can find
an ε-approximate solution.
Remember that in finding an upper bound for the largest eigenvalue of L†T LG,
we bounded it by its trace. [2] improved upon this running time bound by using the
following result.
Theorem 5 ([3]; as stated in [2]) For matrices A, B ⪰ 0 with the same nullspace,
let all but q eigenvalues of B†A lie in the interval [l, u], with the remaining eigenvalues
larger than u. Then for a vector b in the rangespace of A, using the preconditioned
conjugate gradient algorithm, an ε-approximate solution such that ‖x − A†b‖A ≤
ε ‖A†b‖A can be found in q + ⌈(1/2) √(u/l) ln(2/ε)⌉ iterations, where ‖x‖A = √(xT A x).
We can use this theorem: since we have a bound on the trace, we can bound
the number of large eigenvalues. Set l = 1 and u = (stT(G))^{2/3}; then there can be at
most q = stT(G)/u = (stT(G))^{1/3} eigenvalues of value more than u. Now √(u/l) = q, and thus
we get that the number of iterations required to solve the system approximately is
O((stT(G))^{1/3} ln(1/ε)) = Õ(m^{1/3} ln(1/ε)), for an overall running time of Õ(m^{4/3} ln(1/ε)).

References
[1] Pravin M. Vaidya. Solving linear equations with symmetric diagonally dominant
matrices by constructing good preconditioners. Unpublished manuscript, UIUC, 1990.
A talk based on the manuscript was presented at the IMA Workshop on Graph Theory
and Sparse Matrix Computation, October 1991, Minneapolis, MN.

[2] Daniel A. Spielman and Jaeoh Woo. A note on preconditioning by low-stretch
spanning trees. arXiv:0903.2816, 2009. http://arxiv.org/abs/0903.2816

[3] Owe Axelsson and Gunhild Lindskog. On the rate of convergence of the precondi-
tioned conjugate gradient method. Numerische Mathematik, 48(5):499–523, 1986.
