Matrix Algorithms
Volume I: Basic Decompositions

G. W. Stewart
University of Maryland
College Park, Maryland
siam
Society for Industrial and Applied Mathematics
Philadelphia
All rights reserved. Printed in the United States of America. No part of this book may be reproduced,
stored, or transmitted in any manner without the written permission of the publisher. For information, write
to the Society for Industrial and Applied Mathematics, 3600 University City Science Center, Philadelphia,
PA 19104-2688.
0-89871-414-1 (Volume I)
0-89871-418-4 (set)
CONTENTS
Algorithms xiii
Notation xv
Preface xvii
References 417
Index 441
ALGORITHMS
NOTATION
PREFACE
This book, Basic Decompositions, is the first volume in a projected five-volume series
entitled Matrix Algorithms. The other four volumes will treat eigensystems, iterative
methods for linear systems, sparse direct methods, and special topics, including fast
algorithms for structured matrices.
My intended audience is the nonspecialist whose needs cannot be satisfied by black
boxes. It seems to me that these people will be chiefly interested in the methods them-
selves—how they are derived and how they can be adapted to particular problems.
Consequently, the focus of the series is on algorithms, with such topics as rounding-
error analysis and perturbation theory introduced impromptu as needed. My aim is to
bring the reader to the point where he or she can go to the research literature to augment
what is in the series.
The series is self-contained. The reader is assumed to have a knowledge of ele-
mentary analysis and linear algebra and a reasonable amount of programming expe-
rience— about what you would expect from a beginning graduate engineer or an un-
dergraduate in an honors program. Although strictly speaking the individual volumes
are not textbooks, they are intended to teach, and my guiding principle has been that
if something is worth explaining it is worth explaining fully. This has necessarily re-
stricted the scope of the series, but I hope the selection of topics will give the reader a
sound basis for further study.
The focus of this and part of the next volume will be the computation of matrix
decompositions—that is, the factorization of matrices into products of simpler ones.
This decompositional approach to matrix computations is relatively new: it achieved
its definitive form in the early 1960s, thanks to the pioneering work of Alston House-
holder and James Wilkinson. Before then, matrix algorithms were addressed to spe-
cific problems—the solution of linear systems, for example — and were presented at
the scalar level in computational tableaus. The decompositional approach has two ad-
vantages. First, by working at the matrix level it facilitates the derivation and analysis
of matrix algorithms. Second, by deemphasizing specific problems, the approach turns
the decomposition into a computational platform from which a variety of problems can
be solved. Thus the initial cost of computing a decomposition can pay for itself many
times over.
In this volume we will be chiefly concerned with the LU and the QR decomposi-
tions along with certain two-sided generalizations. The singular value decomposition
also plays a large role, although its actual computation will be treated in the second
volume of this series. The first two chapters set the stage not only for the present vol-
ume but for the whole series. The first is devoted to the mathematical background—
matrices, vectors, and linear algebra and analysis. The second chapter discusses the
realities of matrix computations on computers.
The third chapter is devoted to the LU decomposition—the result of Gaussian
elimination. This extraordinarily flexible algorithm can be implemented in many dif-
ferent ways, and the resulting decomposition has innumerable applications. Unfortu-
nately, this flexibility has a price: Gaussian elimination often quivers on the edge of
instability. The perturbation theory and rounding-error analysis required to understand
why the algorithm works so well (and our understanding is still imperfect) is presented
in the last two sections of the chapter.
The fourth chapter treats the QR decomposition—the factorization of a matrix
into the product of an orthogonal matrix and an upper triangular matrix. Unlike the
LU decomposition, the QR decomposition can be computed two ways: by the Gram-
Schmidt algorithm, which is old, and by the method of orthogonal triangularization,
which is new. The principal application of the decomposition is the solution of least
squares problems, which is treated in the second section of the chapter. The last section
treats the updating problem—the problem of recomputing a decomposition when the
original matrix has been altered. The focus here is on the QR decomposition, although
other updating algorithms are briefly considered.
The last chapter is devoted to decompositions that can reveal the rank of a matrix
and produce approximations of lower rank. The issues stand out most clearly when the
decomposition in question is the singular value decomposition, which is treated in the
first section. The second treats the pivoted QR decomposition and a new extension,
the QLP decomposition. The third section treats the problem of estimating the norms
of matrices and their inverses—the so-called problem of condition estimation. The
estimators are used in the last section, which treats rank revealing URV and ULV de-
compositions. These decompositions in some sense lie between the pivoted QR de-
composition and the singular value decomposition and, unlike either, can be updated.
Many methods treated in this volume are summarized by displays of pseudocode
(see the list of algorithms following the table of contents). These summaries are for
purposes of illustration and should not be regarded as finished implementations. In
the first place, they often leave out error checks that would clutter the presentation.
Moreover, it is difficult to verify the correctness of algorithms written in pseudocode.
In most cases, I have checked the algorithms against MATLAB implementations. Un-
fortunately, that procedure is not proof against transcription errors.
A word on organization. The book is divided into numbered chapters, sections,
and subsections, followed by unnumbered subsubsections. Numbering is by section,
so that (3.5) refers to the fifth equation in section three of the current chapter. Ref-
erences to items outside the current chapter are made explicitly—e.g., Theorem 2.7,
Chapter 1.
Initial versions of the volume were circulated on the Internet, and I received useful
comments from a number of people: Lawrence Austin, Alekxandar S. Bozin, Andrew
H. Chan, Alan Edelman, Lou Ehrlich, Lars Elden, Wayne Enright, Warren Ferguson,
Daniel Giesy, Z. Han, David Heiser, Dirk Laurie, Earlin Lutz, Andrzej Mackiewicz,
Andy Mai, Bart Truyen, Andy Wolf, and Gehard Zielke. I am particularly indebted
to Nick Higham for a valuable review of the manuscript and to Cleve Moler for some
incisive (what else) comments that caused me to rewrite parts of Chapter 3.
The staff at SIAM has done their usual fine job of production. I am grateful to
Vickie Kearn, who has seen this project through from the beginning, to Mary Rose
Muccie for cleaning up the index, and especially to Jean Keller-Anderson whose care-
ful copy editing has saved you, the reader, from a host of misprints. (The ones remain-
ing are my fault.)
Two chapters in this volume are devoted to least squares and orthogonal decom-
positions. It is not a subject dominated by any one person, but as I prepared these
chapters I came to realize the pervasive influence of Åke Björck. His steady stream
of important contributions, his quiet encouragement of others, and his definitive sum-
mary, Numerical Methods for Least Squares Problems, have helped bring the field to
a maturity it might not otherwise have found. I am pleased to dedicate this volume to
him.
G. W. Stewart
College Park, MD
1
MATRICES, ALGEBRA, AND ANALYSIS
There are two approaches to linear algebra, each having its virtues. The first is abstract.
A vector space is defined axiomatically as a collection of objects, called vectors, with
a sum and a scalar-vector product. As the theory develops, matrices emerge, almost
incidentally, as scalar representations of linear transformations. The advantage of this
approach is generality. The disadvantage is that the hero of our story, the matrix, has
to wait in the wings.
The second approach is concrete. Vectors and matrices are defined as arrays of
scalars—here arrays of real or complex numbers. Operations between vectors and
matrices are defined in terms of the scalars that compose them. The advantage of this
approach for a treatise on matrix computations is obvious: it puts the objects we are
going to manipulate to the fore. Moreover, it is truer to the history of the subject. Most
decompositions we use today to solve matrix problems originated as simplifications of
quadratic and bilinear forms that were defined by arrays of numbers.
Although we are going to take the concrete approach, the concepts of abstract lin-
ear algebra will not go away. It is impossible to derive and analyze matrix algorithms
without a knowledge of such things as subspaces, bases, dimension, and linear trans-
formations. Consequently, after introducing vectors and matrices and describing how
they combine, we will turn to the concepts of linear algebra. This inversion of the tra-
ditional order of presentation allows us to use the power of matrix methods to establish
the basic results of linear algebra.
The results of linear algebra apply to vector spaces over an arbitrary field. How-
ever, we will be concerned entirely with vectors and matrices composed of real and
complex numbers. What distinguishes real and complex numbers from an arbitrary
field of scalars is that they possess a notion of limit. This notion of limit extends in a
straightforward way to finite-dimensional vector spaces over the real or complex num-
bers, which inherit this topology by way of a generalization of the absolute value called
the norm. Moreover, these spaces have a Euclidean geometry—e.g., we can speak of
the angle between two vectors. The last section of this chapter is devoted to exploring
these analytic topics.
1. VECTORS
Since we are going to define matrices as two-dimensional arrays of numbers, called
scalars, we could regard a vector as a degenerate matrix with a single column, and a
scalar as a matrix with one element. In fact, we will make such identifications later.
However, the words "scalar" and "vector" carry their own bundles of associations, and
it is therefore desirable to introduce and discuss them independently.
1.1. SCALARS
Although vectors and matrices are represented on a computer by floating-point num-
bers — and we must ultimately account for the inaccuracies this introduces—it is con-
venient to regard matrices as consisting of real or complex numbers. We call these
numbers scalars.
A complex number z can be written in the form
    z = x + iy,
where x and y are real and i is the principal square root of −1. The number x is the real
part of z and is written Re z. The number y is the imaginary part of z and is written
Im z. The absolute value, or modulus, of z is |z| = √(x^2 + y^2). The conjugate x − iy
of z will be written z̄.
The set
    C = {z : |z| = 1}
is the unit circle in the complex plane. We will use the standard notation X ∪ Y, X ∩ Y,
and X \ Y for the union, intersection, and difference of sets.
If a set of objects has operations, these operations can be extended to subsets of
objects in the following manner. Let ∘ denote a binary operation between objects, and
let X and Y be subsets. Then X ∘ Y is defined by
    X ∘ Y = {x ∘ y : x ∈ X, y ∈ Y}.
The extended operation is called the Minkowski operation. The idea of a Minkowski
operation generalizes naturally to operations with multiple operands lying in different
sets.
For example, if C is the unit circle defined above, and B = {—1,1}, then the
Minkowski sum B + C consists of two circles of radius one, one centered at —1 and
the other centered at 1.
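For finite sets the Minkowski operation can be computed by forming all pairwise results. The short Python sketch below is illustrative only (Python is not used in the text); the eight-point sample standing in for the unit circle is an arbitrary choice.

    import cmath

    def minkowski(op, X, Y):
        # The Minkowski extension of a binary operation: {x op y : x in X, y in Y}.
        return {op(x, y) for x in X for y in Y}

    B = {-1.0, 1.0}
    C = {cmath.exp(2j * cmath.pi * k / 8) for k in range(8)}   # sample of the unit circle
    S = minkowski(lambda x, y: x + y, B, C)
    print(len(S))    # points drawn from two unit circles, centered at -1 and at 1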
1.2. VECTORS
In three dimensions a directed line segment can be specified by three numbers x, y,
and z as shown in Figure 1.2. The following definition is a natural generalization of
this observation.
We also write
    x = (x_1, x_2, ..., x_n)^T.
The scalars x_i are called the COMPONENTS of x. The set of n-vectors with real compo-
nents will be written R^n. The set of n-vectors with real or complex components will
be written C^n. These sets are called REAL and COMPLEX n-SPACE.
In addition to allowing vectors with more than three components, we have allowed
the components to be complex. Naturally, a real vector of dimension greater than three
cannot be represented graphically in the manner of Figure 1.2, and a nontrivial com-
plex vector has no such representation. Nonetheless, most facts about vectors can be
illustrated by drawings in real 2-space or 3-space.
Vectors will be denoted by lower-case Latin letters. In representing the compo-
nents of a vector, we will generally use an associated lower-case Latin or Greek letter.
Thus the components of the vector b will be b_i or possibly β_i. Since the Latin and
Greek alphabets are not in one-one correspondence, some of the associations are arti-
ficial. Figure 1.3 lists the ones we will use here. In particular, note the association of
ξ with x and η with y.
The zero vector is the vector whose components are all zero. It is written 0, what-
ever its dimension. The vector whose components are all one is written e. The vector
whose ith component is one and whose other components are zero is written e_i and is
called the ith unit vector.
In summary,
    0 = (0, 0, ..., 0)^T,   e = (1, 1, ..., 1)^T,   e_i = (0, ..., 0, 1, 0, ..., 0)^T,
the single 1 in e_i being its ith component.
Definition 1.2. Let x and y be n-vectors and α be a scalar. The SUM of x and y is the
vector
    x + y = (x_1 + y_1, x_2 + y_2, ..., x_n + y_n)^T,
and the SCALAR-VECTOR PRODUCT of α and x is the vector
    αx = (αx_1, αx_2, ..., αx_n)^T.
The following properties are easily established from the definitions of the vector
sum and scalar-vector product.
Theorem 1.3. Let x, y, and z be n-vectors and α and β be scalars. Then
The properties listed above insure that a sum of products of the form
    α_1 x_1 + α_2 x_2 + ⋯ + α_k x_k
Example 1.4. The following are vector spaces under the natural operations of sum-
mation and multiplication by a scalar.
1. The set Pn of polynomials of degree not greater than n
2. The set P_∞ of polynomials of any degree
3. The set C[0,1] of all real functions continuous on [0,1]
• The first example is really our friend C^{n+1} in disguise, since the polynomial α_0 z^0 +
α_1 z^1 + ⋯ + α_n z^n can be identified with the (n+1)-vector (α_0, α_1, ..., α_n)^T in such
a way that sums and scalar-vector products in the two spaces correspond.
Any member of P_n can be written as a linear combination of the monomials z^0,
z^1, ..., z^n, and no fewer will do the job. We will call such a set of vectors a basis for
the space in question (see §3.1).
• The second example cannot be identified with C^n for any n. It is an example of an
infinite-dimensional vector space. However, any element of P_∞ can be written as a
finite linear combination of monomials.
• The third example, beloved of approximation theorists, is also an infinite-dimen-
sional space. But there is no countably infinite set of elements such that any member
of C[0,1] can be written as a finite linear combination of elements of the set. The study
of such spaces belongs to the realm of functional analysis.
Given rich spaces like C[0,1], little spaces like R n may seem insignificant. How-
ever, many numerical algorithms for continuous problems begin by reducing the prob-
lem to a corresponding finite-dimensional problem. For example, approximating a
member of C[0,1] by polynomials of bounded degree immediately places us in a finite-
dimensional setting. For this reason vectors and matrices are important in almost every
branch of numerical analysis.
Function spaces
The space C[0,1] is a distinguished member of a class of infinite-dimensional spaces
called function spaces. The study of these spaces is called functional analysis. The
lack of a basis in the usual sense is resolved by introducing a norm in which the space
is closed. For example, the usual norm for C[0,1] is defined by
    ‖f‖ = max_{0 ≤ t ≤ 1} |f(t)|.
2. MATRICES
When asked whether a programming language supports matrices, many people will
think of two-dimensional arrays and respond, "Yes." Yet matrices are more than two-
dimensional arrays — they are arrays with operations. It is the operations that cause
matrices to feature so prominently in science and engineering.
2.1. MATRICES
Matrices and the matrix-vector product arise naturally in the study of systems of equa-
tions. An m×n system of linear equations has the form
    a_11 x_1 + a_12 x_2 + ⋯ + a_1n x_n = b_1,
    a_21 x_1 + a_22 x_2 + ⋯ + a_2n x_n = b_2,
        ⋮
    a_m1 x_1 + a_m2 x_2 + ⋯ + a_mn x_n = b_m.        (2.1)
If we collect the coefficients a_ij into an array A, collect the scalars x_j and b_i into
vectors x and b, and define the product Ax by the left-hand side of (2.2), then (2.1) is
equivalent to
    Ax = b.
The scalars a_ij are called the ELEMENTS of A. The set of m×n matrices with real
elements is written R^{m×n}. The set of m×n matrices with real or complex components
is written C^{m×n}.
The indices i and j of the elements a_ij of a matrix are called respectively the row
index and the column index. Typically row and column indices start at one and work
their way up by increments of one. In some applications, however, matrices begin with
zero or even negative indices.
Matrices will be denoted by upper-case Latin and Greek letters. We will observe
the usual correspondences between the letter denoting a matrix and the letter denoting
its elements (see Figure 1.3).
We will make no distinction between a 1×1 matrix, a 1-vector, and a scalar, and
likewise for n×1 matrices and n-vectors. A 1×n matrix will be called an n-dimen-
sional row vector.
Familiar characters
• Void matrices. A void matrix is a matrix with no rows or no columns (or both).
Void matrices are convenient place holders in degenerate matrix partitions (see §2.4).
• Square matrices. An nxn matrix A is called a square matrix. We also say that A
is of order n.
• The zero matrix. A matrix whose elements are zero is called a zero matrix, written
0.
• Identity matrices. The matrix I_n of order n defined by
    (I_n)_ij = 1 if i = j and (I_n)_ij = 0 if i ≠ j
is called the identity matrix. The ith column of the identity matrix is the ith unit vector
e_i; symbolically,
    I_n = (e_1 e_2 ⋯ e_n).
When context makes the order clear, we will drop the subscript and simply write I for
the identity matrix.
• Permutation matrices. Let J = {i_1, i_2, ..., i_n} be a permutation of the integers
1, 2, ..., n. The matrix
    P = (e_{i_1} e_{i_2} ⋯ e_{i_n})
is called a permutation matrix. Thus a permutation matrix is just an identity with its
columns permuted. Permutation matrices can be used to reposition rows and columns
of matrices (see §2.5).
The permutation obtained by exchanging columns i and j of the identity matrix
is called the (i, j)-exchange matrix. Exchange matrices are used to interchange rows
and columns of other matrices.
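A small Python sketch of these ideas (illustrative only; Python indices are 0-based, whereas the text counts from 1):

    import numpy as np

    J = [2, 0, 3, 1]                                 # a permutation of 0, 1, 2, 3
    P = np.eye(4)[:, J]                              # columns of P are e_{J[0]}, e_{J[1]}, ...
    A = np.arange(16.0).reshape(4, 4)
    print(np.allclose(A @ P, A[:, J]))               # True: AP permutes the columns of A
    E = np.eye(4); E[:, [1, 3]] = E[:, [3, 1]]       # an exchange matrix (columns 1 and 3 swapped)
    print(np.allclose(A @ E, A[:, [0, 3, 2, 1]]))    # True: the two columns are interchanged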
Patterned matrices
An important theme in matrix computations is the reduction of matrices to ones with
special properties, properties that make the problem at hand easy to solve. Often the
property in question concerns the distribution of zero and nonzero elements in the ma-
trix. Although there are many possible distributions, a few are ubiquitous, and we list
them here.
• Diagonal matrices. A square matrix D is diagonal if
    i ≠ j  ⟹  d_ij = 0.
In other words, a matrix is diagonal if its off-diagonal elements are zero. To specify a
diagonal matrix with diagonal elements δ_1, δ_2, ..., δ_n, we write
    D = diag(δ_1, δ_2, ..., δ_n).
These cross matrices are obtained from their more placid relatives by reversing the
orders of their rows and columns. We will call any matrix form obtained in this way
a cross form.
• Hessenberg matrices. A matrix A is upper Hessenberg if
    i > j + 1  ⟹  a_ij = 0.
A matrix A is tridiagonal if it is both upper and lower Hessenberg, that is, if a_ij = 0
whenever |i − j| > 1. It acquires its name from the fact that it consists of three diagonals:
a superdiagonal, a main diagonal, and a subdiagonal.
A matrix is lower bidiagonal if it is lower triangular and tridiagonal; that is, if it
has the form
Definition 2.2. Let λ be a scalar and A and B be m×n matrices. The SCALAR-MATRIX
PRODUCT of λ and A is the matrix
    λA = (λ a_ij),
and the SUM of A and B is the matrix
    A + B = (a_ij + b_ij).
The matrix sum is defined only for matrices having the same dimensions. Such
matrices are said to be conformable with respect to summation, or when the context is
clear simply conformable. Obviously the matrix sum is associative [i.e., (A + B) +
C = A + (B + C)] and commutative [i.e., A + B = B + A]. The identity for
summation is the conforming zero matrix.
These definitions make R^{m×n} a real mn-dimensional vector space. Likewise the
space C^{m×n} is a complex mn-dimensional vector space. Thus any general results
about real and complex vector spaces hold for R^{m×n} and C^{m×n}.
Then y and b are related by a linear system Cy = b, where the coefficient matrix C
can be obtained by substituting the scalar formulas for the components of x = By into
the scalar form of the equation Ax = b. It turns out that
    c_ij = Σ_k a_ik b_kj.        (2.7)
On the other hand, if we symbolically substitute By for x in the first equation we get
the equation
    ABy = b.
Thus, the matrix product should satisfy AB = C, where the elements of C are given
by (2.7). These considerations lead to the following definition.
Definition 2.3. Let A be an ℓ×m matrix and B be an m×n matrix. The product of A
and B is the ℓ×n matrix C whose elements are
    c_ij = Σ_{k=1}^{m} a_ik b_kj.
For the product AB to be defined the number of columns of A must be equal to the
number of rows of B. In this case we say that A and B are conformable with respect to
multiplication. The product has the same number of rows as A and the same number
of columns as B.
It is easily verified that if A ∈ C^{m×n} then
    I_m A = A I_n = A.
In general, however, AB ≠ BA, even when both products are defined.
The failure to respect the noncommutativity of matrix products accounts for the bulk
of mistakes made by people encountering matrices for the first time.
Since we have agreed to make no distinction between vectors and matrices with a
single column, the above definition also defines the matrix-vector product Ax, which
of course reduces to (2.1).
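The defining sum c_ij = Σ_k a_ik b_kj translates directly into a triple loop. The following Python sketch is for illustration only; it is not an efficient way to multiply matrices.

    import numpy as np

    def matmul(A, B):
        # c_ij = sum_k a_ik * b_kj; A is l x m, B is m x n, C is l x n.
        l, m = A.shape
        m2, n = B.shape
        assert m == m2, "A and B are not conformable for multiplication"
        C = np.zeros((l, n))
        for i in range(l):
            for j in range(n):
                for k in range(m):
                    C[i, j] += A[i, k] * B[k, j]
        return C

    A = np.array([[1., 2., 3.], [4., 5., 6.]])
    B = np.array([[1., 0.], [0., 1.], [1., 1.]])
    print(np.allclose(matmul(A, B), A @ B))                    # True
    X = np.array([[0., 1.], [0., 0.]]); Y = np.array([[1., 0.], [0., 0.]])
    print(np.allclose(X @ Y, Y @ X))                           # False: not commutative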
By our conventions, vectors inherit the above definition of transpose and conjugate
transpose. The transpose x^T of an n-vector x is an n-dimensional row vector.
The transpose and the conjugate transpose of a real matrix are the same. For a
complex matrix they are different, and the difference is significant. For example, the
number
    x^H x = |x_1|^2 + |x_2|^2 + ⋯ + |x_n|^2
is a nonnegative number that is the natural generalization of the square of the Euclidean
length of a 3-vector. The number x^T x has no such interpretation for complex vectors,
since it can be negative, complex, or even zero for nonzero x. For this reason, the sim-
ple transpose is used with complex vectors and matrices only in special applications.
The transpose and conjugate transpose interact nicely with matrix addition and
multiplication. The proof of the following theorem is left as an exercise.
Theorem 2.5. Let A and B be matrices. If A + B is defined, then
    (A + B)^T = A^T + B^T   and   (A + B)^H = A^H + B^H.
If AB is defined, then
    (AB)^T = B^T A^T   and   (AB)^H = B^H A^H.
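A quick numerical spot check of these rules (a sketch; the random matrices are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))
    B = rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))
    C = rng.standard_normal((3, 4))                                 # conformable with A

    print(np.allclose((A + C).T, A.T + C.T))                        # (A+B)^T = A^T + B^T
    print(np.allclose((A @ B).T, B.T @ A.T))                        # (AB)^T = B^T A^T
    print(np.allclose((A @ B).conj().T, B.conj().T @ A.conj().T))   # (AB)^H = B^H A^H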
Symmetric matrices are so called because they are symmetric about their diago-
nals:
    a_ij = a_ji.
For a Hermitian matrix the corresponding relation is
    a_ij = ā_ji,
from which it immediately follows that the diagonal elements of a Hermitian matrix
are real. The diagonals of a real skew symmetric matrix are zero, and the diagonals of
a skew Hermitian matrix are pure imaginary. Any real symmetric matrix is Hermitian,
but a complex symmetric matrix is generally not.
The second function requires a little preparation. Let I = (i_1, i_2, ..., i_n) be a
permutation of the integers {1, 2, ..., n}. The function σ(i_1, i_2, ..., i_n) is +1 if I can
be obtained from (1, 2, ..., n) by an even number of interchanges and −1 otherwise.
The determinant of a matrix A of order n is then given by
    det(A) = Σ σ(i_1, i_2, ..., i_n) a_{1 i_1} a_{2 i_2} ⋯ a_{n i_n},
where (i_1, i_2, ..., i_n) ranges over all permutations of the integers 1, 2, ..., n.
The determinant has had a long and honorable history in the theory of matrices.
It also appears as a volume element in multidimensional integrals. However, it is not
much used in the derivation or analysis of matrix algorithms. For that reason, we will
not develop its theory here. Instead we will list some of the properties that will be used
later.
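For small matrices the permutation sum can be evaluated directly, which makes a useful cross-check. The Python sketch below is illustrative only; its cost grows like n!, and determinants are not computed this way in practice.

    from itertools import permutations
    import numpy as np

    def det_by_permutations(A):
        # det(A) = sum over permutations of sigma * a_{1 i_1} ... a_{n i_n}.
        n = A.shape[0]
        total = 0.0
        for perm in permutations(range(n)):
            # sigma is +1 for an even permutation, -1 for an odd one (count inversions).
            inversions = sum(perm[i] > perm[j] for i in range(n) for j in range(i + 1, n))
            term = (-1.0) ** inversions
            for row, col in enumerate(perm):
                term *= A[row, col]
            total += term
        return total

    A = np.array([[2., 1., 0.], [1., 3., 1.], [0., 1., 4.]])
    print(np.isclose(det_by_permutations(A), np.linalg.det(A)))   # True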
Theorem 2.9. The determinant has the following properties (here we introduce ter-
minology that will be defined later).
Submatrices
A submatrix of a matrix A is a matrix formed from the intersection of sets of rows and
columns of A. For example, if A is a 4x4 matrix, the matrices
The matrix B consisting of the elements in the intersection of rows 1 ≤ i_1 < i_2 < ⋯ < i_p ≤ m
and columns 1 ≤ j_1 < j_2 < ⋯ < j_q ≤ n is a SUBMATRIX of A. The COMPLE-
MENTARY SUBMATRIX is the submatrix corresponding to the complements of the sets
{i_1, i_2, ..., i_p} and {j_1, j_2, ..., j_q}. If i_{k+1} = i_k + 1 (k = 1, ..., p−1) and
j_{k+1} = j_k + 1 (k = 1, ..., q−1), then B is a CONTIGUOUS SUBMATRIX. If p = q and
i_k = j_k (k = 1, ..., p), then B is a PRINCIPAL SUBMATRIX. If i_p = p and j_q = q, then
B is a LEADING SUBMATRIX. If, on the other hand, i_1 = m−p+1 and j_1 = n−q+1,
then B is a TRAILING SUBMATRIX.
Thus a principal submatrix is one formed from the same rows and columns. A
leading submatrix is a submatrix in the northwest corner of A. A trailing submatrix
lies in the southeast corner. For example, in the following Wilkinson diagram
the 3×3 matrix whose elements are l is a leading principal submatrix and the 2×3
submatrix whose elements are t is a trailing submatrix.
Partitions
We begin with a definition.
By this definition the blocks in any one column must all have the same number
of columns. Similarly, the blocks in any one row must have the same number of rows.
A matrix can be partitioned in many ways. We will write
    A = (a_1 a_2 ⋯ a_n),
where a_j is the jth column of A. In this case A is said to be partitioned by columns.
[We slipped in a partition by columns in (2.3).] A matrix can also be partitioned by
rows:
    A = ( a_1^T )
        ( a_2^T )
        (  ⋮   )
        ( a_m^T ),
where a_i^T is the ith row of A. Again and again we will encounter the 2×2 partition
    A = ( A_11  A_12 )
        ( A_21  A_22 ).
Northwest indexing
The indexing conventions we have used here are natural enough when the concern is
with the partition itself. However, they can lead to conflicts of notation when it comes to
describing matrix algorithms. For example, if A is of order n and in the partition
    A = ( A_11    a_12 )
        ( a_21^T  α_22 )
the submatrix A_11 is of order n−1, then the element we have designated by α_22 is
actually the (n, n)-element of A and must be written as such in any algorithm. An
alternate convention that avoids this problem is to index the blocks of a partition by
the position of the element in the northwest corner of the blocks. With this convention
the above matrix becomes
    A = ( A_11    a_1n )
        ( a_n1^T  α_nn ).
We will call this convention northwest indexing and say that the partition has been
indexed to the northwest.
If A and B are partitioned in the form
    A = ( A_11  A_12 )      and      B = ( B_11  B_12 )
        ( A_21  A_22 )                   ( B_21  B_22 ),
then
    AB = ( A_11 B_11 + A_12 B_21   A_11 B_12 + A_12 B_22 )
         ( A_21 B_11 + A_22 B_21   A_21 B_12 + A_22 B_22 ),
provided that the dimensions of the partitions allow the indicated products and sums.
In other words, the partitioned product is formed by treating the submatrices as scalars
and performing an ordinary multiplication of 2x2 matrices. This idea generalizes.
The proof of the following theorem is left as an exercise.
and
and the same equation holds with the transpose replaced by the conjugate transpose.
The restrictions on the dimensions of the matrices in the above theorem insure con-
formity. The general principal is to treat the submatrices as scalars and perform the
operations. However, in transposition the individual submatrices must also be trans-
posed. And in multiplying partitioned matrices, keep in mind that the matrix product
is not commutative.
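The following Python sketch illustrates the rule for a conformably partitioned 2×2 product; the particular dimensions and random data are arbitrary.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((5, 6))
    B = rng.standard_normal((6, 4))

    # The column partition of A (3 + 3) matches the row partition of B, so the
    # submatrices can be multiplied as if they were scalars.
    A11, A12, A21, A22 = A[:2, :3], A[:2, 3:], A[2:, :3], A[2:, 3:]
    B11, B12, B21, B22 = B[:3, :2], B[:3, 2:], B[3:, :2], B[3:, 2:]

    C = np.block([[A11 @ B11 + A12 @ B21, A11 @ B12 + A12 @ B22],
                  [A21 @ B11 + A22 @ B21, A21 @ B12 + A22 @ B22]])
    print(np.allclose(C, A @ B))      # True: the partitioned product equals AB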
Block forms
The various forms of matrices—diagonal matrices, triangular matrices, etc. —have
block analogues. For example, a matrix A is block upper triangular if it can be parti-
tioned in the form
    A = ( A_11  A_12  ⋯  A_1k )
        (  0    A_22  ⋯  A_2k )
        (  ⋮     ⋮         ⋮  )
        (  0     0    ⋯  A_kk ).
Inner products
Given two n-vectors x and y, the inner product of x and y is the scalar
    x^H y = x̄_1 y_1 + x̄_2 y_2 + ⋯ + x̄_n y_n.
When x and y are real 3-vectors of length one, the inner product is the cosine of the
angle between x and y. This observation provides one way of extending the definition
of angle to more general settings [see (4.18) and Definition 4.35].
The inner product is also known as the scalar product or the dot product.
Outer products
Given an n-vector x and an m-vector y, the outer product of x and y is the mxn
matrix
The outer product is a special case of a full-rank factorization to be treated later (see
Theorem 3.13).
Linear combinations
The linear combination
    a_1 x_1 + a_2 x_2 + ⋯ + a_k x_k
has a useful matrix representation. Let X = (x_1 x_2 ⋯ x_k) and form a vector a =
(a_1 a_2 ⋯ a_k)^T from the coefficients of the linear combination. Then it is easily
verified that
    Xa = a_1 x_1 + a_2 x_2 + ⋯ + a_k x_k.
In other words:
    The product Xa is the linear combination of the columns of X whose coefficients
    are the components of a.
Thus:
    The columns of XA are linear combinations of the columns of X. The coeffi-
    cients of the linear combination for the jth column of XA are the elements of the
    jth column of A.
If A is partitioned by columns and D = diag(δ_1, δ_2, ..., δ_n), then
    AD = (δ_1 a_1  δ_2 a_2  ⋯  δ_n a_n).
In other words:
The columns of the product AD are the original columns scaled by the corre-
sponding diagonal elements ofD.
Likewise:
The rows of DA are the original rows scaled by the diagonal elements ofD.
In other words:
Postmultiplying a matrix by a permutation matrix permutes the columns of the
matrix into the order of the permutation.
Likewise:
Premultiplying a matrix by the transpose of a permutation matrix permutes the
rows of the matrix into the order of the permutation.
Undoing a permutation
It is easy to see that:
If P is a permutation matrix, then P^T P = P P^T = I.
Consequently, having interchanged columns by computing B = AP, we can undo the
interchanges by computing A = BP^T.
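The scaling and permutation rules above are easy to verify numerically. A Python sketch (the choices of D, P, and A are arbitrary):

    import numpy as np

    A = np.arange(12.0).reshape(3, 4)
    d = np.array([1.0, 10.0, 100.0, 1000.0])
    print(np.allclose(A @ np.diag(d), A * d))            # AD scales the columns by d_j
    e = np.array([2.0, 3.0, 5.0])
    print(np.allclose(np.diag(e) @ A, A * e[:, None]))   # DA scales the rows

    perm = [3, 0, 2, 1]
    P = np.eye(4)[:, perm]                   # identity with permuted columns
    print(np.allclose(A @ P, A[:, perm]))    # postmultiplication permutes the columns
    print(np.allclose((A @ P) @ P.T, A))     # P P^T = I undoes the interchanges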
Crossing a matrix
Let f ∈ R^{n×n} be defined by
    f = (e_n e_{n−1} ⋯ e_1),
the identity matrix with the order of its columns reversed. Then it is easily verified
that if T is a triangular matrix then fT and Tf are cross triangular. More generally,
fx is the vector obtained by reversing the order of the components of x. We will call
f the cross operator.
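In NumPy the cross operator can be formed by reversing the columns of the identity, as in the following sketch (the name J is mine, not the text's):

    import numpy as np

    n = 4
    J = np.eye(n)[:, ::-1]            # ones on the antidiagonal: the cross operator
    x = np.arange(1.0, n + 1)
    T = np.triu(np.ones((n, n)))      # an upper triangular matrix

    print(J @ x)                      # components of x in reverse order
    print(J @ T)                      # rows reversed: a cross triangular matrix
    print(T @ J)                      # columns reversed: also cross triangular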
Let E = (e_{i_1} e_{i_2} ⋯ e_{i_p}) and F = (e_{j_1} e_{j_2} ⋯ e_{j_q}). Then E^T A F is the
submatrix in the intersection of rows {i_1, ..., i_p} and columns {j_1, ..., j_q} of A.
Moreover, if B ∈ C^{p×q}, then forming A + EBF^T replaces the submatrix E^T A F
with E^T A F + B.
2.6. LU DECOMPOSITIONS
A matrix decomposition is a factorization of a matrix into a product of simpler ma-
trices. Decompositions are useful in matrix computations because they can simplify
the solution of a problem. For example, if a matrix can be factored into the product
of lower and upper triangular matrices, the solution of a linear system involving that
matrix reduces to the solution of two triangular systems. The existence of such an
LU decomposition is the point of the following theorem.
where
Now let m > 1. Let P and Q be permutations such that the (1,1)-element of
P^T AQ is nonzero. Partition
and let
If we set
and
• The proof of Theorem 2.13 is constructive in that it presents an algorithm for com-
puting LU decompositions. Specifically, interchange rows and columns of A so that its
(1,1)-element is nonzero. Then with A partitioned as in (2.13), form B − α^{-1}cd^T and
apply the procedure just sketched recursively. This process is called Gaussian elimi-
nation.
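A minimal Python sketch of this elimination process follows. For simplicity it uses row interchanges only (choosing the largest element of the current column as pivot), whereas the proof in the text also permits column interchanges; it omits error checks and is not a finished implementation.

    import numpy as np

    def lu_factor(A):
        # Returns p, L, U with A[p] = L @ U, where L is unit lower triangular
        # and U is upper trapezoidal.
        U = np.asarray(A, dtype=float).copy()
        m, n = U.shape
        L = np.eye(m)
        p = np.arange(m)
        for k in range(min(m, n)):
            piv = k + np.argmax(np.abs(U[k:, k]))       # largest element in column k
            if U[piv, k] == 0.0:
                continue                                # nothing to eliminate
            U[[k, piv], :] = U[[piv, k], :]             # interchange rows k and piv
            L[[k, piv], :k] = L[[piv, k], :k]
            p[[k, piv]] = p[[piv, k]]
            L[k+1:, k] = U[k+1:, k] / U[k, k]           # multipliers
            U[k+1:, k:] -= np.outer(L[k+1:, k], U[k, k:])   # the update B - alpha^{-1} c d^T
            U[k+1:, k] = 0.0
        return p, L, U

    A = np.array([[2., 1., 1.], [4., 3., 3.], [8., 7., 9.], [6., 7., 9.]])
    p, L, U = lu_factor(A)
    print(np.allclose(A[p], L @ U))                     # True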
• The integer k is unique, but the proof does not establish this fact. For a proof see
Theorem 3.13.
or, when A has more rows than columns, at least compute an x such that Ax is a good
approximation to b. In either case the solution will be unique if and only if the homo-
geneous equation or system
    Av = 0        (2.14)
has only the solution v = 0. For if Av = 0 with v ≠ 0, then A(x + v) = Ax, and
x + v solves the problem whenever x does. Conversely if Ax = Ay for x ≠ y, the
vector v = x − y is a nontrivial solution of (2.14).
If A ∈ C^{m×n} and m > n, the homogeneous equation (2.14) has a nontrivial so-
lution only in special circumstances [see (3.14.1)]. If m < n, the system (2.14) is said
to be underdetermined. Because its right-hand side is zero it always has a nontrivial
solution, as the proof of the following theorem shows.
Theorem 2.14. An underdetermined homogeneous system has a nontrivial solution.
Proof. If A = 0, any nonzero vector v is a solution of (2.14). Otherwise let P^T AQ =
LU be an LU decomposition of A. Suppose that Uw = 0, where w ≠ 0. Then with
v = Qw, we have
    Av = PLU Q^T Q w = PL(Uw) = 0.
Thus the problem becomes one of finding a nontrivial solution of the system Uw = 0.
Because U is upper trapezoidal, we can write it in the form
    U = ( υ_11  u^T )
        (  0    U_* ).
Moreover, v ≠ 0. By an obvious induction step (we leave the base case as an exercise),
the system U_* w_* = 0 has a nontrivial solution w_*. Let w^T = (−υ_11^{-1} u^T w_*   w_*^T). Then
Indexing conventions
The reason matrix indices begin at one in this work and in most books and articles
on matrix computations is that they all treat matrix algorithms independently of their
applications. Scientists and engineers, on the other hand, have no difficulty coming
up with unusual indexing schemes to match their applications. For example, queue-
ing theorists, whose queues can be empty, generally start their matrices with a (0,0)-
element.
Determinants
Most linear algebra texts treat determinants in varying degrees. For a historical survey,
Muir's Theory of Determinants in the Historical Order of Development [238] is unsur-
passed. His shorter Treatise on the Theory of Determinants [237] contains everything
you wanted to know about determinants—and then some.
Partitioned matrices
Partitioning is a powerful tool for proving theorems and deriving algorithms. A typical
example is our derivation of the LU decomposition. An early example is Schur's proof
that any matrix is unitarily similar to a triangular matrix [274, 1909]. However, the
technique came to be widely used only in the last half of this century. It is instructive
to compare the treatment of matrix algorithms in Dwyer's Linear Computations [112,
1951], which looks backward to the days of hand computation, with the treatment in
Householder's Principles of Numerical Analysis [187,1953], which looks forward to
digital computers.
The northwest indexing convention is, I think, new. It has the additional advantage
that if the dimensions of a partitioned matrix are known, the dimensions of its blocks
can be determined by inspection.
The LU decomposition
The LU decomposition was originally derived as a decomposition of quadratic and
bilinear forms. Lagrange, in the very first paper in his collected works [205, 1759],
derives the algorithm we call Gaussian elimination, using it to find out if a quadratic
form is positive definite. His purpose was to determine whether a stationary point of a
function was actually a minimum. Lagrange's work does not seem to have influenced
his successors.
The definitive treatment of decomposition is due to Gauss, who introduced it in
his treatment of the motion of heavenly bodies [130,1809] as a device for determin-
ing the precision of least squares estimates and a year after [131,1810] as a numerical
technique for solving the normal equations. He later [134,1823] described the algo-
rithm as follows. Here Ω is a residual sum of squares which depends on the unknown
parameters x, y, z, etc.
Specifically, the function Ω can be reduced to the form
in which the divisors A°, B′, C″, D‴, etc. are constants and u°, u′, u″, u‴, etc.
are linear functions of x, y, z, etc. However, the second function u′ is indepen-
dent of x; the third u″ is independent of x and y; the fourth u‴ is independent
of x, y, and z, and so on. The last function u^(n−1) depends only on the last of
the unknowns x, y, z, etc. Moreover, the coefficients A°, B′, C″, etc. multiply
x, y, z, etc. in u°, u′, u″, etc. respectively. Given this reduction, we may easily
find x, y, z, etc. in reverse order after setting u° = 0, u′ = 0, u″ = 0, u‴ = 0,
etc.
The relation to the LU decomposition is that the coefficients of Gauss's x, y, z, etc. in
the functions u°, u', u", etc. are proportional to the rows of U. For more details see
[306].
Both Lagrange and Gauss worked with symmetric matrices. The extension to gen-
eral matrices is due to Jacobi [191,1857, posthumous], who reduced a bilinear form
in the spirit of Lagrange and Gauss.
3. LINEAR ALGEBRA
The vector spaces R^n and C^n have an algebraic structure and an analytic structure.
The latter is inherited from the analytic properties of real and complex numbers and
will be treated in §4, where norms and limits are introduced. The algebraic structure is
common to all finite-dimensional vector spaces, and its study is called linear algebra.
The purpose of this section is to develop the fundamentals of linear algebra. For defi-
niteness, we will confine ourselves to R^n, but, with the exception of (3.11), the results
hold for any finite-dimensional vector space.
Subspaces
Any linear combination of vectors in a vector space remains in that vector space; i.e.,
vector spaces are closed under linear combinations. Subsets of a vector space may or
may not have this property. For example, the usual (x, y)-plane in R^3, defined by
is not closed under linear combinations, since the difference of two vectors with non-
negative components may have negative components. More subtly, R^n regarded as
a subset of C^n is not closed under linear combinations, since the product of a real
nonzero vector and a complex scalar has complex components.
Subsets closed under linear combinations have a name.
Definition 3.1. A nonempty subset X ⊂ R^n is a SUBSPACE if
    x, y ∈ X and α, β ∈ R  ⟹  αx + βy ∈ X.
Subspaces have an algebra of their own. The proof of the following theorem is
left as an exercise.
Theorem 3.2. Let X and Y be subspaces of R^n. Then the following are subspaces:
    1. X ∩ Y,
    2. X + Y = {x + y : x ∈ X, y ∈ Y}.
Since for any subspace X we have {0} + X = X, the subspace consisting of only
the zero vector acts as an additive identity. If we regard the operation of intersection
as a sort of multiplication, then {0} is an annihilator under multiplication, as it should
be.
If X and Y are subspaces of R^n and X ∩ Y = {0}, we say that the subspaces are
disjoint. Note that disjoint subspaces are not disjoint sets, since they both contain the
zero vector. The sum of disjoint subspaces X and Y is written X ⊕ Y and is called the
direct sum of X and Y.
The set of all linear combinations of a set of vectors X is easily seen to be a sub-
space. Hence the following definition.
Definition 3.3. Let X ⊂ R^n. The set of all linear combinations of members of X is
a subspace Y called the SPAN of X. We write
    Y = span(X).
The space spanned by the vectors x_1, x_2, ..., x_k is also written span(x_1, x_2, ..., x_k).
In particular,
    R^n = span(e_1, e_2, ..., e_n).
Linear independence
We have just observed that the unit vectors span R n . Moreover, no proper subset of
the unit vectors spans Rn. For if one of the unit vectors is missing from (3.2), the
corresponding component of x is zero. A minimal spanning set such as the unit vectors
is called a basis. Before we begin our treatment of bases, we introduce a far reaching
definition.
This matrix formulation of linear independence will be widely used in what follows.
• Any set containing the zero vector is linearly dependent. In particular, a set con-
sisting of a single vector x is independent if and only if x ≠ 0.
• If x_1, x_2, ..., x_k are linearly dependent, then one of them can be expressed as a
linear combination of the others. For there are constants α_i, not all zero, such that
    α_1 x_1 + α_2 x_2 + ⋯ + α_k x_k = 0.        (3.3)
If, say, α_j ≠ 0, then we may solve (3.3) for x_j in the form x_j = −α_j^{-1} Σ_{i≠j} α_i x_i. In
particular, if x_1, x_2, ..., x_{k−1} are independent, then j can be taken equal to k. For if
α_k = 0 in (3.3), then x_1, x_2, ..., x_{k−1} are linearly dependent.
• If a vector can be expressed as a linear combination of a set of linearly independent
vectors x_1, x_2, ..., x_k, then that expression is unique. For if
    Σ_i α_i x_i = Σ_i β_i x_i,
then Σ_i (α_i − β_i) x_i = 0, and by the linear independence of the x_i we have α_i − β_i = 0
(i = 1, ..., k).
• A particularly important example of a set of linearly independent vectors is the col-
umns of a lower trapezoidal matrix L whose diagonal elements are nonzero. For sup-
pose La = 0, with a ≠ 0. Let a_i be the first nonzero component of a. Then writing
out the ith equation from the relation La = 0, we get
    0 = l_i1 a_1 + l_i2 a_2 + ⋯ + l_{i,i−1} a_{i−1} + l_ii a_i = l_ii a_i,
the last equality following from the fact that a_1, a_2, ..., a_{i−1} are all zero. Since l_ii ≠
0, we must have a_i = 0, a contradiction. This result is also true of upper and lower
triangular matrices.
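A quick numerical illustration (NumPy's matrix_rank is used here merely as a stand-in for the notion of rank developed below):

    import numpy as np

    L = np.array([[2.0, 0.0, 0.0],
                  [3.0, 1.0, 0.0],
                  [4.0, 5.0, 6.0],
                  [7.0, 8.0, 9.0]])               # lower trapezoidal, nonzero diagonal
    print(np.linalg.matrix_rank(L) == L.shape[1])  # True: the columns are independent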
Bases
A basis for a subspace is a set of linearly independent vectors that span the subspace.
Definition 3.5. Let X be a subspace of R^n. A set of vectors {b_1, b_2, ..., b_k} is a BA-
SIS for X if
1. b_1, b_2, ..., b_k are linearly independent,
2. X = span(b_1, b_2, ..., b_k).
Example 3.6. The space R^{m×n} regarded as a vector space has the basis consisting of
the matrices E_ij (i = 1, ..., m; j = 1, ..., n), where E_ij has a one in its (i, j)-element
and zeros elsewhere.
If B = {b_1, b_2, ..., b_k} is a basis for X and B = (b_1 b_2 ⋯ b_k), then any mem-
ber x ∈ X can be written uniquely in the form x = Ba. This characterization is so
useful that it will pay to abuse nomenclature and call the matrix B along with the set
B a basis for X.
We want to show that every subspace has a basis. An obvious way is to start pick-
ing vectors from the subspace, throwing away the dependent ones and keeping the ones
that are independent. The problem is to assure that this process will terminate. To do
so we have to proceed indirectly by first proving a theorem about bases before proving
that bases exist.
Proof. Let B = (b_1 b_2 ⋯ b_k) and C = (c_1 c_2 ⋯ c_ℓ), where ℓ > k. Then each
column of C is a linear combination of the columns of B, say c_i = Bv_i, where v_i ∈ R^k.
If we set V = (v_1 v_2 ⋯ v_ℓ), then C = BV. But V has more columns than rows.
Hence by Theorem 2.14 there is a nonzero vector w such that Vw = 0. It follows that
    Cw = BVw = 0.
Corollary 3.8. If B and B′ are bases for the subspace X, then they have the same num-
ber of elements.
For if they did not, the larger set would be linearly dependent. In particular, since the
n unit vectors form a basis for R^n, any basis of R^n has n elements.
We are now in a position to show that every subspace has a basis. In particular, we
can choose a basis from any spanning subset—and even specify some of the vectors.
Theorem 3.9. Let X be a nontrivial subspace of R^n that is spanned by the set B. Sup-
pose b_1, b_2, ..., b_t ∈ B are linearly independent. Then there is a subset of B contain-
ing b_1, b_2, ..., b_t that is a basis for X.
Proof. Let B_t = {b_1, b_2, ..., b_t}. Note that t may be zero, in which case B_0 is the
empty set.
Suppose now that for i ≥ t we have constructed a set B_i ⊂ B of i linearly inde-
pendent vectors. If there is some vector b_{i+1} ∈ B that is not a linear combination of the
members of B_i, adjoin it to B_i to get a new set of linearly independent vectors B_{i+1}.
Since R^n cannot contain more than n linearly independent vectors, this process of ad-
joining vectors must stop with some B_k, where k ≤ n.
We must now show that any vector x ∈ X can be expressed as a linear combination
of the members of B_k. Since B spans X, the vector x may be expressed as a linear
combination of the members of B: say
Two spaces satisfying (3.4) are said to be complementary. We thus have the following
result.
Dimension
Since any nontrivial subspace of R n has a basis and all bases for it have the same num-
ber of elements, we may make the following definition.
Thus the dimension of Rn is n — a fact which seems obvious but, as we have seen,
takes some proving.
The dimension satisfies certain relations
Theorem 3.12. For any subspaces X and Y of R^n,
    dim(X ∩ Y) ≤ min{dim(X), dim(Y)}
and
    dim(X + Y) = dim(X) + dim(Y) − dim(X ∩ Y).        (3.5)
Proof. We will prove the second equality, leaving the first inequality as an exercise.
Let dim(X ∩ Y) = j, dim(X) = k, and dim(Y) = ℓ. Let A ∈ R^{n×j} be a ba-
sis for X ∩ Y and extend it to a basis (A B) for X. Note that B ∈ R^{n×(k−j)}. Simi-
larly let C ∈ R^{n×(ℓ−j)} be such that (A C) is a basis for Y. Then clearly the columns
of (A B C) span X + Y. But the columns of (A B C) are linearly independent. To
see this note that if
columns.
In what follows we will most frequently use (3.5) in the two weaker forms
and
A full-rank factorization
If a matrix has linearly dependent columns, some of them are redundant, and it is nat-
ural to seek a more economical representation. For example, the m×n matrix
    A = (β_1 a  β_2 a  ⋯  β_n a),
whose columns are proportional to one another, can be written in the form
    A = a b^T,
where b^T = (β_1 β_2 ⋯ β_n). The representation encodes the matrix economically,
using m+n scalars instead of the mn scalars required by the more conventional rep-
resentation.
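In code the economical representation is just an outer product; the data below are arbitrary.

    import numpy as np

    a = np.array([1.0, 2.0, 3.0])             # an m-vector
    b = np.array([2.0, -1.0, 0.5, 4.0])       # the coefficients beta_1, ..., beta_n
    A = np.outer(a, b)                        # m x n matrix with proportional columns
    print(np.linalg.matrix_rank(A))           # 1
    print(A.size, a.size + b.size)            # 12 scalars stored versus 7 in factored form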
The following theorem shows that any matrix has an analogous representation.
The ROW SPACE of A is the space R(A^T). The RANK of A is the integer
    rank(A) = dim[R(A)].
Proof. Since R(A + B) is contained in R(A) + R(B), it follows from (3.7) that the
rank satisfies (3.8). Since the row space of AB is contained in the row space of B, we
have rank(AB) ≤ rank(B). Likewise, since the column space of AB is contained in
the column space of A, we have rank(AB) ≤ rank(A). Together these inequalities
imply (3.9).
We have observed that a solution of the linear system Ax = b is unique if and
only if the homogeneous equation Ax = 0 has no nontrivial solutions [see (2.14)].
It is easy to see that the set of all solutions of Ax = 0 forms a subspace. Hence the
following definition.
Definition 3.16. Let A ∈ R^{m×n}. The set
    N(A) = {x : Ax = 0}
is called the NULL SPACE of A. The dimension of N(A) is called the NULLITY of A
and is written
null(A) = dim[N(A)].
A nonzero vector in the null space of A — that is, a nonzero vector x satisfying Ax =
0 — is called a NULL VECTOR of A.
The null space determines how the solutions of linear systems can vary. Specifi-
cally:
If the system Ax = b has a solution, say x_0, then any solution lies in
the set
    x_0 + N(A) = {x_0 + z : z ∈ N(A)}.
Thus the nullity of A in some sense measures the amount of nonuniqueness in the so-
lutions of linear systems involving A.
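The following sketch computes a basis for the null space numerically; it uses the singular value decomposition, treated later in the book, purely as a convenient tool.

    import numpy as np

    A = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 6.0]])                  # rank 1, so null(A) = 3 - 1 = 2

    _, s, Vt = np.linalg.svd(A)
    r = int((s > max(A.shape) * np.finfo(float).eps * s[0]).sum())
    N = Vt[r:].T                                     # columns form a basis for N(A)
    print(N.shape[1])                                # nullity = 2
    print(np.allclose(A @ N, 0.0))                   # True

    # Every solution of Ax = b has the form x0 + (a vector in N(A)).
    b = np.array([6.0, 12.0])
    x0, *_ = np.linalg.lstsq(A, b, rcond=None)
    z = N @ np.array([1.5, -2.0])                    # an arbitrary null-space vector
    print(np.allclose(A @ (x0 + z), b))              # True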
The basic facts about the null space are summarized in the following theorem.
Theorem 3.17. Let A ∈ R^{m×n}. Then
and
is singular. But a nonzero perturbation, however small, in any one element will make
it nonsingular.
If we are willing to accept that full-rank matrices are more likely to occur than
degenerate ones, we can make some statements—case by case.
1. If m > n, the matrix (A b) will generally have full column rank,
and hence b will not lie in R(A). Thus overdetermined systems
usually have no solutions. On the other hand, null(A) will gen-
erally be zero; and in this case when a solution exists it is unique.
2. If m < n, the matrix A will generally be of full row rank, and
hence of rank m. In this case R(A) = R^m, and a solution exists.
However, null(A) > 0, so no solution is unique.
3. If m = n, the matrix A will generally be nonsingular. In this
case a solution exists and is unique.
A warning. The above statements, correct though they are, should not lull one
into thinking errors in a matrix can make the difficulties associated with degeneracies
go away. On the contrary, the numerical and scientific problems associated with near
degeneracies are subtle and not easy to deal with. These problems are treated more
fully in §1, Chapter 5.
each have unique solutions x_i. If we set X = (x_1 x_2 ⋯ x_n), then AX = I. Simi-
larly, by considering the systems
Unfortunately there are no simple general conditions for the existence of the in-
verse of the sum A+B of two square matrices. However, in special cases we can assert
the existence of such an inverse and even provide a formula; see (3.4), Chapter 4.
The inverse is in many respects the driving force behind matrix algebra. For ex-
ample, it allows one to express the solution of a linear system Ax = b succinctly as
x = A^{-1}b. For this reason, disciplines that make heavy use of matrices load their
books and papers with formulas containing inverses. Although these formulas are full
of meaning to the specialist, they seldom represent the best way to compute. For exam-
ple, to write x = A^{-1}b is to suggest that one compute x by inverting A and multiplying
the inverse by b — which is why specialists in matrix computations get frequent re-
quests for programs to calculate inverses. But there are faster, more stable algorithms
for solving linear systems than this invert-and-multiply algorithm. (For more on this
point see §1.5, Chapter 3, and Example 4.11, Chapter 3.)
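A small illustration of the invert-and-multiply pitfall (the random system is arbitrary; both methods agree here, but solve avoids forming the inverse and is cheaper and more reliable in general):

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((500, 500))
    b = rng.standard_normal(500)
    x1 = np.linalg.solve(A, b)        # factor and solve
    x2 = np.linalg.inv(A) @ b         # invert and multiply: avoid in practice
    print(np.allclose(x1, x2))        # True for this well-conditioned example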
represents a vector x ∈ R^n as a sum of the unit vectors. This unit basis occupies a dis-
tinguished position because the coefficients of the representation are the components
of the vector. In some instances, however, we may need to work with another basis.
In this subsection, we shall show how to switch back and forth between bases.
Change of basis
First a definition.
Definition 3.22. Let X be a basis for a subspace X in R^n, and let x = Xu. Then the
components of u are the COMPONENTS OF x WITH RESPECT TO THE BASIS X.
By the definition of basis every x ∈ X can be represented in the form Xu and hence
has components with respect to X. But what precisely are these components? The
following theorem supplies the wherewithal to answer this question.
Theorem 3.23. Let X ∈ R^{n×k} have full column rank. Then there is a matrix X^I such
that
    X^I X = I.
Proof. The proof mimics the proof of Theorem 3.20. Since rank(X) = k, the columns
of X^T span R^k. Hence the equations
    X^T y_i = e_i,   i = 1, 2, ..., k,        (3.16)
have solutions. If we set X^I = (y_1 y_2 ⋯ y_k)^T, then equation (3.16) implies that
X^I X = I.
The matrix X^I is called a left inverse of X. It is not unique unless X is square. For
otherwise the systems (3.16) are underdetermined and do not have unique solutions.
The solution of the problem of computing the components of x with respect to X
is now simple. If x = Xu and X^I is a left inverse of X, then u = X^I x contains the
components of x with respect to X. It is worth noting that although X^I is not unique,
the vector u = X^I x is unique for any x ∈ R(X).
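A Python sketch: for a full-column-rank X, the pseudoinverse supplies one convenient left inverse, and applying it recovers the components of any vector in R(X).

    import numpy as np

    X = np.array([[1.0, 1.0],
                  [0.0, 1.0],
                  [1.0, 0.0]])               # a basis for a 2-dimensional subspace of R^3

    XI = np.linalg.pinv(X)                   # one particular left inverse
    print(np.allclose(XI @ X, np.eye(2)))    # True

    u = np.array([3.0, -2.0])
    x = X @ u                                # a subspace vector with components u
    print(np.allclose(XI @ x, u))            # the components are recovered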
Now suppose we change to another basis X̂. Then any vector x ∈ X can be ex-
pressed in the form X̂û. The relation of û to u is contained in the following theorem,
in which we repeat some earlier results.
Theorem 3.24. Let X ⊂ R^n be a subspace, and let X and X̂ be bases for X. Let X^I
be a left inverse of X. Then
    P = X^I X̂        (3.17)
is nonsingular and
    u = X^I x        (3.18)
are the components of x with respect to X. Moreover, P^{-1} X^I is a left inverse of X̂.
Hence if x = X̂û,
    û = P^{-1} X^I x.
Proof. We have already established (3.18). Since X̂ is a basis and the columns of X̂
lie in X, we must have X̂ = XP for some P. On premultiplying this relation by X^I,
we get (3.17). The matrix P is nonsingular, since otherwise there would be a nonzero
vector v such that Pv = 0. Then X̂v = XPv = 0, contradicting the fact that X̂ is a
basis. The rest of the proof is a matter of direct verification.
Thus the matrix-vector product Ax reproduces the action of the linear transformation
f, and it is natural to call A the matrix representation of f.
Now suppose we change bases to Y in R^m and X in R^n. Let x = Xu and f(x) =
Yv. What is the relation between u and v?
Let X = (x_1 x_2 ⋯ x_n) be partitioned by columns, and define
Theorem 3.26. Let X and Y be subspaces and let f: X → Y be linear. Let X and Y
be bases for X and Y, and let X^I and Y^I be left inverses for X and Y. For any x ∈ X
let u = X^I x and v = Y^I f(x) be the components of x and f(x) with respect to X and
Y. Then
Full-rank factorizations
Although the principal application of full-rank factorizations in this section is to char-
acterize the rank of a matrix, they are ubiquitous in matrix computations. One of the
reasons is that if the rank of a matrix is low a full-rank factorization provides an eco-
nomical representation. We derived a full-rank factorization from the pivoted LU de-
composition, but in fact they can be calculated from many of the decompositions to
be treated later—e.g., the pivoted QR decomposition or the singular value decompo-
sition. The tricky point is to decide what the rank is in the presence of error. See §1,
Chapter 5, for more.
4. ANALYSIS
We have already pointed out that vectors and matrices regarded simply as arrays are
not very interesting. The addition of algebraic operations gives them life and utility.
But abstract linear algebra does not take into account the fact that our matrices are de-
fined over real and complex numbers, numbers equipped with analytic notions such
as absolute value and limit. The purpose of this section is to transfer these notions to
vectors and matrices. We will consider three topics—norms, orthogonality and pro-
jections, and the singular value decomposition.
4.1. NORMS
Vector and matrix norms are natural generalizations of the absolute value of a num-
ber—they measure the magnitude of the objects they are applied to. As such they
can be used to define limits of vectors and matrices, and this notion of limit, it turns
out, is independent of the particular norm used to define it. In this subsection we will
introduce matrix and vector norms and describe their properties. The subsection con-
cludes with an application to the perturbation of matrix inverses.
There are two ways of generalizing this notion to vectors and matrices. The first is
to define functions on, say, R^n that satisfy the three above conditions (with ξ and η
regarded as vectors and α remaining a scalar). Such functions are called norms, and
they will be the chief concern of this subsection. However, we will first introduce a
useful componentwise definition of absolute value.
The basic ideas are collected in the following definition.
Definition 4.1. Let A, B ∈ R^{m×n}. Then A ≥ B if a_ij ≥ b_ij and A > B if a_ij > b_ij.
Similar definitions hold for the relations "≤" and "<". If A > 0, then A is POSITIVE.
If A ≥ 0, then A is NONNEGATIVE. The ABSOLUTE VALUE of A is the matrix |A| whose
elements are |a_ij|.
There are several comments to be made about this definition.
• Be warned that the notation A < B is sometimes used to mean that B - A is positive
definite (see §2.1, Chapter 3, for more on positive definite matrices).
• Although the above definitions have been cast in terms of matrices, they also apply
to vectors.
• The relation A > B means that every element of A is greater than the corresponding
element of B. To say that A ≥ B with strict inequality in at least one element one has
to say something like A ≥ B and A ≠ B.
• If A ≠ 0, the most we can say about |A| is that |A| ≥ 0. Thus the absolute value of a
matrix is not, strictly speaking, a generalization of the absolute value of a scalar, since
it is not definite. However, it is homogeneous and satisfies the triangle inequality.
• The absolute value of a matrix interacts nicely with the various matrix operations.
For example,
    |αA| = |α| |A|,   |A + B| ≤ |A| + |B|,   |AB| ≤ |A| |B|.
Vector norms
As we mentioned in the introduction to this subsection, norms are generalizations of
the absolute value function.
Thus a vector norm is a definite, homogeneous function on C^n that satisfies the trian-
gle inequality ‖x + y‖ ≤ ‖x‖ + ‖y‖. Vector norms on R^n are defined analogously.
The triangle inequality for vector norms has a useful variant:
    | ‖x‖ − ‖y‖ | ≤ ‖x − y‖.        (4.1)
The process of dividing a nonzero vector by its norm to turn it into a vector of norm
one is called normalization.
There are infinitely many distinct vector norms. For matrix computations, three
are especially important.
Theorem 4.3. The following three functions on C^n are norms:
    ‖x‖_1 = Σ_i |x_i|,   ‖x‖_2 = (Σ_i |x_i|^2)^{1/2},   ‖x‖_∞ = max_i |x_i|.
Moreover,
    |x^H y| ≤ ‖x‖_1 ‖y‖_∞        (4.2)
and for any x there is a y for which equality is attained—and vice versa. Moreover,
    |x^H y| ≤ ‖x‖_2 ‖y‖_2.        (4.3)
The norms defined in Theorem 4.3 are called the 1-, 2-, and ∞-norms. They are
special cases of the Hölder norms defined for 1 ≤ p < ∞ by
    ‖x‖_p = (Σ_i |x_i|^p)^{1/p}
(the case p = ∞ is treated as a limit). The 1-norm is sometimes called the Manhattan
norm because it is the distance you would have to traverse on a rectangular grid to get
from one point to another. The 2-norm is also called the Euclidean norm because in
real 2- or 3-space it is the Euclidean length of the vector x. All three norms are easy
to compute.
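The three norms in Python (computed both from their definitions and with numpy.linalg.norm):

    import numpy as np

    x = np.array([3.0, -4.0, 0.0, 1.0])
    n1 = np.abs(x).sum()                      # 1-norm
    n2 = np.sqrt((np.abs(x) ** 2).sum())      # 2-norm
    ninf = np.abs(x).max()                    # infinity-norm
    print(n1, n2, ninf)
    print(np.linalg.norm(x, 1), np.linalg.norm(x, 2), np.linalg.norm(x, np.inf))
    print(np.linalg.norm(x / n2, 2))          # normalization yields a vector of norm one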
Pairs of norms satisfying (4.2) with equality attainable are called dual norms. The
inequality (4.3), which is called the Cauchy inequality, says that the 2-norm is self-
dual. This fact is fundamental to the Euclidean geometry of C n , as we will see later.
Given a norm, it is easy to generate more. The proof of the following theorem is
left as an exercise.
Theorem 4.4. Let ‖·‖ be a norm on C^n, and let A ∈ C^{n×n} be nonsingular. Then the
function μ_A(x) defined by
    μ_A(x) = ‖Ax‖
is a norm. If A is positive definite (Definition 2.1, Chapter 3), then the function ν_A(x)
defined by
    ν_A(x) = (x^H A x)^{1/2}
is a norm.
Let x_1, x_2, x_3, ... be a sequence of vectors in C^n and let x ∈ C^n. Then we may write
x_k → x provided
Clearly this sequence converges to the zero vector componentwise, since each com-
ponent converges to zero. But ‖x_k − 0‖_∞ = 1 for each k. Hence the sequence does
not converge to zero in the ∞-norm.
Not only can pointwise and normwise convergence disagree, but different norms
can generate different notions of convergence. Fortunately, we will only be dealing
with finite-dimensional spaces, in which all notions of convergence coincide. The
problem with establishing this fact is not one of specific norms. It is easy to show,
for example, that the 1-, 2-, and ∞-norms all define the same notion of convergence
and that it is the same as componentwise convergence. The problem is that we have
infinitely many norms, and one of them might be a rogue. To eliminate this possibility
we are going to prove that all norms are equivalent in the sense that they can be used
to bound each other. For example, it is easy to see that
Proof.
It is sufficient to prove the theorem for the case where ν is the 2-norm. (Why?)
We will begin by establishing the upper bound on μ.
Let κ = max_i μ(e_i). Since x = Σ_i ξ_i e_i,
the last inequality following from (4.5). Hence, with σ = 1/(κ√n), we have σμ(x) ≤ ||x||_2.
An immediate consequence of the bound is that μ(x) as a function of x is continuous in the 2-norm. Specifically, from (4.1)
Hence lim_{||y−x||_2→0} |μ(x) − μ(y)| ≤ σ^{−1} lim_{||y−x||_2→0} ||x − y||_2 = 0, which is the
definition of continuity.
Now let S = {x : ||x||_2 = 1}. Since S is closed and bounded and μ is continuous,
μ assumes a minimum at some point x_min on S. Let τ = 1/μ(x_min). Then
All the properties of vector norms are equally true of matrix norms. In particular,
all matrix norms are equivalent and define the same notion of limit, which is also the
same as elementwise convergence.
A difficulty with this approach to matrix norms is that it does not specify how ma-
trix norms interact with matrix multiplication. To compute upper bounds, we would
like a multiplicative analogue of the triangle inequality:
However, the conditions (4.7) do not imply (4.8). For example, if we attempt to gen-
eralize the infinity norm in a natural way by setting
then
but
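As a concrete check of this failure, the following sketch (Python with NumPy; the particular matrices are my own illustration of the standard counterexample) uses the elementwise max norm and a pair of matrices of all ones, for which the norm of the product exceeds the product of the norms.

import numpy as np

A = np.ones((2, 2))              # all elements equal to 1
B = np.ones((2, 2))

maxnorm = lambda M: np.max(np.abs(M))

print(maxnorm(A) * maxnorm(B))   # 1.0
print(maxnorm(A @ B))            # 2.0 -- exceeds the product of the norms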
Since we have agreed to identify C^{n×1} with C^n, the above definition also serves to
define consistency between matrix and vector norms.
An example of a consistent matrix norm is the widely used Frobenius norm.
Definition 4.10. The FROBENIUS NORM is the function || · ||_F defined for any matrix
by
The Frobenius norm is defined in analogy with the vector 2-norm and reduces to
it when the matrix in question has only one column. Just as ||x||_2^2 can be written in the
form x^H x, so can the square of the Frobenius norm be written as
Since the diagonal elements of A^H A are the squares of the 2-norms of the columns of
A, we have
where a_j is the jth column of A. There is a similar expression in terms of the rows of
A.
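A short numerical sketch (Python with NumPy, chosen here only for illustration) confirms these identities: the Frobenius norm computed from the definition agrees with the square root of trace(A^H A) and with the sum of the squared 2-norms of the columns. The matrix A is an arbitrary example.

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

fro       = np.sqrt(np.sum(np.abs(A)**2))                 # definition
via_trace = np.sqrt(np.trace(A.conj().T @ A).real)        # sqrt(trace(A^H A))
via_cols  = np.sqrt(sum(np.linalg.norm(A[:, j])**2        # sum of squared column norms
                        for j in range(A.shape[1])))

assert np.allclose([fro, via_trace], via_cols)
assert np.isclose(fro, np.linalg.norm(A, 'fro'))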
The Cauchy inequality can be written as a consistency relation in the Frobenius
norm:
The proof of the following theorem begins with this special case and elevates it to gen-
eral consistency of the Frobenius norm.
Theorem 4.11. Whenever the product AB is defined,
Proof. We will first establish the result for the matrix-vector product y = Ax. Let
A^H = (a_1 a_2 ··· a_m) be a partitioning of A by rows, so that y_i = a_i^H x. Then
the inequality following from the Cauchy inequality. Now let B = (b_1 ··· b_n) be
partitioned by columns. Then
It sometimes happens that we have a consistent matrix norm, say defined on C^{n×n},
and require a consistent vector norm. The following theorem provides one.
Operator norms
The obvious generalizations of the usual vector norms to matrices are not guaranteed
to yield consistent matrix norms, as the example of the ∞-norm shows [see (4.9)].
However, there is another way to turn vector norms into matrix norms, one that always
results in consistent norms. The idea is to regard the matrix in question as an operator
on vectors and ask how much it changes the size of a vector.
For definiteness, let ν be a norm on C^n, and let A be of order n. For any vector x
with ν(x) = 1 let ρ_x = ν(Ax), so that ρ_x measures how much A expands or contracts
x in the norm ν. Although ρ_x varies with x, it has a well-defined maximum. This
maximum defines a norm, called the operator norm subordinate to the vector norm
ν(·).
Before we make a formal definition, an observation is in order. Most of the norms
we work with are generic — that is, they are defined generally for spaces of all di-
mensions. Although norms on different spaces are different mathematical objects, it is
convenient to refer to them by a common notation, as we have with the 1-, 2-, and ∞-
norms. We shall call such a collection a family of norms. In defining operator norms,
it is natural to work with families, since the result is a new family of matrix norms de-
fined for matrices of all dimensions. This is the procedure we adopt in the following
definition.
Definition 4.13. Let ν be a family of vector norms. Then the OPERATOR NORM SUBORDINATE TO ν or GENERATED BY ν is defined by
Although we have defined operator norms for a family of vector norms there is
nothing to prevent us from restricting the definition to one or two spaces—e.g., to
C^n.
The properties of operator norms are summarized in the following theorem.
Theorem 4.14. Let || · ||_ν be an operator norm subordinate to a family of vector norms
ν. Then || · ||_ν is a consistent family of matrix norms satisfying
The operator norm is consistent with the generating vector norm. Moreover, if ν(ξ) =
|ξ|, then ||a||_ν = ν(a).
Proof. We must first verify that || • || is a norm. Definiteness and homogeneity are
easily verified. For the triangle inequality we have
For consistency, first note that by the definition of an operator norm we have a
fortiori ν(Ax) ≤ ||A||_ν ν(x). Hence
of the rows of A. Similarly, the matrix 1-norm is also called the column sum norm.
These norms are easy to compute, which accounts for their widespread use.
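The following sketch (Python with NumPy, an illustrative assumption) computes the matrix 1-norm as the maximum absolute column sum and the ∞-norm as the maximum absolute row sum, and checks both against the library routine.

import numpy as np

A = np.array([[ 1.0, -2.0],
              [ 3.0,  4.0]])

norm1   = np.max(np.sum(np.abs(A), axis=0))   # maximum absolute column sum
norminf = np.max(np.sum(np.abs(A), axis=1))   # maximum absolute row sum

assert np.isclose(norm1,   np.linalg.norm(A, 1))
assert np.isclose(norminf, np.linalg.norm(A, np.inf))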
Although the Frobenius norm is consistent with the vector 2-norm, it is not the
same as the operator 2-norm—as can be seen from the fact that for n > 1 we have
||I_n||_F = √n ≠ 1. The matrix 2-norm, which is also called the spectral norm, is
not easy to compute; however, it has many nice properties that make it valuable in
analytical investigations. Here is a list of some of the properties. The proofs are left
as exercises. (See §4.3 and §4.4 for singular values and eigenvalues.)
Theorem 4.16. The 2-norm has the following properties.
the largest singular value of A.
the largest eigenvalue of A^H A.
with equality if and only if rank(A) = 1.
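A minimal numerical check of the first two characterizations (Python with NumPy; the random matrix is only an example): the 2-norm is the largest singular value, and the Frobenius norm is the square root of the sum of their squares.

import numpy as np

A = np.random.default_rng(0).standard_normal((5, 3))

sigma = np.linalg.svd(A, compute_uv=False)    # singular values, in descending order

two_norm = sigma[0]                           # ||A||_2 = largest singular value
fro_norm = np.sqrt(np.sum(sigma**2))          # ||A||_F from the singular values

assert np.isclose(two_norm, np.linalg.norm(A, 2))
assert np.isclose(fro_norm, np.linalg.norm(A, 'fro'))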
Absolute norms
It stands to reason that if the elements of a vector x are less in absolute value than the
elements of a vector y then we should have ||x|| ≤ ||y||. Unfortunately, there are easy
counterexamples to this appealing conjecture. For example, the function
is a norm, but
Since norms that are monotonic in the elements are useful in componentwise error
analysis (see §3, Chapter 3), we make the following definition.
Definition 4.17. A norm || · || is ABSOLUTE if
or equivalently if
• We may also speak of absolute matrix norms. The matrix 1-, ∞-, and Frobenius
norms are absolute. Unfortunately, the matrix 2-norm is not absolute. However, it
does satisfy the relation
Theorem 4.18. Let || · || be a consistent matrix norm on C^{n×n}. For any matrix P of
order n, if
and
Proof. By Theorem 4.12 there is a vector norm, which we will also write || · ||, that is
consistent with the matrix norm || · ||. Now let x ≠ 0. Then
in proportion as P is small. The inequality (4.15) says that (I − P)^{-1} is itself near
I^{-1} = I.
• The result can be extended to a perturbation A − E of a nonsingular matrix A.
Write A − E = A(I − A^{-1}E), so that (A − E)^{-1} = (I − A^{-1}E)^{-1}A^{-1}. Thus we have
the following corollary.
Corollary 4.19. If
then
Moreover, so that
Multiplying this identity by (I − P) and subtracting (I − P)^{-1} from both sides, we
get
Thus if I − P is nonsingular and the powers P^k converge to zero, the Neumann series
A sufficient condition for P^k → 0 is that ||P|| < 1 in some consistent norm, in which
case
Proof. Suppose that P^k → 0, but I − P is singular. Then there is a nonzero x such
that (I − P)x = 0, or Px = x. Hence P^k x = x, and I − P^k is singular for all k. But
since P^k → 0, for some k we must have ||P^k||_∞ < 1, and by Theorem 4.18 I − P^k is
nonsingular—a contradiction.
Since I − P is nonsingular, the convergence of the Neumann series follows on
taking limits in (4.16).
If ||P|| < 1, where || · || is a consistent norm, then ||P^k|| ≤ ||P||^k → 0, and
the Neumann series converges. The error bound (4.17) follows on taking norms and
applying (4.14). •
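The following sketch (Python with NumPy; the matrix P and the cutoff of 50 terms are illustrative assumptions) sums the Neumann series I + P + P^2 + ··· for a matrix with ||P||_2 < 1 and compares the partial sum with (I − P)^{-1} computed directly.

import numpy as np

rng = np.random.default_rng(1)
P = rng.standard_normal((4, 4))
P *= 0.4 / np.linalg.norm(P, 2)       # scale so that ||P||_2 = 0.4 < 1

n = P.shape[0]
S = np.eye(n)                          # partial sum I + P + P^2 + ...
term = np.eye(n)
for _ in range(50):
    term = term @ P
    S += term

assert np.allclose(S, np.linalg.inv(np.eye(n) - P))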
The following corollary will be useful in deriving componentwise bounds for lin-
ear systems.
Corollary 4.21. If |P|^k → 0, then (I − |P|)^{-1} is nonnegative and
Proof. Since |P^k| ≤ |P|^k, P^k approaches zero along with |P|^k. The nonnegativity
of (I − |P|)^{-1} and the inequality now follow on taking limits in the inequality
Orthogonality
In classical vector analysis, it is customary to write the Cauchy inequality in the form
|x^H y| = cos θ ||x||_2 ||y||_2. In real 2- or 3-space it is easy to see that θ is the angle between x and y. This suggests that we use the Cauchy inequality to extend the notion
of an angle between two vectors to C^n.
Definition 4.22. Let x, y ∈ C^n be nonzero. Then the ANGLE θ(x, y) BETWEEN x AND y
is defined by
Thus for nonzero vectors orthogonality generalizes the usual notion of perpendic-
ularity. By our convention any vector is orthogonal to the zero vector.
The Pythagorean equality, mentioned above, generalizes directly to orthogonal
vectors. Specifically,
In fact,
Definition 4.23. Let u_1, u_2, ..., u_k ∈ C^n. Then the vectors u_i are ORTHONORMAL if
• From (4.19) it follows that the columns of an orthonormal matrix are linearly in-
dependent. In particular, an orthonormal matrix must have at least as many rows as
columns.
• If U is unitary (or orthogonal), then its inverse is its conjugate transpose:
• For any n×p orthonormal matrix U we have U^H U = I_p. Hence from Theorem 4.16
and (4.10) we have
Because of these relations, the 2-norm and the Frobenius norm are said to be unitarily
invariant.
Theorem 4.24 (QR factorization). Let X ∈ C^{n×p} have rank p. Then X can be written uniquely in the form
where Q is an n×p orthonormal matrix and R is an upper triangular matrix with positive
diagonal elements.
The first equation simply asserts the existence of the factorization for X_1, which exists
and is unique by the induction hypothesis.
Let
Since Q_1^H Q_1 = I,
To show that q_p is nonzero, note that R_11 has positive diagonal elements and hence
is nonsingular (Theorem 2.1, Chapter 2). Thus from (4.22.1), Q_1 = X_1 R_11^{-1}. Hence
The right-hand side of this relation is a nontrivial linear combination of the columns
of X, which cannot be zero because X has full column rank.
The uniqueness of the factorization follows from the uniqueness of the factorization X_1 = Q_1 R_11, and the fact that formulas (4.23) and (4.24) uniquely determine
r_p, r_pp, and q_p.
• The factorization whose existence is established by the theorem is called the QR fac-
torization of X. This factorization is one of the most important tools in matrix com-
putations.
• Let X_k = (x_1 x_2 ··· x_k) and Q_k = (q_1 q_2 ··· q_k), and let R_k be the leading
principal submatrix of R of order k. Then from the triangularity of R it follows that
X_k = Q_k R_k. In other words, the first k columns of Q form an orthonormal basis for
the space spanned by the first k columns of X.
• If X is a subspace and if X is a basis for X, then we can extend that basis to a basis
(X Y) for C^n. The QR factorization of (X Y) gives an orthonormal basis Q for C^n
whose first k columns are an orthonormal basis for X. The last n−k columns of Q are
a basis for a complementary subspace whose vectors are orthogonal to X. This space
is called the orthogonal complement of X. Thus we have shown that:
Every subspace has an orthogonal complement.
We will write X_⊥ for the orthogonal complement of a subspace X. It is worth noting
that the existence of orthogonal complements is also implied by (3.11).
• Looking at the above construction in a different way, suppose that X is an orthonor-
mal basis for X. Then the first k columns of Q are the columns of X. Consequently:
If X is orthonormal, then there is an orthonormal matrix Y such that (X Y) is
unitary.
In particular, if X = x is a vector, it follows that:
If x is nonzero, then there is a unitary matrix whose first column is x/||x||_2.
This result is useful both in theory and practice. In fact, in § 1, Chapter 4, we will show
how to use Householder transformations to efficiently construct the required matrix.
• The proof of the existence of the QR factorization is constructive. The resulting
algorithm is called the classical Gram-Schmidt algorithm. Be warned that the al-
gorithm can be quite unstable; however, it has the theoretical advantage that it can be
used in arbitrary inner-product spaces. We will return to the Gram-Schmidt algorithm
in §1.4, Chapter 4.
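For concreteness, here is a minimal sketch of the classical Gram-Schmidt algorithm in Python with NumPy (an illustrative rendering, not the text's pseudocode; the instability warning above still applies). It produces the factors Q and R of the QR factorization of a full-column-rank X.

import numpy as np

def classical_gram_schmidt(X):
    """QR factorization of a full-column-rank X by classical Gram-Schmidt."""
    n, p = X.shape
    Q = np.zeros((n, p), dtype=X.dtype)
    R = np.zeros((p, p), dtype=X.dtype)
    for k in range(p):
        R[:k, k] = Q[:, :k].conj().T @ X[:, k]   # coefficients against earlier q's
        v = X[:, k] - Q[:, :k] @ R[:k, k]        # remove those components
        R[k, k] = np.linalg.norm(v)
        Q[:, k] = v / R[k, k]
    return Q, R

X = np.random.default_rng(2).standard_normal((6, 3))
Q, R = classical_gram_schmidt(X)
assert np.allclose(Q @ R, X)
assert np.allclose(Q.conj().T @ Q, np.eye(3))

For poorly conditioned X the computed columns of Q can lose orthogonality, which is the instability referred to above.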
Although there are infinitely many orthonormal bases for a nontrivial subspace of
C n , they are all related by unitary transformations, as the following theorem shows.
Proof. Because X and X̂ span the same space, there is a unique matrix U such that
X̂ = XU. Now
Orthogonal projections
Imagine an eagle flying over a desert, so high that it is a mere point to the eye. The
point on the desert that is nearest the eagle is the point immediately under the eagle.
Replace the eagle by a vector and the desert by a subspace, and the corresponding near-
est point is called the projection of the vector onto the subspace.
To see how projections are computed, let X ⊆ C^n be a subspace and let z ∈ C^n.
Let Q be an orthonormal basis for X and define
But (z_X − ẑ_X) ∈ X while (z_⊥ − ẑ_⊥) ∈ X_⊥. Consequently, they are both zero.
The vector z_X = P_X z is called the orthogonal projection of z onto X. The vector
z_⊥ = (I − P_X)z is called the orthogonal projection of z onto the orthogonal complement of X. We write P_X^⊥ for the projection matrix onto the orthogonal complement of
X. When X is clear from context, we write simply P_⊥.
The operation of projecting a vector is clearly linear. It therefore has a unique ma-
trix representation, which in fact is P_X. We call P_X the orthogonal projection matrix
onto X — or when it is clear that an operator and not a vector is meant, simply the
orthogonal projection onto X. The projection matrix, being unique, does not depend
on the choice of an orthogonal basis Q.
The projection matrix P_X satisfies
i.e., it is idempotent and Hermitian. It is an interesting exercise to verify that all Her-
mitian, idempotent matrices are orthogonal projections.
We can obtain another very useful expression for P_X. Let X be a basis for X, and
let X = QR be its QR factorization [see (4.21)]. Then
It follows that
Theorem 4.26. Let X ⊆ C^n be a subspace and let z ∈ C^n. Then the unique solution
of the problem
Proof. Let x ∈ X. Since P_X(z − x) ⊥ P_⊥(z − x), we have by the Pythagorean equality
The second term on the right-hand side of (4.28) is independent of x. The first term is
minimized precisely when x = P_X z.
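A small sketch (Python with NumPy; the basis X and vector z are arbitrary examples) forms the projection matrix from an orthonormal basis obtained by QR, verifies that it is Hermitian and idempotent, and checks the minimization property of Theorem 4.26 against a few random points of the subspace.

import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((6, 2))       # basis for a 2-dimensional subspace of R^6
z = rng.standard_normal(6)

Q, _ = np.linalg.qr(X)                # orthonormal basis for the column space of X
P = Q @ Q.conj().T                    # projection matrix P_X = Q Q^H
z_proj = P @ z

# P is Hermitian and idempotent, and z - z_proj is orthogonal to the subspace
assert np.allclose(P, P.conj().T)
assert np.allclose(P @ P, P)
assert np.allclose(X.T @ (z - z_proj), 0)

# z_proj is closer to z than any other point of the subspace
for _ in range(5):
    y = X @ rng.standard_normal(2)
    assert np.linalg.norm(z - z_proj) <= np.linalg.norm(z - y) + 1e-12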
In other words there is a unitary matrix Q that reduces X to upper triangular form.
(This reduction is called the QR decomposition; see §1.1, Chapter 4.)
The singular value decomposition is a unitary reduction to diagonal form. The
degrees of freedom needed for the additional simplification come from operating on
the matrix on both the left and the right, i.e., transforming X to U^H X V, where U and
V are unitary. This subsection is concerned with the singular value decomposition—
its existence and properties.
Existence
The singular value decomposition can be established by a recursive argument, similar
in spirit to the proof of Theorem 2.13, which established the existence of the LU de-
composition.
Theorem 4.27. Let X ∈ C^{n×p}, where n ≥ p. Then there are unitary matrices U and
V such that
where
with
Proof. The proof is by recursive reduction of X to diagonal form. The base case is
when X is a vector x. If x = 0 take U = I and V = 1. Otherwise take U to be any
unitary matrix whose first column is x/||x||_2 [see (4.26)] and let V = (1). Then
It follows that if ε is sufficiently small, then ||X||_2 > σ, which contradicts the definition of σ.
Now by the induction hypothesis there are unitary matrices Û and V̂ such that
and
Moreover,
• It often happens that n ≫ p, in which case maintaining the n×n matrix U can be
burdensome. As an alternative we can set
in which case
This form of the decomposition is sometimes called the singular value factorization.
• The singular value decomposition provides an elegant full-rank factorization of X.
Suppose σ_k > 0 = σ_{k+1} = ··· = σ_p. Set
is a full-rank factorization of X. Since the rank of the factorization is k, we have the
following important relation.
The rank of a matrix is the number of its nonzero singular values.
Later it will be convenient to have a notation for the smallest singular value of X.
We will denote it by
In other words:
The square of the Frobenius norm of a matrix is the sum of squares of its singular
values.
The characterizations of the spectral and the Frobenius norms in terms of singular
values imply that
They also explain why, in practice, the Frobenius norm and the spectral norm often
tend to be of a size. The reason is that in the sum of squares σ_1² + σ_2² + ··· + σ_p², if σ_2 is
just a little bit less than σ_1, the squaring makes the influence of σ_2 and the subsequent
singular values negligible. For example, suppose p = 101, σ_1 = 1, and the remaining
singular values are 0.1. Then
Thus
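Working the example numerically (a sketch in Python with NumPy, using only the singular values): with σ_1 = 1 and one hundred remaining singular values equal to 0.1, the spectral norm is 1 while the Frobenius norm is √2 ≈ 1.41, so the two norms differ by only about forty percent even though p = 101.

import numpy as np

sigma = np.array([1.0] + [0.1] * 100)      # p = 101 singular values from the example

two_norm = sigma.max()                     # spectral norm = largest singular value
fro_norm = np.sqrt(np.sum(sigma**2))       # Frobenius norm from the singular values

print(two_norm, fro_norm)                  # 1.0 and sqrt(2) = 1.414...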
Uniqueness
The singular value decomposition is one of the many matrix decompositions that are
"essentially unique." Specifically, any unitary reduction to diagonal form must exhibit
the same singular values on the diagonal. Moreover, the singular vectors correspond-
ing to distinct singular values are unique up to a factor of modulus one.
Repeated singular values are a source of nonuniqueness, as the following theorem
shows.
Let
A similar statement is true of U provided we regard the singular vectors u_{p+1}, ..., u_n
as corresponding to zero singular values.
Since Q is nonsingular, det(Q) ≠ 0. Hence the left- and right-hand sides of (4.36) are
polynomials which are proportional to each other and therefore have the same zeros
counting multiplicities. But the zeros of these polynomials are respectively the numbers σ_i² and σ̂_i². Since these numbers are arranged in descending order, we must have
Σ = Σ̂.
To establish (4.34), write (4.35) in the form
Then
Hence
In consequence of (4.34), Q is block diagonal, each block corresponding to a re-
peated singular value. Write Q = diag(Q_1, Q_2, ..., Q_k), and partition
conformally. Then
Thus it is the subspace spanned by the right singular vectors that is unique. The singu-
lar vectors may be taken as any orthonormal basis—V_i, V̂_i, what have you—for that
subspace. However, once the right singular vectors are chosen, the left singular vec-
tors corresponding to nonzero singular values are uniquely determined by the relation
XV = UΣ. Analogous statements can be made about U.
The nonuniqueness in the singular value decomposition is therefore quite limited,
and in most applications it makes no difference. Hence we usually ignore it and speak
of the singular value decomposition of a matrix.
Unitary equivalence
Two matrices X and Y are said to be unitarily equivalent if there are unitary matrices
P and Q such that Y = P^H X Q. If X has the singular value decomposition (4.32),
then
Unitarily equivalent matrices have the same singular values. Their singular vec-
tors are related by the unitary transformations connecting the two matrices.
The proof of the following result, which is an immediate consequence of the proof
of Theorem 4.28, is left as an exercise.
The singular values of X^H X are the squares of the singular values of X. The
nonzero singular values of X X^H are the squares of the nonzero singular values
of X.
although this time without any assumption that σ_{k+1} = 0. Note that
Note for future reference that the only candidates for nonzero singular values of X —
X_k are σ_{k+1}, ..., σ_p.
and
Hence
and
Moreover,
Proof. We will establish only the inequalities involving maxima, the others being es-
tablished similarly.
Let RS^H be a full-rank factorization of Y. Since the matrix S^H V_{k+1} has more
columns than rows, it has a nontrivial null space. Let a be a vector of 2-norm one
such that S^H V_{k+1} a = 0. Then w = V_{k+1} a ∈ N(Y). Moreover,
Now let B_{i−1} and C_{j−1} be formed in analogy with (4.37). Then σ_1(B − B_{i−1}) =
σ_i(B) and σ_1(C − C_{j−1}) = σ_j(C). Moreover, rank(B_{i−1} + C_{j−1}) ≤ i+j−2. Hence
from (4.38)
and
Low-rank approximations
We have already observed that if σ_k > σ_{k+1} = 0, then X has rank k. Since in practical
applications matrices are generally contaminated by errors, we will seldom encounter
a matrix that is exactly defective in rank. Instead we will find that some of the singular
values of the matrix in question are small.
One consequence of small singular values is that the matrix must be near one that
is defective in rank. To quantify this statement, suppose that the small singular values
are σ_{k+1}, σ_{k+2}, ..., σ_p. If we define X_k by (4.37), then the nonzero singular values
of X_k − X are σ_{k+1}, σ_{k+2}, ..., σ_p. Hence we have
The following theorem shows that these low-rank approximations are optimal in the
2- and Frobenius norms.
Theorem 4.32 (Schmidt-Mirsky). For any matrix X ∈ C^{n×p}, if Y ∈ C^{n×p} is of rank
k, then
and
the last equality following from the fact that rank(Y) = k. Hence
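A numerical illustration of the optimality statement (Python with NumPy; the random matrix and the choice k = 2 are assumptions made only for the example): truncating the singular value decomposition after k terms produces an approximation whose 2-norm error is σ_{k+1} and whose Frobenius-norm error is the square root of the sum of squares of the discarded singular values.

import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((8, 5))
U, s, Vh = np.linalg.svd(X, full_matrices=False)

k = 2
Xk = U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]   # truncated SVD: best rank-k approximation

# The errors predicted by the singular values
assert np.isclose(np.linalg.norm(X - Xk, 2), s[k])
assert np.isclose(np.linalg.norm(X - Xk, 'fro'), np.sqrt(np.sum(s[k:]**2)))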
so that the "left singular vector" (-1) and the "right singular vector" (1) have opposite
signs. However, if we relax the requirement that the singular values be positive, we
can obtain a symmetric decomposition.
Specifically, let v be a vector of 2-norm one such that
vHAv is maximized,
We can continue the reduction to diagonal form with A, as we did with the singular
value decomposition. The result is the following theorem.
Theorem 4.33 (The spectral decomposition). If A ∈ C^{n×n} is Hermitian, there is a
unitary matrix U such that
where
Many properties of the singular value decomposition hold for the spectral decom-
position. The eigenvalues of a Hermitian matrix and their multiplicities are unique.
The eigenvectors corresponding to a multiple eigenvalue span a unique subspace, and
the eigenvectors can be chosen as any orthonormal basis for that subspace.
It is easily verified that the singular values of a Hermitian matrix are the absolute
values of its eigenvalues. Of particular importance is the following result, whose proof
is left as an exercise.
The eigenvalues of X^H X are the squares of the singular values of
X. Their eigenvectors are the corresponding right singular vectors
of X.
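A one-line numerical check of this boxed statement (Python with NumPy; the matrix is an arbitrary example): the eigenvalues of X^H X, sorted in descending order, agree with the squares of the singular values of X.

import numpy as np

X = np.random.default_rng(5).standard_normal((7, 4))

sigma = np.linalg.svd(X, compute_uv=False)          # singular values (descending)
evals = np.linalg.eigvalsh(X.conj().T @ X)[::-1]    # eigenvalues of X^H X (descending)

assert np.allclose(evals, sigma**2)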
The invariance of singular values under unitary equivalences has a counterpart for
eigenvalues of a Hermitian matrix. Specifically,
If U^H A U = Λ is the spectral decomposition of A and B = V^H A V, where V
is unitary, then (V^H U)^H B(V^H U) = Λ is the spectral decomposition of B.
The transformation A —>• V^ AV is called a unitary similarity transformation. The
above result shows that unitary similarities preserve eigenvalues and transform the
eigenvectors by the transpose of the unitary matrix of the similarity transformation.
Theorem 4.29 and its consequences are also true of the eigenvalues of Hermitian
matrices. We collect these results in the following theorem. Here the ith eigenvalue
in descending order of a Hermitian matrix is written λ_i(A).
and
4. If E is Hermitian, then
Proof. The proofs of the first four items are mutatis mutandis the same as the corre-
sponding results for the singular values. The last is established as follows. For any
k ≤ n−1 let W ⊆ C^{n−1} be a k-dimensional subspace for which
The fact that λ_k(V^H A V) ≥ λ_{k+1}(A) follows similarly from (4.46).
We write
A pair of orthonormal bases X_bi and Y_bi for 𝒳 and 𝒴 are said to be biorthogonal if
X_bi^H Y_bi is diagonal. From (4.48) it follows that the matrices X_bi = XU and Y_bi = YV
are orthonormal bases for 𝒳 and 𝒴 satisfying X_bi^H Y_bi = cos Θ(𝒳, 𝒴) and hence are
biorthogonal. From the uniqueness properties of the singular value decomposition it
follows that any such basis must be essentially unique and the diagonal elements of
X_bi^H Y_bi must be the cosines of the canonical angles between 𝒳 and 𝒴. We summarize
these results in the following theorem.
Theorem 4.36. Let 𝒳, 𝒴 ⊆ C^n be subspaces of dimension p. Then there are (essentially unique) CANONICAL ORTHONORMAL BASES X and Y for 𝒳 and 𝒴 such that
Two subspaces are identical if and only if their canonical angles are zero. Thus a
principal application of canonical angles is to determine when subspaces are near one
another. Fortunately, we do not have to compute the canonical angles themselves to
test the nearness of subspaces. The following theorem shows how to compute matrices
whose singular values are the sines of the canonical angles. As the subspaces approach
one another, these matrices approach zero, and conversely.
Proof. Without loss of generality we may assume that X and Y are canonical bases
for 𝒳 and 𝒴. Then the matrix
It follows that SS^H = I − Γ² is diagonal. Since the diagonal elements of Γ are the
canonical cosines, the diagonal elements of SS^H are the squares of the canonical sines.
Thus the nonzero singular values of S are the sines of the nonzero canonical angles.
The result for X_⊥^H Y is established similarly.
To establish the result for P_𝒳(I − P_𝒴), note that
The nonzero singular values of this matrix are the nonzero singular values of X^H Y_⊥,
which establishes the result. The result for (I − P_𝒳)P_𝒴 is established similarly.
Thus although we cannot compute, say, ||Θ(𝒳, 𝒴)||_F directly, we can compute
||sin Θ(𝒳, 𝒴)||_F by computing, say, ||X^H Y_⊥||_F. The latter is just as useful as the former for assessing the nearness of subspaces.
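The following sketch (Python with NumPy; the subspaces are spanned by random matrices and serve only as an example) computes the canonical angles of two p-dimensional subspaces from the singular values of X^H Y and then recovers their sines from X^H Y_⊥, as in the theorem above.

import numpy as np

rng = np.random.default_rng(6)
X = np.linalg.qr(rng.standard_normal((8, 3)))[0]   # orthonormal basis for a subspace
Y = np.linalg.qr(rng.standard_normal((8, 3)))[0]   # orthonormal basis for another

cosines = np.linalg.svd(X.conj().T @ Y, compute_uv=False)
angles = np.arccos(np.clip(cosines, 0.0, 1.0))     # canonical angles between the subspaces

Q, _ = np.linalg.qr(Y, mode='complete')            # extend Y to an orthonormal basis of R^8
Y_perp = Q[:, 3:]                                  # basis for the orthogonal complement of Y

sines = np.linalg.svd(X.conj().T @ Y_perp, compute_uv=False)
assert np.allclose(np.sort(sines), np.sort(np.sin(angles)))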
The CS decomposition
Suppose that we have a partitioned unitary matrix
where Q_11 is of order m. If we identify this matrix with the matrix (Y Y_⊥) in (4.49)
and set X^H = (I_m 0), then S = Q_11. It follows that if we regard the singular values
of Q_11 as cosines then the singular values of Q_12 are sines. A similar argument shows
that the singular values of Q_21 are the same sines. Moreover, passing to Q_22 brings us
back to the cosines.
These relations are a consequence of a beautiful decomposition of a unitary ma-
trix called the CS decomposition. The proof of the following theorem is tedious but
straightforward, and we omit it.
Theorem 4.38 (The CS decomposition). Let the unitary matrix Q of order n
be partitioned in the form
where Q_11 is of order m ≤ n/2. Then there are unitary matrices U_1, V_1 ∈ C^{m×m} and
U_2, V_2 ∈ C^{(n−m)×(n−m)} such that
In effect the theorem states that the blocks in a partitioned unitary matrix share
singular vectors. An important application of the decomposition is to simplify proofs
of geometric theorems. It is an instructive exercise to derive the results on canonical
angles and bases using the CS decomposition.
The QR factorization
In some sense the QR factorization originated with Gram [158,1883], who orthogo-
nalized sequences of functions, giving determinantal expressions for the resulting or-
thogonal sequences. Later, Schmidt [272,1907] gave the algorithm implicitly used in
the proof of Theorem 4.24, still in terms of sequences of functions. The name of the de-
composition is due to Francis [123], who used it in his celebrated QR algorithm for the
nonsymmetric eigenvalue problem. Rumor has it that the "Q" in QR was originally an
"O" standing for orthogonal. It is a curiosity that the formulas for the Gram-Schmidt
algorithm can be found in a supplement to Laplace's Théorie Analytique des Probabilités [211, 1820]. However, there is no notion of orthogonalization associated with
the formulas.
Projections
Not all projections have to be orthogonal. Any idempotent matrix P is a projection
onto R(P) along N(P).
5. ADDENDA
5.1. HISTORICAL
On the word matrix
According to current thinking [221], about six thousand years ago the region between
the Dnieper and Ural rivers was occupied by people speaking a language known as
proto-Indo-European. Fifteen hundred years later, the language had fragmented, and
its speakers began to spread out across Europe and Asia in one of the most extensive
linguistic invasions ever recorded. From Alaska to India, from Patagonia to Siberia,
half the world's population now speak Indo-European languages.
One piece of evidence for the common origin of the Indo-European languages is
the similarity of their everyday words. For example, the word for two is dva in San-
skrit, duo in Greek, duva in Old Church Slavonic, and dau in Old Irish. More to our
purpose, mother is mother in Sanskrit, mater in Greek, mati in Old Church Slavonic,
mathir in Old Irish — and mater in Latin.
Matrix is a derivative of the Latin mater. It originally meant a pregnant animal and
later the womb. By extension it came to mean something that surrounds, supports, or
sustains—for example, the material in which a fossil is embedded. In 1850 Sylvester
used it to refer to a rectangular array of numbers. It acquired its present mathemat-
ical meaning in 1855 when Cayley endowed Sylvester's array with the usual matrix
operations.
History
A definitive history of vectors, matrices, and linear algebra has yet to be written. Two
broad traditions can be discerned. The first begins with quaternions and passes through
vector analysis to tensor analysis and differential geometry. This essentially analytic
theory, whose early history has been surveyed in [80], touches only lightly on the sub-
ject of this work.
The second tradition concerns the theory of determinants and canonical forms.
Muir [238] gives an exhaustive history of the former in four volumes. Kline, who
surveys the latter in his Mathematical Thought from Ancient to Modern Times [199],
points out that most of the fundamental results on matrices—their canonical forms
and decompositions—had been obtained before matrices themselves came into wide-
spread use. Mathematicians had been working with linear systems and quadratic and
bilinear forms before Cayley introduced matrices and matrix algebra in the 1850s [59,
60], and they continued to do so.
The relation of matrices and bilinear forms is close. With every matrix A one can
associate a function
called a bilinear form, that is linear in each of its variables. Conversely, each bilinear
form Σ_ij a_ij y_i x_j corresponds to the matrix of its coefficients a_ij. Under a change of
variables, say x = Px̂ and y = Qŷ, the matrix of the form changes to Q^T AP. Thus
the reduction of a matrix by transformations is equivalent to simplifying a quadratic
form by a change of variables.
The first simplification of this kind is due to Lagrange [205,1759], who showed
how to reduce a quadratic form to a sum of squares, in which the kth term contains only
the last n−k+1 variables. His purpose was to determine if the form was positive def-
inite. Gauss [130,131,1809,1810] introduced essentially the same reduction—now
called Gaussian elimination—to solve systems and compute variances arising from
least squares problems. Throughout the rest of the century, various reductions and
canonical forms appeared in the literature: e.g., the LU decomposition by Jacobi [191,
1857], Jordan's canonical form [192, 1870], reductions of matrix pencils by Weier-
strass [337, 1868] and Kronecker [204, 1890], and the singular value decomposition
discovered independently by Beltrami [25, 1873] and Jordan [193, 1874]. For more
on these decompositions see the notes and references to the appropriate sections and
chapters.
The notion of an abstract vector space seems to be more a creature of functional
analysis than of matrix theory (for a history of the former see [95]). Definitions of
normed linear spaces—usually called Banach spaces — were proposed independently
by Banach [14,1922] and Wiener [340, 1922]. Less the norm, these spaces became
our abstract vector spaces.
However, it is useful to list some of the more important books on the subject.
Textbooks
The first textbook devoted exclusively to modern numerical linear algebra was Fox's
Introduction to Numerical Linear Algebra [122]. My own text, Introduction to Ma-
trix Computations [288], published in 1973, is showing its age. Golub and Van Loan's
Matrix Computations [153] is compendious and up to date—the standard reference.
Watkins' Fundamentals of Matrix Computations [333], Datta's Numerical Linear Al-
gebra and Applications [86], and Trefethen and Bau's Numerical Linear Algebra [319]
are clearly written, well thought out introductions to the field. Coleman and Van Loan's
Handbook for Matrix Computations [71] provides a useful introduction to the practi-
Special topics
There are a number of books on special topics in matrix computations: eigenvalue
problems [82, 207, 253], generalized inverses [26, 240, 267], iterative methods [13,
17, 165, 166, 332, 353], least squares [41, 213], rounding-error analysis [177], and
sparse matrices [108,143, 247].
Software
The progenitor of matrix software collections was the series of Handbook articles that
appeared in Numerische Mathematik and were later collected in a single volume by
Wilkinson and Reinsch [349]. The lineal descendants of this effort are EISPACK [284],
LINPACK [99], and LAPACK [9]. It is not an exaggeration to say that applied linear
algebra and matrix computations have been transformed by the availability of Cleve
Moler's MATLAB system [232] and its clones. See [255] for a useful handbook.
EISPACK, LINPACK, and LAPACK are available over the web from the NETLIB
repository at
http://www.netlib.org/index.html
This repository contains many other useful, high-quality numerical routines. For a
general index of numerical routines, consult the Guide to Available Mathematical Soft-
ware (GAMS) at
http://math.nist.gov/gams/
Historical sources
Kline's Mathematical Thought from Ancient to Modern Times [199] contains many
references to original articles. Older texts on matrix theory are often good sources
of references to original papers. Particular mention should be made of the books by
MacDuffee [220], Turnbull and Aitken [322], and Wedderburn [334]. For a view of
precomputer numerical linear algebra see Dwyer's Linear Computations [112].
2
MATRICES AND MACHINES
Matrix algorithms—at least the ones in this series — are not museum pieces to be
viewed and admired for their beauty. They are meant to be programmed and run on to-
day's computers. However, the road from a mathematical description of an algorithm
to a working implementation is often long. In this chapter we will traverse the road in
stages.
The first step is to decide on the vehicle that will carry us — the language we will
use to describe our algorithms. In this work we will use pseudocode, which is treated
in the first section of this chapter.
The second stage is the passage from a mathematical description to pseudocode.
It often happens that an algorithm can be derived and written in different ways. In the
second section of this chapter, we will use the problem of solving a triangular system
to illustrate the ins and outs of getting from mathematics to code. We will also show
how to estimate the number of arithmetic operations an algorithm performs. Although
such operation counts have limitations, they are often the best way of comparing the
efficiency of algorithms—short of measuring actual performance.
The third stage is to move from code to the computer. For matrix algorithms two
aspects of the computer are paramount: memory and arithmetic. In the third section,
we will show how hierarchical memories affect the performance of matrix algorithms
and conversely how matrix algorithms may be coded to interact well with the memory
system of a computer. In the fourth section, we introduce floating-point arithmetic
and rounding-error analysis — in particular, backward rounding-error analysis and its
companion, perturbation theory.
The further we proceed along the road from mathematical description to imple-
mentation the more important variants of an algorithm become. What appears to be a
single algorithm at the highest level splits into several algorithms, each having its ad-
vantages and disadvantages. For example, the interaction of a matrix algorithm with
memory depends on the way in which a matrix is stored—something not usually spec-
ified in a mathematical description. By the time we reach rounding error, truly minute
changes in an algorithm can lead to enormous differences in behavior. What is an al-
gorithm? The answer, it seems, depends on where you're at.
Here are instructions on how to get to my house. The party starts at 7:30.
1. Go to the last traffic light on Kingston Pike
2. Turn right
3. Drive 5.3 miles
4. Turn left at the convenience store
5. We are the ninth house on the right
1. PSEUDOCODE
Most computer languages have all that is needed to implement algorithms for dense
matrices—two-dimensional arrays, conditional statements, and looping constructs.
Many also allow one to define new data structures—something that is useful in coding
sparse matrix algorithms. Yet matrix algorithms are not usually presented in a standard
programming language but in some form of pseudocode. The chief reason is that pseu-
docode allows one to abandon lexical rigor for ease of exposition. English sentences
and mathematical expressions can be interleaved with programming constructs. State-
ments can be neatly labeled for later reference. And pseudocode provides a veneer of
neutrality by appearing not to favor one language over another.
For all these reasons, we have chosen to present algorithms in pseudocode. This
section is devoted to setting down the basics. The reader is assumed to be familiar with
a high-level, structured programming language.
1.1. GENERALITIES
A program or code fragment is a sequence of statements, perhaps numbered sequen-
tially. The statements can be ordinary English sentences; e.g.,
1. Go to the last traffic light on Kingston Pike
2. Turn right
When it is necessary to be formal, we will call a sequence of pseudocode an al-
gorithm and give it a prologue explaining what it does. For example, Algorithm 1.1
describes how to get to a party.
We will use standard mathematical notation freely in our pseudocode. However,
in a statement like
But the same statement would be awkward in a program. Consequently we will use
the conventions in Figure 1.1 to extract submatrices from a matrix. Inconsistent di-
mensions like [n+1:n] represent a void vector or matrix, a convention that is useful at
the beginning and end of loops [see (2.5) for an example].
The if statement
The if statement has the following form:
1. if (first conditional statement)
2.    first block of statements
3. else if (second conditional statement)
4.    second block of statements
5. else if (third conditional statement)
6.    third block of statements
7. else
8.    last block of statements
9. end if
Both the else and the else ifs are optional. There may be no more than one else. The
conditional statements are evaluated in order until one evaluates to true, in which case
the corresponding block of statements is executed. If none of the conditional state-
ments are true, the block of statements following the else, if there is one, is executed.
In nested if statements, an else refers to the most recent if that has not been paired with
an end if or an else.
The token fi is an abbreviation for end if. It is useful for one-liners:
1. if (s = 0) return 0 fi
Here i is a variable, which may not be modified in the loop. The parameters j, k, and
d are expressions, which will be evaluated once before the loop is executed. If the by
part of the loop is omitted, d is assumed to be one.
For j ≤ k and d > 0, the block of statements is executed for i = j, i = j+d,
i = j+2d, ..., j+nd, where n is the largest integer such that j+nd ≤ k. Similarly,
for j ≥ k and d < 0 the index i steps downward by increments of d until it falls below
k. The identifier i is not required after the end for, but it is useful for keeping track of
long loops. In fact, any appropriate token will do—e.g., the statement number of the
for statement.
The for loop obeys the following useful convention.
For example, the following code subtracts the last n−1 components of an n-vector
from the first component, even when n = 1.
1. for i =2 to n
2.    x_1 = x_1 - x_i
3. end for
1. leave <name>
causes the algorithm to leave the control statement indicated by <name>. The state-
ment may be a for, while, or if statement. The name may be anything that unambigu-
ously identifies the statement: the index of a loop, the line number of an if statement,
or the corresponding end if.
The statement
1. iterate <name>
forces a new iteration of the for or while loop indicated by <name>.
1. goto <name>
Here name may be a statement number or a statement label. A statement label precedes
the statement and is followed by a colon:
1.3. FUNCTIONS
Functions and subprograms with arguments will be indicated, as customary, by pre-
ceding the code by the name of the function with its argument list. The statement re-
turn exits from the subprogram. It can also return a value. For example, the following
function returns √(a² + b²), calculated in such a way as to avoid overflows and render
underflows innocuous (see §4.5).
1. Euclid (a,b)
2.    s = |a| + |b|
3. if (s =0)
4. return 0 ! Zero is a special case
5. else
6.    return s*sqrt((a/s)^2 + (b/s)^2)
7. end if
8. end Euclid
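The same safeguard carries over directly to any language. Here is a sketch in Python (the function name euclid and the test values are illustrative) that scales by s = |a| + |b| before squaring, so that the intermediate quantities stay bounded.

import math

def euclid(a, b):
    """sqrt(a**2 + b**2) computed so that intermediate results cannot overflow."""
    s = abs(a) + abs(b)
    if s == 0:
        return 0.0                        # zero is a special case
    return s * math.sqrt((a / s)**2 + (b / s)**2)

print(euclid(3e200, 4e200))               # 5e200; the naive formula would overflow
print(math.hypot(3e200, 4e200))           # the library routine uses a similar safeguard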
Pseudocode
Another reason for the use of pseudocode in this book is, paradoxically, to make the
algorithms a little difficult to implement in a standard language. In many of the algo-
rithms to follow I have omitted the consistency and error tests that make their overall
structure difficult to see. If these algorithms could be lifted from the text and compiled,
they would no doubt find their way unpolished into the real world. In fact, implement-
ing the algorithms, which requires line-by-line attention, is a good way to become re-
ally familiar with them.
The pseudocode used in this work shows a decided tilt toward FORTRAN in its
looping construct and its passing of subprogram arguments by reference. This latter
feature of FORTRAN has been used extensively to pass subarrays by the BLAS (Basic
Linear Algebra Subprograms, see §3). In C one has to go to the additional trouble
of creating a pointer to the subarray. But it should be added that our conventions for
specifying submatrices (Figure 1.1) render the decisions to pass by reference largely
moot.
The use of the colon to specify an index range (Figure 1.1) is found in array declarations in FORTRAN77. It was extended to extract subarrays in MATLAB and later in
FORTRAN90. The use of brackets to specify array references is in the spirit of C. It
avoids loading yet another burden on the overworked parenthesis.
Twenty years ago one could be pilloried for including a goto statement in a lan-
guage. The reason was a 1968 letter in the Communications of the ACM by Dijkstra
titled "Go to statement considered harmful" [96]. Although others had deprecated the
use of goto's earlier, Dijkstra's communication was the match that lit the fire. The
argument ran that the goto's in a program could be replaced by other structured con-
structs —to the great improvement of the program. This largely correct view was well
on the way to freezing into dogma, when Knuth in a wonderfully balanced article [200,
1974] (which contains a history of the topic and many references) showed that goto's
2. TRIANGULAR SYSTEMS
The implementation of matrix algorithms is partly art, partly science. There are gen-
eral principles but no universal prescriptions for their application. Consequently, any
discussion of code for matrix algorithms must be accompanied by examples to bridge
the gap between the general and the particular.
In this section we will use the problem of solving a lower triangular system as a
running example. There are three reasons for this choice. First, it is a real problem of
wide applicability. Second, it is simple enough so that the basic algorithm can be read-
ily comprehended. Third, it is complex enough to illustrate many of the principles of
sound implementation. We have chosen to work with lower triangular systems instead
of upper triangular systems because the order of computations runs forward through
the matrix in the former as opposed to backward in the latter. But everything we say
about lower triangular systems applies mutatis mutandis to upper triangular systems.
Proof. We will use the fact that a matrix L is nonsingular if and only if the system
has a solution for every b (see Theorem 3.21, Chapter 1). Let us write the system (2.1)
in scalar form:
First, suppose that the diagonal elements of L are nonzero. Then the first equation
in (2.2) has the solution x_1 = b_1/ℓ_11. Now suppose that we have computed x_1, x_2,
..., x_{k−1}. Then from the kth equation,
1. for k = 1 to n
2.    x_k = b_k
3.    for j = 1 to k-1
4.       x_k = x_k - ℓ_kj*x_j
5.    end for j
6.    x_k = x_k/ℓ_kk
7. end for k
we have
Consequently, the fact that the ℓ_kk are all nonzero implies that equation (2.1) has a
solution for any right-hand side b.
On the other hand, suppose that some diagonals of L are zero, and suppose that ℓ_kk
is the first such diagonal. If k = 1, then the equation fails to have a solution whenever
b_1 ≠ 0. If k > 1, the quantities x_1, x_2, ..., x_{k−1} are determined uniquely as in (2.3).
If b_k is then chosen so that
itself be transformed. Instead one should overwrite the vector b with the solution of
the system Lx = b. It is easy to modify Algorithm 2.1 to do this.
1. for k = 1 to n
2.    for j = 1 to k-1
3.       b_k = b_k - ℓ_kj*b_j
4.    end for j
5.    b_k = b_k/ℓ_kk
6. end for k
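For comparison with the pseudocode, here is a sketch of forward substitution in Python with NumPy (an illustrative rendering; the function name and test data are assumptions). It overwrites a copy of b with the solution, row by row, exactly as in the loop above.

import numpy as np

def forward_substitute(L, b):
    """Solve Lx = b for lower triangular L by forward substitution."""
    n = L.shape[0]
    x = b.astype(float).copy()
    for k in range(n):
        x[k] = (x[k] - L[k, :k] @ x[:k]) / L[k, k]
    return x

L = np.array([[2.0, 0.0, 0.0],
              [1.0, 3.0, 0.0],
              [4.0, 5.0, 6.0]])
b = np.array([2.0, 5.0, 32.0])
x = forward_substitute(L, b)
assert np.allclose(L @ x, b)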
Since L_11 is a lower triangular matrix of order n−1 we can solve the first system recursively for x_1 and then solve the second equation for ξ_n. This leads to the following
recursive code.
1. trisolve(L,x,b,n)
2. if (n=0) return fi
3. trisolve (L[1:n-1,1:n-1],x[1:n-1],b[1:n-1],n-1)
4.    x[n] = (b[n] - L[n,1:n-1]*x[1:n-1])/L[n,n]
5. end trisolve
There are two things to say about this algorithm.
• We have made heavy use of the conventions in Figure 1.1 to extract submatrices
and subvectors. The result is that no loop is required to compute the inner product in
the formula for x[n]. This suggests that we can code shorter, more readable algorithms
by consigning operations such as inner products to subprograms. We will return to this
point when we discuss the BLAS in §3.
• Implicit in the program is the assumption that L[n, 1:n-1]*x[1:n-1] evaluates to
zero when n = 1. This is the equivalent of our convention about inconsistent for
loops. In fact, the natural loop to compute the inner product in (2.5), namely,
1. sum=0
2. for j = 1 to n-1
3. sum = sum + L[n,j]*x[j]
4. end for
returns zero when n = 1. In what follows we will assume that degenerate statements
are handled in such a way as to make our algorithms work.
Many matrix algorithms are derived, as was (2.5), from a matrix partition in such
a way as to suggest a recursive algorithm. Another example is the recursive algorithm
for computing the LU decomposition implicit in the proof of Theorem 2.13, Chap-
ter 1. How then are we to recover a more conventional nonrecursive algorithm? A re-
cursive matrix algorithm will typically contain a statement or sequence of statements
performing a computation over a fixed range, usually from 1 or 2 to n−1 or n, where
n is the recursion parameter—e.g., statement (2.5.4). The nonrecursive code is ob-
tained by replacing the index n by another variable k and surrounding the statements
by a loop in k that ranges between 1 and n. Whether k goes forward or backward must
be determined by inspection. For example, the nonrecursive equivalent of (2.5) is
1. for k = 1 to n
2.    x[k] = (b[k] - L[k,1:k-1]*x[1:k-1])/L[k,k]
3. end for
Matrix algorithms are seldom written in recursive form. There are two plausible
reasons.
1. A recursive call is computationally more expensive than iterating a for loop.
2. When an error occurs, it is easy to jump out of a nest of loops to an appro-
priate error handler. Getting out of a recursion is more difficult.
On modern computers a matrix must be rather small for the recursion overhead to
count for much. Yet small matrices are often manipulated in the inner loops of ap-
plication programs, and the implementer of matrix algorithms is well advised to be
parsimonious whenever possible.
(or equivalently the system x^T L = b^T). To derive an algorithm, note that the last
equation of this system has the form
which can be solved for x_n. The kth equation of the system has the form
This algorithm is the analogue of the forward substitution algorithm (back substi-
tution it is called), but in changing from the original system to the transposed system
it has become column oriented. The analogue for transposed systems of the col-
umn-oriented algorithm (2.7) is row oriented.
is clearly lower triangular. Hence the algorithms we have just derived will solve bidi-
agonal systems. But they will spend most of their time manipulating zero elements.
We can get a more efficient algorithm by restricting the computations to nonzero ele-
ments.
For example, in the relation
defining x_k, only ℓ_{k,k-1} and ℓ_{k,k} are nonzero. Hence we may rewrite it in the form
Thus we get Algorithm 2.2. This algorithm is clearly cheaper than Algorithm 2.1. But
how much cheaper? We will return to this question after we derive another algorithm.
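A sketch of the resulting bidiagonal solver in Python with NumPy (the storage convention of two one-dimensional arrays holding the diagonal and subdiagonal is an assumption made for the example) follows; it implements the recurrence x_k = (b_k − ℓ_{k,k-1} x_{k-1})/ℓ_{kk}.

import numpy as np

def bidiag_solve(diag, sub, b):
    """Solve Lx = b where L is lower bidiagonal with main diagonal `diag`
    and subdiagonal `sub` (sub[k-1] multiplies x[k-1] in row k)."""
    n = len(diag)
    x = np.empty(n)
    x[0] = b[0] / diag[0]
    for k in range(1, n):
        x[k] = (b[k] - sub[k - 1] * x[k - 1]) / diag[k]
    return x

diag = np.array([2.0, 3.0, 4.0])
sub = np.array([1.0, 5.0])
b = np.array([2.0, 4.0, 13.0])
x = bidiag_solve(diag, sub, b)

L = np.diag(diag) + np.diag(sub, -1)
assert np.allclose(L @ x, b)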
(cf. the proof of Theorem 3.20, Chapter 1). However, when A = L is lower triangular
there are some special savings. As is often the case, the algorithm is a spin-off of a
useful result.
Theorem 2.2. The inverse of a lower (upper) triangular matrix is lower (upper) trian-
gular.
Proof. We will prove the result for lower triangular matrices. Partition the system
Lx_j = e_j in the form
where L_11 is (j−1)×(j−1). Then L_11 x_j^{(1)} = 0, and hence x_j^{(1)} = 0. This shows that
the first j−1 components of the jth column of L^{-1} are zero, and it follows that L^{-1}
is lower triangular.
The proof of Theorem 2.2 implies that to compute the inverse of L we need only
solve the (n−j+1)×(n−j+1) systems L[j:n, j:n]*x_j = e_1 for j = 1, 2, ..., n. If we use
Algorithm 2.1 to solve these systems, we obtain Algorithm 2.3.
The algorithm can be modified to overwrite L with its inverse by replacing all ref-
erences to X with references to L. The reader should verify that the following algo-
rithm does the job.
1. for k = 1 to n
2.    L[k,k] = 1/L[k,k]
3.    for i = k+1 to n
4.       L[i,k] = -L[i, k:i-1]*L[k:i-1, k]/L[i,i]
5. end for i
6. end for k
The savings in storage can be considerable, since a lower triangular matrix of order n
has at most n(n+l)/2 nonzero elements.
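Here is a sketch of the in-place inversion in Python with NumPy (an illustrative rendering of the loop above; it works on a copy rather than literally overwriting the caller's array). Note that when L[i,i] is used it still holds the original diagonal element, since the diagonal in position i is not inverted until the outer loop reaches k = i.

import numpy as np

def invert_lower_triangular(L):
    """Overwrite a copy of lower triangular L with its inverse, column by column."""
    L = L.astype(float).copy()
    n = L.shape[0]
    for k in range(n):
        L[k, k] = 1.0 / L[k, k]
        for i in range(k + 1, n):
            L[i, k] = -L[i, k:i] @ L[k:i, k] / L[i, i]
    return L

L = np.array([[2.0, 0.0, 0.0],
              [1.0, 3.0, 0.0],
              [4.0, 5.0, 6.0]])
Linv = invert_lower_triangular(L)
assert np.allclose(L @ Linv, np.eye(3))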
Bidiagonal systems
Let us look first at the number of operations required to solve a bidiagonal system. For
k = 1, the loop in Algorithm 2.2 performs a single division. For k > 1, it performs
one multiplication, one addition (actually a subtraction), and one division. Since the
loop runs from k = 1 to n, the entire algorithm requires
1. n—1 additions,
2. n-1 multiplications,
3. n divisions.
additions and multiplications. Taking into account the number of divisions, we get the
following operation count:
1. n²/2 − n/2 additions,
2. n²/2 − n/2 multiplications,
3. n divisions.
[Figure 2.1: table of abbreviations for operation counts, with columns Abbreviation and Description.]
• The abbreviations take the usual prefixes denoting powers of ten (e.g.,
Gflam).
constant is the only thing that distinguishes algorithms of the same order and it can
have important consequences for algorithms of different order.
• Nomenclature. The terminology for presenting operation counts is in a state of dis-
array. The widely used term "flop," which was originally an acronym for floating
point operation, has undergone so many changes that the substance has been practi-
cally wrung out of it (for more, see the notes and references for this section). Instead
we will use the abbreviations in Figure 2.1.
Note that the flam has replaced the flop in its sense (now defunct) of a floating-
point addition combined with a floating-point multiplication. Since in many matrix
algorithms, additions and multiplications come roughly in pairs, we will report many
of our counts in flams.
• Complex arithmetic. We will also use this nomenclature for complex arithmetic.
However, it is important to keep in mind that complex arithmetic is more expensive
requires two real additions and hence is twice as expensive as a real addition. Again,
the calculation
requires four real multiplications and two real additions and is at least four times as
expensive as a real multiplication. These two examples also show that the ratio of
multiplication times to addition times can be different for real and complex arithmetic.
which represents an inner product of length i−k−1 requiring about i−k flams. Since i
ranges from k to n and k ranges from 1 to n, the total number of flams for the algorithm
is
We could use standard summation formulas to evaluate this sum, but the process is
error prone. However, if we are only interested in the highest-order term in the sum,
we may replace the sum by an integral:
Note that the range of the outer integral has been adjusted to make it easy to evaluate.
We can do this because a shift of one or two in the limits of a range does not change
the high-order term.
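A quick numerical check of the approximation (a sketch in plain Python; the value n = 200 is arbitrary) compares the double sum from the count above with the integral value n³/6; the two agree in their leading term.

n = 200
exact = sum(i - k for k in range(1, n + 1) for i in range(k, n + 1))
approx = n**3 / 6
print(exact, approx, exact / approx)    # 1333300  1333333.3...  0.99997...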
This example makes clear one reason for deprecating the invert-and-multiply al-
gorithm for solving linear systems — at least triangular systems. The direct algorithm
for solving triangular systems is O(n²), while the inversion of a triangular matrix is
an O(n³) process.
Since operation counts are widely used to compare algorithms, it is important to
have an idea of their merits and limitations.
• Lower bounds. Operation counts provide a rigorous lower bound on the time an
algorithm will take—just divide the various counts by their rates for the computer in
question and add. Such lower bounds can be useful. If, for example, a bound predicts
that a calculation will take at least a thousand years, then it is time to consider alter-
natives.
• Arithmetic is not everything. Algorithms have overheads other than arithmetic
operations, overheads we will treat in the next section. Hence running time cannot be
predicted from operation counts alone. In a large number of cases, however, the run-
ning time as a function of the size of the problem is proportional to the time predicted
by operation counts. Moreover, the constant of proportionality is often approximately
the same over many algorithms—provided they are implemented with due respect for
the machine in question.
• Comparing algorithms of equal order. In using operation counts to compare al-
gorithms of the same order, it is the order constant that decides. Other things being
equal, one should prefer the algorithm with the smaller order constant. But keep in
mind that other things are never exactly equal, and factors of, say, two in the order
constants may be insignificant. The larger the factor, the more likely there is to be a
corresponding difference in performance.
• Comparing algorithms of different order. In principle, order constants are not
needed to decide between algorithms of different order: the algorithm of lower order
ultimately wins. But ultimately may never come. For example, if an O(n³) algorithm
has an order constant equal to one while an O(n²) algorithm has an order constant of one thou-
sand, then the first will be better for matrices of size less than one thousand. The size
of problem for which a lower-order algorithm becomes superior to a higher-order al-
gorithm is called the break-even point. Many promising algorithms have been undone
by high break-even points.
Finally, keep in mind that there are other things than speed to consider in select-
ing an algorithm—numerical stability, for example. An algorithm that persistently
returns bad answers is useless, even if it runs at blinding speed.
• The names of these BLAS describe their functions. For example, xelib
means "X Equals L Inverse B."
• B may replace X in the calling sequence, in which case the result over-
writes B — e.g., xelib(B,L,B).
Recursion
Although there are sound reasons why recursion is not much used in matrix computa-
tions, at least part of the story is that at one time recursion could be quite expensive.
Improved compiler techniques (e.g., see [5, 317]) have made recursive calls compar-
atively inexpensive, so that the overhead is negligible except for very small matrices.
Operation counts
Operation counts belong to the field of algorithms and their complexity. Two classical
references are the book of Aho, Hopcroft, and Ullman [4], which treats the algorith-
mic aspect, and the book by Hopcroft and Ullman [181], which treats the theoretical
aspects. For an encyclopedic treatment with many references see [73].
Pat Eberlein has told me that the word "flop" was in use by 1957 at the Prince-
ton Institute for Advanced Studies. Here is a table of the various meanings that have
attached themselves to the word.
1. Flop —a floating point operation.
2. Flop — a floating point addition and multiplication.
3. Flops—plural of 1 or 2.
4. Flops—flops (1 or 2) per second.
In its transmogrifications, the meaning of "flop" has flipped from 1 to 2 and back to 1
again. Golub and Van Loan [152, p. 19] hint, ever so gently, that the chief beneficiaries
of the second flip were the purveyors of flops—supercomputer manufacturers whose
machines got a free boost in speed, at least in advertising copy.
The system adopted here consists of natural abbreviations. Since precision re-
quires that heterogeneous counts be spelled out, there is no canonical term for a float-
ing-point operation. However, the flam and the rot (short for "rotation" and pronounc-
ed "wrote") cover the two most frequently occurring cases of compound operations.
The usage rules were lifted from The New York Public Library Writer's Guide to Style
and Usage [316] and The American Heritage College Dictionary [74].
The technique of approximating sums by integrals, as in (2.10), is a standard trick
of the trade. It provides the correct asymptotic forms, including the order constant,
provided the integrand does not grow too fast.
Computational theorists and matrix algorithmists measure complexity differently.
The former measure the size of their problems in terms of number of inputs, the latter
in terms of the order of the matrix. Since a matrix of order n has m = n² elements,
an O(n³) matrix algorithm is an O(m^(3/2)) algorithm to a computational theorist. This
places matrix algorithms somewhere between the Fourier transform, which is O(m²),
and the fast Fourier transform, which is O(m log m). And a good thing too! If our
algorithms were O(m³), we wouldn't live long enough to run them.
Whether you call the order n³ or m^(3/2), the order constants of matrix algorithms can
vary dramatically. The table in Figure 2.3, containing the number of operations required
for some common O(n³) matrix algorithms applied to a 20x20 matrix, was compiled
using MATLAB. (Thanks to Jack Dongarra for the idea.) Thus the order constant for
finding the eigenvalues and eigenvectors of a nonsymmetric matrix is nearly one hun-
dred times larger than that for finding the Cholesky decomposition. Beresford Parlett,
complaining about the abuse of the big O notation, says that it plays the part of a fig
leaf on a statue: it covers up things people don't want seen. The above table supports
this simile.
3. MATRICES IN MEMORY
There are many ways to execute the algorithms of the preceding section. The calcu-
lations could be done by hand, perhaps with the help of a slide rule or a table of log-
arithms. They could be done with an abacus or a mechanical calculator. Each mode
of computation requires special adaptations of the algorithm in question. The order
in which operations are performed, the numbers that are written down, the safeguards
against errors — all these differ from mode to mode.
This work is concerned with matrix computations on a digital computer. Just like
any other mode of computation, digital computers place their own demands on matrix
algorithms. For example, recording a number on a piece of paper is an error-prone
process, whereas the probability of generating an undetected error in writing to the
memory of a computer is vanishingly small. On the other hand, it is easy to mismanage
the memory of a computer in such a way that the speed of execution is affected.
The theme of this section is matrices in memory. We will begin by describing how
dense matrices are represented on computers, with emphasis on the overhead required
to retrieve the elements of a matrix. We will then move on to a discussion of hierar-
chical memories.
Storage of arrays
In high-level programming languages, matrices are generally placed in two-dimen-
sional arrays. A p×q array A is a set of pq memory locations in the computer. An el-
ement of an array is specified by two integers i and j which lie within certain ranges.
In this work we will assume 1 ≤ i ≤ p and 1 ≤ j ≤ q. The syntax by which an ele-
ment of an array is represented will depend on the programming language. Here we
will use the convention we have already been using for matrices — the (i, j)-element
of the array A is written A[i, j].
A difficulty with arrays is that they are two-dimensional objects that must be stored
in a one-dimensional memory. There are many ways in which this can be done, each
having its own advantages for specialized applications. For general matrix computa-
tions, however, there are just two conventions.
• Storage by rows. Beginning at a base address a, the array is stored a row at a time,
the components of each row appearing sequentially in the memory. For example, a 5×3
array stored by rows occupies fifteen consecutive locations: the three elements of the
first row, then those of the second row, and so on.
This order of storage is also called lexicographical order because the elements A[i, j]
are ordered with their first subscript varying least rapidly, just like letters in words al-
phabetized in a dictionary. This form of storage is also called row major order.
The general formula for the location of the (i, j)-element of a p×q array can be
deduced as follows. The first i-1 rows have (i-1)q elements. Consequently, the first
element of the ith row is a[(i-1)q+1]. Since the elements of a row are stored in se-
quence, the jth element of the ith row must be a[(i-1)q+j]. Thus

    A[i, j] is stored in a[(i-1)q + j].                              (3.1)
• Storage by columns. Here the array is stored a column at a time, the components
of each column appearing sequentially in the memory. In this case

    A[i, j] is stored in a[(j-1)p + i].                              (3.2)
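For readers who like to check such formulas, here is a small Python sketch of the two mappings; the function names are mine, and the indices are one-based as in the text.

# Locations of A[i, j] in a linear array a for a p-by-q array, one-based.
def rowwise_location(i, j, q):
    # storage by rows: formula (3.1)
    return (i - 1) * q + j

def columnwise_location(i, j, p):
    # storage by columns: formula (3.2)
    return (j - 1) * p + i

p, q = 5, 3
# Walking down the first column of a rowwise-stored 5x3 array takes steps
# of size q = 3; in columnwise storage the same walk takes steps of one.
print([rowwise_location(i, 1, q) for i in range(1, p + 1)])     # [1, 4, 7, 10, 13]
print([columnwise_location(i, 1, p) for i in range(1, p + 1)])  # [1, 2, 3, 4, 5]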
Strides
When a p×q array is stored rowwise, the distance in memory between an element and
the next element in the same column is q. This number is called the stride of the array
because it is the stride you must take through memory to walk down a column. When
the array is stored by columns, the stride is p and refers to the stride required to traverse
a row.
The stride is confusing to people new to matrix computations. One reason is that it
depends on whether the array is stored rowwise or columnwise. Thus the stride is dif-
ferent in C, which stores arrays rowwise, and FORTRAN, which stores them column-
wise.
Another source of confusion is that matrices are frequently stored in arrays whose
dimensions are larger than those of the matrix. Many programs manipulate matrices
whose dimensions are unknown at the time the program is invoked. One way of han-
dling this problem is to create a p×q array whose dimensions are larger than any ma-
trix the program will encounter. Then any m×n matrix with m ≤ p and n ≤ q can be
stored in the array—usually in the northwest corner.
Now it is clear from (3.1) and (3.2) that you have to know the stride to locate el-
ements in an array. Consequently, if a matrix in an oversized array is passed to a sub-
program, the argument list must contain not only the dimensions of the matrix but also
the stride of the array. The source of confusion is that the stride is generally different
from the dimensions of the matrix.
The parameter N is the order of the matrix L. The parameter LDL is the stride of the
array L containing the matrix L. The name is an abbreviation for "leading dimension
of L" because the first dimension in the declaration of an array in FORTRAN is the
stride of the array.
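The point is easier to see with a small sketch. The following Python fragment (my own illustration, not a FORTRAN interface) stores a 3×2 matrix columnwise in the northwest corner of a 6×5 buffer; locating an element requires the stride, not the dimensions of the matrix.

# An m-by-n matrix stored columnwise in the northwest corner of a larger
# p-by-q array.  The element A[i, j] lives at offset (j-1)*p + (i-1) in the
# flat buffer, so the stride p (the "leading dimension") must be passed
# along with the matrix.
p, q = 6, 5        # dimensions of the oversized array
m, n = 3, 2        # dimensions of the matrix actually stored
buf = [0.0] * (p * q)

def put(i, j, value):                    # one-based indices
    buf[(j - 1) * p + (i - 1)] = value

def get(i, j):
    return buf[(j - 1) * p + (i - 1)]

for j in range(1, n + 1):
    for i in range(1, m + 1):
        put(i, j, 10 * i + j)

print(get(3, 2))   # 32 -- found through the stride p, not through m or n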
If L is stored rowwise in a linear array l of stride p, then we can use (3.1) to retrieve elements of L. The result of these alterations to
(3.3) is the following program.
1. for k = 1 to n
2.    x[k] = b[k]
3.    for j = 1 to k-1
4.       x[k] = x[k] - l[(k-1)p+j]*x[j]
5.    end for j
6.    x[k] = x[k]/l[(k-1)p+k]
7. end for k
This program shows why arithmetic operation counts alone cannot predict the run-
ning time of an algorithm. Each array reference involves a certain amount of additional
computational work. For example, the reference to L[k, j] in the original program trans-
lates into l[(k-1)p+j], which requires two additions and a multiplication to compute.
But the program also shows why the running time is proportional to the arithmetic op-
eration count. Each floating-point addition and multiplication in the inner loop is ac-
companied by the same overhead for memory references.
An optimizing compiler can reduce this overhead. It can recognize, for example,
that the index (k-1)p+j need not be recomputed from scratch, since it increases by one
with each iteration of the inner loop. The optimizing compiler will simply generate code
to increment the index.
Since time is important in computations with large matrices, it is natural to ask if
there is a way to circumvent the code generated by a nonoptimizing compiler. The an-
swer is yes. The idea is to isolate frequently occurring computations into subprograms
where they can be optimized—by hand if necessary. Let's see how this works for the
forward substitution algorithm.
The place to start (as always in speeding up code) is the inner loop—in this case
the loop on j in (3.3). The effect of this loop is to compute the inner product

    L[k,1]*x[1] + L[k,2]*x[2] + ··· + L[k,k-1]*x[k-1]

and subtract it from x[k]. Since the inner product represents the bulk of the work, that
is the computation we want to isolate in a subprogram. The following function does
the job.
1. dot(n, x, y)
2.    s = 0
3.    for k = 1 to n
4.       s = s + x[k]*y[k]
5.    end for k
6.    return s
7. end dot
Now we can substitute the program dot for the inner loop on j:
1. for k = 1 to n
2.    x[k] = (b[k] - dot(k-1, L[k,1], x[1]))/L[k,k]
3. end for k
There are five comments to be made about dot and its usage.
• The subprogram uses the convention for inconsistent loops and returns zero when
n is zero.
• Comparing the calling sequence dot(k-1, L[k,1], x[1]) with dot itself, we see that
it is the address of L[k,1] and x[1] that is passed to the subprogram. This is sometimes
called call by reference, as contrasted with call by value in which the value of the ar-
gument is passed (see § 1.3).
• Since we have replaced doubly subscripted array references to L with singly sub-
scripted array references to x in the subprogram dot, even a nonoptimizing compiler
will generate efficient code. But if not, we could compile dot on an optimizing com-
piler (or code it in assembly language) and put it in a library. Since the inner product
is one of the more frequently occurring matrix operations, the effort will pay for itself
many times over.
• The subprogram dot can be written to take advantage of special features of the ma-
chine on which it will be run. For example, it can use special hardware — if it exists—
to compute the inner product.
• The subprogram dot is not as general as it should be. To see why, imagine that L is
stored by columns rather than by rows, say with stride p. Then to move across a row,
the index of x in dot must increase by p instead of one. A revised subprogram will
take care of this problem.
1. dot(n, x, xstr, y, ystr)
2.    ix = 1; iy = 1
3.    s = 0
4.    for k = 1 to n
5.       s = s + x[ix]*y[iy]
6.       ix = ix+xstr
7.       iy = iy+ystr
8.    end for k
9.    return s
10. end dot
In this subprogram the indices of both x and y are incremented by strides provided by
the user. In particular, to convert (3.6) to handle a matrix stored by columns with stride
p, the statement (3.6.2) would be replaced by

2.    x[k] = (b[k] - dot(k-1, L[k,1], p, x[1], 1))/L[k,k]

in which the stride p carries the index across the kth row of L.
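As a concrete rendering, here is a Python version of the revised dot (the zero-based offsets are mine; the text's version is one-based). The final call shows how a stride walks across a row of a columnwise-stored matrix.

# A strided inner product in the spirit of the revised dot above.  x and y
# are flat lists; xoff and yoff are zero-based starting offsets; xstr and
# ystr are the strides.
def strided_dot(n, x, xoff, xstr, y, yoff, ystr):
    s = 0.0
    ix, iy = xoff, yoff
    for _ in range(n):
        s += x[ix] * y[iy]
        ix += xstr
        iy += ystr
    return s

# The matrix [[1,2,3],[4,5,6],[7,8,9]] stored by columns in a flat list.
a = [1, 4, 7,  2, 5, 8,  3, 6, 9]
v = [1.0, 1.0, 1.0]
# Inner product of the third row of the matrix with v: start at offset 2
# and step with stride 3.
print(strided_dot(3, a, 2, 3, v, 0, 1))    # 7 + 8 + 9 = 24.0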
Since a lower triangular matrix has a regular pattern of zero elements, we can go
further and store only its nonzero elements, column by column, in a single linear array l.
This cramming of the nonzero elements of a matrix into a linear array is called a packed
representation.
We will implement the column-oriented algorithm (2.7), which for convenience
is reproduced here.
1. x = b
2. for k = 1 to n
3.    x[k] = x[k]/L[k,k]                                             (3.7)
4.    x[k+1:n] = x[k+1:n] - x[k]*L[k+1:n,k]
5. end for k
The implementation is in the spirit of the algorithm (3.5) in that we set up an index i
that moves through the array l.
1. i = 1
2. x = b
3. for k = 1 to n
4.    x[k] = x[k]/l[i]
5.    i = i+1
6.    for j = k+1 to n
7.       x[j] = x[j] - x[k]*l[i]
8.       i = i+1
9.    end for j
10. end for k
It is worth noting that there is no need to do an extra computation to get the index of
the diagonal of L before the division. Because of the packed representation, all we
need do is increment i by one.
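A Python sketch of the packed algorithm may make the indexing concrete. The list layout and zero-based indices are my own; the logic mirrors the program above.

# Forward substitution with L packed columnwise in a flat list:
# l = [L11, L21, ..., Ln1,  L22, L32, ..., Ln2,  ...]  (zero-based below).
def packed_forward_sub(l, b):
    n = len(b)
    x = list(b)
    i = 0                          # index into the packed array
    for k in range(n):
        x[k] = x[k] / l[i]         # diagonal element L[k, k]
        i += 1
        for j in range(k + 1, n):
            x[j] -= x[k] * l[i]    # subdiagonal element L[j, k]
            i += 1
    return x

# L = [[2, 0], [1, 4]] packs columnwise to [2, 1, 4]; solve L*x = [2, 6].
print(packed_forward_sub([2.0, 1.0, 4.0], [2.0, 6.0]))   # [1.0, 1.25]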
The level-one BLAS interact nicely with packed representations. For example, the
basic operation the statement (3.7.4) performs is to overwrite a vector by its sum with
a constant times another vector. Consequently, if we create a new BLAS called axpy
(for ax plus y) that overwrites a vector y with ax + y, we can use it to implement the
inner loop of (3.8).
Specifically, define axpy as follows.
1. axpy(n, a, x, xstr, y, ystr)
2.    ix = 1; iy = 1
3.    for k = 1 to n
4.       y[iy] = y[iy] + a*x[ix]
5.       ix = ix+xstr
6.       iy = iy+ystr
7.    end for k
8. end axpy
Then (3.8) can be written in the form
1. i = 1
2. x = b
3. for k = 1 to n
4.    x[k] = x[k]/l[i]
5.    axpy(n-k, -x[k], l[i+1], 1, x[k+1], 1)
6.    i = i+n-k+1
7. end for k
Most matrices with regular patterns of zeros lend themselves to a packed represen-
tation. The packing need not be in a single array. People often pack the three diagonals
of a tridiagonal matrix in three linear arrays (see §2.3, Chapter 3). Nor is packing con-
fined to matrices with zero elements. Almost half the elements of a symmetric matrix
are redundant (since a_ij = a_ji) and do not need to be stored.
Packing is not the only way to economize storage. Overwriting is another. For
example, the algorithm (2.9) overwrites a lower triangular matrix with its inverse, thus
saving O(n²) words of memory. Later in this work we shall see how a matrix can be
overwritten by a decomposition of itself. However, we can overwrite a matrix only if
we know that we will not need it again.
Most computers have more than one level of memory. Figure 3.1 exhibits a typical
(though idealized) hierarchical memory. At the bottom are the registers of the central
processing unit—the place where words from higher memories are manipulated. At
the top is a disk containing the entire memory allocated to the machine—the virtual
memory that cannot fit into the main memory below. Between the main memory and
the registers is a small, fast cache memory.
Volumes have been written on hierarchical memories, and it is impossible to de-
scend to architectural details in a work of this nature. But even a superficial descrip-
tion will be enough to suggest how matrix algorithms might be coded to avoid prob-
lems with memory hierarchies. We will begin this subsection by discussing virtual and
cache memory. We will then turn to strategies for writing matrix algorithms that use
hierarchical memories efficiently.
1. The page is in the main memory. In this case the reference — whether a read
or a write — is performed with no delay.
2. The page is not in main memory, a condition known as a page fault. In this
case the system swaps the page in backing store with one of the pages in
main memory and then honors the reference.
The problem with this arrangement is that reads and writes to backing store are
more costly than references to main memory, e.g., a hundred thousand times more
costly. It is therefore important to code in such a way as to avoid page faults. In compu-
tations with large matrices, some page faults are inevitable because the matrices con-
sume so much memory. But it is easy to miscode matrix algorithms so that they cause
unnecessary page faults.
The key to avoiding page faults is locality of reference. Locality of reference has
two aspects, locality in space and locality in time.
Locality in space refers to referencing nearby locations. The rationale is that con-
tiguous memory locations are likely to lie in the same page, so that a cluster of refer-
ences to nearby locations is unlikely to generate page faults. On the other hand, loca-
tions far removed from one another will lie on different pages and referencing them
one after another may cause a sequence of page faults. Thus it is desirable to arrange
computations so that if a location is referenced subsequent references are to nearby
locations.
To understand the notion of locality in time consider two references to a single
memory location. If these references occur near each other in time — the extreme case
is when they occur one right after the other—the page containing the item is likely to
be still around. As the references become further separated in time, the probability of
a page fault increases. Thus it is desirable to arrange computations so that repeated
references to the same locations are made close together in time.
Cache memory
In recent years the speed of processors has increased faster than the speed of mem-
ory — at least memory that can be built in quantity at a reasonable cost. To circumvent
this roadblock, computer architects have incorporated small, fast memories—called
cache memories or simply caches — into computers.
Cache memory bears the same relation to main memory as main memory does
to virtual memory, though the details and terminology differ. The cache is divided
into blocks which contain segments from main memory. When a memory reference
is made, the hardware determines if it is in the cache. If it is, the request is honored
right away. If it is not—a situation called a cache miss—an appropriate (and
generally time-consuming) action is taken before the reference is honored.
An important difference between cache and virtual memories is that writes to a cache
are usually more expensive than reads. The reason is the necessity of preserving cache
coherency — the identity of the contents of the cache and the contents of the corre-
sponding block of memory. A coherent cache block, say one that has only been read
from, can be swapped out at any time simply by overwriting it. An incoherent cache
block, on the other hand, cannot be overwritten until its coherency is restored.
There are two common ways of maintaining cache coherency. The first, called
write through, is to replicate any write to cache with a write to the corresponding lo-
cation in main memory. This will cause writes—or at least a sequence of writes near
each other in time— to be slower than reads. The other technique, called write back,
is to wait for a miss and if necessary write the whole block back to memory, also a
time-consuming procedure. Actually, write-through and write-back represent two ex-
tremes. Most caches have buffering that mitigates the worst behavior of both. None-
theless, hammering on a cache with writes is a good way to slow down algorithms.
A model algorithm
We are now going to consider techniques by which we can improve the interaction of
matrix algorithms with hierarchical memories. It must be stressed that this is more
an art than a science. Machines and their compilers have become so diverse and so
complicated that it is difficult to predict the effects of these techniques. All we can
say is that they have the potential for significant speedups.
In order to present the techniques in a uniform manner, we will consider a model
calculation. People acquainted with Gaussian elimination—to be treated in Chap-
ter 3 — will recognize the following fragment as a stripped down version of that al-
gorithm. The matrix A in the fragment is of order n.
1. for k = 1 to n-1
2.    A[k+1:n, k+1:n] = A[k+1:n, k+1:n]
                        - A[k+1:n,k]*A[k,k+1:n]
3. end for k
In matrix terms, if at the kth stage the array A is partitioned in the form (northwest
indexing)
If n is at all large, the references jump around memory — i.e., they do not preserve
locality in space.
1. for k = 1 to n
2.    for i = k+1 to n
3.       for j = k+1 to n
4.          A[i,j] = A[i,j] - A[i,k]*A[k,j]
5.       end for j
6.    end for i
7. end for k
Row-Oriented Algorithm
1. for k = 1 to n
2.    for j = k+1 to n
3.       for i = k+1 to n
4.          A[i,j] = A[i,j] - A[i,k]*A[k,j]
5.       end for i
6.    end for j
7. end for k
Column-Oriented Algorithm
On the other hand, the second algorithm, which traverses the columns of A, makes
the following sequence of memory references.
Thus the algorithm proceeds from one memory location to the next—i.e., it preserves
locality in space about as well as any algorithm can.
As k increases the behavior of the algorithms becomes more complicated. But the
first algorithm never jumps in memory by less than n words, while the second algo-
rithm never jumps by more than n words and usually by only one word. If A is stored
by rows instead of columns, the two algorithms reverse themselves, with the second
now jumping through memory and the first preserving locality.
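The effect of orientation can be observed directly. The following Python sketch (assuming NumPy is available) times the two traversal orders on a column-major array; the exact ratio is machine dependent, but the column-oriented sweep is usually the faster.

import time
import numpy as np

n = 2000
A = np.asfortranarray(np.random.rand(n, n))   # column-major, as in FORTRAN

def sum_by_rows(A):        # inner sweep crosses a row: stride n in memory
    s = 0.0
    for i in range(A.shape[0]):
        s += A[i, :].sum()
    return s

def sum_by_cols(A):        # inner sweep runs down a column: stride 1 in memory
    s = 0.0
    for j in range(A.shape[1]):
        s += A[:, j].sum()
    return s

for f in (sum_by_rows, sum_by_cols):
    start = time.perf_counter()
    f(A)
    print(f.__name__, time.perf_counter() - start)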
Level-two BLAS
It is not always necessary to decide on row or column orientation at the time an al-
gorithm is written. For example, we have seen that the basic operation of our model
algorithm is the subtraction of a rank-one matrix:
[see (3.11)]. Suppose we write a function amrnk1(A, x, y^T) that overwrites A with
A - xy^T (the name means "A minus a rank-one matrix"). Then we can write our model
algorithm in the form
1. for k = 1 to n-1
2.    amrnk1(A[k+1:n,k+1:n], A[k+1:n,k], A[k,k+1:n])
3. end for k
The program amrnk1 can then be loaded from a library of code written for the target
system and language.
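A NumPy sketch of such a routine, and of the model algorithm written in terms of it, is given below; the name amrnk1 follows the text, but the body is merely one possible implementation.

import numpy as np

def amrnk1(A, x, y):
    # Overwrite A with A - x*y^T (a rank-one update), in place.
    A -= np.outer(x, y)

def model_elimination(A):
    # The model algorithm of this section, expressed through amrnk1.
    n = A.shape[0]
    for k in range(n - 1):
        amrnk1(A[k+1:, k+1:], A[k+1:, k], A[k, k+1:])
    return A

A = np.array([[4.0, 2.0], [2.0, 3.0]])
print(model_elimination(A))      # the trailing entry becomes 3 - 2*2 = -1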
The program amrnk1 is called a level-two BLAS. The name comes from the fact
that it performs O(n²) matrix-vector operations. Other examples of level-two BLAS
are the ones for triangular systems listed in Figure 2.2. Yet another example is the
formation of a matrix-vector product. It turns out that the catalogue of useful matrix-
vector operations is small enough that it is practical to code libraries of them for a
given machine or language. Row and column orientation, as required, can then be
incorporated in these libraries.
We will not make explicit use of level-two BLAS in presenting algorithms. The
reason is the same as for level-one BLAS — our notation is sufficient to express the
action of most level-two BLAS. Provided we present our algorithms at a high enough
level, the level-two BLAS required will be obvious on inspection.
Unfortunately, the level-two BLAS are not a panacea. Algorithms for the more
complicated matrix decompositions usually have row or column orientation built into
them in subtle ways—for example, in the decision whether to store a transformation
or its transpose. Until we agree on a common language or until all languages become
ambidextrous, orientation of matrix algorithms will continue to trouble us.
On the other hand, if the element is a_kj with j ≥ k, the algorithm performs the fol-
lowing computations:
These formulas presuppose that the a's forming the products on the left have already
been processed. This will be true if, as k goes from one to n-1, we compute a_ik (i =
k+1, ..., n) and a_kj (j = k, ..., n).
These considerations give us the following algorithm.
1. for k = 1 to n-1
2.    for i = k+1 to n
3.       for j = 1 to k-1
4.          A[i,k] = A[i,k] - A[i,j]*A[j,k]
5.       end for j
6.    end for i
7.    for j = k to n
8.       for i = 1 to k-1
9.          A[k,j] = A[k,j] - A[k,i]*A[i,j]
10.      end for i
11.   end for j
12. end for k
Incidentally, this program is a stripped down version of the Crout form of Gaussian
elimination (see Algorithm 1.7, Chapter 3).
The advantage of this form of the algorithm is that the reference to A[i, k] in state-
ment 4 does not change in the inner loop on j. Consequently, we can put it in a register
and work with it there without having to write to cache. Similarly for the computation
in statement 9. Thus the reorganized algorithm is potentially faster than either of the
algorithms in Figure 3.2 when writes to cache are expensive.
Unfortunately, there are trade-offs. The program (3.14) is neither column- nor
row-oriented and cannot be made so. What we gain by keeping the data in registers we
may lose to the poor orientation. Moreover, there is no way to use BLAS to hide the
difference between (3.14) and the algorithms in Figure 3.2. With regard to memory
they are fundamentally different algorithms, and the choice between them must be at
the highest level.
where the indexing is to the northwest. If we process the elements in A_{11}, A_{1,m+1}, and
A_{m+1,1} in the usual way, then the effect of the first m steps of the algorithm (3.13) on
A_{m+1,m+1} is to overwrite A_{m+1,m+1} as follows:

    A_{m+1,m+1} ← A_{m+1,m+1} - A_{m+1,1}·A_{1,m+1}.
This overwriting is a rank-m update. After the update, we can repeat the process on
the matrix A_{m+1,m+1}.
1.  for k = 1 to n by m
2.     ku = min{k+m-1, n}
3.     for l = k to ku-1
4.        for j = l+1 to n
5.           for i = l+1 to min{j, ku}
6.              A[i,j] = A[i,j] - A[i,l]*A[l,j]
7.           end for i
8.        end for j
9.        for j = l+1 to ku
10.          for i = j+1 to n
11.             A[i,j] = A[i,j] - A[i,l]*A[l,j]
12.          end for i
13.       end for j
14.    end for l
15.    A[ku+1:n,ku+1:n] = A[ku+1:n,ku+1:n]
                          - A[ku+1:n, k:ku]*A[k:ku, ku+1:n]
16. end for k
The code in Figure 3.3 implements this scheme. The code may be best understood
by referring to the following figure.
The grey region contains the elements that have been completely processed. Regions
I (which contains the diagonal) and II are the blocks corresponding to A_{11}, A_{1,m+1},
and A_{m+1,1} in the partition (3.15). They are processed in the loop on l. The loop in
statement 4 processes region I. The loop in statement 9 processes region II. Region III
is processed in statement 15, which if the block size m is not large accounts for most
of the work in the algorithm.
If we now define a level-three BLAS amrnkm(A, X, Y) that overwrites A with A - XY, then statement 15 can be implemented by a single call to amrnkm.
The BLAS amrnkm can then be coded to take advantage of the features of a particu-
lar machine. The fact that amrnkm works with more data than amrnk1 gives us more
opportunity to economize. For example, if m is not too large we may be able to use
inner products in the style of (3.14) without triggering a volley of cache misses.
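In the same spirit, a level-three routine reduces to a single matrix-matrix product. The NumPy sketch below is an illustration only; the slicing shown is one possible zero-based translation of the trailing update of statement 15.

import numpy as np

def amrnkm(A, X, Y):
    # Overwrite A with A - X*Y (a rank-m update), in place.
    A -= X @ Y

# A rank-2 update of the trailing block of a 6-by-6 matrix, in the manner
# of statement 15 (zero-based slices).
A = np.random.rand(6, 6)
k, ku = 1, 3
amrnkm(A[ku:, ku:], A[ku:, k:ku], A[k:ku, ku:])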
The choice of the block size m is not easy. Two things limit its size. First, the
overhead for processing regions I and II increases until it swamps out any benefits.
Second, as we have suggested above, if m is too large we increase the probability of
cache misses and page faults. The routines provided by LAPACK choose a value of 1,
32, or 64, depending on the name of the BLAS and whether the arithmetic is real or
complex.
Blocking can be remarkably effective in speeding up matrix algorithms—espe-
cially the simpler ones. However, we shall not present blocked algorithms in this work.
There are three reasons. First, the blocking obscures the simplicity of the basic algo-
rithm. Second, once a matrix algorithm is well understood, it is usually an easy matter
to code a blocked form. Finally, the LAPACK codes are thoroughly blocked and well
commented, so that the reader can easily learn the art of blocking by studying them.
For these reasons, we will present our algorithms at the level of matrix-vector opera-
tions, i.e., algorithms that can be implemented with the level-two BLAS.
The BLAS
The BLAS arose in stages, as suggested by their level numbers. The original BLAS
[214], formally proposed in 1979, specified only vector operations. When this level of
abstraction was found to be unsatisfactory for certain vector supercomputers, notably
the various CRAYs, the level-two BLAS [102, 103] for matrix-vector operations were
proposed in 1988. Finally, the level-three BLAS [101, 100] were proposed in 1990 to
deal with hierarchical memories.
The fact that the BLAS could enhance the performance of code generated by non-
optimizing compilers was first noted by the authors of LINPACK and was an important
factor in their decision to adopt the BLAS.
A problem with generalizing the original BLAS is that each level of ascent adds
disproportionately to the functions that could be called BLAS. For example, the solu-
tion of triangular systems is counted among the level-two BLAS. But then, why not in-
clude the solution of Hessenberg systems, which is also an O(n²) process? By the time
one reaches the level-three BLAS, everything in a good matrix package is a candidate.
The cure for this problem is, of course, a little common sense and a lot of selectivity.
Virtual memory
Virtual memory was proposed by Kilburn, Edwards, Lanigan, and Sumner in 1962
[198]. Virtual memories are treated in most books on computer architecture (e.g., [169,
317, 257, 78]). Moler [231] was the first to point out the implications of virtual mem-
ory for matrix computations.
A common misconception is that virtual memory in effect gives the user a mem-
ory the size of the address space—about 4 Gbytes for an address space of 2^32 bytes.
But on a multiuser system each user would then have to be allocated 4 Gbytes of disk,
which would strain even a large system. In practice, each user is given a considerably
smaller amount of virtual memory.
Cache memory
Cache memory was the creation of Maurice Wilkes [341], the leader of the project that
resulted in the first effective stored program computer. A comprehensive survey may
be found in [283]. Also see [78, 169, 173, 257].
sible problem with the excuse that it would take too long.
Blocking
It is important to distinguish between a blocked algorithm like the one in Figure 3.3
and a block algorithm in which the blocks of a partitioned matrix are regarded as (non-
commuting) scalars. We will return to this point when we consider block Gaussian
elimination (Algorithm 1.2, Chapter 3).
4. ROUNDING ERROR
As I was going up the stair
I met a man who wasn't there!
He wasn't there again today!
I wish, I wish he'd stay away!
Hughes Mearns
Rounding error is like that man. For most people it isn't there. It isn't there as
they manipulate spreadsheets, balance checking accounts, or play computer games.
Yet rounding error hovers at the edge of awareness, and people wish it would go away.
But rounding error is inevitable. It is a consequence of the finite capacity of our
computers. For example, if we divide 1 by 3 in the decimal system, we obtain the
nonterminating fraction 0.33333 .... Since we can store only a finite number of these
3's, we must round or truncate the fraction to some fixed number of digits, say 0.3333.
The remaining 3's are lost, and forever after we have no way of knowing whether we
are working with the fraction 1/3 or some other number like 0.33331415 ....
Any survey of matrix algorithms—or any book on numerical computation, for
that matter—must come to grips with rounding error. Unfortunately, most rounding-
error analyses are tedious affairs, consisting of several pages of algebraic manipula-
tions followed by conclusions that are obvious only to the author. Since the purpose
of this work is to describe algorithms, not train rounding-error analysts, we will con-
fine ourselves to sketching how rounding error affects our algorithms. To understand
the sketches, however, the reader must be familiar with the basic ideas — absolute and
relative error, floating-point arithmetic, forward and backward error analysis, and per-
turbation theory. This section is devoted to laying out the basics.
Absolute error
We begin with a definition.
Definition 4.1. Let a and b be scalars. Then the ABSOLUTE ERROR in b as an approx-
imation to a is the number |b - a|.
• The number e = b - a is usually called the error in b, and some people would con-
fine the use of the term "error" to this difference. Such a restriction, however, would
require us to qualify any other measure of deviation, such as absolute error, even when
it is clear what is meant. In this work the meaning of the word "error" will vary with
the context.
• In many applications only an approximate quantity is given, while the true value is
unknown. This means that we cannot know the error exactly. The problem is resolved
by computing upper bounds on the absolute error. We will see many examples in what
follows.
• The absolute error is difficult to interpret without additional information about the
true value.
Example 4.2. Suppose b approximates a with an absolute error of 0.01. If a = 22.43,
then a and b agree to roughly four decimal digits. On the other hand, if a = 0.002243,
then the error overwhelms a. In fact, we could have b = 0.012243, which is almost
five times the size of a.
Relative error
Example 4.2 suggests that the problem with absolute error is that it does not convey a
sense of scale, i.e., of the relation of the error to the quantity being approximated. One
way of expressing this relation is to take the ratio of the error to the true value. In the
above example, if a = 22.43, this ratio is about 0.0004, which is satisfactorily small.
If, on the other hand, a = 0.002243, the ratio is about four. These considerations lead
to the following definition.
Definition 4.3. Let a ≠ 0 and b be scalars. Then the RELATIVE ERROR in b as an ap-
proximation to a is the number |b - a|/|a|.
Proof. From the definition of relative error, we have ρ|a| = |b - a| ≥ |a| - |b|, from
which it follows that |b| ≥ (1 - ρ)|a| > 0. Hence from the definition of relative error
and the last inequality, it follows that

    |a - b|/|b| ≤ ρ|a| / ((1 - ρ)|a|) = ρ/(1 - ρ).
• When the relative error in b as an approximation to a is small, it makes little difference
which quantity is used as a normalizer. In this case one may speak of the relative error
in a and b without bothering to specify which is the quantity being approximated.
• The relative error is related to the number of significant digits to which two numbers
agree. Consider, for example, the following approximations to e = 2.71828... and
their relative errors.
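The table of approximations is not reproduced here, but the relationship is easy to recompute. The approximations in the Python sketch below are my own; roughly speaking, a relative error of about 10^(-k) corresponds to agreement in about k significant digits.

import math

# Relative errors of some approximations to e = 2.71828...
for b in (2.7, 2.72, 2.718, 2.7183, 2.718282):
    rel = abs(b - math.e) / abs(math.e)
    print(b, rel)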
Floating-point numbers
Floating-point numbers and their arithmetic are familiar to anyone who has used a
hand calculator in scientific mode. For example, when I calculate 1/π on my calcula-
tor, I might see displayed
3.183098 –01
This display has two components. The first is the number 3.183098, which is called
the mantissa. The second is the number —01, called the exponent, which represents
a power of ten by which the mantissa is to be multiplied. Thus the display represents
the number
3.183098 · 10^(-1) = 0.3183098
It is easy to miss an important aspect of the display. The numbers have only a finite
number of digits—seven for the mantissa and two for the exponent. This is character-
istic of virtually all floating-point systems. The mantissa and exponent are represented
by numbers with a fixed number of digits. As we shall see, the fixed number of digits
in the mantissa makes rounding error inevitable.
Although the above numbers are all represented in the decimal system, other bases
are possible. In fact most computers use binary floating-point numbers.
Let us summarize these observations in a working definition of floating-point num-
ber.
Definition 4.5. A t-digit, base-β FLOATING-POINT NUMBER having EXPONENT RANGE
[e_min, e_max] is a pair (m, e), where
1. m is a t-digit number in the base β with its β-point in a fixed location, and
2. e is an integer in the interval [e_min, e_max].
The number m is called the MANTISSA of (m, e), and the number e is its EXPONENT.
The VALUE of the number (m, e) is
    m · β^e.
The number (m, e) is NORMALIZED if the leading digit in m is nonzero.
It is important not to take this definition too seriously; the details of floating-point
systems are too varied to capture in a few lines. Instead, the above definition should be
taken as a model that exhibits the important features of most floating-point systems.
On hand calculators the floating-point base is ten. On most digital computers it
is two, although base sixteen occurs on some IBM computers. The location of the β-
point varies. On hand calculators it is immediately to the left of the most significant
digit, e.g., (3.142,0). On digital computers the binary point is located either to the left
of the most significant digit, as in (1.10010,1), or to the right, as in (.110010,2).
The way in which the exponent is represented also varies. In the examples in the
last paragraph, we used decimal numbers to represent the exponent, even though the
second example concerned a binary floating-point number. In the IBM hexadecimal
format, the exponent is represented in binary.
A floating-point number on a digital computer typically occupies one or two 32-bit
words of memory. A number occupying one word is called a single-precision number,
one occupying two words is called a double-precision number. Some systems provide
quadruple-precision numbers occupying four words. The necessity of representing
floating-point numbers within the confines of a fixed number of words accounts for
the limits on the size of the mantissa and on the range of the exponent.
In the IEEE single-precision format a number occupies a 32-bit word divided into
three fields: a sign bit σ, an eight-bit biased exponent exp, and a twenty-three-bit
field frac containing the trailing part of the mantissa. The value of a normalized
number is

    (-1)^σ · 1.frac · 2^(exp-127).
The quantities frac and exp are not the same as the quantities m and e in Defini-
tion 4.5. Here is a summary of the differences.
• Since the leading bit in the mantissa of a normalized, binary, floating-point number
is always one, it is wasteful to devote a bit to its representation. To conserve precision,
the IEEE fraction stores only the part below the leading bit and recovers the mantissa
via the formula m = (-1)^σ · 1.frac.
• The number exp is called a biased exponent, since the true value e of the exponent
is computed by subtracting a bias. The unbiased exponent range for single precision is
[-126, 127], which represents a range of numbers from roughly 10^(-38) to 10^38. Double
precision ranges from roughly 10^(-307) to 10^307. In both precisions the extreme expo-
nents (i.e., -127 and 128 in single precision) are reserved for special purposes.
• Zero is represented by exp = 0 (one of the reserved exponents) and frac = 0. The
sign bit can be either 0 or 1, so that the system has both a +0 and a -0.
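The layout just described can be inspected directly. The Python sketch below unpacks the sign, biased exponent, and fraction of a single-precision number and reassembles its value; it applies to normalized numbers only, since the reserved exponents are treated specially.

import struct

def ieee_single_fields(x):
    # Bit fields of x stored as an IEEE single-precision number.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exp = (bits >> 23) & 0xFF        # biased exponent (bias 127)
    frac = bits & 0x7FFFFF           # 23-bit trailing fraction
    return sign, exp, frac

sign, exp, frac = ieee_single_fields(-0.3183098)
# Reassemble the value of a normalized number: (-1)^sign * 1.frac * 2^(exp-127).
value = (-1)**sign * (1 + frac / 2**23) * 2.0**(exp - 127)
print(sign, exp - 127, value)        # 1, -2, approximately -0.3183098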
Rounding error
Relative error is the natural mode in which to express the errors made in rounding to
a certain number of digits. For example, if we round π = 3.14159... to four digits,
we obtain the approximation 3.142, which has a relative error of about 10^(-4). The ex-
ponent -4, or something nearby, is to be expected from the relation between relative
error and the number of significant digits to which two numbers agree.
More generally, consider the problem of rounding a normalized binary fraction a
to t digits. We can represent this fraction as

    a = 0.b_1 b_2 ··· b_t y z_1 z_2 ··· ,

where the y and the z's represent the digits to be rounded. If y = 0 we truncate the
fraction, which gives the number 0.b_1 b_2 ··· b_t. The worst possible error (which is
approached when the z's are all one) is 2^(-t-1). On the other hand, if y is one, we
round up to get the number 0.b_1 b_2 ··· b_t + 2^(-t). Again the worst error (which is
attained when the z's are all zero) is 2^(-t-1). Since the smallest possible value of a
is 1/2, we make a relative error of at most

    |fl(a) - a| / |a| ≤ 2^(-t).                                      (4.4)
In a floating-point system rounding depends only on the mantissa and not on the
exponent. To see this result, let a be multiplied by 2^e, where e is an integer. Then the
value of a will also be multiplied by 2^e, and the factor 2^e will cancel out as we compute
the relative error (4.4).
Let us write fl(a) for the rounded value of a number a. Then from the character-
ization (4.1), it follows that if a number a is rounded to a t-digit binary floating-point
number we have

    fl(a) = a(1 + ε),   where |ε| ≤ 2^(-t).
There are other ways to shoehorn a number into a finite precision word. The round-
ing we have just described is sometimes called "round up" because the number
The number ε_M is called the ROUNDING UNIT for the system in question.
Typically, the rounding unit for a t-digit, base-β floating-point system will be ap-
proximately β^(-t). The size can vary a little depending on the details. For example, the
rounding unit for chopping is generally twice the rounding unit for ordinary round-
ing. Although this increase is minor, we will see later that chopping has an important
drawback that does not reveal itself in the bounds.
Example 4.7 (IEEE standard). The single-precision rounding unit for IEEE floating
point is about 10^(-7). The double-precision rounding unit is about 10^(-16).
Floating-point arithmetic
Floating-point numbers have an arithmetic that mimics the arithmetic of real num-
bers. The operations are usually addition, subtraction, multiplication, and division.
This arithmetic is necessarily inexact. For example, the product of two four-digit num-
bers is typically an eight-digit number, and in a four-digit floating-point system it must
be rounded back. The standard procedure is for each operation to return the correctly
rounded answer. This implies the following error bounds for floating-point arithmetic.
Let ∘ denote one of the arithmetic operations +, -, ×, ÷, and let fl(a ∘ b) denote
the result of performing the operation in a floating-point system with rounding
unit ε_M. Then

    fl(a ∘ b) = (a ∘ b)(1 + ε),   where |ε| ≤ ε_M.                   (4.6)
The bound (4.6) will be called the standard bound for floating-point arithmetic,
and a floating-point system that obeys the standard bound will be called a standard
system. The standard bound is the basis for most rounding-error analyses of matrix
algorithms. Only rarely do we need to know the details of the arithmetic itself. This
fact accounts for the remarkable robustness of many matrix algorithms.
Example 4.8 (IEEE standard). In the IEEE standard the default rounding mode is
round to nearest even. However, the standard specifies other modes, such as round
toward zero, that are useful in specialized applications.
The practice of returning the correctly rounded answer has an important implica-
tion.
Example 4.9. If |a + b| < min{|a|, |b|}, we say that CANCELLATION has occurred in
the sum a + b. The reason for this terminology is that cancellation is usually accom-
panied by a loss of significant figures. For example, consider the difference
0.4675
-0.4623
0.0052
Since cancellation implies that no more than the full complement of significant figures
is required to represent the result, it follows that:
When cancellation occurs in a standard floating-point system, the computed re-
sult is exact.
There is a paradox here. People frequently blame cancellation for the failure of an
algorithm; yet we have just seen that cancellation itself introduces no error. We will
return to this paradox in the next subsection.
One final point. Many algorithms involve elementary functions of floating-point
numbers, which are usually computed in software. For the purposes of rounding-error
analysis, however, it is customary to regard them as primitive operations that return
the correctly rounded result. For example, most rounding-error analyses assume that
Simple as this computation is, it already illustrates many of the features of a full-blown
rounding-error analysis.
The details of the analysis depend on the order in which the numbers are summed.
For definiteness we will analyze the following algorithm.
1. s_1 = x_1
2. for i = 2 to n
3.    s_i = s_{i-1} + x_i
4. end for i
Then the computed sum satisfies

    fl(s_n) = x_1(1 + η_1) + x_2(1 + η_2) + ··· + x_n(1 + η_n).      (4.9)
• First-order bounds. The numbers η_i in (4.9) are called the (relative) backward er-
ror, and it is important to have a bound on their sizes. To see what kind of bound we
can expect, consider the product

    (1 + ε_{n-2})(1 + ε_{n-1}) = 1 + (ε_{n-2} + ε_{n-1}) + ε_{n-2}·ε_{n-1}.      (4.10)

Now |ε_{n-2} + ε_{n-1}| ≤ 2ε_M and |ε_{n-2}·ε_{n-1}| ≤ ε_M². If, say, ε_M = 10^(-16), then 2ε_M =
2·10^(-16) while ε_M² = 10^(-32). Thus the third term on the right-hand side of (4.10) is
insignificant compared to the second term and can be ignored. If we do ignore it, we
get
or
• Rigorous bounds. To get a completely rigorous bound, we use the following result,
whose proof is left as an exercise.
If further
then
where
If we assume that nε_M ≤ 0.1, then the bounds (4.11) can be written rigorously in
the form
The simplicity of the bounds (4.13) more than justifies the slight overestimate of
the error that results from using the adjusted rounding unit. For this reason we make
the following assumption.
In all rounding-error analyses it is tacitly assumed that the size of the problem is
such that approximate bounds of the form |η| ⪅ nε_M can be rigorously replaced
by |η| ≤ nε'_M, where ε'_M is the adjusted rounding unit defined by (4.12).
Backward stability
The expression
the problem.) On the other hand, if f varies greatly with small perturbations in x we
will say that it is ill conditioned. Examples of well- and ill-conditioned functions are
illustrated in Figure 4.1. The circles on the left represent a range of values of x; the
ellipses on the right represent the range of values of f(x). The larger ellipse on the
right represents the ill-conditioning of f.
An algorithm to solve the computational problem represented by the function /
can be regarded as another function g that approximates /. Suppose that the algorithm
g is backward stable and is applied to an input x. Then g(x) = f(y) for some y near
x. If the problem is well conditioned, then f(x) must be near f(y), and the computed
solution g(x) is accurate. On the other hand, if the problem is ill conditioned, then
the computed result will generally be inaccurate. But provided the errors in the data
are larger than the backward error, the answer will lie within the region of uncertainty
caused by the errors in the data. These two situations are illustrated in Figure 4.2.
Weak stability
We have seen that a backward stable algorithm solves well-conditioned problems ac-
curately. But not all algorithms that solve well-conditioned problems accurately are
backward stable. For example, an algorithm might produce a solution that is as ac-
curate as one produced by a stable algorithm, but the solution does not come from a
slight perturbation of the input. This situation, which is called weak stability, is illus-
trated in Figure 4.3. The ellipse on the right represents the values of / corresponding
to the circle on the left. The large circle on the right represents the values returned by
a weakly stable algorithm. Since the radius of the circle is the size of the major axis of
the ellipse, the algorithm returns a value that is no less accurate than one returned by
a stable algorithm. But if the value does not lie in the ellipse, it does not correspond
to a data point in the circle on the left, i.e., a point near the input.
Condition numbers
As informative as a backward rounding-error analysis of an algorithm is, it does not tell
us what accuracy we can expect in the computed solution. In the notation introduced
above, the backward error analysis only insures that there is a y near x such that g(x) =
f(y). But it does not tell us how far f(x) is from f(y).
In fact, the problem of accuracy can be cast in general terms that have nothing to
do with rounding error.
Given a function f and two arguments x and y, bound the distance between f(x)
and f(y) in terms of the distance between x and y.
Resolving such problems is the subject of the mathematical discipline of perturbation
theory. The perturbation that gives rise to y from x can have any source, which need
not be rounding error. For this reason we will first treat the general problem of the
perturbation of a sum of numbers and then reintroduce rounding error.
We are concerned with perturbations of the sum
The right-hand side of (4.17) is the relative error in s. The number ε bounds the
relative error in the x's. Thus κ is a factor that mediates the passage from a bound on
the perturbation of the arguments of a function to a bound on the perturbation induced
in the function itself. Such a number is called a condition number.
Just as a backward rounding-error analysis distinguishes between satisfactory and
unsatisfactory algorithms, condition numbers distinguish between easy and hard prob-
lems. For our problem the condition number is never less than one. It is equal to one
when the absolute value of the sum is equal to the sum of the absolute values, some-
thing that happens whenever all the x's have the same sign. On the other hand, it is
large when the sum is small compared to the x's. Thus the condition number not only
bounds the error, but it provides insight into what makes a sum hard to compute.
More simply,
Let us look more closely at this bound and the way in which it was derived.
In popular accounts, the accumulation of rounding error is often blamed for the
failure of an algorithm. Here the accumulation of rounding error is represented by the
factor nε_M, which grows slowly. For example, if κ is one, we cannot lose eight digits
of accuracy unless n is greater than 100 million. Thus, even for large n, the condition
number can be more influential than the accumulation of rounding error. In fact, a
single rounding error may render an ill-conditioned problem inaccurate, as we shall
see later.
The bound itself is almost invariably an overestimate. In the first place, it is de-
rived by replacing bounds like |η_i| ⪅ (n-i)ε'_M with the uniform bound |η_i| ≤ nε'_M,
which exaggerates the effects of the terms added last. In fact, if we were to arrange
the terms so that x_1 ≤ x_2 ≤ ··· ≤ x_n, then the larger x's will combine with the
smaller η's to make the final bound an even greater overestimate.
There is another factor tending to make the bound too large. Recall that each
quantity 1 + η_i is a product of factors of the form 1 + ε_j [see (4.8)]. Multiplying
this relation out and keeping only first-order terms in ε_M, we find that η_i is approxi-
mately the sum of the corresponding ε's. Now the worst-case bound on |η_i| is about
(n-1)ε_M. But if we are rounding, we can expect the ε's to vary in sign. With negative
ε's cancelling positive ε's, the sum can be much smaller than the worst-case bound.
In fact, on statistical grounds, we can expect it to behave more like √n·ε_M. Note that
if all the x_i are positive, and the floating-point system in question chops instead of
rounding, all the ε's will be negative, and we will
see a growth proportional to n.
If the bound is an overestimate, what good is it? The above discussion provides
three answers. First, even as it stands, the bound shows that rounding errors accu-
mulate slowly. Second, by looking more closely at the derivation of the bound, we
discovered that arranging the x's in ascending order of magnitude tends to make the
bound a greater overestimate. Since the bound is fixed, it can only become worse if
actual error becomes smaller. This suggests the following rule of thumb.
Summing a set of numbers in increasing order of magnitude tends to
diminish the effects of rounding error.
Finally, an even deeper examination of the derivation shows a fundamental difficulty
with chopped arithmetic—if the numbers are all of the same sign, the chopping errors
are all in the same direction and accumulate faster.
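The rule of thumb is easy to test. The sketch below sums the same positive terms in both orders using single precision (NumPy's float32) so that the effect is visible against a double-precision reference; the exact errors depend on n and on the machine.

import numpy as np

n = 100000
terms = 1.0 / np.arange(1, n + 1, dtype=np.float64)**2
reference = terms.sum()                      # accurate double-precision sum

def float32_running_sum(xs):
    s = np.float32(0.0)
    for t in xs:
        s = np.float32(s + np.float32(t))
    return float(s)

increasing = float32_running_sum(terms[::-1])   # smallest terms first
decreasing = float32_running_sum(terms)         # largest terms first
print(abs(increasing - reference), abs(decreasing - reference))
# Summing in increasing order of magnitude typically gives the smaller error.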
4.4. CANCELLATION
The failure of a computation is often signaled by the cancellation of significant figures
as two nearly equal numbers are subtracted. However, it is seldom the cause of the
failure. To see why, consider the sum

       472635.0000
    +      27.5013
    -  472630.0000
    =      32.5013                                                   (4.21)
Consequently, we should expect to see a loss of about four digits in attempting to com-
pute the sum.
Our expectations are realized. If we compute the sum in six-digit, decimal arith-
metic, we get first

    fl(472635 + 27.5013) = 472663

and then

    fl(472663 - 472630) = 33.0000,                                   (4.22)

which has only about two correct digits. On the other hand, suppose we perform the
cancelling subtraction first:

    fl(472635 - 472630) = 5.00000

and

    fl(5.00000 + 27.5013) = 32.5013.
The cancellation in the first subtraction is every bit as catastrophic as the cancellation
in (4.22). Yet the answer is exact.
There is no single paradigm for explaining the effects of cancellation. But it is fre-
quently useful to regard cancellation as revealing a loss of information that occurred
earlier in the computation—or if not in the computation, then when the input for the
computation was generated. In the example above, the number 27.5013 in (4.21) could
be replaced by any number in the interval [27.5000,28.4999] and the computed sum
would be unaltered. The addition has destroyed all information about the last four dig-
its of the number 27.5013. The cancellation, without introducing any errors of its own,
informs us of this fact.
Cancellation often occurs when a stable algorithm is applied to an ill-condition-
ed problem. In this case, there is little to be done, since the difficulty is intrinsic to
the problem. But cancellation can also occur when an unstable algorithm is applied
to a well-conditioned problem. In this case, it is useful to examine the computation to
see where information has been lost. The exercise may result in a modification of the
algorithm that makes it stable.
Overflow
There are two reasonable actions to take when overflow occurs.
1. Stop the calculation.
2. Return a number representing a machine infinity and continue the calcula-
tion.
For underflow there are three options.
1. Stop the calculation.
2. Return zero.
3. Return an unnormalized number or zero.
The following table illustrates the contrast between the last two options for four-digit
decimal arithmetic with exponent range [-99, 99].
Avoiding overflows
The two ways of treating overflows have undesirable aspects. The first generates no
answer, and the second generates one that is not useful except in specialized applica-
tions. Thus overflows should be avoided if at all possible. The first option for under-
flow is likewise undesirable; however, the second and third options may give useful
results if the calculation is continued. The reason is that in many cases scaling to avoid
overflows insures that underflows are harmless.
To see how this comes about, consider the problem of computing c = √(a² + b²). If
we set
    s = |a| + |b|,
then the numbers a/s and b/s are at most one in magnitude, and their squares cannot
overflow. Thus we can compute c according to the formula
    c = s·sqrt((a/s)² + (b/s)²).                                     (4.26)
The following algorithm computes √(a² + b²) with scaling that insures that overflows
cannot occur and underflows are harmless.
1. Euclid(a, b)
2.    s = |a| + |b|
3.    if (s = 0)
4.       return 0    ! Zero is a special case.
5.    else
6.       return s*sqrt((a/s)² + (b/s)²)
7.    end if
8. end Euclid
Moreover, this scaling renders underflows harmless. To see this, suppose that the
number a has the larger magnitude. Then the magnitude of a/s must be greater than
0.5. Now if b/s underflows and is set to zero, the formula (4.26) gives the answer
|a|. But in this case |a| is the correctly rounded answer. For if b/s underflows, we
must have |b/a| ⪅ 10^(-100), and hence s ≅ |a|. Consequently, by the Taylor series
expansion √(1 + x) = 1 + 0.5x + ···, we have

    c = |a|·√(1 + (b/a)²) ≅ |a|·(1 + 0.5·10^(-200)).

The factor on the right represents a relative error of about 10^(-200). Since ε_M =
10^(-10), |a| is the correctly rounded value of c.
We summarize this technique in Algorithm 4.1 (which we have already seen in
§1.2).
Many matrix algorithms can be scaled in such a way that overflows do not occur
and underflows are not a problem. As in Algorithm 4.1, there is a price to be paid. But
for most matrix algorithms the price is small compared to the total computation when
n is large enough.
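The scaled computation translates directly into Python; the transcription below is a sketch of the Euclid fragment above (Python's library routine math.hypot performs the same computation with additional care).

import math

def euclid(a, b):
    # sqrt(a**2 + b**2) with scaling so that the squares cannot overflow.
    s = abs(a) + abs(b)
    if s == 0:
        return 0.0                 # zero is a special case
    return s * math.sqrt((a / s)**2 + (b / s)**2)

print(euclid(3e200, 4e200))        # 5e+200; forming a*a + b*b directly would overflow
print(math.hypot(3e200, 4e200))    # library routine, for comparison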
In addition, the standard specifies system interrupts on exceptions and status flags
to indicate the nature of the interrupt. These can be used to take special action, e.g.,
stop the calculation on overflow.
2.00000
1.99999
agree to no figures, yet they have a relative error of about 10^(-6). We need to add some-
thing like: If two digits differ by one and the larger is followed by a sequence of nines
while the smaller is followed by a sequence of zeros, then the digits and their trailers
are in agreement.
    1.000000
  - 0.999999
    0.000001
and normalize the result to the correct answer: .100000 · 10^(-5). However, if the com-
puter carries six digits, the trailing 9 will be lost during the alignment. The resulting
computation will proceed as follows

    1.00000
  - 0.99999
    0.00001
giving a normalized answer of .100000 · 10^(-4). In this case, the computed answer has
a relative error of ten!
The high relative error is due to the absence of a guard digit to preserve essential
information in the course of the computation. Although the absence of a guard digit
does not affect the grosser aspects of matrix calculations, it makes certain fine ma-
neuvers difficult or impossible. At one time algorithmists had no choice but to work
around computers without guard digits: there were too many of them to ignore. But
as the number of such computers has declined, people have become less tolerant of
those that remain, and the present consensus is that anything you can do with a stan-
dard floating-point arithmetic is legitimate.
Stability
Stability is an overworked word in numerical analysis. As used by lay people it usually
means something imprecise like, "This algorithm doesn't bite." The professionals, on
the other hand, have given it a number of precise but inconsistent meanings. The sta-
bility of a method for solving ordinary differential equations is very different from the
stability of an iterative method for solving linear equations.
It is not clear just when the term stability in dense matrix computations acquired
its present meaning of backward stability. The word does not appear in the index of
Rounding Errors in Algebraic Processes [345, 1963] or of The Algebraic Eigenvalue
Problem [346,1965]. Yet by 1971 Wilkinson [348] was using it in the current sense.
The meaning of backward stability can vary according to the measure of nearness
used to define it. The stability of a computed sum might be called relative, component-
wise, backward stability because small relative errors are thrown back on the individ-
ual components of the input. Many classical results on stability are cast in terms of
norms, which tend to smear out the error across components. For more see §3, Chap-
ter 3.
The term "weak stability" was first published by Bunch [52, 1987], although the
phrase had been floating around for some time. The first significant example of a weak
stability result was Bjorck's analysis of the modified Gram-Schmidt algorithm for solv-
ing least squares problems [37]. It should be noted that a weakly stable algorithm may
be satisfactory for a single application but fail when it is iterated [307].
Condition numbers
The notion of a condition number for matrices was introduced by Turing [321,1948].
In his own words:
When we come to make estimates of errors in matrix processes we shall find that
the chief factor limiting the accuracy that can be obtained is "ill-conditioning"
of the matrices involved. The expression "ill-conditioned" is sometimes used
merely as a term of abuse applicable to matrices or equations, but seems most
often to carry a meaning somewhat similar to that defined below.
Cancellation
Since cancellation often accompanies numerical disasters, it is tempting to conclude
that a cancellation-free calculation is essentially error free. See [177, Ch. 1] for coun-
terexamples.
To most people catastrophic cancellation means the cancellation of a large number
of digits. Goldberg [147] defines it to be the cancellation of numbers that have errors
in them, implying that cancellation of a single bit is catastrophic unless the operands
are known exactly.
Exponent exceptions
If the scale factor s in the algorithm Euclid is replaced by max{|a|, |b|}, the results
may be less accurate on a hexadecimal machine. The reason is that the number
(a/s)^2 + (b/s)^2 is a little bit greater than one, so that the leading three bits in its representation are zero.
I discovered this fact after two days of trying to figure out why an algorithm I had coded
consistently returned answers about a decimal digit less accurate than the algorithm it
was meant to replace. Such are the minutiae of computer arithmetic.
3
GAUSSIAN ELIMINATION
During a convivial dinner at the home of Iain Duff, the conversation turned to the fol-
lowing question. Suppose you know that an imminent catastrophe will destroy all the
books in the world—except for one which you get to choose. What is your choice
and why? There are, of course, as many answers to that question as there are people,
and so we had a lively evening.
A similar question can be asked about matrix computations. Suppose that all ma-
trix algorithms save one were to disappear. Which would you choose to survive? Now
algorithms are not books, and I imagine a group of experts would quickly agree that
they could not do without the ability to solve linear equations. Their algorithm of
choice would naturally be Gaussian elimination—the most versatile of all matrix al-
gorithms. Gaussian elimination is an algorithm that computes a matrix decomposi-
tion— in this case the factorization of a matrix A into the product LU of a lower tri-
angular matrix L and an upper triangular matrix U. The value of having a matrix de-
composition is that it can often be put to more than one use. For example, the LU
decomposition can be used as follows to solve the linear system Ax = b. If we write
the system in the form LUx = b, then Ux = L^{-1}b, and we can generate x by the
following algorithm.
1. Solve the system Ly = b
2. Solve the system Ux = y
But the decomposition can also be used to solve the system A^T x = b as follows.
1. Solve the system U^T y = b
2. Solve the system L^T x = y
It is this adaptability that makes the decompositional approach the keystone of dense
matrix computations.
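As a rough illustration of the two uses just described, the following NumPy/SciPy sketch builds a matrix from known triangular factors and reuses them for both Ax = b and A^T x = b. The library calls and the well-conditioned test matrix are assumptions of this sketch, not something taken from the text.

import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(0)
n = 5
L = np.tril(rng.standard_normal((n, n)), -1) + np.eye(n)   # unit lower triangular
U = np.triu(rng.standard_normal((n, n))) + n * np.eye(n)   # upper triangular
A = L @ U                                                   # A = LU by construction
b = rng.standard_normal(n)

# Solve Ax = b:  Ly = b, then Ux = y.
y = solve_triangular(L, b, lower=True, unit_diagonal=True)
x = solve_triangular(U, y, lower=False)
print(np.allclose(A @ x, b))

# Solve A^T x = b:  U^T y = b, then L^T x = y.
y = solve_triangular(U.T, b, lower=True)
x = solve_triangular(L.T, y, lower=False, unit_diagonal=True)
print(np.allclose(A.T @ x, b))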
This chapter consists of four sections. The first is devoted to the ins and outs of
Gaussian elimination when it is applied to a general matrix. However, a major virtue
of Gaussian elimination is its ability to adapt itself to special matrix structures. For this
reason the second section treats Gaussian elimination applied to a variety of matrices.
The third section treats the perturbation theory of linear systems. Although this sec-
tion is logically independent of Gaussian elimination, it leads naturally into the fourth
section, which discusses the effects of rounding error on the algorithm.
For the most part, Gaussian elimination is used with real, square matrices. Since
the extension of the theory and algorithms to complex or rectangular matrices is trivial,
we will make the following expository simplification.
Throughout this chapter A will be a real matrix of order n.
1. GAUSSIAN ELIMINATION
This section is concerned with Gaussian elimination for dense matrices that have no
special structure. Although the basic algorithm is simple, it can be derived and im-
plemented in many ways, each representing a different aspect of the algorithm. In the
first subsection we consider the basic algorithm in four forms, each of which has its
own computational consequences. The next subsection is devoted to a detailed alge-
braic analysis of the algorithm, an analysis which leads naturally to the topics of block
elimination and Schur complements.
The basic algorithm can fail with a division by zero, and in §1.3 we show how to
use row and column interchanges to remove the difficulty—a device called pivoting.
In § 1.4 we present a number of common variants of Gaussian elimination. Finally in
§ 1.5 we show how to apply the results of Gaussian elimination to the solution of linear
systems and the inversion of matrices.
Gauss's elimination
Gauss originally derived his algorithm as a sequential elimination of variables in a
quadratic form. Here we eliminate variables in a system of linear equations, but the
process has the flavor of Gauss's original derivation.
and substitute this value into the last three equations, we obtain the system
where
We may repeat the process, solving the second equation in (1.2) for x_2 and substituting
the results in the last two equations. The result is the system
where
Finally, if we solve the third equation of the system (1.3) for x3 and substitute it into
the fourth, we obtain the system
where
This last system is upper triangular and can be solved for the x_i by any of the tech-
niques described in §2, Chapter 2.
Then it is easily verified that after one step of Gaussian elimination the system has the
form
(The numbers ℓ_{i1} will be called multipliers.) If for i = 2, 3, 4 we subtract ℓ_{i1} times the
first row of the system from the ith row of the system, the result is the system (1.2).
Another way of looking at this process is that the multipliers ℓ_{i1} are calculated in such
a way that when ℓ_{i1} times the first row is subtracted from the ith row the coefficient of
x_1 vanishes.
To continue the process, we subtract multiples of the second row of the reduced
system (1.2) from the remaining rows to make the coefficient of x_2 zero. And so on.
This is an extremely productive way of viewing Gaussian elimination. If, for ex-
ample, a coefficient of x_1 is already zero, there is no need to perform the correspond-
ing row operation. This fact allows us to derive efficient algorithms for matrices with
many zero elements. We will return to this view of Gaussian elimination in §2, where
we consider tridiagonal, Hessenberg, and band matrices.
where the numbers ℓ_{i1} are the multipliers defined by (1.5). Then it is easily verified
that
Thus the matrix L_1 A is just the matrix of the system (1.2) obtained after one step of
Gaussian elimination.
The process can be continued by setting
where
It follows that
where
Then
Hence:
Gaussian elimination computes the LU DECOMPOSITION
Equivalently,
The first two equations say that the first row of U is the same as the first row of A.
The third equation, written in the form ℓ_{21} = a_{11}^{-1} a_{21}, says that the first column of L
consists of the first set of multipliers. Finally, the fourth equation, written in the form
says that the product L_{22}U_{22} is the Schur complement of a_{11}. In other words, to com-
pute the LU decomposition of A:
1. set the first row of U equal to the first row of A
2. compute the multipliers ℓ_{i1} and store them in the first
   column of L
3. apply this process recursively to the Schur complement
   of a_{11}
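The three steps above translate almost directly into code. The following recursive Python sketch is only an illustration: it assumes the leading principal submatrices are nonsingular and makes no claim to efficiency, and the function name is invented for the example.

import numpy as np

def lu_recursive(A):
    """Return (L, U) with A = LU, following the three steps above."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    L, U = np.eye(n), np.zeros((n, n))
    U[0, :] = A[0, :]                              # 1. first row of U = first row of A
    if n == 1:
        return L, U
    L[1:, 0] = A[1:, 0] / A[0, 0]                  # 2. multipliers form the first column of L
    S = A[1:, 1:] - np.outer(L[1:, 0], A[0, 1:])   #    Schur complement of a_11
    L[1:, 1:], U[1:, 1:] = lu_recursive(S)         # 3. recur on the Schur complement
    return L, U

A = np.array([[4., 2., 1.], [2., 5., 3.], [1., 3., 6.]])
L, U = lu_recursive(A)
print(np.allclose(L @ U, A))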
The technique of partitioning a decomposition to get an algorithm is widely ap-
plicable— for example, it can be used to derive the Gram-Schmidt algorithm for or-
thogonalizing the columns of a matrix (§1.4, Chapter 4). By varying the partitioning
one can obtain variants of the algorithm in question, and we will exploit this fact ex-
tensively in §1.4.
The algorithm
In presenting classical Gaussian elimination, we will regard it as a method for comput-
ing the LU decomposition of A [see (1.8)]. In most implementations of the algorithm,
the L- and U-factors overwrite the matrix A in its array. Specifically, at the first step
the first row of U, which is identical with the first row of A, is already in place. The el-
ements of ℓ_{21} = a_{11}^{-1}a_{21} can overwrite a_{21}. (We do not need to store the element ℓ_{11},
since it is known to be one.) Symbolically, we can represent the first step as follows:
Hence:
The operation count for Algorithm 1.1 is approximately n³/3 flam.
The Schur complement plays an important role in matrix computations and is well
worth studying in its own right. But to keep things focused, we will only establish
a single result that we need here. We will return to Schur complements later in this
subsection.
The first factor on the right-hand side is nonsingular, because it is lower triangular with
ones on its diagonal. Hence A is nonsingular if and only if the second factor is non-
singular. This factor is block upper triangular and is therefore nonsingular if and only
if its diagonal blocks are nonsingular. But by hypothesis the diagonal block AH is
nonsingular. Hence, A is nonsingular if and only if the second diagonal block S is
nonsingular. •
We are now in a position to establish conditions under which Gaussian elimination
goes to completion.
Theorem 1.3. A necessary and sufficient condition for Algorithm 1.1 to go to comple-
tion is that the leading principal submatrices of A of order 1, ..., n−1 be nonsingular.
Proof. The proof of sufficiency is by induction. For n = 1 the algorithm does nothing,
which amounts to setting L = 1 and U = a_{11}.
Now assume the assertion is true of all matrices of order n−1. Let A be partitioned
in the form
Since a_{11} ≠ 0, the first step of Algorithm 1.1 can be performed. After this step the
matrix assumes the form
where S is the Schur complement of a_{11}. Since the remaining steps of the algorithm
are performed on S, by the induction hypothesis it is sufficient to prove that the leading
principal submatrices of S of order 1, ..., n−2 are nonsingular.
Let A^{(k)} and S^{(k)} denote the leading principal submatrices of A and S of order k.
Then S^{(k)} is the Schur complement of a_{11} in A^{(k+1)}. (To see this, consider Algo-
rithm 1.1 restricted to A^{(k+1)}.) By hypothesis A^{(k+1)} is nonsingular for k = 1, ..., n−2.
Hence by Theorem 1.2, S^{(k)} is nonsingular for k = 1, ..., n−2. This completes
the proof of sufficiency.
For the necessity of the conditions, assume that the algorithm goes to comple-
tion. Then the result is an LU decomposition A = LU of A. Moreover, the pivots
for the elimination are the diagonal elements υ_{kk} of U. Since these elements must
be nonzero for k = 1, ..., n−1, the leading principal submatrices U^{(k)} are nonsingu-
lar for k = 1, ..., n−1. Since L has ones on its diagonal, the matrices L^{(k)} are also
nonsingular. Now by the triangularity of L and U,
The last comment raises an important question. Suppose we stop classical Gauss-
ian elimination after the completion of the kth step. What have we computed?
The easiest way to answer this question is to focus on an individual element and
ask what computations are performed on it. [For another example of this approach see
(2.8), Chapter 2]. Figure 1.1 shows that the elements a_{ij} are divided into two classes:
an upper class of a_{ij}'s (i ≤ j) destined to become u's and a lower class of a_{ij}'s (i > j)
destined to become ℓ's. The following programs show what happens to the members
of each class lying in the kth row or column.
Then
where the components of S are the partially processed elements of A (the a's in Fig-
ure 1.1). The correctness of this equation may be most easily seen by verifying it for
the case k = 3.
Equation (1.12) shows that we have computed a partial LU decomposition that
reproduces A in its first k rows and columns. It remains to determine what the matrix
S is. To do this, multiply out (1.12) to get
Theorem 1.4. Let Algorithm 1.1 be stopped at the end of the kth step, and let the con-
tents of the first k rows and columns of A be partitioned as in (1.11). If A is partitioned
conformally, then
where
This theorem shows that there are two ways of computing the Schur complement
S. The first is to compute a nested sequence of Schur complements, as in classical
Gaussian elimination. The second is to compute it in one step via (1.13). This fact is
also a consequence of a general theorem on nested Schur complements (Theorem 1.7),
which we will prove later in this subsection. But first some observations on LU decom-
positions are in order.
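The equivalence of the two routes can be checked numerically. In the following NumPy sketch the helper schur and the particular partition sizes are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(1)
n, k = 6, 3
A = rng.standard_normal((n, n)) + n * np.eye(n)       # well conditioned, for safety

def schur(M, j):
    """Schur complement of the leading j-by-j block of M."""
    return M[j:, j:] - M[j:, :j] @ np.linalg.solve(M[:j, :j], M[:j, j:])

# One step at a time (a nested sequence of order-one Schur complements) ...
S_nested = A.copy()
for _ in range(k):
    S_nested = schur(S_nested, 1)

# ... or in one shot, via the leading k-by-k block, as in (1.13).
S_direct = schur(A, k)
print(np.allclose(S_nested, S_direct))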
LU decompositions
In Theorem 1.3 we gave conditions under which such an LU decomposition exists. We
now turn to the uniqueness of the decomposition. To simplify the exposition, we will
assume that the matrix A is nonsingular.
In discussing uniqueness it is important to keep in mind that what we have been
calling an LU decomposition is just one of a continuum of factorizations of A into the
product of lower and upper triangular matrices. For if A = LU and D is a nonsingular
diagonal matrix, then
Now the matrix on the right-hand side of this equation is lower triangular and the ma-
trix on the left is upper triangular. Hence they are both diagonal. By the convention
(1.15) the two L-factors are unit lower triangular, and so is the product of one with the
inverse of the other. Hence the diagonal matrix is the identity, and the two factorizations
are identical.
It should be stressed that even when A is singular, it may still have a unique LU de-
composition. For example, if the leading principal submatrices of A of orders up to
n—1 are nonsingular, then by Theorem 1.3 A has an LU decomposition, and it is easy
to show that it is unique.
In general, however, a nonsingular matrix may fail to have an LU decomposition.
An example is the matrix
Block elimination
To introduce block elimination, we consider the following system:
whose solution is the vector of all ones. Because the first pivot is zero, we cannot solve
the system by eliminating the variables in their natural order. Equivalently, classical
Gaussian elimination fails on this system.
The usual cure for this problem is to eliminate variables in a different order. This
amounts to interchanging rows and columns of the matrix to bring a nonzero element
into the pivot position. We will treat this important technique—called pivoting—
more thoroughly in §1.3.
An alternative is to eliminate more than one variable at a time. For example, if the
first two equations are solved for ξ_1 and ξ_2 in terms of ξ_3 and the results are plugged
into the third equation, we get
This system can be solved for ξ_3 = 1 and then simultaneously for ξ_1 and ξ_2. This pro-
cess is an example of block elimination.
Because block elimination is important in many applications, in this section we
will give a brief sketch of its implementation and analysis. Fortunately, if the non-
commutativity of matrix multiplication is taken into account, the results for classical
Gaussian elimination carry over mutatis mutandis.
To fix our notation, let A be partitioned in the form
In other words:
1. The first (block) row of U in the block LU decomposition of A is the first
(block) row of A.
2. The first (block) column of L, excluding a leading identity matrix of order
n_1, is A_{*1}A_{11}^{-1}.
3. The rest of the block decomposition can be obtained by computing the block LU
decomposition of the Schur complement A_{**} − A_{*1}A_{11}^{-1}A_{1*}.
Algorithm 1.2 is an implementation of this scheme. The code parallels the code for
classical Gaussian elimination (Algorithm 1.1), the major difference being the use of
the indices lx and ux to keep track of the lower and upper limits of the current blocks.
Three comments.
• In the scalar case, the LU decomposition is normalized so that L is unit lower trian-
gular. Here the diagonal blocks of L are identity matrices. This means that the scalar
and block LU decompositions are not the same. However, the scalar decomposition,
if it exists, can be recovered from the block decomposition in the form (LD)(D^{-1}U),
where D is the block diagonal matrix formed from the L-factors of the diagonal blocks of U.
• Although we have written the algorithm in terms of inverses, in general we would
use some decomposition (e.g., a pivoted LU decomposition) to implement the compu-
tations. (See §1.5 for how this is done.)
• Surprisingly, the operation count for the algorithm does not depend on the size of
the blocks, provided LU decompositions are used to implement the effects of inverses
in the algorithm. In fact, the count is roughly n³/3, the same as for scalar Gaussian
elimination.
Let A be partitioned as in (1.16), where the diagonal blocks A_{ii} are of order n[i]. The
following algorithm overwrites A with its block LU decomposition
the blocking strategy illustrated in Figure 3.3, Chapter 2. For more see the notes and
references.
Not only is the code for block elimination quite similar to the code for classical
Gaussian elimination, but the natural generalizations of the analysis are valid. If for
k = 1, ..., m−1 the leading principal submatrices of A of order n_1 + ··· + n_k are
nonsingular, then the algorithm goes to completion. If the algorithm is stopped after
step k, then
where
Schur complements
Throughout this section we have worked piecemeal with Schur complements. We will
now give a systematic statement of their more important properties. We begin with
what we can derive from a 2x2 block LU decomposition. Many of the following re-
sults have already been established, and we leave the rest as an exercise. Note that
they do not depend on the choice of diagonal blocks in the block LU decomposition.
where L_{11} and U_{11} are nonsingular. Moreover, for any such decomposition
If in addition A is nonsingular, then so are L_{22}, U_{22}, and hence the Schur complement
A_{22} − A_{21}A_{11}^{-1}A_{12} = L_{22}U_{22}. If we partition
then
Let
1.3. PIVOTING
Algorithm 1.1 fails if the element A[k, k] in statement 3 is zero. A cure for this prob-
lem is to perform row and column interchanges to move a nonzero element from the
submatrix A[k:n, k:n] into A[k, k], a process known as pivoting. Although the idea of
pivoting is simple enough, it is not a trivial matter to decide which element to use as
a pivot. In this subsection we discuss some generalities about pivoting. More details
will be found at appropriate points in this work.
1. for k = 1 to n−1
2.    if (A[k:n, k:n] = 0) return fi
3.    Find indices p_k, q_k ≥ k such that A[p_k, q_k] ≠ 0
4.    A[k, 1:n] ↔ A[p_k, 1:n]
5.    A[1:n, k] ↔ A[1:n, q_k]
6.    A[k+1:n, k] = A[k+1:n, k]/A[k, k]
7.    A[k+1:n, k+1:n] = A[k+1:n, k+1:n] − A[k+1:n, k]*A[k, k+1:n]
8. end k
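A NumPy sketch in the spirit of the algorithm above follows. It chooses the largest element of the current Schur complement as the pivot, which is one admissible choice (the algorithm itself requires only a nonzero element); the function name and test matrix are assumptions of the sketch.

import numpy as np

def lu_complete_pivoting(A):
    """Gaussian elimination with row and column interchanges.
    Returns the packed factors and the row/column pivot index vectors."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    p, q = np.arange(n), np.arange(n)
    for k in range(n - 1):
        sub = np.abs(A[k:, k:])
        if not sub.any():
            break                                   # the Schur complement is zero
        i, j = np.unravel_index(np.argmax(sub), sub.shape)
        pk, qk = k + i, k + j
        A[[k, pk], :] = A[[pk, k], :]               # row interchange
        A[:, [k, qk]] = A[:, [qk, k]]               # column interchange
        p[[k, pk]] = p[[pk, k]]
        q[[k, qk]] = q[[qk, k]]
        A[k+1:, k] /= A[k, k]                       # multipliers
        A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])   # update the Schur complement
    return A, p, q

A0 = np.array([[0., 1., 2.], [1., 1., 1.], [2., 3., 1.]])
F, p, q = lu_complete_pivoting(A0)
L, U = np.tril(F, -1) + np.eye(3), np.triu(F)
print(np.allclose(L @ U, A0[np.ix_(p, q)]))         # PAQ = LU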
Theorem 1.8. In Algorithm 1.3 let P_k be the (k, p_k)-exchange matrix and let Q_k be
the (k, q_k)-exchange matrix. Then in the notation of Theorem 1.4,
where the matrix S is the Schur complement of the leading principal submatrix of order
k of P_k ··· P_1 A Q_1 ··· Q_k. In particular, if we set
Generalities on pivoting
A consequence of Theorem 1.8 is that pivoted Gaussian elimination is equivalent to
making interchanges in the original matrix and performing unpivoted Gaussian elim-
ination. For purposes of analysis, this is a useful result, since we may assume that
any interchanges have been made at the outset. In practice, however, we must know
something about the reduced matrix A[k:n, k:n] in order to choose a pivot. Sometimes
theory will guide us; but where it does not, our only recourse is to determine pivots on
the fly.
The process of selecting pivots has two aspects: where pivots come from and how
pivots are chosen. We will treat each in turn. Since the details of pivoting depend on
the algorithm in question and its application, the following discussion is necessarily
general — an overview of the territory.
• Where pivots come from. The most important restriction on choosing pivots is
that each candidate has to be completely reduced so that it is a member of the cur-
rent Schur complement. Since classical Gaussian elimination updates the entire Schur
complement at each stage, every element of A[k:n, k:n] is a candidate. However, other
variants of Gaussian elimination postpone the reduction of some of the elements of
A[k:n, k:n] and thus restrict the range of choice of pivots. For examples see Algo-
rithms 1.5,1.6, and 1.7.
The process of choosing pivots from the entire array A[k:n, k:n] is known as com-
plete pivoting. Its advantage is that it gives us the widest possible choice of pivots.
However, since the entire array must be searched to find a pivot, it adds a small O(n³)
overhead to unpivoted Gaussian elimination.
An alternative that does not add significant overhead is to choose the pivot element
from the column A[k:n, k], a process known as partial pivoting. Although partial piv-
oting restricts our selection, it can be done with more variants of Gaussian elimination
than complete pivoting. The alternative of selecting pivots from the row A[k, k:n] is
seldom done.
Schur complements in a symmetric matrix are symmetric. Consequently, Gaus-
sian elimination preserves symmetry and, with proper organization, can factor a sym-
metric matrix at half the usual cost. Unfortunately, pivoting destroys symmetry. The
exception is when pivots are chosen from the diagonal of A[k:n, k:n]. This process is
known as diagonal pivoting. Diagonal pivoting is also required to preserve the struc-
ture of other classes of matrices — most notably, M-matrices and diagonally dominant
matrices.
• How pivots are chosen. Although any nonzero pivot will be sufficient to advance
Gaussian elimination to the next stage, in practice some pivots will be better than oth-
ers. The definition of better, however, depends on what we expect from the algorithm.
The most common way of selecting a pivot from a set of candidates is to choose
the one that is largest in magnitude. The process is called pivoting for size. There are
two reasons to pivot for size.
The first reason is numerical stability. We shall see in §4 that Gaussian elimina-
tion is backward stable provided the elements of the array A do not grow too much in
the course of the algorithm. Pivoting for size tends to inhibit such growth. Complete
pivoting for size is unconditionally stable. Partial pivoting for size can be unstable,
but real-life examples are infrequent and unusual in structure.
The second reason is to determine rank. In Theorem 2.13, Chapter 1, we used
Gaussian elimination to establish the existence of a full-rank factorization of a matrix.
The algorithm corresponding to this proof is a version of Algorithm 1.3 that returns at
statement 2. For in that case, the current Schur complement is zero, and the first k−1
columns in the array A contain full-rank trapezoidal factors L_{k−1} and U_{k−1} such that
PAQ = L_{k−1}U_{k−1}. This suggests that Gaussian elimination can be used to determine
rank. For more see §2.4, Chapter 5.
Another way of choosing pivots is to preserve sparsity. A sparse matrix is one
whose elements are mostly zero. We can often take advantage of sparsity to save time
and memory in a matrix algorithm. However, most matrix algorithms tend to reduce
sparsity as they proceed. For example, if A_k denotes the current matrix in Gaussian
elimination and a_{ik}^{(k)} and a_{kj}^{(k)} are both nonzero, then
a_{ij}^{(k+1)} = a_{ij}^{(k)} − a_{ik}^{(k)} a_{kj}^{(k)} / a_{kk}^{(k)}
will in general be nonzero—always when a_{ij}^{(k)} = 0. This introduction of nonzero
elements in a sparse matrix is called fill-in.
Clearly the choice of pivot influences fill-in. For example, if all the elements in
the pivot row and column are nonzero, then Gaussian elimination will fill the current
submatrix with nonzero elements. Consequently, most algorithms for sparse matrices
use a pivoting strategy that reduces fill-in, a process called pivoting for sparsity. Un-
fortunately, pivoting for size and pivoting for sparsity can be at odds with one another,
so that one must compromise between stability and sparsity.
A word on nomenclature. The terms complete and partial pivoting are frequently
used to mean complete and partial pivoting for size. This usage is natural for dense
matrices, where pivoting for size is the norm. But other applications demand other
strategies. It therefore makes sense to reserve the words "complete" and "partial" to
describe where the pivots are found and to add qualifiers to indicate how pivots are
chosen.
Sherman's march
Figure 1.3 illustrates an algorithm we will call Sherman's march. Here (and in the other
variants) no Schur complement is computed, and the white area represents untouched
elements of the original matrix. A step of the algorithm proceeds from the LU decom-
position of a leading principal submatrix of A and computes the LU decomposition
of the leading principal submatrix of order one greater. Thus the algorithm proceeds
to the southeast through the matrix, just like Sherman's procession from Chattanooga,
Tennessee, to Savannah, Georgia.
The algorithm is easy to derive. Consider the LU decomposition of the leading
principal submatrix of A of order k in the following partitioned form (in this subsection
which is a triangular system that can be solved for u_{1k}. Computing the (2,1)-block,
we get
another triangular system. Finally, from the (2,2)-block we have ℓ_{k1}^T u_{1k} + υ_{kk} = α_{kk},
or
Algorithm 1.5 implements the bordering method described above. The triangular
systems (1.21) and (1.22) are solved by the BLAS xellib and xeuitb (see Figure 2.2,
Chapter 2). We begin the loop at k = 2, since A[1,1] already contains its own LU de-
composition. But with our conventions on inconsistent statements, we could equally
well have begun at k = 1.
At this point we should reemphasize that this algorithm, and the ones to follow,
are arithmetically identical with classical Gaussian elimination. If xellib and xeuitb are
coded in a natural way, Sherman's march and classical Gaussian elimination perform
exactly the same arithmetic operations on each element and for each element perform
the operations in the same order. In spite of this arithmetic equivalence, Algorithm 1.5
has two important drawbacks.
First, it does not allow pivoting for size. At the kth step, the Schur complement
of A_{11}, where one must look for pivots, has not been computed. For this reason the
algorithm is suitable only for matrices for which pivoting is not required.
Second, the work in the algorithm is concentrated in the solution of triangular sys-
tems. Although this does not change the operation counts, the severely sequential na-
ture of algorithms for solving triangular systems makes it difficult to get full efficiency
out of certain architectures.
Pickett's charge
Figure 1.4 illustrates the two versions of another variant of Gaussian elimination. It
is called Pickett's charge because the algorithm sweeps across the entire matrix like
Pickett's soldiers at Gettysburg. The charge can be to the east or to the south. We will
consider the eastern version.
To derive the algorithm, partition the first k columns of the LU decomposition of
A in the form
and assume that L_{11}, L_{k1}, and U_{11} have already been computed. Then on computing
the (1,2)-block, we get
which we can solve for u_{1k}. Computing the (2,2)-block, we get L_{k1}u_{1k} + υ_{kk}ℓ_{kk} =
a_{kk}, from which we have
After the right-hand side of this relation is computed, υ_{kk} is determined so that the first
component of ℓ_{kk} is equal to one.
It is possible to introduce partial pivoting into this algorithm. To see how, consider
the following picture.
It shows the state of the array A just after the computation of υ_{kk}ℓ_{kk}, which is indicated
by the lightly shaded column. Now by (1.24) this vector is equal to a_{kk} − L_{k1}u_{1k},
which is precisely the part of the Schur complement from which we would pick a pivot.
If we choose a pivot from this column and interchange the entire row of the array A
with the pivot row (as indicated by the arrows in the above picture), we are actually
performing three distinct operations.
1. In A[:, 1 :k] we interchange the two rows of the part of the L-factor that has
already been computed.
2. In A[:, k] we interchange the two elements of the current Schur complement.
3. In A[:, k+l:n] we interchange the two rows of A.
But these three operations are precisely the interchanges we make in Gaussian elimi-
nation with partial pivoting.
Combining these observations, we get Algorithm 1.6, in which the BLAS xellib is
used to solve (1.23). In pivoting we perform the interchanges on the entire array A, so
that the final result is an LU decomposition of the matrix A with its rows interchanged
as specified by the integers pk. The charge-to-the-south algorithm is analogous. How-
ever, if we want to pivot, we must perform column rather than row interchanges.
Crout's method
The Crout algorithm has the same pattern as classical Gaussian elimination, except
that the computation of the Schur complement is put off to the last possible moment
(Figure 1.5). To derive the algorithm, partition the first k columns of the LU decom-
position in the form
where L_{11}, L_{k1}, U_{11}, and u_{1k} are assumed known. Then
where υ_{kk} is determined so that the first component of ℓ_{kk} is one. As in Pickett's
charge, we can pivot for size at this point.
Now partition the first k rows of the factorization in the form
It follows that
Thus we arrive at Algorithm 1.7. Like classical Gaussian elimination, the Crout algo-
rithm is entirely free of triangular solves.
1. for k = 1 to n
2.    A[k:n, k] = A[k:n, k] − A[k:n, 1:k−1]*A[1:k−1, k]
3.    Determine p_k so that |A[p_k, k]| ≥ |A[i, k]| (i = k, ..., n)
4.    A[k, 1:n] ↔ A[p_k, 1:n]
5.    if (A[k, k] ≠ 0) A[k+1:n, k] = A[k+1:n, k]/A[k, k] fi
6.    A[k, k+1:n] = A[k, k+1:n] − A[k, 1:k−1]*A[1:k−1, k+1:n]
7. end for k
example, classical Gaussian elimination alters the (n, n)-element n−1 times as it up-
dates the Schur complement. The algorithms of this subsection can be coded so that
they alter it only once, when it becomes υ_{nn}. As we have seen in §3.3, Chapter 2, this
can be a considerable advantage on machines where writing to cache is more expen-
sive than reading from it.
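The following NumPy sketch carries out the Crout variant with partial pivoting for size, following the statements of Algorithm 1.7 above. The function name and the small test matrix are assumptions made for illustration only.

import numpy as np

def crout(A):
    """Crout's variant: delayed updates, partial pivoting for size.
    Returns the packed LU factors and the pivot indices p[k]."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    p = np.arange(n)
    for k in range(n):
        A[k:, k] -= A[k:, :k] @ A[:k, k]         # bring column k up to date
        pk = k + np.argmax(np.abs(A[k:, k]))     # pivot for size in that column
        A[[k, pk], :] = A[[pk, k], :]
        p[k] = pk
        if A[k, k] != 0.0:
            A[k+1:, k] /= A[k, k]                # multipliers
        A[k, k+1:] -= A[k, :k] @ A[:k, k+1:]     # bring row k of U up to date
    return A, p

A0 = np.array([[2., 1., 1.], [4., 3., 3.], [8., 7., 9.]])
F, p = crout(A0)
L, U = np.tril(F, -1) + np.eye(3), np.triu(F)
B = A0.copy()
for k in range(3):                               # apply the interchanges to A ...
    B[[k, p[k]], :] = B[[p[k], k], :]
print(np.allclose(L @ U, B))                     # ... then PA = LU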
Let the array A contain the LU decomposition of A computed with partial pivoting,
and let p_1, ..., p_{n−1} be the pivot indices. Then the following algorithm overwrites B
with the solution of the system AX = B.
1. for k = 1 to n−1
2.    B[k, :] ↔ B[p_k, :]
3. end for k
4. xellib(B, A, B)
5. xeuib(B, A, B)
If we define y = P_{n−1} ··· P_1 x, we can solve the system U^T L^T y = b and then inter-
change components to get x = P_1 ··· P_{n−1} y. The result is Algorithm 1.9, in which
we use the BLAS xeuitb and xellitb to solve transposed triangular systems.
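In SciPy, lu_factor and lu_solve play much the same role as Algorithms 1.8 and 1.9: one factorization, reused for AX = B and for A^T X = B. A small sketch follows; the random test data are of course only for illustration.

import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(2)
n, l = 5, 3
A = rng.standard_normal((n, n)) + n * np.eye(n)
B = rng.standard_normal((n, l))

lu, piv = lu_factor(A)                 # one factorization ...
X  = lu_solve((lu, piv), B)            # ... reused for AX = B   (cf. Algorithm 1.8)
Xt = lu_solve((lu, piv), B, trans=1)   # ... and for A^T X = B   (cf. Algorithm 1.9)
print(np.allclose(A @ X, B), np.allclose(A.T @ Xt, B))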
For all their small size, these algorithms have a lot to be said about them.
• The bulk of the work in these algorithms is concentrated in solving triangular sys-
tems. If B has only a single column, then two systems must be solved. Since it takes
about n²/2 flam to solve a triangular system, the algorithms take about n² flam. More
generally:
If B has l columns, the operation count for Algorithm 1.8 or Algorithm 1.9 is
ln² flam.
• The algorithms effectively compute A^{-1}B and A^{-T}B. The cost of these computa-
tions is the same as multiplying B by A^{-1} or by A^{-T}. Thus the algorithms represent
a reasonable alternative to the invert-and-multiply algorithm. We will return to this
point later.
• The algorithms overwrite the right-hand side B with the solution X. This conserves
storage—which can be substantial if B is large—at the cost of forcing the user to save
B whenever it is required later. An alternative would be to code, say, Algorithm 1.8
with the calling sequence linsolve(A, B, X), in which B is first copied to X and the
solution is returned in X. The invocation linsolve(A, B, B) would then be equivalent
to Algorithm 1.8.
• Algorithm 1.8 could be combined with one of our algorithms for computing an
LU decomposition in a single program. This approach is certainly easy on the naive
user, who then does not have to know that the solution of a linear system proceeds in
two distinct steps: factorization and solution. But this lack of knowledge is danger-
ous. For example if the user is unaware that on return the array A contains a valuable
factorization that can be reused, he or she is likely to recompute the factorization when
another task presents itself—e.g., a subsequent solution of A^T x = b.
• Ideally our two algorithms for using the LU decomposition should be supplemented
by two more: one to solve XA = B and another to solve XAT = B. Fortunately,
our triangular BLAS make the coding of such algorithms an elementary exercise.
Determinants
People who work in matrix computations are often asked for programs to compute
determinants. It frequently turns out that the requester wants the determinant in order
to solve linear systems—usually by Cramer's rule. There is a delicious irony in this
situation, for the best way to compute a determinant of a general matrix is to compute
it from its LU decomposition, which, as we have seen, can be used to solve the linear
system.
However, if a determinant is really needed, here is how to compute it. Since A =
P_1 ··· P_{n−1} LU,
Now det(L) = 1 because L is unit lower triangular, and det(U) = υ_{11} ··· υ_{nn}. More-
over, det(P_k) is 1 if P_k is the identity matrix and −1 otherwise. Thus the product of
the determinants of the exchanges is 1 if the number of proper interchanges is even and
−1 if the number is odd. It follows that
It should be noted that the formula (1.27) can easily underflow or overflow, even
when the elements of A are near one in magnitude. For example, if the υ_{ii} are all ten,
then the formula will overflow in IEEE single-precision arithmetic for n > 38. Thus
a program computing the determinant should return it in coded form. For example,
LINPACK returns numbers d and e such that det(A) = d·10^e.
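A Python sketch of the coded form follows. The helper coded_det and the normalization 1 ≤ |d| < 10 are assumptions of the sketch; SciPy's lu_factor supplies the pivoted LU decomposition.

import numpy as np
from scipy.linalg import lu_factor

def coded_det(A):
    """Determinant of A from its LU decomposition, returned as (d, e) with
    det(A) = d * 10**e, to avoid overflow and underflow."""
    lu, piv = lu_factor(A)
    sign = 1.0 if (piv != np.arange(len(piv))).sum() % 2 == 0 else -1.0
    d, e = sign, 0
    for u in np.diag(lu):                     # det(U) is the product of the u_kk
        d *= u
        while abs(d) >= 10.0:
            d /= 10.0; e += 1
        while d != 0.0 and abs(d) < 1.0:
            d *= 10.0; e -= 1
    return d, e

A = 10.0 * np.eye(50)                         # det = 10**50, which would overflow
print(coded_det(A))                           # in single precision; here (1.0, 50)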
Matrix inversion
Turning now to the use of the LU decomposition to compute matrix inverses, the algo-
rithm that comes first to mind mimics the proof of Theorem 3.20, Chapter 1, by using
Algorithm 1.8 to solve the systems
for the columns x_j of the inverse of A. If only a few of the columns of A^{-1} are re-
quired, this is a reasonable way to compute. However, if the entire inverse is needed,
we can economize on both storage and operations by generating the inverse in place
directly from the factors L and U. There are many ways to do this, of which the fol-
lowing is one.
As above, let us suppose that the LU decomposition of A has been computed with
partial pivoting. Then P_{n−1} ··· P_1 A = LU.
The indexing in this partition is to the northwest. It follows that υ_{kk}σ_{kk} = 1 and
S_{11}u_{1k} + υ_{kk}s_{1k} = 0. Equivalently,
These formulas for the kth column of S do not involve the first k−1 columns of U.
Thus the columns of S can be generated in their natural order, overwriting the corre-
sponding columns of U.
We will now show how to compute X = U^{-1}L^{-1}. As above, let S = U^{-1}, so
that XL = S. Partition this relation in the form
Then
Thus we can generate the columns of X in reverse order, each column overwriting the
corresponding column of L and U^{-1} in the array A.
After U^{-1}L^{-1} has been computed, we must perform the interchanges P_k in re-
verse order as in (1.28).
Algorithm 1.10 is an implementation of the method derived above. It is by far the
most involved algorithm we have seen so far.
• The product S_{11}u_{1k} in (1.29) is computed explicitly. In a quality implementation
the task would be done by a level-two BLAS.
• The complexity of this algorithm can be determined in the usual way. The first loop
on k requires the multiplication of a (k−1)-vector by a triangular matrix, which
requires about k²/2 flam. Consequently the total count for this loop is
The count for the second loop on k can be obtained in the same way. Adding the two
operation counts, we get
If we add this count to the n³/3 flam required to compute the LU decomposition in the
first place, we find:
It requires
Let the array A contain the LU decomposition of A computed with partial pivoting
and let p_1, ..., p_{n−1} be the pivoting indices. This algorithm overwrites the array A
with A^{-1}.
! Invert U.
1. for k = 1 to n
2.    A[k, k] = 1/A[k, k]
3.    for i = 1 to k−1
4.       A[i, k] = −A[k, k]*(A[i, i:k−1]*A[i:k−1, k])
5.    end for i
6. end for k
! Calculate U^{-1}L^{-1}.
7. for k = n−1 to 1 by −1
8.    temp = A[k+1:n, k]
9.    A[k+1:n, k] = 0
10.   A[1:n, k] = A[1:n, k] − A[1:n, k+1:n]*temp
11. end for k
The count (1.30) has important implications for the invert-and-multiply technique
for solving linear equations. Let B ∈ ℝ^{n×l}. To solve the system AX = B by invert-
and-multiply, we calculate A^{-1} and then compute X = A^{-1}B. The latter calculation
requires ln² flam, for a total of about (5/6)n³ + ln² flam. On the other hand, Algorithm 1.8
requires n³/3 flam to compute the LU decomposition of A, followed by ln² flam for
the algorithm itself. The total is n³/3 + ln² flam. The ratio of these counts is
The following table contains the values of ρ for various representative values of l/n.
l/n      ρ
0.00 2.5
0.25 1.9
0.50 1.6
0.75 1.5
1.00 1.3
2.00 1.2
oo 1.0
When l is small compared with n, solving directly is almost two and a half times faster
than inverting and multiplying. Even when l = n it is 30% faster. And of course ρ
is never less than one, so that the advantage, however small, is always to the direct
solution.
These ratios are a compelling reason for avoiding matrix inversion to solve linear
systems. In §4 we will show that the direct solution is not only faster but it is more
stable, another reason for avoiding matrix inversion.
"But," I hear someone protesting, "I really need the inverse," and I respond, "Are
you sure?" Most formulas involving inverses can be reduced to the solution of linear
systems. An important example is the computation of the bilinear form r = y^T A^{-1} x.
The following algorithm does the job.
1. Solve Au = x
2. r = y^T u
It is worth noting that if you need the (i, j)-element of A^{-1}, all you have to do is plug
x = e_j and y = e_i into (1.31).
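A NumPy sketch of (1.31) and of the remark about the (i, j)-element follows; the explicit inverse appears only to check the answers, and the test data are illustrative.

import numpy as np

rng = np.random.default_rng(3)
n = 6
A = rng.standard_normal((n, n)) + n * np.eye(n)
x, y = rng.standard_normal(n), rng.standard_normal(n)

u = np.linalg.solve(A, x)        # 1. solve Au = x
r = y @ u                        # 2. r = y^T u
print(np.isclose(r, y @ np.linalg.inv(A) @ x))   # same value, no inverse formed

# The (i, j)-element of the inverse, via x = e_j and y = e_i:
i, j = 2, 4
print(np.isclose(np.linalg.solve(A, np.eye(n)[:, j])[i], np.linalg.inv(A)[i, j]))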
Of course, there are applications in which the inverse is really needed. Perhaps the
most important example is when a researcher wants to scan the elements of the inverse
to get a feel for its structure. In such cases Algorithm 1.10 stands ready to serve.
Elementary matrix
The terminology "elementary lower triangular matrix" for matrices like (1.6) ultimate-
ly derives from the elementary row operations found in most introductory linear alge-
bra texts. The elementary operations are
The LU decomposition
Gauss, who worked with positive definite systems, gave a symmetric decomposition
that is more properly associated with the Cholesky decomposition. Jacobi [191,1857,
posthumous], factored a bilinear form f(x, y) in the form
in which the linear functions g_i and h_i depend only on the last n−i+1 components
of x and y. If A is the matrix corresponding to f, the coefficients of the h_i and g_i
form the columns and rows of an LU decomposition of A. The connection of Gaussian
elimination with a matrix factorization was first noted by Dwyer [111] in 1944—one
hundred and thirty-five years after Gauss published his algorithm.
Schur used the relation only to prove a theorem on determinants and did not otherwise
exploit it. The name Schur complement for the matrix A_{22} − A_{21}A_{11}^{-1}A_{12} is due to
Haynsworth [172].
Cottle [75] and Ouellette [248] give surveys of the Schur complement with histor-
ical material.
Pivoting
The technique of pivoting did not arise from Gaussian elimination, which was histor-
ically a method for solving positive definite systems and did not require it to avoid
division by zero. Instead the idea came from Chiò's method of pivotal condensation
for computing determinants [66, 1853]. In modern terminology, the idea is to choose
a nonzero element a_{ij} of A and compute its Schur complement S. Then det(A) =
(−1)^{i+j} a_{ij} det(S). Thus the determinant of A can be calculated by repeating the pro-
cedure recursively on S. The element a_{ij} was called the pivot element and was se-
lected to be nonzero. The practitioners of this method (e.g., see [6], [339]) seem not
to have realized that it is related to Gaussian elimination.
The terms "partial pivoting" and "complete pivoting" are due to Wilkinson [344].
him. Crout's contributions are substantial enough to attach his name to the variant
he published (though a case can be made for Dwyer). Sherman's march and Pickett's
charge are pure whimsy—they echo my leisure reading at the time (and they would
have been different had I been reading about the campaigns of Alexander, Caesar, or
Napoleon). Just for the record, Pickett charged to the east.
Matrix inversion
Algorithm 1.10 is what Higham calls Method B in his survey of four algorithms for
inverting matrices [177, §13.3]. The algorithms have essentially the same numerical
properties, so the choice between them must rest on other considerations—e.g., their
interaction with memory.
Most modern texts on numerical linear algebra stress the fact that matrix inverses
are seldom needed (e.g., [153, §3.4.11]). It is significant that although Gauss knew
how to invert systems of equations he devoted most of his energies to avoiding nu-
merical inversion [140, pp. 225-231]. However, it is important to keep things in per-
spective. If the matrix in question is well conditioned and easy to invert (an orthogonal
matrix is the prime example), then the invert-and-multiply may be faster and as stable
as computing a solution from the LU decomposition.
Augmented matrices
Let A = LU be the LU decomposition of A. Given any n×p matrix B, set C =
L^{-1}B. Then
Gauss-Jordan elimination
Gauss-Jordan elimination is a variant of Gaussian elimination in which all the ele-
ments in a column are eliminated at each stage. A typical reduction of a 4x4 matrix
would proceed as follows.
(Here the elements to be eliminated are given hats.) Thus the elimination reduces the
matrix to diagonal form. If the same operations are applied to the right-hand side of a
system, the resulting diagonal system is trivial to solve. Pivoting can be incorporated
into the algorithm, but the selection must be from the Schur complement to avoid fill-
ing in zeros already introduced. The method is not backward stable, but it is weakly
stable. For rounding error analyses see [262] and especially [177, §13.4], where fur-
ther references will be found.
With some care, the method can be arranged so that the inverse emerges in the
same array, and this has led to elegant code for inverting positive definite matrices
[22]. Combined with the expanded matrix approach, it has been used by statisticians to
move variables in and out of regression problems [156]. For more see §3.1, Chapter 4.
1. A is Hermitian,
2. x ≠ 0 ⟹ x^H A x > 0.
The requirement that A be Hermitian reduces to symmetry for real matrices. Some
people drop the symmetry requirement and call a real matrix positive definite if x ≠
0 ⟹ x^T A x > 0. We will avoid that usage in this work.
The simplest nontrivial example of a positive definite matrix is a diagonal matrix
with positive diagonal elements. In particular, the identity matrix is positive definite.
However, it is easy to generate more.
Theorem 2.2. Let A ∈ ℂ^{n×n} be positive definite, and let X ∈ ℂ^{n×p}. Then X^H A X
is positive semidefinite. It is positive definite if and only if X is of full column rank.
Proof. Let x ≠ 0 and let y = Xx. Then x^H(X^H A X)x = y^H A y ≥ 0, by the positive
definiteness of A. If X is of full column rank, then y = Xx ≠ 0, and by the positive
definiteness of A, y^H A y > 0. On the other hand if X is not of full column rank, there
is a nonzero vector x such that Xx = 0. For this particular x, x^H(X^H A X)x = 0. •
Any principal submatrix of A can be written in the form X^T A X, where the col-
umns of X are taken from the identity matrix (see §2.5, Chapter 1). Since any matrix
X so formed has full column rank, it follows that:
If P is a permutation matrix, then P has full rank. Hence P^T A P is positive def-
inite. A transformation of the form P^T A P is called a diagonal permutation because
it rearranges the diagonal elements of A. Hence:
We now turn to the properties of positive definite matrices. One of the most im-
portant is that they can be characterized succinctly in terms of their eigenvalues.
Theorem 2.3. A Hermitian matrix A is positive (semi)definite if and only if its eigen-
values are positive (nonnegative).
Proof. We will treat the definite case, leaving the semidefinite case as an exercise.
Let A = UΛU^H be the spectral decomposition of A (see Theorem 4.33, Chapter 1).
If the eigenvalues of A are positive, then Λ is positive definite, and by Theorem 2.2,
so is A. Conversely, if A is positive definite and Au = λu, with ‖u‖_2 = 1, then
λ = u^H A u > 0.
From the facts that the determinant of a matrix is the product of the eigenvalues
of the matrix and the eigenvalues of the inverse matrix are the inverses of the eigen-
values, we have the following corollary. Since the eigenvalues of A are positive, A is
nonsingular. Moreover A^{-1} = UΛ^{-1}U^H. This establishes the following corollary.
The fact that a positive definite matrix has positive eigenvalues implies that it also
has a positive definite square root.
Proof. Let A = UΛU^H be the spectral decomposition of A. Then the diagonal el-
ements λ_i of Λ are nonnegative. If we define Λ^{1/2} = diag(λ_1^{1/2}, ..., λ_n^{1/2}), then A^{1/2} =
UΛ^{1/2}U^H satisfies (2.1). It is clearly positive semidefinite. If A is positive definite,
then the numbers λ_i are positive and A^{1/2} is positive definite.
Uniqueness is established by a rather involved argument based on the fact that sub-
spaces spanned by eigenvectors corresponding to equal eigenvalues are unique. We
omit it here.
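A NumPy sketch of the construction in the proof follows; the random test matrix is an illustrative assumption (it is positive definite by Theorem 2.2).

import numpy as np

rng = np.random.default_rng(4)
n = 5
X = rng.standard_normal((n, n))
A = X @ X.T + np.eye(n)                   # positive definite by Theorem 2.2

lam, U = np.linalg.eigh(A)                # spectral decomposition A = U diag(lam) U^T
Ahalf = U @ np.diag(np.sqrt(lam)) @ U.T   # A^{1/2} = U Lambda^{1/2} U^T
print(np.allclose(Ahalf @ Ahalf, A))      # (A^{1/2})^2 = A
print(np.all(np.linalg.eigvalsh(Ahalf) > 0))   # and A^{1/2} is positive definite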
For computational purposes one of the most important facts about positive definite
matrices is that they have positive definite Schur complements.
Theorem 2.6. Let the positive (semi)definite matrix A be partitioned in the form
Proof. We will treat the positive definite case, leaving the semidefinite case as an exer-
cise. Since AH is positive definite, it is nonsingular, and hence its Schur complement
is well defined. Let x ≠ 0. Then by the positive definiteness of A and direct compu-
tation,
Given a positive definite matrix stored in the upper half of the array A, this algorithm
overwrites it with its Cholesky factor R.
1. for k = 1 to n
2.    xeuitb(A[1:k−1, k], A[1:k−1, 1:k−1], A[1:k−1, k])
3.    A[k, k] = sqrt(A[k, k] − A[1:k−1, k]^T*A[1:k−1, k])
4. end for k
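A Python sketch of this column-by-column computation follows. Here solve_triangular with trans='T' stands in for the xeuitb step; the function name and test matrix are assumptions of the sketch.

import numpy as np
from scipy.linalg import solve_triangular

def cholesky_upper(A):
    """Upper triangular Cholesky factor R with A = R^T R, computed column by column.
    A sketch only: it assumes A is positive definite and touches only the upper half."""
    n = A.shape[0]
    R = np.zeros((n, n))
    for k in range(n):
        if k > 0:
            # R[0:k, k] solves R[0:k, 0:k]^T r = A[0:k, k].
            R[:k, k] = solve_triangular(R[:k, :k], A[:k, k], trans='T')
        R[k, k] = np.sqrt(A[k, k] - R[:k, k] @ R[:k, k])
    return R

A = np.array([[4., 2., 2.], [2., 5., 3.], [2., 3., 6.]])
R = cholesky_upper(A)
print(np.allclose(R.T @ R, A), np.allclose(R, np.linalg.cholesky(A).T))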
But this last number is just the Schur complement of A_{11} and hence by Theorem 2.6
is positive. Thus ρ_{nn} is uniquely determined in the form
Hence:
which has no positive diagonal elements. The only way to obtain a positive pivot is to
move the element 4 or the element 15 into the (1,1)-position, a process that obviously
destroys symmetry.
In itself, this problem is not insurmountable. Since Schur complements in a sym-
metric matrix are symmetric, we could perform Gaussian elimination working with
only the upper part of the matrix. This procedure would have the desired operation
count. And we can even retain a sort of symmetry in the factorization. For if we mul-
tiply the first row of A by -1 and then perform one step of Gaussian elimination, we
obtain a reduction of the form
in which all potential pivots are zero. And even nonzero pivots, if they are sufficiently
small, will cause numerical instability.
Example 2.8. If we attempt to compute the Schur complement of the (1,1)-element
in the matrix
This matrix is exactly of rank one, even though the original matrix is nonsingular. An-
other way of viewing this disaster is to observe that (2.5) is the matrix that would have
resulted from exact computations on the singular matrix
Thus all information about the original elements in the trailing principal submatrix of
order two has been lost in the passage to (2.5).
The above example is our first hint that the generation of large elements in the
course of Gaussian elimination can cause numerical difficulties. We will take up this
point in detail in §4. Here we will use it to derive an algorithm for symmetric indefinite
systems.
The basic idea of the algorithm is to compute a block LDU decomposition in which
blocks of the diagonal matrix D are of order one or two. For example, although the
natural pivots in (2.4) are zero, the leading 2x2 principal submatrix is just a permu-
tation of the identity. If we use it as a "pivot" to compute a Schur complement, we
obtain a symmetric block decomposition of the form
Note that we have chosen the block diagonal factor so that the diagonal blocks of the
triangular factors are identity matrices of order two and one.
Unfortunately, it is not sufficient simply to increase the order of the pivot from one
to two whenever a pivot of order one is unsatisfactory. A pivot of order two can also be
small or—just as bad—be near a singular matrix. This means we must search for a
pair of diagonal elements to form the pivot. The search criterion cannot be very elabo-
rate if large overheads are to be avoided. We will describe two strategies for choosing
a satisfactory pivot of order one or two. By satisfactory, we mean that its use in the
elimination algorithm will not introduce unduly large elements, as happened in Exam-
ple 2.8. We will suppose that we are at the kth step of the reduction, so that a_{kk} is in
the pivot position. We use a tolerance
(The choice of the tolerance optimizes a bound on the growth of elements in the course
of the reduction.)
The first strategy—complete diagonal pivoting—begins by locating the maximal
off-diagonal element in the Schur complement, say a_{pq}, and the maximal element on
the diagonal of the Schur complement, say a_{rr}. The choice of pivots is made as fol-
lows.
1. If |a_{rr}| > α|a_{pq}|, use a_{rr} as a pivot.
2. Otherwise use the 2×2 principal matrix whose off-diagonal is a_{pq} (2.6)
as a pivot.
The justification of this strategy is that in the first case the largest multiplier cannot be
greater than α^{-1}. On the other hand, in the second case the pivot block must have an
inverse that is not too large.
This strategy involves an O(n 3 ) overhead to find the largest element in each of the
Schur complements. The effects of this overhead will depend on the implementation;
but small or large it will remain proportionally the same as n increases. The following
alternative—partial diagonal pivoting—has only an O(n²) overhead, which must
wash out as n increases.
The strategy begins by finding an index p > k such that
In other words, a_{pk} is the largest element in the kth row and column of the current
Schur complement, a_{kk} excluded. Likewise, a_{pq} is the largest element in the pth row
and column, a_{pp} excluded. Note that the determination of these indices requires only
O(n) operations.
The final pivot is determined in four stages, the first three yielding a pivot of or-
der one and the last a pivot of order two. We list the stages here, along with a brief
justification of the first three.
Once a pivot has been determined, the elimination step must be performed. We
will not give a detailed implementation, since it does not illustrate any new principles.
However, when the pivot is a 2 x 2 block, the computation of the Schur complement
can be done in various ways.
Suppose we have a 2×2 pivot and partition the active part of the matrix in the form
where B is the pivot block. Then we must compute the Schur complement
One way to proceed is to set C̄ = CB^{-1} and calculate S in the form
Since C and C̄ have but two columns, this is equivalent to subtracting two rank-one
matrices from D, which can be done by level-two BLAS.
A disadvantage of this approach is that extra working storage must be supplied for
C̄. We can get around this difficulty by computing the spectral decomposition
of the pivot block (§4.4, Chapter 1). If we define C̄ = CV, the Schur complement can
be written in the form
which again can be implemented in level-two BLAS. If we allow the matrix C̄ to over-
write C, we can recover C in the form C = C̄V^T. Because V is orthogonal, the pas-
sage to C̄ and back is stable.
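The following NumPy sketch compares the two ways of forming the Schur complement of a 2×2 pivot. The particular block sizes, the test data, and the variable names are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(5)
m = 4
B = np.array([[0., 3.], [3., 1.]])            # a 2x2 indefinite pivot block
C = rng.standard_normal((m, 2))
D = rng.standard_normal((m, m)); D = D + D.T

# Route 1: Cbar = C B^{-1}, then subtract Cbar C^T (two rank-one updates) from D.
Cbar = C @ np.linalg.inv(B)
S1 = D - Cbar @ C.T

# Route 2: spectral decomposition B = V diag(lam) V^T, Ctil = C V; C can be
# overwritten by Ctil and recovered afterwards as Ctil V^T.
lam, V = np.linalg.eigh(B)
Ctil = C @ V
S2 = D - (Ctil / lam) @ Ctil.T
print(np.allclose(S1, S2), np.allclose(Ctil @ V.T, C))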
The complexity of this algorithm can be determined by observing that the work
done by pivoting on a block is essentially twice the work done by pivoting on a scalar.
However, pivoting on a block advances the elimination by two steps. Hence the total
operation count is the same as if we had only used scalar pivoting, i.e., the same as for
the Cholesky algorithm. Hence:
The method of block diagonal pivoting takes
This count omits the time to find the pivot. As we have mentioned, the first strategy
(2.6) imposes an O(n³) overhead on the algorithm which may be negligible or sig-
nificant, depending on the implementation details. The second strategy (2.7) requires
only O(n²) work.
There are obvious variants of this algorithm for Hermitian matrices and complex
symmetric matrices. For Hermitian matrices rounding errors can cause small imagi-
nary components to appear on the diagonal, and they should be set to zero during each
elimination step. (In general, it is not good computational practice to enforce reality
in this manner. But for this particular algorithm no harm is done.)
A matrix is tridiagonal if it is zero below the first subdiagonal and above the first su-
perdiagonal. Tridiagonal matrices have the form
Hessenberg and tridiagonal matrices are special cases of band matrices, which we will
treat in the next subsection. But they arise in so many applications that they are worth
a separate treatment.
At the kth step of the algorithm, multiples of the kth row are subtracted from
the other rows to annihilate nonzero elements in the kth column. This process
produces an upper triangular matrix that is the U-factor of the original matrix.
The multipliers, placed in the position of the elements they annihilate, constitute
the nonzero elements of the L-factor.
Hessenberg matrices
Let us see how this version of the algorithm works for an upper Hessenberg matrix:
say
and subtract ℓ_{21} times the first row from the second. This gives the matrix
where
The rows below the second are completely unaltered and hence contribute nothing to
the work of the elimination.
If ℓ_{32} times the second row is subtracted from the third row, the result is
where
The general step should be clear at this point. At the end of the process we obtain
an LU decomposition of A in the form
This reduction fails if any pivot element h'_{kk} is zero. It then becomes necessary to
pivot. Complete pivoting destroys the Hessenberg structure. However, partial pivot-
ing preserves it. In fact, there are only two candidates for a pivot at the kth stage: h'_{kk}
and h_{k+1,k}. Since the rows containing these pivots have exactly the same structure
of nonzero elements, interchanging them leaves the matrix upper Hessenberg. Algo-
rithm 2.2 implements this scheme. Here are some comments.
• The reduction of a Hessenberg matrix is a comparatively inexpensive process. The
bulk of the work is concentrated in statement 5, which requires about n−k flam. In-
tegrating this count from 0 to n, we find that:
This should be compared with the O(n³) flam required for Gaussian elimination on a
full dense matrix.
Given an upper Hessenberg matrix H, this algorithm overwrites the upper part of the
array H with the U-factor of H. The subdiagonal elements of the array contain the
multipliers.
1. for k = 1 to n−1
2.    Choose a pivot index p_k ∈ {k, k+1}.
3.    H[k, k:n] ↔ H[p_k, k:n]
4.    H[k+1, k] = H[k+1, k]/H[k, k]
5.    H[k+1, k+1:n] = H[k+1, k+1:n] − H[k+1, k]*H[k, k+1:n]
6. end for k
• The algorithm is decidedly row oriented. However, it passes only once over the
entire matrix, so that the row orientation should make little difference unless the al-
gorithm is used repeatedly on a large matrix. However, it can be recoded in column-
oriented form. For the strategy see Algorithm 1.9, Chapter 4.
• The part of the array H below the first subdiagonal is not referenced and can be used
to store other information. Alternatively, the matrix H can be represented in packed
form.
• The treatment of the L-factor is quite different from its treatment in our previous
algorithms. In the latter, whenever we pivoted we made the same interchanges in L, so
that the final result was a factorization of a matrix A in which all the pivoting had been
done initially. If we attempt to do the same thing in Algorithm 2.2, the n−1 elements of
L would spread out through the lower part of the array H, destroying any information
contained there. Consequently, we leave the elements of L in the place where they
are generated. This treatment of L, incidentally, is typical of algorithms that take into
account the zero structure of the matrix. It has algorithmic implications for the way
the output is used to solve linear systems, which we will treat in a moment.
Most matrices with a structured arrangement of zero and nonzero elements can
be reduced in more than one way. For example, an upper Hessenberg matrix can be
reduced to upper triangular form by column operations beginning at the southeast cor-
ner. The result is a UL decomposition. Similarly, there are two strategies for reducing
a lower Hessenberg matrix. Reducing the matrix by column operations beginning at
the northwest corner gives an LU decomposition; reducing by row operations begin-
ning at the southeast corner gives a UL decomposition.
The use of the output of Algorithm 2.2 to solve linear systems introduces some-
thing new. Heretofore we were able to apply all the interchanges from pivoting to the
right-hand side at the very beginning. Since Algorithm 2.2 leaves the multipliers in
This algorithm uses the output of Algorithm 2.2 to solve the upper Hessenberg system
Hx = b.
1. for k = 1 to n−1
2.    b[k] ↔ b[p_k]
3.    b[k+1] = b[k+1] − H[k+1, k]*b[k]
4. end for k
5. xeuib(b, H, b)
place, we must interleave the pivoting with the reduction of the right-hand side. Al-
gorithm 2.3 shows how this is done with the output of Algorithm 2.2.
The bulk of the work in this algorithm is concentrated in xeuib (see Figure 2.2,
Chapter 2), which requires about ½n² flam. Thus a Hessenberg system requires a total
of about n² flam to solve—the same count as for multiplying a matrix by a vector.
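
As an illustration of how Algorithms 2.2 and 2.3 fit together, here is a minimal NumPy sketch (not taken from the text; the function names hess_lu and hess_solve and the zero-based indexing are my own). It factors an upper Hessenberg matrix with partial pivoting, storing the multipliers on the subdiagonal, and then solves Hx = b by interleaving the interchanges with the reduction of the right-hand side.

import numpy as np

def hess_lu(H):
    # Overwrite a copy of the upper Hessenberg matrix H with its U-factor;
    # the subdiagonal receives the multipliers.  Returns (H, piv).
    H = np.array(H, dtype=float)
    n = H.shape[0]
    piv = np.arange(n - 1)
    for k in range(n - 1):
        # Partial pivoting: the only candidates are rows k and k+1.
        p = k if abs(H[k, k]) >= abs(H[k + 1, k]) else k + 1
        piv[k] = p
        if p != k:
            H[[k, p], k:] = H[[p, k], k:]
        H[k + 1, k] /= H[k, k]                      # multiplier
        H[k + 1, k + 1:] -= H[k + 1, k] * H[k, k + 1:]
    return H, piv

def hess_solve(HF, piv, b):
    # Solve Hx = b from the output of hess_lu (cf. Algorithm 2.3).
    b = np.array(b, dtype=float)
    n = b.size
    for k in range(n - 1):
        b[[k, piv[k]]] = b[[piv[k], k]]             # interleaved interchange
        b[k + 1] -= HF[k + 1, k] * b[k]
    x = np.zeros(n)                                 # back substitution
    for k in range(n - 1, -1, -1):
        x[k] = (b[k] - HF[k, k + 1:] @ x[k + 1:]) / HF[k, k]
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 6
    H = np.triu(rng.standard_normal((n, n)), -1)    # upper Hessenberg
    b = rng.standard_normal(n)
    HF, piv = hess_lu(H)
    x = hess_solve(HF, piv, b)
    print(np.linalg.norm(H @ x - b))                # small residual
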
Tridiagonal matrices
The first thing to note about Gaussian elimination applied to tridiagonal matrices is
that pivoting does not preserve the tridiagonal form. However, partial pivoting does
preserve a band structure, so that the storage requirements for the algorithm are of the
same order as the storage required for the matrix itself.
We begin with the matrix
As with Hessenberg matrices, complete pivoting is out of the question, and the only
candidates for partial pivoting are the first and second elements in the first column. We
can represent the pivoting step by writing
where the primes indicate a possible change in value. The element d1 will be zero if
there was no interchange; otherwise it will have the value of b2 and b2 will be zero.
For the elimination itself, we compute a multiplier
and then subtract ℓ1 times the first row from the second to get the matrix
From this it is seen that each pivot step generates a new (possibly zero) element on
the second superdiagonal, and the subsequent elimination step annihilates an element
on the subdiagonal. The process continues until the matrix has been reduced to the
triangular form
Let the tridiagonal matrix T be represented in the form (2.8). The following algorithm
returns a pivoted LU decomposition of T. The three nonzero diagonals of the U-factor
are returned in the arrays a, b, and d. The array c contains the multipliers.
1. d(l:n-2) = 0
2. forfc = l t o n - l
3. Choose a pivot index pk {k,k+l}
4. if(pfc^fc)
5. a[k] «• c[k]; b[k] *-» a[*+l]
6. if(Jfc^n-l)d[Jfc]^6[A;+l]fi
7. end if
8. c[Jfc] = c[k]/a[k]
9. a[Jfe+l] = a[Ar+l] - c[k]*b[k]
10. if (k ^ n-1) b[k+l] = b[k+l] - c[k]*d[k] fl
11. end for k
The notation for representing T has been chosen with an implementation in mind.
Initially the matrix is contained in linear arrays a, b, and c. An additional array d is used
to contain the extra superdiagonal generated by the pivoting. At the end, the arrays a,
b, and d contain the U-factor. The multipliers can be stored in the array c as they are
generated. Algorithm 2.4 gives an implementation.
Much of the code in this algorithm is devoted to pivoting. Arithmetically, the inner
loop requires two additions and two multiplications for a total of 2n flam. However,
there are also n divisions. Hence:
Algorithm 2.4 requires about 2n flam and n divisions.
On many machines the divisions will account for most of the work.
Even with the inclusion of divisions in the count, we are not really through. The
pivoting carries an overhead that is proportional to the number of multiplications and
divisions. Even as simple a thing as the if statement at the end of the loop on k slows
down the algorithm. The algorithm is, in fact, a good example of why one should try
to keep conditional statements out of inner loops. In this case we might let k run from
1 to n−2 and put the code for k = n−1 outside the loop.
Algorithm 2.5 for solving a linear system is analogous to the algorithm for Hessen-
berg systems, except that the call to xeuib is replaced by an explicit back substitution.
This algorithm uses the output of Algorithm 2.4 to solve the tridiagonal system Tx =
y, overwriting y with the solution.
1. for k = 1 to n−1
2.    y[k] ↔ y[p_k]
3.    y[k+1] = y[k+1] − c[k]*y[k]
4. end for k
5. y[n] = y[n]/a[n]
6. y[n−1] = (y[n−1] − b[n−1]*y[n])/a[n−1]
7. for k = n−2 to 1 by −1
8.    y[k] = (y[k] − b[k]*y[k+1] − d[k]*y[k+2])/a[k]
9. end for k
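
The following NumPy sketch mirrors Algorithms 2.4 and 2.5 with zero-based indexing (the function names trid_lu and trid_solve are hypothetical). The diagonals are held in the arrays a, b, and c, the extra superdiagonal generated by pivoting in d, and the multipliers overwrite c.

import numpy as np

def trid_lu(a, b, c):
    # Pivoted LU of a tridiagonal matrix with diagonal a, superdiagonal b,
    # and subdiagonal c (cf. Algorithm 2.4).  Returns the three diagonals
    # of the U-factor (a, b, d), the multipliers (c), and the pivot indices.
    a, b, c = (np.array(v, dtype=float) for v in (a, b, c))
    n = a.size
    d = np.zeros(max(n - 2, 0))
    piv = np.zeros(n - 1, dtype=int)
    for k in range(n - 1):
        piv[k] = k if abs(a[k]) >= abs(c[k]) else k + 1
        if piv[k] != k:                    # interchange rows k and k+1
            a[k], c[k] = c[k], a[k]
            b[k], a[k + 1] = a[k + 1], b[k]
            if k != n - 2:
                d[k], b[k + 1] = b[k + 1], d[k]
        c[k] = c[k] / a[k]                 # multiplier
        a[k + 1] -= c[k] * b[k]
        if k != n - 2:
            b[k + 1] -= c[k] * d[k]
    return a, b, d, c, piv

def trid_solve(a, b, d, c, piv, y):
    # Solve Tx = y from the output of trid_lu (cf. Algorithm 2.5).
    y = np.array(y, dtype=float)
    n = y.size
    for k in range(n - 1):
        y[k], y[piv[k]] = y[piv[k]], y[k]
        y[k + 1] -= c[k] * y[k]
    y[n - 1] /= a[n - 1]
    y[n - 2] = (y[n - 2] - b[n - 2] * y[n - 1]) / a[n - 2]
    for k in range(n - 3, -1, -1):
        y[k] = (y[k] - b[k] * y[k + 1] - d[k] * y[k + 2]) / a[k]
    return y

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 8
    a0, b0, c0 = rng.standard_normal(n), rng.standard_normal(n - 1), rng.standard_normal(n - 1)
    T = np.diag(a0) + np.diag(b0, 1) + np.diag(c0, -1)
    y0 = rng.standard_normal(n)
    x = trid_solve(*trid_lu(a0, b0, c0), y0)
    print(np.linalg.norm(T @ x - y0))      # small residual
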
This algorithm takes a positive definite tridiagonal matrix whose diagonal is in the array
a and superdiagonal is in b and overwrites a and b with the diagonal and superdiagonal
of the Cholesky factor.
1. a[1] = √(a[1])
2. for k = 1 to n−1
3.    b[k] = b[k]/a[k]
4.    a[k+1] = √(a[k+1] − b[k]²)
5. end for k
When T is positive definite, the subdiagonal is the same as the superdiagonal, and
hence we can dispense with the array c. Moreover, pivoting is unnecessary, so that we
can also dispense with the array d. It would appear that an additional array is needed
to store the multipliers. However, if we compute the Cholesky decomposition T =
RTR, then R is bidiagonal, and its elements can overwrite the original elements in
a and b. These considerations lead to Algorithm 2.6 for reducing a positive definite
tridiagonal matrix.
An operation count for the algorithm is easy. The operation count for Algorithm 2.6 is
about n flam, plus n divisions and n square roots.
Depending on how square roots are implemented, this algorithm could be slower than
simply performing Gaussian elimination on the matrix and storing the multipliers, es-
pecially since no pivoting is done.
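
A hedged NumPy rendering of Algorithm 2.6 (the function name chol_trid is my own) shows how short the computation is; the driver uses the second-difference matrix, a standard positive definite tridiagonal example.

import numpy as np

def chol_trid(a, b):
    # Overwrite copies of the diagonal a and superdiagonal b of a positive
    # definite tridiagonal matrix T with the diagonal and superdiagonal of
    # its Cholesky factor R, where T = R^T R (cf. Algorithm 2.6).
    a = np.array(a, dtype=float)
    b = np.array(b, dtype=float)
    a[0] = np.sqrt(a[0])
    for k in range(a.size - 1):
        b[k] = b[k] / a[k]
        a[k + 1] = np.sqrt(a[k + 1] - b[k] ** 2)
    return a, b

if __name__ == "__main__":
    n = 6
    a0 = np.full(n, 2.0)               # second-difference matrix:
    b0 = np.full(n - 1, -1.0)          # 2 on the diagonal, -1 off it
    d, s = chol_trid(a0, b0)
    R = np.diag(d) + np.diag(s, 1)
    T = np.diag(a0) + np.diag(b0, 1) + np.diag(b0, -1)
    print(np.linalg.norm(R.T @ R - T)) # at rounding-error level
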
The band width of A is p+q+1. In this subsection we will show how to factor band
matrices.
The algorithm is analogous to the algorithm for factoring a tridiagonal matrix;
however, there are more diagonals to deal with. In particular, since our algorithm must
work for matrices having arbitrary band widths, we cannot store the diagonals in linear
arrays with individual names—e.g., the arrays a, b, c, and d in Algorithm 2.4. How-
ever, before we turn to the problem of representing band matrices, it will be useful to
consider the reduction of a band matrix in standard array storage.
We will use Wilkinson diagrams to describe the algorithm. The general algorithm
is sufficiently well illustrated by the case p = 2 and q = 3. In this case the leading
part of A has the form
Since A is general, some form of pivoting will be necessary, and the only form
of pivoting that results in a band matrix is partial pivoting. Consequently, at the first
step we must select our pivots from the first three elements of the first column. How-
ever, interchanging the rows may introduce new elements above the superdiagonal.
These potential nonzero elements are indicated by the Y's in the following diagram.
The 0 represents an element that cannot possibly become nonzero as a result of the
interchanges.
We next subtract multiples of the first row from the second and third to eliminate
the (2,1)- and (3,1)-elements of A. The result is a matrix of the form
Note how the 0 in (2.10) becomes a Y. This reflects the fact that the element could
become nonzero as the two subdiagonal elements are eliminated.
The next step is analogous to the first. We choose a pivot from among the last three
elements of the second column and interchange. This gives a matrix of the form
We then eliminate the (3,2)- and (4,2)-elements to get a matrix of the form
The pattern is now obvious. As zeros are introduced into the lower band, the upper
band expands by exactly the width of the lower band. The following code implements
this algorithm, assuming that the matrix is stored in an array A of order n with elements
outside the band explicitly set to zero.
1. for k = 1 to n−1
2.    iu = min{n, k+p}
3.    ju = min{n, k+p+q}
4.    Choose a pivot index p_k ∈ {k, ..., iu}
5.    A[k, k:ju] ↔ A[p_k, k:ju]
6.    A[k+1:iu, k] = A[k+1:iu, k]/A[k, k]
7.    A[k+1:iu, k+1:ju] = A[k+1:iu, k+1:ju]
         − A[k+1:iu, k]*A[k, k+1:ju]
8. end for k
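
Here is a NumPy sketch of the elimination (2.11) with the matrix held in an ordinary square array (zero-based indices; the function name band_lu_full is my own). The driver checks that the upper band of the reduced array has expanded by exactly the width of the lower band, as described above.

import numpy as np

def band_lu_full(A, p, q):
    # Gaussian elimination with partial pivoting on a band matrix stored in
    # a full n-by-n array.  A is overwritten with U and the multipliers;
    # the pivot indices are returned as well.
    A = np.array(A, dtype=float)
    n = A.shape[0]
    piv = np.zeros(n - 1, dtype=int)
    for k in range(n - 1):
        iu = min(n, k + p + 1)         # rows that can contain nonzeros
        ju = min(n, k + p + q + 1)     # columns touched by the fill-in
        piv[k] = k + np.argmax(np.abs(A[k:iu, k]))
        A[[k, piv[k]], k:ju] = A[[piv[k], k], k:ju]
        A[k + 1:iu, k] /= A[k, k]
        A[k + 1:iu, k + 1:ju] -= np.outer(A[k + 1:iu, k], A[k, k + 1:ju])
    return A, piv

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    n, p, q = 10, 2, 3
    A = rng.standard_normal((n, n))
    A[np.tril_indices(n, -p - 1)] = 0.0      # impose the band structure
    A[np.triu_indices(n, q + 1)] = 0.0
    LU, piv = band_lu_full(A, p, q)
    # Nothing appears beyond the expanded upper band of width p+q.
    print(np.count_nonzero(np.triu(LU, p + q + 1)))   # prints 0
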
We now turn to the implementation of this algorithm in a compact storage scheme.
Here our notation for representing submatrices—for example, A[k+1:iu, k+1:ju]—
fails us. The reason is that any reasonable compact storage scheme will map rectangu-
lar submatrices onto parts of the array that are not rectangular subarrays. Anticipating
this problem, we will recast (2.11) in terms of level-one and level-two BLAS (see §3.2,
Chapter 2, for details about BLAS and strides).
Specifically, we will suppose we have BLAS subprograms to swap two vectors, to
scale a vector, and to add a rank-one matrix to a general matrix. The following is the
specification of these BLAS.
1. swap( n, x, xstr, y, ystr): This program swaps the n-vectors x and y having
strides xstr and ystr.
2. scale(n, σ, x, xstr): This program overwrites the n-vector x having stride
xstr with σx.
3. apsxyt(m, n, A, astr, σ, x, xstr, y, ystr): This program overwrites A ∈
R^{m×n} having (row) stride astr with A + σxyᵀ, where x is an m-vector hav-
ing stride xstr and y is an n-vector having stride ystr. (The name means "a
plus s times x y transpose".)
The reader should verify that passing along a row of this array moves along a diagonal
of the matrix, while passing down a column of the array passes down a column of the
matrix. To move across a row of the matrix, however, one must move diagonally in
the array toward the northeast.
It is precisely this diagonal storage of rows that makes the colon convention for
representing submatrices unworkable. However, the BLAS we coded above can take
diagonal storage in stride. For suppose the array in which A is stored has stride astr.
If r is the address in memory of the first element of the first row, then the addresses of
the elements of the entire row are
In other words a row in our representation is like a row in conventional storage but
with its stride reduced by one. Thus we can convert our BLAS to this representation
by reducing astr by one. There is one more difficulty. References like A[k, k] must
be translated to refer to the position of a_{kk} in the new storage scheme. Fortunately, the
correspondence is trivial. Let m = p+q+1. Then
a_{ij} corresponds to A[m+i−j, j].
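
The correspondence can be checked with a few lines of NumPy. The sketch below assumes m = p+q+1, as in Algorithm 2.7, and packs a band matrix into an array with 2p+q+1 rows, the first p of which are left free to receive the fill-in; the helper name pack_band is my own.

import numpy as np

def pack_band(A, p, q):
    # Pack a band matrix by the rule a_ij <-> packed[m+i-j, j] (1-based),
    # i.e. packed[p+q+i-j, j] with zero-based indices and m = p+q+1.
    n = A.shape[0]
    packed = np.zeros((2 * p + q + 1, n))
    for j in range(n):
        for i in range(max(0, j - q), min(n, j + p + 1)):
            packed[p + q + i - j, j] = A[i, j]
    return packed

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    n, p, q = 7, 2, 1
    A = rng.standard_normal((n, n))
    A[np.tril_indices(n, -p - 1)] = 0.0
    A[np.triu_indices(n, q + 1)] = 0.0
    P = pack_band(A, p, q)
    # Row p+q of the packed array (row m, 1-based) holds the diagonal of A,
    # so walking along a row of the array follows a diagonal of the matrix.
    print(all(P[p + q, j] == A[j, j] for j in range(n)))
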
These transformations are implemented in Algorithm 2.7. Here are some com-
ments.
• The algorithm can be blocked in the usual way. But unless p and q are large, block-
ing will not improve the performance by much.
• The complexity of the algorithm is not easy to derive because it depends on three
parameters: n, p, and q. The algorithm has three distinct stages depending on the value
of the index k of the outer loop.
1. For k = 1, ..., n−p−q, we have ni = p and nj = p+q. Consequently the
   update of the Schur complement in statement 8 takes p(p+q) flam for a total
   of p(p+q)(n−p−q) flam.
2. For k = n−p−q+1, ..., n−p, the length ni has the fixed value p, but nj de-
   creases by one with each iteration of the loop. By standard integration tech-
   niques, we see that this part of the loop contributes p²q + ½pq² flam.
3. For the remaining values of k, the algorithm reduces the p×p matrix in the
   southeast corner, for an operation count of ⅓p³ flam.
Thus:
If n > p+q, the operation count for Algorithm 2.7 is
Let the matrix A with lower band width p and upper band width q be represented ac-
cording to the scheme (2.12) in an array A having stride astr. The following algorithm
overwrites the first p+q+1 rows of array A with the U-factor of the matrix A. The last
p rows contain the multipliers.
1. m = p+q+1
2. for k = 1 to n−1
3.    ni = min{p, n−k}
4.    nj = min{p+q, n−k}
5.    Choose a pivot index p_k ∈ {k, ..., k+ni}
6.    swap(nj+1, A[m, k], astr−1, A[m+p_k−k, k], astr−1)
7.    scale(ni, 1/A[m, k], A[m+1, k], 1)
8.    apsxyt(ni, nj, A[m, k+1], astr−1, −1,
         A[m+1, k], 1, A[m−1, k+1], astr−1)
9. end for k
here for consistency with the QR decomposition to be treated later, but the latter is also
common.
Band matrices
The storage scheme for band matrices presented here is due to the authors of LINPACK
[99]. The ingenious use of level-two BLAS to move diagonally in a matrix is found in
LAPACK [9].
Linear systems of equations seldom come unadulterated. For example, the matrix A of
the system may be measured, in which case the matrix at hand is not A itself but a per-
turbation A + E of A. Or the elements of the matrix may be computed, in which case
rounding errors insure that we will be working with a perturbed matrix. Even when
A is known exactly, an algorithm like Gaussian elimination will effectively perturb A
(see §4). The question treated in this section is how do these perturbations affect the
solution of a linear system.
In §3.1 we will present the classical perturbation theory, which bounds the norm
of the error. The fact that norms are only a rough measure of the size of a vector or ma-
trix can cause normwise bounds to be pessimistic. Consequently, in §3.2, we will treat
componentwise bounds that to some extent alleviate this problem. In §3.3 we will be
concerned with projecting a measure of the accuracy of the solution back on the orig-
inal matrix. These backward perturbation bounds have many practical applications.
Finally, in the last subsection, we will apply perturbation theory to analyze the method
of iterative refinement, a technique for improving the accuracy of the solution of linear
systems.
Then
or
Hence
then
If in addition
Proof. From (3.4) it follows that x̃ − x = −A⁻¹Ex̃, and (3.5) follows on taking norms.
Now assume that ||A⁻¹E|| < 1. The matrix Ã is nonsingular if and only if the
matrix A⁻¹Ã = I + A⁻¹E is nonsingular. Hence by Theorem 4.18, Chapter 1, Ã is
nonsingular, and (3.6) follows immediately from (3.5) and Lemma 3.1. ∎
(The C in D_C stands for column, since D_C scales the columns of A.) If we have a
rough idea of the sizes of the components of x, we can take ξ_i = x_i (i = 1, ..., n),
and the components of D_C⁻¹x will all be nearly one. However, this approach has its
own drawbacks — as we shall see when we consider artificial ill-conditioning.
The right-hand sides of (3.5) or (3.6) are not as easy to interpret as the left-hand
sides. But if we weaken the bound, we can put them in a more revealing form. Specif-
ically, from (3.5)
where
x
If ||A⁻¹|| ||E|| < 1, then the relative error in x is bounded by
The number K(A) is called the condition number of A with respect to inversion,
or, when the context is clear, simply the condition number. For all the commonly used
norms it is greater than one, since
We have already observed that the left-hand side of this bound can be regarded as
a relative error in x. The factor
can likewise be regarded as a relative error in A. Thus the condition number K( A) tells
us by how much the relative error in the matrix of the system Ax = b is magnified in
the solution.
There is a rule of thumb associated with the condition number. Let us suppose that
the normwise relative errors reflect the relative errors in the elements of x and A. Thus
if ||E||/||A|| = 10^−t, then A is accurate to about t decimal digits. If κ(A) = 10^k, then
In other words, Ã = A + E, where e_{ij} = a_{ij}t_{ij}. It follows that for any absolute norm
Hence by (3.7)
This says that the larger components of x may suffer a loss of log κ(A) decimal digits
due to the rounding of A. The smaller components may lose even more digits.
The condition number in the two norm has a nice characterization in terms of sin-
gular values. It is the ratio of the largest singular value to the smallest:
In many applications the elements of A are about one in magnitude, in which case a
large condition number is equivalent to a small singular value.
Since the condition number involves ||A⁻¹||, it would seem that to compute the
condition number one would have to compute the inverse of A. However, once the
matrix A has been factored—say by Gaussian elimination—there are O(n²) tech-
niques to estimate its condition number. These condition estimators will be treated in
§3, Chapter 5.
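
To make the rule of thumb concrete, the following NumPy sketch (an illustration, not the book's code) builds a matrix with a prescribed 2-norm condition number, perturbs it, and compares the relative error in the solution with the bound κ(A)·||E||/||A||.

import numpy as np

rng = np.random.default_rng(4)
n = 50
# Construct A = U diag(sigma) V^T with condition number 1e6 in the 2-norm.
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = np.logspace(0, -6, n)
A = U @ np.diag(sigma) @ V.T

kappa = np.linalg.cond(A)          # ratio of largest to smallest singular value
x = rng.standard_normal(n)
b = A @ x

E = rng.standard_normal((n, n))    # a generic perturbation of norm 1e-10*||A||
E *= 1e-10 * np.linalg.norm(A, 2) / np.linalg.norm(E, 2)
x_tilde = np.linalg.solve(A + E, b)

rel_err = np.linalg.norm(x_tilde - x) / np.linalg.norm(x)
print("kappa(A) =", kappa)
print("relative error =", rel_err)
print("bound kappa*||E||/||A|| =", kappa * 1e-10)
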
where
The result now follows on dividing this inequality by ||x|| and applying the definitions
of κ(A) and μ.
Once again, κ(A) mediates the transfer of relative error from the data to the solu-
tion. However, the factor μ mitigates the effect of κ. In particular, it is easy to show
that
Hence the factor μ has the potential to reduce the factor of ||e||/||b|| to one—i.e., to
make the problem perfectly conditioned.
To see what μ means in terms of the original problem, suppose that ||A|| = 1. If
μ = κ(A) = ||A⁻¹||, then ||x|| = ||A⁻¹|| ||b||; that is, ||x|| reflects the size of ||A⁻¹||.
On the other hand, if μ = 1, then ||x|| = ||b|| and the size of ||x|| tells us nothing about
||A⁻¹||. In proportion as μ is near κ(A) we say that x reflects the ill-conditioning of A.
Problems that reflect the ill-conditioning of their matrix are insensitive to perturbations
in the right-hand side.
It should be stressed that solutions of real-life problems usually do not reflect the
ill-conditioning of their matrices. That is because the solutions have physical signifi-
cance that makes it impossible for them to be large. And even when a right-hand side
reflects the ill-conditioning of the matrix, the solution is still sensitive to errors in the
matrix itself.
Artificial ill-conditioning
Unfortunately, Example 3.4 does not tell the whole story. Let us look at another ex-
ample.
If we were to round the matrix A in the sixth digit, we might get an error matrix like
Now the condition number of A is about 3.4e+05, so that by Example 3.4 we should ex-
pect a relative accuracy of one or two figures in the solution x1 obtained by perturbing
A by E1. In fact the bound on the normwise relative error is
The error bound overestimates the actual error by almost four orders of magnitude.
Theorem 3.7. Let A be nonsingular and let Ax = b. Let ε > 0. If ε||A⁻¹||₂ < 1,
then there is a matrix E with ||E||₂ = ε such that if x̃ = (A + E)⁻¹b then
Proof. Let u be a vector of norm one such that ||A⁻¹u||₂ = ||A⁻¹||₂ and set
and
Since ε||A⁻¹|| < 1, A + E is nonsingular and x̃ is well defined. From the easily
verified identity x̃ − x = (I − A⁻¹E)⁻¹A⁻¹Ex, we have
The difference between the two bounds is in the sign of the denominator, which in
both cases can be made as near one as we like by choosing ε small enough. Thus by
making ε arbitrarily small, we can make the relative perturbation in x arbitrarily close
to ||E||₂||A⁻¹||₂. Although we have worked with the 2-norm for simplicity, analogous
results hold for the other commonly used norms.
Having established that there are errors that make the bounds (3.7) or (3.8) real-
istic, let us exhibit one such matrix of errors for Example 3.6.
Example 3.8 (Artificial ill-conditioning, continued). Let
We now have two errors — one for which the bound (3.7) works and one for which
it does not. There is no question of one error being better or more realistic than the
other. It depends on the application. If we are concerned with the effects of rounding
the elements of A on the solution, then E1 reflects the fact that we make only small
relative errors in the components of A. On the other hand, if the elements of A were
measured with an instrument that had an absolute accuracy of 10^−6, then E2 more
accurately reflects the error. Thus it is the structure of the error that creates the artificial
ill-conditioning. The condition number has to be large enough to predict the results of
perturbing by E2. But it then overestimates the perturbations in the solution due to E1.
Mathematically speaking, the overestimate results from the fact that for E1 the
right-hand side of the inequality
greatly overestimates the left-hand side. This suggests it may be possible to rescale the
problem to strengthen the inequality. Specifically, we can replace the system Ax = b
with
where D_R and D_C are suitably chosen diagonal matrices. Since D_R(A + E)D_C = D_RAD_C +
D_RED_C, the matrix E inherits the scaling of A. Thus we wish to choose D_R and D_C
so that the inequality
is as sharp as possible.
The strategy recommended here is the following.
Choose D_R and D_C so that the elements of E are, as nearly as possible, equal.
There can be no completely rigorous justification of this recommendation, but the fol-
lowing theorem, which is stated without proof, is suggestive.
Theorem 3.9. If the elements of E are uncorrelated random variables with variance
σ², then
Here E denotes the mathematical expectation—the average—and the number σ can be re-
garded as the size of a typical element of E. Thus the theorem says that if we regard
σ as also representing ||E||₂, then on the average the inequality (3.13) is sharp.
Then
Proof. The bound (3.14) follows immediately on taking absolute values in the identity
x̃ − x = −A⁻¹Ex̃.
Turning now to (3.16), if (3.15) is satisfied then by Corollary 4.21, Chapter 1,
and the right-hand side is a fortiori nonnegative. The bound (3.16) now follows on
taking absolute values in the identity
Mathematically the bounds (3.14) and (3.16) differ, but as a practical matter they
are essentially the same. For if |A⁻¹||E| is reasonably small, then the factor (I −
|A⁻¹||E|)⁻¹ will differ insignificantly from the identity, and either bound will give
essentially the same result.
This is a good place to point out that there is a difference between a mathematician
using perturbation bounds to prove a theorem and a person who wants some idea of
how accurate the solution of a linear system is. The former must be punctilious; the
latter can afford to be a little sloppy. For example, A⁻¹ will generally be computed in-
accurately. But as long as the problem is not so ill conditioned that A⁻¹ has no figures
of accuracy, it can be used with confidence in the bound. Similarly, if x and x̃ agree
to one or two figures, it does not much matter which of the bounds (3.14) or (3.16)
is used. Generally speaking, if the bounds say the solution is at all accurate, they are
almost certainly overestimates.
The bounds can be quite an improvement over normwise bounds.
Example 3.12 (Artificial ill-conditioning, concluded). In our continuing example,
the error in x1, component by component, is
This corollary converts the problem of computing a bound on the left-hand side of
(3.18) to that of estimating || |A⁻¹||E||x̃| ||. This can be done by the condition estimator
described in §3.1, Chapter 5.
The solution of this backward perturbation problem requires that we have a comput-
able measure of the quality of x as an approximate solution of the system. We will
measure the quality by the size of the residual vector
The problem (3.19) has the flavor of a backward rounding-error analysis (see §4.3,
Chapter 2), in that it projects an error back on the original data—namely, the matrix
A. However, the bound on this error is based on the residual, which can actually be
computed. Consequently, backward perturbation results are used in many practical
applications.
satisfying
such that
Proof. The fact that E as defined above satisfies (3.21) and (3.22) is a matter of direct
calculation. On the other hand, if x satisfies (3.22), then
(here 0/0 = 0 and otherwise ρ/0 = ∞). If ε ≠ ∞, there is a matrix E and a vector e
with
such that
This in turn implies that r = D(S|x̃| + s), where |D| ≤ εI. It is then easily verified
that E = DS diag(sign(x̃1), ..., sign(x̃n)) and e = −Ds are the required backward
perturbations.
On the other hand, given perturbations E and e satisfying (3.25) and (3.26) for
some ε, we have
Hence ε ≥ |ρ_i|/(S|x̃| + s)_i, which shows that the ε defined by (3.24) is optimal.
The proof of this theorem is constructive in that it contains a recipe for calculating
E from S and s and r. It is an instructive exercise to verify that when S = eeᵀ and
s = 0, the resulting matrix E is precisely (3.23).
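
For the common choice S = |A| and s = |b|, the optimal ε of (3.24) is easy to evaluate. The NumPy sketch below (the function name componentwise_backward_error is my own) computes it for a solution returned by a library solver; the result is ordinarily a modest multiple of the rounding unit.

import numpy as np

def componentwise_backward_error(A, b, x):
    # Smallest eps such that (A+E)x = b+e with |E| <= eps|A| and |e| <= eps|b|,
    # i.e. the choice S = |A|, s = |b| in the theorem above.
    r = b - A @ x
    denom = np.abs(A) @ np.abs(x) + np.abs(b)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratios = np.where(denom == 0.0,
                          np.where(r == 0.0, 0.0, np.inf),   # 0/0 = 0, rho/0 = inf
                          np.abs(r) / denom)
    return ratios.max()

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    n = 30
    A = rng.standard_normal((n, n))
    b = rng.standard_normal(n)
    x = np.linalg.solve(A, b)
    print(componentwise_backward_error(A, b, x))   # a few times the rounding unit
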
whose solution we will denote by x*. The notation used here emphasizes two points
of generality. First, the use of boldface indicates that we are not assuming anything
about the nature of the objects in the vector space. For example, the space could be
the space of upper triangular matrices, and A could be the mapping R ↦ Rᵀ + R.
(This mapping arises in perturbation theory for Cholesky factors.) Second, we are at-
tempting to find a point x* that makes a function r(x) equal to zero. Although for the
moment we assume that r is linear, we will see later that it can also be nonlinear.
Now let x0 be a putative solution of (3.27), and let r0 = r(x0). Then
Theorem 3.16. Let (3.28) be applied iteratively to give the sequence x0, x1, .... Let
e_k = x_k − x* and E_k = A_k − A. If
then
In particular, if
then
then
Moreover,
Now
Hence
Similarly
An obvious induction now gives (3.29). The bounds (3.30) and (3.31) follow directly
from (3.29) and the definitions of γ± and η±.
The usual application of iterative refinement is to improve solutions of linear sys-
tems. In this case the error e0 starts off larger than ||A||γ₊ + η₊. The inequality (3.30)
says that, assuming ρ is reasonably less than one, each iteration will decrease the error
by a factor of ρ until the error is of a size with ||A||γ₊ + η₊, at which point the iteration
will stagnate. Thus the theorem provides both a convergence rate and a limit on the
attainable accuracy.
The theorem also applies to nonlinear functions r. For if r is differentiable at x*,
then
and we can incorporate the o(||e_k||) term into the vector g_k. In this case the bound
(3.31) says that the initial values of the o(||e_k||) terms do not affect the limiting ac-
curacy—though they may slow down convergence. Such an iteration is called self-
correcting.
It should be stressed that the theorem does not tell us when a nonlinear iteration
will converge—always a difficult matter. What it does say is that if the method does
converge then nonlinearities have no effect on its asymptotic behavior.
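
A minimal NumPy sketch of iterative refinement for a linear system may help fix the ideas (the function name and the single/double precision split are illustrative assumptions, not the book's code). The correction equation is solved inexactly in single precision, the residual is accumulated in double precision, and the error contracts at each step until it reaches roughly double-precision level.

import numpy as np

def iterative_refinement(A, b, steps=5):
    # The "approximate inverse" here is simply a solve carried out in
    # single precision; the residual is computed in double precision.
    A32 = A.astype(np.float32)
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(steps):
        r = b - A @ x                                    # double-precision residual
        d = np.linalg.solve(A32, r.astype(np.float32))   # inexact correction
        x = x + d.astype(np.float64)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(6)
    n = 100
    A = rng.standard_normal((n, n))
    x_true = rng.standard_normal(n)
    b = A @ x_true
    x = iterative_refinement(A, b)
    print(np.linalg.norm(x - x_true) / np.linalg.norm(x_true))  # near working precision
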
Artificial ill-conditioning
Wilkinson [344, 1961] seems to have been the first to point out that artificial ill-con-
ditioning could be laid to the unsharpness of bounds like ||A⁻¹E|| ≤ ||A⁻¹|| ||E||.
His definitive words on the subject may be found on pages 192-193 of his Algebraic
Eigenvalue Problem [346]. All other treatments of the subject, including the one here,
are just elaborations of the good common sense contained in those two pages.
The authors of LINPACK [99, 1979] recommend equal error scaling to mitigate
the effects of artificial ill-conditioning. Curtis and Reid [83] describe an algorithm for
balancing the elements of a matrix in the least squares sense.
Componentwise bounds
Bauer [21,1966] began the study of componentwise perturbation theory. His results,
which chiefly concerned matrix inverses, never caught on (possibly because he wrote
in German) until Skeel [281, 1979] established what is essentially (3.14). (We shall
hear more of Skeel in the next subsection.) If we assume that |E| ≤ ε|A|, then on
taking norms in (3.14), we get
this problem is to work with formulas for the individual components of the solution
[e.g., (3.17)].
For a survey of componentwise perturbation theory, see [176].
Iterative refinement
According to Higham [177, §9.10], Wilkinson gave a program for iterative refinement
in 1948. The essence of the method is making do with the inverse of an approximation
to the operator in question. Its prototype is the practice of replacing the derivative
in Newton's method with a suitable, easily computable approximation. Some other
applications of the method, notably to eigenvalue problems, will be found in [36,45,
94, 98,106,162, 287].
The analysis given here parallels what may be found in the literature, e.g., [230,
288,177] but with some variations that make it easier to apply to nonlinear problems.
εM/0.9 is the adjusted rounding unit (see Theorem 4.10, Chapter 2).
such that
Note that in applying the bound we would probably make this simplification anyway.
• An obvious variant of the theorem applies to the solution of an upper triangular
system Ux = b by back substitution. The chief difference is that the error matrix
assumes the form
where κ(L) = ||L|| ||L⁻¹|| is the condition number of L. From (4.3) we easily see that
for any absolute norm
Consequently
Thus if κ(L) is large, we may get inaccurate solutions. Let us look at an example.
The second column of this table contains a ballpark estimate of the error. The third
column contains the actual error. Although the former is pessimistic, it tracks the latter,
which grows with the size of Wn.
If nεM||L|| ||x|| is less than the errors already present in b, then any inaccuracies in the
solution can be regarded as coming from b rather than being introduced by the algo-
rithm.
It is worth noting that the bound (4.6) is within a factor of n of what we could
expect from the correctly rounded solution. For if Lx* = b and we round x*, we get
x̃ = x* + a, where
Hence
satisfies
Theorem 4.4 states that the computed LU decomposition of A is the exact decom-
position of A+E, where E is proportional to n times the rounding unit. The factor n is
usually an overestimate and applies only to the elements in the southeast. For matrices
of special structure—e.g., Hessenberg matrices — the factor n can often be replaced
by a small constant.
The sizes of the individual elements of E depend on the size of the computed fac-
tors L and U. To see the effects of large factors, let us return to the matrix of Exam-
ple 2.8.
Example 4.5. Let
Both these factors have large elements, and our analysis suggests that their product
will not reproduce A well. In fact, the product is
which is very close to A with its first and third row interchanged.
The product (4.8) illustrates a point that is easy to overlook—namely, the matrix
A + E from Theorem 4.4 need not be representable as an array of floating-point num-
bers. In fact, if (4.8) is rounded to four digits, the result is A itself.
We have given componentwise bounds, but it is also possible to give bounds in
terms of norms. To do this, it will be convenient to introduce some notation.
If L_εM and U_εM are the L- and U-factors computed in floating-point arithmetic with round-
ing unit εM, then
where U and V are random orthogonal matrices. Thus κ₂(A) = 10^5. The L-factor
resulting from Gaussian elimination with partial pivoting for size is
and its condition number in the 2-norm is 4.2e+00. The corresponding U-factor is
and its condition number is κ₂(U) = 3.4e+04. But if we row-scale U so that its di-
agonal elements are one, we obtain a matrix whose condition number is 2.9e+00.
Since row-scaling a triangular system has no essential effect on the accuracy of the
computed solution, systems involving the U-factor will be solved accurately. Systems
involving the L-factor will also be solved accurately because L is well conditioned.
As we shall see, these facts have important consequences.
It should be stressed that this phenomenon is not a mathematical necessity. It is
easy to construct matrices for which the L-factor is ill conditioned and for which the
ill-conditioning in the U-factor is genuine. Moreover, the strength of the phenomenon
depends on the pivoting strategy, being weakest for no pivoting and strongest for com-
plete pivoting. For more see the notes and references.
where
But
The norm bound on H follows on taking norms in the bound on |H| and applying the
definition of γ_εM(A).
Thus the computed solution is the exact solution of a slightly perturbed system.
The bound on the perturbation is greater by a factor of essentially three than the bound
on the perturbation E produced by the computation of the LU decomposition. How-
ever, this bound does not take into account the observation (4.10) that the L-factor pro-
duced by Gaussian elimination with pivoting tends to be well conditioned while any
ill-conditioning in the U-factor tends to be artificial. Consequently, the first triangu-
lar system in (4.11) will be solved accurately, and the solution of the second will not
magnify the error. This shows that:
If Gaussian elimination with pivoting is used to solve the system
Ax = b, the result is usually a vector x that is near the solution of
the system (A + E)x = b, where E is the backward error from the
elimination procedure.
Proof. Since b − (A + H)x̃ = 0, we have r = Hx̃. Hence ||r|| ≤ ||H|| ||x̃|| ≤
(3 + nεM)nγ_εM(A)εM||A|| ||x̃||.
We have seen in Theorem 3.14 that the converse of this corollary is also true. If
a purported solution has a small residual, it comes from a slightly perturbed problem.
This converse has an important practical implication. If we want to know if an al-
gorithm has solved a linear system stably, all we have to do is compare its residual,
suitably scaled, to the rounding unit.
Matrix inversion
We conclude our treatment of backward stability with a discussion of matrix inversion.
For definiteness, let us suppose that we compute the inverse X of A by solving the
systems
However, it does not follow from this that there is a single matrix H such that X =
(A + H)⁻¹, and in general there will not be—matrix inversion is not backward stable.
However, it is almost stable. By the observation (4.12) each x_i will tend to be
near the solution of the system (A + E)x_i = e_i, where E is the backward error from
Gaussian elimination. Thus the computed inverse will tend to be near the inverse of
matrix A + E, where E is small.
Unfortunately in some applications nearly stable is not good enough. The follow-
ing example shows that this is true of the invert-and-multiply algorithm for solving a
linear system.
Example 4.11. The following equation displays the first five digits of a matrix having
singular values 1, 10^−7, and 10^−14:
The following is a table of the relative errors and the relative residuals for the two
solutions.
The invert-and-multiply solution is slightly more accurate than the solution by Gauss-
ian elimination. But its residual is more than 12 orders of magnitude larger.
A simplified analysis will show what is going on here. Suppose that we compute
the correctly rounded inverse—that is, the computed matrix X satisfies X = A⁻¹ +
F, where ||F|| ≤ α||A⁻¹||εM, for some constant α. If no further rounding errors are
made, the solution computed by the invert-and-multiply algorithm is A⁻¹b + Fb, and
the residual is r = −AFb. Hence
Thus the residual (compared to ||b||) can be larger than the rounding unit by a factor
of κ(A). In other words, if A is ill conditioned, expect large residuals from the invert-
and-multiply algorithm.
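
The effect is easy to reproduce. The following NumPy sketch (an illustration in the spirit of Example 4.11, not the book's data) builds a matrix with singular values 1, 10^-7, and 10^-14 and compares the scaled residuals of the two solutions.

import numpy as np

rng = np.random.default_rng(7)
n = 3
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag([1.0, 1e-7, 1e-14]) @ V.T
b = rng.standard_normal(n)

x_ge = np.linalg.solve(A, b)      # Gaussian elimination with partial pivoting
x_inv = np.linalg.inv(A) @ b      # invert and multiply

for name, x in [("solve ", x_ge), ("inv*b ", x_inv)]:
    r = b - A @ x
    print(name, np.linalg.norm(r) / (np.linalg.norm(A, 2) * np.linalg.norm(x)))
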
[cf. (4.7)]. In particular, to the extent that the bounds are valid, the relative backward
error in any component of A remains unchanged by the scaling. This means that if a
fixed pivoting sequence is found to be good for a particular matrix, the same sequence
will be good for all scaled versions of the matrix.
The normwise growth factor
is not easy to work with. However, if partial or complete pivoting for size is used in
computing the LU decomposition, the components of |L| are not greater than one, and
any substantial growth will be found in the elements of U. For this reason we will
analyze the growth in terms of the number
The absence of the matrix A and the presence of the subscript n indicates that we will
be concerned with the behavior of γn for arbitrary matrices as a function of the order
of the matrix.
The backward error bound for Gaussian elimination is cast in terms of the com-
puted L- and U-factors, and strictly speaking we should include the effects of rounding
errors in our analysis of γn. However, this is a tedious business that does not change
the results in any essential way. Hence we will analyze the growth for the exact elim-
ination procedure.
Proof. Assume that pivoting has been done initially. Let a_{ij}^(k) be the elements of the
kth Schur complement, and let α_k = max_{i,j} |a_{ij}^(k)|. Now
since
Hence α_{k+1} ≤ 2α_k ≤ 2^k α_1. Since the kth row of U consists of the elements a_{kj}^(k−1),
the result follows.
The discouraging aspect of this bound is that it suggests that we cannot use Gauss-
ian elimination with partial pivoting on matrices larger than roughly −log₂ εM. For at
that size and beyond, the backward error could overwhelm the elements of the matrix.
For IEEE standard arithmetic, this would confine us to matrices of order, say, 50 or
less.
Moreover, the bound can be attained, as the following example shows.
Then if we break ties in the choice of pivot in favor of the diagonal element, it is easily
seen that each step of Gaussian elimination doubles the components of the last column,
so that the final U has the form
In spite of this unhappy example, Gaussian elimination with partial pivoting is the
method of choice for the solution of dense linear systems. The reason is that the growth
suggested by the bound rarely occurs in practice. The reasons are not well understood,
but here is the bill of particulars.
4. Example 4.13 is highly contrived. Moreover, all examples that exhibit the
same growth are closely related to it.
Against all this must be set the fact that two examples have recently surfaced in
which partial pivoting gives large growth. Both bear a family resemblance to the ma-
trix in Example 4.13. The existence of these examples suggests that Gaussian elimi-
nation with partial pivoting cannot be used uncritically—when in doubt one should
monitor the growth. But the general approbation of partial pivoting stands.
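
For the record, the growth of Example 4.13 is easy to reproduce. The sketch below (the function name growth_factor is my own; argmax breaks ties in favor of the earliest row, i.e., the diagonal) builds the matrix with ones on the diagonal and in the last column and -1 below the diagonal, and reports the ratio of the largest element of U to the largest element of A.

import numpy as np

def growth_factor(A):
    # Element growth under Gaussian elimination with partial pivoting.
    U = np.array(A, dtype=float)
    n = U.shape[0]
    for k in range(n - 1):
        p = k + np.argmax(np.abs(U[k:, k]))
        U[[k, p]] = U[[p, k]]
        m = U[k + 1:, k] / U[k, k]
        U[k + 1:, k:] -= np.outer(m, U[k, k:])
    return np.abs(U).max() / np.abs(np.asarray(A)).max()

if __name__ == "__main__":
    n = 10
    W = np.eye(n) - np.tril(np.ones((n, n)), -1)   # 1 on the diagonal, -1 below
    W[:, -1] = 1.0                                 # 1 in the last column
    print(growth_factor(W))                        # 2**(n-1) = 512.0
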
• Complete pivoting. Complete pivoting can be recommended with little reserva-
tion. It can be shown that
The bound is not exactly small—for n = 1000 it is about seven million—but for
many problems it would be satisfactory. However, the bound is rendered largely ir-
relevant by the fact that until recently no one has been able to devise an example for
which γn is greater than n. For many years it was conjectured that n was an upper
bound on the growth, but a matrix of order 25 has been constructed for which γn
is about 33.
Given the general security of complete pivoting and the potential insecurity of par-
tial pivoting, it is reasonable to ask why not use complete pivoting at all times. There
are three answers.
1. Complete pivoting adds an O(n³) overhead to the algorithm—the time re-
quired to find the maximum elements in the Schur complements. This over-
head is small on ordinary computers, but may be large on supercomputers.
2. Complete pivoting can be used only with unblocked classical Gaussian elim-
ination. Partial pivoting can be used with blocked versions of all the variants
of Gaussian elimination except for Sherman's march. Thus partial pivoting
gives us more flexibility to adapt the algorithm to the machine in question.
3. Complete pivoting frequently destroys the structure of a matrix. For exam-
ple, complete pivoting can turn a banded matrix into one that is not banded.
Partial pivoting, as we have seen, merely increases the band width.
• Positive definite matrices. The reason no pivoting is required for positive definite
matrices is contained in the following result.
Theorem 4.14. Let A be positive definite, and let a_{kk} be a maximal diagonal element
of A. Then
Proof. Suppose that for some a_{ij} we have |a_{ij}| > a_{kk}. Since A is positive definite,
the matrix
In other words, the element of a positive definite matrix A that is largest in magni-
tude will be found on the diagonal. By Theorem 2.6, the Schur complements generated
by Gaussian elimination are positive definite. When we perform one step of Gaussian
elimination on A, the diagonal elements of the Schur complement are given by
the inequality following from the fact that a_{1i}² ≥ 0 and a_{11} > 0. Hence the diagonal
elements of a positive definite matrix are not increased by Gaussian elimination. Since
we have only to look at the diagonal elements to determine the growth factor, we
have the following result.
For Gaussian elimination applied to a positive definite matrix γn = 1.
Several times we have observed that pivoting can destroy structure. For exam-
ple, partial pivoting increases the bandwidth of a band matrix. Since positive definite
matrices do not require pivoting, we can avoid the increase in band width with a cor-
responding savings in work and memory.
Partial pivoting is not an option with positive definite matrices, for the row inter-
changes destroy symmetry and hence positive-defmiteness. Moreover, Theorem 4.14
does not imply that a partial pivoting strategy will automatically select a diagonal el-
ement—e.g., consider the matrix
For this reason positive definite systems should not be trusted to a general elimination
algorithm, since most such algorithms perform partial pivoting.
• Diagonally dominant matrices. Diagonally dominant matrices occur so frequent-
ly that they are worthy of a formal definition.
The following theorem lists the facts we need about diagonal dominance. Its proof
is quite involved, and we omit it here.
Theorem 4.16. Let A be strictly diagonally dominant by rows, and let A be partitioned
in the form
An analogous theorem holds for matrices that are diagonally dominant by col-
umns. Note that the statement that A11 is nonsingular is essentially a statement that
any strictly diagonally dominant matrix is nonsingular.
To apply these results to Gaussian elimination, let A be diagonally dominant by
columns, and assume that all the leading principal submatrices of A are nonsingular,
so that A has an LU decomposition. Since the Schur complements of the leading prin-
cipal minors are diagonally dominant by columns, Gaussian elimination with partial
pivoting is the same as Gaussian elimination without pivoting, provided ties are broken
in favor of the diagonal element. Moreover, by (4.18) the sum of the magnitudes of el-
ements in any column of the Schur complement are less than or equal to the sum of the
magnitudes of elements in the corresponding column of the original matrix and hence
are less than or equal to twice the magnitude of the corresponding diagonal element of
the original matrix. Since the largest element of a diagonally dominant matrix may be
found on its diagonal, it follows that the growth factor γn cannot be greater than two.
We have thus proved the following theorem.
Scaling
By scaling we mean the replacement of the matrix A by D_RAD_C, where D_R and D_C
are diagonal matrices. There are two preliminary observations to be made. We have
already observed that the backward error committed in Gaussian elimination inherits
any scaling. Specifically, the error bound (4.7) becomes
Thus if the pivot order is fixed, the elements suffer essentially the same relative errors
(exactly the same if the computation is in binary and the diagonals of DR and DC are
powers of two).
A second observation is that by scaling we can force partial or complete pivoting to
choose any permissible sequence of nonzero elements. For example, if for the matrix
Thus partial or complete pivoting chooses the (1,1)-element as a pivot. This scaling
procedure can be repeated on the Schur complements.
The two observations put us in a predicament. The second observation and the
accompanying example show that we can use scaling to force partial or complete piv-
oting to choose a bad pivot. The first observation says that the bad pivot continues to
have the same ill effect on the elimination. For example, Gaussian elimination applied
to (4.19) still obliterates the (2,3)- and (3,2)-elements. What scaling strategy, then,
will give a good pivot sequence?
There is no easy answer to this question. Here are three scaling strategies that are
sometimes suggested.
A general analysis
We are going to use Theorem 3.16 to analyze the algorithm when it is carried out in
floating-point arithmetic. To apply the theorem, we must compute the bounds ρ, γ₊,
and η₊ in (3.30). In doing so we will make some reasonable simplifying assump-
tions.
1. The vectors x_k are bounded in norm by a constant ξ.
2. The vectors Ax_k approximate b.
3. The correction d_k is smaller in norm than x_k.
4. The residual r_k may be computed with a rounding unit ε̄M that is different
   from the other computations.
The first three conditions are what one would expect from a converging iteration. The
last represents a degree of freedom in the computation of the residual.
Using these assumptions we can derive the following bounds, which for brevity
we state without proof. First, from the error analysis of Gaussian elimination we get
For the residual it can be shown that the computed vector satisfies
where
The first term on the right-hand side of (4.21) says that the initial error is decreased
by a factor of p at each iteration. This decrease continues until the other two terms
dominate, at which point convergence ceases. The point at which this happens will
depend on ε̄M—the precision to which the residual is computed. We will consider
two cases: double and single precision.
If 2c_rκ(A)ε̄M is less than one, the attainable accuracy is limited by the term 3εM. Thus
with double-precision computation of the residual iterative refinement produces a re-
sult that is effectively accurate to working precision. If A is ill conditioned the con-
vergence will be slow, but ultimately the solution will attain almost full accuracy.
Iterative refinement in fixed precision tends to produce solutions that have small
componentwise backward error.
The formal derivation of this result is quite detailed. But it is easy to understand
why it should be true. The computed residual is r = b − (A + G)x̃, where |G| ≤
c_r|A|εM—i.e., A + G is a componentwise small relative perturbation of A. Now let's
shift our focus a bit and pretend that we were really trying to solve the system (A +
G)x = b. Then our residual calculation gives a nonzero vector r that, considered as the
residual of the system (A + G)x = b, is completely accurate. Consequently, one step
of iterative refinement will move us nearer the solution of (A + G)x = b—a solution
that by definition has a small relative componentwise backward error with respect to
the original system Ax = b.
Historical
It is a commonplace that rounding-error analysis and the digital computer grew up
together. In the days of hand computation, the person performing the computations
could monitor the numbers and tell when a disaster occurred. In fact the principal
sources of error were simple blunders on the part of the computer, and the compu-
tational tableaus of the time contained elaborate checks to guard against them (e.g.,
see [112,1951]). With the advent of the digital computer intermediate quantities were
not visible, and people felt the need of mathematical reassurance.
Nonetheless, the first rounding-error analysis of Gaussian elimination predated the
digital computer. The statistician Hotelling [186,1943] gave a forward error analysis
that predicted an exponential growth of errors and ushered in a brief period of pes-
simism about the use of direct methods for the solution of linear systems. This pes-
simism was dispelled in 1947 by von Neumann and Goldstine [331], who showed that
a positive definite system would be solved to the accuracy warranted by its condition.
This was essentially a weak stability result, but, as Wilkinson [348] points out, back-
ward error analysis was implicit in their approach. In an insightful paper [321,1948],
Turing also came close to giving a backward rounding-error analysis.
The first formal backward error analysis was due to Givens [145, 1954]. He showed
that the result of computing a Sturm sequence for a symmetric tridiagonal matrix
is the same as exact computations on a nearby system. However, this work appeared
only as a technical report, and the idea languished until Wilkinson's definitive paper on
the error analysis of direct methods for solving linear systems [344,1961]. Wilkinson
went on to exploit the technique in a variety of situations.
multipliers are bounded by one. Second, the multiplier of the backward error involves
the maximum element of all the intermediate Schur complements, not just the elements
of L and U. Although his analysis is componentwise, he is quick to take norms. The
first componentwise bound in the style of Theorem 4.4 is due to Chartres and Geuder
[65, 1967], and their bound is essentially the same, though not expressed in matrix
form.
Inverses
The fact that there is no backward error analysis of matrix inversion was first noted
by Wilkinson [344,1961]. But because triangular systems from Gaussian elimination
tend to be solved accurately, the computed inverse will generally be near the inverse
of a slightly perturbed matrix. Unfortunately, as we have seen (Example 4.11), near is
not good enough for the invert-and-multiply method for solving linear systems. For
this reason, the invert-and-multiply algorithm has rightly been deprecated. However,
if the matrix in question is known to be well conditioned, there is no reason not to use
it. A trivial example is the solution of orthogonal systems via multiplication by the
transpose matrix.
The backward error analysis of the LU decomposition does not imply that the com-
puted L- and U-factors are accurate. In most applications the fact that the product LU
reproduces the original matrix to working accuracy is enough. However, there is a con-
siderable body of literature on the sensitivity of the decomposition [18, 315,304,308].
For a summary of these results and further references see [177].
Growth factors
Definition 4.6, in which growth factors are defined in terms of norms, is somewhat
unconventional and has the drawback that one needs to know something special about
\L\ and \U\ to compute them. Usually the growth factors are defined by something
like (4.14), with the assumption that a pivoting strategy has kept the elements of L
under control. Whatever the definition, one must choose whether to work with the
exact factors or the computed factors.
The matrix in Example 4.13, which shows maximal growth under partial pivot-
ing, is due to Wilkinson [344]. N. J. and D. J. Higham [178] show that any matrix
that attains that growth must be closely related. The observation that Hessenberg
and tridiagonal matrices have reasonable bounds for their growth factors is also due
to Wilkinson [344]. Trefethen and Schreiber [320] have made an extensive investiga-
tion of pivot growth in random matrices. Higham and Higham [178] have exhibited
orthogonal matrices that exhibit modest growth. For a practical example in which par-
tial pivoting fails see [121].
The bound (4.15) for Gaussian elimination with complete pivoting is due to Wil-
kinson, who observed that it could not be attained. For further references on complete
pivoting, see [177].
Wilkinson [344] showed that pivoting was unnecessary for positive definite ma-
trices and matrices that are diagonally dominant by columns. That the same is true
of matrices that are diagonally dominant by rows is obvious from the fact that Gauss-
ian elimination by rows or columns gives the same sequence of Schur complements.
Cryer [81] established the nonnegativity of the L- and U-factors of totally positive ma-
trices; the connection with the stability of Gaussian elimination was made by de Boor
and Pinkus [90].
Scaling
Bauer [19,1963] was the first to observe that scaling affects Gaussian elimination only
by changing the choice of pivots. Equal-error scaling is recommended by the authors
of LINPACK [99]. For another justification see [292].
A strategy that was once in vogue was to scale to minimize the condition number
of the matrix of the system (e.g., see [20]). Given the phenomenon of artificial ill-
conditioning, the theoretical underpinnings of this strategy are at best weak. It should
be noted, however, that balancing the elements of a matrix tends to keep the condition
number in the usual norms from getting out of hand [20, 324, 323, 310].
Since row and column scaling use 2n−1 free parameters to adjust the sizes of the
n² elements of a matrix, any balancing strategy must be a compromise. Curtis and
Reid [83] describe an algorithm for balancing according to a least squares criterion.
Iterative refinement
Iterative refinement is particularly attractive on machines that can accumulate inner
products in double precision at little additional cost. But the double-precision calcula-
tion of the residual is difficult to implement in general software packages. The authors
of LINPACK, who were not happy with mixed-precision computation, did not include
it in their package. They noted [99, p. 1.8], "Most problems involve inexact input data
and obtaining a highly accurate solution to an imprecise problem may not be justified."
This is still sound advice.
The fact that iterative refinement with single-precision computation of the residual
could yield componentwise stable solutions was first noted by Skeel [282,1980]. For
a complete analysis of this form of the method see [177]. For implementation details
see the LAPACK code [9].
4
THE QR DECOMPOSITION AND LEAST SQUARES
The extension of the results of this chapter to complex matrices is not difficult. The
case where rank(X) < p will be treated in the next chapter.
1. THE QR DECOMPOSITION
The QR decomposition of X is an orthogonal reduction to triangular form—that is,
a decomposition of the form
where Q is orthogonal and R is upper triangular. We will begin this section by es-
tablishing the existence of the QR decomposition and describing its properties. In the
next subsection we will show how to compute it by premultiplying X by a sequence
of simple orthogonal matrices called Householder transformations. In the following
section we will introduce another class of orthogonal matrices—the plane rotations,
which are widely used to introduce zeros piecemeal into a matrix. We will conclude
with an alternative algorithm—the Gram-Schmidt algorithm.
1.1. BASICS
In this subsection we will establish the existence of the QR decomposition and give
some of its basic properties.
is orthogonal. Since the columns of Q_X form an orthonormal basis for the col-
umn space of X, we have Q_⊥ᵀX = 0. It follows that
where R is upper triangular with positive diagonal elements. The matrix R is unique,
as are the first p columns of Q.
Thus:
The R-factor of X is the Cholesky factor of XᵀX. The Q-factor is XR⁻¹.
Also
It is worth noting that (1.3) gives us two distinct representations of P_⊥. Although
they are mathematically equivalent, their numerical properties differ. Specifically, if
we have only a QR factorization of X, we must compute P_⊥y in the form
If there is cancellation, the resulting vector may not be orthogonal to X. On the other
hand, if we have a full QR decomposition, we can compute
This expresses P_⊥y explicitly as a linear combination of the columns of Q_⊥, and hence
it will be orthogonal to R(X) to working accuracy. We will return to this point when
we discuss the Gram–Schmidt algorithm (§1.4).
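
The difference between the two representations is easy to observe numerically. In the NumPy sketch below (an illustration; the variable names are my own), y lies almost entirely in the column space of X, so that forming y − QX(QXᵀy) suffers heavy cancellation, while Q⊥(Q⊥ᵀy) remains orthogonal to R(X) to working accuracy relative to its size.

import numpy as np

rng = np.random.default_rng(8)
n, p = 200, 5
X = rng.standard_normal((n, p))
Q_full, _ = np.linalg.qr(X, mode="complete")     # full QR decomposition
QX, Qperp = Q_full[:, :p], Q_full[:, p:]

# A vector almost entirely in the column space of X.
y = X @ rng.standard_normal(p) + 1e-10 * rng.standard_normal(n)

w1 = y - QX @ (QX.T @ y)       # from the QR factorization alone
w2 = Qperp @ (Qperp.T @ y)     # from the full QR decomposition

# Relative departure from orthogonality to the column space of X.
for name, w in [("y - QX QX'y ", w1), ("Qperp Qperp'y", w2)]:
    print(name, np.linalg.norm(X.T @ w) / np.linalg.norm(w))
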
The above formulas for projections are the ones customarily used by numerical
analysts. People in other fields tend to write the projection in terms of the original
matrix X. The formula, which we have already given in §4.2, Chapter 1, can be easily
derived from (1.2). If we write Q = XR⁻¹, then
This formula can be written more succinctly in terms of the pseudoinverse of X, which
is defined by
and
There are alternative expressions for X† in terms of the QR and singular value
factorizations of X. Specifically,
Then:
The QR factorization ofXi is Xi = Q\R\\.
If we compute the second column of the partition we get
Let Pj1 be the projection onto the orthogonal complement of Tl(Xi). Then from the
above equation
In other words:
The matrix Q_2 R_22 is the projection of X_2 onto the orthogonal complement of
R(X_1).
One final result. Consider the partitioned cross-product matrix
The right-hand side of this equation is a Cholesky factorization of the left-hand side.
By Theorem 1.6, Chapter 3, the matrix R_22^T R_22 is the Schur complement of X_1^T X_1.
Hence:
The matrix R_22 is the Cholesky factor of the Schur complement of X_1^T X_1 in
X^T X.
Then
Thus the singular values of X and R are the same, as are their right singular vectors.
Householder transformations
Before we introduce Householder transformations, let us look ahead and see how we
are going to use them to triangularize a matrix. Partition X in the form (x_1 X_2) and
let H be an orthogonal matrix whose first row is x_1^T/||x_1||_2. Then Hx_1 = ||x_1||_2 e_1. It
follows that
where
It is easy to see that the operation count for this algorithm is 2np flam, which is satis-
factorily small.
We must now show that Householder transformations can be used like elementary
lower triangular matrices to introduce zeros into a vector. The basic construction is
contained in the following theorem.
Hence
This algorithm takes a vector x and produces a vector u that generates a Householder
transformation H = I − uu^T such that Hx = ∓||x||_2 e_1. The quantity ∓||x||_2 is
returned in ν.
1. housegen(x, u, ν)
2.    u = x
3.    ν = ||u||_2
4.    if (ν = 0) u[1] = √2; return; fi
5.    u = x/ν
6.    if (u[1] > 0)
7.       u[1] = u[1] + 1
8.       ν = −ν
9.    else
10.      u[1] = u[1] − 1
11.   end if
12.   u = u/√|u[1]|
13. end housegen
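For readers who want to experiment, the following is a minimal NumPy sketch of Algorithm 1.1. The function name housegen follows the text, but the sketch assumes real arithmetic and returns its results rather than overwriting arguments; it is an illustration, not production code.

    import numpy as np

    def housegen(x):
        # Sketch of Algorithm 1.1: return (u, nu) with (I - u u^T) x = nu*e_1
        # and ||u||_2 = sqrt(2) unless x = 0.
        u = np.array(x, dtype=float)
        nu = np.linalg.norm(u)
        if nu == 0.0:
            u[0] = np.sqrt(2.0)      # any valid Householder vector will do when x = 0
            return u, nu
        u /= nu
        if u[0] >= 0.0:              # choose the sign that avoids cancellation in u[0]
            u[0] += 1.0
            nu = -nu
        else:
            u[0] -= 1.0
        u /= np.sqrt(abs(u[0]))
        return u, nu

    x = np.array([3.0, 4.0, 0.0])
    u, nu = housegen(x)
    print(x - u * (u @ x))           # approximately (nu, 0, 0) = (-5, 0, 0)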
(However, the alternate transformation can be computed stably. See the notes and ref-
erences.)
• If ||x||_2 ≠ 1, we can generate u from x/||x||_2, in which case
where ρ is a scalar of absolute value one chosen to make the first component of u non-
negative. We then proceed as usual. The resulting Householder transformation satisfies
Householder triangularization
Let us now return to the orthogonal triangularization of X. A little northwest index-
ing will help us derive the algorithm. Suppose that we have determined Householder
transformations H_1, ..., H_{k−1} so that
Hence:
Algorithm 1.2 requires (np^2 − p^3/3) flam.
When n ≫ p, the np^2 term dominates. On the other hand, when n = p, the count
reduces to (2/3)n^3 flam, which is twice the count for Gaussian elimination.
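As a concrete illustration of the reduction just described, here is a minimal NumPy sketch in the spirit of Algorithm 1.2, whose listing is not reproduced above. It reuses the housegen sketch given after Algorithm 1.1, assumes n >= p and real arithmetic, and stores the generating vectors in a separate array U rather than overwriting X; the name hqrd is an illustrative choice, not from the text.

    import numpy as np

    def hqrd(X):
        # Reduce X (n x p, n >= p) to triangular form by Householder transformations.
        # U[:, k] generates H_{k+1}; R is the p x p upper triangular factor.
        n, p = X.shape
        A = X.astype(float).copy()
        U = np.zeros((n, p))
        for k in range(p):
            u, nu = housegen(A[k:, k])       # housegen: the sketch given above
            U[k:, k] = u
            A[k, k] = nu
            A[k+1:, k] = 0.0
            v = u @ A[k:, k+1:]              # v^T = u^T * A[k:n, k+1:p]
            A[k:, k+1:] -= np.outer(u, v)    # apply the transformation to the remaining columns
        return U, np.triu(A[:p, :])

    # sanity check: accumulate Q = H_1 ... H_p and verify Q_X R = X
    Xtest = np.random.rand(7, 4)
    U, R = hqrd(Xtest)
    Q = np.eye(7)
    for k in range(4):
        Q[:, k:] -= np.outer(Q[:, k:] @ U[k:, k], U[k:, k])
    print(np.allclose(Q[:, :4] @ R, Xtest))  # True to working accuracy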
• If we partition X = (X_1 X_2), where X_1 has q columns, then H_1 ⋯ H_q is the
orthogonal part of the QR decomposition of X_1. Thus, having computed the factored
decomposition of X, we have a factored decomposition of every initial set of columns.
• The algorithm is backward stable, as we shall see in Theorem 1.5.
• The algorithm can be blocked, but the process is more complicated than with the
variants of Gaussian elimination. The reason is that the transformations must be fur-
ther massaged so that their effect can be expressed in terms of matrix-matrix opera-
tions. This topic is treated at the end of this subsection.
• The algorithm works when n < p. In this case the final matrix has the form
where R_11 is upper triangular. The operation count changes to (pn^2 − n^3/3) flam.
• The algorithm can be applied to a matrix that is not of full rank. Thus it gives a
constructive proof that matrices of any rank have a QR decomposition. However, R will be singular.
Computation of projections
After a Householder reduction of X, the orthogonal part of its QR decomposition is
given by
where m = min{n−1, p}. We will now show how to use this factored form to com-
pute projections of a vector y onto R(X) and its orthogonal complement.
Let the orthogonal part of the QR decomposition be partitioned as usual in the form
Let
Thus to compute P_X y all we have to do is to compute z = Q^T y, zero out the last n−p
components of z to get a new vector z_X, and then compute P_X y = Q z_X. Similarly, to
compute P_⊥ y we zero out the first p components of z and multiply by Q.
Algorithm 1.3 is an implementation of this procedure. It is easily seen that it
requires (2np − p^2) flam to perform the forward multiplication and the same for each
of the back multiplications. If n ≫ p, the total is essentially 6np flam, which compares
favorably with multiplication by an n×p matrix.
As another illustration of the manipulation of the Householder QR decomposition,
suppose we wish to compute the factor Q_X of the QR factorization. We can write this matrix in
the form
This algorithm takes a vector y and the output of Algorithm 1.2 and computes y_X =
P_X y and y_⊥ = P_⊥ y.
1. hproj(n, p, U, y, y_X, y_⊥)
2.    y_X = y
3.    for k = 1 to p
4.       ν = U[k:n, k]^T*y_X[k:n]
5.       y_X[k:n] = y_X[k:n] − ν*U[k:n, k]
6.    end for k
7.    y_⊥ = y_X
8.    y_⊥[1:p] = 0
9.    y_X[p+1:n] = 0
10.   for k = p to 1 by −1
11.      ν = U[k:n, k]^T*y_X[k:n]
12.      y_X[k:n] = y_X[k:n] − ν*U[k:n, k]
13.      ν = U[k:n, k]^T*y_⊥[k:n]
14.      y_⊥[k:n] = y_⊥[k:n] − ν*U[k:n, k]
15.   end for k
16. end hproj
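A NumPy transcription of Algorithm 1.3 may help in reading the index manipulations. It assumes U comes from the Householder reduction sketched earlier (the nonzero part of the kth vector stored in U[k:n, k]) and returns the two projections rather than overwriting arguments.

    def hproj(U, y):
        # Compute y_X = P_X y and y_perp = P_perp y from the factored form of Q.
        n, p = U.shape
        yx = y.astype(float).copy()
        for k in range(p):                        # forward: yx <- H_p ... H_1 y = Q^T y
            yx[k:] -= (U[k:, k] @ yx[k:]) * U[k:, k]
        yp = yx.copy()
        yp[:p] = 0.0                              # keep the trailing n-p components
        yx[p:] = 0.0                              # keep the leading p components
        for k in range(p - 1, -1, -1):            # back: multiply both parts by Q = H_1 ... H_p
            yx[k:] -= (U[k:, k] @ yx[k:]) * U[k:, k]
            yp[k:] -= (U[k:, k] @ yp[k:]) * U[k:, k]
        return yx, yp                             # yx + yp reproduces y to working accuracy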
Consequently, we can generate Q_X by computing the product of Q and (I_p 0)^T. The
algorithm is simplicity itself.
1. QX[1:p, :] = I_p
2. QX[p+1:n, :] = 0
3. for k = p to 1 by −1
4.    v^T = U[k:n, k]^T*QX[k:n, k:p]
5.    QX[k:n, k:p] = QX[k:n, k:p] − U[k:n, k]*v^T
6. end for k
Note that at the kth step of the algorithm it is only necessary to work with QX[k:n, k:p],
the rest of the array being unaffected by the transformation H_k. The operation count
for the algorithm is (np^2 − p^3/3) flam—the same as for Householder triangularization.
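The same loop is easy to express in NumPy; as above, U is assumed to hold the Householder vectors from the reduction, and the name hqx is only illustrative.

    import numpy as np

    def hqx(U):
        # Form Q_X, the first p columns of Q = H_1 ... H_p, from the stored vectors.
        n, p = U.shape
        QX = np.zeros((n, p))
        QX[:p, :] = np.eye(p)
        for k in range(p - 1, -1, -1):
            v = U[k:, k] @ QX[k:, k:]             # v^T = u_k^T * QX[k:n, k:p]
            QX[k:, k:] -= np.outer(U[k:, k], v)   # only rows k:n and columns k:p change
        return QX                                 # QX^T QX = I to working accuracy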
Numerical stability
The hard part about the error analysis of Householder transformations is to decide what
to prove. There are three problems.
The first problem is that Householder transformations are used both to triangu-
larize matrices and then later to compute such things as projections. We will sidestep
this problem by giving a general analysis of what it means to multiply a vector by a se-
quence of Householder transformations and then apply the general analysis to specific
cases.
The second problem is that when a transformation is used to introduce zeros into
a vector, we do not actually transform the vector but set the components to zero. For-
tunately, the error analysis can be extended to this case.
The third problem is that we must deal with three different kinds of transforma-
tions.
1. The transformations we would have computed if we had done ex-
act computations. We have been denoting these transformations
generically by H.
2. The transformations we would have computed by exact compu-
tation in the course of the inexact reduction. We will denote these
by H̃.
3. The transformations we actually apply. This includes the errors
made in generating the transformation (Algorithm 1.1) and those
made in applying the transformation via (1.7). We will use the fl
notation to describe the effects of these transformations.
The key to solving the problem of multiple classes of transformations is to forget about
the first kind of transformation, which is unknowable, and focus on the relation be-
tween the second and the third.
With these preliminaries, the basic result can be stated as follows.
Theorem 1.5 (Wilkinson). Let Q = H̃_1 ⋯ H̃_m be a product of Householder trans-
formations, and let b = fl(H̃_m ⋯ H̃_1 a). Then
where
(the right-hand side follows from the fact that H̃_{k+1}, ..., H̃_p operate only on the zero
part of fl(H̃_k ⋯ H̃_1)x_k). If, as above, we set Q^T = H̃_p ⋯ H̃_1, then by Theorem 1.5
there is a vector e_k such that
where r̃_k is the computed value of the kth column of R. From (1.11) and the fact that
x_k is multiplied by only k transformations we see that the kth column of E satisfies
This is the usual bound reported in the literature, but it should be kept in mind that it
is derived from the more flexible columnwise bound (1.12).
In assessing these bounds it is important to understand that they do not say that Q
and R are near the matrices that would be obtained by exact computation with X. For
example, the column spaces of X and X + E may differ greatly, in which case the
computed Q_X will differ greatly from its exact counterpart. This phenomenon is worth
pursuing.
Example 1.6. Let
while
These are clearly different spaces. And in fact the Q_X-factors of X and X + E are
and
A consequence of this example is that when we use (1.9) to compute Q_X, the
columns of the resulting matrix may not span R(X). However, we can use our error
analysis to show that the columns of the computed matrix—call it Q̃_X—are orthog-
onal to working accuracy. Specifically,
It follows from the exact orthogonality of the product H̃_1 ⋯ H̃_p that
where E_1 consists of the first p rows of E. Ignoring the second-order term, we have
This ability to produce almost exactly orthogonal bases is one of the strong points of
orthogonal triangularization.
Graded matrices
An important feature of the backward error bound (1.12) is that it is independent of the
scaling of the columns of X. Unfortunately, the backward error is not independent of
row scaling, as the following example shows.
which differs from the first in having its first and third rows interchanged. If we try
our procedure on this system, we get
In trying to find out what is going on it is important to keep in mind where the
mystery is. It is not mysterious that one can get inaccurate results. The error analysis
says that the normwise relative backward error ||E||_F/||A||_F is small. But each system
has a very small row, which can be overwhelmed by that error. In fact this is just what
has happened in the second system. The backward error is
The backward error in the first row is almost as large as the row itself.
The mystery comes when we compute the backward error in the first system:
This represents a very small relative error in each of the elements of the matrix A,
which accounts for the accuracy of the solution. But what accounts for the low relative
backward error?
There is no truly rigorous answer to this question. The matrices of these two sys-
tems are said to be graded, meaning their elements show an upward or downward trend
as we pass from the top to the bottom of the matrix. The second system grades up, and
it is easy to see why it is a disaster. When we normalize its first column, preparatory
to computing the first Householder transformation, only the rounded first digit of the
first component is preserved. The loss of information
in that first component is sufficient to account for the inaccuracy. (Actually, all the
elements in the first row are affected, and it is an instructive exercise to see how this
comes about.)
On the other hand if the matrix is graded downward, the results of Householder
reduction are often quite satisfactory. The reason is that the vectors generating the
Householder transformations tend to share the grading of the matrix. In this case when
we apply the transformation to a column of A in the form
the corresponding components of the terms a and (u^T a)u are roughly the same size
so that large components cannot wash out small ones. However, we cannot rule out
the possibility that an unfortunate cancellation of large elements will produce a u that
is not properly graded.
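The effect of row ordering is easy to observe experimentally. The following NumPy sketch builds a matrix whose rows are graded downward, computes its QR decomposition with LAPACK's Householder code (np.linalg.qr), and solves a linear system with the rows in both orders. The particular matrix and grading are only illustrative, and the outcome is typical rather than guaranteed.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((4, 4))             # a well-conditioned 4 x 4 matrix
    d = np.array([1.0, 1e-4, 1e-8, 1e-12])
    X_down = d[:, None] * A                     # rows graded downward
    X_up = X_down[::-1]                         # the same rows, graded upward

    def solve_by_qr(X, y):
        Q, R = np.linalg.qr(X)                  # Householder triangularization
        return np.linalg.solve(R, Q.T @ y)

    b_true = np.ones(4)
    for X in (X_down, X_up):
        b = solve_by_qr(X, X @ b_true)
        print(np.linalg.norm(b - b_true))       # typically tiny for X_down, much larger for X_up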
Example 1.8. Consider the matrix
Note that owing to rounding error the (2,2)-element, which should be exactly zero,
is at the level of the rounding unit and only an order of magnitude different from the
(3,2)-element. Consequently the next Householder transformation will be reasonably
balanced and will mix the (2,3)- and (3,3)-elements, largely destroying the latter. In
fact the backward error for the full reduction is
The relative backward error in the (3,3)-element is 5.4·10^{−2}—i.e., only two figures
are accurate.
It is worth observing that if we interchange the second and third columns of A, the
leading 2x2 matrix is well conditioned and the problem goes away. In this case the
backward error is
Blocked reduction
In §3.3, Chapter 2, we described a technique, called blocking, that could potentially
enhance the performance of algorithms on machines with hierarchical memories. For
Householder triangularization, the analogue of the algorithm in Figure 3.3, Chapter 2,
is the following. In Algorithm 1.2, having chosen a block size m, we generate the
Householder transformations (I − u_1 u_1^T), ..., (I − u_m u_m^T) from X[1:n, 1:m] but defer
applying them to the rest of the matrix until they have all been generated. At that point
we are faced with the problem of computing
in the form
where T is upper triangular. Specifically, we have the following theorem, which ap-
plies not just to Householder transformations but to any product of the form (1.16).
This algorithm takes a sequence of m vectors contained in the array U and returns
an upper triangular matrix T such that (I − u_1 u_1^T)(I − u_2 u_2^T) ⋯ (I − u_m u_m^T) =
I − UTU^T.
1. utu(m, U, T)
2.    for j = 1 to m
3.       T[j, j] = 1
4.       T[1:j−1, j] = U[:, 1:j−1]^T*U[:, j]
5.       T[1:j−1, j] = −T[1:j−1, 1:j−1]*T[1:j−1, j]
6.    end for j
7. end utu
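In NumPy the accumulation of T, together with a check of the identity (I − u_1u_1^T) ⋯ (I − u_mu_m^T) = I − UTU^T, might look as follows. The vectors here are random and merely scaled so that each factor is orthogonal; the identity itself does not depend on that scaling.

    import numpy as np

    def utu(U):
        # Sketch of Algorithm 1.4: unit upper triangular T with
        # (I - u_1 u_1^T) ... (I - u_m u_m^T) = I - U T U^T.
        n, m = U.shape
        T = np.zeros((m, m))
        for j in range(m):
            T[j, j] = 1.0
            T[:j, j] = -T[:j, :j] @ (U[:, :j].T @ U[:, j])
        return T

    rng = np.random.default_rng(2)
    n, m = 8, 3
    U = rng.standard_normal((n, m))
    U *= np.sqrt(2.0) / np.linalg.norm(U, axis=0)     # ||u_k||_2 = sqrt(2): each factor is orthogonal
    T = utu(U)
    P = np.eye(n)
    for k in range(m):
        P = P @ (np.eye(n) - np.outer(U[:, k], U[:, k]))
    print(np.allclose(P, np.eye(n) - U @ T @ U.T))    # True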
where T is unit upper triangular. We will call this the UTU form of the product. Note
that the vectors u_j appear unchanged in U. The only new item is the matrix T. The
procedure for generating T is implemented in Algorithm 1.4. Two comments.
• If the vectors are of length n, then the algorithm takes
(m^2 n/2 + m^3/6) flam.
This algorithm takes an n×p matrix X and a block size m and produces q = ⌈p/m⌉
orthogonal transformations I − U_k T_k U_k^T in UTU form such that
on X[:, 1:m]. The transformations are then put in UTU form, after which they can be
applied to the rest of the matrix as
The process is then repeated with the next set of m columns. We can use Algorithm 1.2
to reduce the blocks.
Algorithm 1.5 implements this procedure. Here are some observations.
• The transpose in statement 9 reflects the inconsistency between order of storage and
order of application in the UTU representation.
• If m is not large compared with p, the blocked algorithm requires about np^2 flam,
the same as for the unblocked algorithm. In this case, the overhead to form the UTU
representations is negligible.
• We have not tried to economize on storage. In practice, the vectors in U and the
matrix R would share the storage originally occupied by X. The matrices T_j could
occupy an m×p array (or a ½m×p array, if packed storage is used).
• The UTU form of the transformations enjoys the same numerical properties as the
original transformations. In particular the natural analogue of Theorem 1.5 holds.
• Because the application of the transformations in a block is deferred, one cannot
pivot columns for size while a block of transformations is accumulated. This is a serious
drawback to the algorithm in some applications.
• In the unblocked form of the algorithm it is possible to recover the QR decompo-
sition of any initial set of columns of X. Because the blocked algorithm recasts each
block of Householder transformations as a single UTU transformation, we can only
recover initial decompositions that are conformal with the block structure.
With Gaussian elimination, blocking is unlikely to hurt and may help a great deal.
For triangularization by Householder transformations the situation is mixed. If one
needs to pivot or get at initial partitions of the decomposition—as is true of many ap-
plications in statistics—then the blocked algorithm is at a disadvantage. On the other
hand, if one just needs the full decomposition, blocking is a reasonable thing to do.
This is invariably true when Householder transformations are used to compute an in-
termediate decomposition— as often happens in the solution of eigenvalue problems.
An important example is an upper Hessenberg matrix H, in which
only the subdiagonal has to be annihilated. In this case it would be inef-
ficient to apply the full Householder triangularization to H. Instead we should apply
2x2 transformations to the rows of H to put zeros on the subdiagonal (details later).
Now applying a 2 x 2 Householder transformation to a vector requires 3 fladd+4 flmlt.
On the other hand to multiply the same vector by a 2 x 2 matrix requires 2 fladd +
4 flmlt. If the order of X is large enough, it will pay us to reconstitute the Householder
transformation as a matrix before we apply it.
An alternative is to generate a 2 x 2 orthogonal matrix directly. The matrices that
are conventionally used are called plane rotations. This subsection is devoted to the
basic properties of these transformations.
Plane rotations
We begin with a definition.
where
The vector
Rotations would not be of much use if we could only apply them to 2-vectors.
However, we can apply them to rows and columns of matrices. Specifically, define a
rotation in the (i, j)-plane as a matrix of the form
The following algorithm generates a plane rotation from the quantities a and b. It over-
writes a with √(a^2 + b^2) and b with 0.
1. rotgen(a, b, c, s)
2.    τ = |a| + |b|
3.    if (τ = 0)
4.       c = 1; s = 0; return
5.    end if
6.    ν = τ*√((a/τ)^2 + (b/τ)^2)
7.    c = a/ν; s = b/ν
8.    a = ν; b = 0
9. end rotgen
In other words, a rotation in the (i, j)-plane is an identity matrix in which a plane ro-
tation has been embedded in the submatrix corresponding to rows and columns i and
j.
To see the effect of a rotation in the (i, j)-plane on a matrix, let X be a matrix and
let
The following function applies a rotation to two vectors x and y, overwriting the vec-
tors.
1. rotapp(c, s, x, y)
2.    t = c*x + s*y
3.    y = c*y − s*x
4.    x = t
5. end rotapp
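In NumPy the pair of routines might be sketched as follows; rotgen returns its results instead of overwriting a and b, which is a small departure from the text, and the scaling by τ follows the comment below.

    import numpy as np

    def rotgen(a, b):
        # Return (c, s, nu) with c*a + s*b = nu = sqrt(a^2 + b^2) and -s*a + c*b = 0.
        tau = abs(a) + abs(b)                    # scaling to avoid overflow and harmful underflow
        if tau == 0.0:
            return 1.0, 0.0, 0.0
        nu = tau * np.sqrt((a / tau) ** 2 + (b / tau) ** 2)
        return a / nu, b / nu, nu

    def rotapp(c, s, x, y):
        # Apply the rotation to the (NumPy) vectors x and y, overwriting them.
        t = c * x + s * y
        y[...] = c * y - s * x
        x[...] = t

    x, y = np.array([3.0, 1.0]), np.array([4.0, 2.0])
    c, s, nu = rotgen(x[0], y[0])
    rotapp(c, s, x, y)
    print(x, y)                                  # the first component of y is now zero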
• The scaling factor τ is introduced to avoid overflows and make underflows harmless
(see Algorithm 4.1, Chapter 2, for more details).
• Since the vectors in rotapp overwrite themselves, it is necessary to create a third
vector to contain intermediate values. In a real-life implementation one must take care
that the program does not call a storage allocator each time it is invoked.
• As a sort of shorthand we will write
rotapp(c, s,x,y)
even when x and y are scalars. In applications the operations should be written out in
scalar form to avoid the overhead of invoking rotapp.
• In a BLAS implementation the vectors x and y would be accompanied by strides
telling how they are allocated in memory.
• If we are computing in complex arithmetic, considerable savings can be effected
by scaling the rotation so that the cosine c is real. To multiply a complex 2-vector by
a complex plane rotation requires 16 flmlt + 4 fladd. If the cosine is real, this count
becomes 12 flmlt + 2 fladd. The price to be paid is that ν becomes complex.
Transformations that can introduce a zero into a matrix also have the power to
destroy zeros that are already there. In particular, plane rotations are frequently used
to move a nonzero element around in a matrix by successively annihilating it in one
position and letting it pop up elsewhere. In designing algorithms like this, it is useful
to think of the transformations as a game played on a Wilkinson diagram. Here are the
rules for one step of the game.
Begin by selecting two rows of the matrix (or two columns):
The meaning of this sequence is the following. The arrows in a particular diagram
point to the rows on which the plane rotation will operate. The X with a hat is the ele-
ment that will be annihilated. On the double arrow following the diagram is the name
of the rotation that effects the transformation. Thus the above sequence describes a
transformation P_45 P_34 P_23 P_12 X by a sequence of rotations in the (i, i+1)-plane that
successively annihilates the elements x_21, x_32, x_43, and x_54.
Algorithm 1.8 implements this procedure. The very simple code is typical of algo-
rithms involving plane rotations. An operation count is easily derived. The application
of a rotation to a pair of scalars requires 2 fladd and 4 flmlt. Since statement 3 performs
this operation about n−k times, we find on integrating from k = 0 to k = n that
Here we have introduced the notation "flrot" as an abbreviation for 2 fladd + 4 flmlt
(see Figure 2.1, Chapter 2).
Algorithm 1.8 has the disadvantage that it is row oriented. Now in many appli-
cations involving plane rotations the matrices are not very large, and the difference
between column and row orientation is moot. However, if we are willing to store our
rotations, we can apply them to each column until we reach the diagonal and then gen-
erate the next rotation. Algorithm 1.9 is an implementation of this idea. It should be
stressed that this algorithm is numerically the exact equivalent of Algorithm 1.8. The
only difference is the way the calculations are interleaved. Note the inefficient use of
rotapp with the scalars H[i, k] and H[i+1, k].
Numerical properties
Plane rotations enjoy the same stability properties as Householder transformations.
Specifically, Theorem 1.5 continues to hold when the Householder transformations are
replaced by plane rotations. However, in many algorithms some of the plane rotations
are nonoverlapping. For example, in Algorithm 1.8 each row is touched by at most
1. for k = 1 to n
2.    for i = 1 to k−1
3.       rotapp(c[i], s[i], H[i, k], H[i+1, k])
4.    end for i
5.    rotgen(H[k, k], H[k+1, k], c[k], s[k])
6. end for k
two rotations. This sometimes makes it possible to reduce the constant multiplying
the rounding unit in the error bounds.
Plane rotations tend to perform better than Householder transformations on graded
matrices. For example, if a plane rotation is generated from a vector whose grading is
downward, say
Thus it is a perturbation of the identity and will not combine small and large elements.
On the other hand, if the grading is upward, say
Thus the rotation is effectively an exchange matrix (with a sign change) and once again
does not combine large and small elements.
However, as with Householder transformations, we can prove nothing in general.
It is possible for an unfortunate cancellation to produce a balanced transformation that
combines large and small elements. In fact, the matrix A of Example 1.8 serves as a
counterexample for plane rotations as well as Householder transformations.
and we want to compute the QR factorization of (X_1 x_k), where as usual x_k is the kth
column of X.
The projection of x_k onto the orthogonal complement of R(X_1) = R(Q_1) is
Now x_k^⊥ cannot be zero, for that would mean that x_k lies in R(X_1). Hence if we define
It follows that
Given an n×p matrix X with linearly independent columns, this algorithm computes
the QR factorization of X.
1. for k = 1 to p
2.    Q[:, k] = X[:, k]
3.    if (k ≠ 1)
4.       R[1:k−1, k] = Q[:, 1:k−1]^T*Q[:, k]
5.       Q[:, k] = Q[:, k] − Q[:, 1:k−1]*R[1:k−1, k]
6.    end if
7.    R[k, k] = ||Q[:, k]||_2
8.    Q[:, k] = Q[:, k]/R[k, k]
9. end for k
Given an n×p matrix X with linearly independent columns, this algorithm computes
the QR factorization of X by the modified Gram-Schmidt method in a version that
constructs R column by column.
1. for k = 1 to p
2.    Q[:, k] = X[:, k]
3.    for i = 1 to k−1
4.       R[i, k] = Q[:, i]^T*Q[:, k]
5.       Q[:, k] = Q[:, k] − R[i, k]*Q[:, i]
6.    end for i
7.    R[k, k] = ||Q[:, k]||_2
8.    Q[:, k] = Q[:, k]/R[k, k]
9. end for k
Given an n×p matrix X with linearly independent columns, this algorithm computes
the QR factorization of X by the modified Gram-Schmidt method, in a version that
constructs R row by row.
1. Q = X
2. for k = 1 to p
3.    R[k, k] = ||Q[:, k]||_2
4.    Q[:, k] = Q[:, k]/R[k, k]
5.    R[k, k+1:p] = Q[:, k]^T*Q[:, k+1:p]
6.    Q[:, k+1:p] = Q[:, k+1:p] − Q[:, k]*R[k, k+1:p]
7. end for k
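A NumPy transcription of the row-oriented version of modified Gram-Schmidt above is short and may clarify the indexing; it assumes X has full column rank, and the name mgs is only an illustrative choice.

    import numpy as np

    def mgs(X):
        # Modified Gram-Schmidt, building R a row at a time (cf. the algorithm above).
        n, p = X.shape
        Q = X.astype(float).copy()
        R = np.zeros((p, p))
        for k in range(p):
            R[k, k] = np.linalg.norm(Q[:, k])
            Q[:, k] /= R[k, k]
            R[k, k+1:] = Q[:, k] @ Q[:, k+1:]
            Q[:, k+1:] -= np.outer(Q[:, k], R[k, k+1:])
        return Q, R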
This process can be continued until all that is left is q_k r_kk, from which q_k and r_kk can be
obtained by normalization.
Algorithm 1.11 implements this scheme. It is called the modified Gram-Schmidt
algorithm—a slightly misleading name, since it is no mere rearrangement of the clas-
sical Gram-Schmidt algorithm but a new algorithm with, as we shall see, greatly dif-
ferent numerical properties.
Algorithm 1.11 builds up R column by column. A different interleaving of the
computations, shown in Algorithm 1.12, builds up R row by row. It should be stressed
that this algorithm is numerically the exact equivalent of Algorithm 1.11 in the sense
that it will produce exactly the same results in computer arithmetic. However, it is
Equivalently,
Now Q̄ is not orthonormal. But from the relation (1.22), we can show that there is an
orthonormal matrix Q̃ satisfying X + Ẽ = Q̃R̄, where ||ẽ_j|| ≤ ||e_j^(1)|| + ||e_j^(2)|| (the
proof of this fact is not trivial). From this and the bound (1.12) we get the following
theorem.
Theorem 1.12. Let Q̄ and R̄ denote the matrices computed by the modified Gram-
Schmidt algorithm in floating-point arithmetic with rounding unit ε_M. Then there is an
orthonormal matrix Q̃ such that
Here φ is a slowly growing function of n and p. Moreover, there is a matrix F such
that
Equation (1.23) says that the factor R computed by the modified Gram-Schmidt
algorithm is the exact R-factor of a slightly perturbed X. The bound is columnwise—
as might be expected, since scaling a column of X does not materially affect the course
of the algorithm. Unfortunately, Q̃ can have little to do with the computed Q̄.
Equation (1.25), on the other hand, says that the product of the factors we actually
compute reproduces X accurately. Unfortunately, there is nothing to ensure that the
columns of Q̄ are orthogonal. Let us look at this problem more carefully.
Loss of orthogonality
The classical and modified Gram-Schmidt algorithms are identical when they are ap-
plied to two vectors. Even in this simple case the resulting Q can be far from orthonor-
mal. Suppose, for example, that x_1 and x_2 are nearly proportional. If P_1^⊥ denotes the
projection onto the orthogonal complement of x_1, then P_1^⊥ x_2 will be small compared
to x_2. Now the Gram-Schmidt algorithm computes this projection in the form
where q_1 = x_1/||x_1||_2. The only way we can get a small vector out of this difference
is for there to be cancellation, which will magnify the inevitable rounding errors in x_2
and q_1. Rounding errors are seldom orthogonal to anything useful.
A numerical example will make this point clear.
is exactly orthogonal. Let us take the first column u_1 of U as x_1. For x_2 we round u_1
to three digits:
It is worth noting that if we write P_1^⊥ in the form u_2 u_2^T, then we can compute the
projection in the form
This vector is almost exactly orthogonal to x_1 (though it is not accurate, since there is
cancellation in the computation of the inner product u_2^T x_2). This is the kind of result
we would get if we used the basis Q_⊥ from Householder's triangularization to compute
the projection. Thus when it comes to computing projections, orthogonal triangular-
ization is superior to both versions of Gram-Schmidt.
When p > 2, the classical and modified Gram-Schmidt algorithms go their sep-
arate ways. The columns of Q produced by the classical Gram-Schmidt can quickly
lose all semblance of orthogonality. On the other hand, the loss of orthogonality in the
Q produced by the modified Gram-Schmidt algorithm is proportional to the condition
number of R. Specifically, we have the following theorem.
Theorem 1.14. Let X = QR be the QR factorization of X. Let Q̄ and R̄ be the
QR factors computed by the modified Gram-Schmidt algorithm in floating-point
arithmetic with rounding unit ε_M. Let Q̃ be the orthonormal matrix whose existence is
guaranteed by Theorem 1.12. Then there is a constant γ such that
Proof. From the bounds of Theorem 1.12, we can conclude that there is a constant γ
such that
Since X + E = Q̃R̄ and Q̃ is exactly orthonormal, the smallest singular value of R̄ is the
same as the smallest singular value of X + E, which is bounded below by σ − ||E||_F,
where σ is the smallest singular value of X and R (they are the same). Thus
we have
where U and V are random orthonormal matrices. Thus the singular values of X are
1, 10^{−1}, ..., 10^{−9} and κ_2(X) = 10^9. Both the classical Gram-Schmidt and modified
Gram-Schmidt algorithms were applied to this matrix with the following results.
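The table of results is not reproduced here, but an experiment of this kind is easy to run. The sketch below constructs a matrix with the stated singular values and compares ||I − Q^T Q|| for a classical Gram-Schmidt routine (defined here) and for the mgs sketch given after the row-oriented algorithm above; typically the classical algorithm loses essentially all orthogonality, while the modified algorithm loses roughly κ_2(X)·ε_M.

    import numpy as np

    def cgs(X):
        # Classical Gram-Schmidt, for comparison with the mgs sketch above.
        n, p = X.shape
        Q = np.zeros((n, p))
        R = np.zeros((p, p))
        for k in range(p):
            q = X[:, k].copy()
            if k > 0:
                R[:k, k] = Q[:, :k].T @ X[:, k]
                q -= Q[:, :k] @ R[:k, k]
            R[k, k] = np.linalg.norm(q)
            Q[:, k] = q / R[k, k]
        return Q, R

    rng = np.random.default_rng(3)
    n, p = 50, 10
    U, _ = np.linalg.qr(rng.standard_normal((n, p)))
    V, _ = np.linalg.qr(rng.standard_normal((p, p)))
    X = U @ np.diag(10.0 ** -np.arange(p)) @ V.T     # singular values 1, 1e-1, ..., 1e-9

    for name, alg in (("classical", cgs), ("modified", mgs)):
        Q, _ = alg(X)
        print(name, np.linalg.norm(np.eye(p) - Q.T @ Q))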
Reorthogonalization
The loss of orthogonality generated by the Gram-Schmidt algorithm is acceptable in
some applications. However, in others—updating, for example—we demand more.
Specifically, given a vector x and an orthonormal matrix Q we need to com-
pute quantities x_⊥, r, and ρ such that
1. ρ^{−1}||x_⊥||_2 = 1,
2. x = Qr + x_⊥ to working accuracy,
3. R(Q) ⊥ ρ^{−1}x_⊥ to working accuracy.
The first item says that x_⊥ ≠ 0 so that it can be normalized. The second item says that
if we set q = ρ^{−1}x_⊥, then x = Qr + ρq; i.e., r and ρ can be regarded as forming the last
column of a QR factorization. The third says that to working accuracy q is orthogonal
to R(Q). It is the last item that gives the Gram-Schmidt algorithms trouble. The cure
is reorthogonalization.
To motivate the reorthogonalization procedure, suppose that we have computed a
nonzero vector x_⊥ that satisfies
but that x_⊥ is not sufficiently orthogonal to R(Q). If we ignore rounding error and
define
then
By construction this new vector is exactly orthogonal to R(Q). It is not unreasonable to expect that
in the presence of rounding error the orthogonality of x_⊥ will be improved. All this
suggests the following iterative algorithm for orthogonalizing x against the columns
of Q.
1. x_⊥ = x
2. r = 0
3. while (true)
4.    s = Q^T x_⊥
5.    x_⊥ = x_⊥ − Qs
6.    r = r + s
7.    if (x_⊥ is satisfactory) leave the loop; fi
8. end while
9. ρ = ||x_⊥||_2
10. q = x_⊥/ρ
Let us see what happens when this algorithm is applied to the results of Exam-
ple 1.13.
Example 1.16. In attempting to orthogonalize
for which
where
Hence if
is small and x_⊥ lies almost exactly in R(Q)^⊥. This means that we can tell if the current
x_⊥ is satisfactory by choosing a tolerance α—e.g., α = 1/2—and demanding
and x_⊥ satisfy (1.32).
This analysis also suggests that the loop in (1.29) is unlikely to be executed more
than twice. The reason is that loss of orthogonality can occur only when x is very near
R(Q). If the loss of orthogonality is not catastrophic, the vector x_⊥ will not be near
R(Q), and the next iteration will produce a vector that is almost exactly orthogonal.
On the other hand, if there is a catastrophic loss of orthogonality, the vector x_⊥ will be
dominated by the vector e of rounding errors. This vector is unlikely to be near R(Q),
and once again the next iterate will give an orthogonal vector.
There still remains the unlikely possibility that the vector e and its successors are
all very near R(Q), so that the vectors x_⊥ keep getting smaller and smaller without
becoming orthogonal. Or it may happen that one of the iterates becomes exactly zero.
In either case, once the current x_⊥ is below the rounding unit times the norm of the
original x, we may replace x_⊥ with an arbitrary vector of the same norm, and the rela-
tion x = Qr + x_⊥ will remain valid to working accuracy. In particular, if we choose
This algorithm takes an orthonormal matrix Q and a nonzero vector x and returns a
vector q of norm one, a vector r, and a scalar ρ such that x = Qr + ρq to working
accuracy. Moreover, Q^T q ≈ 0 in proportion as the parameter α is near one.
1. gsreorthog(Q, x, q, r, ρ)
2.    ν = σ = ||x||_2
3.    x_⊥ = x
4.    r = 0
5.    while (true)
6.       s = Q^T x_⊥
7.       r = r + s
8.       x_⊥ = x_⊥ − Qs
9.       τ = ||x_⊥||_2
10.      if (τ/σ > α) leave the loop; fi
11.      if (τ > 0.1*ν*ε_M)
12.         σ = τ
13.      else
14.         ν = σ = 0.1*σ*ε_M
15.         i = index of the row of minimal 1-norm in Q
16.         x_⊥ = σ*e_i
17.      end if
18.   end while
19.   ρ = ||x_⊥||_2
20.   q = x_⊥/ρ
21. end gsreorthog
x_⊥ to have a significant component in R(Q)^⊥, then the iteration will terminate after
one or two further iterations. A good choice is the vector e_i, where i is the index of
the row of least 1-norm in Q.
Although the analysis we have given here is informal in that it assumes the exact
orthonormality of Q, it can be extended to the case where the columns of Q are nearly
orthonormal.
Algorithm 1.13 implements this reorthogonalization scheme. The value of ||x||_2
is held in ν. The current ||x_⊥||_2 is held in σ and the value of the new ||x_⊥||_2 is held in τ. Thus
statement 10 tests for orthogonality by comparing the reduction in ||x_⊥||_2. If that test
fails, the algorithm goes on to ask if the current x_⊥ is negligible compared to the orig-
inal x (statement 11). If it is not, another step of reorthogonalization is performed. If
it is, the original vector x is replaced by a suitable, small vector, after which the algo-
rithm will terminate after no more than two iterations.
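For experimentation, here is a NumPy sketch of Algorithm 1.13. The tolerance α and the factor 0.1 follow the text; the routine returns (q, r, ρ) instead of overwriting its arguments, and it assumes x is nonzero and that Q has orthonormal or nearly orthonormal columns.

    import numpy as np

    def gsreorthog(Q, x, alpha=0.5, eps=np.finfo(float).eps):
        # Orthogonalize x against the columns of Q with reorthogonalization,
        # returning (q, r, rho) with x = Q r + rho*q to working accuracy.
        nu = sigma = np.linalg.norm(x)
        xperp = x.astype(float).copy()
        r = np.zeros(Q.shape[1])
        while True:
            s = Q.T @ xperp
            r += s
            xperp -= Q @ s
            tau = np.linalg.norm(xperp)
            if tau / sigma > alpha:                  # statement 10: enough orthogonality gained
                break
            if tau > 0.1 * nu * eps:
                sigma = tau                          # reorthogonalize once more
            else:                                    # x lies in R(Q) to working accuracy;
                nu = sigma = 0.1 * sigma * eps       # restart with a small coordinate vector
                i = np.argmin(np.abs(Q).sum(axis=1))
                xperp = np.zeros_like(xperp)
                xperp[i] = sigma
        rho = np.linalg.norm(xperp)
        return xperp / rho, r, rho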
The QR decomposition
For historical comments on the QR factorization see §4.6, Chapter 1.
The pseudoinverse
In Theorem 3.23, Chapter 1, we showed that any matrix X of full column rank has a
left inverse. The pseudoinverse X^† = (X^T X)^{−1}X^T is
one of many possible choices—but an important one. It has the useful property that,
of all left-inverses, it has minimal Frobenius norm. This result (though not phrased
in terms of matrices) was essentially established by Gauss [133,1823] to support his
second justification of least squares. The modern formulation of the pseudoinverse is
due to Moore [236, 1920], Bjerhammar [34, 1951], and Penrose [259, 1955], all of
whom considered the case where X is not of full rank.
For full-rank matrices, the pseudoinverse is a useful notational device, whose for-
mulas can be effectively implemented by numerical algorithms. As with the matrix
inverse, however, one seldom has to compute the pseudoinverse itself. For matrices
that are not of full rank, one is faced with the difficult problem of determining rank—
usually in the presence of error. We will treat this important problem in the next chap-
ter.
The pseudoinverse is only one of many generalized inverses which have been pro-
posed over the years (Penrose's paper seems to have triggered the vogue). For a brief
introduction to the subject via the singular value decomposition see [310, §III.1.1]. For
an annotated bibliography containing 1776 entries see the collection edited by Nashed
and Rall [241].
Householder triangularization
Householder transformations seem first to have appeared in a text by Turnbull and
Aitken [322,1932], where they were used to establish Schur's result [274,1909] that
any square matrix can be triangularized by a unitary similarity transformation. They
also appear as a special case of a class of transformations in [117,1951]. Householder
[188,1958], who discovered the transformations independently, was the first to realize
their computational significance.
Householder called his transformations elementary Hermitian matrices in his The-
ory of Matrices in Numerical Analysis [189], a usage which has gone out of fashion.
Since the Householder transformation I − uu^T reflects the vector u through its or-
thogonal complement (which remains invariant), these transformations have also been
called elementary reflectors.
Householder seems to have missed the fact that there are two transformations that
will reduce a vector to a multiple of e_1 and that the natural construction of one of them
is unstable. This oversight was corrected by Wilkinson [343]. Parlett [252, 253] has
shown how to generate the alternative transformation in a stable manner.
Although Householder derived his triangularization algorithm for a square matrix,
he pointed out that it could be applied to rectangular matrices. We will return to this
point in the next section, where we treat algorithms for least squares problems.
Rounding-error analysis
The rounding-error analysis of Householder transformations is due to Wilkinson [346,
347]. Higham gives a proof of Theorem 1.5 in [177, §18.3].
Martin, Reinsch, and Wilkinson [225, 1968] noted that graded matrices must be
oriented as suggested in (1.14) to be successfully reduced by Householder transforma-
tions. Simultaneously, Powell and Reid [264,1968] showed that under a combination
of column pivoting on the norms of the columns and row pivoting for size the reduc-
tion is rowwise stable. Cox and Higham [77] give an improved analysis, in which
they show that the row pivoting for size can be replaced by presorting the rows. Un-
fortunately, these results contain a growth factor which can be large if an initial set of
rows is intrinsically ill conditioned—something that can easily occur in the weighting
method for constrained least squares (§2.4).
Blocked reduction
The first blocked triangularization by orthogonal transformations is due to Bischof
and Van Loan [33]. They expressed the product of k Householder transformations in
the form WY^T, where W and Y are n×k matrices. The UTU representation (Theo-
rem 1.9), which requires only half the storage, is due to Schreiber and Van Loan [273].
For an error analysis see [177, §18.4].
Plane rotations
Rotations of the form (1.19) were used by Jacobi [190, 1846] in his celebrated al-
gorithm for the symmetric eigenvalue problem. They are usually distinguished from
plane rotations because Jacobi chose his angle to diagonalize a 2 x 2 symmetric matrix.
Givens [145,1954] was the first to use them to introduce a zero at a critical point in a
matrix; hence they are often called Givens rotations.
For error analyses of plane rotations see [346, pp. 131-143], [142], and especially
[177, §18.5].
The superior performance of plane rotations on graded matrices is part of the folk-
lore. As Example 1.8 shows, there are no rigorous general results. In special cases,
however, it may be possible to show something. For example, Demmel and Veselic
[93] have shown that Jacobi's method applied to a positive definite matrix is superior
to Householder tridiagonalization followed by the QR algorithm. Mention should also
be made of the analysis of Anda and Park [8].
Storing rotations
In most applications plane rotations are used to refine or update an existing decom-
position. In this case the rotations are accumulated in the orthogonal part of the de-
composition. However, rotations can also be used as an alternative to Householder
transformations to triangularize an nxp matrix. If n > p, then the reducing matrix
must be stored in factored form—i.e., the rotations must be stored. If we store both the
sine and cosine, the storage requirement is twice that of Householder transformations.
We could store, say, the cosine c, and recover the sine from the formula s = √(1 − c^2).
However, this formula is unstable when c is near one. Stewart [289] shows how to
compute a single number from which both s and c can be stably retrieved.
Fast rotations
The operation counts for the application of a plane rotation to a matrix X can be re-
duced by scaling the rotation. For example, if c > s we can write the rotation
in the form
On the other hand if c < s we can write the rotation in the form
Thus we can apply the scaled rotations Q_k to X—at reduced cost because two of the
elements of Q_k are now one. The product of the scaling factors—one product for each
row of the matrix—can be accumulated separately.
This is the basic idea behind the fast rotations of Gentleman [141] and Hammar-
ling [170]. By a careful arrangement of the calculations it is possible to avoid the
square roots in the formation of fast rotations. The principal difficulty with the scaling
strategy is that the product of the scaling factors decreases monotonically and may un-
derflow. It is therefore necessary to monitor the product and rescale when it becomes
too small. Anda and Park [7] give a more flexible scheme that avoids this difficulty.
See also [214], [41, §2.3.3].
Reorthogonalization
Rice [268] experimented with reorthogonalization to keep the Gram-Schmidt algo-
rithms on track. Error analyses have been given by Abdelmalek [2]; Daniel, Gragg,
Kaufman, and Stewart [84] and by Hoffman [179]. In particular, Hoffman investigates
the effect of varying the value of α and concludes that for α = 1/2 both the classical and
modified Gram-Schmidt algorithms give orthogonality to working accuracy.
The twice-is-enough algorithm is due to Kahan and Parlett and is described in
[253, §6-9].
Given an n×p matrix X of rank p and an n-vector y, find a vector b such that
This problem, which goes under the name of the linear least squares problem, occurs
in virtually every branch of science and engineering and is one of the mainstays of
statistics.
Historically, least squares problems have been solved by forming and solving the
normal equations—a simple and natural procedure with much to recommend it (see
§2.2). However, the problem can also be solved using the QR decomposition—a pro-
cess with superior numerical properties. We will begin with the QR approach in §2.1
and then go on to the normal equations in §2.2. In §2.3 we will use error analysis and
perturbation theory to assess the methods. We then consider least squares problems
with linear equality constraints. We conclude with a brief treatment of iterative re-
finement of least squares solutions.
Now the second term in the sum ||z_X − Rb||_2^2 + ||z_⊥||_2^2 is constant. Hence the sum will
be minimized when ||z_X − Rb||_2^2 is minimized. Since R is nonsingular, the minimizer
is the unique solution of the equation Rb = z_X, and the norm at the minimum is ||y −
Xb||_2 = ||z_⊥||_2.
Since P_X = Q_X Q_X^T, we may calculate ŷ = Xb in the form ŷ = Q_X z_X. Similarly we
may calculate the residual vector r = y − Xb in the form r = Q_⊥ z_⊥. We summarize
the results in the following theorem.
Theorem 2.1. Let X be of full column rank and have a QR decomposition of the form
Then the solution of the least squares problem of minimizing ||y − Xb||_2 is uniquely
determined by the QR EQUATION
Moreover, the residual at the minimum is orthogonal to the column space of X.
The way we have established this theorem is worth some comments. In Theo-
rem 4.26, Chapter 1, where we proved that the projection of a vector y onto a space
R(X) minimizes the distance between y and R(X), we wrote an equation of the form
of X and a vector y and computes the solution b of the problem of minimizing ||y −
Xb||_2. It also returns the least squares approximation ŷ = Xb and the residual r =
y − Xb.
Algorithm 2.1 is a computational summary of Theorem 2.1. Here are some com-
ments.
• The algorithm is generic in that it does not specify how the matrix Q is represented.
One possibility is that the matrix Q is known explicitly, in which case the formulas in
the algorithm can be implemented as shown. However, for n at all large this alternative
is inefficient in terms of both storage and operations.
• The solution of the triangular system in statement 3 can be accomplished by the
BLAS xebuib (see Figure 2.2, Chapter 2).
• If Householder triangularization is used, the products in the algorithm can be cal-
culated as in Algorithm 1.2. In this case, if n ≫ p, the algorithm requires 6np flam.
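As a concrete, if naive, realization of the computation summarized in Theorem 2.1, the following NumPy sketch solves the least squares problem with an explicitly formed thin QR factorization. A production code would instead keep Q in the factored form discussed above; the name ls_qr is an illustrative choice.

    import numpy as np

    def ls_qr(X, y):
        # Solve min ||y - X b||_2 via the QR equation R b = Q_X^T y.
        Q, R = np.linalg.qr(X)            # thin factorization: Q is n x p
        z = Q.T @ y
        b = np.linalg.solve(R, z)         # back substitution
        yhat = Q @ z                      # least squares approximation
        return b, yhat, y - yhat          # solution, fit, residual

    rng = np.random.default_rng(4)
    X, y = rng.standard_normal((20, 5)), rng.standard_normal(20)
    b, yhat, r = ls_qr(X, y)
    print(np.linalg.norm(X.T @ r))        # the residual is orthogonal to R(X) to working accuracy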
This algorithm takes the output of Algorithm 1.9 and computes the solution b of the
problem of minimizing ||y − Hb||_2. It also returns the least squares approximation
ŷ = Hb and the residual r = y − Hb.
1. ŷ = y
2. for j = 1 to n
3.    rotapp(c[j], s[j], ŷ[j], ŷ[j+1])
4. end for j
5. xeuib(b, H[1:n, 1:n], ŷ[1:n])
6. r = 0; r[n+1] = ŷ[n+1]; ŷ[n+1] = 0
7. for j = n to 1 by −1
8.    rotapp(c[j], −s[j], ŷ[j], ŷ[j+1])
9.    rotapp(c[j], −s[j], r[j], r[j+1])
10. end for j
Unless p is very small, this is insignificant compared to the original reduction. This
algorithm is often called the Golub-Householder algorithm. For more see the notes
and references.
• If plane rotations have been used to reduce the matrix, the details of the computa-
tion will depend on the details of the original reduction. For example, Algorithm 2.2
illustrates how the computations might proceed when plane rotations have been used
to triangularize an augmented Hessenberg matrix as in Algorithm 1.9. Since Q^T is
represented in the form
where P_ij is a rotation in the (i, j)-plane, the vector ŷ must be computed in the form
Unlike Householder transformations, plane rotations are not symmetric, and this fact
accounts for the argument of —s[i] in statements 8 and 9.
• However the computations are done, ŷ and r will be nearly orthogonal. But if X is
ill conditioned, they may be far from R(X) and R(X)^⊥.
the columns of Q_X are orthogonal to working accuracy. This factorization can be used
to solve least squares problems. The QR equations remain the same, i.e.,
Since we have no basis for the orthogonal complement of R(X), we must compute
the residual in the form
Note that this is equivalent to using the classical Gram-Schmidt algorithm to orthogo-
nalize y against the columns of Q_X. By the analysis of reorthogonalization [see (1.32)
and (1.33)], we cannot guarantee the orthogonality of r to the columns of Q_X when
r is small. It is important to keep in mind that here "small" means with respect to the
rounding unit. In real-life problems, where y is contaminated with error, r is seldom
small enough to deviate much from orthogonality. However, if orthogonality is im-
portant— say r is used in an algorithm that presupposes it—reorthogonalization will
cure the problem.
Then
In terms of our least squares problem, this theorem says that if we compute the QR
factorization of the augmented least squares matrix (X y), we get
Let
Hence if we apply the Gram-Schmidt algorithm to the augmented least squares matrix,
we get the right-hand side of the QR equation. For stability we must use the modified form
of the algorithm. These considerations yield Algorithm 2.3.
This algorithm is stable in the sense that the computed solution b comes from a
small perturbation of X that satisfies the bounds of the usual form. The reason is the
connection between modified Gram-Schmidt and Householder triangularization (The-
orem 1.11). In fact, least squares solutions produced by Algorithm 2.3 are generally
more accurate than solutions produced by orthogonal triangularization.
The stability result also implies that ŷ = Q z_X will be a good approximation to
Xb, where b is the computed solution. Thus we do not have to save X and compute
ŷ in the form Xb. Unfortunately, the vector ρq can consist entirely of rounding error,
and the residual is best computed in the form y − ŷ.
Theorem 2.3. The least squares solution of the problem of minimizing ||y − Xb||_2
satisfies the NORMAL EQUATIONS
Then
Thus the element a_ij of A can be formed by computing the inner product x_i^T x_j. By
symmetry we need only form the upper (or lower) part of A. It follows that:
The cross-product matrix can be formed in (1/2)np^2 flam.
This should be compared with the count of np^2 flam for Householder triangularization
or the modified Gram-Schmidt algorithm.
For the outer-product method, partition X by rows:
Then
Thus the cross-product matrix can be formed by accumulating the sum of outer prod-
ucts in A.
An implementation of this method is given in Algorithm 2.4. If X is full, the algo-
rithm requires (1/2)np^2 flam, just like the inner-product algorithm. But if X contains zero
elements, we can skip some of the computations, which is what the test in statement 4
accomplishes. It is not hard to show that
If the jth column of X contains m_j zero elements, Algorithm 2.4 requires
This count shows that one should order the columns so that those having the most zeros
appear first. It also suggests that the potential savings in forming the cross-product of
a sparse least squares matrix can be substantial. This is one reason why the normal
equations are often preferred to their more stable orthogonal counterparts, which gain
less from sparseness.
There are variations on this algorithm. If, for example, on specialized computers
X is too large to hold in main memory, it can be brought in from a backing store several
rows at a time. Care should be taken that X is organized with proper locality on the
backing store. (See the discussion of hierarchical memories in §3.3, Chapter 2.)
Since the Cholesky factor of this matrix is the R-factor of the augmented least squares
matrix, we have from (2.4) that
The Cholesky factor of the augmented cross-product matrix is the ma-
trix
Thus decomposing the augmented cross-product matrix gives the matrix and right-
hand side of the QR equation as well as the square root of the residual sum of squares.
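A minimal sketch of the plain (unaugmented) normal-equations approach is given below for comparison with the QR route; it simply forms A = X^T X and c = X^T y and solves with a Cholesky factorization. The function name is illustrative, and the comparison against lstsq is only a sanity check on a well-conditioned example.

    import numpy as np

    def ls_normal_equations(X, y):
        # Form and solve the normal equations A b = c with a Cholesky factorization.
        A = X.T @ X
        c = X.T @ y
        L = np.linalg.cholesky(A)                       # A = L L^T
        return np.linalg.solve(L.T, np.linalg.solve(L, c))

    rng = np.random.default_rng(5)
    X, y = rng.standard_normal((30, 4)), rng.standard_normal(30)
    print(ls_normal_equations(X, y) - np.linalg.lstsq(X, y, rcond=None)[0])   # near zero here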
where ε'_M is the adjusted rounding unit [see (4.12), Chapter 2].
where G is small compared with A. (For simplicity we have omitted the error intro-
duced by the computation of c = X^T y.) Thus the method of normal equations has a
backward error analysis in terms of the cross-product matrix A.
However, the cross-product matrix is an intermediate, computed quantity, and it
is reasonable to ask if the errors in the cross-product matrix can be propagated back to
the original matrix X. The answer is: Not in general. It is instructive to see why.
The problem is to determine a matrix E such that
To give a sense of scale, we will assume that ||X||_2, and hence ||A||_2, is equal to one.
The first thing to note is that the very structure of the left-hand side of (2.6) puts
limits on the perturbation G. For any cross-product matrix (X + E)^(X + E) must be
at least positive semidefinite. Thus the perturbation G must be restricted to not make
any eigenvalue of A negative. (See §2.1, Chapter 3.)
Let σ_p be the smallest singular value of X, so that σ_p^2 is the smallest eigenvalue of
A. Let σ̃_p and σ̃_p^2 be the perturbed quantities. Then by the fourth item in Theorem 4.34,
Chapter 1, we must have
Even if G satisfies (2.7), the backward error E in (2.6) may be much larger than
G. To see this, we use the singular value decomposition to calculate an approximation
to E.
Let
If we set F_2 = 0 and assume that F_1 is small enough so that we can ignore the term
F_1^T F_1, then we get
from which it follows that if σ_i ≠ σ_j then φ_ij = ψ_ij. Hence
Note that this relation gives the proper result for i = j. It also works when σ_i = σ_j,
in which case it gives the solution for which φ_ij^2 + φ_ji^2 is minimal.
It follows that ||F_1||_F ≤ ||H||_F/2σ_p. In terms of the original matrices (remember
that the 2-norm is unitarily invariant),
Dividing by ||X||_F and remembering that ||X||_F ≥ ||A||_F and ||X^†||_F ≥ σ_p^{−1}, we get
is the condition number of X [see (1.28)]. Thus when we throw the error G in A back
on X, the error can grow by as much as κ_2(X)/2.
To summarize:
Moreover, if we attempt to project the error G back onto the least squares matrix
X, the resulting error E satisfies
In particular, when κ_2(X) is large enough, the bound on the normwise relative error
approaches one.
The error matrix G, which combines the error in forming the normal equations and the
error in solving the system, satisfies
(This bound includes the error in calculating c = X^T y, which has been thrown onto
A via Theorem 3.14, Chapter 3.) The constant γ_NE depends on n and p. As we have
seen, the error G cannot be projected back on X without magnification.
A more satisfactory result holds for a least squares solution b_QR computed from a
QR factorization obtained from either Householder triangularization or the modified
Gram-Schmidt algorithm. Specifically it is the solution of a perturbed problem
Again the constant γ_QR depends on the dimensions of the problem. The error E in-
cludes the errors made in computing the QR factorization and the errors made in solv-
ing the QR equation.
Theorem 2.4. Let A be nonsingular. Then for all sufficiently small G and any con-
sistent norm || · ||
Proof. By Corollary 4.19, Chapter 1, we know that for all sufficiently small G the
matrix A + G is nonsingular and ||(A + G)^{−1}|| is uniformly bounded. Hence the result
follows on taking norms in the equation
which is accurate if ||P|| is reasonably less than one [see (4.17), Chapter 1].
Now let Ab = c and (A + G)b̃ = c. Then
Hence
where, as usual, κ(A) = ||A|| ||A^{−1}||. Here we have used the notation "≲" to stress
that we have left out higher-order terms in the perturbation in the inequality.
To make comparisons easier, it will be convenient to write this bound in terms of
κ_2(X) = ||X||_2 ||X^†||_2. From (1.6), we have κ_2(X) = σ_1/σ_p, where σ_1 and σ_p are
the largest and smallest singular values of X. Similarly, the condition number of A is
the ratio of its largest to smallest singular value. But the singular values of A are the
squares of the singular values of X. Hence
Since κ_2(X) ≥ 1, squaring κ_2(X) can only increase it. This suggests that the per-
turbation theory of the normal equations may be less satisfactory than the perturbation
theory of the original least squares problem, to which we now turn.
where X^† is the pseudoinverse of X [see (1.4)]. Thus we can base the perturbation
theory for least squares solutions on a perturbation expansion for pseudoinverses—
just as we based our analysis of the normal equations on an expansion for the inverses.
We are looking for a first-order expansion of the matrix
By (2.10)
It follows that
Then
Since E_⊥^T X = 0,
Theorem 2.5. Let b = X^†y and b̃ = (X + E)^†(y + f). Then for sufficiently small E
Here E_X and f_X are the projections of E and f onto the column space
of X.
The first term on the right-hand side of (2.14) is analogous to the bound for linear
systems. It says that the part of the error lying in R(X) is magnified by κ_2(X) in the
solution.
The second term depends on κ_2^2(X) and is potentially much larger than the first
term. However, it is multiplied by ||r||_2/(||X||_2 ||b||_2), which can be small if the least
squares residual is small. But if the residual is not small, this term will dominate the
sum. The following example shows how strong this effect can be.
the first is exactly Xe and has norm about 6.4. The second is y_1 + r, where r is in
R(X)^⊥ and has the same norm as y_1. Figure 2.1 shows the effects of perturbations
in the least squares solutions. The first column gives the norm of the error. Following
the error are the relative errors in the perturbed solutions for y_1 and y_2, and below them
are the error bounds (computed without projecting the errors). Since y_1 and r are of
a size and κ_2(X) is around 10^4, we would expect a deterioration of about 10^8 in the
solution with the large residual—exactly what we observe.
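Figure 2.1 is not reproduced here, but the residual effect is easy to demonstrate. The sketch below builds a matrix with κ_2(X) about 10^4 and two right-hand sides having the same least squares solution, one with zero residual and one with a large residual orthogonal to R(X); perturbing X then typically degrades the large-residual solution by several more orders of magnitude. The sizes and seeds are arbitrary.

    import numpy as np

    rng = np.random.default_rng(6)
    n, p = 30, 5
    U, _ = np.linalg.qr(rng.standard_normal((n, p)))
    V, _ = np.linalg.qr(rng.standard_normal((p, p)))
    X = U @ np.diag(np.logspace(0, -4, p)) @ V.T     # kappa_2(X) = 1e4

    b = rng.standard_normal(p)
    y1 = X @ b                                       # zero-residual right-hand side
    r = rng.standard_normal(n)
    r -= U @ (U.T @ r)                               # r now lies in the orthogonal complement of R(X)
    y2 = y1 + 10.0 * r                               # same solution b, large residual

    E = 1e-8 * rng.standard_normal((n, p))           # a small perturbation of X
    for y in (y1, y2):
        b_pert = np.linalg.lstsq(X + E, y, rcond=None)[0]
        print(np.linalg.norm(b_pert - b) / np.linalg.norm(b))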
Theorem 2.7. Let b be the solution of the least squares problem of minimizing ||y −
Xb||_2. Let b_NE be the solution obtained by forming and solving the normal equations
Let b_QR be the solution obtained from a QR factorization in the same arithmetic. Then
where r = y − Xb is the residual vector. The constants γ are slowly growing functions
of the dimensions of the problem.
Comparisons
We are now in a position to make an assessment of the two chief methods of solv-
ing least squares problems: the normal equations and backward stable variants of the
QR equation. We will consider three aspects of these methods: speed, stability, and
accuracy.
• Speed. Here the normal equations are the undisputed winner. They can be formed
from a dense n×p matrix X at a cost of (1/2)np^2 flam. On the other hand, Householder tri-
angularization or the modified Gram-Schmidt algorithm requires np^2 flam. It is easier
to take advantage of sparsity in forming the normal equations (see Algorithm 2.4).
• Stability. Here unitary triangularization is the winner. The computed solution is
the exact solution of a slightly perturbed problem. The same cannot be said of the
normal equations. As the condition number of X increases, ever larger errors must
be placed in X to account for the effects of rounding error on the normal equations.
When κ_2(X) = 1/√ε_M, we cannot even guarantee that the computed normal equations
are positive definite.
• Accuracy. Here the QR approach has the edge—but not a large one. The perturba-
tion theory for the normal equations shows that κ_2^2(X) controls the size of the errors
we can expect. The bound for the solution computed from the QR equation also has a
term multiplied by κ_2^2(X), but this term is also multiplied by the scaled residual, which
can diminish its effect. However, in many applications the vector y is contaminated
with error, and the residual can, in general, be no smaller than the size of that error.
To summarize, if one is building a black-box least squares solver that is to run with
all sorts of problems at different levels of precision, orthogonal triangularization is the
way to go. Otherwise, one should look hard at the class of problems to be solved to
see if the more economical normal equations will do.
minimize ||y − Xb||_2
subject to Cb = d.
Moreover, we can vary b_2 in any way we like and the result still satisfies the trans-
formed constraint (2.17).
Now set
Then
Since b_1 is fixed and b_2 is free to vary, we may minimize ||y − Xb||_2 by solving the
least squares problem
Given an n×p matrix X of rank p and an m×p matrix C of rank m, this algorithm
solves the constrained least squares problem
Algorithm 2.6: The null space method for linearly constrained least squares
Once this problem has been solved we may undo the transformation (2.16) to get b in
the form
The algorithm just sketched is called the null-space method because the matrix V_2
from which the least squares matrix X_2 = XV_2 is formed is a basis for the null space
of the constraint matrix C. Algorithm 2.6 summarizes the method. Here are some
comments.
• We have left open how to determine the matrix V. A natural choice is by a variant
of orthogonal triangularization in which C is reduced to lower triangular form by post-
multiplication by Householder transformations. If this method is adopted, the matrix
(X₁ X₂) = XV can be formed by postmultiplying X by Householder transforma-
tions.
• We have also left open how to solve the least squares problem in statement 4. Any
convenient method could be used.
• If Householder triangularization is used throughout the algorithm, the operation
count for the algorithm becomes
• Two factors control the accuracy of the result. The first is the condition of the con-
straint matrix C. If C is ill conditioned, the vector b₁ will be inaccurately determined.
The second factor is the condition of X₂, which limits our ability to solve the least
squares problem in statement 4. For more, see the notes and references.
• The condition of X itself does not affect the solution. In fact, X can be of rank
less than p—just as long as X₂ is well conditioned. For this reason constraints on
the solution are often used to improve degenerate problems. This technique and its
relatives go under the generic name of regularization methods.
It is worth noting that on our way to computing the solution of the constrained
least squares problem, we have also computed the solution to another problem. For
the vector
satisfies the constraint Cb = d. Now any solution satisfying the constraint can be
written in the form
The vector b_min = V₁b₁ is the unique minimal norm solution of the
underdetermined system Cb = d.
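Here is a minimal NumPy sketch of the null-space method, under the assumption that C is m×p of full row rank; V comes from a full QR factorization of Cᵀ, which plays the role of the Householder reduction described above.

import numpy as np

def nullspace_ls(X, C, y, d):
    """Minimize ||y - X b||_2 subject to C b = d (C is m x p, rank m)."""
    m, p = C.shape
    # Full QR of C^T: C V = (L 0) with L lower triangular and V = (V1 V2).
    V, R = np.linalg.qr(C.T, mode='complete')
    V1, V2 = V[:, :m], V[:, m:]
    L = R[:m, :].T
    # b1 is fixed by the constraint: C V1 b1 = L b1 = d.
    b1 = np.linalg.solve(L, d)
    # b2 minimizes ||(y - X V1 b1) - (X V2) b2||_2.
    X1, X2 = X @ V1, X @ V2
    b2, *_ = np.linalg.lstsq(X2, y - X1 @ b1, rcond=None)
    # Undo the transformation: b = V1 b1 + V2 b2.
    return V1 @ b1 + V2 @ b2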
If we substitute this in the expression y − Xb, we obtain the reduced least squares prob-
lem
in which U₁ is upper triangular. Comparing the second row of this matrix with (2.20),
we see that pieces of the reduced least squares problem come from the Schur comple-
ment of C₁ in W. Thus we can solve the constrained problem by performing m steps
of Gaussian elimination on W and solving the least squares problem
for b₂. But the process as it stands is dangerous. If C₁ is ill conditioned, the
matrix X₁C₁⁻¹C₂ in the formula
will be large, and its addition to X₂ will cause information to be lost. (For more on
this point see Example 2.8, Chapter 3, and §4, Chapter 3.)
The ill-conditioning of C\ does not necessarily reflect an ill-conditioning of the
constrained least squares problem, which depends on the condition of the entire matrix
C. Instead it reflects the bad luck of having nearly dependent columns at the beginning
of the constraint matrix. The cure is to pivot during the Gaussian elimination. Since we
must move columns around, the natural pivoting strategy is complete pivoting for size.
However, we may choose pivots only from C, since we must not mix the constraints
with the matrix X.
These considerations lead to Algorithm 2.7. Some comments.
• The success of the algorithm will depend on how successful the pivoting strategy
is in getting a well-conditioned matrix in the first m columns of the array (assuming
that such a matrix exists). The folklore says that complete pivoting is reasonably good
at this.
When n > p this amounts to n(mp − ½m²) flam, which should be compared to the
count n(2mp − m²) flam from item two in the count (2.18) for the null-space method.
When m = p—i.e., when the number of constraints is large—the count be-
comes approximately ½np² flam plus terms of order p³. This should be compared with a
count of np² flam plus terms of order p³ from items one and two in the count for the
null-space method. This suggests that elimination is to be especially preferred when
the number of constraints is large.
where τ is a suitably large number. The rationale is that as τ increases, the size of the
residual d − Cb must decrease so that the weighted residual τ(d − Cb) remains of a
size with the residual y − Xb. If τ is large enough, the residual d − Cb will be so small
that the constraint is effectively satisfied.
The method has the appeal of simplicity—weight the constraints and invoke a
least squares solver. In principle, the weighted problem (2.22) can be solved by any
method. In practice, however, we cannot use the normal equations, which take the
form
The reason is that as τ increases, the terms τ²CᵀC and τ²Cᵀd will dominate XᵀX
and Xᵀy and, in finite precision, will eventually obliterate them. For this reason or-
thogonal triangularization is the preferred method for solving (2.22).
To determine how large to take τ, we will give an analysis of the method based on
orthogonal triangularization. Suppose that we have triangularized the first m columns
of the matrix
where
is orthogonal. If we were to stop here, we could solve the reduced least squares prob-
lem
or
or from (2.26)
If Q₂₂ were orthogonal, the solution of this problem would be the correct answer b₂
to the constrained problem. The following assertion shows that the deviation of Q₂₂
from orthogonality decreases as τ increases.
where
If this result is applied to (2.27) and (2.28), we find that there is an orthogonal matrix
Q̂₂₂ such that
where
These results can be used to determine τ. The bounds (2.30) are relative to the
quantities that g and G perturb. Thus the perturbations will be negligible if (γ/τ)² is
less than the rounding unit ε_M. In view of the definition (2.29) of γ, we should take
Now we cannot know C₁⁻¹ without calculating it. But if the columns of C are all of a
size, it is unlikely that C₁ would have a singular value smaller than ‖C‖₂ε_M, in which
case ‖C₁⁻¹‖₂ ≤ 1/(‖C‖₂ε_M). Replacing ‖C₁⁻¹‖₂ by this bound and replacing ‖X₁‖₂
by ‖X‖₂, we get the criterion
Stated in words, we must choose τ so that X is smaller than τC by a factor that is smaller
than the rounding unit.
The above analysis shows that increasing the weight τ increases the orthogonal-
ity of Q₂₂. The criterion (2.31) insures that even when C₁ is very ill conditioned Q₂₂
will be orthogonal to working accuracy. However, as we have observed in connection
with the method of elimination, when C₁ is ill conditioned the matrix X₂ will be in-
adequately represented in the Schur complement X̂₂ and hence in the matrix X̃₂ =
Q₂₂X̂₂.
We must therefore take precautions to insure that C₁ is well conditioned. In the
method of elimination we used complete pivoting. The appropriate strategy for House-
holder triangularization is called column pivoting for size. We will treat this method
in more detail in the next chapter (Algorithm 2.1, Chapter 5), but in outline it goes as
follows. At the kth stage of the reduction in Algorithm 1.2, exchange the column of
largest norm in the working matrix (that is, X[k:n, k:p]) with the first column. This
procedure is a very reliable, though not completely foolproof, method of insuring that
C₁ is as well conditioned as possible. Most programs for Householder triangulariza-
tion have this pivoting option.
Algorithm 2.8 summarizes the weighting method. Three comments.
• In statement 1, we have used the easily computable Frobenius norm in place of the
2-norm. This substitution makes no essential difference.
Given an n×p matrix X of rank p and an m×p matrix C of rank m, this algorithm solves
the constrained least squares problem
• The placement of τC and τd at the top of the augmented matrix causes the grad-
ing of the matrix to be downward and is essential to the success of the Householder
triangularization. For more see (1.14).
• The operation count for the algorithm is [(n + m)p² − ⅓p³] flam.
• Our analysis shows that we do not have to worry about unusually small elements
creating an unbalanced transformation that obliterates X₂, as in Example 1.8. It is not that
such a situation cannot occur—it occurs when C₁ is ill conditioned. But our pivoting
strategy makes the condition of C₁ representative of the condition of C as a whole.
Consequently, we can lose information in X₂ only when the problem itself is ill con-
ditioned.
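The following NumPy sketch follows the weighting scheme of Algorithm 2.8 in outline (Frobenius norms replace 2-norms, the weighted rows τC go on top, and an ordinary Householder QR replaces the pivoted triangularization of the algorithm, so this is only an approximation to the method as stated).

import numpy as np

def weighted_constrained_ls(X, C, y, d):
    """Approximately minimize ||y - X b||_2 subject to C b = d by weighting."""
    eps = np.finfo(float).eps
    # Choose tau so that ||X|| is smaller than ||tau*C|| by a factor below the rounding unit.
    tau = np.linalg.norm(X, 'fro') / (np.linalg.norm(C, 'fro') * eps)
    A = np.vstack([tau * C, X])              # weighted constraints on top: downward grading
    rhs = np.concatenate([tau * d, y])
    Q, R = np.linalg.qr(A)                   # Householder triangularization (no pivoting here)
    return np.linalg.solve(R, Q.T @ rhs)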
Let b and r be approximations to the solution and residual of the least squares problem
‖y − Xb‖₂ = min. This algorithm performs one step of iterative refinement via the
residual system.
1. g = y − r − Xb
2. h = −Xᵀr
3. Solve the system
       (Iₙ  X) (s)   (g)
       (Xᵀ  0) (t) = (h)
4. r = r + s;  b = b + t
This is the procedure we used to correct the seminormal equations (Algorithm 2.5).
Unfortunately, this algorithm does not perform well unless r is very small. To see
why, let b + h be the true solution. Then s = y − Xb = r + Xh, where r is the residual
at the solution. Since Xᵀr = 0, in exact arithmetic we have
so that b + d is the exact solution. However, in inexact arithmetic, the vector s will
assume the form
where ‖e‖₂/‖r‖₂ = γε_M for some small constant γ. Thus we are committed to solving
a problem in which Xh has been perturbed to be Xh + e. The relative error in Xh is
then
The first equation in this system, r + Xb = y, defines the residual. The second equation
Xᵀr = 0 says that the residual is orthogonal to the column space of X.
Iterative refinement applied to the residual system takes the form illustrated in Al-
gorithm 2.9. The chief computational problem in implementing this method is to solve
the general residual system
in which the zero on the right-hand side of (2.33) is replaced by a nonzero vector. For-
tunately we can do this efficiently if we have a QR decomposition of X.
Let
If we set
From this system we obtain the equations Rᵀs_X = h, s_⊥ = g_⊥, and Rt = g_X − s_X.
Once these quantities have been computed, we can compute s = Q_X s_X + Q_⊥ s_⊥.
Algorithm 2.10 implements this scheme. Two comments.
• We have written the algorithm as if a full QR decomposition were available, per-
haps in factored form. However, we can get away with a QR factorization. The key
observation is that s in statement 4 can be written in the form
s = Q_X s_X + P_⊥ g,
where ‖E_i‖/‖X‖ ≤ γε_M (i = 1, 2) for some constant γ depending on the norm and
the dimensions of the problem. This is not quite backward stability, since the matrix
X is perturbed in two different ways.
Let
1. g_X = Q_Xᵀg;  g_⊥ = Q_⊥ᵀg
2. Solve the system Rᵀs_X = h
3. Solve the system Rt = g_X − s_X
4. s = Q_X s_X + Q_⊥ g_⊥
Iterative refinement via the residual system works quite well. Unfortunately our
general analysis does not apply to this variant of iterative refinement because the con-
dition number of the system is approximately κ₂²(X). But if the special structure of
the system is taken into account, the refinement can be shown to converge at a rate
governed by κ₂(X)ε_M. The method can be used with double precision calculation of
g and h, in which case the iteration will converge to a solution of working accu-
racy. The behavior of the fixed precision version is more problematic, but it is known
to improve the solution.
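A NumPy sketch of one refinement step via the residual system, combining Algorithms 2.9 and 2.10; it assumes a full QR decomposition of X is available, and in the doubled-precision variant g and h would be accumulated in higher precision.

import numpy as np

def refine_residual_system(Q, R, X, y, b, r):
    """One refinement step for min ||y - X b||_2 via the residual system.

    Q, R: full QR decomposition of X (Q is n x n, R is p x p upper triangular).
    b, r: current approximations to the solution and residual."""
    n, p = X.shape
    g = y - r - X @ b                    # residual of the first block equation
    h = -X.T @ r                         # residual of the second block equation
    gx, gperp = Q[:, :p].T @ g, Q[:, p:].T @ g
    sx = np.linalg.solve(R.T, h)         # R^T s_X = h
    t = np.linalg.solve(R, gx - sx)      # R t = g_X - s_X
    s = Q[:, :p] @ sx + Q[:, p:] @ gperp
    return b + t, r + s

# Typical use: Q, R = np.linalg.qr(X, mode='complete'); R = R[:X.shape[1], :]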
1812]. Gauss returned to the subject in the 1820s with an extended memoir in three
parts [132, 133, 135]. In the first of these he proved the optimality of least squares
estimates under suitable assumptions. The result is commonly known as the Gauss-
Markov theorem, although Markov's name is spurious in this connection.
There is an extensive secondary literature on the history of least squares. Stigler
[311] gives an excellent account of the problem of combining observations, although I
find his treatment of Gauss deficient. Plackett [263] gives a balanced and entertaining
treatment of the priority controversy accompanied by many passages from the corre-
spondence of the principals. For a numerical analyst's view see the afterword in [140].
The QR approach
In his paper on unitary triangularization [188], Householder observed that when the
method was applied to a rectangular matrix the result was the Cholesky factor of the
matrix of the normal equations. But he did not give an algorithm for solving least
squares problems. I first heard of the QR approach from Ross Bums, who had been us-
ing plane rotations in the early 1960s to solve least squares problems [47]. But the real
birth year of the approach is 1965, when Gene Golub published a ground-breaking pa-
per on the subject [148,1965]. In it he showed in full detail how to apply Householder
triangularization to least squares. But more important, he pioneered the QR approach
as a general technique for solving least squares problems.
son [261] computes an LU factorization of the least squares matrix with pivoting to
insure that L is well conditioned and then applies the method of normal equations to
the problem ‖y − L(Ub)‖₂ = min. Once the solution Ub has been computed, b may
be found by back substitution.
For more on techniques involving Gaussian elimination, see [243].
Rounding-error analyses
Except for the additional error made in forming the normal equations, the analysis of
the method of normal equations parallels that of Theorem 4.9, Chapter 3. The error
analysis of the QR solution depends on what flavor of decomposition is used. For or-
thogonal triangularization the basic tool is Theorem 1.5 combined with an error analy-
sis of the triangular solve. It is interesting to note that, in contrast to linear systems, the
error is thrown back on y as well as X. For a formal analysis and further references,
see [177, Ch. 10].
Perturbation analysis
The first-order perturbation analysis of least squares solutions was given by Golub and
Wilkinson [155], where the κ₂²-effect was first noted. There followed a series of pa-
pers in which the results were turned into rigorous bounds [37,213,260,286]. Special
mention should be made of the paper by Wedin [336], who treats rank deficient prob-
lems. For surveys see [290] and [310]. For the latest on componentwise and backward
perturbation theorems, see [177, Ch. 19].
Iterative refinement
The natural algorithm (2.32) was proposed by Golub [148] and analyzed by Golub
and Wilkinson [154]. Iterative refinement using the residual system is due to Bjorck
[36, 35, 38]. For an error analysis of iterative refinement see [177, §19.5].
The term "residual system" is new. The matrix of the system is often called the
augmented matrix, an unfortunate name because it takes a useful general phrase out of
circulation. But the terminology is firmly embedded in the literature of optimization.
3. UPDATING
The moving Finger writes; and, having writ,
Moves on: nor all thy Piety nor Wit
Shall lure it back to cancel half a Line:
Nor all thy Tears wash out a Word of it.
Omar Khayyam
We all make mistakes. And when we do, we wish we could go back and change things.
Unfortunately, Omar Khayyam said the last word on that.
By contrast, we can sometimes go back and undo mistakes in decompositions.
From the decomposition itself we can determine how it is changed by an alteration
in the original matrix — a process called updating. In general, an update costs far less
than recomputing the decomposition from scratch. For this reason updating methods
are the computational mainstays of disciplines, such as optimization, that must operate
with sequences of related matrices.
A nonblank entry means that there is an algorithm for the corresponding problem. The
symbol "±" means that we will treat both adding and deleting a row or column. The
lonely "−" means that there is no algorithm for adding a column to an R-factor—
the R-factor alone does not contain enough information to allow us to do it. Likewise,
we cannot perform a general rank-one update on an R-factor. We will go through this
table by columns.
Two general observations. First, in addition to its generic sense, the term "updat-
ing" is used in contrast with downdating—the process of updating an R-factor after a
row has been removed from the original matrix. As we shall see, this is a hard problem.
The term downdating is sometimes used to refer to removing a row or a column from
a QR decomposition or factorization. However, these problems have stable solutions.
For this reason we will confine the term downdating to R-factor downdating.
Second, the R-factor of a QR decomposition of X is the Cholesky factor of the cross-
product matrix A = XᵀX, and updating an R-factor can be formulated in terms of
A alone. For this reason the problem of updating R-factors is also called Cholesky
updating or downdating according to the task.
The algorithms in this section make heavy use of plane rotations, and the reader
may want to review the material in §1.3.
Woodbury's formula
Our goal is to compute the inverse of a matrix of the form A − UVᵀ. It will be con-
venient to begin with the case A = I.
Theorem 3.1 (Woodbury). Let U, V ∈ ℝ^(p×k). If I − UVᵀ is nonsingular, then so is
I − VᵀU, and

    (I − UVᵀ)⁻¹ = I + U(I − VᵀU)⁻¹Vᵀ.                    (3.2)

Proof. Suppose on the contrary that I − VᵀU is singular. Then there is a nonzero vector x with

    x − VᵀUx = 0.                                          (3.3)

Let y = Ux. Then y ≠ 0, for otherwise (3.3) would imply that x = 0. Multiplying
the relation x − Vᵀy = 0 by U, we find that y − UVᵀy = (I − UVᵀ)y = 0; i.e.,
I − UVᵀ is singular, contrary to hypothesis.
The formula (3.2) can now be verified by multiplying the right-hand side by I − UVᵀ
and simplifying to get I.
In most applications of this theorem we will have p > k and U and V will be of full
column rank. But as the proof of the theorem shows, neither condition is necessary.
Turning now to the general case, suppose that A is nonsingular. Then
Given x satisfying Ax = b, this algorithm computes the solution of the modified equa-
tion (A − uvᵀ)y = b.
1. Solve Aw = u
2. τ = vᵀx/(1 − vᵀw)
3. y = x + τw
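A direct transcription of Algorithm 3.1 in Python, under the assumption that A has already been factored once (here with SciPy's LU routines), so that each modified solve costs only O(n²).

import numpy as np
from scipy.linalg import lu_factor, lu_solve

def modified_solve(lu_piv, u, v, x):
    """Given x with A x = b and an LU factorization of A, return y with (A - u v^T) y = b."""
    w = lu_solve(lu_piv, u)              # 1. solve A w = u
    tau = (v @ x) / (1.0 - v @ w)        # 2. tau = v^T x / (1 - v^T w)
    return x + tau * w                   # 3. y = x + tau*w

# Example: lu_piv = lu_factor(A); x = lu_solve(lu_piv, b); y = modified_solve(lu_piv, u, v, x)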
To refactor A − uvᵀ from scratch would require O(n³) operations. Instead we can
compute the solution in O(n²) operations as follows.
From (3.5) we have
whose inverse is
Note that changing the element 1.499 to 1.500 makes A exactly singular.
If we apply the formula (3.5) to compute an approximation to A⁻¹ and round the
result to four digits, we get
If we now apply the formula again to compute (B − uvᵀ)⁻¹, which should be A⁻¹,
we get
The (2,2)-elements of A⁻¹ and C agree to no significant figures.
The size of B in this example is a tip-off that something has gone wrong. It says
that B was obtained by adding a large rank-one matrix to A⁻¹. When this matrix is
rounded, information about A is lost, a loss which is revealed by cancellation in the
passage from B to C. It should be stressed that the loss is permanent and will propagate
through subsequent updates. The cure for this problem is to update not the inverse but
a decomposition of the matrix that can be used to solve linear systems.
in which A₁₁ is nonsingular. If the first row of this system is solved for x₁, the result
is
If this value of x₁ is substituted into the second row of (3.6), the result is
Thus the matrix in (3.9) reflects the interchange of the vectors x₁ and y₁.
The sweep operator results from exchanging two corresponding components of x
and y. If, for example, the first components are interchanged, it follows from (3.9) that
the matrix of the system transforms as follows:
Given a matrix A of order p and a pivot k, this algorithm overwrites A with sweep(A, k).
1. sweep(A, k)
2.    A[k, k] = 1/A[k, k]
3.    A[k, 1:k−1] = −A[k, k]*A[k, 1:k−1]
4.    A[k, k+1:p] = −A[k, k]*A[k, k+1:p]
5.    A[1:k−1, 1:k−1] = A[1:k−1, 1:k−1] + A[1:k−1, k]*A[k, 1:k−1]
6.    A[1:k−1, k+1:p] = A[1:k−1, k+1:p] + A[1:k−1, k]*A[k, k+1:p]
7.    A[k+1:p, 1:k−1] = A[k+1:p, 1:k−1] + A[k+1:p, k]*A[k, 1:k−1]
8.    A[k+1:p, k+1:p] = A[k+1:p, k+1:p] + A[k+1:p, k]*A[k, k+1:p]
9.    A[1:k−1, k] = A[k, k]*A[1:k−1, k]
10.   A[k+1:p, k] = A[k, k]*A[k+1:p, k]
11. end sweep
Algorithm 3.2 defines the sweep operator for an arbitrary pivot k. There are sev-
eral appealing facts about the sweep operator. In what follows we assume that the in-
dicated sweeps can actually be performed.
• Since two exchanges of the same two components of x and y leave the system un-
changed, sweep(A, k) is its own inverse.
• The sequence sweep(A, 1), sweep(A, 2), ..., sweep(A, k) yields a matrix of the
form (3.9). In fact, these sweeps can be performed in any order. A set of sweeps on
an arbitrary sequence of pivots yields a matrix of the form (3.9) but with its parts dis-
tributed throughout the matrix according to the sequence of pivots. In particular, after
the sweeps the submatrix corresponding to the sweeps will contain its inverse, and the
complementary submatrix will contain its Schur complement.
• If we sweep through the entire matrix, the result is the inverse matrix at a cost of
p³ flam.
• One sweep requires p² flam.
• If we generalize Algorithm 3.2 to sweep the augmented matrix (A y), the solution
of the subsystem A₁₁x₁ = y₁ corresponding to the pivots will be found in the compo-
nents of the last column corresponding to the pivots.
• If A is positive definite then any sweep can be performed at any time. Specifi-
cally, the principal submatrix B corresponding to the pivots swept in is the inverse of
a positive definite matrix and hence is positive definite (Corollary 2.4, Chapter 3). The
complementary principal submatrix is the Schur complement of B and is also positive
definite (Theorem 2.6, Chapter 3). Hence the diagonals of a swept matrix are posi-
tive— i.e., the pivot elements are nonzero. Of course, we can take advantage of the
symmetry of A to save storage and operations.
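A NumPy sketch of the sweep operator of Algorithm 3.2, with 0-based indexing; sweeping twice on the same pivot restores the matrix, and sweeping on every pivot of a positive definite matrix produces its inverse.

import numpy as np

def sweep(A, k):
    """Perform sweep(A, k) in place (0-based pivot index k)."""
    p = A.shape[0]
    A[k, k] = 1.0 / A[k, k]
    rest = [i for i in range(p) if i != k]
    A[k, rest] = -A[k, k] * A[k, rest]                 # scale the pivot row
    for i in rest:                                     # update the off-pivot block
        A[i, rest] = A[i, rest] + A[i, k] * A[k, rest]
    A[rest, k] = A[k, k] * A[rest, k]                  # scale the pivot column
    return A

A = np.array([[4.0, 1.0], [1.0, 3.0]])
B = A.copy()
for k in range(2):
    sweep(B, k)
print(np.allclose(B, np.linalg.inv(A)))                # True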
The sweep operator is widely used in least squares calculations when it is neces-
sary to move variables in and out of the problem. To see why, consider the partitioned
augmented cross-product matrix
where A₁₁ is of order k. If we sweep on the first k diagonals of this matrix we get
Now A₁₁⁻¹c₁ is the solution of the least squares problem ‖y − X₁b₁‖₂² = min. More-
over, ρ² is the Schur complement of A₁₁ in
and hence is the square of the (k, k)-element of the Cholesky factor of the same matrix
[see (2.2), Chapter 3]. Hence by (2.5), ρ² is the residual sum of squares. Under a
classical statistical model, ρ²A₁₁⁻¹ is an estimate of the covariance matrix of b₁. Given
these facts, it is no wonder that the sweep operator is a statistician's delight.
The stability properties of the sweep operator are incompletely known. It is closely
related to a method called Gauss-Jordan elimination [see (1.32), Chapter 3], and for
positive definite matrices it is probably at least as good as invert-and-multiply. But
such a statement misses an important point about updating algorithms. We not only
require that our algorithm be stable—at least in some sense—but we demand that
the stability be preserved over a sequence of updates.
Example 3.4. Suppose we use the sweep operator to compute the inverse of a matrix
with a condition number of 10ᵗ. Then we might reasonably expect a loss of t digits
in the solution (see Example 3.4, Chapter 3). Since the condition of the inverse is the
same as the condition of the original matrix, if we use the sweep operator to recompute
the original matrix, we would expect the errors in the inverse to be magnified by 10ᵗ,
giving a total error of 10²ᵗε_M. By repeating this process several times, we should be
able to obliterate even a well-conditioned matrix.
I tried this experiment with a positive definite matrix A whose condition is 10¹⁰
using arithmetic with a rounding unit of about 2·10⁻¹⁶. Below are the relative (norm-
wise) errors in the succession of computed A's.
The results of this example are confirmed by the fact that the sweep operator has
been used successfully in problems (e.g., subset selection) for which any significant
magnification of the error would quickly show itself. The sweep operator does not
have the stability of methods based on orthogonal transformations, but its defects re-
main bounded.
A general approach
The key idea is very simple. Let
Interchanging columns
Suppose we want to apply the strategy in (3.10) to interchange columns two and five
of a QR decomposition of an n×6 matrix X. The first step is to interchange columns
two and five of R to get the matrix
(The replacement of the fifth column by the second introduces some inconsequential
zeros not shown in the above diagram.)
We must now reduce this matrix to triangular form. It is done in two steps. First
use plane rotations to eliminate the elements in the spike below the first subdiagonal
in column two, as shown in the following diagram.
Let
loops are only two deep, it is easier to use the method of areas. Specifically, consider
the following diagram.
The shaded portion represents the part of the matrix to which the plane rotations are
applied during the reduction of the spike and the return to triangular form. Since each
application represents a flrot (2 fladd + 4 flmlt), the number of flrots is equal to twice the
area of the shaded portion—i.e., (2p − ℓ − m)(m − ℓ) flrot. Finally, the algorithm gen-
erates a total of 2(m − ℓ) plane rotations, each of which must be applied to the columns
of Q for a count of 2(m − ℓ)n flrot. To summarize:
Algorithm 3.3 requires
• According to the table (3.1) in the introduction to this section, we need to also show
how to update the QR factorization and the R-factor under column interchanges. How-
ever, Algorithm 3.3 works for the QR factorization, since the rotations are applied only
to the first p columns of Q. The operation count remains the same. For the R-factor
all we have to do is suppress the updating of Q (statements 6 and 11). The operation
count is now equal to item 1 in (3.11).
• When the algorithm is applied to two contiguous columns—say columns ℓ and
ℓ+1—the operation count is (p − ℓ + n) flrot, about one half the value given in (3.11).
This is because the reduction of the spike is bypassed. In particular, if we want to move
a column to a position in X and do not care what becomes of the other columns, it will
be faster to implement it as a sequence of contiguous interchanges. For example, in
moving column ℓ to the east side of the matrix, the code
1. for k = ℓ to p−1
2. qrdexch(R, Q, k, k+1)
3. end for k
is preferable to
1. qrdexch(R, Q, ℓ, p)
Of course, if we have to do this task frequently, it may pay to code it explicitly to avoid
the overhead in invoking qrdexch—especially if p is small.
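For experiments, SciPy's QR updating routines can play the role of qrdexch and its relatives. The sketch below moves a column of X to another position by deleting it from an existing QR decomposition and reinserting it (0-based indices; the routine names and signatures are SciPy's, not the text's).

import numpy as np
from scipy.linalg import qr, qr_delete, qr_insert

def qr_move_col(Q, R, X, i, j):
    """Return a QR decomposition of X with column i moved to position j."""
    x = X[:, i].copy()
    Q1, R1 = qr_delete(Q, R, i, which='col')       # remove column i
    return qr_insert(Q1, R1, x, j, which='col')    # reinsert it before position j

X = np.random.default_rng(1).standard_normal((8, 5))
Q, R = qr(X)
Qm, Rm = qr_move_col(Q, R, X, 1, 3)
Xm = np.insert(np.delete(X, 1, axis=1), 3, X[:, 1], axis=1)
print(np.allclose(Qm @ Rm, Xm))                    # True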
Example 3.5. We have already observed [see (2.4)] that the R-factor
of the augmented least squares matrix (X y) contains the solution of the least squares
problem ‖y − Xb‖₂² = min in the form b = R⁻¹z. Moreover, the residual sum of
squares is ρ².
Actually, we can use the same R-factor to solve several related problems. For if we
partition the augmented least squares matrix in the form (X₁ X₂ y) and the R-factor
correspondingly in the form
then it is easy to see that R₁₁⁻¹z₁ is the solution of the least squares problem ‖y −
X₁b‖₂ = min. Moreover, the residual sum of squares is ρ² + ‖z₂‖₂². Thus from the
single R-factor (3.13) we can compute any least squares solution corresponding to an
initial set of columns. By our updating procedures, we can make that initial set any-
thing we like. In particular, column exchanges in the augmented R-factor represent a
backward stable alternative to the sweep operator (Algorithm 3.2).
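A short NumPy illustration of Example 3.5: one R-factor of the augmented matrix (X y) yields the least squares solution and residual sum of squares for any initial set of columns (the code assumes those columns are linearly independent).

import numpy as np

def subset_ls_from_R(X, y, k):
    """Least squares solution for the first k columns of X, read from one R-factor of (X y)."""
    p = X.shape[1]
    S = np.linalg.qr(np.column_stack([X, y]), mode='r')   # (p+1) x (p+1) R-factor
    R11 = S[:k, :k]
    z1 = S[:k, p]
    b = np.linalg.solve(R11, z1)                 # b = R11^{-1} z1
    rss = S[p, p]**2 + np.sum(S[k:p, p]**2)      # rho^2 + ||z2||^2
    return b, rss

# Check against a direct solve with np.linalg.lstsq(X[:, :k], y) if desired.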
Then
Given a QR decomposition, this algorithm returns the QR decomposition after the ith
column has been removed.
1. qrdrmcol(R, Q, i)
2.    for j = i to p−1
3.       qrdexch(R, Q, j, j+1)
4.    end for j
5.    R = R[1:p−1, 1:p−1]
6. end qrdrmcol
The general procedure for removing the ith column is to move it to the extreme
east and drop the last column of R. Algorithm 3.4 uses qrdexch to interchange the ith
column into the last position [see (3.12)] and then adjust R. Two comments.
• The method of areas gives an operation count of
• The same algorithm will update a QR factorization X = Q_X R; however, the last
column of Q_X must also be dropped from the factorization. In Cholesky updating,
where only R is available, forget about Q.
is a QR decomposition of (X x).
1. qrdappcol(R, Q, x)
2.    x = Qᵀ*x
3.    R[1:p, p+1] = x[1:p]
4.    housegen(x[p+1:n], u, R[p+1, p+1])
5.    v = Q[:, p+1:n]*u
6.    Q[:, p+1:n] = Q[:, p+1:n] − v*uᵀ
7. end qrdappcol
n(3n − 2p) flam.
Once a column has been appended it can be moved to anywhere in the matrix using
Algorithm 3.3.
where R is upper triangular, then R is the R-factor of the updated decomposition and
Algorithm 3.6 implements this scheme. The last column of the updated Q is ac-
cumulated in a temporary scratch vector to make the algorithm easy to modify. Three
comments.
• The algorithm requires
1. ½p² flrot to update R,
2. np flrot to update Q.
1. qrdapprow(R, Q, x)
2.    Q[n+1, 1:p] = 0
3.    Q[n+1, p+1:n] = 0
4.    t = e_{n+1}
5.    for k = 1 to p
6.       rotgen(R[k, k], x[k], c, s)
7.       rotapp(c, s, R[k, k+1:p], x[k+1:p])
8.       rotapp(c, s, Q[:, k], t)
9.    end for k
10.   Q[:, n+1] = t
11. end qrdapprow
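A NumPy transcription of the R-factor part of Algorithm 3.6: the new row xᵀ is folded into R by p plane rotations. Updating Q is omitted, which is precisely the Cholesky-updating case; the rotation formulas are the standard ones and assume R has positive diagonal.

import numpy as np

def chol_update_row(R, x):
    """Return the R-factor obtained by appending the row x^T below the p x p triangle R."""
    R = R.copy()
    x = np.array(x, dtype=float)
    p = R.shape[0]
    for k in range(p):
        # Generate a rotation that zeros x[k] against R[k, k].
        r = np.hypot(R[k, k], x[k])
        c, s = R[k, k] / r, x[k] / r
        R[k, k] = r
        # Apply it to the trailing parts of row k of R and of x.
        t = c * R[k, k+1:] + s * x[k+1:]
        x[k+1:] = -s * R[k, k+1:] + c * x[k+1:]
        R[k, k+1:] = t
    return R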
be the matrix whose last row is to be removed. The algorithms differ depending on
what is being updated—the full decomposition, the factorization, or the R-factor—
and we will treat each case separately.
in which the bottom zero in the matrix on the right is a row vector. Suppose we can
determine an orthogonal matrix P such that
where R is upper triangular. The first of these conditions implies that QPᵀ has the
form
Given a QR decomposition of
1. qrdrmrow(R, Q)
2.    for j = n−1 to p+1 by −1
3.       rotgen(Q[n, n], Q[n, j], c, s)
4.       rotapp(c, s, Q[1:n−1, n], Q[1:n−1, j])
5.    end for j
6.    w[1:p] = 0
7.    for j = p to 1 by −1
8.       rotgen(Q[n, n], Q[n, j], c, s)
9.       rotapp(c, s, Q[1:n−1, n], Q[1:n−1, j])
10.      rotapp(c, s, w[j:p], R[j, j:p])
11.   end for j
12.   Q = Q[1:n−1, 1:n−1]
13. end qrdrmrow
be the factorization in question. Suppose that we can find a vector q and an orthonor-
mal matrix P such that
Hence
• If P_⊥e_n = 0, the procedure sketched above breaks down, since P_⊥e_n cannot be
normalized. However, in that case e_n is already in the column space of Q_X, and any
normalized vector that is orthogonal to ℛ(Q_X) will do. That is precisely what is re-
turned by gsreorthog.
Given a QR factorization of
1. qrfrmrow(R, Q)
2.    gsreorthog(Q, e_n, q, r, ρ)
3.    w[1:p] = 0
4.    for j = p to 1 by −1
5.       rotgen(q[n], Q[n, j], c, s)
6.       rotapp(c, s, q[1:n−1], Q[1:n−1, j])
7.       rotapp(c, s, w[j:p], R[j, j:p])
8.    end for j
9.    Q = Q[1:n−1, :]
10. end qrfrmrow
Let us consider the first step of this algorithm, in which we generate a rotation
from ρ₁₁ and ξ₁ and apply it to the first rows of X and R. We can represent this step in
the form
This relation, it turns out, is sufficient to allow us to derive ρ̄₁₁, c, s, r̄₂ᵀ, and x̄₂ᵀ from
(ρ₁₁ r₂ᵀ) and xᵀ.
We begin by observing that because we are working with an orthogonal transfor-
mation, we have ρ̄₁₁² + ξ₁² = ρ₁₁², or
we get
Finally,
Thus we have computed the first row of R̄. Since we know x̄₂, we may repeat the process
on the matrix
Given a triangular matrix R with positive diagonal elements and a vector x such that
A = RᵀR − xxᵀ is positive definite, this algorithm overwrites R with the Cholesky
factor of A.
1. chdd(R, x)
2.    for k = 1 to p
3.       r̄ = sqrt(R[k, k]² − x[k]²)
4.       c = r̄/R[k, k];  s = x[k]/R[k, k]
5.       R[k, k] = r̄
6.       R[k, k+1:p] = c⁻¹*(R[k, k+1:p] − s*x[k+1:p])
7.       x[k+1:p] = c*x[k+1:p] − s*R[k, k+1:p]
8.    end for k
9. end chdd
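The same algorithm in NumPy; it assumes RᵀR − xxᵀ is positive definite and will attempt the square root of a negative number otherwise, as discussed below for ill-conditioned problems.

import numpy as np

def chdd(R, x):
    """Return the Cholesky factor of R^T R - x x^T (mixed-rotation downdating)."""
    R = R.copy()
    x = np.array(x, dtype=float)
    p = R.shape[0]
    for k in range(p):
        rbar = np.sqrt(R[k, k]**2 - x[k]**2)
        c, s = rbar / R[k, k], x[k] / R[k, k]
        R[k, k] = rbar
        R[k, k+1:] = (R[k, k+1:] - s * x[k+1:]) / c    # hyperbolic formula
        x[k+1:] = c * x[k+1:] - s * R[k, k+1:]         # ordinary rotation, with the new row
    return R

# Check: np.allclose(chdd(R, x).T @ chdd(R, x), R.T @ R - np.outer(x, x))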
Downdating a vector
A special case of Cholesky downdating occurs when R = ν is a scalar and X = xᵀ
is a vector. In this case ν = ‖x‖₂, and downdating this quantity amounts to recom-
puting the norm after a component ξ has been removed from x. An important differ-
ence between vector downdating and downdating a general Cholesky factor is that it
is sometimes feasible to retain x, which can be used to recover from a failure of the
algorithm. A second difference is that in applications we may not need a great deal of
accuracy in the norm. (Pivoted orthogonal triangularization is an example. See §2.1,
Chapter 5.)
In principle the norm can be downdated by the formula
However, in a sequence of downdates this procedure may break down. To see why, let
ρ be the norm of the original vector and let ν be the norm of the current vector x. If y
is the vector of components we have already removed from x, then
From this equation we see that any attempt to reduce ν²/ρ² to a number near the round-
ing unit must produce inaccurate results. For in that case, the quantity ‖y‖₂²/ρ² must
be very near one, and the slightest change in either ‖y‖₂ or ρ will completely change
the result.
The cure for this problem is to make a tentative computation of the downdated norm.
If its ratio to ρ is satisfactory, use the computed value. Otherwise recompute the norm from x.
The details are contained in Algorithm 3.10.
The details are contained in Algorithm 3.10.
This algorithm takes the norm ν of a vector x and overwrites it with the norm of the
vector which is obtained by deleting the component ξ of x. The algorithm uses
and updates a quantity ρ, which should be initialized to ‖x‖₂ on first calling.
1. vecdd(ν, ρ, ξ, x)
2.    if (ν = 0) return; fi
3.    μ = max{0, 1 − (ξ/ν)²}
4.    if (μ*(ν/ρ)² > 100*ε_M)
5.       ν = ν*√μ
6.    else
7.       ν = ρ = ‖x‖₂
8.    end if
9. end vecdd
The quantity μ in the algorithm is a reduction factor that tells how much ‖x‖₂ is
reduced by the deletion of ξ. The total reduction of the square of the norm of the orig-
inal vector is then μ·(ν/ρ)². If this quantity is sufficiently greater than the rounding
unit, the value of ν is downdated. Otherwise, ν is computed directly from x, and ρ is
reinitialized to ν. The number 100 in statement 4 defines what is meant by "sufficiently
great." It enforces about two decimal digits of accuracy in ν.
Updating a factorization
Let X = Q_X R be the QR factorization of X. Let
Suppose we determine plane rotations P_{k,k+1} in the (k, k+1)-plane such that
Then
Since Pᵀ = P₁₂ ⋯ P_{p−1,p}P_{p,p+1}, the matrix H + νe₁vᵀ has the form illustrated
below for p = 5:
where R̃ is upper triangular and set Q̃_X equal to the first p columns of (Q_X q)PU,
then
X + uvᵀ = Q̃_X R̃
is the required factorization. The matrix U can be calculated as in Algorithm 1.8 as a
product of plane rotations.
Algorithm 3.11 is a straightforward implementation of this scheme. It requires
(2np + p²) flrot.
Updating a decomposition
The method for updating a decomposition is similar to the method for updating a fac-
torization. In analogy with (3.16), write
where t = Qᵀu. We then proceed to reduce t by plane rotations from the bottom
up to a multiple of e₁, accumulating the transformations in Q and R. The result is a
decomposition of the form
where H + νe₁vᵀ is zero below its first subdiagonal. If we reduce this matrix to upper
triangular form, accumulating the transformations in QP, we get the updated decom-
position.
1. qrfrnkl(R, Q, u, v)
2.    gsreorthog(Q, u, Q[:, p+1], t, r)
3.    R[p+1, :] = 0;  t[p+1] = r
4.    for k = p to 1 by −1
5.       rotgen(t[k], t[k+1], c, s)
6.       rotapp(c, s, R[k, k:p], R[k+1, k:p])
7.       rotapp(c, s, Q[:, k], Q[:, k+1])
8.    end for k
9.    R[1, :] = R[1, :] + t[1]*vᵀ
10.   for k = 1 to p
11.      rotgen(R[k, k], R[k+1, k], c, s)
12.      rotapp(c, s, R[k, k+1:p], R[k+1, k+1:p])
13.      rotapp(c, s, Q[:, k], Q[:, k+1])
14.   end for k
15.   Q = Q[:, 1:p];  R = R[1:p, 1:p]
16. end qrfrnkl
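SciPy packages the rank-one update of a QR decomposition directly; the following check (routine name and signature are SciPy's) performs the same task as Algorithm 3.11 for a full decomposition.

import numpy as np
from scipy.linalg import qr, qr_update

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 5))
u, v = rng.standard_normal(8), rng.standard_normal(5)

Q, R = qr(X)                               # full QR decomposition of X
Q1, R1 = qr_update(Q, R, u, v)             # QR decomposition of X + u v^T
print(np.allclose(Q1 @ R1, X + np.outer(u, v)))   # True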
The R-factor (1) contains no information about the number γ. Hence any attempt
to downdate by removing the first row of X must fail. On the other hand γ is fully
present in the Q-factor, and we can safely use Algorithm 3.8 to remove the first row
of X.
Updating
The error results for updating QR decomposition, QR factorizations, and R-factors all
have the same flavor. For definiteness, we will consider the QR factorization in detail
and then describe briefly how the results apply to the full decomposition and the R-
factor.
It is important to keep in mind that updating is not a one-time thing. The decom-
position that presents itself to the algorithm will have been computed with error, pre-
sumably from a sequence of previous updates. Thus our problem is not to bound the
error from a single update but to show how much the update adds to the error already
present.
To start things off, we will assume that we have a computed factorization Q̄_X R̄
that satisfies the following conditions (in some suitable norm ‖·‖).
The first inequality bounds the deviation of the columns of Q̄_X from orthogonality.
The second bounds the backward error in the computed decomposition. The number
ρ measures the size of the problem in a sense that will become clear a little later.
Now assume that this QR factorization is updated by any of the algorithms of this
section to give the new QR factorization Q̃_X R̃. Then this factorization satisfies
Here γ and δ are constants that depend on the dimensions of the problem.
To interpret these bounds, let us suppose we perform a sequence of updates on the
matrices X₀, X₁, .... Then if Q_{X_k} and R_k denote the kth computed factorization, we
have
This says that the Q-factors suffer a slow loss in orthogonality. Specifically, if γ is
an upper bound on the γ_i from the individual updates, then the loss of orthogonality is
bounded by (α + kγ)ε_M. Thus the deterioration in orthogonality grows at most linearly
in the number of updates k.
The bound for the backward error shows that the process has a memory. Specifi-
cally, if δ bounds the δ_k for the individual updates, then
Thus the backward error is small compared not to ‖R_k‖ but to the norm of the largest
R_i encountered in the sequence of updates. If one encounters a very large R-factor,
it will introduce a large backward error that stays around to harm subsequent smaller
updates.
This situation is analogous to the situation in Example 3.3, where Woodbury's for-
mula was used to update inverses. There a large inverse introduced large errors that
propagated to a subsequent inverse. However, there is an important difference. The
large inverse resulted from the presence of an ill-conditioned matrix—the Woodbury
update cannot pass through an ill-conditioned problem without losing accuracy. On
the other hand, an ill-conditioned R-factor does not have to be large. Consequently,
our QR updating algorithms can pass through ill-conditioning with no bad effects.
The same results hold for updating the QR decomposition. The orthogonality de-
teriorates linearly with k as does the backward error. Large intermediate matrices mag-
nify the error, which propagates to subsequent updates.
At first glance it would seem that these results cannot apply to updating the R-
factor, since no matrix Qx is computed. However, it can be shown that there is an
exactly orthonormal Qx such that the backward error bound holds. Thus the above
comments also apply to updating R-factors.
For convenience we have presented normwise bounds involving entire matrices.
However, if we exclude the general rank-one updating algorithms, the backward error
has columnwise bounds analogous to those in Theorem 1.5. For these algorithms the
statement about remembering large matrices may be modified to say that large columns
are remembered. But one large column does not affect the backward error in the other
columns.
Downdating
Algorithm 3.9 for downdating a Cholesky or R-factor is not backward stable. Nonethe-
less, it has a useful error analysis. Specifically, let the vector xᵀ be downdated from
R to give the matrix R̄. Then there is an orthogonal matrix Q such that
where
and associate the last row of F with xᵀ. But in that case the rest of F must be asso-
ciated with R, which is not a backward error analysis. We call this kind of a result
relational stability because a mathematical relation that must hold in exact arithmetic
holds approximately in the presence of rounding error.
Two facts make relational stability important. First, it continues to hold through a
sequence of downdates and updates. As with the updating algorithms, the error grows
slowly and is proportional to the largest R-factor in the sequence.
Second, if we pass to cross-product matrices, we have that
where E₁ consists of the first p rows of E. It follows that the computed R-factor is the
R-factor of a perturbation G of the exact downdated matrix RᵀR − xxᵀ. The norm
of G is bounded by
It follows that if the R-factor is not sensitive to perturbations in the cross-product ma-
trix, the result will be accurate.
The result also suggests a fundamental limitation on downdating. We have seen
[see (2.7)] that for RᵀR − xxᵀ to remain positive definite under a perturbation G, the
norm of G must be less than the square of the smallest singular value of the downdated
factor of RᵀR − xxᵀ—call it σ_p. From the bound (3.18) this will be true if
If κ₂(R) > 1/√ε_M, then this inequality fails. In other words, one should not expect
to successfully downdate matrices whose condition number is greater than the recip-
rocal of the square root of the rounding unit. In IEEE double-precision arithmetic,
this means that one should beware of matrices whose condition number is greater than
about 10⁸. In fact, with such matrices the downdating Algorithm 3.9 may fail in state-
ment 3 attempting to take the square root of a negative number.
the triangular decomposition from the normal equations, and consequently his updat-
ing technique could only be applied once.
Modern updating seems to have begun with the simplex method for linear pro-
gramming, in which the inverse of the matrix of active constraints is updated as the
constraints are swapped in and out (e.g., see [85]). Inverse updating is also used in
quasi-Newton methods for nonlinear optimization [87, 1959]. The first example of
QR updating was given by Golub in the same paper in which he showed how to use
Householder transformations to solve least squares problems [148,1965].
Updating inverses
According to Zielke [355], Woodbury's formula can be found as an incidental formula
in papers by Duncan [109, 1944] and Guttman [164, 1946]. Woodbury's formula ap-
peared explicitly in a technical report in 1950 [351]. Earlier Sherman and Morrison
gave formulas for special cases [278,279,280], and the general method is sometimes
called the Sherman-Morrison-Woodbury formula. Although the formula has its nu-
merical drawbacks (Example 3.3), it is an indispensable theoretical tool.
The sweep operator was introduced by Beaton [23,1964]. For a tutorial see [156].
The method was used by Furnival and Wilson [128] to select optimal subsets of regres-
sion variables. The observation that the errors do not grow exponentially—as would
be suggested by a naive analysis—is due to the author; but a formal analysis is lack-
ing. The operator is closely related to Gauss-Jordan elimination, which is discussed
in §1.6, Chapter 3.
Updating
The collection of updating algorithms has been assembled from various sources. Al-
gorithms for moving around columns may be found in LINPACK [99]. The algorithms
for updating a QR factorization are due to Daniel, Gragg, Kaufman, and Stewart [84].
The algorithm for appending a row to an R-factor is due to Golub [148], although he
used 2×2 Householder transformations rather than plane rotations.
Yoo and Park [352] give an alternative method, based on the relation of Gram-
Schmidt orthogonalization and Householder triangularization (Theorem 1.11), for re-
moving a row from a QR factorization.
It is not surprising that stable algorithms should exist for updating QR decompo-
sitions and factorizations. If we begin with a stable QR factorization—say QxR =
X + E — we can compute an update stably by reconstituting X + E from Qx and R,
making the modification, and recomputing the factorization. Thus the problem is not
one of the existence of stable updating algorithms but of finding algorithms that are
both stable and efficient.
There is no formal error analysis of all the updating algorithms presented here, and
the results in (3.17) are largely my own concoction. For appending a row, the result
follows from the standard error analyses of plane rotations; e.g., [346, pp. 131-143],
[142], and [177, §18.5].
Exponential windowing
In signal processing, the rows of X represent a time series. Since only the most re-
cent rows are pertinent to the problem, it is important to discard old rows. This can be
done by interleaving updates and downdates, a process called windowing. However,
a widely used alternative is to update the configuration
Cholesky downdating
There are three algorithms for downdating an R-factor: Saunders' method, the method
of hyperbolic rotations, and the method of mixed rotations presented here.
Saunders' method [271, 1972] is the algorithm used by LINPACK [99]. It was
shown to be relationally stable by Stewart [293], who introduced the term "downdat-
ing."
The method of hyperbolic rotations originated in an observation of Golub [149,
1969] that downdating could be regarded as updating with the row in question multi-
plied by the square root of —1. When this result is cast in terms of real arithmetic, it
amounts to multiplying by transformations of the form
where c² − s² = 1. This implies that c = cosh t and s = sinh t for some t. For
this reason matrices of the form P are called hyperbolic rotations. The method of hy-
perbolic rotations is not relationally stable, and in sequential application it can give
unnecessarily inaccurate results [307].
The method of mixed rotations is due to Chambers [62,1971], who in transcribing
the method of hyperbolic rotations wrote the formulas in the form used here. It is called
the method of mixed rotations because one updated component is computed from a
hyperbolic rotation and the other from an ordinary rotation. The proof that the method
is relationally stable is due to Bojanczyk, Brent, Van Dooren, and de Hoog [48]. The
implications of relational stability for sequences of updates and downdates are due to
Stewart [307].
The notion of hyperbolic rotations can be extended to Householder-like transfor-
mations. For the use of mixed Householder transformations in block updating, see
[49].
Downdating a vector
The algorithm given here first appeared as undocumented code in the LINPACK routine
SQRDC [99]. It seems to have lived on in other programs as a black box which no one
dared tamper with.
5
RANK-REDUCING DECOMPOSITIONS
X = UΣVᵀ,
where U and V are orthogonal.
Note that we have changed notation slightly from the more conventional notation of
§4.3, Chapter 1. There Σ was a diagonal matrix of order p. Here Σ is an n×p ma-
trix, with Σ[1:p, 1:p] a diagonal matrix containing the singular values. The change is
convenient because it puts partitions of U and V on an equal footing.
We will be particularly concerned with the case where U₁ and V₁ come from
partitions of the form
where Σ₁ contains the m largest singular values of X and Σ₂ the p−m smallest. We
will call the subspace spanned by U₁ the left superior subspace and call the subspace
spanned by U₂ the left inferior subspace. Together they will be called the left funda-
mental subspaces. Similarly we will call the subspaces spanned by V₁ and V₂ the right
superior and inferior subspaces—collectively, the right fundamental subspaces.
It should be stressed that the notion of fundamental subspaces is relative to the
integer m and requires a gap between σ_m and σ_{m+1} to be well defined (see Theorem
4.28, Chapter 1). However, in rank-reduction problems we will generally have such a
gap.
In what follows we will use a pair of simple algebraic relations. Specifically, it is
easy to verify that
The matrix Σ₁ is square, and if rank(X) ≥ m it is nonsingular. Thus (1.1) provides
a way of passing from a basis for a left (right) superior subspace to a basis for the
corresponding right (left) superior subspace.
where S is of order m and G or H (possibly both) are small. If G and H were
zero, the left and right fundamental subspaces of X would be spanned by
If G or H is nonzero but small, the bases at best approximate the fundamental sub-
spaces. Our concern here will be with assessing their accuracy. In addition we will
relate the singular values of S and F to those of X.
We will begin by partitioning the singular vectors of X in the form
where U₁₁ and V₁₁ are of order m. The columns of these partitions span the funda-
mental subspaces of X. By Theorem 4.37, Chapter 1, the singular values of U₁₂ or
U₂₁—they are the same—are the sines of the canonical angles between these sub-
spaces and the column spaces of (1.3). Similarly for V₁₂ and V₂₁. We will now show
how to bound these quantities.
Theorem 1.1. Let X be partitioned as in (1.2), where S is of order m, and let the sin-
gular vectors of X be partitioned as in (1.4), where U₁₁ and V₁₁ are of order m. Let
Let
be the sine and cosine between the left fundamental subspaces of X and their approx-
imations from (1.3). Similarly, let
be the sine and corresponding cosine for the right fundamental subspaces. If
Moreover, if s_U and s_V are less than one, then U₁₁, U₂₂, V₁₁, and V₂₂ are nonsingular,
and
and
Consequently, the choice τ = inf(S) gives the smaller bounds, and it is sufficient to
prove the theorem for that choice.
it follows that
it follows that
If we substitute (1.10) into (1.9) and replace c_U and c_V with the upper bound one, we
get
Solving this inequality for s_U we get the first bound in (1.5). The second inequality
follows similarly.
To establish (1.6) and (1.7), first note that if s_U, s_V < 1 then the canonical cosines
are all greater than zero. Since the canonical cosines are the singular values of the
matrices U₁₁, U₂₂, V₁₁, and V₂₂, these matrices are nonsingular.
We will now establish the first expression in (1.6). Multiply out the relation
to get
Now U₁₁ᵀ − U₂₁ᵀU₂₂⁻ᵀU₁₂ᵀ is the Schur complement of U₂₂ᵀ in Uᵀ, and by Theorem 1.6,
Chapter 3, it is the inverse of the (1,1)-block of U⁻ᵀ. But by the orthogonality of Uᵀ
that block is simply U₁₁. Hence U₁₁ᵀ − U₂₁ᵀU₂₂⁻ᵀU₁₂ᵀ = U₁₁⁻¹, and the first expression
in (1.6) follows directly.
The other expressions for Σ₁ and Σ₂ follow by similar arguments.
Let us examine what this theorem says in more detail. In what follows, we will let
Thus if ρ is small, there is a strong gap between the mth and (m+1)th singular values
of X.
• The bounds (1.5) say that the fundamental subspaces of X are O(ε) perturbations
of those of diag(S, F). Moreover, since
the cosines of the canonical angles between the subspaces are O(ε²) approximations
to one. In particular, since the canonical cosines are the singular values of the matrices
U₁₁, U₂₂, V₁₁, and V₂₂, these matrices are O(ε²) approximations to the orthogonal
matrices obtained by setting their singular values to one.
• An important phenomenon occurs when X is block triangular. Suppose, for ex-
ample, that H = 0 so that X is block lower triangular. Then the bounds (1.5) become
Thus the approximate left singular subspaces are better than the approximate right sin-
gular subspaces by a factor of ρ—the relative gap in the singular values.
• The expressions (1.6) and (1.7) for Σ₁ and Σ₂ imply that the singular values of S
and F are O(ε²) approximations to those of Σ₁ and Σ₂ respectively. For example, we
have already observed that the matrices U₁₁ and V₁₁ in the expression
are within O(ε²) of orthogonal matrices. It follows that S + HV₂₁V₁₁⁻¹ contains O(ε²)
approximations to the singular values of Σ₁. But ‖HV₂₁V₁₁⁻¹‖ = O(ε²), so that S
also contains O(ε²) approximations to those of Σ₁. It is straightforward to evaluate
bounds on the error given ε.
These bounds can be massaged in various ways. For example, if we observe that
(V₁₂ᵀ V₂₂ᵀ)ᵀ has orthonormal columns, we may conclude that
which gives an easily computable bound on the square root of the sum of squares of
the sines of the canonical angles.
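The quantities in Theorem 1.1 are easy to examine numerically. The sketch below (with arbitrary assumed blocks S, F, G, and H) computes the exact sines of the canonical angles between the superior subspaces of X and those of diag(S, F), and compares them with the ratio of the perturbation size to the gap, which is the scale on which the bounds of the theorem operate.

import numpy as np

rng = np.random.default_rng(2)
m, n, p = 3, 8, 6
S = np.diag([5.0, 4.0, 3.0])                         # dominant block
F = 0.1 * rng.standard_normal((n - m, p - m))        # block with small singular values
G = 1e-3 * rng.standard_normal((n - m, m))           # small off-diagonal blocks
H = 1e-3 * rng.standard_normal((m, p - m))
X = np.block([[S, H], [G, F]])

U, sv, Vt = np.linalg.svd(X)
# Sines of the canonical angles between the superior subspaces of X and
# the coordinate subspaces spanned by the first m unit vectors.
s_left = np.linalg.svd(U[m:, :m], compute_uv=False).max()
s_right = np.linalg.svd(Vt.T[m:, :m], compute_uv=False).max()
gap = np.linalg.svd(S, compute_uv=False).min() - np.linalg.svd(F, compute_uv=False).max()
scale = max(np.linalg.norm(G, 2), np.linalg.norm(H, 2)) / gap
print(s_left, s_right, scale)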
In some cases a little common sense will help, as in the following example.
Example 1.2. The matrix
Since both X and X̃ have been transformed by U and V, the bounds in Theorem 1.1
bound the canonical angles between the singular subspaces of X and X̃.
At first glance we seem not to have gained much, since the right-hand side of (1.12)
is unknowable. However, because our transformations are orthogonal we may bound
the quantities in the theorem by any bound ε on the norm of E:
Moreover, from the min-max characterization of singular values (Corollary 4.30, Chap-
ter 1) we have
a quantity which we have already computed and used. Thus we may apply Theo-
rem 1.1 to obtain completely rigorous bounds on the canonical angles between the
singular subspaces of X and X̃.
The bounds obtained from this procedure may be pessimistic. The reason is that
11E112 will generally be an overestimate for the quantities in the theorem. One cure is to
make probabilistic assumptions about the error and calculate estimates of the quanti-
ties. However, it would take us too far afield to develop this approach here. For more,
see the notes and references.
Singular subspaces
Singular subspaces are a natural analogue of the invariant subspace associated with
eigendecompositions. The term itself is comparatively new but is now well estab-
lished. The use of "superior" and "inferior" to denote singular subspaces associated
with leading and trailing sets of singular values is new. When X is of rank m and the
breaking point in the singular values is set at m, these spaces are the row, column, and
null spaces of X. Strang [313] calls these particular subspaces fundamental subspaces.
Following Per Christian Hansen, I have applied the term "fundamental" to the four su-
perior and inferior subspaces at any fixed break point m. In array signal processing
the right superior and inferior subspaces are called the signal and noise subspaces.
Rank determination
The approach taken here is basic common sense, refined somewhat to show its limita-
tions. The idea of looking for gaps in the singular values is natural and often recom-
mended. The assumptions (1.11) are less often emphasized by numerical analysts—
perhaps through overacquaintance with easy problems like the one in Example 1.2. It
is worth stressing that the gap must be reasonably large compared with the particular
error estimate e that one is actually using. Too large an e can cause a gap to be missed.
XWΛ^(−1/2) is effectively whitened, and the columns of this matrix are graded down-
ward. Although we cannot guarantee that our computational procedures will work
well with such matrices, by and large they do.
• Project out errorless columns. Suppose that the first k columns of X are error-
less. It has been shown in [150] that the appropriate way to handle this situation is to
project the last p-k columns onto the orthogonal complement of the space spanned
by the first k and to work with that matrix (see also [92]). In the general case, if we
compute the spectral decomposition of D then the initial columns of X W—the ones
corresponding to zero eigenvalues—are error free, and we can use the same proce-
dure.
• Solve a generalized eigenvalue problem. It can be shown that the squares of the
singular values are the eigenvalues of the generalized eigenvalue problem XᵀXv =
μDv. Consequently, we can form the cross-product matrix XᵀX and solve the gener-
alized eigenvalue problem. This is the way statisticians do it in treating measurement
error models [127]. The procedure is open to the same objections that apply to forming
the normal equations (§2.3, Chapter 4).
• Compute a generalized singular value decomposition. It can be shown that there
are orthogonal matrices Q_X and Q_D and a nonsingular matrix B such that the matrices
(Q_XᵀXB)[1:p, 1:p] and Q_DᵀDB are diagonal. The ratios of their diagonal elements
are the singular values of the whitened X. The computation of the generalized singular
value decomposition avoids the need to compute cross-product matrices. (The gener-
alized singular value decomposition was introduced by Van Loan [326,1975] and was
reformulated in a more convenient form by Paige and Saunders [249].)
All these methods have their advantages and drawbacks. In practice, zero eigen-
values of D are usually part of the structure of the problem and can be projected out of
it. The remaining eigenvalues are not usually small, at least compared to the double-
precision rounding unit, and one can use whatever method one finds convenient.
Another approach is to use first-order perturbation theory to compute test statistics
directly from the unwhitened data in which D but not its inverse appears. I know of
no systematic exposition of this approach, although I gave one in an earlier version of
this section. For more see [299, 302].
At the kth stage of the algorithm we will have computed k−1 Householder trans-
formations H_i and k−1 interchanges Π_i such that
where R is upper triangular. We now repeat the pivoting strategy of the first step: find
the column of X_kk of largest 2-norm, interchange it with the initial column, and pro-
ceed with the reduction.
The principal computational difficulty with this strategy is the expense of com-
puting the norms of the columns of Xkk- We can solve this problem by using Al-
gorithm 3.10, Chapter 4, to downdate the vectors. The result is Algorithm 2.1. Note
that the columns of R are interchanged along with the columns of X to preserve the
integrity of the final decomposition [cf. (2.2)]. Here are some comments on the algo-
rithm.
• The current pivot column is chosen at statement 6. The strategy given here is col-
umn pivoting for size, but we could substitute any other strategy at this point.
• The downdating function vecdd requires O(1) time, except in the rare cases when
the norm must be recomputed from scratch. Thus the pivoted algorithm takes essen-
tially the same amount of time as the unpivoted algorithm.
• For j = k, ..., n let x_j denote the vector contained in X[k:n, j] at the kth step of
the algorithm. These vectors are transformed by the subsequent Householder transfor-
mations into the vectors R[k:j, j]. Since Householder transformations are orthogonal,
after some rearrangement to account for the subsequent pivoting, we have
But since we pivot the largest column into X[k:n, k], it follows that
Thus the kth diagonal element of R dominates the trailing principal submatrix of R. In
this light the pivoting strategy can be regarded as a greedy algorithm keeping R well
conditioned by keeping its diagonal elements large.
• If, after the interchange (statement 8), the vector X[k:n, k] is zero, the entire ma-
trix X[k:n, k:p] must also be zero, and the algorithm can be terminated. With inexact
data, we are more likely to encounter a column of small norm. By (2.3), the trailing
elements of R will be dominated by |R[k, k]| = ‖X[k:n, k]‖₂. Thus if we have a cri-
terion for determining when elements of R can be regarded as zero, we can terminate
the algorithm simply by inspecting the norms ‖X[k:n, k]‖₂.
• The algorithm is frequently used to extract a well-conditioned, square matrix from
a collection of p n-vectors. In this case, n > p, and we must take special action in
processing the last column—hence the if in statement 9.
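As an illustration of the method (and not a transcription of Algorithm 2.1), the following numpy sketch performs Householder triangularization with column pivoting for size; for brevity it recomputes the column norms at each step instead of downdating them.

import numpy as np

def pivoted_qr(X):
    # Householder QR with column pivoting for size:
    # on return X[:, perm] is approximately Q @ R.
    X = np.array(X, dtype=float)
    n, p = X.shape
    Q = np.eye(n)
    perm = np.arange(p)
    for k in range(min(n - 1, p)):
        # bring the trailing column of largest 2-norm into position k
        j = k + int(np.argmax(np.sum(X[k:, k:]**2, axis=0)))
        X[:, [k, j]] = X[:, [j, k]]
        perm[[k, j]] = perm[[j, k]]
        x = X[k:, k]
        sigma = np.linalg.norm(x)
        if sigma == 0.0:
            break                          # the rest of the matrix is zero
        v = x.copy()
        v[0] += np.copysign(sigma, x[0])   # Householder vector
        v /= np.linalg.norm(v)
        X[k:, k:] -= 2.0 * np.outer(v, v @ X[k:, k:])
        Q[:, k:] -= 2.0 * np.outer(Q[:, k:] @ v, v)
    return Q, np.triu(X), perm

The array perm records the interchanges, so the R-values appear on the diagonal of R in the pivoted order.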
Theorem 2.1. Let the pivoted QR decomposition that is computed by Algorithm 2.1
be partitioned in the form
where R₁₁ is of order m−1. By the induction hypothesis, if rank(X) > m > k, then
R₁₁ is nonsingular. Moreover, X_mm is nonzero, for otherwise the rank of X would
be m−1. Consequently, r_mm, which is the norm of the largest column of X_mm, is
nonzero, and R₁₁ is nonsingular.
If R₁₁ is nonsingular, then for rank(X) to be equal to m we must, as above, have
R₂₂ = 0.
Thus if X is of rank m, the pivoted QR decomposition (computed exactly) will
reveal the rank of X by the presence of a zero trailing principal submatrix of order
p−m. In this case, Q₁ provides an orthonormal basis for the column space of X (the
left superior subspace), and (Q₂ Q⊥) provides a basis for the orthogonal complement
(the left inferior subspace). Moreover, if we partition the pivoted matrix XΠ in
the form
Let Q₂₁ be an orthonormal basis for the orthogonal complement of the column space
of Q₁. Then
is the sine of the largest canonical angle between the spaces spanned by X₁ and X̃₁.
Thus if the singular values of R₁₁ are large compared with E, R(X₁) will be insensi-
tive to perturbations in X₁. The fact that column pivoting for size tends to keep R₁₁ as
well conditioned as possible is the reason why X₁ tends to be a stable basis for R(X).
The decomposition does not directly provide orthonormal bases for the right fun-
damental subspaces of X. However, it is easy to see that
and
Bases for the original matrix X may be computed by premultiplying (2.5) and (2.6)
by Π—i.e., by undoing the interchanges. Thus the algorithm provides (nonorthog-
onal) bases for the right fundamental subspaces of X, at the cost of some additional
calculation for the right inferior subspace.
When X is near a matrix of rank m we may hope that Algorithm 2.1 will return
an R-factor with R₂₂ small. In this case the decomposition
only approximate the left fundamental subspaces of X. Similarly, the column spaces
V₁ and V₂ of the bases (2.5) and (2.6) only approximate the right fundamental sub-
spaces. Unfortunately, because R₁₂ need not be small we cannot apply Theorem 1.1
directly to bound the accuracy of these approximations. However, we can show the
following.
Assume that
The sine s_U of the largest canonical angle between U₁ and the left
superior subspace of X is bounded by
The sine s_V of the largest canonical angle between V₁ and the right
superior subspace of X is bounded by
where
1. S is a diagonal matrix with diagonals decreasing geometrically
from one to 10⁻³ with the last 50 values replaced by zero,
2. U and V are random orthogonal matrices,
3. E is a matrix of standard normal deviates.
Thus X represents a matrix of rank 50 perturbed by an error whose elements are one-
tenth the size of the last nonzero singular value.
Figure 2.1 plots the common logarithms of the singular values of X (solid line)
and the R-values of X (dotted line) against their indices. The +'s indicate the values of
r₅₀,₅₀ and r₅₁,₅₁. It is seen that there is a well-marked gap in the R-values, though not
as marked as the gap in the singular values.
Unfortunately, the pivoted QR decomposition is not foolproof, as the following
example shows.
Example 2.2. Let K_n be the upper triangular matrix illustrated below for n = 6:
where c² + s² = 1. All the columns of the matrix have the same 2-norm—namely,
one—so that if ties in the pivoting process are broken by choosing the first candidate,
the first step of Algorithm 2.1 leaves the matrix unchanged. Similarly for the re-
maining steps. Thus Algorithm 2.1 leaves K_n unchanged, and the smallest R-value is
s^{n−1}.
However, the matrix can have singular values far smaller than s^{n−1}. The follow-
ing table
presents the 99th and 100th singular and R-values of K₁₀₀ for various values of c.
When c = 0, K_n = I, and the R-values and singular values coincide. As c departs
from zero, however, there is an increasingly great gap between the next-to-last and last
singular values, while the ratio of the corresponding R-values remains near one.
This example, which is closely allied to Example 4.2, Chapter 3, shows that the
R-values from the pivoted QR decomposition can fail by orders of magnitude to reveal
gaps in the singular values. Although such dramatic failures seem not to occur in prac-
tice, the possibility has inspired a great deal of work on alternative pivoting strategies,
for which see the notes and references.
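The example is easy to reproduce. The construction below is one standard way of writing K_n (a graded diagonal times a unit upper triangular matrix with −c above the diagonal, which may differ from the text's picture in inessential ways); since K_n is itself the R-factor that pivoting leaves unchanged, its trailing diagonal element can be compared directly with its smallest singular value.

import numpy as np

def kahan(n, c):
    # K_n = diag(1, s, ..., s^(n-1)) times a unit upper triangular matrix
    # with -c above the diagonal, where c^2 + s^2 = 1; each column has 2-norm one.
    s = np.sqrt(1.0 - c*c)
    U = np.eye(n) + np.triu(-c * np.ones((n, n)), 1)
    return np.diag(s ** np.arange(n)) @ U

K = kahan(100, 0.2)
sigma_min = np.linalg.svd(K, compute_uv=False)[-1]
print(sigma_min, K[-1, -1])   # smallest singular value vs. the R-value s^99

The two printed numbers illustrate the failure described above: the trailing R-value stays near s^99 while the smallest singular value falls far below it.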
Assessment of pivoted QR
It is important to appreciate that a pivoted QR decomposition, whatever the pivoting
strategy, has fundamental limitations when it comes to revealing the properties of sin-
gular values. For example, we would hope that the first R-value r₁₁ of X would ap-
proximate the first singular value σ₁. But r₁₁ is the 2-norm of the first column of X,
while σ₁ is the 2-norm of the entire matrix. When X is of order n, the latter can exceed
the former by a factor of √n. For example, if X = eeᵀ, then ‖X‖₂ = n, while the
norm of the first column of X is √n. Moreover, all the columns of X have the same
norm, so that no pivoting strategy can make the first R-value a better approximation
to the first singular value.
The graph in Figure 2.1 shows that, in a modest way, the problem occurs with-
out our looking for it. For the largest R-value in the graph underestimates the largest
singular value by a factor of greater than two. Moreover, the smallest R-value overes-
timates the smallest singular value by a factor of almost four.
The pivoted QR decomposition finds its most important application in rank-reduc-
tion problems where there is a strong gap in the singular values. Excluding artificial
examples like Kn, it is cheap and effective, and it isolates a set of independent columns
of X. However, the R-values from the decomposition tend to be fuzzy approximations
to the singular values. In §2.3 we will show how to sharpen the approximations by a
subsequent reduction to lower triangular form.
Specifically, if
so that the pivoted R-factor is the Cholesky factor of the permuted cross-product ma-
trix ΠᵀAΠ. Thus if we can find some way of adaptively determining pivots as we
compute the Cholesky factor of A, we can compute the pivoted R-factor of X directly
from A.
The problem has a nice solution when we pivot for size. At the kth step of the
pivoted Householder reduction, we have
[see (2.2)]. The pivot column is determined by examining the norms of the columns
x_j^{(k)} of X_kk. Now if we were to continue the computation without pivoting, we would
obtain a cross-product matrix of the form
Thus
and the quantities ‖x_j^{(k)}‖₂² are the diagonals of R_kkᵀR_kk. But by Theorem 1.6, Chap-
ter 3, the matrix R_kkᵀR_kk is the Schur complement of A₁₁. Thus if we compute the
Cholesky decomposition of A by the classical variant of Gaussian elimination, which
generates the full Schur complement at each stage, we will find the numbers we need
to determine the pivots on the diagonals of the Schur complement.
Algorithm 2.2 is an implementation of this procedure. Here are some comments.
• Only the upper half of the matrix A is stored and manipulated, and the lower half
of the array A can be used for other purposes.
• In many ways the trickiest part of the algorithm is to perform the interchanges,
which is done in statements 4-12. The problem is that only the upper half of the matrix
Given a positive definite matrix stored in the upper half of the array A, this algorithm
overwrites it with its pivoted Cholesky decomposition.
1.  for k = 1 to n
2.     Determine p_k ≥ k for which A[p_k, p_k] is maximal
3.     if (A[p_k, p_k] = 0) quit fi
4.     for i = 1 to k
5.        A[i, k] ↔ A[i, p_k]
6.     end for i
7.     for i = k+1 to p_k−1
8.        A[k, i] ↔ A[i, p_k]
9.     end for i
10.    for i = p_k to n
11.       A[k, i] ↔ A[p_k, i]
12.    end for i
13.    temp[k:n] = A[k, k:n] = A[k, k:n]/√A[k, k]
14.    for j = k+1 to n
15.       for i = k+1 to j
16.          A[i, j] = A[i, j] − temp[i]*A[k, j]
17.       end for i
18.    end for j
19. end for k
is stored, so we cannot simply interchange rows and then interchange columns. There
is no really good way to explain this code; the reader should verify by example that it
works.
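For readers who want to experiment, here is a numpy sketch of the same pivoted Cholesky process applied to a full symmetric copy of A, so that the interchanges reduce to ordinary symmetric row and column swaps; the function name and the handling of a nonpositive pivot are illustrative choices.

import numpy as np

def pivoted_cholesky(A):
    # Diagonal-pivoted Cholesky: returns an upper triangular R and a
    # permutation perm with (input A)[perm][:, perm] = R^T R to rounding error.
    A = np.array(A, dtype=float)        # working copy; overwritten below
    n = A.shape[0]
    perm = np.arange(n)
    R = np.zeros_like(A)
    for k in range(n):
        j = k + int(np.argmax(np.diag(A)[k:]))
        if A[j, j] <= 0.0:
            break                        # remaining Schur complement treated as zero
        # symmetric interchange of positions k and j, carried along in R and perm
        A[:, [k, j]] = A[:, [j, k]]
        A[[k, j], :] = A[[j, k], :]
        R[:k, [k, j]] = R[:k, [j, k]]
        perm[[k, j]] = perm[[j, k]]
        # classical outer-product (Schur complement) step
        R[k, k] = np.sqrt(A[k, k])
        R[k, k+1:] = A[k, k+1:] / R[k, k]
        A[k+1:, k+1:] -= np.outer(R[k, k+1:], R[k, k+1:])
    return R, perm

If A = XᵀX, the diagonal of R reproduces, up to sign, rounding, and the choice of pivots in ties, the R-values that Algorithm 2.1 produces from X, which is the point of the discussion above.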
• We have written out the inner loops that update the Schur complement (statements
14-18) because our notation does not provide a compact means of specifying that only
the upper part of a matrix is to be modified. In practice, however, this computation
would be done by a level-two BLAS.
• The scratch array temp in statements 13 and 16 has been introduced to preserve
column orientation. Without it statement 16 in the inner loop would become
where Π_L and Π_R are permutations, Q and P are orthogonal, and L is lower triangular.
The matrices Q and P are each the products of p Householder transformations stored
in the arrays Q and P. The matrices Π_L and Π_R are the products of interchanges whose
indices are stored in the arrays pl and pr. (See Algorithm 2.1 for more details.)
1. hpqlp(X, pl, Q, L, pr, P)
2.    hpqrd(X, Q, R, pr)
3.    hpqrd(Rᵀ, P, L, pl)
4.    L = Lᵀ
5. end hpqlp
We can obtain an even better value if we interchange the largest row of R with the first:
Now if we transpose (2.13), we see that it is the first step of pivoted Householder
triangularization applied to Rᵀ [cf. (2.1)]. If we continue this reduction and transpose
the result, we obtain a triangular decomposition of the form
We will call this the pivoted QLP decomposition of X and will call the diagonal ele-
ments of L the L-values of X.
• The algorithm consists essentially of two applications of the routine hpqrd to com-
pute pivoted QR decompositions. Since this kind of routine is widely implemented,
the pivoted QLP decomposition can be computed using off-the-shelf software.
• The operation count for hpqrd applied to an n×p matrix is approximately (np² −
⅓p³) flam. In the above algorithm it is applied once to the n×p matrix X and once to
the p×p. Thus:
Algorithm 2.3 requires (np² + ⅓p³) flam.
If n = p, the computation of L doubles the work over the initial computation of R. If
n ≫ p the additional work is negligible.
• It might be thought that one could take advantage of the triangular form of R in its
subsequent reduction to L. But the reduction of the first row of R [see (2.13)] destroys
the triangularity.
• The decomposition also requires an additional p² words of storage to contain L and
the generating vectors of the Householder transformations used to compute L. This
should be compared with the np words for the initial reduction.
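Since pivoted QR is available off the shelf, the whole computation takes only a few lines; the following scipy sketch mirrors hpqlp, with the interchanges returned as index arrays the way scipy.linalg.qr delivers them.

import numpy as np
from scipy.linalg import qr

def pivoted_qlp(X):
    # Two pivoted QR decompositions, as in hpqlp; the absolute values of
    # the diagonal of L are the L-values.
    Q, R, pr = qr(X, mode='economic', pivoting=True)   # X[:, pr] = Q @ R
    P, Lt, pl = qr(R.T, pivoting=True)                  # R.T[:, pl] = P @ Lt
    return Q, Lt.T, P, pr, pl

With E_r and E_l the permutation matrices built from pr and pl, the factors reassemble as X = Q E_l L Pᵀ E_rᵀ, which is one way of writing the pivoted QLP decomposition.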
Although the pivoted QLP decomposition costs more than a pivoted QR decompo-
sition, there are good reasons for bearing the expense. First, the pivoted QLP decom-
position tracks the singular values better. Second, it furnishes good approximations to
orthonormal bases for all four fundamental subspaces at any reasonable break point
m. We will consider each of these points in turn.
The above examples show that the pivoted QLP decomposition is better at track-
ing singular values and revealing gaps than the pivoted QR decomposition. That the
improvement is so striking is an empirical observation, unsupported at this time by
adequate theory.
Fundamental subspaces
If we incorporate the pivots in the QLP decomposition into the orthogonal transfor-
mations bv defining
where L₁₁ is of order m. Since the L-values tend to track the singular values, if there
is a gap in the latter at m, the partition of P and Q provides orthonormal bases approx-
imating the four fundamental subspaces of X at m. Specifically,
1. R(Q₁) approximates the left superior subspace of X,
2. R[(Q₂ Q⊥)] approximates the left inferior subspace of X,
3. R(P₁) approximates the right superior subspace of X,
4. R(P₂) approximates the right inferior subspace of X.
Thus the pivoted QLP decomposition, like the pivoted QR decomposition, furnishes
orthonormal approximations to the left fundamental subspaces, but unlike the latter, it
also furnishes orthonormal approximations to the right fundamental subspaces.
We can apply Theorem 1.1 to bound the accuracy of these approximations. Specif-
ically, we have the following theorem.
Theorem 2.3. Let s_U be the sine of the largest canonical angle between the left supe-
rior subspace of X and R(Q₁), and let s_V be the sine of the largest canonical angle
then
In the last section we stated the perturbation bounds [namely, (2.8)] for the QR de-
composition but deferred their proof until this section. In fact, the bounds follow di-
rectly from Theorem 2.3.
Specifically, suppose that the reduction of R to L is done without pivoting. Then
Q is left unaltered, so that U\ of (2.7) is the column space of Q\ from the QLP decom-
position. Moreover,
from the pivoted QR decomposition. Our theorem now applies to give the bounds
(2.14).
so that R(Q₁) = R(X₁).
In the pivoted QLP decomposition we must replace Q by
Low-rank approximations
In some applications the matrix X is a perturbation of a matrix of rank m, and it is
desired to compute a full-rank approximation of X of rank m. One way to do this is
to start computing the pivoted QR decomposition via Algorithm 2.1 and stop after the
mth stage—or, if the rank is initially unknown, stop after the step m at which a gap
appears. The resulting decomposition will have the form
This approximation could of course be obtained from the entire decomposition. How-
ever, if m is small and p is large, the savings in stopping the reduction are substantial.
There is a QLP variant of the full-rank decomposition. If we go on to compute the
pivoted QR decomposition
and set
then
This is not necessarily the same decomposition as we would obtain from the full piv-
oted QLP decomposition, since the range of pivots is restricted. However, as we ob-
served above, if the gap is substantial, the approximations to the fundamental sub-
spaces will likely be the same for both.
This procedure is particularly attractive in cases where gaps in the singular values
are narrow. In this case we use the pivoted QR decomposition as an exploratory tool to
locate potential gaps, and the QLP decomposition as a confirmatory tool. If it fails to
confirm the gap, the QR decomposition can be advanced until another potential gap is
found, and the QLP is advanced to check it. The details of this interleaving are tedious
but straightforward.
in which the gap in the singular values is quite evident [cf. (2.11)].
Rank-revealing QR decompositions
The impact of Kahan's example has been to fuel a search for better pivoting strategies.
An early strategy, first suggested in [297], is equivalent to computing the Cholesky fac-
tor with pivoting of the inverse cross-product matrix. In 1987 Chan [63] proposed a
method in which the R-factor is postprocessed to produce a small block in the south-
east corner. Although the theoretical bounds for the method were disappointing, its
obvious worth and its elegant name—Chan coined the term "rank-revealing QR de-
composition"— set off a search for alternatives [31, 32,64,161,180, 298]. For some
www.pdfgrip.com
of these algorithms to work the putative rank ra must be known ahead of time. Others
will find a gap at an unknown point but do not provably reveal multiple gaps.
of a matrix A of order n involves two quantities — the norm of a matrix A and the
norm of its inverse. The norm of A may or may not be difficult to calculate. Of the
commonly used norms, the 1-norm, the ∞-norm, and the Frobenius norm are easy to
calculate. The 2-norm, which is the largest singular value of A, is expensive to calcu-
late. Computing the norm of A⁻¹ introduces the additional problem of calculating the
inverse, which we have seen is an expensive undertaking.
In this section we will consider techniques by which norms of matrices and their
inverses can be estimated at a reasonable cost. We will begin with the LAPACK algo-
rithm for estimating the 1-norm of a matrix. This algorithm requires only the ability
to form the products of A and Aᵀ with a vector x. To estimate the 1-norm of A⁻¹, the
necessary products can be formed from a suitable decomposition of A. In §3.2 we will
consider some LINPACK-type estimators for ‖T⁻¹‖, where T is triangular. Unlike the
LAPACK estimator these estimators require a knowledge of the elements of the matrix
in question. Finally, in §3.3 we will consider a method for estimating the 2-norm of a
general matrix based on the QLP decomposition.
The LAPACK 1-norm estimator is based on a technique for finding indices j₁, j₂, ...
such that the quantities ‖Ae_{j_i}‖₁ are strictly increasing. What makes the technique es-
pecially suitable for condition estimation is that it does not require that we know the
elements of A or a decomposition of A—only that we be able to multiply arbitrary
vectors by A and Aᵀ.
To derive the algorithm, suppose we have a vector v of 1-norm one for which we
hope ‖Av‖₁ approximates ‖A‖₁. We would like to determine if there is a vector e_j that
gives a better approximation. One way is to compute Ae_j and compare its 1-norm
with the 1-norm of Av. This is fine for a single vector, but if we wish to investigate
all possible vectors e_j, the overhead becomes unacceptable.
To circumvent this problem, we use a weaker test that can fail to recognize when
‖Ae_j‖₁ > ‖Av‖₁. Set
where
It follows that
Thus if ‖x‖_∞ > ‖u‖₁ and ‖x‖_∞ = |x_j|, then ‖Ae_j‖₁ > ‖Av‖₁. Hence we can restart
our search by replacing v with e_j.
These considerations lead to the following algorithm.
1. v = an initial vector with ‖v‖₁ = 1
2. for k = 1, 2, ...
3.    u = Av
4.    w = sign(u)
5.    x = Aᵀw
6.    if (‖x‖_∞ ≤ ‖u‖₁) leave k fi
7.    Choose j so that |x_j| = ‖x‖_∞
8.    v = e_j
9. end for k
The program must terminate, since the norms ‖u‖₁ are strictly increasing. On termi-
nation ‖u‖₁ = ‖Av‖₁ is the estimate of the 1-norm.
Because we have replaced ‖Ae_j‖₁ by the lower bound |wᵀAe_j| from (3.1), the
test in statement 6 can bypass a vector e_j that gives a better estimate. In fact, exam-
ples can be constructed for which the algorithm underestimates the 1-norm of A by
an arbitrary amount (see the notes and references). Nonetheless, the algorithm is very
good and can be made even better by the following modifications.
1. Rather than starting with a unit vector, the initial vector is taken to be n⁻¹e.
This mixes the columns and avoids a chance choice of an uncharacteristi-
cally small column.
2. The number of iterations is restricted to be at least two and no more than
five.
3. The sequence of norm estimates is required to be strictly increasing (to avoid
cycling in finite precision arithmetic).
4. If the vector w is the same as the previous w convergence is declared.
5. On convergence the estimate is compared with the estimate obtained with
the vector
and the larger of the two taken as the estimate. This provides an additional
safeguard against an unfortunate starting vector.
Algorithm 3.1 implements this scheme. Here are some comments.
• The major source of work in the algorithm is the formation of matrix-vector prod-
ucts. The algorithm requires a minimum of four such products and a maximum of
eleven. The average is between four and five.
• The algorithm can be fooled, but experience shows that the estimate is unlikely to
be less than the actual norm by more than a factor of three. In fact, rounding errors
cause the algorithm to perform rather well on examples specifically designed to cause
it to fail.
• If A is replaced by Aᵀ, the algorithm estimates the ∞-norm of A.
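A bare-bones version of the underlying iteration (with only the mixing start and the monotonicity check among the safeguards listed above) can be written as follows; matvec and rmatvec stand for products with A and Aᵀ and are the only access to A the estimator needs. The function name is illustrative.

import numpy as np

def onenormest_basic(matvec, rmatvec, n, maxiter=5):
    # Estimate ||A||_1 given only products with A (matvec) and A^T (rmatvec).
    v = np.full(n, 1.0 / n)            # mixing start vector, ||v||_1 = 1
    est = 0.0
    for _ in range(maxiter):
        u = matvec(v)
        new_est = np.linalg.norm(u, 1)
        if new_est <= est:             # estimates must be strictly increasing
            break
        est = new_est
        w = np.sign(u)
        w[w == 0] = 1.0
        x = rmatvec(w)
        j = int(np.argmax(np.abs(x)))
        if np.abs(x[j]) <= new_est:    # the test of statement 6
            break
        v = np.zeros(n)
        v[j] = 1.0                     # restart with e_j
    return est, v

For an explicit matrix one would call onenormest_basic(lambda z: A @ z, lambda z: A.T @ z, A.shape[1]).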
we can use Algorithm 3.1 to estimate ‖A‖₁ by computing products in the forms
Given a matrix A this algorithm returns an estimate nrm of the 1-norm of A and a
vector v such that ‖Av‖₁ = nrm.
Similarly, to estimate ‖A⁻¹‖₁ we can compute products with the inverse in the forms
where the multiplications by L⁻¹, etc., are accomplished by solving triangular systems.
If A is ill conditioned and we apply Algorithm 3.1 to estimate ‖A⁻¹‖₁, we also
get an approximate null vector of A. To see this assume that A has been scaled so that
‖A‖₁ = 1. Then the algorithm returns a vector v and u = A⁻¹v such that
It follows that
which is small because A is ill conditioned. Thus u/‖u‖₁ is an approximate null vector
of A.
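With an LU factorization in hand, the required products with A⁻¹ and A⁻ᵀ are triangular solves, and the approximate null vector drops out as described; a scipy sketch, reusing onenormest_basic from the sketch above, is

import numpy as np
from scipy.linalg import lu_factor, lu_solve

def inverse_norm_and_null_vector(A):
    # Estimate ||A^{-1}||_1; u/||u||_1 serves as an approximate null vector
    # when A is ill conditioned.
    lu_piv = lu_factor(A)
    est, v = onenormest_basic(lambda z: lu_solve(lu_piv, z),
                              lambda z: lu_solve(lu_piv, z, trans=1),
                              A.shape[0])
    u = lu_solve(lu_piv, v)
    return est, u / np.linalg.norm(u, 1)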
(see Corollary 3.13, Chapter 3). Specifically, if we let D be the diagonal matrix whose
diagonal entries are the components of |E||x|, then
The last norm can be estimated by applying the algorithm to the matrix DA⁻ᵀ.
To put it in another way that does not involve the explicit inverse of L,
This suggests that one way to approximate ‖L⁻¹‖ is to choose a suitable u with ‖u‖ = 1
and solve the system
A simple estimator
In the forward substitution algorithm for solving (3.3), the ith component of v is given
by
Given a triangular matrix T of order n, this routine returns an estimate inf of inf(T)
and a vector v of 2-norm one such that ‖Tv‖₂ = inf.
1.  rightinf(T, v, inf)
2.  if (T is lower triangular)
3.     for i = 1 to n
4.        d = T[i, 1:i−1]*v[1:i−1]
5.        if (d > 0)
6.           v[i] = −(1+d)/T[i, i]
7.        else
8.           v[i] = (1−d)/T[i, i]
9.        end if
10.    end for i
11. else ! T is upper triangular
12.    for i = n to 1 by −1
13.       d = T[i, i+1:n]*v[i+1:n]
14.       if (d > 0)
15.          v[i] = −(1+d)/T[i, i]
16.       else
17.          v[i] = (1−d)/T[i, i]
18.       end if
19.    end for i
20. end if
21. v = v/√n
22. inf = 1/‖v‖₂
23. v = inf*v
24. end rightinf
the components of v will grow at a rate of about 10^i. However, when T is a triangular
factor from a decomposition of a balanced matrix A, such growth is unlikely. The
reason is that the initial rounding of the matrix will increase the small singular values
to approximately ‖A‖₂ε_M.
• One could write a corresponding program to approximate left inferior vectors. But
if transposing is cheap, then rightinf(Tᵀ) will do the same job.
The following example reflects the ability of rightinf to reveal a rank deficiency.
Example 3.1. The routine rightinf was used to estimate the smallest singular value of
the matrix K₁₀₀ of Example 2.2 for various values of the cosine c. The following
table shows the results.
It is seen that the output of rightinf gives a good estimate of the smallest singular value
of K₁₀₀.
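A numpy rendering of the upper triangular branch of this estimator (a sketch of the same idea, not a transcription of the routine above) is given below; applied to the matrix K₁₀₀ of the earlier sketch it should reproduce the behavior of Example 3.1.

import numpy as np

def right_inferior(T):
    # For upper triangular T, choose the signs of the right-hand side during
    # back substitution so that the solution of T v = u grows.
    n = T.shape[0]
    v = np.zeros(n)
    for i in range(n - 1, -1, -1):
        d = T[i, i+1:] @ v[i+1:]
        v[i] = -(1.0 + d) / T[i, i] if d > 0 else (1.0 - d) / T[i, i]
    v /= np.sqrt(n)                     # now ||T v||_2 = 1 up to rounding
    inf_est = 1.0 / np.linalg.norm(v)
    return inf_est, inf_est * v         # ||v||_2 = 1 and ||T v||_2 = inf_est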
An enhanced estimator
Algorithm 3.2 is a greedy algorithm. It attempts to increase each v_i as much as possible
as it is generated. Many greedy algorithms can be made to fail because in their greed
they eat up resources that are needed later. Algorithm 3.2 is no exception. In particular,
a greedy choice at one point can make subsequent values of d in statements 4 and 13 too
small. An important enhancement in the LINPACK estimator is a device to look ahead
at the effects of the choice on these quantities. For variety, however, we will consider
a different estimator, in which the components of u are allowed to vary, subject to the
constraint that ‖u‖₂ = 1.
To derive the algorithm, partition the lower triangular matrix L in the form
and suppose that we have determined u₁ with ‖u₁‖₂ = 1 such that the solution of the equa-
tion
is suitably large. For any c and s with c² + s² = 1 the right-hand side of the system
has 2-norm one. Hence the solution v₁ of (3.5) is a candidate for the next vector.
A greedy algorithm would choose c and s to maximize ‖v₁‖₂. However, such a
strategy overlooks the fact that we will also want the components of
is maximized subject to c² + s² = 1.
At first sight this appears to be a formidable problem. But it simplifies remarkably.
To keep the notation clean, set
Then a tedious but straightforward calculation shows that the problem of maximizing
(3.6) is equivalent to the following problem:
where
Referring to (4.43), Chapter 1, we see that the solution is the normalized eigenvector
corresponding to the largest eigenvalue of
With the exception of the formation of p, the above operations require only O(n)
work. The formation of p by the formula in (3.7) requires a higher order of work. For-
tunately, p can be updated after c and s have been determined as follows:
for j equal to one or two. This is effectively as fast as incremental condition estimation
and provides a degree of additional protection.
• It is not recommended that the novice try to code the solution of the eigenproblem in
statement 12. Conceptually, the solution of 2×2 eigenvalue problems is trivial. Prac-
tically, the details are difficult to get right. LAPACK has a routine (SLAEV2) to do the
job.
• The comments made about scaling in Algorithm 3.2 apply here.
Condition estimation
In LINPACK the routines to estimate the norm of the inverse of a triangular matrix
are used as part of a more extensive algorithm to estimate condition. To motivate the
algorithm, consider the system
Thus if v_n is not unusually small, the last term in the above sum will dominate, and z
will grow in proportion to σ_n⁻².
In LINPACK the triangular estimators are used to insure that the number v_n is not
too small. Specifically, if we have decomposed A = LU, the first step in solving the
system (3.8) is to solve the triangular system
If we use a triangular estimator to determine the right-hand side w, then x will reflect
the ill-conditioning of U. Now if the LU factorization of A has been computed using
pivoting, the matrix L will tend to be well conditioned and hence x will also reflect the
ill-conditioning of A—that is, it will have significant components along the inferior
singular vectors. These considerations lead to the following condition estimator.
1. Solve the system Uᵀx = w using a triangular estimator
to encourage growth in x
2. Solve Lᵀy = x
3. Solve Az = y
4. Estimate ‖A⁻¹‖ by ‖z‖/‖y‖
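A scipy sketch of these four steps follows; for brevity the sign-choosing solve with Uᵀ is written out directly, and the combined effect of steps 1 and 2 is obtained from lu_solve with trans=1 (which repeats the Uᵀ solve, a small waste that keeps the sketch short). The function name is illustrative.

import numpy as np
from scipy.linalg import lu_factor, lu_solve

def linpack_style_inverse_norm(A):
    lu, piv = lu_factor(A)             # A = P L U
    U = np.triu(lu)
    n = A.shape[0]
    # Step 1: solve U^T x = w, choosing the signs of w to make x grow
    # (U^T is lower triangular, so this is forward substitution).
    x = np.zeros(n)
    w = np.zeros(n)
    for i in range(n):
        d = U[:i, i] @ x[:i]
        w[i] = -1.0 if d > 0 else 1.0
        x[i] = (w[i] - d) / U[i, i]
    # Steps 1-2 combined: y = A^{-T} w, which reflects the ill-conditioning of A.
    y = lu_solve((lu, piv), w, trans=1)
    # Step 3.
    z = lu_solve((lu, piv), y)
    # Step 4.
    return np.linalg.norm(z) / np.linalg.norm(y)

Multiplying the result by an estimate of ‖A‖ gives the condition-number estimate.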
1. for j = k to p
2.    r_kj = x_kᵀ x_j
3.    r_kj = r_kj − r_1k r_1j − ··· − r_{k−1,k} r_{k−1,j}
4.    r_kj = r_kj/√r_kk
5. end for j
The first statement in the loop generates an element in the kth row of A. The second
statement computes the Schur complement of that element. The third statement scales
the element so that it becomes an element of the R-factor.
Turning now to the pivoted algorithm, we must answer two questions.
The answer to the first question is that the squares of the norms that determine the piv-
ots are the diagonals of the Schur complement [see the discussion surrounding (2.12)].
These quantities can be formed initially and downdated as we add rows. The answer
to the second question is that we perform no interchanges. Instead we keep track of
indices of the columns we have selected as pivots and skip operations involving them
as we add the kth row. For example, with a pivot sequence of 3, 2, 4, 1 in a 4×4 matrix
we would obtain an "R-factor" of the form
Because the first kmax columns of X dominate the rest, the estimator will return a
value of norm2est of one. But the norm of the matrix is (n − kmax)/√n.
4. UTV DECOMPOSITIONS
In some applications we must track the rank and the fundamental subspaces of a ma-
trix which changes over time. For example, in signal processing one must deal with a
sequence of matrices defined by the recursion
where β < 1 is a positive forgetting factor that damps out the effects of old information
contained in X_k (this way of damping is called exponential windowing). In general,
the matrix X_k will be a perturbation of a matrix X̂_k whose rank is exactly m. Thus the
problem of tracking rank amounts to updating a decomposition that reveals the small
singular values due to the perturbation.
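The recursion referred to is presumably the usual exponential-windowing update, in which the damped previous data matrix is augmented by the newly arrived row x_kᵀ (my reconstruction of the missing display):

\[
  X_k \;=\; \begin{pmatrix} \beta X_{k-1} \\ x_k^{\mathrm{T}} \end{pmatrix},
  \qquad 0 < \beta < 1 .
\]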
Unfortunately, the singular value decomposition itself is not updatable—or rather
not cheaply updatable. An alternative is to update a pivoted QR or QLP decomposi-
tion; however, no satisfactory gap-revealing algorithm to do this is known. The ap-
proach taken in this section is to update a decomposition of the form
The fact that a pair of small elements remains small (column 7) follows from the fact
that a plane rotation is orthogonal and cannot change the norm of any vector to which
it is applied. As usual a zero paired with a nonzero element is annihilated, but if the
other element is an E, the zero is replaced by an E (column 8).
The importance of these observations for our algorithms is that it is possible to
organize calculations with plane rotations in such a way as to preserve patterns of small
elements. In particular, we can preserve the gap-revealing structure of URV and ULV
decompositions.
A little additional nomenclature will prove useful. Premultiplication by a plane
rotation operates on the rows of the matrix. We will call such rotations left rotations.
Postmultiplication by right rotations operates on the columns. Rules analogous to
those in (4.2) hold when a right rotation combines two columns of a matrix. We will
denote left and right rotations in the (i, j)-plane by Q_{ij} and P_{ij} respectively.
URV decompositions
Let X be an n×p matrix. A URV decomposition of X is a decomposition of the form
where U and V are orthogonal and R is upper triangular. As we mentioned in the intro-
duction to this section, there are many URV decompositions—including the singular
value decomposition and the QR decomposition.
Suppose X has a gap in its singular values at m. We will say that a URV decom-
position of X is gap revealing if it can be partitioned in the form
where
1. S is of order m,
2. inf(S) ≅ σ_m(X),
3. ‖F‖₂ ≅ σ_{m+1}(X),
4. ‖H‖₂ is suitably small.
The last condition insures that the blocks of the partitioned matrices U and V will ap-
proximate the fundamental subspaces of X. We are going to show how to update a
gap-revealing URV decomposition when a row is added to X. Although it is possi-
ble to update both the matrices U and V, the order of U can grow beyond reasonable
bounds as more and more rows are added. Fortunately, in most applications we are
interested in only the right fundamental subspaces. Consequently, we will ignore U in
what follows.
The basic algorithm consists of two steps: the updating proper and the adjustment
of the gap. We will consider each in turn.
Incorporation
We will suppose that we have a rank revealing URV decomposition of the form (4.3)
and that we wish to update it by adding a row xᵀ to X. The first step is to transform
it into the coordinate system corresponding to V. This is done as follows
We are then left with the problem of incorporating yᵀ into the decomposition—i.e.,
of reducing the matrix
to triangular form.
To see the chief difficulty in effecting this reduction, imagine that H and F are
exactly zero, so that the matrix is of rank m. If y₂ is nonzero, the updated matrix will
be of rank m+1. Now if we use plane rotations to fold y₁ into S [see (3.14), Chapter 4],
quantities from y₂ will fill H and F, and it will be impossible to tell that the matrix is
of rank m+1. In the general case, where H and F are small but y₂ is large, the gap
that should be at position m+1 will be obliterated.
Thus we must distinguish two cases. In the first, where y₂ is small enough, we can
simply perform Cholesky updating as in (3.14), Chapter 4. This will cost ½p² flrot for
the reduction. Since only left rotations are performed, no rotations have to be accu-
mulated in V. We will call this form of incorporation simple incorporation.
If y₂ is too large, we zero out the last p−m components of yᵀ, so that the effect
of y₂ on the updating is restricted to the (m+1)th column. We will call this process
constrained incorporation. The process, which affects only H and F, is illustrated
below.
Note that in these diagrams the y's represent the last row of H, and the planes of the
rotations are relative to the northwest corner of the diagram. Code for this reduction
is given below.
By items 5 and 6 in the list (4.2) of rules for plane rotations, the elements in the last
p−m−1 columns of H and F must remain small. However, if y_{m+1} is large, its effect
will spread through the first columns of H and F. In other words, the addition of xᵀ
to the decomposition has the potential to increase the estimated rank by one. Code for
this reduction is given below.
1. for k = 1 to p
2.    rotgen(R[k, k], y[k], c, s)
3.    rotapp(R[k, k+1:p], y[k+1:p], c, s)
4. end for k
This completes the incorporation process.
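In explicit form, rotgen, rotapp, and the incorporation loop above might be rendered as the following numpy sketch (ignoring the scaling safeguards a production rotgen would include); R is modified in place.

import numpy as np

def rotgen(a, b):
    # Generate c, s, r with [c s; -s c] applied to (a, b) giving (r, 0).
    r = np.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0, 0.0
    return a / r, b / r, r

def rotapp(c, s, x, y):
    # Apply the rotation to the row vectors x and y in place.
    t = c * x + s * y
    y[:] = -s * x + c * y
    x[:] = t

def incorporate(R, ynew):
    # Fold a new row ynew^T into the upper triangular R (simple incorporation).
    p = R.shape[0]
    y = np.array(ynew, dtype=float)
    for k in range(p):
        c, s, r = rotgen(R[k, k], y[k])
        R[k, k], y[k] = r, 0.0
        rotapp(c, s, R[k, k+1:], y[k+1:])
    return R

On return RᵀR has been updated to RᵀR + ynew·ynewᵀ, which is what folding the row into the decomposition means.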
Deflation
We must now consider the possibility that m is too large, i.e., that S has a small sin-
gular value that is not revealed by the current decomposition. To do this we estimate
the smallest singular value of S by a LINPACK-style estimator. Call the estimate σ̂. A
byproduct of this process is a vector w of 2-norm one such that σ̂ = ‖Sw‖₂. If σ̂ is
suitably small, we conclude that m is too large.
In the event that m is too large, we must alter the decomposition to reveal the small
singular value—a process we will call deflation. The basic idea is simple. Suppose
Thus the last column of S has 2-norm σ̂ and can be moved into H and F by reducing
m.
In practice we will determine P and Q as the product of plane rotations. The re-
duction of w goes as follows.
As the rotations P_{ij} are generated they are applied to S, and the resulting deviation
from triangularity is undone by a left rotation.
The labeling of the last column of the result of this reduction reflects where the ele-
ments will end up when m is reduced.
The following is code for the reduction. We place it in a loop in which the process
is repeated until m cannot be reduced.
1. while (true)
2.    Determine σ̂ and w such that σ̂ is an estimate
      of the smallest singular value of R[1:m, 1:m],
      ‖w‖₂ = 1, and ‖R[1:m, 1:m]*w‖₂ = σ̂
3.    if (σ̂ is large) leave the while loop fi
4.    for k = 1 to m−1
5.       rotgen(w[k+1], w[k], c, s)
6.       rotapp(R[1:k+1, k+1], R[1:k+1, k], c, s)
7.       rotapp(V[:, k+1], V[:, k], c, s)
8.       rotgen(R[k, k], R[k+1, k], c, s)
9.       rotapp(R[k, k+1:p], R[k+1, k+1:p], c, s)
10.   end for k
11.   Refine the decomposition (optional)
12.   m = m−1
13. end while
We have added an additional refinement step, which will be treated later.
The most expensive task in the algorithm is constrained updating for small m, in which
case the count is approximately (7/2)p².
Refinement
We can apply Theorem 1.1 to assess the accuracy of the column spaces of V₁ and V₂
as approximations to the right fundamental subspaces of X. Specifically, the sines of
the canonical angles between the spaces are bounded by
where
is the gap ratio. In a gap-revealing decomposition, the quantity ‖F‖₂ ≅ σ_{m+1} is effec-
tively fixed. Consequently, if we are unhappy with the accuracy of the approximations
we must reduce the size of ‖H‖₂. At the cost of some additional work we can do just
that.
To motivate our refinement step, suppose that R has a gap at m and partition the
leading (m+l)x(m+l) principal submatrix of R in the form
Now suppose that we generate an orthogonal matrix that reduces this submatrix to block
lower triangular form:
We are going to show that if there is a good gap ratio, the norm of g will be less than
that of h.
From (4.9) we have
and
(remember |π₂₂| ≤ 1). Since ‖p₁₂‖₂ = ‖p₂₁‖₂, we have from (4.10)
In other words, the norm of g is smaller than the norm of h by a factor no larger than
the gap ratio. If we now reduce the left-hand side of (4.9) back to upper triangular
form, we will obtain a URV decomposition in which h is reduced by about the square
of the gap ratio.
Algorithmically, the two reductions are easily implemented. Specifically, we can
generate the lower triangular matrix by the sequence of transformations illustrated be-
low.
Given a URV decomposition with a gap at m, this algorithm reduces the size of
R[1:m, m+1].
1. for k = m to 1 by −1
2.    rotgen(R[k, k], R[k, m+1], c, s)
3.    rotapp(R[1:k−1, k], R[1:k−1, m+1], c, s)
4.    rotapp(R[m+1, k], R[m+1, m+1], c, s)
5.    rotapp(V[:, k], V[:, m+1], c, s)
6. end for k
7. for k = 1 to m
8.    rotgen(R[k, k], R[m+1, k], c, s)
9.    rotapp(R[k, k+1:p], R[m+1, k+1:p], c, s)
10. end for k
The return to upper triangular form is analogous to the basic Cholesky update.
Algorithm 4.2 implements this refinement. It requires approximately
2mp flrot.
For small m the additional work is insignificant. For m = p, the algorithm requires
about 2p² flrot. This should be compared with the count of ½p² flrot for the basic up-
dating.
The decision to refine must be based on the application. If one expects m to be
small, there is no reason not to take advantage of the benefits of refinement. On the
other hand, if m is near p refinement quadruples the work over the basic update step.
Low-rank splitting
We have seen that when m is small the constrained URV updating scheme requires
about (7/2)p² flrot—seven times the amount of work required for a simple update. It turns
out that if we are willing to update only the bases for the right superior subspace, we
can reduce the work considerably.
To see how this comes about, let us examine the decomposition computed by Al-
gorithm 4.1 right after statement 3, where the last p−m components of the vector yᵀ =
xᵀV are annihilated. At this point the decomposition has the form
Note that in the course of this reduction the vector y₁ᵀ = xᵀV₁ remains unaltered—
only the last p−m components of yᵀ are changed. Consequently, the first m steps
of the subsequent reduction to triangular form amount to a simple QR update on the
matrix
Since only left rotations are used in this part of the reduction, the matrix V₁ does not
change. The total count for this algorithm is ½m² flrot.
Although this algorithm allows us to test S and if necessary decrease m, it does
not allow us to increase m, which we would have to do if η were large. Fortunately,
we can compute η and v₂ directly. For we have
By the orthogonality of V it follows that v₂ is just the projection of x onto the orthogo-
nal complement of R(V₁) and η = v₂ᵀx. Thus we can generate v₂—say by the Gram-
Schmidt algorithm with reorthogonalization (Algorithm 1.13, Chapter 4)—then com-
pute η and test its size.
At this point we cannot proceed with a direct update, since we do not know h and
φ. However, if we are using exponential windowing, we can update
increase m by one, and take for our new V₁ the matrix (V₁ v₂). Although this decom-
position will not be exact, the exponential windowing will damp out the inaccuracies,
so that in a few iterations it will be good enough for practical purposes.
expensive to compute. Since the basic ideas are the same as for URV decompositions,
we will only sketch the updating and deflation algorithms using Wilkinson diagrams.
ULV decompositions
A ULV decomposition of the n×p matrix X has the form
where
1. S is of order m,
2. inf(S) ≅ σ_m(X),
3. ‖F‖₂ ≅ σ_{m+1}(X),
4. ‖G‖₂ is suitably small.
Theorem 1.1 shows that the sines of the canonical angles between the column
spaces spanned by V₁ and V₂ and the corresponding right fundamental subspaces of
X will be bounded by
where
is the gap ratio. This is better by a factor of ρ than the corresponding bound (4.8) for
the URV decomposition. This suggests that we try to update ULV decompositions as
an alternative to refining URV decompositions.
As with URV updating, any direct attempt to fold y into the decomposition can
make the elements of G and F large. To limit the damage, we reduce the last p−m
components to zero as follows.
Note how the reduction introduces a row of y's in the (m+1)th row (the fourth
row in the diagram). To handle them we increase m by one and attempt to deflate S.
The deflation process is analogous to the one for the URV decomposition. We use a
condition estimator to determine a vector w of 2-norm one such that ‖wᵀS‖₂ is small.
We then reduce w to a multiple of e_m as follows.
After this step has been performed we may proceed to adjust the gap, as in URV up-
dating.
The ULV update is less flexible than the URV update—less able to take advantage
of a vector x that essentially lies in the column space of V₁. For example, we must
always reduce the trailing components of y. Moreover, after we have folded y into S,
we are left with a large row in G. This makes a deflation step mandatory.
The following is a list of the operation counts for the pieces of the algorithm.
Obviously counts like these and the ones for URV updating provide only crude hints
concerning which method to use. Experience and experiment will be a better guide.
in which small elements are set to zero (see, e.g., [153, §5.4.2]). The difference is that
here the small elements are retained and preserved during the updating. A condition
estimator is necessary to keep the operation count for the update to O(n²). For appli-
cations in signal processing see [3, 219].
The UTV decompositions were introduced to overcome the difficulties in updating
the singular value decomposition. However, in some circumstances one or two steps
of an iterative method will suffice to maintain a sufficiently accurate approximation to
the singular value decomposition [235].
The low-rank version of URV updating is due to Rabideau [266], who seems not
to have noticed that his numerical algorithms are minor variations on those of [301].
Although we have developed our algorithms in the context of exponential win-
dowing, the algorithms can also be downdated. For more see [16, 219, 251].
REFERENCES
[12] M. Arioli, J. W. Demmel, and I. S. Duff. Solving sparse linear systems with
sparse backward error. SIAM Journal on Matrix Analysis and Applications,
10:165-190,1989.
[13] O. Axelsson. Iterative Solution Methods. Cambridge University Press, Cam-
bridge, 1994.
[14] S. Banach. Sur les operations dans les ensembles abstraits et leur application
aux equations integrales. Fundamenta Mathematicae, 3:133-181,1922.
[15] J. L. Barlow. Error analysis and implementation aspects of deferred correction
for equality constrained least squares problems. SIAM Journal on Numerical
Analysis, 25:1340-1358,1988.
[16] J. L. Barlow, P. A. Yoon, and H. Zha. An algorithm and a stability theory for
downdating the ULV decomposition. BIT, 36:14-40,1996.
[17] R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout,
R. Pozo, C. Romine, and H. van der Vorst. Templates for the Solution of Linear
Systems: Building Blocks for Iterative Methods. SIAM, Philadelphia, 1994.
[18] A. Barrlund. Perturbation bounds for the LDL^H and the LU factorizations.
BIT, 31:358-363, 1991.
[19] F. L. Bauer. Optimal scaling of matrices and the importance of the minimal con-
dition number. In C. M. Popplewell, editor, Proceedings of the IFIP Congress
1962, pages 198-201. North-Holland, Amsterdam, 1963. Cited in [177].
[20] F. L. Bauer. Optimally scaled matrices. Numerische Mathematik, 5:73-87,
1963.
[21] F. L. Bauer. Genauigkeitsfragen bei der Lösung linearer Gleichungssysteme.
Zeitschrift für angewandte Mathematik und Mechanik, 46:409-421, 1966.
[22] F. L. Bauer and C. Reinsch. Inversion of positive definite matrices by the Gauss-
Jordan methods. In J. H. Wilkinson and C. Reinsch, editors, Handbook for Au-
tomatic Computation Vol. 2: Linear Algebra, pages 45-49. Springer-Verlag,
New York, 1970.
[23] A. E. Beaton. The use of special matrix operators in statistical calculus. Re-
search Bulletin 64-51, Educational Testing Service, Princeton, NJ, 1964.
[24] R. Bellman. Introduction to Matrix Analysis. McGraw-Hill, New York, second
edition, 1970.
[25] E. Beltrami. Sulle funzioni bilineari. Giornale di Matematiche ad Uso degli
Studenti Delle Universita, 11:98-106,1873. An English translation by D. Bo-
ley is available in Technical Report 90-37, Department of Computer Science,
University of Minnesota, Minneapolis, 1990.
[54] J. R. Bunch and B. N. Parlett. Direct methods for solving symmetric indefinite
systems of linear equations. SIAM Journal on Numerical Analysis, 8:639-655,
1971.
[55] P. Businger and G. H. Golub. Linear least squares solutions by Householder
transformations. Numerische Mathematik, 7:269-276, 1965. Also in [349,
pp. 111-118].
[56] H. G. Campbell. Linear Algebra. Addison-Wesley, Reading, MA, second edi-
tion, 1980.
[57] A. L. Cauchy. Cours d'analyse de l'École royale polytechnique. In Oeuvres
Complètes (IIe Série), volume 3. 1821.
[58] A. L. Cauchy. Sur l'équation à l'aide de laquelle on détermine les inégalités
séculaires des mouvements des planètes. In Oeuvres Complètes (IIe Série), vol-
ume 9. 1829.
[59] A. Cayley. Remarques sur la notation des fonctions algébriques. Journal für
die reine und angewandte Mathematik, 50:282-285, 1855. Cited and reprinted
in [61, v. 2, pp. 185-188].
[60] A. Cayley. A memoir on the theory of matrices. Philosophical Transactions of
the Royal Society of London, 148:17-37, 1858. Cited and reprinted in [61, v. 2,
pp. 475-496].
[61] A. Cayley. The Collected Mathematical Papers of Arthur Cayley. Cambridge
University Press, Cambridge, 1889-1898. Arthur Cayley and A. R. Forsythe,
editors. Thirteen volumes plus index. Reprinted 1963 by Johnson Reprint Cor-
poration, New York.
[62] J. M. Chambers. Regression updating. Journal of the American Statistical As-
sociation, 66:744-748,1971.
[63] T. F. Chan. Rank revealing QR factorizations. Linear Algebra and Its Applica-
tions, 88/89:67-82,1987.
[64] S. Chandrasekaran and I. Ipsen. On rank-revealing QR factorizations. SIAM
Journal on Matrix Analysis and Applications, 15:592-622,1991. Citation com-
municated by Per Christian Hansen.
[65] B. A. Chartres and J. C. Geuder. Computable error bounds for direct solution
of linear equations. Journal of the ACM, 14:63-71, 1967.
[66] F. Chiò. Mémoire sur les fonctions connues sous le nom des résultantes ou des
déterminants. Turin, 1853. Cited in [339].
[67] C. W. Clenshaw and F. W. J. Olver. Beyond floating point. Journal of the ACM,
31:319-328,1984.
[68] A. K. Cline, A. R. Conn, and C. F. Van Loan. Generalizing the LINPACK con-
dition estimator. In J.P. Hennart, editor, Numerical Analysis, Lecture Notes in
Mathematics 909, pages 73-83. Springer-Verlag, Berlin, 1982. Cited in [177].
[69] A. K. Cline, C. B. Moler, G. W. Stewart, and J. H. Wilkinson. An estimate
for the condition number of a matrix. SIAM Journal on Numerical Analysis,
16:368-375,1979.
[70] A. K. Cline and R. K. Rew. A set of counter examples to three condition number
estimators. SIAM Journal on Scientific and Statistical Computing, 4:602-611,
1983.
[71] T. Coleman and C. F. Van Loan. Handbook for Matrix Computations. SIAM,
Philadelphia, 1988.
[72] John B. Conway. A Course in Functional Analysis. Springer-Verlag, New York,
1985.
[73] T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms.
McGraw-Hill, New York, 1990.
[74] R. B. Costello, editor. The American Heritage College Dictionary. Houghton
Mifflin, Boston, third edition, 1993.
[75] R. W. Cottle. Manifestations of the Schur complement. Linear Algebra and Its
Applications, 8:189-211,1974.
[76] R. Courant. Über die Eigenwerte bei den Differentialgleichungen der mathe-
matischen Physik. Mathematische Zeitschrift, 7:1-57, 1920.
[77] A. J. Cox and N. J. Higham. Stability of Householder QR factorization for
weighted least squares. Numerical Analysis Report 301, Department of Math-
ematics, University of Manchester, 1997.
[78] H. G. Cragon. Memory Systems and Pipelined Processors. Jones and Bartlett,
Sudbury, MA, 1996.
[79] P. D. Crout. A short method for evaluating determinants and solving systems of
linear equations with real or complex coefficients. Transactions of the Ameri-
can Institute of Electrical Engineers, 60:1235-1240,1941.
[80] M. J. Crowe. A History of Vector Analysis. University of Notre Dame Press,
Notre Dame, IN, 1967.
[81] C. W. Cryer. The LU factorization of totally positive matrices. Linear Algebra
and Its Applications, 7:83-92,1973.
[82] J. K. Cullum and R. A. Willoughby. Lanczos Algorithms for Large Sym-
metric Eigenvalue Computations, volume 1: Theory, volume 2: Programs.
Birkhauser, Stuttgart, 1985. Cited in [41].
[83] A. R. Curtis and J. K. Reid. On the automatic scaling of matrices for Gaussian
elimination. Journal of the Institute of Mathematics and Its Applications, 10:118-
124,1972.
[84] J. Daniel, W. B. Gragg, L. Kaufman, and G. W. Stewart. Reorthogonalization
and stable algorithms for updating the Gram-Schmidt QR factorization. Math-
ematics of Computation, 30:772-795,1976.
[85] G. B. Dantzig. Linear Programming and Extensions. Princeton University
Press, Princeton, NJ, 1963.
[86] B. N. Datta. Numerical Linear Algebra and Applications. Brooks/Cole, Pacific
Grove, CA, 1995.
[87] W. C. Davidon. Variable metric method for minimization. AEC Research and
Development Report ANL-5990, Argonne National Laboratory, Argonne, IL,
1959.
[88] C. Davis and W. Kahan. The rotation of eigenvectors by a perturbation. III.
SIAM Journal on Numerical Analysis, 7:1-46,1970.
[89] P. J. Davis. Interpolation and Approximation. Blaisdell, New York, 1961.
Reprinted by Dover, New York, 1975.
[90] C. de Boor and A. Pinkus. A backward error analysis for totally positive linear
systems. Numerische Mathematik, 27:485-490,1977.
[91] J. Demmel. On error analysis in arithmetic with varying relative precision. In
Proceeding of the Eighth Symposium on Computer Arithmetic, pages 148-152.
IEEE Computer Society, Washington, DC, 1987.
[92] J. Demmel. The smallest perturbation of a submatrix which lowers the rank and
constrained total least squares problems. SIAM Journal on Numerical Analysis,
24:199-206,1987.
[93] J. Demmel and K. Veselic. Jacobi's method is more accurate than QR. SIAM
Journal on Matrix Analysis and Applications, 13:1204—1245,1992.
[94] J. W. Demmel. Three methods for refining estimates of invariant subspaces.
Computing, 38:43-57,1987.
[95] Jean Dieudonne. History of Functional Analysis. North-Holland, Amsterdam,
1981.
[96] E. W. Dijkstra. Go to statement considered harmful. Communications of the
ACM, 11:147-148,1968.
[97] J. D. Dixon. Estimating extremal eigenvalues and condition numbers of matri-
ces. SIAM Journal on Numerical Analysis, 20:812-814,1983.
[111] P. S. Dwyer. A matrix presentation of least squares and correlation theory with
matrix justification of improved methods of solution. Annals of Mathematical
Statistics, 15:82-89,1944.
[112] P. S. Dwyer. Linear Computations. John Wiley, New York, 1951.
[113] C. Eckart and G. Young. The approximation of one matrix by another of lower
rank. Psychometrika, 1:211-218,1936.
[114] M. A. Ellis and B. Stroustrup. The Annotated C++ Reference Manual.
Addison-Wesley, Reading, MA, 1990.
[115] D. K. Faddeev and V. N. Faddeeva. Computational Methods of Linear Algebra.
W. H. Freeman and Co., San Francisco, 1963.
[116] V. N. Faddeeva. Computational Methods of Linear Algebra. Dover, New York,
1959. Translated from the Russian by C. D. Benster.
[117] W. Feller and G. E. Forsythe. New matrix transformations for obtaining char-
acteristic vectors. Quarterly of Applied Mathematics, 8:325-331,1951.
[118] R. D. Fierro. Perturbation theory for two-sided (or complete) orthogonal de-
compositions. SIAM Journal on Matrix Analysis and Applications, 17:383-
400,1996.
[119] E. Fischer. Über quadratische Formen mit reellen Koeffizienten. Monatshefte
für Mathematik und Physik, 16:234-249, 1905.
[120] G. Forsythe and C. B. Moler. Computer Solution of Linear Algebraic Systems.
Prentice-Hall, Englewood Cliffs, NJ, 1967.
[121] L. V. Foster. Gaussian elimination with partial pivoting can fail in practice.
SIAM Journal on Matrix Analysis and Applications, 15:1354-1362,1994.
[122] L. Fox. An Introduction to Numerical Linear Algebra. Oxford University Press,
New York, 1965.
[123] J. G. F. Francis. The QR transformation, parts I and II. Computer Journal,
4:265-271,332-345,1961,1962.
[124] F. G. Frobenius. Über den von L. Bieberbach gefundenen Beweis eines Satzes
von C. Jordan. Sitzungsberichte der Königlich Preußischen Akademie der
Wissenschaften zu Berlin, 3:492-501, 1911. Cited and reprinted in [126, v. 3,
pp. 492-501].
[125] F. G. Frobenius. Über die unzerlegbaren diskreten Bewegungsgruppen.
Sitzungsberichte der Königlich Preußischen Akademie der Wissenschaften zu
Berlin, 3:507-518,1911. Cited and reprinted in [126, v. 3, pp. 507-518].
[126] F. G. Frobenius. Ferdinand Georg Frobenius. Gesammelte Abhandlungen, J.-P.
Serre, editor. Springer-Verlag, Berlin, 1968.
[127] W. A. Fuller. Measurement Error Models. John Wiley, New York, 1987.
[128] G. Furnival and R. Wilson. Regression by leaps and bounds. Technometrics,
16:499-511,1974.
[129] F. R. Gantmacher. The Theory of Matrices, Vols. I, II. Chelsea Publishing Com-
pany, New York, 1959.
[130] Carl Friedrich Gauss. Theoria Motus Corporum Coelestium in Sectionibus
Conicis Solem Ambientium. Perthes and Besser, Hamburg, 1809. Cited and
reprinted in [138, v. 7, pp. 1-261]. English translation by C. H. Davis [137].
French and German translations of Book II, Part 3 in [136,139].
[131] Carl Friedrich Gauss. Disquisitio de elementis ellipticis Palladis. Commen-
tationes societatis regiae scientiarum Gottingensis recentiores, 1, 1810. Cited
and reprinted in [138, v. 6, pp. 1-64]. French translation of §§13-14 in [136].
German translation of §§10-15 in [139].
[132] Carl Friedrich Gauss. Anzeige: Theoria combinationis observationum er-
roribus minimis obnoxiae: Pars prior. Göttingische gelehrte Anzeigen, 33:321-
327,1821. Cited and reprinted in [138, v. 4, pp. 95-100]. English translation in
[140].
[133] Carl Friedrich Gauss. Anzeige: Theoria combinationis observationum er-
roribus minimis obnoxiae: Pars posterior. Göttingische gelehrte Anzeigen,
32:313-318, 1823. Cited and reprinted in [138, v.4, pp. 100-104]. English
translation in [140].
[134] Carl Friedrich Gauss. Theoria combinationis observationum erroribus minimis
obnoxiae: Pars posterior. Commentationes societatis regiae scientiarum Got-
tingensis recentiores, 5, 1823. Cited and reprinted in [138, v. 4, pp. 27-53].
French, German, and English translations in [136,139,140].
[135] Carl Friedrich Gauss. Supplementum theoriae combinationis observationum
erroribus minimis obnoxiae. Commentationes societatis regiae scientiarum Got-
tingensis recentiores, 6, 1828. Cited and reprinted in [138, v. 4, pp. 55-93].
French, German, and English translations in [136,139,140].
[136] Carl Friedrich Gauss. Méthode des Moindres Carrés. Mallet-Bachelier, Paris,
1855. Translation by J. Bertrand of various works of Gauss on least squares.
[137] Carl Friedrich Gauss. Theory of the Motion of the Heavenly Bodies Moving
about the Sun in Conic Sections. Little, Brown, and Company, 1857. Trans-
lation by Charles Henry Davis of Theoria Motus [130]. Reprinted by Dover,
New York, 1963.
[138] Carl Friedrich Gauss. Werke. Königlichen Gesellschaft der Wissenschaften zu
Göttingen, 1870-1928.
[139] Carl Friedrich Gauss. Abhandlungen zur Methode der kleinsten Quadrate. P.
Stankiewicz, Berlin, 1887. Translation by A. Börsch and P. Simon of various
works of Gauss on least squares.
[ 140] Carl Friedrich Gauss. Theory of the Combination of Observations Least Subject
to Errors. SIAM, Philadelphia, 1995. Translation by G. W. Stewart.
[141] W. M. Gentleman. Least squares computations by Givens transformations with-
out square roots. Journal of the Institute of Mathematics and Its Applications,
12:329-336, 1973.
[142] W. M. Gentleman. Error analysis of QR decompositions by Givens transforma-
tions. Linear Algebra and Its Applications, 10:189-197, 1975.
[143] J. A. George and J. W. H. Liu. Computer Solution of Large Sparse Positive
Definite Systems. Prentice-Hall, Englewood Cliffs, NJ, 1981.
[144] P. E. Gill, W. Murray, and M. H. Wright. Practical Optimization. Academic
Press, New York, 1981.
[145] W. Givens. Numerical computation of the characteristic values of a real matrix.
Technical Report 1574, Oak Ridge National Laboratory, Oak Ridge, TN, 1954.
[146] D. Goldberg. Computer arithmetic. Appendix A in [173], 1990.
[147] D. Goldberg. What every computer scientist should know about floating-point
arithmetic. ACM Computing Surveys, 23:5-48, 1991.
[148] G. H. Golub. Numerical methods for solving least squares problems. Nu-
merische Mathematik, 7:206-216, 1965.
[149] G. H. Golub. Matrix decompositions and statistical computation. In R. C. Mil-
ton and J. A. Nelder, editors, Statistical Computation, pages 365-397. Aca-
demic Press, New York, 1969.
[150] G. H. Golub, A. Hoffman, and G. W. Stewart. A generalization of the Eckart-
Young matrix approximation theorem. Linear Algebra and Its Applications,
88/89:317-327, 1987.
[151] G. H. Golub and C. F. Van Loan. An analysis of the total least squares problem.
SIAM Journal on Numerical Analysis, 17:883-893, 1980.
[152] G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins Uni-
versity Press, Baltimore, MD, second edition, 1989.
[153] G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins Uni-
versity Press, Baltimore, MD, third edition, 1996.
[154] G. H. Golub and J. H. Wilkinson. Iterative refinement of least squares solu-
tions. In W. A. Kalenich, editor, Proceedings of the IFIP Congress 65, New
York, 1965, pages 606-607. Spartan Books, Washington, 1965. Cited in [41].
[217] S. J. Leon. Linear Algebra with Applications. Macmillan, New York, fourth
edition, 1994.
[218] S. B. Lippman. C++ Primer. Addison-Wesley, Reading, MA, second edition,
1991.
[219] K. J. R. Liu, D. P. O'Leary, G. W. Stewart, and Y.-J. J. Wu. URV ESPRIT
for tracking time-varying signals. IEEE Transactions on Signal Processing,
42:3441-3449, 1994.
[220] C. C. MacDuffee. The Theory of Matrices. Chelsea, New York, 1946.
[221] J. P. Mallory. In Search of the Indo-Europeans: Language, Archaeology, and
Myth. Thames and Hudson, New York, 1989.
[222] M. Marcus. Basic Theorems in Matrix Theory. Applied Mathematics Series 57.
National Bureau of Standards, Washington, DC, 1960.
[223] M. Marcus and H. Minc. A Survey of Matrix Theory and Matrix Inequalities.
Allyn and Bacon, Boston, 1964.
[224] M. Marcus and H. Minc. Introduction to Linear Algebra. Macmillan, New
York, 1965.
[225] R. S. Martin, C. Reinsch, and J. H. Wilkinson. Householder tridiagonalization
of a real symmetric matrix. Numerische Mathematik, 11:181-195, 1968. Also
in [349, pp. 212-226].
[226] R. Mathias and G. W. Stewart. A block QR algorithm and the singular value
decomposition. Linear Algebra and Its Applications, 182:91-100, 1993.
[227] M. Metcalf and J. Reid. Fortran 90 Explained. Oxford Science Publications,
Oxford, 1990.
[228] H. Minkowski. Theorie der konvexen Körper, insbesondere Begründung ihres
Oberflächenbegriffs. In David Hilbert, editor, Minkowski Abhandlung. Teubner
Verlag, 1911, posthumous.
[229] L. Mirsky. Symmetric gauge functions and unitarily invariant norms. Quarterly
Journal of Mathematics, 11:50-59, 1960.
[230] C. B. Moler. Iterative refinement in floating point. Journal of the ACM, 14:316-
321, 1967.
[231] C. B. Moler. Matrix computations with Fortran and paging. Communications
of the ACM, 15:268-270, 1972.
[232] C. B. Moler, J. Little, and S. Bangert. Pro-Matlab User's Guide. The Math
Works, Natick, MA, 1987.
[247] O. Østerby and Z. Zlatev. Direct Methods for Sparse Matrices. Lecture Notes
in Computer Science 157. Springer-Verlag, New York, 1983.
[248] D. V. Ouellette. Schur complements and statistics. Linear Algebra and Its Ap-
plications, 36:187-295, 1981.
[249] C. C. Paige and M. A. Saunders. Toward a generalized singular value decom-
position. SIAM Journal on Numerical Analysis, 18:398-405, 1981.
[250] C. C. Paige and M. Wei. History and generality of the CS-decomposition. Lin-
ear Algebra and Its Applications, 208/209:303-326, 1994.
[251] H. Park and L. Eldén. Downdating the rank-revealing URV decomposition.
SIAM Journal on Matrix Analysis and Applications, 16:138-155, 1995.
[252] B. N. Parlett. Analysis of algorithms for reflections in bisectors. SIAM Review,
13:197-208, 1971.
[253] B. N. Parlett. The Symmetric Eigenvalue Problem. Prentice-Hall, Englewood
Cliffs, NJ, 1980. Reissued with revisions by SIAM, Philadelphia, 1998.
[254] B. N. Parlett and J. K. Reid. On the solution of a system of linear equations
whose matrix is symmetric but not definite. BIT, 10:386-397, 1970.
[255] E. Pärt-Enander, A. Sjöberg, B. Melin, and P. Isaksson. The MATLAB Hand-
book. Addison-Wesley, Reading, MA, 1996.
[256] R. V. Patel, A. J. Laub, and P. M. Van Dooren, editors. Numerical Linear Alge-
bra Techniques for Systems and Control. IEEE Press, Piscataway, NJ, 1994.
[257] D. A. Patterson and J. L. Hennessy. Computer Organization and Design: The
Hardware/Software Interface. Morgan Kaufmann, San Mateo, CA, 1994.
[258] G. Peano. Intégration par séries des équations différentielles linéaires. Mathe-
matische Annalen, 32:450-456, 1888.
[259] R. Penrose. A generalized inverse for matrices. Proceedings of the Cambridge
Philosophical Society, 51:406-413, 1955.
[260] V. Pereyra. Stability of general systems of linear equations. Aequationes Math-
ematicae, 2:194-206, 1969.
[261] G. Peters and J. H. Wilkinson. The least squares problem and pseudo-inverses.
The Computer Journal, 13:309-316, 1970.
[262] G. Peters and J. H. Wilkinson. On the stability of Gauss-Jordan elimination.
Communications of the ACM, 18:20-24, 1975.
[263] R. L. Plackett. The discovery of the method of least squares. Biometrika,
59:239-251, 1972.
[309] G. W. Stewart. The triangular matrices of Gaussian elimination and related de-
compositions. IMA Journal of Numerical Analysis, 17:7-16, 1997.
[310] G. W. Stewart and J.-G. Sun. Matrix Perturbation Theory. Academic Press,
New York, 1990.
[311] S. M. Stigler. The History of Statistics. Belknap Press, Cambridge, MA, 1986.
[312] G. Strang. Linear Algebra and Its Applications. Academic Press, New York,
third edition, 1988.
[313] G. Strang. Wavelet transforms versus Fourier transforms. Bulletin of the AMS,
28:288-305, 1993.
[314] J.-G. Sun. Perturbation bounds for the Cholesky and QR factorizations. BIT,
31:341-352, 1991.
[315] J.-G. Sun. Rounding-error and perturbation bounds for the Cholesky and
LDL^T factorizations. Linear Algebra and Its Applications, 173:77-98, 1992.
[316] A. J. Sutcliffe, editor. The New York Public Library Writer's Guide to Style and
Usage. Stonesong Press, HarperCollins, New York, 1994.
[317] A. S. Tanenbaum. Modern Operating Systems. Prentice-Hall, Englewood
Cliffs, NJ, 1992.
[318] I. Todhunter. A History of the Mathematical Theory of Probability from the Time
of Pascal to that of Laplace. G. E. Stechert, New York, 1865. Reprint 1931.
[319] L. N. Trefethen and D. Bau, III. Numerical Linear Algebra. SIAM, Philadel-
phia, 1997.
[320] L. N. Trefethen and R. S. Schreiber. Average-case stability of Gaussian elimi-
nation. SIAM Journal on Matrix Analysis and Applications, 11:335-360, 1990.
[321] A. M. Turing. Rounding-off errors in matrix processes. The Quarterly Journal
of Mechanics and Applied Mathematics, 1:287-308, 1948.
[322] H. W. Turnbull and A. C. Aitken. An Introduction to the Theory of Canonical
Matrices. Blackie and Son, London, 1932.
[323] A. van der Sluis. Condition, equilibration, and pivoting in linear algebraic sys-
tems. Numerische Mathematik, 15:74-86, 1970.
[324] A. van der Sluis. Stability of solutions of linear algebraic systems. Numerische
Mathematik, 14:246-251, 1970.
[325] S. Van Huffel and J. Vandewalle. The Total Least Squares Problem: Computa-
tional Aspects and Analysis. SIAM, Philadelphia, 1991.
[326] C. F. Van Loan. A general matrix eigenvalue algorithm. SIAM Journal on Nu-
merical Analysis, 12:819-834, 1975.
[327] C. F. Van Loan. How near is a stable matrix to an unstable matrix? Contempo-
rary Mathematics, 41:465-411, 1985. Reprinted and cited in [256].
[328] C. F. Van Loan. On the method of weighting for equality constrained least
squares. SIAM Journal on Numerical Analysis, 22:851-864, 1985.
[329] C. F. Van Loan. On estimating the condition of eigenvalues and eigenvectors.
Linear Algebra and Its Applications, 88/89:715-732, 1987.
[330] R. S. Varga. Matrix Iterative Analysis. Prentice-Hall, Englewood Cliffs, NJ,
1962.
[331] J. von Neumann and H. H. Goldstine. Numerical inverting of matrices of high
order. Bulletin of the American Mathematical Society, 53:1021-1099, 1947.
[332] E. L. Wachspress. Iterative Solution of Elliptic Systems. Prentice-Hall, Engle-
wood Cliffs, NJ, 1966.
[333] D. S. Watkins. Fundamentals of Matrix Computations. John Wiley & Sons,
New York, 1991.
[334] J. H. M. Wedderburn. Lectures on Matrices. American Mathematical Society
Colloquium Publications, V. XVII. American Mathematical Society, New York,
1934.
[335] P.-A. Wedin. Perturbation bounds in connection with singular value decompo-
sition. BIT, 12:99-111, 1972.
[336] P.-A. Wedin. Perturbation theory for pseudo-inverses. BIT, 13:217-232, 1973.
[337] K. Weierstrass. Zur Theorie der bilinearen und quadratischen Formen. Monats-
hefte Akademie Wissenschaften Berlin, pages 310-338, 1868.
[338] H. Weyl. Das asymptotische Verteilungsgesetz der Eigenwerte linearer par-
tieller Differentialgleichungen (mit einer Anwendung auf die Theorie der
Hohlraumstrahlung). Mathematische Annalen, 71:441-479, 1912.
[339] E. Whittaker and G. Robinson. The Calculus of Observations. Blackie and Son,
London, fourth edition, 1944.
[340] N. Wiener. Limit in terms of continuous transformations. Bulletin de la Société
Mathématique de France, 50:119-134, 1922.
[341] M. Wilkes. Slave memories and dynamic storage allocation. IEEE Transac-
tions on Electronic Computers, EC-14:270-271, 1965.
[342] J. H. Wilkinson. Error analysis of floating-point computation. Numerische
Mathematik, 2:319-340, 1960.
[343] J. H. Wilkinson. Householder's method for the solution of the algebraic eigen-
value problem. Computer Journal, 3:23-27,1960.
INDEX
Underlined page numbers indicate a defining entry. Slanted page numbers indicate an
entry in a notes and references section. Page numbers followed by an "n" refer to a
footnote; those followed by an "a," to an algorithm. The abbreviation me indicates
that there is more information at the main entry for this item. Only authors mentioned
explicitly in the text are indexed.
  and rounding error, 127
  and significant digits, 124, 141, 210
  normwise, 210
  reciprocity, 123, 209
residual system, see least squares
Rew, R. K., 400
Rice, J. R., 292
Rigal, J. L., 225
right rotation, see plane rotation
rounding-error analysis, 81, 128
  cross-product matrix, 301
  forward error analysis, 130
  general references, 141
  inverse matrix, 234-235, 246
  modified Gram-Schmidt algorithm, 280-281, 292, 324
  residual system, 322
  see also backward rounding-error analysis
rounding-error bounds
  assessment, 136
  pessimism of bounds, 136
  slow growth, 135
rounding error, 127-128
  adjusted rounding unit, 131, 143
  chopping, 128
  effects on a sum, 136
  equivalence of forward substitution and the axpy algorithm, 91
  first-order bounds, 131
  fl notation, 127, 143
  general references, 141
  inevitability, 121
  in hexadecimal arithmetic, 144
  rigorous bounds, 131
  rounding unit, 128, 142
    approximation of, 142
  slow accumulation, 135
  truncation, 128
  varieties of rounding, 127
rounding unit, see rounding error
row-sum norm, see ∞-norm
row index, see matrix
row orientation, see orientation of algorithms
row space, 34, 365
row vector, 9
Saunders, M. A., 77, 325, 355, 367
scalar, 2
  as a 1-vector or a 1×1 matrix, 9
  notational conventions, 2, 7
scalar-matrix product, see matrix
scalar-vector product, see vector
scalar product, see inner product
scaling, 247
  and condition, 216-217, 224
  and Gaussian elimination, 241-242
  and pivoting, 235
  approximate balancing, 247
  equal error, 216-217, 242, 247
  for minimum condition, 247
  in rank determination, 366-367
  whitening noise, 366
scaling a matrix, 22
Schmidt, E., 76, 289
Schmidt-Mirsky theorem, 69
Schreiber, R. S., 247, 290
Schur, J., 27, 181, 289
Schur complement, 150, 155
  and block LU decomposition, 163
  and Gaussian elimination, 150
  and LU decomposition, 153, 181
  and QR decomposition, 253
  generated by k steps of Gaussian elimination, 158
  in positive definite matrix, 187
  nested, 164
  nonsingularity, 155
  via sweep operator, 331
  see also Gaussian elimination, LU decomposition
semidefinite matrix, 186
seminormal equations, see least squares
  corrected, 305a
Sheffield, C., 292
Shepherd, T. J., 326
Sherman's march, see Gaussian elimination
Sherman, J., 354
signal processing, 357, 365, 400, 416
  signal and noise subspaces, 365
significant digits
  and relative error, 124
similarity transformation, 41
singular matrix, 37
Watkins, D. S., 79
weak stability, 133, 144, 245, 324, 325
  corrected seminormal equations, 304