
Chapter 3

Solution of Linear Systems

Linear systems of equations are ubiquitous in scientific computing – they arise when solving problems
in many applications, including biology, chemistry, physics, engineering and economics, and they
appear in nearly every chapter of this book. A fundamental numerical problem involving linear
systems is that of finding a solution (if one exists) to a set of n linear equations in n unknowns.
The first digital computers (developed in the 1940’s primarily for scientific computing problems)
required about an hour to solve linear systems involving only 10 equations in 10 unknowns. Modern
computers are substantially more powerful, and we can now solve linear systems involving thousands
of equations in thousands of unknowns in a fraction of a second. Indeed, many problems in science
and industry involve millions of equations in millions of unknowns. In this chapter we study the
most commonly used algorithm, Gaussian elimination with partial pivoting, to solve these important
problems.
After a brief introduction to linear systems, we discuss computational techniques for problems
(diagonal, lower and upper triangular) that are simple to solve. We then describe Gaussian elimina-
tion with partial pivoting, which is an algorithm that reduces a general linear system to one that is
simple to solve. Important matrix factorizations associated with Gaussian elimination are described,
and issues regarding accuracy of computed solutions are discussed. The chapter ends with a section
describing Matlab implementations, as well as the main tools provided by Matlab for solving
linear systems.

3.1 Linear Systems


A linear system of order n consists of the n linear algebraic equations

    a1,1 x1 + a1,2 x2 + · · · + a1,n xn = b1
    a2,1 x1 + a2,2 x2 + · · · + a2,n xn = b2
    ...
    an,1 x1 + an,2 x2 + · · · + an,n xn = bn

in the n unknowns x1 , x2 , . . . , xn . A solution of the linear system is a set of values x1 , x2 , . . . , xn that


satisfy all n equations simultaneously. Problems involving linear systems are typically formulated
using the linear algebra language of matrices and vectors. To do this, first group together the
quantities on each side of the equals sign as vectors,
$$
\begin{bmatrix}
a_{1,1}x_1 + a_{1,2}x_2 + \cdots + a_{1,n}x_n \\
a_{2,1}x_1 + a_{2,2}x_2 + \cdots + a_{2,n}x_n \\
\vdots \\
a_{n,1}x_1 + a_{n,2}x_2 + \cdots + a_{n,n}x_n
\end{bmatrix}
=
\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}
$$


and recall from Section 1.2 that the vector on the left side of the equal sign can be written as a
linear combination of column vectors, or equivalently as a matrix–vector product:
$$
\begin{bmatrix} a_{1,1} \\ a_{2,1} \\ \vdots \\ a_{n,1} \end{bmatrix} x_1 +
\begin{bmatrix} a_{1,2} \\ a_{2,2} \\ \vdots \\ a_{n,2} \end{bmatrix} x_2 + \cdots +
\begin{bmatrix} a_{1,n} \\ a_{2,n} \\ \vdots \\ a_{n,n} \end{bmatrix} x_n =
\begin{bmatrix}
a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\
a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n,1} & a_{n,2} & \cdots & a_{n,n}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} =
\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}
$$
Thus, in matrix–vector notation, the linear system is represented as
Ax = b
where
$$
A = \begin{bmatrix}
a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\
a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n,1} & a_{n,2} & \cdots & a_{n,n}
\end{bmatrix} \quad \text{is the coefficient matrix of order } n,
$$

$$
x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \quad \text{is the unknown, or solution vector of length } n, \text{ and}
$$

$$
b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix} \quad \text{is the right hand side vector of length } n.
$$
We consider problems where the coefficients ai,j and the right hand side values bi are real numbers.
A solution of the linear system Ax = b of order n is a vector x that satisfies the equation Ax = b.
The solution set of a linear system is the set of all its solutions.
Example 3.1.1. The linear system of equations

    1x1 + 1x2 + 1x3 = 3
    1x1 + (−1)x2 + 4x3 = 4
    2x1 + 3x2 + (−5)x3 = 0

is of order n = 3 with unknowns x1 , x2 and x3 . The matrix–vector form is Ax = b where

$$
A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & -1 & 4 \\ 2 & 3 & -5 \end{bmatrix}, \quad
x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \quad
b = \begin{bmatrix} 3 \\ 4 \\ 0 \end{bmatrix}.
$$

This linear system has precisely one solution, given by x1 = 1, x2 = 1 and x3 = 1, so

$$
x = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}.
$$
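A quick numerical check of this example (not part of the original text): the sketch below assumes NumPy is available and uses numpy.linalg.solve to reproduce the solution x1 = x2 = x3 = 1.

    import numpy as np

    A = np.array([[1.0,  1.0,  1.0],
                  [1.0, -1.0,  4.0],
                  [2.0,  3.0, -5.0]])
    b = np.array([3.0, 4.0, 0.0])

    x = np.linalg.solve(A, b)   # solves the nonsingular system A x = b
    print(x)                    # approximately [1. 1. 1.]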

Linear systems arising in most realistic applications are usually much larger than the order n = 3
system of the previous example. However, a lot can be learned about linear systems by looking at
small problems. Consider, for example, the 2 × 2 linear system:
$$
\begin{bmatrix} a_{1,1} & a_{1,2} \\ a_{2,1} & a_{2,2} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} =
\begin{bmatrix} b_1 \\ b_2 \end{bmatrix}, \qquad
\begin{aligned}
a_{1,1}x_1 + a_{1,2}x_2 &= b_1 \\
a_{2,1}x_1 + a_{2,2}x_2 &= b_2 .
\end{aligned}
$$

If a1,2 ≠ 0 and a2,2 ≠ 0, then we can write these equations as

$$
x_2 = -\frac{a_{1,1}}{a_{1,2}}x_1 + \frac{b_1}{a_{1,2}}
\quad\text{and}\quad
x_2 = -\frac{a_{2,1}}{a_{2,2}}x_1 + \frac{b_2}{a_{2,2}},
$$
which are essentially the slope-intercept form equations of two lines in a plane. Solutions of this
linear system consist of all values x1 and x2 that satisfy both equations; that is, all points (x1 , x2 )
where the two lines intersect. There are three possibilities for this simple example (see Fig. 3.1):
• Unique solution – the lines intersect at only one point.
• No solution – the lines are parallel, with different intercepts.
• Infinitely many solutions – the lines are parallel with the same intercept.


Figure 3.1: Possible solution sets for a general 2 × 2 linear system. The left plot shows two lines that
intersect at only one point (unique solution), the middle plot shows two parallel lines that do not
intersect at any points (no solution), and the right plot shows two identical lines (infinitely many
solutions).

This conclusion holds for linear systems of any order; that is, a linear system of order n has
either no solution, 1 solution, or an infinite number of distinct solutions. A linear system
Ax = b of order n is nonsingular if it has one and only one solution. A linear system is singular if
it has either no solution or an infinite number of distinct solutions; which of these two possibilities
applies depends on the relationship between the matrix A and the right hand side vector b, a matter
that is considered in a first linear algebra course.
Whether a linear system Ax = b is singular or nonsingular depends solely on properties of its
coefficient matrix A. In particular, the linear system Ax = b is nonsingular if and only if the matrix
A is invertible; that is, if and only if there is a matrix, A^{-1}, such that A A^{-1} = A^{-1} A = I, where I
is the identity matrix,
$$
I = \begin{bmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \ddots & \vdots \\
\vdots & \ddots & \ddots & 0 \\
0 & \cdots & 0 & 1
\end{bmatrix}.
$$
So, we say A is nonsingular (invertible) if the linear system Ax = b is nonsingular, and A is singular
(non-invertible) if the linear system Ax = b is singular. It is not always easy to determine, a priori,
whether or not a matrix is singular, especially in the presence of roundoff errors. This point will be
addressed in Section 3.6.
Example 3.1.2. Consider the linear systems of order 2:

(a)
$$
\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} =
\begin{bmatrix} 5 \\ 6 \end{bmatrix}
\;\Rightarrow\;
\begin{array}{l} x_2 = -\tfrac{1}{2}x_1 + \tfrac{5}{2} \\ x_2 = -\tfrac{3}{4}x_1 + \tfrac{6}{4} \end{array}
\;\Rightarrow\;
\begin{array}{l} x_2 = -\tfrac{1}{2}x_1 + \tfrac{5}{2} \\ x_2 = -\tfrac{3}{4}x_1 + \tfrac{3}{2} \end{array}
$$
This linear system consists of two lines with unequal slopes. Thus the lines intersect at only
one point, and the linear system has a unique solution.

(b)
$$
\begin{bmatrix} 1 & 2 \\ 3 & 6 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} =
\begin{bmatrix} 5 \\ 6 \end{bmatrix}
\;\Rightarrow\;
\begin{array}{l} x_2 = -\tfrac{1}{2}x_1 + \tfrac{5}{2} \\ x_2 = -\tfrac{3}{6}x_1 + \tfrac{6}{6} \end{array}
\;\Rightarrow\;
\begin{array}{l} x_2 = -\tfrac{1}{2}x_1 + \tfrac{5}{2} \\ x_2 = -\tfrac{1}{2}x_1 + 1 \end{array}
$$
This linear system consists of two lines with equal slopes. Thus the lines are parallel. Since
the intercepts are not identical, the lines do not intersect at any points, and the linear system
has no solution.

(c)
$$
\begin{bmatrix} 1 & 2 \\ 3 & 6 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} =
\begin{bmatrix} 5 \\ 15 \end{bmatrix}
\;\Rightarrow\;
\begin{array}{l} x_2 = -\tfrac{1}{2}x_1 + \tfrac{5}{2} \\ x_2 = -\tfrac{3}{6}x_1 + \tfrac{15}{6} \end{array}
\;\Rightarrow\;
\begin{array}{l} x_2 = -\tfrac{1}{2}x_1 + \tfrac{5}{2} \\ x_2 = -\tfrac{1}{2}x_1 + \tfrac{5}{2} \end{array}
$$
This linear system consists of two lines with equal slopes. Thus the lines are parallel. Since
the intercepts are also equal, the lines are identical, and the linear system has infinitely many
solutions.

Problem 3.1.1. Consider the linear system of Example 3.1.1. If the coefficient matrix A
remains unchanged, then what choice of right hand side vector b would lead to the solution
vector $x = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$?

Problem 3.1.2. Consider any linear system of equations of order 3. The solution of each
equation can be portrayed as a plane in a 3-dimensional space. Describe geometrically how
3 planes can intersect in a 3-dimensional space. Why must two planes that intersect at two
distinct points intersect at an infinite number of points? Explain how you would conclude
that if a linear system of order 3 has 2 distinct solutions it must have an infinite number of
distinct solutions.
Problem 3.1.3. Give an example of one equation in one unknown that has no solution.
Give another example of one equation in one unknown that has precisely one solution, and
another example of one equation in one unknown that has an infinite number of solutions.

Problem 3.1.4. The determinant of an n × n matrix, det(A), is a number that theoretically
can be used computationally to indicate if a matrix is singular. Specifically, if det(A) ≠
0 then A is nonsingular. The formula to compute det(A) for a general n × n matrix is
complicated, but there are some special cases where it can be computed fairly easily. In the
case of a 2 × 2 matrix,
$$
\det\begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc.
$$
Compute the determinants of the 2 × 2 matrices in Example 3.1.2. Why do these results
make sense?
Important note: The determinant is a good theoretical test for singularity, but it is not a
practical test in computational problems. This is discussed in more detail in Section 3.6.

3.2 Simply Solved Linear Systems


Some linear systems are easy to solve. Consider the linear systems of order 3 displayed in Fig. 3.2.
By design, the solution of each of these linear systems is x1 = 1, x2 = 2 and x3 = 3. The structure of
these linear systems makes them easy to solve; to explain this, we first name the structures exhibited.
The entries of a matrix A = [ai,j ], i, j = 1, . . . , n, are partitioned into three classes:
(a) the diagonal entries, i.e., the entries ai,j for which i = j,
(b) the strictly lower triangular entries, i.e., the entries ai,j for which i > j, and

(c) the strictly upper triangular entries, i.e., the entries ai,j for which i < j.

(a)
    (−1)x1 + 0x2 + 0x3 = −1
    0x1 + 3x2 + 0x3 = 6
    0x1 + 0x2 + (−5)x3 = −15,
i.e., $\begin{bmatrix} -1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & -5 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -1 \\ 6 \\ -15 \end{bmatrix}$

(b)
    (−1)x1 + 0x2 + 0x3 = −1
    2x1 + 3x2 + 0x3 = 8
    (−1)x1 + 4x2 + (−5)x3 = −8,
i.e., $\begin{bmatrix} -1 & 0 & 0 \\ 2 & 3 & 0 \\ -1 & 4 & -5 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -1 \\ 8 \\ -8 \end{bmatrix}$

(c)
    (−1)x1 + 2x2 + (−1)x3 = 0
    0x1 + 3x2 + 6x3 = 24
    0x1 + 0x2 + (−5)x3 = −15,
i.e., $\begin{bmatrix} -1 & 2 & -1 \\ 0 & 3 & 6 \\ 0 & 0 & -5 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 24 \\ -15 \end{bmatrix}$

Figure 3.2: Simply Solved Linear Systems

The locations of the diagonal, strictly lower triangular, and strictly upper triangular entries of A
are illustrated in Fig. 3.3. The lower triangular entries are composed of the strictly lower triangular
and diagonal entries, as illustrated in Fig. 3.4. Similarly, the upper triangular entries are composed
of the strictly upper triangular and diagonal entries. Using this terminology, we say that the linear
system in Fig. 3.2(a) is diagonal, the linear system in Fig. 3.2(b) is lower triangular, and the linear
system in Fig. 3.2(c) is upper triangular.

Figure 3.3: Illustration of strict triangular and diagonal matrix entries.

Problem 3.2.1. Let A be a matrix of order n. Show that A has n^2 entries, that n^2 − n =
n(n − 1) entries lie off the diagonal, and that each strictly triangular portion of A has
n(n − 1)/2 entries.


Figure 3.4: Illustration of the lower triangular and upper triangular parts of a matrix.

3.2.1 Diagonal Linear Systems


A matrix A of order n is diagonal if all its nonzero entries are on its diagonal. (This description of
a diagonal matrix does not state that the entries on the diagonal are nonzero.) A diagonal linear
system of equations of order n is one whose coefficient matrix is diagonal. Solving a diagonal linear
system of order n, like that in Fig. 3.2(a), is easy because each equation determines the value of one
unknown, provided that the diagonal entry is nonzero. So, the first equation determines the value
of x1 , the second x2 , etc. A linear system with a diagonal coefficient matrix is singular if it contains
a diagonal entry that is zero. In this case the linear system may have no solutions or it may have
infinitely many solutions.
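To make the idea concrete, here is a minimal sketch (in Python, not from the text) of solving a diagonal linear system; it reports failure when a zero diagonal entry makes the system singular.

    def solve_diagonal(d, b):
        """Solve the diagonal system diag(d) x = b, one equation at a time."""
        x = []
        for di, bi in zip(d, b):
            if di == 0.0:
                raise ValueError("zero diagonal entry: the linear system is singular")
            x.append(bi / di)
        return x

    # The diagonal system of Fig. 3.2(a); expected solution [1, 2, 3]
    print(solve_diagonal([-1.0, 3.0, -5.0], [-1.0, 6.0, -15.0]))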

Example 3.2.1. In Fig. 3.2(a) the solution is x1 = (−1)/(−1) = 1, x2 = 6/3 = 2 and x3 = (−15)/(−5) = 3.

Example 3.2.2. Consider the following singular diagonal matrix, A, and the vectors b and d:

$$
A = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 2 \end{bmatrix}, \quad
b = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \quad
d = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}.
$$

(a) The linear system Ax = b has no solution. Although we can solve the first and last equations
to get x1 = 1/3 and x3 = 0, it is not possible to solve the second equation:

    0 · x1 + 0 · x2 + 0 · x3 = 1  ⇒  0 · (1/3) + 0 · x2 + 0 · 0 = 1  ⇒  0 · x2 = 1.

Clearly there is no value of x2 that satisfies the equation 0 · x2 = 1.

(b) The linear system Ax = d has infinitely many solutions. From the first and last equations we
obtain x1 = 1/3 and x3 = 1/2. The second equation is

    0 · x1 + 0 · x2 + 0 · x3 = 0  ⇒  0 · (1/3) + 0 · x2 + 0 · (1/2) = 0  ⇒  0 · x2 = 0,

and thus x2 can be any real number.

Problem 3.2.2. Consider the matrix

$$
A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 0 \end{bmatrix}.
$$

Why is this matrix singular? Find a vector b so that the linear system Ax = b has no
solution. Find a vector b so that the linear system Ax = b has infinitely many solutions.

3.2.2 Column and Row Oriented Algorithms


Normally, an algorithm that operates on a matrix must choose between a row–oriented or a column–
oriented version depending on how the programming language stores matrices in memory. Typically,
a computer’s memory unit is designed so that the CPU can quickly access consecutive memory
locations. So, consecutive entries of a matrix usually can be accessed quickly if they are stored in
consecutive memory locations.
Matlab stores matrices in column–major order; numbers in the same column of the matrix are
stored in consecutive memory locations, so column–oriented algorithms generally run faster.
Other scientific programming languages behave similarly. For example, FORTRAN 77 stores
matrices in column–major order, and furthermore, consecutive columns of the matrix are stored
contiguously; that is, the entries in the first column are succeeded immediately by the entries in the
second column, etc. Generally, Fortran 90 follows the FORTRAN 77 storage strategy where feasible,
and column–oriented algorithms generally run faster. In contrast, C and C++ store matrices in row–
major order; numbers in the same row are stored in consecutive memory locations, so row–oriented
algorithms generally run faster. However, there is no guarantee that consecutive rows of the matrix
are stored contiguously, nor even that the memory locations containing the entries of one row are
placed before the memory locations containing the entries in later rows.
In the sections that follow, we present both row– and column–oriented algorithms for solving
linear systems.
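The row-major/column-major distinction can be seen directly in an array library. The sketch below uses NumPy (an assumption; the text itself discusses Matlab, Fortran, C and C++) to hold the same matrix in both storage orders.

    import numpy as np

    A = np.arange(12.0).reshape(3, 4)   # NumPy's default is row-major ("C") order
    F = np.asfortranarray(A)            # same entries, column-major ("Fortran") order

    print(A.flags['C_CONTIGUOUS'])      # True: rows are contiguous in memory
    print(F.flags['F_CONTIGUOUS'])      # True: columns are contiguous in memory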

3.2.3 Forward Substitution for Lower Triangular Linear Systems


A matrix A of order n is lower triangular if all its nonzero entries are either strictly lower triangular
entries or diagonal entries. A lower triangular linear system of order n is one whose coefficient
matrix is lower triangular. Solving a lower triangular linear system, like that in Fig. 3.2(b), is usually
carried out by forward substitution. Forward substitution determines first x1 , then x2 , and so on,
until all xi are found. For example, in Fig. 3.2(b), the first equation determines x1 . Given x1 , the
second equation then determines x2 . Finally, given both x1 and x2 , the third equation determines
x3 . This process is illustrated in the following example.

Example 3.2.3. In Fig. 3.2(b) the solution is x1 = (−1)/(−1) = 1, x2 = (8 − 2x1)/3 = (8 − 2·1)/3 = 2 and
x3 = (−8 − (−1)x1 − 4x2)/(−5) = (−8 − (−1)·1 − 4·2)/(−5) = 3.

To write a computer code to implement forward substitution, we must formulate the process
in a systematic way. To motivate the two most “popular” approaches to implementing forward
substitution, consider the following lower triangular linear system.
a1,1 x1 = b1
a2,1 x1 + a2,2 x2 = b2
a3,1 x1 + a3,2 x2 + a3,3 x3 = b3
By transferring the terms involving the off-diagonal entries of A to the right hand side we obtain

    a1,1 x1 = b1
    a2,2 x2 = b2 − a2,1 x1
    a3,3 x3 = b3 − a3,1 x1 − a3,2 x2

The right hand sides can be divided naturally into rows and columns.

• Row–oriented forward substitution updates (modifies) the right hand side one row at a time.
  That is, after computing x1 , x2 , . . . , xi−1 , we update bi as:

      bi := bi − (ai,1 x1 + ai,2 x2 + · · · + ai,i−1 xi−1 ) = bi − Σ_{j=1}^{i−1} ai,j xj

  The symbol ":=" means assign the value computed on the right hand side to the variable on
  the left hand side. Here, and later, when the lower limit of the sum exceeds the upper limit
  the sum is considered to be "empty". In this process the variable bi is "overwritten" with the
  value bi − Σ_{j=1}^{i−1} ai,j xj . With this procedure to update the right hand side, an algorithm for
  row-oriented forward substitution could have the form:

      for each i = 1, 2, . . . , n
          update bi := bi − Σ_{j=1}^{i−1} ai,j xj
          compute xi := bi /ai,i

  Notice that each update step uses elements in the ith row of A: ai,1 , ai,2 , . . . , ai,i−1 .

• Column–oriented forward substitution updates (modifies) the right hand side one column at a
  time. That is, after computing xj , we update bj+1 , bj+2 , . . . , bn as:

      bj+1 := bj+1 − aj+1,j xj
      bj+2 := bj+2 − aj+2,j xj
          ...
      bn := bn − an,j xj

  With this procedure, an algorithm for column-oriented forward substitution could have the
  form:

      for each j = 1, 2, . . . , n
          compute xj := bj /aj,j
          update bi := bi − ai,j xj , i = j + 1, j + 2, . . . , n

  Notice in this case the update steps use elements in the jth column of A: aj+1,j , aj+2,j , . . . , an,j .

Pseudocodes implementing row– and column–oriented forward substitution are presented in Fig. 3.5.
Note that:
• We again use the symbol ":=" to assign values to variables.
• The "for" loops step in ones. So, "for i = 1 to n" means execute the loop for each value i = 1,
  i = 2, until i = n, in turn. (Later, in Fig. 3.6 we use "downto" when we want a loop to count
  backwards in ones.)
• When a loop counting forward has the form, for example, "for j = 1 to i − 1" and for a given
  value of i we have i − 1 < 1, then the loop is considered empty and does not execute. A similar
  convention applies for empty loops counting backwards.

• The algorithms destroy the original entries of b. If these entries are needed for later calculations,
they must be saved elsewhere. In some implementations, the entries of x are written over the
corresponding entries of b.

Row–Oriented

    Input: matrix A = [ai,j ], vector b = [bi ]
    Output: solution vector x = [xi ]

    for i = 1 to n
        for j = 1 to i − 1
            bi := bi − ai,j xj
        next j
        xi := bi /ai,i
    next i

Column–Oriented

    Input: matrix A = [ai,j ], vector b = [bj ]
    Output: solution vector x = [xj ]

    for j = 1 to n
        xj := bj /aj,j
        for i = j + 1 to n
            bi := bi − ai,j xj
        next i
    next j

Figure 3.5: Pseudocode for Row– and Column–Oriented Forward Substitution. Note that the algo-
rithm assumes that the matrix A is lower triangular.
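The row-oriented pseudocode of Fig. 3.5 translates almost line for line into an executable sketch. The Python function below (an illustration, not from the text) accumulates the update in a local variable and assumes the diagonal entries of the lower triangular matrix are nonzero.

    def forward_substitution(A, b):
        """Row-oriented forward substitution for a lower triangular system A x = b."""
        n = len(b)
        x = [0.0] * n
        for i in range(n):
            s = b[i]
            for j in range(i):          # entries a_{i,1}, ..., a_{i,i-1} of row i
                s -= A[i][j] * x[j]
            x[i] = s / A[i][i]
        return x

    # The lower triangular system of Fig. 3.2(b); expected solution [1, 2, 3]
    A = [[-1.0, 0.0, 0.0], [2.0, 3.0, 0.0], [-1.0, 4.0, -5.0]]
    b = [-1.0, 8.0, -8.0]
    print(forward_substitution(A, b))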

The forward substitution process can break down if a diagonal entry of the lower triangular
matrix is zero. In this case, the lower triangular matrix is singular, and a linear system involving
such a matrix may have no solution or infinitely many solutions.

Problem 3.2.3. Why is a diagonal linear system also lower triangular?

Problem 3.2.4. Illustrate the operation of column–oriented forward substitution when used
to solve the lower triangular linear system in Fig. 3.2(b). [Hint: Show the value of b each
time it has been modified by the for-loop and the value of each entry of x as it is computed.]

Problem 3.2.5. Use row–oriented forward substitution to solve the linear system:

3x1 + 0x2 + 0x3 + 0x4 = 6
2x1 + (−3)x2 + 0x3 + 0x4 = 7
1x1 + 0x2 + 5x3 + 0x4 = 8
0x1 + 2x2 + 4x3 + (−3)x4 = 3

Problem 3.2.6. Repeat Problem 3.2.5 but using the column–oriented version.

Problem 3.2.7. Modify the row– and the column–oriented pseudocodes in Fig. 3.5 so that
the solution x is written over the right hand side b.

Problem 3.2.8. Consider a general lower triangular linear system of order n. Show that
row–oriented forward substitution costs n(n − 1)/2 multiplications, n(n − 1)/2 subtractions, and
n divisions. [Hint: Σ_{i=1}^{n} i = n(n + 1)/2.]
Problem 3.2.9. Repeat Problem 3.2.8 for the column–oriented version of forward substi-
tution.
Problem 3.2.10. Develop pseudocodes, analogous to those in Fig. 3.5, for row–oriented
and column–oriented methods of solving the following linear system of order 3:

a1,1 x1 + a1,2 x2 + a1,3 x3 = b1
a2,1 x1 + a2,2 x2 = b2
a3,1 x1 = b3

Problem 3.2.11. Consider the matrix and vector

$$
A = \begin{bmatrix} 3 & 0 & 0 \\ 1 & 2 & 0 \\ 1 & 1 & 0 \end{bmatrix}, \quad
b = \begin{bmatrix} 1 \\ 1 \\ c \end{bmatrix}.
$$

Why is A singular? For what values c does the linear system Ax = b have no solution? For
what values c does the linear system Ax = b have infinitely many solutions?

3.2.4 Backward Substitution for Upper Triangular Linear Systems


A matrix A of order n is upper triangular if its nonzero entries are either strictly upper triangular
entries or diagonal entries. An upper triangular linear system of order n is one whose coefficient
matrix is upper triangular. Solving an upper triangular linear system, like that in Fig. 3.2(c), is
matrix is upper triangular. Solving an upper triangular linear system, like that in Fig. 3.2(c), is
usually carried out by backward substitution. Backward substitution determines first xn , then
xn 1 , and so on, until all xi are found. For example, in Fig. 3.2(c), the third equation determines
the value of x3 . Given the value of x3 , the second equation then determines x2 . Finally, given x2
and x3 , the first equation determines x1 . This process is illustrated in the following example.

Example 3.2.4. In the case in Fig. 3.2(c) the solution is x3 = (−15)/(−5) = 3, x2 = (24 − 6x3)/3 = (24 − 6·3)/3 = 2
and x1 = (0 − (−1)x3 − 2x2)/(−1) = (0 − (−1)·3 − 2·2)/(−1) = 1.

As with forward substitution, there are two popular implementations of backward substitution.
To motivate these implementations, consider the following upper triangular linear system.

    a1,1 x1 + a1,2 x2 + a1,3 x3 = b1
    a2,2 x2 + a2,3 x3 = b2
    a3,3 x3 = b3

By transferring the terms involving the off-diagonal entries of A to the right hand side we obtain

    a1,1 x1 = b1 − a1,3 x3 − a1,2 x2
    a2,2 x2 = b2 − a2,3 x3
    a3,3 x3 = b3

The right hand sides of these equations can be divided naturally into rows and columns.

• Row–oriented backward substitution updates (modifies) the right hand side one row at a time.
  That is, after computing xn , xn−1 , . . . , xi+1 , we update bi as:

      bi := bi − (ai,i+1 xi+1 + ai,i+2 xi+2 + · · · + ai,n xn ) = bi − Σ_{j=i+1}^{n} ai,j xj

  In this case the variable bi is "overwritten" with the value bi − Σ_{j=i+1}^{n} ai,j xj . With this
  procedure to update the right hand side, an algorithm for row-oriented backward substitution
  could have the form:

      for each i = n, n − 1, . . . , 1
          update bi := bi − Σ_{j=i+1}^{n} ai,j xj
          compute xi := bi /ai,i

  Notice that each update step uses elements in the ith row of A: ai,i+1 , ai,i+2 , . . . , ai,n .

• Column–oriented backward substitution updates (modifies) the right hand side one column at
  a time. That is, after computing xj , we update b1 , b2 , . . . , bj−1 as:

      b1 := b1 − a1,j xj
      b2 := b2 − a2,j xj
          ...
      bj−1 := bj−1 − aj−1,j xj

  With this procedure, an algorithm for column-oriented backward substitution could have the
  form:

      for each j = n, n − 1, . . . , 1
          compute xj := bj /aj,j
          update bi := bi − ai,j xj , i = 1, 2, . . . , j − 1

  Notice in this case the update steps use elements in the jth column of A: a1,j , a2,j , . . . , aj−1,j .

Pseudocodes implementing row– and column–oriented backward substitution are presented in Fig. 3.6.

Row–Oriented

    Input: matrix A = [ai,j ], vector b = [bi ]
    Output: solution vector x = [xi ]

    for i = n downto 1 do
        for j = i + 1 to n
            bi := bi − ai,j xj
        next j
        xi := bi /ai,i
    next i

Column–Oriented

    Input: matrix A = [ai,j ], vector b = [bj ]
    Output: solution vector x = [xj ]

    for j = n downto 1 do
        xj := bj /aj,j
        for i = 1 to j − 1
            bi := bi − ai,j xj
        next i
    next j

Figure 3.6: Pseudocode for Row– and Column–Oriented Backward Substitution. Note that the algorithm
assumes that the matrix A is upper triangular.
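As with forward substitution, the row-oriented pseudocode of Fig. 3.6 has a direct executable counterpart. The sketch below (Python, not from the text) assumes nonzero diagonal entries in the upper triangular matrix.

    def backward_substitution(A, b):
        """Row-oriented backward substitution for an upper triangular system A x = b."""
        n = len(b)
        x = [0.0] * n
        for i in range(n - 1, -1, -1):
            s = b[i]
            for j in range(i + 1, n):   # entries a_{i,i+1}, ..., a_{i,n} of row i
                s -= A[i][j] * x[j]
            x[i] = s / A[i][i]
        return x

    # The upper triangular system of Fig. 3.2(c); expected solution [1, 2, 3]
    A = [[-1.0, 2.0, -1.0], [0.0, 3.0, 6.0], [0.0, 0.0, -5.0]]
    b = [0.0, 24.0, -15.0]
    print(backward_substitution(A, b))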

The backward substitution process can break down if a diagonal entry of the upper triangular
matrix is zero. In this case, the upper triangular matrix is singular, and a linear system involving
such a matrix may have no solution or infinitely many solutions.

Problem 3.2.12. Why is a diagonal linear system also upper triangular?

Problem 3.2.13. Illustrate the operation of column–oriented backward substitution when


used to solve the upper triangular linear system in Fig. 3.2(c). Hint: Show the value of
b each time it has been modified by the for-loop and the value of each entry of x as it is
computed.

Problem 3.2.14. Use row–oriented backward substitution to solve the linear system:

2x1 + 2x2 + 3x3 + 4x4 = 20
0x1 + 5x2 + 6x3 + 7x4 = 34
0x1 + 0x2 + 8x3 + 9x4 = 25
0x1 + 0x2 + 0x3 + 10x4 = 10

Problem 3.2.15. Repeat Problem 3.2.14 using the column–oriented backward substitution.
Problem 3.2.16. Modify the row– and column–oriented pseudocodes for backward substi-
tution so that the solution x is written over b.
Problem 3.2.17. Consider a general upper triangular linear system of order n. Show that
row–oriented backward substitution costs n(n − 1)/2 multiplications, n(n − 1)/2 subtractions,
and n divisions.
Problem 3.2.18. Repeat Problem 3.2.17 using column–oriented backward substitution.
Problem 3.2.19. Develop pseudocodes, analogous to those in Fig. 3.6, for row– and
column–oriented methods of solving the following linear system:

a1,3 x3 = b1
a2,2 x2 + a2,3 x3 = b2
a3,1 x1 + a3,2 x2 + a3,3 x3 = b3

Problem 3.2.20. Consider the matrix and vector

$$
A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 0 & 4 \\ 0 & 0 & 2 \end{bmatrix}, \quad
b = \begin{bmatrix} 2 \\ c \\ 4 \end{bmatrix}.
$$

Why is A singular? For what values c does the linear system Ax = b have no solution? For
what values c does the linear system Ax = b have infinitely many solutions?
Problem 3.2.21. The determinant of a triangular matrix, be it diagonal, lower triangular,
or upper triangular, is the product of its diagonal entries.
(a) Show that a triangular matrix is nonsingular if and only if each of its diagonal entries
is nonzero.

(b) Compute the determinant of each of the triangular matrices in Problems 3.2.5, 3.2.11,
3.2.14 and 3.2.20.
Why do these results make sense computationally?

3.3 Gaussian Elimination with Partial Pivoting


The 19th century German mathematician and scientist Carl Friedrich Gauss described a process,
called Gaussian elimination in his honor, that uses two elementary operations systematically to trans-
form any given linear system into one that is easy to solve. (Actually, the process was supposedly
known centuries earlier.)
The two elementary operations are
(a) exchange two equations
(b) subtract a multiple of one equation from any other equation.
Applying either type of elementary operation to a linear system does not change its solution set.
So, we may apply as many of these operations as needed, in any order, and the resulting system
of linear equations has the same solution set as the original system. To implement the procedure,
though, we need a systematic order in which to apply the operations. In this section we describe
the most commonly implemented scheme, Gaussian elimination with partial pivoting (GEPP). Here
our "running examples" will all be computed in exact arithmetic. In the next subsection, we will
recompute these examples in four significant digit decimal arithmetic to provide some first insight
into the effect of rounding error in the solution of linear systems.

3.3.1 Outline of the GEPP Algorithm


For linear systems of order n, GEPP uses n stages to transform the linear system into upper
triangular form. At stage k we eliminate variable xk from all but the first k equations. To achieve this,
each stage uses the same two steps. We illustrate the process on the linear system:

    1x1 + 2x2 + (−1)x3 = 0
    2x1 + (−1)x2 + 1x3 = 7
    (−3)x1 + 1x2 + 2x3 = 3

Stage 1. Eliminate x1 from all but the first equation.

The exchange step exchanges equations so that, among the coefficients multiplying x1 in all of the
equations, the coefficient in the first equation has largest magnitude. If there is more than one such
coefficient, choose the first. If this coefficient with largest magnitude is nonzero, then it is underlined
and called the pivot for stage 1; otherwise stage 1 has no pivot and we terminate the elimination
algorithm. In this example, the pivot occurs in the third equation, so equations 1 and 3 are exchanged:

    (−3)x1 + 1x2 + 2x3 = 3
    2x1 + (−1)x2 + 1x3 = 7
    1x1 + 2x2 + (−1)x3 = 0

The elimination step eliminates variable x1 from all but the first equation. For this example, this
involves subtracting m2,1 = −2/3 times equation 1 from equation 2, and m3,1 = −1/3 times equation 1
from equation 3. Each of the numbers mi,1 is a multiplier; the first subscript on mi,1 indicates from
which equation the multiple of the first equation is subtracted. mi,1 is computed as the coefficient of
x1 in equation i divided by the coefficient of x1 in equation 1 (that is, the pivot). The result is

    (−3)x1 + 1x2 + 2x3 = 3
    0x1 + (−1/3)x2 + (7/3)x3 = 9
    0x1 + (7/3)x2 + (−1/3)x3 = 1

Stage 2. Eliminate x2 from all but the first two equations.

Stage 2 repeats the above steps for the variable x2 on a smaller linear system obtained by removing
the first equation from the system at the end of stage 1. This new system involves one fewer unknown
(the first stage eliminated variable x1 ). At this point the system is

    (−3)x1 + 1x2 + 2x3 = 3
    (−1/3)x2 + (7/3)x3 = 9
    (7/3)x2 + (−1/3)x3 = 1

The exchange step exchanges equations so that, among the coefficients multiplying x2 in all of the
remaining equations, the coefficient in the second equation has largest magnitude. If this coefficient
with largest magnitude is nonzero, then it is underlined and called the pivot for stage 2; otherwise
stage 2 has no pivot and we terminate. In our example, the pivot occurs in the third equation, so
equations 2 and 3 are exchanged:

    (−3)x1 + 1x2 + 2x3 = 3
    (7/3)x2 + (−1/3)x3 = 1
    (−1/3)x2 + (7/3)x3 = 9

The elimination step eliminates variable x2 from all equations below the second. For our example,
this involves subtracting m3,2 = (−1/3)/(7/3) = −1/7 times equation 2 from equation 3. The multiplier
mi,2 is computed as the coefficient of x2 in equation i divided by the coefficient of x2 in equation 2
(that is, the pivot). The result is

    (−3)x1 + 1x2 + 2x3 = 3
    (7/3)x2 + (−1/3)x3 = 1
    0x2 + (16/7)x3 = 64/7

This process continues until the last equation involves only one unknown variable, and the
transformed linear system is in upper triangular form. The last stage of the algorithm simply involves
identifying the coefficient in the last equation as the final pivot element. In our simple example, we
are done at stage 3, where we identify the last pivot element by underlining the coefficient for x3 in
the last equation, and obtain the upper triangular linear system

    (−3)x1 + 1x2 + 2x3 = 3
    (7/3)x2 + (−1/3)x3 = 1
    (16/7)x3 = 64/7

GEPP is now finished. When the entries on the diagonal of the final upper triangular linear
system are nonzero, as they are in our illustrative example, the linear system is nonsingular and its
solution may be determined by backward substitution. (Recommendation: If you are determining
a solution by hand in exact arithmetic, you may check that your computed solution is correct by
showing that it satisfies all the equations of the original linear system. If you are determining the
solution approximately this check may be unreliable as we will see later.)
The diagonal entries of the upper triangular linear system produced by GEPP play an important
role. Specifically, the kth diagonal entry, i.e., the coefficient of xk in the kth equation, is the pivot
for the kth stage of GEPP. So, the upper triangular linear system produced by GEPP is nonsingular
if and only if each stage of GEPP has a pivot.
To summarize:
• GEPP can always be used to transform a linear system of order n into an upper triangular
linear system with the same solution set.
• The kth stage of GEPP begins with a linear system that involves n − k + 1 equations in the
n − k + 1 unknowns xk , xk+1 , · · · , xn . The kth stage of GEPP eliminates xk and ends with a
linear system that involves n − k equations in the n − k unknowns xk+1 , xk+2 , · · · , xn .
• If GEPP finds a non-zero pivot at every stage, then the final upper triangular linear system
is nonsingular. The solution of the original linear system can be determined by applying
backward substitution to this upper triangular linear system.
• If GEPP fails to find a non-zero pivot at some stage, then the linear system is singular and
the original linear system either has no solution or an infinite number of solutions.
We emphasize that if GEPP does not find a non-zero pivot at some stage, we can
conclude immediately that the original linear system is singular. Consequently, many
GEPP codes simply terminate elimination and return a message that indicates the original linear
system is singular.

Example 3.3.1. The previous discussion showed that GEPP transforms the linear system

    1x1 + 2x2 + (−1)x3 = 0
    2x1 + (−1)x2 + 1x3 = 7
    (−3)x1 + 1x2 + 2x3 = 3

to the upper triangular form

    (−3)x1 + 1x2 + 2x3 = 3
    (7/3)x2 + (−1/3)x3 = 1
    (16/7)x3 = 64/7

Using backward substitution, we find that the solution of this linear system is given by:

    x3 = (64/7)/(16/7) = 4,   x2 = (1 + (1/3)·4)/(7/3) = 1,   x1 = (3 − 1·1 − 2·4)/(−3) = 2.
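For comparison, the same solution can be obtained with a library routine. The sketch below (NumPy assumed; not part of the text) solves the original system of Example 3.3.1; the LAPACK solver that numpy.linalg.solve calls itself uses an LU factorization with partial pivoting.

    import numpy as np

    A = np.array([[ 1.0,  2.0, -1.0],
                  [ 2.0, -1.0,  1.0],
                  [-3.0,  1.0,  2.0]])
    b = np.array([0.0, 7.0, 3.0])
    print(np.linalg.solve(A, b))        # approximately [2. 1. 4.]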

Example 3.3.2. Consider the linear system:

    4x1 + 6x2 + (−10)x3 = 0
    2x1 + 2x2 + 2x3 = 6
    1x1 + (−1)x2 + 4x3 = 4

Applying GEPP to this example we obtain the following.

In the first stage, the largest coefficient in magnitude of x1 is already in the first equation, so no
exchange steps are needed. The elimination steps then proceed with the multipliers m2,1 = 2/4 = 0.5
and m3,1 = 1/4 = 0.25:

    4x1 + 6x2 + (−10)x3 = 0
    0x1 + (−1)x2 + 7x3 = 6
    0x1 + (−2.5)x2 + 6.5x3 = 4

In the second stage, we observe that the largest x2 coefficient in magnitude in the last two equations
occurs in the third equation, so the second and third equations are exchanged. The elimination step
then proceeds with the multiplier m3,2 = (−1)/(−2.5) = 0.4:

    4x1 + 6x2 + (−10)x3 = 0
    0x1 + (−2.5)x2 + 6.5x3 = 4
    0x1 + 0x2 + 4.4x3 = 4.4

In the final stage we identify the final pivot entry, and observe that all pivots are non-zero, and
thus the linear system is nonsingular and there is a unique solution.

Using backward substitution, we find the solution of the linear system:

    x3 = 4.4/4.4 = 1,   x2 = (4 − 6.5·1)/(−2.5) = 1,   x1 = (0 − 6·1 + 10·1)/4 = 1.

Example 3.3.3. Consider the linear system:

    1x1 + (−2)x2 + (−1)x3 = 2
    (−1)x1 + 2x2 + (−1)x3 = 1
    3x1 + (−6)x2 + 9x3 = 0

Applying GEPP to this example we obtain the following.

In the first stage, the largest coefficient in magnitude of x1 is in the third equation, so we exchange
the first and third equations:

    3x1 + (−6)x2 + 9x3 = 0
    (−1)x1 + 2x2 + (−1)x3 = 1
    1x1 + (−2)x2 + (−1)x3 = 2

The elimination steps then proceed with the multipliers m2,1 = −1/3 and m3,1 = 1/3:

    3x1 + (−6)x2 + 9x3 = 0
    0x1 + 0x2 + 2x3 = 1
    0x1 + 0x2 + (−4)x3 = 2

In the second stage, we observe that all coefficients multiplying x2 in the last two equations are zero.
We therefore fail to find a non-zero pivot, and conclude that the linear system is singular.

Problem 3.3.1. Use GEPP followed by backward substitution to solve the linear system of
Example 3.1.1. Explicitly display the value of each pivot and each multiplier.

Problem 3.3.2. Use GEPP followed by backward substitution to solve the following linear
system. Explicitly display the value of each pivot and multiplier.

3x1 + 0x2 + 0x3 + 0x4 = 6
2x1 + (−3)x2 + 0x3 + 0x4 = 7
1x1 + 0x2 + 5x3 + 0x4 = 8
0x1 + 2x2 + 4x3 + (−3)x4 = 3

Problem 3.3.3. Use GEPP and backward substitution to solve the following linear system.
Explicitly display the value of each pivot and multiplier.

2x1 + x3 = 1
x2 + 4x3 = 3
x1 + 2x2 = 2

Problem 3.3.4. GEPP provides an efficient way to compute the determinant of any matrix.
Recall that the operations used by GEPP are (1) exchange two equations, and (2) subtract a
multiple of one equation from another (different) equation. Of these two operations, only the
exchange operation changes the value of the determinant of the matrix of coefficients, and
then it only changes its sign. In particular, suppose GEPP transforms the coefficient matrix
A into the upper triangular coefficient matrix U using m actual exchanges, i.e., exchange
steps where an exchange of equations actually occurs. Then

    det(A) = (−1)^m det(U ).

(a) Use GEPP to show that

$$
\det\begin{pmatrix} 4 & 6 & -10 \\ 2 & 2 & 2 \\ 1 & -1 & 4 \end{pmatrix}
= (-1)^1 \det\begin{pmatrix} 4 & 6 & -10 \\ 0 & -2.5 & 6.5 \\ 0 & 0 & 4.4 \end{pmatrix}
= -(4)(-2.5)(4.4) = 44
$$

Recall from Problem 3.2.21 that the determinant of a triangular matrix is the product
of its diagonal entries.
(b) Use GEPP to calculate the determinant of each of the matrices:
2 3
2 3 1 1 1 1
1 1 1 6
4 2 6 2 3 4 5 7
7
1 2 5 and 4 1 2 2 1 5
4 3 0
2 6 3 7

3.3.2 The GEPP Algorithm in Inexact Arithmetic


The GEPP algorithm was outlined in the previous subsection using exact arithmetic throughout.
In reality, the solution of linear systems is usually computed using DP standard arithmetic and the
errors incurred depend on the IEEE DP (binary) representation of the original system and on the
effects of rounding error in IEEE DP arithmetic. Since it is difficult to follow binary arithmetic
we will resort to decimal arithmetic to simulate the effects of approximate computation. In this
subsection, we will use round-to-nearest 4 significant digit decimal arithmetic to solve the problems
introduced in the previous subsection.
First, we illustrate the process on the linear system in Example 3.3.1:

    1.000x1 + 2.000x2 + (−1.000)x3 = 0
    2.000x1 + (−1.000)x2 + 1.000x3 = 7.000
    (−3.000)x1 + 1.000x2 + 2.000x3 = 3.000

Performing the interchange, calculating the multipliers m21 = −0.6667 and m31 = −0.3333, and
eliminating x1 from the second and third equations we have

    (−3.000)x1 + 1.000x2 + 2.000x3 = 3.000
    0x1 + (−0.3333)x2 + 2.333x3 = 9.000
    0x1 + 2.333x2 + (−0.3334)x3 = 0.9999

Performing the interchange, calculating the multiplier m32 = −0.1429, and eliminating x2 from the
third equation we have

    (−3.000)x1 + 1.000x2 + 2.000x3 = 3.000
    0x1 + 2.333x2 + (−0.3334)x3 = 0.9999
    0x1 + 0x2 + 2.285x3 = 9.143

Backsolving we have x3 = 4.001, x2 = 1.000 and x1 = 2.001, a rounding error level perturbation of
the answer with exact arithmetic.
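Round-to-nearest 4 significant digit decimal arithmetic can be simulated in code by rounding after every operation. The sketch below (Python; a rough simulation, not from the text) reproduces the first elimination stage of the computation above.

    import math

    def rnd(x, d=4):
        """Round x to d significant decimal digits (round-to-nearest)."""
        if x == 0.0:
            return 0.0
        return round(x, d - 1 - math.floor(math.log10(abs(x))))

    # Stage 1 after the interchange: pivot -3.000, multipliers and row 2 update
    m21 = rnd(2.0 / -3.0)                  # -0.6667
    m31 = rnd(1.0 / -3.0)                  # -0.3333
    a22 = rnd(-1.0 - rnd(m21 * 1.0))       # -0.3333
    a23 = rnd( 1.0 - rnd(m21 * 2.0))       #  2.333
    b2  = rnd( 7.0 - rnd(m21 * 3.0))       #  9.000
    print(m21, m31, a22, a23, b2)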
If we consider now Example 3.3.2, we see that all the working was exact in four significant digit
decimal arithmetic. So, we will get precisely the same answers as before. However, we cannot
represent exactly the arithmetic in this example in binary DP arithmetic. So, computationally, in
say Matlab, we would obtain an answer that differed from the exact answer at the level of roundoff
error.
Consider next the problem in Example 3.3.3:

    1.000x1 + (−2.000)x2 + (−1.000)x3 = 2.000
    (−1.000)x1 + 2.000x2 + (−1.000)x3 = 1.000
    3.000x1 + (−6.000)x2 + 9.000x3 = 0

Performing the interchange, calculating the multipliers m21 = −0.3333 and m31 = 0.3333, and
eliminating x1 from the second and third equations we have

    3.000x1 + (−6.000)x2 + 9.000x3 = 0
    0x1 + 0x2 + 2.000x3 = 1.000
    0x1 + 0x2 + (−4.000)x3 = 2.000

That is, roundoff does not affect the result and we still observe that the pivot for the next stage is
zero; hence we must stop.
If, instead, we scale the second equation to give

    1x1 + (−2)x2 + (−1)x3 = 2
    (−7)x1 + 14x2 + (−7)x3 = 7
    3x1 + (−6)x2 + 9x3 = 0

then the first step of GEPP is to interchange the first and second equations to give

    (−7)x1 + 14x2 + (−7)x3 = 7
    1x1 + (−2)x2 + (−1)x3 = 2
    3x1 + (−6)x2 + 9x3 = 0

Next, we calculate the multipliers m21 = 1/(−7) = −0.1429 and m31 = 3/(−7) = −0.4286. Eliminating,
we compute

    (−7)x1 + 14x2 + (−7)x3 = 7
    0x1 + 0.001x2 + (−2)x3 = 3
    0x1 + 0x2 + 6x3 = 3
Observe that the elimination is now complete. The result differs from the exact arithmetic results
in just the (2, 2) position, but this small change is crucial because the resulting system is now
nonsingular. When we compute a solution, we obtain x3 = 0.5, x2 = 4000 and x1 = 8000; the

large values possibly giving away that something is wrong. In the next subsection we describe a
GEPP algorithm. This algorithm includes a test for singularity that would flag the above problem
as singular even though all the pivots are nonzero.

Example 3.3.4. Here is a further example designed to reinforce what we have just seen on the
effects of inexact arithmetic. Using exact arithmetic, 2 stages of GEPP transforms the singular
linear system:

    1x1 + 1x2 + 1x3 = 1
    1x1 + (−1)x2 + 2x3 = 2
    3x1 + 1x2 + 4x3 = 4

into the triangular form:

    3x1 + 1x2 + 4x3 = 4
    0x1 + (−4/3)x2 + (2/3)x3 = 2/3
    0x1 + 0x2 + 0x3 = 0

which has an infinite number of solutions.
Using GEPP with four significant digit decimal arithmetic, we compute multipliers m21 = m31 =
0.3333 and eliminating we get

    3.000x1 + 1.000x2 + 4.000x3 = 4.000
    0x1 + (−1.333)x2 + 0.6670x3 = 0.6670
    0x1 + 0.6667x2 + (−0.3330)x3 = −0.3330

So, m32 = −0.5002 and eliminating we get

    3.000x1 + 1.000x2 + 4.000x3 = 4.000
    0x1 + (−1.333)x2 + 0.6670x3 = 0.6670
    0x1 + 0x2 + 0.0006000x3 = 0.0006000

and backsolving we compute x3 = 1 and x2 = x1 = 0, one solution of the original linear system.
Here, we observe some mild effects of loss of significance due to cancellation. Later, in Section 3.3.4,
we will see much greater numerical errors due to loss of significance.

Problem 3.3.5. If the original first equation x1 + x2 + x3 = 1 in the example above is
replaced by x1 + x2 + x3 = 2, then with exact arithmetic 2 stages of GE yields 0x3 = 1 as
the last equation and there is no solution. In four significant digit decimal arithmetic, a
suspiciously large solution is computed. Show this and explain why it happens.

3.3.3 Implementing the GEPP Algorithm


The version of GEPP discussed in this section is properly called Gaussian elimination with partial
pivoting by rows for size. The phrase “partial pivoting” refers to the fact that only equations
are exchanged in the exchange step. An alternative is “complete pivoting” where both equations
and unknowns are exchanged. The phrase “by rows for size” refers to the choice in the exchange
step where a nonzero coefficient with largest magnitude is chosen as pivot. Theoretically, Gaussian
elimination simply requires that any nonzero number be chosen as pivot. Partial pivoting by rows for
size serves two purposes. First, it removes the theoretical problem when, prior to the kth exchange
step, the coefficient multiplying xk in the kth equation is zero. Second, in almost all cases, partial
pivoting improves the quality of the solution computed when finite precision, rather than exact,
arithmetic is used in the elimination step.
In this subsection we present detailed pseudocode that combines GEPP with backward substi-
tution to solve linear systems of equations. The basic algorithm is:

• for stages k = 1, 2, . . . , n − 1
- find the kth pivot
- if the kth pivot is zero, quit – the linear system is singular
- perform row interchanges, if needed
- compute the multipliers
- perform the elimination
• for stage n, if the final pivot is zero, quit – the linear system is singular
• perform backward substitution

Before presenting pseudocode for the entire algorithm, we consider implementation details for some
of the GEPP steps. Assume that we are given the coefficient matrix A and right hand side vector b
of the linear system.
• To find the kth pivot, we need to find the largest of the values |ak,k |, |ak+1,k |, . . . , |an,k |. This
can be done using a simple search:
p := k
for i = k + 1 to n
if |ai,k | > |ap,k | then p := i
next i
• If the above search finds that p > k, then we need to exchange rows. This is fairly simple,
though we have to be careful to use “temporary” storage when overwriting the coefficients. (In
some languages, such as Matlab, you can perform the interchanges directly, as we will show
in Section 3.7; the temporary storage is created and used in the background and is invisible
to the user.)
if p > k then
for j = k to n
temp := ak,j
ak,j := ap,j
ap,j :=temp
next j
temp := bk
bk := bp
bp :=temp
endif
Notice that when exchanging equations, we must exchange the coefficients in the matrix A and
the corresponding values in the right hand side vector b. (Another possibility is to perform the
interchanges virtually; that is, to rewrite the algorithm so that we don’t physically interchange
elements but instead leave them in place but act as if they had been interchanged. This
involves using indirect addressing which may be more efficient than physically performing
interchanges.)
• Computing the multipliers, mi,k , and performing the elimination step can be implemented as:
    for i = k + 1 to n
        mi,k := ai,k /ak,k
        ai,k := 0
        for j = k + 1 to n
            ai,j := ai,j − mi,k ∗ ak,j
        next j
        bi := bi − mi,k ∗ bk
    next i
Since we know that ai,k should be zero after the elimination step, we do not perform the
elimination step on these values, and instead just set them to zero using the instruction ai,k :=

0. However, we note that since these elements are not involved in determining the solution via
backward substitution, this instruction is unnecessary, and has only a “cosmetic” purpose.
Putting these steps together, and including pseudocode for backward substitution, we obtain the
pseudocode shown in Fig. 3.7. Note that we implemented the elimination step to update ai,j in a
specific order. In particular, the entries in row k + 1 are updated first, followed by the entries in
row k + 2, and so on, until the entries in row n are updated. As with the forward and backward
substitution methods, we refer to this as row-oriented GEPP, and to be consistent we combine it with
row-oriented backward substitution. It is not difficult to modify the GEPP procedure so that it is
column-oriented, and in that case we would combine it with column-oriented backward substitution.
During stage k, the pseudocode in Fig. 3.7 only modifies equations (k + 1) through n and the
coefficients multiplying xk , xk+1 , · · · , xn . Of the three fundamental operations: add (or subtract),
multiply, and divide, generally add (or subtract) is fastest, multiply is intermediate, and divide
is slowest. So, rather than compute the multiplier as mi,k := ai,k /ak,k , one might compute the
reciprocal ak,k := 1/ak,k outside the i loop and then compute the multiplier as mi,k := ai,k ∗ ak,k . In
stage k, this would replace n − k divisions with 1 division followed by n − k multiplications, which is
usually faster. Keeping the reciprocal in ak,k also helps in backward substitution where each divide
is replaced by a multiply.
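The pseudocode of Fig. 3.7 translates almost directly into executable code. The following sketch (Python; an illustration rather than the book's code) returns None when a zero pivot is found, signalling a singular system, and otherwise returns the solution computed by backward substitution.

    def gepp_solve(A, b):
        """Row-oriented GEPP followed by backward substitution (see Fig. 3.7)."""
        n = len(b)
        A = [row[:] for row in A]        # work on copies; the inputs are preserved
        b = b[:]
        for k in range(n - 1):
            # exchange step: pick the first largest |a_{i,k}|, i = k, ..., n-1
            p = max(range(k, n), key=lambda i: abs(A[i][k]))
            if p != k:
                A[k], A[p] = A[p], A[k]
                b[k], b[p] = b[p], b[k]
            if A[k][k] == 0.0:
                return None              # linear system is singular
            # elimination step
            for i in range(k + 1, n):
                m = A[i][k] / A[k][k]    # multiplier m_{i,k}
                A[i][k] = 0.0
                for j in range(k + 1, n):
                    A[i][j] -= m * A[k][j]
                b[i] -= m * b[k]
        if A[n - 1][n - 1] == 0.0:
            return None                  # linear system is singular
        # row-oriented backward substitution
        x = [0.0] * n
        for i in range(n - 1, -1, -1):
            s = b[i]
            for j in range(i + 1, n):
                s -= A[i][j] * x[j]
            x[i] = s / A[i][i]
        return x

    # The system of Example 3.3.1; expected solution approximately [2, 1, 4]
    A = [[1.0, 2.0, -1.0], [2.0, -1.0, 1.0], [-3.0, 1.0, 2.0]]
    b = [0.0, 7.0, 3.0]
    print(gepp_solve(A, b))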

Problem 3.3.6. In Fig. 3.7, in the second for-j loop (the elimination loop) the code could
be logically simplified by recognizing that the effect of ai,k = 0 could be achieved by removing
this statement and extending the for-j loop so it starts at j = k instead of j = k + 1. Why
is this NOT a good idea?
Problem 3.3.7. In Fig. 3.7, the elimination step is row–oriented. Change the elimination
step so it is column–oriented. Hint: You must exchange the order of the for-i and for-j
loops.

Problem 3.3.8. In Fig. 3.7, show that the elimination step of the kth stage of GEPP
requires n − k divisions to compute the multipliers, and (n − k)^2 + n − k multiplications
and subtractions to perform the elimination. Conclude that the elimination steps of GEPP
require about

    Σ_{k=1}^{n−1} (n − k) = Σ_{k=1}^{n−1} k = n(n − 1)/2 ∼ n^2/2 divisions

    Σ_{k=1}^{n−1} (n − k)^2 = Σ_{k=1}^{n−1} k^2 = n(n − 1)(2n − 1)/6 ∼ n^3/3 multiplications and subtractions

The notation f(n) ∼ g(n) is used when the ratio f(n)/g(n) approaches 1 as n becomes large.
Problem 3.3.9. How does the result in Problem 3.3.8 change if we first compute the recip-
rocals of the pivots and then use these values in computing the multipliers?
Problem 3.3.10. How many comparisons does GEPP use to find pivot elements?
Problem 3.3.11. How many assignment statements are needed for all the interchanges in
the GEPP algorithm, assuming that at each stage an interchange is required?

3.3.4 The Role of Interchanges


To illustrate the effect of the choice of pivot, we apply Gaussian elimination to the linear system
0.000025x1 + 1x2 = 1
1x1 + 1x2 = 2

Row-Oriented GEPP with Backward Substitution

    Input: matrix A = [ai,j ], vector b = [bi ]
    Output: solution vector x = [xi ]

    for k = 1 to n − 1
        p := k
        for i = k + 1 to n
            if |ai,k | > |ap,k | then p := i
        next i
        if p > k then
            for j = k to n
                temp := ak,j ; ak,j := ap,j ; ap,j := temp
            next j
            temp := bk ; bk := bp ; bp := temp
        endif
        if ak,k = 0 then return (linear system is singular)
        for i = k + 1 to n
            mi,k := ai,k /ak,k
            ai,k := 0
            for j = k + 1 to n
                ai,j := ai,j − mi,k ∗ ak,j
            next j
            bi := bi − mi,k ∗ bk
        next i
    next k
    if an,n = 0 then return (linear system is singular)

    for i = n downto 1 do
        for j = i + 1 to n
            bi := bi − ai,j ∗ xj
        next j
        xi := bi /ai,i
    next i

Figure 3.7: Pseudocode for row oriented GEPP and Backward Substitution

for which the exact solution is x1 = 40000/39999 ≈ 1 and x2 = 39998/39999 ≈ 1.
What result is obtained with Gaussian elimination using floating–point arithmetic? To make
pen-and-paper computation easy, we use round-to-nearest 4 significant digit decimal arithmetic; i.e.,
the result of every add, subtract, multiply, and divide is rounded to the nearest decimal number
with 4 significant digits.
Consider what happens if the exchange step is not used. The multiplier is computed exactly
because its value 1/0.000025 = 40000 rounds to itself. The elimination step subtracts 40000 times
equation 1 from equation 2. Now 40000 times equation 1 is computed exactly. So, the only rounding
error in the elimination step is at the subtraction. There, the coefficient multiplying x2 in the second
equation is 1 − 40000 = −39999, which rounds to −40000, and the right hand side is 2 − 40000 =
−39998, which also rounds to −40000. So the result is

    0.000025x1 + 1x2 = 1
    0x1 + (−40000)x2 = −40000

Backward substitution commits no further rounding errors and produces the approximate solution

    x2 = (−40000)/(−40000) = 1,   x1 = (1 − x2)/0.000025 = 0
This computed solution differs significantly from the exact solution. Why? Observe that the com-
puted solution has x2 = 1. This is an accurate approximation of its exact value 39998/39999, in error
by |39998/39999 − 1| = 1/39999. When the approximate value of x1 is computed as (1 − x2)/0.000025, catastrophic
cancellation occurs when the approximate value x2 = 1 is subtracted from 1.
If we include the exchange step, the result of the exchange step is

    1x1 + 1x2 = 2
    0.000025x1 + 1x2 = 1

The multiplier 0.000025/1 = 0.000025 is computed exactly, as is 0.000025 times equation 1. The result
of the elimination step is

    1x1 + 1x2 = 2
    0x1 + 1x2 = 1

because, in the subtract operation, 1 − 0.000025 = 0.999975 rounds to 1 and 1 − 2 × 0.000025 =
0.99995 rounds to 1. Backward substitution commits no further rounding errors and produces an
approximate solution

    x2 = 1,   x1 = (2 − x2)/1 = 1
that is correct to four significant digits.
The above computation uses round-to-nearest 4 significant digit decimal arithmetic. It leaves
open the question of what impact the choice of the number of digits has on the above calculation,
so next we repeat the calculation, first using round-to-nearest 5 significant digit decimal arithmetic
and then round-to-nearest 3 significant digit decimal arithmetic, to highlight the differences that may
arise when computing to different accuracies.
Consider what happens if we work in round-to-nearest 5 significant digit decimal arithmetic
without exchanges. The multiplier is computed exactly because its value 1/0.000025 = 40000 rounds
to itself. The elimination step subtracts 40000 times equation 1 from equation 2. Now, 40000 times
equation 1 is computed exactly. So, the only possible rounding error in the elimination step is at
the subtraction. There, the coefficient multiplying x2 in the second equation is 1 − 40000 = −39999,
which is already rounded to five digits, and the right hand side is 2 − 40000 = −39998, which is
again correctly rounded. So the result is

    0.000025x1 + 1x2 = 1
    0x1 + (−39999)x2 = −39998

Backward substitution produces the approximate solution

    x2 = (−39998)/(−39999) = 0.99997,   x1 = (1 − x2)/0.000025 = 1.2

So, the impact of the extra precision is to compute a less inaccurate result. Next, consider what
happens if we permit exchanges and use round-to-nearest 5 significant digit decimal arithmetic. The
result of the exchange step is

    1x1 + 1x2 = 2
    0.000025x1 + 1x2 = 1

The multiplier 0.000025/1 = 0.000025 is computed exactly, as is 0.000025 times equation 1. The result
of the elimination step is

    1x1 + 1x2 = 2
    0x1 + 0.99998x2 = 0.99995

because, in the subtract operation, 1 − 0.000025 = 0.999975 rounds to 0.99998 and 1 − 2 × 0.000025 =
0.99995 is correctly rounded. Backward substitution produces an approximate solution

    x2 = 0.99997,   x1 = (2 − x2)/1 = 1

that is correct to five significant digits.
Finally, we work in round-to-nearest 3 significant digit decimal arithmetic without using ex-
changes. The multiplier is computed exactly because its value 1/0.000025 = 40000 rounds to itself. The
elimination step subtracts 40000 times equation 1 from equation 2. Now 40000 times equation 1 is
computed exactly. So, the only possible rounding error in the elimination step is at the subtraction.
There, the coefficient multiplying x2 in the second equation is 1 − 40000 = −40000, to three digits,
and the right hand side is 2 − 40000 = −40000, which is again correctly rounded. So, the result is

    0.000025x1 + 1x2 = 1
    0x1 + (−40000)x2 = −40000

Backward substitution produces the approximate solution

    x2 = (−40000)/(−40000) = 1.0,   x1 = (1 − x2)/0.000025 = 0.0

So, for this example working to three digits without exchanges produces the same result as working
to four digits. The reader may verify that working to three digits with exchanges produces the
correct result to three digits.
Partial pivoting by rows for size is a heuristic. One explanation why this heuristic is generally
successful is as follows. Given a list of candidates for the next pivot, those with smaller magnitude
are more likely to have been formed by a subtract magnitude computation of larger numbers, so the
resulting cancellation might make them less accurate than the other candidates for pivot. The mul-
tipliers determined by dividing by such smaller magnitude, inaccurate numbers are therefore larger
magnitude, inaccurate numbers. Consequently, elimination may produce large and unexpectedly
inaccurate coefficients; in extreme circumstances, these large coefficients may contain little informa-
tion from the coefficients of the original equations. While the heuristic of partial pivoting by rows
for size generally improves the chances that GEPP will produce an accurate answer, it is neither
perfect nor always better than any other interchange strategy.

Problem 3.3.12. Use Gaussian elimination to verify that the exact solution of the linear
system
0.000025x1 + 1x2 = 1
1x1 + 1x2 = 2
is

    x1 = 40000/39999 ≈ 1,   x2 = 39998/39999 ≈ 1.

Problem 3.3.13. (Watkins) Carry out Gaussian elimination without exchange steps and
with row–oriented backward substitution on the linear system:

0.002x1 + 1.231x2 + 2.471x3 = 3.704


1.196x1 + 3.165x2 + 2.543x3 = 6.904
1.475x1 + 4.271x2 + 2.142x3 = 7.888

Use round-to-nearest 4 significant digit decimal arithmetic. Display each pivot, each multi-
plier, and the result of each elimination step. Does catastrophic cancellation occur, and if
so where? Hint: The exact solution is x1 = 1, x2 = 1 and x3 = 1. The computed solution
with approximate arithmetic and no exchanges is x1 = 4.000, x2 = 1.012 and x3 = 2.000.

Problem 3.3.14. Carry out GEPP (with exchange steps) and row–oriented backward sub-
stitution on the linear system in Problem 3.3.13. Use round-to-nearest 4 significant digit
decimal arithmetic. Display each pivot, each multiplier, and the result of each elimination
step. Does catastrophic cancellation occur, and if so where?

Problem 3.3.15. Round the entries in the linear system in Problem 3.3.13 to three sig-
nificant digits. Now, repeat the calculations in Problems 3.3.13 and 3.3.14 using round-to-
nearest 3 significant digit decimal arithmetic. What do you observe?

3.4 Gaussian Elimination and Matrix Factorizations


The concept of matrix factorizations is fundamentally important in the process of numerically solving
linear systems, Ax = b. The basic idea is to decompose the matrix A into a product of simply solved
systems, from which the solution of Ax = b can be easily computed. The idea is similar to what we
might do when trying to find the roots of a polynomial. For example, the equations

   x^3 − 6x^2 + 11x − 6 = 0    and    (x − 1)(x − 2)(x − 3) = 0

are equivalent, but the factored form is clearly much easier to solve. In general, we cannot solve linear
systems so easily (i.e., by inspection), but decomposing A makes solving the linear system Ax = b
computationally simpler. Some knowledge of matrix algebra, especially matrix multiplication, is
needed to understand the concepts introduced here; a review is given in Section 1.2.

3.4.1 LU Factorization
Our aim here is to show that if Gaussian elimination can be used to reduce an n ⇥ n matrix A to
upper triangular form, then the information computed in the elimination process can be used to
write A as the product
A = LU
where L is a unit lower triangular matrix (that is, a lower triangular matrix with 1’s on the diagonal)
and U is an upper triangular matrix. This is called a matrix factorization, and we will see that
the idea of matrix factorization is very powerful when solving, analyzing and understanding linear
algebra problems.
To see how we can get to the LU factorization, suppose we apply Gaussian elimination to an n × n
matrix A without interchanging any rows. For example, for n = 3,

   [ a11  a12  a13 ]      [ a11  a12      a13     ]      [ a11  a12      a13     ]
   [ a21  a22  a23 ]  →   [ 0    a22^(1)  a23^(1) ]  →   [ 0    a22^(1)  a23^(1) ]
   [ a31  a32  a33 ]      [ 0    a32^(1)  a33^(1) ]      [ 0    0        a33^(2) ]

   m21 = a21/a11,  m31 = a31/a11                         m32 = a32^(1)/a22^(1)

Here the superscript on the entry a_ij^(k) indicates an element of the matrix modified during the kth
elimination step. Recall that, in general, the multipliers are computed as

   m_ij = (element to be eliminated) / (current pivot element).
Instead of using arrows to show the elimination steps, we can use matrix-matrix multiplication with
elimination matrices. Specifically, consider the multipliers m21 and m31 used to eliminate the entries
a21 and a31 in the first column of A. If we put these into a unit lower triangular matrix as follows,

        [  1    0  0 ]
   M1 = [ -m21  1  0 ] ,
        [ -m31  0  1 ]

then

          [  1    0  0 ] [ a11  a12  a13 ]
   M1 A = [ -m21  1  0 ] [ a21  a22  a23 ]
          [ -m31  0  1 ] [ a31  a32  a33 ]

          [ a11            a12            a13           ]
        = [ a21 - m21 a11  a22 - m21 a12  a23 - m21 a13 ]
          [ a31 - m31 a11  a32 - m31 a12  a33 - m31 a13 ]

          [ a11  a12      a13     ]
        = [ 0    a22^(1)  a23^(1) ]
          [ 0    a32^(1)  a33^(1) ]

Now use the multipliers for the next column to similarly define the elimination matrix M2,

        [ 1   0    0 ]
   M2 = [ 0   1    0 ]
        [ 0  -m32  1 ]

and observe that

               [ 1   0    0 ] [ a11  a12      a13     ]
   M2 (M1 A) = [ 0   1    0 ] [ 0    a22^(1)  a23^(1) ]
               [ 0  -m32  1 ] [ 0    a32^(1)  a33^(1) ]

               [ a11  a12                      a13                     ]
             = [ 0    a22^(1)                  a23^(1)                 ]
               [ 0    a32^(1) - m32 a22^(1)    a33^(1) - m32 a23^(1)   ]

               [ a11  a12      a13     ]
             = [ 0    a22^(1)  a23^(1) ]
               [ 0    0        a33^(2) ]

Thus, for this 3 × 3 example, we have

   M2 M1 A = U    (upper triangular).

Notice that M1 and M2 are nonsingular (as an exercise you should explain why this is the case),
and so we can write
   A = M1^(-1) M2^(-1) U.

The next observation we make is that the inverse of an elimination matrix is very easy to compute:
we just need to change the signs of the multipliers, replacing -m_ij by m_ij; that is,

             [ 1    0  0 ]                  [ 1  0    0 ]
   M1^(-1) = [ m21  1  0 ]   and  M2^(-1) = [ 0  1    0 ]
             [ m31  0  1 ]                  [ 0  m32  1 ]

This can be easily verified by showing that M1^(-1) M1 = I and M2^(-1) M2 = I.
The final observation we need is that the product of elimination matrices (and the product of
their inverses) is a unit lower triangular matrix. In particular, observe that

                     [ 1    0  0 ] [ 1  0    0 ]   [ 1    0    0 ]
   M1^(-1) M2^(-1) = [ m21  1  0 ] [ 0  1    0 ] = [ m21  1    0 ]
                     [ m31  0  1 ] [ 0  m32  1 ]   [ m31  m32  1 ]

Thus, for this simple 3 × 3 example, we have computed

   A = (M1^(-1) M2^(-1)) U = LU

where L is a unit lower triangular matrix, with multipliers below the main diagonal, and U is the
upper triangular matrix obtained after the elimination is complete.
It is not difficult to generalize (e.g., by induction) to a general n × n matrix. Specifically, if A is
an n × n matrix, then:

• An elimination matrix Mj is the identity matrix, except that the jth column has the negative
  multipliers, -m_ij, below the main diagonal, for i = j+1, j+2, . . . , n:

        [ 1                            ]
        [    ...                       ]
   Mj = [         1                    ]
        [       -m_{j+1,j}   1         ]
        [          ...           ...   ]
        [       -m_{n,j}             1 ]

• The inverse of an elimination matrix, Mj^(-1), is easy to compute by simply changing the signs
  of the multipliers:

             [ 1                           ]
             [    ...                      ]
   Mj^(-1) = [         1                   ]
             [        m_{j+1,j}   1        ]
             [          ...          ...   ]
             [        m_{n,j}            1 ]

• If the process does not break down (that is, all the pivot elements a_{1,1}, a_{2,2}^(1), a_{3,3}^(2), . . . are
  nonzero), then we can construct elimination matrices M1, M2, . . ., M_{n-2}, M_{n-1} such that

     M_{n-1} M_{n-2} · · · M2 M1 A = U    (upper triangular)

  or
     A = (M1^(-1) M2^(-1) · · · M_{n-2}^(-1) M_{n-1}^(-1)) U = LU

  where

        [ 1                                  ]
        [ m_{2,1}   1                        ]        and U is the upper triangular
   L =  [ m_{3,1}   m_{3,2}   1              ]        matrix obtained after the
        [    ...       ...        ...        ]        elimination is complete.
        [ m_{n,1}   m_{n,2}   m_{n,3} · · · 1 ]

Example 3.4.1. Let
        [ 1   2   3 ]
   A =  [ 2  -3   2 ] .
        [ 3   1  -1 ]

Using Gaussian elimination without row interchanges, we obtain

   [ 1   2   3 ]      [ 1   2    3 ]      [ 1   2     3   ]
   [ 2  -3   2 ]  →   [ 0  -7   -4 ]  →   [ 0  -7    -4   ]
   [ 3   1  -1 ]      [ 0  -5  -10 ]      [ 0   0  -50/7  ]

   m21 = 2, m31 = 3                       m32 = 5/7

and thus

        [ 1    0    0 ]   [ 1    0   0 ]              [ 1   2    3   ]
   L =  [ m21  1    0 ] = [ 2    1   0 ]   and   U =  [ 0  -7   -4   ] .
        [ m31  m32  1 ]   [ 3   5/7  1 ]              [ 0   0  -50/7 ]

It is straightforward to verify that A = LU.
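
As a quick check, the factors of Example 3.4.1 can be multiplied in Matlab and compared with A (a small verification sketch, not part of the original example):

% Verify the LU factorization from Example 3.4.1.
A = [1 2 3; 2 -3 2; 3 1 -1];
L = [1 0 0; 2 1 0; 3 5/7 1];
U = [1 2 3; 0 -7 -4; 0 0 -50/7];
disp(norm(A - L*U))     % should be zero, up to roundoff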

If we can compute the factorization A = LU , then

   Ax = b   ⇒   LUx = b   ⇒   Ly = b,  where  Ux = y.

So, to solve Ax = b:
• Compute the factorization A = LU .
• Solve Ly = b using forward substitution.
• Solve U x = y using backward substitution.

Example 3.4.2. Consider solving the linear system Ax = b, where

            [ 1   0  0 ] [ 1  2  -1 ]              [  2 ]
   A = LU = [ 3   1  0 ] [ 0  2  -1 ]   and   b =  [  9 ]
            [ 2  -2  1 ] [ 0  0   1 ]              [ -1 ]

Since the LU factorization of A is given, we need only:

• Solve Ly = b, or
     [ 1   0  0 ] [ y1 ]   [  2 ]
     [ 3   1  0 ] [ y2 ] = [  9 ]
     [ 2  -2  1 ] [ y3 ]   [ -1 ]

  Using forward substitution, we obtain
     y1 = 2
     3y1 + y2 = 9          ⇒  y2 = 9 − 3(2) = 3
     2y1 − 2y2 + y3 = −1   ⇒  y3 = −1 − 2(2) + 2(3) = 1

• Solve Ux = y, or
     [ 1  2  -1 ] [ x1 ]   [ 2 ]
     [ 0  2  -1 ] [ x2 ] = [ 3 ]
     [ 0  0   1 ] [ x3 ]   [ 1 ]

  Using backward substitution, we obtain
     x3 = 1.
     2x2 − x3 = 3          ⇒  x2 = (3 + 1)/2 = 2.
     x1 + 2x2 − x3 = 2     ⇒  x1 = 2 − 2(2) + 1 = −1.

Therefore, the solution of Ax = b is given by

        [ -1 ]
   x =  [  2 ] .
        [  1 ]
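
The same three steps are easy to carry out in Matlab. The following sketch uses the L, U and b of Example 3.4.2; the backslash operator (discussed in Section 3.7) performs the triangular solves:

% Solve Ax = b given the factorization A = LU (data from Example 3.4.2).
L = [1 0 0; 3 1 0; 2 -2 1];
U = [1 2 -1; 0 2 -1; 0 0 1];
b = [2; 9; -1];
y = L\b;        % forward substitution with the lower triangular L
x = U\y;        % backward substitution with the upper triangular U
disp(x)         % expected: [-1; 2; 1]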

Problem 3.4.1. Find the LU factorization of each of the following matrices:


2 3 2 3
4 2 1 0 3 6 1 2
6 4 6 1 3 7 6 6 13 0 1 7
A=6 4 8 16
7,
5 B=6 4
7
3 4 1 2 1 1 5
20 10 4 3 3 8 1 12

Problem 3.4.2. Suppose the LU factorization of a matrix A is given by:

            [ 1  0  0 ] [ 3  2  1 ]
   A = LU = [ 2  1  0 ] [ 0  2  5 ]
            [ 1  1  1 ] [ 0  0  2 ]

        [ 1 ]
For b = [ 7 ] , solve Ax = b.
        [ 6 ]

Problem 3.4.3. Suppose

        [ 1  3  4 ]              [ 1 ]
   A =  [ 0  1  5 ]   and   b =  [ 3 ] .
        [ 2  0  4 ]              [ 2 ]

(a) Use Gaussian elimination without row interchanges to find the factorization A = LU.
(b) Use the factorization of A to solve Ax = b.
Problem 3.4.4. Suppose

        [ 4  8  12  8 ]              [  3 ]
   A =  [ 3  1   1  4 ]   and   b =  [ 60 ] .
        [ 1  2   3  4 ]              [  1 ]
        [ 2  3   2  1 ]              [  5 ]

(a) Use Gaussian elimination without row interchanges to find the factorization A = LU.
(b) Use the factorization of A to solve Ax = b.

3.4.2 P A = LU Factorization
The practical implementation of Gaussian elimination uses partial pivoting to determine if row
interchanges are needed. We show that this results in a modification of the LU factorization. Row
interchanges can by represented mathematically as multiplication by a permutation matrix, obtained
by interchanging rows of the identity matrix.

Example 3.4.3. Consider the matrix

        [ 0  3  8  6 ]
   A =  [ 2  3  0  1 ]
        [ 5  2  4  7 ]
        [ 1  1  1  1 ]

In Gaussian elimination with partial pivoting, because |a31| = 5 is the largest (in magnitude) entry
in the first column, we begin by switching rows 1 and 3. This can be represented in terms of matrix-
matrix multiplication with a permutation matrix P , which is constructed by switching the first and
third rows of a 4 × 4 identity matrix. Specifically,

        [ 1  0  0  0 ]                      [ 0  0  1  0 ]
   I =  [ 0  1  0  0 ]      switch          [ 0  1  0  0 ]
        [ 0  0  1  0 ]   ------------→  P = [ 1  0  0  0 ]
        [ 0  0  0  1 ]   rows 1 and 3       [ 0  0  0  1 ]

Multiplying the matrix A on the left by P switches its first and third rows. That is,

         [ 0  0  1  0 ] [ 0  3  8  6 ]   [ 5  2  4  7 ]
   P A = [ 0  1  0  0 ] [ 2  3  0  1 ] = [ 2  3  0  1 ]
         [ 1  0  0  0 ] [ 5  2  4  7 ]   [ 0  3  8  6 ]
         [ 0  0  0  1 ] [ 1  1  1  1 ]   [ 1  1  1  1 ]

We can now combine elimination matrices with permutation matrices to describe Gaussian elim-
ination with partial pivoting using matrix-matrix multiplications. Specifically,

   M_{n-1} P_{n-1} · · · M2 P2 M1 P1 A = U

where
• P1 is the permutation matrix that swaps the largest, in magnitude, entry in the first column
of A to the (1,1) location.
• M1 is the elimination matrix that zeros out all entries of the first column of P1 A below the
(1,1) pivot entry.
• P2 is the permutation matrix that swaps the largest, in magnitude, entry on and below the
(2,2) diagonal entry in the second column of M1 P1 A with the (2,2) location.
• M2 is the elimination matrix that zeros out all entries of the second column of P2 M1 P1 A below
the (2,2) pivot entry.
• etc.
Thus, after completing Gaussian elimination with partial pivoting, we have

   A = (M_{n-1} P_{n-1} · · · M2 P2 M1 P1)^(-1) U.                    (3.1)

Unfortunately the matrix (M_{n-1} P_{n-1} · · · M2 P2 M1 P1)^(-1) is not lower triangular, but it is sometimes
called a "psychologically lower triangular matrix" because it can be transformed into a triangular
matrix by permutation. Specifically, if we multiply all of the permutation matrices,

   P = P_{n-1} · · · P2 P1,

then P is a permutation matrix that combines all row interchanges, and

   L = P (M_{n-1} P_{n-1} · · · M2 P2 M1 P1)^(-1)

is a unit lower triangular matrix. Thus, if we multiply P on both sides of equation (3.1), we obtain

   P A = P (M_{n-1} P_{n-1} · · · M2 P2 M1 P1)^(-1) U = LU.
Therefore, when we apply Gaussian elimination with partial pivoting by rows to reduce A to
upper triangular form, we obtain an LU factorization of a permuted version of A. That is,
P A = LU
where P is a permutation matrix representing all row interchanges in the order that they are applied.
When writing code, or performing hand calculations, it is not necessary to explicitly form any
of the matrices Pi or Mi , but instead we can proceed as for the LU factorization, and keep track of
the row interchanges and multipliers as follows:
• Each time we switch rows of A, we switch corresponding multipliers in L. For example, if at
stage 3 we switch rows 3 and 5, then we must also switch the previously computed multipliers
m3,k and m5,k , k = 1, 2.
• Begin with P = I. Each time we switch rows of A, we switch corresponding rows of P .

Example 3.4.4. This example illustrates the process of computing P A = LU for the matrix

        [ 1  2  4 ]
   A =  [ 4  5  6 ] .
        [ 7  8  9 ]

        A                        P                 multipliers

   [ 1  2  4 ]              [ 1  0  0 ]
   [ 4  5  6 ]              [ 0  1  0 ]            nothing yet
   [ 7  8  9 ]              [ 0  0  1 ]
        ↓                        ↓                      ↓
   [ 7  8  9 ]              [ 0  0  1 ]
   [ 4  5  6 ]              [ 0  1  0 ]            nothing yet
   [ 1  2  4 ]              [ 1  0  0 ]
        ↓                        ↓                      ↓
   [ 7   8    9   ]         [ 0  0  1 ]
   [ 0  3/7  6/7  ]         [ 0  1  0 ]            m21 = 4/7, m31 = 1/7
   [ 0  6/7  19/7 ]         [ 1  0  0 ]
        ↓                        ↓                      ↓
   [ 7   8    9   ]         [ 0  0  1 ]
   [ 0  6/7  19/7 ]         [ 1  0  0 ]            m21 = 1/7, m31 = 4/7
   [ 0  3/7  6/7  ]         [ 0  1  0 ]
        ↓                        ↓                      ↓
   [ 7   8    9   ]         [ 0  0  1 ]
   [ 0  6/7  19/7 ]         [ 1  0  0 ]            m21 = 1/7, m31 = 4/7, m32 = 1/2
   [ 0   0   -1/2 ]         [ 0  1  0 ]

From the information in the final step of the process, we obtain the P A = LU factorization, where

        [ 0  0  1 ]        [  1    0   0 ]        [ 7   8    9   ]
   P =  [ 1  0  0 ] , L =  [ 1/7   1   0 ] , U =  [ 0  6/7  19/7 ]
        [ 0  1  0 ]        [ 4/7  1/2  1 ]        [ 0   0   -1/2 ]

An efficient algorithm for computing the P A = LU factorization usually does not explicitly
construct P , but instead keeps an index of “pointers” to rows, but we will leave that discussion for
more advanced courses on numerical linear algebra.
If we can compute the factorization P A = LU , then

   Ax = b   ⇒   P Ax = P b   ⇒   LUx = P b   ⇒   Ly = P b,  where  Ux = y.

Therefore, to solve Ax = b:
• Compute the factorization P A = LU .

• Permute entries of b to obtain d = P b.


• Solve Ly = d using forward substitution.
• Solve U x = y using backward substitution.

It is important to emphasize the importance and power of matrix factorizations. Computing
A = LU or P A = LU requires O(n^3) FLOPS, but forward and backward solves require
only O(n^2) FLOPS. Thus, if we need to solve multiple linear systems where the matrix A does not
change, but with different right hand side vectors, such as

   Ax1 = b1,  Ax2 = b2,  · · ·

then we need only compute the (relatively expensive) matrix factorization once, and reuse it for all
linear system solves.
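
A sketch of this reuse pattern with the built-in lu function (the matrix and right hand sides below are arbitrary, for illustration only):

% Factor once, then reuse the factors for several right hand sides.
A = rand(1000);                  % a random test matrix (almost certainly nonsingular)
[L, U, P] = lu(A);               % O(n^3) work, done once
for k = 1:5
    b = rand(1000,1);            % a new right hand side
    x = U \ (L \ (P*b));         % only O(n^2) work per solve
end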
Matrix factorizations might be useful for other calculations, such as computing the determinant
of A. Recall the following properties of determinants:

• The determinant of a product of matrices is the product of their determinants. Thus, in


particular,
det(P A) = det(P )det(A) and det(LU ) = det(L)det(U ) .

• The determinant of the permutation matrix P is ±1; it is +1 if an even number of row swaps
were performed, and −1 if an odd number of row swaps were performed.
• The determinant of a triangular matrix is the product of its diagonal entries. In particular,
det(L) = 1 because it is a unit lower triangular matrix.

Using these properties, we see that

det(P A) = det(LU )
det(P )det(A) = det(L)det(U )
±det(A) = det(U )
det(A) = ±det(U ) = ±u11 u22 · · · unn

Thus, once we have the P A = LU factorization, it is trivial to compute the determinant of A.
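
For instance, a minimal sketch using the matrix of Example 3.4.4 and the built-in lu function:

% Compute det(A) from the PA = LU factorization.
A = [1 2 4; 4 5 6; 7 8 9];
[L, U, P] = lu(A);                % P*A = L*U
detA = det(P) * prod(diag(U));    % det(P) is +1 or -1, and det(L) = 1
disp([detA det(A)])               % both should be -3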



Example 3.4.5. Use the P A = LU factorization of Example 3.4.4 to solve Ax = b, where

   b^T = [ 1  2  3 ]

Since the P A = LU factorization is given, we need only:

                     [ 3 ]
• Obtain d = P b =   [ 1 ]
                     [ 2 ]

• Solve Ly = d, or
     [  1    0   0 ] [ y1 ]   [ 3 ]
     [ 1/7   1   0 ] [ y2 ] = [ 1 ]
     [ 4/7  1/2  1 ] [ y3 ]   [ 2 ]

  Using forward substitution, we obtain
     y1 = 3
     (1/7)y1 + y2 = 1             ⇒  y2 = 1 − (1/7)(3) = 4/7
     (4/7)y1 + (1/2)y2 + y3 = 2   ⇒  y3 = 2 − (4/7)(3) − (1/2)(4/7) = 0

• Solve Ux = y, or
     [ 7   8    9   ] [ x1 ]   [  3  ]
     [ 0  6/7  19/7 ] [ x2 ] = [ 4/7 ]
     [ 0   0   -1/2 ] [ x3 ]   [  0  ]

  Using backward substitution, we obtain
     (−1/2)x3 = 0                 ⇒  x3 = 0.
     (6/7)x2 + (19/7)x3 = 4/7     ⇒  x2 = (7/6)(4/7 − (19/7)(0)) = 2/3.
     7x1 + 8x2 + 9x3 = 3          ⇒  x1 = (1/7)(3 − 8(2/3) − 9(0)) = −1/3.

Therefore, the solution of Ax = b is given by

        [ -1/3 ]
   x =  [  2/3 ] .
        [   0  ]
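
The whole computation can be reproduced with the built-in lu function, which returns the P A = LU factors computed above (a short verification sketch):

% Solve Ax = b for the data of Examples 3.4.4 and 3.4.5.
A = [1 2 4; 4 5 6; 7 8 9];
b = [1; 2; 3];
[L, U, P] = lu(A);    % P*A = L*U with partial pivoting
d = P*b;              % permute the right hand side
y = L\d;              % forward substitution
x = U\y;              % backward substitution
disp(x)               % expected: [-1/3; 2/3; 0]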

Problem 3.4.5. Use Gaussian elimination with partial pivoting to find the P A = LU
factorization of the matrices:
2 3
2 3 2 3 2 1 0 3
1 3 4 3 6 9 6 0 3 1
4 7
A=4 0 1 5 5, B = 4 2 5 2 5, C = 6 4 1
4 2 7
1 5
1 1 2
2 0 4 3 4 11 5
2 2 1 13
⇥ ⇤
Problem 3.4.6. Suppose that b^T = [ 3  60  1  5 ], and suppose that Gaussian elimina-
tion with partial pivoting has been used on a matrix A to obtain its P A = LU factorization,
where

        [ 0  1  0  0 ]        [  1    0    0   0 ]        [ 4  8  12   8 ]
   P =  [ 0  0  0  1 ] , L =  [ 3/4   1    0   0 ] , U =  [ 0  5  10  10 ] .
        [ 1  0  0  0 ]        [ 1/4   0    1   0 ]        [ 0  0   6   6 ]
        [ 0  0  1  0 ]        [ 1/2  1/5  1/3  1 ]        [ 0  0   0   1 ]

Use this factorization (do not compute the matrix A) to solve Ax = b.


Problem 3.4.7. Use the P A = LU factorizations of the matrices in the previous two
problems to compute det(A).

3.5 Other Matrix Factorizations


As mentioned in the previous section, the idea of matrix factorization is very important and powerful
when performing linear algebra computations. There are many types of matrix factorization, besides
P A = LU , that may be preferred for certain problems. In this section we consider three other matrix
factorizations: Cholesky, QR, and singular value decomposition (SVD).

3.5.1 Cholesky factorization


A matrix A ∈ R^(n×n) is symmetric if A = A^T, which means that the entries are symmetric about the
main diagonal.

Example 3.5.1. Consider the matrices

        [ 3  0  1  5 ]              [ 3  0  1  5 ]
   A =  [ 0  2  4  8 ]   and   B =  [ 1  2  4  8 ]
        [ 1  4  1  2 ]              [ 0  3  1  2 ]
        [ 5  8  2  6 ]              [ 2  1  1  6 ]

then
         [ 3  0  1  5 ]                        [ 3  1  0  2 ]
   A^T = [ 0  2  4  8 ] = A      but     B^T = [ 0  2  3  1 ] ≠ B.
         [ 1  4  1  2 ]                        [ 1  4  1  1 ]
         [ 5  8  2  6 ]                        [ 5  8  2  6 ]

Thus A is symmetric, but B is not symmetric.

A matrix A ∈ R^(n×n) is positive definite if x^T Ax > 0 for all x ∈ R^n, x ≠ 0. The special structure
of a symmetric and positive definite matrix (which we will abbreviate as SPD) allows us to compute
a special LU factorization, which is called the Cholesky factorization. Specifically, it can be shown
that an n × n matrix A is SPD if and only if it can be factored as[1]

A = RT R (called the Cholesky factorization)

where R is an upper triangular matrix with positive entries on the diagonal. Pivoting is generally
not needed to compute this factorization. The “if and only if” part of the above statement is
important. This means that if we are given a matrix A, and the Cholesky factorization fails, then
we know A is not SPD, but if it succeeds, then we know A is SPD.
To give a brief outline of how to compute a Cholesky factorization, it is perhaps easiest to begin
with a small 3 × 3 example. That is, consider

        [ a11  a12  a13 ]
   A =  [ a12  a22  a23 ] .
        [ a13  a23  a33 ]

To find the Cholesky factorization:

          [ r11  r12  r13 ]
• Set R = [ 0    r22  r23 ]
          [ 0    0    r33 ]

                          [ r11^2     r11 r12               r11 r13                 ]
• Form the matrix R^T R = [ r11 r12   r12^2 + r22^2         r12 r13 + r22 r23       ]
                          [ r11 r13   r12 r13 + r22 r23     r13^2 + r23^2 + r33^2   ]
[1] In some books, the Cholesky factorization is defined as A = LL^T where L is lower triangular. This is the same as
the notation used in this book (which better matches the default computed by Matlab), with L = R^T.

• Now set A = R^T R and match corresponding components to solve for the r_ij. That is,

     r11^2 = a11                    ⇒  r11 = sqrt(a11)
     r11 r12 = a12                  ⇒  r12 = a12 / r11
     r11 r13 = a13                  ⇒  r13 = a13 / r11
     r12^2 + r22^2 = a22            ⇒  r22 = sqrt(a22 − r12^2)
     r13 r12 + r23 r22 = a23        ⇒  r23 = (a23 − r13 r12) / r22
     r13^2 + r23^2 + r33^2 = a33    ⇒  r33 = sqrt(a33 − r13^2 − r23^2)

The above process can easily be generalized for any n × n SPD matrix. We skip efficient imple-
mentation details, but make two observations. First, generally we do not need to consider pivoting
when computing the Cholesky factorization of an SPD matrix, and the values inside the square root
symbols are always positive. The second observation is that because the matrix is symmetric, it
should not be surprising that an efficient implementation costs approximately half the number of
FLOPS needed for standard P A = LU factorizations.
Solving linear systems with the Cholesky factorization is essentially the same as with P A = LU .
That is, if A = RT R, then

   Ax = b   ⇒   R^T Rx = b   ⇒   R^T (Rx) = b,

and so to compute x,
• use forward substitution to solve RT y = b, and
• use backward substitution to solve Rx = y.
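
In Matlab the factorization is provided by the built-in chol function: R = chol(A) returns the upper triangular factor with A = R^T R, and issues an error if A is not SPD. A minimal sketch of the two-step solve, using a small SPD matrix chosen only for illustration:

% Solve Ax = b for a symmetric positive definite A via the Cholesky factorization.
A = [2 -1 0; -1 2 -1; 0 -1 2];   % a standard SPD test matrix (not from the text)
b = [1; 0; 1];
R = chol(A);          % A = R'*R, with R upper triangular
y = R' \ b;           % forward substitution with R^T
x = R  \ y;           % backward substitution with R
disp(x)               % expected: [1; 1; 1]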

Problem 3.5.1. Compute the Cholesky factorization of the matrix:

        [ 4  1  1 ]
   A =  [ 1  3  1 ]
        [ 1  1  2 ]

Problem 3.5.2. Compute the Cholesky factorization of the matrix:

        [ 1  2  1 ]
   A =  [ 2  8  4 ]
        [ 1  4  6 ]

Problem 3.5.3. Compute the Cholesky factorization of

        [ 25  15  5 ]
   A =  [ 15  25  1 ]
        [  5   1  6 ]

and use the factorization to solve Ax = b, where

        [  5 ]
   b =  [  7 ] .
        [ 12 ]

3.5.2 QR factorization
A matrix Q ∈ R^(n×n) is called orthogonal if the columns of Q form an orthonormal set. That is, if
we write
   Q = [ q1  q2  · · ·  qn ],
where qj is the jth column of Q, then

   qi^T qj = 1  if i = j,    and    qi^T qj = 0  if i ≠ j.

This means that if Q is an orthogonal matrix, then

           [ q1^T ]                         [ q1^T q1  q1^T q2  · · ·  q1^T qn ]   [ 1  0  · · ·  0 ]
   Q^T Q = [ q2^T ] [ q1  q2  · · ·  qn ] = [ q2^T q1  q2^T q2  · · ·  q2^T qn ] = [ 0  1  · · ·  0 ]
           [  ...  ]                        [   ...      ...              ... ]   [ ...     ...   ]
           [ qn^T ]                         [ qn^T q1  qn^T q2  · · ·  qn^T qn ]   [ 0  0  · · ·  1 ]

That is, Q^T Q = I, and thus the inverse of Q is simply Q^T. This is a very nice property!
A first course on linear algebra often covers a topic called Gram-Schmidt orthonormalization,
which transforms a linearly independent set of vectors into an orthonormal set. We will not review
the Gram-Schmidt method in this book, but state that if it is applied to the columns of a nonsingular
matrix A ∈ R^(n×n), then it results in a matrix factorization of the form

   A = QR,

where Q ∈ R^(n×n) is an orthogonal matrix, and R ∈ R^(n×n) is upper triangular. This is called the QR
factorization of A.
We remark that a QR factorization can also be computed for a (possibly over-determined) rectangular
matrix A ∈ R^(m×n) with m ≥ n. This will be discussed when we consider the topic of least
squares in the chapter on curve fitting. We also remark that there are other (often better) approaches
than Gram-Schmidt for computing A = QR (e.g., Householder and Givens methods), but these are
best left for a more advanced course on numerical linear algebra.
Computing solutions of Ax = b with the QR factorization is also straightforward:

   Ax = b   ⇒   QRx = b   ⇒   Rx = Q^T b,

and so to solve Ax = b,
• compute d = QT b, and
• use backward substitution to solve Rx = d.
Although we do not discuss algorithms for computing A = QR in this book, we should note that if
A ∈ R^(n×n) is nonsingular, then an efficient implementation requires O(n^3) FLOPS. But the hidden
constant in the O(·) notation is approximately two times that for the P A = LU factorization. Thus,
P A = LU is usually the preferred factorization for solving n ⇥ n nonsingular systems of equations.
However, the QR factorization is superior for solving least squares problems.
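
A sketch of the QR approach with the built-in qr function (the 2-by-2 system is an arbitrary illustration, not from the text):

% Solve Ax = b using the QR factorization.
A = [2 1; 1 3];  b = [3; 5];
[Q, R] = qr(A);       % A = Q*R, Q orthogonal, R upper triangular
d = Q' * b;           % apply Q^T to the right hand side
x = R \ d;            % backward substitution
disp(x)               % expected: [0.8; 1.4]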

3.5.3 Singular Value Decomposition


We end this section with arguably the most important matrix factorization. Let A ∈ R^(m×n), m ≥ n.
Then there exist orthogonal matrices

   U = [ u1  u2  · · ·  um ] ∈ R^(m×m)    and    V = [ v1  v2  · · ·  vn ] ∈ R^(n×n)

and a diagonal matrix

                                   [ σ1           ]
   Σ = diag(σ1, σ2, . . . , σn) =  [     ...      ]  ∈ R^(m×n)
                                   [          σn  ]
                                   [              ]

such that A = U Σ V^T, with σ1 ≥ σ2 ≥ · · · ≥ σn ≥ 0. The factorization A = U Σ V^T is called the
singular value decomposition (SVD).
As with the QR factorization, algorithms for computing the SVD are very complicated, and best
left for a more advanced course on numerical linear algebra. Here we just define the decomposition,
and discuss some of its properties. We use the following notation and terminology:
• σi are called singular values of the matrix A.
• ui , which are the columns of U , are called left singular vectors of the matrix A.
• vi , which are the columns of V , are called right singular vectors of the matrix A.
Notice that, because V is an orthogonal matrix, we know V^T V = I. Thus, for the case m ≥ n,

   A = U Σ V^T   ⇒   AV = U Σ

and so,
                                                          [ σ1          ]
   A [ v1  · · ·  vn ] = [ u1  · · ·  un  un+1  · · ·  um ] [     ...     ] .
                                                          [          σn ]
                                                          [             ]

The above can be written as:

   [ Av1  · · ·  Avn ] = [ σ1 u1  · · ·  σn un ]

That is,
   A vi = σi ui,    i = 1, 2, . . . , n.
The SVD has the following properties:

• If rank(A) = r, then
     σ1 ≥ · · · ≥ σr > σ_(r+1) = · · · = σn = 0.
  In particular, if A is n × n and nonsingular, then all singular values are nonzero.

• If rank(A) = r, then the nullspace of A is:
     null(A) = span{v_(r+1), . . . , vn}
  That is, Ax = 0 if and only if x is a linear combination of the vectors v_(r+1), . . . , vn.

• If rank(A) = r, then the range space of A is:
     range(A) = span{u1, . . . , ur}

Computing solutions of Ax = b with the SVD is straightforward:

   Ax = b   ⇒   U Σ V^T x = b   ⇒   Σ V^T x = U^T b,

and so to solve Ax = b,

• compute d = U T b,
• solve the diagonal system Σy = d,
• compute x = V y.
Although we do not discuss algorithms for computing the SVD in this book, we should note that if
A ∈ R^(n×n) is nonsingular, then an efficient implementation requires O(n^3) FLOPS. But the hidden
constant in the O(·) notation is approximately nine times that for the QR factorization, and eighteen
times that for the P A = LU factorization. Because it is so expensive, it is rarely used to solve linear
systems, but it is a superior tool for analyzing the sensitivity of linear systems, and it finds use in important
applications such as rank deficient least squares problems, principal component analysis (PCA), and
even data compression.
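
A sketch of the three-step solve with the built-in svd function (again on an arbitrary small system, for illustration only):

% Solve Ax = b using the singular value decomposition.
A = [2 1; 1 3];  b = [3; 5];
[U, S, V] = svd(A);        % A = U*S*V'
d = U' * b;                % d = U^T b
y = d ./ diag(S);          % solve the diagonal system S*y = d
x = V * y;                 % x = V*y
disp(x)                    % expected: [0.8; 1.4]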

Problem 3.5.4. Our definition of the SVD assumes that m n (that is, A has at least as
many rows as columns). Show that a similar definition, and hence decomposition, can be
written for the case m < n (that is, A has more columns than rows).
Hint: Consider using our original definition of the SVD for AT .

Problem 3.5.5. Show that A^T ui = σi vi, i = 1, 2, . . . , n.

Problem 3.5.6. Suppose A ∈ R^(n×n). Show that det(A) = ±σ1 σ2 · · · σn.
Hint: First show that for any orthogonal matrix U, det(U) = ±1.

Problem 3.5.7. In this problem we consider relationships between singular values and
eigenvalues. Recall from your basic linear algebra class that if B ∈ R^(n×n), then λ is an
eigenvalue of B if there is a nonzero vector x ∈ R^n such that

   Bx = λx.

The vector x is called an eigenvector of B. There are relationships between singular
values/vectors and eigenvalue/vectors. To see this, assume A ∈ R^(m×n), m ≥ n, and
A = U Σ V^T is the SVD of A. Then from above, we know:

   A vi = σi ui    and    A^T ui = σi vi.

Using these relationships, show:

• A^T A vi = σi^2 vi, and thus σi^2 is an eigenvalue of A^T A with corresponding eigenvector vi.

• A A^T ui = σi^2 ui, and thus σi^2 is an eigenvalue of A A^T with corresponding eigenvector ui.

• If A is square and symmetric, that is A = A^T, then the singular values of A are the
  absolute values of the eigenvalues of A.

3.6 The Accuracy of Computed Solutions


Methods for determining the accuracy of the computed solution of a linear system are discussed in
more advanced courses in numerical linear algebra. However, in this section we attempt to at least
qualitatively describe some of the factors affecting the accuracy.

3.6.1 Vector norms


First we need a tool to measure errors between vectors. Suppose we have a vector x̂ that is an
approximation of the vector x. How do we determine if x̂ is a good approximation of x? It may

seem natural to consider the error vector e = x̂ − x and determine if e is small. However, e is a
vector with possibly many entries, so what does it mean to say “e is small”? To answer this question,
we need the concept of vector norm, which uses the notation ‖ · ‖, and must satisfy the following
properties:

1. ‖v‖ ≥ 0 for all vectors v ∈ R^n, and ‖v‖ = 0 if and only if v = 0 (that is, the vector with all
   zero entries),

2. ‖v + w‖ ≤ ‖v‖ + ‖w‖ for all vectors v ∈ R^n and w ∈ R^n,

3. ‖cv‖ = |c| ‖v‖ for all vectors v ∈ R^n and all scalars c.
There are many vector norms, so sometimes we include a subscript, such as ‖ · ‖p, to indicate precisely
which norm we are using. Here are some examples:

• The 2-norm is the standard Euclidean length of a vector taught in multivariable calculus and
  linear algebra courses. Specifically, if

        [ e1 ]
   e =  [ e2 ]
        [  : ]
        [ en ]

  then we define the vector 2-norm as

     ‖e‖2 = sqrt(e^T e) = sqrt(e1^2 + e2^2 + · · · + en^2).

• The vector 1-norm is defined as

     ‖e‖1 = |e1| + |e2| + · · · + |en|.

• The vector ∞-norm is defined as

     ‖e‖∞ = max_{1≤i≤n} |ei|.

• In general, if 1 ≤ p < ∞, then the p-norm is defined as

     ‖e‖p = ( Σ_{i=1}^{n} |ei|^p )^(1/p).

Although other norms are used in certain applications, we usually use the 2-norm. However, any
norm gives us a single number, and if the norm of the error,

   ‖e‖ = ‖x̂ − x‖

is small, then we say the error is small.


We should note that "small" may be relative to the magnitude of the values in the vector x, and
thus it is perhaps better to use the relative error,

   ‖x̂ − x‖ / ‖x‖

provided x ≠ 0.

Example 3.6.1. Suppose

   x = [ 1 ]    and    x̂ = [ 0.999 ]
       [ 1 ]               [ 1.001 ]

then

   x̂ − x = [ -10^(-3) ]
           [  10^(-3) ]

and so, using the 2-norm, we get

   ‖x̂ − x‖2 = sqrt(10^(-6) + 10^(-6)) = sqrt(2) · 10^(-3) ≈ 0.0014142

and the relative error

   ‖x̂ − x‖2 / ‖x‖2 = (sqrt(2) · 10^(-3)) / sqrt(2) = 10^(-3).
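
These computations are easy to reproduce with the built-in norm function; a small sketch using the vectors of Example 3.6.1 (norm(v) gives the 2-norm, while norm(v,1) and norm(v,inf) give the 1-norm and ∞-norm):

% Relative error of the approximation in Example 3.6.1, in the 2-norm.
x    = [1; 1];
xhat = [0.999; 1.001];
relerr = norm(xhat - x) / norm(x);   % expected: 1.0e-03
disp(relerr)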

Problem 3.6.1. Consider the previous example, and compute the relative error using the
1-norm and the ∞-norm.

Problem 3.6.2. If x = [ 1  2  3  0  1 ]^T, compute ‖x‖1, ‖x‖2, and ‖x‖∞.

3.6.2 Matrix norms


The idea of vector norms can be extended to matrices. Formally, we say that ‖ · ‖ is a matrix norm
if it satisfies the following properties:

1. ‖A‖ ≥ 0 for all matrices A ∈ R^(m×n), and ‖A‖ = 0 if and only if A = 0 (that is, the matrix
   with all zero entries),

2. ‖A + B‖ ≤ ‖A‖ + ‖B‖ for all matrices A ∈ R^(m×n) and B ∈ R^(m×n),

3. ‖cA‖ = |c| ‖A‖ for all matrices A ∈ R^(m×n) and all scalars c.
Given what we know about vector norms, it may be perhaps most natural to first consider the
Frobenius matrix norm, which is defined as

   ‖A‖F = sqrt( Σ_{i=1}^{m} Σ_{j=1}^{n} a_ij^2 ).

Other matrix norms that are often used in scientific computing are the class of p-norms, which are
said to be induced by the corresponding vector norms, and are defined as

   ‖A‖p = max_{x≠0} ‖Ax‖p / ‖x‖p.

This is not a very useful definition for actual computations, but fortunately there are short cut
formulas that can be used for three of the most popular matrix p-norms. We will not prove these,
but proofs can be found in more advanced books on numerical analysis.
• The matrix 2-norm is defined as

     ‖A‖2 = max_{x≠0} ‖Ax‖2 / ‖x‖2,

  but can be computed as
     ‖A‖2 = σ1,
  where σ1 is the largest singular value of A.

• The matrix 1-norm is defined as

     ‖A‖1 = max_{x≠0} ‖Ax‖1 / ‖x‖1,

  but can be computed as

     ‖A‖1 = max_{1≤j≤n} Σ_{i=1}^{m} |a_ij|,

  that is, the maximum column sum.


• The matrix ∞-norm is defined as

     ‖A‖∞ = max_{x≠0} ‖Ax‖∞ / ‖x‖∞,

  but can be computed as

     ‖A‖∞ = max_{1≤i≤m} Σ_{j=1}^{n} |a_ij|,

  that is, the maximum row sum.


The induced matrix norms satisfy two important and useful properties,

   ‖Ax‖ ≤ ‖A‖ ‖x‖,

and
   ‖AB‖ ≤ ‖A‖ ‖B‖,

provided the matrix multiplication AB is defined.
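
All of these matrix norms are available through the built-in norm function; a short sketch (with an arbitrary test matrix) comparing the short-cut formulas with the built-in results:

% Matrix norms and their short-cut formulas.
A = [1 -2; 3 4];
n1   = norm(A, 1);       % maximum column sum:  max(sum(abs(A), 1))
ninf = norm(A, inf);     % maximum row sum:     max(sum(abs(A), 2))
n2   = norm(A, 2);       % largest singular value:  max(svd(A))
nF   = norm(A, 'fro');   % Frobenius norm:      sqrt(sum(A(:).^2))
disp([n1 ninf n2 nF])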

Problem 3.6.3. If
        [ 1  2  3 ]
        [ 2  0  5 ]
   A =  [ 1  1  1 ] ,
        [ 2  4  0 ]
compute ‖A‖1, ‖A‖∞, and ‖A‖F.

Problem 3.6.4. In the case of the Frobenius norm, show that

   ‖A‖F^2 = Σ_{j=1}^{n} ‖aj‖2^2 = trace(A^T A) = trace(A A^T)

where aj is the j-th column vector of the matrix A, and trace is the sum of diagonal entries
of the given matrix.

Problem 3.6.5. In general, for an induced matrix norm, the following inequality holds:

   ‖Ax‖ ≤ ‖A‖ ‖x‖

In some special (important) cases, equality holds. In particular, assume Q is an orthogonal
matrix, and show that
   ‖Qx‖2 = ‖x‖2.
This property says that the Euclidean length (2-norm) of the vector x does not change if
the vector is modified by an orthogonal transformation. In this case, we say the 2-norm is
invariant under orthogonal transformations.

Problem 3.6.6. We know that if A = U Σ V^T then

   ‖A‖2 = σ1    (largest singular value of A).

Show that if A is nonsingular, then

   ‖A^(-1)‖2 = 1/σn    (reciprocal of the smallest singular value of A).

Hint: If A = U Σ V^T is the SVD of A, what is the SVD of A^(-1)?

3.6.3 Measuring accuracy of computed solutions


Suppose we compute an approximate solution, x̂, of Ax = b. How do we determine if x̂ is a good
approximation of x? If we know the exact solution, then we can simply compute the relative error,
   ‖x̂ − x‖ / ‖x‖

using any vector norm.
If we do not know the exact solution, then we may try to see if the computed solution is a good
fit to the data. That is, we consider the residual error,

   ‖r‖ = ‖b − Ax̂‖,

or the relative residual error,

   ‖r‖ / ‖b‖ = ‖b − Ax̂‖ / ‖b‖.
We might ask the question:
If the relative residual error is small, does this mean x̂ is a good approximation of the exact
solution x?
Unfortunately the answer is: Not always. Consider the following example.

Example 3.6.2. Consider the matrix

   A = [ 0.835  0.667 ] ,    b = [ 0.168 ] .
       [ 0.333  0.266 ]          [ 0.067 ]

It is easy to verify that the exact solution to Ax = b is x = [ 1; -1 ]. Suppose we somehow compute
the approximation x̂ = [ 267; -334 ], which is clearly a very poor approximation of x. But the residual
vector is

   r = b − Ax̂ ≈ [ 0.001000000000019 ] ,
                [ 0.000000000000003 ]

and the relative residual error is

   ‖r‖2 / ‖b‖2 = ‖b − Ax̂‖2 / ‖b‖2 ≈ 0.005528913725860.

Thus in this example a small residual does not imply x̂ is a good approximation of x.

Why does this happen? We can gain a little insight by examining a simple 2 × 2 linear system,

   Ax = b   ⇒   a11 x1 + a12 x2 = b1,   a21 x1 + a22 x2 = b2,

and assume that a12 ≠ 0 and a22 ≠ 0. Each equation is a line, which then can be written as

   x2 = −(a11/a12) x1 + b1/a12    and    x2 = −(a21/a22) x1 + b2/a22.

The solution of the linear system, x = [ x1; x2 ], is the point where the two lines intersect. Consider
the following two very different cases (it might also help to look at the plots in Figure 3.1):

Case 1: The slopes, −a11/a12 and −a21/a22, are very different, e.g., the two lines are nearly perpendic-
ular. Then small changes in b or A (e.g., due to roundoff error) will not dramatically change
the intersection point. In this case, the rows of A are linearly independent – very far from
being linearly dependent, and A is very far from being singular. In this case, we say the matrix
A, and hence the linear system Ax = b, is well-conditioned.

Case 2: The slopes, −a11/a12 and −a21/a22, are nearly equal, e.g., the two lines are nearly parallel.
In this case, small changes in b or A (e.g., due to roundoff error) can cause a dramatic change
in the intersection point. Here, the rows of A are nearly linearly dependent, and thus A is very
close to being singular. In this case we say that the matrix A, and hence the linear system
Ax = b, is ill-conditioned.
This idea of conditioning can be extended to larger systems. In general, if the matrix is nearly
singular (i.e., the columns or rows are nearly linearly dependent), then we say the matrix A is
ill-conditioned.
So far our discussion of conditioning is a bit vague, and it would be nice to have a formal definition
and/or way to determine if a matrix is ill-conditioned. We can do this by recalling the SVD; that
is, A = U Σ V^T, where
   Σ = diag(σ1, σ2, . . . , σn),
with σ1 ≥ σ2 ≥ · · · ≥ σn ≥ 0. Recall that rank(A) is the number of nonzero singular values.
That is, A is singular if the smallest singular value, σn = 0.
Now suppose A is nonsingular, so that σn ≠ 0. How do we determine if A "is nearly singular"?
One way is to consider the ratio of the largest and smallest singular values, σ1/σn. This ratio will be
≈ 1 if the matrix is well-conditioned (e.g., as in the case of the identity matrix, I), and very large if
the matrix is ill-conditioned. We know that ‖A‖2 = σ1, and from Problem 3.6.6 we also know that
‖A^(-1)‖2 = 1/σn. Thus, we can define the condition number associated with the matrix 2-norm
to be
   κ2(A) = ‖A‖2 ‖A^(-1)‖2 = σ1/σn.
More generally, for any matrix norm, we define the condition number as
   κ(A) = ‖A‖ ‖A^(-1)‖.

Example 3.6.3. Consider the matrix

   A = [ 0.835  0.667 ] ,    b = [ 0.168 ] .
       [ 0.333  0.266 ]          [ 0.067 ]

Using MATLAB's svd function, we find that σ1 ≈ 1.1505e+00 and σ2 ≈ 8.6915e−07, and hence

   κ2(A) = ‖A‖2 ‖A^(-1)‖2 = σ1/σn ≈ 1.3238e+06.

This is a very large number, and so we conclude that A is ill-conditioned.
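
The numbers above are easy to reproduce; a sketch using the built-in svd and cond functions:

% Condition number of the matrix in Example 3.6.3.
A = [0.835 0.667; 0.333 0.266];
s = svd(A);               % singular values, largest first
kappa2 = s(1) / s(end);   % sigma_1 / sigma_n
disp([kappa2 cond(A)])    % cond uses the 2-norm by default; both are about 1.3238e+06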



Let us now return to the question:


If the relative residual error is small, does this mean x̂ is a good approximation of the exact
solution x?
To see how the residual relates to the relative error, suppose A is a nonsingular matrix and x̂ is an
approximate solution of Ax = b. Then using an induced matrix norm (e.g., 2-norm), we obtain:
• r = b − Ax̂ = Ax − Ax̂ = A(x − x̂), which means

     x − x̂ = A^(-1) r.                                                (3.2)

• If we take norms on both sides of (3.2), we obtain

     ‖x − x̂‖ = ‖A^(-1) r‖ ≤ ‖A^(-1)‖ ‖r‖.                             (3.3)

• Next observe that Ax = b implies ‖b‖ = ‖Ax‖ ≤ ‖A‖ ‖x‖, and so

     1/‖x‖ ≤ ‖A‖/‖b‖.                                                 (3.4)

• Putting together the inequalities (3.3) and (3.4), we obtain:

     ‖x − x̂‖/‖x‖ ≤ ‖A‖ ‖A^(-1)‖ ‖r‖/‖b‖ = κ(A) ‖r‖/‖b‖.               (3.5)

The result in (3.5) is important! It tells us:

• If the matrix A is well conditioned, e.g. κ(A) ≈ 1, and if the relative residual is small, then
  we can be sure to have an accurate solution.

• However, if A is ill-conditioned (e.g., κ(A) is very large), then the relative error can be large
  even if the relative residual is small. Notice that we cannot say "will be large" because the
  result is given in terms of an upper bound. But the fact that it "can be large" is important
  to know, and should be taken into account when attempting to solve ill-conditioned linear
  systems.
We conclude this subsection with the following remarks. The two categories, well-conditioned and
ill-conditioned, are separated by a grey area. That is, while we can say that a matrix with condition
number κ(A) in the range of 1 to 100 would be considered well-conditioned, and a condition number
κ(A) > 10^8 is considered ill-conditioned, there is a large "grey" area between these extremes, and it
depends on machine precision. Although we cannot get rid of this grey area, we can say that if the
condition number of A is about 10^p and the machine epsilon, ε, is about 10^(-s), then the solution of
the linear system Ax = b may have no more than about s − p accurate decimal digits. Recall that
in SP arithmetic, s ≈ 7, and in DP arithmetic, s ≈ 16.
We should keep in mind that a well-conditioned linear system has the property that all small
changes in the matrix A and the right hand side b lead to a small change in the solution of Ax = b,
and that an ill-conditioned linear system has the property that some small changes in the matrix A
and/or the right hand side b can lead to a large change in the solution of Ax = b.

3.6.4 Backward error


It is important to emphasize that the concepts of ill-conditioned and well-conditioned are related
to the problem Ax = b, and not to the algorithm used to solve the system. Even the very best
algorithms cannot be expected to compute accurate solutions of extremely ill-conditioned problems.
However, if a problem is well-conditioned, then the algorithms we use should compute accurate
solutions. This topic is typically referred to as algorithm stability.

To understand the idea of algorithm stability, consider a nonsingular linear system Ax = b of


order n. The solution process involves two steps: a matrix factorization, and solves with simple
methods, such as forward and backward substitution. Because of roundoff errors, we cannot expect
the computed solution x̂ to be the exact solution of Ax = b. However, theoretically there is a system
 and b̂ for which x̂ is the exact solution; that is, Âx̂ = b̂ (in some sense  and b̂ are obtained by
starting with x̂ and running the steps of the algorithm in a “backwards” direction). An algorithm
is called backward stable if it can be shown that x̂ is the exact solution of Âx̂ = b̂, where

   ‖Â − A‖    and    ‖b̂ − b‖                                          (3.6)

are small.
To prove an algorithm is backward stable requires establishing bounds for the error norms in
(3.6), and showing that the bounds are small. This is a relatively advanced topic, so we do not
provide any results in this book, but interested readers can find the results in many excellent books
on numerical linear algebra and matrix computations. However, it is important for readers of this
book to understand that one of the nice properties of GEPP is that, generally, Â is “close to” A and
b̂ is “close to” b, and thus GEPP is generally2 backward stable. When GEPP is backward stable,
the computed solution x̂ of Ax = b is the exact solution of Âx̂ = b̂ where  is close to A and b̂ is
close to b.
We also mention that the best algorithms for Cholesky, QR and SVD (e.g., the ones used by
Matlab) are backward stable.
How do the concepts of well-conditioned and ill-conditioned linear systems relate to the concept
of algorithm stability? Backward error analysis shows that the approximate solution of Ax = b
computed by Cholesky, QR and SVD, and usually for GEPP, are exact solutions of a related linear
system Âx̂ = b̂, where  is close to A and b̂ is close to b. Thus, if Ax = b is well-conditioned, it
follows that the computed solution is accurate. On the other hand, if Ax = b is ill-conditioned, even
if  is close to A and b̂ is close to b, these small di↵erences may lead to a large di↵erence between
the exact solution of Ax = b and the computed solution x̂. In summary, when using a backward
stable algorithm to solve a well-conditioned linear system, the computed solution is accurate. On
the other hand, if the linear system is ill-conditioned, then even with the most stable algorithms,
the computed solution may not be accurate.

Problem 3.6.7. This example illustrates how not to determine conditioning of a linear
system. We know that det(A) = 0 when the linear system Ax = b is singular. So, we might
assume that the magnitude of det(A) might be a good indicator of how close the matrix A
is to a singular matrix. Unfortunately, this is not always the case. Consider, for example,
the two linear systems:
   [ 1  0  0 ] [ x1 ]   [ 1 ]           [ 0.5   0    0  ] [ x1 ]   [ 0.5 ]
   [ 0  1  0 ] [ x2 ] = [ 2 ]    and    [  0   0.5   0  ] [ x2 ] = [ 1.0 ]
   [ 0  0  1 ] [ x3 ]   [ 3 ]           [  0    0   0.5 ] [ x3 ]   [ 1.5 ]

where the second is obtained from the first by multiplying each of its equations by 0.5.
(a) Find and compare the determinant of the two coefficient matrices.
(b) Show that GEPP produces identical solutions for both linear systems (this should be
trivial because each linear system is diagonal, so no exchanges or eliminations need be
performed).
Thus, this problem shows that the magnitude of the determinant is not necessarily a good
indicator of how close a coefficient matrix is to the nearest singular matrix.
2 We use the term generally because the error bounds depend on something called a growth factor, which is generally

small (in which case the algorithm is backward stable), but there are unusual cases where the growth factor can be
large. This topic is studied in more advanced courses on numerical linear algebra.

Problem 3.6.8. Consider the linear system of equations of order 2:

1000x1 + 999x2 = 1999


999x1 + 998x2 = 1997

(a) Show that the exact solution of this linear system is x1 = x2 = 1.


(b) For the nearby approximate solution x1 = 1.01, x2 = 0.99, compute the residual vector
and its norm (use the 2-norm).
(c) For the very inaccurate approximate solution x1 = 20.97, x2 = −18.99, compute the
residual vector and its norm (use the 2-norm). How does this compare to part (b)?
(d) What can you conclude about the matrix A? Can you confirm your conclusion?

Problem 3.6.9. Consider the linear system of equation of order 2:

0.780x1 + 0.563x2 = 0.217


0.913x1 + 0.659x2 = 0.254

with exact solution x1 = 1, x2 = −1. Consider two approximate solutions: first x1 = 0.999,
x2 = −1.001, and second x1 = 0.341, x2 = −0.087. Compute the residuals for these
approximate solutions. Is the accuracy of the approximate solutions reflected in the size of
the residuals? Is the linear system ill–conditioned?

Problem 3.6.10. Consider a linear system Ax = b. When this linear system is placed into
the computer’s memory, say by reading the coefficient matrix and right hand side from a
data file, the entries of A and b must be rounded to floating–point numbers. If the linear
system is well–conditioned, and the solution of this “rounded” linear system is computed
exactly, will this solution be accurate? Answer the same question but assuming that the
linear system Ax = b is ill–conditioned.

3.7 Matlab Notes


Software is available for solving linear systems where the coefficient matrices have a variety of
structures and properties. We restrict our discussion to “dense” systems of linear equations. (A
“dense” system is one where all the coefficients are treated as non-zero values. So, the matrix is
considered to have no special structure.) Most of today’s best software for solving “dense” systems
of linear equations, including that found in Matlab, was developed in the LAPACK project. We
describe the main tools (i.e., the backslash operator and the linsolve function) provided by Matlab
for solving dense linear systems. These, and other useful built-in Matlab functions that are relevant
to the topics of this chapter, and which will be discussed in this section, include:

linsolve used to solve linear systems Ax = b

\ (backslash) used to solve linear systems Ax = b

lu used to compute A = LU and P A = LU factorizations

cond used to compute condition number of a matrix

triu used to get upper triangular part of a matrix

tril used to get lower triangular part of a matrix

diag used to get diagonal part of a matrix, or to make a diagonal matrix

First however we develop Matlab implementations of some of the algorithms discussed in this
chapter. These examples build on the introduction in Chapter 1, and are designed to introduce
useful Matlab commands and to teach proper Matlab programming techniques.

3.7.1 Diagonal Linear Systems


Consider a simple diagonal linear system

   a11 x1           = b1           [ a11   0   · · ·   0  ] [ x1 ]   [ b1 ]
        a22 x2      = b2     ⇔    [  0   a22  · · ·   0  ] [ x2 ] = [ b2 ]
             ...                   [  :    :    ...    :  ] [  : ]   [  : ]
              ann xn = bn          [  0    0   · · ·  ann ] [ xn ]   [ bn ]

If the diagonal entries, aii, are all nonzero, it is trivial to solve for xi:

   x1 = b1/a11
   x2 = b2/a22
    ...
   xn = bn/ann
To write a Matlab function to solve a diagonal system, we must decide what quantities should be
specified as input, and what as output. For example, we could input the matrix A and right hand
side vector b, and output the solution of Ax = b, as in the code:

function x = DiagSolve1(A,b)
%
% x = DiagSolve1(A, b);
%
% Solve Ax=b, where A is an n-by-n diagonal matrix.
%
n = length(b); x = zeros(n,1);
for i = 1:n
if A(i,i) == 0
error(’Input matrix is singular’)
end
x(i) = b(i) / A(i,i);
end

We use the Matlab function length to determine the dimension of the linear system, assumed
the same as the length of the right hand side vector, and we use the error function to print an error
message in the command window, and terminate the computation, if the matrix is singular. We can
shorten this code, and make it more efficient by using array operations:

function x = DiagSolve2(A, b)
%
% x = DiagSolve2(A, b);
%
% Solve Ax=b, where A is an n-by-n diagonal matrix.
%
d = diag(A);
if any(d == 0)
error(’Input matrix is singular’)
end
x = b ./ d;

In DiagSolve2, we use Matlab’s diag function to extract the diagonal entries of A, and store
them in a column vector d. If there is at least one 0 entry in the vector d, then any(d == 0) returns
true, otherwise it returns false. If all diagonal entries are nonzero, the solution is computed using
the element-wise division operation, ./ In most cases, if we know the matrix is diagonal, then we
can substantially reduce memory requirements by using only a single vector (not a matrix) to store
the diagonal elements, as follows:

function x = DiagSolve3(d, b)
%
% x = DiagSolve3(d, b);
%
% Solve Ax=b, where A is an n-by-n diagonal matrix.
%
% Input: d = vector containing diagonal entries of A
% b = right hand side vector
%
if any(d == 0)
error(’Diagonal matrix defined by input is singular’)
end
x = b ./ d;

In the algorithms DiagSolve2 and DiagSolve3 we do not check if length(b) is equal to length(d).
If they are not equal, Matlab will report an array dimension error.

Problem 3.7.1. Implement the functions DiagSolve1, DiagSolve2, and DiagSolve3, and
use them to solve the linear system in Fig. 3.2(a).

Problem 3.7.2. Consider the Matlab commands:

n = 200:200:1000;, t = zeros(length(n), 3);


for i = 1:length(n)
d = rand(n(i),1);, A = diag(d);, x = ones(n(i),1);, b = A*x;
tic, x1 = DiagSolve1(A,b);, t(i,1) = toc;
tic, x2 = DiagSolve2(A,b);, t(i,2) = toc;
tic, x3 = DiagSolve3(d,b);, t(i,3) = toc;
end
disp(’---------------------------------------------’)
disp(’ Timings for DiagSolve functions’)
disp(’ n DiagSolve1 DiagSolve2 DiagSolve3’)
disp(’---------------------------------------------’)
for i = 1:length(n)
disp(sprintf(’%4d %9.3e %9.3e %9.3e’, n(i), t(i,1), t(i,2), t(i,3)))
end
Using the Matlab help and/or doc commands write a brief explanation of what happens
when these commands are executed. Write a script M–file implementing the commands and
run the script. Check for consistency by running the script sufficient times so that you have
confidence in your results. Describe what you observe from the computed results.

3.7.2 Triangular Linear Systems


We describe Matlab implementations of the forward substitution algorithms for lower triangular
linear systems. Implementations of backward substitution are left as exercises.
Consider the lower triangular linear system

   a11 x1                            = b1        [ a11   0   · · ·   0  ] [ x1 ]   [ b1 ]
   a21 x1 + a22 x2                   = b2   ⇔   [ a21  a22  · · ·   0  ] [ x2 ] = [ b2 ]
     ...                                         [  :    :    ...    :  ] [  : ]   [  : ]
   an1 x1 + an2 x2 + · · · + ann xn  = bn        [ an1  an2  · · ·  ann ] [ xn ]   [ bn ]
We first develop a Matlab implementation to solve this lower triangular system using the pseu-
docode for the row-oriented version of forward substitution given in Fig. 3.5:

function x = LowerSolve0(A, b)
%
% x = LowerSolve0(A, b);
%
% Solve Ax=b, where A is an n-by-n lower triangular matrix,
% using row-oriented forward substitution.
%
n = length(b);, x = zeros(n,1);
for i = 1:n
for j = 1:i-1
b(i) = b(i) - A(i,j)*x(j);
end
x(i) = b(i) / A(i,i);
end

This implementation can be improved in several ways. As with the diagonal solve functions, we
should include a statement that checks to see if A(i,i) is zero. Also, the innermost loop can be
replaced with a single Matlab array operation. Observe that we can write the algorithm as:
   for i = 1 : n
       xi = ( bi − (ai1 x1 + ai2 x2 + · · · + a_{i,i-1} x_{i-1}) ) / aii
   end

Using linear algebra notation, we can write this as:

   for i = 1 : n
       xi = ( bi − [ ai1  ai2  · · ·  a_{i,i-1} ] [ x1; x2; . . . ; x_{i-1} ] ) / aii
   end
Recall that, in Matlab, we can specify entries in a matrix or vector using colon notation. That is,

   x(1 : i-1) = [ x1; x2; . . . ; x_{i-1} ]    and    A(i, 1 : i-1) = [ a_{i,1}  a_{i,2}  · · ·  a_{i,i-1} ].

So, using Matlab notation, the algorithm for row-oriented forward substitution can be written:

   for i = 1 : n
       x(i) = ( b(i) − A(i, 1 : i-1) * x(1 : i-1) ) / A(i, i)
   end

When i = 1, Matlab considers A(i, 1 : i-1) and x(1 : i-1) to be "empty" matrices, and the
computation A(i, 1 : i-1) * x(1 : i-1) gives 0. Thus, the algorithm simply computes x(1) =
b(1)/A(1, 1), as it should. To summarize, a Matlab function to solve a lower triangular system
using row oriented forward substitution could be written:

function x = LowerSolve1(A, b)
%
% x = LowerSolve1(A, b);
%
% Solve Ax=b, where A is an n-by-n lower triangular matrix,
% using row-oriented forward substitution.
%
if any(diag(A) == 0)
error(’Input matrix is singular’)
end
n = length(b);, x = zeros(n,1);
for i = 1:n
x(i) = (b(i) - A(i,1:i-1)*x(1:i-1)) / A(i,i);
end

Implementation of column-oriented forward substitution is similar. However, because xj is com-


puted before bi is updated, it is not possible to combine the two steps, as in the function LowerSolve1.
A Matlab implementation of column-oriented forward substitution could be written:

function x = LowerSolve2(A, b)
%
% x = LowerSolve2(A, b);
%
% Solve Ax=b, where A is an n-by-n lower triangular matrix,
% using column-oriented forward substitution.
%
if any(diag(A) == 0)
error(’Input matrix is singular’)
end
n = length(b);, x = zeros(n,1);
for j = 1:n
x(j) = b(j) / A(j,j);
b(j+1:n) = b(j+1:n) - A(j+1:n,j)*x(j);
end

What is computed by the statement b(j+1:n) = b(j+1:n) - A(j+1:n,j)*x(j) when j = n?


Because there are only n entries in the vectors, Matlab recognizes b(n+1:n) and A(n+1:n,n) to
be ”empty matrices”, and skips this part of the computation.

Problem 3.7.3. Implement the functions LowerSolve1 and LowerSolve2, and use them
to solve the linear system in Fig. 3.2(b).
Problem 3.7.4. Consider the following Matlab commands:

n = 200:200:1000;, t = zeros(length(n), 2);


for i = 1:length(n)
A = tril(rand(n(i)));, x = ones(n(i),1);, b = A*x;
tic, x1 = LowerSolve1(A,b);, t(i,1) = toc;
tic, x2 = LowerSolve2(A,b);, t(i,2) = toc;
end
disp(’------------------------------------’)
disp(’ Timings for LowerSolve functions’)
disp(’ n LowerSolve1 LowerSolve2 ’)
disp(’------------------------------------’)
for i = 1:length(n)
disp(sprintf(’%4d %9.3e %9.3e’, n(i), t(i,1), t(i,2)))
end
Using the Matlab help and/or doc commands write a brief explanation of what happens
when these commands are executed. Write a script M–file implementing the commands, run
the script, and describe what you observe from the computed results.
Problem 3.7.5. Write a Matlab function that solves an upper triangular linear sys-
tem using row-oriented backward substitution. Test your code using the linear system in
Fig. 3.2(c).
Problem 3.7.6. Write a Matlab function that solves an upper triangular linear system
using column-oriented backward substitution. Test your code using the linear system in
Fig. 3.2(c).
Problem 3.7.7. Write a script M–file that compares timings using row-oriented and
column-oriented backward substitution to solve an upper triangular linear systems. Use
the code in Problem 3.7.4 as a template.

3.7.3 Gaussian Elimination


Next, we consider a Matlab implementation of Gaussian elimination, using the pseudocode given in
Fig. 3.7. First, we explain how to implement the various steps during the k th stage of the algorithm.
• Consider the search for the largest entry in the pivot column. Instead of using a loop, we can
use the built-in max function. For example, if we use the command

[piv, i] = max(abs(A(k:n,k)));

then, after execution of this command, piv contains the largest entry (in magnitude) in the
vector A(k:n,k), and i is its location in the vector. Note that if i = 1, then the index p in
Fig. 3.7 is p = k and, in general,

p = i + k - 1;

• Once the pivot row, p, is known, then the pth row of A is switched with the k th row, and the
corresponding entries of b must be switched. This may be implemented using Matlab’s array
indexing capabilities:

A([k,p],k:n) = A([p,k],k:n);, b([k,p]) = b([p,k]);

We might visualize these two statements as


   
akk ak,k+1 · · · akn apk ap,k+1 ··· apn bk bp
:= and :=
apk ap,k+1 · · · apn akk ak,k+1 ··· akn bp bk

That is, b([k,p]) = b([p,k]) instructs Matlab to replace b(k) with the original b(p), and
to replace b(p) with original b(k). Matlab’s internal memory manager makes appropriate
copies of the data so that the original entries are not overwritten before the assignments are
completed. Similarly, A([k,p],k:n) = A([p,k],k:n) instructs Matlab to replace the rows
specified on the left with the original rows specified on the right.
• The elimination step is straightforward; compute the multipliers, which we store in the strictly
lower triangular part of A:

A(k+1:n,k) = A(k+1:n,k) / A(k,k);

and use array operations to perform the elimination:

for i = k+1:n
A(i,k+1:n) = A(i,k+1:n) - A(i,k)*A(k,k+1:n);, b(i) = b(i) - A(i,k)*b(k);
end

• Using array operations, the backward substitution step can be implemented:

for i = n:-1:1
x(i) = (b(i) - A(i,i+1:n)*x(i+1:n)) / A(i,i);
end

The statement for i = n:-1:1 indicates that the loop runs over values i = n, n−1, . . . , 1;
that is, it runs from i=n in steps of -1 until it reaches i=1.
• How do we implement a check for singularity? Due to roundoff errors, it is unlikely that
any pivots akk will be exactly zero, and so a statement if A(k,k) == 0 will usually miss
detecting singularity. An alternative is to check if akk is small in magnitude compared to, say,
the largest entry in magnitude in the matrix A:

if abs(A(k,k)) < tol

where tol is computed in an initialization step:

tol = eps * max(abs(A(:)));

Here, the command A(:) reshapes the matrix A into one long vector, and max(abs(A(:)))
finds its largest entry in magnitude. Of course, this test also traps matrices that are close to,
but not exactly, singular; this is usually reasonable, since such nearly singular matrices are by
definition ill–conditioned.
• Finally, certain other initializations are needed: the dimension, n, space for the solution vector,
and for the multipliers. We use the length, zeros, and eye functions:

n = length(b);, x = zeros(n,1);, m = eye(n);
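
As a small, purely illustrative check of several of the steps above (the 3-by-3 matrix below is
chosen only for demonstration and does not come from Fig. 3.7), the following commands carry
out the pivot search, the row exchange, and the singularity test for the first stage, k = 1:

A = [1 2 3; 4 5 6; 7 8 9];  b = [1; 0; 1];   % small demonstration system
n = length(b);  k = 1;
tol = eps * max(abs(A(:)));       % tolerance for the singularity test
[piv, i] = max(abs(A(k:n,k)));    % piv = 7, i = 3
p = i + k - 1;                    % pivot row is p = 3
A([k,p],k:n) = A([p,k],k:n);      % exchange rows 1 and 3 of A
b([k,p]) = b([p,k]);              % exchange entries 1 and 3 of b
abs(A(k,k)) < tol                 % returns 0 (false): the pivot is acceptable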



function x = gepp(A, b)
%
% Solves Ax=b using Gaussian elimination with partial pivoting
% by rows for size.
%
% Initializations:
%
n = length(b); x = zeros(n,1);
tol = eps*max(abs(A(:)));
%
% Loop for stages k = 1, 2, ..., n-1
%
for k = 1:n-1
%
% Search for pivot entry:
%
[piv, psub] = max(abs(A(k:n,k)));, p = psub + k - 1;
%
% Exchange current row, k, with pivot row, p:
%
A([k,p],k:n) = A([p,k],k:n);, b([k,p]) = b([p,k]);
%
% Check to see if A is singular:
%
if abs(A(k,k)) < tol
error(’Linear system appears to be singular’)
end
%
% Perform the elimination step - row-oriented:
%
A(k+1:n,k) = A(k+1:n,k) / A(k,k);
for i = k+1:n
A(i,k+1:n) = A(i,k+1:n) - A(i,k)*A(k,k+1:n);, b(i) = b(i) - A(i,k)*b(k);
end
end
%
% Check to see if A is singular:
%
if abs(A(n,n)) < tol
error(’Linear system appears to be singular’)
end
%
% Solve the upper triangular system by row-oriented backward substitution:
%
for i = n:-1:1
x(i) = (b(i) - A(i,i+1:n)*x(i+1:n)) / A(i,i);
end

The function gepp above uses Gaussian elimination with partial pivoting (GEPP) to solve Ax = b.
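For example, assuming the function has been saved as gepp.m, it can be called on a small
illustrative system (this particular system is not taken from the text; its exact solution is
x = [2; 3; -1]):

A = [2 1 -1; -3 -1 2; -2 1 2];
b = [8; -11; -3];
x = gepp(A, b)       % should return, up to roundoff, x = [2; 3; -1]
norm(b - A*x)        % the residual should be of the order of roundoff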

Problem 3.7.8. Implement the function gepp, and use it to solve the linear systems given
in problems 3.3.2 and 3.3.3.

Problem 3.7.9. Test gepp using the linear system


\[
\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
=
\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}.
\]

Explain your results.


Problem 3.7.10. Modify gepp, replacing the statements if abs(A(k,k)) < tol and if
abs(A(n,n)) < tol with, respectively, if A(k,k) == 0 and if A(n,n) == 0. Solve the
linear system given in Problem 3.7.9. Why are your results different?
Problem 3.7.11. The built-in Matlab function hilb constructs the Hilbert matrix, whose
(i, j) entry is 1/(i + j - 1). Write a script M–file that constructs a series of test problems
of the form:
A = hilb(n);, x_true = ones(n,1);, b = A*x_true;
The script should use gepp to solve the resulting linear systems, and print a table of results
with the following information:

-----------------------------------------------------
n error residual condition number
-----------------------------------------------------
where

• error = relative error = norm(x_true - x)/norm(x_true)


• residual = relative residual error = norm(b - A*x)/norm(b)
• condition number = measure of conditioning = cond(A)
Print a table for n = 5, 6, . . . , 13. Are the computed residual errors small? What about the
relative errors? Is the size of the residual error related to the condition number? What about
the relative error? Now run the script with n ≥ 14. What do you observe?
Problem 3.7.12. Rewrite the function gepp so that it uses no array operations. Compare
the efficiency (for example, using tic and toc) of your function with gepp on matrices of
dimensions n = 100, 200, . . . , 1000.

3.7.4 Built-in Matlab Tools for Linear Systems


An advantage of using a powerful scientific computing environment like Matlab is that we do
not need to write our own implementations of standard algorithms, like Gaussian elimination and
triangular solves. Matlab provides two powerful tools for solving linear systems:
• The backslash operator: \
Given a matrix A and vector b, \ can be used to solve Ax = b with the single command:

x = A \ b;

Matlab first checks if the matrix A has a special structure, including diagonal, upper triangu-
lar, and lower triangular. If a special structure is recognized (for example, upper triangular),
then a special method (for example, column-oriented backward substitution) is used to solve
Ax = b. If a special structure is not recognized then Gaussian elimination with partial pivoting
is used to solve Ax = b. During the process of solving the linear system, Matlab estimates
the reciprocal of the condition number of A. If A is ill–conditioned, then a warning message is
printed along with the estimate, RCOND, of the reciprocal of the condition number.

• The function linsolve.


If we know a-priori that A has a special structure recognizable by Matlab, then we can
improve efficiency by avoiding checks on the matrix, and skipping directly to the special solver.
This can be especially helpful if, for example, it is known that A is upper triangular. A-priori
information on the structure of A can be provided to Matlab using the linsolve function.
For more information, see help linsolve or doc linsolve.
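
For example, the following sketch (the matrix is illustrative only) declares a-priori that the
coefficient matrix is upper triangular, so Matlab can skip its structure checks and apply a
triangular solver directly:

n = 2000;
U = triu(rand(n)) + n*eye(n);   % a well-conditioned upper triangular test matrix
b = U * ones(n,1);
opts.UT = true;                 % declare a-priori that the matrix is upper triangular
x = linsolve(U, b, opts);       % goes straight to a triangular solve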

Because the backslash operator is so powerful, we use it almost exclusively to solve general linear
systems.
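For instance, the following sketch illustrates the warning mentioned above; the Hilbert matrix
(see Problem 3.7.11) is used here only as a convenient example of an ill–conditioned matrix:

A = hilb(13);  b = A * ones(13,1);
x = A \ b;     % Matlab typically warns that the matrix is close to singular
               % and reports the estimate RCOND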
We can compute explicitly the P A = LU factorization using the lu function:
[L, U, P] = lu(A)
Given this factorization, and a vector b, we could solve Ax = b using the statements:

d = P * b;, y = L \ d;, x = U \ y;
These statements could be combined into one instruction:
x = U \ ( L \ (P * b) );

Thus, given a matrix A and vector b we could solve the linear system Ax = b as follows:
[L, U, P] = lu(A);
x = U \ ( L \ (P * b) );
Note that Matlab follows its rules of operator precedence: backslash and * have equal precedence,
and operators of equal precedence are applied from left to right. So, without parentheses the
instruction
x = U \ L \ P * b ;
would be interpreted as
x = ((U \ L) \ P) * b;
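
As a quick check (with a random matrix that is purely illustrative), we can verify that the
computed factors reproduce the row-permuted matrix P A, and that the factored solve agrees with
the backslash operator:

A = rand(5);  b = rand(5,1);
[L, U, P] = lu(A);
norm(P*A - L*U)              % should be of the order of roundoff
x1 = A \ b;
x2 = U \ ( L \ (P * b) );
norm(x1 - x2)                % should be tiny (roundoff level)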

The cost and accuracy of this approach is essentially the same as using the backslash operator.
So when would we prefer to explicitly compute the P A = LU factorization? One situation is when
we need to solve several linear systems with the same coefficient matrix, but different right hand
side vectors. For large systems it is far more expensive to compute the P A = LU factorization than
it is to use forward and backward substitution to solve corresponding lower and upper triangular
systems. Thus, if we can compute the factorization just once and use it for the various different
right hand side vectors, we can make substantial savings. This is illustrated in Problem 3.7.18,
which involves a relatively small linear system.

Problem 3.7.13. Use the backslash operator to solve the systems given in problems 3.3.2
and 3.3.3.
Problem 3.7.14. Use the backslash operator to solve the linear system
\[
\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
=
\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}.
\]

Explain your results.

Problem 3.7.15. Repeat problem 3.7.11 using the backslash operator in addition to gepp.

Problem 3.7.16. Consider the linear system defined by the following Matlab commands:

A = eye(500) + triu(rand(500));, x = ones(500,1);, b = A * x;


Does this matrix have a special structure? Suppose we slightly perturb the entries in A:
C = A + eps*rand(500);

Does the matrix C have a special structure? Execute the following Matlab commands:
tic, x1 = A \ b;, toc
tic, x2 = C \ b;, toc
tic, x3 = triu(C) \ b;, toc
Are the solutions x1, x2 and x3 good approximations to the exact solution, x =
ones(500,1)? What do you observe about the time required to solve each of the linear
systems?

Problem 3.7.17. Use the Matlab lu function to compute the P A = LU factorization of


each of the matrices given in Problem 3.4.5.

Problem 3.7.18. Create a script M–file containing the following Matlab statements:
n = 50;, A = rand(n);
tic
for k = 1:n
b = rand(n,1);, x = A \ b;
end
toc
tic
[L, U, P] = lu(A);
for k = 1:n
b = rand(n,1);, x = U \ ( L \ (P * b) );
end
toc
The dimension of the problem is set to n = 50. Experiment with other values of n, such as
n = 100, 150, 200. What do you observe?
