
Exercises for 8.5

Exercise 8.5.1 In each case, find the exact eigenvalues and determine corresponding eigenvectors. Then start with x0 = [1, 1]T and compute x4 and r3 using the power method.

a. A = [2 −4; −3 3]   b. A = [5 2; −3 −2]   c. A = [1 2; 2 1]   d. A = [3 1; 1 0]

Exercise 8.5.2 In each case, find the exact eigenvalues and then approximate them using the QR-algorithm.

a. A = [1 1; 1 0]   b. A = [3 1; 1 0]

Exercise 8.5.3 Apply the power method to A = [0 1; −1 0], starting at x0 = [1, 1]T. Does it converge? Explain.

Exercise 8.5.4 If A is symmetric, show that each matrix Ak in the QR-algorithm is also symmetric. Deduce that they converge to a diagonal matrix.

Exercise 8.5.5 Apply the QR-algorithm to A = [2 −3; 1 −2]. Explain.

Exercise 8.5.6 Given a matrix A, let Ak, Qk, and Rk, k ≥ 1, be the matrices constructed in the QR-algorithm. Show that A^k = (Q1 Q2 · · · Qk)(Rk · · · R2 R1) for each k ≥ 1 and hence that this is a QR-factorization of A^k.

[Hint: Show that Qk Rk = Rk−1 Qk−1 for each k ≥ 2, and use this equality to compute (Q1 Q2 · · · Qk)(Rk · · · R2 R1) “from the centre out.” Use the fact that (AB)^{n+1} = A(BA)^n B for any square matrices A and B.]

8.6 The Singular Value Decomposition

When working with a square matrix A it is clearly useful to be able to “diagonalize” A, that is, to find a factorization A = Q−1 DQ where Q is invertible and D is diagonal. Unfortunately such a factorization may not exist for A. However, even if A is not square, Gaussian elimination provides a factorization of the form A = PDQ where P and Q are invertible and D is diagonal—the Smith normal form (Theorem 2.5.3). Better still, if A is real we can choose P and Q to be orthogonal real matrices and D to be real. Such a factorization is called a singular value decomposition (SVD) for A, one of the most useful tools in applied linear algebra. In this section we show how to compute an SVD explicitly for any real matrix A, and illustrate some of its many applications.
We need a fact about two subspaces associated with an m × n matrix A:

im A = {Ax | x in Rn } and col A = span {a | a is a column of A}

Then im A is called the image of A (so named because of the linear transformation Rn → Rm with x ↦ Ax);
and col A is called the column space of A (Definition 5.10). Surprisingly, these spaces are equal:

Lemma 8.6.1
For any m × n matrix A, im A = col A.

 
Proof. Write A = [a1 a2 · · · an] in terms of its columns. Let x ∈ im A, say x = Ay with y in Rn. If y = [y1 y2 · · · yn]T, then Ay = y1 a1 + y2 a2 + · · · + yn an ∈ col A by Definition 2.5. This shows that im A ⊆ col A. For the other inclusion, each ak = Aek ∈ im A where ek is column k of In.

8.6.1. Singular Value Decompositions

We know a lot about any real symmetric matrix: Its eigenvalues are real (Theorem 5.5.7), and it is orthog-
onally diagonalizable by the Principal Axes Theorem (Theorem 8.2.2). So for any real matrix A (square
or not), the fact that both AT A and AAT are real and symmetric suggests that we can learn a lot about A by
studying them. This section shows just how true this is.
The following Lemma reveals some similarities between AT A and AAT which simplify the statement
and the proof of the SVD we are constructing.

Lemma 8.6.2
Let A be a real m × n matrix. Then:

1. The eigenvalues of AT A and AAT are real and non-negative.

2. AT A and AAT have the same set of positive eigenvalues.

Proof.

1. Let λ be an eigenvalue of AT A, with eigenvector q ≠ 0 in Rn. Then:

‖Aq‖² = (Aq)T (Aq) = qT (AT Aq) = qT (λq) = λ(qT q) = λ‖q‖²

Hence λ = ‖Aq‖²/‖q‖² ≥ 0 because ‖q‖ ≠ 0. Thus (1.) holds for AT A, and the case AAT follows by replacing A by AT.

2. Write N(B) for the set of positive eigenvalues of a matrix B. We must show that N(AT A) = N(AAT). If λ ∈ N(AT A) with eigenvector q ≠ 0 in Rn, then Aq ∈ Rm and

AAT (Aq) = A[(AT A)q] = A(λq) = λ(Aq)

Moreover, Aq ≠ 0 because AT Aq = λq ≠ 0 (as λ ≠ 0 and q ≠ 0). Hence λ is an eigenvalue of AAT, proving N(AT A) ⊆ N(AAT). For the other inclusion replace A by AT.

To analyze an m × n matrix A we have two symmetric matrices to work with: AT A and AAT . In view
of Lemma 8.6.2, we choose AT A (sometimes called the Gram matrix of A), and derive a series of facts
which we will need. This narrative is a bit long, but trust that it will be worth the effort. We parse it out in
several steps:

1. The n × n matrix AT A is real and symmetric so, by the Principal Axes Theorem 8.2.2, let {q1 , q2 , . . . , qn } ⊆ Rn be an orthonormal basis of eigenvectors of AT A, with corresponding eigenvalues λ1 , λ2 , . . . , λn . By Lemma 8.6.2(1), each λi is real and λi ≥ 0. By re-ordering the qi we may (and do) assume that

λ1 ≥ λ2 ≥ · · · ≥ λr > 0 and λi = 0 if i > r⁸   (i)

By Theorems 8.2.1 and 3.3.4, the matrix

Q = [q1 q2 · · · qn] is orthogonal and orthogonally diagonalizes AT A   (ii)

2. Even though the λi are the eigenvalues of AT A, the number r in (i) turns out to be rank A. To understand why, consider the vectors Aqi ∈ im A. For all i, j:

Aqi · Aqj = (Aqi)T Aqj = qiT (AT A)qj = qiT (λj qj) = λj (qiT qj) = λj (qi · qj)

Because {q1 , q2 , . . . , qn } is an orthonormal set, this gives

Aqi · Aqj = 0 if i ≠ j and ‖Aqi‖² = λi ‖qi‖² = λi for each i   (iii)

We can extract two conclusions from (iii) and (i):

{Aq1 , Aq2 , . . . , Aqr } ⊆ im A is an orthogonal set and Aqi = 0 if i > r   (iv)

With this, write U = span {Aq1 , Aq2 , . . . , Aqr } ⊆ im A; we claim that U = im A, that is, im A ⊆ U. For this we must show that Ax ∈ U for each x ∈ Rn. Since {q1 , . . . , qr , . . . , qn } is a basis of Rn (it is orthonormal), we can write x = t1 q1 + · · · + tr qr + · · · + tn qn where each tj ∈ R. Then, using (iv), we obtain

Ax = t1 Aq1 + · · · + tr Aqr + · · · + tn Aqn = t1 Aq1 + · · · + tr Aqr ∈ U

This shows that U = im A, and so

{Aq1 , Aq2 , . . . , Aqr } is an orthogonal basis of im A   (v)

But col A = im A by Lemma 8.6.1, and rank A = dim ( col A) by Theorem 5.4.1, so

rank A = dim ( col A) = dim ( im A) = r by (v)   (vi)

3. Before proceeding, some definitions are in order:

Definition 8.7
The real numbers σi = √λi = ‖Aqi‖, i = 1, 2, . . . , n (the second equality is (iii)), are called the singular values of the matrix A.

Clearly σ1 , σ2 , . . . , σr are the positive singular values of A. By (i) we have

σ1 ≥ σ2 ≥ · · · ≥ σr > 0 and σi = 0 if i > r (vii)

With (vi) this makes the following definitions depend only upon A.

8 Of course they could all be positive (r = n) or all zero (so AT A = 0, and hence A = 0 by Exercise 5.3.9).

Definition 8.8
Let A be a real, m × n matrix of rank r, with positive singular values σ1 ≥ σ2 ≥ · · · ≥ σr > 0 and
σi = 0 if i > r. Define:
 
DA = diag (σ1 , . . . , σr ) and ΣA = [DA 0; 0 0] of size m × n

Here ΣA is in block form and is called the singular matrix of A.

The singular values σi and the matrices DA and ΣA will be referred to frequently below.
4. Returning to our narrative, normalize the vectors Aq1 , Aq2 , . . . , Aqr by defining

pi = (1/‖Aqi‖) Aqi ∈ Rm for each i = 1, 2, . . . , r   (viii)

By (v), (viii), and Lemma 8.6.1, we conclude that

{p1 , p2 , . . . , pr } is an orthonormal basis of col A ⊆ Rm   (ix)

Employing the Gram-Schmidt algorithm (or otherwise), construct pr+1 , . . . , pm so that

{p1 , . . . , pr , . . . , pm } is an orthonormal basis of Rm   (x)

5. By (x) and (ii) we have two orthogonal matrices

P = [p1 · · · pr · · · pm] of size m × m and Q = [q1 · · · qr · · · qn] of size n × n

These matrices are related. In fact, using (iii) and (viii) we have:

σi pi = √λi pi = ‖Aqi‖ pi = Aqi for each i = 1, 2, . . . , r   (xi)

With (iv) and (xi), this yields the following expression for AQ in terms of its columns:

AQ = [Aq1 · · · Aqr Aqr+1 · · · Aqn] = [σ1 p1 · · · σr pr 0 · · · 0]   (xii)

Then we compute:

PΣA = [p1 · · · pr pr+1 · · · pm] [diag (σ1 , . . . , σr ) 0; 0 0] = [σ1 p1 · · · σr pr 0 · · · 0] = AQ

where the last equality is (xii). Finally, as Q−1 = QT it follows that A = PΣA QT.

With this we can state the main theorem of this section.



Theorem 8.6.1
Let A be a real m × n matrix, and let σ1 ≥ σ2 ≥ · · · ≥ σr > 0 be the positive singular values of A.
Then r is the rank of A and we have the factorization

A = PΣA QT where P and Q are orthogonal matrices

The factorization A = PΣA QT in Theorem 8.6.1, where P and Q are orthogonal matrices, is called a Singular Value Decomposition (SVD) of A. This decomposition is not unique. For example, if r < m then the vectors pr+1 , . . . , pm can be any extension of {p1 , . . . , pr } to an orthonormal basis of Rm, and each choice leads to a different matrix P in the decomposition. For a more dramatic example, if A = In then ΣA = In, and A = PΣA PT is an SVD of A for any orthogonal n × n matrix P.

Example 8.6.1
 
Find a singular value decomposition for A = [1 0 1; −1 1 0].

Solution. We have AT A = [2 −1 1; −1 1 0; 1 0 1], so the characteristic polynomial is

cAT A (x) = det [x−2 1 −1; 1 x−1 0; −1 0 x−1] = (x − 3)(x − 1)x

Hence the eigenvalues of AT A (in descending order) are λ1 = 3, λ2 = 1 and λ3 = 0 with, respectively, unit eigenvectors

q1 = (1/√6) [2, −1, 1]T , q2 = (1/√2) [0, 1, 1]T , and q3 = (1/√3) [−1, −1, 1]T

It follows that the orthogonal matrix Q in Theorem 8.6.1 is

Q = [q1 q2 q3] = (1/√6) [2 0 −√2; −1 √3 −√2; 1 √3 √2]

The singular values here are σ1 = √3, σ2 = 1 and σ3 = 0, so rank (A) = 2—clear in this case—and the singular matrix is

ΣA = [σ1 0 0; 0 σ2 0] = [√3 0 0; 0 1 0]

So it remains to find the 2 × 2 orthogonal matrix P in Theorem 8.6.1. This involves the vectors

Aq1 = (√6/2) [1, −1]T , Aq2 = (√2/2) [1, 1]T , and Aq3 = [0, 0]T

Normalize Aq1 and Aq2 to get

p1 = (1/√2) [1, −1]T and p2 = (1/√2) [1, 1]T

In this case, {p1 , p2 } is already a basis of R2 (so the Gram-Schmidt algorithm is not needed), and we have the 2 × 2 orthogonal matrix

P = [p1 p2] = (1/√2) [1 1; −1 1]

Finally (by Theorem 8.6.1) the singular value decomposition for A is

A = PΣA QT = (1/√2) [1 1; −1 1] · [√3 0 0; 0 1 0] · (1/√6) [2 −1 1; 0 √3 √3; −√2 −√2 √2]

Of course this can be confirmed by direct matrix multiplication.
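The confirmation is easy to automate. The following NumPy sketch (ours, not part of the text) rebuilds P, ΣA and Q from the example and checks the factorization:

```python
import numpy as np

# Check Example 8.6.1: A = P @ Sigma_A @ Q.T with orthogonal P and Q.
A = np.array([[1.0, 0.0, 1.0],
              [-1.0, 1.0, 0.0]])

P = (1 / np.sqrt(2)) * np.array([[1.0, 1.0],
                                 [-1.0, 1.0]])
Sigma_A = np.array([[np.sqrt(3), 0.0, 0.0],
                    [0.0,        1.0, 0.0]])
Q = np.column_stack([
    (1 / np.sqrt(6)) * np.array([2.0, -1.0, 1.0]),   # q1
    (1 / np.sqrt(2)) * np.array([0.0,  1.0, 1.0]),   # q2
    (1 / np.sqrt(3)) * np.array([-1.0, -1.0, 1.0]),  # q3
])

assert np.allclose(P @ Sigma_A @ Q.T, A)   # the factorization holds
assert np.allclose(Q.T @ Q, np.eye(3))     # Q is orthogonal
assert np.allclose(P.T @ P, np.eye(2))     # P is orthogonal
```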

Thus, computing an SVD for a real matrix A is a routine matter, and we now describe a systematic
procedure for doing so.

SVD Algorithm
Given a real m × n matrix A, find an SVD A = PΣA QT as follows:

1. Use the Diagonalization Algorithm (see page 181) to find the (real and non-negative)
eigenvalues λ1 , λ2 , . . . , λn of AT A with corresponding (orthonormal) eigenvectors
q1 , q2 , . . . , qn . Reorder the qi (if necessary) to ensure that the nonzero eigenvalues are
λ1 ≥ λ2 ≥ · · · ≥ λr > 0 and λi = 0 if i > r.

2. The integer r is the rank of the matrix A.


 
3. The n × n orthogonal matrix Q in the SVD is Q = [q1 q2 · · · qn].

4. Define pi = (1/‖Aqi‖) Aqi for i = 1, 2, . . . , r (where r is as in step 1). Then {p1 , p2 , . . . , pr } is orthonormal in Rm so (using Gram-Schmidt or otherwise) extend it to an orthonormal basis {p1 , . . . , pr , . . . , pm } of Rm.
 
5. The m × m orthogonal matrix P in the SVD is P = [p1 · · · pr · · · pm].

6. The singular values for A are σ1 , σ2 , . . . , σn where σi = √λi for each i. Hence the nonzero singular values are σ1 ≥ σ2 ≥ · · · ≥ σr > 0, and the singular matrix of A in the SVD is

ΣA = [diag (σ1 , . . . , σr ) 0; 0 0] of size m × n

7. Thus A = PΣA QT is an SVD for A.
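The steps above translate directly into a short program. Below is a minimal sketch in NumPy (the function name svd_via_gram is ours): it diagonalizes the Gram matrix AT A with numpy.linalg.eigh for step 1, and uses a QR factorization in place of Gram-Schmidt to extend {p1 , . . . , pr } in step 4.

```python
import numpy as np

def svd_via_gram(A, tol=1e-12):
    """Textbook SVD algorithm (steps 1-7): returns P, Sigma, Q with A = P @ Sigma @ Q.T."""
    m, n = A.shape
    # Step 1: orthonormal eigenvectors of A^T A, eigenvalues in descending order.
    lam, Q = np.linalg.eigh(A.T @ A)      # eigh returns ascending order
    idx = np.argsort(lam)[::-1]
    lam, Q = lam[idx], Q[:, idx]
    # Step 2: r = rank A = number of positive eigenvalues.
    r = int(np.sum(lam > tol))
    # Step 6: singular values sigma_i = sqrt(lambda_i) and the singular matrix.
    sigma = np.sqrt(np.clip(lam, 0.0, None))
    Sigma = np.zeros((m, n))
    Sigma[:r, :r] = np.diag(sigma[:r])
    # Step 4: p_i = (1/||A q_i||) A q_i for i <= r ...
    P = np.zeros((m, m))
    P[:, :r] = (A @ Q[:, :r]) / sigma[:r]
    # ... then extend {p_1, ..., p_r} to an orthonormal basis of R^m.
    if r < m:
        full, _ = np.linalg.qr(np.hstack([P[:, :r], np.eye(m)]))
        P[:, r:] = full[:, r:]
    return P, Sigma, Q

A = np.array([[1.0, 0.0, 1.0], [-1.0, 1.0, 0.0]])  # the matrix of Example 8.6.1
P, Sigma, Q = svd_via_gram(A)
assert np.allclose(P @ Sigma @ Q.T, A)
```

In exact arithmetic this reproduces the construction in the proof of Theorem 8.6.1; the tolerance tol stands in for the exact test λi > 0.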

In practice the singular values σi , the matrices P and Q, and even the rank of an m × n matrix are not calculated this way. There are sophisticated numerical algorithms for calculating them to a high degree of accuracy. The reader is referred to books on numerical linear algebra.
So the main virtue of Theorem 8.6.1 is that it provides a way of constructing an SVD for every real
matrix A. In particular it shows that every real matrix A has a singular value decomposition⁹ in the
following, more general, sense:

Definition 8.9
A Singular Value Decomposition (SVD) of an m × n matrix A of rank r is a factorization A = U ΣV T where U and V are orthogonal and Σ = [D 0; 0 0] of size m × n in block form, where D = diag (d1 , d2 , . . . , dr ) with each di > 0, and r ≤ m and r ≤ n.

Note that for any SVD A = U ΣV T we immediately obtain some information about A:

Lemma 8.6.3
If A = U ΣV T is any SVD for A as in Definition 8.9, then:

1. r = rank A.

2. The numbers d1 , d2 , . . . , dr are the positive singular values of A, in some order.

Proof. Use the notation of Definition 8.9. We have

AT A = (V ΣT U T )(U ΣV T ) = V (ΣT Σ)V T

so ΣT Σ and AT A are similar n ×n matrices (Definition 5.11). Hence r = rank A by Corollary 5.4.3, proving
(1.). Furthermore, ΣT Σ and AT A have the same eigenvalues by Theorem 5.5.1; that is (using (1.)):

{d12 , d22 , . . . , dr2 } = {λ1 , λ2 , . . . , λr } are equal as sets

where λ1 , λ2 , . . . , λr are the positive eigenvalues of AT A. Hence there is a permutation τ of {1, 2, · · · , r} such that di² = λiτ for each i = 1, 2, . . . , r. Hence di = √λiτ = σiτ for each i by Definition 8.7. This proves (2.).

We note in passing that more is true. Let A be m × n of rank r, and let A = U ΣV T be any SVD for A.
Using the proof of Lemma 8.6.3 we have di = σiτ for some permutation τ of {1, 2, . . . , r}. In fact, it can
be shown that there exist orthogonal matrices U1 and V1 obtained from U and V by τ -permuting columns
and rows respectively, such that A = U1 ΣAV1T is an SVD of A.
9 In fact every complex matrix has an SVD [J.T. Scheick, Linear Algebra with Applications, McGraw-Hill, 1997]

8.6.2. Fundamental Subspaces

It turns out that any singular value decomposition contains a great deal of information about an m ×
n matrix A and the subspaces associated with A. For example, in addition to Lemma 8.6.3, the set
{p1 , p2 , . . . , pr } of vectors constructed in the proof of Theorem 8.6.1 is an orthonormal basis of col A
(by (v) and (viii) in the proof). There are more such examples, which is the thrust of this subsection.
In particular, there are four subspaces associated to a real m × n matrix A that have come to be called
fundamental:

Definition 8.10
The fundamental subspaces of an m × n matrix A are:

row A = span {x | x is a row of A}

col A = span {x | x is a column of A}

null A = {x ∈ Rn | Ax = 0}

null AT = {x ∈ Rm | AT x = 0}

If A = U ΣV T is any SVD for the real m × n matrix A, the columns of U and V provide orthonormal bases for each of these fundamental subspaces. We are going to prove this, but first we need three
properties related to the orthogonal complement U ⊥ of a subspace U of Rn , where (Definition 8.1):

U ⊥ = {x ∈ Rn | u · x = 0 for all u ∈ U }

The orthogonal complement plays an important role in the Projection Theorem (Theorem 8.1.3), and we
return to it in Section 10.2. For now we need:

Lemma 8.6.4
If A is any matrix then:

1. ( row A)⊥ = null A and ( col A)⊥ = null AT .

2. If U is any subspace of Rn then U ⊥⊥ = U .

3. Let {f1 , . . . , fm } be an orthonormal basis of Rm . If U = span {f1 , . . . , fk }, then

U ⊥ = span {fk+1 , . . . , fm }

Proof.

1. Assume A is m × n, and let b1 , . . . , bm be the rows of A. If x is a column in Rn , then entry i of Ax is


bi · x, so Ax = 0 if and only if bi · x = 0 for each i. Thus:

x ∈ null A ⇔ bi · x = 0 for each i ⇔ x ∈ ( span {b1 , . . . , bm })⊥ = ( row A)⊥



Hence null A = ( row A)⊥ . Now replace A by AT to get null AT = ( row AT )⊥ = ( col A)⊥ , which is
the other identity in (1).

2. If x ∈ U then y · x = 0 for all y ∈ U ⊥, that is x ∈ U ⊥⊥ . This proves that U ⊆ U ⊥⊥ , so it is enough to


show that dim U = dim U ⊥⊥ . By Theorem 8.1.4 we see that dim V ⊥ = n − dim V for any subspace
V ⊆ Rn . Hence

dim U ⊥⊥ = n − dim U ⊥ = n − (n − dim U ) = dim U , as required

3. We have span {fk+1 , . . . , fm } ⊆ U ⊥ because {f1 , . . . , fm } is orthogonal. For the other inclusion, let
x ∈ U ⊥ so fi · x = 0 for i = 1, 2, . . . , k. By the Expansion Theorem 5.3.6:

x = (f1 · x)f1 + · · · + (fk · x)fk + (fk+1 · x)fk+1 + · · · + (fm · x)fm


= 0 + ··· + 0 + (fk+1 · x)fk+1 + · · · + (fm · x)fm

Hence U ⊥ ⊆ span {fk+1 , . . . , fm }.

With this we can see how any SVD for a matrix A provides orthonormal bases for each of the four
fundamental subspaces of A.

Theorem 8.6.2
Let A be an m × n real matrix, let A = U ΣV T be any SVD for A where U and V are orthogonal of
size m × m and n × n respectively, and let
 
Σ = [D 0; 0 0] of size m × n, where D = diag (λ1 , λ2 , . . . , λr ) with each λi > 0

Write U = [u1 · · · ur · · · um] and V = [v1 · · · vr · · · vn], so {u1 , . . . , ur , . . . , um } and {v1 , . . . , vr , . . . , vn } are orthonormal bases of Rm and Rn respectively. Then

1. r = rank A, and λ1 , λ2 , . . . , λr are the singular values of A.

2. The fundamental spaces are described as follows:

a. {u1 , . . . , ur } is an orthonormal basis of col A.


b. {ur+1 , . . . , um } is an orthonormal basis of null AT .
c. {vr+1 , . . . , vn } is an orthonormal basis of null A.
d. {v1 , . . ., vr } is an orthonormal basis of row A.

Proof.

1. This is Lemma 8.6.3.



2. a. As col A = col (AV ) by Lemma 5.4.3 and AV = U Σ, (a.) follows from

U Σ = [u1 · · · ur · · · um] [diag (λ1 , λ2 , . . . , λr ) 0; 0 0] = [λ1 u1 · · · λr ur 0 · · · 0]

b. We have ( col A)⊥ = ( span {u1 , . . . , ur })⊥ = span {ur+1 , . . . , um }, where the first equality is by (a.) and the second by Lemma 8.6.4(3). This proves (b.) because ( col A)⊥ = null AT by Lemma 8.6.4(1).
c. We have dim ( null A) + dim ( im A) = n by the Dimension Theorem 7.2.4, applied to
T : Rn → Rm where T (x) = Ax. Since also im A = col A by Lemma 8.6.1, we obtain

dim ( null A) = n − dim ( col A) = n − r = dim ( span {vr+1 , . . . , vn })

So to prove (c.) it is enough to show that vj ∈ null A whenever j > r. To this end write λr+1 = · · · = λn = 0, so that

ΣT Σ = diag (λ1², . . . , λr², λr+1², . . . , λn²)

Observe that each λj² is an eigenvalue of ΣT Σ with eigenvector ej = column j of In. Thus vj = V ej for each j. As AT A = V ΣT ΣV T (proof of Lemma 8.6.3), we obtain

(AT A)vj = (V ΣT ΣV T )(V ej) = V (ΣT Σ ej) = V (λj² ej) = λj² V ej = λj² vj

for 1 ≤ j ≤ n. Thus each vj is an eigenvector of AT A corresponding to λj². But then

‖Avj‖² = (Avj)T Avj = vjT (AT A vj) = vjT (λj² vj) = λj² ‖vj‖² = λj² for j = 1, . . . , n

In particular, Avj = 0 whenever j > r, so vj ∈ null A if j > r, as desired. This proves (c.).


d. Observe that span {vr+1 , . . . , vn } = null A = ( row A)⊥ by (c.) and Lemma 8.6.4(1). But then parts (2) and (3) of Lemma 8.6.4 show

row A = (( row A)⊥ )⊥ = ( span {vr+1 , . . . , vn })⊥ = span {v1 , . . . , vr }

This proves (d.), and hence Theorem 8.6.2.

Example 8.6.2
Consider the homogeneous linear system

Ax = 0 of m equations in n variables

Then the set of all solutions is null A. Hence if A = U ΣV T is any SVD for A then (in the notation
of Theorem 8.6.2) {vr+1 , . . . , vn } is an orthonormal basis of the set of solutions for the system. As
such they are a set of basic solutions for the system, the most basic notion in Chapter 1.
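This, and Theorem 8.6.2 generally, is easy to see numerically: one call to a library SVD routine hands back orthonormal bases for all four fundamental subspaces. A sketch in NumPy (the test matrix is ours):

```python
import numpy as np

A = np.array([[1.0, 2.0, -1.0],
              [2.0, 4.0, -2.0]])          # m = 2, n = 3, rank 1

U, s, Vt = np.linalg.svd(A)               # A = U @ Sigma @ Vt, V^T is returned
r = int(np.sum(s > 1e-12))                # r = rank A (Theorem 8.6.2(1))

col_A   = U[:, :r]      # {u1, ..., ur}:      basis of col A
null_At = U[:, r:]      # {u_{r+1}, ..., um}: basis of null A^T
row_A   = Vt[:r, :].T   # {v1, ..., vr}:      basis of row A
null_A  = Vt[r:, :].T   # {v_{r+1}, ..., vn}: basis of null A (basic solutions)

assert np.allclose(A @ null_A, 0)         # solutions of Ax = 0
assert np.allclose(A.T @ null_At, 0)      # solutions of A^T x = 0
```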

8.6.3. The Polar Decomposition of a Real Square Matrix

If A is real and n × n, the factorization in the title is related to the polar decomposition of A. Unlike the SVD, in this case the decomposition is uniquely determined by A.
Recall (Section 8.3) that a symmetric matrix A is called positive definite if and only if xT Ax > 0 for every column x ≠ 0 in Rn. Before proceeding, we must explore the following weaker notion:

Definition 8.11
A real n × n matrix G is called positive¹⁰ if it is symmetric and

xT Gx ≥ 0 for all x ∈ Rn

Clearly every positive definite matrix is positive, but the converse fails. Indeed, A = [1 1; 1 1] is positive because, if x = [a b]T in R2, then xT Ax = (a + b)² ≥ 0. But yT Ay = 0 if y = [1 −1]T, so A is not positive definite.

Lemma 8.6.5
Let G denote an n × n positive matrix.

1. If A is any n × m matrix, then AT GA is positive (and m × m).

2. If G = diag (d1 , d2 , · · · , dn ) and each di ≥ 0 then G is positive.

Proof.

1. xT (AT GA)x = (Ax)T G(Ax) ≥ 0 because G is positive.


2. If x = [x1 x2 · · · xn]T , then

xT Gx = d1 x1² + d2 x2² + · · · + dn xn² ≥ 0

because di ≥ 0 for each i.

Definition 8.12
If A is a real n × n matrix, a factorization

A = GQ where G is positive and Q is orthogonal

is called a polar decomposition for A.

10 Also called positive semi-definite.



Any SVD for a real square matrix A yields a polar form for A.

Theorem 8.6.3
Every square real matrix has a polar form.

Proof. Let A = U ΣV T be an SVD for A with Σ as in Definition 8.9 and m = n. Since U T U = In here we have

A = U ΣV T = (U Σ)(U T U )V T = (U ΣU T )(UV T )

So if we write G = U ΣU T and Q = UV T , then Q is orthogonal, and it remains to show that G is positive. Since Σ is diagonal with non-negative entries, this follows from both parts of Lemma 8.6.5.
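Since the proof is constructive, a polar form can be read off any SVD. A minimal sketch (NumPy; the helper name polar_form is ours):

```python
import numpy as np

def polar_form(A):
    """Polar form of a square matrix, following the proof of Theorem 8.6.3:
    A = G @ Q with G = U Sigma U^T positive and Q = U V^T orthogonal."""
    U, s, Vt = np.linalg.svd(A)      # A = U @ np.diag(s) @ Vt
    G = U @ np.diag(s) @ U.T
    Q = U @ Vt
    return G, Q

A = np.array([[1.0, 2.0],
              [0.0, 3.0]])
G, Q = polar_form(A)
assert np.allclose(G @ Q, A)                      # A = GQ
assert np.allclose(Q @ Q.T, np.eye(2))            # Q is orthogonal
assert np.allclose(G, G.T)                        # G is symmetric ...
assert np.all(np.linalg.eigvalsh(G) >= -1e-12)    # ... with non-negative eigenvalues
```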
The SVD for a square matrix A is not unique (In = PIn PT for any orthogonal matrix P). But given the
proof of Theorem 8.6.3 it is surprising that the polar decomposition is unique.¹¹ We omit the proof.
The name “polar form” is reminiscent of the same form for complex numbers (see Appendix A). This
is no coincidence. To see why, we represent the complex numbers as real 2 × 2 matrices. Write M2 (R) for
the set of all real 2 × 2 matrices, and define

σ : C → M2 (R) by σ (a + bi) = [a −b; b a] for all a + bi in C

One verifies that σ preserves addition and multiplication in the sense that

σ (zw) = σ (z)σ (w) and σ (z + w) = σ (z) + σ (w)

for all complex numbers z and w. Since σ is one-to-one we may identify each complex number a + bi with the matrix σ (a + bi), that is we write

a + bi = [a −b; b a] for all a + bi in C

Thus 0 = [0 0; 0 0], 1 = [1 0; 0 1] = I2 , i = [0 −1; 1 0], and r = [r 0; 0 r] if r is real.

If z = a + bi is nonzero then the absolute value r = |z| = √(a² + b²) ≠ 0. If θ is the angle of z in standard position, then cos θ = a/r and sin θ = b/r. Observe:

[a −b; b a] = [r 0; 0 r] [a/r −b/r; b/r a/r] = [r 0; 0 r] [cos θ −sin θ; sin θ cos θ] = GQ   (xiii)

where G = [r 0; 0 r] is positive and Q = [cos θ −sin θ; sin θ cos θ] is orthogonal. But in C we have G = r and Q = cos θ + i sin θ, so (xiii) reads z = r(cos θ + i sin θ) = re^{iθ}, which is the classical polar form for the complex number a + bi. This is why (xiii) is called the polar form of the matrix [a −b; b a]; Definition 8.12 simply adopts the terminology for n × n matrices.
11 See J.T. Scheick, Linear Algebra with Applications, McGraw-Hill, 1997, page 379.

8.6.4. The Pseudoinverse of a Matrix

It is impossible for a non-square matrix A to have an inverse (see the footnote to Definition 2.11). Nonetheless, one candidate for an “inverse” of an m × n matrix A is an n × m matrix B such that

ABA = A and BAB = B

Such a matrix B is called a middle inverse for A. If A is invertible then A−1 is the unique middle inverse for A, but a middle inverse is not unique in general, even for square matrices. For example, if A = [1 0; 0 0; 0 0] then B = [1 0 0; b 0 0] is a middle inverse for A for any b.
If ABA = A and BAB = B it is easy to see that AB and BA are both idempotent matrices. In 1955 Roger
Penrose observed that the middle inverse is unique if both AB and BA are symmetric. We omit the proof.

Theorem 8.6.4: Penrose’s Theorem¹²
Given any real m × n matrix A, there is exactly one n × m matrix B such that A and B satisfy the
following conditions:

P1 ABA = A and BAB = B.

P2 Both AB and BA are symmetric.

Definition 8.13
Let A be a real m × n matrix. The pseudoinverse of A is the unique n × m matrix A+ such that A
and A+ satisfy P1 and P2, that is:

AA+ A = A, A+ AA+ = A+ , and both AA+ and A+ A are symmetric¹³

If A is invertible then A+ = A−1 as expected. In general, the symmetry in conditions P1 and P2 shows
that A is the pseudoinverse of A+ , that is A++ = A.

12 R. Penrose, A generalized inverse for matrices, Proceedings of the Cambridge Philosophical Society 51 (1955), 406-413. In fact Penrose proved this for any complex matrix, where AB and BA are both required to be hermitian (see Definition 8.18 in the following section).
13 Penrose called the matrix A+ the generalized inverse of A, but the term pseudoinverse is now commonly used. The matrix A+ is also called the Moore-Penrose inverse after E.H. Moore, who had the idea in 1935 as part of a larger work on “General Analysis”. Penrose independently re-discovered it 20 years later.

Theorem 8.6.5
Let A be an m × n matrix.

1. If rank A = m then AAT is invertible and A+ = AT (AAT )−1 .

2. If rank A = n then AT A is invertible and A+ = (AT A)−1 AT .

Proof. Here AAT (respectively AT A) is invertible by Theorem 5.4.4 (respectively Theorem 5.4.3). The rest
is a routine verification.
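Both formulas are easy to check against a library pseudoinverse. A NumPy sketch (the test matrix is ours):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])                     # rank A = n = 2

A_plus = np.linalg.inv(A.T @ A) @ A.T          # Theorem 8.6.5(2): full column rank
assert np.allclose(A_plus, np.linalg.pinv(A))

B = A.T                                        # rank B = m = 2
B_plus = B.T @ np.linalg.inv(B @ B.T)          # Theorem 8.6.5(1): full row rank
assert np.allclose(B_plus, np.linalg.pinv(B))
```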
In general, given an m × n matrix A, the pseudoinverse A+ can be computed from any SVD for A. To see how, we need some notation. Let A = U ΣV T be an SVD for A (as in Definition 8.9) where U and V are orthogonal and Σ = [D 0; 0 0] of size m × n in block form, where D = diag (d1 , d2 , . . . , dr ) with each di > 0.
Hence D is invertible, so we make:

Definition 8.14
Σ′ = [D−1 0; 0 0] of size n × m.

A routine calculation gives:

Lemma 8.6.6
 
• ΣΣ′Σ = Σ and Σ′ΣΣ′ = Σ′

• ΣΣ′ = [Ir 0; 0 0] of size m × m and Σ′Σ = [Ir 0; 0 0] of size n × n

That is, Σ′ is the pseudoinverse of Σ.


Now given A = U ΣV T , define B = V Σ′U T . Then

ABA = (U ΣV T )(V Σ′U T )(U ΣV T ) = U (ΣΣ′ Σ)V T = U ΣV T = A

by Lemma 8.6.6. Similarly BAB = B. Moreover AB = U (ΣΣ′)U T and BA = V (Σ′ Σ)V T are both symmetric
again by Lemma 8.6.6. This proves

Theorem 8.6.6
Let A be real and m × n, and let A = U ΣV T be any SVD for A as in Definition 8.9. Then A+ = V Σ′U T.
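In code this is a few lines. A sketch (NumPy; the name pinv_via_svd is ours), tested on the matrix of Example 8.6.3 below:

```python
import numpy as np

def pinv_via_svd(A, tol=1e-12):
    """A^+ = V @ Sigma' @ U^T (Theorem 8.6.6), where Sigma' inverts the
    positive singular values and transposes the block shape (Definition 8.14)."""
    U, s, Vt = np.linalg.svd(A)         # A = U @ Sigma @ Vt
    m, n = A.shape
    r = int(np.sum(s > tol))
    Sigma_prime = np.zeros((n, m))
    Sigma_prime[:r, :r] = np.diag(1.0 / s[:r])
    return Vt.T @ Sigma_prime @ U.T

A = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]])
assert np.allclose(pinv_via_svd(A), A.T)               # here A^+ = A^T
assert np.allclose(pinv_via_svd(A), np.linalg.pinv(A))
```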

Of course we can always use the SVD constructed in Theorem 8.6.1 to find the pseudoinverse. If A = [1 0; 0 0; 0 0], we observed above that B = [1 0 0; b 0 0] is a middle inverse for A for any b. However, AB is symmetric but BA is not when b ≠ 0, so B ≠ A+ in that case.

Example 8.6.3
 
Find A+ if A = [1 0; 0 0; 0 0].

Solution. We have AT A = [1 0; 0 0] with eigenvalues λ1 = 1 and λ2 = 0 and corresponding eigenvectors q1 = [1, 0]T and q2 = [0, 1]T. Hence Q = [q1 q2] = I2. Also A has rank 1 with singular values σ1 = 1 and σ2 = 0, so ΣA = [1 0; 0 0; 0 0] = A and Σ′A = [1 0 0; 0 0 0] = AT in this case.

Since Aq1 = [1, 0, 0]T and Aq2 = [0, 0, 0]T, we have p1 = [1, 0, 0]T, which extends to an orthonormal basis {p1 , p2 , p3 } of R3 where (say) p2 = [0, 1, 0]T and p3 = [0, 0, 1]T. Hence P = [p1 p2 p3] = I3, so the SVD for A is A = PΣA QT. Finally, the pseudoinverse of A is

A+ = QΣ′A PT = Σ′A = [1 0 0; 0 0 0]

Note that A+ = AT in this case.

The following Lemma collects some properties of the pseudoinverse that mimic those of the inverse.
The verifications are left as exercises.

Lemma 8.6.7
Let A be an m × n matrix of rank r.

1. A++ = A.

2. If A is invertible then A+ = A−1 .

3. (AT )+ = (A+ )T .

4. (kA)+ = (1/k)A+ for any real k ≠ 0.

5. (UAV )+ = V T (A+ )U T whenever U and V are orthogonal.



Exercises for 8.6


 
Exercise 8.6.1 If ACA = A show that B = CAC is a middle inverse for A.

Exercise 8.6.2 For any matrix A show that ΣAT = (ΣA )T.

Exercise 8.6.3 If A is m × n with all singular values positive, what is rank A?

Exercise 8.6.4 If A has singular values σ1 , . . . , σr , what are the singular values of:

a. AT   b. tA where t > 0 is real   c. A−1 assuming A is invertible.

Exercise 8.6.5 If A is square show that |det A| is the product of the singular values of A.

Exercise 8.6.6 If A is square and real, show that A = 0 if and only if every singular value of A is 0.

Exercise 8.6.7 Given an SVD for an invertible matrix A, find one for A−1. How are ΣA and ΣA−1 related?

Exercise 8.6.8 Let A−1 = A = AT where A is n × n. Given any orthogonal n × n matrix U, find an orthogonal matrix V such that A = U ΣAV T is an SVD for A. If A = [0 1; 1 0] do this for:

a. U = (1/5) [3 −4; 4 3]   b. U = (1/√2) [1 −1; 1 1]

Exercise 8.6.9 Find an SVD for the following matrices:

a. A = [1 −1; 0 1; 1 0]   b. A = [1 1 1; −1 0 −2; 1 2 0]

Exercise 8.6.10 Find an SVD for A = [0 1; −1 0].

Exercise 8.6.11 If A = U ΣV T is an SVD for A, find an SVD for AT.

Exercise 8.6.12 Let A be a real, m × n matrix with positive singular values σ1 , σ2 , . . . , σr , and write

s(x) = (x − σ1²)(x − σ2²) · · · (x − σr²)

a. Show that cAT A (x) = s(x)x^{n−r} and cAAT (x) = s(x)x^{m−r}.

b. If m ≤ n conclude that cAT A (x) = x^{n−m} cAAT (x).

Exercise 8.6.13 If G is positive show that:

a. rG is positive if r ≥ 0   b. G + H is positive for any positive H.

Exercise 8.6.14 If G is positive and λ is an eigenvalue, show that λ ≥ 0.

Exercise 8.6.15 If G is positive show that G = H² for some positive matrix H. [Hint: Preceding exercise and Lemma 8.6.5.]

Exercise 8.6.16 If A is n × n show that AAT and AT A are similar. [Hint: Start with an SVD for A.]

Exercise 8.6.17 Find A+ if:

a. A = [1 2; −1 −2]   b. A = [1 −1; 0 0; 1 −1]

Exercise 8.6.18 Show that (A+ )T = (AT )+.

8.7 Complex Matrices

If A is an n × n matrix, the characteristic polynomial cA (x) is a polynomial of degree n and the eigenvalues
of A are just the roots of cA (x). In most of our examples these roots have been real numbers (in fact,
the examples have been carefully chosen so this will be the case!); but it need not  happen, even when
0 1
the characteristic polynomial has real coefficients. For example, if A = then cA (x) = x2 + 1
−1 0
has roots i and −i, where i is a complex number satisfying i2 = −1. Therefore, we have to deal with the
possibility that the eigenvalues of a (real) square matrix might be complex numbers.
In fact, nearly everything in this book would remain true if the phrase real number were replaced by
complex number wherever it occurs. Then we would deal with matrices with complex entries, systems
of linear equations with complex coefficients (and complex solutions), determinants of complex matrices,
and vector spaces with scalar multiplication by any complex number allowed. Moreover, the proofs of
most theorems about (the real version of) these concepts extend easily to the complex case. It is not our
intention here to give a full treatment of complex linear algebra. However, we will carry the theory far
enough to give another proof that the eigenvalues of a real symmetric matrix A are real (Theorem 5.5.7)
and to prove the spectral theorem, an extension of the principal axes theorem (Theorem 8.2.2).
The set of complex numbers is denoted C. We will use only the most basic properties of these numbers
(mainly conjugation and absolute values), and the reader can find this material in Appendix A.
If n ≥ 1, we denote the set of all n-tuples of complex numbers by Cn . As with Rn , these n-tuples will
be written either as row or column matrices and will be referred to as vectors. We define vector operations
on Cn as follows:

(v1 , v2 , . . . , vn ) + (w1 , w2 , . . . , wn ) = (v1 + w1 , v2 + w2 , . . . , vn + wn )


u(v1 , v2 , . . . , vn ) = (uv1 , uv2 , . . . , uvn ) for u in C

With these definitions, Cn satisfies the axioms for a vector space (with complex scalars) given in Chapter 6.
Thus we can speak of spanning sets for Cn , of linearly independent subsets, and of bases. In all cases,
the definitions are identical to the real case, except that the scalars are allowed to be complex numbers. In
particular, the standard basis of Rn remains a basis of Cn , called the standard basis of Cn .
 
A matrix A = [aij] is called a complex matrix if every entry aij is a complex number. The notion of conjugation for complex numbers extends to matrices as follows: Define the conjugate of A = [aij] to be the matrix Ā = [āij] obtained from A by conjugating every entry. Then (using Appendix A) the conjugate of A + B is Ā + B̄ and the conjugate of AB is Ā B̄, for all (complex) matrices of appropriate size.
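NumPy handles complex entries natively, so these conjugation rules can be illustrated directly (a small sketch, ours):

```python
import numpy as np

A = np.array([[1 + 2j, 3j],
              [0,      2 - 1j]])
B = np.array([[1j, 1],
              [2,  1 - 1j]])

# np.conj conjugates every entry, and the rules above hold:
assert np.allclose(np.conj(A + B), np.conj(A) + np.conj(B))
assert np.allclose(np.conj(A @ B), np.conj(A) @ np.conj(B))
```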
