
Exercises for 8.5

Exercise 8.5.1 In each case, find the exact eigenvalues and determine corresponding eigenvectors. Then start with x0 = [1, 1]T and compute x4 and r3 using the power method.

a. A = [2 −4; −3 3]   b. A = [5 2; −3 −2]   c. A = [1 2; 2 1]   d. A = [3 1; 1 0]

Exercise 8.5.2 In each case, find the exact eigenvalues and then approximate them using the QR-algorithm.

a. A = [1 1; 1 0]   b. A = [3 1; 1 0]

Exercise 8.5.3 Apply the power method to A = [0 1; −1 0], starting at x0 = [1, 1]T. Does it converge? Explain.

Exercise 8.5.4 If A is symmetric, show that each matrix Ak in the QR-algorithm is also symmetric. Deduce that they converge to a diagonal matrix.

Exercise 8.5.5 Apply the QR-algorithm to A = [2 −3; 1 −2]. Explain.

Exercise 8.5.6 Given a matrix A, let Ak, Qk, and Rk, k ≥ 1, be the matrices constructed in the QR-algorithm. Show that A^k = (Q1 Q2 · · · Qk)(Rk · · · R2 R1) for each k ≥ 1 and hence that this is a QR-factorization of A^k.

[Hint: Show that Qk Rk = Rk−1 Qk−1 for each k ≥ 2, and use this equality to compute (Q1 Q2 · · · Qk)(Rk · · · R2 R1) “from the centre out.” Use the fact that (AB)^{n+1} = A(BA)^n B for any square matrices A and B.]

8.6 The Singular Value Decomposition

When working with a square matrix A it is clearly useful to be able to “diagonalize” A, that is, to find a factorization A = Q−1 DQ where Q is invertible and D is diagonal. Unfortunately such a factorization may not exist for A. However, even if A is not square, Gaussian elimination provides a factorization of the form A = PDQ where P and Q are invertible and D is diagonal—the Smith normal form (Theorem 2.5.3). Better still, if A is real we can choose P and Q to be orthogonal real matrices and D to be real. Such a factorization is called a singular value decomposition (SVD) for A, one of the most useful tools in applied linear algebra. In this section we show how to compute an SVD explicitly for any real matrix A, and illustrate some of its many applications.
We need a fact about two subspaces associated with an m × n matrix A:

im A = {Ax | x in Rn } and col A = span {a | a is a column of A}

Then im A is called the image of A (so named because of the linear transformation Rn → Rm with x ↦ Ax);
and col A is called the column space of A (Definition 5.10). Surprisingly, these spaces are equal:

Lemma 8.6.1
For any m × n matrix A, im A = col A.

 
Proof. Write A = [a1 a2 · · · an] in terms of its columns. Let x ∈ im A, say x = Ay with y in Rn. If y = [y1 y2 · · · yn]T, then Ay = y1 a1 + y2 a2 + · · · + yn an ∈ col A by Definition 2.5. This shows that im A ⊆ col A. For the other inclusion, each ak = Aek ∈ im A where ek is column k of In.

8.6.1. Singular Value Decompositions

We know a lot about any real symmetric matrix: Its eigenvalues are real (Theorem 5.5.7), and it is orthog-
onally diagonalizable by the Principal Axes Theorem (Theorem 8.2.2). So for any real matrix A (square
or not), the fact that both AT A and AAT are real and symmetric suggests that we can learn a lot about A by
studying them. This section shows just how true this is.
The following Lemma reveals some similarities between AT A and AAT which simplify the statement
and the proof of the SVD we are constructing.

Lemma 8.6.2
Let A be a real m × n matrix. Then:

1. The eigenvalues of AT A and AAT are real and non-negative.

2. AT A and AAT have the same set of positive eigenvalues.

Proof.

1. Let λ be an eigenvalue of AT A, with eigenvector q ≠ 0 in Rn. Then:

‖Aq‖² = (Aq)T (Aq) = qT (AT Aq) = qT (λq) = λ(qT q) = λ‖q‖²

Hence λ = ‖Aq‖²/‖q‖² ≥ 0 because ‖q‖ ≠ 0. Thus (1.) holds for AT A, and the case AAT follows by replacing A by AT.

2. Write N(B) for the set of positive eigenvalues of a matrix B. We must show that N(AT A) = N(AAT). If λ ∈ N(AT A) with eigenvector q ≠ 0 in Rn, then Aq ∈ Rm and

AAT (Aq) = A[(AT A)q] = A(λq) = λ(Aq)

Moreover, Aq ≠ 0 because AT Aq = λq ≠ 0 (as λ ≠ 0 and q ≠ 0). Hence λ is an eigenvalue of AAT, proving N(AT A) ⊆ N(AAT). For the other inclusion replace A by AT.

To analyze an m × n matrix A we have two symmetric matrices to work with: AT A and AAT . In view
of Lemma 8.6.2, we choose AT A (sometimes called the Gram matrix of A), and derive a series of facts
which we will need. This narrative is a bit long, but trust that it will be worth the effort. We parse it out in
several steps:

1. The n × n matrix AT A is real and symmetric so, by the Principal Axes Theorem 8.2.2, let {q1 , q2 , . . . , qn } ⊆ Rn be an orthonormal basis of eigenvectors of AT A, with corresponding eigenvalues λ1 , λ2 , . . . , λn . By Lemma 8.6.2(1), each λi is real and λi ≥ 0. By re-ordering the qi we may (and do) assume that

λ1 ≥ λ2 ≥ · · · ≥ λr > 0 and λi = 0 if i > r⁸   (i)

By Theorems 8.2.1 and 3.3.4, the matrix

Q = [q1 q2 · · · qn] is orthogonal and orthogonally diagonalizes AT A   (ii)

2. Even though the λi are the eigenvalues of AT A, the number r in (i) turns out to be rank A. To understand why, consider the vectors Aqi ∈ im A. For all i, j:

Aqi · Aqj = (Aqi)T Aqj = qiT (AT A)qj = qiT (λj qj) = λj (qiT qj) = λj (qi · qj)

Because {q1 , q2 , . . . , qn } is an orthonormal set, this gives

Aqi · Aqj = 0 if i ≠ j and ‖Aqi‖² = λi ‖qi‖² = λi for each i   (iii)

We can extract two conclusions from (iii) and (i):

{Aq1 , Aq2 , . . . , Aqr } ⊆ im A is an orthogonal set and Aqi = 0 if i > r   (iv)

With this, write U = span {Aq1 , Aq2 , . . . , Aqr } ⊆ im A; we claim that U = im A, that is, im A ⊆ U. For this we must show that Ax ∈ U for each x ∈ Rn. Since {q1 , . . . , qr , . . . , qn } is a basis of Rn (it is orthonormal), we can write x = t1 q1 + · · · + tr qr + · · · + tn qn where each tj ∈ R. Then, using (iv), we obtain

Ax = t1 Aq1 + · · · + tr Aqr + · · · + tn Aqn = t1 Aq1 + · · · + tr Aqr ∈ U

This shows that U = im A, and so

{Aq1 , Aq2 , . . . , Aqr } is an orthogonal basis of im A   (v)

But col A = im A by Lemma 8.6.1, and rank A = dim ( col A) by Theorem 5.4.1, so

rank A = dim ( col A) = dim ( im A) = r by (v)   (vi)

3. Before proceeding, some definitions are in order:

Definition 8.7
The real numbers σi = √λi = ‖Aqi‖, i = 1, 2, . . . , n (the second equality is (iii)), are called the singular values of the matrix A.

Clearly σ1 , σ2 , . . . , σr are the positive singular values of A. By (i) we have

σ1 ≥ σ2 ≥ · · · ≥ σr > 0 and σi = 0 if i > r (vii)

With (vi) this makes the following definitions depend only upon A.

8 Of course they could all be positive (r = n) or all zero (so AT A = 0, and hence A = 0 by Exercise 5.3.9).

Definition 8.8
Let A be a real, m × n matrix of rank r, with positive singular values σ1 ≥ σ2 ≥ · · · ≥ σr > 0 and
σi = 0 if i > r. Define:
 
DA = diag (σ1 , . . . , σr ) and ΣA = [DA 0; 0 0] of size m × n

Here ΣA is in block form and is called the singular matrix of A.

The singular values σi and the matrices DA and ΣA will be referred to frequently below.
4. Returning to our narrative, normalize the vectors Aq1 , Aq2 , . . . , Aqr by defining

pi = (1/‖Aqi‖) Aqi ∈ Rm for each i = 1, 2, . . . , r   (viii)

By (v), (viii), and Lemma 8.6.1, we conclude that

{p1 , p2 , . . . , pr } is an orthonormal basis of col A ⊆ Rm   (ix)

Employing the Gram-Schmidt algorithm (or otherwise), construct pr+1 , . . . , pm so that

{p1 , . . . , pr , . . . , pm } is an orthonormal basis of Rm   (x)

5. By (x) and (ii) we have two orthogonal matrices

P = [p1 · · · pr · · · pm] of size m × m and Q = [q1 · · · qr · · · qn] of size n × n

These matrices are related. In fact, using (iii) and (viii) we have:

σi pi = √λi pi = ‖Aqi‖ pi = Aqi for each i = 1, 2, . . . , r   (xi)

With (iv) and (xi), this yields the following expression for AQ in terms of its columns:

AQ = [Aq1 · · · Aqr Aqr+1 · · · Aqn] = [σ1 p1 · · · σr pr 0 · · · 0]   (xii)

Then we compute:

PΣA = [p1 · · · pr pr+1 · · · pm] [diag (σ1 , . . . , σr ) 0; 0 0] = [σ1 p1 · · · σr pr 0 · · · 0] = AQ

where the last equality is (xii). Finally, as Q−1 = QT it follows that A = PΣA QT.

With this we can state the main theorem of this section.



Theorem 8.6.1
Let A be a real m × n matrix, and let σ1 ≥ σ2 ≥ · · · ≥ σr > 0 be the positive singular values of A.
Then r is the rank of A and we have the factorization

A = PΣA QT where P and Q are orthogonal matrices

The factorization A = PΣA QT in Theorem 8.6.1, where P and Q are orthogonal matrices, is called a Singular Value Decomposition (SVD) of A. This decomposition is not unique. For example, if r < m then the vectors pr+1 , . . . , pm can be any extension of {p1 , . . . , pr } to an orthonormal basis of Rm, and each choice leads to a different matrix P in the decomposition. For a more dramatic example, if A = In then ΣA = In, and A = PΣA PT is an SVD of A for any orthogonal n × n matrix P.

Example 8.6.1
 
Find a singular value decomposition for A = [1 0 1; −1 1 0].

Solution. We have AT A = [2 −1 1; −1 1 0; 1 0 1], so the characteristic polynomial is

cAT A (x) = det [x−2 1 −1; 1 x−1 0; −1 0 x−1] = (x − 3)(x − 1)x

Hence the eigenvalues of AT A (in descending order) are λ1 = 3, λ2 = 1 and λ3 = 0 with, respectively, unit eigenvectors

q1 = (1/√6) [2, −1, 1]T , q2 = (1/√2) [0, 1, 1]T , and q3 = (1/√3) [−1, −1, 1]T

It follows that the orthogonal matrix Q in Theorem 8.6.1 is

Q = [q1 q2 q3] = (1/√6) [2 0 −√2; −1 √3 −√2; 1 √3 √2]

The singular values here are σ1 = √3, σ2 = 1 and σ3 = 0, so rank (A) = 2—clear in this case—and the singular matrix is

ΣA = [σ1 0 0; 0 σ2 0] = [√3 0 0; 0 1 0]

So it remains to find the 2 × 2 orthogonal matrix P in Theorem 8.6.1. This involves the vectors

Aq1 = (√6/2) [1, −1]T , Aq2 = (√2/2) [1, 1]T , and Aq3 = [0, 0]T

Normalize Aq1 and Aq2 to get

p1 = (1/√2) [1, −1]T and p2 = (1/√2) [1, 1]T

In this case, {p1 , p2 } is already a basis of R2 (so the Gram-Schmidt algorithm is not needed), and we have the 2 × 2 orthogonal matrix

P = [p1 p2] = (1/√2) [1 1; −1 1]

Finally (by Theorem 8.6.1) the singular value decomposition for A is

A = PΣA QT = (1/√2) [1 1; −1 1] · [√3 0 0; 0 1 0] · (1/√6) [2 −1 1; 0 √3 √3; −√2 −√2 √2]

Of course this can be confirmed by direct matrix multiplication.
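The confirmation is easy to automate. The following NumPy sketch (ours, not part of the text) rebuilds P, ΣA and Q from the example and checks the factorization:

```python
import numpy as np

# Check Example 8.6.1: A = P @ Sigma_A @ Q.T with orthogonal P and Q.
A = np.array([[1.0, 0.0, 1.0],
              [-1.0, 1.0, 0.0]])

P = (1 / np.sqrt(2)) * np.array([[1.0, 1.0],
                                 [-1.0, 1.0]])
Sigma_A = np.array([[np.sqrt(3), 0.0, 0.0],
                    [0.0,        1.0, 0.0]])
Q = np.column_stack([
    (1 / np.sqrt(6)) * np.array([2.0, -1.0, 1.0]),   # q1
    (1 / np.sqrt(2)) * np.array([0.0,  1.0, 1.0]),   # q2
    (1 / np.sqrt(3)) * np.array([-1.0, -1.0, 1.0]),  # q3
])

assert np.allclose(P @ Sigma_A @ Q.T, A)   # the factorization holds
assert np.allclose(Q.T @ Q, np.eye(3))     # Q is orthogonal
assert np.allclose(P.T @ P, np.eye(2))     # P is orthogonal
```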

Thus, computing an SVD for a real matrix A is a routine matter, and we now describe a systematic
procedure for doing so.

SVD Algorithm
Given a real m × n matrix A, find an SVD A = PΣA QT as follows:

1. Use the Diagonalization Algorithm (see page 181) to find the (real and non-negative)
eigenvalues λ1 , λ2 , . . . , λn of AT A with corresponding (orthonormal) eigenvectors
q1 , q2 , . . . , qn . Reorder the qi (if necessary) to ensure that the nonzero eigenvalues are
λ1 ≥ λ2 ≥ · · · ≥ λr > 0 and λi = 0 if i > r.

2. The integer r is the rank of the matrix A.


 
3. The n × n orthogonal matrix Q in the SVD is Q = [q1 q2 · · · qn].

4. Define pi = (1/‖Aqi‖) Aqi for i = 1, 2, . . . , r (where r is as in step 1). Then {p1 , p2 , . . . , pr } is orthonormal in Rm so (using Gram-Schmidt or otherwise) extend it to an orthonormal basis {p1 , . . . , pr , . . . , pm } of Rm.
 
5. The m × m orthogonal matrix P in the SVD is P = [p1 · · · pr · · · pm].

6. The singular values for A are σ1 , σ2 , . . . , σn where σi = √λi for each i. Hence the nonzero singular values are σ1 ≥ σ2 ≥ · · · ≥ σr > 0, and the singular matrix of A in the SVD is

ΣA = [diag (σ1 , . . . , σr ) 0; 0 0] of size m × n

7. Thus A = PΣA QT is an SVD for A.
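The steps above translate directly into a short program. Below is a minimal sketch in NumPy (the function name svd_via_gram is ours): it diagonalizes the Gram matrix AT A with numpy.linalg.eigh for step 1, and uses a QR factorization in place of Gram-Schmidt to extend {p1 , . . . , pr } in step 4.

```python
import numpy as np

def svd_via_gram(A, tol=1e-12):
    """Textbook SVD algorithm (steps 1-7): returns P, Sigma, Q with A = P @ Sigma @ Q.T."""
    m, n = A.shape
    # Step 1: orthonormal eigenvectors of A^T A, eigenvalues in descending order.
    lam, Q = np.linalg.eigh(A.T @ A)      # eigh returns ascending order
    idx = np.argsort(lam)[::-1]
    lam, Q = lam[idx], Q[:, idx]
    # Step 2: r = rank A = number of positive eigenvalues.
    r = int(np.sum(lam > tol))
    # Step 6: singular values sigma_i = sqrt(lambda_i) and the singular matrix.
    sigma = np.sqrt(np.clip(lam, 0.0, None))
    Sigma = np.zeros((m, n))
    Sigma[:r, :r] = np.diag(sigma[:r])
    # Step 4: p_i = (1/||A q_i||) A q_i for i <= r ...
    P = np.zeros((m, m))
    P[:, :r] = (A @ Q[:, :r]) / sigma[:r]
    # ... then extend {p_1, ..., p_r} to an orthonormal basis of R^m.
    if r < m:
        full, _ = np.linalg.qr(np.hstack([P[:, :r], np.eye(m)]))
        P[:, r:] = full[:, r:]
    return P, Sigma, Q

A = np.array([[1.0, 0.0, 1.0], [-1.0, 1.0, 0.0]])  # the matrix of Example 8.6.1
P, Sigma, Q = svd_via_gram(A)
assert np.allclose(P @ Sigma @ Q.T, A)
```

In exact arithmetic this reproduces the construction in the proof of Theorem 8.6.1; the tolerance tol stands in for the exact test λi > 0.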

In practice the singular values σi , the matrices P and Q, and even the rank of an m × n matrix are not calculated this way. There are sophisticated numerical algorithms for calculating them to a high degree of accuracy. The reader is referred to books on numerical linear algebra.
So the main virtue of Theorem 8.6.1 is that it provides a way of constructing an SVD for every real
matrix A. In particular it shows that every real matrix A has a singular value decomposition⁹ in the
following, more general, sense:

Definition 8.9
A Singular Value Decomposition (SVD) of an m × n matrix A of rank r is a factorization A = U ΣV T where U and V are orthogonal and Σ = [D 0; 0 0] of size m × n in block form, where D = diag (d1 , d2 , . . . , dr ) with each di > 0, and r ≤ m and r ≤ n.

Note that for any SVD A = U ΣV T we immediately obtain some information about A:

Lemma 8.6.3
If A = U ΣV T is any SVD for A as in Definition 8.9, then:

1. r = rank A.

2. The numbers d1 , d2 , . . . , dr are the positive singular values of A, in some order.

Proof. Use the notation of Definition 8.9. We have

AT A = (V ΣT U T )(U ΣV T ) = V (ΣT Σ)V T

so ΣT Σ and AT A are similar n ×n matrices (Definition 5.11). Hence r = rank A by Corollary 5.4.3, proving
(1.). Furthermore, ΣT Σ and AT A have the same eigenvalues by Theorem 5.5.1; that is (using (1.)):

{d12 , d22 , . . . , dr2 } = {λ1 , λ2 , . . . , λr } are equal as sets

where λ1 , λ2 , . . . , λr are the positive eigenvalues of AT A. Hence there is a permutation τ of {1, 2, · · · , r} such that di² = λiτ for each i = 1, 2, . . . , r. Hence di = √λiτ = σiτ for each i by Definition 8.7. This proves (2.).

We note in passing that more is true. Let A be m × n of rank r, and let A = U ΣV T be any SVD for A.
Using the proof of Lemma 8.6.3 we have di = σiτ for some permutation τ of {1, 2, . . . , r}. In fact, it can
be shown that there exist orthogonal matrices U1 and V1 obtained from U and V by τ -permuting columns
and rows respectively, such that A = U1 ΣAV1T is an SVD of A.
9 In fact every complex matrix has an SVD [J.T. Scheick, Linear Algebra with Applications, McGraw-Hill, 1997]

8.6.2. Fundamental Subspaces

It turns out that any singular value decomposition contains a great deal of information about an m ×
n matrix A and the subspaces associated with A. For example, in addition to Lemma 8.6.3, the set
{p1 , p2 , . . . , pr } of vectors constructed in the proof of Theorem 8.6.1 is an orthonormal basis of col A
(by (v) and (viii) in the proof). There are more such examples, which is the thrust of this subsection.
In particular, there are four subspaces associated to a real m × n matrix A that have come to be called
fundamental:

Definition 8.10
The fundamental subspaces of an m × n matrix A are:

row A = span {x | x is a row of A}

col A = span {x | x is a column of A}

null A = {x ∈ Rn | Ax = 0}

null AT = {x ∈ Rm | AT x = 0}

If A = U ΣV T is any SVD for the real m × n matrix A, the columns of U and V provide orthonormal bases for each of these fundamental subspaces. We are going to prove this, but first we need three
properties related to the orthogonal complement U ⊥ of a subspace U of Rn , where (Definition 8.1):

U ⊥ = {x ∈ Rn | u · x = 0 for all u ∈ U }

The orthogonal complement plays an important role in the Projection Theorem (Theorem 8.1.3), and we
return to it in Section 10.2. For now we need:

Lemma 8.6.4
If A is any matrix then:

1. ( row A)⊥ = null A and ( col A)⊥ = null AT .

2. If U is any subspace of Rn then U ⊥⊥ = U .

3. Let {f1 , . . . , fm } be an orthonormal basis of Rm . If U = span {f1 , . . . , fk }, then

U ⊥ = span {fk+1 , . . . , fm }

Proof.

1. Assume A is m × n, and let b1 , . . . , bm be the rows of A. If x is a column in Rn , then entry i of Ax is


bi · x, so Ax = 0 if and only if bi · x = 0 for each i. Thus:

x ∈ null A ⇔ bi · x = 0 for each i ⇔ x ∈ ( span {b1 , . . . , bm })⊥ = ( row A)⊥



Hence null A = ( row A)⊥ . Now replace A by AT to get null AT = ( row AT )⊥ = ( col A)⊥ , which is
the other identity in (1).

2. If x ∈ U then y · x = 0 for all y ∈ U ⊥, that is x ∈ U ⊥⊥ . This proves that U ⊆ U ⊥⊥ , so it is enough to


show that dim U = dim U ⊥⊥ . By Theorem 8.1.4 we see that dim V ⊥ = n − dim V for any subspace
V ⊆ Rn . Hence

dim U ⊥⊥ = n − dim U ⊥ = n − (n − dim U ) = dim U , as required

3. We have span {fk+1 , . . . , fm } ⊆ U ⊥ because {f1 , . . . , fm } is orthogonal. For the other inclusion, let
x ∈ U ⊥ so fi · x = 0 for i = 1, 2, . . . , k. By the Expansion Theorem 5.3.6:

x = (f1 · x)f1 + · · · + (fk · x)fk + (fk+1 · x)fk+1 + · · · + (fm · x)fm


= 0 + ··· + 0 + (fk+1 · x)fk+1 + · · · + (fm · x)fm

Hence U ⊥ ⊆ span {fk+1 , . . . , fm }.

With this we can see how any SVD for a matrix A provides orthonormal bases for each of the four
fundamental subspaces of A.

Theorem 8.6.2
Let A be an m × n real matrix, let A = U ΣV T be any SVD for A where U and V are orthogonal of
size m × m and n × n respectively, and let
 
Σ = [D 0; 0 0] of size m × n, where D = diag (λ1 , λ2 , . . . , λr ) with each λi > 0

Write U = [u1 · · · ur · · · um] and V = [v1 · · · vr · · · vn], so {u1 , . . . , ur , . . . , um } and {v1 , . . . , vr , . . . , vn } are orthonormal bases of Rm and Rn respectively. Then

1. r = rank A, and λ1 , λ2 , . . . , λr are the singular values of A.

2. The fundamental spaces are described as follows:

a. {u1 , . . . , ur } is an orthonormal basis of col A.


b. {ur+1 , . . . , um } is an orthonormal basis of null AT .
c. {vr+1 , . . . , vn } is an orthonormal basis of null A.
d. {v1 , . . ., vr } is an orthonormal basis of row A.

Proof.

1. This is Lemma 8.6.3.



2. a. As col A = col (AV ) by Lemma 5.4.3 and AV = U Σ, (a.) follows from

U Σ = [u1 · · · ur · · · um] [diag (λ1 , λ2 , . . . , λr ) 0; 0 0] = [λ1 u1 · · · λr ur 0 · · · 0]

b. We have ( col A)⊥ = ( span {u1 , . . . , ur })⊥ = span {ur+1 , . . . , um }, where the first equality is by (a.) and the second by Lemma 8.6.4(3). This proves (b.) because ( col A)⊥ = null AT by Lemma 8.6.4(1).
c. We have dim ( null A) + dim ( im A) = n by the Dimension Theorem 7.2.4, applied to
T : Rn → Rm where T (x) = Ax. Since also im A = col A by Lemma 8.6.1, we obtain

dim ( null A) = n − dim ( col A) = n − r = dim ( span {vr+1 , . . . , vn })

So to prove (c.) it is enough to show that vj ∈ null A whenever j > r. To this end write λr+1 = · · · = λn = 0, so that

ΣT Σ = diag (λ1², . . . , λr², λr+1², . . . , λn²)

Observe that each λj² is an eigenvalue of ΣT Σ with eigenvector ej = column j of In. Thus vj = V ej for each j. As AT A = V ΣT ΣV T (proof of Lemma 8.6.3), we obtain

(AT A)vj = (V ΣT ΣV T )(V ej) = V (ΣT Σ ej) = V (λj² ej) = λj² V ej = λj² vj

for 1 ≤ j ≤ n. Thus each vj is an eigenvector of AT A corresponding to λj². But then

‖Avj‖² = (Avj)T Avj = vjT (AT A vj) = vjT (λj² vj) = λj² ‖vj‖² = λj² for j = 1, . . . , n

In particular, Avj = 0 whenever j > r, so vj ∈ null A if j > r, as desired. This proves (c.).


d. Observe that span {vr+1 , . . . , vn } = null A = ( row A)⊥ by (c.) and Lemma 8.6.4(1). But then parts (2) and (3) of Lemma 8.6.4 show

row A = (( row A)⊥ )⊥ = ( span {vr+1 , . . . , vn })⊥ = span {v1 , . . . , vr }

This proves (d.), and hence Theorem 8.6.2.

Example 8.6.2
Consider the homogeneous linear system

Ax = 0 of m equations in n variables

Then the set of all solutions is null A. Hence if A = U ΣV T is any SVD for A then (in the notation
of Theorem 8.6.2) {vr+1 , . . . , vn } is an orthonormal basis of the set of solutions for the system. As
such they are a set of basic solutions for the system, the most basic notion in Chapter 1.
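This, and Theorem 8.6.2 generally, is easy to see numerically: one call to a library SVD routine hands back orthonormal bases for all four fundamental subspaces. A sketch in NumPy (the test matrix is ours):

```python
import numpy as np

A = np.array([[1.0, 2.0, -1.0],
              [2.0, 4.0, -2.0]])          # m = 2, n = 3, rank 1

U, s, Vt = np.linalg.svd(A)               # A = U @ Sigma @ Vt, V^T is returned
r = int(np.sum(s > 1e-12))                # r = rank A (Theorem 8.6.2(1))

col_A   = U[:, :r]      # {u1, ..., ur}:      basis of col A
null_At = U[:, r:]      # {u_{r+1}, ..., um}: basis of null A^T
row_A   = Vt[:r, :].T   # {v1, ..., vr}:      basis of row A
null_A  = Vt[r:, :].T   # {v_{r+1}, ..., vn}: basis of null A (basic solutions)

assert np.allclose(A @ null_A, 0)         # solutions of Ax = 0
assert np.allclose(A.T @ null_At, 0)      # solutions of A^T x = 0
```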

8.6.3. The Polar Decomposition of a Real Square Matrix

If A is real and n × n, the factorization in the title is related to the polar decomposition of A. Unlike the SVD, in this case the decomposition is uniquely determined by A.
Recall (Section 8.3) that a symmetric matrix A is called positive definite if and only if xT Ax > 0 for every column x ≠ 0 in Rn. Before proceeding, we must explore the following weaker notion:

Definition 8.11
A real n × n matrix G is called positive¹⁰ if it is symmetric and

xT Gx ≥ 0 for all x ∈ Rn

Clearly every positive definite matrix is positive, but the converse fails. Indeed, A = [1 1; 1 1] is positive because, if x = [a b]T in R2, then xT Ax = (a + b)² ≥ 0. But yT Ay = 0 if y = [1 −1]T, so A is not positive definite.

Lemma 8.6.5
Let G denote an n × n positive matrix.

1. If A is any n × m matrix, then AT GA is positive (and m × m).

2. If G = diag (d1 , d2 , · · · , dn ) and each di ≥ 0 then G is positive.

Proof.

1. xT (AT GA)x = (Ax)T G(Ax) ≥ 0 because G is positive.


2. If x = [x1 x2 · · · xn]T , then

xT Gx = d1 x1² + d2 x2² + · · · + dn xn² ≥ 0

because di ≥ 0 for each i.

Definition 8.12
If A is a real n × n matrix, a factorization

A = GQ where G is positive and Q is orthogonal

is called a polar decomposition for A.

10 Also called positive semi-definite.



Any SVD for a real square matrix A yields a polar form for A.

Theorem 8.6.3
Every square real matrix has a polar form.

Proof. Let A = U ΣV T be an SVD for A with Σ as in Definition 8.9 and m = n. Since U T U = In here we have

A = U ΣV T = (U Σ)(U T U )V T = (U ΣU T )(UV T )

So if we write G = U ΣU T and Q = UV T , then Q is orthogonal, and it remains to show that G is positive. Since Σ is diagonal with non-negative entries, this follows from both parts of Lemma 8.6.5.
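Since the proof is constructive, a polar form can be read off any SVD. A minimal sketch (NumPy; the helper name polar_form is ours):

```python
import numpy as np

def polar_form(A):
    """Polar form of a square matrix, following the proof of Theorem 8.6.3:
    A = G @ Q with G = U Sigma U^T positive and Q = U V^T orthogonal."""
    U, s, Vt = np.linalg.svd(A)      # A = U @ np.diag(s) @ Vt
    G = U @ np.diag(s) @ U.T
    Q = U @ Vt
    return G, Q

A = np.array([[1.0, 2.0],
              [0.0, 3.0]])
G, Q = polar_form(A)
assert np.allclose(G @ Q, A)                      # A = GQ
assert np.allclose(Q @ Q.T, np.eye(2))            # Q is orthogonal
assert np.allclose(G, G.T)                        # G is symmetric ...
assert np.all(np.linalg.eigvalsh(G) >= -1e-12)    # ... with non-negative eigenvalues
```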
The SVD for a square matrix A is not unique (In = PIn PT for any orthogonal matrix P). But given the
proof of Theorem 8.6.3 it is surprising that the polar decomposition is unique.¹¹ We omit the proof.
The name “polar form” is reminiscent of the same form for complex numbers (see Appendix A). This
is no coincidence. To see why, we represent the complex numbers as real 2 × 2 matrices. Write M2 (R) for
the set of all real 2 × 2 matrices, and define

σ : C → M2 (R) by σ (a + bi) = [a −b; b a] for all a + bi in C

One verifies that σ preserves addition and multiplication in the sense that

σ (zw) = σ (z)σ (w) and σ (z + w) = σ (z) + σ (w)

for all complex numbers z and w. Since σ is one-to-one we may identify each complex number a + bi with the matrix σ (a + bi), that is we write

a + bi = [a −b; b a] for all a + bi in C

Thus 0 = [0 0; 0 0], 1 = [1 0; 0 1] = I2 , i = [0 −1; 1 0], and r = [r 0; 0 r] if r is real.

If z = a + bi is nonzero then the absolute value r = |z| = √(a² + b²) ≠ 0. If θ is the angle of z in standard position, then cos θ = a/r and sin θ = b/r. Observe:

[a −b; b a] = [r 0; 0 r] [a/r −b/r; b/r a/r] = [r 0; 0 r] [cos θ −sin θ; sin θ cos θ] = GQ   (xiii)

where G = [r 0; 0 r] is positive and Q = [cos θ −sin θ; sin θ cos θ] is orthogonal. But in C we have G = r and Q = cos θ + i sin θ, so (xiii) reads z = r(cos θ + i sin θ) = re^{iθ}, which is the classical polar form for the complex number a + bi. This is why (xiii) is called the polar form of the matrix [a −b; b a]; Definition 8.12 simply adopts the terminology for n × n matrices.
11 See J.T. Scheick, Linear Algebra with Applications, McGraw-Hill, 1997, page 379.

8.6.4. The Pseudoinverse of a Matrix

It is impossible for a non-square matrix A to have an inverse (see the footnote to Definition 2.11). Nonetheless, one candidate for an “inverse” of an m × n matrix A is an n × m matrix B such that

ABA = A and BAB = B

Such a matrix B is called a middle inverse for A. If A is invertible then A−1 is the unique middle inverse for A, but a middle inverse is not unique in general, even for square matrices. For example, if A = [1 0; 0 0; 0 0] then B = [1 0 0; b 0 0] is a middle inverse for A for any b.
If ABA = A and BAB = B it is easy to see that AB and BA are both idempotent matrices. In 1955 Roger
Penrose observed that the middle inverse is unique if both AB and BA are symmetric. We omit the proof.

Theorem 8.6.4: Penrose’s Theorem¹²
Given any real m × n matrix A, there is exactly one n × m matrix B such that A and B satisfy the
following conditions:

P1 ABA = A and BAB = B.

P2 Both AB and BA are symmetric.

Definition 8.13
Let A be a real m × n matrix. The pseudoinverse of A is the unique n × m matrix A+ such that A
and A+ satisfy P1 and P2, that is:

AA+ A = A, A+ AA+ = A+ , and both AA+ and A+ A are symmetric¹³

If A is invertible then A+ = A−1 as expected. In general, the symmetry in conditions P1 and P2 shows
that A is the pseudoinverse of A+ , that is A++ = A.

12 R. Penrose, A generalized inverse for matrices, Proceedings of the Cambridge Philosophical Society 51 (1955), 406-413. In fact Penrose proved this for any complex matrix, where AB and BA are both required to be hermitian (see Definition 8.18 in the following section).
13 Penrose called the matrix A+ the generalized inverse of A, but the term pseudoinverse is now commonly used. The matrix A+ is also called the Moore-Penrose inverse after E.H. Moore, who had the idea in 1935 as part of a larger work on “General Analysis”. Penrose independently re-discovered it 20 years later.

Theorem 8.6.5
Let A be an m × n matrix.

1. If rank A = m then AAT is invertible and A+ = AT (AAT )−1 .

2. If rank A = n then AT A is invertible and A+ = (AT A)−1 AT .

Proof. Here AAT (respectively AT A) is invertible by Theorem 5.4.4 (respectively Theorem 5.4.3). The rest
is a routine verification.
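Both formulas are easy to check against a library pseudoinverse. A NumPy sketch (the test matrix is ours):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])                     # rank A = n = 2

A_plus = np.linalg.inv(A.T @ A) @ A.T          # Theorem 8.6.5(2): full column rank
assert np.allclose(A_plus, np.linalg.pinv(A))

B = A.T                                        # rank B = m = 2
B_plus = B.T @ np.linalg.inv(B @ B.T)          # Theorem 8.6.5(1): full row rank
assert np.allclose(B_plus, np.linalg.pinv(B))
```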
In general, given an m × n matrix A, the pseudoinverse A+ can be computed from any SVD for A. To see how, we need some notation. Let A = U ΣV T be an SVD for A (as in Definition 8.9) where U and V are orthogonal and Σ = [D 0; 0 0] of size m × n in block form, where D = diag (d1 , d2 , . . . , dr ) with each di > 0.
Hence D is invertible, so we make:

Definition 8.14
Σ′ = [D−1 0; 0 0] of size n × m.

A routine calculation gives:

Lemma 8.6.6
 
• ΣΣ′Σ = Σ and Σ′ΣΣ′ = Σ′

• ΣΣ′ = [Ir 0; 0 0] of size m × m and Σ′Σ = [Ir 0; 0 0] of size n × n

That is, Σ′ is the pseudoinverse of Σ.


Now given A = U ΣV T , define B = V Σ′U T . Then

ABA = (U ΣV T )(V Σ′U T )(U ΣV T ) = U (ΣΣ′ Σ)V T = U ΣV T = A

by Lemma 8.6.6. Similarly BAB = B. Moreover AB = U (ΣΣ′)U T and BA = V (Σ′ Σ)V T are both symmetric
again by Lemma 8.6.6. This proves

Theorem 8.6.6
Let A be real and m × n, and let A = U ΣV T be any SVD for A as in Definition 8.9. Then A+ = V Σ′U T.
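In code this is a few lines. A sketch (NumPy; the name pinv_via_svd is ours), tested on the matrix of Example 8.6.3 below:

```python
import numpy as np

def pinv_via_svd(A, tol=1e-12):
    """A^+ = V @ Sigma' @ U^T (Theorem 8.6.6), where Sigma' inverts the
    positive singular values and transposes the block shape (Definition 8.14)."""
    U, s, Vt = np.linalg.svd(A)         # A = U @ Sigma @ Vt
    m, n = A.shape
    r = int(np.sum(s > tol))
    Sigma_prime = np.zeros((n, m))
    Sigma_prime[:r, :r] = np.diag(1.0 / s[:r])
    return Vt.T @ Sigma_prime @ U.T

A = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]])
assert np.allclose(pinv_via_svd(A), A.T)               # here A^+ = A^T
assert np.allclose(pinv_via_svd(A), np.linalg.pinv(A))
```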

Of course we can always use the SVD constructed in Theorem 8.6.1 to find the pseudoinverse. If A = [1 0; 0 0; 0 0], we observed above that B = [1 0 0; b 0 0] is a middle inverse for A for any b. However, AB is symmetric but BA is not when b ≠ 0, so B ≠ A+ in that case.

Example 8.6.3
 
Find A+ if A = [1 0; 0 0; 0 0].

Solution. We have AT A = [1 0; 0 0] with eigenvalues λ1 = 1 and λ2 = 0 and corresponding eigenvectors q1 = [1, 0]T and q2 = [0, 1]T. Hence Q = [q1 q2] = I2. Also A has rank 1 with singular values σ1 = 1 and σ2 = 0, so ΣA = [1 0; 0 0; 0 0] = A and Σ′A = [1 0 0; 0 0 0] = AT in this case.

Since Aq1 = [1, 0, 0]T and Aq2 = [0, 0, 0]T, we have p1 = [1, 0, 0]T, which extends to an orthonormal basis {p1 , p2 , p3 } of R3 where (say) p2 = [0, 1, 0]T and p3 = [0, 0, 1]T. Hence P = [p1 p2 p3] = I3, so the SVD for A is A = PΣA QT. Finally, the pseudoinverse of A is

A+ = QΣ′A PT = Σ′A = [1 0 0; 0 0 0]

Note that A+ = AT in this case.

The following Lemma collects some properties of the pseudoinverse that mimic those of the inverse.
The verifications are left as exercises.

Lemma 8.6.7
Let A be an m × n matrix of rank r.

1. A++ = A.

2. If A is invertible then A+ = A−1 .

3. (AT )+ = (A+ )T .

4. (kA)+ = (1/k)A+ for any real k ≠ 0.

5. (UAV )+ = V T (A+ )U T whenever U and V are orthogonal.



Exercises for 8.6


 
Exercise 8.6.1 If ACA = A show that B = CAC is a middle inverse for A.

Exercise 8.6.2 For any matrix A show that ΣAT = (ΣA )T.

Exercise 8.6.3 If A is m × n with all singular values positive, what is rank A?

Exercise 8.6.4 If A has singular values σ1 , . . . , σr , what are the singular values of:

a. AT   b. tA where t > 0 is real   c. A−1 assuming A is invertible.

Exercise 8.6.5 If A is square show that |det A| is the product of the singular values of A.

Exercise 8.6.6 If A is square and real, show that A = 0 if and only if every singular value of A is 0.

Exercise 8.6.7 Given an SVD for an invertible matrix A, find one for A−1. How are ΣA and ΣA−1 related?

Exercise 8.6.8 Let A−1 = A = AT where A is n × n. Given any orthogonal n × n matrix U, find an orthogonal matrix V such that A = U ΣAV T is an SVD for A. If A = [0 1; 1 0] do this for:

a. U = (1/5) [3 −4; 4 3]   b. U = (1/√2) [1 −1; 1 1]

Exercise 8.6.9 Find an SVD for the following matrices:

a. A = [1 −1; 0 1; 1 0]   b. A = [1 1 1; −1 0 −2; 1 2 0]

Exercise 8.6.10 Find an SVD for A = [0 1; −1 0].

Exercise 8.6.11 If A = U ΣV T is an SVD for A, find an SVD for AT.

Exercise 8.6.12 Let A be a real, m × n matrix with positive singular values σ1 , σ2 , . . . , σr , and write

s(x) = (x − σ1²)(x − σ2²) · · · (x − σr²)

a. Show that cAT A (x) = s(x)x^{n−r} and cAAT (x) = s(x)x^{m−r}.

b. If m ≤ n conclude that cAT A (x) = x^{n−m} cAAT (x).

Exercise 8.6.13 If G is positive show that:

a. rG is positive if r ≥ 0   b. G + H is positive for any positive H.

Exercise 8.6.14 If G is positive and λ is an eigenvalue, show that λ ≥ 0.

Exercise 8.6.15 If G is positive show that G = H² for some positive matrix H. [Hint: Preceding exercise and Lemma 8.6.5.]

Exercise 8.6.16 If A is n × n show that AAT and AT A are similar. [Hint: Start with an SVD for A.]

Exercise 8.6.17 Find A+ if:

a. A = [1 2; −1 −2]   b. A = [1 −1; 0 0; 1 −1]

Exercise 8.6.18 Show that (A+ )T = (AT )+.

8.7 Complex Matrices

If A is an n × n matrix, the characteristic polynomial cA (x) is a polynomial of degree n and the eigenvalues
of A are just the roots of cA (x). In most of our examples these roots have been real numbers (in fact,
the examples have been carefully chosen so this will be the case!); but it need not  happen, even when
0 1
the characteristic polynomial has real coefficients. For example, if A = then cA (x) = x2 + 1
−1 0
has roots i and −i, where i is a complex number satisfying i2 = −1. Therefore, we have to deal with the
possibility that the eigenvalues of a (real) square matrix might be complex numbers.
In fact, nearly everything in this book would remain true if the phrase real number were replaced by
complex number wherever it occurs. Then we would deal with matrices with complex entries, systems
of linear equations with complex coefficients (and complex solutions), determinants of complex matrices,
and vector spaces with scalar multiplication by any complex number allowed. Moreover, the proofs of
most theorems about (the real version of) these concepts extend easily to the complex case. It is not our
intention here to give a full treatment of complex linear algebra. However, we will carry the theory far
enough to give another proof that the eigenvalues of a real symmetric matrix A are real (Theorem 5.5.7)
and to prove the spectral theorem, an extension of the principal axes theorem (Theorem 8.2.2).
The set of complex numbers is denoted C. We will use only the most basic properties of these numbers
(mainly conjugation and absolute values), and the reader can find this material in Appendix A.
If n ≥ 1, we denote the set of all n-tuples of complex numbers by Cn . As with Rn , these n-tuples will
be written either as row or column matrices and will be referred to as vectors. We define vector operations
on Cn as follows:

(v1 , v2 , . . . , vn ) + (w1 , w2 , . . . , wn ) = (v1 + w1 , v2 + w2 , . . . , vn + wn )


u(v1 , v2 , . . . , vn ) = (uv1 , uv2 , . . . , uvn ) for u in C

With these definitions, Cn satisfies the axioms for a vector space (with complex scalars) given in Chapter 6.
Thus we can speak of spanning sets for Cn , of linearly independent subsets, and of bases. In all cases,
the definitions are identical to the real case, except that the scalars are allowed to be complex numbers. In
particular, the standard basis of Rn remains a basis of Cn , called the standard basis of Cn .
 
A matrix A = [aij] is called a complex matrix if every entry aij is a complex number. The notion of conjugation for complex numbers extends to matrices as follows: Define the conjugate of A = [aij] to be the matrix Ā = [āij] obtained from A by conjugating every entry. Then (using Appendix A) the conjugate of A + B is Ā + B̄ and the conjugate of AB is Ā B̄, for all (complex) matrices of appropriate size.
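NumPy handles complex entries natively, so these conjugation rules can be illustrated directly (a small sketch, ours):

```python
import numpy as np

A = np.array([[1 + 2j, 3j],
              [0,      2 - 1j]])
B = np.array([[1j, 1],
              [2,  1 - 1j]])

# np.conj conjugates every entry, and the rules above hold:
assert np.allclose(np.conj(A + B), np.conj(A) + np.conj(B))
assert np.allclose(np.conj(A @ B), np.conj(A) @ np.conj(B))
```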
