Contents
1 Introduction to Matrices
  1.1 Motivation
  1.2 Definition of a Matrix
    1.2.1 Special Matrices
  1.3 Matrix Operations
    1.3.1 Transpose and Conjugate Transpose of Matrices
    1.3.2 Sum and Scalar Multiplication of Matrices
    1.3.3 Multiplication of Matrices
    1.3.4 Inverse of a Matrix
  1.4 Some More Special Matrices
  1.6 Summary
3 Vector Spaces
  3.1 Vector Spaces: Definition and Examples
    3.1.1 Vector Subspace
  3.2 Linear Combination and Linear Span
    3.2.1 Linear Span
  3.3 Linear Independence
    3.3.1 Basic Results on Linear Independence
    3.3.2 Application to Matrices
  3.4 Basis of a Vector Space
    3.4.1 Main Results associated with Bases
    3.4.2 Constructing a Basis of a Finite Dimensional Vector Space
  3.5 Fundamental Subspaces Associated with a Matrix
  3.6 Fundamental Theorem of Linear Algebra and Applications
  3.7 Summary
9 Appendix
Chapter 1
Introduction to Matrices
1.1 Motivation
Recall that, at some stage, we have solved a linear system of 3 equations in 3 unknowns. But, for clarity, let us start with a few linear systems of 2 equations in 2 unknowns.
The two linear systems represent a pair of non-parallel lines in R². Note that x = 1, y = 1 is the unique solution of the given system, as (1, 1) is the point of intersection of the two lines.
Here, we have three planes in R³, and an easy observation shows that the third equation is the sum of the first two equations. Hence, the line of intersection of the first two planes is contained in the third plane. Hence, this system has an infinite number of solutions, given by
x = 61 − 59k, y = −10 + 11k, z = k, with k an arbitrary real number.
For example, verify that for k = 1, we get x = 2, y = 1 and z = 1 as a possible solution.
Also,
$\begin{bmatrix}1\\1\\2\end{bmatrix}\cdot 2 + \begin{bmatrix}5\\6\\11\end{bmatrix}\cdot 1 + \begin{bmatrix}4\\-7\\-3\end{bmatrix}\cdot 1 = \begin{bmatrix}11\\1\\12\end{bmatrix} = \begin{bmatrix}1\\1\\2\end{bmatrix}\cdot 61 + \begin{bmatrix}5\\6\\11\end{bmatrix}\cdot(-10) + \begin{bmatrix}4\\-7\\-3\end{bmatrix}\cdot 0,$
x + 5y + 4z = 11
x + 6y − 7z = 1 (1.1.4)
2x + 11y − 3z = 13.
Here, we see that if we subtract the sum of the first two equations from the third equation, then we are left with 0x + 0y + 0z = 1, which has no solution. That is, the above system has no solution. We leave it to the reader to verify that there do not exist any x, y and z such that
$\begin{bmatrix}1\\1\\2\end{bmatrix}\cdot x + \begin{bmatrix}5\\6\\11\end{bmatrix}\cdot y + \begin{bmatrix}4\\-7\\-3\end{bmatrix}\cdot z = \begin{bmatrix}11\\1\\13\end{bmatrix}.$
Remark 1.1.2. So, what we see above is “each of the linear systems gives us certain ‘relation-
ships’ between vectors which are ‘associated’ with the unknowns”. These relationships will lead
to the study of certain objects when we study “vector spaces”. They are as follows:
1. The first idea of 'relationship', which helps us to write a vector in terms of other vectors, will lead us to the study of 'linear combination' of vectors. So, $\begin{bmatrix}7\\6\end{bmatrix}$ is a 'linear combination' of $\begin{bmatrix}2\\2\end{bmatrix}$ and $\begin{bmatrix}5\\4\end{bmatrix}$. Similarly, $\begin{bmatrix}11\\1\\12\end{bmatrix}$ is a 'linear combination' of $\begin{bmatrix}1\\1\\2\end{bmatrix}$, $\begin{bmatrix}5\\6\\11\end{bmatrix}$ and $\begin{bmatrix}4\\-7\\-3\end{bmatrix}$.
2. Further, it also leads to the study of the 'linear span' of a set. A positive answer leads to the vector being an element of the 'linear span' and a negative answer to 'NOT an element of the linear span'. For example, for $S = \left\{\begin{bmatrix}1\\1\\2\end{bmatrix}, \begin{bmatrix}5\\6\\11\end{bmatrix}, \begin{bmatrix}4\\-7\\-3\end{bmatrix}\right\}$, the vector $\begin{bmatrix}11\\1\\12\end{bmatrix}$ belongs to the 'linear span' of S, whereas $\begin{bmatrix}11\\1\\13\end{bmatrix}$ does NOT belong to the 'linear span' of S.
3. The idea of a unique solution leads us to the statement that the corresponding vectors are 'linearly independent'. For example, the set $\left\{\begin{bmatrix}2\\2\end{bmatrix}, \begin{bmatrix}5\\4\end{bmatrix}\right\} \subseteq \mathbb{R}^2$ is 'linearly independent', whereas the set $\left\{\begin{bmatrix}1\\1\\2\end{bmatrix}, \begin{bmatrix}5\\6\\11\end{bmatrix}, \begin{bmatrix}4\\-7\\-3\end{bmatrix}\right\} \subseteq \mathbb{R}^3$ is NOT 'linearly independent' as
$\begin{bmatrix}1\\1\\2\end{bmatrix}\cdot(-59) + \begin{bmatrix}5\\6\\11\end{bmatrix}\cdot 11 + \begin{bmatrix}4\\-7\\-3\end{bmatrix}\cdot 1 = \begin{bmatrix}0\\0\\0\end{bmatrix}.$
The horizontal arrays of a matrix are called its rows and the vertical arrays are called its
columns. A matrix A having m rows and n columns is said to be a matrix of size/ order
m × n and can be represented in either of the following forms:
$A = \begin{bmatrix}a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{m1} & a_{m2} & \cdots & a_{mn}\end{bmatrix}$ or $A = \begin{pmatrix}a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{m1} & a_{m2} & \cdots & a_{mn}\end{pmatrix}$,
where aij is the entry at the intersection of the ith row and j th column. One writes A ∈ Mm,n (F)
to mean that A is an m×n matrix with entries from the set F, or in short A = [aij ] or A = (aij ).
We write A[i, :] to denote the i-th row of A, A[:, j] to denote the j-th column of A and aij or
(A)ij or A[i, j], for the (i, j)-th entry of A.
For example, if A = $\begin{bmatrix}1 & 3+i & 7\\ 4 & 5 & 6-5i\end{bmatrix}$ then A[1, :] = [1  3 + i  7], A[:, 3] = $\begin{bmatrix}7\\ 6-5i\end{bmatrix}$ and
a22 = 5. Sometimes commas are inserted to differentiate between entries of a row vector. Thus,
A[1, :] may also be written as [1, 3 + i, 7]. A matrix having only one column is called a column
vector and a matrix with only one row is called a row vector. All our vectors will be column
vectors and will be represented by bold letters. A matrix of size 1 × 1 is also called a scalar
and is treated as such and hence we may or may not put it under brackets.
Definition 1.2.2. Two matrices A = [aij ], B = [bij ] ∈ Mm,n (C) are said to be equal if aij = bij ,
for each i = 1, 2, . . . , m and j = 1, 2, . . . , n.
In other words, two matrices are said to be equal if they have the same order and their
corresponding entries are equal.
Example 1.2.3. 1. Consider a system of linear equations 2x + 5y = 7 and 3x + 2y = 6. Then, we identify it with the matrix A = $\begin{bmatrix}2 & 5 & 7\\ 3 & 2 & 6\end{bmatrix}$. Here, A[:, 1] = $\begin{bmatrix}2\\3\end{bmatrix}$ and A[:, 2] = $\begin{bmatrix}5\\2\end{bmatrix}$ are associated with the variables/unknowns x and y, respectively.
" # " # 0 0
0 0 0 1
2. A = ,B = Then, A 6= B as a12 6= b12 . Similarly, if C = 0 0 then
0 0 0 0
0 0
A 6= C as they are of different sizes.
3. Let A ∈ Mn (F).
(a) Then, the entries a11 , a22 , . . . , ann are called the diagonal entries of A. They consti-
tute the principal diagonal of A.
(b) Then, A is said to be a diagonal matrix, denoted diag(a11, . . . , ann), if aij = 0 for i ≠ j. For example, the zero matrix 0n and $\begin{bmatrix}4 & 0\\ 0 & 1\end{bmatrix}$ are diagonal matrices.
(c) Then, A = diag(1, . . . , 1) is called the identity matrix, denoted In, or in short I. For example, I2 = $\begin{bmatrix}1 & 0\\ 0 & 1\end{bmatrix}$ and I3 = $\begin{bmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{bmatrix}$.
(d) If A = αI, for some α ∈ F, then A is called a scalar matrix.
(e) Then, A is said to be an upper triangular matrix if aij = 0 for i > j.
(f) Then, A is said to be a lower triangular matrix if aij = 0 for i < j.
(g) Then, A is said to be triangular if it is an upper or a lower triangular matrix.
For example, $\begin{bmatrix}0 & 1 & 4\\ 0 & 3 & -1\\ 0 & 0 & -2\end{bmatrix}$ is upper triangular, $\begin{bmatrix}0 & 0 & 0\\ 1 & 0 & 0\\ 0 & 1 & 1\end{bmatrix}$ is lower triangular, and the matrices 0, I are upper as well as lower triangular matrices.
5. For 1 ≤ i ≤ n, define ei = In[:, i], a matrix of order n × 1. Then the column matrices e1, . . . , en are called the standard unit vectors or the standard basis of Mn,1(C) or Cⁿ. The dependence on n is omitted as it is understood from the context. For example, if e1 ∈ C² then e1 = $\begin{bmatrix}1\\0\end{bmatrix}$, and if e1 ∈ C³ then e1 = $\begin{bmatrix}1\\0\\0\end{bmatrix}$.
1. the transpose of A, denoted AT , is an n × m matrix with (AT )ij = aji , for all i, j.
2. the conjugate transpose of A, denoted A∗ , is an n × m matrix with (A∗ )ij = aji (the
complex-conjugate of aji ), for all i, j.
" # " # " #
1 4+i 1 0 ∗ 1 0
If A = T
then A = and A = . Note that A∗ 6= AT .
0 1−i 4+i 1−i 4−i 1+i
" #
1 h i
Note that if x = is a column vector then xT = 1 2 and x∗ are row vectors.
2
Proof. Let A = [aij], A* = [bij] and (A*)* = [cij]. Clearly, the order of A and (A*)* is the same. Also, by definition $c_{ij} = \overline{b_{ji}} = \overline{\overline{a_{ij}}} = a_{ij}$ for all i, j.
1. A + B = B + A (commutativity).
2. (A + B) + C = A + (B + C) (associativity).
3. k(`A) = (k`)A.
4. (k + `)A = kA + `A.
as complex numbers commute. The other parts are left for the reader.
3. Let A ∈ Mn (C). Then there exists matrices B and C such that A = B +C, where B T = B
(Symmetric matrix) and C T = −C (skew-symmetric matrix).
Ans: Note $A = \frac{A + A^T}{2} + \frac{A - A^T}{2}$. Here, $B = \frac{A + A^T}{2}$ and $C = \frac{A - A^T}{2}$.
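The decomposition above is easy to check numerically. The following is a minimal sketch, not part of the original notes; it assumes NumPy is available and the matrix A is an arbitrary example chosen here.

```python
import numpy as np

# Any square real matrix splits as A = B + C with B symmetric and C skew-symmetric.
A = np.array([[1., 2., 0.],
              [4., 3., -1.],
              [5., 7., 2.]])

B = (A + A.T) / 2          # symmetric part
C = (A - A.T) / 2          # skew-symmetric part

assert np.allclose(B, B.T)       # B^T = B
assert np.allclose(C, -C.T)      # C^T = -C
assert np.allclose(A, B + C)     # A = B + C
```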
4. Let A = $\begin{bmatrix}1+i & -1\\ 2 & 3\\ i & 1\end{bmatrix}$ and B = $\begin{bmatrix}2 & 3 & -1\\ 1 & 1-i & 2\end{bmatrix}$. Compute A + B* and B + A*.
We now come to the most important operation between matrices, called the matrix multipli-
cation. We define it as follows.
Definition 1.3.8. Let A = [aij] ∈ Mm,n(C) and B = [bij] ∈ Mn,r(C). Then, the product of A and B, denoted AB, is the matrix C = [cij] ∈ Mm,r(C) such that for 1 ≤ i ≤ m, 1 ≤ j ≤ r,
$c_{ij} = A[i,:]\,B[:,j] = [a_{i1}, a_{i2}, \ldots, a_{in}]\begin{bmatrix}b_{1j}\\ b_{2j}\\ \vdots\\ b_{nj}\end{bmatrix} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj} = \sum_{k=1}^{n} a_{ik}b_{kj}.$
Thus, AB is defined if and only if the number of columns of A = the number of rows of
B. The way matrix product is defined seems quite complicated. Most of you have already seen
it. But, we will find other ways (3 more ways) to understand this matrix multiplication. These
will be quite useful at different stages in our study. So, we need to spend enough time on it.
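Before the worked example, here is a direct transcription of the entry-wise formula in Definition 1.3.8 into code. It is only an illustrative sketch (not part of the original notes), assuming NumPy; the matrices are those of Example 1.3.9 below, used to check the definition against the library routine.

```python
import numpy as np

def matmul_by_definition(A, B):
    """Compute C = AB using c_ij = sum_k a_ik * b_kj (the definition above)."""
    m, n = A.shape
    n2, r = B.shape
    assert n == n2, "AB is defined only if #columns of A = #rows of B"
    C = np.zeros((m, r))
    for i in range(m):
        for j in range(r):
            C[i, j] = sum(A[i, k] * B[k, j] for k in range(n))
    return C

A = np.array([[1., -1.], [2., 0.], [0., 1.]])
B = np.array([[3., 4., 5.], [-1., 0., 1.]])
assert np.allclose(matmul_by_definition(A, B), A @ B)
```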
Example 1.3.9. Let A = $\begin{bmatrix}1 & -1\\ 2 & 0\\ 0 & 1\end{bmatrix}$ and B = $\begin{bmatrix}3 & 4 & 5\\ -1 & 0 & 1\end{bmatrix}$.
2. Row Method: Note that A[1, :] is a 1 × 2 matrix and B is a 2 × 3 matrix, and hence the product A[1, :]B is defined and gives the first row of AB; in general, (AB)[i, :] = A[i, :]B.
3. Column Method: Similarly, B[:, j] is a 2 × 1 matrix and A is a 3 × 2 matrix, so A · B[:, j] is defined and gives the j-th column of AB. For example,
$A \cdot B[:, 3] = \begin{bmatrix}1 & -1\\ 2 & 0\\ 0 & 1\end{bmatrix}\begin{bmatrix}5\\ 1\end{bmatrix} = \begin{bmatrix}1\cdot 5 + (-1)\cdot 1\\ 2\cdot 5 + 0\cdot 1\\ 0\cdot 5 + 1\cdot 1\end{bmatrix} = \begin{bmatrix}4\\ 10\\ 1\end{bmatrix}.$
Thus, if B = [B[:, 1]  B[:, 2]  B[:, 3]] then
$AB = A\,[B[:, 1]\;\; B[:, 2]\;\; B[:, 3]] = [A\cdot B[:, 1]\;\; A\cdot B[:, 2]\;\; A\cdot B[:, 3]] = \begin{bmatrix}4 & 4 & 4\\ 6 & 8 & 10\\ -1 & 0 & 1\end{bmatrix}.$
" #
h i B[1, :]
4. Matrix Method: We also have if A = A[:, 1] A[:, 2] and B = then A[:, 1]
B[2, :]
is a 3 × 1 matrix and B[1, :] is a 1 × 3 matrix. Thus, the matrix product A[:, 1] B[1, :] is
defined and is a 3 × 3 matrix. Hence,
1
h i −1h i
A[:, 1]B[1, :] + A[:, 2]B[2, :] =
2 3
4 5 + 0 −1 0 1
0 1
3 4 5 1 0 −1 4 4 4
=6 8 10+ 0 0 0 = 6
.
8 10
0 0 0 −1 0 1 −1 0 1
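The alternative views in the example above can also be checked numerically. The sketch below is not from the original notes; it assumes NumPy and uses the matrices of Example 1.3.9 to verify that the row, column and outer-product computations all produce the same AB.

```python
import numpy as np

A = np.array([[1., -1.], [2., 0.], [0., 1.]])
B = np.array([[3., 4., 5.], [-1., 0., 1.]])
AB = A @ B

# Row method: the i-th row of AB is A[i, :] B.
rows = np.vstack([A[i, :] @ B for i in range(A.shape[0])])

# Column method: the j-th column of AB is A B[:, j].
cols = np.column_stack([A @ B[:, j] for j in range(B.shape[1])])

# Matrix (outer-product) method: AB = sum_k A[:, k] B[k, :].
outer = sum(np.outer(A[:, k], B[k, :]) for k in range(A.shape[1]))

assert np.allclose(rows, AB) and np.allclose(cols, AB) and np.allclose(outer, AB)
print(AB)   # [[ 4.  4.  4.] [ 6.  8. 10.] [-1.  0.  1.]]
```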
Remark 1.3.10. Let A ∈ Mm,n(C) and B ∈ Mn,p(C). Then the product AB is defined and observe the following:
4. Write A = [A[:, 1] ··· A[:, n]] and B = $\begin{bmatrix}B[1, :]\\ \vdots\\ B[n, :]\end{bmatrix}$. Then AB = A[:, 1]B[1, :] + ··· + A[:, n]B[n, :].
(A + B)² = A² + AB + BA + B² ≠ A² + B² + 2AB.
" # " #
2 1 3 3
Whereas if C = then BC = CB = = 3A 6= A = CA. Note that cancella-
1 2 3 3
tion laws don’t hold.
Definition 1.3.11. Two square matrices A and B are said to commute if AB = BA.
Theorem 1.3.12. Let A ∈ Mm,n (C), B ∈ Mn,p (C) and C ∈ Mp,q (C).
$\big(A(BC)\big)_{ij} = \sum_{k=1}^{n} a_{ik}(BC)_{kj} = \sum_{k=1}^{n} a_{ik}\Big(\sum_{\ell=1}^{p} b_{k\ell}c_{\ell j}\Big) = \sum_{k=1}^{n}\sum_{\ell=1}^{p} a_{ik}b_{k\ell}c_{\ell j} = \sum_{\ell=1}^{p}\Big(\sum_{k=1}^{n} a_{ik}b_{k\ell}\Big)c_{\ell j} = \sum_{\ell=1}^{p}(AB)_{i\ell}c_{\ell j} = \big((AB)C\big)_{ij}.$
Using a similar argument, the next part follows. The other parts are left for the reader.
Exercise 1.3.13. 1. Let A ∈ Mn (C) and e1 , . . . , en ∈ Mn,1 (C) (see Definition 5). Then
(d) If A[i, :] = A[j, :] for some i and j then (AB)[i, :] = (AB)[j, :].
Ans: By definition (AB)[i, :] = A[i, :]B = A[j, :]B = (AB)[j, :].
(e) If B[:, i] = B[:, j] for some i and j then (AB)[:, i] = (AB)[:, j].
Ans: By definition (AB)[:, i] = AB[:, i] = AB[:, j] = (AB)[:, j].
Let A = $\begin{bmatrix}\alpha\\ \beta\end{bmatrix}\begin{bmatrix}a & b\end{bmatrix}$. Then
$A^2 = \begin{bmatrix}\alpha\\ \beta\end{bmatrix}\begin{bmatrix}a & b\end{bmatrix}\begin{bmatrix}\alpha\\ \beta\end{bmatrix}\begin{bmatrix}a & b\end{bmatrix} = \begin{bmatrix}\alpha\\ \beta\end{bmatrix}\Big(\begin{bmatrix}a & b\end{bmatrix}\begin{bmatrix}\alpha\\ \beta\end{bmatrix}\Big)\begin{bmatrix}a & b\end{bmatrix} = \Big(\begin{bmatrix}a & b\end{bmatrix}\begin{bmatrix}\alpha\\ \beta\end{bmatrix}\Big)A.$
Thus, A² = tA, for some scalar t. For example, if we choose a, b so that $\begin{bmatrix}a & b\end{bmatrix}\begin{bmatrix}\alpha\\ \beta\end{bmatrix} = 1$, then A² = A. So, we have an infinite number of choices for a and b depending on α and β. The same idea can be used for any n × n matrix.
" # 0 1 1
0 1 n n
(e) Let A = and B = 0 0 1. Guess a formula for A and B and prove it?
0 0
0 0 0
Ans: An = 0 for n ≥ 2 and B n = 0 for n ≥ 3.
" # 1 1 1 1 1 1
1 1
and C = 1 1 1. Is it true that A2 −2A+I = 0?
(f ) Let A = ,B= 0 1 1
0 1
0 0 1 1 1 1
What is B 3 − 3B 2 + 3B − I? Is C 2 = 3C?
Ans: Yes, all the three statements are TRUE.
5. Let A ∈ Mm,n (C). If Ax = 0 for all x ∈ Mn,1 (C) then A = 0, the zero matrix.
Ans: Take x = ei . Then 0 = Ax = Aei = A[:, i]. Hence the i-th column of A is the zero
vector. Thus, as we vary i in {1, 2, . . . , n}, we see that all the columns of A are zero.
6. Let A, B ∈ Mm,n (C). If Ax = Bx, for all x ∈ Mn,1 (C) then prove that A = B.
Ans: Take C = A − B. Now use (5) above to show that C = 0 and conclude that A = B.
7. Let x = $\begin{bmatrix}x_1\\ \vdots\\ x_n\end{bmatrix}$, y = $\begin{bmatrix}y_1\\ \vdots\\ y_n\end{bmatrix}$ ∈ Mn,1(C). Then $y^*x = \sum_{i=1}^{n} \overline{y_i}x_i$, $x^*x = \sum_{i=1}^{n} |x_i|^2$,
$xy^* = \begin{bmatrix}x_1\overline{y_1} & x_1\overline{y_2} & \cdots & x_1\overline{y_n}\\ \vdots & \vdots & & \vdots\\ x_n\overline{y_1} & x_n\overline{y_2} & \cdots & x_n\overline{y_n}\end{bmatrix}$ and $xx^* = \begin{bmatrix}|x_1|^2 & x_1\overline{x_2} & \cdots & x_1\overline{x_n}\\ x_2\overline{x_1} & |x_2|^2 & \cdots & x_2\overline{x_n}\\ \vdots & \vdots & & \vdots\\ x_n\overline{x_1} & x_n\overline{x_2} & \cdots & |x_n|^2\end{bmatrix}$.
|a1n |2 = |a11 |2 . Hence, a12 = 0, . . . , a1n = 0. Now, use (A∗ A)22 = (AA∗ )22 to conclude
a23 = 0, . . . , a2n = 0 and so on.
Note that (A − aI3 )[:, 1] = 0. So, if A[:, 1] = 0 then B[1, :] doesn’t play any role in AB.
? ?
Ans: Note (A − aI)[:, 1] = 0, (A − bI)[:, 1] = 0, (A − bI)[:, 2] = 0. Hence
0 0
0 0 ? 0 0 ? ? ? ?
(A−aI)(A−bI) =
0 0 ?. Thus, (A−aI)(A−bI)(A−cI) = 0 0 ?0 ? ? = 0.
0 0 ? 0 0 ? 0 0 0
11. Find A, B, C ∈ M2(C) such that AB = AC but B ≠ C (cancellation laws don't hold).
Ans: Let A = $\begin{bmatrix}1 & -1\\ 1 & -1\end{bmatrix}$, B = $\begin{bmatrix}1 & 1\\ 1 & 1\end{bmatrix}$ and C = $\begin{bmatrix}2 & -3\\ 2 & -3\end{bmatrix}$. Then AB = 0 = AC.
12. Let A = $\begin{bmatrix}0 & 1 & 0\\ 0 & 0 & 1\\ 1 & 0 & 0\end{bmatrix}$. Compute A² and A³. Is A³ = I? Determine aA³ + bA + cA².
Ans: A² = $\begin{bmatrix}0 & 0 & 1\\ 1 & 0 & 0\\ 0 & 1 & 0\end{bmatrix}$ and A³ = I. So, aA³ + bA + cA² = $\begin{bmatrix}a & b & c\\ c & a & b\\ b & c & a\end{bmatrix}$. Such matrices are called circulant matrices.
Lemma 1.3.15. Let A ∈ Mn (C). If there exist B, C ∈ Mn (C) such that AB = In and CA = In
then B = C, i.e., If A has a left inverse and a right inverse then they are equal.
Remark 1.3.16. Lemma 1.3.15 implies that whenever A is invertible, the inverse is unique.
Thus, we denote the inverse of A by A−1 . That is, AA−1 = A−1 A = I.
" #
a b
T
c d
DR
" #
1 d −b
(a) If ad − bc 6= 0. Then, verify that A−1 = ad−bc .
−c a
" # " #
2 3 1 7 −3
(b) In particular, the inverse of equals 2 .
4 7 −4 2
(c) If ad − bc = 0 then prove that either A[1, :] = 0∗ or A[:, 1] = 0 or A[2, :] = αA[1, :] or
A[:, 2] = αA[:, 1] for some α ∈ C. Hence, prove that A is not invertible.
" # " # " #
1 2 1 0 4 2
(d) Matrices , and do not have inverses. Justify your answer.
0 0 4 0 6 3
2. Let A = $\begin{bmatrix}1 & 2 & 3\\ 2 & 3 & 4\\ 3 & 4 & 6\end{bmatrix}$. Then A⁻¹ = $\begin{bmatrix}-2 & 0 & 1\\ 0 & 3 & -2\\ 1 & -2 & 1\end{bmatrix}$ (verify AA⁻¹ = A⁻¹A = I3).
3. Prove that the matrices A = $\begin{bmatrix}1 & 1 & 1\\ 1 & 1 & 1\\ 1 & 1 & 1\end{bmatrix}$ and B = $\begin{bmatrix}1 & 1 & 2\\ 1 & 0 & 1\\ 0 & 1 & 1\end{bmatrix}$ are not invertible.
Solution: Suppose there exists C such that CA = AC = I. Then, using the matrix product, A[1, :]C = (AC)[1, :] = I[1, :] = [1, 0, 0] and A[2, :]C = (AC)[2, :] = I[2, :] = [0, 1, 0]. But A[1, :] = A[2, :], so [1, 0, 0] = [0, 1, 0], a contradiction. Similarly, suppose there exists D such that DB = BD = I. Then
DB[:, 1] = (DB)[:, 1] = I[:, 1], DB[:, 2] = (DB)[:, 2] = I[:, 2] and DB[:, 3] = I[:, 3].
But B[:, 3] = B[:, 1] + B[:, 2] and hence I[:, 3] = I[:, 1] + I[:, 2], a contradiction.
1. (A−1 )−1 = A.
2. (AB)−1 = B −1 A−1 .
" # " #
cos(θ) sin(θ) cos(θ) − sin(θ)
3. Find the inverse of and .
sin(θ) − cos(θ) sin(θ) cos(θ)
" # " #
cos(θ) sin(θ) cos(θ) − sin(θ)
Ans: If A = then A−1 = A and if B = then
sin(θ) − cos(θ) sin(θ) cos(θ)
" #
−1 cos(θ) sin(θ)
B = .
− sin(θ) cos(θ)
" #
1 2
5. Determine A that satisfies (I + 3A)−1 = .
2 1
" # " #!−1 " #
−1 4 −2 −1 1 2 −1 1 −2
as (I +3A) = (I + 3A)−1
Ans: A = = = .
T
9 −2 4 2 1 3 −2 1
AF
DR
6. Determine A that satisfies (I − A)⁻¹ = $\begin{bmatrix}-2 & 0 & 1\\ 0 & 3 & -2\\ 1 & -2 & 1\end{bmatrix}$. [See Example 1.3.17.2.]
Ans: Example 1.3.17.2 gives I − A = $\begin{bmatrix}1 & 2 & 3\\ 2 & 3 & 4\\ 3 & 4 & 6\end{bmatrix}$ ⇒ A = I − $\begin{bmatrix}1 & 2 & 3\\ 2 & 3 & 4\\ 3 & 4 & 6\end{bmatrix}$ = $\begin{bmatrix}0 & -2 & -3\\ -2 & -2 & -4\\ -3 & -4 & -5\end{bmatrix}$.
7. Let A be an invertible matrix satisfying A³ + A − 2I = 0. Then A⁻¹ = $\frac{1}{2}\big(A^2 + I\big)$.
Ans: As A is invertible, multiplying by A−1 gives A2 + I − 2A−1 = 0. Hence, the result.
8. Let A = [aij ] be an invertible matrix and B = [pi−j aij ], for some p ∈ C, p 6= 0. Then
B −1 = [pi−j (A−1 )ij ].
Ans: Note that B = DAD−1 , where D = diag(p, p2 , . . . , pn ) is a diagonal matrix. As
p ≠ 0, D is invertible. Hence B is invertible and B⁻¹ = (DAD⁻¹)⁻¹ = DA⁻¹D⁻¹.
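To complement the exercises above, here is a small numerical check of the 2 × 2 inverse formula and of the non-invertible matrices in part 1(d). It is an illustrative sketch only, assuming NumPy; it is not part of the original notes.

```python
import numpy as np

def inv2x2(A):
    """Inverse of a 2x2 matrix via the formula A^{-1} = (1/(ad-bc)) [[d,-b],[-c,a]]."""
    a, b = A[0]
    c, d = A[1]
    det = a * d - b * c
    if det == 0:
        raise ValueError("ad - bc = 0: the matrix is not invertible")
    return (1.0 / det) * np.array([[d, -b], [-c, a]])

A = np.array([[2., 3.], [4., 7.]])
assert np.allclose(inv2x2(A) @ A, np.eye(2))
assert np.allclose(inv2x2(A), np.linalg.inv(A))

for M in ([[1., 2.], [0., 0.]], [[1., 0.], [4., 0.]], [[4., 2.], [6., 3.]]):
    try:
        inv2x2(np.array(M))
    except ValueError as err:
        print(M, "->", err)      # each of these has ad - bc = 0
```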
Then, the matrices ek` for 1 ≤ k ≤ m and 1 ≤ ` ≤ n are called the standard basis
elements for Mm,n (C).
" " # # " # " #
1 0 0
1 h i 0 1 0 1 h i
So, if ek` ∈ M2,3 (C) then e11 = = ,
1 0 0 12 e = = 0 1 0
0 0 0 0 0 0 0 0
" # " #
0 0 0 0 h i
and e22 = = 0 1 0 .
0 1 0 1
In particular, if eij ∈ Mn (C) then eij = ei eTj = ei e∗j , for 1 ≤ i, j ≤ n.
For example, the matrices obtained by writing the rows of I3 in some other order are permutation matrices.
6. An idempotent matrix which is also Hermitian is called a projection matrix. For example,
if u ∈ Mn,1 (C) is a unit vector then A = uu∗ is a Hermitian, idempotent matrix. Thus A
is a projection matrix.
In particular, if u ∈ Mn,1(R) is a unit vector and A = uu^T, then verify that u^T(x − Ax) = u^T x − u^T Ax = u^T x − u^T(uu^T)x = 0 (as u^T u = 1), for any x ∈ Rⁿ. Thus, with respect to the dot product, Ax is the foot of the perpendicular from the point x onto the vector u. In particular, if u = $\frac{1}{\sqrt{6}}[1, 2, -1]^T$ and A = uu^T then, for any vector x = [x1, x2, x3]^T ∈ M3,1(R),
$Ax = (uu^T)x = u(u^Tx) = \frac{x_1 + 2x_2 - x_3}{\sqrt{6}}\,u = \frac{x_1 + 2x_2 - x_3}{6}\,[1, 2, -1]^T.$
7. Fix a unit vector u ∈ Mn,1 (R) and let A = 2uuT − In . Then, verify that A ∈ Mn (R) and
Ay = 2(uT y)u − y, for all y ∈ Rn . This matrix is called the reflection matrix about the
line, say ℓ, containing the points 0 and u. This matrix fixes each point on the line ℓ and sends any vector v that is orthogonal to u to −v.
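The projection and reflection matrices of items 6 and 7 are easy to experiment with. The sketch below is not part of the original notes; it assumes NumPy and uses the specific unit vector from the example above, with x and v chosen arbitrarily (v satisfies u · v = 0).

```python
import numpy as np

u = np.array([1., 2., -1.]) / np.sqrt(6)    # the unit vector from the example above
P = np.outer(u, u)                           # projection matrix u u^T
R = 2 * np.outer(u, u) - np.eye(3)           # reflection matrix 2 u u^T - I

x = np.array([3., 1., 2.])
Px = P @ x
assert np.isclose(u @ (x - Px), 0)           # x - Px is orthogonal to u

assert np.allclose(P @ P, P) and np.allclose(P, P.T)   # P is idempotent and symmetric
assert np.allclose(R @ R.T, np.eye(3))                  # R is orthogonal

v = np.array([2., -1., 0.])                  # a vector orthogonal to u
assert np.allclose(R @ u, u) and np.allclose(R @ v, -v)  # R fixes u, negates v
```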
Exercise 1.4.2. 1. Consider the matrices eij ∈ Mn(C) for 1 ≤ i, j ≤ n. Is e12e11 = e11e12? What about e12e22 and e22e12?
Ans: Note e11 = e1e1^T and e12 = e1e2^T. Thus e12e11 = (e1e2^T)(e1e1^T) = e1(e2^Te1)e1^T = 0 as e2^Te1 = 0. Whereas e11e12 = (e1e1^T)(e1e2^T) = e1(e1^Te1)e2^T = e1e2^T = e12.
2. Let {u1 , u2 , u3 } be three vectors in R3 such that u∗i ui = 1, for 1 ≤ i ≤ 3, and u∗i uj = 0
whenever i 6= j. Prove the following.
4. Prove that in M5(R) there are infinitely many orthogonal matrices, of which only finitely many are diagonal (in fact, their number is just 32).
6. Let A, B ∈ Mn (C) be two unitary matrices. Then both AB and BA are unitary matrices.
8. Let A ∈ Mn (C). If x∗ Ax ∈ R for every x ∈ Mn,1 (C) then A is a Hermitian matrix. [Hint:
Use ej , ej + ek and ej + iek of Mn,1 (C) for x.]
Ans: Taking x = ei gives aii = ei*Aei = x*Ax ∈ R. So, aii ∈ R.
Taking x = ei + iej gives x*Ax = aii − iaji + iaij + ajj, a real number. As aii, ajj ∈ R, aij − aji is purely imaginary, i.e., aij and aji have the same real part. Similarly, taking x = ei + ej gives aij + aji ∈ R, i.e., they have opposite imaginary parts. So aij = $\overline{a_{ji}}$, i.e., A is Hermitian.
11. Let A ∈ Mn(C). Then A = S1 + S2, where S1 = ½(A + A*) is Hermitian and S2 = ½(A − A*) is skew-Hermitian.
1. Then, a matrix obtained by deleting some of the rows and/or columns of A is said to be
a submatrix of A.
2. If S ⊆ [m] and T ⊆ [n] then by A(S|T) , we denote the submatrix obtained from A by
deleting the rows with indices in S and columns with indices in T . By A[S, T ], we mean
A(S c |T c ), where S c = [m] \ S and T c = [n] \ T . Whenever, S or T consist of a single
element, then we just write the element. If S = [m], then A[S, T ] = A[:, T ] and if T = [n]
then A[S, T ] = A[S, :] which matches with our notation in Definition 1.2.1.
For example, let A = $\begin{bmatrix}1 & 4 & 5\\ 0 & 1 & 2\end{bmatrix}$. Then A[1, 1] = [1], A[2, 3] = [2], A[{1, 2}, 1] = A[:, 1] = $\begin{bmatrix}1\\ 0\end{bmatrix}$, A[1, {1, 3}] = [1  5] and A itself are a few sub-matrices of A. But the matrices $\begin{bmatrix}1 & 4\\ 1 & 0\end{bmatrix}$ and $\begin{bmatrix}1 & 4\\ 0 & 2\end{bmatrix}$ are not sub-matrices of A.
2. Let A = $\begin{bmatrix}1 & 2 & 3\\ 5 & 6 & 7\\ 9 & 8 & 7\end{bmatrix}$, S = {1, 3} and T = {2, 3}. Then, A[S, S] = $\begin{bmatrix}1 & 3\\ 9 & 7\end{bmatrix}$, A(S | S) = [6], A[T, T] = $\begin{bmatrix}6 & 7\\ 8 & 7\end{bmatrix}$ and A(T | T) = [1] are principal sub-matrices of A.
Let A ∈ Mn,m(C) and B ∈ Mm,p(C). Then the product AB is defined. Suppose r < m. Then A and B can be decomposed as A = [P  Q] and B = $\begin{bmatrix}H\\ K\end{bmatrix}$, where P ∈ Mn,r(C) and H ∈ Mr,p(C), so that AB = PH + QK. This is proved next.
AB = P H + QK.
Proof. Verify that the matrix products P H and QK are valid. Further, their sum is defined
as P H, QK ∈ Mn,p (C). Now, let P = [Pij ], Q = [Qij ], H = [Hij ], and K = [Kij ]. Then, for
1 ≤ i ≤ n and 1 ≤ j ≤ p, we have
$(AB)_{ij} = \sum_{k=1}^{m} a_{ik}b_{kj} = \sum_{k=1}^{r} a_{ik}b_{kj} + \sum_{k=r+1}^{m} a_{ik}b_{kj} = \sum_{k=1}^{r} P_{ik}H_{kj} + \sum_{k=r+1}^{m} Q_{ik}K_{kj} = (PH)_{ij} + (QK)_{ij} = (PH + QK)_{ij}.$
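The block identity AB = PH + QK is easy to confirm on random data. The following sketch is not from the original notes; it assumes NumPy, and the sizes n, m, p, r are arbitrary values chosen only for the check.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, p, r = 4, 5, 3, 2              # arbitrary block sizes with r < m
A = rng.standard_normal((n, m))
B = rng.standard_normal((m, p))

P, Q = A[:, :r], A[:, r:]            # A = [P  Q]
H, K = B[:r, :], B[r:, :]            # B = [H; K]

assert np.allclose(A @ B, P @ H + Q @ K)   # AB = PH + QK
```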
Remark 1.5.4. Theorem 1.5.3 is very useful due to the following reasons:
1. The matrices P, Q, H and K can be further partitioned so as to form blocks that are either
identity or zero or have certain nice properties. So, such partitions are useful during
different matrix operations. Examples of such partitions appear throughout the notes.
2. Suppose one wants to prove a result for a square matrix A. If we want to prove it using
induction then we can prove it for the 1 × 1 matrix (the initial step of induction). Then
assume the result to hold for all k × k sub-matrices of A or just the first k × k principal sub-matrix of A. At the next step, write A = $\begin{bmatrix}B & x\\ x^T & a\end{bmatrix}$, where B is a k × k matrix. Then
the result holds for B and then one can proceed to prove it for A.
A[:, i]e_j^T = [0, ···, 0, A[:, i], 0, ···, 0] is the matrix whose j-th column is A[:, i] and whose other columns are zero, and e_i A[j, :] is the matrix whose i-th row is A[j, :] and whose other rows are zero.
4. For An×n = [aij ], the trace of A, denoted tr(A), is defined by tr(A) = a11 + a22 + · · · + ann .
" # " #
3 2 4 −3
(a) Compute tr(A) for A = and A = .
2 2 −5 1
Ans: 3 + 2 = 5 and 4 + 1 = 5.
" # " # " # " #
1 1 1 1
(b) Let A be a matrix with A =2 and A =3 . Determine tr(A)?
2 2 −2 −2
" #
a b
Ans: Let A = . Then, the given conditions imply a+2b = 2, c+2d = 4, a−2b =
c d
5 5
3 and c − 2d = −6. Thus tr(A) = a + d = + = 5.
2 2
(c) Let A and B be two square matrices of the same order. Then
i. tr(A + B) = tr(A) + tr(B).
Ans: tr(A + B) = $\sum_{i=1}^{n}(A+B)_{ii} = \sum_{i=1}^{n}(A)_{ii} + \sum_{i=1}^{n}(B)_{ii}$ = tr(A) + tr(B).
ii. tr(AB) = tr(BA).
Ans: tr(AB) = $\sum_{i=1}^{n}(AB)_{ii} = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}b_{ji} = \sum_{j=1}^{n}\sum_{i=1}^{n} b_{ji}a_{ij} = \sum_{j=1}^{n}(BA)_{jj}$ = tr(BA).
(d) Do there exist matrices A, B ∈ Mn(C) such that AB − BA = cI, for some c ≠ 0?
Ans: No. Note that tr(AB − BA) = 0, whereas, for c ≠ 0, tr(cI) = nc ≠ 0.
(a) Verify that J = 11T , where 1 is a column vector having all entries 1.
(b) Verify that J 2 = nJ.
(c) Also, for any α1 , α2 , β1 , β2 ∈ R, verify that there exist α3 , β3 ∈ R such that
(α1 In + β1 J) · (α2 In + β2 J) = α3 In + β3 J.
(d) Let α, β ∈ R such that α 6= 0 and α + nβ 6= 0. Now, define A = αIn + βJ. Then,
use the above to prove that A is invertible.
Ans: J 2 = (11T )(11T ) = 1(1T 1)1T = n11T = nJ.
Note that in part (5c), α3 = α1α2 and β3 = α1β2 + α2β1 + nβ1β2. So, using the third part, $A^{-1} = \frac{1}{\alpha}\Big(I_n - \frac{\beta}{\alpha + n\beta}J\Big)$.
" #
1 2 3
6. Let A = .
2 1 1
(a) Write A = $\begin{bmatrix}A_{11} & x\\ y^* & c\end{bmatrix}$ with A11 square and invertible. If p = c − y*A11⁻¹x is non-zero, then verify that
$A^{-1} = \begin{bmatrix}A_{11}^{-1} & 0\\ 0 & 0\end{bmatrix} + \frac{1}{p}\begin{bmatrix}A_{11}^{-1}x\\ -1\end{bmatrix}\begin{bmatrix}y^*A_{11}^{-1} & -1\end{bmatrix}.$
10. Suppose the matrices B and C are invertible and the involved partitioned products are defined. Then verify that
$\begin{bmatrix}A & B\\ C & 0\end{bmatrix}^{-1} = \begin{bmatrix}0 & C^{-1}\\ B^{-1} & -B^{-1}AC^{-1}\end{bmatrix}.$
11. Let A ∈ Mm,n(C). Then, a matrix G ∈ Mn,m(C) is called a generalized inverse (for short, g-inverse) of A if AGA = A. For example, a generalized inverse of the matrix A = [1, 2] is G = $\begin{bmatrix}1 - 2\alpha\\ \alpha\end{bmatrix}$, for all α ∈ R. A generalized inverse G is called a pseudo inverse or a Moore-Penrose inverse if GAG = G and the matrices AG and GA are symmetric. Check that for α = 2/5 the matrix G is a pseudo inverse of A. Further, among all the g-inverses, the one with the least Euclidean norm also has α = 2/5.
1.6 Summary
In this chapter, we started with the definition of a matrix and came across lots of examples.
We recall these examples as they will be used in later chapters to relate different ideas:
3. Triangular matrices.
4. Hermitian/Symmetric matrices.
5. Skew-Hermitian/skew-symmetric matrices.
6. Unitary/Orthogonal matrices.
7. Idempotent matrices.
8. Nilpotent matrices.
We also learnt the product of two matrices. Even though it seemed complicated, it basically tells us that multiplying by a matrix on the left acts on the rows (and multiplying on the right acts on the columns). Matrix multiplication is not commutative. We also defined the inverse of a matrix. Further, there were exercises that inform us that the rows and columns of invertible matrices cannot have certain properties.
Chapter 2
System of Linear Equations
2.1 Introduction
We start this section with our understanding of the system of linear equations.
2. Recall that the linear system ax + by = c for (a, b) ≠ (0, 0), in the variables x and y, represents a line in R². So, let us consider the points of intersection of the two lines
a1x + b1y = c1,  a2x + b2y = c2,  (2.1.1)
where a1, a2, b1, b2, c1, c2 ∈ R with (a1, b1), (a2, b2) ≠ (0, 0) (see Figure 2.1 for an illustration of the different cases).
[Figure 2.1: the lines ℓ1 and ℓ2 — no solution (pair of parallel lines), infinite number of solutions (coincident lines), unique solution (intersecting lines, P the point of intersection).]
(a) Unique Solution (a1b2 − a2b1 ≠ 0): The linear system x − y = 3 and 2x + 3y = 11 has $\begin{bmatrix}x\\ y\end{bmatrix} = \begin{bmatrix}4\\ 1\end{bmatrix}$ as the unique solution.
" # 2.1.2. Observe the following of the linear system in Example 2.1.1.2a.
Example
4
1. corresponds to the point of intersection of the corresponding two lines.
1
T
" #
1 −1
2. Using matrix multiplication, the given system equals Ax = b, where A = ,
AF
2 3
DR
Thus, there are three ways of looking at the linear system Ax = b, where, as the name
suggests, one of the ways is looking at the point of intersection of planes, the other is the vector
sum approach and the third is the matrix multiplication approach. We will see that all the
three approaches are fundamental to the understanding of linear algebra.
where for 1 ≤ i ≤ m and 1 ≤ j ≤ n; aij , bi ∈ R. The linear system (2.1.2) is called homoge-
neous if b1 = 0 = b2 = · · · = bm and non-homogeneous, otherwise.
Definition 2.1.4. Let A = $\begin{bmatrix}a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{m1} & a_{m2} & \cdots & a_{mn}\end{bmatrix}$, x = $\begin{bmatrix}x_1\\ \vdots\\ x_n\end{bmatrix}$ and b = $\begin{bmatrix}b_1\\ \vdots\\ b_m\end{bmatrix}$. Then, Equation (2.1.2) can be re-written as Ax = b, where A is called the coefficient matrix and the block matrix [A b] is called the augmented matrix.
For example, Ax = b, with A = $\begin{bmatrix}1 & 1 & 1\\ 1 & 4 & 2\\ 4 & 1 & 1\end{bmatrix}$ and b = $\begin{bmatrix}1\\ 0\\ 1\end{bmatrix}$, has $\left\{\begin{bmatrix}0\\ -1\\ 2\end{bmatrix}\right\}$ as the solution set. Similarly, A = $\begin{bmatrix}1 & 1\\ 1 & 2\end{bmatrix}$ and b = $\begin{bmatrix}2\\ 3\end{bmatrix}$ has $\left\{\begin{bmatrix}1\\ 1\end{bmatrix}\right\}$ as the solution set. Further, they are consistent systems. Whereas, the system x + y = 2, 2x + 2y = 3 is inconsistent (has no solution).
Definition 2.1.6. For the linear system Ax = b the corresponding linear homogeneous system
Ax = 0 is called the associated homogeneous system.
The readers are advised to supply the proof of the next remark.
Remark 2.1.7. Consider the linear system Ax = b with two distinct solutions, say u and v.
2. Thus, any two distinct solutions of Ax = b differ by a solution of the associated homoge-
neous system Ax = 0, i.e., {x0 + xh } is the solution set of Ax = b with x0 as a particular
solution and xh , a solution of the associated homogeneous system Ax = 0.
Ans: Since there are two intersecting (system is consistent) planes in R3 they will intersect
in a line. So, infinite number of solutions.
2. Give a linear system of 3 equations in 2 variables such that the system is inconsistent
whereas it has 2 equations which form a consistent system.
Ans: x + y = 2, x + 2y = 3, 2x + 3y = 4.
3. Give a linear system of 4 equations in 3 variables such that the system is inconsistent
whereas it has three equations which form a consistent system.
Ans: x + y + z = 3, x + 2y + 3z = 6, 2x + 3y + 4z = 4, 2x + 2y + z = 5.
(a) Can the system, Ax = b have exactly two distinct solutions for any choice of m and
n? Give reasons for your answer.
(b) Can the system Ax = b have only a finitely many (greater than 1) solutions for any
choice of m and n? Give reasons for your answer.
Ans: No. Let x1 , x2 be two solutions. Define z = ax1 + (1 − a)x2 for a ∈ R. Then
Az = aAx1 + (1 − a)Ax2 = ab + (1 − a)b = b.
To proceed with the understanding of the solution set of a system of linear equations, we start
with the definition of a pivot.
Definition 2.2.1. Let A be a non-zero matrix. Then, in each non-zero row of A, the left most
non-zero entry is called a pivot/leading entry. The column containing the pivot is called a
pivotal column.
If aij is a pivot then we denote it by boxing it, writing $\boxed{a_{ij}}$. For example, the entries a12 and a23 are pivots in A = $\begin{bmatrix}0 & \boxed{3} & 4 & 2\\ 0 & 0 & \boxed{2} & 1\\ 0 & 0 & 0 & 0\end{bmatrix}$. Thus, columns 2 and 3 are pivotal columns.
Definition 2.2.2. A matrix is in row echelon form (REF) (staircase/ ladder like)
2. if the pivot of the (i + 1)-th row, if it exists, comes to the right of the pivot of the i-th
row.
For example, the matrices $\begin{bmatrix}0 & 0 & 1 & 1\\ 0 & 0 & 0 & 0\end{bmatrix}$ and $\begin{bmatrix}0 & 0 & 0 & 1 & 4\\ 0 & 0 & 0 & 0 & 1\end{bmatrix}$ are in REF.
We now start with solving two systems of linear equations. The idea is to manipulate the
rows of the augmented matrix in place of the linear equations themselves. Since multiplying a matrix on the left corresponds to row operations, we left multiply the augmented matrix by certain matrices so that the final matrix is in row echelon form (REF). The process of
obtaining the REF of a matrix is called the Gauss Elimination method. The readers should
carefully look at the matrices being multiplied on the left in the examples given below.
Example 2.2.4. 1. Solve the linear system y + z = 2, 2x + 3z = 5, x + y + z = 3.
Solution: Let B0 = [A b] = $\begin{bmatrix}0 & 1 & 1 & 2\\ 2 & 0 & 3 & 5\\ 1 & 1 & 1 & 3\end{bmatrix}$ be the augmented matrix. Then
(a) Interchange the 1-st and 2-nd equations (interchange B0[1, :] and B0[2, :] to get B1).
2x + 3z = 5,  y + z = 2,  x + y + z = 3;   B1 = $\begin{bmatrix}0 & 1 & 0\\ 1 & 0 & 0\\ 0 & 0 & 1\end{bmatrix}$B0 = $\begin{bmatrix}2 & 0 & 3 & 5\\ 0 & 1 & 1 & 2\\ 1 & 1 & 1 & 3\end{bmatrix}$.
(b) In the new system, replace the 3-rd equation by the 3-rd equation minus ½ times the 1-st equation (replace B1[3, :] by B1[3, :] − ½B1[1, :] to get B2).
2x + 3z = 5,  y + z = 2,  y − ½z = ½;   B2 = $\begin{bmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ -1/2 & 0 & 1\end{bmatrix}$B1 = $\begin{bmatrix}2 & 0 & 3 & 5\\ 0 & 1 & 1 & 2\\ 0 & 1 & -1/2 & 1/2\end{bmatrix}$.
(c) In the new system, replace the 3-rd equation by the 3-rd equation minus the 2-nd equation (replace B2[3, :] by B2[3, :] − B2[2, :] to get B3).
2x + 3z = 5,  y + z = 2,  −(3/2)z = −(3/2);   B3 = $\begin{bmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ 0 & -1 & 1\end{bmatrix}$B2 = $\begin{bmatrix}2 & 0 & 3 & 5\\ 0 & 1 & 1 & 2\\ 0 & 0 & -3/2 & -3/2\end{bmatrix}$.
Observe that the matrix B3 is in REF. Using the last row of B3 , we get z = 1. Using
this and the second row of B3 gives y = 1. Finally, the first row gives x = 1. Hence,
the solution set of Ax = b is {[x, y, z]T | [x, y, z] = [1, 1, 1]}, a unique solution. The
method of finding the values of the unknowns y and x, using the 2-nd and 1-st row of B3
and the value of z is called back substitution.
2. Solve the linear system x + y + z = 4, 2x + 3z = 5, y + z = 3.
Solution: Let B0 = [A b] = $\begin{bmatrix}1 & 1 & 1 & 4\\ 2 & 0 & 3 & 5\\ 0 & 1 & 1 & 3\end{bmatrix}$ be the augmented matrix. Then
(a) The given system (corresponding to the augmented matrix B0):
x + y + z = 4,  2x + 3z = 5,  y + z = 3;   B0 = $\begin{bmatrix}1 & 1 & 1 & 4\\ 2 & 0 & 3 & 5\\ 0 & 1 & 1 & 3\end{bmatrix}$.
(b) In the new system, replace the 2-nd equation by the 2-nd equation minus 2 times the 1-st equation (replace B0[2, :] by B0[2, :] − 2·B0[1, :] to get B1).
x + y + z = 4,  −2y + z = −3,  y + z = 3;   B1 = $\begin{bmatrix}1 & 0 & 0\\ -2 & 1 & 0\\ 0 & 0 & 1\end{bmatrix}$B0 = $\begin{bmatrix}1 & 1 & 1 & 4\\ 0 & -2 & 1 & -3\\ 0 & 1 & 1 & 3\end{bmatrix}$.
(c) In the new system, replace the 3-rd equation by the 3-rd equation plus ½ times the 2-nd equation (replace B1[3, :] by B1[3, :] + ½·B1[2, :] to get B2).
x + y + z = 4,  −2y + z = −3,  (3/2)z = 3/2;   B2 = $\begin{bmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 1/2 & 1\end{bmatrix}$B1 = $\begin{bmatrix}1 & 1 & 1 & 4\\ 0 & -2 & 1 & -3\\ 0 & 0 & 3/2 & 3/2\end{bmatrix}$.
Observe that the matrix B2 is in REF. Verify that the solution set is {[x, y, z]T | [x, y, z] =
[1, 2, 1]}, again a unique solution.
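The back substitution step named above is easy to code. The following sketch is not part of the original notes; it assumes NumPy and uses the REF reached in part 2, cross-checked against the original system.

```python
import numpy as np

def back_substitution(U, c):
    """Solve Ux = c for an upper triangular U with non-zero diagonal."""
    n = len(c)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (c[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

# The REF reached in part 2 above (B2 without its last column) and its right-hand side:
U = np.array([[1., 1., 1.], [0., -2., 1.], [0., 0., 1.5]])
c = np.array([4., -3., 1.5])
print(back_substitution(U, c))                       # [1. 2. 1.]

# Cross-check against the original system x+y+z=4, 2x+3z=5, y+z=3:
A = np.array([[1., 1., 1.], [2., 0., 3.], [0., 1., 1.]])
print(np.linalg.solve(A, np.array([4., 5., 3.])))    # [1. 2. 1.]
```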
We use the above ideas to define elementary row operations and the corresponding elemen-
tary matrices in the next subsection.
Example 2.2.8. Let e1 , . . . , en be the standard unit vectors of Mn,1 (R). Then, using eTi ej =
0 = eTj ei and eTi ei = 1 = eTj ej , verify that each elementary matrix is invertible.
$E_{ij}E_{ij} = \big(I_n - e_ie_i^T - e_je_j^T + e_ie_j^T + e_je_i^T\big)\big(I_n - e_ie_i^T - e_je_j^T + e_ie_j^T + e_je_i^T\big) = I_n.$
We now show that the above elementary matrices correspond to respective row operations.
1. Ek(c)A corresponds to multiplying A[k, :] by c.
$(E_k(c)A)[k, :] = e_k^T(E_k(c)A) = e_k^T\big(I_m + (c-1)e_ke_k^T\big)A = \big(e_k^T + (c-1)e_k^T(e_ke_k^T)\big)A = c\,e_k^TA = c\,A[k, :].$
A similar argument with e_i^Te_k = 0, for i ≠ k, gives (Ek(c)A)[i, :] = A[i, :], for i ≠ k.
2. For c ≠ 0, Eij(c)A corresponds to the replacement of A[i, :] by A[i, :] + cA[j, :].
Using e_i^Te_i = 1 and A[i, :] = e_i^TA, we get
$(E_{ij}(c)A)[i, :] = e_i^T(E_{ij}(c)A) = e_i^T\big(I_m + c\,e_ie_j^T\big)A = \big(e_i^T + c\,e_i^T(e_ie_j^T)\big)A = e_i^TA + c\,e_j^TA = A[i, :] + cA[j, :].$
3. Eij A corresponds to interchange of A[i, :] and A[j, :].
Using e_i^Te_i = 1, e_i^Te_j = 0 and A[i, :] = e_i^TA, we get (EijA)[i, :] = A[j, :].
Similarly, using eTj ej = 1, eTj ei = 0 and A[j, :] = eTj A show that (Eij A)[j, :] = A[i, :].
Further, using eTk ei = 0 = eTk ej , for k 6= i, j show that (Eij A)[k, :] = A[k, :].
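The three kinds of elementary matrices and their row actions can be verified directly in code. This is an illustrative sketch, not from the original notes; it assumes NumPy and an arbitrary test matrix A (indices in the code are 0-based).

```python
import numpy as np

def E_interchange(n, i, j):
    """E_ij = I - e_i e_i^T - e_j e_j^T + e_i e_j^T + e_j e_i^T (row interchange)."""
    E = np.eye(n)
    E[[i, j]] = E[[j, i]]
    return E

def E_scale(n, k, c):
    """E_k(c) = I + (c - 1) e_k e_k^T (multiply row k by c)."""
    E = np.eye(n)
    E[k, k] = c
    return E

def E_add(n, i, j, c):
    """E_ij(c) = I + c e_i e_j^T (add c times row j to row i)."""
    E = np.eye(n)
    E[i, j] += c
    return E

A = np.arange(12, dtype=float).reshape(3, 4)
assert np.allclose((E_interchange(3, 0, 2) @ A)[0], A[2])            # rows 1 and 3 swapped
assert np.allclose((E_scale(3, 1, 5.0) @ A)[1], 5 * A[1])            # row 2 scaled by 5
assert np.allclose((E_add(3, 2, 0, -2.0) @ A)[2], A[2] - 2 * A[0])   # row 3 -> row 3 - 2*row 1
```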
Definition 2.2.10. Two matrices A and B are said to be row equivalent if one can be
obtained from the other by a finite number of elementary row operations. Or equivalently, there exist elementary matrices E1, . . . , Ek such that B = E1 ··· EkA.
Definition 2.2.11. The linear systems Ax = b and Cx = d are said to be row equivalent if
their respective augmented matrices, [A b] and [C d], are row equivalent.
Thus, note that the linear systems at each step in Example 2.2.4 are row equivalent to each
other. We now prove that the solution set of two row equivalent linear systems are same.
Theorem 2.2.12. Let Ax = b and Cx = d be two row equivalent linear systems. Then they
have the same solution set.
EA = C, E b = d, A = E −1 C and b = E −1 d. (2.2.3)
C y = EA y = E b = d. (2.2.4)
A z = E −1 C z = E −1 d = b. (2.2.5)
Therefore, using Equations (2.2.4) and (2.2.5) the required result follows.
The following result is a particular case of Theorem 2.2.12.
Corollary 2.2.13. Let A and B be two row equivalent matrices. Then, the systems Ax = 0
and Bx = 0 have the same solution set.
Example 2.2.14. Are the matrices A = $\begin{bmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{bmatrix}$ and B = $\begin{bmatrix}1 & 0 & a\\ 0 & 1 & b\\ 0 & 0 & 0\end{bmatrix}$ row equivalent?
Solution: No, as $\begin{bmatrix}a\\ b\\ -1\end{bmatrix}$ is a solution of Bx = 0 but it isn't a solution of Ax = 0.
The following exercise shows that every square matrix is row equivalent to an upper trian-
gular matrix. We will come back to this idea again in the chapter titled “Advanced Topics”.
Exercise 2.2.15. 1. Let A = [aij ] ∈ Mn (R). Then there exists an orthogonal matrix U
such that U A is upper triangular. The proof uses the following ideas.
(a) If A[1, :] = 0 then proceed to the next column. Else, A[:, 1] 6= 0.
(b) If A[:, 1] = αe1 , for some α ∈ R, α 6= 0, proceed to the next column. Else, either
a11 = 0 or a11 6= 0.
(c) If a11 = 0 then left multiply A with E1i (an orthogonal matrix) so that the (1, 1)
entry of B = E1i A is non-zero. Hence, without loss of generality, let a11 6= 0.
(d) Let [w1 , . . . , wn ]T = w ∈ Rn with w1 6= 0. Then use the Householder matrix H such
that Hw = w1 e1 , i.e., find x ∈ Rn such that (In − 2xxT )w = w1 e1 .
w − w1 e1 1
Ans: Given condition implies w − w1 e1 = 2(xT w)x. So x = T
. As
2x w 2xT w
2 T 2
is scalar, use x = α(w − w1 e1 ). Show that α satisfies α w w − w1 = 1 and
Hw = w1 e1 . " #
w1 ∗
(e) So, Part 1d gives an orthogonal matrix H1 with H1 A = .
0 A1
(f ) Use induction to get H2 ∈ Mn−1 (R) satisfying H2 A1 = T1 , an upper triangular
matrix. " # " #
1 0T w1 ∗
(g) Define H = H1 . Then H is an orthogonal matrix and HA = , an
0 H2 0 T1
upper triangular matrix.
2. Let A ∈ Mn (R) such that tr(A) = 0. Then prove that there exists a non-singular matrix
S such that SAS −1 = B with B = [bij ] and bii = 0, for 1 ≤ i ≤ n.
Ans: If diag(A) = 0, we are done. Else, assume a11 ≠ 0. If there is an i such that ai1 ≠ 0 then use S = E1i(c), with c = −a11/ai1, to get (SAS⁻¹)11 = 0. If ai1 = 0 for all i ≠ 1, then tr(A) = 0 implies there exists i such that aii ≠ 0. Use this entry to get a non-zero entry in the first column and then proceed as before.
Hence, we observe that solving the system Ax = b reduces to solving two easier linear systems,
namely Ly = b and U z = y, where y is obtained as a solution of Ly = b.
To give the LU -decomposition for a square matrix A, we need to know the determinant of A,
namely det(A), and its properties. Since, we haven’t yet studied it, we just give the idea of the
LU -decomposition. For the general case, the readers should see the chapter titled “Advanced
Topics”. Let us start with a few examples.
" #
0 1
Example 2.3.1. 1. Let A = . Then A cannot be decomposed into LU .
1 0
" #" #
a 0 e f
For if, A = LU = then the numbers a, b, c, e, f, g ∈ R satisfy
b c 0 g
ae = 0, af = 1, be = 1 and bf + cg = 0.
4. Let A = $\begin{bmatrix}1 & 1 & 1\\ 2 & 0 & 3\\ 0 & 1 & 1\end{bmatrix}$. Then, using the ideas in Example 2.2.4.2, verify that A = LU, where
L = $\begin{bmatrix}1 & 0 & 0\\ 2 & 1 & 0\\ 0 & -1/2 & 1\end{bmatrix}$ and U = $\begin{bmatrix}1 & 1 & 1\\ 0 & -2 & 1\\ 0 & 0 & 3/2\end{bmatrix}$.
5. Recall that in Example 2.2.4.2, we had pivots at each stage. Whereas, in Example 2.2.4.1, we had to interchange the first and second rows to get a pivot. So, in that case it is not possible to write A = LU.
6. Finally, using A = LU, the system Ax = b reduces to LUx = b. Here, the solution of Ly = b, for b = $\begin{bmatrix}4\\ 5\\ 3\end{bmatrix}$, equals y = $\begin{bmatrix}4\\ -3\\ 3/2\end{bmatrix}$. This, in turn, implies that x = $\begin{bmatrix}1\\ 2\\ 1\end{bmatrix}$ is the solution of both Ux = y and Ax = b.
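The two-stage solve described in item 6 is straightforward to code. The sketch below is not part of the original notes; it assumes NumPy and uses the L and U of Example 2.3.1.4 together with the b from item 6.

```python
import numpy as np

L = np.array([[1., 0., 0.], [2., 1., 0.], [0., -0.5, 1.]])
U = np.array([[1., 1., 1.], [0., -2., 1.], [0., 0., 1.5]])
A = L @ U                                   # the matrix of Example 2.3.1.4
b = np.array([4., 5., 3.])

def forward_sub(L, b):                       # solve Ly = b, L lower triangular
    y = np.zeros_like(b)
    for i in range(len(b)):
        y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
    return y

def backward_sub(U, y):                      # solve Ux = y, U upper triangular
    x = np.zeros_like(y)
    for i in range(len(y) - 1, -1, -1):
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

y = forward_sub(L, b)                        # [4, -3, 1.5]
x = backward_sub(U, y)                       # [1, 2, 1]
assert np.allclose(A @ x, b)
```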
So, to proceed further, let A ∈ Mn (R). Then, recall that for any S ⊆ {1, 2, . . . , n}, A[S, S]
denotes the principal submatrix of A corresponding to the elements of S (see Page 25). Then,
we assume that det(A[S, S]) 6= 0, for every S = {1, 2, . . . , i}, 1 ≤ i ≤ n.
We need to show that there exists an invertible lower triangular matrix L such that LA is
an invertible upper triangular matrix. The proof uses the following ideas.
" #
a11 A12
1. By assumption A[1, 1] = a11 6= 0. Write A = , where A22 is a (n − 1) × (n − 1).
A21 A22
" #
1 0T −1
2. Let L1 = , where x = A21 . Then L1 is a lower triangular matrix and
x In−1 a11
" #" # " # " #
1 0T a11 A12 a11 A12 a11 A12
L1 A = = = .
x In−1 A21 A22 a11 x + A21 xA12 + A22 0 xA12 + A22
3. Note that the (2, 2)-th entry of L1A equals the (1, 1)-th entry of xA12 + A22. This equals
$\Big(\frac{-1}{a_{11}}\begin{bmatrix}a_{21}\\ \vdots\\ a_{n1}\end{bmatrix}\begin{bmatrix}a_{12} & \cdots & a_{1n}\end{bmatrix}\Big)_{11} + (A_{22})_{11} = \frac{a_{11}a_{22} - a_{12}a_{21}}{a_{11}} = \frac{\det\big(A[\{1,2\},\{1,2\}]\big)}{a_{11}} \neq 0.$
4. Thus, L1 is an invertible lower triangular matrix with L1A = $\begin{bmatrix}a_{11} & *\\ 0 & A_1\end{bmatrix}$ and (A1)11 ≠ 0.
Hence, det(A) = a11 det(A1 ) and det(A1 [S, S]) 6= 0, for all S ⊆ {1, 2, . . . , n − 1} as
(a) the determinant of a lower triangular matrix equals product of diagonal entries and
(b) if A and B are two n × n matrices then det(AB) = det(A) · det(B).
5. Now, using induction, we get L2 ∈ Mn−1 (R), an invertible lower triangular matrix, with
1’s on the diagonal such that L2 A1 = T1 , an invertible upper triangular matrix.
" # " #
T
1 0 T a ∗
11
6. Define Le= L1 . Then, verify that LA
e = , is an upper triangular matrix
AF
0 L2 0 T1
DR
1. if C is already in REF,
[C d] = RREF([A b]) then what are the possible choices for [C d] and what are its
implication?
Solution: Since there are 3 rows, the number of pivots can be at most 3. So, let us verify that
there are 7 different choices for [C d] = RREF([A b]).
1. There are exactly 3 pivots. These pivots can be in either the columns {1, 2, 3}, {1, 2, 4}
and {1, 3, 4} as we have assumed A[:, 1] 6= 0. The corresponding cases are given below.
(a) Pivots in the columns 1, 2, 3 ⇒ [C d] = $\begin{bmatrix}1 & 0 & 0 & d_1\\ 0 & 1 & 0 & d_2\\ 0 & 0 & 1 & d_3\end{bmatrix}$. Here, Ax = b is consistent. The unique solution equals $\begin{bmatrix}x\\ y\\ z\end{bmatrix} = \begin{bmatrix}d_1\\ d_2\\ d_3\end{bmatrix}$.
(b) Pivots in the columns 1, 2, 4 or 1, 3, 4 ⇒ [C d] equals $\begin{bmatrix}1 & 0 & \alpha & 0\\ 0 & 1 & \beta & 0\\ 0 & 0 & 0 & 1\end{bmatrix}$ or $\begin{bmatrix}1 & \alpha & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1\end{bmatrix}$. Here, Ax = b is inconsistent for any choice of α, β as there is a row of [0 0 0 1]. This corresponds to solving 0·x + 0·y + 0·z = 1, an equation which has no solution.
2. There are exactly 2 pivots. These pivots can be in either the columns {1, 2}, {1, 3} or
{1, 4} as we have assumed A[:, 1] 6= 0. The corresponding cases are given below.
(a) Pivots in the columns 1, 2 or 1, 3 ⇒ [C d] equals $\begin{bmatrix}1 & 0 & \alpha & d_1\\ 0 & 1 & \beta & d_2\\ 0 & 0 & 0 & 0\end{bmatrix}$ or $\begin{bmatrix}1 & \alpha & 0 & d_1\\ 0 & 0 & 1 & d_2\\ 0 & 0 & 0 & 0\end{bmatrix}$. Here, for the first matrix, the solution set equals
$\begin{bmatrix}x\\ y\\ z\end{bmatrix} = \begin{bmatrix}d_1 - \alpha z\\ d_2 - \beta z\\ z\end{bmatrix} = \begin{bmatrix}d_1\\ d_2\\ 0\end{bmatrix} + z\begin{bmatrix}-\alpha\\ -\beta\\ 1\end{bmatrix},$
where z is arbitrary. Here, z is called a "free variable" as z can be assigned any value, and x and y are called "basic variables"; they can be written in terms of the free variable z and a constant.
(b) Pivots in the columns 1, 4 ⇒ [C d] = $\begin{bmatrix}1 & \alpha & \beta & 0\\ 0 & 0 & 0 & 1\\ 0 & 0 & 0 & 0\end{bmatrix}$, which has a row of [0 0 0 1]. This corresponds to solving 0·x + 0·y + 0·z = 1, an equation which has no solution.
3. There is exactly one pivot. In this case [C d] = $\begin{bmatrix}1 & \alpha & \beta & d_1\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\end{bmatrix}$. Here, Ax = b is consistent and has an infinite number of solutions for every choice of α, β as RREF([A b]) has no row of the form [0 0 0 1].
So, having seen the application of the RREF to the augmented matrix, let us proceed with the
algorithm, commonly known as the Gauss-Jordan Elimination (GJE), which helps us compute
the RREF.
4. Step 2: If all entries in the Region are 0, STOP. Else, in the Region, find the leftmost
nonzero column and find its topmost nonzero entry. Suppose this nonzero entry is aij = c
(say). Box it. This is a pivot.
5. Step 3: Interchange the row containing the pivot with the top row of the region. Also,
make the pivot entry 1 by dividing this top row by c. Use this pivot to make other entries
in the pivotal column as 0.
6. Step 4: Put Region = the submatrix below and to the right of the current pivot. Now,
go to step 2.
Important: The process will stop, as we can get at most min{m, n} pivots.
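The steps of the GJE algorithm translate almost line-by-line into code. The following is a sketch only, not part of the original notes; it assumes NumPy and uses floating point with a small tolerance in place of exact "non-zero" tests. It is applied to the matrix of Example 2.4.4 below.

```python
import numpy as np

def rref(A, tol=1e-12):
    """Row-reduced echelon form following the GJE steps above (a sketch)."""
    R = A.astype(float).copy()
    m, n = R.shape
    pivot_row = 0
    for col in range(n):                                 # scan columns left to right
        rows = np.where(np.abs(R[pivot_row:, col]) > tol)[0]
        if rows.size == 0:
            continue                                     # Step 2: column is zero in the region
        top = pivot_row + rows[0]
        R[[pivot_row, top]] = R[[top, pivot_row]]        # Step 3: bring the pivot to the top row
        R[pivot_row] /= R[pivot_row, col]                # make the pivot 1
        for r in range(m):                               # zero out the rest of the pivotal column
            if r != pivot_row:
                R[r] -= R[r, col] * R[pivot_row]
        pivot_row += 1                                   # Step 4: shrink the region
        if pivot_row == m:
            break
    return R

A = np.array([[0., 2., 3., 7.],
              [1., 1., 1., 1.],
              [1., 3., 4., 8.],
              [0., 0., 0., 1.]])
print(rref(A))        # matches RREF(A) computed in Example 2.4.4
```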
Example 2.4.4. Apply GJE to A = $\begin{bmatrix}0 & 2 & 3 & 7\\ 1 & 1 & 1 & 1\\ 1 & 3 & 4 & 8\\ 0 & 0 & 0 & 1\end{bmatrix}$.
1. Region = A as A 6= 0.
1 1 1 1 1 1 1
1
0 2 3 7 0 2 3 7
2. Then, E12 A =
. Also, E31 (−1)E12 A = = B (say).
1 3 4 8
0 2 3 7
0 0 0 1 0 0 0 1
1 1 1 1
2 3 7 3 7
1
0 1 2 2
3. Now, Region = 2 3 7 6= 0. Then, E2 ( 2 )B = 0
= C(say). Then,
2 3 7
0 0 1
0 0 0 1
−1 −5
1 0 2 2
0 3 7
1 2 2
E12 (−1)E32 (−2)C = 0 = D(say).
0 0 0
0 0 0 1
1 0 −1 2
−5
2
" # 3 7
0 0 0 1
2 2
4. Now, Region = . Then, E34 D = . Now, multiply on the left
0 1 0
0 0 1
0 0 0 0
1 0 − 12 0
0 3
5 −7 1 2 0
by E13 ( 2 ) and E23 ( 2 ) to get
, a matrix in RREF. Thus, A is row
0 0 0 1
0 0 0 0
1 0 − 12 0
0 3
1 2 0
equivalent to F , where F = RREF(A) = .
0 0 0 1
0 0 0 0
5. Note that we have multiplied A on the left by the elementary matrices, E12 , E31 (−1),
E2 (1/2), E32 −2, E12 (−1), E34 , E23 (−7/2), E13 (5/2), i.e.,
The proof of the next result is beyond the scope of this book and hence is omitted.
Theorem 2.4.6. Let A and B be two row equivalent matrices in RREF. Then A = B.
Proof. Suppose there exists a matrix A having B and C as RREFs. As the RREFs are obtained
by left multiplication of elementary matrices, there exist elementary matrices E1 , . . . , Ek and
F1 , . . . , F` such that B = E1 · · · Ek A and C = F1 · · · F` A. Thus,
As the inverse of an elementary matrix is an elementary matrix, B and C are row equivalent.
1. Then, the uniqueness of RREF implies that RREF(A) is independent of the choice of the
row operations used to get the final matrix which is in RREF.
4. Let F = RREF(A) and B = [A[:, 1], . . . , A[:, s]], for some s ≤ n. Then,
But, P B = [P A[:, 1], . . . , P A[:, s]] = [F [:, 1], . . . , F [:, s]]. As F is in RREF, its first s
columns are also in RREF. Thus, by Corollary 2.4.7, RREF(P B) = [F [:, 1], . . . , F [:, s]].
Now, a repeated application of Remark 2.4.8.2 implies RREF(B) = [F [:, 1], . . . , F [:, s]].
Thus, the required result follows.
Proposition 2.4.9. Let A ∈ Mn (R). Then, A is invertible if and only if RREF(A) = In , i.e.,
every invertible matrix is a product of elementary matrices.
Recall that if A ∈ Mn (C) is invertible then there exists a matrix B such that AB = In = BA.
Definition 2.5.1. Let A ∈ Mm,n (C). Then, the rank of A, denoted Rank(A), is the number
of pivots in the RREF(A).
Note that Rank(A) is defined using the number of pivots in RREF (A). These pivots
were obtained using the row operations. The question arises, what if we had applied column
operations? That is, what happens when we multiply by invertible matrices on the right?
Will the pivots using column operations remain the same or change? This question cannot be
answered at this stage. Using the ideas in vector spaces, we can show that the number of pivots
do not change and hence, we just use the word Rank(A).
We now illustrate the calculation of the rank by giving a few examples.
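As a quick numerical companion (a sketch only, not from the original notes, assuming NumPy with arbitrarily chosen matrices): NumPy's matrix_rank computes the same number of pivots one would find from the RREF, and left multiplication by an invertible matrix does not change it.

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.],
              [1., 1., 1.]])
# Rank(A) = number of pivots in RREF(A); numpy computes the same number via the SVD.
print(np.linalg.matrix_rank(A))          # 2, since row 2 = 2 * row 1

P = np.array([[1., 1., 0.], [0., 1., 0.], [0., 0., 3.]])   # invertible
print(np.linalg.matrix_rank(P @ A))      # still 2: row operations do not change the rank
```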
Remark 2.5.3. Before proceeding further, for A, B ∈ Mm,n (C), we observe the following.
1. If A and B are row-equivalent then Rank(A) = Rank(B).
2. The number of pivots in the RREF(A) equals the number of pivots in REF of A. Hence,
one needs to compute only the REF to determine the rank.
Ans: The number of pivots cannot be more than the number of rows or the number of
columns.
" #
A C
2. If B = then Rank(B) = Rank([A C]).
0 0
Ans: RREF(B) = RREF([A C]).
" #
A 0
3. If B = then Rank(B) = Rank(A)
0 0
" #
RREF(A) 0
Ans: RREF(B) = .
0 0
4. If A = P B, for some invertible matrix P then Rank(A) = Rank(B).
Corollary 2.5.5. Let A ∈ Mm,n (R) and B ∈ Mn,q (R). Then, Rank(AB) ≤ Rank(A).
In particular, if B ∈ Mn (R) is invertible then Rank(AB) = Rank(A).
Proposition 2.5.6. Let A ∈ Mn (C) be an invertible matrix and let S be any subset of {1, 2, . . . , n}.
Then Rank(A[S, :]) = |S| and Rank(A[:, S]) = |S|.
Proof. Without loss of generality, let S = {1, . . . , r} and S c = {r + 1, . . . , n}. Write A1 = A[:, S]
and A2 = A[:, S c ]. Since A is invertible, RREF(A) = In . Hence, by Remark 2.4.8.3, there exists
an invertible matrix P such that P A = In . Thus,
" #
h i h i Ir 0
P A1 P A2 = P A1 A2 = P A = In = .
0 In−r
" # " #
Ir 0
Thus, P A1 = and P A2 = . So, using Corollary 2.5.5, Rank(A1 ) = r.
0 In−r
For the second part, let B1 = A[S, :], B2 = A[S c , :] and let Rank(B1 ) = t < s. Then, by
Exercise 2, there exists an s × s invertible matrix Q and a matrix C in RREF, of size t × n and
having exactly t pivots, such that
" #
C
QB1 = RREF(B1 ) = . (2.5.2)
0
Theorem 2.5.8. Let A ∈ Mm,n (R). If Rank(A) = r then, there exist invertible matrices P and
Q such that $PAQ = \begin{bmatrix}I_r & 0\\ 0 & 0\end{bmatrix}$.
Proof. Let C = RREF(A). Then, by Remark 2.4.8.3, there exists an invertible matrix P such that C = PA. Note that C has r pivots and they appear in columns, say i1 < i2 < ··· < ir. Now, let Q1 = E1i1E2i2 ··· Erir. As the Ejij's are elementary matrices that interchange the columns of C, one has D = CQ1 = $\begin{bmatrix}I_r & B\\ 0 & 0\end{bmatrix}$, where B ∈ Mr,n−r(R). Now take Q2 = $\begin{bmatrix}I_r & -B\\ 0 & I_{n-r}\end{bmatrix}$ and Q = Q1Q2. Then Q is invertible and PAQ = DQ2 = $\begin{bmatrix}I_r & 0\\ 0 & 0\end{bmatrix}$.
Corollary 2.5.9. Let A ∈ Mm,n (R). If Rank(A) = r then there exist matrices B ∈ Mm,r (R)
r
and C ∈ Mr,n (R) such that Rank(B) = Rank(C) = r and A = BC. Furthermore, A = xi yiT ,
P
i=1
for some xi ∈ Rm and yi ∈ Rn .
" #
Ir 0
Proof. By Theorem 2.5.8, there exist invertible matrices P and Q such that P A Q = .
0 0
" # " #
−1 Ir 0 −1 −1 −1 C
Or equivalently, A = P Q . Decompose P = [B D] and Q = such that
0 0 F
B ∈ Mm,r (R) and C ∈ Mr,n (R). Then Rank(B) = Rank(C) = r (see Proposition 2.5.6) and
" #" # " #
Ir 0 C h i C
A = [B D] = B 0 = BC.
0 0 F F
y1T
h i . r
. xi yiT .
P
Furthermore, assume that B = x1 · · · xr and C =
. . Then A = BC =
i=1
yrT
Proposition 2.5.10. Let A, B ∈ Mm,n (R). Then, prove that Rank(A + B) ≤ Rank(A) +
k
xi yiT , for some xi , yi ∈ R, for 1 ≤ i ≤ k, then Rank(A) ≤ k.
P
Rank(B). In particular, if A =
i=1
Proof. Let Rank(A) = r. Then, # exists an invertible matrix P and a matrix A1 ∈ Mr,n (R)
" there
A1
such that P A = RREF(A) = . Then,
0
" # " # " #
A1 B1 A1 + B 1
P (A + B) = P A + P B = + = .
0 B2 B2
Thus, the required result follows. The other part follows, as Rank(xi yiT ) = 1, for 1 ≤ i ≤ k.
" # " #
2 4 8 1 0 0
Exercise 2.5.11. 1. Let A = and B = . Find P and Q such that
1 3 2 0 1 0
B = P AQ.
T
2. Let A ∈ Mm,n (R) be a matrix of rank 1. Then prove that A = xyT , for non-zero vectors
AF
x ∈ Rm and y ∈ Rn .
DR
In the first case, there is a pivot in the (n + 1)-th column of the augmented matrix [A b].
Thus, the column corresponding to b has a pivot. This implies b 6= 0. This implies that the
row corresponding to this pivot in RREF([A b]) has all entries before this pivot as 0. Thus,
in RREF([A b]) this pivotal row equals [0 0 · · · 0 1]. But, this corresponds to the equation
0 · x1 + 0 · x2 + · · · + 0 · xn = 1. This implies that the Ax = b has no solution whenever
Definition 2.6.1. Consider the linear system Ax = b. If RREF([A b]) = [C d]. Then,
the variables corresponding to the pivotal columns of C are called the basic variables and the
variables that correspond to non-pivotal columns are called free variables.
Then to get the solution set, observe that C has 4 pivotal columns, namely, the columns 1, 2, 5 and 6. Thus, x1, x2, x5 and x6 are basic variables, and the remaining variables x3, x4 and x7 are free variables. The solution set is given by
$\begin{bmatrix}x_1\\ x_2\\ x_3\\ x_4\\ x_5\\ x_6\\ x_7\end{bmatrix} = \begin{bmatrix}-2x_3 + x_4 - 2x_7\\ -x_3 - 3x_4 - 5x_7\\ x_3\\ x_4\\ 4x_7\\ 4 - x_7\\ x_7\end{bmatrix} = \begin{bmatrix}0\\ 0\\ 0\\ 0\\ 0\\ 4\\ 0\end{bmatrix} + x_3\begin{bmatrix}-2\\ -1\\ 1\\ 0\\ 0\\ 0\\ 0\end{bmatrix} + x_4\begin{bmatrix}1\\ -3\\ 0\\ 1\\ 0\\ 0\\ 0\end{bmatrix} + x_7\begin{bmatrix}-2\\ -5\\ 0\\ 0\\ 4\\ -1\\ 1\end{bmatrix},$
where x3, x4 and x7 are arbitrary.
Theorem 2.6.3. Let Ax = b be a linear system in n variables with RREF([A b]) = [C d].
Proof. Part 1: As Rank([A b]) > Rank(A), by Remark 2.4.8.4 ([C d])[r + 1, :] = [0T 1]. Note
that this row corresponds to the linear equation
0 · x1 + 0 · x2 + · · · + 0 · xn = 1
Part 2b: As Rank(A) = r < n. Suppose the pivots appear in columns i1 , . . . , ir with
1 ≤ i1 < · · · < ir ≤ n. Thus, the variables xij , for 1 ≤ j ≤ r, are basic variables and the
remaining n − r variables, say xt1 , . . . , xtn−r , are free variables with t1 < · · · < tn−r . Since C is
in RREF, in terms of the free variables and basic variables, the `-th row of [C d], for 1 ≤ ` ≤ r,
corresponds to the equation (writing basic variables in terms of a constant and free variables)
$x_{i_\ell} + \sum_{k=1}^{n-r} c_{\ell t_k}x_{t_k} = d_\ell \iff x_{i_\ell} = d_\ell - \sum_{k=1}^{n-r} c_{\ell t_k}x_{t_k}.$
d1 c1t1 c1tn−r
. . .
. . .
. . .
dr crt crt
1 n−r
Define x0 = 0 and u1 = 1 , . . . , un−r = 0 . Then, it can be easily verified
0 0 0
. . .
. . .
. . .
0 0 1
that Ax0 = b and, for 1 ≤ i ≤ n − r, Aui = 0. Also, by Equation (2.6.3) the solution set has
indeed the required form, where ki corresponds to the free variable xti . As there is at least one
free variable the system has infinite number of solutions.
Thus, note that the solution set of Ax = b depends on the rank of the coefficient matrix, the
rank of the augmented matrix and the number of unknowns. In some sense, it is independent
of the choice of m.
Exercise 2.6.4. Consider the linear system given below. Use GJE to find the RREF of its
augmented matrix and use it to find the solution.
x + y − 2u + v = 2
z + u + 2v = 3
v + w = 3
v + 2w = 5
Ans: RREF([A b]) = $\begin{bmatrix}1 & 1 & 0 & -2 & 0 & 0 & 1\\ 0 & 0 & 1 & 1 & 0 & 0 & 1\\ 0 & 0 & 0 & 0 & 1 & 0 & 1\\ 0 & 0 & 0 & 0 & 0 & 1 & 2\end{bmatrix}$. Thus, the solution set equals {x0 + cu1 + du2 : c, d ∈ R}, where x0 = $\begin{bmatrix}1\\ 0\\ 1\\ 0\\ 1\\ 2\end{bmatrix}$, u1 = $\begin{bmatrix}-1\\ 1\\ 0\\ 0\\ 0\\ 0\end{bmatrix}$ and u2 = $\begin{bmatrix}2\\ 0\\ -1\\ 1\\ 0\\ 0\end{bmatrix}$.
Let A ∈ Mm,n (R). Then, Rank(A) ≤ m. Thus, using Theorem 2.6.3 the next result follows.
Corollary 2.6.5. Let A ∈ Mm,n (R). If Rank(A) = r < n then the homogeneous system Ax = 0
has at least one non-trivial solution.
Remark 2.6.6. Let A ∈ Mm,n (R). Then, Theorem 2.6.3 implies that Ax = b is consistent
if and only if Rank(A) = Rank([A b]). Further, the the vectors ui ’s associated with the free
variables in Equation (2.6.3) are solutions of the associated homogeneous system Ax = 0.
Example 2.6.7. 1. Determine the equation of the circle passing through the points (−1, 4), (0, 1)
and (1, 4).
Solution: The equation a(x2 + y 2 ) + bx + cy + d = 0, for a, b, c, d ∈ R, represents a circle.
Since this curve passes through the given points, we get a homogeneous system having 3 equations in 4 unknowns, namely
$\begin{bmatrix}(-1)^2 + 4^2 & -1 & 4 & 1\\ 0^2 + 1^2 & 0 & 1 & 1\\ 1^2 + 4^2 & 1 & 4 & 1\end{bmatrix}\begin{bmatrix}a\\ b\\ c\\ d\end{bmatrix} = 0.$
Solving this system, we get [a, b, c, d] = [(3/13)d, 0, −(16/13)d, d]. Hence, choosing d = 13, the required circle is given by 3(x² + y²) − 16y + 13 = 0. (A numerical check of this computation is sketched after this example.)
2. Determine the equation of the plane that contains the points (1, 1, 1), (1, 3, 2) and (2, −1, 2).
Solution: The general equation of a plane in space is given by ax + by + cz + d = 0,
where a, b, c and d are unknowns. Since this plane passes through the 3 given points, we
get a homogeneous system in 3 equations and 4 variables. So, it has a non-trivial solution,
namely [a, b, c, d] = [− 43 d, − d3 , − 32 d, d]. Hence, choosing d = 3, the required plane is given
by −4x − y + 2z + 3 = 0.
3. Let A = $\begin{bmatrix}2 & 3 & 4\\ 0 & -1 & 0\\ 0 & -3 & 4\end{bmatrix}$. Then, find a non-trivial solution of Ax = 2x. Does there exist a
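The homogeneous system of Example 2.6.7.1 (the circle through three points) can also be solved numerically. The sketch below is not part of the original notes; it assumes NumPy and uses the SVD to extract a non-trivial null-space vector, which is then rescaled to match the answer given above.

```python
import numpy as np

# Each point (x, y) on the circle a(x^2 + y^2) + b x + c y + d = 0 gives one equation
# in the unknowns (a, b, c, d).
pts = [(-1., 4.), (0., 1.), (1., 4.)]
M = np.array([[x * x + y * y, x, y, 1.] for x, y in pts])

# A non-trivial solution spans the null space of M; take the right singular vector
# belonging to the smallest singular value.
_, _, Vt = np.linalg.svd(M)
sol = Vt[-1]
sol = (sol / sol[0]) * 3          # rescale so that a = 3
print(np.round(sol, 6))           # [  3.   0. -16.  13.]  ->  3(x^2+y^2) - 16y + 13 = 0
```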
Exercise 2.6.8. 1. Let A ∈ Mn (R). If A2 x = 0 has a non trivial solution then show that
Ax = 0 also has a non trivial solution.
Ans: Choose x0 6= 0 such that A2 x0 = 0. If Ax0 = 0 then x0 is a non-trivial solution of
Ax = 0. Else, Ax0 6= 0 is a non-trivial solution of Ax = 0.
2. Let u = (1, 1, −2)T and v = (−1, 2, 3)T . Find condition on x, y and z such that the system
cu + dv = (x, y, z)T in the unknowns c and d is consistent.
Ans: 7x − y + 3z = 0
3. Find condition(s) on x, y, z so that the systems given below (in the unknowns a, b and c)
is consistent?
(a) a + 2b − 3c = x, 2a + 6b − 11c = y, a − 2b + 7c = z.
(b) a + b + 5c = x, a + 3c = y, 2a − b + 4c = z.
Ans: (a) 5x − 2y + z = 0. (b) x − 3y + z = 0.
4. For what values of c and k, the following systems have i) no solution, ii) a unique
solution and iii) infinite number of solutions.
(a) x + y + z = 3, x + 2y + cz = 4, 2x + 3y + 2cz = k.
(b) x + y + z = 3, x + y + 2cz = 7, x + 2y + 3cz = k.
(c) x + y + 2z = 3, x + 2y + cz = 5, x + 2y + 4z = k.
(d) x + 2y + 3z = 4, 2x + 5y + 5z = 6, 2x + (c2 − 6)z = c + 20.
(e) x + y + z = 3, 2x + 5y + 4z = c, 3x + (c2 − 8)z = 12.
Ans: (a) c = 1 and k 6= 7 ⇒ No Solution. c = 1 and k = 7 ⇒ Infinite number of
solutions. c 6= 1 ⇒ Unique solution.
(b) c = 1/2 ⇒ No solution. c 6= 1/2 ⇒ Unique solution.
(c) c = 4 and k 6= 5 ⇒ No Solution. c = 4 and k = 5 ⇒ Infinite number of solutions.
c 6= 4 ⇒ Unique solution.
(d) c = 4 ⇒ No solution, c = −4 ⇒ Infinite number of solutions. c 6= ±4 ⇒ Unique
solution.
5. Consider the linear system Ax = b in m equations and 3 unknowns. Then, for each of
the given solution set, determine the possible choices of m? Further, for each choice of
m, determine a choice of A and b.
(a) (1, 1, 1)T is the only solution.
(b) {c(1, 2, 1)T |c ∈ R} as the solution set.
(c) {(1, 1, 1)T + c(1, 2, 1)T |c ∈ R} as the solution set.
(d) {c(1, 2, 1)T + d(2, 2, −1)T |c, d ∈ R} as the solution set.
(e) {(1, 1, 1)T + c(1, 2, 1)T + d(2, 2, −1)T |c, d ∈ R} as the solution set.
Ans: (a) A unique solution ⇒ RREF([A b]) = $\begin{bmatrix}1 & 0 & 0 & 1\\ 0 & 1 & 0 & 1\\ 0 & 0 & 1 & 1\end{bmatrix}$.
1 " #
1 0 −1 0
(b) Only the homogeneous system with A 2 = 0 ⇒ RREF([A b]) = 0 1 −2 0 .
1
1 " # 1
1 0 −1 a
(c)Similarly, ⇒ A2 = 0. ⇒ RREF([A b]) =
. To get 1
as a particular
0 1 −2 b
1 1
solution, choose z = 1 to get a = 0 and b = −1.
(d) Homogeneous system ⇒ RREF([A b]) = [1  a  b  0] with a = −3/4 and b = 1/2.
(e) Similarly, RREF([A b]) = [1  −3/4  1/2  c]. Show c = 3/4 to get (1, 1, 1)^T as a particular solution.
Theorem 2.7.1. Let A ∈ Mn (R). Then, the following statements are equivalent.
1. A is invertible.
2. RREF(A) = In .
4. Rank(A) = n.
2⇔4 By definition. For the converse, Rank(A) = n ⇒ A has n pivots and A has n
5 =⇒ 1 Ax = 0 has only the trivial solution implies that there are no free variables. So,
all the unknowns are basic variables. So, each column is a pivotal column. Thus, RREF(A) = In .
1⇒6 Note that x0 = A−1 b is the unique solution of Ax = b.
6⇒7 A unique solution implies that is at least one solution. So, nothing to show.
7 =⇒ 1 Given assumption implies that for 1 ≤ i ≤ n, the linear system Ax = ei has a
solution, say ui . Define B = [u1 , u2 , . . . , un ]. Then
x0 = In x0 = (AB)x0 = A(Bx0 ) = A0 = 0.
Thus, the homogeneous system Bx = 0 has a only the trivial solution. Hence, using Part 5, B
is invertible. As AB = In and B is invertible, we get BA = In . Thus AB = In = BA. Thus, A
is invertible as well.
We now give an immediate application of Theorem 2.7.1 without proof.
Theorem 2.7.2. The following two statements cannot hold together for A ∈ Mn (R).
As an immediate consequence of Theorem 2.7.1, the readers should prove that one needs to
compute either the left or the right inverse to prove invertibility of A ∈ Mn (R).
Corollary 2.7.4. (Theorem of the Alternative) The following two statements cannot hold
together for A ∈ Mn (C) and b ∈ Rn .
Note that one of the requirement in the last corollary is yT b 6= 0. Thus, we want non-zero
DR
Exercise 2.7.5. 1. Give the proof of Theorem 2.7.2 and Corollary 2.7.3.
2. Let A ∈ Mn,m (R) and B ∈ Mm,n (R). Either use Theorem 2.7.1.5 or multiply the matrices
to verify the following statementes.
3. Let bT = [1, 2, −1, −2]. Suppose A is a 4 × 4 matrix such that the linear system Ax = b
has no solution. Mark each of the statements given below as true or false?
(c) Let cT = [−1, −2, 1, 2]. Then, the system Ax = c has no solution.
Ans: FALSE. A solution x0 ⇒ −x0 is a solution of Ax = b.
(d) Let B = RREF(A). Then,
i. B[4, :] = [0, 0, 0, 0].
Ans: TRUE
ii. B[4, :] = [0, 0, 0, 1].
Ans: FALSE
2.8 Determinant
1 2 3 " #
1 2
Recall the notations used in Section 1.5 on Page 25 . If A = 1 3 2 then A(1 | 2) = 2 7
2 4 7
and A({1, 2} | {1, 3}) = [4]. The actual definition of the determinant requires an understanding
of group theory. So, we will just give an inductive definition which will help us to compute
the determinant and a few results. The advanced students can find the main definition of the
determinant in Appendix 9.2.22, where it is proved that the definition given below corresponds
to the expansion of determinant along the first row.
Definition 2.8.1. Let A be a square matrix of order n. Then, the determinant of A, denoted
T
a,
if A = [a] (corresponds to n = 1),
DR
det(A) = n
(−1)1+j a1j det A(1 | j) ,
P
otherwise.
j=1
det(A) = | A | = a11 det(A(1 | 1)) − a12 det(A(1 | 2)) + a13 det(A(1 | 3))
a
22 a23
a
21 a23
a
21 a22
= a11 − a12 + a 13
a32 a33 a31 a33 a31 a32
= a11 (a22 a33 − a23 a32 ) − a12 (a21 a33 − a31 a23 ) + a13 (a21 a32 − a31 a22 ).
2.8. DETERMINANT 61
1 2 3
3 1
2 1
2 3
2 3 1, det(A) = 1 · 2 2 − 2 · 1 2 + 3 · 1 2 = 4 − 2(3) + 3(1) = 1.
For A =
1 2 2
Exercise
2.8.3.
Find the determinant
of the following matrices.
1 2 7 8 3 0 0 1
1 a a2
0 4 3 2 0 2 0 5
i)
ii) 6 −7 1 0 iii) 1
b b2
.
0 0 2 3
1 c c2
0 0 0 5 3 2 0 6
It turns out that the determinant of a matrix equals the volume of the parallelepiped formed
using the columns of the matrix. With this understanding, the singularity of A gets related with
the dimension in which we are looking at the parallelepiped. For, example, the length makes
sense in one-dimension but it doesn’t make sense to talk of area (which is a two-dimensional
idea) of a line segment. Similarly, it makes sense to talk of volume of a cube but it doesn’t make
sense to talk of the volume of a square or rectangle or parallelogram which are two-dimensional
objects.
We now state a few properties of the determinant function. For proof, see Appendix 9.3.
T
AF
1. det(In ) = 1.
Thus, using Theorem 2.8.5, det(A) = 2 · (1 · 2 · (−1)) = −4, where the first 2 appears from the
1
elementary matrix E1 ( ).
2
Exercise 2.8.7. Prove the following without computing the determinant (use Theorem 2.8.5).
h i
1. Let A = u v 2u + 3v , where u, v ∈ R3 . Then, det(A) = 0.
62 CHAPTER 2. SYSTEM OF LINEAR EQUATIONS
a b c a e x2 a + xe + h
2. Let A = e f g . If x 6= 0 and B = b f x2 b + xf + j then det(A) = det(B).
h j ` c g x2 c + xg + `
3 1 2
Hence, conclude that 3 divides 4 7 1 .
1 4 −2
Remark 2.8.8. Theorem 2.8.5.3 implies that the determinant can be calculated by expanding
along any row. Hence, the readers are advised to verify that
n
X
det(A) = (−1)k+j akj det(A(k | j)), for 1 ≤ k ≤ n.
j=1
Example
Using Remark 2.8.8, one has
2.8.9.
2 2 6 1
2 2 1 2 2 6
0 0 2 1
= (−1)2+3 · 2 · 0 1 0 + (−1)2+4 · 0 1 2 = −2 · 1 + (−8) = −10.
0 1 2 0
1 2 1 1 2 1
1 2 1 1
Definition 2.8.10. Let A ∈ Mn (R). Then, the cofactor matrix, denoted Cof(A), is an Mn (R)
T
And, the Adjugate (classical Adjoint) of A, denoted Adj(A), equals CofT (A).
1 2 3
Example 2.8.11. Let A = 2 3 1. Then,
1 2 4
C C21 C31
T
11
Adj(A) = Cof (A) =
C12 C22 C32
C13 C23 C33
(−1)1+1 det(A(1|1)) (−1)2+1 det(A(2|1)) (−1)3+1 det(A(3|1))
= 1+2 det(A(1|2)) (−1)2+2 det(A(2|2)) (−1)3+2 det(A(3|2))
(−1)
(−1)1+3 det(A(1|3)) (−1)2+3 det(A(2|3)) (−1)3+3 det(A(3|3))
10 −2 −7
=
−7 1 5 .
1 0 −1
−1 0 0 det(A) 0 0
Now, verify that AAdj(A) =
0 −1 0 = 0
det(A) 0 = Adj(A)A.
0 0 −1 0 0 det(A)
2.8. DETERMINANT 63
x − 1 −2 −3
Consider xI3 − A =
−2 x − 3 −1 . Then,
−1 −2 x − 4
C C21 C31 x2 − 7x + 10 2x − 2 3x − 7
11
Adj(xI − A) = C12 C22
C32 = 2x − 7 x 2 − 5x + 1 x + 5
C13 C23 C33 x+1 2x x2 − 4x − 1
−7 2 3
2 2
= x I + x 2
−5 1 + Adj(A) = x I + Bx + C(say).
1 2 −4
That is, we have obtained a matrix identity. Hence, replacing x by A makes sense. But, then
the LHS is 0. So, for the RHS to be zero, we must have A3 − 8A2 + 10A − det(A)I = 0 (this
equality is famously known as the Cayley-Hamilton Theorem).
The next result relates adjugate matrix with the inverse, in case det(A) 6= 0.
T
AF
n n
aij (−1)i+j det(A(`|j)) = 0, for i 6= `.
P P
2. Then aij C`j =
j=1 j=1
Proof. Part 1: It follows directly from Remark 2.8.8 and the definition of the cofactor.
Part 2: Fix positive integers i, ` with 1 ≤ i 6= ` ≤ n. Suppose that the i-th and `-th rows of
B are equal to the i-th row of A and B[t, :] = A[t, :], for t 6= i, `. Since two rows of B are equal,
det(B) = 0. Now, let us expand the determinant of B along the `-th row. We see that
n
X
(−1)`+j b`j det B(` | j)
0 = det(B) = (2.8.2)
j=1
n
X
(−1)`+j aij det B(` | j)
= (bij = b`j = aij for all j)
j=1
Xn n
X
= (−1)`+j aij det A(` | j) = aij C`j . (2.8.3)
j=1 j=1
i=1
AF
The next result gives another equivalent condition for a square matrix to be invertible.
DR
1
Proof. Let A be non-singular. Then, det(A) 6= 0 and hence A−1 = det(A) Adj(A).
Now, let us assume that A is invertible. Then, using Theorem 2.7.1, A = E1 · · · Ek , a product
of elementary matrices. Thus, a repeated application of Parts 3, 4 and 5 of Theorem 2.8.5 gives
det(A) 6= 0.
The next result relates the determinant of a matrix with the determinant of its transpose. Thus,
the determinant can be computed by expanding along any column as well.
Theorem 2.8.16. Let A ∈ Mn (R). Then det(A) = det(AT ). Further, det(A∗ ) = det(A).
Proof. If A is singular then, by Theorem 2.8.15, A is not invertible. So, AT is also not invertible
and hence by Theorem 2.8.15, det(AT ) = 0 = det(A).
Now, let A be a non-singular and let AT = B. Then, by definition,
n
X n
X
det(AT ) = det(B) = (−1)1+j b1j det B(1 | j) = (−1)1+j aj1 det A(j | 1)
j=1 j=1
n
X
= aj1 Cj1 = det(A)
j=1
2.8. DETERMINANT 65
using Corollary 2.8.14. Further, using induction and the first part, one has
n
X
det(A∗ ) = det((A)T ) = det(A) = (−1)1+j a1j det A(1 | j)
j=1
n
X
= (−1)1+j a1j det A(1 | j) = det(A)
j=1
Case 2: Let A be singular. Then, by Theorem 2.8.15 A is"not#invertible. So, "by Proposi-
#
C1 C1
DR
2. Let A and B be two matrices having positive entries and of orders 1 × n and n × 1,
respectively. Which of BA or AB is invertible? Give reasons.
Consider the linear system Ax = b. Then, using Theorems 2.7.1 and 2.8.15, we conclude
that Ax = b has a unique solution for every b if and only if det(A) 6= 0. The next theorem,
commonly known as the Cramer’s rule gives a direct method of finding the solution of the
linear system Ax = b when det(A) 6= 0.
T
AF
Theorem 2.8.20. Let A be an n × n non-singular matrix. Then, the unique solution of the
DR
det(Aj )
xj = , for j = 1, 2, . . . , n,
det(A)
where Aj is the matrix obtained from A by replacing the j-th column of A, namely A[:, j], by b.
Proof. Since det(A) 6= 0, A is invertible. Thus A−1 [A | b] = [I | A−1 b]. Let d = A−1 b. Then
Ax = b has the unique solution xj = dj , for 1 ≤ j ≤ n. Thus,
−1 −1
A Aj = A A[:, 1], . . . , A[:, j − 1], b, A[:, j + 1], . . . , A[:, n]
−1 −1 −1 −1 −1
= A A[:, 1], . . . , A A[:, j − 1], A b, A A[:, j + 1], . . . , A A[:, n]
det(Aj )
Hence, xj = and the required result follows.
det(A)
2.9. MISCELLANEOUS EXERCISES 67
1 2 3 1
Example 2.8.21. Solve Ax = b using Cramer’s rule, where A = 2 3 1 and b = 1
.
1 2 2 1
T
Solution: Check that det(A) = 1 and x = [−1, 1, 0] as
1 2 3 1 1 3 1 2 1
x1 = 1 3 1 = −1, x2 = 2 1 1 = 1, and x3 = 2 3 1 = 0.
1 2 2 1 1 2 1 2 1
2. Let A be a unitary matrix then what can you say about | det(A) |?
Ans: Using Theorem 2.8.16 and A unitary ⇒ AA∗ = I ⇒ | det(A)|2 = 1 ⇒ | det(A)| = ±1.
(c) det(A) = 0.
DR
10. Determine necessary and sufficient condition for a triangular matrix to be invertible.
68 CHAPTER 2. SYSTEM OF LINEAR EQUATIONS
11. Let A and B be two non-singular matrices. Are the matrices A + B and A − B non-
singular? Justify your answer.
12. For what value(s) of λ does the following systems have non-trivial solutions? Also, for
each value of λ, determine a non-trivial solution.
14. Let A = [aij ] ∈ Mn (R) with aij = max{i, j}. Prove that det A = (−1)n−1 n.
15. Let p ∈ R, p 6= 0. Let A = [aij ], B = [bij ] ∈ Mn (R) with bij = pi−j aij , for 1 ≤ i, j ≤ n.
Then, compute det(B) in terms of det(A).
16. The position of an element aij of a determinant is called even or odd according as i + j is
even or odd. Prove that if all the entries in
(a) odd positions are multiplied with −1 then the value of determinant doesn’t change.
(b) even positions are multiplied with −1 then the value of determinant
T
AF
2.10 Summary
In this chapter, we started with a system of m linear equations in n variables and formally
wrote it as Ax = b and in turn to the augmented matrix [A | b]. Then, the basic operations on
equations led to multiplication by elementary matrices on the right of [A | b]. These elementary
matrices are invertible and applying the GJE on a matrix A, resulted in getting the RREF of
A. We used the pivots in RREF matrix to define the rank of a matrix. So, if Rank(A) = r and
Rank([A | b]) = ra
We have also seen that the following conditions are equivalent for A ∈ Mn (R).
1. A is invertible.
7. Rank(A) = n.
8. det(A) 6= 0.
1. Solving the linear system Ax = b. This idea will lead to the question “is the vector b a
linear combination of the columns of A”?
2. Solving the linear system Ax = 0. This will lead to the question “are the columns of A
linearly independent/dependent”? In particular, we will see that
(a) if Ax = 0 has a unique solution then the columns of A are linear independent.
(b) if Ax = 0 has a non-trivial solution then the columns of A are linearly dependent.
T
AF
DR
70 CHAPTER 2. SYSTEM OF LINEAR EQUATIONS
T
AF
DR
Chapter 3
Vector Spaces
In this chapter, we will mainly be concerned with finite dimensional vector spaces over R or C.
Please note that the real and complex numbers have the property that any pair of elements can
be added, subtracted or multiplied. Also, division is allowed by a non-zero element. Such sets in
mathematics are called field. So, Q, R and C are examples of field and they have infinite number
of elements. But, in mathematics, we do have fields that have only finitely many elements. For
example, consider the set Z5 = {0, 1, 2, 3, 4}. In Z5 , we define addition and multiplication,
respectively, as
+ 0 1 2 3 4 · 0 1 2 3 4
T
AF
0 0 1 2 3 4 0 0 0 0 0 0
1 1 2 3 4 0 1 0 1 2 3 4
DR
and .
2 2 3 4 0 1 2 0 2 4 1 3
3 3 4 0 1 2 3 0 3 1 4 2
4 4 0 1 2 3 4 0 4 3 2 1
Then, we see that the elements of Z5 can be added, subtracted and multiplied. Note that 4
behaves as −1 and 3 behaves as −2. Thus, 1 behaves as −4 and 2 behaves as −3. Also, we see
that in this multiplication 2 · 3 = 1 and 4 · 4 = 1. Hence,
1. the division by 2 is similar to multiplying by 3,
2. the division by 3 is similar to multiplying by 2, and
3. the division by 4 is similar to multiplying by 4.
Thus, Z5 indeed behaves like a field. So, in this chapter, F will represent a field.
71
72 CHAPTER 3. VECTOR SPACES
(a) α · (u + v) = (α · u) + (α · v).
(b) (α + β) · u = (α · u) + (β · u).
So, we want the above properties to hold for any collection of vectors. Thus, formally, we have
the following definition.
Definition 3.1.1. A vector space V over F, denoted V(F) or in short V (if the field F is clear
from the context), is a non-empty set, in which one can define vector addition, scalar multipli-
T
AF
cation. Further, with these definitions, the properties of vector addition, scalar multiplication
and distributive laws (see items 1, 2 and 3 above) are satisfied.
DR
w1 = w1 + 0 = w1 + (u + w2 ) = (w1 + u) + w2 = 0 + w2 = w2 .
Hence, we represent this unique vector by −u and call it the additive inverse.
5. If V is a vector space over R then V is called a real vector space.
6. If V is a vector space over C then V is called a complex vector space.
7. In general, a vector space over R or C is called a linear space.
Some interesting consequences of Definition 3.1.1 is stated next. Intuitively, they seem
obvious. The proof are given for better understanding of the given conditions.
1. u + v = u implies v = 0.
3.1. VECTOR SPACES: DEFINITION AND EXAMPLES 73
Proof. Part 1: By Condition 1d and Remark 3.1.2.4, for each u ∈ V there exists −u ∈ V such
that −u + u = 0. Hence u + v = u implies
0 = −u + u = −u + (u + v) = (−u + u) + v = 0 + v = v.
Example 3.1.4. The readers are advised to justify the statements given below.
2. Let A ∈ Mm,n (F) and define V = {x ∈ Mn,1 (F) : Ax = 0}. Then, by Theorem 2.1.7, V
T
AF
satisfies:
DR
(a) 0 ∈ V as A0 = 0.
(b) if x ∈ V then αx ∈ V, for all α ∈ F. In particular, for α = −1, −x ∈ V.
(c) if x, y ∈ V then, for any α, β ∈ F, αx + βy ∈ V.
3. Consider R with the usual addition and multiplication. Then R forms a real vector space.
√
Recall that the symbol i represents the complex number −1.
6. Fix m, n ∈ N and let Mm,n (C) = {Am×n = [aij ] | aij ∈ C}. Then, with usual addition
and scalar multiplication of matrices, Mm,n (C) is a complex vector space. If m = n, the
vector space Mm,n (C) is denoted by Mn (C).
8. Fix a, b ∈ R with a < b and let C([a, b], R) = {f : [a, b] → R | f is continuous}. Then,
C([a, b], R) with (f + αg)(x) = f (x) + αg(x), for all x ∈ [a, b], is a real vector space.
10. Fix a < b ∈ R and let C 2 ((a, b), R) = {f : (a, b) → R | f 00 is continuous}. Then,
C 2 ((a, b), R) with (f + αg)(x) = f (x) + αg(x), for all x ∈ (a, b), is a real vector space.
11. Let R[x] = {a0 + a1 x + · · · + an xn | ai ∈ R, for 0 ≤ i ≤ n}. Now, let p(x), q(x) ∈ R[x].
T
AF
and αp(x) = (αa0 ) + (αa1 )x + · · · + (αam )xm , for α ∈ R. With these operations
“component-wise addition and multiplication”, it can be easily verified that R[x] forms a
real vector space.
12. Fix n ∈ N and let R[x; n] = {p(x) ∈ R[x] | p(x) has degree ≤ n}. Then, with component-
wise addition and multiplication, the set R[x; n] forms a real vector space.
13. Let V and W be vector spaces over F, with operations (+, •) and (⊕, ), respectively. Let
V × W = {(v, w) | v ∈ V, w ∈ W}. Then, V × W forms a vector space over F, if for every
(v1 , w1 ), (v2 , w2 ) ∈ V × W and α ∈ R, we define
v1 +v2 and w1 ⊕w2 on the right hand side mean vector addition in V and W, respectively.
Similarly, α • v1 and α w1 correspond to scalar multiplication in V and W, respectively.
Note that R2 is similar to R × R, where the operations are the same in both spaces.
(a) R is a vector space over Q. In this space, all the irrational numbers are vectors but
not scalars.
√
(b) V = {a + b 2 : a, b ∈ Q} is a vector space.
√ √ √
(c) V = {a + b 2 + c 3 + d 6 : a, b, c, d ∈ Q} is a vector space.
√
(d) V = {a + b −3 : a, b ∈ Q} is a vector space.
Then, R2 is a real vector space with (−1, 3)T as the additive identity.
17. Recall the field Z5 = {0, 1, 2, 3, 4} given on the first page of this chapter. Then, V =
{(a, b) | a, b ∈ Z5 } is a vector space over Z5 having 25 elements/vectors.
From now on, we will use ‘u + v’ for ‘u ⊕ v’ and ‘αu or α · u’ for ‘α u’.
T
AF
Exercise 3.1.6. 1. Verify that the vectors spaces mentioned in Example 3.1.4 do satisfy all
DR
Then, does V form a vector space under any of the two operations?
Definition 3.1.7. Let V be a vector space over F. Then, a non-empty subset W of V is called
a subspace of V if W is also a vector space with vector addition and scalar multiplication in
W coming from that in V (compute the vector addition and scalar multiplication in V and then
the computed vector should be an element of W).
Example 3.1.8.
3. Let V be a vector space. Then V and {0} are subspaces, called trivial subspaces.
76 CHAPTER 3. VECTOR SPACES
4. The real vector space R has no non-trivial subspace. To check this, let V 6= {0} be a
vector subspace of R. Then, there exists x ∈ R, x 6= 0 such that x ∈ V. Now, using scalar
multiplication, we see that {αx | α ∈ R} ⊆ V. As, x 6= 0, the set {αx | α ∈ R} = R. This
in turn implies that V = R.
8. Is the set of sequences converging to 0 a subspace of the set of all bounded sequences?
Let V(F) be a vector space and W ⊆ V, W 6= ∅. We now prove a result which implies that
to check W to be a subspace, we need to verify only one condition.
T
4. The commutative and associative laws of vector addition hold as they hold in V.
5. The conditions related with scalar multiplication and the distributive laws also hold as
they hold in V.
Exercise 3.1.10. 1. Prove that a line in R2 is a subspace if and only if it passes through
origin.
3. Does the set V given below form a subspace? Give reasons for your answer.
(f ) W = {A ∈ Mn (R) | AT = 2A}?
9. Among the following, determine the subspaces of the complex vector space Cn ?
(b) {(z1 , z2 , . . . , zn )T | z1 + z2 = z3 }.
Let us recollect that system Ax = b was either consistent (has a solution) or inconsistent (no
solution). It turns out that the system Ax = b is consistent leads to the idea that the vector b
is a linear combination of the columns of A. Let us try to understand them using examples.
Example 3.2.1.
78 CHAPTER 3. VECTOR SPACES
1 1 2 2 1 1 2
1. Let A = 1 2 and b = 3. Then, 3 = 1 + 2
. Thus,
3 is a linear
1 3 4 4 1 3 4
1 1 10
combination of the vectors in S = 1, 2 . Similarly, the vector
16 is a linear
1
3
22
10 1 1 " #
4
combination of the vectors in S as
16 = 41 + 62 = A 6 .
22 1 3
2 1 1 2
2. Let b =
3. Then, the system Ax = b has no solution as REF ([A b]) = 0 1 1.
5 0 0 1
Definition 3.2.2. Let V be a vector space over F and let S = {u1 , . . . , un } ⊆ V. Then, a
vector u ∈ V is called a linear combination of elements of S if we can find α1 , . . . , αn ∈ F
such that
n
X
u = α 1 u1 + · · · + α n un = α i ui .
T
i=1
AF
n
αi ui , where α1 , . . . , αn ∈ F, is said to be a linear
P
Or equivalently, any vector of the form
DR
i=1
combination of the elements of S.
Example 3.2.3.
1. (3, 4, 5) is not a linear combination of (1, 1, 1) and (1, 2, 1) as the linear system (3, 4, 5) =
a(1, 1, 1) + b(1, 2, 1), in the unknowns a and b has no solution.
Exercise 3.2.4. 1. Let x ∈ R3 . Prove that xT is a linear combination of (1, 0, 0), (2, 1, 0)
and (3, 3, 1).
3.2. LINEAR COMBINATION AND LINEAR SPAN 79
1 2 3 a
T
Ans: Let x = (x, y, z) and A = 0 1 3. To show: A b
= x in the unknowns a, b
0 0 1 c
and c has a solution for every x. True, as RREF(A) = I3 .
Let V be a vector space over F and S a subset of V. We now look at ‘linear span’ of a collection
of vectors. So, here we ask “what is the largest collection of vectors that can be obtained as
linear combination of vectors from S”? Or equivalently, what is the smallest subspace of V that
contains S? We first look at an example for clarity.
Example 3.2.5. Let S = {(1, 0, 0), (1, 2, 0)} ⊆ R3 . We want the largest possible subspace
of R3 which contains vectors of the form α(1, 0, 0), β(1, 2, 0) and α(1, 0, 0) + β(1, 2, 0) for all
possible choices of α, β ∈ R. Note that
T
AF
2. `2 = {β(1, 2, 0) : β ∈ R} gives the line passing through (0, 0, 0) and (1, 2, 0).
So, we want the largest subspace of R3 that contains vectors which are formed as sum of
any two points on the two lines `1 and `2 . Or the smallest subspace of R3 that contains S? We
give the definition next.
That is, LS(S) is the set of all possible linear combinations of finitely many vectors of S.
If S is an empty set, we define LS(S) = {0}.
2. V is said to be finite dimensional if there exists a finite set S such that V = LS(S).
3. If there does not exist any finite subset S of V such that V = LS(S) then V is called
infinite dimensional.
3. S = {1 + 2x + 3x2 , 1 + x + 2x2 , 1 + 2x + x3 }.
Solution: To understand LS(S), we need to find condition(s) on α, β, γ, δ such that the
linear system
Note that, for every fixed n ∈ N, R[x; n] is finite dimensional as R[x; n] = LS ({1, x, . . . , xn }).
0 1 1 0 1 2
4. S = I3 , 1 1 2, 1 0 2 ⊆ M3 (R).
1 2 0 2 2 4
Solution: To get the equation, we need to find conditions on aij ’s such that the system
α β+γ β + 2γ a11 a12 a13
β+γ α + β 2β + 2γ = a21 a22 a23 ,
β + 2γ 2β + 2γ α + 2γ a31 a32 a33
in the unknowns α, β, γ is always consistent. Now, verify that the required condition
equals
a22 + a33 − a13
LS(S) = {A = [aij ] ∈ M3 (R) | A = AT , a11 = ,
2
a22 − a33 + 3a13 a22 − a33 + 3a13
a12 = , a23 = .
4 2
In general, for each fixed m, n ∈ N, the vector space Mm,n (R) is finite dimensional as
Mm,n (R) = LS ({eij : 1 ≤ i ≤ m, 1 ≤ j ≤ n}).
3.2. LINEAR COMBINATION AND LINEAR SPAN 81
5. C[x] is not finite dimensional as the degree of a polynomial can be any large positive
integer. Indeed, verify that C[x] = LS({1, x, x2 , . . . , xn , . . .}).
Exercise 3.2.8. Determine the equation of the geometrical object represented by LS(S).
1. S = {π} ⊆ R.
Ans: R
4. S = {(1, 0, 1)T , (0, 1, 0)T , (2, 0, 2)T } ⊆ R3 . Give two examples of vectors u, v different
from the given set such that LS(S) = LS(u, v).
Ans: {(x, y, z) ∈ R3 : x = z}. Take u = (1, 2, 1)T and v = (2, 1, 2)T .
0 1 0 0 0 1 0 1 1
AF
0 1 −1 0 0 ⊆ M3 (R).
6. S = −1 0 1 0
, ,
DR
0 −1 0
−1 −1 0 −1 0 0
9. S = {1, x, x2 , . . .} ⊆ C[x].
Ans: The whole space C[x].
Lemma 3.2.9. Let V be a vector space over F with S ⊆ V. Then LS(S) is a subspace of V.
Theorem 3.2.11. Let V be a vector space over F and S ⊆ V. Then LS(S) is the smallest
subspace of V containing S.
Proof. For every u ∈ S, u = 1 · u ∈ LS(S). Thus, S ⊆ LS(S). Need to show that LS(S) is the
smallest subspace of V containing S. So, let W be any subspace of V containing S. Then, by
Exercise 3.2.10, LS(S) ⊆ W and hence the result follows.
Definition 3.2.12. Let V be a vector space over F and S, T be two subsets of V. Then, the
sum of S and T , denoted S + T equals {s + t| s ∈ S, t ∈ T }.
Example 3.2.13.
1. If V = R, S = {0, 1, 2, 3, 4, 5, 6} and T = {5, 10, 15} then S + T = {5, 6, . . . , 21}.
(" #) (" #) (" #)
1 −1 0
2. If V = R , S =
2 and T = then S + T = .
1 1 2
(" #) " #! (" # " # )
1 −1 1 −1
3. If V = R2 , S = and T = LS then S + T = +c |c∈R .
1 1 1 1
Exercise 3.2.14. Let P and Q be two non-trivial, distinct subspaces of R2 . Then P + Q = R2 .
Ans: Since P and Q are non-trivial both of them " are
# lines passing through 0. As they
" are
T
" # #
a c a c
AF
" #
x1
is invertible (ad − bc 6= 0). Hence Ax = b, for the unknown vector x = , has a solution for
x2
" # " #
b1 x1
every b = ∈ R2 . Therefore, if x = is a solution then b = x1 u + x2 v.
b2 x2
We leave the proof of the next result for readers.
Lemma 3.2.15. Let P and Q be two subspaces of a vector space V over F. Then P + Q is a
subspace of V. Furthermore, P + Q is the smallest subspace of V containing both P and Q.
5. Let S = {x1 , x2 , x3 , x4 }, where x1 = (1, 0, 0)T , x2 = (1, 1, 0)T , x3 = (1, 2, 0)T and x4 =
(1, 1, 1)T . Then, determine all xi such that LS(S) = LS(S \ {xi }).
6. Let W = LS((1, 0, 0)T , (1, 1, 0)T ) and U = LS((1, 1, 1)T ). Prove that W + U = R3 and
W ∩ U = {0}. If v ∈ R3 , determine w ∈ W and u ∈ U such that v = w + u. Is it
necessary that w and u are unique?
T
AF
7. Let W = LS((1, −1, 0), (1, 1, 0)) and U = LS((1, 1, 1), (1, 2, 1)). Prove that W + U = R3
and W ∩ U 6= {0}. Find v ∈ R3 such that v = w + u, for 2 different choices of w ∈ W
and u ∈ U. Thus, the choice of vectors w and u is not unique.
Ans: Verify that (0, 1, 0) ∈ W∩U. Thus, Y -axis belongs to both the subspaces. Fir example,
(0, 1, 0) = 21 (1, 1, 0) − 21 (1, −1, 0) = (1, 2, 1) − (1, 1, 1).
8. Let S = {(1, 1, 1, 1)T , (1, −1, 1, 2)T , (1, 1, −1, 1)T } ⊆ R4 . Does (1, 1, 2, 1)T ∈ LS(S)? Fur-
thermore, determine conditions on x, y, z and u such that (x, y, z, u)T ∈ LS(S).
Ans: (1, 1, 2, 1) = 23 (1, 1, 1, 1)− 21 (1, 1, −1, 1). L(S) = {(x, y, z, u)T ∈ R4 : 3x−y−2u = 0}.
Example 3.3.1.
84 CHAPTER 3. VECTOR SPACES
1 1
1. Let A = 1 2 . Then Ax = 0 has only the trivial solution. So, we say that the columns
1 3
1 1
of A are linearly independent. Thus, the set S = 1, 2 , consisting of columns of
1
3
A, is linearly independent.
1 1 2 1 1 2
2. Let A = 1 2 3. As REF (A) = 0 1 1, Ax = 0 has only the trivial solution.
1 3 5 0 0 1
1 1 2
Hence, the set S = 1, 2, 3 , consisting of columns of A, is linearly independent.
1
3 5
1 1 2 1 1 2
3. Let A = 1 2 3. As REF (A) = 0 1
, Ax = 0 has a non-trivial solution. Hence,
1
1 3 4 0 0 0
1 1 2
the set S = 1 , 2, 3 , consisting
of columns of A, is linearly dependent.
T
1
3 4
AF
α1 u1 + α2 u2 + · · · + αm um = 0, (3.3.1)
in the unknowns αi ’s, 1 ≤ i ≤ m, has only the trivial solution. If Equation (3.3.1) has a
non-trivial solution then S is said to be linearly dependent. If S has infinitely many vectors
then S is said to be linearly independent if for every finite subset T of S, T is linearly
independent.
Observe that we are solving a linear system over F. Hence, whether a set is linearly inde-
pendent or linearly dependent depends on the set of scalars.
Example 3.3.3.
1. Consider C2 as a vector space over R. Let S = {(1, 2)T , (i, 2i)T }. Then, the linear system
a · (1, 2)T + b · (i, 2i)T = (0, 0)T , in the unknowns a, b ∈ R has only the trivial solution,
namely a = b = 0. So, S is a linear independent subset of the vector space C2 over R.
2. Consider C2 as a vector space over C. Then S = {(1, 2)T , (i, 2i)T } is a linear dependent
subset of the vector space C2 over C as a = −i and b = 1 is a non-trivial solution.
3.3. LINEAR INDEPENDENCE 85
3. Let V be the vector space of all real valued continuous functions with domain [−π, π].
Then V is a vector space over R. Question: What can you say about the linear indepen-
dence or dependence of the set S = {1, sin(x), cos(x)}?
Solution: For all x ∈ [−π, π], consider the system
h ia
1 sin(x) cos(x) b = 0 ⇔ a · 1 + b · sin(x) + c · cos(x) = 0, (3.3.2)
c
in the unknowns a, b and c. Even though we seem to have only one linear system, we we
can obtain the following two linear systems (the first using differentiation and the second
π
using evaluation at 0, and π of the domain).
2
a + b sin x + c cos x =0
a+c =0
0 · a + b cos x − c sin x = 0 or a+b =0
0 · a − b sin x − c cos x = 0 a−c =0
Clearly, the above systems has only the trivial solution. Hence, S is linearly independent.
4. Let A ∈ Mm,n (C). If Rank(A) < m then, the rows of A are linearly dependent. " #
C
Solution: As Rank(A) < m, there exists an invertible matrix P such that P A = .
0
T
m
AF
Thus, 0T = (P A)[m, :] =
P
pmi A[i, :]. As P is invertible, at least one pmi 6= 0. Thus, the
i=1
required result follows.
DR
5. Let A ∈ Mm,n (C). If Rank(A) < n then, the columns of A are linearly dependent.
Solution: As Rank(A) < n the system Ax = 0 has a non-trivial solution.
6. Let S = {0}. Is S linearly independent?
Solution: Let u = 0. So, consider the system αu = 0. This has a non-trivial solution
α = 1 as 1 · 0 = 0.
(" # " #) " #
0 1 0 1
7. Let S = , . Then Ax = 0 corresponds to A = . This has a non-trivial
0 2 0 2
" #
1
solution x = . Hence, S is linearly dependent.
0
(" #)
1
8. Let S = . Is S linearly independent?
2
" #
1
Solution: Let u = . Then the system αu = 0 has only the trivial solution. Hence S
2
is linearly independent.
So, we observe that 0, the zero-vector cannot belong to any linearly independent set. Fur-
ther, a set consisting of a single non-zero vector is linearly independent.
Exercise 3.3.4. 1. Show that S = {(1, 2, 3)T , (−2, 1, 1)T , (8, 6, 10)T } ⊆ R3 is linearly de-
pendent.
86 CHAPTER 3. VECTOR SPACES
1 −2 8
Ans: Let A =
2 1 6 . The det(A) = 0. So, linearly dependent.
3 1 10
2. Let A ∈ Mn (R). Suppose x, y ∈ Rn \ {0} such that Ax = 3x and Ay = 2y. Then, prove
that x and y are linearly independent.
Ans: Consider the linear system ax + by = 0 in the unknowns a and b. Multiplying by A
and using the given conditions, we get 0 = 3ax + 2by. Thus, ax = 0. As x 6= 0 ⇒ a = 0.
Thus b = 0.
2 1 3
4 −1 3. Determine x, y, z ∈ R \ {0} such that Ax = 6x, Ay = 2y and
3
3. Let A =
3 −2 5
Az = −2z. Use the vectors x, y and z obtained above to prove the following.
Ans: (a) A2 v = A (A(cy + dz)) = A (2cy − 2dz) = 2cAy − 2dAz = 4(cy + dz) = 4v.
(b) Consider the system ax+by+cz = 0 in the unknowns a, b and c. Multiply by A to get
DR
6ax+2by−2cz
= 0 and again
multiply
this equation
get 62 ax+22 by+(−2)2 cz =
by A to
1 1 1 ax ax 0
0. Thus 6 2
−2 by = 0. Hence, by = 0
implies result. (b) implies
62 22 (−2)2 cz cz 0
(c). (d) follows using matrix multiplication.
We now prove a couple of results which will be very useful in the next section.
Lemma 3.3.7. Let S be a linearly independent subset of a vector space V over F. Then, each
v ∈ LS(S) is a unique linear combination of vectors from S.
Proof. Suppose there exists v ∈ LS(S) with v ∈ LS(T1 ), LS(T2 ) with T1 , T2 ⊆ S. Let T1 =
{v1 , . . . , vk } and T2 = {w1 , . . . , w` }, for some vi ’s and wj ’s in S. Define T = T1 ∪ T2 . Then,
T is a subset of S. Hence, using Proposition 3.3.5, the set T is linearly independent. Let T =
{u1 , . . . , up }. Then, there exist αi ’s and βj ’s in F, not all zero, such that v = α1 u1 + · · · + αp up
as well as v = β1 u1 + · · · + βp up . Equating the two expressions for v gives
So,
w1 a11 u1 + · · · + a1k uk a11 · · · a1k u1
. . . . . .
. = .. = . .. .. .. .
. .
wm am1 u1 + · · · + amk uk am1 · · · amk uk
Proof. Observe that Rn = LS({e1 , . . . , en }), where ei = In [:, i], is the i-th column of In . Hence,
using Theorem 3.3.8, the required result follows.
Theorem 3.3.10. Let S be a linearly independent subset of a vector space V over F. Then, for
any v ∈ V the set S ∪ {v} is linearly dependent if and only if v ∈ LS(S).
Proof. Let us assume that S ∪ {v} is linearly dependent. Then, there exist vi ’s in S such that
the linear system
α1 v1 + · · · + αp vp + αp+1 v = 0 (3.3.4)
Now, assume that v ∈ LS(S). Then, there exists vi ∈ S and ci ∈ F, not all zero, such that
p
P
v= ci vi . Thus, the linear system α1 v1 + · · · + αp vp + αp+1 v = 0 in the variables αi ’s has a
T
i=1
non-trivial solution [c1 , . . . , cp , −1]. Hence, S ∪ {v} is linearly dependent.
AF
We now state a very important corollary of Theorem 3.3.10 without proof. This result can
DR
Corollary 3.3.11. Let V be a vector space over F and let S be a subset of V containing a
non-zero vector u1 .
1. If S is linearly dependent then, there exists k such that LS(u1 , . . . , uk ) = LS(u1 , . . . , uk−1 ).
Or equivalently, if S is a linearly dependent set then there exists a vector uk , for k ≥ 2,
which is a linear combination of the previous vectors.
As an application, we have the following result about finite dimensional vector spaces. We
leave the proof for the reader as it directly follows from Corollary 3.3.11 and the idea that an
algorithm has to finally stop if it has finite number of steps to implement.
Exercise 3.3.13.
1. Prove Corollary 3.3.11.
2. Let V and W be subspaces of Rn such that V + W = Rn and V ∩ W = {0}. Prove that
each u ∈ Rn is uniquely expressible as u = v + w, where v ∈ V and w ∈ W.
3. Let W be a subspace of a vector space V over F. For u, v ∈ V \ W, define K = LS(W, u)
and M = LS(W, v). Then, prove that v ∈ K if and only if u ∈ M .
4. Suppose V is a vector space over R as well as over C. Then, prove that {u1 , . . . , uk }
is a linearly independent subset of V over C if and only if {u1 , . . . , uk , iu1 , . . . , iuk } is a
linearly independent subset of V over R.
5. Is the set {1, x, x2 , . . .} a linearly independent subset of the vector space C[x] over C?
6. Is the set {eij | 1 ≤ i ≤ m, 1 ≤ j ≤ n} a linearly independent subset of the vector space
Mm,n (C) over C (see Definition 1.4.1.1)?
In this subsection, we use the understanding of vector spaces to relate the rank of a matrix
with linear independence and dependence of rows and columns of a matrix. We start with our
T
AF
2. As the 1-st, 2-nd and 4-th columns of B are linearly independent, the set consisting of
corresponding columns {A[:, 1], A[:, 2], A[:, 4]} is linearly independent.
3. Also, note that during the application of GJE, the 3-rd and 4-th rows were interchanged.
Hence, the rows A[1, :], A[2, :] and A[4, :] are linearly independent.
1. A is invertible.
αn
(b) If S2 is linearly independent then prove that A is invertible. Further, in this case,
the set S1 is necessarily linearly independent.
3.4. BASIS OF A VECTOR SPACE 91
h iT
Ans: Suppose A is not invertible. Then there exists x0 = x01 · · · x0n 6= 0 such
that Ax0 = 0. Thus, we have obtained x0 6= 0 such that
x01 x01
.. ..
h i h i h i
w1 · · · wn .
= A
u1 · · · un .
= u1 · · · un 0 = 0,
x0n x0n
Example 3.4.2. Let T = {2, 3, 4, 7, 8, 10, 12, 13, 14, 15}. Then, a maximal subset of T of
consecutive integers is S = {2, 3, 4}. Other maximal subsets are {7, 8}, {10} and {12, 13, 14, 15}.
Note that {12, 13} is not maximal. Why?
Definition 3.4.3. Let V be a vector space over F. Then, S is called a maximal linearly
independent subset of V if
1. S is linearly independent and
2. no proper superset of S in V is linearly independent.
Example 3.4.4.
1. In R3 , the set S = {e1 , e2 } is linearly independent but not maximal as S ∪ {(1, 1, 1)T } is
a linearly independent set containing S.
2. In R3 , S = {(1, 0, 0)T , (1, 1, 0)T , (1, 1, −1)T } is a maximal linearly independent set as S is
linearly independent and any collection of 4 or more vectors from R3 is linearly dependent
(see Corollary 3.3.9).
3. Let S = {v1 , . . . , vk } ⊆ Rn . Now, form the matrix A = [v1 , . . . , vk ] and let B =
RREF(A). Then, using Theorem 3.3.14, we see that if B[:, i1 ], . . . , B[:, ir ] are the piv-
otal columns of B then {vi1 , . . . , vir } is a maximal linearly independent subset of S.
4. Is the set {1, x, x2 , . . .} a maximal linearly independent subset of C[x] over C?
92 CHAPTER 3. VECTOR SPACES
Theorem 3.4.5. Let V be a vector space over F and S a linearly independent set in V. Then,
S is maximal linearly independent if and only if LS(S) = V.
Proof. Let v ∈ V. As S is linearly independent, using Corollary 3.3.11.2, the set S ∪ {v} is
linearly independent if and only if v ∈ V \ LS(S). Thus, the required result follows.
Let V = LS(S) for some set S with | S | = k. Then, using Theorem 3.3.8, we see that if
T ⊆ V is linearly independent then | T | ≤ k. Hence, a maximal linearly independent subset
of V can have at most k vectors. Thus, we arrive at the following important result.
Theorem 3.4.6. Let V be a vector space over F and let S and T be two finite maximal linearly
independent subsets of V. Then | S | = | T | .
Proof. By Theorem 3.4.5, S and T are maximal linearly independent if and only if LS(S) =
V = LS(T ). Now, use the previous paragraph to get the required result.
Let V be a finite dimensional vector space. Then, by Theorem 3.4.6, the number of vectors
in any two maximal linearly independent set is the same. We use this number to now define
the dimension of a vector space.
T
Definition 3.4.7. Let V be a finite dimensional vector space over F. Then, the number of
AF
vectors in any maximal linearly independent set is called the dimension of V, denoted dim(V).
DR
By convention, dim({0}) = 0.
Example 3.4.8.
1. As {1} is a maximal linearly independent subset of R, dim(R) = 1.
2. As {e1 , . . . , en } is a maximal linearly independent subset in Rn , dim(Rn ) = n.
3. As {e1 , . . . , en } is a maximal linearly independent subset in Cn over C, dim(Cn ) = n.
4. Using Exercise 3.3.13.4, {e1 , . . . , en , ie1 , . . . , ien } is a maximal linearly independent subset
in Cn over R. Thus, as a real vector space, dim(Cn ) = 2n.
5. As {eij | 1 ≤ i ≤ m, 1 ≤ j ≤ n} is a maximal linearly independent subset of Mm,n (C) over
C, dim(Mm,n (C)) = mn.
Definition 3.4.9. Let V be a finite dimensional vector space over F. Then, a maximal linearly
independent subset of V is called a basis of V. The vectors in a basis are called basis vectors.
By convention, a basis of {0} is the empty set.
Thus, using Theorem 3.3.12 we see that every finite dimensional vector space has a basis.
Remark 3.4.10 (Standard Basis). The readers should verify the statements given below.
1. All the maximal linearly independent set given in Example 3.4.8 form the standard basis
of the respective vector space.
3.4. BASIS OF A VECTOR SPACE 93
3. Fix a positive integer n. Then {1, x, x2 , . . . , xn } is the standard basis of R[x; n] over R.
5. Let V = {A ∈ Mn (R) | AT = −A}. Then, V is a vector space over R with standard basis
{eij − eji | 1 ≤ i < j ≤ n}.
Definition 3.4.11. Let V be a vector space over F. Then, a subset S of V is called minimal
spanning if LS(S) = V and no proper subset of S spans V.
Example 3.4.12.
5. Let S = {a1 , . . . , an }. Then, RS is a real vector space (see Example 3.1.4.7). For 1 ≤ i ≤ n,
define the functions (
1 if j = i
ei (aj ) = .
0 otherwise
Theorem 3.4.13. Let V be a non-zero vector space over F. Then, the following statements are
equivalent.
Remark 3.4.14. Let B be a basis of a vector space V over F. Then, for each v ∈ V, there exist
n
unique ui ∈ B and unique αi ∈ F, for 1 ≤ i ≤ n, such that v =
P
αi ui .
i=1
The next result is generally known as “every linearly independent set can be extended to
form a basis of a finite dimensional vector space”. Also, recall Theorem 3.3.12.
Theorem 3.4.15. Let V be a vector space over F with dim(V) = n. If S is a linearly independent
subset of V then there exists a basis T of V such that S ⊆ T .
Proof. If LS(S) = V, done. Else, choose u1 ∈ V \ LS(S). Thus, by Corollary 3.3.11.2, the set
T
AF
S ∪{u1 } is linearly independent. We repeat this process till we get n vectors in T as dim(V) = n.
By Theorem 3.4.13, this T is indeed a required basis.
DR
We end this section with an algorithm which is based on the proof of the previous theorem.
in more than one way as a linear combination of vectors from S? Is it possible to get a
subset T of S such that T is a basis of V over F? Give reasons for your answer.
4. Let {v1 , . . . , vn } be a basis of Cn . Then, prove that the two matrices B = [v1 , . . . , vn ] and
v1T
.
C= .
. are invertible.
T
vnT
AF
the system has only the trivial solution. So, using Theorem 2.7.1 B is invertible. A similar
idea implies that C is invertible.
5. Let W1 and W2 be two subspaces of a finite dimensional vector space V such that W1 ⊆ W2 .
Then, prove that W1 = W2 if and only if dim(W1 ) = dim(W2 ).
6. Let W1 be a subspace of a finite dimensional vector space V over F. Then, prove that
there exists a subspace W2 of V such that
Also, prove that for each v ∈ V there exist unique vectors w1 ∈ W1 and w2 ∈ W2 with
v = w1 + w2 . The subspace W2 is called the complementary subspace of W1 in V and
we write V = W1 ⊕ W2 .
7. Let V be a finite dimensional vector space over F. If W1 and W2 are two subspaces of V
such that W1 ∩W2 = {0} and dim(W1 )+dim(W2 ) = dim(V) then prove that W1 +W2 = V.
Ans: Let S1 and S2 be the bases of W1 and W2 , respectively. As W1 ∩W2 = {0}, S = S1 ∪S2
is also linearly independent. Now, use Exercise 3.3a to get the result as S has dim(V) vectors.
8. Consider the vector space C([−π, π]) over R. For each n ∈ N, define en (x) = sin(nx).
Then, prove that S = {en | n ∈ N} is linearly independent. [Hint: Need to show that every
finite subset of S is linearly independent. So, on the contrary assume that there exists ` ∈ N and
functions ek1 , . . . , ek` such that α1 ek1 + · · · + α` ek` = 0, for some αt 6= 0 with 1 ≤ t ≤ `. But,
the above system is equivalent to looking at α1 sin(k1 x) + · · · + α` sin(k` x) = 0 for all x ∈ [−π, π].
Now in the integral
Z π Z π
sin(mx) (α1 sin(k1 x) + · · · + α` sin(k` x)) dx = sin(mx)0 dx = 0
−π −π
replace m with ki ’s to show that αi = 0, for all i, 1 ≤ i ≤ `. This gives the required contradiction.]
9. Is the set {1, sin(x), cos(x), sin(2x), cos(2x), sin(3x), cos(3x), . . .} a linearly subset of the
vector space C([−π, π], R) over R?
10. Find a basis of R3 containing the vector (1, 1, −2)T and (1, 2, −1)T .
T
AF
Ans: Just find a vector that doesn’t belong to linear span of given vectors.
DR
14. Let uT = (1, 1, −2), vT = (−1, 2, 3) and wT = (1, 10, 1). Find a basis of LS(u, v, w).
Determine a geometrical representation of LS(u, v, w).
Ans: LS(u, v, w) = {(x, y, z)T ∈ R3 : 7x − y + 3z = 0}. Any two of them form a basis.
Geometrically, it represents a plane containing the vectors u, v and w.
15. Is the set W = {p(x) ∈ R[x; 4] | p(−1) = p(1) = 0} a subspace of R[x; 4]? If yes, find its
dimension.
Ans: Verify W = {p(x) ∈ R[x; 4] | p(x) = (x2 − 1)g(x), g(x) ∈ R[x; 2]}. So, dim(W) = 3.
3.5. FUNDAMENTAL SUBSPACES ASSOCIATED WITH A MATRIX 97
Definition 3.5.1. Let A ∈ Mm,n (R). Then, we define the four fundamental subspaces associ-
ated with A as
1. Col(A) = {Ax | x ∈ Rn } is a subspace of Rm , called the Column space, and is the
linear span of the columns of A.
2. Row(A) = Col(AT ) = {AT x | x ∈ Rm } is a subspace of Rn , called the row space of A
and is the linear span of the rows of A.
3. Null(A) = {x ∈ Rn | Ax = 0}, called the Null space of A.
4. Null(AT ) = {x ∈ Rm | AT x = 0}, also called the left-null space.
Example 3.5.3.
T
AF
1 1 1 −2
1. Compute the fundamental subspaces for A =
1 2 −1 .
1
DR
1 −2 7 −11
Solution: Verify the following
Remark 3.5.4. Let A ∈ Mm,n (R). Then, in Example 3.5.3, observe that the direction ratios
of normal vectors of Col(A) matches with vector in Null(AT ). Similarly, the direction ratios
of normal vectors of Row(A) matches with vectors in Null(A). Are these true in the general
setting?
98 CHAPTER 3. VECTOR SPACES
Exercise 3.5.5. 1. For the matrices given below, determine the four fundamental spaces.
Further,
find the dimensions
of all
the vector subspacesso obtained.
1 2 1 3 2 2 4 0 6
−1 0 −2 5
0 2 2 2 4
A= and B = .
2
−2 4 0 8 −3 −5 1 −4
4 2 5 6 10 −1 −1 1 2
2. Let A = [X Y ]. Then, determine the condition under which Col(X) = Col(Y ).
The next result is a re-writing of the results on system of linear equations. The readers are
advised to provide the proof for clarity.
Let W1 and W1 be two subspaces of a vector space V over F. Then, recall that (see
Exercise 3.2.16.4d) W1 + W2 = {u + v | u ∈ W1 , v ∈ W2 } = LS(W1 ∪ W2 ) is the smallest
subspace of V containing both W1 and W2 . We now state a result similar to a result in Venn
T
AF
Theorem 3.5.7. Let V be a finite dimensional vector space over F. If W1 and W2 are two
subspaces of V then
For better understanding, we give an example for finite subsets of Rn . The example uses
Theorem 3.3.14 to obtain bases of LS(S), for different choices S. The readers are advised to
see Example 3.3.14 before proceeding further.
Thus, a required basis of V is {(1, 2, 0, 1, 2)T , (0, 0, 1, 0, −1)T , (0, 1, 0, 0, 0)T , (0, 0, 0, 1, 3)T }. Sim-
ilarly, a required basis of W is {(1, 2, 0, 1, 2)T , (0, 0, 1, 0, −1)T , (0, 1, 0, 0, 1)T }.
Exercise 3.5.9. 1. Give an example to show that if A and B are equivalent then Col(A)
need not equal Col(B).
3. Let W1 and W2 be two subspaces of a vector space V. If dim(W1 ) + dim(W2 ) > dim(V),
then prove that dim(W1 ∩ W2 ) ≥ 1.
DR
So, D = {Aur+1 , . . . , Aun } spans Col(A). We further need to show that D is linearly indepen-
dent. So, consider the homogeneous linear system given below in the unknowns α1 , . . . , αn−r .
In other words, we have shown that the only solution of Equation (3.6.3) is the trivial solution.
Hence, {Aur+1 , . . . , Aun } is a basis of Col(A). Thus, the required result follows.
Theorem 3.6.1 is part of what is known as the fundamental theorem of linear algebra (see
Theorem 3.6.5). The following are some of the consequences of the rank-nullity theorem. The
proofs are left as an exercise for the reader.
(d) dim(Col(A)) = k.
AF
3. Let A ∈ Mn (R) and define a function f : Rn → Rn by f (x) = Ax. Then, the following
statements are equivalent.
(a) f is one-one.
(b) f is onto.
(c) f is invertible.
" #
1 −1
4. Let A = . Then, verify that Null(A) = Col(A). Can such examples exist in Rn
1 −1
for n odd? What about n even? Further, verify that R2 6= Null(A) + Col(A). Does it
contradict the rank-nullity theorem?
We end this section by proving the fundamental theorem of linear algebra. We start with
the following result.
Lemma 3.6.4. Consider the vector space Rn . Then, for S ⊆ Rn prove that
1. S ⊥ is a subspace of Rn .
2. S ⊥ = (LS(S))⊥ .
3. (S ⊥ )⊥ = S ⊥ if and only if S is a subspace of Rn .
4. Let W be a subspace of Rn . Then, there exists a subspace V of Rn such that
(a) Rn = W ⊕ V. Or equivalently, W and V are complementary subspaces.
(b) vT u = 0, for every u ∈ W and v ∈ V. This, further implies that W and V are also
orthogonal to each other. Such spaces are called orthogonal complements.
Theorem 3.6.5 (Fundamental Theorem of Linear Algebra). Let A ∈ Mm,n (R). Then,
T
1. dim(Null(A)) + dim(Col(A)) = n.
AF
⊥ ⊥
2. Null(A) = Col(AT ) and Null(AT ) = Col(A) .
DR
0 = (AT y)T z = yT Az = yT y ⇔ y = 0.
Thus Az = 0 and z ∈ Null(A). This completes the proof of the first equality in Part 2. A
similar argument gives the second equality.
Part 3: Note that, using the rank-nullity theorem we have
⊥
dim(Col(A)) = n − dim(Null(A)) = n − dim Col(AT ) = n − n − dim Col(AT ) .
Remark 3.6.6. Let A ∈ Mm,n (R). Then, Theorem 3.6.5.2 implies the following:
⊥
1. Null(A) = Col(AT ) . This is just stating the usual fact that if x ∈ Null(A) then
Ax = 0. Hence, the dot product of every row of A with x equals 0.
2. Rn = Null(A) ⊕ Col(AT ). Further, Null(A) is orthogonal complement of Col(AT ).
3. Rm = Null(AT ) ⊕ Col(A). Further, Null(AT ) is orthogonal complement of Col(A).
As an implication of last two parts of Theorem 3.6.5, we show the existence of an invertible
function f : Col(AT ) → Col(A).
Corollary 3.6.7. Let A ∈ Mm,n (R). Then, the function f : Col(AT ) → Col(A) defined by
f (x) = Ax is invertible.
Proof. Let us first show that f is one-one. So, let x, y ∈ Col(AT ) such that f (x) = f (y).
Hence, Ax = Ay. Thus x − y ∈ Null(A) = (Col(AT ))⊥ (by Theorem 3.6.5.2). Therefore,
x − y ∈ (Col(AT ))⊥ ∩ Col(AT ) = {0}. Thus x = y and hence f is one-one.
We now show that f is onto. So, let z ∈ Col(A). To find y ∈ Col(AT ) such that f (y) = z.
As z ∈ Col(A) there exists w ∈ Rn with z = Aw. But Null(A) and Col(AT ) are
complementary subspaces and hence, there exists unique vectors, w1 ∈ Null(A) and w2 ∈
Col(AT ), such that w = w1 + w2 . Thus, z = Aw implies
The readers should look at Example 3.5.3 and Remark 3.5.4. We give one more example.
1 1 0
Example 3.6.8. Let A = 2 1 1
. Then, verify that
3 2 1
1. {(0, 1, 1)T , (1, 1, 2)T } is a basis of Col(A).
2. {(1, 1, −1)T } is a basis of Null(AT ).
3. Null(AT ) = (Col(A))⊥ .
Exercise 3.6.9. 1. Find distinct subspaces W1 and W2
(a) in R2 such that W1 and W2 are orthogonal but not orthogonal complement.
(b) in R3 such that W1 6= {0} and W2 6= {0} are orthogonal, but not orthogonal comple-
ment.
Ans: (a) Pick two oblique lines (non-perpendicular lines) passing through (0, 0). For (b),
take the X-axis and the Y -axis.
2. Let A ∈ Mm,n (R). Prove that Col(A) = Col(AT A). Thus, Rank(A) = n if and only if
Rank(AT A) = n. [ Hint: Use the rank-nullity theorem and/ or Lemma 3.6.3]
3. Let A ∈ Mm,n (R). Then, for every
(a) x ∈ Rn , x = u + v, where u ∈ Col(AT ) and v ∈ Null(A) are unique.
3.7. SUMMARY 103
AX
text
AX
AAtul
feelin
y T
AF
For more information related with the fundamental theorem of linear algebra the interested
readers are advised to see the article “The Fundamental Theorem of Linear Algebra, Gilbert
Strang, The American Mathematical Monthly, Vol. 100, No. 9, Nov., 1993, pp. 848 - 855.” The
diagram 3.6 has been taken from the above paper. It also explains Corollary 3.6.7.
3.7 Summary
In this chapter, we defined vector spaces over F. The set F was either R or C. To define a vector
space, we start with a non-empty set V of vectors and F the set of scalars. We also needed to
do the following:
If all conditions in Definition 3.1.1 are satisfied then V is a vector space over F. If W was a
non-empty subset of a vector space V over F then for W to be a space, we only need to check
whether the vector addition and scalar multiplication inherited from that in V hold in W.
We then learnt linear combination of vectors and the linear span of vectors. It was also shown
that the linear span of a subset S of a vector space V is the smallest subspace of V containing
S. Also, to check whether a given vector v is a linear combination of u1 , . . . , un , we needed to
104 CHAPTER 3. VECTOR SPACES
1. A is invertible.
AF
3. RREF(A) = In .
7. Rank(A) = n.
8. det(A) 6= 0.
9. Col(AT ) = Row(A) = Rn .
11. Col(A) = Rn .
Linear Transformations
Definition 4.1.1. Let V and W be vector spaces over F with vector operations +, · in V and
⊕, in W. A function (map) f : V → W is called a linear transformation if for all α ∈ F
and u, v ∈ V the function f satisfies
T
AF
By L(V, W), we denote the set of all linear transformations from V to W. In particular, if
W = V then the linear transformation f is called a linear operator and the corresponding set
of linear operators is denoted by L(V).
Even though, in the definition above, we have differentiated between the vector addition
and scalar multiplication for domain and co-domain, we will not differentiate them in the book
unless necessary.
Equation (4.1.1) just states that the two operations, namely, taking the image (apply f ) and
doing ‘vector space operations (vector addition and scalar multiplication) commute, i.e., first
apply vector operations (u + v or αv) and then look at their images f (u + v) or f (αv)) is same
as first computing the images (f (u), f (v)) and then compute vector operations (f (u) + f (v)
and αf (v)). Or equivalently, we look at only those functions which preserve vector operations.
Definition 4.1.2. Let g, h ∈ L(V, W). Then g and h are said to be equal if g(x) = h(x), for
all x ∈ V.
Example 4.1.3. 1. Let V be a vector space. Then, the maps Id, 0 ∈ L(V), where
(a) Id(v) = v, for all v ∈ V, is commonly called the identity operator.
(b) 0(v) = 0, for all v ∈ V, is commonly called the zero operator.
105
106 CHAPTER 4. LINEAR TRANSFORMATIONS
2. Let V and W be vector spaces over F. Then, 0 ∈ L(V, W), where 0(v) = 0, for all v ∈ V,
is commonly called the zero transformation.
4. Let V, W and Z be vector spaces over F. Then, for any T ∈ L(V, W) and S ∈ L(W, Z),
the map S ◦ T ∈ L(V, Z), defined by (S ◦ T )(v) = S T (v) for all v ∈ V, is called the
5. Fix a ∈ Rn and define f (x) = aT x, for all x ∈ Rn . Then f ∈ L(Rn , R). In particular, if
x = [x1 , . . . , xn ]T then, for all x ∈ Rn ,
n
xi = 1T x is a linear transformation.
P
(a) f (x) =
i=1
(b) fi (x) = xi = eTi x is a linear transformation, for 1 ≤ i ≤ n.
T
7. Fix A ∈ Mm×n (C). Define fA (x) = Ax, for every x ∈ Cn . Then, fA ∈ L(Cn , Cm ). Thus,
for each A ∈ Mm,n (C), there exists a linear transformation fA ∈ L(Cn , Cm ).
10. Is the map T : R[x; n] → R[x; n + 1] defined by T (f (x)) = xf (x), for all f (x) ∈ R[x; n] a
linear transformation?
d
Rx
11. The maps T, S : R[x] → R[x] defined by T (f (x)) = dx f (x) and S(f (x)) = f (t)dt, for all
0
f (x) ∈ R[x] are linear transformations. Is it true that T S = Id? What about ST ?
12. Recall the vector space RN in Example 3.1.4.7. Now, define maps T, S : RN → RN
by T ({a1 , a2 , . . .}) = {0, a1 , a2 , . . .} and S({a1 , a2 , . . .}) = {a2 , a3 , . . .}. Then, T and S,
commonly called the shift operators, are linear operators with exactly one of ST or T S
as the Id map.
4.1. DEFINITIONS AND BASIC PROPERTIES 107
13. Recall the vector space C(R, R) (see Example 3.1.4.9). Define T : C(R, R) → C(R, R) by
Rx Rx
T (f )(x) = f (t)dt. For example, T (sin)(x) = sin(t)dt = 1−cos(x), for all x ∈ R. Then,
0 0
verify that T is a linear transformation.
Remark 4.1.4. Let A ∈ Mn (C) and define TA : Cn → Cn by TA (x) = Ax, for every x ∈ Cn .
Then, verify that TAk (x) = (TA ◦ TA ◦ · · · ◦ TA )(x) = Ak x, for any positive integer k.
| {z }
k times
Exercise 4.1.5. Fix A ∈ Mn (C). Then, do the following maps define linear transformations?
1. Define f, g : Mn (C) → Mn (C) by f (B) = A∗ B and g(B) = BA, for every B ∈ Mn (C).
2. Define h, t : Mn (C) → C by h(B) = tr(A∗ B) and t(B) = tr(BA), for every B ∈ Mn (C).
We now prove that any linear transformation sends the zero vector to a zero vector.
Proposition 4.1.6. Let T ∈ L(V, W). Suppose that 0V is the zero vector in V and 0W is the
zero vector of W. Then T (0V ) = 0W .
Hence T (0V ) = 0W .
AF
From now on 0 will be used as the zero vector of the domain and co-domain. We now
DR
Definition 4.1.8. Let f ∈ L(V, W). Then the range/ image of f , denoted Rng(f ) or Im(f ),
is given by Rng(f ) = {f (x) : x ∈ V}.
108 CHAPTER 4. LINEAR TRANSFORMATIONS
As an exercise, show that Rng(f ) is a subspace of W. The next result, which is a very
important result, states that a linear transformation is known if we know its image on a basis
of the domain space.
Lemma 4.1.9. Let V and W be vector spaces over F with B = {v1 , v2 , . . .} as a basis of V. If
f ∈ L(V, W) then T is determined if we know the set {f (v1 ), f (v2 ), . . .}, ı.e., if we know the
image of f on the basis vectors of V, or equivalently, Rng(f ) = LS(f (x)|x ∈ B).
Proof. Let B be a basis of V over F. Then, for each v ∈ V, there exist vectors u1 , . . . , uk in B
k
and scalars c1 , . . . , ck ∈ F such that v =
P
ci ui . Thus
i=1
k k k
!
X X X
T (v) = f ci ui = f (ci ui ) = ci T (ui ).
i=1 i=1 i=1
Or equivalently, whenever
c1 c
. i 1
..
h
v = [u1 , . . . , uk ] .
. then f (v) = f (u1 ) · · · f (uk ) . . (4.1.2)
ck ck
Thus, the image of f on v just depends on where the basis vectors are mapped. Equation 4.1.2
T
Rng(f ) = LS(f (e1 ), T (e2 ), T (e3 )) = LS (1, 0, 1, 2)T , (−1, 1, 0, −5)T , (1, −1, 0, 5)T
= LS (1, 0, 1, 2)T , (1, −1, 0, 5)T = {λ(1, 0, 1, 2)T + β(1, −1, 0, 5)T | λ, β ∈ R}
2. Let B ∈ M2 (R). Now, define a map T : M2 (R) → M2 (R) by T (A) = BA − AB, for all
A ∈ M2 (R). Determine Rng(T ) and Null(T ).
Solution: Recall that {eij |1 ≤ i, j ≤ 2} is a basis of M2 (R). So,
(a) if B = cI2 then Rng(T ) = {0}.
" # " # " # " #
1 2 0 −2 −2 −3 2 0
(b) if B = then T (e11 ) = , T (e12 ) = , T (e21 ) = and
2 4 2 0 0 2 3 −2
" # " # " # " #!
0 2 0 2 2 3 −2 0
T (e22 ) = . Thus, Rng(T ) = LS , , .
−2 0 −2 0 0 −2 −3 2
" # " # " # " #!
1 2 0 2 2 2 −2 0
(c) for B = , verify that Rng(T ) = LS , , .
2 3 −2 0 0 −2 −2 2
4.1. DEFINITIONS AND BASIC PROPERTIES 109
Recall that by Example 4.1.3.5, for each a ∈ Rn , the map T (x) = aT x, for each x ∈ Rn , is
a linear transformation from Rn to R. We now show that these are the only ones.
Corollary 4.1.11. [Reisz Representation Theorem] Let T ∈ L(Rn , R). Then, there exists
a ∈ Rn such that T (x) = aT x.
Proof. By Lemma 4.1.9, T is known if we know the image of T on {e1 , . . . , en }, the standard
basis of Rn . So, for 1 ≤ i ≤ n, let T (ei ) = ai , for some ai ∈ R. Now define a = [a1 , . . . , an ]T
and x = [x1 , . . . , xn ]T ∈ Rn . Then, for all x ∈ Rn ,
n n n
!
X X X
T (x) = T xi ei = xi T (ei ) = xi ai = aT x.
i=1 i=1 i=1
Example 4.1.12. In each of the examples given below, state whether a linear transformation
exists or not. If yes, give at least one linear transformation. If not, then give the condition due
to which a linear transformation doesn’t exist.
1. Can we construct a linear transformation T : R2 → R2 such that T ((1, 1)T ) = (e, 2)T and
T ((2, 1)T ) = (5, 4)T ?
Solution: The first thing that we need to answer is “is the set {(1, 1), (2, 1)} linearly
independent”? The answer is ‘Yes’. So, we can construct it. So, how do we do it?
T
AF
y 1 1 1 1 β
linear transformation
" #! " # " #! " #! " #! " # " #
x 1 2 1 2 e 5
T = T α +β = αT + βT =α +β
y 1 1 1 1 2 4
" #" # " # " #!−1 " #
e 5 α e 5 1 2 x
= =
2 4 β 2 4 1 1 y
" #" #" # " #
e 5 −1 2 x (5 − e)x + (2e − 5)y
= = .
2 4 1 −1 y 2x
2. T : R2 → R2 such that T ((1, 1)T ) = (1, 2)T and T ((1, −1)T ) = (5, 10)T ?" #
1 1
Solution: Yes, as the set {(1, 1), (1, −1)} is a basis of R2 . Write B = . Then,
1 −1
" #! " #! " #!!
x x x
T = T (BB −1 ) = T B B −1
y y y
" " #! " #!# " #−1 " #
1 1 1 1 x
= T , T
1 −1 1 −1 y
" # " #−1 " # " #" # " #
x+y
1 5 1 1 x 1 5 2 3x − 2y
= = = .
2 10 1 −1 y 2 10 x−y 2 6x − 4y
110 CHAPTER 4. LINEAR TRANSFORMATIONS
3. T : R2 → R2 such that T ((1, 1)T ) = (1, 2)T and T ((5, 5)T ) = (5, 11)T ?
Solution: Note that the set {(1, 1), (5, 5)} is linearly dependent. Further, (5, 11)T =
T ((5, 5)T ) = 5T ((1, 1)T )5(1, 2)T = (5, 10)T gives us a contradiction. Hence, there is no
such linear transformation.
4. Does there exist a linear transformation T : R3 → R2 with T (1, 1, 1) = (1, 2), T (1, 2, 3) =
(4, 3) and T (2, 3, 4) = (7, 8)?
Solution: Here, the set {(1, 1, 1), (1, 2, 3), (2, 3, 4)} is linearly dependent and (2, 3, 4) =
(1, 1, 1) + (1, 2, 3). So, we need T ((2, 3, 4)) = T ((1, 1, 1) + (1, 2, 3)) = T ((1, 1, 1)) +
T ((1, 2, 3)) = (1, 2) + (4, 3) = (5, 5). But, we are given T (2, 3, 4) = (7, 8), a contradiction.
So, such a linear transformation doesn’t exist.
5. T : R2 → R2 such that T ((1, 1)T ) = (1, 2)T and T ((5, 5)T ) = (5, 10)T ?
Solution: Yes, as (5, 10)T = T ((5, 5)T ) = 5T ((1, 1)T ) = 5(1, 2)T = (5, 10)T .
6. Does there exist a linear transformation T : R3 → R2 with T (1, 1, 1) = (1, 2), T (1, 2, 3) =
(a) T 6= 0, T ◦ T = T 2 6= 0, T ◦ T ◦ T = T 3 = 0.
(b) T 6= 0, S 6= 0, S ◦ T = ST 6= 0, T ◦ S = T S = 0.
(c) S ◦ S = S 2 = T 2 = T ◦ T, S 6= T .
(d) T ◦ T = T 2 = Id, T 6= Id.
5. Let V be a vector space and let a ∈ V. Then the map Ta : V → V defined by Ta (x) = x+a,
for all x ∈ V is called the translation map. Prove that Ta ∈ L(V) if and only if a = 0.
Ans: Ta ∈ L(V) ⇔ Ta (x + y) = Ta (x) + Ta (y) ⇔ x + y + a = x + a + y + a ⇔ a = 0.
6. Prove that there exist infinitely many linear transformations T : R3 → R2 such that
T ((1, −1, 1)T ) = (1, 2)T and T ((−1, 1, 2)T ) = (1, 0)T .
Ans: Note that {(1, −1, 1)T , (−1, 1, 2)T , (1, 0, 0)T } is a linearly independent set. One can
replace (1, 0, 0)T by any vector v such that {(1, −1, 1)T , (−1, 1, 2)T , v} is a linearly indepen-
dent set. Further, you can define T (v) to be any element of R2 . So, you have uncountably
many choices.
Ans: (a) yes, as the set {(1, 0, 1)T , (0, 1, 1)T , (1, 1, 1)T } is linearly independent subset of R3 .
(b) Here, (1, 1, 2)T = (1, 0, 1)T + (0, 1, 1)T . So, for T to be a linear transform, we need
T ((1, 1, 2)T ) = T ((1, 0, 1)T ) + T ((0, 1, 1)T ). Here, we see that T ((1, 1, 2)T ) = (2, 3)T 6=
(1, 2)T + (1, 0)T = T ((1, 0, 1)T ) + T ((0, 1, 1)T ). Hence, no such linear transform exists.
8. Find T ∈ L(R3 ) for which Rng(T ) = LS (1, 2, 0)T , (0, 1, 1)T , (1, 3, 1)T .
Ans: As T ∈ L(R3 ), take T (e1 ) = (1, 2, 0)T , T (e2 ) = (0, 1, 1)T and T (e3 ) = (1, 3, 1)T .
Ans: Note that to find x ∈ R3 such that T (x) = (9, 3, k)T is equivalent to solving the linear
system Ax = (9, 3, k)T , where A = \begin{bmatrix}2&3&4\\1&1&1\\1&1&3\end{bmatrix}. Verify that k = 5.
10. Let T : R3 → R3 be defined by T ((x, y, z)T ) = (2x − 2y + 2z, −2x + 5y + 2z, x + y + 4z)T .
Find x ∈ R3 such that T (x) = (1, 1, −1)T .
Ans: Solve the linear system Ax = (1, 1, −1)T , where A = \begin{bmatrix}2&-2&2\\-2&5&2\\1&1&4\end{bmatrix}, to get x = \frac{1}{2}(42, 24, −17)T .
12. Let T : R3 → R3 be defined by T ((x, y, z)T ) = (2x + 3y + 4z, −y, −3y + 4z)T . Determine
x, y, z ∈ R3 \ {0} such that T (x) = 2x, T (y) = 4y and T (z) = −z. Is the set {x, y, z}
linearly independent?
Ans: Proceed as in the previous question to get x = (1, 0, 0)T , y = (2, 0, 1)T and z = (9, −5, −3)T . Yes,
{x, y, z} is a linearly independent set.
13. Does there exist a linear transformation T : R3 → Rn such that T ((1, 1, −2)T ) = x,
T ((−1, 2, 3)T ) = y and T ((1, 10, 1)T ) = z
(a) with z = x + y?
(b) with z = cx + dy, for some choice of c, d ∈ R?
Ans: Note that 4(1, 1, −2)T +3(−1, 2, 3)T −(1, 10, 1)T = 0. Thus, for a linear transformation
T to exist, we need 4 T ((1, 1, −2)T ) + 3 T ((−1, 2, 3)T ) = T ((1, 10, 1)T ). Or equivalently, we
need z = 4x + 3y.
14. For each matrix A given below, define T ∈ L(R2 ) by T (x) = Ax. What do these linear
operators signify geometrically?
(a) A ∈ \left\{\frac{1}{2}\begin{bmatrix}\sqrt{3}&-1\\1&\sqrt{3}\end{bmatrix}, \frac{1}{\sqrt{2}}\begin{bmatrix}1&-1\\1&1\end{bmatrix}, \frac{1}{2}\begin{bmatrix}1&-\sqrt{3}\\\sqrt{3}&1\end{bmatrix}, \begin{bmatrix}0&-1\\1&0\end{bmatrix}, \begin{bmatrix}\cos\frac{2\pi}{3}&-\sin\frac{2\pi}{3}\\\sin\frac{2\pi}{3}&\cos\frac{2\pi}{3}\end{bmatrix}\right\}.
(b) A ∈ \left\{\frac{1}{2}\begin{bmatrix}1&-1\\-1&1\end{bmatrix}, \frac{1}{5}\begin{bmatrix}1&2\\2&4\end{bmatrix}, \begin{bmatrix}0&0\\0&1\end{bmatrix}, \begin{bmatrix}1&0\\0&0\end{bmatrix}\right\}.
(c) A ∈ \left\{\frac{1}{2}\begin{bmatrix}\sqrt{3}&1\\1&-\sqrt{3}\end{bmatrix}, \frac{1}{\sqrt{2}}\begin{bmatrix}1&1\\1&-1\end{bmatrix}, \frac{1}{2}\begin{bmatrix}1&\sqrt{3}\\\sqrt{3}&-1\end{bmatrix}, \begin{bmatrix}\cos\frac{2\pi}{3}&\sin\frac{2\pi}{3}\\\sin\frac{2\pi}{3}&-\cos\frac{2\pi}{3}\end{bmatrix}\right\}.
Ans: (a) Counter-clockwise rotations by θ = π/6, π/4, π/3, π/2 and 2π/3, respectively.
(b) Projections onto the lines {t(1, −1)T : t ∈ R}, {t(1, 2)T : t ∈ R}, the Y -axis and the X-axis, respectively.
(c) Reflections across the lines through the origin making angles π/12, π/8, π/6 and π/3, respectively, with the positive X-axis.
15. Consider the space C3 over C. If f ∈ L(C3 ) with f (x) = x, f (y) = (1 + i)y and f (z) =
(2 + 3i)z, for x, y, z ∈ C3 \ {0} then prove that {x, y, z} forms a basis of C3 .
Ans: The coefficient matrix is a Vandermonde matrix and hence the system has a unique solution
ax = 0, by = 0 and cz = 0. As x, y, z ∈ C3 \ {0}, a = b = c = 0.
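Several of the exercises above (for instance, the ones asking for x with T (x) = b) reduce to forming the matrix whose columns are T (e1 ), . . . , T (en ) and solving a linear system. A Python/numpy sketch of this recipe, using the map of Exercise 10 above as the illustration:

    import numpy as np

    def T(v):
        x, y, z = v
        return np.array([2*x - 2*y + 2*z, -2*x + 5*y + 2*z, x + y + 4*z])

    # matrix of T with respect to the standard basis: columns are T(e_i)
    A = np.column_stack([T(e) for e in np.eye(3)])

    b = np.array([1.0, 1.0, -1.0])
    x = np.linalg.solve(A, b)        # solve T(x) = b
    print(x)                          # approximately [21. , 12. , -8.5]
    print(np.allclose(T(x), b))       # True
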
Recall that for any f ∈ L(V, W), Rng(f ) = {f (v)|v ∈ V} (see Definition 4.1.8). Now, in line
with the ideas in Theorem 3.6.1, we define the null-space or the kernel of a linear transformation.
At this stage, the readers are advised to recall Section 3.6 for clarity and similarity with the
results in this section.
Definition 4.2.1. Let f ∈ L(V, W). Then the null space of f , denoted Null(f ) or Ker(f ),
is given by Null(f ) = {v ∈ V | f (v) = 0}. In most linear algebra books, it is also called the
kernel of f and written Ker(f ). Further, if V is finite dimensional then one writes Nullity(f ) = dim(Null(f )).
2. Fix B ∈ M2 (R). Now, define T : M2 (R) → M2 (R) by T (A) = BA−AB, for all A ∈ M2 (R).
Solution: Then A ∈ Null(T ) if and only if A commutes with B. In particular,
{I, B, B 2 , . . .} ⊆ Null(T ). For example, if B = αI, for some α then Null(T ) = M2 (R).
2. Define T ∈ L(R2 , R4 ) by T ((x, y)T ) = (x+y, x−y, 2x+y, 3x−4y)T . Determine Null(T ).
Ans: Let A = \begin{bmatrix}1&1\\1&-1\\2&1\\3&-4\end{bmatrix}. Then, Null(T ) = {x ∈ R2 : Ax = 0} = {(0, 0)T }.
3. Describe Null(D) and Rng(D), where D ∈ L(R[x; n]) is defined by (D(f ))(x) = f 0 (x),
the differentiation with respect to x. Note that Rng(D) ⊆ R[x; n − 1].
Ans: Null(D) = {f (x) ∈ R[x; n] : f (x) is a constant polynomial} and
Rng(D) = {f (x) ∈ R[x; n] : f (x) is a polynomial of degree ≤ n − 1} = R[x; n − 1].
4. Define T ∈ L(R[x]) by (T (f ))(x) = xf (x), for all f (x) ∈ R[x]. What can you say
about Null(T ) and Rng(T )?
Ans: Null(T ) = {0} and Rng(T ) = {f (x) ∈ R[x] : f (x) = a1 x + · · · + an xn , ai ∈ R}.
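For maps given by a matrix, Null(T ) can also be computed numerically, e.g., via the singular value decomposition. A sketch for the map of item 2 above, T ((x, y)T ) = (x + y, x − y, 2x + y, 3x − 4y)T :

    import numpy as np

    A = np.array([[1.,  1.],
                  [1., -1.],
                  [2.,  1.],
                  [3., -4.]])

    # right singular vectors belonging to (numerically) zero singular values span Null(A)
    U, s, Vt = np.linalg.svd(A)
    null_basis = [v for sv, v in zip(s, Vt) if sv < 1e-10]
    print(null_basis)      # [] : Null(T) = {(0,0)^T}, as claimed above
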
Theorem 4.2.4. Let V and W be vector spaces over F and let T ∈ L(V, W). Let S ⊆ V.
1. If S is linearly dependent then T (S) = {T (v) | v ∈ S} is linearly dependent.
2. If T (S) is linearly independent then S is linearly independent.
Proof. Part 1 : As S is linearly dependent, there exist k ∈ N and vi ∈ S, for 1 ≤ i ≤ k, such that
the system \sum_{i=1}^{k} x_i v_i = 0, in the unknowns xi ’s, has a non-trivial solution, say xi = ai ∈ F, 1 ≤ i ≤ k.
Thus \sum_{i=1}^{k} a_i v_i = 0. Then the ai ’s also give a non-trivial solution to the system \sum_{i=1}^{k} y_i T(v_i) = 0,
where the yi ’s are unknowns, as \sum_{i=1}^{k} a_i T(v_i) = \sum_{i=1}^{k} T(a_i v_i) = T\left(\sum_{i=1}^{k} a_i v_i\right) = T(0) = 0. Hence the
required result follows.
Part 2 : On the contrary assume that S is linearly dependent. Then by Part 1, T (S) is
linearly dependent, a contradiction to the given assumption that T (S) is linearly independent.
We now prove the Rank-Nullity theorem. The proof of this result is similar to that of Theorem 3.6.1.
Theorem 4.2.5 (Rank-Nullity Theorem). Let V and W be vector spaces over F and let T ∈ L(V, W). If dim(V) is
finite then dim(Rng(T )) + dim(Null(T )) = dim(V).
Proof. Let {v1 , . . . , vk } be a basis of Null(T ) and extend it to a basis C = {v1 , . . . , vn } of V. As T (vi ) = 0, for 1 ≤ i ≤ k, we get
Rng(T ) = LS(T (v1 ), . . . , T (vk ), T (vk+1 ), . . . , T (vn )) = LS(T (vk+1 ), . . . , T (vn )).
Now, suppose \sum_{i=1}^{n-k} a_i T(v_{k+i}) = 0, for some a_i ∈ F. Then T\left(\sum_{i=1}^{n-k} a_i v_{k+i}\right) = 0, i.e.,
\sum_{i=1}^{n-k} a_i v_{k+i} ∈ Null(T ). Hence, there exist b_1 , . . . , b_k ∈ F such that \sum_{i=1}^{n-k} a_i v_{k+i} = \sum_{j=1}^{k} b_j v_j . This gives a new system
\sum_{i=1}^{n-k} a_i v_{k+i} + \sum_{j=1}^{k} (-b_j) v_j = 0,
in the unknowns ai ’s and bj ’s. As C is linearly independent, the new system has only the trivial
solution, namely [a_1 , . . . , a_{n-k} , -b_1 , . . . , -b_k ]^T = 0. Hence, the system \sum_{i=1}^{n-k} a_i T(v_{k+i}) = 0 has only
the trivial solution. Thus, the set {T (v_{k+1} ), . . . , T (v_n )} is a linearly independent subset of W. It
also spans Rng(T ) and hence is a basis of Rng(T ). Therefore, dim(Rng(T )) = n − k = dim(V) − dim(Null(T )), as required.
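For matrices the theorem is easy to check numerically: for T (x) = Ax one has Rank(T ) = Rank(A) and Nullity(T ) = n − Rank(A). A quick sketch with an arbitrarily chosen matrix:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 6
    A = rng.integers(-3, 4, size=(4, n)).astype(float)   # a 4 x 6 matrix, so dim(V) = 6

    rank = np.linalg.matrix_rank(A)               # dim Rng(T)
    nullity = n - rank                            # dim Null(T), by the theorem
    _, s, Vt = np.linalg.svd(A)
    null_vecs = Vt[rank:]                         # an orthonormal basis of Null(A)

    print(rank + nullity == n)                    # True
    print(np.allclose(A @ null_vecs.T, 0))        # these basis vectors indeed satisfy Ax = 0
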
Corollary 4.2.6. Let V and W be finite dimensional vector spaces over F and let T ∈ L(V, W).
If dim(V) = dim(W) then the following statements are equivalent.
1. T is one-one.
2. Ker(T ) = {0}.
3. T is onto.
4. dim(Rng(T )) = dim(W) = dim(V).
Exercise 4.2.7. 1. Prove Corollary 4.2.6.
2. Let V and W be finite dimensional vector spaces over F. If T ∈ L(V, W) then prove that
T cannot be onto if dim(V) < dim(W), and T cannot be one-one if dim(V) > dim(W).
Ans: If T is onto then Rng(T ) = W. So, dim(V) ≥ dim(Rng(T )) = dim(W) (by the
rank-nullity theorem), a contradiction to dim(V) < dim(W).
If T is one-one then Null(T ) = {0}. Thus, dim(W) ≥ dim(Rng(T )) = dim(V), a contra-
diction to dim(V) > dim(W).
3. Let A ∈ Mn (R) with A2 = A. Define T ∈ L(Rn ) by T (v) = Av for all v ∈ Rn . Then
prove that
Definition 4.3.1. Let V, W be vector spaces over F and let S, T ∈ L(V, W). Then, we define
the point-wise
1. sum of S and T , denoted S + T , by (S + T )(v) = S(v) + T (v), for all v ∈ V.
2. scalar multiplication, denoted c T for c ∈ F, by (c T )(v) = c (T (v)), for all v ∈ V.
( " # " #)
1 0
To understand the next result, consider L(R2 , R3 ) and let B = v1 = , v2 =
0 1
1 0 0
and C = w1 = 0, w2 = 1, w3 = 0 be bases of R2 and R3 , respectively. Now, for
T
0 0 1
AF
(
wj , if k = i
fji (vk ) =
0, 6 i.
if k =
Then verify that the above maps correspond to the following collection of matrices?
1 0 0 1 0 0 0 0 0 0 0 0
f11 = 0 0, f12 = 0 0, f21 = 1 0, f22 = 0 1, f31 = 0 0, f32 = 0 0.
0 0 0 0 0 0 0 0 1 0 0 1
Theorem 4.3.2. Let V and W be vector spaces over F. Then L(V, W) is a vector space over
F. Furthermore, if dim V = n and dim W = m, then dim L(V, W) = mn.
Proof. It can be easily verified that under point-wise addition and scalar multiplication, defined
above, L(V, W) is indeed a vector space over F. We now prove the other part. So, let us
assume that B = {v1 , . . . , vn } and C = {w1 , . . . , wm } are bases of V and W, respectively. For
1 ≤ i ≤ n, 1 ≤ j ≤ m, we define the functions fji on the basis vectors of V by
fji (vk ) = wj , if k = i, and fji (vk ) = 0, if k ≠ i.
For other vectors of V, we extend the definition by linearity, i.e., if v = \sum_{s=1}^{n} α_s v_s then
fji (v) = fji\left(\sum_{s=1}^{n} α_s v_s\right) = \sum_{s=1}^{n} α_s fji (v_s ) = α_i fji (v_i ) = α_i w_j .    (4.3.1)
Thus fji ∈ L(V, W). We now show that {fji |1 ≤ i ≤ n, 1 ≤ j ≤ m} is a basis of L(V, W).
As a first step, we show that the fji ’s are linearly independent. So, consider the linear system
\sum_{i=1}^{n}\sum_{j=1}^{m} c_{ji} f_{ji} = 0, in the unknowns cji ’s, for 1 ≤ i ≤ n, 1 ≤ j ≤ m. Using the point-wise addition
and scalar multiplication, we get, for each k,
0 = 0(v_k) = \left(\sum_{i=1}^{n}\sum_{j=1}^{m} c_{ji} f_{ji}\right)(v_k) = \sum_{i=1}^{n}\sum_{j=1}^{m} c_{ji} f_{ji}(v_k) = \sum_{j=1}^{m} c_{jk} w_j .
But, the set {w1 , . . . , wm } is linearly independent. Hence the only solution is cjk = 0, for
1 ≤ j ≤ m. Now, as we vary vk from v1 to vn , we see that cji = 0, for 1 ≤ j ≤ m and 1 ≤ i ≤ n.
Thus, we have proved the linear independence of {fji | 1 ≤ i ≤ n, 1 ≤ j ≤ m}.
Now, let us prove that LS ({fji | 1 ≤ i ≤ n, 1 ≤ j ≤ m}) = L(V, W). So, let f ∈ L(V, W).
Then, for 1 ≤ s ≤ n, f (vs ) ∈ W and hence there exist βts ’s such that f (vs ) = \sum_{t=1}^{m} β_{ts} w_t . So, if
v = \sum_{s=1}^{n} α_s v_s ∈ V then, using Equation (4.3.1), we get
f (v) = f\left(\sum_{s=1}^{n} α_s v_s\right) = \sum_{s=1}^{n} α_s f (v_s ) = \sum_{s=1}^{n} α_s\left(\sum_{t=1}^{m} β_{ts} w_t\right) = \sum_{s=1}^{n}\sum_{t=1}^{m} β_{ts}(α_s w_t ) = \sum_{s=1}^{n}\sum_{t=1}^{m} β_{ts} f_{ts}(v).
As this holds for every v ∈ V, we get f = \sum_{s=1}^{n}\sum_{t=1}^{m} β_{ts} f_{ts} . Hence, L(V, W) = LS ({fji | 1 ≤ i ≤ n, 1 ≤ j ≤ m})
and the required result follows.
Corollary 4.3.3. Let V be a vector space over F with dim(V) = n. If S, T ∈ L(V) then
1. Nullity(T ) + Nullity(S) ≥ Nullity(ST ) ≥ max{Nullity(T ), Nullity(S)}.
2. min{Rank(S), Rank(T )} ≥ Rank(ST ) ≥ Rank(S) + Rank(T ) − n.
Proof. The proof of Part 2 is omitted as it directly follows from Part 1 and Theorem 4.2.5.
Part 1 - Second Inequality: Suppose v ∈ Ker(T ). Then (ST )(v) = S(T (v)) = S(0) = 0, so Ker(T ) ⊆ Ker(ST ) and hence Nullity(ST ) ≥ Nullity(T ).
Remark 4.3.5. Let f : S → T be invertible. Then, it can be easily shown that any right inverse
and any left inverse are the same. Thus, the inverse function is unique and is denoted by f −1 .
It is well known that f is invertible if and only if f is both one-one and onto.
Lemma 4.3.6. Let V and W be vector spaces over F and let T ∈ L(V, W). If T is one-one and
onto then, the map T −1 : W → V is also a linear transformation. The map T −1 is called the
inverse linear transform of T and is defined by T −1 (w) = v whenever T (v) = w.
Proof. Part 1: As T is one-one and onto, by Theorem 4.2.5, dim(V) = dim(W). So, by
Corollary 4.2.6, for each w ∈ W there exists a unique v ∈ V such that T (v) = w. Thus, one
defines T −1 (w) = v.
We need to show that T −1 (α1 w1 + α2 w2 ) = α1 T −1 (w1 ) + α2 T −1 (w2 ), for all α1 , α2 ∈ F
and w1 , w2 ∈ W. Note that by the previous paragraph, there exist unique vectors v1 , v2 ∈ V such
that T (v1 ) = w1 and T (v2 ) = w2 , i.e., T −1 (w1 ) = v1 and T −1 (w2 ) = v2 .
Definition 4.3.8. Let V and W be vector spaces over F and let T ∈ L(V, W). Then, T is said
to be singular if {0} $ Ker(T ), i.e., Ker(T ) contains a non-zero vector. If Ker(T ) = {0}
then, T is called non-singular.
" #! x
x
Example 4.3.9. Let T ∈ L(R2 , R3 ) be defined by T
=
y . Then, verify that T is
y
0
non-singular. Is T invertible?
Theorem 4.3.10. Let V and W be vector spaces over F and let T ∈ L(V, W). Then the
following statements are equivalent.
1. T is one-one.
2. T is non-singular.
3. Whenever S ⊆ V is linearly independent, T (S) is linearly independent.
Proof. 1⇒2 On the contrary, let T be singular. Then, there exists v 6= 0 such that T (v) =
0 = T (0). This implies that T is not one-one, a contradiction.
2⇒3 Let S ⊆ V be linearly independent. If possible, let T (S) be linearly dependent.
Then, there exist v1 , . . . , vk ∈ S and α = (α1 , . . . , αk )T ≠ 0 such that \sum_{i=1}^{k} α_i T (v_i ) = 0.
Thus, T\left(\sum_{i=1}^{k} α_i v_i\right) = 0. But T is non-singular and hence we get \sum_{i=1}^{k} α_i v_i = 0 with α ≠ 0, a
contradiction to S being a linearly independent set.
3⇒1 Suppose that T is not one-one. Then, there exists x, y ∈ V such that x 6= y but
T (x) = T (y). Thus, we have obtained S = {x − y}, a linearly independent subset of V with
T (S) = {0}, a linearly dependent set. A contradiction to our assumption. Thus, the required
result follows.
Definition 4.3.11. Let V and W be vector spaces over F and let T ∈ L(V, W). Then, T is
said to be an isomorphism if T is one-one and onto. The vector spaces V and W are said to
be isomorphic, denoted V ≅ W, if there is an isomorphism from V to W.
We now give a formal proof of the statement that every finite dimensional vector space V
over F looks like Fn , where n = dim(V).
Corollary 4.3.13. The vector space R over Q is not finite dimensional. Similarly, the vector
space C over Q is not finite dimensional.
We now summarize the different definitions related with a linear operator on a finite dimen-
sional vector space. The proof basically uses the rank-nullity theorem and they appear in some
form in previous results. Hence, we leave the proof for the reader.
Theorem 4.3.14. Let V be a finite dimensional vector space over F with dim V = n. Then the
following statements are equivalent for T ∈ L(V).
1. T is one-one.
2. Ker(T ) = {0}.
3. Rank(T ) = n.
4. T is onto.
5. T is an isomorphism.
6. If {v1 , . . . , vn } is a basis for V then so is {T (v1 ), . . . , T (vn )}.
7. T is non-singular.
8. T is invertible.
4.4 Ordered Bases
Example 4.4.1. 1. Let f (x) = 1 − x2 ∈ R[x; 2]. If B = (1, x, x2 ) is a basis of R[x; 2] then
f (x) = 1 · 1 + 0 · x + (−1) · x2 = [1, x, x2 ]\begin{bmatrix}1\\0\\-1\end{bmatrix}.
So, from Example 4.4.1 we conclude the following: Let V be a vector space of dimension n
over F. If we fix a basis, say B = (u1 , u2 , . . . , un ), of V and if v ∈ V with v = \sum_{i=1}^{n} α_i u_i then
v = [u1 , u2 , . . . , un ]\begin{bmatrix}α_1\\α_2\\\vdots\\α_n\end{bmatrix} = [u2 , u1 , . . . , un ]\begin{bmatrix}α_2\\α_1\\\vdots\\α_n\end{bmatrix}.
Note the change in the first two components of the column vectors, which are elements of Fn .
So, a change in the position of the vectors ui ’s gives a change in the column vector. Hence,
if we fix the order of the basis vectors ui ’s then, with respect to this order, all vectors can be
thought of as elements of Fn . We use the above discussion to define an ordered basis.
Definition 4.4.2. Let W be a vector space over F with a basis B = {u1 , . . . , um }. Then, an
ordered basis for W is a basis B together with a one-to-one correspondence between B and
{1, 2, . . . , m}. Since there is an order among the elements of B, we write B = (u1 , . . . , um ). The
matrix B = [u1 , . . . , um ], containing the basis vectors of W, is called the basis matrix.
Example 4.4.3. Note that for Example 4.4.1.1 [1, x, x2 ] is a basis matrix, whereas for Exam-
ple 4.4.1.2, [u1 , u2 ] and [u2 , u1 ] are basis matrices.
Definition 4.4.4. Let B = [v1 , . . . , vm ] be the basis matrix corresponding to an ordered basis
B = (v1 , . . . , vm ) of W. Since B is a basis of W, for each v ∈ W, there exist βi , 1 ≤ i ≤ m,
such that v = \sum_{i=1}^{m} β_i v_i = B\begin{bmatrix}β_1\\\vdots\\β_m\end{bmatrix}. The vector \begin{bmatrix}β_1\\\vdots\\β_m\end{bmatrix}, denoted [v]B , is called the coordinate
vector of v with respect to B. Thus,
v = B[v]B = [v1 , . . . , vm ][v]B , or equivalently, v = [v]TB \begin{bmatrix}v_1\\\vdots\\v_m\end{bmatrix}.    (4.4.1)
1. f (x) = 1 − x2 ∈ R[x; 2] with B = (1, x, x2 ) as an ordered basis of R[x; 2] ⇒ [f (x)]B = \begin{bmatrix}1\\0\\-1\end{bmatrix}.
The next definition relates the coordinates of a vector with respect to two distinct ordered
bases. This allows us to move from one ordered basis to another ordered basis.
Definition 4.4.8. Let V be a vector space over F with dim(V) = n. Let A = [v1 , . . . , vn ] and
B = [u1 , . . . , un ] be basis matrices corresponding to the ordered bases A and B, respectively, of
V. Thus, continuing with the symbolic expression in Equation (4.4.1), we have
A = [v1 , . . . , vn ] = [B[v1 ]B , . . . , B[vn ]B ] = B [[v1 ]B , . . . , [vn ]B ] = B[A]B ,    (4.4.2)
where [A]B = [[v1 ]B , . . . , [vn ]B ], is called the matrix of A with respect to the ordered basis
B or the change of basis matrix from A to B.
We now summarize the ideas related with ordered bases. This also helps us to understand
the nomenclature ‘change of basis matrix’ for the matrix [A]B .
Theorem 4.4.9. Let V be a vector space over F with dim(V) = n. Further, let A = (v1 , . . . , vn )
and B = (u1 , . . . , un ) be two ordered bases of V.
1. Then the matrix [A]B is invertible. Further, Equation (4.4.2) gives [A]B = B −1 A.
3. Moreover, [x]B = [A]B [x]A , for all x ∈ V, i.e., [A]B takes coordinate vector of x with
respect to A to the coordinate vector of x with respect to B.
Proof. Part 1: Note that using Equation (4.4.3), we see that the matrix [A]B takes a linearly
independent set to another linearly independent set. Hence, by Exercise 3.3.17, the matrix [A]B
is invertible, which proves Part 1. A similar argument gives Part 2.
Part 3: Using Equation (4.4.2), [x]B = B −1 x = B −1 (AA−1 )x = (B −1 A)(A−1 x) = [A]B [x]A ,
for all x ∈ V. A similar argument gives Part 4 and clearly Part 5.
Example 4.4.10.
1. Let V = Rn , A = [v1 , . . . , vn ] and B = (e1 , . . . , en ) be the standard ordered basis. Then
A = [v1 , . . . , vn ] = [[v1 ]B , . . . , [vn ]B ] = [A]B .
2. Suppose A = (1, 0, 0)T , (1, 1, 0)T , (1, 1, 1)T and B = (1, 1, 1)T , (1, −1, 1)T , (1, 1, 0)T are
two ordered bases of R3 . Then, we verify the statements in the previous result.
(a) Using Equation (4.4.2), \begin{bmatrix}x\\y\\z\end{bmatrix}_A = \begin{bmatrix}1&1&1\\0&1&1\\0&0&1\end{bmatrix}^{-1}\begin{bmatrix}x\\y\\z\end{bmatrix} = \begin{bmatrix}x-y\\y-z\\z\end{bmatrix}.
(b) Similarly, \begin{bmatrix}x\\y\\z\end{bmatrix}_B = \begin{bmatrix}1&1&1\\1&-1&1\\1&1&0\end{bmatrix}^{-1}\begin{bmatrix}x\\y\\z\end{bmatrix} = \frac{1}{2}\begin{bmatrix}-1&1&2\\1&-1&0\\2&0&-2\end{bmatrix}\begin{bmatrix}x\\y\\z\end{bmatrix} = \frac{1}{2}\begin{bmatrix}-x+y+2z\\x-y\\2x-2z\end{bmatrix}.
(c) [A]B = \begin{bmatrix}-1/2&0&1\\1/2&0&0\\1&1&0\end{bmatrix}, [B]A = \begin{bmatrix}0&2&0\\0&-2&1\\1&1&0\end{bmatrix} and [A]B [B]A = I3 .
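These computations are easy to reproduce numerically. A Python/numpy sketch for part 2, where A and B below are the basis matrices of the two ordered bases above:

    import numpy as np

    # basis matrices: columns are the basis vectors
    A = np.array([[1., 1., 1.],
                  [0., 1., 1.],
                  [0., 0., 1.]])
    B = np.array([[1., 1., 1.],
                  [1., -1., 1.],
                  [1., 1., 0.]])

    A_B = np.linalg.solve(B, A)     # [A]_B = B^{-1} A, the change of basis matrix
    B_A = np.linalg.solve(A, B)     # [B]_A = A^{-1} B

    print(A_B)                                   # [[-0.5, 0, 1], [0.5, 0, 0], [1, 1, 0]]
    print(np.allclose(A_B @ B_A, np.eye(3)))     # True: [A]_B [B]_A = I_3

    x = np.array([1., 2., 3.])
    x_A = np.linalg.solve(A, x)                  # [x]_A
    x_B = np.linalg.solve(B, x)                  # [x]_B
    print(np.allclose(x_B, A_B @ x_A))           # True: [x]_B = [A]_B [x]_A
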
Exercise 4.4.11. In R3 , let A = (1, 2, 0)T , (1, 3, 2)T , (0, 1, 3)T be an ordered basis.
1. If B = (1, 2, 1)T , (0, 1, 2)T , (1, 4, 6)T is another ordered basis of R3 . Then, determine
When there is no mention of bases, we take it to be the standard ordered bases and denote
the corresponding matrix by [T ]. Also, note that for each x ∈ V, the matrix T [A, B][x]A is
the coordinate vector of T (x) with respect to the ordered basis B of the co-domain. Thus,
the matrix T [A, B] takes coordinate vector of the domain points to the coordinate vector of its
images. The above discussion is stated as the next result.
See Figure 4.1 for clarity on which basis occurs at which place.
Figure 4.1: Matrix of the Linear Transformation
Remark 4.5.3. Let V and W be vector spaces over F with ordered bases A1 = (v1 , . . . , vn )
and B1 = (w1 , . . . , wm ), respectively. Also, for α ∈ F with α 6= 0, let A2 = (αv1 , . . . , αvn ) and
B2 = (αw1 , . . . , αwm ) be another set of ordered bases of V and W, respectively. Then, for any
T ∈ L(V, W)
h i h i
T [A2 , B2 ] = [T (αv1 )]B2 · · · [T (αvn )]B2 = [T (v1 )]B1 · · · [T (vn )]B1 = T [A1 , B1 ].
Thus, the same matrix can be the matrix representation of T for two different pairs of bases.
We now give a few examples to understand the above discussion and Theorem 4.5.2.
Figure 4.2: Counter-clockwise rotation by an angle θ (left: images of e1 = (1, 0)T and e2 = (0, 1)T ; right: rotation of a general point P = (x, y)T to P ′ = (x′ , y ′ )T )
Or equivalently, using the left figure in Figure 4.2 we see that the matrix in the standard
ordered basis of R2 equals
" #
h i cos θ − sin θ
[T ] = T (e1 ), T (e2 ) = . (4.5.1)
sin θ cos θ
(b) On the image space take the ordered basis as B = \left(\begin{bmatrix}1\\0\end{bmatrix}, \begin{bmatrix}1\\1\end{bmatrix}\right). Then
5. Define T ∈ L(C3 ) by T (x) = x, for all x ∈ C3 . Note that T is the Id map. Determine
the coordinate matrix with respect to the ordered bases A = (e1 , e2 , e3 ) and
B = ((1, 0, 0)T , (1, 1, 0)T , (1, 1, 1)T ).
Solution: By definition, verify that
T [A, B] = [[T (e1 )]B , [T (e2 )]B , [T (e3 )]B ] = \begin{bmatrix}1&-1&0\\0&1&-1\\0&0&1\end{bmatrix} and
T [B, A] = [[T ((1, 0, 0)T )]A , [T ((1, 1, 0)T )]A , [T ((1, 1, 1)T )]A ] = \begin{bmatrix}1&1&1\\0&1&1\\0&0&1\end{bmatrix}.
Thus, verify that T [B, A]−1 = T [A, B] and T [A, A] = T [B, B] = I3 as the given map is
indeed the identity map.
We now give a remark which relates the above ideas with respect to matrix multiplication.
Remark 4.5.5. 1. Fix S ∈ Mn (C) and define T ∈ L(Cn ) by T (x) = Sx, for all x ∈ Cn . If
A is the standard basis of Cn then [T ] = S as
[T ][:, i] = [T (ei )]A = [S(ei )]A = [S[:, i]]A = S[:, i], for 1 ≤ i ≤ n.
2. Fix S ∈ Mm,n (C) and define T ∈ L(Cn , Cm ) by T (x) = Sx, for all x ∈ Cn . Let A and B
be the standard ordered bases of Cn and Cm , respectively. Then T [A, B] = S as
(T [A, B])[:, i] = [T (ei )]B = [Sei ]B = [S[:, i]]B = S[:, i], for 1 ≤ i ≤ n.
3. Fix S ∈ Mn (C) and define T ∈ L(Cn ) by T (x) = Sx, for all x ∈ Cn . Let A = (v1 , . . . , vn )
and B = (u1 , . . . , un ) be two ordered bases of Cn with respective basis matrices A and B.
Then
T [A, B] = [[T (v1 )]B , . . . , [T (vn )]B ] = [B −1 T (v1 ), . . . , B −1 T (vn )]
= [B −1 Sv1 , . . . , B −1 Svn ] = B −1 S [v1 , . . . , vn ] = B −1 SA.
2. [Finding T from T [A, B]] Let V and W be vector spaces over F with ordered bases A and
B, respectively. Suppose we are given the matrix S = T [A, B]. Then to determine the
corresponding T ∈ L(V, W), we go back to the symbolic expression in Equation (4.4.1)
and Theorem 4.5.2. We see that
(a) T (v) = B[T (v)]B = BT [A, B][v]A = BS[v]A .
(b) In particular, if V = W = Fn and A = B then T (v) = BSB −1 v.
(c) Further, if B is the standard ordered basis then T (v) = Sv.
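The recipe T [A, B] = B −1 SA is easy to carry out numerically. A sketch follows; the matrices S, A and B below are arbitrary illustrative choices, not taken from the text:

    import numpy as np

    S = np.array([[2., 1., 0.],
                  [0., 1., -1.],
                  [1., 0., 3.]])
    # ordered bases of R^3, written as basis matrices (columns are basis vectors)
    A = np.array([[1., 1., 1.],
                  [0., 1., 1.],
                  [0., 0., 1.]])
    B = np.array([[1., 0., 1.],
                  [1., 1., 0.],
                  [0., 1., 1.]])

    T_AB = np.linalg.solve(B, S @ A)      # T[A, B] = B^{-1} S A

    # sanity check: [T(v)]_B = T[A,B] [v]_A for a test vector v
    v = np.array([3., -1., 2.])
    v_A = np.linalg.solve(A, v)           # coordinates of v w.r.t. A
    lhs = np.linalg.solve(B, S @ v)       # [T(v)]_B
    print(np.allclose(lhs, T_AB @ v_A))   # True
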
Exercise 4.5.7. 1. Relate Remark 4.5.5.3 with Theorem 4.4.9 as Id is the identity map.
3. Let T ∈ L(R2 ) represent the reflection about the line y = mx. Find [T ].
Ans: Note that T ((1, 0)T ) = \frac{1}{1+m^2}(1 − m^2 , 2m)^T and T ((0, 1)T ) = \frac{1}{1+m^2}(2m, m^2 − 1)^T . Thus,
A = [T ] = \frac{1}{1 + m^2}\begin{bmatrix}1-m^2&2m\\2m&m^2-1\end{bmatrix}.
4. Let T ∈ L(R3 ) represent the reflection about/across the X-axis. Find [T ]. What about the
reflection across the XY -plane?
Ans: \begin{bmatrix}1&0&0\\0&-1&0\\0&0&-1\end{bmatrix} and \begin{bmatrix}1&0&0\\0&1&0\\0&0&-1\end{bmatrix}, respectively.
5. Let T ∈ L(R3 ) represent the counter-clockwise rotation around the positive Z-axis by an
angle θ, 0 ≤ θ < 2π. Find its matrix with respect to the standard ordered basis of R3 .
[Hint: Is \begin{bmatrix}\cos θ&-\sin θ&0\\\sin θ&\cos θ&0\\0&0&1\end{bmatrix} the required matrix?]
6. Define a function D ∈ L(R[x; n]) by D(f (x)) = f 0 (x). Find the matrix of D with respect
to the standard ordered basis of R[x; n]. Observe that Rng(D) ⊆ R[x; n − 1].
Ans: Note that this is an (n + 1) × (n + 1) matrix and equals
\begin{bmatrix}0&1&0&\cdots&0\\0&0&2&\cdots&0\\\vdots&\vdots&\vdots&\ddots&\vdots\\0&0&0&\cdots&n\\0&0&0&\cdots&0\end{bmatrix}.
This idea can be generalized to any finite dimensional vector space. To do so, we start with
the matrix of the composition of two linear transformations. This also helps us to relate matrix
multiplication with composition of two functions.
Theorem 4.6.1. Let V, W and Z be finite dimensional vector spaces over F with ordered bases B, C and D, respectively. Also, let T ∈ L(V, W)
and S ∈ L(W, Z). Then S ◦ T = ST ∈ L(V, Z) (see Figure 4.3) and (ST )[B, D] = S[C, D] · T [B, C].
Proof. For all u ∈ V, we get (S[C, D] · T [B, C]) [u]B = [(ST )(u)]D = (ST )[B, D] [u]B . Hence
(ST ) [B, D] = S[C, D] · T [B, C].
As an immediate corollary of Theorem 4.6.1 we see that the matrix of the inverse linear
transform is the inverse of the matrix of the linear transform, whenever the inverse exists.
Theorem 4.6.2 (Inverse of a Linear Transformation). Let V be a vector space with dim(V) = n.
If T ∈ L(V) is invertible then for any ordered bases B and C of the domain and co-domain,
respectively, one has (T [C, B])−1 = T −1 [B, C]. That is, the inverse of the coordinate matrix of
T is the coordinate matrix of the inverse linear transform.
Proof. As T is invertible, T T −1 = Id. Thus, Remark 4.5.5.3 and Theorem 4.6.1 imply
In = Id[B, B] = (T ◦ T −1 )[B, B] = T [C, B] · T −1 [B, C].
Hence, by definition of inverse, T −1 [B, C] = (T [C, B])−1 and the required result follows.
Exercise 4.6.3. Find the matrix of the linear transformations given below.
1. Define T ∈ L(R[x; 3]) by T (1) = 1, T (x) = 1 + x, T (x2 ) = (1 + x)2 and T (x3 ) = (1 + x)3 , and let B = (1, x, x2 , x3 ). Prove that T is invertible. Also,
find T [B, B] and T −1 [B, B].
Ans: Note that {1, 1 + x, (1 + x)2 , (1 + x)3 } is linearly independent. As T takes a linearly
independent set to another linearly independent set, T is invertible. Further,
T [B, B] = \begin{bmatrix}1&1&1&1\\0&1&2&3\\0&0&1&3\\0&0&0&1\end{bmatrix} and T −1 [B, B] = \begin{bmatrix}1&-1&1&-1\\0&1&-2&3\\0&0&1&-3\\0&0&0&1\end{bmatrix}.
Let V be a finite dimensional vector space. Then, the next result answers the question “what
happens to the matrix T [B, B] if the ordered basis B changes to C?”
Figure 4.4: T [C, C] = Id[B, C] · T [B, B] · (Id[B, C])−1 , the commutative diagram for similarity of matrices
Theorem 4.6.4. Let B = (u1 , . . . , un ) and C = (v1 , . . . , vn ) be two ordered bases of V and Id
the identity operator. Then, for any linear operator T ∈ L(V),
T [C, C] = Id[B, C] · T [B, B] · (Id[B, C])−1 .
Proof. As Id is the identity operator, the composite functions (T ◦ Id), (Id ◦ T ) from (V, B) to
(V, C) are equal (see Figure 4.4 for clarity). Hence, their matrix representations with respect to
ordered bases B and C are equal. Thus, (T ◦ Id)[B, C] = T [B, C] = (Id ◦ T )[B, C]. Thus, using
Theorem 4.6.1, we get T [C, C] · Id[B, C] = (T ◦ Id)[B, C] = (Id ◦ T )[B, C] = Id[B, C] · T [B, B], i.e., T [C, C] = Id[B, C] · T [B, B] · (Id[B, C])−1 .
Let V be a vector space and let T ∈ L(V). If dim(V) = n then every ordered basis B of V
gives an n × n matrix T [B, B]. So, as we change the ordered basis, the coordinate matrix of
T changes. Theorem 4.6.4 tells us that all these matrices are related by an invertible matrix.
Thus, we are led to the following definitions.
Definition 4.6.5. Let V be a vector space with ordered bases B and C. If T ∈ L(V) then,
T [C, C] = Id[B, C] · T [B, B] · Id[C, B]. The matrix Id[B, C] is called the change of basis matrix
(also, see Theorem 4.4.9) from B to C.
Definition 4.6.6. Let X, Y ∈ Mn (C). Then, X and Y are said to be similar if there exists a
non-singular matrix P such that P −1 XP = Y ⇔ X = P Y P −1 ⇔ XP = P Y .
Example 4.6.7. Let B = (1 + x, 1 + 2x + x2 , 2 + x) and C = (1, 1 + x, 1 + x + x2 ) be two ordered
bases of R[x; 2]. Then, verify that Id[B, C]−1 = Id[C, B], as
Id[C, B] = [[1]B , [1 + x]B , [1 + x + x2 ]B ] = \begin{bmatrix}-1&1&-2\\0&0&1\\1&0&1\end{bmatrix} and
Id[B, C] = [[1 + x]C , [1 + 2x + x2 ]C , [2 + x]C ] = \begin{bmatrix}0&-1&1\\1&1&1\\0&1&0\end{bmatrix}.
Exercise 4.6.8. 1. Let A ∈ Mn (R) such that tr(A) = 0. Then prove that there exists a
non-singular matrix S such that SAS −1 = B with B = [bij ] and bii = 0, for 1 ≤ i ≤ n.
2. Let V be a vector space with dim(V) = n. Let T ∈ L(V) satisfy T n−1 ≠ 0 but T n = 0.
Then, use Exercise 4.1.13.2 to get an ordered basis B = (u, T (u), . . . , T n−1 (u)) of V.
(a) Now, prove that T [B, B] = \begin{bmatrix}0&0&\cdots&0&0\\1&0&\cdots&0&0\\0&1&\cdots&0&0\\\vdots&&\ddots&&\vdots\\0&0&\cdots&1&0\end{bmatrix}.
(b) Let A ∈ Mn (C) satisfy An−1 6= 0 but An = 0. Then, prove that A is similar to the
matrix given in Part 2a.
3. Let A be an ordered basis of a vector space V over F with dim(V) = n. Then prove that
the set of all possible matrix representations of T is given by (also see Definition 4.6.5)
4. Let B1 (α, β) = {(x, y)T ∈ R2 : (x − α)2 + (y − β)2 ≤ 1}. Then, can we get a linear
transformation T ∈ L(R2 ) such that T (S) = W , where S and W are given below?
5. Let V, W be vector spaces over F with dim(V) = n and dim(W) = m and ordered bases
B and C, respectively. Define IB,C : L(V, W) → Mm,n (F) by IB,C (T ) = T [B, C]. Show
that IB,C is an isomorphism. Thus, when bases are fixed, the number of m × n matrices
is same as the number of linear transformations.
Chapter 5
Inner Product Spaces
The dot product helped us to compute the length of vectors and talk of perpendicularity of
vectors. We now generalize the idea of dot product to achieve similar goals for a general vector
space over R or C.
Definition 5.1.1. Let V be a vector space over F. An inner product over V, denoted by
h , i, is a map from V × V to F satisfying
1. hαu + βv, wi = α hu, wi + β hv, wi, for all α, β ∈ F and u, v, w ∈ V,
2. hu, vi = hv, ui, the complex conjugate of hu, vi, for all u, v ∈ V and
3. hu, ui ≥ 0, for all u ∈ V, with hu, ui = 0 if and only if u = 0.
Remark 5.1.2. Using the definition of inner product, we immediately observe that
1. hu, 0i = hu, 0 + 0i = hu, 0i + hu, 0i. Thus, hu, 0i = 0, for all u ∈ V.
2. hv, α wi = \overline{hα w, vi} = \overline{α hw, vi} = \overline{α} hv, wi, for all α ∈ F and v, w ∈ V.
3. If hu, vi = 0 for all v ∈ V then in particular hu, ui = 0. Hence u = 0.
Definition 5.1.3. Let V be a vector space with an inner product h , i. Then, (V, h , i) is called
an inner product space (in short, ips).
Example 5.1.4. Examples 1 to 4 that appear below are called the standard inner product
or the dot product. Whenever an inner product is not clearly mentioned, it will be assumed
to be the standard inner product.
133
(d) hαf , gi = \int_{-1}^{1} (αf (x))g(x)\,dx = α \int_{-1}^{1} f (x)g(x)\,dx = α hf , gi.
" #
4 −1
5. For x = (x1 , x2 )T , y = (y1 , y2 )T ∈ R2 and A = , define hx, yi = yT Ax. Then,
−1 2
h , i is an inner product as hx, xi = (x1 − x2 )2 + 3x21 + x22 .
" #
a b
6. Fix A = with a, c > 0 and ac > b2 . Then, hx, yi = yT Ax is an inner product on
b c
h i2
R2 as hx, xi = ax21 + 2bx1 x2 + cx22 = a x1 + bxa2 + a1 ac − b2 x22 .
Exercise 5.1.5. For x = (x1 , x2 )T , y = (y1 , y2 )T ∈ R2 , we define three maps that satisfy at
least one condition out of the three conditions for an inner product. Determine the condition
which is not satisfied. Give reasons for your answer.
1. hx, yi = x1 y1 .
As hu, ui > 0, for all u 6= 0, we use inner product to define the length/ norm of a vector.
Definition 5.1.6. Let V be an inner product space over F. Then, for any vector u ∈ V, we
define the length (norm) of u, denoted kuk, by kuk = \sqrt{hu, ui}, the positive square root. A vector of
norm 1 is called a unit vector. Thus, \frac{u}{kuk} is called the unit vector in the direction of u.
Example 5.1.7. 1. Let V be an ips and u ∈ V. Then, for any scalar α, kαuk = | α | · kuk.
2. Let u = (1, −1, 2, −3)T ∈ R4 . Then, kuk = \sqrt{1 + 1 + 4 + 9} = \sqrt{15}. Thus, \frac{1}{\sqrt{15}} u and
−\frac{1}{\sqrt{15}} u are unit vectors in the direction of u.
Exercise 5.1.8. 1. Let u = (−1, 1, 2, 3, 7)T ∈ R5 . Find all α ∈ R such that kαuk = 1.
2. Given u, v ∈ R2 , does there exist an inner product on R2 such that kuk = 1, kvk = 1 and hu, vi = 0? [Hint: Let A = \begin{bmatrix}a&b\\b&c\end{bmatrix} and define hx, yi = yT Ax.
Use the given conditions to get a linear system of 3 equations in the variables a, b, c.]
5.2 Cauchy-Schwartz Inequality
A very useful and fundamental inequality, commonly called the Cauchy-Schwartz inequality, is proved next.
Theorem 5.2.1 (Cauchy- Schwartz inequality). Let V be an inner product space over F. Then,
for any u, v ∈ V
| hu, vi | ≤ kuk kvk. (5.2.1)
Moreover, equality holds in Inequality (5.2.1) if and only if u and v are linearly dependent. In
particular, if u ≠ 0 then v = \left\langle v, \frac{u}{kuk} \right\rangle \frac{u}{kuk}.
Proof. If u = 0 then Inequality (5.2.1) holds. Hence, let u ≠ 0. Then, by Definition 5.1.1.3,
hλu + v, λu + vi ≥ 0 for all λ ∈ F and v ∈ V. In particular, for λ = −\frac{hv, ui}{kuk^2}, we have
0 ≤ hλu + v, λu + vi = λ\overline{λ}kuk^2 + λhu, vi + \overline{λ}hv, ui + kvk^2 = \frac{|hv, ui|^2}{kuk^2} − \frac{|hv, ui|^2}{kuk^2} − \frac{|hv, ui|^2}{kuk^2} + kvk^2 = kvk^2 − \frac{|hv, ui|^2}{kuk^2}.
Or, in other words, | hv, ui |^2 ≤ kuk^2 kvk^2 and the proof of the inequality is over.
Now, note that equality holds in Inequality (5.2.1) if and only if hλu + v, λu + vi = 0, or
equivalently, λu + v = 0. Hence, u and v are linearly dependent.
Exercise 5.2.3. 1. Let a, b ∈ R with a, b > 0. Then, prove that (a + b)\left(\frac{1}{a} + \frac{1}{b}\right) ≥ 4. In
general, for 1 ≤ i ≤ n, let ai ∈ R with ai > 0. Then \left(\sum_{i=1}^{n} a_i\right)\left(\sum_{i=1}^{n} \frac{1}{a_i}\right) ≥ n^2 .
Ans: Use the Cauchy-Schwartz inequality with u_i = \sqrt{a_i} and v_i = \frac{1}{\sqrt{a_i}}.
2. Prove that | z1 + · · · + zn | ≤ \sqrt{n\,( | z1 |^2 + · · · + | zn |^2 )}, for z1 , . . . , zn ∈ C. When does equality hold?
3. Let V be an ips. If u, v ∈ V with kuk = 1, kvk = 1 and hu, vi = 1 then prove that u = αv
for some α ∈ F. Is α = 1?
Let V be a real vector space. Then, for u, v ∈ V, the Cauchy-Schwartz inequality implies
hu, vi
that −1 ≤ ≤ 1. This together with the properties of the cosine function is used to
kuk kvk
define the angle between two vectors in a real inner product space.
Definition 5.2.4. Let V be a real vector space. If θ ∈ [0, π] is the angle between u, v ∈ V \ {0}
then we define cos θ = \frac{hu, vi}{kuk\, kvk}.
Example 5.2.5. 1. Take (1, 0)T , (1, 1)T ∈ R2 . Then cos θ = \frac{1}{\sqrt{2}}. So θ = π/4.
2. Take (1, 1, 0)T , (1, 1, 1)T ∈ R3 . Then the angle between them is β = \cos^{-1}\frac{2}{\sqrt{6}}.
3. The angle depends on the inner product. Take hx, yi = 2x1 y1 + x1 y2 + x2 y1 + x2 y2 on R2 . Then the angle
between (1, 0)T and (1, 1)T equals \cos^{-1}\frac{3}{\sqrt{10}}.
4. As hx, yi = hy, xi for any real vector space, the angle between x and y is the same as the
angle between y and x.
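The angle computations above are easy to check numerically. The sketch below also illustrates that the angle changes with the inner product; the matrix M encodes the weighted inner product of item 3.

    import numpy as np

    def angle(x, y, M=None):
        # angle w.r.t. the inner product <x, y> = y^T M x (M = I if omitted)
        if M is None:
            M = np.eye(len(x))
        ip = lambda u, v: v @ M @ u
        return np.arccos(ip(x, y) / np.sqrt(ip(x, x) * ip(y, y)))

    x, y = np.array([1., 0.]), np.array([1., 1.])
    print(angle(x, y), np.pi / 4)                        # standard inner product: pi/4

    M = np.array([[2., 1.], [1., 1.]])                   # <x,y> = 2x1y1 + x1y2 + x2y1 + x2y2
    print(angle(x, y, M), np.arccos(3 / np.sqrt(10)))    # both approximately 0.3217
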
We will now prove that if A, B and C are the vertices of a triangle (see Figure 5.1) and a, b
b2 + c2 − a2
and c, respectively, are the lengths of the corresponding sides then cos(A) = . This
2bc
in turn implies that the angle between vectors has been rightly defined.
Figure 5.1: The triangle with vertices A, B, C and side lengths a, b, c
Lemma 5.2.6. Let A, B and C be the vertices of a triangle (see Figure 5.1) with corresponding
side lengths a, b and c, respectively, in a real inner product space V then
b2 + c2 − a2
cos(A) = .
2bc
Proof. Let 0, u and v be the coordinates of the vertices A, B and C, respectively, of the triangle
ABC. Then, AB ~ = u, AC
~ = v and BC ~ = v − u. Thus, we need to prove that
Now, by definition kv−uk2 = kvk2 +kuk2 −2hv, ui and hence kvk2 +kuk2 −kv−uk2 = 2 hu, vi.
As hv, ui = kvk kuk cos(A), the required result follows.
4. kx + yk2 + kx − yk2 = 2kxk2 + 2kyk2 (parallelogram law: the sum of squares of the
lengths of the diagonals of a parallelogram equals twice the sum of squares of the lengths
of its sides).
Solution: Just expand the left hand side to get the required result.
Theorem 5.3.3. Let V be a normed linear space and x, y ∈ V. Then \bigl| kxk − kyk \bigr| ≤ kx − yk.
Proof. As kxk = kx − y + yk ≤ kx − yk + kyk one has kxk − kyk ≤ kx − yk. Similarly, one
obtains kyk − kxk ≤ ky − xk = kx − yk. Combining the two, the required result follows.
3. Let A ∈ Mn (C) satisfy kAxk ≤ kxk for all x ∈ Cn . Then, prove that if α ∈ C with
| α | > 1 then A − αI is invertible.
Ans: The matrix B = A − αI is invertible if and only if the system Bx = 0 has only the
trivial solution. So, let x0 be a solution of Bx = 0. Then 0 = Bx0 = (A − αI)x0 implies
Ax0 = αx0 . As |α| > 1, we get kx0 k < |α|kx0 k = kαx0 k = kAx0 k ≤ kx0 k, a contradiction.
The next result is stated without proof as the proof is beyond the scope of this book.
Theorem 5.3.5. Let k · k be a norm on a normed linear space V. Then the norm k · k is induced
by some inner product if and only if k · k satisfies the parallelogram law: kx + yk2 + kx − yk2 = 2(kxk2 + kyk2 ), for all x, y ∈ V.
Example 5.3.6. For x = (x1 , x2 )T ∈ R2 , we define kxk = |x1 | + |x2 |. Verify that kxk is
indeed a norm. But, for x = e1 and y = e2 , 2(kxk2 + kyk2 ) = 4 whereas
kx + yk2 + kx − yk2 = k(1, 1)T k2 + k(1, −1)T k2 = (|1| + |1|)2 + (|1| + | − 1|)2 = 8.
So the parallelogram law fails. Thus, kxk is not induced by any inner product in R2 .
Exercise 5.3.7. Does there exist an inner product in R2 such that kxk = max{|x1 |, |x2 |}?
Ans: No, as the parallelogram law fails. Take x = e1 and y = e2 . Then 2(kxk2 + kyk2 ) = 4
whereas kx + yk2 + kx − yk2 = k(1, 1)T k2 + k(1, −1)T k2 = (|1|)2 + (|1|)2 = 2.
2. If V is a vector space over R or C then 0 is the only vector that is orthogonal to itself.
3. Let V = R.
x = hx, ui \frac{u}{kuk^2} + \left(x − hx, ui \frac{u}{kuk^2}\right) = \frac{x_1 + 2x_2}{5}(1, 2)^T + \frac{2x_1 - x_2}{5}(2, -1)^T
is a decomposition of x into two vectors, one parallel to u = (1, 2)T and the other parallel to u⊥ .
7. Let P = (1, 1, 1)T , Q = (2, 1, 3)T and R = (−1, 1, 2)T be three vertices of a triangle in R3 .
Compute the angle between the sides P Q and P R.
Solution: Method 1: Note that \vec{PQ} = (2, 1, 3)T − (1, 1, 1)T = (1, 0, 2)T , \vec{PR} =
(−2, 0, 1)T and \vec{QR} = (−3, 0, −1)T . As h\vec{PQ}, \vec{PR}i = 0, the angle between the sides
P Q and P R is π/2.
Method 2: kP Qk = \sqrt{5}, kP Rk = \sqrt{5} and kQRk = \sqrt{10}. As kQRk2 = kP Qk2 + kP Rk2 ,
the angle between the sides P Q and P R is π/2.
Exercise 5.4.3.
(a) If S ⊆ V then S ⊥ is a subspace of V and S ⊥ = (LS(S))⊥ .
(b) Furthermore, if V is finite dimensional then S ⊥ and LS(S) are complementary.
Thus, V = LS(S) ⊕ S ⊥ . Equivalently, hu, wi = 0, for all u ∈ LS(S) and w ∈ S ⊥ .
2. Find v, w ∈ R3 such that v, w and (1, −1, −2)T are mutually orthogonal.
3. Let W = {(x, y, z, w)T ∈ R4 : x + y + z − w = 0}. Find a basis of W⊥ .
4. Determine W⊥ , where W = {A ∈ Mn (R) | AT = A}.
6. Consider R3 with the standard inner product. Find the plane containing
(a) (1, 1 − 1) with (a, b, c) 6= 0 as the normal vector.
(b) (2, −2, 1)T and perpendicular to the line ` = {(t − 1, 3t + 2, t + 1) : t ∈ R}.
(c) the lines (1, 2, −2) + t(1, 1, 0) and (1, 2, −2) + t(0, 1, 2).
(d) (1, 1, 2)T and orthogonal to the line `{(2 + t, 3, 1 − t) : t ∈ R}.
7. Let P = (3, 0, 2)T , Q = (1, 2, −1)T and R = (2, −1, 1)T be three points in R3 . Then,
(a) find the area of the triangle with vertices P, Q and R.
(b) find the area of the parallelogram built on vectors P~Q and QR.
~
(c) find a non-zero vector orthogonal to the plane of the above triangle.
√
(d) find all vectors x orthogonal to P~Q and QR~ with kxk = 2.
(e) the volume of the parallelepiped built on vectors P~Q and QR
~ and x, where x is one
of the vectors found in Part (d). Do you think the volume would be different if you
choose the other vector x?
8. Let p1 be a plane containing the point A = (1, 2, 3)T and the vector (2, −1, 1)T as its
normal. Then,
(a) find the equation of the plane p2 that is parallel to p1 and contains (−1, 2, −3)T .
(b) calculate the distance between the planes p1 and p2 .
9. In the parallelogram ABCD, ABkDC and ADkBC and A = (−2, 1, 3)T , B = (−1, 2, 2)T
and C = (−3, 1, 5)T . Find the
4. Recall that hf (x), g(x)i = \int_{-π}^{π} f (x)g(x)\,dx defines the standard inner product in C[−π, π].
Consider S = {1} ∪ {em | m ≥ 1} ∪ {fn | n ≥ 1}, where 1(x) = 1, em (x) = cos(mx) and
Consider S = {1} ∪ {em | m ≥ 1} ∪ {fn | n ≥ 1}, where 1(x) = 1, em (x) = cos(mx) and
fn (x) = sin(nx), for all m, n ≥ 1 and for all x ∈ [−π, π]. Then,
(a) S is a linearly independent set.
(b) k1k2 = 2π, kem k2 = π and kfn k2 = π.
(c) the functions in S are orthogonal.
Hence, \left\{\frac{1}{\sqrt{2π}}\right\} ∪ \left\{\frac{1}{\sqrt{π}} e_m \mid m ≥ 1\right\} ∪ \left\{\frac{1}{\sqrt{π}} f_n \mid n ≥ 1\right\} is an orthonormal set in C[−π, π].
We now prove the most important initial result of this section.
Hence ci = 0, for 1 ≤ i ≤ n. Thus, the above linear system has only the trivial solution. So,
the set S is linearly independent.
Part 2: Note that hv, ui i = \left\langle \sum_{j=1}^{n} α_j u_j , u_i \right\rangle = \sum_{j=1}^{n} α_j hu_j , u_i i = α_i hu_i , u_i i = α_i . This completes
Sub-part (a). For Sub-part (b), we have
kvk^2 = \left\| \sum_{i=1}^{n} α_i u_i \right\|^2 = \left\langle \sum_{i=1}^{n} α_i u_i , \sum_{j=1}^{n} α_j u_j \right\rangle = \sum_{i=1}^{n}\sum_{j=1}^{n} α_i \overline{α_j} hu_i , u_j i = \sum_{i=1}^{n} α_i \overline{α_i} hu_i , u_i i = \sum_{i=1}^{n} | α_i |^2 .
hx, yi = \sum_{i=1}^{n} hx, v_i i \overline{hy, v_i i}, for each x, y ∈ V.
Furthermore, if x = y then kxk^2 = \sum_{i=1}^{n} | hx, v_i i |^2 (generalizing the Pythagoras Theorem).
We have another corollary of Theorem 5.4.6 which talks about an orthogonal set.
Theorem 5.4.8 (Bessel’s Inequality). Let V be an ips with {v1 , · · · , vn } as an orthogonal set.
Then, for each z ∈ V, \sum_{k=1}^{n} \frac{| hz, v_k i |^2}{kv_k k^2} ≤ kzk^2 . Equality holds if and only if z = \sum_{k=1}^{n} \frac{hz, v_k i}{kv_k k^2} v_k .
Proof. For 1 ≤ k ≤ n, define u_k = \frac{v_k}{kv_k k} and use Theorem 5.4.6.4 to get the required result.
Remark 5.4.9. Using Theorem 5.4.6, we see that if B = (v1 , . . . , vn ) is an ordered orthonormal
basis of an ips V then
[u]B = \begin{bmatrix}hu, v_1 i\\\vdots\\hu, v_n i\end{bmatrix}, for each u ∈ V.
Thus, to get the coordinates of a vector with respect to an orthonormal ordered basis, we just
need to compute the inner product with basis vectors.
To proceed further with the applications of the above ideas, we pose a question for better
understanding.
Example 5.4.10. Which point on the plane P is closest to the point, say Q?
Solution: Let y be the foot of the perpendicular from Q on P . Thus, by Pythagoras
Theorem (see Theorem 5.4.6.3c), y is unique. So, the question arises: how do we find y?
Note that \vec{yQ} gives a normal vector of the plane P . Hence, \vec{OQ} = \vec{Oy} + \vec{yQ}. So, we need to
decompose \vec{OQ} into two vectors such that one of them lies on the plane P and the other is
orthogonal to the plane.
Thus, we see that given u, v ∈ V \ {0}, we need to find two vectors, say y and z, such that
y is parallel to u and z is perpendicular to u, with v = y + z. Thus, kyk = kvk cos(θ) and kzk = kvk sin(θ), where θ is
the angle between u and v.
Figure 5.2: \vec{OQ} = \frac{\langle v, u\rangle}{\|u\|^2}\, u and \vec{OR} = v − \frac{\langle v, u\rangle}{\|u\|^2}\, u, the decomposition of v along u and orthogonal to u
We do this as follows (see Figure 5.2). Let û = \frac{u}{\|u\|} be the unit vector in the direction
of u. Then, using trigonometry, cos(θ) = \frac{\|\vec{OQ}\|}{\|\vec{OP}\|}. Hence \|\vec{OQ}\| = \|\vec{OP}\| cos(θ). Now using
Definition 5.2.4, \|\vec{OQ}\| = \|v\| \frac{|\langle v, u\rangle|}{\|v\|\,\|u\|} = \frac{|\langle v, u\rangle|}{\|u\|}, where the absolute value is taken as the
length/norm is a positive quantity. Thus,
\vec{OQ} = \|\vec{OQ}\|\, û = \left\langle v, \frac{u}{\|u\|} \right\rangle \frac{u}{\|u\|}.
Hence, y = \vec{OQ} = \left\langle v, \frac{u}{\|u\|} \right\rangle \frac{u}{\|u\|} and z = v − \left\langle v, \frac{u}{\|u\|} \right\rangle \frac{u}{\|u\|}. In the literature, the vector y = \vec{OQ}
is called the orthogonal projection of v on u, denoted Proju (v). Thus,
Proju (v) = \left\langle v, \frac{u}{\|u\|} \right\rangle \frac{u}{\|u\|} and \|Proju (v)\| = \|\vec{OQ}\| = \frac{|\langle v, u\rangle|}{\|u\|}.    (5.4.2)
Moreover, the distance of the point P from the line containing u equals \|\vec{OR}\| = \|\vec{PQ}\| = \left\| v − \left\langle v, \frac{u}{\|u\|} \right\rangle \frac{u}{\|u\|} \right\|.
Example 5.4.11. 1. Determine the foot of the perpendicular from the point (1, 2, 3) on the
XY -plane.
Solution: Verify that the required point is (1, 2, 0).
2. Determine the foot of the perpendicular from the point Q = (1, 2, 3, 4) on the plane
generated by (1, 1, 0, 0), (1, 0, 1, 0) and (0, 1, 1, 1).
Answer: (x, y, z, w) lies on the plane x − y − z + 2w = 0 ⇔ h(1, −1, −1, 2), (x, y, z, w)i = 0.
So, the required point equals
(1, 2, 3, 4) − \left\langle (1, 2, 3, 4), \frac{1}{\sqrt{7}}(1, -1, -1, 2) \right\rangle \frac{1}{\sqrt{7}}(1, -1, -1, 2) = (1, 2, 3, 4) − \frac{4}{7}(1, -1, -1, 2) = \frac{1}{7}(3, 18, 25, 20).
Note that Proju (w) is parallel to u and Projv2 (w) is parallel to v2 . Hence, we have
(a) w1 = Proju (w) = hw, ui \frac{u}{\|u\|^2} = \frac{1}{4} u = \frac{1}{4}(1, 1, 1, 1)^T , which is parallel to u,
Proof. Note that for orthonormality, we need kwi k = 1, for 1 ≤ i ≤ n and hwi , wj i = 0, for
1 ≤ i 6= j ≤ n. Also, by Corollary 3.3.11.2, vi ∈
/ LS(v1 , . . . , vi−1 ), for 2 ≤ i ≤ n, as {v1 , . . . , vn }
is a linearly independent set. We are now ready to prove the result by induction.
Step 1: Define w1 = \frac{v_1}{\|v_1\|}; then LS(v1 ) = LS(w1 ).
Step 2: Define u2 = v2 − hv2 , w1 i w1 . Then, u2 ≠ 0 as v2 ∉ LS(v1 ). So, let w2 = \frac{u_2}{\|u_2\|}.
Example 5.5.2. 1. Let S = {(1, −1, 1, 1), (1, 0, 1, 0), (0, 1, 0, 1)} ⊆ R4 . Find an orthonormal
set T such that LS(S) = LS(T ).
Solution: As we just require LS(S) = LS(T ), we can order the vectors as per our
convenience. So, let v1 = (1, 0, 1, 0)T , v2 = (0, 1, 0, 1)T and v3 = (1, −1, 1, 1)T . Then,
w1 = \frac{1}{\sqrt{2}}(1, 0, 1, 0)T . As hv2 , w1 i = 0, we get w2 = \frac{1}{\sqrt{2}}(0, 1, 0, 1)T . For the third vector,
let u3 = v3 − hv3 , w1 iw1 − hv3 , w2 iw2 = (0, −1, 0, 1)T . Thus, w3 = \frac{1}{\sqrt{2}}(0, −1, 0, 1)T .
2. Let S = \left\{ v1 = (2, 0, 0)^T , v2 = \left(\tfrac{3}{2}, 2, 0\right)^T , v3 = \left(\tfrac{1}{2}, \tfrac{3}{2}, 0\right)^T , v4 = (1, 1, 1)^T \right\}. Find
an orthonormal set T such that LS(S) = LS(T ).
Solution: Take w1 = \frac{v_1}{\|v_1\|} = (1, 0, 0)^T = e1 . For the second vector, consider u2 =
v2 − \frac{3}{2} w1 = (0, 2, 0)^T . So, put w2 = \frac{u_2}{\|u_2\|} = (0, 1, 0)^T = e2 .
For the third vector, let u3 = v3 − \sum_{i=1}^{2} hv3 , wi i wi = (0, 0, 0)^T . So, v3 ∈ LS(w1 , w2 ). Or
equivalently, the set {v1 , v2 , v3 } is linearly dependent.
So, for again computing the third vector, define u4 = v4 − \sum_{i=1}^{2} hv4 , wi i wi . Then, u4 =
v4 − w1 − w2 = e3 . So w4 = e3 . Hence, T = {w1 , w2 , w4 } = {e1 , e2 , e3 }.
Observe that (−2, 1, 0) and (−1, 0, 1) are orthogonal to (1, 2, 1) but are themselves not
orthogonal.
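A Python/numpy sketch of the Gram-Schmidt process, applied to the set S of item 1 above; vectors whose residual is numerically zero are discarded, exactly as in item 2:

    import numpy as np

    def gram_schmidt(vectors, tol=1e-12):
        # return an orthonormal set spanning the same space as `vectors`
        ortho = []
        for v in vectors:
            u = np.array(v, dtype=float)
            for w in ortho:
                u = u - np.dot(u, w) * w          # subtract the projection on each w
            if np.linalg.norm(u) > tol:            # skip vectors already in the span
                ortho.append(u / np.linalg.norm(u))
        return ortho

    S = [(1., 0., 1., 0.), (0., 1., 0., 1.), (1., -1., 1., 1.)]
    for w in gram_schmidt(S):
        print(np.round(w, 4))
    # approximately (1,0,1,0)/sqrt(2), (0,1,0,1)/sqrt(2), (0,-1,0,1)/sqrt(2), as in item 1
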
Method 1: Apply the Gram-Schmidt process to \{\frac{1}{\sqrt{6}}(1, 2, 1)^T , (−2, 1, 0)^T , (−1, 0, 1)^T \} ⊆ R3 .
Method 2: Valid only in R3 : use the cross product of two vectors.
In either case, verify that \{\frac{1}{\sqrt{6}}(1, 2, 1)^T , \frac{1}{\sqrt{5}}(-2, 1, 0)^T , \frac{1}{\sqrt{30}}(-1, -2, 5)^T \} is the required set.
(a) Then prove that {x} can be extended to form an orthonormal basis of Rn .
(b) Let the extended basis be {x,x2 , . . . , xn } and B = [e 1 , . . . , en ] the standard ordered
basis of Rn . Prove that A = [x]B , [x2 ]B , . . . , [xn ]B is an orthogonal matrix.
7. Let v, w ∈ Rn , n ≥ 1, with kvk = kwk = 1. Prove that there exists an orthogonal matrix
P such that P v = w. Prove also that P can be chosen such that det(P ) = 1.
Ans: Let {v, v2 , . . . , vn } be an orthonormal basis of Rn containing v. Similarly, let
{w, w2 , . . . , wn } be an orthonormal basis of Rn containing w. Define A = [v, v2 , . . . , vn ]
and B = [w, w2 , . . . , wn ]. Then A and B are orthogonal matrices with Ae1 = v and Be1 = w.
So, det(A) = ±1, det(B) = ±1 and P = BA−1 satisfies P v = w. If det(BA−1 ) = −1, replace the column v2 of A by −v2 .
5.6 QR Decomposition
In this section, we study the QR-decomposition of a matrix A ∈ Mn (R). The decomposition
is obtained by applying the Gram-Schmidt Orthogonalization process to the columns of the
matrix A. Thus, the set {A[:, 1], . . . , A[:, n]} of the columns of A are taken as the collection of
vectors {v1 , . . . , vn }.
If Rank(A) = n then the columns of A are linearly independent and the application of
the Gram-Schmidt process gives us vectors {w1 , . . . , wn } ⊆ Rn such that the matrix Q =
[w1 , . . . , wn ] is an orthogonal matrix. Further, the condition A[:, i] ∈ LS(w1 , . . . , wi ), for 1 ≤ i ≤ n, gives an upper triangular matrix R with A = QR.
Theorem 5.6.1 (QR Decomposition). Let A ∈ Mn (R) be a matrix with Rank(A) = n. Then,
there exist matrices Q and R such that Q is orthogonal and R is upper triangular with A = QR.
Furthermore, the diagonal entries of R can be chosen to be positive. Also, in this case, the
decomposition is unique.
Proof. The argument before the statement of the theorem gives us A = QR, with
Thus, this completes the proof of the first part. Note that
1. αii ≠ 0, for 1 ≤ i ≤ n, as A[:, 1] ≠ 0 and A[:, i] ∉ LS(w1 , . . . , wi−1 ), since A has full column
rank.
2. if αii < 0, for some i, 1 ≤ i ≤ n, then we can replace wi in Q by −wi to get new matrices
Q and R with the added condition that the diagonal entries of R are positive.
Remark 5.6.2. Note that in the proof of Theorem 5.6.1, we just used the idea that A[:, i] ∈
LS(w1 , . . . , wi ) to get the scalars αji , for 1 ≤ j ≤ i. As {w1 , . . . , wi } is an orthonormal set,
αji = hwj , A[:, i]i. So, it is quite easy to compute the entries of the upper triangular matrix R.
Hence, proceeding on the lines of the above theorem, one has the following result.
Solution: From Example 5.5.2, we know that w1 = \frac{1}{\sqrt{2}}(1, 0, 1, 0)^T , w2 = \frac{1}{\sqrt{2}}(0, 1, 0, 1)^T
and w3 = \frac{1}{\sqrt{2}}(0, −1, 0, 1)^T . We now compute w4 . If v4 = (2, 1, 1, 1)^T then
u4 = v4 − hv4 , w1 iw1 − hv4 , w2 iw2 − hv4 , w3 iw3 = \frac{1}{2}(1, 0, -1, 0)^T .
Thus, w4 = \frac{1}{\sqrt{2}}(-1, 0, 1, 0)^T . Hence, we see that A = QR with
Q = [w1 , . . . , w4 ] = \frac{1}{\sqrt{2}}\begin{bmatrix}1&0&0&-1\\0&1&-1&0\\1&0&0&1\\0&1&1&0\end{bmatrix} and R = \begin{bmatrix}\sqrt{2}&0&\sqrt{2}&\frac{3}{\sqrt{2}}\\0&\sqrt{2}&0&\sqrt{2}\\0&0&\sqrt{2}&0\\0&0&0&-\frac{1}{\sqrt{2}}\end{bmatrix}.
2. Let A = \begin{bmatrix}1&1&1&0\\-1&0&-2&1\\1&1&1&0\\1&0&2&1\end{bmatrix}. Find a 4 × 3 matrix Q satisfying QT Q = I3 and an upper
triangular matrix R such that A = QR.
Solution: Let us apply the Gram-Schmidt orthonormalization process to the columns of
A. As v1 = (1, −1, 1, 1)^T , we get w1 = \frac{1}{2} v1 . Let v2 = (1, 0, 1, 0)^T . Then,
u2 = v2 − hv2 , w1 iw1 = (1, 0, 1, 0)^T − w1 = \frac{1}{2}(1, 1, 1, -1)^T ,
so that w2 = \frac{1}{2}(1, 1, 1, -1)^T . The third column of A lies in LS(w1 , w2 ), and for the fourth column
v4 = (0, 1, 0, 1)^T we get u4 = v4 and w3 = \frac{1}{\sqrt{2}}(0, 1, 0, 1)^T . Hence,
Q = [w1 , w2 , w3 ] = \begin{bmatrix}\frac{1}{2}&\frac{1}{2}&0\\-\frac{1}{2}&\frac{1}{2}&\frac{1}{\sqrt{2}}\\\frac{1}{2}&\frac{1}{2}&0\\\frac{1}{2}&-\frac{1}{2}&\frac{1}{\sqrt{2}}\end{bmatrix} and R = \begin{bmatrix}2&1&3&0\\0&1&-1&0\\0&0&0&\sqrt{2}\end{bmatrix}.
(a) Rank(A) = 3,
(b) A = QR with QT Q = I3 , and
(c) R is a 3 × 4 upper triangular matrix with Rank(R) = 3.
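In practice the QR decomposition is computed with library routines rather than by hand. A sketch using numpy (note that numpy's qr returns a square Q and a 4 × 4 R for the rank-deficient A of example 2, and its signs may differ from the hand computation):

    import numpy as np

    A = np.array([[ 1., 1.,  1., 0.],
                  [-1., 0., -2., 1.],
                  [ 1., 1.,  1., 0.],
                  [ 1., 0.,  2., 1.]])      # the rank-3 matrix from example 2 above

    Q, R = np.linalg.qr(A)
    print(np.allclose(Q.T @ Q, np.eye(4)))    # Q has orthonormal columns
    print(np.allclose(Q @ R, A))              # A = QR
    print(np.round(np.diag(R), 10))           # one diagonal entry is (numerically) zero,
                                              # reflecting Rank(A) = 3
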
In most practical applications, the linear systems are inconsistent due to various reasons.
The reasons could be related to human error, computational/rounding-off errors, missing data,
or there not being enough time to solve the whole linear system. So, we need to go beyond
consistent linear systems. In quite a few such cases, we are interested in finding a point x ∈ Rn
such that the error vector b − Ax has the least norm. Thus, we consider the
problem of finding x0 ∈ Rn such that kb − Ax0 k = min{kb − Axk : x ∈ Rn }.
Definition 5.7.2. Let W be a finite dimensional subspace of an ips V. Then, by Theorem 5.7.1,
for each v ∈ V there exist unique vectors w ∈ W and u ∈ W⊥ with v = w + u. We thus define
the orthogonal projection of V onto W, denoted PW , by
PW : V → W by PW (v) = w.
So, note that the solution x0 ∈ Rn satisfying kb − Ax0 k = min{kb − Axk : x ∈ Rn } is the
projection of b on the Col(A).
Remark 5.7.3. Let A ∈ Mm,n (R) and W = Col(A). Then, to find the orthogonal projection
PW (b), we can use either of the following ideas:
1. Determine an orthonormal basis {f1 , . . . , fk } of Col(A). Then PW (b) = \sum_{i=1}^{k} hb, f_i i f_i . Note
that
x0 = PW (b) = \sum_{i=1}^{k} hb, f_i i f_i = \sum_{i=1}^{k} f_i (f_i^T b) = \left(\sum_{i=1}^{k} f_i f_i^T\right) b = P b,
where P = \sum_{i=1}^{k} f_i f_i^T is called the projection matrix of Rm onto Col(A).
Corollary 5.7.4. Let A ∈ Mm,n (R) and b ∈ Rm . Then, x0 is a least square solution of Ax = b
if and only if x0 is a solution of the system AT Ax = AT b.
Proof. As b ∈ Rm , by Remark 5.7.3, there exist y ∈ Col(A) and v ∈ Null(AT ) such that b = y + v.
Thus, the vectors b − Ax1 and Ax1 − Ax are orthogonal and hence
1. Determine the projection of the vector (1, 1, 1)T on Null(A), where Null(A) = {(x, y, z)T ∈ R3 : x + y − z = 0}.
(a) Method 1: Observe that \{\frac{1}{\sqrt{2}}(1, -1, 0)^T , \frac{1}{\sqrt{6}}(1, 1, 2)^T \} is an orthonormal basis of Null(A). Thus,
the projection matrix equals
P = \frac{1}{2}\begin{bmatrix}1\\-1\\0\end{bmatrix}\begin{bmatrix}1&-1&0\end{bmatrix} + \frac{1}{6}\begin{bmatrix}1\\1\\2\end{bmatrix}\begin{bmatrix}1&1&2\end{bmatrix} = \begin{bmatrix}2/3&-1/3&1/3\\-1/3&2/3&1/3\\1/3&1/3&2/3\end{bmatrix} and P\begin{bmatrix}1\\1\\1\end{bmatrix} = \begin{bmatrix}2/3\\2/3\\4/3\end{bmatrix}.
(b) Method 2: The columns of B = \begin{bmatrix}1&1&1\\-1&0&1\\0&1&-1\end{bmatrix} form a basis of R3 , the first two columns spanning
Null(A) and the third spanning Col(AT ). Then x = \frac{1}{3}(-2, 4, 1)^T is a solution of Bx = (1, 1, 1)^T . Thus, we see that (1, 1, 1)^T = u + v, where
u = \frac{1}{3}(1, 1, -1)^T ∈ Col(AT ) and v = -\frac{2}{3}(1, -1, 0)^T + \frac{4}{3}(1, 0, 1)^T = \frac{2}{3}(1, 1, 2)^T ∈
Null(A). Thus, the required projection equals v = \left(\frac{2}{3}, \frac{2}{3}, \frac{4}{3}\right)^T .
(c) Method 3: Since we want the projection on Null(A), consider B = \begin{bmatrix}1&1\\-1&0\\0&1\end{bmatrix}.
Then Null(A) = Col(B). Thus, we need the vector x0 , a solution of the linear sys-
tem B^T Bx = B^T\begin{bmatrix}1\\1\\1\end{bmatrix}. Or equivalently, we need the solution of \begin{bmatrix}2&1\\1&2\end{bmatrix} x = \begin{bmatrix}0\\2\end{bmatrix}. The
solution is x0 = \frac{1}{3}\begin{bmatrix}-2\\4\end{bmatrix}. Thus, the projection vector equals Bx0 = v = \left(\frac{2}{3}, \frac{2}{3}, \frac{4}{3}\right)^T .
2. Find the foot of the perpendicular from the point v = (1, 2, 3, 4)T on the plane generated
by the vectors (1, 1, 0, 0)T , (1, 0, 1, 0)T and (0, 1, 1, 1)T .
(a) Method 1: Note that the three vectors lie on the plane x − y − z − 2w = 0. Then
r = (1, −1, −1, 2)T is the normal vector of the plane. Hence
4 1
v − Projr v = (1, 2, 3, 4)T − (1, −1, −1, 2)T = (3, 18, 25, 20)T
7 7
(b) Method 2: Using the Gram-Schmidt process, verify that
w1 = \frac{1}{\sqrt{2}}(1, 1, 0, 0)^T , w2 = \frac{1}{\sqrt{6}}(1, -1, 2, 0)^T , w3 = \frac{1}{\sqrt{21}}(-2, 2, 2, 3)^T
form an orthonormal basis of the plane generated by the vectors (1, 1, 0, 0)T , (1, 0, 1, 0)T
(c) Method 3: Let A = \begin{bmatrix}1&1&0\\1&0&1\\0&1&1\\0&0&1\end{bmatrix}. Solving AT Ax = AT v gives x = \frac{1}{7}(-2, 5, 20)^T .
Thus, Ax = \frac{1}{7}(3, 18, 25, 20)^T is the nearest vector to v = (1, 2, 3, 4)T .
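The projection/least-squares computations above can be reproduced with numpy, either via the normal equations AT Ax = AT b or with the built-in least squares solver. A sketch for item 2:

    import numpy as np

    # columns of A span the plane generated by (1,1,0,0), (1,0,1,0) and (0,1,1,1)
    A = np.column_stack([(1., 1., 0., 0.), (1., 0., 1., 0.), (0., 1., 1., 1.)])
    v = np.array([1., 2., 3., 4.])

    # normal equations: A^T A x = A^T v
    x = np.linalg.solve(A.T @ A, A.T @ v)
    foot = A @ x                                # the foot of the perpendicular
    print(foot * 7)                             # [3., 18., 25., 20.], i.e. foot = (1/7)(3,18,25,20)

    # the same via numpy's least squares solver
    x_ls, *_ = np.linalg.lstsq(A, v, rcond=None)
    print(np.allclose(A @ x_ls, foot))          # True
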
Exercise 5.7.6. 1. Let W = {(x, y, z, w) ∈ R4 : x = y, z = w} be a subspace of R4 .
Determine the matrix of the orthogonal projection.
2. Let PW1 and PW2 be the orthogonal projections of R2 onto W1 = {(x, 0) : x ∈ R} and
W2 = {(x, x) : x ∈ R}, respectively. Note that PW1 ◦ PW2 is a projection onto W1 . But,
it is not an orthogonal projection. Hence or otherwise, conclude that the composition of
two orthogonal projections need not be an orthogonal projection?
" #
1 1
3. Let A = . Then, A is idempotent but not symmetric. Now, define P : R2 → R2 by
0 0
P (v) = Av, for all v ∈ R2 . Then,
(a) P is idempotent.
(b) Null(P ) ∩ Rng(P ) = Null(A) ∩ Col(A) = {0}.
(c) R2 = Null(P ) + Rng(P ). But, (Rng(P ))⊥ = (Col(A))⊥ 6= Null(A).
(d) Since (Col(A))⊥ 6= Null(A), the map P is not an orthogonal projector. In this
case, P is called a projection of R2 onto Rng(P ) along Null(P ).
4. Find all 2 × 2 real matrices A such that A2 = A. Hence, or otherwise, determine all
projection operators of R2 .
Example 5.8.2. Prove that the following maps T are orthogonal operators.
1. Fix a unit vector a ∈ Rn and define T : Rn → Rn by T (x) = 2hx, aia − x, for all x ∈ Rn .
Solution: Note that Proja (x) = hx, aia. So, hhx, aia, x − hx, aiai = 0 and
kT (x)k2 = k(hx, aia) + (hx, aia − x)k2 = khx, aiak2 + kx − hx, aiak2 = kxk2 .
2. Fix θ, 0 ≤ θ < 2π, and define T : R2 → R2 by T (x) = \begin{bmatrix}\cos θ&-\sin θ\\\sin θ&\cos θ\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix}, for all x = (x, y)T ∈ R2 .
Solution: Note that kT (x)k = \left\|\begin{bmatrix}x\cos θ - y\sin θ\\x\sin θ + y\cos θ\end{bmatrix}\right\| = \sqrt{x^2 + y^2} = \left\|\begin{bmatrix}x\\y\end{bmatrix}\right\|.
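Both maps are easy to check numerically to be length preserving. A sketch, with the unit vector a, the angle θ and the test vectors chosen arbitrarily for illustration:

    import numpy as np

    rng = np.random.default_rng(1)

    a = np.array([1., 2., 2.]) / 3.0                     # a unit vector in R^3
    reflect = lambda x: 2 * np.dot(x, a) * a - x         # T(x) = 2<x,a>a - x

    theta = 0.7
    Rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])

    x3 = rng.standard_normal(3)
    x2 = rng.standard_normal(2)
    print(np.isclose(np.linalg.norm(reflect(x3)), np.linalg.norm(x3)))   # True
    print(np.isclose(np.linalg.norm(Rot @ x2), np.linalg.norm(x2)))      # True
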
We now show that an operator is orthogonal if and only if it preserves the angle.
Theorem 5.8.3. Let T ∈ L(Rn ). Then, the following statements are equivalent.
1. T is an orthogonal operator.
2. hT (x), T (y)i = hx, yi, for all x, y ∈ Rn , i.e., T preserves inner product.
Corollary 5.8.4. Let T ∈ L(Rn ). Then, T is an orthogonal operator if and only if “for every
orthonormal basis {u1 , . . . , un } of Rn , {T (u1 ), . . . , T (un )} is an orthonormal basis of Rn ”.
Thus, if B is an orthonormal ordered basis of Rn then T [B, B] is an orthogonal matrix.
Observe that if T and S are two rigid motions then ST is also a rigid motion. Furthermore,
it is clear from the definition that every rigid motion is invertible.
We now prove that every rigid motion that fixes origin is an orthogonal operator.
Theorem 5.8.7. The following statements are equivalent for any map T : Rn → Rn .
1. T is a rigid motion that fixes the origin, i.e., T (0) = 0 and kT (x) − T (y)k = kx − yk, for all x, y ∈ Rn .
2. T is linear and hT (x), T (y)i = hx, yi, for all x, y ∈ Rn (preserves inner product).
3. T is an orthogonal operator.
Proof. We have already seen the equivalence of Part 2 and Part 3 in Theorem 5.8.3. Let us now
prove the equivalence of Part 1 and Part 2/Part 3.
If T is an orthogonal operator then T (0) = 0 and kT (x) − T (y)k = kT (x − y)k = kx − yk.
This proves Part 3 implies Part 1.
We now prove Part 1 implies Part 2. So, let T be a rigid motion that fixes 0. Thus,
T (0) = 0 and kT (x) − T (y)k = kx − yk, for all x, y ∈ Rn . Hence, in particular for y = 0, we
have kT (x)k = kxk, for all x ∈ Rn . So,
kT (x)k2 + kT (y)k2 − 2hT (x), T (y)i = kT (x) − T (y)k2 = kx − yk2 = kxk2 + kyk2 − 2hx, yi.
Thus, using kT (x)k = kxk, for all x ∈ Rn , we get hT (x), T (y)i = hx, yi, for all x, y ∈ Rn . Now,
to prove T is linear, we use hT (x), T (y)i = hx, yi, for all x, y ∈ Rn , in 3-rd and 4-th line below
to get
Thus, T (x+y)−(T (x) + T (y)) = 0 and hence T (x+y) = T (x)+T (y). A similar calculation
gives T (αx) = αT (x) and hence T is linear.
Prove that Orthogonal and Unitary congruences are equivalence relations on Mn (R) and
Mn (C), respectively.
2. Let x ∈ R2 . Identify it with the complex number x = x1 + ix2 . If we rotate x counter-clockwise by an
angle θ then the rotated point corresponds to
xe^{iθ} = (x1 + ix2 )(\cos θ + i \sin θ) = x1 \cos θ − x2 \sin θ + i[x1 \sin θ + x2 \cos θ].
5. Let U ∈ Mn (C). Then, prove that the following statements are equivalent.
Ans: Part 5a⇔ Part 5g. If U is unitary, then kxk2 = x∗ x = x∗ U ∗ U x = kU xk2 . Conversely,
we have
hU ∗ U x, xi = hU x, U xi = kU xk2 = kxk2 = hx, xi, for all x.
That is h(U ∗ U − I)x, xi = 0, for all x. Put B = U ∗ U − I. Now, taking x = ei , we see that
B(i, i) = 0. For i 6= j, taking x = ei + ej , we get
5.9 Summary
In the previous chapter, we learnt that if V is vector space over F with dim(V) = n then V
basically looks like Fn . Also, any subspace of Fn is either Col(A) or Null(A) or both, for some
matrix A with entries from F.
So, we started this chapter with the inner product, a generalization of the dot product in R3
or Rn . We used the inner product to define the length/norm of a vector. The norm has the
property that “the norm of a vector is zero if and only if the vector itself is the zero vector”. We
then proved the Cauchy-Schwartz Inequality which helped us in defining the angle between two
vectors. Thus, one can talk of geometrical problems in Rn and prove some geometrical results.
We then independently defined the notion of a norm in Rn and showed that a norm is
induced by an inner product if and only if the norm satisfies the parallelogram law (the sum of
squares of the diagonals equals twice the sum of squares of the two non-parallel sides).
The next subsection dealt with the fundamental theorem of linear algebra where we showed
that if A ∈ Mm,n (C) then
1. dim(Null(A)) + dim(Col(A)) = n.
2. Null(A) = (Col(A∗ ))⊥ and Null(A∗ ) = (Col(A))⊥ .
3. dim(Col(A)) = dim(Col(A∗ )).
So, the question arises, how do we compute an orthonormal basis? This is where we came
across the Gram-Schmidt Orthonormalization process. This algorithm helps us to determine
an orthonormal basis of LS(S) for any finite subset S of a vector space. This also led to the
QR-decomposition of a matrix.
Thus, we observe the following about the linear system Ax = b. If
1. b ∈ Col(A) then we can use the Gauss-Jordan method to get a solution.
2. b ∉ Col(A) then in most cases we need a vector x such that the least square error between
b and Ax is minimum. We saw that this minimum is attained by the projection of b on
Col(A). Also, this vector can be obtained either using the fundamental theorem of linear
algebra or by computing the matrix B(B T B)−1 B T , where the columns of B are either
the pivot columns of A or a basis of Col(A).
T
AF
DR
Chapter 6
Note that we have been trying to solve the linear system Ax = b. But, in most cases, we are
not able to solve it because of certain restrictions. Hence in the last chapter, we looked at the
nearest solution or obtained the projection of b on the column space of A.
T
AF
These problems arise from the fact that either our data size is too large or there are missing
informations (data is incomplete or the data has ambiguities or the data is inaccurate) or the
DR
data is coming too fast in the sense that our computational power doesn’t match the speed at
which data is received or it could be any other reason. So, to take care of such issues, we either
work with a submatrix of A or with the matrix AT A. We also try to concentrate on only a few
important aspects depending on our past experience.
Thus, we need to find certain set of critical vectors/directions associated with the given
linear system. Hence, in this chapter, all our matrices will be square matrices. They will have
real numbers as entries for convenience. But, we need to work over complex numbers. Hence,
we will be working with Mn (C) and x = (x1 , . . . , xn )T ∈ Cn , for some n ∈ N. Further, Cn will
be considered only as a complex vector space. We start with an example for motivation.
Example 6.1.1. Let A be a real symmetric matrix. Consider the following problem:
n X
n n
!
X X
T T
L(x, λ) = x Ax − λ(x x − 1) = aij xi xj − λ x2i −1 .
i=1 j=1 i=1
161
162 CHAPTER 6. EIGENVALUES, EIGENVECTORS AND DIAGONALIZABILITY
∂L
= 2a11 x1 + 2a12 x2 + · · · + 2a1n xn − 2λx1 ,
∂x1
.. ..
.=.
∂L
= 2an1 x1 + 2an2 x2 + · · · + 2ann xn − 2λxn .
∂xn
Therefore, to get the points of extremum, we solve for
T
T ∂L ∂L ∂L ∂L
0 = , ,..., = = 2(Ax − λx).
∂x1 ∂x2 ∂xn ∂x
Thus, to solve the extremal problem, we need λ ∈ R, x ∈ Rn such that x 6= 0 and Ax = λx.
Note that we could have started with a Hermitian matrix and arrived at a similar situation.
So, in previous chapters, we had looked at Ax = b, where A and b were known. Here, we need
to solve Ax = λx with x 6= 0. Note that 0 is already a solution and is not of interest to us.
Further, we will see that we are interested in only those solutions of Ax = λx which are linearly
independent. To proceed further, let us take a few examples, where we will try to look at what
does the system Ax = b imply?
" # " # " #
1 2 9 −2 x
Example 6.1.2. 1. Let A = ,B= and x = .
2 1 −2 6 y
T
AF
1 1 1
" # " # " # " #
1 1 1 1
by changing the direction of as A = −1 . Further, the vectors
−1 −1 −1 1
" #
1
and are orthogonal.
−1
" # " # " # " # " # " #
1 −2 1 1 2 2
(b) B magnifies both the vectors and as B =5 and B = 10 .
2 1 2 2 −1 −1
" # " #
1 2
Here again, the vectors and are orthogonal.
2 −1
(x + y)2 (x − y)2
(c) xT Ax = 3 − . Here, the displacements occur along perpendicular
2 2 " # " #
1 1
lines x + y = 0 and x − y = 0, where x + y = (x, y) and x − y = (x, y) .
1 −1
(x + 2y)2 (2x − y)2
Whereas xT Bx = 5 + 10 . Here also the maximum/minimum
5 5
displacements occur
" # along the orthogonal
" # lines x + 2y = 0 and 2x − y = 0, where
1 2
x + 2y = (x, y) and 2x − y = (x, y) .
2 −1
(d) the curve xT Ax = 10 represents a hyperbola, where as the curve xT Bx = 10 rep-
resents an ellipse (see the left two curves in Figure 6.1 drawn using the package
“Sagemath”).
6.1. INTRODUCTION AND DEFINITIONS 163
Figure 6.1: A Hyperbola and two Ellipses (first one has orthogonal axes)
.
In the above two examples we looked at symmetric matrices. What if our matrix is not
symmetric?
" #
7 −2
2. Let C = , a non-symmetric matrix. Then, does there exist a non-zero x ∈ C2
2 2
which gets magnified by C?
We need x 6= 0 and α ∈ C such that Cx = αx ⇔ [αI2 −C]x = 0. As x 6= 0, [αI2 −C]x = 0
has a solution if and only if det[αI − A] = 0. But,
T
AF
" #!
α−7 2
det[αI − A] = det = α2 − 9α + 18.
DR
−2 α − 2
" # " #
2 1
So α = 6, 3. For α = 6, verify that x = 6= 0 satisfies Cx = 6x. Similarly, x =
1 2
satisfies Cx = 3x. In this example,
" # " #
2 1
(a) we still have magnifications in the directions and .
1 2
(b) the maximum/minimum displacements do not occur along the lines 2x + y = 0 and
x + 2y = 0 (see the third curve in Figure 6.1). Note that
" #
7 0
{x ∈ R2 : xT Ax = 10} = {x ∈ R2 : xT x = 10},
0 2
" #
7 0
where is a symmetrization of A.
0 2
(c) the lines 2x + y = 0 and x + 2y = 0 are not orthogonal.
We observe the following about the matrices A, B and C that appear above:
1. det(A) = −3 = 3 × −1, det(B) = 50 = 5 × 10 and det(C) = 18 = 6 × 3.
2. tr(A) = 2 = 3 − 1, tr(B) = 15 = 5 + 10 and det(C) = 9 = 6 + 3.
(" # " #) (" # " #) (" # " #)
1 1 1 2 2 1
3. The sets , , , and , are linearly independent.
1 −1 2 −1 1 2
164 CHAPTER 6. EIGENVALUES, EIGENVECTORS AND DIAGONALIZABILITY
" # " #
1 1
4. If v1 = and v2 = and S = [v1 , v2 ] then
1 −1
" # " #
3 0 3 0
(a) AS = [Av1 , Av2 ] = [3v1 , −v2 ] = S ⇔ S −1 AS = = diag(3, −1).
0 −1 0 −1
1 1
(b) Let u1 = √ v1 and u2 = √ v2 . Then, u1 and u2 are orthonormal unit vectors,
2 2
i.e., if U = [u1 , u2 ] then I = U U ∗ = u1 u∗1 + u2 u∗2 and A = 3u1 u∗1 − u2 u∗2 .
" # " #
1 2
5. If v1 = and v2 = and S = [v1 , v2 ] then
2 −1
" # " #
5 0 5 0
(a) AS = [Av1 , Av2 ] = [5v1 , 10v2 ] = S ⇔ S −1 AS = = diag(3, −1).
0 10 0 10
1 1
(b) Let u1 = √ v1 and u2 = √ v2 . Then, u1 and u2 are orthonormal unit vectors,
5 5
i.e., if U = [u1 , u2 ] then I = U U ∗ = u1 u∗1 + u2 u∗2 and A = 5u1 u∗1 + 10u2 u∗2 .
" # " # " #
2 1 6 0
6. If v1 = and v2 = and S = [v1 , v2 ] then S −1 CS = = diag(6, 3).
1 2 0 3
representation is a diagonal matrix. To understand the ideas better, we start with the following
AF
definitions.
DR
Ax = λx ⇔ (A − λIn )x = 0 (6.1.1)
Proof. Let B = A − αIn . Then, by definition, α is an eigenvalue of A if any only if the system
Bx = 0 has a non-trivial solution. By Theorem 2.6.3 this holds if and only if det(B) = 0.
Definition 6.1.5. Let A ∈ Mn (C). Then det(A − λI) is a polynomial of degree n in λ and is
called the characteristic polynomial of A, denoted PA (λ), or in short P (λ).
6.1. INTRODUCTION AND DEFINITIONS 165
Remark 6.1.6. Let A ∈ Mn (C). Then A is singular if and only if 0 ∈ σ(A). Further, the
following statements hold.
1. If α ∈ σ(A) then
(a) {0} $ Null(A − αI). Therefore, if Rank(A − αI) = r then r < n. Hence, by
Theorem 2.6.3, the system (A − αI)x = 0 has n − r linearly independent solutions.
(b) v ∈ Null(A−αI) if and only if cv ∈ Null(A−αI), for c 6= 0. Thus, an eigenvector
v of A is in some sense a line ` = Span({v}) that passes through 0 and v and has
the property that the image of ` is either ` itself or 0.
r
ci xi ∈ Null(A − αI), for all ci ∈ C. Hence, if
P
(c) If x1 , . . . , xr ∈ Null(A − αI) then
i=1
S is a collection of eigenvectors then, we necessarily want the set S to be linearly
independent.
T
2. α ∈ σ(A) if and only if α is a root of PA (x) ∈ C[x]. As deg(PA (x)) = n, A has exactly n
AF
Almost all books in mathematics differentiate between characteristic value and eigenvalue
as the ideas change when one moves from complex numbers to any other scalar field. We give
the following example for clarity.
Remark 6.1.7. Let A ∈ M2 (F). Then, A induces a map T ∈ L(F2 ) defined by T (x) = Ax, for
all x ∈ F2 . We use this idea to understand the difference.
" #
0 1
1. Let A = . Then, pA (λ) = λ2 + 1.
−1 0
Let us look at some more examples. Also, as stated earlier, we look at roots of the charac-
teristic equation over C.
0 1 0
6. Let A = 0 0 1. Then, σ(A) = {0, 0, 0} with e1 as the only eigenvector.
0 0 0
0 1 0 0 0 x1
0 0 1 0 0 x2
7. Let A = 0 0 0 0 0. Then, σ(A) = {0, 0, 0, 0, 0}. Note that Ax3 = 0 implies
0 0 0 0 1 x
4
0 0 0 0 0 x5
x2 = 0 = x3 = x5 . Thus, e1 and e4 are the only eigenvectors. Note that the diagonal
blocks of A are nilpotent matrices.
Exercise 6.1.9. 1. Prove that the matrices A and AT have the same set of eigenvalues.
Construct a 2 × 2 matrix A such that the eigenvectors of A and AT are different.
Ans: Use "pA (α) = det(A−αI)
# = det((A−αI)T ) = det(AT −αI) = p"AT (α).
# For" the#second
1 1 1 1
part, A = . Then 0 ∈ σ(A). Verify that 1T A = 01T and A =0 .
−1 −1 −1 −1
4. Let A be a nilpotent matrix. Then, prove that its eigenvalues are all 0.
Ans: A is nilpotent implies there exist N ∈ N such that AN = 0. So, if (α, x) is an eigen-pair
of A then αN is an eigenvalue of AN = 0. Thus αN = 0 ⇒ α = 0.
5. Let J = 11T ∈ Mn (C). Then, J is a matrix with each entry 1. Show that
(a) (n, 1) is an eigenpair for J.
(b) 0 ∈ σ(J) with multiplicity n − 1. Find a set of n − 1 linearly independent eigenvectors
for 0 ∈ σ(J).
Ans: Note that J1 = (11T )1 = 1(1T 1) = n1. Let x ∈ Rn such that 1T x = 0. Note that
1
there are n − 1 such linearly independent vectors as the set √ 1 can be extended to form an
n
orthonormal basis of Rn . For each such x, Jx = (11T )x = 1(1T x) = 0x.
T
6. Let A = [aij ] ∈ Mn (R), where aij = a, if i = j and b, otherwise. Then, verify that
AF
9. Let A ∈ Mn (C) satisfy kAxk ≤ kxk for all x ∈ Cn . Then prove that every eigenvalue of
A lies between −1 and 1.
n
10. Let A = [aij ] ∈ Mn (C) with
P
aij = a, for all 1 ≤ i ≤ n. Then, prove that a is an
j=1
eigenvalue of A with corresponding eigenvector 1 = [1, 1, . . . , 1]T .
Ans: Just multiply by 1.
" #
B 0
11. Let B ∈ Mn (C) and C ∈ Mm (C). Let Z = . Then
0 C
168 CHAPTER 6. EIGENVALUES, EIGENVECTORS AND DIAGONALIZABILITY
" #!
x
(a) (α, x) is an eigen-pair for B implies α, is an eigen-pair for Z.
0
" #!
0
(b) (β, y) is an eigen-pair for C implies β, is an eigen-pair for Z.
y
Definition 6.1.10. Let A ∈ L(Cn ). Then, a vector y ∈ Cn \ {0} satisfying y∗ A = λy∗ is called
a left eigenvector of A for λ.
" # " # " # " # " #
7 −2 2 1 2 1
Example 6.1.11. Let A = , x= , y = , u= and v = . Then
2 2 1 2 −1 −2
verify that (6, x) and (3, y) are (right) eigen-pairs of A and (6, u), (3, v) are left eigen-pairs of
A. Note that xT v = 0 and yT u = 0. This is true in general and is proved next.
Theorem 6.1.12. [Principle of bi-orthogonality] Let (λ, x) be a (right) eigen-pair and (µ, y)
be a left eigen-pair of A. If λ 6= µ then y is orthogonal to x.
Proof. Verify that µy∗ x = (y∗ A)x = y∗ (Ax) = y∗ (λx) = λy∗ x. Thus y∗ x = 0.
Ans: Note λ(x∗ x) = x∗ (λx) = x∗ (Ax) = (x∗ A)x = (µx∗ )x = µ(x∗ x).
T
AF
2. Let S be a non-singular matrix such that its columns are left eigenvectors of A. Then,
prove that the columns of (S ∗ )−1 are right eigenvectors of A.
DR
Proposition 6.1.15. Let T ∈ L(Cn ) and let B be an ordered basis in Cn . Then (α, v) is an
eigen-pair of T if and only if (α, [v]B ) is an eigen-pair of A = T [B, B].
Proof. By definition, T (v) = αv if and only if [T v]B = [αv]B . Or equivalently, α ∈ σ(T ) if and
only if A[v]B = α[v]B . Thus, the required result follows.
Thus, the spectrum of a linear operator is independent of the choice of basis.
Remark 6.1.16. We give two examples to show that a linear operator on an infinite
dimensional vector space need not have an eigenvalue.
6.1. INTRODUCTION AND DEFINITIONS 169
1. Let V be the space of all real sequences (see Example 3.1.4.7) and define T ∈ L(V) by
We now prove the observations that det(A) is the product of eigenvalues and tr(A) is the
sum of eigenvalues.
Theorem 6.1.17. Let λ1 , . . . , λn , not necessarily distinct, be the A = [aij ] ∈ Mn (C). Then,
n
Q Pn n
P
det(A) = λi and tr(A) = aii = λi .
i=1 i=1 i=1
i=1
a11 − x a12 ··· a1n
a21 a22 − x · · · a 2n
det(A − xIn ) = . (6.1.3)
.. .. .. ..
. . .
an1 an2 · · · ann − x
= a0 − xa1 + · · · + (−1)n−1 xn−1 an−1 + (−1)n xn (6.1.4)
for some a0 , a1 , . . . , an−1 ∈ C. Then, an−1 , the coefficient of (−1)n−1 xn−1 , comes from the term
Ans: Use induction with A2 x = A(Ax) = A(αx) = α(Ax) = α2 x as the idea. Further, A
1
/ σ(A). Thus, Ax = αx with α 6= 0 implies x = A−1 x.
is invertible implies 0 ∈
α
3. Let A be a 3 × 3 orthogonal matrix (AAT = I). If det(A) = 1, then prove that there exists
v ∈ R3 \ {0} such that Av = v.
Ans: A is orthogonal ⇒ det(A) = ±1. Let (α, x) be an eigen-pair. As A is orthogonal,
kAvk = kvk for all v ∈ Rn . Thus |α| = 1. Further, A has real entries implies the complex
roots occur in pair. So, among the 3 eigenvalues, one of them has to be real. Thus, det(A) = 1
implies the real root is 1. Hence, Av = v, the eigenvector for 1.
We now show that for any eigenvalue α, the algebraic and geometric multiplicities do not
change under similarity transformation, or equivalently, under change of basis.
Theorem 6.2.3. Let A and B be two similar matrices. Then,
1. α ∈ σ(A) if and only if α ∈ σ(B).
2. for each α ∈ σ(A), Alg.Mulα (A) = Alg.Mulα (B) and Geo.Mulα (A) = Geo.Mulα (B).
Proof. Since A and B are similar, there exists an invertible matrix S such that A = SBS −1 .
So, α ∈ σ(A) if and only if α ∈ σ(B) as
Note that Equation (6.2.5) also implies that Alg.Mulα (A) = Alg.Mulα (B). We will now
show that Geo.Mulα (A) = Geo.Mulα (B).
So, let Q1 = {v1 , . . . , vk } be a basis of Null(A − αI). Then, B = SAS −1 implies that
Q2 = {Sv1 , . . . , Svk } ⊆ Null(B − αI). Since Q1 is linearly independent and S is invertible, we
get Q2 is linearly independent. So, Geo.Mulα (A) ≤ Geo.Mulα (B). Now, we can start with
eigenvectors of B and use similar arguments to get Geo.Mulα (B) ≤ Geo.Mulα (A). Hence
the required result follows.
Remark 6.2.4. 1. Let A = S −1 BS. Then, from the proof of Theorem 6.2.3, we see that x
is an eigenvector of A for λ if and only if Sx is an eigenvector of B for λ.
T
" #
0 0 0 1
For example, take A = and B = .
DR
0 0 0 0
3. Let A ∈ Mn (C). Then, for any invertible matrix B, the matrices AB and BA =
B(AB)B −1 are similar. Hence, in this case the matrices AB and BA have
(a) the same set of eigenvalues.
(b) Alg.Mulα (AB) = Alg.Mulα (BA), for each α ∈ σ(A).
(c) Geo.Mulα (AB) = Geo.Mulα (BA), for each α ∈ σ(A).
We will now give a relation between the geometric multiplicity and the algebraic multiplicity.
Theorem 6.2.5. Let A ∈ Mn (C). Then, for α ∈ σ(A), Geo.Mulα (A) ≤ Alg.Mulα (A).
Proof. Let Geo.Mulα (A) = k. So, suppose that {v1 , . . . , vk } is an orthonormal basis of
Null(A − αI). Extend it to get {v1 , . . . , vk , vk+1 , . . . , vn } as an orthonormal basis of Cn . Put
P = [v1 , . . . , vk , vk+1 , . . . , vn ]. Then P ∗ = P −1 and
Thus, the matrix B in the proof of Theorem 6.2.5 is with eigenvalues are −1 ± i.
AF
1 −1
DR
Ans: (a) Bx = (xyT A−1 )x = x yT A−1 x = λ0 x. Note that xyT A−1 and yT A−1 x, a
Ans: (a) Yes as they are solutions of the same characteristic polynomial x2 −tr(A)x+det(A).
DR
5. Let A, B ∈ Mn (R). Also, let (λ1 , u) and (λ2 , v) are eigen-pairs of A and B, respectively.
Proof. Let {v1 , . . . , vk } be linearly dependent. Then, there exists a smallest ` ∈ {1, . . . , k − 1}
and c 6= 0 such that v`+1 = c1 v1 + · · · + c` v` . So,
and
0 = (α`+1 − α1 ) c1 v1 + · · · + (α`+1 − α` ) c` v` .
So, v` ∈ LS(v1 , . . . , v`−1 ), a contradiction to the choice of `. Thus, the required result follows.
An immediate corollary of Theorem 6.3.3 and Theorem 6.3.4 is stated next without proof.
2. The converse of Theorem 6.3.4 is not true as In has n linearly independent eigenvectors
AF
k
S
So, to prove that Si is linearly independent, consider the linear system
i=1
in the variables cij ’s. Now, applying the matrix pj (A) and using Equation (6.3.3), we get
Y
(αj − αi ) cj1 uj1 + · · · + cjnj ujnj = 0.
i6=j
Q
But (αj − αi ) 6= 0 as αi ’s are distinct. Hence, cj1 uj1 + · · · + cjnj ujnj = 0. As Sj is a basis
i6=j
of Null(A − αj In ), we get cjt = 0, for 1 ≤ t ≤ nj . Thus, the required result follows.
176 CHAPTER 6. EIGENVALUES, EIGENVECTORS AND DIAGONALIZABILITY
2 1 1 1 1
Example 6.3.11. Let A = 1 2 1 . Then, 1, 0 and 2, 1
are the only
0 −1 1 −1 −1
eigen-pairs. Hence, by Theorem 6.3.3, A is not diagonalizable.
3. Let A be an n×n matrix with λ ∈ σ(A) with alg.mulλ (A) = m. If Rank[A−λI] 6= n−m
then prove that A is not diagonalizable.
4. Let A and B be two similar matrices such that A is diagonalizable. Prove that B is
diagonalizable.
5. If σ(A) = σ(B) and both A and B are diagonalizable then prove that A is similar to B.
Thus, they are two basis representation of the same linear transformation.
Ans: Let A = RD1 R−1 and B = QD2 Q−1 with σ(A) as the diagonal entries of D1 and
σ(B) as the diagonal entries of D2 . As σ(A) = σ(B), there exists a permutation matrix P
such that D1 = P D2 P −1 (P −1 = P T ). Verify that B = (RP Q−1 )−1 A(RP Q−1 ).
" #
A 0
6. Let A ∈ Mn (R) and B ∈ Mm (R). Suppose C = . Then, prove that C is diagonal-
0 B
izable if and only if both A and B are diagonalizable.
13. Let u, v ∈ Cn such that {u, v} is a linearly independent set. Define A = uvT + vuT .
(a) Then prove that A is a symmetric matrix.
(b) Then prove that dim(Ker(A)) = n − 2.
(c) Then 0 ∈ σ(A) and has multiplicity n − 2.
(d) Determine the other eigenvalues of A.
" #
vT
Ans: (a) AT = (uvT + vuT )T = vuT + uvT
= A. Also, A = [u, v] T .
u
⊥ ⊥
(b) Let w ∈ {u, v} . Then Aw = 0 and dim {u, v} = n − 2.
(c) Hence, 0 is an eigenvalue with multiplicity n − 2.
(d) As the eigenvalues of AB and BA " are same (except
# for the multiplicity of the
T
v u v vT
eigenvalue 0), consider the 2 × 2 matrix . The eigenvalue of this 2 × 2
uT u uT v
matrix gives the other eigenvalues.
T
AF
DR
Lemma 6.4.1. [Schur’s unitary triangularization (SUT)] Let A ∈ Mn (C). Then, there exists
a unitary matrix U such that A is similar to an upper triangular matrix. Further, if A ∈ Mn (R)
and σ(A) have real entries then U is a real orthogonal matrix.
Proof. We prove the result by induction on n. The result is clearly true for n = 1. So, let n > 1
and assume the result to be true for k < n and prove it for n.
Let (λ1 , x1 ) be an eigen-pair of A with kx1 k = 1. Now, extend it to form an orthonormal
basis {x1 , x2 , . . . , xn } of Cn and define X = [x1 , x2 , . . . , xn ]. Then, X is a unitary matrix and
x∗
1∗ " #
x2 λ1 ∗
∗ ∗
X AX = X [Ax1 , Ax2 , . . . , Axn ] = . [λ1 x1 , Ax2 , . . . , Axn ] = , (6.4.4)
.. 0 B
x∗n
where B ∈ Mn−1 (C). Now, by induction hypothesis there exists a unitary # U ∈ Mn−1 (C)
" matrix
such that U ∗ BU = T is an upper triangular matrix. Define U b = X 1 0 . As product of
0 U
6.4. SCHUR’S UNITARY TRIANGULARIZATION AND DIAGONALIZABILITY 179
Remark 6.4.2. Let A ∈ Mn (C). Then, by Schur’s Lemma there exists a unitary matrix U
such that U ∗ AU = T = [tij ], a triangular matrix. Thus,
Definition 6.4.3. Let A, B ∈ Mn (C). Then, A and B are said to be unitarily equiva-
lent/similar if there exists a unitary matrix U such that A = U ∗ BU .
T
AF
Remark 6.4.4. We know that if two matrices are unitarily equivalent then they are necessarily
similar as U ∗ = U −1 , for every unitary matrix U . But, similarity doesn’t imply unitary equiv-
DR
alence (see Exercise 6.4.6.5). In numerical calculations, unitary transformations are preferred
as compared to similarity transformations due to the following main reasons:
1. A is unitary implies kAxk = kxk. This need not be true under a similarity.
Exercise 6.4.6.
2. Consider
the following 6 matrices.
√ √ √
2 −1 3 2 2 1 3 2 2 0 3 2
√ √ √
M1 =
0 1 , M2 = 0 1
2 − 2, M3 = 1
1 2,
0 0 3 0 0 3 0 0 1
√
2 0 3 2 1 1 4 2 1 4
√
M4 = −1 1 − 2, M5 = 0 2
2 and M6 = 0
.
1 2
0 0 1 0 0 3 0 0 1
Now, use the exercises given below to conclude that the upper triangular matrix obtained
in the “Schur’s Lemma” need not be unique.
(a) Prove that M1 , M2 and M5 are unitarily equivalent.
(b) Prove that M3 , M4 and M6 are unitarily equivalent.
(c) Do the above results contradict Exercise 5.8.8.5c? Give reasons for your answer.
√ √
1 0 0 1/ 2 1/ 2 0
√ √
Ans: Let U = 0 −1 0 and V = 1/ 2 −1/ 2 0. Then U and V are unitary
0 0 1 0 0 1
and U M1 U −1 = M2 , V M 1 V −1 = M5 , U M 3 U −1 = M4 , V M3 V −1 = M6 . No, they do
|aij |2 , for all A = [aij ].
P
not contradict Exercise 5.8.8.5c as we need to look at
√
1 1 1 2 −1 2
3. Prove that A = 0 2 1 and B = 0 1 are unitarily equivalent.
0
T
AF
0 0 3 0 0 3
√ √
DR
1/ 2 1/ 2 0
√ √ −1 = N .
Ans: Let V = 1/ 2 −1/ 2 0. Then V M V
0 0 1
4. Let A ∈ Mn (C). Then, Prove that if x∗ Ax = 0, for all x ∈ Cn , then A = 0. Do these
results hold for arbitrary matrices?
Ans: Let x = ej . Then x∗ Ax = 0 ⇒ ajj = 0, for all j. Now, take x = e1 + e2 and
y = e1 + ie2 . Then 0 = x∗ Ax = aij + aji and 0 = y∗ Ay = aij − aji . Thus aij = 0.
" # " #
4 4 10 9
5. Show that the matrices A = and B = are similar. Is it possible to find
0 4 −4 −2
a unitary matrix U such that A = U ∗ BU ?
" #
3 2
Ans: Take S = . Then S −1 BS = A. There doesn’t exist an unitary matrix as the
−2 0
sum of the squares of the matrix entries are NOT equal.
We now use Lemma 6.4.1 to give another proof of Theorem 6.1.17.
n n
Corollary 6.4.7. Let A ∈ Mn (C). If α1 , . . . , αn ∈ σ(A) then det(A) =
Q P
αi and tr(A) = αi .
i=1 i=1
Proof. By Schur’s Lemma there exists a unitary matrix U such that U ∗ AU = T = [tij ], a
n
Q n
Q
triangular matrix. By Remark 6.4.2, σ(A) = σ(T ). Hence, det(A) = det(T ) = tii = αi
i=1 i=1
n n
and tr(A) = tr(A(UU∗ )) = tr(U∗ (AU)) = tr(T) =
P P
tii = αi .
i=1 i=1
6.4. SCHUR’S UNITARY TRIANGULARIZATION AND DIAGONALIZABILITY 181
We now use Schur’s unitary triangularization Lemma to state the main theorem of this sub-
section. Also, recall that A is said to be a normal matrix if AA∗ = A∗ A. Further, Hermitian,
skew-Hermitian and scalar multiples of Unitary matrices are examples of normal matrices.
Theorem 6.4.8. [Spectral Theorem for Normal Matrices] Let A ∈ Mn (C). If A is a normal
matrix then there exists a unitary matrix U such that U ∗ AU = diag(α1 , . . . , αn ).
Proof. By Schur’s Lemma there exists a unitary matrix U such that U ∗ AU = T = [tij ], a
triangular matrix. Since A is a normal
T ∗ T = (U ∗ AU )∗ (U ∗ AU ) = U ∗ A∗ AU = U ∗ AA∗ U = (U ∗ AU )(U ∗ AU )∗ = T T ∗ .
Thus, we see that T is an upper triangular matrix with T ∗ T = T T ∗ . Thus, by Exercise 1.3.13.8,
T is a diagonal matrix and this completes the proof.
We re-write Theorem 6.4.8 in another form to indicate that A can be decomposed into linear
combination of orthogonal projectors onto eigen-spaces. Thus, it is independent of the choice
of eigenvectors. This remark is also valid for Hermitian, skew-Hermitian and Unitary matrices.
(c) the columns of U form a set of orthonormal eigenvectors for A (use Theorem 6.3.3).
(d) A = A · In = A (u1 u∗1 + · · · + un u∗n ) = α1 u1 u∗1 + · · · + αn un u∗n .
Theorem 6.4.10. [Spectral Theorem for Hermitian Matrices] Let A ∈ Mn (C) be a Hermitian
matrix. Then Remark 6.4.9 holds. Further, all the eigenvalues of A are real.
Proof. The first part is immediate from Theorem 6.4.8 as Hermitian matrices are also normal
matrices. Let (α, x) be an eigen-pair. To show, α is a real number.
As A∗ = A and Ax = αx, we have x∗ A = x∗ A∗ = (Ax)∗ = (αx)∗ = αx∗ . Hence,
Corollary 6.4.11. Let A ∈ Mn (R) be symmetric. Then there exists an orthogonal matrix
P and real numbers α1 , . . . , αn such that A = P diag(α1 , . . . , αn )P T . Or equivalently, A is
diagonalizable using orthogonal matrix.
Exercise 6.4.12. 1. Let A be a normal matrix. If all the eigenvalues of A are 0 then prove
that A = 0. What happens if all the eigenvalues of A are 1?
−1 2
AF
4. Let σ(A) = {λ1 , . . . , λn }. Then, prove that the following statements are equivalent.
DR
(a) A is normal.
(b) A is unitarily diagonalizable.
|aij |2 = |λi |2 .
P P
(c)
i,j i
(d) A has n orthonormal eigenvectors.
Ans: In view of earlier results, we only prove c) ⇒ b). By Schur’ theorem, there exists
a unitary matrix U such that U ∗ AU = T is upper triangular. As U ∗ AU = T , we have
|aij |2 = |tij |2 = |tii |2 . So tij = 0, for all i < j.
P P P
i,j i,j i
(a) if det(A) = 1 then A is a rotation about a fixed axis, in the sense that A has an
eigen-pair (1, x) such that the restriction of A to the plane x⊥ is a two dimensional
rotation in x⊥ .
(b) if det A = −1 then A corresponds to a reflection across a plane P , followed by a
rotation about the line through origin that is orthogonal to P .
9. Let A be a normal matrix. Then, prove that Rank(A) equals the number of nonzero
eigenvalues of A.
10. [Equivalent characterizations of Hermitian matrices] Let A ∈ Mn (C). Then, the fol-
lowing statements are equivalent.
T
AF
holds true as a matrix identity. This is a celebrated theorem called the Cayley Hamilton
theorem. We give a proof using Schur’s unitary triangularization. To do so, we look at
multiplication of certain upper triangular matrices.
Lemma 6.4.13. Let A1 , . . . , An ∈ Mn (C) be upper triangular matrices such that the (i, i)-th
entry of Ai equals 0, for 1 ≤ i ≤ n. Then A1 A2 · · · An = 0.
B[:, i] = A1 [:, 1](A2 )1i + A1 [:, 2](A2 )2i + · · · + A1 [:, n](A2 )ni = 0 + · · · + 0 = 0.
B[:, i] = C[:, 1](An )1i + C[:, 2](An )2i + · · · + C[:, n](An )ni = 0 + · · · + 0 = 0.
AF
DR
Theorem 6.4.14. [Cayley Hamilton Theorem] Let A ∈ Mn (C). Then A satisfies its charac-
teristic equation, i.e., if PA (x) = det(xIn − A) = xn − an−1 xn−1 + · · · + (−1)n−1 a1 x + (−1)n a0
then
An − an−1 An−1 + · · · + (−1)n−1 a1 A + (−1)n a0 I = 0
Therefore,
n
Y n
Y h i
PA (A) = (A − αi I) = (U T U ∗ − αi U IU ∗ ) = U (T − α1 I) · · · (T − αn I) U ∗ = U 0U ∗ = 0.
i=1 i=1
1 0 0
DR
3
5. For A = 0 1 , PA (t) = (t − 1) . So PA (A) = 0. But, observe that q(A) = 0, where
1
0 0 1
q(t) = (t − 1)2 .
(a) Then, for any ` ∈ N, the division algorithm gives α0 , α1 , . . . , αn−1 ∈ C and a poly-
nomial f (x) with coefficients from C such that
iii. In the language of graph theory, it says the following: “Let G be a graph on n
vertices and A its adjacency matrix. Suppose there is no path of length n − 1 or
less from a vertex v to a vertex u in G. Then, G doesn’t have a path from v to u
of any length. That is, the graph G is disconnected and v and u are in different
components of G.”
186 CHAPTER 6. EIGENVALUES, EIGENVECTORS AND DIAGONALIZABILITY
1
A−1 = a1 I − a2 A + · · · + (−1)n−2 an−1 An−2 + (−1)n−1 An−1 .
a0
(c) The above also implies that if A is invertible then A−1 ∈ LS I, A, A2 , . . . . That is,
1.
Use the Cayley-Hamilton
theorem
to compute
the inverse of following matrices:
2 3 4 −1 −1 1 1 −2 −1
5
6 7, 1 −1 1 and −2 1 −1
.
1 1 2 0 1 1 0 −1 2
3. Let A, B ∈ Mn (C) be upper triangular matrices with the top leading principal submatrix of
A of size k being 0. If B[k + 1, k + 1] = 0 then prove that the leading principal submatrix
of size k + 1 of AB is 0.
" # " #! " #!
0 B x x
4. Let B ∈ Mm,n (C) and A = λ, is an eigen-pair ⇔ −λ,
T
. Then
BT 0 y −y
AF
is an eigen-pair.
DR
" #
B C
5. Let B, C ∈ Mn (R). Define A = . Then, prove the following:
−C B
" #
x
(a) if s is a real eigenvalue of A with corresponding eigenvector then s is also an
y
" #
−y
eigenvalue corresponding to the eigenvector .
x
" #
x + iy
(b) if s + it is a complex eigenvalue of A with corresponding eigenvector then
−y + ix
" #
x − iy
s − it is also an eigenvalue of A with corresponding eigenvector .
−y − ix
(c) (s + it, x + iy) is an eigen-pair of B+iC if and only if (s − it, x − iy) is an eigen-pair
of B − iC.
" #!
x + iy
(d) s + it, is an eigen-pair of A if and only if (s + it, x + iy) is an eigen-
−y + ix
pair of B + iC.
(e) det(A) = | det(B + iC)|2 .
The next section deals with quadratic forms which helps us in better understanding of conic
sections in analytic geometry.
6.5. QUADRATIC FORMS 187
Lemma 6.5.2. Let A ∈ Mn (C). Then A is Hermitian if and only if at least one of the following
statements hold:
1. S ∗ AS is Hermitian for all S ∈ Mn .
3. x∗ Ax ∈ R for all x ∈ Cn .
For the last part, note that x∗ Ax ∈ C. Thus x∗ Ax = (x∗ Ax)∗ = x∗ A∗ x = x∗ Ax, we get
AF
Remark 6.5.3. Let A ∈ Mn (R). Then the condition x∗ Ax ∈ R, for all x ∈ Cn , in Defini-
tion 6.5.8 implies AT = A, i.e., A is a symmetric matrix. But, when we study matrices over
R, we seldom consider vectors from Cn . So, in such cases, we assume A is symmetric.
" # " #
2 1 3 1+i
Example 6.5.4. 1. Let A = or A = . Then, A is positive definite.
1 2 1−i 4
" # "√ #
1 1 2 1+i
2. Let A = or A = √ . Then, A is positive semi-definite but not positive
1 1 1−i 2
definite.
" # " #
−2 1 −2 1 − i
3. Let A = or A = . Then, A is negative definite.
1 −2 1 + i −2
" # " #
−1 1 −2 1 − i
4. Let A = or A = . Then, A is negative semi-definite.
1 −1 1 + i −1
188 CHAPTER 6. EIGENVALUES, EIGENVECTORS AND DIAGONALIZABILITY
" # " #
0 1 1 1+i
5. Let A = or A = . Then, A is indefinite.
1 −1 1−i 1
Theorem 6.5.5. Let A ∈ Mn (C). Then, the following statements are equivalent.
1. A is positive semi-definite.
2. A∗ = A and each eigenvalue of A is non-negative.
3. A = B ∗ B for some B ∈ Mn (C).
Proof. 1 ⇒ 2: Let A be positive semi-definite. Then, by Lemma 6.5.2, A is Hermitian. If
(α, v) is an eigen-pair of A then αkvk2 = α(v∗ v) = v∗ (αv) = v∗ Av ≥ 0. So, α ≥ 0.
2 ⇒ 3: Let σ(A) = {α1 , . . . , αn }. Then, by spectral theorem, there exists a unitary
matrix U such that U ∗ AU = D with D = diag(α1 , . . . , αn ). As αi ≥ 0, for 1 ≤ i ≤ n, define
1 √ √ 1 1
D 2 = diag( α1 , . . . , αn ). Then, A = U D 2 [D 2 U ∗ ] = B ∗ B.
3 ⇒ 1: Let A = B ∗ B. Then, for x ∈ Cn , x∗ Ax = x∗ B ∗ Bx = kBxk2 ≥ 0. Thus, the
required result follows.
A similar argument gives the next result and hence the proof is omitted.
Theorem 6.5.6. Let A ∈ Mn (C). Then, the following statements are equivalent.
1. A is positive definite.
2. A∗ = A and each eigenvalue of A is positive.
3. A = B ∗ B for a non-singular matrix B ∈ Mn (C).
T
such that A = U DU ∗ . Now, for 1 ≤ i ≤ n, define αi = max{λi , 0} and βi = min{λi , 0}. Then
1. for D1 = diag(α1 , α2 , . . . , αn ), the matrix A1 = U D1 U ∗ is positive semi-definite.
2. for D2 = diag(β1 , β2 , . . . , βn ), the matrix A2 = U D2 U ∗ is positive semi-definite.
3. A = A1 − A2 . The matrix A1 is generally called the positive semi-definite part of A.
Definition 6.5.8. Let A = [aij ] ∈ Mn (C) be a Hermitian matrix and let x, y ∈ Cn . Then, a
sesquilinear form in x, y ∈ Cn is defined as H(x, y) = y∗ Ax. In particular, H(x, x), denoted
H(x), is called a Hermitian form. In case A ∈ Mn (R), H(x) is called a quadratic form.
Remark 6.5.9. Observe that
1. if A = In then the bilinear/sesquilinear form reduces to the standard inner product.
2. H(x, y) is ‘linear’ in the first component and ‘conjugate linear’ in the second component.
3. the quadratic form H(x) is a real number. Hence, for α ∈ R, the equation H(x) = α,
represents a conic in Rn .
Example 6.5.10. 1. Let A ∈ Mn (R). Then, f (x, y) = yT Ax, for x, y ∈ Rn , is a bilinear
form on Rn .
" # " #
1 2−i x
2. Let A = . Then, A∗ = A and for x = ∈ C2 , verify that
2+i 2 y
H(x) = x∗ Ax = |x|2 + 2|y|2 + 2Re ((2 − i)xy)
where ‘Re’ denotes the real part of a complex number, is a sesquilinear form.
6.5. QUADRATIC FORMS 189
The main idea of this section is to express H(x) as sum or difference of squares. Since H(x) is
a quadratic in x, replacing x by cx, for c ∈ C, just gives a multiplication factor by |c|2 . Hence,
one needs to study only the normalized vectors. Let us consider Example 6.1.2 again. There
we see that
(x + y)2 (x − y)2
xT Ax = 3 − = (x + 2y)2 − 3y 2 , and (6.5.1)
2 2
(x + 2y)2 (2x − y)2 2y 50y 2
xT Bx = 5 + 10 = (3x − )2 + . (6.5.2)
5 5 3 9
Note that both the expressions in Equation (6.5.1) is the difference of two non-negative terms.
Whereas, both the expressions in Equation (6.5.2) consists of sum of two non-negative terms.
Is the number of non-negative terms, appearing in the above expressions, just a coincidence?
For a better understanding, we define inertia of a Hermitian matrix.
Definition 6.5.11. Let A ∈ Mn (C) be a Hermitian matrix. The inertia of A, denoted i(A),
is the triplet (i+ (A), i− (A), i0 (A)), where i+ (A) is the number of positive eigenvalues of A,
i− (A) is the number of negative eigenvalues of A and i0 (A) is the nullity of A. The difference
i+ (A) − i− (A) is called the signature of A.
Exercise 6.5.12. Let A ∈ Mn (C) be a Hermitian matrix. If the signature and the rank of A
is known then prove that one can find out the inertia of A.
T
AF
To proceed with the earlier discussion, let A ∈ Mn (C) be Hermitian with eigenvalues
DR
Lemma 6.5.13. [Sylvester’s Law of Inertia] Let A ∈ Mn (C) be a Hermitian matrix and let
x ∈ Cn . Then, every Hermitian form H(x) = x∗ Ax, in n variables can be written as
where y1 , . . . , yr are linearly independent linear forms in the components of x and the integers
p and r satisfying 0 ≤ p ≤ r ≤ n, depend only on A.
Proof. Equation (6.5.3) implies that H(x) has the required form. We only need to show that
p and r are uniquely determined by A. Hence, let us assume on the contrary that there exist
p, q, r, s ∈ N with p > q such that
0
AF
(6.5.5), we have
DR
Remark 6.5.14. Since A is Hermitian, Rank(A) equals the number of nonzero eigenvalues.
Hence, Rank(A) = r. The number r is called the rank and the number r − 2p is called the
inertial degree of the Hermitian form H(x).
1. As rank and nullity do not change under similarity transformation, i0 (A) = i0 (DB ) = m
as i(B) = (k, l, m).
Similarly, X[:, k + 2]∗ AX[:, k + 2] = · · · = X[:, k + l]∗ AX[:, k + l] = −1. As the vectors
X[:, k + 1], . . . , X[:, k + l] are linearly independent, using 9.7.10, we see that A has at least
l negative eigenvalues.
3. Similarly, X[:, 1]∗ AX[:, 1] = · · · = X[:, k]∗ AX[:, k] = 1. As X[:, 1], . . . , X[:, k] are linearly
independent, using 9.7.10 again, we see that A has at least k positive eigenvalues.
form, to characterize conic sections in R2 , with respect to the standard inner product.
Proposition 6.5.18. Consider the quadratic f (x, y) = ax2 + 2hxy + by 2 + 2gx + 2f y + c, for
a, b, c, g, f, h ∈ R. If (a, b, h) 6= (0, 0, 0) then f (x, y) = 0 represents
orthogonal matrix, with (α1 , u1 ) and (α2 , u2 ) as eigen-pairs of A. As (a, b, h) 6= (0, 0, 0) at least
one of α1 , α2 6= 0. Also,
" # " # " #" #
h i α 0 x h i α 0 u
1 1
xT Ax = x, y U UT = u v = α1 u2 + α2 v 2 ,
0 α2 y 0 α2 v
" #
u
where = U T x. The lines u = 0, v = 0 are the two linearly independent linear forms, which
v
correspond to two perpendicular lines passing through the origin in the (x, y)-plane. In terms
of u, v, f (x, y) reduces to f (u, v) = α1 u2 + α2 v 2 + d1 u + d2 v + c, for some choice of d1 , d2 ∈ R.
We now look at different cases:
d2 2
f (u, v) = 0 ⇔ α2 v + = c1 − d1 u,
2α2
for some c1 ∈ R.
d2
(a) If d1 = 0, the quadratic corresponds to either the same line v + = 0, two parallel
2α2
lines or two imaginary lines, depending on whether c1 = 0, c1 α2 > 0 and c1 α2 < 0,
respectively.
(b) If d1 6= 0, the quadratic corresponds to a parabola of the form V 2 = 4aU , for some
T
translate U = u + α and V = v + β.
AF
α1 (u + d1 )2 α2 (v + d2 )2
− = 1.
d3 d3
α1 (u + d1 )2 α2 (v + d2 )2
+ = 1.
d3 d3
Thus, we have considered all the possible cases and the required result follows.
6.5. QUADRATIC FORMS 193
" # " #
u x
Remark 6.5.19. Observe that the linearly independent forms = UT are functions of
v y
the eigenvectors u1 and u2 . Further, the linearly independent forms together with the shifting
of the origin give us the principal axes of the corresponding conic.
2 2
2. Let H(x)
" # − 5y + 20xy be the associated
= 10x " quadratic
√ #!
form for "
a class of#!
√
curves. Then
10 10 2/ 5 1/ 5
A= and the eigen-pairs are 15, √ and −10, √ . So, for
10 −5 1/ 5 −2/ 5
(a) f (x, y) = 10x2 − 5y 2 + 20xy + 16x − 2y + 1, we have f (x, y) = 0 ⇔ 3(2x + y + 1)2 −
2(x − 2y − 1)2 = 0, a pair of perpendicular lines.
(b) f (x, y) = 10x2 − 5y 2 + 20xy + 16x − 2y + 19, we have
x − 2y − 1 2 2x + y + 1 2
f (x, y) = 0 ⇔ − √ = 1,
3 6
a hyperbola.
194 CHAPTER 6. EIGENVALUES, EIGENVECTORS AND DIAGONALIZABILITY
2x + y + 1 2 x − 2y − 1 2
f (x, y) = 0 ⇔ √ − = 1,
6 3
a hyperbola.
3. Let H(x) = 2 2
" # 6x + 9y + 4xy be the associated
" √quadratic
#! form"for a class
√ #!
of curves. Then,
6 2 1/ 5 2/ 5
A= , and the eigen-pairs are 10, √ and 5, √ . So, for
2 9 2/ 5 −1/ 5
T
x + 2y + 1 2 2x − y − 1 2
f (x, y) = 0 ⇔ + √ = 1,
5 5 2
an ellipse.
1. x2 + 2xy + y 2 + 6x + 10y = 3.
Ans: a parabola.
6.5. QUADRATIC FORMS 195
As a last application,
we consider
a
quadratic
in 3 variables,
namely x1 , x2 and x3 . To do so,
a h g x l y
1 1
let A = h b f , x = x2 , b = m and y = y2
with
g f c x3 n y3
f (x1 , x2 , x3 ) = xT Ax + 2bT x + q
= ax21 + bx22 + cx23 + 2hx1 x2 + 2gx1 x3 + 2f x2 x3
+2lx1 + 2mx2 + 2nx3 + q (6.5.6)
3. Depending on the values of αi ’s, rewrite g(y1 , y2 , y3 ) to determine the center and the
planes of symmetry of f (x1 , x2 , x3 ) = 0.
x+y+z 2 x−y 2 x + y − 2z 2
4 √ + √ + √ = −(4x + 2y + 4z + 2).
3 2 6
196 CHAPTER 6. EIGENVALUES, EIGENVECTORS AND DIAGONALIZABILITY
4(x + y + z) + 5 2 x−y+1 2 x + y − 2z − 1 2
√ √ √ 9
Or equivalently to 4 + + = 12 . So, the
4 3 2 6
principal axes of the quadric (an ellipsoid) are 4(x + y + z) = −5, x − y = 1 and x + y − 2z = 1.
y2 3x2 z2
Part 2 Here f (x, y, z) = 0 reduces to 10 − 10 − 10 = 1 which is the equation of a
hyperboloid consisting of two sheets with center 0 and the axes x, y and z as the principal axes.
3x2 y2 z2
Part 3 Here f (x, y, z) = 0 reduces to 10 − 10 + 10 = 1 which is the equation of a
hyperboloid consisting of one sheet with center 0 and the axes x, y and z as the principal axes.
Part 4 Here f (x, y, z) = 0 reduces to z = y 2 −3x2 +10 which is the equation of a hyperbolic
paraboloid.
Figure 6.5: Ellipsoid, hyperboloid of two sheets and one sheet, hyperbolic paraboloid
T
.
AF
DR
Chapter 7
k
AF
M
W −1 AW = Ti , where, Ti ∈ Mmi (C), for 1 ≤ i ≤ k
DR
i=1
and Ti ’s are upper triangular matrices with constant diagonal λi . If A has real entries with real
eigenvalues then W can be chosen to have real entries.
Proof. By Schur Upper Triangularization (see Lemma 6.4.1), there exists a unitary matrix U
such that U ∗ AU = T , an upper triangular matrix with diag(T ) = (λ1 , . . . , λ1 , . . . , λk , . . . , λk ).
Now, for any upper triangular matrix B, a real number α and i < j, consider the matrix
F (B, i, j, α) = Eij (−α)BEij (α), where the matrix Eij (α) is defined in Definition ??. Then, for
1 ≤ k, ` ≤ n,
Bij − αBjj + αBii , whenever k = i, ` = j
B − αB , whenever ` 6= j
i` j`
(F (B, i, j, α))k` = (7.1.1)
Bkj + αBki , whenever k 6= i
Bk` , otherwise.
Now, using Equation (7.1.1), the diagonal entries of F (T, i, j, α) and T are equal and
(
Tij , whenever Tjj = Tii
(F (T, i, j, α))ij = Tij
0, whenever Tjj 6= Tii and α = Tjj −Tii .
Thus, if we denote the matrix F (T, i, j, α) by T1 then (F (T1 , i − 1, j, α))i−1,j = 0, for some
choice of α, whenever (T1 )i−1,i−1 6= Tjj . Moreover, this operation also preserves the 0 created by
197
198 CHAPTER 7. JORDAN CANONICAL FORM
F (T, i, j, α) at (i, j)-th place. Similarly, F (T1 , i, j + 1, α) preserves the 0 created by F (T, i, j, α)
at (i, j)-th place. So, we can successively apply the following sequence of operations to get
where α, β, . . . , γ are appropriately chosen and Tm1 [:, m1 + 1] = λ2 em1 +1 . Thus, observe that
the above operation can be applied for different choices of i and j with i < j to get the required
result.
Practice 7.1.2. Apply Theorem 7.1.1 to the matrix given below for better understanding.
1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8
0 0 1 2 3 4 5 6 7
0 0 0 2 3 4 5 6 7
0 0 0 0 2 3 4 5 6.
0 0 0 0 0 2 3 4 5
0 0 0 0 0 0 3 4 5
0 0 0 0 0 0 0 3 4
0 0 0 0 0 0 0 0 3
Definition 7.1.3. 1. Let λ ∈ C and k be a positive integer. Then, by the Jordan block
T
λ 1
DR
.. ..
. .
.
λ 1
λ
2. A Jordan matrix is a direct sum of Jordan blocks. That is, if A is a Jordan matrix
having r blocks then there exist positive integers ki ’s and complex numbers λi ’s (not
necessarily distinct), for 1 ≤ i ≤ r such that
5. Observe that the number of Jordan matrices of size 4 with 0 on the diagonal are 5.
We now give some properties of the Jordan blocks. The proofs are immediate and hence left
for the reader. They will be used in the proof of subsequent results.
(b) Rank(Jk (λ) − αIk ) = k, whenever α 6= λ. Or equivalently, for all α 6= λ the matrix
DR
2. Let J be a Jordan matrix that contains ` Jordan blocks for λ. Then, prove that
(a) Rank(J − λI) = n − `.
(b) J has ` linearly independent eigenvectors for λ.
(c) Rank(J − λI) ≥ Rank((J − λI)2 ) ≥ Rank((J − λI)3 ) ≥ · · · .
3. Let A ∈ Mn (C). Then, prove that AJn (λ) = Jn (λ)A if and only if AJn (0) = Jn (0)A.
Definition 7.1.7. Let J be a Jordan matrix containing Jt (λ), for some positive integer t
and some complex number λ. Then, the smallest value of k for which Rank((J − λI)k ) stops
decreasing is the order of the largest Jordan block Jk (λ) in J. This number k is called the
index of the eigenvalue λ.
Lemma 7.1.8. Let A ∈ Mn (C) be strictly upper triangular. Then, A is similar to a direct sum
of Jordan blocks. Or equivalently, there exists integers n1 ≥ . . . ≥ nm ≥ 1 and a non-singular
matrix S such that
A = S −1 Jn1 (0) ⊕ · · · ⊕ Jnm (0) S.
Proof. We will prove the result by induction on n. For n = 1, the statement is trivial. So, let
the result be # matrices of size ≤ n − 1 and let A ∈ Mn (C) be strictly upper triangular.
" true for
0 a T
Then, A = . By induction hypothesis there exists an invertible matrix S1 such that
0 A1
m
S1−1
X
A1 = Jn1 (0) ⊕ · · · ⊕ Jnm (0) S1 with ni = n − 1.
i=1
Thus,
0 a T a T
1 2
" # " # " #" #" # " #
1 0 1 0 10 0 aT 1 0 0 a T S1
A = = = 0 Jn (0) 0 ,
0 S1−1 0 S1−1 0 A1 0 S1
1
0 S1 0 S −1 A1 S1
0 0 J
h i
where S1−1 Jn1 (0) ⊕ · · · ⊕ Jnm (0) S1 = Jn1 (0) ⊕ J and aT S1 = aT1 aT2 . Now, writing Jn1 to
mean Jn1 (0) and using Remark 7.1.5.4e, we have
1 −aT1 JnT1 0 0 aT1 aT2 1 aT1 JnT1 0 0 ha1 , e1 ieT1 aT2
= 0 .
0 In1 0
0 Jn1 0 0 In1 0 Jn1 0
0 0 I 0 0 J 0 0 I 0 0 J
first case, A is similar to T
0 Jn1 0 . This in turn is similar to 0 0 a2 by permuting
AF
0 0 J 0 0 J
DR
the first row and column. At this stage, one can apply induction and if necessary do a block
permutation, in order to keep the block sizes in decreasing order.
So, let us now assume that ha1 , e1 i =
6 0. Then, writing α = ha1 , e1 i, we have
1 T aT T T
0 0 0 αe 1 2 α 0 0 0 e1 a 2
" #
α Jn1 +1 e a
1 2
T
0 I 0 = 0 Jn1 0 ≡ .
0 I 0 0 Jn 0
1
1
0 J
0 0 αI 0 0 J 0 0 αI 0 0 J
a Jordan matrix with 0 on the diagonal and the size of the Jordan blocks decreases as we move
down the diagonal. So, Si−1 Ti Si = J(λi ) is a Jordan matrix with λi on the diagonal and the
size of the Jordan blocks
k decreases
as we move down the diagonal.
Si . Then, verify that W −1 AW is a Jordan matrix.
L
Now, take W = S
i=1
Let A ∈ Mn (C). Suppose λ ∈ σ(A) and J is a Jordan matrix that is similar to A. Then, for
each fixed i, 1 ≤ i ≤ n, by `i (λ), we denote the number of Jordan blocks Jk (λ) in J for which
k ≥ i. Then, the next result uses Exercise 7.1.6 to determine the number `i (λ).
Remark 7.1.11. Let A ∈ Mn (C). Suppose λ ∈ σ(A) and J is a Jordan matrix that is similar
to A. Then, for 1 ≤ k ≤ n,
Proof. In view of Exercise 7.1.6, we need to consider only the Jordan blocks Jk (λ), for different
AF
n
L
values of k. Hence, without loss of generality, let us assume that J = ai Ji (λ), where ai ’s are
DR
i=1
non-negative integers and J contains exactly ai copies of the Jordan block Ji (λ), for 1 ≤ i ≤ n.
Then, by definition and Exercise 7.1.6, we observe the following:
P
1. n = iai .
i≥1
P
2. Rank(J − λI) = (i − 1)ai .
i≥2
3. Rank((J − λI)2 ) =
P
(i − 2)ai .
i≥3
X X X
`2 = ai = (i − 1)ai − (i − 2)ai = Rank(J − λI) − Rank((J − λI)2 ),
i≥2 i≥2 i≥3
..
.
X X X
`k = ai = (i − (k − 1))ai − (i − k)ai = Rank((J − λI)k−1 ) − Rank((J − λI)k ).
i≥k i≥k i≥k+1
Now, the required result follows as rank is invariant under similarity operation and the matrices
J and A are similar.
202 CHAPTER 7. JORDAN CANONICAL FORM
Lemma 7.1.12. [Similar Jordan matrices] Let J and J 0 be two similar Jordan matrices of
size n. Then, J is a block permutation of J 0 .
Proof. For 1 ≤ i ≤ n, let `i and `0i be, respectively, the number of Jordan blocks of J and J 0
of size at least i corresponding to λ. Since J and J 0 are similar, the matrices (J − λI)i and
(J 0 − λI)i are similar for all i, 1 ≤ i ≤ n. Therefore, their ranks are equal for all i ≥ 1 and
hence, `i = `0i for all i ≥ 1. Thus the required result follows.
We now state the main result of this section which directly follows from Lemma 6.4.1,
Theorem 7.1.1 and Corollary 7.1.10 and hence the proof is omitted.
Theorem 7.1.13. [Jordan canonical form theorem] Let A ∈ Mn (C). Then, A is similar to
a Jordan matrix J, which is unique up to permutation of Jordan blocks. If A ∈ Mn (R) and has
real eigenvalues then the similarity transformation matrix S may be chosen to have real entries.
This matrix J is called the the Jordan canonical form of A, denoted Jordan CF(A).
Example 7.1.14. Let us use the idea from Lemma 7.1.11 to find the Jordan Canonical Form
of the following matrices.
0 0 1 0
0 0 0 1
1. Let A = J4 (0)2 = .
T
0 0 0 0
AF
0 0 0 0
DR
Solution: Note that `1 = 4 − Rank(A − 0I) = 2. So, there are two Jordan blocks.
Also, `2 = Rank(A − 0I) − Rank((A − 0I)2 ) = 2. So, there are at least 2 Jordan blocks of
size 2. As there are exactly two Jordan blocks, both the blocks must have size 2. Hence,
Jordan CF(A) = J2 (0) ⊕ J2 (0).
1 1 0 1
0 1 1 1
2. Let A1 =
.
0 0 1 1
0 0 0 1
Solution: Let B = A1 − I. Then, `1 = 4 − Rank(B) = 1. So, B has exactly one Jordan
block and hence A1 is similar to J4 (1).
1 1 0 1
0 1 1 1
3. A2 = 0
.
0 1 0
0 0 0 1
Solution: Let C = A2 − I. Then, `1 = 4 − Rank(C) = 2. So, C has exactly two Jordan
blocks. Also, `2 = Rank(C) − Rank(C 2 ) = 1 and `3 = Rank(C 2 ) − Rank(C 3 ) = 1. So, there
is at least 1 Jordan blocks of size 3.
Thus, we see that there are two Jordan blocks and one of them is of size 3. Also, the size
of the matrix is 4. Thus, A2 is similar to J3 (1) ⊕ J1 (1).
7.1. JORDAN CANONICAL FORM THEOREM 203
i=1
k
AF
L
T = Ti into the required Jordan matrix.
i=1
DR
n
2. Let A ∈ Mn (C) be a diagonalizable matrix. Then, by definition, A is similar to
L
λi ,
i=1
n
L
where λi ∈ σ(A), for 1 ≤ i ≤ n. Thus, Jordan CF(A) = λi , up to a permutation of
i=1
λi ’s.
3. In general, the computation
" # of Jordan CF(A) is not numerically stable. To understand
0
this, let A = . Then, A is diagonalizable as A has distinct eigenvalues. So,
1 0
" #
0
Jordan CF(A ) = .
0 0
" # " #
0 0 0 1
Whereas, for A = , we know that Jordan CF(A) = 6= lim Jordan CF(A ).
1 0 0 0 →0
Thus, a small change in the entries of A may change Jordan CF(A) significantly.
4. Let A ∈ Mn (C) and > 0 be given. Then, there exists an invertible matrix S such
k
that S −1 AS =
L
Jni (λi , ), where Jni (λi , ) is obtained from Jni (λi ) by replacing each
i=1
off diagonal entry 1 by an . To get this, define Di() = diag(1, , 2 , . . . , ni −1 ), for
k
(Di())−1 Jni (λi )Di() .
L
1 ≤ i ≤ k. Now compute
i=1
5. Let Jordan CF(A) contain ` Jordan blocks for λ. Then, A has ` linearly independent
eigenvectors for λ.
204 CHAPTER 7. JORDAN CANONICAL FORM
For if, A has at least ` + 1 linearly independent eigenvectors for λ, then dim(Null(A −
λI)) > `. So, Rank(A − λI) < n − `. But, the number of Jordan blocks for λ in A is `.
Thus, we must have Rank(J − λI) = n − `, a contradiction.
6. Let λ ∈ σ(A). Then, by Remark 7.1.5.5, Geo.Mulλ (A) = the number of Jordan blocks
Jk (λ) in Jordan CF(A).
7. Let λ ∈ σ(A). Then, by Remark 7.1.5.3, Alg.Mulλ (A) = the sum of the sizes of all
Jordan blocks Jk (λ) in Jordan CF(A).
8. Let λ ∈ σ(A). Then, Jordan CF(A) does not get determined
by Alg.Mulλ (A) and
" # 0 1 0 " # " # " #
h i 0 1 0 1 0 1 0 1
Geo.Mulλ (A). For example, 0 ⊕ ⊕0 0 1 and 0 0 ⊕ 0 0 ⊕ 0 0
0 0
0 0 0
are different Jordan CFs but they have the same algebraic and geometric multiplicities.
9. Let A ∈ Mn (C). Suppose that, for each λ ∈ σ(A), the values of Rank(A − λI)k , for
k = 1, . . . , n are known. Then, using Remark 7.1.11, Jordan
h i CF(A) can be computed.
But, note here that finding rank is numerically unstable as has rank 1 but it converges
h i
to 0 which has a different rank.
1
(i, j)-th entry of A goes to (n − i + 1, n − j + 1)-th position in KAK. Hence,
hM i hM i hM i hM iT
Kni Jni (λi ) Kni = Jni (λi ) .
Definition 7.2.1. Let P (t) = tn + an−1 tn−1 + · · · + a0 be a monic polynomial in t of degree
0 0 0 · · · 0 −a0
T
1 0 0 ··· 0 −a1
AF
0 1 0 ··· 0 −a2
DR
Remark 7.2.2. Let A ∈ Mn (C) and let f (x) = xn +an−1 xn−1 +· · ·+a1 x+a0 be its characteristic
polynomial. Then by the Cayley Hamilton Theorem, An + an−1 An−1 + · · · + a1 A + a0 I = 0.
Hence An = −(an−1 An−1 + · · · + a1 A + a0 I).
Suppose, there exists a vector u ∈ Cn such that B = u, Au, A2 u, . . . , An−1 u is an ordered
Definition 7.2.3. Let A ∈ Mn (C). Then, the polynomial P (t) is said to annihilate (destroy)
A if P (A) = 0.
Definition 7.2.4. Let A ∈ Mn (C). Then, the minimal polynomial of A, denoted mA (x), is
a monic polynomial of least positive degree satisfying mA (A) = 0.
Theorem 7.2.5. Let A be the companion matrix of the monic polynomial P (t) = tn +an−1 tn−1 +
· · · + a0 . Then, P (t) is both the characteristic and the minimal polynomial of A.
We will now show that P (t) is the minimal polynomial of A. To do so, we first observe that
DR
Now, Suppose we have a monic polynomial Q(t) = tm + bm−1 tm−1 + · · · + b0 , with m < n,
such that Q(A) = 0. Then, using Equation (7.2.1), we get
Lemma 7.2.6. [Existence of the minimal polynomial] Let A ∈ Mn (C). Then, there exists a
unique monic polynomial m(x) of minimum (positive) degree such that m(A) = 0. Further, if
f (x) is any polynomial with f (A) = 0 then m(x) divides f (x).
Proof. Let P (x) be the characteristic polynomial of A. Then, deg(P (x)) = n and by the
Cayley-Hamilton Theorem, P (A) = 0. So, consider the set
we get r(A) = 0. But, m(x) was the least degree polynomial with m(A) = 0 and hence r(x) is
the zero polynomial. That is, m(x) divides f (x).
As an immediate corollary, we have the following result.
Corollary 7.2.7. [Minimal polynomial divides the characteristic polynomial] Let mA (x)
and PA (x) be, respectively, the minimal and the characteristic polynomials of A ∈ Mn (C).
1. Then, mA (x) divides PA (x).
2. Further, if λ is an eigenvalue of A then mA (λ) = 0.
Proof. The first part following directly from Lemma 7.2.6. For the second part, let (λ, x) be an
T
mA (λ)x = mA (A)x = 0x = 0.
DR
Lemma 7.2.8. Let A and B be two similar matrices. Then, they have the same minimal
polynomial.
Proof. Since A and B are similar, there exists an invertible matrix S such that A = S −1 BS.
Hence, f (A) = F (S −1 BS) = S −1 f (B)S, for any polynomial f . Hence, mA (A) = 0 if and only
if mA (B) = 0 and thus the required result follows.
k
(x−λi )αi , for some αi ’s with 1 ≤ αi ≤ Alg.Mulλi (A).
Q
Proof. Using 7.2.7, we see that mA (x) =
i=1
k
(J − λi I)αi = 0. But, observe that
Q
As mA (A) = 0, using Lemma 7.2.8 we have mA (J) =
i=1
for the Jordan block Jni (λi ), one has
1. (Jni (λi ) − λi I)αi = 0 if and only if αi ≥ ni , and
208 CHAPTER 7. JORDAN CANONICAL FORM
Theorem 7.2.10. Let A ∈ Mn (C) and let λ1 , . . . , λk be the distinct eigenvalues of A. If the
k
(x − λi )ni then ni is the size of the largest Jordan block for
Q
minimal polynomial of A equals
i=1
λi in J = Jordan CFA.
Theorem 7.2.11. Let A ∈ Mn (C). Then, the following statements are equivalent.
1. A is diagonalizable.
2. Every zero of mA (x) has multiplicity 1.
d
3. Whenever mA (α) = 0, for some α, then mA (x)x=α 6= 0.
dx
Proof. Part 1 ⇒ Part 2. If A is diagonalizable, then each Jordan block in J = Jordan CFA
T
k
Q
has size 1. Hence, by Theorem 7.2.9, mA (x) = (x−λi ), where λi ’s are the distinct eigenvalues
AF
i=1
of A.
DR
k
Q
Part 2 ⇒ Part 3. Let mA (x) = (x − λi ), where λi ’s are the distinct eigenvalues of A.
i=1
Then, mA (x) = 0 if and only if x = some i, 1 ≤ i ≤ k. In that case, it is easy to verify
λi , for
d
that mA (x) 6= 0, for each λi .
dx
d
Part 3 ⇒ Part 1. Suppose that for each α satisfying mA (α) = 0, one has mA (α) 6= 0.
dx
Then, it follows that each zero of mA (x) has multiplicity 1. Also, using Corollary 7.2.7, each
zero of mA (x) is an eigenvalue of A and hence by Theorem 7.2.9, the size of each Jordan block
is 1. Thus, A is diagonalizable.
We now have the following remarks and observations.
Remark 7.2.12. 1. Let f (x) be a monic polynomial and A = Companion(f ) be the com-
panion matrix of f . Then, by Theorem 7.2.5) f (A) = 0 and no monic polynomial of
smaller degree annihilates A. Thus PA (x) = mA (x) = f (x), where PA (x) is the charac-
teristic polynomial and mA (x), the minimal polynomial of A.
2. Let A ∈ Mn (C). Then, A is similar to Companion(f ), for some monic polynomial f if
and only if mA (x) = f (x).
Proof. Let B = Companion (f ). Then, using Lemma 7.2.8, we see that mA (x) = mB (x).
But, by Remark 7.2.12.1, we get mB (x) = f (x) and hence the required result follows.
Conversely, assume that mA (x) = f (x). But, by Remark 7.2.12.1, mB (x) = f (x) =
PB (x), the characteristic polynomial of B. Since mA (x) = mB (x), the matrices A and
7.2. MINIMAL POLYNOMIAL 209
B have the same largest Jordan blocks for each eigenvalue λ. As PB = mB , we know
that for each λ, there is only one Jordan block in Jordan CFB. Thus, Jordan CFA =
Jordan CFB and hence A is similar to Companion (f ).
Ans: Yes.
AF
6. Let A ∈ Mn (C) be idempotent and let J = Jordan CFA. Thus, J 2 = J and hence con-
DR
clude that J must be a diagonal matrix. Hence, every idempotent matrix is diagonalizable.
Ans: J 2 = J implies that the minimal polynomial is x(x − 1), a product of distinct linear
factors. So, by Theorem 7.2.11, A is diagonalizable.
7. Let A ∈ Mn (C). Suppose that mA (x)|x(x − 1)(x − 2)(x − 3). Must A be diagonalizable?
Ans: Yes. As, each zero of mA (x) has multiplicity 1, using Theorem 7.2.11, we see that A
is diagonalizable.
8. Let A ∈ M9 (C) be a nilpotent matrix such that A5 6= 0 but A6 = 0. Determine PA (x) and
mA (x).
Ans: As A is a 9 × 9 nilpotent matrix, PA (x) = x9 . And, using the given condition, we get
mA (x) = x6 .
9. Recall that for A, B ∈ Mn (C), the characteristic polynomial of AB and BA are the same.
That is, P"
AB (x)#= PBA (x)." However,
# they need not have the same minimal polynomial.
0 1 0 0
Take A = and B = to verify that mAB (x) 6= mBA (x).
0 0 0 1
10. Let A ∈ Mn (C) be an invertible matrix. Then prove that if the minimal polynomial of A
equals m(x, λ1 , . . . , λk ) then the minimal polynomial of A−1 equals m(x, 1/λ1 , . . . , 1/λk ).
11. Let λ an eigenvalue of A ∈ Mn (C) with two linearly independent eigenvectors. Show that
there does not exist a vector u ∈ Cn such that LS u, Au, A2 u, . . . = Cn .
210 CHAPTER 7. JORDAN CANONICAL FORM
Ans: If such a vector u exists then by Remark 7.2.2 the characteristic polynomial of A equals
its minimal polynomial. This is NOT true as λ has two linearly independent eigenvectors and
hence by Remark 7.1.15(6) has at least Jordan blocks corresponding to the eigenvalue λ.
We end this section with a method to compute the minimal polynomial of a given matrix.
Example 7.2.14. [Computing the minimal polynomial] Let λ1 , . . . , λk be the distinct eigen-
values of A ∈ Mn (C).
Ans: One can use Remark 7.1.11 to first compute Jordan CFA and then compute mA (t)
using Theorem 7.2.9.
Alternate: We can use Gram-Schmidt orthogonalization process to directly compute the min-
2
To do that consider the vector space C . Let us represent the matrix A by
imal polynomial. n
a11
. n2 n2
. ∈ C . Then, it can easily be shown that the map η : Mn (C) → C is a vector
η(A) = .
ann
space isomorphism.
Now, use the Cayley-Hamilton theorem to deduce that the set of vectors {η(I), η(A), . . . , η(An )}
is linearly dependent. Hence, there is a smallest k such that {η(I), η(A), . . . , η(Ak )} is a linearly
independent set whereas the set {η(I), η(A), . . . , η(Ak+1 )} is linearly dependent.
This k can be found by applying Gram-Schmidt orthogonalization process (or the idea of QR-
T
decomposition) on the set {η(I), η(A), . . . , η(An )}. In particular, the Gram-Schmidt process will
AF
using S −1 A = JS −1 . verify that the initial problem x0 (t) = Ax(t) is equivalent to the equation
S −1 x0 (t) = S −1 Ax(t) which in turn is equivalent to y0 (t) = Jy(t), where S −1 x(t) = y(t) with
y(0) = S −1 x(0) = 0. Therefore, if y is a solution to the second equation then x(t) = Sy is a
solution to the initial problem.
When J is diagonalizable then solving the second is as easy as solving yi0 (t) = λi yi (t) for
which the required solution is given by yi (t) = yi (0)eλi t .
If J is not diagonal, then for each Jordan block, the system reduces to
This problem can also be solved as in this case the solution is given by yk = c0 eλt ; yk−1 =
(c0 t + c1 )eλt and so on.
Let P (x) be a polynomial and A ∈ Mn (C). Then, P (A)A = AP (A). What about the converse?
That is, suppose we are given that AB = BA for some B ∈ Mn (C). Does it necessarily imply
that B = P (A), for some nonzero polynomial P (x)? The answer is No as I commutes with A
for every A. We start with a set of remarks.
Theorem 7.3.1. Let A ∈ Mn (C) and B ∈ Mm (C). Then, the linear system AX − XB = 0, in
T
AF
the variable matrix X of size n × m, has a unique solution, namely X = 0 (the trivial solution),
if and only if σ(A) and σ(B) are disjoint.
DR
Thus, we see that if λ is a common eigenvalue of A and B then the system AX − XB = 0 has
a nonzero solution X0 , a contradiction. Hence, the required result follows.
Corollary 7.3.2. Let A ∈ Mn (C), B ∈ Mm (C) and C be an n × m matrix. Also, assume that
σ(A) and σ(B) are disjoint. Then, it can be easily verified that the system AX − XB = C, in
the variable matrix X of size n × m, has a unique solution, for any given C.
Proof. Consider the linear transformation T : Mn,m (C) → Mn,m (C), defined by T (X) =
AX − XB. Then, by Theorem 7.3.1, Null(T ) = {0}. Hence, by the rank-nullity theorem, T
is a bijection and the required result follows.
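A small Python sketch of this linear map T (not from the text; numpy is assumed and the function name is mine): using vec(AX − XB) = (I_m ⊗ A − B^T ⊗ I_n)vec(X), the system AX − XB = C can be solved directly when σ(A) and σ(B) are disjoint.

import numpy as np

def solve_sylvester(A, B, C):
    """Solve AX - XB = C by vectorization; vec stacks columns of X."""
    n, m = A.shape[0], B.shape[0]
    K = np.kron(np.eye(m), A) - np.kron(B.T, np.eye(n))
    x = np.linalg.solve(K, C.reshape(-1, order="F"))
    return x.reshape(n, m, order="F")

A = np.array([[2.0, 1.0], [0.0, 3.0]])      # sigma(A) = {2, 3}
B = np.array([[-1.0, 0.0], [4.0, -2.0]])    # sigma(B) = {-1, -2}, disjoint from sigma(A)
C = np.array([[1.0, 0.0], [2.0, -1.0]])
X = solve_sylvester(A, B, C)
print(np.allclose(A @ X - X @ B, C))        # True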
A matrix whose entries are constant along each diagonal is called a Toeplitz type matrix. For example, the matrix B = \begin{bmatrix} b_1 & b_2 & b_3 & b_4\\ 0 & b_1 & b_2 & b_3\\ 0 & 0 & b_1 & b_2\\ 0 & 0 & 0 & b_1\end{bmatrix} is an upper triangular Toeplitz type matrix.
Exercise 7.3.4. Let Jn(0) ∈ Mn(C) be the Jordan block with 0 on the diagonal.
1. If A ∈ Mn(C) is such that AJn(0) = Jn(0)A, then prove that A is an upper Toeplitz type matrix.
2. Further, if A, B ∈ Mn(C) are two upper Toeplitz type matrices, then prove that
(a) there exist a_i ∈ C, 0 ≤ i ≤ n − 1, such that A = a_0I + a_1Jn(0) + ··· + a_{n−1}Jn(0)^{n−1}.
To proceed further, recall that a matrix A ∈ Mn (C) is called non-derogatory if Geo.Mulα (A) =
1, for each α ∈ σ(A) (see Definition 6.3.9).
Theorem 7.3.5. Let A ∈ Mn (C) be a non-derogatory matrix. Then, the matrices A and B
commute if and only if B = P (A), for some polynomial P (t) of degree at most n − 1.
Proof. If B = P (A), for some polynomial P (t), then A and B commute. Conversely, suppose
that AB = BA, σ(A) = {λ_1, ..., λ_k} and let J = Jordan CF_A = S^{-1}AS be the Jordan matrix of A. Then, J = \begin{bmatrix} J_{n_1}(λ_1) & & \\ & \ddots & \\ & & J_{n_k}(λ_k)\end{bmatrix}. Now, write \overline{B} = S^{-1}BS = \begin{bmatrix} B^{11} & \cdots & B^{1k}\\ \vdots & \ddots & \vdots\\ B^{k1} & \cdots & B^{kk}\end{bmatrix}, where \overline{B} is partitioned conformally with J. Note that AB = BA gives J\overline{B} = \overline{B}J. Thus, verify that
F_i(J_{n_i}(λ_i))^{-1}B^{ii} = c_1I + c_2J_{n_i}(0) + ··· + c_{n_i}J_{n_i}(0)^{n_i−1} = R_i(J_{n_i}(λ_i)), (say).
Thus, B^{ii} = F_i(J_{n_i}(λ_i))R_i(J_{n_i}(λ_i)). Putting P_i(t) = F_i(t)R_i(t), for 1 ≤ i ≤ k, we see that P_i(t) is a polynomial of degree at most n − 1 with P_i(J_{n_j}(λ_j)) = 0, for j ≠ i, and P_i(J_{n_i}(λ_i)) = B^{ii}. Taking P = P_1 + ··· + P_k, we have
P(J) = P_1\begin{bmatrix} J_{n_1}(λ_1) & & \\ & \ddots & \\ & & J_{n_k}(λ_k)\end{bmatrix} + ··· + P_k\begin{bmatrix} J_{n_1}(λ_1) & & \\ & \ddots & \\ & & J_{n_k}(λ_k)\end{bmatrix}
= \begin{bmatrix} B^{11} & & \\ & \ddots & \\ & & 0\end{bmatrix} + ··· + \begin{bmatrix} 0 & & \\ & \ddots & \\ & & B^{kk}\end{bmatrix} = \overline{B}.
Hence, B = S\overline{B}S^{-1} = SP(J)S^{-1} = P(SJS^{-1}) = P(A) and the required result follows.
Chapter 8
Advanced Topics on Diagonalizability and Triangularization∗
We start this section with a few definitions and examples. So, it will be nice to recall the notations used in Section 1.5 and a few results from Appendix 9.2.
1. Also, let S ⊆ [n]. Then, det(A[S, S]) is called the principal minor of A corresponding to S.
Example 8.1.3. Let A = \begin{bmatrix} 1 & 2 & 3 & 4\\ 5 & 6 & 7 & 8\\ 9 & 8 & 7 & 6\\ 5 & 4 & 3 & 2\end{bmatrix}. Then, note that
1. EM_1(A) = 1 + 6 + 7 + 2 = 16 and EM_2(A) = det A({1,2},{1,2}) + det A({1,3},{1,3}) + det A({1,4},{1,4}) + det A({2,3},{2,3}) + det A({2,4},{2,4}) + det A({3,4},{3,4}) = −80.
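A short Python check of these sums of principal minors (my own sketch, with the illustrative name EM; numpy and itertools are assumed):

import itertools
import numpy as np

def EM(A, k):
    """Sum of all k x k principal minors of A."""
    n = A.shape[0]
    return sum(np.linalg.det(A[np.ix_(S, S)])
               for S in itertools.combinations(range(n), k))

A = np.array([[1.0, 2, 3, 4], [5, 6, 7, 8], [9, 8, 7, 6], [5, 4, 3, 2]])
print(round(EM(A, 1)), round(EM(A, 2)))   # 16 -80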
1. the coefficient of t^{n−k} in P_A(t) = \prod_{i=1}^{n}(t − λ_i), the characteristic polynomial of A, is
\sum_{i_1 < \cdots < i_k} (−1)^k λ_{i_1} \cdots λ_{i_k} = (−1)^k S_k(λ_1, ..., λ_n).  (8.1.1)
For all i ∈ S, consider all permutations σ such that σ(i) = i. Our idea is to select a ‘t’ from these b_{iσ(i)}. Since we do not want any more ‘t’, we set t = 0 for any other diagonal position. So the contribution from S to the coefficient of t^{n−k} is det[−A(S|S)] = (−1)^k det A(S|S). Hence the coefficient of t^{n−k} in P_A(t) is
\sum_{S⊆[n], |S|=n−k} (−1)^k \det A(S|S) = \sum_{T⊆[n], |T|=k} (−1)^k \det A[T, T] = (−1)^k EM_k(A).
Let A and B be similar matrices. Then, by Theorem 6.2.3, we know that σ(A) = σ(B). Thus, Part 2 of Theorem 8.1.4 directly gives the following result.
Corollary 8.1.6. Let A and B be two similar matrices of order n. Then, EMk (A) = EMk (B)
for 1 ≤ k ≤ n.
So, the sums of principal minors of similar matrices are equal. In other words, the sums of principal minors are invariant under similarity.
Proof. For 1 ≤ i ≤ n, let us denote A(i|i) by A_i. Then, using Equation (8.1.3), we have
\sum_{i=1}^{n} P_{A_i}(t) = \sum_i t^{n−1} − \sum_i EM_1(A_i)t^{n−2} + ··· + (−1)^{n−1}\sum_i EM_{n−1}(A_i)
= nt^{n−1} − (n−1)EM_1(A)t^{n−2} + (n−2)EM_2(A)t^{n−3} − ··· + (−1)^{n−1}EM_{n−1}(A)
= P_A'(t).
Proof. As Alg.Mul_λ(A) = 1, P_A(t) = (t − λ)q(t), where q(t) is a polynomial with q(λ) ≠ 0. Thus P_A'(t) = q(t) + (t − λ)q'(t). Hence, P_A'(λ) = q(λ) ≠ 0. Thus, by Corollary 8.1.7, \sum_i P_{A(i|i)}(λ) = P_A'(λ) ≠ 0. Hence, there exists i, 1 ≤ i ≤ n, such that P_{A(i|i)}(λ) ≠ 0. That is, det[A(i|i) − λI] ≠ 0 or Rank[A − λI] = n − 1.
" #
0 1
Remark 8.1.9. Converse of Corollary 8.1.8 is false. Note that for the matrix A = ,
0 0
Rank[A − 0I] = 1 = 2 − 1 = n − 1, but 0 has multiplicity 2 as a root of PA (t) = 0.
1. Geo.Mulλ (A) ≥ k.
2. If B is a principal sub-matrix of A of size m > n − k then λ ∈ σ(B).
3. Alg.Mulλ (A) ≥ k.
Proof. Part 1 ⇒ Part 2. Let {x_1, ..., x_k} be linearly independent eigenvectors for λ and let B be a principal sub-matrix of A of size m > n − k. Without loss of generality, we may write A = \begin{bmatrix} B & ∗\\ ∗ & ∗\end{bmatrix}. Let us partition the x_i's, say x_i = \begin{bmatrix} x_{i1}\\ x_{i2}\end{bmatrix}, such that
\begin{bmatrix} B & ∗\\ ∗ & ∗\end{bmatrix}\begin{bmatrix} x_{i1}\\ x_{i2}\end{bmatrix} = λ\begin{bmatrix} x_{i1}\\ x_{i2}\end{bmatrix}, for 1 ≤ i ≤ k.
As m > n − k, the size of x_{i2} is less than k. Thus, the set {x_{12}, ..., x_{k2}} is linearly dependent (see Corollary 3.3.9). So, there is a nonzero linear combination y = \begin{bmatrix} y_1\\ y_2\end{bmatrix} of x_1, ..., x_k such that y_2 = 0. Notice that y_1 ≠ 0 and By_1 = λy_1.
Part 2 ⇒ Part 3. By Corollary 8.1.7, we know that P_A'(t) = \sum_{i=1}^{n} P_{A(i|i)}(t). As A(i|i) is of size n − 1 > n − k, we get P_{A(i|i)}(λ) = 0, for all i = 1, 2, ..., n. Thus, P_A'(λ) = 0. A similar argument now applied to each of the A(i|i)'s gives P_A^{(2)}(λ) = 0, where P_A^{(2)}(t) = \frac{d}{dt}P_A'(t). Proceeding along the above lines, we finally get P_A^{(i)}(λ) = 0, for i = 0, 1, ..., k − 1. This implies that Alg.Mul_λ(A) ≥ k.
Theorem 8.1.12. [Newton's identities] Let P(t) = t^n + a_{n−1}t^{n−1} + ··· + a_0 have zeros λ_1, ..., λ_n, counted with multiplicities. Put μ_k = \sum_{i=1}^{n} λ_i^k. Then, for 1 ≤ k ≤ n,
μ_k + a_{n−1}μ_{k−1} + ··· + a_{n−k+1}μ_1 + k\,a_{n−k} = 0.  (8.1.4)
That is, the first n moments of the zeros determine the coefficients of P(t).
Proof. For simplicity of expression, let a_n = 1. Then, using Equation (8.1.4), we see that k = 1 gives us a_{n−1} = −μ_1. To compute a_{n−2}, put k = 2 in Equation (8.1.4) to verify that a_{n−2} = (−μ_2 + μ_1^2)/2. This process can be continued to get all the coefficients of P(t). Now, let us prove the n given equations.
Define f(t) = \sum_i \frac{1}{t − λ_i} = \frac{P'(t)}{P(t)} and take |t| > \max_i |λ_i|. Then, the left hand side can be re-written as
f(t) = \sum_{i=1}^{n} \frac{1}{t − λ_i} = \sum_{i=1}^{n} \frac{1}{t\left(1 − \frac{λ_i}{t}\right)} = \sum_{i=1}^{n}\left[\frac{1}{t} + \frac{λ_i}{t^2} + ···\right] = \frac{n}{t} + \frac{μ_1}{t^2} + ··· .  (8.1.5)
Hence,
na_nt^{n−1} + (n−1)a_{n−1}t^{n−2} + ··· + a_1 = P'(t) = \left[\frac{n}{t} + \frac{μ_1}{t^2} + ···\right]\left[a_nt^n + ··· + a_0\right].
Comparing the coefficients of equal powers of t on the two sides now gives Equation (8.1.4).
Remark 8.1.13. Let P(t) = a_nt^n + ··· + a_1t + a_0 with a_n = 1. Thus, we see that we need not find the zeros of P(t) to find the k-th moments of its zeros. They can be computed directly and recursively using Newton's identities.
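A Python sketch of this recursion (my own illustration; numpy is used only for the cross-check via the roots):

import numpy as np

def moments_from_coeffs(a):
    """Power sums mu_1, ..., mu_n of the zeros of
    P(t) = t^n + a[n-1] t^{n-1} + ... + a[0], obtained recursively from
    Newton's identities; no root-finding is needed."""
    n = len(a)
    mu = [0.0] * (n + 1)          # mu[0] is unused
    for k in range(1, n + 1):
        s = sum(a[n - j] * mu[k - j] for j in range(1, k))
        mu[k] = -s - k * a[n - k]
    return mu[1:]

a = [-6.0, 11.0, -6.0]            # P(t) = t^3 - 6t^2 + 11t - 6 = (t-1)(t-2)(t-3)
print(moments_from_coeffs(a))     # [6.0, 14.0, 36.0]
roots = np.roots([1.0] + a[::-1])
print([sum(roots**k).real for k in (1, 2, 3)])   # same values, via the roots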
Exercise 8.1.14. Let A, B ∈ Mn(C). Then, prove that A and B have the same eigenvalues if and only if tr(A^k) = tr(B^k), for k = 1, ..., n (use Exercise 6.1.9).
Thus, the set G forms a group with respect to multiplication. We now define this group.
For a unitary matrix U , we know that U −1 = U ∗ . Our next result gives a necessary and
sufficient condition on an invertible matrix A so that the matrix A−1 is similar to A∗ .
Theorem 8.2.3. [Generalizing a Unitary Matrix] Let A be an invertible matrix. Then A−1
is similar to A∗ if and only if there exists an invertible matrix B such that A = B −1 B ∗ .
That is, −e^{2iθ} ∈ σ(S^{-1}S^∗). Thus, if we choose θ_0 ∈ R such that −e^{2iθ_0} ∉ σ(S^{-1}S^∗), then H(θ_0) is nonsingular.
To get our result, we finally choose B = β(αI − A^∗)H(θ_0) such that β ≠ 0 and α = e^{iγ} ∉ σ(A^∗).
Note that with α and β chosen as above, B is invertible. Furthermore, as we need BA = B^∗, we get βH(θ_0)(αA − I) = βH(θ_0)(αI − A) and thus, we need \overline{β} = −βα, which holds true if β = e^{i(π−γ)/2}. Thus, the required result follows.
Exercise 8.2.4. Suppose that A is similar to a unitary matrix. Then, prove that A−1 is similar
to A∗ .
Definition 8.2.5. [Plane Rotations] For a fixed positive integer n, consider the vector space
Rn with standard basis {e1 , . . . , en }. Also, for 1 ≤ i, j ≤ n, let Ei,j = ei eTj . Then, for θ ∈ R
and 1 ≤ i, j ≤ n, a plane rotation, denoted U (θ; i, j), is defined as
U (θ; i, j) = I − Ei,i − Ej,j + [Ei,i + Ej,j ] cos θ − Ei,j sin θ + Ej,i sin θ.
In matrix form, U(θ; i, j) agrees with the identity matrix except in the rows and columns i and j: its (i, i) and (j, j) entries equal cos θ, its (i, j) entry equals −sin θ and its (j, i) entry equals sin θ (so the i-th row reads (··· cos θ ··· −sin θ ···)).
Remark 8.2.6. Note the following about the matrix U(θ; i, j), where θ ∈ R and 1 ≤ i, j ≤ n.
1. U(θ; i, j) is orthogonal.
2. Geometrically, U(θ; i, j)x rotates x by the angle θ in the ij-plane.
3. Geometrically, (U(θ; i, j))^Tx rotates x by the angle −θ in the ij-plane.
4. If y = U(θ; i, j)x then the coordinates of y are given by
(a) y_i = x_i cos θ − x_j sin θ,
(b) y_j = x_i sin θ + x_j cos θ, and
(c) for l ≠ i, j, y_l = x_l.
5. Thus, for x ∈ R^n, the choice of θ for which y_j = 0, where y = U(θ; i, j)x, equals
(a) θ = 0, whenever x_j = 0. That is, U(0; i, j) = I.
(b) θ = cot^{-1}(−x_i/x_j), whenever x_j ≠ 0.
7. In general, in R^n, suppose that we want to apply a plane rotation to a along the x_1x_2-plane so that the resulting vector has 0 in the 2-nd coordinate. In that case, our circle on the x_1x_2-plane has radius r = \sqrt{a_1^2 + a_2^2} and it gets translated by (0, 0, a_3, ..., a_n)^T. So, there are two points x on this circle with x_2 = 0 and they are (±r, 0, a_3, ..., a_n)^T.
8. Consider three mutually orthogonal unit vectors, say x, y, z. Then, x can be brought to e_1 by two plane rotations, namely by an appropriate U(θ_1; 1, 3) and U(θ_2; 1, 2). Thus, write \hat{y} = U(θ_2; 1, 2)U(θ_1; 1, 3)y and \hat{z} = U(θ_2; 1, 2)U(θ_1; 1, 3)z.
As unitary transformations preserve angles, note that \hat{y}(1) = \hat{z}(1) = 0. Now, we can apply an appropriate plane rotation U(θ_3; 2, 3) so that U(θ_3; 2, 3)\hat{y} = e_2. Since e_3 is the only unit vector in R^3 orthogonal to both e_1 and e_2, it follows that U(θ_3; 2, 3)\hat{z} = e_3. Thus,
I = [e_1 \; e_2 \; e_3] = U(θ_3; 2, 3)U(θ_2; 1, 2)U(θ_1; 1, 3)[x \; y \; z].
Hence, any real orthogonal matrix A ∈ M3 (R) is a product of three plane rotations.
We are now ready to give another method to get the QR-decomposition of a square matrix
(see Theorem 5.6.1 that uses the Gram-Schmidt Orthonormalization Process).
Proposition 8.2.7. [QR Factorization Revisited: Square Matrix] Let A ∈ Mn (R). Then
there exists a real orthogonal matrix Q and an upper triangular matrix R such that A = QR.
Proof. We start by applying plane rotations to A so that the positions (2, 1), (3, 1), ..., (n, 1) of A become zero. If a_{21} = 0, we multiply by I; otherwise, we use the plane rotation U(θ; 1, 2), where θ = cot^{-1}(−a_{11}/a_{21}). Then, we apply a similar technique (a rotation U(θ; 1, 3)) so that the (3, 1) entry of A becomes 0. Note that this plane rotation doesn't change the (2, 1) entry of A. We continue this process till all the entries in the first column of A, except possibly the (1, 1) entry, are zero.
We then apply the plane rotations to make positions (3, 2), (4, 2), . . . , (n, 2) zero. Observe
that this does not disturb the zeros in the first column. Thus, continuing the above process a
finite number of times give us the required result.
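A Python sketch of this rotation-by-rotation QR factorization (my own implementation; numpy is assumed, and the cot^{-1} choice of θ is replaced by the equivalent cos/sin normalization):

import numpy as np

def givens_qr(A):
    """QR factorization of a square matrix using plane (Givens) rotations."""
    R = A.astype(float).copy()
    n = R.shape[0]
    Q = np.eye(n)
    for j in range(n - 1):                 # zero out column j below the diagonal
        for i in range(j + 1, n):
            if R[i, j] != 0.0:
                r = np.hypot(R[j, j], R[i, j])
                c, s = R[j, j] / r, R[i, j] / r
                U = np.eye(n)              # rotation acting on rows j and i
                U[[j, i], [j, i]] = c
                U[j, i], U[i, j] = s, -s
                R = U @ R
                Q = Q @ U.T
    return Q, R

A = np.array([[4.0, 1.0, 2.0], [3.0, -1.0, 0.0], [0.0, 2.0, 5.0]])
Q, R = givens_qr(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(3)),
      np.allclose(np.tril(R, -1), 0))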
Lemma 8.2.8. [QR Factorization Revisited: Rectangular Matrix] Let A ∈ Mm,n(R). Then there exists a real orthogonal matrix Q and a matrix R ∈ Mm,n(R) in upper triangular form such that A = QR.
Proof. If Rank A < m, add some columns to A to get a matrix, say Ã, such that Rank Ã = m. Now suppose that à has k columns. For 1 ≤ i ≤ k, let v_i = Ã[:, i]. Now, apply the Gram-Schmidt orthonormalization process to v_1, ..., v_k.
Proof. If a_{31} ≠ 0, then put U_1 = U(θ_1; 2, 3), where θ_1 = cot^{-1}(−a_{21}/a_{31}). Notice that U_1^T[:, 1] = e_1 and so
(U_1AU_1^T)[:, 1] = (U_1A)[:, 1].
We already know that (U_1A)[3, 1] = 0. Hence, U_1AU_1^T is a real symmetric matrix with (3, 1)-th entry 0. Now, proceed to make the (4, 1)-th entry of U_1A equal to 0. To do so, take U_2 = U(θ_2; 2, 4). Notice that U_2^T[:, 1] = e_1 and so
(U_2U_1AU_1^TU_2^T)[:, 1] = (U_2U_1AU_1^T)[:, 1].
But by our choice of the plane rotation U_2, we have (U_2U_1AU_1^T)[4, 1] = 0. Furthermore, as U_2[3, :] = e_3^T, we have
(U_2U_1AU_1^T)[3, 1] = U_2[3, :](U_1AU_1^T)[:, 1] = (U_1AU_1^T)[3, 1] = 0.
Proof. The idea is to reduce the off-diagonal entries of A to 0 as much as possible. So, we start by choosing i < j such that |a_{ij}| is maximum. Now, put
θ = \frac{1}{2}\cot^{-1}\left(\frac{a_{ii} − a_{jj}}{2a_{ij}}\right),  U = U(θ; i, j),  and  B = U^TAU.
Thus, using the above, we see that whenever l, k ≠ i, j, a_{lk}^2 = b_{lk}^2, and for l ≠ i, j, we have a_{li}^2 + a_{lj}^2 = b_{li}^2 + b_{lj}^2. As the rest of the diagonal entries have not changed, we observe that the sum of the squares of the off-diagonal entries has reduced by 2a_{ij}^2. Thus, a repeated application of the above process makes the matrix “close to diagonal”.
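A Python sketch of this classical Jacobi iteration (illustrative only; numpy is assumed and the equivalent arctan form of the rotation angle is used):

import numpy as np

def jacobi_sweep(A, tol=1e-12, max_iter=100):
    """Repeatedly rotate away the largest off-diagonal entry of a
    real symmetric matrix A."""
    B = A.astype(float).copy()
    n = B.shape[0]
    for _ in range(max_iter):
        off = np.abs(np.triu(B, 1))
        i, j = np.unravel_index(np.argmax(off), off.shape)
        if off[i, j] < tol:
            break
        theta = 0.5 * np.arctan2(2 * B[i, j], B[i, i] - B[j, j])
        c, s = np.cos(theta), np.sin(theta)
        U = np.eye(n)
        U[i, i] = U[j, j] = c
        U[i, j], U[j, i] = -s, s
        B = U.T @ B @ U
    return B

A = np.array([[4.0, 1.0, 2.0], [1.0, 3.0, 0.5], [2.0, 0.5, 1.0]])
D = jacobi_sweep(A)
print(np.round(D, 6))                                 # nearly diagonal
print(np.sort(np.diag(D)), np.sort(np.linalg.eigvalsh(A)))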
We will now look at another class of unitary matrices, commonly called the Householder matrices
(see Exercise 1.5.5.8).
Definition 8.2.11. [Householder Matrix] Let w ∈ Cn be a unit vector. Then, the matrix
Uw = I − 2ww∗ is called a Householder matrix.
3. If x ∈ w⊥ then Uw x = x.
where e_1 ∈ w^⊥ and 2e_2 ∈ LS(w).
Recall that if x, y ∈ R^n with x ≠ y and ‖x‖ = ‖y‖, then (x + y) ⊥ (x − y). This is not true in C^n, as can be seen from the following example. Take x = \begin{bmatrix}1\\1\end{bmatrix} and y = \begin{bmatrix}i\\−1\end{bmatrix}. Then
\left\langle \begin{bmatrix}1+i\\0\end{bmatrix}, \begin{bmatrix}1−i\\2\end{bmatrix} \right\rangle = (1+i)^2 ≠ 0.
Thus, to pick the right choice for the matrix U_w, we need to be observant of the choice of the inner product space.
Example 8.2.14. Let x, y ∈ C^n with x ≠ y and ‖x‖ = ‖y‖. Then, which U_w should be used to reflect y to x?
1. Solution in case of R^n: Imagine the line segment joining x and y. Now, place a mirror at the midpoint and perpendicular to the line segment. Then, the reflection of y on that mirror is x. So, take w = \frac{x−y}{‖x−y‖} ∈ R^n. Then,
U_wy = (I − 2ww^T)y = y − 2ww^Ty = y − 2\frac{x−y}{‖x−y‖^2}(x−y)^Ty = y − 2\frac{x−y}{‖x−y‖^2}\cdot\frac{−‖x−y‖^2}{2} = x.
U_wy = (I − 2ww^∗)y = y − 2ww^∗y = y − 2\frac{x−y}{‖x−y‖^2}(x−y)^∗y = y − 2\frac{x−y}{‖x−y‖^2}\cdot\frac{−‖x−y‖^2}{2} = x.
For example, taking x = \begin{bmatrix}1\\1\end{bmatrix} and y = \begin{bmatrix}i\\−1\end{bmatrix}, we have ⟨x + y, x − y⟩ ≠ 0.
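A Python sketch of the real case of Example 8.2.14 (hypothetical vectors; numpy is assumed and the function name is mine):

import numpy as np

def householder_reflector(x, y):
    """Householder matrix U_w with w = (x - y)/||x - y||, so that
    U_w y = x whenever ||x|| = ||y|| (real case)."""
    w = (x - y) / np.linalg.norm(x - y)
    return np.eye(len(x)) - 2.0 * np.outer(w, w)

y = np.array([3.0, 4.0, 0.0])
x = np.array([5.0, 0.0, 0.0])               # same length as y
Uw = householder_reflector(x, y)
print(np.allclose(Uw @ y, x))               # True
print(np.allclose(Uw @ Uw.T, np.eye(3)))    # U_w is orthogonal (and symmetric)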
As an application, we now prove that any real symmetric matrix can be transformed into a
tri-diagonal matrix.
" #
a vT
Proposition 8.2.15. [Householder’s Tri-Diagonalization] Let v ∈ Rn−1 and A = ∈
v B
Mn (R) be a real symmetric matrix. Then, there exists a real orthogonal matrix Q, a product of
Householder matrices, such that QT AQ is tri-diagonal.
Proof. If v is a multiple of e_1, then the first row and column of A are already in the required form and we proceed to apply our technique to the matrix B, a matrix of lower order. So, without loss of generality, we assume that v is not a multiple of e_1.
As we want Q^TAQ to be tri-diagonal, we need to find a vector w ∈ R^{n−1} such that U_wv = re_1 ∈ R^{n−1}, where r = ‖v‖ = ‖U_wv‖. Thus, using Example 8.2.14, choose the required vector w ∈ R^{n−1}. Then,
\begin{bmatrix} 1 & 0\\ 0 & U_w\end{bmatrix}\begin{bmatrix} a & v^T\\ v & B\end{bmatrix}\begin{bmatrix} 1 & 0\\ 0 & U_w^T\end{bmatrix} = \begin{bmatrix} a & v^TU_w^T\\ U_wv & U_wBU_w^T\end{bmatrix} = \begin{bmatrix} a & re_1^T\\ re_1 & S\end{bmatrix},
where S ∈ Mn−1 (R) is a symmetric matrix. Now, use induction on the matrix S to get the
required result.
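The induction in this proof translates directly into a short Python sketch (my own implementation; numpy is assumed):

import numpy as np

def householder_tridiagonalize(A):
    """Reduce a real symmetric A to tri-diagonal form T = Q^T A Q
    using Householder reflectors."""
    T = A.astype(float).copy()
    n = T.shape[0]
    Q = np.eye(n)
    for k in range(n - 2):
        v = T[k + 1:, k].copy()
        r = np.linalg.norm(v)
        if np.isclose(r, 0) or np.allclose(v[1:], 0):
            continue                     # column already in the desired form
        x = np.zeros_like(v)
        x[0] = r
        w = (x - v) / np.linalg.norm(x - v)
        Uw = np.eye(n - k - 1) - 2.0 * np.outer(w, w)
        P = np.eye(n)
        P[k + 1:, k + 1:] = Uw           # acts like [[1, 0], [0, U_w]] on the trailing block
        T = P @ T @ P.T
        Q = Q @ P.T
    return Q, T

A = np.array([[4.0, 1.0, 2.0, 0.5],
              [1.0, 3.0, 0.0, 1.0],
              [2.0, 0.0, 1.0, 2.0],
              [0.5, 1.0, 2.0, 5.0]])
Q, T = householder_tridiagonalize(A)
print(np.allclose(Q.T @ A @ Q, T))
print(np.allclose(np.triu(T, 2), 0) and np.allclose(np.tril(T, -2), 0))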
Definition 8.2.16. Let s and t be two symbols. Then, an expression of the form W(s, t) = s^{m_1}t^{n_1}s^{m_2}t^{n_2} ··· s^{m_k}t^{n_k}, where the m_i's and n_i's are non-negative integers, is called a word in the symbols s and t.
Remark 8.2.17. [More on Unitary Equivalence] Let s and t be two symbols and W(s, t) be a word in the symbols s and t.
1. Suppose U is a unitary matrix such that B = U^∗AU. Then, W(B, B^∗) = U^∗W(A, A^∗)U. Thus, tr[W(A, A^∗)] = tr[W(B, B^∗)].
2. Let A and B be two matrices such that tr[W(A, A∗ )] = tr[W(B, B∗ )], for each word W .
Then, does it imply that A and B are unitarily equivalent? The answer is ‘yes’ as provided
by the following result. The proof is outside the scope of this book.
Exercise 8.2.19. [Triangularization via Complex Orthogonal Matrix need not be Possible] Let A ∈ Mn(C) and A = QTQ^T, where Q is a complex orthogonal matrix and T is upper triangular.
Ans: As QQ^T = I, we have AQ = QT and thus Q[:, 1], the first column of Q, is an eigenvector with corresponding eigenvalue t_{11}. Observe that Q[:, 1]^TQ[:, 1] = 1 ≠ 0.
For the second part, verify that σ(A) = {0, 0} and x = \begin{bmatrix}i\\−1\end{bmatrix} is the only eigenvector of A = \begin{bmatrix}1 & i\\ i & −1\end{bmatrix} corresponding to the eigenvalue 0. Moreover, notice that x^Tx = 0.
Proposition 8.2.20. [Matrices with Distinct Eigenvalues are Dense in Mn(C)] Let A ∈ Mn(C). Then, for each ε > 0, there exists a matrix A(ε) = [a(ε)_{ij}] ∈ Mn(C) such that A(ε) has distinct eigenvalues and \sum_{i,j} |a_{ij} − a(ε)_{ij}|^2 < ε.
Proof. By Schur Upper Triangularization (see Lemma 6.4.1), there exists a unitary matrix U such that U^∗AU = T, an upper triangular matrix. Now, choose α_i's such that t_{ii} + α_i are distinct and \sum_i |α_i|^2 < ε. Now, consider the matrix A(ε) = U(T + diag(α_1, ..., α_n))U^∗. Then, B = A(ε) − A = U diag(α_1, ..., α_n)U^∗ with
\sum_{i,j} |b_{ij}|^2 = tr(B^∗B) = tr(U diag(|α_1|^2, ..., |α_n|^2)U^∗) = \sum_i |α_i|^2 < ε.
As \sum_i |α_i|^2 < ε, the required result follows.
Proposition 8.2.22. [A Matrix is Almost Diagonalizable] Let A ∈ Mn(C) and ε > 0 be given. Then, there exists an invertible matrix S such that S^{-1}AS = T, an upper triangular matrix with |t_{ij}| < ε, for all i ≠ j.
Proof. By Schur Upper Triangularization (see Lemma 6.4.1), there exists a unitary matrix U such that U^∗AU = T, an upper triangular matrix. Now, take t = 2 + \max_{i<j}|t_{ij}| and choose α such that 0 < α < ε/t. Then, if we take D_α = diag(1, α, α^2, ..., α^{n−1}) and S = UD_α, we have S^{-1}AS = D_α^{-1}TD_α, whose (i, j)-th entry equals α^{j−i}t_{ij}. For i < j, |α^{j−i}t_{ij}| ≤ α|t_{ij}| < ε, and hence the required result follows.
Definition 8.3.1. [Simultaneously Diagonalizable] Let A, B ∈ Mn (C). Then, they are said
to be simultaneously diagonalizable if there exists an invertible matrix S such that S −1 AS
and S −1 BS are both diagonal matrices.
Theorem 8.3.3. Let A, B ∈ Mn (C) be diagonalizable matrices. Then they are simultaneously
diagonalizable if and only if they commute.
Proof. One part of this theorem has already been proved in Proposition 8.3.2. For the other
part, let us assume that AB = BA. Since A is diagonalizable, there exists an invertible matrix
S such that
S −1 AS = Λ = λ1 I ⊕ · · · ⊕ λk I, (8.3.1)
where λ_1, ..., λ_k are the distinct eigenvalues of A. As AB = BA, the matrix C = S^{-1}BS commutes with Λ and hence C is block diagonal, say C = C_{11} ⊕ ··· ⊕ C_{kk}, partitioned conformally with Λ. As B is diagonalizable, so is each C_{ii}; thus, there exist invertible matrices T_i such that T_i^{-1}C_{ii}T_i = Λ_i, a diagonal matrix. Put T = T_1 ⊕ ··· ⊕ T_k. Then,
T^{-1}S^{-1}AST = \begin{bmatrix} T_1^{-1}λ_1IT_1 & & \\ & \ddots & \\ & & T_k^{-1}λ_kIT_k\end{bmatrix} = \begin{bmatrix} λ_1I & & \\ & \ddots & \\ & & λ_kI\end{bmatrix}
and
T^{-1}S^{-1}BST = \begin{bmatrix} T_1^{-1}C_{11}T_1 & & \\ & \ddots & \\ & & T_k^{-1}C_{kk}T_k\end{bmatrix} = \begin{bmatrix} Λ_1 & & \\ & \ddots & \\ & & Λ_k\end{bmatrix}.
Thus A and B are simultaneously diagonalizable and the required result follows.
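A numerical sanity check of the easiest case (my own example, with A having distinct eigenvalues and B a polynomial in A; numpy is assumed):

import numpy as np

A = np.array([[2.0, 1.0], [0.0, 3.0]])        # distinct eigenvalues 2, 3
B = A @ A - 4 * A + np.eye(2)                 # a polynomial in A, so AB = BA
assert np.allclose(A @ B, B @ A)

_, S = np.linalg.eig(A)
S_inv = np.linalg.inv(S)
DA = S_inv @ A @ S
DB = S_inv @ B @ S
print(np.allclose(DA, np.diag(np.diag(DA))))  # True: S diagonalizes A
print(np.allclose(DB, np.diag(np.diag(DB))))  # True: the same S diagonalizes B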
Theorem 8.3.7. Let F ⊆ Mn (C) be a commuting family of matrices. Then, all the matrices
in F have a common eigenvector.
Proof. We prove the result by induction on n. The result is clearly true for n = 1. So, let us
assume the result to be valid for all n < m. Now, let us assume that F ⊆ Mm (C) is a family of
diagonalizable matrices.
If F is simultaneously diagonalizable, then by Proposition 8.3.2, the family F is commuting.
Conversely, let F be a commuting family. If each A ∈ F is a scalar matrix then they are simul-
taneously diagonalizable via I. So, let A ∈ F be a non-scalar matrix. As A is diagonalizable,
there exists an invertible matrix S such that
S −1 AS = λ1 I ⊕ · · · ⊕ λk I, k ≥ 2,
Remark 8.3.9. [σ(AB) and σ(BA)] Let m ≤ n, A ∈ Mm×n (C), and B ∈ Mn×m (C). Then
σ(BA) = σ(AB) with n − m extra 0’s. In particular, if A, B ∈ Mn (C) then, PAB (t) = PBA (t).
4. Now, use continuity to argue that P_{AB}(t) = \lim_{α→0^+} P_{A_αB}(t) = \lim_{α→0^+} P_{BA_α}(t) = P_{BA}(t).
5. Let σ(A) = {λ1 , . . . , λn }, σ(B) = {µ1 , . . . , µn } and suppose that AB = BA. Then,
(a) prove that there is a permutation π such that σ(A+B) = {λ1 +µπ(1) , . . . , λn +µπ(n) }.
In particular, σ(A + B) ⊆ σ(A) + σ(B).
Ans: Use Simultaneous Triangularization.
(b) if we further assume that σ(A) ∩ σ(−B) = ∅ then the matrix A + B is nonsingular.
6. Let A and B be two non-commuting matrices. Then, give an example to show that it is
difficult to relate σ(A + B) with σ(A) and σ(B).
" # " #
0 0 0 1
Ans: Take A = and B = . Here σ(A + B) = {1, −1} but σ(A) = σ(B) =
1 0 0 0
{0, 0}.
7. Are the matrices A = \begin{bmatrix}0 & 1 & 0\\ 0 & 0 & −1\\ 0 & 0 & 0\end{bmatrix} and B = \begin{bmatrix}0 & 0 & 0\\ 1 & 0 & 0\\ 0 & 1 & 0\end{bmatrix} simultaneously triangularizable?
Ans: Suppose yes. Then there exists a unitary matrix U such that U^∗AU = T_1 and U^∗BU = T_2, where T_1 and T_2 are strictly upper triangular matrices, as 0 is the only eigenvalue of A and B. Thus, the eigenvalues of A + B, AB and BA must all be 0. In this case, note that 0 is the only eigenvalue of A + B = \begin{bmatrix}0 & 1 & 0\\ 1 & 0 & −1\\ 0 & 1 & 0\end{bmatrix}, but the eigenvalues of AB are 0, 1 and −1.
8. Let F ⊆ Mn(C) be a family of commuting normal matrices. Then, prove that the matrices in F are simultaneously unitarily diagonalizable.
Ans: By Theorem 8.3.7, we know that all elements in F have a common eigenvector. Hence, one can apply induction to show that all the elements of F are simultaneously unitarily triangularizable. But, each element is normal and hence each of these upper triangular matrices is diagonal.
9. Let A ∈ Mn(C) with A^∗ = A and x^∗Ax ≥ 0, for all x ∈ Cn. Then prove that σ(A) ⊆ R^+ and that if tr(A) = 0, then A = 0.
Ans: As A is Hermitian, by Theorem 6.4.10, each eigenvalue of A is real. Suppose (λ, x) is an eigenpair of A. Then, for this choice of A, the condition x^∗Ax = λ x^∗x ≥ 0 implies that λ ≥ 0. Hence, σ(A) ⊆ R^+. Then, the condition sum of eigenvalues = tr(A) = 0 implies that a sum of non-negative real numbers is 0 and hence each of them must be zero. That is, λ = 0 for each eigenvalue λ. Hence, the diagonal matrix D containing the eigenvalues is the zero matrix. Therefore, A = 0.
Proposition 8.3.11. [Triangularization: Real Matrix] Let A ∈ Mn(R). Then, there exists a real orthogonal matrix Q such that Q^TAQ is block upper triangular, where each diagonal block is of size either 1 or 2.
Proof. If all the eigenvalues of A are real then the corresponding eigenvectors have real entries
and hence, one can use induction to get the result in this case (see Lemma 6.4.1).
So, now let us assume that A has a complex eigenvalue, say λ = α + iβ with β ≠ 0, and x = u + iv as an eigenvector for λ. Thus, Ax = λx and hence A\overline{x} = \overline{λ}\overline{x}. But, λ ≠ \overline{λ} as β ≠ 0. Thus, the eigenvectors x, \overline{x} are linearly independent and therefore, {u, v} is a linearly independent set. By the Gram-Schmidt Orthonormalization process, we get an ordered basis, say {w_1, w_2, ..., w_n} of R^n, where LS(w_1, w_2) = LS(u, v). Also, using the eigen-condition Ax = λx, we get
Aw_1 = aw_1 + bw_2,  Aw_2 = cw_1 + dw_2,
for some real numbers a, b, c, d. Hence, writing Q = [w_1 w_2 ··· w_n], the matrix Q^TAQ has the form \begin{bmatrix} C & ∗\\ 0 & B\end{bmatrix} with C ∈ M_2(R),
where B ∈ Mn−2 (R). Now, by induction hypothesis the required result follows.
The next result is a direct application of Proposition 8.3.11 and hence the proof is omitted.
Proposition 8.3.13. Let A ∈ Mn (R). Then the following statements are equivalent.
1. A is normal.
2. There exists a real orthogonal matrix Q such that Q^TAQ = \bigoplus_i A_i, where the A_i's are real normal matrices of size either 1 or 2.
Proof. 2 ⇒ 1 is trivial. To prove 1 ⇒ 2, recall that Proposition 8.3.11 gives the existence of
a real orthogonal matrix Q such that QT AQ is upper triangular with diagonal blocks of size
either 1 or 2. So, we can write
Q^TAQ = \begin{bmatrix} λ_1 & ∗ & \cdots & ∗ & \cdots & ∗\\ 0 & \ddots & ∗ & ∗ & \cdots & ∗\\ 0 & \cdots & λ_p & ∗ & \cdots & ∗\\ 0 & \cdots & 0 & A_{11} & \cdots & A_{1k}\\ 0 & \cdots & 0 & 0 & \ddots & ∗\\ 0 & \cdots & 0 & 0 & \cdots & A_{kk}\end{bmatrix} = \begin{bmatrix} R & C\\ 0 & B\end{bmatrix} (say),
where the λ_i's are the 1 × 1 diagonal blocks and the A_{ii}'s are the 2 × 2 diagonal blocks.
Remark 8.3.16. 1. Let A be a diagonalizable matrix with ρ(A) < 1. Then, A is a convergent
matrix.
Proof. Let A = S diag(λ_1, ..., λ_n)S^{-1}. As ρ(A) < 1, for each i, 1 ≤ i ≤ n, λ_i^m → 0 as m → ∞. Thus, A^m = S diag(λ_1^m, ..., λ_n^m)S^{-1} → 0.
2. Even if the matrix A is not diagonalizable, the above result holds. That is, whenever ρ(A) < 1, the matrix A is convergent. The converse is also true.
Proof. Let J_k(λ) = λI_k + N_k be a Jordan block of J = Jordan CF_A. Then, as N_k^k = 0, for each fixed k, we have
J_k(λ)^m = \sum_{j=0}^{k−1}\binom{m}{j}λ^{m−j}N_k^j → 0 as m → ∞, whenever |λ| < 1,
since \binom{m}{j}|λ|^{m−j} → 0 for each fixed j.
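A small numerical illustration (my own example; numpy is assumed): a non-diagonalizable matrix with spectral radius less than 1 still satisfies A^m → 0.

import numpy as np

A = np.array([[0.9, 1.0], [0.0, 0.9]])      # the Jordan block J_2(0.9)
for m in (10, 100, 500):
    print(m, np.linalg.norm(np.linalg.matrix_power(A, m)))
# The printed norms eventually decrease to 0, even though ||A^m|| grows
# for small m because of the nilpotent part.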
Theorem 8.3.17. [Decomposition into Diagonalizable and Nilpotent Parts] Let A ∈ Mn(C). Then A = B + C, where B is a diagonalizable matrix and C is a nilpotent matrix such that BC = CB.
Proof. Let J = Jordan CF_A = S^{-1}AS and write J = D + N, where D is a diagonal matrix and N is a nilpotent matrix. Now, note that DN = ND as, for each Jordan block J_k(λ) = D_k + N_k, we have D_k = λI and hence D_kN_k = N_kD_k. Taking B = SDS^{-1} and C = SNS^{-1}, we get A = B + C with B diagonalizable, C nilpotent and BC = CB.
Chapter 9
Appendix
Remark 9.1.2. Recall that in Remark 9.2.16.1, it was observed that each permutation is a product of transpositions.
1. Verify that the elementary matrix Eij is the permutation matrix corresponding to the transposition (i, j).
2. Thus, every permutation matrix is a product of elementary matrices E1j , 1 ≤ j ≤ n.
3. For n = 3, the permutation matrices are I_3, \begin{bmatrix}1&0&0\\0&0&1\\0&1&0\end{bmatrix} = E_{23} = E_{12}E_{13}E_{12}, \begin{bmatrix}0&1&0\\1&0&0\\0&0&1\end{bmatrix} = E_{12}, \begin{bmatrix}0&1&0\\0&0&1\\1&0&0\end{bmatrix} = E_{12}E_{13}, \begin{bmatrix}0&0&1\\1&0&0\\0&1&0\end{bmatrix} = E_{13}E_{12} and \begin{bmatrix}0&0&1\\0&1&0\\1&0&0\end{bmatrix} = E_{13}.
4. Let f ∈ Sn and P^f = [p_{ij}] be the corresponding permutation matrix. Since p_{ij} = δ_{f(i),j} and {f(1), ..., f(n)} = [n], each entry of P^f is either 0 or 1. Furthermore, every row and column of P^f has exactly one nonzero entry. This nonzero entry is a 1 and appears at the position (i, f(i)) (see the sketch after this remark).
5. By the previous paragraph, we see that when a permutation matrix is multiplied to A
(a) from left then it permutes the rows of A.
(b) from right then it permutes the columns of A.
6. P is a permutation matrix if and only if P has exactly one 1 in each row and column.
Solution: If P has exactly one 1 in each row and column, then P is a square matrix, say
235
236 CHAPTER 9. APPENDIX
n × n. Now, apply GJE to P . The occurrence of exactly one 1 in each row and column
implies that these 1’s are the pivots in each column. We just need to interchange rows to
get it in RREF. So, we need to multiply by Eij . Thus, GJE of P is In and P is indeed a
product of Eij ’s. The other part has already been explained earlier.
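A Python sketch of the above (the function name and the 0-indexed convention are mine; numpy is assumed): the permutation matrix with a 1 in position (i, f(i)) permutes rows when multiplied from the left and columns when multiplied from the right.

import numpy as np

def perm_matrix(f):
    """f is a 0-indexed list: f[i] is the image of i."""
    n = len(f)
    P = np.zeros((n, n), dtype=int)
    for i, fi in enumerate(f):
        P[i, fi] = 1
    return P

f = [1, 2, 0]                       # the cycle (1, 2, 3), written 0-indexed
P = perm_matrix(f)
A = np.arange(9).reshape(3, 3)
print(P @ A)                        # rows permuted: row i becomes old row f(i)
print(A @ P)                        # columns permuted
print(np.array_equal(P @ P.T, np.eye(3, dtype=int)))   # P is orthogonal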
Theorem 9.1.3. Let A and B be two matrices in RREF. If they are row equivalent then A = B.
Proof. Note that the matrix A = 0 if and only if B = 0. So, let us assume that the matrices
A, B 6= 0. Also, the row-equivalence of A and B implies that there exists an invertible matrix
C such that A = CB, where C is product of elementary matrices.
Since B is in RREF, either B[:, 1] = 0T or B[:, 1] = (1, 0, . . . , 0)T . If B[:, 1] = 0T then
A[:, 1] = CB[:, 1] = C0 = 0. If B[:, 1] = (1, 0, . . . , 0)T then A[:, 1] = CB[:, 1] = C[:, 1]. As C is
invertible, the first column of C cannot be the zero vector. So, A[:, 1] cannot be the zero vector.
Further, A is in RREF implies that A[:, 1] = (1, 0, . . . , 0)T . So, we have shown that if A and B
are row-equivalent then their first columns must be the same.
Now, let us assume that the first k − 1 columns of A and B are equal and it contains r
pivotal columns. We will now show that the k-th column is also the same.
Define Ak = [A[:, 1], . . . , A[:, k]] and Bk = [B[:, 1], . . . , B[:, k]]. Then, our assumption implies
that A[:, i] = B[:, i], for 1 ≤ i ≤ k − 1. Since, the first k − 1 columns contain r pivotal columns,
there exists a permutation matrix P such that
A_kP = [\,M \mid A[:, k]\,] \quad\text{and}\quad B_kP = [\,M \mid B[:, k]\,], \quad\text{where } M = \begin{bmatrix} I_r & W\\ 0 & 0\end{bmatrix}.
" # If the k-th columns of A and B are pivotal columns then by definition of RREF, A[:, k] =
0
= B[:, k], where 0 is a vector of size r and e1 = (1, 0, . . . , 0)T . So, we need to consider two
e1
cases depending on whether both are non-pivotal or one is pivotal and the other is not.
As A = CB, we get A_k = CB_k and
[\,M \mid A[:, k]\,] = A_kP = CB_kP = \begin{bmatrix} C_1 & C_2\\ C_3 & C_4\end{bmatrix}[\,M \mid B[:, k]\,] = \left[\,\begin{matrix} C_1 & C_1W\\ C_3 & C_3W\end{matrix} \;\middle|\; CB[:, k]\,\right].
So, we see that C_1 = I_r, C_3 = 0, C = \begin{bmatrix} I_r & C_2\\ 0 & C_4\end{bmatrix} and A[:, k] = \begin{bmatrix} I_r & C_2\\ 0 & C_4\end{bmatrix}B[:, k].
Case 1: Neither A[:, k] nor B[:, k] is pivotal. Then
\begin{bmatrix} X\\ 0\end{bmatrix} = A[:, k] = \begin{bmatrix} I_r & C_2\\ 0 & C_4\end{bmatrix}B[:, k] = \begin{bmatrix} I_r & C_2\\ 0 & C_4\end{bmatrix}\begin{bmatrix} Y\\ 0\end{bmatrix} = \begin{bmatrix} Y\\ 0\end{bmatrix}.
Thus, X = Y and in this case the k-th columns are equal.
Case 2: A[:, k] is pivotal but B[:, k] is non-pivotal. Then
\begin{bmatrix} 0\\ e_1\end{bmatrix} = A[:, k] = \begin{bmatrix} I_r & C_2\\ 0 & C_4\end{bmatrix}B[:, k] = \begin{bmatrix} I_r & C_2\\ 0 & C_4\end{bmatrix}\begin{bmatrix} Y\\ 0\end{bmatrix} = \begin{bmatrix} Y\\ 0\end{bmatrix},
a contradiction as e_1 ≠ 0. Thus, this case cannot arise.
Therefore, combining both the cases, we get the required result.
Example 9.2.2. Let A = {1, 2, 3}, B = {a, b, c, d} and C = {α, β, γ}. Then, the function
1. j : A → B defined by j(1) = a, j(2) = c and j(3) = c is neither one-one nor onto.
2. f : A → B defined by f (1) = a, f (2) = c and f (3) = d is one-one but not onto.
3. g : B → C defined by g(a) = α, g(b) = β, g(c) = α and g(d) = γ is onto but not one-one.
4. h : B → A defined by h(a) = 2, h(b) = 2, h(c) = 3 and h(d) = 1 is onto.
5. h ◦ f : A → A is a bijection.
6. g ◦ f : A → C is neither one-one nor onto.
Exercise 9.2.5. Let S3 be the set consisting of all permutations of 3 elements. Then, prove that S3 has 6 elements. Moreover, each of them is one of the 6 functions given below.
1. f1 (1) = 1, f1 (2) = 2 and f1 (3) = 3.
2. f2 (1) = 1, f2 (2) = 3 and f2 (3) = 2.
3. f3 (1) = 2, f3 (2) = 1 and f3 (3) = 3.
4. f4 (1) = 2, f4 (2) = 3 and f4 (3) = 1.
5. f5 (1) = 3, f5 (2) = 1 and f5 (3) = 2.
6. f6 (1) = 3, f6 (2) = 2 and f6 (3) = 1.
Remark 9.2.6. Let f : [n] → [n] be a bijection. Then, the inverse of f, denoted f^{-1} and defined by f^{-1}(m) = ℓ whenever f(ℓ) = m, is well defined for each m ∈ [n], and f^{-1} is a bijection. For example, in Exercise 9.2.5, note that f_i^{-1} = f_i, for i = 1, 2, 3, 6, and f_4^{-1} = f_5.
Remark 9.2.7. Let Sn = {f : [n] → [n] : f is a permutation}. Then, Sn has n! elements and forms a group with respect to composition of functions, called product, due to the following.
1. Let f ∈ Sn. Then,
(a) f can be written as f = \begin{pmatrix} 1 & 2 & \cdots & n\\ f(1) & f(2) & \cdots & f(n)\end{pmatrix}, called a two row notation.
(b) f is one-one. Hence, {f(1), f(2), ..., f(n)} = [n] and thus, f(1) ∈ [n], f(2) ∈ [n] \ {f(1)}, ..., and finally f(n) is the unique element of [n] \ {f(1), ..., f(n−1)}. Therefore, there are n choices for f(1), n − 1 choices for f(2) and so on. Hence, the number of elements in Sn equals n(n − 1) ··· 2 · 1 = n!.
4. Sn has a special permutation called the identity permutation, denoted Idn , such that
Idn (i) = i, for 1 ≤ i ≤ n.
Lemma 9.2.8. Fix a positive integer n. Then, the group Sn satisfies the following:
1. For each f ∈ Sn, Sn = {f ◦ g : g ∈ Sn} = {g ◦ f : g ∈ Sn}.
2. Sn = {g^{-1} : g ∈ Sn}.
Proof. Part 1: Note that for each α ∈ Sn the functions f^{-1} ◦ α, α ◦ f^{-1} ∈ Sn and α = f ◦ (f^{-1} ◦ α) as well as α = (α ◦ f^{-1}) ◦ f.
Part 2: Note that for each f ∈ Sn, by definition, (f^{-1})^{-1} = f. Hence the result holds.
Definition 9.2.9. Let f ∈ Sn. Then, the number of inversions of f, denoted n(f), equals the number of pairs (i, j) with 1 ≤ i < j ≤ n and f(i) > f(j).
3. Let f = (1, 3, 5, 4) and g = (2, 4, 1) be two cycles. Then, their product, denoted f ◦ g or
(1, 3, 5, 4)(2, 4, 1) equals (1, 2)(3, 5, 4). The calculation proceeds as (the arrows indicate the
images):
1 → 2. Note (f ◦ g)(1) = f (g(1)) = f (2) = 2.
2 → 4 → 1 as (f ◦ g)(2) = f (g(2)) = f (4) = 1. So, (1, 2) forms a cycle.
3 → 5 as (f ◦ g)(3) = f (g(3)) = f (3) = 5.
5 → 4 as (f ◦ g)(5) = f (g(5)) = f (5) = 4.
4 → 1 → 3 as (f ◦ g)(4) = f (g(4)) = f (1) = 3. So, the other cycle is (3, 5, 4).
4. Let f = (1, 4, 5) and g = (2, 4, 1) be two permutations. Then, (1, 4, 5)(2, 4, 1) = (1, 2, 5)(4) =
(1, 2, 5) as 1 → 2, 2 → 4 → 5, 5 → 1, 4 → 1 → 4 and
(2, 4, 1)(1, 4, 5) = (1)(2, 4, 5) = (2, 4, 5) as 1 → 4 → 1, 2 → 4, 4 → 5, 5 → 1 → 2.
5. Even though \begin{pmatrix}1 & 2 & 3 & 4 & 5\\ 4 & 3 & 2 & 5 & 1\end{pmatrix} is not a cycle, verify that it is a product of the cycles (1, 4, 5) and (2, 3).
2. in general, the r-cycle (i_1, ..., i_r) = (1, i_1)(1, i_r)(1, i_{r−1}) ··· (1, i_2)(1, i_1).
3. So, every r-cycle can be written as a product of transpositions. Furthermore, they can be written using the n − 1 transpositions (1, 2), (1, 3), ..., (1, n).
With the above definitions, we state and prove two important results.
Proof. Note that, using Remark 9.2.14, we just need to show that f can be written as a product of disjoint cycles.
Consider the list S = {1, f(1), f^{(2)}(1) = (f ◦ f)(1), f^{(3)}(1) = (f ◦ (f ◦ f))(1), ...}. As S is an infinite list and each f^{(i)}(1) ∈ [n], there exist i, j with 0 ≤ i < j ≤ n such that f^{(i)}(1) = f^{(j)}(1). Now, let j_1 be the least positive integer such that f^{(i)}(1) = f^{(j_1)}(1), for some i, 0 ≤ i < j_1. Then, we claim that i = 0.
For if i ≥ 1, then j_1 − 1 ≥ 1 and the condition that f is one-one gives
f^{(i−1)}(1) = (f^{-1} ◦ f^{(i)})(1) = f^{-1}(f^{(i)}(1)) = f^{-1}(f^{(j_1)}(1)) = (f^{-1} ◦ f^{(j_1)})(1) = f^{(j_1−1)}(1).
Thus, we see that the repetition has occurred at the (j_1 − 1)-th instant, contradicting the assumption that j_1 was the least such positive integer. Hence, we conclude that i = 0. Thus, (1, f(1), f^{(2)}(1), ..., f^{(j_1−1)}(1)) is one of the cycles in f.
Now, choose i_1 ∈ [n] \ {1, f(1), f^{(2)}(1), ..., f^{(j_1−1)}(1)} and proceed as above to get another cycle. Let the new cycle be (i_1, f(i_1), ..., f^{(j_2−1)}(i_1)). Then, using the fact that f is one-one, it follows that
{1, f(1), f^{(2)}(1), ..., f^{(j_1−1)}(1)} ∩ {i_1, f(i_1), ..., f^{(j_2−1)}(i_1)} = ∅.
So, the above process needs to be repeated at most n times to get all the disjoint cycles. Thus,
the required result follows.
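The algorithm in this proof is short enough to state as a Python sketch (the function name is mine; the example permutation is the one used in the remark that follows):

def disjoint_cycles(f):
    """Disjoint cycle decomposition of a permutation f of {1, ..., n},
    given as a 1-indexed mapping; cycles of length 1 are suppressed."""
    n = len(f)
    seen, cycles = set(), []
    for start in range(1, n + 1):
        if start in seen:
            continue
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = f[x]
        if len(cycle) > 1:
            cycles.append(tuple(cycle))
    return cycles

# 1->4, 2->2, 3->3, 4->5, 5->1, 6->9, 7->8, 8->7, 9->6
f = {1: 4, 2: 2, 3: 3, 4: 5, 5: 1, 6: 9, 7: 8, 8: 7, 9: 6}
print(disjoint_cycles(f))   # [(1, 4, 5), (6, 9), (7, 8)]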
Remark 9.2.16. Note that when one writes a permutation as product of disjoint cycles, cycles
of length 1 are suppressed so as to match Definition 9.2.11. For example, the algorithm in the
proof of Theorem 9.2.15 implies
1. Using Remark 9.2.14.3, we see that every permutation can be written as a product of the n − 1 transpositions (1, 2), (1, 3), ..., (1, n).
2. \begin{pmatrix}1 & 2 & 3 & 4 & 5\\ 1 & 4 & 3 & 5 & 2\end{pmatrix} = (1)(2, 4, 5)(3) = (2, 4, 5).
3. \begin{pmatrix}1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9\\ 4 & 2 & 3 & 5 & 1 & 9 & 8 & 7 & 6\end{pmatrix} = (1, 4, 5)(2)(3)(6, 9)(7, 8) = (1, 4, 5)(6, 9)(7, 8).
Note that Id_3 = (1, 2)(1, 2) = (1, 2)(2, 3)(1, 2)(1, 3) as well. The question arises: is it possible to write Id_n as a product of an odd number of transpositions? The next lemma answers this question in the negative.
Lemma 9.2.17. Suppose there exist transpositions f_1, ..., f_t such that
Id_n = f_1 ◦ f_2 ◦ ··· ◦ f_t,
then t is even.
Proof. We will prove the result by mathematical induction. Observe that t ≠ 1 as Id_n is not a transposition. Hence, t ≥ 2. If t = 2, we are done. So, let us assume that the result holds whenever Id_n is written as a product of fewer than t transpositions.
f = g_1 ◦ g_2 ◦ ··· ◦ g_k = h_1 ◦ h_2 ◦ ··· ◦ h_ℓ, then, as each transposition is its own inverse,
Id_n = g_1 ◦ g_2 ◦ ··· ◦ g_k ◦ h_ℓ ◦ h_{ℓ−1} ◦ ··· ◦ h_1.
Hence, by Lemma 9.2.17, k + ℓ is even. Thus, either k and ℓ are both even or both odd.
Definition 9.2.20. Observe that if f and g are both even or both odd permutations, then f ◦ g and g ◦ f are both even. Whereas, if one of them is odd and the other even, then f ◦ g and g ◦ f are both odd. We use this to define a function sgn : Sn → {1, −1}, called the signature of a permutation, by
sgn(f) = \begin{cases} 1 & \text{if } f \text{ is an even permutation,}\\ −1 & \text{if } f \text{ is an odd permutation.}\end{cases}
Example 9.2.21. Consider the set Sn . Then,
3. using Definition 9.2.20, sgn(f ◦ g) = sgn(f) · sgn(g) for any two permutations f, g ∈ Sn.
T
Definition 9.2.22. Let A = [aij ] be an n × n matrix with complex entries. Then, the deter-
DR
Observe that det(A) is a scalar quantity. Even though the expression for det(A) seems complicated at first glance, it is very helpful in proving the results related with “properties of determinant”. We will do so in the next section. As another example, we verify that this definition also matches for 3 × 3 matrices. So, let A = [a_{ij}] be a 3 × 3 matrix. Then, using Equation (9.2.2),
det(A) = \sum_{σ∈S_3} sgn(σ)\prod_{i=1}^{3} a_{iσ(i)}
= sgn(f_1)\prod_{i=1}^{3} a_{if_1(i)} + sgn(f_2)\prod_{i=1}^{3} a_{if_2(i)} + sgn(f_3)\prod_{i=1}^{3} a_{if_3(i)} + sgn(f_4)\prod_{i=1}^{3} a_{if_4(i)} + sgn(f_5)\prod_{i=1}^{3} a_{if_5(i)} + sgn(f_6)\prod_{i=1}^{3} a_{if_6(i)}
= a_{11}a_{22}a_{33} − a_{11}a_{23}a_{32} − a_{12}a_{21}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} − a_{13}a_{22}a_{31}.
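For small matrices, Equation (9.2.2) can be evaluated directly; the following Python sketch (my own illustration; numpy is used only for the cross-check) computes the sign of each permutation from its number of inversions, as in Definition 9.2.9.

from itertools import permutations
import numpy as np

def det_by_permutations(A):
    """det(A) straight from the permutation-sum definition (small n only)."""
    n = A.shape[0]
    total = 0.0
    for sigma in permutations(range(n)):
        inv = sum(1 for i in range(n) for j in range(i + 1, n)
                  if sigma[i] > sigma[j])
        sgn = -1 if inv % 2 else 1
        total += sgn * np.prod([A[i, sigma[i]] for i in range(n)])
    return total

A = np.array([[1.0, 2, 3], [0, 4, 5], [1, 0, 6]])
print(det_by_permutations(A), np.linalg.det(A))   # both give 22.0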
5. Let B and C be two n×n matrices. If there exists m ∈ [n] such that B[i, :] = C[i, :] = A[i, :]
for all i 6= m and C[m, :] = A[m, :] + B[m, :] then det(C) = det(A) + det(B).
7. If A is a triangular matrix then det(A) = a11 · · · ann , the product of the diagonal entries.
Proof. Part 1: Note that each sum in det(A) contains one entry from each row. So, each sum
has an entry from A[i, :] = 0T . Hence, each sum in itself is zero. Thus, det(A) = 0.
Part 2: By assumption, B[k, :] = A[k, :] for k ≠ i and B[i, :] = cA[i, :]. So,
det(B) = \sum_{σ∈S_n} sgn(σ)\prod_{k≠i} b_{kσ(k)}\, b_{iσ(i)} = \sum_{σ∈S_n} sgn(σ)\prod_{k≠i} a_{kσ(k)}\, c\,a_{iσ(i)} = c\sum_{σ∈S_n} sgn(σ)\prod_{k=1}^{n} a_{kσ(k)} = c\det(A).
Part 3: Let τ = (i, j). Then, sgn(τ) = −1, by Lemma 9.2.8, S_n = {σ ◦ τ : σ ∈ S_n} and
det(B) = \sum_{σ∈S_n} sgn(σ)\prod_{i=1}^{n} b_{iσ(i)} = \sum_{σ◦τ∈S_n} sgn(σ ◦ τ)\prod_{i=1}^{n} b_{i,(σ◦τ)(i)}
= \sum_{σ◦τ∈S_n} sgn(τ)\cdot sgn(σ)\prod_{k≠i,j} b_{kσ(k)}\, b_{i(σ◦τ)(i)}\, b_{j(σ◦τ)(j)}
= sgn(τ)\sum_{σ∈S_n} sgn(σ)\prod_{k≠i,j} b_{kσ(k)}\, b_{iσ(j)}\, b_{jσ(i)} = −\sum_{σ∈S_n} sgn(σ)\prod_{k=1}^{n} a_{kσ(k)}
= −\det(A).
Part 4: As A[i, :] = A[j, :], A = Eij A. Hence, by Part 3, det(A) = − det(A). Thus, det(A) = 0.
Part 5: By assumption, C[i, :] = B[i, :] = A[i, :] for i ≠ m and C[m, :] = B[m, :] + A[m, :]. So,
det(C) = \sum_{σ∈S_n} sgn(σ)\prod_{i=1}^{n} c_{iσ(i)} = \sum_{σ∈S_n} sgn(σ)\prod_{i≠m} c_{iσ(i)}\, c_{mσ(m)}
= \sum_{σ∈S_n} sgn(σ)\prod_{i≠m} c_{iσ(i)}\,(a_{mσ(m)} + b_{mσ(m)})
= \sum_{σ∈S_n} sgn(σ)\prod_{i=1}^{n} a_{iσ(i)} + \sum_{σ∈S_n} sgn(σ)\prod_{i=1}^{n} b_{iσ(i)} = \det(A) + \det(B).
Part 6: By assumption, B[k, :] = A[k, :] for k ≠ i and B[i, :] = A[i, :] + cA[j, :]. So,
det(B) = \sum_{σ∈S_n} sgn(σ)\prod_{k=1}^{n} b_{kσ(k)} = \sum_{σ∈S_n} sgn(σ)\prod_{k≠i} b_{kσ(k)}\, b_{iσ(i)}
= \sum_{σ∈S_n} sgn(σ)\prod_{k≠i} a_{kσ(k)}\,(a_{iσ(i)} + c\,a_{jσ(i)})
= \sum_{σ∈S_n} sgn(σ)\prod_{k≠i} a_{kσ(k)}\, a_{iσ(i)} + c\sum_{σ∈S_n} sgn(σ)\prod_{k≠i} a_{kσ(k)}\, a_{jσ(i)}
= \sum_{σ∈S_n} sgn(σ)\prod_{k=1}^{n} a_{kσ(k)} + c \cdot 0 = \det(A),
where the second sum is zero as it is the determinant of a matrix whose i-th and j-th rows are equal (use Part 4).
Part 7: Observe that if σ ∈ S_n and σ ≠ Id_n then n(σ) ≥ 1. Thus, for every σ ≠ Id_n, there exist m, m' ∈ [n] (depending on σ) such that m > σ(m) and m' < σ(m'). So, if A is triangular, then a_{mσ(m)} = 0 or a_{m'σ(m')} = 0. So, for each σ ≠ Id_n, \prod_{i=1}^{n} a_{iσ(i)} = 0. Hence, det(A) = \prod_{i=1}^{n} a_{ii} and the result follows.
Part 8: Using Part 7, det(I_n) = 1. By definition, E_{ij} = E_{ij}I_n, E_i(c) = E_i(c)I_n and E_{ij}(c) = E_{ij}(c)I_n, for c ≠ 0. Thus, using Parts 2, 3 and 6, we get det(E_i(c)) = c, det(E_{ij}) = −1 and det(E_{ij}(c)) = 1. Also, again using Parts 2, 3 and 6, we get det(EA) = det(E) det(A).
Part 9: Suppose A is invertible. Then, by Theorem 2.7.1, A = E1 · · · Ek , for some elementary
matrices E1 , . . . , Ek . So, a repeated application of Part 8 implies det(A) = det(E1 ) · · · det(Ek ) 6=
0 as det(Ei ) 6= 0 for 1 ≤ i ≤ k.
Now, suppose that det(A) ≠ 0. We need to show that A is invertible. On the contrary, assume that A is not invertible. Then, by Theorem 2.7.1, Rank(A) < n. So, by Proposition 2.4.9, there exist elementary matrices E_1, ..., E_k such that E_1 ··· E_kA = \begin{bmatrix} B\\ 0\end{bmatrix}. Therefore, Part 1 and a repeated application of Part 8 give
det(E_1) ··· det(E_k) det(A) = det(E_1 ··· E_kA) = det\left(\begin{bmatrix} B\\ 0\end{bmatrix}\right) = 0,
a contradiction as det(E_i) ≠ 0 for each i and det(A) ≠ 0. Hence, A is invertible.
In case A is not invertible, by Part 9, det(A) = 0. Also, AB is not invertible (if AB were invertible, then A would be invertible by the rank argument). So, again by Part 9, det(AB) = 0. Thus, det(AB) = det(A) det(B).
Part 11: Let B = [b_{ij}] = A^T. Then, b_{ij} = a_{ji}, for 1 ≤ i, j ≤ n. By Lemma 9.2.8, we know that S_n = {σ^{-1} : σ ∈ S_n}. As σ ◦ σ^{-1} = Id_n, sgn(σ) = sgn(σ^{-1}). Hence,
det(B) = \sum_{σ∈S_n} sgn(σ)\prod_{i=1}^{n} b_{iσ(i)} = \sum_{σ∈S_n} sgn(σ)\prod_{i=1}^{n} a_{σ(i),i} = \sum_{σ^{-1}∈S_n} sgn(σ^{-1})\prod_{i=1}^{n} a_{iσ^{-1}(i)} = \det(A).
Remark 9.3.2. 1. As det(A) = det(AT ), we observe that in Theorem 9.3.1, the condition
on “row” can be replaced by the condition on “column”.
2. Let A = [aij ] be a matrix satisfying a1j = 0, for 2 ≤ j ≤ n. Let B = A(1 | 1), the submatrix
of A obtained by removing the first row and the first column. Then det(A) = a11 det(B).
Proof: Let σ ∈ Sn with σ(1) = 1. Then, σ has a cycle (1). So, a disjoint cycle represen-
tation of σ only has numbers {2, 3, . . . , n}. That is, we can think of σ as an element of
Sn−1 . Hence,
det(A) = \sum_{σ∈S_n} sgn(σ)\prod_{i=1}^{n} a_{iσ(i)} = \sum_{σ∈S_n, σ(1)=1} sgn(σ)\prod_{i=1}^{n} a_{iσ(i)}
= a_{11}\sum_{σ∈S_n, σ(1)=1} sgn(σ)\prod_{i=2}^{n} a_{iσ(i)} = a_{11}\sum_{σ∈S_{n−1}} sgn(σ)\prod_{i=1}^{n−1} b_{iσ(i)} = a_{11}\det(B).
We now relate this definition of determinant with the one given in Definition 2.8.1.
Theorem 9.3.3. Let A be an n × n matrix. Then, det(A) = \sum_{j=1}^{n} (−1)^{1+j} a_{1j}\det A(1 | j), where recall that A(1 | j) is the submatrix of A obtained by removing the 1st row and the j-th column.
Proof. For 1 ≤ j ≤ n, define an n × n matrix B_j whose first row is (0, ..., 0, a_{1j}, 0, ..., 0), with a_{1j} at the j-th place, and whose remaining rows are the corresponding rows of A, i.e., B_j[i, :] = A[i, :] for i ≥ 2. Also, for
each matrix Bj , we define the n × n matrix Cj by
1. Cj [:, 1] = Bj [:, j],
2. Cj [:, i] = Bj [:, i − 1], for 2 ≤ i ≤ j and
3. Cj [:, k] = Bj [:, k] for k ≥ j + 1.
Also, observe that Bj ’s have been defined to satisfy B1 [1, :] + · · · + Bn [1, :] = A[1, :] and
Bj [i, :] = A[i, :] for all i ≥ 2 and 1 ≤ j ≤ n. Thus, by Theorem 9.3.1.5,
det(A) = \sum_{j=1}^{n} \det(B_j).  (9.3.1)
Let us now compute det(Bj ), for 1 ≤ j ≤ n. Note that Cj = E12 E23 · · · Ej−1,j Bj , for 1 ≤ j ≤ n.
Then, by Theorem 9.3.1.3, we get det(Bj ) = (−1)j−1 det(Cj ). So, using Remark 9.3.2.2 and
Theorem 9.3.1.2 and Equation (9.3.1), we have
det(A) = \sum_{j=1}^{n} (−1)^{j−1}\det(C_j) = \sum_{j=1}^{n} (−1)^{j+1} a_{1j}\det A(1 | j).
Thus, we have shown that the determinant given in Definition 2.8.1 agrees with the one defined in Definition 9.2.22.
9.4 Dimension of W1 + W2
Theorem 9.4.1. Let V be a finite dimensional vector space over F and let W1 and W2 be two subspaces of V. Then,
dim(W1 + W2) = dim(W1) + dim(W2) − dim(W1 ∩ W2).
Proof. Let B = {u_1, ..., u_r} be a basis of W1 ∩ W2. Extend it to a basis B_1 = {u_1, ..., u_r, w_1, ..., w_s} of W1 and to a basis B_2 = {u_1, ..., u_r, v_1, ..., v_t} of W2, and put D = B_1 ∪ B_2 = {u_1, ..., u_r, w_1, ..., w_s, v_1, ..., v_t}. We claim that
1. D is linearly independent, and
2. LS(D) = W1 + W2.
The second part can be easily verified. For the first part, consider the linear system
α_1u_1 + ··· + α_ru_r + β_1w_1 + ··· + β_sw_s + γ_1v_1 + ··· + γ_tv_t = 0  (9.4.2)
in the unknowns α_i, β_j, γ_k. Rewriting, we get
α_1u_1 + ··· + α_ru_r + β_1w_1 + ··· + β_sw_s = −(γ_1v_1 + ··· + γ_tv_t).
Then, v := −(γ_1v_1 + ··· + γ_tv_t) = \sum_{j=1}^{r} α_ju_j + \sum_{k=1}^{s} β_kw_k ∈ LS(B_1) = W1. Also, v = −\sum_{i=1}^{t} γ_iv_i ∈ W2.
Hence, v ∈ W1 ∩ W2 and therefore, there exist scalars δ_1, ..., δ_r such that v = \sum_{j=1}^{r} δ_ju_j. Substituting this representation of v, we get
δ_1u_1 + ··· + δ_ru_r + γ_1v_1 + ··· + γ_tv_t = 0,
which is a linear combination of the vectors of B_2. As B_2 is linearly independent, γ_i = 0 for 1 ≤ i ≤ t. Substituting γ_i = 0 in Equation (9.4.2), we are left with a linear combination of the vectors of B_1, and hence α_j = 0 for 1 ≤ j ≤ r and β_k = 0 for 1 ≤ k ≤ s. Hence, we see that the linear system of Equations (9.4.2) has no nonzero solution. Therefore, the set D is linearly independent and the set D is indeed a basis of W1 + W2. We now count the vectors in the sets B, B_1, B_2 and D to get the required result.
Theorem 9.5.1. Let V be a real vector space. A norm ‖·‖ is induced by an inner product if and only if, for all x, y ∈ V, the norm satisfies the parallelogram law
‖x + y‖^2 + ‖x − y‖^2 = 2‖x‖^2 + 2‖y‖^2.
Proof. Suppose that ‖·‖ is indeed induced by an inner product. Then, by Exercise 5.2.7.3 the result follows.
So, let us assume that ‖·‖ satisfies the parallelogram law. So, we need to define an inner product. We claim that the function f : V × V → R defined by
f(x, y) = \frac{1}{4}\left(‖x + y‖^2 − ‖x − y‖^2\right), for all x, y ∈ V,
satisfies the required conditions for an inner product. So, let us proceed to do so.
Step 1: Clearly, for each x ∈ V, f(x, 0) = 0 and f(x, x) = \frac{1}{4}‖x + x‖^2 = ‖x‖^2. Thus, f(x, x) ≥ 0. Further, f(x, x) = 0 if and only if x = 0.
Step 3: Now note that ‖x + y‖^2 − ‖x − y‖^2 = 2(‖x + y‖^2 − ‖x‖^2 − ‖y‖^2). Or equivalently,
f(x, y) = \frac{1}{2}(‖x + y‖^2 − ‖x‖^2 − ‖y‖^2).  (9.5.2)
Now, substituting z = 0 in Equation (9.5.3) and using Equation (9.5.2), we get 2f(x, y) = f(x, 2y) and hence 4f(x + z, y) = 2f(x + z, 2y) = 4(f(x, y) + f(z, y)). Thus,
f(x + z, y) = f(x, y) + f(z, y).  (9.5.4)
Step 4: Using Equation (9.5.4), f(x, y) = f(y, x) and the principle of mathematical induction, it follows that nf(x, y) = f(nx, y), for all x, y ∈ V and n ∈ N. Another application of Equation (9.5.4) with f(0, y) = 0 implies that nf(x, y) = f(nx, y), for all x, y ∈ V and n ∈ Z. Also, for m ≠ 0,
m f\left(\frac{n}{m}x, y\right) = f\left(m \cdot \frac{n}{m}x, y\right) = f(nx, y) = nf(x, y).
Hence, we see that for all x, y ∈ V and a ∈ Q, f(ax, y) = af(x, y).
A curve is a continuous function f : [a, b] → C, where [a, b] ⊆ R.
1. If the function f is one-one on [a, b) and also on (a, b], then it is called a simple curve.
2. If f (b) = f (a), then it is called a closed curve.
3. A closed simple curve is called a Jordan curve.
4. The derivative (integral) of a curve f = u+iv is defined component wise. If f 0 is continuous
on [a, b], we say f is a C 1 -curve (at end points we consider one sided derivatives and
continuity).
5. A C 1 -curve on [a, b] is called a smooth curve, if f 0 is never zero on (a, b).
6. A piecewise smooth curve is called a contour.
7. A simple closed curve is said to be positively oriented if, while traveling on it, the interior of the curve always stays to the left. (Camille Jordan proved that such a curve always divides the plane into two connected regions, one of which is called the bounded region and the other the unbounded region. The bounded one is considered as the interior of the curve.)
Theorem 9.6.2. [Rouche’s Theorem] Let C be a positively oriented simple closed contour.
Also, let f and g be two analytic functions on RC , the union of the interior of C and the curve
C itself. Assume also that |f (x)| > |g(x)|, for all x ∈ C. Then, f and f + g have the same
number of zeros in the interior of C.
Corollary 9.6.3. [Alen Alexanderian, The University of Texas at Austin, USA.] Let P(t) = t^n + a_{n−1}t^{n−1} + ··· + a_0 have distinct roots λ_1, ..., λ_m with multiplicities α_1, ..., α_m, respectively. Take any ε > 0 for which the balls B_ε(λ_i) are disjoint. Then, there exists a δ > 0 such that the polynomial q(t) = t^n + a'_{n−1}t^{n−1} + ··· + a'_0 has exactly α_i roots (counting with multiplicities) in B_ε(λ_i), whenever |a_j − a'_j| < δ.
Hence, by Rouche's theorem, P(z) and q(z) have the same number of zeros inside C_j, for each j = 1, ..., m. That is, the zeros of q(t) are within the ε-neighborhood of the zeros of P(t).
As a direct application, we obtain the following corollary.
opposite to a0 ; am2 is the first after am1 with sign opposite to am1 ; and so on.
maximum number of positive roots of P (x) = 0 is the number of changes in sign of the
coefficients and that the maximum number of negative roots is the number of sign changes
in P (−x) = 0.
Proof. Assume that a0 , a1 , · · · , an has k > 0 sign changes. Let b > 0. Then, the coeffi-
cients of (x − b)P (x) are
This list has at least k + 1 changes of signs. To see this, assume that a0 > 0 and an 6= 0.
Let the sign changes of ai occur at m1 < m2 < · · · < mk . Then, setting
we see that ci > 0 when i is even and ci < 0, when i is odd. That proves the claim.
Now, assume that P (x) = 0 has k positive roots b1 , b2 , · · · , bk . Then,
Proof. Proof of Part 1: By the spectral theorem (see Theorem 6.4.10), there exists a unitary matrix U such that A = UDU^∗, where D = diag(λ_1(A), ..., λ_n(A)) is a real diagonal matrix. Thus, the set {U[:, 1], ..., U[:, n]} is a basis of C^n. Hence, for each x ∈ C^n, there exist scalars α_i such that x = \sum_i α_iU[:, i]. So, note that x^∗x = \sum_i |α_i|^2 and
λ_1(A)x^∗x = λ_1(A)\sum_i |α_i|^2 ≤ \sum_i |α_i|^2 λ_i(A) = x^∗Ax ≤ λ_n(A)\sum_i |α_i|^2 = λ_n(A)x^∗x.
For Part 2 and Part 3, take x = U [:, 1] and x = U (:, n), respectively.
As an immediate corollary, we state the following result.
Proof. Let x ∈ C^n be such that x is orthogonal to U[:, 1], ..., U[:, k − 1]. Then, we can write x = \sum_{i=k}^{n} α_iU[:, i], for some scalars α_i. In that case,
i=k
n
X n
X
λk x∗ x = λk |αi |2 ≤ |αi |2 λi = x∗ Ax
i=k i=k
and the equality occurs for x = U [:, k]. Thus, the required result follows.
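A quick numerical check of these Rayleigh-quotient bounds (a sketch with a random Hermitian matrix; numpy is assumed):

import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = (M + M.conj().T) / 2                       # Hermitian
lam = np.linalg.eigvalsh(A)                    # sorted: lam[0] <= ... <= lam[-1]

for _ in range(1000):
    x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
    r = (x.conj() @ A @ x).real / (x.conj() @ x).real
    assert lam[0] - 1e-12 <= r <= lam[-1] + 1e-12
print("Rayleigh quotients stay within [lambda_min, lambda_max].")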
Hence, λ_k ≥ \max_{w_1,...,w_{k−1}} \min_{‖x‖=1,\; x⊥w_1,...,w_{k−1}} x^∗Ax, for each choice of k − 1 linearly independent vectors. But, by Proposition 9.7.3, the equality holds for the linearly independent set {U[:, 1], ..., U[:, k − 1]}, which proves the first equality. A similar argument gives the second equality and hence the proof is omitted.
Proof. As A and B are Hermitian matrices, the matrix A + B is also Hermitian. Hence, by the Courant-Fischer theorem and Lemma 9.7.1.1,
λ_k(A + B) = \max_{w_1,...,w_{k−1}} \min_{‖x‖=1,\; x⊥w_1,...,w_{k−1}} x^∗(A + B)x
and
= \max_{w_1,...,w_{k−1}} \min_{‖x‖=1,\; x⊥w_1,...,w_{k−1},z} x^∗Ax .
Theorem 9.7.6. [Cauchy Interlacing Theorem] Let A ∈ Mn(C) be a Hermitian matrix. Define \hat{A} = \begin{bmatrix} A & y\\ y^∗ & a\end{bmatrix}, for some a ∈ R and y ∈ C^n. Then,
λ_k(\hat{A}) ≤ λ_k(A) ≤ λ_{k+1}(\hat{A}), for 1 ≤ k ≤ n.
In particular (the inclusion principle), if B is an r × r principal submatrix of a Hermitian matrix A ∈ Mn(C), then λ_k(A) ≤ λ_k(B) ≤ λ_{k+n−r}(A).
Theorem 9.7.8. [Poincare Separation Theorem] Let A ∈ Mn (C) be a Hermitian matrix and
{u1 , . . . , ur } ⊆ Cn be an orthonormal set for some positive integer r, 1 ≤ r ≤ n. If further
B = [bij ] is an r × r matrix with bij = u∗i Auj , 1 ≤ i, j ≤ r then, λk (A) ≤ λk (B) ≤ λk+n−r (A).
Proof. Let us extend the orthonormal set {u_1, ..., u_r} to an orthonormal basis, say {u_1, ..., u_n}, of C^n and write U = [u_1 ··· u_n]. Then, B is an r × r principal submatrix of U^∗AU. Thus, by the
inclusion principle, λk (U ∗ AU ) ≤ λk (B) ≤ λk+n−r (U ∗ AU ). But, we know that σ(U ∗ AU ) = σ(A)
and hence the required result follows.
The proof of the next result is left for the reader.
Corollary 9.7.9. Let A ∈ Mn (C) be a Hermitian matrix and r be a positive integer with
1 ≤ r ≤ n. Then,
Now assume that x^∗Ax > 0 holds for each nonzero x ∈ W and that λ_{n−k+1} = 0. Then, it follows that \min_{‖x‖=1,\; x⊥x_1,...,x_{n−k}} x^∗Ax = 0. Now, define f : C^n → C by f(x) = x^∗Ax. Then, f is a continuous function and \min_{‖x‖=1,\; x∈W} f(x) = 0. Thus, f must attain its bound on the unit sphere. That is, there exists y ∈ W with ‖y‖ = 1 such that y^∗Ay = 0, a contradiction. Thus, the required result follows.
Index
Subspace
Vector
Column, 9
Coordinate, 122
Row, 9
Unit, 22
Vector Space, 72
Basis, 92
Complex, 72
Complex n-tuple, 73
Dimension of M + N , 245
Finite Dimensional, 79
Infinite Dimensional, 79
Inner Product, 133
Isomorphic, 120
Minimal spanning set, 93
Real, 72
Real n-tuple, 73
Subspace, 75
Vector Subspace, 75
Vectors
Angle, 136
Length, 135
Linear Combination, 78
Linear Dependence, 84
Linear Independence, 84
Linear Span, 79
Mutually Orthogonal, 141
Norm, 135
Orthogonal, 139
Orthonormal, 141
Zero matrix, 10
Zero Operator, 105
Zero Transformation, 106