
Linear Algebra

and Differential Equations


using MATLAB
June 12, 2020

by Martin Golubitsky and Michael Dellnitz

This document was typeset on Friday 12th June, 2020.


Copyright © 1998 Martin Golubitsky and Michael Dellnitz
This work is licensed under the Creative Commons Attribution-ShareAlike License.
To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/4.0/
The cover photograph was taken by Ben Scumin and is licensed under a CC BY-SA license.
If you distribute this work or a derivative, include the history of the document. The source
code is available at:
http://github.com/mooculus/laode/
This book is typeset using LaTeX and the STIX and Gillius fonts.
This book uses the XIMERA document class.
We will be glad to receive corrections and suggestions for improvement at
[email protected]
Contents

Preface  i

1 Preliminaries  1
  1.1 Vectors and Matrices  2
  1.2 MATLAB  6
  1.3 Special Kinds of Matrices  11
  1.4 The Geometry of Vector Operations  16

2 Solving Linear Equations  27
  2.1 Systems of Linear Equations and Matrices  28
  2.2 The Geometry of Low-Dimensional Solutions  41
  2.3 Gaussian Elimination  56
  2.4 Reduction to Echelon Form  78
  2.5 Linear Equations with Special Coefficients  92
  2.6 Uniqueness of Reduced Echelon Form  101

3 Matrices and Linearity  103
  3.1 Matrix Multiplication of Vectors  104
  3.2 Matrix Mappings  116
  3.3 Linearity  127
  3.4 The Principle of Superposition  142
  3.5 Composition and Multiplication of Matrices  149
  3.6 Properties of Matrix Multiplication  157
  3.7 Solving Linear Systems and Inverses  164
  3.8 Determinants of 2 × 2 Matrices  180

4 Solving Linear Differential Equations  189
  4.1 A Single Differential Equation  190
  4.2 *Rate Problems  198
  4.3 Uncoupled Linear Systems of Two Equations  208
  4.4 Coupled Linear Systems  218
  4.5 The Initial Value Problem and Eigenvectors  227
  4.6 Eigenvalues of 2 × 2 Matrices  242
  4.7 Initial Value Problems Revisited  251
  4.8 *Markov Chains  269

5 Vector Spaces  285
  5.1 Vector Spaces and Subspaces  286
  5.2 Construction of Subspaces  298
  5.3 Spanning Sets and MATLAB  311
  5.4 Linear Dependence and Linear Independence  319
  5.5 Dimension and Bases  329
  5.6 The Proof of the Main Theorem  341

6 Closed Form Solutions for Planar ODEs  355
  6.1 The Initial Value Problem  356
  6.2 Closed Form Solutions by the Direct Method  367
  6.3 Similar Matrices and Jordan Normal Form  379
  6.4 Sinks, Saddles, and Sources  390
  6.5 *Matrix Exponentials  403
  6.6 *The Cayley Hamilton Theorem  419
  6.7 *Second Order Equations  426

7 Determinants and Eigenvalues  437
  7.1 Determinants  438
  7.2 Eigenvalues  458
  7.3 Real Diagonalizable Matrices  472
  7.4 *Existence of Determinants  481

8 Linear Maps and Changes of Coordinates  486
  8.1 Linear Mappings and Bases  487
  8.2 Row Rank Equals Column Rank  501
  8.3 Vectors and Matrices in Coordinates  510
  8.4 *Matrices of Linear Maps on a Vector Space  525

9 Least Squares  534
  9.1 Least Squares Approximations  535
  9.2 Least Squares Fitting of Data  539

10 Orthogonality  552
  10.1 Orthonormal Bases and Orthogonal Matrices  553
  10.2 Gram-Schmidt Orthonormalization Process  560
  10.3 The Spectral Theory of Symmetric Matrices  566
  10.4 *QR Decompositions  573

11 *Matrix Normal Forms  582
  11.1 Simple Complex Eigenvalues  583
  11.2 Multiplicity and Generalized Eigenvectors  598
  11.3 The Jordan Normal Form Theorem  608
  11.4 *Markov Matrix Theory  625
  11.5 *Proof of Jordan Normal Form  630

12 Matlab Commands  634

Index  639
Preface

These notes provide an integrated approach to linear algebra and ordinary differential equations based on computers — in this case the software package MATLAB®.¹ We believe that computers can improve the conceptual understanding of mathematics — not just enable the completion of complicated calculations. We use computers in two ways: in linear algebra computers reduce the drudgery of calculations and enable students to focus on concepts and methods, while in differential equations computers display phase portraits graphically and enable students to focus on the qualitative information embodied in solutions rather than just on developing formulas for solutions.

¹ MATLAB is a registered trademark of The MathWorks Inc., Natick, MA.

We develop methods for solving both systems of linear equations and systems of (constant coefficient) linear ordinary differential equations. It is generally accepted that linear algebra methods aid in finding closed form solutions to systems of linear differential equations. The fact that the graphical solution of systems of differential equations can motivate concepts (both geometric and algebraic) in linear algebra is less often discussed. These notes begin by solving linear systems of equations (through standard Gaussian elimination theory) and discussing elementary matrix theory. We then introduce simple differential equations — both single equations and planar systems — to motivate the notions of eigenvectors and eigenvalues. In subsequent chapters linear algebra and ODE theory are often mixed.

Regarding differential equations, our purpose is to introduce at the sophomore–junior level ideas from dynamical systems theory. We focus on phase portraits (and time series) rather than on techniques for finding closed form solutions. We assume that now and in the future practicing scientists and mathematicians will use ODE solving computer programs more frequently than they will use techniques of integration. For this reason we have focused on the information that is embedded in the computer graphical approach. We discuss both typical phase portraits (Morse-Smale systems) and typical one parameter bifurcations (both local and global). Our goal is to provide the mathematical background that is needed when interpreting the results of computer simulation.

The integration of computers: Our approach assumes that students have an easier time learning with computers if the computer segments are fully integrated with the course material. So we have interleaved the instructions on how to use MATLAB with the examples and theory in the text. With ease of use in mind, we have also provided a number of preloaded matrices and differential equations with the notes. Any equation label in this text that is followed by an asterisk can be loaded into MATLAB just by typing the formula number. For the successful use of this text, it is important that students have access to computers with MATLAB and the computer files associated with these notes.

John Polking has developed an excellent graphical user interface for solving planar systems of autonomous differential equations called pplane9. We use pplane9 instead of using the MATLAB native commands for solving ODEs. In these notes we also provide an introduction to pplane9 and the other associated software routines.

For the most part we treat the computer as a black box. We have not attempted to explain how the computer, or more precisely MATLAB, performs computations. Linear algebra structures are developed (typically) with proofs, while differential equations theorems are presented (typically) without proof and are instead motivated by computer experimentation.

There are two types of exercises included with most sections — those that should be completed using pencil and paper (called Hand Exercises) and those that should be completed with the assistance of computers (called Computer Exercises).

Ways to use the text: We envision this course as a one-year sequence replacing the standard one semester linear algebra and ODE courses. There is a natural one semester Linear Systems course that can be taught using the material in this book. In this course students will learn both the basics of linear algebra and the basics of linear systems of differential equations. This one semester course covers the material in the first eight chapters. The Linear Systems course stresses eigenvalues and a baby Jordan normal form theory for 2 × 2 matrices and culminates in a classification of phase portraits for planar constant coefficient linear systems of differential equations. Time permitting, additional linear algebra topics from Chapters 9 and 10 may be included. Such material includes changes of coordinates for linear mappings, and orthogonality including Gram-Schmidt orthonormalization and least squares fitting of data.

We believe that by being exposed to ODE theory a student taking just the first semester of this sequence will gain a better appreciation of linear algebra than will a student who takes a standard one semester introduction to linear algebra. However, a more traditional Linear Algebra course can be taught by omitting Chapter 7 and de-emphasizing some of the material in Chapter 6. Then there will be time in a one semester course to cover a selection of the linear algebra topics mentioned at the end of the previous paragraph.

Chapters 1–3  We consider the first two chapters to be introductory material and we attempt to cover this material as quickly as we can. Chapter 1 introduces MATLAB along with elementary remarks on vectors and matrices. In our course we ask the students to read the material in Chapter 1 and to use the computer instructions in that chapter as an entry into MATLAB. In class we cover only the material on dot product. Chapter 2 explains how to solve systems of linear equations and is required for a first course on linear algebra. The proof of the uniqueness of reduced echelon form matrices is not very illuminating for students and can be omitted in classroom discussion. Sections whose material we feel can be omitted are noted by asterisks in the Table of Contents and Section 2.6 is the first example of such a section.

In Chapter 3 we introduce matrix multiplication as a notation that simplifies the presentation of systems of linear equations. We then show how matrix multiplication leads to linear mappings and how linearity leads to the principle of superposition. Multiplication of matrices is introduced as composition of linear mappings, which makes transparent the observation that multiplication of matrices is associative. The chapter ends with a discussion of inverse matrices and the role that inverses play in solving systems of linear equations. The determinant of a 2 × 2 matrix is introduced and its role in determining matrix inverses is emphasized.

Chapter 4  This chapter provides a nonstandard introduction to differential equations. We begin by emphasizing that solutions to differential equations are functions (or pairs of functions for planar systems). We explain in detail the two ways that we may graph solutions to differential equations (time series and phase space) and how to go back and forth between these two graphical representations. The use of the computer is mandatory in this chapter. Chapter 4 dwells on the qualitative theory of solutions to autonomous ordinary differential equations. In one dimension we discuss the importance of knowing equilibria and their stability so that we can understand the fate of all solutions. In two dimensions we emphasize constant coefficient linear systems and the existence (numerical) of invariant directions (eigendirections). In this way we motivate the introduction of eigenvalues and eigenvectors, which are discussed in detail for 2 × 2 matrices. Once we know how to compute eigenvalues and eigendirections, we then show how this information coupled with superposition leads to closed form solutions to initial value problems, at least when the eigenvalues are real and distinct.

We are not trying to give a thorough grounding in techniques for solving differential equations in Chapter 4; rather we are trying to give an introduction to the ways that modern computer programs will represent graphically solutions to differential equations. We have included, however, a section on separation of variables for those who wish to introduce techniques for finding closed form solutions to single differential equations at this time. Our preference is to omit this section in the Linear Systems course as well as to omit the applications in Section 4.2 of the linear growth model in one dimension to interest rates and population dynamics.

Chapter 5  In this chapter we introduce vector space theory: vector spaces, subspaces, spanning sets, linear independence, bases, dimensions and the other basic notions in linear algebra. Since solutions to differential equations naturally reside in function spaces, we are able to illustrate that vector spaces other than Rn arise naturally. We have found that, depending on time, the proof of the main theorem, which appears in Section 5.6, may be omitted in a first course. The material in these chapters is mandatory in any first course on linear algebra.

Chapter 6  At this juncture the text divides into two tracks: one concerned with the qualitative theory of solutions to linear and nonlinear planar systems of differential equations and one mainly concerned with the development of higher dimensional linear algebra. We begin with a description of the differential equations chapters.

Chapter 6 describes closed form solutions to planar systems of constant coefficient linear differential equations in two different ways: a direct method based on eigenvalues and eigenvectors and a related method based on similarity of matrices. Each method has its virtues and vices. Note that the Jordan normal form theorem for 2 × 2 matrices is proved when discussing how to solve linear planar systems using similarity of matrices.

Chapters 7, 8, 10, and 11  Chapter 7 discusses determinants, characteristic polynomials, and eigenvalues for n × n matrices. Chapter 8 presents more advanced material on linear mappings including row rank equals column rank and the matrix representation of mappings in different coordinate systems. The material in Sections 8.1 and 8.2 could be presented directly after Chapter 5, while the material in Section 8.3 explains the geometric meaning of similarity.

Orthogonal bases and orthogonal matrices, least squares and Gram-Schmidt orthonormalization, and symmetric matrices are presented in Chapter 10. This material is very important, but is not required later in the text, and may be omitted.

The Jordan normal form theorem for n × n matrices is presented in Chapter 11. Diagonalization of matrices with distinct real and complex eigenvalues is presented in the first two sections. The appendices, including the proof of the complete Jordan normal form theorem, are included for completeness and should be omitted in classroom presentations.

The Classroom Use of Computers  At the University of Houston we use a classroom with an IBM compatible PC and an overhead display. Lectures are presented three hours a week using a combination of blackboard and computer display. We find it inadvisable to use the computer for more than five minutes at a time; we tend to go back and forth between standard lecture style and computer presentations. (The preloaded matrices and differential equations are important to the smooth use of the computer in class.)

We ask students to enroll in a one hour computer lab where they can practice using the material in the text on a computer, do their homework and additional projects, and ask questions of TA's. Our computer lab happens to have 15 power macs. In addition, we ensure that MATLAB and the laode files are available on student use computers around the campus (which is not always easy). The laode files are on the enclosed CDROM; they may also be downloaded by using a web browser or by anonymous ftp.

Acknowledgements  This course was first taught on a pilot basis during the 1995–96 academic year at the University of Houston. We thank the Mathematics Department and the College of Natural Sciences and Mathematics of the University of Houston for providing the resources needed to bring a course such as this to fruition. We gratefully acknowledge John Polking's help in adapting his software for our use and for allowing us access to his code so that we could write companion software for use in linear algebra.

We thank Denny Brown for his advice and his careful readings of the many drafts of this manuscript. We thank Gerhard Dangelmayr, Michael Field, Michael Friedberg, Steven Fuchs, Kimber Gross, Barbara Keyfitz, Charles Peters and David Wagner for their advice on the presentation of the material. We also thank Elizabeth Golubitsky, who has written the companion Solutions Manual, for her help in keeping the material accessible and in a proper order. Finally, we thank the students who stayed with this course on an experimental basis and by doing so helped to shape its form.

Houston and Bayreuth        Martin Golubitsky
May, 1998                   Michael Dellnitz

Columbus                    Martin Golubitsky
February, 2018              James Fowler

1 Preliminaries
The subjects of linear algebra and differential equations involve manipulating vector equa-
tions. In this chapter we introduce our notation for vectors and matrices — and we introduce
MATLAB, a computer program that is designed to perform vector manipulations in a natural
way.
We begin, in Section 1.1, by defining vectors and matrices, and by explaining how to add
and scalar multiply vectors and matrices. In Section 1.2 we explain how to enter vectors and
matrices into MATLAB, and how to perform the operations of addition and scalar multipli-
cation in MATLAB. There are many special types of matrices; these types are introduced in
Section 1.3. In the concluding section, we introduce the geometric interpretations of vector
addition and scalar multiplication; in addition we discuss the angle between vectors through
the use of the dot product of two vectors.


{chap:prelim}
{S:1.1} 1.1 Vectors and Matrices
In their elementary form, matrices and vectors are just lists of real numbers in different
formats. An n-vector is a list of n numbers (x1 , x2 , . . . , xn ). We may write this vector as a
row vector as we have just done — or as a column vector
$$\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.$$

The set of all (real-valued) n-vectors is denoted by Rn ; so points in Rn are called vectors.
The sets Rn when n is small are very familiar sets. The set R1 = R is the real number
line, and the set R2 is the Cartesian plane. The set R3 consists of points or vectors in three
dimensional space.
An m × n matrix is a rectangular array of numbers with m rows and n columns. A general
2 × 3 matrix has the form
$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix}.$$
We use the convention that matrix entries aij are indexed so that the first subscript i refers
to the row while the second subscript j refers to the column. So the entry a21 refers to the
matrix entry in the 2nd row, 1st column.
An n × m matrix A and an n′ × m′ matrix B are equal precisely when the sizes of the
matrices are equal (n = n′ and m = m′) and when each of the corresponding entries is
equal (aij = bij).
There is some redundancy in the use of the terms “vector” and “matrix”. For example, a
row n-vector may be thought of as a 1 × n matrix, and a column n-vector may be thought
of as a n × 1 matrix. There are situations where matrix notation is preferable to vector
notation and vice-versa.

Addition and Scalar Multiplication of Vectors There are two basic operations on vectors:
addition and scalar multiplication. Let x = (x1 , . . . , xn ) and y = (y1 , . . . , yn ) be n-vectors.
Then
x + y = (x1 + y1 , . . . , xn + yn );
that is, vector addition is defined as componentwise addition.
Similarly, scalar multiplication is defined as componentwise multiplication. A scalar is just
a number. Initially, we use the term scalar to refer to a real number — but later on we


sometimes use the term scalar to refer to a complex number. Suppose r is a real number;
then the multiplication of a vector by the scalar r is defined as

rx = (rx1 , . . . , rxn ).

Subtraction of vectors is defined simply as

x − y = (x1 − y1 , . . . , xn − yn ).

Formally, subtraction of vectors may also be defined as

x − y = x + (−1)y.

Division of a vector x by a scalar r is defined to be
$$\frac{1}{r}x.$$
The standard difficulties concerning division by zero still hold.
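For example, with x = (2, 1, 3), y = (1, 1, −1), and r = 2, these definitions give x + y = (3, 2, 2), x − y = (1, 0, 4), rx = (4, 2, 6), and (1/r)x = (1, 0.5, 1.5).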

Addition and Scalar Multiplication of Matrices Similarly, we add two m × n matrices by


adding corresponding entries, and we multiply a scalar times a matrix by multiplying each
entry of the matrix by that scalar. For example,
$$\begin{pmatrix} 0 & 2 \\ 4 & 6 \end{pmatrix} + \begin{pmatrix} 1 & -3 \\ 1 & 4 \end{pmatrix} = \begin{pmatrix} 1 & -1 \\ 5 & 10 \end{pmatrix}$$
and
$$4\begin{pmatrix} 2 & -4 \\ 3 & 1 \end{pmatrix} = \begin{pmatrix} 8 & -16 \\ 12 & 4 \end{pmatrix}.$$
The main restriction on adding two matrices is that the matrices must be of the same size.
So you cannot add a 4 × 3 matrix to a 6 × 2 matrix — even though they both have twelve
entries.

Exercises

{c1.1.1A} In Exercises 1 – 3, let x = (2, 1, 3) and y = (1, 1, −1) and compute the given expression.
1. x + y.
Answer: x + y = (3, 2, 2).


{c1.1.1B}
2. 2x − 3y.

{c1.1.1C} Answer: 2x − 3y = (4, 2, 6) − (3, 3, −3) = (1, −1, 9).


3. 4x.
Answer: 4x = (8, 4, 12).
{c1.1.2}
4. Let A be the 3 × 4 matrix
$$A = \begin{pmatrix} 2 & -1 & 0 & 1 \\ 3 & 4 & -7 & 10 \\ 6 & -3 & 4 & 2 \end{pmatrix}.$$

(a) For which n is a row of A a vector in Rn?
(b) What is the 2nd column of A?
(c) Let aij be the entry of A in the ith row and the j th column. What is a23 − a31?

Answer: (a) The number of entries in a row is the number of columns. Thus, n = 4;
(b) $\begin{pmatrix} -1 \\ 4 \\ -3 \end{pmatrix}$; (c) a23 − a31 = −7 − 6 = −13.

For each of the pairs of vectors or matrices in Exercises 5 – 9, decide whether addition of the
{c1.1.3a} members of the pair is possible; and, if addition is possible, perform the addition.
5. x = (2, 1) and y = (3, −1).

{c1.1.3b} Answer: x + y = (5, 0).


6. x = (1, 2, 2) and y = (−2, 1, 4).

{c1.1.3c} Answer: x + y = (−1, 3, 6).


7. x = (1, 2, 3) and y = (−2, 1).
Answer: x has three entries; y has two entries; addition is not possible.
{c1.1.3d}
8. A = $\begin{pmatrix} 1 & 3 \\ 0 & 4 \end{pmatrix}$ and B = $\begin{pmatrix} 2 & 1 \\ 1 & -2 \end{pmatrix}$.
Answer: A + B = $\begin{pmatrix} 3 & 4 \\ 1 & 2 \end{pmatrix}$.
{c1.1.3e}
9. A = $\begin{pmatrix} 2 & 1 & 0 \\ 4 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}$ and B = $\begin{pmatrix} 2 & 1 \\ 1 & -2 \end{pmatrix}$.
Answer: Addition is not possible.


   
In Exercises 10 – 11, let A = $\begin{pmatrix} 2 & 1 \\ -1 & 4 \end{pmatrix}$ and B = $\begin{pmatrix} 0 & 2 \\ 3 & -1 \end{pmatrix}$ and compute the given expression.
{c1.1.4A}
10. 4A + B.
Answer: 4A + B = $\begin{pmatrix} 8 & 6 \\ -1 & 15 \end{pmatrix}$.
{c1.1.4B}
11. 2A − 3B.
Answer: 2A − 3B = $\begin{pmatrix} 4 & -4 \\ -11 & 11 \end{pmatrix}$.


{S:1.2} 1.2 MATLAB


We shall use MATLAB to compute addition and scalar multiplication of vectors in two
and three dimensions. This will serve the purpose of introducing some basic MATLAB
commands.

Entering Vectors and Vector Operations Begin a MATLAB session. We now discuss how
to enter a vector into MATLAB. The syntax is straightforward; to enter the row vector
x = (1, 2, 1), type

x = [1 2 1]

and MATLAB responds with

x =
1 2 1

Next we show how easy it is to perform addition and scalar multiplication in MATLAB.
Enter the row vector y = (2, −1, 1) by typing

y = [2 -1 1]

and MATLAB responds with

y =
2 -1 1

To add the vectors x and y, type

x + y

and MATLAB responds with

ans =
3 1 2
Note: MATLAB has several useful line editing features. We point out two here:
(a) Horizontal arrow keys (→, ←) move the cursor one space without deleting a character.
(b) Vertical arrow keys (↑, ↓) recall previous and next command lines.


This vector is easily checked to be the sum of the vectors x and y. Similarly, to perform a
scalar multiplication, type

2*x

which yields

ans =
2 4 2

MATLAB subtracts the vector y from the vector x in the natural way. Type

x - y

to obtain

ans =
-1 3 0

We mention two points concerning the operations that we have just performed in MATLAB.

(a) When entering a vector or a number, MATLAB automatically echoes what has been
entered. This echoing can be suppressed by appending a semicolon to the line. For
example, type

z = [-1 2 3];

and MATLAB responds with a new line awaiting a new command. To see the contents
of the vector z just type z and MATLAB responds with

z =
-1 2 3

(b) MATLAB stores in a new vector the information obtained by algebraic manipulation.
Type

a = 2*x - 3*y + 4*z;

Now type a to find


a =
-8 15 11
We see that MATLAB has created a new row vector a with the correct number of entries.

Note: In order to use the result of a calculation later in a MATLAB session, we need to
name the result of that calculation. To recall the calculation 2*x - 3*y + 4*z, we needed
to name that calculation, which we did by typing a = 2*x - 3*y + 4*z. Then we were
able to recall the result just by typing a.
We have seen that we enter a row n-vector into MATLAB by surrounding a list of n
numbers separated by spaces with square brackets. For example, to enter the 5-vector
w = (1, 3, 5, 7, 9) just type

w = [1 3 5 7 9]

Note that the addition of two vectors is only defined when the vectors have the same number
of entries. Trying to add the 3-vector x with the 5-vector w by typing x + w in MATLAB
yields the warning:

??? Error using ==> +


Matrix dimensions must agree.

In MATLAB new rows are indicated by typing ;. For example, to enter the column vector
 
$$z = \begin{pmatrix} -1 \\ 2 \\ 3 \end{pmatrix},$$
just type:

z = [-1; 2; 3]

and MATLAB responds with

z =
-1
2
3

Note that MATLAB will not add a row vector and a column vector. Try typing x + z.
Individual entries of a vector can also be addressed. For instance, to display the first
component of z type z(1).
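Individual entries can also be changed by assignment. As a brief illustrative sketch (the new value is arbitrary), continuing with the vector z above, typing

z(2) = 5

replaces the second entry of z, and MATLAB echoes the updated column vector

z =
-1
5
3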


Entering Matrices Matrices are entered into MATLAB row by row with rows separated
either by semicolons or by line returns. To enter the 2 × 3 matrix
 
$$A = \begin{pmatrix} 2 & 3 & 1 \\ 1 & 4 & 7 \end{pmatrix},$$
just type

A = [2 3 1; 1 4 7]

MATLAB has very sophisticated methods for addressing the entries of a matrix. You can
directly address individual entries, individual rows, and individual columns. To display the
entry in the 1st row, 3rd column of A, type A(1,3). To display the 2nd column of A, type
A(:,2); and to display the 1st row of A, type A(1,:). For example, to add the two rows of
A and store them in the vector x, just type

x = A(1,:) + A(2,:)

MATLAB has many operations involving matrices — these will be introduced later, as
needed.

Exercises

{c1.2.1} 1. (matlab) Enter the 3 × 4 matrix


 
$$A = \begin{pmatrix} 1 & 2 & 5 & 7 \\ -1 & 2 & 1 & -2 \\ 4 & 6 & 8 & 0 \end{pmatrix}.$$

As usual, let aij denote the entry of A in the ith row and j th column. Use MATLAB to compute
the following:

(a) a13 + a32 .


(b) Three times the 3rd column of A.
(c) Twice the 2nd row of A minus the 3rd row.
(d) The sum of all of the columns of A.
   
Answer: (a) 11; (b) $\begin{pmatrix} 15 \\ 3 \\ 24 \end{pmatrix}$; (c) 2(−1, 2, 1, −2) − (4, 6, 8, 0) = (−6, −2, −6, −4); (d) $\begin{pmatrix} 15 \\ 0 \\ 18 \end{pmatrix}$.


{c1.2.2} 2. (matlab) Verify that MATLAB adds vectors only if they are of the same type, by typing
(a) x = [1 2], y = [2; 3] and x + y.
(b) x = [1 2], y = [2 3 1] and x + y.

Answer: Typing x + y should generate a MATLAB error in both cases.

In Exercises 3 – 4, let x = (1.2, 1.4, −2.45) and y = (−2.6, 1.1, 0.65) and use MATLAB to
compute the given expression.

{c1.2.3a} 3. (matlab) 3.27x − 7.4y.


Answer: 3.27x − 7.4y = (23.1640, −3.5620, −12.8215).

{c1.2.3b} 4. (matlab) 1.65x + 2.46y.


Answer: 1.65x + 2.46y = (−4.4160, 5.0160, −2.4435).

In Exercises 5 – 6, let
$$A = \begin{pmatrix} 1.2 & 2.3 & -0.5 \\ 0.7 & -1.4 & 2.3 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} -2.9 & 1.23 & 1.6 \\ -2.2 & 1.67 & 0 \end{pmatrix}$$
and use MATLAB to compute the given expression.

{c1.2.4a} 5. (matlab) −4.2A + 3.1B.
Answer: $-4.2A + 3.1B = \begin{pmatrix} -14.0300 & -5.8470 & 7.0600 \\ -9.7600 & 11.0570 & -9.6600 \end{pmatrix}$.

{c1.2.4b} 6. (matlab) 2.67A − 1.1B.
Answer: $2.67A - 1.1B = \begin{pmatrix} 6.3940 & 4.7880 & -3.0950 \\ 4.2890 & -5.5750 & 6.1410 \end{pmatrix}$.


{S:1.3} 1.3 Special Kinds of Matrices


There are many matrices that have special forms and hence have special names — which
we now list.

• A square matrix is a matrix with the same number of rows and columns; that is, a
square matrix is an n × n matrix.
• A diagonal matrix is a square matrix whose only nonzero entries are along the main
diagonal; that is, aij = 0 if i ≠ j. The following is a 3 × 3 diagonal matrix
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}.$$

There is a shorthand in MATLAB for entering diagonal matrices. To enter this 3 × 3


matrix, type diag([1 2 3]).
• The identity matrix is the diagonal matrix all of whose diagonal entries equal 1. The
n × n identity matrix is denoted by In . This identity matrix is entered in MATLAB by
typing eye(n).
• A zero matrix is a matrix all of whose entries are 0. A zero matrix is denoted by
0. This notation is ambiguous since there is a zero m × n matrix for every m and
n. Nevertheless, this ambiguity rarely causes any difficulty. In MATLAB, to define an
m × n matrix A whose entries all equal 0, just type A = zeros(m,n). To define an
n × n zero matrix B, type B = zeros(n).
• The transpose of an m×n matrix A is the n×m matrix obtained from A by interchanging
rows and columns. Thus the transpose of the 4 × 2 matrix
$$\begin{pmatrix} 2 & 1 \\ -1 & 2 \\ 3 & -4 \\ 5 & 7 \end{pmatrix}$$
is the 2 × 4 matrix
$$\begin{pmatrix} 2 & -1 & 3 & 5 \\ 1 & 2 & -4 & 7 \end{pmatrix}.$$
Suppose that you enter this 4 × 2 matrix into MATLAB by typing

A = [2 1; -1 2; 3 -4; 5 7]


The transpose of a matrix A is denoted by At. To compute the transpose of A in
MATLAB, just type A'.
• A symmetric matrix is a square matrix whose entries are symmetric about the main
diagonal; that is aij = aji . Note that a symmetric matrix is a square matrix A for
which At = A.
• An upper triangular matrix is a square matrix all of whose entries below the main
diagonal are 0; that is, aij = 0 if i > j. A strictly upper triangular matrix is an upper
triangular matrix whose diagonal entries are also equal to 0. Similar definitions hold
for lower triangular and strictly lower triangular matrices. The following four 3 × 3
matrices are examples of upper triangular, strictly upper triangular, lower triangular,
and strictly lower triangular matrices:
   
$$\begin{pmatrix} 1 & 2 & 3 \\ 0 & 2 & 4 \\ 0 & 0 & 6 \end{pmatrix} \qquad \begin{pmatrix} 0 & 2 & 3 \\ 0 & 0 & 4 \\ 0 & 0 & 0 \end{pmatrix}$$
$$\begin{pmatrix} 7 & 0 & 0 \\ 5 & 2 & 0 \\ -4 & 1 & -3 \end{pmatrix} \qquad \begin{pmatrix} 0 & 0 & 0 \\ 5 & 0 & 0 \\ 10 & 1 & 0 \end{pmatrix}.$$
• A square matrix A is block diagonal if
 
$$A = \begin{pmatrix} B_1 & 0 & \cdots & 0 \\ 0 & B_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & B_k \end{pmatrix}$$
where each Bj is itself a square matrix. An example of a 5 × 5 block diagonal matrix
with one 2 × 2 block and one 3 × 3 block is:
$$\begin{pmatrix} 2 & 3 & 0 & 0 & 0 \\ 4 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 2 & 3 \\ 0 & 0 & 3 & 2 & 4 \\ 0 & 0 & 1 & 1 & 5 \end{pmatrix}.$$
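The MATLAB shorthands mentioned in this list can be tried together in one short session. The sketch below is only illustrative (the entries are arbitrary); triu and blkdiag are standard MATLAB commands that are not otherwise used in this section — triu keeps the part of a matrix on and above the main diagonal, and blkdiag assembles a block diagonal matrix from its blocks.

D = diag([1 2 3])                   % the 3 x 3 diagonal matrix shown above
I = eye(4)                          % the 4 x 4 identity matrix
Z = zeros(2,3)                      % a 2 x 3 zero matrix
A = [2 1; -1 2; 3 -4; 5 7];
B = A'                              % the transpose of the 4 x 2 matrix above
U = triu([1 2 3; 0 2 4; 7 8 6])     % upper triangular part of a 3 x 3 matrix
C = blkdiag([2 3; 4 1], [1 2 3; 3 2 4; 1 1 5])   % the 5 x 5 block diagonal example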

Exercises

In Exercises 1 – 5 decide whether or not the given matrix is symmetric.


{c1.1.01a}
1. $\begin{pmatrix} 2 & 1 \\ 1 & 5 \end{pmatrix}$.
Answer: The matrix is symmetric.
{c1.1.01b}
2. $\begin{pmatrix} 1 & 1 \\ 0 & -5 \end{pmatrix}$.
Answer: The matrix is not symmetric.
{c1.1.01c}
3. (3).
Answer: The matrix is symmetric.
{c1.1.01d}
4. $\begin{pmatrix} 3 & 4 \\ 4 & 3 \\ 0 & 1 \end{pmatrix}$.
Answer: The matrix is not symmetric.
{c1.1.01e}
5. A = $\begin{pmatrix} 3 & 4 & -1 \\ 4 & 3 & 1 \\ -1 & 1 & 10 \end{pmatrix}$.
Answer: Since a21 = a12 , a31 = a13 , and a32 = a23 , the matrix is symmetric.

In Exercises 6 – 10 decide which of the given matrices are upper (or lower) triangular and which
are strictly upper (or lower) triangular.
{c1.1.02a}
6. $\begin{pmatrix} 2 & 0 \\ -1 & -2 \end{pmatrix}$.
Answer: The matrix is lower triangular.
{c1.1.02b}
7. $\begin{pmatrix} 0 & 4 \\ 0 & 0 \end{pmatrix}$.
Answer: The matrix is strictly upper triangular.
{c1.1.02c}
8. (2).
Answer: The matrix is upper triangular.
{c1.1.02d}
9. $\begin{pmatrix} 3 & 2 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}$.
Answer: The matrix is not upper triangular since a triangular matrix must be square.
{c1.1.02e}
10. $\begin{pmatrix} 0 & 2 & -4 \\ 0 & 7 & -2 \\ 0 & 0 & 0 \end{pmatrix}$.
Answer: The matrix is upper triangular.

 
A general 2 × 2 diagonal matrix has the form $\begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}$. Thus the two unknown real numbers a
and b are needed to specify each 2 × 2 diagonal matrix.
In Exercises 11 – 16, how many unknown real numbers are needed to specify each of the given
matrices:
{c1.3.1a}
11. An upper triangular 2 × 2 matrix?
Solution: A 2 × 2 upper triangular matrix A has the form $A = \begin{pmatrix} a_{11} & a_{12} \\ 0 & a_{22} \end{pmatrix}$. Thus the number
of entries needed to define A is 3.
{c1.3.1b}
12. A symmetric 2 × 2 matrix?

{c1.3.2} Answer: 3.
13. An m × n matrix?
Answer: Each row of the matrix has n entries and there are m rows. Hence the number of unknown
{c1.3.3a} entries is mn.
14. A diagonal n × n matrix?

{c1.3.3b} Answer: n.
15. An upper triangular n × n matrix?

Solution: $1 + 2 + \cdots + (n-1) + n = \frac{n(n+1)}{2}$.
{c1.3.3c}
16. A symmetric n × n matrix?
Solution: The number of independent entries in row k of an n × n symmetric matrix is n − k + 1.
Thus the number of independent entries in the matrix is
$$n + (n-1) + \cdots + 1 = 1 + 2 + \cdots + n = \sum_{k=1}^{n} k = \frac{n(n+1)}{2}.$$

In each of Exercises 17 – 19 determine whether the statement is True or False?


{c1.3.4a}
17. Every symmetric, upper triangular matrix is diagonal.
Answer: True.
{c1.3.4b}
18. Every diagonal matrix is a multiple of the identity matrix.
Answer: False — for example: $\begin{pmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 3 \end{pmatrix}$.
{c1.3.4c}
19. Every block diagonal matrix is symmetric.
Answer: False — for example: $\begin{pmatrix} 1 & 2 & 0 \\ 3 & 1 & 0 \\ 0 & 0 & 4 \end{pmatrix}$.

{c1.3.5a} 20. (matlab) Use MATLAB to compute At when
$$A = \begin{pmatrix} 1 & 2 & 4 & 7 \\ 2 & 1 & 5 & 6 \\ 4 & 6 & 2 & 1 \end{pmatrix} \qquad (1.3.1)$$
Use MATLAB to verify that (At)t = A by setting B=A', C=B', and checking that C = A.
Answer: $A^t = \begin{pmatrix} 1 & 2 & 4 \\ 2 & 1 & 6 \\ 4 & 5 & 2 \\ 7 & 6 & 1 \end{pmatrix}$.

{c1.3.5b} 21. (matlab) Use MATLAB to compute At when A = (3) is a 1 × 1 matrix.
Answer: At = (3).


{S:1.4} 1.4 The Geometry of Vector Operations


In this section we discuss the geometry of addition, scalar multiplication, and dot product
of vectors. We also use MATLAB graphics to visualize these operations.

Geometry of Addition MATLAB has an excellent graphics language that we shall use at
various times to illustrate concepts in both two and three dimensions. In order to make the
connections between ideas and graphics more transparent, we will sometimes use previously
developed MATLAB programs. We begin with such an example — the illustration of the
parallelogram law for vector addition.
Suppose that x and y are two planar vectors. Think of these vectors as line segments from
the origin to the points x and y in R2 . We use a program written by T.A. Bryan to visualize
x + y. In MATLAB type:

x = [1 2];
y = [-2 3];
addvec(x,y)

The vector x is displayed in blue, the vector y in green, and the vector x + y in red. Note
that x + y is just the diagonal of the parallelogram spanned by x and y. A black and white
version of this figure is given in Figure 1.


{F:vec2} Figure 1: Addition of two planar vectors.


Note: all MATLAB commands are case sensitive — upper and lower case must be correct.


The parallelogram law (the diagonal of the parallelogram spanned by x and y is x + y) is


equally valid in three dimensions. Use MATLAB to verify this statement by typing:

x = [1 0 2];
y = [-1 4 1];
addvec3(x,y)

The parallelogram spanned by x and y in R3 is shown in cyan; the diagonal x + y is shown


in blue. See Figure 2. To test your geometric intuition, make several choices of vectors x
and y. Note that one vertex of the parallelogram is always the origin.


{F:vec3} Figure 2: Addition of two vectors in three dimensions.

Geometry of Scalar Multiplication In all dimensions scalar multiplication just scales the
length of the vector. To discuss this point we need to define the length of a vector. View
an n-vector x = (x1 , . . . , xn ) as a line segment from the origin to the point x. Using the
Pythagorean theorem, it can be shown that the length or norm of this line segment is:
$$||x|| = \sqrt{x_1^2 + \cdots + x_n^2}.$$

MATLAB has the command norm for finding the length of a vector. Test this by entering
the 3-vector

x = [1 4 2];

17
§1.4 The Geometry of Vector Operations

Then type

norm(x)

MATLAB responds with:

ans =
4.5826

which is indeed approximately
$$\sqrt{1^2 + 4^2 + 2^2} = \sqrt{21}.$$
Now suppose r ∈ R and x ∈ Rn. A calculation shows that

{E:lengths} ||rx|| = |r|||x||. (1.4.1)

See Exercise 18. Note also that if r is positive, then the direction of rx is the same as that
of x; while if r is negative, then the direction of rx is opposite to the direction of x. The
lengths of the vectors 3x and −3x are each three times the length of x — but these vectors
point in opposite directions. Scalar multiplication by the scalar 0 produces the 0 vector, the
vector whose entries are all zero.
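Formula (1.4.1) is easy to check numerically in MATLAB. As an illustrative sketch, continuing with the vector x = (1, 4, 2) entered above and an arbitrarily chosen scalar:

r = -3;
norm(r*x)        % the length of rx
abs(r)*norm(x)   % |r| times the length of x

Both commands return 13.7477, as (1.4.1) predicts.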

Dot Product and Angles The dot product of two n-vectors x = (x1 , . . . , xn ) and y =
(y1 , . . . , yn ) is an important operation on vectors. It is defined by:

{e:dotproduct} x · y = x1 y1 + · · · + xn yn . (1.4.2)

Note that x · x is just ||x||2 , the length of x squared.


MATLAB also has a command for computing dot products of n-vectors. Type

x = [1 4 2];
y = [2 3 -1];
dot(x,y)

MATLAB responds with the dot product of x and y, namely,

ans =
12

One of the most important facts concerning dot products is the one that states

{dotprod=0} x·y =0 if and only if x and y are perpendicular. (1.4.3)


Indeed, dot product also gives a way of numerically determining the angle between n-
vectors, as follows.
{T:dotangle}
Theorem 1.4.1. Let θ be the angle between two nonzero n-vectors x and y. Then
{e:dotproductang}
$$\cos\theta = \frac{x \cdot y}{||x||\,||y||}. \qquad (1.4.4)$$

It follows that cos θ = 0 if and only if x · y = 0. Thus (1.4.3) is valid.


We show that Theorem 1.4.1 is just a restatement of the law of cosines. This law states

c2 = a2 + b2 − 2ab cos θ,

where a, b, c are the lengths of the sides of a triangle and θ is the interior angle opposite the
side of length c. See Figure 3.


{F:cosines} Figure 3: Triangle formed by sides of length a, b, c with interior angle θ opposite side c.

We use trigonometry to verify the law of cosines. First, translate the triangle so that a vertex
is at the origin. Second, rotate the triangle placing a vertex on the x-axis and another vertex
above the x-axis. After translating and rotating, the coordinates of the nonzero vertex on
the x-axis is (b, 0). Observe that the vertex above the x-axis has coordinates (a cos θ, a sin θ).
Then use the distance formula to observe that the length c is the distance from the vertex
at (b, 0) to the vertex at (a cos θ, a sin θ). That is,

$$\begin{aligned} c^2 &= (a\cos\theta - b)^2 + (a\sin\theta)^2 \\ &= a^2\cos^2\theta - 2ab\cos\theta + b^2 + a^2\sin^2\theta \\ &= a^2 + b^2 - 2ab\cos\theta. \end{aligned}$$


{F:costri} Figure 4: Triangle formed by vectors x and y with interior angle θ.

Proof of Theorem 1.4.1 In vector notation we can form a triangle two of whose sides are
given by x and y in Rn . The third side is just x − y as x = y + (x − y), as in Figure 4.
It follows from the law of cosines that

||x − y||2 = ||x||2 + ||y||2 − 2||x||||y|| cos θ.

We claim that
||x − y||2 = ||x||2 + ||y||2 − 2x · y.
Assuming that the claim is valid, it follows that

x · y = ||x||||y|| cos θ,

which proves the theorem. Finally, compute

$$\begin{aligned} ||x-y||^2 &= (x_1 - y_1)^2 + \cdots + (x_n - y_n)^2 \\ &= (x_1^2 - 2x_1y_1 + y_1^2) + \cdots + (x_n^2 - 2x_ny_n + y_n^2) \\ &= (x_1^2 + \cdots + x_n^2) - 2(x_1y_1 + \cdots + x_ny_n) + (y_1^2 + \cdots + y_n^2) \\ &= ||x||^2 - 2x \cdot y + ||y||^2 \end{aligned}$$

to verify the claim. 


Theorem 1.4.1 gives a numerically efficient method for computing the angle between vectors
x and y. In MATLAB this computation proceeds by typing

theta = acos(dot(x,y)/(norm(x)*norm(y)))


where acos is the inverse cosine of a number. For example, using the 3-vectors x = (1, 4, 2)
and y = (2, 3, −1) entered previously, MATLAB responds with

theta =
0.7956

Remember that this answer is in radians. To convert this answer to degrees, just multiply
by 360 and divide by 2π:

360*theta / (2*pi)

to obtain the answer of 45.5847◦ .

Area of Parallelograms Let P be a parallelogram whose sides are the vectors v and w as in
Figure 5. Let |P | denote the area of P . As an application of dot products and (1.4.4), we
calculate |P |. We claim that

{e:areaP} |P |2 = ||v||2 ||w||2 − (v · w)2 . (1.4.5)

We verify (1.4.5) as follows. Note that the area of P is the same as the area of the rectangle
R also pictured in Figure 5. The side lengths of R are: ||v|| and ||w|| sin θ where θ is the
angle between v and w. A computation using (1.4.4) shows that

$$\begin{aligned} |R|^2 &= ||v||^2\,||w||^2 \sin^2\theta \\ &= ||v||^2\,||w||^2 (1 - \cos^2\theta) \\ &= ||v||^2\,||w||^2 \left( 1 - \left(\frac{v \cdot w}{||v||\,||w||}\right)^2 \right) \\ &= ||v||^2\,||w||^2 - (v \cdot w)^2, \end{aligned}$$

which establishes (1.4.5).
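Formula (1.4.5) is straightforward to evaluate in MATLAB using the commands norm and dot. As a brief sketch with arbitrarily chosen vectors v and w in R3:

v = [1 2 2];
w = [3 0 4];
areaP = sqrt(norm(v)^2*norm(w)^2 - dot(v,w)^2)

MATLAB responds with areaP = 10.1980, the area of the parallelogram spanned by v and w.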

Exercises

{c1.4.8a} In Exercises 1 – 4 compute the lengths of the given vectors.


1. x = (3, 0).
Answer: The length of x is $\sqrt{3^2 + 0^2} = 3$.

{F:parallel} Figure 5: Parallelogram P beside rectangle R with same area.

{c1.4.8b}
2. x = (2, −1).
Answer: The length of x is $\sqrt{2^2 + (-1)^2} = \sqrt{5}$.
{c1.4.8c}
3. x = (−1, 1, 1).
Answer: The length of x is $\sqrt{(-1)^2 + 1^2 + 1^2} = \sqrt{3}$.
{c1.4.8d}
4. x = (−1, 0, 2, −1, 3).
Answer: The length of x is $\sqrt{(-1)^2 + 0^2 + 2^2 + (-1)^2 + 3^2} = \sqrt{15}$.

{c1.4.1a} In Exercises 5 – 8 determine whether the given pair of vectors is perpendicular.


5. x = (1, 3) and y = (3, −1).
Answer: The vectors are perpendicular.

{c1.4.1b} Solution: Vectors x and y are perpendicular if and only if x · y = 0. In this case, (1, 3) · (3, −1) = 0.
6. x = (2, −1) and y = (−2, 1).
Answer: The vectors are not perpendicular.

{c1.4.1bb} Solution: Compute: (2, −1) · (−2, 1) = −5.

7. x = (1, 1, 3, 5) and y = (1, −4, 3, 0).


Answer: The vectors are not perpendicular.

{c1.4.1c} Solution: Compute: (1, 1, 3, 5) · (1, −4, 3, 0) = 6.


8. x = (2, 1, 4, 5) and y = (1, −4, 3, −2).
Answer: The vectors are perpendicular.
Solution: Compute: (2, 1, 4, 5) · (1, −4, 3, −2) = 0.

{c1.4.2}
9. Find a real number a so that the vectors

x = (1, 3, 2) and y = (2, a, −6)

are perpendicular.
Solution: The vectors x and y are perpendicular when (1, 3, 2) · (2, a, −6) = 3a − 10 = 0. Thus, a = 10/3.

{c1.4.3}
10. Find the lengths of the vectors u = (2, 1, −2) and v = (0, 1, −1), and the angle between them.

Answer: 45°
Solution: $||u|| = \sqrt{2^2 + 1^2 + (-2)^2} = 3$; $||v|| = \sqrt{0^2 + 1^2 + (-1)^2} = \sqrt{2}$;
$$\cos\theta = \frac{u \cdot v}{||u||\,||v||} = \frac{3}{3\sqrt{2}} = \frac{1}{\sqrt{2}}, \quad\text{so}\quad \theta = \frac{\pi}{4} = 45°.$$

{mc.exercise1}
11. Find the cosine of the angle between the normal vectors to the planes

2x − 2y + z = 14 and x + y − 2z = −10.

Answer: $-\frac{2}{3\sqrt{6}}$
Solution: The normal vectors are v = (2, −2, 1) and w = (1, 1, −2). The cosine of the angle θ
between the normal vectors is
$$\cos(\theta) = \frac{v \cdot w}{||v||\,||w||} = \frac{-2}{\sqrt{9}\,\sqrt{6}} = -\frac{2}{3\sqrt{6}}.$$

In Exercises 12 – 17 compute the dot product x · y for the given pair of vectors and the cosine of
the angle between them.
{c1.4.9a}
12. x = (2, 0) and y = (2, 1).
Answer: The dot product x · y = 4, and the cosine of the angle θ between x and y is $\frac{2}{\sqrt{5}}$.
Solution: Compute x · y = 4, ||x|| = 2, and ||y|| = $\sqrt{5}$. Then by Theorem 1.4.1,
$$\cos\theta = \frac{x \cdot y}{||x||\,||y||} = \frac{4}{2\sqrt{5}} = \frac{2}{\sqrt{5}}.$$
{c1.4.9b}
13. x = (2, −1) and y = (1, 2).
Answer: The dot product x · y = 0, and the cosine of the angle θ between x and y is 0.
Solution: Compute x · y = 0, ||x|| = $\sqrt{5}$, and ||y|| = $\sqrt{5}$. Then by Theorem 1.4.1,
$$\cos\theta = \frac{x \cdot y}{||x||\,||y||} = \frac{0}{5} = 0.$$
{c1.4.9c}
14. x = (−1, 1, 4) and y = (0, 1, 3).
Answer: The dot product x · y = 13, and the cosine of the angle θ between x and y is $\frac{13}{6\sqrt{5}}$.
Solution: Compute x · y = 13, ||x|| = $3\sqrt{2}$, and ||y|| = $\sqrt{10}$. Then by Theorem 1.4.1,
$$\cos\theta = \frac{x \cdot y}{||x||\,||y||} = \frac{13}{3\sqrt{20}} = \frac{13}{6\sqrt{5}}.$$
{c1.4.9d}
15. x = (−10, 1, 0) and y = (0, 1, 20).
Answer: The dot product x · y = 1, and the cosine of the angle θ between x and y is $\frac{1}{\sqrt{40501}} \approx 0.0050$.
Solution: Compute x · y = 1, ||x|| = $\sqrt{101}$, and ||y|| = $\sqrt{401}$. Then by Theorem 1.4.1,
$$\cos\theta = \frac{x \cdot y}{||x||\,||y||} = \frac{1}{\sqrt{101}\,\sqrt{401}} = \frac{1}{\sqrt{40501}} \approx 0.0050.$$
{c1.4.9e}
16. x = (2, −1, 1, 3, 0) and y = (4, 0, 2, 7, 5).
Answer: The dot product x · y = 31, and the cosine of the angle θ between x and y is $\frac{31}{\sqrt{1410}} \approx 0.8256$.
Solution: Compute x · y = 31, ||x|| = $\sqrt{15}$, and ||y|| = $\sqrt{94}$. Then by Theorem 1.4.1,
$$\cos\theta = \frac{x \cdot y}{||x||\,||y||} = \frac{31}{\sqrt{15}\,\sqrt{94}} = \frac{31}{\sqrt{1410}} \approx 0.8256.$$
{c1.4.9f}
17. x = (5, −1, 4, 1, 0, 0) and y = (−3, 0, 0, 1, 10, −5).
Answer: The dot product x · y = −14, and the cosine of the angle θ between x and y is $-\frac{14}{\sqrt{5805}} \approx -0.1837$.
Solution: Compute x · y = −14, ||x|| = $\sqrt{43}$, and ||y|| = $\sqrt{135}$. Then by Theorem 1.4.1,
$$\cos\theta = \frac{x \cdot y}{||x||\,||y||} = -\frac{14}{\sqrt{43}\,\sqrt{135}} = -\frac{14}{\sqrt{5805}} \approx -0.1837.$$


{c1.4.9A}
18. Using the definition of length, verify that formula (1.4.1) is valid.
q
Solution: Using the definition ||x|| = x21 + · · · + x2n , we can compute

||rx|| = ||r(x1 , . . . , xn )||


= ||(rx
p 1 , . . . , rxn )||
q(rx1 ) + · · · + (rxn )
= 2 2

= r2 (x21 + · · · + x2n )
q
= |r| x21 + · · · + x2n
= |r|||x||.

{c1.4.4} 19. (matlab) Use addvec and addvec3 to add vectors in R2 and R3. More precisely, enter pairs

of 2-vectors x and y of your choosing into MATLAB, use addvec to compute x+y, and note the
parallelogram formed by 0, x, y, x + y. Similarly, enter pairs of 3-vectors and use addvec3.

{c1.4.5} 20. (matlab) Determine the vector of length 1 that points in the same direction as the vector
x = (2, 13.5, −6.7, 5.23).

Answer: x/||x|| = (0.1244, 0.8397, −0.4167, 0.3253).

{c1.4.5b} 21. (matlab) Determine the vector of length 1 that points in the same direction as the vector
y = (2.1, −3.5, 1.5, 1.3, 5.2).

Answer: y/||y|| = (0.3043, −0.5071, 0.2173, 0.1883, 0.7534).

In Exercises 22– 24 find the angle in degrees between the given pair of vectors.

{c1.4.6a} 22. (matlab) x = (2, 1, −3, 4) and y = (1, 1, −5, 7).


 
Answer: $\theta = \arccos\left(\frac{x \cdot y}{||x||\,||y||}\right) = 0.2715$ radians = 15.5570°.

{c1.4.6b} 23. (matlab) x = (2.43, 10.2, −5.27, π) and y = (−2.2, 0.33, 4, −1.7).
Answer: $\theta = \arccos\left(\frac{x \cdot y}{||x||\,||y||}\right) = 2.0701$ radians = 118.6076°.

{c1.4.6c} 24. (matlab) x = (1, −2, 2, 1, 2.1) and y = (−3.44, 1.2, 1.5, −2, −3.5).
Answer: $\theta = \arccos\left(\frac{x \cdot y}{||x||\,||y||}\right) = 2.1769$ radians = 124.7286°.

In Exercises 25 – 26 let P be the parallelogram generated by the given vectors v and w in R3 .


Compute the area of that parallelogram.

{c1.4.7a} 25. (matlab) v = (1, 5, 7) and w = (−2, 4, 13).



Answer: The area of P is $\sqrt{2294} \approx 47.8957$.
Solution: Using (1.4.5),
$$|P|^2 = ||v||^2\,||w||^2 - (v \cdot w)^2 = 75(189) - 109^2 = 2294.$$

{c1.4.7b} 26. (matlab) v = (2, −1, 1) and w = (−1, 4, 3).



Answer: The area of P is $\sqrt{147} \approx 12.1244$.


2 Solving Linear Equations


The primary motivation for the study of vectors and matrices is based on the study of
solving systems of linear equations. The algorithms that enable us to find solutions are
themselves based on certain kinds of matrix manipulations. In these algorithms, matrices
serve as a shorthand for calculation, rather than as a basis for a theory. We will see later
that these matrix manipulations do lead to a rich theory of how to solve systems of linear
equations. But our first step is just to see how these equations are actually solved.
We begin with a discussion in Section 2.1 of how to write systems of linear equations in
terms of matrices. We also show by example how complicated writing down the answer
to such systems can be. In Section 2.2, we recall that solution sets to systems of linear
equations in two and three variables are lines and planes.
The best known and probably the most efficient method for solving systems of linear equa-
tions (especially with a moderate to large number of unknowns) is Gaussian elimination.
The idea behind this method, which is introduced in Section 2.3, is to manipulate matrices
by elementary row operations to reduced echelon form. It is then possible just to look at
the reduced echelon form matrix and to read off the solutions to the linear system, if any.
The process of reading off the solutions is formalized in Section 2.4; see Theorem 2.4.6. Our
discussion of solving linear equations is presented with equations whose coefficients are real
numbers — though most of our examples have just integer coefficients. The methods work
just as well with complex numbers, and this generalization is discussed in Section 2.5.
Throughout this chapter, we alternately discuss the theory and show how calculations that
are tedious when done by hand can easily be performed by computer using MATLAB. The
chapter ends with a proof of the uniqueness of row echelon form (a topic of theoretical
importance) in Section 2.6. This section is included mainly for completeness and need not
be covered on a first reading.


{lineq}

{S:2.1} 2.1 Systems of Linear Equations and Matrices


It is a simple exercise to solve the system of two equations

{small}
x + y = 7
−x + 3y = 1                    (2.1.1)

to find that x = 5 and y = 2. One way to solve system (2.1.1) is to add the two equations,
obtaining
4y = 8;
hence y = 2. Substituting y = 2 into the 1st equation in (2.1.1) yields x = 5.
This system of equations can be solved in a more algorithmic fashion by solving the 1st
equation in (2.1.1) for x as
x = 7 − y,
and substituting this answer into the 2nd equation in (2.1.1), to obtain

−(7 − y) + 3y = 1.

This equation simplifies to:


4y = 8.
Now proceed as before.
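As a quick check, substituting x = 5 and y = 2 back into (2.1.1) gives 5 + 2 = 7 and −5 + 3 · 2 = 1, so both equations of the system are satisfied.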

Solving Larger Systems by Substitution In contrast to solving the simple system of two
equations, it is less clear how to solve a complicated system of five equations such as:

{big}
5x1 − 4x2 + 3x3 − 6x4 + 2x5 = 4
2x1 + x2 − x3 − x4 + x5 = 6
x1 + 2x2 + x3 + x4 + 3x5 = 19                    (2.1.2)
−2x1 − x2 − x3 + x4 − x5 = −12
x1 − 6x2 + x3 + x4 + 4x5 = 4.

The algorithmic method used to solve (2.1.1) can be expanded to produce a method, called
substitution, for solving larger systems. We describe the substitution method as it applies
to (2.1.2). Solve the 1st equation in (2.1.2) for x1 , obtaining

{x1}
x1 = 4/5 + (4/5)x2 − (3/5)x3 + (6/5)x4 − (2/5)x5.                    (2.1.3)


Then substitute the right hand side of (2.1.3) for x1 in the remaining four equations in
(2.1.2) to obtain a new system of four equations in the four variables x2 ,x3 ,x4 ,x5 . This
procedure eliminates the variable x1 . Now proceed inductively — solve the 1st equation in
the new system for x2 and substitute this expression into the remaining three equations to
obtain a system of three equations in three unknowns. This step eliminates the variable
x2 . Continue by substitution to eliminate the variables x3 and x4 , and arrive at a simple
equation in x5 — which can be solved. Once x5 is known, then x4 , x3 , x2 , and x1 can be
found in turn.

Two Questions

• Is it realistic to expect to complete the substitution procedure without making a mistake


in arithmetic?

• Will this procedure work — or will some unforeseen difficulty arise?

Almost surely, attempts to solve (2.1.2) by hand, using the substitution procedure, will
lead to arithmetic errors. However, computers and software have developed to the point
where solving a system such as (2.1.2) is routine. In this text, we use the software package
MATLAB to illustrate just how easy it has become to solve equations such as (2.1.2).
The answer to the second question requires knowledge of the theory of linear algebra. In
fact, no difficulties will develop when trying to solve the particular system (2.1.2) using the
substitution algorithm. We discuss why later.

Solving Equations by MATLAB We begin by discussing the information that is needed


by MATLAB to solve (2.1.2). The computer needs to know that there are five equations in
five unknowns — but it does not need to keep track of the unknowns (x1 , x2 , x3 , x4 , x5 ) by
name. Indeed, the computer just needs to know the matrix of coefficients in (2.1.2)
 
{bigmatrix}
$$\begin{pmatrix} 5 & -4 & 3 & -6 & 2 \\ 2 & 1 & -1 & -1 & 1 \\ 1 & 2 & 1 & 1 & 3 \\ -2 & -1 & -1 & 1 & -1 \\ 1 & -6 & 1 & 1 & 4 \end{pmatrix} \qquad (2.1.4*)$$


and the vector on the right hand side of (2.1.2)


 
{bigRHS}
$$\begin{pmatrix} 4 \\ 6 \\ 19 \\ -12 \\ 4 \end{pmatrix}. \qquad (2.1.5*)$$

We now describe how we enter this information into MATLAB. To reduce the drudgery and
to allow us to focus on ideas, the entries in equations having a ∗ after their label, such
as (2.1.4*), have been entered in the laode toolbox. This information can be accessed as
follows. After starting your MATLAB session, type

e2_1_4

followed by a carriage return. This instruction tells MATLAB to load equation (2.1.4*) of
Chapter 2. The matrix of coefficients is now available in MATLAB; note that this matrix is
stored in the 5 × 5 array A. What should appear is:

A =
5 -4 3 -6 2
2 1 -1 -1 1
1 2 1 1 3
-2 -1 -1 1 -1
1 -6 1 1 4

Indeed, comparing this result with (2.1.4*), we see that A contains precisely the same infor-
mation.
Since the label (2.1.5*) is followed by a ‘∗’, we can enter the vector in (2.1.5*) into MATLAB
by typing

e2_1_5

Note that the right hand side of (2.1.2) is stored in the vector b. MATLAB should have
responded with

b =
4


6
19
-12
4

Now MATLAB has all the information it needs to solve the system of equations given in
(2.1.2). To have MATLAB solve this system, type

x = A\b

to obtain

x =
5.0000
2.0000
3.0000
4.0000
1.0000

This answer is interpreted as follows: the five values of the unknowns x1 ,x2 ,x3 ,x4 ,x5 are
stored in the vector x; that is,

{answer1} x1 = 5, x2 = 2, x3 = 3, x4 = 4, x5 = 1. (2.1.6)

The reader may verify that (2.1.6) is indeed a solution of (2.1.2) by substituting the values
in (2.1.6) into the equations in (2.1.2).
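This check can also be carried out in MATLAB. The product A*x multiplies the coefficient matrix A by the vector x (matrix multiplication is discussed in Chapter 3); as a brief sketch:

A*x       % reproduces the right hand side vector b
A*x - b   % a vector of zeros, up to rounding error

confirming that the computed x solves the system.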

Changing Entries in MATLAB MATLAB also permits access to single components of x.


For instance, type

x(5)

and the 5th entry of x is displayed,

ans =
1.0000

We see that the component x(i) of x corresponds to the component xi of the vector x where
i = 1, 2, 3, 4, 5. Similarly, we can access the entries of the coefficient matrix A. For instance,
by typing


A(3,4)

MATLAB responds with

ans =
1

It is also possible to change an individual entry in either a vector or a matrix. For example,
if we enter

A(3,4) = -2

we obtain a new matrix A which when displayed is:

A =
5 -4 3 -6 2
2 1 -1 -1 1
1 2 1 -2 3
-2 -1 -1 1 -1
1 -6 1 1 4

Thus the command A(3,4) = -2 changes the entry in the 3rd row, 4th column of A from 1
to −2. In other words, we have now entered into MATLAB the information that is needed
to solve the system of equations

5x1 − 4x2 + 3x3 − 6x4 + 2x5 = 4


2x1 + x2 − x3 − x4 + x5 = 6
x1 + 2x2 + x3 − 2x4 + 3x5 = 19
−2x1 − x2 − x3 + x4 − x5 = −12
x1 − 6x2 + x3 + x4 + 4x5 = 4.

As expected, this change in the coefficient matrix results in a change in the solution of
system (2.1.2), as well. Typing

x = A\b

now leads to the solution

x =


1.9455
3.0036
3.0000
1.7309
3.8364

that is displayed to an accuracy of four decimal places.


In the next step, change A as follows:

A(2,3) = 1

The new system of equations is:

5x1 − 4x2 + 3x3 − 6x4 + 2x5 = 4


2x1 + x2 + x3 − x4 + x5 = 6
{incon} x1 + 2x2 + x3 − 2x4 + 3x5 = 19 (2.1.7)
−2x1 − x2 − x3 + x4 − x5 = −12
x1 − 6x2 + x3 + x4 + 4x5 = 4.

The command

x = A\b

now leads to the message

Warning: Matrix is singular to working precision.

x =
Inf
Inf
Inf
Inf
Inf

Obviously, something is wrong; MATLAB cannot find a solution to this system of equations!
Assuming that MATLAB is working correctly, we have shed light on one of our previous
questions: the method of substitution described by (2.1.3) need not always lead to a solution,
even though the method does work for system (2.1.2). Why? As we will see, this is one
of the questions that is answered by the theory of linear algebra. In the case of (2.1.7), it


is fairly easy to see what the difficulty is: the second and fourth equations have the form
y = 6 and −y = −12, respectively.

Warning: The MATLAB command

x = A\b

may give an error message similar to the previous one. When this happens, one must
approach the answer with caution.

Exercises

In Exercises 1 – 3 find solutions to the given system of linear equations.


{c2.1.8a}
1.
2x − y = 0
3x = 6

Answer: (x, y) = (2, 4).


{c2.1.8b}
2.
3x − 4y = 2
2y + z = 1
3z = 9

Answer: (x, y, z) = (−2/3, −1, 3).
{c2.1.8c}
3.
−2x + y = 9
3x + 3y = −9

Answer: (x, y) = (−4, 1).

{c2.1.8A}
4. Write the coefficient matrices for each of the systems of linear equations given in Exercises 1 –
3.
Answer: The matrices for the three systems are

    ( 2 −1 )      ( 3 −4  0 )           ( −2  1 )
    ( 3  0 ),     ( 0  2  1 ),   and    (  3  3 ).
                  ( 0  0  3 )

{c2.1.9}
5. Neither of the following systems of three equations in three unknowns has a unique solution
— but for different reasons. Solve these systems and explain why these systems cannot be solved
uniquely.
(a)   x − y        =  4
      x + 3y − 2z  = −6
      4x + 2y − 3z =  1
and
(b)   2x − 4y + 3z =  4
      3x − 5y + 3z =  5
           2y − 3z = −4

Answer: The system in part (a) has an infinite number of solutions, whereas the system in part
(b) has no solution.
Solution: (a) Replace x in the second and third equations with 4 + y to obtain 4y − 2z = −10 and
6y − 3z = −15. Since these equations have identical solutions, the system can be restated as

x = y + 4
z = 2y + 5

So, for each choice of y, there exists a single solution. For example, if y = 1, then x = 5 and z = 7;
or if y = 0, then x = 4 and z = 5.
(b) If we again substitute for x using the first equation, the second and third equations become

2y − 3z = −2 and 2y − 3z = −4

These two expressions contradict each other, so there is no solution for this system.

{c2.1.10}
6. Last year Dick was twice as old as Jane. Four years ago the sum of Dick’s age and Jane’s age
was twice Jane’s age now. How old are Dick and Jane?
Answer: Dick is 17 and Jane is 9.
Solution: Rewrite the two statements as linear equations in D — Dick’s age now — and J —
Jane’s age now. Then solve the system of linear equations.
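
A sketch of one possible setup, with D and J as above: the first statement gives D − 1 = 2(J − 1) and the second gives (D − 4) + (J − 4) = 2J, that is, D − 2J = −1 and D − J = 8. In MATLAB:

A = [1 -2; 1 -1];    % coefficients of D and J in the two equations
b = [-1; 8];
x = A\b              % x(1) is Dick's age, x(2) is Jane's age

which returns 17 and 9.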

{c2.1.11}
7. (a) Find a quadratic polynomial p(x) = ax² + bx + c satisfying p(0) = 1, p(1) = 5, and
p(−1) = −5.

(b) Prove that for every triple of real numbers L, M , and N , there is a quadratic polynomial
satisfying p(0) = L, p(1) = M , and p(−1) = N .

(c) Let x1 , x2 , x3 be three unequal real numbers and let A1 , A2 , A3 be three real numbers. Show
that finding a quadratic polynomial q(x) that satisfies q(xi ) = Ai is equivalent to solving a
system of three linear equations.

(a) Answer: The quadratic p(x) = −x2 + 5x + 1 satisfies these conditions.


Solution: Since p(x) = ax² + bx + c for a general quadratic polynomial, we find this solution by evaluating
p(0) = 1, p(1) = 5, and p(−1) = −5, which yields the system of equations

p(0) = c = 1
p(1) = a + b + c = 5
p(−1) = a − b + c = −5

We solve this system to obtain (a, b, c) = (−1, 5, 1), then substitute these coefficients into the general
quadratic.
(b) Let p(x) = ax² + bx + c be a quadratic polynomial. Then, the assumptions p(0) = L, p(1) = M ,
and p(−1) = N imply:
p(0) = c = L
p(1) = a + b + c = M
p(−1) = a − b + c = N
The unique solution to this system is (a, b, c) = ((M + N − 2L)/2, (M − N)/2, L).
(c) Substituting q(xi) = Ai, for i = 1, 2, 3, into the standard quadratic q(x) = ax² + bx + c
yields
    a x1² + b x1 + c = A1
    a x2² + b x2 + c = A2
    a x3² + b x3 + c = A3
Finding the appropriate quadratic polynomial would be equivalent to solving this system of linear
equations for a, b, and c in terms of A1 , A2 , and A3 .
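
A short MATLAB illustration of part (c), using the data of part (a) as the numerical values (so x1, x2, x3 below are the interpolation points, not the unknowns of the linear system):

x1 = 0; x2 = 1; x3 = -1;                  % three unequal points
A1 = 1; A2 = 5; A3 = -5;                  % prescribed values q(xi) = Ai
M = [x1^2 x1 1; x2^2 x2 1; x3^2 x3 1];    % coefficient matrix of the linear system
M \ [A1; A2; A3]                          % returns the coefficients (a, b, c)

With these values MATLAB should return (−1, 5, 1), in agreement with part (a).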


{c2.1.1} 8. (matlab) Using MATLAB type the commands e2_1_8 and e2_1_9 to load the matrices:
{MATLAB:15}
          −5.6    0.4   −9.8    8.6    4.0   −3.4
          −9.1    6.6   −2.3    6.9    8.2    2.7
           3.6   −9.3   −8.7    0.5    5.2    5.1
    A =    3.6   −8.9   −1.7   −8.2   −4.8    9.8            (2.1.8*)
           8.7    0.6    3.7    3.1   −9.1   −2.7
          −2.3    3.4    1.8   −1.7    4.7   −5.1

and the vector
{MATLAB:16}
           9.7
           4.5
           5.1
    b =    3.0                                               (2.1.9*)
          −8.5
           2.6
Solve the corresponding system of linear equations.
Solution: Type A\b, to get

ans =
0.7060
2.4963
-2.9778
-1.7627
-0.1163
0.2654

{c2.1.2} 9. (matlab) Matrices are entered in MATLAB as follows. To enter the 2 × 3 matrix A, type A =
[ -1 1 2; 4 1 2]. Enter this matrix into MATLAB; the displayed matrix should be

A =
-1 1 2
4 1 2

Now change the entry in the 2nd row, 1st column to −5.
Solution: Type A(2,1) = -5. MATLAB responds with

A =
-1 1 2
-5 1 2


{c2.1.3} 10. (matlab) Column vectors with n entries are viewed by MATLAB as n × 1 matrices. Enter
the vector b = [1; 2; -4]. Then change the 3rd entry in b to 13.
Solution: Type b(3) = 13 to obtain

b =
1
2
13

{c2.1.4} 11. (matlab) This problem illustrates some of the different ways that MATLAB displays numbers
using the format long, the format short and the format rational commands.
Use MATLAB to solve the following system of equations

2x1 − 4.5x2 + 3.1x3 = 4.2


x1 + x2 + x3 = −5.1
x1 − 6.2x2 + x3 = 1.3 .

You may change the format of your answer in MATLAB. For example, to print your result with an
accuracy of 15 digits type format long and redisplay the answer. Similarly, to print your result as
fractions type format rational and redisplay your answer.
Answer: According to MATLAB,

ans =
-12.0495
-0.8889
7.8384

Solution: Write the system as Ax = b, where:

A = b =
2.0000 -4.5000 3.1000 4.2000
1.0000 1.0000 1.0000 -5.1000
1.0000 -6.2000 1.0000 1.3000

then type A\b to solve.

{c2.1.5} 12. (matlab) Enter the following matrix and vector into MATLAB

A = [ 1 0 -1 ; 2 5 3 ; 5 -1 0];
b = [ 1; 1; -2];


and solve the corresponding system of linear equations by typing

x = A\b

Your answer should be

x =
-0.2000
1.0000
-1.2000

Find an integer for the entry in the 2nd row, 2nd column of A so that the solution

x = A\b

is not defined. Hint: The answer is an integer between −4 and 4.


A\b is not defined when A(2, 2) = −1.
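
One hedged way to confirm this value is to note that with A(2, 2) = −1 the coefficient matrix becomes singular, which can be checked with MATLAB's det command:

A = [1 0 -1; 2 5 3; 5 -1 0];
A(2,2) = -1;
det(A)          % returns 0, so A\b cannot produce a unique solution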

{c2.1.6} 13. (matlab) The MATLAB command rand(m,n) defines matrices with random entries between
0 and 1. For example, the command A = rand(5,5) generates a random 5 × 5 matrix, whereas the
command b = rand(5,1) generates a column vector with 5 random entries. Use these commands
to construct several systems of linear equations and then solve them.
Computer experiment.

{c2.1.7} 14. (matlab) Suppose that the four substances S1 , S2 , S3 , S4 contain the following percentages
of vitamins A, B, C and F by weight

Vitamin S1 S2 S3 S4
A 25% 19% 20% 3%
B 2% 14% 2% 14%
C 8% 4% 1% 0%
F 25% 31% 25% 16%

Mix the substances S1 , S2 , S3 and S4 so that the resulting mixture contains precisely 3.85 grams
of vitamin A, 2.30 grams of vitamin B, 0.80 grams of vitamin C, and 5.95 grams of vitamin F. How
many grams of each substance have to be contained in the mixture?
Discuss what happens if we require that the resulting mixture contains 2.00 grams of vitamin B
instead of 2.30 grams.
Answer: The vector of solutions is:


ans =
7.3828
4.1016
4.5313
10.6250

Solution: First, translate the data in the table to a system of linear equations, relating the
quantities of S1 , S2 , S3 , and S4 in the mixture to the quantities of vitamins A, B, C, and F . The
first equation is .25S1 + .19S2 + .20S3 + .03S4 = A, and the other three equations correspond to the
other vitamins. From this data, find the coefficient matrix A for the system. The desired quantities
of each vitamin form the solution vector b.

A = b =
0.2500 0.1900 0.2000 0.0300 3.8500
0.0200 0.1400 0.0200 0.1400 2.3000
0.0800 0.0400 0.0100 0 0.8000
0.2500 0.3100 0.2500 0.1600 5.9500

As in the previous problems, the system can be solved by typing A\b.
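
A minimal sketch of the complete computation (the entries of A are just the table percentages written as decimals):

A = [0.25 0.19 0.20 0.03; 0.02 0.14 0.02 0.14; 0.08 0.04 0.01 0; 0.25 0.31 0.25 0.16];
b = [3.85; 2.30; 0.80; 5.95];
x = A\b        % grams of S1, S2, S3, S4 in the mixture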


If b(2) = 2.00 instead of 2.30, then A\b yields

ans =
22.4023
-27.8320
12.1094
37.1875

Note that the components of the answer vector refer to weights of substances, which cannot be
negative. This answer contains a negative component; so although a mathematically valid solution
exists, we cannot mix the substances in such a way that

         3.85
    b =  2.00 .
         0.80
         5.95


{S:2.2} 2.2 The Geometry of Low-Dimensional Solutions


In this section we discuss how to use MATLAB graphics to solve systems of linear equations
in two and three unknowns. We begin with two dimensions.

Linear Equations in Two Dimensions The set of all solutions to the equation

{2x-y=6} 2x − y = 6 (2.2.1)

is a straight line in the xy plane; this line has slope 2 and y-intercept equal to −6. We can
use MATLAB to plot the solutions to this equation — though some understanding of the
way MATLAB works is needed.
The plot command in MATLAB plots a sequence of points in the plane, as follows. Let X
and Y be vectors with n entries each. Then

plot(X,Y)

will plot the points (X(1), Y (1)), (X(2), Y (2)), …, (X(n), Y (n)) in the xy-plane.
To plot points on the line (2.2.1) we need to enter the x-coordinates of the points we wish
to plot. If we want to plot a hundred points, we would be facing a tedious task. MATLAB
has a command to simplify this task. Typing

x = linspace(-5,5,100);

produces a vector x with 100 entries with the 1st entry equal to −5, the last entry equal to
5, and the remaining 98 entries equally spaced between −5 and 5. MATLAB has another
command that allows us to create a vector of points x. In this command we specify the
distance between points rather than the number of points. That command is:

x = -5:0.1:5;

Producing x by either command is acceptable.


Typing

y = 2*x - 6;

produces a vector whose entries correspond to the y-coordinates of points on the line (2.2.1).
Then typing


plot(x,y)

produces the desired plot. It is useful to label the axes on this figure, which is accomplished
by typing

xlabel('x')
ylabel('y')

We can now use MATLAB to solve the equation (2.1.1) graphically. Recall that (2.1.1) is:

x + y =7
−x + 3y = 1

A solution to this system of equations is a point that lies on both lines in the system.
Suppose that we search for a solution to this system that has an x-coordinate between −3
and 7. Then type the commands

x = linspace(-3,7,100);
y = 7 - x;
plot(x,y)
xlabel('x')
ylabel('y')
hold on
y = (1 + x)/3;
plot(x,y)
axis('equal')
grid

The MATLAB command hold on tells MATLAB to keep the present figure and to add the
information that follows to that figure. The command axis('equal') instructs MATLAB to
make unit distances on the x and y axes equal. The last MATLAB command superimposes
grid lines. See Figure 6. From this figure you can see that the solution to this system is
(x, y) = (5, 2), which we already knew.
There are several principles that follow from this exercise.

• Solutions to a single linear equation in two variables form a straight line.

• Solutions to two linear equations in two unknowns lie at the intersection of two straight
lines in the plane.



{lineint} Figure 6: Graph of equations in (2.1.1)

It follows that the solution to two linear equations in two variables is a single point if the
lines are not parallel. If these lines are parallel and unequal, then there are no solutions, as
there are no points of intersection.
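
For instance, a quick sketch of the parallel case (this example is ours, not part of the text's example): the lines x + y = 2 and x + y = 4 never intersect, which can be seen by plotting

x = linspace(-5,5,100);
plot(x, 2 - x)        % the line x + y = 2
hold on
plot(x, 4 - x)        % the parallel line x + y = 4
xlabel('x')
ylabel('y')
grid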

Linear Equations in Three Dimensions We begin by observing that the set of all solutions
to a linear equation in three variables forms a plane. More precisely, the solutions to the
equation
{abcd} ax + by + cz = d (2.2.2)
form a plane that is perpendicular to the vector (a, b, c) — assuming of course that the
vector (a, b, c) is nonzero.
This fact is most easily proved using the dot product. Recall from Chapter 1 (1.4.2) that
the dot product is defined by
X · Y = x1 y1 + x2 y2 + x3 y3 ,
where X = (x1 , x2 , x3 ) and Y = (y1 , y2 , y3 ). We recall from Chapter 1 (1.4.3) the following
important fact concerning dot products:
X ·Y =0


if and only if the vectors X and Y are perpendicular.


Suppose that N = (a, b, c) ≠ 0. Consider the plane that is perpendicular to the normal
vector N and that contains the point X0 . If the point X lies in that plane, then X − X0 is
perpendicular to N ; that is,
{XX_0} (X − X0 ) · N = 0. (2.2.3)
If we use the notation

X = (x, y, z) and X0 = (x0 , y0 , z0 ),

then (2.2.3) becomes


a(x − x0 ) + b(y − y0 ) + c(z − z0 ) = 0.
Setting
d = ax0 + by0 + cz0
puts equation (2.2.3) into the form (2.2.2). In this way we see that the set of solutions to a
single linear equation in three variables forms a plane. See Figure 7.
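
Numerically, the right hand side d is just the dot product N · X0. A small sketch (the normal vector and point below happen to be the data of Exercise 1 at the end of this section):

N  = [2 3 1];          % the normal vector (a, b, c)
X0 = [-1 -2 3];        % a point on the plane
d  = dot(N, X0)        % returns -5, so the plane is 2x + 3y + z = -5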


{F:plane} Figure 7: The plane containing X0 and perpendicular to N .

We now use MATLAB to visualize the planes that are solutions to linear equations. Plotting
an equation in three dimensions in MATLAB follows a structure similar to the planar plots.
Suppose that we wish to plot the solutions to the equation

{-2x+3y+z=2} − 2x + 3y + z = 2. (2.2.4)


We can rewrite (2.2.4) as


z = 2x − 3y + 2.
It is this function that we actually graph by typing the commands

[x,y] = meshgrid(-5:0.5:5);
z = 2*x - 3*y + 2;
surf(x,y,z)

The first command tells MATLAB to create a square grid in the xy-plane. Grid points are
equally spaced between −5 and 5 at intervals of 0.5 on both the x and y axes. The second
command tells MATLAB to compute the z value of the solution to (2.2.4) at each grid point.
The third command tells MATLAB to graph the surface containing the points (x, y, z). See
Figure 8.


{F:p1int} Figure 8: Graph of (2.2.4).

We can now see that solutions to a system of two linear equations in three unknowns consists
of points that lie simultaneously on two planes. As long as the normal vectors to these planes
are not parallel, the intersection of the two planes will be a line in three dimensions. Indeed,
consider the equations
−2x + 3y + z = 2
2x − 3y + z = 0.
We can graph the solution using MATLAB , as follows. We continue from the previous graph
by typing


hold on
z = -2*x + 3*y;
surf(x,y,z)

The result, which illustrates that the intersection of two planes in R3 is generally a line, is
shown in Figure 9.


{F:p2int} Figure 9: Line of intersection of two planes.

We can now see geometrically that the solution to three simultaneous linear equations in
three unknowns will generally be a point — since generally three planes in three space
intersect in a point. To visualize this intersection, as shown in Figure 10, we extend the
previous system of equations to

−2x + 3y + z = 2
2x − 3y + z = 0
−3x + 0.2y + z = 1.

Continuing in MATLAB type

z = 3*x - 0.2*y + 1;
surf(x,y,z)

Unfortunately, visualizing the point of intersection of these planes geometrically does not
really help to get an accurate numerical value of the coordinates of this intersection point.



{F:p3int} Figure 10: Point of intersection of three planes.

However, we can use MATLAB to solve this system accurately. Denote the 3 × 3 matrix of
coefficients by A, the vector of coefficients on the right hand side by b, and the solution by
x. Solve the system in MATLAB by typing

A = [ -2 3 1; 2 -3 1; -3 0.2 1];
b = [2; 0; 1];
x = A\b

The point of intersection of the three planes is at

x =
0.0233
0.3488
1.0000

Three planes in three dimensional space need not intersect in a single point. For example,
if two of the planes are parallel they need not intersect at all. The normal vectors must
point in independent directions to guarantee that the intersection is a point. Understanding
the notion of independence (it is more complicated than just not being parallel) is part of
the subject of linear algebra. MATLAB returns “Inf”, which we have seen previously, when
these normal vectors are (approximately) dependent. For example, consider Exercise 7.


Plotting Nonlinear Functions in MATLAB Suppose that we want to plot the graph of a
nonlinear function of a single variable, such as
{E:quadex} y = x² − 2x + 3 (2.2.5)
on the interval [−2, 5] using MATLAB. There is a difficulty: How do we enter the term x2 ?
For example, suppose that we type

x = linspace(-2,5);
y = x*x - 2*x + 3;

Then MATLAB responds with

??? Error using ==> *


Inner matrix dimensions must agree.

The problem is that in MATLAB the variable x is a vector of 100 equally spaced points
x(1), x(2), …, x(100). What we really need is a vector consisting of entries x(1)*x(1),
x(2)*x(2), …, x(100)*x(100). MATLAB has the facility to perform this operation auto-
matically and the syntax for the operation is .* rather than *. So typing

x = linspace(-2,5);
y = x.*x - 2*x + 3;
plot(x,y)

produces the graph of (2.2.5) in Figure 11. In a similar fashion, MATLAB has the ‘dot’
operations of ./, .\, and .^, as well as .*.
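
For example, the same graph can also be produced with the .^ operation, which squares each entry of the vector x:

x = linspace(-2,5);
y = x.^2 - 2*x + 3;
plot(x,y)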

Exercises

{c2.2.5}
1. Find the equation for the plane perpendicular to the vector (2, 3, 1) and containing the point
(−1, −2, 3).

Answer: The equation of the desired plane is 2x + 3y + z = −5.


Solution: Note that a vector perpendicular to a plane is orthogonal to any vector connecting two
points in the plane. So, if X0 = (−1, −2, 3) is one point in the plane perpendicular to N = (2, 3, 1)
and X = (x, y, z) is any other point in that plane, then (X − X0 ) · N = 0. Substituting into this
formula, we obtain
2(x + 1) + 3(y + 2) + 1(z − 3) = 0 or 2x + 3y + z = −5.



{F:quadex} Figure 11: Graph of y = x² − 2x + 3.

{c2.2.6}
2. Determine three systems of two linear equations in two unknowns so that the first system has
a unique solution, the second system has an infinite number of solutions, and the third system has
no solutions.
Answer: Examples:

x + y = 2
unique solution
x − y = 0

x + y = 2
infinite number of solutions
2x + 2y = 4

x + y = 2
no solutions
x + y = 4
{c2.2.7}
3. Write the equation of the plane through the origin containing the vectors (1, 0, 1) and (2, −1, 2).

Answer: The equation for the plane is z = x.


Solution: Note that the plane goes through the origin and contains the vectors (1, 0, 1) and
(2, −1, 2), and therefore contains the points (0, 0, 0), (1, 0, 1), and (2, −1, 2). The general equation


for a plane is ax + by + cz = d. We can substitute the coordinates of the three points into this
equation to get the linear system

0 = d
a + c = d
2a − b + 2c = d

We can solve the system by substitution to get b = d = 0 and a = −c, which yields the equation of
the plane.

{c2.2.8}
4. Find a system of two linear equations in three unknowns whose solution set is the line consisting
of scalar multiples of the vector (1, 2, 1).

Answer: One such system is:

x − y + z = 0
2x + y − 4z = 0

Solution: The solution set contains all multiples of the vector (1, 2, 1), so it contains the origin,
since 0(1, 2, 1) = (0, 0, 0). The equation of any plane containing the origin is ax + by + cz = 0.
Substituting the point (1, 2, 1) implies that a + 2b + c = 0. Any two equations which satisfy that
condition and are not multiples of one another will have the appropriate line as a solution set.
For example, let (a, b, c) = (1, −1, 1) in the first equation, and (a, b, c) = (2, 1, −4) in the second
equation to obtain the system given here.

{c2.2.85}
5. Find the cosine of the angle between the normal vectors to the planes

2x − 2y + z = 14 and x + y − 2z = −10.

Answer: −2/(3√6)
Solution: The normal vectors are v = (2, −2, 1) and w = (1, 1, −2). The cosine of the angle θ
between the normal vectors is

    cos(θ) = (v · w)/(||v|| ||w||) = −2/(√9 √6) = −2/(3√6)

{c2.2.9}
6. (a) Find a vector u normal to the plane 2x + 2y + z = 3.
(b) Find a vector v normal to the plane x + y + 2z = 4.


(c) Find the cosine of the angle θ between the vectors u and v.
Solution: (a) u = (2, 2, 1), since we know that the normal vector to the plane ax + by + cz = d is
(a, b, c).
(b) v = (1, 1, 2).
(c) cos θ = (u · v)/(||u|| ||v||) = 2/√6. In MATLAB, type acos(2/sqrt(6))*180/pi to obtain θ = 35.2644◦ .

{c2.2.10} 7. (matlab) Determine graphically the geometry of the set of solutions to the system of equations
in the three unknowns x, y, z:
x + 3z = 1
3x − z = 1
z = 2
by sketching the plane of solutions for each equation individually. Describe in words why there are
no solutions to this system. (Use MATLAB graphics to verify your sketch. Note that you should
enter the last equation as z = 2 - 0*x - 0*y and the first two equations with 0*y terms. Try
different views — but include view([0 1 0]) as one view.)
The system has no solutions because there is no point at which the three planes intersect. Figure 7a
shows the MATLAB graph of the system. The commands view([0 -1 0]) and axis([-7 3 -3 3
-1 3]) produce a view of the graph in which the geometry can be seen, shown in Figure 7b. Since
there is no y term in any of the equations, all three planes are perpendicular to this view, and
therefore appear as lines in the graph.


Figure 7a Figure 7b


{c2.2.1} 8. (matlab) Use MATLAB to solve graphically the planar system of linear equations
x + 4y = −4
4x + 3y = 4

to an accuracy of two decimal points.


Hint: The MATLAB command zoom on allows us to view the plot in a window whose axes are one-
half those of the original. Each time you click with the mouse on a point, the axes' limits are halved
and centered at the designated point. Coupling zoom on with grid on allows you to determine
approximate numerical values for the intersection point.
Answer: (x, y) ≈ (2.15, −1.54).
Solution: In MATLAB , graph the system by typing:

x = linspace(-5,5,100);
y = -1 - x/4;
plot(x,y)
xlabel('x')
ylabel('y')
hold on
y = 4/3 - 4*x/3;
plot(x,y)
axis('equal')
grid

Using the zoom command, we can zoom in on the graph of Figure 8 until the intersection of the
lines is visible at an accuracy of two decimal places.

{c2.2.2} 9. (matlab) Use MATLAB to solve graphically the planar system of linear equations
4.23x + 0.023y = −1.1
1.65x − 2.81y = 1.63

to an accuracy of two decimal points.


(x, y) ≈ (−0.26, −0.73).

{c2.2.3} 10. (matlab) Use MATLAB to find an approximate graphical solution to the three dimensional
system of linear equations
3x − 4y + 2z = −11
2x + 2y + z = 7
−x + y − 5z = 7.



Figure 8a Figure 8b

Then use MATLAB to find an exact solution.


Answer: (x, y, z) = (1, 3, −1).
Solution: The commands to instruct MATLAB to graph this three dimensional system are:

[x,y] = meshgrid(-10:0.5:10);
z = (-11 - 3*x + 4*y)/2;
surf(x,y,z)
hold on
z = 7 - 2*x - 2*y;
surf(x,y,z)
hold on
z = (7 + x - y)/(-5);
surf(x,y,z)

It is hard to determine a solution for the system from this graph. The command axis([xmin xmax
ymin ymax zmin zmax]) can make the graph clearer by zooming in on a specific range of points,
but a numerically accurate solution is difficult to obtain graphically in three dimensions. Obtain
an accurate solution using the command A\b in MATLAB.
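
A minimal sketch of that exact computation:

A = [3 -4 2; 2 2 1; -1 1 -5];
b = [-11; 7; 7];
x = A\b        % returns (1, 3, -1)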

{c2.2.4} 11. (matlab) Use MATLAB to determine graphically the geometry of the set of solutions to the
system of equations:
x + 3y + 4z = 5
2x + y + z = 1
−4x + 3y + 5z = 7.


Attempt to use MATLAB to find an exact solution to this system and discuss the implications of
your calculations.
Hint: After setting up the graphics display in MATLAB, you can use the command view([0,1,0])
to get a better view of the solution point.
Answer: The solution set is a line because the three planes intersect in a line.
Solution: If the left-hand side of the system is entered into MATLAB as matrix A, and the solution
vector is entered as b, then typing A\b yields

Warning: Matrix is singular to working precision.


ans =
Inf
Inf
Inf

{c2.2.a5} 12. (matlab) Use MATLAB to graph the function y = 2 − x sin(x² − 1) on the interval [−2, 3].

How many relative maxima does this function have on this interval?
Answer: The function has three relative maxima on this interval.
Solution: Graph the function in MATLAB using the commands:

x = linspace(-2,3);
y = 2 - x.*sin(x.^2 - 1);
plot(x,y)

Determine the number of relative maxima numerically from the graph, which is shown in Figure 12.



Figure 12


{S:Gauss} 2.3 Gaussian Elimination


A general system of m linear equations in n unknowns has the form
    a11 x1 + a12 x2 + · · · + a1n xn = b1
    a21 x1 + a22 x2 + · · · + a2n xn = b2
{general}          ...                                      (2.3.1)
    am1 x1 + am2 x2 + · · · + amn xn = bm .
The entries aij and bi are constants. Our task is to find a method for solving (2.3.1) for the
variables x1 , . . . , xn .

Easily Solved Equations Some systems are easily solved. The system of three equations
(m = 3) in three unknowns (n = 3)
    x1 + 2x2 + 3x3 = 10
{examp3}       x2 − (1/5)x3 = 7/5                           (2.3.2)
                         x3 = 3

is one example. The 3rd equation states that x3 = 3. Substituting this value into the 2nd
equation allows us to solve the 2nd equation for x2 = 2. Finally, substituting x2 = 2 and
x3 = 3 into the 1st equation allows us to solve for x1 = −3. The process that we have just
described is called back substitution.
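
Back substitution is easily expressed as a short loop. A sketch for system (2.3.2), with its coefficients stored in an upper triangular matrix U and the right hand side in c (the variable names here are ours):

U = [1 2 3; 0 1 -1/5; 0 0 1];     % coefficients of (2.3.2)
c = [10; 7/5; 3];
x = zeros(3,1);
for i = 3:-1:1
   x(i) = (c(i) - U(i,i+1:3)*x(i+1:3)) / U(i,i);   % solve the i-th equation for x(i)
end
x          % returns (-3, 2, 3)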
Next, consider the system of two equations (m = 2) in three unknowns (n = 3):
x1 + 2x2 + 3x3 = 10
{e23} (2.3.3)
x3 = 3 .

The 2nd equation in (2.3.3) states that x3 = 3. Substituting this value into the 1st equation
leads to the equation
x1 = 1 − 2x2 .
We have shown that every solution to (2.3.3) has the form (x1 , x2 , x3 ) = (1 − 2x2 , x2 , 3) and
that every vector (1 − 2x2 , x2 , 3) is a solution of (2.3.3). Thus, there is an infinite number
of solutions to (2.3.3), and these solutions can be parameterized by one number x2 .

Equations Having No Solutions Note that the system of equations


x1 − x2 = 1
x1 − x2 = 2


has no solutions.
Definition 2.3.1. A linear system of equations is inconsistent if the system has no solutions
and consistent if the system does have solutions.

As discussed in the previous section, (2.1.7) is an example of a linear system that MATLAB
cannot solve. In fact, that system is inconsistent — inspect the 2nd and 4th equations in
(2.1.7).
Gaussian elimination is an algorithm for finding all solutions to a system of linear equations
by reducing the given system to ones like (2.3.2) and (2.3.3), that are easily solved by back
substitution. Consequently, Gaussian elimination can also be used to determine whether a
system is consistent or inconsistent.

Elementary Equation Operations There are three ways to change a system of equations
without changing the set of solutions; Gaussian elimination is based on this observation.
The three elementary operations are:

(a) Swap two equations.


(b) Multiply a single equation by a nonzero number.
(c) Add a scalar multiple of one equation to another.

We begin with an example:


x1 + 2x2 + 3x3 = 10
x1 + 2x2 + x3 = 4 (2.3.4)
2x1 + 9x2 + 5x3 = 27 .
Gaussian elimination works by eliminating variables from the equations in a fashion similar
to the substitution method in the previous section. To begin, eliminate the variable x1 from
all but the 1st equation, as follows. Subtract the 1st equation from the 2nd , and subtract
twice the 1st equation from the 3rd , obtaining:
x1 + 2x2 + 3x3 = 10
−2x3 = −6 (2.3.5)
5x2 − x3 = 7 .
Next, swap the 2nd and 3rd equations, so that the coefficient of x2 in the new 2nd equation
is nonzero. This yields
x1 + 2x2 + 3x3 = 10
5x2 − x3 = 7 (2.3.6)
−2x3 = −6 .


Now, divide the 2nd equation by 5 and the 3rd equation by −2 to obtain a system of equations
identical to our first example (2.3.2), which we solved by back substitution.

Augmented Matrices The process of performing Gaussian elimination when the number
of equations is greater than two or three is painful. The computer, however, can help with
the manipulations. We begin by introducing the augmented matrix. The augmented matrix
associated with (2.3.1) has m rows and n + 1 columns and is written as:

{augmented}
    ( a11  a12  · · ·  a1n  b1 )
    ( a21  a22  · · ·  a2n  b2 )
    (  :    :           :    : )                            (2.3.7)
    ( am1  am2  · · ·  amn  bm )

The augmented matrix contains all of the information that is needed to solve system (2.3.1).

Elementary Row Operations The elementary operations used in Gaussian elimination can
be interpreted as row operations on the augmented matrix, as follows:

(a) Swap two rows.

(b) Multiply a single row by a nonzero number.

(c) Add a scalar multiple of one row to another.

We claim that by using these elementary row operations intelligently, we can always solve a
consistent linear system — indeed, we can determine when a linear system is consistent or
inconsistent. The idea is to perform elementary row operations in such a way that the new
augmented matrix has zero entries below the diagonal.
We describe this process inductively. Begin with the 1st column. We assume for now that
some entry in this column is nonzero. If a11 = 0, then swap two rows so that the number
a11 is nonzero. Then divide the 1st row by a11 so that the leading entry in that row is 1.
Now subtract ai1 times the 1st row from the ith row for each row i from 2 to m. The end
result is that the 1st column has a 1 in the 1st row and a 0 in every row below the 1st . The
result is

    ( 1  ∗  · · ·  ∗ )
    ( 0  ∗  · · ·  ∗ )
    ( :  :         : )  .
    ( 0  ∗  · · ·  ∗ )


Next we consider the 2nd column. We assume that some entry in that column below the 1st
row is nonzero. So, if necessary, we can swap two rows below the 1st row so that the entry
a22 is nonzero. Then we divide the 2nd row by a22 so that its leading nonzero entry is 1.
Then we subtract appropriate multiples of the 2nd row from each row below the 2nd so that
all the entries in the 2nd column below the 2nd row are 0. The result is

    ( 1  ∗  · · ·  ∗ )
    ( 0  1  · · ·  ∗ )
    ( :  :         : )  .
    ( 0  0  · · ·  ∗ )

Then we continue with the 3rd column. That’s the idea. However, does this process always
work and what happens if all of the entries in a column are zero? Before answering these
questions we do experimentation with MATLAB.

Row Operations in MATLAB In MATLAB the ith row of a matrix A is specified by A(i,:).
Thus to replace the 5th row of a matrix A by twice itself, we need only type:

A(5,:) = 2*A(5,:)

In general, we can replace the ith row of the matrix A by c times itself by typing

A(i,:) = c*A(i,:)

Similarly, we can divide the ith row of the matrix A by the nonzero number c by typing

A(i,:) = A(i,:)/c

The third elementary row operation is performed similarly. Suppose we want to add c times
the ith row to the j th row, then we type

A(j,:) = A(j,:) + c*A(i,:)

For example, subtracting 3 times the 7th row from the 4th row of the matrix A is accomplished
by typing:

A(4,:) = A(4,:) - 3*A(7,:)


The first elementary row operation, swapping two rows, requires a different kind of MATLAB
command. In MATLAB, the ith and j th rows of the matrix A are permuted by the command

A([i j],:) = A([j i],:)

So, to swap the 1st and 3rd rows of the matrix A, we type

A([1 3],:) = A([3 1],:)

Examples of Row Reduction in MATLAB Let us see how the row operations can be used
in MATLAB. As an example, we consider the augmented matrix
 
1 3 0 −1 −8
 2 6 −4 4 4 
{examp4} 
 1 0 −1 −9 −35 
 (2.3.8*)
0 1 0 3 10

We enter this information into MATLAB by typing

e2_3_8

which produces the result

A =
1 3 0 -1 -8
2 6 -4 4 4
1 0 -1 -9 -35
0 1 0 3 10

We now perform Gaussian elimination on A, and then solve the resulting system by back
substitution. Gaussian elimination uses elementary row operations to set the entries that
are in the lower left part of A to zero. These entries are indicated by numbers in the following
matrix:

* * * * *
2 * * * *
1 0 * * *
0 1 0 * *


Gaussian elimination works inductively. Since the first entry in the matrix A is equal to 1,
the first step in Gaussian elimination is to set to zero all entries in the 1st column below
the 1st row. We begin by eliminating the 2 that is the first entry in the 2nd row of A. We
replace the 2nd row by the 2nd row minus twice the 1st row. To accomplish this elementary
row operation, we type

A(2,:) = A(2,:) - 2*A(1,:)

and the result is

A =
1 3 0 -1 -8
0 0 -4 6 20
1 0 -1 -9 -35
0 1 0 3 10

In the next step, we eliminate the 1 from the entry in the 3rd row, 1st column of A. We do
this by typing

A(3,:) = A(3,:) - A(1,:)

which yields

A =
1 3 0 -1 -8
0 0 -4 6 20
0 -3 -1 -8 -27
0 1 0 3 10

Using elementary row operations, we have now set the entries in the 1st column below the
1st row to 0. Next, we alter the 2nd column. We begin by swapping the 2nd and 4th rows
so that the leading nonzero entry in the 2nd row is 1. To accomplish this swap, we type

A([2 4],:) = A([4 2],:)

and obtain

A =
1 3 0 -1 -8


0 1 0 3 10
0 -3 -1 -8 -27
0 0 -4 6 20

The next elementary row operation is the command

A(3,:) = A(3,:) + 3*A(2,:)

which leads to

A =
1 3 0 -1 -8
0 1 0 3 10
0 0 -1 1 3
0 0 -4 6 20

Now we have set all entries in the 2nd column below the 2nd row to 0.
Next, we set the first nonzero entry in the 3rd row to 1 by multiplying the 3rd row by −1,
obtaining

A =
1 3 0 -1 -8
0 1 0 3 10
0 0 1 -1 -3
0 0 -4 6 20

Since the leading nonzero entry in the 3rd row is 1, we next eliminate the nonzero entry in
the 3rd column, 4th row. This is accomplished by the following MATLAB command:

A(4,:) = A(4,:) + 4*A(3,:)

Finally, divide the 4th row by 2 to obtain:

A =
1 3 0 -1 -8
0 1 0 3 10
0 0 1 -1 -3
0 0 0 1 4


By using elementary row operations, we have arrived at the system


x1 + 3x2 − x4 = −8
x2 + 3x4 = 10
(2.3.9)
x3 − x4 = −3
x4 = 4 ,
that can now be solved by back substitution. We obtain
{ans1} x4 = 4, x3 = 1, x2 = −2, x1 = 2. (2.3.10)
We return to the original set of equations corresponding to (2.3.8*)
x1 + 3x2 − x4 = −8
2x1 + 6x2 − 4x3 + 4x4 = 4
{examp4_con} (2.3.11*)
x1 − x3 − 9x4 = −35
x2 + 3x4 = 10 .
Load the corresponding linear system into MATLAB by typing

e2_3_11

The information in (2.3.11*) is contained in the coefficient matrix C and the right hand side
b. A direct solution is found by typing

x = C\b

which yields the same answer as in (2.3.10), namely,

x =
2.0000
-2.0000
1.0000
4.0000

Introduction to Echelon Form Next, we discuss how Gaussian elimination works in an


example in which the number of rows and the number of columns in the coefficient matrix
are unequal. We consider the augmented matrix
 
1 0 −2 3 4 0 1
 0 1 2 4 0 −2 0 
{examp5} 
 2 −1 −4
 (2.3.12*)
0 −2 8 −4 
−3 0 6 −8 −12 2 −2


This information is entered into MATLAB by typing

e2_3_12

Again, the augmented matrix is denoted by A.


We begin by eliminating the 2 in the entry in the 3rd row, 1st column. To accomplish the
corresponding elementary row operation, we type

A(3,:) = A(3,:) - 2*A(1,:)

resulting in

A =
1 0 -2 3 4 0 1
0 1 2 4 0 -2 0
0 -1 0 -6 -10 8 -6
-3 0 6 -8 -12 2 -2

We proceed with

A(4,:) = A(4,:) + 3*A(1,:)

to create two more zeros in the 4th row. Finally, we eliminate the -1 in the 3rd row, 2nd
column by

A(3,:) = A(3,:) + A(2,:)

to arrive at

A =
1 0 -2 3 4 0 1
0 1 2 4 0 -2 0
0 0 2 -2 -10 6 -6
0 0 0 1 0 2 1

Next we set the leading nonzero entry in the 3rd row to 1 by dividing the 3rd row by 2.
That is, we type

A(3,:) = A(3,:)/2


to obtain

A =
1 0 -2 3 4 0 1
0 1 2 4 0 -2 0
0 0 1 -1 -5 3 -3
0 0 0 1 0 2 1

We say that the matrix A is in (row) echelon form since the first nonzero entry in each row
is a 1, each entry in a column below a leading 1 is 0, and the leading 1 moves to the right
as you go down the matrix. In row echelon form, the entries where leading 1’s occur are
called pivots.
If we compare the structure of this matrix to the ones we have obtained previously, then
we see that here we have two columns too many. Indeed, we may solve these equations by
back substitution for any choice of the variables x5 and x6 .
The idea behind back substitution is to solve the last equation for the variable corresponding
to the first nonzero coefficient. In this case, we use the 4th equation to solve for x4 in terms
of x5 and x6 , and then we substitute for x4 in the first three equations. This process can also
be accomplished by elementary row operations. Indeed, eliminating the variable x4 from
the first three equations is the same as using row operations to set the first three entries in
the 4th column to 0. We can do this by typing

A(3,:) = A(3,:) + A(4,:);


A(2,:) = A(2,:) - 4*A(4,:);
A(1,:) = A(1,:) - 3*A(4,:)

Remember: By typing semicolons after the first two rows, we have told MATLAB not
to print the intermediate results. Since we have not typed a semicolon after the 3rd row,
MATLAB outputs

A =
1 0 -2 0 4 -6 -2
0 1 2 0 0 -10 -4
0 0 1 0 -5 5 -2
0 0 0 1 0 2 1

We proceed with back substitution by eliminating the nonzero entries in the first two rows
of the 3rd column. To do this, type


A(2,:) = A(2,:) - 2*A(3,:);


A(1,:) = A(1,:) + 2*A(3,:)

which yields

A =
1 0 0 0 -6 4 -6
0 1 0 0 10 -20 0
0 0 1 0 -5 5 -2
0 0 0 1 0 2 1

The augmented matrix is now in reduced echelon form and the corresponding system of
equations has the form

x1 − 6x5 + 4x6 = −6
x2 + 10x5 − 20x6 = 0
{e:refexamp5} (2.3.13)
x3 − 5x5 + 5x6 = −2
x4 + 2x6 = 1,

A matrix is in reduced echelon form if it is in echelon form and if every entry in a column
containing a pivot, other than the pivot itself, is 0.
Reduced echelon form allows us to solve directly this system of equations in terms of the
variables x5 and x6,

{e:refexamp6}
    ( x1 )   ( −6 + 6x5 − 4x6 )
    ( x2 )   (  −10x5 + 20x6  )
    ( x3 ) = ( −2 + 5x5 − 5x6 )                             (2.3.14)
    ( x4 )   (     1 − 2x6    )
    ( x5 )   (        x5      )
    ( x6 )   (        x6      )
It is important to note that every consistent system of linear equations corresponding to
an augmented matrix in reduced echelon form can be solved as in (2.3.14) — and this is
one reason for emphasizing reduced echelon form. We will discuss the reduction to reduced
echelon form in more detail in the next section.

Exercises

In Exercises 1 – 3 determine whether the given matrix is in reduced echelon form.


{c2.3.6a}
 
1 −1 0 1
1.  0 1 0 −6 .
0 0 1 0

{c2.3.6b} The matrix is not in reduced echelon form.


 
1 0 −2 0
2.  0 1 4 0 .
0 0 0 1

{c2.3.6c} The matrix is in reduced echelon form.


 
0 1 0 3
3.  0 0 2 1 .
0 0 0 0
The matrix is not in reduced echelon form.

In Exercises 4 – 6 we list the reduced echelon form of an augmented matrix of a system of linear
equations. Which columns in these augmented matrices contain pivots? Describe all solutions to
these systems of equations in the form of (2.3.14).
{c2.3.7a}
 
1 4 0 0
4.  0 0 1 5 .
0 0 0 0
The 1st and 3rd columns of the matrix contain pivots. The solutions of the system are:

    ( x1 )   ( −4x2 )
    ( x2 ) = (  x2  )
    ( x3 )   (   5  )
{c2.3.7b}
 
1 2 0 0 0
5.  0 0 1 1 0 .
0 0 0 0 1
The 1st , 3rd , and 5th columns of the matrix contain pivots. Since the last row of the matrix
translates to the linear equation 0 = 1, the system is inconsistent, and there are no solutions.
{c2.3.7c}
 
1 −6 0 0 1
6.  0 0 1 0 9 .
0 0 0 0 0


The 1st and 3rd columns of the matrix contain pivots. The solutions of the system are:

    ( x1 )   ( 1 + 6x2 )
    ( x2 ) = (    x2   )
    ( x3 )   (    9    )
    ( x4 )   (    x4   )


{c2.3.8}
10. (a) Consider the 2 × 2 matrix
{e:2x2}
    ( a  b )
    ( c  1 )                                                (2.3.15)
where a, b, c ∈ R and a ≠ 0. Show that (2.3.15) is row equivalent to the matrix
    ( 1       b/a     )
    ( 0   (a − bc)/a  ) .
(b) Show that (2.3.15) is row equivalent to the identity matrix if and only if a ≠ bc.


(a) Two matrices are row equivalent if there is a sequence of row operations that leads from one to
the other. In this case:
1. Divide the 1st row of (2.3.15) by a.
2. Subtract c times the 1st row from the 2nd row.
The result is
    ( 1       b/a     )
    ( 0   (a − bc)/a  ) .

(b) If a ≠ bc, then we can row reduce (2.3.15) to I2 by performing the two row operations in part
(a), followed by:
3. Multiply the 2nd row by a/(a − bc).
4. Subtract b/a times the second row from the first row.
This will give us the identity matrix I2 .
If a = bc, then (a − bc)/a = 0, and the matrix is
    ( 1  b/a )
    ( 0   0  )

which has only one pivot and cannot be row reduced to I2 .

{c2.3.9}
11. Use row reduction and back substitution to solve the following system of two equations in three
unknowns:
x1 − x2 + x3 = 1
2x1 + x2 − x3 = −1

Answer: The solution to this system is

    ( x1 )   (   0    )   (  0 )      ( 0 )
    ( x2 ) = ( x3 − 1 ) = ( −1 ) + x3 ( 1 ) ,
    ( x3 )   (   x3   )   (  0 )      ( 1 )

where x3 is any real number.


Solution: Row reduce the augmented matrix of the system:


   
    ( 1 −1  1 |  1 )      ( 1  0  0 |  0 )
    ( 2  1 −1 | −1 )  −→  ( 0  1 −1 | −1 ) .

Although (1, 2, 2) is not a solution to this system, there is a solution for which x3 = 2, namely
(0, 1, 2).

In Exercises 12 – 13 determine the augmented matrix and all solutions for each system of linear
{c2.3.10a} equations
x−y+z = 1
12. 4x + y + z = 5 .
2x + 3y − z = 2
The augmented matrix for this system is
 
1 −1 1 1
 4 1 1 5 
2 3 −1 2

We can row reduce this matrix to reduced echelon form, obtaining


 
1 0 2/5 6/5
 0 1 −3/5 1/5 
0 0 0 1

The last row of the reduced system implies that 0 = 1, so the system is inconsistent and has no
{c2.3.10b} solutions.
2x − y + z + w = 1
13. .
x + 2y − z + w = 7
The augmented matrix for this system is
 
2 −1 1 1 1
1 2 −1 1 7
which can be row reduced to
 
1 0 1/5 3/5 9/5
.
0 1 −3/5 1/5 13/5

The solution set is therefore

    ( x )   (  9/5 − (1/5)z − (3/5)w )   (  9/5 )     ( −1/5 )     ( −3/5 )
    ( y ) = ( 13/5 + (3/5)z − (1/5)w ) = ( 13/5 ) + z (  3/5 ) + w ( −1/5 ) .
    ( z )   (             z          )   (   0  )     (   1  )     (   0  )
    ( w )   (             w          )   (   0  )     (   0  )     (   1  )


In Exercises 14 – 17 consider the augmented matrices representing systems of linear equations, and
decide

(a) if there are zero, one or infinitely many solutions, and


{c2.3.11a} (b) if solutions are not unique, how many variables can be assigned arbitrary values.
 
1 0 0 3
14.  0 2 1 1 .
0 0 0 0
Answer: (a) The system has infinitely many solutions.
(b) One variable can be assigned arbitrary values.
Solution: The row-reduced form of the matrix is:

    ( 1  0   0    3  )
    ( 0  1  1/2  1/2 ) .
    ( 0  0   0    0  )
{c2.3.11b}
 
1 2 0 0 3
15.  0 1 1 0 1 .
0 0 0 0 2
Answer: The system has no solutions.
Solution: The row-reduced form of the matrix is:

    ( 1  2  0  0  3 )
    ( 0  1  1  0  1 ) .
    ( 0  0  0  0  1 )
{c2.3.11c}
    ( 1  0  2  1 )
16. ( 0  5  0  2 ) .
    ( 0  0  4  3 )
Answer: The system has a unique solution.
Solution: The row-reduced form of the matrix is:

    ( 1  0  2   1  )
    ( 0  1  0  2/5 ) .
    ( 0  0  1  3/4 )
{c2.3.11d}
 
    ( 1  0  2  0   3 )
17. ( 2  3  6  1  16 ) .
    ( 0  3  2  1  10 )
    ( 0  0  0  0   0 )




Answer: (a) The system has infinitely many solutions.
(b) Two variables can be assigned arbitrary values.
Solution: The row-reduced form of the matrix is:

    ( 1  0   2    0    3   )
    ( 0  1  2/3  1/3  10/3 ) .
    ( 0  0   0    0    0   )
    ( 0  0   0    0    0   )

A system of m equations in n unknowns is linear if it has the form (2.3.1); any other system of
equations is called nonlinear. In Exercises 19 – 23 decide whether each of the given systems of
equations is linear or nonlinear.
{c2.3.12a}
19.
3x1 − 2x2 + 14x3 − 7x4 = 35
2x1 + 5x2 − 3x3 + 12x4 = −1

Answer: The system is linear.


{c2.3.12b}
20.
3x1 + πx2 = 0
2x1 − ex2 = 1

Answer: The system is linear.


{c2.3.12c}
21.
3x1 x2 − x2 = 10
2x1 − x2² = −5

Answer: The system is not linear.


Solution: The term 3x1 x2 contains a product of two variables, and the term x2² contains a variable squared.
Thus, these terms are not linear.
{c2.3.12d}
22.
3x1 − x2 = cos(12)
2x1 − x2 = −5

Answer: The system is linear.


{c2.3.12e}
23.
3x1 − sin(x2 ) = 12
2x1 − x3 = −5


Answer: The system is not linear.


Solution: The term sin(x2 ) is not linear.

In Exercises 24 – 26 use elementary row operations and MATLAB to put each of the given matrices
into row echelon form. Suppose that the matrix is the augmented matrix for a system of linear
equations. Is the system consistent or inconsistent?

{c2.3.1a} 24. (matlab)
    ( 2  1  1 )
    ( 4  2  3 ) .

Answer: The row echelon form matrix is

A =
1.0000 0.5000 0.5000
0 0 1.0000

Solution: Enter the augmented matrix into MATLAB as A, then reduce to row echelon form by
typing:

A(1,:) = A(1,:)/2
A(2,:) = A(2,:) - 4*A(1,:)

The linear system that this matrix represents is inconsistent, since the 2nd row of the reduced
matrix represents the equation 0 = 1.

{c2.3.1b} 25. (matlab)  


3 −4 0 2
 0 2 3 1 .
3 1 4 5

The row-reduced matrix is:

A =
1.0000 -1.3333 0 0.6667
0 1.0000 1.5000 0.5000
0 0 1.0000 -0.1429

This matrix represents a consistent linear system.

{c2.3.1c} 26. (matlab)  


−2 1 9 1
 3 3 −4 2 .
1 4 5 5


The row-reduced matrix is:

A =
1.0000 -0.5000 -4.5000 -0.5000
0 1.0000 2.1111 0.7778
0 0 0 2.0000

This matrix represents an inconsistent linear system.

Observation: In standard format MATLAB displays all nonzero real numbers with four decimal
places while it displays zero as 0. An unfortunate consequence of this display is that when a matrix
has both zero and noninteger entries, the columns will not align — which is a nuisance. You can
work with rational numbers rather than decimal numbers by typing format rational. Then the
columns will align.

{c2.3.2} 27. (matlab) Load the following 6 × 8 matrix A into MATLAB by typing e2_3_16.
{MATLAB:13}
          0   0   0    1    3    5   0   9
          0   3   6   −6   −6  −12   0   1
          0   2   4   −5   −7   14   0   1
    A =   0   1   2    1   14   21   0  −1                  (2.3.16*)
          0   0   0    2    4    9   0   7
          0   5  10  −11  −13    2   0   2

Use MATLAB to transform this matrix to row echelon form.


The row echelon form is:

A =
0 1.0000 2.0000 1.0000 14.0000 21.0000 0 -1.0000
0 0 0 1.0000 3.0000 5.0000 0 9.0000
0 0 0 0 1.0000 -0.5000 0 -4.7143
0 0 0 0 0 1.0000 0 0.3457
0 0 0 0 0 0 0 1.0000
0 0 0 0 0 0 0 0

{c2.3.3} 28. (matlab) Use row reduction and back substitution to solve the following system of linear
equations:
2x1 + 3x2 − 4x3 + x4 = 2
3x1 − x2 − x3 + 2x4 = 4
x1 − 7x2 + 5x3 − x4 = 6


Answer: The system’s solution is

    ( x1 )   (  −7/4 − (21/8)x4 )
    ( x2 ) = (  −9/2 − (11/4)x4 )
    ( x3 )   ( −19/4 − (25/8)x4 )
    ( x4 )   (         x4       )

where x4 is any real number.


Solution: Note that the system of linear equations is related to the augmented matrix

A =
2 3 -4 1 2
3 -1 -1 2 4
1 -7 5 -1 6

which row reduces to

A =
1 3/2 -2 1/2 1
0 1 -10/11 -1/11 -2/11
0 0 1 25/8 -19/4

The system can then be solved by back substitution.

{c2.3.4} 29. (matlab) Comment: To understand the point of this exercise you must begin by typing the
MATLAB command format short e. This command will set a format in which you can see the
difficulties that sometimes arise in numerical computations.
Consider the following two 3 × 3-matrices:
   
                  (  1  3  4 )             ( 3  1  4 )
{MATLAB:14}  A =  (  2  1  1 )   and  B =  ( 1  2  1 ) .    (2.3.17*)
                  ( −4  3  5 )             ( 3 −4  5 )

Note that matrix B is obtained from matrix A by interchanging the first two columns.

(a) Use MATLAB to put A into row echelon form using the transformations
(a) Subtract 2 times the 1st row from the 2nd .
(b) Add 4 times the 1st row to the 3rd .
(c) Divide the 2nd row by −5.


(d) Subtract 15 times the 2nd row from the 3rd .


(b) Put B by hand into row echelon form using the transformations
(a) Divide the 1st row by 3.
(b) Subtract the 1st row from the 2nd .
(c) Subtract 3 times the 1st row from the 3rd .
(d) Multiply the 2nd row by 3/5.
(e) Add 5 times the 2nd row to the 3rd .
(c) Use MATLAB to put B into row echelon form using the same transformations as in part (b).
(d) Discuss the outcome of the three transformations. Is there a difference in the results? Would
you expect to see a difference? Could the difference be crucial when solving a system of linear
equations?

(a)

A =
1.0000e+00 3.0000e+00 4.0000e+00
0 1.0000e+00 1.4000e+00
0 0 0
 1 4 
1
3 3
(b) B =  1
 


 0 1 
5
0 0 0
(c)

B =
1.0000e+00 3.3333e-01 1.3333e+00
0 1.0000e+00 -2.0000e-01
0 0 2.2204e-16

(d) Note that switching the first two columns of matrix A produces matrix B. Suppose that A and
B represent the left-hand sides of linear systems with the same vector representing the right-hand
sides. If the solution of system A is (x, y, z) = (a, b, c), then the solution of system B should be
(x, y, z) = (b, a, c). However, according to the row reduced matrices produced by MATLAB , the
system corresponding to B has a unique solution, and the system corresponding to A does not. Row
reducing B by hand shows that there is not a unique solution. MATLAB calculations provide one
because of roundoff error. The division by 3 in the first step of the row reduction for B causes a
rounding inaccuracy. Because of this, MATLAB eventually computes a very small nonzero value
for B(3, 3) rather than the correct answer of 0.
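
A sketch of the row reduction of B described in parts (b) and (c), so that the roundoff can be seen directly (assuming format short e has been set as instructed):

format short e
B = [3 1 4; 1 2 1; 3 -4 5];
B(1,:) = B(1,:)/3;                % (a) divide the 1st row by 3
B(2,:) = B(2,:) - B(1,:);         % (b) subtract the 1st row from the 2nd
B(3,:) = B(3,:) - 3*B(1,:);       % (c) subtract 3 times the 1st row from the 3rd
B(2,:) = (3/5)*B(2,:);            % (d) multiply the 2nd row by 3/5
B(3,:) = B(3,:) + 5*B(2,:)        % (e) add 5 times the 2nd row to the 3rd

The entry B(3,3) is then displayed as a tiny nonzero number such as 2.2204e-16 instead of 0.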


{c2.3.5} 30. (matlab) Find a cubic polynomial


p(x) = ax³ + bx² + cx + d

so that p(1) = 2, p(2) = 3, p′(−1) = −1, and p′(3) = 1.

Answer: The polynomial p(x) = −(3/44)x³ + (5/11)x² + (5/44)x + 3/2 satisfies the given conditions.
Solution: Let p(x) = ax³ + bx² + cx + d be the general cubic polynomial. Then p′(x) = 3ax² + 2bx + c.
We can write the conditions on p as a system of linear equations:

a + b + c + d = 2
8a + 4b + 2c + d = 3
3a − 2b + c = −1
27a + 6b + c = 1

Using MATLAB , we solve for a, b, c, and d, obtaining the vector of coefficients:

ans =
-3/44
5/11
5/44
3/2
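
A sketch of that computation (the rows of the coefficient matrix are just the four conditions listed above):

format rational
A = [1 1 1 1; 8 4 2 1; 3 -2 1 0; 27 6 1 0];
v = [2; 3; -1; 1];
A\v            % returns the coefficients a, b, c, d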


{S:2.4} 2.4 Reduction to Echelon Form


In this section, we formalize our previous numerical experiments. We define more precisely
the notions of echelon form and reduced echelon form matrices, and we prove that every
matrix can be put into reduced echelon form using a sequence of elementary row operations.
Consequently, we will have developed an algorithm for determining whether a system of
linear equations is consistent or inconsistent, and for determining all solutions to a consistent
{D:echelon} system.
Definition 2.4.1. A matrix E is in (row) echelon form if two conditions hold.

(a) The first nonzero entry in each row of E is equal to 1. This leading entry 1 is called a
pivot.
(b) A pivot in the (i + 1)st row of E occurs in a column to the right of the column where
the pivot in the ith row occurs.

Note: A consequence of Definition 2.4.1 is that all rows in an echelon form matrix that are
identically zero occur at the bottom of the matrix.
Here are three examples of matrices that are in echelon form. The pivot in each row (which
is always equal to 1) is preceded by a ∗.
 
∗1 0 −1 0 −6 4 −6
 0 ∗1 4 0 0 −2 0 
 
 0 0 0 ∗1 −5 5 −2 
0 0 0 0 0 ∗1 0
 
∗1 0 −1 0 −6
 0 ∗1 0 3 0 
 
 0 0 0 ∗1 −5 
0 0 0 0 0
 
0 ∗1 −1 14 −6
 0
 0 0 ∗1 15  
 0 0 0 0 0 
0 0 0 0 0
Here are three examples of matrices that are not in echelon form.
 
0 0 1 15
 1 −1 14 −6 
0 0 0 0


 
1 −1 14 −6
 0 0 3 15 
0 0 0 0
 
1 −1 14 −6
 0 0 0 0 
0 0 1 15
{D:roweq}
Definition 2.4.2. Two m × n matrices are row equivalent if one can be transformed to the
other by a sequence of elementary row operations.

Let A = (aij ) be a matrix with m rows and n columns. We want to show that we can
perform row operations on A so that the transformed matrix is in echelon form; that is, A
is row equivalent to a matrix in echelon form. If A = 0, then we are finished. So we assume
that some entry in A is nonzero and that the 1st column where that nonzero entry occurs
is in the k th column. By swapping rows we can assume that a1k is nonzero. Next, divide
the 1st row by a1k , thus setting a1k = 1. Now, using MATLAB notation, perform the row
operations

A(i,:) = A(i,:) - A(i,k)*A(1,:)

for each i ≥ 2. This sequence of row operations leads to a matrix whose first nonzero column
has a 1 in the 1st row and a zero in each row below the 1st row.
Now we look for the next column that has a nonzero entry below the 1st row and call that
column `. By construction ` > k. We can swap rows so that the entry in the 2nd row, `th
column is nonzero. Then we divide the 2nd row by this nonzero element, so that the pivot
in the 2nd row is 1. Again we perform elementary row operations so that all entries below
the 2nd row in the `th column are set to 0. Now proceed inductively until we run out of
nonzero rows.
This argument proves:
{P:echform}
Proposition 2.4.3. Every matrix is row equivalent to a matrix in echelon form.

More importantly, the previous argument provides an algorithm for transforming matrices
into echelon form.
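
The argument above translates almost line for line into a MATLAB function. The following is a sketch only — the function name, the tolerance used to decide when an entry counts as nonzero, and the choice of the first nonzero entry as pivot are ours, and this is not the built-in rref routine discussed below:

function E = echelon(A)
% ECHELON  Reduce A to (row) echelon form by the procedure described above.
[m,n] = size(A);
r = 1;                                      % row where the next pivot will go
for k = 1:n
    i = find(abs(A(r:m,k)) > 1e-12, 1);     % first (numerically) nonzero entry in column k, at or below row r
    if isempty(i), continue, end            % no pivot in this column; move on
    i = i + r - 1;
    A([r i],:) = A([i r],:);                % swap that entry into row r
    A(r,:) = A(r,:)/A(r,k);                 % make the pivot equal to 1
    for j = r+1:m
        A(j,:) = A(j,:) - A(j,k)*A(r,:);    % zero out the entries below the pivot
    end
    r = r + 1;
    if r > m, break, end
end
E = A;

Applied to the matrix in (2.3.12*), this sketch should reproduce the echelon form matrix obtained step by step in the previous section.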

Reduction to Reduced Echelon Form

Definition 2.4.4. A matrix E is in reduced echelon form if


(a) E is in echelon form, and


(b) in every column of E having a pivot, every entry in that column other than the pivot
is 0.

{T:redechform} We can now prove


Theorem 2.4.5. Every matrix is row equivalent to a matrix in reduced echelon form.

Proof Let A be a matrix. Proposition 2.4.3 states that we can transform A by elementary
row operations to a matrix E in echelon form. Next we transform E into reduced echelon
form by some additional elementary row operations, as follows. Choose the pivot in the last
nonzero row of E. Call that row `, and let k be the column where the pivot occurs. By
adding multiples of the `th row to the rows above, we can transform each entry in the k th
column above the pivot to 0. Note that none of these row operations alters the matrix before
the k th column. (Also note that this process is identical to the process of back substitution.)
Again we proceed inductively by choosing the pivot in the (` − 1)st row, which is 1, and
zeroing out all entries above that pivot using elementary row operations. 

Reduced Echelon Form in MATLAB Preprogrammed into MATLAB is a routine to row


reduce any matrix to reduced echelon form. The command is rref. For example, recall the
4 × 7 matrix A in (2.3.12*) by typing e2_3_12. Put A into reduced row echelon form by
typing rref(A) and obtaining

ans =
1 0 0 0 -6 4 -6
0 1 0 0 10 -20 0
0 0 1 0 -5 5 -2
0 0 0 1 0 2 1

Compare the result with the system of equations (2.3.13).

Solutions to Systems of Linear Equations Originally, we introduced elementary row op-


erations as operations that do not change solutions to the linear system. More precisely, we
discussed how solutions to the original system are still solutions to the transformed system
and how no new solutions are introduced by elementary row operations. This argument is
most easily seen by observing that

all elementary row operations are invertible


— they can be undone.


For example, swapping two rows is undone by just swapping these rows again. Similarly,
multiplying a row by a nonzero number c is undone by just dividing that same row by c.
Finally, adding c times the j th row to the ith row is undone by subtracting c times the j th
row from the ith row.
Thus, we can make several observations about solutions to linear systems. Let E be an
augmented matrix corresponding to a system of linear equations having n variables. Since
an augmented matrix is formed from the matrix of coefficients by adding a column, we see
that the augmented matrix has n + 1 columns.
{number}
Theorem 2.4.6. Suppose that E is an m × (n + 1) augmented matrix that is in reduced
echelon form. Let ` be the number of nonzero rows in E.

(a) The system of linear equations corresponding to E is inconsistent if and only if the `th
row in E has a pivot in the (n + 1)st column.

(b) If the linear system corresponding to E is consistent, then the set of all solutions is
parameterized by n − ` parameters.

Proof Suppose that the last nonzero row in E has its pivot in the (n + 1)st column. Then
the corresponding equation is:

0x1 + 0x2 + · · · + 0xn = 1,

which has no solutions. Thus the system is inconsistent.


Conversely, suppose that the last nonzero row has its pivot before the last column. Without
loss of generality, we can renumber the columns — that is, we can renumber the variables
xj — so that the pivot in the ith row occurs in the ith column, where 1 ≤ i ≤ `. Then the
associated system of linear equations has the form:

x1 + a1,`+1 x`+1 + · · · + a1,n xn = b1


x2 + a2,`+1 x`+1 + · · · + a2,n xn = b2
.. ..
. .
x` + a`,`+1 x`+1 + · · · + a`,n xn = b` .


This system can be rewritten in the form:

x1 = b1 − a1,`+1 x`+1 − · · · − a1,n xn


{e1-ell} x2 = b2 − a2,`+1 x`+1 − · · · − a2,n xn (2.4.1)
.. ..
. .
x` = b` − a`,`+1 x`+1 − · · · − a`,n xn .

Thus, each choice of the n − ` numbers x`+1 , . . . , xn uniquely determines values of x1 , . . . , x`


so that x1 , . . . , xn is a solution to this system. In particular, the system is consistent, so (a)
is proved; and the set of all solutions is parameterized by n−` numbers, so (b) is proved. 

Two Examples Illustrating Theorem 2.4.6 The reduced echelon form matrix
 
1 5 0 0
E= 0 0 1 0 
0 0 0 1

is the augmented matrix of an inconsistent system of three equations in three unknowns.


The reduced echelon form matrix
 
1 5 0 2
E= 0 0 1 5 
0 0 0 0

is the augmented matrix of a consistent system of three equations in three unknowns


x1 , x2 , x3 . For this matrix n = 3 and ` = 2. It follows from Theorem 2.4.6 that the
solutions to this system are specified by one parameter. Indeed, the solutions are

x1 = 2 − 5x2
x3 = 5

and are specified by the one parameter x2 .

Consequences of Theorem 2.4.6 It follows from Theorem 2.4.6 that linear systems of equa-
tions with fewer equations than unknowns and with zeros on the right hand side always
{existencehomo} have nonzero solutions. More precisely:
Corollary 2.4.7. Let A be an m × n matrix where m < n. Then the system of linear
equations whose augmented matrix is (A|0) has a nonzero solution.


Proof Perform elementary row operations on the augmented matrix (A|0) to arrive at
the reduced echelon form matrix (E|0). Since the zero vector is a solution, the associated
system of equations is consistent. Now the number of nonzero rows ` in (E|0) is less than or
equal to the number of rows m in E. By assumption m < n and hence ` < n. It follows from
Theorem 2.4.6 that solutions to the linear system are parametrized by n − ` ≥ 1 parameters
and that there are nonzero solutions. 
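A quick MATLAB illustration of Corollary 2.4.7 (the matrix below is just an example, not taken
from the text):

A = [1 2 3 4; 5 6 7 8];          % m = 2 < n = 4
rref([A zeros(2,1)])             % two pivots, so 4 - 2 = 2 free parameters
null(A)                          % each column is a nonzero solution of Ax = 0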

Recall that two m × n matrices are row equivalent if one can be transformed to the other
{consistent} by elementary row operations.
Corollary 2.4.8. Let A be an n × n square matrix and let b be in Rn . Then A is row
equivalent to the identity matrix In if and only if the system of linear equations whose
augmented matrix is (A|b) has a unique solution.

Proof Suppose that A is row equivalent to In . Then, by using the same sequence of
elementary row operations, it follows that the n × (n + 1) augmented matrix (A|b) is row
equivalent to (In |c) for some vector c ∈ Rn . The system of linear equations that corresponds
to (In |c) is:
x1 = c1
.. .. ..
. . .
xn = cn ,
which transparently has the unique solution x = (c1 , . . . , cn ). Since elementary row opera-
tions do not change the solutions of the equations, the original augmented system (A|b) also
has a unique solution.
Conversely, suppose that the system of linear equations associated to (A|b) has a unique
solution. Suppose that (A|b) is row equivalent to a reduced echelon form matrix E. Suppose
that the last nonzero row in E is the `th row. Since the system has a solution, it is consistent.
Hence Theorem 2.4.6(b) implies that the solutions to the system corresponding to E are
parameterized by n − ` parameters. If ` < n, then the solution is not unique. So ` = n.
Next observe that since the system of linear equations is consistent, it follows from Theo-
rem 2.4.6(a) that the pivot in the nth row must occur in a column before the (n + 1)st . It
follows that the reduced echelon matrix E = (In |c) for some c ∈ Rn . Since (A|b) is row
equivalent to (In |c), it follows, by using the same sequence of elementary row operations,
that A is row equivalent to In . 

Uniqueness of Reduced Echelon Form and Rank Abstractly, our discussion of reduced
echelon form has one point remaining to be proved. We know that every matrix A can be


transformed by elementary row operations to reduced echelon form. Suppose, however, that
we use two different sequences of elementary row operations to transform A to two reduced
{uniquerowechelon} echelon form matrices E1 and E2 . Can E1 and E2 be different? The answer is: No.
Theorem 2.4.9. For each matrix A, there is precisely one reduced echelon form matrix E
that is row equivalent to A.

The proof of Theorem 2.4.9 is given in Section 2.6. Since every matrix is row equivalent to
{D:rank} a unique matrix in reduced echelon form, we can define the rank of a matrix as follows.
Definition 2.4.10. Let A be an m × n matrix that is row equivalent to a reduced echelon
form matrix E. Then the rank of A, denoted rank(A), is the number of nonzero rows in E.

We make three remarks concerning the rank of a matrix.

• An echelon form matrix is always row equivalent to a reduced echelon form matrix with
the same number of nonzero rows. Thus, to compute the rank of a matrix, we need
only perform elementary row operations until the matrix is in echelon form.
• The rank of any matrix is easily computed in MATLAB. Enter a matrix A and type
rank(A).
• The number ` in the statement of Theorem 2.4.6 is just the rank of E.

In particular, if the augmented matrix corresponding to a consistent system of linear equations
in n unknowns has rank `, then the solutions to this system are parametrized by n − ` parameters.
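Since, by Theorem 2.4.6(a), inconsistency shows up as a pivot in the last column of the augmented
matrix, one way to check consistency in MATLAB is to compare ranks. A sketch using the consistent
example from earlier in this section:

A = [1 5 0; 0 0 1; 0 0 0];  b = [2; 5; 0];
rank(A)               % returns 2
rank([A b])           % also 2, so the system is consistent
size(A,2) - rank(A)   % 1, the number of parameters describing the solutions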

Exercises

In Exercises 1 – 2 row reduce the given matrix to reduced echelon form by hand and determine its
{c2.4.1} rank.
 
1 2 1 6
1. A =  3 6 1 14 
1 2 2 8
The reduced echelon form of the matrix is:
 
1 2 0 4
A= 0 0 1 2 
0 0 0 0
The rank of A is two, since the reduced echelon matrix has two nonzero rows.


{c2.4.1b}
 
1 −2 3
2. B =  3 −6 9 
1 −8 2
The reduced echelon form of the matrix is:
 10 
1 0
3
1
 
B=
 0 1


6
0 0 0

The rank of B is two, since the reduced echelon matrix has two nonzero rows.

{C2S4_1c}
3. How many solutions does the equation
   
      ( x1 )     ( 2 )
    A ( x2 )  =  ( 1 )
      ( x3 )     ( 2 )

have for the following choices of A. Explain your reasoning.


 
1 0 1
(a) A =  0 1 0 
0 0 0
 
1 3 1
(b) A =  2 1 0 
0 0 1
 
1 1 1
(c) A =  1 2 1 
1 1 1

Answer: (a) no solutions; (b) 1 solution; (c) infinitely many solutions


Solution:

(a) The third equation in this system is 0 = 2, which is inconsistent.


(b) A is invertible; so there is 1 solution
(c) Reduce the augmented matrix to echelon form. The rank of A is 2 as is the rank of the
augmented matrix. Therefore, there exists a one-parameter set of solutions.


{c2.4.2}
4. The augmented matrix of a consistent system of five equations in seven unknowns has rank
equal to three. How many parameters are needed to specify all solutions?
Answer: Four parameters are needed to specify all solutions.
Solution: According to Theorem 2.4.6, n − ` parameters are needed to parameterize the set of all
solutions of a linear system, where n is the number of unknowns, and ` is the rank of the reduced
echelon matrix. In this case, n = 7 and ` = 3.
{c2.4.2b}
5. The augmented matrix of a consistent system of nine equations in twelve unknowns has rank
equal to five. How many parameters are needed to specify all solutions?
Answer: Seven parameters are needed to specify all solutions.
Solution: According to Theorem 2.4.6, n − ` parameters are needed to parameterize the set of all
solutions of a linear system, where n is the number of unknowns, and ` is the rank of the reduced
echelon matrix. In this case, n = 12 and ` = 5.
{c2.4.2b.2}
6. Consider the system of equations
x1 + 3x3 = 1
−x1 + 2x2 − 3x3 = 1
2x2 + ax3 = b
For which real numbers a and b does the system have no solutions, a unique solution, or infinitely
many solutions? Your answer should subdivide the ab-plane into three disjoint sets.
Answer: Unique solutions occur when a ≠ 0; no solution occurs when a = 0 and b ≠ 2; and
infinitely many solutions exist when a = 0 and b = 2.
Solution: Use row reduction on the augmented matrix to obtain
   
1 0 3 1 1 0 3 1
 −1 2 −3 1  →  0 2 0 2 →
0 2 a b 0 2 a b
   
1 0 3 1 1 0 3 1
 0 2 0 2 → 0 1 0 1 
0 0 a b−2 0 0 a b−2
If a ≠ 0 the system has a unique solution. If a = 0 we obtain the echelon form matrix
 
1 0 3 1
 0 1 0 1 
0 0 0 b−2
There are no solutions if b ≠ 2 and infinitely many solutions if b = 2.


In Exercises 7 – 10, use rref on the given augmented matrices to determine whether the associ-
ated system of linear equations is consistent or inconsistent. If the equations are consistent, then
determine how many parameters are needed to enumerate all solutions.

{c2.4.3a} 7. (matlab)
{MATLAB:17}
        (  2    1    3   −2    4   1 )
    A = (  5   12   −1    3    5   1 )                          (2.4.2*)
        ( −4  −21   11  −12    2   1 )
        ( 23   59   −8   17   21   4 )

Answer: Matrix A is consistent and requires 3 parameters to enumerate all solutions.


Solution:

rref(A) =
1.0000 0 1.9474 -1.4211 2.2632 0.5789
0 1.0000 -0.8947 0.8421 -0.5263 -0.1579
0 0 0 0 0 0
0 0 0 0 0 0

{c2.4.3b} 8. (matlab)  
2 4 6 −2 1
{MATLAB:18} B= 0 0 4 1 −1  (2.4.3*)
2 4 0 1 2

Answer: Matrix B is consistent and requires 1 parameter.


Solution:

rref(B) =
1.0000 2.0000 0 0 1.0556
0 0 1.0000 0 -0.2222
0 0 0 1.0000 -0.1111

{c2.4.3c} 9. (matlab)  
2 3 −1 4
{MATLAB:19} C= 8 11 −7 8  (2.4.4*)
2 2 −4 −3

Answer: Matrix C is inconsistent.


Solution:

rref(C) =
1 0 -5 0


0 1 3 0
0 0 0 1

{c2.4.3d} 10. (matlab)
{MATLAB:20}
        ( 2.3    4.66   −1.2     2.11    −2      )
    D = ( 0      0       1.33    0        1.44   )              (2.4.5*)
        ( 4.6    9.32   −7.986   4.22   −10.048  )
        ( 1.84   3.728  −5.216   1.688   −6.208  )

Answer: Matrix D is consistent and requires 2 parameters.


Solution:

rref(D) =
1.0000 2.0261 0 0.9174 -0.3047
0 0 1.0000 0 1.0827
0 0 0 0 0
0 0 0 0 0

In Exercises 11 – 13 compute the rank of the given matrix.


 
1 −2
11. (matlab) .
{c2.4.4a} −3 6
Answer: The rank of the matrix is 1.
Solution: Use the MATLAB command rank(A) to determine the rank of a matrix A.
 
2 1 0 1
12. (matlab)  −1 3 2 4 .
{c2.4.4b} 5 −1 2 −2
The rank of the matrix is 3.
 
{c2.4.4c}
13. (matlab)
        (  3   1   0 )
        ( −1   2   4 )
        (  2   3   4 )
        (  4  −1  −4 ) .
The rank of the matrix is 2.
{A:2.4.1}
14. Prove that the rank of an m × n matrix A is less than or equal to the minimum of m and n.
Suppose A is row equivalent to the m × n reduced row echelon matrix E. The rank of A equals the
number of pivots in E. Since there is at most 1 pivot in each column, the number of pivots is less
than or equal to the number of columns n of E. Similarly, since each row of E contains at most
one pivot, the number of pivots in E is at most the number m of rows of E. It follows that the
rank of A is less than or equal to both m and n and hence the minimum of m and n.


{mc.exerciseErr5}
15. Consider the matrix

        (  1   0  −1 )
    A = ( −2   0   2 )
        (  0   1  −2 )
        (  0   0   0 )

(a) Describe the sets of vectors b = (b1 , b2 , b3 , b4 )t ∈ R4 so that the system of equations
Ax = b has (i) no solution, (ii) one solution, and (iii) infinitely many solutions.
(b) Is the vector y = (2, −4, 5, 0)t a linear combination of the columns of A? If so, express
it as one.

Answer:

(a) (i) The set consists of those b with b2 + 2b1 ≠ 0 or b4 ≠ 0. (ii) The set is empty. (iii) The
set consists of vectors for which b2 + 2b1 = 0 and b4 = 0.
(b) Yes; y = 2C1 + 5C2 (see the solution below).

Solution:

(a) Begin by row reducing the augmented matrix (A | b) to reduced echelon form, obtaining

    ( 1   0  −1   b1        )
    ( 0   1  −2   b3        )
    ( 0   0   0   b2 + 2b1  )
    ( 0   0   0   b4        )

(i) The system is inconsistent if either b2 + 2b1 ≠ 0 or b4 ≠ 0. This is the set of vectors b for
which there is no solution to the equation Ax = b.
(ii,iii) Solutions can exist only if b4 = 0 and b2 + 2b1 = 0. Then the system Ax = b simplifies
to

    ( 1   0  −1   b1 )
    ( 0   1  −2   b3 )
    ( 0   0   0   0  )
    ( 0   0   0   0  )


Solving this system, one finds that solutions have the form
x1 = x3 + b1 and x2 = 2x3 + b3 .
Thus, when there are any solutions to Ax = b, there are infinitely many solutions (one for
each value of x3 ). So (ii) is the empty set and (iii) is the set of b with b4 = 0 and b2 + 2b1 = 0.
(b) Denote the first column of A by C1 , the second by C2 and the third by C3 . A linear combination
of these columns can be written as
 
x1
x1 C1 + x2 C2 + x3 C3 = (C1 | C2 | C3 )  x2  = Ax.
x3
From (a) it follows that y is a linear combination of the columns if and only if y4 = 0 and
y2 + 2y1 = 0. Both constraints hold for y = (2, −4, 5, 0)t ; solving Ax = y (for instance with
x3 = 0) gives y = 2C1 + 5C2 .
{A:2.4.2}
16. Consider the augmented matrix
 
1 −r 1
A=
r −1 1
where r is a real parameter.

(a) Find all r so that rank(A) = 2.


(b) Find all r for which the corresponding linear system has
(i) no solution,
(ii) one solution, and
(iii) infinitely many solutions.

Solution: Subtracting r times the first row of A from the second row of that matrix yields

    ( 1     −r        1   )     ( 1        −r            1   )
    ( 0   r² − 1    1 − r )  =  ( 0   (r + 1)(r − 1)    1 − r )

So the reduced row echelon form of A is

                ( 1   0    1/(1 + r) )
                ( 0   1   −1/(1 + r) )        if r ≠ ±1,

    RREF(A) =   ( 1  −1    1 )
                ( 0   0    0 )                if r = 1,

                ( 1   1    0 )
                ( 0   0    1 )                if r = −1.


(a) rank(A) = 2 if r ≠ 1.
(b) The linear system corresponding to the augmented matrix A has
(i) no solution if r = −1,
(ii) one solution if r ≠ ±1, and
(iii) infinitely many solutions if r = 1.


{S:specialcoeff} 2.5 Linear Equations with Special Coefficients


In this chapter we have shown how to use elementary row operations to solve systems of
linear equations. We have assumed that each linear equation in the system has the form

aj1 x1 + · · · + ajn xn = bj ,

where the aji s and the bj s are real numbers. For simplicity, in our examples we have only
chosen equations with integer coefficients — such as:

2x1 − 3x2 + 15x3 = −1.

Systems with Nonrational Coefficients In fact, a more general choice of coefficients for a
system of two equations might have been

          √2 x1 + 2πx2 = 22.4
{e:irrat}  3x1 + 36.2x2 = e.                                      (2.5.1)

Suppose that we solve (2.5.1) by elementary row operations. In matrix form we have the
augmented matrix
    ( √2    2π     22.4 )
    (  3   36.2      e  ) .

Proceed with the following elementary row operations. Divide the 1st row by √2 to obtain

    ( 1    π√2    11.2√2 )
    ( 3    36.2      e   ) .

Next, subtract 3 times the 1st row from the 2nd row to obtain:

    ( 1        π√2           11.2√2     )
    ( 0   36.2 − 3π√2      e − 33.6√2   ) .

Then divide the 2nd row by 36.2 − 3π√2, obtaining:

    ( 1    π√2               11.2√2               )
    ( 0     1     (e − 33.6√2)/(36.2 − 3π√2)      ) .

Finally, multiply the 2nd row by π√2 and subtract it from the 1st row to obtain:

    ( 1    0    11.2√2 − π√2 (e − 33.6√2)/(36.2 − 3π√2) )
    ( 0    1           (e − 33.6√2)/(36.2 − 3π√2)       ) .

So

              x1 = 11.2√2 − π√2 (e − 33.6√2)/(36.2 − 3π√2)
{e:irratans}                                                      (2.5.2)
              x2 = (e − 33.6√2)/(36.2 − 3π√2)

which is both hideous to look at and quite uninformative. It is, however, correct.
Both x1 and x2 are real numbers — they had to be because all of the manipulations in-
volved addition, subtraction, multiplication, and division of real numbers — which yield
real numbers.

If we wanted to use MATLAB to perform these calculations, we would have to convert √2, π,
and e to their decimal equivalents — at least up to a certain decimal place accuracy. This
introduces errors — which for the moment we assume are small.
To enter A and b in MATLAB , type

A = [sqrt(2) 2*pi; 3 36.2];


b = [22.4; exp(1)];

Now type A to obtain:

A =
1.4142 6.2832
3.0000 36.2000

As its default display, MATLAB displays real numbers to four decimal place accuracy. Sim-
ilarly, type b to obtain

b =
22.4000
2.7183


Next use MATLAB to solve this system by typing:

A\b

to obtain

ans =
24.5417
-1.9588

The reader may check that this answer agrees with the answer in (2.5.2) to MATLAB output
accuracy by typing

x2 = (exp(1)-33.6*sqrt(2))/(36.2-3*pi*sqrt(2))
x1 = 11.2*sqrt(2)-pi*sqrt(2)*x2

to obtain

x1 =
24.5417

and

x2 =
-1.9588

More Accuracy MATLAB can display numbers in machine precision (15 digits) rather than
the standard four decimal place accuracy. To change to this display, type

format long

Now solve the system of equations (2.5.1) again by typing

A\b

and obtaining

ans =
24.54169560069650
-1.95875151860858


Integers and Rational Numbers Now suppose that all of the coefficients in a system
of linear equations are integers. When we add, subtract or multiply integers — we get
integers. In general, however, when we divide an integer by an integer we get a rational
number rather than an integer. Indeed, since elementary row operations involve only the
operations of addition, subtraction, multiplication and division, we see that if we perform
elementary row operations on a matrix with integer entries, we will end up with a matrix
with rational numbers as entries.
MATLAB can display calculations using rational numbers rather than decimal numbers. To
display calculations using only rational numbers, type

format rational

For example, let

{e:A4}
        ( 2   2    1   0 )
    A = ( 1   3   −5   1 )                                        (2.5.3*)
        ( 4   2    1   3 )
        ( 2   1   −1   4 )

and let

{e:b4}
        (  1 )
    b = (  1 ) .                                                  (2.5.4*)
        ( −5 )
        (  2 )
Enter A and b into MATLAB by typing

e2_5_3
e2_5_4

Solve the system by typing

A\b

to obtain

ans =
-357/41
309/41
137/41
156/41


To display the answer in standard decimal form, type

format
A\b

obtaining

ans =
-8.7073
7.5366
3.3415
3.8049

The same logic shows that if we begin with a system of equations whose coefficients are
rational numbers, we will obtain an answer consisting of rational numbers — since adding,
subtracting, multiplying and dividing rational numbers yields rational numbers. More pre-
cisely:

Theorem 2.5.1. Let A be an n × n matrix that is row equivalent to In , and let b be an n


vector. Suppose that all entries of A and b are rational numbers. Then there is a unique
solution to the system corresponding to the augmented matrix (A|b) and this solution has
rational numbers as entries.

Proof Since A is row equivalent to In , Corollary 2.4.8 states that this linear system has
a unique solution x. As we have just discussed, solutions are found using elementary row
operations — hence the entries of x are rational numbers. 
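One small check is worth keeping in mind here: format rational only changes how MATLAB displays
numbers; the arithmetic behind A\b is still done in floating point. A sketch, with the matrix
(2.5.3*) and the vector (2.5.4*) typed in directly:

A = [2 2 1 0; 1 3 -5 1; 4 2 1 3; 2 1 -1 4];
b = [1; 1; -5; 2];
x = A\b;
norm(A*x - b)        % essentially zero, so x solves the system
format rational
x                    % displays -357/41, 309/41, 137/41, 156/41
format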

Complex Numbers In the previous parts of this section, we have discussed why solutions
to linear systems whose coefficients are rational numbers must themselves have entries that
are rational numbers. We now discuss solving linear equations whose coefficients are more
general than real numbers; that is, whose coefficients are complex numbers.
First recall that addition, subtraction, multiplication and division of complex numbers yields
complex numbers. Suppose that

a = α + iβ
b = γ + iδ



where α, β, γ, δ are real numbers and i = √−1. Then

    a + b = (α + γ) + i(β + δ)
    a − b = (α − γ) + i(β − δ)
    ab    = (αγ − βδ) + i(αδ + βγ)
    a/b   = (αγ + βδ)/(γ² + δ²) + i(βγ − αδ)/(γ² + δ²)

MATLAB has been programmed to do arithmetic with complex numbers using exactly the
same instructions as it uses to do arithmetic with real and rational numbers. For example,
we can solve the system of linear equations
(4 − i)x1 + 2x2 = 3−i
2x1 + (4 − 3i)x2 = 2+i
in MATLAB by typing

A = [4-i 2; 2 4-3i];
b = [3-i; 2+i];
A\b

The solution to this system of equations is:

ans =
0.8457 - 0.1632i
-0.1098 + 0.2493i

Note: Care must be given when entering complex numbers into arrays in MATLAB. For example, if you
type

b = [3 -i; 2 +i]

then MATLAB will respond with the 2 × 2 matrix

b =
3.0000 0 - 1.0000i
2.0000 0 + 1.0000i

Typing either b = [3-i; 2+i] or b = [3 - i; 2 + i] will yield the desired 2 × 1 column vector.

All of the theorems concerning the existence and uniqueness of row echelon form — and for
solving systems of linear equations — work when the coefficients of the linear system are
complex numbers as opposed to real numbers. In particular:


{T:complexcoeff}
Theorem 2.5.2. If the coefficients of a system of n linear equations in n unknowns are
complex numbers and if the coefficient matrix is row equivalent to In , then there is a unique
solution to this system whose entries are complex numbers.

Complex Conjugation Let a = α + iβ be a complex number. Then the complex conjugate
of a is defined to be
    \overline{a} = α − iβ.
Let a = α + iβ and c = γ + iδ be complex numbers. Then we claim that

    \overline{a + c} = \overline{a} + \overline{c}
                                                                  (2.5.5)
    \overline{ac}    = \overline{a} \overline{c}

To verify these statements, calculate

    \overline{a + c} = \overline{(α + γ) + i(β + δ)} = (α + γ) − i(β + δ)
                     = (α − iβ) + (γ − iδ) = \overline{a} + \overline{c}

and

    \overline{ac} = \overline{(αγ − βδ) + i(αδ + βγ)}
                  = (αγ − βδ) − i(αδ + βγ)
                  = (α − iβ)(γ − iδ) = \overline{a} \overline{c}.
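These identities are easy to spot-check numerically; a small sketch using MATLAB's built-in
complex conjugate conj:

a = 2 + 3i;  c = -1 + 4i;
conj(a + c) - (conj(a) + conj(c))   % returns 0
conj(a*c) - conj(a)*conj(c)         % returns 0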

Exercises

{c2.5.1}
1. Solve the system of equations

x1 − ix2 = 1
ix1 + 3x2 = −1

Check your answer using MATLAB.


 3 1 

x1
 − i
= 2 1
2 .
1
x2 − − i
2 2


Solve the systems of linear equations given in Exercises 2 – 3 and verify that the answers are rational
{c2.5.1A} numbers.
      x1 +  x2 − 2x3 = 1
2.    x1 +  x2 +  x3 = 2
      x1 − 7x2 +  x3 = 3

    ( x1 )     (  43/24 )
    ( x2 )  =  (  −1/8  ) .
    ( x3 )     (   1/3  )
{c2.5.1B}
      x1 −  x2 = 1
3.
      x1 + 3x2 = −1

    ( x1 )     (  1/2 )
    ( x2 )  =  ( −1/2 ) .

In Exercises 4 – 6 use MATLAB to solve the given system of linear equations to four significant
decimal places.

{c2.5.2a} 4. (matlab)
      0.1x1 +  √5 x2 −     2x3 = 1
     −√3 x1 +   π x2 −   2.6x3 = 14.3 .
          x1 −  7 x2 + (π/2)x3 = √2
Enter the left-hand side of each system as matrix A and the right-hand side as vector b:

A =
0.1000 2.2361 -2.0000
-1.7321 3.1416 -2.6000
1.0000 -7.0000 1.5708

b = A\b =
1.0000 -7.2216
14.3000 -1.9048
1.4142 -2.9907

{c2.5.2b} 5. (matlab)
(4 − i)x1 + (2 + 3i)x2 = −i
.
ix1 − 4x2 = 2.2

Enter the left-hand side of each system as matrix A and the right-hand side as vector b:


A =
4.0000 - 1.0000i 2.0000 + 3.0000i
0 + 1.0000i -4.0000

b = A\b =
0 - 1.0000i 0.3006+ 0.2462i
2.2000 -0.6116+ 0.0751i

{c2.5.2c} 6. (matlab)
      (2 + i)x1 + (√2 − 3i)x2 −      10.66x3 = 4.23
          14x1 −      √5 i x2 + (10.2 − i)x3 = 3 − 1.6i .
      −4.276x1 +         2x2 −   (4 − 2i)x3 = √2 i

Hint: When entering √2 i in MATLAB you must type sqrt(2)*i, even though when you enter 2i,
you can just type 2i.
Enter the left-hand side of each system as matrix A and the right-hand side as vector b

A =
2.0000 + 1.0000i 1.4142 - 3.0000i -10.6600
14.0000 0 - 2.2361i 10.2000 - 1.0000i
-4.2760 2.0000 -4.0000 + 2.0000i

b = A\b =
4.2300 0.2060- 0.1139i
3.0000 - 1.6000i 0.1982+ 0.6586i
0 + 1.4142i -0.1358+ 0.0296i


{S:uniquerowechelon} 2.6 Uniqueness of Reduced Echelon Form


In this section we prove Theorem 2.4.9, which states that every matrix is row equivalent to
precisely one reduced echelon form matrix.
Proof of Theorem 2.4.9: Suppose that E and F are two m × n reduced echelon matrices
that are row equivalent to A. Since elementary row operations are invertible, the two
matrices E and F are row equivalent. Thus, the systems of linear equations associated to
the m × (n + 1) matrices (E|0) and (F |0) must have exactly the same set of solutions. It
is the fact that the solution sets of the linear equations associated to (E|0) and (F |0) are
identical that allows us to prove that E = F .
Begin by renumbering the variables x1 , . . . , xn so that the equations associated to (E|0)
have the form:
x1 = −a1,`+1 x`+1 − · · · − a1,n xn
x2 = −a2,`+1 x`+1 − · · · − a2,n xn
{e1-ell2a} .. .. (2.6.1)
. .
x` = −a`,`+1 x`+1 − · · · − a`,n xn .
In this form, pivots of E occur in the columns 1, . . . , `. We begin by showing that the
matrix F also has pivots in columns 1, . . . , `. Moreover, there is a unique solution to these
equations for every choice of numbers x`+1 , . . . , xn .
Suppose that the pivots of F do not occur in columns 1, . . . , `. Then there is a row in F
whose first nonzero entry occurs in a column k > `. This row corresponds to an equation

xk = ck+1 xk+1 + · · · + cn xn .

Now, consider solutions that satisfy

x`+1 = · · · = xk−1 = 0 and xk+1 = · · · = xn = 0.

In the equations associated to the matrix (E|0), there is a unique solution associated with
every number xk ; while in the equations associated to the matrix (F |0), xk must be zero to
be a solution. This argument contradicts the fact that the (E|0) equations and the (F |0)
equations have the same solutions. So the pivots of F must also occur in columns 1, . . . , `,
and the equations associated to F must have the form:

x1 = −â1,`+1 x`+1 − · · · − â1,n xn


x2 = −â2,`+1 x`+1 − · · · − â2,n xn
{e1-ell2b} .. .. (2.6.2)
. .
x` = −â`,`+1 x`+1 − · · · − â`,n xn


where âi,j are scalars.


To complete this proof, we show that ai,j = âi,j . These equalities are verified as follows.
There is just one solution to each system (2.6.1) and (2.6.2) of the form

x`+1 = 1, x`+2 = · · · = xn = 0.

These solutions are


(−a1,`+1 , . . . , −a`,`+1 , 1, 0, · · · , 0)
for (2.6.1) and
(−â1,`+1 , . . . , −â`,`+1 , 1, 0 · · · , 0)
for (2.6.2). It follows that aj,`+1 = âj,`+1 for j = 1, . . . , `. Complete this proof by repeating
this argument. Just inspect solutions of the form

x`+1 = 0, x`+2 = 1, x`+3 = · · · = xn = 0

through
x`+1 = · · · = xn−1 = 0, xn = 1.


3 Matrices and Linearity


In this chapter we take the first step in abstracting vectors and matrices to mathematical
objects that are more than just arrays of numbers. We begin the discussion in Section 3.1 by
introducing the multiplication of a matrix times a vector. Matrix multiplication simplifies
the way in which we write systems of linear equations and is the way by which we view
matrices as mappings. This latter point is discussed in Section 3.2.
The mappings that are produced by matrix multiplication are special and are called linear
mappings. Some properties of linear maps are discussed in Section 3.3. One consequence of
linearity is the principle of superposition that enables solutions to systems of linear equations
to be built out of simpler solutions. This principle is discussed in Section 3.4.
In Section 3.5 we introduce multiplication of two matrices and discuss properties of this
multiplication in Section 3.6. Matrix multiplication is defined in terms of composition of
linear mappings which leads to an explicit formula for matrix multiplication. This dual role
of multiplication of two matrices — first by formula and second as composition — enables
us to solve linear equations in a conceptual way as well as in an algorithmic way. The
conceptual way of solving linear equations is through the use of matrix inverses (or inverse
mappings) which is described in Section 3.7. In this section we also present important
properties of matrix inversion and a method of computation of matrix inverses. There is a
simple formula for computing inverses of 2 × 2 matrices based on determinants. The chapter
ends with a discussion of determinants of 2 × 2 matrices in Section 3.8.


{chap:matrices}

{S:4.1} 3.1 Matrix Multiplication of Vectors


In Chapter 2 we discussed how matrices appear when solving systems of m linear equations
in n unknowns. Given the system

a11 x1 + a12 x2 + ··· + a1n xn = b1


a21 x1 + a22 x2 + ··· + a2n xn = b2
{general2} .. .. .. .. (3.1.1)
. . . .
am1 x1 + am2 x2 + · · · + amn xn = bm ,

we saw that all relevant information is contained in the m × n matrix of coefficients


 
a11 a12 · · · a1n
 a21 a22 · · · a2n 
A= .. .. .. 
 
 . . . 
am1 am2 ··· amn

and the m vector  


b1
b =  ...  .
 

bm

Matrices Times Vectors We motivate multiplication of a matrix times a vector just as a


notational advance that simplifies the presentation of the linear systems. It is, however, much
more than that. This concept of multiplication allows us to think of matrices as mappings
and these mappings tell us much about the structure of solutions to linear systems. But
first we discuss the notational advantage.
Multiplying an m × n matrix A times an n vector x produces an m vector, as follows:
    
{Atimesx}
          ( a11  · · ·  a1n ) ( x1 )     ( a11 x1 + · · · + a1n xn )
    Ax =  (  ..          ..  ) (  .. )  =  (            ..           )          (3.1.2)
          ( am1  · · ·  amn ) ( xn )     ( am1 x1 + · · · + amn xn )

For example, when m = 2 and n = 3, then the product is a 2-vector

{Atimesx231}
    ( a11  a12  a13 ) ( x1 )     ( a11 x1 + a12 x2 + a13 x3 )
    ( a21  a22  a23 ) ( x2 )  =  ( a21 x1 + a22 x2 + a23 x3 ) .                  (3.1.3)
                      ( x3 )


As a specific example, compute


 
    ( 2  3  −1 ) (  2 )     ( 2·2 + 3·(−3) + (−1)·4 )     ( −9 )
    ( 4  1   5 ) ( −3 )  =  ( 4·2 + 1·(−3) +   5·4  )  =  (  25 ) .
                 (  4 )

Using (3.1.2) we have a compact notation for writing systems of linear equations. For
example, using a special instance of (3.1.3),
 
    ( 2  3  −1 ) ( x1 )     ( 2x1 + 3x2 −  x3 )
    ( 4  1   5 ) ( x2 )  =  ( 4x1 +  x2 + 5x3 ) .
                 ( x3 )

In this notation we can write the system of two linear equations in three unknowns

    2x1 + 3x2 −  x3 =  2
    4x1 +  x2 + 5x3 = −1

as the matrix equation

    ( 2  3  −1 ) ( x1 )     (  2 )
    ( 4  1   5 ) ( x2 )  =  ( −1 ) .
                 ( x3 )

Indeed, the general system of linear equations (3.1.1) can be written in matrix form using
matrix multiplication as
Ax = b
where A is the m × n matrix of coefficients, x is the n vector of unknowns, and b is the m
vector of constants on the right hand side of (3.1.1).

Matrices Times Vectors in MATLAB We have already seen how to define matrices and
vectors in MATLAB. Now we show how to multiply a matrix times a vector using MATLAB.
Load the matrix A  
5 −4 3 −6 2
 2 −4 −2 −1 1 
{eq:5matrix} (3.1.4*)
 
A=
 1 2 1 −5 3 

 −2 −1 −2 1 −1 
1 −6 1 1 4


and the vector x  


−1
 2 
{eq:5rhs} (3.1.5*)
 
x=
 1 

 −1 
3
into MATLAB by typing

e3_1_4
e3_1_5

The multiplication Ax can be performed by typing

b = A*x

and the result should be

b =
2
-8
18
-6
-1

We may verify this result by solving the system of linear equations Ax = b. Indeed if we
type

A\b

then we get the vector x back as the answer.

Exercises

{c4.1.1}
1. Let    
2 1 3
A= and x= .
−1 4 −2
Compute Ax.


      
2 1 3 6−2 4
Ax = = =
−1 4 −2 −3 − 8 −11

{c4.1.2}
2. Let  
  2
3 4 1
B= and y =  5 .
1 2 3
−2
Compute By.

 
  2    
3 4 1  5 = 6 + 20 − 2 24
By = =
1 2 3 2 + 10 − 6 6
−2

In Exercises 3 – 6 decide whether or not the matrix vector product Ax can be computed; if it can,
{c4.1.a3a} compute the product.
   
1 2 2
3. A = and x = .
0 −5 2
 
6
Ax = .
−10
{c4.1.a3b}
 
  2
1 2
4. A = and x =  2 .
0 −5
4

{c4.1.a3c} The product Ax cannot be computed.


 
−1
5. A = 1 2 4 and x =  1 .


3
Ax = (13).
{c4.1.a3d}
 
1
6. A = (5) and x= .
0
The product Ax cannot be computed.


{c4.1.b3}
7. Let

         ( a11   a12   · · ·  a1n )             ( x1 )
    A =  ( a21   a22   · · ·  a2n )    and  x = ( x2 )  .
         (  ..                ..  )             (  .. )
         ( am1   am2   · · ·  amn )             ( xn )

Denote the columns of the matrix A by

          ( a11 )          ( a12 )                  ( a1n )
    A1 =  ( a21 ) ,   A2 = ( a22 ) ,   · · ·   An = ( a2n ) .
          (  ..  )         (  ..  )                 (  ..  )
          ( am1 )          ( am2 )                  ( amn )

Show that the matrix vector product Ax can be written as

    Ax = x1 A1 + x2 A2 + · · · + xn An ,

where xj Aj denotes scalar multiplication (see Chapter 1).

Compute Ax directly:

          ( x1 a11 + x2 a12 + · · · + xn a1n )        ( a11 )        ( a12 )               ( a1n )
    Ax =  ( x1 a21 + x2 a22 + · · · + xn a2n )  =  x1 ( a21 )  +  x2 ( a22 )  + · · · + xn ( a2n ) .
          (                 ..               )        (  ..  )       (  ..  )              (  ..  )
          ( x1 am1 + x2 am2 + · · · + xn amn )        ( am1 )        ( am2 )               ( amn )

So, it is indeed true that Ax = x1 A1 + x2 A2 + · · · + xn An .
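A numerical spot check of this identity (a sketch with a small matrix):

A = [2 3 -1; 4 1 5];
x = [2; -3; 4];
A*x                                        % returns (-9; 25)
x(1)*A(:,1) + x(2)*A(:,2) + x(3)*A(:,3)    % the same vector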
{c4.1.3}
8. Let    
1 1 1
C= and b= .
2 −1 1
Find a 2-vector z such that Cz = b.
2 1
Answer: The equation Cz = b is valid for z = ( , )t .
3 3
Solution: Let z = (z1 , z2 )t . Then Cz = b implies
    
1 1 z1 1
=
2 −1 z2 1
which can be multiplied out, yielding the linear system
z1 + z2 = 1
2z1 − z2 = 1.


2 1
This system can be solved by substitution to obtain z1 = and z2 = .
3 3

{c4.1.4}
9. Write the system of linear equations

2x1 + 3x2 − 2x3 = 4


6x1 − 5x3 = 1

in the matrix form Ax = b.


 
  x1  
2 3 −2  x2  = 4
6 0 −5 1
x3

{c4.1.6}
10. Find all solutions to

    ( 1  3  −1   4 ) ( x1 )     ( 14 )
    ( 2  1   5   7 ) ( x2 )  =  ( 17 ) .
    ( 3  4   4  11 ) ( x3 )     ( 31 )
                     ( x4 )

Answer: All solutions are of the form

    x1 = 37/5 − (16/5)x3 − (17/5)x4
    x2 = 11/5 +  (7/5)x3 −  (1/5)x4

where x3 and x4 are free parameters.

Solution: Create the augmented matrix

    ( 1  3  −1   4   14 )
    ( 2  1   5   7   17 )
    ( 3  4   4  11   31 )

which can be row reduced to

    ( 1  0   16/5   17/5   37/5 )
    ( 0  1   −7/5    1/5   11/5 )
    ( 0  0    0      0      0   )

yielding the desired solution.


{c4.1.7}
11. Let A be a 2 × 2 matrix. Find A so that
   
1 3
A =
0 −5
   
0 1
A = .
1 4

Answer: The equations are valid when


 
3 1
A= .
−5 4
Solution: Let  
a11 a12
A= .
a21 a22
So          
a11 a12 1 3 a11 a12 0 1
= and =
a21 a22 0 −5 a21 a22 1 4.
These matrix equations are equivalent to the linear equations
a11 = 3
a21 = −5
a12 = 1
a22 = 4.

{c4.1.8}
12. Let A be a 2 × 2 matrix. Find A so that
   
1 2
A =
1 −1
   
1 4
A = .
−1 3

Answer: The equations are valid when


 
3 −1
A= .
1 −2
Solution: Let  
a11 a12
A=
a21 a22
Then          
a11 a12 1 2 a11 a12 1 4
= and =
a21 a22 1 −1 a21 a22 −1 3.


These matrix equations yield the linear system

a11 + a12 = 2
a21 + a22 = −1
a11 − a12 = 4
a21 − a22 = 3,

which can be written as an augmented matrix and row-reduced to yield the values aij :
   
1 1 0 0 2 1 0 0 0 3
 0
 0 1 1 −1   −→  0 1 0 0 −1  .
 
 1 −1 0 0 4   0 0 1 0 1 
0 0 1 −1 3 0 0 0 1 2

{c4.1.9}
13. Is there an upper triangular 2 × 2 matrix A such that
   
1 1
{eq:avect} A = ? (3.1.6)
0 2

Is there a symmetric 2 × 2 matrix A satisfying (3.1.6)?


Answer: There is no 2 × 2 upper triangular matrix A that satisfies equation (3.1.6), but any
symmetric matrix A of the form  
1 2
A= ,
2 a22
where a22 is a real number, satisfies (3.1.6).
Solution: Let A be the upper triangular matrix
 
a11 a12
.
0 a22

The resulting matrix equation


    
a11 a12 1 1
=
0 a22 0 2

yields the linear equations


a11 = 1
0 = 2.
The second equation is inconsistent, so there is no solution.
Then let A be the symmetric matrix
 
a11 a12
.
a12 a22


Write the matrix equation     


a11 a12 1 1
= ,
a12 a22 0 2
from which we obtain the consistent linear system
a11 = 1
a12 = 2.

In Exercises 14 – 15 use MATLAB to compute b = Ax for the given A and x.

{c4.1.a10a} 14. (matlab)
{multiplication-exercise}
        ( −0.2  −1.8   3.9  −6    −1.6 )            ( −2.6 )
        (  6.3   8     3     2.5   5.1 )            (  2.4 )
    A = ( −0.8  −9.9   9.7   4.7   5.9 )   and  x = (  4.6 ) .          (3.1.7*)
        ( −0.9  −4.1   1.1  −2.5   8.4 )            ( −6.1 )
        ( −1    −9    −2    −9.8   6.9 )            (  8.1 )

After loading the system into MATLAB, typing b = A*x yields

b =
37.7800
42.6800
42.0600
77.6100
87.4700

{c4.1.a10b} 15. (matlab)  


14 −22 −26 −2 −77 100 −90

 26 25 −15 −63 33 92 14 

 −53 40 19 40 −27 −88 40 
multiplication-exercise-2} (3.1.8*)
 
A=
 10 −21 13 97 −72 −28 92 


 86 −17 43 61 13 10 50 

 −33 31 2 41 65 −48 48 
31 68 55 −3 35 19 −14
and  
2.7

 6.1 


 −8.3 

x=
 8.9 .


 8.3 

 2 
−4.9

Load the system into MATLAB, then type b = A*x to obtain:


b =
103.5000
175.8000
-296.9000
-450.1000
197.4000
656.6000
412.4000

{c4.1.10} 16. (matlab) Let


   
2 4 −1 2
{inverse-exercise} A= 1 3 2  and b =  1 . (3.1.9*)
−1 −2 5 4
Find a 3-vector x such that Ax = b.
Using MATLAB:

A\b =
7.1111
-2.7778
1.1111

{c4.1.11} 17. (matlab) Let


   
1.3 −4.15 −1.2 1.12
{MATLAB:24} A =  1.6 −1.2 2.4  and b =  −2.1  . (3.1.10*)
−2.5 2.35 5.09 4.36

Find a 3-vector x such that Ax = b.


Using MATLAB:

A\b =
-2.3828
-1.0682
0.1794


{c4.1.12} 18. (matlab) Let A be a 3 × 3 matrix. Find A so that

   
2 1
A  −1  =  1 
1 −1
   
1 −1
A  −1  =  −2 
0 1
   
0 5
A 2  =  1 .
4 1

Hint: Rewrite these three conditions as a system of linear equations in the nine entries of A. Then
solve this system using MATLAB. (Then pray that there is an easier way.)
Answer: The conditions on A are met when

 
2.5 3.5 −0.5
A =  7.5 9.5 −4.5  .
−5.5 −6.5 3.5

Solution: Let
 
a11 a12 a13
A =  a21 a22 a23 
a31 a32 a33

Substituting for A in each of the three equations yields:

    
a11 a12 a13 2 1
 a21 a22 a23   −1 = 1 
 a31 a32 a33   1   −1 
a11 a12 a13 1 −1
 a21 a22 a23   −1 = −2 
a31 a32 a33   0   1
a11 a12 a13 0 5
 a21 a22 a23   2 = 1 .
a31 a32 a33 4 1


These equations can be rewritten as the linear system:

2a11 − a12 + a13 = 1


2a21 − a22 + a23 = 1
2a31 − a32 + a33 = −1
a11 − a12 = −1
a21 − a22 = −2
a31 − a32 = 1
2a12 + 4a13 = 5
2a22 + 4a23 = 1
2a32 + 4a33 = 1

We can enter the left-hand side of the system into MATLAB as matrix C and the right-hand side
as vector b.
   
2 −1 1 0 0 0 0 0 0 1
 0
 0 0 2 −1 1 0 0 0 


 1 

 0
 0 0 0 0 0 2 −1 1 


 −1 

 1 −1 0 0 0 0 0 0 0   −1 
   
C= 0 0 0 1 −1 0 0 0 0 
 b=
 −2 .

 0
 0 0 0 0 0 1 −1 0 


 1 

 0 2 4 0 0 0 0 0 0   5 
   
 0 0 0 0 2 4 0 0 0   1 
0 0 0 0 0 0 0 2 4 1

Then solve to obtain:

C\b =
2.5000
3.5000
-0.5000
7.5000
9.5000
-4.5000
-5.5000
-6.5000
3.5000

Substitute these values back into A to obtain the appropriate matrix.


{s:4.2} 3.2 Matrix Mappings


Having illustrated the notational advantage of using matrices and matrix multiplication, we
now begin to discuss why there is also a conceptual advantage to matrix multiplication, a
conceptual advantage that will help us to understand how systems of linear equations and
linear differential equations may be solved.
Matrix multiplication allows us to view m × n matrices as mappings from Rn to Rm . Let A
be an m × n matrix and let x be an n vector. Then

x 7→ Ax

defines a mapping from Rn to Rm .


The simplest example of a matrix mapping is given by 1 × 1 matrices. Matrix mappings
defined from R → R are
x 7→ ax
where a is a real number. Note that the graph of this function is just a straight line through
the origin (with slope a). From this example we see that matrix mappings are very special
mappings indeed. In higher dimensions, matrix mappings provide a richer set of mappings;
we explore here planar mappings — mappings of the plane into itself — using MATLAB
graphics and the program map.
The simplest planar matrix mappings are the dilatations. Let A = cI2 where c > 0 is a
scalar. When c < 1 vectors are contracted by a factor of c and these mappings are
examples of contractions. When c > 1 vectors are stretched or expanded by a factor of c
and these dilatations are examples of expansions. We now explore some more complicated
planar matrix mappings.
The next planar motions that we study are those given by the matrices
 
λ 0
A= .
0 µ

Here the matrix mapping is given by (x, y) 7→ (λx, µy); that is, a mapping that inde-
pendently stretches and/or contracts the x and y coordinates. Even these simple looking
mappings can move objects in the plane in a somewhat complicated fashion.

The Program map We use MATLAB to explore planar matrix mappings using the program
map. In MATLAB type the command

map


and a window appears labeled Map. The 2 × 2 matrix


 
0 −1
{e:map_A} . (3.2.1)
1 0

has been pre-entered. Click on the Custom button. In the Icons menu click on an icon —
say Dog — and a blue ‘Dog’ will appear in the graphing window. Next click on the Iterate
button and a new version of the Dog will appear in yellow —the yellow Dog is just rotated
about the origin counterclockwise by 90◦ from the blue dog. Indeed, the matrix (3.2.1)
rotates the plane counterclockwise by 90◦ . To verify this statement click on Iterate again
and see that the yellow dog rotates 90◦ counterclockwise into the magenta dog. Of course,
the magenta dog is rotated 180◦ from the original blue dog. Clicking on Iterate once more
produces a fourth dog — this one in cyan. Finally, one more click on the Iterate button will
rotate the cyan dog into a red dog that exactly covers the original blue dog.
Other matrices will produce different motions of the plane. Click on the Reset button.
Then either push the Custom button, type the entries in the matrix, and click on the Iterate
button; or choose one of the pre-assigned matrices listed in the Gallery menu and click on the
Iterate button. For example, clicking on the Contracting rotation button recalls the matrix
 
0.3 −0.8
0.8 0.3

This matrix rotates the plane through an angle of approximately 69.4◦ counterclockwise and
contracts the plane by a factor of approximately 0.85. Now click on Dog in the Icons menu
to bring up the blue dog again. Repeated clicking on Iterate rotates and contracts the dog
so that dogs in a cycling set of colors slowly converge towards the origin in a spiral of dogs.4

Rotations Rotating the plane counterclockwise through an angle θ is a motion given by a


matrix mapping. We show that the matrix that performs this rotation is:
 
cos θ − sin θ
{e:rotmat} Rθ = . (3.2.2)
sin θ cos θ

To verify that Rθ rotates the plane counterclockwise through angle θ, let vϕ be the unit
vector whose angle from the horizontal is ϕ; that is, vϕ = (cos ϕ, sin ϕ). We can write every
vector in R2 as rvϕ for some number r ≥ 0. Using the trigonometric identities for the cosine
[Footnote 4: When using the program map first choose an Icon (or Vector), second choose a Matrix
from the Gallery (or a Custom matrix), and finally click on Iterate. Then Iterate again or Reset to
start over.]


and sine of the sum of two angles, we have:


  
cos θ − sin θ r cos ϕ
Rθ (rvϕ ) =
sin θ cos θ r sin ϕ
 
r cos θ cos ϕ − r sin θ sin ϕ
=
r sin θ cos ϕ + r cos θ sin ϕ
 
cos(θ + ϕ)
= r
sin(θ + ϕ)
= rvϕ+θ .

This calculation shows that Rθ rotates every vector in the plane counterclockwise through
angle θ.
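A small MATLAB sketch of (3.2.2): build Rθ for one particular angle and check its action (the
angle and vectors here are just illustrations):

theta = pi/3;
R = [cos(theta) -sin(theta); sin(theta) cos(theta)];
R*[1; 0]                                  % equals (cos(theta); sin(theta))
norm(R*[0.3; -0.7]) - norm([0.3; -0.7])   % essentially 0: rotations preserve length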
It follows from (3.2.2) that R180◦ = −I2 . So rotating a vector in the plane by 180◦ is
the same as reflecting the vector through the origin. It also follows that the movement
associated with the linear map x 7→ −cx where x ∈ R2 and c > 0 may be thought of as a
dilatation (x 7→ cx) followed by rotation through 180◦ (x 7→ −x).
We claim that combining dilatations with general rotations produces spirals. Consider the
matrix  
c cos θ −c sin θ
S= = cRθ
c sin θ c cos θ
where c < 1. Then a calculation similar to the previous one shows that

S(rvϕ ) = c(rvϕ+θ ).

So S rotates vectors in the plane while contracting them by the factor c. Thus, multiplying
a vector repeatedly by S spirals that vector into the origin. The example that we just
considered while using map is
    ( 0.3  −0.8 )  ≈  ( 0.85 cos(69.4◦ )   −0.85 sin(69.4◦ ) ) ,
    ( 0.8   0.3 )     ( 0.85 sin(69.4◦ )    0.85 cos(69.4◦ ) )

which is an example of S with c = 0.85 and θ = 69.4◦ .
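The spiraling can also be seen numerically; a sketch with the values used above:

c = 0.85;  theta = 69.4*pi/180;
S = c*[cos(theta) -sin(theta); sin(theta) cos(theta)];
v = [1; 0];
for k = 1:5
    v = S*v;
    disp(norm(v))       % each length is 0.85 times the previous one
end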

A Notation for Matrix Mappings We reinforce the idea that matrices are mappings by
introducing a notation for the mapping associated with an m × n matrix A. Define

LA : Rn → Rm

by
LA (x) = Ax,

118
§3.2 Matrix Mappings

for every x ∈ Rn .
There are two special matrices: the m × n zero matrix O all of whose entries are 0 and the
n × n identity matrix In whose diagonal entries are 1 and whose off diagonal entries are 0.
For instance,  
1 0 0
I3 =  0 1 0  .
0 0 1

The mappings associated with these special matrices are also special. Let x be an n vector.
Then
{multby0} Ox = 0, (3.2.3)
where the 0 on the right hand side of (3.2.3) is the m vector all of whose entries are 0. The
mapping LO is the zero mapping — the mapping that maps every vector x to 0.
Similarly,
In x = x
for every vector x. It follows that
LIn (x) = x
is the identity mapping, since it maps every vector to itself. It is for this reason that the
matrix In is called the n × n identity matrix.

Exercises

In Exercises 1 – 3 find a nonzero vector that is mapped to the origin by the given matrix.
{c4.2.a1a}
 
0 1
1. A = .
0 −2
Answer: If x = (x1 , 0)t , where x1 is any real scalar, then Ax = 0.
Solution: Let x = (x1 , x2 )t and solve the system
    
0 1 x1 0
Ax = =
0 −2 x2 0

by row reducing A to obtain  


0 1
.
0 0
Thus, Ax = 0 when x2 = 0.


{c4.2.a1b}
 
1 2
2. B = .
−2 −4
Answer: If x = (−2x2 , x2 )t , where x2 is any real scalar, then Bx = 0.
Solution: Let x = (x1 , x2 )t , and solve Bx = 0 by row reducing B:
   
1 2 1 2
−→ .
−2 −4 0 0

{c4.2.a1c} Therefore, Bx = 0 when x1 + 2x2 = 0.


 
3 −1
3. C = .
−6 2
Answer: If x = (x1 , 3x1 )t , where x1 is any real scalar, then Cx = 0.
1
Solution: Solve Cx = 0 by row reducing C to find that Cx = 0 when x1 − x2 = 0.
3
{c4.2.1a}
4. What 2 × 2 matrix rotates the plane about the origin counterclockwise by 30◦ ?
Answer:
             ( cos 30◦   −sin 30◦ )     (  √3/2   −1/2  )
    R30◦  =  ( sin 30◦    cos 30◦ )  =  (   1/2   √3/2  ) .
Solution: The 2 × 2 matrix that rotates the plane counterclockwise through an angle θ is
 
cos θ − sin θ
Rθ = .
sin θ cos θ

{c4.2.1b}
5. What 2 × 2 matrix rotates the plane clockwise by 45◦ ?
Answer:
                 ( cos(−45◦ )   −sin(−45◦ ) )     (  1/√2    1/√2 )
    R(−45◦ )  =  ( sin(−45◦ )    cos(−45◦ ) )  =  ( −1/√2    1/√2 ) .
{c4.2.1c}
6. What 2 × 2 matrix rotates the plane clockwise by 90◦ while dilating it by a factor of 2?
Answer:
2 cos(−90◦ ) −2 sin(−90◦ )
   
0 2
2R(−90◦ ) = = .
2 sin(−90◦ ) 2 cos(−90◦ ) −2 0


{c4.2.2a}
7. Find a 2 × 2 matrix that reflects vectors in the (x, y) plane across the x axis.
The map LA that reflects vectors across the x-axis is (x, y) → (x, −y). The matrix is
 
1 0
A= .
0 −1
{c4.2.2b}
8. Find a 2 × 2 matrix that reflects vectors in the (x, y) plane across the y axis.
The map LA that reflects vectors across the y-axis is (x, y) → (−x, y). The matrix is
 
−1 0
A= .
0 1
{c4.2.2c}
9. Find a 2 × 2 matrix that reflects vectors in the (x, y) plane across the line x = y.
The map LA that reflects vectors across the line y = x is (x, y) → (y, x). The matrix is
 
0 1
A= .
1 0
{mc.exercise7}
10. Suppose the mapping L : R3 → R2 is linear and satisfies

      ( 1 )   ( 1 )       ( 0 )   ( 2 )       ( 0 )   ( −1 )
    L ( 0 ) = ( 2 ) ,   L ( 1 ) = ( 0 ) ,   L ( 0 ) = (  4 )
      ( 0 )               ( 1 )               ( 1 )

What is the 2 × 3 matrix A such that L = LA ?
Answer:
        ( 1    3   −1 )
    A = ( 2   −4    4 )

Solution:
                         ( 1  0  0 )⁻¹
    A = ( 1   2  −1 )    ( 0  1  0 )
        ( 2   0   4 )    ( 0  1  1 )

Use Gaussian elimination to compute

    ( 1  0  0   1  0  0 )     ( 1  0  0   1   0  0 )
    ( 0  1  0   0  1  0 )  ∼  ( 0  1  0   0   1  0 )
    ( 0  1  1   0  0  1 )     ( 0  0  1   0  −1  1 )

Therefore
                        ( 1   0  0 )
    A = ( 1   2  −1 )   ( 0   1  0 )   =   ( 1    3   −1 )
        ( 2   0   4 )   ( 0  −1  1 )       ( 2   −4    4 )


{c7.8.1}
11. The matrix
 
1 K
A=
0 1

is a shear. Describe the action of A on the plane for different values of K.


For any point (x, y) in the plane, A(x, y)t = (x + Ky, y)t . Therefore, if K > 0, then (x, y) is shifted
to the right by a factor of Ky. If K < 0, then (x, y) is shifted to the left by a factor of |K|y. If
K = 0, then (x, y) is mapped to itself.

{c7.8.2}
12. Determine a rotation matrix that maps the vectors (3, 4) and (1, −2) onto the vectors (−4, 3)
and (2, 1) respectively.
Answer: The matrix

cos 90◦ − sin 90◦


   
0 −1
R90◦ = =
sin 90◦ cos 90◦ 1 0

performs the desired transformation.


Solution: Note that the transformations (3, 4) → (−4, 3) and (1, −2) → (2, 1) are obtained by
rotating the plane 90◦ counterclockwise. Then use (3.2.2) to obtain the matrix corresponding to
this rotation.

{c4.2.3}
13. Find a 2 × 3 matrix P that projects three dimensional xyz space onto the xy plane. Hint:
Such a matrix will satisfy
  
0   x  
0 x
P 0 = and P y = .
0 y
z 0

Answer: The matrix is


 
1 0 0
P = .
0 1 0

Solution: Let
 
p11 p12 p13
P = .
p21 p22 p23


Note that a matrix that projects xyz space onto the xy plane satisfies the vector equations:
 
  1  
p11 p12 p13  0 = 1
p21 p22 p23 0
 0 
  0  
p11 p12 p13  1 = 0
p21 p22 p23 1
 0 
  0  
p11 p12 p13  0 = 0
p21 p22 p23 0
1

from which we get the equations

p11 = 1 p12 = 0 p13 = 0


, and .
p21 = 0 p22 = 1 p23 = 0

Substitute these values back into P to obtain the solution.

{c4.2.3a}
 
a −b
14. Show that every matrix of the form corresponds to rotating the plane through
b a
the angle θ followed by a dilatation cI2 where
p
c = a2 + b2
a
cos θ =
c
b
sin θ = .
c

The matrix which is a rotation of the plane through angle θ followed by a dilatation cI2 is
 
c cos θ −c sin θ
cRθ = .
c sin θ c cos θ

In order for the given matrix to equal cRθ , we need


   
a −b c cos θ −c sin θ
= .
b a c sin θ c cos θ

Thus, we have a = c cos θ and b = c sin θ. We can then write

c2 = c2 (cos2 θ + sin2 θ) = c2 cos2 θ + c2 sin2 θ = a2 + b2 ,

or p
c= a2 + b2 .


{c4.2.3b}
 
3 4
15. Using Exercise 14 observe that the matrix rotates the plane counterclockwise
−4 3
through an angle θ and then dilates the planes by a factor of c. Find θ and c. Use map to verify
your results.
 
Answer: This matrix is a counterclockwise rotation through an angle θ = sin⁻¹(−4/5) + 2π ≈
2π − 0.9273 ≈ 5.3559 and a dilatation by a scalar c = 5.
Solution: This follows from Exercise 14 with a = 3 and b = −4. Thus, c = √(a² + b²) = 5 and
θ = sin⁻¹(b/√(a² + b²)) = sin⁻¹(−4/5). Add 2π to correspond to counterclockwise rotation.

In Exercises 16 – 18 use map to find vectors that are stretched and/or contracted to a multiple
of themselves by the given linear mapping. Hint: Choose a vector in the Map window and apply
Iterate several times.
 
2 0
16. (matlab) A = .
{c4.2.a4a} 1.5 0.5
The matrix A maps any vector x of the form x = s(1, 1)t , where s is a real scalar, to twice its
length, and any vector of the form x = s(0, 1)t to half its length.
If you are having trouble finding this vector with map, turn on the rescale vectors option,
which scales every vector to length 1 after mapping it. Then, test vectors by applying map several
times until you find a vector which (approximately) maps to itself.
 
1.2 −1.5
17. (matlab) B = .
{c4.2.a4b} −0.4 1.2
The matrix B maps any vector of the form x ≈ s(−0.89, 0.46) to approximately 1.97 times its
length.
 
2 −1.25
18. (matlab) C = .
{c4.2.a4c} 0 −0.5
The matrix C maps any vector of the form x = s(1, 0) to twice its length and maps any vector of
1
the form x = s(1, 2) to − times its length.
2

In Exercises 19 – 21 use Exercise 14 and map to verify that the given matrices rotate the plane
through an angle θ followed by a dilatation cI2 . Find θ and c in each case.
 
1 −2
19. (matlab) A = .
{c4.2.ba} 2 1
Answer: Matrix A rotates the plane by θ ≈ 1.1071 counterclockwise and dilates it by a factor
of c = √5 ≈ 2.2361.



Solution: Matrix
 A is a special case 
of Exercise  14 with a = 1 and b = 2. Thus, c = a2 + b2 = 5
p

a 1
and θ = cos−1 √ = cos−1 √ ≈ 1.1071.
a2 + b2 12 + 2 2
 
−2.4 −0.2
20. (matlab) B = .
{c4.2.bb} 0.2 −2.4
Answer: Matrix B rotates the plane by θ ≈ 3.0585 counterclockwise and dilates it by a factor
of c = √5.8 ≈ 2.4083.
Solution: Matrix B is a special case of Exercise 14 with a = −2.4 and b = 0.2. Thus, c =
√(a² + b²) = √5.8 and θ = cos⁻¹(a/√(a² + b²)) = cos⁻¹(−2.4/√((−2.4)² + (0.2)²)) ≈ 3.0585.
 
2.67 1.3
21. (matlab) C = .
{c4.2.bc} −1.3 2.67
Answer: Matrix C rotates the plane by θ = 2π − 0.4531 ≈ 5.8301 counterclockwise (that is, by
0.4531 clockwise) and dilates it by a factor of c ≈ 2.9697.
Solution: Matrix C is a special case of Exercise 14 with a = 2.67 and b = −1.3. Thus,
c = √(a² + b²) = √((2.67)² + (−1.3)²) ≈ 2.9697. Here cos θ = a/c and sin θ = b/c < 0, so
θ = −cos⁻¹(2.67/√((2.67)² + (−1.3)²)) ≈ −0.4531; adding 2π gives the counterclockwise angle.

In Exercises 22 – 26 use map to help describe the planar motions of the associated linear mappings
for the given 2 × 2 matrix.
 √ 
3 1
22. (matlab) A =  2 √2  .

1 3

{c4.2.4a} 2 2
A rotates the plane 30◦ clockwise.
 1 1 

23. (matlab) B =  2 1
2 .
1
{c4.2.4b} 2 2

B rotates the plane 45 counterclockwise and reduces it by a factor of 2.

 
0 1
24. (matlab) C = .
{c4.2.4c} 1 0
C reflects the plane across the line y = x.


 
1 0
25. (matlab) D = .
{c4.2.4d} 0 0
D maps the plane onto the x-axis.
 1 1 

26. (matlab) E =  2 1 1
2 .
{c4.2.4e} 2 2
x+y x+y
E maps (x, y) to a point on the line y = x; that point is ( , ).
2 2

{c4.2.5} 27. (matlab) The matrix  


0 −1
A=
−1 0
reflects the xy-plane across the diagonal line y = −x while the matrix
 
−1 0
B=
0 −1

rotates the plane through an angle of 180◦ . Using the program map verify that both matrices map
the vector (1, 1) to its negative (−1, −1). Now perform two experiments. First, choose the dog icon
and move that dog by the matrix A. Second, move that dog using the matrix B. Describe the
difference in the result.
Both matrices map the dog to near the point (−1, −1) but the orientation of the dog is different in
the two cases. See Figure 27.

[Figure 27a and Figure 27b: plots of the (x, y)-plane for −2 ≤ x, y ≤ 2 showing the results of the
two experiments with the dog icon.]


{S:linearity} 3.3 Linearity


We begin by recalling the vector operations of addition and scalar multiplication. Given
two n vectors, vector addition is defined by
     
x1 y1 x1 + y1
 ..   ..   ..
 . + . = . .

xn yn xn + yn

Multiplication of a scalar times a vector is defined by


   
x1 cx1
c  ...  =  ...  .
   

xn cxn

Using (3.1.2) we can check that matrix multiplication satisfies

{sum} A(x + y) = Ax + Ay (3.3.1)


{product} A(cx) = c(Ax). (3.3.2)

Using MATLAB we can also verify that the identities (3.3.1) and (3.3.2) are valid for some
particular choices of x, y, c and A. For example, let c = 5 and
   
  1 1
2 3 4 1  5   −1 
{MATLAB:29} A= , x=  4  , y =  −1  .
   (3.3.3*)
1 1 2 3
3 4

Typing e3_3_3 enters this information into MATLAB. Now type

z1 = A*(x+y)
z2 = A*x + A*y

and compare z1 and z2. The fact that they are both equal to
 
35
33

verifies (3.3.1) in this case. Similarly, type

w1 = A*(c*x)
w2 = c*(A*x)


and compare w1 and w2 to verify (3.3.2).


The central idea in linear algebra is the notion of linearity.
{linearity}
Definition 3.3.1. A mapping L : Rn → Rm is linear if

(a) L(x + y) = L(x) + L(y) for all x, y ∈ Rn .

(b) L(cx) = cL(x) for all x ∈ Rn and all scalars c ∈ R.

To better understand the meaning of Definition 3.3.1(a,b), we verify these conditions for
the mapping L : R2 → R2 defined by

{E:mme} L(x) = (x1 + 3x2 , 2x1 − x2 ), (3.3.4)

where x = (x1 , x2 ) ∈ R2 . To verify Definition 3.3.1(a), let y = (y1 , y2 ) ∈ R2 . Then

L(x + y) = L(x1 + y1 , x2 + y2 )
= ((x1 + y1 ) + 3(x2 + y2 ), 2(x1 + y1 ) − (x2 + y2 ))
= (x1 + y1 + 3x2 + 3y2 , 2x1 + 2y1 − x2 − y2 ).

On the other hand,

L(x) + L(y) = (x1 + 3x2 , 2x1 − x2 ) + (y1 + 3y2 , 2y1 − y2 )


= (x1 + 3x2 + y1 + 3y2 , 2x1 − x2 + 2y1 − y2 ).

Hence
L(x + y) = L(x) + L(y)
for every pair of vectors x and y in R2 .
Similarly, to verify Definition 3.3.1(b), let c ∈ R be a scalar and compute

L(cx) = L(cx1 , cx2 ) = ((cx1 ) + 3(cx2 ), 2(cx1 ) − (cx2 )).

Then compute

cL(x) = c(x1 + 3x2 , 2x1 − x2 ) = (c(x1 + 3x2 ), c(2x1 − x2 )),

from which it follows that


L(cx) = cL(x)
for every vector x ∈ R and every scalar c ∈ R. Thus L is a linear mapping.
2


In fact, the mapping (3.3.4) is a matrix mapping and could have been written in the form
 
1 3
L(x) = x.
2 −1
Hence the linearity of L could have been checked using identities (3.3.1) and (3.3.2). Indeed,
matrix mappings are always linear mappings, as we now discuss.

Matrix Mappings are Linear Mappings Let A be an m × n matrix and recall that the matrix
mapping LA : Rn → Rm is defined by LA (x) = Ax. We may rewrite (3.3.1) and (3.3.2)
using this notation as
LA (x + y) = LA (x) + LA (y)
LA (cx) = cLA (x).
Thus all matrix mappings are linear mappings. We will show that all linear mappings are
matrix mappings (see Theorem 3.3.5). But first we discuss linearity in the simplest context
of mappings from R → R.

Linear and Nonlinear Mappings of R → R Note that 1 × 1 matrices are just scalars
A = (a). It follows from (3.3.1) and (3.3.2) that we have shown that the matrix mappings
LA (x) = ax are all linear, though this point could have been verified directly. Before showing
that these are all the linear mappings of R → R, we focus on examples of functions of R → R
that are not linear.

Examples of Mappings that are Not Linear

• f (x) = x2 . Calculate
f (x + y) = (x + y)2 = x2 + 2xy + y 2
while
f (x) + f (y) = x2 + y 2 .
The two expressions are not equal and f (x) = x2 is not linear.
• f (x) = ex . Calculate
f (x + y) = ex+y = ex ey
while
f (x) + f (y) = ex + ey .
The two expressions are not equal and f (x) = ex is not linear.


• f (x) = sin x. Recall that

f (x + y) = sin(x + y) = sin x cos y + cos x sin y

while
f (x) + f (y) = sin x + sin y.
The two expressions are not equal and f (x) = sin x is not linear.
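These failures of additivity are also easy to see numerically. As a sketch (the values x = 1 and y = 2 are arbitrary choices of ours), typing the following in MATLAB displays a different value for f(x + y) than for f(x) + f(y) in each case.

x = 1; y = 2;
[(x + y)^2, x^2 + y^2]             % 9 versus 5
[exp(x + y), exp(x) + exp(y)]      % e^3 versus e + e^2
[sin(x + y), sin(x) + sin(y)]      % the two entries differ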

Linear Functions of One Variable Suppose we take the opposite approach and ask what
functions of R → R are linear. Observe that if L : R → R is linear, then

L(x) = L(x · 1).

Since we are looking at the special case of linear mappings on R, we note that x is a real
number as well as a vector. Thus we can use Definition 3.3.1(b) to observe that

L(x · 1) = xL(1).

So if we let a = L(1), then we see that

L(x) = ax.

Thus linear mappings of R into R are very special mappings indeed; they are all scalar
multiples of the identity mapping.

All Linear Mappings are Matrix Mappings We end this section by proving that every
linear mapping is given by matrix multiplication. But first we state and prove two lemmas.
There is a standard set of vectors that is used over and over again in linear algebra, which
{D:canonicalbasis} we now define.

Definition 3.3.2. Let j be an integer between 1 and n. The n-vector ej is the vector that
has a 1 in the j th entry and zeros in all other entries.
{linequal}
Lemma 3.3.3. Let L1 : Rn → Rm and L2 : Rn → Rm be linear mappings. Suppose that
L1 (ej ) = L2 (ej ) for every j = 1, . . . , n. Then L1 = L2 .

Proof Let x = (x1 , . . . , xn ) be a vector in Rn . Then

x = x1 e1 + · · · + xn en .


Linearity of L1 and L2 implies that

L1 (x) = x1 L1 (e1 ) + · · · + xn L1 (en )


= x1 L2 (e1 ) + · · · + xn L2 (en )
= L2 (x).

Since L1 (x) = L2 (x) for all x ∈ Rn , it follows that L1 = L2 . 


{columnsA}
Lemma 3.3.4. Let A be an m × n matrix. Then Aej is the j th column of A.

Proof Recall the definition of matrix multiplication given in (3.1.2). In that formula,
just set xi equal to zero for all i 6= j and set xj = 1. 
{lin-matrices}
Theorem 3.3.5. Let L : Rn → Rm be a linear mapping. Then there exists an m × n matrix
A such that L = LA .

Proof There are two steps to the proof: determine the matrix A and verify that LA = L.
Let A be the matrix whose j th column is L(ej ). By Lemma 3.3.4 L(ej ) = Aej ; that is,
L(ej ) = LA (ej ). Lemma 3.3.3 implies that L = LA . 

Theorem 3.3.5 provides a simple way of showing that

L(0) = 0

for any linear map L. Indeed, L(0) = LA (0) = A0 = 0 for some matrix A. (This fact can
also be proved directly from the definition of linear mapping.)

Using Theorem 3.3.5 to Find Matrices Associated to Linear Maps The proof of Theorem 3.3.5
shows that the j th column of the matrix A associated to a linear mapping L is L(ej ) viewed
as a column vector. As an example, let L : R2 → R2 be rotation clockwise through 90◦ .
Geometrically, it is easy to see that
   
$$L(e_1) = L\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ -1 \end{pmatrix}$$
and
$$L(e_2) = L\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}.$$


Since we know that rotations are linear maps, it follows that the matrix A associated to the
linear map L is
$$A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}.$$
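The same construction can be carried out in MATLAB. The following sketch evaluates the rotation on e1 and e2 and places the results side by side as columns; the anonymous function rot is introduced here only for this illustration and is not part of the text's software.

rot = @(v) [v(2); -v(1)];        % clockwise rotation of the plane by 90 degrees
A = [rot([1;0]), rot([0;1])]     % columns are L(e1) and L(e2); returns [0 1; -1 0]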
Additional examples of linear mappings whose associated matrices can be found using The-
orem 3.3.5 are given in Exercises 11 – 14.

Exercises

{c4.3.1}
1. Compute ax + by for each of the following:

(a) a = 2, b = −3, x = (2, 4) and y = (3, −1).


(b) a = 10, b = −2, x = (1, 0, −1) and y = (2, −4, 3).
(c) a = 5, b = −1, x = (4, 2, −1, 1) and y = (−1, 3, 5, 7).

(a) 2(2, 4) − 3(3, −1) = (−5, 11);


(b) 10(1, 0, −1) − 2(2, −4, 3) = (6, 8, −16);
(c) 5(4, 2, −1, 1) − 1(−1, 3, 5, 7) = (21, 7, −10, −2).
{c4.3.2}
2. Let x = (4, 7) and y = (2, −1). Write the vector αx + βy as a vector in coordinates.
α(4, 7) + β(2, −1) = (4α + 2β, 7α − β).
{c4.3.3}
3. Let x = (1, 2), y = (1, −3), and z = (−2, −1). Show that you can write
z = αx + βy
for some α, β ∈ R.
Hint: Set up a system of two linear equations in the unknowns α and β, and then solve this linear
system.
The equation
(−2, −1) = α(1, 2) + β(1, −3) = (α + β, 2α − 3β)
can be rewritten as the linear system
−2 = α + β
.
−1 = 2α − 3β
Solving this system yields α = −7/5 and β = −3/5.


{c4.3.4}
4. Can the vector z = (2, 3, −1) be written as

z = αx + βy

where x = (2, 3, 0) and y = (1, −1, 1)?


Answer: The vector z = (2, 3, −1) cannot be written as

z = α(2, 3, 0) + β(1, −1, 1).

Solution: Write the equation

(2, 3, −1) = α(2, 3, 0) + β(1, −1, 1) = (2α + β, 3α − β, β)

as a linear system of two unknowns in three equations:

2 = 2α + β
3 = 3α − β .
−1 = β

Substituting β = −1 into the first and second equations yields α = 3/2 and α = 2/3, respectively.
The system is inconsistent, so there is no solution to the desired equation.

{c4.3.5}
5. Let x = (3, −2), y = (2, 3), and z = (1, 4). For which real numbers α, β, γ does

αx + βy + γz = (1, −2)?

Answer: The equation
$$\alpha(3, -2) + \beta(2, 3) + \gamma(1, 4) = (1, -2)$$
holds for any real numbers α, β, γ such that $\alpha = \tfrac{5}{13}\gamma + \tfrac{7}{13}$ and $\beta = -\tfrac{14}{13}\gamma - \tfrac{4}{13}$.
Solution: Write the equation as the linear system

3α + 2β + γ = 1
−2α + 3β + 4γ = −2.

The augmented matrix
$$\begin{pmatrix} 3 & 2 & 1 & 1 \\ -2 & 3 & 4 & -2 \end{pmatrix}$$


row reduces to
$$\begin{pmatrix} 1 & 0 & -\tfrac{5}{13} & \tfrac{7}{13} \\ 0 & 1 & \tfrac{14}{13} & -\tfrac{4}{13} \end{pmatrix}$$
and the equation is valid for any values of α, β, and γ that satisfy this system.

{c4.3.6a} In Exercises 6 – 9 determine whether the given transformation is linear.


6. T : R3 → R2 defined by T (x1 , x2 , x3 ) = (x1 + 2x2 − x3 , x1 − 4x3 ).
Answer: The transformation T (x, y, z) = (x + 2y − z, x − 4z) is linear.
Solution: Use the definition that a mapping is linear if it satisfies (3.3.1) and (3.3.2) to verify:
Let X1 = (x1 , y1 , z1 ) and let X2 = (x2 , y2 , z2 ). Verify (3.3.1) by computing T (X1 + X2 ) and
comparing it to T (X1 ) + T (X2 ).
T (X1 + X2 ) = T ((x1 , y1 , z1 ) + (x2 , y2 , z2 ))
= T (x1 + x2 , y1 + y2 , z1 + z2 )
= ((x1 + x2 ) + 2(y1 + y2 ) − (z1 + z2 ), (x1 + x2 ) − 4(z1 + z2 ))
= (x1 + x2 + 2y1 + 2y2 − z1 − z2 , x1 + x2 − 4z1 − 4z2 )
= (x1 + 2y1 − z1 , x1 − 4z1 ) + (x2 + 2y2 − z2 , x2 − 4z2 ).

T (X1 ) + T (X2 ) = T (x1 , y1 , z1 ) + T (x2 , y2 , z2 )


= (x1 + 2y1 − z1 , x1 − 4z1 ) + (x2 + 2y2 − z2 , x2 − 4z2 ).
Let X = (x, y, z) and let c be any real scalar. Verify (3.3.2) by computing cT (X) and comparing it
to T (cX).
cT (X) = cT (x, y, z)
= c(x + 2y − z, x − 4z)
= (cx + 2cy − cz, cx − 4cz).
T (cX) = T (cx, cy, cz)
= T (cX)
= (cx + 2cy − cz, cx − 4cz).
{c4.3.6b} So T (x, y, z) = (x + 2y − z, x − 4z) is a linear mapping.
7. T : R2 → R2 defined by T (x1 , x2 ) = (x1 + x1 x2 , 2x2 ).
Answer: The transformation T (x, y) = (x + xy, 2y) is not linear.
Solution: If T is a linear transformation, then
T (x1 + x2 , y1 + y2 ) = T (x1 , y1 ) + T (x2 , y2 )
for any real numbers x1 ,x2 ,y1 ,y2 . However,
T (1, 1) = (2, 2)
T (1, 0) + T (0, 1) = (1, 0) + (0, 2) = (1, 2).


Therefore T (1, 1) 6= T (1, 0) + T (0, 1) and T is not linear.


{c4.3.6c}
8. T : R2 → R2 defined by T (x1 , x2 ) = (x1 + x2 , x1 − x2 − 1).
The transformation T (x, y) = (x + y, x − y − 1) is not linear because T (0, 0) = (0, −1) 6= 0,
contradicting Theorem 3.3.5.
{c4.3.6d}
9. T : R2 → R3 defined by T (x1 , x2 ) = (1, x1 + x2 , 2x2 )
The transformation T (x, y) = (1, x + y, 2y) is not linear because T (0, 0) = (1, 0, 0) 6= 0.

{c4.3.6A}
10. Determine which of the following maps are linear maps. If the map is linear give the matrix
associated to the linear map. Explain your reasoning.
   
(a) $L_1 : \mathbb{R}^2 \to \mathbb{R}^2$ where $L_1\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x + y + 3 \\ 2y + 1 \end{pmatrix}$

(b) $L_2 : \mathbb{R}^2 \to \mathbb{R}^3$ where $L_2\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \sin x \\ x + y \\ 2y \end{pmatrix}$

(c) $L_3 : \mathbb{R}^2 \to \mathbb{R}$ where $L_3\begin{pmatrix} x \\ y \end{pmatrix} = x + y$

Answer: (a) Not linear; (b) not linear; (c) linear with 1 × 2 matrix $A = \begin{pmatrix} 1 & 1 \end{pmatrix}$.

Solution:

(a) Linear maps map the origin to the origin. $L_1\begin{pmatrix} 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 3 \\ 1 \end{pmatrix} \neq 0$. So L1 is not linear.
(b) Linear maps L satisfy L(cX) = cL(X). In this case
$$cL_2\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} c\sin x \\ cx + cy \\ 2cy \end{pmatrix} \quad \text{and} \quad L_2\begin{pmatrix} cx \\ cy \end{pmatrix} = \begin{pmatrix} \sin(cx) \\ cx + cy \\ 2cy \end{pmatrix}$$
Since sin(cx) ≠ c sin(x), L2 is not linear.


(c) All matrix mappings are linear. Since we can write
$$L_3\begin{pmatrix} x \\ y \end{pmatrix} = x + y = \begin{pmatrix} 1 & 1 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix},$$
it follows that L3 is linear with 1 × 2 matrix $A = \begin{pmatrix} 1 & 1 \end{pmatrix}$.


{c4.3.7}
11. Find the 2 × 3 matrix A that satisfies
     
$$Ae_1 = \begin{pmatrix} 2 \\ 3 \end{pmatrix}, \quad Ae_2 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}, \quad \text{and} \quad Ae_3 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$

Answer: The matrix that satisfies these conditions is


 
2 1 0
A= .
3 −1 1

Solution: Let  
a11 a12 a13
A= .
a21 a22 a23
Rewrite the conditions on A as:
 
  1  
a11 a12 a13  0 = 2
a21 a22 a23 3
0
 
  0  
a11 a12 a13  1 = 1
a21 a22 a23 −1
0
 
  0  
a11 a12 a13  0 = 0
a21 a22 a23 1
1
These equations imply:

a11 = 2 a12 = 1 a13 = 0


, and
a21 = 3 a22 = −1 a23 = 1

and determine the matrix A.

{c4.3.8}
12. The cross product of two 3-vectors x = (x1 , x2 , x3 ) and y = (y1 , y2 , y3 ) is the 3-vector

x × y = (x2 y3 − x3 y2 , −(x1 y3 − x3 y1 ), x1 y2 − x2 y1 ).

Let K = (2, 1, −1).

(a) Show that the mapping L : R3 → R3 defined by

L(x) = x × K

is a linear mapping.


(b) Find the 3 × 3 matrix A such that


L(x) = Ax,
that is, L = LA .

Answer: The matrix of linear mapping L is


 
0 −1 −1
A= 1 0 2 .
1 −2 0

Solution: Let X = (x1 , x2 , x3 ) and let Y = (y1 , y2 , y3 ). Since K = (2, 1, −1),

L(X) = (x1 , x2 , x3 ) × K = (−x2 − x3 , x1 + 2x3 , x1 − 2x2 ).

(a) To show that L(X) is a linear mapping, first demonstrate that (3.3.1) is valid:

L(X + Y ) = L(x1 + y1 , x2 + y2 , x3 + y3 )
= (−(x2 + y2 ) − (x3 + y3 ), (x1 + y1 ) + 2(x3 + y3 ), (x1 + y1 ) − 2(x2 + y2 ))
= (−x2 − x3 , x1 + 2x3 , x1 − 2x2 ) + (−y2 − y3 , y1 + 2y3 , y1 − 2y2 )
= L(X) + L(Y ),

then show that (3.3.2) is valid:

cL(X) = cL(x1 , x2 , x3 )
= c(−x2 − x3 , x1 + 2x3 , x1 − 2x2 )
= (−cx2 − cx3 , cx1 + 2cx3 , cx1 − 2cx2 )
= L(cx1 , cx2 , cx3 )
= L(cX).

(b) Find A by noting that L(ej ) = Aej is the j th column of A, and computing

L(e1 ) = L(1, 0, 0) = (0, 1, 1)


L(e2 ) = L(0, 1, 0) = (−1, 0, −2)
L(e3 ) = L(0, 0, 1) = (−1, 2, 0).

{c4.3.9}
13. Argue geometrically that rotation of the plane counterclockwise through an angle of 45◦ is a
linear mapping. Find a 2 × 2 matrix A such that LA rotates the plane counterclockwise by 45◦ .
To see that L(X + Y ) = L(X) + L(Y ), consider X and Y as vectors. These vectors are two
sides of a parallelogram with diagonal X + Y , as shown in Figure 13a. Rotating the plane 45◦
counterclockwise has the effect of rotating the entire parallelogram, as in Figure 13b. Therefore,


adding X and Y and then rotating the sum X + Y is the same as rotating X and Y separately and
then adding them.
To see that cL(X) = L(cX), note that multiplying a vector X by a scalar c affects only
the length of the vector, and that rotating the plane affects only the orientation of the vector.
Therefore, the two operations can be performed in either order with the same effect.
To find the matrix A of this transformation, we can use (3.2.2):
$$A = \begin{pmatrix} \cos 45^\circ & -\sin 45^\circ \\ \sin 45^\circ & \cos 45^\circ \end{pmatrix} = \begin{pmatrix} \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}} \\ \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \end{pmatrix}$$

[Figure 13a: the vectors x and y and their sum x + y spanning a parallelogram. Figure 13b: the image of the same parallelogram under the 45° counterclockwise rotation.]


{c4.3.10}
14. Let σ : R3 → R3 permute coordinates cyclically; that is,
σ(x1 , x2 , x3 ) = (x2 , x3 , x1 ).
Find the 3 × 3 matrix A such that σ = LA .
Answer: The matrix A of the linear mapping LA is
 
0 1 0
A =  0 0 1 .
1 0 0

Solution: Note that if σ = LA , then σ(ej ) = Aej is the j th column of matrix A. Thus A is
determined by
σ(e1 ) = σ(1, 0, 0) = (0, 0, 1)
σ(e2 ) = σ(0, 1, 0) = (1, 0, 0)
σ(e3 ) = σ(0, 0, 1) = (0, 1, 0).


{c4.3.11}
15. Let L be a linear map. Using the definition of linearity, prove that L(0) = 0.

Proof Let L(0) = K. By the definition of linearity, for any real number c,

L(0) = L(c0) = cL(0)

which is valid only when L(0) = 0. 

{c4.3.12}
16. Let P : Rn → Rm and Q : Rn → Rm be linear mappings.

(a) Prove that S : Rn → Rm defined by

S(x) = P (x) + Q(x)

is also a linear mapping.


(b) Theorem 3.3.5 states that there are matrices A, B and C such that

P = LA and Q = LB and S = LC .

What is the relationship between the matrices A, B, and C?

Solution: The mapping L is linear if L(x + y) = L(x) + L(y) and if cL(x) = L(cx).

(a) We can use the assumption that P (x) and Q(x) are linear mappings to show:

S(x + y) = P (x + y) + Q(x + y)
= P (x) + P (y) + Q(x) + Q(y)
= [P (x) + Q(x)] + [P (y) + Q(y)]
= S(x) + S(y)

and
cS(x) = cP (x) + cQ(x)
= P (cx) + Q(cx)
= S(cx).

(b) Assume that S = LC , P = LA and Q = LB for m × n matrices A, B, C. We claim that


A = B + C. By definition, A(ej ) = LA (ej ) = LB (ej ) + LC (ej ) = (B + C)(ej ). Lemma 3.3.4
implies that the j th column of C is the sum of the j th column of A and the j th column of B
for all columns j, so C = A + B.


{c4.3.13} 17. (matlab) Let  


0.5 0
A= .
0 2
Use map to verify that the linear mapping LA halves the x-component of a point while it doubles
the y-component.
Computer experiment.

{c4.3.14} 18. (matlab) Let  


0 0.5
A= .
−0.5 0
Use map to determine how the mapping LA acts on 2-vectors. Describe this action in words.
The mapping LA performs the transformation (x, y) → (0.5y, −0.5x). That is, the mapping rotates
a 2-vector 90◦ clockwise then halves its length.

In Exercises 19 – 20 use MATLAB to verify (3.3.1) and (3.3.2).

{c4.3.15A} 19. (matlab)


  
 
{eq4.3.15a} $$A = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & -2 \\ 4 & 0 & 1 \end{pmatrix}, \quad x = \begin{pmatrix} 3 \\ 2 \\ -1 \end{pmatrix}, \quad y = \begin{pmatrix} 0 \\ -5 \\ 10 \end{pmatrix}, \quad c = 21; \qquad (3.3.5^*)$$

Verify (3.3.1) by typing A*(x + y) to obtain:

ans =
24
-21
21

Then, type A*x + A*y, which yields the same answer. Verify (3.3.2) by typing c*(A*x), which gives
the same answer as A*(c*x), namely:

ans =
84
84
231


{c4.3.15B} 20. (matlab)    


{eq4.3.15b} $$A = \begin{pmatrix} 4 & 0 & -3 & 2 & 4 \\ 2 & 8 & -4 & -1 & 3 \\ -1 & 2 & 1 & 10 & -2 \\ 4 & 4 & -2 & 1 & 2 \\ -2 & 3 & 1 & 1 & -1 \end{pmatrix}, \quad x = \begin{pmatrix} 1 \\ 3 \\ -2 \\ 3 \\ -1 \end{pmatrix}, \quad y = \begin{pmatrix} 2 \\ 0 \\ 13 \\ -2 \\ 1 \end{pmatrix}, \quad c = -13. \qquad (3.3.6^*)$$

Typing either A*(x + y) or A*x + A*y yields

ans =
-19
-15
24
3
15

verifying (3.3.1). Typing c*(A*x) or A*(c*x) yields

ans =
-156
-364
-455
-273
-117

verifying (3.3.2).


{S:Superposition} 3.4 The Principle of Superposition


The principle of superposition is just a restatement of the fact that matrix mappings are
linear. Nevertheless, this restatement is helpful when trying to understand the structure of
solutions to systems of linear equations.

Homogeneous Equations A system of linear equations is homogeneous if it has the form

{homosys} Ax = 0, (3.4.1)

where A is an m × n matrix and x ∈ Rn . Note that homogeneous systems are consistent


since 0 ∈ Rn is always a solution, that is, A(0) = 0.
The principle of superposition makes two assertions:

• Suppose that y and z in Rn are solutions to (3.4.1) (that is, suppose that Ay = 0 and
Az = 0); then y + z is a solution to (3.4.1).
• Suppose that c is a scalar; then cy is a solution to (3.4.1).

The principle of superposition is proved using the linearity of matrix multiplication. Calcu-
late
A(y + z) = Ay + Az = 0 + 0 = 0
to verify that y + z is a solution, and calculate

A(cy) = c(Ay) = c · 0 = 0

to verify that cy is a solution.


We see that solutions to homogeneous systems of linear equations always satisfy the general
property of superposition: sums of solutions are solutions and scalar multiples of solutions
are solutions.
We illustrate this principle by explicitly solving the system of equations
$$\begin{pmatrix} 1 & 2 & -1 & 1 \\ 2 & 5 & -4 & -1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$
Use row reduction to show that the matrix
 
1 2 −1 1
2 5 −4 −1


is row equivalent to  
1 0 3 7
0 1 −2 −3
which is in reduced echelon form. Recall, using the methods of Section 2.3, that every
solution to this linear system has the form
$$\begin{pmatrix} -3x_3 - 7x_4 \\ 2x_3 + 3x_4 \\ x_3 \\ x_4 \end{pmatrix} = x_3\begin{pmatrix} -3 \\ 2 \\ 1 \\ 0 \end{pmatrix} + x_4\begin{pmatrix} -7 \\ 3 \\ 0 \\ 1 \end{pmatrix}.$$

Superposition is verified again by observing that the form of the solutions is preserved under
vector addition and scalar multiplication. For instance, suppose that
$$\alpha_1\begin{pmatrix} -3 \\ 2 \\ 1 \\ 0 \end{pmatrix} + \alpha_2\begin{pmatrix} -7 \\ 3 \\ 0 \\ 1 \end{pmatrix} \quad \text{and} \quad \beta_1\begin{pmatrix} -3 \\ 2 \\ 1 \\ 0 \end{pmatrix} + \beta_2\begin{pmatrix} -7 \\ 3 \\ 0 \\ 1 \end{pmatrix}$$
are two solutions. Then the sum has the form
$$\gamma_1\begin{pmatrix} -3 \\ 2 \\ 1 \\ 0 \end{pmatrix} + \gamma_2\begin{pmatrix} -7 \\ 3 \\ 0 \\ 1 \end{pmatrix}$$

where γj = αj + βj .
We have actually proved more than superposition. We have shown in this example that
every solution is a superposition of just two solutions
$$\begin{pmatrix} -3 \\ 2 \\ 1 \\ 0 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} -7 \\ 3 \\ 0 \\ 1 \end{pmatrix}.$$
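MATLAB can produce such a pair of solutions directly. As a sketch, the command null with the 'r' option returns a basis for the solutions of Ax = 0 read off from the reduced echelon form; for this A it should reproduce the two vectors displayed above.

A = [1 2 -1 1; 2 5 -4 -1];
N = null(A,'r')     % expected columns: (-3,2,1,0) and (-7,3,0,1)
A*N                 % both columns should be zero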

Inhomogeneous Equations The linear system of m equations in n unknowns is written as

Ax = b

where A is an m × n matrix, x ∈ Rn , and b ∈ Rm . This system is inhomogeneous when


the vector b is nonzero. Note that if y, z ∈ Rn are solutions to the inhomogeneous equation


(that is, Ay = b and Az = b), then y − z is a solution to the homogeneous equation. That
is,
A(y − z) = Ay − Az = b − b = 0.
For example, let
$$A = \begin{pmatrix} 1 & 2 & 0 \\ -2 & 0 & 1 \end{pmatrix} \quad \text{and} \quad b = \begin{pmatrix} 3 \\ -1 \end{pmatrix}.$$
Then
$$y = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \quad \text{and} \quad z = \begin{pmatrix} 3 \\ 0 \\ 5 \end{pmatrix}$$
are both solutions to the linear system Ax = b. It follows that
$$y - z = \begin{pmatrix} -2 \\ 1 \\ -4 \end{pmatrix}$$

is a solution to the homogeneous system Ax = 0, which can be checked by direct calculation.


Thus we can completely solve the inhomogeneous equation by finding one solution to the
inhomogeneous equation and then adding to that solution every solution of the homogeneous
equation. More precisely, suppose that we know all of the solutions w to the homogeneous
equation Ax = 0 and one solution y to the inhomogeneous equation Ax = b. Then y + w is
another solution to the inhomogeneous equation and every solution to the inhomogeneous
equation has this form.

An Example of an Inhomogeneous Equation Suppose that we want to find all solutions of


Ax = b where
$$A = \begin{pmatrix} 3 & 2 & 1 \\ 0 & 1 & -2 \\ 3 & 3 & -1 \end{pmatrix} \quad \text{and} \quad b = \begin{pmatrix} -2 \\ 4 \\ 2 \end{pmatrix}.$$
Suppose that you are told that y = (−5, 6, 1)t is a solution of the inhomogeneous equation.
(This fact can be verified by a short calculation — just multiply Ay and see that the result
equals b.) Next find all solutions to the homogeneous equation Ax = 0 by putting A into
reduced echelon form. The resulting row echelon form matrix is
$$\begin{pmatrix} 1 & 0 & \tfrac{5}{3} \\ 0 & 1 & -2 \\ 0 & 0 & 0 \end{pmatrix}.$$


Hence we see that the solutions of the homogeneous equation Ax = 0 are
$$\begin{pmatrix} -\tfrac{5}{3}s \\ 2s \\ s \end{pmatrix} = s\begin{pmatrix} -\tfrac{5}{3} \\ 2 \\ 1 \end{pmatrix}.$$
Combining these results, we conclude that all the solutions of Ax = b are given by
$$\begin{pmatrix} -5 \\ 6 \\ 1 \end{pmatrix} + s\begin{pmatrix} -\tfrac{5}{3} \\ 2 \\ 1 \end{pmatrix}.$$
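A short MATLAB check of this conclusion is possible; the following sketch uses the A, b, and y above together with an arbitrary value of the parameter s.

A = [3 2 1; 0 1 -2; 3 3 -1];
b = [-2; 4; 2];
y = [-5; 6; 1];
A*y - b                        % zero vector: y solves Ax = b
s = 3;                         % any real value of the parameter
A*(y + s*[-5/3; 2; 1]) - b     % still the zero vector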

Exercises

{c4.4.1}
1. Consider the homogeneous linear equation

x+y+z =0

(a) Write all solutions to this equation as a general superposition of a pair of vectors v1 and v2 .
(b) Write all solutions as a general superposition of a second pair of vectors w1 and w2 .

(a) Answer: All solutions can be written as a superposition of the vectors


   
−1 −1
v1 =  1  and v2 =  0  .
0 1

Solution: The equation x + y + z = 0 is a linear system of three variables in one equation. If we


consider y and z to be free variables, then every solution to the system has the form
       
x −y − z −1 −1
 y = y  = y 1  + z 0 .
z z 0 1

(b) Answer: All solutions can be written as a superposition of the second pair of vectors
   
1 0
w1 =  0  and w2 =  1  .
−1 −1


Solution: Write the same linear system, but this time consider x and y to be free variables. In
this case, every solution has the form:
       
x x 1 0
 y = y  = x 0  + z 1 .
z −x − y −1 −1

{c4.4.2}
2. Write all solutions to the homogeneous system of linear equations

x1 + 2x2 + x4 − x5 = 0
x3 − 2x4 + x5 = 0

as the general superposition of three vectors.


Answer: Every solution can be written as a superposition of the vectors
$$\begin{pmatrix} -2 \\ 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} -1 \\ 0 \\ 2 \\ 1 \\ 0 \end{pmatrix}, \quad \text{and} \quad \begin{pmatrix} 1 \\ 0 \\ -1 \\ 0 \\ 1 \end{pmatrix}.$$

Solution: Write the matrix of the homogeneous system:


 
1 2 0 1 −1
.
0 0 1 −2 1

This matrix cannot be row reduced further. Every solution has the form
         
x1 x5 − x4 − 2x2 −2 −1 1
 x2   x2   1   0   0 
         
 x3  =  −x5 + 2x4  = x2  0  + x4  2  + x5  −1  .
         
 x4   x4   0   1   0 
x5 x5 0 0 1

{c4.4.3}
3. (a) Find all solutions to the homogeneous equation Ax = 0 where
 
2 3 1
A= .
1 1 4

(b) Find a single solution to the inhomogeneous equation


 
6
{E:inhom} Ax = . (3.4.2)
6


(c) Use your answers in (a) and (b) to find all solutions to (3.4.2).
(a) Answer: All solutions to the homogeneous equation are of the form
   
x1 −11
x =  x2  = s  7 .
x3 1

Solution: Row reduce the matrix of the homogeneous system Ax = 0 to obtain:


 
1 0 11
.
0 1 −7

So x1 = −11s, x2 = 7s and x3 = s.
(b) Answer: One possible solution is
   
x1 1
 x2  =  1  .
x3 1

Solution: Assign a value to x3 , then substitute into the two equations of the inhomogeneous
system to obtain values for x1 and x2 .
(c) All solutions to (3.4.2) can be found by adding a single solution of the inhomogeneous system
to all solutions of the homogeneous system, so:
   
1 −11
x =  1  + s 7 .
1 1

{A.3.4.1}
4. How many solutions can a homogeneous system of 4 linear equations in 7 unknowns have?
Answer: The system must have infinitely many solutions.
The system must have a solution because homogeneous systems are always consistent. The system
cannot have a unique solution because the rank of the corresponding augmented matrix cannot
exceed 4 which is less than the number of variables 7.
{A.3.4.2}
5. Let A be a 3 × 3 matrix with rank 2. Suppose the linear system Ax = b has two solutions
   
1 0
x1 =  3  and x2 =  0 
4 1


Find the full set of solutions to Ax = b.

Answer: Every solution has the form
$$\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} + t\begin{pmatrix} 1 \\ 3 \\ 3 \end{pmatrix}$$
for some real number t.
Solution: Since A has rank 2 and the system Ax = b is consistent, the system has 3 − rank(A) = 1
parameter’s worth of solutions. By superposition
 
1
t(x1 − x2 ) = t  3 
3
   
0 1
solves the homogeneous system and  0  + t  3  solves the inhomogeneous system.
1 3


{S:4.6} 3.5 Composition and Multiplication of Matrices


The composition of two matrix mappings leads to another matrix mapping from which the
concept of multiplication of two matrices follows. Matrix multiplication can be introduced by
formula, but then the idea is unmotivated and one is left to wonder why matrix multiplication
is defined in such a seemingly awkward way.
We begin with the example of 2 × 2 matrices. Suppose that
   
2 1 0 3
A= and B = .
1 −1 −1 4

We have seen that the mappings

x 7→ Ax and x 7→ Bx

map 2-vectors to 2-vectors. So we can ask what happens when we compose these mappings.
In symbols, we compute

LA ◦LB (x) = LA (LB (x)) = A(Bx).

In coordinates, let x = (x1 , x2 ) and compute


 
3x2
A(Bx) = A
−x1 + 4x2
 
−x1 + 10x2
= .
x1 − x2

It follows that we can rewrite A(Bx) using multiplication of a matrix times a vector as
  
−1 10 x1
A(Bx) = .
1 −1 x2

In particular, LA ◦LB is again a linear mapping, namely LC , where


 
−1 10
C= .
1 −1

With this computation in mind, we define the product


    
2 1 0 3 −1 10
AB = = .
1 −1 −1 4 1 −1


Using the same approach we can derive a formula for matrix multiplication of 2×2 matrices.
Suppose
   
a11 a12 b11 b12
A= and B = .
a21 a22 b21 b22
Then
 
b11 x1 + b12 x2
A(Bx) = A
b21 x1 + b22 x2
 
a11 (b11 x1 + b12 x2 ) + a12 (b21 x1 + b22 x2 )
=
a21 (b11 x1 + b12 x2 ) + a22 (b21 x1 + b22 x2 )
 
(a11 b11 + a12 b21 )x1 + (a11 b12 + a12 b22 )x2
=
(a21 b11 + a22 b21 )x1 + (a21 b12 + a22 b22 )x2
  
a11 b11 + a12 b21 a11 b12 + a12 b22 x1
= .
a21 b11 + a22 b21 a21 b12 + a22 b22 x2

Hence, for 2 × 2 matrices, we see that composition of matrix mappings defines the matrix
multiplication
  
a11 a12 b11 b12
a21 a22 b21 b22
to be  
a11 b11 + a12 b21 a11 b12 + a12 b22
{2x2mult} . (3.5.1)
a21 b11 + a22 b21 a21 b12 + a22 b22

Formula (3.5.1) may seem a bit formidable, but it does have structure. Suppose A and B
are 2 × 2 matrices, then the entry of
C = AB

in the ith row, j th column may be written as
$$a_{i1}b_{1j} + a_{i2}b_{2j} = \sum_{k=1}^{2} a_{ik}b_{kj}.$$

We shall see that an analog of this formula is available for matrix multiplications of all sizes.
But to derive this formula, it is easier to develop matrix multiplication abstractly.
{complin}
Lemma 3.5.1. Let L1 : Rn → Rm and L2 : Rp → Rn be linear mappings. Then L =
L1 ◦L2 : Rp → Rm is a linear mapping.


Proof Compute

L(x + y) = L1 ◦L2 (x + y)
= L1 (L2 (x) + L2 (y))
= L1 (L2 (x)) + L1 (L2 (y))
= L1 ◦L2 (x) + L1 ◦L2 (y)
= L(x) + L(y).

Similarly, compute L1 ◦L2 (cx) = cL1 ◦L2 (x). 

We apply Lemma 3.5.1 in the following way. Let A be an m×n matrix and let B be an n×p
matrix. Then LA : Rn → Rm and LB : Rp → Rn are linear mappings, and the mapping
L = LA ◦LB : Rp → Rm is defined and linear. Theorem 3.3.5 implies that there is an m × p
matrix C such that L = LC . Abstractly, we define the matrix product AB to be C.

Note that the matrix product AB is defined only when the number of columns of A
is equal to the number of rows of B.

Calculating the Product of Two Matrices Next we discuss how to calculate the product
of matrices; this discussion generalizes our discussion of the product of 2 × 2 matrices.
Lemma 3.3.4 tells how to compute C = AB. The j th column of the matrix product is just

Cej = A(Bej ),

where Bj ≡ Bej is the j th column of the matrix B. Therefore,

{E:matprod} C = (AB1 | · · · |ABp ). (3.5.2)

Indeed, the (i, j)th entry of C is the ith entry of ABj , that is, the ith entry of
$$A\begin{pmatrix} b_{1j} \\ \vdots \\ b_{nj} \end{pmatrix} = \begin{pmatrix} a_{11}b_{1j} + \cdots + a_{1n}b_{nj} \\ \vdots \\ a_{m1}b_{1j} + \cdots + a_{mn}b_{nj} \end{pmatrix}.$$
It follows that the entry cij of C in the ith row and j th column is
{multij} $$c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj} = \sum_{k=1}^{n} a_{ik}b_{kj}. \qquad (3.5.3)$$


We can interpret (3.5.3) in the following way. To calculate cij : multiply the entries of the
ith row of A with the corresponding entries in the j th column of B and add the results.
This interpretation reinforces the idea that for the matrix product AB to be defined, the
number of columns in A must equal the number of rows in B.
For example, we now perform the following multiplication:
$$\begin{pmatrix} 2 & 3 & 1 \\ 3 & -1 & 2 \end{pmatrix}\begin{pmatrix} 1 & -2 \\ 3 & 1 \\ -1 & 4 \end{pmatrix} = \begin{pmatrix} 2 \cdot 1 + 3 \cdot 3 + 1 \cdot (-1) & 2 \cdot (-2) + 3 \cdot 1 + 1 \cdot 4 \\ 3 \cdot 1 + (-1) \cdot 3 + 2 \cdot (-1) & 3 \cdot (-2) + (-1) \cdot 1 + 2 \cdot 4 \end{pmatrix} = \begin{pmatrix} 10 & 3 \\ -2 & 1 \end{pmatrix}.$$
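Formula (3.5.3) translates directly into MATLAB loops. The following sketch is our own illustration (not part of the text's examples); it computes the product above entry by entry and compares the result with the built-in operator.

A = [2 3 1; 3 -1 2];  B = [1 -2; 3 1; -1 4];
[m,n] = size(A);  p = size(B,2);
C = zeros(m,p);
for i = 1:m
  for j = 1:p
    for k = 1:n
      C(i,j) = C(i,j) + A(i,k)*B(k,j);   % formula (3.5.3)
    end
  end
end
C - A*B       % should be the 2 x 2 zero matrix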

Some Special Matrix Products Let A be an m × n matrix. Then


OA = O
AO = O
AIn = A
Im A = A
The first two equalities are easily checked using (3.5.3). It is not significantly more difficult
to verify the last two equalities using (3.5.3), but we shall verify these equalities using the
language of linear mappings, as follows:
LAIn (x) = LA ◦LIn (x) = LA (x),
since LIn (x) = x is the identity map. Therefore AIn = A. A similar proof verifies that
Im A = A. Although the verification of these equalities using the notions of linear mappings
may appear to be a case of overkill, the next section contains results where these notions
truly simplify the discussion.
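These identities may also be checked in MATLAB with the commands eye and zeros; the 2 × 3 matrix in the following sketch is an arbitrary choice of ours.

A = [1 2 3; 4 5 6];                 % an arbitrary 2 x 3 matrix
isequal(A*eye(3), A)                % A I_n = A
isequal(eye(2)*A, A)                % I_m A = A
isequal(zeros(2)*A, zeros(2,3))     % O A = O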

Exercises

In Exercises 1 – 4 determine whether or not the matrix products AB or BA can be computed for
each given pair of matrices A and B. If the product is possible, perform the computation.


{c4.6.-1a}
   
1 0 −2 0
1. A = and B = .
−2 1 3 −1
   
−2 0 −2 0
AB = and BA = .
7 −1 5 −1
{c4.6.-1b}
2. $A = \begin{pmatrix} 0 & -2 & 1 \\ 4 & 10 & 0 \end{pmatrix}$ and $B = \begin{pmatrix} 0 & 2 \\ 3 & -1 \end{pmatrix}$.

AB is not defined. $BA = \begin{pmatrix} 8 & 20 & 0 \\ -4 & -16 & 3 \end{pmatrix}$
{c4.6.-1c}
 
  0 2 5
8 0 2 3
3. A = and B =  −1 3 −1 .
−3 0 −10 3
0 1 −5
{c4.6.-1d} AB is not defined. BA is not defined.
   
8 −1 2 8 0 −3
4. A =  −3 12  and B =  1 4 0 1 
5 −4 −5 6 7 −20
AB is not defined. BA is not defined.

{c4.6.0a} In Exercises 5 – 8 compute the given matrix product.


  
2 3 −1 1
5. .
0 1 −3 2
      
2 3 −1 1 −2 − 9 2 + 6 −11 8
= = .
0 1 −3 2 −3 2 −3 2
{c4.6.0b}
 
  2 3
1 2 3 
6. −2 5 .
−2 3 −1
1 −1
 
  2 3    
1 2 3  2−4+3 3 + 10 − 3 1 10
−2 5 = = .
−2 3 −1 −4 − 6 − 1 −6 + 15 + 1 −11 10
1 −1
{c4.6.0c}
 
2 3  
1 2 3
7.  −2 5  .
−2 3 −1
1 −1
   
2 3   −4 13 3
1 2 3
 −2 5  =  −12 11 −11 .
−2 3 −1
1 −1 3 −1 4


{c4.6.0d}
  
2 −1 3 1 7
8.  1 0 5   −2 −1 .
1 5 −1 −5 3
      
2 −1 3 1 7 2 + 2 − 15 14 + 1 + 9 −11 24
 1 0 5   −2 −1  =  1 − 25 7 + 15  =  −24 22 .
1 5 −1 −5 3 1 − 10 + 5 7−5−3 −4 −1

{c4.6.1}
9. Determine all the 2 × 2 matrices B such that AB = BA where A is the matrix
 
2 0
A= .
0 −1

Answer: For any matrix B of the form


 
b11 0
0 b22

the equation AB = BA is valid.


Solution: Let  
b11 b12
B= .
b21 b22
Then compute the matrix B for which

  AB = BA
  
2 0 b11 b12 b11 b12 2 0
=
0 −1 b21 b22   b 21 b 22 0 −1
2b11 2b12 2b11 −b12
=
−b21 −b22 2b21 −b22

which is equivalent to the linear system

2b11 = 2b11
2b12 = −b12
−b21 = 2b21
−b22 = −b22 .

{c4.6.2}
10. Let    
2 5 a 3
A= and B= .
1 4 b 2
For which values of a and b does AB = BA?


Answer:
$$B = \begin{pmatrix} \tfrac{4}{5} & 3 \\ \tfrac{3}{5} & 2 \end{pmatrix}.$$
Solution: Compute

  AB = BA
  
2 5 a 3 a 3 2 5
=
1 4 b 2   b 2 1 4 
2a + 5b 16 2a + 3 5a + 12
= .
a + 4b 11 2b + 2 5b + 8

This equation can be rewritten as the system

2a + 5b = 2a + 3
16 = 5a + 12
a + 4b = 2b + 2
11 = 5b + 8

which yields the solution a = 4/5 and b = 3/5.

{c4.6.3}
11. Let  
1 0 −3
A =  −2 1 1 .
0 1 −5
Let At be the transpose of the matrix A, as defined in Section 1.3. Compute AAt .

    
1 0 −3 1 −2 0 10 −5 15
t
AA =  −2 1 1  0 1 1  =  −5 6 −4  .
0 1 −5 −3 1 −5 15 −4 26

In Exercises 12 – 14 decide for the given pair of matrices A and B whether or not the products AB
or BA are defined and compute the products when possible.

{c4.7.1a} 12. (matlab)  


  3 −2 0
2 2 −2
{MATLAB:26} A= and B= 0 −1 4  (3.5.4*)
−4 4 0
−2 −3 5

Load the system into MATLAB. The matrix AB is defined, and typing A*B yields:


ans =
10 0 -2
-12 4 16

The matrix BA is not defined, since B has 3 columns while A has 2 rows. Typing B*A generates
an error message.

{c4.7.1b} 13. (matlab)


 
  1 3 −4 3 −2 1
−4 1 0 5 −1  0 3 2 3 −1 4 
{MATLAB:27} A= 5 −1 −2 −4 −2  and B=  (3.5.5*)
 5 4 4 5 −1 0 
1 5 −4 1 5
−4 −3 2 4 1 4

The matrix AB is not defined because A has 5 columns while B has four rows. The matrix BA is
also not defined because B has 6 columns and A has 3 rows.

{c4.7.1c} 14. (matlab)


   
−2 −2 4 5 2 3 −4 5
 0 −3 −4 3   4 −3 0 −2 
{MATLAB:28} A=  and B=  (3.5.6*)
 1 −3 1 1   −3 −4 −4 −3 
0 1 0 4 −2 −2 3 −1

Both AB and BA are defined and can be computed using MATLAB:

A*B B*A
ans = ans =
-34 -26 7 -23 -8 4 -8 35
-6 19 25 15 -8 -1 28 3
-15 6 -5 7 2 27 0 -43
-4 -11 12 -6 7 0 3 -17


{S:4.7} 3.6 Properties of Matrix Multiplication


In this section we discuss the facts that matrix multiplication is associative (but not com-
mutative) and that certain distributive properties hold. We also discuss how matrix multi-
plication is performed in MATLAB .

{assoc} Matrix Multiplication is Associative


Theorem 3.6.1. Matrix multiplication is associative. That is, let A be an m × n matrix,
let B be a n × p matrix, and let C be a p × q matrix. Then
(AB)C = A(BC).

Proof Begin by observing that composition of mappings is always associative. In symbols,


let f : Rn → Rm , g : Rp → Rn , and h : Rq → Rp . Then
f ◦(g ◦h)(x) = f [(g ◦h)(x)]
= f [g(h(x))]
= (f ◦g)(h(x))
= [(f ◦g)◦h](x).
It follows that
f ◦(g ◦h) = (f ◦g)◦h.

We can apply this result to linear mappings. Thus


LA ◦(LB ◦LC ) = (LA ◦LB )◦LC .
Since
LA(BC) = LA ◦LBC = LA ◦(LB ◦LC )
and
L(AB)C = LAB ◦LC = (LA ◦LB )◦LC ,
it follows that
LA(BC) = L(AB)C ,
and
A(BC) = (AB)C.


It is worth convincing yourself that Theorem 3.6.1 has content by verifying by hand that
matrix multiplication of 2 × 2 matrices is associative.


Matrix Multiplication is Not Commutative Although matrix multiplication is associative, it


is not commutative. This statement is trivially true when the matrix AB is defined while
that matrix BA is not. Suppose, for example, that A is a 2 × 3 matrix and that B is a 3 × 4
matrix. Then AB is a 2×4 matrix, while the multiplication BA makes no sense whatsoever.
More importantly, suppose that A and B are both n × n square matrices. Then AB = BA
is generally not valid. For example, let
   
1 0 0 1
A= and B = .
0 0 0 0

Then    
0 1 0 0
AB = and BA = .
0 0 0 0
So AB 6= BA. In certain cases it does happen that AB = BA. For example, when B = In ,

AIn = A = In A.

But these cases are rare.
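A quick MATLAB check of this example (a sketch of ours, not part of the text's computations):

A = [1 0; 0 0];  B = [0 1; 0 0];
A*B                 % returns [0 1; 0 0]
B*A                 % returns the zero matrix
isequal(A*B, B*A)   % returns 0 (false)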

Additional Properties of Matrix Multiplication Recall that if A = (aij ) and B = (bij ) are
both m × n matrices, then A + B is the m × n matrix (aij + bij ). We now enumerate several
properties of matrix multiplication.

• Let A and B be m × n matrices and let C be an n × p matrix. Then

(A + B)C = AC + BC.

Similarly, if D is a q × m matrix, then

D(A + B) = DA + DB.

So matrix multiplication distributes across matrix addition.


• If α and β are scalars, then
(α + β)A = αA + βA.
So addition distributes with scalar multiplication.
• Scalar multiplication and matrix multiplication satisfy:

(αA)C = α(AC).


Matrix Multiplication and Transposes Let A be an m × n matrix and let B be an n × p


matrix, so that the matrix product AB is defined and AB is an m × p matrix. Note that
At is an n × m matrix and that B t is a p × n matrix, so that in general the product At B t is
not defined. However, the product B t At is defined and is an p × m matrix, as is the matrix
(AB)t . We claim that
{e:transposeprod} (AB)t = B t At . (3.6.1)
We verify this claim by direct computation. The (i, k)th entry in (AB)t is the (k, i)th entry
in AB. That entry is:
$$\sum_{j=1}^{n} a_{kj} b_{ji}.$$

The (i, k)th entry in $B^tA^t$ is:
$$\sum_{j=1}^{n} b^t_{ij} a^t_{jk},$$
where $a^t_{jk}$ is the (j, k)th entry in $A^t$ and $b^t_{ij}$ is the (i, j)th entry in $B^t$. It follows from the
definition of transpose that the (i, k)th entry in $B^tA^t$ is:
$$\sum_{j=1}^{n} b_{ji} a_{kj} = \sum_{j=1}^{n} a_{kj} b_{ji},$$

which verifies the claim.
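Identity (3.6.1) is also easy to test numerically. A sketch using randomly generated matrices (the sizes are arbitrary choices of ours):

A = rand(3,4);  B = rand(4,2);
norm((A*B)' - B'*A')    % should be 0, up to roundoff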

Matrix Multiplication in MATLAB Let us now explain how matrix multiplication works in
MATLAB. We load the matrices
{examp_AB} $$A = \begin{pmatrix} -5 & 2 & 0 \\ -1 & 1 & -4 \\ -4 & 4 & 2 \\ -1 & 3 & -1 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 2 & -2 & -2 & 5 & 5 \\ 4 & -5 & 1 & -1 & 2 \\ 3 & 2 & 3 & -3 & 3 \end{pmatrix} \qquad (3.6.2^*)$$

by typing

e3_6_2

Now the command C = A*B asks MATLAB to compute the matrix C as the product of A
and B. We obtain


C =
-2 0 12 -27 -21
-10 -11 -9 6 -15
14 -8 18 -30 -6
7 -15 2 -5 -2

Let us confirm this result by another computation. As we have seen above the 4th column
of C should be given by the product of A with the 4th column of B. Indeed, if we perform
this computation and type

A*B(:,4)

the result is

ans =
-27
6
-30
-5

which is precisely the 4th column of C.


MATLAB also recognizes when a matrix multiplication of two matrices is not defined. For
example, the product of the 3 × 5 matrix B with the 4 × 3 matrix A is not defined, and if
we type B*A then we obtain the error message

??? Error using ==> *


Inner matrix dimensions must agree.

We remark that the size of a matrix A can be seen using the MATLAB command size. For
example, the command size(A) leads to

ans =
4 3

reflecting the fact that A is a matrix with four rows and three columns.

Exercises

{c4.7.2.2}
1. Let A be an m × n matrix. Show that the matrices AAt and At A are symmetric.


Let B = AAt , where A is an m × n matrix. Then B t = (AAt )t = (At )t At by (3.6.1). Since


(At )t = A, B t = AAt = B and B is symmetric. Similarly, C = At A is symmetric.

{c4.7.3}
2. Let    
1 2 2 3
A= and B = .
−1 −1 1 4
Compute AB and B t At . Verify that (AB)t = B t At for these matrices A and B.
First compute AB:     
1 2 2 3 4 11
=
−1 −1 1 4 −3 −7
then take its transpose.  
t 4 −3
(AB) = .
11 −7
Then calculate     
1 −1 2 1 4 −3
B t At = = .
2 −1 3 4 11 −7
So (AB)t = B t At for these matrices A and B.

{c4.7.4}
3. Let  
0 1 0
A= 0 0 1 .
0 0 0
Compute $B = I + A + \tfrac{1}{2}A^2$ and $C = I + tA + \tfrac{1}{2}(tA)^2$.
$$B = I + A + \tfrac{1}{2}A^2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} + \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix} + \begin{pmatrix} 0 & 0 & \tfrac{1}{2} \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 1 & \tfrac{1}{2} \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}.$$
$$C = I + tA + \tfrac{1}{2}(tA)^2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} + \begin{pmatrix} 0 & t & 0 \\ 0 & 0 & t \\ 0 & 0 & 0 \end{pmatrix} + \begin{pmatrix} 0 & 0 & \tfrac{t^2}{2} \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 1 & t & \tfrac{t^2}{2} \\ 0 & 1 & t \\ 0 & 0 & 1 \end{pmatrix}.$$

{c4.7.5}
4. Let    
1 0 0 −1
I= and J= .
0 1 1 0


(a) Show that J 2 = −I.


(b) Evaluate (aI + bJ)(cI + dJ) in terms of I and J.

(a) Verify J 2 = −I by computation:


    
0 −1 0 −1 −1 0
J2 = = = −I.
1 0 1 0 0 −1

(b) Answer: (aI + bJ)(cI + dJ) = (ac − bd)I + (ad + bc)J.


Solution: Evaluate (aI + bJ)(cI + dJ), yielding acI 2 + adIJ + bcJI + bdJ 2 . Then, use the identities
IJ = JI = J, I 2 = I, and J 2 = −I to rewrite the expression in terms of I and J.

{c4.7.8}
5. Recall that a square matrix C is upper triangular if cij = 0 when i > j. Show that the matrix
product of two upper triangular n × n matrices is also upper triangular.
Let A and B be n × n upper triangular matrices. To show that AB is upper triangular, we must
show that if i > j, then
$$(ab)_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj} = 0.$$

For every component of this sum, either i > k, in which case aik = 0 since A is upper-triangular,
or i ≤ k, in which case, since i > j, k > j, so bkj = 0. Therefore, for all i > j, (ab)ij = 0, so AB is
upper triangular.

In Exercises 6 – 8 use MATLAB to verify that (A + B)C = AC + BC for the given matrices.
     
0 2 −2 1 2 −1
6. (matlab) A = ,B= and C =
{c4.7.0a} 2 1 3 0 1 5
Computer experiment.
     
12 −2 8 −20 10 2 4
7. (matlab) A = ,B= and C =
{c4.7.0b} 3 1 3 10 2 13 −4
Computer experiment.
   
6 1 2 −10  
−2 10
8. (matlab) A =  3 20 , B =
  5 0  and C =
12 10
{c4.7.0c} −5 3 3 1
Computer experiment.


{c4.7.2} 9. (matlab) Use the rand(3,3) command in MATLAB to choose five pairs of 3 × 3 matrices
A and B at random. Compute AB and BA using MATLAB to see that in general these matrix
products are unequal.
Computer experiment.

{c4.7.2.1} 10. (matlab) Experimentally, find two symmetric 2 × 2 matrices A and B for which the matrix
product AB is not symmetric.
Let    
1 2 2 −1
A= and B=
2 −1 −1 2
be symmetric matrices. Then  
0 3
AB =
5 −4
is not symmetric. In general, for
   
a11 a12 b11 b12
A= and B= ,
a12 a22 b12 b22

AB is symmetric if a12 b11 + a22 b12 = a11 b12 + a12 b22 .


{S:SLS} 3.7 Solving Linear Systems and Inverses


When we solve the simple equation
ax = b,

we do so by dividing by a to obtain
$$x = \frac{1}{a}\,b.$$
This division works as long as a 6= 0.
Writing systems of linear equations as

Ax = b

suggests that solutions should have the form

$$x = \frac{1}{A}\,b$$
and the MATLAB command for solving linear systems

x=A\b

suggests that there is some merit to this analogy.


The following is a better analogy. Multiplication by a has the inverse operation: division
by a; multiplying a number x by a and then multiplying the result by a−1 = 1/a leaves
the number x unchanged (as long as a 6= 0). In this sense we should write the solution to
ax = b as
x = a−1 b.

For systems of equations Ax = b we wish to write solutions as

x = A−1 b.

In this section we consider the questions: What does A−1 mean and when does A−1 exist?
(Even in one dimension, we have seen that the inverse does not always exist, since $0^{-1} = \tfrac{1}{0}$ is undefined.)

Invertibility We begin by giving a precise definition of invertibility for square matrices.


{inverse}
Definition 3.7.1. The n × n matrix A is invertible if there is an n × n matrix B such that
AB = In and BA = In .
The matrix B is called an inverse of A. If A is not invertible, then A is noninvertible or
singular.

Geometrically, we can see that some matrices are invertible. For example, the matrix
 
0 −1
R90 =
1 0
rotates the plane counterclockwise through 90◦ and is invertible. The inverse matrix of R90
is the matrix that rotates the plane clockwise through 90◦ . That matrix is:
 
0 1
R−90 = .
−1 0
This statement can be checked algebraically by verifying that R90 R−90 = I2 and that
R−90 R90 = I2 .
Similarly,  
5 3
B=
2 1
is an inverse of  
−1 3
A= ,
2 −5
as matrix multiplication shows that AB = I2 and BA = I2 . In fact, there is an elementary
formula for finding inverses of 2 × 2 matrices (when they exist); see (3.8.1) in Section 3.8.
On the other hand, not all matrices are invertible. For example, the zero matrix is nonin-
{B=C} vertible, since 0B = 0 for any matrix B.
Lemma 3.7.2. If an n × n matrix A is invertible, then its inverse is unique and is denoted
by A−1 .

Proof Let B and C be n × n matrices that are inverses of A. Then


BA = In and AC = In .
We use the associativity of matrix multiplication to prove that B = C. Compute
B = BIn = B(AC) = (BA)C = In C = C.



We now show how to compute inverses for products of invertible matrices.


{P:invprod}
Proposition 3.7.3. Let A and B be two invertible n × n matrices. Then AB is also
invertible and
(AB)−1 = B −1 A−1 .

Proof Use associativity of matrix multiplication to compute

(AB)(B −1 A−1 ) = A(BB −1 )A−1 = AIn A−1 = AA−1 = In .

Similarly,
(B −1 A−1 )(AB) = B −1 (A−1 A)B = B −1 B = In .
Therefore AB is invertible with the desired inverse. 
{L:transposeinv}
Proposition 3.7.4. Suppose that A is an invertible n × n matrix. Then At is invertible
and
(At )−1 = (A−1 )t .

Proof We must show that (A−1 )t is the inverse of At . Identity (3.6.1) implies that

(A−1 )t At = (AA−1 )t = (In )t = In ,

and
At (A−1 )t = (A−1 A)t = (In )t = In .
Therefore, (A−1 )t is the inverse of At , as claimed. 
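Propositions 3.7.3 and 3.7.4 can be checked numerically as well. The following sketch uses random matrices, which are almost always invertible, though the comparison is subject to roundoff error:

A = rand(4);  B = rand(4);
norm(inv(A*B) - inv(B)*inv(A))    % (AB)^(-1) = B^(-1) A^(-1)
norm(inv(A') - inv(A)')           % (A^t)^(-1) = (A^(-1))^t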

Invertibility and Unique Solutions Next we discuss the implications of invertibility for the
solution of the inhomogeneous linear system:

{squarematrix} Ax = b, (3.7.1)

{P:inv=>unique} where A is an n × n matrix and b ∈ R .


n

Proposition 3.7.5. Let A be an invertible n×n matrix and let b be in Rn . Then the system
of linear equations (3.7.1) has a unique solution.

Proof We can solve the linear system (3.7.1) by setting

{soln} x = A−1 b. (3.7.2)


This solution is easily verified by calculating

Ax = A(A−1 b) = (AA−1 )b = In b = b.

Next, suppose that x is a solution to (3.7.1). Then

x = In x = (A−1 A)x = A−1 (Ax) = A−1 b.

So A−1 b is the only possible solution. 


{C:inv=>In}
Corollary 3.7.6. An invertible matrix is row equivalent to In .

Proof Let A be an invertible n × n matrix. Proposition 3.7.5 states that the system of
linear equations Ax = b has a unique solution. Chapter 2, Corollary 2.4.8 states that A is
row equivalent to In . 

{P:row=>inv} The converse of Corollary 3.7.6 is also valid.


Proposition 3.7.7. An n × n matrix A that is row equivalent to In is invertible.

Proof Form the n × 2n matrix M = (A|In ). Since A is row equivalent to In , there is a


sequence of elementary row operations so that M is row equivalent to (In |B). Eliminating
all columns from the right half of M except the j th column yields the matrix (A|ej ). The
same sequence of elementary row operations states that the matrix (A|ej ) is row equivalent
to (In |Bj ) where Bj is the j th column of B. It follows that Bj is the solution to the system
of linear equations Ax = ej and that the matrix product

AB = (AB1 | · · · |ABn ) = (e1 | · · · |en ) = In .

So AB = In .
We claim that BA = In and hence that A is invertible. To verify this claim form the n × 2n
matrix N = (In |A). Using the same sequence of elementary row operations again shows
that N is row equivalent to (B|In ). By construction the matrix B is row equivalent to
In . Therefore, there is a unique solution to the system of linear equations Bx = ej . Now
eliminating all columns except the j th from the right hand side of the matrix (B|In ) shows
that the solution to the system of linear equations Bx = ej is just Aj , where Aj is the j th
column of A. It follows that

BA = (BA1 | · · · |BAn ) = (e1 | · · · |en ) = In .

Hence BA = In . 


{invertequiv}
Theorem 3.7.8. Let A be an n × n matrix. Then the following are equivalent:

(a) A is invertible.
(b) The equation Ax = b has a unique solution for each b ∈ Rn .
(c) The only solution to Ax = 0 is x = 0.
(d) A is row equivalent to In .

Proof (a) ⇒ (b) This implication is just Proposition 3.7.5.


(b) ⇒ (c) This implication is straightforward — just take b = 0 in (3.7.1).
(c) ⇒ (d) This implication is just a restatement of Chapter 2, Corollary 2.4.8.
(d) ⇒ (a). This implication is just Proposition 3.7.7. 

A Method for Computing Inverse Matrices The proof of Proposition 3.7.7 gives a con-
{T:AIn} structive method for finding the inverse of any invertible square matrix.
Theorem 3.7.9. Let A be an n × n matrix that is row equivalent to In and let M be the
n × 2n augmented matrix
{e:M} M = (A|In ). (3.7.3)
Then the matrix M is row equivalent to (In |A−1 ).

An Example Compute the inverse of the matrix


 
1 2 0
A =  0 1 3 .
0 0 1
Begin by forming the 3 × 6 matrix
 
1 2 0 1 0 0
M = 0 1 3 0 1 0 .
0 0 1 0 0 1
To put M in row echelon form by row reduction, first subtract 3 times the 3rd row from the
2nd row, obtaining  
1 2 0 1 0 0
 0 1 0 0 1 −3  .
0 0 1 0 0 1


Second, subtract 2 times the 2nd row from the 1st row, obtaining
 
1 0 0 1 −2 6
 0 1 0 0 1 −3  .
0 0 1 0 0 1

Theorem 3.7.9 implies that  


1 −2 6
A−1 = 0 1 −3  ,
0 0 1
which can be verified by matrix multiplication.

Computing the Inverse Using MATLAB There are two ways that we can compute inverses
using MATLAB . Either we can perform the row reduction of (3.7.3) directly or we can use
the MATLAB the command inv. We illustrate both of these methods. First type e3_7_4
to recall the matrix  
1 2 4
{MATLAB:31} A= 3 1 1 . (3.7.4*)
2 0 −1

To perform the row reduction of (3.7.3) we need to form the matrix M . The MATLAB
command for generating an n × n identity matrix is eye(n). Therefore, typing

M = [A eye(3)]

in MATLAB yields the result

M =
1 2 4 1 0 0
3 1 1 0 1 0
2 0 -1 0 0 1

Now row reduce M to reduced echelon form as follows. Type

M(3,:) = M(3,:) - 2*M(1,:)


M(2,:) = M(2,:) - 3*M(1,:)

obtaining


M =
1 2 4 1 0 0
0 -5 -11 -3 1 0
0 -4 -9 -2 0 1

Next type

M(2,:) = M(2,:)/M(2,2)
M(3,:) = M(3,:) + 4*M(2,:)
M(1,:) = M(1,:) - 2*M(2,:)

to obtain

M =
1.0000 0 -0.4000 -0.2000 0.4000 0
0 1.0000 2.2000 0.6000 -0.2000 0
0 0 -0.2000 0.4000 -0.8000 1.0000

Finally, type

M(3,:) = M(3,:)/M(3,3)
M(2,:) = M(2,:) - M(2,3)*M(3,:)
M(1,:) = M(1,:) - M(1,3)*M(3,:)

to obtain

M =
1.0000 0 0 -1.0000 2.0000 -2.0000
0 1.0000 0 5.0000 -9.0000 11.0000
0 0 1.0000 -2.0000 4.0000 -5.0000

Thus C = A−1 is obtained by extracting the last three columns of M by typing

C = M(:,[4 5 6])

which yields

C =
-1.0000 2.0000 -2.0000
5.0000 -9.0000 11.0000
-2.0000 4.0000 -5.0000


You may check that C is the inverse of A by typing A*C and C*A.
In fact, this entire scheme for computing the inverse of a matrix has been preprogrammed
into MATLAB . Just type

inv(A)

to obtain

ans =
-1.0000 2.0000 -2.0000
5.0000 -9.0000 11.0000
-2.0000 4.0000 -5.0000

We illustrate again this simple method for computing the inverse of a matrix A. For example,
reload the matrix in (3.1.4*) by typing e3_1_4 and obtaining:

A =
5 -4 3 -6 2
2 -4 -2 -1 1
1 2 1 -5 3
-2 -1 -2 1 -1
1 -6 1 1 4

The command B = inv(A) stores the inverse of the matrix A in the matrix B, and we
obtain the result

B =
-0.0712 0.2856 -0.0862 -0.4813 -0.0915
-0.1169 0.0585 0.0690 -0.2324 -0.0660
0.1462 -0.3231 -0.0862 0.0405 0.0825
-0.1289 0.0645 -0.1034 -0.2819 0.0555
-0.1619 0.0810 0.1724 -0.1679 0.1394

This computation also illustrates the fact that even when the matrix A has integer entries,
the inverse of A usually has noninteger entries.
Let b = (2, −8, 18, −6, −1). Then we may use the inverse B = A−1 to compute the solution
of Ax = b. Indeed if we type

b = [2;-8;18;-6;-1];
x = B*b


then we obtain

x =
-1.0000
2.0000
1.0000
-1.0000
3.0000

as desired (see (3.1.5*)). With this computation we have confirmed the analytical results of
the previous subsections.

Exercises

{c4.8.1}
1. Verify by matrix multiplication that the following matrices are inverses of each other:
   
1 0 2 −1 0 2
 0 −1 2  and  2 −1 −2  .
1 0 1 1 0 −1

If two matrices are inverses of each other, then their product is the identity matrix. So:
    
1 0 2 −1 0 2 1 0 0
 0 −1 2  2 −1 −2  =  0 1 0 .
1 0 1 1 0 −1 0 0 1

{c4.8.2}
2. Let α 6= 0 be a real number and let A be an invertible matrix. Show that the inverse of the
1
matrix αA is given by A−1 .
α
We can compute
   
1 −1 1
(αA) A = α (AA−1 ) = I.
α α
1 −1
So the inverse of αA is indeed A .
α


{c4.8.3}
 
a 0
3. Let A = be a 2 × 2 diagonal matrix. For which values of a and b is A invertible?
0 b
Answer: The matrix A is invertible for a 6= 0 and b 6= 0.
Solution: By Theorem 3.7.8, a matrix is invertible if it is row equivalent to the identity matrix.
If a = 0 or if b = 0, then A is not row equivalent to I2 and is therefore not invertible.
{c4.8.4}
4. Let A, B, C be general n × n matrices. Simplify the expression A−1 (BA−1 )−1 (CB −1 )−1 .
Answer: The expression simplifies to C −1 .
Solution: Proposition 3.7.3 states that, if A and B are invertible matrices such that AB is defined,
then (AB)−1 = B −1 A−1 . Therefore,
A−1 (BA−1 )−1 (CB −1 )−1 = A−1 AB −1 BC −1 = C −1 .

{c4.9.3a} In Exercises 5 – 6 use row reduction to find the inverse of the given matrix.
 
5. $$\begin{pmatrix} 1 & 4 & 5 \\ 0 & 1 & -1 \\ -2 & 0 & -8 \end{pmatrix}.$$

Answer: $$A^{-1} = \frac{1}{10}\begin{pmatrix} -8 & 32 & -9 \\ 2 & 2 & 1 \\ 2 & -8 & 1 \end{pmatrix}.$$
Solution: Let  
1 4 5 1 0 0
M = (A|I3 ) =  0 1 −1 0 1 0 .
−2 0 −8 0 0 1

{c4.9.3b} Then, row reduce M to obtain the augmented matrix (I3 |A−1 ).
 
1 −1 −1
6.  0 2 0 .
2 0 −1
 1 
−1 − 1
2
Answer: B = 
−1 1
0 .
 

 0
2
−2 −1 1
Solution: Let  
1 −1 −1 1 0 0
N = (B|I3 ) =  0 2 0 0 1 0 .
2 0 −1 0 0 1


Row reduce N to obtain the augmented matrix (I3 |B −1 ).

{c4.9.3b.2}
7. True or false? If true, explain why; if false, give a counterexample.

(a) If A and B are matrices such that AB = I2 , then BA = I2 .


(b) If A, B, C are matrices such that AB = In and BC = In , then A = C.
(c) Let A be an m × n matrix and b be a column m vector. If the system of linear equations
Ax = b has a unique solution, then A is invertible.

Answer: (a) False; (b) True; (c) False


Solution:

(a) Let $A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}$ and $B = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}$. Then $AB = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ and $BA = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}$.
(b) Calculate (AB)C = In C = C and A(BC) = AIn = A. By associativity of matrix multiplica-
tion, A = C.
   
1 1
(c) Let A = and b = , then the unique solution is the 1 × 1 matrix x = 1. But A is
0 0
not square and therefore not invertible.

{c4.8.5}
8. Let A be an n × n matrix that satisfies

A3 + a2 A2 + a1 A + In = 0,

where A2 = AA and A3 = AA2 . Show that A is invertible.


Hint: Let B = −(A2 + a2 A + a1 In ) and verify that AB = BA = In .
The matrix A is invertible with inverse B = −(A2 + a2 A + a1 In ). To show this, note that

AB = A(−(A2 + a2 A + a1 In )) = In

if and only if A3 + a2 A2 + a1 A = −In . This condition is valid by definition of A, so A is invertible.


Similarly, BA = In .


{c4.8.6}
9. Let A be an n × n matrix that satisfies

Am + am−1 Am−1 + · · · + a1 A + In = 0.

Show that A is invertible.


Given that
Am + am−1 Am−1 + · · · + a1 A + In = 0,
we can compute

A(−(Am−1 + am−1 Am−2 + · · · + a1 In )) = In .

Therefore, A is an invertible matrix with inverse

−(Am−1 + am−1 Am−2 + · · · + a1 In ).

{c4.9.6}
10. For which values of a, b, c is the matrix
 
1 a b
A= 0 1 c 
0 0 1

invertible? Find A−1 when it exists.


Answer: The matrix A is invertible for any choice of a, b, and c, and
 
1 −a −b + ac
−1
A = 0 1 −c .
0 0 1

Solution: Theorem 3.7.8 states that a matrix is invertible if it is row equivalent to In . By row
reducing the augmented matrix (A|I3 ) as follows:
   
1 a b 1 0 0 1 0 0 1 −a −b + ac
 0 1 c 0 1 0 → 0 1 0 0 1 −c 
0 0 1 0 0 1 0 0 1 0 0 1

we show that A is invertible for any choice of a, b, and c, and find a value for A−1 .

In Exercises 11 – 12 use row reduction to find the inverse of the given matrix and confirm your
results using the command inv.


{c4.9.7a} 11. (matlab)  


2 1 3
{MATLAB:32} A= 1 2 3 . (3.7.5*)
5 1 0

Type M = [A eye(3)] in MATLAB, then row reduce the augmented matrix M, obtaining:

ans =
1.0000 0 0 0.1667 -0.1667 0.1667
0 1.0000 0 -0.8333 0.8333 0.1667
0 0 1.0000 0.5000 -0.1667 -0.1667

Check your answer using inv(A) which confirms

ans =
0.1667 -0.1667 0.1667
-0.8333 0.8333 0.1667
0.5000 -0.1667 -0.1667

{c4.9.7b} 12. (matlab)  


0 5 1 3
 1 5 3 −1 
{MATLAB:33} B= . (3.7.6*)
 2 1 0 −4 
1 7 2 3

Type N = [B eye(4)] in MATLAB, then row reduce N to obtain:

ans =
1.0000 0 0 0 -1.5714 -0.4286 0 1.4286
0 1.0000 0 0 0.7429 0.0571 0.2000 -0.4571
0 0 1.0000 0 -0.9143 0.3143 -0.4000 0.4857
0 0 0 1.0000 -0.6000 -0.2000 -0.2000 0.6000

The command inv(B) returns the right half of this augmented matrix:

ans =
-1.5714 -0.4286 0 1.4286
0.7429 0.0571 0.2000 -0.4571
-0.9143 0.3143 -0.4000 0.4857
-0.6000 -0.2000 -0.2000 0.6000


{c4.9.8} 13. (matlab) Try to compute the inverse of the matrix


 
1 0 3
{MATLAB:34} C =  −1 2 −2  (3.7.7*)
0 2 1

in MATLAB using the command inv. What happens — can you explain the outcome?
Now compute the inverse of the matrix
$$\begin{pmatrix} 1 & \epsilon & 3 \\ -1 & 2 & -2 \\ 0 & 2 & 1 \end{pmatrix}$$
for some nonzero numbers ε of your choice. What can be observed in the inverse if ε is very small?
What happens when ε tends to zero?
Typing inv(C) in MATLAB yields the response

Warning: Matrix is singular to working precision.


ans =
Inf Inf Inf
Inf Inf Inf
Inf Inf Inf

Matrix C cannot be inverted because it is not row equivalent to I3 . We can type rref(C) to confirm
that

ans =
1.0000 0 3.0000
0 1.0000 0.5000
0 0 0

When C(1,2) = ε is nonzero, C is invertible. As ε → 0 the entries of C −1 approach infinity. For
example, if ε = 0.01, then inv(C) yields

ans =
600.0000 599.0000 -602.0000
100.0000 100.0000 -100.0000
-200.0000 -200.0000 201.0000

At ε = 0, C −1 does not exist.


{A3.7.1}
14. Let A and B be 3 × 3 invertible matrices so that
   
1 0 −1 1 1 1
A−1 =  −1 −1 0  and B −1 = 1 1 0 
0 1 −1 1 0 0

Without computing A or B, determine the following:

(a) rank(A)
(b) The solution to  
1
Bx =  1 
1

(c) (2BA)−1
(d) The matrix C so that ACB + 3I3 = 0.

(a) A is an invertible 3 × 3 matrix, so rank(A) = 3.


(b) The solution is       
1 1 1 1 1 3
x = B −1  1  =  1 1 0  1  =  2 
1 1 0 0 1 1
(c)
$$(2BA)^{-1} = \frac{1}{2}A^{-1}B^{-1} = \frac{1}{2}\begin{pmatrix} 1 & 0 & -1 \\ -1 & -1 & 0 \\ 0 & 1 & -1 \end{pmatrix}\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 0 & 1 & 1 \\ -2 & -2 & -1 \\ 0 & 1 & 0 \end{pmatrix}$$

(d) Recall that multiplication on the left by a matrix is not the same as multiplication on the
right. We have that

ACB = −3I3 =⇒ A−1 ACB = −3A−1 I3 multiplying on the left by A−1


=⇒ CB = −3A−1
=⇒ CBB −1 = −3A−1 B −1 multiplying on the right by B −1
=⇒ C = −3A−1 B −1
 
0 1 1
=⇒ C = −3  −2 −2 −1 
0 1 0


{A3.7.2}
15. True or False: Determine whether the following statements are true or false, and explain your
answer.

(a) The only 3 × 2 matrix A so that Ax = 0 for all x ∈ R2 is A = 0.


(b) A system of 5 equations in 3 unknowns with the solution x1 = 0, x2 = −3, x3 = 1 must have
infinitely many solutions.
(c) If A is a 2 × 2 matrix and A2 = 0, then A = 0.
(d) If u, v ∈ R3 are perpendicular, then ku + vk = kuk + kvk.

(a) True: $A = \begin{pmatrix} Ae_1 & Ae_2 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{pmatrix} = 0$
(b) False, it may have a unique solution. For example, the system of equations
   
1 0 0 0
 0 1 0   −3 
   
 0 0 1 = 1 
   
 1 1 1   −2 
1 0 1 1

has x1 = 0, x2 = −3, x3 = 1 as a unique solution.


(c) False: if  
0 1
A=
0 0
then A2 = 0.
(d) False: $\left\|\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}\right\| = \sqrt{2} \neq 2 = \left\|\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}\right\| + \left\|\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}\right\|$
However, by the Pythagorean theorem,

ku + vk2 = kuk2 + kvk2


{S:det2x2} 3.8 Determinants of 2 × 2 Matrices


There is a simple way for determining whether a 2 × 2 matrix A is invertible and there is a
simple formula for finding A−1 . First, we present the formula. Let
$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$
and suppose that ad − bc ≠ 0. Then
{e:formAinv} $$A^{-1} = \frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}. \qquad (3.8.1)$$

This is most easily verified by directly applying the formula for matrix multiplication. So
A is invertible when ad − bc ≠ 0. We shall prove below that ad − bc must be nonzero when
A is invertible.
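Formula (3.8.1) is also easy to code. The following MATLAB sketch is our own illustration (the file name inv2x2.m is our choice); it computes the inverse directly from (3.8.1):

function Ainv = inv2x2(A)
% inv2x2 -- inverse of a 2 x 2 matrix from formula (3.8.1) (illustrative sketch)
d = A(1,1)*A(2,2) - A(1,2)*A(2,1);    % the quantity ad - bc
if d == 0
    error('ad - bc = 0, so the matrix has no inverse')
end
Ainv = [A(2,2) -A(1,2); -A(2,1) A(1,1)] / d;

For example, inv2x2([2 1; 3 2]) agrees with MATLAB's own inv([2 1; 3 2]).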
From this discussion it is clear that the number ad − bc must be an important quantity for
2 × 2 matrices. So we define:

Definition 3.8.1. The determinant of the 2 × 2 matrix A is

{D:determinant} det(A) = ad − bc. (3.8.2)


{propdet}
Proposition 3.8.2. As a function on 2 × 2 matrices, the determinant satisfies the following
properties.

(a) The determinant of an upper triangular matrix is the product of the diagonal elements.

(b) The determinants of a matrix and its transpose are equal.

(c) det(AB) = det(A) det(B).

Proof Both (a) and (b) are easily verified by direct calculation. Property (c) is also
verified by direct calculation — but of a more extensive sort. Note that
    
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} = \begin{pmatrix} a\alpha + b\gamma & a\beta + b\delta \\ c\alpha + d\gamma & c\beta + d\delta \end{pmatrix}.$$

180
§3.8 Determinants of 2 × 2 Matrices

Therefore,

det(AB) = (aα + bγ)(cβ + dδ) − (aβ + bδ)(cα + dγ)


= (acαβ + bcβγ + adαδ + bdγδ)
−(acαβ + bcαδ + adβγ + bdγδ)
= bc(βγ − αδ) + ad(αδ − βγ)
= (ad − bc)(αδ − βγ)
= det(A) det(B),

as asserted. 
{C:2x2invert}
Corollary 3.8.3. A 2 × 2 matrix A is invertible if and only if det(A) ≠ 0.

Proof If A is invertible, then AA−1 = I2 . Proposition 3.8.2 implies that

det(A) det(A−1 ) = det(I2 ) = 1.

Therefore, det(A) ≠ 0. Conversely, if det(A) ≠ 0, then (3.8.1) implies that A is invertible.




Determinants and Area Suppose that v and w are two vectors in R2 that point in different
directions. Then, the set of points

z = αv + βw where 0 ≤ α, β ≤ 1

is a parallelogram, that we denote by P . We denote the area of P by |P |. For example,


the unit square S, whose corners are (0, 0), (1, 0), (0, 1), and (1, 1), is the parallelogram
generated by the unit vectors e1 and e2 .
Next let A be a 2 × 2 matrix and let

A(P ) = {Az : z ∈ P }.

It follows from linearity (since Az = αAv + βAw) that A(P ) is the parallelogram generated
by Av and Aw.
{P:det&area}
Proposition 3.8.4. Let A be a 2 × 2 matrix and let S be the unit square. Then

{e:det&area2} |A(S)| = | det A|. (3.8.3)

181
§3.8 Determinants of 2 × 2 Matrices

Proof Note that A(S) is the parallelogram generated by u1 = Ae1 and u2 = Ae2 , and u1
and u2 are the columns of A. It follows that
$$(\det A)^2 = \det(A^t)\det(A) = \det(A^tA) = \det\begin{pmatrix} u_1^tu_1 & u_1^tu_2 \\ u_2^tu_1 & u_2^tu_2 \end{pmatrix}.$$
Hence
$$(\det A)^2 = \det\begin{pmatrix} \|u_1\|^2 & u_1\cdot u_2 \\ u_1\cdot u_2 & \|u_2\|^2 \end{pmatrix} = \|u_1\|^2\|u_2\|^2 - (u_1\cdot u_2)^2.$$
Recall that (1.4.5) of Chapter 1 states that
$$|P|^2 = \|v\|^2\|w\|^2 - (v\cdot w)^2,$$
where P is the parallelogram generated by v and w. Therefore, (det A)^2 = |A(S)|^2 and
{T:det&area} (3.8.3) is verified. 
Theorem 3.8.5. Let P be a parallelogram in R2 and let A be a 2 × 2 matrix. Then
{e:det&area} |A(P )| = | det A||P |. (3.8.4)

Proof First note that (3.8.3) is a special case of (3.8.4), since |S| = 1. Next, let P be
the parallelogram generated by the (column) vectors v and w, and let B = (v|w). Then
P = B(S). It follows from (3.8.3) that |P | = | det B|. Moreover,
|A(P )| = |(AB)(S)|
= | det(AB)|
= | det A|| det B|
= | det A||P |,
as desired. 
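A quick numerical illustration of (3.8.3) can be done in MATLAB with polyarea; the sample matrix below is our own choice, and the two printed numbers agree (both equal 7):

A = [2 1; -1 3];                 % any sample 2 x 2 matrix
S = [0 1 1 0; 0 0 1 1];          % corners of the unit square S, as columns
AS = A*S;                        % corners of the parallelogram A(S)
polyarea(AS(1,:), AS(2,:))       % area of A(S)
abs(det(A))                      % |det A|, the same number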

Exercises

{c4.9.1}
1. Find the inverse of the matrix  
2 1
.
3 2

Use (3.8.1) to compute the inverse of the matrix, as follows:


$$\begin{pmatrix} 2 & 1 \\ 3 & 2 \end{pmatrix}^{-1} = \frac{1}{4 - 3}\begin{pmatrix} 2 & -1 \\ -3 & 2 \end{pmatrix} = \begin{pmatrix} 2 & -1 \\ -3 & 2 \end{pmatrix}.$$

182
§3.8 Determinants of 2 × 2 Matrices

{c7.8.4}
 
2. Find the inverse of the shear matrix $\begin{pmatrix} 1 & K \\ 0 & 1 \end{pmatrix}$.
Answer: The inverse of the matrix is
 
$$\begin{pmatrix} 1 & -K \\ 0 & 1 \end{pmatrix}.$$

Solution: Note that the inverse of any 2 × 2 matrix with ad − bc ≠ 0 is
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.$$
Here a = d = 1, b = K, and c = 0, so ad − bc = 1 and the formula gives the matrix above.

{c4.9.4}
 
3. Show that the 2 × 2 matrix $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ is row equivalent to I2 if and only if ad − bc ≠ 0.
Hint: Prove this result separately in the two cases a ≠ 0 and a = 0.
Case: a ≠ 0. A can be row reduced as follows:
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \longrightarrow \begin{pmatrix} 1 & \frac{b}{a} \\ c & d \end{pmatrix} \longrightarrow \begin{pmatrix} 1 & \frac{b}{a} \\ 0 & \frac{ad - bc}{a} \end{pmatrix}.$$
If ad − bc ≠ 0, then the matrix can be row reduced to I2 , whereas if ad − bc = 0, the row reduced
matrix is
$$\begin{pmatrix} 1 & \frac{b}{a} \\ 0 & 0 \end{pmatrix},$$
which cannot be reduced further and is not row equivalent to I2 .
Case: a = 0. If either c = 0 or b = 0, then the resulting matrices,
$$\begin{pmatrix} 0 & b \\ 0 & d \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 0 & 0 \\ c & d \end{pmatrix}$$
respectively, are not row equivalent to I2 , and ad − bc = 0 − 0 = 0. If c ≠ 0 and b ≠ 0, then the
matrix can be row reduced:
$$\begin{pmatrix} 0 & b \\ c & d \end{pmatrix} \longrightarrow \begin{pmatrix} c & d \\ 0 & b \end{pmatrix} \longrightarrow \begin{pmatrix} 1 & \frac{d}{c} \\ 0 & b \end{pmatrix},$$
which is row equivalent to I2 . So A is indeed row equivalent to I2 if and only if ad − bc ≠ 0.

183
§3.8 Determinants of 2 × 2 Matrices

{c4.9.5}
4. Let A be a 2 × 2 matrix having integer entries. Find a condition on the entries of A that
guarantees that A−1 has integer entries.
Answer: The matrix A−1 has integer entries when |ad − bc| = 1.
Solution: By (3.8.1),
$$A^{-1} = \frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.$$
So, in order for A−1 to have integer entries, 1/(ad − bc) must be an integer. Since a, b, c, and d are
integers, 1/(ad − bc) is an integer only if |ad − bc| = 1.

{c6.4.4}
5. Let A be a 2 × 2 matrix and assume that det(A) ≠ 0. Then use the explicit form for A−1 given
in (3.8.1) to verify that
$$\det(A^{-1}) = \frac{1}{\det(A)}.$$

Verify by computation:

$$\det(A^{-1}) = \frac{a}{\det(A)}\,\frac{d}{\det(A)} - \frac{-b}{\det(A)}\,\frac{-c}{\det(A)} = \frac{ad - bc}{(\det(A))^2} = \frac{1}{\det(A)}.$$

{c3_8.00650a}
6. Suppose a 2 × 2 matrix A satisfies the following equation:
   
{e:c3_8.00650}
$$A\begin{pmatrix} 0 & 2 \\ 1 & 2 \end{pmatrix} = \begin{pmatrix} -1 & 2 \\ 1 & 4 \end{pmatrix} \tag{3.8.5}$$

Without calculating the entries of A, find det(A).


Answer: det(A) = 3
Solution: Use the product rule for determinants to compute
      
$$\det\left(A\begin{pmatrix} 0 & 2 \\ 1 & 2 \end{pmatrix}\right) = \det(A)\det\begin{pmatrix} 0 & 2 \\ 1 & 2 \end{pmatrix} = -2\det(A) = \det\begin{pmatrix} -1 & 2 \\ 1 & 4 \end{pmatrix} = -6.$$

Hence det(A) = 3.
{c3_8.00650b}
7. Find the entries of A defined in (3.8.5) and verify your determinant calculation from Exercise 6.
 
Answer: $A = \begin{pmatrix} 2 & -1 \\ 1 & 1 \end{pmatrix}$

184
§3.8 Determinants of 2 × 2 Matrices

Solution: Use invertibility to compute


$$A = \begin{pmatrix} -1 & 2 \\ 1 & 4 \end{pmatrix}\begin{pmatrix} 0 & 2 \\ 1 & 2 \end{pmatrix}^{-1} = \begin{pmatrix} -1 & 2 \\ 1 & 4 \end{pmatrix}\frac{1}{2}\begin{pmatrix} -2 & 2 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 2 & -1 \\ 1 & 1 \end{pmatrix}$$

Hence det(A) = 3.

{c7.8.3}
8. Sketch the triangle whose vertices are 0, p = (3, 0)t , and q = (0, 2)t ; and find the area of this
triangle. Let
$$M = \begin{pmatrix} -4 & -3 \\ 5 & -2 \end{pmatrix}.$$
Sketch the triangle whose vertices are 0, M p, and M q; and find the area of this triangle.
Answer: Let T be the triangle whose vertices are 0, p, and q, and let U be the triangle whose
vertices are 0, M p and M q. Then AT = 3 is the area of T , and AU = 69 is the area of U .
Solution: Use the formula for the area of a triangle to compute AT = (1/2)(3)(2) = 3. Then, use
Theorem 3.8.5 to compute
AU = | det M |AT = 23(3) = 69.
Figure 8 shows triangles T and U .


Figure 8

185
§3.8 Determinants of 2 × 2 Matrices

{c7.8.4A}
9. Cramer’s rule provides a method based on determinants for finding the unique solution to the
linear equation Ax = b when A is an invertible matrix. More precisely, let A be an invertible 2 × 2
matrix and let b ∈ R2 be a column vector. Let Bj be the 2 × 2 matrix obtained from A by replacing
the j th column of A by the vector b. Let x = (x1 , x2 )t be the unique solution to Ax = b. Then
Cramer’s rule states that
{E:cramer}
$$x_j = \frac{\det(B_j)}{\det(A)}. \tag{3.8.6}$$
Prove Cramer’s rule. Hint: Write the general system of two equations in two unknowns as

a11 x1 + a12 x2 = b1
a21 x1 + a22 x2 = b2 .

Subtract a11 times the second equation from a21 times the first equation to eliminate x1 ; then solve
for x2 , and verify (3.8.6). Use a similar calculation to solve for x1 .
The general system of linear equations in two unknowns is:

a11 x1 + a12 x2 = b1
a21 x1 + a22 x2 = b2 .

Subtract a11 times the 2nd equation from a21 times the 1st equation to eliminate x1 and obtain

(a21 a12 − a11 a22 )x2 = (a21 b1 − a11 b2 ).

Therefore
$$x_2 = \frac{a_{21}b_1 - a_{11}b_2}{a_{21}a_{12} - a_{11}a_{22}} = \frac{\det(B_2)}{\det(A)}.$$
A similar argument works for x1 .
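As a sketch of how (3.8.6) looks in MATLAB (the variable names are ours; the data are those of Exercise 10 below), one can form B1 and B2 by replacing a column of A by b:

A = [2 3; 3 -5];  b = [2; 1];
B1 = A;  B1(:,1) = b;               % replace column 1 by b
B2 = A;  B2(:,2) = b;               % replace column 2 by b
x = [det(B1); det(B2)] / det(A)     % first component is 13/19, as in Exercise 10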

In Exercises 10 – 11 use Cramer’s rule (3.8.6) to solve the given system of linear equations.
{c7.8.4B}
10. Solve for x:
$$\begin{aligned} 2x + 3y &= 2 \\ 3x - 5y &= 1 \end{aligned}$$
Answer: x = 13/19.
Solution: By Cramer's rule (see (3.8.6)),
$$x = \det\begin{pmatrix} 2 & 3 \\ 1 & -5 \end{pmatrix} \Big/ \det\begin{pmatrix} 2 & 3 \\ 3 & -5 \end{pmatrix} = \frac{-13}{-19}.$$
{c7.8.4C}
11. Solve for y:
$$\begin{aligned} 4x - 3y &= -1 \\ x + 2y &= 7 \end{aligned}$$

186
§3.8 Determinants of 2 × 2 Matrices

Answer: y = 29/11.
Solution: By Cramer's rule (see (3.8.6)),
$$y = \det\begin{pmatrix} 4 & -1 \\ 1 & 7 \end{pmatrix} \Big/ \det\begin{pmatrix} 4 & -3 \\ 1 & 2 \end{pmatrix} = \frac{29}{11}.$$

{c4.9.9} 12. (matlab) Use MATLAB to choose five 2 × 2 matrices at random and compute their inverses.
Do you get the impression that ‘typically’ 2 × 2 matrices are invertible? Try to find a reason for
this fact using the determinant of 2 × 2 matrices.
A randomly selected 2 × 2 matrix is almost always invertible. A matrix will fail to be invertible
only if the determinant of the matrix is 0, which is seldom the case.
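One possible way to run this experiment (our own sketch, not the only one) is:

for k = 1:5
    A = rand(2)     % a random 2 x 2 matrix with entries in [0,1)
    det(A)          % essentially never exactly zero ...
    inv(A)          % ... so the inverse essentially always exists
end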

In Exercises 13 – 16 use the unit square icon in the program map to test Proposition 3.8.4, as follows.
Enter the given matrix A into map and map the unit square icon. Compute det(A) by estimating
the area of A(S) — given that S has unit area. For each matrix, use this numerical experiment to
decide whether or not the matrix is invertible.
 
{c3.8.AA}
13. (matlab) $A = \begin{pmatrix} 0 & -2 \\ 2 & 0 \end{pmatrix}$.
Answer: The matrix A is invertible and det(A) = 4.
Solution: Figure 13 shows the map output for this matrix. The area of the square resulting from
the map is 4, so | det(A)| = 4.
 
{c3.8.AB}
14. (matlab) $A = \begin{pmatrix} -0.5 & -0.5 \\ 0.7 & 0.7 \end{pmatrix}$.
Answer: The matrix A is not invertible and det(A) = 0.
Solution: Figure 14 shows the map output for this matrix. The square is mapped to a line, whose
area is 0, so | det(A)| = 0.
 
{c3.8.AC}
15. (matlab) $A = \begin{pmatrix} -1 & -0.5 \\ -2 & -1 \end{pmatrix}$.
Answer: The matrix A is not invertible and det(A) = 0.
Solution: Figure 15 shows the map output for this matrix. The square is mapped to a line, whose
area is 0, so | det(A)| = 0.

187
§3.8 Determinants of 2 × 2 Matrices


Figure 13 Figure 14 Figure 15 Figure 16

 
{c3.8.AD}
16. (matlab) $A = \begin{pmatrix} 0.7071 & 0.7071 \\ -0.7071 & 0.7071 \end{pmatrix}$.
Answer: The matrix A is invertible and det(A) = 1.
Solution: Figure 16 shows the map output for this matrix. The result of the map is another square
of area 1, so | det(A)| = 1.

17. Suppose a 2 × 2 matrix A satisfies the following equation:


   
$$A\begin{pmatrix} 0 & 2 \\ 1 & 2 \end{pmatrix} = \begin{pmatrix} -1 & 2 \\ 1 & 4 \end{pmatrix}.$$

Without calculating the entries of A, find det(A).


Answer: det(A) = 3.
Solution: It follows from the identity that
   
$$\det(A)\det\begin{pmatrix} 0 & 2 \\ 1 & 2 \end{pmatrix} = \det\begin{pmatrix} -1 & 2 \\ 1 & 4 \end{pmatrix}.$$

Hence
−2 det(A) = −6
and
det A = 3.

188
Chapter 4 Solving Linear Differential Equations

4 Solving Linear Differential Equations


The study of linear systems of equations given in Chapter 2 provides one motivation for
the study of matrices and linear algebra. Linear constant coefficient systems of ordinary
differential equations provide a second motivation for this study. In this chapter we show
how the phase space geometry of systems of differential equations motivates the idea of
eigendirections (or invariant directions) and eigenvalues (or growth rates).
We begin this chapter with a discussion of the theory and application of the simplest of
linear differential equations, the linear growth equation, ẋ = λx. In Section 4.1, we solve
the linear growth equation and discuss the fact that solutions to differential equations are
functions; and we emphasize this point by using MATLAB to graph solutions of x as a
function of t. In the optional Section 4.2 we illustrate the applicability of this very simple
equation with a discussion of compound interest and a simple population model.
The next two sections introduce planar constant coefficient linear differential equations. In
these sections we use the program pplane9 (written by John Polking) that solves numerically
planar systems of differential equations. In Section 4.3 we discuss uncoupled systems — two
independent one dimensional systems like those presented in Section 4.1 — whose solution
geometry in the plane is somewhat more complicated than might be expected. In Section 4.4
we discuss coupled linear systems. Here we illustrate the existence and nonexistence of
eigendirections.
In Section 4.5 we show how initial value problems can be solved by building the solution —
through the use of superposition as discussed in Section 3.4 — from simpler solutions. These
simpler solutions are ones generated from real eigenvalues and eigenvectors — when they
exist. In Section 4.6 we develop the theory of eigenvalues and characteristic polynomials of
2 × 2 matrices. (The corresponding theory for n × n matrices is developed in Chapter 7.)
The method for solving planar constant coefficient linear differential equations with real
eigenvalues is summarized in Section 4.7. This method is based on the material of Sec-
tions 4.5 and 4.6. The complete discussion of the solutions of linear planar systems of
differential equations is given in Chapter 6. This discussion is best done after we have
introduced the linear algebra concepts of vector subspaces and bases in Chapter 5.
The chapter ends with an optional discussion of Markov chains in Section 4.8. Markov chains
give a method for analyzing branch processes where at each time unit several outcomes are
possible, each with a given probability.

189
§4.1 A Single Differential Equation

{chap:SolveOdes}

{S:singlelineareqn} 4.1 A Single Differential Equation


Algebraic operations such as addition and multiplication are performed on numbers while
the calculus operations of differentiation and integration are performed on functions. Thus
algebraic equations (such as x2 = 9) are solved for numbers (x = ±3) while differential (and
integral) equations are solved for functions.
In Chapter 2 we discussed how to solve systems of linear equations, such as

x1 + x2 = 2
x1 − x2 = 4

for numbers
x1 = 3 and x2 = −1,
while in this chapter we discuss how to solve some linear systems of differential equations
for functions.
Solving a single linear equation in one unknown x is a simple task. For example, solve

2x = 4

for x = 2. Solving a single differential equation in one unknown function x(t) is far from
trivial.

Integral Calculus as a Differential Equation Mathematically, the simplest type of differential


equation is:
{e:intcalc}
$$\frac{dx}{dt}(t) = f(t) \tag{4.1.1}$$
where f is some continuous function. In words, this equation asks us to find all functions
x(t) whose derivative is f (t). The fundamental theorem of calculus tells us the answer: x(t)
is an antiderivative of f (t). Thus to find all solutions, we just integrate both sides of (4.1.1)
with respect to t. Formally, using indefinite integrals,
{E:integrate}
$$\int\frac{dx}{dt}(t)\,dt = \int f(t)\,dt + C, \tag{4.1.2}$$

where C is an arbitrary constant. (It is tempting to put a constant of integration on both


sides of (4.1.2), but two constants are not needed, as we can just combine both constants

190
§4.1 A Single Differential Equation

on the right hand side of this equation.) Since the indefinite integral of dx/dt is just the
function x(t), we have
{e:intcalcsoln}
$$x(t) = \int f(\tau)\,d\tau + C. \tag{4.1.3}$$

In particular, finding closed form solutions to differential equations of the type (4.1.1) is
equivalent to finding all definite integrals of the function f (t). Indeed, to find closed form
solutions to differential equations like (4.1.1) we need to know all of the techniques of
integration from integral calculus.
We note that if x(t) is a real-valued function of t, then we denote the derivative of x with
respect to t using the following
$$\frac{dx}{dt}, \qquad \dot{x}, \qquad x',$$
all of which are standard notations for the derivative.

Initial Conditions and the Role of the Integration Constant C Equation (4.1.3) tells us that
there are an infinite number of solutions to the differential equation (4.1.1), each one cor-
responding to a different choice of the constant C. To understand how to interpret the
constant C, consider the example
$$\frac{dx}{dt}(t) = \cos t.$$
Using (4.1.3) we see that the answer is
$$x(t) = \int\cos\tau\,d\tau + C = \sin t + C.$$

Note that
x(0) = sin(0) + C = C.
Thus, the constant C represents an initial condition for the differential equation. We will
return to the discussion of initial conditions several times in this chapter.

The Linear Differential Equation of Growth and Decay The subject of differential equations
that we study begins when the function f on the right hand side of (4.1.1) depends explicitly
on the function x, and the simplest such differential equation is:

$$\frac{dx}{dt}(t) = x(t).$$

191
§4.1 A Single Differential Equation

Using results from differential calculus, we can solve this equation; indeed, we can solve the
slightly more complicated equation

{lin1}
$$\frac{dx}{dt}(t) = \lambda x(t), \tag{4.1.4}$$
where λ ∈ R is a constant. The differential equation (4.1.4) is linear since x(t) appears by
itself on the right hand side. Moreover, (4.1.4) is homogeneous since the constant function
x(t) = 0 is a solution.
In words (4.1.4) asks: For which functions x(t) is the derivative of x(t) equal to λx(t). The
function
x(t) = eλt
is such a function, since
$$\frac{dx}{dt}(t) = \frac{d}{dt}e^{\lambda t} = \lambda e^{\lambda t} = \lambda x(t).$$
More generally, the function
{soln1} x(t) = Keλt (4.1.5)
is a solution to (4.1.4) for any real constant K. We claim that the functions (4.1.5) list all
(differentiable) functions that solve (4.1.4).
To verify this claim, we let x(t) be a solution to (4.1.4) and show that the ratio

x(t)
$$\frac{x(t)}{e^{\lambda t}} = x(t)e^{-\lambda t}$$

$$\begin{aligned} \frac{d}{dt}\left(x(t)e^{-\lambda t}\right) &= \frac{d}{dt}(x(t))\,e^{-\lambda t} + x(t)\frac{d}{dt}\left(e^{-\lambda t}\right) \\ &= (\lambda x(t))e^{-\lambda t} + x(t)(-\lambda e^{-\lambda t}) \\ &= 0. \end{aligned}$$

Now recall that the only functions whose derivatives are identically zero are the constant
functions. Thus,
x(t)e−λt = K
for some constant K ∈ R. Hence x(t) has the form (4.1.5), as claimed.
Next, we discuss the role of the constant K. We have written the function as x(t), and we
have meant the reader to think of the variable t as time. Thus x(0) is the initial value of

192
§4.1 A Single Differential Equation

the function x(t) at time t = 0; we say that x(0) is the initial value of x(t). From (4.1.5)
we see that
x(0) = K,
and that K is the initial value of the solution of (4.1.4). Henceforth, we write K as x0 so
that the notation calls attention to the special meaning of this constant.
By deriving (4.1.5) we have proved:
{T:singleeqn}
Theorem 4.1.1. There is a unique solution to the initial value problem

{ivp1}
$$\frac{dx}{dt}(t) = \lambda x(t), \qquad x(0) = x_0. \tag{4.1.6}$$

That solution is
x(t) = x0 eλt .

As a consequence of Theorem 4.1.1 we see that there is a qualitative difference in the behavior
of solutions to (4.1.6) depending on whether λ > 0 or λ < 0. Suppose that x0 > 0. Then

{explimits}
$$\lim_{t\to\infty}x(t) = \lim_{t\to\infty}x_0e^{\lambda t} = \begin{cases} +\infty & \lambda > 0 \\ 0 & \lambda < 0. \end{cases} \tag{4.1.7}$$

When λ > 0 we say that the solution has exponential growth and when λ < 0 we say that
the solution has exponential decay . In either case, however, the number λ is called the
growth rate. We can visualize this discussion by graphing the solutions in MATLAB.
Suppose we set x0 = 1 and λ = ±0.5. Type

x0 = 1;
lambda = 0.5;
t = linspace(-1,4,100);
x = x0*exp(lambda*t);
plot(t,x)
hold on
xlabel('t')
ylabel('x')
lambda = -0.5;
x = x0*exp(lambda*t);
plot(t,x)

193
§4.1 A Single Differential Equation

The result of this calculation is shown in Figure 12. In this way we can actually see the
difference between exponential growth (λ = 0.5) and exponential decay (λ = −0.5), as
discussed in the limit in (4.1.7).


{graph_labelfig} Figure 12: Solutions of (4.1.4) for t ∈ [−1, 4], x0 = 1 and λ = ±0.5.

Exercises

In Exercises 1 – 4 determine whether or not each of the given functions x1 (t) and x2 (t) is a solution
to the given differential equation.
{c3.1.ba}
1. ODE: dx/dt = t/(x − 1).
Functions: x1(t) = t + 1 and $x_2(t) = \dfrac{1 + \sqrt{4t^2 + 1}}{2}$.
Answer: The function x1 (t) is a solution to the differential equation; the function x2 (t) is not a
solution.

194
§4.1 A Single Differential Equation

Solution: Compute
$$\frac{d}{dt}(x_1) = \frac{d}{dt}(t + 1) = 1, \quad\text{and}\quad \frac{dx_1}{dt} = \frac{t}{x_1 - 1} = \frac{t}{(t + 1) - 1} = 1.$$
Thus, x1(t) is a solution to the differential equation. Then compute
$$\frac{d}{dt}(x_2) = \frac{d}{dt}\left(\frac{1 + \sqrt{4t^2 + 1}}{2}\right) = \frac{4t}{2\sqrt{4t^2 + 1}}, \quad\text{and}\quad \frac{dx_2}{dt} = \frac{t}{x_2 - 1} = \frac{2t}{\sqrt{4t^2 + 1} - 1}.$$
Thus, $\frac{d}{dt}(x_2) \ne \frac{dx_2}{dt}$, so x2(t) is not a solution to the differential equation.
{c3.1.bb}
2. ODE: dx/dt = x + e^t.
Functions: x1(t) = te^t and x2(t) = 2e^t.
Answer: The function x1(t) is a solution to the differential equation; the function x2(t) is not a
solution.
Solution: Compute
$$\frac{d}{dt}(x_1) = \frac{d}{dt}(te^t) = te^t + e^t, \quad\text{and}\quad \frac{dx_1}{dt} = x_1 + e^t = te^t + e^t.$$
Thus, x1(t) is a solution to the differential equation. Then compute
$$\frac{d}{dt}(x_2) = \frac{d}{dt}(2e^t) = 2e^t, \quad\text{and}\quad \frac{dx_2}{dt} = x_2 + e^t = 2e^t + e^t = 3e^t.$$
Thus, $\frac{d}{dt}(x_2) \ne \frac{dx_2}{dt}$, so x2(t) is not a solution to the differential equation.
{c3.1.bc}
3. ODE: dx/dt = x^2 + 1.
Functions: x1(t) = − tan t and x2(t) = tan t.
Answer: The function x1(t) is not a solution to the differential equation; the function x2(t) is a
solution.
Solution: Compute
$$\frac{d}{dt}(x_1) = \frac{d}{dt}(-\tan(t)) = -\sec^2(t), \quad\text{and}\quad \frac{dx_1}{dt} = x_1^2 + 1 = \tan^2(t) + 1 = \sec^2(t).$$
Thus, $\frac{d}{dt}(x_1) \ne \frac{dx_1}{dt}$, so x1(t) is not a solution to the differential equation. Then compute
$$\frac{d}{dt}(x_2) = \frac{d}{dt}(\tan(t)) = \sec^2(t), \quad\text{and}\quad \frac{dx_2}{dt} = x_2^2 + 1 = \tan^2(t) + 1 = \sec^2(t).$$
Thus, x2(t) is a solution to the differential equation.

195
§4.1 A Single Differential Equation

{c3.1.bd}
4. ODE: dx/dt = x/t.
Functions: x1(t) = t + 1 and x2(t) = 5t.
Answer: The function x1(t) is not a solution to the differential equation; the function x2(t) is a
solution.
Solution: Compute
$$\frac{d}{dt}(x_1) = \frac{d}{dt}(t + 1) = 1, \quad\text{and}\quad \frac{dx_1}{dt} = \frac{x_1}{t} = \frac{t + 1}{t}.$$
Thus, $\frac{d}{dt}(x_1) \ne \frac{dx_1}{dt}$, so x1(t) is not a solution to the differential equation. Then compute
$$\frac{d}{dt}(x_2) = \frac{d}{dt}(5t) = 5, \quad\text{and}\quad \frac{dx_2}{dt} = \frac{x_2}{t} = \frac{5t}{t} = 5.$$
Thus, x2(t) is a solution to the differential equation.
{c3.1.1}
5. Solve the differential equation
$$\frac{dx}{dt} = 2x,$$
where x(0) = 1. At what time t1 will x(t1) = 2?
Answer: The solution for the given initial value problem is x(t) = e^{2t}. From this equation, we
find that x(t1) = 2 when t1 = (1/2) ln 2.
Solution: Note that dx/dt = λx implies x(t) = x0 e^{λt}. In this case, dx/dt = 2x, so λ = 2, and x0 = 1,
so x(t) = e^{2t}. In order to find t1, substitute into the formula for x(t), obtaining e^{2t1} = 2, and solve
for t1.
{c3.1.2}
6. Solve the differential equation
$$\frac{dx}{dt} = -3x.$$
At what time t1 will x(t1) be half of x(0)?
Answer: Using the initial value problem, we find that dx/dt = −3x implies x(t) = x0 e^{−3t}. Given
this equation, x(t1) will be half of x(0) at time t1 = −(1/3) ln(0.5).
Solution: Find this value of t1 by substituting into the formula for x. That is, use
$$x_0e^{-3t_1} = x(t_1) = \frac{1}{2}x_0,$$

196
§4.1 A Single Differential Equation

which implies
$$e^{-3t_1} = \frac{1}{2}.$$
Then solve for t1 .

In Exercises 7 – 10 use MATLAB to graph the given function f on the specified interval.

{c3.1.a78a} 7. (matlab) f (t) = t2 on the interval t ∈ [0, 2].


Answer: The graph is shown in Figure 7.
Solution: Graph the function f (t) = t2 using the following MATLAB commands:

t = linspace(0,2,100); x = t.*t; plot(t,x)

{c3.1.a78b} 8. (matlab) f (t) = et − t on the interval t ∈ [0, 3].


The graph is shown in Figure 8.

{c3.1.a78c} 9. (matlab) f (t) = cos(2t) − t on the interval t ∈ [2, 8].


The graph is shown in Figure 9.

{c3.1.a78d} 10. (matlab) f (t) = sin(5t) on the interval t ∈ [0, 6.5].


The graph is shown in Figure 10.

Figure 7 Figure 8 Figure 9 Figure 10

Hint: Use the fact that the trigonometric functions sin and cos can be evaluated in MATLAB in
the same way as the exponential function, that is, by using sin and cos instead of exp.

197
§4.2 *Rate Problems

{S:growthmodels} 4.2 *Rate Problems


Even though the homogeneous linear differential equation (4.1.6) is one of the simplest
differential equations, it still has some use in applications. We present two here: compound
interest and population dynamics.

Compound Interest Banks pay interest on an account in the following way. At the end of
each day, the bank determines the interest rate rday for that day, checks the principal P in
the account, and then deposits an additional rday P . So the next day the principal in this
account is (1 + rday )P . Note that if r denotes the interest rate per year, then rday = r/365.
Of course, a day is just a convenient measure for elapsed time. Before computers were
prevalent, banks paid interest yearly or quarterly or monthly or, in a few cases, even weekly,
depending on the particular bank rules.
Observe that the more frequently interest is paid, the more money is earned. For example,
if interest is paid only once at the end of a year, then the money in the account at the end
of the year is (1 + r)P , and the amount rP is called simple interest. But if interest is paid
twice a year, then the principal at the end of six months will be (1 + r/2)P , and the principal
at the end of the year will be (1 + r/2)^2 P . Since
$$\left(1 + \frac{r}{2}\right)^2 = 1 + r + \frac{1}{4}r^2 > 1 + r,$$
there is more money in the account at the end of the year if the interest is compounded
semiannually rather than annually. But how much is the difference and what is the maximum
earning potential?
While making the calculation in the previous paragraph, we implicitly made a number of
simplifying assumptions. In particular, we assumed

• an initial principal P0 is deposited in the bank on January 1,


• the money is not withdrawn for one year,
• no new money is deposited in that account during the year,
• the yearly interest rate r remains constant throughout the year, and
• interest is added to the account N times during the year.

In this model, simple interest corresponds to N = 1, compound monthly interest to N = 12,


and compound daily interest to N = 365.

198
§4.2 *Rate Problems

We first answer the question: How much money is in this account after one year? After one
time unit of 1/N year, the amount of money in the account is
$$Q_1 = \left(1 + \frac{r}{N}\right)P_0.$$
The interest rate in each time period is r/N , the yearly rate r divided by the number of
time periods N . Here we have used the assumption that the interest rate remains constant
throughout the year. After two time units, the principal is
$$Q_2 = \left(1 + \frac{r}{N}\right)Q_1 = \left(1 + \frac{r}{N}\right)^2P_0,$$
and at the end of the year (that is, after N time periods)
{compint}
$$Q_N = \left(1 + \frac{r}{N}\right)^NP_0. \tag{4.2.1}$$
Here we have used the assumption that money is neither deposited nor withdrawn from our
account. Note that QN is the amount of money in the bank after one year assuming that
interest has been compounded N (equally spaced) times during that year, and the effective
interest rate when compounding N times is:
$$\left(1 + \frac{r}{N}\right)^N - 1.$$

For the curious, we can write a program in MATLAB to compute (4.2.1). Suppose we assume
that the initial deposit P0 = $1, 000, the simple interest rate is 6% per year, and the interest
payments are made monthly. In MATLAB type

N = 12;
P0 = 1000;
r = 0.06;
QN = (1 + r/N)^N*P0

The answer is QN = $1, 061.68, and the effective interest rate for monthly payments is
6.16778%. For daily interest payments N = 365, the answer is QN = $1, 061.83, and the
effective interest rate is 6.18313%.
To find the maximum effective interest, we ask the bank to compound interest continuously;
that is, we ask the bank to compute
$$\lim_{N\to\infty}\left(1 + \frac{r}{N}\right)^N.$$

199
§4.2 *Rate Problems

We compute this limit using differential equations. The concept of continuous interest is
rephrased as follows. Let P (t) be the principal at time t, where t is measured in units of
years. Suppose that we assume that interest is compounded N times during the year. The
length of time in each compounding period is

$$\Delta t = \frac{1}{N},$$

and the change in principal during that time period is

$$\Delta P = \frac{r}{N}P = rP\,\Delta t.$$

It follows that
$$\frac{\Delta P}{\Delta t} = rP,$$
and, on taking the limit ∆t → 0, we have the differential equation
$$\frac{dP}{dt}(t) = rP(t).$$

Since P (0) = P0 the solution of the initial value problem given in Theorem 4.1.1 shows that

P (t) = P0 ert .

After one year (t = 1) we find that

P (1) = er P0 .

Note that
P (1) = lim QN ,
N →∞

and we have thus verified that


$$\lim_{N\to\infty}\left(1 + \frac{r}{N}\right)^N = e^r.$$

Thus the maximum effective interest rate is er − 1. When r = 6% the maximum effective
interest rate is 6.18365%.
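These effective rates are easy to reproduce numerically. The following MATLAB sketch (our own; the chosen values of N are for illustration) prints the effective rate for several compounding frequencies and compares them with the continuous rate e^r − 1:

r = 0.06;
for N = [1 12 365 10000]
    fprintf('N = %5d: effective rate = %.5f%%\n', N, 100*((1 + r/N)^N - 1))
end
fprintf('continuous:  effective rate = %.5f%%\n', 100*(exp(r) - 1))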

200
§4.2 *Rate Problems

Population Dynamics To provide a second interpretation of the constant λ in (4.1.4), we


discuss a simplified model for population dynamics. Let p(t) be the size of a population
of a certain species at time t and let r be the rate at which the population p is changing
at time t. In general, r depends on the time t and is a complicated function of birth and
death rates and of immigration and emigration, as well as of other factors. Indeed, the rate
r may well depend on the size of the population itself. (Overcrowding can be modeled by
assuming that the death rate increases with the size of the population.) These population
models assume that the rate of change in the size of the population dp/dt is given by
dp
{pop_model} (t) = rp(t), (4.2.2)
dt
they just differ on the precise form of r. In general, the rate r will depend on the size of the
population p as well as the time t, that is, r is a function r(p, t).
The simplest population model — which we now assume — is the one in which r is assumed
to be constant. Then equation (4.2.2) is identical to (4.1.4) after identifying p with x and
r with λ. Hence we may interpret r as the growth rate for the population. The form of
the solution in (4.1.5) shows that the size of a population grows exponentially if r > 0 and
decays exponentially if r < 0.
The mathematical description of this simplest population model shows that the assumption
of a constant growth rate leads to exponential growth (or exponential decay). Is this re-
alistic? Surely, no population will grow exponentially for all time, and other factors, such
as limited living space, have to be taken into account. On the other hand, exponential
growth describes well the growth in human population during much of human history. So
this model, though surely oversimplified, gives some insight into population growth.

Exercises

{c3.1.a01a} In Exercises 1 – 3 find solutions to the given initial value problems.


1. dx/dt = sin(2t), x(π) = 2.
Answer: x(t) = (5 − cos(2t))/2.
Solution: Integrate both sides of dx/dt = sin(2t) as in (4.1.3) to obtain
$$x(t) = x_0 + \int_{t_0}^{t}\sin(2\tau)\,d\tau = 2 + \int_{\pi}^{t}\sin(2\tau)\,d\tau = 2 - \frac{\cos(2t)}{2} + \frac{1}{2}.$$

201
§4.2 *Rate Problems

{c3.1.a01b}
2. dx/dt = t^2, x(2) = 8.
$$x(t) = \frac{t^3 + 16}{3}.$$
{c3.1.a01c}
3. dx/dt = 1/t^2, x(1) = 1.
$$x(t) = 2 - \frac{1}{t}.$$
4. Bacteria grown in a culture increase at a rate proportional to the number present. If the number
of bacteria doubles every 2 hours, then how many bacteria will be present after 5 hours? Express
your answer in terms of x0 , the initial number of bacteria.
Answer: After 5 hours, the amount of bacteria present will be x(5) = x0 e^{(5/2) ln 2} ≈ 5.66 x0.
Solution: Note that the rate of growth is proportional to the amount of bacteria, so dx/dt = λx(t).
The initial value problem implies x(t) = x0 e^{λt}. The amount of bacteria doubles after two hours.
So, at time t = 2, 2x0 = x0 e^{2λ} and λ = (1/2) ln 2. Therefore,
$$x(t) = x_0e^{\frac{1}{2}(\ln 2)t}.$$
We substitute t = 5 into this formula to find the amount of bacteria after 5 hours.
{c3.1.4}
5. Suppose you deposit $10,000 in a bank at an interest of 7.5% compounded continuously. How
much money will be in your account a year and a half later? How much would you have if the
interest were compounded monthly?
Answer: After a year and a half, given instantaneously compounded interest, you would have
Pinstant (1.5) = $11,190.72. Alternatively, if interest is compounded monthly, you would have
Pmonthly (1.5) = $11,186.81.
Solution: Use the formula for compound interest. In the case of interest compounded instanta-
neously, the formula is: Pinstant (t) = P0 ert . The interest rate is given as 7.5%, so r = 0.075. Thus,
Pinstant (t) = $10,000e0.075t . We then compute the amount of money in the account after a year
and a half by setting t = 1.5.
If interest is compounded monthly, then
$$P_{\text{monthly}}(t) = P_0\left(1 + \frac{r}{12}\right)^{12t} = \$10{,}000\left(1 + \frac{0.075}{12}\right)^{12t}.$$
Again, set t = 1.5 to find the principal after a year and a half.

202
§4.2 *Rate Problems

{c3.1.5}
6. Newton’s law of cooling states that the rate at which a body changes temperature is proportional
to the difference between the body temperature and the temperature of the surrounding medium.
That is,
dT
{e:Newton} = α(T − Tm ) (4.2.3)
dt
where T (t) is the temperature of the body at time t, Tm is the constant temperature of the surround-
ing medium, and α is the constant of proportionality. Suppose the body is in air of temperature
50◦ and the body cools from 100◦ to 75◦ in 20 minutes. What will the temperature of the body be
after one hour? Hint: Rewrite (4.2.3) in terms of U (t) = T (t) − Tm .
Answer: The temperature after 1 hour will be T (1) = 56.25◦ .
Solution: Let U = T − Tm . Since Tm = 50 is a constant,
$$\frac{dU}{dt} = \frac{d}{dt}(T - T_m) = \frac{dT}{dt}.$$
We also know that
$$\frac{dU}{dt} = \alpha(T - T_m) = \alpha U.$$
Therefore
$$T(t) - T_m = U(t) = Ke^{\alpha t}.$$
So
$$T(t) = Ke^{\alpha t} + T_m = Ke^{\alpha t} + 50.$$
Use the information T(0) = 100 in the formula for T(t) to find that K = 50. Next, use T(1/3) = 75
to get α = −3 ln 2. So,
$$T(t) = 50e^{(-3\ln 2)t} + 50, \qquad T(1) = 50e^{-3\ln 2} + 50 = 56.25^\circ.$$
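The same computation can be reproduced in a few lines of MATLAB (our own sketch, using the data of this exercise):

K = 100 - 50;                  % from T(0) = 100 and Tm = 50
alpha = 3*log((75 - 50)/K);    % from T(1/3) = 75, i.e. e^(alpha/3) = 1/2
T = @(t) K*exp(alpha*t) + 50;  % temperature as a function of time in hours
T(1)                           % returns 56.25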
{c3.1.6}
7. Let p(t) be the population of group Grk at time t measured in years. Let r be the growth rate of
the group Grk. Suppose that the population of Grks changes according to the differential equation
(4.2.2). Find r so that the population of Grks doubles every 50 years. How large must r be so that
the population doubles every 25 years?
Answer: If the population doubles every 50 years, then r = (1/50) ln 2. If the population doubles
every 25 years, then r = (1/25) ln 2.
Solution: Use the given equation dp/dt (t) = rp(t), from which we can infer p(t) = p0 e^{rt}. The
population doubles every 50 years, so p(50) = 2p0 . We can substitute this value for p(50) into the
population formula and solve for r. That is,
2p0 = p(50) = p0 e50r .

203
§4.2 *Rate Problems

In the same way, we can solve for the rate of growth if the population doubles every 25 years, by
substituting p(25) = 2p0 into the population formula.

{c3.1.7A}
8. You deposit $4,000 in a bank at an interest of 5.5% but after half a year the bank changes the
interest rate to 4.5%. Suppose that the interest is compounded continuously. How much money
will be in your account after one year?
Answer: After one year, you will have $4205.08 in your account.
Solution: Let P0 = 4000 be the initial amount of money in the account. During the first half of
the year, the money in the account grows to P1 = P0 e0.5r1 , where r1 = 0.055. During the second
half of the year, the money in the account grows to P = P1 e0.5r2 , where r2 = 0.045. So, after one
year, the money in the account is

P = P0 e0.5r1 e0.5r2 = P0 e0.5(r1 +r2 ) = 4000e0.5(0.1) .


{c3.1.7}
9. As an application of (4.1.3) answer the following question (posed by R.P. Agnew).

One day it started snowing at a steady rate. A snowplow started at noon and went two
miles in the first hour and one mile in the second hour. Assume that the speed of the
snowplow times the depth of the snow is constant. At what time did it start to snow?

To set up this problem, let d(t) be the depth of the snow at time t where t is measured in hours and
t = 0 is noon. Since the snow is falling at a constant rate r, d(t) = r(t − t0 ) where t0 is the time
that it started snowing. Let x(t) be the position of the snowplow along the road. The assumption
that speed times the depth equals a constant k means that
dx k K
(t) = =
dt d(t) t − t0
where K = k/r. The information about how far the snowplow goes in the first two hours translates
to
x(1) = 2 and x(2) = 3.
Now solve the problem.
Answer: The snow started falling at 11:23 am.
Solution: Begin with the given equation
$$\frac{dx}{dt} = \frac{K}{t - t_0}.$$
In order to get a formula for x(t), we take the integral of both sides of this equation, obtaining
$$x(t) = x(0) + \int_0^t\frac{K}{\tau - t_0}\,d\tau.$$

204
§4.2 *Rate Problems

Note that x(0) = 0, since the snowplow started plowing at time t = 0. We obtain by integration
that
$$x(t) = K\big[\ln(\tau - t_0)\big]\Big|_0^t = K(\ln|t - t_0| - \ln|{-t_0}|) = K\ln\frac{t - t_0}{-t_0} = K\ln\frac{t_0 - t}{t_0}.$$
We are given two values for x, namely, x(1) = 2 and x(2) = 3. We can substitute these values into
the formula for x(t) to get the system of equations

$$2 = K\ln\frac{t_0 - 1}{t_0}, \qquad 3 = K\ln\frac{t_0 - 2}{t_0}.$$

Solving the second equation for K and substituting into the first equation gives

$$2\ln\frac{t_0 - 2}{t_0} = 3\ln\frac{t_0 - 1}{t_0}.$$

Next take the exponential of both sides, then expand and solve for t0 . That is
$$\left(\frac{t_0 - 2}{t_0}\right)^2 = \left(\frac{t_0 - 1}{t_0}\right)^3, \qquad t_0(t_0 - 2)^2 = (t_0 - 1)^3, \qquad 0 = t_0^2 - t_0 - 1.$$

Note that t0 < 0, since the snow began falling before the snowplow started. Hence

$$t_0 = \frac{1 - \sqrt{5}}{2} \approx -0.618.$$
So the snow started falling at t0 ≈ −37 minutes, that is, at 11:23 am.
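A quick MATLAB check of the final step (our own sketch) finds the roots of the quadratic and converts the relevant one to minutes:

t0 = roots([1 -1 -1]);   % roots of t0^2 - t0 - 1 = 0
t0 = t0(t0 < 0)          % keep the negative root, about -0.618
60*t0                    % about -37 minutes, i.e. 11:23 am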

{c3.1.8}
10. Two banks each pay 7% interest per year — one compounds money daily and one compounds
money continuously. What is the difference in earnings in one year in an account having $10,000.
Answer: The difference in earnings is 7 cents.
Solution: Compute this using the interest formula, since we are given that the principal P0 =
$10,000, the interest rate r = 0.07, and t = 1. So we can substitute these values into the interest
formulas and find the amount of money in each account after 1 year. The formula for interest
compounded daily is
$$P(t) = P_0\left(1 + \frac{r}{365}\right)^{365t}, \qquad P(1) = \$10{,}000\left(1 + \frac{0.07}{365}\right)^{365} = \$10{,}725.01.$$

205
§4.2 *Rate Problems

To find the amount of money in the account at the other bank, we substitute into the formula for
interest compounded instantaneously:

P (t) = P0 ert
P (1) = $10,000e0.07
P (1) = $10,725.08

{c3.1.9}
11. There are two banks in town — Intrastate and Statewide. You plan to deposit $5,000 in
one of these banks for two years. Statewide Bank’s best savings account pays 8% interest per
year compounded quarterly and charges $10 to open an account. Intrastate Bank’s best savings
account pays 7.75% interest compounded daily. Which bank will pay you the most money when
you withdraw your money? Would your answer change if you had planned to keep your money in
the bank for only one year?
Answer: After two years, an account at Statewide returns more interest than the Intrastate
account. However, after only one year, the Intrastate account returns more.
Solution: To find the return from each account, use the formula for compound interest:
$$P(t) = P_0\left(1 + \frac{r}{N}\right)^{Nt}.$$
If you deposit your money at Statewide, P0 = $4,990 because of the fine. Interest is compounded
four times a year, so N = 4. The interest rate is 8%, so r = 0.08. After two years, an account at
Statewide returns
$$P_S(2) = \$4{,}990\left(1 + \frac{0.08}{4}\right)^{2(4)} = \$5{,}846.58.$$
If you deposit your money at Intrastate, r = 0.0775. Interest is compounded daily, so N = 365.
So, after two years, the account returns
$$P_I(2) = \$5{,}000\left(1 + \frac{0.0775}{365}\right)^{2(365)} = \$5{,}838.19.$$

After only 1 year, however,


 4
0.08
PS (1) = $4,900 1 + = $5,401.37
4

at Statewide, and
$$P_I(1) = \$5{,}000\left(1 + \frac{0.0775}{365}\right)^{365} = \$5{,}402.87$$
at Intrastate.

206
§4.2 *Rate Problems

{c3.1.10}
12. In the beginning of the year 1990 the population of the United States was approximately
250,000,000 people and the growth rate was estimated at 3% per year. Assuming that the growth
rate does not change, during what year will the population of the United States reach 400,000,000?
Answer: According to this population model, the population of the United States reaches
400,000,000 in the year 2005.
Solution: Note that the population model represents a continuous growth function p(t) = p0 ert .
Let t be time in years, starting in 1990. So p0 = 250 million. The growth rate r is 0.03. Thus, the
population at time t (in millions) is P (t) = 250e0.03t . The population will reach 400,000,000 when
400 = 250e0.03t . Solving for t yields t ≈ 15.67, that is, in the year 2005.
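The arithmetic of the last step is easily checked in MATLAB (our own two-line sketch):

t = log(400/250)/0.03    % about 15.67 years after the start of 1990
1990 + floor(t)          % i.e. during the year 2005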

207
§4.3 Uncoupled Linear Systems of Two Equations

{sec:UncoupledLS} 4.3 Uncoupled Linear Systems of Two Equations


A system of two linear ordinary differential equations has the form
{e:autlin}
$$\begin{aligned} \frac{dx}{dt}(t) &= ax(t) + by(t) \\ \frac{dy}{dt}(t) &= cx(t) + dy(t), \end{aligned} \tag{4.3.1}$$
where a, b, c, d are real constants. Solutions of (4.3.1) are pairs of functions (x(t), y(t)).
A solution to the planar system (4.3.1) that is constant in time t is called an equilibrium.
Observe that the origin (x(t), y(t)) = (0, 0) is always an equilibrium solution to the linear
system (4.3.1).
We begin our discussion of linear systems of differential equations by considering uncoupled
systems of the form
{lin2}
$$\begin{aligned} \frac{dx}{dt}(t) &= ax(t) \\ \frac{dy}{dt}(t) &= dy(t). \end{aligned} \tag{4.3.2}$$
Since the system is uncoupled (that is, the equation for ẋ does not depend on y and the
equation for ẏ does not depend on x), we can solve this system by solving each equation
independently, as we did for (4.1.4):
{e:explicitsoln}
$$x(t) = x_0e^{at}, \qquad y(t) = y_0e^{dt}. \tag{4.3.3}$$
There are now two initial conditions that are identified by
x(0) = x0 and y(0) = y0 .

Having found all the solutions to (4.3.2) in (4.3.3), we now explore the geometry of the
phase plane for these uncoupled systems both analytically and by using MATLAB.
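Since (4.3.3) gives the solutions in closed form, one can also sketch trajectories directly with basic MATLAB plotting commands, without any special software; the following script is our own illustration, with the parameter values a = 2 and d = −3 chosen to match the example discussed later in this section:

a = 2;  d = -3;
t = linspace(-2, 2, 400);
hold on
for x0 = [-1 1]
    for y0 = [-1 1]
        plot(x0*exp(a*t), y0*exp(d*t))   % trajectory through (x0, y0), from (4.3.3)
    end
end
axis([-5 5 -5 5]), xlabel('x'), ylabel('y')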

Asymptotic Stability of the Origin As we did for the single equation (4.1.4), we ask what
happens to solutions to (4.3.2) starting at (x0 , y0 ) as time t increases. That is, we compute
lim (x(t), y(t)) = lim (x0 eat , y0 edt ).
t→∞ t→∞

This limit is (0, 0) when both a < 0 and d < 0; but if either a or d is positive, then most
solutions diverge to infinity, since either
lim |x(t)| = ∞ or lim |y(t)| = ∞.
t→∞ t→∞

208
§4.3 Uncoupled Linear Systems of Two Equations

Roughly speaking, an equilibrium (x0 , y0 ) is asymptotically stable if every trajectory


(x(t), y(t)) beginning from an initial condition near (x0 , y0 ) stays near (x0 , y0 ) for all positive
t, and
lim (x(t), y(t)) = (x0 , y0 ).
t→∞
The equilibrium is unstable if there are trajectories with initial conditions arbitrarily close
to the equilibrium that move far away from that equilibrium.
At this stage, it is not clear how to determine whether the origin is asymptotically stable
for a general linear system (4.3.1). However, for uncoupled linear systems we have shown
that the origin is an asymptotically stable equilibrium when both a < 0 and d < 0. If either
a > 0 or d > 0, then (0, 0) is unstable.

Invariance of the Axes There is another observation that we can make for uncoupled systems.
Suppose that the initial condition for an uncoupled system lies on the x-axis; that is, suppose
y0 = 0. Then the solution (x(t), y(t)) = (x0 eat , 0) also lies on the x-axis for all time.
Similarly, if the initial condition lies on the y-axis, then the solution (0, y0 edt ) lies on the
y-axis for all time.
This invariance of the coordinate axes for uncoupled systems follows directly from (4.3.3).
It turns out that many linear systems of differential equations have invariant lines; this is a
topic to which we return later in this chapter.

Generating Phase Space Pictures with pplane9 How can we visualize a solution (x(t), y(t))
in (4.3.3) to the system of differential equations (4.3.2)? The time series approach suggests
that we should graph (x(t), y(t)) as a function of t; that is, we should plot the curve
(t, x(t), y(t))
in three dimensions. Using MATLAB it is possible to plot such a graph — but such a graph
by itself is difficult to interpret. Alternatively, we could graph either of the functions x(t) or
y(t) by themselves as we do for solutions to single equations — but then some information
is lost.
The method we prefer is the phase space plot obtained by thinking of (x(t), y(t)) as the
position of a particle in the xy-plane at time t. We then graph the point (x(t), y(t)) in
the plane as t varies. When looking at phase space plots, it is natural to call solutions
trajectories, since we can imagine that we are watching a particle moving in the plane as
time changes.
We begin by considering uncoupled linear equations. As we saw, when the initial conditions
are on a coordinate axis (either (x0 , 0) or (0, y0 )), the solutions remain on that coordinate

209
§4.3 Uncoupled Linear Systems of Two Equations

axis for all time t. For these initial conditions, the equations behave as if they were one
dimensional. However, if we consider an initial condition (x0 , y0 ) that is not on a coordinate
axis, then even for an uncoupled system it is a little difficult to see what the trajectory looks
like. At this point it is useful to use the computer.
The method used to integrate planar systems of differential equations is similar to that used
to integrate single equations. The solution curve (x(t), y(t)) to (4.3.2) at a point (x0 , y0 ) is
tangent to the direction (f, g) = (ax0 + by0 , cx0 + dy0 ). So the differential equation solver
plots the direction field (f, g) and then finds curves that are tangent to these vectors at each
point in time.
The program pplane9, written by John Polking, draws two-dimensional phase planes. In
MATLAB type

pplane9

and the window with the PPLANE9 Setup appears. pplane9 has a number of preprogrammed
differential equations listed in a menu accessed by clicking on Gallery. To explore linear
systems, choose linear system in the Gallery. (Note that the parameters in the linear system
are given by capitals rather than lower case a,b,c,d.)
To integrate the uncoupled linear system, set the parameters b and c equal to zero. We now
have the system (4.3.2) with a = 2 and d = −3. After pushing Proceed, a display window
appears. In this window the plane is filled by vectors (f, g) indicating directions.
We may start the computations by clicking with a mouse button on an initial value (x0 , y0 ).
For example, if we click approximately onto (x(0), y(0)) = (x0 , y0 ) = (1, 1), then the trajec-
tory in the upper right quadrant of Figure 13 displays.
First pplane9 draws the trajectory in forward time for t ≥ 0 and then it draws the trajectory
in backwards time for t ≤ 0. More precisely, when we click on a point (x0 , y0 ) in the (x, y)-
plane, pplane9 computes that part of the solution that lies inside the specified display window
and that goes through this point. For linear systems there is precisely one solution that
goes through a specified point in the (x, y)-plane.

Saddles, Sinks, and Sources for the Uncoupled System (4.3.2) In a qualitative fash-
ion, the trajectories of uncoupled linear systems are determined by the invariance of the
coordinate axes and by the signs of the constants a and d.

Saddles: ad < 0 In Figure 13, where a = 2 > 0 and d = −3 < 0, the origin is a saddle. If we
choose several initial values (x0 , y0 ) one after another, then we find that as time increases

210
§4.3 Uncoupled Linear Systems of Two Equations


Figure 13: PPLANE9 Display for (4.3.2) with a = 2, d = −3 and x, y ∈ [−5, 5]. Solutions
{pp_dsp1} going through (±1, ±1) are shown.

all solutions approach the x-axis. That is, if (x(t), y(t)) is a solution to this system of
differential equations, then lim y(t) = 0. This observation is particularly noticeable when
t→∞
we choose initial conditions close to the origin (0, 0). On the other hand, solutions also
approach the y-axis as t → −∞. These qualitative features of the phase plane are valid
whenever a > 0 and d < 0.
When a < 0 and d > 0, then the origin is also a saddle — but the roles of the x and y axes
are reversed.

Sinks: a < 0 and d < 0 Now change the parameter a to −1. After clicking on Proceed
and specifying several initial conditions, we see that all solutions approach the origin as
time tends to infinity. Hence — as mentioned previously, and in contrast to saddles — the
equilibrium (0, 0) is asymptotically stable. Observe that solutions approach the origin on
trajectories that are tangent to the x-axis. Since d < a < 0, the trajectory decreases to zero
faster in the y direction than it does in the x-direction. If you change parameters so that
a < d < 0, then trajectories will approach the origin tangent to the y-axis.

211
§4.3 Uncoupled Linear Systems of Two Equations

Sources: a > 0 and d > 0 Choose the constants a and d so that both are positive. In
forward time, all trajectories, except the equilibrium at the origin, move towards infinity
and the origin is called a source.

Time Series Using pplane9 We may also use pplane9 to graph the time series of the single
components x(t) and y(t) of a solution (x(t), y(t)). For this we choose x vs. t from the
Graph menu. After using the mouse to select a solution curve, another window with the
title PPLANE9 t-plot appears. There the time series of x(t) is shown. For example, when
the differential equation is a sink, we observe that this component approaches 0 as time t
tends to infinity. We may also display the time series of both components x(t) and y(t)
simultaneously by clicking on Both in the PPLANE9 t-plot window. Again we see that both
x(t) and y(t) tend to 0 for increasing t.
We may also visualize the time series of x(t) and y(t) in the three-dimensional (x, y, t)-space.
To see this, click onto 3 D and a curve (x(t), y(t), t) becomes visible. Since x(t) and y(t)
approach 0 for t → ∞ we see that this curve approaches the t-axis for increasing time t.
Finally, we may look at all the different visualizations — the phase space plot, the time series
for x(t) and y(t) and the three-dimensional representation of the solution — by clicking the
Composite button. See Figure 14.

Exercises

In Exercises 1 – 2 find all equilibria of the given system of nonlinear autonomous differential
equations.
{c3.5.1A}
1.

ẋ = x−y
ẏ = x2 − y.

Answer: There are two equilibria at (0, 0) and (1, 1).


Solution: Find the equilibria by solving the equations

ẋ = x−y
ẏ = x2 − y.

Substituting x = y from the 1st equation into the 2nd equation yields y 2 − y = y(y − 1) = 0.
Therefore, y = 0 or y = 1.

212
§4.3 Uncoupled Linear Systems of Two Equations

{c3.5.1B}
2.

ẋ = x2 − xy
ẏ = x2 + y 2 − 4.
√ √ √ √
Answer: There are four equilibria at (0, 2), (0, −2), (√2, √2), and (−√2, −√2).
Solution: Find the equilibria by solving the equations

ẋ = x2 − xy
ẏ = x2 + y 2 − 4.

The 1st equation implies that either x = 0 or x = y. Substituting x = 0 into the 2nd equation
yields y^2 = 4. Therefore, y = 2 or y = −2. Substituting x = y into the 2nd equation yields 2y^2 = 4.
Therefore, y = √2 or y = −√2.

In Exercises 3 – 5 consider the uncoupled system of differential equations (4.3.2). For each choice
{E:uncoupleda} of a and d, determine whether the origin is a saddle, source, or sink.
3. a = 1 and d = −1.
Answer: The origin is a saddle.
Solution: This uncoupled system is of the form
dx
(t) = Ax(t)
dt
dy
(t) = Dy(t)
dt
If AD < 0, then the origin is a saddle. If A < 0 and D < 0, then the origin is a sink. If A > 0 and
{E:uncoupledb} D > 0, then the origin is a source. In this case, AD = −1 < 0.
4. a = −0.01 and d = −2.4.
Answer: The origin is a sink.

{E:uncoupledc} Solution: For this uncoupled system, A = −0.01 < 0 and D = −2.4 < 0.
5. a = 0 and d = −2.3.
dx
For this system, A = 0, so the origin is neither a saddle, a source nor a sink. Note that = 0, so
dt
every point on the x-axis is an equilibrium.
{c3.4.2}
6. Let (x(t), y(t)) be the solution (4.3.3) of (4.3.2) with initial condition (x(0), y(0)) = (x0 , y0 ),
where x0 ≠ 0 ≠ y0 .

213
§4.3 Uncoupled Linear Systems of Two Equations

(a) Show that the points (x(t), y(t)) lie on the curve whose equation is:

$$y_0^ax^d - x_0^dy^a = 0.$$

(b) Verify that if a = 1 and d = 2, then the solution lies on a parabola tangent to the x-axis.

The solutions x(t) and y(t) are:


$$x(t) = x_0e^{At}, \qquad y(t) = y_0e^{Dt}.$$
We show that the point (x(t), y(t)) lies on the curve $y_0^Ax^D - x_0^Dy^A = 0$ as follows. Substitute the
formulas for x(t) and y(t) into the equation to obtain
$$y_0^A\left(x_0e^{At}\right)^D - x_0^D\left(y_0e^{Dt}\right)^A = x_0^Dy_0^Ae^{ADt} - x_0^Dy_0^Ae^{ADt} = 0.$$
If A = 1 and D = 2, then the solutions lie on the curve $0 = y_0x^2 - x_0^2y$, which can be rewritten as
$y = \frac{y_0}{x_0^2}x^2$. Since x0 and y0 are constants, this curve is a parabola tangent to the x-axis.

{c3.4.3}
7. Usethe phase plane picture given in Figure 13 to draw the time series x(t) when (x(0), y(0)) =

1 1
, . Check your answer using pplane9.
2 2
1
We are given x(0) = . Figure 13 shows that x increases as t increases, and that x limits on 0 as t
2
decreases. From this information, we can sketch a graph of x(t) for the system. The graph should
be similar to Figure 7, which can be viewed in MATLAB by entering the system in pplane9, then
choosing “x vs. t” from the graph menu.

{c3.4.4} 8. (matlab) For the three choices of a and d in the uncoupled system of linear differential equations
in Exercises 3 – 5, use pplane9 to compute phase portraits. Use Keyboard input to look at solutions
with initial conditions on the x and y axes. As time t increases, do solutions with these initial
conditions tend towards or away from the origin?
(a) Solutions with initial points on the y-axis approach the origin as t increases. Solutions with
initial points on the x-axis go away from the origin as t increases. Initial points on the x and y-axes
are shown in Figure 8a.
(b) Solutions with initial points on either axis approach the origin as t increases. However, points
on the x-axis approach slowly because the value of A is small, so the MATLAB graph, shown in
Figure 8b, is incomplete.
(c) Solutions on the y-axis approach the origin as t increases. Solutions on the x-axis do not move,
dx
since = 0. Initial conditions for this system are shown in Figure 8c.
dt

214
§4.3 Uncoupled Linear Systems of Two Equations

{c3.4.5} 9. (matlab) Suppose that a and d are both negative, so that the origin is asymptotically stable.
Make several choices of a < d < 0 and observe that solution trajectories tend to approach the origin
tangent to one of the axes. Determine which one. Try to prove that your experimental guess is
always correct?
Answer: Trajectories approach the origin tangent to the y-axis if A < D < 0.
Solution: We can prove this fact by showing that, as t → ∞, the tangent direction of the trajectory
limits on the y-axis. The tangent vector of the trajectory (x(t), y(t)) at any point is:
 
$$\left(\frac{dx}{dt}(t), \frac{dy}{dt}(t)\right) = (Ax(t), Dy(t)) = \left(Ax_0e^{At}, Dy_0e^{Dt}\right) = e^{Dt}\left(Ax_0e^{(A-D)t}, Dy_0\right)$$

The value of eDt is relevant only to the length of the tangent and does not affect the direction.
As t approaches infinity, e(A−D)t approaches 0, since A − D < 0. Therefore, the limiting tangent
direction is
$$\lim_{t\to\infty}\left(Ax_0e^{(A-D)t}, Dy_0\right) = (0, Dy_0),$$

which is on the y-axis. So our conjecture is indeed true.

{c3.4.6} 10. (matlab) Suppose that a = d < 0. Verify experimentally using pplane9 that all trajectories
approach the origin along straight lines. Try to prove this conjecture?
If A = D < 0, the equations for the system will be
dx
(t) = Ax(t)
dt .
dy
(t) = Ay(t)
dt
Therefore,
x(t) = x0 eAt and y(t) = y0 eAt .
Solve the first equation for eAt and substitute into the second, obtaining
$$y(t) = \frac{y_0}{x_0}x(t).$$
Since y0/x0 is a constant, all trajectories are straight lines. Since A < 0, all trajectories go toward the
origin as t increases.

215
§4.3 Uncoupled Linear Systems of Two Equations


Figure 14: PPLANE9 Display for (4.3.2) with a = 2, d = −3 and x ∈ [0, 25], y ∈ [0, 20]. The
solution going through (1, 1) is shown. UL: (t, x(t)); UR: (t, y(t)); LL: (x(t), y(t), t); LR: all
{plotall} plots.

216
§4.3 Uncoupled Linear Systems of Two Equations

Figure 7


a = 1; d = −1 a = −0.01; d = −2.4 a = 0; d = −2.3


Figure 8a Figure 8b Figure 8c

217
§4.4 Coupled Linear Systems

{s:3.5} 4.4 Coupled Linear Systems


The general linear constant coefficient system in two unknown functions x1 , x2 is:
{lin3}
$$\begin{aligned} \frac{dx_1}{dt}(t) &= ax_1(t) + bx_2(t) \\ \frac{dx_2}{dt}(t) &= cx_1(t) + dx_2(t). \end{aligned} \tag{4.4.1}$$
The uncoupled systems studied in Section 4.3 are obtained by setting b = c = 0 in (4.4.1).
We have discussed how to solve (4.4.1) by formula (4.3.3) when the system is uncoupled.
We have also discussed how to visualize the phase plane for different choices of the diagonal
entries a and d. At present, we cannot solve (4.4.1) by formula when the coefficient matrix
is not diagonal. But we may use pplane9 to solve the initial value problems numerically for
these coupled systems. We illustrate this point by solving
$$\begin{aligned} \frac{dx_1}{dt}(t) &= -x_1(t) + 3x_2(t) \\ \frac{dx_2}{dt}(t) &= 3x_1(t) - x_2(t). \end{aligned}$$
After starting pplane9, select linear system from the Gallery and set the constants to:
a = −1, b = 3, c = 3, d = −1.
Click on Proceed. In order to have equally spaced coordinates on the x and y axes, do the
following. In the PPLANE9 Display window click on the edit button and then on the zoom
in square command. Then, using the mouse, click on the origin.

Eigendirections After computing several solutions, we find that for increasing time t all
the solutions seem to approach the diagonal line given by the equation x1 = x2 . Similarly,
in backward time t the solutions approach the anti-diagonal x1 = −x2 . In other words, as
for the case of uncoupled systems, we find two distinguished directions in the (x, y)-plane.
See Figure 15. Moreover, the computations indicate that these lines are invariant in the
sense that solutions starting on these lines remain on them for all time. This statement can
be verified numerically by using the Keyboard input in the PPLANE9 Options to choose initial
{D:eigendirection} conditions (x0 , y0 ) = (1, 1) and (x0 , y0 ) = (1, −1).
Definition 4.4.1. An invariant line for a linear system of differential equations is called an
eigendirection.

Observe that eigendirections vary if we change parameters. For example, if we set b to 1, then there are still two distinguished lines, but these lines are no longer perpendicular.
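These observations can also be checked numerically without pplane9 by computing eigenvectors of the coefficient matrix directly. The lines below are a small sketch (not part of the pplane9 instructions above): for b = 3 the two eigendirections are the diagonal and the anti-diagonal, while for b = 1 there are still two distinct invariant lines, but they are no longer perpendicular.

% Eigendirections of dX/dt = CX for a = d = -1, c = 3, and two values of b.
for b = [3 1]
    C = [-1 b; 3 -1];
    [V, D] = eig(C);      % columns of V span the invariant lines
    fprintf('b = %g: eigenvalues %g and %g\n', b, D(1,1), D(2,2));
    disp(V)
end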


Figure 15: PPLANE9 Display for (4.4.1) with a = −1 = d; b = 3 = c; and x, y ∈ [−5, 5].
{F:invariantlines} Solutions going through (±0.5, 0) and (0, ±0.5) are shown.

For uncoupled systems, we have shown analytically that the x and y axes are eigendirections.
The numerical computations that we have just performed indicate that eigendirections exist
for many coupled systems. This discussion leads naturally to two questions:

(a) Do eigendirections always exist?


(b) How can we find eigendirections?

The second question will be answered in Sections 4.5 and 4.6. We can answer the first
question by performing another numerical computation. In the setup window, change the
parameter b to −2. Then numerically compute some solutions to see that there are no
eigendirections in the phase space of this system. Observe that all solutions appear to spiral
into the origin as time goes to infinity. The phase portrait is shown in Figure 16.

Nonexistence of Eigendirections We now show analytically that certain linear systems of differential equations have no invariant lines in their phase portrait. Consider the system

ẋ = y
{E:2nd->1st} (4.4.2)
ẏ = −x.


{pp_dsp2} Figure 16: PPLANE9 Display for the linear system with a = −1, b = −2, c = 3, d = −1.

Observe that (x(t), y(t)) = (sin t, cos t) is a solution to (4.4.2) by calculating


ẋ(t) = d/dt (sin t) = cos t = y(t)
ẏ(t) = d/dt (cos t) = −sin t = −x(t).
We have shown analytically that the unit circle centered at the origin is a solution trajectory
for (4.4.2). Hence (4.4.2) has no eigendirections. It may be checked using MATLAB that all
solution trajectories for (4.4.2) are just circles centered at the origin.
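One way to carry out this MATLAB check is with a numerical integrator such as ode45. The sketch below (the radii and the time span are arbitrary choices) integrates (4.4.2) from several starting points on the x-axis and plots the resulting closed orbits, which are circles centered at the origin.

% Trajectories of x' = y, y' = -x computed with ode45.
f = @(t, X) [X(2); -X(1)];                % right-hand side of (4.4.2)
figure; hold on
for r = [0.5 1 1.5 2]
    [~, X] = ode45(f, [0 2*pi], [r; 0]);  % one full revolution starting at (r, 0)
    plot(X(:,1), X(:,2))
end
axis equal; xlabel('x'); ylabel('y'); hold off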

Exercises

{c3.5.a01} 1. (matlab) Choose the linear system in pplane9 and set a = 0, b = 1, and c = −1. Then find
values d such that except for the origin itself all solutions appear to

(a) spiral into the origin;


(b) spiral away from the origin;
(c) form circles around the origin;


Solution:

(a) All trajectories converge on the origin when D < 0, as shown in Figure 1a, which graphs the
system with D = −1;
(b) All trajectories move away from the origin when D > 0, as shown in Figure 1b, which graphs the system with D = 1;
(c) Trajectories form circles around the origin when D = 0, as shown in Figure 1c.

D = −1        D = 1        D = 0
Figure 1a Figure 1b Figure 1c

{c3.5.2} 2. (matlab) Choose the linear system in pplane9 and set a = −1, c = 3, and d = −1. Then find a
value for b such that the behavior of the solutions of the system is “qualitatively” the same as for a
diagonal system where a and d are negative. In particular, the origin should be an asymptotically
stable equilibrium and the solutions should approach that equilibrium along a distinguished line.
Answer: The solutions are similar to those for a diagonal system when B = 0.
Solution: At this value of B, the origin is a stable equilibrium, and the solutions approach the
origin tangent to the line x = 0. When B < 0, the graph is a spiral, so the solutions do not approach
the origin along a distinguished line. When B > 0, the origin is a saddle rather than a sink; that
is, the origin is not an asymptotically stable equilibrium. Figure 2 shows five sample trajectories
for the system, two of which begin on the y-axis.

{c3.5.3} 3. (matlab) Choose the linear system in pplane9 and set a = d and b = c. Verify that for these
systems of differential equations:

(a) When |a| < b typical trajectories approach the line y = x as t → ∞ and the line y = −x as
t → −∞.
(b) Assume that b is positive, a is negative, and b < −a. With these assumptions show that the
origin is a sink and that typical trajectories approach the origin tangent to the line y = x.


Figure 2

Graphs made in pplane9 using the axis('equal') command verify these statements regarding linear
systems where A = D and B = C. Figure 3a uses A = D = −1 and B = C = 2, and shows four
sample trajectories which approach the line y = x as t → ∞. Figure 3b graphs the linear system
with A = D = −3 and B = C = 2. It shows four sample trajectories, three of which approach the
origin tangent to y = x. The fourth trajectory has an initial point 0 < x0 = −y0 and approaches
the origin on the straight line y = −x, which is orthogonal to y = x.

{c3.5.4} 4. (matlab) Sketch the time series y(t) for the solution to the differential equation whose phase plane is pictured in Figure 16 with initial condition (x(0), y(0)) = (1/2, 1/2). Check your answer using pplane9.
The pplane9 graph for the linear system (Figure 16) is a spiral. So as t → ∞, y approaches 0
oscillating between positive and negative values. The graph of y vs. t for the trajectory with initial condition (x_0, y_0) = (1/2, 1/2) is shown in Figure 4.

In Exercises 5 – 8, determine which of the function pairs (x1 (t), y1 (t)) and (x2 (t), y2 (t)) are solutions
{c3.5.5a} to the given system of ordinary differential equations.
5. The ODE is:

ẋ = 2x + y
ẏ = 3y.


Figure 3a Figure 3b

The pairs of functions are:

(x1 (t), y1 (t)) = (e2t , 0) and (x2 (t), y2 (t)) = (e3t , e3t ).

Answer: Both function pairs are solutions to the given system.


Solution: To determine whether (x1 (t), y1 (t)) = (e2t , 0) is a solution to the system, compute the
left hand sides of the equations:
dx_1/dt(t) = d/dt (e^{2t}) = 2e^{2t} and dy_1/dt(t) = d/dt (0) = 0.
Then compute the right hand sides of the equations:

2x1 (t) + y1 (t) = 2e2t + 0 = 2e2t and 3y1 (t) = 3(0) = 0.

Since the left hand side of each equation equals the right hand side, the equations are consistent,
and the pair of functions is a solution.
Similarly, to determine whether (x2 (t), y2 (t)) = (e3t , e3t ) is a solution to the system, compute
the left hand sides of the equations:
dx_2/dt(t) = d/dt (e^{3t}) = 3e^{3t} and dy_2/dt(t) = d/dt (e^{3t}) = 3e^{3t}.
Then compute the right hand sides of the equations:

2x_2(t) + y_2(t) = 2e^{3t} + e^{3t} = 3e^{3t} and 3y_2(t) = 3e^{3t}.

Since the left hand side of each equation equals the right hand side, the equations are consistent,
and the pair of functions is a solution.


Figure 4

{c3.5.5b}
6. The ODE is:

ẋ = 2x − 3y
ẏ = x − 2y.

The pairs of functions are:

(x1 (t), y1 (t)) = et (3, 1) and (x2 (t), y2 (t)) = (e−t , e−t ).

Answer: Both function pairs are solutions to the given system.


Solution: To determine whether (x1 (t), y1 (t)) = (3et , et ) is a solution to the system, compute the
left hand sides of the equations:
dx_1/dt(t) = d/dt (3e^t) = 3e^t and dy_1/dt(t) = d/dt (e^t) = e^t.
Then compute the right hand sides of the equations:

2x1 (t) − 3y1 (t) = 2(3et ) − 3et = 3et and x1 (t) − 2y1 (t) = 3et − 2et = et .

Since the left hand side of each equation equals the right hand side, the equations are consistent,
and the pair of functions is a solution.
Similarly, to determine whether (x2 (t), y2 (t)) = (e−t , e−t ) is a solution to the system, com-
pute the left hand sides of the equations:
dx_2/dt(t) = d/dt (e^{−t}) = −e^{−t} and dy_2/dt(t) = d/dt (e^{−t}) = −e^{−t}.


Then compute the right hand sides of the equations:

2x2 (t) − 3y2 (t) = 2e−t − 3e−t = −e−t and x2 (t) − 2y2 (t) = e−t − 2e−t = −e−t .

Since the left hand side of each equation equals the right hand side, the equations are consistent,
{c3.5.5c} and the pair of functions is a solution.
7. The ODE is:

ẋ = x+y
ẏ = −x + y.

The pairs of functions are:

(x1 (t), y1 (t)) = (3et , −2et ) and (x2 (t), y2 (t)) = et (sin t, cos t).

Answer: The function pair (et sin t, et cos t) is a solution to the system, while the function pair
(3et , −2et ) is not.
Solution: To determine whether (x1 (t), y1 (t)) = (3et , −2et ) is a solution to the system, compute
the left hand sides of the equations:
dx_1/dt(t) = d/dt (3e^t) = 3e^t and dy_1/dt(t) = d/dt (−2e^t) = −2e^t.
Then compute the right hand sides of the equations:

x1 + y1 = 3et − 2et = et and − x1 + y1 = −3et − 2et = −5et .


Since dx_1/dt ≠ x_1 + y_1 and dy_1/dt ≠ −x_1 + y_1, the pair of functions is not a solution.
Similarly, to determine whether (x2 (t), y2 (t)) = (et sin t, et cos t) is a solution to the system,
compute the left hand sides of the equations:
dx_2/dt(t) = d/dt (e^t sin t) = e^t sin t + e^t cos t and dy_2/dt(t) = d/dt (e^t cos t) = e^t cos t − e^t sin t.
Then compute the right hand sides of the equations:

x2 (t) + y2 (t) = et sin t + et cos t and − x2 (t) + y2 (t) = −et sin t + et cos t.

Since the left hand side of each equation equals the right hand side, the equations are consistent,
{c3.5.5d} and the pair of functions is a solution.
8. The ODE is:

ẋ = y
ẏ = −(1/t²) x + (1/t) y + 1.


The pairs of functions are:

(x1 (t), y1 (t)) = (t2 , 2t) and (x2 (t), y2 (t)) = (2t2 , 4t).

Answer: The function pair (t2 , 2t) is a solution to the system, while the function pair (2t2 , 4t) is
not.
Solution: To determine whether (x1 (t), y1 (t)) = (t2 , 2t) is a solution to the system, compute the
left hand sides of the equations:
dx_1/dt(t) = d/dt (t²) = 2t and dy_1/dt(t) = d/dt (2t) = 2.
Then compute the right hand sides of the equations:
y_1(t) = 2t and −(1/t²) x_1(t) + (1/t) y_1(t) + 1 = −(1/t²)(t²) + (1/t)(2t) + 1 = −1 + 2 + 1 = 2.
Since the left hand side of each equation equals the right hand side, the equations are consistent,
and the pair of functions is a solution.
Similarly, to determine whether (x2 (t), y2 (t)) = (2t2 , 4t) is a solution to the system, compute
the left hand sides of the equations:
dx_2/dt(t) = d/dt (2t²) = 4t and dy_2/dt(t) = d/dt (4t) = 4.
Then compute the right hand sides of the equations:
y_2(t) = 4t and −(1/t²) x_2(t) + (1/t) y_2(t) + 1 = −(1/t²)(2t²) + (1/t)(4t) + 1 = −2 + 4 + 1 = 3.
Since dy_2/dt ≠ −(1/t²) x_2 + (1/t) y_2 + 1, the function pair is not a solution.


{S:IVP&E} 4.5 The Initial Value Problem and Eigenvectors


The general constant coefficient system of n differential equations in n unknown functions
has the form
{lingen}
dx_1/dt(t) = c_{11} x_1(t) + · · · + c_{1n} x_n(t)
⋮
dx_n/dt(t) = c_{n1} x_1(t) + · · · + c_{nn} x_n(t)    (4.5.1)
where the coefficients cij ∈ R are constants. Suppose that (4.5.1) satisfies the initial condi-
tions
x1 (0) = K1 , . . . , xn (0) = Kn .

Using matrix multiplication of a vector and matrix, we can rewrite these differential equa-
tions in a compact form. Consider the n × n coefficient matrix
 
C = [ c_{11} c_{12} · · · c_{1n}
      c_{21} c_{22} · · · c_{2n}
        ⋮      ⋮             ⋮
      c_{n1} c_{n2} · · · c_{nn} ]

and the n vectors of initial conditions and unknowns


   
X_0 = (K_1, . . . , K_n)^t and X = (x_1, . . . , x_n)^t.

Then (4.5.1) has the compact form


{E:geneqn}
dX/dt = CX
X(0) = X_0.    (4.5.2)

In Section 4.4, we plotted the phase space picture of the planar system of differential equations
{-13} (ẋ, ẏ)^t = C (x(t), y(t))^t    (4.5.3)
where
C = [ −1  3
       3 −1 ].


In those calculations we observed that there is a solution to (4.5.3) that stayed on the main diagonal for each moment in time. Note that a vector is on the main diagonal if it is a scalar multiple of (1, 1)^t. Thus a solution that stays on the main diagonal for all time t must have the form
{e:diagform} (x(t), y(t))^t = u(t) (1, 1)^t    (4.5.4)
for some real-valued function u(t). When a function of form (4.5.4) is a solution to (4.5.3),
it satisfies:
     
u̇(t) (1, 1)^t = (ẋ(t), ẏ(t))^t = C (x(t), y(t))^t = C u(t) (1, 1)^t = u(t) C (1, 1)^t.
A calculation shows that
C (1, 1)^t = 2 (1, 1)^t.
Hence
u̇(t) (1, 1)^t = 2u(t) (1, 1)^t.
It follows that the function u(t) must satisfy the differential equation

du/dt = 2u,
whose solutions are
u(t) = α e^{2t},
for some scalar α.
Similarly, we also saw in our MATLAB experiments that there was a solution that for all
time stayed on the anti-diagonal, the line y = −x. Such a solution must have the form
   
(x(t), y(t))^t = v(t) (1, −1)^t.

A similar calculation shows that v(t) must satisfy the differential equation

dv/dt = −4v.


Solutions to this equation all have the form

v(t) = βe−4t ,

for some real constant β.


Thus, using matrix multiplication, we are able to prove analytically that there are solutions
to (4.5.3) of exactly the type suggested by our MATLAB experiments. However, even more
is true and this extension is based on the principle of superposition that was introduced for
algebraic equations in Section 3.4.

Superposition in Linear Differential Equations Consider a general linear differential equation of the form
{gen1} dX/dt = CX,    (4.5.5)
where C is an n×n matrix. Suppose that Y (t) and Z(t) are solutions to (4.5.5) and α, β ∈ R
are scalars. Then X(t) = αY (t) + βZ(t) is also a solution. We verify this fact using the
‘linearity’ of d/dt. Calculate
d/dt X(t) = α dY/dt(t) + β dZ/dt(t)
          = αCY(t) + βCZ(t)
          = C(αY(t) + βZ(t))
          = CX(t).

So superposition is valid for solutions of linear differential equations.

Initial Value Problems Suppose that we wish to find a solution to (4.5.3) satisfying the
initial conditions
(x(0), y(0))^t = (1, 3)^t.
Then we can use the principle of superposition to find this solution in closed form. Super-
position implies that for each pair of scalars α, β ∈ R, the functions
     
{e:solnODE} (x(t), y(t))^t = α e^{2t} (1, 1)^t + β e^{−4t} (1, −1)^t,    (4.5.6)

are solutions to (4.5.3). Moreover, for a solution of this form


   
(x(0), y(0))^t = (α + β, α − β)^t.


Thus we can solve our prescribed initial value problem, if we can solve the system of linear
equations

α+β =1
α − β = 3.

This system is solved for α = 2 and β = −1. Thus


     
(x(t), y(t))^t = 2e^{2t} (1, 1)^t − e^{−4t} (1, −1)^t

is the desired closed form solution.
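This closed form solution is easy to test numerically. The short script below is a sketch of such a check (the sample times and the difference step are arbitrary): it confirms that X(0) = (1, 3)^t and that the residual Ẋ − CX along the formula is essentially zero, using a centered difference for the derivative.

% Check the closed-form solution of (4.5.3) with initial condition (1, 3).
C = [-1 3; 3 -1];
X = @(t) 2*exp(2*t)*[1; 1] - exp(-4*t)*[1; -1];   % formula derived above
disp(X(0))                                        % should print 1 and 3
h = 1e-6;
for t = [0 0.5 1]
    Xdot = (X(t+h) - X(t-h)) / (2*h);             % numerical derivative
    fprintf('t = %g: residual %.2e\n', t, norm(Xdot - C*X(t)));
end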

Eigenvectors and Eigenvalues We emphasize that just knowing that there are two lines in
the plane that are invariant under the dynamics of the system of linear differential equations
is sufficient information to solve these equations. So it seems appropriate to ask the question:
When is there a line that is invariant under the dynamics of a system of linear differential
equations? This question is equivalent to asking: When is there a nonzero vector v and a
nonzero real-valued function u(t) such that

X(t) = u(t)v

is a solution to (4.5.5)?
Suppose that X(t) is a solution to the system of differential equations Ẋ = CX. Then u(t)
and v must satisfy
{E:diffdir} u̇(t)v = dX/dt = CX(t) = u(t)Cv.    (4.5.7)
Since u is nonzero, it follows that v and Cv must lie on the same line through the origin.
Hence
{e:eigendef} Cv = λv, (4.5.8)

{D:eigenvalue1} for some real number λ.

Definition 4.5.1. A nonzero vector v satisfying (4.5.8) is called an eigenvector of the matrix
C, and the number λ is an eigenvalue of the matrix C.

Geometrically, the matrix C maps an eigenvector onto a multiple of itself — that multiple
is the eigenvalue.
Note that scalar multiples of eigenvectors are also eigenvectors. More precisely:


{L:e’vector}
Lemma 4.5.2. Let v be an eigenvector of the matrix C with eigenvalue λ. Then αv is also
an eigenvector of C with eigenvalue λ as long as α 6= 0.

Proof By assumption, Cv = λv and v is nonzero. Now calculate

C(αv) = αCv = αλv = λ(αv).

The lemma follows from the definition of eigenvector. 

It follows from (4.5.7) and (4.5.8) that if v is an eigenvector of C with eigenvalue λ, then
du/dt = λu.
Thus we have returned to our original linear differential equation that has solutions

u(t) = Keλt ,

for all constants K.

{T:eigensoln} We have proved the following theorem.


Theorem 4.5.3. Let v be an eigenvector of the n × n matrix C with eigenvalue λ. Then

X(t) = eλt v

is a solution to the system of differential equations Ẋ = CX.

Finding eigenvalues and eigenvectors from first principles — even for 2 × 2 matrices — is
not a simple task. We end this section with a calculation illustrating that real eigenvalues
need not exist. In Section 4.6, we present a natural method for computing eigenvalues
(and eigenvectors) of 2 × 2 matrices. We defer the discussion of how to find eigenvalues and
eigenvectors of n × n matrices until Chapter 7.

An Example of a Matrix with No Real Eigenvalues Not every matrix has real eigenvalues and
eigenvectors. Recall the linear system of differential equations ẋ = Cx whose phase plane is
pictured in Figure 16. That phase plane showed no evidence of an invariant line and indeed
there is none. The matrix C in that example was
 
−1 −2
C= .
3 −1


We ask: Is there a value of λ and a nonzero vector (x, y) such that


   
{E:eigexamp} C (x, y)^t = λ (x, y)^t?    (4.5.9)

Equation (4.5.9) implies that


  
[ −1 − λ    −2
    3     −1 − λ ] (x, y)^t = 0.

If this matrix is row equivalent to the identity matrix, then the only solution of the linear
system is x = y = 0. To have a nonzero solution, the matrix
 
−1 − λ −2
3 −1 − λ

must not be row equivalent to I2 . Dividing the 1st row by −(1 + λ) leads to
[ 1    2/(1 + λ)
  3    −1 − λ ].

Subtracting 3 times the 1st row from the second produces the matrix
[ 1    2/(1 + λ)
  0    −(1 + λ) − 6/(1 + λ) ].
This matrix is not row equivalent to I2 when the lower right hand entry is zero; that is,
when
(1 + λ) + 6/(1 + λ) = 0.
That is, when
(1 + λ)² = −6,
which is not possible for any real number λ. This example shows that the question of
whether a given matrix has a real eigenvalue and a real eigenvector — and hence when the
associated system of differential equations has a line that is invariant under the dynamics
— is a subtle question.
Questions concerning eigenvectors and eigenvalues are central to much of the theory of linear
algebra. We discuss this topic for 2×2 matrices in Section 4.6 and Chapter 6 and for general
square matrices in Chapters 7 and 11.
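MATLAB's eig command reflects this: for the matrix above it returns a complex conjugate pair, so there is no real eigendirection. A minimal check:

% The matrix from Figure 16 has no real eigenvalues.
C = [-1 -2; 3 -1];
eig(C)     % returns the complex conjugate pair -1 + 2.4495i and -1 - 2.4495i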


Exercises

{c4.1.5}
1. Write the system of linear ordinary differential equations
dx_1/dt(t) = 4x_1(t) + 5x_2(t)
dx_2/dt(t) = 2x_1(t) − 3x_2(t)
in matrix form.
Answer:
(dx_1/dt(t), dx_2/dt(t))^t = [ 4  5
                               2 −3 ] (x_1(t), x_2(t))^t

{c4.4.4}
2. Show that all solutions to the system of linear differential equations
dx/dt = 3x
dy/dt = −2y
are linear combinations of the two solutions
   
U(t) = e^{3t} (1, 0)^t and V(t) = e^{−2t} (0, 1)^t.

The system is uncoupled, so we can solve each equation independently, using the initial value
problem to obtain:
x(t) = x0 e3t
y(t) = y0 e−2t .
All solutions are of the form
(x(t), y(t))^t = (x_0 e^{3t}, y_0 e^{−2t})^t = x_0 e^{3t} (1, 0)^t + y_0 e^{−2t} (0, 1)^t.

So all solutions are linear combinations of


   
U(t) = e^{3t} (1, 0)^t and V(t) = e^{−2t} (0, 1)^t.


{c4.5.1}
3. Consider
{e:Ceqn} dX/dt(t) = CX(t)    (4.5.10)
where
C = [ 2  3
      0 −1 ].
Let
v_1 = (1, 0)^t and v_2 = (1, −1)^t,
and let
Y (t) = e2t v1 and Z(t) = e−t v2 .

(a) Show that Y (t) and Z(t) are solutions to (4.5.10).


(b) Show that X(t) = 2Y (t) − 14Z(t) is a solution to (4.5.10).
(c) Use the principle of superposition to verify that X(t) = αY (t) + βZ(t) is a solution to (4.5.10).
(d) Using the general solution found in part (c), find a solution X(t) to (4.5.10) such that
 
X(0) = (3, −1)^t.

Solution:

(a) In order to determine that Y(t) is a solution to (4.5.10), substitute Y(t) into both sides of the equation dX/dt = CX:

dY/dt = d/dt (e^{2t} (1, 0)^t) = d/dt (e^{2t}, 0)^t = (2e^{2t}, 0)^t;
CY(t) = [ 2 3 ; 0 −1 ] (e^{2t}, 0)^t = (2e^{2t}, 0)^t.

Similarly, show that Z(t) is a solution:

dZ/dt = d/dt (e^{−t} (1, −1)^t) = d/dt (e^{−t}, −e^{−t})^t = (−e^{−t}, e^{−t})^t;
CZ(t) = [ 2 3 ; 0 −1 ] (e^{−t}, −e^{−t})^t = (−e^{−t}, e^{−t})^t.


(b) Again, verify that X(t) = 2Y (t) − 14Z(t) is a solution to (4.5.10) by substituting into both
sides of the equation and noting that the values are equal:

dX/dt = d/dt (2e^{2t} (1, 0)^t − 14e^{−t} (1, −1)^t) = d/dt (2e^{2t} − 14e^{−t}, 14e^{−t})^t = (4e^{2t} + 14e^{−t}, −14e^{−t})^t;
CX(t) = C(2Y(t) − 14Z(t)) = C((2e^{2t}, 0)^t − (14e^{−t}, −14e^{−t})^t) = (4e^{2t} + 14e^{−t}, −14e^{−t})^t.
(c) As demonstrated in Section 3.4, if Y (t) and Z(t) are both solutions to (4.5.10), then X(t) =
αY (t) + βZ(t) is also a solution to (4.5.10).
(d) Answer:
X(t) = 2e^{2t} (1, 0)^t + e^{−t} (1, −1)^t.
Solution: Note that
X(t) = αY(t) + βZ(t) = α e^{2t} (1, 0)^t + β e^{−t} (1, −1)^t
is a solution to (4.5.10). Substitute the value X(0) = (3, −1)^t into the equation to find a solution with that initial condition:
(3, −1)^t = X(0) = α (1, 0)^t + β (1, −1)^t.

We now have the linear system:

3 = α + β
−1 = −β

which we can solve to find α = 2 and β = 1.

{c4.5.2}
4. Find a solution to
Ẋ(t) = CX(t)
where
C = [ 1 −1
     −1  1 ]
and
X(0) = (2, 1)^t.
Hint: Observe that (1, 1)^t and (1, −1)^t
are eigenvectors of C.
Answer:    
Answer:
X(t) = (3/2) (1, 1)^t + (1/2) e^{2t} (1, −1)^t.

Solution: Note that if Cv = λv, then X(t) = eλt v is a solution to Ẋ(t) = CX(t). Let
   
v_1 = (1, 1)^t and v_2 = (1, −1)^t.
The eigenvalues corresponding to v1 and v2 are λ1 = 0 and λ2 = 2. This can be verified by
calculating Cv1 = 0 and Cv2 = 2v2 . So,
   
X(t) = (1, 1)^t and X(t) = e^{2t} (1, −1)^t

are both solutions to Ẋ(t) = CX. By the principle of superposition,


   
X(t) = α (1, 1)^t + β e^{2t} (1, −1)^t
is also a solution. Substitute the given initial condition into the equation to obtain
(2, 1)^t = X(0) = α (1, 1)^t + β (1, −1)^t.
Now, solve the linear system
2 = α + β
1 = α − β
to find that α = 3/2 and β = 1/2.
{mc.exercise8}
5. Solve the initial value problem to the planar system of differential equations
 
dX/dt = [ 2 −1
         −4  2 ] X = CX

where X(0) = (0, 4)t .


Answer: X(t) = (1 − e^{4t}, 2 + 2e^{4t})^t.
 
Solution: The characteristic polynomial of the matrix C = [ 2 −1 ; −4 2 ] is p(λ) = λ² − 4λ = λ(λ − 4) and hence, the eigenvalues of C are λ_1 = 0 and λ_2 = 4. Computing the eigenvectors, we must solve for X in
CX = [ 2 −1 ; −4 2 ] X = 0 and (C − 4I_2)X = [ −2 −1 ; −4 −2 ] X = 0.


Call the solutions v1 = (1, 2)t and v2 = (−1, 2)t , respectively. The general solution is then of the
form
X(t) = α1 v1 + α2 e4t v2
with α1 and α2 variable. To obtain the specific solution subject to the initial condition, we solve
   
X(0) = α_1 v_1 + α_2 v_2 = (α_1 − α_2, 2α_1 + 2α_2)^t = (0, 4)^t.
By inspection, α_1 = 1 and α_2 = 1. Hence,
X(t) = (1 − e^{4t}, 2 + 2e^{4t})^t.

{c4.5.3}
6. Let
C = [ a b
      b a ].
Show that (1, 1)^t and (1, −1)^t are eigenvectors of C. What are the corresponding eigenvalues?
Answer: Let v_1 = (1, 1)^t and v_2 = (1, −1)^t.
The vector v1 is an eigenvector of C with corresponding eigenvalue a + b, and v2 is an eigenvector
with eigenvalue a − b.
Solution: Calculate
      
Cv_1 = [ a b ; b a ] (1, 1)^t = (a + b, a + b)^t = (a + b) (1, 1)^t,
Cv_2 = [ a b ; b a ] (1, −1)^t = (a − b, b − a)^t = (a − b) (1, −1)^t.
{c4.5.4}
7. Let
C = [ 1  2
     −3 −1 ].
Show that C has no real eigenvectors.
A vector (x, y) is an eigenvector of C if
C (x, y)^t = λ (x, y)^t


that is, if
(C − λI_2) (x, y)^t = 0.
In this case,
[ 1 − λ    2
   −3   −1 − λ ] (x, y)^t = 0.
This equation will have a nonzero solution (x, y) only if
[ 1 − λ    2
   −3   −1 − λ ]

is not row equivalent to the identity matrix. Row reducing the matrix yields

[ 1    2/(1 − λ)
  0    −1 − λ + 6/(1 − λ) ]
so C has an eigenvector when
−1 − λ + 6/(1 − λ) = 0,
that is, when λ² = −5. Therefore, C has no real eigenvectors.

{c4.9.6A}
8. Suppose that A is an n × n matrix with zero as an eigenvalue. Show that A is not invertible.
Hint: Assume that A is invertible and compute A−1 Av where v is an eigenvector of A corresponding
to the zero eigenvalue.
Solution: Suppose that A is an n × n matrix with zero eigenvalue. We need to prove that A is not
invertible. Let v ∈ Rn be a nonzero eigenvector corresponding to the zero eigenvalue. Therefore,
Av = 0. Suppose that A−1 exists. Then

v = In v = A−1 Av = A−1 0 = 0,

which is a contradiction. Therefore, A is not invertible.

Remark: In fact, A is invertible if all of the eigenvalues of A are nonzero. See Corollary 7.2.5 of
Chapter 7.
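A concrete illustration in MATLAB (the matrix below is an arbitrary example with a zero eigenvalue, not one taken from the text): eig reports 0 as an eigenvalue, det returns 0, and inv issues a singular-matrix warning.

% A matrix with a zero eigenvalue is not invertible.
A = [1 2; 2 4];      % second row is twice the first, so 0 is an eigenvalue
eig(A)               % eigenvalues 0 and 5
det(A)               % determinant is 0
% inv(A)             % uncommenting this line produces a warning that A is singular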

{c4.5.6} 9. (matlab) Consider the matrix A and vector X0 given by


   
A = [ 2 1 ; 0 1 ] and X_0 = (1, 1)^t.


Use map to compute X1 = AX0 , X2 = AX1 , X3 = AX2 etc. by a repeated use of the Map button in
the MAP Display window. What do you observe? What happens if you start the iteration process with a different choice for X_0, and, in particular, for an X_0 that is close to (−1, 1)^t?
For each iteration, the y-coordinate of the solution is 1. After a number of iterations, the x-
coordinate approaches infinity, so the direction of the solution vector approaches (1, 0), which, as
shown in Exercise 10, is an eigenvector of A. For any X_0 ≠ (−1, 1), the solution vector approaches the direction (1, 0). If X_0 = (−1, 1) or some multiple thereof, then the solution vector remains on the line y = −x, since (−1, 1) is also an eigenvector of A.

In Exercises 10 – 11 use map to find an (approximate) eigenvector for the given matrix. Hint:
Choose a vector in map and repeatedly click on the button Map until the vector maps to a multiple
of itself. You may wish to use the Rescale feature in the MAP Options. Then the length of the
vector is rescaled to one after each use of the command Map. In this way, you can avoid overflows
in the computations while still being able to see the directions where the vectors are moved by the
matrix mapping. The coordinates of the new vector obtained by applying map can be viewed in
the Vector input window.
 
2 −2
10. (matlab) B = .
{c4.5.5a} 2 7
The vector (−1, 2)t is an eigenvector of B with corresponding eigenvalue 6, and (−2, 1)t is an
eigenvector with corresponding eigenvalue 3.

 
1 1.5
11. (matlab) C = .
{c4.5.5b} 0 −2
The vector (−1, 2)t is an eigenvector of C with corresponding eigenvalue −2, and the vector (1, 0)t
is an eigenvector with eigenvalue 1.

{c4.4.5} 12. (matlab) Use MATLAB to verify that solutions to the system of linear differential equations
dx/dt = 2x + y
dy/dt = y
are linear combinations of the two solutions
   
U(t) = e^{2t} (1, 0)^t and V(t) = e^t (−1, 1)^t.

More concretely, proceed as follows:


(a) By superposition, the general solution to the differential equation has the form X(t) = αU(t) + βV(t). Find constants α and β such that αU(0) + βV(0) = (0, 1)^t.
(b) Graph the second component y(t) of this solution using the MATLAB plot command.
(c) Use pplane9 to compute a solution via the Keyboard input starting at (x(0), y(0)) = (0, 1) and
then use the y vs t command in pplane9 to graph this solution.
(d) Compare the results of the two plots.
 
(e) Repeat steps (a)–(d) using the initial vector (1, 1)^t.

(a) Answer: If α = 1 and β = 1, then


 
αU(0) + βV(0) = (0, 1)^t.

Solution: Solve the linear system


α − β = 0
β = 1.

(b) Figure 12a shows y as a function of t. The figure was created by the MATLAB commands:

t = linspace(-8,2);
y = exp(t);
plot(t,y)

(c) Figure 12b shows the pplane9 graph of the system, and Figure 12c shows the y vs. t graph.
(d) The two plots are identical, since the pplane9 command y vs. t graphs the y component of
the solution, which is precisely what we did by hand in (b).
(e) In this case, α = 2 and β = 1. Since the y component of U (t) is zero, the graphs of y(t) are
identical to those in (b) and (c).


Figure 12a Figure 12b Figure 12c


{S:evchp} 4.6 Eigenvalues of 2 × 2 Matrices


We now discuss how to find eigenvalues of 2 × 2 matrices in a way that does not depend
explicitly on finding eigenvectors. This direct method will show that eigenvalues can be
complex as well as real.
We begin the discussion with a general square matrix. Let A be an n × n matrix. Recall
that λ ∈ R is an eigenvalue of A if there is a nonzero vector v ∈ Rn for which

{eigeneqn} Av = λv. (4.6.1)

The vector v is called an eigenvector. We may rewrite (4.6.1) as:

(A − λIn )v = 0.

Since v is nonzero, it follows that if λ is an eigenvalue of A, then the matrix A − λIn is


singular.
Conversely, suppose that A − λIn is singular for some real number λ. Then Theorem 3.7.8
of Chapter 3 implies that there is a nonzero vector v ∈ Rn such that (A − λIn )v = 0. Hence
(4.6.1) holds and λ is an eigenvalue of A. So, if we had a direct method for determining
when a matrix is singular, then we would have a method for determining eigenvalues.

Characteristic Polynomials Corollary 3.8.3 of Chapter 3 states that 2×2 matrices are singular
precisely when their determinant is zero. It follows that λ ∈ R is an eigenvalue for the 2 × 2
matrix A precisely when
{deteqn} det(A − λI2 ) = 0. (4.6.2)
We can compute (4.6.2) explicitly as follows. Note that
 
a−λ b
A − λI2 = .
c d−λ

Therefore

det(A − λI2 ) = (a − λ)(d − λ) − bc


{e:charpoly} = λ2 − (a + d)λ + (ad − bc). (4.6.3)
{charpolyn=2}
Definition 4.6.1. The characteristic polynomial of the matrix A is

pA (λ) = det(A − λI2 ).


For an n × n matrix A = (aij ), define the trace of A to be the sum of the diagonal elements
of A; that is
{e:tracedef} tr(A) = a11 + · · · + ann . (4.6.4)
Thus, using (4.6.3), we can rewrite the characteristic polynomial for 2 × 2 matrices as

{e:invcharpoly} pA (λ) = λ2 − tr(A)λ + det(A). (4.6.5)

As an example, consider the 2 × 2 matrix


 
2 3
{e:2x2ex} A= . (4.6.6)
1 4

Then  
2−λ 3
A − λI2 = ,
1 4−λ
and
pA (λ) = (2 − λ)(4 − λ) − 3 = λ2 − 6λ + 5.
It is now easy to verify (4.6.5) for (4.6.6).
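These quantities are also easy to confirm in MATLAB; the following sketch uses trace, det, and poly (which returns the coefficients of the characteristic polynomial) for the matrix A in (4.6.6).

% Verify (4.6.5) for the matrix in (4.6.6).
A = [2 3; 1 4];
trace(A)     % 6
det(A)       % 5
poly(A)      % coefficients [1 -6 5], that is, lambda^2 - 6 lambda + 5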

Eigenvalues For 2 × 2 matrices A, pA (λ) is a quadratic polynomial. As we have discussed,


the real roots of pA are real eigenvalues of A. For 2 × 2 matrices we now generalize our first
{D:eigen2} definition of eigenvalues, Definition 4.5.1, to include complex eigenvalues, as follows.
Definition 4.6.2. An eigenvalue of A is a root of the characteristic polynomial pA .

It follows from Definition 4.6.2 that every 2 × 2 matrix has precisely two eigenvalues, which
may be equal or complex conjugate pairs.
Suppose that λ1 and λ2 are the roots of pA . It follows that

{e:charpolyprod} pA (λ) = (λ − λ1 )(λ − λ2 ) = λ2 − (λ1 + λ2 )λ + λ1 λ2 . (4.6.7)

Equating the two forms of pA (4.6.5) and (4.6.7) shows that

{e:treigen} tr(A) = λ1 + λ2 (4.6.8)


{e:deteigen} det(A) = λ1 λ2 . (4.6.9)

Thus, for 2 × 2 matrices, the trace is the sum of the eigenvalues and the determinant is the
product of the eigenvalues. In Chapter 7, Theorems 7.2.4(b) and 7.2.9 we show that these
statements are also valid for n × n matrices.


Recall that in example (4.6.6) the characteristic polynomial is


pA (λ) = λ2 − 6λ + 5 = (λ − 5)(λ − 1).
Thus the eigenvalues of A are λ1 = 1 and λ2 = 5 and identities (4.6.8) and (4.6.9) are easily
verified for this example.
Next, we consider an example with complex eigenvalues and verify that these identities are
equally valid in this instance. Let
 
2 −3
B= .
1 4
The characteristic polynomial is:
pB (λ) = λ2 − 6λ + 11.
Using the quadratic formula we see that the roots of pB (that is, the eigenvalues of B) are
λ_1 = 3 + i√2 and λ_2 = 3 − i√2.
Again the sum of the eigenvalues is 6 which equals the trace of B and the product of the
eigenvalues is 11 which equals the determinant of B.
Since the characteristic polynomial of 2 × 2 matrices is always a quadratic polynomial, it
follows that 2 × 2 matrices have precisely two eigenvalues — including multiplicity — and
these can be described as follows. The discriminant of A is:
{e:discriminant}
{eigendist} D = [tr(A)]2 − 4 det(A). (4.6.10)
Theorem 4.6.3. There are three possibilities for the two eigenvalues of a 2 × 2 matrix A
that we can describe in terms of the discriminant:

(i) The eigenvalues of A are real and distinct (D > 0).


(ii) The eigenvalues of A are a complex conjugate pair (D < 0).
(iii) The eigenvalues of A are real and equal (D = 0).

Proof We can find the roots of the characteristic polynomial using the form of pA given
in (4.6.5) and the quadratic formula. The roots are:
(1/2) ( tr(A) ± √( [tr(A)]² − 4 det(A) ) ) = ( tr(A) ± √D ) / 2.
The proof of the theorem now follows. If D > 0, then the eigenvalues of A are real and dis-
tinct; if D < 0, then eigenvalues are complex conjugates; and if D = 0, then the eigenvalues
are real and equal. 
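The three cases of Theorem 4.6.3 can be read off directly from the discriminant. The function below is a small sketch of such a classifier (the name classify2x2 is ours, not a MATLAB or textbook command; save it as classify2x2.m).

function classify2x2(A)
% CLASSIFY2X2  Report the eigenvalue type of a 2 x 2 matrix via its discriminant.
D = trace(A)^2 - 4*det(A);     % discriminant (4.6.10)
if D > 0
    fprintf('D = %g > 0: real and distinct eigenvalues\n', D);
elseif D < 0
    fprintf('D = %g < 0: complex conjugate eigenvalues\n', D);
else
    fprintf('D = 0: real and equal eigenvalues\n');
end
end

For example, classify2x2([2 3; 1 4]) reports real and distinct eigenvalues, while classify2x2([2 -3; 1 4]) reports a complex conjugate pair.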


{L:eigenexists} Eigenvectors The following lemma contains an important observation about eigenvectors:
Lemma 4.6.4. Every eigenvalue λ of a 2 × 2 matrix A has an eigenvector v. That is, there
is a nonzero vector v ∈ C2 satisfying

Av = λv.

Proof When the eigenvalue λ is real we know that an eigenvector v ∈ R2 exists. However,
when λ is complex, then we must show that there is a complex eigenvector v ∈ C2 , and
this we have not yet done. More precisely, we must show that if λ is a complex root of the
characteristic polynomial pA , then there is a complex vector v such that

(A − λI2 )v = 0.

As we discussed in Section 2.5, finding v is equivalent to showing that the complex matrix
 
a−λ b
A − λI2 =
c d−λ

is not row equivalent to the identity matrix. See Theorem 2.5.2 of Chapter 2. Since a is
real and λ is not, a − λ ≠ 0. A short calculation shows that A − λI_2 is row equivalent to the matrix
[ 1    b/(a − λ)
  0    p_A(λ)/(a − λ) ].
This matrix is not row equivalent to the identity matrix since pA (λ) = 0. 

An Example of a Matrix with Real Eigenvectors Once we know the eigenvalues of a 2 × 2


matrix, the associated eigenvectors can be found by direct calculation. For example, we
showed previously that the matrix
 
2 3
A= .
1 4

in (4.6.6) has eigenvalues λ1 = 1 and λ2 = 5. With this information we can find the
associated eigenvectors. To find an eigenvector associated with the eigenvalue λ1 = 1
compute  
1 3
A − λ1 I2 = A − I2 = .
1 3


It follows that v1 = (3, −1)t is an eigenvector since

(A − I2 )v1 = 0.

Similarly, to find an eigenvector associated with the eigenvalue λ2 = 5 compute


 
−3 3
A − λ2 I2 = A − 5I2 = .
1 −1

It follows that v2 = (1, 1)t is an eigenvector since

(A − 5I2 )v2 = 0.
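In MATLAB these eigenvectors can be computed with the null command applied to A − λI_2; the returned vectors are unit-length multiples of v_1 and v_2, consistent with Lemma 4.5.2. A minimal sketch:

% Eigenvectors of A = [2 3; 1 4] as null spaces of A - lambda*I.
A = [2 3; 1 4];
v1 = null(A - 1*eye(2))    % a multiple of (3, -1)^t
v2 = null(A - 5*eye(2))    % a multiple of (1, 1)^t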

Examples of Matrices with Complex Eigenvectors Let


 
0 −1
A= .
1 0

Then pA (λ) = λ2 + 1 and the eigenvalues of A are ±i. To find the eigenvector v ∈ C2 whose
existence is guaranteed by Lemma 4.6.4, we need to solve the complex system of linear
equations Av = iv. We can rewrite this system as:
  
−i −1 v1
= 0.
1 −i v2

A calculation shows that
{e:eigenv} v = (i, 1)^t    (4.6.11)
is a solution. Since the coefficients of A are real, we can take the complex conjugate of the
equation Av = iv to obtain
A v̄ = −i v̄.
Thus
v̄ = (−i, 1)^t
is the eigenvector corresponding to the eigenvalue −i. This comment is valid for any complex
eigenvalue.
More generally, let  
σ −τ
{E:cmplxnf} A= , (4.6.12)
τ σ


where τ ≠ 0. Then

p_A(λ) = λ² − 2σλ + σ² + τ² = (λ − (σ + iτ))(λ − (σ − iτ)),

and the eigenvalues of A are the complex conjugates σ ± iτ. Thus A has no real eigenvectors. The complex eigenvectors of A are v and v̄, where v is defined in (4.6.11).

Exercises

{c4.9.2}
1. For which values of λ is the matrix
[ 1 − λ    4
    2    3 − λ ]
not invertible? Note: These values of λ are just the eigenvalues of the matrix [ 1 4 ; 2 3 ].
Answer: The matrix is not invertible when λ = 5 or λ = −1.
Solution: Corollary 3.8.3 states that a matrix is not invertible if and only if the determinant is
zero; in this case, if
(1 − λ)(3 − λ) − (2)(4) = λ2 − 4λ − 5 = 0.

In Exercises 2 – 5 compute the determinant, trace, and characteristic polynomials for the given
{c6.4.1a} 2 × 2 matrix.
 
1 4
2. .
0 −1
The determinant of the matrix is −1, the trace is 0, and the characteristic polynomial is p(λ) = λ² − 1.
{c6.4.1b}
 
2 13
3. .
−1 5
The determinant of the matrix is 23, the trace is 7, and the characteristic polynomial is p(λ) = λ² − 7λ + 23.
{c6.4.1c}
 
1 4
4. .
1 −1
Solution: The determinant of the matrix is −5, the trace is 0, and the characteristic polynomial
is p(λ) = λ2 − 5.


{c6.4.1d}
 
4 10
5. .
2 5
The determinant of the matrix is 0, the trace is 9, and the characteristic polynomial is p(λ) = λ2 −9λ.

In Exercises 6 – 8 compute the eigenvalues for the given 2 × 2 matrix.


{c6.4.2a}
 
1 2
6. .
0 −5
Answer: The eigenvalues of the matrix are λ1 = −5 and λ2 = 1.
Solution: For any 2 × 2 matrix A, the eigenvalues are the roots of the characteristic polynomials,
which can be found by solving the equation λ2 −tr(A)λ+det(A) = 0. In this case, the characteristic
polynomial of the matrix is λ2 + 4λ − 5.
{c6.4.2b}
 
−3 2
7. .
1 0
Answer: The eigenvalues of the matrix are λ_1 = (−3 + √17)/2 and λ_2 = (−3 − √17)/2.
Solution: The eigenvalues are the roots of the characteristic polynomial, which is λ2 + 3λ − 2.
{c6.4.2c}
 
3 −2
8. .
2 −1
Answer: The matrix has a double eigenvalue at λ = 1.
Solution: The characteristic polynomial of the matrix is λ2 − 2λ + 1. The eigenvalues are the roots
of this polynomial.

{c6.4.2ba}
9. Suppose that the characteristic polynomial of the 2 × 2 matrix A is pA (λ) = λ2 + 2λ − 6. Find
det(A) and tr(A).
Answer: det(A) = −6 and tr(A) = −2.
Solution: Recall that the characteristic polynomial is

pA (λ) = λ2 − tr(A)λ + det(A).

{c6.4.3}
10. (a) Let A and B be 2 × 2 matrices. Using direct calculation, show that

{e:trAB=trBA} tr(AB) = tr(BA). (4.6.13)


(b) Now let A and B be n × n matrices. Verify by direct calculation that (4.6.13) is still valid.
(a) Let
A = [ a_{11} a_{12} ; a_{21} a_{22} ] and B = [ b_{11} b_{12} ; b_{21} b_{22} ].

Calculate tr(AB) and compare to tr(BA):


 
tr(AB) = tr [ a_{11}b_{11} + a_{12}b_{21}    a_{11}b_{12} + a_{12}b_{22}
              a_{21}b_{11} + a_{22}b_{21}    a_{21}b_{12} + a_{22}b_{22} ] = a_{11}b_{11} + a_{12}b_{21} + a_{21}b_{12} + a_{22}b_{22}.
tr(BA) = tr [ a_{11}b_{11} + a_{21}b_{12}    a_{12}b_{11} + a_{22}b_{12}
              a_{11}b_{21} + a_{21}b_{22}    a_{12}b_{21} + a_{22}b_{22} ] = a_{11}b_{11} + a_{21}b_{12} + a_{12}b_{21} + a_{22}b_{22}.
So, indeed, tr(AB) = tr(BA) for 2 × 2 matrices A and B.
(b) Let C = AB. Then each element along the main diagonal of C is:
c_{ii} = a_{i1}b_{1i} + · · · + a_{in}b_{ni} = Σ_{j=1}^{n} a_{ij}b_{ji}.

Thus,
tr(C) = Σ_{i=1}^{n} Σ_{j=1}^{n} a_{ij}b_{ji}.

Let D = BA. Then each element along the main diagonal of D is:
d_{ii} = b_{i1}a_{1i} + · · · + b_{in}a_{ni} = Σ_{j=1}^{n} b_{ij}a_{ji}.

Therefore,
tr(D) = Σ_{i=1}^{n} Σ_{j=1}^{n} b_{ij}a_{ji} = Σ_{i=1}^{n} Σ_{j=1}^{n} a_{ij}b_{ji} = tr(C).
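The identity is also easy to spot-check numerically for matrices of any size; the sketch below compares the two traces for a random pair of 5 × 5 matrices (the size is an arbitrary choice).

% Numerical spot-check of tr(AB) = tr(BA).
n = 5;
A = rand(n); B = rand(n);
abs(trace(A*B) - trace(B*A))   % of the order of round-off error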

In Exercises 11 – 13 use the program map to guess whether the given matrix has real or complex
conjugate eigenvalues. For each example, write the reasons for your guess.
 
0.97 −0.22
11. (matlab) A = .
{c7.8.5a} 0.22 0.97
The matrix has complex conjugate eigenvalues, since the mapping is a contracting rotation.
 
0.97 0.22
12. (matlab) B = .
{c7.8.5b} 0.22 0.97
The mapping has real eigenvalues. Repeated mapping of any vector leads to a vector in one of two
invariant directions. These directions are the eigenvectors.


 
0.4 −1.4
13. (matlab) C = .
{c7.8.5c} 1.5 0.5
The matrix has complex conjugate eigenvalues, since the mapping is an expanding rotation.

In Exercises 14 – 15 use the program map to guess one of the eigenvectors of the given matrix. What
is the corresponding eigenvalue? Using map, can you find a second eigenvalue and eigenvector?
 
2 4
14. (matlab) A = .
{c7.8.6a} 2 0
The eigenvectors are v1 = (2, 1)t , with eigenvalue λ1 = 4, and v2 = (1, −1)t , with eigenvalue
λ2 = −2.
 
2 −1
15. (matlab) B = .
{c7.8.6b} 0.25 1
Hint: Use the feature Rescale in the MAP Options. Then the length of the vector is rescaled to
one after each use of the command Map. In this way you can avoid overflows in the computations
while still being able to see the directions where the vectors are moved by the matrix mapping.
The matrix has a single eigenvector, v = (2, 1)t , with corresponding eigenvalue λ = 1.5.

{c7.8.7} 16. (matlab) The MATLAB command eig computes the eigenvalues of matrices. Use eig to compute the eigenvalues of A = [ 2.34 −1.43 ; π e ].
Answer: The eigenvalues of A are λ ≈ 2.5291 ± 2.1111i.
Solution: Enter the matrix A into MATLAB and find its eigenvalues by typing

A = [2.34 -1.43; pi exp(1)];


eig(A)


{S:IVPR} 4.7 Initial Value Problems Revisited


To summarize the ideas developed in this chapter, we review the method that we have
developed to solve the system of differential equations

ẋ = ax + by
{E:2dode} (4.7.1)
ẏ = cx + dy

satisfying the initial conditions


x(0) = x0
{E:2dic} (4.7.2)
y(0) = y0 .

Begin by rewriting (4.7.1) in matrix form

{E:2dodeM} Ẋ = CX (4.7.3)

where
C = [ a b
      c d ] and X(t) = (x(t), y(t))^t.
Rewrite the initial conditions (4.7.2) in vector form

{E:2dicV} X(0) = X0 (4.7.4)

where
X_0 = (x_0, y_0)^t.

When the eigenvalues of C are real and distinct we now know how to solve the initial value
problem (4.7.3) and (4.7.4). This solution is found in four steps.

Step 1: Find the eigenvalues λ1 and λ2 of C.


These eigenvalues are the roots of the characteristic polynomial as given by (4.6.5):

pC (λ) = λ2 − tr(C)λ + det(C).

These roots may be found either by factoring pC or by using the quadratic formula. The
roots are real and distinct when the discriminant

D = tr(C)2 − 4 det(C) > 0.

Recall (4.6.10) and Theorem 4.6.3.


Step 2: Find eigenvectors v1 and v2 of C associated with the eigenvalues λ1 and λ2 .


For j = 1 and j = 2, the eigenvector vj is found by solving the homogeneous system of
linear equations
{E:eeqn} (C − λj I2 )v = 0 (4.7.5)
for one nonzero solution. Lemma 4.6.4 tells us that there is always a nonzero solution to
(4.7.5) since λj is an eigenvalue of C.

Step 3: Using superposition, write the general solution to the system of ODEs (4.7.3) as

{E:gensoln} X(t) = α1 eλ1 t v1 + α2 eλ2 t v2 , (4.7.6)

where α1 , α2 ∈ R.
Theorem 4.5.3 tells us that for j = 1, 2

Xj (t) = eλj t vj

is a solution to (4.7.3). The principle of superposition (see Section 4.5) allows us to conclude
that
X(t) = α1 X1 (t) + α2 X2 (t)
is also a solution to (4.7.3) for any scalars α1 , α2 ∈ R. Thus, (4.7.6) is valid.
Note that the initial condition corresponding to the general solution (4.7.6) is

{E:geninit} X(0) = α1 v1 + α2 v2 , (4.7.7)

since e0 = 1.

Step 4: Solve the initial value problem by solving the system of linear equations

{E:geninit2} X0 = α1 v1 + α2 v2 (4.7.8)

for α1 and α2 (see (4.7.7)).


Let A be the 2 × 2 matrix whose columns are v1 and v2 . That is,

{E:Av1v2} A = (v1 |v2 ). (4.7.9)

Then we may rewrite (4.7.8) in the form


 
{E:solveinit} A (α_1, α_2)^t = X_0.    (4.7.10)


We claim that the matrix A = (v1 |v2 ) (defined in (4.7.9)) is always invertible. Recall
Lemma 4.5.2 which states that if w is a nonzero multiple of v2 , then w is also an eigenvector
of A associated to the eigenvalue λ2 . Since the eigenvalues λ1 and λ2 are distinct, it follows
that the eigenvector v1 is not a scalar multiple of the eigenvector v2 (see Lemma 4.5.2).
Therefore, the area of the parallelogram spanned by v1 and v2 is nonzero and the determinant
of A is nonzero by Theorem 3.8.5 of Chapter 3. Corollary 3.8.3 of Chapter 3 now implies
that A is invertible. Thus, the unique solution to (4.7.10) is
 
(α_1, α_2)^t = A^{−1} X_0.

This equation is easily solved since we have an explicit formula for A−1 when A is a 2 × 2
matrix (see (3.8.1) in Section 3.8). Indeed,
 
A^{−1} = (1/det(A)) [ d −b
                      −c  a ].
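The four steps can be collected into a short MATLAB function. The sketch below (the name solve_planar_ivp is ours, and it assumes, as in the discussion above, that the eigenvalues of C are real and distinct) returns the eigenvalues, eigenvectors, and coefficients so that X(t) = alpha(1) e^{λ1 t} v1 + alpha(2) e^{λ2 t} v2. Save it as solve_planar_ivp.m.

function [lambda, V, alpha] = solve_planar_ivp(C, X0)
% SOLVE_PLANAR_IVP  Steps 1-4 for dX/dt = CX, X(0) = X0, when the 2 x 2
% matrix C has real and distinct eigenvalues.
[V, D] = eig(C);      % Steps 1-2: eigenvalues and eigenvectors
lambda = diag(D);
alpha = V \ X0;       % Step 4: solve V*alpha = X0 (V is invertible here)
end

For the hand-worked example that follows, [lambda, V, alpha] = solve_planar_ivp([3 -1; 4 -2], [2; -3]) returns the eigenvalues 2 and −1 (possibly in a different order) and coefficients equivalent to 11/3 and −5/3 once the scaling of the unit eigenvectors returned by eig is taken into account.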

An Initial Value Problem Solved by Hand Solve the linear system of differential equations

ẋ = 3x − y
{E:ivpbh} (4.7.11)
ẏ = 4x − 2y

with initial conditions


x(0) = 2
{E:ivpbhic} (4.7.12)
y(0) = −3.

Rewrite the system (4.7.11) in matrix form as

Ẋ = CX

where  
3 −1
C= .
4 −2
Rewrite the initial conditions (4.7.12) in vector form
 
X(0) = X_0 = (2, −3)^t.

Now proceed through the four steps outlined previously.


Step 1: Find the eigenvalues of C.


The characteristic polynomial of C is

pC (λ) = λ2 − tr(C)λ + det(C) = λ2 − λ − 2 = (λ − 2)(λ + 1).

Therefore, the eigenvalues of C are

λ1 = 2 and λ2 = −1.

Step 2: Find the eigenvectors of C.


Find an eigenvector associated with the eigenvalue λ1 = 2 by solving the system of equations
   
(C − λ_1 I_2)v = ( [ 3 −1 ; 4 −2 ] − [ 2 0 ; 0 2 ] ) v = [ 1 −1 ; 4 −4 ] v = 0.
One particular solution to this system is
 
v_1 = (1, 1)^t.

Similarly, find an eigenvector associated with the eigenvalue λ2 = −1 by solving the system
of equations
   
(C − λ_2 I_2)v = ( [ 3 −1 ; 4 −2 ] − [ −1 0 ; 0 −1 ] ) v = [ 4 −1 ; 4 −1 ] v = 0.
One particular solution to this system is
 
v_2 = (1, 4)^t.

Step 3: Write the general solution to the system of differential equations.


Using superposition the general solution to the system (4.7.11) is:
   
X(t) = α_1 e^{2t} v_1 + α_2 e^{−t} v_2 = α_1 e^{2t} (1, 1)^t + α_2 e^{−t} (1, 4)^t,


where α1 , α2 ∈ R. Note that the initial state of this solution is:


     
X(0) = α_1 (1, 1)^t + α_2 (1, 4)^t = (α_1 + α_2, α_1 + 4α_2)^t.

Step 4: Solve the initial value problem.


Let
A = (v_1 | v_2) = [ 1 1
                    1 4 ].
The equation for the initial condition is
A (α_1, α_2)^t = X_0.

See (4.7.9).
We can write the inverse of A by formula as
 
A^{−1} = (1/3) [ 4 −1
                −1  1 ].

It follows that we solve for the coefficients αj as


      
(α_1, α_2)^t = A^{−1} X_0 = (1/3) [ 4 −1 ; −1 1 ] (2, −3)^t = (1/3) (11, −5)^t.

In coordinates
α_1 = 11/3 and α_2 = −5/3.
The solution to the initial value problem (4.7.11) and (4.7.12) is:
X(t) = (1/3) (11e^{2t} v_1 − 5e^{−t} v_2)
     = (1/3) (11e^{2t} (1, 1)^t − 5e^{−t} (1, 4)^t).

Expressing the solution in coordinates, we obtain:

x(t) = (1/3) (11e^{2t} − 5e^{−t})
y(t) = (1/3) (11e^{2t} − 20e^{−t}).


An Initial Value Problem Solved using MATLAB Next, solve the system of ODEs

ẋ = 1.7x + 3.5y
ẏ = 1.3x − 4.6y

with initial conditions

x(0) = 2.7
y(0) = 1.1 .

Rewrite this system in matrix form as

Ẋ = CX

where  
1.7 3.5
C= .
1.3 −4.6
Rewrite the initial conditions in vector form
 
2.7
X0 = .
1.1

Now proceed through the four steps outlined previously. In MATLAB begin by typing

C = [1.7 3.5; 1.3 -4.6]


X0 = [2.7; 1.1]

Step 1: Find the eigenvalues of C by typing

lambda = eig(C)

and obtaining

lambda =
2.3543
-5.2543

So the eigenvalues of C are real and distinct.


Step 2: To find the eigenvectors of C we need to solve two homogeneous systems of linear
equations. The matrix associated with the first system is obtained by typing

C1 = C - lambda(1)*eye(2)

which yields

C1 =
-0.6543 3.5000
1.3000 -6.9543

We can solve the homogeneous system (C1)x = 0 by row reduction — but MATLAB has
this process preprogrammed in the command null. So type

v1 = null(C1)

and obtain

v1 =
-0.9830
-0.1838

Similarly, to find an eigenvector associated to the eigenvalue λ2 type

C2 = C - lambda(2)*eye(2);
v2 = null(C2)

and obtain

v2 =
-0.4496
0.8932

Step 3: The general solution to this system of differential equations is:


   
X(t) = α_1 e^{2.3543t} (−0.9830, −0.1838)^t + α_2 e^{−5.2543t} (−0.4496, 0.8932)^t.


Step 4: Solve the initial value problem by finding the scalars α1 and α2 . Form the matrix
A by typing

A = [v1 v2]

Then solve for the α’s by typing

alpha = inv(A)*X0

obtaining

alpha =
-3.0253
0.6091

Therefore, the closed form solution to the initial value problem is:
 
X(t) = 3.0253 e^{2.3543t} (0.9830, 0.1838)^t + 0.6091 e^{−5.2543t} (−0.4496, 0.8932)^t.

Exercises

In Exercises 1 – 4 find the solution to the system of differential equations Ẋ = CX satisfying


X(0) = X0 .
{c4.10A.1a}
   
1. C = [ 1 1 ; 0 2 ] and X_0 = (1, 4)^t.
Answer: The solution to Ẋ = CX satisfying this initial condition is

X(t) = 4e^{2t} (1, 1)^t − 3e^t (1, 0)^t = (4e^{2t} − 3e^t, 4e^{2t})^t.

Solution: First, find the eigenvalues of C, which are the roots of the characteristic polynomial

pC (λ) = λ2 − tr(C)λ + det(C) = λ2 − 3λ + 2 = (λ − 2)(λ − 1).


So the eigenvalues are: λ1 = 2 and λ2 = 1. To find the eigenvector associated to each eigenvalue,
solve the equation (C − λj I2 )vj = 0 for j = 1 and j = 2. Solve
     
( [ 1 1 ; 0 2 ] − [ 2 0 ; 0 2 ] ) v_1 = [ −1 1 ; 0 0 ] v_1 = 0

to obtain v1 = (1, 1)t and solve


     
( [ 1 1 ; 0 2 ] − [ 1 0 ; 0 1 ] ) v_2 = [ 0 1 ; 0 1 ] v_2 = 0

to obtain v2 = (1, 0)t . We can then write the general solution


   
X(t) = α_1 e^{λ_1 t} v_1 + α_2 e^{λ_2 t} v_2 = α_1 e^{2t} (1, 1)^t + α_2 e^t (1, 0)^t.

From this formula, find α1 and α2 by solving


       
(1, 4)^t = X(0) = α_1 (1, 1)^t + α_2 (1, 0)^t = (α_1 + α_2, α_1)^t.

Solving the linear system


α_1 + α_2 = 1
α_1 = 4
{c4.10A.1b} we obtain α_1 = 4 and α_2 = −3 and find the solution to the differential equation.
   
2. C = [ 2 −3 ; 0 −1 ] and X_0 = (1, −2)^t.
Answer: The solution to Ẋ = CX satisfying this initial condition is

X(t) = 3e^{2t} (1, 0)^t − 2e^{−t} (1, 1)^t = (3e^{2t} − 2e^{−t}, −2e^{−t})^t.

Solution: First, find the eigenvalues of C, which are the roots of the characteristic polynomial

pC (λ) = λ2 − tr(C)λ + det(C) = λ2 − λ − 2 = (λ − 2)(λ + 1).

So the eigenvalues are: λ1 = 2 and λ2 = −1. To find the eigenvector associated to each eigenvalue,
solve the equation (C − λj I2 )vj = 0 for j = 1 and j = 2. Solve
     
( [ 2 −3 ; 0 −1 ] − [ 2 0 ; 0 2 ] ) v_1 = [ 0 −3 ; 0 −3 ] v_1 = 0

to obtain v1 = (1, 0)t and solve


     
( [ 2 −3 ; 0 −1 ] + [ 1 0 ; 0 1 ] ) v_2 = [ 3 −3 ; 0 0 ] v_2 = 0


to obtain v2 = (1, 1)t . We can then write the general solution


   
X(t) = α_1 e^{λ_1 t} v_1 + α_2 e^{λ_2 t} v_2 = α_1 e^{2t} (1, 0)^t + α_2 e^{−t} (1, 1)^t.

From this formula, find α1 and α2 by solving


       
(1, −2)^t = X(0) = α_1 (1, 0)^t + α_2 (1, 1)^t = (α_1 + α_2, α_2)^t.

Solving the linear system


α1 + α2 = 1
α2 = −2

{c4.10A.1c} we obtain α1 = 3 and α2 = −2 and find the solution to the differential equation.
   
3. C = [ −3 2 ; −2 2 ] and X_0 = (−1, 3)^t.
Answer: The solution to Ẋ = CX satisfying this initial condition is

X(t) = (7/3) e^t (1, 2)^t − (5/3) e^{−2t} (2, 1)^t = (1/3) (7e^t − 10e^{−2t}, 14e^t − 5e^{−2t})^t.

Solution: First, find the eigenvalues of C, which are the roots of the characteristic polynomial

pC (λ) = λ2 − tr(C)λ + det(C) = λ2 + λ − 2 = (λ − 1)(λ + 2).

So the eigenvalues are: λ1 = 1 and λ2 = −2. To find the eigenvector associated to each eigenvalue,
solve the equation (C − λj I2 )vj = 0 for j = 1 and j = 2. Solve
     
( [ −3 2 ; −2 2 ] − [ 1 0 ; 0 1 ] ) v_1 = [ −4 2 ; −2 1 ] v_1 = 0

to obtain v1 = (1, 2)t and solve


     
( [ −3 2 ; −2 2 ] + [ 2 0 ; 0 2 ] ) v_2 = [ −1 2 ; −2 4 ] v_2 = 0

to obtain v2 = (2, 1)t . We can then write the general solution


   
X(t) = α_1 e^{λ_1 t} v_1 + α_2 e^{λ_2 t} v_2 = α_1 e^t (1, 2)^t + α_2 e^{−2t} (2, 1)^t.

From this formula, find α1 and α2 by solving


       
(−1, 3)^t = X(0) = α_1 (1, 2)^t + α_2 (2, 1)^t = (α_1 + 2α_2, 2α_1 + α_2)^t.


Solving the linear system


α1 + 2α2 = −1
2α1 + α2 = 3
we obtain α_1 = 7/3 and α_2 = −5/3 and find the solution to the differential equation.
{c4.10A.1d}
   
4. C = [ 2 1 ; 1 2 ] and X_0 = (1, 2)^t.
Answer: The solution to Ẋ = CX satisfying this initial condition is

X(t) = (3/2) e^{3t} (1, 1)^t − (1/2) e^t (1, −1)^t = (1/2) (3e^{3t} − e^t, 3e^{3t} + e^t)^t.

Solution: First, find the eigenvalues of C, which are the roots of the characteristic polynomial

pC (λ) = λ2 − tr(C)λ + det(C) = λ2 − 4λ + 3 = (λ − 3)(λ − 1).

So the eigenvalues are: λ1 = 3 and λ2 = 1. To find the eigenvector associated to each eigenvalue,
solve the equation (C − λj I2 )vj = 0 for j = 1 and j = 2. Solve
     
( [ 2 1 ; 1 2 ] − [ 3 0 ; 0 3 ] ) v_1 = [ −1 1 ; 1 −1 ] v_1 = 0

to obtain v1 = (1, 1)t and solve


     
( [ 2 1 ; 1 2 ] − [ 1 0 ; 0 1 ] ) v_2 = [ 1 1 ; 1 1 ] v_2 = 0

to obtain v2 = (1, −1)t . We can then write the general solution


   
X(t) = α_1 e^{λ_1 t} v_1 + α_2 e^{λ_2 t} v_2 = α_1 e^{3t} (1, 1)^t + α_2 e^t (1, −1)^t.

From this formula, find α1 and α2 by solving


       
(1, 2)^t = X(0) = α_1 (1, 1)^t + α_2 (1, −1)^t = (α_1 + α_2, α_1 − α_2)^t.

Solving the linear system


α_1 + α_2 = 1
α_1 − α_2 = 2
we obtain α_1 = 3/2 and α_2 = −1/2 and find the solution to the differential equation.


{c4.10A.2}
5. Solve the initial value problem Ẋ = CX where X0 = e1 given that
 
(a) X(t) = e^{−t} (1, 2)^t is a solution,
(b) tr(C) = 3, and
(c) C is a symmetric matrix.

Answer: The solution to the differential equation Ẋ = CX with the given restrictions is
X(t) = (1/5) e^{−t} (1, 2)^t + (2/5) e^{4t} (2, −1)^t = (1/5) (e^{−t} + 4e^{4t}, 2e^{−t} − 2e^{4t})^t.

Solution: First, find the matrix C using the given information. Since C is symmetric, we can write
a b
C= .
b d
Then, we are given tr(C) = a + d = 3, so we can rewrite C as
 
a b
C= .
b 3−a

Since X(t) = e−t (1, 2)t is a solution, λ1 = −1 must be an eigenvalue of C with associated eigenvector
v1 = (1, 2)t . Thus Cv1 = λ1 v1 , or
        
[ a b ; b 3 − a ] (1, 2)^t = (a + 2b, b + 2(3 − a))^t = (a + 2b, −2a + b + 6)^t = (−1, −2)^t.

This equation yields the linear system

a + 2b = −1
−2a + b = −8

which we can solve to obtain a = 3 and b = −2. So


 
3 −2
C= .
−2 0

Now, find λ2 , the other root of

pC(λ) = λ² − tr(C)λ + det(C) = λ² − 3λ − 4 = (λ + 1)(λ − 4).

Thus, the second eigenvalue of C is λ2 = 4, and we can solve


     
(C − λ_2 I_2)v_2 = ( [ 3 −2 ; −2 0 ] − [ 4 0 ; 0 4 ] ) v_2 = [ −1 −2 ; −2 −4 ] v_2 = 0


to obtain v2 = (2, −1)t , the eigenvector associated to λ2 . The general solution to Ẋ = CX is


   
X(t) = α_1 e^{−t} (1, 2)^t + α_2 e^{4t} (2, −1)^t.

Find α1 and α2 by substituting the initial condition X(0) = X0 into this formula:
       
(1, 0)^t = X(0) = α_1 (1, 2)^t + α_2 (2, −1)^t = (α_1 + 2α_2, 2α_1 − α_2)^t.

Thus, α_1 = 1/5 and α_2 = 2/5, so we find the general solution.

In Exercises 6 – 7, with MATLAB assistance, find the solution to the system of differential equations
Ẋ = CX satisfying X(0) = X0 .
   
{c4.10A.3a} 6. (matlab) C = [ 1.76 4.65 ; 0.23 1.11 ] and X_0 = (0.34, −0.50)^t.
Answer: The solution to the differential equation Ẋ = CX with the given initial condition is
   
X(t) ≈ 0.8627 e^{2.5190t} (−0.9869, −0.1611)^t − 1.2449 e^{0.3510t} (−0.9570, 0.2900)^t.

Solution: In MATLAB, enter the matrix C and the vector X0. Then, type

lambda = eig(C)

to obtain the eigenvalues of C, which are λ1 ≈ 2.5190 and λ2 ≈ 0.3510. Find the eigenvectors v1
and v2 associated to λ1 and λ2 by typing

v1 = null(C - lambda(1)*eye(2))
v2 = null(C - lambda(2)*eye(2))

Thus, the general solution is


   
X(t) = α_1 e^{λ_1 t} v_1 + α_2 e^{λ_2 t} v_2 ≈ α_1 e^{2.5190t} (−0.9869, −0.1611)^t + α_2 e^{0.3510t} (−0.9570, 0.2900)^t.

The initial condition is


X0 = X(0) = α1 v1 + α2 v2 .
We can solve this linear system by creating the matrix A = (v1 |v2 ), and computing A−1 X0 . In
MATLAB, type


A = [v1 v2]
alpha = inv(A)*X0

obtaining α1 ≈ 0.8627 and α2 ≈ −1.2449.


   
{c4.10A.3b} 7. (matlab) C = [ 1.23 2π ; π/2 1.45 ] and X_0 = (1.2, 1.6)^t.
Answer: The solution to the differential equation Ẋ = CX with the given initial condition is
   
X(t) ≈ 1.0860 e^{−1.8035t} (−0.9005, 0.4348)^t − 2.4527 e^{4.4835t} (−0.8880, −0.4598)^t.

Solution: In MATLAB, enter the matrix C and the vector X0. Then, type

lambda = eig(C)

to obtain the eigenvalues of C, which are λ1 ≈ −1.8035 and λ2 ≈ 4.4835. Find the eigenvectors v1
and v2 associated to λ1 and λ2 by typing

v1 = null(C - lambda(1)*eye(2))
v2 = null(C - lambda(2)*eye(2))

Thus, the general solution is


   
X(t) = α_1 e^{λ_1 t} v_1 + α_2 e^{λ_2 t} v_2 ≈ α_1 e^{−1.8035t} (−0.9005, 0.4348)^t + α_2 e^{4.4835t} (−0.8880, −0.4598)^t.
The initial condition is
X0 = X(0) = α1 v1 + α2 v2 .
We can solve this linear system by creating the matrix A = (v1 |v2 ), and computing A−1 X0 . In
MATLAB, type

A = [v1 v2]
alpha = inv(A)*X0

obtaining α1 ≈ 1.0860 and α2 ≈ −2.4527.

In Exercises 8 – 9, find the solution to Ẋ = CX satisfying X(0) = X0 in two different ways, as follows.

(a) Use pplane9 to find X(0.5). Hint: Use the Specify a computation interval option in the
PPLANE9 Keyboard input window to compute the solution to t = 0.5. Then use the zoom
in square feature to determine an answer to three decimal places.


(b) Next use MATLAB to find the eigenvalues and eigenvectors of C and to find a closed form
solution X(t). Use this formula to evaluate X(0.5) to three decimal places.
(c) Do the two answers agree?
   
{c4.10A.4a}
8. (matlab) $C = \begin{pmatrix} 2.65 & -2.34 \\ -1.5 & -1.2 \end{pmatrix}$ and $X_0 = \begin{pmatrix} 0.5 \\ 1.0 \end{pmatrix}$.
Answer: X(0.5) = (0.155, 0.386)t and the two methods agree to three decimal places.
Solution: (a) The result of the pplane9 integration is given in Figure 8a. After zooming several
times we arrive at Figure 8b. By inspection X(0.5) = (0.155, 0.386).
(b) Enter the matrix C into MATLAB by typing

C = [2.65 -2.34; -1.5 -1.2];

Find the eigenvalues and eigenvectors of this matrix by typing [V,D] = eig(C) and obtaining

V =
0.9510 0.4525
-0.3093 0.8918
D =
3.4112 0
0 -1.9612

Therefore the general solution to this differential equation is:


   
\[
X(t) = \alpha e^{3.4112t}\begin{pmatrix} 0.9510 \\ -0.3093 \end{pmatrix} + \beta e^{-1.9612t}\begin{pmatrix} 0.4525 \\ 0.8918 \end{pmatrix}.
\]

It follows that
\[
X(0) = V\begin{pmatrix} \alpha \\ \beta \end{pmatrix}.
\]
Therefore,
\[
\begin{pmatrix} \alpha \\ \beta \end{pmatrix} = V^{-1}X_0 = \begin{pmatrix} 0.9510 & 0.4525 \\ -0.3093 & 0.8918 \end{pmatrix}^{-1}\begin{pmatrix} 0.5 \\ 1.0 \end{pmatrix} = \begin{pmatrix} -0.0067 \\ 1.1191 \end{pmatrix}.
\]

The last calculation is done by typing coeff = inv(V)*[0.5;1.0]. Therefore, the solution to the
initial value problem is:
   
\[
X(t) = -0.0067\,e^{3.4112t}\begin{pmatrix} 0.9510 \\ -0.3093 \end{pmatrix} + 1.1191\,e^{-1.9612t}\begin{pmatrix} 0.4525 \\ 0.8918 \end{pmatrix}.
\]

We can evaluate X(0.5) in MATLAB by typing


X5 = coeff(1)*exp(D(1,1)*0.5)*V(:,1) + coeff(2)*exp(D(2,2)*0.5)*V(:,2)

and obtaining

X5 =
0.1547
0.3858

(c) The two answers agree to three decimal places.
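As an extra check (not asked for in the exercise), the system can also be integrated numerically with MATLAB's ode45; the following is a hedged sketch assuming the initial vector $X_0 = (0.5, 1.0)^t$ used in the computation above.

C  = [2.65 -2.34; -1.5 -1.2];
X0 = [0.5; 1.0];
[t,X] = ode45(@(t,x) C*x, [0 0.5], X0);   % numerical solution of X' = CX
X(end,:)                                   % approximately (0.155, 0.386)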


[Figure 8a: pplane9 phase portrait of x′ = 2.65x − 2.34y, y′ = −1.5x − 1.2y. Figure 8b: zoomed view of the solution near (x, y) ≈ (0.155, 0.386).]
   
{c4.10A.4b}
9. (matlab) $C = \begin{pmatrix} 1.2 & 2.4 \\ 0.6 & -3.5 \end{pmatrix}$ and $X_0 = \begin{pmatrix} 0.5 \\ 0.7 \end{pmatrix}$.
Answer: X(0.5) = (1.621, 0.291)t and the two methods agree to three decimal places.
Solution: (a) The result of the pplane9 integration is given in Figure 9a. After zooming several
times we arrive at Figure 9b. By inspection X(0.5) = (1.621, 0.291).
(b) Enter the matrix C into MATLAB by typing

C = [1.2 2.4; 0.6 -3.5];

Find the eigenvalues and eigenvectors of this matrix by typing [V,D] = eig(C) and obtaining

V =
0.9928 -0.4335
0.1194 0.9011


D =
1.4887 0
0 -3.7887

Therefore the general solution to this differential equation is:


   
\[
X(t) = \alpha e^{1.4887t}\begin{pmatrix} 0.9928 \\ 0.1194 \end{pmatrix} + \beta e^{-3.7887t}\begin{pmatrix} -0.4335 \\ 0.9011 \end{pmatrix}.
\]

It follows that
\[
X(0) = V\begin{pmatrix} \alpha \\ \beta \end{pmatrix}.
\]
Therefore,
\[
\begin{pmatrix} \alpha \\ \beta \end{pmatrix} = V^{-1}X_0 = \begin{pmatrix} 0.9928 & -0.4335 \\ 0.1194 & 0.9011 \end{pmatrix}^{-1}\begin{pmatrix} 0.5 \\ 0.7 \end{pmatrix} = \begin{pmatrix} 0.7967 \\ 0.6712 \end{pmatrix}.
\]
The last calculation is done by typing coeff = inv(V)*[0.5;0.7]. Therefore, the solution to the
initial value problem is:
   
\[
X(t) = 0.7967\,e^{1.4887t}\begin{pmatrix} 0.9928 \\ 0.1194 \end{pmatrix} + 0.6712\,e^{-3.7887t}\begin{pmatrix} -0.4335 \\ 0.9011 \end{pmatrix}.
\]

We can evaluate X(0.5) in MATLAB by typing

X5 = coeff(1)*exp(D(1,1)*0.5)*V(:,1) + coeff(2)*exp(D(2,2)*0.5)*V(:,2)

and obtaining

X5 =
1.6213
0.2912

(c) The two answers agree to three decimal places.


[Figure 9a: pplane9 phase portrait of x′ = 1.2x + 2.4y, y′ = 0.6x − 3.5y. Figure 9b: zoomed view of the solution near (x, y) ≈ (1.621, 0.291).]


{S:TransitionApplied} 4.8 *Markov Chains


Markov chains provide an interesting and useful application of matrices and linear algebra.
In this section we introduce Markov chains via some of the theory and two examples. The
theory can be understood and applied to examples using just the background in linear
algebra that we have developed in this chapter.

An Example of Cats Consider the four room apartment pictured in Figure 17. One way
passages between the rooms are indicated by arrows. For example, it is possible to go from
room 1 directly to any other room, but when in room 3 it is possible to go only to room 4.

{F:apart} Figure 17: Schematic design of apartment passages.

Suppose that there is a cat in the apartment and that at each hour the cat is asked to move
from the room that it is in to another. True to form, however, the cat chooses with equal
probability to stay in the room for another hour or to move through one of the allowed
passages. Suppose that we let pij be the probability that the cat will move from room i to
room j; in particular, pii is the probability that the cat will stay in room i. For example,


when the cat is in room 1, it has four choices — it can stay in room 1 or move to any of
the other rooms. Assuming that each of these choices is made with equal probability, we
see that
\[
p_{11} = \frac{1}{4} \qquad p_{12} = \frac{1}{4} \qquad p_{13} = \frac{1}{4} \qquad p_{14} = \frac{1}{4}.
\]
It is now straightforward to verify that

\[
\begin{array}{llll}
p_{21} = \frac{1}{2} & p_{22} = \frac{1}{2} & p_{23} = 0 & p_{24} = 0 \\[6pt]
p_{31} = 0 & p_{32} = 0 & p_{33} = \frac{1}{2} & p_{34} = \frac{1}{2} \\[6pt]
p_{41} = 0 & p_{42} = \frac{1}{3} & p_{43} = \frac{1}{3} & p_{44} = \frac{1}{3}.
\end{array}
\]

Putting these probabilities together yields the transition matrix:


 1 1 1 1 
 4 4 4 4 
 1 1 
 0 0 
{E:Pexamp} P = 2 2 (4.8.1*)
 
 0 1 1 
0 
2 2
 
1 1 1
 
0
3 3 3
This transition matrix has the properties that all entries are nonnegative and that the
entries in each row sum to 1.

Three Basic Questions Using the transition matrix P , we discuss the answers to three
questions:

(A) What is the probability that a cat starting in room i will be in room j after exactly k
steps? We call the movement that occurs after each hour a step.

(B) Suppose that we put 100 cats in the apartment with some initial distribution of cats in
each room. What will the distribution of cats look like after a large number of steps?

(C) Suppose that a cat is initially in room i and takes a large number of steps. For how
many of those steps will the cat be expected to be in room j?


A Discussion of Question (A) We begin to answer Question (A) by determining the proba-
bility that the cat moves from room 1 to room 4 in two steps. We denote this probability
by $p_{14}^{(2)}$ and compute
{E:prob14}
\[
p_{14}^{(2)} = p_{11}p_{14} + p_{12}p_{24} + p_{13}p_{34} + p_{14}p_{44}; \tag{4.8.2}
\]
that is, the probability is the sum of the probabilities that the cat will move from room 1
to each room i and then from room i to room 4. In this case the answer is:
\[
p_{14}^{(2)} = \frac{1}{4}\times\frac{1}{4} + \frac{1}{4}\times 0 + \frac{1}{4}\times\frac{1}{2} + \frac{1}{4}\times\frac{1}{3} = \frac{13}{48} \approx 0.27.
\]

It follows from (4.8.2) and the definition of matrix multiplication that $p_{14}^{(2)}$ is just the $(1,4)$th entry in the matrix $P^2$. An induction argument shows that the probability of the cat moving from room i to room j in k steps is precisely the $(i,j)$th entry in the matrix $P^k$ — which
answers Question (A). In particular, we can answer the question: What is the probability
that the cat will move from room 4 to room 3 in four steps? Using MATLAB the answer is
given by typing e4_10_1 to recall the matrix P and then typing

P4 = P^4;
P4(4,3)

obtaining

ans =
0.2728
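If the script e4_10_1 is not at hand, the transition matrix (4.8.1*) can be typed in directly; the following is a minimal sketch of the same computation.

P = [1/4 1/4 1/4 1/4;
     1/2 1/2 0   0  ;
     0   0   1/2 1/2;
     0   1/3 1/3 1/3];
P4 = P^4;
P4(4,3)          % probability of moving from room 4 to room 3 in four steps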

A Discussion of Question (B) We answer Question (B) in two parts: first we compute a
formula for determining the number of cats that are expected to be in room i after k steps,
and second we explore that formula numerically for large k. We begin by supposing that
100 cats are distributed in the rooms according to the initial vector V0 = (v1 , v2 , v3 , v4 )t ;
that is, the number of cats initially in room i is $v_i$. Next, we denote the number of cats that are expected to be in room i after k steps by $v_i^{(k)}$. For example, we determine how many cats we expect to be in room 2 after one step. That number is:

{E:probt2}
\[
v_2^{(1)} = p_{12}v_1 + p_{22}v_2 + p_{32}v_3 + p_{42}v_4; \tag{4.8.3}
\]

that is, $v_2^{(1)}$ is the sum of the proportion of cats in each room i that are expected to migrate to room 2 in one step. In this case, the answer is:
\[
\frac{1}{4}v_1 + \frac{1}{2}v_2 + \frac{1}{3}v_4.
\]


It now follows from (4.8.3), the definition of the transpose of a matrix, and the definition of matrix multiplication that $v_2^{(1)}$ is the 2nd entry in the vector $P^tV_0$. Indeed, it follows by induction that $v_i^{(k)}$ is the ith entry in the vector $(P^t)^kV_0$, which answers the first part of Question (B).
We may rephrase the second part of Question (B) as follows. Let

\[
V_k = (v_1^{(k)}, v_2^{(k)}, v_3^{(k)}, v_4^{(k)})^t = (P^t)^kV_0.
\]

Question (B) actually asks: What will the vector Vk look like for large k. To answer that
question we need some results about matrices like the matrix P in (4.8.1*). But first we
explore the answer to this question numerically using MATLAB.
Suppose, for example, that the initial vector is
 
{MATLAB:9}
\[
V_0 = \begin{pmatrix} 2 \\ 43 \\ 21 \\ 34 \end{pmatrix}. \tag{4.8.4*}
\]

Typing e4_10_1 and e4_10_4 enters the matrix P and the initial vector V0 into MATLAB.
To compute V20 , the distribution of cats after 20 steps, type

Q=P'
V20 = Q^(20)*V0

and obtain

V20 =
18.1818
27.2727
27.2727
27.2727

Thus, after rounding to the nearest integer, we expect 27 cats to be in each of rooms 2,3 and
4 and 18 cats to be in room 1 after 20 steps. In fact, the vector V20 has a remarkable feature.
Compute Q*V20 in MATLAB and see that V20 = P t V20 ; that is, V20 is, to within four digit
numerical precision, an eigenvector of P t with eigenvalue equal to 1. This computation
was not a numerical accident, as we now describe. Indeed, compute V20 for several initial
distributions V0 of cats and see that the answer will always be the same — up to four digit
accuracy.
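Instead of iterating, one can also ask MATLAB directly for the eigenvector of P' with eigenvalue 1 and rescale it to a population of 100 cats. This is a hedged sketch, not a command sequence from the text; it assumes P is in the workspace, entered as in the sketch above or via e4_10_1.

[V,D] = eig(P');                 % P' is the transpose of P
[~,j] = min(abs(diag(D) - 1));   % locate the eigenvalue closest to 1
v = V(:,j);
100*v/sum(v)                     % should agree with V20 to several digits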


A Discussion of Question (C) Suppose there is just one cat in the apartment; and we ask
how many times that cat is expected to visit room 3 in 100 steps. Suppose the cat starts
in room 1; then the initial distribution of cats is one cat in room 1 and zero cats in any of
the other rooms. So V0 = e1 . In our discussion of Question (B) we saw that the 3rd entry
in (P t )k V0 gives the probability ck that the cat will be in room 3 after k steps.
In the extreme, suppose that the probability that the cat will be in room 3 is 1 for each step
k. Then the fraction of the time that the cat is in room 3 is

(1 + 1 + · · · + 1)/100 = 1.

In general, the fraction of the time f that the cat will be in room 3 during a span of 100
steps is
\[
f = \frac{1}{100}(c_1 + c_2 + \cdots + c_{100}).
\]
Since $c_k = (P^t)^kV_0$, we see that
{E:f}
\[
f = \frac{1}{100}\left(P^tV_0 + (P^t)^2V_0 + \cdots + (P^t)^{100}V_0\right). \tag{4.8.5}
\]

So, to answer Question (C), we need a way to sum the expression for f in (4.8.5), at least
approximately. This is not an easy task — though the answer itself is easy to explain. Let
V be the eigenvector of P t with eigenvalue 1 such that the sum of the entries in V is 1. The
answer is: f is approximately equal to V . See Theorem 4.8.4 for a more precise statement.
In our previous calculations the vector V20 was seen to be (approximately) an eigenvector
of P t with eigenvalue 1. Moreover the sum of the entries in V20 is precisely 100. Therefore,
we normalize V20 to get V by setting
\[
V = \frac{1}{100}V_{20}.
\]
So, the fraction of time that the cat spends in room 3 is f ≈ 0.2727. Indeed, we expect
the cat to spend approximately 27% of its time in each of rooms 2, 3, and 4, and about 18% of its time in room 1.

Markov Matrices We now abstract the salient properties of our cat example. A Markov
chain is a system with a finite number of states labeled 1,…,n along with probabilities pij
of moving from site i to site j in a single step. The Markov assumption is that these
probabilities depend only on the site that you are in and not on how you got there. In our
example, we assumed that the probability of the cat moving from say room 2 to room 4 did
not depend on how the cat got to room 2 in the first place.


We make a second assumption: there is a k such that it is possible to move from any site
i to any site j in exactly k steps. This assumption is not valid for general Markov chains,
though it is valid for the cat example, since it is possible to move from any room to any
other room in that example in exactly three steps. (It takes a minimum of three steps to
get from room 3 to room 1 in the cat example.) To simplify our discussion we include this
assumption in our definition of a Markov chain.
{D:Markov}
Definition 4.8.1. Markov matrices are square matrices P such that

(a) all entries in P are nonnegative,

(b) the entries in each row of P sum to 1, and

(c) there is a positive integer k such that all of the entries in P k are positive.

It is straightforward to verify that parts (a) and (b) in the definition of Markov matrices
are satisfied by the transition matrix
 
\[
P = \begin{pmatrix} p_{11} & \cdots & p_{1n} \\ \vdots & \ddots & \vdots \\ p_{n1} & \cdots & p_{nn} \end{pmatrix}
\]

of a Markov chain. To verify part (c) requires further discussion.


{T:Markoveasy}
Proposition 4.8.2. Let P be a transition matrix for a Markov chain.

(a) The probability of moving from site i to site j in exactly k steps is the (i, j)th entry in
the matrix P k .

(b) The expected number of individuals at site i after exactly k steps is the ith entry in the
vector Vk ≡ (P t )k V0 .

(c) P is a Markov matrix.

Proof Only minor changes in our discussion of the cat example are needed to prove parts (a) and (b) of the proposition.
(c) The assumption that it is possible to move from each site i to each site j in exactly k
steps means that the (i, j)th entry of P k is positive. For that k, all of the entries of P k are
positive. In the cat example, all entries of P 3 are positive. 


Proposition 4.8.2 gives the answer to Question (A) and the first part of Question (B) for
general Markov chains.
Let $v_i^{(0)} \ge 0$ be the number of individuals initially at site i, and let $V_0 = (v_1^{(0)}, \ldots, v_n^{(0)})^t$. The total number of individuals in the initial population is:
\[
\#(V_0) = v_1^{(0)} + \cdots + v_n^{(0)}.
\]
{T:Markov}
Theorem 4.8.3. Let P be a Markov matrix. Then

(a) #(Vk ) = #(V0 ); that is, the number of individuals after k time steps is the same as the
initial number.
(b) $V = \lim_{k\to\infty} V_k$ exists and $\#(V) = \#(V_0)$.

(c) V is an eigenvector of P t with eigenvalue equal to 1.

Proof (a) By induction it is sufficient to show that #(V1 ) = #(V0 ). We do this by


calculating from V1 = P t V0 that
\[
\begin{aligned}
\#(V_1) &= v_1^{(1)} + \cdots + v_n^{(1)} \\
&= (p_{11}v_1^{(0)} + \cdots + p_{n1}v_n^{(0)}) + \cdots + (p_{1n}v_1^{(0)} + \cdots + p_{nn}v_n^{(0)}) \\
&= (p_{11} + \cdots + p_{1n})v_1^{(0)} + \cdots + (p_{n1} + \cdots + p_{nn})v_n^{(0)} \\
&= v_1^{(0)} + \cdots + v_n^{(0)}
\end{aligned}
\]

since the entries in each row of P sum to 1. Thus #(V1 ) = #(V0 ), as claimed.
(b) The hard part of this theorem is proving that the limiting vector V exists; we give a
proof of this fact in Chapter 11, Theorem 11.4.4. Once V exists it follows directly from (a)
that #(V ) = #(V0 ).
(c) Just calculate that

\[
P^tV = P^t\left(\lim_{k\to\infty} V_k\right) = P^t\left(\lim_{k\to\infty} (P^t)^kV_0\right) = \lim_{k\to\infty} (P^t)^{k+1}V_0 = \lim_{k\to\infty} (P^t)^kV_0 = V,
\]

which proves (c). 

Theorem 4.8.3(b) gives the answer to the second part of Question (B) for general Markov
chains. Next we discuss Question (C).


{T:ergodic}
Theorem 4.8.4. Let P be a Markov matrix. Let V be the eigenvector of P t with eigenvalue
1 and #(V ) = 1. Then after a large number of steps N the expected number of times an
individual will visit site i is N vi where vi is the ith entry in V .

Sketch In our discussion of Question (C) for the cat example, we explained why the
fraction fN of time that an individual will visit site j when starting initially at site i is the
j th entry in the sum
\[
f_N = \frac{1}{N}\left(P^t + (P^t)^2 + \cdots + (P^t)^N\right)e_i.
\]
See (4.8.5). The proof of this theorem involves being able to calculate the limit of fN as
N → ∞. There are two main ideas. First, the limit of the matrix (P t )N exists as N
approaches infinity — call that limit Q. Moreover, Q is a matrix all of whose columns equal
V . Second, for large N , the sum

P t + (P t )2 + · · · + (P t )N ≈ Q + Q + · · · + Q = N Q,

so that the limit of the fN is Qei = V .


The verification of these statements is beyond the scope of this text. For those interested,
the idea of the proof of the second part is roughly the following. Fix k large enough so that
$(P^t)^k$ is close to Q. Then when N is large, much larger than k, the contribution of the first k terms in the series to $f_N$ is negligible.

Theorem 4.8.4 gives the answer to Question (C) for a general Markov chain. It follows
from Theorem 4.8.4 that for Markov chains the amount of time that an individual spends
in room i is independent of the individual’s initial room — at least after a large number of
steps.
A complete proof of this theorem relies on a result known as the ergodic theorem. Roughly
speaking, the ergodic theorem relates space averages with time averages. To see how this
point is relevant, note that Question (B) deals with the issue of how a large number of
individuals will be distributed in space after a large number of steps, while Question (C)
deals with the issue of how the path of a single individual will be distributed in time after
a large number of steps.

An Example of Umbrellas This example focuses on the utility of answering Question (C)
and reinforces the fact that results in Theorem 4.8.3 have the second interpretation given
in Theorem 4.8.4.


Consider the problem of a man with four umbrellas. If it is raining in the morning when
the man is about to leave for his office, then the man takes an umbrella from home to
office, assuming that he has an umbrella at home. If it is raining in the afternoon, then the
man takes an umbrella from office to home, assuming that he has an umbrella in his office.
Suppose that the probability that it will rain in the morning is p = 0.2 and the probability
that it will rain in the afternoon is q = 0.3, and these probabilities are independent. What
percentage of days will the man get wet going from home to office; that is, what percentage
of the days will the man be at home on a rainy morning with all of his umbrellas at the
office?
There are five states in the system depending on the number of umbrellas that are at home.
Let si where 0 ≤ i ≤ 4 be the state with i umbrellas at home and 4 − i umbrellas at work.
For example, s2 is the state of having two umbrellas at home and two at the office. Let P be
the 5 × 5 transition matrix of state changes from morning to afternoon and Q be the 5 × 5
transition matrix of state changes from afternoon to morning. For example, the probability
p23 of moving from site s2 to site s3 is 0, since it is not possible to have more umbrellas
at home after going to work in the morning. The probability q23 = q, since the number of
umbrellas at home will increase by one only if it is raining in the afternoon. The transition
probabilities between all states are given in the following transition matrices:
 
\[
P = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
p & 1-p & 0 & 0 & 0 \\
0 & p & 1-p & 0 & 0 \\
0 & 0 & p & 1-p & 0 \\
0 & 0 & 0 & p & 1-p
\end{pmatrix};
\qquad
Q = \begin{pmatrix}
1-q & q & 0 & 0 & 0 \\
0 & 1-q & q & 0 & 0 \\
0 & 0 & 1-q & q & 0 \\
0 & 0 & 0 & 1-q & q \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}.
\]

Specifically,
 
{MATLAB:10}
\[
P = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
0.2 & 0.8 & 0 & 0 & 0 \\
0 & 0.2 & 0.8 & 0 & 0 \\
0 & 0 & 0.2 & 0.8 & 0 \\
0 & 0 & 0 & 0.2 & 0.8
\end{pmatrix} \tag{4.8.6*}
\]


 
\[
Q = \begin{pmatrix}
0.7 & 0.3 & 0 & 0 & 0 \\
0 & 0.7 & 0.3 & 0 & 0 \\
0 & 0 & 0.7 & 0.3 & 0 \\
0 & 0 & 0 & 0.7 & 0.3 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}
\]

The transition matrix M for moving from state si on one morning to state sj the next
morning is just M = P Q. We can compute this matrix using MATLAB by typing

e4_10_6
M = P*Q

obtaining

M =
0.7000 0.3000 0 0 0
0.1400 0.6200 0.2400 0 0
0 0.1400 0.6200 0.2400 0
0 0 0.1400 0.6200 0.2400
0 0 0 0.1400 0.8600

It is easy to check using MATLAB that all entries in the matrix M 4 are nonzero. So M is
a Markov matrix and we can use Theorem 4.8.4 to find the limiting distribution of states.
Start with some initial condition like V0 = (0, 0, 1, 0, 0)t corresponding to the state in which
two umbrellas are at home and two at the office. Then compute the vectors Vk = (M t )k V0
until arriving at an eigenvector of M t with eigenvalue 1. For example, V70 is computed by
typing V70 = M'^(70)*V0 and obtaining

V70 =
0.0419
0.0898
0.1537
0.2633
0.4512

We interpret V ≈ V70 in the following way. Since v1 is approximately .042, it follows that
for approximately 4.2% of all steps the umbrellas are in state s0 . That is, approximately
4.2% of all days there are no umbrellas at home. The probability that it will rain in the
morning on one of those days is 0.2. Therefore, the probability of being at home in the
morning when it is raining without any umbrellas is approximately 0.008.
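The umbrella computation can be collected into one script. The following is a hedged sketch assuming p = 0.2 and q = 0.3 as in the text; the starting state (two umbrellas at home) is the one used above, and by Theorem 4.8.3 the limit does not depend on it.

p = 0.2; q = 0.3;
P = [1 0 0 0 0; p 1-p 0 0 0; 0 p 1-p 0 0; 0 0 p 1-p 0; 0 0 0 p 1-p];
Q = [1-q q 0 0 0; 0 1-q q 0 0; 0 0 1-q q 0; 0 0 0 1-q q; 0 0 0 0 1];
M = P*Q;
V = (M')^70*[0;0;1;0;0];   % approximate eigenvector of M' with eigenvalue 1
p*V(1)                     % probability of a rainy morning with no umbrella at home, about 0.008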


Exercises

{c4.10.1}
1. Let P be a Markov matrix and let w = (1, . . . , 1)t . Show that the vector w is an eigenvector of
P with eigenvalue 1.
Let
\[
P = \begin{pmatrix} p_{11} & \cdots & p_{1n} \\ \vdots & \ddots & \vdots \\ p_{n1} & \cdots & p_{nn} \end{pmatrix}.
\]
Then,
\[
Pw = \begin{pmatrix} p_{11} & \cdots & p_{1n} \\ \vdots & \ddots & \vdots \\ p_{n1} & \cdots & p_{nn} \end{pmatrix}\begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} = \begin{pmatrix} p_{11} + \cdots + p_{1n} \\ \vdots \\ p_{n1} + \cdots + p_{nn} \end{pmatrix} = \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}
\]


since the sum of the entries in each row of a Markov matrix is 1.

In Exercises 2 – 4 which of the matrices are Markov matrices, and why?


{c4.10.2a}
 
2. $P = \begin{pmatrix} 0.8 & 0.2 \\ 0.2 & 0.8 \end{pmatrix}$.
Answer: Matrix P is a Markov matrix.
Solution: The entries in each row of P sum to 1, and all the entries of P are positive. Thus, P
satisfies Definition 4.8.1 of Markov matrices.
{c4.10.2b}
 
3. $Q = \begin{pmatrix} 0.8 & 0.2 \\ 0 & 1 \end{pmatrix}$.
Matrix Q is not a Markov matrix, since there is no positive integer k for which Qk (2, 1) > 0.
{c4.10.2c}
 
4. $R = \begin{pmatrix} 0.8 & 0.2 \\ -0.2 & 1.2 \end{pmatrix}$.
Matrix R is not a Markov matrix, since R(2, 1) is negative.

{c4.10.3}
5. The state diagram of a Markov chain is given in Figure 18. Assume that each arrow leaving a
state has equal probability of being chosen. Find the transition matrix for this chain.


Let P be the transition matrix associated with Figure 18. Then,

 1 1 1 
0 0
 3 3 3 
 1 1 1 

 3 0 0 
3 3 
 1 1 
P =
 0 0 .
0 
2 2

 0 1 1 
0 0 
2 2 
 
1 1 1 1

0
4 4 4 4

{F:Mchain} Figure 18: State diagram of a Markov chain.

{c4.10.4}
6. Suppose that P and Q are each n × n matrices whose rows sum to 1. Show that P Q is also an
n × n matrix whose rows sum to 1.
Let
\[
P = \begin{pmatrix} p_{11} & \cdots & p_{1n} \\ \vdots & \ddots & \vdots \\ p_{n1} & \cdots & p_{nn} \end{pmatrix} \quad\text{and}\quad Q = \begin{pmatrix} q_{11} & \cdots & q_{1n} \\ \vdots & \ddots & \vdots \\ q_{n1} & \cdots & q_{nn} \end{pmatrix}.
\]


Then,
\[
PQ = \begin{pmatrix} p_{11}q_{11} + \cdots + p_{1n}q_{n1} & \cdots & p_{11}q_{1n} + \cdots + p_{1n}q_{nn} \\ \vdots & \ddots & \vdots \\ p_{n1}q_{11} + \cdots + p_{nn}q_{n1} & \cdots & p_{n1}q_{1n} + \cdots + p_{nn}q_{nn} \end{pmatrix}.
\]


So, the sum of the entries in the ith row is

\[
\begin{aligned}
(p_{i1}q_{11} + \cdots + p_{in}q_{n1}) + \cdots + (p_{i1}q_{1n} + \cdots + p_{in}q_{nn}) &= p_{i1}(q_{11} + \cdots + q_{1n}) + \cdots + p_{in}(q_{n1} + \cdots + q_{nn}) \\
&= p_{i1} + \cdots + p_{in} = 1.
\end{aligned}
\]

{c4.10.5} 7. (matlab) Suppose the apartment in Figure 17 is populated by dogs rather than cats. Suppose
that dogs will actually move when told; that is, at each step a dog will move from the room that
he occupies to another room.

(a) Calculate the transition matrix PDOG for this Markov chain and verify that PDOG is a Markov
matrix.
(b) Find the probability that a dog starting in room 2 will end up in room 3 after 5 steps.
(c) Find the probability that a dog starting in room 3 will end up in room 1 after 4 steps. Explain
why your answer is correct without using MATLAB.
(d) Suppose that the initial population consists of 100 dogs. After a large number of steps what
will be the distribution of the dogs in the four rooms.

(a) Answer: Let D =PDOG be the transition matrix for this Markov chain:
 1 1 1 
0
 3 3 3 
 1 0 0 0 
D=  0 0 0
.
1 
1 1
 
0 0
2 2

Solution: All entries of D are nonnegative and the entries of each row of D sum to 1. The MAT-
LAB command D^5 verifies that all entries of the matrix D5 are positive. Therefore, Definition 4.8.1
is valid for D.
(b) The probability that a dog starting in room 2 will end up in room 3 after 5 steps is element
$(2,3)$ of the matrix $D^5$, or $\frac{7}{36}$.
(c) Answer: The probability that a dog starting in room 3 will end up in room 1 after 4 steps is 0.
Solution: The only way a dog in room 3 can get to room 1 is by going from room 3 to room 4,
then to room 2, then to room 1, which takes three steps. The dog must then leave room 1 on the


fourth step, and there is no other combination of steps by which the dog could go to room 1 in four
steps.
(d) Answer: After a large number of steps, there will be approximately 23 dogs in each of rooms
1, 2, and 3, and 31 dogs in room 4.
Solution: Using MATLAB , evaluate Dk for large values of k.
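A hedged MATLAB sketch of the computations for this exercise (the transition matrix comes from the answer in part (a); the uniform initial distribution in the last two lines is an assumption, since by Theorem 4.8.3 the limit does not depend on it):

D = [0 1/3 1/3 1/3; 1 0 0 0; 0 0 0 1; 0 1/2 1/2 0];
D5 = D^5;
D5(2,3)                    % part (b): equals 7/36
V0 = [25; 25; 25; 25];     % assumed initial distribution of the 100 dogs
(D')^100*V0                % part (d): approximately (23, 23, 23, 31)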

{c4.10.6} 8. (matlab) A truck rental company has locations in three cities A, B and C. Statistically, the
company knows that the trucks rented at one location will be returned in one week to the three
locations in the following proportions.

Rental Location Returned to A Returned to B Returned to C


A 75% 10% 15%
B 5% 85% 10%
C 20% 20% 60%

Suppose that the company has 250 trucks. How should the company distribute the trucks so that
the number of trucks available at each location remains approximately constant from one week to
the next?
Answer: The company should put 70 trucks in city A, 123 trucks in city B, and 57 trucks in city
C.
Solution: Given initial truck distribution V0 = (VA , VB , VC ), the distribution after k weeks is
(P t )k V0 , where P is the transition matrix for the movement of trucks. As k → ∞, (P t )k approaches

\[
\begin{pmatrix}
0.2807 & 0.2807 & 0.2807 \\
0.4912 & 0.4912 & 0.4912 \\
0.2281 & 0.2281 & 0.2281
\end{pmatrix}.
\]

So, if the distribution of trucks is to remain constant, then, for large values of k,
     
\[
\begin{pmatrix} V_A \\ V_B \\ V_C \end{pmatrix} = (P^t)^kV_0 = \begin{pmatrix} 0.2807(V_A + V_B + V_C) \\ 0.4912(V_A + V_B + V_C) \\ 0.2281(V_A + V_B + V_C) \end{pmatrix} = \begin{pmatrix} 70.1754 \\ 122.8070 \\ 57.0175 \end{pmatrix}.
\]
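A hedged sketch of the computation in MATLAB, with the transition matrix entered from the table (rows indexed by rental location A, B, C):

P = [0.75 0.10 0.15; 0.05 0.85 0.10; 0.20 0.20 0.60];
Pinf = (P')^100;           % all columns converge to the same vector
250*Pinf(:,1)              % approximately (70, 123, 57) trucks at A, B, C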

{c4.10.7} 9. (matlab) Let
{MATLAB:12}
\[
P = \begin{pmatrix}
0.10 & 0.20 & 0.30 & 0.15 & 0.25 \\
0.05 & 0.35 & 0.10 & 0.40 & 0.10 \\
0 & 0 & 0.35 & 0.55 & 0.10 \\
0.25 & 0.25 & 0.25 & 0.25 & 0 \\
0.33 & 0.32 & 0 & 0 & 0.35
\end{pmatrix} \tag{4.8.7*}
\]
0.33 0.32 0 0 0.35
be the transition matrix of a Markov chain.


(a) What is the probability that an individual at site 2 will move to site 5 in three steps?
(b) What is the probability that an individual at site 4 will move to site 1 in seven steps?
(c) Suppose that 100 individuals are initially uniformly distributed at the five sites. How will the
individuals be distributed after four steps?
(d) Find an eigenvector of P t with eigenvalue 1.

(a) The probability that an individual at site 2 will move to site 5 in three steps is 11.62%, that is,
element (2, 5) of P 3 .
(b) The probability that an individual at site 4 will move to site 1 in seven steps is 14.07%, that
is, element (4, 1) of P 7 .
(c) After four steps, the distribution will be
   
\[
(P^t)^4\begin{pmatrix} 20 \\ 20 \\ 20 \\ 20 \\ 20 \end{pmatrix} = \begin{pmatrix} 14.1466 \\ 22.1620 \\ 21.2874 \\ 30.0624 \\ 12.3417 \end{pmatrix}.
\]

(d) Answer: Let V be an eigenvector of P t with eigenvalue 1.


 
\[
V = \begin{pmatrix} 0.1408 \\ 0.2195 \\ 0.2154 \\ 0.3032 \\ 0.1211 \end{pmatrix}.
\]

Solution: MATLAB can be used to verify that P t V = V . To find this eigenvector, evaluate the
matrix (P t )k for large values of k. The vector V is any column of this matrix.
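A hedged sketch of the MATLAB commands behind these answers (it assumes P has already been entered from (4.8.7*), for example by typing the rows directly):

P3 = P^3;  P3(2,5)                  % part (a), about 0.1162
P7 = P^7;  P7(4,1)                  % part (b), about 0.1407
(P')^4*[20;20;20;20;20]             % part (c)
(P')^50*[20;20;20;20;20]/100        % part (d): approximate eigenvector of P' with sum 1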

{c4.10.8} 10. (matlab) Suppose that the probability that it will rain in the morning is p = 0.3 and the
probability that it will rain in the afternoon is q = 0.25. In the man with umbrellas example, what
is the probability that the man will be at home with no umbrellas while it is raining?
Answer: The probability that the man will be at home in the morning with no umbrella while it
is raining is 7.19%.
Solution: Create the matrix P which relates the number of umbrellas in the house in the morning
to the number in the house in the afternoon and the matrix Q which relates the number of umbrellas
in the house in the afternoon to the number there the next morning, then compute R = P*Q:

R =


0.7500 0.2500 0 0 0
0.2250 0.6000 0.1750 0 0
0 0.2250 0.6000 0.1750 0
0 0 0.2250 0.6000 0.1750
0 0 0 0.2250 0.7750

Find the eigenvector V of R by computing (Rt )k for large values of k. The first element of V is
0.2398, the probability that there will be no umbrella in the house on a given morning. Multiply
this value by the probability that it will rain on any given morning to get the probability that the
man will have no umbrella on a rainy morning.
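A hedged sketch of one way to carry out this computation in MATLAB, using the same construction of P and Q as in the text with p = 0.3 and q = 0.25 (the starting state is an assumption; the limit does not depend on it):

p = 0.3; q = 0.25;
P = [1 0 0 0 0; p 1-p 0 0 0; 0 p 1-p 0 0; 0 0 p 1-p 0; 0 0 0 p 1-p];
Q = [1-q q 0 0 0; 0 1-q q 0 0; 0 0 1-q q 0; 0 0 0 1-q q; 0 0 0 0 1];
R = P*Q;
V = (R')^100*[0;0;1;0;0];  % limiting distribution of states
p*V(1)                     % probability of a rainy morning with no umbrella, about 0.0719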

{c4.10.9} 11. (matlab) Suppose that the original man in the text with umbrellas has only three umbrellas
instead of four. What is the probability that on a given day he will get wet going to work?
Answer: The probability that the man will get wet going to work is 1.53%.
Solution: Create the 4 × 4 matrices P (the transition matrix between morning and afternoon) and
Q (the transition matrix between afternoon and the next morning), then multiply to obtain

R =
0.7000 0.3000 0 0
0.1400 0.6200 0.2400 0
0 0.1400 0.6200 0.2400
0 0 0.1400 0.8600

Find the eigenvector V of R. The first element of V is 0.0763, the probability that there will be no
umbrella in the house on a given morning. Multiply this by the probability that it will rain in the
morning to obtain the solution.


5 Vector Spaces
In Chapter 2 we discussed how to solve systems of m linear equations in n unknowns. We
found that solutions of these equations are vectors (x1 , . . . , xn ) ∈ Rn . In Chapter 3 we
discussed how the notation of matrices and matrix multiplication drastically simplifies the
presentation of linear systems and how matrix multiplication leads to linear mappings. We
also discussed briefly how linear mappings lead to methods for solving linear systems —
superposition, eigenvectors, inverses. In Chapter 4 we discussed how to solve systems of
n linear differential equations in n unknown functions. These chapters have provided an
introduction to many of the ideas of linear algebra and now we begin the task of formalizing
these ideas.
Sets having the two operations of vector addition and scalar multiplication are called vector
spaces. This concept is introduced in Section 5.1 along with the two primary examples —
the set Rn in which solutions to systems of linear equations sit and the set C 1 of differentiable
functions in which solutions to systems of ordinary differential equations sit. Solutions to
systems of homogeneous linear equations form subspaces of Rn and solutions of systems of
linear differential equations form subspaces of C 1 . These issues are discussed in Sections 5.1
and 5.2.
When we solve a homogeneous system of equations, we write every solution as a superpo-
sition of a finite number of specific solutions. Abstracting this process is one of the main
points of this chapter. Specifically, we show that every vector in many commonly occurring
vector spaces (in particular, the subspaces of solutions) can be written as a linear combina-
tion (superposition) of a few solutions. The minimum number of solutions needed is called
the dimension of that vector space. Sets of vectors that generate all solutions by superpo-
sition and that consist of that minimum number of vectors are called bases. These ideas
are discussed in detail in Sections 5.3–5.5. The proof of the main theorem (Theorem 5.5.3),
which gives a computable method for determining when a set is a basis, is given in Sec-
tion 5.6. This proof may be omitted on a first reading, but the statement of the theorem is
most important and must be understood.


Table 1: Properties of Vector Spaces: suppose u, v, w ∈ V and r, s ∈ R.

(A1) Addition is commutative v+w =w+v


(A2) Addition is associative (u + v) + w = u + (v + w)
(A3) Additive identity 0 exists v+0=v
(A4) Additive inverse −v exists v + (−v) = 0
(M1) Multiplication is associative (rs)v = r(sv)
(M2) Multiplicative identity exists 1v = v
(D1) Distributive law for scalars (r + s)v = rv + sv
(D2) Distributive law for vectors r(v + w) = rv + rw
{vectorspacelist}

{C:vectorspaces}

{S:5.1} 5.1 Vector Spaces and Subspaces


Vector spaces abstract the arithmetic properties of addition and scalar multiplication of
vectors. In Rn we know how to add vectors and to multiply vectors by scalars. Indeed, it is
straightforward to verify that each of the eight properties listed in Table 1 is valid for vectors
in V = Rn . Remarkably, sets that satisfy these eight properties have much in common with
Rn . So we define:
{vectorspace}
Definition 5.1.1. Let V be a set having the two operations of addition and scalar multi-
plication. Then V is a vector space if the eight properties listed in Table 5.1.1 hold. The
elements of a vector space are called vectors.

The vector 0 mentioned in (A3) in Table 1 is called the zero vector.


When we say that a vector space V has the two operations of addition and scalar multi-
plication we mean that the sum of two vectors in V is again a vector in V and the scalar
product of a vector with a number is again a vector in V . These two properties are called
closure under addition and closure under scalar multiplication.
In this discussion we focus on just two types of vector spaces: Rn and function spaces. The
reason that we make this choice is that solutions to linear equations are vectors in Rn while
solutions to linear systems of differential equations are vectors of functions.

An Example of a Function Space For example, let F denote the set of all functions f : R → R.
Note that functions like f1 (t) = t2 − 2t + 7 and f2 (t) = sin t are in F since they are defined


for all real numbers t, but that functions like $g_1(t) = \frac{1}{t}$ and $g_2(t) = \tan t$ are not in F since they are not defined for all t.
We can add two functions f and g by defining the function f + g to be:

(f + g)(t) = f (t) + g(t).

We can also multiply a function f by a scalar c ∈ R by defining the function cf to be:

(cf )(t) = cf (t).

With these operations of addition and scalar multiplication, F is a vector space; that is, F
satisfies the eight vector space properties in Table 1. More precisely:

(A3) Define the zero function O by

O(t) = 0 for all t ∈ R.

For every x in F the function O satisfies:

(x + O)(t) = x(t) + O(t) = x(t) + 0 = x(t).

Therefore, x + O = x and O is the additive identity in F.


(A4) Let x be a function in F and define y(t) = −x(t). Then y is also a function in F, and

(x + y)(t) = x(t) + y(t) = x(t) + (−x(t)) = 0 = O(t).

Thus, x has the additive inverse −x.

After these comments it is straightforward to verify that the remaining six properties in
Table 1 are satisfied by functions in F.

Sets that are not Vector Spaces It is worth considering how closure under vector addition
and scalar multiplication can fail. Consider the following three examples.

(i) Let V1 be the set that consists of just the x and y axes in the plane. Since (1, 0) and
(0, 1) are in V1 but
(1, 0) + (0, 1) = (1, 1)
is not in V1 , we see that V1 is not closed under vector addition. On the other hand, V1
is closed under scalar multiplication.


(ii) Let V2 be the set of all vectors (k, ℓ) ∈ R2 where k and ℓ are integers. The set V2 is closed under addition but not under scalar multiplication since $\frac{1}{2}(1, 0) = (\frac{1}{2}, 0)$ is not in V2.

(iii) Let V3 = [1, 2] be the closed interval in R. The set V3 is neither closed under addition
(1 + 1.5 = 2.5 ∉ V3) nor under scalar multiplication (4 · 1.5 = 6 ∉ V3). Hence the set V3 is not closed under vector addition and not closed under scalar multiplication.

Subspaces
{subspaces}
Definition 5.1.2. Let V be a vector space. A nonempty subset W ⊂ V is a subspace if W
is a vector space using the operations of addition and scalar multiplication defined on V .

Note that in order for a subset W of a vector space V to be a subspace it must be closed
under addition and closed under scalar multiplication. That is, suppose w1 , w2 ∈ W and
r ∈ R. Then

(i) w1 + w2 ∈ W , and

(ii) rw1 ∈ W .

The x-axis and the xz-plane are examples of subsets of R3 that are closed under addition and
closed under scalar multiplication. Every vector on the x-axis has the form (a, 0, 0) ∈ R3 .
The sum of two vectors (a, 0, 0) and (b, 0, 0) on the x-axis is (a + b, 0, 0) which is also on
the x-axis. The x-axis is also closed under scalar multiplication as r(a, 0, 0) = (ra, 0, 0), and
the x-axis is a subspace of R3 . Similarly, every vector in the xz-plane in R3 has the form
(a1 , 0, a3 ). As in the case of the x-axis, it is easy to verify that this set of vectors is closed
under addition and scalar multiplication. Thus, the xz-plane is also a subspace of R3 .
In Theorem 5.1.4 we show that every subset of a vector space that is closed under addition
and scalar multiplication is a subspace. To verify this statement, we need the following
lemma in which some special notation is used. Typically, we use the same notation 0 to
denote the real number zero and the zero vector. In the following lemma it is convenient to
distinguish the two different uses of 0, and we write the zero vector in boldface.
{lem:AddId}
Lemma 5.1.3. Let V be a vector space, and let 0 ∈ V be the zero vector. Then

0v = 0 and (−1)v = −v

for every vector in v ∈ V .


Proof Let v be a vector in V and use (D1) to compute

0v + 0v = (0 + 0)v = 0v.

By (A4) the vector 0v has an additive inverse −0v. Adding −0v to both sides yields

(0v + 0v) + (−0v) = 0v + (−0v) = 0.

Associativity of addition (A2) now implies

0v + (0v + (−0v)) = 0.

A second application of (A4) implies that

0v + 0 = 0

and (A3) implies that 0v = 0.


Next, we show that the additive inverse −v of a vector v is unique. That is, if v + a = 0,
then a = −v.
Before beginning the proof, note that commutativity of addition (A1) together with (A3)
implies that 0 + v = v. Similarly, (A1) and (A4) imply that −v + v = 0.
To prove uniqueness of additive inverses, add −v to both sides of the equation v + a = 0
yielding
−v + (v + a) = −v + 0.
Properties (A2) and (A3) imply

(−v + v) + a = −v.

But
(−v + v) + a = 0 + a = a.
Therefore a = −v, as claimed.
To verify that (−1)v = −v, we show that (−1)v is the additive inverse of v. Using (M1),
(D1), and the fact that 0v = 0, calculate

v + (−1)v = 1v + (−1)v = (1 − 1)v = 0v = 0.

Thus, (−1)v is the additive inverse of v and must equal −v, as claimed. 
{T:subspaces}
Theorem 5.1.4. Let W be a subset of the vector space V . If W is closed under addition
and closed under scalar multiplication, then W is a subspace.


Proof We have to show that W is a vector space using the operations of addition and
scalar multiplication defined on V . That is, we need to verify that the eight properties listed
in Table 1 are satisfied. Note that properties (A1), (A2), (M1), (M2), (D1), and (D2) are
valid for vectors in W since they are valid for vectors in V .
It remains to verify (A3) and (A4). Let w ∈ W be any vector. Since W is closed under
scalar multiplication, it follows that 0w and (−1)w are in W . Lemma 5.1.3 states that
0w = 0 and (−1)w = −w; it follows that 0 and −w are in W . Hence, properties (A3) and
(A4) are valid for vectors in W , since they are valid for vectors in V . 

Examples of Subspaces of Rn
{EX:subspaces}
Example 5.1.5. (a) Let V be a vector space. Then the subsets V and {0} are always
subspaces of V . A subspace W ⊂ V is proper if W ≠ 0 and W ≠ V .

(b) Lines through the origin are subspaces of Rn . Let w ∈ Rn be a nonzero vector and let
W = {rw : r ∈ R}. The set W is closed under addition and scalar multiplication and
is a subspace of Rn by Theorem 5.1.4. The subspace W is just a line through the origin
in Rn , since the vector rw points in the same direction as w when r > 0 and the exact
opposite direction when r < 0.

(c) Planes containing the origin are subspaces of R3 . To verify this point, let P be a plane
through the origin and let N be a vector perpendicular to P . Then P consists of all
vectors v ∈ R3 perpendicular to N ; using the dot-product (see Chapter 2, (2.2.3)) we
recall that such vectors satisfy the linear equation N · v = 0. By superposition, the set
of all solutions to this equation is closed under addition and scalar multiplication and
is therefore a subspace by Theorem 5.1.4.

In a sense that will be made precise all subspaces of Rn can be written as the span of a
finite number of vectors generalizing Example 5.1.5(b) or as solutions to a system of linear
equations generalizing Example 5.1.5(c).

Examples of Subspaces of the Function Space F Let P be the set of all polynomials in F.
The sum of two polynomials is a polynomial and the scalar multiple of a polynomial is a
polynomial. Thus, P is closed under addition and scalar multiplication, and P is a subspace
of F.
As a second example of a subspace of F, let C 1 be the set of all continuously differentiable
functions u : R → R. A function u is in C 1 if u and u0 exist and are continuous for all t ∈ R.
Examples of functions in C 1 are:


(i) Every polynomial p(t) = am tm + am−1 tm−1 + · · · + a1 t + a0 is in C 1 .


(ii) The function u(t) = eλt is in C 1 for each constant λ ∈ R.
(iii) The trigonometric functions u(t) = sin(λt) and v(t) = cos(λt) are in C 1 for each constant
λ ∈ R.
(iv) u(t) = t7/3 is twice differentiable everywhere and is in C 1 .

Equally there are many commonly used functions that are not in C 1 . Examples include:
(i) $u(t) = \frac{1}{t-5}$ is neither defined nor continuous at t = 5.
(ii) u(t) = |t| is not differentiable (at t = 0).
(iii) u(t) = csc(t) is neither defined nor continuous at t = kπ for any integer k.

The subset C 1 ⊂ F is a subspace and hence a vector space. The reason is simple. If x(t)
and y(t) are continuously differentiable, then
d dx dy
(x + y) = + .
dt dt dt
Hence x + y is differentiable and is in C 1 and C 1 is closed under addition. Similarly, C 1 is
closed under scalar multiplication. Let r ∈ R and let x ∈ C 1 . Then
d dx
(rx)(t) = r (t).
dt dt
Hence rx is differentiable and is in C 1 .

The Vector Space (C 1 )n Another example of a vector space that combines the features of
both Rn and C 1 is (C 1 )n . Vectors u ∈ (C 1 )n have the form
u(t) = (u1 (t), . . . , un (t)),
where each coordinate function uj (t) ∈ C 1 . Addition and scalar multiplication in (C 1 )n are
defined coordinatewise — just like addition and scalar multiplication in Rn . That is, let u, v
be in (C 1 )n and let r be in R, then
(u + v)(t) = (u1 (t) + v1 (t), . . . , un (t) + vn (t))
(ru)(t) = (ru1 (t), . . . , run (t)).
The set (C 1 )n satisfies the eight properties of vector spaces and is a vector space. Solutions
to systems of n linear ordinary differential equations are vectors in (C 1 )n .


Exercises

{c5.1.1}
1. Verify that the set V1 consisting of all scalar multiples of (1, −1, −2) is a subspace of R3 .
The set V1 ⊂ R3 is a subspace. The set V1 contains all vectors (a, −a, −2a), where a ∈ R. We can
show that it is closed under vector addition, since

a(1, −1, −2) + b(1, −1, −2) = (a + b)(1, −1, −2) ∈ V1

where a and b are scalars. The set V1 is closed under scalar multiplication since

b(a(1, −1, −2)) = (ba)(1, −1, −2) ∈ V1 .

{c5.1.2}
2. Let V2 be the set of all 2 × 3 matrices. Verify that V2 is a vector space.
The set V2 is a vector space. Let A and B be matrices in V2 :
   
a11 a12 a13 b11 b12 b13
A= and B = .
a21 a22 a23 b21 b22 b23

The set V2 is closed under vector addition because


 
a11 + b11 a12 + b12 a13 + b13
A+B = ∈ V2 ,
a21 + b21 a22 + b22 a23 + b23

and is closed under scalar multiplication because, for r ∈ R,


 
ra11 ra12 ra13
rA = ∈ V2 .
ra21 ra22 ra23

Next, verify that V2 satisfies the eight properties of vector spaces, shown in Table 1.

(A1) Addition is commutative in V2, since
\[
A + B = \begin{pmatrix} a_{11}+b_{11} & a_{12}+b_{12} & a_{13}+b_{13} \\ a_{21}+b_{21} & a_{22}+b_{22} & a_{23}+b_{23} \end{pmatrix} = B + A.
\]
(A2) Given a third matrix C ∈ V2,
\[
(A + B) + C = \begin{pmatrix} a_{11}+b_{11}+c_{11} & a_{12}+b_{12}+c_{12} & a_{13}+b_{13}+c_{13} \\ a_{21}+b_{21}+c_{21} & a_{22}+b_{22}+c_{22} & a_{23}+b_{23}+c_{23} \end{pmatrix} = A + (B + C).
\]

Thus, addition is associative.


(A3) The additive identity is the matrix
\[
0 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.
\]
(A4) For each matrix A ∈ V2, there exists an element −A such that A + (−A) = 0, where
\[
-A = \begin{pmatrix} -a_{11} & -a_{12} & -a_{13} \\ -a_{21} & -a_{22} & -a_{23} \end{pmatrix}.
\]

(M1) Scalar multiplication is associative in V2 since, for any r, s ∈ R and A ∈ V2,
\[
(rs)A = \begin{pmatrix} rsa_{11} & rsa_{12} & rsa_{13} \\ rsa_{21} & rsa_{22} & rsa_{23} \end{pmatrix} = r(sA).
\]
(M2) There exists a multiplicative identity in V2, since 1A = A for any A ∈ V2.
(D1) For any scalars r, s ∈ R, and any matrix A ∈ V2,
\[
(r + s)A = \begin{pmatrix} ra_{11}+sa_{11} & ra_{12}+sa_{12} & ra_{13}+sa_{13} \\ ra_{21}+sa_{21} & ra_{22}+sa_{22} & ra_{23}+sa_{23} \end{pmatrix} = rA + sA.
\]
(D2) For any scalar r ∈ R and any matrices A, B ∈ V2,
\[
r(A + B) = \begin{pmatrix} ra_{11}+rb_{11} & ra_{12}+rb_{12} & ra_{13}+rb_{13} \\ ra_{21}+rb_{21} & ra_{22}+rb_{22} & ra_{23}+rb_{23} \end{pmatrix} = rA + rB.
\]

Thus, V2 is a vector space, since it is closed under addition and scalar multiplication and satisfies
the eight properties of vector spaces.
{c5.1.3}
3. Let
\[
A = \begin{pmatrix} 1 & 1 & 0 \\ 1 & -1 & 1 \end{pmatrix}.
\]
Let V3 be the set of vectors x ∈ R3 such that Ax = 0. Verify that V3 is a subspace of R3 . Compare
V1 with V3 .
The set V3 is a subspace of R3 since the solution set to any equation Ax = 0 is a space. This is
demonstrated by the principle of superposition introduced in Section 3.4. Also, V3 = V1 .
We can show that V3 = V1 by row reducing to find the solutions to Ax = 0:
 1 

1 1 0
 1 0
−→  2 .
1 −1 1 1
0 1 −
2
So all vectors in V3 are of the form $x = s(-\frac{1}{2}, \frac{1}{2}, 1)$, where s ∈ R. The vector x is an element of V1 for each s.
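A hedged MATLAB check of this computation (null with the 'r' option returns a rational basis for the null space):

A = [1 1 0; 1 -1 1];
rref(A)              % the reduced echelon form used above
v = null(A,'r')      % the vector (-1/2, 1/2, 1)
v/v(1)               % a multiple of (1, -1, -2), so V3 = V1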


In Exercises 4 – 10 you are given a vector space V and a subset W . For each pair, decide whether
or not W is a subspace of V , and explain why.
{c5.1.4a}
4. V = R3 and W consists of vectors in R3 that have a 0 in their first component.
W is a subspace of V , since W is closed under addition and scalar multiplication.
{c5.1.4b}
5. V = R3 and W consists of vectors in R3 that have a 1 in their first component.
Answer: W is not a subspace of V .
Solution: The subset W is closed neither under addition nor under scalar multiplication. For
example, let w1 = (1, 4, 2) and w2 = (1, −1, 3) be elements of W . Then,

w1 + w2 = (1, 4, 2) + (1, −1, 3) = (2, 3, 5)

{c5.1.4d} which is not an element of W .


6. V = R2 and W consists of vectors in R2 for which the sum of the components is 1.
Answer: W is not a subspace of V .
Solution: The subset W is closed neither under addition nor under scalar multiplication. For
example, let w1 = (3, −2) and w2 = (0, 1) be elements of W . Then,

\[
w_1 + w_2 = (3, -2) + (0, 1) = (3, -1).
\]
The sum of the components is 3 − 1 = 2 ≠ 1, so w1 + w2 is not an element of W.


{c5.1.4c}
7. V = R2 and W consists of vectors in R2 for which the sum of the components is 0.
W is a subspace of V , since W is closed under addition and scalar multiplication.
{c5.1.4g}
8. V = C 1 and W consists of functions x(t) ∈ C 1 satisfying $\int_{-2}^{4} x(t)\,dt = 0$.

W is a subspace of V , since W is closed under addition and scalar multiplication.


{c5.1.4e}
9. V = C 1 and W consists of functions x(t) ∈ C 1 satisfying x(1) = 0.
W is a subspace of V , since W is closed under addition and scalar multiplication.
{c5.1.4f}
10. V = C 1 and W consists of functions x(t) ∈ C 1 satisfying x(1) = 1.
Answer: W is not a subspace of V .
Solution: The subset W is closed neither under addition nor under scalar multiplication. For
example, let w1 (t) = t and let w2 (t) = t2 .

(w1 + w2 )(1) = w1 (1) + w2 (1) = 1 + 1 = 2,

so (w1 + w2 )(t) is not an element of W .


{c5.1.5a} In Exercises 11 – 15 which of the sets S are subspaces?


11. S = {(a, b, c) ∈ R3 : a ≥ 0, b ≥ 0, c ≥ 0}.
Answer: The set S is not a subspace.
Solution: The set S is closed under addition but not under scalar multiplication. To demonstrate,
let x = (1, 4, 2) be an element of S. Then

−2x = −2(1, 4, 2) = (−2, −8, −4)

{c5.1.5b} which is not an element of S.


12. S = {(x1 , x2 , x3 ) ∈ R3 : a1 x1 + a2 x2 + a3 x3 = 0 where a1 , a2 , a3 ∈ R are fixed}.

{c5.1.5c} The set S is a subspace, since it is closed under addition and scalar multiplication.
13. S = {(x, y) ∈ R2 : (x, y) is on the line through (1, 1) with slope 1}.

{c5.1.5d} The set S is a subspace, since it is closed under addition and scalar multiplication.
14. S = {x ∈ R2 : Ax = 0} where A is a 3 × 2 matrix.

{c5.1.5e} The set S is a subspace, since it is closed under addition and scalar multiplication.
15. S = {x ∈ R2 : Ax = b} where A is a 3 × 2 matrix and b ∈ R3 is a fixed nonzero vector.
Answer: The set S is not a subspace.
Solution: The set S is closed under neither addition nor scalar multiplication. For example, let
x1 and x2 be solutions to the equation Ax = b. Then,

A(x1 + x2 ) = Ax1 + Ax2 = b + b = 2b,

so (x1 + x2 ) is not an element of S.

{c5.1.6}
16. Let V be a vector space and let W1 and W2 be subspaces. Show that the intersection W1 ∩ W2
is also a subspace of V .
The subset W1 ∩ W2 is a subspace of V . To show that this subset is closed under addition and
scalar multiplication, let x and y be vectors in W1 ∩ W2 . It follows that x, y ∈ W1 and x, y ∈ W2 .
Therefore, by the definition of a subspace, x + y ∈ W1 and x + y ∈ W2 , so x + y ∈ W1 ∩ W2 . Also
by definition, rx ∈ W1 and rx ∈ W2 , for some scalar r, so rx ∈ W1 ∩ W2 .

{c5.1.7a}
17. For which scalars a, b, c do the solutions to the equation

ax + by = c

form a subspace of R2 ?


Answer: Let V be the subset of solutions (x, y) to ax + by = c. The subset V is a subspace when
c = 0 and is not a subspace when c ≠ 0.
Solution: Let (x1 , y1 ) and (x2 , y2 ) be elements of V . Then

a(x1 + x2 ) + b(y1 + y2 ) = (ax1 + by1 ) + (ax2 + by2 ) = c + c = 2c.

Thus V is closed under addition only when 2c = c, so c = 0. Similarly, for any scalar r,

r(ax1 + by1 ) = cr.

So V is closed under scalar multiplication only when rc = c for any scalar r. Thus, c = 0.

{c5.1.7b}
18. For which scalars a, b, c, d do the solutions to the equation

ax + by + cz = d

form a subspace of R3 ?
Answer: By the same proof as in Exercise 17, the solutions to the equation ax + by + cz = d form
a subspace of R3 when d = 0, and do not form a subspace when d ≠ 0.

{c5.1.8}
19. Show that the set of all solutions to the differential equation ẋ = 2x is a subspace of C 1 .
The set of all solutions to the differential equation ẋ = 2x is indeed a subspace of C 1 . To demon-
strate, let x1 and x2 be elements of this set. The set is closed under addition since
d d d
(x1 + x2 ) = (x1 ) + (x2 ) = 2x1 + 2x2 = 2(x1 + x2 )
dt dt dt
and closed under scalar multiplication since, for any real scalar r,
d d
(rx1 ) = r (x1 ) = 2(rx1 ).
dt dt
Note that this problem provides another example of the principle of superposition.

{c5.1.9}
20. Recall from equation (4.5.6) of Section 4.5 that solutions to the system of differential equations
 
\[
\frac{dX}{dt} = \begin{pmatrix} -1 & 3 \\ 3 & -1 \end{pmatrix}X
\]
are
\[
X(t) = \alpha e^{2t}\begin{pmatrix} 1 \\ 1 \end{pmatrix} + \beta e^{-4t}\begin{pmatrix} 1 \\ -1 \end{pmatrix}.
\]


Use this formula for solutions to show that the set of solutions to this system of differential equations
is a vector subspace of (C 1 )2 .
Let V ⊂ (C 1 )2 be the set of solutions to (4.5.6). The set is closed under both addition and scalar
multiplication and is a subspace. To demonstrate, let
       
\[
x_1(t) = \alpha_1 e^{2t}\begin{pmatrix} 1 \\ 1 \end{pmatrix} + \beta_1 e^{-4t}\begin{pmatrix} 1 \\ -1 \end{pmatrix} \quad\text{and}\quad x_2(t) = \alpha_2 e^{2t}\begin{pmatrix} 1 \\ 1 \end{pmatrix} + \beta_2 e^{-4t}\begin{pmatrix} 1 \\ -1 \end{pmatrix}
\]
be elements of this set. Adding x1 and x2 yields
\[
\alpha_1 e^{2t}\begin{pmatrix} 1 \\ 1 \end{pmatrix} + \beta_1 e^{-4t}\begin{pmatrix} 1 \\ -1 \end{pmatrix} + \alpha_2 e^{2t}\begin{pmatrix} 1 \\ 1 \end{pmatrix} + \beta_2 e^{-4t}\begin{pmatrix} 1 \\ -1 \end{pmatrix} = (\alpha_1 + \alpha_2)e^{2t}\begin{pmatrix} 1 \\ 1 \end{pmatrix} + (\beta_1 + \beta_2)e^{-4t}\begin{pmatrix} 1 \\ -1 \end{pmatrix} \in V
\]
and multiplying x1 by any real scalar r yields
\[
rx_1 = r\left(\alpha_1 e^{2t}\begin{pmatrix} 1 \\ 1 \end{pmatrix} + \beta_1 e^{-4t}\begin{pmatrix} 1 \\ -1 \end{pmatrix}\right) = r\alpha_1 e^{2t}\begin{pmatrix} 1 \\ 1 \end{pmatrix} + r\beta_1 e^{-4t}\begin{pmatrix} 1 \\ -1 \end{pmatrix} \in V.
\]
{c5.1.95}
21. Let V = R+ = {x ∈ R : x > 0}. Show that V is a vector space under the operations of
‘addition’ (⊕)
v ⊕ w = vw for all v, w ∈ V
and ‘scalar multiplication’ (⊗)
\[
r \otimes v = v^r \quad\text{for all } v \in V \text{ and } r \in \mathbb{R}.
\]
Hints: The additive identity is v = 1; the additive inverse is $\frac{1}{v}$; and the multiplicative identity is r = 1.

Proof We begin by verifying the hints.


(A3) $1 \oplus v \equiv 1\cdot v = v$, where $1, v \in V$
(A4) $v \oplus \frac{1}{v} \equiv v\cdot\frac{1}{v} = 1$, where $v \in V$
(M2) $1 \otimes v \equiv v^1 = v$, where $1 \in \mathbb{R}$ and $v \in V$
Next we verify the other five conditions of a vector space given in Table 1. Let u, v, w be vectors
in V and r, s be scalars in R.
(A1) v ⊕ w ≡ vw = wv ≡ w ⊕ v
(A2) (u ⊕ v) ⊕ w ≡ (uv)w = u(vw) ≡ u ⊕ (v ⊕ w)
(M 1) (rs) ⊗ v ≡ v rs = (v r )s ≡ s ⊗ (r ⊗ v)
(D1) (r + s) ⊗ v ≡ v r+s = v r v s ≡ (r ⊗ v) ⊕ (s ⊗ v)
(D2) r ⊗ (v ⊕ w) ≡ r ⊗ (vw) ≡ (vw)r = v r wr ≡ (r ⊗ v) ⊕ (r ⊗ w)



{S:5.2} 5.2 Construction of Subspaces


The principle of superposition shows that the set of all solutions to a homogeneous system
of linear equations is closed under addition and scalar multiplication and is a subspace.
Indeed, there are two ways to describe subspaces: first as solutions to linear systems, and
second as the span of a set of vectors. We shall see that solving a homogeneous linear system
of equations just means writing the solution set as the span of a finite set of vectors.

{D:nullspace} Solutions to Homogeneous Systems Form Subspaces


Definition 5.2.1. Let A be an m × n matrix. The null space of A is the set of solutions to
the homogeneous system of linear equations
{Ax=0} Ax = 0. (5.2.1)

Lemma 5.2.2. Let A be an m × n matrix. Then the null space of A is a subspace of Rn .

Proof Suppose that x and y are solutions to (5.2.1). Then


A(x + y) = Ax + Ay = 0 + 0 = 0;
so x + y is a solution of (5.2.1). Similarly, for r ∈ R
A(rx) = rAx = r0 = 0;
so rx is a solution of (5.2.1). Thus, x + y and rx are in the null space of A, and the null
space is closed under addition and scalar multiplication. So Theorem 5.1.4 implies that the
null space is a subspace of the vector space Rn . 

Solutions to Linear Systems of Differential Equations Form Subspaces Let C be an n×n matrix
and let W be the set of solutions to the linear system of ordinary differential equations
{Cx(t)}
\[
\frac{dx}{dt}(t) = Cx(t). \tag{5.2.2}
\]
We will see later that a solution to (5.2.2) has coordinate functions xj (t) in C 1 . The principle
of superposition then shows that W is a subspace of (C 1 )n . Suppose x(t) and y(t) are
solutions of (5.2.2). Then
\[
\frac{d}{dt}(x(t) + y(t)) = \frac{dx}{dt}(t) + \frac{dy}{dt}(t) = Cx(t) + Cy(t) = C(x(t) + y(t));
\]
so x(t) + y(t) is a solution of (5.2.2) and in W . A similar calculation shows that rx(t) is
also in W and that W ⊂ (C 1 )n is a subspace.


Writing Solution Subspaces as a Span The way we solve homogeneous systems of equa-
tions gives a second method for defining subspaces. For example, consider the system

Ax = 0,

where
\[
A = \begin{pmatrix} 2 & 1 & 4 & 0 \\ -1 & 0 & 2 & 1 \end{pmatrix}.
\]
The matrix A is row equivalent to the reduced echelon form matrix
 
\[
E = \begin{pmatrix} 1 & 0 & -2 & -1 \\ 0 & 1 & 8 & 2 \end{pmatrix}.
\]

Therefore x = (x1 , x2 , x3 , x4 ) is a solution of Ex = 0 if and only if x1 = 2x3 + x4 and


x2 = −8x3 − 2x4 . It follows that every solution of Ex = 0 can be written as:
   
\[
x = x_3\begin{pmatrix} 2 \\ -8 \\ 1 \\ 0 \end{pmatrix} + x_4\begin{pmatrix} 1 \\ -2 \\ 0 \\ 1 \end{pmatrix}.
\]

Since row operations do not change the set of solutions, it follows that every solution of
Ax = 0 has this form. We have also shown that every solution is generated by two vectors
by use of vector addition and scalar multiplication. We say that this subspace is spanned
by the two vectors
\[
\begin{pmatrix} 2 \\ -8 \\ 1 \\ 0 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 1 \\ -2 \\ 0 \\ 1 \end{pmatrix}.
\]
For example, a calculation verifies that the vector
 
\[
\begin{pmatrix} -1 \\ -2 \\ 1 \\ -3 \end{pmatrix}
\]

is also a solution of Ax = 0. Indeed, we may write it as


     
{eq:SpanSol}
\[
\begin{pmatrix} -1 \\ -2 \\ 1 \\ -3 \end{pmatrix} = \begin{pmatrix} 2 \\ -8 \\ 1 \\ 0 \end{pmatrix} - 3\begin{pmatrix} 1 \\ -2 \\ 0 \\ 1 \end{pmatrix}. \tag{5.2.3}
\]
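A hedged MATLAB sketch that recovers the spanning vectors and checks (5.2.3) numerically (null with the 'r' option returns the rational basis read off from the reduced echelon form):

A = [2 1 4 0; -1 0 2 1];
N = null(A,'r')      % columns are (2,-8,1,0)' and (1,-2,0,1)'
x = [-1; -2; 1; -3];
A*x                  % confirms x is a solution of Ax = 0
N\x                  % coefficients (1, -3), as in (5.2.3)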


Spans Let v1 , . . . , vk be a set of vectors in a vector space V . A vector v ∈ V is a linear


combination of v1 , . . . , vk if
v = r1 v1 + · · · + rk vk
for some scalars r1 , . . . , rk .
{span}
Definition 5.2.3. The set of all linear combinations of the vectors v1 , . . . , vk in a vector
space V is the span of v1 , . . . , vk and is denoted by span{v1 , . . . , vk }.

For example, the vector on the left hand side in (5.2.3) is a linear combination of the two
vectors on the right hand side.
The simplest example of a span is Rn itself. Let vj = ej where ej ∈ Rn is the vector with a 1
in the j th coordinate and 0 in all other coordinates. Then every vector x = (x1 , . . . , xn ) ∈ Rn
can be written as
x = x1 e1 + · · · + xn en .
It follows that
Rn = span{e1 , . . . , en }.
Similarly, the set span{e1 , e3 } ⊂ R3 is just the x1 x3 -plane, since vectors in this span are

x1 e1 + x3 e3 = x1 (1, 0, 0) + x3 (0, 0, 1) = (x1 , 0, x3 ).


{spansubspace}
Proposition 5.2.4. Let V be a vector space and let w1 , . . . , wk ∈ V . Then W =
span{w1 , . . . , wk } ⊂ V is a subspace.

Proof Suppose x, y ∈ W . Then

x = r1 w1 + · · · + rk wk
y = s1 w1 + · · · + sk wk

for some scalars r1 , . . . , rk and s1 , . . . , sk . It follows that

x + y = (r1 + s1 )w1 + · · · + (rk + sk )wk

and
rx = (rr1 )w1 + · · · + (rrk )wk
are both in span{w1 , . . . , wk }. Hence W ⊂ V is closed under addition and scalar multipli-
cation, and is a subspace by Theorem 5.1.4. 


For example, let


{e:vandw}
\[
v = (2, 1, 0) \quad\text{and}\quad w = (1, 1, 1) \tag{5.2.4}
\]
be vectors in R3. Then linear combinations of the vectors v and w have the form
\[
\alpha v + \beta w = (2\alpha + \beta, \alpha + \beta, \beta)
\]
for real numbers α and β. Note that every one of these vectors is a solution to the linear
equation
{ex1} x1 − 2x2 + x3 = 0, (5.2.5)
that is, the 1st coordinate minus twice the 2nd coordinate plus the 3rd coordinate equals
zero. Moreover, you may verify that every solution of (5.2.5) is a linear combination of the
vectors v and w in (5.2.4). Thus, the set of solutions to the homogeneous linear equation
(5.2.5) is a subspace, and that subspace can be written as the span of all linear combinations
of the vectors v and w.
In this language we see that the process of solving a homogeneous system of linear equations
is just the process of finding a set of vectors that span the subspace of all solutions. Indeed,
we can now restate Theorem 2.4.6 of Chapter 2. Recall that a matrix A has rank ` if it is
{P:n-rank} row equivalent to a matrix in echelon form with ` nonzero rows.
Proposition 5.2.5. Let A be an m × n matrix with rank `. Then the null space of A is the
span of n − ` vectors.

We have now seen that there are two ways to describe subspaces — as solutions of homo-
geneous systems of linear equations and as a span of a set of vectors, the spanning set.
Much of linear algebra is concerned with determining how one goes from one description of
a subspace to the other.
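As a numerical illustration of Proposition 5.2.5 (a sketch only, using the MATLAB commands null and rank, which are discussed in Sections 5.3 and 5.5), one can check for the matrix A at the beginning of this section that the null space is spanned by n − ` vectors:

A = [2 1 4 0; -1 0 2 1];
n = size(A,2)       % n = 4 unknowns
ell = rank(A)       % ell = 2
size(null(A),2)     % returns n - ell = 2 spanning vectors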

Exercises

In Exercises 1 – 4 a single equation in three variables is given. For each equation write the subspace
{c5.2.1a} of solutions in R3 as the span of two vectors in R3 .

1. 4x − 2y + z = 0.
Answer: The subspace of solutions can be spanned by the vectors (1, 0, −4)t and (0, 1, 2)t .
Solution: All solutions to 4x − 2y + z = 0 can be written in the form
       
    (x, y, z)t = (x, y, 2y − 4x)t = x (1, 0, −4)t + y (0, 1, 2)t .


{c5.2.1b}
2. x − y + 3z = 0.
Answer: The subspace of solutions can be spanned by the vectors (1, 1, 0)t and (−3, 0, 1)t .
Solution: All solutions to x − y + 3z = 0 can be written in the form
       
    (x, y, z)t = (y − 3z, y, z)t = y (1, 1, 0)t + z (−3, 0, 1)t .
{c5.2.1c}
3. x + y + z = 0.
Answer: The subspace of solutions can be spanned by the vectors (1, 0, −1)t and (0, 1, −1)t .
Solution: All solutions to x + y + z = 0 can be written in the form
       
    (x, y, z)t = (x, y, −x − y)t = x (1, 0, −1)t + y (0, 1, −1)t .
{c5.2.1d}
4. y = z.
Answer: The subspace of solutions can be spanned by the vectors (1, 0, 0)t and (0, 1, 1)t .
Solution: All solutions to y = z can be written in the form
       
    (x, y, z)t = (x, y, y)t = x (1, 0, 0)t + y (0, 1, 1)t .

In Exercises 5 – 8 each of the given matrices is in reduced echelon form. Write solutions of the
{c5.2.2a} corresponding homogeneous system of linear equations as a span of vectors.
 
5. A = [ 1  2  0  1  0
         0  0  1  4  0
         0  0  0  0  1 ] .
Answer: The subspace of solutions is spanned by the vectors

(−2, 1, 0, 0, 0)t and (−1, 0, −4, 1, 0)t .

Solution: Let x = (x1 , . . . , x5 ) be a solution to Ax = 0. All solutions to this equation have the
form
    (x1 , x2 , x3 , x4 , x5 )t = (−2x2 − x4 , x2 , −4x4 , x4 , 0)t = x2 (−2, 1, 0, 0, 0)t + x4 (−1, 0, −4, 1, 0)t .


{c5.2.2b}
 
6. B = [ 1  3  0  5
         0  0  1  2 ] .
Answer: The subspace of solutions to Bx = 0 is spanned by the vectors
(−3, 1, 0, 0)t and (−5, 0, −2, 1)t .

Solution: Let x = (x1 , x2 , x3 , x4 ) be a solution to Bx = 0. All solutions to this equation have the
form
    (x1 , x2 , x3 , x4 )t = (−3x2 − 5x4 , x2 , −2x4 , x4 )t = x2 (−3, 1, 0, 0)t + x4 (−5, 0, −2, 1)t .
{c5.2.2c}
 
7. A = [ 1  0  2
         0  1  1 ] .
Answer: The subspace of solutions to Ax = 0 is spanned by the vector (−2, −1, 1)t .
Solution: Let x = (x1 , x2 , x3 ) be a solution to Ax = 0. All solutions to this equation have the
form
    (x1 , x2 , x3 )t = (−2x3 , −x3 , x3 )t = x3 (−2, −1, 1)t .
{c5.2.2d}
 
8. B = [ 1  −1  0  5  0  0
         0   0  1  2  0  2
         0   0  0  0  1  2 ] .
Answer: The subspace of solutions to Bx = 0 is spanned by the vectors
    (1, 1, 0, 0, 0, 0)t ,   (−5, 0, −2, 1, 0, 0)t ,   and   (0, 0, −2, 0, −2, 1)t .

Solution: Let x = (x1 , . . . , x6 ) be a solution to Bx = 0. All solutions to this equation have the
form
    (x1 , . . . , x6 )t = (x2 − 5x4 , x2 , −2x4 − 2x6 , x4 , −2x6 , x6 )t
                        = x2 (1, 1, 0, 0, 0, 0)t + x4 (−5, 0, −2, 1, 0, 0)t + x6 (0, 0, −2, 0, −2, 1)t .


{c5.2.3}
9. Write a system of two linear equations of the form Ax = 0 where A is a 2 × 4 matrix whose
subspace of solutions in R4 is the span of the two vectors
   
    v1 = (1, −1, 0, 0)t   and   v2 = (0, 0, 1, −1)t .

Answer: The matrix A whose subspace of solutions in R4 is the span of v1 and v2 is


 
    A = [ 1  1  0  0
          0  0  1  1 ] .

Solution: Note that all vectors x in the span of v1 and v2 are of the form
    x = a v1 + b v2 = a (1, −1, 0, 0)t + b (0, 0, 1, −1)t = (a, −a, b, −b)t .

Therefore, x1 = −x2 and x3 = −x4 . So,

x1 + x2 = 0
x3 + x4 = 0.

The matrix of this system is A.

For each subset of a vector space given in Exercises (10)-(13) determine whether the subset is a
vector subspace and if it is a vector subspace, find the smallest number of vectors that spans the
space.
{C5.2.3A}
10. S = {p(t) ∈ P5 : p(2) = 0 and p′(1) = 0}
Answer: S is a subspace that is spanned by four vectors.
Solution: Verify that S is a subspace by showing that it is closed under addition and scalar
multiplication. Let p, q be in the subset. Then

    (p + q)(2) = p(2) + q(2) = 0 + 0 = 0
    (p + q)′(1) = p′(1) + q′(1) = 0 + 0 = 0

So the subset is closed under addition. Let c ∈ R and calculate

    (cp)(2) = c p(2) = c · 0 = 0   and   (cp)′(1) = c p′(1) = c · 0 = 0


Hence the subset is also closed under scalar multiplication, and the subset is a subspace.
Next we compute a spanning set for S. A polynomial p ∈ P5 has the form

p(t) = a0 + a1 t + a2 t2 + a3 t3 + a4 t4 + a5 t5

where a0 , a1 , a2 , a3 , a4 , a5 ∈ R. It follows that dim P5 = 6. We calculate the conditions on the


coefficients that are needed for p(t) to be in the subspace. First note that

    p′(t) = a1 + 2a2 t + 3a3 t2 + 4a4 t3 + 5a5 t4 ,
so that
    p(2) = a0 + 2a1 + 4a2 + 8a3 + 16a4 + 32a5
    p′(1) = a1 + 2a2 + 3a3 + 4a4 + 5a5 .
Thus p(t) is in the subspace if and only if
{Cp5}
    p(2) = a0 + 2a1 + 4a2 + 8a3 + 16a4 + 32a5 = 0          (5.2.6)
    p′(1) = a1 + 2a2 + 3a3 + 4a4 + 5a5 = 0

It follows from (5.2.6) that p(t) is in the subspace if and only if a0 and a1 are determined by
a2 , a3 , a4 , a5 . Hence the subspace is spanned by four vectors.
{C5.2.3B}
11. T = symmetric 2 × 2 matrices. That is, T is the set of 2 × 2 matrices A so that A = AT .
Answer: T is a subspace that is spanned by 3 vectors.
Solution: Symmetric n × n matrices form a subspace of matrices. Suppose A = AT and B = B T
are n × n matrices. Then

(A + B)T = AT + B T = A + B and (cA)T = cAT = cA

So the set of symmetric matrices is closed under addition and scalar multiplication, and is a sub-
space. Specifically 2 × 2 symmetric matrices have the form
 
    S = [ a  b
          b  c ]
and this set is spanned by 3 vectors:
    [ 1 0 ; 0 0 ] ,   [ 0 1 ; 1 0 ] ,   and   [ 0 0 ; 0 1 ] .
{C5.2.3C}
12. U = 2 × 3 matrices in reduced row-echelon form
Answer: U is not a subspace.
Solution: The reduced echelon form matrices E do not form a subspace. Suppose E has a pivot
in the (i, j)th position. Then the (i, j)th entry of E is 1. But 2E does not have a 1 in the (i, j)th entry,
so the set is not closed under scalar multiplication.


{C5.2.3D}
13. Let A be the 3 × 4 matrix
    A = [ 1  2  1  2
          1  1  0  1
          0  1  1  1 ]
and let
V = {y ∈ R3 : there exists x ∈ R4 such that Ax = y}

Answer: V is a subspace that is spanned by 2 vectors.


Solution: To show that V is a subspace of R3 , let y1 , y2 be vectors in V ; that is, y1 , y2 ∈ R3 such
that there exists vectors x1 , x2 ∈ R4 satisfying Ax1 = y1 and Ax2 = y2 . Then

A(x1 + x2 ) = Ax1 + Ax2 = y1 + y2

and y1 + y2 ∈ V . Similarly, cy1 ∈ V . To verify calculate

A(cx1 ) = cAx1 = cy1 .

So V is closed under addition and scalar multiplication.


The vector subspace V is spanned by the four columns of A:
    w1 = (1, 1, 0)t    w2 = (2, 1, 1)t    w3 = (1, 0, 1)t    w4 = (2, 1, 1)t .

Since w4 = w2 and w3 = w2 − w1 , it follows that V is spanned by two vectors w1 and w2 . Since V


cannot be spanned by one vector, it follows that V is spanned by exactly 2 vectors.

{c5.2.4}
 
14. Write the matrix A = [ 2 2 ; −3 0 ] as a linear combination of the matrices
    B = [ 1 1 ; 0 0 ]   and   C = [ 0 0 ; 1 0 ] .

    A = [ 2 2 ; −3 0 ] = 2 [ 1 1 ; 0 0 ] − 3 [ 0 0 ; 1 0 ] = 2B − 3C.

{c5.2.5}
15. Is (2, 20, 0) in the span of w1 = (1, 1, 3) and w2 = (1, 4, 2)? Answer this question by setting
up a system of linear equations and solving that system by row reducing the associated augmented
matrix.
Answer: The vector (2, 20, 0)t is in the span of w1 and w2 . Specifically, v = −4w1 + 6w2 .


Solution: Note that, for some real numbers a and b,

(2, 20, 0)t = aw1 + bw2 = a(1, 1, 3)t + b(1, 4, 2)t

if v is in the span of w1 and w2 . This corresponds to the linear system

a + b =2
a + 4b = 20
3a + 2b =0

To find a and b, row reduce the augmented matrix of the system:


   
    [ 1  1   2         [ 1  0  −4
      1  4  20    −→     0  1   6
      3  2   0 ]         0  0   0 ] .

The system is consistent; a = −4 and b = 6.

In Exercises 16 – 19 let W ⊂ C 1 be the subspace spanned by the two polynomials x1 (t) = 1 and
x2 (t) = t2 . For the given function y(t) decide whether or not y(t) is an element of W . Furthermore,
if y(t) ∈ W , determine whether the set {y(t), x2 (t)} is a spanning set for W .
{c5.2.6a}
16. y(t) = 1 − t2 ,
Answer: The function y(t) = 1 − t2 is an element of W and the set {y(t), x2 (t)} is a spanning set
for W .
Solution: The space W equals span{x1 (t), x2 (t)} where x1 (t) = 1 and x2 (t) = t2 . To show that
y(t) is an element of W , let a = 1 and b = −1, and compute

ax1 (t) + bx2 (t) = x1 (t) − x2 (t) = 1 − t2 = y(t).

To show that {y(t), x2 (t)} is a spanning set for W , rewrite every linear combination of x1 (t) and
x2 (t) in terms of y(t) and x2 (t), as follows:

ax1 (t) + bx2 (t) = a + bt2 = a(1 − t2 ) + (a + b)t2 = ay(t) + (a + b)x2 (t).
{c5.2.6b}
17. y(t) = t4 ,
The function y(t) = t4 is not in W .
{c5.2.6c}
18. y(t) = sin t,
The function y(t) = sin(t) is not in W .
{c5.2.6d}
19. y(t) = 0.5t2
Answer: The function y(t) = 0.5t2 is an element of W , but the set {y(t), x2 (t)} does not span W .


Solution: When a = 0 and b = 0.5,


ax1 (t) + bx2 (t) = 0.5x2 (t) = 0.5t2 = y(t).
In this case, there exist functions in W that are not in span{y(t), x2 (t)}. For example, the function
x1 (t) = 1 cannot be written as a linear combination of x2 (t) and y(t).
{c5.2.7}
20. Let W ⊂ R4 be the subspace that is spanned by the vectors
w1 = (−1, 2, 1, 5) and w2 = (2, 1, 3, 0).
Find a linear system of two equations such that W = span{w1 , w2 } is the set of solutions of this
system.
Answer: The subspace W = span{w1 , w2 } is the set of solutions to the system
    x1 + x2 − x3 = 0
    3x2 − x3 − x4 = 0
where x = (x1 , x2 , x3 , x4 ) ∈ W . Row reduction of the associated matrix demonstrates that this
system is a valid solution set.
Solution: Solve for x as a linear combination of w1 and w2 by creating the matrix whose columns
are w1 and w2 , then setting up the equation:
   
    [ −1  2                 [ x1
       2  1      [ a          x2
       1  3        b ]   =    x3
       5  0 ]                 x4 ]
where a and b are scalars. Then row reduce the associated augmented matrix:
   
    [ −1  2   x1              [ 1   3   x3
       2  1   x2       −→       0  −5   x2 − 2x3
       1  3   x3                0   0   x1 + x2 − x3
       5  0   x4 ]              0   0   −3x2 + x3 + x4 ] .
The system is consistent only when the last two entries vanish; setting them equal to zero yields the
linear system above.
{c5.2.8a}
21. Let V be a vector space and let v ∈ V be a nonzero vector. Show that
span{v, v} = span{v}.

Every vector x ∈ span{v} is of the form x = av = av + 0v, so x ∈ span{v, v}. Therefore,


span{v} ⊂ span{v, v}. Every vector y ∈ span{v, v} is of the form
y = bv + cv = (b + c)v ∈ span{v}.
Therefore span{v, v} ⊂ span{v}, so the two spans are equal.


{c5.2.8b}
22. Let V be a vector space and let v, w ∈ V be vectors. Show that
span{v, w} = span{v, w, v + 3w}.

Every vector x ∈ span{v, w} is of the form


x = av + bw = av + bw + 0(v + 3w) ∈ span{v, w, v + 3w}.
Also, every vector y ∈ span{v, w, v + 3w} is of the form
y = cv + dw + f (v + 3w) = (c + f )v + (d + 3f )w ∈ span{v, w}.
Therefore, span{v, w} = span{v, w, v + 3w}.
{c5.2.9}
23. Let W = span{w1 , . . . , wk } be a subspace of the vector space V and let wk+1 ∈ W be another
vector. Prove that W = span{w1 , . . . , wk+1 }.
Since W = span{w1 , . . . , wk }, every vector x ∈ W can be written as the linear combination
x = a1 w1 + · · · + ak wk
for some choice of a1 . . . ak . Since wk+1 is a vector in W , it can therefore be written as
wk+1 = b1 w1 + · · · + bk wk
and any vector x ∈ W can be written as
x = a1 w1 + · · · + ak wk + ak+1 wk+1
= a1 w1 + · · · + ak wk + ak+1 b1 w1 + · · · + ak+1 bk wk
= (a1 + ak+1 b1 )w1 + · · · + (ak + ak+1 bk )wk .
So, W = span{w1 , . . . , wk+1 }.
{c5.2.9B}
24. Let W ⊂ Rn be a k-dimensional subspace where k < n. Define
V = {v ∈ Rn : v · w = 0 for all w ∈ W }
Show that V is a subspace of Rn .

Solution: Let v1 and v2 be in V . Then


(v1 + v2 ) · w = v1 · w + v2 · w = 0 + 0 = 0
for all w ∈ W . Hence v1 + v2 ∈ V . Also
(cv1 ) · w = c(v1 · w) = c · 0 = 0
for all c ∈ R and cv1 ∈ V . It follows that V is closed under addition and scalar multiplication and
is a subspace.


{c5.2.10}
25. Let Ax = b be a system of m linear equations in n unknowns, and let r = rank(A) and
s = rank(A|b). Suppose that this system has a unique solution. What can you say about the
relative magnitudes of m, n, r, s?
Answer: The relationship of the constants is m ≥ n = r = s.
Solution: The rank of matrix A cannot be greater than the rank of matrix (A|b), since (A|b)
consists of A plus one column. The rank of A is the number of pivots in the row reduced matrix.
(A|b) can be row reduced through the same operations, and will have either the same number of
pivots as A or, if there is a pivot in the last column, one more pivot than A. Since the system has
a unique solution, it is consistent, and therefore (A|b) cannot have a pivot in the (n + 1)st column,
so r = rank(A) = rank(A|b) = s.
The set of solutions is parameterized by n − r parameters, where n is the number of columns
of A. Since there is a unique solution, the set of solutions is parameterized by 0 parameters, so
n = r.
The number m of rows of the matrix must be greater than or equal to n in order for the
system to have a unique solution, since there must be n pivots, and each pivot must be in a separate
row.


{S:5.3} 5.3 Spanning Sets and MATLAB


In this section we discuss:

• how to find a spanning set for the subspace of solutions to a homogeneous system of
linear equations using the MATLAB command null, and
• how to determine when a vector is in the subspace spanned by a set of vectors using
the MATLAB command rref.

Spanning Sets for Homogeneous Linear Equations In Chapter 2 we saw how to use
Gaussian elimination, back substitution, and MATLAB to compute solutions to a system of
linear equations. For systems of homogeneous equations, MATLAB provides a command to
find a spanning set for the subspace of solutions. That command is null. For example, if
we type

A = [2 1 4 0; -1 0 2 1]
B = null(A)

then we obtain

B =
0.4830 0
-0.4140 0.8729
-0.1380 -0.2182
0.7591 0.4364

The two columns of the matrix B span the set of solutions of the equation Ax = 0. In par-
ticular, the vector (2, −8, 1, 0) is a solution to Ax = 0 and is therefore a linear combination
of the column vectors of B. Indeed, type

4.1404*B(:,1)-7.2012*B(:,2)

and observe that this linear combination is the desired one.


Next we describe how to find the coefficients 4.1404 and -7.2012 by showing that these
coefficients themselves are solutions to another system of linear equations.
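As a preview (a sketch, assuming A and B are still in the MATLAB workspace), one way to produce these coefficients is to row reduce the matrix B augmented by the solution (2, −8, 1, 0)t :

x = [2; -8; 1; 0];
rref([B x])

Up to rounding, the last column of the result reads 4.1404 and −7.2012, the coefficients used above.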

When is a Vector in a Span? Let w1 , . . . , wk and v be vectors in Rn . We now describe a


method that allows us to decide whether v is in span{w1 , . . . , wk }. To answer this question


one has to solve a system of n linear equations in k unknowns. The unknowns correspond
to the coefficients in the linear combination of the vectors w1 , . . . , wk that gives v.
Let us be more precise. The vector v is in span{w1 , . . . , wk } if and only if there are constants
r1 , . . . , rk such that the equation

{e:lindepeqn} r1 w1 + · · · + rk wk = v (5.3.1)

is valid. Define the n × k matrix A as the one having w1 , . . . , wk as its columns; that is,

{E:Abycol} A = (w1 | · · · |wk ). (5.3.2)

Let r be the k-vector r = (r1 , . . . , rk )t .
Then we may rewrite equation (5.3.1) as

{E:Ar=v} Ar = v. (5.3.3)

To summarize:

Lemma 5.3.1. Let w1 , . . . , wk and v be vectors in Rn . Then v is in span{w1 , . . . , wk } if


and only if the system of linear equations (5.3.3) has a solution, where A is the n × k matrix defined
in (5.3.2).

To solve (5.3.3) we row reduce the augmented matrix (A|v). For example, is v = (2, 1) in
the span of w1 = (1, 1) and w2 = (1, −1)? That is, do there exist scalars r1 , r2 such that
     
    r1 (1, 1)t + r2 (1, −1)t = (2, 1)t ?

As noted, we can rewrite this equation as


    
    [ 1   1     [ r1        [ 2
      1  −1 ]     r2 ]   =    1 ] .

We can solve this equation by row reducing the augmented matrix


 
    [ 1   1   2
      1  −1   1 ]


to obtain
    [ 1   0   3/2
      0   1   1/2 ] .
So v = (3/2) w1 + (1/2) w2 .
Row reduction to reduced echelon form has been preprogrammed in the MATLAB command
rref. Consider the following example. Let
{e:w1w2} w1 = (2, 0, −1, 4) and w2 = (2, −1, 0, 2) (5.3.4)
and ask the question whether v = (−2, 4, −3, 4) is in span{w1 , w2 }.
In MATLAB load the matrix A having w1 and w2 as its columns and the vector v by typing e5_3_5:
{e:Aandv}
    A = [  2   2
           0  −1
          −1   0      and     v = (−2, 4, −3, 4)t .          (5.3.5*)
           4   2 ]
We can solve the system of equations using MATLAB. First, form the augmented matrix by
typing

aug = [A v]

Then solve the system by typing rref(aug) to obtain

ans =
1 0 3
0 1 -4
0 0 0
0 0 0

It follows that (r1 , r2 ) = (3, −4) is a solution and v = 3w1 − 4w2 .


Now we change the 4th entry in v slightly by typing v(4) = 4.01. There is no solution to
the system of equations
    Ar = (−2, 4, −3, 4.01)t ,
as we now show. Type


aug = [A v]
rref(aug)

which yields

ans =
1 0 0
0 1 0
0 0 1
0 0 0

This matrix corresponds to an inconsistent system; thus v is no longer in the span of w1


and w2 .
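An equivalent test (a sketch, not used elsewhere in the text) compares ranks: v belongs to span{w1 , w2 } exactly when appending v to the columns of A leaves the rank unchanged. Typing

rank([A v]) == rank(A)

returns 0 (false) for the modified v above, and 1 (true) when v(4) is restored to 4.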

Exercises

In Exercises 1 – 3 use the null command in MATLAB to find all the solutions of the linear system
of equations Ax = 0.

{c5.3.1a} 1. (matlab)  
{e:BCDa}    A = [ −4  0  −4  3              (5.3.6*)
                  −4  1  −1  1 ]

Type null(A) in MATLAB to find that the set of solutions to Ax = 0 is spanned by the vectors
   
    (0.3225, 0.8931, −0.0992, 0.2977)t   and   (0, −0.1961, 0.5883, 0.7845)t .

{c5.3.1b} 2. (matlab)  
1 2
{e:BCDb} A= 1 0  (5.3.7*)
3 −2

Answer: The solution to Ax = 0 is the vector (0, 0).


Solution: This can be demonstrated by typing null(A) in MATLAB , which yields

ans =
Empty matrix: 2-by-0


{c5.3.1c} 3. (matlab)  
1 1 2
{e:BCDc} A= . (5.3.8*)
−1 2 −1

The set of solutions to Ax = 0 is spanned by the vector


 
    (−0.8452, −0.1690, 0.5071)t .

{c5.3.2} 4. (matlab) Use the null command in MATLAB to verify your answers to Exercises 5 and 6.
Enter matrix A into MATLAB and type null(A), obtaining

ans =
-0.8957 0
0.4414 0.1204
-0.0519 0.9631
0.0130 -0.2408
0 0

The MATLAB answer is a valid solution if the vectors found by row reduction can be written as
linear combinations of the MATLAB answers. In MATLAB , row reduce the augmented matrices
null(A)|x, and null(A)|y where x = (−2, 1, 0, 0, 0) and y = (−1, 0, −4, 1, 0) to find that
     
    (−2, 1, 0, 0, 0)t = 2.2328 (−0.8957, 0.4414, −0.0519, 0.0130, 0)t + 0.1204 (0, 0.1204, 0.9631, −0.2408, 0)t
and
    (−1, 0, −4, 1, 0)t = 1.1164 (−0.8957, 0.4414, −0.0519, 0.0130, 0)t − 4.0931 (0, 0.1204, 0.9631, −0.2408, 0)t .

Enter matrix B into MATLAB and type null(B), obtaining

ans =
-0.9661 0
0.2070 -0.5976
-0.1380 -0.7171
0.0690 0.3586


Again, the MATLAB answer can be checked by verifying by row reduction of the augmented matrix
that the vectors (−3, 1, 0, 0) and (−5, 0, −2, 1) can be written as linear combinations of the MAT-
LAB solution vectors. In particular,
     
    (−3, 1, 0, 0)t = 3.1053 (−0.9661, 0.2070, −0.1380, 0.0690)t − 0.5976 (0, −0.5976, −0.7171, 0.3586)t
and
    (−5, 0, −2, 1)t = 5.1755 (−0.9661, 0.2070, −0.1380, 0.0690)t + 1.7928 (0, −0.5976, −0.7171, 0.3586)t .

{c5.3.3} 5. (matlab) Use row reduction to find the solutions to Ax = 0 where A is given in (5.3.6*). Does
your answer agree with the MATLAB answer using null? If not, explain why.
Answer: The solution set of Ax = 0 is
    (x1 , x2 , x3 , x4 )t = (−x3 + (3/4)x4 , −3x3 + 2x4 , x3 , x4 )t = x3 (−1, −3, 1, 0)t + x4 (3/4, 2, 0, 1)t .

Solution: Row reduce A:
    [ −4  0  −4   3          [ 1  0  1  −3/4
      −4  1  −1   1 ]   −→     0  1  3   −2  ] .
−4 1 −1 1 0 1 3 −2

The solution obtained by row reduction is not the same as the one obtained using null, but
the solution vectors are linear combinations of the MATLAB solution vectors, so the answers are
equivalent. By row reducing the matrix [null(A) x], where x = (−1, −3, 1, 0), we find that
     
    (−1, −3, 1, 0)t = −3.1009 (0.3225, 0.8931, −0.0992, 0.2977)t + 1.1767 (0, −0.1961, 0.5883, 0.7845)t .
By row reducing the matrix [null(A) y] where y = (3/4, 2, 0, 1) we find that:
 3     
0.3225 0
 4   0.8931   −0.1961 
 2 
  = 2.3257 
 −0.0992  + 0.3922  0.5883  .
  
 0 
1 0.2977 0.7845


In Exercises 6 – 8 let W ⊂ R5 be the subspace spanned by the vectors

{MATLAB:65} w1 = (2, 0, −1, 3, 4), w2 = (1, 0, 0, −1, 2), w3 = (0, 1, 0, 0, −1). (5.3.9*)

Use MATLAB to decide whether the given vectors are elements of W .

{c5.3.4a} 6. (matlab) v1 = (2, 1, −2, 8, 3).


Answer: Vector v1 is an element of W .
Solution: The vector v1 is an element of W if there exist scalars a, b, and c such that

aw1 + bw2 + cw3 = v1 .

Using MATLAB , create the matrix A = [w1' w2' w3'], which has w1 , w2 , and w3 as its columns.
Then create the augmented matrix aug1 = [A v1']. The command rref(aug1) yields

ans =
1 0 0 2
0 1 0 -2
0 0 1 1
0 0 0 0
0 0 0 0

Since there is no pivot point in the last column, the linear system aw1 +bw2 +cw3 = v1 is consistent,
and v1 = 2w1 − 2w2 + w3 .

{c5.3.4b} 7. (matlab) v2 = (−1, 12, 3, −14, −1).


Answer: Vector v2 is not an element of W .
Solution: Create the augmented matrix aug2 = [A v2']. Row reducing aug2 yields

ans =
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
0 0 0 0

There is a pivot point in the last column, so the linear system aw1 + bw2 + cw3 = v2 is inconsistent.

{c5.3.4c} 8. (matlab) v3 = (−1, 12, 3, −14, −14).


Answer: Vector v3 is an element of W .
Solution: Row reduce the augmented matrix aug3 = [A v3'] to obtain


ans =
1 0 0 -3
0 1 0 5
0 0 1 12
0 0 0 0
0 0 0 0

which is the matrix of a consistent linear system. Therefore, v3 = −3w1 + 5w2 + 12w3 .


{S:5.4} 5.4 Linear Dependence and Linear Independence


An important question in linear algebra concerns finding spanning sets for subspaces having
the smallest number of vectors. Let w1 , . . . , wk be vectors in a vector space V and let
W = span{w1 , . . . , wk }. Suppose that W is generated by a subset of these k vectors. Indeed,
suppose that the k th vector is redundant in the sense that W = span{w1 , . . . , wk−1 }. Since
wk ∈ W , this is possible only if wk is a linear combination of the k − 1 vectors w1 , . . . , wk−1 ;
that is, only if
{e:depend} wk = r1 w1 + · · · + rk−1 wk−1 . (5.4.1)
{lineardependence}
Definition 5.4.1. Let w1 , . . . , wk be vectors in the vector space V . The set {w1 , . . . , wk }
is linearly dependent if one of the vectors wj can be written as a linear combination of the
remaining k − 1 vectors.

Note that when k = 1, the phrase ‘{w1 } is linearly dependent’ means that w1 = 0.
If we set rk = −1, then we may rewrite (5.4.1) as

r1 w1 + · · · + rk−1 wk−1 + rk wk = 0.

{L:lindep} It follows that:


Lemma 5.4.2. The set of vectors {w1 , . . . , wk } is linearly dependent if and only if there
exist scalars r1 , . . . , rk such that

(a) at least one of the rj is nonzero, and


(b) r1 w1 + · · · + rk wk = 0.

For example, the vectors w1 = (2, 4, 7), w2 = (5, 1, −1), and w3 = (1, −7, −15) are linearly
{linearindependence} dependent since 2w1 − w2 + w3 = 0.
Definition 5.4.3. A set of k vectors {w1 , . . . , wk } is linearly independent if none of the k
vectors can be written as a linear combination of the other k − 1 vectors.

{L:linindep} Since linear independence means not linearly dependent, Lemma 5.4.2 can be rewritten as:
Lemma 5.4.4. The set of vectors {w1 , . . . , wk } is linearly independent if and only if when-
ever
r1 w1 + · · · + rk wk = 0,
it follows that
r1 = r2 = · · · = rk = 0.


Let ej be the vector in Rn whose j th component is 1 and all of whose other components
are 0. The set of vectors e1 , . . . , en is the simplest example of a set of linearly independent
vectors in Rn . We use Lemma 5.4.4 to verify independence by supposing that
r1 e1 + · · · + rn en = 0.
A calculation shows that
0 = r1 e1 + · · · + rn en = (r1 , . . . , rn ).
It follows that each rj equals 0, and the vectors e1 , . . . , en are linearly independent.

Deciding Linear Dependence and Linear Independence Deciding whether a set of k vec-
tors in Rn is linearly dependent or linearly independent is equivalent to solving a system of
linear equations. Let w1 , . . . , wk be vectors in Rn , and view these vectors as column vectors.
Let
{E:Ank} A = (w1 | · · · |wk ) (5.4.2)
be the n × k matrix whose columns are the vectors wj . Then a vector
 
    R = (r1 , . . . , rk )t
is a solution to the system of equations AR = 0 precisely when
r1 w1 + · · · + rk wk = 0. (5.4.3)
If there is a nonzero solution R to AR = 0, then the vectors {w1 , . . . , wk } are linearly
dependent; if the only solution to AR = 0 is R = 0, then the vectors are linearly independent.
The preceding discussion is summarized by:
Lemma 5.4.5. The vectors w1 , . . . , wk in Rn are linearly dependent if the null space of the
n × k matrix A defined in (5.4.2) is nonzero and linearly independent if the null space of A
is zero.

A Simple Example of Linear Independence with Two Vectors The two vectors
   
    w1 = (2, −8, 1, 0)t   and   w2 = (1, −2, 0, 1)t


are linearly independent. To see this suppose that r1 w1 + r2 w2 = 0. Using the components
of w1 and w2 this equality is equivalent to the system of four equations

2r1 + r2 = 0, −8r1 − 2r2 = 0, r1 = 0, and r2 = 0.

In particular, r1 = r2 = 0; hence w1 and w2 are linearly independent.
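The same conclusion can be reached in MATLAB (a sketch using the null command from Section 5.3):

w1 = [2; -8; 1; 0];
w2 = [1; -2; 0; 1];
null([w1 w2])

MATLAB returns an empty matrix, so the only solution of r1 w1 + r2 w2 = 0 is r1 = r2 = 0.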

Using MATLAB to Decide Linear Dependence Suppose that we want to determine whether
or not the vectors
       
{MATLAB:66}   w1 = (1, 2, −1, 3, 5)t    w2 = (−1, 1, 4, −2, 0)t    w3 = (1, 1, −1, 3, 12)t    w4 = (0, 4, 3, 1, −2)t      (5.4.4*)

are linearly dependent. After typing e5_4_4 in MATLAB, form the 5 × 4 matrix A by typing

A = [w1 w2 w3 w4]

Determine whether there is a nonzero solution to AR = 0 by typing

null(A)

The response from MATLAB is

ans =
-0.7559
-0.3780
0.3780
0.3780

showing that there is a nonzero solution to AR = 0 and the vectors wj are linearly dependent.
Indeed, this solution for R shows that we can solve for w1 in terms of w2 , w3 , w4 . We can
now ask whether or not w2 , w3 , w4 are linearly dependent. To answer this question form the
matrix

B = [w2 w3 w4]

and type null(B) to obtain


ans =
Empty matrix: 3-by-0

showing that the only solution to BR = 0 is the zero solution R = 0. Thus, w2 , w3 , w4


are linearly independent. For these particular vectors, any three of the four are linearly
independent.
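To make the dependency explicit (a sketch, assuming the matrix A = [w1 w2 w3 w4] from above is still in the workspace), use the relation returned by null to solve for w1 in terms of the other columns:

c = null(A);
coeffs = -c(2:4)/c(1)    % approximately (-0.5, 0.5, 0.5)
A(:,2:4)*coeffs          % reproduces the column w1

so w1 = −0.5 w2 + 0.5 w3 + 0.5 w4 , up to rounding.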

Exercises

{c5.4.1}
1. Let w be a vector in the vector space V . Show that the sets of vectors {w, 0} and {w, −w} are
linearly dependent.
To show that the set of vectors {w1 , w2 } is linearly dependent, show that there exist nonzero a and
b such that aw1 + bw2 = 0. For the set {w, 0}, if a = 0 and b = 1, then 0w + 1(0) = 0, so the set is
linearly dependent. For the set {w, −w}, if a = 1 and b = 1, then w − w = 0, so the set is linearly
dependent.
{c5.4.2}
2. For which values of b are the vectors (1, b) and (3, −1) linearly independent?
Answer: The set is linearly independent if b ≠ −1/3.
Solution: Note that a set of two vectors is linearly dependent if one is a multiple of the other. So
this set is dependent for any values of b at which
    (3, −1) = α(1, b).
When equality holds, α = 3 and αb = −1, so b = −1/3.
{c5.4.3}
3. Let
u1 = (1, −1, 1) u2 = (2, 1, −2) u3 = (10, 2, −6).
Is the set {u1 , u2 , u3 } linearly dependent or linearly independent?
Answer: The set is linearly dependent.
Solution: Let A be the matrix whose columns are u1 , u2 , and u3 . The set {u1 , u2 , u3 } is linearly
dependent if there exists a nonzero vector r = (r1 , r2 , r3 ) such that r1 u1 + r2 u2 + r3 u3 = 0, that is,
if the homogeneous system Ar = 0 has a nonzero solution. Row reduce:
   
    [  1   2   10          [ 1  0  2
      −1   1    2    −→      0  1  4
       1  −2   −6 ]          0  0  0 ] .


So, Ar = 0 when r = r3 (−2, −4, 1). The value of r is nonzero for r3 6= 0, so the set is indeed
linearly dependent. As an example, let r3 = 1. Then,
−2u1 − 4u2 + u3 = −2(1, −1, 1) − 4(2, 1, −2) + (10, 2, −6) = (0, 0, 0) = 0.
{c5.4.4}
4. For which values of b are the vectors (1, b, 2b) and (2, 1, 4) linearly independent?
Answer: The vectors (1, b, 2b) and (2, 1, 4) are linearly independent for any value of b.
Solution: Two vectors are linearly independent unless one is a multiple of the other; in this case,
unless
(1, b, 2b) = α(2, 1, 4).
Equality holds if 2α = 1, α = b, and 4α = 2b. Therefore, α = 1/2, b = 1/2, and b = 1, which is
inconsistent, so the vectors are linearly independent.
{C5.4.10}
5. Let
    u = (3, 2, −5)t    v = (−6, 1, 7)t    w = (0, 5, −3)t

(a) Determine whether the sets {u, v}, {u, w}, {v, w} are linearly independent.
(b) Is the set {u, v, w} linearly independent?

Answer:

(a) yes, yes, yes.


(b) no

Solution:

(a) A set of two vectors is linearly dependent if and only if one is a scalar multiple of the other.
None of the vectors is a scalar multiple of another. Thus, {u, v}, {u, w}, {v, w} are linearly
independent.
(b) To tell whether {u, v, w} is linearly independent, we solve the equation au + bv + cw = 0:
       
    a (3, 2, −5)t + b (−6, 1, 7)t + c (0, 5, −3)t = (0, 0, 0)t .
We row reduce the coefficient matrix of this system:
       
    [  3  −6   0        [  1  −2   0        [ 1  −2   0        [ 1  −2   0
       2   1   5   →       2   1   5   →      0   5   5   →      0   1   1
      −5   7  −3 ]        −5   7  −3 ]        0  −3  −3 ]        0  −3  −3 ]


   
        [ 1  −2   0        [ 1   0   2
    →     0   1   1   →      0   1   1
          0   0   0 ]        0   0   0 ] .
Therefore, there are infinitely many solutions to this linear system and {u, v, w} is linearly
dependent.

{c5.4.5}
6. Show that the polynomials p1 (t) = 2+t, p2 (t) = 1+t2 , and p3 (t) = t−t2 are linearly independent
vectors in the vector space C 1 .
Answer: The polynomials p1 (t) = 2 + t, p2 (t) = 1 + t2 , and p3 (t) = t − t2 are linearly independent
in C 1 .
Solution: We can determine this by noting that the polynomials are linearly dependent if there
exists a nonzero vector r = (r1 , r2 , r3 ) such that r1 p1 + r2 p2 + r3 p3 = 0. It is convenient to represent
each polynomial as a vector (a, b, c) = p(t) = a + bt + ct2 . Thus, p1 (t) = (2, 1, 0), p2 (t) = (1, 0, 1),
and p3 (t) = (0, 1, −1). Solve the homogeneous system Ar = 0, where A is the matrix whose columns
are p1 , p2 , and p3 , by row reduction.
   
2 1 0 1 0 0
 1 0 1  −→  0 1 0  .
0 1 −1 0 0 1

Therefore, there are no nonzero values of r for which r1 p1 + r2 p2 + r3 p3 = 0, and the polynomials
are linearly independent.

{c5.4.6}
7. Show that the functions f1 (t) = sin t, f2 (t) = cos t, and f3 (t) = cos(t + π/3) are linearly
dependent vectors in C 1 .

The three functions are linearly dependent vectors in C 1 since there exists a nonzero vector r =
(r1 , r2 , r3 ) such that r1 f1 (t) + r2 f2 (t) + r3 f3 (t) = 0. We can find this vector r using trigonometric
identities:
    f3 (t) = cos(t + π/3) = cos(π/3) cos t − sin(π/3) sin t = (1/2) cos t − (√3/2) sin t = (1/2) f2 (t) − (√3/2) f1 (t).

That is, (√3/2) f1 (t) − (1/2) f2 (t) + f3 (t) = 0.
{c5.4.7}
8. Suppose that the three vectors u1 , u2 , u3 ∈ Rn are linearly independent. Show that the set

{u1 + u2 , u2 + u3 , u3 + u1 }

is also linearly independent.


To show that the vectors u1 + u2 , u2 + u3 and u3 + u1 are linearly independent, we assume that
there exist scalars r1 , r2 , r3 such that

r1 (u1 + u2 ) + r2 (u2 + u3 ) + r3 (u3 + u1 ) = 0.

We then prove that r1 = r2 = r3 = 0, as follows. Use distribution to obtain

(r1 + r3 )u1 + (r1 + r2 )u2 + (r2 + r3 )u3 = 0.

Since the set {u1 , u2 , u3 } is linearly independent,

r1 + r3 = 0
r1 + r2 = 0
r2 + r3 = 0.

Solving this system yields r1 = r2 = r3 = 0, so the set {u1 + u2 , u2 + u3 , u3 + u1 } is linearly


independent.

{mc.exerciseErr6_M}
9. Consider the matrix
    A = [  1  −6  −1
          −3   0   2
           0   1  −2
           0  −2   4 ] .

(a) Show that the rows of A are linearly dependent.


(b) Find two subsets S1 and S2 of rows of A such that
(i) S1 6= S2 .
(ii) span S1 = span S2 .
(iii) The vectors in S1 are linearly independent and the vectors in S2 are linearly independent.

Answer:

(a) Let {R1 , R2 , R3 , R4 } be the rows of A. Linear dependence follows from

0R1 + 0R2 + 2R3 + R4 = 0.

(b) S1 = {R1 , R2 , R3 } and S2 = {R1 , R2 , R4 } satisfy the three conditions.

Solution:

(a) By inspection, R4 = −2R3 . This implies, by Lemma 5.4.2, that the rows of A are linearly
dependent.
(b) Let S1 = {R1 , R2 , R3 } and S2 = {R1 , R2 , R4 }.


(i) The sets of vectors S1 and S2 are unequal.


(ii) Verify that span S1 = span S2 as follows. A linear combination of vectors in S1 has the
form
    r1 R1 + r2 R2 + r3 R3 = r1 R1 + r2 R2 − (1/2) r3 R4
which is a linear combination of vectors in S2 .
(iii) Use Lemma 5.4.4 to see that S1 is a linearly independent set of vectors. More precisely,
the augmented matrix for the system of equations r1 R1 + r2 R2 + r3 R3 = 0 row reduces
to
    [ 1  0  0  0
      0  1  0  0
      0  0  1  0 ] .
Thus, r1 = r2 = r3 = 0 so Lemma 5.4.4 implies S1 is a linearly independent set of vectors.
The same reasoning shows that S2 is a linearly independent set of vectors. Alternatively,
since R4 = −2R3 , linear independence of S1 implies linear independence of S2 .

In Exercises 10 – 12, determine whether the given sets of vectors are linearly independent or linearly
dependent.

{c5.4.8a} 10. (matlab)


{MATLAB:67} v1 = (2, 1, 3, 4) v2 = (−4, 2, 3, 1) v3 = (2, 9, 21, 22) (5.4.5*)

Answer: The set {v1 , v2 , v3 } is linearly dependent.


Solution: The set is linearly dependent if there exist scalars r1 , r2 , and r3 such that r1 v1 + r2 v2 +
r3 v3 = 0. Create a matrix A whose columns are v1 , v2 and v3 . Then row reduce A to solve the
homogeneous system Ar = 0. Specifically, row reducing the matrix A = [v1 v2 v3] yields

ans =
1 0 5
0 1 2
0 0 0
0 0 0

So −5v1 − 2v2 + v3 = 0.

{c5.4.8b} 11. (matlab)


{MATLAB:68} w1 = (1, 2, 3) w2 = (2, 1, 5) w3 = (−1, 2, −4) w4 = (0, 2, −1) (5.4.6*)

Answer: The set {w1 , w2 , w3 , w4 } is linearly dependent.


Solution: Create the matrix A associated to the set {w1 , w2 , w3 , w4 }, and row reduce to solve for
r = (r1 , r2 , r3 , r4 ), obtaining


ans =
1.0000 0 0 0.1429
0 1.0000 0 0.2857
0 0 1.0000 0.7143

Therefore, −0.1429w1 − 0.2857w2 − 0.7143w3 + w4 = 0.

{c5.4.8c} 12. (matlab)


{MATLAB:69} x1 = (3, 4, 1, 2, 5) x2 = (−1, 0, 3, −2, 1) x3 = (2, 4, −3, 0, 2) (5.4.7*)

Answer: The set {x1 , x2 , x3 } is linearly independent.


Solution: The matrix A associated to the set {x1 , x2 , x3 } row reduces to

ans =
1 0 0
0 1 0
0 0 1
0 0 0
0 0 0

In this case, there are no nonzero solutions to r1 x1 + r2 x2 + r3 x3 = 0.

{c5.4.9} 13. (matlab) Perform the following experiments.

(a) Use MATLAB to choose randomly three column vectors in R3 . The MATLAB commands to
choose these vectors are:
y1 = rand(3,1)
y2 = rand(3,1)
y3 = rand(3,1)
Use the methods of this section to determine whether these vectors are linearly independent
or linearly dependent.
(b) Now perform this exercise five times and record the number of times a linearly independent
set of vectors is chosen and the number of times a linearly dependent set is chosen.
(c) Repeat the experiment in (b) — but this time randomly choose four vectors in R3 to be in
your set.

(a) The set of commands to perform this experiment is:


y1 = rand(3,1);
y2 = rand(3,1);
y3 = rand(3,1);
A = [y1 y2 y3];
rref(A)

If the resulting matrix is I3 , then the set is linearly independent.


(b) The most likely outcome is that all five trials result in linearly independent sets.
(c) Every trial yields a linearly dependent set of vectors.


{S:5.5} 5.5 Dimension and Bases


The minimum number of vectors that span a vector space has special significance.
Definition 5.5.1. The vector space V has finite dimension if V is the span of a finite
number of vectors. If V has finite dimension, then the smallest number of vectors that span
V is called the dimension of V and is denoted by dim V .

For example, recall that ej is the vector in Rn whose j th component is 1 and all of whose
other components are 0. Let x = (x1 , . . . , xn ) be in Rn . Then
{e:spanrn} x = x1 e1 + · · · + xn en . (5.5.1)
Since every vector in R is a linear combination of the vectors e1 , . . . , en , it follows that
n

Rn = span{e1 , . . . , en }. Thus, Rn is finite dimensional. Moreover, the dimension of Rn is


at most n, since Rn is spanned by n vectors. It seems unlikely that Rn could be spanned
by fewer than n vectors— but this point needs to be proved.

An Example of a Vector Space that is Not Finite Dimensional Next we discuss an example
of a vector space that does not have finite dimension. Consider the subspace P ⊂ C 1
consisting of polynomials of all degrees. We show that P is not the span of a finite number
of vectors and hence that P does not have finite dimension. Let p1 (t), p2 (t), . . . , pk (t) be a
set of k polynomials and let d be the maximum degree of these k polynomials. Then every
polynomial in the span of p1 (t), . . . , pk (t) has degree less than or equal to d. In particular,
p(t) = td+1 is a polynomial that is not in the span of p1 (t), . . . , pk (t) and P is not spanned
by finitely many vectors.

{basis} Bases and The Main Theorem


Definition 5.5.2. Let B = {w1 , . . . , wk } be a set of vectors in a vector space W . The
subset B is a basis for W if B is a spanning set for W with the smallest number of elements
in a spanning set for W .

It follows that if {w1 , . . . , wk } is a basis for W , then k = dim W . The main theorem about
{basis=span+indep} bases is:
Theorem 5.5.3. A set of vectors B = {w1 , . . . , wk } in a vector space W is a basis for W
if and only if the set B is linearly independent and spans W .

Remark: The importance of Theorem 5.5.3 is that we can show that a set of vectors is a
basis by verifying spanning and linear independence. We never have to check directly that
the spanning set has the minimum number of vectors for a spanning set.


For example, we have shown previously that the set of vectors {e1 , . . . , en } in Rn is linearly
independent and spans Rn . It follows from Theorem 5.5.3 that this set is a basis, and that
the dimension of Rn is n. In particular, Rn cannot be spanned by fewer than n vectors.
The proof of Theorem 5.5.3 is given in Section 5.6.
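In computations (a sketch anticipating the MATLAB commands used later in this section), a proposed basis of Rn can be tested by placing the candidate vectors as the columns of a matrix and computing its rank: if n vectors give a matrix of rank n, they are linearly independent, and their span is then an n-dimensional subspace of Rn , hence all of Rn . For instance,

u1 = [1; 0; 0]; u2 = [1; 1; 0]; u3 = [1; 1; 1];
rank([u1 u2 u3])    % returns 3, so {u1, u2, u3} is a basis of R^3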

Consequences of Theorem 5.5.3 We discuss two applications of Theorem 5.5.3. First, we


use this theorem to derive a way of determining the dimension of the subspace spanned by
a finite number of vectors. Second, we show that the dimension of the subspace of solutions
to a homogeneous system of linear equation Ax = 0 is n − rank(A) where A is an m × n
matrix.

Computing the Dimension of a Span We show that the dimension of a span of vectors can
be found using elementary row operations on a matrix whose rows are the spanning vectors.
{L:computerank}
Lemma 5.5.4. Let w1 , . . . , wk be k row vectors in Rn and let W = span{w1 , . . . , wk } ⊂ Rn .
Define
 
    M = [ w1
          ...
          wk ]

to be the matrix whose rows are the wj s. Then

{e:dimW=rankM} dim(W ) = rank(M ). (5.5.2)

Proof To verify (5.5.2), observe that the span of w1 , . . . , wk is unchanged by

(a) swapping wi and wj ,

(b) multiplying wi by a nonzero scalar, and

(c) adding a multiple of wi to wj .

That is, if we perform elementary row operations on M , the vector space spanned by the
rows of M does not change. So we may perform elementary row operations on M until we
arrive at the matrix E in reduced echelon form. Suppose that ` = rank(M ); that is, suppose


that ` is the number of nonzero rows in E. Then


 
    E = [ v1
          ...
          v`
          0
          ...
          0 ] ,

where the vj are the nonzero rows in the reduced echelon form matrix.
We claim that the vectors v1 , . . . , v` are linearly independent. It then follows from Theo-
rem 5.5.3 that {v1 , . . . , v` } is a basis for W and that the dimension of W is `. To verify the
claim, suppose
{e:rowsums} a1 v1 + · · · + a` v` = 0. (5.5.3)
We show that ai must equal 0 as follows. In the ith row, the pivot must occur in some
column — say in the j th column. It follows that the j th entry in the vector of the left hand
side of (5.5.3) is
0a1 + · · · + 0ai−1 + 1ai + 0ai+1 + · · · + 0a` = ai ,
since all entries in the j th column of E other than the pivot must be zero, as E is in reduced
echelon form. 

For instance, let W = span{w1 , w2 , w3 } in R4 where

w1 = (3, −2, 1, −1),


{eq:vectors} w2 = (1, 5, 10, 12), (5.5.4*)
w3 = (1, −12, −19, −25).

To compute dim W in MATLAB , type e5_5_4 to load the vectors and type

M = [w1; w2; w3]

Row reduction of the matrix M in MATLAB leads to the reduced echelon form matrix

ans =
1.0000 0 1.4706 1.1176
0 1.0000 1.7059 2.1765
0 0 0 0


indicating that the dimension of the subspace W is two, and therefore {w1 , w2 , w3 } is not
a basis of W . Alternatively, we can use the MATLAB command rank(M) to compute the
rank of M and the dimension of the span W .
However, if we change one of the entries in w3 , for instance w3(3)=-18 then indeed the
command rank([w1;w2;w3]) gives the answer three indicating that for this choice of vectors
{w1, w2, w3} is a basis for span{w1, w2, w3}.

Solutions to Homogeneous Systems Revisited We return to our discussions in Chapter 2 on


solving linear equations. Recall that we can write all solutions to the system of homogeneous
equations Ax = 0 in terms of a few parameters, and that the null space of A is the subspace
of solutions (See Definition 5.2.1). More precisely, Proposition 5.2.5 states that the number
of parameters needed is n − rank(A) where n is the number of variables in the homogeneous
system. We claim that the dimension of the null space is exactly n − rank(A).
For example, consider the reduced echelon form 3 × 7 matrix
 
{E:nullityexamp}
    A = [ 1  −4  0  2  −3  0  8
          0   0  1  3   2  0  4          (5.5.5)
          0   0  0  0   0  1  2 ]

that has rank three. Suppose that the unknowns for this system of equations are x1 , . . . , x7 .
We can solve the equations associated with A by solving the first equation for x1 , the second
equation for x3 , and the third equation for x6 , as follows:

x1 = 4x2 − 2x4 + 3x5 − 8x7


x3 = −3x4 − 2x5 − 4x7
x6 = −2x7

Thus, all solutions to this system of equations have the form


 
{e:expandsoln}
    (4x2 − 2x4 + 3x5 − 8x7 , x2 , −3x4 − 2x5 − 4x7 , x4 , x5 , −2x7 , x7 )t          (5.5.6)


which equals
    x2 (4, 1, 0, 0, 0, 0, 0)t + x4 (−2, 0, −3, 1, 0, 0, 0)t + x5 (3, 0, −2, 0, 1, 0, 0)t + x7 (−8, 0, −4, 0, 0, −2, 1)t .
We can rewrite the right hand side of (5.5.6) as a linear combination of four vectors
w2 , w4 , w5 , w7
{e:w’scomb} x2 w2 + x4 w4 + x5 w5 + x7 w7 . (5.5.7)

This calculation shows that the null space of A, which is W = {x ∈ R7 : Ax = 0}, is


spanned by the four vectors w2 , w4 , w5 , w7 . Moreover, this same calculation shows that the
four vectors are linearly independent. From the left hand side of (5.5.6) we see that if this
linear combination sums to zero, then x2 = x4 = x5 = x7 = 0. It follows from Theorem 5.5.3
{D:nullity} that dim W = 4.
Definition 5.5.5. The nullity of A is the dimension of the null space of A.
{T:dimsoln}
Theorem 5.5.6. Let A be an m × n matrix. Then
nullity(A) + rank(A) = n.

Proof Neither the rank nor the null space of A are changed by elementary row operations.
So we can assume that A is in reduced echelon form. The rank of A is the number of nonzero
rows in the reduced echelon form matrix. Proposition 5.2.5 states that the null space is
spanned by p vectors where p = n − rank(A). We must show that these vectors are linearly
independent.
Let j1 , . . . , jp be the columns of A that do not contain pivots. In example (5.5.5) p = 4 and
j1 = 2, j2 = 4, j3 = 5, j4 = 7.
After solving for the variables corresponding to pivots, we find that the spanning set of the
null space consists of p vectors in Rn , which we label as {wj1 , . . . , wjp }. See (5.5.6). Note
that the jm th entry of wjm is 1 while the jm th entry in all of the other p − 1 vectors is 0.
Again, see (5.5.6) as an example that supports this statement. It follows that the set of
spanning vectors is a linearly independent set. That is, suppose that
r1 wj1 + · · · + rp wjp = 0.


From the jm th entry in this equation, it follows that rm = 0; and the vectors are linearly
independent. 

Theorem 5.5.6 has an interesting and useful interpretation. We have seen in the previous
subsection that the rank of a matrix A is just the number of linearly independent rows in
A. In linear systems each row of the coefficient matrix corresponds to a linear equation.
Thus, the rank of A may be thought of as the number of independent equations in a system
of linear equations. This theorem just states that the space of solutions loses a dimension
for each independent equation.
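A numerical check of Theorem 5.5.6 for the matrix A in (5.5.5) (a sketch; the commands null and rank were used earlier):

A = [1 -4 0 2 -3 0 8; 0 0 1 3 2 0 4; 0 0 0 0 0 1 2];
rank(A)             % returns 3
size(null(A),2)     % returns 4, the nullity
% nullity(A) + rank(A) = 4 + 3 = 7 = n, the number of columns of A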

Exercises

{c5.5.1}
1. Show that U = {u1 , u2 , u3 } where

u1 = (1, 1, 0) u2 = (0, 1, 0) u3 = (−1, 0, 1)

is a basis for R3 .
By Theorem 5.5.3, U is a basis for R3 if the vectors of U are linearly independent and span R3 . By
Lemma 5.5.4, the dimension of U is equal to the rank of the matrix whose rows are u1 , u2 , and u3 .
Row reduce this matrix:    
1 1 0 1 0 0
 0 1 0  −→  0 1 0  .
−1 0 1 0 0 1
So dim(U) = 3 = dim(R3 ), and we need now only show that u1 , u2 , and u3 are linearly independent,
which we can do by row reducing the matrix whose columns are the vectors of U as follows:
   
1 0 −1 1 0 0
 1 1 0  −→  0 1 0  .
0 0 1 0 0 1

Therefore, there is no nonzero solution to the equation Ur = 0, so the vectors of U are linearly
independent and U is a basis for R3 .

{mc5_5A}
2. Determine whether or not the vectors

v1 = (−1, 1, 1) v2 = (1, −1, 0) v3 = (1, 1, −1)

form a basis of R3 .


 
Solution: The matrix
    [v1t | v2t | v3t ] = [ −1   1   1
                            1  −1   1
                            1   0  −1 ]
can be row reduced to the identity matrix I3 by the following sequence of manipulations:
     
    [ −1   1   1     −1 × each row     [  1  −1  −1      R2 + R1      [  1  −1  −1
       1  −1   1    ──────────────→      −1   1  −1     ────────→        0   0  −2
       1   0  −1 ]                       −1   0   1 ]                   −1   0   1 ]

       R3 + R1      [ 1  −1  −1      R2 ↔ R3      [ 1  −1  −1      R1 − R2      [ 1   0  −1
      ────────→       0   0  −2     ────────→       0  −1   0     ────────→       0  −1   0
                      0  −1   0 ]                   0   0  −2 ]                   0   0  −2 ]

       R1 − (1/2)R3      [ 1   0   0      −1 × R2      [ 1  0   0      −(1/2) × R3      [ 1  0  0
      ─────────────→       0  −1   0     ────────→       0  1   0     ─────────────→      0  1  0
                           0   0  −2 ]                   0  0  −2 ]                       0  0  1 ] .

Alternatively, if one is familiar with determinants, one can compute


 
    det[v1t | v2t | v3t ] = det [ −1   1   1
                                   1  −1   1      = 2 ≠ 0
                                   1   0  −1 ]

Hence the columns form a basis of R3 .


{c5.5.2}
3. Let S = span{v1 , v2 , v3 } where

v1 = (1, 0, −1, 0) v2 = (0, 1, 1, 1) v3 = (5, 4, −1, 4).

Find the dimension of S and find a basis for S.


Answer: The dimension of S is 2, and vectors v1 and v2 form a basis for S.
Solution: Row reduce the matrix A whose rows are v1 , v2 , and v3 . By Lemma 5.6.4, the number
of nonzero rows in the reduced matrix is the dimension of S and these rows form a basis for S. So:
   
1 0 −1 0 1 0 −1 0
 0 1 1 1  −→  0 1 1 1 .
5 4 −1 4 0 0 0 0


{c5.5.3}
4. Find a basis for the null space of
 
1 0 −1 2
A= 1 −1 0 0 .
4 −5 1 −2

What is the dimension of the null space of A?


Answer: The vectors (1, 1, 1, 0) and (−2, −2, 0, 1) form a basis for the nullspace of A; therefore
the dimension of the nullspace is 2.
Solution: Find the set of solutions to Ax = 0 by solving
 
    [ 1   0  −1   2        [ x1
      1  −1   0   0          x2      =  0.
      4  −5   1  −2 ]        x3
                             x4 ]

To solve, row reduce A, obtaining
    [ 1  0  −1  2
      0  1  −1  2
      0  0   0  0 ] .
So the set of solutions to Ax = 0 can be written
       
    (x1 , x2 , x3 , x4 )t = (x3 − 2x4 , x3 − 2x4 , x3 , x4 )t = x3 (1, 1, 1, 0)t + x4 (−2, −2, 0, 1)t .

{c5.5.4}
5. Show that the set V of all 2 × 2 matrices is a vector space. Show that the dimension of V is four
by finding a basis of V with four elements. Show that the space M (m, n) of all m × n matrices is
also a vector space. What is dim M (m, n)?
The set V is a vector space because the operations of addition and scalar multiplication satisfy the
eight properties of vector spaces described in Table 1. For 2 × 2 matrices, matrix addition is defined
for two matrices such that:
     
a1 b1 a2 b2 a1 + a2 b1 + b2
+ =
c1 d1 c 2 d2 c1 + c2 d1 + d2

and scalar multiplication is defined for a matrix and a scalar such that:
   
a b sa sb
s = .
c d sc sd


So, using these definitions, addition is commutative and associative, and the additive identity is
the 2 × 2 matrix of zeroes. If
   
    W = [ w11 w12 ; w21 w22 ] , then −W = [ −w11 −w12 ; −w21 −w22 ] .

Scalar multiplication is associative. The scalar 1 is a multiplicative identity, and scalar multiplication


is distributive both for scalars and for matrices. So V is a vector space. One basis for V is
       
    [ 1 0 ; 0 0 ] ,   [ 0 1 ; 0 0 ] ,   [ 0 0 ; 1 0 ] ,   and   [ 0 0 ; 0 1 ] .

The set of m × n matrices is also a vector space, since it also satisfies the eight properties of
vector spaces. In this case, the additive identity is the m × n zero matrix, and the scalar 1 is the
multiplicative identity. The dimension of the set is mn, since one basis consists of the mn matrices with
aij = 1 and all other entries 0, for 1 ≤ i ≤ m and 1 ≤ j ≤ n.

{c5.5.5}
6. Show that the set Pn of all polynomials of degree less than or equal to n is a subspace of C 1 .
What is dim P2 ? What is dim Pn ?
The set Pn is a subspace if it is closed under addition and scalar multiplication. Let x(t) =
a0 + a1 t + · · · + an tn , y(t) = b0 + b1 t + · · · + bn tn and s ∈ R. Then

x(t) + y(t) = (a0 + b0 ) + (a1 + b1 )t + · · · + (an + bn )tn ∈ Pn .


cx(t) = c(a0 + a1 t + · · · + an tn ) = ca0 + ca1 t + · · · + can tn ∈ Pn .

The dimension of P2 is 3, since x1 = 1, x2 = t, and x3 = t2 form a basis for P2 . The dimension of


Pn is n + 1.

{c5.5.6}
7. Let P3 be the vector space of polynomials of degree at most three in one variable t. Let
p(t) = t3 + a2 t2 + a1 t + a0 where a0 , a1 , a2 ∈ R are fixed constants. Show that

    { p, dp/dt, d2 p/dt2 , d3 p/dt3 }

is a basis for P3 .
First, note that the dimension of P 3 is 4. We can show this by noting that the 4 polynomials
b1 (t) = t3 , b2 (t) = t2 , b3 (t) = t, and b4 (t) = 1 are linearly independent and span P 3 . The
dimension of a space is equal to the number of linearly independent vectors in a spanning set for
that space. Therefore, { p, dp/dt, d2 p/dt2 , d3 p/dt3 } is a basis if the polynomials are linearly independent.


The general polynomial of degree 3 has the form q(t) = a3 t3 + a2 t2 + a1 t + a0 . We can


identify q(t) by the vector (a3 , a2 , a1 , a0 ). Thus, the set
    p(t) = t3 + a2 t2 + a1 t + a0
    dp/dt (t) = 3t2 + 2a2 t + a1
    d2 p/dt2 (t) = 6t + 2a2
    d3 p/dt3 (t) = 6
is identified with the matrix A whose columns are p, dp/dt, d2 p/dt2 , and d3 p/dt3 :
    A = [  1    0    0    0
          a2    3    0    0
          a1   2a2   6    0
          a0   a1   2a2   6 ] .
The matrix is lower triangular and therefore row reduces to I4 . So this set of polynomials is linearly
independent and spans P 3 .
{c5.5.7}
8. Let u ∈ Rn be a nonzero row vector.

(a) Show that the n × n matrix A = ut u is symmetric and that rank(A) = 1. Hint: Begin by
showing that Av t = 0 for every row vector v ∈ Rn that is perpendicular to u and that Aut is
a nonzero multiple of ut .
(b) Show that the matrix P = In + ut u is invertible. Hint: Show that rank(P ) = n.

(a) The matrix A is symmetric because, by (3.6.1),


At = (ut u)t = ut u = A.
To show that rank(A) = 1, let v be a vector such that u · v = 0. This implies uv t = 0, and thus
Av t = ut (uv t ) = 0 for every such v. The span of vectors perpendicular to u has dimension n − 1,
so nullity(A) ≥ n − 1. We know that uut = u · u = ||u||2 . Thus, Aut = ut uut = ||u||2 ut . Since u is a
nonzero vector, ||u||2 ≠ 0, so Aut is a nonzero multiple of ut . Therefore, A is not the zero matrix,
so nullity(A) = n − 1, and therefore rank(A) = 1.
(b) The matrix P is invertible if P is row equivalent to In , that is, if rank(P ) = n. To show that
this is true, again let v be any vector perpendicular to u. Then:
P v t = (In + ut u)v t = v t + 0 = v t .
Thus, rank(P ) ≥ n − 1. In addition
P ut = (In + ut u)ut = ut + ||u||2 ut = (1 + ||u||2 )ut
which, since ||u||2 > 0, is a nonzero multiple of ut . Therefore, rank(P ) = n and P is invertible.


{a5.5.900}
9. Let
{v1 , v2 , v3 }
be a basis for R3 . Find all k so that

{v1 , kv2 , v2 + (1 − k)v3 }

is also a basis for R3 . Justify your answer.

Answer: The set {v1 , kv2 , v2 + (1 − k)v3 } is a basis for each k 6= 0, 1.


Solution: Vectors in the space spanned by {v1 , kv2 , v2 + (1 − k)v3 } have the form

x = a1 v1 + a2 kv2 + a3 (v2 + (1 − k)v3 )


= a1 v1 + (a2 k + a3 )v2 + a3 (1 − k)v3

where a1 , a2 , a3 are scalars in R. Since the set {v1 , v2 , v3 } is a basis for R3 , it is linearly independent
and the coefficients must satisfy:

a1 = 0
a2 k + a3 = 0
a3 (1 − k) = 0

k = 1 Choose any nonzero value for a3 and the set is linearly dependent.
k = 0 Choose any nonzero value for a2 , and the set is linearly dependent.
k 6= 0, 1 The only possible solution is a1 = a2 = a3 = 0 and the set is linearly independent.

Thus, {v1 , kv2 , v2 + (1 − k)v3 } is a basis for R3 for each k 6= 0, 1.

{mc.exercise14}
10. Determine whether each of the following statements is true or false and explain your answer.

(a) If A is an m × n matrix and the equation AX = b is consistent for some b, then the columns
of A span Rm .
(b) Let A and B be n × n matrices. If AB = BA and if A is invertible, then A−1 B = BA−1 .
(c) If A and B are m×n matrices, then both AB t and At B are defined.
(d) Similar matrices always have the same eigenvectors.
(e) If u, v, w are vectors such that {u, v}, {u, w}, and {v, w} are linearly independent sets, then
{u, v, w} is a linearly independent set.
(f) Let {v1 , v2 , v3 } be a basis for a vector space V . If U is a subspace of V, then some subset of
{v1 , v2 , v3 } is a basis for U .


Answer: Statements (b) and (c) are true; statements (a), (d), (e), and (f) are false.

Solution:
   
(a) This is false. For example, let A = [ 1 0 ; 0 0 ] and b = (1, 0)t . Then the system of equations
    AX = b is consistent with solution X = (1, 0)t , but the columns of A do not span R2 .
     
(b) This is true. Since A is invertible, multiply both sides of AB = BA on the left and on the right
by A−1 to obtain A−1 (AB)A−1 = A−1 (BA)A−1 , that is, BA−1 = A−1 B.
(c) This is true. The transpose of an m × n matrix is an n × m matrix. Therefore
AB t is (m × n)(n × m) = m × m
and
At B is (n × m)(m × n) = n × n.
(d) This is false. Suppose A and B are similar matrices. Then there exists an invertible matrix P
such that A = P −1 BP . Let B = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} and P = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}. Then P −1 = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}
and A = P −1 BP = \begin{pmatrix} -1 & 0 \\ 1 & 1 \end{pmatrix}. B has eigenvectors (1, 1) and (1, −1), while A has eigenvectors
(0, 1) and (2, −1). These are not the same.
(e) This is false. For example, let V = R3 , u = e1 , v = e2 and w = e1 + e2 . Then {u, v, w} is a
linearly dependent set, but {u, v}, {u, w}, and {v, w} are all linearly independent sets.
(f) This is false. For example, let V = R2 and let v1 = e1 and v2 = e2 be the standard basis
vectors. Let U be the one-dimensional subspace with basis {e1 + e2 }. Then no subset of
{e1 , e2 } is a basis for U .


{S:5.6} 5.6 The Proof of the Main Theorem


We begin the proof of Theorem 5.5.3 with two lemmas on linearly independent and spanning
{reducetoindep} sets.

Lemma 5.6.1. Let {w1 , . . . , wk } be a set of vectors in a vector space V and let W be the sub-
space spanned by these vectors. Then there is a linearly independent subset of {w1 , . . . , wk }
that also spans W .

Proof If {w1 , . . . , wk } is linearly independent, then the lemma is proved. If not, then the
set {w1 , . . . , wk } is linearly dependent, so at least one of the vectors is a linear combination
of the others. By renumbering if necessary, we can
assume that wk is a linear combination of w1 , . . . , wk−1 ; that is,

wk = a1 w1 + · · · + ak−1 wk−1 .

Now suppose that w ∈ W . Then

w = b1 w1 + · · · + bk wk .

It follows that
w = (b1 + bk a1 )w1 + · · · + (bk−1 + bk ak−1 )wk−1 ,
and that W = span{w1 , . . . , wk−1 }. If the vectors w1 , . . . , wk−1 are linearly independent,
then the proof of the lemma is complete. If not, continue inductively until a linearly inde-
pendent subset of the wj that also spans W is found. 

The important point in proving that linear independence together with spanning imply that
{lem:lindep} we have a basis is discussed in the next lemma.
Lemma 5.6.2. Let W be an m-dimensional vector space and let k > m be an integer. Then
any set of k vectors in W is linearly dependent.

Proof Since the dimension of W is m we know that this vector space can be written
as W = span{v1 , . . . , vm }. Moreover, Lemma 5.6.1 implies that the vectors v1 , . . . , vm are
linearly independent. Suppose that {w1 , . . . , wk } is another set of vectors where k > m. We
have to show that the vectors w1 , . . . , wk are linearly dependent; that is, we must show that
there exist scalars r1 , . . . , rk not all of which are zero that satisfy

{independence1} r1 w1 + · · · + rk wk = 0. (5.6.1)


We find these scalars by solving a system of linear equations, as we now show.


The fact that W is spanned by the vectors vj implies that

w1 = a11 v1 + · · · + am1 vm
w2 = a12 v1 + · · · + am2 vm
..
.
wk = a1k v1 + · · · + amk vm .

It follows that r1 w1 + · · · + rk wk equals

r1 (a11 v1 + · · · + am1 vm ) +
r2 (a12 v1 + · · · + am2 vm ) + · · · +
rk (a1k v1 + · · · + amk vm )

Rearranging terms leads to the expression:

(a11 r1 + · · · + a1k rk )v1 +


{e:r1v1etc} (a21 r1 + · · · + a2k rk )v2 +···+ (5.6.2)
(am1 r1 + · · · + amk rk )vm .

Thus, (5.6.1) is valid if and only if (5.6.2) sums to zero. Since the set {v1 , . . . , vm } is linearly
independent, (5.6.2) can equal zero if and only if

a11 r1 + · · · + a1k rk = 0
a21 r1 + · · · + a2k rk = 0
..
.
am1 r1 + · · · + amk rk = 0.

Since m < k, Chapter 2, Theorem 2.4.6 implies that this system of homogeneous linear
equations always has a nonzero solution r = (r1 , . . . , rk ) — from which it follows that the
wi are linearly dependent. 
{basis<n}
Corollary 5.6.3. Let V be a vector space of dimension n and let {u1 , . . . , uk } be a linearly
independent set of vectors in V . Then k ≤ n.

Proof If k > n then Lemma 5.6.2 implies that {u1 , . . . , uk } is linearly dependent. Since
we have assumed that this set is linearly independent, it follows that k ≤ n. 


Proof of Theorem 5.5.3 Suppose that B = {w1 , . . . , wk } is a basis for W . By definition, B spans


W and k = dim W . We must show that B is linearly independent. Suppose B is linearly
dependent, then Lemma 5.6.1 implies that there is a proper subset of B that spans W (and
is linearly independent). This contradicts the fact that as a basis B has the smallest number
of elements of any spanning set for W .
Suppose that B = {w1 , . . . , wk } both spans W and is linearly independent. Linear indepen-
dence and Corollary 5.6.3 imply that k ≤ dim W . Since, by definition, any spanning set of
W has at least dim W vectors, it follows that k ≥ dim W . Thus, k = dim W and B is a
basis. 

Extending Linearly Independent Sets to Bases Lemma 5.6.1 leads to one approach
to finding bases. Suppose that the subspace W is spanned by a finite set of vectors
{w1 , . . . , wk }. Then, we can throw out vectors one by one until we arrive at a linearly
independent subset of the wj . This subset is a basis for W .
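In MATLAB this first approach can be carried out with rref, since the pivot columns of a matrix form a linearly independent set with the same span as all of its columns. The vectors below are sample values, not taken from the text:

w1 = [1;2;3]; w2 = [2;4;6]; w3 = [0;1;1];   % w2 = 2*w1 is redundant
W = [w1 w2 w3];
[R, pivots] = rref(W);       % pivots lists the pivot columns of W
basis = W(:, pivots)         % columns w1 and w3: a basis for span{w1,w2,w3}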
We now discuss a second approach to finding a basis for a nonzero subspace W of a finite
dimensional vector space V .
{extendindep}
Lemma 5.6.4. Let {u1 , . . . , uk } be a linearly independent set of vectors in a vector space
V and assume that
uk+1 ∉ span{u1 , . . . , uk }.
Then {u1 , . . . , uk+1 } is also a linearly independent set.

Proof Let r1 , . . . , rk+1 be scalars such that

{rk+1} r1 u1 + · · · + rk+1 uk+1 = 0. (5.6.3)

To prove independence, we need to show that all rj = 0. Suppose rk+1 ≠ 0. Then we can
solve (5.6.3) for
1
uk+1 = − (r1 u1 + · · · + rk uk ),
rk+1
which implies that uk+1 ∈ span{u1 , . . . , uk }. This contradicts the choice of uk+1 . So
rk+1 = 0 and
r1 u1 + · · · + rk uk = 0.
Since {u1 , . . . , uk } is linearly independent, it follows that r1 = · · · = rk = 0. 

The second method for constructing a basis is:


• Choose a nonzero vector w1 in W .


• If W is not spanned by w1 , then choose a vector w2 that is not on the line spanned by
w1 .
• If W ≠ span{w1 , w2 }, then choose a vector w3 ∉ span{w1 , w2 }.
• If W ≠ span{w1 , w2 , w3 }, then choose a vector w4 ∉ span{w1 , w2 , w3 }.
• Continue until a spanning set for W is found. This set is a basis for W .

We now justify this approach to finding bases for subspaces. Suppose that W is a subspace
of a finite dimensional vector space V . For example, suppose that W ⊂ Rn . Then our
approach to finding a basis of W is as follows. Choose a nonzero vector w1 ∈ W . If
W = span{w1 }, then we are done. If not, choose a vector w2 ∈ W – span{w1 }. It
follows from Lemma 5.6.4 that {w1 , w2 } is linearly independent. If W = span{w1 , w2 }, then
Theorem 5.5.3 implies that {w1 , w2 } is a basis for W , dim W = 2, and we are done. If
not, choose w3 ∈ W – span{w1 , w2 } and {w1 , w2 , w3 } is linearly independent. The finite
dimension of V implies that continuing inductively must lead to a spanning set of linearly
{c:extendindependent} independent vectors for W — which by Theorem 5.5.3 is a basis. This discussion proves:
Corollary 5.6.5. Every linearly independent subset of a finite dimensional vector space V
can be extended to a basis of V .

Further consequences of Theorem 5.5.3 We summarize here several important facts


{dimensiondecreases} about dimensions.
Corollary 5.6.6. Let W be a subspace of a finite dimensional vector space V .

(a) Suppose that W is a proper subspace. Then dim W < dim V .


(b) Suppose that dim W = dim V . Then W = V .

Proof (a) Let dim W = k and let {w1 , . . . , wk } be a basis for W . Since W is a proper sub-
space of V , there is a vector w ∈ V – W . It follows from Lemma 5.6.4 that {w1 , . . . , wk , w}
is a linearly independent set. Therefore, Corollary 5.6.3 implies that k + 1 ≤ dim V , and hence dim W = k < dim V .
(b) Let {w1 , . . . , wk } be a basis for W . Theorem 5.5.3 implies that this set is linearly
independent. If {w1 , . . . , wk } does not span V , then it can be extended to a basis as above.
{C:dim=n} But then dim V > dim W , which is a contradiction. 
Corollary 5.6.7. Let B = {w1 , . . . , wn } be a set of n vectors in an n-dimensional vector
space V . Then the following are equivalent:


(a) B is a spanning set of V ,

(b) B is a basis for V , and

(c) B is a linearly independent set.

Proof By definition, (a) implies (b) since a basis is a spanning set with the number of
vectors equal to the dimension of the space. Theorem 5.5.3 states that a basis is a linearly
independent set; so (b) implies (c). If B is a linearly independent set of n vectors, then it
spans a subspace W of dimension n. It follows from Corollary 5.6.6(b) that W = V and
that (c) implies (a). 

Subspaces of R3 We can now classify all subspaces of R3 . They are: the origin, lines
through the origin, planes through the origin, and R3 . All of these sets were shown to be
subspaces in Example 5.1.5(a–c).
To verify that these sets are the only subspaces of R3 , note that Theorem 5.5.3 implies that
proper subspaces of R3 have dimension equal either to one or two. (The zero dimensional
subspace is the origin and the only three dimensional subspace is R3 itself.) One dimensional
subspaces of R3 are spanned by one nonzero vector and are just lines through the origin.
See Example 5.1.5(b). We claim that all two dimensional subspaces are planes through the
origin.
Suppose that W ⊂ R3 is a subspace spanned by two non-collinear vectors w1 and w2 . We
show that W is a plane through the origin using results in Chapter 2. Observe that there
is a vector N = (N1 , N2 , N3 ) perpendicular to w1 = (a11 , a12 , a13 ) and w2 = (a21 , a22 , a23 ).
Such a vector N satisfies the two linear equations:

w1 · N = a11 N1 + a12 N2 + a13 N3 = 0


w2 · N = a21 N1 + a22 N2 + a23 N3 = 0.

Chapter 2, Theorem 2.4.6 implies that a system of two linear equations in three unknowns
has a nonzero solution. Let P be the plane perpendicular to N that contains the origin. We
show that W = P and hence that the claim is valid.
The choice of N shows that the vectors w1 and w2 are both in P . In fact, since P is a
subspace it contains every vector in span{w1 , w2 }. Thus W ⊂ P . If P contained a vector
w3 ∈ R3 that is not in W , then the span of w1 , w2 , w3 would be three dimensional, forcing
P = R3 and contradicting the fact that P is a plane. Therefore P = W .
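A numerical check of this construction can be done in MATLAB; the vectors below are sample values. A normal vector to the plane spanned by two non-collinear vectors is given either by the cross product or by the null space of the 2 × 3 matrix whose rows are w1 and w2:

w1 = [1 2 3]; w2 = [0 1 1];
N = cross(w1, w2)       % perpendicular to both w1 and w2
null([w1; w2])          % spans the same line as N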


Exercises

In Exercises 1 – 3 you are given a pair of vectors v1 , v2 spanning a subspace of R3 . Decide whether
that subspace is a line or a plane through the origin. If it is a plane, then compute a vector N that
{c5.7.1a} is perpendicular to that plane.
1. v1 = (2, 1, 2) and v2 = (0, −1, 1).
Answer: The span of v1 and v2 is a plane with normal vector N = n3 (−3/2, 1, 1), where n3 is a
nonzero scalar.
Solution: If v1 and v2 are linearly independent, then they span a plane in R3 . If they are linearly
dependent, that is, if v1 = αv2 for some scalar α, then they span a line in R3 . In this case, there
is no scalar α such that (2, 1, 2) = α(0, −1, 1), so the span of v1 and v2 has dimension two. The
vector N = (n1 , n2 , n3 ) is found by observing that:

2n1 + n2 + 2n3 = 0
−n2 + n3 = 0

which is a linear system in two equations. Solve for N by row reducing the corresponding matrix:
\begin{pmatrix} 2 & 1 & 2 \\ 0 & -1 & 1 \end{pmatrix} \longrightarrow \begin{pmatrix} 1 & 0 & \frac{3}{2} \\ 0 & 1 & -1 \end{pmatrix}.
{c5.7.1b}
2. v1 = (2, 1, −1) and v2 = (−4, −2, 2).
1
Answer: The subspace spanned by v1 and v2 is a line, since, if α = − , then (2, 1, −1) =
2
α(−4, −2, 2).
{c5.7.1c}
3. v1 = (0, 1, 0) and v2 = (4, 1, 0).
Answer: The span of v1 and v2 is a plane with normal vector N = n3 (0, 0, 1), where n3 is a
nonzero scalar.
Solution: There is no scalar α such that (0, 1, 0) = α(4, 1, 0). Let N = (n1 , n2 , n3 ) be the vector
perpendicular to the plane. Then:

n2 = 0
4n1 + n2 = 0

Solve for N by substitution to find that n1 = n2 = 0, and n3 can be any nonzero real scalar.

{c5.7.2}
4. The pairs of vectors
v1 = (−1, 1, 0) and v2 = (1, 0, 1)


span a plane P in R3 . The pairs of vectors

w1 = (0, 1, 0) and w2 = (1, 1, 0)

span a plane Q in R3 . Show that P and Q are different and compute the subspace of R3 that is
given by the intersection P ∩ Q.
Answer: The intersection of the planes is P ∩ Q = s(1, −1, 0) for any real scalar s.
Solution: The planes P and Q are not equal if their normal vectors PN and QN are not parallel.
Solving by row reduction yields PN = (−1, −1, 1) and QN = (0, 0, 1), which are not parallel, so P ≠ Q.
Since P and Q are not the same plane and also are not parallel, they intersect in a line. The
intersection P ∩ Q is the simultaneous solutions to the equations for planes P and Q, that is:
−x − y + z = 0
z = 0.
Solve by row reduction or substitution to obtain x = −y and z = 0.
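A quick MATLAB check (illustration only, using the normal vectors found above) computes the intersection line as the null space of the matrix whose rows are the two normals:

PN = [-1 -1 1];         % normal to P
QN = [ 0  0 1];         % normal to Q
null([PN; QN])          % a multiple of (1, -1, 0)^t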
{c5.6.1}
5. Let A be a 7 × 5 matrix with rank(A) = r.

(a) What is the largest value that r can have?


(b) Give a condition equivalent to the system of equations Ax = b having a solution.
(c) What is the dimension of the null space of A?
(d) If there is a solution to Ax = b, then how many parameters are needed to describe the set of
all solutions?

(a) The largest value that r can have is 5, since the matrix has 5 columns. Thus, the reduced
echelon form matrix can have at most 5 pivot points.
(b) The equation Ax = b has a solution if and only if the rank of the augmented matrix (A|b) equals r. If rank(A|b)
is greater than r, then there is a pivot in the 6th column and the system is inconsistent, so there is
no solution.
(c) The null space has dimension 5 − r.
(d) The number of parameters needed to describe the solution to Ax = b is 5 − r, since 5 − r
parameters are needed to describe the solutions to Ax = 0, and the solutions to the inhomogeneous
system are obtained by adding the solutions of the homogeneous system to one solution of the
inhomogeneous system.
{c5.6.2}
6. Let
A = \begin{pmatrix} 1 & 3 & -1 & 4 \\ 2 & 1 & 5 & 7 \\ 3 & 4 & 4 & 11 \end{pmatrix}.


(a) Find a basis for the subspace C ⊂ R3 spanned by the columns of A.


(b) Find a basis for the subspace R ⊂ R4 spanned by the rows of A.
(c) What is the relationship between dim C and dim R?

Answer: (a) The vectors (1, 2, 3) and (3, 1, 4) form a basis for the subspace C of R3 spanned by
the columns of A.
(b) The vectors (1, 3, −1, 4) and (2, 1, 5, 7) form a basis for the subspace R of R4 spanned by the
rows of A.
(c) dim C = dim R.
Solution: (a) Note that
\begin{pmatrix} -1 \\ 5 \\ 4 \end{pmatrix} = \frac{16}{5} \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} - \frac{7}{5} \begin{pmatrix} 3 \\ 1 \\ 4 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 4 \\ 7 \\ 11 \end{pmatrix} = \frac{17}{5} \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} + \frac{1}{5} \begin{pmatrix} 3 \\ 1 \\ 4 \end{pmatrix}.

So the two vectors (1, 2, 3) and (3, 1, 4) span C. Since they are linearly independent, these vectors
are a basis for C and dim C = 2.
(b) Note that
(3, 4, 4, 11) = (1, 3, −1, 4) + (2, 1, 5, 7).
Therefore, {(1, 3, −1, 4), (2, 1, 5, 7)} is a basis for R and dim R = 2.
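These bases can be checked in MATLAB (this check is not part of the original solution): the pivot columns of A give a basis for C, and the nonzero rows of rref(A) span R.

A = [1 3 -1 4; 2 1 5 7; 3 4 4 11];
[R, pivots] = rref(A);
colBasis = A(:, pivots)   % columns (1,2,3)^t and (3,1,4)^t
rowBasis = R(1:2, :)      % two nonzero rows spanning the row space R
rank(A)                   % 2, so dim C = dim R = 2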

{c5.6.3}
7. Show that the vectors
v1 = (2, 3, 1) and v2 = (1, 1, 3)
are linearly independent. Show that the span of v1 and v2 forms a plane in R3 by showing that
every linear combination is the solution to a single linear equation. Use this equation to determine
the normal vector N to this plane. Verify Lemma 5.6.4 by verifying directly that v1 , v2 , N are
linearly independent vectors.
There is no scalar α such that (2, 3, 1) = α(1, 1, 3), so v1 and v2 are linearly independent. A normal
vector N = (a1 , a2 , a3 ) to the plane spanned by v1 and v2 must satisfy N · v1 = 0 and N · v2 = 0, that is,
2a1 + 3a2 + a3 = 0
a1 + a2 + 3a3 = 0.

Row reducing this system yields a1 = −8a3 and a2 = 5a3 . Let a3 = −1. Then every linear
combination x = (x1 , x2 , x3 ) of v1 and v2 satisfies

8x1 − 5x2 − x3 = 0


which is the equation of a plane in R3 . The normal vector to this plane is N = (8, −5, −1). Row
reduce the matrix whose columns are the vectors v1 , v2 , and N to verify that these vectors are
linearly independent.
\begin{pmatrix} 2 & 1 & 8 \\ 3 & 1 & -5 \\ 1 & 3 & -1 \end{pmatrix} \longrightarrow \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
So the vectors are indeed linearly independent, verifying Lemma 5.6.4.
{c5.6.3A}
8. Let W be an infinite dimensional subspace of the vector space V . Show that V is infinite
dimensional.
Let W be an infinite dimensional subspace of the vector space V . We want to show that V is
infinite dimensional. Suppose that V is finite dimensional with dim V = n. Then Corollary 5.6.3
states that any set of linearly independent vectors in V has at most n vectors. Since W is infinite
dimensional, there exists a linearly independent set of vectors in W ⊂ V with more than n vectors.
This is a contradiction and V must be infinite dimensional.

{c5.6.4} 9. (matlab) Consider the following set of vectors


w1 = (2, −2, 1),
w2 = (−1, 2, 0),
w3 = (3, −2, λ),
w4 = (−5, 6, −2),

where λ is a real number.

(a) Find a value for λ such that the dimension of span{w1 , w2 , w3 , w4 } is three. Then decide
whether {w1 , w2 , w3 } or {w1 , w2 , w4 } is a basis for R3 .
(b) Find a value for λ such that the dimension of span{w1 , w2 , w3 , w4 } is two.

(a) Answer: The span has dimension 3 for λ ≠ 2, and the set {w1 , w2 , w3 } is a basis for R3 .
Solution: Find the dimension of the span by creating a matrix with rows w1 , w2 , w3 , and w4 ,
then row reducing:
{exeq:5.6.4}
\begin{pmatrix} 2 & -2 & 1 \\ -1 & 2 & 0 \\ 3 & -2 & \lambda \\ -5 & 6 & -2 \end{pmatrix} \longrightarrow \begin{pmatrix} 1 & -1 & \frac{1}{2} \\ 0 & 1 & \frac{1}{2} \\ 0 & 0 & \lambda - 2 \\ 0 & 0 & 0 \end{pmatrix}. (5.6.4)
If λ = 2, then the dimension of the span will be 2 and if λ ≠ 2, then the dimension of the span will
be 3. For example, let λ = −1.


Verify by row reduction that the set {w1 , w2 , w3 } is a basis for R3 and that the set
{w1 , w2 , w4 } is not a basis for R3 .
(b) If λ = 2, then the dimension of span{w1 , w2 , w3 , w4 } is 2, as shown by equation (5.6.4).
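A short MATLAB check (with λ = −1 and λ = 2 as sample values) confirms these dimensions:

w1 = [2 -2 1]; w2 = [-1 2 0]; w4 = [-5 6 -2];
for lambda = [-1 2]
    w3 = [3 -2 lambda];
    fprintf('lambda = %2d, dim span = %d\n', lambda, rank([w1; w2; w3; w4]))
end
rank([w1; w2; [3 -2 -1]])   % 3: {w1,w2,w3} is a basis when lambda = -1
rank([w1; w2; w4])          % 2: {w1,w2,w4} is not a basis for R^3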

{c5.6.5} 10. (matlab) Find a basis for R5 as follows. Randomly choose vectors x1 , x2 ∈ R5 by typing

x1 = rand(5,1) and x2 = rand(5,1). Check that these vectors are linearly independent. If not,
choose another pair of vectors until you find a linearly independent set. Next choose a vector x3 at
random and check that x1 , x2 , x3 are linearly independent. If not, randomly choose another vector
for x3 . Continue until you have five linearly independent vectors — which by a dimension count
must be a basis and span R5 . Verify this comment by using MATLAB to write the vector
 
(2, 1, 3, −2, 4)t
as a linear combination of x1 , . . . , x5 .
Here is a sample MATLAB output for this problem. Type:

x1 = rand(5,1);
x2 = rand(5,1);
x3 = rand(5,1);
x4 = rand(5,1);
x5 = rand(5,1);

A summary of the results is:

x1 = x2 = x3 = x4 = x5 =
0.9501 0.7621 0.6154 0.4057 0.0579
0.2311 0.4565 0.7919 0.9355 0.3529
0.6068 0.0185 0.9218 0.9169 0.8132
0.4860 0.8214 0.7382 0.4103 0.0099
0.8913 0.4447 0.1763 0.8936 0.1389

The command A = [x1 x2 x3 x4 x5] creates the matrix with columns x1 , ..., x5 . Type rref(A)
to verify that the vectors are linearly independent. The following steps display the vector b =
(2, 1, 3, −2, 4) as a linear combination of x1 , ..., x5 . Type:

b = [2;1;3;-2;4];
C = [A b];
rref(C)


which yields:

ans =
1.0000 0 0 0 0 28.7614
0 1.0000 0 0 0 -96.6468
0 0 1.0000 0 0 70.9112
0 0 0 1.0000 0 30.0838
0 0 0 0 1.0000 -129.8826

So, in this example, b = 28.7614x1 − 96.6468x2 + 70.9112x3 + 30.0838x4 − 129.8826x5 .

{c5.6.6} 11. (matlab) Find a basis for the subspace of R5 spanned by

u1 = (1, 1, 0, 0, 1)
u2 = (0, 2, 0, 1, −1)
{MATLAB:63} u3 = (0, −1, 1, 0, 2) (5.6.5*)
u4 = (1, 4, 1, 2, 1)
u5 = (0, 0, 2, 1, 3).

Answer: The vectors (1, 0, 0, −1/2, 3/2), (0, 1, 0, 1/2, −1/2), and (0, 0, 1, 1/2, 3/2) form a basis for the
subspace spanned by u1 , . . . , u5 .
Solution: Row reduce the matrix M whose rows are u1 , u2 , u3 , u4 and u5 . Row operations do not
change the row space, so the nonzero rows of the reduced echelon matrix form a basis for span{u1 , . . . , u5 }. The com-
mand rref(M) yields:

ans =
1.0000 0 0 -0.5000 1.5000
0 1.0000 0 0.5000 -0.5000
0 0 1.0000 0.5000 1.5000
0 0 0 0 0
0 0 0 0 0

{A5.6.1}
12. Let V be the subspace of R4 defined by the equations

x1 − x3 − x4 = 0
x2 − 2x3 + x4 = 0


Consider the vectors

v1 = \begin{pmatrix} -1 \\ 1 \\ 0 \\ -1 \end{pmatrix} \quad v2 = \begin{pmatrix} 2 \\ 5 \\ 3 \\ -1 \end{pmatrix} \quad v3 = \begin{pmatrix} 2 \\ 1 \\ 1 \\ 1 \end{pmatrix} \quad v4 = \begin{pmatrix} 2 \\ -2 \\ 0 \\ 2 \end{pmatrix}
Find all bases of V of the form {vi , vj } with 1 ≤ i < j ≤ 4. (Hint: use Corollary 5.6.7.)
Answer: The sets {v1 , v3 } and {v3 , v4 } are bases.
Solution: The vector v2 does not satisfy the second equation, so it is not an element of V . The
subspace V is two-dimensional, so any two linearly independent vectors contained in it form a basis.
The possibilities are
{v1 , v3 } {v1 , v4 } {v3 , v4 }
The set {v1 , v4 } is not a basis because v4 = −2v1 . The sets {v1 , v3 } and {v3 , v4 } are linearly
independent sets of two vectors in V ; so they are bases.
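A quick numerical check (not part of the original solution) confirms which vectors lie in V and which pairs are linearly independent:

E = [1 0 -1 -1; 0 1 -2 1];            % coefficient matrix of the defining equations
v1 = [-1;1;0;-1]; v2 = [2;5;3;-1]; v3 = [2;1;1;1]; v4 = [2;-2;0;2];
E*[v1 v2 v3 v4]                        % only the second column is nonzero
rank([v1 v3]), rank([v1 v4]), rank([v3 v4])   % 2, 1, 2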
{A5.6.2}
13. Let A be an m × n matrix and B be an n × k matrix.

(a) Show that null space(B) ⊆ null space(AB).


(b) Show that nullity(B) ≤ nullity(AB).

Solution: Note that if x ∈ null space(B) then


ABx = A(Bx) = A0 = 0
so x ∈ null space(AB). It follows that null space(B) ⊆ null space(AB). The desired result imme-
diately follows from Corollary 5.6.6. Alternatively, suppose by way of contradiction that dim null
space(B) > dim null space(AB), and let {v1 , . . . , vd } be a basis for null space(B), where d = dim
null space(B). Then {v1 , . . . , vd } is a set of d > dim null space(AB) linearly independent vectors
in null space(AB), which contradicts Corollary 5.6.3.
{A5.6.3}
14. Let {v1 , v2 , v3 } and {w1 , w2 } be linearly independent sets of vectors in a vector space V . Show
that if
span{v1 , v2 , v3 } ∩ span{w1 , w2 } = {0}
then
dim(span{v1 , v2 , v3 , w1 , w2 }) = 5
Hint: First show that if v ∈ span{v1 , v2 , v3 }, w ∈ span{w1 , w2 }, and v + w = 0, then v = w = 0.
Solution: First, we verify the claim in the hint. Let v ∈ span{v1 , v2 , v3 }, w ∈ span{w1 , w2 }, and
v + w = 0. Since span{w1 , w2 } is closed under scalar multiplication, it follows that v = −w ∈
span{w1 , w2 }. Therefore
v ∈ span{v1 , v2 , v3 } ∩ span{w1 , w2 } = {0}


and v = w = 0.
Next, we show that {v1 , v2 , v3 , w1 , w2 } is a linearly independent set, which implies that it is a basis
for span{v1 , v2 , v3 , w1 , w2 }. Suppose that a1 , a2 , a3 and b1 , b2 are scalars so that

{A5.6.3_1} a1 v1 + a2 v2 + a3 v3 + b1 w1 + b2 w2 = 0 (5.6.6)

Let v = a1 v1 + a2 v2 + a3 v3 and w = b1 w1 + b2 w2 . Then v + w = 0. So, by the hint, v = w = 0.


Since {v1 , v2 , v3 } is a linearly independent set,

a1 v1 + a2 v2 + a3 v3 = v = 0

implies that a1 = a2 = a3 = 0. Similarly, {w1 , w2 } is a linearly independent set so

b1 w1 + b2 w2 = w = 0

implies that b1 = b2 = 0. Therefore, the only solution to (5.6.6) is a1 = a2 = a3 = b1 = b2 = 0.


Thus, the set {v1 , v2 , v3 , w1 , w2 } is linearly independent and therefore a basis for its span. It follows
that
dim(span{v1 , v2 , v3 , w1 , w2 }) = 3 + 2 = 5.

{A5.6.4} In Exercises 15-20 decide whether the statement is true or false, and explain your answer.
15. Every set of three vectors in R3 is a basis for R3 . Answer: False
Solution: The vectors could be linearly dependent. For example, {e1 , e2 , e1 + e2 } is not a basis
{A5.6.5} for R3 .
16. Every set of four vectors in R3 is linearly dependent. Answer: True
Solution: The dimension of R3 is three, so any set of more than three vectors in R3 is necessarily
{A5.6.6} linearly dependent.
17. If {v1 , v2 } is a basis for the plane z = 0 in R3 , then {v1 , v2 , e3 } is a basis for R3 . Answer:
True Solution: The vector e3 is not contained in the plane z = 0, so Lemma 5.6.4 implies that
{v1 , v2 , e3 } is linearly independent. Therefore, {v1 , v2 , e3 } is a basis for R3 , because any set of three
linearly independent vectors in a vector space of dimension three is a basis for that vector space.
Alternatively, {e1 , e2 , e3 } is a basis for R3 , so if w ∈ R3 , then w = a1 e1 + a2 e2 + a3 e3 for some
scalars a1 , a2 , and a3 . It follows that a1 e1 + a2 e2 is a vector in the plane z = 0, so there exist scalars
b1 and b2 so a1 e1 + a2 e2 = b1 v1 + b2 v2 . Therefore, w = b1 v1 + b2 v2 + a3 e3 , so w ∈ span{v1 , v2 , e3 }.
Therefore, {v1 , v2 , e3 } spans R3 , and is a basis for R3 because any set of three linearly independent
{A5.6.7} vectors in a vector space of dimension three is a basis for that vector space.
18. If {v1 , v2 , v3 } is a basis for R3 , the only subspaces of R3 of dimension one are span{v1 }, span{v2 },
and span{v3 }. Answer: False Solution: For example, {e1 , e2 , e3 } is a basis for R3 and span{e1 +
e2 } does not equal the x, y, or z-axes.


{A5.6.8}
19. The only subspace of R3 that contains finitely many vectors is {0}. Answer: True Solution:
If a subspace of R3 contains a nonzero vector, it must contain all scalar multiples of that vector.
{A5.6.9}
20. If U is a subspace of R3 of dimension 1 and V is a subspace of R3 of dimension 2, then
U ∩ V = {0}. Answer: False Solution: U ∩ V is always a subspace of R3 , but its dimension could
be one. For example, if U is the x-axis (U = span{e1 }) and V is the xy-plane (V = span{e1 , e2 }),
then U ∩ V = U .


6 Closed Form Solutions for Planar ODEs


In this chapter we describe several methods for finding closed form solutions to planar
constant coefficient systems of linear differential equations and we use these methods to
discuss qualitative features of phase portraits of these solutions.
In Section 6.1 we show how uniqueness to initial value problems implies that the space of
solutions to a constant coefficient system of n linear differential equations is n dimensional.
Using this observation we present a direct method for solving planar linear systems in
Section 6.2. This method extends the discussion of solutions to systems whose coefficient
matrices have distinct real eigenvalues given in Section 4.7 to the cases of complex eigenvalues
and equal real eigenvalues.
A second method for finding solutions is to use changes of coordinates to make the coefficient
matrix of the differential equation as simple as possible. This idea leads to the notion of
similarity of matrices, which is discussed in Section 6.3, and leads to the second method for
solving planar linear systems. Similarity also leads to the Jordan Normal Form theorem for
2 × 2 matrices. Both the direct method and the method based on similarity require being
able to compute the eigenvalues and eigenvectors of the coefficient matrix.
The important subject of qualitative features of phase portraits of linear systems is explored
in Section 6.4. Specifically we discuss saddles, sinks, sources and asymptotic stability. This
discussion also uses similarity and Jordan Normal Form. We find that the qualitative theory
is determined by the eigenvalues and eigenvectors of the coefficient matrix — which is
not surprising given that we can classify matrices up to similarity by just knowing their
eigenvalues and eigenvectors.
Chapter 6 ends with three optional sections. Matrix exponentials yield an elegant third way
to derive closed form solutions to n-dimensional linear ODE systems (Section 6.5). This
method leads to a proof of uniqueness of solutions to initial value problems of linear systems
(Theorem 6.5.1). A proof of the Cayley Hamilton Theorem for 2 × 2 matrices is given in
Section 6.6. In the last section, Section 6.7, we obtain solutions to second order equations
by reducing them to first order systems.


{Chap:Planar}

{S:6.1} 6.1 The Initial Value Problem


Recall that a planar autonomous constant coefficient system of ordinary differential equa-
tions has the form
{2dlinsystem}
dx/dt = ax + by
dy/dt = cx + dy (6.1.1)
where a, b, c, d ∈ R. Computer experiments using pplane9 lead us to believe that there is
just one solution to (6.1.1) satisfying the initial conditions

x(0) = x0
y(0) = y0 .

We prove existence in this section and the next by determining explicit formulas for solutions.

The Initial Value Problem for Linear Systems In this chapter we discuss how to find
solutions (x(t), y(t)) to (6.1.1) satisfying the initial values x(0) = x0 and y(0) = y0 . It is
convenient to rewrite (6.1.1) in matrix form as:

{ndlinsystem} dX/dt (t) = CX(t). (6.1.2)
The initial value problem is then stated as: Find a solution to (6.1.2) satisfying X(0) = X0
where X0 = (x0 , y0 )t . Everything that we have said here works equally well for n dimensional
systems of linear differential equations. Just let C be an n × n matrix and let X0 be an n
vector of initial conditions.

Solving the Initial Value Problem Using Superposition In Section 4.7 we discussed how to
solve (6.1.2) when the eigenvalues of C are real and distinct. Recall that when λ1 and
λ2 are distinct real eigenvalues of C with associated eigenvectors v1 and v2 , there are two
solutions to (6.1.2) given by the explicit formulas

X1 (t) = eλ1 t v1 and X2 (t) = eλ2 t v2 .

Superposition guarantees that every linear combination of these solutions

X(t) = α1 X1 (t) + α2 X2 (t) = α1 eλ1 t v1 + α2 eλ2 t v2


is a solution to (6.1.2). Since v1 and v2 are linearly independent, we can always choose
scalars α1 , α2 ∈ R to solve any given initial value problem of (6.1.2). It follows from the
uniqueness of solutions to initial value problems that all solutions to (6.1.2) are included
in this family of solutions. Uniqueness is proved in the special case of linear systems in
Theorem 6.5.1. This proof uses matrix exponentials.
We generalize this discussion so that we will be able to find closed form solutions to (6.1.2)
in Section 6.2 when the eigenvalues of C are complex or are real and equal.
Suppose that X1 (t) and X2 (t) are two solutions to (6.1.1) such that

v1 = X1 (0) and v2 = X2 (0)

are linearly independent. Then all solutions to (6.1.1) are linear combinations of these two
solutions. We verify this statement as follows. Corollary 5.6.7 of Chapter 5 states that since
{v1 , v2 } is a linearly independent set in R2 , it is also a basis of R2 . Thus for every X0 ∈ R2
there exist scalars r1 , r2 such that

X0 = r1 v1 + r2 v2 .

It follows from superposition that the solution

X(t) = r1 X1 (t) + r2 X2 (t)

is the unique solution whose initial condition vector is X0 .


We have proved that every solution to this linear system of differential equations is a linear
combination of these two solutions — that is, we have proved that the dimension of the
space of solutions to (6.1.2) is two. This proof generalizes immediately to a proof of the
following theorem for n × n systems.
{T:solvends}
Theorem 6.1.1. Let C be an n × n matrix. Suppose that

X1 (t), . . . , Xn (t)

are solutions to Ẋ = CX such that the vectors of initial conditions vj = Xj (0) are linearly
independent in Rn . Then the unique solution to the system (6.1.2) with initial condition
X(0) = X0 is
{E:genlsoln} X(t) = r1 X1 (t) + · · · + rn Xn (t), (6.1.3)
where r1 , . . . , rn are scalars satisfying

{findscalars} X0 = r1 v1 + · · · + rn vn . (6.1.4)


We call (6.1.3) the general solution to the system of differential equations Ẋ = CX. When
solving the initial value problem we find a particular solution by specifying the scalars
r1 , . . . , rn .
{C:indsoln}
Corollary 6.1.2. Let C be an n × n matrix and let

X = {X1 (t), . . . , Xn (t)}

be solutions to the differential equation Ẋ = CX such that the vectors Xj (0) are linearly
independent in Rn . Then the set of all solutions to Ẋ = CX is an n-dimensional subspace
of (C 1 )n , and X is a basis for the solution subspace.

Consider a special case of Theorem 6.1.1. Suppose that the matrix C has n linearly in-
dependent eigenvectors v1 , . . . , vn with real eigenvalues λ1 , . . . , λn . Then the functions
Xj (t) = eλj t vj are solutions to Ẋ = CX. Corollary 6.1.2 implies that the functions Xj
form a basis for the space of solutions of this system of differential equations. Indeed, the
general solution to (6.1.2) is

{e:gensoln} X(t) = r1 eλ1 t v1 + · · · + rn eλn t vn . (6.1.5)

The particular solution that solves the initial value X(0) = X0 is found by solving (6.1.4)
for the scalars r1 , . . . , rn .
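When C has n distinct real eigenvalues, this recipe translates directly into MATLAB. The matrix C and initial vector X0 below are sample values, not taken from the text:

C  = [1 2; 0 3];
X0 = [1; -1];
[V, D] = eig(C);              % columns of V are eigenvectors of C
r = V \ X0;                   % scalars in (6.1.4): X0 = r_1 v_1 + ... + r_n v_n
t = 1.5;                      % any sample time
X = V*(exp(diag(D)*t).*r)     % X(t) from (6.1.5)
expm(C*t)*X0                  % agrees with X, as a cross-check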

Exercises

In Exercises 1 – 4, consider the system of differential equations

{Ex.1.03}
dx/dt = 65x + 42y
dy/dt = −99x − 64y. (6.1.6)
{c6.1.03a}
1. Verify that
v1 = (2, −3)t and v2 = (−7, 11)t
are eigenvectors of the coefficient matrix of (6.1.6) and find the associated eigenvalues.
Answer: The vector v1 = (2, −3)t is an eigenvector with associated eigenvalue λ1 = 2. The vector
v2 = (−7, 11)t is an eigenvector with associated eigenvalue λ2 = −1.


Solution: Calculate:
\begin{pmatrix} 65 & 42 \\ -99 & -64 \end{pmatrix} \begin{pmatrix} 2 \\ -3 \end{pmatrix} = \begin{pmatrix} 4 \\ -6 \end{pmatrix} = 2 \begin{pmatrix} 2 \\ -3 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 65 & 42 \\ -99 & -64 \end{pmatrix} \begin{pmatrix} -7 \\ 11 \end{pmatrix} = \begin{pmatrix} 7 \\ -11 \end{pmatrix} = -\begin{pmatrix} -7 \\ 11 \end{pmatrix}.
{c6.1.03b}
2. Find the solution to (6.1.6) satisfying initial conditions X(0) = (−14, 22)t .
Answer: The solution to (6.1.6) with initial condition X(0) = (−14, 22)t is
X(t) = 2e−t (−7, 11)t .

Solution: We are given two linearly independent initial conditions: v1 and v2 . Therefore, by
Theorem 6.1.1, the general solution to (6.1.6) with initial condition X(0) is
X(t) = r1 e2t (2, −3)t + r2 e−t (−7, 11)t .

Find r1 and r2 by solving:
{c6.1.03eq} X(0) = r1 (2, −3)t + r2 (−7, 11)t . (6.1.7)
{c6.1.03c}
3. Find the solution to (6.1.6) satisfying initial conditions X(0) = (−3, 5)t .
The solution with initial condition X(0) = (−3, 5)t is
X(t) = 2e2t (2, −3)t + e−t (−7, 11)t .
{c6.1.03d}
4. Find the solution to (6.1.6) satisfying initial conditions X(0) = (9, −14)t .
The solution with initial condition X(0) = (9, −14)t is
X(t) = e2t (2, −3)t − e−t (−7, 11)t .

In Exercises 5 – 8, consider the system of differential equations


{Ex.1.06}
dx/dt = x − y
dy/dt = −x + y. (6.1.8)


{c6.1.06a}
5. The eigenvalues of the coefficient matrix of (6.1.8) are 0 and 2. Find the associated eigenvectors.
Answer: The eigenvector associated to λ1 = 0 is v1 = (1, 1)t , and the eigenvector associated to
λ2 = 2 is v2 = (1, −1)t .
Solution: Solve the systems
\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} v1 = 0 \quad\text{and}\quad \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} v2 = 2v2 .
{c6.1.06b}
6. Find the solution to (6.1.8) satisfying initial conditions X(0) = (2, −2)t .
Answer: The solution to (6.1.8) with initial condition X(0) = (2, −2)t is
X(t) = 2e2t (1, −1)t .

Solution: Note that initial conditions v1 and v2 are linearly independent. Therefore, by Theo-
rem 6.1.1, the general solution to (6.1.8) with initial condition X(0) is
X(t) = r1 (1, 1)t + r2 e2t (1, −1)t .
Find values for r1 and r2 by solving:
{c6.1.06eq} X(0) = r1 (1, 1)t + r2 (1, −1)t . (6.1.9)
{c6.1.06c}
7. Find the solution to (6.1.8) satisfying initial conditions X(0) = (2, 6)t .
The solution with initial condition X(0) = (2, 6)t is
X(t) = 4(1, 1)t − 2e2t (1, −1)t .
{c6.1.06d}
8. Find the solution to (6.1.8) satisfying initial conditions X(0) = (1, 0)t .
The solution with initial condition X(0) = (1, 0)t is
X(t) = (1/2)(1, 1)t + (1/2)e2t (1, −1)t .

In Exercises 9 – 12, consider the system of differential equations


{E:c6.1.1}
dx/dt = −y
dy/dt = x. (6.1.10)


{c6.1.1a}
9. Show that (x1 (t), y1 (t)) = (cos t, sin t) is a solution to (6.1.10).
Solve by evaluation: (x1 (t), y1 (t)) = (cos t, sin t) is a solution because

dx1 /dt = d/dt (cos t) = − sin t = −y1 (t)
dy1 /dt = d/dt (sin t) = cos t = x1 (t).
{c6.1.1b}
10. Show that (x2 (t), y2 (t)) = (− sin t, cos t) is a solution to (6.1.10).
To show that (x2 (t), y2 (t)) = (− sin t, cos t) is a solution, calculate
dx2 /dt = d/dt (− sin t) = − cos t = −y2 (t)
dy2 /dt = d/dt (cos t) = − sin t = x2 (t).
For use in the next two exercises, also record the initial condition vectors
v1 = (x1 (0), y1 (0)) = (cos 0, sin 0) = (1, 0)
v2 = (x2 (0), y2 (0)) = (− sin 0, cos 0) = (0, 1).
{c6.1.1c}
11. Using Exercises 9 and 10, find a solution (x(t), y(t)) to (6.1.10) that satisfies (x(0), y(0)) =
(0, 1).
If (x(0), y(0)) = (0, 1) = v2 , then Theorem 6.1.1 states that the solution is

(x(t), y(t)) = (x2 (t), y2 (t)) = (− sin t, cos t).


{c6.1.1d}
12. Using Exercises 9 and 10, find a solution (x(t), y(t)) to (6.1.10) that satisfies (x(0), y(0)) =
(1, 1).
If (x(0), y(0)) = (1, 1) = v1 + v2 , then

(x(t), y(t)) = (x1 (t), y1 (t)) + (x2 (t), y2 (t)) = (cos t − sin t, cos t + sin t).

In Exercises 13 – 14, consider the system of differential equations


{E:c6.1.2}
dx/dt = −2x + 7y
dy/dt = 5y, (6.1.11)
{c6.1.2a}
13. Find a solution to (6.1.11) satisfying the initial condition (x(0), y(0)) = (1, 0).
Answer: If (x(0), y(0)) = (1, 0) = X2 (0), then

(x(t), y(t)) = e−2t (1, 0).

Solution: The general solution to the system is

X(t) = r1 e5t (1, 1) + r2 e−2t (1, 0).


To obtain this solution, first rewrite the system of differential equations as


dX/dt = CX = \begin{pmatrix} -2 & 7 \\ 0 & 5 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}.

By inspection of C, (1, 1)t and (1, 0)t are eigenvectors with eigenvalues 5 and −2 respectively.
Therefore:
X1 (t) = e5t (1, 1)t and X2 (t) = e−2t (1, 0)t

are solutions to the differential equation. The initial values X1 (0) = (1, 1) and X2 (0) = (1, 0) are
linearly independent, so the general solution is valid. To find r1 and r2 , evaluate

X(0) = (x(0), y(0)) = r1 (1, 1) + r2 (1, 0) = (r1 + r2 , r1 ).


{c6.1.2b}
14. Find a solution to (6.1.11) satisfying the initial condition (x(0), y(0)) = (−1, 2).
If (x(0), y(0)) = (−1, 2) = 2X1 (0) − 3X2 (0), then

(x(t), y(t)) = 2e5t (1, 1) − 3e−2t (1, 0).

In Exercises 15 – 17, consider the matrix

 
C = \begin{pmatrix} -1 & -10 & -6 \\ 0 & 4 & 3 \\ 0 & -14 & -9 \end{pmatrix}.
{c6.1.3a}
15. Verify that

     
v1 = (1, 0, 0)t , v2 = (2, −1, 2)t , and v3 = (6, −3, 7)t

are eigenvectors of C and find the associated eigenvalues.


Answer: The vector v1 = (1, 0, 0)t is an eigenvector with associated eigenvalue λ1 = −1. The
vector v2 = (2, −1, 2)t is an eigenvector with associated eigenvalue λ2 = −2. The vector v3 =
(6, −3, 7)t is an eigenvector with associated eigenvalue λ3 = −3.


Solution: To verify, compute


      
−1 −10 −6 1 −1 1
 0 4 3   0  =  0  = −1  0  .
      

0 −14 −9 0 0 0
      
−1 −10 −6 2 −4 2
 0 4 3   −1  =  2  = −2  −1  .
      

0 −14 −9 2 −4 2
      
−1 −10 −6 6 −18 6
 0 4 3   −3  =  9  = −3  −3  .
      

0 −14 −9 7 −21 7
{c6.1.3b}
16. Find a solution to the system of differential equations Ẋ = CX satisfying the initial condition
X(0) = (10, −4, 9)t .
Answer: The solution
     
X(t) = 2e−t (1, 0, 0)t + e−2t (2, −1, 2)t + e−3t (6, −3, 7)t

satisfies the initial condition X(0) = (10, −4, 9)t .


Solution: Note that, as a special case of Theorem 6.1.1, three linearly independent eigenvectors
uniquely define a solution for Ẋ = CX with initial condition X(0). First, row reduce the matrix
(v1 |v2 |v3 ) to verify that the eigenvectors are linearly independent. Then, find scalars r1 , r2 , and r3
such that:
X(0) = r1 v1 + r2 v2 + r3 v3 .
In this case, r1 = 2, r2 = 1, and r3 = 1. Substitute these values into

X(t) = r1 eλ1 t v1 + r2 eλ2 t v2 + r3 eλ3 t v3

{c6.1.3c} to obtain the solution.


17. Find a solution to the system of differential equations Ẋ = CX satisfying the initial condition
X(0) = (2, −1, 3)t .
Answer: The solution
X(t) = −2e−2t (2, −1, 2)t + e−3t (6, −3, 7)t
satisfies the initial condition X(0) = (2, −1, 3)t .


Solution: Solving the system


       
2 1 2 6
X(0) =  −1  = r1  0  + r2  −1  + r3  −3 
       

3 0 2 7

yields r1 = 0, r2 = −2, and r3 = 1.


{c6.1.4A}
18. Show that for some nonzero a the function x(t) = at5 is a solution to the differential equation
ẋ = x4/5 . Then show that there are at least two solutions to the initial value problem x(0) = 0 for
this differential equation.
Answer: If a = 1/3125, then x(t) = at5 is a solution to the differential equation. The zero function
x(t) = 0 is also a solution.
Solution: To find out whether x(t) = at5 is a solution to ẋ = x4/5 , first substitute x(t) into the
left hand side of the differential equation:
dx d
= (at5 ) = 5at4 .
dt dt
Then substitute x(t) into the right hand side of the equation

x4/5 = (at5 )4/5 = a4/5 t4 .

Thus, x(t) is a solution when 5at4 = a4/5 t4 . Solve the equation 5a = a4/5 for a to find that x(t)
is a solution when a = 5−5 = 1/3125 or when a = 0. Thus, there are at least two solutions to the
differential equation: x(t) = (1/3125) t5 and x(t) = 0.

{c6.1.4} 19. (matlab) Use pplane9 to investigate the system of differential equations
{Ex.1.4}
dx/dt = −2y
dy/dt = −x + y. (6.1.12)
(a) Use pplane9 to find two independent eigendirections (and hence eigenvectors) for (6.1.12).
(b) Using (a), find the eigenvalues of the coefficient matrix of (6.1.12).
(c) Find a closed form solution to (6.1.12) satisfying the initial condition
X(0) = (4, −1)t .


(d) Study the time series of y versus t for the solution in (c) by comparing the graph of the closed
form solution obtained in (c) with the time series graph using pplane9.

(a) The eigenvectors of the coefficient matrix of (6.1.12) are v1 = (1, −1)t and v2 = (2, 1)t . Fig-
ure 19a shows the pplane9 graph of (6.1.12) and its eigendirections.
(b) Answer: The eigenvalue associated to v1 is λ1 = 2, and the eigenvalue associated to v2 is
λ2 = −1.
Solution: Compute
! ! ! ! ! !
0 −2 1 1 0 −2 2 2
=2 and = −1 .
−1 1 −1 −1 −1 1 1 1

(c) Answer: The closed form solution


X(t) = 2e2t (1, −1)t + e−t (2, 1)t

satisfies the initial condition X(0) = (4, −1)t .


Solution: The eigenvectors v1 and v2 are linearly independent, so, by Theorem 6.1.1,

X(t) = r1 eλ1 t v1 + r1 eλ2 t v2 .

Solve
X(0) = (4, −1)t = r1 (1, −1)t + r2 (2, 1)t
to obtain r1 = 2 and r2 = 1.
(d) Figure 19b shows the y vs. t graph of (6.1.12) with initial condition (4, −1). Figure 19c is a
graph of the closed form solution to (6.1.12), generated by the MATLAB commands:

t = linspace(-2.5,1.5);
y = 2*exp(2*t)*(-1) + exp(-1*t)*1;
plot(t,y)


[Figures 19a–19c: Figure 19a shows the pplane9 phase portrait of (6.1.12) with its eigendirections; Figure 19b shows the y versus t time series from pplane9 for the initial condition (4, −1); Figure 19c shows the graph of the closed form solution.]


{S:TDM} 6.2 Closed Form Solutions by the Direct Method


In Section 4.7 we showed in detail how solutions to planar systems of constant coefficient
differential equations with distinct real eigenvalues are found. This method was just reviewed
in Section 6.1 where we saw that the crucial step in solving these systems of differential
equations is the step where we find two linearly independent solutions. In this section we
discuss how to find these two linearly independent solutions when the eigenvalues of the
coefficient matrix are either complex or real and equal.
By finding these two linearly independent solutions we will find both the general solution
of the system of differential equations Ẋ = CX and a method for solving the initial value
problem
{e:eqnA}
dX/dt = CX
X(0) = X0 . (6.2.1)

The principal results of this section are summarized as follows. Let C be a 2 × 2 matrix
with eigenvalues λ1 and λ2 , and associated eigenvectors v1 and v2 .

(a) If the eigenvalues are real and v1 and v2 are linearly independent, then the general
solution to (6.2.1) is given by (6.2.2).

(b) If the eigenvalues are complex, then the general solution to (6.2.1) is given by (6.2.3)
and (6.2.4).

(c) If the eigenvalues are equal (and hence real) and there is only one linearly independent
eigenvector, then the general solution to (6.2.1) is given by (6.2.16).

Real Distinct Eigenvalues We have discussed the case when λ1 ≠ λ2 ∈ R on several


occasions. For completeness we repeat the result. The general solution is:

{E:RD2} X(t) = α1 eλ1 t v1 + α2 eλ2 t v2 . (6.2.2)

The initial value problem is solved by finding real numbers α1 and α2 such that

X0 = α1 v1 + α2 v2 .

See Section 4.7 for a detailed discussion with examples.

Complex Conjugate Eigenvalues Suppose that the eigenvalues of C are complex, that is,
suppose that λ1 = σ + iτ with τ ≠ 0 is an eigenvalue of C with eigenvector v1 = v + iw,


where v, w ∈ R2 . We claim that X1 (t) and X2 (t), where


X1 (t) = eσt (cos(τ t)v − sin(τ t)w)
{E:CC1} (6.2.3)
X2 (t) = eσt (sin(τ t)v + cos(τ t)w),
are solutions to (6.2.1) and that the general solution to (6.2.1) is:
{E:CC2} X(t) = α1 X1 (t) + α2 X2 (t), (6.2.4)
where α1 , α2 are real scalars.
There are several difficulties in deriving (6.2.3) and (6.2.4); these difficulties are related
to using complex numbers as opposed to real numbers. In particular, in the derivation of
(6.2.3) we need to define the exponential of a complex number, and we begin by discussing
this issue.

Euler’s Formula We find complex exponentials by using Euler’s celebrated formula:


{E:Euler} eiθ = cos θ + i sin θ (6.2.5)
for any real number θ. A justification of this formula is given in Exercise 1. Euler’s formula
allows us to differentiate complex exponentials, obtaining the expected result:
d iτ t d
e = (cos(τ t) + i sin(τ t))
dt dt
= τ (− sin(τ t) + i cos(τ t))
= iτ (cos(τ t) + i sin(τ t))
= iτ eiτ t .

Euler’s formula also implies that


{E:ecis} eλt = eσt+iτ t = eσt eiτ t = eσt (cos(τ t) + i sin(τ t)), (6.2.6)
where λ = σ + iτ . Most importantly, we note that
d λt
{E:eldiff} e = λeλt . (6.2.7)
dt
We use (6.2.6) and the product rule for differentiation to verify (6.2.7) as follows:
d λt d σt iτ t 
e = e e
dt dt 
σeσt eiτ t + eσt iτ eiτ t

=
= (σ + iτ )eσt+iτ t
= λeλt .


Verification that (6.2.4) is the General Solution A complex vector-valued function X(t) =
X1 (t) + iX2 (t) ∈ Cn consists of a real part X1 (t) ∈ Rn and an imaginary part X2 (t) ∈ Rn .
For such functions X(t) we define
Ẋ = Ẋ1 + iẊ2
and
CX = CX1 + iCX2 .
To say that X(t) is a solution to Ẋ = CX means that
{E:X1X2}
{L:RIsoln} Ẋ1 + iẊ2 = Ẋ = CX = CX1 + iCX2 . (6.2.8)
Lemma 6.2.1. The complex vector-valued function X(t) is a solution to Ẋ = CX if and
only if the real and imaginary parts are real vector-valued solutions to Ẋ = CX.

Proof Equating the real and imaginary parts of (6.2.8) implies that Ẋ1 = CX1 and
Ẋ2 = CX2 . 

It follows from Lemma 6.2.1 that finding one complex-valued solution to a linear differential
equation provides us with two real-valued solutions. Identity (6.2.7) implies that
X(t) = eλ1 t v1
is a complex-valued solution to (6.2.1). Using Euler’s formula we compute the real and
imaginary parts of X(t), as follows.
X(t) = e(σ+iτ )t (v + iw)
= eσt (cos(τ t) + i sin(τ t))(v + iw)
= eσt (cos(τ t)v − sin(τ t)w)
+ ieσt (sin(τ t)v + cos(τ t)w).
Since the real and imaginary parts of X(t) are solutions to Ẋ = CX, it follows that the
real-valued functions X1 (t) and X2 (t) defined in (6.2.3) are indeed solutions.
Returning to the case where C is a 2 × 2 matrix, we see that if X1 (0) = v and X2 (0) = w
are linearly independent, then Corollary 6.1.2 implies that (6.2.4) is the general solution to
{L:rievind} Ẋ = CX. The linear independence of v and w is verified using the following lemma.
Lemma 6.2.2. Let λ1 = σ + iτ with τ ≠ 0 be a complex eigenvalue of the 2 × 2 matrix C
with eigenvector v1 = v + iw where v, w ∈ R2 . Then
Cv = σv − τ w
{e:complexcoord} (6.2.9)
Cw = τ v + σw.
and v and w are linearly independent vectors.


Proof By assumption Cv1 = λ1 v1 , that is,

C(v + iw) = (σ + iτ )(v + iw)


{E:viw} (6.2.10)
= (σv − τ w) + i(τ v + σw).

Equating real and imaginary parts of (6.2.10) leads to the system of equations (6.2.9). Note
that if w = 0, then v ≠ 0 and τ v = 0. Hence τ = 0, contradicting the assumption that
τ ≠ 0. So w ≠ 0.
Note also that if v and w are linearly dependent, then v = αw. It then follows from the
previous equation that
Cw = (τ α + σ)w.
Hence w is a real eigenvector; but the eigenvalues of C are not real and C has no real
eigenvectors. 

An Example with Complex Eigenvalues Consider an example of an initial value problem for
a linear system with complex eigenvalues. Let
 
{e:complexexample}
dX/dt = \begin{pmatrix} -1 & 2 \\ -5 & -3 \end{pmatrix} X = CX, (6.2.11)
and
X0 = (1, 1)t .

The characteristic polynomial for the matrix C is:

pC (λ) = λ2 + 4λ + 13,

whose roots are λ1 = −2 + 3i and λ2 = −2 − 3i. So

σ = −2 and τ = 3.

An eigenvector corresponding to the eigenvalue λ1 is


     
v1 = (2, −1 + 3i)t = (2, −1)t + i(0, 3)t = v + iw.

It follows from (6.2.3) that

X1 (t) = e−2t (cos(3t)v − sin(3t)w)


X2 (t) = e−2t (sin(3t)v + cos(3t)w),


are solutions to (6.2.11) and X = α1 X1 + α2 X2 is the general solution to (6.2.11). To solve


the initial value problem we need to find α1 , α2 such that

X0 = X(0) = α1 X1 (0) + α2 X2 (0) = α1 v + α2 w,

that is,
(1, 1)t = α1 (2, −1)t + α2 (0, 3)t .
Therefore, α1 = 1/2 and α2 = 1/2 and
{e:complexexampleans} X(t) = e−2t (cos(3t) + sin(3t), cos(3t) − 2 sin(3t))t . (6.2.12)
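A quick numerical check of (6.2.12), not part of the original text, compares the closed form solution with the matrix exponential at a sample time:

C  = [-1 2; -5 -3];
X0 = [1; 1];
t  = 0.4;                                        % any sample time
exact = exp(-2*t)*[cos(3*t) + sin(3*t); cos(3*t) - 2*sin(3*t)]
expm(C*t)*X0                                     % agrees with exact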

Real and Equal Eigenvalues There are two types of 2 × 2 matrices that have real and
equal eigenvalues — those that are scalar multiples of the identity and those that are not.
An example of a 2 × 2 matrix that has real and equal eigenvalues is
 
λ1 1
{E:equalex} A= , λ1 ∈ R. (6.2.13)
0 λ1

The characteristic polynomial of A is

pA (λ) = λ2 − tr(A)λ + det(A) = λ2 − 2λ1 λ + λ21 = (λ − λ1 )2 .

Thus the eigenvalues of A both equal λ1 .

Only One Linearly Independent Eigenvector An important fact about the matrix A in (6.2.13)
is that it has only one linearly independent eigenvector. To verify this fact, solve the system
of linear equations
Av = λ1 v.
In matrix form this equation is
 
0 1
0 = (A − λ1 I2 )v = v.
0 0

A quick calculation shows that all solutions are multiples of v1 = e1 = (1, 0)t .
In fact, this observation is valid for any 2 × 2 matrix that has equal eigenvalues and is not
a scalar multiple of the identity, as the next lemma shows.


{L:1indeig}
Lemma 6.2.3. Let C be a 2 × 2 matrix. Suppose that C has two linearly independent
eigenvectors both with eigenvalue λ1 . Then C = λ1 I2 .

Proof Let v1 and v2 be two linearly independent eigenvectors of C, that is, Cvj = λ1 vj .
It follows from linearity that Cv = λ1 v for any linear combination v = α1 v1 + α2 v2 . Since
v1 and v2 are linearly independent and dim(R2 ) = 2, it follows that {v1 , v2 } is a basis of R2 .
Thus, every vector v ∈ R2 is a linear combination of v1 and v2 . Therefore, C is λ1 times
the identity matrix. 

Generalized Eigenvectors Suppose that C has exactly one linearly independent real eigen-
vector v1 with a double real eigenvalue λ1 . We call w1 a generalized eigenvector of C if it
satisfies the system of linear equations

{e:Cw=lw+va} (C − λ1 I2 )w1 = v1 . (6.2.14)

The matrix A in (6.2.13) has a generalized eigenvector. To verify this point solve the linear
system
(A − λ1 I2 )w1 = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} w1 = v1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}
for w1 = e2 . Note that for this matrix A, v1 = e1 and w1 = e2 are linearly independent.
The next lemma shows that this observation about generalized eigenvectors is always valid.
{L:geneig2}
Lemma 6.2.4. Let C be a 2 × 2 matrix with both eigenvalues equal to λ1 and with one
linearly independent eigenvector v1 . Let w1 be a generalized eigenvector of C, then v1 and
w1 are linearly independent.

Proof If v1 and w1 were linearly dependent, then w1 would be a multiple of v1 and hence
an eigenvector of C. But C −λ1 I2 applied to an eigenvector is zero, which is a contradiction.
Therefore, v1 and w1 are linearly independent. 

The Cayley Hamilton theorem (see Section 6.6) coupled with matrix exponentials (see Sec-
tion 6.5) lead to a simple method for finding solutions to differential equations in the mul-
tiple eigenvalue case — one that does not require solving for either the eigenvector v1 or
the generalized eigenvector w1 . We next prove the special case of Cayley-Hamilton that is
needed.


{L:specialCH}
Lemma 6.2.5. Let C be a 2 × 2 matrix with a double eigenvalue λ1 ∈ R. Then

{e:specialCH} (C − λ1 I2 )2 = 0. (6.2.15)

Proof Suppose that C has two linearly independent eigenvectors. Then Lemma 6.2.3
implies that C − λ1 I2 = 0 and hence that (C − λ1 I2 )2 = 0.
Suppose that C has one linearly independent eigenvector v1 and a generalized eigenvector
w1 . It follows from Lemma 6.2.4(a) that {v1 , w1 } is a basis of R2 . It also follows by definition
of eigenvector and generalized eigenvector that

(C − λ1 I2 )2 v1 = (C − λ1 I2 )0 = 0
(C − λ1 I2 )2 w1 = (C − λ1 I2 )v1 = 0

Hence, (6.2.15) is valid. 

Independent Solutions to Differential Equations with Equal Eigenvalues Suppose that the 2×2
matrix C has a double eigenvalue λ1 . Then the general solution to the initial value problem
Ẋ = CX and X(0) = X0 is:

{e:exp1eva} X(t) = eλ1 t [I2 + t(C − λ1 I2 )]X0 . (6.2.16)

This is the form of the solution that is given by matrix exponentials. We verify (6.2.16) by
observing that X(0) = X0 and calculating

CX(t) = eλ1 t [C + t(C 2 − λ1 C)]X0

Ẋ(t) = eλ1 t [λ1 (I2 + t(C − λ1 I2 )) + (C − λ1 I2 )]X0 .


Therefore
CX − Ẋ = eλ1 t M X0
where
M = C + t(C 2 − λ1 C) − λ1 (I2 + t(C − λ1 I2 )) − (C − λ1 I2 )
= t(C − λ1 I2 )2
= 0
by (6.2.15). A remarkable feature of formula (6.2.16) is that it is not necessary to


compute either the eigenvector of C or its generalized eigenvector.


An Example with Equal Eigenvalues Consider the system of differential equations


 
{e:shearexample}
dX/dt = \begin{pmatrix} 1 & -1 \\ 9 & -5 \end{pmatrix} X (6.2.17)
with initial value X0 = (2, 3)t .
 
The characteristic polynomial for the matrix \begin{pmatrix} 1 & -1 \\ 9 & -5 \end{pmatrix} is
pC (λ) = λ2 + 4λ + 4 = (λ + 2)2 .
Thus λ1 = −2 is an eigenvalue of multiplicity two. It follows that
 
C − λ1 I2 = \begin{pmatrix} 3 & -1 \\ 9 & -3 \end{pmatrix}
and from (6.2.16) that
    
X(t) = e−2t \begin{pmatrix} 1+3t & -t \\ 9t & 1-3t \end{pmatrix} \begin{pmatrix} 2 \\ 3 \end{pmatrix} = e−2t \begin{pmatrix} 2+3t \\ 3+9t \end{pmatrix}.
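As in the complex case, the closed form solution can be checked numerically; this check is not part of the original text:

C  = [1 -1; 9 -5];
X0 = [2; 3];
t  = 0.7;                                % any sample time
exact = exp(-2*t)*[2 + 3*t; 3 + 9*t]     % closed form found above
expm(C*t)*X0                             % agrees with exact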

Exercises

{c6.6.05}
1. Justify Euler’s formula (6.2.5) as follows. Recall the Taylor series
ex = 1 + x + (1/2!) x2 + · · · + (1/n!) xn + · · ·
cos x = 1 − (1/2!) x2 + (1/4!) x4 + · · · + ((−1)n /(2n)!) x2n + · · ·
sin x = x − (1/3!) x3 + (1/5!) x5 + · · · + ((−1)n /(2n + 1)!) x2n+1 + · · · .
Now evaluate the Taylor series eiθ and separate into real and imaginary parts.
Using the identities i2 = −1, i3 = −i, and i4 = 1, write the Taylor series:
eiθ = 1 + iθ + (1/2!)(iθ)2 + (1/3!)(iθ)3 + (1/4!)(iθ)4 + (1/5!)(iθ)5 + · · ·
= 1 + iθ − (1/2!)θ2 − (1/3!)iθ3 + (1/4!)θ4 + (1/5!)iθ5 + · · ·
= (1 − (1/2!)θ2 + (1/4!)θ4 − · · · ) + i(θ − (1/3!)θ3 + (1/5!)θ5 − · · · )
= cos θ + i sin θ.


In modern language De Moivre’s formula states that


 n
eniθ = eiθ .

In Exercises 2 - 3 use De Moivre’s formula coupled with Euler’s formula (6.2.5) to determine
{c6.6.1a} trigonometric identities for the given quantity in terms of cos θ, sin θ, cos ϕ, sin ϕ.
2. cos(θ + ϕ).
Answer: cos(θ + ϕ) = cos θ cos ϕ − sin θ sin ϕ.
Solution: Using Euler’s formula ((6.2.5)):

cos(θ + ϕ) + i sin(θ + ϕ) = ei(θ+ϕ)


= eiθ eiϕ
= (cos θ + i sin θ)(cos ϕ + i sin ϕ)
= cos θ cos ϕ + i sin θ cos ϕ + i sin ϕ cos θ − sin θ sin ϕ
= (cos θ cos ϕ − sin θ sin ϕ) + i(sin θ cos ϕ + sin ϕ cos θ)

{c6.6.1b} The real part of this formula is equal to cos(θ + ϕ).


3. sin(3θ).
Answer: sin(3θ) = 3 cos2 θ sin θ − sin3 θ.
Solution: Calculate using Euler’s formula and De Moivre’s formula:
cos(3θ) + i sin(3θ) = e3iθ
= (eiθ )3
= (cos θ + i sin θ)3
= (cos2 θ + 2i cos θ sin θ − sin2 θ)(cos θ + i sin θ)
= cos3 θ + 3i cos2 θ sin θ − 3 sin2 θ cos θ − i sin3 θ
= (cos3 θ − 3 sin2 θ cos θ) + i(3 cos2 θ sin θ − sin3 θ).
The imaginary part of this formula is equal to sin(3θ).

{c6.6.2a} In Exercises 4 – 7 compute the general solution for the given system of differential equations.
 
4. dX/dt = \begin{pmatrix} -1 & -4 \\ 2 & 3 \end{pmatrix} X.
Answer: The general solution to the differential equation is
X(t) = α1 et (2 cos(2t), sin(2t) − cos(2t))t + α2 et (2 sin(2t), − sin(2t) − cos(2t))t .

Solution: First, find the eigenvalues of C, which are the roots of the characteristic polynomial
pC (λ) = λ2 − 2λ + 5.


The eigenvalues are λ1 = 1 + 2i and λ2 = 1 − 2i. Then, find the eigenvector associated to λ1 by
solving the equation
     
−1 −4 1 + 2i 0 −2 − 2i −4
(C − λ1 I2 )v1 = − v1 = v1 = 0.
2 3 0 1 + 2i 2 2 − 2i

Solve this equation to find that


     
2 2 0
v1 = = +i
−1 − i −1 −1

is an eigenvector of C. Since the eigenvalues of C are complex, we can find the general solution
using (6.2.3) and (6.2.4). In this case, since λ1 = 1 + 2i is an eigenvalue, let σ = 1 and let τ = 2.
Then v1 = v + iw, where v = (2, −1)t and w = (0, −1)t . By (6.2.3),

X1 (t) = eσt (cos(τ t)v − sin(τ t)w) and X2 (t) = eσt (sin(τ t)v + cos(τ t)w)

are solutions to the differential equation. In this case,


      
$$X_1(t) = e^t\left[\cos(2t)\begin{pmatrix} 2 \\ -1 \end{pmatrix} - \sin(2t)\begin{pmatrix} 0 \\ -1 \end{pmatrix}\right] = e^t\begin{pmatrix} 2\cos(2t) \\ \sin(2t) - \cos(2t) \end{pmatrix},$$
$$X_2(t) = e^t\left[\sin(2t)\begin{pmatrix} 2 \\ -1 \end{pmatrix} + \cos(2t)\begin{pmatrix} 0 \\ -1 \end{pmatrix}\right] = e^t\begin{pmatrix} 2\sin(2t) \\ -\sin(2t) - \cos(2t) \end{pmatrix}.$$

The general solution consists of all linear combinations X(t) = α1 X1 (t) + α2 X2 (t).
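Since X1(0) = v = (2, −1)t and X2(0) = w = (0, −1)t, one possible MATLAB spot check (a sketch using expm from Section 6.5; the sample time is chosen only for illustration) compares X1(t) with etC X1(0):

C = [-1 -4; 2 3];
t = 0.9;                                          % sample time
X1 = exp(t)*[2*cos(2*t); sin(2*t) - cos(2*t)];    % X1(t) from the formula above
norm(X1 - expm(t*C)*[2; -1])                      % should be zero up to roundoff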
{c6.6.2b}
 
dX 8 −15
5. = X.
dt 3 −4
Answer: The general solution to the differential equation is

$$X(t) = \alpha_1\begin{pmatrix} 5e^{2t}\cos(3t) \\ e^{2t}(2\cos(3t) + \sin(3t)) \end{pmatrix} + \alpha_2\begin{pmatrix} 5e^{2t}\sin(3t) \\ e^{2t}(2\sin(3t) - \cos(3t)) \end{pmatrix}.$$

Solution: First, find the eigenvalues of C, which are the roots of the characteristic polynomial

pC (λ) = λ2 − 4λ + 13.

The eigenvalues are λ1 = 2 + 3i and λ2 = 2 − 3i. Then, find the eigenvector associated to λ1 by
solving the equation
     
$$(C - \lambda_1 I_2)v_1 = \left[\begin{pmatrix} 8 & -15 \\ 3 & -4 \end{pmatrix} - \begin{pmatrix} 2+3i & 0 \\ 0 & 2+3i \end{pmatrix}\right]v_1 = \begin{pmatrix} 6-3i & -15 \\ 3 & -6-3i \end{pmatrix}v_1 = 0.$$

Solve this equation to find that


     
$$v_1 = \begin{pmatrix} 5 \\ 2-i \end{pmatrix} = \begin{pmatrix} 5 \\ 2 \end{pmatrix} + i\begin{pmatrix} 0 \\ -1 \end{pmatrix}$$


is an eigenvector of C. Since the eigenvalues of C are complex, we can find the general solution
using (6.2.3) and (6.2.4). In this case, since λ1 = 2 + 3i is an eigenvalue, let σ = 2 and let τ = 3.
Then v1 = v + iw, where v = (5, 2)t and w = (0, −1)t . By (6.2.3),

X1 (t) = eσt (cos(τ t)v − sin(τ t)w) and X2 (t) = eσt (sin(τ t)v + cos(τ t)w)

are solutions to the differential equation. In this case,


      
$$X_1(t) = e^{2t}\left[\cos(3t)\begin{pmatrix} 5 \\ 2 \end{pmatrix} - \sin(3t)\begin{pmatrix} 0 \\ -1 \end{pmatrix}\right] = e^{2t}\begin{pmatrix} 5\cos(3t) \\ \sin(3t) + 2\cos(3t) \end{pmatrix},$$
$$X_2(t) = e^{2t}\left[\sin(3t)\begin{pmatrix} 5 \\ 2 \end{pmatrix} + \cos(3t)\begin{pmatrix} 0 \\ -1 \end{pmatrix}\right] = e^{2t}\begin{pmatrix} 5\sin(3t) \\ 2\sin(3t) - \cos(3t) \end{pmatrix}.$$

{c6.6.2c} The general solution consists of all linear combinations X(t) = α1 X1 (t) + α2 X2 (t).
 
dX 5 −1
6. = X.
dt 1 3
Answer: The general solution to the differential equation is
 
1+t −t
X(t) = e4t X0
t 1−t
where X0 is the initial condition.
Solution: First, find the eigenvalues of C, which are the roots of the characteristic polynomial

pC (λ) = λ2 − 8λ + 16 = (λ − 4)2 .

Thus, C has a double eigenvalue at λ1 = 4. Solve the system by using formula (6.2.16). Compute
 
1 −1
C − 4I2 =
1 −1
Hence     
1 −1 1+t −t
X(t) = e4t I2 + t = e4t X0
1 −1 t 1−t
{c6.6.2d} where X0 is the initial condition.
 
dX −4 4
7. = X.
dt −1 0
Answer: The general solution to the differential equation is
   
2 2t + 1
X(t) = αe−2t + βe−2t .
1 t+1

Solution: First, find the eigenvalues of C, which are the roots of the characteristic polynomial

pC (λ) = λ2 + 4λ + 4 = (λ + 2)2 .


Thus, C has a double eigenvalue at λ1 = −2. Since C is not a multiple of I2 , C has only one
linearly independent eigenvector. Find this eigenvector by solving the equation
     
−4 4 2 0 −2 4
(C − λ1 I2 )v1 = + v1 = v1 = 0,
−1 0 0 2 −1 2

obtaining v1 = (2, 1)t . Find the generalized eigenvector w1 by solving the equation (C − λ1 I2 )w1 =
v1 , that is,    
−2 4 2
w1 = .
−1 2 1
So w1 = (1, 1)t is the generalized eigenvector. Now, by (6.2.16), we know that the general solution
to Ẋ = CX when C has equal eigenvalues and only one independent eigenvector is

X(t) = eλ1 t (αv1 + β(w1 + tv1 )).

In this case,       


2 1 2
X(t) = e−2t α +β +t .
1 1 1


{S:6.5} 6.3 Similar Matrices and Jordan Normal Form


In a certain sense every 2 × 2 matrix can be thought of as a member of one of three families
of matrices. Specifically we show that every 2 × 2 matrix is similar to one of the matrices
{D:similar} listed in Theorem 6.3.4, where similarity is defined as follows.
Definition 6.3.1. The n × n matrices B and C are similar if there exists an invertible
n × n matrix P such that
C = P −1 BP.

Our interest in similar matrices stems from the fact that if we know the solutions to the
system of differential equations Ẏ = CY , then we also know the solutions to the system of
{L:simsoln} differential equations Ẋ = BX. More precisely,
Lemma 6.3.2. Suppose that B and C = P −1 BP are similar matrices. If Y (t) is a solution
to the system of differential equations Ẏ = CY , then X(t) = P Y (t) is a solution to the
system of differential equations Ẋ = BX.

Proof Since the entries in the matrix P are constants, it follows that
dX dY
=P .
dt dt
Since Y (t) is a solution to the Ẏ = CY equation, it follows that
dX
= P CY.
dt
Since Y = P −1 X and P CP −1 = B,
dX
= P CP −1 X = BX.
dt
Thus X(t) is a solution to Ẋ = BX, as claimed. 
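Lemma 6.3.2 is easy to see in action numerically. The following MATLAB sketch (the matrices B, P and the sample time below are chosen only for illustration) uses the fact, developed in Section 6.5, that the solution with initial condition Y0 is expm(t*C)*Y0:

B = [2 3; -1 -3];  P = [1 2; 0 1];   % any invertible P will do
C = inv(P)*B*P;                      % C is similar to B
Y0 = [1; -2];  t = 0.7;
Y = expm(t*C)*Y0;                    % solution of dY/dt = CY with Y(0) = Y0
X = expm(t*B)*(P*Y0);                % solution of dX/dt = BX with X(0) = P*Y0
norm(P*Y - X)                        % should be zero up to roundoff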

{L:simdettr} Invariants of Similarity


Lemma 6.3.3. Let A and B be similar 2 × 2 matrices. Then
pA (λ) = pB (λ),
det(A) = det(B),
tr(A) = tr(B),
and the eigenvalues of A and B are equal.


Proof The determinant is a function on 2 × 2 matrices that has several important proper-
ties. Recall, in particular, from Chapter 3, Theorem 3.8.2 that for any pair of 2 × 2 matrices
A and B:
{e:detprod} det(AB) = det(A) det(B), (6.3.1)

and for any invertible 2 × 2 matrix P

1
{e:detinv} det(P −1 ) = . (6.3.2)
det(P )

Let P be an invertible 2 × 2 matrix so that B = P −1 AP . Using (6.3.1) and (6.3.2) we see


that

pB (λ) = det(B − λI2 )


= det(P −1 AP − λI2 )
= det(P −1 (A − λI2 )P )
= det(A − λI2 )
= pA (λ).

Hence the eigenvalues of A and B are the same. It follows from (4.6.8) and (4.6.9) of
Section 4.6 that the determinants and traces of A and B are equal. 

For example, if
   
−1 0 1 2
A= and P = ,
0 1 1 1
then  
−1 −1 2
P =
1 −1
and  
−1 3 4
P AP = .
−2 −3

A calculation shows that

det(P −1 AP ) = −1 = det(A) and tr(P −1 AP ) = 0 = tr(A),

as stated in Lemma 6.3.3.
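This invariance is also easy to confirm at the MATLAB prompt. A minimal check on the example above (using only built-in commands):

A = [-1 0; 0 1];  P = [1 2; 1 1];
B = inv(P)*A*P;                 % the similar matrix computed above
[det(A) det(B)]                 % both determinants equal -1
[trace(A) trace(B)]             % both traces equal 0
[sort(eig(A)) sort(eig(B))]     % the eigenvalues -1 and 1 agree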


Classification of Jordan Normal Form 2 × 2 Matrices We now classify all 2 × 2 matrices


up to similarity.
{T:putinform}
Theorem 6.3.4. Let C and P = (v1 |v2 ) be 2 × 2 matrices where the vectors v1 and v2 are
specified below.

(a) Suppose that C has two linearly independent real eigenvectors v1 and v2 with real
eigenvalues λ1 and λ2 . Then
 
λ1 0
P −1 CP = .
0 λ2

(b) Suppose that C has no real eigenvectors and complex conjugate eigenvalues σ ± iτ where
τ 6= 0. Then  
−1 σ −τ
P CP = ,
τ σ
where v1 + iv2 is an eigenvector of C associated with the eigenvalue λ1 = σ − iτ .

(c) Suppose that C has exactly one linearly independent real eigenvector v1 with real eigen-
value λ1 . Then  
λ1 1
P −1 CP = ,
0 λ1
where v2 is a generalized eigenvector of C that satisfies

{e:Cw=lw+v} (C − λ1 I2 )v2 = v1 . (6.3.3)

Proof The strategy in the proof of this theorem is to determine the 1st and 2nd columns
of P −1 CP by computing (in each case) P −1 CP ej for j = 1 and j = 2. Note from the
definition of P that
P e1 = v1 and P e2 = v2 .
In addition, if P is invertible, then

P −1 v1 = e1 and P −1 v2 = e2 .

Note that if v1 and v2 are linearly independent, then P is invertible.


(a) Since v1 and v2 are assumed to be linearly independent, P is invertible. So we can
compute
P −1 CP e1 = P −1 Cv1 = λP −1 v1 = λe1 .


It follows that the 1st column of P −1 CP is


 
λ1
.
0

Similarly, the 2nd column of P −1 CP is


 
0
λ2

thus verifying (a).


(b) Lemma 6.2.2 implies that v1 and v2 are linearly independent and hence that P is
invertible. Using (6.2.9), with τ replaced by −τ , v replaced by v1 , and w replaced by w1 ,
we calculate
P −1 CP e1 = P −1 Cv1 = σP −1 v1 + τ P −1 v2 = σe1 + τ e2 ,

and
P −1 CP e2 = P −1 Cv2 = −τ P −1 v1 + σP −1 v2 = −τ e1 + σe2 .

Thus the columns of P −1 CP are


   
σ −τ
and ,
τ σ

as desired.
(c) Let v1 be an eigenvector and assume that v2 is a generalized eigenvector satisfying
(6.3.3). By Lemma 6.2.4 the vectors v1 and v2 exist and are linearly independent.
For this choice of v1 and v2 , compute

P −1 CP e1 = P −1 Cv1 = λ1 P −1 v1 = λ1 e1 ,

and
P −1 CP e2 = P −1 Cv2 = P −1 v1 + λ1 P −1 v2 = e1 + λ1 e2 .

Thus the two columns of P −1 CP are:


   
λ1 1
and .
0 λ1


(a)  normal form equation: $\dot X = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} X$;   closed form solution: $X(t) = \begin{pmatrix} e^{\lambda_1 t} & 0 \\ 0 & e^{\lambda_2 t} \end{pmatrix} X_0$
(b)  normal form equation: $\dot X = \begin{pmatrix} \sigma & -\tau \\ \tau & \sigma \end{pmatrix} X$;   closed form solution: $X(t) = e^{\sigma t}\begin{pmatrix} \cos(\tau t) & -\sin(\tau t) \\ \sin(\tau t) & \cos(\tau t) \end{pmatrix} X_0$
(c)  normal form equation: $\dot X = \begin{pmatrix} \lambda_1 & 1 \\ 0 & \lambda_1 \end{pmatrix} X$;   closed form solution: $X(t) = e^{\lambda_1 t}\begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix} X_0$

{T:3sys} Table 2: Solutions to Jordan normal form ODEs with X(0) = X0 .

Solutions of Jordan Normal Form Equations The eigenvectors of the matrices in Table 2(a) are v1 = (1, 0)t and v2 = (0, 1)t . Hence, the closed form solution of (a) in that table follows from the direct solution in (6.2.2).
The eigenvectors of the matrices in Table 2(b) are v1 = v + iw and v2 = v − iw, where v = (0, 1)t and w = (1, 0)t . Hence, the closed form solution of (b) in that table follows from the direct solution in (6.2.3).
Finally, the eigenvector and generalized eigenvector of the matrices in Table 2(c) are v1 = (1, 0)t and w1 = (0, 1)t . Hence, the closed form solution of (c) in that table follows from the direct solution in (6.2.16).

Closed Form Solutions Using Similarity We now use Lemma 6.3.2, Theorem 6.3.4, and
the explicit solutions to the normal form equations Table 2 to find solutions for Ẋ = CX
where C is any 2 × 2 matrix. The idea behind the use of similarity to solve systems of
ODEs is to transform a given system into another normal form system whose solution is
already known. This method is very much like the technique of change of variables used
when finding indefinite integrals in calculus.
We suppose that we are given a system of differential equations Ẋ = CX and use Theo-
rem 6.3.4 to transform C by similarity to one of the normal form matrices listed in that
theorem. We then solve the transformed equation (see Table 2) and use Lemma 6.3.2 to
transform the solution back to the given system.
For example, suppose that C has a complex eigenvalue σ −iτ with corresponding eigenvector
v + iw. Then Theorem 6.3.4 states that
 
−1 σ −τ
B = P CP = ,
τ σ
where P = (v|w) is an invertible matrix. Using Table 2 the general solution to the system


of equations Ẏ = BY is:
  
σt cos(τ t) − sin(τ t) α
Y (t) = e .
sin(τ t) cos(τ t) β

Lemma 6.3.2 states that


X(t) = P Y (t)
is the general solution to the Ẋ = CX system. Moreover, we can solve the initial value
problem by solving  
α
X0 = P Y (0) = P
β
for α and β. In particular,  
α
= P −1 X0 .
β
Putting these steps together implies that
 
cos(τ t) − sin(τ t)
{e:exp0ev} X(t) = eσt P P −1 X0 (6.3.4)
sin(τ t) cos(τ t)

is the solution to the initial value problem.
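The steps leading to (6.3.4) translate directly into MATLAB. The following sketch (the variable names are ours; it assumes C has a complex conjugate pair of eigenvalues and uses eig together with expm from Section 6.5 only as a check) builds P from an eigenvector and evaluates (6.3.4) at one sample time:

C = [-1 2; -5 -3];  X0 = [1; 1];  t = 0.6;
[V, D] = eig(C);
[~, j] = min(imag(diag(D)));       % pick the eigenvalue sigma - i*tau (negative imaginary part)
z = V(:, j);                       % an eigenvector for sigma - i*tau
v = real(z);  w = imag(z);
sigma = real(D(j,j));  tau = -imag(D(j,j));
P = [v w];
R = [cos(tau*t) -sin(tau*t); sin(tau*t) cos(tau*t)];
X = exp(sigma*t)*P*R*inv(P)*X0;    % formula (6.3.4)
norm(X - expm(t*C)*X0)             % should be zero up to roundoff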

The Example with Complex Eigenvalues Revisited Recall the example in (6.2.11)
 
dX −1 2
= X,
dt −5 −3

with initial values  


1
X0 = .
1
This linear system has a complex eigenvalue σ −iτ = −2−3i with corresponding eigenvector
 
2
v + iw = .
−1 − 3i

Thus the matrix P that transforms C into normal form is


   
$$P = \begin{pmatrix} 2 & 0 \\ -1 & -3 \end{pmatrix} \qquad\text{and}\qquad P^{-1} = \frac{1}{6}\begin{pmatrix} 3 & 0 \\ -1 & -2 \end{pmatrix}.$$


It follows from (6.3.4) that the solution to the initial value problem is
 
$$X(t) = e^{-2t}\,P\begin{pmatrix} \cos(3t) & -\sin(3t) \\ \sin(3t) & \cos(3t) \end{pmatrix}P^{-1}X_0
= \frac{1}{6}e^{-2t}\begin{pmatrix} 2 & 0 \\ -1 & -3 \end{pmatrix}\begin{pmatrix} \cos(3t) & -\sin(3t) \\ \sin(3t) & \cos(3t) \end{pmatrix}\begin{pmatrix} 3 & 0 \\ -1 & -2 \end{pmatrix}X_0.$$
A calculation gives
$$X(t) = \frac{1}{2}e^{-2t}\begin{pmatrix} 2 & 0 \\ -1 & -3 \end{pmatrix}\begin{pmatrix} \cos(3t) & -\sin(3t) \\ \sin(3t) & \cos(3t) \end{pmatrix}\begin{pmatrix} 1 \\ -1 \end{pmatrix}
= e^{-2t}\begin{pmatrix} \cos(3t) + \sin(3t) \\ \cos(3t) - 2\sin(3t) \end{pmatrix}.$$

Thus the solution to (6.2.11) that we have found using similarity of matrices is identical to
the solution (6.2.12) that we found by the direct method.
Solving systems with either distinct real eigenvalues or equal eigenvalues works in a similar
fashion.

Exercises

{c6.5.1}
1. Suppose that the matrices A and B are similar and the matrices B and C are similar. Show
that A and C are also similar matrices.
Since A and B are similar and B and C are similar, A = P −1 BP for some matrix P , and B =
Q−1 CQ for some matrix Q. Therefore,

A = P −1 BP = P −1 Q−1 CQP.

By Proposition 3.7.3, (QP )−1 = P −1 Q−1 , so

A = (QP )−1 C(QP )

thus, A and C are similar.

{c6.5.2}
2. Use (4.6.13) in Chapter 3 to verify that the traces of similar matrices are equal.
Let A and B be similar matrices such that A = P −1 BP for some matrix P . Then, using (4.6.13),

tr(A) = tr(P −1 BP ) = tr(BP −1 P ) = tr(B).

In Exercises 3 – 4 determine whether or not the given matrices are similar, and why.


{c6.5.3a}
   
1 2 2 −2
3. A = and B = .
3 4 −3 8
Answer: Matrices A and B are not similar.
Solution: When two matrices are similar, the traces are equal. In this case, tr(A) = 5 and
{c6.5.3b} tr(B) = 10, so the matrices are not similar.
   
2 2 4 −2
4. C = and D = .
2 2 −2 4
Answer: Matrices C and D are not similar.
Solution: The traces of the matrices are unequal; tr(C) = 4 and tr(D) = 8.
{c6.5.4}
5. Let B = P −1 AP so that A and B are similar matrices. Suppose that v is an eigenvector of B
with eigenvalue λ. Show that P v is an eigenvector of A with eigenvalue λ.
Since, A and B are similar matrices, if Bv = λv, then
A(P v) = P P −1 AP v = P Bv = λ(P v).
Thus, P v is an eigenvector of A with eigenvalue λ.
{c6.5.5}
6. Which n × n matrices are similar to In ?
Answer: In is similar only to itself.
Solution: If A is similar to In , then A = P −1 In P = P −1 P = In .
{c6.3.1}
7. Solve the initial value problem
ẋ = 2x + 3y
ẏ = −3x + 2y
where x(0) = 1 and y(0) = −2.
Answer: The solution to the initial value problem (x(0), y(0)) = (1, −2) for this system is:
$$\begin{pmatrix} x(t) \\ y(t) \end{pmatrix} = \begin{pmatrix} e^{2t}(\cos(3t) - 2\sin(3t)) \\ -e^{2t}(\sin(3t) + 2\cos(3t)) \end{pmatrix}.$$
Solution: Let σ = 2 and τ = −3. Then,
$$\dot{x} = \sigma x - \tau y, \qquad \dot{y} = \tau x + \sigma y,$$
so, according to Table 2,
$$\begin{pmatrix} x(t) \\ y(t) \end{pmatrix} = \begin{pmatrix} e^{\sigma t}(x_0\cos(\tau t) - y_0\sin(\tau t)) \\ e^{\sigma t}(x_0\sin(\tau t) + y_0\cos(\tau t)) \end{pmatrix} = \begin{pmatrix} e^{2t}(x_0\cos(-3t) - y_0\sin(-3t)) \\ e^{2t}(x_0\sin(-3t) + y_0\cos(-3t)) \end{pmatrix}.$$


{c6.3.2}
8. Solve the initial value problem
ẋ = −2x + y
ẏ = −2y
where x(0) = 4 and y(0) = −1.
Answer: The solution for the initial value problem x(0) = 4 and y(0) = −1 for this system is:

$$\begin{pmatrix} x(t) \\ y(t) \end{pmatrix} = \begin{pmatrix} e^{-2t}(4 - t) \\ -e^{-2t} \end{pmatrix}.$$

Solution: Let λ = −2. Then, by Table 2

ẋ = λx + y
ẏ = λy
so,
$$\begin{pmatrix} x(t) \\ y(t) \end{pmatrix} = \begin{pmatrix} e^{\lambda t}(x_0 + y_0 t) \\ e^{\lambda t}y_0 \end{pmatrix} = \begin{pmatrix} e^{-2t}(x_0 + y_0 t) \\ e^{-2t}y_0 \end{pmatrix}.$$

{c6.3.3} 9. (matlab) Use pplane9 to plot phase plane portraits for each of the three types of linear systems
(a), (b) and (c) in Table 2. Based on this computer exploration answer the following questions:

(i) If a solution to that system spirals about the origin, is the system of differential equations of
type (a), (b) or (c)?
(ii) How many eigendirections are there for equations of type (c)?
(iii) Let (x(t), y(t)) be a solution to one of these three types of systems and suppose that y(t)
oscillates up and down infinitely often. Then (x(t), y(t)) is a solution for which type of system?

Figure 9a shows the graph of the system

ẋ = λx
ẏ = µy

where λ = 2 and µ = −3.


Figure 9b shows the graph of the system

ẋ = σx − τ y
ẏ = τ x + σy

where σ = 2 and τ = 3.


Figure 9c shows the graph of the system

ẋ = λx + y
ẏ = λy

where λ = 2.
(a) The system is of type (b) if a solution spirals about the origin.
(b) Equations of type (c) have one eigendirection.
(c) If y(t) oscillates up and down infinitely often, then (x(t), y(t)) is a solution to a system of type
(b).

[Figure 9a]    [Figure 9b]    [Figure 9c]
(pplane9 phase portraits in the (x, y)-plane for the three systems described above.)

{a6.3.1}
10. Use pplane9 to verify that the nonzero solutions to the system
dX
= CX
dt
where  
0 −1
{a6.3.1_C} C= (6.3.5)
1 0
are circles around the origin. Let  
2 1
P =
3 4
and let  
−2.8 −3.4
B = P −1 CP =
2.6 2.8
Describe the solutions to the system
dX
{a6.3.1_B} = BX. (6.3.6)
dt
What is the relationship between solutions of (6.3.5) to solutions of (6.3.6)?


[Figure 10a]    [Figure 10b]
(pplane9 phase portraits; the plotted forward and backward orbits are nearly closed curves.)

Solution: The phase plane portrait of (6.3.5) consists of circles centered at the origin, while the portrait of (6.3.6) consists of ellipses centered at the origin. The relationship follows from Lemma 6.3.2: since B = P −1 CP , if X(t) is a (circular) solution of (6.3.5), then P −1 X(t) is a solution of (6.3.6); that is, the trajectories of (6.3.6) are the images of the circles under the linear map P −1 .


{S:6.7} 6.4 Sinks, Saddles, and Sources


The qualitative theory of autonomous differential equations begins with the observation
that many important properties of solutions to constant coefficient systems of differential
equations
dX
{e:C2} = CX (6.4.1)
dt
are unchanged by similarity.
We call the origin of the linear system (6.4.1) a sink (or asymptotically stable) if all solutions
X(t) satisfy
lim X(t) = 0.
t→∞

The origin is a source if all nonzero solutions X(t) satisfy

lim ||X(t)|| = ∞.
t→∞

Finally, the origin is a saddle if some solutions limit to 0 and some solutions grow infinitely
large. Recall also from Lemma 6.3.2 that if B = P −1 CP , then P −1 X(t) is a solution to
Ẋ = BX whenever X(t) is a solution to (6.4.1). Since P −1 is a matrix of constants that do
not depend on t, it follows that

$$\lim_{t\to\infty} X(t) = 0 \iff \lim_{t\to\infty} P^{-1}X(t) = 0$$
and
$$\lim_{t\to\infty} \|X(t)\| = \infty \iff \lim_{t\to\infty} \|P^{-1}X(t)\| = \infty.$$
It follows that the origin is a sink (or saddle or source) for (6.4.1) if and only if it is a sink
(or saddle or source) for Ẋ = BX.
{C:asympstlin}
Theorem 6.4.1. Consider the system (6.4.1) where C is a 2 × 2 matrix.

(a) If the eigenvalues of C have negative real part, then the origin is a sink.

(b) If the eigenvalues of C have positive real part, then the origin is a source.

(c) If one eigenvalue of C is positive and one is negative, then the origin is a saddle.

Proof Lemma 6.3.3 states that the similar matrices B and C have the same eigenvalues.
Moreover, as noted the origin is a sink, saddle, or source for B if and only if it is a sink,
saddle, or source for C. Thus, we need only verify the theorem for normal form matrices as
given in Table 2.


(a) If the eigenvalues λ1 and λ2 are real and there are two independent eigenvectors, then
Chapter 6, Theorem 6.3.4 states that the matrix C is similar to the diagonal matrix
 
λ1 0
B= .
0 λ2

The general solution to the differential equation Ẋ = BX is


x1 (t) = α1 eλ1 t and x2 (t) = α2 eλ2 t .
Since
lim eλ1 t = 0 = lim eλ2 t ,
t→∞ t→∞
when λ1 and λ2 are negative, it follows that
lim X(t) = 0
t→∞

for all solutions X(t), and the origin is a sink. Note that if both of the eigenvalues are
positive, then X(t) will undergo exponential growth and the origin is a source.
(b) If the eigenvalues of C are the complex conjugates σ ±iτ where τ 6= 0, then Chapter 6,
Theorem 6.3.4 states that after a similarity transformation (6.4.1) has the form
 
σ −τ
Ẋ = X,
τ σ

and solutions for this equation have the form (6.3.4) of Chapter 6, that is,
 
cos(τ t) − sin(τ t)
X(t) = eσt X0 = eσt Rτ t X0 ,
sin(τ t) cos(τ t)

where Rτ t is a rotation matrix (recall (3.2.2) of Chapter 3). It follows that as time evolves
the vector X0 is rotated about the origin and then expanded or contracted by the factor
eσt . So when σ < 0, lim X(t) = 0 for all solutions X(t). Hence the origin is a sink and
t→∞
when σ > 0 solutions spiral away from the origin and the origin is a source.
(c) If the eigenvalues are both equal to λ1 and if there is only one independent eigenvector,
then Chapter 6, Theorem 6.3.4 states that after a similarity transformation (6.4.1) has the
form  
λ1 1
Ẋ = X,
0 λ1
whose solutions are  
tλ 1 t
X(t) = e X0
0 1


using Table 2(c). Note that the functions eλ1 t and teλ1 t both have limits equal to zero
as t → ∞. In the second case, use l'Hôpital's rule and the assumption that −λ1 > 0 to
compute
$$\lim_{t\to\infty}\frac{t}{e^{-\lambda_1 t}} = -\lim_{t\to\infty}\frac{1}{\lambda_1 e^{-\lambda_1 t}} = 0.$$

Hence lim X(t) = 0 for all solutions X(t) and the origin is asymptotically stable. Note that
t→∞
initially ||X(t)|| can grow since t is increasing. But eventually exponential decay wins out
and solutions limit on the origin. Note that solutions grow exponentially when λ1 > 0. 

Theorem 6.4.1 shows that the qualitative features of the origin for (6.4.1) depend only on
the eigenvalues of C and not on the formulae for solutions to (6.4.1). This is a much simpler
{T:det_trace} calculation. However, Theorem 6.4.2 simplifies the calculation substantially further.
Theorem 6.4.2. (a) If det(C) < 0, then 0 is a saddle.
(b) If det(C) > 0 and tr(C) < 0, then 0 is a sink.
(c) If det(C) > 0 and tr(C) > 0, then 0 is a source.

Proof Recall from (4.6.9) that det(C) is the product of the eigenvalues of C. Hence,
if det(C) < 0, then the signs of the eigenvalues must be opposite, and we have a saddle.
Next, suppose det(C) > 0. If the eigenvalues are real, then the eigenvalues are either both
positive (a source) or both negative (a sink). Recall from (4.6.8) that tr(C) is the sum of the
eigenvalues, so the sign of the trace determines the sign of the eigenvalues. Finally, assume
the eigenvalues are complex conjugates σ ± iτ . Then det(C) = σ 2 + τ 2 > 0 and tr(C) = 2σ.
Thus, the sign of the real parts of the complex eigenvalues is given by the sign of tr(C). 
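Theorem 6.4.2 is easy to turn into a short MATLAB test. The following function is only a sketch of one possible implementation (the name classify_origin and the handling of the borderline cases det(C) = 0 or tr(C) = 0, which the theorem does not cover, are ours):

function kind = classify_origin(C)
% classify_origin  Classify the origin of X' = CX using det(C) and tr(C).
d = det(C);  T = trace(C);
if d < 0
    kind = 'saddle';
elseif d > 0 && T < 0
    kind = 'sink';
elseif d > 0 && T > 0
    kind = 'source';
else
    kind = 'borderline case (not covered by Theorem 6.4.2)';
end
end

For example, classify_origin([-1 -55; 55 -1]) returns 'sink'.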

Time Series It is instructive to note how the time series x1 (t) damps down to the origin
in the three cases listed in Theorem 6.4.1. In Figure 19 we present the time series for the
three coefficient matrices:
 
−2 0
C1 = ,
0 −1
 
−1 −55
C2 = ,
55 −1
 
−2 1
C3 = .
0 −2
In this figure, we can see the exponential decay to zero associated with the unequal real
eigenvalues of C1 ; the damped oscillation associated with the complex eigenvalues of C2 ;


and the initial growth of the time series due to the te−2t term followed by exponential decay
to zero in the equal eigenvalue C3 example.

{F:oscil} Figure 19: Time series for different sinks.

Sources Versus Sinks The explicit form of solutions to planar linear systems shows that
solutions with initial conditions near the origin grow exponentially in forward time when
the origin of (6.4.1) is a source. We can prove this point geometrically, as follows.
The phase planes of sources and sinks are almost the same; they have the same trajectories
but the arrows are reversed. To verify this point, note that
{e:C3} Ẋ = −CX (6.4.2)
is a sink when (6.4.1) is a source; observe that the trajectories of solutions of (6.4.1) are the
same as those of (6.4.2) — just with time running backwards. For let X(t) be a solution
to (6.4.1); then X(−t) is a solution to (6.4.2). See Figure 20 for plots of Ẋ = BX and
Ẋ = −BX where  
−1 −5
{E:SS} B= . (6.4.3)
5 −1
So when we draw schematic phase portraits for sinks, we automatically know how to draw
schematic phase portraits for sources. The trajectories are the same — but the arrows point
in the opposite direction.

Phase Portraits for Saddles Next we discuss the phase portraits of linear saddles. Using
pplane9, draw the phase portrait of the saddle
ẋ = 2x + y
{e:saddlet} (6.4.4)
ẏ = −x − 3y,



{F:SS} Figure 20: (Left) Sink Ẋ = BX where B is given in (6.4.3). (Right) Source Ẋ = −BX.

as in Figure 21. The important feature of saddles is that there are special trajectories (the
{D:stablemfld} eigendirections) that limit on the origin in either forward or backward time.
Definition 6.4.3. The stable manifold or stable orbit of a saddle consists of those trajec-
tories that limit on the origin in forward time; the unstable manifold or unstable orbit of a
saddle consists of those trajectories that limit on the origin in backward time.

Let λ1 < 0 and λ2 > 0 be the eigenvalues of a saddle with associated eigenvectors v1 and
v2 . The stable orbits are given by the solutions X(t) = ±eλ1 t v1 and the unstable orbits are
given by the solutions X(t) = ±eλ2 t v2 .

Stable and Unstable Orbits using pplane9 The program pplane9 is programmed to draw the
stable and unstable orbits of a saddle on command. Although the principal use of this
feature is seen when analyzing nonlinear systems, it is useful to introduce this feature here.
As an example, load the linear system (6.4.4) into pplane9 and click on Proceed. Now pull
down the PPLANE9 Options menu and click on Find an equilibrium. Click the cross hairs in
the PPLANE9 Display window on a point near the origin; pplane9 responds by opening a new
window — the PPLANE9 Equilibrium point data window — and by putting a small yellow
circle about the origin. The circle indicates that the numerical algorithm programmed into
pplane9 has detected an equilibrium near the chosen point. A new window opens and displays
the message There is a saddle point at (0, 0). This window also displays the coefficient matrix



{F:linsaddle} Figure 21: (Left) Saddle phase portrait. (Right) First quadrant solution time series.

(called the Jacobian at the equilibrium) and its eigenvalues and eigenvectors. This process
numerically verifies that the origin is a saddle (a fact that could have been verified in a more
straightforward way).
Now pull down the PPLANE9 Options menu again and click on Plot stable and unstable
orbits. Next click on the mouse when the cross hairs are within the yellow circle and pplane9
responds by drawing the stable and unstable orbits. The result is shown in Figure 21(left).
On this figure we have also plotted one trajectory from each quadrant; thus obtaining the
phase portrait of a saddle. On the right of Figure 21 we have plotted a time series of the first
quadrant solution. Note how the x time series increases exponentially to +∞ in forward
time and the y time series decreases in forward time while going exponentially towards
−∞. The two time series together give the trajectory (x(t), y(t)) that in forward time is
asymptotic to the line given by the unstable eigendirection.

Exercises

In Exercises 1 – 3 determine whether or not the equilibrium at the origin in the system of differential
equations Ẋ = CX is asymptotically stable.
{E:stabmata}


 
1 2
1. C = .
4 1
Answer: The origin is not asymptotically stable.
Solution: Theorem 6.4.1 states that the origin is a stable equilibrium only if all eigenvalues have
negative real part. The characteristic polynomial of C is pC(λ) = λ² − 2λ − 7. Thus, the eigenvalues
are λ1 = 1 + 2√2 and λ2 = 1 − 2√2. Since λ1 > 0, the origin is not stable.
{E:stabmatb}
 
−1 2
2. C = .
−4 −1
Answer: The origin is asymptotically stable.
Solution: The characteristic polynomial of the matrix is pC(λ) = λ² + 2λ + 9. Thus, the eigenvalues
are λ1 = −1 + 2√2 i and λ2 = −1 − 2√2 i. Both of these have negative real part, so the origin is stable.
{E:stabmatc}
 
2 1
3. C = .
1 −5
Answer: The origin is not asymptotically stable.
Solution: The characteristic polynomial of the matrix is pC(λ) = λ² + 3λ − 11. Thus, the eigenvalues
are λ1 = (−3 + √53)/2 and λ2 = (−3 − √53)/2. Since λ1 > 0, the origin is not stable.

In Exercises 4 – 9 determine whether the equilibrium at the origin in the system of differential
equations Ẋ = CX is a sink, a saddle or a source.
{E:sisasoa}
 
−2 2
4. C = .
0 −1
Answer: The origin of the system Ẋ = CX is a sink.
Solution: The characteristic polynomial of C is pC (λ) = λ2 + 3λ + 2. So the eigenvalues are
λ1 = −1 and λ2 = −2. Since both eigenvalues have negative real part, the origin is a sink.
{E:sisasob}
 
3 5
5. C = .
0 −2
Answer: The origin of the system Ẋ = CX is a saddle.
Solution: The characteristic polynomial of C is pC (λ) = λ2 − λ − 6. So the eigenvalues are λ1 = 3
and λ2 = −2. Since one eigenvalue is negative and one is positive, the origin is a saddle.
{E:sisasoc}
 
4 2
6. C = .
−1 2
Answer: The origin of the system Ẋ = CX is a source.


Solution: The characteristic polynomial of C is pC (λ) = λ2 − 6λ + 10. So the eigenvalues are


{E:sisasod} λ = 3 ± i. Since both eigenvalues have positive real part, the origin is a source.
 
8 0
7. C = .
−5 3
Answer: The origin of the system Ẋ = CX is a source.
Solution: The characteristic polynomial of C is pC (λ) = λ2 − 11λ + 24. So the eigenvalues are
{E:sisasoe} λ1 = 8 and λ2 = 3. Since both eigenvalues have positive real part, the origin is a source.
 
9 −11
8. C = .
−11 9
Answer: The origin of the system Ẋ = CX is a saddle.
Solution: The characteristic polynomial of C is pC (λ) = λ2 − 18λ − 40. So the eigenvalues are
{E:sisasof} λ1 = 20 and λ2 = −2. Since one eigenvalue is positive and one is negative, the origin is a saddle.
 
1 −8
9. C = .
2 1
Answer: The origin of the system Ẋ = CX is a source.
Solution: The characteristic polynomial of C is pC (λ) = λ2 − 2λ + 17. So the eigenvalues are
λ = 1 ± 4i. Since both eigenvalues have positive real part, the origin is a source.
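Each of the preceding classifications can be confirmed at the MATLAB prompt with eig (a small sketch; the matrix shown is the one from Exercise 9, and the same two lines work for the others):

C = [1 -8; 2 1];
real(eig(C))      % both real parts are 1 > 0, so the origin is a source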

In Exercises 10 – 13 use pplane9 to determine whether the origin is a saddle, sink, or source in
Ẋ = CX for the given matrix C.
 
10 −2.7
10. (matlab) C = .
{E:sssa} 4.32 1.6
Answer: The origin is a source.
Solution: Enter the system into pplane9. Then compute trajectories with different initial conditions
and note that all trajectories go away from the origin in forward time.
 
−10 −2.7
11. (matlab) C = .
{E:sssb} 4.32 1.6
Answer: The origin is a saddle.
Solution: Enter the system into pplane9. Then compute trajectories with different initial conditions
and note that some trajectories approach the origin in forward time, while some approach the origin
in backward time.
 
−1 2
12. (matlab) C = .
{E:sssc} 4.76 1.5
Answer: The origin is a saddle.


Solution: Enter the system into pplane9. Then compute trajectories with different initial conditions
and note that some trajectories approach the origin in forward time, while some approach the origin
in backward time.
 
−2 −2
13. (matlab) C = .
{E:sssd} 4 1
Answer: The origin is a sink.
Solution: Enter the system into pplane9. Then compute trajectories with different initial conditions
and note that all trajectories approach the origin in forward time.

In Exercises 14 – 15 the given matrices B and C are similar. Observe that the phase portraits of
the systems Ẋ = BX and Ẋ = CX are qualitatively the same in two steps.

(a) Use MATLAB to find the 2 × 2 matrix P such that B = P −1 CP . Use map to understand how
the matrix P moves points in the plane.
(b) Use pplane9 to observe that P moves solutions of Ẋ = BX to the solution of Ẋ = CX. Write
a sentence or two describing your results.
   
2 3 1 1 −1
14. (matlab) C = and B = .
{E:sima} −1 −3 2 −9 −3
(a) Answer:
$$P = \frac{\sqrt{2}}{2}\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}.$$
The matrix P rotates vectors in the plane by 45◦ counterclockwise.
Solution: Enter matrices B and C into MATLAB. Then type

[Q,D] = eig(C);

Since C has distinct eigenvalues (you can check this using MATLAB), the matrix D is a diagonal
matrix with the eigenvalues of C along its diagonal. This matrix is similar to C. Indeed, D =
Q−1 CQ. The matrix D is also similar to B, and D = R−1 BR. Find the matrix R by typing

[R,D] = eig(B);

We know that D = R−1 BR and D = Q−1 CQ for the same diagonal matrix D. Therefore,

B = RDR−1 = R(Q−1 CQ)R−1 = (QR−1 )−1 C(QR−1 ).

Thus, P = QR−1 , and typing

P = Q*inv(R)


in MATLAB yields P .
(b) Answer: The solutions of Ẋ = CX are found by rotating the solutions of Ẋ = BX by 45◦
counterclockwise.
Solution: Enter the system Ẋ = BX into pplane9. Then enter the system Ẋ = CX. Note that
both systems are saddles. You can plot the stable and unstable trajectories at the origin in each
system to see that the trajectories for Ẋ = CX appear to be about 45◦ counterclockwise of those
for Ẋ = BX. Thus, if X(t) is a solution to Ẋ = BX, then P X(t) is a solution to Ẋ = CX,
verifying Lemma 6.3.2.
   
−1 5 −1 0.5
15. (matlab) C = and B = .
{E:simb} −5 −1 −50 −1
(a) Answer:  
7.1063 0
P ≈ .
0 0.7106
The matrix P stretches the x-coordinate of a vector and shrinks the y-coordinate.
Solution: Enter matrices B and C into MATLAB. Then type

[Q,D] = eig(C);

Since C has distinct eigenvalues (you can check this using MATLAB), the matrix D is a diagonal
matrix with the eigenvalues of C along its diagonal. This matrix is similar to C. Indeed, D =
Q−1 CQ. The matrix D is also similar to B, and D = R−1 BR. Find the matrix R by typing

[R,D] = eig(B);

We know that D = R−1 BR and D = Q−1 CQ for the same diagonal matrix D. Therefore,

B = RDR−1 = R(Q−1 CQ)R−1 = (QR−1 )−1 C(QR−1 ).

Thus, P = QR−1 , and typing

P = Q*inv(R)

in MATLAB yields P .
(b) Answer: The solutions of Ẋ = CX are obtained from the solutions of Ẋ = BX by stretching
the x-coordinate by a factor of 7.1063 and the y-coordinate by a factor of 0.7106.
Solution: Enter the system Ẋ = BX into pplane9. Then enter the system Ẋ = CX. Note that
both systems are spirals. You can plot trajectories in each system to see that the trajectories for
Ẋ = CX appear to be similar to those for Ẋ = BX, but stretched in one direction and contracted
in the other. Thus, if X(t) is a solution to Ẋ = BX, then P X(t) is a solution to Ẋ = CX, verifying
Lemma 6.3.2.


In Exercises 16-18 use the given data (the eigenvectors v1 , v2 ∈ R2 and associated eigenvalues
λ1 , λ2 ∈ C of the 2 × 2 matrix C, and initial condition X0 ∈ R2 ) to

(a) Find the general solution of the system of differential equations


dX
{A6.4-a} = CX. (6.4.5)
dt
{A6.4.1} (b) Sketch the trajectory in phase space of (6.4.5) with initial condition X0 .
16. v1 = (1, −1)t , v2 = (1, 1)t , λ1 = −1, λ2 = 3, X0 = (2, 0)t .

Solution: Since the eigenvalues are real, the general solution is given by (6.2.2) and is:
   
1 1
X(t) = α1 e−t + α2 e3t
−1 1

[Figure 16: pplane9 phase portrait with the forward orbit from (2, 0).]
{A6.4.2}
17. v1 = (1, −1)t ; v2 = (0, −3)t ; λ1 = −1; λ2 = 3; X0 = (2, −2)t .

Solution: Since the eigenvalues are real the general solution is given by (6.2.2) and is:
   
1 0
X(t) = α1 e−t + α2 e3t
−1 −3


[Figure 17: pplane9 phase portrait with the forward orbit from (2, −2), which tends to the equilibrium at the origin.]

{A6.4.3}
18. v1 = (1 − i, −1)t ; v2 = (1 + i, −1)t ; λ1 = −1 + i; λ2 = −1 − i; X0 = (2, 0)t .

Solution: Since the eigenvalues are complex conjugates, the general solution is given by (6.2.3)
and is
X(t) = α1 X1 (t) + α2 X2 (t)
where u = Re(v1 ), w = Im(v1 ), and

X1 (t) = e−t (cos(t)u − sin(t)w)


X2 (t) = e−t (sin(t)u + cos(t)w).


[Figure 18: pplane9 phase portrait; the forward orbit spirals in to the equilibrium at the origin.]


{S:Matrixexp} 6.5 *Matrix Exponentials


In Section 4.2 we showed that the solution of the single ordinary differential equation ẋ(t) =
λx(t) with initial condition x(0) = x0 is x(t) = etλ x0 (see (4.1.4) in Chapter 4). In this
section we show that we may write solutions of systems of equations in a similar form. In
particular, we show that the solution to the linear system of ODEs
dX
{eq:x=Mx} = CX (6.5.1)
dt
with initial condition
X(0) = X0 ,
where C is an n × n matrix and X0 ∈ Rn , is

{matrixsoln} X(t) = etC X0 . (6.5.2)

In order to make sense of the solution (6.5.2) we need to understand matrix exponentials.
More precisely, since tC is an n × n matrix for each t ∈ R, we need to make sense of the
expression eL where L is an n × n matrix. For this we recall the form of the exponential
function as a power series:
1 2 1 1
et = 1 + t + t + t3 + t4 + · · · .
2! 3! 4!
In more compact notation we have

$$e^t = \sum_{k=0}^{\infty}\frac{1}{k!}t^k.$$

By analogy, define the matrix exponential eL by


1 2 1
{e:expL} eL = In + L + L + L3 + · · · (6.5.3)
2! 3!

X 1 k
= L .
k!
k=0

In this formula L2 = LL is the matrix product of L with itself, and the power Lk is defined
inductively by Lk = LLk−1 for k > 1. Hence eL is an n × n matrix and is the infinite sum
of n × n matrices.
Remark: The infinite series for matrix exponentials (6.5.3) converges for all n × n matrices
L. This fact is proved in Exercises 13 and 14.


Using (6.5.3), we can write the matrix exponential of tC for each real number t. Since
(tC)k = tk C k we obtain

{eq:MatrixExp}
$$e^{tC} = I_n + tC + \frac{1}{2!}(tC)^2 + \frac{1}{3!}(tC)^3 + \cdots
= I_n + tC + \frac{t^2}{2!}C^2 + \frac{t^3}{3!}C^3 + \cdots. \qquad (6.5.4)$$
Next we claim that
d tC
{e:diffmatexp} e = CetC . (6.5.5)
dt
We verify the claim by supposing that we can differentiate (6.5.4) term by term with respect
to t. Then
$$\begin{aligned}
\frac{d}{dt}e^{tC} &= \frac{d}{dt}(I_n) + \frac{d}{dt}(tC) + \frac{d}{dt}\left(\frac{t^2}{2!}C^2\right) + \frac{d}{dt}\left(\frac{t^3}{3!}C^3\right) + \frac{d}{dt}\left(\frac{t^4}{4!}C^4\right) + \cdots \\
&= 0 + C + tC^2 + \frac{t^2}{2!}C^3 + \frac{t^3}{3!}C^4 + \cdots \\
&= C\left(I_n + tC + \frac{t^2}{2!}C^2 + \frac{t^3}{3!}C^3 + \cdots\right) \\
&= Ce^{tC}.
\end{aligned}$$

It follows that the function X(t) = etC X0 is a solution of (6.5.1) for each X0 ∈ Rn ; that is,
d d
X(t) = etC X0 = CetC X0 = CX(t).
dt dt
Since (6.5.3) implies that e0C = e0 = In , it follows that X(t) = etC X0 is a solution of (6.5.1)
with initial condition X(0) = X0 . This discussion shows that solving (6.5.1) in closed form
{T:linODEsoln} is equivalent to finding a closed form expression for the matrix exponential etC .
Theorem 6.5.1. The unique solution to the initial value problem

dX
= CX
dt
X(0) = X0

is
X(t) = etC X0 .


Proof Existence follows from the previous discussion. For uniqueness, suppose that Y (t)
is a solution to Ẏ = CY with Y (0) = X0 . We claim that Y (t) = X(t). Let Z(t) = e−tC Y (t)
and use the product rule to compute
dZ dY
= −Ce−tC Y (t) + e−tC (t) = e−tC (−CY (t) + CY (t)) = 0
dt dt
It follows that Z is constant in t and Z(t) = Z(0) = Y (0) = X0 or Y (t) = etC X0 = X(t),
as claimed. 

Similarity and Matrix Exponentials We introduce similarity at this juncture for the following
reason: if C is a matrix that is similar to B, then eC can be computed from eB . More
{L:similarexp} precisely:
Lemma 6.5.2. Let C and B be n × n similar matrices, and let P be an invertible n × n
matrix such that
C = P −1 BP.
Then
{e:similarexp} eC = P −1 eB P. (6.5.6)

Proof Note that for all powers of k we have

(P −1 BP )k = P −1 B k P.

Next verify (6.5.6) by computing


$$e^C = \sum_{k=0}^{\infty}\frac{1}{k!}C^k = \sum_{k=0}^{\infty}\frac{1}{k!}(P^{-1}BP)^k
= \sum_{k=0}^{\infty}\frac{1}{k!}P^{-1}B^kP = P^{-1}\left(\sum_{k=0}^{\infty}\frac{1}{k!}B^k\right)P = P^{-1}e^BP.$$
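Lemma 6.5.2 can also be checked numerically with expm. A small MATLAB sketch (the particular B and P below are chosen only for illustration):

B = [1 1; 2 -1];  P = [2 1; 3 4];   % any invertible P will do
C = inv(P)*B*P;
norm(expm(C) - inv(P)*expm(B)*P)    % should be zero up to roundoff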

Explicit Computation of Matrix Exponentials We begin with the simplest computation


of a matrix exponential.
(a) Let L be a multiple of the identity; that is, let L = αIn where α is a real number.
Then
{ex:expm} eαIn = eα In . (6.5.7)


That is, eαIn is a scalar multiple of the identity. To verify (6.5.7), compute

α2 2 α3 3
eαIn = In + αIn + I + I + ···
2! n 3! n
α2 α3
= (1 + α + + + · · · )In = eα In .
2! 3!

(b) Let C be a 2 × 2 diagonal matrix,


!
λ1 0
C= ,
0 λ2

where λ1 and λ2 are real constants. Then


!
eλ1 t 0
{e:expdiag} etC = . (6.5.8)
0 eλ2 t

To verify (6.5.8) compute

$$\begin{aligned}
e^{tC} &= I_2 + tC + \frac{t^2}{2!}C^2 + \frac{t^3}{3!}C^3 + \cdots \\
&= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + \begin{pmatrix} \lambda_1 t & 0 \\ 0 & \lambda_2 t \end{pmatrix}
 + \begin{pmatrix} \dfrac{t^2}{2!}\lambda_1^2 & 0 \\ 0 & \dfrac{t^2}{2!}\lambda_2^2 \end{pmatrix} + \cdots \\
&= \begin{pmatrix} e^{\lambda_1 t} & 0 \\ 0 & e^{\lambda_2 t} \end{pmatrix}.
\end{aligned}$$

(c) Suppose that !


0 −1
C= .
1 0
Then !
cos t − sin t
{e:exprotate} e tC
= . (6.5.9)
sin t cos t
We begin this computation by observing that

C 2 = −I2 , C 3 = −C, and C 4 = In .


Therefore, by collecting terms of odd and even power in the series expansion for the matrix
exponential we obtain

t2 2 t3 3
etC = I2 + tC + C + C + ···
2! 3!
t2 t3
= I2 + tC − I2 − C + · · ·
2! 3!
t2 t4 t6
 
= 1 − + − + · · · I2 +
2! 4! 6!
t3 t5 t7
 
t − + − + ··· C
3! 5! 7!
= (cos t)I2 + (sin t)C
!
cos t − sin t
= .
sin t cos t

In this computation we have used the fact that the trigonometric functions cos t and sin t
have the power series expansions:

$$\cos t = 1 - \frac{1}{2!}t^2 + \frac{1}{4!}t^4 - \cdots = \sum_{k=0}^{\infty}\frac{(-1)^k}{(2k)!}t^{2k},$$
$$\sin t = t - \frac{1}{3!}t^3 + \frac{1}{5!}t^5 - \cdots = \sum_{k=0}^{\infty}\frac{(-1)^k}{(2k+1)!}t^{2k+1}.$$

See Exercise 10 for an alternative proof of (6.5.9).


To compute the matrix exponential MATLAB provides the command expm. We use this
command to compute the matrix exponential etC for
!
0 −1 π
C= and t = .
1 0 4

Type

C = [0, -1; 1, 0];


t = pi/4;
expm(t*C)

that gives the answer


ans =
0.7071 -0.7071
0.7071 0.7071

Indeed, this is precisely what we expect by (6.5.9), since


$$\cos\frac{\pi}{4} = \sin\frac{\pi}{4} = \frac{1}{\sqrt{2}} \approx 0.70710678.$$

(d) Let
!
0 1
C= .
0 0

Then !
1 t
{e:nilpotent} etC = I2 + tC = , (6.5.10)
0 1

since C 2 = 0.

Exercises

{c6.2.1} 1. (matlab) Let L be the 3 × 3 matrix


 
2 0 −1
L= 0 −1 3 .
 

1 0 1

Find the smallest integer m such that

1 2 1 1 m
I3 + L + L + L3 + · · · + L
2! 3! m!

is equal to eL up to a precision of two decimal places. More exactly, use the MATLAB command
expm to compute eL and use MATLAB commands to compute the series expansion to order m.
Note that the command for computing n! in MATLAB is prod(1:n).
Answer: When m = 7, the power series is accurate up to two decimal places.
Solution: Enter L into MATLAB , then use the command expm(L) to find the value of eL .


ans =
4.8746 0 -3.9421
3.1370 0.3679 2.4154
3.9421 0 0.9324

Then, add elements of the power series to I3 until this answer is equal to the MATLAB generated
answer up to a precision of two digits.
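One way to carry out this comparison is the loop sketched below (the 0.005 tolerance is one reading of "accurate up to two decimal places"; prod(1:m) computes m! as noted in the exercise):

L = [2 0 -1; 0 -1 3; 1 0 1];
E = expm(L);                       % reference value of e^L
S = eye(3);                        % partial sum, starting with I3
for m = 1:25
    S = S + L^m/prod(1:m);         % add the term L^m/m!
    if max(abs(S(:) - E(:))) < 0.005
        disp(m)                    % first m for which the partial sum agrees to two decimals
        break
    end
end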

{c6.2.2} 2. (matlab) Use MATLAB to compute the matrix exponential e for


tC

!
1 1
C=
2 −1

by choosing for t the values 1.0, 1.5 and 2.5. Does eC e1.5C = e2.5C ?
Enter C into MATLAB , then use expm to obtain

expm(C) = expm(1.5*C) = expm(2.5*C) =


4.4952 1.5806 10.6138 3.8577 59.9058 21.9222
3.1612 1.3340 7.7154 2.8984 43.8444 16.0613

Typing expm(C)*expm(1.5*C) confirms that eC e1.5C = e2.5C is indeed valid.

{c6.2.3} 3. (matlab) For the scalar exponential function e it is well known that for any pair of real
t

numbers t1 , t2 the following equality holds:

et1 +t2 = et1 et2 .

Use MATLAB to find two 2 × 2 matrices C1 and C2 such that

eC1 +C2 6= eC1 eC2 .

One example of matrices for which eC1 +C2 6= eC1 eC2 is


! !
1 −2 −2 3
C1 = and C2 = .
3 1 −1 −2

For this case, using MATLAB,

expm(C1+C2) = expm(C1)*expm(C2) =
0.8013 0.5034 0.1547 -0.4534
1.0067 0.8013 0.1152 0.5370


{c6.2.4a} In Exercises 4 – 6 compute the matrix exponential e for the matrix.


tC

!
0 1
4. .
0 0
!
1 t
Answer: etC = .
0 1
Solution: In general,
t2 2 t3 3
etC = I + tC +
C + C + ···
2! 3!
Note that C 2 = 0, so C k = 0, for k ≥ 2, so we need only calculate the first two terms:
! ! !
tC 1 0 0 1 1 t
e = I2 + tC = +t = .
0 1 0 0 0 1
{c6.2.4b}
 
5. $\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}$.
Answer: $e^{tC} = \begin{pmatrix} 1 & t & \dfrac{t^2}{2} \\ 0 & 1 & t \\ 0 & 0 & 1 \end{pmatrix}$.
Solution: Since $C^3 = 0$,
$$e^{tC} = I_3 + tC + \frac{t^2}{2!}C^2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} + \begin{pmatrix} 0 & t & 0 \\ 0 & 0 & t \\ 0 & 0 & 0 \end{pmatrix} + \begin{pmatrix} 0 & 0 & \dfrac{t^2}{2} \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 1 & t & \dfrac{t^2}{2} \\ 0 & 1 & t \\ 0 & 0 & 1 \end{pmatrix}.$$
{c6.2.4c}
!
0 −2
6. .
2 0
!
cos 2t − sin 2t
Answer: etC = .
sin 2t cos 2t
Solution: First, write C as
! !
0 −2 0 −1
= 2E, where E = .
2 0 1 0

Then find e2tE by equation (6.5.9).


{c6.2.5}
7. Let α, β be real numbers and let αI and βI be corresponding n × n diagonal matrices. Use
properties of the scalar exponential function to show that

e(α+β)I = eαI eβI .

By equation (6.5.7), eαI = eα I. Therefore,

e(α+β)I = eα+β I = eα eβ I = eα Ieβ I = eαI eβI .

In Exercises 8 – 10 we use Theorem 6.5.1, the uniqueness of solutions to initial value problems, in
{c6.2.5A} perhaps a surprising way.
8. Prove that
et+s = et es
for all real numbers s and t. Hint:

(a) Fix s and verify that y(t) = et+s is a solution to the initial value problem
dx
= x
{E:init1} dt (6.5.11)
x(0) = es

(b) Fix s and verify that z(t) = et es is also a solution to (6.5.11).


(c) Use Theorem 6.5.1 to conclude that y(t) = z(t) for every s.

{exist&unique} In this exercise you will need to use the following theorem from analysis:
df
Theorem 6.5.3. If f (x) is differentiable near x0 , and if is continuous, then there is a unique
dx
solution to the differential equation ẋ = f (x) with initial condition x(0) = x0 .

(a) To verify that y(t) = et+s is a solution to the initial value problem, first substitute y(t) into the
left hand side of the equation. Using the chain rule, obtain
dy d d
(t) = (et+s ) = (t + s)et+s = et+s .
dt dt dt
Then substitute y(t) into the right hand side of the equation, obtaining

y(t) = et+s .

Thus, the left hand and right hand sides are equal, so y(t) is a solution to the differential equation.
Finally, check to see that y(t) satisfies the initial value:

y(0) = e0+s = es


as desired.
(b) Similarly, verify that z(t) = et es is a solution to the initial value problem by substituting z(t)
into each side of the differential equation:

dz d
(t) = (et es ) = et es and z(t) = et es .
dt dt
Note that the results are equal, so that z(t) is a solution to the differential equation. Since

z(0) = e0 es = es ,

it follows that z(t) is also a solution to the initial value problem.


df
(c) Theorem 6.5.3 states that if f (x) is differentiable near x0 , and if is continuous, then there
dx
is a unique solution to the differential equation ẋ = f (x) with initial condition x(0) = x0 . Let
f (x) = x and let x(0) = es . Then, as shown in (a) and (b) of this problem, et+s and et es are both
solutions. However, by the analysis theorem, there is only one solution, so et+s = et es .
{c6.2.5B}
9. Let A be an n × n matrix. Prove that

e(t+s)A = etA esA

for all real numbers s and t. Hint:

(a) Fix s ∈ R and X0 ∈ Rn and verify that Y (t) = e(t+s)A X0 is a solution to the initial value
problem
dX
= AX
{E:init2} dt (6.5.12)
X(0) = esA X0
 
(b) Fix s and verify that Z(t) = etA esA X0 is also a solution to (6.5.12).

(c) Use the n dimensional version of Theorem 6.5.1 to conclude that Y (t) = Z(t) for every s and
every X0 .

Remark: Compare the result in this exercise with the calculation in Exercise 7.
(a) To verify that Y (t) is a solution to the initial value problem (6.5.12), first substitute Y (t) into
the left hand side of the equation. Using the chain rule, obtain

dY d d
(t) = (e(t+s)A X0 ) = ((t + s)A)e(t+s)A X0 = Ae(t+s)A X0 .
dt dt dt
Then substitute Y (t) into the right hand side of the equation, obtaining

AY (t) = Ae(t+s)A X0 .


Thus, the left hand and right hand sides of the equation are equal, so Y (t) is a solution to the
differential equation. Finally, check to see that Y (t) satisfies the initial value:

Y (0) = e(0+s)A X0 = esA X0 ,

as desired.
(b) Similarly, verify that Z(t) = etA (esA X0 ) is a solution to the initial value problem by substituting
Z(t) into each side of the differential equation:

dZ d
(t) = (etA (esA X0 )) = AetA (esA X0 ) and AZ(t) = AetA (esA X0 ).
dt dt
Since the results are equal, Z(t) is a solution to the differential equation. Evaluating at t = 0, we
find
Z(0) = e0 (esA X0 ) = esA X0 ,
from which it follows that Z(t) is also a solution to the initial value problem.
(c) Since Y (t) and Z(t) are both solutions to the initial value problem

dX
= AX
dt
X(0) = esA X0 ,

it follows from the uniqueness part of Theorem 6.5.3 that Y (t) = Z(t). Thus, e(t+s)A = etA esA , as
{c6.2.5C} desired.
10. Prove that !! !
0 −1 cos t − sin t
{E:0-110E} exp t = . (6.5.13)
1 0 sin t cos t
Hint:
! !
cos t − sin t
(a) Verify that X1 (t) = and X2 (t) = are solutions to the initial value
sin t cos t
problems !
dX 0 −1
= X
{E:init3} dt 1 0 (6.5.14)
X(0) = ej
for j = 1, 2.
(b) Since Xj (0) = ej , use Theorem 6.5.1 to verify that
!!
0 −1
{E:0-110} Xj (t) = exp t ej . (6.5.15)
1 0


(c) Show that (6.5.15) proves (6.5.13)

(a) To verify that X1 (t) = (cos t, sin t)t is a solution to the initial value problem (6.5.14), substitute
X1 (t) into the left hand side of the differential equation, obtaining
! !
dX1 d cos t − sin t
= = .
dt dt sin t cos t

Then substitute X1 (t) into the right hand side of the differential equation, obtaining
! ! ! !
0 −1 0 −1 cos t − sin t
X1 = = .
1 0 1 0 sin t cos t

Since the two sides are equal, X1 (t) is a solution to (6.5.14). Further, since
! !
cos 0 1
X1 (0) = = = e1 ,
sin 0 0

X1 (t) is a solution to the given initial value problem.


Similarly, to verify that X2 (t) = (− sin t, cos t)t is a solution to the initial value problem,
substitute X2 (t) into the left hand side of the differential equation, obtaining
! !
dX2 d − sin t − cos t
= = .
dt dt cos t − sin t

Then substitute X2 (t) into the right hand side of the differential equation, obtaining
! ! ! !
0 −1 0 −1 − sin t − cos t
X2 = = .
1 0 1 0 cos t − sin t

Since the two sides of the differential equation are equal, and since
! !
− sin 0 0
X2 (0) = = = e2 ,
cos 0 1

X2 (t) is a solution to the initial value problem (6.5.14).


(b) By Theorem 6.5.1, the unique solution to (6.5.14) with initial condition X(0) = ej is
!!
0 −1
Yj (t) ≡ exp t ej .
1 0

We showed in part (a) that Xj (t) satisfies this initial value problem. Thus, by the uniqueness part
of Theorem 6.5.3, Xj (t) = Yj (t), as desired.


(c) By part (b), Xj (t) is equal to Yj (t), which is defined as the j th column of the matrix
!!
0 −1
exp t .
1 0

By definition, Xj (t) is equal to the j th column of the matrix


!
cos t − sin t
.
sin t cos t

Thus, !! !
0 −1 cos t − sin t
exp t = .
1 0 sin t cos t
{c6.5.6}
11. Compute eA where !
3 −1
A= .
1 1
Check your answer using MATLAB.
!
2 −1
Answer: e = e
A 2
.
1 0

Solution: We cannot compute eA directly by hand. By Lemma 6.5.2, if B = P −1 AP , then


eB = P −1 eA P . Therefore, we solve by finding a matrix similar to A whose exponential can be
computed. We first find the eigenvalues of A and associated eigenvectors. If λ is an eigenvalue of
A, then det(A − λI2 ) = 0, or, since A is a 2 × 2 matrix,
0 = λ2 − tr(A)λ + det(A) = λ2 − 4λ + 4 = (λ − 2)2 .
Thus, A has one eigenvalue, λ = 2. To find the associated eigenvector, solve (A − λI2 )v = 0 by row
reduction: ! !
1 −1 1 −1
A − 2I2 = −→ .
1 −1 0 0
So, v = (1, 1) is an eigenvector of A associated to λ. By Theorem 6.3.4, since A has one real
eigenvector, ! !
λ 1 2 1
B= =
0 λ 0 2
is a matrix similar to A. We can then compute P = (v|w), where Aw = v+λw, that is, (A−λI2 )w =
v. We can solve by row reducing the augmented matrix (A − λI2 |v):
! !
1 −1 1 1 −1 1
−→ .
1 −1 1 0 0 0


Thus, w = (2, 1), so !


1 2
P =
1 1
and A and B are similar. We now compute eA = P eB P −1 using
!
B 2 1 1
e =e .
0 1

So we calculate
$$e^A = Pe^BP^{-1} = e^2\begin{pmatrix} 1 & 2 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} -1 & 2 \\ 1 & -1 \end{pmatrix} = e^2\begin{pmatrix} 2 & -1 \\ 1 & 0 \end{pmatrix}.$$

{c6.2.6A}
12. Let C be an n × n matrix. Use Theorem 6.5.1 to show that the n columns of the n × n matrix
etC give a basis of solutions for the system of differential equations Ẋ = CX.
By Theorem 6.5.1, the unique solution to the differential equation Ẋ = CX with initial condition
X(0) = X0 is X(t) = etC X0 . Let X(0) = ej . Then the vector Xj (t) = etC ej is a solution to
the differential equation, and is the j th column of the matrix etC . Thus, each column of etC is a
solution to the differential equation, and the set of columns forms a basis of solutions.

Remark: The completion of Exercises 13 and 14 constitutes a proof that the infinite series defini-
tion of the matrix exponential is a convergent series for all n × n matrices.
{c6.2.7}
13. Let A = (aij ) be an n × n matrix. Define
n
!
X
||A||m = max (|ai1 | + · · · + |ain |) = max |aij | .
1≤i≤n 1≤i≤n
j=1

That is, to compute ||A||m , first sum the absolute values of the entries in each row of A, and then
take the maximum of these sums. Prove that:

||AB||m ≤ ||A||m ||B||m .

Hint: Begin by noting that


n X
n
! n X
n
!
X X
||AB||m = max aik bkj ≤ max |aik bkj |
1≤i≤n 1≤i≤n
j=1 k=1 j=1 k=1
n X
n
!
X
= max |aik bkj | .
1≤i≤n
k=1 j=1


Using the hint from the text:
$$\|AB\|_m \le \max_{1\le i\le n}\left(\sum_k\sum_j |a_{ik}b_{kj}|\right) = \max_{1\le i\le n}\left(\sum_k |a_{ik}|\sum_j |b_{kj}|\right).$$
Then, because $\sum_j |b_{kj}|$ is the sum of the absolute values of the entries in the $k$th row of $B$, it is at most $\|B\|_m$:
$$\max_{1\le i\le n}\left(\sum_k |a_{ik}|\sum_j |b_{kj}|\right) \le \max_{1\le i\le n}\left(\|B\|_m\sum_k |a_{ik}|\right) = \|A\|_m\|B\|_m.$$

Thus ||AB||m ≤ ||A||m ||B||m .


{c6.2.8}
14. Recall that an infinite series of real numbers

c1 + c2 + · · · + cN + · · ·

converges absolutely if there is a constant K such that for every N the partial sum satisfies:

|c1 | + |c2 | + · · · + |cN | ≤ K.

Let A be an n × n matrix. To prove that the matrix exponential eA is an absolutely convergent


infinite series use Exercise 13 and the following steps. Let aN be the (i, j)th entry in the matrix
AN where A0 = In .

(a) |aN | ≤ ||AN ||m .


(b) ||AN ||m ≤ ||A||N
m.

1
(c) |a0 | + |a1 | + · · · + |aN | ≤ e||A||m .
N!

(a) Let aij be the (i, j)th entry in n × n matrix A. Then, since |aij | ≥ 0 for all (i, j),

|aij | ≤ |ai1 | + · · · + |ain | ≤ max(|ai1 | + · · · + |ain |) = ||A||m .


i

This statement is valid for the matrix AN , so |aN | ≤ ||AN ||m .


(b) To show this, we use the fact that ||AN ||m = ||AN −1 A||m . According to Exercise 13,

||AN −1 A||m ≤ ||AN −1 ||m ||A||m .

Expanding ||AN ||m in this way, we find that

||AN ||m ≤ ||A||m · · · ||A||m = ||A||N


m.


(c) According to our result in (a)


$$|a_0| + |a_1| + \cdots + \frac{1}{N!}|a_N| = \sum_{N}\frac{1}{N!}|a_N| \le \sum_{N}\frac{1}{N!}\|A^N\|_m.$$
According to the result in (b)
$$\sum_{N}\frac{1}{N!}\|A^N\|_m \le \sum_{N}\frac{1}{N!}\|A\|_m^N.$$
We know that
$$\sum_{N=0}^{\infty}\frac{1}{N!}\|A\|_m^N = e^{\|A\|_m}.$$
Therefore,
$$\sum_{N}\frac{1}{N!}|a_N| \le e^{\|A\|_m},$$

so eA is absolutely convergent.

{c6.3.14}
15. When the eigenvalues λ1 and λ2 of the 2×2 matrix C are real and distinct, etC can be computed
without determining the associated eigenvectors. To see this, prove that
{E:exdist}
$$e^{tC} = \frac{1}{\lambda_2 - \lambda_1}\left(e^{\lambda_2 t}(C - \lambda_1 I_2) - e^{\lambda_1 t}(C - \lambda_2 I_2)\right). \qquad (6.5.16)$$
Hint: The left and right hand sides of (6.5.16) are linear maps. Two linear maps are identical if
they have the same values on a basis of vectors v1 and v2 . Verify that the maps in (6.5.16) are
equal when applied to the linearly independent eigenvectors of C.


{S:6.6} 6.6 *The Cayley Hamilton Theorem


The Jordan normal form theorem (Theorem 6.3.4) for real 2 × 2 matrices states that every
2 × 2 matrix is similar to one of the matrices in Table 2. We use this theorem to prove the
Cayley Hamilton theorem for 2 × 2 matrices and then use the Cayley Hamilton theorem
to present another method for computing solutions to planar linear systems of differential
equations in the case of real equal eigenvalues.
The Cayley Hamilton theorem states that a matrix satisfies its own characteristic polyno-
mial. More precisely:
{T:CH2}
Theorem 6.6.1 (Cayley Hamilton Theorem). Let A be a 2 × 2 matrix and let

pA (λ) = λ2 + aλ + b

be the characteristic polynomial of A. Then

pA (A) = A2 + aA + bI2 = 0.

Proof Suppose B = P −1 AP and A are similar matrices. We claim that if pA (A) = 0,


then pB (B) = 0. To verify this claim, recall from Lemma 6.3.3 that pA = pB and calculate

pB (B) = pA (P −1 AP ) = (P −1 AP )2 + aP −1 AP + bI2
= P −1 pA (A)P = 0.

Theorem 6.3.4 classifies 2 × 2 matrices up to similarity. Thus, we need only verify this
theorem for the matrices
     
λ1 0 σ −τ λ1 1
C= ,D = ,E = ,
0 λ2 τ σ 0 λ1

that is, we need to verify that

pC (C) = 0 pD (D) = 0 pE (E) = 0.

Using the fact that pA (λ) = λ2 − tr(A)λ + det(A), we see that

pC (λ) = (λ − λ1 )(λ − λ2 )
pD (λ) = λ2 − 2σλ + (σ 2 + τ 2 )
pE (λ) = (λ − λ1 )2 .


It now follows that
$$p_C(C) = (C - \lambda_1 I_2)(C - \lambda_2 I_2) = \begin{pmatrix} 0 & 0 \\ 0 & \lambda_2 - \lambda_1 \end{pmatrix}\begin{pmatrix} \lambda_1 - \lambda_2 & 0 \\ 0 & 0 \end{pmatrix} = 0,$$
and
$$p_D(D) = \begin{pmatrix} \sigma^2 - \tau^2 & -2\sigma\tau \\ 2\sigma\tau & \sigma^2 - \tau^2 \end{pmatrix} - 2\sigma\begin{pmatrix} \sigma & -\tau \\ \tau & \sigma \end{pmatrix} + (\sigma^2 + \tau^2)\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = 0,$$
and
$$p_E(E) = (E - \lambda_1 I_2)^2 = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}^2 = 0.$$
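The Cayley Hamilton theorem is also easy to test numerically for any particular matrix. A one-line MATLAB sketch (the matrix A below is arbitrary):

A = [1 -1; 9 -5];
A^2 - trace(A)*A + det(A)*eye(2)    % p_A(A); the result is the zero matrix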


The Example with Equal Eigenvalues Revisited When the eigenvalues λ1 = λ2 , the closed
form solution of Ẋ = CX is a straightforward formula

{E:exeq} X(t) = eλ1 t (I2 + tN )X0 (6.6.1)

where N = C − λ1 I2 .
Note that when using (6.6.1), it is not necessary to compute the eigenvector or generalized
eigenvector of C, and this is a substantial simplification.

Verification of (6.6.1) We use the Cayley-Hamilton theorem to verify (6.6.1) as follows.


Specifically, since C is assumed to have a double eigenvalue λ1 , it follows that

N = C − λ1 I2

has zero as a double eigenvalue. Hence, the characteristic polynomial pN (λ) = λ2 and the
Cayley Hamilton theorem implies that N 2 = 0. Therefore,

CX(t) = eλ1 t C(I2 + tN )X0 = eλ1 t (λ1 I2 + N )(I2 + tN )X0


Thus, using N 2 = 0, we see that

CX(t) = eλ1 t (λ1 I2 + tλ1 N + N )X0 = Ẋ(t),

as desired.
Let us reconsider the system of differential equations (6.2.17)
 
dX 1 −1
= X = CX
dt 9 −5

with initial value  


2
X0 = .
3
The eigenvalues of C are real and equal to λ1 = −2.
We may write
C = λ1 I2 + N = −2I2 + N,
where  
3 −1
N= .
9 −3
It follows from (6.6.1) that
    
{e:solntob}      e^{tC} = e^{−2t} (I2 + t [3 −1; 9 −3]) = e^{−2t} [1+3t −t; 9t 1−3t].      (6.6.2)

Hence the solution to the initial value problem is:


  
X(t) = e^{tC} X0 = e^{−2t} [1+3t −t; 9t 1−3t] [2; 3] = e^{−2t} [2+3t; 3+9t].
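
This closed form solution can also be checked in MATLAB against the built-in matrix exponential expm; the value of t below is an arbitrary sample:

C = [1 -1; 9 -5];
X0 = [2; 3];
t = 1.5;                                  % any fixed time
Xclosed = exp(-2*t)*[2 + 3*t; 3 + 9*t];   % the closed form solution above
norm(Xclosed - expm(t*C)*X0)              % zero up to roundoff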

Exercises

{c6.6.2}
1. Solve the initial value problem
 
dX 0 1
= X
dt −2 3


where X(0) = (2, 1)t .


Answer: The solution to the given initial value problem with X(0) = (2, 1)t is

X(t) = 3e^t [1; 1] − e^{2t} [1; 2] = [3e^t − e^{2t}; 3e^t − 2e^{2t}].

Solution: Let  
0 1
C= .
−2 3
Then the solution to the system is X(t) = etC X0 , where X0 = X(0) = (2, 1)t . To find etC , first
find the eigenvalues of C by solving

0 = λ^2 − tr(C)λ + det(C) = (λ − 1)(λ − 2).

The eigenvalues are λ1 = 1 and λ2 = 2. We can then find e^{tC} using (6.5.16):

e^{tC} = (1/(λ1 − λ2)) (e^{λ1 t}(C − λ2 I2) − e^{λ2 t}(C − λ1 I2))
       = −(e^t ([0 1; −2 3] − [2 0; 0 2]) − e^{2t} ([0 1; −2 3] − [1 0; 0 1]))
       = e^{2t} [−1 1; −2 2] − e^t [−2 1; −2 1].

So, we can now compute

X(t) = e^{tC} X0 = (e^{2t} [−1 1; −2 2] − e^t [−2 1; −2 1]) [2; 1].

Complete the matrix multiplication to obtain X(t).


{c6.6.3}
2. Find all solutions to the linear system of ODEs
 
dX −2 4
= X.
dt −1 1

Answer: The general solution is

X(t) = e^{−t/2} [cos(√7t/2) − (3/√7) sin(√7t/2),  (8/√7) sin(√7t/2);
                 −(2/√7) sin(√7t/2),  cos(√7t/2) + (3/√7) sin(√7t/2)] X(0).

Solution: Let

C = [−2 4; −1 1].

Then the solution to the system is X(t) = e^{tC} X0, where X0 = X(0) is the initial condition of the system. Find the eigenvalues of C by solving

0 = λ^2 − tr(C)λ + det(C) = λ^2 + λ + 2.

The eigenvalues are λ1 = (−1 + √7 i)/2 and λ2 = (−1 − √7 i)/2. Find e^{tC} using (6.5.16):

e^{tC} = (1/(λ1 − λ2)) (e^{λ1 t}(C − λ2 I2) − e^{λ2 t}(C − λ1 I2)).

Since the scalar

1/(λ1 − λ2) = −(1/√7) i

is purely imaginary, we need compute only the imaginary parts of e^{λ1 t}(C − λ2 I2) and e^{λ2 t}(C − λ1 I2) to obtain the real solution X(t) = e^{tC} X0. Compute

e^{λ1 t}(C − λ2 I2) = e^{−t/2} e^{(√7/2)it} [−3/2 + (√7/2)i, 4; −1, 3/2 + (√7/2)i]
                    = e^{−t/2} (cos(√7t/2) + i sin(√7t/2)) [−3/2 + (√7/2)i, 4; −1, 3/2 + (√7/2)i].

The imaginary part of this product is

e^{−t/2} [(√7/2) cos(√7t/2) − (3/2) sin(√7t/2),  4 sin(√7t/2);
          −sin(√7t/2),  (√7/2) cos(√7t/2) + (3/2) sin(√7t/2)].

Since e^{λ2 t}(C − λ1 I2) is the complex conjugate of e^{λ1 t}(C − λ2 I2), its imaginary part is the negative of the matrix above, so the difference of the two terms is 2i times this matrix. Multiplying by the scalar −(1/√7) i, the general solution is

X(t) = (1/√7) e^{−t/2} [√7 cos(√7t/2) − 3 sin(√7t/2),  8 sin(√7t/2);
                        −2 sin(√7t/2),  3 sin(√7t/2) + √7 cos(√7t/2)] X0.
{c6.6.4}
3. Solve the initial value problem
 
dX 2 1
= X
dt −2 0

where X(0) = (1, 1)t .


Answer: The solution to the initial value problem X(0) = (1, 1)t is:
 
X(t) = e^t [2 sin t + cos t; −3 sin t + cos t].

Solution: Let  
2 1
C= .
−2 0
Then the solution to the system is X(t) = etC X0 , where X0 = X(0) is the initial condition of the
system. Find the eigenvalues of C by solving

0 = λ^2 − tr(C)λ + det(C) = λ^2 − 2λ + 2.

So the eigenvalues of C are λ1 = 1 + i and λ2 = 1 − i. We can then find e^{tC} using (6.5.16):

e^{tC} = (1/(λ1 − λ2)) (e^{λ1 t}(C − λ2 I2) − e^{λ2 t}(C − λ1 I2)).

Since the scalar

1/(λ1 − λ2) = −(1/2) i

is purely imaginary, we need compute only the imaginary parts of e^{λ1 t}(C − λ2 I2) and e^{λ2 t}(C − λ1 I2) to obtain the real solution to the differential equation X(t) = e^{tC} X0. So, compute

e^{λ1 t}(C − λ2 I2) = e^t e^{it} ([2 1; −2 0] − [1−i 0; 0 1−i])
                    = e^t (cos t + i sin t) [1+i 1; −2 −1+i].

The imaginary part of this product is

i e^t [sin t + cos t, sin t; −2 sin t, −sin t + cos t].


Then compute

e^{λ2 t}(C − λ1 I2) = e^t e^{−it} ([2 1; −2 0] − [1+i 0; 0 1+i])
                    = e^t (cos t − i sin t) [1−i 1; −2 −1−i].

The imaginary part of this product is

i e^t [−sin t − cos t, −sin t; 2 sin t, sin t − cos t].

Thus, the solution to the given initial value problem is

X(t) = e^{tC} X0 = (e^t/2) [2 sin t + 2 cos t, 2 sin t; −4 sin t, −2 sin t + 2 cos t] [1; 1].

Complete the matrix multiplication to obtain X(t).

{c6.CH}
4. Let A be a 2 × 2 matrix. Show that

A2 = tr(A)A − det(A)I2 .

For any 2 × 2 matrix A, the characteristic polynomial is

pA (λ) = λ2 − tr(A)λ + det(A).

By the Cayley-Hamilton Theorem (Theorem 6.6.1), a 2 × 2 matrix satisfies its own characteristic
polynomial. Thus,
pA (A) = A2 − tr(A)A + det(A)I2 = 0,
or A2 = tr(A)A − det(A)I2 .
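
This identity is also easy to sanity-check numerically in MATLAB; the matrix below is an arbitrary sample:

A = [1 2; 3 4];                           % any 2 x 2 matrix
A^2 - (trace(A)*A - det(A)*eye(2))        % returns the zero matrix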


{S:SOE} 6.7 *Second Order Equations


A second order constant coefficient homogeneous differential equation is a differential equa-
tion of the form:
{eq:soex1} ẍ + bẋ + ax = 0, (6.7.1)
where a and b are real numbers.

Newton’s Second Law Newton’s second law of motion is a second order ordinary differ-
ential equation, and for this reason second order equations arise naturally in mechanical
systems. Newton’s second law states that

{e:2ndlaw} F = ma (6.7.2)

where F is force, m is mass, and a is acceleration.

Newton’s Second Law and Particle Motion on a Line For a point mass moving along a line,
(6.7.2) is
{E:F=ma}      F = m d^2x/dt^2,      (6.7.3)
where x(t) is the position of the point mass at time t. For example, suppose that a particle
of mass m is falling towards the earth. If we let g be the gravitational constant and if we
ignore all forces except gravitation, then the force acting on that particle is F = −mg. In
this case Newton’s second law leads to the second order ordinary differential equation

{e:pointpart}      d^2x/dt^2 + g = 0.      (6.7.4)

Newton’s Second Law and the Motion of a Spring As a second example, consider the spring
model pictured in Figure 22. Assume that the spring has zero mass and that an object of
mass m is attached to the end of the spring. Let L be the natural length of the spring,
and let x(t) measure the distance that the spring is extended (or compressed). It follows
from Newton’s Law that (6.7.3) is satisfied. Hooke’s law states that the force F acting on
a spring is
F = −κx,
where κ is a positive constant. If the spring is damped by sliding friction, then

F = −κx − µ dx/dt,


where µ is also a positive constant. Suppose, in addition, that an external force Fext (t) also
acts on the mass and that that force is time-dependent. Then the entire force acting on the
mass is
F = −κx − µ dx/dt + Fext(t).

By Newton's second law, the motion of the mass is described by

{e:springeq}      m d^2x/dt^2 + µ dx/dt + κx = Fext(t),      (6.7.5)
which is again a second order ordinary differential equation.

{F:spring2} Figure 22: Hooke’s Law spring.

A Reduction to a First Order System There is a simple trick that reduces a single linear
second order differential equation to a system of two linear first order equations. For exam-
ple, consider the linear homogeneous ordinary differential equation (6.7.1). To reduce this
second order equation to a first order system, just set y = ẋ. Then (6.7.1) becomes
ẏ + by + ax = 0.
It follows that if x(t) is a solution to (6.7.1) and y(t) = ẋ(t), then (x(t), y(t)) is a solution
to
ẋ = y
{e:soex1sys} (6.7.6)
ẏ = −ax − by.
We can rewrite (6.7.6) as
Ẋ = QX.
where

{e:coeffmatQ}      Q = [0 1; −a −b].      (6.7.7)


Note that if (x(t), y(t)) is a solution to (6.7.6), then x(t) is a solution to (6.7.1). Thus solving
the single second order linear equation is exactly the same as solving the corresponding first
order linear system.
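
Numerically, this reduction is exactly how one solves a second order equation in MATLAB: convert it to the first order system (6.7.6) and hand that system to a solver such as ode45. The coefficients and initial conditions below are sample values chosen only for illustration:

a = 2; b = 3;                          % sample coefficients in (6.7.1)
f = @(t,X) [X(2); -a*X(1) - b*X(2)];   % X = (x, y) with y = xdot, as in (6.7.6)
X0 = [0; -2];                          % initial position x(0) and velocity xdot(0)
[t,X] = ode45(f, [0 10], X0);
plot(t, X(:,1))                        % time series of x(t)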

The Initial Value Problem To solve the homogeneous system (6.7.6) we need to specify
two initial conditions X(0) = (x(0), y(0))t . It follows that to solve the single second order
equation we need to specify two initial conditions x(0) and ẋ(0); that is, we need to specify
both initial position and initial velocity.

The General Solution There are two ways in which we can solve the second order homo-
geneous equation (6.7.1). First, we know how to solve the system (6.7.6) by finding the
eigenvalues and eigenvectors of the coefficient matrix Q in (6.7.7). Second, we know from
the general theory of planar systems that solutions will have the form x(t) = eλ0 t for some
scalar λ0 . We need only determine the values of λ0 for which we get solutions to (6.7.1).
We now discuss the second approach. Suppose that x(t) = eλ0 t is a solution to (6.7.1).
Substituting this form of x(t) in (6.7.1) yields the equation
(λ0^2 + bλ0 + a) e^{λ0 t} = 0.


So x(t) = eλ0 t is a solution to (6.7.1) precisely when pQ (λ0 ) = 0, where


{E:charQ} pQ (λ) = λ2 + bλ + a (6.7.8)
is the characteristic polynomial of the matrix Q in (6.7.7).
Suppose that λ1 and λ2 are distinct real roots of pQ . Then the general solution to (6.7.1) is
x(t) = α1 eλ1 t + α2 eλ2 t ,
where αj ∈ R.

An Example with Distinct Real Eigenvalues For example, solve the initial value problem
{e:ex12} ẍ + 3ẋ + 2x = 0 (6.7.9)
with initial conditions x(0) = 0 and ẋ(0) = −2. The characteristic polynomial is
pQ (λ) = λ2 + 3λ + 2 = (λ + 2)(λ + 1),
whose roots are λ1 = −1 and λ2 = −2. So the general solution to (6.7.9) is
x(t) = α1 e−t + α2 e−2t


To find the precise solution we need to solve


x(0) = α1 + α2 = 0
ẋ(0) = −α1 − 2α2 = −2
So α1 = −2, α2 = 2, and the solution to the initial value problem for (6.7.9) is
x(t) = −2e−t + 2e−2t

An Example with Complex Conjugate Eigenvalues Consider the differential equation


{E:ex13} ẍ − 2ẋ + 5x = 0. (6.7.10)
The roots of the characteristic polynomial associated to (6.7.10) are λ1 = 1 + 2i and λ2 =
1 − 2i. It follows from the discussion in the previous section that the general solution to
(6.7.10) is
x(t) = Re(α1 e^{λ1 t} + α2 e^{λ2 t})
where α1 and α2 are complex scalars. Indeed, we can rewrite this solution in real form
(using Euler’s formula) as
x(t) = et (β1 cos(2t) + β2 sin(2t)) ,
for real scalars β1 and β2 .
In general, if the roots of the characteristic polynomial are σ ± iτ , then the general solution
to the differential equation is:
x(t) = eσt (β1 cos(τ t) + β2 sin(τ t)) .

An Example with Multiple Eigenvalues Note that the coefficient matrix Q of the associated
first order system in (6.7.7) is never a multiple of I2 . It follows from the previous section
that when the roots of the characteristic polynomial are real and equal, the general solution
has the form
x(t) = α1 e^{λ1 t} + α2 t e^{λ1 t}.

Summary It follows from this discussion that solutions to second order homogeneous linear
equations are either a linear combination of two exponentials (real unequal eigenvalues),
α + βt times one exponential (real equal eigenvalues), or a time periodic function times an
exponential (complex eigenvalues).
In particular, if the real part of the complex eigenvalues is zero, then the solution is time
periodic. The frequency of this periodic solution is often called the internal frequency, a
point that is made more clearly in the next example.


Solving the Spring Equation Consider the equation for the frictionless spring without
external forcing. From (6.7.5) we get
{ex:uspring} mẍ + κx = 0. (6.7.11)
where κ > 0. The roots are λ1 = √(κ/m) i and λ2 = −√(κ/m) i. So the general solution is

x(t) = α cos(τt) + β sin(τt),

where τ = √(κ/m). Under these assumptions the motion of the spring is time periodic with period 2π/τ or internal frequency τ/(2π). In particular, the solution satisfying initial conditions x(0) = 1 and ẋ(0) = 0 (the spring is extended one unit in distance and released with no initial velocity) is

x(t) = cos(τt).

The graph of this function when τ = 1 is given on the left in Figure 23.

Figure 23: (Left) Graph of solution to undamped spring equation with initial conditions
x(0) = 1 and ẋ(0) = 0. (Right) Graph of solution to damped spring equation with the same
{F:springp} initial conditions.

If a small amount of friction is added, then the spring equation is


mẍ + µẋ + κx = 0


where µ > 0 is small. Since the eigenvalues of the characteristic polynomial are λ = σ ± iτ
where

σ = −µ/(2m) < 0   and   τ = √(κ/m − (µ/(2m))^2),
the general solution is
x(t) = eσt (α cos(τ t) + β sin(τ t)).
Since σ < 0, these solutions oscillate but damp down to zero. In particular, the solution
satisfying initial conditions x(0) = 1 and ẋ(0) = 0 is
x(t) = e^{−µt/(2m)} (cos(τt) + (µ/(2mτ)) sin(τt)).

The graph of this solution when τ = 1 and µ/(2m) = 0.07 is given in Figure 23 (right). Compare
the solutions for the undamped and damped springs.
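
The two graphs in Figure 23 are easy to regenerate in MATLAB; the script below is a sketch using the parameter values quoted above (τ = 1 and µ/(2m) = 0.07):

tau = 1; s = 0.07;                                   % s plays the role of mu/(2m)
t = linspace(0, 70, 2000);
xu = cos(tau*t);                                     % undamped spring, x(0) = 1, xdot(0) = 0
xd = exp(-s*t).*(cos(tau*t) + (s/tau)*sin(tau*t));   % damped spring, same initial conditions
plot(t, xu, t, xd)
legend('undamped', 'damped')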

Exercises

{c6.7.1}
1. By direct integration solve the differential equation (6.7.4) for a point particle moving only under
the influence of gravity. Find the solution for a particle starting at a height of 10 feet above ground
with an upward velocity of 20 feet/sec. At what time will the particle hit the ground? (Recall that
acceleration due to gravity is 32 feet/sec2 .)
Answer: The particle will hit the ground at t ≈ 1.63 seconds.
Solution: Let g = 32. Then integrate:

d^2x/dt^2 = −g
dx/dt = ∫(−g) dt = −32t + C1
x = ∫(−32t + C1) dt = −16t^2 + C1 t + C2.

Then let C1 = v0, the initial velocity, and let C2 = x0, the initial position of the particle, to obtain x(t) = −16t^2 + v0 t + x0. In this case, x(t) = −16t^2 + 20t + 10. Solve x(t) = 0 to find

t = (10 + √260)/16 ≈ 1.63.


{c6.7.2}
2. By direct integration solve the differential equation (6.7.4) for a point particle moving only under
the influence of gravity. Show that the solution is

x(t) = −(1/2) g t^2 + v0 t + x0

where x0 is the initial position of the particle and v0 is the initial velocity.
By (6.7.4), a point particle moving only under the influence of gravity has a position x(t) governed by

d^2x/dt^2 + g = 0,   or   d^2x/dt^2 = −g.

Integrate this formula to obtain dx/dt = −gt + C1. In this case dx/dt(t) is the velocity of the particle at time t, so the constant C1 = v0, the initial velocity of the particle. Integrate dx/dt = −gt + v0 to obtain x = −(1/2) g t^2 + v0 t + C2. In this case, C2 = x0, the position of the particle at time t = 0. Thus,

x = −(1/2) g t^2 + v0 t + x0.

In Exercises 3 – 5 find the general solution to the given differential equation.


{c6.6.hoa}
3. ẍ + 2ẋ − 3x = 0.
Answer: The general solution is
x(t) = αet + βe−3t
for α, β ∈ R.
Solution: Let y = ẋ, and rewrite the differential equation as ẏ + 2y − 3x = 0. Write the first order
system in two equations:
ẋ = y
ẏ = 3x − 2y.
This system can be represented by the matrix Q, where
 
0 1
Q= .
3 −2

The characteristic polynomial of Q is pQ(λ) = λ^2 + 2λ − 3, and the eigenvalues are λ1 = 1 and λ2 = −3. Thus, the general solution is

x(t) = α e^{λ1 t} + β e^{λ2 t} = α e^t + β e^{−3t}.


{c6.6.hob}
4. ẍ − 6ẋ + 9x = 0. In addition, find the solution to this equation satisfying initial values x(1) = 1
and ẋ(1) = 0.
Answer: The general solution is

x(t) = α e^{3t} + β t e^{3t}.

When x(1) = 1 and ẋ(1) = 0,

x(t) = (4 − 3t) e^{3(t−1)}.

Solution: Let y = ẋ, and rewrite the differential equation as ẏ − 6y + 9x = 0. Then the system can be rewritten as the first order system in two equations associated to the matrix

Q = [0 1; −9 6].

The characteristic polynomial of Q is pQ(λ) = λ^2 − 6λ + 9 = (λ − 3)^2, so Q has the real double eigenvalue λ1 = 3. Since Q is not a multiple of I2, the general solution is

x(t) = α e^{3t} + β t e^{3t}.

To find the solution with the given initial conditions, substitute x(1) = 1 into the formula for x(t), then compute ẋ(t) = (3α + β + 3βt) e^{3t} and substitute ẋ(1) = 0 to obtain the system:

α + β = e^{−3}
3α + 4β = 0.

Solving this system gives α = 4e^{−3} and β = −3e^{−3}, so x(t) = (4 − 3t) e^{3(t−1)}.
{c6.6.hoc}
5. ẍ + 2ẋ + 2x = 0.
Answer: The general solution is

x(t) = e−t (α cos t + β sin t).

Solution: Let y = ẋ and rewrite the system as ẏ + 2y + 2x = 0, or Ẋ = QX, where

X = [x; y]   and   Q = [0 1; −2 −2].

The characteristic polynomial of Q is pq (λ) = λ2 + 2λ + 2, and the eigenvalues of Q are λ1 = −1 + i


and λ2 = −1 − i. Thus, the general solution is

x(t) = c1 e(−1+i)t + c2 e(−1−i)t


for some c1 , c2 ∈ C. Using Euler’s formula eiθ = cos θ + i sin θ, we can write

x(t) = c1 e−t eit + c2 e−t e−it


= e−t (c1 (cos t + i sin t) + c2 (cos t − i sin t))
= e−t ((c1 + c2 ) cos t + i(c1 − c2 ) sin t).

We know that x(t) ∈ R for all t. Thus, (c1 + c2) ∈ R and (c1 − c2) is purely imaginary, so c2 is the complex conjugate of c1. Writing c1 = (1/2)(α − iβ) for α, β ∈ R then gives x(t) = e^{−t}(α cos t + β sin t).
2

{c6.7.3}
6. Prove that a nonzero solution to a second order linear differential equation with constant coeffi-
cients cannot be identically equal to zero on a nonempty interval.
Suppose x(t) = 0 for all t in a nonempty interval [a, b]. Choose a point c in the interior of [a, b]. Since x vanishes on a neighborhood of c, both x(c) = 0 and ẋ(c) = 0. The zero function is a solution of the differential equation with these same initial conditions at t = c, so by uniqueness of solutions to the initial value problem, x is identically 0. Hence a nonzero solution cannot be identically zero on a nonempty interval.

{c6.7.4}
7. Let r > 0 and w > 0 be constants, and let x(t) be a solution to the differential equation

ẍ + rẋ + wx = 0.

Show that lim x(t) = 0.


t→∞

This differential equation can be rewritten as Ẋ = QX, where y = ẋ,

X = [x; y],   and   Q = [0 1; −w −r].

Then we can solve for x(t) in closed form, obtaining

x(t) = α e^{λ1 t} + β e^{λ2 t}

(or x(t) = (α + βt) e^{λ1 t} if λ1 = λ2), where λ1 and λ2 are the roots of the characteristic polynomial pQ(λ) = λ^2 + rλ + w. By the quadratic formula, the eigenvalues are

λ = (−r ± √(r^2 − 4w))/2.

We know that −r < 0. If r^2 − 4w < 0, then λ1 and λ2 are complex conjugates with negative real part. Otherwise, λ1 and λ2 are negative real numbers, since r > √(r^2 − 4w) for any choice of r > 0 and w > 0. For any λ with negative real part

lim_{t→∞} e^{λt} = 0,   and likewise lim_{t→∞} t e^{λt} = 0.

Thus,

lim_{t→∞} x(t) = 0.


In Exercises 8 – 10, let x(t) be a solution to the second order linear homogeneous differential
equation (6.7.1). Determine whether the given statement is true or false.
{c6.6.tfa}
8. If x(t) is nonconstant and time periodic, then the roots of the characteristic polynomial are
purely imaginary.

True.

{c6.6.tfb}
9. If x(t) is constant in t, then one of the roots of the characteristic polynomial is zero.
Answer: False.
Solution: As a counterexample, consider the trivial case in which x(t) is identically 0.
{c6.6.tfc}
10. If x(t) is not bounded, then the roots of the characteristic polynomial are equal.
Answer: False.
Solution: As a counterexample, the function

x(t) = α e^{λ1 t} + β e^{λ2 t}

with positive α, β and distinct positive λ1, λ2 is unbounded, yet the roots of the characteristic polynomial are not equal.

{c3.5.5}
11. Consider the second order differential equation

{E:2ndorder}      d^2x/dt^2 + a(x) dx/dt + b(x) = 0.      (6.7.12)
Let y(t) = ẋ(t) and show that (6.7.12) may be rewritten as a first order coupled system in x(t) and
y(t) as follows:

ẋ = y
ẏ = −b(x) − a(x)y.

Let x(t) be a solution to the second order differential equation

{order2}      d^2x/dt^2 + a(x) dx/dt + b(x) = 0.      (6.7.13)

Let y(t) = ẋ(t). Then (6.7.13) can be rewritten as

d/dt(dx/dt) + a(x) dx/dt + b(x) = dy/dt + a(x)y + b(x) = 0.

So any solution to (6.7.13) is also a solution to the system

ẋ = y
{order2sol} (6.7.14)
ẏ = −b(x) − a(x)y.


Conversely, we can solve system (6.7.14) as follows:

d/dt(ẋ) = −b(x) − a(x)ẋ
d^2x/dt^2 = −b(x) − a(x) dx/dt
to show that any solution to (6.7.14) is also a solution to (6.7.13).

{c6.7.5} 12. (matlab) Use pplane9 to compute solutions to the system corresponding to the spring equa-
tions with small sliding friction. Plot the time series (in x) of the solution and observe the oscillating
and damping of the solution.
The pplane9 graph of the differential equation ẍ + 0.05ẋ + 3x = 0 is shown in Figure 12a. The x vs. t graph is shown in Figure 12b.

Figure 12a Figure 12b


7 Determinants and Eigenvalues


In Section 3.8 we introduced determinants for 2 × 2 matrices A. There we showed that the
determinant of A is nonzero if and only if A is invertible. In Section 4.6 we saw that the
eigenvalues of A are the roots of its characteristic polynomial, and that its characteristic
polynomial is just the determinant of a related matrix, namely, pA (λ) = det(A − λI2 ).
In Section 7.1 we generalize the concept of determinants to n×n matrices, and in Section 7.2
we use determinants to show that every n × n matrix has exactly n eigenvalues — the roots
of its characteristic polynomial. Properties of eigenvalues are also discussed in detail in
Section 7.2. Certain details concerning determinants are deferred to Appendix 7.4.


{C:D&E}
{S:det} 7.1 Determinants
There are several equivalent ways to introduce determinants — none of which are easily
motivated. We prefer to define determinants through the properties they satisfy rather
than by formula. These properties actually enable us to compute determinants of n × n
matrices where n > 3, which further justifies the approach. Later on, we will give an
{D:determinants} inductive formula (7.1.9) for computing the determinant.
Definition 7.1.1. A determinant function of a square n × n matrix A is a real number
D(A) that satisfies three properties:

(a) If A = (aij ) is lower triangular, then D(A) is the product of the diagonal entries; that
is,
D(A) = a11 · · · · · ann .
(b) D(At ) = D(A).
(c) Let B be an n × n matrix. Then
{e:detproduct} D(AB) = D(A)D(B). (7.1.1)
{T:determinants}
Theorem 7.1.2. There exists a unique determinant function det satisfying the three prop-
erties of Definition 7.1.1.

We will show that it is possible to compute the determinant of any n × n matrix using
Definition 7.1.1. Here we present a few examples:
Lemma 7.1.3. Let A be an n × n matrix.

(a) Let c ∈ R be a scalar. Then D(cA) = cn D(A).


(b) If all of the entries in either a row or a column of A are zero, then D(A) = 0.

Proof (a) Note that Definition 7.1.1(a) implies that D(cIn ) = cn . It follows from (7.1.1)
that
D(cA) = D(cIn A) = D(cIn )D(A) = cn D(A).

(b) Definition 7.1.1(b) implies that it suffices to prove this assertion when one row of A is
zero. Suppose that the ith row of A is zero. Let J be an n × n diagonal matrix with a
1 in every diagonal entry except the ith diagonal entry which is 0. A matrix calculation
shows that JA = A. It follows from Definition 7.1.1(a) that D(J) = 0 and from (7.1.1) that
D(A) = 0. 


Determinants of 2 × 2 Matrices Before discussing how to compute determinants, we


discuss the special case of 2 × 2 matrices. Recall from (3.8.2) of Section 3.8 that when
 
a b
A=
c d

we defined
{e:determinantn=2} det(A) = ad − bc. (7.1.2)
We check that (7.1.2) satisfies the three properties in Definition 7.1.1. Observe that when
A is lower triangular, then b = 0 and det(A) = ad. So (a) is satisfied. It is straightforward
to verify (b). We already verified (c) in Chapter 3, Proposition 3.8.2.
It is less obvious perhaps — but true nonetheless — that the three properties of D(A)
actually force the determinant of 2 × 2 matrices to be given by formula (7.1.2). We begin
by showing that Definition 7.1.1 implies that
 
0 1
{e:detswap} D = −1. (7.1.3)
1 0

We verify (7.1.3) by observing that

[0 1; 1 0]

equals

{e:swapdecomp}      [1 −1; 0 1] [1 0; 1 1] [1 0; 0 −1] [1 1; 0 1].      (7.1.4)

Hence properties (c), (a) and (b) imply that

D([0 1; 1 0]) = 1 · 1 · (−1) · 1 = −1.

It is helpful to interpret the matrices in (7.1.4) as elementary row operations. Then (7.1.4)
states that swapping two rows in a 2 × 2 matrix is the same as performing the following row
operations in order:

• add the 2nd row to the 1st row;


• multiply the 2nd row by −1;
• add the 1st row to the 2nd row; and
• subtract the 2nd row from the 1st row.


Suppose that d ≠ 0. Then

A = [a b; c d] = [1 b/d; 0 1] [(ad − bc)/d 0; c d].

It follows from properties (c), (b) and (a) that

D(A) = ((ad − bc)/d) · d = ad − bc = det(A),

as claimed.
Now suppose that d = 0 and note that

A = [a b; c 0] = [0 1; 1 0] [c 0; a b].

Using (7.1.3) we see that

D(A) = −D([c 0; a b]) = −bc = det(A),

as desired.
We have verified that the only possible determinant function for 2 × 2 matrices is the
determinant function defined by (7.1.2).

{P:ERO} Row Operations are Invertible Matrices


Proposition 7.1.4. Let A and B be m × n matrices where B is obtained from A by a
single elementary row operation. Then there exists an invertible m × m matrix R such that
B = RA.

Proof First consider multiplying the j th row of A by the nonzero constant c. Let R be
the diagonal matrix whose j th entry on the diagonal is c and whose other diagonal entries
are 1. Then the matrix RA is just the matrix obtained from A by multiplying the j th row
of A by c. Note that R is invertible when c 6= 0 and that R−1 is the diagonal matrix whose
1
j th entry is and whose other diagonal entries are 1. For example
c
    
1 0 0 a11 a12 a13 a11 a12 a13
 0 1 0   a21 a22 a23  =  a21 a22 a23  ,
0 0 2 a31 a32 a33 2a31 2a32 2a33


multiplies the 3rd row by 2.


Next we show that the elementary row operation that swaps two rows may also be thought
of as matrix multiplication. Let R = (rkl ) be the matrix that deviates from the identity
matrix by changing in the four entries:

rii = 0
rjj = 0
rij = 1
rji = 1

A calculation shows that RA is the matrix obtained from A by swapping the ith and j th
rows. For example,
    
0 0 1 a11 a12 a13 a31 a32 a33
 0 1 0   a21 a22 a23  =  a21 a22 a23  ,
1 0 0 a31 a32 a33 a11 a12 a13

which swaps the 1st and 3rd rows. Another calculation shows that R2 = In and hence that
R is invertible since R−1 = R.
Finally, we claim that adding c times one row of A to another row can be viewed as matrix multiplication. Let Ekℓ be the matrix all of whose entries are 0 except for the entry in the kth row and ℓth column, which is 1. Then R = In + cEij has the property that RA is the matrix obtained by adding c times the jth row of A to the ith row. We can verify by multiplication that R is invertible and that R^{−1} = In − cEij. More precisely,

(In + cEij)(In − cEij) = In + cEij − cEij − c^2 Eij^2 = In,

since Eij^2 = O for i ≠ j. For example,
  
1 5 0 a11 a12 a13
(I3 + 5E12 )A =  0 1 0   a21 a22 a23 
0 0 1 a31 a32 a33
 
a11 + 5a21 a12 + 5a22 a13 + 5a23
=  a21 a22 a23 ,
a31 a32 a33

adds 5 times the 2nd row to the 1st row. 
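
These elementary row matrices are easy to experiment with in MATLAB; the sketch below builds one matrix of each type for a sample 3 × 3 matrix A (all the particular numbers are arbitrary):

A = [1 2 3; 4 5 6; 7 8 9];         % any 3 x 3 matrix
R1 = eye(3); R1(3,3) = 2;          % multiplies the 3rd row by 2
R2 = eye(3); R2 = R2([3 2 1],:);   % swaps the 1st and 3rd rows
E12 = zeros(3); E12(1,2) = 1;
R3 = eye(3) + 5*E12;               % adds 5 times the 2nd row to the 1st row
R1*A, R2*A, R3*A                   % compare with the displays above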


{L:detelemrowmat} Determinants of Elementary Row Matrices


Lemma 7.1.5. (a) The determinant of the matrix that adds a multiple of one row to
another is 1.
(b) The determinant of the matrix that multiplies one row by c is c.
(c) The determinant of a swap matrix is −1.

Proof (a) The matrix that adds a multiple of one row to another is triangular (either
upper or lower) and has 1’s on the diagonal. Thus property (a) in Definition 7.1.1 implies
that the determinants of these matrices are equal to 1.
(b) The matrix that multiplies the ith row by c 6= 0 is a diagonal matrix all of whose diagonal
entries are 1 except for aii = c. Again property (a) implies that the determinant of this
matrix is c 6= 0.
(c) The matrix that swaps the ith row with the j th row is the product of four matrices of
types (a) and (b). To see this let A be an n × n matrix whose ith row vector is ai . Then
perform the following four operations in order:

Operation Result Matrix


Add row i to row j row i = ai row j = ai + aj B1
Multiply row i by −1 row i = −ai row j = ai + aj B2
Add row j to row i row i = aj row j = ai + aj B3
Subtract row i from row j row i = aj row j = ai B4

It follows that the swap matrix equals B4 B3 B2 B1 . Therefore

det(swap) = det(B4 ) det(B3 ) det(B2 ) det(B1 ) = (1)(−1)(1)(1) = −1.

Computation of Determinants We now show how to compute the determinant of any n×n
matrix A using elementary row operations and Definition 7.1.1. It follows from Proposi-
tion 7.1.4 that every elementary row operation on A may be performed by premultiplying
A by an elementary row matrix.
For each matrix A there is a unique reduced echelon form matrix E and a sequence of
elementary row matrices R1 . . . Rs such that

{e:rowreduction} E = Rs · · · R1 A. (7.1.5)


It follows from Definition 7.1.1(c) that we can compute the determinant of A once we know
the determinants of reduced echelon form matrices and the determinants of elementary row
matrices. In particular
{e:detformula}      D(A) = D(E) / (D(R1) · · · D(Rs)).      (7.1.6)

It is easy to compute the determinant of any matrix in reduced echelon form using Defini-
tion 7.1.1(a) since all reduced echelon form n×n matrices are upper triangular. Lemma 7.1.5
tells us how to compute the determinants of elementary row matrices. This discussion proves:

Proposition 7.1.6. If a determinant function exists for n × n matrices, then it is unique.


We call the unique determinant function det.

We still need to show that determinant functions exist when n > 2. More precisely, we
know that the reduced echelon form matrix E is uniquely defined from A (Chapter 2,
Theorem 2.4.9), but there is more than one way to perform elementary row operations on
A to get to E. Thus, we can row reduce A as in (7.1.5) in many different ways, and these different decompositions might lead to different values of D(A) in (7.1.6). (They don't.)

An Example of Determinants by Row Reduction As a practical matter we row reduce a square


matrix A by premultiplying A by an elementary row matrix Rj . Thus

{e:pracdet}      det(A) = (1/det(Rj)) det(Rj A).      (7.1.7)

We use this approach to compute the determinant of the 4 × 4 matrix


 
0 2 10 −2
 1 2 4 0 
A= .
 1 6 1 −2 
2 1 1 0

The idea is to use (7.1.7) to keep track of the determinant while row reducing A to upper
triangular form. For instance, swapping rows changes the sign of the determinant; so
 
1 2 4 0
 0 2 10 −2 
det(A) = − det  .
 1 6 1 −2 
2 1 1 0


Adding multiples of one row to another leaves the determinant unchanged; so


 
1 2 4 0
 0 2 10 −2 
det(A) = − det  .
 0 4 −3 −2 
0 −3 −7 0

Multiplying a row by a scalar c corresponds to an elementary row matrix whose determinant


is c. To make sure that we do not change the value of det(A), we have to divide the
determinant by c as we multiply a row of A by c. So as we divide the second row of the
matrix by 2, we multiply the whole result by 2, obtaining
 
1 2 4 0
 0 1 5 −1 
det(A) = −2 det  .
 0 4 −3 −2 
0 −3 −7 0

We continue row reduction by zeroing out the last two entries in the 2nd column, obtaining

det(A) = −2 det [1 2 4 0; 0 1 5 −1; 0 0 −23 2; 0 0 8 −3]
       = 46 det [1 2 4 0; 0 1 5 −1; 0 0 1 −2/23; 0 0 8 −3].

Thus

det(A) = 46 det [1 2 4 0; 0 1 5 −1; 0 0 1 −2/23; 0 0 0 −53/23] = −106.

Determinants and Inverses We end this subsection with an important observation about
the determinant function. This observation generalizes to dimension n Corollary 3.8.3 of
Chapter 3.


{T:detandinv}
Theorem 7.1.7. An n × n matrix A is invertible if and only if det(A) 6= 0. Moreover, if
A−1 exists, then
{E:detinv}      det(A^{−1}) = 1/det(A).      (7.1.8)

Proof If A is invertible, then

det(A) det(A−1 ) = det(AA−1 ) = det(In ) = 1.

Thus det(A) 6= 0 and (7.1.8) is valid. In particular, the determinants of elementary row ma-
trices are nonzero, since they are all invertible. (This point was proved by direct calculation
in Lemma 7.1.5.)
If A is singular, then A is row equivalent to a non-identity reduced echelon form matrix E
whose determinant is zero (since E is upper triangular and its last diagonal entry is zero).
So it follows from (7.1.5) that

0 = det(E) = det(R1 ) · · · det(Rs ) det(A)

Since det(Rj ) 6= 0, it follows that det(A) = 0. 

Corollary 7.1.8. If the rows of an n × n matrix A are linearly dependent (for example, if
one row of A is a scalar multiple of another row of A), then det(A) = 0.

An Inductive Formula for Determinants In this subsection we present an inductive for-


mula for the determinant — that is, we assume that the determinant is known for square
(n − 1) × (n − 1) matrices and use this formula to define the determinant for n × n matrices.
This inductive formula is called expansion by cofactors.
Let A = (aij ) be an n × n matrix. Let Aij be the (n − 1) × (n − 1) matrix formed from A
by deleting the ith row and the j th column. The matrices Aij are called cofactor matrices
of A.
Inductively we define the determinant of an n × n matrix A by:
{e:inductdet}      det(A) = Σ_{j=1}^{n} (−1)^{1+j} a1j det(A1j)
                          = a11 det(A11) − a12 det(A12) + · · · + (−1)^{n+1} a1n det(A1n).      (7.1.9)


In Appendix 7.4 we show that the determinant function defined by (7.1.9) satisfies all
properties of a determinant function. Formula (7.1.9) is also called expansion by cofactors
along the 1st row, since the a1j are taken from the 1st row of A. Since det(A) = det(At ), it
follows that if (7.1.9) is valid as an inductive definition of determinant, then expansion by
cofactors along the 1st column is also valid. That is,

{e:inductdetc} det(A) = a11 det(A11 ) − a21 det(A21 ) + · · · + (−1)n+1 an1 det(An1 ). (7.1.10)

We now explore some of the consequences of this definition, beginning with determinants
of small matrices. For example, Definition 7.1.1(a) implies that the determinant of a 1 × 1
matrix is just
det(a) = a.
Therefore, using (7.1.9), the determinant of a 2 × 2 matrix is:
 
a11 a12
det = a11 det(a22 ) − a12 det(a21 ) = a11 a22 − a12 a21 ,
a21 a22

which is just the formula for determinants of 2 × 2 matrices given in (7.1.2).


Similarly, we can now find a formula for the determinant of a 3 × 3 matrix A as follows:

det(A) = a11 det [a22 a23; a32 a33] − a12 det [a21 a23; a31 a33] + a13 det [a21 a22; a31 a32]
       = a11 a22 a33 + a12 a23 a31 + a13 a21 a32
{e:det3}        − a11 a23 a32 − a12 a21 a33 − a13 a22 a31.      (7.1.11)

As an example, compute

det [2 1 4; 1 −1 3; 5 6 −2]

using formula (7.1.11) as

2(−1)(−2) + 1 · 3 · 5 + 4 · 6 · 1 − 4(−1)5 − 3 · 6 · 2 − (−2)1 · 1 = 4 + 15 + 24 + 20 − 36 + 2 = 29.
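
The inductive formula (7.1.9) also translates directly into a short recursive MATLAB function. The sketch below is not a built-in command (MATLAB's own routine is det); the function name and file cofactordet.m are our own choices:

function d = cofactordet(A)
% determinant by cofactor expansion along the first row, as in (7.1.9)
n = size(A,1);
if n == 1
    d = A(1,1);
else
    d = 0;
    for j = 1:n
        A1j = A(2:n, [1:j-1 j+1:n]);                 % delete row 1 and column j
        d = d + (-1)^(1+j)*A(1,j)*cofactordet(A1j);
    end
end
end

For the 3 × 3 matrix above, cofactordet([2 1 4; 1 -1 3; 5 6 -2]) returns 29, in agreement with the hand computation.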

There is a visual mnemonic for remembering how to compute the six terms in formula
(7.1.11) for the determinant of 3 × 3 matrices. Write the matrix as a 3 × 5 array by


a11 a12 a13 a11 a12

a21 a22 a23 a21 a22

a31 a32 a33 a31 a32

{F:det3} Figure 24: Mnemonic for computation of determinants of 3 × 3 matrices.

repeating the first two columns, as shown in bold face in Figure 24: Then add the product
of terms connected by solid lines sloping down and to the right and subtract the products of
terms connected by dashed lines sloping up and to the right. Warning: this nice crisscross
algorithm for computing determinants of 3×3 matrices does not generalize to n×n matrices.
When computing determinants of n × n matrices when n > 3, it is usually more efficient to
compute the determinant using row reduction rather than by using formula (7.1.9). In the
appendix to this chapter, Section 7.4, we verify that formula (7.1.9) actually satisfies the
three properties of a determinant, thus completing the proof of Theorem 7.1.2.
An interesting and useful formula for reducing the effort in computing determinants is given
{L:detblockdiag} by the following formula.
Lemma 7.1.9. Let A be an n × n matrix of the form
 
B 0
A= ,
C D

where B is a k × k matrix and D is an (n − k) × (n − k) matrix. Then

det(A) = det(B) det(D).

Proof We prove this result using (7.1.9) coupled with induction. Assume that this lemma is valid for all (n − 1) × (n − 1) matrices of the appropriate form. Now use (7.1.9) to compute
det(A) = a11 det(A11 ) − a12 det(A12 ) + · · · ± a1n det(A1n )
= b11 det(A11 ) − b12 det(A12 ) + · · · ± b1k det(A1k ).
Note that the cofactor matrices A1j are obtained from A by deleting the 1st row and the
j th column. These matrices all have the form
 
B1j 0
A1j = ,
Cj D

where Cj is obtained from C by deleting the j th column. By induction on k


det(A1j ) = det(B1j ) det(D).
It follows that
det(A) = (b11 det(B11 ) − b12 det(B12 ) + · · ·
±b1k det(B1k )) det(D)
= det(B) det(D),
as desired. 
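
As a quick numerical illustration of the lemma in MATLAB (the blocks below are arbitrary sample matrices):

B = [1 2; 3 4];
D = [5 6 0; 7 8 9; 1 0 2];
C = ones(3,2);
A = [B zeros(2,3); C D];          % block lower triangular, as in the lemma
det(A) - det(B)*det(D)            % zero up to roundoff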

Determinants in MATLAB The determinant function has been preprogrammed in MAT-


LAB and is quite easy to use. For example, typing e8_1_11 will load the matrix
 
1 2 3 0
 2 1 4 1 
{e:A4x4} A=  −2 −1
. (7.1.12*)
0 1 
−1 0 −2 3
To compute the determinant of A just type det(A) and obtain the answer

ans =
-46

Alternatively, we can use row reduction techniques in MATLAB to compute the determinant
of A — just to test the theory that we have developed. Note that to compute the determinant
we do not need to row reduce to reduced echelon form — we need only reduce to an upper
triangular matrix. This can always be done by successively adding multiples of one row to
another — an operation that does not change the determinant. For example, to clear the
entries in the 1st column below the 1st row, type


A(2,:) = A(2,:) - 2*A(1,:);


A(3,:) = A(3,:) + 2*A(1,:);
A(4,:) = A(4,:) + A(1,:)

obtaining

A =
1 2 3 0
0 -3 -2 1
0 3 6 1
0 2 1 3

To clear the 2nd column below the 2nd row type

A(3,:) = A(3,:) + A(2,:);
A(4,:) = A(4,:) - A(4,2)*A(2,:)/A(2,2)

obtaining

A =
1.0000 2.0000 3.0000 0
0 -3.0000 -2.0000 1.0000
0 0 4.0000 2.0000
0 0 -0.3333 3.6667

Finally, to clear the entry (4, 3) type

A(4,:) = A(4,:) -A(4,3)*A(3,:)/A(3,3)

to obtain

A =
1.0000 2.0000 3.0000 0
0 -3.0000 -2.0000 1.0000
0 0 4.0000 2.0000
0 0 0 3.8333

To evaluate the determinant of A, which is now an upper triangular matrix, type

A(1,1)*A(2,2)*A(3,3)*A(4,4)


obtaining

ans =
-46

as expected.

Exercises

In Exercises 1 – 3 compute the determinants of the given matrix.


{c10.1.1a}
 
−2 1 0
1. A =  4 5 0 .
1 0 2
Answer: The determinant of the matrix is −28.
Solution: Expand along the third column, obtaining:
 
−2 1 0  
−2 1
det  4 5 0  = 2 det = 2(−14) = −28.
4 5
1 0 2
{c10.1.1b}
 
1 0 2 3
 −1 −2 3 2 
2. B =  .
 4 −2 0 3 
1 2 0 −3
Answer: The determinant of the matrix is −110.
Solution: First row reduce:
   
1 0 2 3 1 0 2 3
 −1 −2 3 2   0 −2 5 5 
det   = det  .
 4 −2 0 3   0 −2 −8 −9 
1 2 0 −3 0 2 −2 −6

Then, use formula (7.1.9):


 
det [−2 5 5; −2 −8 −9; 2 −2 −6] = (−2)(−8)(−6) + 5(−9)2 + 5(−2)(−2) − 5(−8)2 − 5(−2)(−6) − (−2)(−9)(−2)
                                = −96 − 90 + 20 + 80 − 60 + 36
                                = −110.


{c10.1.1c}
 
2 1 −1 0 0
 1 −2 3 0 0 
3. C =  .
 
 −3 2 −2 0 0 
 1 1 −1 2 4 
0 2 3 −1 −3
Answer: The determinant of the matrix is 14.
Solution: Using Lemma 7.1.9, compute

det(C) = det [2 1 −1; 1 −2 3; −3 2 −2] det [2 4; −1 −3]
       = (2(−2)(−2) + 1 · 3(−3) + (−1)1 · 2 − (−1)(−2)(−3) − 1 · 1(−2) − 2 · 3 · 2)(−2)
       = (−7)(−2)
       = 14.

{mc7_1A}
 
4. Compute det [0 2 0 1; 1 −1 0 −1; 1 1 1 3; 0 1 0 1].

Answer: det [0 2 0 1; 1 −1 0 −1; 1 1 1 3; 0 1 0 1] = −1.

Solution: Use cofactor expansion along the third column to compute

det [0 2 0 1; 1 −1 0 −1; 1 1 1 3; 0 1 0 1] = det [0 2 1; 1 −1 −1; 0 1 1] = −det [2 1; 1 1] = −1.

{c10.1.2}
 
−2 −3 2
5. Find det(A−1 ) where A =  4 1 3 .
−1 1 1
1
Answer: The solution is det(A−1 ) = .
35


Solution: By Definition 7.1.1(c), det(A) det(A−1 ) = det(I3 ) = 1. Therefore,


1
det(A−1 ) = .
det(A)
Now compute det(A) using (7.1.9):
     
det(A) = −2 det [1 3; 1 1] + 3 det [4 3; −1 1] + 2 det [4 1; −1 1] = 35.
{c10.1.3}
6. Show that the determinants of similar n × n matrices are equal.
Two n × n matrices B and C are similar if there exists an n × n matrix P such that B = P −1 CP .
Therefore, by Definition 7.1.1(c),
det(B) = det(P −1 CP ) = det(P −1 ) det(C) det(P ) = det(P )−1 det(P ) det(C) = det(C).

{c10.1.4} In Exercises 7 – 9 use row reduction to compute the determinant of the given matrix.
 
−1 −2 1
7. A =  3 1 3 .
−1 1 1
Answer: The determinant is det(A) = 18.
Solution: Compute by row reduction as follows:
   
−1 −2 1 1 2 −1
det  3 1 3  = − det  0 −5 6 
−1 1 1  0 3 0
1 2 −1
= 3 det  0 1 0 
 0 −5 6
1 2 −1
= 3 det  0 1 0  = 18.
{c10.1.5a} 0 0 6
 
1 0 1 0
 0 1 0 −1
8. B =  .

 1 0 −1 0 
0 1 0 1
Answer: The determinant is det(B) = −4.
Solution: Compute by row reduction:
   
1 0 1 0 1 0 1 0
 0 1 0 −1    0 1 0 −1 
det  = det   = 1(1)(−2)(2).
 1 0 −1 0   0 0 −2 0 
0 1 0 1 0 0 0 2


{c10.1.5b}
 
1 2 0 1
 0 2 1 0 
9. C = 
 −2
.
−3 3 −1 
1 0 5 2
Answer: The determinant is det(C) = −7.
Solution: Compute by row reduction:

det [1 2 0 1; 0 2 1 0; −2 −3 3 −1; 1 0 5 2] = 5 det [1 2 0 1; 0 1 3 1; 0 0 1 2/5; 0 0 0 −7/5] = 5(−7/5) = −7.

{mc.exercise8}
 
0 2 0 1
 1 −1 0 −1 
10. Compute det 
 1
.
1 1 3 
0 1 0 1
Answer:  
0 2 0 1
 1 −1 0 −1 
 = −1
det 
 1 1 1 3 
0 1 0 1

Solution: Expanding along the third column, we are reduced to computing


 
0 2 1
det  1 −1 −1  .
0 1 1

Then, expanding along the first column, we are reduced to computing


 
2 1
(−1) det = −(2 − 1) = −1.
1 1

{c10.1.6}
11. Let    
2 −1 0 2 0 0
A= 0 3 0  and B= 0 −1 0 .
1 5 3 0 0 3


(a) For what values of λ is det(λA − B) = 0?


(b) Is there a vector x for which Ax = Bx?
1
(a) Answer: When λ = 1 or λ = − , det(λA − B) = 0.
3
Solution: Compute

det(λA − B) = det [2λ−2 −λ 0; 0 3λ+1 0; λ 5λ 3λ−3]
            = 3(λ − 1) det [2(λ−1) −λ; 0 3λ+1]
            = 6(λ − 1)^2 (3λ + 1).

(b) Answer: Yes, there exists a nonzero vector x ∈ R3 such that Ax = Bx.

Solution: Let λ = 1. Then, part (a) of this exercise implies that det(A − B) = 0. Therefore, there exists a nonzero vector x such that (A − B)x = 0, that is, Ax = Bx. You can solve this equation for x to
obtain x = (0, 0, 1)t .

{c10.1.5c}
 
0 2 0 1
 1 −1 0 −1 
12. Compute det 
 .
1 1 1 3 
0 1 0 1
Answer: −1
Solution: Use cofactor expansions to compute
 
0 2 0 1  
 1 −1 0 −1  0 2 1  
2 1
det 
 1
 = det  1 −1 −1  = − det = −1
1 1 3  1 1
0 1 1
0 1 0 1

{c10.1.a7a} In Exercises 13 – 14 verify that the given matrix has determinant −1.
 
1 0 0
13. A =  0 0 1 .
0 1 0
By swapping rows 2 and 3 of matrix A, we find that:
   
1 0 0 1 0 0
det  0 0 1  = − det  0 1 0  = −1.
0 1 0 0 0 1


{c10.1.a7b}
 
0 0 1
14. B =  0 1 0 .
1 0 0
By swapping rows 1 and 3 of matrix B, we find that:
   
0 0 1 1 0 0
det  0 1 0  = − det  0 1 0  = −1.
1 0 0 0 0 1

{c10.1.b7a}
 
3 2 −4
15. Compute the cofactor matrices A13 , A22 , A21 when A =  0 1 5 .
0 0 6
     
0 1 3 −4 2 −4
A13 = ; A22 = ; A21 = .
0 0 0 6 0 6

{c10.1.b7b}
 
0 2 −4 5
 −1 7 −2 10 
16. Compute the cofactor matrices B11 , B23 , B43 when B =  .
 0 0 0 −1 
3 4 2 −10
     
7 −2 10 0 2 5 0 2 5
B11 =  0 0 −1 ; B23 =  0 0 −1 ; B43 =  −1 7 10 .
4 2 −10 3 4 −10 0 0 −1

{c10.1.c7}
17. Find values of λ where the determinant of the matrix
 
λ−1 0 −1
Aλ =  0 λ−1 1 
−1 1 λ

vanishes.
Answer: The determinant of Aλ vanishes at λ = −1, λ = 1, and λ = 2.
Solution: Compute det(Aλ ) by expanding along the first column to obtain:

det(Aλ ) = (λ − 1)((λ − 1)λ − 1) − 1(λ − 1)


= (λ − 1)(λ − 2)(λ + 1).


{c10.1.c8}
18. Suppose that two n × p matrices A and B are row equivalent. Show that there is an invertible
n × n matrix P such that B = P A.
By Proposition 7.1.4, every elementary row operation on A can be represented by an invertible
n × n matrix R. That is, the matrix RA is row equivalent to A. If A and B are row equivalent,
then there exist matrices R1 , . . . , Rk such that B = Rk · · · R1 A. The product of invertible n × n
matrices is an invertible n × n matrix. Thus P = Rk · · · R1 is an invertible n × n matrix such that
B = P A.
{c10.1.c9}
19. Let A be an invertible n × n matrix and let b ∈ Rn be a column vector. Let Bj be the n × n
matrix obtained from A by replacing the j th column of A by the vector b. Let x = (x1 , . . . , xn )t be
the unique solution to Ax = b. Then Cramer’s rule states that
{E:cramer2}      xj = det(Bj) / det(A).      (7.1.13)

Prove Cramer’s rule. Hint: Let Aj be the j th column of A so that Aj = Aej . Show that

Bj = A(e1 | · · · |ej−1 |x|ej+1 | · · · |en ).

Using this product, compute the determinant of Bj and verify (7.1.13).


Let A be an invertible n × n matrix and let b ∈ Rn be a column vector. Let Bj be the n × n matrix
obtained from A by replacing the j th column of A by the vector b. Let x = (x1 , . . . , xn )t be the
unique solution to Ax = b. We claim that
xj = det(Bj) / det(A).

Let Aj be the j th column of A so that Aej = Aj . It follows that

A(e1 | · · · |ej−1 |x|ej+1 | · · · |en ) = (Ae1 | · · · |Aej−1 |Ax|Aej+1 | · · · |Aen )


= (A1 | · · · |Aj−1 |b|Aj+1 | · · · |An )
= Bj .

Therefore,
det(Bj ) = det(A) det(e1 | · · · |ej−1 |x|ej+1 | · · · |en ).
We claim that
det(e1 | · · · |ej−1 |x|ej+1 | · · · |en ) = xj ,
from which Cramer’s rule follows. Since det(C) = det(C t ), we can perform elementary column
operations on C while computing determinants. In particular, we can subtract xk ek (for k > j)
from the j th column of
(e1 | · · · |ej−1 |x|ej+1 | · · · |en )


without changing its determinant. The end result is that

det(e1 | · · · |ej−1 |x|ej+1 | · · · |en ) = det(e1 | · · · |ej−1 |y|ej+1 | · · · |en ),

where y = (x1 , . . . , xj , 0, . . . , 0)t . Since this last matrix is upper triangular, its determinant is the
product of its diagonal elements — which equals xj .
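
Cramer's rule is simple to implement in MATLAB, although for large systems it is far less efficient than the backslash operator. The sketch below uses an arbitrary sample system, and the built-in command A\b is included only for comparison:

A = [2 1 -1; -3 -1 2; -2 1 2];     % a sample invertible matrix
b = [8; -11; -3];
x = zeros(3,1);
for j = 1:3
    Bj = A;
    Bj(:,j) = b;                   % replace the jth column of A by b
    x(j) = det(Bj)/det(A);
end
x, A\b                             % the two answers agree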


{S:eig} 7.2 Eigenvalues


In this section we discuss how to find eigenvalues for an n × n matrix A. This discussion
parallels the discussion for 2 × 2 matrices given in Section 4.6. As we noted in that section,
λ is a real eigenvalue of A if there exists a nonzero eigenvector v such that
{e:eigen} Av = λv. (7.2.1)
It follows that the matrix A − λIn is singular since
(A − λIn )v = 0.
Theorem 7.1.7 implies that
det(A − λIn ) = 0.
{D:charpoly} With these observations in mind, we can make the following definition.
Definition 7.2.1. Let A be an n × n matrix. The characteristic polynomial of A is:
pA (λ) = det(A − λIn ).

In Theorem 7.2.3 we show that pA (λ) is indeed a polynomial of degree n in λ. Note here
that the roots of pA are the eigenvalues of A. As we discussed, the real eigenvalues of
A are roots of the characteristic polynomial. Conversely, if λ is a real root of pA , then
Theorem 7.1.7 states that the matrix A − λIn is singular and therefore that there exists a
nonzero vector v such that (7.2.1) is satisfied. Similarly, by using this extended algebraic
definition of eigenvalues we allow the possibility of complex eigenvalues. The complex analog
of Theorem 7.1.7 shows that if λ is a complex eigenvalue, then there exists a nonzero complex
{E:triangular} n-vector v such that (7.2.1) is satisfied.
Example 7.2.2. Let A be an n × n lower triangular matrix. Then the diagonal entries are
the eigenvalues of A. We verify this statement as follows.
 
A − λIn = [ a11 − λ              0
              (∗)      ⋱
                             ann − λ ].
Since the determinant of a triangular matrix is the product of the diagonal entries, it follows
that
{e:triangpoly} pA (λ) = (a11 − λ) · · · (ann − λ), (7.2.2)
and hence that the diagonal entries of A are roots of the characteristic polynomial. A similar
argument works if A is upper triangular.


It follows from (7.2.2) that the characteristic polynomial of a triangular matrix is a polyno-
mial of degree n and that
{e:leadingterm} pA (λ) = (−1)n λn + bn−1 λn−1 + · · · + b0 . (7.2.3)
{T:charpolyn} for some real constants b0 , . . . , bn−1 . In fact, this statement is true in general.
Theorem 7.2.3. Let A be an n × n matrix. Then pA is a polynomial of degree n of the
form (7.2.3).

Proof Let C be an n × n matrix whose entries have the form cij + dij λ. Then det(C)
is a polynomial in λ of degree at most n. We verify this statement by induction. It is
easily verified when n = 1, since then C = (c + dλ) for some real numbers c and d. Then
det(C) = c + dλ which is a polynomial of degree at most one. (It may have degree zero, if
d = 0.) So assume that this statement is true for (n − 1) × (n − 1) matrices. Recall from
(7.1.9) that
det(C) = (c11 + d11 λ) det(C11 ) + · · · + (−1)n+1 (c1n + d1n λ) det(C1n ).
By induction each of the determinants C1j is a polynomial of degree at most n−1. It follows
that multiplication by c1j + d1j λ yields a polynomial of degree at most n in λ. Since the
sum of polynomials of degree at most n is a polynomial of degree at most n, we have verified
our assertion.
Since A − λIn is a matrix whose entries have the desired form, it follows that pA (λ) is a
polynomial of degree at most n in λ. To complete the proof of this theorem we need to
show that the coefficient of λn is (−1)n . Again, we verify this statement by induction. This
statement is easily verified for 1 × 1 matrices — we assume that it is true for (n − 1) × (n − 1)
matrices. Again use (7.1.9) to compute
det(A − λIn ) = (a11 − λ) det(B11 ) − a12 det(B12 ) + · · ·
+ (−1)n+1 a1n det(B1n ).
where B1j are the cofactor matrices of A − λIn . Using our previous observation all of the
terms det(B1j ) are polynomials of degree at most n − 1. Thus, in this expansion, the only
term that can contribute a term of degree n is:
−λ det(B11 ).
Note that the cofactor matrix B11 is the (n − 1) × (n − 1) matrix
B11 = A11 − λIn−1 ,
where A11 is the first cofactor matrix of the matrix A. By induction, det(B11 ) is a polynomial
of degree n − 1 with leading term (−1)n−1 λn−1 . Multiplying this polynomial by −λ yields
a polynomial of degree n with the correct leading term. 


General Properties of Eigenvalues The fundamental theorem of algebra states that ev-
ery polynomial of degree n has exactly n roots (counting multiplicity). For example, the
quadratic formula shows that every quadratic polynomial has exactly two roots. In general,
the proof of the fundamental theorem is not easy and is certainly beyond the limits of this
course. Indeed, the difficulty in proving the fundamental theorem of algebra is in proving
that a polynomial p(λ) of degree n > 0 has one (complex) root. Suppose that λ0 is a root
of p(λ); that is, suppose that p(λ0 ) = 0. Then it follows that

{e:factoring} p(λ) = (λ − λ0 )q(λ) (7.2.4)

for some polynomial q of degree n − 1. So once we know that p has a root, then we can
argue by induction to prove that p has n roots. A linear algebra proof of (7.2.4) is sketched
in Exercise 17.
Recall that a polynomial need not have any real roots. For example, the polynomial p(λ) = λ^2 + 1 has no real roots, since p(λ) > 0 for all real λ. This polynomial does have two complex roots ±i = ±√−1.
However, a polynomial with real coefficients has either real roots or complex roots that come in complex conjugate pairs. To verify this statement, we need to show that if λ0 is a complex root of p(λ), then so is its conjugate λ̄0. We claim that

p(λ̄) = \overline{p(λ)},

where the overline denotes complex conjugation. To verify this point, suppose that

p(λ) = cn λ^n + cn−1 λ^{n−1} + · · · + c0,

where each cj ∈ R. Since the coefficients cj are real,

p(λ̄) = cn λ̄^n + cn−1 λ̄^{n−1} + · · · + c0 = \overline{cn λ^n + cn−1 λ^{n−1} + · · · + c0} = \overline{p(λ)}.

If λ0 is a root of p(λ), then

p(λ̄0) = \overline{p(λ0)} = 0̄ = 0.

Hence λ̄0 is also a root of p.
It follows that


{T:eigens}
Theorem 7.2.4. Every (real) n × n matrix A has exactly n eigenvalues λ1 , . . . , λn . These
eigenvalues are either real or complex conjugate pairs. Moreover,

(a) pA (λ) = (λ1 − λ) · · · (λn − λ),


(b) det(A) = λ1 · · · λn .

Proof Since the characteristic polynomial pA is a polynomial of degree n with real coeffi-
cients, the first part of the theorem follows from the preceding discussion. In particular, it
follows from (7.2.4) that
pA (λ) = c(λ1 − λ) · · · (λn − λ),
for some constant c. Formula (7.2.3) implies that c = 1 — which proves (a). Since pA (λ) =
det(A − λIn ), it follows that pA (0) = det(A). Thus (a) implies that pA (0) = λ1 · · · λn , thus
proving (b). 

The eigenvalues of a matrix do not have to be different. For example, consider the extreme
case of a strictly triangular matrix A. Example 7.2.2 shows that all of the eigenvalues of A
are zero.

{C:eig=0} We now discuss certain properties of eigenvalues.


Corollary 7.2.5. Let A be an n × n matrix. Then A is invertible if and only if zero is not
an eigenvalue of A.

Proof The proof follows from Theorem 7.1.7 and Theorem 7.2.4(b). 

Lemma 7.2.6. Let A be a singular n × n matrix. Then the null space of A is the span of
all eigenvectors whose associated eigenvalue is zero.

Proof An eigenvector v of A has eigenvalue zero if and only if

Av = 0.

This statement is valid if and only if v is in the null space of A. 


{T:inveig}
Theorem 7.2.7. Let A be an invertible n × n matrix with eigenvalues λ1, · · · , λn. Then the eigenvalues of A^{−1} are λ1^{−1}, · · · , λn^{−1}.


Proof We claim that

pA(λ) = (−1)^n det(A) λ^n p_{A^{−1}}(1/λ).

It then follows that 1/λ is an eigenvalue of A^{−1} for each eigenvalue λ of A. This makes sense, since the eigenvalues of A are nonzero.
Compute:

(−1)^n det(A) λ^n p_{A^{−1}}(1/λ) = (−λ)^n det(A) det(A^{−1} − (1/λ)In)
                                  = det(−λA) det(A^{−1} − (1/λ)In)
                                  = det(−λA (A^{−1} − (1/λ)In))
                                  = det(A − λIn)
                                  = pA(λ),

which verifies the claim. 

{T:similareigen}
Theorem 7.2.8. Let A and B be similar n × n matrices. Then
pA = pB ,
and hence the eigenvalues of A and B are identical.

Proof Since B and A are similar, there exists an invertible n × n matrix S such that
B = S −1 AS. It follows that
det(B − λIn ) = det(S −1 AS − λIn )
= det(S −1 (A − λIn )S) = det(A − λIn ),
which verifies that pA = pB . 

Recall that the trace of an n × n matrix A is the sum of the diagonal entries of A; that is
tr(A) = a11 + · · · + ann .

{T:tracen} We state without proof the following theorem:


Theorem 7.2.9. Let A be an n × n matrix with eigenvalues λ1 , . . . , λn . Then
tr(A) = λ1 + · · · + λn .

It follows from Theorem 7.2.8 that the traces of similar matrices are equal.


MATLAB Calculations The commands for computing characteristic polynomials and eigen-
values of square matrices are straightforward in MATLAB . In particular, for an n×n matrix
A, the MATLAB command poly(A) returns the coefficients of (−1)n pA (λ).
For example, reload the 4 × 4 matrix A of (7.1.12*) by typing e8_1_11. The characteristic
polynomial of A is found by typing

poly(A)

to obtain

ans =
1.0000 -5.0000 15.0000 -10.0000 -46.0000

Thus the characteristic polynomial of A is:

pA (λ) = λ4 − 5λ3 + 15λ2 − 10λ − 46.

The eigenvalues of A are found by typing eig(A) and obtaining

ans =
-1.2224
1.6605 + 3.1958i
1.6605 - 3.1958i
2.9014

Thus A has two real eigenvalues and one complex conjugate pair of eigenvalues. Note
that MATLAB has preprogrammed not only the algorithm for finding the characteristic
polynomial, but also numerical routines for finding the roots of the characteristic polynomial.
The trace of A is found by typing trace(A) and obtaining

ans =
5

Using the MATLAB command sum we can verify the statement of Theorem 7.2.9. Indeed
sum(v) computes the sum of the components of the vector v and typing

sum(eig(A))

we obtain the answer 5.0000, as expected.
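
Theorem 7.2.4(b) can be checked the same way: typing

prod(eig(A))

returns −46 up to roundoff (possibly displayed with a negligible imaginary part, since two of the eigenvalues form a complex conjugate pair), in agreement with the value of det(A) computed in Section 7.1.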


Exercises

{c10.2.1a} In Exercises 1 – 2 determine the characteristic polynomial and the eigenvalues of the given matrices.
 
−9 −2 −10
1. A =  3 2 3 .
8 2 9
Answer: The characteristic polynomial of A is pA (λ) = −λ3 + 2λ2 + λ − 2, and the eigenvalues
are λ1 = 1, λ2 = −1, and λ3 = 2.
Solution: Compute:

pA(λ) = det(A − λI3)
      = det [−9−λ −2 −10; 3 2−λ 3; 8 2 9−λ]
      = (−9 − λ) det [2−λ 3; 2 9−λ] − 3 det [−2 −10; 2 9−λ] + 8 det [−2 −10; 2−λ 3]
      = (−9 − λ)(λ^2 − 11λ + 12) − 3(2 + 2λ) + 8(14 − 10λ)
      = −λ^3 + 2λ^2 + λ − 2
      = −(λ − 1)(λ + 1)(λ − 2).

The eigenvalues of A are the roots of the characteristic polynomial.

{c10.2.1b}


 
2. B = [2 1 −5 2; 1 2 13 2; 0 0 3 −1; 0 0 1 1].
Answer: The characteristic polynomial of B is pB (λ) = λ4 − 8λ3 + 23λ2 − 28λ + 12. The matrix
B has single eigenvalues at λ = 1 and λ = 3 and a double eigenvalue at λ = 2.
Solution: Using Lemma 7.1.9, compute:

pB(λ) = det(B − λI4)
      = det [2−λ 1 −5 2; 1 2−λ 13 2; 0 0 3−λ −1; 0 0 1 1−λ]
      = det [2−λ 1; 1 2−λ] det [3−λ −1; 1 1−λ]
      = ((2 − λ)^2 − 1)((3 − λ)(1 − λ) + 1)
      = (λ − 3)(λ − 1)(λ − 2)^2.

464
§7.2 Eigenvalues

{c10.2.2}
3. Find a basis for the null space of A − 2I3 where
 
3 1 −1
A =  −1 1 1 
2 2 0

Answer: A basis for the null space of A − 2I3 is
{(−1, 1, 0)t , (1, 0, 1)t }.
Solution: Solve the system
(A − 2I3 )v = [ 1 1 −1 ; −1 −1 1 ; 2 2 −2 ] v = 0.
All solutions v to this system satisfy v1 = v3 − v2 . Thus:
v = (v3 − v2 , v2 , v3 )t = v2 (−1, 1, 0)t + v3 (1, 0, 1)t .

Therefore, the vectors (−1, 1, 0)t and (1, 0, 1)t form a basis for this eigenspace.

{c10.2.3}
4. Consider the matrix
 
−1 1 1
A= 1 −1 1 .
1 1 −1

(a) Verify that the characteristic polynomial of A is pA (λ) = −(λ − 1)(λ + 2)².

(b) Show that (1, 1, 1) is an eigenvector of A corresponding to λ = 1.

(c) Show that (1, 1, 1) is orthogonal to every eigenvector of A corresponding to the eigenvalue
λ = −2.

465
§7.2 Eigenvalues

(a) Find the characteristic polynomial by computing
pA (λ) = det(A − λI3 )
       = (−1 − λ) det [ −1−λ 1 ; 1 −1−λ ] − det [ 1 1 ; 1 −1−λ ] + det [ 1 −1−λ ; 1 1 ]
       = (−1 − λ)(λ² + 2λ) − (−2 − λ) + (2 + λ)
       = −(λ³ + 3λ² − 4)
       = −(λ − 1)(λ + 2)².

(b) Verify by computation that
A (1, 1, 1)t = (−1 + 1 + 1, 1 − 1 + 1, 1 + 1 − 1)t = (1, 1, 1)t ,
so (1, 1, 1)t is an eigenvector of A with eigenvalue λ = 1.

(c) Find the space of eigenvectors v = (v1 , v2 , v3 ) corresponding to λ = −2 by solving (A−λI3 )v = 0


for λ = −2. That is, solve
(A + 2I3 )v = [ 1 1 1 ; 1 1 1 ; 1 1 1 ] v = 0
to obtain
v = v2 (−1, 1, 0)t + v3 (−1, 0, 1)t .
Then compute
(1, 1, 1)t · (−1, 1, 0)t = 0 and (1, 1, 1)t · (−1, 0, 1)t = 0.
Since (−1, 1, 0)t and (−1, 0, 1)t form a basis for the space of eigenvectors of A corresponding to
λ = −2 and since (1, 1, 1)t is orthogonal to these vectors, (1, 1, 1)t is orthogonal to every eigenvector
of A corresponding to λ = −2.

{c10.2.25}
5. Let  
0 −3 −2
A= 1 −4 −2 
−3 4 1
One of the eigenvalues of A is −1. Find the other eigenvalues of A.

466
§7.2 Eigenvalues


Answer: The other two eigenvalues of A are −1 ± √2.
Solution: The characteristic polynomial of A is
det(A − λI3 ) = det [ −λ −3 −2 ; 1 −4−λ −2 ; −3 4 1−λ ] = −λ³ − 3λ² − λ + 1.
Since −1 is an eigenvalue of A, λ + 1 is a factor of the characteristic polynomial. Solving the
equation
−λ³ − 3λ² − λ + 1 = −(λ + 1)(λ² + aλ + b) = −λ³ − (a + 1)λ² − (a + b)λ − b
for a and b gives a = 2 and b = −1, so
−λ³ − 3λ² − λ + 1 = −(λ + 1)(λ² + 2λ − 1).

Using the quadratic formula, the roots of λ² + 2λ − 1 are −1 ± √2. Therefore, the eigenvalues are
−1, −1 + √2, and −1 − √2.

{c10.2.4}
 
8 5
6. Consider the matrix A = .
−10 −7

(a) Find the eigenvalues and eigenvectors of A.


(b) Show that the eigenvectors found in (a) form a basis for R2 .
(c) Find the coordinates of the vector (x1 , x2 ) relative to the basis in part (b).

(a) Answer: The eigenvalues of A are λ1 = 3 and λ2 = −2, with corresponding eigenvectors
v1 = (1, −1)t and v2 = (1, −2)t , respectively.
Solution: The characteristic polynomial is pA (λ) = λ2 − λ − 6 = (λ − 3)(λ + 2). Then, solve
Av = λv for each eigenvalue to find the corresponding eigenvectors.
(b) Two linearly independent vectors in R2 form a basis for R2 . Note that v1 ≠ αv2 for any scalar
α. Therefore, v1 and v2 form a basis for R2 .
(c) Answer: The coordinates of (x1 , x2 ) in the basis {v1 , v2 } are (2x1 + x2 , −x1 − x2 ).
Solution: Find α1 and α2 such that α1 v1 + α2 v2 = (x1 , x2 )t . That is, solve:
    
1 1 α1 x1
=
−1 −2 α2 x2

to obtain α1 = 2x1 + x2 and α2 = −x1 − x2 .

467
§7.2 Eigenvalues

{c10.2.5}
7. Find the characteristic polynomial and the eigenvalues of
 
−1 2 2
A= 2 2 2 .
−3 −6 −6

Find eigenvectors corresponding to each of the three eigenvalues.


Answer: The characteristic polynomial of A is pA (λ) = −(λ3 + 5λ2 + 6λ) = −λ(λ + 2)(λ + 3). The
eigenvalues are λ1 = 0, λ2 = −2, and λ3 = −3, with eigenvectors v1 = (0, 1, −1)t , v2 = (2, −1, 0)t ,
and v3 = (1, 0, −1)t , respectively.
Solution: The eigenvalues are the roots of the characteristic polynomial pA (λ) = det(A − λI3 ).
The eigenvectors are vectors v such that Av = λv, where λ is an eigenvalue of A. Find them by
solving the system (A − λI3 )v = 0.

{c10.2.6}
8. Let A be an n × n matrix. Suppose that

A2 + A + In = 0.

Prove that A is invertible.


We are given A2 + A + In = 0. Therefore, In = −A2 − A = A(−A − In ). Thus, A−1 = −A − In
exists.

{c10.2.6B}
9. Let A be an n × n matrix. Explain why the eigenvalues of A and At are identical.
Solution: The eigenvalues of At are the roots of the characteristic polynomial

det(At − λIn ) = det((A − λIn )t ) = det(A − λIn ),

which is equal to the characteristic polynomial of the matrix A.
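Although the argument above is complete, the conclusion is easy to illustrate numerically in MATLAB; the matrix below is an arbitrary choice used only for illustration.

A = magic(4);                        % any square matrix will do
sort(eig(A)) - sort(eig(A.'))        % zero up to round-off: A and A^t have the same eigenvalues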

In Exercises 10 – 12 decide whether the given statements are true or false. If the statements are
false, give a counterexample; if the statements are true, give a proof.
{c10.2.7a}
10. If the eigenvalues of a 2 × 2 matrix are equal to 1, then the four entries of that matrix are each
less than 500.
Answer: The statement is false.
 
1 500
Solution: A counterexample is the matrix A = .
0 1

468
§7.2 Eigenvalues

{c10.2.7ab}
11. If A is a 4 × 4 matrix and det(A) > 0, then det(−A) > 0.
Answer: The statement is true.

{c10.2.7b} Solution: Since A is a 4 × 4 matrix, det(−A) = (−1)4 det(A) = det(A) > 0.


12. The trace of the product of two n × n matrices is the product of the traces.
Answer: The statement is false.
Solution: For example, let
   
1 −1 1 −1
A= and B= .
0 1 2 0
Then tr(A)tr(B) = 2(1) = 2, and tr(AB) = −1.
{c10.2.8}
13. When n is odd show that every real n × n matrix has a real eigenvalue.
By Theorem 7.2.4, every n × n matrix has exactly n eigenvalues, which are either real or occur in
complex conjugate pairs. Since complex eigenvalues come in pairs, the number of complex eigenvalues must be
even. Since n is odd, there can be no more than n − 1 complex eigenvalues; so the matrix has at
least one real eigenvalue.

In Exercises 14 – 15, use MATLAB to compute (a) the eigenvalues, traces, and characteristic
polynomials of the given matrix. (b) Use the results from part (a) to confirm Theorems 7.2.7 and
7.2.9.

{c10.2.9a} 14. (matlab)  


−12 −19 −3 14 0
 −12 10 14 −19 8 
{find-eigenvalues} (7.2.5*)
 
A=
 4 −2 1 7 −3 .

 −9 17 −12 −5 −8 
−12 −1 7 13 −12

(a) By calculation in MATLAB using the eig, trace, and poly commands, the eigenvalues of A
are
λ = −0.5861 ± 20.2517i, λ = −12.9416, λ = −9.1033, and λ = 5.2171.
The trace of A is −18. Typing poly(A) gives the coefficients of
(−1)⁵ pA (λ) = λ⁵ + 18λ⁴ + 433λ³ + 6296λ² + 429λ − 252292.
Note that in order to obtain an accurate value for the characteristic polynomial, it may be necessary
to use the format command.
(b) Theorem 7.2.7 states that the eigenvalues of A−1 are the inverses of the eigenvalues of A. In
MATLAB , compute

469
§7.2 Eigenvalues

eig(inv(A)) =
-0.1098
-0.0773
-0.0014 + 0.0493i
-0.0014 - 0.0493i
0.1917

Then, compute the inverse of each eigenvalue of A to find that if λ is an eigenvalue of A, then λ−1
is indeed an eigenvalue of A−1 .
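This comparison can also be made in one step. A minimal sketch, with A entered from (7.2.5*), is:

A = [-12 -19 -3 14 0; -12 10 14 -19 8; 4 -2 1 7 -3; -9 17 -12 -5 -8; -12 -1 7 13 -12];
lambda = eig(A);
[sort(1./lambda) sort(eig(inv(A)))]    % the two columns agree up to round-off (Theorem 7.2.7)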

{c10.2.9b} 15. (matlab)  


−12 −5 13 −6 −5 12

 7 14 6 1 8 18 

−8 14 13 9 2 1
mpute-more-eigenvalues} (7.2.6*)
 
B= .

 2 4 6 −8 −2 15 

 −14 0 −6 14 8 −13 
8 16 −8 3 5 19

(a) The eigenvalues of B are


λ = 32.6273, λ = −12.1564 ± 5.8787i, λ = 18.0009, λ = −3.4878, and λ = 11.1723.
The trace of B is 34. The characteristic polynomial of B is
pB = λ⁶ − 34λ⁵ − 298λ⁴ + 9618λ³ + 86273λ² − 1019656λ − 4172976.

(b) Theorem 7.2.7 states that the eigenvalues of B⁻¹ are the inverses of the eigenvalues of B. In
MATLAB , compute

eig(inv(B)) =
-0.2867
-0.0667 + 0.0322i
-0.0667 - 0.0322i
0.0895
0.0556
0.0306

Then, compute the inverse of each eigenvalue of B to find that if λ is an eigenvalue of B, then λ−1
is indeed an eigenvalue of B −1 .

{c10.2.10} 16. (matlab) Use MATLAB to compute the characteristic polynomial of the following matrix:
 
4 −6 7
A= 2 0 5 
−10 2 5

470
§7.2 Eigenvalues

Denote this polynomial by pA (λ) = −(λ3 + p2 λ2 + p1 λ + p0 ). Then compute the matrix


B = −(A3 + p2 A2 + p1 A + p0 I).
What do you observe? In symbols B = pA (A). Compute the matrix B for examples of other square
matrices A and determine whether or not your observation was an accident.
Answer: The matrix B is the zero matrix.
Solution: First use MATLAB to compute pA (λ) = −(λ³ − 9λ² + 92λ − 348). Then B = pA (A) =
−(A³ − 9A² + 92A − 348I3 ) is the zero matrix. To see why this is true, see the Cayley-Hamilton
Theorem (Theorem 11.3.3).
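One possible way to carry out this computation in MATLAB is sketched below; polyvalm evaluates a polynomial, given by its vector of coefficients, at a square matrix.

A = [4 -6 7; 2 0 5; -10 2 5];
c = poly(A);              % coefficients of lambda^3 - 9*lambda^2 + 92*lambda - 348
B = -polyvalm(c,A)        % B = pA(A); every entry is zero (up to round-off)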

{c8.2.a1} 17. (matlab) Verify (7.2.4) by proving the following. Let Pn be the vector space of polynomials
in λ of degree less than or equal to n.

(a) Prove that dim Pn is n + 1 by showing that {1, λ, . . . , λn } is a basis.


(b) For every λ0 prove that {1, λ − λ0 , . . . , (λ − λ0 )n } is a basis of Pn .
(c) Use (b) to verify (7.2.4).
{c8.2.a2}
18. Let A be an n × n matrix. List as TRUE all of the following that are equivalent to A being
invertible and FALSE otherwise:

(a) dim(range(LA )) = n
(b) A has n distinct real eigenvalues
(c) 0 is not an eigenvalue of A
(d) The system of equations Ax = e1 is consistent
(e) The system of equations Ax = e1 has a unique solution
(f) A is similar to In
(g) det(A) ≠ 0
(h) The rows of A form a basis for Rn

Answer:

(a) TRUE: dim(range(LA )) = n


(b) FALSE: A has n distinct real eigenvalues
(c) TRUE: 0 is not an eigenvalue of A
(d) FALSE: The system of equations Ax = e1 is consistent
(e) TRUE: The system of equations Ax = e1 has a unique solution
(f) FALSE: A is similar to In
(g) TRUE: det(A) ≠ 0
(h) TRUE: The rows of A form a basis for Rn

471
§7.3 Real Diagonalizable Matrices

{S:RDM} 7.3 Real Diagonalizable Matrices


An n × n matrix is real diagonalizable if it is similar to a diagonal matrix. More precisely,
an n × n matrix A is real diagonalizable if there exists an invertible n × n matrix S such
that
D = S −1 AS
is a diagonal matrix. In this section we investigate when a matrix is diagonalizable. In this
discussion we assume that all matrices have real entries.
We begin with the observation that not all matrices are real diagonalizable. We saw in
Example 7.2.2 that the diagonal entries of the diagonal matrix D are the eigenvalues of D.
Theorem 7.2.8 states that similar matrices have the same eigenvalues. Thus if a matrix is
real diagonalizable, then it must have real eigenvalues. It follows, for example, that the 2×2
matrix  
0 −1
1 0
is not real diagonalizable, since its eigenvalues are ±i.
However, even if a matrix A has real eigenvalues, it need not be diagonalizable. For example,
the only matrix similar to the identity matrix In is the identity matrix itself. To verify this
point, calculate
S −1 In S = S −1 S = In .
Suppose that A is a matrix all of whose eigenvalues are equal to 1. If A is similar to a
diagonal matrix D, then D must have all of its eigenvalues equal to 1. Since the identity
matrix is the only diagonal matrix with all eigenvalues equal to 1, D = In . So, if A is similar
to a diagonal matrix, it must itself be the identity matrix. Consider, however, the 2 × 2
matrix  
1 1
A= .
0 1
Since A is triangular, it follows that both eigenvalues of A are equal to 1. Since A is not
the identity matrix, it cannot be diagonalizable. More generally, if N is a nonzero strictly
upper triangular n × n matrix, then the matrix In + N is not diagonalizable.
These examples show that complex eigenvalues are always obstructions to real diagonaliza-
{T:diagsimple} tion and multiple real eigenvalues are sometimes obstructions to diagonalization. Indeed,
Theorem 7.3.1. Let A be an n × n matrix with n distinct real eigenvalues. Then A is real
diagonalizable.

There are two ideas in the proof of Theorem 7.3.1, and they are summarized in the following
lemmas.

472
§7.3 Real Diagonalizable Matrices

{L:simpleeigen}
Lemma 7.3.2. Let λ1 , . . . , λk be distinct real eigenvalues for an n × n matrix A. Let vj be
eigenvectors associated with the eigenvalue λj . Then {v1 , . . . , vk } is a linearly independent
set.

Proof We prove the lemma by using induction on k. When k = 1 the proof is simple,
since v1 ≠ 0. So we can assume that {v1 , . . . , vk−1 } is a linearly independent set.
Let α1 , . . . , αk be scalars such that
{e:linindep} α1 v1 + · · · + αk vk = 0. (7.3.1)
We must show that all αj = 0.
Begin by multiplying both sides of (7.3.1) by A, to obtain:
0 = A(α1 v1 + · · · + αk vk )
{e:linother} = α1 Av1 + · · · + αk Avk (7.3.2)
= α1 λ1 v1 + · · · + αk λk vk .

Now subtract λk times (7.3.1) from (7.3.2), to obtain:


α1 (λ1 − λk )v1 + · · · + αk−1 (λk−1 − λk )vk−1 = 0.
Since {v1 , . . . , vk−1 } is a linearly independent set, it follows that
αj (λj − λk ) = 0,
for j = 1, . . . , k − 1. Since all of the eigenvalues are distinct, λj − λk ≠ 0 and αj = 0 for
j = 1, . . . , k − 1. Substituting this information into (7.3.1) yields αk vk = 0. Since vk ≠ 0,
{L:eigenv-diag} αk is also equal to zero. 
Lemma 7.3.3. Let A be an n × n matrix. Then A is real diagonalizable if and only if A
has n real linearly independent eigenvectors.

Proof Suppose that A has n linearly independent eigenvectors v1 , . . . , vn . Let λ1 , . . . , λn


be the corresponding eigenvalues of A; that is, Avj = λj vj . Let S = (v1 | · · · |vn ) be the n × n
matrix whose columns are the eigenvectors vj . We claim that D = S −1 AS is a diagonal
matrix. Compute
D = S −1 AS = S −1 A(v1 | · · · |vn ) = S −1 (Av1 | · · · |Avn )
= S −1 (λ1 v1 | · · · |λn vn ).

473
§7.3 Real Diagonalizable Matrices

It follows that
D = (λ1 S −1 v1 | · · · |λn S −1 vn ).
Note that
S −1 vj = ej ,
since
Sej = vj .
Therefore,
D = (λ1 e1 | · · · |λn en )
is a diagonal matrix.
Conversely, suppose that A is a real diagonalizable matrix. Then there exists an invertible
matrix S such that D = S −1 AS is diagonal. Let vj = Sej . We claim that {v1 , . . . , vn } is a
linearly independent set of eigenvectors of A.
Since D is diagonal, Dej = λj ej for some real number λj . It follows that
Avj = SDS −1 vj = SDS −1 Sej = SDej = λj Sej = λj vj .
So vj is an eigenvector of A. Since the matrix S is invertible, its columns are linearly
independent. Since the columns of S are vj , the set {v1 , . . . , vn } is a linearly independent
set of eigenvectors of A, as claimed. 

Proof of Theorem 7.3.1 Let λ1 , . . . , λn be the distinct eigenvalues of A and let v1 , . . . , vn be the


corresponding eigenvectors. Lemma 7.3.2 implies that {v1 , . . . , vn } is a linearly independent
set in Rn and therefore a basis. Lemma 7.3.3 implies that A is diagonalizable. 

Diagonalization Using MATLAB Let


 
−6 12 4
{diagonalize-example} A= 8 −21 −8  . (7.3.3*)
−29 72 27
We use MATLAB to answer the questions: Is A real diagonalizable and, if it is, can we find
the matrix S such that S −1 AS is diagonal? We can find the eigenvalues of A by typing
eig(A). MATLAB’s response is:

ans =
-2.0000
-1.0000
3.0000

474
§7.3 Real Diagonalizable Matrices

Since the eigenvalues of A are real and distinct, Theorem 7.3.1 states that A can be diago-
nalized. That is, there is a matrix S such that
 
−1 0 0
S −1 AS =  0 −2 0 
0 0 3

The proof of Lemma 7.3.3 tells us how to find the matrix S. We need to find the eigenvectors
v1 , v2 , v3 associated with the eigenvalues −1, −2, 3, respectively. Then the matrix (v1 |v2 |v3 )
whose columns are the eigenvectors is the matrix S. To verify this construction we first find
the eigenvectors of A by typing

v1 = null(A+eye(3));
v2 = null(A+2*eye(3));
v3 = null(A-3*eye(3));

Now type S = [v1 v2 v3] to obtain

S =
0.8729 0.7071 0
0.4364 0.0000 0.3162
-0.2182 0.7071 -0.9487

Finally, check that S −1 AS is the desired diagonal matrix by typing inv(S)*A*S to obtain

ans =
-1.0000 0.0000 0
0.0000 -2.0000 -0.0000
0.0000 0 3.0000

It is cumbersome to use the null command to find eigenvectors and MATLAB has been
preprogrammed to do these computations automatically. We can use the eig command to
find the eigenvectors and eigenvalues of a matrix A, as follows. Type

[S,D] = eig(A)

and MATLAB responds with

S =

475
§7.3 Real Diagonalizable Matrices

-0.7071 0.8729 -0.0000


-0.0000 0.4364 -0.3162
-0.7071 -0.2182 0.9487

D =
-2.0000 0 0
0 -1.0000 0
0 0 3.0000

The matrix S is the transition matrix whose columns are the eigenvectors of A and the
matrix D is a diagonal matrix whose j th diagonal entry is the eigenvalue of A corresponding
to the eigenvector in the j th column of S.
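A quick numerical check of this factorization, with A, S, and D as above, is:

norm(A*S - S*D)           % essentially zero: the columns of S are eigenvectors of A
norm(inv(S)*A*S - D)      % essentially zero: S^(-1)*A*S = D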

Exercises

{c10.3.1}
 
0 3
1. Let A = .
3 0

(a) Find the eigenvalues and eigenvectors of A.


(b) Find an invertible matrix S such that S −1 AS is a diagonal matrix D. What is D?

(a) The eigenvalues of A are λ1 = 3 and λ2 = −3, with corresponding eigenvectors v1 = (1, 1)t and
v2 = (1, −1)t , respectively.
(b) Let  
1 1
S = (v1 |v2 ) = .
1 −1
 
3 0
Then D = S −1 AS = is a diagonal matrix.
0 −3

{c10.3.2}
2. The eigenvalues of  
−1 2 −1
A= 3 0 1 
−3 −2 −3
are 2, −2, −4. Find the eigenvectors of A for each of these eigenvalues and find a 3 × 3 invertible
matrix S so that S −1 AS is diagonal.
The eigenvectors of A are v1 = (1, 1, −1)t associated to eigenvalue λ1 = 2; v2 = (1, −1, −1)t
associated to eigenvalue λ2 = −2; and v3 = (1, −1, 1)t associated to eigenvalue λ3 = −4. Find

476
§7.3 Real Diagonalizable Matrices

these vectors by solving (A − λI3 )v = 0 for each eigenvalue λ. The matrix S such that S −1 AS is
diagonal is  
1 1 1
S = (v1 |v2 |v3 ) =  1 −1 −1  .
−1 −1 1
{c10.3.3}
3. Let  
−1 4 −2
A= 0 3 −2  .
0 4 −3
Find the eigenvalues and eigenvectors of A, and find an invertible matrix S so that S −1 AS is
diagonal.
The eigenvalues of A are λ1 = 1 and λ2 = −1. The eigenvector associated to λ1 is v1 = (1, 1, 1)t .
There are two eigenvectors associated to λ2 : v2 = (1, 0, 0)t and v3 = (0, 1, 2)t .
 
1 1 0
S = (v1 |v2 |v3 ) =  1 0 1  .
1 0 2

{c10.3.4}
4. Let A and B be similar n × n matrices.

(a) Show that if A is invertible, then B is invertible.


(b) Show that A + A−1 is similar to B + B −1 .

(a) Let B = P −1 AP be a matrix similar to some invertible matrix A. Then


B −1 = (P −1 AP )−1 = P −1 A−1 (P −1 )−1 = P −1 A−1 P.
Since A−1 exists, B −1 exists also.
(b) If B = P −1 AP , then B −1 = (P −1 AP )−1 = P −1 A−1 P . Therefore,
B + B −1 = P −1 AP + P −1 A−1 P = P −1 (A + A−1 )P
since matrix multiplication is associative. Therefore, A + A−1 is similar to B + B −1 .
{c10.3.5}
5. Let A and B be n × n matrices. Suppose that A is real diagonalizable and that B is similar to
A. Show that B is real diagonalizable.
Let A = P −1 BP for some invertible matrix P , and let D = S −1 AS, where D is a diagonal matrix.
Then
D = S −1 AS = S −1 (P −1 BP )S = (S −1 P −1 )B(P S) = (P S)−1 B(P S).
Therefore, B is also similar to D, so B is real diagonalizable.

477
§7.3 Real Diagonalizable Matrices

{c10.3.6}
6. Let A be an n × n real diagonalizable matrix. Show that A + αIn is also real diagonalizable.
Let S be a matrix such that D = S −1 AS is a diagonal matrix. Then

S −1 (A + αIn )S = S −1 AS + S −1 (αIn )S = D + αIn .

The matrices D and αIn are both diagonal; so D + αIn is also diagonal. Therefore, A + αIn is
diagonalizable.

{c10.3.6A}
7. Let A be an n × n matrix with a real eigenvalue λ and associated eigenvector v. Assume that
all other eigenvalues of A are different from λ. Let B be an n × n matrix that commutes with A;
that is, AB = BA. Show that v is also an eigenvector for B.
We assume that Av = λv and that AB = BA. It follows that ABv = BAv = λBv. Therefore Bv
is an eigenvector of A with eigenvalue λ. Since λ has only one independent eigenvector, it follows
that Bv is a multiple of v; that is, there is a scalar µ such that Bv = µv.

{c10.3.6B}
8. Let A be an n × n matrix with distinct real eigenvalues and let B be an n × n matrix that
commutes with A. Using the result of Exercise 7, show that there is a matrix S that simultaneously
diagonalizes A and B; that is, S −1 AS and S −1 BS are both diagonal matrices.
Suppose that BA = AB and that the eigenvalues of A are distinct. By Theorem 7.3.1 and
Lemma 7.3.3, there is a basis v1 , . . . , vn of Rn consisting of eigenvectors of A. By Exercise 7,
these vectors are also eigenvectors of B. Let S = (v1 | · · · |vn ). Then both matrices S −1 AS and
S −1 BS are diagonal matrices.

{c10.3.6C}
9. Let A be an n × n matrix all of whose eigenvalues equal ±1. Show that if A is diagonalizable,
then A2 = In .
Since A is diagonalizable, there is an invertible matrix S such that S −1 AS is diagonal. The diagonal
entries of S −1 AS are the eigenvalues of A; that is, the diagonal entries equal ±1. Therefore,
(S −1 AS)2 = In . But (S −1 AS)2 = S −1 A2 S. Therefore, S −1 A2 S = In which implies that A2 = In .

{c10.3.6D}
10. Let A be an n×n matrix all of whose eigenvalues equal 0 and 1. Show that if A is diagonalizable,
then A2 = A.
Since A is diagonalizable, there is an invertible matrix S such that S −1 AS is diagonal. The diagonal
entries of S −1 AS are the eigenvalues of A; that is, the diagonal entries equal 0 and 1. Therefore,
(S −1 AS)2 = S −1 AS. But (S −1 AS)2 = S −1 A2 S. Therefore, S −1 A2 S = S −1 AS which implies that
A2 = A.

478
§7.3 Real Diagonalizable Matrices

{c10.3.7} 11. (matlab) Consider the 4 × 4 matrix


 
12 48 68 88
 −19 −54 −57 −68 
-by-four-diagonalization} C=  22
. (7.3.4*)
52 66 96 
−11 −26 −41 −64

Use MATLAB to show that the eigenvalues of C are real and distinct. Find a matrix S so that
S −1 CS is diagonal.
Verify that the eigenvalues of C are real and distinct using the MATLAB command eig(C), which
yields:

ans =
-4.0000
-12.0000
-8.0000
-16.0000

In order to find the matrix S, either use the null command to find the eigenvectors of C individually,
or type [S,D] = eig(C) to obtain the matrix S and the diagonal matrix D = S −1 CS:

S =
0.5314 -0.5547 0.0000 0.4082
-0.4871 0.5547 -0.4082 -0.8165
0.6199 -0.5547 0.8165 0.4082
-0.3100 0.2774 -0.4082 0.0000

D =
-4.0000 0 0 0
0 -12.0000 0 0
0 0 -8.0000 0
0 0 0 -16.0000

In Exercises 12 – 13 use MATLAB to decide whether or not the given matrix is real diagonalizable.

{c10.3.8a} 12. (matlab)  


−2.2 4.1 −1.5 −0.2
 −3.4 4.8 −1.0 0.2 
diagonalization-exercise} A=
 . (7.3.5*)
−1.0 0.4 1.9 0.2 
−14.5 17.8 −6.7 0.6

Answer: Matrix A is real diagonalizable.

479
§7.3 Real Diagonalizable Matrices

Solution: Compute the eigenvalues of A using MATLAB. By Theorem 7.3.1, a matrix is real
diagonalizable if it has real distinct eigenvalues. Thus, A is real diagonalizable.

{c10.3.8b} 13. (matlab)  


1.9 2.2 1.5 −1.6 −2.8
 0.8 2.6 1.5 −1.8 −2.0 
agonalization-exercise-2} (7.3.6*)
 
B=
 2.6 2.8 1.6 −2.1 −3.8 .

 4.8 3.6 1.5 −3.1 −5.2 
−2.1 1.2 1.7 −0.2 0.0

Answer: Matrix B is not real diagonalizable.


Solution: Compute the eigenvalues of B using MATLAB. If a matrix is real diagonalizable, it has
real eigenvalues. Matrix B has complex eigenvalues, and is therefore not real diagonalizable.
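The eigenvalue test used in these two exercises can be made concrete with a few MATLAB commands; the sketch below is for the matrix A in (7.3.5*), and the tolerance 1e-8 is an arbitrary choice.

A = [-2.2 4.1 -1.5 -0.2; -3.4 4.8 -1.0 0.2; -1.0 0.4 1.9 0.2; -14.5 17.8 -6.7 0.6];
lambda = eig(A)                       % four real, distinct numbers
max(abs(imag(lambda))) < 1e-8         % 1 (true): the eigenvalues are real

Applying the same two commands to the matrix B in (7.3.6*) produces eigenvalues with nonzero imaginary parts, so B fails the test.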

480
§7.4 *Existence of Determinants

{A:det} 7.4 *Existence of Determinants


The purpose of this appendix is to verify the inductive definition of determinant (7.1.9). We
have already shown that if a determinant function exists, then it is unique. We also know
that the determinant function exists for 1 × 1 matrices. So we assume by induction that
the determinant function exists for (n − 1) × (n − 1) matrices and prove that the inductive
definition gives a determinant function for n × n matrices.
Recall that Aij is the cofactor matrix obtained from A by deleting the ith row and j th
column — so Aij is an (n − 1) × (n − 1) matrix. The inductive definition is:

D(A) = a11 det(A11 ) − a12 det(A12 ) + · · · + (−1)n+1 a1n det(A1n ).

We use the notation D(A) to remind us that we have not yet verified that this definition
satisfies properties (a)-(c) of Definition 7.1.1. In this appendix we verify these properties
after assuming that the inductive definition satisfies properties (a)-(c) for (n − 1) × (n − 1)
matrices. For emphasis, we use the notation det to indicate the determinant of square
matrices of size less than n. Note that Lemma 7.1.5, the computation of determinants of
elementary row operations, can therefore be assumed valid for (n − 1) × (n − 1) matrices.
We begin with the following two lemmas.
{L:two_equal}
Lemma 7.4.1. Let C be an n × n matrix. If two rows of C are equal or one row of C is
zero, then D(C) = 0.

Proof Suppose that row i of C is zero. If i > 1, then each cofactor has a zero row and by
induction the determinant of the cofactor is 0. If row 1 is zero, then the cofactor expansion
is 0 and D(C) = 0.
Suppose that row i and row j are equal, where i > 1 and j > 1. Then the result follows by
the induction hypothesis, since each of the cofactors has two equal rows. So, we can assume
that row 1 and row j are equal. If j > 2, let Ĉ be obtained from C by swapping rows j and
2. The cofactors Ĉ1k are then obtained from the cofactors C1k by swapping rows j −1 and 1.
The induction hypothesis then implies that det(Ĉ1k ) = − det(C1k ) and det(Ĉ) = − det(C).
Thus, verifying that det(C) = 0 reduces to verifying the result when rows 1 and 2 are equal.
Indeed, the most difficult part of this proof is the calculation that shows that if the 1st and
2nd rows of C are equal, then D(C) = 0. This calculation is tedious and requires some
facility with indexes and summations. Rather than prove this for general n, we present the
proof for n = 4. This case contains all of the ideas of the general proof.

481
§7.4 *Existence of Determinants

We can assume that
C = [ a1 a2 a3 a4 ; a1 a2 a3 a4 ; c31 c32 c33 c34 ; c41 c42 c43 c44 ].

Using the cofactor definition,
D(C) = a1 det [ a2 a3 a4 ; c32 c33 c34 ; c42 c43 c44 ] − a2 det [ a1 a3 a4 ; c31 c33 c34 ; c41 c43 c44 ]
     + a3 det [ a1 a2 a4 ; c31 c32 c34 ; c41 c42 c44 ] − a4 det [ a1 a2 a3 ; c31 c32 c33 ; c41 c42 c43 ].

Next we expand each of the four 3 × 3 matrices along their 1st rows, obtaining D(C) =
   a1 ( a2 det [ c33 c34 ; c43 c44 ] − a3 det [ c32 c34 ; c42 c44 ] + a4 det [ c32 c33 ; c42 c43 ] )
 − a2 ( a1 det [ c33 c34 ; c43 c44 ] − a3 det [ c31 c34 ; c41 c44 ] + a4 det [ c31 c33 ; c41 c43 ] )
 + a3 ( a1 det [ c32 c34 ; c42 c44 ] − a2 det [ c31 c34 ; c41 c44 ] + a4 det [ c31 c32 ; c41 c42 ] )
 − a4 ( a1 det [ c32 c33 ; c42 c43 ] − a2 det [ c31 c33 ; c41 c43 ] + a3 det [ c31 c32 ; c41 c42 ] )

Combining the 2 × 2 determinants leads to D(C) = 0. 


{L:EB}
Lemma 7.4.2. Let E be an n × n elementary row matrix and let B be an n × n matrix.
Then
{e:proddetE} D(EB) = D(E)D(B) (7.4.1)

Proof We recall that the three elementary row operations are generated by two: (I)
multiply row i by a nonzero scalar c and (II) add row i to row j. The remaining elementary
row operations are obtained as follows. Adding c times row i to row j is the composition
of multiplying row i by c, adding row i to row j, and multiplying row i by 1/c. For
2 × 2 matrices swapping rows 1 and 2 was written in terms of four other elementary row
operations in (7.1.4). This observation works in general, as follows. Consider the sequence
of row operations:

• add row j to row i

482
§7.4 *Existence of Determinants

• multiply row j by −1

• add row i to row j

• subtract row j from row i

We can write swapping rows i and j schematically as:
(ri , rj ) → (ri + rj , rj ) → (ri + rj , −rj ) → (ri + rj , ri ) → (rj , ri ).

Thus, we need to verify (7.4.1) for two types of elementary row operation: multiply row i
by c ≠ 0 and add row j to row i.
(I) Suppose that E multiplies the ith row by a nonzero scalar c. If i > 1, then the cofactor
matrix (EA)1j is obtained from the cofactor matrix A1j by multiplying the (i − 1)st row
by c. By induction, det(EA)1j = c det(A1j ) and D(EA) = cD(A). On the other hand,
D(E) = det(E11 ) = c. So (7.4.1) is verified in this instance. If i = 1, then the 1st row of
EA is (ca11 , . . . , ca1n ) from which it is easy to use the cofactor formula to verify (7.4.1).
(II) Next suppose that E adds row i to row j. If i, j > 1, then the result follows from the
induction hypothesis since the new cofactors are obtained from the old cofactors by adding
row i − 1 to row j − 1.
If j = 1, then

D(EB) = (b11 + bi1 ) det(B11 ) + · · · + (−1)ⁿ⁺¹ (b1n + bin ) det(B1n )
      = b11 det(B11 ) + · · · + (−1)ⁿ⁺¹ b1n det(B1n ) +
        ( bi1 det(B11 ) + · · · + (−1)ⁿ⁺¹ bin det(B1n ) )
      = D(B) + D(C)

where the 1st and ith rows of C are equal. The fact that D(C) = 0 follows from Lemma 7.4.1.
If i = 1, then the cofactors are unchanged. It follows by direct calculation of the cofactor
expansion that D(EB) = D(B) + D(C) where the 1st and ith rows of C are equal. Again,
the fact that D(C) = 0 follows from Lemma 7.4.1. 

Property (a) is verified for D(A) using cofactors since if A is lower triangular, then

D(A) = a11 det(A11 )

483
§7.4 *Existence of Determinants

and
det(A11 ) = a22 · · · ann
by the induction hypothesis.
Property (c) (D(AB) = D(A)D(B)) is proved separately for A singular and A nonsingular.
In either case, row reduction implies that A = Es · · · E1 R where R is in reduced echelon
form.
If A is singular, then the bottom row of R is zero and together Lemmas 7.4.1 and 7.4.2 imply
that D(A) = 0. On the other hand these lemmas also imply that

D(AB) = D(Es · · · E1 RB) = D(Es · · · E1 )D(RB)

and direct calculation shows that the bottom row of RB is also zero. Hence D(RB) = 0
and property (c) is valid.
Next suppose now that A is nonsingular. It follows that

AB = Es · · · E1 B.

Using (7.4.1) we see that

D(AB) = D(Es ) · · · D(E1 )D(B) = D(Es · · · E1 )D(B) = D(A)D(B),

as desired.
Before verifying property (b) we prove the following:
Lemma 7.4.3. Let E be an elementary row operation matrix. Then D(E t ) = D(E). An
n × n matrix A is singular if and only if At is singular.

Proof The two generators of elementary row operations are: multiply row i by c and add
row i to row j. The first matrix is diagonal; so E t = E. Denote the second matrix by Fij . It
follows that Fijt = Fji . We claim that D(Fij ) = 1 for all i, j and hence that D(E t ) = D(E)
for all E. If i < j, then Fij is lower triangular with 1s on the diagonal. Hence D(Fij ) = 1.
If 1 < j < i, then D(Fijt ) = D(Fji ) = 1 by induction. If j = 1, then D(Fi1 t
) = 1 by direct
calculation.
If A is singular, then A = Es · · · E1 R, where R is in reduced echelon form and its bottom
row is zero. Hence R is singular. It follows that D(A) = 0. Note that

At = Rt E1t · · · Est

Here we use the fact that (BC)t = C t B t that was discussed in (3.6.1). By counting pivots
in R, we see that the column space and the row space of R have the same dimensions.

484
§7.4 *Existence of Determinants

Therefore, the dimension of the row space of Rt equals the dimension of the column space
of Rt equals the dimension of the row space of R, and all of these are less than n. Hence
Rt is singular. Therefore, there exists a nonzero n-vector w such that Rt w = 0. It follows
that v = (Est )−1 · · · (E1t )−1 w satisfies At v = Rt w = 0 and At is singular. 

Property (b) We prove D(At ) = D(A) in two steps. Write

{e:Adecomp} A = Es · · · E1 R, (7.4.2)

where the Ej are elementary row matrices and R is in reduced echelon form. It follows that

{e:Atdecomp} At = Rt E1t · · · Est . (7.4.3)

If A is invertible, then R = In and D(At ) = D(A). If A is singular, then At is also singular


and D(A) = 0 = D(At ).
We have now completed the proof that a determinant function exists.

485
Chapter 8 Linear Maps and Changes of Coordinates

8 Linear Maps and Changes of Coordinates


The first section in this chapter, Section 8.1, defines linear mappings between abstract vector
spaces, shows how such mappings are determined by their values on a basis, and derives
basic properties of invertible linear mappings.
The notions of row rank and column rank of a matrix are discussed in Section 8.2 along
with the theorem that states that these numbers are equal to the rank of that matrix.
Section 8.3 discusses the underlying meaning of similarity — the different ways to view
the same linear mapping on Rn in different coordinates systems or bases. This discussion
makes sense only after the definitions of coordinates corresponding to bases and of changes
in coordinates are given and justified. In Section 8.4, we discuss the matrix associated to a
linear transformation between two finite dimensional vector spaces in a given set of coor-
dinates and show that changes in coordinates correspond to similarity of the corresponding
matrices.

486
§8.1 Linear Mappings and Bases

{C:LMCC}

{Sect:linmap} 8.1 Linear Mappings and Bases


The examples of linear mappings from Rn → Rm that we introduced in Section 3.3 were
matrix mappings. More precisely, let A be an m × n matrix. Then

LA (x) = Ax

defines the linear mapping LA : Rn → Rm . Recall that Aej is the j th column of A (see Chap-
ter 3, Lemma 3.3.4); it follows that A can be reconstructed from the vectors Ae1 , . . . , Aen .
This remark implies (Chapter 3, Lemma 3.3.3) that linear mappings of Rn to Rm are de-
termined by their values on the standard basis e1 , . . . , en . Next we show that this result
is valid in greater generality. We begin by defining what we mean for a mapping between
{D:linearV} vector spaces to be linear.
Definition 8.1.1. Let V and W be vector spaces and let L : V → W be a mapping. The
map L is linear if

L(u + v) = L(u) + L(v)


L(cv) = cL(v)

for all u, v ∈ V and c ∈ R.

Examples of Linear Mappings (a) Let v ∈ Rn be a fixed vector. Use the dot product to
define the mapping L : Rn → R by

L(x) = x · v.

Then L is linear. Just check that

L(x + y) = (x + y) · v = x · v + y · v = L(x) + L(y)

for every vector x and y in Rn and

L(cx) = (cx) · v = c(x · v) = cL(x)

for every scalar c ∈ R.


(b) The map L : C 1 → R defined by

L(f ) = f 0 (2)

487
§8.1 Linear Mappings and Bases

is linear. Indeed,

L(f + g) = (f + g)0 (2) = f 0 (2) + g 0 (2) = L(f ) + L(g).

Similarly, L(cf ) = cL(f ).


(c) The map L : C 1 → C 1 defined by

L(f )(t) = f (t − 1)

is linear. Indeed,

L(f + g)(t) = (f + g)(t − 1) = f (t − 1) + g(t − 1) = L(f )(t) + L(g)(t).

Similarly, L(cf ) = cL(f ). It may be helpful to compute L(f )(t) when f (t) = t2 − t + 1.
That is,

L(f )(t) = (t − 1)2 − (t − 1) + 1 = t2 − 2t + 1 − t + 1 + 1 = t2 − 3t + 3.

{L:linmapfrombasis} Constructing Linear Mappings from Bases


Theorem 8.1.2. Let V and W be vector spaces. Let {v1 , . . . , vn } be a basis for V and let
{w1 , . . . , wn } be n vectors in W . Then there exists a unique linear map L : V → W such
that L(vi ) = wi .

Proof Let v ∈ V be a vector. Since span{v1 , . . . , vn } = V , we may write v as

v = α1 v1 + · · · + αn vn ,

where α1 , . . . , αn are in R. Moreover, since v1 , . . . , vn are linearly independent, these scalars are


uniquely defined. More precisely, if

α1 v1 + · · · + αn vn = β1 v1 + · · · + βn vn ,

then
(α1 − β1 )v1 + · · · + (αn − βn )vn = 0.
Linear independence implies that αj − βj = 0; that is αj = βj . We can now define

{e:v-coord} L(v) = α1 w1 + · · · + αn wn . (8.1.1)

We claim that L is linear. Let v̂ ∈ V be another vector and let

v̂ = β1 v1 + · · · + βn vn .

488
§8.1 Linear Mappings and Bases

It follows that
v + v̂ = (α1 + β1 )v1 + · · · + (αn + βn )vn ,
and hence by (8.1.1) that

L(v + v̂) = (α1 + β1 )w1 + · · · + (αn + βn )wn


= (α1 w1 + · · · + αn wn ) + (β1 w1 + · · · + βn wn )
= L(v) + L(v̂).

Similarly

L(cv) = L((cα1 )v1 + · · · + (cαn )vn )


= c(α1 w1 + · · · + αn wn )
= cL(v).

Thus L is linear.
Let M : V → W be another linear mapping such that M (vi ) = wi . Then

L(v) = L(α1 v1 + . . . + αn vn )
= α1 w1 + · · · + αn wn
= α1 M (v1 ) + · · · + αn M (vn )
= M (α1 v1 + · · · + αn vn )
= M (v).

Thus L = M and the linear mapping is uniquely defined. 

There are two assertions made in Theorem 8.1.2. The first is that a linear map exists
mapping vi to wi . The second is that there is only one linear mapping that accomplishes
this task. If we drop the constraint that the map be linear, then many mappings may satisfy
these conditions. For example, find a linear map from R → R that maps 1 to 4. There is
only one: y = 4x. However there are many nonlinear maps that send 1 to 4. Examples are
y = x + 3 and y = 4x2 .

Finding the Matrix of a Linear Map from Rn → Rm Given by Theorem 8.1.2 Suppose that
V = Rn and W = Rm . We know that every linear map L : Rn → Rm can be defined as
multiplication by an m × n matrix. The question that we next address is: How can we find
the matrix whose existence is guaranteed by Theorem 8.1.2?

489
§8.1 Linear Mappings and Bases

More precisely, let v1 , . . . , vn be a basis for Rn and let w1 , . . . , wn be vectors in Rm . We


suppose that all of these vectors are row vectors. Then we need to find an m × n matrix A
such that Avit = wit for all i. We find A as follows. Let v ∈ Rn be a row vector. Since the
vi form a basis, there exist scalars αi such that

v = α1 v1 + · · · + αn vn .

In coordinates
{e:v^t} v t = (v1t | · · · |vnt ) (α1 , . . . , αn )t , (8.1.2)

where (v1t | · · · |vnt ) is an n × n invertible matrix. By definition (see (8.1.1))

L(v) = α1 w1 + · · · + αn wn .

Thus the matrix A must satisfy


 
α1
Av t = (w1t | · · · |wnt )  ...  ,
 

αn

where (w1t | · · · |wnt ) is an m × n matrix. Using (8.1.2) we see that

Av t = (w1t | · · · |wnt )(v1t | · · · |vnt )−1 v t ,

and
{e:defA} A = (w1t | · · · |wnt )(v1t | · · · |vnt )−1 (8.1.3)
is the desired m × n matrix.

An Example of a Linear Map from R3 to R2 As an example we illustrate Theorem 8.1.2 and


(8.1.3) by defining a linear mapping from R3 to R2 by its action on a basis. Let

v1 = (1, 4, 1) v2 = (−1, 1, 1) v3 = (0, 1, 0).

We claim that {v1 , v2 , v3 } is a basis of R3 and that there is a unique linear map for which
L(vi ) = wi where
w1 = (2, 0) w2 = (1, 1) w3 = (1, −1).

490
§8.1 Linear Mappings and Bases

We can verify that {v1 , v2 , v3 } is a basis of R3 by showing that the matrix
(v1t |v2t |v3t ) = [ 1 −1 0 ; 4 1 1 ; 1 1 0 ]

is invertible. This can either be done in MATLAB using the inv command or by hand by
row reducing the matrix
[ 1 −1 0 | 1 0 0 ; 4 1 1 | 0 1 0 ; 1 1 0 | 0 0 1 ]
to obtain
(v1t |v2t |v3t )−1 = (1/2) [ 1 0 1 ; −1 0 1 ; −3 2 −5 ].
Now apply (8.1.3) to obtain
A = [ 2 1 1 ; 0 1 −1 ] · (1/2) [ 1 0 1 ; −1 0 1 ; −3 2 −5 ] = [ −1 1 −1 ; 1 −1 3 ].

As a check, verify by matrix multiplication that Avi = wi , as claimed.
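The same computation is easy to reproduce in MATLAB; the following is a minimal sketch for the example above, where W/V is shorthand for W*inv(V).

V = [1 -1 0; 4 1 1; 1 1 0];    % columns are v1^t, v2^t, v3^t
W = [2 1 1; 0 1 -1];           % columns are w1^t, w2^t, w3^t
A = W/V                        % the matrix of (8.1.3)
A*V                            % its columns reproduce w1^t, w2^t, w3^t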

Properties of Linear Mappings


{L:compose}
Lemma 8.1.3. Let U, V, W be vector spaces and L : V → W and M : U → V be linear
maps. Then L◦M : U → W is linear.

Proof The proof of Lemma 8.1.3 is identical to that of Chapter 3, Lemma 3.5.1. 

A linear map L : V → W is invertible if there exists a linear map M : W → V such that


L◦M : W → W is the identity map on W and M ◦L : V → V is the identity map on V .
{T:invertbasis}
Theorem 8.1.4. Let V and W be finite dimensional vector spaces and let v1 , . . . , vn be a
basis for V . Let L : V → W be a linear map. Then L is invertible if and only if w1 , . . . , wn
is a basis for W where wj = L(vj ).

491
§8.1 Linear Mappings and Bases

Proof If w1 , . . . , wn is a basis for W , then use Theorem 8.1.2 to define a linear map
M : W → V by M (wj ) = vj . Note that

L◦M (wj ) = L(vj ) = wj .

It follows by linearity (using the uniqueness part of Theorem 8.1.2) that L◦M is the identity
of W . Similarly, M ◦L is the identity map on V , and L is invertible.
Conversely, suppose that L◦M and M ◦L are identity maps and that wj = L(vj ). We must
show that w1 , . . . , wn is a basis. We use Theorem 5.5.3 and verify separately that w1 , . . . , wn
are linearly independent and span W .
If there exist scalars α1 , . . . , αn such that

α1 w1 + · · · + αn wn = 0,

then apply M to both sides of this equation to obtain

0 = M (α1 w1 + · · · + αn wn ) = α1 v1 + · · · + αn vn .

But the vj are linearly independent. Therefore, αj = 0 and the wj are linearly independent.
To show that the wj span W , let w be a vector in W . Since the vj are a basis for V , there
exist scalars β1 , . . . , βn such that

M (w) = β1 v1 + · · · + βn vn .

Applying L to both sides of this equation yields

w = L◦M (w) = β1 w1 + · · · + βn wn .

Therefore, the wj span W . 

Corollary 8.1.5. Let V and W be finite dimensional vector spaces. Then there exists an
invertible linear map L : V → W if and only if dim(V ) = dim(W ).

Proof Suppose that L : V → W is an invertible linear map. Let v1 , . . . , vn be a basis for


V where n = dim(V ). Then Theorem 8.1.4 implies that L(v1 ), . . . , L(vn ) is a basis for W
and dim(W ) = n = dim(V ).
Conversely, suppose that dim(V ) = dim(W ) = n. Let v1 , . . . , vn be a basis for V and let
w1 , . . . , wn be a basis for W . Using Theorem 8.1.2 define the linear map L : V → W by
L(vj ) = wj . Theorem 8.1.4 states that L is invertible. 

492
§8.1 Linear Mappings and Bases

Exercises

{c7.2.1}
1. Use Theorem 8.1.2 and (8.1.3) to construct the matrix of a linear mapping L from R3 to R2 with
L(vi ) = wi , i = 1, 2, 3, where
v1 = (1, 0, 2) v2 = (2, −1, 1) v3 = (−2, 1, 0)
and
w1 = (−1, 0) w2 = (0, 1) w3 = (3, 1).

Solution: Compute A, the matrix of L, using Equation (8.1.3):
A = (w1t |w2t |w3t )(v1t |v2t |v3t )−1 = [ −1 0 3 ; 0 1 1 ] [ 1 2 −2 ; 0 −1 1 ; 2 1 0 ]−1 = [ −7 −11 3 ; −4 −7 2 ].

{c7.2.2}
2. Let Pn be the vector space of polynomials p(t) of degree less than or equal to n. Show that
{1, t, t2 , . . . , tn } is a basis for Pn .
To show that the set {1, t, t2 , . . . , tn } is a basis for Pn , we must show that the n + 1 polynomials are
linearly independent and span Pn . The polynomials are independent because the general polynomial
of degree n:
α1 + α2 t + α3 t2 + · · · + αn+1 tn
is identically 0 for all values of t only when α1 = α2 = · · · = αn+1 = 0. The polynomials span Pn
because every polynomial p(t) of degree n has the form
p(t) = β1 + β2 t + · · · + βn+1 tn
which is a linear combination of the polynomials {1, t, t2 , . . . , tn } for any p(t) in Pn .
{mc.exercise3}
3. Which of the following mappings T are linear? Circle those maps that are linear and cross out
those maps that are not linear.

(a) T : R2 → R3 where T (x1 , x2 ) = (2x1 − x2 , x2 + x1 , x1 + 2)


(b) T : Pk → R2 where T (p(t)) = (p′(1), ∫_2^5 p(t) dt)
(c) Fix B ∈ Mn,n . Then T : Mn,n → Mn,n where T (A) = AB − BA

Recall that Pk is the vector space of polynomials of degree ≤ k and Mn,n is the vector space of
n × n square matrices.
Answer:

493
§8.1 Linear Mappings and Bases

(a) T : R2 → R3 where T (x1 , x2 ) = (2x1 − x2 , x2 + x1 , x1 + 2) is not linear.

(b) T : Pk → R2 where T (p(t)) = (p′(1), ∫_2^5 p(t) dt) is linear.

(c) Fix B ∈ Mn,n . Then T : Mn,n → Mn,n where T (A) = AB − BA is linear.

Solution:

(a) If T is linear, then T maps the zero vector to the zero vector. However, T (0, 0) = (0, 0, 2) ≠
(0, 0, 0); so T is not linear.
(b) Since T (p(t)) = ( (dp/dt)(1), ∫_2^5 p(t) dt ), it suffices to show that
p(t) ↦ (dp/dt)(1) = p′(1) and p(t) ↦ ∫_2^5 p(t) dt
are each linear, and that follows from elementary calculus.


(c) One needs to show that T preserves scalar multiplication and addition. Let c ∈ R and
A, A1 , A2 ∈ Mn,n . Then

T (cA) = (cA)B − B(cA) = c(AB) − c(BA) = c(AB − BA) = cT (A).

T (A1 + A2 ) = (A1 + A2 )B − B(A1 + A2 ) = A1 B + A2 B − BA1 − BA2


= A1 B − BA1 + A2 B − BA2 = T (A1 ) + T (A2 ).

This shows T is linear.


{c7.2.2a}
4. Show that
d/dt : P3 → P2
is a linear mapping.
Let d/dt be the transformation that maps p(t) ↦ (dp/dt)(t). For p(t) = p0 + p1 t + p2 t² + p3 t³ ,
(dp/dt)(t) = p1 + 2p2 t + 3p3 t² , so d/dt is a mapping P3 → P2 . From calculus, we know that, for any
functions f and g,
(d/dt)(f + g)(t) = (df /dt)(t) + (dg/dt)(t),
and that, for any scalar c,
(d/dt)(cf )(t) = c (df /dt)(t).
Let f and g be elements of P3 . Then d/dt : P3 → P2 is a linear mapping.

494
§8.1 Linear Mappings and Bases

{c7.2.2b}
5. Show that
L(p)(t) = ∫_0^t p(s) ds
is a linear mapping of P2 → P3 .
Let p(t) = p1 + p2 t + p3 t² . Then the transformation L maps p(t) ↦ L(p)(t) = p1 t + (1/2)p2 t² + (1/3)p3 t³ ,
so L is indeed a mapping P2 → P3 . We know from calculus that, for any functions f and g,
∫_0^t (f + g)(s) ds = ∫_0^t f (s) ds + ∫_0^t g(s) ds,
and, for any scalar c ∈ R,
∫_0^t (cf )(s) ds = c ∫_0^t f (s) ds.
Let f and g be elements of P2 . Then L is a linear mapping.

{mc.exerciseErr1}
6. Let P 3 ⊂ C 1 be the vector space of polynomials p(t) of degree less than or equal to three; that
is,
P 3 = {a3 t³ + a2 t² + a1 t + a0 : a0 , a1 , a2 , a3 ∈ R}.
Let T : P 3 → R be the function T (p) = (dp/dt)(0), where p ∈ P 3 .
(a) Show that T is linear.
(b) Find a basis for the null space of T .
(c) Let S : P3 → R be the function S(p) = p(0)2 . Show that S is not linear.

Solution:

(a) Differentiation is linear; that is,
(d(f + g)/dt)(t) = (df /dt)(t) + (dg/dt)(t) and (d(cf )/dt)(t) = c (df /dt)(t).
On evaluation at t = 0 we see that
(d(f + g)/dt)(0) = (df /dt)(0) + (dg/dt)(0) and (d(cf )/dt)(0) = c (df /dt)(0)
from which it follows that

T (f + g) = T (f ) + T (g) and T (cf ) = cT (f )

and hence that T is linear.

495
§8.1 Linear Mappings and Bases

(b) Let
p(t) = a0 + a1 t + a2 t² + a3 t³ ∈ P 3 .
Then
(dp/dt)(t) = 3a3 t² + 2a2 t + a1 .
So, (dp/dt)(0) = a1 . Hence, null space(T ) consists of all polynomials in P 3 with a1 = 0, which are
polynomials of the form p(t) = a0 + a2 t² + a3 t³ . By inspection, a basis for this null space is
{1, t² , t³ }.
(c) To show that S is not linear, it suffices to find polynomials p(t), q(t) ∈ P 3 for which S(p(t) +
q(t)) ≠ S(p(t)) + S(q(t)). Take p(t) = 1 and q(t) = 1, the polynomials with constant value 1.
Then
S(p(t) + q(t)) = S(2) = 4,
but
S(p(t)) + S(q(t)) = 1² + 1² = 2,
which furnishes a counter-example to linearity.
{c7.2.2c}
7. Use Exercises 4, 5 and Theorem 8.1.2 to show that
(d/dt) ◦ L : P2 → P2
is the identity map.
Let M = (d/dt) ◦ L be a mapping P2 → P2 . The fundamental theorem of calculus states that, for any
continuous function g,
(d/dt) ∫_0^t g(τ ) dτ = g(t).
Thus M (g) = g is valid for all g, so M is the identity map.
To prove this fact explicitly for this case, note that M is the identity mapping if it maps
every polynomial in P2 to itself. Theorem 8.1.2 states that this mapping is uniquely determined
by its values on a basis of P2 . According to Exercise 2, {1, t, t²} is a basis for P2 . Therefore, M is the identity
map if M (1) = 1, M (t) = t and M (t²) = t²:
(d/dt) ◦ L(1) = (d/dt) ∫_0^t ds = (d/dt)(t) = 1.
(d/dt) ◦ L(t) = (d/dt) ∫_0^t s ds = (d/dt)(t²/2) = t.
(d/dt) ◦ L(t²) = (d/dt) ∫_0^t s² ds = (d/dt)(t³/3) = t².
So (d/dt) ◦ L is indeed the identity map for P2 .

496
§8.1 Linear Mappings and Bases

{mc.exercise11}
8. Let W ⊂ Rn be a k-dimensional subspace where k < n. Define

W ⊥ = {v ∈ Rn : v · w = 0 for all w ∈ W }

(a) Show that W ⊥ is a subspace of Rn .

(b) Find a basis for W ⊥ in the special case that W = span{e1 , e2 , e3 } ⊂ R5 .

Solution:

(a) If v ∈ W ⊥ , then for any scalar c ∈ R, cv ∈ W ⊥ since (cv) · w = c(v · w) = 0 for all w ∈ W . Let


v1 , v2 ∈ W ⊥ . Then (v1 + v2 ) · w = v1 · w + v2 · w = 0 + 0 = 0. So v1 + v2 ∈ W ⊥ . So W ⊥ is a
subspace of Rn .
(b) A good guess would be that W ⊥ has a basis {e4 , e5 }. Since every vector v ∈ R5 is a linear
combination of the ei (i = 1, 2, . . . , 5), v ∈ W ⊥ precisely when v = c4 e4 + c5 e5 for some
c4 , c5 ∈ R. Indeed, if v = c1 e1 + · · · + c5 e5 where the ci ∈ R, then v ∈ W ⊥ forces v · ei = 0 for
i = 1, 2, 3. By inspection, v · ei = ci , so ci = 0. Thus, every vector in W ⊥ must be of the form
c4 e4 + c5 e5 and hence {e4 , e5 } is a basis for W ⊥ .

{c7.2.3}
9. Let C denote the set of complex numbers. Verify that C is a two-dimensional vector space.
Show that L : C → C defined by
L(z) = λz,
where λ = σ + iτ is a linear mapping.
The space C is a two dimensional real vector space since every element of C can be written as
c = σ + τ i, a linear combination of the elements of the linearly independent set {1, i}.
Let z1 and z2 be elements of C. Then,

L(z1 + z2 ) = λ(z1 + z2 ) = λz1 + λz2 = L(z1 ) + L(z2 ).

For any real scalar c,


L(cz1 ) = λ(cz1 ) = cλz1 = cL(z1 ).
Therefore, L(z) = λz is a linear mapping.

{c7.2.4}
10. Let M(n) denote the vector space of n × n matrices and let A be an n × n matrix. Let
L : M(n) → M(n) be the mapping defined by L(X) = AX − XA where X ∈ M(n). Verify that L
is a linear mapping. Show that the null space of L, {X ∈ M : L(X) = 0}, is a subspace consisting
of all matrices that commute with A.
Let X and Y be elements of M(n). Then,

L(X + Y ) = A(X + Y ) − (X + Y )A = (AX − XA) + (AY − Y A) = L(X) + L(Y ).

497
§8.1 Linear Mappings and Bases

For any real scalar c,

L(cX) = A(cX) − (cX)A = c(AX) − c(XA) = c(AX − XA) = cL(X).

Therefore, L is a linear mapping.


The null space of L consists of all matrices X such that AX − XA = 0, or AX = XA. By
definition, these are the matrices such that X commutes with A. Let X and Y be elements of the
null space of L. Then show that X + Y is in the null space by calculating:

L(X + Y ) = A(X + Y ) − (X + Y )A = AX + AY − XA − Y A = AX + AY − AX − AY = 0.

Show that, for any real scalar c, cX is in the null space by calculating:

L(cX) = A(cX) − (cX)A = cAX − cXA = cAX − cAX = 0.

Therefore, the null space of L is a subspace consisting of all matrices that commute with A.

{mc.exercise12}
11. Which of the following are True and which False. Give reasons for your answer.

(a) For any n × n matrix A, det(A) is the product of its n eigenvalues.


(b) Similar matrices always have the same eigenvectors.
(c) For any n × n matrix A and scalar k ∈ R, det(kA) = kⁿ det(A).
(d) There is a linear map L : R3 → R2 such that

L(1, 2, 3) = (0, 1) and L(2, 4, 6) = (1, 1).

(e) The only rank 0 matrix is the zero matrix.

Answer: Statements (a), (c), and (e) are True; statements (b) and (d) are False.

(a) TRUE: For any n × n matrix A, det(A) is the product of its n eigenvalues.
(b) FALSE: Similar matrices always have the same eigenvectors.
(c) TRUE: For any n × n matrix A and scalar k ∈ R, det(kA) = kⁿ det(A).
(d) FALSE: There is a linear map L : R3 → R2 such that L(1, 2, 3) = (0, 1) and L(2, 4, 6) = (1, 1).
(e) TRUE: The only rank 0 matrix is the zero matrix.

Solution:

(a) This follows from Theorem 7.2.4(a).

498
§8.1 Linear Mappings and Bases

(b) Suppose matrices A and B are similar. Then, by definition, there exists an invertible matrix P
such that A = P −1 BP . Let B = [ 0 1 ; 1 0 ] and P = [ 1 1 ; 0 1 ]. Then P −1 = [ 1 −1 ; 0 1 ]
and A = P −1 BP = [ −1 0 ; 1 1 ]. B has eigenvectors (1, 1) and (1, −1). A has eigenvectors
(0, 1) and (2, −1). These are not the same.
(c) Use properties of the determinant to compute

det(kA) = det(kIn A) = det(kIn ) det(A) = kⁿ det A.

(d) Since the vector (2, 4, 6) = 2(1, 2, 3), linearity forces L(2, 4, 6) = 2L(1, 2, 3). But 2L(1, 2, 3) =
(0, 2) ≠ (1, 1). So there exists no such linear map.
(e) The rank of a matrix A is the number of linearly independent rows. Hence, if a row of A is
nonzero, then the rank(A) ≥ 1. Therefore, rank(A) = 0 implies all rows of A equal 0 and
A = 0.

{c7.2.5}
12. Let L : C 1 → R be defined by L(f ) = ∫_0^{2π} f (t) cos(t) dt for f ∈ C 1 . Verify that L is a linear
mapping.
Let f and g be functions in C 1 . Then,
L(f + g) = ∫_0^{2π} (f (t) + g(t)) cos(t) dt
         = ∫_0^{2π} (f (t) cos(t) + g(t) cos(t)) dt
         = ∫_0^{2π} f (t) cos(t) dt + ∫_0^{2π} g(t) cos(t) dt
         = L(f ) + L(g).
For any real scalar c,
L(cf ) = ∫_0^{2π} cf (t) cos(t) dt = c ∫_0^{2π} f (t) cos(t) dt = cL(f ).

So L is a linear mapping.

{c7.2.6}
13. Let P be the vector space of polynomials in one variable x. Define L : P → P by L(p)(x) =
∫_0^x (t − 1)p(t) dt. Verify that L is a linear mapping.

499
§8.1 Linear Mappings and Bases

Let p and q be elements of P. Then,
L(p + q)(x) = ∫_0^x (t − 1)(p(t) + q(t)) dt
            = ∫_0^x ((t − 1)p(t) + (t − 1)q(t)) dt
            = ∫_0^x (t − 1)p(t) dt + ∫_0^x (t − 1)q(t) dt
            = L(p)(x) + L(q)(x).
For any real scalar c,
L(cp)(x) = ∫_0^x (t − 1)cp(t) dt = c ∫_0^x (t − 1)p(t) dt = cL(p)(x).

Therefore, L is a linear mapping.

{A8.1.1}
14. Show that
d²/dt² : P4 → P2
is a linear mapping. Then compute bases for the null space and range of d²/dt² .
Solution: Let L be the transformation that maps p(t) ↦ (d²p/dt²)(t). For
p(t) = p0 + p1 t + p2 t² + p3 t³ + p4 t⁴ ,
we have
L(p)(t) = 2p2 + 6p3 t + 12p4 t² ,
so L is a mapping L : P4 → P2 . From calculus, we know that, for any functions f and g,
(d²/dt²)(f + g)(t) = (d²f /dt²)(t) + (d²g/dt²)(t),
and that, for any scalar c,
(d²/dt²)(cf )(t) = c (d²f /dt²)(t).
It follows that L is a linear mapping. The null space is

{p ∈ P4 : p2 = p3 = p4 = 0}

with basis {1, t} and


range(L) = {2p2 + 6p3 t + 12p4 t2 }
with basis {1, t, t2 }.

500
§8.2 Row Rank Equals Column Rank

{S:5.8} 8.2 Row Rank Equals Column Rank


Let A be an m × n matrix. The row space of A is the span of the row vectors of A and is a
subspace of Rn . The column space of A is the span of the columns of A and is a subspace
of Rm .
Definition 8.2.1. The row rank of A is the dimension of the row space of A and the column
rank of A is the dimension of the column space of A.

Lemma 5.5.4 of Chapter 5 states that

row rank(A) = rank(A).

We show below that row ranks and column ranks are equal. We begin by continuing the
discussion of the previous section on linear maps between vector spaces.

Null Space and Range Each linear map between vector spaces defines two subspaces. Let
V and W be vector spaces and let L : V → W be a linear map. Then

null space(L) = {v ∈ V : L(v) = 0} ⊂ V

and
{L:nsr} range(L) = {L(v) ∈ W : v ∈ V } ⊂ W.
Lemma 8.2.2. Let L : V → W be a linear map between vector spaces. Then the null space
of L is a subspace of V and the range of L is a subspace of W .

Proof The proof that the null space of L is a subspace of V follows from linearity in
precisely the same way that the null space of an m × n matrix is a subspace of Rn . That is,
if v1 and v2 are in the null space of L, then

L(v1 + v2 ) = L(v1 ) + L(v2 ) = 0 + 0 = 0,

and for c ∈ R
L(cv1 ) = cL(v1 ) = c0 = 0.
So the null space of L is closed under addition and scalar multiplication and is a subspace
of V .
To prove that the range of L is a subspace of W , let w1 and w2 be in the range of L. Then,
by definition, there exist v1 and v2 in V such that L(vj ) = wj . It follows that

L(v1 + v2 ) = L(v1 ) + L(v2 ) = w1 + w2 .

501
§8.2 Row Rank Equals Column Rank

Therefore, w1 + w2 is in the range of L. Similarly,

L(cv1 ) = cL(v1 ) = cw1 .

So the range of L is closed under addition and scalar multiplication and is a subspace of
W. 

Suppose that A is an m × n matrix and LA : Rn → Rm is the associated linear map.


Then the null space of LA is precisely the null space of A, as defined in Definition 5.2.1
of Chapter 5. Moreover, the range of LA is the column space of A. To verify this, write
A = (A1 | · · · |An ) where Aj is the j th column of A and let v = (v1 , . . . vn )t . Then, LA (v) is
the linear combination of columns of A

LA (v) = Av = v1 A1 + · · · + vn An .

There is a theorem that relates the dimensions of the null space and range with the dimension
{T:nsr} of V .
Theorem 8.2.3. Let V and W be vector spaces with V finite dimensional and let L : V → W
be a linear map. Then

dim(V ) = dim(null space(L)) + dim(range(L)).

Proof Since V is finite dimensional, the null space of L is finite dimensional (since the
null space is a subspace of V ) and the range of L is finite dimensional (since it is spanned
by the vectors L(vj ) where v1 , . . . , vn is a basis for V ). Let u1 , . . . , uk be a basis for the null
space of L and let w1 , . . . , w` be a basis for the range of L. Choose vectors yj ∈ V such that
L(yj ) = wj . We claim that u1 , . . . , uk , y1 , . . . , y` is a basis for V , which proves the theorem.
To verify that u1 , . . . , uk , y1 , . . . , y` are linear independent, suppose that

{E:uy} α1 u1 + · · · + αk uk + β1 y1 + · · · + β` y` = 0. (8.2.1)

Apply L to both sides of (8.2.1) to obtain

β1 w1 + · · · + β` w` = 0.

Since the wj are linearly independent, it follows that βj = 0 for all j. Now (8.2.1) implies
that
α1 u1 + · · · + αk uk = 0.

502
§8.2 Row Rank Equals Column Rank

Since the uj are linearly independent, it follows that αj = 0 for all j.


To verify that u1 , . . . , uk , y1 , . . . , y` span V , let v be in V . Since w1 , . . . , w` span W , it
follows that there exist scalars βj such that

L(v) = β1 w1 + · · · + β` w` .

Note that by choice of the yj

L(β1 y1 + · · · + β` y` ) = β1 w1 + · · · + β` w` .

It follows by linearity that


u = v − (β1 y1 + · · · + β` y` )
is in the null space of L. Hence there exist scalars αj such that

u = α1 u1 + · · · + αk uk .

Thus, v is in the span of u1 , . . . , uk , y1 , . . . , y` , as desired. 

Row Rank and Column Rank Recall Theorem 5.5.6 of Chapter 5 that states that the
nullity plus the rank of an m × n matrix equals n. At first glance it might seem that this
theorem and Theorem 8.2.3 contain the same information, but they do not. Theorem 5.5.6
of Chapter 5 is proved using a detailed analysis of solutions of linear equations based on
Gaussian elimination, back substitution, and reduced echelon form, while Theorem 8.2.3 is
proved using abstract properties of linear maps.
Let A be an m × n matrix. Theorem 5.5.6 of Chapter 5 states that

nullity(A) + rank(A) = n.

Meanwhile, Theorem 8.2.3 states that

dim(null space(LA )) + dim(range(LA )) = n.

But the dimension of the null space of LA equals the nullity of A and the dimension of the
range of A equals the dimension of the column space of A. Therefore,

nullity(A) + dim(column space(A)) = n.

Hence, the rank of A equals the column rank of A. Since rank and row rank are identical,
we have proved:

503
§8.2 Row Rank Equals Column Rank

{T:rowrank=columnrank}
Theorem 8.2.4. Let A be an m × n matrix. Then

row rank A = column rank A.

Since the row rank of A equals the column rank of At , we have:

Corollary 8.2.5. Let A be an m × n matrix. Then

rank(A) = rank(At ).
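
Corollary 8.2.5 is easy to test numerically in MATLAB; the matrix below is generated at random purely for illustration:

A = rand(5,3);       % a random 5 x 3 matrix
rank(A)              % the rank of A
rank(A')             % the rank of the transpose; the two values agree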

Exercises

{c5.8.1}
1. The 3 × 3 matrix  
1 2 5
A= 2 −1 1 
3 1 6
has rank two. Let r1 , r2 , r3 be the rows of A and c1 , c2 , c3 be the columns of A. Find scalars αj
and βj such that

α1 r1 + α2 r2 + α3 r3 = 0
β1 c1 + β2 c2 + β3 c3 = 0.

Answer: The possible choices for the scalars αj are α = (α1 , α2 , α3 ) = α3 (−1, −1, 1) and the
possible choices for the scalars βj are β = (β1 , β2 , β3 ) = β3 (−7/5, −9/5, 1).
Solution: Find At and solve by row reduction the equation At α = 0. To find the scalars βj , solve
Aβ = 0. These equations yield

−r1 − r2 + r3 = 0 and − 7c1 − 9c2 + 5c3 = 0.

{c5.8.2}
2. What is the largest row rank that a 5 × 3 matrix can have?
The largest row rank that a 5 × 3 matrix can have is 3, since, by Theorem 8.2.4 the row rank is
equal to the column rank, and the matrix has 3 columns.

504
§8.2 Row Rank Equals Column Rank

{c5.8.3}
3. Let  
1 1 0 1
A= 0 −1 1 2 .
1 2 −1 3

(a) Find a basis for the row space of A and the row rank of A.
(b) Find a basis for the column space of A and the column rank of A.
(c) Find a basis for the null space of A and the nullity of A.
(d) Find a basis for the null space of At and the nullity of At .

(a) Answer: The vectors (1, 0, 1, 0), (0, 1, −1, 0) and (0, 0, 0, 1) form a basis for the row space of
A, and the row rank of A is 3.
Solution: Row reduce A:
   
1 1 0 1 1 0 1 0
 0 −1 1 2  −→  0 1 −1 0 .
1 2 −1 3 0 0 0 1

(b) Answer: The column rank of A is 3, and the vectors (1, 0, 0), (0, 1, 0), and (0, 0, 1) form a basis
for the column space of A.
Solution: Row reduce At :

    ( 1   0   1 )        ( 1  0  0 )
    ( 1  −1   2 )   −→   ( 0  1  0 )
    ( 0   1  −1 )        ( 0  0  1 )
    ( 1   2   3 )        ( 0  0  0 )

(c) Answer: The vector (−1, 1, 1, 0) is a basis for the null space. Since one vector forms the basis,
the nullity of A is 1.
Solution: Solve Ax = 0 by row reducing A, which we have already done.
(d) Answer: The null space is trivial and the nullity of At is 0.
Solution: Find a basis by solving At x = 0 by row reduction. The row reduced matrix:
 
1 0 0
 0 1 0 
 
 0 0 1 
0 0 0

implies x = (0, 0, 0).

505
§8.2 Row Rank Equals Column Rank

{c5.8.4}
4. Let A be a nonzero 3 × 3 matrix such that A2 = 0. Show that rank(A) = 1.
Since A is a nonzero 3 × 3 matrix, rank(A) can equal 1, 2, or 3. If rank(A) = 3, then A is invertible,
so there exists a matrix B such that AB = I3 . Then A2 B 2 = A(AB)B = AB = I3 , so rank(A2 ) = 3, which
contradicts the assumption that A2 = 0. Therefore, rank(A) ≠ 3.
Suppose rank(A) = 2 and choose vectors v1 and v2 such that Av1 and Av2 are linearly independent;
such vectors exist since the range of A is two dimensional. By Theorem 5.5.6, the nullity of A is 1.
However, A2 v = 0 for all vectors v. In particular,

A2 v1 = A(Av1 ) = 0 and A2 v2 = A(Av2 ) = 0.

So the null space of A contains the two linearly independent vectors Av1 and Av2 , and hence nullity(A) ≥ 2,
contradicting Theorem 5.5.6. Thus, rank(A) ≠ 2, so, by elimination, rank(A) = 1.

{new_8_2_1}
5. Let V = range(LA ) where A is an n × m matrix of rank 2. Is V a subspace of Rn and, if so, what
is dim(V )?
Answer: V ⊂ Rn is a subspace whose dimension is 2.
Solution: The range of any linear map is a subspace. The dimension of the range of LA equals
the rank of A because of the two theorems

nullity(A) + dim(range(LA )) = m and nullity(A) + rank(A) = m

where A is an n × m matrix, so that LA : Rm → Rn . In this case rank(A) = 2; so dim(range(LA )) = 2.

{c5.8.5}
6. Let B be an m × p matrix and let C be a p × n matrix. Prove that the rank of the m × n matrix
A = BC satisfies
rank(A) ≤ min{rank(B), rank(C)}.

We show that rank(A) ≤ min{rank(B), rank(C)} by noting that, if A = BC, then the columns
of A are linear combinations of the columns of B, so the column space of A is contained in the
column space of B. Therefore, rank(A) ≤ rank(B). Next, note that
At = C t B t . By a similar argument, rank(At ) ≤ rank(C t ). Since rank(A) = rank(At ), rank(A) ≤
min{rank(B), rank(C)}.

{c5.8.6} 7. (matlab) Let

{MATLAB:38}
         (  1   1  2  2 )
    A =  (  0  −1  3  1 )                    (8.2.2*)
         (  2  −1  1  0 )
         ( −1   0  7  4 )


(a) Compute rank(A) and exhibit a basis for the row space of A.
(b) Find a basis for the column space of A.
(c) Find all solutions to the homogeneous equation Ax = 0.
(d) Does  
4
 2 
Ax =  
 2 
1
have a solution?
(a) Answer: The vectors (1, 0, 0, 1/12), (0, 1, 0, 3/4), and (0, 0, 1, 7/12) form a basis for the row space
of A and rank(A) = 3.
Solution: Row reducing A in MATLAB yields:

ans =
1.0000 0 0 0.0833
0 1.0000 0 0.7500
0 0 1.0000 0.5833
0 0 0 0

(b) Answer: The vectors (1, 0, 0, 1), (0, 1, 0, 2), and (0, 0, 1, −1) form a basis for the column space
of A.
Solution: Row reduce At in MATLAB with the command rref(A') to obtain

ans =
1 0 0 1
0 1 0 2
0 0 1 -1
0 0 0 0
(c) Answer: Ax = 0 when x = s(−1/12, −3/4, −7/12, 1).
Solution: The solutions to the homogeneous system Ax = 0 can be found using the row reduced
matrix A, which we computed in part (a).
(d) Answer: The vector (4, 2, 2, 1) is not in the span of the columns of A.
Solution: Row reducing the augmented matrix
 
1 1 2 2 4
 0 −1 3 1 2 
 
 2 −1 1 0 2 
−1 0 7 4 1

507
§8.2 Row Rank Equals Column Rank

in MATLAB yields

ans =
1 0 0 1/12 0
0 1 0 3/4 0
0 0 1 7/12 0
0 0 0 0 1

Since there is a pivot point in the last column, the system is inconsistent.

{A8.2.1}
8. Let C be the 3 × 3 matrix  
1 1 1
C = −1 b −1 
2 2 b2 + 1

(a) Find all b so that


(i) dim(range(C)) = 3
(ii) dim(range(C)) = 2
(iii) dim(range(C)) = 1
(iv) dim(range(C)) = 0
(b) Find all b so that
(i) dim(null space(C)) = 3
(ii) dim(null space(C)) = 2
(iii) dim(null space(C)) = 1
(iv) dim(null space(C)) = 0
(c) Find all b so that

{A8.2.1a}    Cx = (1, 1, 2)t    (8.2.3)
is consistent. (Hint: can you convert this into a statement about the range of C?)

Solution:

(a) The dimension of the range of C is the column rank of C, which is the (row) rank of C. By
row reduction the rank of C is 3 when b2 ≠ 1; the rank of C is 2 when b = 1; and the rank of
C is 1 when b = −1. The rank is never 0 since C is not the zero matrix.

508
§8.2 Row Rank Equals Column Rank

(b) The null space of C has dimension 3 − rank(C): by part (a) it is 0 when b2 ≠ 1, it is 1 when b = 1, and it is 2 when b = −1; it is never 3.


 
(c) Equation (8.2.3) has a solution if (1, 1, 2)t is in the range of C. This is always true when rank(C) =
3; that is, when b2 ≠ 1. Finally, one can check by row reduction that the equation is consistent
when b = 1 and inconsistent when b = −1.

{mc.exercise15}
9. Let  
a11 a12 a13 a14 a15
A=
a21 a22 a23 a24 a25
and suppose a11 a22 − a12 a21 ≠ 0. What is nullity(A)? Explain your answer.
Answer: nullity(A) = 3.
Solution: Note that a11 a22 − a12 a21 is the determinant of the 2 × 2 submatrix
 
a11 a12
B= .
a21 a22

Corollary 3.8.3 implies that B is invertible and Theorem 3.7.8 implies that B is row equivalent to
I2 . Apply the same set of elementary row operations that transforms B to I2 to see that A is row
equivalent to
    A′ = ( 1  0  a′13  a′14  a′15 )
         ( 0  1  a′23  a′24  a′25 ) .

It follows that the column ranks of A′ and A equal 2 and by Theorem 8.2.4 that the row ranks of A
and A′ are equal to 2. Hence the ranks of A and A′ are equal to 2. Finally, Theorem 5.5.6 implies
that the nullities of A and A′ equal 3.

509
§8.3 Vectors and Matrices in Coordinates

{S:coordinates} 8.3 Vectors and Matrices in Coordinates


In the last half of this chapter we discuss how similarity of matrices should be thought of
as change of coordinates for linear mappings. There are three steps in this discussion.

(a) Formalize the idea of coordinates for a vector in terms of a basis.

(b) Discuss how to write a linear map as a matrix in each coordinate system.

(c) Determine how the matrices corresponding to the same linear map in two different
coordinate systems are related.

The answer to the last question is simple: the matrices are related by a change of coordinates
if and only if they are similar. We discuss these steps in this section in Rn and in Section 8.4
for general vector spaces.

Coordinates of Vectors using Bases Throughout, we have written vectors v ∈ Rn in


coordinates as v = (v1 , . . . , vn ), and we have used this notation almost without comment.
From the point of view of vector space operations, we are just writing

v = v1 e 1 + · · · + vn e n

as a linear combination of the standard basis E = {e1 , . . . , en } of Rn .


More generally, each basis provides a set of coordinates for a vector space. This fact is
described by the following lemma (although its proof is identical to the first part of the
proof of Theorem 8.1.2).
{L:coordinates}
Lemma 8.3.1. Let W = {w1 , . . . , wn } be a basis for the vector space V . Then each vector
v in V can be written uniquely as a linear combination of vectors in W; that is,

v = α1 w1 + · · · + αn wn ,

for uniquely defined scalars α1 , . . . , αn .

Proof Since W is a basis, Theorem 5.5.3 of Chapter 5 implies that the vectors w1 , . . . , wn
span V and are linearly independent. Therefore, we can write v in V as a linear combination
of vectors in W. That is, there are scalars α1 , . . . , αn such that

v = α1 w1 + · · · + αn wn .

510
§8.3 Vectors and Matrices in Coordinates

Next we show that these scalars are uniquely defined. Suppose that we can write v as a
linear combination of the vectors in W in a second way; that is, suppose

v = β1 w1 + · · · + βn wn

for scalars β1 , . . . , βn . Then

(α1 − β1 )w1 + · · · + (αn − βn )wn = 0.

Since the vectors in W are linearly independent, it follows that αj = βj for all j. 
{D:coordinates}
Definition 8.3.2. Let W = {w1 , . . . , wn } be a basis in a vector space V . Lemma 8.3.1
states that we can write v ∈ V uniquely as

{e:coordv} v = α1 w1 + · · · + αn wn . (8.3.1)

The scalars α1 , . . . , αn are the coordinates of v relative to the basis W, and we denote the
coordinates of v in the basis W by

{e:coordnot} [v]W = (α1 , . . . , αn ) ∈ Rn . (8.3.2)

We call the coordinates of a vector v ∈ Rn relative to the standard basis, the standard
coordinates of v.

Writing Linear Maps in Coordinates as Matrices Let V be a finite dimensional vector


space of dimension n and let L : V → V be a linear mapping. We now show how each basis
of V allows us to associate an n × n matrix to L. Previously we considered this question
with the standard basis on V = Rn . We showed in Chapter 3 that we can write the linear
mapping L as a matrix mapping, as follows. Let E = {e1 , . . . , en } be the standard basis in
Rn . Let A be the n × n matrix whose j th column is the n vector L(ej ). Then Chapter 3,
Theorem 3.3.5 shows that the linear map is given by matrix multiplication as

L(v) = Av.

{R:standard} Thus every linear mapping on Rn can be written in this matrix form.
Remark. Another way to think of the j th column of the matrix A is as the coordinate
vector of L(ej ) relative to the standard basis, that is, as [L(ej )]E . We denote the matrix A
by [L]E ; this notation emphasizes the fact that A is the matrix of L relative to the standard
basis.

We now discuss how to write a linear map L as a matrix using different coordinates.

511
§8.3 Vectors and Matrices in Coordinates

{D:matrixincoord}
Definition 8.3.3. Let W = {w1 , . . . , wn } be a basis for the vector space V . The n × n
matrix [L]W associated to the linear map L : V → V and the basis W is defined as follows.
The j th column of [L]W is [L(wj )]W — the coordinates of L(wj ) relative to the basis W.

Note that when V = Rn and when W = E, the standard basis of Rn , then the definition
of the matrix [L]E is exactly the same as the matrix associated with the linear map L in
Remark 8.3.
Lemma 8.3.4. The coordinate vector of L(v) relative to the basis W is
{e:matrixofL} [L(v)]W = [L]W [v]W . (8.3.3)

Proof The process of choosing the coordinates of vectors relative to a given basis W =
{w1 , . . . , wn } of a vector space V is itself linear. Indeed,
[u + v]W = [u]W + [v]W
[cv]W = c[v]W .
Thus the coordinate mapping relative to a basis W of V defined by
{e:coordmap} v 7→ [v]W (8.3.4)
is a linear mapping of V into Rn . We denote this linear mapping by [·]W : V → Rn .

It now follows that both the left hand and right hand sides of (8.3.3) can be thought of as
linear mappings of V → Rn . In verifying this comment, we recall Lemma 8.1.3 of Chapter 5
that states that the composition of linear maps is linear. On the left hand side we have the
mapping
v 7→ L(v) 7→ [L(v)]W ,
which is the composition of the linear maps: [·]W with L. See (8.3.4). The right hand side
is
v 7→ [v]W 7→ [L]W [v]W ,
which is the composition of the linear maps: multiplication by the matrix [L]W with [·]W .
Theorem 8.1.2 states that linear mappings are determined by their actions on a basis. Thus
to verify (8.3.3), we need only verify this equality for v = wj for all j. Since [wj ]W = ej ,
the right hand side of (8.3.3) is:
[L]W [wj ]W = [L]W ej ,
which is just the j th column of [L]W . The left hand side of (8.3.3) is the vector [L(wj )]W ,
which by definition is also the j th column of [L]W (see Definition 8.3.3). 

512
§8.3 Vectors and Matrices in Coordinates

Computations of Vectors in Coordinates in Rn We divide this subsection into three parts.


We consider a simple example in R2 algebraically in the first part and geometrically in the
second. In the third part we formalize and extend the algebraic discussion to Rn .

An Example of Coordinates in R2 How do we find the coordinates of a vector v in a basis?


For example, choose a (nonstandard) basis in the plane — say

w1 = (1, 1) and w2 = (1, −2).

Since {w1 , w2 } is a basis, we may write the vector v as a linear combination of the vectors
w1 and w2 . Thus we can find scalars α1 and α2 so that

v = α1 w1 + α2 w2 = α1 (1, 1) + α2 (1, −2) = (α1 + α2 , α1 − 2α2 ).

In standard coordinates, set v = (v1 , v2 ); this equation leads to the system of linear equations

v1 = α1 + α2
v2 = α1 − 2α2

in the two variables α1 and α2 . As we have seen, the fact that w1 and w2 form a basis of R2
implies that these equations do have a solution. Indeed, we can write this system in matrix
form as

    ( v1 )   ( 1   1 ) ( α1 )
    ( v2 ) = ( 1  −2 ) ( α2 ) ,

which is solved by inverting the matrix to obtain:

{change1}
    ( α1 )         ( 2   1 ) ( v1 )
    ( α2 ) = (1/3) ( 1  −1 ) ( v2 ) .    (8.3.5)

For example, suppose v = (2.0, 0.5). Using (8.3.5) we find that (α1 , α2 ) = (1.5, 0.5); that is,
we can write
v = 1.5w1 + 0.5w2 ,
and (1.5, 0.5) are the coordinates of v in the basis {w1 , w2 }.
Using the notation in (8.3.2), we may rewrite (8.3.5) as
 
    [v]W = (1/3) ( 2   1 ) [v]E ,
                 ( 1  −1 )

where E = {e1 , e2 } is the standard basis.
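
The same coordinates can also be computed numerically. A minimal MATLAB check of the calculation above is:

w1 = [1; 1];
w2 = [1; -2];
v  = [2.0; 0.5];
alpha = [w1 w2]\v    % solves (w1|w2)*alpha = v; returns (1.5, 0.5)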

513
§8.3 Vectors and Matrices in Coordinates

Planar Coordinates Viewed Geometrically using MATLAB Next we use MATLAB to view
geometrically the notion of coordinates relative to a basis W = {w1 , w2 } in the plane. Type

w1 = [1 1];
w2 = [1 -2];
bcoord

MATLAB will create a graphics window showing the two basis vectors w1 and w2 in red.
Using the mouse click on a point near (2, 0.5) in that figure. MATLAB will respond by
plotting the new vector v in yellow and the parallelogram generated by α1 w1 and α2 w2 in
cyan. The values of α1 and α2 are also plotted on this figure. See Figure 25.


{F:coords} Figure 25: The coordinates of v = (2.0, 0.5) in the basis w1 = (1, 1), w2 = (1, −2).

Abstracting R2 to Rn Suppose that we are given a basis W = {w1 , . . . , wn } of Rn and a


vector v ∈ Rn . How do we find the coordinates [v]W of v in the basis W?
For definiteness, assume that v and the wj are row vectors. Equation (8.3.1) may be


rewritten as

    v t = (w1t | · · · |wnt ) (α1 , . . . , αn )t .

Thus,

{e:coordRn}    [v]W = (α1 , . . . , αn )t = PW−1 v t ,    (8.3.6)
where PW = (w1t | · · · |wnt ). Since the wj are a basis for Rn , the columns of the matrix PW
are linearly independent, and PW is invertible.
We may use (8.3.6) to compute [v]W using MATLAB. For example, let

v = (4, 1, 3)

and
w1 = (1, 4, 7) w2 = (2, 1, 0) w3 = (−4, 2, 1).
Then [v]W is found by typing

w1 = [ 1 4 7];
w2 = [ 2 1 0];
w3 = [-4 2 1];
inv([w1' w2' w3'])*[4 1 3]'

The answer is:

ans =
0.5306
0.3061
-0.7143

Determining the Matrix of a Linear Mapping in Coordinates Suppose that we are given
the linear map LA : Rn → Rn associated to the matrix A in standard coordinates and a
basis w1 , . . . , wn of Rn . How do we find the matrix [LA ]W ? As above, we assume that the
vectors wj and the vector v are row vectors. Since LA (v) = Av t we can rewrite (8.3.3) as

[LA ]W [v]W = [Av t ]W


As above, let PW = (w1t | · · · |wnt ). Using (8.3.6) we see that


    [LA ]W PW−1 v t = PW−1 Av t .

Setting

    u = PW−1 v t

we see that

    [LA ]W u = PW−1 APW u.

Therefore,

    [LA ]W = PW−1 APW .
We have proved:
{T:matrixcoord2}
Theorem 8.3.5. Let A be an n × n matrix and let LA : Rn → Rn be the associated linear
map. Let W = {w1 , . . . , wn } be a basis for Rn . Then the matrix [LA ]W associated to LA
in the basis W is similar to A. Therefore the determinant, trace, and eigenvalues of [LA ]W
are identical to those of A.
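
A short MATLAB experiment illustrates Theorem 8.3.5; the matrix A and the basis below are chosen only as an example:

A  = [1 2; 3 4];
w1 = [1; 1];
w2 = [1; -1];
PW = [w1 w2];
LW = inv(PW)*A*PW     % the matrix of LA in the basis W
trace(A), trace(LW)   % the traces agree
eig(A), eig(LW)       % and so do the eigenvalues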

Matrix Normal Forms in R2 If we are careful about how we choose the basis W, then we
can simplify the form of the matrix [L]W . Indeed, we have already seen examples of this
process when we discussed how to find closed form solutions to linear planar systems of
ODEs in the previous chapter. For example, suppose that L : R2 → R2 has real eigenvalues
λ1 and λ2 with two linearly independent eigenvectors w1 and w2 . Then the matrix associated
to L in the basis W = {w1 , w2 } is the diagonal matrix
 
{e:diagcoord}
    [L]W = ( λ1   0 )
           (  0  λ2 ) ,    (8.3.7)

since

    [L(w1 )]W = [λ1 w1 ]W = (λ1 , 0)t

and

    [L(w2 )]W = [λ2 w2 ]W = (0, λ2 )t .

In Chapter 6 we showed how to classify 2 × 2 matrices up to similarity (see Theorem 6.3.4)


and how to use this classification to find closed form solutions to planar systems of linear
ODEs (see Section 6.3). We now use the ideas of coordinates and matrices associated with
bases to reinterpret the normal form result (Theorem 6.3.4) in a more geometric fashion.

516
§8.3 Vectors and Matrices in Coordinates

{T:putinform2}
Theorem 8.3.6. Let L : R2 → R2 be a linear mapping. Then in an appropriate coordinate
system defined by the basis W below, the matrix [L]W has one of the following forms.

(a) Suppose that L has two linearly independent real eigenvectors w1 and w2 with real
eigenvalues λ1 and λ2 . Then
 
λ1 0
[L]W = .
0 λ2

(b) Suppose that L has no real eigenvectors and complex conjugate eigenvalues σ ± iτ where
τ ≠ 0. Let w1 + iw2 be a complex eigenvector of L associated with the eigenvalue σ − iτ .
Then W = {w1 , w2 } is a basis and
 
σ −τ
[L]W = .
τ σ

(c) Suppose that L has exactly one linearly independent real eigenvector w1 with real eigen-
value λ. Choose the generalized eigenvector w2

{e:Lw=lw+v} (L − λI2 )(w2 ) = w1 . (8.3.8)

Then W = {w1 , w2 } is a basis and


 
λ 1
[L]W = .
0 λ

Proof The verification of (a) was discussed in (8.3.7). The verification of (b) follows from
(6.2.9) on equating w1 with v and w2 with w. The verification of (c) follows directly from
(8.3.8) as
[L(w1 )]W = λe1 and [L(w2 )]W = e1 + λe2 .


Visualization of Coordinate Changes in ODEs We consider two examples. As a first example


note that the matrices
   
1 0 4 −3
C= and B = ,
0 −2 6 −5


are similar matrices. Indeed, B = P −1 CP where

 
2 −1
{e:Pchange} P = . (8.3.9)
1 −1

The phase portraits of the differential equations Ẋ = BX and Ẋ = CX are shown in


Figure 26. Note that both phase portraits are pictures of the same saddle — just in different
coordinate systems.


{F:comparesim} Figure 26: Phase planes for the saddles Ẋ = BX and Ẋ = CX.
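
The similarity of B and C can be confirmed numerically; the following MATLAB lines simply check the computation above using the matrix P in (8.3.9):

C = [1 0; 0 -2];
B = [4 -3; 6 -5];
P = [2 -1; 1 -1];
inv(P)*C*P            % returns B, so B = P^(-1)*C*P
eig(B)                % eigenvalues 1 and -2, the same as those of C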

As a second example note that the matrices

   
0 2 6 −4
C= and B =
−2 0 10 −6

are similar matrices, and both are centers. Indeed, B = P −1 CP where P is the same matrix
as in (8.3.9). The phase portraits of the differential equations Ẋ = BX and Ẋ = CX are
shown in Figure 27. Note that both phase portraits are pictures of the same center — just
in different coordinate systems.



{F:comparesim2} Figure 27: Phase planes for the centers Ẋ = BX and Ẋ = CX.

Exercises

{c7.1.1}
1. Let
w1 = (1, 4) and w2 = (−2, 1).
Find the coordinates of v = (−1, 32) in the W basis.
Answer: [v]W = (7, 4).
Solution: Find the scalars α1 and α2 such that v = α1 w1 + α2 w2 . That is, solve the linear system

α1 − 2α2 = −1
4α1 + α2 = 32

to obtain (α1 , α2 ) = (7, 4), the coordinates of v in the W basis.

{mc.exerciseErr4}
2. Let W = {v1 , . . . , vn } be a basis of Rn .

(a) State the definition of the coordinates of a vector x ∈ Rn relative to W, and describe how to
find them given the standard coordinates of x.
(b) What vector v ∈ Rn satisfies
[v]W = e1 − e2

519
§8.3 Vectors and Matrices in Coordinates

(c) What is the definition of the matrix of a linear function T : Rn → Rn relative to W?


(d) Let T : Rn → Rn be a linear transformation with standard matrix A so that [T ]W = B. What
is the relationship between A and B?

Answer:

(a) This is Definition 8.3.2.


(b) v = v1 − v2 .
(c) This is Definition 8.3.3.
(d) By Theorem 8.3.5, A and B are similar matrices.
{c7.3.1}
3. Let w1 = (1, 2) and w2 = (0, 1) be a basis for R2 . Let LA : R2 → R2 be the linear map given by
the matrix  
2 1
A=
−1 0
in standard coordinates. Find the matrix [L]W .
Solution: From Section 8.3,

    [L]W = (w1t |w2t )−1 A (w1t |w2t )
         = (  1  0 ) (  2  1 ) ( 1  0 )
           ( −2  1 ) ( −1  0 ) ( 2  1 )
         = (  4   1 )
           ( −9  −2 ) .
{c7.1.3}
4. Let Eij be the 2 × 3 matrix whose entry in the ith row and j th column is 1 and all of whose
other entries are 0.

(a) Show that


V = {E11 , E12 , E13 , E21 , E22 , E23 }
is a basis for the vector space of 2 × 3 matrices.
(b) Compute [A]V where  
−1 0 2
A= .
3 −2 4

(a) By Theorem 5.5.3, the subset V is a basis for the vector space of 2 × 3 matrices if the vectors
of V are linearly independent and span the vector space. Let
 
b11 b12 b13
B= .
b21 b22 b23
We show that B is in the span of V by noting that B = b11 E11 + b12 E12 + b13 E13 + b21 E21 + b22 E22 +
b23 E23 . To show that the matrices Eij are linearly independent, suppose b11 E11 + b12 E12 + b13 E13 +
b21 E21 + b22 E22 + b23 E23 = 0. Then,
 
b11 b12 b13
B= = 0,
b21 b22 b23

520
§8.3 Vectors and Matrices in Coordinates

so bij = 0. Therefore, V is a basis for the given vector space.


(b) Answer: [A]V = (−1, 0, 2, 3, −2, 4).
Solution: Compute A = −E11 + 2E13 + 3E21 − 2E22 + 4E23 .
{mc8_3A}
5. Suppose the mapping L : R3 → R2 is linear and satisfies

    L (1, 0, 0)t = (1, 2)t ,    L (0, 1, 1)t = (2, 0)t ,    L (0, 0, 1)t = (−1, 4)t .

What is the 2 × 3 matrix A such that L = LA ?


Answer:
    A = ( 1   3  −1 )
        ( 2  −4   4 ) .

Solution:

        ( 1  2  −1 ) ( 1  0  0 )−1
    A = ( 2  0   4 ) ( 0  1  0 )
                     ( 0  1  1 )

Use Gaussian elimination to compute

    ( 1  0  0 | 1  0  0 )      ( 1  0  0 | 1   0  0 )
    ( 0  1  0 | 0  1  0 )  ∼   ( 0  1  0 | 0   1  0 )
    ( 0  1  1 | 0  0  1 )      ( 0  0  1 | 0  −1  1 )

Therefore

        ( 1  2  −1 ) ( 1   0  0 )   ( 1   3  −1 )
    A = ( 2  0   4 ) ( 0   1  0 ) = ( 2  −4   4 ) .
                     ( 0  −1  1 )

{c7.1.4}
6. Verify that V = {p1 , p2 , p3 } where

p1 (t) = 1 + 2t, p2 (t) = t + 2t2 , and p3 (t) = 2 − t2 ,

is a basis for the vector space of polynomials P2 . Let p(t) = t and find [p]V .
Answer: If p(t) = t, then [p]V = (4/7, −1/7, −2/7).
Solution: In order to verify that V is a basis for P2 , first show that the set {1, t, t2 } is a basis for
P2 . To prove this, note that any polynomial in P2 can be written as p = α1 + α2 t + α3 t2 , so the
set spans P2 . Also, 0 = α1 + α2 t + α3 t2 if and only if α1 = α2 = α3 = 0, so the set is linearly
independent.

521
§8.3 Vectors and Matrices in Coordinates

The set {1, t, t2 } has dimension 3 and is a basis for P2 . Therefore, any linearly independent
set of three vectors in P2 will span P2 . So we need only show that V is a linearly independent set,
which we do by solving:

0 = α1 p1 (t) + α2 p2 (t) + α3 p3 (t)


= α1 (1 + 2t) + α2 (t + 2t2 ) + α3 (2 − t2 )
= (α1 + 2α3 ) + (2α1 + α2 )t + (2α2 − α3 )t2 .

This equation is identically 0 if


  
    ( 1  0   2 ) ( α1 )
    ( 2  1   0 ) ( α2 ) = 0.
    ( 0  2  −1 ) ( α3 )

The only solution to this system is α1 = α2 = α3 = 0, so the elements are linearly independent and
V is a basis for P2 .
Let p(t) = t. Then find this vector [p]V by solving p(t) = α1 p1 (t) + α2 p2 (t) + α3 p3 (t). That
is,     
0 1 0 2 α1
 1 = 2 1 0   α2  .
0 0 2 −1 α3
4 1 2
Solve by substitution to obtain α1 = , α2 = − , and α3 = − .
7 7 7

{c7.1.6} 7. (matlab) Let


w1 = (1, 0, 2), w2 = (2, 1, 4), and w3 = (0, 1, −1)
be a basis for R3 . Find [v]W where v = (2, 1, 5).
Answer: [v]W = (−2, 2, −1).
Solution: Use MATLAB to row reduce the augmented matrix (w1t |w2t |w3t |v), obtaining:

ans =
1 0 0 -2
0 1 0 2
0 0 1 -1

{c7.1.7} 8. (matlab) Let


w1 = (0.2, −1.3, 0.34, −1.1)
w2 = (0.5, −0.6, 0.7, 0.8)
{MATLAB:35} (8.3.10*)
w3 = (−1.0, 1.0, 2.0, 4.5)
w4 = (−5.1, 0.0, 1.6, −1.7)

522
§8.3 Vectors and Matrices in Coordinates

be a basis W for R4 . Find [v]W where v = (1.7, 2.3, 1.0, −5.0).


Answer: [v]W ≈ (−58.3171, 79.7282, −25.6754, 10.2308).
Solution: Using MATLAB , create the augmented matrix [w1' w2' w3' w4' v'] and row reduce
to obtain

ans =
1.0000 0 0 0 -58.3171
0 1.0000 0 0 79.7282
0 0 1.0000 0 -25.6754
0 0 0 1.0000 10.2308

{c7.3.4} 9. (matlab) Find a basis W = {w1 , w2 } such that [LA ]W is a diagonal matrix, where LA is the
linear map associated with the matrix
 
−10 −6
A= .
18 11

Answer: The matrix [L]W is diagonal in the basis

    W = { (1, −2)t , (2, −3)t } .

Solution: Theorem 8.3.6 states that the matrix [L]W is diagonal if W consists of eigenvectors
of L corresponding to real eigenvalues. By computation, we find that λ1 = 2 and λ2 = −1 are
the eigenvalues of L. We then find that Lw1 = 2w1 when w1 = (1, −2)t and Lw2 = −w2 when
w2 = (2, −3)t , so w1 and w2 are eigenvectors of L.

{c7.3.5} 10. (matlab) Let A be the 4 × 4 matrix


 
2 1 4 6
 1 2 1 1 
{MATLAB:36} A=
 0
 (8.3.11*)
1 2 4 
2 1 1 5

and let W = {w1 , w2 , w3 , w4 } where

w1 = (1, 2, 3, 4)
w2 = (0, −1, 1, 3)
{MATLAB:37} (8.3.12*)
w3 = (2, 0, 0, 1)
w4 = (−1, 1, 3, 0)

523
§8.3 Vectors and Matrices in Coordinates

Verify that W is a basis of R4 and compute the matrix associated to A in the W basis.
Answer: Let L be the linear transformation with matrix A in the standard basis. Then:
 
                ( 92   −21   55    −9 )
    [L]W = 1/41 ( −54    56  −10   −58 )
                ( 901   531  179   292 )
                ( 254   180    3   124 ) .

Solution: Verify that W is a basis of R4 by noting that four linearly independent vectors in R4
span R4 and therefore form a basis. So, row reduce the matrix PW = (w1t |w2t |w3t |w4t ) to find that
the vectors are indeed linearly independent.
Use the formula [L]W = PW−1 APW to compute
 −1  
1 0 2 −1 1 0 2 −1
 2 −1 0 1   2 −1 0 1 
[L]W =
 3
 A .
1 0 3   3 1 0 3 
4 3 1 0 4 3 1 0

Use the format rational command so that MATLAB displays the matrix elements as fractions.

524
§8.4 *Matrices of Linear Maps on a Vector Space

{MALT} 8.4 *Matrices of Linear Maps on a Vector Space


Returning to the general finite dimensional vector space V , suppose that
W = {w1 , . . . , wn } and Z = {z1 , . . . , zn }
are bases of V . Then we can write
v = α1 w1 + · · · + αn wn and v = β1 z 1 + · · · + βn z n
to obtain the coordinates
{e:vincoords} [v]W = (α1 , . . . , αn ) and [v]Z = (β1 , . . . , βn ) (8.4.1)
of v relative to the bases W and Z. The question that we address is: How are [v]W and
[v]Z related? We answer this question by finding an n × n matrix CWZ such that
   
{e:coordchange}    (α1 , . . . , αn )t = CWZ (β1 , . . . , βn )t .    (8.4.2)
We may rewrite (8.4.2) as
{e:coordchange2} [v]W = CWZ [v]Z . (8.4.3)
Definition 8.4.1. Let W and Z be bases for the n-dimensional vector space V . The n × n
matrix CWZ is a transition matrix if CWZ satisfies (8.4.3).

Transition Mappings Defined The next theorem presents a method for finding the transition
matrix between coordinates associated to bases in an n-dimensional vector space V .
{T:coordform}
Theorem 8.4.2. Let W = {w1 , . . . , wn } and Z = {z1 , . . . , zn } be bases for the n-
dimensional vector space V . Then
 
{e:coordform}
             ( c11  · · ·  c1n )
    CWZ =    (  ..          ..  )                    (8.4.4)
             ( cn1  · · ·  cnn )

is the transition matrix, where

{e:wtoz}    z1 = c11 w1 + · · · + cn1 wn
            ..                                       (8.4.5)
            zn = c1n w1 + · · · + cnn wn
for scalars cij .


Proof We can restate (8.4.5) as


 
    [zj ]W = (c1j , . . . , cnj )t .

Note that
[zj ]Z = ej ,
by definition. Since the transition matrix satisfies [v]W = CWZ [v]Z for all vectors v ∈ V , it
must satisfy this relation for v = zj . Therefore,

[zj ]W = CWZ [zj ]Z = CWZ ej .

It follows that [zj ]W is the j th column of CWZ , which proves the theorem. 

A Formula for CWZ when V = Rn For bases in Rn , there is a formula for finding transition
matrices. Let W = {w1 , . . . , wn } and Z = {z1 , . . . , zn } be bases of Rn — written as row
vectors. Also, let v ∈ Rn be written as a row vector. Then (8.3.6) implies that
    [v]W = PW−1 v t and [v]Z = PZ−1 v t ,

where

    PW = (w1t | · · · |wnt ) and PZ = (z1t | · · · |znt ).

It follows that

    [v]W = PW−1 PZ [v]Z

and that

{e:coordformn}    CWZ = PW−1 PZ .    (8.4.6)

As an example, consider the following bases of R4 . Let

w1 = [1, 4, 2, 3] z1 = [3, 2, 0, 1]
w2 = [2, 1, 1, 4] z2 = [−1, 0, 2, 3]
{MATLAB:39} (8.4.7*)
w3 = [0, 1, 5, 6] z3 = [3, 1, 1, 3]
w4 = [2, 5, −1, 0] z4 = [2, 2, 3, 5]

Then the matrix CWZ is obtained by typing e9_4_7 to enter the bases and

inv([w1' w2' w3' w4'])*[z1' z2' z3' z4']

526
§8.4 *Matrices of Linear Maps on a Vector Space

to compute CWZ . The answer is:

ans =
-8.0000 5.5000 -7.0000 -3.2500
-0.5000 0.7500 0.0000 0.1250
4.5000 -2.7500 4.0000 2.3750
6.0000 -4.0000 5.0000 2.5000

Coordinates Relative to Two Different Bases in R2 Recall the basis W

w1 = (1, 1) and w2 = (1, −2)

of R2 that was used in a previous example. Suppose that Z = {z1 , z2 } is a second basis of
R2 . Write v = (v1 , v2 ) as a linear combination of the basis Z

v = β1 z 1 + β2 z 2 ,

obtaining the coordinates [v]Z = (β1 , β2 ).


We use MATLAB to illustrate how the coordinates of a vector v relative to two bases may
be viewed geometrically. Suppose that z1 = (1, 3) and z2 = (−1, 2). Then enter the two
bases W and Z by typing

w1 = [1 1];
w2 = [1 -2];
z1 = [1 3];
z2 = [-1 2];
ccoord

The MATLAB program ccoord opens two graphics windows representing the W and Z planes
with the basis vectors plotted in red. Clicking the left mouse button on a vector in the W
plane simultaneously plots this vector v in both planes in yellow and the coordinates of v
in the respective bases in cyan. See Figure 28. From this display you can visualize the
coordinates of a vector relative to two different bases.
Note that the program ccoord prints the transition matrix CWZ in the MATLAB control
window. We can verify the calculations of the program ccoord on this example by hand.

527
§8.4 *Matrices of Linear Maps on a Vector Space


Figure 28: The coordinates of v = (1.9839, −0.0097) in the bases w1 = (1, 1), w2 = (1, −2)
{F:2coords} and z1 = (1, 3), z2 = (−1, 2).

Recall that (8.4.6) states that

    CWZ = (w1t |w2t )−1 (z1t |z2t )

        = ( 1   1 )−1 ( 1  −1 )
          ( 1  −2 )   ( 3   2 )

        = (1/3) ( 2   1 ) ( 1  −1 )
                ( 1  −1 ) ( 3   2 )

        = (  5/3    0 )
          ( −2/3   −1 ) .

Matrices of Linear Maps in Different Bases


{T:matrixcoord}
Theorem 8.4.3. Let L : V → V be a linear mapping and let W and Z be bases of V . Then

[L]Z and [L]W

are similar matrices. More precisely,

{e:matrixcoord} −1
[L]W = CZW [L]Z CZW . (8.4.8)

528
§8.4 *Matrices of Linear Maps on a Vector Space

Proof For every v ∈ Rn we compute

CZW [L]W [v]W = CZW [L(v)]W


= [L(v)]Z
= [L]Z [v]Z
= [L]Z CZW [v]W .

Since this computation holds for every [v]W , it follows that

CZW [L]W = [L]Z CZW .

Thus (8.4.8) is valid. 
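
Theorem 8.4.3 can also be checked numerically. The sketch below reuses the planar bases W and Z from the ccoord example; the matrix A is chosen arbitrarily for illustration:

w1 = [1; 1];  w2 = [1; -2];     % the basis W
z1 = [1; 3];  z2 = [-1; 2];     % the basis Z
PW = [w1 w2]; PZ = [z1 z2];
A  = [0 1; -2 3];               % an arbitrary matrix in standard coordinates
LW  = inv(PW)*A*PW;             % the matrix [L]_W
LZ  = inv(PZ)*A*PZ;             % the matrix [L]_Z
CZW = inv(PZ)*PW;               % transition matrix from W to Z coordinates
inv(CZW)*LZ*CZW - LW            % the zero matrix, up to roundoff error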

Exercises

{c7.1.2}
1. Let
w1 = (1, 2) and w2 = (0, 1)
and
z1 = (2, 3) and z2 = (3, 4)
be two bases of R2 . Find CWZ .

Answer:  
2 3
CWZ = .
−1 −2

Solution: Substitute into Equation (8.4.6) as follows:


 −1    
1 0 2 3 2 3
CWZ = (w1t |w2t )−1 (z1t |z2t ) = = .
2 1 3 4 −1 −2

{c7.3.2}
2. Let f1 (t) = cos t and f2 (t) = sin t be functions in C 1 . Let V be the two dimensional subspace
spanned by f1 , f2 ; so F = {f1 , f2 } is a basis for V . Let L : V → V be the linear mapping defined
by L(f ) = df /dt. Find [L]F .
Answer:  
0 1
[L]F = .
−1 0

529
§8.4 *Matrices of Linear Maps on a Vector Space

Solution: By Definition 8.3.3, the j th column of [L]F is [L(fj )]F . In this case,
d d
L(f1 ) = (cos t) = − sin t and L(f2 ) = (sin t) = cos t.
dt dt
The basis {f1 , f2 } is {cos t, sin t}. In this basis, − sin t has coordinates (0, −1), and cos t has
coordinates (1, 0). Therefore, the 1st column of [L]F is (0, −1) and the 2nd column is (1, 0).

{c7.3.3}
3. Let L : V → W and M : W → V be linear mappings, and assume dim V > dim W . Show that
M ◦L : V → V is not invertible.
If there exists a nonzero vector v such that (M ◦L)(v) = 0, then the nullity of M ◦L is nonzero, so M ◦L
is not invertible. Since dim range(L) ≤ dim W , Theorem 8.2.3 gives nullity(L) = dim V − dim range(L) ≥ dim V − dim W > 0.
So there is a nonzero vector v with L(v) = 0, and for this v we have (M ◦L)(v) = M (L(v)) = M (0) = 0. Therefore M ◦L is not invertible.

{c7.1.5} 4. (matlab) Let


w1 = (0.23, 0.56) and w2 = (0.17, −0.71)
and
z1 = (−1.4, 0.3) and z2 = (0.1, −0.2)
be two bases of R2 and let v = (0.6, 0.1). Find [v]W , [v]Z , and CWZ .

Answer: MATLAB gives the values for [v]W , [v]Z , and CWZ as:

vW = vZ = CWZ =
1.7137 -0.5200 -3.6480 0.1431
1.2108 -1.2800 -3.2998 0.3946

Solution: In MATLAB, find [v]W by row reducing the matrix (w1t |w2t |v t ). Row reduce (z1t |z2t |v t )
to solve for [v]Z . To find CWZ , compute (w1t |w2t )−1 (z1t |z2t ).

{mc.exerciseErr3}
5. Let W = {v1 , v2 , v3 } be the basis of R3 where
     
−2 0 −1
v1 =  1  v2 =  0  v3 =  1  .
1 2 0

Let T : R3 → R3 be a linear function so that


     
−1 −4 1
T (v1 ) =  1  T (v2 ) =  2  T (v3 ) =  0  .
0 2 0

530
§8.4 *Matrices of Linear Maps on a Vector Space

(a) Find the coordinates of T (v3 ) relative to the basis W. That is, compute [T (v3 )]W .
(b) Find the matrix [T ]W of T relative to the basis W.
(c) Find the standard matrix [T ]E of T .

Answer:
 
−1
(a) [T (v3 )]W =  1/2 .
1
 
0 2 −1
(b) [T ]W =  0 0 1/2 .
1 0 1
 
0 1 −2
(c) [T ]E =  0 0 1 .
1 1 1

Solution:

(a) Express T (v3 ) as the linear combination

T (v3 ) = av1 + bv2 + cv3 .

Solving the linear system yields a = −1, b = 1/2 and c = 1. Then


   
a −1
[T (v3 )]W =  b  =  1/2  .
c 1

(b) Repeating the procedure above, obtain


 
0
[T (v1 )]W = 0 
1

and  
2
[T (v2 )]W =  0 .
0
By Lemma 8.3.3 (see also Definition 8.3.3), the matrix of T relative to the basis W as
 
0 2 −1
 0 0 1/2  .
1 0 1

531
§8.4 *Matrices of Linear Maps on a Vector Space

(c) It follows from Theorem 8.1.2 and (8.1.3) that [T ]E is given by

[T ]E = (T (v1 ) | T (v2 ) | T (v3 ))(v1 | v2 | v3 )−1 .


 
One can compute

    (v1 | v2 | v3 )−1 = ( −1   −1    0  )
                        ( 1/2  1/2  1/2 )
                        (  1    2    0  )

and so

            ( −1  −4  1 ) ( −1   −1    0  )   ( 0  1  −2 )
    [T ]E = (  1   2  0 ) ( 1/2  1/2  1/2 ) = ( 0  0   1 ) .
            (  0   2  0 ) (  1    2    0  )   ( 1  1   1 )
This could also be done in the same fashion as part (b).

6. (matlab) Consider the matrix


{c7.5.A}
 √ √   
1√ 1 − 3 1 + √3 0.3333 −0.2440 0.9107
1
{MATLAB:40} A =  1 + √3 1√ 1 − 3  =  0.9107 0.3333 −0.2440  (8.4.9*)
3
1− 3 1+ 3 1 −0.2440 0.9107 0.3333

(a) Try to determine the way that the matrix A moves vectors in R3 . For example, let
    w1 = (1, 1, 1)t    w2 = (1/√6) (1, −2, 1)t    w3 = (1/√2) (1, 0, −1)t
and compute Awj .
(b) Let W = {w1 , w2 , w3 } be the basis of R3 given in (a). Compute [LA ]W .
(c) Determine the way that the matrix [LA ]W moves vectors in R3 . For example, consider how
this matrix moves the standard basis vectors e1 , e2 , e3 . Compare this answer with that in part
(a).

(a) Answer: The matrix A fixes w1 , moves w2 to w3 , and moves w3 to −w2 .


Solution: Load the matrix and vectors into MATLAB. Then compute Awj for each vector wj ,
obtaining Aw1 = w1 , Aw2 = w3 , and Aw3 = −w2 .
 
1 0 0
(b) Answer: [LA ]W =  0 0 −1 .
0 1 0
Solution: From Section 8.3, we know that

[LA ]W = P −1 AP,

where P = (w1 |w2 |w3 ). Enter P into MATLAB and compute this matrix.

532
§8.4 *Matrices of Linear Maps on a Vector Space

(c) Answer: The matrix [LA ]W fixes e1 , moves e2 to e3 , and moves e3 to −e2 .
Solution: Compute [LA ]W e1 = e1 , [LA ]W e2 = e3 , and [LA ]W e3 = −e2 . This result is consistent
with part (a), since [LA ]W maps the standard basis vectors to one another in the same way that A
maps the basis vectors of W to one another.

533
Chapter 9 Least Squares

9 Least Squares
In Section 9.1 we study the geometric problem of least squares approximations: Given a
point x0 and a subspace W ⊂ Rn , find the point w0 ∈ W closest to x0 . We then use least
squares approximation to discuss regression or least squares fitting of data in Section 9.2.

534
§9.1 Least Squares Approximations

{Chap:LeastSquares}

{S:LSA} 9.1 Least Squares Approximations


Let W ⊂ Rn be a subspace and x0 ∈ Rn be a vector. In this section we solve a basic
geometric problem and investigate some of its consequences. The problem is:

Find a vector w0 ∈ W that is the nearest vector in W to x0 .


{F:nearest} Figure 29: Approximation of x0 by w0 ∈ W by least squares.

The distance between two vectors v and w is ||v − w|| and the geometric problem can be
rephrased as follows: find a vector w0 ∈ W such that

{E:leastsq} ||x0 − w0 || ≤ ||x0 − w|| ∀w ∈ W. (9.1.1)

Condition (9.1.1) is called the least squares approximation. In order to see where this name
comes from, we write (9.1.1) in the equivalent form

||x0 − w0 ||2 ≤ ||x0 − w||2 ∀w ∈ W.

This form means that for w = w0 the sum of the squares of the components of the vector
x0 − w is minimal.
Before continuing, we state and prove the Law of Pythagoras. Let z1 , z2 ∈ Rn be orthogonal
vectors. Then
{E:LP} ||z1 + z2 ||2 = ||z1 ||2 + ||z2 ||2 . (9.1.2)


To verify (9.1.2) calculate

||z1 + z2 ||2 = (z1 + z2 ) · (z1 + z2 )


= z1 · z1 + 2z1 · z2 + z2 · z2
= ||z1 ||2 + 2z1 · z2 + ||z2 ||2 .

Since z1 and z2 are orthogonal, z1 · z2 = 0 and the Law of Pythagoras is valid.

{L:orthoLSA} Using (9.1.1) and (9.1.2), we can rephrase the minimum distance problem as follows.
Lemma 9.1.1. The vector w0 ∈ W is the closest vector to x0 ∈ Rn if the vector x0 − w0 is
orthogonal to every vector in W . (See Figure 29.)

Proof Write x0 − w = z1 + z2 where z1 = x0 − w0 and z2 = w0 − w. By assumption,


x0 − w0 is orthogonal to every vector in W ; since z2 = w0 − w is in W , the vectors z1 and z2 are orthogonal. It follows
from (9.1.2) that
||x0 − w||2 = ||x0 − w0 ||2 + ||w0 − w||2 .
Since ||w0 − w||2 ≥ 0, (9.1.1) is valid, and w0 is the vector nearest to x0 in W . 

Least Squares Distance to a Line Suppose W is as simple a subspace as possible; that is,
suppose W is one dimensional with basis vector w. Since W is one dimensional, a vector
w0 ∈ W that is closest to x0 must be a multiple of w; that is, w0 = aw. Suppose that we
can find a scalar a so that x0 − aw is orthogonal to every vector in W . Then it follows from
Lemma 9.1.1 that w0 is the closest vector in W to x0 . To find a, calculate

0 = (x0 − aw) · w = x0 · w − aw · w.

Then

    a = (x0 · w)/||w||2

and

{E:singleortho}    w0 = ((x0 · w)/||w||2 ) w.    (9.1.3)
Observe that ||w||2 ≠ 0 since w is a basis vector.
For example, let x0 = (1, 2, −1, 3) ∈ R4 and w = (0, 1, 2, 3). Then the vector w0 in the space
spanned by w that is nearest to x0 is

    w0 = (9/14) w
since x0 · w = 9 and ||w||2 = 14.
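
This computation is easy to reproduce in MATLAB; the commands below simply evaluate (9.1.3) for the vectors in this example:

x0 = [1; 2; -1; 3];
w  = [0; 1; 2; 3];
a  = dot(x0,w)/dot(w,w)   % a = 9/14
w0 = a*w                  % the vector in span{w} nearest to x0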

536
§9.1 Least Squares Approximations

Least Squares Distance to a Subspace Similarly, using Lemma 9.1.1 we can solve the general
least squares problem by solving a system of linear equations. Let w1 , . . . , wk be a basis for
W and suppose that
w0 = α1 w1 + · · · + αk wk

{T:nearestvector} for some scalars αi . We now show how to find these scalars.

Theorem 9.1.2. Let x0 ∈ Rn be a vector, and let {w1 , . . . , wk } be a basis for the subspace
W ⊂ Rn . Then
w0 = α1 w1 + · · · + αk wk
is the nearest vector in W to x0 when
 
{E:nearestvector}    (α1 , . . . , αk )t = (At A)−1 At x0 ,    (9.1.4)

where A = (w1 | · · · |wk ) is the n × k matrix whose columns are the basis vectors of W .

Proof Observe that the vector x0 − w0 is orthogonal to every vector in W precisely when
x0 − w0 is orthogonal to each basis vector wj . It follows from Lemma 9.1.1 that w0 is the
closest vector to x0 in W if
(x0 − w0 ) · wj = 0
for every j. That is, if
w 0 · w j = x0 · w j
for every j. These equations can be rewritten as a system of equations in terms of the αi ,
as follows:
w1 · w1 α1 + · · · + w1 · wk αk = w1 · x0
{E:dots} .. (9.1.5)
.
wk · w1 α1 + · · · + wk · wk αk = wk · x0 .

Note that if u, v ∈ Rn are column vectors, then u · v = ut v. Therefore, we can rewrite (9.1.5)
as

    At A (α1 , . . . , αk )t = At x0 ,
where A is the matrix whose columns are the wj and x0 is viewed as a column vector. Note
that the matrix At A is a k × k matrix.


We claim that At A is invertible. To verify this claim, it suffices to show that the null space
of At A is zero; that is, if At Az = 0 for some z ∈ Rk , then z = 0. First, calculate

||Az||2 = Az · Az = (Az)t Az = z t At Az = z t 0 = 0.

It follows that Az = 0. Now if we let z = (z1 , . . . , zk )t , then the equation Az = 0 may be


rewritten as
z1 w1 + · · · + zk wk = 0.
Since the wj are linearly independent, it follows that the zj = 0. In particular, z = 0. Since
At A is invertible, (9.1.4) is valid, and the theorem is proved. 
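
In MATLAB, (9.1.4) translates directly into a few commands. The sketch below uses a basis and a point chosen only for illustration and computes the nearest vector in a two dimensional subspace of R4:

w1 = [1; 0; 1; 0];
w2 = [0; 1; 0; 1];
x0 = [1; 2; 3; 4];
A  = [w1 w2];                 % the columns form a basis for W
alpha = inv(A'*A)*A'*x0       % coordinates of w0 relative to w1, w2
w0 = A*alpha                  % the vector in W nearest to x0
(x0 - w0)'*A                  % both dot products vanish, as Lemma 9.1.1 requires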

Exercises

538
§9.2 Least Squares Fitting of Data

{S:7.6} 9.2 Least Squares Fitting of Data


We begin this section by using the method of least squares to find the best straight line fit
to a set of data. Later in the section we will discuss best fits to other curves.

An Example of Best Linear Fit to Data Suppose that we are given n data points (xi , yi ) for
i = 1, . . . , 10. For example, consider the ten points
(2.0, 0.1) (3.0, 2.7) (1.5, −1.1) (−1.0, −5.5) (0.0, −3.4)
{E:scatterdata} (9.2.1*)
(3.6, 3.0) (0.7, −2.8) (4.1, 4.0) (1.9, −1.9) (5.0, 5.5)
The ten points (xi , yi ) are plotted in Figure 30 using the commands

e9_3_1
plot(X,Y,'o')
axis([-3,7,-8,8])
xlabel('x')
ylabel('y')


{F:linreg} Figure 30: Scatter plot of data in (9.2.1*).

Next, suppose that there is a linear relation between the xi and the yi ; that is, we assume
that there are constants b1 and b2 (that do not depend on i) for which yi = b1 + b2 xi for
each i. But these points are just data; errors may have been made in their measurement.
So we ask: Find b01 and b02 so that the error made in fitting the data to the line y = b01 + b02 x
is minimal, that is, the error that is made in that fit is less than or equal to the error made
in fitting the data to the line y = b1 + b2 x for any other choice of b1 and b2 .

539
§9.2 Least Squares Fitting of Data

We begin by discussing what that error actually is. Given constants b1 and b2 and given a
data point xi , the difference between the data value yi and the hypothesized value b1 + b2 xi
is the error that is made at that data point. Next, we combine the errors made at all of the
data points; a standard way to combine the errors is to use the Euclidean distance
    E(b) = [ (y1 − (b1 + b2 x1 ))2 + · · · + (y10 − (b1 + b2 x10 ))2 ]1/2 .

Rewriting E(b) in vector notation leads to an economy in notation and to a conceptual


advantage. Let

X = (x1 , . . . , x10 )t Y = (y1 , . . . , y10 )t and F1 = (1, 1, . . . , 1)

be vectors in R10 . Then in coordinates


 
y1 − (b1 + b2 x1 )
..
Y − (b1 F1 + b2 X) =  . .
 

y10 − (b1 + b2 x10 )

It follows that
E(b) = ||Y − (b1 F1 + b2 X)||.
The problem of making a least squares fit is to minimize E over all b1 and b2 .
To solve the minimization problem, note that the vectors b1 F1 +b2 X form a two dimensional
subspace W = span{F1 , X} ⊂ R10 (at least when X is not a scalar multiple of F1 , which is
almost always). Minimizing E is identical to finding a vector w0 = b01 F1 + b02 X ∈ W that
is nearest to the vector Y ∈ R10 . This is the least squares question that we solved in the
Section 9.1.
We can use MATLAB to compute the values of b01 and b02 that give the best linear approxi-
mation to Y . If we set the matrix A = (F1 |X), then Theorem 9.1.2 implies that the values
of b01 and b02 are obtained using (9.1.4). In particular, type e10_3_1 to call the vectors X,
Y, F1 into MATLAB, and then type

A = [F1 X];
b0 = inv(A'*A)*A'*Y

to obtain

b0(1) = -3.8597
b0(2) = 1.8845


Superimposing the line y = −3.8597 + 1.8845x on the scatter plot in Figure 30 yields the
plot in Figure 31. The total error is E(b0) = 1.9634 (obtained in MATLAB by typing
norm(Y-(b0(1)*F1+b0(2)*X))). Compare this with the error for the nearby line y = −4 + 2x, which is 2.0928.

{F:linreg2} Figure 31: Scatter plot of data in (9.2.1*) with best linear approximation.

General Linear Regression We can summarize the previous discussion, as follows. Given n
data points
(x1 , y1 ), . . . , (xn , yn );
form the vectors

X = (x1 , . . . , xn )t Y = (y1 , . . . , yn )t and F1 = (1, . . . , 1)t

in Rn . Find constants b01 and b02 so that b01 F1 + b02 X is a vector in W = span{F1 , X} ⊂ Rn
that is nearest to Y . Let
A = (F1 |X)
be the n × 2 matrix. This problem is solved by least squares in (9.1.4) as
{E:LSlinfit}    (b01 , b02 )t = (At A)−1 At Y.    (9.2.2)

Least Squares Fit to a Quadratic Polynomial Suppose that we want to fit the data (xi , yi )
to a quadratic polynomial
y = b1 + b2 x + b3 x2

541
§9.2 Least Squares Fitting of Data

by least squares methods. We want to find constants b01 , b02 , b03 so that the error made in
using the quadratic polynomial y = b01 + b02 x + b03 x2 is minimal among all possible choices
of quadratic polynomials. The least squares error is
 
    E(b) = ||Y − (b1 F1 + b2 X + b3 X (2) )||

where

    X (2) = (x1², . . . , xn²)t
and, as before, F1 is the n vector with all components equal to 1.
We solve the minimization problem as before. In this case, the space of possible approxima-
tions to the data W is three dimensional; indeed, W = span{F1 , X, X (2) }. As in the case
of fits to lines we try to find a point in W that is nearest to the vector Y ∈ Rn . By (9.1.4),
the answer is:
b = (At A)−1 At Y,
where A = (F1 |X|X (2) ) is an n × 3 matrix.
Suppose that we try to fit the data in (9.2.1*) with a quadratic polynomial rather than a
linear one. Use MATLAB as follows

e9_3_1
A = [F1 X X.*X];
b = inv(A'*A)*A'*Y;

to obtain

b(1) = -3.8197
b(2) = 1.7054
b(3) = 0.0443

So the best parabolic fit to this data is y = −3.8197 + 1.7054x + 0.0443x2 . Note that the
coefficient of x2 is small suggesting that the data was well fit by a straight line. Note also
that the error is E(b0) = 1.9098 which is only marginally smaller than the error for the best
linear fit. For comparison, in Figure 32 we superimpose the equation for the quadratic fit
onto Figure 31.

General Least Squares Fit The approximation to a quadratic polynomial shows that least
squares fits can be made to any finite dimensional function space. More precisely, let C be
a finite dimensional space of functions and let
f1 (x), . . . , fm (x)



Figure 32: Scatter plot of data in (9.2.1*) with best linear and quadratic approximations.
{F:linreg3} The best linear fit is plotted with a dashed line.

be a basis for C. We have just considered two such spaces: C = span{f1 (x) = 1, f2 (x) = x}
for linear regression and C = span{f1 (x) = 1, f2 (x) = x, f3 (x) = x2 } for least squares fit to
a quadratic polynomial.
The general least squares fit of a data set

(x1 , y1 ), . . . , (xn , yn )

is the function g0 (x) ∈ C that is nearest to the data set in the following sense. Let

X = (x1 , . . . , xn )t and Y = (y1 , . . . , yn )t

be column vectors in Rn . For any function g(x) define the column vector

G = (g(x1 ), . . . , g(xn ))t ∈ Rn .

So G is the evaluation of g(x) on the data set. Then the error

E(g) = ||Y − G||

is minimal for g = g0 .
More precisely, we think of the data Y as representing the (approximate) evaluation of a
function on the xi . Then we try to find a function g0 ∈ C whose values on the xi are as
near as possible to the vector Y . This is just a least squares problem. Let W ⊂ Rn be the
vector subspace spanned by the evaluations of function g ∈ C on the data points xi , that

543
§9.2 Least Squares Fitting of Data

is, the vectors G. The minimization problem is to find a vector in W that is nearest to Y .
This can be solved in general using (9.1.4). That is, let A be the n × m matrix

A = (F1 | · · · |Fm )

where Fj ∈ Rn is the column vector associated to the j th basis element of C, that is,

Fj = (fj (x1 ), . . . , fj (xn ))t ∈ Rn .

The minimizing function g0 (x) ∈ C is a linear combination of the basis functions


f1 (x), . . . , fm (x), that is,
g0 (x) = b1 f1 (x) + · · · + bm fm (x)
for scalars bi . If we set
b = (b1 , . . . , bm ) ∈ Rm ,
then least squares minimization states that

{E:LSFG}    b = (At A)−1 At Y.    (9.2.3)

This equation can be solved easily in MATLAB. Enter the data as column n-vectors X and Y.
Compute the column vectors Fj = fj (X) and then form the matrix A = [F1 F2 · · · Fm].
Finally compute

b = inv(A'*A)*A'*Y

Least Squares Fit to a Sinusoidal Function We discuss a specific example of the general least
squares formulation by considering the weather. It is reasonable to expect monthly data
on the weather to vary periodically in time with a period of one year. In Table 3 we give
average daily high and low temperatures for each month of the year for Paris and Rio de
Janeiro. We attempt to fit this data with curves of the form:
   
    g(T ) = b1 + b2 cos(2πT /12) + b3 sin(2πT /12),
where T is time measured in months and b1 , b2 , b3 are scalars. These functions are 12
periodic, which seems appropriate for weather data, and form a three dimensional function
space C. Recall the trigonometric identity

a cos(ωt) + c sin(ωt) = d sin(ω(t − ϕ))

where

    d = √(a² + c²).

544
§9.2 Least Squares Fitting of Data

Paris Rio de Janeiro Paris Rio de Janeiro


Month High Low High Low Month High Low High Low
1 55 39 84 73 7 81 64 75 63
2 55 41 85 73 8 81 64 76 64
3 59 45 83 72 9 77 61 75 65
4 64 46 80 69 10 70 54 77 66
5 68 55 77 66 11 63 46 79 68
6 75 61 76 64 12 55 41 82 71

{T:parrio} Table 3: Monthly Average of Daily High and Low Temperatures in Paris and Rio de Janeiro.

Based on this identity we call C the space of sinusoidal functions. The number d is called
the amplitude of the sinusoidal function g(T ).
Note that each data set consists of twelve entries — one for each month. Let T =
(1, 2, . . . , 12)t be the vector X ∈ R12 in the general presentation. Next let Y be the data in
one of the data sets — say the high temperatures in Paris.
Now we turn to the vectors representing basis functions in C. Let

F1=[1 1 1 1 1 1 1 1 1 1 1 1]'

be the vector associated with the basis function f1 (T ) = 1. Let F2 and F3 be the column
vectors associated to the basis functions
   
    f2 (T ) = cos(2πT /12) and f3 (T ) = sin(2πT /12).
These vectors are computed by typing

F2 = cos(2*pi/12*T);
F3 = sin(2*pi/12*T);

By typing temper, we enter the temperatures and the vectors T, F1, F2 and F3 into MATLAB.
To find the best fit to the data by a sinusoidal function g(T ), we use (9.1.4). Let A be the
12 × 3 matrix

A = [F1 F2 F3];

The table data is entered in column vectors ParisH and ParisL for the high and low Paris
temperatures and RioH and RioL for the high and low Rio de Janeiro temperatures. We

545
§9.2 Least Squares Fitting of Data

can find the best least squares fit of the Paris high temperatures by a sinusoidal function
g0 (T ) by typing

b = inv(A'*A)*A'*ParisH

obtaining

b(1) = 66.9167
b(2) = -9.4745
b(3) = -9.3688

The result is plotted in Figure 33 by typing

plot(T,ParisH,'o')
axis([0,13,0,100])
xlabel('time (months)')
ylabel('temperature (Fahrenheit)')
hold on
xx = linspace(0,13);
yy = b(1) + b(2)*cos(2*pi*xx/12) + b(3)*sin(2*pi*xx/12);
plot(xx,yy)

A similar exercise allows us to compute the best approximation to the Rio de Janeiro high
temperatures obtaining

b(1) = 79.0833
b(2) = 3.0877
b(3) = 3.6487

The value of b(1) is just the mean high temperature and not surprisingly that value is much
higher in Rio than in Paris. There is yet more information contained in these approxima-
tions. For the high temperatures in Paris and Rio

dP = 13.3244 and dR = 4.7798.

The amplitude d measures the variation of the high temperature about its mean. It is much
greater in Paris than in Rio, indicating that the difference in temperature between winter
and summer is much greater in Paris than in Rio.
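
Assuming the vectors T, F1, F2, F3 and the temperature data have been entered as above, the amplitudes dP and dR are computed directly from the fitted coefficients b(2) and b(3); for example:

A = [F1 F2 F3];
b = inv(A'*A)*A'*ParisH;       % sinusoidal fit to the Paris high temperatures
dP = sqrt(b(2)^2 + b(3)^2)     % amplitude of the Paris fit, about 13.32
b = inv(A'*A)*A'*RioH;         % repeat for the Rio de Janeiro high temperatures
dR = sqrt(b(2)^2 + b(3)^2)     % amplitude of the Rio fit, about 4.78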



Figure 33: Monthly averages of daily high temperatures in Paris (left) and Rio de Janeiro
{F:ParisH} (right) with best sinusoidal approximation.

Least Squares Fit in MATLAB The general formula for a least squares fit of data (9.2.3) has
been preprogrammed in MATLAB. After setting up the matrix A whose columns are the
vectors Fj just type

b = A\Y

This MATLAB command can be checked on the sinusoidal fit to the high temperature Rio
de Janeiro data by typing

b = A\RioH

and obtaining

b =
79.0833
3.0877
3.6487

547
§9.2 Least Squares Fitting of Data

Exercises

{c7.6.1} 1. (matlab) World population data for each decade of this century (except for 1910) is given in
Table 4. Assume that population growth is linear P = mT + b where time T is measured in decades
since the year 1900 and P is measured in billions of people. This data can be recovered by typing
e9_3_po.

(a) Find m and b to give the best linear fit to this data.
(b) Use this linear approximation to the data to make predictions of the world populations in the
year 1910 and 2000.
(c) Do you expect the prediction for the year 2000 to be high or low or on target? Explain why by
graphing the data with the best linear fit superimposed and by using the differential equation
population model discussed in Section 4.2.

Year Population (in millions) Year Population (in millions)


1900 1625 1950 2516
1910 n.a. 1960 3020
1920 1813 1970 3698
1930 1987 1980 4448
1940 2213 1990 5292

{T:popdata} Table 4: Twentieth Century World Population Data by Decades.

(a) Answer: The best linear fit to the data is obtained with m ≈ 0.4084 and b ≈ 0.9603, where m
and b are measured in billions.
Solution: Create the matrix A whose columns are the all-ones vector F1 and the vector of decades T . Then use (9.1.4) to compute the
best values for m and b.
(b) In 1910, P ≈ 408.4(1) + 960.3 = 1369 million people.
In 2000, P ≈ 408.4(10) + 960.3 = 5044 million people.
(c) Answer: The prediction for 2000 is likely to be low.
Solution: As shown in Figure 1, a linear approximation does not fit the data points well. To
understand why, assume that population change is governed by the differential equation:
dP
= rP
dT
d2 P
where r is constant. Then = r2 P > 0. Then the population curve is concave up, and a linear
dT 2
approximation underestimates the population at the endpoints of the curve.

548
§9.2 Least Squares Fitting of Data


Figure 1

{c7.6.2} 2. (matlab) Find the best sinusoidal approximation to the monthly average low temperatures in
Paris and Rio de Janeiro. How does the variation of these temperatures about the mean compare
to the high temperature calculations? Was this the result you expected?
Answer: A good sinusoidal approximation for the low temperatures in Paris is

    temp(t) = 51.4167 − 9.4908 cos(2πt/12) − 8.4745 sin(2πt/12).

An approximation for the low temperatures in Rio is

    temp(t) = 67.8333 + 3.3987 cos(2πt/12) + 3.5654 sin(2πt/12).

Solution: Use a set of MATLAB commands similar to those shown in Section 9.2 to graph the low
temperatures in Paris (Figure 2a) and Rio (Figure 2b). The variations about the mean temperature
are dP = 12.7237 and dR = 4.9258. (These values can be computed for each set of data by typing
d = sqrt(b(2)^2 + b(3)^2) in MATLAB.) The low temperatures in Paris vary more depending
on the time of year than do the temperatures in Rio, which is consistent with the graphs and with
the values of d for the high temperatures.

{c7.6.3} 3. (matlab) In Table 5 we present weather data from ten U.S. cities. The data is the average
number of days in the year with precipitation and the percentage of sunny hours to hours when it
could be sunny. Find the best linear fit to this data.

549
§9.2 Least Squares Fitting of Data

100 100

90 90

80 80

70 70
Paris temperature (Fahrenheit)

Rio temperature (Fahrenheit)


60 60

50 50

40 40

30 30

20 20

10 10

0 0
0 2 4 6 8 10 12 0 2 4 6 8 10 12
time (months) time (months)

Figure 2a Figure 2b

City Rainy Days Sunny (%) City Rainy Days Sunny (%)
Charleston 92 72 Kansas City 98 59
Chicago 121 54 Miami 114 85
Dallas 82 65 New Orleans 103 61
Denver 82 67 Phoenix 28 88
Duluth 136 52 Salt Lake City 99 59

{T:sunny} Table 5: Precipitation Days Versus Sunny Time for Selected U.S. Cities.

Answer: Let R be the number of days in the year with precipitation and let s be the percentage
of sunny hours to daylight hours. Then the best linear estimate of the relationship between the two
is:
R ≈ 199.2 − 156.6s.

Solution: In MATLAB, enter the data for number of rainy days as the vector R, and then enter the
data for percentage of sunny hours as the vector s1 . Then create the 1 × 10 vector s2 = (1, 1, . . . , 1).
Now, find the best vector b = (b1 , b2 ) such that R = b1 s1 + b2 s2 . This vector can be found using
(9.1.4). The solution vector is (b1 , b2 ) ≈ (−156.6, 199.2). Figure 3 shows the actual data graphed
against the linear estimate.

550
§9.2 Least Squares Fitting of Data

200

180

160

140
number of rainy days per year

120

100

80

60

40

20

0
0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1
percentage of sunny hours to daylight hours

Figure 3

551
Chapter 10 Orthogonality

10 Orthogonality
In Section 10.1 we discuss orthonormal bases (bases in which each basis vector has unit
length and any two basis vectors are perpendicular) and orthogonal matrices (matrices
whose columns form an orthonormal basis). We will see that the computation of coordi-
nates in an orthonormal basis is particularly straightforward. The Gram-Schmidt orthonor-
malization process for constructing orthonormal bases is presented in Section 10.2. We
use orthogonality in Section 10.3 to study the eigenvalues and eigenvectors of symmetric
matrices (the eigenvalues are real and the eigenvectors can be chosen to be orthonormal).
The chapter ends with a discussion of the QR decomposition for finding orthonormal bases
in Section 10.4. This decomposition leads to an algorithm that is numerically superior to
Gram-Schmidt and is the one used in MATLAB.

552
§10.1 Orthonormal Bases and Orthogonal Matrices

{Chap:LinTrans}
{S:orthonormal} 10.1 Orthonormal Bases and Orthogonal Matrices
In Section 8.3 we discussed how to write the coordinates of a vector in a basis. We now
show that finding coordinates of vectors in certain bases is a very simple task — these bases
are called orthonormal bases.
Nonzero vectors v1 , . . . , vk in Rn are orthogonal if the dot products
vi · vj = 0
when i 6= j. The vectors are orthonormal if they are orthogonal and of unit length, that is,
vi · vi = 1.

{L:orthog} The standard example of a set of orthonormal vectors in Rn is the standard basis e1 , . . . , en .
Lemma 10.1.1. Nonzero orthogonal vectors are linearly independent.

Proof Let v1 , . . . , vk be a set of nonzero orthogonal vectors in Rn and suppose that


α1 v1 + · · · + αk vk = 0.
To prove the lemma we must show that each αj = 0. Since vi · vj = 0 for i 6= j,
αj vj · vj = α1 v1 · vj + · · · + αk vk · vj
= (α1 v1 + · · · + αk vk ) · vj = 0 · vj = 0.

Since vj · vj = ||vj ||2 > 0, it follows that αj = 0. 


C:symmetric_distinct}
Corollary 10.1.2. A set of n nonzero orthogonal vectors in Rn is a basis.

Proof Lemma 10.1.1 implies that the n vectors are linearly independent, and Chapter 5,
Corollary 5.6.7 states that n linearly independent vectors in Rn form a basis. 

Next we discuss how to find coordinates of a vector in an orthonormal basis, that is, a basis
{T:orthocoord} consisting of orthonormal vectors.
Theorem 10.1.3. Let V ⊂ Rn be a subspace and let {v1 , . . . , vk } be an orthonormal basis
of V . Let v ∈ V be a vector. Then
v = α1 v1 + · · · + αk vk .
where
αi = v · vi .

553
§10.1 Orthonormal Bases and Orthogonal Matrices

Proof Since {v1 , . . . , vk } is a basis of V , we can write

v = α1 v1 + · · · + αk vk

for some scalars αj . It follows that

v · vj = (α1 v1 + · · · + αk vk ) · vj = αj ,

as claimed. 

An Example in R3 Let
1
v1 = √ (1, 1, 1),
3
1
v2 = √ (1, −2, 1),
6
1
v3 = √ (1, 0, −1).
2
A short calculation verifies that these vectors have unit length and are pairwise orthogonal.
Let v = (1, 2, 3) be a vector and determine the coordinates of v in the basis V = {v1 , v2 , v3 }.
Theorem 10.1.3 states that these coordinates are:
√ 7 √
[v]V = (v · v1 , v · v2 , v · v3 ) = (2 3, √ , − 2).
6

Matrices in Orthonormal Coordinates Next we discuss how to find the matrix associated
with a linear map in an orthonormal basis. Let L : Rn → Rn be a linear map and let
V = {v1 , . . . , vn } be an orthonormal basis for Rn . Then the matrix associated to L in the
basis V can be calculated in terms of dot product. That matrix is:

{e:coordorthomat} [L]V = L(vj ) · vi . (10.1.1)

To verify (10.1.1), recall from Definition 8.3.3 that the (i, j)th entry of [L]V is the ith entry
in the vector [L(vj )]V which is L(vj ) · vi by Theorem 10.1.3.

An Example in R2 Let V = {v1 , v2 } ⊂ R2 where


   
1 1 1 1
v1 = √ and v2 = √ .
2 1 2 −1

554
§10.1 Orthonormal Bases and Orthogonal Matrices

The set V is an orthonormal basis of R2 . Using (10.1.1) we can find the matrix associated
to the linear map  
2 1
LA (x) = x
−1 3
in the basis V. That is, compute
   
Av1 · v1 Av2 · v1 1 5 −3
[L]V = = .
Av1 · v2 Av2 · v2 2 1 5

Orthogonal Matrices
{def:orthmat}
Definition 10.1.4. An n × n matrix Q is orthogonal if its columns form an orthonormal
basis of Rn .

The following lemma states elementary properties of orthogonal matrices. Particularly, an


orthogonal matrix is invertible and it is straightforward to compute its inverse.
{lem:orthprop}
Lemma 10.1.5. Let Q be an n × n matrix. Then

(a) Q is orthogonal if and only if Qt Q = In ;

(b) Q is orthogonal if and only if Q−1 = Qt ;

(c) If Q1 , Q2 are orthogonal matrices, then Q1 Q2 is an orthogonal matrix.

Proof (a) Let Q = (v1 | · · · |vn ). Since Q is orthogonal, the vj form an orthonormal basis.
By direct computation note that Qt Q = {(vi · vj )} = In , since the vj are orthonormal. Note
that (b) is simply a restatement of (a).
(c) Now let Q1 , Q2 be orthogonal. Then (a) implies

(Q1 Q2 )t (Q1 Q2 ) = Qt2 Qt1 Q1 Q2 = Qt2 In Q2 = Qt2 Q2 = In ,

thus proving (c). 

Remarks Concerning MATLAB In the next section we prove that every vector subspace of
Rn has an orthonormal basis (see Theorem 10.2.1), and we present a method for constructing
such a basis (the Gram-Schmidt orthonormalization process). Here we note that certain
commands in MATLAB produce bases for vector spaces. For those commands MATLAB

555
§10.1 Orthonormal Bases and Orthogonal Matrices

always produces an orthonormal basis. For example, null(A) produces a basis for the null
space of A. Take the 3 × 5 matrix
 
1 2 3 4 5
{eq:Anull1} A= 0 1 2 3 4 . (10.1.2*)
2 3 4 0 0
Since rank(A) = 3, it follows that the null space of A is two-dimensional. Typing B =
null(A) in MATLAB produces

B =
-0.4666 0
0.6945 0.4313
-0.2876 -0.3235
0.3581 -0.6470
-0.2984 0.5392

The columns of B form an orthonormal basis for the null space of A. This assertion can be
checked by first typing

v1 = B(:, 1);
v2 = B(:, 2);

and then typing

norm(v1)
norm(v2)
dot(v1,v2)
A*v1
A*v2

yields answers 1, 1, 0, (0, 0, 0)t , (0, 0, 0)t (to within numerical accuracy). Recall that the
MATLAB command norm(v) computes the norm of a vector v.

Exercises

{c7.4.1}
1. Find an orthonormal basis for the solutions to the linear equation
2x1 − x2 + x3 = 0.

556
§10.1 Orthonormal Bases and Orthogonal Matrices

1 1
Answer: The vectors w1 = √ (1, 1, −1) and w2 = √ (0, 1, 1) form an orthonormal basis for the
3 2
solution set.
Solution: Find one vector which is a solution to the equation, for example (1, 1, −1). Then, divide
the vector by its length, obtaining the unit vector w1 . By inspection, find a vector v2 which satisfies
1
both the given equation and w1 · v2 = 0. Then set w2 = v2 .
||v2 ||

{c7.4.2}
2. (a) Find the coordinates of the vector v = (1, 4) in the orthonormal basis V
1 1
v1 = √ (1, 2) and v2 = √ (2, −1).
5 5
 
1 1
(b) Let A = . Find [A]V .
2 −3
(a) By Theorem 10.1.3:
1
[v]V = (v · v1 , v · v2 ) = √ (9, −2).
5
(b) By (10.1.1):    
Av1 · v1 Av2 · v1 −1 3
[A]V = = .
Av1 · v2 Av2 · v2 2 −1

{c7.4.3} 3. (matlab) Load the matrix  


1 2 0
A= 0 1 0 
0 0 0
into MATLAB. Then type the command orth(A). Verify that the result is an orthonormal basis
for the column space of A.

{c7.4.3.B}
4. Show that if P is an n × n orthogonal matrix, then det(P ) = ±1.
Solution: P is orthogonal if and only if P t P = In . Hence

1 = det(In ) = det(P t P ) = det(P t ) det(P ) = det(P )2

since det(P t ) = det(P ). It follows that det(P ) = ±1.

In Exercises 5 – 9 decide whether or not the given matrix is orthogonal.

557
§10.1 Orthonormal Bases and Orthogonal Matrices

{c7.9.1a}
 
2 0
5. .
0 1
Answer: The matrix is not orthogonal.
Solution: By Lemma 10.1.5, a matrix A is orthogonal if and only if At A = In .
    
2 0 2 0 4 0
= 6= I2 .
0 1 0 1 0 1
{c7.9.1b}
 
0 1 0
6.  0 0 1 .
1 0 0
Answer: The matrix is orthogonal, since
  
0 0 1 0 1 0
 1 0 0  0 0 1  = I3 .
0 1 0 1 0 0
{c7.9.1c}
 
0 −1 0
7.  0 0 1 .
−1 0 0
Answer: The matrix is orthogonal, since
  
0 0 −1 0 1 0
 −1 0 0  0 0 −1  = I3 .
0 1 0 −1 0 0
{c7.9.1d}
 
cos(1) − sin(1)
8. .
sin(1) cos(1)
Answer: The matrix is orthogonal, since
  
cos(1) sin(1) cos(1) − sin(1)
= I2 .
− sin(1) cos(1) sin(1) cos(1)
{c7.9.1e}
 
1 0 4
9. .
0 1 0
Answer: The matrix is not orthogonal, since all orthogonal matrices are square.

{c7.9.45}
10. Prove that the rows of an n × n orthogonal matrix form an orthonormal basis for Rn .

558
§10.1 Orthonormal Bases and Orthogonal Matrices

Solution: Let A be an orthogonal matrix. By Definition 10.1.4, the columns of A form an or-
thonormal basis for Rn . We must show that the rows of A also form an orthonormal basis for Rn .
By Lemma 10.1.5(b), At = A−1 . From this, we can show

In = AA−1 = AAt = (At )t (At ).

Thus, by Lemma 10.1.5(a), At is an orthogonal matrix; so the columns of At , which are the rows
of A, form an orthonormal basis for Rn .

{mc.exercise10}
11. Show that if P is an n × n orthogonal matrix, then det(P ) = ±1.
Solution: By definition, an n×n matrix P is orthogonal if P t P = In . Properties of the determinant
imply
1 = det(In ) = det(P t P ) = det(P t ) det(P ) = det(P ) det(P ) = det(P )2 .
It follows that det(P )2 = 1 and hence det(P ) = ±1.

559
§10.2 Gram-Schmidt Orthonormalization Process

{S:GSO} 10.2 Gram-Schmidt Orthonormalization Process


Suppose that W = {w1 , . . . , wk } is a basis for the subspace V ⊂ Rn . There is a natural
process by which the W basis can be transformed into an orthonormal basis V of V . This
process proceeds inductively on the wj ; the orthonormal vectors v1 , . . . , vk can be chosen so
that
span{v1 , . . . , vj } = span{w1 , . . . , wj }

for each j ≤ k. Moreover, the vj are chosen using the theory of least squares that we have
just discussed.

The Case j = 2 To gain a feeling for how the induction process works, we verify the case
j = 2. Set
1
{E:ortho1} v1 = w1 ; (10.2.1)
||w1 ||
so v1 points in the same direction as w1 and has unit length, that is, v1 · v1 = 1. The
normalization is shown in Figure 34.
,
v2

w2

v2

v w0 = (w2 ,v1 )v1


1

w1

{F:gram} Figure 34: Planar illustration of Gram-Schmidt orthonormalization.

Next, we find a unit length vector v20 in the plane spanned by w1 and w2 that is perpendicular
to v1 . Let w0 be the vector on the line generated by v1 that is nearest to w2 . It follows
from (9.1.3) that
w2 · v1
w0 = v1 = (w2 · v1 )v1 .
||v1 ||2

560
§10.2 Gram-Schmidt Orthonormalization Process

The vector w0 is shown on Figure 34 and, as Lemma 9.1.1 states, the vector v20 = w2 − w0
is perpendicular to v1 . That is,

{E:ortho2} v20 = w2 − (w2 · v1 )v1 (10.2.2)

is orthogonal to v1 .
Finally, set
1 0
{E:ortho3} v2 = v (10.2.3)
||v20 || 2
so that v2 has unit length. Since v2 and v20 point in the same direction, v1 and v2 are
orthogonal. Note also that v1 and v2 are linear combinations of w1 and w2 . Since v1 and
v2 are orthogonal, they are linearly independent. It follows that

span{v1 , v2 } = span{w1 , w2 }.

In summary: computing v1 and v2 using (10.2.1), (10.2.2) and (10.2.3) yields an orthonormal
basis for the plane spanned by w1 and w2 .

The General Case

{T:orthobasis} Theorem 10.2.1. (Gram-Schmidt Orthonormalization) Let w1 , . . . , wk be a basis for the


subspace W ⊂ Rn . Define v1 as in (10.2.1) and then define inductively

{E:inductiveGS} 0
vj+1 = wj+1 − [(wj+1 · v1 )v1 + · · · + (wj+1 · vj )vj ] (10.2.4)
1
{eq:gsnormal} vj+1 = 0 v0 . (10.2.5)
||vj+1 || j+1

Then span{v1 , . . . , vj } = span{w1 , . . . , wj } and v1 , . . . , vk is an orthonormal basis of W .

Proof We assume that we have constructed orthonormal vectors v1 , . . . , vj such that

span{v1 , . . . , vj } = span{w1 , . . . , wj }.

Our purpose is to find a unit vector vj+1 that is orthogonal to each vi and that satisfies

span{v1 , . . . , vj+1 } = span{w1 , . . . , wj+1 }.

We construct vj+1 in two steps. First we find a vector vj+1


0
that is orthogonal to each of
the vi using least squares. Let w0 be the vector in span{v1 , . . . , vj } that is nearest to wj+1 .

561
§10.2 Gram-Schmidt Orthonormalization Process

Theorem 9.1.2 tells us how to make this construction. Let A be the matrix whose columns
are v1 , . . . , vj . Then (9.1.4) states that the coordinates of w0 in the vi basis is given by
(At A)−1 At wj+1 . But since the vi ’s are orthonormal, the matrix At A is just Ik . Hence
w0 = (wj+1 · v1 )v1 + · · · + (wj+1 · vj )vj .
Note that 0
vj+1= wj+1 −w0 is the vector defined in (10.2.4). We claim that vj+1 0
= wj+1 −w0
is orthogonal to vk for k ≤ j and hence to every vector in span{v1 , . . . , vj }. Just calculate
0
vj+1 · vk = wj+1 · vk − w0 · vk = wj+1 · vk − wj+1 · vk = 0.
Define vj+1 as in (10.2.5). It follows that v1 , . . . , vj+1 are orthonormal and that each vector
is a linear combination of w1 , . . . , wj+1 . 

An Example of Orthonormalization Let W ⊂ R4 be the subspace spanned by the vectors


{eq:gsexam} w1 = (1, 0, −1, 0), w2 = (2, −1, 0, 1), w3 = (0, 0, −2, 1). (10.2.6)
We find an orthonormal basis for W using Gram-Schmidt orthonormalization.

Step 1: Set
1 1
v1 = w1 = √ (1, 0, −1, 0).
||w1 || 2
Step 2: Following the Gram-Schmidt process, use (10.2.4) to define
v20 = w2 − (w2 · v1 )v1
√ 1
= (2, −1, 0, 1) − 2 √ (1, 0, −1, 0)
2
= (1, −1, 1, 1).
Normalization using (10.2.5) yields
1 0 1
v2 = v = (1, −1, 1, 1).
||v20 || 2 2
Step 3: Using (10.2.4) set
v30 = w3 − (w3 · v1 )v1 − (w3 · v2 )v2
√ 1
= (0, 0, −2, 1) − 2 √ (1, 0, −1, 0) −
2
 
1 1
− (1, −1, 1, 1)
2 2
1
= (−3, −1, −3, 5).
4

562
§10.2 Gram-Schmidt Orthonormalization Process

Normalization using (10.2.5) yields

1 0 4
v3 = v = √ (−3, −1, −3, 5).
||v30 || 3 44

Hence we have constructed an orthonormal basis {v1 , v2 , v3 } for W , namely

1
v1 = √ (1, 0, −1, 0)
2
≈ (0.7071, 0, −0.7071, 0)
1
v2 = (1, −1, 1, 1)
{eq:gsoresult} 2 (10.2.7)
= (0.5, −0.5, 0.5, 0.5)
4
v3 = √ (−3, −1, −3, 5)
44
≈ (−0.4523, −0.1508, −0.4523, 0.7538)

Exercises

{c7.5.1}
1. Find an orthonormal basis of R2 by applying Gram-Schmidt orthonormalization to the vectors
w1 = (3, 4) and w2 = (1, 5).
1 1
Answer: The vectors v1 = (3, 4) and v2 = (−4, 3) form an orthonormal basis for R2 .
5 5
Solution: Find these vectors using Gram-Schmidt orthonormalization (Theorem 10.2.1). More
specifically, calculate v1 and v2 such that:
1 1
v1 = w1 = (3, 4).
||w1 || 5
23 1
v20 = w2 − (w2 · v1 )v1 = (1, 5) − (3, 4) = (−44, 33).
 25
 25
1 0 5 1 1
v2 = v2 = (−44, 33) = (−4, 3).
||v20 || 11 25 5

{c7.5.2}
2. Find an orthonormal basis of the plane W ⊂ R3 spanned by the vectors w1 = (1, 2, 3) and
w2 = (2, 5, −1) by applying Gram-Schmidt orthonormalization.
1 1
Answer: The vectors v1 = √ (1, 2, 3) and v2 = √ (19, 52, −41) form an orthonormal basis
14 4746
for W .

563
§10.2 Gram-Schmidt Orthonormalization Process

Solution: Use Gram-Schmidt orthonormalization (Theorem 10.2.1) to calculate v1 and v2 such


that:
1 1
v1 = w1 = √ (1, 2, 3).
||w1 || 14
0 9 1
v2 = w2 − (w2 · v1 )v1 = (2, 5, −1) − (1, 2, 3) = (19, 52, −41).
 14  14
1 0 14 1 1
v2 = v2 = √ (19, 52, −41) = √ (19, 52, −41).
||v20 || 4746 14 4746
{c7.5.3}
3. Let W = {w1 , . . . , wk } be an orthonormal basis of the subspace W ⊂ Rn . Prove that W can be
extended to an orthonormal basis {w1 , . . . , wn } of Rn .
By Corollary 5.6.5, a linearly independent subset of Rn can be extended to form a basis for Rn by
adding linearly independent vectors vk+1 , . . . , vn which are not in the span of {w1 , . . . , wk }. Then,
Gram-Schmidt orthonormalization (Theorem 10.2.1) can be used to form the orthonormal basis
{w1 , . . . , wn } from the basis {w1 , . . . , wk , vk+1 , . . . , vn }.

{c7.5.4} 4. (matlab) Use Gram-Schmidt orthonormalization to find an orthonormal basis for the subspace
of R5 spanned by the vectors
{MATLAB:61} w1 = (2, 1, 3, 5, 7) w2 = (2, −1, 5, 2, 3) and w3 = (10, 1, −23, 2, 3). (10.2.8*)
Extend this basis to an orthonormal basis of R . 5

Answer: An orthonormal basis for the subspace spanned by w1, w2, and w3 is:

v1 = (0.2132, 0.1066, 0.3198, 0.5330, 0.7462)


v2 = (0.2236, -0.3927, 0.8399, -0.1978, -0.2265)
v3 = (0.8452, -0.3542, -0.3979, -0.0409, 0.0088)

Adding the vectors

v4 = (0.4340, 0.8222, 0.1827, 0, -0.3198)


v5 = (0.0424, 0.1813, 0.0251, -0.8217, 0.5381)

to this basis forms an orthonormal basis for R5 .


Solution: The following set of MATLAB commands computes v1, v2, and v3 by Gram-Schmidt
orthonormalization:

v1 = 1/norm(w1)*w1;
v2p = w2 - dot(w2,v1)*v1;
v2 = 1/norm(v2p)*v2p;
v3p = w3 - dot(w3,v1)*v1 - dot(w3,v2)*v2;
v3 = 1/norm(v3p)*v3p;

564
§10.2 Gram-Schmidt Orthonormalization Process

We can verify this result by creating the matrix A whose columns are w1', w2', and w3', then using
the command [Q R] = qr(A,0).
We extend the basis by finding a vector w4 which is orthogonal to w1, w2, and w3. That is,
solve the linear system    
w1 0
0
 w2  w4 =  0  .
w3 0
One possible solution is w4 = (19, 36, 8, 0, −14). Then, find the unit vector v4 in the direction of
w4. Next, add a fifth basis vector by finding the vector w5 which is orthogonal to {w1,w2,w3,w4}.
In this case, w5 = (22, 94, 13, −426, 279). Normalize this vector to obtain v5.

565
§10.3 The Spectral Theory of Symmetric Matrices

{S:symmetric} 10.3 The Spectral Theory of Symmetric Matrices


Eigenvalues and eigenvectors of symmetric matrices have remarkable properties that can be
{T:symmetricmat} summarized in three theorems.

T:symmetricmatvector} Theorem 10.3.1. Let A be a symmetric matrix. Then every eigenvalue of A is real.
Theorem 10.3.2. Let A be an n × n symmetric matrix. Then there is an orthonormal
{T:symmetricmatdiag} basis of R consisting of eigenvectors of A.
n

Theorem 10.3.3. For each n × n symmetric matrix A, there exists an orthogonal matrix
P such that P t AP is a diagonal matrix.

The proof of Theorem 10.3.1 uses the Hermitian inner product — a generalization of dot
product to complex vectors.

Hermitian Inner Products Let v, w ∈ Cn be two complex n-vectors. Define

hv, wi = v1 w1 + · · · + vn wn .

Note that the coordinates wi of the second vector enter this formula with a complex conju-
gate. However, if v and w are real vectors, then

hv, wi = v · w.

An alternative notation for the Hermitian inner product is given by matrix multiplication.
Suppose that v and w are column n-vectors. Then

hv, wi = v t w.

The properties of the Hermitian inner product are similar to those of dot product. We note
three. Let c ∈ C be a complex scalar. Then

hv, vi = ||v||2 ≥ 0
hcv, wi = chv, wi
hv, cwi = chv, wi

Note the complex conjugation of the complex scalar c in the previous formula.
Let C be a complex n × n matrix. Then the most important observation concerning Her-
mitian inner products that we shall use is:

{e:hermite_matrix} (10.3.1)
t
hCv, wi = hv, C wi.

566
§10.3 The Spectral Theory of Symmetric Matrices

This fact is verified by calculating


t t
hCv, wi = (Cv)t w = (v t C t )w = v t (C t w) = v t (C w) = hv, C wi.

So if A is a n × n real symmetric matrix, then

{e:symminv} hAv, wi = hv, Awi, (10.3.2)

since A = At = A.
t

Proof of Theorem 10.3.1 Let λ be an eigenvalue of A and let v be the associated


eigenvector. Since Av = λv we can use (10.3.2) to compute

λhv, vi = hAv, vi = hv, Avi = λhv, vi.

Since hv, vi = ||v||2 > 0, it follows that λ = λ and λ is real. 

Proof of Theorem 10.3.2 Let A be a real symmetric n×n matrix. We show that there
is an orthonormal basis of Rn consisting of eigenvectors of A. The proof follows directly
from Corollary 10.1.2 if the eigenvalues are distinct.
If some of the eigenvalues are multiple, the proof is more complicated and uses Gram-
Schmidt orthonormalization. The proof proceeds inductively on n. The theorem is trivially
valid for n = 1; so we assume that it is valid for n − 1.
Theorem 7.2.4 of Chapter 7 implies that A has an eigenvalue λ1 and Theorem 10.3.1 states
that this eigenvalue is real. Let v1 be a unit length eigenvector corresponding to the eigen-
value λ1 . Extend v1 to an orthonormal basis v1 , w2 , . . . , wn of Rn and let P = (v1 |w2 | · · · |wn )
be the matrix whose columns are the vectors in this orthonormal basis. Orthonormality and
direct multiplication implies that
{e:orthosym} P t P = In . (10.3.3)
Therefore P is invertible; indeed P −1 = P t . Next, let

B = P −1 AP.

By direct computation

Be1 = P −1 AP e1 = P −1 Av1 = λ1 P −1 v1 = λ1 e1 .

It follows that that B has the form


 
λ1 ∗
B=
0 C

567
§10.3 The Spectral Theory of Symmetric Matrices

where C is an (n − 1) × (n − 1) matrix. Since P −1 = P t , it follows that B is a symmetric


matrix; to verify this point compute

B t = (P t AP )t = P t At (P t )t = P t AP = B.

It follows that  
λ1 0
B=
0 C
where C is a symmetric matrix. By induction we can use the Gram-Schmidt orthonor-
malization process to choose an orthonormal basis z2 , . . . , zn in {0} × Rn−1 consisting of
eigenvectors of C. It follows that e1 , z2 , . . . , zn is an orthonormal basis for Rn consisting of
eigenvectors of B.
Finally, let vj = P −1 zj for j = 2, . . . , n. Since v1 = P −1 e1 , it follows that v1 , v2 , . . . , vn
is a basis of Rn consisting of eigenvectors of A. We need only show that the vj form an
orthonormal basis of Rn . This is done using (10.3.2). For notational convenience let z1 = e1
and compute

hvi , vj i = hP −1 zi , P −1 zj i = hP t zi , P t zj i
= hzi , P P t zj i = hzi , zj i,

since P P t = In . Thus the vectors vj form an orthonormal basis since the vectors zj form
an orthonormal basis. 

Proof of Theorem 10.3.3 As a consequence of Theorem 10.3.2, let V = {v1 , . . . , vn }


be an orthonormal basis for Rn consisting of eigenvectors of A. Indeed, suppose

Avj = λj vj

where λj ∈ R. Note that 


λj i=j
Avj · vi =
0 i 6= j
It follows from (10.1.1) that
 
λ1 0
..
[A]V =  .
 

0 λn
is a diagonal matrix. So every symmetric matrix A is similar by an orthogonal matrix P to
a diagonal matrix where P is the matrix whose columns are the eigenvectors of A; namely,
P = [v1 | · · · |vn ]. 

568
§10.3 The Spectral Theory of Symmetric Matrices

Exercises

{c7.7.1}
1. Let  
a b
A=
b d
be the general real 2 × 2 symmetric matrix.

(a) The discriminant of the characteristic polynomial of a 2 × 2 matrix is


D = tr(A)2 − 4 det(A).
Use the discriminant to show that A has real eigenvalues.
(b) Show that A has equal eigenvalues only if A is a scalar multiple of I2 .

Solution: (a) We can calculate the discriminant D of matrix A using (4.6.10):


D = tr(A)2 − 4 det(A) = (a + d)2 − 4(ad − b2 ) = a2 + 2ad + b2 − 4ad + 4b2 = (a − d)2 + 4b2 .
Therefore, D ≥ 0 for all real symmetric matrices A. The eigenvalues of A are
√ √
(a + d) + D (a + d) − D
λ1 = and λ2 = .
2 2
Thus, λ1 and λ2 are real since D is non-negative.
(b) Matrix A has equal eigenvalues only if D = 0. According to the computation in (a) of this
problem, D = 0 only if a = d and b = 0. Therefore, if λ1 = λ2 , then A is a multiple of I2 .
{c7.7.2}
2. Let  
1 2
A= .
2 −2
Find the eigenvalues and eigenvectors of A and verify that the eigenvectors are orthogonal.
Answer: The eigenvalues of A are λ1 = 2 and λ2 = −3, with respective eigenvectors v1 = (2, 1)
and v2 = (1, −2).
Solution: Indeed, v1 · v2 = (2, 1) · (1, −2) = 0, so the eigenvectors are orthogonal.
{c7.9.2}
3. Let Q be an orthogonal n × n matrix. Show that Q preserves the length of vectors, that is
kQvk = kvk for all v ∈ Rn .

Solution: For this proof, we use the fact that, if C is a complex matrix, then (Cv) · w = v · (C w).
t

See (10.3.1). In particular, since Q is a real matrix, (Qv) · w = v · (Qt w). Therefore, since Q is
orthogonal:
||Qv||2 = (Qv) · (Qv) = (Qt Qv) · v = (In v) · v = v · v = ||v||2 .

569
§10.3 The Spectral Theory of Symmetric Matrices

{c10.3.10}
4. Let S2 be the set of real 2 × 2 symmetric matrices and let P be a 2 × 2 orthogonal matrix.

(a) Verify that S2 is a 3-dimensional vector space.


(b) Verify that the map MP : S2 → S2 defined by
MP (A) = P t AP
is linear.
(c) Let P be the reflection matrix  
0 −1
{e:reflection_matrix} P = (10.3.4)
1 0
Verify that P is an orthogonal matrix and compute the eigenvalues and eigenvectors of MP .

Solution:

(a) Symmetric matrices are closed under addition and scalar multiplication. Therefore, S2 is a
vector space. Let
     
1 0 0 1 0 0
{e:sym_mat_base} E1 = E2 = E3 = (10.3.5)
0 0 1 0 0 1
Observe that  
a b
= aE1 + bE2 + cE3 .
b c
Hence, the vectors E = {E1 , E2 , E3 } span S2 and are linearly independent. Therefore, E is a
basis for S2 and dim S2 = 3.
(b) To verify linearity, compute
MP (A + B) = P t (A + B)P = P t (AP + BP ) = P t AP + P t BP = MP (A) + MP (B)
for all A, B ∈ S2 and
MP (cA) = P t (cA)P = cP t AP = cMP (A)
for all c ∈ R.
(c) Compute the matrix of MP in the basis given in (10.3.5), where P is the reflection matrix in
(10.3.4). More precisely, compute
MP (E1 ) = P t E1 P
   
0 1 1 0 0 −1
=
−1 0 0 0 1 0
  
0 1 0 −1
=
−1 0 0 0
 
0 0
=
0 1
= E3

570
§10.3 The Spectral Theory of Symmetric Matrices

It follows that 
0
[MP (E1 )]E =  0 
1
Similarly,

MP (E2 ) = −E2
MP (E3 ) = E1

The 3 × 3 matrix of MP in the basis E is

[MP ]E = [[MP (E1 )]E |[MP (E2 )]E |[MP (E3 )]E ]
 
0 0 1
=  0 −1 0 
1 0 0

Compute the characteristic polynomial of [MP ]E as


 
−λ 0 1
p(λ) = det  0 −1 − λ 0  = (1 − λ2 )(1 + λ)
1 0 −λ

Hence the eigenvalues of MP are −1, −1, 1. The eigenvectors in R3 associated with the eigen-
value −1 is the null space of  
1 0 1
 0 0 0 .
1 0 1
That space is generated by the vectors (1, 0, −1)t and (0, 1, 0)t . These vectors correspond to
the symmetric matrices E1 − E3 and E2 . The eigenvector corresponding to the eigenvalue 1
is the null space of the matrix  
−1 0 1
 0 −2 0 .
1 0 −1
That vector is (1, 0, 1)t and corresponds with the symmetric matrix E1 − E3 .

In Exercises 5 – 7 compute the eigenvalues and the eigenvectors of the 2 × 2 matrix. Then load
the matrix into the program map in MATLAB and iterate. That is, choose an initial vector v0 and
use map to compute v1 = Av0 , v2 = Av1 , …. How does the result of iteration compare with the
eigenvectors and eigenvalues that you have found? Hint: You may find it convenient to use the
feature Rescale in the MAP Options. Then the norm of the vectors is rescaled to 1 after each use of
the command Map and the vectors vj will not escape from the viewing screen.

571
§10.3 The Spectral Theory of Symmetric Matrices

 
1 3
5. (matlab) A =
{exer:powita} 3 1
Answer: The eigenvectors of A are w1 = (1, 1) and w2 = (1, −1) with eigenvalues λ1 = 4 and
λ2 = −2.
 
a b
Solution: Note that any matrix of the form has eigenvectors w1 and w2 with eigenvalues
b a
λ1 = a + b and λ2 = a − b. By iterating using map, we see that vj approaches a multiple of (1, 1) as
j increases for v0 6= (1, −1). If v0 is a multiple of (1, −1), then vj is a multiple of (1, −1) for all j.
In Exercises 5 – 7 let u1 be the eigenvector of matrix associated to the eigenvalue λ1 where
|λ1 | > |λ2 |. These exercises demonstrate that vj approaches the direction of u1 as j increases when
v0 is not a scalar multiple of u2 .
 
11 9
6. (matlab) B =
{exer:powitb} 9 11
Answer: The eigenvectors of B are w1 = (1, 1) and w2 = (1, −1) with eigenvalues λ1 = 20 and
λ2 = 2.
Solution: See solution to Exercise 5.
 
0.005 −2.005
7. (matlab) C =
{exer:powitc} −2.005 0.005
Answer: The eigenvectors of C are w1 = (1, 1) and w2 = (1, −1) with eigenvalues λ1 = −2 and
λ2 = 2.01.
Solution: See comment after the solution to Exercise 5. By iterating using map, we see that vj
approaches a multiple of (1, −1) as j increases for v0 6= (1, 1). If v0 is a multiple of (1, 1), then vj
is a multiple of (1, 1) for all j.

{c7.7.3} 8. (matlab) Perform


 the
 same computational experiment as described in Exercises 5 – 7 using
0 2
the matrix A = and the program map. How do your results differ from the results in
2 0
those exercises and why?
In this case, for any vector v0 = (x, y), the result of the iteration is v1 = Av0 = 2(y, x). That is,
any vector multiplied by A is reflected across the line y = x and doubled in length. The result is
different from that in Exercises 5 – 7 because the eigenvalues are λ1 = 2 and λ2 = −2, so |λ1 | = |λ2 |.

572
§10.4 *QR Decompositions

{S:QR} 10.4 *QR Decompositions


In this section we describe an alternative approach to Gram-Schmidt orthonormalization
for constructing an orthonormal basis of a subspace W ⊂ Rn . This method is called the
QR decomposition and is numerically superior to Gram-Schmidt. Indeed, the QR decom-
position is the method used by MATLAB to compute orthonormal bases. To discuss this
decomposition we need to introduce a new type of matrices, the orthogonal matrices.

Reflections Across Hyperplanes: Householder Matrices Useful examples of orthogonal ma-


trices are reflections across hyperplanes. An n − 1 dimensional subspace of Rn is called a
hyperplane. Let V be a hyperplane and let u be a nonzero vector normal to V . Then a
reflection across V is a linear map H : Rn → Rn such that

(a) Hv = v for all v ∈ V .

(b) Hu = −u.

We claim that the matrix of a reflection across a hyperplane is orthogonal and there is a
simple formula for that matrix.
{D:Householder}
Definition 10.4.1. A Householder matrix is an n × n matrix of the form
2
{eq:householder} H = In − uut (10.4.1)
ut u
where u ∈ Rn is a nonzero vector. .

This definition makes sense since ut u = ||u||2 is a number while the product uut is an n × n
matrix.

Lemma 10.4.2. Let u ∈ Rn be a nonzero vector and let V be the hyperplane orthogonal to
u. Then the Householder matrix H is a reflection across V and is orthogonal.

Proof By definition every vector v ∈ V satisfies ut v = u · v = 0. Therefore,

2
Hv = v − uut v = v,
ut u
and
2
Hu = u − uut u = u − 2u = −u.
ut u

573
§10.4 *QR Decompositions

Hence H is a reflection across the hyperplane V . It also follows that H 2 = In since H 2 v =


H(Hv) = Hv = v for all v ∈ V and H 2 u = H(−u) = u. So H 2 acts like the identity on a
basis of Rn and H 2 = In .
To show that H is orthogonal, we first calculate

2 2
H t = Int − t
(uut )t = In − t uut = H.
uu uu

Therefore In = HH = HH t and H t = H −1 . Now apply Lemma 10.1.5(b). 

QR Decompositions The Gram-Schmidt process is not used in practice to find orthonor-


mal bases as there are other techniques available that are preferable for orthogonalization on
a computer. One such procedure for the construction of an orthonormal basis is based on QR
decompositions using Householder transformations. This method is the one implemented in
MATLAB .
An n × k matrix R = {rij } is upper triangular if rij = 0 whenever i > j.
{qr-Def}
Definition 10.4.3. An n × k matrix A has a QR decomposition if

{eq:qrdecom} A = QR. (10.4.2)

where Q is an n × n orthogonal matrix and R is an n × k upper triangular matrix R.

QR decompositions can be used to find orthonormal bases as follows. Suppose that W =


{w1 , . . . , wk } is a basis for the subspace W ⊂ Rn . Then define the n × k matrix A which
has the wj as columns, that is
A = (w1t | · · · |wkt ).

Suppose that A = QR is a QR decomposition. Since Q is orthogonal, the columns of Q are


orthonormal. So write
Q = (v1t | · · · |vnt ).

On taking transposes we arrive at the equation At = Rt Qt :


 
  r11 0 ··· 0 ··· 0  
w1 v1
 ..  
 r12 r22 ··· 0 ··· 0 
..  .
 .  =  .. .. .. .. . 

 . . . .

··· ··· 
wk vn
r1k r2k ··· rkk ··· 0

574
§10.4 *QR Decompositions

By equating rows in this matrix equation we arrive at the system


w1 = r11 v1
w2 = r12 v1 + r22 v2
{eq:wrv} .. (10.4.3)
.
wk = r1k v1 + r2k v2 + · · · + rkk vk .
It now follows that the W = span{v1 , . . . , vk } and that {v1 , . . . , vk } is an orthonormal basis
{prop:qrdec} for W . We have proved:
Proposition 10.4.4. Suppose that there exist an orthogonal n × n matrix Q and an upper
triangular n × k matrix R such that the n × k matrix A has a QR decomposition
A = QR.
Then the first k columns v1 , . . . , vk of the matrix Q form an orthonormal basis of the subspace
W = span{w1 , . . . , wk }, where the wj are the columns of A. Moreover, rij = vi · wj is the
coordinate of wj in the orthonormal basis.

Conversely, we can also write down a QR decomposition for a matrix A, if we have computed
an orthonormal basis for the columns of A. Indeed, using the Gram-Schmidt process,
Theorem 10.2.1, we have shown that QR decompositions always exist. In the remainder
of this section we discuss a different way for finding QR decompositions using Householder
matrices.

Construction of a QR Decomposition Using Householder Matrices The QR decomposition by


{prop:Housej} Householder transformations is based on the following observation :
Proposition 10.4.5. Let z = (z1 , . . . , zn ) ∈ Rn be nonzero and let
q
r = zj2 + · · · + zn2 .

Define u = (u1 , . . . , un ) ∈ Rn by
   
u1 0
 .. ..
 . .
  
  
   
 uj−1   0 
   
 uj  =  zj − r .
   
 uj+1   zj+1 
 . ..
   
 .. .
  
  
un zn

575
§10.4 *QR Decompositions

Then
2ut z = ut u
and  
z1
 ..
 .


 
 zj−1 
{eq:Housej} (10.4.4)
 
 r
Hz =  

 0 
 .
 
 ..


0
2
holds for the Householder matrix H = In − uut .
ut u

Proof Begin by computing

ut z 2
= uj zj + zj+1 + · · · + zn2
= zj2 − rzj + zj+1
2
+ · · · + zn2
= −rzj + r2 .

Next, compute

ut u = (zj − r)(zj − r) + zj+1


2
+ · · · + zn2
= zj2 − 2rzj + r2 + zj+1
2
+ · · · + zn2
= 2(−rzj + r2 ).

Hence 2ut z = ut u, as claimed.


Note that z − u is the vector on the right hand side of (10.4.4). So, compute

2ut z
 
2 t
Hz = In − t uu z = z − t u = z − u
uu uu
to see that (10.4.4) is valid. 

An inspection of the proof of Proposition 10.4.5 shows that we could have chosen

uj = zj + r

instead of uj = zj − r. Therefore, the choice of H is not unique.

576
§10.4 *QR Decompositions

Proposition 10.4.5 allows us to determine inductively a QR decomposition of the matrix

A = (w10 | · · · |wk0 ),

where each wj0 ∈ Rn . So, A is an n × k matrix and k ≤ n.


First, set z = w10 and use Proposition 10.4.5 to construct the Householder matrix H1 such
that  
r11
 0 
H1 w10 =  .  ≡ r1 .
 .. 
 

0
Then the matrix A1 = H1 A can be written as

A1 = (r1 |w21 | · · · |wk1 ),

where wj1 = H1 wj0 for j = 2, . . . , k.


Second, set z = w21 in Proposition 10.4.5 and construct the Householder matrix H2 such
that  
r12
 r22 
 
H2 w2 =  0  ≡ r2 .
1  
 .. 
 . 
0
Then the matrix A2 = H2 A1 = H2 H1 A can be written as

A2 = (r1 |r2 |w32 | · · · |wk2 )

where wj2 = H2 wj1 for j = 3, . . . , k. Observe that the 1st column r1 is not affected by the
matrix multiplication, since H2 leaves the first component of a vector unchanged.
Proceeding inductively, in the ith step, set z = wii−1 and use Proposition 10.4.5 to construct
the Householder matrix Hi such that:
 
r1i
 .. 
 . 
 
 rii 
Hi wii−1 = 
 0  ≡ ri

 . 
 
 .. 
0

577
§10.4 *QR Decompositions

and the matrix Ai = Hi Ai−1 = Hi · · · H1 A can be written as


i
Ai = (r1 | · · · |ri |wi+1 | · · · |wki ),

where wi2 = Hi wji−1 for j = i + 1, . . . , k.


After k steps we arrive at
Hk · · · H1 A = R,
where R = (r1 | · · · |rk ) is an upper triangular n × k matrix. Since the Householder matrices
H1 , . . . , Hk are orthogonal, it follows from Lemma 10.1.5(c) that the Qt = Hk · · · H1 is
orthogonal. Thus, A = QR is a QR decomposition of A.

Orthonormalization with MATLAB Given a set w1 , . . . , wk of linearly independent vectors


in Rn the MATLAB command qr allows us to compute an orthonormal basis of the spanning
set of these vectors. As mentioned earlier, the underlying technique MATLAB uses for the
computation of the QR decomposition is based on Householder transformations.
The syntax of the QR decomposition in MATLAB is quite simple. For example, let w1 =
(1, 0, −1, 0), w2 = (2, −1, 0, 1) and w3 = (0, 0, −2, 1) be the three vectors in (10.2.6). In
Section 5.5 we computed an orthonormal basis for the subspace of R4 spanned by w1 , w2 , w3 .
Here we use the MATLAB command qr to find an orthonormal basis for this subspace. Let
A be the matrix having the vectors w1t , w2t and w3t as columns. So, A is:

A = [1 2 0; 0 -1 0; -1 0 -2; 0 1 1]

The command

[Q R] = qr(A,0)

leads to the answer

Q =
-0.7071 0.5000 -0.4523
0 -0.5000 -0.1508
0.7071 0.5000 -0.4523
0 0.5000 0.7538
R =
-1.4142 -1.4142 -1.4142
0 2.0000 -0.5000
0 0 1.6583

578
§10.4 *QR Decompositions

A comparison with (10.2.7) shows that the columns of the matrix Q are the elements in
the orthonormal basis. The only difference is that the sign of the first vector is opposite.
However, this is not surprising since we know that there is some freedom in the choice of
Householder matrices, as remarked after Proposition 10.4.5.
In addition, the command qr produces the matrix R whose entries rij are the coordinates of
the vectors wj in the new orthonormal basis as in (10.4.3). For instance, the second column
of R tells us that

w2 = r12 v1 + r22 v2 + r32 v3 = −1.4142v1 + 2.0000v2 .

Exercises

{c7.9.3a} In Exercises 1 – 4, compute the Householder matrix H corresponding to the given vector u.
 
1
1. u = .
1
       
2 1  1 0 1 1 0 −1
H = I2 − 1 1 = − = .
2 1 0 1 1 1 −1 0
{c7.9.3b}
 
0
2. u = .
−2
   
2 0 0 1 0
H = I2 − = .
4 0 4 0 −1
{c7.9.3c}
 
−1
3. u =  1 .
5
   
1 −1 −5 25 2 10
2  1
H = I3 − −1 1 5 =  2 25 −10 .
27 27
−5 5 25 10 −10 −50
{c7.9.3d}
 
1
 0 
4. u = 
 4 .

−2
   
1 0 4 −2 19 0 −8 4
2  0 0
 0 0  1  0 21
 0 0 
.
H = I4 − =
21  4 0 16 −8  21  −8 0 −9 16 
−2 0 −8 4 4 0 16 13

579
§10.4 *QR Decompositions

{c7.9.4}
5. Find the matrix that reflects the plane across the line generated by the vector (1, 2).
Answer: The matrix that reflects the plane across (1, 2) is
 3 4 

H= 5 5 .
4 3
5 5

Solution: The matrix that reflects the plane across (1, 2) is the Householder matrix associated to
the vector u, where u · (1, 2) = 0. Compute H, the Householder matrix associated to u = (2, −1).

In Exercises 6 – 9, use the MATLAB command qr to compute an orthonormal basis for each of the
subspaces spanned by the given set of vectors.

{c7.5.5a} 6. (matlab) w1 = (1, −1), w2 = (1, 2).


The orthonormal basis generated by the command [Q R] = qr(A,0) is:

v1 = v2 =
-0.7071 0.7071
0.7071 0.7071

{c7.5.5b} 7. (matlab) w1 = (1, −2, 3), w2 = (0, 1, 1).


The orthonormal basis generated by the command [Q R] = qr(A,0) is:

v1 = v2 =
-0.2673 0.0514
0.5345 -0.8230
-0.8018 -0.5658

{c7.5.5c} 8. (matlab) w1 = (1, −2, 3), w2 = (0, 1, 1), w3 = (2, 2, 0).


The orthonormal basis generated by the command [Q R] = qr(A,0) is:

v1 = v2 = v3 =
-0.2673 0.0514 -0.9623
0.5345 -0.8230 -0.1925
-0.8018 -0.5658 0.1925

{c7.5.5d} 9. (matlab) v1 = (1, 0, −2, 0, −1), v2 = (2, −1, 4, 2, 0), v3 = (0, 3, 5, 1, −1).
The orthonormal basis generated by the command [Q R] = qr(A,0) is:

580
§10.4 *QR Decompositions

v1 = v2 = v3 =
-0.4082 0.6882 0.0190
0 -0.2294 -0.8494
0.8165 0.4588 -0.2282
0 0.4588 0.0127
0.4082 -0.2294 0.4754

{c7.5.6} 10. (matlab) Find the 4 × 4 Householder matrices H1 and H2 corresponding to the vectors
u1 = (1.04, 2, 0.76, −0.32)
{MATLAB:62} (10.4.5*)
u2 = (1.4, −1.3, 0.6, 1.2).

Compute H = H1 H2 and verify that H is an orthogonal matrix.


Answer:

H1 = H2 =
0.6245 -0.7220 -0.2744 0.1155 0.2807 0.6679 -0.3083 -0.6165
-0.7220 -0.3885 -0.5276 0.2222 0.6679 0.3798 0.2862 0.5725
-0.2744 -0.5276 0.7995 0.0844 -0.3083 0.2862 0.8679 -0.2642
0.1155 0.2222 0.0844 0.9645 -0.6165 0.5725 -0.2642 0.4716

H =
-0.2935 0.1305 -0.6678 -0.6714
-0.4365 -0.6536 -0.4053 0.4669
-0.7279 -0.1065 0.6051 -0.3043
-0.4398 0.7378 -0.1536 0.4885

Solution: Find H1 in MATLAB by typing

H1 = eye(4) - 2/(u1'*u1)*u1*u1'

Calculate H2 similarly. Confirm that H = H1 H2 is orthogonal by computing the product H t H to


see that it is I4 .

581
Chapter 11 *Matrix Normal Forms

11 *Matrix Normal Forms


In this chapter we generalize to n × n matrices the theory of matrix normal forms presented
in Chapter 6 for 2 × 2 matrices. In this theory we ask: What is the simplest form that
a matrix can have up to similarity. After first presenting several preliminary results, the
theory culminates in the Jordan normal form theorem, Theorem 11.3.2.
The first of the matrix normal form results — every matrix with n distinct real eigenvalues
can be diagonalized — is presented in Section 7.3. The basic idea is that when a matrix has
n distinct real eigenvalues, then it has n linearly independent eigenvectors. In Section 11.1
we discuss matrix normal forms when the matrix has n distinct eigenvalues some of which
are complex. When an n × n matrix has fewer than n linearly independent eigenvectors,
it must have multiple eigenvalues and generalized eigenvectors. This topic is discussed in
Section 11.2. The Jordan normal form theorem is introduced in Section 11.3 and describes
similarity of matrices when the matrix has fewer than n independent eigenvectors. The
proof is given in Appendix 11.5.
We introduced Markov matrices in Section 4.8. One of the theorems discussed there has
a proof that relies on the Jordan normal form theorem, and we prove this theorem in
Appendix 11.4.

582
§11.1 Simple Complex Eigenvalues

{C:HDeigenvalues}
{S:CSE} 11.1 Simple Complex Eigenvalues
Theorem 7.3.1 states that a matrix A with real unequal eigenvalues may be diagonalized.
It follows that in an appropriately chosen basis (the basis of eigenvectors), matrix multi-
plication by A acts as multiplication by these real eigenvalues. Moreover, geometrically,
multiplication by A stretches or contracts vectors in eigendirections (depending on whether
the eigenvalue is greater than or less than 1 in absolute value).
The purpose of this section is to show that a similar kind of diagonalization is possible when
the matrix has distinct complex eigenvalues. See Theorem 11.1.1 and Theorem 11.1.2. We
show that multiplication by a matrix with complex eigenvalues corresponds to multiplication
by complex numbers. We also show that multiplication by complex eigenvalues correspond
geometrically to rotations as well as expansions and contractions.

The Geometry of Complex Eigenvalues: Rotations and Dilatations Real 2 × 2 matrices are
the smallest real matrices where complex eigenvalues can possibly occur. In Chapter 6,
Theorem 6.3.4(b) we discussed the classification of such matrices up to similarity. Recall
that if the eigenvalues of a 2 × 2 matrix A are σ ± iτ , then A is similar to the matrix
 
σ −τ
{E:normalfm2} . (11.1.1)
τ σ
Moreover, the basis in which A has the form (11.1.1) is found as follows. Let v = w1 + iw2
be the eigenvector of A corresponding to the eigenvalue σ − iτ . Then {w1 , w2 } is the desired
basis.
Geometrically, multiplication of vectors inp R2 by (11.1.1) is the same as a rotation followed
by a dilatation. More specifically, let r = σ 2 + τ 2 . So the point (σ, τ ) lies on the circle of
radius r about the origin, and there is an angle θ such that (σ, τ ) = (r cos θ, r sin θ). Now
we can rewrite (11.1.1) as
   
σ −τ cos θ − sin θ
=r = rRθ ,
τ σ sin θ cos θ
where Rθ is rotation counterclockwise through angle θ. From this discussion we see that
geometrically complex eigenvalues are associated with rotations followed either by stretching
(r > 1) or contracting (r < 1).
As an example, consider the matrix
 
2 1
{E:exampA} A= . (11.1.2)
−2 0

583
§11.1 Simple Complex Eigenvalues

The characteristic polynomial of A is pA (λ) = λ2 − 2λ + 2. Thus the eigenvalues of A are


1 ± i, and σ = 1 and τ = 1 for this example. An eigenvector associated to the eigenvalue
1 − i is v = (1, −1 − i)t = (1, −1)t + i(0, −1)t . Therefore,
   
1 −1 1 0
−1
B = S AS = where S = ,
1 1 −1 −1

as can be checked by direct calculation. Moreover, we can rewrite


 √ √ 
2 2
√  −  √ π
B = 2 √ 2 √2  = 2R 4 .
2 2
2 2
So, in an appropriately chosen coordinate system, multiplication√by A rotates vectors coun-
terclockwise by 45◦ and then expands the result by a factor of 2. See Exercise 3.

The Algebra of Complex Eigenvalues: Complex Multiplication We have shown that the nor-
mal form (11.1.1) can be interpreted geometrically as a rotation followed by a dilatation.
There is a second algebraic interpretation of (11.1.1), and this interpretation is based on
multiplication by complex numbers.
Let λ = σ + iτ be a complex number and consider the matrix associated with complex
multiplication, that is, the linear mapping

{e:cplxmap} z 7→ λz (11.1.3)

on the complex plane. By identifying real and imaginary parts, we can rewrite (11.1.3) as
a real 2 × 2 matrix in the following way. Let z = x + iy. Then

λz = (σ + iτ )(x + iy) = (σx − τ y) + i(τ x + σy).

Now identify z with the vector (x, y); that is, the vector whose first component is the real
part of z and whose second component is the imaginary part. Using this identification the
complex number λz is identified with the vector (σx − τ y, τ x + σy). So, in real coordinates
and in matrix form, (11.1.3) becomes
      
x σx − τ y σ −τ x
7→ = .
y τ x + σy τ σ y

That is, the matrix corresponding to multiplication of z = x + iy by the complex number


λ = σ + iτ is the one that multiplies the vector (x, y)t by the normal form matrix (11.1.1).

584
§11.1 Simple Complex Eigenvalues

Direct Agreement Between the Two Interpretations of (11.1.1) We have shown that matrix
multiplication by (11.1.1) may be thought of either algebraically as multiplication by a com-
plex number (an eigenvalue) or geometrically as a rotation followed by a dilatation. We now
show how to go directly from the algebraic interpretation to the geometric interpretation.
Euler’s formula (Chapter 6, (6.2.5)) states that

eiθ = cos θ + i sin θ

for any real number θ. It follows that we can write a complex number λ = σ + iτ in polar
form as
λ = reiθ
where r2 = λλ = σ 2 + τ 2 , σ = r cos θ, and τ = r sin θ.
Now consider multiplication by λ in polar form. Write z = seiϕ in polar form, and compute

λz = reiθ seiϕ = rsei(ϕ+θ) .

It follows from polar form that multiplication of z by λ = reiθ rotates z through an angle θ
and dilates the result by the factor r. Thus Euler’s formula directly relates the geometry of
rotations and dilatations with the algebra of multiplication by a complex number.

Normal Form Matrices with Distinct Complex Eigenvalues In the first parts of this
section we have discussed a geometric and an algebraic approach to matrix multiplication
by 2 × 2 matrices with complex eigenvalues. We now turn our attention to classifying n × n
matrices that have distinct eigenvalues, whether these eigenvalues are real or complex. We
will see that there are two ways to frame this classification — one algebraic (using complex
numbers) and one geometric (using rotations and dilatations).

Algebraic Normal Forms: The Complex Case Let A be an n × n matrix with real entries and
n distinct eigenvalues λ1 , . . . , λn . Let vj be an eigenvector associated with the eigenvalue
λj . By methods that are entirely analogous to those in Section 7.3 we can diagonalize the
matrix A over the complex numbers. The resulting theorem is analogous to Theorem 7.3.1.
More precisely, the n × n matrix A is complex diagonalizable if there is a complex n × n
matrix T such that  
λ1 0 · · · 0
 0 λ2 · · · 0 
T −1 AT =  . .. . . . .
 .. . .. 
 
.
0 0 ··· λn

585
§11.1 Simple Complex Eigenvalues

{T:diagsimplecpx}
Theorem 11.1.1. Let A be an n × n matrix with n distinct eigenvalues. Then A is complex
diagonalizable.

The proof of Theorem 11.1.1 follows from a theoretical development virtually word for word
the same as that used to prove Theorem 7.3.1 in Section 7.3. Beginning from the theory
that we have developed so far, the difficulty in proving this theorem lies in the need to base
the theory of linear algebra on complex scalars rather than real scalars. We will not pursue
that development here.
As in Theorem 7.3.1, the proof of Theorem 11.1.1 shows that the complex matrix T is the
matrix whose columns are the eigenvectors vj of A; that is,
T = (v1 | · · · |vn ).

Finally, we mention that the computation of inverse matrices with complex entries is the
same as that for matrices with real entries. That is, row reduction of the n × 2n matrix
(T |In ) leads, when T is invertible, to the matrix (In |T −1 ).

Two Examples As a first example, consider the normal form 2 × 2 matrix (11.1.1) that has
eigenvalues λ and λ, where λ = σ + iτ . Let
   
σ −τ λ 0
B= and C = .
τ σ 0 λ
Since the eigenvalues of B and C are identical, Theorem 11.1.1 implies that there is a 2 × 2
complex matrix T such that
{E:BCsimilar} C = T −1 BT. (11.1.4)
Moreover, the columns of T are the complex eigenvectors v1 and v2 associated to the eigen-
values λ and λ.
It can be checked that the eigenvectors of B are v1 = (1, −i)t and v2 = (1, i)t . On setting
 
1 1
T = ,
−i i

it is a straightforward calculation to verify that C = T −1 BT .


As a second example, consider the matrix
 
4 2 1
{compute-eigenvalues} A= 2 −3 1 . (11.1.5*)
1 −1 −3

586
§11.1 Simple Complex Eigenvalues

Using MATLAB we find the eigenvalues of A by typing eig(A). They are:

ans =
4.6432
-3.3216 + 0.9014i
-3.3216 - 0.9014i

We can diagonalize (over the complex numbers) using MATLAB — indeed MATLAB is
programmed to do these calculations over the complex numbers. Type [T,D] = eig(A)
and obtain

T =
0.9604 -0.1299 + 0.1587i -0.1299 - 0.1587i
0.2632 0.0147 - 0.5809i 0.0147 + 0.5809i
0.0912 0.7788 - 0.1173i 0.7788 + 0.1173i

D =
4.6432 0 0
0 -3.3216 + 0.9014i 0
0 0 -3.3216 - 0.9014i

This calculation can be checked by typing inv(T)*A*T to see that the diagonal matrix D
appears. One can also check that the columns of T are eigenvectors of A.
Note that the development here does not depend on the matrix A having real entries.
Indeed, this diagonalization can be completed using n × n matrices with complex entries —
and MATLAB can handle such calculations.

Geometric Normal Forms: Block Diagonalization There is a second normal form theorem
based on the geometry of rotations and dilatations for real n × n matrices A. In this normal
form we determine all matrices A that have distinct eigenvalues — up to similarity by real
n × n matrices S. The normal form results in matrices that are block diagonal with either
1 × 1 blocks or 2 × 2 blocks of the form (11.1.1) on the diagonal.
A real n × n matrix is in real block diagonal form if it is a block diagonal matrix
 
B1 0 ··· 0
 0 B2 ··· 0 
{e:blockform}  .. .. .. ..  , (11.1.6)
 
 . . . . 
0 0 ··· Bm

587
§11.1 Simple Complex Eigenvalues

where each Bj is either a 1 × 1 block


Bj = λ j
for some real number λj or a 2 × 2 block
 
σj −τj
{e:2x2block} Bj = (11.1.7)
τj σj
where σj and τj 6= 0 are real numbers. A matrix is real block diagonalizable if it is similar
to a real block diagonal form matrix.
Note that the real eigenvalues of a real block diagonal form matrix are just the real numbers
λj that occur in the 1 × 1 blocks. The complex eigenvalues are the eigenvalues of the 2 × 2
{T:Complexdiag} blocks Bj and are σj ± iτj .
Theorem 11.1.2. Every n × n matrix A with n distinct eigenvalues is real block diagonal-
izable.

{L:indepcomplx} We need two preliminary results.


Lemma 11.1.3. Let λ1 , . . . , λq be distinct (possible complex) eigenvalues of an n × n matrix
A. Let vj be a (possibly complex) eigenvector associated with the eigenvalue λj . Then
v1 , . . . , vq are linearly independent in the sense that if
{E:indepcomplx} α1 v1 + · · · + αq vq = 0 (11.1.8)
for (possibly complex) scalars αj , then αj = 0 for all j.

Proof The proof is identical in spirit with the proof of Lemma 7.3.2. Proceed by induction
on q. When q = 1 the lemma is trivially valid, as αv = 0 for v 6= 0 implies that α = 0, even
when α ∈ C and v ∈ Cn .
By induction assume the lemma is valid for q − 1. Now apply A to (11.1.8) obtaining
α1 λ1 v1 + · · · + αq λq vq = 0.
Subtract this identity from λq times (11.1.8), and obtain
α1 (λ1 − λq )v1 + · · · + αq−1 (λq−1 − λq )vq−1 = 0.
By induction
αj (λj − λq ) = 0
for j = 1, . . . , q − 1. Since the λj are distinct it follows that αj = 0 for j = 1, . . . , q − 1.
Hence (11.1.8) implies that αq vq = 0; since vq 6= 0, αq = 0. 

588
§11.1 Simple Complex Eigenvalues

{L:rlcmplx}
Lemma 11.1.4. Let µ1 , . . . , µk be distinct real eigenvalues of an n × n matrix A and let
ν1 , ν 1 . . . , ν` , ν ` be distinct complex conjugate eigenvalues of A. Let vj ∈ Rn be eigenvectors
associated to µj and let wj = wjr + iwji be eigenvectors associated with the eigenvalues νj .
Then the k + 2` vectors
v1 , . . . , vk , w1r , w1i , . . . , w`r , w`i
in Rn are linearly independent.

Proof Let w = wr + iwi be a vector in Cn and let β r and β i be real scalars. Then

{E:realcmplx} β r wr + β i wi = βw + βw, (11.1.9)


1 r
where β = (β − iβ i ). Identity (11.1.9) is verified by direct calculation.
2
Suppose now that

{E:rlcplxlc} α1 v1 + · · · + αk vk + β1r w1r + β1i w1i + · · · + β`r w`r + β`i w`i = 0 (11.1.10)

for real scalars αj , βjr and βji . Using (11.1.9) we can rewrite (11.1.10) as

α1 v1 + · · · + αk vk + β1 w1 + β 1 w1 + · · · + β` w` + β ` w` = 0,
1 r
where βj = (β − iβji ). Since the eigenvalues
2 j
µ1 , . . . , µ k , ν 1 , ν 1 . . . , ν ` , ν `

are all distinct, we may apply Lemma 11.1.3 to conclude that αj = 0 and βj = 0. It follows
that βjr = 0 and βji = 0, as well, thus proving linear independence. 

Theorem 11.1.2 Let µj for j = 1, . . . , k be the real eigenvalues of A and let νj , ν j for
j = 1, . . . , ` be the complex eigenvalues of A. Since the eigenvalues are all distinct, it follows
that k + 2` = n.
Let vj and wj = wjr + iwji be eigenvectors associated with the eigenvalues µj and ν j . It
follows from Lemma 11.1.4 that the n real vectors

{e:complexeigen} v1 , . . . , vk , w1r , w1i , . . . , w`r , w`i (11.1.11)

are linearly independent and hence form a basis for Rn .

589
§11.1 Simple Complex Eigenvalues

We now show that A is real block diagonalizable. Let S be the n × n matrix whose columns
are the vectors in (11.1.11). Since these vectors are linearly independent, S is invertible. We
claim that S −1 AS is real block diagonal. This statement is verified by direct calculation.
First, note that Sej = vj for j = 1, . . . , k and compute

(S −1 AS)ej = S −1 Avj = µj S −1 vj = µj ej .

It follows that the first k columns of S −1 AS are zero except for the diagonal entries, and
those diagonal entries equal µ1 , . . . , µk .
Second, note that Sek+1 = w1r and Sek+2 = w1i . Write the complex eigenvalues as

νj = σj + iτj .

Since Aw1 = ν 1 w1 , it follows that

Aw1r + iAw1i = (σ1 − iτ1 )(w1r + iw1i )


= (σ1 w1r + τ1 w1i ) + i(−τ1 w1r + σ1 w1i ).

Equating real and imaginary parts leads to

Aw1r = σ1 w1r + τ1 w1i


{e:complexsimple} (11.1.12)
Aw1i = −τ1 w1r + σ1 w1i .

Using (11.1.12), compute

(S −1 AS)ek+1 = S −1 Aw1r = S −1 (σ1 w1r + τ1 w1i )


= σ1 ek+1 + τ1 ek+2 .

Similarly,

(S −1 AS)ek+2 = S −1 Aw1i = S −1 (−τ1 w1r + σ1 w1i )


= −τ1 ek+1 + σ1 ek+2 .

Thus, the k th and (k + 1)st columns of S −1 AS have the desired diagonal block in the k th
and (k + 1)st rows, and have all other entries equal to zero.
The same calculation is valid for the complex eigenvalues ν2 , . . . , ν` . Thus, S −1 AS is real
block diagonal, as claimed. 

590
§11.1 Simple Complex Eigenvalues

MATLAB Calculations of Real Block Diagonal Form Let C be the 4 × 4 matrix


 
C = [  1   0   2   3
       2   1   4   6
      −1  −5   1   3
       1   4   7  10 ].        (11.1.13*)
Using MATLAB enter C by typing e13_2_14 and find the eigenvalues of C by typing eig(C)
to obtain

ans =
0.5855 + 0.8861i
0.5855 - 0.8861i
-0.6399
12.4690

We see that C has two real and two complex conjugate eigenvalues. To find the complex
eigenvectors associated with these eigenvalues, type

[T,D] = eig(C)

MATLAB responds with

T =
-0.0787+0.0899i -0.0787-0.0899i 0.0464 0.2209
0.0772+0.2476i 0.0772-0.2476i 0.0362 0.4803
-0.5558-0.5945i -0.5558+0.5945i -0.8421 -0.0066
0.3549+0.3607i 0.3549-0.3607i 0.5361 0.8488

D =
0.586+0.886i 0 0 0
0 0.586-0.886i 0 0
0 0 -0.640 0
0. 0 0 12.469

The 4 × 4 matrix T has the eigenvectors of C as columns. The j th column is the eigenvector
associated with the j th diagonal entry in the diagonal matrix D.
To find the matrix S that puts C in real block diagonal form, we need to take the real and
imaginary parts of the eigenvectors corresponding to the complex eigenvalues and the real
eigenvectors corresponding to the real eigenvalues. In this case, type


S = [real(T(:,1)) imag(T(:,1)) T(:,3) T(:,4)]

to obtain

S =
-0.0787 0.0899 0.0464 0.2209
0.0772 0.2476 0.0362 0.4803
-0.5558 -0.5945 -0.8421 -0.0066
0.3549 0.3607 0.5361 0.8488

Note that the 1st and 2nd columns are the real and imaginary parts of the complex eigen-
vector. Check that inv(S)*C*S is the matrix in real block diagonal form

ans =
0.5855 0.8861 0.0000 0.0000
-0.8861 0.5855 0.0000 -0.0000
0.0000 0.0000 -0.6399 0.0000
-0.0000 -0.0000 -0.0000 12.4690

as proved in Theorem 11.1.2.
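The commands above can be collected into a short script. The following sketch is not part of the text; it assumes, as in this example, that the complex conjugate pair of eigenvalues occupies columns 1 and 2 of T and that the remaining eigenvalues are real, so for a different matrix the column indices must be adjusted after inspecting eig(C).

C = [1 0 2 3; 2 1 4 6; -1 -5 1 3; 1 4 7 10];    % the matrix in (11.1.13*)
[T,D] = eig(C);                                 % complex eigenvectors and eigenvalues
S = [real(T(:,1)) imag(T(:,1)) T(:,3) T(:,4)];  % real basis: Re and Im of the complex pair
B = inv(S)*C*S                                  % real block diagonal form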

Exercises

1. Consider the 2 × 2 matrix  
3 1
A= ,
−2 1
whose eigenvalues are 2 ± i and whose associated eigenvectors are:
   
1−i 1+i
and
2i −2i

Find a complex 2 × 2 matrix T such that C = T −1 AT is complex diagonal and a real 2 × 2 matrix
S so that B = S −1 AS is in real block diagonal form.
Answer:
T = [ 1−i  1+i ; 2i  −2i ]   and   S = [ 1  1 ; 0  −2 ].


Solution: T is the matrix whose columns are the complex eigenvectors of A. Multiplying T −1 AT
yields  
2+i 0
,
0 2−i
the matrix with the eigenvalues of A along the main diagonal. The first column of the matrix S
contains the real part of the eigenvector (1+i, −2i)t , and the second column contains the imaginary
part of this eigenvector. Multiplying S −1 AS yields
   
[ 2  −1 ; 1  2 ] = [ σ  −τ ; τ  σ ]

where σ − iτ is the eigenvalue 2 − i associated to eigenvector (1 + i, −2i)t .


2. Let  
2 5
A= .
−2 0
Find a complex 2 × 2 matrix T such that T −1 AT is complex diagonal and a real 2 × 2 matrix S so
that S −1 AS is in real block diagonal form.
   
Answer: T = [ 5  5 ; −1+3i  −1−3i ]   and   S = [ 5  0 ; −1  3 ].
Solution: The eigenvalues of A are 1 ± 3i and the associated eigenvectors are (5, −1 ± 3i)t . The
columns of T are just the eigenvectors of A and the real and imaginary parts of the eigenvectors
are the columns of S.

3. (matlab) Use map to verify that the normal form matrices (11.1.1) are just rotations followed
by dilatations. In particular, use map to study the normal form matrix
 
1 −1
A= .
1 1
Then compare your results with the similar matrix
 
2 1
B= .
−2 0

Matrix A rotates a vector by 45◦ counterclockwise, then expands the vector by a factor of √2. Let
v1 = Av0 , v2 = Av1 , and so on. Note that v8 = 16v0 . That is, after eight iterations, the vector
points in its original direction and has length 16|v0 |.
Let w1 = Bv0 and w2 = Bw1 . Then, w8 = 16v0 = v8 . The rotation and dilatation caused
by matrix B coincides with that of matrix A every 8 iterations, or 360◦ . However, B does not cause
a single constant rotation or dilatation on each iteration, as does A.


4. (matlab) Consider the 2 × 2 matrix


 
−0.8318 −1.9755
A= .
0.9878 1.1437

(a) Use MATLAB to find the complex conjugate eigenvalues and eigenvectors of A.
(b) Find the real block diagonal normal form of A and describe geometrically the motion of this
normal form on the plane.
(c) Using map describe geometrically how A maps vectors in the plane to vectors in the plane.

(a) Enter the matrix in MATLAB, then type [S,D] = eig(A) to find the eigenvectors

S =
   0.8165             0.8165
  -0.4082 - 0.4083i  -0.4082 + 0.4083i

that is, v1 = (0.8165, −0.4082 − 0.4083i)t and v2 = (0.8165, −0.4082 + 0.4083i)t with eigenvalues λ1 =
0.1559 + 0.9878i and λ2 = 0.1559 − 0.9878i.
(b) The real block diagonal normal form of A is
   
R = [ σ  −τ ; τ  σ ] = [ 0.1559  −0.9878 ; 0.9878  0.1559 ],
where λ1 = σ + iτ. Note that |λj| = 1. This matrix rotates a vector through an angle of
cos−1(0.1559) ≈ 1.414 radians (about 81◦), and does not alter its length.
(c) In this case, the matrix rotates and dilatates the vector so that the endpoints of the iterated
vectors lie on an ellipse.

In Exercises 5 – 8 find a square real matrix S so that S −1 AS is in real block diagonal form and a
complex square matrix T so that T −1 AT is in complex diagonal form.

5. (matlab)
A = [ 1   2    4
      2  −4   −5
      1  10  −15 ].        (11.1.14*)

Answer: The matrices are:

T =
0.9690 0.0197 + 0.3253i 0.0197 - 0.3253i
0.1840 0.0506 - 0.5592i 0.0506 + 0.5592i
0.1647 -0.4757 - 0.5935i -0.4757 + 0.5935i
S =
0.9690 0.0197 0.3253
0.1840 0.0506 -0.5592
0.1647 -0.4757 -0.5935


Solution: The matrix T is the matrix whose columns are eigenvectors of A. Find T in MAT-
LAB by typing [T,D] = eig(A). The second and third columns of S correspond to the real and
imaginary parts of the second eigenvector of A. Verify these solutions by noting that inv(T)*A*T
yields a diagonal matrix with the eigenvalues of A along the diagonal, and that inv(S)*A*S yields
a block matrix with blocks consisting of the real and imaginary parts of each complex conjugate
pair of eigenvalues.

6. (matlab)
A = [ −15.1220   12.2195   13.6098   14.9268
      −28.7805   21.8049   25.9024   28.7317
       60.1951  −44.9512  −53.9756  −60.6829
      −44.5122   37.1220   43.5610   47.2927 ].        (11.1.15*)

Answer: The matrices are:

T =
-0.1818 + 0.0422i -0.1818 - 0.0422i 0.2143 - 0.0122i 0.2143 + 0.0122i
-0.3649 - 0.0311i -0.3649 + 0.0311i 0.3154 + 0.0613i 0.3154 - 0.0613i
0.7291 0.7291 -0.6934 -0.6934
-0.5465 + 0.0289i -0.5465 - 0.0289i 0.6072 - 0.0347i 0.6072 + 0.0347i
S =
-0.1818 0.0422 0.2143 -0.0122
-0.3649 -0.0311 0.3154 0.0613
0.7291 0 -0.6934 0
-0.5465 0.0289 0.6072 -0.0347

Solution: The matrix T is the matrix whose columns are eigenvectors of A. Find T in MATLAB
by typing [T,D] = eig(A). The first two columns of S correspond to the real and imaginary parts
of the first eigenvector of A, and the last two columns contain the real and imaginary parts of the
third eigenvector. Type

S = [real(T(:,1)) imag(T(:,1)) real(T(:,3)) imag(T(:,3))]

in Matlab to compute S. Verify these solutions by noting that inv(T)*A*T yields a diagonal matrix
with the eigenvalues of A along the diagonal, and that inv(S)*A*S yields a block matrix with blocks
consisting of the real and imaginary parts of each complex conjugate pair of eigenvalues.

7. (matlab)
A = [  2.2125   5.1750   8.4250  15.0000   19.2500   0.5125
      −1.9500  −3.9000  −6.5000  −7.4000  −12.0000  −2.9500
       2.2250   3.9500   6.0500   0.9000    1.5000   1.0250
      −0.2000  −0.4000   0        0.1000    0       −0.2000
      −0.7875  −0.8250  −1.5750   1.0000    2.2500   0.5125
       1.7875   2.8250   4.5750   0         4.7500   5.4875 ].        (11.1.16*)


Answer: The matrices are:

T =
Columns 1 through 4
-0.1933-0.2068i -0.1933+0.2068i -0.6791+0.5708i -0.6791-0.5708i
-0.0362+0.4192i -0.0362-0.4192i 0.2735-0.3037i 0.2735+0.3037i
0.4084+0.1620i 0.4084-0.1620i 0.0881+0.0243i 0.0881-0.0243i
-0.0000-0.0000i -0.0000+0.0000i -0.0000+0.0000i -0.0000-0.0000i
-0.1933-0.2068i -0.1933+0.2068i -0.1321-0.0365i -0.1321+0.0365i
0.2657-0.6317i 0.2657+0.6317i 0.1321+0.0365i 0.1321-0.0365i
Columns 5 through 6
0.4205-0.1238i 0.4205+0.1238i
0.0855+0.2601i 0.0855-0.2601i
-0.1639-0.1479i -0.1639+0.1479i
-0.5203+0.1710i -0.5203-0.1710i
0.4205-0.1238i 0.4205+0.1238i
-0.4205+0.1238i -0.4205-0.1238i
S =
-0.1933 -0.2068 -0.6791 0.5708 0.4205 -0.1238
-0.0362 0.4192 0.2735 -0.3037 0.0855 0.2601
0.4084 0.1620 0.0881 0.0243 -0.1639 -0.1479
-0.0000 -0.0000 -0.0000 0.0000 -0.5203 0.1710
-0.1933 -0.2068 -0.1321 -0.0365 0.4205 -0.1238
0.2657 -0.6317 0.1321 0.0365 -0.4205 0.1238

Solution: The matrix T is the matrix whose columns are eigenvectors of A. Find T in MATLAB
by typing [T,D] = eig(A). The first two columns of S correspond to the real and imaginary parts
of the first eigenvector of A, the third and fourth columns of S contain the real and imaginary
parts of the third eigenvector, and the last two columns contain the real and imaginary parts of the
fifth eigenvector. Verify these solutions by noting that inv(T)*A*T yields a diagonal matrix with
the eigenvalues of A along the diagonal, and that inv(S)*A*S yields a block matrix with blocks
consisting of the real and imaginary parts of each complex conjugate pair of eigenvalues.

8. (matlab)
A = [ −12  15  0
        1   5  2
       −5   1  5 ].        (11.1.17*)

Answer: The matrices are:

T =


0.9602 0.4405 - 0.1340i 0.4405 + 0.1340i


-0.0818 0.5397 - 0.0860i 0.5397 + 0.0860i
0.2671 0.0569 + 0.6972i 0.0569 - 0.6972i
S =
0.9602 0.4405 -0.1340
-0.0818 0.5397 -0.0860
0.2671 0.0569 0.6972

Solution: The matrix T is the matrix whose columns are eigenvectors of A. Find T in MAT-
LAB by typing [T,D] = eig(A). The second and third columns of S correspond to the real and
imaginary parts of the second eigenvector of A. Verify these solutions by noting that inv(T)*A*T
yields a diagonal matrix with the eigenvalues of A along the diagonal, and that inv(S)*A*S yields
a block matrix with blocks consisting of the real and imaginary parts of each complex conjugate
pair of eigenvalues.


11.2 Multiplicity and Generalized Eigenvectors


The difficulty in generalizing the results in the previous two sections to matrices with mul-
tiple eigenvalues stems from the fact that these matrices may not have enough (linearly
independent) eigenvectors. In this section we present the basic examples of matrices with
a deficiency of eigenvectors, as well as the definitions of algebraic and geometric multiplic-
ity. These matrices will be the building blocks of the Jordan normal form theorem — the
theorem that classifies all matrices up to similarity.

Deficiency in Eigenvectors for Real Eigenvalues An example of deficiency in eigenvectors is


given by the following n × n matrix
 
Jn(λ0) = [ λ0  1   0   · · ·  0   0
           0   λ0  1   · · ·  0   0
           0   0   λ0  · · ·  0   0
           .   .   .          .   .
           0   0   0   · · ·  λ0  1
           0   0   0   · · ·  0   λ0 ]        (11.2.1)

where λ0 ∈ R. Note that Jn (λ0 ) has all diagonal entries equal to λ0 , all superdiagonal
entries equal to 1, and all other entries equal to 0. Since Jn (λ0 ) is upper triangular, all n
eigenvalues of Jn (λ0 ) are equal to λ0 . However, Jn (λ0 ) has only one linearly independent
eigenvector. To verify this assertion let

N = Jn (λ0 ) − λ0 In .

Then v is an eigenvector of Jn (λ0 ) if and only if N v = 0. Therefore, Jn (λ0 ) has a unique
linearly independent eigenvector precisely when nullity(N ) = 1, which is the content of the next lemma.
Lemma 11.2.1. nullity(N ) = 1.

Proof In coordinates the equation N v = 0 is:


    
[ 0  1  0  · · ·  0  0 ] [ v1   ]   [ v2 ]
[ 0  0  1  · · ·  0  0 ] [ v2   ]   [ v3 ]
[ 0  0  0  · · ·  0  0 ] [ v3   ] = [ v4 ]  =  0.
[ .  .  .         .  . ] [ .    ]   [ .  ]
[ 0  0  0  · · ·  0  1 ] [ vn−1 ]   [ vn ]
[ 0  0  0  · · ·  0  0 ] [ vn   ]   [ 0  ]


Thus v2 = v3 = · · · = vn = 0, and the solutions are all multiples of e1 . Therefore, the nullity
of N is 1. 

Note that we can express matrix multiplication by N as


N e1 = 0
N ej = ej−1 ,   j = 2, . . . , n.        (11.2.2)
Note that (11.2.2) implies that N n = 0.
The n × n matrix N motivates the following definitions.
Definition 11.2.2. Let λ0 be an eigenvalue of A. The algebraic multiplicity of λ0 is the
number of times that λ0 appears as a root of the characteristic polynomial pA (λ). The
geometric multiplicity of λ0 is the number of linearly independent eigenvectors of A having
eigenvalue equal to λ0 .

Abstractly, the geometric multiplicity is:


nullity(A − λ0 In ).

Our previous calculations show that the matrix Jn (λ0 ) has an eigenvalue λ0 with algebraic
multiplicity equal to n and geometric multiplicity equal to 1.
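These two multiplicities are easy to check numerically for a specific Jordan block. The lines below are only an illustration (the choices n = 4 and λ0 = 3 are arbitrary and not taken from the text): the rank command gives the geometric multiplicity, and the rounded characteristic polynomial exhibits the algebraic multiplicity.

J = 3*eye(4) + diag(ones(3,1),1);   % the Jordan block J_4(3)
geom = 4 - rank(J - 3*eye(4))       % geometric multiplicity: 1
p = round(poly(J))                  % coefficients of (lambda-3)^4, so algebraic multiplicity 4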
Lemma 11.2.3. The algebraic multiplicity of an eigenvalue is greater than or equal to its
geometric multiplicity.

Proof For ease of notation we prove this lemma only for real eigenvalues, though the
proof for complex eigenvalues is similar. Let A be an n × n matrix and let λ0 be a real
eigenvalue of A. Let k be the geometric multiplicity of λ0 and let v1 , . . . , vk be k linearly
independent eigenvectors of A with eigenvalue λ0 . We can extend {v1 , . . . , vk } to be a basis
V = {v1 , . . . , vn } of Rn . In this basis, the matrix of A is
 
λ0 Ik (∗)
[A]V = .
0 B
The matrices A and [A]V are similar matrices. Therefore, they have the same characteristic
polynomials and the same eigenvalues with the same algebraic multiplicities. It follows from
Lemma 7.1.9 that the characteristic polynomial of A is:
pA (λ) = p[A]V (λ) = (λ − λ0 )k pB (λ).
Hence λ0 appears as a root of pA (λ) at least k times and the algebraic multiplicity of λ0 is
greater than or equal to k. The same proof works when λ0 is a complex eigenvalue — but
all vectors chosen must be complex rather than real. 


Deficiency in Eigenvectors with Complex Eigenvalues An example of a real matrix with com-
plex conjugate eigenvalues having geometric multiplicity less than algebraic multiplicity is
the 2n × 2n block matrix
 
Jbn(λ0) = [ B   I2  0   · · ·  0   0
            0   B   I2  · · ·  0   0
            0   0   B   · · ·  0   0
            .   .   .          .   .
            0   0   0   · · ·  B   I2
            0   0   0   · · ·  0   B ]        (11.2.3)

where λ0 = σ + iτ and B is the 2 × 2 matrix

B = [ σ  −τ ; τ  σ ].

Lemma 11.2.4. Let λ0 be a complex number. Then the algebraic multiplicity of the eigen-
value λ0 in the 2n × 2n matrix Jbn (λ0 ) is n and the geometric multiplicity is 1.

Proof  We begin by showing that the eigenvalues of J = Jbn (λ0 ) are λ0 and λ̄0 , each
with algebraic multiplicity n. The characteristic polynomial of J is pJ (λ) = det(J − λI2n ).
From Lemma 7.1.9 of Chapter 7 and induction, we see that pJ (λ) = pB (λ)^n . Since the
eigenvalues of B are λ0 and λ̄0 , we have proved that the algebraic multiplicity of each of
these eigenvalues in J is n.
Next, we compute the eigenvectors of J. Let Jv = λ0 v and let v = (v1 , . . . , vn ) where each
vj ∈ C2 . Observe that (J − λ0 I2n )v = 0 if and only if

Qv1 + v2 = 0
..
.
Qvn−1 + vn = 0
Qvn = 0,

where Q = B − λ0 I2 . Using the fact that λ0 = σ + iτ , it follows that


 
Q = B − λ0 I2 = −τ [ i  1 ; −1  i ].

Hence

Q^2 = 2τ^2 i [ i  1 ; −1  i ] = −2τ iQ.


Thus
0 = Q2 vn−1 + Qvn = −2τ iQvn−1 ,
from which it follows that Qvn−1 + vn = vn = 0. Similarly, v2 = · · · = vn−1 = 0. Since
there is only one nonzero complex vector v1 (up to a complex scalar multiple) satisfying
Qv1 = 0,
it follows that the geometric multiplicity of λ0 in the matrix Jbn (λ0 ) equals 1. 
Definition 11.2.5. The real matrices Jn (λ0 ) when λ0 ∈ R and Jbn (λ0 ) when λ0 ∈ C are
real Jordan blocks. The matrices Jn (λ0 ) when λ0 ∈ C are (complex) Jordan blocks.

Generalized Eigenvectors and Generalized Eigenspaces What happens when n × n ma-


trices have fewer than n linearly independent eigenvectors? Answer: The matrices gain
generalized eigenvectors.
Definition 11.2.6. A vector v ∈ Cn is a generalized eigenvector for the n × n matrix A
with eigenvalue λ if
(A − λIn )^k v = 0        (11.2.4)
for some positive integer k. The smallest integer k for which (11.2.4) is satisfied is called
the index of the generalized eigenvector v.

Note: Eigenvectors are generalized eigenvectors with index equal to 1.


Let λ0 be a real number and let N = Jn (λ0 ) − λ0 In . Recall that (11.2.2) implies that
N n = 0. Hence every vector in Rn is a generalized eigenvector for the matrix Jn (λ0 ). So
Jn (λ0 ) provides a good example of a matrix whose lack of eigenvectors (there is only one
independent eigenvector) is made up for by generalized eigenvectors (there are n independent
generalized eigenvectors).
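The index of a particular vector can be checked by repeated multiplication: the first power of N that annihilates the vector is its index. The following minimal sketch is only an illustration (the block J3(2) and the vector e3 are arbitrary choices, not from the text).

J = 2*eye(3) + diag(ones(2,1),1);     % the Jordan block J_3(2)
N = J - 2*eye(3);
v = [0; 0; 1];                        % e3
[norm(N*v) norm(N^2*v) norm(N^3*v)]   % returns 1 1 0, so v is a generalized eigenvector of index 3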
Let λ0 be an eigenvalue of the n × n matrix A and let A0 = A − λ0 In . For simplicity, assume
that λ0 is real. Note that
null space(A0) ⊂ null space(A0^2) ⊂ · · · ⊂ null space(A0^k) ⊂ · · · ⊂ Rn.

Therefore, the dimensions of the null spaces are bounded above by n and there must be a
smallest k such that

dim null space(A0^k) = dim null space(A0^(k+1)).

It follows that

null space(A0^k) = null space(A0^(k+1)).        (11.2.5)

Lemma 11.2.7. Let λ0 be a real eigenvalue of the n × n matrix A and let A0 = A − λ0 In.
Let k be the smallest integer for which (11.2.5) is valid. Then

null space(A0^k) = null space(A0^(k+j))

for every integer j > 0.

Proof  We can prove the lemma by induction on j if we can show that

null space(A0^(k+1)) = null space(A0^(k+2)).

Since null space(A0^(k+1)) ⊂ null space(A0^(k+2)), we need to show that

null space(A0^(k+2)) ⊂ null space(A0^(k+1)).

Let w ∈ null space(A0^(k+2)). It follows that

A0^(k+1) (A0 w) = A0^(k+2) w = 0;

so A0 w ∈ null space(A0^(k+1)) = null space(A0^k), by (11.2.5). Therefore,

A0^(k+1) w = A0^k (A0 w) = 0,

which verifies that w ∈ null space(A0^(k+1)). ∎

Let Vλ0 be the set of all generalized eigenvectors of A with eigenvalue λ0 . Let k be the
smallest integer satisfying (11.2.5), then Lemma 11.2.7 implies that
Vλ0 = null space(Ak0 ) ⊂ Rn
is a subspace called the generalized eigenspace of A associated to the eigenvalue λ0 . It will
follow from the Jordan normal form theorem (see Theorem 11.3.2) that the dimension of
Vλ0 is the algebraic multiplicity of λ0 .

An Example of Generalized Eigenvectors Find the generalized eigenvectors of the 4×4 matrix
 
A = [ −24  −58  −2  −8
       15   35   1   4
        3    5   7   4
        3    6   0   6 ].        (11.2.6*)
and their indices. When finding generalized eigenvectors of a matrix A, the first two steps
are:


(i) Find the eigenvalues of A.


(ii) Find the eigenvectors of A.

After entering A into MATLAB by typing e13_3_6, we type eig(A) and find that all of
the eigenvalues of A equal 6. Without additional information, there could be 1,2,3 or 4
linearly independent eigenvectors of A corresponding to the eigenvalue 6. In MATLAB we
determine the number of linearly independent eigenvectors by typing null(A-6*eye(4))
and obtaining

ans =
0.8892 0
-0.4446 0.0000
-0.0262 0.9701
-0.1046 -0.2425

We now know that (numerically) there are two linearly independent eigenvectors. The next
step is to find the number of independent generalized eigenvectors of index 2. To complete this
calculation, we find a basis for the null space of (A−6I4 )2 by typing null((A-6*eye(4))^2)
obtaining

ans =
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1

Thus, for this example, all generalized eigenvectors that are not eigenvectors have index 2.
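The same dimensions can also be read off from ranks. The lines below only restate the computation just performed for the matrix A in (11.2.6*), entered here by hand rather than via e13_3_6.

A = [-24 -58 -2 -8; 15 35 1 4; 3 5 7 4; 3 6 0 6];   % the matrix in (11.2.6*)
d1 = 4 - rank(A - 6*eye(4))        % dimension of the eigenspace for 6: 2
d2 = 4 - rank((A - 6*eye(4))^2)    % dimension of null((A - 6I)^2): 4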

Exercises

In Exercises 1 – 4 determine the eigenvalues and their geometric and algebraic multiplicities for the
given matrix.
 
2 0 0 0
 0 3 1 0 
1. A =  0 0 3 0 .

0 0 0 4
The eigenvalues of matrix A are:


eigenvalue algebraic multiplicity geometric multiplicity


2 1 1
3 2 1
4 1 1
 
2 0 0 0
 0 2 0 0 
2. B = 
 .
0 0 3 1 
0 0 0 3
The eigenvalues of matrix B are:

eigenvalue algebraic multiplicity geometric multiplicity


2 2 2
3 2 1
 
−1 1 0 0
 0 −1 0 0 
3. C = 
 .
0 0 −1 0 
0 0 0 1
The eigenvalues of matrix C are:

eigenvalue algebraic multiplicity geometric multiplicity


−1 3 2
1 1 1
 
2 −1 0 0
 1 2 0 0 
4. D = 
 .
0 0 2 1 
0 0 0 2
The eigenvalues of matrix D are:

eigenvalue algebraic multiplicity geometric multiplicity


2+i 1 1
2−i 1 1
2 2 1

In Exercises 5 – 8 find a basis consisting of the eigenvectors for the given matrix supplemented by
generalized eigenvectors. Choose the generalized eigenvectors with lowest index possible.
 
1 −1
5. A = .
1 3


Answer: v1 = (−1, 1) and v2 = (0, 1) is a basis.


Solution: The characteristic polynomial of A is λ2 − 4λ + 4 = (λ − 2)2 . Therefore, the eigenvalues
of A both equal 2. Find the eigenvectors of A by solving (A − 2I2 )v = 0. That is, solve
 
−1 −1
v = 0.
1 1

v1 = (−1, 1) is the only independent solution. Choose any vector for v2 that is independent of v1 .
 
−2 0 −2
6. B =  −1 1 −2 .
0 1 −1
Answer: v1 = (1, −1, −1), v2 = (−2, 0, 1), v3 = (0, 1, 1) is a basis.
Solution: Determine either by direct calculation of the characteristic polynomial of B or by using
MATLAB that the eigenvalues of B are 0, −1, −1. Let v1 be the eigenvector with 0 eigenvalue; so
v1 = (1, −1, −1). There is only one independent eigenvector associated with the eigenvalue −1 and
that eigenvector is v2 = (−2, 0, 1). Let v3 be any generalized eigenvector associated with the eigen-
value −1; one choice is v3 = (0, 1, 1). These eigenvectors can be found by direct calculation or by
using MATLAB . When using MATLAB find v1 by typing null(B), v2 by typing null(B+eye(3)),
and v3 by typing null((B+eye(3))^2).
 
−6 31 −14
7. C =  −1 6 −2 .
0 2 1
Answer: v1 = (9, 1, −1), v2 = (−2, 0, 1), v3 = (9, 1, −2) is a basis.
Solution: Determine either by direct calculation of the characteristic polynomial of C or by using
MATLAB that the eigenvalues of C are −1, 1, 1. Let v1 be the eigenvector with −1 eigenvalue;
so v1 = (9, 1, −1). There is only one independent eigenvector associated with the eigenvalue 1
and that eigenvector is v2 = (−2, 0, 1). Let v3 be any generalized eigenvector associated with the
eigenvalue 1; one choice is v3 = (9, 1, −2). These eigenvectors can be found by direct calculation
or by using MATLAB . When using MATLAB find v1 by typing null(C+eye(3)), v2 by typing
null(C-eye(3)), and v3 by typing null((C-eye(3))^2).
 
5 1 0
8. D =  −3 1 1 .
−12 −4 0
Answer: v1 = (1, −3, 0), v2 = (0, 1, −2), v3 = (1, 0, 0) is a basis.
Solution: Determine either by direct calculation of the characteristic polynomial of D or by using
MATLAB that the eigenvalues of D are 2, 2, 2. Let v1 be the eigenvector with eigenvalue 2; so
v1 = (1, −3, 0). There is only one independent generalized eigenvector of index 2 associated with
the eigenvalue 2 and that generalized eigenvector is v2 = (0, 1, −2). Let v3 be any generalized eigen-
vector of index 3 associated with the eigenvalue 2; one choice is v3 = (1, 0, 0). These eigenvectors


can be found by direct calculation or by using MATLAB . When using MATLAB find v1 by typing
null(D-2*eye(3)) and v2 by typing null((D-2*eye(3))^2).

In Exercises 9 – 10, use MATLAB to find the eigenvalues and their algebraic and geometric multi-
plicities for the given matrix.

9. (matlab)
A = [ 2  3  −21  −3
      2  7  −41  −5
      0  1   −5  −1
      0  0    4   4 ].        (11.2.7*)

Answer: The eigenvalue 2 has algebraic multiplicity 4 and geometric multiplicity 1.


Solution: Using MATLAB to find eigenvalues of high algebraic multiplicity is numerically dan-
gerous. Type eig(A) and obtain

2.0000 + 0.0006i
2.0000 - 0.0006i
1.9994
2.0006

Since the coefficients of A are all integers, you might be suspicious of the answer and guess that all
of the eigenvalues of A equal 2. Type null(A-2*eye(4)) and obtain

ans =
-0.4804
-0.8006
-0.1601
0.3203

dividing by ans(3) yields the eigenvector v1 = (3, 5, 1, −2). To check whether the eigenvalue 2 has
algebraic multiplicity greater than 1, type null((A-2*eye(4))^2) and obtain

ans =
0.5071 0
0.8452 0
0.1690 0
0 1.0000

Thus v2 = (0, 0, 0, 1) is a generalized eigenvector of A with index 2 and eigenvalue 2. To find


generalized eigenvectors of index 3 type null((A-2*eye(4))^3) and obtain


ans =
-0.9487 0 0
0 1.0000 0
-0.3162 0 0
0 0 1.0000

Thus, v3 = (0, 1, 0, 0) is a generalized eigenvector of index 3. Type null((A-2*eye(4))^4) and obtain

ans =
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1

to see that 2 is an eigenvalue of A of algebraic multiplicity 4 and geometric multiplicity 1.

10. (matlab)
B = [ 179  −230   0  10  −30
      144  −185   0   8  −24
       30   −39  −1   3   −9
      192  −245   0   9  −30
       40   −51   0   2   −7 ].        (11.2.8*)

Answer: The eigenvalue −1 has algebraic multiplicity 5 and geometric multiplicity 3.


Solution: Type eig(B) and obtain

ans =
-1.0000
-1.0000 + 0.0000i
-1.0000 - 0.0000i
-1.0000
-1.0000

Type null(B+eye(5)) and obtain

ans =
0.7701 -0.1043 0.0000
0.6160 -0.0835 0.0000
-0.0000 -0.0000 -1.0000
-0.0966 -0.9443 0.0000
-0.1349 -0.3008 0.0000

We see that −1 is an eigenvalue of B with geometric multiplicity 3.


11.3 The Jordan Normal Form Theorem


The question that we discussed in Sections 7.3 and 11.1 is: Up to similarity, what is the
simplest form that a matrix can have? We have seen that if A has real distinct eigenvalues,
then A is real diagonalizable. That is, A is similar to a diagonal matrix whose diagonal
entries are the real eigenvalues of A. Similarly, if A has distinct real and complex eigenvalues,
then A is complex diagonalizable; that is, A is similar either to a diagonal matrix whose
diagonal entries are the real and complex eigenvalues of A or to a real block diagonal matrix.
In this section we address the question of simplest form when a matrix has multiple eigen-
values. In much of this discussion we assume that A is an n × n matrix with only real
eigenvalues. Lemma 7.3.3 shows that if the eigenvectors of A form a basis, then A is diago-
nalizable. Indeed, for A to be diagonalizable, there must be a basis of eigenvectors of A. It
follows that if A is not diagonalizable, then A must have fewer than n linearly independent
eigenvectors.
The prototypical examples of matrices having fewer eigenvectors than eigenvalues are the
matrices Jn (λ) for λ real (see (11.2.1)) and Jbn (λ) for λ complex (see (11.2.3)).
Definition 11.3.1. A matrix is in Jordan normal form if it is block diagonal and the matrix
in each block on the diagonal is a Jordan block, that is, J` (λ) for some integer ` and some
real or complex number λ.
A matrix is in real Jordan normal form if it is block diagonal and the matrix in each block
on the diagonal is a real Jordan block, that is, either J` (λ) for some integer ` and some real
number λ or Jb` (λ) for some integer ` and some complex number λ.

The main theorem about Jordan normal form is:


Theorem 11.3.2 (Jordan normal form). Let A be an n × n matrix. Then A is similar to
a Jordan normal form matrix and to a real Jordan normal form matrix.

This theorem is proved by constructing a basis V for Rn so that the matrix S −1 AS is in


Jordan normal form, where S is the matrix whose columns consists of vectors in V. The
algorithm for finding the basis V is complicated and is found in Appendix 11.5. In this
section we construct V only in the special and simpler case where each eigenvalue of A is
real and is associated with exactly one Jordan block.
More precisely, let λ1 , . . . , λs be the distinct eigenvalues of A and let

Aj = A − λj In .

The eigenvectors corresponding to λj are the vectors in the null space of Aj and the gener-
alized eigenvectors are the vectors in the null space of Akj for some k. The dimension of the


null space of Aj is precisely the number of Jordan blocks of A associated to the eigenvalue
λj . So the assumption that we make here is
nullity(Aj ) = 1
for j = 1, . . . , s.
Let kj be the integer whose existence is specified by Lemma 11.2.7. Since, by assumption,
there is only one Jordan block associated with the eigenvalue λj , it follows that kj is the
algebraic multiplicity of the eigenvalue λj .
To find a basis in which the matrix A is in Jordan normal form, we proceed as follows. First,
let wj,kj be a vector in

null space(Aj^kj ) but not in null space(Aj^(kj −1)).

Define the vectors wji by


wj,kj −1 = Aj wj,kj
..
.
wj,1 = Aj wj,2 .
Second, when λj is real, let the kj vectors vji = wji , and when λj is complex, let the 2kj
vectors vji be defined by
vj,2i−1 = Re(wji )
vj,2i = Im(wji ).
Let V be the set of vectors vji ∈ Rn . We will show in Appendix 11.5 that the set V consists
of n vectors and is a basis of Rn . Let S be the matrix whose columns are the vectors in V.
Then S −1 AS is in Jordan normal form.
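The construction just described can be sketched in MATLAB. The fragment below is only a schematic, not part of the text: it assumes that A, its size n, a real eigenvalue lambda with exactly one Jordan block of size k (so nullity(A − lambda·I) = 1), and k itself are already known, and the tolerance 1e-8 is an arbitrary choice.

A0 = A - lambda*eye(n);              % assumes A, lambda, n, k are already defined
W  = null(A0^k);                     % the generalized eigenspace
for j = 1:size(W,2)                  % pick a vector of index k, i.e. not killed by A0^(k-1)
    if norm(A0^(k-1)*W(:,j)) > 1e-8
        w = W(:,j);
        break
    end
end
chain = zeros(n,k);
chain(:,k) = w;                      % the highest-index vector goes last
for i = k-1:-1:1
    chain(:,i) = A0*chain(:,i+1);    % w_{j,i} = A0 * w_{j,i+1}
end
% the columns of chain are the basis vectors contributed by this eigenvalue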

The Cayley Hamilton Theorem As a corollary of the Jordan normal form theorem, we prove
the Cayley Hamilton theorem which states that a square matrix satisfies its characteristic
polynomial. More precisely:
Theorem 11.3.3 (Cayley Hamilton). Let A be a square matrix and let pA (λ) be its char-
acteristic polynomial. Then
pA (A) = 0.

Proof Let A be an n × n matrix. The characteristic polynomial of A is


pA (λ) = det(A − λIn ).


Suppose that B = P −1 AP is a matrix similar to A. Theorem 7.2.8 states that pB = pA .


Therefore
pB (B) = pA (P −1 AP ) = P −1 pA (A)P.

So if the Cayley Hamilton theorem holds for a matrix similar to A, then it is valid for the
matrix A. Moreover, using the Jordan normal form theorem, we may assume that A is in
Jordan normal form.
Suppose that A is block diagonal, that is
 
A = [ A1  0 ; 0  A2 ],

where A1 and A2 are square matrices. Then

pA (λ) = pA1 (λ)pA2 (λ).

This observation follows directly from Lemma 7.1.9. Since

A^k = [ A1^k  0 ; 0  A2^k ],

it follows that
 
pA (A) = [ pA (A1)  0 ; 0  pA (A2) ]
       = [ pA1 (A1) pA2 (A1)  0 ; 0  pA1 (A2) pA2 (A2) ].

It now follows from this calculation that if the Cayley Hamilton theorem is valid for Jordan
blocks, then pA1 (A1 ) = 0 = pA2 (A2 ). So pA (A) = 0 and the Cayley Hamilton theorem is
valid for all matrices.
A direct calculation shows that Jordan blocks satisfy the Cayley Hamilton theorem. To
begin, suppose that the eigenvalue of the Jordan block is real. Note that the characteristic
polynomial of the Jordan block Jn (λ0 ) in (11.2.1) is (λ − λ0 )n . Indeed, Jn (λ0 ) − λ0 In is
strictly upper triangular and (Jn (λ0 ) − λ0 In )n = 0. If λ0 is complex, then either repeat this
calculation using the complex Jordan form or show by direct calculation that (A − λ0 In )(A − λ̄0 In )
is strictly upper triangular when A = Jbn (λ0 ) is the real Jordan form of the Jordan
block in (11.2.3). 
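The Cayley Hamilton theorem is also easy to test numerically: the MATLAB command polyvalm evaluates a polynomial with a matrix argument. The check below uses an arbitrarily chosen matrix (not one from the text); the result is zero up to round-off error.

A = [3 1 0; -2 1 4; 0 5 2];      % any square matrix will do
p = poly(A);                     % coefficients of the characteristic polynomial p_A
norm(polyvalm(p, A))             % of the order of round-off, i.e. p_A(A) = 0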


An Example Consider the 4 × 4 matrix


 
A = [ −147  −106   −66   −488
       604   432   271   1992
       621   448   279   2063
      −169  −122   −76   −562 ].        (11.3.1*)
Using MATLAB we can compute the characteristic polynomial of A by typing

poly(A)

The output is

ans =
1.0000 -2.0000 -15.0000 -0.0000 -0.0000

Note that since A is a matrix of integers we know that the coefficients of the characteristic
polynomial of A must be integers. Thus the characteristic polynomial is exactly:
pA (λ) = λ4 − 2λ3 − 15λ2 = λ2 (λ − 5)(λ + 3).
So λ1 = 0 is an eigenvalue of A with algebraic multiplicity two and λ2 = 5 and λ3 = −3 are
simple eigenvalues of multiplicity one.
We can find eigenvectors of A corresponding to the simple eigenvalues by typing

v2 = null(A-5*eye(4));
v3 = null(A+3*eye(4));

At this stage we do not know how many linearly independent eigenvectors have eigenvalue
0. There are either one or two linearly independent eigenvectors and we determine which
by typing null(A) and obtaining

ans =
-0.1818
0.6365
0.7273
-0.1818

So MATLAB tells us that there is just one linearly independent eigenvector having 0 as an
eigenvalue. There must be a generalized eigenvector in V0 . Indeed, the null space of A2 is
two dimensional and this fact can be checked by typing


null2 = null(A^2)

obtaining

null2 =
0.2193 -0.2236
-0.5149 -0.8216
-0.8139 0.4935
0.1561 0.1774

Choose one of these vectors, say the first vector, to be v12 by typing

v12 = null2(:,1);

Since the algebraic multiplicity of the eigenvalue 0 is two, we choose the remaining basis vector of V0
to be v11 = Av12 . In MATLAB we type

v11 = A*v12

obtaining

v11 =
-0.1263
0.4420
0.5051
-0.1263

Since v11 is nonzero, we have found a basis for V0 . We can now put the matrix A in Jordan
normal form by setting

S = [v11 v12 v2 v3];


J = inv(S)*A*S

to obtain

J =
-0.0000 1.0000 0.0000 -0.0000
0.0000 0.0000 0.0000 -0.0000
-0.0000 -0.0000 5.0000 0.0000
0.0000 -0.0000 -0.0000 -3.0000


We have only discussed a Jordan normal form example when the eigenvalues are real and
multiple. The case when the eigenvalues are complex and multiple first occurs when n = 4.
A sample complex Jordan block when the matrix has algebraic multiplicity two eigenvalues
σ ± iτ of geometric multiplicity one is
 
σ −τ 1 0
 τ σ 0 1 
 .
 0 0 σ −τ 
0 0 τ σ

Numerical Difficulties When a matrix has multiple eigenvalues, then numerical difficulties
can arise when using the MATLAB command eig(A), as we now explain.
Let p(λ) = λ2 . Solving p(λ) = 0 is very easy — in theory — as λ = 0 is a double root of p.
Suppose, however, that we want to solve p(λ) = 0 numerically. Then, numerical errors will
lead to solving the equation
λ2 = ε

where ε is a small number. Note that if ε > 0, the solutions are ±√ε; while if ε < 0, the
solutions are ±i√|ε|. Since numerical errors are machine dependent, ε can be of either sign.
The numerical process of finding double roots of a characteristic polynomial (that is, double
eigenvalues of a matrix) is similar to numerically solving the equation λ2 = 0, as we shall
see.
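This sensitivity is easy to reproduce with the roots command; the two lines below are only an illustration. Perturbing the constant term of λ2 by an amount on the order of machine epsilon moves the double root at 0 by roughly the square root of that amount.

format long
roots([1 0 -eps])    % roots of lambda^2 - eps: real, about +/- 1.5e-8
roots([1 0  eps])    % roots of lambda^2 + eps: imaginary, about +/- 1.5e-8 i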
For example, on a Sun SPARCstation 10 using MATLAB version 4.2c, the eigenvalues of
the 4 × 4 matrix A in (11.3.1*) (in format long) obtained using eig(A) are:

ans =
5.00000000001021
-0.00000000000007 + 0.00000023858927i
-0.00000000000007 - 0.00000023858927i
-3.00000000000993

That is, MATLAB computes two complex conjugate eigenvalues

±0.00000023858927i

which corresponds to an ε of -5.692483975913288e-14. On an IBM compatible 486 com-


puter using MATLAB version 4.2 the same computation yields eigenvalues

ans=


4.99999999999164
0.00000057761008
-0.00000057760735
-2.99999999999434

That is, on this computer MATLAB computes two real, near zero, eigenvalues

±0.00000057761

that corresponds to an ε of 3.336333121e-13. These errors are within round off error in
double precision computation.
A consequence of these kinds of error, however, is that when a matrix has multiple eigenval-
ues, we cannot use the command [V,D] = eig(A) with confidence. On the Sun SPARC-
station, this command yields a matrix

V =
-0.1652 0.0000 - 0.1818i 0.0000 + 0.1818i -0.1642
0.6726 -0.0001 + 0.6364i -0.0001 - 0.6364i 0.6704
0.6962 -0.0001 + 0.7273i -0.0001 - 0.7273i 0.6978
-0.1888 0.0000 - 0.1818i 0.0000 + 0.1818i -0.1915

that suggests that A has two complex eigenvectors corresponding to the ‘complex’ pair of
near zero eigenvalues. The IBM compatible yields the matrix

V =
-0.1652 0.1818 -0.1818 -0.1642
0.6726 -0.6364 0.6364 0.6704
0.6962 -0.7273 0.7273 0.6978
-0.1888 0.1818 -0.1818 -0.1915

indicating that MATLAB has found two real eigenvectors corresponding to the near zero
real eigenvalues. Note that the two eigenvectors corresponding to the eigenvalues 5 and −3
are correct on both computers.

Exercises

1. Write two different 4 × 4 Jordan normal form matrices all of whose eigenvalues equal 2 for which
the geometric multiplicity is two.


Answer: Two such matrices are:


   
[ 2 1 0 0 ]       [ 2 1 0 0 ]
[ 0 2 0 0 ]  and  [ 0 2 1 0 ]
[ 0 0 2 1 ]       [ 0 0 2 0 ]
[ 0 0 0 2 ]       [ 0 0 0 2 ]

2. How many different 6 × 6 Jordan form matrices have all eigenvalues equal to 3? (We say that
two Jordan form matrices are the same if they have the same number and type of Jordan block,
though not necessarily in the same order along the diagonal.)
Answer: There are 11 different Jordan normal form matrices, one for each partition of 6 into block sizes.
Solution: There is 1 matrix with six Jordan blocks (sizes 1+1+1+1+1+1). There is 1 matrix with
five Jordan blocks (2+1+1+1+1). There are 2 matrices with four Jordan blocks (3+1+1+1 and
2+2+1+1). There are 3 matrices with three Jordan blocks (4+1+1, 3+2+1, and 2+2+2). There
are 3 matrices with two Jordan blocks (5+1, 4+2, and 3+3). There is 1 Jordan matrix with one
Jordan block (size 6). Altogether, there are 11 different matrices.
3. A 5 × 5 matrix A has three eigenvalues equal to 4 and two eigenvalues equal to −3. List the
possible Jordan normal forms for A (up to similarity). Suppose that you can ask your computer
to compute the nullity of precisely two matrices. Can you devise a strategy for determining the
Jordan normal form of A? Explain your answer.
Answer: There are six different Jordan normal form matrices.
Solution: There are 3 different Jordan matrices associated with the eigenvalue 4 and 2 different
Jordan matrices associated with the eigenvalue −3. Altogether, there are 3 · 2 = 6 different Jordan
matrices. Two nullity computations suffice to determine which one occurs: nullity(A − 4I5 ) is the
number of Jordan blocks for the eigenvalue 4, which determines its block sizes (3, 2 + 1, or 1 + 1 + 1),
and nullity(A + 3I5 ) similarly determines the block sizes for the eigenvalue −3 (2 or 1 + 1).
4. Find the Jordan Normal Form for each matrix
     
C1 = [ 1  4 ; 1  1 ]        C2 = [ 0  −1 ; 1  −2 ]        C3 = [ −2  1 ; −10  4 ]

and for each j determine whether Ẋ = Cj X is a saddle, sink, or source.


Solution:

C1 : The characteristic polynomial of C1 is λ2 − 2λ − 3 = (λ + 1)(λ − 3). Hence, the eigenvalues of
C1 are −1 and 3. The JNF is [ −1  0 ; 0  3 ] and the origin is a saddle.


C2 : The characteristic polynomial of C2 is λ2 + 2λ + 1 = (λ + 1)2 . Hence the eigenvalues of C2 are
both −1. The JNF is [ −1  1 ; 0  −1 ] and the origin is a sink.
C3 : The characteristic polynomial of C3 is λ2 − 2λ + 2 and the eigenvalues of C3 are 1 ± i. Hence,
the JNF is [ 1  −1 ; 1  1 ] and the origin is a source.
5. An 8 × 8 real matrix A has three eigenvalues equal to 2, two eigenvalues equal to 1 + i, and one
zero eigenvalue. List the possible Jordan normal forms for A (up to similarity). Suppose that you
can ask your computer to compute the nullity of precisely two matrices. Can you devise a strategy
for determining the Jordan normal form of A? Explain your answer.
Answer: There are 6 possible Jordan normal form matrices which are determined by computing
nullity(A − 2I8 ) and nullity(A − (1 + i)I8 ).
Solution: Since A is real, the eigenvalues of A are 2 with multiplicity three; 1 + i with multiplicity
two; 1 − i with multiplicity two; and 0 with multiplicity 1. There are three possible Jordan blocks
associated with the eigenvalues 2 determined by the geometric multiplicity of the eigenvalue 2
and two possible Jordan blocks associated with the eigenvalues 1 ± i determined by the geometric
multiplicity of the eigenvalue 1+i. The geometric multiplicity of the eigenvalue λ is the dimension of
the null space of A − λI8 . Thus there are six possible Jordan normal forms and they are determined
by computing the dimensions of null space(A − 2I8 ) and null space(A − (1 + i)I8 ).

In Exercises 6 – 11 find the Jordan normal forms for the given matrix.
6. A = [ 2  4 ; 1  1 ].
Answer: The Jordan normal form of matrix A is

[ (3 + √17)/2        0
       0        (3 − √17)/2 ].

Solution: Matrix A has two distinct real eigenvalues at λ = (3 + √17)/2 and λ = (3 − √17)/2.
 
7. B = [ 9  25 ; −4  −11 ].
Answer: The Jordan normal form of matrix B is
 
−1 1
.
0 −1

Solution: Matrix B has one real eigenvalue at λ = −1 with one linearly independent eigenvector.


 
−5 −8 −9
8. C =  5 9 9 .
−1 −2 −1
Answer: The Jordan normal form of matrix C is
 
−1 0 0
 0 2 1 .
0 0 2

Solution: Matrix C has two real eigenvalues. The eigenvalue at −1 has algebraic multiplicity of 1,
and the eigenvalue at 2 has algebraic multiplicity of 2, and only 1 linearly independent eigenvector.
 
0 1 0
9. D =  0 0 1 .
1 1 −1
Answer: The Jordan normal form of matrix D is
 
−1 1 0
 0 −1 0 .
0 0 1

Solution: Matrix D has two real eigenvalues. The eigenvalue at −1 has algebraic multiplicity of 2
and geometric multiplicity 1, and the eigenvalue at 1 has multiplicity of 1.
 
2 0 −1
10. E =  2 1 −1 .
1 0 0
Answer: The Jordan normal form of matrix E is
 
1 1 0
 0 1 1 .
0 0 1

Solution: Matrix E has one eigenvalue at 1 of algebraic multiplicity of 3 and geometric multiplicity
1.
 
3 −1 2
11. F =  −1 2 −1 .
−1 1 0
Answer: The Jordan normal form of matrix F is
 
2 1 0
 0 2 0 .
0 0 1


Solution: Matrix F has two real eigenvalues. The eigenvalue at 2 has algebraic multiplicity of 2
and geometric multiplicity 1, and the eigenvalue at 1 has multiplicity of 1.
 
2 0 0
12. Compute etJ where J =  0 −1 1 .
0 0 −1
Answer:
e^{tJ} = [ e^{2t}   0        0
           0        e^{−t}   te^{−t}
           0        0        e^{−t} ].
Solution: The matrix J is in block diagonal form so we can compute the matrix exponential by
computing the matrix exponential of each block. In particular, let
 
−1 1
M= .
0 −1
Then

e^{tM} = e^{−t} e^{tN} = e^{−t}(I2 + tN) = e^{−t} [ 1  t ; 0  1 ],

where N = M + I2 is nilpotent. Therefore,

e^{tJ} = [ e^{2t}  0
           0       e^{tM} ]
       = [ e^{2t}  0        0
           0       e^{−t}   te^{−t}
           0       0        e^{−t} ].
 
2 1 0 0 0
 0 2 0 0 0 
13. Compute etJ where J =  .
 
 0 0 3 1 0 
 0 0 0 3 1 
0 0 0 0 3
   
Answer:
e^{tJ} = [ e^{2t}  te^{2t}   0       0         0
           0       e^{2t}    0       0         0
           0       0         e^{3t}  te^{3t}   (t^2/2)e^{3t}
           0       0         0       e^{3t}    te^{3t}
           0       0         0       0         e^{3t} ].
Solution: The matrix J has block diagonal form
 
J = [ L  0 ; 0  M ]

where

L = [ 2  1 ; 0  2 ]   and   M = [ 3  1  0
                                  0  3  1
                                  0  0  3 ].

Therefore,

e^{tJ} = [ e^{tL}  0 ; 0  e^{tM} ]

where

e^{tL} = e^{2t} [ 1  t ; 0  1 ]   and   e^{tM} = e^{3t} [ 1  t  t^2/2
                                                          0  1  t
                                                          0  0  1 ].

14. An n × n matrix N is nilpotent if N k = 0 for some positive integer k.

(a) Show that the matrix N defined in (11.2.2) is nilpotent.


(b) Show that all eigenvalues of a nilpotent matrix equal zero.
(c) Show that any matrix similar to a nilpotent matrix is also nilpotent.
(d) Let N be a matrix all of whose eigenvalues are zero. Use the Jordan normal form theorem to
show that N is nilpotent.

(a) The linear map N 2 satisfies

N 2 e1 = N 2 e2 = 0, and N 2 ej = ej−2 for j = 3, . . . , n.

The linear map N 3 satisfies

N 3 e1 = N 3 e2 = N 3 e3 = 0, and N 3 ej = ej−3 for j = 4, . . . , n.

By induction, N n ej = 0 for j = 1, . . . , n. So N n = 0, and N is nilpotent.


(b) Let λ be an eigenvalue of matrix N . Then, there exists a nonzero vector v such that N v = λv.
Therefore, N n v = λn v. If N is nilpotent, then λn v = 0. Since v is nonzero, λn = 0, which implies
λ = 0.
(c) Let M = P −1 N P be a matrix similar to nilpotent n × n matrix N . Then M n = (P −1 N P )n =
P −1 N n P . Since N is nilpotent, N n = 0. So M n = 0, and M is also nilpotent.
(d) By the Jordan normal form theorem, N is similar to some matrix W which is block diagonal and
has the matrices Nj along the diagonal, where each Nj is a Jordan block. By (a) of this problem,
Njn = 0 for each j. W n is block diagonal with blocks Njn . Since Njn = 0 for all j, W n = 0. Thus,
W is nilpotent, and, by (c) of this problem, N is also nilpotent.

15. Let A be a 3 × 3 matrix. Use the Cayley-Hamilton theorem to show that A−1 is a linear
combination of I3 , A, A2 . That is, there exist real scalars a, b, c such that

A−1 = aI3 + bA + cA2 .


The characteristic polynomial of a 3 × 3 matrix A has the form

pA (λ) = −λ3 + b2 λ2 + b1 λ + b0 .

The Cayley-Hamilton theorem states that

pA (A) = −A3 + b2 A2 + b1 A + b0 I3 = 0.

Therefore
A(−A2 + b2 A + b1 I3 ) = −b0 I3 .
Since pA (λ) = det(A − λI3 ), it follows that b0 = pA (0) = det(A). Since A is invertible, det(A) ≠ 0;
that is, b0 ≠ 0. Finally,

A−1 = −(1/b0 )(−A2 + b2 A + b1 I3 ) = aI3 + bA + cA2 ,
where a = −b1 /b0 , b = −b2 /b0 , and c = 1/b0 .

In Exercises 16 – 20, (a) determine the real Jordan normal form for the given matrix A, and (b)
find the matrix S so that S −1 AS is in real Jordan normal form.

16. (matlab)
A = [ −3   −4   −2   0
      −9  −39  −16  −7
      18   64   27  10
      15   86   34  18 ].        (11.3.2*)

(a) Answer: The Jordan normal form of A is


 
3 0 0 0
 0 1 0 0 
J = .
0 0 0 0 
0 0 0 −1

Solution: To find the Jordan normal form of matrix A, type eig(A) to find the eigenvalues.
Matrix A has four distinct eigenvalues, so the Jordan normal form is the diagonal matrix with
these eigenvalues along its diagonal.
(b)Answer: The diagonalizing matrix is

S =
-0.1387 -0.1543 -0.0000 -0.5774
0.1387 -0.3086 -0.4082 0.0000
0.1387 0.9258 0.8165 0.5774
-0.9707 -0.1543 0.4082 -0.5774


Solution: The columns of matrix S consist of the eigenvectors of A.

17. (matlab)
A = [   9   45   18    8
        0   −4   −1   −1
      −16  −69  −29  −12
       25  123   49   23 ].        (11.3.3*)

(a) Answer: The Jordan normal form of A is


 
2 0 0 0
 0 −1 1 0 
J = .
 0 0 −1 1 
0 0 0 −1

Solution: Type eig(A) to find the eigenvalues of matrix A. Not all eigenvalues are simple, so
type null(A - lambda*eye(4)) for each eigenvalue λ to find the number of linearly independent
eigenvectors associated to it. Matrix A has a simple eigenvalue at 2, and an eigenvalue at −1 with
algebraic multiplicity 3 and one linearly independent eigenvector.
(b) Answer: The diagonalizing matrix is

S =
-0.1387 -0.5902 -1.0165 -0.9837
0.1387 0.0000 -0.5902 0.1640
0.1387 0.5902 2.1970 0.0656
-0.9707 -0.5902 -0.4263 0.0328

Solution: The first column of S is v1 , the eigenvector associated with eigenvalue 2. To find the
other columns, note that the nullity of (A + I4 ) is 1, the nullity of (A + I4 )2 is 2, and the nullity of
(A + I4 )3 is 3. Then, select a vector in the null space of (A + I4 )3 that is not in the null space of (A + I4 )2 and label it v23 . Set v22 = (A + I4 )v23 and
v21 = (A + I4 )2 v23 . Then, S = (v1 |v21 |v22 |v23 ).

18. (matlab)
A = [  −5  −13  17   42
      −10  −57  66  187
       −4  −23  26   77
       −1   −9   9   32 ].        (11.3.4*)

(a) Answer: The Jordan normal form of A is


 
0 2 0 0
 −2 0 0 0 
J = .
 0 0 −2 1 
0 0 −1 −2


Solution: Type eig(A) to find that matrix A has distinct eigenvalues at ±2i and −2 ± i.
(b) Answer: The diagonalizing matrix is

S =
-0.2118 -0.0456 0.2211 0.0060
-0.8548 -0.2507 0.8762 0.1803
-0.3555 -0.0988 0.3529 0.0669
-0.1437 -0.0531 0.1440 0.0344

Solution: The columns of S are the real and imaginary parts of the complex eigenvectors of A, one pair for each complex conjugate pair of eigenvalues.

19. (matlab)
A = [  1   0   −9  18
      12  −7  −26  77
       5  −2  −13  32
       2  −1   −4  11 ].        (11.3.5*)

(a) Answer: The Jordan normal form of A is


 
−2 1 0 0
 0 −2 0 0 
J = .
 0 0 −2 1 
0 0 0 −2

Solution: Type eig(A) to find that −2 is the only eigenvalue of A. Then type null(A +
2*eye(4)) to find the number of linearly independent eigenvectors associated to eigenvalue λ = −2.
The eigenvalue has algebraic multiplicity 4 and geometric multiplicity 2. We then find that the
nullity of (A + 2I4 )2 is 4. Therefore, every generalized eigenvector of A lies in the null space of
(A + 2I4 )2 .
(b) Answer: The diagonalizing matrix is

S =
3 1 0 0
12 0 -5 1
5 0 -2 0
2 0 -1 0

Solution: To find S, find that A has two linearly independent eigenvectors associated to −2, and
that the null space of (A + 2I4 )2 is R4 . Then, select two vectors in R4 , in this case v12 = (1, 0, 0, 0)t and
v22 = (0, 1, 0, 0)t , and set v11 = (A + 2I4 )v12 and v21 = (A + 2I4 )v22 . Then, S = (v11 |v12 |v21 |v22 ).


20. (matlab)
A = [ −1  −1   1  0
      −3   1   1  0
      −3   2  −1  1
      −3   2   0  0 ].        (11.3.6*)

(a) Answer: The Jordan normal form of A is

J = [ −1   1   0  0
       0  −1   1  0
       0   0  −1  0
       0   0   0  2 ].

(b) Answer: The diagonalizing matrix is

S =
-1 1 0 0
-1 1 0 1
-1 0 1 1
-1 0 0 1

Solution: To find S, use MATLAB to see that A has two eigenvalues: −1 of algebraic multiplicity
three and 2 of multiplicity one. The geometric multiplicity of −1 is one. Just type null(A+eye(4))
to see that (1, 1, 1, 1) is the only eigenvector corresponding to this eigenvalue. Choose v3 in the null
space of (A + I4 )3 but not in the null space of (A + I4 )2 . For example let v3 = [0 0 1 0]'. Then
set v2 = (A+eye(4))*v3 and v1 = (A+eye(4))*v2. Finally, set v4 = null(A-2*eye(4)) and S =
[v1 v2 v3 v4].

21. (matlab)
A = [ 0   0  −1   2  2
      1  −2   0   2  2
      1  −1  −1   2  2
      0   0   0   1  2
      0   0   0  −1  3 ].        (11.3.7*)

(a) Answer: The Jordan normal form of A is


 
−1 1 0 0 0
 0 −1 1 0 0 
 
J =  0 0 −1 0 0 .

 0 0 0 2 −1 
0 0 0 1 2
(b) Answer:


S =
7.5 6.5 0 2 0
7.5 -1.0 1.0 2 0
7.5 -1.0 -6.5 2 0
0.0 0.0 0.0 2 0
0.0 0.0 0.0 1 1

Solution: A MATLAB calculation shows that the eigenvalues of A are −1 of algebraic multiplicity
three and geometric multiplicity one and simple eigenvalues 2 ± i. Choose v3 in the null space of
(A + I5 )3 but not in the null space of (A + I5 )2 . One choice is v3 = [0 1 -6.5 0 0]'. Let v2 =
(A+eye(5))*v3 and v1 = (A+eye(5))*v2. Next, set v4 equal to the real part of the eigenvector
associated to the eigenvalue 2 + i and let v5 be the complex part of that eigenvector. Finally, set
S = [v1 v2 v3 v4 v5].


11.4 *Markov Matrix Theory


In this appendix we use the Jordan normal form theorem to study the asymptotic dynamics
of transition matrices such as those of Markov chains introduced in Section 4.8.

The basic result is the following theorem.


Theorem 11.4.1. Let A be an n × n matrix and assume that all eigenvalues λ of A satisfy
|λ| < 1. Then for every vector v0 ∈ Rn

lim_{k→∞} A^k v0 = 0.        (11.4.1)

Proof Suppose that A and B are similar matrices; that is, B = SAS −1 for some invertible
matrix S. Then B k = SAk S −1 and for any vector v0 ∈ Rn (11.4.1) is valid if and only if

lim B k v0 = 0.
k→∞

Thus, when proving this theorem, we may assume that A is in Jordan normal form.
Suppose that A is in block diagonal form; that is, suppose
 
C 0
A= ,
0 D

where C is an ` × ` matrix and D is a (n − `) × (n − `) matrix. Then


 k 
C 0
Ak = .
0 Dk

So for every vector v0 = (w0 , u0 ) ∈ Rℓ × Rn−ℓ (11.4.1) is valid if and only if

lim_{k→∞} C^k w0 = 0   and   lim_{k→∞} D^k u0 = 0.

So, when proving this theorem, we may assume that A is a Jordan block.
Consider the case of a simple Jordan block. Suppose that n = 1 and that A = (λ) where λ
is either real or complex. Then
Ak v 0 = λ k v 0 .
It follows that (11.4.1) is valid precisely when |λ| < 1. Next, suppose that A is a nontrivial
Jordan block. For example, let
 
λ 1
A= = λI2 + N
0 λ


where N 2 = 0. It follows by induction that

A^k v0 = λ^k v0 + kλ^(k−1) N v0 = λ^k v0 + (k/λ) λ^k N v0 .
Thus (11.4.1) is valid precisely when |λ| < 1. The reason for this convergence is as follows.
The first term converges to 0 as before but the second term is the product of the three terms
k/λ, λ^k , and N v0 . The first increases to infinity, the second decreases to zero, and the third
is constant independent of k. In fact, geometric decay (λ^k , when |λ| < 1) always beats
polynomial growth. Indeed,

lim_{m→∞} m^j λ^m = 0        (11.4.2)

for any integer j. This fact can be proved using l'Hôpital's rule and induction.
So we see that when A has a nontrivial Jordan block, convergence is subtler than when A
has only simple Jordan blocks, as initially the vectors Av0 grow in magnitude. For example,
suppose that λ = 0.75 and v0 = (0, 1)t . Then A9 v0 = (0.901, 0.075)t is the first vector in
the sequence A^k v0 whose norm is less than 1; that is, A9 v0 is the first vector in the sequence
closer to the origin than v0 .
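The transient growth is easy to observe numerically; the short loop below (not from the text) prints the norms of the iterates for this choice of λ and v0.

lambda = 0.75;
A = [lambda 1; 0 lambda];
v = [0; 1];
for k = 1:10
    v = A*v;
    fprintf('k = %2d   |A^k v0| = %6.4f\n', k, norm(v))
end
% the norm first rises above 1, then decays; it drops below 1 at k = 9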
It is also true that (11.4.1) is valid for any Jordan block A and for all v0 precisely when
|λ| < 1. To verify this fact we use the binomial theorem. We can write a nontrivial Jordan
block as λIn + N where N k+1 = 0 for some integer k. We just discussed the case k = 1. In
this case
 
(λIn + N )^m = λ^m In + mλ^(m−1) N + (m choose 2) λ^(m−2) N^2 + · · · + (m choose k) λ^(m−k) N^k ,

where

(m choose j) = m!/(j!(m − j)!) = m(m − 1) · · · (m − j + 1)/j! .
To verify that
lim_{m→∞} (λIn + N )^m = 0

we need only verify that each term satisfies

lim_{m→∞} (m choose j) λ^(m−j) N^j = 0.


Such terms are the product of the three factors

m(m − 1) · · · (m − j + 1),   λ^m ,   and   (1/(j! λ^j)) N^j .
The first term has polynomial growth to infinity dominated by mj , the second term de-
creases to zero geometrically, and the third term is constant independent of m. The desired
convergence to zero follows from (11.4.2). 

Definition 11.4.2. The n × n matrix A has a dominant eigenvalue λ0 > 0 if λ0 is a simple


eigenvalue and all other eigenvalues λ of A satisfy |λ| < λ0 .
Theorem 11.4.3. Let P be a Markov matrix. Then 1 is a dominant eigenvalue of P .

Proof Recall from Chapter 4, Definition 4.8.1 that a Markov matrix is a square matrix
P whose entries are nonnegative, whose rows sum to 1, and for which some power P^k has
all positive entries. To prove this theorem we must show that all eigenvalues λ of P satisfy
|λ| ≤ 1 and that 1 is a simple eigenvalue of P .
Let λ be an eigenvalue of P and let v = (v1 , . . . , vn )t be an eigenvector corresponding to the
eigenvalue λ. We prove that |λ| ≤ 1. Choose j so that |vj | ≥ |vi | for all i. Since P v = λv,
we can equate the j th coordinates of both sides of this equality, obtaining

pj1 v1 + · · · + pjn vn = λvj .

Therefore,
|λ||vj | = |pj1 v1 + · · · + pjn vn | ≤ pj1 |v1 | + · · · + pjn |vn |,
since the pij are nonnegative. It follows that

|λ||vj | ≤ (pj1 + · · · + pjn )|vj | = |vj |,

since |vi | ≤ |vj | and rows of P sum to 1. Since |vj | > 0, it follows that |λ| ≤ 1.
Next we show that 1 is a simple eigenvalue of P . Recall, or just calculate directly, that
the vector (1, . . . , 1)t is an eigenvector of P with eigenvalue 1. Now let v = (v1 , . . . , vn )t be
an eigenvector of P with eigenvalue 1. Let Q = P k so that all entries of Q are positive.
Observe that v is an eigenvector of Q with eigenvalue 1, and hence that all rows of Q also
sum to 1.
To show that 1 is a simple eigenvalue of Q, and therefore of P , we must show that all
coordinates of v are equal. Using the previous estimates (with λ = 1), we obtain

|vj | = |qj1 v1 + · · · + qjn vn | ≤ qj1 |v1 | + · · · + qjn |vn | ≤ |vj |.        (11.4.3)


Hence
|qj1 v1 + · · · + qjn vn | = qj1 |v1 | + · · · + qjn |vn |.
This equality is valid only if all of the vi are nonnegative or all are nonpositive. Without
loss of generality, we assume that all vi ≥ 0. It follows from (11.4.3) that

vj = qj1 v1 + · · · + qjn vn .

Since qji > 0 and |vi | ≤ |vj |, this equality can hold only if all of the vi are equal. ∎
Theorem 11.4.4. (a) Let Q be an n × n matrix with dominant eigenvalue λ > 0 and
associated eigenvector v. Let v0 be any vector in Rn . Then

lim_{k→∞} (1/λ^k) Q^k v0 = cv,
for some scalar c.
(b) Let P be a Markov matrix and v0 a nonzero vector in Rn with all entries nonnegative.
Then
lim (P t )k v0 = V
k→∞

where V is the eigenvector of P t with eigenvalue 1 such that the sum of the entries in V is
equal to the sum of the entries in v0 .

Proof (a) After a similarity transformation, if needed, we can assume that Q is in Jordan
normal form. More precisely, we can assume that
 
(1/λ) Q = [ 1  0 ; 0  A ]

where A is an (n − 1) × (n − 1) matrix with all eigenvalues µ satisfying |µ| < 1. Suppose


v0 = (c0 , w0 ) ∈ R × Rn−1 . It follows from Theorem 11.4.1 that
 
lim_{k→∞} (1/λ^k) Q^k v0 = lim_{k→∞} ((1/λ) Q)^k v0 = lim_{k→∞} ( c0 , A^k w0 ) = c0 e1 .

Since e1 is the eigenvector of Q with eigenvalue λ, Part (a) is proved.


(b) Theorem 11.4.3 states that a Markov matrix has a dominant eigenvalue equal to 1. The
Jordan normal form theorem implies that the eigenvalues of P t are equal to the eigenvalues


of P with the same algebraic and geometric multiplicities. It follows that 1 is also a dominant
eigenvalue of P t . It follows from Part (a) that

lim (P t )k v0 = cV
k→∞

for some scalar c. But Theorem 4.8.3 in Chapter 4 implies that the sum of the entries in v0
equals the sum of the entries in cV which, by assumption, equals the sum of the entries in
V . Thus, c = 1. 
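Part (b) is easy to observe numerically. The sketch below uses a 2 × 2 Markov matrix chosen only for illustration (it is not from the text): the rows of P sum to 1, and the iterates of P^t applied to v0 converge to the eigenvector of P^t with eigenvalue 1 whose entries have the same sum as v0.

P  = [0.8 0.2; 0.3 0.7];     % a Markov matrix: nonnegative entries, rows sum to 1
v0 = [10; 0];                % entries sum to 10
(P')^20 * v0                 % already close to the limit V = (6, 4)^t
(P')^50 * v0                 % closer still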

Exercises

1. Let A be an n × n matrix. Suppose that

lim_{k→∞} A^k v0 = 0

for every vector v0 ∈ Rn . Show that the eigenvalues λ of A all satisfy |λ| < 1.
Let λ be an eigenvalue of A, and let v 6= 0 be an eigenvector associated to λ. Then,
 
0 = lim_{k→∞} A^k v = lim_{k→∞} λ^k v = ( lim_{k→∞} λ^k ) v.

Since v is nonzero, lim λk = 0. This is true only when |λ| < 1.


k→∞
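A numerical check of this exercise can also be run in MATLAB; the 2 × 2 matrix A below is a hypothetical example whose eigenvalues lie inside the unit circle.

    A  = [0.5 1; 0 -0.3];           % triangular, so the eigenvalues are 0.5 and -0.3
    abs(eig(A))                     % both absolute values are less than 1
    v0 = [1; 2];
    [norm(A^10*v0) norm(A^50*v0)]   % the iterates A^k*v0 shrink toward 0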

11.5 *Proof of Jordan Normal Form


We prove the Jordan normal form theorem under the assumption that the eigenvalues of
A are all real. The proof for matrices having both real and complex eigenvalues proceeds
along similar lines.
Let A be an n × n matrix, let λ1, . . . , λs be the distinct eigenvalues of A, and let Aj = A − λj In.

Lemma 11.5.1. The linear mappings Ai and Aj commute.

Proof Just compute


Ai Aj = (A − λi In)(A − λj In) = A^2 − λi A − λj A + λi λj In,
and
Aj Ai = (A − λj In)(A − λi In) = A^2 − λj A − λi A + λj λi In.
So Ai Aj = Aj Ai , as claimed. 

Let Vj be the generalized eigenspace corresponding to the eigenvalue λj.

Lemma 11.5.2. Ai : Vj → Vj is invertible when i ≠ j.

Proof Recall from Lemma 11.2.7 that Vj = null space(Aj^k) for some k ≥ 1. Suppose that
v ∈ Vj. We first verify that Ai v is also in Vj. Using Lemma 11.5.1, just compute

Aj^k Ai v = Ai Aj^k v = Ai 0 = 0.

Therefore, Ai v ∈ null space(Aj^k) = Vj.


Let B be the linear mapping Ai |Vj . It follows from Chapter 8, Theorem 8.2.3 that
nullity(B) + dim range(B) = dim(Vj ).
Now w ∈ null space(B) if w ∈ Vj and Ai w = 0. Since Ai w = (A − λi In )w = 0, it follows
that Aw = λi w. Hence
Aj w = (A − λj In )w = (λi − λj )w
and
Aj^k w = (λi − λj)^k w.
Since λi ≠ λj, it follows that Aj^k w = 0 only when w = 0. Hence the nullity of B is zero. We
conclude that
dim range(B) = dim(Vj ).
Thus, B is invertible, since the domain and range of B are the same space. 
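In MATLAB the generalized eigenspaces Vj can be computed as null spaces of powers of Aj, and the conclusion of Lemma 11.5.2 can be checked numerically. The 3 × 3 matrix below, with eigenvalues 2, 2, 5, is a hypothetical example, not taken from the text.

    A  = [2 1 0; 0 2 0; 0 0 5];
    A1 = A - 2*eye(3);  A2 = A - 5*eye(3);
    V1 = null(A1^3)        % basis for the generalized eigenspace of lambda = 2
    rank(A2*V1)            % equals 2 = dim V1, so A2 restricted to V1 is invertible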

Lemma 11.5.3. Nonzero vectors taken from different generalized eigenspaces Vj are linearly
independent. More precisely, if wj ∈ Vj and

w = w1 + · · · + ws = 0,

then wj = 0.

Proof Let Vj = null space(Aj^{kj}) for some integer kj. Let C = A2^{k2} ◦ · · · ◦ As^{ks}. Then

0 = Cw = Cw1,

since Aj^{kj} wj = 0 for j = 2, . . . , s. But Lemma 11.5.2 implies that C|V1 is invertible.
Therefore, w1 = 0. Similarly, all of the remaining wj have to vanish.


Lemma 11.5.4. Every vector in Rn is a linear combination of vectors in the generalized
eigenspaces Vj .

Proof Let W be the subspace of Rn consisting of all vectors of the form z1 + · · · + zs


where zj ∈ Vj . We need to verify that W = Rn . Suppose that W is a proper subspace.
Then choose a basis w1, . . . , wt of W and extend this set to a basis 𝒲 of Rn. In this basis
the matrix [A]_𝒲 has block form, that is,

[A]_𝒲 = \begin{pmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{pmatrix},

where A22 is an (n − t) × (n − t) matrix. The eigenvalues of A22 are eigenvalues of A. Since


all of the distinct eigenvalues and eigenvectors of A are accounted for in W (that is, in A11 ),
we have a contradiction. So W = Rn , as claimed. 
Lemma 11.5.5. Let 𝒱j be a basis for the generalized eigenspace Vj and let 𝒱 be the union
of the sets 𝒱j. Then 𝒱 is a basis for Rn.

Proof We first show that the vectors in 𝒱 span Rn. It follows from Lemma 11.5.4 that
every vector in Rn is a linear combination of vectors in the subspaces Vj. But each vector in
Vj is a linear combination of vectors in the basis 𝒱j. Hence, the vectors in 𝒱 span Rn.
Second, we show that the vectors in 𝒱 are linearly independent. Suppose that a linear
combination of vectors in 𝒱 sums to zero. We can write this sum as

w1 + · · · + ws = 0,

where wj is the linear combination of vectors taken from 𝒱j. Lemma 11.5.3 implies that each wj = 0.
Since 𝒱j is a basis for Vj, it follows that the coefficients in the linear combinations wj must
all be zero. Hence, the vectors in 𝒱 are linearly independent.
Finally, it follows from Theorem 5.5.3 of Chapter 5 that 𝒱 is a basis.
Lemma 11.5.6. In the basis 𝒱 of Rn guaranteed by Lemma 11.5.5, the matrix [A]_𝒱 is block
diagonal, that is,

[A]_𝒱 = \begin{pmatrix} A_{11} & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & A_{ss} \end{pmatrix},

where all of the eigenvalues of Ajj equal λj.

Proof It follows from Lemma 11.5.1 that A commutes with each Aj, and hence that
A : Vj → Vj. Suppose that vj ∈ 𝒱j. Then Avj is in Vj, and hence Avj is a linear combination
of vectors in 𝒱j. The block diagonalization of [A]_𝒱 follows. Since Vj = null space(Aj^{kj}),
it follows that all eigenvalues of Ajj equal λj.
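A short MATLAB sketch of Lemma 11.5.6, using the same hypothetical 3 × 3 matrix as above: stacking bases of the generalized eigenspaces into a matrix S produces a basis in which A becomes block diagonal.

    A  = [2 1 0; 0 2 0; 0 0 5];
    V1 = null((A - 2*eye(3))^3);    % basis for the generalized eigenspace of lambda = 2
    V2 = null((A - 5*eye(3))^3);    % basis for the generalized eigenspace of lambda = 5
    S  = [V1 V2];
    inv(S)*A*S              % block diagonal (up to roundoff): a 2 x 2 block and a 1 x 1 block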


Lemma 11.5.6 implies that to prove the Jordan normal form theorem, we must find a basis
in which the matrix Ajj is in Jordan normal form. So, without loss of generality, we may
assume that all eigenvalues of A equal λ0 , and then find a basis in which A is in Jordan
normal form. Moreover, we can replace A by the matrix A − λ0 In , a matrix all of whose
eigenvalues are zero. So, without loss of generality, we assume that A is an n × n matrix all
of whose eigenvalues are zero. We now sketch the remainder of the proof of Theorem 11.3.2.
Let k be the smallest integer such that Rn = null space(Ak ) and let

s = dim null space(Ak ) − dim null space(Ak−1 ) > 0.

Let z1 , . . . , zn−s be a basis for null space(Ak−1 ) and extend this set to a basis for
null space(Ak ) by adjoining the linearly independent vectors w1 , . . . , ws . Let

Wk = span{w1 , . . . , ws }.

It follows that Wk ∩ null space(Ak−1 ) = {0}.


We claim that the ks vectors wjℓ = A^ℓ(wj), where 0 ≤ ℓ ≤ k − 1 and 1 ≤ j ≤ s,
are linearly independent. We can write any linear combination of these vectors as
yk + · · · + y1, where yj ∈ A^{k−j}(Wk). Suppose that

yk + · · · + y1 = 0.

Then A^{k−1}(yk + · · · + y1) = A^{k−1}yk = 0. Therefore, yk is in Wk and in null space(A^{k−1}).
Hence, yk = 0. Similarly, A^{k−2}(yk−1 + · · · + y1) = A^{k−2}yk−1 = 0. But yk−1 = Aŷk where
ŷk ∈ Wk, and A^{k−1}ŷk = A^{k−2}yk−1 = 0, so ŷk ∈ null space(A^{k−1}). Hence, ŷk = 0 and
yk−1 = 0. Similarly, all of the yj = 0.
It follows from yj = 0 that a linear combination of the vectors A^{k−j}(w1), . . . , A^{k−j}(ws) is
zero; that is,

0 = β1 A^{k−j}(w1) + · · · + βs A^{k−j}(ws) = A^{k−j}(β1 w1 + · · · + βs ws).

Applying A^{j−1} to this expression, we see that

β1 w1 + · · · + βs ws

is in Wk and in null space(A^{k−1}). Hence,

β1 w1 + · · · + βs ws = 0.

Since the wj are linearly independent, each βj = 0, thus verifying the claim.
Next, we find the largest integer m so that

t = dim null space(A^m) − dim null space(A^{m−1}) > 0.

Proceed as above. Choose a basis for null space(A^{m−1}) and extend to a basis for
null space(A^m) by adjoining the vectors x1, . . . , xt. Adjoin the mt vectors A^ℓ xj, where
0 ≤ ℓ ≤ m − 1, to the set of vectors already constructed, and verify that these vectors are all
linearly independent. And repeat the process.
Eventually, we arrive at a basis for Rn = null space(Ak ).
In this basis the matrix [A]V is block diagonal; indeed, each of the blocks is a Jordan block,
since A(wjℓ) = wj(ℓ+1) for 0 ≤ ℓ < k − 1 and A(wj(k−1)) = A^k(wj) = 0.
Note the resemblance with (11.2.2).
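The chain construction just sketched can also be carried out in MATLAB for a small example. The matrices N and T below are hypothetical choices, not taken from the text: N is the 3 × 3 nilpotent Jordan block and T is an arbitrary invertible change of coordinates, so A = T N T^{−1} is a nilpotent matrix that is not already in normal form.

    N = [0 1 0; 0 0 1; 0 0 0];      % nilpotent Jordan block, N^3 = 0
    T = [1 1 0; 0 1 1; 1 0 1];      % an invertible change of coordinates
    A = T*N/T;                      % nilpotent; here k = 3 since A^2 ~= 0 but A^3 = 0
    w = [0; 0; 1];                  % any vector outside null(A^2) generates a full chain
    S = [A^2*w  A*w  w];            % chain vectors, ordered so that [A] is a Jordan block
    inv(S)*A*S                      % recovers the Jordan block N, up to roundoff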

12 Matlab Commands
† indicates an laode toolbox command not found in MATLAB .

Chapter 1: Preliminaries

Editing and Number Commands

quit Ends MATLAB session


; (a) At end of line the semicolon suppresses echo printing
(b) When entering an array the semicolon indicates a new row
↑ Displays previous MATLAB command
[] Brackets indicating the beginning and the end of a vector or a matrix
x=y Assigns x the value of y
x(j) Recalls j th entry of vector x
A(i,j) Recalls ith row, j th column of matrix A
A(i,:) Recalls ith row of matrix A
A(:,j) Recalls j th column of matrix A

Vector Commands

norm(x) The norm or length of a vector x


dot(x,y) Computes the dot product of vectors x and y
†addvec(x,y) Graphics display of vector addition in the plane
†addvec3(x,y) Graphics display of vector addition in three dimensions

Matrix Commands

A' (Conjugate) transpose of the matrix A


zeros(m,n) Creates an m × n matrix all of whose entries equal 0
zeros(n) Creates an n × n matrix all of whose entries equal 0
diag(x) Creates an n × n diagonal matrix whose diagonal entries
are the components of the vector x ∈ Rn
eye(n) Creates an n × n identity matrix

Special Numbers in MATLAB

pi The number π = 3.1415 . . .


acos(a) The inverse cosine of the number a

Chapter 2: Solving Linear Equations

Editing and Number Commands

format Changes the number display format to the standard five digit format
format long Changes display format to 15 digits
format rational Changes display format to rational numbers
format short e Changes display to five digit floating point numbers

Vector Commands

x.*y Componentwise multiplication of the vectors x and y


x./y Componentwise division of the vectors x and y
x.^y Componentwise exponentiation of the vectors x and y

Matrix Commands

A([i j],:) = A([j i],:)


Swaps ith and j th rows of matrix A
A\b Solves the system of linear equations associated with
the augmented matrix (A|b)
x = linspace(xmin,xmax,N)
Generates a vector x whose entries are N equally spaced points
from xmin to xmax
x = xmin:xstep:xmax
Generates a vector whose entries are equally spaced points from xmin to xmax
with stepsize xstep
[x,y] = meshgrid(XMIN:XSTEP:XMAX,YMIN:YSTEP:YMAX);
Generates two vectors x and y. The entries of x are values from XMIN to XMAX
in steps of XSTEP. Similarly for y.
rand(m,n) Generates an m × n matrix whose entries are randomly and uniformly chosen
from the interval [0, 1]


rref(A) Returns the reduced row echelon form of the m × n matrix A
the matrix after each step in the row reduction process
rank(A) Returns the rank of the m × n matrix A

Graphics Commands

plot(x,y) Plots a graph connecting the points (x(i), y(i)) in sequence


xlabel('labelx') Prints labelx along the x axis
ylabel('labely') Prints labely along the y axis
surf(x,y,z) Plots a three dimensional graph of z(j) as a function of x(j) and y(j)
hold on Instructs MATLAB to add new graphics to the previous figure
hold off Instructs MATLAB to clear figure when new graphics are generated
grid Toggles grid lines on a figure
axis('equal') Forces MATLAB to use equal x and y dimensions
view([a b c]) Sets viewpoint from which an observer sees the current 3-D plot
zoom Zoom in and out on 2-D plot. On each mouse click, axes change by a factor of 2

Special Numbers and Functions in MATLAB

exp(x) The number e^x where e = exp(1) = 2.7182 . . .
sqrt(x) The number √x
i The number √−1

Chapter 3: Matrices and Linearity

Matrix Commands

A*x Performs the matrix vector product of the matrix A with the vector x
A*B Performs the matrix product of the matrices A and B
size(A) Determines the numbers of rows and columns of a matrix A
inv(A) Computes the inverse of a matrix A

Program for Matrix Mappings

†map Allows the graphic exploration of planar matrix mappings

Chapter 4: Solving Ordinary Differential Equations

Special Functions in MATLAB

sin(x) The number sin(x)


cos(x) The number cos(x)

Matrix Commands

eig(A) Computes the eigenvalues of the matrix A


null(A) Computes the solutions to the homogeneous equation Ax = 0

Programs for the Solution of ODEs

†dfield8 Displays graphs of solutions to differential equations


†pline Dynamic illustration of phase line plots for single
autonomous differential equations
†pplane9 Displays phase space and time series plots for systems of autonomous differential equations

Chapter 7: Determinants and Eigenvalues

Matrix Commands

det(A) Computes the determinant of the matrix A


poly(A) Returns the characteristic polynomial of the matrix A
sum(v) Computes the sum of the components of the vector v
trace(A) Computes the trace of the matrix A
[V,D] = eig(A) Computes eigenvectors and eigenvalues of the matrix A

Chapter 8: Linear Maps and Changes of Coordinates

Vector Commands

†bcoord Geometric illustration of planar coordinates by vector addition


†ccoord Geometric illustration of coordinates relative to two bases

Chapter 10: Orthogonality

Matrix Commands

orth(A) Computes an orthonormal basis for the column space of the matrix A
[Q,R] = qr(A,0) Computes the QR decomposition of the matrix A

Graphics Commands

axis([xmin,xmax,ymin,ymax])
Forces MATLAB to use, in a two-dimensional plot, the interval
[xmin,xmax] on the x-axis and [ymin,ymax] on the y-axis
plot(x,y,'o') Same as plot but now the points (x(i), y(i)) are marked by
circles and no longer connected in sequence

Chapter 11: Matrix Normal Forms

Vector Commands

real(v) Returns the vector of the real parts of the components


of the vector v
imag(v) Returns the vector of the imaginary parts of the components
of the vector v

Index
R3 i, 97
subspaces, 345 imag, 591
Rn , 2 inf, 33
ej , 130 inv, 171, 475, 515
MATLAB Instructions linspace, 41
\, 31, 33, 94, 97, 106 map, 116, 571, 593
’, 12, 515 meshgrid, 45
*, 106, 159 norm, 18, 541, 556
.^, 48 null, 257, 311, 314, 321, 475, 556
:, 9 orth, 557
;, 7 pi, 92
[1 2 1], 6 plot, 41
[1; 2; 3], 8 poly, 463, 611
.*, 48 pplane8, 209, 210, 212, 364, 393
./, 48 prod, 408
A(3,4), 32 qr, 578
A([1 3],:), 60 rand, 39, 327
acos, 21 rank, 84, 332
addvec, 16 real, 591
addvec3, 17 rref, 80, 313
axis(’equal’), 42 sin, 197
bcoord, 514 size, 160
ccoord, 527 sqrt, 92
cos, 197 sum, 463
det, 448 surf, 45
diag, 11 trace, 463
dog, 117 view, 54
dot, 18, 21 xlabel, 42
eig, 250, 256, 475, 591, 613 ylabel, 42
exp(1), 92 zeros, 11
expm, 407 zoom, 52
eye, 11, 169
format acceleration, 426
long, 38, 94 amplitude, 545
rational, 38, 95 angle between vectors, 20
grid, 42 associative, 157, 286
hold, 42 autonomous, 356

back substitution, 56, 65 in Rn , 514


basis, 329, 343, 345, 491, 516, 525, 537, 583 in MATLAB , 514
construction, 343 standard, 511
orthonormal, 553, 555, 560, 561, 566, coupled system, 218
575, 578 Cramer’s rule, 186, 456
binomial theorem, 626
data points, 539
Cartesian plane, 2 data value, 540
Cayley Hamilton theorem, 419, 609 degrees, 21
center, 518 determinant, 180, 380, 438, 442, 446, 461,
change of coordinates, 517 481
characteristic polynomial, 243, 370, 379, computation, 442, 447
419, 458, 459, 584, 599, 609 in MATLAB , 448
inductive formula for, 445, 481
of triangular matrices, 459
of 2 × 2 matrices, 439, 446
roots, 458
of 3 × 3 matrices, 446
closed form solution, 383
uniqueness, 438, 443
closure
diagonalization
under addition, 286, 287
in MATLAB , 474
under scalar multiplication, 286, 287
differential equation
cofactor, 445, 459, 481
superposition, 229
collinear, 345
dilatation, 116, 583, 585
column, 2
dimension, 329, 330, 342, 344, 502, 511
rank, 501, 504
finite, 329
space, 501
infinite, 329
commutative, 158, 286 of Rn , 330
complex conjugation, 98 of null space, 332
complex diagonalizable, 585 direction field, 210
complex numbers, 96 discriminant, 244
complex valued solution, 369 distance
imaginary part, 369 between vectors, 535
real part, 369 Euclidean, 540
composition, 149 to a line, 536
of linear mappings, 491 to a subspace, 537
compound interest, 198 distributive, 158, 286
consistent, 57, 81 dot product, 18, 21, 43, 487, 553, 566
contraction, 116, 583 double precision, 614
coordinate system, 518
coordinates, 511–513, 525 echelon form, 63, 65, 78, 636
in R2 , 513 reduced, 66, 79, 313, 330, 442, 448

uniqueness, 84 general solution, 252, 358, 368, 373, 428


eigendirection, 218 generalized eigenspace, 602
eigenvalue, 230, 242–244, 437, 458, 461, geometric decay, 626
472, 586 Gram-Schmidt Orthonormalization, 560
complex, 246, 367, 369, 384, 458, 584 Gram-Schmidt orthonormalization, 561
distinct, 585 growth rate, 193
multiple, 613
dominant, 627 Hermitian inner product, 566
existence, 461 homogeneous, 142, 298, 299, 301, 311, 332,
of inverse, 462 426
of symmetric matrix, 566 Hooke’s law, 426
real, 231, 367, 458 hyperplane, 573
distinct, 472
equal, 371 identity mapping, 119
eigenvector, 230, 242, 245, 369, 473, 601 inconsistent, 57, 81, 314
generalized, 372, 517, 601, 602, 611 index, 601, 603
linearly independent, 372, 473, 608 inhomogeneous, 143, 166
real, 231, 372 initial condition, 229, 356
elementary row operations, 58, 330, 439, linear independence, 358
440, 442, 483 initial position, 428
in MATLAB , 59 initial value problem, 193, 227, 229, 251,
equilibrium, 209 253, 356, 367, 404
Euler’s formula, 368, 585 for second order equations, 428
expansion, 116, 583 initial velocity, 428
exponential integral calculus, 190
decay, 193 inverse, 165, 166, 180, 286, 444, 586
growth, 193 computation, 168
external force, 427 invertible, 165, 168, 181, 238, 379, 405,
445, 461, 462, 491, 492
first order
reduction to, 427 Jordan block, 601, 608, 609, 625
fitting of data, 539 Jordan normal form, 608, 612, 630
force, 426 basis for, 609
frequency
internal, 429 law of cosines, 19
function space, 286, 542 Law of Phythagorus, 535
subspace of, 290 least squares, 540
fundamental theorem of algebra, 460 approximation, 535
distance to a line, 536
Gaussian elimination, 57, 61 distance to a subspace, 537

fit to a quadratic polynomial, 541, 543 nilpotent, 619


fit to a sinusoidal function, 544 orthogonal, 555, 574
fitting of data, 539 product, 151
general fit, 542 scalar multiplication, 3
length, 17 square, 11, 609
linear, 56, 128, 487, 525 strictly upper triangular, 472
combination, 300, 311, 319, 333, 510 symmetric, 12, 566
fit to data, 539 transition, 270, 274, 278, 476, 525,
mapping, 128, 129, 131, 487, 511, 528 526, 625
construction, 488 transpose, 11, 159, 166, 180, 438, 504
matrix, 489 upper triangular, 12, 180, 574
regression, 541, 543 zero, 11, 119
linearly matrix vector product, 104
dependent, 319, 320, 341 in MATLAB , 105
independent, 319, 320, 329, 341–343, minimization problem, 540
345, 357, 369, 473, 553 multiplicity
algebraic, 599, 611, 612
Markov chain, 273, 274, 625 geometric, 599
mass, 426
Newton’s law of cooling, 203
matrix, 2, 9
Newton’s second law, 426
addition, 3
nilpotent, 619
associated to a linear map, 512
noninvertible, 165
augmented, 58, 63, 83, 306, 312
nonlinear, 72
block diagonal, 12
norm, 17
real, 587
normal form, 381, 516, 584, 585
coefficient, 29, 31
geometric, 587
diagonal, 11, 472, 587
normal vector, 44
exponential, 403
null space, 298, 301, 320, 461, 501, 502,
computation, 405
538, 556, 608
in MATLAB , 407 dimension, 333
Householder, 573, 574, 577 nullity, 333, 598
identity, 11, 83, 98, 119
invertible, 490 orthogonal, 553, 561
lower triangular, 12, 438, 458 orthonormal, 553
mappings, 116, 118, 129, 511 orthonormalization
Markov, 274–276, 278, 625, 627, 628 with MATLAB , 578
multiplication, 104, 149, 150, 157,
166, 566, 583 parabolic fit, 542
in MATLAB , 159 parallelogram, 21, 181

parallelogram law, 16 saddle, 210, 390, 518


particle motion, 426 scalar, 2
particular solution, 358 scalar multiplication, 2, 17, 127, 286, 288,
perpendicular, 19, 290, 345, 560 299
phase in MATLAB , 6
portrait, 393 scatter plot, 541, 542
for a saddle, 393 shear, 122, 183
for a sink, 393 similar, 379, 405, 462, 472, 516, 528, 588,
for a source, 393 608
space, 209 matrices, 379
pivot, 65, 78, 331 singular, 165, 458, 461
planar mappings, 116 sink, 211, 390
plane, 43, 345, 561 sinusoidal functions, 545
Polking, John, 210 sliding friction, 426
polynomial, 291 source, 211, 390
polynomial growth, 626 span, 299–301, 311, 314, 319, 329, 345, 540,
population dynamics, 201, 203 560
population model, 201 spanning set, 301, 329, 344, 578
principle of superposition, 142 spring, 426
product rule, 192 damped, 430
motion of, 426
QR decomposition, 574, 575, 578 undamped, 430
using Householder matrices, 575 spring equation, 430
stability
radians, 21 asymptotic, 209
range, 501, 502 stable
manifold, 394
rank, 84, 301, 330, 333, 347
orbit, 394
real block diagonal form
subspace, 288, 290, 298, 300, 501, 537, 553
in MATLAB , 591
of function space, 290
real diagonalizable, 472, 473
of polynomials, 329
reflection, 573
of solutions, 358
rotation, 117, 583, 585
proper, 290, 344
matrix, 391
substitution, 28
round off error, 614
superposition, 142, 229, 290
row, 2
system of differential equations, 208
equivalent, 79, 80, 84, 98, 456
constant coefficient, 227
rank, 501, 504
uncoupled, 208
reduction, 331, 443, 586
space, 501 time series, 212

trace, 243, 379, 462


trajectory, 209
trigonometric function, 291

uniqueness of solutions, 166, 404


unstable, 209
manifold, 394
orbit, 394

vector, 2, 286, 312


addition, 2, 16, 127, 286, 288, 299
complex, 566
coordinates, 511
in C 1 , 291
length, 17
norm, 17
space, 286, 287, 300, 342, 487, 525
subtraction, 3

zero mapping, 119


zero vector, 286, 288
