
MATH10202 Linear Algebra A 2019-20

These notes accompany the part of the course lectured by Mark Kambites.

What is Linear Algebra and Why Do I Need It?


Linear means “to do with lines”. Linear algebra is the algebra of linear equations, which are
equations whose solution sets (when drawn in space) are lines, and higher dimensional analogues of lines
called linear subspaces. Linear equations can be concisely and elegantly expressed using algebraic objects
called matrices, and the first part of the course is mostly concerned with these. The subject is important
for both pure mathematics and applications:

• Linear algebra expresses some of the fundamental objects of geometry in a formal algebraic way. It
allows us to use equational reasoning to understand geometry.

• Many real-world problems (both those of a geometric flavour and others) are modelled with linear
algebra, which provides a powerful toolkit to solve these problems.

Practicalities
Lecturer (first half). Professor Mark Kambites (email [email protected]).
Notes and Lectures. Notes for Mark’s part of the course will be provided with gaps for you to
complete in lectures, on Mark’s webpage at

personalpages.manchester.ac.uk/staff/Mark.Kambites/la.php

The notes form the definitive content of this part of the course, and you should expect to refer to them
frequently. If you need the notes in a different format due to a disability, please just let me know. The lectures
will explain the same material, but sometimes more informally.
Exercises. Exercise sheets will be handed out in lectures, and contain instructions on when to do the
exercises and what to hand in. They are an essential part of learning, so if you want to do well in the course
you need to schedule time to attempt all of them (not just the ones for handing in!). Solutions will be made
available when you have had a chance to attempt the questions yourself.
Office Hour. My office is Alan Turing 2.137. My office hour will generally be 15:30-16:30 on Tuesdays
during teaching weeks; it may sometimes be necessary to change this in which case details will be posted
on my website, so please check there before making a special journey. If you can’t make it to my office hour
but need a personal meeting then please email or ask after a lecture for an appointment.
Supervisions and Homework. The exercise sheets tell you which exercises to hand in for which weeks;
your supervision group leader will tell you exactly when and where to hand in. Attendance at supervisions,
and handing in homework, is compulsory. Please make sure you arrive on time with the
exercise sheets; group leaders may mark you absent if you come late or not properly prepared.
Assessment. The assessment for the course comprises homework and supervision attendance (10%) and a
final exam (90%). The supervisions/homework for the first 6 weeks and Section A of the exam will cover
my half of the course.
Feedback. Please let me know how you are finding the course, and especially if you think there are any
problems with it. Feel free to speak to me after a lecture, email me, come along in my office hour, or even
slip an anonymous note under my office door!
Books. The course is self-contained and full notes will be supplied, so you should not need to refer to any
books. But if you would like an alternative viewpoint, the course webpage contains some suggested texts.

Prerequisite Material
This course builds directly upon MATH10101 Foundations of Pure Mathematics from last
semester. You will need to understand many of the concepts from that course, including:

• proofs (including by contradiction and induction);

• sets (ways of defining them, cardinalities);

• functions (and their properties such as injectivity and surjectivity);

• modular arithmetic.

If you can’t remember what the above terms mean, you should go back to your 10101 notes and reread the
definitions. Exercise Sheet 0 gives a chance to practice working with these.

Remark. You’ll discover this semester that mathematics at university level is not an “examine-and-
forget” subject. Each stage builds upon what you have learnt before, and demands progressively deeper
understanding of previous material. Your main aim in studying this course — and my aim in lecturing it
— is therefore not the exam, but a deep understanding of the material which will serve you well next year
and beyond. If together we can achieve that, the exam will take care of itself!

1 Matrices
Matrices are a useful and elegant way to study linear equations. For teaching purposes it is actually easier to
introduce matrices first (which we do in this chapter) and linear equations afterwards (in the next chapter).

Definition. A matrix is a rectangular array of real numbers1 , for example:

A = \begin{pmatrix} 2 & 5 \\ \sqrt{3} & 0.8 \end{pmatrix} \quad or \quad B = \begin{pmatrix} 1 & 6 & 3 \\ 2 & 1 & 9 \end{pmatrix} \quad or \quad C = \begin{pmatrix} 1 \\ 2 \\ -3.7 \\ 4 \end{pmatrix}.

Size. A matrix has clearly defined numbers of rows and columns, which together make up its size. A
matrix with p rows and q columns is called a p × q matrix. Of the above examples:

• A is a 2 × 2 matrix (2 rows and 2 columns – we say that it is square);

• B is a 2 × 3 matrix (2 rows and 3 columns);

• C is a 4 × 1 matrix (4 rows and 1 column).

Matrix Entries. If a matrix is called A, we write Aij to denote the entry in the ith row and jth column of
A. For example, in the matrix A above, the entries are A11 = 2, A12 = 5, A21 = √3 and A22 = 0.8. (Some
people write aij instead of Aij .)

1
For now! Later we shall see that matrices (“matrices” is the plural of “matrix”) can also have other things in them, such
as complex numbers or even more abstract mathematical objects, and much of what we are doing will still work. The objects
which form the entries of a matrix are called the scalars (so for now “scalar” is just another word for “real number”).

Equality of Matrices. Two matrices A and B are equal if and only if
• they have the same number of rows and
• they have the same number of columns and
• corresponding entries are equal, that is Aij = Bij for all appropriate2 i and j.

1.1 Addition of Matrices


Two matrices can be added if they have the same number of rows (as each other) and the same number of
columns (as each other). If so, to get each entry of A + B we just add the corresponding entries of A and
B. Formally, A + B is the matrix with entries given by

(A + B)ij = Aij + Bij

for all i and j.


   
Example. The matrices A = \begin{pmatrix} 2 & 1 \\ 6 & 1 \\ 7 & 1 \end{pmatrix} and B = \begin{pmatrix} 7 & 2 \\ 1 & 4 \\ 3 & 9 \end{pmatrix} are both 3 × 2 matrices, so they can be added.
The sum is

A + B = \begin{pmatrix} 2+7 & 1+2 \\ 6+1 & 1+4 \\ 7+3 & 1+9 \end{pmatrix} = \begin{pmatrix} 9 & 3 \\ 7 & 5 \\ 10 & 10 \end{pmatrix}.

1.2 Scaling of Matrices


Any matrix can be multiplied by any scalar (real number), to give another matrix of the same size. To do
this, just multiply each entry of the matrix by the scalar. Formally, if A is a matrix and λ ∈ R then the
matrix λA has entries given by
(λA)ij = λ(Aij )
for all i and j.
     
Example. 7 \begin{pmatrix} -1 & 4 \\ 3 & 2 \end{pmatrix} = \begin{pmatrix} 7 \times (-1) & 7 \times 4 \\ 7 \times 3 & 7 \times 2 \end{pmatrix} = \begin{pmatrix} -7 & 28 \\ 21 & 14 \end{pmatrix}.

Notation. We write −A as a shorthand for (−1)A.

1.3 Subtraction of Matrices


Two matrices of the same size (the same number of rows and same number of columns) can be subtracted.
Again, the operation is performed by subtracting corresponding components. Formally,

(A − B)ij = Aij − Bij .

   
Example. Let A = \begin{pmatrix} 2 & 7 \\ 7 & -1 \\ -1 & 2 \end{pmatrix} and B = \begin{pmatrix} -1 & 3 \\ 0 & 4 \\ 2 & 0 \end{pmatrix}.
Since both are 3 × 2 matrices, the subtraction A − B is meaningful and we have

A − B = \begin{pmatrix} 2-(-1) & 7-3 \\ 7-0 & -1-4 \\ -1-2 & 2-0 \end{pmatrix} = \begin{pmatrix} 3 & 4 \\ 7 & -5 \\ -3 & 2 \end{pmatrix}.
2
By “appropriate” I mean, of course, that i and j have the right values to be “indices” (“indices” is the plural of “index”)
into the matrix: in other words, i is an integer between 1 and the number of rows, and j is an integer between 1 and the number
of columns. Once the size of a matrix is known it should be obvious what numbers are allowed to be row and column indices.
For brevity I will often just write “for all i and j” to mean “for all integers i between 1 and the number of rows in the matrix
and j between 1 and the number of columns in the matrix”.
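Since addition, scaling and subtraction are all defined entry by entry, they are straightforward to express in code. Here is a minimal Python sketch (my own illustration, not part of the official notes), with a matrix stored as a list of its rows:

def add(A, B):
    # (A + B)_ij = A_ij + B_ij; A and B must have the same size
    return [[a + b for a, b in zip(rowA, rowB)] for rowA, rowB in zip(A, B)]

def scale(lam, A):
    # (lam A)_ij = lam * A_ij
    return [[lam * a for a in row] for row in A]

def subtract(A, B):
    # A - B = A + (-1)B, as in Proposition 1.1 below
    return add(A, scale(-1, B))

A = [[2, 7], [7, -1], [-1, 2]]
B = [[-1, 3], [0, 4], [2, 0]]
print(add(A, B))        # [[1, 10], [7, 3], [1, 2]]
print(subtract(A, B))   # [[3, 4], [7, -5], [-3, 2]], matching the subtraction example above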

Proposition 1.1. For any matrices A and B of the same size p × q, we have

A − B = A + (−B).

Proof. Notice first that (−B) = (−1)B is also a p × q matrix, so the sum A + (−B) makes sense. Now for
each i and j we have

(A + (−B))ij = Aij + (−B)ij = Aij + (−1)Bij = Aij − Bij = (A − B)ij .

So A − B and A + (−B) are the same size, and agree in every entry, which means they are equal.

1.4 Multiplication of Matrices


Two matrices A and B can, under certain circumstances, be multiplied together to give a product AB.
The condition for this to be possible is that the number of columns in A should equal the number of rows
in B.
If A is m × n and B is n × p then AB can be formed and will be m × p:

A (m × n)  ×  B (n × p)  =  AB (m × p).

The entry in the ith row and the jth column of AB depends on the entries in the ith row of A and the
entries in the jth column of B. Formally,
(AB)_{ij} = A_{i1}B_{1j} + A_{i2}B_{2j} + A_{i3}B_{3j} + \dots + A_{in}B_{nj} = \sum_{k=1}^{n} A_{ik}B_{kj}.

The way entries of the factors combine to form entries of the product can be visualised as follows: the ith
row of A, namely (A_{i1}, A_{i2}, A_{i3}, \dots, A_{in}), is paired with the jth column of B, namely
(B_{1j}, B_{2j}, B_{3j}, \dots, B_{nj})^T; the corresponding entries are multiplied together and the products are
added up, giving the element (AB)_{ij} in the ith row and jth column of the product.
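To make the formula concrete, here is a minimal Python sketch of matrix multiplication on plain lists of rows (my own illustration; the function name matmul and the error message are my choices, not anything from the notes):

def matmul(A, B):
    """Product of an m x n matrix A and an n x p matrix B, both given as lists of rows."""
    m, n, p = len(A), len(B), len(B[0])
    if len(A[0]) != n:
        raise ValueError("number of columns of A must equal number of rows of B")
    # (AB)_ij is the sum over k of A_ik * B_kj (indices in the code run from 0, not 1)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)] for i in range(m)]

# The first worked example below: A is 3 x 2 and B is 2 x 1, so AB is 3 x 1.
print(matmul([[2, 1], [-3, 7], [1, 5]], [[5], [-2]]))   # [[8], [-29], [-5]]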

Example. Consider the matrices


 
A = \begin{pmatrix} 2 & 1 \\ -3 & 7 \\ 1 & 5 \end{pmatrix} and B = \begin{pmatrix} 5 \\ -2 \end{pmatrix}.

A is 3 × 2 and B is 2 × 1. As the number of columns in A is the same as the number of rows in B, AB
exists. AB will be 3 × 1. The three entries of AB are computed as follows:

• The top entry of AB will be computed from top row of A and the single column of B: it is 2 × 5 +
1 × (−2) = 8.

• The middle entry of AB will be computed from the middle row of A and the single column of B: it is
−3 × 5 + 7 × (−2) = −29.

• The bottom entry of AB will be computed from the bottom row of A and the single column of B: it
is 1 × 5 + 5 × (−2) = −5.

 
So AB = \begin{pmatrix} 8 \\ -29 \\ -5 \end{pmatrix}.

Example. Consider the matrices


   
C = \begin{pmatrix} 2 & 1 \\ -5 & 3 \end{pmatrix} and D = \begin{pmatrix} 4 & -2 \\ 3 & 7 \end{pmatrix}.

Both C and D are 2 × 2. As the number of columns in C equals the number of rows in D, CD exists, and
it will also be 2 × 2.

  
CD = \begin{pmatrix} 2 & 1 \\ -5 & 3 \end{pmatrix} \begin{pmatrix} 4 & -2 \\ 3 & 7 \end{pmatrix}
   = \begin{pmatrix} 2 \times 4 + 1 \times 3 & 2 \times (-2) + 1 \times 7 \\ -5 \times 4 + 3 \times 3 & (-5) \times (-2) + 3 \times 7 \end{pmatrix}
   = \begin{pmatrix} 11 & 3 \\ -11 & 31 \end{pmatrix}

Example. Consider the matrices


 
E = \begin{pmatrix} 2 & 6 & -4 & 3 \\ 1 & 2 & 8 & -5 \\ -3 & 7 & 2 & 9 \\ 1 & 2 & 2 & 4 \\ -2 & 0 & 7 & 4 \end{pmatrix} and F = \begin{pmatrix} 7 & 8 & -1 \\ -1 & 4 & 1 \\ 2 & -3 & 2 \\ 2 & 2 & -2 \end{pmatrix}.

Then EF will be a 5 × 3 matrix, and for example the entry (EF)_{42} can be calculated using the 4th row of
E, which is (1, 2, 2, 4), and the 2nd column of F , which is (8, 4, -3, 2)^T, so:

(EF)_{42} = 1 × 8 + 2 × 4 + 2 × (−3) + 4 × 2 = 18.

Tip. The only way to master matrix operations — especially multiplication — is to practice! When you
have finished the examples on Exercise Sheet 1, try making up examples of your own and multiplying them.
Get a friend to do the same ones and check you get the same answer.

Exercise. Prove that if A and B are square matrices (same number of rows and columns) of the same size
as each other, then AB is always defined. What size will AB be?

1.5 Properties of Matrix Operations


The operations on matrices share a lot of properties with the corresponding operations on numbers. The
following theorem gives some of the most notable ones:

Theorem 1.2. Let A, B and C be matrices such that the given operations are defined, and λ and µ be
numbers. Then

(i) A + B = B + A;

(ii) A + (B + C) = (A + B) + C;

(iii) A(BC) = (AB)C;

(iv) A(B + C) = AB + AC;

(v) (B + C)A = BA + CA;

(vi) A(B − C) = AB − AC;

(vii) (B − C)A = BA − CA;

(viii) λ(B + C) = λB + λC;

(ix) λ(B − C) = λB − λC;

(x) (λ + µ)C = λC + µC;

(xi) (λ − µ)C = λC − µC;

(xii) λ(µC) = (λµ)C;

(xiii) λ(BC) = (λB)C = B(λC).

Proof. We prove (i) and (iv) to exemplify the methods. Some of the other proofs are on Exercise Sheet 1.

(i) For A + B (and B + A) to be defined, A and B must be the same shape, say m × n. Now A + B and
B + A are both m × n, and for each i and j we have

(A + B)ij = Aij + Bij = Bij + Aij = (B + A)ij .

So A + B and B + A are the same shape with exactly the same entries, which means A + B = B + A.

(iv) For the given operations to be defined, A must be an m × r matrix, and B and C both r × n matrices.
Now A(B + C) and AB + AC are both m × n matrices, and for each i and j we have
[A(B + C)]_{ij} = \sum_{l=1}^{r} A_{il}(B + C)_{lj} = \sum_{l=1}^{r} A_{il}(B_{lj} + C_{lj}) = \sum_{l=1}^{r} A_{il}B_{lj} + \sum_{l=1}^{r} A_{il}C_{lj}
                = (AB)_{ij} + (AC)_{ij} = (AB + AC)_{ij}.

Remark. Property (i) in Theorem 1.2 is called the commutativity law for matrix addition; notice that
there isn’t a corresponding statement for matrix multiplication - see Exercise Sheet 1. Properties (ii) and
(iii) are associativity laws (for matrix addition and matrix multiplication respectively). Properties (iv)-(xi)
are distributivity laws.
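As a quick illustration of the missing commutativity law, here is a small self-contained Python check (the two matrices are my own choice of example, not ones from the exercise sheet):

def matmul(A, B):
    # entrywise definition of the product of two square matrices of the same size
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

C = [[0, 1], [0, 0]]
D = [[0, 0], [1, 0]]
print(matmul(C, D))  # [[1, 0], [0, 0]]
print(matmul(D, C))  # [[0, 0], [0, 1]] -- CD != DC, so matrix multiplication is not commutative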

1.6 Diagonal Matrices, Identities and Inverses

The Main Diagonal. If A is a square matrix (say n × n) then the main diagonal (sometimes just called
the diagonal) is the diagonal from top left to bottom right of the matrix. In other words, it is the collection
of entries whose row number is the same as their column number.
A matrix is called diagonal if all the entries not on the diagonal are 0. (Some, or even all, of the
diagonal entries may also be 0.) For example,
 
\begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix} \quad and \quad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}

are diagonal matrices but

 
\begin{pmatrix} -2 & 0 \\ 5 & 2 \end{pmatrix}

is not diagonal because of the 5 which lies off the main diagonal.

Identity Matrices. A square matrix with 1’s on the main diagonal and 0’s everywhere else is called an
identity matrix. The n × n identity matrix is denoted In . For example:
 
I_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \quad while \quad I_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.

It has the important property that


AIn = A and/or In A = A

whenever A is the right size for these products to be defined. (Exercise: prove this.)

Inverses. Let A be an n × n square matrix. An inverse of A is an n × n matrix A−1 such that

A A−1 = A−1 A = In .

Not every matrix has an inverse. Matrices with an inverse are called invertible; those which are not
invertible are called singular.

Exercise. What is the inverse of the identity matrix In ?

Lemma 1.3. (i) Suppose A is an invertible n × n matrix. Then the inverse of A is unique.

(ii) If E and D are both invertible n × n matrices then ED is invertible.

Proof. (i) Suppose B and C are both inverses of A. Then AB = BA = In and AC = CA = In . Now
using Theorem 1.2(iii):

B = In B = (CA)B = C(AB) = CIn = C.

(ii) Let E −1 and D −1 be inverses of E and D respectively. Then again by Theorem 1.2(iii):

(ED)(D −1 E −1 ) = E(DD −1 )E −1 = EIn E −1 = EE −1 = In .

A similar argument (exercise!) gives (D −1 E −1 )(ED) = In . This shows that the matrix D −1 E −1 is
an inverse of ED (and so by part (i), the inverse of ED).

Notation. If A is an invertible matrix, we write A−1 for its inverse.


There is a simple way to describe inverses of 2 × 2 matrices:

Theorem 1.4. Let

A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}

be a 2 × 2 matrix. Then A is invertible if and only if ad − bc ≠ 0, and if it is, then

A^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.

Remark. The value ad − bc is a very important number associated to the matrix A. It is called the
determinant of A, and we will see later (Chapter 4) how to generalise the idea to bigger matrices.

Exercise. Determine which of the following matrices are invertible, and for those which are find the inverse:
       
\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \quad \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \begin{pmatrix} -1 & 3 \\ -2 & 6 \end{pmatrix}, \quad \begin{pmatrix} 3 & 1 \\ 3 & 1 \end{pmatrix}.
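Here is a small Python sketch of the formula in Theorem 1.4 (the function name inverse2x2 is my own; fractions.Fraction keeps the entries exact, and the matrices used in the check are my own examples rather than the ones in the exercise):

from fractions import Fraction

def inverse2x2(A):
    """Inverse of a 2x2 matrix [[a, b], [c, d]], or None if ad - bc = 0 (A singular)."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        return None  # by Theorem 1.4, A is not invertible
    # the formula 1/(ad - bc) * [[d, -b], [-c, a]]
    return [[Fraction(d, det), Fraction(-b, det)], [Fraction(-c, det), Fraction(a, det)]]

print(inverse2x2([[2, 5], [1, 3]]))   # [[3, -5], [-1, 2]]
print(inverse2x2([[1, 2], [2, 4]]))   # None: the determinant is 0, so no inverse exists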

The Zero Matrix. A matrix is called a zero matrix if all of its entries are 0. For example:
 
\begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} \quad or \quad \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.

Clearly there is exactly one zero matrix of each size: we write 0p×q for the p × q matrix all of whose entries
are 0, so those above are 04×1 and 02×2 .

1.7 Transpose and Symmetry


The transpose of a p × q matrix is the q × p matrix obtained by swapping over the rows and columns. The
transpose of a matrix A is written AT , and formally it is the q × p matrix whose entries are given by

(AT )ij = Aji

for all i and j. For example:


\begin{pmatrix} 1 \\ 7 \\ 2 \\ 4 \end{pmatrix}^T = \begin{pmatrix} 1 & 7 & 2 & 4 \end{pmatrix} \quad while if A = \begin{pmatrix} 2 & 6 \\ -1 & 3 \\ 1 & 1 \end{pmatrix} then A^T = \begin{pmatrix} 2 & -1 & 1 \\ 6 & 3 & 1 \end{pmatrix}.
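Computing a transpose directly from the definition is a one-liner; the following Python sketch (the function name transpose is my own) simply swaps the roles of the row and column indices:

def transpose(A):
    """Transpose of a p x q matrix given as a list of rows: (A^T)_ij = A_ji."""
    p, q = len(A), len(A[0])
    return [[A[i][j] for i in range(p)] for j in range(q)]

A = [[2, 6], [-1, 3], [1, 1]]
print(transpose(A))  # [[2, -1, 1], [6, 3, 1]], a 2 x 3 matrix, as in the example above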

Theorem 1.5. Let A and B be matrices such that the given operations are defined. Then:

(i) (AT )T = A;

(ii) (A + B)T = AT + B T ;

(iii) (AB)T = (B T AT ).

Proof.
(i) Clear.
(ii) For A + B to be defined we need A and B to have the same size, say m × n, in which case (A + B)T and
AT + B T are both defined and n × m. Now for each i and j,

((A + B)T )ij = (A + B)ji = Aji + Bji = (AT )ij + (B T )ij = (AT + B T )ij .

So (A + B)T and (AT + B T ) are the same size with the same entries, which means they are equal.
(iii) For AB to be defined it must be that A is m × p and B is p × n, in which case AT is p × m and B T
is n × p, so that B T AT is defined. It is easy to check that (AB)T and B T AT are both n × m. Now for
1 ≤ i ≤ n and 1 ≤ j ≤ m we have3
((AB)^T)_{ij} = (AB)_{ji} = \sum_{k=1}^{p} A_{jk}B_{ki} = \sum_{k=1}^{p} (A^T)_{kj}(B^T)_{ik} = \sum_{k=1}^{p} (B^T)_{ik}(A^T)_{kj} = (B^T A^T)_{ij}.

3
This is a case where just saying “appropriate i and j” might not be entirely clear, so we are careful to specify!

Theorem 1.6. If A is an invertible matrix then AT is invertible and (AT )−1 = (A−1 )T .

Proof. Using Theorem 1.5(iii)


AT (A−1 )T = (A−1 A)T = (In )T = In
(A−1 )T AT = (AA−1 )T = (In )T = In

Definition. A matrix A is called symmetric if AT = A, or skew symmetric if AT = −A.


   
Example. \begin{pmatrix} 1 & 9 & 4 \\ 9 & 3 & -2 \\ 4 & -2 & 7 \end{pmatrix} is symmetric. \begin{pmatrix} 0 & 9 & 4 \\ -9 & 0 & -2 \\ -4 & 2 & 0 \end{pmatrix} is skew symmetric.

Remark. Notice that a matrix which is symmetric or skew symmetric must be square, and a skew symmetric
matrix must have 0s on the main diagonal.

Theorem 1.7. (i) If A is any matrix then AAT and AT A are symmetric matrices.

(ii) If A is a square matrix then A + AT is symmetric and A − AT is skew symmetric.

(iii) If A is invertible and symmetric then A−1 is symmetric

Proof. (i) and (ii) follow easily from Theorem 1.5, and (iii) follows from Theorem 1.6. (exercise: do
this!).

2 Systems of Linear Equations
A system of equations of the form
x + 2y = 7
2x − y = 4
or
5p − 6q + r = 4
2p + 3q − 5r = 7
6p − q + 4r = −2
is called a system of linear equations.

Definition. An equation involving certain variables is linear if each side of the equation is a sum of
constants and constant multiples of the variables.

Examples. The equations 2x = 3y + 3, 3 = 0 and x + y + z = 0 are linear.

Non-examples. The equations y = x2 and 3xy + z + 7 = 2 are not linear since they involve products of
variables.

2.1 Solutions to Systems of Equations


A solution to a system of equations is a way of assigning a value to each variable, so as to make all the
equations true at once.

Example. The system of equations . . .

4x − 2y = 22
9x + 3y = 12

. . . has a solution x = 3 and y = −5. (In fact in this case this is the only solution.)

Remark. In general, a system of linear equations can have no solutions, a unique solution, or infinitely
many solutions. It is not possible for the system to have, for example, exactly 2 solutions; this is a higher
dimensional generalisation of the fact that 2 straight lines cannot meet in exactly 2 points (recall Exercise
Sheet 0).

Definition. The set of all solutions is called the solution set or solution space of the system.

Warning. When we talk about solving a system of equations, we mean describing all the solutions (or
saying that there are none, if the solution set is empty), not just finding one possible solution!

2.2 2-dimensional Geometry of Linear Equations


What can solution spaces look like? Consider a linear equation in two variables x and y, say:

3x + 4y = 4.

Each solution consists of an x-value and a y-value. We can think of these as the coordinates of a point in
2-dimensional space. For example, a solution to the above equation is x = −4 and y = 4. This solution
gives the point (−4, 4).
In general, the set of solutions to a linear equation with variables x and y forms a line in 2-dimensional
space. So each equation corresponds to a line — we say that the equation defines the line.

Systems of Equations. Now suppose we have a system of two such equations. A solution to the system
means an x-value and a y-value which solve all the equations at once. In other words, a solution

corresponds to a point which lies on all the lines. So in geometric terms, the solution set to a system of
linear equations with variables x and y is an intersection of lines in the plane.

Question. Recall from Exercise Sheet 0 what an intersection of two lines in the plane can look like:

• Usually4 it is a single point. This is why 2 equations in 2 variables usually have a unique solution.

• Alternatively, the lines could be distinct but parallel. Then the intersection is the empty set. This
is why 2 equations in 2 variables can sometimes have no solutions.

• Or then again, the two lines could actually be the same. In this case the intersection is the whole
line. This is why 2 equations in 2 variables can sometimes have infinitely many solutions.

Similarly, if we have a system of more than 2 equations, we are looking for the intersection of all the
lines (in other words, the set of points in the plane which lie on all the lines). The intersection of 3 lines is
“usually” empty but could also be a single point or a line.

2.3 Higher Dimensional Geometry of Linear Equations


Now suppose we have an equation in the variables x, y and z, say

3x + 4y + 7z = 4.

This time we can consider a solution as a point in 3-dimensional space. The set of all solutions (assuming
there are some) forms a plane in space. The solution set for a system of k linear equations will be an
intersection of k planes.

Exercise. What can the intersection of 2 planes in 3-space look like? Try to imagine all the possibilities,
as we did for lines in the previous section. How do they correspond to the possible forms of solution sets of
2 equations with 3 variables?

Exercise. Now try to do the same for intersections of 3 planes. (Recall that the intersection of three sets
is the set of points which lie in all three). How do the possible geometric things you get correspond to the
possible forms of solution sets for 3 equations with 3 variables?
In yet higher dimensions, the solution set to a linear equation in n variables is a copy of (n − 1)-dimensional
space inside n-dimensional space. Such a set is called a hyperplane. So geometrically, the solution set of
k equations with n variables will be an intersection of k hyperplanes in n-dimensional space.

Remark. Actually it is not quite true to say that every equation defines a hyperplane. Equations like
“0 = 0” and “2 = 2” define the whole space; if all equations in a system have this form then the solution
space will be the whole space. On the other hand, equations like “0 = 1” have no solutions; if any equation
in a system has this form then the solution space will be empty.

Remark. You might be wondering what happens if we drop the requirement that the equations be linear,
and allow for example polynomial equations? If we do this then lines, planes and hyperplanes get replaced
by more general objects called curves, surfaces and hypersurfaces respectively. For example, the solution
set to x^2 + y^2 + z^2 = 1 is a sphere, which is a 2-dimensional surface in 3-space. The solution space to a
system of polynomial equations (an intersection of hypersurfaces) is called an (affine) algebraic variety.
This part of mathematics is called algebraic geometry; you can study it in your third year.

4
Of course “usually” here is a vague term, but I hope you can see intuitively what I mean! If you pick two lines at random
it is somehow “infinitely unlikely” that they will be exactly parallel. It is possible to make this intuition precise; this is well
beyond the scope of this course but as a (challenging!) exercise you might like to think how you would do so.

2.4 Linear Equations and Matrices
Any system of linear equation can be rearranged to put all the constant terms on the right, and all the
terms involving variables on the left. This will yield a system something like . . .

4x − 2y = 3
2x + 2y = 5

. . . so from now on we will assume all systems have this form. The system can then be written in matrix
form:

\begin{pmatrix} 4 & -2 \\ 2 & 2 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 3 \\ 5 \end{pmatrix}.

You can check (by multiplying out the matrix equation using the definition of matrix multiplication) that
it holds if and only if both of the original (scalar) equations hold. We can express the equations even more
concisely as an augmented matrix:

\left(\begin{array}{cc|c} 4 & -2 & 3 \\ 2 & 2 & 5 \end{array}\right).

2.5 Elementary Row Operations and Solving Systems of Equations


At school you learnt to solve systems of linear equations by simple algebraic manipulation. This ad hoc
approach is handy for small examples, but many applications involve bigger systems of equations, and for
these it is helpful to have a systematic method.
We start with the augmented matrix of our system (see Section 2.4), and we allow ourselves to modify
the matrix by certain kinds of steps. We can:

(i) multiply a row by a non-zero scalar (written ri → λri );

(ii) add a multiple of one row to another (written ri → ri + λrj );

(iii) swap two rows of the matrix (written ri ↔ rj ).

These are called elementary row operations.5

Warning. Multiplying a row by 0 is definitely not an elementary row operation!

Definition. Two matrices A and B are called row equivalent if each can be obtained from the other by
a sequence of elementary row operations.

Row operations and row equivalence are useful because of the following fact:

Theorem 2.1. Suppose M and N are the augmented matrices of two systems of linear equations. If M is
row equivalent to N then the two systems have the same solution set.

We’ll prove this theorem later (Chapter 3), but for now let’s explore its consequences. It means
that if we start with a system of equations, applying elementary row operations to the augmented matrix
will give us another system of equations with the same solution set. The aim of the game is to obtain a
system of equations which is easier to solve. For example, consider the system:

First we convert this into augmented matrix form (noting that the “missing” x terms are really 0x and
become 0s in the matrix):
5
There is an obvious “dual” idea of elementary column operations, written ci → λci , ci → ci + λcj and ci ↔ cj .

Now we swap rows 1 and 2 (r1 ↔ r2 ), to get

Next let’s scale rows 1 and 2 to make the first entry in each 1 (r1 → −r1 , r2 → (1/10) r2 ):

Next we subtract 10 times row 2 from row 3 (r3 → r3 − 10r2 ), giving:

Now we convert the augmented matrix back to a system of equations:

It turns out that these equations are easier to solve than the original ones. Equation [3] tells us straight
away that z = 3. Now substituting this into equation [2] we get y − (3/10) × 3 = −19/10, in other words, y = −1.
Finally, substituting both of these into equation [1] gives x − 4 × (−1) + 3 = 9, that is, x = 2.
So we have found the solutions to equations [1], [2] and [3]. But these were obtained from our original
system of equations by elementary row operations, so (by Theorem 2.1) they are also the solutions to the
original system of equations. (Check this for yourself !)
This strategy forms the basis of a very general algorithm, called Gaussian6 Elimination, which we
shall see in Section 2.7.

2.6 Row Echelon Matrices

Remark. Notice how we converted the matrix into a roughly “triangular” form, with all the entries towards
the “bottom left” being 0. It was this property which made the new equations easy to solve. The following
definitions will make precise what we mean by “triangular” here.

Pivots. The pivot (or leading entry) of a row in an augmented matrix is the position of the leftmost
non-0 entry to the left of the bar. (If all entries left of the bar are 0, we call the row a zero row and it has
no pivot.)

Example. In the matrix

A = \left(\begin{array}{ccc|c} 0 & 5 & 1 & 0 \\ 3 & 0 & 0 & 0 \\ 0 & 0 & 0 & 2 \end{array}\right)

• the pivot of the first row is the 5;

• the pivot of the second row is the 3;

• the third row is a zero row (it has no pivot).

Row Echelon Matrices. An augmented matrix is called a row echelon matrix if


6
After Carl Friedrich Gauß (pronounced to rhyme with “house”), 1777–1855.

(1) any zero rows come at the bottom; and
(2) the pivot of each of the other rows is strictly to the right of the pivot of the row above; and
(3) all the pivots are 1.
A row echelon matrix is a reduced row echelon matrix if in addition:
(4) each pivot is the only non-zero entry in its column.

Example. The matrix A above is not a row echelon matrix, because the second row pivot is not to the
right of the first row pivot. The matrix

B = \left(\begin{array}{cccc|c} 0 & 1 & 1 & 2 & 0 \\ 0 & 0 & 0 & 1 & 5 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 6 \end{array}\right)

is a row echelon matrix, but it is not a reduced row echelon matrix because the 2 in the first row is in the
same column as the second row pivot.

2.7 Gaussian Elimination

Gaussian Elimination is a simple algorithm which uses elementary row operations to reduce the
augmented matrix to row echelon form:

(1) By swapping rows if necessary, make sure that no non-0 entries (in any row) are to the left of
the first row pivot.
(2) Scale the first row so that the pivot is 1.

(3) Add multiples of the first row to each of the rows below, so as to make all entries below the first row
pivot become 0. (Since there were no non-0 entries left of the first row pivot, this ensures all pivots
below the first row are strictly to the right of the pivot in the first row.)

(4) Now ignore the first row, and repeat the entire process with the second row. (This will ensure that all
pivots below the second row are to the right of the pivot in the second row, which in turn is to the
right of the pivot in the first row.)

(5) Keep going like this (with the third row, and so on) until either we have done all the rows, or all the
remaining rows are zero.

(6) At this point the matrix is in row echelon form.

Exercise. Go back to the example in Section 2.5, and compare what we did with the procedure above.
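The algorithm above translates fairly directly into code. Below is a minimal Python sketch (my own, not part of the official notes): it works on an augmented matrix stored as a list of rows, uses fractions.Fraction so the arithmetic is exact, and makes the simplifying assumption that any non-zero entry can serve as a pivot (real numerical software chooses pivots more carefully, as Section 2.12 mentions).

from fractions import Fraction

def row_echelon(M):
    """Reduce an augmented matrix (list of rows) to row echelon form by Gaussian elimination."""
    A = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(A), len(A[0])
    top = 0                                  # rows above this index are already dealt with
    for col in range(cols - 1):              # the last column is the right-hand side
        # Step 1: find a row (from `top` downwards) with a non-zero entry in this column.
        pivot_row = next((r for r in range(top, rows) if A[r][col] != 0), None)
        if pivot_row is None:
            continue                         # nothing to do in this column
        A[top], A[pivot_row] = A[pivot_row], A[top]     # swap it into position
        # Step 2: scale the row so the pivot is 1.
        A[top] = [x / A[top][col] for x in A[top]]
        # Step 3: add multiples of this row to the rows below to clear the column.
        for r in range(top + 1, rows):
            A[r] = [a - A[r][col] * b for a, b in zip(A[r], A[top])]
        top += 1
    return A

# The system of Section 2.8: 2x + y - z = -7, 6x - z = -10, -4x + y + 7z = 31.
for row in row_echelon([[2, 1, -1, -7], [6, 0, -1, -10], [-4, 1, 7, 31]]):
    print(row)   # reproduces the row echelon form reached by hand in Section 2.8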

2.8 Another Example

Example. Let’s solve the system of equations:


2x + y − z = −7
6x − z = −10
−4x + y + 7z = 31
Solution. First we express the system as an augmented matrix:

Notice (Step 1) that there are no non-0 entries to the left of the pivot in the first row (there can’t be,
since the pivot is in the first column!). So we do not need to swap rows to ensure this is the case.

Next (Step 2) we scale to make the pivot in the first row 1, that is, r1 → (1/2) r1 , which gives

Next (Step 3) we seek to remove the entry in row 2 which is below the row 1 pivot. We can do this with
the operation r2 → r2 − 6r1 , giving the matrix:

Similarly, we remove the entry in row 3 below the row 1 pivot by r3 → r3 + 4r1 :

Now we ignore the first row, and repeat the whole process with the remaining rows. Notice that there
are no pivots to the left of the row 2 leading entry. (The one in row 1 doesn’t count, since we are ignoring
row 1!). Now we scale to make the second row pivot into 1, with r2 → −(1/3) r2 , giving:

Then remove the entry in row 3 which is below the row 2 pivot, with r3 → r3 − 3r2 , giving:

Finally, we repeat the process again, this time ignoring rows 1 and 2. All we have to do now is scale row
3 to make the pivot 1, with r3 → (1/7) r3 :

Our matrix is now in row echelon form, so we convert it back to a system of equations:

These equations can easily be solved by “backtracking” through them. Specifically:

• equation [3] says explicitly that z = 4;

• substituting into equation [2] gives y − (2/3) × 4 = −11/3, so y = −1;

• substituting into equation [1] gives x + (1/2) × (−1) − (1/2) × 4 = −7/2, so x = −1.

Thus, the solution is x = −1, y = −1 and z = 4. (You should check this by substituting the values back
into the original equations!)

Remark. Notice how the row echelon form of the matrix facilitated the “backtracking” procedure for
solving equations [1], [2] and [3].

Exercise. Use Gaussian elimination to solve the following systems of equations:

(i) 4x + y = 9        (ii) 2x − 4y = 12
    2x − 3y = 1            −x + 2y = −5

2.9 Gauss-Jordan Elimination


Gaussian elimination lets us transform any matrix into a row echelon matrix. Sometimes it is helpful to
go further and obtain a reduced row echelon matrix. To do this we use a slight variation called Gauss-
Jordan7 Elimination:

• First use Gaussian elimination to compute a row echelon form.

• Add multiples of the 2nd row to the row above to remove any non-0 entries above the 2nd row pivot.

• Add multiples of the 3rd row to the rows above to remove any non-0 entries above the 3rd row pivot.

• Continue down the rows, adding multiples of row k to the rows above to eliminate any non-0 entries
above the kth row pivot.

Theorem 2.2. Let A be a matrix. Then A is row equivalent to a unique reduced row echelon matrix (called
the reduced row echelon form of the matrix).

Proof. It should be clear that applying the Gauss-Jordan elimination algorithm to A will give a reduced
row echelon matrix row equivalent to A. The proof of uniqueness is more difficult, and we omit it.

Important Consequence. We can check if two matrices are row equivalent, by applying Gauss-Jordan
elimination to compute their reduced row echelon forms, then checking if these are equal. (If the reduced
row echelon forms are equal then clearly the original matrices are equivalent. Conversely, if the matrices are
equivalent then by Theorem 2.2, they must have the same reduced row echelon form.)
   
Exercise. Check if \begin{pmatrix} 1 & 3 & 6 \\ 0 & 1 & 2 \\ 3 & 7 & 14 \end{pmatrix} and \begin{pmatrix} 1 & 0 & 0 \\ 1 & 5 & 10 \\ 1 & 1 & 2 \end{pmatrix} are row equivalent.
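If you want to check your answer to this exercise by computer, the SymPy library (assuming it is installed) can compute reduced row echelon forms directly; the snippet below simply compares the two reduced row echelon forms, following the Important Consequence above.

from sympy import Matrix

A = Matrix([[1, 3, 6], [0, 1, 2], [3, 7, 14]])
B = Matrix([[1, 0, 0], [1, 5, 10], [1, 1, 2]])

# .rref() returns the reduced row echelon form together with the pivot columns.
rref_A, _ = A.rref()
rref_B, _ = B.rref()
print(rref_A, rref_B, rref_A == rref_B)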

2.10 Elimination with More Variables than Equations


What if, after applying Gaussian elimination and converting back to equations, there are more variables than
equations? This can happen either because the system we started with was like this, or because elimination
gives us rows of zeros in the matrices: these correspond to the equation “0 = 0” which obviously can’t be
used to find the value of any variables.
Suppose there are k equations and n variables, where n > k. We can still apply Gaussian elimination to
find a row echelon form. But when we convert back to equations and “backtrack” to find solutions, we will
sometimes still encounter an equation with more than one unknown variable.
If this happens, there are infinitely many solutions. We can describe all the solutions by introducing
parameters to replace n − k of the variables.
7
After the German Wilhelm Jordan (1842–1899) and not the more famous French mathematician Camille Jordan (1838–
1922). The correct pronunciation is therefore something like “YOR-dan” (not zhor-DAN!) but most people give up and go with
the English pronunciation!

Example. Consider the equations.

−x + 2y + z = 2
3x − 5y − 2z = 10
x − y = 14

The augmented matrix is:


   
\left(\begin{array}{ccc|c} -1 & 2 & 1 & 2 \\ 3 & -5 & -2 & 10 \\ 1 & -1 & 0 & 14 \end{array}\right) and reducing to row echelon form we get \left(\begin{array}{ccc|c} 1 & -2 & -1 & -2 \\ 0 & 1 & 1 & 16 \\ 0 & 0 & 0 & 0 \end{array}\right).

which gives the equations


x − 2y − z = −2 [1]
y + z = 16 [2]
0 = 0 [3]
Equation [3] is always satisfied but clearly useless for finding the values of the variables, so we can ignore it.
Equation [2] has two unknown variables (y and z) so we introduce a parameter λ to stand for one of them,
say z = λ. Now solving we get y = 16 − λ. Substituting into [1] gives x − 2(16 − λ) − λ = −2, or x = 30 − λ.
So the solutions are:
x = 30 − λ, y = 16 − λ, z = λ
for λ ∈ R.

Remark. When we say these are the solutions, we mean that substituting in different values of λ ∈ R will
give all the solutions to the equations. For example, λ = 1 would give x = 29, y = 15 and z = 1, so this
is one solution to the system of equations (check this!). Or then again, λ = 0 gives x = 30, y = 16 and
z = 0, so this is another possible solution.
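A quick way to gain confidence in a parametric solution like this is to substitute it back into the original equations for a few values of the parameter; the little Python check below (my own, not from the notes) does exactly that.

# Original system: -x + 2y + z = 2,  3x - 5y - 2z = 10,  x - y = 14.
def satisfies(x, y, z):
    return (-x + 2*y + z == 2) and (3*x - 5*y - 2*z == 10) and (x - y == 14)

# The claimed general solution: x = 30 - lam, y = 16 - lam, z = lam.
for lam in [0, 1, -3, 7]:
    print(lam, satisfies(30 - lam, 16 - lam, lam))   # True for every value of lam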

Exercise. Use Gaussian elimination to solve


2x − 4y = 10
−x + 2y = −5

2.11 Homogeneous Systems of Equations


A system of linear equations is called homogeneous if the constant terms are all 0. For example:

x + 7y − 4z = 0
2x + 4y − z = 0
3x + y + 2z = 0

An obvious observation about homogeneous systems is that they always have at least one solution,
given by setting all the variables equal to 0. This is called the trivial solution. Geometrically, this means
that the solution space to a homogeneous system always contains the origin.

Forward Pointer. Solution sets to homogeneous systems — that is, intersections of hyperplanes through
the origin — are very special and important subsets in Rn , called vector subspaces. More on this later
(Chapter 5).
When it comes to solving them, homogeneous systems of equations are treated just like non-homogeneous
ones. The only difference is that nothing interesting ever happens on the RHS of the augmented matrix, as
the following exercise will convince you!

Exercise. Find solutions to the system of equations above.

Remark. Notice how, in your solution, everything stays “homogeneous” throughout. All the augmented
matrices in the elimination process, and all the resulting equations, have only 0’s on the RHS.

2.12 Computation in the Real World
In describing methods to solve equations (Gaussian and Gauss-Jordan elimination) we have implicitly
assumed two things:

• that the equations are precisely known; and

• that we can perform the necessary arithmetic with perfect accuracy.

But in reality, these assumptions are often not true. If we want to solve a system of equations because it
models a real-world problem, then the coefficients in the equations may have been obtained by measure-
ment, in which case their values are subject to experimental error. And even if the coefficients are known
precisely, real-world computers don’t actually store and process real numbers: they can’t, because there
is an (uncountably) infinite set of real numbers but even the most powerful digital computer has a finite
memory. Instead they work with approximations to real numbers: typical is floating point arithmetic
in which numbers are rounded to a certain number of significant figures. At every stage of the computation
numbers have to be rounded to the nearest available approximation, introducing slight inaccuracies even if
the original data was perfect.
If the starting data or the arithmetic are imperfect then it clearly isn’t realistic to expect perfect solutions.
But some methods have the useful property that data which is almost right will produce solutions which
are almost right8 . Other methods do not have this property at all: they may work perfectly in theory, but
the tiniest error in the starting data gets magnified as the computation progresses, and leads to a wildly
inaccurate answer. Gaussian elimination, in the simple form we have described, is not very good in this
respect, but it can be made better with a slightly more sophisticated approach (called partial pivoting)
which takes account of the relative size of the matrix entries.
Further consideration of these practical issues is beyond the scope of this course, but you can learn about
them in later years, starting with the second year option MATH20602 Numerical Analysis I.

8
Or the subtly different property that the solutions produced are exact solutions to almost the right problems. Of course
this is an oversimplification; for an accurate explanation see the first few sections of Chapter 1 in the book Accuracy and Stability
of Numerical Algorithms (2nd edition) by N. J. Higham, and in particular the distinction between forward error and backward
error. (But this is not examinable in this course.)

3 Fields, Elementary Matrices and Calculating Inverses
3.1 Fields
So far we have worked with matrices whose entries are real numbers (and systems of equations whose
coefficients and solutions are real numbers). If for a moment we call the set of real numbers K (it will
become clear shortly why we’re not just calling it R!), here are a few basic properties of the operations of addition and multiplication on K:

(F1) for all a, b, c ∈ K, (a + b) + c = a + (b + c);

(F2) for all a, b ∈ K, a + b = b + a;

(F3) there is an element of K (which we call “0”) with the property that a + 0 = 0 + a = a for all a ∈ K;

(F4) for every a ∈ K there exists an element −a ∈ K such that a + (−a) = 0;

(F5) for all a, b, c ∈ K, (ab)c = a(bc);

(F6) for all a, b ∈ K, ab = ba;

(F7) there is an element of K (which we call “1” and which is different from 0) with the property that
1a = a1 = a for all a ∈ K;

(F8) for every a ∈ K except 0, there exists an element a−1 ∈ K such that aa−1 = a−1 a = 1;

(F9) for all a, b, c ∈ K, a(b + c) = ab + ac.

These nine properties are called the field axioms. Check for yourself that they are true for the real
numbers. Of course lots of other things are true for the real numbers, so what’s so special about these ones?
It turns out these are exactly the properties which we need to make linear algebra “work”. This means we
can replace the real numbers with any other type of “scalar” having these properties and things will still
work in the same way! For example, the complex numbers (check!) also satisfy the field axioms, so we can
do linear algebra where the scalars are complex numbers.

Definition. Let K be a set equipped with operations of addition and multiplication9 . Then K is called a
field if it satisfies the nine field axioms.

Examples. The real numbers satisfy the axioms (that was the whole point!) so R is a field. We have
already mentioned that the complex numbers satisfy the axioms, so C is a field as well. Exercise Sheet 3
gives some more examples.

Non-example. The set of 2 × 2 matrices (with real number entries) has operations of addition and
multiplication, but is not a field. We saw on Exercise Sheet 1 that it doesn’t satisfy (F6). In fact it satisfies
seven of the nine axioms: on Exercise Sheet 3 you can find the other one that it doesn’t satisfy!

Remark. For real numbers (and complex numbers) we have the extra operations of subtraction and division.
The definition of a field doesn’t explicitly mention these, but in fact we can use them in any field by defining:

• a − b to be shorthand for a + (−b); and


• a/b and \frac{a}{b} to be shorthand for a(b^{-1}).

Remark. Many other elementary facts about the real numbers can be deduced from the field axioms, which
means these facts must be true in all fields. For example, we know that in the real numbers we always have
(b + c)a = ba + ca, which is not one of the field axioms. But in fact if a, b and c come from any field then:

(b + c)a = a(b + c) = ab + ac = ba + ca
9
Of course, the field axioms don’t even make sense unless we know how to add and multiply the objects!

where the first and last equalities follow from (F6) and the middle one from (F9). Exercise Sheet 3 gives
some more examples of this.

Important Remark. In fact, almost everything we did in Chapters 1 and 2 works when the real numbers
are replaced by elements of some other field (for example, by complex numbers). Matrix operations can
be defined in exactly the same way, and everything we proved about them remains true. Systems of linear
equations still make sense, and Gaussian and Gauss-Jordan Elimination still find the solution sets (there is
an example to try on Exercise Sheet 3). Of course, wherever we referred to the real numbers 0 and 1 (for
example, in defining identity and zero matrices, or during Gaussian elimination) we use the 0 and 1 elements
of the field instead. The only thing which was really specific to the real numbers, and doesn’t quite work
over other fields, is the geometric interpretation of solution sets in space.

Remark. It is worth taking a step back to consider the “big picture” of what we did in this section:

• We start with a particular “concrete” mathematical structure (in this case the real numbers).

• We identify the vital features which make it “work”, and abstract these features as axioms.

• We forget the original structure and study an abstract structure (a field) about which the only
assumption we make is that it satisfies the axioms.

• Everything we learn in the abstract setting applies to every structure satisfying the axioms (for exam-
ple, we can apply it back to the real numbers, but we can also apply it to the complex numbers).

This process of abstraction is one of the central pillars of mathematics. Instead of studying specific
individual structures in isolation, it lets us simultaneously understand lots of different structures which
work in the same way. For example, instead of learning that something is true for the real numbers and
then having to check all over again whether it still works for the complex numbers, we can do it for both
structures at once (and for a whole load of other structures besides). This idea is the basis of abstract
algebra, which you will study in detail starting with MATH20201 Algebraic Structures I next semester.

3.2 Elementary Matrices


From now on, we will mostly work with matrices whose entries come from a field K. Thus, everything we
show and do will work for matrices of real numbers, complex numbers, or even more obscure fields. If you
find this confusing, remember that you can always “specialise” back to the case of R: in other words just
imagine all the entries are real numbers.

Definition. An n × n matrix E is an elementary matrix if it can be obtained from the identity matrix In
by a single elementary row operation. For an elementary row operation ρ we write Eρ for the corresponding
matrix.
There are three types of elementary matrices, corresponding to the three types of row operations. They
look like:

Eri ↔rj =

Eri →αri =

Eri →ri +λrj =

Theorem 3.1. Let A and B be m × n matrices over a field and suppose B is obtained from A by a row
operation ρ. Then B = Eρ A.

Proof. Check by matrix multiplication.
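Theorem 3.1 is easy to experiment with in code: build Eρ by applying the row operation ρ to the identity matrix, then check that Eρ A equals the result of applying ρ to A directly. The following Python sketch (my own illustration; row indices start from 0 in the code rather than 1 as in the notes) does this for one operation of each type.

from fractions import Fraction

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))] for i in range(len(X))]

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def apply_op(M, op):
    """Apply an elementary row operation, given as ('swap', i, j), ('scale', i, a) or ('add', i, j, a)."""
    M = [row[:] for row in M]                      # work on a copy
    if op[0] == 'swap':
        _, i, j = op
        M[i], M[j] = M[j], M[i]
    elif op[0] == 'scale':                         # r_i -> a * r_i
        _, i, a = op
        M[i] = [a * x for x in M[i]]
    else:                                          # r_i -> r_i + a * r_j
        _, i, j, a = op
        M[i] = [x + a * y for x, y in zip(M[i], M[j])]
    return M

A = [[Fraction(x) for x in row] for row in [[2, 1, 0], [1, -1, 3], [4, 0, 5]]]
for op in [('swap', 0, 2), ('scale', 1, Fraction(3)), ('add', 2, 0, Fraction(-2))]:
    E = apply_op(identity(3), op)                  # the elementary matrix E_rho
    print(matmul(E, A) == apply_op(A, op))         # True: E_rho A equals rho applied to A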

Corollary 3.2. Suppose A can be transformed into B by applying a sequence of elementary row operations
ρ1 , ρ2 , . . . , ρk . Then we have
B = Eρk . . . Eρ2 Eρ1 A.

Proof. By induction on k. The base case k = 1 is exactly Theorem 3.1.


Now let k ≥ 2 and suppose for induction that the statement holds for smaller k. Let A′ be the matrix
obtained from A by applying the transformation ρ1 . Then by Theorem 3.1, A′ = Eρ1 A. Now B is obtained
from A′ by the sequence of (k − 1) elementary row operations ρ2 , . . . , ρk , so by the inductive hypothesis,

B = Eρk . . . Eρ2 A′ = Eρk . . . Eρ2 (Eρ1 A) = Eρk . . . Eρ2 Eρ1 A.

Theorem 3.3. Let E be an elementary n × n matrix over a field. Then E is invertible and E −1 is also an
elementary matrix.

Proof. It is easy to check (exercise!) using the definition of the inverse that:

• the inverse of Eri ↔rj is Eri ↔rj ;

• the inverse of Eri →αri is Eri →(1/α)ri ;

• the inverse of Eri →ri +λrj is Eri →ri −λrj .

Elementary matrices allow us to provide the promised proof that elementary row operations on an
augmented matrix preserve the solution set.

Theorem 2.1. Suppose M and N are the augmented matrices of two systems of linear equations. If M is
row equivalent to N then the two systems have the same solution set.

Proof. Let M = (A | B) and N = (C | D). Then the two systems of equations can be written in matrix form
(see Section 2.4) as AX = B and CX = D where X is the column matrix whose entries are the variables.

Since M and N are equivalent by a sequence of row operations, A and C are equivalent by the same
sequence of row operations, and so are B and D. So by Corollary 3.2 there are elementary matrices
E1 , . . . , Ek such that
C = Ek . . . E1 A and D = Ek . . . E1 B.
Now suppose Z is a column matrix which is a solution to AX = B, that is, AZ = B. Then we have

CZ = (Ek . . . E1 A)Z = (Ek . . . E1 )(AZ) = (Ek . . . E1 )B = D

so Z is also a solution to CX = D.
The converse (showing that a solution to CX = D is also a solution to AX = B) is very similar
(exercise!).

3.3 Invertibility and Systems of Equations


Theorem 3.4. For A an n × n matrix over a field, the following are equivalent:

(i) A is invertible;

(ii) AX = 0n×1 has only the trivial solution X = 0n×1 ;

(iii) the reduced row echelon form of A is In ;

(iv) A is row equivalent to In ;

(v) A can be written as a product of elementary matrices.

Proof. We prove (i) =⇒ (ii) =⇒ (iii) =⇒ (iv) =⇒ (v) =⇒ (i).

(i) =⇒ (ii). Suppose A is invertible. If AX = 0n×1 then

X = In X = (A−1 A)X = A−1 0n×1 = 0n×1

so 0n×1 is the only solution to AX = 0n×1 .

(ii) =⇒ (iii). Suppose (ii) holds. Let M be the reduced row echelon form of A. Since M is row equivalent
to A, it follows from Theorem 2.1 that the equation M X = 0n×1 has only one solution. Since M is square
and in reduced row echelon form, it is either the identity matrix or has a zero row at the bottom. Having
a zero row at the bottom would give lots of solutions to M X = 0n×1 (exercise: why? think about
(M | 0n×1 ) and backtracking), so it must be that M = In .

(iii) =⇒ (iv). By definition, A is row equivalent to its reduced row echelon form.

(iv) =⇒ (v). If A is row equivalent to In then by Corollary 3.2 we have

A = Ek . . . E1 In = Ek . . . E1

for some elementary matrices E1 , . . . , Ek .

(v) =⇒ (i). By Theorem 3.3 elementary matrices are invertible, and by (an inductive argument using)
Lemma 1.3(ii) a product of invertible matrices is invertible.

3.4 Calculating Inverses


Suppose A is an invertible n × n matrix over a field. By Theorem 3.4 there is a sequence of row operations
ρ1 , . . . , ρk transforming A to In . By Corollary 3.2 this means

In = Eρk . . . Eρ1 A

Multiplying both sides on the right by A−1 we get

A−1 = Eρk . . . Eρ1 AA−1 = Eρk . . . Eρ1 In .

But this means (by Corollary 3.2 again) that applying the sequence of row operations ρ1 , . . . , ρk to In gives
A−1 . This observation gives us an efficient way to find the inverse A−1 of an invertible matrix A:

(i) use Gauss-Jordan elimination to transform A into reduced row echelon form (which by Theorem 3.4
must turn out to be In — if it isn’t then A isn’t invertible!);

(ii) apply the same sequence of row operations to In to get A−1

We can make the process even easier by writing the matrices A and In side-by-side in a single augmented
n × 2n matrix (A | In ), then just apply Gauss-Jordan elimination to convert the left-hand-side to In , and
let the right-hand-side “follow along”.
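This “(A | In) then Gauss-Jordan” recipe is also easy to carry out by computer. The SymPy snippet below (assuming SymPy is available) builds the augmented matrix, row reduces it, and reads the inverse off the right-hand half; it reproduces the first worked example below.

from sympy import Matrix, eye

A = Matrix([[1, 0, 1], [1, 1, -1], [0, 1, 0]])

M = A.row_join(eye(3))        # the augmented n x 2n matrix (A | I_n)
R, _ = M.rref()               # Gauss-Jordan elimination (reduced row echelon form)
left, right = R[:, :3], R[:, 3:]

if left == eye(3):            # by Theorem 3.4 this happens exactly when A is invertible
    print(right)              # the right-hand half is the inverse of A
else:
    print("A is not invertible")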
 
Example. Let A = \begin{pmatrix} 1 & 0 & 1 \\ 1 & 1 & -1 \\ 0 & 1 & 0 \end{pmatrix}. We have

\left(\begin{array}{ccc|ccc} 1 & 0 & 1 & 1 & 0 & 0 \\ 1 & 1 & -1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 & 1 \end{array}\right)
\xrightarrow{r_2 \to r_2 - r_1}
\left(\begin{array}{ccc|ccc} 1 & 0 & 1 & 1 & 0 & 0 \\ 0 & 1 & -2 & -1 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 & 1 \end{array}\right)

\xrightarrow{r_3 \to r_3 - r_2}
\left(\begin{array}{ccc|ccc} 1 & 0 & 1 & 1 & 0 & 0 \\ 0 & 1 & -2 & -1 & 1 & 0 \\ 0 & 0 & 2 & 1 & -1 & 1 \end{array}\right)

\xrightarrow{r_3 \to \frac{1}{2} r_3}
\left(\begin{array}{ccc|ccc} 1 & 0 & 1 & 1 & 0 & 0 \\ 0 & 1 & -2 & -1 & 1 & 0 \\ 0 & 0 & 1 & \frac{1}{2} & -\frac{1}{2} & \frac{1}{2} \end{array}\right)

\xrightarrow[r_2 \to r_2 + 2r_3]{r_1 \to r_1 - r_3}
\left(\begin{array}{ccc|ccc} 1 & 0 & 0 & \frac{1}{2} & \frac{1}{2} & -\frac{1}{2} \\ 0 & 1 & 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & \frac{1}{2} & -\frac{1}{2} & \frac{1}{2} \end{array}\right).

We have now reduced the left-hand-side to its reduced row echelon form, which is In , confirming that A is
invertible. The remaining right-hand-side must be A−1 , so

A^{-1} = \begin{pmatrix} \frac{1}{2} & \frac{1}{2} & -\frac{1}{2} \\ 0 & 0 & 1 \\ \frac{1}{2} & -\frac{1}{2} & \frac{1}{2} \end{pmatrix}.

 
Example. Let B = \begin{pmatrix} 1 & 6 & 4 \\ 2 & 4 & -1 \\ -1 & 2 & 5 \end{pmatrix}. This time:

\left(\begin{array}{ccc|ccc} 1 & 6 & 4 & 1 & 0 & 0 \\ 2 & 4 & -1 & 0 & 1 & 0 \\ -1 & 2 & 5 & 0 & 0 & 1 \end{array}\right)
\xrightarrow[r_3 \to r_3 + r_1]{r_2 \to r_2 - 2r_1}
\left(\begin{array}{ccc|ccc} 1 & 6 & 4 & 1 & 0 & 0 \\ 0 & -8 & -9 & -2 & 1 & 0 \\ 0 & 8 & 9 & 1 & 0 & 1 \end{array}\right)

\xrightarrow{r_3 \to r_3 + r_2}
\left(\begin{array}{ccc|ccc} 1 & 6 & 4 & 1 & 0 & 0 \\ 0 & -8 & -9 & -2 & 1 & 0 \\ 0 & 0 & 0 & -1 & 1 & 1 \end{array}\right).

We could continue with the elimination (exercise: try it!) but the zero row which has appeared in the
bottom of the left-hand-side is clearly not going away again. This means the reduced row echelon form of
B is not In , so B is not invertible!

4 Determinants
The determinant of a square matrix is a scalar (i.e. an element of the field from which the matrix entries
are drawn) which can be associated to it, and which contains a surprisingly large amount of information
about the matrix. To define it we need some extra notation:

Definition. If M is an n × n matrix over a field, then M ij is the (n − 1) × (n − 1) matrix obtained from


M by throwing away the ith row and jth column.
 
Example. If A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix} then A^{11} = \begin{pmatrix} 5 & 6 \\ 8 & 9 \end{pmatrix} and A^{23} = \begin{pmatrix} 1 & 2 \\ 7 & 8 \end{pmatrix}.

Definition. If M is a 1 × 1 matrix then det M = M11 . If M is an n × n matrix with n ≥ 2 then


det M = \sum_{k=1}^{n} (−1)^{k+1} M_{1k} \det M^{1k}.

Alternative Notation. Sometimes the determinant is written det(M ) or |M |.


 
Remark/Exercise. If M = \begin{pmatrix} a & b \\ c & d \end{pmatrix} is a 2 × 2 matrix then the formula simplifies to

det M = ad − bc,

which we saw in Section 1.6.

Example. The determinant of the matrix A from the previous example is:

det A = A_{11} \det A^{11} − A_{12} \det A^{12} + A_{13} \det A^{13}
      = 1 \det \begin{pmatrix} 5 & 6 \\ 8 & 9 \end{pmatrix} − 2 \det \begin{pmatrix} 4 & 6 \\ 7 & 9 \end{pmatrix} + 3 \det \begin{pmatrix} 4 & 5 \\ 7 & 8 \end{pmatrix}
      = 1 × (5 × 9 − 6 × 8) − 2 × (4 × 9 − 6 × 7) + 3 × (4 × 8 − 5 × 7)
      = −3 + 12 − 9
      = 0.

Warning. The determinant is only defined for square (n × n) matrices!
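The defining formula is recursive: the determinant of an n × n matrix is computed from determinants of (n − 1) × (n − 1) matrices. This translates directly into the following Python sketch (the function names det and minor are my own); for large matrices this approach is very slow, which is one motivation for the row-operation method of Section 4.3.

def minor(M, i, j):
    """The matrix M^{ij}: M with row i and column j thrown away (rows and columns counted from 1)."""
    return [row[:j-1] + row[j:] for r, row in enumerate(M, start=1) if r != i]

def det(M):
    """Cofactor expansion along the first row, exactly as in the definition."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** (k + 1) * M[0][k - 1] * det(minor(M, 1, k)) for k in range(1, len(M) + 1))

print(det([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))   # 0, as in the example above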

4.1 Expanding Along Rows and Columns


The formula in the definition of the determinant is, for fairly obvious reasons, called the expansion along
the first row. In fact, we can expand in a similar way along any row, or indeed down any column:

Theorem 4.1. For any n × n matrix M over a field and any 1 ≤ i ≤ n and 1 ≤ j ≤ n:
(i) det M = \sum_{k=1}^{n} (−1)^{i+k} M_{ik} \det M^{ik} ;

(ii) det M = \sum_{k=1}^{n} (−1)^{k+j} M_{kj} \det M^{kj} .

Proof. Omitted.

Computing the determinant using (i) is called expanding along the ith row, while using (ii) is
expanding down the jth column. Notice that expanding along the first row just means using the
definition of the determinant. But depending on the form of the matrix, choosing the row or column to
expand along/down carefully can make it much easier to compute the determinant.



Example. Suppose we wish to compute det A where A = \begin{pmatrix} 1 & 0 & 3 \\ 2 & 0 & 9 \\ 1 & 4 & 6 \end{pmatrix}. Noticing the two 0 entries in the second column,
it makes sense to expand down this column, giving

det A = (−1)^{2+1} × 0 × \det \begin{pmatrix} 2 & 9 \\ 1 & 6 \end{pmatrix} + (−1)^{2+2} × 0 × \det \begin{pmatrix} 1 & 3 \\ 1 & 6 \end{pmatrix} + (−1)^{2+3} × 4 × \det \begin{pmatrix} 1 & 3 \\ 2 & 9 \end{pmatrix} = −4 × 3 = −12.

Using the definition (i.e. expanding along the first row) would have given the same answer (check for
yourself!), but with more calculation.

Terminology. The value det M ij is called the (i, j) minor of M , while the value (−1)i+j det M ij is
the (i, j) cofactor of M . The method we have seen for computing the determinant is called cofactor
expansion or Laplace10 expansion. (Note that some people who write mij for matrix entries use the
notation Mij for the (i, j) minor of M .)

4.2 Upper Triangular Matrices


A square matrix M is called upper triangular if Mij = 0 for all i > j, that is, if all entries below the main
diagonal are 0.
 
Examples. \begin{pmatrix} 1 & 3 \\ 0 & 1 \end{pmatrix} and \begin{pmatrix} 1 & 0 & 3 \\ 0 & 7 & -2 \\ 0 & 0 & 0 \end{pmatrix} are upper triangular.

Remark. A square (non-augmented) row echelon matrix is always upper triangular.

Theorem 4.2. Let M be an n × n upper triangular matrix over a field. Then det M is the product of the
entries on the main diagonal:
det M = \prod_{i=1}^{n} M_{ii}.

Proof. See Exercise Sheet 3.

Corollary 4.3. det In = 1 for all n.

Proof. In is clearly upper triangular, so det In is the product of the entries on the main diagonal,
which is 1.

4.3 Elementary Row Operations and the Determinant


Cofactor expansion (even along a carefully chosen row or column) is not a very efficient way to compute
determinants. If the matrix is large it quickly becomes impractical. Just as with solving systems of equations
and computing inverses, row operations ride to the rescue!

Theorem 4.4. Let A and B be n × n matrices over a field. Then


(i) if A \xrightarrow{r_i \to \alpha r_i} B then det B = α det A;

(ii) if A \xrightarrow{r_i \to r_i + \lambda r_j} B then det B = det A;

(iii) if A \xrightarrow{r_i \leftrightarrow r_j} B then det B = − det A.

Corollary 4.5. If A and B are square matrices over a field and are row equivalent, then

det A = 0 ⇐⇒ det B = 0.
10
After Pierre-Simon, Marquis de Laplace (pronounced “la-PLASS”), 1749–1827.

Proof. It follows from Theorem 4.4 that none of the types of elementary row operations ever change whether
the determinant is 0. If A and B are row equivalent then there is a sequence of row operations transforming
A into B, and none of these can ever change whether the determinant is 0.

We will prove Theorem 4.4 shortly, but first let’s see how it helps us calculate a determinant. The idea
is to use row operations to transform the matrix into a form where the determinant is easy to calculate,
and keep track of how the operations affect the determinant so that we can recover the determinant of
the original matrix. Theorem 4.2 suggests it makes sense to aim for an upper triangular matrix; we know
every square matrix is row equivalent to one of these because row echelon matrices are upper triangular,
but usually it is not necessary to go as far as finding a row echelon matrix:

Example. Let’s find the determinant of the matrix \begin{pmatrix} 1/6 & 0 & 7/6 \\ 0 & 0 & -6 \\ 1 & -2 & -4 \end{pmatrix}.
Applying the operations r1 → 6r1 , r3 → r3 − r1 , r2 ↔ r3 gives an upper triangular matrix:

\begin{pmatrix} 1 & 0 & 7 \\ 0 & -2 & -11 \\ 0 & 0 & -6 \end{pmatrix}.

By Theorem 4.2 the determinant of this matrix is 1 × (−2) × (−6) = 12. By Theorem 4.4 the operation
r1 → 6r1 will have multiplied the determinant by 6, while the row swap will have changed the sign, so the
determinant of our original matrix must be −(1/6)(12) = −2.
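Here is a small Python sketch of this strategy (my own, using exact fractions). It avoids scaling rows altogether, so only Theorem 4.4(ii) and (iii) are needed: swaps flip the sign, additions change nothing, and the determinant of the resulting upper triangular matrix is the product of its diagonal.

from fractions import Fraction

def determinant(M):
    """Determinant via row operations: reduce to upper triangular form, tracking sign changes from swaps."""
    A = [[Fraction(x) for x in row] for row in M]
    n, sign = len(A), 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)              # no pivot available in this column: the determinant is 0
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            sign = -sign                    # Theorem 4.4(iii)
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            A[r] = [a - factor * b for a, b in zip(A[r], A[col])]   # Theorem 4.4(ii): no change
    prod = Fraction(1)
    for i in range(n):
        prod *= A[i][i]                     # Theorem 4.2: product of the diagonal
    return sign * prod

print(determinant([[Fraction(1, 6), 0, Fraction(7, 6)], [0, 0, -6], [1, -2, -4]]))   # -2, as above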
Another application of Theorem 4.4 is to obtain the determinants of elementary matrices; this will prove
useful later.
Corollary 4.6. (i) det E_{r_i → α r_i} = α;
(ii) det E_{r_i → r_i + λ r_j} = 1;
(iii) det E_{r_i ↔ r_j} = −1.
Proof. Each part follows by the corresponding part of Theorem 4.4 from the fact that Eρ is obtained from
In by the corresponding row operation ρ. For (iii), for example, Eri ↔rj is obtained from In by a row swap,
so by Theorem 4.4(iii) and Corollary 4.3 we have
det Eri ↔rj = − det In = −1.
Exercise. Do the other two cases.

4.4 Proof of Theorem 4.4 (not directly examinable)


This section contains a proof of Theorem 4.4. It is not directly examinable and will not be covered in
lectures. However, you are strongly encouraged to read it because (i) it will increase your understanding of
things which are examinable, (ii) it will give you experience of learning mathematics by reading and (iii) it
will begin to show you how complex proofs are put together in stages using lemmas.
Lemma 4.7. Let A be an n × n matrix. If A has two different rows the same, then det A = 0.
Proof. Again, we use induction on n. The case n = 1 is vacuously true: if there is only one row then there
cannot be two different rows the same, so the statement cannot be false.
If n = 2 then $A = \begin{pmatrix} a & b \\ a & b \end{pmatrix}$ for some a, b ∈ R, and now det A = ab − ab = 0.
Now let n ≥ 3 and suppose for induction that the result holds for smaller matrices. Pick two rows in A
which are the same, and let r be the index of some other row. Expanding along row r, we have
$$\det A = \sum_{k=1}^{n} (-1)^{k+r} A_{rk} \det A^{rk}.$$
11. Recall the flying pigs from Exercise Sheet 0!

For each k, notice that A^{rk} is an (n − 1) × (n − 1) matrix with two rows the same. Hence, by the inductive
hypothesis, all of these matrices have determinant 0, so det A = 0.

Question/Exercise. Why, in the proof of Lemma 4.7, do we have to start the induction from n = 3? In
other words, why can’t we just use n = 1 as the base case?

Lemma 4.8. Let X, Y and Z be n × n matrices which are all the same except in row i, and suppose
Zij = Xij + Yij for all j. Then det Z = det X + det Y.

Proof. Expand det Z along the ith row (exercise: write out the details!).

We are now ready to prove Theorem 4.4.


Proof of Theorem 4.4. (i) Suppose $A \xrightarrow{r_i \to \alpha r_i} B$. Then $B_{ik} = \alpha A_{ik}$ and $B^{ik} = A^{ik}$ for every k, since deleting row i removes the only row that was changed. Expanding along the ith row we have:
$$\det B = \sum_{k=1}^{n} (-1)^{k+i} B_{ik} \det B^{ik} = \sum_{k=1}^{n} (-1)^{k+i} (\alpha A_{ik}) \det A^{ik} = \alpha \sum_{k=1}^{n} (-1)^{k+i} A_{ik} \det A^{ik} = \alpha \det A.$$

(ii) Suppose $A \xrightarrow{r_i \to r_i + \lambda r_j} B$. Let D be the matrix which is the same as A and B, except that the ith row
is λ times the jth row of A. Then we are in the position of Lemma 4.8 with X = A, Y = D and Z = B, so
we have
$$\det B = \det A + \det D.$$
Now let C be the matrix which is the same as D, except that the ith row is exactly the jth row of A. Then
D can be obtained from C by applying the transformation $r_i \to \lambda r_i$, so by part (i), det D = λ det C. But C
has two rows (rows i and j) the same, so by Lemma 4.7, det C = 0 and hence det D = λ × 0 = 0. Thus, det B = det A + 0 = det A.
(iii) Suppose $A \xrightarrow{r_i \leftrightarrow r_j} B$. We define matrices F, G, H, P and Q which are the same as A (and B) except
in rows i and j, where their entries are as follows:

• row i and row j of F are both the sum of rows i and j in A;

• row i of G is the sum of rows i and j in A; row j of G is row j of A;

• row i of H is the sum of rows i and j in A; row j of H is row i of A;

• rows i and j of P are row i of A; and

• rows i and j of Q are row j of A.

Now we are in the position of Lemma 4.8 with X = G, Y = H and Z = F , so we have det F = det G+det H.
Similarly, the conditions of Lemma 4.8 are satisfied with X = A, Y = Q, Z = G and also with X = B, Y =
P, Z = H so we obtain
det G = det A + det Q and det H = det B + det P.
Notice also that each of F , P and Q has two identical rows, so by Lemma 4.7 they all have determinant 0.
Hence
0 = det F = det G + det H = det A + det Q + det P + det B = det A + det B
so that det A = − det B as required.

4.5 Determinants and Inverses
We saw in Section 1.6 that a 2 × 2 matrix is invertible if and only if its determinant is non-zero. We now know
nearly enough to prove the corresponding statement for n × n matrices: just one more lemma is needed.

Lemma 4.9. Let M be an n × n reduced row echelon matrix over a field. Then either M = In or M has a
zero row and has determinant 0.

Proof. See Exercise Sheet 3.

Theorem 4.10. An n × n matrix A over a field is invertible if and only if det A ≠ 0.

Proof. By Theorem 2.2, A is row equivalent to a reduced row echelon matrix M. By Theorem 3.4, A is
invertible if and only if M = I_n. By Lemma 4.9, M = I_n if and only if det M ≠ 0. But M is row equivalent
to A, so by Corollary 4.5, det M ≠ 0 if and only if det A ≠ 0.

We also saw in Section 1.6 that for 2 × 2 matrices there is an explicit formula for the inverse in terms of
determinants. The formula generalises to larger matrices, although it is often too complicated to be useful
in practice:

Theorem 4.11 (Cramer's Rule). If A is invertible then
$$A^{-1} = \frac{1}{\det A} B$$
where B is the matrix given by
$$B_{ij} = (-1)^{i+j} \det A^{ji}.$$

The matrix B in the statement of the theorem is called the adjugate of A.

Exercise. Check that if A is a 2 × 2 matrix, Cramer’s Rule simplifies to the formula from Section 1.6.
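The adjugate formula is also easy to test numerically. The sketch below (an illustration in Python with numpy, neither of which is part of the course; the helper name adjugate is made up for this purpose) builds B entry by entry from the definition and compares (1/det A)·B with the inverse that numpy computes directly.

import numpy as np

def adjugate(A):
    """Adjugate of a square matrix: B[i, j] = (-1)^(i+j) * det of A with row j and column i deleted."""
    n = A.shape[0]
    B = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, j, axis=0), i, axis=1)   # this is A^{ji}
            B[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return B

A = np.array([[3.0, 1.0], [1.0, 3.0]])
print(adjugate(A) / np.linalg.det(A))   # should agree with...
print(np.linalg.inv(A))                 # ...the inverse computed directly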

4.6 Multiplicativity of the Determinant


In this section we will establish another important property of the determinant, namely that det(AB) =
det A det B for all n × n matrices A and B. This property is called multiplicativity of the determinant.
To prove it we will use elementary matrices.

Lemma 4.12. Let E1 , . . . , Ek be n × n elementary matrices. Then for every n × n matrix M ,

det(Ek . . . E1 M ) = det(Ek . . . E1 ) det M.

Proof. By strong induction on k. Suppose first k = 1. Then E1 = E_ρ for some elementary row operation ρ.
Let M be an n × n matrix. Now E_ρ M is obtained from M by the operation ρ. For each of the three types of
elementary row operation, we can check using the appropriate parts of Theorem 4.4 and Corollary 4.6 that
det(E_ρ M) = det E_ρ det M. For example, if ρ is a row swap then by Theorem 4.4(iii), det(E_ρ M) = − det M
and by Corollary 4.6(iii), det(E_ρ) = −1.
Now let k ≥ 2 and suppose the statement is true for smaller k. Then applying the inductive hypothesis
three times, with the role of M played first by E1 M , then by M itself, and finally by E1 , we obtain:

det(Ek Ek−1 . . . E1 M ) = det(Ek . . . E2 ) det(E1 M )


= det(Ek . . . E2 ) det E1 det M
= det(Ek . . . E1 ) det M.

Theorem 4.13. Let A and B be n × n matrices over a field. Then

det(AB) = (det A)(det B)


12. Gabriel Cramer (1704–52).

Proof. Write A = Ek . . . E1 M where M is the reduced row echelon form of A and the Ei s are elementary
matrices. Then by Lemma 4.12:

det(AB) = det(Ek . . . E1 M B) = det(Ek . . . E1 ) det(M B).

We now consider two cases:

Case 1: M = In . In this case we have M B = B and Ek . . . E1 = A so

det(AB) = det(Ek . . . E1 ) det(M B) = det A det B.

Case 2: M 6= In . In this case A is not invertible and so by Theorem 4.10, det A = 0. Also, by Lemma 4.9
M has a row of zeros. It follows that M B has a row of zeros, and expanding along this row we see that
det(M B) = 0. Now
det(AB) = det(Ek . . . E1 ) det(M B) = 0 = (det A)(det B).
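Multiplicativity is also easy to check experimentally. The following Python snippet (numpy again, purely as an illustration) compares det(AB) with det A · det B for a pair of randomly generated matrices.

import numpy as np

rng = np.random.default_rng(seed=0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
print(np.linalg.det(A @ B))                 # these two numbers agree...
print(np.linalg.det(A) * np.linalg.det(B))  # ...up to floating-point rounding error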

5 Vectors, Eigenvalues and Subspaces
5.1 Vectors and Matrices
Recall that real n-space is the set

Rn = {(a1 , . . . , an ) | a1 , . . . , an ∈ R}.

The elements of Rn are called (real) n-vectors, or just vectors when n is clear. More generally, if K is a
field then
K^n = {(a1, . . . , an) | a1, . . . , an ∈ K} is the space of n-vectors over K.

The Zero Vector. We write 0n for the zero vector (0, . . . , 0) ∈ K n , or just 0 where n is clear.

Vectors as Matrices. There is an obvious way to think about an n-vector (a1 , . . . , an ) as a 1 × n row
matrix (a1 a2 . . . an ); a vector thought of in this way we call a row n-vector. Similarly, a vector may be
thought of as an n × 1 column matrix, in which case we call it a column n-vector. Notice that each row
n-vector is the transpose of the same vector thought of as a column n-vector.

Vector Operations. We know how to add matrices of the same size (Section 1.1), and how to scale a
matrix by a scalar (Section 1.2). We can also define addition and scaling on vectors, by thinking of them as
row (or column) vectors. The resulting operations are:

(a1 , . . . , an ) + (b1 , . . . , bn ) = (a1 + b1 , . . . , an + bn ) and λ(a1 , . . . , an ) = (λa1 , . . . , λan ).

Since row and column vectors are just matrices, we can also multiply them by matrices which have the right
size. For example, if v is a column n-vector and M is an n × n matrix then M v is defined (since the
number of columns in M equals the number of rows in v) and gives another column n-vector. On the other
hand, vM is not defined.

5.2 Eigenvalues and Eigenvectors

Definition. Let A be an n × n matrix over a field. We say that a scalar λ is an eigenvalue of A if there
is a non-zero column n-vector (that is, an n × 1 matrix) X such that

AX = λX.

The vector X is called an eigenvector of A associated (or corresponding) to the eigenvalue λ.


     
Example. Let $A = \begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix}$, $X = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$ and $Y = \begin{pmatrix} 2 \\ 2 \end{pmatrix}$. Then
$$AX = \begin{pmatrix} 4 \\ 4 \end{pmatrix} = 4X \quad \text{and} \quad AY = \begin{pmatrix} 8 \\ 8 \end{pmatrix} = 4Y$$
so 4 is an eigenvalue of A, and X and Y are both corresponding eigenvectors.


13. Over the same field as v, of course. Although what we are doing makes sense over any field, it doesn't make sense to “mix and match” different fields at the same time: we need to assume that, at any one time, all matrices and vectors considered are over the same fixed field.
14. Unless n = 1. Although it would be defined for n > 1 if v were a row n-vector! For this reason we should technically distinguish between row vectors and column vectors — in other words, officially regard them as different objects, just as we do with scalars and 1 × 1 matrices (recall Exercise Sheet 1). In practice, it is often convenient to blur these distinctions where no confusion can arise; in particular, we will often think of K^n as the space of column n-vectors.
15. From the German “eigen” (pronounced EYE-gn) meaning “own” or “private”, in the sense that they are the matrix's own special values.

Definition. If A is an n × n matrix then the characteristic polynomial of A is defined by

χA (x) = det(A − xIn ).

Remark. The symbol x here is a “formal variable”. What do we mean by A − xIn ? It is a matrix whose
entries are polynomials with variable x. Polynomials don't actually form a field (exercise: which axioms
fail?) but the determinant of this matrix is calculated just as if they did, by expanding along a row or column, and using
algebraic addition and multiplication of polynomials in place of the normal addition and multiplication of
numbers. What comes out is a polynomial (of degree n, as it happens) in the variable x.

Example (continued). Let A be the matrix above. Then


     
$$A - xI_n = \begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix} - \begin{pmatrix} x & 0 \\ 0 & x \end{pmatrix} = \begin{pmatrix} 3-x & 1 \\ 1 & 3-x \end{pmatrix}$$
and
$$\chi_A(x) = \det(A - xI_n) = \det\begin{pmatrix} 3-x & 1 \\ 1 & 3-x \end{pmatrix} = (3-x) \times (3-x) - 1 \times 1 = (3-x)^2 - 1 = x^2 - 6x + 8 = (x-4)(x-2).$$

I have factorised the polynomial so we can see what the roots are: notice that one of them is 4, which we
saw was an eigenvalue of A. In fact, this is a general phenomenon:

Theorem 5.1. Let A be an n × n matrix over a field K, and λ ∈ K. Then λ is an eigenvalue of A if and
only if χA (λ) = 0.

Proof. We have

λ is an eigenvalue of A ⇐⇒ AX = λX for some X ≠ 0_{n×1}
⇐⇒ AX − λX = 0_{n×1} for some X ≠ 0_{n×1}
⇐⇒ AX − λI_n X = 0_{n×1} for some X ≠ 0_{n×1}
⇐⇒ (A − λI_n)X = 0_{n×1} for some X ≠ 0_{n×1}
⇐⇒ A − λI_n is not invertible (by Theorem 3.4)
⇐⇒ det(A − λI_n) = 0 (by Theorem 4.10)
⇐⇒ χ_A(λ) = 0 (by the definition of χ_A(x))

Example (continued). Keeping A as above, we know that χ_A(x) has roots 2 and 4. We already know
that 4 is an eigenvalue, but Theorem 5.1 tells us that 2 is an eigenvalue as well. What are the corresponding
eigenvectors? If $X = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$ is one of them then AX = 2X, or equivalently, (A − 2I_2)X = 0_{2×1}, that is,
$$\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$
The solutions to this are x_1 = t, x_2 = −t, so the set of all eigenvectors corresponding to the eigenvalue 2 is:
$$\left\{ \begin{pmatrix} t \\ -t \end{pmatrix} \;\middle|\; t \in \mathbb{R},\ t \neq 0 \right\}.$$

Notice how we exclude the case t = 0, since this gives the zero vector which by definition is not an eigenvector.
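As a quick check (in Python with numpy, purely for illustration), taking t = 1 in the family above gives the vector (1, −1), and multiplying by A really does just scale it by 2.

import numpy as np

A = np.array([[3, 1], [1, 3]])
x = np.array([1, -1])          # the eigenvector with t = 1
print(A @ x)                   # prints [ 2 -2 ], which is 2 * x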

Remark. By the same line of reasoning, the collection of eigenvectors corresponding to a given eigenvalue
will always be the solution space to some system of homogeneous equations (minus the trivial solution,
which is the zero vector).

 
Example. What are the eigenvalues and eigenvectors of $A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$? This time
$$\chi_A(x) = \det(A - xI_2) = \det\begin{pmatrix} -x & 1 \\ -1 & -x \end{pmatrix} = (-x)(-x) - (1 \times (-1)) = x^2 + 1$$
so the eigenvalues are the solutions to x² + 1 = 0. If we regard A as a matrix over R (or Q) there are no
solutions, so A does not have any (real) eigenvalues. On the other hand, A is also a matrix over C
(every real number is also a complex number!), and this equation does have solutions in C, namely x = i
and x = −i. So A has (complex) eigenvalues i and −i. (Can you find the corresponding eigenvectors?)
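Numerical libraries usually work over C, so they report the complex eigenvalues here. A small illustrative check in Python with numpy (not part of the course):

import numpy as np

A = np.array([[0, 1], [-1, 0]])
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)   # approximately [0.+1.j  0.-1.j], i.e. the eigenvalues i and -i (in some order)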

5.3 Linear Combinations and Spans

Definition. Let v ∈ K n and S = {w1 , . . . , wt } ⊆ K n . Then

(i) v is called a linear combination of w1 , . . . , wt if there exist λ1 , . . . , λt ∈ K such that

v = λ1 w1 + λ2 w2 + · · · + λt wt .

(ii) span(S) is the set of all vectors which are linear combinations of w1 , . . . , wt , that is

span(S) = {µ1 w1 + µ2 w2 + · · · + µt wt | µ1 , . . . , µt ∈ K}.

Example. Let w1 = (1, 2, −1), w2 = (6, 4, 2) and S = {w1 , w2 } ⊆ R3 .

(i) Is v = (9, 2, 7) a linear combination of w1 and w2 ? In other words, can we find λ1 , λ2 ∈ R such that
v = λ1 w1 + λ2 w2 ? In other words, such that

(9, 2, 7) = λ1 (1, 2, −1) + λ2 (6, 4, 2)


= (λ1 , 2λ1 , −1λ1 ) + (6λ2 , 4λ2 , 2λ2 )
= (λ1 + 6λ2 , 2λ1 + 4λ2 , −1λ1 + 2λ2 )

In other words, such that

λ1 + 6λ2 = 9, 2λ1 + 4λ2 = 2 and − 1λ1 + 2λ2 = 7.

This is a just a system of linear equations. Solving (e.g. by Gaussian elimination), we see that there
is a solution (λ1 = −3 and λ2 = 2). So the answer is YES, v is a linear combination of w1 and w2 .

(ii) Is v = (4, −1, 8) a linear combination of w1 and w2 ? This time v = λ1 w1 + λ2 w2 would mean

(4, −1, 8) = λ1 (1, 2, −1) + λ2 (6, 4, 2),

that is,
λ1 + 6λ2 = 4, 2λ1 + 4λ2 = −1 and − 1λ1 + 2λ2 = 8.
These equations have no solutions (check!), so this time v is not a linear combination of w1 and w2 .

(iii) What is span(S)? By definition we have:

span(S) = {λ1 w1 + λ2 w2 | λ1 , λ2 ∈ R}
= {(λ1 + 6λ2 , 2λ1 + 4λ2 , −1λ1 + 2λ2 ) | λ1 , λ2 ∈ R}

By the previous parts we have (9, 2, 7) ∈ span(S) but (4, −1, 8) ∉ span(S).
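Deciding membership of a span is just solving a linear system, so it can be automated. A hedged sketch in Python with numpy (the least-squares routine is used here only as a convenient solver):

import numpy as np

W = np.column_stack([(1, 2, -1), (6, 4, 2)])    # w1 and w2 as the columns of a 3 x 2 matrix
for v in (np.array([9, 2, 7]), np.array([4, -1, 8])):
    coeffs, *_ = np.linalg.lstsq(W, v, rcond=None)   # best candidate for (lambda_1, lambda_2)
    print(v, "in span(S):", np.allclose(W @ coeffs, v), "candidate coefficients:", coeffs)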

Remark. Notice that the zero vector is always in span(S), since it can be written as

0 = 0w1 + 0w2 + · · · + 0wt .

Remark/Definition. What if S is the empty set ∅? You might expect span(∅) to be the empty set, but
it turns out to be convenient to define
span(∅) = {0},
so that, even in this case, the zero vector is always in the span.

5.4 Subspaces

Definition. Let U be a non-empty subset of K^n. Then U is a subspace of K^n if

(i) ∀u1 , u2 ∈ U, u1 + u2 ∈ U (U is “closed under addition”); and

(ii) ∀λ ∈ K, ∀u ∈ U, λu ∈ U (U is “closed under scaling”).

Examples and Non-examples.

(i) R^n is a subspace of R^n.

(ii) {0_n} is a subspace of R^n.

(iii) {(λ, 0) | λ ∈ R} is a subspace of R².

(iv) {(µ, µ) | µ ∈ R} is a subspace of R².

(v) {(λ, 1) | λ ∈ R} is not a subspace of R² (it is not closed under addition, and does not contain the zero vector).

(vi) ∅ is not a subspace of R^n for any n (a subspace must be non-empty).

Proposition 5.2. If U is a subspace of K n then 0n ∈ U .

Proof. Since U is non-empty we may choose some u ∈ U . Now (−1)u ∈ U so 0n = u + (−1)u ∈ U .

Theorem 5.3. Let S = {w1, . . . , wt} ⊆ K^n. Then span(S) is a subspace of K^n.

Proof. We have seen that the zero vector is always in span(S), so span(S) is certainly not empty.
Let u, v ∈ span(S). Then

u = λ1 w1 + λ2 w2 + · · · + λt wt and v = µ1 w1 + µ2 w2 + · · · + µt wt

for some λ1 , . . . , λt , µ1 , . . . , µt ∈ K. Now we have

u + v = (λ1 w1 + λ2 w2 + · · · + λt wt ) + (µ1 w1 + µ2 w2 + · · · + µt wt )
= (λ1 + µ1 )w1 + (λ2 + µ2 )w2 + · · · + (λt + µt )wt

which means that u + v ∈ span(S).


Now let λ ∈ K and u ∈ span(S). Then again

u = λ1 w1 + λ2 w2 + · · · + λt wt

for some λ1 , . . . , λt ∈ K, so

λu = λ(λ1 w1 + λ2 w2 + · · · + λt wt ) = (λλ1 )w1 + (λλ2 )w2 + · · · + (λλt )wt

which means that λu ∈ span(S).


Thus, span(S) is a subspace of K n .

The following theorem gives many more examples of subspaces.

Theorem 5.4. Let M be an m × n matrix over a field K. Let U be the set of all n × 1 column vectors X
such that
M X = 0m×1 .
Then U is a subspace of K n .

Proof. First notice that 0n×1 ∈ U , so U is non-empty.


Now if u, v ∈ U then M u = M v = 0m×1 so using Theorem 1.2

M (u + v) = M u + M v = 0m×1 + 0m×1 = 0m×1

which means that u + v ∈ U .


Similarly, if λ ∈ K and u ∈ U then (this argument again uses the basic properties of matrix operations from Theorem 1.2; from here on we will use these without explicit reference)

M (λu) = λ(M u) = λ0m×1 = 0m×1

so that λu ∈ U .
Thus, U is a subspace of K n .

Definition. The subspace U in Theorem 5.4 is called the kernel or nullspace of the matrix M . (More on
this in the second part of the course.)

Corollary 5.5. For any homogeneous system of m linear equations with n variables, the solution space is
a subspace of K n .

Proof. Because the constant terms are all zero, when we write the equations in matrix form (see Section 2.4)
they have the form
M X = 0m×1 ,
where M is the m × n matrix of coefficients and X is the column matrix of variables. So the solution set is by
definition exactly the kernel of the matrix of coefficients M .
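Numerically, one convenient way to find a kernel is via the singular value decomposition: the rows of V^T corresponding to (numerically) zero singular values span the kernel. A sketch in Python with numpy (the function name kernel_basis and the tolerance are illustrative choices, not part of the notes):

import numpy as np

def kernel_basis(M, tol=1e-10):
    """Return an orthonormal basis (as rows) for the kernel of M, using the SVD."""
    U, s, Vt = np.linalg.svd(M)
    rank = int(np.sum(s > tol))   # number of (numerically) non-zero singular values
    return Vt[rank:]              # the remaining right singular vectors span the kernel

M = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])   # rank 1, so the kernel is 2-dimensional
print(kernel_basis(M))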

5.5 Dependence and Independence

Definition. S = {w1, . . . , wt} ⊆ K^n is a linearly independent set if the only way to write 0 as a linear
combination
0 = λ1 w1 + λ2 w2 + · · · + λt wt
is with λ1 = λ2 = · · · = λt = 0. Otherwise S is a linearly dependent set.

Example. In R2 , the set {(1, 0), (0, 1)} is linearly independent. It should be obvious that

λ1 (1, 0) + λ2 (0, 1) = (λ1 , λ2 )

so the only way this can be 0 is if λ1 = λ2 = 0. Intuitively, if you start at the origin and travel some non-
zero distance in the x-direction, you clearly can’t return to the origin by travelling only some distance (even
“backwards”) in the y-direction. In this sense, the x-direction and y-direction are independent directions,
which is why the vectors (1, 0) (in the x-direction) and (0, 1) (in the y-direction) are linearly independent.

Example. Let w1 = (1, −2, 3), w2 = (5, 6, −1), w3 = (3, 2, 1) and S = {w1 , w2 , w3 } ⊆ R3 . Is S linearly
dependent or linearly independent? To answer this we need to consider solutions to

λ1 w1 + λ2 w2 + λ3 w3 = (0, 0, 0), in other words,


λ1 + 5λ2 + 3λ3 = 0,    −2λ1 + 6λ2 + 2λ3 = 0,    3λ1 − λ2 + λ3 = 0.

Gaussian elimination tells us that the solutions are of the form
λ1 = λ2 = −t/2, λ3 = t
for t ∈ R. Putting t = 2, for example, gives a solution with λ1, λ2, λ3 not all zero, so S is linearly dependent.

Example. Let w1 = (4, −1, 2), w2 = (−4, 10, 2) and S = {w1 , w2 } ⊆ R3 . Is S linearly dependent or
independent? This time we consider the solutions to

λ1 w1 + λ2 w2 = (0, 0, 0), that is,

4λ1 − 4λ2 = 0, −λ1 + 10λ2 = 0, 2λ1 + 2λ2 = 0.

Here the only solution is λ1 = λ2 = 0 (check!), which means that S is linearly independent.
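Both of these checks can be done mechanically: a set of vectors is linearly independent exactly when the matrix having them as rows has rank equal to the number of vectors. An illustrative check in Python with numpy:

import numpy as np

S1 = np.array([[1, -2, 3], [5, 6, -1], [3, 2, 1]])
S2 = np.array([[4, -1, 2], [-4, 10, 2]])
print(np.linalg.matrix_rank(S1))   # 2, which is less than 3, so S1 is linearly dependent
print(np.linalg.matrix_rank(S2))   # 2, which equals the number of vectors, so S2 is linearly independent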

5.6 Basis and Dimension

Definition. Let U be a subspace of K n . A basis for U is a subset S ⊆ U such that

(i) S is linearly independent; and

(ii) span(S) = U.

Example. Let S = {(4, −1, 2), (−4, 10, 2)}. We saw in the example at the end of Section 5.5 that S is
linearly independent, so S is a basis for span(S).

Example. The set


S = { (1, 0, 0, . . . , 0), (0, 1, 0 . . . , 0), . . . , (0, 0, . . . , 0, 1) }

is a basis for Rn (exercise: why?). It is called the standard basis for Rn .

Theorem 5.6. Suppose U is a subspace of K^n. Then U has a finite basis, and if B1 and B2 are two different
bases for U then |B1| = |B2|.

Proof. This will be proved in the second part of the course.

Definition. If U is a subspace of K n then the dimension of U is |B| where B is a basis for U . We write
dim U for the dimension of U . (Notice how Theorem 5.6 ensures that this definition makes sense: without
it, the dimension might depend on which basis B we choose!)

Lemma 5.7. Let U be a subspace of K n . If S ⊆ U , S is linearly independent and |S| = dim U then S is a
basis for U .

Proof. Again, to be proved in the second part of the course.


17. Alternatively, we could observe that the presence of a parameter in the solution description immediately tells us that there are infinitely many solutions, so λ1 = λ2 = λ3 = 0 obviously can't be the only solution!
18. “Bases”, pronounced “bay-seez”, is the plural of “basis”.

5.7 Geometry of Subspaces in Rn
An m-dimensional subspace of Rn basically looks like a copy of Rm sitting inside Rn and containing the
origin. In other words, it is a copy of m-space inside n-space, for example a line in space, a plane in space,
a 3-space in 4-space and so on.

Example. A 2-dimensional subspace of R3 is simply a plane (through the origin, since a subspace always
contains the zero vector). A 1-dimensional subspace is a line through the origin, while a 0-dimensional
subspace is a point (which has to be the origin!).

Remark. Intuitively, you can think of a basis as giving a system of m coordinates for describing points
in the subspace. For example, if U is a 2-dimensional subspace of R3 then it is a plane, so we should be able
to describe points on it with only 2 coordinates. Choose a basis {u, v} for U . Then each point in U can be
written as λu + µv in exactly one way, and we can think of (λ, µ) as its coordinates in the plane. Notice
how the basis vectors themselves are u = 1u + 0v and v = 0u + 1v, so they get coordinates (1, 0) and (0, 1)
respectively.

6 Orthogonality
In this chapter, we shall work only with vectors and matrices over R, and not over a general field. (The
reason for this will become apparent.)

6.1 The Euclidean Inner Product and Norm

Definition. Let u = (a1, . . . , an), v = (b1, . . . , bn) ∈ R^n. The (Euclidean) inner product on R^n is
defined by
⟨u | v⟩ = a1 b1 + a2 b2 + · · · + an bn.

Remark. If u and v are column n-vectors, notice that ⟨u | v⟩ is just the entry of the 1 × 1 matrix u^T v.
Proposition 6.1. Let u, v, w ∈ R^n and λ ∈ R. Then
(i) ⟨u | v⟩ = ⟨v | u⟩;
(ii) ⟨u + v | w⟩ = ⟨u | w⟩ + ⟨v | w⟩;
(iii) ⟨λu | v⟩ = λ⟨u | v⟩ = ⟨u | λv⟩;
(iv) ⟨v | v⟩ ≥ 0, and ⟨v | v⟩ = 0 ⇐⇒ v = 0_n.
Proof. Let u = (a1, . . . , an), v = (b1, . . . , bn) and w = (c1, . . . , cn).
(i) Clear from the definition.
(ii) u + v = (a1 + b1, a2 + b2, . . . , an + bn), so
⟨u + v | w⟩ = (a1 + b1)c1 + (a2 + b2)c2 + · · · + (an + bn)cn = (a1 c1 + · · · + an cn) + (b1 c1 + · · · + bn cn) = ⟨u | w⟩ + ⟨v | w⟩.

(iii) λu = (λa1, . . . , λan) so
⟨λu | v⟩ = λa1 b1 + λa2 b2 + · · · + λan bn = λ(a1 b1 + · · · + an bn) = λ⟨u | v⟩
and similarly for ⟨u | λv⟩.
(iv) ⟨v | v⟩ = b1² + · · · + bn² ≥ 0 since all the bi are real, and this is clearly equal to 0 if and only if
b1 = b2 = · · · = bn = 0, that is, v = 0_n.

Definition. For v = (a1, a2, . . . , an) ∈ R^n, the (Euclidean) norm (also called the magnitude or length)
of v is
||v|| = √⟨v | v⟩ = √(a1² + a2² + · · · + an²).

Remark. The norm of v is just the distance from the origin to the point with coordinates given by
v. (Notice that in 2 dimensions this is Pythagoras' Theorem!) In particular, note that ||v|| ≥ 0 and
||v|| = 0 ⇐⇒ v = 0_n. If n = 1, say v = (v1), notice that ||v|| = √(v1²) is just the absolute value |v1| of the
entry v1, that is, v1 if v1 ≥ 0 and −v1 if v1 < 0.

Remark. The norm gives us an algebraic way to talk about distance in Rn : the distance between the
points with coordinate vectors a and b is ||a − b||. This is the “usual” notion of the distance between two
points (there are others!) in Rn , as first studied in detail by Euclid.
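In numerical work these are the standard dot-product and norm functions; for instance, in Python with numpy (given only as an illustration):

import numpy as np

a = np.array([1.0, 2.0, 2.0])
b = np.array([0.0, 2.0, 0.0])
print(np.dot(a, b))            # the inner product <a | b> = 4
print(np.linalg.norm(a))       # the norm ||a|| = sqrt(1 + 4 + 4) = 3
print(np.linalg.norm(a - b))   # the distance ||a - b|| = sqrt(1 + 0 + 4) = sqrt(5)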
19. After Euclid of Alexandria, 3rd–4th century BCE. He didn't invent inner products — vectors in the abstract sense we are studying are a 19th century innovation — but we'll see shortly that this definition gives a notion of distance in R^n which makes it into the geometric space which Euclid studied. The inner product is sometimes called the scalar product (because it gives a scalar) or the dot product (because some people write it u · v instead of ⟨u | v⟩).
20. For the first time in the whole course, we are using here a fact about the real numbers which isn't true for fields in general. Actually property (iv) in the statement of the proposition does not really make sense over C (for example) since the order ≥ does not make sense on the complex numbers. It turns out that we can do something similar to the inner product for complex numbers (see Exercise Sheet 5) but not for fields in general.

6.2 Orthogonality and Orthonormality

Definition. Two vectors u and v are called orthogonal if ⟨u | v⟩ = 0. A set S ⊆ R^n is called an orthogonal
set if every pair of vectors from S is orthogonal.

Remark. Geometrically, vectors are orthogonal if the lines from the origin to the corresponding points are
at right angles (or if at least one of them is the zero vector).

Example. Let u = (−1, 3, 2), v = (4, 2, −1) and w = (1, 1, 2). Then
⟨u | v⟩ = (−1 × 4) + (3 × 2) + (2 × −1) = 0
so u and v are orthogonal. On the other hand,
⟨u | w⟩ = (−1 × 1) + 3 × 1 + 2 × 2 = 6
so u and w are not orthogonal. Hence, the set {u, v} is orthogonal, but the set {u, v, w} is not. Also,
||u|| = √⟨u | u⟩ = √((−1)² + 3² + 2²) = √14.

Exercise. What are ||w|| and ⟨v | w⟩? Is {v, w} an orthogonal set?


Proposition 6.2. Let u, v ∈ R^n and λ, µ ∈ R. If u and v are orthogonal then λu and µv are orthogonal.
Proof. If u and v are orthogonal then ⟨u | v⟩ = 0, so using Proposition 6.1(iii) twice,
⟨λu | µv⟩ = λ⟨u | µv⟩ = λµ⟨u | v⟩ = 0,
so λu and µv are orthogonal.

Definition. A vector u ∈ R^n is called a unit vector if ||u|| = 1. A set S ⊆ R^n is called an orthonormal
set if
(i) S is an orthogonal set; and
(ii) every vector in S is a unit vector.

Example. In R3 the set {(1, 0, 0), (0, 0, 1)} is orthonormal (exercise: check!).
Lemma 6.3. Let v ∈ Rn and λ ∈ R. Then ||λv|| = |λ| ||v||.
Proof. Using Proposition 6.1(iii) again,
||λv|| = √⟨λv | λv⟩ = √(λ²⟨v | v⟩) = |λ|√⟨v | v⟩ = |λ| ||v||.

Corollary 6.4. If v ∈ R^n is a non-zero vector then (1/||v||) v is a unit vector.
Proof. Since ||v|| is positive, by Lemma 6.3 we have
|| (1/||v||) v || = (1/||v||) ||v|| = 1.

Example. Let u = (0, 1, 0), v = (1, 0, 1) and w = (1, 0, −1) and let S = {u, v, w}. Then ⟨u | v⟩ = ⟨u | w⟩ =
⟨v | w⟩ = 0 (exercise: check!) so S is an orthogonal set. Although ||u|| = 1 the set is not orthonormal
because ||v|| = ||w|| = √2 (exercise: again, check!). However, by Corollary 6.4 and Proposition 6.2 the
set
{ u, (1/||v||) v, (1/||w||) w } = { (0, 1, 0), (1/√2, 0, 1/√2), (1/√2, 0, −1/√2) }
is orthonormal.

6.3 Orthonormal Bases

Definition. Let U be a subspace of Rn and let S ⊆ U . We say that S is an orthonormal basis for U if
S is an orthonormal set and S is a basis for U .
Recall (from Section 5.6) that if S is a basis for U then we can write every vector in U as a linear
combination of vectors in S. In general, finding how to write a given vector in this way involves solving a
system of linear equations. When S is an orthonormal basis, however, things are much easier:

Lemma 6.5. Let U be a subspace of R^n and S = {v1, v2, . . . , vk} an orthonormal basis for U. Then for
every u ∈ U,
u = ⟨u | v1⟩v1 + ⟨u | v2⟩v2 + · · · + ⟨u | vk⟩vk.

Proof. Since S is a basis for U and u ∈ U we can write u = λ1 v1 + · · · + λk vk for some λ1, . . . , λk ∈ R. Now
for each i,

⟨u | vi⟩ = ⟨λ1 v1 + λ2 v2 + · · · + λk vk | vi⟩
= λ1⟨v1 | vi⟩ + λ2⟨v2 | vi⟩ + · · · + λk⟨vk | vi⟩   (by Proposition 6.1)
= λi⟨vi | vi⟩   (because ⟨vj | vi⟩ = 0 for j ≠ i)
= λi   (because ⟨vi | vi⟩ = 1)

Theorem 6.6. If S = {v1, . . . , vk} is an orthogonal set of non-zero vectors in R^n then S is linearly
independent.

Proof. Suppose
λ1 v1 + λ2 v2 + · · · + λk vk = 0_n.

Recalling the definition of linear independence (see Section 5.5), we need to show that λ1 = · · · = λk = 0.
For each i we have

0 = ⟨0_n | vi⟩
= ⟨λ1 v1 + · · · + λk vk | vi⟩
= λ1⟨v1 | vi⟩ + · · · + λk⟨vk | vi⟩   (by Proposition 6.1)
= λi⟨vi | vi⟩   (because ⟨vj | vi⟩ = 0 for j ≠ i)

Now vi is a non-zero vector so ⟨vi | vi⟩ ≠ 0 by Proposition 6.1(iv), so we must have λi = 0. Thus, S is
linearly independent.

Example. Let S = {u, v, w} where u = (0, 1, 0), v = (−4/5, 0, 3/5) and w = (3/5, 0, 4/5). Then S is an orthogonal
(in fact orthonormal) set (check!) so, by Theorem 6.6, S is linearly independent. Since dim R³ = 3 = |S|,
Lemma 5.7 tells us that S is a basis (in fact, an orthonormal basis) for R³.
Hence every element of R³ can be written as a linear combination of u, v and w, and because S is
an orthonormal basis, Lemma 6.5 gives us a practical way to do this. For example, consider x = (1, 1, 1) ∈ R³.
We have:
(1, 1, 1) = x = ⟨x | u⟩u + ⟨x | v⟩v + ⟨x | w⟩w = u − (1/5)v + (7/5)w.

Remark. Recall from Section 5.7 that we can think of a basis for an m-dimensional subspace as giving a
system of m coordinates for describing points in the subspace. In particular, viewing R³ as a subspace of
itself, the basis S gives a different system of coordinates for R³. The point which has “proper” coordinates
(1, 1, 1) has coordinates (1, −1/5, 7/5) in this new system of coordinates.
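Lemma 6.5 makes these coordinates very cheap to compute: each one is a single inner product. A short illustrative computation in Python with numpy, reproducing the example above:

import numpy as np

u = np.array([0.0, 1.0, 0.0])
v = np.array([-4/5, 0.0, 3/5])
w = np.array([3/5, 0.0, 4/5])
x = np.array([1.0, 1.0, 1.0])

coords = [np.dot(x, b) for b in (u, v, w)]             # <x|u>, <x|v>, <x|w>
print(coords)                                          # [1.0, -0.2, 1.4], i.e. (1, -1/5, 7/5)
print(coords[0] * u + coords[1] * v + coords[2] * w)   # recovers x = (1, 1, 1)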

6.4 Gram-Schmidt Orthogonalization
We have seen that for understanding a subspace, it is useful to have an orthonormal basis. This naturally
leads to two questions: does every subspace have an orthonormal basis, and if so, how do we find it?
Theorem 6.7. Every subspace of Rn has an orthonormal basis.
By way of a proof, we will present an algorithm called the Gram-Schmidt Orthogonalization
Process, which answers the second question of how we find an orthonormal basis. The algorithm starts
with a basis for a subspace of R^n (recall that by Theorem 5.6, every subspace has one!) and produces from
it an orthonormal basis for the same subspace.
Let U be a subspace of Rn , and {u1 , . . . , uk } ⊆ Rn a (not necessarily orthonormal) basis for U .

Step 1. Set v1 = u1 .

Step 2. Set
v2 = u2 − (⟨u2 | v1⟩ / ⟨v1 | v1⟩) v1.
Notice that v2 ∈ U, because it is a linear combination of things from U and U is a subspace. Notice also that
⟨v1 | v2⟩ = ⟨v1 | u2 − (⟨u2 | v1⟩ / ⟨v1 | v1⟩) v1⟩ = ⟨v1 | u2⟩ − ⟨u2 | v1⟩ = 0.
Moreover, v2 ≠ 0_n, because otherwise we would have
0_n = v2 = u2 − (⟨u2 | v1⟩ / ⟨v1 | v1⟩) v1 = 1u2 − (⟨u2 | v1⟩ / ⟨v1 | v1⟩) u1
which is impossible because the set {u1, . . . , uk} is linearly independent.

Step 3. Set
v3 = u3 − (⟨u3 | v2⟩ / ⟨v2 | v2⟩) v2 − (⟨u3 | v1⟩ / ⟨v1 | v1⟩) v1.
By similar arguments to Step 2, v3 ∈ U and v3 ≠ 0_n. Now because ⟨v1 | v2⟩ = 0 we have
⟨v1 | v3⟩ = ⟨v1 | u3 − (⟨u3 | v2⟩ / ⟨v2 | v2⟩) v2 − (⟨u3 | v1⟩ / ⟨v1 | v1⟩) v1⟩
= ⟨v1 | u3⟩ − (⟨u3 | v2⟩ / ⟨v2 | v2⟩)⟨v1 | v2⟩ − (⟨u3 | v1⟩ / ⟨v1 | v1⟩)⟨v1 | v1⟩
= ⟨v1 | u3⟩ − ⟨u3 | v1⟩ = 0.

A similar argument (exercise!) shows that ⟨v2 | v3⟩ = 0.


Now continue in the same way until finally.....

Step k. Set
vk = uk − (⟨uk | vk−1⟩ / ⟨vk−1 | vk−1⟩) vk−1 − · · · − (⟨uk | v2⟩ / ⟨v2 | v2⟩) v2 − (⟨uk | v1⟩ / ⟨v1 | v1⟩) v1.
Again we find that vk ≠ 0_n, vk ∈ U and ⟨vi | vk⟩ = 0 for all i < k.
So we end up with non-zero vectors v1, . . . , vk ∈ U such that each vj is orthogonal to each vi with i < j.
Thus, {v1, . . . , vk} is an orthogonal (but not yet orthonormal!) subset of U. Now set
S = { (1/||v1||) v1, (1/||v2||) v2, . . . , (1/||vk||) vk }.
Then S is orthogonal by Proposition 6.2 and the elements are unit vectors by Corollary 6.4, so S is or-
thonormal. Now by Theorem 6.6, S is linearly independent. Finally,

|S| = k = |{u1 , . . . , uk }| = dim U

so by Lemma 5.7, S is a basis for U .


21. After Jørgen Pedersen Gram (1850–1916) and Erhard Schmidt (1876–1959), although the idea was known rather earlier.
22. From here on we will make frequent use of the basic properties of the inner product, given by Proposition 6.1, without explicit mention.

Example. Let u1 = (1, 1, 1), u2 = (0, 3, 3) and u3 = (0, 0, 1). Consider the subspace

U = span({u1 , u2 , u3 }) ⊆ R3

of all linear combinations of these vectors. It is easy to check that these three vectors are linearly independent,
so they form a basis for U . However, they are neither orthogonal (check!) nor (except for u3 ) unit vectors,
so they are not an orthonormal basis. We apply the Gram-Schmidt process to compute an orthonormal
basis. First we set:
v1 = u1 = (1, 1, 1)
Now ⟨u2 | v1⟩ = 6 and ⟨v1 | v1⟩ = 3 so we set
v2 = u2 − (⟨u2 | v1⟩ / ⟨v1 | v1⟩) v1 = (0, 3, 3) − (6/3)(1, 1, 1) = (−2, 1, 1).
Now we have ⟨v2 | v2⟩ = 6, ⟨u3 | v1⟩ = 1 and ⟨u3 | v2⟩ = 1 so we set
v3 = u3 − (⟨u3 | v2⟩ / ⟨v2 | v2⟩) v2 − (⟨u3 | v1⟩ / ⟨v1 | v1⟩) v1 = (0, 0, 1) − (1/6)(−2, 1, 1) − (1/3)(1, 1, 1) = (0, −1/2, 1/2).
Thus,
{v1, v2, v3} = { (1, 1, 1), (−2, 1, 1), (0, −1/2, 1/2) }
is a basis of orthogonal vectors for the space U. To turn it into an orthonormal basis we just have to scale
to get unit vectors, so
{ (1/√3)(1, 1, 1), (1/√6)(−2, 1, 1), (1/√2)(0, −1, 1) }
is an orthonormal basis for U.

Remark. What is U in the previous example? In fact since {u1, u2, u3} is a linearly independent set of size
3 = dim R³ it must be that U = R³! So what we have found is an orthonormal basis for R³.
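The whole process is easily mechanised. A minimal Python sketch (an illustration only, assuming the input vectors are linearly independent; numpy is used just for the arithmetic):

import numpy as np

def gram_schmidt(vectors):
    """Turn a list of linearly independent vectors into an orthonormal basis of their span."""
    orthogonal = []
    for u in vectors:
        v = np.array(u, dtype=float)
        for w in orthogonal:
            v = v - (np.dot(u, w) / np.dot(w, w)) * w   # subtract the component of u along w
        orthogonal.append(v)
    return [v / np.linalg.norm(v) for v in orthogonal]   # scale each vector to have norm 1

for v in gram_schmidt([(1, 1, 1), (0, 3, 3), (0, 0, 1)]):
    print(v)   # (1,1,1)/sqrt(3), (-2,1,1)/sqrt(6), (0,-1,1)/sqrt(2), as computed above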

Exercise. Use the Gram-Schmidt process to convert {u1 , u2 , u3 } into an orthonormal basis for R3 where

u1 = (1, 0, 0), u2 = (3, 2, −2) and u3 = (1, 4, 1).

6.5 Orthogonal Matrices

Definition. A square matrix A is called orthogonal if A is invertible and A^{−1} = A^T.

Theorem 6.8. Let A be an n × n matrix. Then the following are equivalent:

(i) A is orthogonal;

(ii) A^T A = I_n;

(iii) AA^T = I_n;

(iv) the columns of A (viewed as column n-vectors) form an orthonormal basis for R^n;

(v) the rows of A (viewed as row n-vectors) form an orthonormal basis for R^n.

Proof. The equivalence of (i), (ii) and (iii) follows from Exercise 5.8.
Now let c1, . . . , cn be the columns of A, viewed as column n-vectors. It follows from the definitions of
matrix multiplication, the transpose and the inner product that
$$(A^T A)_{ij} = \sum_{k=1}^{n} (A^T)_{ik} A_{kj} = \sum_{k=1}^{n} A_{ki} A_{kj} = \langle c_i \mid c_j \rangle$$
for all i and j. So A^T A = I_n means exactly that
(A^T A)_{ii} = ⟨ci | ci⟩ = 1 for all i   and   (A^T A)_{ij} = ⟨ci | cj⟩ = 0 for i ≠ j,
which means exactly that {c1, . . . , cn} is an orthonormal set. Since an orthonormal set of n vectors is
linearly independent (Theorem 6.6) and dim R^n = n, by Lemma 5.7 this is the same as {c1, . . . , cn} being
an orthonormal basis. This shows that (ii) ⇐⇒ (iv).
A very similar argument with rows instead of columns (exercise!) shows (iii) ⇐⇒ (v).

Remark/Warning. The terminology here is very confusing: an orthogonal matrix is one whose rows
(and columns) are orthonormal, not just orthogonal! (It might be better if orthogonal matrices were called
orthonormal matrices instead, but unfortunately this is an example of where illogical names have become
too ingrained to change.)

Remark. The equivalence of (iv) and (v) alone is quite remarkable. Given an orthonormal basis for R^n,
write it down as the rows of an n × n matrix. Magically, the columns of the matrix will be another
orthonormal basis! For example, consider the orthonormal basis for R³ which we obtained in the example
in Section 6.4. Writing the basis vectors as the rows of a matrix we get
$$A = \begin{pmatrix} \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} \\ \frac{-2}{\sqrt{6}} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}} \\ 0 & \frac{-1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix}.$$

Theorem 6.8 tells us both that A will be an orthogonal matrix (exercise: check!), and that the vectors
which make up the columns of A will form another orthonormal basis for R³, namely:
(1/√3, −2/√6, 0),  (1/√3, 1/√6, −1/√2),  (1/√3, 1/√6, 1/√2).
In fact, the rows and the columns of a matrix are often connected in deep and mysterious ways; this is an
example of a very general mathematical phenomenon called duality.
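This is easy to verify numerically; the following Python snippet (numpy, for illustration only) checks that A^T A = A A^T = I_3 and reads off the column basis.

import numpy as np

A = np.array([[1/np.sqrt(3),  1/np.sqrt(3), 1/np.sqrt(3)],
              [-2/np.sqrt(6), 1/np.sqrt(6), 1/np.sqrt(6)],
              [0.0,          -1/np.sqrt(2), 1/np.sqrt(2)]])
print(np.allclose(A.T @ A, np.eye(3)))   # True, so A is orthogonal
print(np.allclose(A @ A.T, np.eye(3)))   # True as well, as Theorem 6.8 promises
print(A.T)                               # its rows are the columns of A: the second orthonormal basis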

23. Disclaimer: the University of Manchester does not really believe in magic. ;-)

