Linear Algebra (UCD)
MATH10390
Dr Richard Smith
Contents

0 Programme overview
  0.1 Programme outline
  0.2 Assessment and grading
  0.3 Continuous assessment schedules
  0.4 Discussion boards and MathJax
  0.5 Any other business
1 Matrices 1
  1.1 Introduction to matrices
  1.2 Matrix multiplication
  1.3 Further ideas from matrix arithmetic
  1.4 Inverses and determinants of 2 × 2 matrices
  1.5 Inverses and determinants of n × n matrices
2 Vector Geometry 1
  2.1 Euclidean n-space and vectors
  2.2 Vector arithmetic
  2.3 The scalar product
  2.4 Matrices acting on vectors
  2.5 Orthogonal projections
4 Vector Geometry 2
6 Matrices 2
  6.1 Quadratic forms
  6.2 Matrix norms
Programme overview
• Vector geometry 1
Vectors in n-dimensional Euclidean space, vector arithmetic, scalar
products, the Cauchy-Schwarz inequality, angles between vectors,
and the action of matrices on vectors.
• Vector geometry 2
Orthonormal lists of vectors, orthonormal bases of R^n and coordinate systems.
• Matrices 2
Quadratic forms and matrix norms.
• Differentiation
Rates of change, differentiation from first principles, relationship with continuity, the power, product, quotient and chain rules, derivatives of polynomials, trigonometric, exponential and logarithmic functions, and composites thereof.
• Integration
Indefinite integrals as antiderivatives, standard examples, Riemann sums, definite integrals and area, the Fundamental Theorem of Calculus.
• Methods of integration
Integration by substitution and integration by parts.
• Numerical techniques
Solving equations numerically, the bisection and Newton-Raphson methods, numerical integration, the trapezoidal rule and Simpson's rule.
Online material

1. UCD Mathematics Moodle
All module material will be made available on vector.ucd.ie/moodle. Once at the site, please log in using your UCD Connect credentials and then enrol to both modules (the enrolment key for both is 'ucdprofcert2023').

3. Continuous assessment
Continuous assessment comes in two forms: written homework and WeBWorK. Both will be issued and managed online – see Section 0.2 for more details. The full schedule of issue dates and assessment deadlines is given in Section 0.3.

4. Discussion boards
Students can post queries and discuss topics via the weekly discussion boards – see Section 0.4 for more information.
Grading
You will receive a mark out of 30 for your continuous assessment, which
will be converted to a letter grade according to the University’s Standard
Conversion Grade Scale (see under Mark to Grade Conversion Scales).
Likewise, you will receive a mark out of 70 for your final exam which will
be converted into a letter grade in the same manner. These two letter
grades will be combined to make an overall module grade; the precise
mechanism by which this will be achieved can be seen under Module
Grade Calculation Points.
Instructions on how to use MathJax in the discussion boards are given in Appendix A.1 (of either module).

You're free to use MathJax to post mathematical content. Alternatively you can post such content by writing it by hand and scanning it to a pdf (see above) or by using a suitable pdf annotator, and then attaching the pdf file to your post. This option may be preferable if you want to write a lot of mathematical content.

If you wish to withdraw from the programme, then please note that it is essential to do so by Friday 11 August (week 12), to ensure you do not have a failing grade recorded against your name on the University's system. Since this programme is only one trimester long, it is not possible to get a refund upon withdrawal; see point 1.7 of UCD's Refunds Policy.
Extenuating circumstances
The University has an Extenuating Circumstances Policy. The Univer-
sity defines extenuating circumstances to be ‘serious unforeseen circum-
stances beyond your control which prevented you from meeting the re-
quirements of your programme’. Note the following footnote on page 2
of the Student Guide on Extenuating Circumstances:
You can apply for extenuating circumstances online. For more details,
please contact Natalia Zadorozhnyaya ([email protected]).
Chapter 1
Matrices 1
Example 1.2.

1. $\begin{pmatrix} 2 & -11 & 27 \\ 1 & 0 & -5 \end{pmatrix}$ is a 2 × 3 matrix.
Remarks 1.3.
1. Two matrices have the same size if they have the same number
of rows and the same number of columns, so a 2 × 3 matrix and
a 3 × 2 matrix have different sizes.
5. Two matrices are equal if and only if they have the same size
and the corresponding entries are all equal.
Example 1.4. Let $A = \begin{pmatrix} 2 & \sqrt{2} & -3 \\ 4 & 0 & 5 \end{pmatrix}$.

Then $A_{12} = \sqrt{2}$ (i = 1, j = 2), $A_{23} = 5$ (i = 2, j = 3), $A_{13} = -3$, etc.
Find A + B and A − B.

Solution. We have
$$A + B = \begin{pmatrix} 2+(-1) & 0+1 & -1+0 & -1+(-2) \\ 1+3 & 2+(-3) & 4+1 & 2+1 \end{pmatrix} = \begin{pmatrix} 1 & 1 & -1 & -3 \\ 4 & -1 & 5 & 3 \end{pmatrix},$$
and
$$A - B = \begin{pmatrix} 2-(-1) & 0-1 & -1-0 & -1-(-2) \\ 1-3 & 2-(-3) & 4-1 & 2-1 \end{pmatrix} = \begin{pmatrix} 3 & -1 & -1 & 1 \\ -2 & 5 & 3 & 1 \end{pmatrix}.$$
$$(cA)_{ij} = cA_{ij}.$$

Example 1.8. If $A = \begin{pmatrix} 2 & 5 \\ 3 & -4 \end{pmatrix}$, then
$$2A = \begin{pmatrix} 4 & 10 \\ 6 & -8 \end{pmatrix}, \quad -5A = \begin{pmatrix} -10 & -25 \\ -15 & 20 \end{pmatrix}, \quad 0A = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.$$
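Since addition, subtraction and scalar multiplication all act entry by entry, they are easy to check numerically. The following is a minimal sketch (not part of the original notes; it assumes the Python library NumPy is available) reproducing Example 1.8:

    import numpy as np

    A = np.array([[2, 5],
                  [3, -4]])

    # Scalar multiplication acts on every entry: (cA)_ij = c * A_ij.
    print(2 * A)    # [[  4  10], [  6  -8]]
    print(-5 * A)   # [[-10 -25], [-15  20]]
    print(0 * A)    # the 2 x 2 zero matrix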
Zero matrices
The humble 0 has a much more exciting history than you might think, and is in fact a very special number. It is the only number that, when added to any other number, has no effect: a + 0 = 0 + a = a. For this reason, mathematicians call it an additive identity.

The last example in Example 1.8 prompts the following definition.

The zero matrices are the matrix analogues of 0: if the m × n zero matrix is added to any m × n matrix, the matrix remains unchanged.
1.2 Matrix multiplication
x1 + x2 + x3 + x4 = 2 + 4 + 6 + 8 = 20. (1.1)
This is fine for small numbers of terms (in this case 4), but very often we
need to add together much larger numbers of terms, such as a thousand
terms or a million terms. Writing out such large numbers of terms as
above would get upsetting very quickly.
Summation notation has been developed to deal with such eventualities. We will require it for matrix multiplication. The sum of terms $x_1 + x_2 + x_3 + x_4$ in (1.1) can be expressed instead as
$$\sum_{i=1}^{4} x_i. \qquad (1.2)$$
The symbol $\sum$ is the summation symbol, $i$ is the index of summation and 1 and 4 are the initial value (or lower limit) and final value (or upper limit) of the index of summation, respectively. Now we can express the sum of the first thousand terms of the sequence $x_i$ concisely as
$$\sum_{i=1}^{1000} x_i,$$
and so on.
Most of the time, the choice of letter to denote the index of summation does not matter. Letters such as $j$ and $k$ are commonly used instead of $i$. For example, the expressions
$$\sum_{j=1}^{4} x_j \quad\text{and}\quad \sum_{k=1}^{4} x_k$$

for all positive integers $i$. On the left hand side of (1.4), $i$ is fixed while $j$ varies. Assuming $i$ stays where it is, we can replace the summation index $j$ with any letter, provided that it is not $i$. For example,
$$\sum_{k=1}^{4} y_{ik} = \sum_{\ell=1}^{4} y_{i\ell} = 1^2 i + 2^2 i + 3^2 i + 4^2 i = 30i = z_i,$$
however
$$\sum_{i=1}^{4} y_{ii} \qquad (1.5)$$
because in each case you are preserving those indices that are fixed and those that vary, respectively.

Finally, quite often the lower and upper limits of sums can be replaced by letters. For example, given positive integers $k < n$, the notation
$$\sum_{i=k}^{n} x_i, \qquad (1.6)$$
stands for
$$x_k + x_{k+1} + x_{k+2} + \cdots + x_{n-1} + x_n.$$
If $x_i = 2i$, as it was initially, then (1.6) happens to be equal to $n(n+1) - (k-1)k$.
Matrix multiplication
Like the arithmetic of ordinary numbers, matrix multiplication is performed before addition and subtraction, e.g. AB + C = (AB) + C, not A(B + C), etc. (See MATH10400 Fact 1.4.)
When first encountered, this definition is perhaps best understood using
examples.
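As a concrete illustration, the sketch below (assuming NumPy; the matrices are hypothetical choices, not taken from the notes) computes a product both directly from the summation formula $(AB)_{ij} = \sum_k A_{ik}B_{kj}$ and with the built-in matrix product:

    import numpy as np

    A = np.array([[1., 2.],
                  [3., 4.]])      # hypothetical 2 x 2 matrices for illustration
    B = np.array([[5., 6.],
                  [7., 8.]])

    # (AB)_ij is the sum over k of A[i, k] * B[k, j].
    manual = np.array([[sum(A[i, k] * B[k, j] for k in range(2)) for j in range(2)]
                       for i in range(2)])
    print(manual)        # [[19. 22.], [43. 50.]]
    print(A @ B)         # NumPy's matrix product gives the same result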
[Figure: a point at distance r from the origin, making angle φ with the positive x-axis, so that it has coordinates (r cos φ, r sin φ).]

For those of you who are familiar with complex numbers, this is very much like representing the complex number z = a + ib in polar form.

If we represent this point as the 2 × 1 column vector $\begin{pmatrix} r\cos\varphi \\ r\sin\varphi \end{pmatrix}$, then multiplication on the left by $R_\theta$ gives
$$R_\theta \begin{pmatrix} r\cos\varphi \\ r\sin\varphi \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} r\cos\varphi \\ r\sin\varphi \end{pmatrix} = \begin{pmatrix} r\cos\theta\cos\varphi - r\sin\theta\sin\varphi \\ r\sin\theta\cos\varphi + r\cos\theta\sin\varphi \end{pmatrix} = \begin{pmatrix} r\cos(\theta+\varphi) \\ r\sin(\theta+\varphi) \end{pmatrix},$$

(Marginal note: By writing a point as a column vector, we are not doing anything deep. We are simply representing the point in a slightly different way, to take advantage of matrix multiplication. We return to this idea in Section 2.4.)

[Figure: the point (3, 2) rotated through θ = 5π/12 (75°); its image R_θ(3, 2) is approximately (−1.16, 3.42).]
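A rotation like the one in the figure is easy to check numerically. The following sketch (assuming NumPy; not part of the original notes) rotates the point (3, 2) through θ = 5π/12 and confirms that lengths are preserved:

    import numpy as np

    theta = 5 * np.pi / 12                    # 75 degrees
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])

    p = np.array([3.0, 2.0])
    print(R @ p)                              # approximately [-1.16  3.42]
    print(np.linalg.norm(p), np.linalg.norm(R @ p))   # both approximately 3.606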
Zero divisors
We have already seen that matrices defy the commutative law of multiplication: in general AB ≠ BA. Here is another example of how matrix multiplication violates once sacrosanct traditions.
Example 1.15. If
$$A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 0 & 6 \\ 0 & 0 \end{pmatrix},$$
then
$$AB = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} 0 & 6 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix},$$
which is the 2 × 2 zero matrix.
for i = 1, . . . , n and j = 1, . . . , m.
In practice, this means that we convert rows into columns, and vice-
versa: the ith row of A is the ith column of AT .
Example 1.20. If $A = \begin{pmatrix} 1 & 0 & 5 \\ 2 & -3 & 7 \end{pmatrix}$, then
$$A^T = \begin{pmatrix} 1 & 2 \\ 0 & -3 \\ 5 & 7 \end{pmatrix}.$$
Matrix transposes are taken before multiplication, e.g., ABT means A(BT ),
not (AB)T , etc. Notice that if you take the transpose of a matrix twice,
you get back to where you started: AT T = A.
The following type of matrix turns out to be very important, as we will
see in due course.
Fact 1.23. In the following, we assume that the sizes of the matrices
A, B and C are such that all indicated sums and products are defined.
A + B = B + A.
AB 6= BA.
(AB)C = A(BC ).
Observe that in Fact 1.23, we are treating the matrices as whole entities,
rather than as collections of numbers: besides (6), no numbers are present
Solution.
$$AI_2 = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix} = A,$$
and
$$I_2 A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix} = A.$$
Inverses of 2 × 2 matrices
Now that we have the 2 × 2 identity matrix, we can ask which 2 × 2
matrices have multiplicative inverses.
Example 1.26. Let $A = \begin{pmatrix} 2 & 1 \\ 5 & 3 \end{pmatrix}$ and $B = \begin{pmatrix} 3 & -1 \\ -5 & 2 \end{pmatrix}$. Compute AB and BA.

Solution.
$$AB = \begin{pmatrix} 2 & 1 \\ 5 & 3 \end{pmatrix}\begin{pmatrix} 3 & -1 \\ -5 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = I_2,$$
and
$$BA = \begin{pmatrix} 3 & -1 \\ -5 & 2 \end{pmatrix}\begin{pmatrix} 2 & 1 \\ 5 & 3 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = I_2.$$

(Marginal note: In the same way that 1/5 'reverses' the effect of 5, by getting us back to 1 (1/5 × 5 = 1), so B reverses the effect of A by getting us back to I_2 (AB = BA = I_2).)
It is worth pointing out that there is nothing in Definition 1.27 that stipulates that if B ∈ M_2(R) is an inverse of A ∈ M_2(R), then B is the only matrix that can perform that task. Fortunately, using a little matrix arithmetic, we can show that if A has an inverse, then said inverse must indeed be unique. This accords with our understanding of ordinary numbers: 1/5 is the only number that, when multiplied by 5, produces 1.
BA = AB = I2 and C A = AC = I2 .
It follows that
(BA)C = I2 C = C ,
and (BA)C = B(AC ) = BI2 = B, Fact 1.23 (4)
as required.
Thus we can talk about the inverse of a matrix, whenever it exists. Not every 2 × 2 matrix has an inverse: $\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$ has no inverse, but in addition, some non-zero matrices like $\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$ don't have inverses either.

(Marginal note: This is another departure from the arithmetic of numbers. Every non-zero number a has the multiplicative inverse 1/a, but this is not true of matrices!)
A ∈ M_2(R) has an inverse precisely when
$$ad - bc \neq 0,$$
and in that case the inverse equals
$$\frac{1}{ad - bc}\operatorname{adj}(A).$$
Proof. Using Fact 1.23 (6) and the observation above, we have
$$A \cdot \frac{1}{ad-bc}\operatorname{adj}(A) = \frac{1}{ad-bc}\bigl(A \cdot \operatorname{adj}(A)\bigr) = \frac{1}{ad-bc}(ad-bc)I_2 = I_2.$$
Likewise, $\frac{1}{ad-bc}\operatorname{adj}(A) \cdot A = I_2$, so $\frac{1}{ad-bc}\operatorname{adj}(A)$ satisfies Definition 1.27. Thus $A^{-1}$ exists and equals $\frac{1}{ad-bc}\operatorname{adj}(A)$.
For instance, in Example 1.26 we had $A = \begin{pmatrix} 2 & 1 \\ 5 & 3 \end{pmatrix}$, so here $ad - bc = 2 \times 3 - 1 \times 5 = 1 \neq 0$ and
$$\frac{1}{ad-bc}\operatorname{adj}(A) = \frac{1}{1}\begin{pmatrix} 3 & -1 \\ -5 & 2 \end{pmatrix} = B.$$
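The adjugate formula is straightforward to implement. The sketch below (assuming NumPy; the helper function name is my own, not from the notes) inverts the matrix A of Example 1.26:

    import numpy as np

    def inverse_2x2(A):
        # Inverse of a 2 x 2 matrix via (1/(ad - bc)) * adj(A).
        a, b = A[0]
        c, d = A[1]
        det = a * d - b * c
        if det == 0:
            raise ValueError("ad - bc = 0, so no inverse exists")
        return np.array([[d, -b], [-c, a]]) / det

    A = np.array([[2., 1.], [5., 3.]])
    print(inverse_2x2(A))        # [[ 3. -1.], [-5.  2.]], the matrix B above
    print(np.linalg.inv(A))      # NumPy's built-in inverse agrees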
Exercise: find the inverses A⁻¹ and B⁻¹ of the two given 2 × 2 matrices A and B. Verify that your solutions are correct by showing that AA⁻¹ = A⁻¹A = BB⁻¹ = B⁻¹B = I_2.

(Marginal note: … discuss them on the discussion boards. The same applies to passages in the text or solutions of examples which ask you to verify certain things.)
Determinants of 2 × 2 matrices
! ! ! !
2 b 1 a+b
A
0
A = =
1 d 1 c+d
! !
1 a
A =
0 c
x
1 2 3
Example 1.33 (Inverses of rotation matrices). Find det(Rθ ) and (Rθ )−1 .
Solution. We have
θ
x
2 4
Remarks 1.34.
for 1 6 i, j 6 n.
Recall that the main diagonal of a square matrix is the list of all the
entries running diagonally from top left to bottom right. The entries
along the main diagonal of In are all 1, while all other entries of In are 0.
This definition is a mouthful when first seen. To give you some purchase
on it, we will cover an example when n = 3.
−4 1 −1
3 1 −8
−4 1 −1
+ − +
In the positions marked ‘+’, Cij = Mij , and in the positions marked
‘−’, Cij = −Mij .
We now write down C , the matrix of cofactors of A. It differs from M
in Example 1.38 only according to the pattern of signs above:
$$C = \begin{pmatrix} +1 & -2 & +(-6) \\ -(-3) & +(-1) & -13 \\ +3 & -1 & +(-8) \end{pmatrix} = \begin{pmatrix} 1 & -2 & -6 \\ 3 & -1 & -13 \\ 3 & -1 & -8 \end{pmatrix}.$$
det(A) = A11 C11 + A12 C12 + A13 C13 = 1 · 1 + 3 · (−2) + 0 · (−6) = −5.
Remarks 1.43.
This theorem gives us n + n = 2n different ways of computing det(A). Its proof is beyond the scope of the module.

(Marginal note: You can simplify the computation of det(A) by choosing to expand along a row or column having as many zeros as possible: if A_ij = 0, then there is no need to compute C_ij, because A_ij C_ij = 0.)

We have looked at determinants, but it remains to complete the business of finding matrix inverses.

Definition 1.45 (Matrix adjugates). Let A ∈ M_n(R). The adjugate of A, written adj(A), is the transpose C^T of the matrix of cofactors of A.
Solution.
$$\operatorname{adj}(A) = C^T = \begin{pmatrix} 1 & 3 & 3 \\ -2 & -1 & -1 \\ -6 & -13 & -8 \end{pmatrix}.$$
Solution.
$$A \cdot \operatorname{adj}(A) = \begin{pmatrix} 1 & 3 & 0 \\ 2 & -2 & 1 \\ -4 & 1 & -1 \end{pmatrix}\begin{pmatrix} 1 & 3 & 3 \\ -2 & -1 & -1 \\ -6 & -13 & -8 \end{pmatrix} = \begin{pmatrix} -5 & 0 & 0 \\ 0 & -5 & 0 \\ 0 & 0 & -5 \end{pmatrix} = -5I_3 = \det(A)I_3.$$
Crumbs! That took some work. That is the first method of finding matrix
inverses given in this module. With a bit of practice, it can be applied
reasonably well to 3 × 3 matrices, but applying it to larger matrices
will, in general, become very painful. The second method, which is more
computationally efficient and scales more effectively to larger matrices,
though conceptually slightly deeper, is given in Section 3.5.
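As a quick check of the cofactor computations above, the sketch below (assuming NumPy; not part of the original notes) recomputes the determinant and verifies A · adj(A) = det(A) I:

    import numpy as np

    A = np.array([[1., 3., 0.],
                  [2., -2., 1.],
                  [-4., 1., -1.]])
    adjA = np.array([[1., 3., 3.],
                     [-2., -1., -1.],
                     [-6., -13., -8.]])

    print(np.linalg.det(A))   # approximately -5, as computed above
    print(A @ adjA)           # -5 times the 3 x 3 identity matrix
    print(np.linalg.inv(A))   # equals adjA / (-5)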
5 1 9
1 3 0 1 3
2 −2 1 2 −2
−4 1 −1 −4 1
Hence we have
Orthogonal matrices
Recall the notion of matrix transpose from Definition 1.19.
Example 1.51. Show that
$$P = \begin{pmatrix} \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{6}} & \tfrac{1}{\sqrt{3}} \\ -\tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{6}} & \tfrac{1}{\sqrt{3}} \\ 0 & -\tfrac{2}{\sqrt{6}} & \tfrac{1}{\sqrt{3}} \end{pmatrix}$$
is orthogonal.
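Orthogonality of P can also be checked numerically: P is orthogonal exactly when PᵀP = I. A minimal sketch (assuming NumPy; not part of the original notes):

    import numpy as np

    s2, s3, s6 = np.sqrt(2), np.sqrt(3), np.sqrt(6)
    P = np.array([[ 1/s2,  1/s6, 1/s3],
                  [-1/s2,  1/s6, 1/s3],
                  [  0.0, -2/s6, 1/s3]])

    print(np.allclose(P.T @ P, np.eye(3)))   # True: the columns of P are orthonormal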
1. det(AT ) = det(A).
The proof of this proposition has been banished to the forbidden zone
that is Appendix C.1.
Corollary 1.55.
Proof.
The following result summarises the process for finding inverses of square
matrices presented above.
Proposition 1.57.
Proof.
1. We have
2. Note that
and likewise
Vector Geometry 1
[Figure: a point P in the plane and its Cartesian coordinates (x1, x2). Image source: University of British Columbia.]
In this way, we see that R^2 is an algebraic model of the Euclidean plane and R^3 is an algebraic model of Euclidean space. The ancient Greeks did not do geometry in higher dimensions, but by using cartesian coordinates, we can easily study geometry in dimensions n > 3, using algebra, and R^n as an algebraic model, even though visualization becomes essentially impossible.

(Marginal note: Have you ever wondered why we live in a 3-dimensional world and not, say, a 5-dimensional one? Certainly, being restricted to 2 dimensions would cause digestion problems, as illustrated.)

$$\mathbb{R}^n = \{(x_1, \ldots, x_n) : x_1, \ldots, x_n \in \mathbb{R}\}.$$
Example 2.2. Euclidean 3-space is of course a mathematical model for the space we live in, whereas 4-space is the model for 'space-time' R^3 × R: three spatial dimensions and one temporal dimension, used for instance in Einstein's theory of relativity.

(Marginal note: Note that time is not the fourth dimension: R^4 is simply a way of representing space-time.)
(2,3,0)
O
2
Any problem which has n variables in it naturally yields points in R^n. The visualization and modeling of 'high-dimensional' data sets is a topic of enormous importance in contemporary science, engineering and industry. In multivariate statistics, data having n different measurable characteristics x_1, ..., x_n is represented as 'data points' (x_1, ..., x_n) in R^n – each dimension corresponds to a different characteristic of the data. Then the power of linear algebra can be brought to bear to analyse this data – see, for example, Appendix B.

(Marginal note: Googling 'high dimensional data' returns about 361 million search results.)
Vectors

Now the discussion moves from points in n-space to vectors. Given the right context, many different objects can be called vectors. In this module we will focus on just a few.

(Marginal note: Out there in the mathematical badlands, almost anything (e.g. functions in calculus) can be regarded as a vector in the right context. For more information about this, look up the term 'vector space'.)

Definition 2.3. In the correct context, a number of objects can be vectors.

1. Points (x_1, ..., x_n) ∈ R^n.
The important thing to note is not the positions of P and Q as such, but
the position of the terminal point Q, relative to the initial point P.
Definition 2.4. Line segments having the same length and the same
direction are said to be equivalent, and represent the same vector.
Solution.
B = (−3,2)
C A = (2,1)
x
O
(y1 − x1 , y2 − x2 , . . . , yn − xn ).
sponding vectors using
algebra.
OP y
x
Of course, the entries of the vector OP equal the coordinates of the point
P. When we treat points as vectors and talk about the ‘length of P’, then
we really mean the length of OP.
y
(2,2)
x
x+y
x
O
The zero vectors in R^2 and R^3 are (0, 0) and (0, 0, 0), respectively, and so on. They correspond to the origins of these spaces. Just like the zero matrices of Definition 1.9, adding a zero vector to a vector of the same size leaves that vector unchanged.

Example 2.11. Let x = (2, 1). What are 3x, −2x, −x and (1/2)x?
[Figure: the vectors x, 3x and −2x for x = (2, 1).]

(Marginal note: If x is non-zero and c > 0, then cx and x share the same direction. If c < 0 then cx points in the direction opposite to that of x.)
Fact 2.12. For all vectors x, y and z of the same size, and scalars k and c, we have

1. (x + y) + z = x + (y + z) (associativity)
2. x + y = y + x (commutativity)

(Marginal note: Compare this with Fact 1.23 (1), (2) and (part of) (6). They are the same laws!)
Example 2.13. If x = (4, 3), then $\|x\| = \sqrt{4^2 + 3^2} = 5$.

[Figure: the vector x = (4, 3), drawn as the hypotenuse of a right-angled triangle with sides 4 and 3.]
If x = (x_1, x_2, x_3), then
$$\|x\| = \sqrt{\|(x_1, x_2, 0)\|^2 + x_3^2} = \sqrt{x_1^2 + x_2^2 + x_3^2}.$$

$$\hat{x} = \frac{1}{\|x\|}\,x.$$
Then
$$\|\hat{x}\| = \left\|\frac{1}{\|x\|}x\right\| = \frac{1}{\|x\|}\|x\| = 1,$$
i.e. x̂ is the unit vector having the same direction as x.
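Lengths and unit vectors take one line each with a numerical library. A minimal sketch (assuming NumPy; not part of the original notes), reproducing Example 2.13:

    import numpy as np

    x = np.array([4.0, 3.0])
    length = np.linalg.norm(x)        # sqrt(4**2 + 3**2) = 5.0
    x_hat = x / length                # unit vector in the same direction as x
    print(length, x_hat)              # 5.0 [0.8 0.6]
    print(np.linalg.norm(x_hat))      # 1.0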
$$(7, 0) = x + y + z = (3, 0) + (4\cos\tfrac{\pi}{4}, -4\sin\tfrac{\pi}{4}) + (d\cos\theta, d\sin\theta) = (3 + 4\cos\tfrac{\pi}{4} + d\cos\theta,\; -4\sin\tfrac{\pi}{4} + d\sin\theta) \approx (5.828 + d\cos\theta,\; -2.828 + d\sin\theta).$$

We divide to eliminate d: tan θ ≈ 2.413, so θ ≈ 1.178. Finally, d ≈ 1.172/cos θ ≈ 3.06 km.

(Marginal note: With a bit more work, we can show that θ is exactly 3π/8, or 67.5°.)

[Figure: the legs x, y and z of the journey, with the final leg z making angle θ with the x-axis.]
2.3 The scalar product
Definition 2.19. Let x = (x_1, ..., x_n) and y = (y_1, ..., y_n) be vectors in R^n. The scalar product of x and y is
$$x \cdot y = x_1 y_1 + \cdots + x_n y_n = \sum_{i=1}^{n} x_i y_i.$$
Fact 2.20. The following hold for all vectors x, y and z in R^n, and scalars c ∈ R.

1. x · y = y · x.
2. x · x = ‖x‖².
3. x · (y + z) = x · y + x · z and (x + y) · z = x · z + y · z.

(Marginal note: These facts are extremely useful! You will gain great powers if you can master them.)
1. We have y · x = y1 x1 + · · · + yn xn = x1 y1 + · · · + xn yn = x · y.
3. Since
y + z = (y1 + z1 , . . . , yn + zn ),
it follows that
x · (y + z) = x1 (y1 + z1 ) + · · · + xn (yn + zn )
= (x1 y1 + · · · + xn yn ) + (x1 z1 + · · · + xn zn )
= x · y + x · z.
There are several established proofs of this result. The one given below
uses an intermediate result that is hugely significant in its own right.
$$\|u + v\|^2 = \|u\|^2 + \|v\|^2.$$

$$\|u + v\|^2 = (u + v) \cdot (u + v) \qquad \text{(Fact 2.20 (2))}$$

as u · v = 0. Finally,
$$\|u\| \leq \sqrt{\|u\|^2 + \|v\|^2} = \|u + v\|,$$
The geometric import of Theorem 2.22 is revealed later in Remark 2.27.

(Marginal note: Throughout these two proofs, we treat u, v, x and y as whole entities and rely largely on Fact 2.20 to manipulate them. We do not (and should not) concern ourselves with the entries u1, u2, ... of the vectors – doing so could easily get us into a big pig's breakfast, and we want to minimise the number of such breakfasts. In short, life is made easier by Fact 2.20!)

Proof of Theorem 2.21. Set
$$u = (x \cdot y)x \quad\text{and}\quad v = \|x\|^2 y - u = \|x\|^2 y - (x \cdot y)x.$$
As $\|cw\| = |c|\,\|w\|$ for any scalar c and vector w (see remarks after Definition 2.14), we have $\|u\| = |x \cdot y|\,\|x\|$. Moreover, $u + v = \|x\|^2 y$, so $\|u + v\| = \|x\|^2\,\|y\|$. Next, we show that u · v = 0. Indeed, we have
$$\begin{aligned}
u \cdot v &= u \cdot (\|x\|^2 y) + u \cdot (-u) && \text{Fact 2.20 (3)}\\
&= \|x\|^2 (u \cdot y) - \|u\|^2 && \text{Fact 2.20 (2), (4)}\\
&= \|x\|^2 \bigl(((x \cdot y)x) \cdot y\bigr) - (|x \cdot y|\,\|x\|)^2 && \text{Fact 2.20 (4)}\\
&= \|x\|^2 (x \cdot y)^2 - (x \cdot y)^2 \|x\|^2 = 0.
\end{aligned}$$
Given any real number r in the range −1 ≤ r ≤ 1, there is a unique number θ in the range 0 ≤ θ ≤ π, such that
$$\cos\theta = r.$$

(Marginal note: The proof of this fact belongs to a calculus course. It is outside the scope of MATH10400.)

[Figure: the graph of cos θ for 0 ≤ θ ≤ π; each value r in [−1, 1] is attained at exactly one θ.]
It is important to check that this definition of angle does indeed fit with
our geometric intuition. We do this in R2 , for which we require the cosine
rule:
$$c^2 = a^2 + b^2 - 2ab\cos\theta,$$
where a, b and c are the side lengths of the triangle as below, and θ is the angle as below.

[Figure: a triangle with vertex O, sides a and b meeting at O with angle θ between them, and opposite side c.]
Now
$$\|z\|^2 = (x_1 - y_1)^2 + (x_2 - y_2)^2 = x_1^2 - 2x_1 y_1 + y_1^2 + x_2^2 - 2x_2 y_2 + y_2^2,$$
and
$$\|x\|^2 + \|y\|^2 - 2\|x\|\,\|y\|\cos\theta = (x_1^2 + x_2^2) + (y_1^2 + y_2^2) - 2\|x\|\,\|y\|\cos\theta,$$
so by cancelling the terms $x_1^2$, $y_1^2$, $x_2^2$ and $y_2^2$ from both sides, we obtain
$$-2x_1 y_1 - 2x_2 y_2 = -2\|x\|\,\|y\|\cos\theta \;\Rightarrow\; x \cdot y = x_1 y_1 + x_2 y_2 = \|x\|\,\|y\|\cos\theta \;\Rightarrow\; \frac{x \cdot y}{\|x\|\,\|y\|} = \cos\theta.$$
This result matches perfectly with Definition 2.24!
B
x
y
x
−1 6
u+v 2 = u 2+ v 2,
u+v
v
u
x = .. ,
.
xn
to take advantage of matrix multiplication. With this in mind, the matrix
product Ax is a m × 1 column vector which, in turn, can be regarded as a
point or vector in Rm . In this way, we can use the matrix A to transform
points or vectors in Rn to points or vectors in Rm .
Quite often, we concentrate on the case where m = n, i.e., when A ∈
Mn (R) and x and Ax belong to the same space Rn . The rotation of points
in 2 dimensions is one such example. There are 3×3 matrices that rotate
points in 3 dimensions, and others that perform many more geometric
actions besides.
The next result ties together matrix transposes and scalar products and
has many applications.
(Px) · (Py) = x · y.
In particular, ‖Px‖ = ‖x‖.
This makes complete sense when you consider rotations. When you
rotate an object you don’t change the length of anything, nor do you
change any angles between anything in the object (whereas stretching,
twisting or otherwise deforming the object may alter such things).
Proposition 2.29 is used in Sections 4.2 and 6.1, and Appendix B.
In Figure 2.15 below, L and y are represented by the dotted and dashed
lines, respectively, and the line from x to y is highlighted in red. As you
can see, the red line is orthogonal to L.
How do we determine y = projL (x)? The vector y has to lie on the line L,
which means that it must be a scalar multiple of v. So let’s set y = cv,
where c is to be determined. We need the red line from x to y to be
orthogonal to L. The red line is parallel to the vector x − y, and v is
parallel to L. If we take the scalar product of x − y with v, we obtain
(x − y) · v = x · v − y · v = x · v − c(v · v) = x · v − c,
y = projL (x)
0 θ
v
From the point of view of trigonometry, this makes perfect sense. Let θ be the angle between v and x. Given the right-angled triangle in the picture above, the length of y should equal the length of x, multiplied by |cos θ|, i.e. ‖y‖ = ‖x‖ |cos θ|. From Definition 2.24, we know that x · v = ‖x‖ ‖v‖ cos θ = ‖x‖ cos θ, thus ‖y‖ should equal |x · v|, and indeed it does, again because ‖v‖ = 1.

(Marginal note: In the figure above, θ is acute and cos θ is positive. However, θ may be obtuse, so we require the absolute value |cos θ|, rather than cos θ.)
Solution.
1. We have $\operatorname{proj}_L(x) = (x \cdot v)v = \tfrac{19}{2}v = \tfrac{19}{4}(1, -1, 1, -1)$.

(Marginal note: In both cases you can and should verify that (x − proj_L(x)) · v = 0.)

2. Here, v is not a unit vector, so the formula above does not apply. In these cases, observe that L is also parallel to v̂, where v̂ is the unit vector parallel to v defined after Definition 2.15. Thus
$$\operatorname{proj}_L(x) = (x \cdot \hat{v})\hat{v} = \frac{(x \cdot v)\,v}{\|v\|^2}.$$
(Marginal note: This formula for proj_L(x) works for any non-zero vector v.)
In this example we have
$$\operatorname{proj}_L(x) = \frac{x \cdot v}{\|v\|^2}\,v = \frac{25}{14}\,v = \frac{25}{14}(3, 2, 1).$$
The exercise above shows that y is the vector on the line L that is closest
to x.
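The projection formula proj_L(x) = ((x · v)/(v · v)) v is easy to use in code. A small sketch (assuming NumPy; the vector x below is a hypothetical choice, picked so that x · v = 25 to match the fraction 25/14 above):

    import numpy as np

    def proj_onto_line(x, v):
        # Orthogonal projection of x onto the line spanned by a non-zero vector v.
        return (np.dot(x, v) / np.dot(v, v)) * v

    v = np.array([3.0, 2.0, 1.0])
    x = np.array([5.0, 4.0, 2.0])          # hypothetical: x . v = 25
    y = proj_onto_line(x, v)
    print(y)                                # (25/14) * (3, 2, 1)
    print(np.dot(x - y, v))                 # approximately 0: x - y is orthogonal to v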
Chapter 3

Systems of Linear Equations
2 ( 21 ,2)
x
−2 2 4
Personal computers
It is sometimes necessary to find solutions that solve several linear equa-
tions simultaneously. For example, weather prediction models often re-
quire simultaneous solutions of hundreds of thousands of such equations.
(Marginal note: Simple examples are required when introducing a topic. However, simple examples often look contrived. Please accept my apologies in advance.)

Example 3.4. We consider a straightforward example about making computers. Suppose that a factory makes two models of personal computer: one for standard use, that takes 2 hours to build, and a second, more powerful model for gaming purposes, that takes 3 hours to build. Imagine that the factory can produce a maximum of 300 computers per week, and has at its disposal a total of 800 hours of labour per week. How many computers of each type should be built, in order to maximise capacity and time?
$$x + y = 300, \qquad 2x + 3y = 800. \qquad (3.1)$$
$$x + \tfrac{3}{2}y = 400. \qquad (3.2)$$
If we then subtract the first equation in (3.1) above from (3.2), we get
$$\tfrac{1}{2}y = 100,$$
$$\left(\begin{array}{ccc|c} 1 & 2 & -1 & 5 \\ 3 & 1 & -2 & 9 \\ -1 & 4 & 2 & 0 \end{array}\right) \qquad \begin{matrix}\text{eqn 1}\\ \text{eqn 2}\\ \text{eqn 3}\end{matrix}$$

(Marginal note: … applies if our linear system has a unique solution, and this is not true in all cases.)
Such operations change the system of equations, but preserve the set of
simultaneous solutions of the system.
The operations correspond to the following operations on the augmented
matrix:
3. swap two rows in the matrix (this only amounts to writing down the
equations of the system in a different order).
Step 2:
      R2:       3  1  -2   9
  − 3R1:        3  6  -3  15
  new R2:       0 -5   1  -6

Step 3:
      R2:       0 -5   1  -6
  + R3:         0  6   1   5
  new R2:       0  1   2  -1

Step 4:
      R3:       0  6   1   5
  − 6R2:        0  6  12  -6
  new R3:       0  0 -11  11

Step 5:
      R3:       0  0 -11  11
  −(1/11)R3:    0  0   1  -1
We have produced a new, simpler, system of equations:

x1 + 2x2 − x3 = 5   (A)
x2 + 2x3 = −1   (B)
x3 = −1   (C),

(Marginal note: The point of using EROs is that the system of equations gets simpler, but the solutions of the system are preserved. So the solutions of the final system are the same as the solutions of the original system.)
4. Any zero rows (rows consisting entirely of 0s) are grouped to-
gether at the bottom of the matrix.
Gauss-Jordan elimination
Theorems 3.12 and 3.15 lie at the heart of the new strategy for solving
linear systems. The proof of the first is algorithmic, as it describes a
method that can be implemented. We include the proof as it is of con-
siderable practical use when solving linear systems. There is a video to
accompany the proof, in which we apply the algorithm in the proof to
Example 3.13.
Proof. If A is the zero m × n matrix then there is nothing to do, as A is already in row-echelon form. If A is non-zero, look in A for the first column (from the left) having a non-zero entry. This column is called the pivot column, and the first non-zero entry in the pivot column is called the pivot. Suppose the pivot column is column j and the pivot occurs in row i. Now interchange, if necessary, rows 1 and i and call the resulting matrix B:
$$R_1 \leftrightarrow R_i: \quad A \to B.$$
Thus the pivot B_1j is non-zero. Now perform the ERO
$$R_1 \to \tfrac{1}{B_{1j}} \times R_1: \quad B \to C.$$
Note that C_1j = 1. Now whenever C_kj, 2 ≤ k ≤ m, is non-zero, perform the ERO
$$R_k \to R_k - C_{kj} \times R_1$$

(Marginal note: This algorithm generates a sequence of EROs that is guaranteed to produce a row-echelon matrix. However, this sequence of EROs is not the only one that does this! In practice, in specific cases it can be better to use a different sequence of EROs – see marginal notes accompanying Example 3.17. Also be aware that, after an alternate reduction, you may get a REF that is different from the one that the theorem provides. Such REFs are not unique!)
x1 + 2x2 − x3 = 5
3x1 + x2 − 2x3 = 9
−x1 + 4x2 + 2x3 = 0
Solution. See accompanying video. On the video, we begin at step 5, which is the final step on the video accompanying Example 3.5. The RREF in this case is
$$\left(\begin{array}{ccc|c} 1 & 0 & 0 & 2 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & -1 \end{array}\right),$$
which corresponds to the linear system

x1 = 2
x2 = 1

(Marginal note: In the video accompanying Example 3.5, we did not follow the algorithm in the proof of Theorem 3.12 exactly. In step 2, we performed R2 → R2 + R3, instead of R2 → −(1/5)R2 (the ERO suggested by the algorithm). Both EROs turn the pivot in column 2 into a 1, but the first ERO avoids fractions (which can increase errors in arithmetic).)
Using EROs in this way preserves the set of solutions of the system.
By reducing to the RREF, we have passed from a system that is hard to
solve to another that is trivial to solve. Moreover, the set of solutions
has been preserved, so the solution to the final system is the same as
the solution to the original system (and of every intermediate one).
This method of solving linear systems by reducing the augmented matrix
to reduced row-echelon form is called Gauss–Jordan elimination.
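Row reduction of this kind is routinely automated. The sketch below (assuming the Python library SymPy, which is not otherwise used in these notes) reduces the augmented matrix of the system in Example 3.5 to reduced row-echelon form:

    from sympy import Matrix

    aug = Matrix([[1, 2, -1, 5],
                  [3, 1, -2, 9],
                  [-1, 4, 2, 0]])

    rref_matrix, pivot_columns = aug.rref()
    print(rref_matrix)        # last column reads off x1 = 2, x2 = 1, x3 = -1
    print(pivot_columns)      # (0, 1, 2): every variable column contains a leading 1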
x1 + 2x2 = −1/4
x3 = −13/4
x4 = 5/4.
The RREF involves 3 leading 1s, one in each of the columns corre-
sponding to the variables x1 , x3 and x4 . The column corresponding to
x2 contains no leading 1. We distinguish between these cases.
A variable whose associated column in the RREF contains a leading
1 is called a leading variable. A variable whose column in the RREF
does not contain a leading 1 is called a free variable. In this case
x1 , x3 and x4 are leading variables and x2 is free.
To each free variable we associate a parameter. Here, we let x2 = t.
Then we can express the remaining leading variables in terms of these
parameters to get a parametric solution:

x1 = −1/4 − 2t,  x2 = t,  x3 = −13/4,  x4 = 5/4.

(Marginal note: Here, x3 and x4 don't depend on t, but there are situations where all the variables depend on parameters. Dependency on parameters varies from case to case.)

This parametric solution describes all the (infinitely many) possible solutions of the system.
Inconsistent systems
We have dealt with systems having infinitely many solutions, by virtue
of parameters. It is also possible for a given system to have no solutions
at all.
x1 + 2x2 + 3x3 = 5
2x1 + 5x2 + 7x3 = 13
3x2 + 3x3 = 10
If a system is inconsistent, a REF obtained from its augmented matrix will include a row of the form 0 0 0 … 0 1, i.e. it will have a leading 1 in its rightmost column. Such a row corresponds to an equation of the form 0x1 + 0x2 + · · · + 0xn = 1, which has no solution.

(Marginal note: An inconsistent system will always betray itself in this way – this is how we spot them!)
by a_ij, and the right hand side is b_i. Then our system can be written in matrix form. Consider the m × n matrix A whose (i, j)-entry is a_ij (so A_ij = a_ij), and the n × 1 and m × 1 column vectors
$$x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} \quad\text{and}\quad b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix}.$$
The matrix product Ax encapsulates the left hand side of our whole system, and thus our system of m equations can be rewritten as the single matrix equation
$$Ax = b.$$
x1 + 2x2 − x3 = 5
3x1 + x2 − 2x3 = 9
−x1 + 4x2 + 2x3 = 0.

$$\begin{pmatrix} 1 & 2 & -1 \\ 3 & 1 & -2 \\ -1 & 4 & 2 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 5 \\ 9 \\ 0 \end{pmatrix}.$$
Theorem 3.23 gives a full description of the solutions of the matrix equa-
tion Ax = b, when m = n.
Theorem 3.23. Let A ∈ Mn (R) and let x and b be n×1 column vectors.
The first part of this theorem was proved above, just before Example 3.22.
The proof of part 2 is beyond the scope of the module.
If A is not invertible, then the question of whether or not the equation
Ax = b has any solutions depends on the specific nature of A and b. This
problem must be approached on a case by case basis.
Example. Find, if it exists, the inverse of
$$A = \begin{pmatrix} 1 & 3 & 1 \\ 2 & 0 & -1 \\ 1 & 4 & 2 \end{pmatrix}.$$

Solution. The inverse A⁻¹ (if it exists) can be found using EROs as follows.

$$\left(\begin{array}{ccc|ccc} 1 & 3 & 1 & 1 & 0 & 0 \\ 2 & 0 & -1 & 0 & 1 & 0 \\ 1 & 4 & 2 & 0 & 0 & 1 \end{array}\right)$$

R2 → R2 − 2R1 and R3 → R3 − R1:
$$\left(\begin{array}{ccc|ccc} 1 & 3 & 1 & 1 & 0 & 0 \\ 0 & -6 & -3 & -2 & 1 & 0 \\ 0 & 1 & 1 & -1 & 0 & 1 \end{array}\right)$$

R3 ↔ R2:
$$\left(\begin{array}{ccc|ccc} 1 & 3 & 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & -1 & 0 & 1 \\ 0 & -6 & -3 & -2 & 1 & 0 \end{array}\right)$$

R1 → R1 − 3R2 and R3 → R3 + 6R2:
$$\left(\begin{array}{ccc|ccc} 1 & 0 & -2 & 4 & 0 & -3 \\ 0 & 1 & 1 & -1 & 0 & 1 \\ 0 & 0 & 3 & -8 & 1 & 6 \end{array}\right)$$

R3 → (1/3)R3:
$$\left(\begin{array}{ccc|ccc} 1 & 0 & -2 & 4 & 0 & -3 \\ 0 & 1 & 1 & -1 & 0 & 1 \\ 0 & 0 & 1 & -\frac{8}{3} & \frac{1}{3} & 2 \end{array}\right)$$

R1 → R1 + 2R3 and R2 → R2 − R3:
$$\left(\begin{array}{ccc|ccc} 1 & 0 & 0 & -\frac{4}{3} & \frac{2}{3} & 1 \\ 0 & 1 & 0 & \frac{5}{3} & -\frac{1}{3} & -1 \\ 0 & 0 & 1 & -\frac{8}{3} & \frac{1}{3} & 2 \end{array}\right)$$

Hence
$$A^{-1} = \frac{1}{3}\begin{pmatrix} -4 & 2 & 3 \\ 5 & -1 & -3 \\ -8 & 1 & 6 \end{pmatrix}.$$
The proof of Theorem 3.25 is again beyond the scope of the module.
Example 3.26. Does $A = \begin{pmatrix} 1 & 2 & -1 \\ 0 & 1 & 1 \\ 1 & 4 & 1 \end{pmatrix}$ have an inverse?

Solution. 1. Form the 3 × 6 matrix $B = \left(\begin{array}{ccc|ccc} 1 & 2 & -1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 1 & 0 \\ 1 & 4 & 1 & 0 & 0 & 1 \end{array}\right)$.
Matrix rank
Finally, we take a very brief look at the rank of a matrix.
While a matrix can have more than one REF in general, the number of
non-zero rows of any such REF will always be the same (we won’t prove
this fact).
The notion of matrix rank can give us information about the number of
solutions and the number of parameters of solutions of systems of linear
equations.
Imagine that we have a system of equations in n variables. The system is consistent, i.e. has a solution, if and only if the rank of the coefficient matrix A of the system equals the rank of the augmented matrix. If this is the case then we have more. We always have rank(A) ≤ n. Moreover, if rank(A) = n then the system has a unique solution. Otherwise, the system has infinitely many solutions, and the parametric solution will involve n − rank(A) ≥ 1 parameters.
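Rank can be computed directly with a numerical library. A small sketch (assuming NumPy; the second matrix is a hypothetical rank-deficient example):

    import numpy as np

    A = np.array([[1., 2., -1.],
                  [3., 1., -2.],
                  [-1., 4., 2.]])      # coefficient matrix of the system in Example 3.5
    print(np.linalg.matrix_rank(A))    # 3 = n, so that system has a unique solution

    B = np.array([[1., 2.],
                  [2., 4.]])           # hypothetical: second row is twice the first
    print(np.linalg.matrix_rank(B))    # 1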
Chapter 4
Vector Geometry 2
2. v_i · v_j = 0 whenever 1 ≤ i, j ≤ k and i ≠ j.
A list of vectors is orthonormal if each vector has unit length, and if each
vector is orthogonal to every other vector in the list. Below are the
archetypal examples of orthonormal lists of vectors.
Example 4.3.
You should verify that the next examples yield orthonormal lists of vectors.
Example 4.4.
Remarks 4.5. Notice that if v has unit length, then the 1-element list
v is orthonormal. Criterion (1) of Definition 4.2 is evidently fulfilled.
Criterion (2) is also fulfilled, because it can only fail if we can find
two vectors in the list that are not orthogonal to each other!
Let us consider further the list e1 , . . . , en in Rn from Example 4.3 (3). Take
another vector x = (x1 , . . . , xn ) in Rn . The ith entry of x is the number xi .
Observe that
x · ei = (x1 , . . . , xi−1 , xi , xi+1 , . . . , xn ) · (0, . . . , 0, 1, 0, . . . , 0) = xi .
Thus, the entries x_i of x are equal to the scalar products x · e_i, 1 ≤ i ≤ n.
Observe further that
x1 e1 + · · · + xn en = x1 (1, 0, . . . , 0) + · · · + xn (0, . . . , 0, 1)
= (x1 , 0, . . . , 0) + · · · + (0, . . . , 0, xn )
= (x1 , . . . , xn ) = x.
This equality holds for every vector x in Rn . It turns out that (4.1) is
part of a wider pattern exhibited by certain special lists of orthonormal
vectors.
v1 , . . . , vn ,
We will examine the consequences of Theorem 4.6 one by one. Part (1)
of the theorem imposes a limit on the length of lists of on vectors in Rn .
For example, it is not possible to have a list of 4 on vectors in R3 .
Theorem 4.6 part (2) requires more attention. First, notice that (4.1) is a
special case of (4.2), applied to the vectors e1 , . . . , en . Next, consider the
following definitions, which are prompted by this part of the theorem.
Let’s see how Theorem 4.6 part (2), (4.2), and the ideas in Definition 4.7
apply in some examples.
Example 4.8.

and
$$x \cdot v_2 = (2, 4) \cdot \tfrac{1}{\sqrt{2}}(-1, 1) = \tfrac{2}{\sqrt{2}} = \sqrt{2},$$
with respect to v1, v2. Moreover, we can see that
$$3\sqrt{2}\,v_1 + \sqrt{2}\,v_2 = 3\sqrt{2}\cdot\tfrac{1}{\sqrt{2}}(1, 1) + \sqrt{2}\cdot\tfrac{1}{\sqrt{2}}(-1, 1) = (3, 3) + (-1, 1) = (2, 4) = x.$$

(Marginal note: There is a video to accompany this example.)

[Figure: x = (2, 4) expressed as the sum of 3√2 v1 = (3, 3) and √2 v2 = (−1, 1).]
than the usual ones. We consider this in the figure below, where x, v1
and v2 are as in Example 4.8 (1).
[Figure: the vector x together with its coordinates relative to v1 and v2: x · v1 = 3√2 and x · v2 = √2.]

(Marginal note: See the accompanying short video.)
respect to the new system are given by x·w1 , . . . , x·wn . Using Proposition
2.28, notice that
x · wi = x · (Pei ) = (P T x) · ei ,
and this is the ith entry of the vector P T x, because the ith entry of ei
equals 1 and all other entries of ei are zero. Hence the coordinates of
x with respect to the new system are precisely the entries of the vector
P T x. Let’s see this in action.
Example 4.11. Recall Examples 4.10 and 4.8 (2). Show that the co-
ordinates of the vector x = (3, 4, 5) with respect to w1 , w2 , w3 are the
entries in P T x.
Solution. From Example 4.8 (2), we know that the new coordinates are −1/√2, −3/√6 and 12/√3. Computation of P^T x yields
$$P^T x = \begin{pmatrix} \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}} & 0 \\ \tfrac{1}{\sqrt{6}} & \tfrac{1}{\sqrt{6}} & -\tfrac{2}{\sqrt{6}} \\ \tfrac{1}{\sqrt{3}} & \tfrac{1}{\sqrt{3}} & \tfrac{1}{\sqrt{3}} \end{pmatrix}\begin{pmatrix} 3 \\ 4 \\ 5 \end{pmatrix} = \begin{pmatrix} -\tfrac{1}{\sqrt{2}} \\ -\tfrac{3}{\sqrt{6}} \\ \tfrac{12}{\sqrt{3}} \end{pmatrix},$$
as required.
z = a1 v1 + · · · + ak vk .
Chapter 5

Eigenvalues and Eigenvectors of Matrices
and
$$A\begin{pmatrix} a \\ -a \end{pmatrix} = \begin{pmatrix} -a \\ a \end{pmatrix} = (-1)\cdot\begin{pmatrix} a \\ -a \end{pmatrix}$$
for any a ∈ R.

[Figure: the vector (2, −2) is mapped by A to (−2, 2) = (−1)·(2, −2).]
Ax = λx
number c, we have
Warning 5.5. In this module, we have dealt exclusively with real num-
bers. Our matrices and vectors have been composed entirely of such
numbers. It turns out that some matrices, consisting entirely of real
numbers, can have eigenvalues that are complex numbers, that is,
numbers of the form a + ib, where a, b ∈ R and i is an imaginary
number satisfying i2 = −1. Moreover, in such cases, the correspond-
ing eigenvectors consist of complex numbers in general.
For example, it turns out that the rotation matrix Rθ from Example
1.2 (2) has eigenvalues cos θ ± i sin θ, which are not real numbers in
general (they are real only if θ is an integer multiple of π).
Hereafter, all the eigenvalues of the matrices considered in examples
will be real!
Seeking eigenvalues
Given a general matrix A ∈ Mn (R), how can we find its eigenvalues and
eigenvectors? The next theorem tells us how to find both. In practice, we
usually find the eigenvalues before finding the corresponding eigenvec-
tors.
0
where 0 denotes the n × 1 zero column vector. Hence,
0 = Ax − λx = Ax − λIn x by Theorem 1.36
$$= (A - \lambda I_n)\,x,$$
where A − λI_n is an n × n matrix.
By Theorem 3.23, this matrix equation has a unique solution if A−λIn has
an inverse, and either no solutions or infinitely many solutions otherwise.
Notice that x = 0 is always a solution of this equation. Since we are
looking for non-zero solutions, we require A − λIn to be singular, i.e. not
invertible (if A − λIn were invertible, then the zero vector 0 would be the
only solution). If A − λIn is singular, then since 0 is always a solution,
we will have infinitely many solutions. In particular, we will have a non-
zero solution. Now A − λIn is singular if and only if det(A − λIn ) = 0, by
Theorem 1.56.
Solution. We have
$$A - \lambda I_2 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} - \begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix} = \begin{pmatrix} -\lambda & 1 \\ 1 & -\lambda \end{pmatrix}.$$
Hence
$$\det(A - \lambda I_2) = \det\begin{pmatrix} -\lambda & 1 \\ 1 & -\lambda \end{pmatrix} = (-\lambda)(-\lambda) - 1 \times 1 = \lambda^2 - 1.$$
$$\det(A - \lambda I_n) = 0$$

$$A = \begin{pmatrix} 5 & 6 & 2 \\ 0 & -1 & -8 \\ 1 & 0 & -2 \end{pmatrix}.$$
$$\begin{aligned}
0 = \det(A - \lambda I_3)
&= \det\begin{pmatrix} 5-\lambda & 6 & 2 \\ 0 & -1-\lambda & -8 \\ 1 & 0 & -2-\lambda \end{pmatrix} \\
&= (5-\lambda)\det\begin{pmatrix} -1-\lambda & -8 \\ 0 & -2-\lambda \end{pmatrix} + \det\begin{pmatrix} 6 & 2 \\ -1-\lambda & -8 \end{pmatrix} \\
&= (5-\lambda)(1+\lambda)(2+\lambda) + \bigl(-48 + 2(1+\lambda)\bigr) \\
&= (5-\lambda)(\lambda^2 + 3\lambda + 2) + 2\lambda - 46 \\
&= -\lambda^3 + 2\lambda^2 + 15\lambda - 36.
\end{aligned}$$
±1, ±2, ±3, ±4, ±6, ±9, ±12, ±18, and ±36.

p(1) = −1 + 2 + 15 − 36 ≠ 0
p(−1) = 1 + 2 − 15 − 36 ≠ 0
p(2) = −8 + 8 + 30 − 36 ≠ 0
p(−2) = 8 + 8 − 30 − 36 ≠ 0
p(3) = −27 + 18 + 45 − 36 = 0.
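In practice, eigenvalues are usually found numerically rather than by factorising the characteristic polynomial by hand. A sketch (assuming NumPy; not part of the original notes) for the matrix A above:

    import numpy as np

    A = np.array([[5., 6., 2.],
                  [0., -1., -8.],
                  [1., 0., -2.]])

    print(np.poly(A))              # [1. -2. -15. 36.]: coefficients of det(lambda*I - A)
    print(np.linalg.eigvals(A))    # eigenvalues 3 (twice) and -4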
$$0 = \det(A - \lambda I_3) = \det\begin{pmatrix} 8-\lambda & -3 & -3 \\ -3 & 8-\lambda & -3 \\ -3 & -3 & 8-\lambda \end{pmatrix}
= (8-\lambda)\bigl((8-\lambda)^2 - 9\bigr) + 3\bigl(-3(8-\lambda) - 9\bigr) - 3\bigl(9 + 3(8-\lambda)\bigr)$$

(Marginal note: Sometimes, as in this example, there are 'smarter' ways of evaluating such determinants.)

λ1 = λ2 = 11 and λ3 = 2,
5.3 Eigenvectors
We know now how to find eigenvalues. How do we find the correspond-
ing eigenvectors? The matrix equation (A − λIn )x = 0 in the proof of
Theorem 5.6 may be regarded as a system of linear equations in which
the coefficient matrix is A − λIn and the variables are the n entries of the
column vector x, which we can denote by x1 , . . . , xn – see Section 3.5.
$$(A - (-4)I_3)x = (A + 4I_3)x = 0.$$
In other words,
$$\begin{pmatrix} 9 & 6 & 2 \\ 0 & 3 & -8 \\ 1 & 0 & 2 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.$$
(Marginal note: This fact has a practical benefit when finding eigenvectors: if your linear system does not have a parametric solution, then you know that you have done something wrong! When finding eigenvectors, the number of parameters, i.e. the number of free variables, equals the number of zero rows in your RREF. Thus there must be at least one zero row in your RREF – if not then you have made a mistake.)

We will use EROs to solve this system of linear equations. We know that the determinant of the matrix on the left is 0, so we will get either no solutions or a parametric solution, yielding infinitely many solutions. However, since the zero vector solves the system, we know therefore that we will get a parametric solution.

$$\left(\begin{array}{ccc|c} 9 & 6 & 2 & 0 \\ 0 & 3 & -8 & 0 \\ 1 & 0 & 2 & 0 \end{array}\right)$$
R1 ↔ R3 1 0 2 0
0 3 −8 0
9 6 2 0
1 0 2 0
0 3 −8 0
t 1
1 0 −2 3 −12 3
−3 −3 6 x3 0
0 0 0 0 0 0
t 1
We seek a value of t to produce v3, in such a way that ‖v3‖ = 1. Since ‖x‖ = |t|√3, we can choose t = 1/√3 or t = −1/√3. Either choice
−3 −3 −3 x3 0
−3 −3 −3 0 0 0
t 0 1
Bearing in mind the condition ‖v2‖ = 1, we see that s = 1/√6 or s = −1/√6. Either will do: let's set s = 1/√6.
We have ensured that v1 and v2 are orthogonal unit vectors. What
about orthogonality to v3 ? As it happens, we can see by inspection
that v3 is orthogonal to both v1 and v2 , so we are done.
Theorem 5.14 (2) explains the happy orthogonality ‘accident’ in the solu-
tion of Example 5.13. The vectors v1 and v2 corresponded to λ1 = λ2 = 11,
while v3 corresponded to λ3 = 2. Hence we were bound to have v1 · v3 =
v2 · v3 = 0, no matter what our choices of v1 and v2 were.
This helps in such examples: given a symmetric matrix, we don’t need
to worry about the orthogonality of eigenvectors having different eigen-
values. We still need to engineer orthogonality between eigenvectors
having the same eigenvalue, however, as in the case of v1 and v2 above.
Theorem 5.14 (3) will help enormously when we consider quadratic forms
in Section 6.1. There, we will see the utility of listing the eigenvalues in
descending order.
be the orthogonal matrix from Example 1.51. Verify that Pe1, Pe2 and Pe3 are the eigenvectors v1, v2 and v3 found in Example 5.13, and that
$$P^T A P = \begin{pmatrix} 11 & 0 & 0 \\ 0 & 11 & 0 \\ 0 & 0 & 2 \end{pmatrix}$$
is a diagonal matrix, with the eigenvalues λ1 ≥ λ2 ≥ λ3 of A found in Example 5.11 running along the main diagonal.
Chapter 6
Matrices 2
Figure 6.1: q(x) = 5x1² − 3x2² − 4x1x2 plotted as a 3D surface.

(Marginal note: Quadratic forms on R^2, or indeed functions on R^2 in general, can be plotted as 3D surfaces: the value of f(x1, x2) yields the height of the surface above the point (x1, x2). These days, plotting such things with computers is quite easy, and doing so helps to put flesh on them. There are some free graphing apps, such as Quick Graph for iOS, which enable the user to quickly sketch quadratic forms (and more general functions) on R^2, and view them from different angles.)
is not a quadratic form, because the term x12 x2 does not fit into the pattern
above.
Having that rough idea in mind, we gingerly approach the formal defini-
tion.
Example 6.3.
Example 6.4.
observe that
$$x \cdot (Ax) = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \cdot \left(\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}\right) = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \cdot \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = x_1^2 + x_2^2,$$

$$x \cdot (Ax) = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \cdot \left(\begin{pmatrix} 5 & -2 \\ -2 & -3 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}\right) = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \cdot \begin{pmatrix} 5x_1 - 2x_2 \\ -2x_1 - 3x_2 \end{pmatrix}$$

$$= x_1\bigl(2x_1 - \tfrac{11}{2}x_2 + 3x_3\bigr) + x_2\bigl(-\tfrac{11}{2}x_1 - 5x_2\bigr) + x_3\bigl(3x_1 + x_3\bigr) = 2x_1^2 - 5x_2^2 + x_3^2 - 11x_1x_2 + 6x_1x_3,$$
symmetric matrix and the scalar product. We won’t prove the next result.
q(x) = x · (Ax).
$$H_f(a) = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2}(3, \frac{\pi}{4}) & \frac{\partial^2 f}{\partial x_1 \partial x_2}(3, \frac{\pi}{4}) \\ \frac{\partial^2 f}{\partial x_2 \partial x_1}(3, \frac{\pi}{4}) & \frac{\partial^2 f}{\partial x_2^2}(3, \frac{\pi}{4}) \end{pmatrix} = \begin{pmatrix} 0 & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{3}{\sqrt{2}} \end{pmatrix},$$
which has corresponding quadratic form $q(x) = -\frac{3}{\sqrt{2}}x_2^2 + \sqrt{2}\,x_1x_2$. It is important to note that the quadratic form will generally vary with a: try different values of a above and see what happens.
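Evaluating a quadratic form q(x) = x · (Ax) is a one-liner. A small sketch (assuming NumPy; the test vector is a hypothetical choice) using the symmetric matrix from the 2 × 2 computation earlier in this section:

    import numpy as np

    def quadratic_form(A, x):
        # q(x) = x . (A x) for a symmetric matrix A.
        return x @ (A @ x)

    A = np.array([[5., -2.],
                  [-2., -3.]])
    x = np.array([1.0, 2.0])            # hypothetical test vector
    print(quadratic_form(A, x))         # 5*1 - 3*4 - 4*2 = -15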
x = y1 (Pe1 ) + · · · + yn (Pen ).
Solution. Happily for us, the matrix A associated with the quadratic
form is the same matrix as in Examples 5.11, 5.13 and 5.15. We collect
the corresponding orthogonal matrix P from Example 5.15 and set
y = P T x. Bearing in mind the eigenvalues of A, we obtain
This solution, while formally correct, seems rather terse and ill-mannered. In particular, it doesn't really reveal what is happening, so let's take a closer look. If we evaluate y we obtain
$$\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = y = P^T x = \begin{pmatrix} \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}} & 0 \\ \tfrac{1}{\sqrt{6}} & \tfrac{1}{\sqrt{6}} & -\tfrac{2}{\sqrt{6}} \\ \tfrac{1}{\sqrt{3}} & \tfrac{1}{\sqrt{3}} & \tfrac{1}{\sqrt{3}} \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} \tfrac{1}{\sqrt{2}}x_1 - \tfrac{1}{\sqrt{2}}x_2 \\ \tfrac{1}{\sqrt{6}}x_1 + \tfrac{1}{\sqrt{6}}x_2 - \tfrac{2}{\sqrt{6}}x_3 \\ \tfrac{1}{\sqrt{3}}x_1 + \tfrac{1}{\sqrt{3}}x_2 + \tfrac{1}{\sqrt{3}}x_3 \end{pmatrix}.$$
q(x) = λ1 1 + λ2 0 + · · · + λn 0 = λ1 .
Example 6.9. The maximum and minimum values that the quadratic form q(x) in Example 6.7 can take, subject to the condition ‖x‖ = 1, are 11 and 2, respectively. Verify that these values are attained at
$$Pe_1 = \begin{pmatrix} \tfrac{1}{\sqrt{2}} \\ -\tfrac{1}{\sqrt{2}} \\ 0 \end{pmatrix} \quad\text{and}\quad Pe_3 = \begin{pmatrix} \tfrac{1}{\sqrt{3}} \\ \tfrac{1}{\sqrt{3}} \\ \tfrac{1}{\sqrt{3}} \end{pmatrix},$$
respectively (you can obtain the maximum at Pe2 as well).

(Marginal note: Note that if you are asked to find these maximum and minimum values of a quadratic form, and nothing else, then all you need to do is find the eigenvalues of the associated symmetric matrix and pick out the largest and smallest ones.)
Definition 6.12 (Matrix norms). Let A ∈ M_n(R). The norm of A, written ‖A‖, is defined to be the maximum of ‖Ax‖, subject to the constraint that ‖x‖ = 1.

At first glance, it may not be obvious that there should be such a maximum number at all. But there is, courtesy of a particular quadratic form. If we square ‖Ax‖, we obtain
$$\|Ax\|^2 = (Ax) \cdot (Ax) = x \cdot (A^T A x),$$
by Fact 2.20 (2) and Proposition 2.28. This defines a quadratic form, because A^T A is symmetric, no matter what the original matrix A is up to! Indeed, $(A^T A)^T = A^T (A^T)^T = A^T A$, using Fact 1.23 (7), and the fact that $(A^T)^T = A$ for any matrix.

Therefore, by Proposition 6.8, the maximum value of ‖Ax‖², subject to the constraint that ‖x‖ = 1, is equal to λ, where λ is the largest eigenvalue of the symmetric matrix A^T A. Taking positive square roots of both sides now yields
$$\|A\| = \sqrt{\lambda}.$$
Note that λ is non-negative (so we can take its square root legitimately), because it is the maximum value of ‖Ax‖², which can never be negative.
Example 6.13. Compute ‖A‖, where $A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$.
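Following the recipe above, ‖A‖ is the square root of the largest eigenvalue of AᵀA. A sketch (assuming NumPy; not part of the original notes) for the matrix in Example 6.13:

    import numpy as np

    A = np.array([[1., 1.],
                  [0., 1.]])

    eigenvalues = np.linalg.eigvalsh(A.T @ A)   # eigenvalues of the symmetric matrix A^T A
    print(np.sqrt(eigenvalues.max()))           # approximately 1.618
    print(np.linalg.norm(A, 2))                 # NumPy's spectral norm agrees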
Appendix A

Discussion board and WeBWorK guides
$$aˆ4 = 3$$
4. Surds
√ √
Expressions like 2 and 5 11 can be obtained by writing $\sqrt{2}$
or $\sqrt[5]{11}$, respectively (note the use of square brackets
as well as curly ones in the second example).
6. Standard functions
The commands $\sin$, $\cos$, $\tan$ and $\log$ produce the
standard functions sin, cos, tan and log. For example,
$\cos(x) = \frac{\sqrt{3}}{2}$
produces cos(x) = √3/2.
7. Summation and integration
Use the commands $\sum$ and $\int$, together with ˆ and _ and
curly braces, to write expressions involving summation and integra-
tion. For instance,
$\sum_{k=1}^n k = \frac{1}{2}n(n+1)$

produces $\sum_{k=1}^{n} k = \frac{1}{2}n(n+1)$, and

$\int_1^2 x^2 dx = \frac{7}{3}$

gives $\int_1^2 x^2\,dx = \frac{7}{3}$.
8. Greek characters and special symbols
The Greek letters α, β, θ and π etc can be expressed using $\alpha$,
$\beta$, $\theta$ and $\pi$, respectively. Symbols such as R and
≈ require $\mathbb{R}$ and $\approx$, respectively.
9. Matrices
Alas, there is no quick way to write down matrices properly using
MathJax, because doing so requires a so-called ‘LaTeX environ-
ment’.
To begin, type \begin{pmatrix}. Then type in the entries of the
first row, separating each one by an ampersand & character. When
you reach end of the first row, type \\. Add the entries of the
second row as above, and repeat until you have reached the end
of the final row. To finish, type \end{pmatrix} (you do not need to
add \\ at the end of the final row).
Perhaps an example explains all of this best. Typing
$$\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$$
will produce
$$\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}.$$
The best way to learn this stuff is through practice and experimenta-
tion. You can do so by using this Live Demo. The examples above can
be adapted and combined in all sorts of ways to produce expressions
of greater complexity (though it should not be necessary to write enor-
mously complicated expressions!).
Be mindful when using the curly braces { and }. MathJax will com-
plain with error messages, or will not render your expression properly, if
they are missing or are in the wrong place. Every opening { requires a
corresponding closing } (correctly placed).
If you are in any doubt about how WeBWorK is going to interpret your
answer, press the ‘Preview Answers’ button.
A.2 How to use WeBWorK
http://webwork.maa.org/wiki/Available_Functions
Set 3, question 10
The definition of row-echelon form in the notes (Definition 3.8) does not
agree with the one used here. Here, the first non-zero entry of a row
does not have to be equal to 1. Such an entry is called a ‘leading entry’,
instead of a ‘leading 1’. In Definition 3.8 (2) and (3), replace ‘leading 1’
by ‘leading entry’.
The definition of reduced row-echelon form is the same as the one given
in the notes. In particular, all leading entries must be leading 1s.
x1 + x2 − 3x3 = −9
4x1 + 3x2 + 2x3 = −5,
(the numbers of equations and variables will vary from question to ques-
tion). In each case, the format of the answer suggests that you will find
infinitely many solutions involving a single parameter, denoted by s.
So suppose that your general solution to the system above is
x1 = 3 + 2s
x2 = 2 − 7s
x3 = s
Appendix B

Principal component analysis (non-examinable)
hence m
X m
X
T
yi yi = yij yik = Cjk ,
i=1 jk i=1
for 1 6 j, k 6 p.
Also, observe that, given two n × 1 column vectors a and b, we see that
a · b equals the single entry in the identical 1 × 1 matrices aT b and bT a.
We abuse notation slightly and write a · b = aT b = bT a. With this in
mind,
n
X n
X
(yi · v) =
2
(vT yi )(yi T v)
i=1 i=1
Xn
= vT (yi yi T )v
i=1
n
X
T T
= v yi yi v = vT C v = v · (C v), (B.3)
i=1
using Fact 1.23. This yields our quadratic form q, by Theorem 6.5.
Since C is symmetric, by Theorem 5.14, there is an orthogonal matrix P ∈
Mn (R), such that the on basis Pe1 , . . . , Pen of Rn consists of eigenvectors
of C , having corresponding eigenvalues λ1 , . . . , λn , listed in descending
order. Moreover, the product P T C P is the diagonal matrix
λ1 0 ... 0
λ2
0 0
.
.. ... ..
. .
0 ... 0 λn
wi − (wi · v2 )v2 ,
Appendix C

Additional material (non-examinable)
p
r
!
X X
= Ai` B`k Ckj . (C.2)
`=1 k=1
The point is that the double sums in lines (C.1) and (C.2) are equal!
The only difference between them is the order of summation: in (C.1)
we sum over ` first, and then over k, whereas in (C.2) we sum over k
first, and then `. In summary, ((AB)C )ij = (A(BC ))ij . As above, since
i and j were chosen arbitrarily, it follows that (AB)C = A(BC ).
5. Let A and B be m × p matrices and let C be a p × n matrix. Then
A + B is a m × p matrix and (A + B)C is a m × n matrix. Likewise,
AC , BC and AC + BC are m × n matrices. We must check that the
corresponding entries of (A + B)C and AC + BC are equal. Given
1 6 i 6 m and 1 6 j 6 n, we have
p
X
((A + B)C )ij = (A + B)ik Ckj Definition 1.10
k=1
p
X
= (Aik + Bik )Ckj Definition 1.5
k=1
p
X
= Aik Ckj + Bik Ckj
k=1
p p
X X
= Aik Ckj + Bik Ckj
k=1 k=1
= (AC )ij + (BC )ij Definition 1.10
= (AC + BC )ij . Definition 1.5
p
X
= Ajk Bki Definition 1.10
k=1
p
X
= (BT )ik (AT )kj Definition 1.19
k=1
= (B AT )ij .
T
Definition 1.10
This shows that the corresponding matrix entries of (AB)T and BT AT
agree, hence (AB)T = BT AT .
The proofs of Proposition 1.53 (1) and (3) require mathematical induction.
Since we have not considered this method of proof in this module at all,
we just give very rough clues as to how these results can be proved.
Vector subspaces
Vector subspaces are special sets of vectors. Throughout this chapter, we
will be using sets and set notation. Readers unfamiliar with such things
are advised to read through these notes before continuing.
Example C.2. Show that the following subsets of R^2 are not subspaces of R^2.

1. S = {(x1, x2) ∈ R^2 : x1² + x2² = 1}.
2. S = {(x1, x2) ∈ R^2 : x1 = 0 or x2 = 0}.
3. S = {(x1, x2) ∈ R^2 : x1 ≥ 1}.
Solution.
x
(1,0)
x
(1,0)
x
−1(1,0) = (−1,0) (1,0)
For example. . .
= (5, 11, 0) = x.
If the list of vectors So a different set of scalars has, in this case, given us the same vec-
happens to be linearly
independent, then the
tor. The scalars involved in the linear combination are not uniquely
scalars involved in any determined by the vector in question.
linear combination will
be uniquely determined:
no two different sets of Linear combinations can be used to define subspaces.
scalars will yield the
same linear combina-
tion. Definition C.6 (Subspaces generated by vectors). Let v1 , . . . , vk be a
We will not explicitly
cover linearly indepen-
list of vectors in Rn . Define the set of all possible linear combinations
dent lists of vectors in of the list
these notes, however,
orthonormal lists of vec-
tors happen to be exam- V = {a1 v1 + · · · + ak vk : a1 , . . . an ∈ R} .
ples of them.
Then V is called the subspace spanned or generated by v1 , . . . , vk .
1. If we set a1 = 0 = · · · = ak = 0, then
a1 v1 + · · · + ak vk = 0v1 + · · · + 0vk = 0.
Thus 0 ∈ V .
2. Let x, y ∈ V . By definition, x and y are linear combinations of
v1 , . . . , vk , so there exist scalars a1 , . . . , ak and b1 , . . . , bk such that
x = a1 v1 + · · · + ak vk and y = b1 v1 + · · · + bk vk .
x + y = (a1 v1 + · · · + ak vk ) + (b1 v1 + · · · + bk vk )
= (a1 + b1)v1 + · · · + (ak + bk)vk ∈ V.
Example C.9.
V = {av : a ∈ R} .
Example C.10.
V = {a1 v1 + a2 v2 : a1 , a2 ∈ R} .
The whole subspace, if plotted, would look like a flat surface extending
arbitrarily far in all directions. Any vector in V , if regarded as a straight
line segment having the origin as its initial point, would lie along the
surface, as depicted above.
V = {x ∈ Rn : n · x = 0} .
Figure C.2: The plane in Example C.10 (2) with normal vector n
x = a1 v1 + · · · + ak vk ∈ V ,
x = (x · v1 )v1 + · · · + (x · vk )vk .
x = (x · v1 )v1 + · · · + (x · vk )vk ,
as claimed.
Recall Remarks C.5. Unlike the list of vectors in that remark, given x ∈ V
as above, the scalars used to express x as a linear combination of the
vectors v1 , . . . , vk are uniquely determined: by Proposition C.14, they must
equal the coordinates x · v1 , . . . , x · vk , otherwise one would get a different
vector.
Definition C.15 is a generalisation of Definition 4.7 as it allows us to
define on bases of subspaces of Rn , and coordinates of vectors that lie in
said subspaces, rather than just Rn itself. If we set k = n and V = Rn ,
then Definition C.15 boils down to Definition 4.7.
Dimension of subspaces
Remarks C.17.
V = {(a, b, 0) : a, b ∈ R} ,
of dimension is safe and sound: the length of any one such list
will be the same as the length of any other! The fact that all bases
of a given vector sub-
space have the same
Geometrically speaking, projV (x) is defined in such a way that the vector
x − projV (x) is orthogonal to the entire subspace V , in the sense that
it is orthogonal to every vector in V . Imagine a fly hovering above a
horizontal table top. If we draw a vertical line down from the fly to the
table top, then the point at which the line meets the table would be the
orthogonal projection of the fly onto the table. This vertical line can be
said to be perpendicular to the entire table top.
Moreover, proj_V(x) is defined so that the distance between x and proj_V(x) is less than the distance between x and any other vector z in V. In other words,
$$\|x - \operatorname{proj}_V(x)\| \leq \|x - z\|,$$
The claims made above are justified by the next theorem. What follows
is essentially the solution of Exercise 4.12.
Proof.
by Theorem 2.22.
Well done!