Linalg I
© Jean Gallier
In recent years, computer vision, robotics, machine learning, and data science have been
some of the key areas that have contributed to major advances in technology. Anyone who
looks at papers or books in the above areas will be baffled by a strange jargon involving exotic
terms such as kernel PCA, ridge regression, lasso regression, support vector machines (SVM),
Lagrange multipliers, KKT conditions, etc. Do support vector machines chase cattle to catch
them with some kind of super lasso? No! But one will quickly discover that behind the jargon
which always comes with a new field (perhaps to keep the outsiders out of the club), lies a
lot of “classical” linear algebra and techniques from optimization theory. And there comes
the main challenge: in order to understand and use tools from machine learning, computer
vision, and so on, one needs to have a firm background in linear algebra and optimization
theory. To be honest, some probability theory and statistics should also be included, but we
already have enough to contend with.
Many books on machine learning struggle with the above problem. How can one under-
stand what are the dual variables of a ridge regression problem if one doesn’t know about the
Lagrangian duality framework? Similarly, how is it possible to discuss the dual formulation
of SVM without a firm understanding of the Lagrangian framework?
The easy way out is to sweep these difficulties under the rug. If one is just a consumer
of the techniques we mentioned above, the cookbook recipe approach is probably adequate.
But this approach doesn’t work for someone who really wants to do serious research and
make significant contributions. To do so, we believe that one must have a solid background
in linear algebra and optimization theory.
This is a problem because it means investing a great deal of time and energy studying
these fields, but we believe that perseverance will be amply rewarded.
Our main goal is to present fundamentals of linear algebra and optimization theory,
keeping in mind applications to machine learning, robotics, and computer vision. This work
consists of two volumes, the first one being linear algebra, the second one optimization theory
and applications, especially to machine learning.
This first volume covers “classical” linear algebra, up to and including the primary de-
composition and the Jordan form. Besides covering the standard topics, we discuss a few
topics that are important for applications. These include:
1. Haar bases and the corresponding Haar wavelets.
2. Hadamard matrices.
3. Affine maps (see Section 5.5).
4. Norms and matrix norms (Chapter 8).
5. Convergence of sequences and series in a normed vector space. The matrix exponential
e^A and its basic properties (see Section 8.8).
6. The group of unit quaternions, SU(2), and the representation of rotations in SO(3)
by unit quaternions (Chapter 15).
7. An introduction to algebraic and spectral graph theory (Chapters 18 and 19).
9. Methods for computing eigenvalues and eigenvectors, with a main focus on the QR
algorithm (Chapter 17).
Four topics are covered in more detail than usual. These are
3. The geometry of the orthogonal groups O(n) and SO(n), and of the unitary groups
U(n) and SU(n).
With a few exceptions, we provide complete proofs. We did so to make this book
self-contained, but also because we believe that no deep knowledge of this material can be
acquired without working out some proofs. However, our advice is to skip some of the proofs
upon first reading, especially if they are long and intricate.
The chapters or sections marked with the symbol ~ contain material that is typically
more specialized or more advanced, and they can be omitted upon first (or second) reading.
Acknowledgement: We would like to thank Christine Allen-Blanchette, Kostas Daniilidis,
Carlos Esteves, Spyridon Leonardos, Stephen Phillips, João Sedoc, Stephen Shatz, Jianbo
Shi, Marcelo Siqueira, and C.J. Taylor for reporting typos and for helpful comments. Mary
Pugh and William Yu (at the University of Toronto) taught a course using our book and
reported a number of typos and errors. We warmly thank them as well as their students,
not only for finding errors, but also for very helpful comments and suggestions for simplifying
some proofs. Special thanks to Gilbert Strang. We learned much from his books which have
been a major source of inspiration. Thanks to Steven Boyd and James Demmel whose books
have been an invaluable source of information. The first author also wishes to express his
deepest gratitude to Philippe G. Ciarlet who was his teacher and mentor in 1970-1972 while
he was a student at ENPC in Paris. Professor Ciarlet was by far his best teacher. He also
knew how to instill in his students the importance of intellectual rigor, honesty, and modesty.
He still has his typewritten notes on measure theory and integration, and on numerical linear
algebra. The latter became his wonderful book Ciarlet [14], from which we have borrowed
heavily.
Contents
1 Introduction 13
6 Determinants 175
6.1 Permutations, Signature of a Permutation . . . . . . . . . . . . . . . . . . . 175
6.2 Alternating Multilinear Maps . . . . . . . . . . . . . . . . . . . . . . . . . . 180
6.3 Definition of a Determinant . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
6.4 Inverse Matrices and Determinants . . . . . . . . . . . . . . . . . . . . . . . 192
6.5 Systems of Linear Equations and Determinants . . . . . . . . . . . . . . . . 195
6.6 Determinant of a Linear Map . . . . . . . . . . . . . . . . . . . . . . . . . . 198
6.7 The Cayley–Hamilton Theorem . . . . . . . . . . . . . . . . . . . . . . . . . 198
6.8 Permanents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
6.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
6.10 Further Readings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
6.11 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
Bibliography 763
Chapter 1
Introduction
As we explained in the preface, this first volume covers “classical” linear algebra, up to and
including the primary decomposition and the Jordan form. Besides covering the standard
topics, we discuss a few topics that are important for applications. These include:
1. Haar bases and the corresponding Haar wavelets, a fundamental tool in signal process-
ing and computer graphics.
2. Hadamard matrices which have applications in error correcting codes, signal processing,
and low rank approximation.
3. Affine maps (see Section 5.5). These are usually ignored or treated in a somewhat
obscure fashion. Yet they play an important role in computer vision and robotics.
There is a clean and elegant way to define affine maps. One simply has to define affine
combinations. Linear maps preserve linear combinations, and similarly affine maps
preserve affine combinations.
4. Norms and matrix norms (Chapter 8). These are used extensively in optimization
theory.
5. Convergence of sequences and series in a normed vector space. Banach spaces (see
Section 8.7). The matrix exponential e^A and its basic properties (see Section 8.8).
In particular, we prove the Rodrigues formula for rotations in SO(3) and discuss the
surjectivity of the exponential map exp : so(3) → SO(3), where so(3) is the real vector
space of 3 × 3 skew-symmetric matrices (see Section 11.7). We also show that det(e^A) =
e^{tr(A)} (see Section 14.5).
6. The group of unit quaternions, SU(2), and the representation of rotations in SO(3)
by unit quaternions (Chapter 15). We define a homomorphism r : SU(2) → SO(3)
and prove that it is surjective and that its kernel is {−I, I}. We compute the rotation
matrix R_q associated with a unit quaternion q, and give an algorithm to construct a
quaternion from a rotation matrix. We also show that the exponential map
exp : su(2) → SU(2) is surjective, where su(2) is the real vector space of skew-Hermitian
2 × 2 matrices with zero trace. We discuss quaternion interpolation and
prove the famous slerp interpolation formula due to Ken Shoemake.
7. An introduction to algebraic and spectral graph theory. We define the graph Laplacian
and prove some of its basic properties (see Chapter 18). In Chapter 19, we explain
how the eigenvectors of the graph Laplacian can be used for graph drawing.
9. Methods for computing eigenvalues and eigenvectors are discussed in Chapter 17. We
first focus on the QR algorithm due to Rutishauser, Francis, and Kublanovskaya. See
Sections 17.1 and 17.3. We then discuss how to use an Arnoldi iteration, in combination
with the QR algorithm, to approximate eigenvalues for a matrix A of large dimension.
See Section 17.4. The special case where A is a symmetric (or Hermitian) tridiagonal
matrix, involves a Lanczos iteration, and is discussed in Section 17.6. In Section 17.7,
we present power iterations and inverse (power) iterations.
Five topics are covered in more detail than usual. These are
4. The geometry of the orthogonal groups O(n) and SO(n), and of the unitary groups
U(n) and SU(n).
Most texts omit the proof that the P A = LU factorization can be obtained by a simple
modification of Gaussian elimination. We give a complete proof of Theorem 7.5 in Section
7.6. We also prove the uniqueness of the rref of a matrix; see Proposition 7.19.
At the most basic level, duality corresponds to transposition. But duality is really the
bijection between subspaces of a vector space E (say finite-dimensional) and subspaces of
linear forms (subspaces of the dual space E*) established by two maps: the first map assigns
to a subspace V of E the subspace V^0 of linear forms that vanish on V; the second map assigns
to a subspace U of linear forms the subspace U^0 consisting of the vectors in E on which all
linear forms in U vanish. The above maps define a bijection such that dim(V) + dim(V^0) = dim(E),
dim(U) + dim(U^0) = dim(E), V^{00} = V, and U^{00} = U.
2.1 Motivations: Linear Combinations, Linear Independence, Rank

Consider the problem of solving the following linear system:

x1 + 2x2 − x3 = 1
2x1 + x2 + x3 = 2
x1 − 2x2 − 2x3 = 3.
One way to approach this problem is to introduce the “vectors” u, v, w, and b, given by

u = (1, 2, 1)^T,  v = (2, 1, −2)^T,  w = (−1, 1, −2)^T,  b = (1, 2, 3)^T

(written here as transposed row vectors, i.e., as column vectors),
and to write our linear system as
x1 u + x2 v + x3 w = b.
In the above equation, we used implicitly the fact that a vector z can be multiplied by a
scalar λ ∈ R, where

λz = λ(z1, z2, z3)^T = (λz1, λz2, λz3)^T,

and two vectors y and z can be added, where

y + z = (y1, y2, y3)^T + (z1, z2, z3)^T = (y1 + z1, y2 + z2, y3 + z3)^T.
18 CHAPTER 2. VECTOR SPACES, BASES, LINEAR MAPS
x1 u + x2 v + x3 w = b.

It turns out that u, v, w are linearly independent, which means that the only solution of
the equation

x1 u + x2 v + x3 w = 0_3

is the trivial solution x1 = x2 = x3 = 0. Using this fact, it can be shown that every vector
in R^{3×1} can be written as a linear combination of u, v, w.
Here, 0_3 is the zero vector

0_3 = (0, 0, 0)^T.
It is customary to abuse notation and to write 0 instead of 0_3. This rarely causes a problem
because in most cases, whether 0 denotes the scalar zero or the zero vector can be inferred
from the context.
In fact, every vector z ∈ R^{3×1} can be written in a unique way as a linear combination
z = x1 u + x2 v + x3 w.
This is because if
z = x1 u + x2 v + x3 w = y1 u + y2 v + y3 w,
then by using our (linear!) operations on vectors, we get

(y1 − x1)u + (y2 − x2)v + (y3 − x3)w = 0_3,

and the linear independence of u, v, w implies that

y1 = x1,  y2 = x2,  y3 = x3,
which shows that z has a unique expression as a linear combination, as claimed. Then our
equation
x1 u + x2 v + x3 w = b
has a unique solution, and indeed, we can check that
x1 = 1.4
x2 = −0.4
x3 = −0.4
is the solution.
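As a numerical sanity check (our own Python sketch, not part of the text), we can plug this solution back into the equation x1 u + x2 v + x3 w = b:

```python
# Illustrative sketch: verify that x1 = 1.4, x2 = -0.4, x3 = -0.4 satisfies
# x1*u + x2*v + x3*w = b for the vectors of this example.
u = [1.0, 2.0, 1.0]
v = [2.0, 1.0, -2.0]
w = [-1.0, 1.0, -2.0]
b = [1.0, 2.0, 3.0]
x1, x2, x3 = 1.4, -0.4, -0.4

# compute the ith coordinate of x1*u + x2*v + x3*w for i = 0, 1, 2
lhs = [x1 * u[i] + x2 * v[i] + x3 * w[i] for i in range(3)]
assert all(abs(lhs[i] - b[i]) < 1e-9 for i in range(3))
print(lhs)
```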
But then, how do we determine that some vectors are linearly independent?
One answer is to compute a numerical quantity det(u, v, w), called the determinant of
(u, v, w), and to check that it is nonzero. In our case, it turns out that

               | 1    2   −1 |
det(u, v, w) = | 2    1    1 | = 15,
               | 1   −2   −2 |
which confirms that u, v, w are linearly independent. Note that our linear system can also
be written in matrix form as

| 1    2   −1 | | x1 |   | 1 |
| 2    1    1 | | x2 | = | 2 | ,
| 1   −2   −2 | | x3 |   | 3 |

or more concisely as

Ax = b.
Now what if the vectors u, v, w are linearly dependent? For example, if we consider the
vectors

u = (1, 2, 1)^T,  v = (2, 1, −1)^T,  w = (−1, 1, 2)^T,
we see that
u − v = w,
a nontrivial linear dependence. It can be verified that u and v are still linearly independent.
Now for our problem

x1 u + x2 v + x3 w = b

to have a solution, it must be the case that b can be expressed as a linear combination of
u and v. However, it turns out that u, v, b are linearly independent (one way to see this
is to compute the determinant det(u, v, b) = −6), so b cannot be expressed as a linear
combination of u and v,
and thus, our system has no solution.
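The determinant computations used above can be reproduced with a short pure-Python sketch; the helper det3 below is ours, not from the text:

```python
# Illustrative sketch: a 3x3 determinant via cofactor expansion along the
# first row, used to test linear independence of three column vectors.
def det3(c1, c2, c3):
    """Determinant of the 3x3 matrix whose columns are c1, c2, c3."""
    (a, d, g), (b, e, h), (c, f, i) = c1, c2, c3
    # the matrix by rows is [a b c; d e f; g h i]
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# first example: u, v, w are linearly independent
u = [1, 2, 1]; v = [2, 1, -2]; w = [-1, 1, -2]; b = [1, 2, 3]
print(det3(u, v, w))   # 15, nonzero

# second example, where u2 - v2 = w2 (a nontrivial dependence)
u2 = [1, 2, 1]; v2 = [2, 1, -1]; w2 = [-1, 1, 2]
print(det3(u2, v2, w2))  # 0, so u2, v2, w2 are linearly dependent
print(det3(u2, v2, b))   # -6, so b is not a combination of u2 and v2
```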
If we change the vector b to

b = (3, 3, 0)^T,
then
b = u + v,
and so the system
x1 u + x2 v + x3 w = b
has the solution
x1 = 1, x2 = 1, x3 = 0.
More generally, given any two vectors x = (x1, . . . , xn) and y = (y1, . . . , yn) ∈ R^n, their
inner product, denoted x · y, or ⟨x, y⟩, is the number

x · y = x1 y1 + x2 y2 + · · · + xn yn = Σ_{i=1}^{n} x_i y_i.

First, the quantity √(x · x) = (x1² + · · · + xn²)^{1/2}
is a generalization of the length of a vector, called the Euclidean norm, or ℓ²-norm. Second,
it can be shown that we have the inequality
|x · y| ≤ ‖x‖ ‖y‖,
so if x, y ≠ 0, the ratio (x · y)/(‖x‖ ‖y‖) can be viewed as the cosine of an angle, the angle
between x and y. In particular, if x · y = 0 then the vectors x and y make the angle π/2,
that is, they are orthogonal. The (square) matrices Q that preserve the inner product, in
the sense that ⟨Qx, Qy⟩ = ⟨x, y⟩ for all x, y ∈ R^n, also play a very important role. They can
be thought of as generalized rotations.
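These notions translate directly into a few lines of code; the following pure-Python sketch (our own illustration, not from the text) computes inner products and Euclidean norms and checks the Cauchy–Schwarz inequality on sample vectors:

```python
import math

# Illustrative sketch: inner product, Euclidean norm, Cauchy-Schwarz, and the
# cosine of the angle between two vectors.
def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

x = [1.0, 2.0, 2.0]
y = [2.0, -2.0, 1.0]
z = [3.0, 0.0, 4.0]

print(dot(x, y))   # 0.0, so x and y are orthogonal
print(norm(z))     # 5.0, the Euclidean length of z
assert abs(dot(x, z)) <= norm(x) * norm(z)    # Cauchy-Schwarz
cos_angle = dot(x, z) / (norm(x) * norm(z))   # cosine of the angle between x and z
```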
Returning to matrices, if A is an m × n matrix consisting of n columns A^1, . . . , A^n (in
R^m), and B is an n × p matrix consisting of p columns B^1, . . . , B^p (in R^n), we can form
the p vectors (in R^m)
AB 1 , . . . , AB p .
These p vectors constitute the m × p matrix denoted AB, whose jth column is AB^j. But
we know that the ith coordinate of AB^j is the inner product of the ith row of A by the jth
column of B,

a_{i1} b_{1j} + a_{i2} b_{2j} + · · · + a_{in} b_{nj} = Σ_{k=1}^{n} a_{ik} b_{kj}.
Thus we have defined a multiplication operation on matrices, namely if A = (a_{ik}) is an
m × n matrix and if B = (b_{kj}) is an n × p matrix, then their product AB is the m × p
matrix whose entry on the ith row and the jth column is given by the inner product of the
ith row of A by the jth column of B,

(AB)_{ij} = Σ_{k=1}^{n} a_{ik} b_{kj}.
Beware that unlike the multiplication of real (or complex) numbers, if A and B are two n × n
matrices, in general, AB ≠ BA.
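The definition of the matrix product can be transcribed directly; the following pure-Python sketch (our own illustration, not from the text) also exhibits a pair of 2 × 2 matrices with AB ≠ BA:

```python
# Illustrative sketch: the product of an m x n matrix A and an n x p matrix B,
# entry (i, j) being the inner product of the ith row of A and the jth column of B.
def matmul(A, B):
    m, n, p = len(A), len(B), len(B[0])
    assert all(len(row) == n for row in A)  # inner dimensions must agree
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

A = [[1, 2],
     [3, 4]]
B = [[0, 1],
     [1, 0]]
print(matmul(A, B))  # [[2, 1], [4, 3]]
print(matmul(B, A))  # [[3, 4], [1, 2]]  -- so AB != BA in general
```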
Suppose that A is an n × n matrix and that we are trying to solve the linear system

Ax = b,

with b ∈ R^n. Suppose we can find an n × n matrix B such that

B A^i = e_i,  i = 1, . . . , n,

with e_i = (0, . . . , 0, 1, 0, . . . , 0), where the only nonzero entry is the 1 in the ith slot. If we form
the n × n matrix

     | 1  0  0  · · ·  0  0 |
     | 0  1  0  · · ·  0  0 |
     | 0  0  1  · · ·  0  0 |
In = | .  .  .   . .   .  . |
     | 0  0  0  · · ·  1  0 |
     | 0  0  0  · · ·  0  1 |

called the identity matrix, whose ith column is e_i, then the above is equivalent to
BA = In .
Multiplying both sides of Ax = b on the left by B, we get

B(Ax) = Bb,

and since B(Ax) = (BA)x = In x = x, we deduce that

x = Bb.

Conversely, if AB = In, then x = Bb is indeed a solution of the system, since

A(Bb) = (AB)b = In b = b.
What is not obvious is that BA = In implies AB = In, but this is indeed provable. The
matrix B is usually denoted A⁻¹ and called the inverse of A. It can be shown that it is the
unique matrix such that

A A⁻¹ = A⁻¹ A = In.
If a square matrix A has an inverse, then we say that it is invertible or nonsingular; otherwise
we say that it is singular. We will show later that a square matrix is invertible iff its columns
are linearly independent iff its determinant is nonzero.
In summary, if A is a square invertible matrix, then the linear system Ax = b has the
unique solution x = A⁻¹b. In practice, this is not a good way to solve a linear system because
computing A⁻¹ is too expensive. A practical method for solving a linear system is Gaussian
elimination, discussed in Chapter 7. Other practical methods for solving a linear system
Q Q⊤ = Q⊤ Q = In

A = V Σ U⊤,
Another important application of the SVD is principal component analysis (or PCA), an
important tool in data analysis.
Yet another fruitful way of interpreting the resolution of the system Ax = b is to view
this problem as an intersection problem. Indeed, each of the equations
x1 + 2x2 − x3 = 1
2x1 + x2 + x3 = 2
x1 − 2x2 − 2x3 = 3

defines a plane. The first equation

x1 + 2x2 − x3 = 1

defines the plane H1 passing through the three points (1, 0, 0), (0, 1/2, 0), (0, 0, −1), on the
coordinate axes, the second equation
2x1 + x2 + x3 = 2
defines the plane H2 passing through the three points (1, 0, 0), (0, 2, 0), (0, 0, 2), on the coor-
dinate axes, and the third equation
x1 − 2x2 − 2x3 = 3

defines the plane H3 passing through the three points (3, 0, 0), (0, −3/2, 0), (0, 0, −3/2), on
the coordinate axes. See Figure 2.1.
Figure 2.2: The solution of the system is the point in common with each of the three planes.
The intersection Hi ∩ Hj of any two distinct planes Hi and Hj is a line, and the intersection
H1 ∩ H2 ∩ H3 of the three planes consists of the single point (1.4, −0.4, −0.4), as illustrated
in Figure 2.2.
The planes corresponding to the system
x1 + 2x2 − x3 = 1
2x1 + x2 + x3 = 2
x1 − x2 + 2x3 = 3,
are illustrated in Figure 2.3.
Figure 2.3: The planes defined by the equations x1 + 2x2 − x3 = 1, 2x1 + x2 + x3 = 2, and
x1 − x2 + 2x3 = 3.
This system has no solution since there is no point simultaneously contained in all three
planes; see Figure 2.4.
On the other hand, the planes corresponding to the system

x1 + 2x2 − x3 = 3
2x1 + x2 + x3 = 3
x1 − x2 + 2x3 = 0

intersect in a common line, so this system has infinitely many solutions.
Figure 2.5: The planes defined by the equations x1 + 2x2 − x3 = 3, 2x1 + x2 + x3 = 3, and
x1 − x2 + 2x3 = 0.
Under the above interpretation, observe that we are focusing on the rows of the matrix
A, rather than on its columns, as in the previous interpretations.
Another great example of a real-world problem where linear algebra proves to be very
effective is the problem of data compression, that is, of representing a very large data set
using a much smaller amount of storage.
Typically the data set is represented as an m × n matrix A where each row corresponds
to an n-dimensional data point and typically, m ≥ n. In most applications, the data are not
independent, so the rank of A is a lot smaller than min{m, n}, and the goal of low-rank
decomposition is to factor A as the product of two matrices B and C, where B is an m × k
matrix and C is a k × n matrix, with k ≪ min{m, n} (here, ≪ means “much smaller than”):
A = B C,  where A is m × n, B is m × k, and C is k × n.
Now it is generally too costly to find an exact factorization as above, so we look for a
low-rank matrix A′ which is a “good” approximation of A. In order to make this statement
precise, we need to define a mechanism to determine how close two matrices are. This can
be done using matrix norms, a notion discussed in Chapter 8. The norm of a matrix A is a
nonnegative real number ‖A‖ which behaves a lot like the absolute value |x| of a real number
x. Then our goal is to find some low-rank matrix A′ that minimizes the norm

‖A − A′‖²,

over all matrices A′ of rank at most k, for some given k ≪ min{m, n}.
Some advantages of a low-rank approximation are:
1. Fewer elements are required to represent A; namely, k(m + n) instead of mn. Thus
less storage and fewer operations are needed to reconstruct A.
2. Often, the process for obtaining the decomposition exposes the underlying structure of
the data. Thus, it may turn out that “most” of the significant data are concentrated
along some directions called principal directions.
Low-rank decompositions of a set of data have a multitude of applications in engineering,
including computer science (especially computer vision), statistics, and machine learning.
As we will see later in Chapter 21, the singular value decomposition (SVD) provides a very
satisfactory solution to the low-rank approximation problem. Still, in many cases, the data
sets are so large that another ingredient is needed: randomization. However, as a first step,
linear algebra often yields a good initial solution.
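As a preview of Chapter 21, the following sketch (our own illustration, assuming NumPy; the variable names are ours) builds a nearly rank-3 data matrix and recovers a rank-k factorization A′ = BC by truncating the singular value decomposition:

```python
import numpy as np

# Illustrative sketch: truncating the SVD yields a rank-k factorization
# A' = BC with B of size m x k and C of size k x n, so only k(m + n)
# numbers are stored instead of mn.
rng = np.random.default_rng(0)
m, n, k = 100, 40, 3

# a synthetic m x n data matrix of rank k, plus a little noise
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
A = A + 1e-3 * rng.standard_normal((m, n))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
B = U[:, :k] * s[:k]     # m x k (columns scaled by the top k singular values)
C = Vt[:k, :]            # k x n
A_prime = B @ C          # the rank-k approximation A'

print(B.size + C.size, A.size)   # k(m + n) = 420 numbers versus mn = 4000
rel_err = np.linalg.norm(A - A_prime) / np.linalg.norm(A)
print(rel_err)                   # small, since A is nearly rank k
```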
2.2 Vector Spaces

We will now be more precise as to what kinds of operations are allowed on vectors. In
the early 1900s, the notion of a vector space emerged as a convenient and unifying framework
for working with “linear” objects, and we will discuss this notion in the next few sections.
However, keep in mind that vector spaces are not just algebraic
objects; they are also geometric objects.
Definition 2.1. A group is a set G equipped with a binary operation · : G × G → G that
associates an element a · b ∈ G to every pair of elements a, b ∈ G, and having the following
properties: · is associative, has an identity element e ∈ G, and every element in G is invertible
(w.r.t. ·). More explicitly, this means that the following equations hold for all a, b, c ∈ G:

(G1) a · (b · c) = (a · b) · c. (associativity);

(G2) a · e = e · a = a. (identity);
(G3) For every a ∈ G, there is some a⁻¹ ∈ G such that a · a⁻¹ = a⁻¹ · a = e. (inverse).
A group G is abelian (or commutative) if

a · b = b · a for all a, b ∈ G.
3. Similarly, the sets R of real numbers and C of complex numbers are abelian groups
under addition (with identity element 0), and R* = R − {0} and C* = C − {0} are
abelian groups under multiplication (with identity element 1).
4. The sets R^n and C^n of n-tuples of real or complex numbers are abelian groups under
componentwise addition:

(x1, . . . , xn) + (y1, . . . , yn) = (x1 + y1, . . . , xn + yn),

with identity element (0, . . . , 0).
5. Given any nonempty set S, the set of bijections f : S → S, also called permutations
of S, is a group under function composition (i.e., the multiplication of f and g is the
composition g ∘ f), with identity element the identity function id_S. This group is not
abelian as soon as S has more than two elements.
6. The set of n × n matrices with real (or complex) coefficients is an abelian group under
addition of matrices, with identity element the null matrix. It is denoted by Mn(R)
(or Mn(C)).
7. The set R[X] of all polynomials in one variable X with real coefficients,

P(X) = a_n X^n + a_{n−1} X^{n−1} + · · · + a_1 X + a_0,

(with a_i ∈ R), is an abelian group under addition of polynomials. The identity element
is the zero polynomial.
8. The set of n × n invertible matrices with real (or complex) coefficients is a group under
matrix multiplication, with identity element the identity matrix In. This group is
called the general linear group and is usually denoted by GL(n, R) (or GL(n, C)).
9. The set of n × n invertible matrices with real (or complex) coefficients and determinant
+1 is a group under matrix multiplication, with identity element the identity matrix
In. This group is called the special linear group and is usually denoted by SL(n, R)
(or SL(n, C)).
10. The set of n × n invertible matrices R with real coefficients such that RR⊤ = R⊤R = In
and of determinant +1 is a group (under matrix multiplication) called the special
orthogonal group and is usually denoted by SO(n) (where R⊤ is the transpose of the
matrix R, i.e., the rows of R⊤ are the columns of R). It corresponds to the rotations
in R^n.
11. Given an open interval (a, b), the set C(a, b) of continuous functions f : (a, b) → R is
an abelian group under the operation f + g defined such that

(f + g)(x) = f(x) + g(x)

for all x ∈ (a, b).

Proposition 2.1. If a binary operation · : M × M → M is associative and if e′ ∈ M is a
left identity and e″ ∈ M is a right identity, which means that

e′ · a = a for all a ∈ M, (G2l)

and

a · e″ = a for all a ∈ M, (G2r)

then e′ = e″.
Proof. If we let a = e″ in equation (G2l), we get

e′ · e″ = e″,

and if we let a = e′ in equation (G2r), we get

e′ · e″ = e′,

and thus

e′ = e′ · e″ = e″,

as claimed.
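The permutation groups of Example (5) above give a concrete non-abelian group; the following pure-Python sketch (our own illustration, not from the text) represents a permutation of {0, 1, 2} as a tuple and checks non-commutativity and the existence of inverses:

```python
# Illustrative sketch: permutations of {0, 1, 2} under composition form a group
# that is not abelian; a permutation is stored as the tuple of its values.
def compose(g, f):
    # (g o f)(i) = g(f(i))
    return tuple(g[f[i]] for i in range(len(f)))

identity = (0, 1, 2)
f = (1, 0, 2)   # swaps 0 and 1
g = (0, 2, 1)   # swaps 1 and 2

print(compose(g, f))  # (2, 0, 1)
print(compose(f, g))  # (1, 2, 0)  -- not equal, so the group is not abelian

# f is its own inverse: composing it with itself gives the identity
assert compose(f, f) == identity
```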
32 CHAPTER 2. VECTOR SPACES, BASES, LINEAR MAPS
Proposition 2.1 implies that the identity element of a monoid is unique, and since every
group is a monoid, the identity element of a group is unique. Furthermore, every element in
a group has a unique inverse. This is a consequence of a slightly more general fact:
Proposition 2.2. In a monoid M with identity element e, if some element a ∈ M has some
left inverse a′ ∈ M and some right inverse a″ ∈ M, which means that

a′ · a = e (G3l)

and

a · a″ = e, (G3r)

then a′ = a″.
Proof. Using (G3l) and the fact that e is an identity element, we have

(a′ · a) · a″ = e · a″ = a″.

Similarly, using (G3r) and the fact that e is an identity element, we have

a′ · (a · a″) = a′ · e = a′.

However, by associativity, (a′ · a) · a″ = a′ · (a · a″), and so a″ = a′,
as claimed.
Remark: Axioms (G2) and (G3) can be weakened a bit by requiring only (G2r) (the existence
of a right identity) and (G3r) (the existence of a right inverse for every element) (or
(G2l) and (G3l)). It is a good exercise to prove that the group axioms (G2) and (G3) follow
from (G2r) and (G3r).
Another important property about inverse elements in monoids is stated below.
Proposition 2.3. In a monoid M with identity element e, if a and b are invertible elements
of M, where a⁻¹ is the inverse of a and b⁻¹ is the inverse of b, then ab is invertible and its
inverse is given by (ab)⁻¹ = b⁻¹a⁻¹.
Proof. Using associativity and the fact that e is the identity element, we have

(ab)(b⁻¹a⁻¹) = a(b b⁻¹)a⁻¹ = a e a⁻¹ = a a⁻¹ = e.

We also have

(b⁻¹a⁻¹)(ab) = b⁻¹(a⁻¹ a)b = b⁻¹ e b = b⁻¹ b = e.

Therefore b⁻¹a⁻¹
is the inverse of ab.
Observe that the inverse of ba is a⁻¹b⁻¹. Proposition 2.3 implies that the set of invertible
elements of a monoid M is a group, also with identity element e.
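Proposition 2.3 can be illustrated in the monoid of 2 × 2 real matrices under multiplication; the following sketch (our own, with hypothetical helpers mul and inv, not from the text) checks that (ab)⁻¹ = b⁻¹a⁻¹ on a concrete pair of invertible matrices:

```python
# Illustrative sketch: in the monoid of 2x2 real matrices under multiplication,
# the invertible elements satisfy (AB)^-1 = B^-1 A^-1 (Proposition 2.3).
def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    assert det != 0, "matrix is not invertible"
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 1]]
lhs = inv(mul(A, B))           # (AB)^-1
rhs = mul(inv(B), inv(A))      # B^-1 A^-1
assert all(abs(lhs[i][j] - rhs[i][j]) < 1e-12 for i in range(2) for j in range(2))
print(lhs)
```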
A vector space is an abelian group E with an additional operation · : K × E → E called
scalar multiplication that allows rescaling a vector in E by an element in K. The set K
itself is an algebraic structure called a field. A field is a special kind of structure called a
ring. These notions are defined below. We begin with rings.
Definition 2.2. A ring is a set A equipped with two operations + : A × A → A (called
addition) and ∗ : A × A → A (called multiplication) having the following properties: (A, +)
is an abelian group, ∗ is associative and has an identity element 1 ∈ A, and ∗ is distributive
over +.

The identity element for addition is denoted 0, and the additive inverse of a ∈ A is
denoted by −a. More explicitly, the axioms of a ring are the following equations, which hold
for all a, b, c ∈ A:
a + (b + c) = (a + b) + c (associativity of +) (2.1)
a + b = b + a (commutativity of +) (2.2)
a + 0 = 0 + a = a (zero) (2.3)
a + (−a) = (−a) + a = 0 (additive inverse) (2.4)
a ∗ (b ∗ c) = (a ∗ b) ∗ c (associativity of ∗) (2.5)
a ∗ 1 = 1 ∗ a = a (identity for ∗) (2.6)
(a + b) ∗ c = (a ∗ c) + (b ∗ c) (distributivity) (2.7)
a ∗ (b + c) = (a ∗ b) + (a ∗ c) (distributivity) (2.8)
A ring A is commutative if

a ∗ b = b ∗ a for all a, b ∈ A.

From the distributivity axioms one can deduce that

a ∗ 0 = 0 ∗ a = 0 for all a ∈ A. (2.9)
Note that (2.9) implies that if 1 = 0, then a = 0 for all a ∈ A, and thus A = {0}. The
ring A = {0} is called the trivial ring. A ring for which 1 ≠ 0 is called nontrivial. The
multiplication a ∗ b of two elements a, b ∈ A is often denoted by ab.
The abelian group Z is a commutative ring (with unit 1), and for any commutative ring
K, the abelian group K[X] of polynomials is also a commutative ring (also with unit 1).
The set Z/mZ of residues modulo m where m is a positive integer is a commutative ring.
A field is a commutative ring K for which K − {0} is a group under multiplication.
Definition 2.3. A set K is a field if it is a ring and the following properties hold:

(F1) 0 ≠ 1;

(F2) for every a ∈ K, if a ≠ 0, then a has an inverse w.r.t. ∗;

(F3) ∗ is commutative.
Let K* = K − {0}. Observe that (F1) and (F2) are equivalent to the fact that K* is a
group w.r.t. ∗ with identity element 1. If ∗ is not commutative but (F1) and (F2) hold, we
say that we have a skew field (or noncommutative field).
Note that we are assuming that the operation ∗ of a field is commutative. This convention
is not universally adopted, but since ∗ will be commutative for most fields we will encounter,
we may as well include this condition in the definition.
Example 2.2.
1. The rings Q, R, and C are fields.
3. The set of (formal) fractions f(X)/g(X) of polynomials f(X), g(X) ∈ R[X], where
g(X) is not the zero polynomial, is a field.
Definition 2.4. A real vector space is a set E (of vectors) together with two operations
+ : E × E → E (called vector addition) and · : R × E → E (called scalar multiplication)
satisfying the following conditions for all α, β ∈ R and all u, v ∈ E:

(V0) E is an abelian group w.r.t. +, with identity element 0;

(V1) α · (u + v) = (α · u) + (α · v);

(V2) (α + β) · u = (α · u) + (β · u);

(V3) (α ∗ β) · u = α · (β · u);

(V4) 1 · u = u.
Given α ∈ R and v ∈ E, the element α · v is also denoted by αv. The field R is often
called the field of scalars.
In Definition 2.4, the field R may be replaced by the field of complex numbers C, in which
case we have a complex vector space. It is even possible to replace R by the field of rational
numbers Q or by any arbitrary field K (for example Z/pZ, where p is a prime number), in
which case we have a K-vector space (in (V3), ∗ denotes multiplication in the field K). In
most cases, the field K will be the field R of reals, but all results in this chapter hold for
vector spaces over an arbitrary field.
From (V0), a vector space always contains the null vector 0, and thus is nonempty.
From (V1), we get α · 0 = 0, and α · (−v) = −(α · v). From (V2), we get 0 · v = 0, and
(−α) · v = −(α · v).
Another important consequence of the axioms is the following fact: for any α ∈ R and
any v ∈ E, if α · v = 0, then either α = 0 or v = 0.
Remark: One may wonder whether axiom (V4) is really needed. Could it be derived from
the other axioms? The answer is no. For example, one can take E = R^n and define
· : R × R^n → R^n by

λ · (x1, . . . , xn) = (0, . . . , 0)
(The symbol 0 is also overloaded, since it represents both the zero in R (a scalar) and the
identity element of E (the zero vector). Confusion rarely arises, but one may prefer using a
boldface 0 for the zero vector.)
for all (x1, . . . , xn) ∈ R^n and all λ ∈ R. Axioms (V0)–(V3) are all satisfied, but (V4) fails.
Less trivial examples can be given using the notion of a basis, which has not been defined
yet.
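The degenerate scalar multiplication above can be checked mechanically; the following pure-Python sketch (our own illustration, not from the text) verifies that it satisfies (V1) and (V3) but violates (V4):

```python
# Illustrative sketch: the scalar multiplication lam . (x1,...,xn) = (0,...,0)
# satisfies axioms (V1)-(V3) but violates (V4), since 1 . u should equal u.
def smul(lam, u):
    return tuple(0 for _ in u)   # the degenerate scalar multiplication

def vadd(u, v):
    return tuple(x + y for x, y in zip(u, v))   # the usual vector addition

u = (1, 2, 3)
v = (4, 5, 6)
lam, mu = 2, 5

# (V1): lam . (u + v) == (lam . u) + (lam . v)
assert smul(lam, vadd(u, v)) == vadd(smul(lam, u), smul(lam, v))
# (V3): (lam * mu) . u == lam . (mu . u)
assert smul(lam * mu, u) == smul(lam, smul(mu, u))
# (V4) fails: 1 . u != u
assert smul(1, u) != u
```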
The field R itself can be viewed as a vector space over itself, addition of vectors being
addition in the field, and multiplication by a scalar being multiplication in the field.
Example 2.3.
1. The fields R and C are vector spaces over R.
2. The groups Rn and Cn are vector spaces over R, with scalar multiplication given by
λ · (x1, . . . , xn) = (λx1, . . . , λxn),
3. The ring R[X]_n of polynomials of degree at most n with real coefficients is a vector
space over R, where given a polynomial

P(X) = a_m X^m + a_{m−1} X^{m−1} + · · · + a_1 X + a_0

(with m ≤ n) and a scalar λ ∈ R, the scalar multiple λ · P(X) is given by

λ · P(X) = λa_m X^m + λa_{m−1} X^{m−1} + · · · + λa_1 X + λa_0.
4. The ring R[X] of all polynomials with real coefficients is a vector space over R, and the
ring C[X] of all polynomials with complex coefficients is a vector space over C, with
the same scalar multiplication as above.
5. The ring of n × n matrices Mn(R) is a vector space over R.
6. The ring of m × n matrices Mm,n(R) is a vector space over R.
7. The ring C(a, b) of continuous functions f : (a, b) → R is a vector space over R, with
the scalar multiplication λf of a function f : (a, b) → R by a scalar λ ∈ R given by

(λf)(x) = λf(x) for all x ∈ (a, b).
8. A very important example of vector space is the set of linear maps between two vector
spaces to be defined in Section 2.7. Here is an example that will prepare us for the
vector space of linear maps. Let X be any nonempty set and let E be a vector space.
The set of all functions f : X → E can be made into a vector space as follows: Given
any two functions f : X → E and g : X → E, let (f + g) : X → E be defined such that

(f + g)(x) = f(x) + g(x)

for all x ∈ X, and for every λ ∈ R, let λf : X → E be defined such that

(λf)(x) = λf(x)

for all x ∈ X.
Let E be a vector space. We would like to define the important notions of linear combi-
nation and linear independence.
Before defining these notions, we need to discuss a strategic choice which, depending
how it is settled, may reduce or increase headaches in dealing with notions such as linear
combinations and linear dependence (or independence). The issue has to do with using sets
of vectors versus sequences of vectors.
2.3 Indexed Families; the Sum Notation Σ_{i∈I} a_i
Our experience tells us that it is preferable to use sequences of vectors; even better, indexed
families of vectors. (We are not alone in having opted for sequences over sets, and we are in
good company; for example, Artin [3], Axler [4], and Lang [40] use sequences. Nevertheless,
some prominent authors such as Lax [43] use sets. We leave it to the reader to conduct a
survey on this issue.)
Given a set A, recall that a sequence is an ordered n-tuple (a1, . . . , an) ∈ A^n of elements
from A, for some natural number n. The elements of a sequence need not be distinct and
the order is important. For example, (a1 , a2 , a1 ) and (a2 , a1 , a1 ) are two distinct sequences
in A3 . Their underlying set is {a1 , a2 }.
What we just defined are finite sequences, which can also be viewed as functions from
{1, 2, . . . , n} to the set A; the ith element of the sequence (a1 , . . . , an ) is the image of i under
the function. This viewpoint is fruitful, because it allows us to define (countably) infinite
sequences as functions s : N → A. But then, why limit ourselves to ordered sets such as
{1, . . . , n} or N as index sets?
The main role of the index set is to tag each element uniquely, and the order of the tags
is not crucial, although convenient. Thus, it is natural to define the notion of indexed family.
Definition 2.5. Given a set A, an I-indexed family of elements of A, for short a family,
is a function a : I → A where I is any set viewed as an index set. Since the function a is
determined by its graph
{(i, a(i)) | i ∈ I},
the family a can be viewed as the set of pairs a = {(i, a(i)) | i ∈ I}. For notational simplicity,
we write a_i instead of a(i), and denote the family a = {(i, a(i)) | i ∈ I} by (a_i)_{i∈I}.
38 CHAPTER 2. VECTOR SPACES, BASES, LINEAR MAPS
For example, with I = {r, g, b}, the set of pairs {(r, 2), (g, 3), (b, 2)}
is an indexed family. The element 2 appears twice in the family with the two distinct tags
r and b.
When the index set I is totally ordered, a family (a_i)_{i∈I} is often called an I-sequence.
Interestingly, sets can be viewed as special cases of families. Indeed, a set A can be viewed
as the A-indexed family {(a, a) | a ∈ A} corresponding to the identity function.
Remark: An indexed family should not be confused with a multiset. Given any set A, a
multiset is similar to a set, except that elements of A may occur more than once. For
example, if A = {a, b, c, d}, then {a, a, a, b, c, c, d, d} is a multiset. Each element appears
with a certain multiplicity, but the order of the elements does not matter. For example, a
has multiplicity 3. Formally, a multiset is a function s : A → ℕ, or equivalently a set of pairs
{(a, s(a)) | a ∈ A}. Thus, a multiset is an A-indexed family of elements from ℕ, but not an
ℕ-indexed family, since distinct elements may have the same multiplicity (such as c and d in
the example above). An indexed family is a generalization of a sequence, but a multiset is a
generalization of a set.
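As a programming aside (not in the original text), the distinction is easy to see in code: an indexed family is naturally a dictionary from tags to elements, while a multiset is a map from elements to multiplicities. The tags r, g, b below are the made-up ones from the example above.

```python
from collections import Counter

# An I-indexed family a : I -> A as a dictionary; the tags are arbitrary labels.
family = {"r": 2, "g": 3, "b": 2}   # the element 2 appears twice, under tags "r" and "b"

# A multiset over A = {a, b, c, d} as a function s : A -> N (element -> multiplicity).
multiset = Counter("aaabccdd")

assert family["r"] == family["b"] == 2       # distinct tags, same element
assert multiset["a"] == 3                    # a has multiplicity 3
assert multiset["c"] == multiset["d"] == 2   # distinct elements, same multiplicity
```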
We also need to take care of an annoying technicality, which is to define sums of the
form Σ_{i∈I} a_i, where I is any finite index set and (a_i)_{i∈I} is a family of elements in some set
A equipped with a binary operation + : A × A → A which is associative (Axiom (G1)) and
commutative. This will come up when we define linear combinations.
The issue is that the binary operation + only tells us how to compute a1 + a2 for two
elements of A, but it does not tell us what is the sum of three or more elements. For example,
how should a1 + a2 + a3 be defined?
What we have to do is to define a1 + a2 + a3 by using a sequence of steps each involving two
elements, and there are two possible ways to do this: a1 + (a2 + a3) and (a1 + a2) + a3. If our
operation + is not associative, these are different values. If it is associative, then a1 + (a2 + a3) =
(a1 + a2) + a3, but then there are still six possible permutations of the indices 1, 2, 3, and if
+ is not commutative, these values are generally different. If our operation is commutative,
then all six permutations have the same value. Thus, if + is associative and commutative,
it seems intuitively clear that a sum of the form Σ_{i∈I} a_i does not depend on the order of the
operations used to compute it.
This is indeed the case, but a rigorous proof requires induction, and such a proof is
surprisingly involved. Readers may accept without proof the fact that sums of the form
Σ_{i∈I} a_i are indeed well defined, and jump directly to Definition 2.6. For those who want to
see the gory details, here we go.
First, we define sums Σ_{i∈I} a_i, where I is a finite sequence of distinct natural numbers,
say I = (i1, …, im). If I = (i1, …, im) with m ≥ 2, we denote the sequence (i2, …, im) by
I − i1, and we define the sum recursively: Σ_{i∈I} a_i = a_{i1} if m = 1, and
Σ_{i∈I} a_i = a_{i1} + (Σ_{i∈(I−i1)} a_i) if m ≥ 2.
If the operation + is not associative, the grouping of the terms matters. For instance, in
general
a1 + (a2 + (a3 + a4)) ≠ (a1 + a2) + (a3 + a4).
However, if the operation + is associative, the sum Σ_{i∈I} a_i should not depend on the grouping
of the elements in I, as long as their order is preserved. For example, if I = (1, 2, 3, 4, 5),
J1 = (1, 2), and J2 = (3, 4, 5), we expect that
Σ_{i∈I} a_i = (Σ_{j∈J1} a_j) + (Σ_{j∈J2} a_j).
and, by the induction hypothesis applied to the sequence I − i1 (where I′_{k1} = I_{k1} − i1
and J = K − {k1}),
Σ_{i∈(I−i1)} a_i = (Σ_{α∈I′_{k1}} a_α) + (Σ_{j∈J} (Σ_{α∈I_j} a_α)).
If we add the righthand side to a_{i1}, using associativity and the definition of an indexed sum,
we get
a_{i1} + ((Σ_{α∈I′_{k1}} a_α) + (Σ_{j∈J} (Σ_{α∈I_j} a_α))) = (a_{i1} + (Σ_{α∈I′_{k1}} a_α)) + (Σ_{j∈J} (Σ_{α∈I_j} a_α))
= (Σ_{α∈I_{k1}} a_α) + (Σ_{j∈J} (Σ_{α∈I_j} a_α))
= Σ_{k∈K} (Σ_{α∈I_k} a_α),
as claimed.
If I = (1, …, n), we also write Σ_{i=1}^{n} a_i instead of Σ_{i∈I} a_i. Since + is associative, Propo-
sition 2.5 shows that the sum Σ_{i=1}^{n} a_i is independent of the grouping of its elements, which
justifies the use of the notation a1 + · · · + an (without any parentheses).
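As a quick numerical aside (not in the original text), the independence of the grouping for an associative operation is easy to check mechanically; here is a sketch with ordinary integer addition and a made-up list of terms:

```python
a = [5, 1, 4, 2, 3]

# Two fully nested groupings of a1 + a2 + a3 + a4 + a5:
left_nested = (((a[0] + a[1]) + a[2]) + a[3]) + a[4]
right_nested = a[0] + (a[1] + (a[2] + (a[3] + a[4])))

# The block grouping (a1 + a2) + (a3 + a4 + a5), as with J1 = (1, 2), J2 = (3, 4, 5):
blocks = (a[0] + a[1]) + ((a[2] + a[3]) + a[4])

# All groupings agree because integer + is associative.
assert left_nested == right_nested == blocks == 15
```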
If we also assume that our associative binary operation on A is commutative, then we
can show that the sum Σ_{i∈I} a_i does not depend on the ordering of the index set I.
Proposition 2.6. Given any nonempty set A equipped with an associative and commutative
binary operation + : A × A → A, for any two nonempty finite sequences I and J of distinct
natural numbers such that J is a permutation of I (in other words, the underlying sets of I
and J are identical), for every sequence (a_i)_{i∈I} of elements in A, we have
Σ_{α∈I} a_α = Σ_{α∈J} a_α.
then using associativity and commutativity several times (more rigorously, using induction
on i1 − 1), we get
a_{i1} + ((Σ_{i=1}^{i1−1} a_i) + (Σ_{i=i1+1}^{p} a_i)) = (Σ_{i=1}^{i1−1} a_i) + (a_{i1} + (Σ_{i=i1+1}^{p} a_i))
= Σ_{i=1}^{p} a_i,
as claimed.
The cases where i1 = 1 or i1 = p are treated similarly, but in a simpler manner since
either P = () or Q = () (where () denotes the empty sequence).
Having done all this, we can now make sense of sums of the form Σ_{i∈I} a_i, for any finite
index set I and any family a = (a_i)_{i∈I} of elements in A, where A is a set equipped with a
binary operation + which is associative and commutative.
Indeed, since I is finite, it is in bijection with the set {1, …, n} for some n ∈ ℕ, and any
total ordering ≼ on I corresponds to a permutation I_≼ of {1, …, n} (where we identify a
permutation with its image). For any total ordering ≼ on I, we define Σ_{i∈I,≼} a_i as
Σ_{i∈I,≼} a_i = Σ_{j∈I_≼} a_j.
Then for any other total ordering ≼′ on I, we have
Σ_{i∈I,≼′} a_i = Σ_{j∈I_≼′} a_j,
and since I_≼ and I_≼′ are different permutations of {1, …, n}, by Proposition 2.6, we have
Σ_{j∈I_≼} a_j = Σ_{j∈I_≼′} a_j.
Therefore, the sum Σ_{i∈I,≼} a_i does not depend on the total ordering ≼ on I. We define the sum
Σ_{i∈I} a_i as the common value Σ_{i∈I,≼} a_i for all total orderings ≼ of I.
Here are some examples with A = ℝ:
1. If I = {1, 2, 3} and a = {(1, 2), (2, −3), (3, √2)}, then Σ_{i∈I} a_i = 2 − 3 + √2 = −1 + √2.
2. If I = {2, 5, 7} and a = {(2, 2), (5, −3), (7, √2)}, then Σ_{i∈I} a_i = 2 − 3 + √2 = −1 + √2.
3. If I = {r, g, b} and a = {(r, 2), (g, −3), (b, 1)}, then Σ_{i∈I} a_i = 2 − 3 + 1 = 0.
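As a small aside not in the original text, example 3 can be replayed in code: because + on numbers is associative and commutative, summing the family over its unordered index set gives the same value no matter which order the tags are visited in.

```python
# The family of example 3, over the unordered index set I = {"r", "g", "b"}.
a = {"r": 2, "g": -3, "b": 1}

# Sum the values under several different iteration orders of the tags.
totals = {sum(a[i] for i in order)
          for order in [("r", "g", "b"), ("b", "r", "g"), ("g", "b", "r")]}

# Every ordering yields 2 - 3 + 1 = 0, so the set of totals is a singleton.
assert totals == {0}
```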
Given a set A, recall that an I-indexed family (a_i)_{i∈I} of elements of A (for short, a family)
is a function a : I → A, or equivalently a set of pairs {(i, a_i) | i ∈ I}. We agree that when
I = ∅, (a_i)_{i∈I} = ∅. A family (a_i)_{i∈I} is finite if I is finite.
Remark: When considering a family (ai )i2I , there is no reason to assume that I is ordered.
The crucial point is that every element of the family is uniquely indexed by an element of
I. Thus, unless specified otherwise, we do not assume that the elements of an index set are
ordered.
Given two disjoint sets I and J, the union of two families (u_i)_{i∈I} and (v_j)_{j∈J}, denoted as
(u_i)_{i∈I} ∪ (v_j)_{j∈J}, is the family (w_k)_{k∈(I∪J)} defined such that w_k = u_k if k ∈ I, and w_k = v_k
if k ∈ J. Given a family (u_i)_{i∈I} and any element v, we denote by (u_i)_{i∈I} ∪_k (v) the family
(w_i)_{i∈I∪{k}} defined such that w_i = u_i if i ∈ I, and w_k = v, where k is any index such that
k ∉ I. Given a family (u_i)_{i∈I}, a subfamily of (u_i)_{i∈I} is a family (u_j)_{j∈J} where J is any subset
of I.
In this chapter, unless specified otherwise, it is assumed that all families of scalars are
finite (i.e., their index set is finite).
Definition 2.6. Let E be a vector space. A vector v ∈ E is a linear combination of a family
(u_i)_{i∈I} of elements of E iff there is a family (λ_i)_{i∈I} of scalars in ℝ such that
v = Σ_{i∈I} λ_i u_i.
When I = ∅, we stipulate that v = 0. (By Proposition 2.6, sums of the form Σ_{i∈I} λ_i u_i are
well defined.) We say that a family (u_i)_{i∈I} is linearly independent iff for every family (λ_i)_{i∈I}
of scalars in ℝ,
Σ_{i∈I} λ_i u_i = 0 implies that λ_i = 0 for all i ∈ I.
Equivalently, a family (u_i)_{i∈I} is linearly dependent iff there is some family (λ_i)_{i∈I} of scalars
in ℝ such that
Σ_{i∈I} λ_i u_i = 0 and λ_j ≠ 0 for some j ∈ I.
Observe that defining linear combinations for families of vectors rather than for sets of
vectors has the advantage that the vectors being combined need not be distinct. For example,
for I = {1, 2, 3} and the families (u, v, u) and (λ1, λ2, λ3), the linear combination
Σ_{i∈I} λ_i u_i = λ1 u + λ2 v + λ3 u
makes sense. Using sets of vectors in the definition of a linear combination does not allow
such linear combinations; this is too restrictive.
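As an aside not in the original text, here is a small NumPy sketch of a linear combination over a family with a repeated vector; the vectors and scalars are made up:

```python
import numpy as np

u = np.array([1.0, 2.0])
v = np.array([0.0, 1.0])

# The family (u, v, u) with scalars (lam1, lam2, lam3): repeats are allowed.
lam1, lam2, lam3 = 2.0, -1.0, 3.0
combo = lam1 * u + lam2 * v + lam3 * u

# With sets the two copies of u would collapse into one; with families the
# combination is (lam1 + lam3) u + lam2 v, as expected.
assert np.allclose(combo, (lam1 + lam3) * u + lam2 * v)
```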
Unravelling Definition 2.6, a family (u_i)_{i∈I} is linearly dependent iff either I consists of a
single element, say i, and u_i = 0, or |I| ≥ 2 and some u_j in the family can be expressed as
a linear combination of the other vectors in the family. Indeed, in the second case, there is
some family (λ_i)_{i∈I} of scalars in ℝ such that
Σ_{i∈I} λ_i u_i = 0 and λ_j ≠ 0 for some j ∈ I,
and since λ_j ≠ 0, we can solve for u_j, obtaining
u_j = Σ_{i∈(I−{j})} (−λ_j⁻¹ λ_i) u_i.
Observe that one of the reasons for defining linear dependence for families of vectors
rather than for sets of vectors is that our definition allows multiple occurrences of a vector.
This is important because a matrix may contain identical columns, and we would like to say
that these columns are linearly dependent. The definition of linear dependence for sets does
not allow us to do that.
The above also shows that a family (u_i)_{i∈I} is linearly independent iff either I = ∅, or I
consists of a single element i and u_i ≠ 0, or |I| ≥ 2 and no vector u_j in the family can be
expressed as a linear combination of the other vectors in the family.
When I is nonempty, if the family (u_i)_{i∈I} is linearly independent, note that u_i ≠ 0 for
all i ∈ I. Otherwise, if u_i = 0 for some i ∈ I, then we get a nontrivial linear dependence
Σ_{i∈I} λ_i u_i = 0 by picking any nonzero λ_i and letting λ_k = 0 for all k ∈ I with k ≠ i, since
λ_i 0 = 0. If |I| ≥ 2, we must also have u_i ≠ u_j for all i, j ∈ I with i ≠ j, since otherwise we
get a nontrivial linear dependence by picking λ_i = λ and λ_j = −λ for any nonzero λ, and
letting λ_k = 0 for all k ∈ I with k ≠ i, j.
Thus, the definition of linear independence implies that a nontrivial linearly independent
family is actually a set. This explains why certain authors choose to define linear indepen-
dence for sets of vectors. The problem with this approach is that linear dependence, which
is the logical negation of linear independence, is then only defined for sets of vectors. However,
as we pointed out earlier, it is really desirable to define linear dependence for families
allowing multiple occurrences of the same vector.
In the special case where the vectors that we are considering are the columns A^1, …, A^n
of an n × n matrix A (with coefficients in K = ℝ or K = ℂ), linear independence has a
simple characterization in terms of the solutions of the linear system Ax = 0.
Recall that A^1, …, A^n are linearly independent iff for any scalars x1, …, xn ∈ K,
if x1 A^1 + · · · + xn A^n = 0, then x1 = · · · = xn = 0. (∗1)
If we form the column vector x whose coordinates are x1, …, xn ∈ K, then by definition of
Ax,
x1 A^1 + · · · + xn A^n = Ax,
2.4. LINEAR INDEPENDENCE, SUBSPACES 45
so (∗1) is equivalent to
if Ax = 0, then x = 0. (∗2)
In other words, the columns A^1, …, A^n of the matrix A are linearly independent iff the linear
system Ax = 0 has the unique solution x = 0 (the trivial solution).
The above can typically be demonstrated by solving the system Ax = 0 by variable
elimination, and verifying that the only solution obtained is x = 0.
Another way to prove that the linear system Ax = 0 only has the trivial solution x = 0 is
to show that A is invertible by finding explicitly the inverse A⁻¹ of A. Indeed, if A has an
inverse A⁻¹, we have A⁻¹A = AA⁻¹ = I, so multiplying both sides of the equation Ax = 0
on the left by A⁻¹, we obtain
A⁻¹Ax = A⁻¹0 = 0,
and since A⁻¹Ax = Ix = x, we get x = 0.
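As a numerical aside (not in the original text), both ways of checking independence can be sketched in NumPy for a small made-up matrix: the columns are independent exactly when the matrix has full rank, and exhibiting the inverse also forces the trivial solution.

```python
import numpy as np

# A hypothetical 3x3 matrix whose columns we test for linear independence.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])

# First method: Ax = 0 has only the trivial solution iff A has full column rank.
assert np.linalg.matrix_rank(A) == 3

# Second method: exhibit the inverse; then Ax = 0 forces x = A^{-1} 0 = 0.
A_inv = np.linalg.inv(A)
assert np.allclose(A_inv @ A, np.eye(3))
x = np.linalg.solve(A, np.zeros(3))
assert np.allclose(x, np.zeros(3))
```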
The first method can be applied to show linear independence in (2) and (3) of the following
example.
Example 2.4.
1. Any two distinct scalars λ, µ ≠ 0 in ℝ are linearly dependent.
2. In R3 , the vectors (1, 0, 0), (0, 1, 0), and (0, 0, 1) are linearly independent. See Figure
2.7.
Figure 2.7: A visual (arrow) depiction of the red vector (1, 0, 0), the green vector (0, 1, 0),
and the blue vector (0, 0, 1) in R3 .
3. In R4 , the vectors (1, 1, 1, 1), (0, 1, 1, 1), (0, 0, 1, 1), and (0, 0, 0, 1) are linearly indepen-
dent.
4. In R2 , the vectors u = (1, 1), v = (0, 1) and w = (2, 3) are linearly dependent, since
w = 2u + v.
Figure 2.8: A visual (arrow) depiction of the pink vector u = (1, 1), the dark purple vector
v = (0, 1), and the vector sum w = 2u + v.
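A quick NumPy check of item (4), as an aside not in the original text: the dependence w = 2u + v shows up both directly and in the rank of the matrix whose columns are u, v, w.

```python
import numpy as np

u = np.array([1.0, 1.0])
v = np.array([0.0, 1.0])
w = np.array([2.0, 3.0])

# w = 2u + v, so the family (u, v, w) is linearly dependent.
assert np.allclose(2 * u + v, w)

# Equivalently, the 2x3 matrix with columns u, v, w has rank 2 < 3.
M = np.column_stack([u, v, w])
assert np.linalg.matrix_rank(M) == 2
```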
When I is finite, we often assume that it is the set I = {1, 2, . . . , n}. In this case, we
denote the family (ui )i2I as (u1 , . . . , un ).
The notion of a subspace of a vector space is defined as follows.
Definition 2.7. Given a vector space E, a subset F of E is a linear subspace (or subspace)
of E iff F is nonempty and λu + µv ∈ F for all u, v ∈ F and all λ, µ ∈ ℝ.
It is easy to see that a subspace F of E is indeed a vector space, since the restriction
of + : E × E → E to F × F is indeed a function + : F × F → F, and the restriction of
· : ℝ × E → E to ℝ × F is indeed a function · : ℝ × F → F.
Since a subspace F is nonempty, if we pick any vector u ∈ F and if we let λ = µ = 0,
then λu + µu = 0u + 0u = 0, so every subspace contains the vector 0.
The following facts also hold. The proof is left as an exercise.
Proposition 2.7.
(1) The intersection of any family (even infinite) of subspaces of a vector space E is a
subspace.
(2) Let F be any subspace of a vector space E. For any nonempty finite index set I,
if (u_i)_{i∈I} is any family of vectors u_i ∈ F and (λ_i)_{i∈I} is any family of scalars, then
Σ_{i∈I} λ_i u_i ∈ F.
The subspace {0} will be denoted by (0), or even 0 (with a mild abuse of notation).
Example 2.5.
1. In ℝ², the set of vectors u = (x, y) such that
x + y = 0
is the subspace illustrated by Figure 2.9.
Figure 2.9: The subspace x + y = 0 is the line through the origin with slope −1. It consists
of all vectors of the form λ(−1, 1).
2. In ℝ³, the set of vectors u = (x, y, z) such that
x + y + z = 0
is the subspace illustrated by Figure 2.10.
3. For any n ≥ 0, the set of polynomials f(X) ∈ ℝ[X] of degree at most n is a subspace
of ℝ[X].
Proposition 2.8. Given any vector space E, if S is any nonempty subset of E, then the
smallest subspace ⟨S⟩ (or Span(S)) of E containing S is the set of all (finite) linear combi-
nations of elements from S.
Proof. We prove that the set Span(S) of all linear combinations of elements of S is a subspace
of E, leaving as an exercise the verification that every subspace containing S also contains
Span(S).
Figure 2.10: The subspace x + y + z = 0 is the plane through the origin with normal (1, 1, 1).
First, Span(S) is nonempty since it contains S (which is nonempty). If u = Σ_{i∈I} λ_i u_i
and v = Σ_{j∈J} µ_j v_j are any two linear combinations in Span(S), for any two scalars λ, µ ∈ ℝ,
λu + µv = λ Σ_{i∈I} λ_i u_i + µ Σ_{j∈J} µ_j v_j
= Σ_{i∈I} λλ_i u_i + Σ_{j∈J} µµ_j v_j
= Σ_{i∈(I−J)} λλ_i u_i + Σ_{i∈I∩J} (λλ_i u_i + µµ_i v_i) + Σ_{j∈(J−I)} µµ_j v_j,
which is a linear combination with index set I ∪ J, and thus λu + µv ∈ Span(S), which
proves that Span(S) is a subspace.
One might wonder what happens if we add extra conditions to the coefficients involved
in forming linear combinations. Here are three natural restrictions which turn out to be
important (as usual, we assume that our index sets are finite):
(1) Consider combinations Σ_{i∈I} λ_i u_i for which
Σ_{i∈I} λ_i = 1.
These are called affine combinations. One should realize that every linear combination
Σ_{i∈I} λ_i u_i can be viewed as an affine combination. For example, if k is an index not
in I, if we let J = I ∪ {k}, u_k = 0, and λ_k = 1 − Σ_{i∈I} λ_i, then Σ_{j∈J} λ_j u_j is an affine
combination and
Σ_{i∈I} λ_i u_i = Σ_{j∈J} λ_j u_j.
2.5. BASES OF A VECTOR SPACE 49
However, we get new spaces. For example, in R3 , the set of all affine combinations of
the three vectors e1 = (1, 0, 0), e2 = (0, 1, 0), and e3 = (0, 0, 1), is the plane passing
through these three points. Since it does not contain 0 = (0, 0, 0), it is not a linear
subspace.
(2) Consider combinations Σ_{i∈I} λ_i u_i for which
λ_i ≥ 0, for all i ∈ I.
These are called positive (or conic) combinations. It turns out that positive combina-
tions of families of vectors are cones. They show up naturally in convex optimization.
(3) Consider combinations Σ_{i∈I} λ_i u_i for which we require (1) and (2), that is
Σ_{i∈I} λ_i = 1, and λ_i ≥ 0 for all i ∈ I.
These are called convex combinations. Given any finite family of vectors, the set of all
convex combinations of these vectors is a convex polyhedron. Convex polyhedra play a
very important role in convex optimization.
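As an aside not in the original text, the affine and convex restrictions are easy to illustrate numerically. With the standard basis vectors e1, e2, e3 of ℝ³ (and made-up coefficients), an affine combination lands on the plane x + y + z = 1, which misses the origin; nonnegative coefficients make it a convex combination, a point of the triangle with vertices e1, e2, e3.

```python
import numpy as np

e1, e2, e3 = np.eye(3)

# An affine combination of e1, e2, e3: the coefficients sum to 1.
lam = np.array([0.5, 0.3, 0.2])
p = lam[0] * e1 + lam[1] * e2 + lam[2] * e3
assert np.isclose(lam.sum(), 1.0)
assert np.isclose(p.sum(), 1.0)   # p lies on the plane x + y + z = 1, so p != 0

# The coefficients are also nonnegative, so p is a convex combination.
assert np.all(lam >= 0)
```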
Remark: The notion of linear combination can also be defined for infinite index sets I.
To ensure that a sum Σ_{i∈I} λ_i u_i makes sense, we restrict our attention to families of finite
support.
Definition 2.8. Given any field K, a family of scalars (λ_i)_{i∈I} has finite support if λ_i = 0
for all i ∈ I − J, for some finite subset J of I.
If (λ_i)_{i∈I} is a family of scalars of finite support, for any vector space E over K, for any
(possibly infinite) family (u_i)_{i∈I} of vectors u_i ∈ E, we define the linear combination Σ_{i∈I} λ_i u_i
as the finite linear combination Σ_{j∈J} λ_j u_j, where J is any finite subset of I such that λ_i = 0
for all i ∈ I − J. In general, results stated for finite families also hold for families of finite
support.
Definition 2.9. Given a vector space E and a subspace V of E, a family (v_i)_{i∈I} of vectors
v_i ∈ V spans V or generates V iff for every v ∈ V, there is some family (λ_i)_{i∈I} of scalars in
ℝ such that
v = Σ_{i∈I} λ_i v_i.
We also say that the elements of (v_i)_{i∈I} are generators of V and that V is spanned by (v_i)_{i∈I},
or generated by (v_i)_{i∈I}. If a subspace V of E is generated by a finite family (v_i)_{i∈I}, we say
that V is finitely generated. A family (u_i)_{i∈I} that spans V and is linearly independent is
called a basis of V.
Example 2.6.
1. In ℝ³, the vectors (1, 0, 0), (0, 1, 0), and (0, 0, 1), illustrated in Figure 2.7, form a basis.
2. The vectors (1, 1, 1, 1), (1, 1, −1, −1), (1, −1, 0, 0), (0, 0, 1, −1) form a basis of ℝ⁴ known
as the Haar basis. This basis and its generalization to dimension 2ⁿ are crucial in
wavelet theory.
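As a numerical aside (not in the original text), the Haar vectors can be checked to be a basis by stacking them as the columns of a matrix: four linearly independent vectors in ℝ⁴ give an invertible 4 × 4 matrix.

```python
import numpy as np

# The four Haar vectors of Example 2.6 (2), as columns of W.
W = np.array([[1.0,  1.0,  1.0,  0.0],
              [1.0,  1.0, -1.0,  0.0],
              [1.0, -1.0,  0.0,  1.0],
              [1.0, -1.0,  0.0, -1.0]])

# Full rank and nonzero determinant: the columns form a basis of R^4.
assert np.linalg.matrix_rank(W) == 4
assert not np.isclose(np.linalg.det(W), 0.0)

# The columns are in fact pairwise orthogonal (a hallmark of the Haar basis).
G = W.T @ W
assert np.allclose(G, np.diag(np.diag(G)))
```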
The first key result of linear algebra is that every vector space E has a basis. We begin
with a crucial lemma which formalizes the mechanism for building a basis incrementally.
Lemma 2.9. Given a linearly independent family (u_i)_{i∈I} of elements of a vector space E, if
v ∈ E is not a linear combination of (u_i)_{i∈I}, then the family (u_i)_{i∈I} ∪_k (v) obtained by adding
v to the family (u_i)_{i∈I} is linearly independent (where k ∉ I).
Proof. Assume that µv + Σ_{i∈I} λ_i u_i = 0, for any family (λ_i)_{i∈I} of scalars in ℝ. If µ ≠ 0, then
µ has an inverse (because ℝ is a field), and thus we have v = −Σ_{i∈I} (µ⁻¹λ_i) u_i, showing that
v is a linear combination of (u_i)_{i∈I} and contradicting the hypothesis. Thus, µ = 0. But then,
we have Σ_{i∈I} λ_i u_i = 0, and since the family (u_i)_{i∈I} is linearly independent, we have λ_i = 0
for all i ∈ I.
The next theorem holds in general, but the proof is more sophisticated for vector spaces
that do not have a finite set of generators. Thus, in this chapter, we only prove the theorem
for finitely generated vector spaces.
Theorem 2.10. Given any finite family S = (u_i)_{i∈I} generating a vector space E and any
linearly independent subfamily L = (u_j)_{j∈J} of S (where J ⊆ I), there is a basis B of E such
that L ⊆ B ⊆ S.
Proof. Consider the set of linearly independent families B such that L ⊆ B ⊆ S. Since this
set is nonempty and finite, it has some maximal element (that is, a subfamily B = (u_h)_{h∈H}
of S with H ⊆ I of maximum cardinality), say B = (u_h)_{h∈H}. We claim that B generates
E. Indeed, if B does not generate E, then there is some u_p ∈ S that is not a linear
combination of vectors in B (since S generates E), with p ∉ H. Then by Lemma 2.9, the
family B′ = (u_h)_{h∈H∪{p}} is linearly independent, and since L ⊆ B ⊂ B′ ⊆ S, this contradicts
the maximality of B. Thus, B is a basis of E such that L ⊆ B ⊆ S.
Remark: Theorem 2.10 also holds for vector spaces that are not finitely generated. In this
case, the problem is to guarantee the existence of a maximal linearly independent family B
such that L ⊆ B ⊆ S. The existence of such a maximal family can be shown using Zorn's
lemma; see Lang [40] (Theorem 5.1).
A situation where the full generality of Theorem 2.10 is needed is the case of the vector
space ℝ over the field of coefficients ℚ. The numbers 1 and √2 are linearly independent
over ℚ, so according to Theorem 2.10, the linearly independent family L = (1, √2) can be
extended to a basis B of ℝ. Since ℝ is uncountable and ℚ is countable, such a basis must
be uncountable!
The notion of a basis can also be defined in terms of the notion of maximal linearly
independent family and minimal generating family.
Definition 2.10. Let (v_i)_{i∈I} be a family of vectors in a vector space E. We say that (v_i)_{i∈I}
is a maximal linearly independent family of E if it is linearly independent, and if for any vector
w ∈ E, the family (v_i)_{i∈I} ∪_k (w) obtained by adding w to the family (v_i)_{i∈I} is linearly
dependent. We say that (v_i)_{i∈I} is a minimal generating family of E if it spans E, and if for
any index p ∈ I, the family (v_i)_{i∈(I−{p})} obtained by removing v_p from the family (v_i)_{i∈I} does
not span E.
Proposition 2.11. Given a vector space E, for any family B = (v_i)_{i∈I} of vectors of E, the
following properties are equivalent:
(1) B is a basis of E.
(2) B is a maximal linearly independent family of E.
(3) B is a minimal generating family of E.
Proof. We will first prove the equivalence of (1) and (2). Assume (1). Since B is a basis, it is
a linearly independent family. We claim that B is a maximal linearly independent family. If
B is not a maximal linearly independent family, then there is some vector w ∈ E such that
the family B′ obtained by adding w to B is linearly independent. However, since B is a basis
of E, the vector w is a linear combination of B, so B′ is linearly dependent, a contradiction.
The second key result of linear algebra is that for any two bases (ui )i2I and (vj )j2J of a
vector space E, the index sets I and J have the same cardinality. In particular, if E has a
finite basis of n elements, every basis of E has n elements, and the integer n is called the
dimension of the vector space E.
To prove the second key result, we can use the following replacement lemma due to
Steinitz. This result shows the relationship between finite linearly independent families and
finite families of generators of a vector space. We begin with a version of the lemma which is
a bit informal, but easier to understand than the precise and more formal formulation given
in Proposition 2.13. The technical difficulty has to do with the fact that some of the indices
need to be renamed.
Proposition 2.12. (Replacement lemma, version 1) Given a vector space E, let (u1, …, um)
be any finite linearly independent family in E, and let (v1, …, vn) be any finite family such
that every u_i is a linear combination of (v1, …, vn). Then we must have m ≤ n, and there
is a replacement of m of the vectors v_j by (u1, …, um), such that after renaming some of the
indices of the v_j's, the families (u1, …, um, v_{m+1}, …, v_n) and (v1, …, vn) generate the same
subspace of E.
(v1, …, vn) generate the same subspace of E. The vector u_{m+1} can also be expressed as a
linear combination of (v1, …, vn), and since (u1, …, um, v_{m+1}, …, v_n) and (v1, …, vn) generate
the same subspace, u_{m+1} can be expressed as a linear combination of (u1, …, um, v_{m+1}, …, v_n),
say
u_{m+1} = Σ_{i=1}^{m} λ_i u_i + Σ_{j=m+1}^{n} λ_j v_j.
a nontrivial linear dependence of the u_i, which is impossible since (u1, …, u_{m+1}) are linearly
independent.
Therefore, m + 1 ≤ n, and after renaming indices if necessary, we may assume that
λ_{m+1} ≠ 0, so we get
v_{m+1} = −Σ_{i=1}^{m} (λ_{m+1}⁻¹ λ_i) u_i + λ_{m+1}⁻¹ u_{m+1} − Σ_{j=m+2}^{n} (λ_{m+1}⁻¹ λ_j) v_j.
Observe that the families (u1, …, um, v_{m+1}, …, v_n) and (u1, …, u_{m+1}, v_{m+2}, …, v_n) generate
the same subspace, since u_{m+1} is a linear combination of (u1, …, um, v_{m+1}, …, v_n) and v_{m+1}
is a linear combination of (u1, …, u_{m+1}, v_{m+2}, …, v_n). Since (u1, …, um, v_{m+1}, …, v_n) and
(v1, …, vn) generate the same subspace, we conclude that (u1, …, u_{m+1}, v_{m+2}, …, v_n) and
(v1, …, vn) generate the same subspace, which concludes the induction step.
For example, consider a linearly independent family (u1, u2, u3) and a family (v1, v2, v3, v4, v5)
with
u1 = v4 + v5
u2 = v3 + v4 − v5
u3 = v1 + v2 + v3.
From the first equation, v4 = u1 − v5, and substituting into the second equation we get
u2 = v3 + v4 − v5 = v3 + u1 − v5 − v5 = u1 + v3 − 2v5.
Thus
v3 = −u1 + u2 + 2v5,
and so
u3 = v1 + v2 + v3 = v1 + v2 − u1 + u2 + 2v5.
Finally, we get
v1 = u1 − u2 + u3 − v2 − 2v5.
Therefore we have
v1 = u1 − u2 + u3 − v2 − 2v5
v3 = −u1 + u2 + 2v5
v4 = u1 − v5,
which shows that (u1, u2, u3, v2, v5) spans the same subspace as (v1, v2, v3, v4, v5). The vectors
(v1, v3, v4) have been replaced by (u1, u2, u3), and the vectors left over are (v2, v5). We can
rename them (v4, v5).
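As an aside not in the original text, the equal-span conclusion of this example can be verified numerically. The v_j here are arbitrary; taking them to be the standard basis of ℝ⁵ is one concrete choice that satisfies the three defining relations.

```python
import numpy as np

# A concrete instance: let v1, ..., v5 be the standard basis of R^5.
v1, v2, v3, v4, v5 = np.eye(5)
u1 = v4 + v5
u2 = v3 + v4 - v5
u3 = v1 + v2 + v3

old = np.column_stack([v1, v2, v3, v4, v5])
new = np.column_stack([u1, u2, u3, v2, v5])

# Equal spans: both families have rank 5, and stacking them adds nothing.
assert np.linalg.matrix_rank(old) == 5
assert np.linalg.matrix_rank(new) == 5
assert np.linalg.matrix_rank(np.column_stack([old, new])) == 5
```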
For the sake of completeness, here is a more formal statement of the replacement lemma
(and its proof).
Proposition 2.13. (Replacement lemma, version 2) Given a vector space E, let (u_i)_{i∈I} be
any finite linearly independent family in E, where |I| = m, and let (v_j)_{j∈J} be any finite family
such that every u_i is a linear combination of (v_j)_{j∈J}, where |J| = n. Then there exists a set
L and an injection ρ : L → J (a relabeling function) such that L ∩ I = ∅, |L| = n − m, and
the families (u_i)_{i∈I} ∪ (v_{ρ(l)})_{l∈L} and (v_j)_{j∈J} generate the same subspace of E. In particular,
m ≤ n.
Proof. We proceed by induction on |I| = m. When m = 0, the family (u_i)_{i∈I} is empty, and
the proposition holds trivially with L = J (ρ is the identity). Assume |I| = m + 1. Consider
the linearly independent family (u_i)_{i∈(I−{p})}, where p is any member of I. By the induction
hypothesis, there exists a set L and an injection ρ : L → J such that L ∩ (I − {p}) = ∅,
|L| = n − m, and the families (u_i)_{i∈(I−{p})} ∪ (v_{ρ(l)})_{l∈L} and (v_j)_{j∈J} generate the same subspace
of E. If p ∈ L, we can replace L by (L − {p}) ∪ {p′} where p′ does not belong to I ∪ L, and
replace ρ by the injection ρ′ which agrees with ρ on L − {p} and such that ρ′(p′) = ρ(p).
Thus, we can always assume that L ∩ I = ∅. Since u_p is a linear combination of (v_j)_{j∈J}
and the families (u_i)_{i∈(I−{p})} ∪ (v_{ρ(l)})_{l∈L} and (v_j)_{j∈J} generate the same subspace of E, u_p is
a linear combination of (u_i)_{i∈(I−{p})} ∪ (v_{ρ(l)})_{l∈L}. Let
u_p = Σ_{i∈(I−{p})} λ_i u_i + Σ_{l∈L} λ_l v_{ρ(l)}. (1)
contradicting the fact that (u_i)_{i∈I} is linearly independent. Thus, λ_l ≠ 0 for some l ∈ L, say
l = q. Since λ_q ≠ 0, we have
v_{ρ(q)} = Σ_{i∈(I−{p})} (−λ_q⁻¹ λ_i) u_i + λ_q⁻¹ u_p + Σ_{l∈(L−{q})} (−λ_q⁻¹ λ_l) v_{ρ(l)}. (2)
We claim that the families (u_i)_{i∈(I−{p})} ∪ (v_{ρ(l)})_{l∈L} and (u_i)_{i∈I} ∪ (v_{ρ(l)})_{l∈(L−{q})} generate the
same subspace of E. Indeed, the second family is obtained from the first by replacing v_{ρ(q)} by u_p,
and vice-versa, and u_p is a linear combination of (u_i)_{i∈(I−{p})} ∪ (v_{ρ(l)})_{l∈L}, by (1), and v_{ρ(q)} is a
linear combination of (u_i)_{i∈I} ∪ (v_{ρ(l)})_{l∈(L−{q})}, by (2). Thus, the families (u_i)_{i∈I} ∪ (v_{ρ(l)})_{l∈(L−{q})}
and (v_j)_{j∈J} generate the same subspace of E, and the proposition holds for L − {q} and the
restriction of the injection ρ : L → J to L − {q}, since L ∩ I = ∅ and |L| = n − m imply that
(L − {q}) ∩ I = ∅ and |L − {q}| = n − (m + 1).
The idea is that m of the vectors v_j can be replaced by the linearly independent u_i's in
such a way that the same subspace is still generated. The purpose of the function ρ : L → J
is to pick n − m elements j1, …, j_{n−m} of J and to relabel them l1, …, l_{n−m} in such a way
that these new indices do not clash with the indices in I; this way, the vectors v_{j1}, …, v_{j_{n−m}}
which "survive" (i.e., are not replaced) are relabeled v_{l1}, …, v_{l_{n−m}}, and the other m vectors v_j
with j ∈ J − {j1, …, j_{n−m}} are replaced by the u_i. The index set of this new family is I ∪ L.
Actually, one can prove that Proposition 2.13 implies Theorem 2.10 when the vector
space is finitely generated. Putting Theorem 2.10 and Proposition 2.13 together, we obtain
the following fundamental theorem.
Theorem 2.14. Let E be a finitely generated vector space. Any family (u_i)_{i∈I} generating E
contains a subfamily (u_j)_{j∈J} which is a basis of E. Any linearly independent family (u_i)_{i∈I}
can be extended to a family (u_j)_{j∈J} which is a basis of E (with I ⊆ J). Furthermore, for
every two bases (u_i)_{i∈I} and (v_j)_{j∈J} of E, we have |I| = |J| = n for some fixed integer n ≥ 0.
Proof. The first part follows immediately by applying Theorem 2.10 with L = ∅ and S =
(u_i)_{i∈I}. For the second part, consider the family S′ = (u_i)_{i∈I} ∪ (v_h)_{h∈H}, where (v_h)_{h∈H} is any
finite family generating E, and with I ∩ H = ∅. Then apply Theorem 2.10 to
L = (u_i)_{i∈I} and to S′. For the last statement, assume that (u_i)_{i∈I} and (v_j)_{j∈J} are bases of
E. Since (u_i)_{i∈I} is linearly independent and (v_j)_{j∈J} spans E, Proposition 2.13 implies that
|I| ≤ |J|. A symmetric argument yields |J| ≤ |I|.
Remark: Theorem 2.14 also holds for vector spaces that are not finitely generated.
Definition 2.11. When a vector space E is not finitely generated, we say that E is of infinite
dimension. The dimension of a finitely generated vector space E is the common dimension
n of all of its bases and is denoted by dim(E).
Clearly, if the field ℝ itself is viewed as a vector space, then every family (a) where a ∈ ℝ
and a ≠ 0 is a basis. Thus dim(ℝ) = 1. Note that dim({0}) = 0.
Let (u_i)_{i∈I} be a basis of a vector space E. For any vector v ∈ E, since the family (u_i)_{i∈I}
generates E, there is a family (λ_i)_{i∈I} of scalars in ℝ, such that
v = Σ_{i∈I} λ_i u_i.
An important fact is that the family (λ_i)_{i∈I} is unique iff the family (u_i)_{i∈I} is linearly
independent.
Proof. First, assume that (u_i)_{i∈I} is linearly independent. If (µ_i)_{i∈I} is another family of scalars
in ℝ such that v = Σ_{i∈I} µ_i u_i, then we have
Σ_{i∈I} (λ_i − µ_i) u_i = 0,
and since (u_i)_{i∈I} is linearly independent, we must have λ_i − µ_i = 0 for all i ∈ I, that is, λ_i = µ_i
for all i ∈ I. The converse is shown by contradiction. If (u_i)_{i∈I} was linearly dependent, there
would be a family (µ_i)_{i∈I} of scalars not all null such that
Σ_{i∈I} µ_i u_i = 0
and µ_j ≠ 0 for some j ∈ I. But then we would also have
v = Σ_{i∈I} λ_i u_i = Σ_{i∈I} (λ_i + µ_i) u_i,
with λ_j ≠ λ_j + µ_j since µ_j ≠ 0, contradicting the assumption that (λ_i)_{i∈I} is the unique family
such that v = Σ_{i∈I} λ_i u_i.
Definition 2.13. If (u_i)_{i∈I} is a basis of a vector space E, for any vector v ∈ E, if (x_i)_{i∈I} is
the unique family of scalars in ℝ such that
v = Σ_{i∈I} x_i u_i,
each x_i is called the component (or coordinate) of index i of v with respect to the basis (u_i)_{i∈I}.
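As a computational aside (not in the original text), finding the components of v with respect to a basis of ℝ³ amounts to solving a linear system: if the basis vectors are the columns of U, then v = x1 u1 + x2 u2 + x3 u3 reads Ux = v, and the solution x is unique precisely because the columns of U are linearly independent. The basis and vector below are made up.

```python
import numpy as np

# A hypothetical basis of R^3, stored as the columns of U.
U = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
v = np.array([2.0, 3.0, 1.0])

# The components of v with respect to this basis solve U x = v.
x = np.linalg.solve(U, v)
assert np.allclose(U @ x, v)
assert np.allclose(x, [0.0, 2.0, 1.0])   # v = 0*u1 + 2*u2 + 1*u3
```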
2.6 Matrices
In Section 2.1 we introduced informally the notion of a matrix. In this section we define
matrices precisely, and also introduce some operations on matrices. It turns out that matri-
ces form a vector space equipped with a multiplication operation which is associative, but
noncommutative. We will explain in Section 3.1 how matrices can be used to represent linear
maps, defined in the next section.
(a_{1 1} · · · a_{1 n})
In these last two cases, we usually omit the constant index 1 (first index in case of a row,
second index in case of a column). The set of all m × n-matrices is denoted by M_{m,n}(K)
or M_{m,n}. An n × n-matrix is called a square matrix of dimension n. The set of all square
matrices of dimension n is denoted by M_n(K), or M_n.
Remark: As defined, a matrix A = (a_{i j})_{1≤i≤m, 1≤j≤n} is a family, that is, a function from
{1, 2, …, m} × {1, 2, …, n} to K. As such, there is no reason to assume an ordering on
the indices. Thus, the matrix A can be represented in many different ways as an array, by
adopting different orders for the rows or the columns. However, it is customary (and usually
convenient) to assume the natural ordering on the sets {1, 2, …, m} and {1, 2, …, n}, and
to represent A as an array according to this ordering of the rows and columns.
We define some operations on matrices as follows.
Definition 2.15. Given two $m\times n$ matrices $A = (a_{ij})$ and $B = (b_{ij})$, we define their sum $A + B$ as the matrix $C = (c_{ij})$ such that $c_{ij} = a_{ij} + b_{ij}$; that is,
$$
\begin{pmatrix}
a_{1\,1} & a_{1\,2} & \cdots & a_{1\,n}\\
a_{2\,1} & a_{2\,2} & \cdots & a_{2\,n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{m\,1} & a_{m\,2} & \cdots & a_{m\,n}
\end{pmatrix}
+
\begin{pmatrix}
b_{1\,1} & b_{1\,2} & \cdots & b_{1\,n}\\
b_{2\,1} & b_{2\,2} & \cdots & b_{2\,n}\\
\vdots & \vdots & \ddots & \vdots\\
b_{m\,1} & b_{m\,2} & \cdots & b_{m\,n}
\end{pmatrix}
=
\begin{pmatrix}
a_{1\,1} + b_{1\,1} & a_{1\,2} + b_{1\,2} & \cdots & a_{1\,n} + b_{1\,n}\\
a_{2\,1} + b_{2\,1} & a_{2\,2} + b_{2\,2} & \cdots & a_{2\,n} + b_{2\,n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{m\,1} + b_{m\,1} & a_{m\,2} + b_{m\,2} & \cdots & a_{m\,n} + b_{m\,n}
\end{pmatrix}.
$$
For any matrix $A = (a_{ij})$, we let $-A$ be the matrix $(-a_{ij})$. Given a scalar $\lambda\in K$, we define the matrix $\lambda A$ as the matrix $C = (c_{ij})$ such that $c_{ij} = \lambda a_{ij}$; that is,
$$
\lambda
\begin{pmatrix}
a_{1\,1} & a_{1\,2} & \cdots & a_{1\,n}\\
a_{2\,1} & a_{2\,2} & \cdots & a_{2\,n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{m\,1} & a_{m\,2} & \cdots & a_{m\,n}
\end{pmatrix}
=
\begin{pmatrix}
\lambda a_{1\,1} & \lambda a_{1\,2} & \cdots & \lambda a_{1\,n}\\
\lambda a_{2\,1} & \lambda a_{2\,2} & \cdots & \lambda a_{2\,n}\\
\vdots & \vdots & \ddots & \vdots\\
\lambda a_{m\,1} & \lambda a_{m\,2} & \cdots & \lambda a_{m\,n}
\end{pmatrix}.
$$
Note that the entry of index $i$ and $j$ of the matrix $AB$ obtained by multiplying the matrices $A$ and $B$ can be identified with the product of the row matrix corresponding to the $i$-th row of $A$ with the column matrix corresponding to the $j$-th column of $B$:
$$
\begin{pmatrix} a_{i\,1} & \cdots & a_{i\,n} \end{pmatrix}
\begin{pmatrix} b_{1\,j}\\ \vdots\\ b_{n\,j} \end{pmatrix}
= \sum_{k=1}^{n} a_{i\,k} b_{k\,j}.
$$
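This entry formula is easy to check numerically; a small sketch (with made-up matrices, not from the text):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])        # 2 x 3
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [2.0, -1.0]])            # 3 x 2

C = A @ B                              # 2 x 2 product

# Entry (i, j) of AB is the dot product of row i of A with column j of B.
for i in range(A.shape[0]):
    for j in range(B.shape[1]):
        assert np.isclose(C[i, j],
                          sum(A[i, k] * B[k, j] for k in range(A.shape[1])))
```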
Definition 2.16. The square matrix $I_n$ of dimension $n$ containing 1 on the diagonal and 0 everywhere else is called the identity matrix. It is denoted by
$$I_n = \begin{pmatrix}
1 & 0 & \cdots & 0\\
0 & 1 & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & 1
\end{pmatrix}.$$
Definition 2.17. Given an $m\times n$ matrix $A = (a_{ij})$, its transpose $A^\top = (a^\top_{j\,i})$ is the $n\times m$ matrix such that $a^\top_{j\,i} = a_{i\,j}$, for all $i$, $1\le i\le m$, and all $j$, $1\le j\le n$.
The following observation will be useful later on when we discuss the SVD. Given any $m\times n$ matrix $A$ and any $n\times p$ matrix $B$, if we denote the columns of $A$ by $A_1,\dots,A_n$ and the rows of $B$ by $B_1,\dots,B_n$, then we have
$$AB = A_1B_1 + \cdots + A_nB_n.$$
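A quick numerical check of this column-row (outer product) expansion, with random matrices (a sketch, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2))   # columns A1, A2
B = rng.standard_normal((2, 4))   # rows B1, B2

# AB equals the sum of the outer products (column of A) x (row of B).
expansion = sum(np.outer(A[:, k], B[k, :]) for k in range(A.shape[1]))
assert np.allclose(A @ B, expansion)
```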
Definition 2.18. For any square matrix $A$ of dimension $n$, if a matrix $B$ such that $AB = BA = I_n$ exists, then it is unique, and it is called the inverse of $A$. The matrix $B$ is also denoted by $A^{-1}$. An invertible matrix is also called a nonsingular matrix, and a matrix that is not invertible is called a singular matrix.
Using Proposition 2.20 and the fact that matrices represent linear maps, it can be shown that if a square matrix $A$ has a left inverse, that is a matrix $B$ such that $BA = I$, or a right inverse, that is a matrix $C$ such that $AC = I$, then $A$ is actually invertible; so $B = A^{-1}$ and $C = A^{-1}$. These facts also follow from Proposition 5.14.
Using Proposition 2.3 (or mimicking the computations in its proof), we note that if $A$ and $B$ are two $n\times n$ invertible matrices, then $AB$ is also invertible and $(AB)^{-1} = B^{-1}A^{-1}$.
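The reversed order of the factors in $(AB)^{-1} = B^{-1}A^{-1}$ is easy to confirm numerically; a sketch (shifting random matrices by $3I$ just makes them comfortably invertible, an assumption of this example rather than a guarantee):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3)) + 3.0 * np.eye(3)
B = rng.standard_normal((3, 3)) + 3.0 * np.eye(3)

# The inverse of a product reverses the order of the factors.
assert np.allclose(np.linalg.inv(A @ B),
                   np.linalg.inv(B) @ np.linalg.inv(A))
```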
It is immediately verified that the set $M_{m,n}(K)$ of $m\times n$ matrices is a vector space under addition of matrices and multiplication of a matrix by a scalar.
Definition 2.19. The $m\times n$ matrices $E_{ij} = (e_{hk})$ are defined such that $e_{ij} = 1$, and $e_{hk} = 0$ if $h\neq i$ or $k\neq j$; in other words, the $(i, j)$-entry is equal to 1 and all other entries are 0.
It is clear that every matrix $A = (a_{ij})\in M_{m,n}(K)$ can be written in a unique way as
$$A = \sum_{i=1}^{m}\sum_{j=1}^{n} a_{ij} E_{ij}.$$
Thus, the family $(E_{ij})_{1\le i\le m,\,1\le j\le n}$ is a basis of the vector space $M_{m,n}(K)$, which has dimension $mn$.
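This decomposition can be sketched directly in code (a small illustration, not from the text):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
m, n = A.shape

def E(i, j):
    """The basis matrix E_ij: 1 in entry (i, j), 0 everywhere else."""
    M = np.zeros((m, n))
    M[i, j] = 1.0
    return M

# A is recovered as the sum of its entries times the basis matrices E_ij.
recon = sum(A[i, j] * E(i, j) for i in range(m) for j in range(n))
assert np.allclose(A, recon)
```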
Remark: Definition 2.14 and Definition 2.15 also make perfect sense when $K$ is a (commutative) ring rather than a field. In this more general setting, the framework of vector spaces is too narrow, but we can consider structures over a commutative ring $A$ satisfying all the axioms of Definition 2.4. Such structures are called modules. The theory of modules is (much) more complicated than that of vector spaces. For example, modules do not always have a basis, and other properties holding for vector spaces usually fail for modules. When a module has a basis, it is called a free module. For example, when $A$ is a commutative ring, the structure $A^n$ is a module such that the vectors $e_i$, with $(e_i)_i = 1$ and $(e_i)_j = 0$ for $j\neq i$, form a basis of $A^n$. Many properties of vector spaces still hold for $A^n$. Thus, $A^n$ is a free module. As another example, when $A$ is a commutative ring, $M_{m,n}(A)$ is a free module with basis $(E_{i,j})_{1\le i\le m,\,1\le j\le n}$. Polynomials over a commutative ring also form a free module of infinite dimension.
The properties listed in Proposition 2.16 are easily verified, although some of the computations are a bit tedious. A more conceptual proof is given in Proposition 3.1.
Proposition 2.16. (1) Given any matrices $A\in M_{m,n}(K)$, $B\in M_{n,p}(K)$, and $C\in M_{p,q}(K)$, we have
$$(AB)C = A(BC);$$
that is, matrix multiplication is associative.
(2) Given any matrices $A, B\in M_{m,n}(K)$, and $C, D\in M_{n,p}(K)$, for all $\lambda\in K$, we have
$$(A + B)C = AC + BC$$
$$A(C + D) = AC + AD$$
$$(\lambda A)C = \lambda(AC)$$
$$A(\lambda C) = \lambda(AC),$$
so that matrix multiplication $\cdot\colon M_{m,n}(K)\times M_{n,p}(K)\to M_{m,p}(K)$ is bilinear.
The properties of Proposition 2.16 together with the fact that $AI_n = I_nA = A$ for all square $n\times n$ matrices show that $M_n(K)$ is a ring with unit $I_n$ (in fact, an associative algebra). This is a noncommutative ring with zero divisors, as shown by the following example.
For example, letting $A$ and $B$ be the $2\times 2$ matrices
$$A = \begin{pmatrix}1 & 0\\ 0 & 0\end{pmatrix}, \qquad B = \begin{pmatrix}0 & 0\\ 1 & 0\end{pmatrix},$$
then
$$AB = \begin{pmatrix}1 & 0\\ 0 & 0\end{pmatrix}\begin{pmatrix}0 & 0\\ 1 & 0\end{pmatrix} = \begin{pmatrix}0 & 0\\ 0 & 0\end{pmatrix},$$
and
$$BA = \begin{pmatrix}0 & 0\\ 1 & 0\end{pmatrix}\begin{pmatrix}1 & 0\\ 0 & 0\end{pmatrix} = \begin{pmatrix}0 & 0\\ 1 & 0\end{pmatrix}.$$
Thus $AB \neq BA$, and $AB = 0$, even though both $A, B \neq 0$.
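The same zero-divisor example, checked numerically (a sketch, not from the text):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 0.0]])
B = np.array([[0.0, 0.0],
              [1.0, 0.0]])

# A and B are zero divisors: AB = 0 although neither A nor B is 0.
assert np.allclose(A @ B, 0.0)
# And multiplication is not commutative: BA is nonzero.
assert not np.allclose(B @ A, 0.0)
```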
In the rest of this section, we assume that all vector spaces are real vector spaces, but all
results hold for vector spaces over an arbitrary field.
Definition 2.20. Given two vector spaces $E$ and $F$, a linear map (or linear transformation) between $E$ and $F$ is a function $f\colon E\to F$ satisfying the following two conditions:
$$f(x + y) = f(x) + f(y)\quad\text{for all } x, y\in E;$$
$$f(\lambda x) = \lambda f(x)\quad\text{for all }\lambda\in\mathbb{R},\ x\in E.$$
Setting $x = y = 0$ in the first identity, we get $f(0) = 0$. The basic property of linear maps is that they transform linear combinations into linear combinations. Given any finite family $(u_i)_{i\in I}$ of vectors in $E$, given any family $(\lambda_i)_{i\in I}$ of scalars in $\mathbb{R}$, we have
$$f\Bigl(\sum_{i\in I}\lambda_i u_i\Bigr) = \sum_{i\in I}\lambda_i f(u_i).$$
The above identity is shown by induction on $|I|$ using the properties of Definition 2.20.
Example 2.8.
1. The map $f\colon\mathbb{R}^2\to\mathbb{R}^2$ defined such that
$$x' = x - y$$
$$y' = x + y$$
is a linear map. When we want to be more precise, we write $\mathrm{id}_E$ instead of $\mathrm{id}$.
where C([a, b]) is the set of continuous functions defined on the interval [a, b], is a linear
map.
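The map of item (1) can be written as a matrix acting on coordinate vectors, which makes the two linearity conditions easy to verify numerically (a sketch, not from the text):

```python
import numpy as np

# The map (x, y) |-> (x - y, x + y) from Example 2.8(1), as a matrix.
F = np.array([[1.0, -1.0],
              [1.0,  1.0]])

def f(v):
    return F @ v

u = np.array([2.0, 1.0])
v = np.array([-1.0, 3.0])

# f is additive and homogeneous, hence linear.
assert np.allclose(f(u + v), f(u) + f(v))
assert np.allclose(f(2.5 * u), 2.5 * f(u))
```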
is linear in each of the variables $f$, $g$. It also satisfies the properties $\langle f, g\rangle = \langle g, f\rangle$ and $\langle f, f\rangle = 0$ iff $f = 0$. It is an example of an inner product.
Definition 2.21. Given a linear map $f\colon E\to F$, we define its image (or range) $\mathrm{Im}\,f = f(E)$, as the set
$$\mathrm{Im}\,f = \{y\in F \mid (\exists x\in E)(y = f(x))\},$$
and its kernel (or nullspace) $\mathrm{Ker}\,f = f^{-1}(0)$, as the set
$$\mathrm{Ker}\,f = \{x\in E \mid f(x) = 0\}.$$
The derivative map $D\colon\mathbb{R}[X]\to\mathbb{R}[X]$ from Example 2.8(3) has kernel the constant polynomials, so $\mathrm{Ker}\,D = \mathbb{R}$. If we consider the second derivative $D\circ D\colon\mathbb{R}[X]\to\mathbb{R}[X]$, then the kernel of $D\circ D$ consists of all polynomials of degree $\le 1$. The image of $D\colon\mathbb{R}[X]\to\mathbb{R}[X]$ is actually $\mathbb{R}[X]$ itself, because every polynomial $P(X) = a_0X^n + \cdots + a_{n-1}X + a_n$ of degree $n$ is the derivative of the polynomial $Q(X)$ of degree $n+1$ given by
$$Q(X) = a_0\frac{X^{n+1}}{n+1} + \cdots + a_{n-1}\frac{X^2}{2} + a_nX.$$
On the other hand, if we consider the restriction of $D$ to the vector space $\mathbb{R}[X]_n$ of polynomials of degree $\le n$, then the kernel of $D$ is still $\mathbb{R}$, but the image of $D$ is $\mathbb{R}[X]_{n-1}$, the vector space of polynomials of degree $\le n-1$.
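These facts about $D$ can be illustrated with polynomials stored as coefficient lists $[a_0, a_1, \dots, a_n]$ for $a_0 + a_1X + \cdots + a_nX^n$ (this representation is an assumption of the sketch, not from the text):

```python
# Polynomials as coefficient lists [a0, a1, ..., an].

def deriv(p):
    """The derivative map D on coefficient lists."""
    return [k * p[k] for k in range(1, len(p))] or [0.0]

def antideriv(p):
    """A preimage of p under D (constant of integration chosen as 0)."""
    return [0.0] + [p[k] / (k + 1) for k in range(len(p))]

p = [5.0, 0.0, 3.0]            # 5 + 3 X^2
assert deriv(p) == [0.0, 6.0]  # D(5 + 3 X^2) = 6 X

# D is surjective on R[X]: every polynomial is the derivative of another.
q = antideriv(p)
assert deriv(q) == p

# Constant polynomials are in the kernel of D.
assert deriv([7.0]) == [0.0]
```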
Proposition 2.17. Given a linear map $f\colon E\to F$, the set $\mathrm{Im}\,f$ is a subspace of $F$ and the set $\mathrm{Ker}\,f$ is a subspace of $E$. The linear map $f\colon E\to F$ is injective iff $\mathrm{Ker}\,f = (0)$ (where $(0)$ is the trivial subspace $\{0\}$).
Proof. Given any $x, y\in\mathrm{Im}\,f$, there are some $u, v\in E$ such that $x = f(u)$ and $y = f(v)$, and for all $\lambda, \mu\in\mathbb{R}$, we have
$$f(\lambda u + \mu v) = \lambda f(u) + \mu f(v) = \lambda x + \mu y,$$
and thus, $\lambda x + \mu y\in\mathrm{Im}\,f$, showing that $\mathrm{Im}\,f$ is a subspace of $F$.
Given any $x, y\in\mathrm{Ker}\,f$, we have $f(x) = 0$ and $f(y) = 0$, and thus,
$$f(\lambda x + \mu y) = \lambda f(x) + \mu f(y) = 0,$$
that is, $\lambda x + \mu y\in\mathrm{Ker}\,f$, showing that $\mathrm{Ker}\,f$ is a subspace of $E$.
First, assume that $\mathrm{Ker}\,f = (0)$. We need to prove that $f(x) = f(y)$ implies that $x = y$. However, if $f(x) = f(y)$, then $f(x) - f(y) = 0$, and by linearity of $f$ we get $f(x - y) = 0$. Because $\mathrm{Ker}\,f = (0)$, we must have $x - y = 0$, that is $x = y$, so $f$ is injective. Conversely, assume that $f$ is injective. If $x\in\mathrm{Ker}\,f$, that is $f(x) = 0$, since $f(0) = 0$ we have $f(x) = f(0)$, and by injectivity, $x = 0$, which proves that $\mathrm{Ker}\,f = (0)$. Therefore, $f$ is injective iff $\mathrm{Ker}\,f = (0)$.
Definition 2.22. Given a linear map $f\colon E\to F$, the rank $\mathrm{rk}(f)$ of $f$ is the dimension of the image $\mathrm{Im}\,f$ of $f$.
A fundamental property of bases in a vector space is that they allow the definition of
linear maps as unique homomorphic extensions, as shown in the following proposition.
Proposition 2.18. Given any two vector spaces $E$ and $F$, given any basis $(u_i)_{i\in I}$ of $E$, given any other family of vectors $(v_i)_{i\in I}$ in $F$, there is a unique linear map $f\colon E\to F$ such that $f(u_i) = v_i$ for all $i\in I$. Furthermore, $f$ is injective iff $(v_i)_{i\in I}$ is linearly independent, and $f$ is surjective iff $(v_i)_{i\in I}$ generates $F$.
Proof. If such a linear map $f\colon E\to F$ exists, since $(u_i)_{i\in I}$ is a basis of $E$, every vector $x\in E$ can be written uniquely as a linear combination
$$x = \sum_{i\in I} x_i u_i,$$
and by linearity we must have
$$f(x) = \sum_{i\in I} x_i f(u_i) = \sum_{i\in I} x_i v_i,$$
which shows that $f$ is unique if it exists; and the map defined by $f(x) = \sum_{i\in I} x_i v_i$ is easily verified to be linear, which proves existence.
Now assume that $f$ is injective. Let $(\lambda_i)_{i\in I}$ be any family of scalars such that $\sum_{i\in I}\lambda_i v_i = 0$. Since $v_i = f(u_i)$ for every $i\in I$, we have
$$f\Bigl(\sum_{i\in I}\lambda_i u_i\Bigr) = \sum_{i\in I}\lambda_i f(u_i) = \sum_{i\in I}\lambda_i v_i = 0.$$
Since $f$ is injective, $\sum_{i\in I}\lambda_i u_i = 0$, and since $(u_i)_{i\in I}$ is a basis, we have $\lambda_i = 0$ for all $i\in I$, which shows that $(v_i)_{i\in I}$ is linearly independent. Conversely, assume that $(v_i)_{i\in I}$ is linearly independent. Since $(u_i)_{i\in I}$ is a basis of $E$, every vector $x\in E$ is a linear combination $x = \sum_{i\in I}\lambda_i u_i$ of $(u_i)_{i\in I}$. If
$$f(x) = f\Bigl(\sum_{i\in I}\lambda_i u_i\Bigr) = 0,$$
then
$$\sum_{i\in I}\lambda_i v_i = \sum_{i\in I}\lambda_i f(u_i) = f\Bigl(\sum_{i\in I}\lambda_i u_i\Bigr) = 0,$$
and $\lambda_i = 0$ for all $i\in I$ because $(v_i)_{i\in I}$ is linearly independent, which means that $x = 0$. Therefore, $\mathrm{Ker}\,f = (0)$, which implies that $f$ is injective. The part where $f$ is surjective is left as a simple exercise.
Figure 2.11 provides an illustration of Proposition 2.18 when $E = \mathbb{R}^3$ and $F = \mathbb{R}^2$.
Figure 2.11: Given $u_1 = (1, 0, 0)$, $u_2 = (0, 1, 0)$, $u_3 = (0, 0, 1)$ and $v_1 = (1, 1)$, $v_2 = (-1, 1)$, $v_3 = (1, 0)$, define the unique linear map $f\colon\mathbb{R}^3\to\mathbb{R}^2$ by $f(u_1) = v_1$, $f(u_2) = v_2$, and $f(u_3) = v_3$. This map is surjective but not injective since $f(u_1 - u_2) = f(u_1) - f(u_2) = (1, 1) - (-1, 1) = (2, 0) = 2f(u_3) = f(2u_3)$.
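Proposition 2.18 also gives a concrete recipe: to define a linear map on $\mathbb{R}^n$, place the prescribed images of the standard basis as the columns of a matrix. A sketch using the data of Figure 2.11:

```python
import numpy as np

# Columns are the prescribed images v1, v2, v3 from Figure 2.11.
V = np.array([[1.0, -1.0, 1.0],
              [1.0,  1.0, 0.0]])

def f(x):
    return V @ x

u1, u2, u3 = np.eye(3)  # standard basis of R^3

# Not injective: u1 - u2 and 2*u3 are distinct vectors with the same image.
assert np.allclose(f(u1 - u2), f(2 * u3))
assert not np.allclose(u1 - u2, 2 * u3)

# Surjective: the columns of V span R^2.
assert np.linalg.matrix_rank(V) == 2
```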
By the second part of Proposition 2.18, an injective linear map $f\colon E\to F$ sends a basis $(u_i)_{i\in I}$ to a linearly independent family $(f(u_i))_{i\in I}$ of $F$, which is also a basis when $f$ is bijective. Also, when $E$ and $F$ have the same finite dimension $n$, $(u_i)_{i\in I}$ is a basis of $E$, and $f\colon E\to F$ is injective, then $(f(u_i))_{i\in I}$ is a basis of $F$ (by Proposition 2.11).
The following simple proposition is also useful.
Proposition 2.19. Given any two vector spaces $E$ and $F$, with $F$ nontrivial, given any family $(u_i)_{i\in I}$ of vectors in $E$, the following properties hold:
(1) The family $(u_i)_{i\in I}$ generates $E$ iff for every family of vectors $(v_i)_{i\in I}$ in $F$, there is at most one linear map $f\colon E\to F$ such that $f(u_i) = v_i$ for all $i\in I$.
(2) The family $(u_i)_{i\in I}$ is linearly independent iff for every family of vectors $(v_i)_{i\in I}$ in $F$, there is some linear map $f\colon E\to F$ such that $f(u_i) = v_i$ for all $i\in I$.
Proof. (1) If there is any linear map $f\colon E\to F$ such that $f(u_i) = v_i$ for all $i\in I$, since $(u_i)_{i\in I}$ generates $E$, every vector $x\in E$ can be written as some linear combination
$$x = \sum_{i\in I} x_i u_i,$$
and by linearity, $f(x) = \sum_{i\in I} x_i v_i$. This shows that $f$ is unique if it exists. Conversely, assume that $(u_i)_{i\in I}$ does not generate $E$. Since $F$ is nontrivial, there is some vector $y\in F$ such that $y\neq 0$. Since $(u_i)_{i\in I}$ does not generate $E$, there is some vector $w\in E$ that is not in the subspace generated by $(u_i)_{i\in I}$. By Theorem 2.14, there is a linearly independent subfamily $(u_i)_{i\in I_0}$ of $(u_i)_{i\in I}$ generating the same subspace. Since by hypothesis, $w\in E$ is not in the subspace generated by $(u_i)_{i\in I_0}$, by Lemma 2.9 and by Theorem 2.14 again, there is a basis $(e_j)_{j\in I_0\cup J}$ of $E$, such that $e_i = u_i$ for all $i\in I_0$, and $w = e_{j_0}$ for some $j_0\in J$. Letting $(v_i)_{i\in I}$ be the family in $F$ such that $v_i = 0$ for all $i\in I$, defining $f\colon E\to F$ to be the constant linear map with value 0, we have a linear map such that $f(u_i) = 0$ for all $i\in I$. By Proposition 2.18, there is a unique linear map $g\colon E\to F$ such that $g(w) = y$, and $g(e_j) = 0$ for all $j\in (I_0\cup J) - \{j_0\}$. By definition of the basis $(e_j)_{j\in I_0\cup J}$ of $E$, we have $g(u_i) = 0$ for all $i\in I$, and since $f\neq g$, this contradicts the fact that there is at most one such map. See Figure 2.12.
(2) If the family $(u_i)_{i\in I}$ is linearly independent, then by Theorem 2.14, $(u_i)_{i\in I}$ can be extended to a basis of $E$, and the conclusion follows by Proposition 2.18. Conversely, assume that $(u_i)_{i\in I}$ is linearly dependent. Then there is some family $(\lambda_i)_{i\in I}$ of scalars (not all zero) such that
$$\sum_{i\in I}\lambda_i u_i = 0.$$
By the assumption, for any nonzero vector $y\in F$, for every $i\in I$, there is some linear map $f_i\colon E\to F$, such that $f_i(u_i) = y$, and $f_i(u_j) = 0$, for $j\in I - \{i\}$. Then we would get
$$0 = f_i\Bigl(\sum_{j\in I}\lambda_j u_j\Bigr) = \sum_{j\in I}\lambda_j f_i(u_j) = \lambda_i y,$$
and since $y\neq 0$, this implies $\lambda_i = 0$ for every $i\in I$. Thus, $(u_i)_{i\in I}$ is linearly independent.
Figure 2.12: Let E = R3 and F = R2 . The vectors u1 = (1, 0, 0), u2 = (0, 1, 0) do not
generate R3 since both the zero map and the map g, where g(0, 0, 1) = (1, 0), send the peach
xy-plane to the origin.
(1) If $f$ has a left inverse $g$, that is, if $g$ is a linear map such that $g\circ f = \mathrm{id}$, then $f$ is an isomorphism and $f^{-1} = g$.
(2) If $f$ has a right inverse $h$, that is, if $h$ is a linear map such that $f\circ h = \mathrm{id}$, then $f$ is an isomorphism and $f^{-1} = h$.
Proof. (1) The equation $g\circ f = \mathrm{id}$ implies that $f$ is injective; this is a standard result about functions (if $f(x) = f(y)$, then $g(f(x)) = g(f(y))$, which implies that $x = y$ since $g\circ f = \mathrm{id}$). Let $(u_1,\dots,u_n)$ be any basis of $E$. By Proposition 2.18, since $f$ is injective, $(f(u_1),\dots,f(u_n))$ is linearly independent, and since $E$ has dimension $n$, it is a basis of $E$ (if $(f(u_1),\dots,f(u_n))$ doesn't span $E$, then it can be extended to a basis of dimension strictly greater than $n$, contradicting Theorem 2.14). Then $f$ is bijective, and by a previous observation its inverse is a linear map. We also have
$$g = g\circ\mathrm{id} = g\circ(f\circ f^{-1}) = (g\circ f)\circ f^{-1} = \mathrm{id}\circ f^{-1} = f^{-1}.$$
(2) The equation $f\circ h = \mathrm{id}$ implies that $f$ is surjective; this is a standard result about functions (for any $y\in E$, we have $f(h(y)) = y$). Let $(u_1,\dots,u_n)$ be any basis of $E$. By Proposition 2.18, since $f$ is surjective, $(f(u_1),\dots,f(u_n))$ spans $E$, and since $E$ has dimension $n$, it is a basis of $E$ (if $(f(u_1),\dots,f(u_n))$ is not linearly independent, then because it spans $E$, it contains a basis of dimension strictly smaller than $n$, contradicting Theorem 2.14). Then $f$ is bijective, and by a previous observation its inverse is a linear map. We also have
$$h = \mathrm{id}\circ h = (f^{-1}\circ f)\circ h = f^{-1}\circ(f\circ h) = f^{-1}\circ\mathrm{id} = f^{-1}.$$
The set $\mathrm{Hom}(E, F)$ is a vector space under the operations defined in Example 2.3, namely
$$(f + g)(x) = f(x) + g(x)$$
for all $x\in E$, and
$$(\lambda f)(x) = \lambda f(x)$$
for all $x\in E$.
When $E$ and $F$ have finite dimensions, the vector space $\mathrm{Hom}(E, F)$ also has finite dimension, as we shall see shortly.
$$(g_1 + g_2)\circ f = g_1\circ f + g_2\circ f;$$
$$g\circ(f_1 + f_2) = g\circ f_1 + g\circ f_2.$$
with $\lambda_i = f^*(u_i)\in K$ for every $i$, $1\le i\le n$. Thus, with respect to the basis $(u_1,\dots,u_n)$, the linear form $f^*$ is represented by the row vector
$$(\lambda_1\ \cdots\ \lambda_n),$$
we have
$$f^*(x) = \begin{pmatrix}\lambda_1 & \cdots & \lambda_n\end{pmatrix}\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix},$$
a linear combination of the coordinates of $x$, and we can view the linear form $f^*$ as a linear equation. If we decide to use a column vector of coefficients
$$c = \begin{pmatrix}c_1\\ \vdots\\ c_n\end{pmatrix}$$
instead of a row vector, then the linear form $f^*$ is defined by
$$f^*(x) = c^\top x.$$
Observe that $c = \lambda^\top$. The above notation is often used in machine learning.
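The row-vector and column-vector views of a linear form agree, as a quick sketch shows (the coefficients $\lambda = (2, -1, 3)$ are made up for illustration):

```python
import numpy as np

# A linear form on R^3 with made-up coefficients lambda = (2, -1, 3).
lam = np.array([2.0, -1.0, 3.0])   # row of coefficients
c = lam.reshape(-1, 1)             # the same coefficients as a column c

x = np.array([1.0, 4.0, 2.0])

# f*(x) as a linear combination of the coordinates of x, i.e. c^T x.
fx = (c.T @ x).item()
assert np.isclose(fx, lam @ x)
print(fx)  # 2*1 - 1*4 + 3*2 = 4.0
```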
Example 2.10. Let $C([0,1])$ be the vector space of continuous functions $f\colon[0,1]\to\mathbb{R}$. The map $I\colon C([0,1])\to\mathbb{R}$ given by
$$I(f) = \int_0^1 f(x)\,dx\quad\text{for any } f\in C([0,1])$$
is a linear form.
Example 2.11. Consider the vector space $M_n(\mathbb{R})$ of real $n\times n$ matrices. Let $\mathrm{tr}\colon M_n(\mathbb{R})\to\mathbb{R}$ be the function given by
$$\mathrm{tr}(A) = a_{11} + a_{22} + \cdots + a_{nn},$$
called the trace of $A$. It is a linear form. Let $s\colon M_n(\mathbb{R})\to\mathbb{R}$ be the function given by
$$s(A) = \sum_{i,j=1}^{n} a_{ij},$$
the sum of all the entries of $A$. It is also a linear form.
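The linearity of both forms is easy to spot-check numerically (a sketch with random matrices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
lam = 2.5

# Both tr and the all-entries sum s respect addition and scaling,
# i.e. they behave as linear forms on M_n(R).
assert np.isclose(np.trace(A + lam * B), np.trace(A) + lam * np.trace(B))
assert np.isclose((A + lam * B).sum(), A.sum() + lam * B.sum())
```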
Given a vector space $E$ and any basis $(u_i)_{i\in I}$ for $E$, we can associate to each $u_i$ a linear form $u_i^*\in E^*$, and the $u_i^*$ have some remarkable properties.
Definition 2.28. Given a vector space $E$ and any basis $(u_i)_{i\in I}$ for $E$, by Proposition 2.18, for every $i\in I$, there is a unique linear form $u_i^*$ such that
$$u_i^*(u_j) = \begin{cases}1 & \text{if } i = j\\ 0 & \text{if } i\neq j,\end{cases}$$
for every $j\in I$. The linear form $u_i^*$ is called the coordinate form of index $i$ w.r.t. the basis $(u_i)_{i\in I}$.
Remark: Given an index set $I$, authors often define the so-called "Kronecker symbol" $\delta_{ij}$ such that
$$\delta_{ij} = \begin{cases}1 & \text{if } i = j\\ 0 & \text{if } i\neq j,\end{cases}$$
for all $i, j\in I$. Then, $u_i^*(u_j) = \delta_{ij}$.
The reason for the terminology coordinate form is as follows: If $E$ has finite dimension and if $(u_1,\dots,u_n)$ is a basis of $E$, for any vector
$$v = \lambda_1 u_1 + \cdots + \lambda_n u_n,$$
we have
$$u_i^*(v) = u_i^*(\lambda_1 u_1 + \cdots + \lambda_n u_n) = \lambda_1 u_i^*(u_1) + \cdots + \lambda_i u_i^*(u_i) + \cdots + \lambda_n u_i^*(u_n) = \lambda_i,$$
since $u_i^*(u_j) = \delta_{ij}$. Therefore, $u_i^*$ is the linear function that returns the $i$th coordinate of a vector expressed over the basis $(u_1,\dots,u_n)$.
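In coordinates, when a basis of $\mathbb{R}^n$ is stored as the columns of a matrix $U$, the coordinate forms are simply the rows of $U^{-1}$; a sketch with a made-up basis:

```python
import numpy as np

# Basis of R^3 as the columns of U (a made-up, invertible example).
U = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])

# The coordinate forms u_i^* are the rows of U^{-1}:
# row i applied to u_j gives the Kronecker delta_ij.
U_inv = np.linalg.inv(U)
assert np.allclose(U_inv @ U, np.eye(3))

# u_i^*(v) returns the i-th coordinate of v over the basis.
coords = np.array([2.0, -1.0, 3.0])
v = U @ coords
assert np.allclose(U_inv @ v, coords)
```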
The following theorem shows that in finite dimension, every basis $(u_1,\dots,u_n)$ of a vector space $E$ yields a basis $(u_1^*,\dots,u_n^*)$ of the dual space $E^*$, called a dual basis.
Theorem 2.21. (Existence of dual bases) Let $E$ be a vector space of dimension $n$. The following property holds: For every basis $(u_1,\dots,u_n)$ of $E$, the family of coordinate forms $(u_1^*,\dots,u_n^*)$ is a basis of $E^*$ (called the dual basis of $(u_1,\dots,u_n)$).
Proof. If $v^*\in E^*$ is any linear form, consider the linear form
$$f^* = v^*(u_1)u_1^* + \cdots + v^*(u_n)u_n^*.$$
Observe that because $u_i^*(u_j) = \delta_{ij}$, we have $f^*(u_i) = v^*(u_i)$ for every $i$, $1\le i\le n$, so $f^*$ and $v^*$ agree on the basis $(u_1,\dots,u_n)$; hence $v^* = f^*$, a linear combination of $u_1^*,\dots,u_n^*$. Therefore, $(u_1^*,\dots,u_n^*)$ spans $E^*$. We claim that the covectors $u_1^*,\dots,u_n^*$ are linearly independent. If not, we have a nontrivial linear dependence
$$\lambda_1 u_1^* + \cdots + \lambda_n u_n^* = 0,$$
and if we apply the above linear form to each $u_i$, using a familiar computation, we get
$$0 = \lambda_i u_i^*(u_i) = \lambda_i,$$
proving that $u_1^*,\dots,u_n^*$ are indeed linearly independent. Therefore, $(u_1^*,\dots,u_n^*)$ is a basis of $E^*$.
In particular, Theorem 2.21 shows that a finite-dimensional vector space $E$ and its dual $E^*$ have the same dimension.
We explained just after Definition 2.27 that if the space $E$ is finite-dimensional and has a finite basis $(u_1,\dots,u_n)$, then a linear form $f^*\colon E\to K$ is represented by the row vector of coefficients
$$\bigl(f^*(u_1)\ \cdots\ f^*(u_n)\bigr). \tag{1}$$
The proof of Theorem 2.21 shows that over the dual basis $(u_1^*,\dots,u_n^*)$ of $E^*$, the linear form $f^*$ is represented by the same coefficients, but as the column vector
$$\begin{pmatrix}f^*(u_1)\\ \vdots\\ f^*(u_n)\end{pmatrix}, \tag{2}$$
2.9 Summary
The main concepts and results of this chapter are listed below:
• Families of vectors.
• Linear subspaces.
• Any two bases in a finitely generated vector space E have the same number of elements;
this is the dimension of E (Theorem 2.14).
• Hyperplanes.
• Every vector has a unique representation over a basis (in terms of its coordinates).
• Matrices.
• The vector space $M_{m,n}(K)$ of $m\times n$ matrices over the field $K$; the ring $M_n(K)$ of $n\times n$ matrices over the field $K$.
• The image and the kernel of a linear map are subspaces. A linear map is injective iff its kernel is the trivial space $(0)$ (Proposition 2.17).
• The unique homomorphic extension property of linear maps with respect to bases (Proposition 2.18).
• Coordinate forms.
2.10 Problems
Problem 2.1. Let $H$ be the set of $3\times 3$ upper triangular matrices given by
$$H = \left\{\begin{pmatrix}1 & a & b\\ 0 & 1 & c\\ 0 & 0 & 1\end{pmatrix}\ \middle|\ a, b, c\in\mathbb{R}\right\}.$$
(1) Prove that H with the binary operation of matrix multiplication is a group; find
explicitly the inverse of every matrix in H. Is H abelian (commutative)?
(2) Given two groups $G_1$ and $G_2$, recall that a homomorphism is a function $\varphi\colon G_1\to G_2$ such that
$$\varphi(ab) = \varphi(a)\varphi(b),\quad a, b\in G_1.$$
Prove that $\varphi(e_1) = e_2$ (where $e_i$ is the identity element of $G_i$) and that
$$\varphi(a^{-1}) = (\varphi(a))^{-1},\quad a\in G_1.$$
(3) Let $S^1$ be the unit circle, that is,
$$S^1 = \{e^{i\theta} = \cos\theta + i\sin\theta \mid 0\le\theta < 2\pi\},$$
and let $\varphi$ be the function given by
$$\varphi\begin{pmatrix}1 & a & b\\ 0 & 1 & c\\ 0 & 0 & 1\end{pmatrix} = (a, c, e^{ib}).$$
Prove that $\varphi$ is a surjective function onto $G = \mathbb{R}\times\mathbb{R}\times S^1$, and that if we define multiplication on this set by
$$(x_1, y_1, u_1)\cdot(x_2, y_2, u_2) = (x_1 + x_2,\ y_1 + y_2,\ e^{ix_1y_2}u_1u_2),$$
then $G$ is a group and $\varphi$ is a group homomorphism from $H$ onto $G$.
(4) The kernel of a homomorphism $\varphi\colon G_1\to G_2$ is defined as
$$\mathrm{Ker}(\varphi) = \{a\in G_1 \mid \varphi(a) = e_2\}.$$
Find explicitly the kernel of $\varphi$ and show that it is a subgroup of $H$.
Problem 2.2. For any $m\in\mathbb{Z}$ with $m > 0$, the subset $m\mathbb{Z} = \{mk \mid k\in\mathbb{Z}\}$ is an abelian subgroup of $\mathbb{Z}$. Check this.
(1) Give a group isomorphism (an invertible homomorphism) from $m\mathbb{Z}$ to $\mathbb{Z}$.
(2) Check that the inclusion map $i\colon m\mathbb{Z}\to\mathbb{Z}$ given by $i(mk) = mk$ is a group homomorphism. Prove that if $m\ge 2$ then there is no group homomorphism $p\colon\mathbb{Z}\to m\mathbb{Z}$ such that $p\circ i = \mathrm{id}$.
Remark: The above shows that abelian groups fail to have some of the properties of vector spaces. We will show later that a linear map satisfying the condition $p\circ i = \mathrm{id}$ always exists.
Prove that the columns of A1 are linearly independent. Find the coordinates of the vector
x = (6, 2, 7) over the basis consisting of the column vectors of A1 .
Problem 2.6. Let $A_2$ be the following matrix:
$$A_2 = \begin{pmatrix}
1 & 2 & 1 & 1\\
2 & 3 & 2 & 3\\
-1 & 0 & 1 & -1\\
-2 & -1 & 3 & 0
\end{pmatrix}.$$
Express the fourth column of $A_2$ as a linear combination of the first three columns of $A_2$. Is the vector $x = (7, 14, -1, 2)$ a linear combination of the columns of $A_2$?
Problem 2.7. Let $A_3$ be the following matrix:
$$A_3 = \begin{pmatrix}
1 & 1 & 1\\
1 & 1 & 2\\
1 & 2 & 3
\end{pmatrix}.$$
Prove that the columns of $A_3$ are linearly independent. Find the coordinates of the vector $x = (6, 9, 14)$ over the basis consisting of the column vectors of $A_3$.
Problem 2.8. Let $A_4$ be the following matrix:
$$A_4 = \begin{pmatrix}
1 & 2 & 1 & 1\\
2 & 3 & 2 & 3\\
-1 & 0 & 1 & -1\\
-2 & -1 & 4 & 0
\end{pmatrix}.$$
Prove that the columns of $A_4$ are linearly independent. Find the coordinates of the vector $x = (7, 14, -1, 2)$ over the basis consisting of the column vectors of $A_4$.
Problem 2.9. Consider the following Haar matrix
$$H = \begin{pmatrix}
1 & 1 & 1 & 0\\
1 & 1 & -1 & 0\\
1 & -1 & 0 & 1\\
1 & -1 & 0 & -1
\end{pmatrix}.$$
$$v_i = a_{i\,1}u_1 + \cdots + a_{i\,m}u_m,\quad 1\le i\le m,$$
and that the matrix $A = (a_{ij})$ is an upper-triangular matrix, which means that if $1\le j < i\le m$, then $a_{ij} = 0$. Prove that if $(u_1,\dots,u_m)$ are linearly independent and if all the diagonal entries of $A$ are nonzero, then $(v_1,\dots,v_m)$ are also linearly independent.
Hint. Use induction on $m$.
(2) Let $A = (a_{ij})$ be an upper-triangular matrix. Prove that if all the diagonal entries of $A$ are nonzero, then $A$ is invertible and the inverse $A^{-1}$ of $A$ is also upper-triangular.
Hint. Use induction on $m$.
Prove that if $A$ is invertible, then all the diagonal entries of $A$ are nonzero.
(3) Prove that if the families $(u_1,\dots,u_m)$ and $(v_1,\dots,v_m)$ are related as in (1), then $(u_1,\dots,u_m)$ are linearly independent iff $(v_1,\dots,v_m)$ are linearly independent.
Problem 2.12. In solving this problem, do not use determinants. Consider the $n\times n$ matrix
$$A = \begin{pmatrix}
1 & 2 & 0 & 0 & \cdots & 0 & 0\\
0 & 1 & 2 & 0 & \cdots & 0 & 0\\
0 & 0 & 1 & 2 & \cdots & 0 & 0\\
\vdots & \vdots & \ddots & \ddots & \ddots & \vdots & \vdots\\
0 & 0 & \cdots & 0 & 1 & 2 & 0\\
0 & 0 & \cdots & 0 & 0 & 1 & 2\\
0 & 0 & \cdots & 0 & 0 & 0 & 1
\end{pmatrix}.$$
(1) Find the solution $x = (x_1,\dots,x_n)$ of the linear system
$$Ax = b,$$
for
$$b = \begin{pmatrix}b_1\\ b_2\\ \vdots\\ b_n\end{pmatrix}.$$
(2) Prove that the matrix $A$ is invertible and find its inverse $A^{-1}$. Given that the number of atoms in the universe is estimated to be $10^{82}$, compare the size of the coefficients of the inverse of $A$ to $10^{82}$, if $n\ge 300$.
(3) Assume $b$ is perturbed by a small amount $\delta b$ (note that $\delta b$ is a vector). Find the new solution of the system
$$A(x + \delta x) = b + \delta b,$$
where $\delta x$ is also a vector. In the case where $b = (0,\dots,0,1)$ and $\delta b = (0,\dots,0,\epsilon)$, show that
$$|(\delta x)_1| = 2^{n-1}|\epsilon|$$
(where $(\delta x)_1$ is the first component of $\delta x$).
(4) Prove that $(A - I)^n = 0$.
Problem 2.14. (1) Let $A$ be an $n\times n$ matrix. If $A$ is invertible, prove that for any $x\in\mathbb{R}^n$, if $Ax = 0$, then $x = 0$.
(2) Let $A$ be an $m\times n$ matrix and let $B$ be an $n\times m$ matrix. Prove that $I_m - AB$ is invertible iff $I_n - BA$ is invertible.
Hint. If for all $x\in\mathbb{R}^n$, $Mx = 0$ implies that $x = 0$, then $M$ is invertible.
(3) Show that the $n$ diagonal $n\times n$ matrices $D_i$ defined such that the diagonal entries of $D_i$ are equal to the entries (from top down) of the $i$th column of $B$ form a basis of the space of $n\times n$ diagonal matrices (matrices with zeros everywhere except possibly on the diagonal). For example, when $n = 4$, $D_i$ is the diagonal matrix
$$D_i = \mathrm{diag}(b_{1\,i},\ b_{2\,i},\ b_{3\,i},\ b_{4\,i})$$
whose diagonal is the $i$th column of $B$.
Problem 2.16. Given any $m\times n$ matrix $A$ and any $n\times p$ matrix $B$, if we denote the columns of $A$ by $A_1,\dots,A_n$ and the rows of $B$ by $B_1,\dots,B_n$, prove that
$$AB = A_1B_1 + \cdots + A_nB_n.$$
Problem 2.17. Let $f\colon E\to F$ be a linear map which is also a bijection (it is injective and surjective). Prove that the inverse function $f^{-1}\colon F\to E$ is linear.
Problem 2.18. Given two vector spaces $E$ and $F$, let $(u_i)_{i\in I}$ be any basis of $E$ and let $(v_i)_{i\in I}$ be any family of vectors in $F$. Prove that the unique linear map $f\colon E\to F$ such that $f(u_i) = v_i$ for all $i\in I$ is surjective iff $(v_i)_{i\in I}$ spans $F$.
Problem 2.19. Let $f\colon E\to F$ be a linear map with $\dim(E) = n$ and $\dim(F) = m$. Prove that $f$ has rank 1 iff $f$ is represented by an $m\times n$ matrix of the form
$$A = uv^\top$$
with $u$ a nonzero column vector of dimension $m$ and $v$ a nonzero column vector of dimension $n$.
Problem 2.20. Find a nontrivial linear dependence among the linear forms
are linearly independent. Express the linear form $\varphi(x, y, z) = x + y + z$ as a linear combination of $\varphi_1, \varphi_2, \varphi_3$.