Mat 224
Course Notes
Tyler Holden
Mathematics and Computational Sciences
University of Toronto Mississauga
[email protected]
Contents
1 General Vector Spaces
  1.1 Vector Spaces
  1.2 Subspaces
  1.3 Spanning Sets
  1.4 Linear Independence
  1.5 Basis and Dimension
  1.6 Operations on Vector Spaces
    1.6.1 Internal Direct Sum
    1.6.2 External Direct Sum
    1.6.3 Quotient Spaces
2 Linear Transformations
  2.1 Linear Maps
  2.2 The Kernel and Image
  2.3 Isomorphisms
3 Change of Basis
  3.1 Coordinate Transformations
  3.2 Change of Basis Matrix
  3.3 Invariant Subspaces
1 General Vector Spaces
Your previous exposure to linear algebra was limited to studying spaces like R^n for some n ∈ N. Our goal is to generalize the properties of R^n and define something called vector spaces. When studying R^n there were two fundamental "classes" of objects: There were vectors v = [v_1 v_2 · · · v_n]^T ∈ R^n, and scalars c ∈ R. We can add two vectors together, and multiply by scalars, as follows:
\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} + \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{bmatrix} = \begin{bmatrix} v_1 + w_1 \\ v_2 + w_2 \\ \vdots \\ v_n + w_n \end{bmatrix} \quad\text{and}\quad c\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = \begin{bmatrix} cv_1 \\ cv_2 \\ \vdots \\ cv_n \end{bmatrix}.
An alternative viewpoint, which will be useful throughout the course, is to write vectors in terms of the "standard basis." Let e_i = [0 · · · 0 1 0 · · · 0]^T denote the vector whose value is 1 in the ith row, and 0 everywhere else. The vector v = [v_1 v_2 · · · v_n]^T can be written as v = v_1e_1 + v_2e_2 + · · · + v_ne_n = \sum_{i=1}^{n} v_ie_i. Here the e_i act as placeholders, with vector addition and scalar multiplication reading as
\sum_{i=1}^{n} v_ie_i + \sum_{j=1}^{n} w_je_j = \sum_{i=1}^{n} (v_i + w_i)e_i \quad\text{and}\quad c\sum_{i=1}^{n} v_ie_i = \sum_{i=1}^{n} (cv_i)e_i.   (1.1)
The idea is to now generalize this to a more abstract setting. We will define a space consisting of
vectors and scalars, demanding that vectors can be added to one another and multiplied by scalars.
If we can prove general properties about such spaces, we may be able to discover interesting results
that apply to more than just Rn .
1.1 Vector Spaces
Definition 1.1
A (real) vector space is a set V equipped with two operations, called vector addition + :
V × V → V and scalar multiplication · : R × V → V . When u, v ∈ V , we often write vector
addition as u + v. When c ∈ R and v ∈ V , we will often write scalar multiplication as
c · v = cv. These two operators must satisfy the following properties:
1. For all u, v ∈ V, u + v ∈ V.
2. For all u, v ∈ V, u + v = v + u.
3. For all u, v, w ∈ V, (u + v) + w = u + (v + w).
4. There exists a zero vector 0 ∈ V such that v + 0 = v for all v ∈ V.
5. For each v ∈ V there exists an element −v ∈ V such that v + (−v) = 0.
6. For all c ∈ R and v ∈ V, cv ∈ V.
7. For all c, d ∈ R and v ∈ V, c(dv) = (cd)v.
8. For all c, d ∈ R and u, v ∈ V, c(u + v) = cu + cv and (c + d)v = cv + dv.
9. For all v ∈ V, 1v = v.
The formal definition of a vector space is rather tedious, and in fact I don’t remember all of
these conditions. Mathematicians internalize the notion of a vector space differently.[1] That being
said, let’s quickly discuss the implications of some of these conditions:
• The closure properties (1) and (6) are sanity conditions, and say that we’re not allowed to
leave V . In this sense, the set V = [0, 1] with the usual notions of addition and multiplication,
cannot be a vector space. Indeed, if v = 0.8 and w = 0.9 then v + w = 1.7, which is not in
V.
• Properties (2), (3), and (7) say that we don’t have to worry about the order in which we
apply our operations.
• Property (8) ensures that addition and scalar multiplication play together nicely, and is the
only condition which mixes both addition and multiplication. This is why it’s called the
compatibility condition.
• Finally, properties (4), (5), and (9) tell us how the special numbers 0, 1 ∈ R interact with
vectors, and that vectors have inverses.
[1] To a mathematician, the definition of a vector space is "a module over a field."
If you’ve studied the field axioms before, some of this will probably seem familiar. Like with
the field axioms, many of these seem “obvious,” but be careful not to assume that anything beyond
these facts is true. Moreover, you may have noticed that I put the word "real" in brackets. In this
course we will only study real vector spaces, so I may be lazy and drop this adjective; however, in
more abstract linear algebra, you learn how to apply the principles we’ll learn to vector spaces over
other fields, such as C or Z2 .
The definition of a vector space makes it tedious to show that something is a vector space.
While you must ensure that all of the conditions are satisfied, many of them are usually trivial.
You should feel free to wave away anything that is trivial by providing a quick comment.
Example 1.2
Example 1.3
Let P_n(R) denote the set of polynomials of degree at most n. Show that this is a vector space under the usual notions of polynomial addition and multiplication by a real number.
Solution. Here I’ll demonstrate what I mean by discarding most of the vector space axioms. For
example, polynomial addition is certainly commutative (2) and associative (3), so you don’t need
to prove this. Similarly, (7), (8), and (9) are all clearly true. Thus we need only prove (1), (4), (5),
and (6).
Let’s check closure. Fix two polynomials p, q ∈ Pn (R) and write them as
p(x) = \sum_{i=0}^{n} p_ix^i \quad\text{and}\quad q(x) = \sum_{j=0}^{n} q_jx^j.
If c ∈ R then
p(x) + q(x) = \sum_{i=0}^{n} (p_i + q_i)x^i ∈ P_n(R) \quad\text{and}\quad cp(x) = \sum_{i=0}^{n} (cp_i)x^i ∈ P_n(R),   (1.2)
showing that P_n(R) is closed under both addition and scalar multiplication. Notice here the similarity between (1.1) and (1.2), and how the x^i play the role of placeholders, just like the e_i do in R^n.
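The analogy can be made concrete: if we store a polynomial by its coefficient array (a representation chosen here for illustration only; it assumes NumPy), polynomial addition and scaling become literally the same componentwise arithmetic as in R^{n+1}:

import numpy as np

# A polynomial p_0 + p_1 x + ... + p_n x^n is stored as the array [p_0, ..., p_n].
p = np.array([1.0, -5.0, 2.0])   # 1 - 5x + 2x^2  in P_2(R)
q = np.array([3.0,  0.0, 4.0])   # 3 + 4x^2

# Addition and scalar multiplication of polynomials, as in (1.2), are just
# componentwise operations on the coefficient arrays, so the results are
# again coefficient arrays of length 3: P_2(R) is closed.
print(p + q)      # [ 4. -5.  6.]  ->  4 - 5x + 6x^2
print(2.5 * p)    # [ 2.5 -12.5  5. ]  ->  2.5 - 12.5x + 5x^2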
The zero vector is the zero polynomial, since if 0(x) = 0 then for any other polynomial q(x) we
have 0(x) + q(x) = 0 + q(x) = q(x). This might be obvious, but the presence of the zero vector is
always critical to check.
Finally, given p(x) = \sum_{i=0}^{n} p_ix^i, define (−p)(x) = \sum_{i=0}^{n} (−p_i)x^i, so that
p(x) + (−p(x)) = \sum_{i=0}^{n} [p_i + (−p_i)]x^i = \sum_{i=0}^{n} 0x^i = 0.
This brings us to the following lemma, which gives us a few more properties of vector spaces
that can be proven from the axioms:
Lemma 1.4
Let V be a vector space, with u, v, w ∈ V and c ∈ R. The following are all true:
1. If u + v = u + w then v = w.
2. cv = 0 if and only if c = 0 or v = 0.
3. (−1)v = −v.
Proof. These are all good exercises in axiom manipulation, so I will prove (2) and leave the rest to
you. Let's start by assuming that cv = 0, and show that either c = 0 or v = 0.
(⇐) Let’s start by assuming that c = 0, and show that 0v = 0 for any v ∈ V . Now 0v =
(0 + 0)v = 0v + 0v. Writing this as 0v + 0v = 0 + 0v, we can use Property (1) to cancel a 0v from
either side, giving 0v = 0.
In a similar vein, let’s now assume that v = 0 and show that c0 = 0 for any c ∈ R. Employing
a similar technique, c0 = c(0 + 0) = c0 + c0. Once again using the cancellation property from (1),
we get c0 = 0.
(⇒) If c = 0 then we're done, so assume c ≠ 0 and multiply both sides of cv = 0 by c^{-1} to get
v = 1v = (c^{-1}c)v = c^{-1}(cv) = c^{-1}0 = 0,
thus either c = 0 or v = 0.
The above lemma needed to be proved for the sake of completeness, but this sort of algebraic
manipulation isn’t going to show up a lot. You should still do the other three proofs for exercise,
but don’t worry about mastering this sort of thing.
Before moving on, it’s worth pointing out Property (3) of Lemma 1.4. You may have thought
that this was obviously true, but Definition 1.1 did not give this to you. When we say that
v + (−v) = 0, the −v here was just notation, indicating that the element −v was related to v.
The fact that (−1)v = −v is a convenient a posteriori choice of notation. In terms of application,
note that in Example 1.3 we could have skipped showing the existence of −p(x): Once you know
that V is closed under scalar multiplication, (−1)p = −p guarantees the existence of the additive
inverse. This holds in general, and one need never check this in the definition of a vector space.
Example 1.5
Determine whether the set H = {(x, y) : y ≥ 0} is a vector space under the usual addition
and scalar multiplication.
Solution. Because we’re using the usual definitions of addition and scalar multiplication, most the
axioms are trivially true. The main conditions to check are closure of addition, closure of scalar
multiplication, and the existence of the zero vector.
Now 0 = (0, 0) is certainly a zero vector, so that’s not an issue. Moreover, vector addition is
closed, since if v = (x1 , y1 ), w = (x2 , y2 ) ∈ V , then the only restriction is that y1 , y2 ≥ 0. Adding
these together gives v + w = (x1 + x2 , y1 + y2 ), and since y1 + y2 ≥ 0, we know that v + w ∈ V as
well.
Now the troublesome property: H is not closed under scalar multiplication. Indeed, v = (0, 1) ∈
H. However, (−1)v = (0, −1) is not an element of H since its y-coordinate is negative. Because of
this, H is not a vector space.
Exercises
1.1-1. For each of the following subsets of R, indicate whether the set is a vector space with the
usual notion of addition and multiplication inherited from R:
1.1-2. Let Mm×n (R) denote the collection of m × n real matrices. Show that Mm×n (R) is a vector
space.
1.1-3. Let D([a, b]) = {f : [a, b] → R : f is differentiable}; that is, the set of differentiable functions
whose domain is the interval [a, b]. Show that D[a, b] is a vector space.
1.1-4. Fix an N ∈ N, and let N̂ = {1, 2, . . . , N}. Define X = {f : N̂ → C}, the set of all functions from N̂ to the complex numbers. Show that X is a vector space.
1.1-5. Let V be a vector space, and A be any set. Define V A = {f : A → V }; the set of all
functions from A to V . Show that V A is a vector space under pointwise addition and scalar
multiplication.
1.1-6. Let D = {(x, y) : x^2 + y^2 ≤ 1}. Determine whether D is a vector space.
1.1-7. Fix some n ∈ N and define Cn = {(z1 , . . . , zn ) : zi ∈ C}. Determine whether Cn is a vector
space.
1.1-8. Fix some n ∈ N and define Zn = {(k1 , . . . , kn ) : ki ∈ Z}. Determine whether Zn is a vector
space.
1.1-9. Let P (R) denote the set of all polynomials with real coefficients; that is, there is no restriction
on the degree of the polynomial. Determine whether P (R) is a vector space.
1.2 Subspaces
In mathematics, we are often interested in “sub-objects;” that is, subsets of objects which share
their same basic structure. In our case, this manifests as follows:
Definition 1.6
If V is a vector space, a subspace of V is any subset U ⊆ V which is itself a vector space under the same operations of addition and scalar multiplication.
Strictly speaking, these should be called “sub-vector spaces,” but contextually it is obvious that
this is what is meant. Now we’ve seen that showing that a set is a vector space is tedious, and
we don’t want to repeat the process with subspaces. Luckily, we can avoid most of these issues by
using the fact that V is a vector space.
Theorem 1.7
Let V be a vector space and U ⊆ V. Then U is a subspace of V if and only if
1. 0 ∈ U,
2. U is closed under addition: if u, v ∈ U then u + v ∈ U,
3. U is closed under scalar multiplication: if c ∈ R and u ∈ U then cu ∈ U.
Proof. (⇒) If U is a subspace then it is a vector space. The three properties above are axioms (4),
(1), and (6) of Definition 1.1, and so are trivially satisfied.
(⇐) The more interesting direction is that these three properties alone are sufficient to guarantee
all nine of the vector space axioms. The reason for this is that since U ⊆ V , if the axioms hold in
V then they will hold in U as well.
For example, suppose we wanted to show that addition is commutative; that is, we want to show
that for all u, v ∈ U that u + v = v + u. The proof is boring: Since U ⊆ V we know that u, v are
also elements of V . Since V is a vector space, we know addition is commutative, so u + v = v + u
in V . But if these are the same thing in V , they must also be the same thing in U . So addition is
commutative.
This was Axiom 2, and the same argument holds for Axioms 3, 5, 7, 8, and 9 (pick one and
check). Thus all that needs to be checked is Axioms 1, 4, and 6. These are precisely the three
properties that we assume hold for U , thus U is a vector space.
Great, so we only need to check three things to ensure that a subset is a vector space. Here is
a simple example:
Example 1.8
Show that the set X = {(a, a) : a ∈ R} is a subspace of R^2.
Solution. By Theorem 1.7 it suffices to show that X has the zero vector, and is closed under both
addition and scalar multiplication.
1. The zero vector in R2 is (0, 0), and this is also an element of X, so this condition is satisfied.
2. Fix two arbitrary elements of X, let’s call them (a, a) and (b, b). Their sum is (a, a) + (b, b) =
(a + b, a + b). This is also an element of X, so X is closed under addition.
3. Fix an arbitrary c ∈ R and (a, a) ∈ X. Multiplying by c gives c(a, a) = (ca, ca) ∈ X, so X is closed under scalar multiplication.
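As a quick numerical sanity check of the three conditions (a sketch only, using NumPy; random samples illustrate the claim but cannot replace the proof above):

import numpy as np

def in_X(v, tol=1e-12):
    """Membership test for X = {(a, a) : a in R}."""
    return abs(v[0] - v[1]) < tol

rng = np.random.default_rng(0)
checks = [in_X(np.zeros(2))]               # condition 1: the zero vector lies in X
for _ in range(100):
    a, b, c = rng.normal(size=3)
    u, v = np.array([a, a]), np.array([b, b])
    checks.append(in_X(u + v))             # condition 2: closure under addition
    checks.append(in_X(c * u))             # condition 3: closure under scaling
print(all(checks))                         # True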
Example 1.9
Determine if the set U = {(x, x^2) : x ∈ R} is a subspace of R^2.
Solution. The set U is the graph of the parabola y = x^2. Now (0, 0) ∈ U so the set contains the zero vector; however, both addition and scalar multiplication will quickly fail. For example, choosing the vectors (1, 1) and (2, 4), adding them gives (1, 1) + (2, 4) = (3, 5). This is not an element of U (3^2 ≠ 5). There was nothing special about this pair of points: almost any pair of points would break addition, since (a + b)^2 ≠ a^2 + b^2 in general. Similarly, scalar multiplication breaks for most choices of scalars and vectors.
Example 1.10
Consider P4 (R), the vector space of degree 4 polynomials. Define the set N =
{p ∈ P4 (R) : p(1) = 0}. Determine whether N is a subspace of P4 (R).
2. Let p, q ∈ N , so that p(1) = 0 and q(1) = 0. We want to check whether p + q ∈ N ; that is,
we need to show that (p + q)(1) = 0. But
(p + q)(1) = p(1) + q(1) = 0 + 0 = 0,
so p + q ∈ N.
In the above proof, we didn’t use the fact that we were working in P4 (R), and it’s easy to
conclude that this proof would have worked for any Pn (R).
Exercise: In Exercise 1.1-9 you showed that P (R) – the set of all polynomials – is a vector
space. Show that for any n ∈ N, Pn (R) is a subspace of P (R).
Example 1.11
Let M_2(R) denote the set of 2 × 2 matrices. In Exercise 1.1-2 you showed that this was a
vector space. Define the set S = {X ∈ M2 (R) : det(X) = 0}. Determine if S is a subspace
of M2 (R).
Solution. The zero matrix is definitely an element of S, since det(0) = 0. Scalar multiplication
is also satisfied. Indeed, if A ∈ S then det(A) = 0. Fix an arbitrary c ∈ R, so that det(cA) =
c^2 det(A) = c^2 × 0 = 0. However, vector addition fails. Note that
A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}
both satisfy det(A) = det(B) = 0, so A, B ∈ S, yet det(A + B) = det(I_2) = 1 ≠ 0, so A + B ∉ S. Thus S is not a subspace of M_2(R).
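A two-line numerical check of this failure (a sketch; assumes NumPy):

import numpy as np

A = np.array([[1.0, 0.0], [0.0, 0.0]])
B = np.array([[0.0, 0.0], [0.0, 1.0]])

# Both determinants vanish, so A and B lie in S, but their sum is the
# identity matrix, whose determinant is 1: S is not closed under addition.
print(np.linalg.det(A), np.linalg.det(B))   # 0.0 0.0
print(np.linalg.det(A + B))                 # 1.0 (up to floating point)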
Exercises
1.2-1. (a) Consider the vector v = [−1 2 π]^T ∈ R^3, and define the set L = {tv : t ∈ R}. Show that L is a subspace of R^3.
(b) More generally, let v ∈ Rn be any vector, and define L = {tv : t ∈ R}. Show that L is a
subspace.
1.2-2. Let A be an m × n matrix, and let H be the set of solutions to the equation Ax = 0.
(a) Is H ⊆ Rn or H ⊆ Rm ?
1.2-3. Let X = {f : R → R}; that is, the set of all functions from R to R.
1.3 Spanning Sets

Definition 1.13
If V is a vector space and A ⊆ V , the span of A is the set span(A) of all finite linear
combinations of vectors in A.
Let’s examine Definition 1.13 in a bit more detail. Most of the time, we’ll be interested in
looking at the span of finite sets. So if A = {v1 , . . . , vn }, then we can say
span(A) = span{v_1, . . . , v_n} = \left\{ \sum_{i=1}^{n} c_iv_i : c_i ∈ R \right\},
which is explicitly the set of all linear combinations of the elements of A. There is no need to worry
about the question of finite linear combinations, since there were only finitely many elements to
add together in the first place.
On the other hand, recall that P (R) is the set of all polynomials with real coefficients. Define
A = {x^k : k ∈ N},
the set of all monomials. Here A is infinite, but nonetheless its span consists of only finite linear
combinations. For example, 2 − πx + 33x74 is in the span of A. The restriction to finite linear
combinations ensures that every element of span(A) is actually a polynomial, and not some sort of
infinite series.
Example 1.14
Show that P_n(R) = span{1, x, x^2, . . . , x^n}.
Solution. Hopefully this is intuitive to you, but let’s be a little bit careful about this. Since we’re
showing an equality of sets, we need to perform a double subset inclusion.
(⊆) Every degree n polynomial is of the form p(x) = a_0 · 1 + a_1 · x + · · · + a_n · x^n, with the right hand side being a linear combination of 1, x, . . . , x^n. Hence P_n(R) ⊆ span{1, x, . . . , x^n}.
(⊇) Clearly each element of {1, x, x^2, . . . , x^n} is in P_n(R), so we need to argue that so too are their linear combinations. However, P_n(R) is closed under addition and scalar multiplication, and hence every linear combination of these elements is in P_n(R). Thus span{1, x, . . . , x^n} ⊆ P_n(R).
You should have some experience with spanning sets already, and using Gaussian elimination
to determine when a vector lies in the span of a set. Don’t be thrown by the more abstract setting,
as the general technique remains the same.
Example 1.15
Show that 2x^2 − 5x + 1 is in the span of {x^2, 1 + x, x^2 − x + 1} in P_2(R).
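Since the usual approach is Gaussian elimination on coefficient vectors, here is a hedged computational sketch of that idea (the [constant, x, x^2] encoding and the use of NumPy are illustrative assumptions, not notation from these notes):

import numpy as np

# Encode a + bx + cx^2 in P_2(R) as the coefficient vector [a, b, c].
spanning_set = np.array([
    [0.0,  0.0, 1.0],   # x^2
    [1.0,  1.0, 0.0],   # 1 + x
    [1.0, -1.0, 1.0],   # x^2 - x + 1
]).T                    # columns are the spanning vectors

target = np.array([1.0, -5.0, 2.0])   # 2x^2 - 5x + 1

# Solve spanning_set @ coeffs = target; a solution exists exactly when the
# target lies in the span of the three polynomials.
coeffs, residual, rank, _ = np.linalg.lstsq(spanning_set, target, rcond=None)
print(coeffs)                                      # approximately [-1. -2.  3.]
print(np.allclose(spanning_set @ coeffs, target))  # True: the polynomial is in the span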
Theorem 1.17
Let V be a vector space and A ⊆ V an arbitrary non-empty subset of V. The set span(A) is a subspace of V. Moreover, A ⊆ span(A), and span(A) is the smallest subspace containing A.
1. The zero vector is the trivial linear combination. Choose any element v ∈ A, so that 0 = 0v.
Thus span(A) contains the zero vector.
3. Let u be as in part (2) above and fix some arbitrary real number d ∈ R. We have
du = \sum_{i=1}^{n} (dc_i)w_i,
which is again a finite linear combination of elements of A, so du ∈ span(A).
Hopefully you recall that Rn , while naturally spanned by the standard basis {ei }, can be spanned
by many other sets as well. For example, the sets
\left\{ \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \end{bmatrix} \right\}, \quad\text{or}\quad \left\{ \begin{bmatrix} \pi \\ e \end{bmatrix}, \begin{bmatrix} 1 \\ 5 \end{bmatrix} \right\}, \quad\text{or}\quad \left\{ \begin{bmatrix} 11 \\ 12 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 2 \\ 3 \end{bmatrix} \right\}
all span R2 , and there are infinitely many other possibilities. There are similarly other spanning
sets for the other vector spaces we have seen.
Theorem 1.17 will help us check that two spanning sets generate the same space. Suppose S, T are subsets of some vector space V and we want to show that span(S) = span(T). According to Theorem 1.17, if we can show S ⊆ span(T) then span(S) ⊆ span(T). Similarly, showing that T ⊆ span(S) implies that span(T) ⊆ span(S). The two inclusions together give equality.
Example 1.18
Show that S = {x^2, x + 1, x^2 − x + 1} is a spanning set for P_2(R).
Solution. Clearly {x^2, x + 1, x^2 − x + 1} ⊆ P_2(R), so by Theorem 1.17 we have span(S) ⊆ P_2(R). Conversely, we know that P_2(R) = span{1, x, x^2}. It suffices to show that each of these elements is in span(S), since by Theorem 1.17 we then know that P_2(R) = span{1, x, x^2} ⊆ span(S).
You can check that
x^2 = x^2 + 0(x + 1) + 0(x^2 − x + 1)
x = \frac{1}{2}(x^2) + \frac{1}{2}(x + 1) − \frac{1}{2}(x^2 − x + 1)
1 = −\frac{1}{2}(x^2) + \frac{1}{2}(x + 1) + \frac{1}{2}(x^2 − x + 1),
giving the other subset inclusion.
Exercises
1.3-1. Let S = {ei : i = 1, . . . , n} be the set of standard basis vectors in Rn . Show that Rn =
span(S).
1.3-2. Suppose we are working in Mm×n (R), and we define Eij to be the m × n matrix which is 1
in position (i, j), and zero everywhere else. For example, in M2×2 we have
E_{11} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \quad E_{12} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \quad E_{21} = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \quad E_{22} = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}.
This is similar to the standard basis vectors ei above, but now in matrix form. Show that
Mm×n (R) = span {Eij : 1 ≤ i ≤ m, 1 ≤ j ≤ n}.
1.3-3. Find a finite set of vectors S which spans P_3(R) and which is not the standard monomials {1, x, x^2, x^3}.
1.3-6. Suppose that S = {A_1, . . . , A_k} ⊆ M_n(R) is a spanning set for M_n(R). Show that S′ = {A_1^T, . . . , A_k^T} also spans M_n(R).
1.3-7. Suppose that S = {p_1, . . . , p_k} ⊆ P_n(R) is such that P_n(R) = span(S). Show that for every x ∈ R, there exists a p_i ∈ S such that p_i(x) ≠ 0.
1.3-8. Show that if V is any vector space, then V always admits a spanning set; that is, there is a subset S ⊆ V such that span(S) = V.
1.3-9. Suppose V is a vector space and {v_1, . . . , v_n} is a set of vectors in V. Show that if w ∈ span{v_1, . . . , v_n} is non-zero, then there exists some i ∈ {1, . . . , n} such that
span{v_1, . . . , v_{i−1}, w, v_{i+1}, . . . , v_n} = span{v_1, . . . , v_n};
that is, there is some vector v_i that can be replaced by w without affecting the span.
1.4 Linear Independence

Consider the spanning set
\left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} -1 \\ 1 \end{bmatrix} \right\}
of R^2. A vector v ∈ R^2 can be written as a linear combination of these vectors in many different ways. In fact, you know that there are infinitely many linear combinations that give v, since as a linear system the spanning set has a coefficient matrix
A = \begin{bmatrix} 1 & 0 & 1 & -1 \\ 0 & 1 & 1 & 1 \end{bmatrix}
which has rank 2. Thus any solution to Ax = v will have two parameters in its solution set. You
can check that the solutions to Ax = v are
x = s\begin{bmatrix} -1 \\ -1 \\ 1 \\ 0 \end{bmatrix} + t\begin{bmatrix} 1 \\ -1 \\ 0 \\ 1 \end{bmatrix} + \begin{bmatrix} 1 \\ 2 \\ 0 \\ 0 \end{bmatrix}, \qquad s, t ∈ R,
so that the first solution above is (s, t) = (0, 0), the second is (s, t) = (3/2, 1/2), and the third is
(s, t) = (4, 5).
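A short numerical check of the structure of this solution set (a hedged sketch with NumPy; the particular solution used is the one displayed above):

import numpy as np

A = np.array([[1.0, 0.0, 1.0, -1.0],
              [0.0, 1.0, 1.0,  1.0]])

particular = np.array([1.0, 2.0, 0.0, 0.0])
n1 = np.array([-1.0, -1.0, 1.0, 0.0])
n2 = np.array([ 1.0, -1.0, 0.0, 1.0])

# The two direction vectors lie in the null space of A, so adding any
# combination s*n1 + t*n2 to the particular solution leaves A @ x unchanged.
print(A @ n1, A @ n2)                          # [0. 0.] [0. 0.]
for s, t in [(0.0, 0.0), (1.5, 0.5), (4.0, 5.0)]:
    x = s * n1 + t * n2 + particular
    print(A @ x)                               # always the same right hand side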
There are a few problems here. The first is that we didn’t need four vectors to span R2 . In
fact, any two of these vectors would have spanned R2 . The second is that when we wrote v as a
linear combination of this spanning set, there was no unique way of doing this. The source of this
issue is the fact that these vectors lie within each other's spans. For example,
\begin{bmatrix} 1 \\ 1 \end{bmatrix} = 1 · \begin{bmatrix} 1 \\ 0 \end{bmatrix} + 1 · \begin{bmatrix} 0 \\ 1 \end{bmatrix} \quad\text{or}\quad \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \frac{1}{2}\begin{bmatrix} 1 \\ 1 \end{bmatrix} − \frac{1}{2}\begin{bmatrix} -1 \\ 1 \end{bmatrix} \quad\text{or}\quad \begin{bmatrix} 1 \\ 0 \end{bmatrix} = 1 · \begin{bmatrix} 1 \\ 1 \end{bmatrix} − 1 · \begin{bmatrix} 0 \\ 1 \end{bmatrix}.
We have redundancy, and we can fix our issues by removing that redundancy.
It’s annoying to ask whether one of these vectors lies in the span of the other three, and to do
that for each vector. Instead, we can succinctly combine these into a single statement: If there are
any non-trivial solutions to
c_1\begin{bmatrix} 1 \\ 0 \end{bmatrix} + c_2\begin{bmatrix} 0 \\ 1 \end{bmatrix} + c_3\begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_4\begin{bmatrix} -1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
then there are redundant vectors, since we can re-arrange the equation to write one vector as a
linear combination of the others. For example, we know
1 · \begin{bmatrix} 1 \\ 0 \end{bmatrix} − 3\begin{bmatrix} 0 \\ 1 \end{bmatrix} + 1 · \begin{bmatrix} 1 \\ 1 \end{bmatrix} + 2\begin{bmatrix} -1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
so that we can re-write this as
\begin{bmatrix} 1 \\ 0 \end{bmatrix} = 3\begin{bmatrix} 0 \\ 1 \end{bmatrix} − \begin{bmatrix} 1 \\ 1 \end{bmatrix} − 2\begin{bmatrix} -1 \\ 1 \end{bmatrix} \quad\text{or}\quad \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \frac{1}{3}\begin{bmatrix} 1 \\ 0 \end{bmatrix} + \frac{1}{3}\begin{bmatrix} 1 \\ 1 \end{bmatrix} + \frac{2}{3}\begin{bmatrix} -1 \\ 1 \end{bmatrix} \quad\text{etc.}
Definition 1.19
If V is a vector space, a non-empty finite set of vectors {v1 , . . . , vn } is said to be linearly
independent if
c1 v1 + c2 v2 + · · · + cn vn = 0
implies that c1 = c2 = · · · = cn = 0; that is, the only solution to this system of equations
is the trivial solution. If S ⊆ V is an infinite set, we say that S is linearly independent if
any finite subset of S is linearly independent in V . A set of vectors which is not linearly
independent is said to be linearly dependent.
Linear independence is stated in a roundabout way, so you should take some time to stew over
the definition. Alluding to our discussion above, the intuition is that linearly dependent vectors
have redundancy: some of the vectors lie in the span of the others. Thus linearly independent
vectors are the opposite: They cannot be written as a linear combination of one another.
Example 1.20
Determine whether the set S = {x + 1, 2x − 1, x^2 + 3x} is linearly independent in P_2(R).
Solution. Suppose that c_1(x + 1) + c_2(2x − 1) + c_3(x^2 + 3x) = 0. Remember that the 0 on the right hand side is the zero polynomial. If we collect like terms, this becomes
c_3x^2 + (c_1 + 2c_2 + 3c_3)x + (c_1 − c_2) = 0x^2 + 0x + 0.
For this to be true, each of the coefficients must be zero, giving a linear system
c_3 = 0
c_1 + 2c_2 + 3c_3 = 0
c_1 − c_2 = 0,
so we conclude that the only solution is the trivial solution c_1 = c_2 = c_3 = 0. This is precisely the condition for linear independence, and so S is linearly independent.
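The same conclusion can be reached by a rank computation on the coefficient matrix (a hedged sketch; the [constant, x, x^2] encoding and NumPy are assumptions made for illustration):

import numpy as np

# Columns are the coefficient vectors of x+1, 2x-1 and x^2+3x,
# with rows ordered as [constant, x, x^2].
M = np.array([
    [1.0, -1.0, 0.0],
    [1.0,  2.0, 3.0],
    [0.0,  0.0, 1.0],
])

# The homogeneous system M @ c = 0 has only the trivial solution exactly
# when M has full column rank, i.e. rank 3 here.
print(np.linalg.matrix_rank(M))   # 3, so the set is linearly independent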
I mentioned previously that linear independence would fix the issue of non-uniqueness of rep-
resentation. This next theorem shows this is true very quickly.
Theorem 1.21
Let V be a vector space and S = {v_1, . . . , v_n} ⊆ V a linearly independent set. If w ∈ span(S), then the coefficients expressing w in terms of the elements of S are unique.

Proof. Suppose that w can be written in two ways, say
w = c_1v_1 + c_2v_2 + · · · + c_nv_n = d_1v_1 + d_2v_2 + · · · + d_nv_n.
We’re going to show that ci = di for each i, and therefore the coefficients are unique. To do this,
let's subtract the two expressions above to get
0 = (c_1 − d_1)v_1 + (c_2 − d_2)v_2 + · · · + (c_n − d_n)v_n.
Now since S is linearly independent, the only way this can happen is if each of the coefficients c_i − d_i is zero; namely, c_i = d_i for each i = 1, . . . , n. Thus the coefficients are unique.
Exercises
1.5 Basis and Dimension
If V is a vector space, our major goal is to find a set of vectors which span V , but do so in the most
efficient way possible (are linearly independent). These are competing notions: As we add more
vectors to a set, we may increase the span, but decrease the likelihood of being linearly independent.
In fact, we can make this argument precise:
Theorem 1.22
Let V be a vector space. If S is any spanning set of V , and T is any linearly independent
subset of V , then |T | ≤ |S|.
Proof. This theorem is true even when S and T are infinite, though in that case the proof is a
bit harder (see Exercise 1.5-5). Thus suppose S = {v1 , . . . , vn } is a finite spanning set for V , and
T = {w1 , . . . , wk } is a finite linearly independent set.
Since V = span(S), we know that w1 ∈ span(S). By Exercise 1.3-9, we can replace one
of the vi with w1 . By re-arranging if necessary, let’s say that we replace v1 with w1 , so that
V = span {w1 , v2 , . . . , vn }. We can continue this inductively, replacing v2 with w2 , v3 with w3 ,
and so on.
For the sake of contradiction, suppose that k > n (that is, |T | > |S|). The induction we
performed above ensures that we can write V = span {w1 , . . . , wn }. But since k > n there are
still wi , i ∈ {n + 1, n + 2, . . . , k − 1, k} remaining that are not part of this span. Let’s look at
wn+1 , which we know must be in span {w1 , . . . , wn } = V . But since wn+1 is non-zero, there are
coefficients c_1, . . . , c_n – not all of which are zero – such that
w_{n+1} = c_1w_1 + c_2w_2 + · · · + c_nw_n.
But then c_1w_1 + · · · + c_nw_n − w_{n+1} = 0 is a non-trivial linear combination of elements of T equal to zero, contradicting the linear independence of T. Hence k ≤ n; that is, |T| ≤ |S|.
Having established that linearly independent sets are always smaller than spanning sets, the
question then becomes whether we can find a “Goldilocks” set; namely, a set which is both linearly
independent and spans the vector space.
Definition 1.23
If V is a vector space, a basis for V is any set B which is both linearly independent and
spans V .
away, we would preserve linear independence, but we would no longer span the vector space. It
then seems reasonable to presume that all bases must be the same size.
Proposition 1.24
If V is a vector space with bases B1 and B2 , then |B1 | = |B2 |; that is, all bases are the same
size.
Proof. Since B1 is linearly independent and B2 spans V , by Theorem 1.22 we know that |B1 | ≤ |B2 |.
But the same argument holds in reverse: B2 is linearly independent and B1 spans V , so by Theorem
1.22 we know |B2 | ≤ |B1 |. Both inequalities imply that |B1 | = |B2 |.
Definition 1.25
If V is a vector space, the dimension of V – written dim(V ) – is the cardinality of any of its
bases.
The fact that every vector space has a basis is Exercise 1.5-5c. The fact that bases have a unique
size is Proposition 1.24. Thus talking about the dimension of a vector space is a well-defined notion.
Examples of common bases and the dimensions of their corresponding vector spaces are as follows:
• R^n has basis {e_1, . . . , e_n}, so dim(R^n) = n.
• P_n(R) has basis {1, x, . . . , x^n}, so dim(P_n(R)) = n + 1.
• M_{m×n}(R) has basis {E_{ij} : 1 ≤ i ≤ m, 1 ≤ j ≤ n}, so dim(M_{m×n}(R)) = mn.
Of course, the aforementioned bases aren’t the only bases for their respective vector spaces. The
only thing we can guarantee is that two different bases always have the same number of vectors.
Knowing the dimension of a vector space, how do we find a basis? There are two main ideas:
We can start from a linearly independent set, and continue adding new vectors while maintaining
linear independence until we span the entire space. Alternatively, we can take a spanning set,
and remove redundant vectors until we are left with something which is linearly independent. To
convince ourselves that this procedure will eventually result in a basis, we need the following:
Proposition 1.26
Let V be a vector space with dim(V) = n.
1. If T ⊆ V is linearly independent and |T| = n, then T is a basis for V.
2. If S ⊆ V is a spanning set and |S| = n, then S is a basis for V.
This proposition states that we don’t need to worry about both linear independence and span,
so long as we have the right number of vectors.
Proof. I will do the proof of (1), leaving the proof of (2) as an exercise (Exercise 1.5-1). Suppose
then that T = {v1 , . . . , vn } is linearly independent and dim(V ) = n. Fix a basis B of V , so that
|B| = n. We need to show that span(T ) = V , so suppose for the sake of contradiction that this
is not the case. Choose some w ∈ V \ span(T ), and note that T ∪ {w} is linearly independent by
Exercise 1.4-9. But since T ∪ {w} is linearly independent and B is a spanning set, Theorem 1.22
tells us that |T ∪ {w} | ≤ |B|, which in turn implies that n + 1 ≤ n. This is a contradiction, so it
must be the case that span(T) = V.
Proposition 1.27
Let V be a finite dimensional vector space.
1. Any linearly independent set T ⊆ V can be extended to a basis of V.
2. Any spanning set S ⊆ V can be reduced to a basis of V; that is, S contains a basis of V.
Proof. I will do the proof of (1) and leave (2) to Exercise 1.5-2. Let T be a linearly independent
set, and suppose dim(V ) = n. If span(T ) = V then we’re done, so assume that span(T ) ⊂ V .
Choose a vector w1 ∈ V \ span(T ), so that T1 = T ∪ {w1 } is still linearly independent, according
to Exercise 1.4-9. If span(T_1) = V we're done, and if not we inductively choose a w_i ∈ V \ span(T_{i−1}) and define T_i = T_{i−1} ∪ {w_i}. Again, T_i is linearly independent.
This process must terminate. Indeed, |T_i| = |T_{i−1}| + 1, so the sizes of our linearly independent sets are strictly increasing, and by Theorem 1.22 they can never exceed n. Per Proposition 1.26, once |T_i| = n, we will have a basis.
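The proof of (1) is effectively an algorithm. Here is a hedged sketch of it for subspaces of R^n, where "w ∉ span(T_{i−1})" can be tested with a rank computation (the helper name extend_to_basis and the use of NumPy are assumptions made for illustration):

import numpy as np

def extend_to_basis(vectors, n):
    """Greedily extend a linearly independent list of vectors in R^n to a basis,
    following the proof of Proposition 1.27: keep adjoining vectors that are
    not already in the span until the set has n elements."""
    basis = [np.asarray(v, dtype=float) for v in vectors]
    for candidate in np.eye(n):              # candidate new vectors: e_1, ..., e_n
        if len(basis) == n:
            break
        trial = np.column_stack(basis + [candidate])
        if np.linalg.matrix_rank(trial) == len(basis) + 1:   # candidate not in span
            basis.append(candidate)
    return basis

# Extend the independent set {(1, 1, 0), (0, 1, 1)} to a basis of R^3.
for b in extend_to_basis([[1, 1, 0], [0, 1, 1]], 3):
    print(b)

Trying the standard basis vectors as candidates always succeeds: if the current span is not all of R^n, some e_i must lie outside it.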
Proposition 1.28
If V is a finite dimensional vector space and U ⊆ V is a subspace, the following are true:
1. dim(U) ≤ dim(V);
Proof. 1. For the sake of contradiction assume dim(U ) > dim(V ). Fix a basis B for U . By
definition we know that |B| = dim(U ) > dim(V ). Moreover, since B is linearly independent
in U , it is also linearly independent in V , and so |B| ≤ dim(V ). But this contradicts the fact
that |B| = dim(U ) > dim(V ), so dim(U ) ≤ dim(V ).
3. If B is a basis of U, then |B| = dim(U) = dim(V). Thus B is a linearly independent set of size dim(V), and hence is a basis of V by Proposition 1.26.
Exercises
1.5-1. Finish the proof of Proposition 1.26 by proving that if S ⊆ V is a spanning set, and |S| =
dim(V ), then S is a basis for V .
1.5-2. Prove item (2) in Proposition 1.27. Hint: Use Exercise 1.3-5 to inductively reduce the size of
your spanning set, and argue that eventually this process must terminate.
1.5-3. (a) Show that a basis is a maximal linearly independent set; that is, if B is a basis and T is
a linearly independent set such that B ⊆ T , then B = T .
(b) Show that a basis is a minimal spanning set; that is, if B is a basis and S is a spanning
set such that S ⊆ B, then B = S.
1.5-4. Let C[0, 1] denote the set of continuous functions f : [0, 1] → R. Show that C[0, 1] is an
infinite dimensional vector space. Hint: Do not try to find a basis for C[0, 1].
1.5-5. (Hard) Here we’ll prove Theorem 1.22. This is not difficult, but the language is probably
new and unfamiliar. It is therefore easy to become flustered or overwhelmed. You should not
worry about this proof at all, it is only for advanced students who are interested in seeing
how we deal with the infinite case. To prove our theorem, we will need the following famous
result (which it turns out is equivalent to the Axiom of Choice):
Zorn’s Lemma: “If P is a partially ordered set such that every chain in P has an
upper bound, then P has a maximal element.”
(a) Prove that if S is a spanning set of a vector space V , and T ⊆ S is a linearly
independent set, then there exists a basis B such that T ⊆ B ⊆ S. To do this,
consider the set M of all linearly independent subsets of V containing T , with
a partial ordering induced by subset inclusion.
i. Argue that every chain in M has an upper bound, by taking the union over
all elements in the chain.
ii. Use Zorn’s Lemma to argue that M has a maximal element, say B.
iii. Show that B spans V by using Exercise 1.4-9.
(b) Prove Theorem 1.22 by letting Ŝ = S ∪ T , and apply the theorem from part
(a).
(c) Conclude from part (a) that every vector space has a basis.
1.6 Operations on Vector Spaces

1.6.1 Internal Direct Sum

Suppose V is an arbitrary vector space with subspaces U and W. Our goal is to ask ourselves
if there is some sense in which every element of V can be broken down into something in U and
something in W .
For example, let F (R) = {f : R → R} denote the set of all real-valued functions on R. This is
a vector space, and it's not too hard to check that
E = {f ∈ F(R) : f(−x) = f(x) for all x ∈ R} \quad\text{and}\quad O = {f ∈ F(R) : f(−x) = −f(x) for all x ∈ R},
the set of even and odd functions respectively, are subspaces. Critical to our discussion will be
the fact that the only function which is both even and odd is the zero function 0(x) = 0; that is,
E ∩ O = {0}.
Now I claim that every function f ∈ F (R) can be written uniquely as f = g + h, where g ∈ E
and h ∈ O. To see this, fix such an f , and define
g(x) = \frac{f(x) + f(−x)}{2} \quad\text{and}\quad h(x) = \frac{f(x) − f(−x)}{2}.
You can quickly check that g ∈ E and h ∈ O, and that g + h = f. Moreover, if there is another pair of functions ĝ ∈ E and ĥ ∈ O such that f = ĝ + ĥ, then subtracting f = g + h would give
0 = (g − ĝ) + (h − ĥ), \quad\text{so that}\quad ĝ − g = h − ĥ,
where ĝ − g ∈ E and h − ĥ ∈ O.
Since the left hand side – an even function – is equal to the right hand side – an odd function – it
must be the case that each side is identically the 0 function. Thus ĝ = g and ĥ = h, showing that
this decomposition is unique.
We conclude that there’s a sense in which F (R) = E + O, and it’s this plus sign that we have
to decipher.
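Here is a small numerical illustration of the even/odd decomposition (a sketch only; the particular function f below is an arbitrary choice, not one from the notes):

import numpy as np

def f(x):
    return np.exp(x) + x**3          # an arbitrary function, neither even nor odd

def even_part(x):
    return (f(x) + f(-x)) / 2        # the g of the text

def odd_part(x):
    return (f(x) - f(-x)) / 2        # the h of the text

xs = np.linspace(-2, 2, 9)
# g is even, h is odd, and g + h recovers f, exactly as claimed.
print(np.allclose(even_part(xs), even_part(-xs)))          # True
print(np.allclose(odd_part(xs), -odd_part(-xs)))           # True
print(np.allclose(even_part(xs) + odd_part(xs), f(xs)))    # True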
Definition 1.29
Suppose V is a vector space with U, W subspaces. We define the sum U + W =
{u + w : u ∈ U, w ∈ W }.
You can check in general that U + W and U ∩ W are subspaces of V . To mimic what we had
above with F (R) = E + O, we introduce the following definition:
Definition 1.30
If V is a vector space with subspaces U and W such that U + W = V and U ∩ W = {0}, we
say that V is the internal direct sum of U and W and write V = U ⊕ W .
Proposition 1.31
Let V be a vector space with subspaces U and W. The following are equivalent:
1. V = U ⊕ W;
2. Every v ∈ V can be written uniquely as v = u + w where u ∈ U and w ∈ W;
3. If B_U is a basis for U and B_W is a basis for W, then B = B_U ∪ B_W is a basis for V.
Proof. [1 ⇒ 2] The proof is almost identical to the proof that F (R) = E+O above, with appropriate
changes made.
[2 ⇒ 3] If v ∈ V, write v = u + w. Since B_U and B_W are bases, we can write u = \sum_i c_iu_i and w = \sum_j d_jw_j, and so v = \sum_i c_iu_i + \sum_j d_jw_j, showing that B spans V. To show linear
independence, suppose that \sum_{i=1}^{k} c_iu_i + \sum_{j=1}^{ℓ} d_jw_j = 0.
Now by assumption, there is exactly one way to write 0 ∈ V as the sum of two vectors from U and
W . Since 0 ∈ U ∩ W , we know that 0 = 0 + 0 is one such way, and hence must be the only such
way. We can immediately conclude that
\sum_{i=1}^{k} c_iu_i = 0 \quad\text{and}\quad \sum_{j=1}^{ℓ} d_jw_j = 0.
Since B_U and B_W are bases, we conclude that c_i = 0 for all i = 1, . . . , k and d_j = 0 for all j = 1, . . . , ℓ, showing that B is linearly independent.
[3 ⇒ 1] It suffices to show that U + W = V and U ∩ W = {0}. In the first case, let v ∈ V. Since B is a basis for V, we can write v = \sum_i c_iu_i + \sum_j d_jw_j. Clearly u = \sum_i c_iu_i ∈ U and w = \sum_j d_jw_j ∈ W, so v = u + w shows that V ⊆ U + W. The other inclusion is trivial.
To show that U ∩ W = {0}, suppose v ∈ U ∩ W. Since v ∈ U and v ∈ W, there exist coefficients c_i, i = 1, . . . , k and d_j, j = 1, . . . , ℓ such that
v = \sum_{i=1}^{k} c_iu_i = \sum_{j=1}^{ℓ} d_jw_j.
Since B is a basis, we conclude that c_i = 0 for all i = 1, . . . , k and d_j = 0 for all j = 1, . . . , ℓ. Hence
v = 0, showing that the only point in U ∩ W is the zero vector, as required.
Since the bases for U and W combine to give a basis for V , we can immediately conclude the
following:
Corollary 1.32
If V = U ⊕ W with U and W finite dimensional, then dim(V) = dim(U) + dim(W).
Example 1.33
\mathrm{proj}_v(z) = \frac{⟨z, v⟩}{⟨v, v⟩}v
be the usual projection operator. It is easy to see that projv (u) ∈ U . If we take w = z − projv (z)
we claim that w ∈ U ⊥ . Indeed,
v · w = v · \left( z − \frac{⟨z, v⟩}{⟨v, v⟩}v \right) = ⟨v, z⟩ − \frac{⟨z, v⟩}{⟨v, v⟩}⟨v, v⟩ = 0.
1.6.2 External Direct Sum

Let V and W be vector spaces (not contained in a larger, ambient vector space). As sets, we can
make sense of the Cartesian product V × W . We can endow V × W with a vector space structure
by defining addition and scalar multiplication to be component-wise. Thus
(v_1, w_1) + (v_2, w_2) = (v_1 + v_2, w_1 + w_2) \quad\text{and}\quad c(v, w) = (cv, cw).
Don’t prove the vector space axioms from scratch, but think about this for a while and convince
yourself why they must obviously be true. The external direct sum, written V ⊕ W , is the set
V × W with this notion of addition and multiplication.
For example, we can naturally define a vector space structure on R2 ⊕ P1 (R) by saying that
\left( \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}, a + bx \right) + \left( \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}, c + dx \right) = \left( \begin{bmatrix} v_1 + w_1 \\ v_2 + w_2 \end{bmatrix}, (a + c) + (b + d)x \right).
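A minimal sketch of these componentwise operations, storing each element of R^2 ⊕ P_1(R) as a (vector, polynomial-coefficient) pair — the representation is an assumption made purely for illustration:

import numpy as np

# An element of R^2 (+) P_1(R) is a pair (v, p), where v is in R^2 and
# p = [a, b] stands for the polynomial a + b x.
def add(x, y):
    return (x[0] + y[0], x[1] + y[1])     # add each component in its own space

def scale(c, x):
    return (c * x[0], c * x[1])

u = (np.array([1.0, 2.0]), np.array([3.0, -1.0]))   # ((1, 2), 3 - x)
w = (np.array([0.0, 4.0]), np.array([1.0,  2.0]))   # ((0, 4), 1 + 2x)

print(add(u, w))      # (array([1., 6.]), array([4., 1.]))   i.e. ((1, 6), 4 + x)
print(scale(3.0, u))  # (array([3., 6.]), array([ 9., -3.])) i.e. ((3, 6), 9 - 3x)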
The more interesting case is when we take the direct sum of infinitely many vector spaces. Let
Vi , i ∈ N be a countable collection of vector spaces (See Exercise 1.6-4 for the generalization to
uncountable families). The Cartesian product
\prod_{i=1}^{∞} V_i = {(v_1, v_2, v_3, . . .) : v_i ∈ V_i}
is the set of all sequences such that the ith element of the sequence comes from V_i. This is a vector
space under componentwise addition and scalar multiplication. However, it’s not the direct sum!
Without going into the details of why it's not the direct sum, let's worry about defining the direct sum.

The direct sum \bigoplus_{i=1}^{∞} V_i is the subspace of \prod_{i=1}^{∞} V_i consisting of sequences v = (v_1, v_2, . . .) which are cofinitely zero; that is, all but finitely many of the v_i are zero. You will confirm that \bigoplus_{i=1}^{∞} V_i is a subspace in Exercise 1.6-3.
1.6.3 Quotient Spaces

Suppose that V is a vector space, and U is a subspace. We can define a relation on V by saying
that x ∼ y if x − y ∈ U .
Proposition 1.34
The relation above is an equivalence relation; that is, it is reflexive, symmetric, and transitive.
Proof. Recall that the three critical properties of a subspace are that it contains the zero vector,
is closed under addition, and is closed under scalar multiplication. We’ll see that each of these
properties corresponds to exactly one of the three equivalence relation properties.
• Reflexive: If x ∈ V then x − x = 0 ∈ U, so x ∼ x.
• Symmetric: If x ∼ y then x − y ∈ U. Since U is closed under scalar multiplication, y − x = (−1)(x − y) ∈ U, so y ∼ x.
• Transitive: If x ∼ y and y ∼ z then x − y ∈ U and y − z ∈ U. Since U is closed under addition, x − z = (x − y) + (y − z) ∈ U, so x ∼ z.
Proposition 1.35
Let V be a vector space and U ⊆ V a subspace, and let V/U denote the set of equivalence classes of the relation above. The operations [x] + [y] = [x + y] and c[x] = [cx] are well defined, and make V/U into a vector space.
Proof. We need to check that these operations are well-defined; that is, they don't depend on which representative of the equivalence class we choose. Let [x], [y] ∈ V/U and choose two representatives of each, say x, x′ ∈ [x] and y, y′ ∈ [y]. By definition, this means x − x′ ∈ U and y − y′ ∈ U. Our goal is to show that x + y ∼ x′ + y′ so that [x + y] = [x′ + y′], and similarly for scalar multiplication. Now
(x + y) − (x′ + y′) = (x − x′) + (y − y′) ∈ U,
since both x − x′ and y − y′ lie in U,
and similarly,
cx − cx′ = c(x − x′) ∈ U, \quad\text{since}\quad x − x′ ∈ U,
so both operations are well defined. From this, all the other vector space axioms follow from the
fact that V is a vector space, and so don’t need to be shown.
With this vector space structure on V/U, note that every element of U is in the equivalence class of the zero vector.
Example 1.36
Let V = R^2 and let U = span{[1 1]^T}. Show that dim(V/U) = 1.
Solution. Let’s play around with this a bit first. So immediately, we know that
\left[\begin{bmatrix} 1 \\ 1 \end{bmatrix}\right] = \left[\begin{bmatrix} 0 \\ 0 \end{bmatrix}\right] \quad\text{since}\quad \begin{bmatrix} 1 \\ 1 \end{bmatrix} − \begin{bmatrix} 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix} ∈ U.
More generally, [[t t]^T] = [[0 0]^T] for any t ∈ R. If we think of R^2 as the plane, this says that
every vector along the line y = x is in the equivalence class of the zero vector.
This generalizes further. The equivalence classes are precisely the lines parallel to y = x. For example, take v = [1 −1]^T. Note that elements in [v] look like
[v] = \left\{ \begin{bmatrix} 1 + t \\ −1 + t \end{bmatrix} : t ∈ R \right\} = \left\{ \begin{bmatrix} x \\ y \end{bmatrix} : y = x − 2 \right\}.
[Figure 1.1: The vector space R^2 / span{[1 1]^T}. Every point in this quotient space is a line parallel to y = x.]
A useful thing to do in these situations is to find a “moduli space;” in this case, a subset of R2
in bijection with V /U which captures the vector space structure of V /U . In this case, I’m going to
use the y-axis W = {(0, y) : y ∈ R}.
Claim: There is a bijective correspondence between W and V/U. First, if [x] ∈ V/U then there is a representative w ∈ W such that [w] = [x]. Indeed, choose an arbitrary representative x = [x_1 x_2]^T ∈ V of [x], so that [x] = { [x_1 + t  x_2 + t]^T : t ∈ R }. Taking t = −x_1 gives
\left[\begin{bmatrix} 0 \\ x_2 − x_1 \end{bmatrix}\right] = \left[\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right] = [x],
and [0  x_2 − x_1]^T ∈ W.
Next, this representative is unique. Indeed, suppose [x] ∈ V/U and w_1 = [0 w_1]^T, w_2 = [0 w_2]^T ∈ W are such that [w_1] = [x] = [w_2]. Then
w_1 − w_2 = \begin{bmatrix} 0 \\ w_1 \end{bmatrix} − \begin{bmatrix} 0 \\ w_2 \end{bmatrix} = \begin{bmatrix} 0 \\ w_1 − w_2 \end{bmatrix} ∈ U.
But every element of U is of the form [t t]^T for t ∈ R, so that w_1 − w_2 = 0, showing that w_1 = w_2.
Taking then the elements of W as our choice of representatives for V /U , we know that addition
and scalar multiplication are defined by acting on equivalence classes; that is,
\begin{bmatrix} 0 \\ w_1 \end{bmatrix} + \begin{bmatrix} 0 \\ w_2 \end{bmatrix} = \begin{bmatrix} 0 \\ w_1 + w_2 \end{bmatrix} \quad\text{and}\quad c\begin{bmatrix} 0 \\ w_1 \end{bmatrix} = \begin{bmatrix} 0 \\ cw_1 \end{bmatrix}.
This is precisely the vector space structure on the subspace W, and it's easy to see that [0 1]^T is a basis for this space. Thus dim(V/U) = 1.
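The moduli-space description lends itself to a small computational sketch: represent each class [x] by its canonical representative on the y-axis and do arithmetic there (the function names below are illustrative only; NumPy is assumed):

import numpy as np

def representative(x):
    """Canonical representative of [x] in R^2 / span{(1, 1)}: slide along the
    direction (1, 1) until the first coordinate is zero, as in the text."""
    x = np.asarray(x, dtype=float)
    return np.array([0.0, x[1] - x[0]])

def add_classes(x, y):
    return representative(np.asarray(x, dtype=float) + np.asarray(y, dtype=float))

def scale_class(c, x):
    return representative(c * np.asarray(x, dtype=float))

# (1, -1) and (3, 1) lie in the same equivalence class ...
print(representative([1, -1]), representative([3, 1]))           # both [ 0. -2.]
# ... and class arithmetic does not depend on which representatives we pick.
print(add_classes([1, -1], [5, 5]), add_classes([3, 1], [0, 0])) # both [ 0. -2.]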
Exercises
1.6-1. Let U and W be vector spaces, and let V = U ⊕ W be their external direct sum.
(a) Show that U × {0} is a subspace of V .
(b) Let Z ⊆ W be an arbitrary subspace of W . Show that U × Z is a subspace of V .
1.6-2. If V_i, i ∈ N is a countable family of vector spaces, show that \prod_{i∈N} V_i is a vector space under componentwise addition and scalar multiplication.
1.6-3. If V_i, i ∈ N is a countable family of vector spaces, show that \bigoplus_{i∈N} V_i is a subspace of \prod_{i∈N} V_i, and hence is a vector space itself.
1.6-4. Let I be an arbitrary set, and let (Vi )i∈I be a family of vector spaces indexed by I. Define
\prod_{i∈I} V_i = \left\{ f : I → \bigcup_{i∈I} V_i : ∀i ∈ I, f(i) ∈ V_i \right\}.
(a) Show that \prod_{i∈I} V_i is a vector space under pointwise addition and scalar multiplication.
(b) Define \bigoplus_{i∈I} V_i as the set of all functions which have finite support; that is, there are only finitely many points i ∈ I such that f(i) ≠ 0.
(c) Show that when I is countable, the above definition reduces to our usual definition.
2 Linear Transformations
Functions are a powerful mathematical tool: They give us the ability to indirectly study a space
itself, and to study the relationship between two spaces. Let’s quickly recall some important facts
about functions.
If A, B are two sets, a function is a map f : A → B which assigns to each element of A a single
element of B. If a ∈ A, we usually indicate its target under f as f(a). Here A is called the domain
of f, and B is the codomain. The range or image of a function is the set
image(f) = {f(a) : a ∈ A} ⊆ B;
namely, the image of a function is the set of points that are mapped onto from A. Also recall that
a function f : A → B is said to be injective if whenever f (x) = f (y) then x = y. It is said to be
surjective if B = image(f ). Two functions f, g : A → B are equal if f (x) = g(x) for all x ∈ A.
Now many students think of functions in terms of their representations. For example, the
function f : R → R given by f (x) = x2 is a perfectly fine function. When we write f (x) = x2 , we’re
giving an algorithm for how outputs are computed: If you give me a number x, its output is x2 .
Many (in fact, almost all) functions f : R → R cannot be prescribed using such a nice algorithm.
It’s essential to recognize that the algorithm is not the function!
For example, consider the functions f, g : {0, 1} → {0, 1}, given by f (x) = x and g(x) = x2 .
The algorithm for computing the output of these two functions is different. In the first case, we
just reproduce the input, while in the second case we first take the input and square it. However,
these two functions are equal:
f(0) = 0 = 0^2 = g(0) \quad\text{and}\quad f(1) = 1 = 1^2 = g(1).
This is a toy example, but we’ll see much more significant examples of this shortly.
2.1 Linear Maps

Vector spaces are your first exposure to the notion of "mathematical structure." A vector space V is not just a set, it is a set with additional structure. We can add vectors, and we can multiply them by real numbers. When we look at functions between vector spaces V and W, we want to restrict our attention to functions which allow us to relate the structure on V to the structure on W. For example, the map
T : R^2 → P_1(R), \qquad T\begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = v_1x + v_2,
preserves vector addition:
T\left( \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} + \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} \right) = T\begin{bmatrix} v_1 + w_1 \\ v_2 + w_2 \end{bmatrix}   (adding in R^2)
= (v_1 + w_1)x + (v_2 + w_2)
= (v_1x + v_2) + (w_1x + w_2)
= T\begin{bmatrix} v_1 \\ v_2 \end{bmatrix} + T\begin{bmatrix} w_1 \\ w_2 \end{bmatrix}.   (adding in P_1(R))
This is what we mean by “preserving addition:” When we applied T , we could add the vectors
first, then apply T ; or apply T to each vector separately, then add them. The order of applying T
and adding the vectors does not matter. You can easily check that this map also preserves scalar
multiplication; that is, T (cv) = cT (v).
Functions which do not preserve the algebraic structure of the vector space will not help us
study these sets as vector spaces, so we discard them. All of this discussion leads to the following
definition:
Definition 2.1
If V and W are vector spaces, a linear transformation T : V → W is a function such that
for all v, w ∈ V and c ∈ R we have
1. T (v + w) = T (v) + T (w)
2. T (cv) = cT (v).
The words function, map, transformation, and operator are all effectively synonyms. There are
slight differences in convention; for example, functions between vector spaces are often called trans-
formations, while linear transformations from a vector space V to itself are often called operators.
That being said, the word “linear map” often arises as well.
Example 2.2
is linear.
Solution. Fix v = [v_1 v_2 v_3]^T and w = [w_1 w_2 w_3]^T in R^3, and c ∈ R. We begin by showing
Example 2.3
Show that the map M : R^2 → R given by M([v_1 v_2]^T) = v_1v_2 is not linear.
Solution. Both addition and scalar multiplication fail here. You can work this out in general to see why it breaks down, but of course a counterexample is sufficient. Let c = 2 and v = [1 1]^T. Then
M(2v) = M\begin{bmatrix} 2 \\ 2 \end{bmatrix} = 4, \quad\text{while}\quad 2M\begin{bmatrix} 1 \\ 1 \end{bmatrix} = 2.
Example 2.4
Determine whether the determinant map det : M_2(R) → R, A ↦ det(A), is linear.
Solution. Despite being an important map in the study of linear algebra, the determinant map is
not linear. Indeed, we already know that det(A + B) 6= det(A) + det(B) from Example 1.11, so
there’s nothing left to do.
While it wasn’t stated explicitly, we know that the zero vector 0V is an important part of
the structure of a vector space V . One would reasonably expect that if T : V → W is a linear
transformation, the zero vectors should map to one another. Between this and a few other small
results, we have the following proposition:
Proposition 2.5
Let T : V → W be a linear transformation. The following are true:
1. T(0_V) = 0_W;
2. T(−v) = −T(v) for all v ∈ V;
3. T(c_1v_1 + c_2v_2 + · · · + c_nv_n) = c_1T(v_1) + c_2T(v_2) + · · · + c_nT(v_n) for all c_i ∈ R and v_i ∈ V.
Proof. The proof of (2) is straightforward, and (3) is an uncomplicated induction proof. Thus I’ll
do the proof for (1) and leave the others as an exercise.
We know that 0_V + 0_V = 0_V, thus
T(0_V) = T(0_V + 0_V) = T(0_V) + T(0_V).
Adding −T(0_V) to both sides gives 0_W = T(0_V).
An important result that we learned in a previous course is that every m × n matrix A defines a linear transformation T_A : R^n → R^m, x ↦ Ax. We learned that the converse was also true; namely, if T : R^n → R^m is a linear transformation, then we could write T(x) = Ax. Here A is determined by how T acts on the standard basis, since if x = \sum_i x_ie_i then
T(x) = T\left( \sum_i x_ie_i \right) = \sum_i x_iT(e_i) = \underbrace{\begin{bmatrix} T(e_1) & T(e_2) & \cdots & T(e_n) \end{bmatrix}}_{A} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.
In Section 1.5 we learned that we could use a basis as the atoms of a vector space. This conveniently reduces much of the work with a linear transformation to checking what it does on a basis, and allows us to extend the above result to more than just the standard basis in R^n:
Theorem 2.6: Linear Extension
Let V and W be finite dimensional vector spaces, and B = {v1 , . . . , vn } ⊆ V be a basis for V .
If for each i ∈ {1, . . . , n}, wi ∈ W is a vector in W , there is a unique linear transformation
T : V → W such that T (vi ) = wi .
Proof. For any vector v ∈ V we know that v can be written in the basis B using a unique set of coefficients; namely, there exist unique c_1, . . . , c_n ∈ R such that v = \sum_i c_iv_i. Define the map T : V → W by sending
\sum_i c_iv_i \longmapsto \sum_i c_iw_i.
Since the c_i are all unique, this map is well defined. It remains to check that it is linear, which
Since the ci are all unique, this map is well defined. It remains to check that it is linear, which
I’ve left to Exercise 2.1-5.
For uniqueness, suppose S : V → W is another linear transformation satisfying S(v_i) = w_i for each i. If v = \sum_i c_iv_i, then by linearity
S(v) = \sum_i c_iS(v_i) = \sum_i c_iw_i = T(v).
Thus T(v) = S(v) for all v ∈ V, showing that S = T. We conclude that T is unique.
Remark 2.7
1. Note that in the theorem statement, the wi do not need to be distinct. It is therefore
possible that multiple of the vi map to the same wi .
2. If you ever see a phrase along the lines of “Define the transformation such that ... and
extend linearly,” the “extend linearly” part is referring to this theorem.
3. As an immediate corollary to Theorem 2.6 which was directly shown as part of the
proof, any two linear transformations which agree on a basis are equal.
Example 2.8
Consider the basis of R^2 consisting of v_1 = [1 1]^T and v_2 = [1 −1]^T. Determine the linear transformation T : R^2 → R^3 which maps
T\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \\ -1 \end{bmatrix} \quad\text{and}\quad T\begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.
Solution. Any [a b]^T ∈ R^2 can be written in the basis {v_1, v_2} as
\begin{bmatrix} a \\ b \end{bmatrix} = \frac{a+b}{2}\begin{bmatrix} 1 \\ 1 \end{bmatrix} + \frac{a-b}{2}\begin{bmatrix} 1 \\ -1 \end{bmatrix},
so that
T\begin{bmatrix} a \\ b \end{bmatrix} = T\left( \frac{a+b}{2}\begin{bmatrix} 1 \\ 1 \end{bmatrix} + \frac{a-b}{2}\begin{bmatrix} 1 \\ -1 \end{bmatrix} \right) = \frac{a+b}{2}\begin{bmatrix} 2 \\ 0 \\ -1 \end{bmatrix} + \frac{a-b}{2}\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} a+b \\ 0 \\ -b \end{bmatrix}.
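Linear extension is also a concrete recipe you can compute with: write the input in the given basis, then combine the prescribed images. A hedged numerical sketch (the helper name T and the use of NumPy are assumptions for illustration):

import numpy as np

basis = np.column_stack([[1, 1], [1, -1]]).astype(float)         # v1, v2 as columns
images = np.column_stack([[2, 0, -1], [0, 0, 1]]).astype(float)  # T(v1), T(v2) as columns

def T(x):
    """Evaluate the linear extension: solve for the coefficients of x in the
    basis {v1, v2}, then take the same combination of the images."""
    coeffs = np.linalg.solve(basis, np.asarray(x, dtype=float))
    return images @ coeffs

print(T([1, 1]))    # [ 2.  0. -1.]
print(T([1, -1]))   # [0. 0. 1.]
print(T([3, 1]))    # [ 4.  0. -1.]  matches (a+b, 0, -b) with a = 3, b = 1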
Exercises
2.1-3. Let V, W be vector spaces, and let L(V, W ) denote the set of linear transformations from V
to W . Define addition and scalar multiplication pointwise; that is, if T, S ∈ L(V, W ) and
c ∈ R, define T + S and cT by
(T + S)(v) = T(v) + S(v) \quad\text{and}\quad (cT)(v) = cT(v).
Show that L(V, W) is a vector space under these operations.
Note: The space V ∗ = L(V, R) is called the dual space of V , and End(V) = L(V, V ) is known
as the endomorphisms of V .
2.1-4. Complete the proof of Proposition 2.5 by proving properties (2) and (3).
2.1-5. Complete the proof of Theorem 2.6 by showing that the given map is linear.
2.1-9. Suppose that T : V → W is a bijective linear transformation, and that {v1 , . . . , vk } is a basis
for V . Show that {T (v1 ), . . . , T (vk )} is a basis for W . Conclude that dim V = dim W .
2.1-10. Suppose V is a vector space and v is some non-zero element of V . Show that there is a linear
transformation T : V → R such that T (v) 6= 0.
2.1-13. Let V be a vector space. We saw from Exercise 2.1-3 that L(Rn , R) is a vector space, called
the dual space to Rn . Fix some v ∈ Rn , and define the map Tv : Rn → R by Tv (w) = v · w,
where v · w is the usual dot product.
2.1-14. (Challenging) If you've taken any multi-variable calculus, you've likely learned about directional derivatives. Recall that if v ∈ R^n and f : R^n → R is a C^∞ function, then
D_v(f)(w) = \lim_{t→0} \frac{f(w + tv) − f(w)}{t} = ∇f(w) · v.
Thus Dv (f ) : Rn → R.
Note: This exercise is not hard, it’s just that the bookkeeping – tracking what everything
does and where it should live – is tricky. However, this is an incredibly important construction
in mathematics, and is a worthwhile exercise.
2.2 The Kernel and Image

We're already familiar with the image of a function. Since the zero vector is such an important
element of a vector space, it has a dual notion as well called the kernel, which is the set of all things
which map to zero.
Definition 2.9
Let T : V → W be a linear transformation between vector spaces. The image of T is the set of all elements of W which are hit by some element of V,
image(T) = {T(v) : v ∈ V} ⊆ W.
The kernel of T is the set of all elements of V which map to the zero vector,
ker(T) = {v ∈ V : T(v) = 0}.
Proposition 2.10
If T : V → W is a linear transformation, then ker(T) is a subspace of V and image(T) is a subspace of W.

Proof. Let's start with the kernel of T. Clearly 0_V ∈ ker T since T(0_V) = 0_W, so it remains to
show addition and scalar multiplication. Let v, w ∈ ker(T ). We want to show that v + w ∈ ker(T ),
so
T(v + w) = T(v) + T(w) = 0_W + 0_W = 0_W,
and indeed v + w ∈ ker(T). Similarly, if c ∈ R and v ∈ ker(T), then
T(cv) = cT(v) = c0_W = 0_W,
so that cv ∈ ker(T).
For the image, we note that T (0) = 0, so 0 ∈ image(T ). Thus let z, y ∈ image(T ). Since these
are elements of the image, there are v, w ∈ V such that T (v) = z and T (w) = y. To show that
z + y ∈ image(T ), we need to find something which maps to z + y. The obvious candidate is v + w,
and indeed
T(v + w) = T(v) + T(w) = z + y,
so z + y ∈ image(T). Similarly, if c ∈ R then cz ∈ image(T), since
T(cv) = cT(v) = cz.
Thus image(T) is a subspace of W.
My comment prior to Proposition 2.10 yields another meta-proof. We already know that the
span of a set of vectors is always a subspace, and the image of TA is its column space; that is, the
span of its columns. Similarly, the null space is the set of all solutions to a homogeneous system,
which in turn is the span of the system’s basic solutions. Since we can write both the null space
and column space as the span of a set of vectors, it immediately follows that they are subspaces.
Example 2.11
Show that L = {p ∈ P_n(R) : p(1) = 0} is a subspace of P_n(R), by realizing it as the kernel of a linear transformation.
Solution. In Exercise 2.1-1e you showed that the map Tt0 : Pn (R) → R, p 7→ p(t0 ) is a linear map
for any choice of t_0 ∈ R. Note that L = ker T_1, since p ∈ L if and only if p(1) = T_1(p) = 0. As the kernel of a linear transformation, L is a subspace of P_n(R).
Example 2.12
Show that the set K = {A ∈ M_n(R) : A^T = −A} of skew-symmetric matrices is a subspace of M_n(R).
Solution. We can attack this from two different directions. One is to recognize K as the kernel of
a linear transformation. Define T : Mn (R) → Mn (R) by T (A) = A + AT . You can quickly check
that this map is linear, and that ker T = K.
Alternatively, define S : Mn (R) → Mn (R) by S(A) = A − AT . We claim that K = image(S).
Indeed, note that if X ∈ image(S) then X = A − A^T for some A ∈ M_n(R), and
−X^T = −(A − A^T)^T = −A^T + A = X,
so X ∈ K.
Now we know that the rank of a matrix is the same as the dimension of its column space,
or that rank(A) = dim(col A). Since we know that under the identification of T with its matrix
representation, col(A) = image(TA ), this allows us to define the rank of a linear transformation.
Definition 2.13
If T : V → W is a linear transformation, the rank of T is rank(T ) = dim(image(T )). The
nullity of T is nullity(T ) = dim(ker(T )).
The rank and nullity of a linear transformation play together very nicely, leading to the following
major theorem:
Theorem 2.14: Rank-Nullity Theorem
Let T : V → W be a linear transformation. If both ker T and image T are finite dimensional, then V is finite dimensional and
dim(V) = rank(T) + nullity(T).
Proof. Since rank(T ) = k is finite, fix a basis {r1 , . . . , rk } for image(T ). By virtue of being in the
image of T, there exist elements s_i ∈ V such that T(s_i) = r_i for each i ∈ {1, . . . , k}. Similarly,
since nullity(T ) = ` is finite, let {t1 , . . . , t` } be a basis for ker T . It suffices to show that B =
{s1 , . . . , sk , t1 , . . . , t` } is a basis for V .
Let's start by showing that B spans V. Let v ∈ V, so that T(v) ∈ image(T). We can thus write T(v) = \sum_i c_ir_i for some choice of coefficients c_i ∈ R. Subtracting these vectors from one another,
and using the linearity of T we get
0_W = T(v) − \sum_i c_ir_i = T\left( v − \sum_i c_is_i \right)
showing that v − \sum_i c_is_i ∈ ker T. We can thus write this in the basis for ker T as
v − \sum_i c_is_i = \sum_j d_jt_j \quad\Rightarrow\quad v = \sum_{i=1}^{k} c_is_i + \sum_{j=1}^{ℓ} d_jt_j
for some choice of coefficients dj ∈ R. Since v was arbitrary, any vector in V can be written as a
linear combination of the elements of B.
For linear independence, suppose \sum_i c_is_i + \sum_j d_jt_j = 0_V. We want to show that all of the coefficients must be zero. Applying T to both sides of the equation gives
0_W = T\left( \sum_{i=1}^{k} c_is_i + \sum_{j=1}^{ℓ} d_jt_j \right) = \sum_{i=1}^{k} c_iT(s_i) + \sum_{j=1}^{ℓ} d_jT(t_j) = \sum_{i=1}^{k} c_ir_i,
since each t_j ∈ ker T satisfies T(t_j) = 0_W. As the r_i form a basis for image(T), we conclude that c_i = 0 for every i. The original equation then reduces to \sum_{j=1}^{ℓ} d_jt_j = 0_V, and since the t_j form a basis for ker T, each d_j = 0 as well. Thus B is linearly independent, and so dim(V) = k + ℓ = rank(T) + nullity(T).
We often know the dimension of V, and so determining one of the nullity or the rank immediately tells us the other.[2] This process can save a lot of time.
[2] If you've taken any graph theory, you may have learned about the Euler characteristic χ = V − E + F. There are theorems which tell us how the Euler characteristic must behave. Surprisingly, the Rank-Nullity Theorem is another manifestation of this fact, but you will probably have to go to graduate school to see why.
Example 2.15
Let A be an n × n matrix such that Ak = 0 for some k ∈ N. Show that every matrix
X ∈ Mn (R) can be written as X = AY − Y for some Y ∈ Mn (R).
Solution. Define the map T : Mn (R) → Mn (R) by T (Y ) = AY − Y , which you should check is
linear. If we can show that this map is surjective – that is, that image(T ) = Mn (R) – then we will
have shown that every matrix X can be written as X = T(Y) = AY − Y. Since image(T) ⊆ M_n(R), it suffices to show that dim(image(T)) = n^2, and by the Rank-Nullity Theorem, it suffices to show that dim(ker(T)) = 0.
If Y ∈ ker T then 0 = T (Y ) = AY − Y , so AY = Y . By repeatedly multiplying this equation
by A, we arrive at the chain of equalities
Y = AY = A2 Y = A3 Y = A4 Y = · · · = Ak Y = 0
showing that Y = 0. Thus ker T = {0}, which in turn implies that dim(ker T ) = 0.
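As a numerical sanity check of this example (my own sketch, not part of the notes), one can realize T(Y) = AY − Y as a matrix acting on the column-stacked vector vec(Y), using the standard identity vec(AY) = (I ⊗ A) vec(Y):

    import numpy as np

    n = 3
    A = np.array([[0., 1., 2.],
                  [0., 0., 3.],
                  [0., 0., 0.]])              # strictly upper triangular, so A^3 = 0

    # Under column-stacking vec(Y), vec(AY) = (I kron A) vec(Y), so the matrix of
    # T(Y) = AY - Y is (I kron A) - I, acting on R^(n^2).
    T = np.kron(np.eye(n), A) - np.eye(n * n)

    print(np.allclose(np.linalg.matrix_power(A, n), 0))   # A is nilpotent
    print(np.linalg.matrix_rank(T) == n * n)              # trivial kernel, hence surjective

    # Solve T(Y) = X for a given X and confirm X = AY - Y.
    X = np.arange(1., 10.).reshape(n, n)
    Y = np.linalg.solve(T, X.flatten(order="F")).reshape((n, n), order="F")
    print(np.allclose(A @ Y - Y, X))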
Exercises
2.2-1. Verify the alternative solution to Example 2.12 by showing that T (A) = A + AT is linear,
and that K = ker T .
2.2-2. Consider the linear transformation T : M2(R) → R² given by [a b; c d] ↦ [a, c]^T.
Show that U and V are subspaces of Mn (R), and that dim(U ) = n(n − k) and dim(V ) = nk.
2.2-5. Let T : V → W be a linear transformation, with {b1 , . . . , bn } a basis for V . Show that if
{b1 , . . . , bk } is a basis for ker T for some k ≤ n, then {T (bk+1 ), T (bk+2 ), . . . , T (bn−1 ), T (bn )}
is a basis for image(T ).
2.2-6. Let T1 : Pn(R) → R be the evaluation map at t0 = 1; that is, T1(p) = p(1). Show that T1 is surjective, and that {x^k − x^{k−1} : k = 1, . . . , n} is a basis for ker T1.
2.2-7. Fix some t0 ∈ R, and define T : Pn (R) → Pn−1 (R) by p(x) 7→ p(x + t0 ) − p(x).
(a) Argue that T is well-defined; that is, that T (p) ∈ Pn−1 (R).
(b) Show that T is surjective, and conclude that every polynomial q of degree n − 1 can be
written as q(x) = p(x + t0 ) − p(x) for some p ∈ Pn (R).
(a) Show that for every subspace W ⊆ V , there is a linear transformation T : V → W such
that W = ker T .
(b) Show that for every subspace W ⊆ V , there is a linear transformation T : U → V such
that W = image T .
2.3 Isomorphisms
While we’ve discussed injective and surjective maps already, they will play a special role in this
section. To recall, a (not necessarily linear) function f : X → Y is injective if whenever f (x) = f (y)
then x = y. That same function is said to be surjective if for every y ∈ Y there exists an x ∈ X
such that f (x) = y. A function is bijective if it is both injective and surjective.
Focusing back on linear algebra, images and kernels play an important role in determining
whether a function is surjective and injective respectively. Per the definition, a linear transformation
T : V → W is surjective if and only if image(T ) = W . That linear transformation is injective if
and only if ker(T ) = {0} (Exercise 2.1-6). This immediately tells us that if V and W are finite
dimensional vector spaces, and T : V → W is a linear bijection between them, then
dim(V) = dim(ker T) + dim(image T) = 0 + dim(W) = dim(W).
So a linear bijection between finite dimensional vector spaces can only exist if the vector spaces are of the same dimension.
What we’re moving towards is the notion of an isomorphism. In mathematics, we’re often
interested in determining when two objects are the same, but are disguised to look different. For
example, consider the vector space R, and the subspaces
span{x} ⊆ P2(R),   and   span{ [0 1; 0 0] } ⊆ M2(R).
You can quickly check that each of these spaces is 1-dimensional.
Let's add and scalar multiply a few vectors in each subspace:
1. 2 + 5 = 7 and 3 · 2 = 6,
2. 2x + 5x = 7x and 3 · 2x = 6x,
3. [0 2; 0 0] + [0 5; 0 0] = [0 7; 0 0] and 3 · [0 2; 0 0] = [0 6; 0 0].
When we add things together, each computation “looks like” the corresponding computation in R. And in fact, these are all “the same space,” just in disguise.
So how do we make sense of this? When we defined linear transformations, we argued that the
functions should preserve the structure of the vector space; namely, addition and scalar multipli-
cation. Hence if T : V → W then T (x + y) = T (x) + T (y) and T (cx) = cT (x). There’s a sense in
which the linear transformation is copying the information from V into W . To be an isomorphism,
there should be a perfect correspondence between the vector space structure on V and the vector
space structure on W .
Definition 2.16
Two vector spaces V and W are said to be isomorphic if there exist linear maps T : V → W
and S : W → V such that S ◦ T = IV and T ◦ S = IW , where IV and IW are the identity
maps on V and W respectively. In this case, the maps S and T are said to be isomorphisms,
and we write V ≅ W.
Remark 2.17
2. This definition does not require that V and W be finite dimensional vector spaces.
3. This is not how most linear algebra textbooks would define an isomorphism. Instead,
they say that T : V → W is an isomorphism if it is a bijective linear map. This is a
simpler definition – and we’ll show in Theorem 2.19 that it is equivalent to Definition
2.16 – but the traditional definition is misleading. As you study more mathematics,
you’ll learn about isomorphisms in those contexts as well. An isomorphism is always
defined as an invertible map which preserves structure (in our case, a linear map),
and whose inverse also preserves the structure. It just so happens that an invertible
linear map has a linear inverse, so you can skip the statement that the inverse must be
linear. However, in other fields of mathematics, the inverse of a structure-preserving
map need not be structure-preserving.
If two spaces V and W are isomorphic, this means that as vector spaces they are identical :
They may look different, but structurally they are exactly the same.
Example 2.18
Show that Rn+1 ≅ Pn(R).
Solution. With both of the vector spaces above, there are “placeholders.” For Rn+1 written as
a column vector, the placeholder is where a number occurs in a column, say the kth position.
In the case of Pn(R), the placeholder is the monomial x^k. So we should define a map which sends the kth entry of the column to the coefficient of x^k. Thus define the pair of inverse maps
T : Rn+1 → Pn(R), [a0, a1, . . . , an]^T ↦ a0 + a1 x + · · · + an x^n,   and   S : Pn(R) → Rn+1, a0 + a1 x + · · · + an x^n ↦ [a0, a1, . . . , an]^T.
Both maps are linear, and are inverses to one another, and so are isomorphisms. We conclude that
Rn+1 ≅ Pn(R).
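This isomorphism is exactly how one computes with polynomials on a machine. The following sketch (my own; the helper names to_coords and from_coords are ad hoc) checks the round-trip and the linearity of the correspondence using NumPy's polynomial class:

    import numpy as np
    from numpy.polynomial import Polynomial

    def to_coords(p, n):
        """Coordinates of p in the basis {1, x, ..., x^n} of P_n(R)."""
        c = np.zeros(n + 1)
        c[:len(p.coef)] = p.coef
        return c

    def from_coords(c):
        return Polynomial(c)

    p, q = Polynomial([1, 0, 2]), Polynomial([0, 3])     # 1 + 2x^2 and 3x
    n = 2
    # Linearity: coordinates of 5p + q equal 5*coords(p) + coords(q).
    print(np.allclose(to_coords(5 * p + q, n), 5 * to_coords(p, n) + to_coords(q, n)))
    # The two maps are inverse to one another.
    print(np.allclose(from_coords(to_coords(p, n)).coef, p.coef))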
Theorem 2.19
A linear transformation T : V → W is an isomorphism if and only if it is bijective.
These are the two main ways of establishing whether a linear transformation is an isomorphism: Show that it's bijective, or show that it admits a linear inverse. There are many cases where one is more advantageous than the other, so be sure that you explore both in the event that one is proving too difficult.
Example 2.20
Show that Mm×n(R) ≅ Mn×m(R).
Solution. Consider the map T : Mm×n (R) → Mn×m (R) given by T (A) = AT . We know that this
map is linear, so it remains to show that it is bijective. We claim this map is injective, and indeed
if T (A) = AT = 0 then each entry of A must be zero, showing that A = 0. Thus ker T = {0}, and
we conclude that T is injective.
Since the dimension of the domain and codomain are both mn, by rank nullity we then have
that mn = dim(ker T ) + dim(image T ) = dim(image T ) showing that T is surjective. Thus T is
bijective, and we conclude that Mm×n(R) ≅ Mn×m(R) as required.
A quick result inspired by Example 2.20, which you will prove in Exercise 2.3-8, is that if
T : V → W and dim(V ) = dim(W ), then T is an isomorphism if T is either injective or surjective;
that is, proving one of injectivity or surjectivity gives you the other for free. In fact, we can go one
step further:
Theorem 2.21
If V and W are finite dimensional vector spaces such that dim(V) = dim(W), then V ≅ W.
Proof. Let the dimension of both vector spaces be n, and fix bases {v1 , . . . , vn } and {w1 , . . . , wn }
for V and W respectively. Define the linear transformation T : V → W by demanding that
T(vi) = wi, i = 1, . . . , n and extending linearly. By Exercise 2.3-8, it suffices to show that T is injective. Indeed, suppose v ∈ ker T, so that T(v) = 0_W. Write v in the basis for V as v = \sum_i c_i v_i, so that
0_W = T\left( \sum_{i=1}^{n} c_i v_i \right) = \sum_{i=1}^{n} c_i T(v_i) = \sum_{i=1}^{n} c_i w_i.
Since {w1 , . . . , wn } is a basis for W it is linearly independent, showing that all the ci must be zero.
This shows that v = 0V , and we conclude that ker T = {0V }. Thus T is injective, hence bijective,
and hence an isomorphism.
For each natural number n ∈ N we know there exists a vector space of dimension n; namely, Rn .
Theorem 2.21 is profound because it tells us that there is only one vector space of dimension
n, up to isomorphism.3 Hence there is a sense in which every finite dimensional vector space is just
a copy of Rn in disguise.
Now this doesn’t mean that the study of isomorphisms is concluded. There are still many
reasons to use isomorphisms. Often we use isomorphism to change our perspective of a problem,
allowing us to solve it more simply. Other times, a linear transformation might naturally arise, and
knowing that it is an isomorphism will allow us to invoke powerful tools.
Exercises
2.3-3. Let T : V → W be a linear transformation. Show that the following are equivalent:
3
In fact, this result transcends the natural numbers. For each cardinal c, there is a unique vector space whose
dimension has cardinality c, up to isomorphism.
(a) T is injective.
(b) There exists a linearly independent set {v1 , . . . , vk } in V such that {T (v1 ), . . . , T (vk )}
is linearly independent in W .
(c) For every linearly independent set {v1 , . . . , vk } in V , {T (v1 ), . . . , T (vk )} is linearly in-
dependent in W .
2.3-4. In Section 1.6.2 we learned about external direct sums. Suppose U, W are vector spaces, and
let V = U ⊕ W be their direct sum.
2.3-5. Let T : V → W be a linear transformation. Show that the following are equivalent:
(a) T is surjective.
(b) There exists a spanning set {v1 , . . . , vk } in V such that {T (v1 ), . . . , T (vk )} is a spanning
set in W .
(c) For every spanning set {v1 , . . . , vk } in V , {T (v1 ), . . . , T (vk )} is a spanning set in W .
2.3-6. Let T : V → W be a linear transformation. Show that the following are equivalent:
(a) T is bijective.
(b) There exists a basis {v1 , . . . , vk } in V such that {T (v1 ), . . . , T (vk )} is a basis in W .
(c) For every basis {v1 , . . . , vk } in V , {T (v1 ), . . . , T (vk )} is a basis in W .
2.3-7. We know there is a bijective correspondence between m × n-matrices A and linear transfor-
mations T : Rn → Rm . Show that there is a bijective correspondence between invertible n × n
matrices and isomorphisms T : Rn → Rn .
2.3-8. Suppose V, W are finite dimensional vector spaces such that dim(V) = dim(W), and suppose that T : V → W is a linear map. Show that the following are equivalent: T is injective; T is surjective; T is an isomorphism.
2.3-9. Let Vec denote the set of all finite dimensional vector spaces. Define a relation on Vec by saying that V ∼ W if V ≅ W. Show that this is an equivalence relation.
3 Change of Basis
When we first started learning about linear algebra, our emphasis was on Rn and the standard
basis E = {e1 , . . . , en }. We have since learned about more abstract vector spaces V , and about
more general bases. Even in the case of Rn there are alternate bases to the standard basis.
3.1 Coordinate Transformations
There's a subtle difference between writing (8, −4, 1) ∈ R3 as a triple of numbers in the set R3, and writing the vector v = [8, −4, 1]^T in the vector space R3. In the latter case, we are thinking of this as representing the vector in the standard basis as
v = 8e1 + (−4)e2 + e3.
This distinction may seem pointless, because the two representations coincide. However, there's nothing mathematically special about the standard basis, and we could have used a different basis. Suppose instead that we used the basis
b1 = [1, 0, 1]^T,   b2 = [1, −1, 0]^T,   and   b3 = [1, 1, −1]^T.
We know that in any basis B = {b1 , b2 , b3 }, every vector v can be written uniquely as a linear
combination of the basis elements. In the case of (8, −4, 1) we could write this as
v = 2 [1, 0, 1]^T + 5 [1, −1, 0]^T + 1 · [1, 1, −1]^T.
If the basis is understood to be implicit, then the coefficients are the important part, and we can say that v corresponds to [2, 5, 1]^T.
Now we're in trouble, because I've written v = [8, −4, 1]^T = [2, 5, 1]^T, which is clearly nonsense. So let's be more careful and write this as v = [8, −4, 1]^T_E = [2, 5, 1]^T_B. These are
the same vector, just written in different representations. Do matrices also change if we specify
different bases? Are there things which are independent of the choice of basis?
In this section, we are going to analyze how we can move between different bases and how those
representations change correspondingly.
Definition 3.1
Suppose V is a finite dimensional vector space, dim V = n, and B = {b1 , . . . , bn } is a basis
of V . The coordinate transformation of V with respect to B is the map CB : V → Rn such
that
CB(c1 b1 + c2 b2 + · · · + cn bn) = [c1, c2, . . . , cn]^T.
Remark 3.2
1. The order of the basis elements matters! For example, E1 = {e1, e2} is a different basis for R² than E2 = {e2, e1}. For example, C_{E1}([2, 4]^T) = [2, 4]^T, while C_{E2}([2, 4]^T) = [4, 2]^T.
Example 3.3
Consider P2(R) with the bases B = {1, x, x²} and D = {x, 1 + x, 1 + x²}. If p(x) = 1 + 2x²,
determine CB (p) and CD (p).
Solution. We need to write p(x) = 1 + 2x2 in each of these bases. You should check that
p(x) = 1 · 1 + 0 · x + 2 · x2
= 1 · x + (−1) · (1 + x) + 2 · (1 + x2 ).
Therefore, CB(1 + 2x²) = [1, 0, 2]^T and CD(1 + 2x²) = [1, −1, 2]^T.
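Computations like the one in Example 3.3 amount to solving a small linear system: the columns of the system are the D-basis vectors written in B-coordinates. Here is a short NumPy sketch of that idea (my own; not part of the notes):

    import numpy as np

    # Basis D = {x, 1+x, 1+x^2} of P_2(R), each column written in B = {1, x, x^2} coordinates.
    D = np.array([[0., 1., 1.],     # constant terms
                  [1., 1., 0.],     # x terms
                  [0., 0., 1.]])    # x^2 terms
    p = np.array([1., 0., 2.])      # p(x) = 1 + 2x^2 in B-coordinates

    # C_D(p) is the coefficient vector c with D @ c = p.
    c = np.linalg.solve(D, p)
    print(c)                         # [ 1. -1.  2.], matching Example 3.3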
Now that we have a method of writing vectors in coordinates, we need to look at what happens
to linear transformation. In general, we now need two bases. If V and W are finite dimensional
vector spaces with dim V = n and dim W = m, and T : V → W is a linear transformation, then we
need to specify a basis B on V and a basis D on W . Combined with the coordinate transformations,
we can visualize this with the following diagram:
    V  ----T---->  W          (3.1)
    |CB            |CD
    v              v
    R^n ---TA--->  R^m
4
A diagram is a series of arrows between objects. Usually these arrows represent functions, but this is not necessary.
A diagram commutes if regardless of what path you take, the evaluation is always the same. In Diagram (3.1), we
start at V and end at Rm . We can either take the path specified by CD ◦ T , or take TA ◦ CB . Both paths give precisely
the same output.
Definition 3.4
Let V, W be finite dimensional vector spaces of dimension n and m, and bases B and D,
respectively. If T : V → W is a linear transformation, we define the matrix of T in the
bases B and D to be the matrix A ∈ Mm×n (R) such that if TA : Rn → Rm is the linear
transformation satisfying TA (x) = Ax, then TA ◦ CB = CD ◦ T .
We can define a function MDB : L(V, W ) → L(Rn , Rm ) which assigns to each linear transforma-
tion T its matrix MDB (T ) in the bases of B and D. So how do we compute MDB (T )? The key lies
in the fact that if bi ∈ B is the ith element of the basis, then CB(bi) = ei. Thus if we substitute bi into Equation (3.2) we get
A ei = (TA ◦ CB)(bi) = (CD ◦ T)(bi) = CD(T(bi)).
So the ith column of A can be computed by evaluating CD(T(bi)), meaning that
MDB(T) = [ CD(T(b1)) CD(T(b2)) · · · CD(T(bn)) ].    (3.3)
Note that when V = Rn , W = Rm , and En , Em are the standard bases on Rn and Rm , then Equation
(3.3) is how we usually compute the matrix associated to a linear transformation.
Example 3.5
Let I2 : R2 → R2 be the identity map. If the domain and codomain are equipped with the
bases
B = { [1, 1]^T, [1, 0]^T }   and   D = { [2, 1]^T, [1, 2]^T },
respectively, find MDB (I2 ).
Solution. Let bi denote the ith element of the basis B. According to Equation (3.3), we need to
compute CD(I2(bi)) = CD(bi) for i = 1, 2. Doing this, we get
CD(b1) = CD([1, 1]^T) = (1/3)[1, 1]^T,
CD(b2) = CD([1, 0]^T) = (1/3)[2, −1]^T.
Thus
MDB(I2) = (1/3) [1 2; 1 −1],
showing that the identity transformation’s matrix need not be the identity matrix.
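The same computation can be done numerically: each column of MDB(I2) is CD(bi), obtained by solving against the matrix whose columns are the D-basis vectors. A small NumPy sketch (mine, not the notes'):

    import numpy as np

    B = np.array([[1., 1.],
                  [1., 0.]])          # columns b1, b2
    D = np.array([[2., 1.],
                  [1., 2.]])          # columns d1, d2

    # Column i of M_DB(I_2) is C_D(b_i), i.e. the solution of D @ c = b_i.
    M = np.linalg.solve(D, B)
    print(np.allclose(M, np.array([[1, 2], [1, -1]]) / 3))   # matches Example 3.5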
Example 3.6
Solution. Let bi , i = 1, 2, 3 be the basis elements of B in order. Per Equation (3.3), we need to
compute CD (T (bi )) for each basis element. Doing so we have
1 0 1 T
CD (T (b1 )) = CD 1 0 1 0 ,
=
0 0 2
1 1 1 T
CD (T (b2 )) = CD = 1 2 1 0 ,
1 0 2
1 0 T
CD (T (b2 )) = CD = 1 0 0 0 .
0 1
Theorem 3.7
Let V, W, and U be finite dimensional vector spaces with bases B, D, and F respectively. If T : V → W and S : W → U are linear transformations, then
MFB(S ◦ T) = MFD(S) MDB(T).    (3.4)
Proof. The algebraic proof of this is messy, but it effectively boils down to the fact that the following diagram commutes:
    V --T--> W --S--> U          (composite along the top: S ◦ T)
    |CB      |CD      |CF
    v        v        v
    R^n ---> R^m ---> R^k        (bottom maps: T_{MDB(T)}, then T_{MFD(S)}; composite: T_{MFB(ST)})
This gives some insight into why we’ve written the bases backwards when writing MDB , since
it ensures adjacent “cancellation” in Equation (3.4) above.
Exercises
3.1-1. Let V be a finite dimensional vector space and B be a basis for V . Let CB : V → Rn be the
corresponding coordinate transformation on V .
3.1-2. In each case of a finite dimensional vector space V and basis B below, determine the coordinate transformation CB : V → R^{dim V}.
3.1-3. Let 0 : V → W be the zero map. Show that for any bases B of V and D of W , MDB (0) = 0,
the zero matrix.
3.1-4. Let I : V → V be the identity map, and dim(V ) = n. Show that for any bases B of V ,
MBB (I) = In .
3.1-5. Let T : P2 (R) → M2 (R) be the transformation defined in Example 3.6, with B and D defined
there as well.
3.1-6. Consider the map T : P3(R) → R² given by T(a + bx + cx² + dx³) = [a + b, c − d]^T.
(a) Let B1 = {1, x, x², x³} and E1 be the standard basis for R². Determine MDB(T).
(b) Let B1 = {1, x, x², x³} and E2 = { [1, 1]^T, [−1, 1]^T } be a basis for R². Determine MDB(T).
(c) Let B2 = {2, 1 − x, x² + x³, x³} and E1 be the standard basis for R². Determine MDB(T).
(d) Let B2 and E2 be as defined above. Determine MDB(T).
3.1-7. Suppose V is a vector space with two bases, B and D. If CB and CD are the respective
coordinate transformations, find a matrix P such that for all v ∈ V , CB (v) = (TP ◦ CD )(v).
3.1-8. Let T : V → W be a linear transformation of finite dimensional vector spaces. Assume
dim(V ) = dim(W ). Show that the following are equivalent:
(a) T is an isomorphism
(b) For every basis B of V and D of W , MDB (T ) is invertible.
(c) There exists a basis B of V and D of W such that MDB (T ) is invertible.
3.1-9. Suppose V and W are finite dimensional vector spaces with dim V = n and dim W = m. Let
T : V → W be a linear map.
Show that if k = dim(ker T), then there are bases B of V and D of W such that
MDB(T) = [0 I_{n−k}; 0 0].
Hint: Start with a basis for ker T and extend this to a basis of V.
3.2 Change of Basis Matrix
We'll focus our conversation from the previous section to the special case of linear operators T :
V → V , where both copies of V are endowed with the same basis. In this case, we will write
MB (T ) = MBB (T ) for convenience of notation. Now if we change the basis on V to D, the question
we’d like to consider is how to relate MB (T ) to MD (T ); that is, how do we change the basis?
Let’s do a toy example to set up our problem:
Example 3.8
Let T : R² → R² be the linear operator whose matrix in the standard basis is A = [1 1; 1 −1], and consider the bases B = { [1, 1]^T, [1, −1]^T } and D = { [4, 5]^T, [1, 0]^T } of R². Find MB(T) and MD(T).
Solution. For MB (T ) we need to determine CB (T (b1 )) and CB (T (b2 )), where bi are the basis
elements of B:
CB(T(b1)) = CB([2, 0]^T) = [1, 1]^T,
CB(T(b2)) = CB([0, 2]^T) = [1, −1]^T,
so
MB(T) = [1 1; 1 −1].
Let di denote the basis elements for D, so that
CD(T(d1)) = CD([9, −1]^T) = (1/5)[−1, 49]^T,
CD(T(d2)) = CD([1, 1]^T) = (1/5)[1, 1]^T,
so
MD(T) = (1/5)[−1 1; 49 1].
The relationship between MB (T ) and MD (T ) is probably not clear, so we’ll have to think about
this a bit more deeply. You hopefully found in Exercise 3.1-7 that CD = TMDB (IV ) ◦ CB , or if v ∈ V
then CD (v) = MDB (IV )CB (v), where IV : V → V is the identity map.
Definition 3.9
If V is a vector space with bases B and D, we define the change of basis matrix PDB =
MDB (IV ).
Theorem 3.10
Let V be a finite dimensional vector space with bases B and D, and let T : V → V be a linear operator. Then T_{PDB} ◦ T_{MB(T)} = T_{MD(T)} ◦ T_{PDB}.
Proof. Note that
1. CD = TPDB ◦ CB ,
2. CB ◦ T = TMB (T ) ◦ CB , and
3. CD ◦ T = TMD (T ) ◦ CD .
Combining these, T_{PDB} ◦ T_{MB(T)} ◦ CB = T_{PDB} ◦ CB ◦ T = CD ◦ T = T_{MD(T)} ◦ CD = T_{MD(T)} ◦ T_{PDB} ◦ CB. Since CB is an isomorphism, it is invertible. Precomposing with CB^{−1} gives the desired result.
Let's translate the result of Theorem 3.10 to the level of matrices, which says that PDB MB(T) = MD(T) PDB. Since PDB is invertible, we can write this as MB(T) = PDB^{−1} MD(T) PDB, so that MB(T) and MD(T) are similar.
Example 3.11
Consider the same linear transformation and bases as Example 3.8. Compute PDB and confirm that MB(T) = PDB^{−1} MD(T) PDB.
Solution. By definition, we know PDB = MDB(I2) = [ CD(b1) CD(b2) ]. Computing these terms we get
CD(b1) = CD([1, 1]^T) = (1/5)[1, 1]^T,
CD(b2) = CD([1, −1]^T) = (1/5)[−1, 9]^T,
so
PDB = (1/5)[1 −1; 1 9]   with   PDB^{−1} = (1/2)[9 1; −1 1].
You can quickly check that PDB^{−1} = PBD and that MB(T) = PDB^{−1} MD(T) PDB.
Theorem 3.10 says that given two bases B and D, the matrices MB(T) and MD(T) are similar. Is the converse true? Namely, given two similar n × n matrices A and B, are there bases B and D such that A = MB(TA) and B = MD(TA)? One of these is easy: By setting B = E to be the standard basis of Rn, we get ME(TA) = A, so the only question is whether the desired basis D exists.
Theorem 3.12
If A, B ∈ Mn(R) are similar matrices, then there is a basis D of Rn such that MD(TA) = B.
Proof. Since A and B are similar, there is an invertible P ∈ Mn(R) such that B = P^{−1}AP. Set D = {p1, p2, . . . , pn}, where pi is the ith column of P. We claim that this basis does the trick. Indeed, note that
PED = MED(In) = [ CE(p1) CE(p2) · · · CE(pn) ] = [ p1 p2 · · · pn ] = P.
So by Theorem 3.10 we have that MD(TA) = PED^{−1} ME(TA) PED = P^{−1} A P = B, as required.
The proof technique above, performed in reverse, also gives us the following convenient way of
computing MD (TA ):
Corollary 3.13
If A ∈ Mn (R) and D is a basis for Rn , then MD (TA ) = P −1 AP where P is the matrix whose
columns are the elements of D.
Hence if A ∈ Mn(R) is diagonalizable, say P^{−1}AP = D for some diagonal matrix D, then taking the columns of P as a basis D for Rn will make MD(TA) = D a diagonal matrix.
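Corollary 3.13 is easy to try in NumPy (a sketch of my own, not part of the notes): taking the eigenvectors of a matrix as the new basis produces a diagonal representation.

    import numpy as np

    A = np.array([[1., 1.],
                  [1., 1.]])            # eigenvalues 0 and 2

    # Corollary 3.13: if the columns of P are the basis D, then M_D(T_A) = P^{-1} A P.
    eigvals, P = np.linalg.eigh(A)      # columns of P are eigenvectors of A
    M_D = np.linalg.inv(P) @ A @ P
    print(np.allclose(M_D, np.diag(eigvals)))   # True: T_A is diagonal in the eigenvector basis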
Recall that the determinant and trace of a matrix are invariant under similarity. For example, if A = P^{−1}BP then
det(A) = det(P^{−1}) det(B) det(P) = det(B)   and   tr(A) = tr(P^{−1}BP) = tr(BPP^{−1}) = tr(B)
by the cyclic invariance of the trace. Since different representations of linear operators in different
bases are all naturally similar, we can extend these notions to general linear transformations in a
well-defined manner.
Definition 3.14
Suppose V is a finite dimensional vector space, T : V → V is a linear operator, and B is a fixed basis for V. The determinant and trace of T are det(T) = det(MB(T)) and tr(T) = tr(MB(T)), and the characteristic polynomial of T is pT(λ) = det(MB(T) − λ In). By the discussion above, none of these depend on the choice of basis B.
There are also basis independent ways of defining these objects, but that’s a bit trickier. Fur-
thermore, with access to the characteristic polynomial of a linear transformation, we can define
eigenvalues, and with eigenvalues we can define eigenvectors.
Definition 3.16
Let T : V → V be a linear operator. A vector v ∈ V is said to be an eigenvector of T if
there exists some λ ∈ R such that T (v) = λv. In this case, λ is said to be the eigenvalue
associated to v.
As with the case of matrix operators, the eigenvalues of T : V → V are the roots of the characteristic polynomial. Indeed, if T(v) = λv for some non-zero v, then (T − λI)v = 0. For this to have non-trivial solutions, it must be the case that T − λI is not a linear isomorphism, which corresponds to det(T − λI) = 0.
Example 3.17
Definition 3.18
If T : V → V is a linear operator, and λ is an eigenvalue of T , then the eigenspace of T
corresponding to λ is
Eλ = {v : T (v) = λv} .
Exercises
3.2-1. For each vector space and pair of bases, compute the change of basis matrix:
(d) V a four dimensional vector space with bases B = {b1 , b2 , b3 , b4 } and D = {b2 , b4 , b3 , b1 }.
3.2-5. For each of the following linear operators, compute the rank, determinant, and characteristic
polynomial:
3.3 Invariant Subspaces
Definition 3.19
If V is a vector space with subspace U , and T : V → V is a linear operator, we say that U
is T -invariant if T (U ) ⊆ U .
Checking generic points can be tricky. There is a much simpler way of determining whether a subspace is T-invariant if you can find a spanning set for U.
Proposition 3.21
Let T : V → V be a linear operator, and let U = span{u1, . . . , uk} be a subspace of V. If T(ui) ∈ U for each i = 1, . . . , k, then U is T-invariant.
Proof. Suppose that T(ui) ∈ U for each ui. If u ∈ U, write u in terms of the spanning elements as u = \sum_i c_i u_i. Applying T gives
T(u) = T\left( \sum_i c_i u_i \right) = \sum_i c_i \underbrace{T(u_i)}_{\in U}.
Since U is a subspace it is closed under linear combinations, and so we conclude that T (u) ∈ U as
required.
Example 3.20 is now a bit easier to solve, if we recognize that U = span{x² − x, x − 1}. To show that U is T-invariant, we need only check that T maps each of these spanning elements back into U, and indeed
T(x² − x) = x² − x ∈ U,
T(x − 1) = 2x − 2 = 2(x − 1) ∈ U,
so U is T -invariant. In fact, you may have recognized that the spanning elements here are precisely
the eigenvectors we found in Example 3.17. In general, eigenspaces Eλ of the operator T are
T -invariant.
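Proposition 3.21 also gives a practical numerical test for matrix operators: a span is TA-invariant precisely when appending the images of its spanning vectors does not increase the rank. A small NumPy sketch (my own; the helper name is_invariant is ad hoc):

    import numpy as np

    def is_invariant(A, U, tol=1e-10):
        """True if T_A maps span(columns of U) into itself: the image columns A @ U
        must not increase the rank (Proposition 3.21)."""
        return np.linalg.matrix_rank(np.hstack([U, A @ U]), tol=tol) == np.linalg.matrix_rank(U, tol=tol)

    A = np.array([[2., 0., 0.],
                  [0., 1., 1.],
                  [0., 0., 1.]])
    U = np.array([[0., 0.],
                  [1., 0.],
                  [0., 1.]])                  # columns span the (y, z)-plane

    print(is_invariant(A, U))                 # True: the plane is A-invariant
    print(is_invariant(A, np.eye(3)[:, [0]])) # True: span{e1} is A-invariant as well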
Proposition 3.22
If T : V → V is a linear operator and λ is an eigenvalue of T, then the eigenspace Eλ is a T-invariant subspace of V.
The fact that eigenspaces represent natural T -invariant spaces will be helpful, especially in light
of the following:
Theorem 3.23
Suppose T : V → V is a linear operator, and V = U ⊕ W where U and W are both T-invariant subspaces. If {u1, . . . , uk} is a basis for U and {w1, . . . , wℓ} is a basis for W, then MB(T) is block diagonal in the basis B = {u1, . . . , uk, w1, . . . , wℓ}.
Proof. Since U is T-invariant, each T(ui) lies in U, so there are scalars cr with T(ui) = \sum_{r=1}^{k} c_r u_r. Similarly, since W is T-invariant,
T(wi) = \sum_{r=1}^{k} 0 · u_r + \sum_{s=1}^{\ell} d_s w_s.
Thus
CB(T(ui)) = [c1, . . . , ck, 0, . . . , 0]^T   and   CB(T(wi)) = [0, . . . , 0, d1, . . . , dℓ]^T.
Both results together show that MB (T ) is block diagonal, as required.
One can inductively extend Theorem 3.23, so that if V = U1 ⊕ U2 ⊕ U3 then there is a basis B
such that MB (T ) is further refined into a three-block diagonal, and so on.
Example 3.24
Let E = {e1 , e2 , e3 } be the standard basis for R3 , and let T : R3 → R3 be the linear operator
which rotates about e3 by an angle π/4 in the counter-clockwise direction when the origin
is viewed from e3 . Write T in block diagonal form.
Solution. Intuitively, our invariant subspaces are W = span{e1, e2} and U = span{e3}. Indeed, one can check that T(e3) = e3,
T(e1) = (1/√2)[1, 1, 0]^T   and   T(e2) = (1/√2)[−1, 1, 0]^T.
In the basis B = {e1, e2, e3} the matrix of T is therefore
MB(T) = [1/√2 −1/√2 0; 1/√2 1/√2 0; 0 0 1],
which is block diagonal, with a 2 × 2 rotation block corresponding to W and a 1 × 1 block corresponding to U.
Exercises
3.3-1. Let T : V → V be a linear operator. Show that {0}, V , ker T , and image(T ) are all T -invariant
spaces.
3.3-2. Suppose T : V → V is a linear operator such that T ◦ T = IV (such maps are called involutions).
(a) Let U± = {v ∈ V : T(v) = ±v}. Show that U+ and U− are T-invariant subspaces, and that V = U+ ⊕ U−.
(b) Show that there is a basis B in which MB(T) = [Ik 0; 0 −I_{n−k}] for some k ∈ N.
3.3-3. Suppose T : V → V is a linear operator such that T ◦T = T (such maps are called projections).
(a) Let U = {v : T (v) = v}. Show that U is a T -invariant subspace of V , and that V =
U ⊕ ker T .
(b) Argue that there is a basis B of V in which MB(T) = [Ik 0; 0 0] for some k ∈ N.
3.3-4. Let T : V → V be a linear operator, with U ⊆ V a general subspace. Define the set
U^T = { \sum_{i=0}^{k} T^i(u_i) : k ∈ N, u_i ∈ U }.
4 Inner Products and Friends
In this next portion, we're going to add an additional structure to our vector spaces. We'll actually see a list of three structures, but they're all closely related.
4.1 Measurement Devices
What our vector spaces have been missing up until now is some method of measurement. For example, you're probably familiar with the idea that a vector should have a length (or magnitude,
if you’re a physicist). If vectors have length, we should be able to measure the distance between
two vectors. If we’re lucky, maybe we can even find the angle between two vectors.
All of this becomes much more abstract as the vector spaces themselves change. For example,
it’s a bit weird to ask ourselves the distance between the vectors f (x) = sin(x) and g(x) = ex in
C(R), or to ask the angle between them. But nonetheless, we can do this.
The inner product is the most powerful measurement device we'll examine. It gives us the ability to measure angles, lengths, and distances.
Definition 4.1
Given a real vector space V, an inner product on V is a map ⟨·, ·⟩ : V × V → R satisfying
1. (Symmetry) ⟨x, y⟩ = ⟨y, x⟩ for all x, y ∈ V;
2. (Linearity) ⟨cx + z, y⟩ = c⟨x, y⟩ + ⟨z, y⟩ for all x, y, z ∈ V and c ∈ R;
3. (Positive definiteness) ⟨x, x⟩ ≥ 0 for all x ∈ V, with ⟨x, x⟩ = 0 if and only if x = 0.
Combining the symmetry and linearity properties of an inner product tells us that an inner product is actually bilinear, or linear in each of its components:
⟨cx + z, y⟩ = c⟨x, y⟩ + ⟨z, y⟩   and   ⟨x, cy + z⟩ = c⟨x, y⟩ + ⟨x, z⟩.
For this reason, you might see an inner product defined as a positive definite, symmetric, bilinear mapping.
While there are many different kinds of inner products, the one with which we will be most
concerned is the Euclidean inner product, also known as simply the dot product. Given two vectors
x = (x1 , . . . , xn ) and y = (y1 , . . . , yn ) in Rn , we write
⟨x, y⟩ = x · y = \sum_{i=1}^{n} x_i y_i = x1 y1 + x2 y2 + · · · + xn yn.
Example 4.2
Show that the Euclidean inner product satisfies the criteria of Definition 4.1.
If V = C[−π, π] is the collection of continuous functions on [−π, π], we can define an inner product via
⟨f, g⟩ = (1/2π) \int_{−π}^{π} f(x) g(x) dx.    (4.1)
In fact, this is one of the most important inner products in mathematics and physics, with an astounding number of real-life applications such as encryption, compression, and signal analysis.
Definition 4.3
If V is an inner-product space, two vectors x, y ∈ V are said to be orthogonal if hx, yi = 0.
Example 4.4
Let C[−π, π] be endowed with the inner product given in (4.1). Show that fn (x) = sin(nπx)
and gm (x) = cos(mπx) are orthogonal for all m, n ∈ Z.
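The solution is a short computation with the angle-sum identities (see the footnote below). As a quick numerical sanity check, the integral in (4.1) can be approximated with a Riemann sum; the following sketch is my own and the discretization is only approximate:

    import numpy as np

    def inner(f, g, num=200000):
        """Midpoint-rule approximation of (1/(2*pi)) * integral_{-pi}^{pi} f(x) g(x) dx."""
        dx = 2 * np.pi / num
        x = -np.pi + dx * (np.arange(num) + 0.5)
        return np.sum(f(x) * g(x)) * dx / (2 * np.pi)

    f3 = lambda x: np.sin(3 * np.pi * x)
    g2 = lambda x: np.cos(2 * np.pi * x)

    print(abs(inner(f3, g2)) < 1e-9)   # ~0: f_3 and g_2 are orthogonal in this inner product
    print(inner(f3, f3) > 0.1)         # but f_3 is not orthogonal to itself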
If V = Mn(R), recall that the trace of A ∈ Mn(R) is the sum of its diagonal elements. We can define the Hilbert-Schmidt inner product
⟨A, B⟩ = tr(A^T B).    (4.2)
You will show in Exercise 4.1-3 that this is really just the Euclidean inner product.
5
If you’ve never seen this before, use the angle sum identities on the right hand side.
Remark 4.5 Very early in these notes, I mentioned that we’d be dealing with real vector
spaces. More generally, you can study vector spaces over any field F , where we demand
that the scalars must come from F . If F = C for example, we have complex vector spaces.
The only change we must make to the definition of an inner product is that the symmetry
property becomes hx, yi = hy, xi, where the bar indicates complex conjugation.
In both (4.1) and (4.2), throw in a complex conjugate over one of the entries, and you now
have the complex inner products. These two inner products are the inner products used in
Quantum Mechanics. In the former case, Equation (4.1) describes the inner product on the
space of wave functions (the Schrödinger picture), while Equation (4.2) describes the inner
product on the space of quantum operators (the Heisenberg picture).
One final property we’ll need for the next section is the following inequality:
Proposition 4.6: Cauchy-Schwarz
If V is a real vector space with an inner product ⟨·, ·⟩, then for every x, y ∈ V we have
|⟨x, y⟩| ≤ √(⟨x, x⟩ ⟨y, y⟩),
with equality if and only if x and y are linearly dependent.
Proof. If x = 0 both sides are zero and the result is trivially true, so assume x ≠ 0. Let p = ⟨x, y⟩ / ⟨x, x⟩, which you may recognize as the projection coefficient of y onto x. Now
0 ≤ ⟨y − px, y − px⟩ = ⟨y, y⟩ − 2p⟨x, y⟩ + p²⟨x, x⟩ = ⟨y, y⟩ − ⟨x, y⟩²/⟨x, x⟩.
This term is always non-negative by the positive-definite property of the inner product, hence
⟨y, y⟩ − ⟨x, y⟩²/⟨x, x⟩ ≥ 0   ⇒   ⟨x, y⟩² ≤ ⟨x, x⟩ ⟨y, y⟩,
from which the inequality follows by taking square roots. For equality, note that the first line is
zero if and only if y − px = 0, or y = px, showing that x and y are linearly dependent.
4.1.2 Norms
The next structure is called a norm, and prescribes a way of measuring the length of a vector.
Definition 4.7
Let V be a real vector space. A norm on V is a map ‖·‖ : V → R satisfying
1. (Non-degeneracy) ‖x‖ ≥ 0 for all x ∈ V, with ‖x‖ = 0 if and only if x = 0;
2. (Homogeneity) ‖cx‖ = |c| ‖x‖ for all c ∈ R and x ∈ V;
3. (Triangle inequality) ‖x + y‖ ≤ ‖x‖ + ‖y‖ for all x, y ∈ V.
The prototypical example is the Euclidean norm on Rn, defined in terms of the Euclidean inner product by
‖x‖ := √⟨x, x⟩ = ( \sum_{i=1}^{n} x_i² )^{1/2} = √(x1² + x2² + · · · + xn²).
Recognize that this generalizes the Pythagorean Theorem in R², since if x = (x, y) then the vector x looks like the hypotenuse of a triangle with side lengths x and y. The length of the hypotenuse is just √(x² + y²) = ‖x‖. I will leave it as an exercise to show that the Euclidean norm is indeed a norm.
Above I commented that we could define the Euclidean norm using the Euclidean inner product.
This extends more generally:
Proposition 4.8
If (V, ⟨·, ·⟩) is an inner product space, then ‖x‖ = √⟨x, x⟩ defines a norm on V.
Proof. Non-degeneracy and homogeneity follow immediately from the definition of an inner product, as you should check. All that remains to be shown is the triangle inequality, wherein
‖x + y‖² = ⟨x + y, x + y⟩ = ‖x‖² + 2⟨x, y⟩ + ‖y‖² ≤ ‖x‖² + 2‖x‖ ‖y‖ + ‖y‖² = (‖x‖ + ‖y‖)²,
where the middle inequality is Cauchy-Schwarz. Taking the square root of both sides gives the desired result.
Knowing that inner products induce norms, it’s natural to ask whether every norm comes from
an inner product. The answer is no, via the following theorem which is surprisingly difficult to
prove:
Theorem 4.9
If (V, k·k) is a normed vector space, then the norm is induced by an inner product if and
only if
2kxk2 + 2kyk2 = kx + yk2 + kx − yk2 .
The converse direction is a matter of algebra (Exercise ??), but the forward direction is quite tricky. If you're interested in trying it, you first need a candidate for what the inner product should be; the natural candidate is the polarization identity ⟨x, y⟩ = ¼(‖x + y‖² − ‖x − y‖²).
For p ≥ 1, the p-norm on Rn is defined by ‖x‖_p = ( \sum_{i=1}^{n} |x_i|^p )^{1/p}. The p-norm comes from an inner product if and only if p = 2, which is precisely the Euclidean norm.
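Theorem 4.9 is easy to probe numerically: the parallelogram identity holds for the 2-norm but generally fails for the 1-norm, so the 1-norm cannot be induced by any inner product. A sketch of my own:

    import numpy as np

    rng = np.random.default_rng(1)
    x, y = rng.normal(size=3), rng.normal(size=3)

    def parallelogram_gap(norm, x, y):
        """2||x||^2 + 2||y||^2 - (||x+y||^2 + ||x-y||^2); zero iff the identity holds."""
        return 2 * norm(x) ** 2 + 2 * norm(y) ** 2 - (norm(x + y) ** 2 + norm(x - y) ** 2)

    two_norm = lambda v: np.linalg.norm(v, 2)
    one_norm = lambda v: np.linalg.norm(v, 1)

    print(abs(parallelogram_gap(two_norm, x, y)) < 1e-12)   # True: the 2-norm satisfies it
    print(parallelogram_gap(one_norm, x, y))                # generally nonzero for the 1-norm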
4.1.3 Metrics
Finally, one has a metric. Metrics are the most flexible, least rigid structure we’ll impose on a
space. Loosely speaking, metrics prescribe a method for determining the distance between two
vectors.
Definition 4.10
A set X with a function d : X × X → R is said to be a metric space if
1. (Non-degeneracy) d(x, y) ≥ 0 for all x, y ∈ X, with d(x, y) = 0 if and only if x = y;
2. (Symmetry) d(x, y) = d(y, x) for all x, y ∈ X;
3. (Triangle inequality) d(x, z) ≤ d(x, y) + d(y, z) for all x, y, z ∈ X.
Note that a metric space does not need to be a vector space: The definition of a metric makes no mention of addition, scalar multiplication, or even of a 0 element. However, just as inner products induced norms, so too do norms induce metrics.
Proposition 4.11
If V is a real vector space with a norm k·k, then the function d : V × V → R given by
d(x, y) = kx − yk, defines a metric.
That the three properties of a metric are satisfied is almost immediate, and I’ll leave the proof
to Exercise 4.1-5. This means that from the Euclidean norm we have the Euclidean metric. If
x = (x1 , . . . , xn ) and y = (y1 , . . . , yn ) then the Euclidean metric is
d(x, y) = ‖x − y‖ = ( \sum_{i=1}^{n} (x_i − y_i)² )^{1/2} = √((x1 − y1)² + · · · + (xn − yn)²).
In R2 , this agrees with the usual distance formula. As with inner product-induced norms, one
can ask whether all metrics are induced by norms. Generally no, with the following proposition
indicating when it is the case.
Proposition 4.12
Let V be a real vector space. A metric d on V is induced by a norm if and only if d is translation invariant, d(x + z, y + z) = d(x, y), and homogeneous, d(cx, cy) = |c| d(x, y), for all x, y, z ∈ V and c ∈ R.
Proposition 4.12 is relatively straightforward to prove, and has been left to Exercise 4.1-6. A
straightforward example of a metric which is not induced by a norm is the discrete metric. If X is
any set (say a vector space), define
d(x, y) = 0 if x = y,   and   d(x, y) = 1 otherwise.
You can check that this is indeed a metric, and that it is not induced by a norm.
Exercises
4.1-1. Let C[−π, π] be endowed with the inner product given in Equation (4.1). We’ve already
shown that {fn : n ∈ Z} and {gm : m ∈ Z} are mutually orthogonal.
4.1-2. Let Mn (R) be endowed with the Hilbert-Schmidt inner product defined in Equation (4.2).
Show that this is indeed an inner product.
4.1-3. Let Mn (R) be endowed with the Hilbert-Schmidt inner product. Fix the isomorphism Mn (R) ∼
=
2
Rn by “stacking the columns.” Show that Hilbert-Schmidt norm agrees with the Euclidean
norm under this transformation.
4.1-4. If V is a vector space, two norms k·k1 and k·k2 are said to be equivalent if there exists positive
real numbers α, β such that α kxk1 ≤ kxk2 ≤ β kxk1 for all x ∈ V .
(a) Show that being equivalent defines an equivalence relation on the set of norms.
(b) Show that the following two norms are equivalent in Rn:
‖x‖1 = max_{i∈{1,...,n}} |x_i|   and   ‖x‖2 = ( \sum_{i=1}^{n} x_i² )^{1/2}.
4.1-5. Prove Proposition 4.11 by showing that the metric induced by a norm is indeed a metric.
4.1-7. Let V be a vector space, and S = {b1 , . . . , bn } be a set of pairwise orthogonal vectors. If the
zero vector is not in S, show that S is linearly independent.
4.2 Orthonormal Bases
4.2.1 Orthonormality
If V is an inner product space, Definition 4.3 told us that v, w ∈ V are said to be orthogonal if
hv, wi = 0. We’ve also learned that we can use a basis to reduce most computations in a vector
space to a finite number of computations on the basis elements. To play nicely with an inner
product, we might want to impose additional constraints on our basis:
Definition 4.13
If V is a vector space, a basis B is said to be an orthogonal basis if hv, wi = 0 for all v, w ∈ B
such that v 6= w. It is further said to be an orthonormal basis if it is an orthogonal basis,
and hv, vi = 1 for each v ∈ B.
For example, in R² the following two bases are both orthonormal in the Euclidean inner product:
{ [1, 0]^T, [0, 1]^T }   and   { (1/√2)[1, 1]^T, (1/√2)[1, −1]^T }.
Note that any non-zero vector can be made normal by dividing by its norm. Indeed, if v ∈ V is a non-zero vector, then v/‖v‖ is normal, since
‖ v/‖v‖ ‖ = (1/‖v‖) ‖v‖ = 1.
Thus the only real obstruction to finding an orthonormal basis is to first find an orthogonal one.
In our second orthonormal basis example above, had we started with the orthogonal basis
{ [1, 1]^T, [1, −1]^T },
then we arrive at an orthonormal basis by dividing each vector by its norm √2.
To see why orthogonal bases are particularly nice, consider the following theorem:
Theorem 4.14
Let V be a finite dimensional inner product space with orthogonal basis B = {b1, . . . , bn}. Then every v ∈ V can be written as
v = \sum_{i=1}^{n} (⟨v, b_i⟩ / ‖b_i‖²) b_i.    (4.3)
Proof. Since B is a basis, we know that v = \sum_i c_i b_i for some choice of c_i ∈ R. If we take the inner product of this against b1 we get
⟨b1, v⟩ = ⟨ b1, \sum_{i=1}^{n} c_i b_i ⟩ = \sum_{i=1}^{n} c_i ⟨b1, b_i⟩.
Since B is orthogonal, we know that ⟨b1, b_i⟩ is zero for all i ≠ 1, and when i = 1 we can rewrite this as ⟨b1, b1⟩ = ‖b1‖², so our equation becomes
⟨b1, v⟩ = c1 ‖b1‖²   ⇒   c1 = ⟨b1, v⟩ / ‖b1‖².
Now this clearly didn’t depend on the fact that we used b1 , and if we repeat this process with any
other bi , we could get ci = hbi , vi /kbi k2 . These are precisely the coefficients given in Equation
(4.3).
Notice then that if B is an orthogonal basis, Theorem 4.14 says that the coordinate transfor-
mation has the form
CB(v) = [ ⟨v, b1⟩/‖b1‖², . . . , ⟨v, bn⟩/‖bn‖² ]^T.
This means that we don’t have to do any row reduction to establish the coordinate transformations.
Example 4.15
Consider R² with the Euclidean inner product. Let B be the orthonormal basis
B = { (1/√2)[1, 1]^T, (1/√2)[1, −1]^T }.
Write the vector v = [a, b]^T in this basis.
Solution. Since B is orthonormal, we just need to compute ⟨v, bi⟩ for each bi ∈ B. Indeed,
⟨v, b1⟩ = (a + b)/√2   and   ⟨v, b2⟩ = (a − b)/√2.
Thus
[a, b]^T = ((a + b)/√2) b1 + ((a − b)/√2) b2.
We’ll see in the next section that the form of these coefficients has a particularly nice geometric
interpretation.
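Theorem 4.14 means no row reduction is needed for orthogonal bases; the coefficients are just inner products. A quick NumPy check of Example 4.15 (my own sketch):

    import numpy as np

    b1 = np.array([1., 1.]) / np.sqrt(2)
    b2 = np.array([1., -1.]) / np.sqrt(2)
    v = np.array([3., 7.])                      # the vector [a, b]^T with a = 3, b = 7

    # Theorem 4.14: for an orthogonal basis, c_i = <v, b_i> / ||b_i||^2.
    coeffs = [np.dot(v, b) / np.dot(b, b) for b in (b1, b2)]
    print(coeffs)                                            # (a+b)/sqrt(2) and (a-b)/sqrt(2)
    print(np.allclose(coeffs[0] * b1 + coeffs[1] * b2, v))   # the expansion recovers v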
Let Rn be endowed with the standard Euclidean inner product (the dot product), and recall from your previous linear algebra experience that if u, v ∈ Rn, we define the projection of v onto the subspace generated by u by
proj_u(v) = (⟨v, u⟩ / ‖u‖²) u.
In the special case where u is normal (‖u‖ = 1), the projection reduces to proj_u(v) = ⟨v, u⟩ u. In fact, the reason that the ‖u‖² term appears is to ensure that we project onto a normal vector. Per our discussion in Section 4.2.1, u/‖u‖ is normal, so projecting onto it gives
⟨v, u/‖u‖⟩ (u/‖u‖) = (⟨v, u⟩ / ‖u‖²) u.
Nothing about our discussion above was particular to Rn or the Euclidean inner product, so we
can define projections in arbitrary spaces as follows:
Definition 4.16
Let V be an inner product space, and u ∈ V a non-zero vector. If v ∈ V is any other vector, then
proj_u(v) = (⟨v, u⟩ / ‖u‖²) u
is an element of span{u}, and we say that proj_u(v) is the projection of v onto the subspace spanned by u.
It should be evident that proju (v) ∈ span {u}, since it is just a scalar multiple of u. One of the
key considerations of this definition is that v − proju (v) is then orthogonal (Definition 4.3) to u,
and hence to any element in span{u}. Indeed,
⟨v − proj_u(v), u⟩ = ⟨v, u⟩ − (⟨v, u⟩ / ‖u‖²) ⟨u, u⟩ = 0.
Definition 4.17
If V is an inner product space and S ⊆ V is any subset, the orthogonal complement of S is
S^⊥ = {v ∈ V : ⟨v, w⟩ = 0, ∀w ∈ S}.
Our goal is to say that V = U ⊕ U ⊥ for any subspace U of V . Before we can do that, we should
check a few things about U ⊥ . These are all good exercises, and have been left to Exercise 4.2-4.
Proposition 4.18
If V is an inner product space and S ⊆ V is any subset, then S^⊥ is a subspace of V.
Definition 4.19
If V is a finite dimensional inner product space, let U be a subspace with orthogonal basis
{b1 , . . . , bk }. If v ∈ V , then the orthogonal projection of v onto U is
proj_U(v) = \sum_{i=1}^{k} (⟨v, b_i⟩ / ‖b_i‖²) b_i.    (4.4)
Knowing that U ⊥ is a subspace means we’re on the right track to showing that it is comple-
mentary to U . However, we’re going to need a way of writing arbitrary vectors as the sum of an
element in U and an element in U ⊥ . To do that, we’ll use the following:
Proposition 4.20
If V is a finite dimensional inner product space and U is a subspace of V , then for any v ∈ V
we have that projU (v) ∈ U and v − projU (v) ∈ U ⊥ .
Proof. Fix an orthogonal basis B = {b1, . . . , bk} of U. Equation (4.4) makes it clear that proj_U(v) ∈ span{b1, . . . , bk} = U, so we need only show the orthogonality part. To show that v − proj_U(v) ∈ U^⊥, we need to show that it is orthogonal to every element of U. Suppose u ∈ U, and write u = \sum_i c_i b_i, so that
⟨v − proj_U(v), u⟩ = ⟨v, u⟩ − ⟨ \sum_{i=1}^{k} (⟨v, b_i⟩/‖b_i‖²) b_i, u ⟩ = ⟨v, u⟩ − \sum_{i=1}^{k} (⟨v, b_i⟩/‖b_i‖²) ⟨b_i, u⟩.    (4.5)
Now
⟨b_i, u⟩ = ⟨ b_i, \sum_{j=1}^{k} c_j b_j ⟩ = c_i ‖b_i‖²   and   ⟨v, u⟩ = ⟨ v, \sum_{i=1}^{k} c_i b_i ⟩ = \sum_{i=1}^{k} c_i ⟨v, b_i⟩.
Substituting these into Equation (4.5) gives
⟨v − proj_U(v), u⟩ = \sum_{i=1}^{k} c_i ⟨v, b_i⟩ − \sum_{i=1}^{k} (⟨v, b_i⟩/‖b_i‖²) c_i ‖b_i‖² = \sum_{i=1}^{k} c_i ⟨v, b_i⟩ − \sum_{i=1}^{k} c_i ⟨v, b_i⟩ = 0,
showing that v − proj_U(v) is orthogonal to every u ∈ U, as required.
Corollary 4.21
If V is a finite dimensional inner product space and U is a subspace of V, then V = U ⊕ U^⊥.
Proof. It suffices to show that V = U + U ⊥ and U ∩ U ⊥ = {0}. To show that the intersection is
trivial, let v ∈ U ∩ U ⊥ be any element. Since v ∈ U ⊥ , we know that hv, wi = 0 for all w ∈ U . But
since v ∈ U this implies that hv, vi = 0. By positive-definiteness of the inner product, v = 0.
To show that V = U + U ⊥ , fix any element v ∈ V . Note that v = projU (v) + (v − projU (v)).
By Proposition 4.20, this is an element of U + U ⊥ , and since v was arbitrary, it must be the case
that V = U + U ⊥ .
Example 4.22
Take V = R3 with the standard inner product, v = (1, 0, −1)T , and let S = span {v}. Find
S⊥.
Solution. As dim(S) = 1 we know dim(S^⊥) = 2, and as such S^⊥ is a plane. Every element of S looks like αv = (α, 0, −α) for some α ∈ R, so the orthogonal complement is
S^⊥ = {y : ⟨y, x⟩ = 0, ∀x ∈ S}
    = {(x, y, z) : (x, y, z) · (α, 0, −α) = 0 for all α}
    = {(x, y, z) : x − z = 0}.
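Proposition 4.20 and Example 4.22 are easy to check numerically with the projection formula (4.4). A small sketch of my own:

    import numpy as np

    u = np.array([1., 0., -1.])                 # spans S
    v = np.array([2., 5., 4.])                  # an arbitrary vector in R^3

    proj = (np.dot(v, u) / np.dot(u, u)) * u    # proj_S(v), Equation (4.4) with one basis vector
    residual = v - proj

    print(np.isclose(np.dot(residual, u), 0))         # the residual is orthogonal to S
    print(np.isclose(residual[0] - residual[2], 0))    # it satisfies x - z = 0, so it lies in S^perp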
So given a subspace, how do we find or construct an orthogonal basis? The key is to use Proposition
4.20 to inductively construct an orthogonal basis.
Corollary 4.23
Let V be an inner product space, and let U be a subspace with orthogonal basis B = {b1, . . . , bk}. If v ∈ V with v ∉ U, then v − proj_U(v) is a non-zero vector orthogonal to every element of U.
Proof. Proposition 4.20 already tells us that v − projU (v) is orthogonal to every element of B, so
it suffices to show that it’s non-zero. Let’s prove the contrapositive; namely, if v − projU (v) = 0
then v ∈ U . This follows quickly, since by hypothesis we must have
v = proj_U(v) = \sum_{i=1}^{k} (⟨v, b_i⟩/‖b_i‖²) b_i ∈ span B = U.
Recall from Exercise 4.1-7 that an orthogonal set which does not contain the zero vector is
automatically linearly independent. Together with Corollary 4.23, this means that if we have an
orthogonal, linearly independent set B, we can build successively larger linearly independent sets
by finding an element v which is not in the span of B, and adding v − projU (v) to our collection.
We are guaranteed that this new set of vectors is also orthogonal and linearly independent. If our
space is finite dimensional, we will eventually exhaust all elements of the space, and hence have
created an orthogonal basis. This is called the Gram-Schmidt Orthogonalization Algorithm:
Proposition 4.24: Gram-Schmidt Orthogonalization
Let V be an inner product space, and let {b1, . . . , bk} be a basis for a subspace U ⊆ V. Define vectors f1, . . . , fk recursively as follows:
1. f1 = b1;
2. Assuming the vectors fi, i = 1, . . . , ℓ have been constructed, let Uℓ = span{f1, . . . , fℓ} and define fℓ+1 = bℓ+1 − proj_{Uℓ}(bℓ+1).
Then F = {f1, . . . , fk} is an orthogonal basis for U.
Proof. It is evident that F is orthogonal, so we need only show that we span U . For this, we will
proceed by induction, and show that span {b1 , . . . , b` } = span {f1 , . . . , f` } for each ` = 1, . . . , k.
Since f1 = b1 , it’s evident that span {b1 } = span {f1 }. Now suppose that span {b1 , . . . , b` } = U` =
span {f1 , . . . , f` }. It suffices to show that b`+1 ∈ span {f1 , . . . , f` , f`+1 } and vice-versa for f`+1 .
By definition, we know that f_{ℓ+1} = b_{ℓ+1} − proj_{Uℓ}(b_{ℓ+1}), and that proj_{Uℓ}(b_{ℓ+1}) ∈ Uℓ = span{b1, . . . , bℓ}. Thus we can find scalars c_i such that proj_{Uℓ}(b_{ℓ+1}) = \sum_{i=1}^{ℓ} c_i b_i, and
f_{ℓ+1} = b_{ℓ+1} − \sum_{i=1}^{ℓ} c_i b_i ∈ span{b1, . . . , b_{ℓ+1}},
as required. Conversely, we do the same thing. Since proj_{Uℓ}(b_{ℓ+1}) ∈ Uℓ = span{f1, . . . , fℓ} we can find coefficients d_i such that
f_{ℓ+1} = b_{ℓ+1} − proj_{Uℓ}(b_{ℓ+1}) = b_{ℓ+1} − \sum_{i=1}^{ℓ} d_i f_i.
By rearranging this equation to solve for b_{ℓ+1}, we have that b_{ℓ+1} ∈ span{f1, . . . , f_{ℓ+1}}. Both inclusions give the desired result.
Before doing any examples, note that if we replace f_i with c f_i for a non-zero scalar c, then for any vector v we have
proj_{c f_i}(v) = (⟨v, c f_i⟩ / ‖c f_i‖²) c f_i = (c² ⟨v, f_i⟩ / c² ‖f_i‖²) f_i = proj_{f_i}(v).
This means that once we've found a vector f_i, we can multiply it by a scalar without affecting the orthogonality algorithm. This is useful, since the projection often involves a lot of fractions,
and carrying those fractions throughout the algorithm becomes cumbersome. On the other hand,
we shouldn’t be surprised by this. Multiplying a vector by a scalar affects neither its span, nor its
orthogonality (hci fi , cj fj i = ci cj hfi , fj i = 0), so this seems like a reasonable thing to do.
Example 4.25
Consider R⁴ with the standard Euclidean inner product. Find an orthogonal basis of the space
U = span{ [1, 1, 0, 0]^T, [0, 1, 1, 0]^T, [1, 1, 1, 0]^T }.
Solution. The question is posed in such a way that we have an obvious basis to work with. Let
b1 , b2 , b3 be the bases, in order, given in the problem statement. Working through our problem,
we have
f1 = b1 = [1, 1, 0, 0]^T,
f̂2 = b2 − (⟨b2, f1⟩/‖f1‖²) f1 = [0, 1, 1, 0]^T − (1/2)[1, 1, 0, 0]^T = [−1/2, 1/2, 1, 0]^T,   take f2 = [−1, 1, 2, 0]^T,
f̂3 = b3 − (⟨b3, f1⟩/‖f1‖²) f1 − (⟨b3, f2⟩/‖f2‖²) f2 = [1, 1, 1, 0]^T − (2/2)[1, 1, 0, 0]^T − (2/6)[−1, 1, 2, 0]^T = [1/3, −1/3, 1/3, 0]^T,   take f3 = [1, −1, 1, 0]^T.
Try Example 4.25 without rescaling f2 , and you’ll see how terrible this computation becomes
in even the simplest of cases.
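For comparison, here is a short NumPy implementation of the algorithm in Proposition 4.24, run on the basis of Example 4.25 (a sketch of my own; it skips the hand rescaling, so the output is f1, f̂2, f̂3 exactly as computed above):

    import numpy as np

    def gram_schmidt(vectors):
        """Orthogonalize a list of linearly independent vectors (Proposition 4.24)."""
        basis = []
        for b in vectors:
            f = b.astype(float)
            for g in basis:
                f = f - (np.dot(b, g) / np.dot(g, g)) * g   # subtract the projection onto each earlier f
            basis.append(f)
        return basis

    b1 = np.array([1, 1, 0, 0])
    b2 = np.array([0, 1, 1, 0])
    b3 = np.array([1, 1, 1, 0])

    F = gram_schmidt([b1, b2, b3])
    print(F)   # [1,1,0,0], [-1/2,1/2,1,0], [1/3,-1/3,1/3,0]
    print(all(abs(np.dot(F[i], F[j])) < 1e-12 for i in range(3) for j in range(i)))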
Exercises
4.2-3. Suppose {b1, . . . , bn} is an orthonormal basis for an inner product space V. Show that if v = \sum_i c_i b_i then ‖v‖² = \sum_i c_i².
4.2-4. Let V be a finite dimensional inner product space. Show that for any subspace S ⊆ V ,
(S ⊥ )⊥ = S.
4.2-6. Show that v, w ∈ V are orthogonal if and only if kv + wk2 = kvk2 + kwk2 .
4.2-7. Let B = {b1, . . . , bn} be an orthogonal basis for V. Show that for any v, w ∈ V,
⟨v, w⟩ = \sum_{i=1}^{n} ⟨v, b_i⟩ ⟨w, b_i⟩ / ‖b_i‖².
4.3 Isometries
When we discussed linear transformations and eventually isomorphisms, we talked about how we
wanted our functions to preserve the underlying structure. Since addition and scalar multiplication
were the operations of a vector space, linear transformations should preserve that. With inner
products, norms, and metrics, we’ve added an additional structure.
Definition 4.26
If (V, h·, ·iV ) and (W, h·, ·iW ) are inner product spaces, we say that the linear transformation
T : V → W is an isometry if hT x, T yiW = hx, yiV for all x, y ∈ V .
That is to say, an isometry is a linear transformation which preserves the inner product. Even more insightful is what an isometry does to the induced norms. If ‖·‖V and ‖·‖W are the norms induced by the inner products on V and W, then T is an isometry if
‖Tx‖²_W = ⟨Tx, Tx⟩_W = ⟨x, x⟩_V = ‖x‖²_V,
or equivalently, ‖Tx‖_W = ‖x‖_V. This is also the definition of an isometry on a normed vector
space, and it tells us that isometries preserve the length of vectors.
Exercise 4.2-3 gives us a nice example of an isometry. If V is a finite dimensional vector space of dimension n, endowed with an orthonormal basis B = {b1, . . . , bn}, then the coordinate transformation CB : V → Rn is an isometry if Rn is equipped with the Euclidean inner product. Indeed, note that if v = \sum_i c_i b_i then Exercise 4.2-3 says that
⟨v, v⟩_V = ‖v‖² = \sum_i c_i².
On the other hand, CB(v) = [c1, c2, . . . , cn]^T, which has Euclidean norm satisfying ‖CB(v)‖²_Euc = \sum_i c_i². Thus ⟨v, v⟩_V = ⟨CB(v), CB(v)⟩_Euc, as required.
Proposition 4.27
If T : V → V is a linear operator on the inner product space V , then the following are
equivalent
1. T is an isometry;
2. If B = {b1, . . . , bn} is an orthonormal basis of V, then T B = {T(b1), . . . , T(bn)} is an orthonormal basis of V;
3. There exists an orthonormal basis B = {b1, . . . , bn} of V such that T B is an orthonormal basis of V;
4. ‖T(v)‖ = ‖v‖ for every v ∈ V.
Proof. Let’s prove (1) ⇒ (2). Since T is an isometry, it preserves orthogonality and the norm, since
hT bi , T bj i = hbi , bj i. Hence T B is orthogonal, and it remains to show that it is a basis. We can
show that T B is a basis simply by showing that it does not contain the zero vector, since then it is
a maximally linearly independent set. Suppose then that T bi = 0 for one of the bi ∈ B, in which
case
0 = ‖T bi‖² = ⟨T bi, T bi⟩ = ⟨bi, bi⟩ = ‖bi‖².
This implies that bi = 0, which is impossible, since B is a basis. Thus none of the T B elements are
zero, and it’s an orthogonal basis.
The proof of (2) ⇒ (3) is trivial, so let's move to (3) ⇒ (4). Fix an arbitrary v ∈ V and write it in the basis B as v = \sum_i c_i b_i. Now T(v) = \sum_i c_i T(b_i), and by Exercise 4.2-3 we have
‖T v‖² = \sum_{i=1}^{n} c_i² = ‖v‖².
Finally, for (4) ⇒ (1), we know that ‖T v − T w‖² = ‖T(v − w)‖² = ‖v − w‖². Rewriting this in terms of the inner product gives
hT v − T w, T v − T wi = hT v, T vi + hT w, T wi − 2 hT v, T wi
= hv, vi + hw, wi − 2 hT v, T wi since kT vk2 = kvk2
= hv − w, v − wi since kT (v − w)k2 = kv − wk2
= hv, vi + hw, wi − 2 hv, wi .
Hence it must be the case that hT v, T wi = hv, wi, showing that T is an isometry.
By condition (2) of the above proposition, we can immediately conclude that all isometric
operators are isomorphisms.
Definition 4.28
A matrix A ∈ Mn (R) is said to be orthogonal if AT A = In = AAT .
In fact, if you think about how matrix multiplication works, you'll see that the columns of an orthogonal matrix are precisely an orthonormal basis for Rn. This is because if we write such an orthogonal matrix in terms of its columns, A = [c1 c2 · · · cn], then (A^T A)_{ij} = c_i^T c_j = ⟨c_i, c_j⟩.
The requirement that this be the identity matrix says that hci , cj i = 1 if i = j and hci , cj i = 0 if
i 6= j. This is precisely what it means for the columns to be orthonormal, and there are precisely
n of them, so they form a basis.
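This observation is easy to test numerically; in the sketch below (my own, not part of the notes) the orthonormal columns come from a QR factorization:

    import numpy as np

    rng = np.random.default_rng(2)
    Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))   # columns of Q form an orthonormal basis of R^4

    print(np.allclose(Q.T @ Q, np.eye(4)))           # A^T A = I ...
    print(np.allclose(Q @ Q.T, np.eye(4)))           # ... and A A^T = I, so Q is orthogonal
    print(np.allclose(np.abs(np.linalg.det(Q)), 1))  # orthogonal matrices have determinant +-1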
Proposition 4.29
If V is a finite dimensional inner product space and T : V → V , the following are equivalent:
1. T is an isometry;
2. MB(T) is an orthogonal matrix for every orthonormal basis B of V;
3. MB(T) is an orthogonal matrix for some orthonormal basis B of V.
Proof. For (1) ⇒ (2), fix an arbitrary orthonormal basis B = {b1, . . . , bn} of V, and recall that CB : V → Rn is an isometry if Rn is endowed with the Euclidean inner product. Now MB(T) = [CB(T(b1)) CB(T(b2)) · · · CB(T(bn))], and the claim that MB(T) is orthogonal reduces to checking that its columns are orthonormal in the Euclidean inner product. Indeed,
⟨CB(T(bi)), CB(T(bj))⟩_Euc = ⟨T(bi), T(bj)⟩ = ⟨bi, bj⟩.
Thus the orthonormality of the columns of MB(T) follows from the orthonormality of the original basis.
The direction (2) ⇒ (3) is trivial since the Gram-Schmidt algorithm ensures the existence of an orthonormal basis, so all that remains is (3) ⇒ (1). Let B be some orthonormal basis in which MB(T) is orthogonal, and again recall that CB : V → Rn is therefore an isometry. Thus
⟨T(bi), T(bj)⟩ = ⟨CB(T(bi)), CB(T(bj))⟩_Euc = 1 if i = j, and 0 if i ≠ j,
so T B is an orthonormal basis of V, and Proposition 4.27 then implies that T is an isometry.
Exercises
4.3-3. Let B and C be any two orthonormal bases of an inner product space V . Show that there is
an isometry T : V → V such that T takes B to C.
4.3-4. Let V be an inner product space. The adjoint of a linear operator T : V → V is a linear map T* : V → V satisfying ⟨Tx, y⟩ = ⟨x, T*y⟩ for all x, y ∈ V, and T is called symmetric if T = T*.
(a) If V is Euclidean Rn and A ∈ Mn(R), let TA : Rn → Rn, x ↦ Ax. Show that (TA)* = T_{A^T}.
(b) Let V be a finite dimensional inner product space with orthonormal basis B, and T : V → V a linear operator. Show that MB(T*) = MB(T)^T.
4.3-5. Let V be a finite dimensional inner product space. If T : V → V is a linear operator, show
that the following are equivalent:
(a) T is symmetric.
(b) MB (T ) is symmetric for every orthonormal basis B of V .
(c) MB (T ) is symmetric for some orthonormal basis B of V .
(d) There is an orthonormal basis B = {b1 , . . . , bn } of V such that hbi , T bj i = hT bi , bj i.
4.3-6. Suppose T : V → V is a linear operator on the inner product space V , and U is a T -invariant
subspace of V . Show that U ⊥ is a T ∗ invariant subspace.
4.3-7. Recall that if V is a vector space, we define its dual space as the vector space V ∗ = L(V, R).
4.3-8. (a) If ⟨·, ·⟩ is an inner product on V, for each v ∈ V define the map fv : V → R by fv(w) = ⟨v, w⟩. Show that fv ∈ V* for each v ∈ V.
(b) Suppose that V is a finite dimensional vector space. Show that the map T : V → V ∗
given by v 7→ fv is an isomorphism. (This result is also true if the vector space has
countable dimension, and V is “complete” in the induced norm. The proof is nearly
identical, but does require a few extra tricks.)
(c) Show that under the identification V ≅ V*, we can write ⟨·, ·⟩ : V × V → R as ⟨·, ·⟩ : V* × V → R, and that in this case the adjoint and dual operators coincide.
(a) T is symmetric.
(b) T is an involution T 2 = id.
(c) T is an isometry.
4.4 Diagonalizability
In Exercise 4.3-4 we learned about the adjoint of a linear operator, and what it means to be
symmetric. Let’s repeat it just so we have it here in front of us:
Definition 4.30
If V is an inner product space and T : V → V is a linear operator, the adjoint of T is a
linear map T ∗ : V → V such that hT x, yi = hx, T ∗ yi for all x, y ∈ V . A linear operator is
symmetric (or self-adjoint) if T = T ∗ .
From Exercise 4.3-4, we know that the adjoint of the matrix representation of a linear transformation is computed via the transpose. From Section 3.2 we also know that a matrix is diagonalizable precisely when its eigenvectors form a basis for V. In this section we'll show that every symmetric operator is diagonalizable.
Proposition 4.31
If V is a finite dimensional inner product space and T : V → V is a symmetric operator, then every eigenvalue of T is real.
Proof. There is a way to prove this in general, but it requires that we have the notion of a sesquilin-
ear inner product, which isn’t worth introducing just for this proof. So instead, let’s recall that
eigenvalues are invariant under our choice of coordinate representative for the linear transformation.
Fix an orthonormal basis B of V so that A = MB(T) is a symmetric matrix. Let λ be an eigenvalue of A, so that Av = λv for some (possibly complex) eigenvector v ≠ 0. It suffices to show that λ = λ̄. Since A is real and symmetric, conjugating gives A v̄ = λ̄ v̄, so
λ̄ (v̄^T v) = (A v̄)^T v = v̄^T A v = λ (v̄^T v),
and since v̄^T v = \sum_i |v_i|² > 0 we may divide by it to conclude that λ̄ = λ.
Indeed, if you look at this proof, you're seeing something that looks a lot like an inner product. We know from Exercise 4.1-8 that every inner product on Rn is of the form ⟨x, y⟩ = x^T A y for some positive definite symmetric matrix A. When our vectors live in C^n though, we have to change this to ⟨x, y⟩ = x̄^T A y, and A becomes a positive definite “Hermitian” matrix.
Anyway, the point of having a symmetric matrix is that it would be impossible to diagonalize a matrix over R if it had complex eigenvalues. Knowing that the eigenvalues of a symmetric transformation are real therefore gives us a much better chance of diagonalizing.
Proposition 4.32
Suppose T : V → V is a symmetric operator on the inner product space V, and U ⊆ V is a T-invariant subspace. Then:
1. The restriction of T to U is a symmetric operator on U, and
2. U^⊥ is also T-invariant.
Proof.
1. This is immediate. Since U is itself a vector space, the restriction of h·, ·i to U is also an inner
product space. Moreover, since hT x, yi = hx, T yi holds for every x, y ∈ V , it certainly holds
for every x, y ∈ U .
2. This follows immediately from Exercise 4.3-6. We know that U ⊥ is T ∗ invariant, but since T
is symmetric, T ∗ = T ; namely, U ⊥ is T -invariant.
Theorem 4.33
Let V be a finite dimensional inner product space and T : V → V a linear operator. Then T is symmetric if and only if V admits an orthogonal basis consisting of eigenvectors of T.
Proof. Suppose first that T is a symmetric operator; we proceed by induction on the dimension of V. If dim V = 1 then every linear operator acts by scalar multiplication: T(v) = cv for some c ∈ R. Every non-zero vector of V is then an eigenvector, and any one of them forms an orthogonal basis for V, so we're done.
For the inductive step, assume the result holds whenever dim V = n − 1, and let's prove it when dim V = n. Fix an eigenvalue λ of T, which we know is real, and let bn be an eigenvector associated to λ. Set U = span{bn}, and let U^⊥ be its orthogonal complement. Clearly U is T-invariant, so by Proposition 4.32, U^⊥ is T-invariant as well. Moreover, dim U^⊥ = n − 1 and T restricts to a symmetric operator on U^⊥, so by induction U^⊥ admits an orthogonal basis {b1, . . . , bn−1} of eigenvectors of T. The set B = {b1, . . . , bn} is the desired basis.
Conversely, suppose B = {b1, . . . , bn} is an orthogonal basis of eigenvectors of T; by normalizing each bi we may assume B is orthonormal. Then MB(T) is diagonal, and hence trivially symmetric, so by Exercise 4.3-5, T is symmetric.
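For matrices, Theorem 4.33 is exactly what numpy.linalg.eigh computes: an orthonormal basis of eigenvectors of a symmetric matrix, in which the operator is diagonal. A sketch of my own:

    import numpy as np

    A = np.array([[2., 1., 0.],
                  [1., 2., 1.],
                  [0., 1., 2.]])                      # symmetric

    eigvals, P = np.linalg.eigh(A)                    # columns of P: orthonormal eigenvectors
    print(np.allclose(P.T @ P, np.eye(3)))            # the eigenbasis is orthonormal
    print(np.allclose(P.T @ A @ P, np.diag(eigvals))) # M_B(T_A) is diagonal in this basis
    print(np.isrealobj(eigvals))                      # eigenvalues are real (Proposition 4.31)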
Exercises
Show that T is symmetric, and find an orthonormal basis for P2 (R) consisting of eigenvectors
of T .
• It’s characteristic polynomial pT (λ) splits over R; that is, pT can be written as a product
of linear factors.
• Whenever U is a T -invariant subspace of V , then U ⊥ is also a T invariant subspace of
V.