
MAT224

Course Notes

Tyler Holden
Mathematics and Computational Sciences
University of Toronto Mississauga
[email protected]
Contents

1 General Vector Spaces
1.1 Vector Spaces
1.2 Subspaces
1.3 Spanning Sets
1.4 Linear Independence
1.5 Basis and Dimension
1.6 Operations on Vector Spaces
1.6.1 Internal Direct Sum
1.6.2 External Direct Sum
1.6.3 Quotient Spaces

2 Linear Transformations
2.1 Linear Maps
2.2 The Kernel and Image
2.3 Isomorphisms

3 Change of Basis
3.1 Coordinate Transformations
3.2 Change of Basis Matrix
3.3 Invariant Subspaces

4 Inner Products and Friends
4.1 Measurement Devices
4.1.1 Inner Products
4.1.2 Norms
4.1.3 Metrics
4.2 Orthonormal Bases
4.2.1 Orthonormality
4.2.2 Orthogonal Projections
4.2.3 Gram-Schmidt Orthogonalization
4.3 Isometries
4.4 Diagonalizability

1 General Vector Spaces

Your previous exposure to linear algebra was limited to studying spaces like Rn for some n ∈ N. Our
goal is to generalize the properties of Rn and define something called vector spaces. When studying
Rn there were two fundamental “classes” of objects: There were vectors v = [v1 v2 · · · vn]^T ∈ Rn,
and scalars c ∈ R. We can add two vectors together, and multiply by scalars, as follows:

\[
\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} +
\begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{bmatrix} =
\begin{bmatrix} v_1 + w_1 \\ v_2 + w_2 \\ \vdots \\ v_n + w_n \end{bmatrix}
\qquad\text{and}\qquad
c \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} =
\begin{bmatrix} c v_1 \\ c v_2 \\ \vdots \\ c v_n \end{bmatrix}.
\]

An alternative viewpoint, which will be useful throughout the course, is to write vectors in terms
of the “standard basis.” Let ei = [0 · · · 0 1 0 · · · 0]^T denote the vector whose value is 1 in the
ith row, and 0 everywhere else. The vector v = [v1 v2 · · · vn]^T can be written as
v = v1 e1 + v2 e2 + · · · + vn en = Σ_{i=1}^n vi ei. Here the ei act as placeholders, with vector addition
and scalar multiplication reading as

\[
\sum_{i=1}^{n} v_i e_i + \sum_{j=1}^{n} w_j e_j = \sum_{i=1}^{n} (v_i + w_i) e_i
\qquad\text{and}\qquad
c \sum_{i=1}^{n} v_i e_i = \sum_{i=1}^{n} (c v_i) e_i. \tag{1.1}
\]

The idea is to now generalize this to a more abstract setting. We will define a space consisting of
vectors and scalars, demanding that vectors can be added to one another, and multiplied by scalars.
If we can prove general properties about such spaces, we may be able to discover interesting results
that apply to more than just Rn .


1.1 Vector Spaces

Definition 1.1
A (real) vector space is a set V equipped with two operations, called vector addition + :
V × V → V and scalar multiplication · : R × V → V . When u, v ∈ V , we often write vector
addition as u + v. When c ∈ R and v ∈ V , we will often write scalar multiplication as
c · v = cv. These two operators must satisfy the following properties:

1. [Closure of addition] For all u, v ∈ V , u + v ∈ V .

2. [Commutativity of addition] For all u, v ∈ V , u + v = v + u.

3. [Associativity of addition] For all u, v, w ∈ V , (u + v) + w = u + (v + w).

4. [Existence of an additive identity] There exists an element 0 ∈ V such that v + 0 = 0 + v = v for all v ∈ V .

5. [Existence of an additive inverse] For all u ∈ V , there exists an element −u ∈ V such that u + (−u) = 0.

6. [Closure of scalar multiplication] For all u ∈ V and c ∈ R, cu ∈ V .

7. [Associativity of scalar multiplication] If u ∈ V and c, d ∈ R, then (cd)u = c(du) = d(cu).

8. [Compatibility] If u, v ∈ V and c, d ∈ R then c(u+v) = cu+cv and (c+d)u = cu+du.

9. [Action of the multiplicative identity] For any v ∈ V , 1v = v.

The formal definition of a vector space is rather tedious, and in fact I don’t remember all of
these conditions. Mathematicians internalize the notion of a vector space differently.1 That being
said, let’s quickly discuss the implications of some of these conditions:

• The closure properties (1) and (6) are sanity conditions, and say that we’re not allowed to
leave V . In this sense, the set V = [0, 1] with the usual notions of addition and multiplication,
cannot be a vector space. Indeed, if v = 0.8 and w = 0.9 then v + w = 1.7, which is not in
V.

• Properties (2), (3), and (7) say that we don’t have to worry about the order in which we
apply our operations.

• Property (8) ensures that addition and scalar multiplication play together nicely, and is the
only condition which mixes both addition and multiplication. This is why it’s called the
compatibility condition.

• Finally, properties (4), (5), and (9) tell us how the special numbers 0, 1 ∈ R interact with
vectors, and that vectors have inverses.
¹To a mathematician, the definition of a vector space is “a module over a field.”


If you’ve studied the field axioms before, some of this will probably seem familiar. Like with
the field axioms, many of these seem “obvious,” but be careful not to assume that anything beyond
these facts is true. Moreover, you may have noticed that I put the word “real” in brackets. In this
course we will only study real vector spaces, so I may be lazy and drop this adjective; however, in
more abstract linear algebra, you learn how to apply the principles we’ll learn to vector spaces over
other fields, such as C or Z2 .
The definition of a vector space makes it tedious to show that something is a vector space.
While you must ensure that all of the conditions are satisfied, many of them are usually trivial.
You should feel free to wave away anything that is trivial by providing a quick comment.
Example 1.2

If n ∈ N, the set Rn = {(v1 , . . . , vn ) : vi ∈ R} is a vector space under the usual operations of vector addition and scalar multiplication.

Example 1.3

Let n be a non-negative integer, and define the set

Pn (R) = {a0 + a1 x + a2 x2 + · · · + an xn : ai ∈ R},

the set of polynomials of degree at most n. Show that this is a vector space under the usual notions of polynomial addition and multiplication by a real number.

Solution. Here I’ll demonstrate what I mean by discarding most of the vector space axioms. For
example, polynomial addition is certainly commutative (2) and associative (3), so you don’t need
to prove this. Similarly, (7), (8), and (9) are all clearly true. Thus we need only prove (1), (4), (5),
and (6).
Let’s check closure. Fix two polynomials p, q ∈ Pn (R) and write them as

\[
p(x) = \sum_{i=0}^{n} p_i x^i \qquad\text{and}\qquad q(x) = \sum_{i=0}^{n} q_i x^i.
\]

If c ∈ R then

\[
p(x) + q(x) = \sum_{i=0}^{n} (p_i + q_i) x^i \in P_n(\mathbb{R})
\qquad\text{and}\qquad
c\,p(x) = \sum_{i=0}^{n} (c p_i) x^i \in P_n(\mathbb{R}), \tag{1.2}
\]

showing that Pn (R) is closed under both addition and scalar multiplication. Notice here the similarity between (1.1) and (1.2), and how the xi play the role of placeholders, just like the ei do in Rn.

The zero vector is the zero polynomial, since if 0(x) = 0 then for any other polynomial q(x) we have 0(x) + q(x) = 0 + q(x) = q(x). This might be obvious, but the presence of the zero vector is always critical to check.


Finally, every element has an additive inverse. If p is as above, define

\[
-p(x) = \sum_{i=0}^{n} (-p_i) x^i
\]

so that

\[
p(x) + (-p(x)) = \sum_{i=0}^{n} [p_i + (-p_i)] x^i = \sum_{i=0}^{n} 0\,x^i = 0. \qquad\blacksquare
\]

This brings us to the following lemma, which gives us a few more properties of vector spaces
that can be proven from the axioms:
Lemma 1.4

Let V be a vector space, with u, v, w ∈ V and c ∈ R. The following are all true:

1. If u + v = u + w then v = w.

2. cv = 0 if and only if either c = 0 or v = 0.

3. (−1)v = −v

4. (−c)v = −(cv) = c(−v).

5. The zero vector is unique.

6. If v ∈ V , its additive inverse −v is unique.

Proof. These are all good exercises in axiom manipulation, so I will prove (2) and leave the rest to you. Since (2) is an “if and only if” statement, we prove each direction separately.

(⇐) Let’s start by assuming that c = 0, and show that 0v = 0 for any v ∈ V . Now 0v = (0 + 0)v = 0v + 0v. Writing this as 0v + 0v = 0 + 0v, we can use Property (1) to cancel a 0v from either side, giving 0v = 0.

In a similar vein, let’s now assume that v = 0 and show that c0 = 0 for any c ∈ R. Employing a similar technique, c0 = c(0 + 0) = c0 + c0. Once again using the cancellation property from (1), we get c0 = 0.

(⇒) Now assume cv = 0; we must show that either c = 0 or v = 0. If c = 0 then we’re done, so assume c ≠ 0 and multiply everything by c⁻¹ to get

v = c⁻¹(cv) = c⁻¹(0) = 0,

thus either c = 0 or v = 0.

The above lemma needed to be proved for the sake of completeness, but this sort of algebraic
manipulation isn’t going to show up a lot. You should still do the other three proofs for exercise,
but don’t worry about mastering this sort of thing.
Before moving on, it’s worth pointing out Property (3) of Lemma 1.4. You may have thought
that this was obviously true, but Definition 1.1 did not give this to you. When we say that


v + (−v) = 0, the −v here was just notation, indicating that the element −v was related to v.
The fact that (−1)v = −v is a convenient a posteriori choice of notation. In terms of application,
note that in Example 1.3 we could have skipped showing the existence of −p(x): Once you know
that V is closed under scalar multiplication, (−1)p = −p guarantees the existence of the additive
inverse. This holds in general, and one need never check this in the definition of a vector space.
Example 1.5

Determine whether the set H = {(x, y) : y ≥ 0} is a vector space under the usual addition
and scalar multiplication.

Solution. Because we’re using the usual definitions of addition and scalar multiplication, most of the
axioms are trivially true. The main conditions to check are closure of addition, closure of scalar
multiplication, and the existence of the zero vector.
Now 0 = (0, 0) is certainly a zero vector, so that’s not an issue. Moreover, vector addition is closed, since if v = (x1 , y1 ), w = (x2 , y2 ) ∈ H, then the only restriction is that y1 , y2 ≥ 0. Adding these together gives v + w = (x1 + x2 , y1 + y2 ), and since y1 + y2 ≥ 0, we know that v + w ∈ H as well.
Now the troublesome property: H is not closed under scalar multiplication. Indeed, v = (0, 1) ∈
H. However, (−1)v = (0, −1) is not an element of H since its y-coordinate is negative. Because of
this, H is not a vector space. 

Exercises

1.1-1. For each of the following subsets of R, indicate whether the set is a vector space with the
usual notion of addition and multiplication inherited from R:

(a) R (b) (0, ∞) (c) {0} (d) ∅ (e) Q

1.1-2. Let Mm×n (R) denote the collection of m × n real matrices. Show that Mm×n (R) is a vector
space.

1.1-3. Let D([a, b]) = {f : [a, b] → R : f is differentiable}; that is, the set of differentiable functions
whose domain is the interval [a, b]. Show that D[a, b] is a vector space.
1.1-4. Fix an N ∈ N, and let N̂ = {1, 2, . . . , N }. Define X = {f : N̂ → C}, the set of all functions from N̂ to the complex numbers. Show that X is a vector space.

1.1-5. Let V be a vector space, and A be any set. Define V A = {f : A → V }; the set of all
functions from A to V . Show that V A is a vector space under pointwise addition and scalar
multiplication.

1.1-6. Let D = {(x, y) : x2 + y2 ≤ 1}. Determine whether D is a vector space.

1.1-7. Fix some n ∈ N and define Cn = {(z1 , . . . , zn ) : zi ∈ C}. Determine whether Cn is a vector
space.


1.1-8. Fix some n ∈ N and define Zn = {(k1 , . . . , kn ) : ki ∈ Z}. Determine whether Zn is a vector
space.

1.1-9. Let P (R) denote the set of all polynomials with real coefficients; that is, there is no restriction
on the degree of the polynomial. Determine whether P (R) is a vector space.

1.1-10. Complete the proof of Lemma 1.4

1.2 Subspaces

In mathematics, we are often interested in “sub-objects;” that is, subsets of objects which share the same basic structure. In our case, this manifests as follows:
Definition 1.6
If V is a vector space, a subspace of V is any subset U ⊆ V which is a vector space under
the same operations of vector addition and scalar multiplication.

Strictly speaking, these should be called “sub-vector spaces,” but contextually it is obvious that
this is what is meant. Now we’ve seen that showing that a set is a vector space is tedious, and
we don’t want to repeat the process with subspaces. Luckily, we can avoid most of these issues by
using the fact that V is a vector space.
Theorem 1.7

If V is a vector space, a set U ⊆ V is a subspace if and only if

1. 0 ∈ U ,

2. U is closed under vector addition

3. U is closed under scalar multiplication.

Proof. (⇒) If U is a subspace then it is a vector space. The three properties above are axioms (4),
(1), and (6) of Definition 1.1, and so are trivially satisfied.
(⇐) The more interesting direction is that these three properties alone are sufficient to guarantee
all nine of the vector space axioms. The reason for this is that since U ⊆ V , if the axioms hold in
V then they will hold in U as well.
For example, suppose we wanted to show that addition is commutative; that is, we want to show
that for all u, v ∈ U that u + v = v + u. The proof is boring: Since U ⊆ V we know that u, v are
also elements of V . Since V is a vector space, we know addition is commutative, so u + v = v + u
in V . But if these are the same thing in V , they must also be the same thing in U . So addition is
commutative.
This was Axiom 2, and the same argument holds for Axioms 3, 5, 7, 8, and 9 (pick one and
check). Thus all that needs to be checked is Axioms 1, 4, and 6. These are precisely the three
properties that we assume hold for U , thus U is a vector space.


Great, so we only need to check three things to ensure that a subset is a vector space. Here is
a simple example:
Example 1.8

Show that the set X = {(x, x) : x ∈ R} is a subspace of R2 .

Solution. By Theorem 1.7 it suffices to show that X has the zero vector, and is closed under both
addition and scalar multiplication.

1. The zero vector in R2 is (0, 0), and this is also an element of X, so this condition is satisfied.

2. Fix two arbitrary elements of X, let’s call them (a, a) and (b, b). Their sum is (a, a) + (b, b) =
(a + b, a + b). This is also an element of X, so X is closed under addition.

3. Fix an arbitrary c ∈ R and (a, a) ∈ X. Acting c on (a, a) gives c(a, a) = (ca, ca) ∈ X, so X
is closed under scalar multiplication.

As all three properties are satisfied, we can conclude that X is a subspace of R2 . 

Example 1.9

Determine if the set U = {(x, x2 ) : x ∈ R} is a subspace of R2 .

Solution. The set U is the graph of the parabola y = x2 . Now (0, 0) ∈ U so the set contains the 0 vector; however, both addition and scalar multiplication will quickly fail. For example, choosing the vectors (1, 1) and (2, 4), adding them gives (1, 1) + (2, 4) = (3, 5). This is not an element of U (3² ≠ 5). There was nothing special about this pair of points: almost any pair of points would break addition, since (a + b)² ≠ a² + b². Similarly, scalar multiplication fails for most choices of scalars and vectors. ■

Example 1.10

Consider P4 (R), the vector space of polynomials of degree at most 4. Define the set N = {p ∈ P4 (R) : p(1) = 0}. Determine whether N is a subspace of P4 (R).

Solution. We again check our three properties of a subspace:

1. The zero vector is the zero polynomial 0(x) = 0. Certainly 0(1) = 0, so 0 ∈ N .

2. Let p, q ∈ N , so that p(1) = 0 and q(1) = 0. We want to check whether p + q ∈ N ; that is,
we need to show that (p + q)(1) = 0. But

(p + q)(1) = p(1) + q(1) = 0 + 0 = 0,

so p + q ∈ N .


3. Let p ∈ N so that p(1) = 0. If c ∈ R is any real number, then (cp)(1) = cp(1) = c × 0 = 0, so cp ∈ N .

We conclude that N is a subspace of P4 (R). 

In the above proof, we didn’t use the fact that we were working in P4 (R), and it’s easy to
conclude that this proof would have worked for any Pn (R).

Exercise: In Exercise 1.1-9 you showed that P (R) – the set of all polynomials – is a vector
space. Show that for any n ∈ N, Pn (R) is a subspace of P (R).

Example 1.11

Let M2 (R) denote the set of 2 × 2-matrices. In Exercise 1.1-2 you showed that this was a
vector space. Define the set S = {X ∈ M2 (R) : det(X) = 0}. Determine if S is a subspace
of M2 (R).

Solution. The zero matrix is definitely an element of S, since det(0) = 0. Scalar multiplication is also satisfied. Indeed, if A ∈ S then det(A) = 0. Fix an arbitrary c ∈ R, so that det(cA) = c² det(A) = c² × 0 = 0. However, vector addition fails. Note that

\[
A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}
\qquad\text{and}\qquad
B = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}
\]

both have determinant 0 and hence are in S; however,

\[
A + B = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\]

has determinant 1, and so is not an element of S. Thus S is not a subspace. ■
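
To see the failure of additivity numerically, here is a minimal sketch using NumPy (NumPy is an assumption here, not part of the notes); it checks the determinants of A, B, and A + B from the example:

    import numpy as np

    # The two matrices from Example 1.11: each has determinant 0, so each lies in S.
    A = np.array([[1.0, 0.0], [0.0, 0.0]])
    B = np.array([[0.0, 0.0], [0.0, 1.0]])

    print(np.linalg.det(A), np.linalg.det(B))   # 0.0 0.0
    print(np.linalg.det(A + B))                 # 1.0, so A + B is not in S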



In Example 1.9 we saw that {(x, x2 ) : x ∈ R} failed to be a subspace of R2 because (a + b)² ≠ a² + b²; namely, that this equation was not linear. Similarly, Example 1.11 fails because det(A + B) ≠ det(A) + det(B): The determinant function is not linear. Hopefully, you’re starting to see why this is called linear algebra. A good question is whether or not you see the linearity in Example 1.10.

Exercises
1.2-1. (a) Consider the vector v = [−1 2 π]^T ∈ R3 , and define the set L = {tv : t ∈ R}. Show
that L is a subspace of R3 .
(b) More generally, let v ∈ Rn be any vector, and define L = {tv : t ∈ R}. Show that L is a
subspace.

1.2-2. Let A be an m × n matrix, and let H be the set of solutions to the equation Ax = 0.

(a) Is H ⊆ Rn or H ⊆ Rm ?


(b) Show that H is a subspace.

1.2-3. Let X = {f : R → R}; that is, the set of all functions from R to R.

(a) Show that X is a vector space.


(b) Determine whether the set V = {f ∈ X : f (π) = 0} is a subspace.
(c) Determine whether the set W = {f ∈ X : ∃x ∈ R, f (x) = 0} is a subspace.
(d) Determine whether the set Y = {f ∈ X : ∀x ∈ R, f (x) ≥ 0} is a subspace.
(e) Determine whether the set D = {f ∈ X : f is differentiable} is a subspace.

1.2-4. Consider the set of all real sequences S = {(an )_{n=1}^∞ : an ∈ R}.

(a) Show that S is a vector space.
(b) Define c = {(an ) ∈ S : (an ) converges}. Show that c is a subspace of S.
(c) Define c0 = {(an ) ∈ S : (an ) → 0 as n → ∞}. Show that c0 is a subspace of c and hence of S.

1.2-5. Let Mn (R) denote the set of all n × n-matrices.

(a) Define S = {A ∈ Mn (R) : A + A^T = 0}. Determine whether S is a subspace of Mn (R).
(b) Define T = {A ∈ Mn (R) : Tr(A) = 0}. Determine whether T is a subspace of Mn (R).
(c) Define O = {A ∈ Mn (R) : AA^T = In }. Determine whether O is a subspace of Mn (R).

1.3 Spanning Sets



We used the standard basis {ei } to write out vectors in Rn , and the monomials xi to write out
polynomials in Pn (R). Our goal is to generalize this notion to any vector space.
Definition 1.12
If V is a vector space, and {v1 , . . . , vn } ⊆ V is some finite collection of vectors in V , a linear combination of these vectors is any sum of the form

\[
c_1 v_1 + c_2 v_2 + \cdots + c_{n-1} v_{n-1} + c_n v_n = \sum_{i=1}^{n} c_i v_i
\]

for some choice of ci ∈ R, i = 1, . . . , n.

Definition 1.13
If V is a vector space and A ⊆ V , the span of A is the set span(A) of all finite linear
combinations of vectors in A.

Let’s examine Definition 1.13 in a bit more detail. Most of the time, we’ll be interested in looking at the span of finite sets. So if A = {v1 , . . . , vn }, then we can say

\[
\operatorname{span}(A) = \operatorname{span}\{v_1, \ldots, v_n\} = \left\{ \sum_{i=1}^{n} c_i v_i : c_i \in \mathbb{R} \right\},
\]


which is explicitly the set of all linear combinations of the elements of A. There is no need to worry
about the question of finite linear combinations, since there were only finitely many elements to
add together in the first place.
On the other hand, recall that P (R) is the set of all polynomials with real coefficients. Define

A = {xk : k ∈ N},

the set of all monomials. Here A is infinite, but nonetheless its span consists of only finite linear
combinations. For example, 2 − πx + 33x74 is in the span of A. The restriction to finite linear
combinations ensures that every element of span(A) is actually a polynomial, and not some sort of
infinite series.
Example 1.14

Show that Pn (R) = span{1, x, x2 , . . . , xn }.

Solution. Hopefully this is intuitive to you, but let’s be a little bit careful about this. Since we’re
showing an equality of sets, we need to perform a double subset inclusion.
(⊆) Every polynomial in Pn (R) is of the form p(x) = a0 · 1 + a1 · x + · · · + an · xn , with the right hand side being a linear combination of the monomials. Hence Pn (R) ⊆ span{1, x, . . . , xn }.

(⊇) Clearly each element of {1, x, x2 , . . . , xn } is in Pn (R), so we need to argue that so too are their linear combinations. However, Pn (R) is closed under addition and scalar multiplication, and hence every linear combination of these elements is in Pn (R). Thus span{1, x, . . . , xn } ⊆ Pn (R). ■

You should have some experience with spanning sets already, and using Gaussian elimination
to determine when a vector lies in the span of a set. Don’t be thrown by the more abstract setting,
as the general technique remains the same.
Example 1.15

Show that 2x2 − 5x + 1 is in the span of {x2 , 1 + x, x2 − x + 1} in P2 (R).

Solution. We want to determine if there are coefficients c1 , c2 , c3 such that

c1 x2 + c2 (1 + x) + c3 (x2 − x + 1) = 2x2 − 5x + 1.

Expanding out the left hand side, we get

(c1 + c3 )x2 + (c2 − c3 )x + (c2 + c3 ) = 2x2 − 5x + 1.

Equating coefficients gives the following linear system of equations:

\[
\begin{aligned}
c_1 \phantom{{}+ c_2} + c_3 &= 2 \\
c_2 - c_3 &= -5 \\
c_2 + c_3 &= 1
\end{aligned}
\qquad\longrightarrow\qquad
\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & -1 \\ 0 & 1 & 1 \end{bmatrix}
\begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix}
= \begin{bmatrix} 2 \\ -5 \\ 1 \end{bmatrix}. \tag{1.3}
\]

You can solve this to find (c1 , c2 , c3 ) = (−1, −2, 3). Thus 2x2 − 5x + 1 is indeed in the span of {x2 , 1 + x, x2 − x + 1}. ■
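
If you want to double-check the arithmetic, the system (1.3) can be solved numerically. Here is a minimal sketch assuming NumPy is available (NumPy is not part of the notes):

    import numpy as np

    # Coefficient matrix from (1.3); columns correspond to x^2, 1 + x, and x^2 - x + 1.
    A = np.array([[1.0, 0.0,  1.0],
                  [0.0, 1.0, -1.0],
                  [0.0, 1.0,  1.0]])
    b = np.array([2.0, -5.0, 1.0])

    c = np.linalg.solve(A, b)
    print(c)   # [-1. -2.  3.], i.e. (c1, c2, c3) = (-1, -2, 3)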


Remark 1.16 Examine the columns of the matrix in (1.3) above. Do you see how these columns relate to the elements {x2 , x + 1, x2 − x + 1}?

The value of spanning sets is that they produce subspaces automatically.


Theorem 1.17

Let V be a vector space and A ⊆ V an arbitrary non-empty subset of V . The set span(A) is a subspace of V . Moreover, A ⊆ span(A), and span(A) is the smallest subspace containing A.

Proof. Using Theorem 1.7, let’s check that span(A) is a subspace of V .

1. The zero vector is the trivial linear combination. Choose any element v ∈ A, so that 0 = 0v.
Thus span(A) contains the zero vector.

2. Let u, v ∈ span(A). By definition, each can be written as a finite linear combination of elements in A, so there are w1 , . . . , wn ∈ A such that

\[
u = \sum_{i=1}^{n} c_i w_i \qquad\text{and}\qquad v = \sum_{i=1}^{n} d_i w_i.
\]

Summing these together gives

\[
u + v = \sum_{i=1}^{n} (c_i + d_i) w_i,
\]

which is also a linear combination of the wi , and hence is in the span of A.

3. Let u be as in part (2) above and fix some arbitrary real number d ∈ R. We have

\[
du = \sum_{i=1}^{n} (d c_i) w_i,
\]

which is still a linear combination of the wi , and hence in the span of A.

Now A ⊆ span(A), for if w ∈ A then w = 1w is a linear combination of elements in A. When we say that span(A) is the smallest subspace containing A, we mean that if U is another subspace such that A ⊆ U , then span(A) ⊆ U .
Let U be a subspace such that A ⊆ U . Being a subspace, U is closed under vector addition and
scalar multiplication, the two operations used to construct linear combinations. Hence any linear
combination of elements in A is also an element of U , showing that span(A) ⊆ U .

Hopefully you recall that Rn , while naturally spanned by the standard basis {ei }, can be spanned by many other sets as well. For example, the sets

\[
\left\{ \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \end{bmatrix} \right\}
\quad\text{or}\quad
\left\{ \begin{bmatrix} \pi \\ e \end{bmatrix}, \begin{bmatrix} 1 \\ 5 \end{bmatrix} \right\}
\quad\text{or}\quad
\left\{ \begin{bmatrix} 11 \\ 12 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 2 \\ 3 \end{bmatrix} \right\}
\]


all span R2 , and there are infinitely many other possibilities. There are similarly other spanning
sets for the other vector spaces we have seen.
Theorem 1.17 will help us check that two spanning sets generate the same space. Suppose S, T are subsets of some vector space V and we want to show that span(S) = span(T ). According to Theorem 1.17, if we can show S ⊆ span(T ) then span(S) ⊆ span(T ). Similarly, showing that T ⊆ span(S) implies that span(T ) ⊆ span(S). Both inclusions give equality.
Example 1.18

Show that S = {x2 , x + 1, x2 − x + 1} is a spanning set for P2 (R).


Solution. Clearly {x2 , x + 1, x2 − x + 1} ⊆ P2 (R), so by Theorem 1.17 we have span(S) ⊆ P2 (R). Conversely, we know that P2 (R) = span{1, x, x2 }. It suffices to show that each of these elements is in span(S), since by Theorem 1.17 we then know that P2 (R) = span{1, x, x2 } ⊆ span(S). You can check that

\[
\begin{aligned}
x^2 &= x^2 + 0(x + 1) + 0(x^2 - x + 1) \\
x &= \tfrac{1}{2}(x^2) + \tfrac{1}{2}(x + 1) - \tfrac{1}{2}(x^2 - x + 1) \\
1 &= -\tfrac{1}{2}(x^2) + \tfrac{1}{2}(x + 1) + \tfrac{1}{2}(x^2 - x + 1),
\end{aligned}
\]

giving the other subset inclusion. ■

Exercises

1.3-1. Let S = {ei : i = 1, . . . , n} be the set of standard basis vectors in Rn . Show that Rn =
span(S).

1.3-2. Suppose we are working in Mm×n (R), and we define Eij to be the m × n matrix which is 1
in position (i, j), and zero everywhere else. For example, in M2×2 we have

\[
E_{11} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \quad
E_{12} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \quad
E_{21} = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \quad
E_{22} = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}.
\]

This is similar to the standard basis vectors ei above, but now in matrix form. Show that
Mm×n (R) = span {Eij : 1 ≤ i ≤ m, 1 ≤ j ≤ n}.

1.3-3. Find a finite set of vectors S which spans P3 (R) and which is not the standard monomials {1, x, x2 , x3 }.

1.3-4. Let u, v, w ∈ V be three vectors. Show that

span {u, v, w} = span {u + v, w − u, u + v + w} .

1.3-5. Let {v1 , . . . , vn } ⊆ V , and suppose v1 ∈ span {v2 , . . . , vn }. Show that

span {v1 , . . . , vn } = span {v2 , . . . , vn } .


1.3-6. Suppose that S = {A1 , . . . , Ak } ⊆ Mn (R) is a spanning set for Mn (R). Show that S′ = {A1^T , . . . , Ak^T } also spans Mn (R).

1.3-7. Suppose that S = {p1 , . . . , pk } ⊆ Pn (R) is such that Pn (R) = span(S). Show that for every x ∈ R, there exists a pi ∈ S such that pi (x) ≠ 0.

1.3-8. Show that if V is any vector space, then V always admits a spanning set; that is, there is a subset S ⊆ V such that span(S) = V .

1.3-9. Suppose V is a vector space and {v1 , . . . , vn } is a set of vectors in V . Show that if w ∈
span {v1 , . . . , vn }, then there exists some i ∈ {1, . . . , n} such that

span {v1 , . . . , vn } = span {v1 , . . . , vi−1 , w, vi+1 , . . . , vn } ;

that is, there is some vector vi that can be replaced by w without affecting the span.

1.4 Linear Independence

Spanning sets can be inefficient. For example, consider the set

\[
S = \left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} -1 \\ 1 \end{bmatrix} \right\},
\]

which you can check spans R2 . If we choose a typical vector, say v = [1 2]^T, we can write v as

\[
\begin{bmatrix} 1 \\ 2 \end{bmatrix}
= 1\begin{bmatrix} 1 \\ 0 \end{bmatrix} + 2\begin{bmatrix} 0 \\ 1 \end{bmatrix} + 0\begin{bmatrix} 1 \\ 1 \end{bmatrix} + 0\begin{bmatrix} -1 \\ 1 \end{bmatrix}
= 0\begin{bmatrix} 1 \\ 0 \end{bmatrix} + 0\begin{bmatrix} 0 \\ 1 \end{bmatrix} + \tfrac{3}{2}\begin{bmatrix} 1 \\ 1 \end{bmatrix} + \tfrac{1}{2}\begin{bmatrix} -1 \\ 1 \end{bmatrix}
= 2\begin{bmatrix} 1 \\ 0 \end{bmatrix} - 7\begin{bmatrix} 0 \\ 1 \end{bmatrix} + 4\begin{bmatrix} 1 \\ 1 \end{bmatrix} + 5\begin{bmatrix} -1 \\ 1 \end{bmatrix}
\]

and many other possibilities. In fact, you know that there are infinitely many linear combinations that give v, since as a linear system the spanning set has a coefficient matrix

\[
A = \begin{bmatrix} 1 & 0 & 1 & -1 \\ 0 & 1 & 1 & 1 \end{bmatrix}
\]

which has rank 2. Thus any solution to Ax = v will have two parameters in its solution set. You can check that the solutions to Ax = v are

\[
\mathbf{x} = s\begin{bmatrix} -1 \\ -1 \\ 1 \\ 0 \end{bmatrix} + t\begin{bmatrix} 1 \\ -1 \\ 0 \\ 1 \end{bmatrix} + \begin{bmatrix} 1 \\ 2 \\ 0 \\ 0 \end{bmatrix}, \qquad s, t \in \mathbb{R},
\]

so that the first solution above is (s, t) = (0, 0), the second is (s, t) = (3/2, 1/2), and the third is
(s, t) = (4, 5).


There are a few problems here. The first is that we didn’t need four vectors to span R2 . In fact, any two of these vectors would have spanned R2 . The second is that when we wrote v as a linear combination of this spanning set, there was no unique way of doing this. The source of this issue is the fact that these vectors lie within each other’s spans. For example,

\[
\begin{bmatrix} 1 \\ 1 \end{bmatrix} = 1\begin{bmatrix} 1 \\ 0 \end{bmatrix} + 1\begin{bmatrix} 0 \\ 1 \end{bmatrix}
\quad\text{or}\quad
\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \tfrac{1}{2}\begin{bmatrix} 1 \\ 1 \end{bmatrix} - \tfrac{1}{2}\begin{bmatrix} -1 \\ 1 \end{bmatrix}
\quad\text{or}\quad
\begin{bmatrix} 1 \\ 0 \end{bmatrix} = 1\begin{bmatrix} 1 \\ 1 \end{bmatrix} - 1\begin{bmatrix} 0 \\ 1 \end{bmatrix}.
\]
We have redundancy, and we can fix our issues by removing that redundancy.
It’s annoying to ask whether one of these vectors lies in the span of the other three, and to do that for each vector. Instead, we can succinctly combine these into a single statement: If there are any non-trivial solutions to

\[
c_1\begin{bmatrix} 1 \\ 0 \end{bmatrix} + c_2\begin{bmatrix} 0 \\ 1 \end{bmatrix} + c_3\begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_4\begin{bmatrix} -1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\]

then there are redundant vectors, since we can re-arrange the equation to write one vector as a linear combination of the others. For example, we know

\[
1\begin{bmatrix} 1 \\ 0 \end{bmatrix} - 3\begin{bmatrix} 0 \\ 1 \end{bmatrix} + 1\begin{bmatrix} 1 \\ 1 \end{bmatrix} + 2\begin{bmatrix} -1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\]

so that we can re-write this as

\[
\begin{bmatrix} 1 \\ 0 \end{bmatrix} = 3\begin{bmatrix} 0 \\ 1 \end{bmatrix} - \begin{bmatrix} 1 \\ 1 \end{bmatrix} - 2\begin{bmatrix} -1 \\ 1 \end{bmatrix}
\quad\text{or}\quad
\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \tfrac{1}{3}\begin{bmatrix} 1 \\ 0 \end{bmatrix} + \tfrac{1}{3}\begin{bmatrix} 1 \\ 1 \end{bmatrix} + \tfrac{2}{3}\begin{bmatrix} -1 \\ 1 \end{bmatrix}, \quad\text{etc.}
\]
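
As a quick numerical sanity check of this redundancy, here is a sketch using NumPy (an assumption, not part of the notes) which confirms that the coefficient matrix has rank 2 and that two different coefficient vectors reproduce the same v = (1, 2):

    import numpy as np

    # Columns are the four vectors of the spanning set S.
    A = np.array([[1.0, 0.0, 1.0, -1.0],
                  [0.0, 1.0, 1.0,  1.0]])

    print(np.linalg.matrix_rank(A))         # 2: only two of the four columns are needed

    c1 = np.array([1.0,  2.0, 0.0, 0.0])    # first representation of v from above
    c2 = np.array([2.0, -7.0, 4.0, 5.0])    # another representation from above
    print(A @ c1, A @ c2)                   # both print [1. 2.]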

Definition 1.19
If V is a vector space, a non-empty finite set of vectors {v1 , . . . , vn } is said to be linearly
independent if
c1 v1 + c2 v2 + · · · + cn vn = 0
implies that c1 = c2 = · · · = cn = 0; that is, the only solution to this system of equations
is the trivial solution. If S ⊆ V is an infinite set, we say that S is linearly independent if
any finite subset of S is linearly independent in V . A set of vectors which is not linearly
independent is said to be linearly dependent.

Linear independence is stated in a roundabout way, so you should take some time to stew over
the definition. Alluding to our discussion above, the intuition is that linearly dependent vectors
have redundancy, some of the vectors lie in the span of the others. Thus linearly independent
vectors are the opposite: They cannot be written as a linear combination of one another.
Example 1.20

Determine whether the set S = {x + 1, 2x − 1, x2 + 3x} is linearly independent in P2 (R).

Solution. We begin by setting up the linear homogeneous system,

c1 (x + 1) + c2 (2x − 1) + c3 (x2 + 3x) = 0.


Remember that the 0 on the right hand side is the zero polynomial. If we collect like terms, this
becomes
c3 x2 + (c1 + 2c2 + 3c3 )x + (c1 − c2 ) = 0x2 + 0x + 0.
For this to be true, each of the coefficients must be zero, giving a linear system

\[
\begin{aligned}
c_3 &= 0 \\
c_1 + 2c_2 + 3c_3 &= 0 \\
c_1 - c_2 &= 0.
\end{aligned}
\]

Row reducing the augmented matrix we get

\[
\left[\begin{array}{ccc|c} 0 & 0 & 1 & 0 \\ 1 & 2 & 3 & 0 \\ 1 & -1 & 0 & 0 \end{array}\right]
\xrightarrow{\text{RREF}}
\left[\begin{array}{ccc|c} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{array}\right],
\]

so we conclude that the only solution is the trivial solution c1 = c2 = c3 = 0. This is precisely the condition for linear independence, and so S is linearly independent. ■
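
A quick way to verify this computation is to row reduce the coefficient matrix by machine. The following sketch assumes SymPy is installed (SymPy is not part of the notes); the matrix is the coefficient matrix from the solution, without the augmented column of zeros:

    from sympy import Matrix

    # Rows record the coefficients of x^2, x, and the constant term in
    # c1(x + 1) + c2(2x - 1) + c3(x^2 + 3x) = 0.
    A = Matrix([[0, 0, 1],
                [1, 2, 3],
                [1, -1, 0]])

    print(A.rref())   # the 3x3 identity with pivots (0, 1, 2): only the trivial solution
    print(A.rank())   # 3, so the three polynomials are linearly independent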

I mentioned previously that linear independence would fix the issue of non-uniqueness of rep-
resentation. This next theorem shows this is true very quickly.
Theorem 1.21

Suppose V is a vector space and S = {v1 , v2 , . . . , vn } is a linearly independent set in V . If w ∈ span(S), then there is a unique set of coefficients c1 , . . . , cn ∈ R such that

\[
w = \sum_{i=1}^{n} c_i v_i.
\]

Proof. Suppose we can write w in two different ways, say as

w = c1 v1 + c2 v2 + · · · + cn vn
= d1 v1 + d2 v2 + · · · + dn vn .

We’re going to show that ci = di for each i, and therefore the coefficients are unique. To do this,
let’s subtract the two lines above to get

0 = (c1 − d1 )v1 + (c2 − d2 )v2 + · · · + (cn − dn )vn .

Now since S is linearly independent, the only solution to this homogeneous equation is the trivial one, so each coefficient ci − di is zero; namely, ci = di for each i = 1, . . . , n. Thus the coefficients are unique.
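
In a concrete finite-dimensional setting, the unique coefficients of Theorem 1.21 are exactly what a linear solver returns. Here is a minimal sketch assuming NumPy, with a hypothetical linearly independent set in R3 as the columns of V:

    import numpy as np

    # Columns are a linearly independent set {v1, v2, v3} in R^3 (an illustrative choice).
    V = np.array([[1.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0],
                  [0.0, 0.0, 1.0]])
    w = np.array([2.0, 3.0, 4.0])

    c = np.linalg.solve(V, w)   # the unique coefficients with w = c1*v1 + c2*v2 + c3*v3
    print(c, V @ c)             # V @ c reproduces w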

Exercises

1.4-1. Argue, without doing any computation, that the standard basis {e1 , . . . , en } is linearly independent in Rn , and {1, x, x2 , . . . , xn } is linearly independent in Pn (R).


1.4-2. Show that the set {4, x2 + 2, 2x2 − x} is linearly independent in P2 (R).
1.4-3. Indicate whether each statement is True or False. If True, prove the statement. If False,
provide a counter-example.
(a) If S ⊆ Rn is linearly independent, then |S| ≤ n.
(b) If V is a vector space, then any singleton set S is linearly independent.
(c) If S ⊆ Pn (R) and |S| ≤ n + 1 then S is linearly independent.

(d) The set {xk : k ∈ N} is linearly independent in P (R).
(e) Let D[0, 1] be the set of differentiable functions on [0, 1]. The set {f : [0, 1] → R, x 7→ xk : k ∈ N} is linearly independent in D[0, 1].
1.4-4. Let T ⊆ V be a linearly independent set. Show that any subset S ⊆ T is also linearly
independent.
1.4-5. Suppose {v1 , . . . , vn } ⊆ V is linearly independent. Show that
span {v1 , . . . , vn−1 } ⊂ span {v1 , . . . , vn } ;
that is, show that the span of the left is a strict subset of the span on the right. Note that
this result is generally not true if the {v1 , . . . , vn } are not linearly independent.
1.4-6. Consider the set

\[
C = \left\{ \begin{bmatrix} a & b \\ -b & a \end{bmatrix} : a, b \in \mathbb{R} \right\}.
\]
(a) Show that C is a subspace of M2 (R).
(b) Find two linearly independent vectors in C. Be sure to demonstrate that they are linearly
independent.
1.4-7. Fix a natural number n and let a1 < a2 < · · · < an be distinct real numbers. Argue that the set

\[
\left\{
\begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix},
\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix},
\begin{bmatrix} a_1^2 \\ a_2^2 \\ \vdots \\ a_n^2 \end{bmatrix},
\cdots,
\begin{bmatrix} a_1^{n-1} \\ a_2^{n-1} \\ \vdots \\ a_n^{n-1} \end{bmatrix}
\right\}
\]

is linearly independent in Rn .
1.4-8. Extend Theorem 1.21 to include the case where S is an infinite linearly independent set.
1.4-9. Suppose V is a vector space, and S = {v1 , . . . , vk } is a linearly independent set in V . If w ∈ V but w ∉ span(S), show that S ∪ {w} is linearly independent.
1.4-10. Fix an n ∈ N and consider Pn (R). Let S = {p1 , . . . , pn } ⊆ Pn (R) be such that deg(p1 ) < deg(p2 ) < · · · < deg(pn ). Show that S is linearly independent.

1.4-11. Suppose A ∈ Mn (R) satisfies A^r = 0 but A^{r−1} ≠ 0 for some r ∈ N. Show that {In , A, A2 , · · · , A^{r−1} } is linearly independent in Mn (R).


1.5 Basis and Dimension

If V is a vector space, our major goal is to find a set of vectors which span V , but do so in the most
efficient way possible (are linearly independent). These are competing notions: As we add more
vectors to a set, we may increase the span, but decrease the likelihood of being linearly independent.
In fact, we can make this argument precise:
Theorem 1.22

Let V be a vector space. If S is any spanning set of V , and T is any linearly independent
subset of V , then |T | ≤ |S|.

Proof. This theorem is true even when S and T are infinite, though in that case the proof is a
bit harder (see Exercise 1.5-5). Thus suppose S = {v1 , . . . , vn } is a finite spanning set for V , and
T = {w1 , . . . , wk } is a finite linearly independent set.
Since V = span(S), we know that w1 ∈ span(S). By Exercise 1.3-9, we can replace one
of the vi with w1 . By re-arranging if necessary, let’s say that we replace v1 with w1 , so that
V = span {w1 , v2 , . . . , vn }. We can continue this inductively, replacing v2 with w2 , v3 with w3 ,
and so on.
For the sake of contradiction, suppose that k > n (that is, |T | > |S|). The induction we
performed above ensures that we can write V = span {w1 , . . . , wn }. But since k > n there are
still wi , i ∈ {n + 1, n + 2, . . . , k − 1, k} remaining that are not part of this span. Let’s look at
wn+1 , which we know must be in span {w1 , . . . , wn } = V . But since wn+1 is non-zero, there are
coefficients c1 , . . . , cn – not all of which are zero – such that

wn+1 = c1 w1 + c2 w2 + · · · + cn wn ,

which contradicts the linear independence of T .

Having established that linearly independent sets are never larger than spanning sets, the
question then becomes whether we can find a “Goldilocks” set; namely, a set which is both linearly
independent and spans the vector space.
Definition 1.23
If V is a vector space, a basis for V is any set B which is both linearly independent and
spans V .

It now makes sense why we called {e1 , . . . , en } ⊆ Rn the standard basis, since we know this set both spans Rn and is linearly independent! We’ve also seen that {1, x, x2 , . . . , xn } is a basis for Pn (R). You can check that the set {Eij : 1 ≤ i ≤ m, 1 ≤ j ≤ n} defined in Exercise 1.3-2 is a basis for Mm×n (R).
Since bases fit into that perfect compromise between being linearly independent but also span-
ning, we don’t have much room to fiddle with the number of vectors in a basis. If we add another
vector, we wouldn’t increase the span but we would destroy linear independence. If we take a vector


away, we would preserve linear independence, but we would no longer span the vector space. It
then seems reasonable to presume that all bases must be the same size.
Proposition 1.24

If V is a vector space with bases B1 and B2 , then |B1 | = |B2 |; that is, all bases are the same
size.

Proof. Since B1 is linearly independent and B2 spans V , by Theorem 1.22 we know that |B1 | ≤ |B2 |.
But the same argument holds in reverse: B2 is linearly independent and B1 spans V , so by Theorem
1.22 we know |B2 | ≤ |B1 |. Both inequalities imply that |B1 | = |B2 |.

Definition 1.25
If V is a vector space, the dimension of V – written dim(V ) – is the cardinality of any of its
bases.

The fact that every vector space has a basis is Exercise 1.5-5c. The fact that bases have a unique
size is Proposition 1.24. Thus talking about the dimension of a vector space is a well-defined notion.
Examples of common bases and the dimensions of their corresponding vector spaces are as follows:

1. Since {e1 , . . . , en } is a basis for Rn , dim(Rn ) = n.

2. Since {1, x, x2 , . . . , xn } is a basis for Pn (R), dim(Pn (R)) = n + 1.

3. Since {Eij : 1 ≤ i ≤ m, 1 ≤ j ≤ n} is a basis for Mm×n (R), dim(Mm×n (R)) = mn.

4. The vector space P (R) of all polynomials with real coefficients is of infinite (countable) dimension. A basis for P (R) is {xk : k a non-negative integer}.

Of course, the aforementioned bases aren’t the only bases for their respective vector spaces. The
only thing we can guarantee is that two different bases always have the same number of vectors.
Knowing the dimension of a vector space, how do we find a basis? There are two main ideas:
We can start from a linearly independent set, and continue adding new vectors while maintaining
linear independence until we span the entire space. Alternatively, we can take a spanning set,
and remove redundant vectors until we are left with something which is linearly independent. To
convince ourselves that this procedure will eventually result in a basis, we need the following:
Proposition 1.26

Suppose V is a finite dimensional vector space with dim(V ) = n.

1. If T ⊆ V is linearly independent and |T | = n, then T is a basis.

2. If S ⊆ V is a spanning set, and |S| = n, then S is a basis.

This proposition states that we don’t need to worry about both linear independence and span,
so long as we have the right number of vectors.


Proof. I will do the proof of (1), leaving the proof of (2) as an exercise (Exercise 1.5-1). Suppose
then that T = {v1 , . . . , vn } is linearly independent and dim(V ) = n. Fix a basis B of V , so that
|B| = n. We need to show that span(T ) = V , so suppose for the sake of contradiction that this
is not the case. Choose some w ∈ V \ span(T ), and note that T ∪ {w} is linearly independent by
Exercise 1.4-9. But since T ∪ {w} is linearly independent and B is a spanning set, Theorem 1.22
tells us that |T ∪ {w} | ≤ |B|, which in turn implies that n + 1 ≤ n. This is a contradiction, so it
must be the case that span(T ) = V

Proposition 1.27

Let V be a finite dimensional vector space.

1. If T is a linearly independent subset of V , then we can extend T to a basis of V .

2. If S is a finite spanning set of V , we can reduce S to a basis of V .

Proof. I will do the proof of (1) and leave (2) to Exercise 1.5-2. Let T be a linearly independent
set, and suppose dim(V ) = n. If span(T ) = V then we’re done, so assume that span(T ) ⊂ V .
Choose a vector w1 ∈ V \ span(T ), so that T1 = T ∪ {w1 } is still linearly independent, according
to Exercise 1.4-9. If span(T1 ) = V we’re done, and if not we inductively choose a wi ∈ V \ span(Ti−1 ) and define Ti = Ti−1 ∪ {wi }. Again, Ti is linearly independent.
This process must terminate. Indeed, |Ti | = |Ti−1 | + 1, so that the size of our linearly indepen-
dent sets are increasing. Per Proposition 1.26, once |Ti | = n, we will have a basis.
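
In Rn this extension procedure can be carried out mechanically: keep adjoining candidate vectors, say the standard basis vectors, and keep only those that enlarge the span. Here is a small sketch of that idea assuming NumPy; the helper name extend_to_basis is hypothetical, and it illustrates the procedure in Rn rather than the abstract proof:

    import numpy as np

    def extend_to_basis(T, n):
        """Extend a linearly independent list T of vectors in R^n to a basis of R^n
        by greedily adjoining standard basis vectors that increase the rank."""
        basis = [np.asarray(v, dtype=float) for v in T]
        for i in range(n):
            candidate = np.zeros(n)
            candidate[i] = 1.0            # the standard basis vector e_{i+1}
            trial = np.column_stack(basis + [candidate])
            if np.linalg.matrix_rank(trial) > len(basis):
                basis.append(candidate)   # candidate lies outside span(basis), keep it
        return basis

    print(extend_to_basis([[1.0, 1.0, 0.0]], 3))   # returns three vectors forming a basis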

Proposition 1.28

If V is a finite dimensional vector space and U ⊆ V is a subspace, the following are true:

1. dim(U ) ≤ dim(V );

2. Any basis of U can be extended to a basis of V ;

3. If dim(U ) = dim(V ), then U = V .

Proof. 1. For the sake of contradiction assume dim(U ) > dim(V ). Fix a basis B for U . By
definition we know that |B| = dim(U ) > dim(V ). Moreover, since B is linearly independent
in U , it is also linearly independent in V , and so |B| ≤ dim(V ). But this contradicts the fact
that |B| = dim(U ) > dim(V ), so dim(U ) ≤ dim(V ).

2. Fix a basis B of U . Since B is linearly independent in U , it is also linearly independent in V .


By Proposition 1.27 we can thus extend B to a basis B 0 of V .

3. If B is a basis of U , then |B| = dim(U ) = dim(V ). Thus B is a linearly independent set of size
dim(V ), and hence is a basis of V by Proposition 1.26.


Exercises

1.5-1. Finish the proof of Proposition 1.26 by proving that if S ⊆ V is a spanning set, and |S| =
dim(V ), then S is a basis for V .

1.5-2. Prove item (2) in Proposition 1.27. Hint: Use Exercise 1.3-5 to inductively reduce the size of
your spanning set, and argue that eventually this process must terminate.

1.5-3. (a) Show that a basis is a maximal linearly independent set; that is, if B is a basis and T is
a linearly independent set such that B ⊆ T , then B = T .
(b) Show that a basis is a minimal spanning set; that is, if B is a basis and S is a spanning
set such that S ⊆ B, then B = S.

1.5-4. Let C[0, 1] denote the set of continuous functions f : [0, 1] → R. Show that C[0, 1] is an
infinite dimensional vector space. Hint: Do not try to find a basis for C[0, 1].

1.5-5. (Hard) Here we’ll prove Theorem 1.22. This is not difficult, but the language is probably
new and unfamiliar. It is therefore easy to become flustered or overwhelmed. You should not
worry about this proof at all, it is only for advanced students who are interested in seeing
how we deal with the infinite case. To prove our theorem, we will need the following famous
result (which it turns out is equivalent to the Axiom of Choice):

Zorn’s Lemma: “If P is a partially ordered set such that every chain in P has an
upper bound, then P has a maximal element.”
(a) Prove that if S is a spanning set of a vector space V , and T ⊆ S is a linearly
independent set, then there exists a basis B such that T ⊆ B ⊆ S. To do this,
consider the set M of all linearly independent subsets of V containing T , with
a partial ordering induced by subset inclusion.
i. Argue that every chain in M has an upper bound, by taking the union over
all elements in the chain.
ii. Use Zorn’s Lemma to argue that M has a maximal element, say B.
iii. Show that B spans V by using Exercise 1.4-9.
(b) Prove Theorem 1.22 by letting Ŝ = S ∪ T , and apply the theorem from part
(a).
(c) Conclude from part (a) that every vector space has a basis.

1.6 Operations on Vector Spaces

1.6.1 Internal Direct Sum

Suppose V is an arbitrary vector space with subspaces U and W . Our goal is to ask ourselves
if there is some sense in which every element of V can be broken down into something in U and
something in W .
For example, let F (R) = {f : R → R} denote the set of all real-valued functions on R. This is
a vector space, and it’s not too hard to check that

E = {f ∈ F (R) : ∀x ∈ R, f (x) = f (−x)} and O = {f ∈ F (R) : ∀x ∈ R, f (−x) = −f (x)} ,


the set of even and odd functions respectively, are subspaces. Critical to our discussion will be
the fact that the only function which is both even and odd is the zero function 0(x) = 0; that is,
E ∩ O = {0}.
Now I claim that every function f ∈ F (R) can be written uniquely as f = g + h, where g ∈ E
and h ∈ O. To see this, fix such an f , and define
\[
g(x) = \frac{f(x) + f(-x)}{2} \qquad\text{and}\qquad h(x) = \frac{f(x) - f(-x)}{2}.
\]
You can quickly check that g ∈ E and h ∈ O, and that g + h = f . Moreover, if there were another pair of functions ĝ ∈ E and ĥ ∈ O such that f = ĝ + ĥ, then subtracting f = g + h would give

\[
0 = (g - \hat g) + (h - \hat h) \quad\Rightarrow\quad \underbrace{\hat g - g}_{\in E} = \underbrace{h - \hat h}_{\in O}.
\]

Since the left hand side – an even function – is equal to the right hand side – an odd function – it
must be the case that each side is identically the 0 function. Thus ĝ = g and ĥ = h, showing that
this decomposition is unique.
We conclude that there’s a sense in which F (R) = E + O, and it’s this plus sign that we have
to decipher.
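
Numerically, the even/odd splitting is easy to watch in action. The following sketch assumes NumPy (not part of the notes) and uses an arbitrary example function f; it builds g and h from the formulas above and checks that g + h recovers f:

    import numpy as np

    f = lambda x: np.exp(x) + x**3          # an arbitrary example function

    g = lambda x: (f(x) + f(-x)) / 2        # the even part
    h = lambda x: (f(x) - f(-x)) / 2        # the odd part

    x = np.linspace(-2.0, 2.0, 5)
    print(np.allclose(g(x), g(-x)))         # True: g is even
    print(np.allclose(h(x), -h(-x)))        # True: h is odd
    print(np.allclose(g(x) + h(x), f(x)))   # True: f = g + h
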
Definition 1.29
Suppose V is a vector space with U, W subspaces. We define the sum U + W =
{u + w : u ∈ U, w ∈ W }.

You can check in general that U + W and U ∩ W are subspaces of V . To mimic what we had
above with F (R) = E + O, we introduce the following definition:
Definition 1.30
If V is a vector space with subspaces U and W such that U + W = V and U ∩ W = {0}, we
say that V is the internal direct sum of U and W and write V = U ⊕ W .

Proposition 1.31

Let V be a vector space with subspaces U and W . The following are equivalent:

1. V = U ⊕ W

2. Every element v ∈ V can be written uniquely as v = u + w where u ∈ U and w ∈ W .

3. If BU = {u1 , . . . , uk } and BW = {w1 , . . . , wℓ } are bases of U and W respectively, then B = BU ∪ BW is a basis for V .

Proof. [1 ⇒ 2] The proof is almost identical to the proof that F (R) = E+O above, with appropriate
changes made.
[2 ⇒ 3] If v ∈ V , write v = u + w. Since BU and BW are bases, we can write u = Σi ci ui and w = Σj dj wj , and so v = Σi ci ui + Σj dj wj , showing that B spans V . To show linear


independence, suppose that

\[
\sum_{i=1}^{k} c_i u_i + \sum_{j=1}^{\ell} d_j w_j = 0.
\]

Now by assumption, there is exactly one way to write 0 ∈ V as the sum of two vectors from U and W . Since 0 ∈ U ∩ W , we know that 0 = 0 + 0 is one such way, and hence must be the only such way. We can immediately conclude that

\[
\sum_{i=1}^{k} c_i u_i = 0 \qquad\text{and}\qquad \sum_{j=1}^{\ell} d_j w_j = 0.
\]

Since BU and BW are bases, we conclude that ci = 0 for all i = 1, . . . , k and dj = 0 for all j = 1, . . . , ℓ.
[3 ⇒ 1] It suffices to show that U + W = V and U ∩ W = {0}. In the first case, let v ∈ V . Since B is a basis for V , we can write v = Σi ci ui + Σj dj wj . Clearly u = Σi ci ui ∈ U and w = Σj dj wj ∈ W , so v = u + w shows that V ⊆ U + W . The other inclusion is trivial.

To show that U ∩ W = {0}, suppose v ∈ U ∩ W . Since v ∈ U and v ∈ W , there exist coefficients ci , i = 1, . . . , k and dj , j = 1, . . . , ℓ such that

\[
v = \sum_{i=1}^{k} c_i u_i = \sum_{j=1}^{\ell} d_j w_j.
\]

Subtracting these equations gives

\[
\sum_{i=1}^{k} c_i u_i + \sum_{j=1}^{\ell} (-d_j) w_j = 0.
\]

Since B is a basis, we conclude that ci = 0 for all i = 1, . . . , k and dj = 0 for all j = 1, . . . , ℓ. Hence v = 0, showing that the only point in U ∩ W is the zero vector, as required.

Since the bases for U and W combine to give a basis for V , we can immediately conclude the
following:
Corollary 1.32

If V = U ⊕ W , then dim(V ) = dim(U ) + dim(W ).

Example 1.33

Let v ∈ Rn be a non-zero vector. Let U = span{v} and define U ⊥ = {w ∈ Rn : v · w = 0}, where v · w is the standard dot product. Show that Rn = U ⊕ U ⊥ .

Solution. We claim that U ∩ U ⊥ = {0}. Indeed, if z ∈ U then z = cv for some c ∈ R. Moreover, if z ∈ U ⊥ then

0 = z · v = c(v · v) = c‖v‖².


We know that ‖v‖ = 0 if and only if v = 0, and v is non-zero, so c = 0 and hence z = 0. Thus U ∩ U ⊥ = {0}.

To show that U + U ⊥ = Rn , let z ∈ Rn be an arbitrary point. Let

\[
\operatorname{proj}_{\mathbf{v}}(\mathbf{z}) = \frac{\langle \mathbf{z}, \mathbf{v} \rangle}{\langle \mathbf{v}, \mathbf{v} \rangle}\, \mathbf{v}
\]

be the usual projection operator. It is easy to see that projv (z) ∈ U . If we take w = z − projv (z) we claim that w ∈ U ⊥ . Indeed,

\[
\mathbf{v} \cdot \mathbf{w} = \mathbf{v} \cdot \left( \mathbf{z} - \frac{\langle \mathbf{z}, \mathbf{v} \rangle}{\langle \mathbf{v}, \mathbf{v} \rangle}\, \mathbf{v} \right)
= \langle \mathbf{v}, \mathbf{z} \rangle - \frac{\langle \mathbf{z}, \mathbf{v} \rangle}{\langle \mathbf{v}, \mathbf{v} \rangle} \langle \mathbf{v}, \mathbf{v} \rangle = 0.
\]

Thus U + U ⊥ = Rn and we conclude that Rn = U ⊕ U ⊥ as required. ■
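
The same computation is easy to carry out numerically. Here is a small sketch assuming NumPy, with arbitrary example choices of v and z; it splits z into a piece along v and a piece orthogonal to v:

    import numpy as np

    v = np.array([1.0, 2.0, 2.0])    # spans the one-dimensional subspace U
    z = np.array([3.0, 0.0, 4.0])    # an arbitrary vector to decompose

    u = (z @ v) / (v @ v) * v        # proj_v(z), the piece lying in U
    w = z - u                        # the leftover piece, which lies in U-perp

    print(np.isclose(w @ v, 0.0))    # True: w is orthogonal to v
    print(np.allclose(u + w, z))     # True: z = u + w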

If V = U ⊕ W and U = X ⊕ Y , then we can write V = (X ⊕ Y ) ⊕ W . It’s not hard to convince oneself that Proposition 1.31 generalizes to this case as well. In fact, if U1 , U2 , . . . , Uk ⊆ V , we can discuss the k-fold internal direct sum, written

\[
V = \bigoplus_{i=1}^{k} U_i.
\]

1.6.2 External Direct Sum

Let V and W be vector spaces (not contained in a larger, ambient vector space). As sets, we can
make sense of the Cartesian product V × W . We can endow V × W with a vector space structure
by defining addition and multiplication to be component-wise. Thus

(v1 , w1 ) + (v2 , w2 ) = (v1 + v2 , w1 + w2 ) and c(v1 , w1 ) = (cv1 , cw1 ).

Don’t prove the vector space axioms from scratch, but think about this for a while and convince
yourself why they must obviously be true. The external direct sum, written V ⊕ W , is the set
V × W with this notion of addition and multiplication.
For example, we can naturally define a vector space structure on R2 ⊕ P1 (R) by saying that

\[
\left( \begin{bmatrix} v_1 \\ v_2 \end{bmatrix},\ a + bx \right) + \left( \begin{bmatrix} w_1 \\ w_2 \end{bmatrix},\ c + dx \right)
= \left( \begin{bmatrix} v_1 + w_1 \\ v_2 + w_2 \end{bmatrix},\ (a + c) + (b + d)x \right).
\]

The more interesting case is when we take the direct sum of infinitely many vector spaces. Let Vi , i ∈ N be a countable collection of vector spaces (see Exercise 1.6-4 for the generalization to uncountable families). The Cartesian product

\[
\prod_{i=1}^{\infty} V_i = \{ (v_1, v_2, v_3, \ldots) : v_i \in V_i \}
\]

is the set of all sequences such that the ith element of the sequence comes from Vi . This is a vector space under componentwise addition and scalar multiplication. However, it’s not the direct sum!


Without going into the details of why it’s not the direct sum, let’s worry about defining the direct sum.

The direct sum ⊕_{i=1}^∞ Vi is the subspace of ∏_{i=1}^∞ Vi consisting of sequences v = (v1 , v2 , . . .) which are cofinitely zero; that is, all but finitely many of the vi are zero. You will confirm that ⊕_{i=1}^∞ Vi is a subspace in Exercise 1.6-3.

1.6.3 Quotient Spaces

Suppose that V is a vector space, and U is a subspace. We can define a relation on V by saying
that x ∼ y if x − y ∈ U .
Proposition 1.34

The relation above is an equivalence relation; that is, it is reflexive, symmetric, and transitive.

Proof. Recall that the three critical properties of a subspace are that it contains the zero vector,
is closed under addition, and is closed under scalar multiplication. We’ll see that each of these
properties corresponds to exactly one of the three equivalence relation properties.

• Reflexive: If x ∈ V then x − x = 0 ∈ U , so x ∼ x.

• Symmetric: Suppose x ∼ y so that x − y ∈ U . Now y − x = (−1)(x − y) ∈ U since U is a


subspace, and hence closed under scalar multiplication. Thus y ∼ x.

• Transitive: Suppose x ∼ y and y ∼ z, so that x − y ∈ U and y − z ∈ U . Then x − z =


(x − y) + (y − z) ∈ U since U is closed under addition. Thus x ∼ z.

Since this is an equivalence relation on V , if x ∈ V we can take the equivalence class of x, denoted [x]. The set of all equivalence classes is denoted V /U .

Proposition 1.35

If V is a vector space and U ⊆ V is a subspace, then V /U is a vector space under the operations

[x] + [y] = [x + y] and c[x] = [cx].

Proof. We need to check that these operations are well-defined; that is, they don’t depend on which representative of the equivalence class we choose. Let [x], [y] ∈ V /U and choose two representatives of each, say x, x′ ∈ [x] and y, y′ ∈ [y]. By definition, this means x − x′ ∈ U and y − y′ ∈ U . Our goal is to show that x + y ∼ x′ + y′ so that [x + y] = [x′ + y′], and similarly for scalar multiplication. Now

\[
(\mathbf{x} + \mathbf{y}) - (\mathbf{x}' + \mathbf{y}') = \underbrace{(\mathbf{x} - \mathbf{x}')}_{\in U} + \underbrace{(\mathbf{y} - \mathbf{y}')}_{\in U} \in U,
\]


and similarly,

\[
c\mathbf{x} - c\mathbf{x}' = c\underbrace{(\mathbf{x} - \mathbf{x}')}_{\in U} \in U,
\]

so both operations are well defined. From this, all the other vector space axioms follow from the
fact that V is a vector space, and so don’t need to be shown.

With this vector space structure on V /U , note that every element in U is in the equivalence class of the zero vector.
Example 1.36
Let V = R2 and let U = span{[1 1]^T}. Show that dim(V /U ) = 1.

Solution. Let’s play around with this a bit first. So immediately, we know that

\[
\left[\begin{bmatrix} 1 \\ 1 \end{bmatrix}\right] = \left[\begin{bmatrix} 0 \\ 0 \end{bmatrix}\right]
\quad\text{since}\quad
\begin{bmatrix} 1 \\ 1 \end{bmatrix} - \begin{bmatrix} 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \in U.
\]

More generally, [[t t]^T] = [[0 0]^T] for any t ∈ R. If we think of R2 as the plane, this says that every vector along the line y = x is in the equivalence class of the zero vector.

This generalizes further. The equivalence classes are precisely the set of all lines which are parallel to y = x. For example, take v = [1 −1]^T. Note that elements in [v] look like

\[
[\mathbf{v}] = \left\{ \begin{bmatrix} 1 + t \\ -1 + t \end{bmatrix} : t \in \mathbb{R} \right\}
= \left\{ \begin{bmatrix} x \\ y \end{bmatrix} : y = x - 2 \right\}.
\]

[Figure 1.1: The vector space R2 / span{[1 1]^T}. Every point in this quotient space is a line parallel to y = x.]

A useful thing to do in these situations is to find a “moduli space;” in this case, a subset of R2
in bijection with V /U which captures the vector space structure of V /U . In this case, I’m going to
use the y-axis W = {(0, y) : y ∈ R}.


Claim: There is a bijective correspondence between W and V /U . First, if [x] ∈ V /U then there is a representative w ∈ W such that [w] = [x]. Indeed, choose an arbitrary representative x = [x1 x2]^T of [x], so that [x] = {[x1 + t, x2 + t]^T : t ∈ R}. Taking t = −x1 gives

\[
\left[\begin{bmatrix} 0 \\ x_2 - x_1 \end{bmatrix}\right] = \left[\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right] = [\mathbf{x}]
\]

and [0, x2 − x1]^T ∈ W .
Next, this representative is unique. Indeed, suppose [x] ∈ V /U and w1 = [0 w1]^T, w2 = [0 w2]^T ∈ W are such that [w1 ] = [x] = [w2 ]. Then

\[
\mathbf{w}_1 - \mathbf{w}_2 = \begin{bmatrix} 0 \\ w_1 \end{bmatrix} - \begin{bmatrix} 0 \\ w_2 \end{bmatrix} = \begin{bmatrix} 0 \\ w_1 - w_2 \end{bmatrix} \in U.
\]

But every element of U is of the form [t t]^T for t ∈ R, so that w1 − w2 = 0, showing that w1 = w2 . Thus for every element of V /U , there exists a unique element of W which represents it.
Taking the elements of W as our choice of representatives for V/U, we know that addition and scalar multiplication are defined by acting on equivalence classes; that is,
\[ \begin{bmatrix} 0 \\ w_1 \end{bmatrix} + \begin{bmatrix} 0 \\ w_2 \end{bmatrix} = \begin{bmatrix} 0 \\ w_1 + w_2 \end{bmatrix} \quad\text{and}\quad c\begin{bmatrix} 0 \\ w_1 \end{bmatrix} = \begin{bmatrix} 0 \\ cw_1 \end{bmatrix}. \]
This is precisely the vector space structure on the subspace W, and it's easy to see that [0 1]^T is a basis for this space. Thus dim(V/U) = 1.
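The choice of representative above is easy to automate. Below is a minimal numerical sketch (my own illustration, not from the notes, assuming numpy; the function name representative is hypothetical) which sends any vector of R2 to the unique representative of its class lying on the y-axis W.

```python
import numpy as np

def representative(x):
    """Unique representative of [x] on the y-axis W, for U = span{(1, 1)}."""
    x1, x2 = x
    return np.array([0.0, x2 - x1])   # take t = -x1 in (x1 + t, x2 + t)

v = np.array([1.0, -1.0])
w = v + 2 * np.array([1.0, 1.0])      # w - v lies in U, so [w] = [v]
print(representative(v), representative(w))   # both print [ 0. -2.]
```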

Exercises

1.6-1. Let U and W be vector spaces, and let V = U ⊕ W be their external direct sum.
(a) Show that U × {0} is a subspace of V .
(b) Let Z ⊆ W be an arbitrary subspace of W . Show that U × Z is a subspace of V .
1.6-2. If Vᵢ, i ∈ N, is a countable family of vector spaces, show that ∏_{i∈N} Vᵢ is a vector space under componentwise addition and scalar multiplication.

1.6-3. If Vᵢ, i ∈ N, is a countable family of vector spaces, show that ⊕_{i∈N} Vᵢ is a subspace of ∏_{i∈N} Vᵢ, and hence is a vector space itself.

1.6-4. Let I be an arbitrary set, and let (Vᵢ)_{i∈I} be a family of vector spaces indexed by I. Define
\[ \prod_{i \in I} V_i = \Big\{ f : I \to \bigcup_{i \in I} V_i \;:\; \forall i \in I,\ f(i) \in V_i \Big\}. \]
(a) Show that ∏_{i∈I} Vᵢ is a vector space under pointwise addition and scalar multiplication.
(b) Define ⊕_{i∈I} Vᵢ as the set of all functions with finite support; that is, those f for which there are only finitely many points with f(i) ≠ 0.
(c) Show that when I is countable, the above definition reduces to our usual definition.


2 Linear Transformations

Functions are a powerful mathematical tool: They give us the ability to indirectly study a space
itself, and to study the relationship between two spaces. Let’s quickly recall some important facts
about functions.
If A, B are two sets, a function is a map f : A → B which assigns to each element of A a single element of B. If a ∈ A, we usually indicate its target under f as f(a). Here A is called the domain
of f , and B is the codomain. The range or image of a function is the set

image(f ) = {b ∈ B : ∃a ∈ A, b = f (a)} = {f (a) : a ∈ A} ;

namely, the image of a function is the set of points that are mapped onto from A. Also recall that
a function f : A → B is said to be injective if whenever f (x) = f (y) then x = y. It is said to be
surjective if B = image(f ). Two functions f, g : A → B are equal if f (x) = g(x) for all x ∈ A.
Now many students think of functions in terms of their representations. For example, the
function f : R → R given by f (x) = x2 is a perfectly fine function. When we write f (x) = x2 , we’re
giving an algorithm for how outputs are computed: If you give me a number x, its output is x2 .
Many (in fact, almost all) functions f : R → R cannot be prescribed using such a nice algorithm.
It’s essential to recognize that the algorithm is not the function!
For example, consider the functions f, g : {0, 1} → {0, 1}, given by f (x) = x and g(x) = x2 .
The algorithm for computing the output of these two functions is different. In the first case, we
just reproduce the input, while in the second case we first take the input and square it. However,
these two functions are equal:

f (0) = 0 = g(0) and f (1) = 1 = g(1).

This is a toy example, but we’ll see much more significant examples of this shortly.

2.1 Linear Maps

Vector spaces are your first exposure to the notion of “mathematical structure.” A vector space
V is not just a set, it is a set with additional structure. We can add vectors, and we can multiply
them by real numbers. When we look at functions between vectors spaces V and W , we want to
restrict our attention to functions which allow us to relate the structure on V to the structure on
W. For example, the map
\[ T : \mathbb{R}^2 \to P_1(\mathbb{R}), \qquad T\begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = v_1 x + v_2 \]


is interesting, because it preserves addition and scalar multiplication. Note that
\[ \begin{aligned} T\left( \underbrace{\begin{bmatrix} v_1 \\ v_2 \end{bmatrix} + \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}}_{\text{adding in } \mathbb{R}^2} \right) &= T\begin{bmatrix} v_1 + w_1 \\ v_2 + w_2 \end{bmatrix} \\ &= (v_1 + w_1)x + (v_2 + w_2) \\ &= (v_1 x + v_2) + (w_1 x + w_2) \\ &= \underbrace{T\begin{bmatrix} v_1 \\ v_2 \end{bmatrix} + T\begin{bmatrix} w_1 \\ w_2 \end{bmatrix}}_{\text{adding in } P_1(\mathbb{R})}. \end{aligned} \]

This is what we mean by “preserving addition:” When we applied T , we could add the vectors
first, then apply T ; or apply T to each vector separately, then add them. The order of applying T
and adding the vectors does not matter. You can easily check that this map also preserves scalar
multiplication; that is, T (cv) = cT (v).
Functions which do not preserve the algebraic structure of the vector space will not help us
study these sets as vector spaces, so we discard them. All of this discussion leads to the following
definition:
Definition 2.1
If V and W are vector spaces, a linear transformation T : V → W is a function such that
for all v, w ∈ V and c ∈ R we have

1. T (v + w) = T (v) + T (w)

2. T (cv) = cT (v).

The words function, map, transformation, and operator are all effectively synonyms. There are
slight differences in convention; for example, functions between vector spaces are often called trans-
formations, while linear transformations from a vector space V to itself are often called operators.
That being said, the word “linear map” often arises as well.
Example 2.2

Show that the map T : R3 → M2(R) given by
\[ T\begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} = \begin{bmatrix} v_1 - 3v_2 & 0 \\ 0 & v_3 \end{bmatrix} \]
is linear.

Solution. Fix v = [v₁ v₂ v₃]^T and w = [w₁ w₂ w₃]^T in R3, and c ∈ R. We begin by showing


that T preserves addition:
\[ \begin{aligned} T\left( \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} + \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix} \right) = T\begin{bmatrix} v_1 + w_1 \\ v_2 + w_2 \\ v_3 + w_3 \end{bmatrix} &= \begin{bmatrix} (v_1 + w_1) - 3(v_2 + w_2) & 0 \\ 0 & v_3 + w_3 \end{bmatrix} \\ &= \begin{bmatrix} v_1 - 3v_2 & 0 \\ 0 & v_3 \end{bmatrix} + \begin{bmatrix} w_1 - 3w_2 & 0 \\ 0 & w_3 \end{bmatrix} \\ &= T\begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} + T\begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix}. \end{aligned} \]

Similarly, for scalar multiplication we have
\[ T\left( c\begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} \right) = T\begin{bmatrix} cv_1 \\ cv_2 \\ cv_3 \end{bmatrix} = \begin{bmatrix} cv_1 - 3cv_2 & 0 \\ 0 & cv_3 \end{bmatrix} = c\begin{bmatrix} v_1 - 3v_2 & 0 \\ 0 & v_3 \end{bmatrix} = cT\begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}. \]

We conclude that T is linear, as required. 
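As a quick sanity check (a sketch of my own, not part of the notes, assuming numpy), one can spot-check the two linearity conditions numerically on random inputs; of course, only the computation above actually proves them.

```python
import numpy as np

def T(v):
    """T : R^3 -> M_2(R), (v1, v2, v3) |-> [[v1 - 3*v2, 0], [0, v3]]."""
    return np.array([[v[0] - 3 * v[1], 0.0],
                     [0.0, v[2]]])

rng = np.random.default_rng(0)
v, w, c = rng.standard_normal(3), rng.standard_normal(3), 2.5

assert np.allclose(T(v + w), T(v) + T(w))   # preserves addition
assert np.allclose(T(c * v), c * T(v))      # preserves scalar multiplication
```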

Example 2.3
Show that the map M : R2 → R given by M([v₁ v₂]^T) = v₁v₂ is not linear.

Solution. Both addition and scalar multiplication fail here. You can work this out in general to see why it breaks down, but of course a counterexample is sufficient. Let c = 2 and v = [1 1]^T. Then
\[ M(2\mathbf{v}) = M\begin{bmatrix} 2 \\ 2 \end{bmatrix} = 4 \neq 2 = 2M\begin{bmatrix} 1 \\ 1 \end{bmatrix}. \]


Example 2.4

If n ∈ N, determine whether the map det : Mn (R) → R is a linear transformation.

Solution. Despite being an important map in the study of linear algebra, the determinant map is
not linear. Indeed, we already know that det(A + B) 6= det(A) + det(B) from Example 1.11, so
there’s nothing left to do. 

While it wasn’t stated explicitly, we know that the zero vector 0V is an important part of
the structure of a vector space V . One would reasonably expect that if T : V → W is a linear
transformation, the zero vectors should map to one another. Between this and a few other small
results, we have the following proposition


Proposition 2.5

If T : V → W is a linear transformation, then

1. T (0V ) = 0W

2. T(−v) = −T(v) for any v ∈ V

3. \( T\left( \sum_{i=1}^{n} c_i v_i \right) = \sum_{i=1}^{n} c_i T(v_i) \) for any c₁, ..., cₙ ∈ R and v₁, ..., vₙ ∈ V.

Proof. The proof of (2) is straightforward, and (3) is an uncomplicated induction proof. Thus I’ll
do the proof for (1) and leave the others as an exercise.
We know that 0V + 0V = 0V , thus

T (0V ) = T (0V + 0V ) = T (0V ) + T (0V ).

By the cancellation property, it must be the case that T (0V ) = 0W .

An important result that we learned in a previous course is that every m × n matrix A defines a linear transformation T_A : Rn → Rm, x ↦ Ax. We learned that the converse was also true; namely, if T : Rn → Rm is a linear transformation, then we could write T(x) = Ax. Here A is determined by how T acts on the standard basis, since if x = ∑ᵢ xᵢeᵢ then
\[ T(\mathbf{x}) = T\Big( \sum_i x_i \mathbf{e}_i \Big) = \sum_i x_i T(\mathbf{e}_i) = \underbrace{\begin{bmatrix} T(\mathbf{e}_1) & T(\mathbf{e}_2) & \cdots & T(\mathbf{e}_n) \end{bmatrix}}_{A} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}. \]
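This recipe, with columns T(eᵢ), is easy to carry out mechanically. The sketch below is my own illustration (assuming numpy; the particular map T is a made-up example, not one from the notes): it assembles A column by column and checks that T(x) = Ax.

```python
import numpy as np

def T(x):
    """An example linear map R^3 -> R^2 (hypothetical, for illustration only)."""
    return np.array([x[0] + x[2], 2 * x[1] - x[2]])

A = np.column_stack([T(e) for e in np.eye(3)])   # columns are T(e_1), T(e_2), T(e_3)

x = np.array([1.0, -2.0, 4.0])
assert np.allclose(A @ x, T(x))                  # T(x) = Ax
```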

In Section 1.5 we learned that we could use a basis as the atoms of a vector space. This conveniently
reduces much of the work to linear transformations to checking what it does on a basis, and allows
us to extend the above result to more than just the standard basis in Rn :
Theorem 2.6: Linear Extension

Let V and W be finite dimensional vector spaces, and B = {v1 , . . . , vn } ⊆ V be a basis for V .
If for each i ∈ {1, . . . , n}, wi ∈ W is a vector in W , there is a unique linear transformation
T : V → W such that T (vi ) = wi .

Proof. For any vector v ∈ V we know that v can be written in the basis B using a unique set of coefficients; namely, there exist unique c₁, ..., cₙ ∈ R such that v = ∑ᵢ cᵢvᵢ. Define the map T : V → W by sending
\[ \sum_i c_i v_i \;\longmapsto\; \sum_i c_i w_i = \sum_i c_i T(v_i). \]

Since the ci are all unique, this map is well defined. It remains to check that it is linear, which
I’ve left to Exercise 2.1-5.


It thus remains to check that T is unique. Indeed, suppose that S is some other linear transformation such that S(vᵢ) = wᵢ. If v = ∑ᵢ cᵢvᵢ is some vector, we know that
\[ T(v) = \sum_i c_i T(v_i) = \sum_i c_i w_i = \sum_i c_i S(v_i) = S\Big( \sum_i c_i v_i \Big) = S(v). \]

Thus T (v) = S(v) for all v ∈ V , showing that S = T . We conclude that T is unique.

Remark 2.7

1. Note that in the theorem statement, the wi do not need to be distinct. It is therefore
possible that multiple of the vi map to the same wi .

2. If you ever see a phrase along the lines of “Define the transformation such that ... and
extend linearly,” the “extend linearly” part is referring to this theorem.

3. As an immediate corollary to Theorem 2.6 which was directly shown as part of the
proof, any two linear transformations which agree on a basis are equal.

Example 2.8
Consider the basis of R2 consisting of v₁ = [1 1]^T and v₂ = [1 −1]^T. Determine the linear transformation T : R2 → R3 which maps
\[ T\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \\ -1 \end{bmatrix} \quad\text{and}\quad T\begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}. \]

Solution. Per the proof of Theorem 2.6, we know that if v = c₁v₁ + c₂v₂ then
\[ T(\mathbf{v}) = c_1\begin{bmatrix} 2 \\ 0 \\ -1 \end{bmatrix} + c_2\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}. \]
Let v = [a b]^T be a general vector, and let's find the coefficients c₁ and c₂ which write v in the basis {v₁, v₂}. Note that we can write
\[ \begin{bmatrix} a \\ b \end{bmatrix} = \frac{a+b}{2}\begin{bmatrix} 1 \\ 1 \end{bmatrix} + \frac{a-b}{2}\begin{bmatrix} 1 \\ -1 \end{bmatrix}, \]
so that
\[ T\begin{bmatrix} a \\ b \end{bmatrix} = \frac{a+b}{2}\begin{bmatrix} 2 \\ 0 \\ -1 \end{bmatrix} + \frac{a-b}{2}\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} a+b \\ 0 \\ -b \end{bmatrix}. \]
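The same computation can be done numerically: the coefficients c₁, c₂ are the solution of a 2 × 2 linear system. The sketch below is my own check (assuming numpy), reproducing the formula just derived.

```python
import numpy as np

v1, v2 = np.array([1.0, 1.0]), np.array([1.0, -1.0])
w1, w2 = np.array([2.0, 0.0, -1.0]), np.array([0.0, 0.0, 1.0])   # prescribed images

def T(v):
    c = np.linalg.solve(np.column_stack([v1, v2]), v)   # coordinates of v in {v1, v2}
    return c[0] * w1 + c[1] * w2

a, b = 3.0, 5.0
print(T(np.array([a, b])))     # [ 8.  0. -5.], i.e. (a + b, 0, -b)
```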


Exercises

2.1-1. Determine whether each map is a linear transformation.

(a) T : Mm×n (R) → Mn×m (R), A 7→ AT ,


 T  T
(b) T : R2 → R3 , v1 v2 v3 7→ 0 v1 + v2 v2 v3 ,
(c) T : R → R, v 7→ ev ,
(d) T : M2×2 (R) → R, A 7→ A11 − 3A21 ,
(e) Fix some t₀ ∈ R, and define T_{t₀} : Pn(R) → R, p ↦ p(t₀).
(f) T : R2 → M2(R), [v₁ v₂]^T ↦ \(\begin{bmatrix} v_1 & -v_2 \\ v_2 & v_1 \end{bmatrix}\).
2.1-2. Suppose S : V → W and T : W → U are linear transformations. Show that S ◦ T : V → U
is a linear transformation.

2.1-3. Let V, W be vector spaces, and let L(V, W ) denote the set of linear transformations from V
to W . Define addition and scalar multiplication pointwise; that is, if T, S ∈ L(V, W ) and
c ∈ R, define T + S and cT by

(T + S)(v) = T (v) + S(v) and (cT )(v) = cT (v).

(a) Show that L(V, W ) is a vector space.


(b) Find a basis for L(V, W ). Hint: What do elements of L(Rn , Rm ) look like? Can you
think of a basis for this space?
(c) If V and W are finite dimensional spaces, what is the dimension of L(V, W )?

Note: The space V ∗ = L(V, R) is called the dual space of V , and End(V) = L(V, V ) is known
as the endomorphisms of V .

2.1-4. Complete the proof of Proposition 2.5 by proving properties (2) and (3).

2.1-5. Complete the proof of Theorem 2.6 by showing that the given map is linear.

2.1-6. Suppose T : V → W is a linear transformation. Show that T is injective if and only if T(v) = 0_W implies that v = 0_V.

2.1-7. Suppose that T : V → W is an injective linear transformation. Show that if {v1 , . . . , vk } is


a linearly independent set in V , then {T (v1 ), . . . , T (vk )} is linearly independent in W .

2.1-8. Suppose that T : V → W is a surjective linear transformation, and that {v1 , . . . , vk } is a


spanning set for V . Show that {T (v1 ), . . . , T (vk )} is a spanning set for W .

2.1-9. Suppose that T : V → W is a bijective linear transformation, and that {v1 , . . . , vk } is a basis
for V . Show that {T (v1 ), . . . , T (vk )} is a basis for W . Conclude that dim V = dim W .

2.1-10. Suppose V is a vector space and v is some non-zero element of V . Show that there is a linear
transformation T : V → R such that T (v) 6= 0.

2.1-11. Suppose T : V → W is a linear transformation, and there is a function S : W → V which


inverts T . Show that S is linear.


2.1-12. Let C ∞ (R) denote the set of infinitely differentiable functions f : R → R.

(a) Show that C ∞ (R) is a vector space.


(b) Show that d/dx : C^∞(R) → C^∞(R), f ↦ f′, is a linear transformation.
(c) We say that a map L : C^∞(R) → C^∞(R) is a linear differential operator if L ∈ span{dⁿ/dxⁿ : n ∈ Z, n ≥ 0}, where it is understood that d⁰/dx⁰ is the identity map. Show that every linear differential operator is a linear transformation on C^∞(R).

2.1-13. Let V be a vector space. We saw from Exercise 2.1-3 that L(Rn , R) is a vector space, called
the dual space to Rn . Fix some v ∈ Rn , and define the map Tv : Rn → R by Tv (w) = v · w,
where v · w is the usual dot product.

(a) Show that for any v ∈ Rn , Tv ∈ L(Rn , R).


(b) Define a map T : Rn → L(Rn , R) by T (v) = Tv . Show that T is a linear transformation.

2.1-14. (Challenging) If you've taken any multi-variable calculus, you've likely learned about directional derivatives. Recall that if v ∈ Rn and f : Rn → R is a C^∞ function, then
\[ D_v(f)(w) = \lim_{t \to 0} \frac{f(w + tv) - f(w)}{t} = \nabla f(w) \cdot v. \]
Thus D_v(f) : Rn → R.

(a) Argue that Dv (f ) ∈ C ∞ (Rn ).


(b) Define a map Dv : C ∞ (R) → C ∞ (R) by Dv : f 7→ Dv (f ). Show that Dv is an
endomorphism of C ∞ (R).
(c) Define a map D : Rn → L(C ∞ (Rn ), C ∞ (Rn )) by v 7→ Dv . Show that D is a linear
transformation.

Note: This exercise is not hard, it’s just that the bookkeeping – tracking what everything
does and where it should live – is tricky. However, this is an incredibly important construction
in mathematics, and is a worthwhile exercise.

2.2 The Kernel and Image

We’re already familiar with the image of a function. Since the zero vector is such an important
element of a vector space, it has a dual notion as well called the kernel, which is the set of all things
which map to zero.
Definition 2.9
Let T : V → W be a linear transformation between vector spaces. The image of T is the set
of all elements of W which are hit by some element of V ,

image(T) = {w ∈ W : ∃v ∈ V, T(v) = w} = {T(v) : v ∈ V} ⊆ W.

The kernel of T is the set of elements in V which map to 0W ,

ker(T ) = {v ∈ V : T (v) = 0} .


We know that there is a one-to-one correspondence between linear transformations T : Rn → Rm


and m × n matrices A ∈ Mm×n (R). If TA (x) = Ax, then the image of T is the column space of A,
and the kernel of T is the null space of A.
Proposition 2.10

If T : V → W is a linear transformation, then ker(T ) is a subspace of V , and image(T ) is a


subspace of W .

Proof. Let’s start with the kernel of T . Clearly 0V ∈ ker T since T (0V ) = 0W , so it remains to
show addition and scalar multiplication. Let v, w ∈ ker(T ). We want to show that v + w ∈ ker(T ),
so
T (v + w) = T (v) + T (w) = 0W + 0W = 0W

showing that v + w ∈ ker(T ). Similarly, if c ∈ R then

T (cv) = cT (v) = c0W = 0W

so that cv ∈ ker(T ).
For the image, we note that T (0) = 0, so 0 ∈ image(T ). Thus let z, y ∈ image(T ). Since these
are elements of the image, there are v, w ∈ V such that T (v) = z and T (w) = y. To show that
z + y ∈ image(T ), we need to find something which maps to z + y. The obvious candidate is v + w,
and indeed
T (v + w) = T (v) + T (w) = z + y

so z + y ∈ image(T ). Similarly, if c ∈ R we want to show that cz ∈ image(T ), and so

T (cv) = cT (v) = cz

showing that cz ∈ image(T ).

My comment prior to Proposition 2.10 yields another meta-proof. We already know that the
span of a set of vectors is always a subspace, and the image of TA is its column space; that is, the
span of its columns. Similarly, the null space is the set of all solutions to a homogeneous system,
which in turn is the span of the system’s basic solutions. Since we can write both the null space
and column space as the span of a set of vectors, it immediately follows that they are subspaces.
Example 2.11

Consider the following subset of Pn(R):
\[ L = \Big\{ a_n x^n + a_{n-1}x^{n-1} + \cdots + a_1 x + a_0 \in P_n(\mathbb{R}) \;:\; \sum_{k=0}^{n} a_k = 0 \Big\}. \]

Show that L is a subspace of Pn (R).


Solution. In Exercise 2.1-1e you showed that the map T_{t₀} : Pn(R) → R, p ↦ p(t₀) is a linear map for any choice of t₀ ∈ R. Note that L = ker T₁, since
\[ \begin{aligned} \ker T_1 &= \{ p \in P_n(\mathbb{R}) : T_1(p) = 0 \} \\ &= \{ a_n x^n + a_{n-1}x^{n-1} + \cdots + a_1 x + a_0 \in P_n(\mathbb{R}) : a_n + a_{n-1} + \cdots + a_1 + a_0 = 0 \}. \end{aligned} \]

By Proposition 2.10, ker T1 is a subspace of Pn (R), giving the desired result. 

Example 2.12

Consider the following subset of Mn (R),



\[ K = \{ A \in M_n(\mathbb{R}) : A = -A^T \}; \]

namely, the set of skew-symmetric matrices. Show that K is a subspace of Mn (R).

Solution. We can attack this from two different directions. One is to recognize K as the kernel of
a linear transformation. Define T : Mn (R) → Mn (R) by T (A) = A + AT . You can quickly check
that this map is linear, and that ker T = K.
Alternatively, define S : Mn (R) → Mn (R) by S(A) = A − AT . We claim that K = image(S).
Indeed, note that if X ∈ image(S) then X = A − A^T for some A ∈ Mn(R), and
\[ -X^T = -(A - A^T)^T = -A^T + A = X. \]
On the other hand, if X ∈ K then X = −X^T. We want to show that X ∈ image(S), so we need to find some element A ∈ Mn(R) such that S(A) = X. Set A = (1/2)X, in which case
\[ S(A) = A - A^T = \tfrac{1}{2}X - \tfrac{1}{2}X^T = \tfrac{1}{2}(X + X) = X, \]

as desired. Both inclusions show that K = image(S), thus K is a subspace of Mn (R). 

Now we know that the rank of a matrix is the same as the dimension of its column space,
or that rank(A) = dim(col A). Since we know that under the identification of T with its matrix
representation, col(A) = image(TA ), this allows us to define the rank of a linear transformation.
Definition 2.13
If T : V → W is a linear transformation, the rank of T is rank(T ) = dim(image(T )). The
nullity of T is nullity(T ) = dim(ker(T )).

The rank and nullity of a linear transformation play together very nicely, leading to the following
major theorem:


Theorem 2.14: Rank-Nullity

Let T : V → W be a linear transformation. If both ker T and image T are finite dimensional,
then V is finite dimensional and

dim V = dim(ker T ) + dim(image(T )) = rank(T ) + nullity(T ).

Proof. Since rank(T ) = k is finite, fix a basis {r1 , . . . , rk } for image(T ). By virtue of being in the
image of T , there exists elements si ∈ V such that T (si ) = ri for each i ∈ {1, . . . , k}. Similarly,
since nullity(T ) = ` is finite, let {t1 , . . . , t` } be a basis for ker T . It suffices to show that B =
{s1 , . . . , sk , t1 , . . . , t` } is a basis for V .
Let's start by showing that B spans V. Let v ∈ V, so that T(v) ∈ image(T). We can thus write T(v) = ∑ᵢ cᵢrᵢ for some choice of coefficients cᵢ ∈ R. Subtracting these vectors from one another, and using the linearity of T, we get
\[ 0_W = T(v) - \sum_i c_i r_i = T\Big( v - \sum_i c_i s_i \Big), \]
showing that v − ∑ᵢ cᵢsᵢ ∈ ker T. We can thus write this in the basis for ker T as
\[ v - \sum_i c_i s_i = \sum_j d_j t_j \quad\Longrightarrow\quad v = \sum_{i=1}^{k} c_i s_i + \sum_{j=1}^{\ell} d_j t_j \]
i j i=1 j=1

for some choice of coefficients dj ∈ R. Since v was arbitrary, any vector in V can be written as a
linear combination of the elements of B.
For linear independence, suppose ∑ᵢ cᵢsᵢ + ∑ⱼ dⱼtⱼ = 0_V. We want to show that all of the coefficients must be zero. Applying T to both sides of the equation gives
\[ 0_W = T\Big( \sum_{i=1}^{k} c_i s_i + \sum_{j=1}^{\ell} d_j t_j \Big) = \sum_{i=1}^{k} c_i T(s_i) + \sum_{j=1}^{\ell} d_j \underbrace{T(t_j)}_{=0_W} = \sum_{i=1}^{k} c_i r_i. \]

Since {rᵢ : i = 1, ..., k} is a basis, it is linearly independent, and so cᵢ = 0 for all i = 1, ..., k. Our original equation thus reduces to ∑ⱼ dⱼtⱼ = 0_V, but as {tⱼ : j = 1, ..., ℓ} is a basis for ker(T) it is also linearly independent, so dⱼ = 0 for all j = 1, ..., ℓ. Thus we conclude that the only linear combination of B which results in the zero vector is the trivial combination, showing that B is linearly independent. Both results show that B is a basis, giving the result.

We often know the dimension of V, so determining the nullity immediately tells us the rank, and vice versa.² This process can save a lot of time.
2
If you’ve taken any graph theory, you may have learned about the Euler Characteristic χ = V − E + F . There are
theorems which tell us how the Euler characteristic must behave. Surprisingly, the Rank-Nullity Theorem is another
manifestation of this fact, but you will probably have to go to graduate school to see why.


Example 2.15

Let A be an n × n matrix such that A^k = 0 for some k ∈ N. Show that every matrix X ∈ Mn(R) can be written as X = AY − Y for some Y ∈ Mn(R).

Solution. Define the map T : Mn(R) → Mn(R) by T(Y) = AY − Y, which you should check is linear. If we can show that this map is surjective (that is, that image(T) = Mn(R)) then we will have shown that every matrix X can be written as X = T(Y) = AY − Y. Since image(T) ⊆ Mn(R), it suffices to show that dim(image(T)) = n², and by the Rank-Nullity Theorem, it suffices to show that dim(ker(T)) = 0.

If Y ∈ ker T then 0 = T(Y) = AY − Y, so AY = Y. By repeatedly multiplying this equation by A, we arrive at the chain of equalities
\[ Y = AY = A^2Y = A^3Y = A^4Y = \cdots = A^kY = 0, \]
showing that Y = 0. Thus ker T = {0}, which in turn implies that dim(ker T) = 0.
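For a concrete feel of what just happened, here is a small numerical sketch (my own, not from the notes, assuming numpy, and using the Kronecker-product identity vec(AY) = (I ⊗ A) vec(Y) for the column-major vec map): with a nilpotent A, the matrix representing T(Y) = AY − Y has full rank n², so the nullity is 0 and T is surjective.

```python
import numpy as np

n = 3
A = np.triu(np.ones((n, n)), k=1)            # strictly upper triangular, so A^n = 0
M = np.kron(np.eye(n), A) - np.eye(n * n)    # matrix of T(Y) = AY - Y on vectorized Y

print(np.linalg.matrix_rank(M), n * n)       # 9 9: rank n^2, nullity 0
```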

Exercises

2.2-1. Verify the alternative solution to Example 2.12 by showing that T (A) = A + AT is linear,
and that K = ker T .
   
2.2-2. Consider the linear transformation T : M2(R) → R2 given by \(\begin{bmatrix} a & b \\ c & d \end{bmatrix} \mapsto \begin{bmatrix} a \\ c \end{bmatrix}\).

(a) Find a basis for ker T .


(b) Conclude that rank(T ) = 2.

2.2-3. Suppose T : V → W is a linear transformation, with B = {b1 , . . . , bn } a basis for V .

(a) Show that {T (b1 ), . . . , T (bn )} is a basis for image(T ).


(b) Conclude that if T is surjective, then {T (b1 ), . . . , T (bn )} is a basis for W .

2.2-4. Let B ∈ Mn(R) be such that rank(B) = k for some k ≤ n. Define
\[ U = \{ A \in M_n(\mathbb{R}) : BA = 0 \} \quad\text{and}\quad V = \{ BA : A \in M_n(\mathbb{R}) \}. \]
Show that U and V are subspaces of Mn(R), and that dim(U) = n(n − k) and dim(V) = nk.

2.2-5. Let T : V → W be a linear transformation, with {b1 , . . . , bn } a basis for V . Show that if
{b1 , . . . , bk } is a basis for ker T for some k ≤ n, then {T (bk+1 ), T (bk+2 ), . . . , T (bn−1 ), T (bn )}
is a basis for image(T ).

2.2-6. Let T₁ : Pn(R) → R be the evaluation map at t₀ = 1; that is, T₁(p) = p(1). Show that T₁ is surjective, and that {x^k − x^{k−1} : k = 1, ..., n} is a basis for ker T₁.

2.2-7. Fix some t0 ∈ R, and define T : Pn (R) → Pn−1 (R) by p(x) 7→ p(x + t0 ) − p(x).

(a) Argue that T is well-defined; that is, that T (p) ∈ Pn−1 (R).


(b) Show that T is surjective, and conclude that every polynomial q of degree n − 1 can be
written as q(x) = p(x + t0 ) − p(x) for some p ∈ Pn (R).

2.2-8. (Challenging) Let V be a vector space.

(a) Show that for every subspace W ⊆ V , there is a linear transformation T : V → W such
that W = ker T .
(b) Show that for every subspace W ⊆ V, there is a vector space U and a linear transformation T : U → V such that W = image T.

2.3 Isomorphisms

While we’ve discussed injective and surjective maps already, they will play a special role in this
section. To recall, a (not necessarily linear) function f : X → Y is injective if whenever f (x) = f (y)
then x = y. That same function is said to be surjective if for every y ∈ Y there exists an x ∈ X
such that f (x) = y. A function is bijective if it is both injective and surjective.
Focusing back on linear algebra, images and kernels play an important role in determining
whether a function is surjective and injective respectively. Per the definition, a linear transformation
T : V → W is surjective if and only if image(T ) = W . That linear transformation is injective if
and only if ker(T ) = {0} (Exercise 2.1-6). This immediately tells us that if V and W are finite
dimensional vector spaces, and T : V → W is a linear bijection between them, that

\[ \dim(V) = \underbrace{\dim(\ker T)}_{=0} + \dim(\operatorname{image} T) = \dim(W). \]

So a linear bijection between finite dimensional vector spaces can only exist if the vector spaces are
of the same dimension.
What we’re moving towards is the notion of an isomorphism. In mathematics, we’re often
interested in determining when two objects are the same, but are disguised to look different. For
example, consider the vector space R, and the subspaces
\[ \operatorname{span}\{x\} \subseteq P_2(\mathbb{R}) \quad\text{and}\quad \operatorname{span}\left\{ \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \right\} \subseteq M_2(\mathbb{R}). \]
You can quickly check that each of these spaces is 1-dimensional.
Let’s add and scalar multiply a few vectors in each subspace

1. 2 + 5 = 7 and 3 · 2 = 6

2. 2x + 5x = 7x and 3 · 2x = 6x
         
0 2 0 5 0 7 0 2 0 6
3. + = and 3 · = .
0 0 0 0 0 0 0 0 0 0

The choice of real numbers certainly doesn’t matter:


     
0 a 0 b 0 a+b
(ax) + (bx) = (a + b)x and + = .
0 0 0 0 0 0


When we add things together, it "looks like" a copy of R. And in fact, these are all "the same space," just in disguise.
So how do we make sense of this? When we defined linear transformations, we argued that the
functions should preserve the structure of the vector space; namely, addition and scalar multipli-
cation. Hence if T : V → W then T (x + y) = T (x) + T (y) and T (cx) = cT (x). There’s a sense in
which the linear transformation is copying the information from V into W . To be an isomorphism,
there should be a perfect correspondence between the vector space structure on V and the vector
space structure on W .
Definition 2.16
Two vector spaces V and W are said to be isomorphic if there exist linear maps T : V → W
and S : W → V such that S ◦ T = IV and T ◦ S = IW , where IV and IW are the identity
maps on V and W respectively. In this case, the maps S and T are said to be isomorphisms,
and we write V ≅ W.

Remark 2.17

1. The condition that S ◦ T = IV and T ◦ S = IW says that as functions, S and T are


inverses of one another. We already know that this means both S and T are bijective
maps. The condition of bijectivity has thus been omitted, but it is implicit.

2. This definition does not require that V and W be finite dimensional vector spaces.

3. This is not how most linear algebra textbooks would define an isomorphism. Instead,
they say that T : V → W is an isomorphism if it is a bijective linear map. This is a
simpler definition – and we’ll show in Theorem 2.19 that it is equivalent to Definition
2.16 – but the traditional definition is misleading. As you study more mathematics,
you’ll learn about isomorphisms in those contexts as well. An isomorphism is always
defined as an invertible map which preserves structure (in our case, a linear map),
and whose inverse also preserves the structure. It just so happens that an invertible
linear map has a linear inverse, so you can skip the statement that the inverse must be
linear. However, in other fields of mathematics, the inverse of a structure-preserving
map is not usually structure-preserving.

If two spaces V and W are isomorphic, this means that as vector spaces they are identical :
They may look different, but structurally they are exactly the same.
Example 2.18

Show that R^{n+1} ≅ Pn(R).

Solution. With both of the vector spaces above, there are "placeholders." For R^{n+1} written as a column vector, the placeholder is where a number occurs in the column, say the kth position. In the case of Pn(R), the placeholder is the monomial x^k. So we should define a map which sends the kth entry of the column to the coefficient of x^k. Thus define the pair of inverse maps T : R^{n+1} → Pn(R) and S : Pn(R) → R^{n+1} by
\[ T\begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_n \end{bmatrix} = a_0 + a_1 x + \cdots + a_n x^n \quad\text{and}\quad S(a_0 + a_1 x + \cdots + a_n x^n) = \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_n \end{bmatrix}. \]
Both maps are linear and are inverses to one another, and so are isomorphisms. We conclude that R^{n+1} ≅ Pn(R).
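A tiny computational sketch of this isomorphism (my own, not from the notes, assuming numpy's polynomial module): T turns a coefficient vector into a polynomial, S extracts the coefficients, and the two maps undo one another.

```python
import numpy as np

def T(a):
    """(a_0, ..., a_n) |-> a_0 + a_1 x + ... + a_n x^n."""
    return np.polynomial.Polynomial(a)

def S(p):
    """Polynomial |-> its coefficient vector."""
    return p.coef

a = np.array([1.0, 0.0, 2.0])        # the polynomial 1 + 2x^2
assert np.allclose(S(T(a)), a)       # S inverts T
```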

Theorem 2.19

If T : V → W is a linear transformation, then T is an isomorphism if and only if T is a linear


bijection.

Proof. (⇒) This direction is trivial. If T is an isomorphism, then it has an inverse S : W → V


which is also linear. This guarantees that T is bijective, and it’s linear by assumption.
(⇐) Suppose T is a linear bijection. Since T is a bijection, it has a set-theoretical inverse
S : W → V . Now we only know that S is a function, so we have to show that it’s linear. Let
x, y ∈ W and c ∈ R. Since T is bijective, there exist v, w ∈ V such that T (v) = x and T (w) = y;
equivalently, S(x) = v and S(y) = w. Now T (v + w) = T (v) + T (w) = x + y implies that

S(x + y) = v + w = S(x) + S(y).

Similarly, T(cv) = cT(v) = cx shows that S(cx) = cv = cS(x). Thus S is linear.

These are the two main ways of establishing whether a linear transformation is an isomorphism:
Show that it’s bijective, or show that it admits a linear inverse. There are many cases where one
is more advantageous than the other, so be sure that you explore both in the event that one is
proving too difficult.
Example 2.20

Show that M_{m×n}(R) ≅ M_{n×m}(R).

Solution. Consider the map T : Mm×n (R) → Mn×m (R) given by T (A) = AT . We know that this
map is linear, so it remains to show that it is bijective. We claim this map is injective, and indeed
if T (A) = AT = 0 then each entry of A must be zero, showing that A = 0. Thus ker T = {0}, and
we conclude that T is injective.
Since the dimension of the domain and codomain are both mn, by Rank-Nullity we then have that mn = dim(ker T) + dim(image T) = dim(image T), showing that T is surjective. Thus T is bijective, and we conclude that M_{m×n}(R) ≅ M_{n×m}(R) as required.

A quick result inspired by Example 2.20, which you will prove in Exercise 2.3-8, is that if
T : V → W and dim(V ) = dim(W ), then T is an isomorphism if T is either injective or surjective;


that is, proving one of injectivity or surjectivity gives you the other for free. In fact, we can go one
step further:
Theorem 2.21

If V and W are finite dimensional vector spaces such that dim(V) = dim(W), then V ≅ W.

Proof. Let the dimension of both vector spaces be n, and fix bases {v1 , . . . , vn } and {w1 , . . . , wn }
for V and W respectively. Define the linear transformation T : V → W by demanding that
T(vᵢ) = wᵢ, i = 1, ..., n, and extending linearly. By Exercise 2.3-8, it suffices to show that T is injective. Indeed, suppose v ∈ ker T, so that T(v) = 0_W. Write v in the basis for V as v = ∑ᵢ cᵢvᵢ, so that
\[ 0_W = T\Big( \sum_{i=1}^{n} c_i v_i \Big) = \sum_{i=1}^{n} c_i T(v_i) = \sum_{i=1}^{n} c_i w_i. \]

Since {w1 , . . . , wn } is a basis for W it is linearly independent, showing that all the ci must be zero.
This shows that v = 0V , and we conclude that ker T = {0V }. Thus T is injective, hence bijective,
and hence an isomorphism.

For each natural number n ∈ N we know there exists a vector space of dimension n; namely, Rn .
Theorem 2.21 is profound because it tells us that there is only one vector space of dimension
n, up to isomorphism.3 Hence there is a sense in which every finite dimensional vector space is just
a copy of Rn in disguise.
Now this doesn’t mean that the study of isomorphisms is concluded. There are still many
reasons to use isomorphisms. Often we use isomorphism to change our perspective of a problem,
allowing us to solve it more simply. Other times, a linear transformation might naturally arise, and
knowing that it is an isomorphism will allow us to invoke powerful tools.

Exercises

2.3-1. Fix some θ ∈ [0, 2π), and define T_θ : R2 → R2 by
\[ \begin{bmatrix} x \\ y \end{bmatrix} \mapsto \begin{bmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}. \]

Show that Tθ is an isomorphism.

2.3-2. Suppose T : V → W is a linear transformation.

(a) Show that rank(T ) = dim(V ) if and only if T is injective.


(b) Show that rank(T ) = dim(W ) if and only if T is surjective.
(c) Conclude that rank(T ) = dim(V ) = dim(W ) if and only if T is bijective.

2.3-3. Let T : V → W be a linear transformation. Show that the following are equivalent:
3
In fact, this result transcends the natural numbers. For each cardinal c, there is a unique vector space whose
dimension has cardinality c, up to isomorphism.


(a) T is injective.
(b) There exists a linearly independent set {v1 , . . . , vk } in V such that {T (v1 ), . . . , T (vk )}
is linearly independent in W .
(c) For every linearly independent set {v1 , . . . , vk } in V , {T (v1 ), . . . , T (vk )} is linearly in-
dependent in W .

2.3-4. In Section 1.6.2 we learned about external direct sums. Suppose U, W are vector spaces, and
let V = U ⊕ W be their direct sum.

(a) Define Û = U × {0}. Show that Û ≅ U. Conclude that Ŵ = {0} × W ≅ W.
(b) Show that V = U ⊕ W is isomorphic to V = Û ⊕ Ŵ, where the former is the external direct sum and the latter is the internal direct sum.

2.3-5. Let T : V → W be a linear transformation. Show that the following are equivalent:

(a) T is surjective.
(b) There exists a spanning set {v1 , . . . , vk } in V such that {T (v1 ), . . . , T (vk )} is a spanning
set in W .
(c) For every spanning set {v₁, ..., v_k} in V, {T(v₁), ..., T(v_k)} is a spanning set in W.

2.3-6. Let T : V → W be a linear transformation. Show that the following are equivalent:

(a) T is bijective.
(b) There exists a basis {v1 , . . . , vk } in V such that {T (v1 ), . . . , T (vk )} is a basis in W .
(c) For every basis {v1 , . . . , vk } in V , {T (v1 ), . . . , T (vk )} is a basis in W .

2.3-7. We know there is a bijective correspondence between m × n-matrices A and linear transfor-
mations T : Rn → Rm . Show that there is a bijective correspondence between invertible n × n
matrices and isomorphisms T : Rn → Rn .

2.3-8. Suppose V, W are finite dimensional vector spaces such that dim(V ) = dim(W ). Suppose
that T : V → W is a linear map.

(a) Show that if T is injective, then T is an isomorphism.


(b) Show that if T is surjective, then T is an isomorphism.

2.3-9. Let Vec denote the set of all finite dimensional vector spaces. Define a relation on Vec by saying that V ∼ W if V ≅ W. Show that this is an equivalence relation.

3 Change of Basis

When we first started learning about linear algebra, our emphasis was on Rn and the standard
basis E = {e1 , . . . , en }. We have since learned about more abstract vector spaces V , and about
more general bases. Even in the case of Rn there are alternate bases to the standard basis.


There’s a subtle difference between writing (8, −4, 1) ∈ R3 as a triple of numbers in the set R3 ,
 T
and writing the vector v = 8 −4 1 in the vector space R3 . In the latter case, we are thinking
of this as representing the vector in the standard basis as

v = 8e1 + (−4)e2 + e3 .

This distinction may seem pointless, because the two representations coincide. However, there’s
nothing mathematically special about the standard basis, and we could have used a different basis.
Suppose instead that we used that basis
     
1 1 1
   
b1 = 0 , b2 = −1 , and b3 =  1 .
1 0 −1

We know that in any basis B = {b1 , b2 , b3 }, every vector v can be written uniquely as a linear
combination of the basis elements. In the case of (8, −4, 1) we could write this as
     
1 1 1
v = 2 0 + 5 −1 + 1  1 .
1 0 −1
If the basis is understood to
 be implicit,
 then the coefficients are the important part, and we can
say that v corresponds to 2 5 1 .
 T  T
Now we’re in trouble, because I’ve written v = 8 −4 1 = 2 5 1 , which is clearly
 T  T
nonsense. So let’s be more careful and write this as v = 8 −4 1 E = 2 5 1 B . These are
the same vector, just written in different representations. Do matrices also change if we specify
different bases? Are there things which are independent of the choice of basis?
In this section, we are going to analyze how we can move between different bases and how those
representations change correspondingly.

3.1 Coordinate Transformations

Definition 3.1
Suppose V is a finite dimensional vector space, dim V = n, and B = {b₁, ..., bₙ} is a basis of V. The coordinate transformation of V with respect to B is the map C_B : V → Rn such that
\[ C_B(c_1 b_1 + c_2 b_2 + \cdots + c_n b_n) = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}. \]

Remark 3.2

1. The order of the basis elements matters! For example, E₁ = {e₁, e₂} is a different basis for R2 than E₂ = {e₂, e₁}: C_{E₁}([2 4]^T) = [2 4]^T, while C_{E₂}([2 4]^T) = [4 2]^T.

2. The coordinate transformation is linear and invertible, and is therefore an isomorphism between V and Rn (Exercise 3.1-1).

3. You should convince yourself that if B = {b₁, b₂, ..., bₙ} is a basis, then C_B(bᵢ) = eᵢ.

Example 3.3
 
Consider P2 (R) with the bases B = 1, x, x2 and D = x, 1 + x, 1 + x2 . If p(x) = 1 + 2x2 ,
determine CB (p) and CD (p).

Solution. We need to write p(x) = 1 + 2x2 in each of these bases. You should check that

p(x) = 1 · 1 + 0 · x + 2 · x2
= 1 · x + (−1) · (1 + x) + 2 · (1 + x2 ).

Therefore, C_B(1 + 2x²) = [1 0 2]^T and C_D(1 + 2x²) = [1 −1 2]^T.
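Finding C_D(p) is just a small linear solve: the columns of the matrix below are the coefficient vectors of the D-basis elements in the standard basis of P2(R). This is my own numerical check (assuming numpy), not part of the notes.

```python
import numpy as np

# columns: coefficients (constant, x, x^2) of the D-basis x, 1 + x, 1 + x^2
D = np.array([[0.0, 1.0, 1.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
p = np.array([1.0, 0.0, 2.0])        # 1 + 0*x + 2*x^2

print(np.linalg.solve(D, p))          # [ 1. -1.  2.] = C_D(p)
```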

Now that we have a method of writing vectors in coordinates, we need to look at what happens
to linear transformation. In general, we now need two bases. If V and W are finite dimensional
vector spaces with dim V = n and dim W = m, and T : V → W is a linear transformation, then we
need to specify a basis B on V and a basis D on W . Combined with the coordinate transformations,
we can visualize this with the following diagram:

\[ \begin{array}{ccc} V & \xrightarrow{\ T\ } & W \\ {\scriptstyle C_B}\downarrow & & \downarrow{\scriptstyle C_D} \\ \mathbb{R}^n & \xrightarrow{\ T_A\ } & \mathbb{R}^m \end{array} \tag{3.1} \]

Since CB is an isomorphism, we know it is invertible, so we can define the composite map CD ◦


T ◦ CB−1 : Rn → Rm . Being a composition of linear maps, this map is also linear, and so can be
represented by a matrix transformation TA : Rn → Rm , x 7→ Ax, where A ∈ Mm×n (R). We can
write this as either
TA = CD ◦ T ◦ CB−1 or TA ◦ CB = CD ◦ T. (3.2)

The latter equation in particular says that Diagram (3.1) commutes.4

4
A diagram is a series of arrows between objects. Usually these arrows represent functions, but this is not necessary.
A diagram commutes if regardless of what path you take, the evaluation is always the same. In Diagram (3.1), we
start at V and end at Rm . We can either take the path specified by CD ◦ T , or take TA ◦ CB . Both paths give precisely
the same output.


Definition 3.4
Let V, W be finite dimensional vector spaces of dimension n and m, and bases B and D,
respectively. If T : V → W is a linear transformation, we define the matrix of T in the
bases B and D to be the matrix A ∈ Mm×n (R) such that if TA : Rn → Rm is the linear
transformation satisfying TA (x) = Ax, then TA ◦ CB = CD ◦ T .

We can define a function MDB : L(V, W ) → L(Rn , Rm ) which assigns to each linear transforma-
tion T its matrix MDB (T ) in the bases of B and D. So how do we compute MDB (T )? The key lies
in the fact that if bᵢ ∈ B is the ith element of the basis, then C_B(bᵢ) = eᵢ. Thus if we substitute bᵢ into Equation (3.2) we get
\[ C_D(T(b_i)) = T_A(\underbrace{C_B(b_i)}_{\mathbf{e}_i}) = T_A(\mathbf{e}_i) = \text{the $i$th column of } A. \]
So the ith column of A can be computed by evaluating C_D(T(bᵢ)), meaning that
\[ M_{DB}(T) = \begin{bmatrix} C_D(T(b_1)) & C_D(T(b_2)) & \cdots & C_D(T(b_n)) \end{bmatrix}. \tag{3.3} \]

Note that when V = Rn , W = Rm , and En , Em are the standard bases on Rn and Rm , then Equation
(3.3) is how we usually compute the matrix associated to a linear transformation.
Example 3.5

Let I₂ : R2 → R2 be the identity map. If the domain and codomain are equipped with the bases
\[ B = \left\{ \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \end{bmatrix} \right\} \quad\text{and}\quad D = \left\{ \begin{bmatrix} 2 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \end{bmatrix} \right\} \]
respectively, find M_DB(I₂).

Solution. Let bᵢ denote the ith element of the basis B. According to Equation (3.3), we need to compute C_D(I₂(bᵢ)) = C_D(bᵢ) for i = 1, 2. Doing this, we get
\[ C_D(b_1) = C_D\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \frac{1}{3}\begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad C_D(b_2) = C_D\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \frac{1}{3}\begin{bmatrix} 2 \\ -1 \end{bmatrix}. \]
Thus
\[ M_{DB}(I_2) = \frac{1}{3}\begin{bmatrix} 1 & 2 \\ 1 & -1 \end{bmatrix}, \]
showing that the identity transformation's matrix need not be the identity matrix.
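Numerically, the columns of M_DB(I₂) are the solutions of D c = bᵢ, so the whole matrix is obtained from one call to a linear solver. A sketch of my own (assuming numpy), not part of the notes:

```python
import numpy as np

B = np.column_stack([[1.0, 1.0], [1.0, 0.0]])   # basis B as columns
D = np.column_stack([[2.0, 1.0], [1.0, 2.0]])   # basis D as columns

M_DB = np.linalg.solve(D, B)                     # solves D @ M_DB = B, column by column
print(M_DB)                                      # (1/3) * [[1, 2], [1, -1]]
```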


Example 3.6

Consider the map T : P2(R) → M2(R) given by
\[ T(a + bx + cx^2) = \begin{bmatrix} a & b \\ b & c \end{bmatrix}. \]
Let B = {1, 1 + x, 1 + x²} be a basis for P2(R), and let
\[ D = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}, \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \right\} \]
be the basis for M2(R). Find M_DB(T).

Solution. Let bᵢ, i = 1, 2, 3, be the basis elements of B in order. Per Equation (3.3), we need to compute C_D(T(bᵢ)) for each basis element. Doing so we have
\[ C_D(T(b_1)) = C_D\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} = \tfrac{1}{2}\begin{bmatrix} 1 & 0 & 1 & 0 \end{bmatrix}^T, \]
\[ C_D(T(b_2)) = C_D\begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} = \tfrac{1}{2}\begin{bmatrix} 1 & 2 & 1 & 0 \end{bmatrix}^T, \]
\[ C_D(T(b_3)) = C_D\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix}^T. \]
Thus we conclude that
\[ M_{DB}(T) = \frac{1}{2}\begin{bmatrix} 1 & 1 & 2 \\ 0 & 2 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}. \]

As a sanity check, we know that if TA : Rn → Rm and TB : Rm → Rk are linear transformations


induced by matrices A and B, then TB ◦ TA = TBA . We expect the same thing to be true for the
matrices in different bases, and indeed this is the case.
Proposition 3.7

Let T : V → W and S : W → U . If B, D, and F are bases of V, W, and U respectively, then

MFB (ST ) = MF D (S)MDB (T ). (3.4)

Proof. The algebraic proof of this is messy, but the proof effectively boils down to the fact that


this diagram commutes:
\[ \begin{array}{ccccc} V & \xrightarrow{\ T\ } & W & \xrightarrow{\ S\ } & U \\ {\scriptstyle C_B}\downarrow & & {\scriptstyle C_D}\downarrow & & \downarrow{\scriptstyle C_F} \\ \mathbb{R}^n & \xrightarrow{T_{M_{DB}(T)}} & \mathbb{R}^m & \xrightarrow{T_{M_{FD}(S)}} & \mathbb{R}^k \end{array} \]
Here the top row composes to ST and the bottom row composes to T_{M_{FB}(ST)}.

Algebraically, we know that

TMF B (ST ) ◦ CB = CF ◦ S ◦ T = TMF D (S) ◦ TMDB (T ) ◦ CB = TMF D (S)MDB (T ) ◦ CB

showing that MF B (ST ) = MF D (S)MDB (T ) as required.

This gives some insight into why we’ve written the bases backwards when writing MDB , since
it ensures adjacent “cancellation” in Equation (3.4) above.

Exercises

3.1-1. Let V be a finite dimensional vector space and B be a basis for V . Let CB : V → Rn be the
corresponding coordinate transformation on V .

(a) Show that CB is a linear transformation.


(b) Show that CB is a vector space isomorphism.

3.1-2. In each case of a finite dimensional vector space V and basis B below, determine the coordinate
transformation CB : V → Rdim V

(a) V = R3, B = {e₃, e₁, e₂}, where eᵢ is the standard basis.
(b) V = P3(R), B = {x − 1, x² − 1, x² − x}.
(c) V = M2(R), B = \(\left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} \right\}\).

3.1-3. Let 0 : V → W be the zero map. Show that for any bases B of V and D of W , MDB (0) = 0,
the zero matrix.

3.1-4. Let I : V → V be the identity map, and dim(V ) = n. Show that for any bases B of V ,
MBB (I) = In .

3.1-5. Let T : P2 (R) → M2 (R) be the transformation defined in Example 3.6, with B and D defined
there as well.

(a) Compute MBB (T ) and MDD (T ).


(b) Find a matrix P such that MBB (T )P = MDD (T )


 
3.1-6. Consider the map T : P3(R) → R2 given by \(T(a + bx + cx^2 + dx^3) = \begin{bmatrix} a + b \\ c - d \end{bmatrix}\).

(a) Let B₁ = {1, x, x², x³} and E₁ be the standard basis for R2. Determine M_DB(T).
(b) Let B₁ = {1, x, x², x³} and E₂ = \(\left\{ \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} -1 \\ 1 \end{bmatrix} \right\}\) be a basis for R2. Determine M_DB(T).
(c) Let B₂ = {2, 1 − x, x² + x³, x³} and E₁ be the standard basis for R2. Determine M_DB(T).
(d) Let B₂ and E₂ be as defined above. Determine M_DB(T).
3.1-7. Suppose V is a vector space with two bases, B and D. If CB and CD are the respective
coordinate transformations, find a matrix P such that for all v ∈ V , CB (v) = (TP ◦ CD )(v).
3.1-8. Let T : V → W be a linear transformation of finite dimensional vector spaces. Assume
dim(V ) = dim(W ). Show that the following are equivalent:
(a) T is an isomorphism
(b) For every basis B of V and D of W , MDB (T ) is invertible.
(c) There exists a basis B of V and D of W such that MDB (T ) is invertible.
3.1-9. Suppose V and W are finite dimensional vector spaces with dim V = n and dim W = m. Let
T : V → W be a linear map.
Show that if k = dim(ker T), then there are bases B of V and D of W such that
\[ M_{DB}(T) = \begin{bmatrix} 0 & I_{n-k} \\ 0 & 0 \end{bmatrix}. \]
Hint: Start with a basis for ker T and extend this to a basis of V.

3.2 Change of Basis Matrix

We’ll focus our conversation from the previous section to the special case of linear operators T :
V → V , where both copies of V are endowed with the same basis. In this case, we will write
MB (T ) = MBB (T ) for convenience of notation. Now if we change the basis on V to D, the question
we’d like to consider is how to relate MB (T ) to MD (T ); that is, how do we change the basis?
Let’s do a toy example to set up our problem:
Example 3.8

Let T : R2 → R2 be given by T(x, y) = (x + y, x − y). If
\[ B = \left\{ \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \end{bmatrix} \right\} \quad\text{and}\quad D = \left\{ \begin{bmatrix} 4 \\ 5 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \end{bmatrix} \right\}, \]
find M_B(T) and M_D(T).

Solution. For M_B(T) we need to determine C_B(T(b₁)) and C_B(T(b₂)), where bᵢ are the basis elements of B:
\[ C_B(T(b_1)) = C_B\begin{bmatrix} 2 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad C_B(T(b_2)) = C_B\begin{bmatrix} 0 \\ 2 \end{bmatrix} = \begin{bmatrix} 1 \\ -1 \end{bmatrix}, \]
so
\[ M_B(T) = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}. \]
Let dᵢ denote the basis elements for D, so that
\[ C_D(T(d_1)) = C_D\begin{bmatrix} 9 \\ -1 \end{bmatrix} = \frac{1}{5}\begin{bmatrix} -1 \\ 49 \end{bmatrix}, \qquad C_D(T(d_2)) = C_D\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \frac{1}{5}\begin{bmatrix} 1 \\ 1 \end{bmatrix}, \]
so
\[ M_D(T) = \frac{1}{5}\begin{bmatrix} -1 & 1 \\ 49 & 1 \end{bmatrix}. \]

The relationship between MB (T ) and MD (T ) is probably not clear, so we’ll have to think about
this a bit more deeply. You hopefully found in Exercise 3.1-7 that CD = TMDB (IV ) ◦ CB , or if v ∈ V
then CD (v) = MDB (IV )CB (v), where IV : V → V is the identity map.
Definition 3.9
If V is a vector space with bases B and D, we define the change of basis matrix PDB =
MDB (IV ).

Theorem 3.10

If T : V → V is a linear operator on a finite dimensional vector space V , and V is equipped


with bases B and D, then TPDB ◦ TMB (T ) = TMD (T ) ◦ TPDB

Proof. This proof is a bit of algebraic magic. We know three things:

1. CD = TPDB ◦ CB ,

2. CB ◦ T = TMB (T ) ◦ CB , and

3. CD ◦ T = TMD (T ) ◦ CD .

We apply T_{P_{DB}} to (2), to get

TPDB ◦ TMB (T ) ◦ CB = TPDB ◦ CB ◦ T


= CD ◦ T by (1)
= TMD (T ) ◦ CD by (3)
= TMD (T ) ◦ TPDB ◦ CB by (1).

Since CB is an isomorphism, it is invertible. Precomposing with CB−1 gives the desired result.

Let’s translate the result of Theorem 3.10 at the level of matrices, which says that PDB MB (T ) =
−1
MD (T )PDB . Since PDB is invertible, we can write this as MB (T ) = PDB MD (T )PDB , so that MB (T )
and MD (T ) are similar.


Example 3.11

Consider the same linear transformation and bases as Example 3.8. Compute P_DB and confirm that M_B(T) = P_DB^{-1} M_D(T) P_DB.

 
Solution. By definition, we know P_DB = M_DB(I₂) = [C_D(b₁) C_D(b₂)]. Computing these terms we get
\[ C_D(b_1) = C_D\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \frac{1}{5}\begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad C_D(b_2) = C_D\begin{bmatrix} 1 \\ -1 \end{bmatrix} = \frac{1}{5}\begin{bmatrix} -1 \\ 9 \end{bmatrix}, \]
so
\[ P_{DB} = \frac{1}{5}\begin{bmatrix} 1 & -1 \\ 1 & 9 \end{bmatrix} \quad\text{with}\quad P_{DB}^{-1} = \frac{1}{2}\begin{bmatrix} 9 & 1 \\ -1 & 1 \end{bmatrix}. \]
You can quickly check that P_DB^{-1} = P_BD and that M_B(T) = P_DB^{-1} M_D(T) P_DB.
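The whole example can be verified in a few lines (a sketch of my own, not from the notes, assuming numpy): with basis matrices B and D as columns, the matrix of T in each basis is obtained by conjugation, and the two representations are related by P_DB exactly as Theorem 3.10 predicts.

```python
import numpy as np

A = np.array([[1.0, 1.0], [1.0, -1.0]])          # T in the standard basis
B = np.column_stack([[1.0, 1.0], [1.0, -1.0]])   # basis B as columns
D = np.column_stack([[4.0, 5.0], [1.0, 0.0]])    # basis D as columns

M_B = np.linalg.solve(B, A @ B)                  # B^{-1} A B = M_B(T)
M_D = np.linalg.solve(D, A @ D)                  # D^{-1} A D = M_D(T)
P_DB = np.linalg.solve(D, B)                     # change of basis matrix

assert np.allclose(P_DB @ M_B, M_D @ P_DB)       # Theorem 3.10
```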

Theorem 3.10 says that given two bases B and D, the matrices MB (T ) and MD (T ) are similar.
Is the converse true? Namely, given two similar n × n matrices A = P −1 BP , are there bases B
and D such that A = MB (TA ) and B = MD (TB )? One of these is easy: By setting B = E to be
the standard basis of Rn , we get ME (TA ) = A, so the only question is whether the desired basis D
exists.
Theorem 3.12

If A, B ∈ Mn (R) are similar, and TA : Rn → Rn is the usual linear transformation induced


by A, then there is a basis D of Rn such that MD (TA ) = B.

Proof. Since A and B are similar, there is an invertible P ∈ Mn(R) such that B = P⁻¹AP. Set
D = {p1 , p2 , . . . , pn }, where pi is the ith column of P . We claim that this basis does the trick.
Indeed, note that
\[ P_{ED} = M_{ED}(I_n) = \begin{bmatrix} C_E(p_1) & C_E(p_2) & \cdots & C_E(p_n) \end{bmatrix} = \begin{bmatrix} p_1 & p_2 & \cdots & p_n \end{bmatrix} = P. \]
So by Theorem 3.10 we have that M_D(T_A) = P_ED^{-1} M_E(T_A) P_ED = P⁻¹AP = B, as required.

The proof technique above, performed in reverse, also gives us the following convenient way of
computing MD (TA ):

Corollary 3.13

If A ∈ Mn (R) and D is a basis for Rn , then MD (TA ) = P −1 AP where P is the matrix whose
columns are the elements of D.

Hence if A ∈ Mn(R) is diagonalizable, say P⁻¹AP = D for some diagonal matrix D and invertible P, then taking the columns of P as a basis D for Rn will make M_D(T_A) = D a diagonal matrix.
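A quick numerical illustration of this (my own sketch, assuming numpy; the matrix A is an arbitrary example, not one from the notes): taking the eigenvectors of A as the new basis produces a diagonal matrix for T_A.

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])
eigvals, P = np.linalg.eig(A)            # columns of P are eigenvectors of A

M_D = np.linalg.solve(P, A @ P)          # P^{-1} A P
print(np.round(M_D, 10))                 # diagonal, entries 3 and 1 (in some order)
```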


Recall that the determinant and trace of a matrix are invariant under similar matrices. For
example, if A = P −1 BP then

det(A) = det(P −1 BP ) = det(P )−1 det(B) det(P ) = det(B),

using the multiplicative nature of the determinant; and

Tr(A) = Tr(P −1 BP ) = Tr(P P −1 B) = Tr(B),

by the cyclic invariance of the trace. Since different representations of linear operators in different
bases are all naturally similar, we can extend these notions to general linear transformations in a
well-defined manner.
Definition 3.14
Suppose V is a finite dimensional vector space, T : V → V is a linear operator, and B is a
fixed basis for V .

1. The determinant of T is det(T ) = det(MB (T )),

2. The trace of T is Tr(T ) = Tr(MB (T )),

3. The characteristic polynomial of T is pλ (T ) = det(T − λIV ).

Furthermore, by Exercise 3.1-8 we know that T is a linear isomorphism if and only if MB (T ) is


invertible. Thus a linear operator T : V → V is an isomorphism if and only if det(T ) 6= 0.
Example 3.15

Consider the linear operator T : P2 (R) → P2 (R) given by

T (ax2 + bx + c) = ax2 + (a + 2b)x + (a + b + 3c).

Find the characteristic polynomial of T .

Solution. We can easily compute that

(T − λIP2 (R) )(ax2 + bx + c) = a[1 − λ]x2 + [a + b(2 − λ)]x + [a + b + c(3 − λ)].



Let B = {x², x, 1} be the standard basis of P2(R), so that
\[ M_B(T - \lambda I) = \begin{bmatrix} 1 - \lambda & 0 & 0 \\ 1 & 2 - \lambda & 0 \\ 1 & 1 & 3 - \lambda \end{bmatrix}. \]
Since this is lower-triangular, its determinant is the product of its diagonal, so that
Since this is lower-triangular, its determinant is the product of its diagonal, so that

pλ (T ) = det(T − λI) = det(MB (T − λI)) = (1 − λ)(2 − λ)(3 − λ). 

There are also basis independent ways of defining these objects, but that’s a bit trickier. Fur-
thermore, with access to the characteristic polynomial of a linear transformation, we can define
eigenvalues, and with eigenvalues we can define eigenvectors.


Definition 3.16
Let T : V → V be a linear operator. A vector v ∈ V is said to be an eigenvector of T if
there exists some λ ∈ R such that T (v) = λv. In this case, λ is said to be the eigenvalue
associated to v.

As with the case of matrix operators, the eigenvalues of T : V → V are the roots of the
characteristic polynomial. Indeed, since T (v) = λv, we know that (T − λI)v = 0. For this to have
non-trivial solutions, it must be the case that T is not a linear isomorphism, which corresponds to
having zero determinant.
Example 3.17

Consider the linear operator T : P2 (R) → P2 (R) given by

T (ax2 + bx + c) = ax2 + (a + 2b)x + (a + b + 3c).

Determine the eigenvalues and eigenvectors of T .

Solution. We found the characteristic polynomial of T in Example 3.15, namely p_λ(T) = (1 − λ)(2 − λ)(3 − λ). Its roots are evidently λ = 1, 2, 3, and these are the eigenvalues of T.

For λ = 1, we must have T(ax² + bx + c) = ax² + bx + c, which gives a homogeneous system of equations with coefficient matrix
\[ \begin{bmatrix} 0 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 2 \end{bmatrix} \quad\text{with fundamental solution}\quad \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}. \]
Evidently, v₁ = x² − x is an eigenvector corresponding to the eigenvalue λ = 1. Similarly, for λ = 2 we have v₂ = x − 1, and for λ = 3 we have v₃ = 1.
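The same eigendata can be recovered numerically from the matrix representation found in Example 3.15 (a sketch of my own, not from the notes, assuming numpy): its eigenvalues are 1, 2, 3, and the eigenvector for λ = 1 has B-coordinates proportional to (1, −1, 0), i.e. the polynomial x² − x.

```python
import numpy as np

M = np.array([[1.0, 0.0, 0.0],
              [1.0, 2.0, 0.0],
              [1.0, 1.0, 3.0]])          # M_B(T) in the basis B = {x^2, x, 1}

eigvals, eigvecs = np.linalg.eig(M)
print(np.sort(eigvals))                  # [1. 2. 3.]

v = eigvecs[:, np.argmin(eigvals)]       # eigenvector for the eigenvalue 1
print(v / v[0])                          # [ 1. -1.  0.]  ->  x^2 - x
```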

Definition 3.18
If T : V → V is a linear operator, and λ is an eigenvalue of T , then the eigenspace of T
corresponding to λ is
Eλ = {v : T (v) = λv} .

It isn’t too difficult to show that Eλ is a subspace of V .

Exercises

3.2-1. For each vector space and pair of bases, compute the change of basis matrix:

(a) R2 with bases
\[ B = \left\{ \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 3 \\ 4 \end{bmatrix} \right\} \quad\text{and}\quad D = \left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 2 \end{bmatrix} \right\} \]
(b) P2(R) with bases B = {x + 1, x − 2, x²} and D = {x², x² + x, x² + 1}.
(c) M2(R) with bases B = {E_{ij} : 1 ≤ i, j ≤ 2} and
\[ D = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \right\}. \]

(d) V a four dimensional vector space with bases B = {b1 , b2 , b3 , b4 } and D = {b2 , b4 , b3 , b1 }.

3.2-2. Suppose V a vector space.

(a) If B is a basis, then P_BB = I.
(b) Show that if B, D are two bases for V, then P_DB is invertible, and P_DB^{-1} = P_BD.
(c) Show that if B, D, F are bases for V , then PF D PDB = PF B

3.2-3. Show that if T : V → V is a linear operator and λ is an eigenvalue for T , then Eλ is a


subspace of V .

3.2-4. Suppose that T : V → V is a linear operator and λ is an eigenvalue of T. If B is any basis of V, and E_λ^B is the eigenspace of T_{M_B(T)} in R^{dim V}, then the coordinate transformation C_B : V → Rn restricted to E_λ is an isomorphism E_λ ≅ E_λ^B.
CB : V → Rn restricted to Eλ is an isomorphism Eλ ∼ = EλB .

3.2-5. For each of the following linear operators, compute the rank, determinant, and characteristic
polynomial:

(a) T : R2 → R2 , T (a, b) = (a + 2b, 2a − b)


(b) T : P2 (R) → P2 (R), T (ax2 + bx + c) = 2ax + b
(c) T : M2 (R) → M2 (R), T (A) = AT

3.3 Invariant Subspaces

Definition 3.19
If V is a vector space with subspace U , and T : V → V is a linear operator, we say that U
is T -invariant if T (U ) ⊆ U .

If U is a T -invariant subspace, we can be guaranteed that the restriction of T to U is still a


linear operator; namely, T |U : U → U makes sense and is linear.
Example 3.20

Consider the linear operator T : P2 (R) → P2 (R) given by

T (ax2 + bx + c) = ax2 + (a + 2b)x + (a + b + 3c).



Let U = {ax² + (b − a)x − b : a, b ∈ R}. Show that U is T-invariant.


Solution. Let v = ax2 + (b − a)x − b be a generic element of U . Evaluating T on v we get

T (v) = T (ax2 + (b − a)x − b) = ax2 + [a + 2(b − a)]x + [a + (b − a) − 3b]


= ax2 + (2b − a)x + (−2b)

which is of the correct form to be an element of U . Thus U is T -invariant. 

Checking generic points can be tricky. There is a much simpler way of determining whether a space is T-invariant if you can determine a spanning set for U.
Proposition 3.21

Suppose T : V → V is a linear operator, and U = span {u1 , . . . , uk }. If T (ui ) ∈ U for each


i = 1, . . . , k, then U is a T -invariant subspace of V .

Proof. Suppose that T(uᵢ) ∈ U for each uᵢ. If u ∈ U, write u in terms of the spanning elements as u = ∑ᵢ cᵢuᵢ. Applying T gives
\[ T(u) = T\Big( \sum_i c_i u_i \Big) = \sum_i c_i \underbrace{T(u_i)}_{\in U}. \]

Since U is a subspace it is closed under linear combinations, and so we conclude that T (u) ∈ U as
required.


Example 3.20 is now a bit easier to solve, if we recognize that U = span{x² − x, x − 1}. To show that U is T-invariant, we need only check that T maps each of these spanning elements into U, and indeed

T (x2 − x) = x2 − x ∈ U
T (x − 1) = 2x − 2 = 2(x − 1) ∈ U

so U is T -invariant. In fact, you may have recognized that the spanning elements here are precisely
the eigenvectors we found in Example 3.17. In general, eigenspaces Eλ of the operator T are
T -invariant.
Proposition 3.22

If T : V → V is a linear operator, and Eλ is an eigenspace of T , then Eλ is T -invariant.

Proof. If v ∈ Eλ then by definition, T (v) = λv ∈ Eλ , since Eλ is a subspace of V .

The fact that eigenspaces represent natural T -invariant spaces will be helpful, especially in light
of the following:


Theorem 3.23

Suppose that T : V → V is a linear operator on a finite dimensional vector space. If we can


write V = U ⊕ W where U and W are both T -invariant subspaces, then there exists a basis
B = B_U ∪ B_W, where B_U and B_W are bases for U and W respectively, and
\[ M_B(T) = \begin{bmatrix} M_{B_U}(T) & 0 \\ 0 & M_{B_W}(T) \end{bmatrix}. \]

Proof. Fix a basis BU = {u1 , . . . , uk } of U and a basis BW = {w1 , . . . , wℓ } of W . We know that B = BU ∪ BW is a basis for V . Moreover, since U is T -invariant we know that T (ui ) ∈ U for each ui ∈ BU , so

    T (ui ) = Σ_{r=1}^{k} cr ur + Σ_{s=1}^{ℓ} 0·ws .

The same reasoning shows that

    T (wi ) = Σ_{r=1}^{k} 0·ur + Σ_{s=1}^{ℓ} ds ws .

Thus

    CB (T (ui )) = (c1 , . . . , ck , 0, . . . , 0)^T   and   CB (T (wi )) = (0, . . . , 0, d1 , . . . , dℓ )^T .
Both results together show that MB (T ) is block diagonal, as required.

One can inductively extend Theorem 3.23, so that if V = U1 ⊕ U2 ⊕ U3 then there is a basis B
such that MB (T ) is further refined into a three-block diagonal, and so on.
Example 3.24

Let E = {e1 , e2 , e3 } be the standard basis for R3 , and let T : R3 → R3 be the linear operator
which rotates about e3 by an angle π/4 in the counter-clockwise direction when the origin
is viewed from e3 . Write T in block diagonal form.

Solution. Intuitively, our invariant subspaces are W = span {e1 , e2 } and U = span {e3 }. Indeed, one can check that T (e3 ) = e3 , while

    T (e1 ) = (1/√2)(1, 1, 0)^T   and   T (e2 ) = (1/√2)(−1, 1, 0)^T .


Clearly R3 = U ⊕ W , and moreover we have

    ME (T ) = (1/√2) [ 1  −1  0 ; 1  1  0 ; 0  0  √2 ].
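As a quick sanity check of this block structure, here is a short Python sketch (the matrix below is just the rotation of Example 3.24 written out numerically; nothing beyond the example is assumed) confirming that span{e3} and span{e1, e2} are each preserved by T.

    import numpy as np

    # Rotation by pi/4 about e3, written in the standard basis (Example 3.24).
    c, s = np.cos(np.pi / 4), np.sin(np.pi / 4)
    M = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])

    e1, e2, e3 = np.eye(3)
    print(np.allclose(M @ e3, e3))           # True: span{e3} is T-invariant
    print(np.allclose((M @ e1)[2], 0.0),     # True: T(e1) stays in span{e1, e2}
          np.allclose((M @ e2)[2], 0.0))     # True: T(e2) stays in span{e1, e2}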

Exercises

3.3-1. Let T : V → V be a linear operator. Show that {0}, V , ker T , and image(T ) are all T -invariant
spaces.

3.3-2. Suppose T : V → V is a linear operator such that T ◦ T = IV .

(a) Let U± = {v ∈ V : T (v) = ±v}. Show that U+ and U− are T -invariant subspaces, and that V = U+ ⊕ U− .

(b) Show that there is a basis B in which MB (T ) = [ Ik  0 ; 0  −I_{n−k} ] for some k ∈ N.

3.3-3. Suppose T : V → V is a linear operator such that T ◦T = T (such maps are called projections).

(a) Let U = {v : T (v) = v}. Show that U is a T -invariant subspace of V , and that V =
U ⊕ ker T .
 
(b) Argue that there is a basis B of V in which MB (T ) = [ Ik  0 ; 0  0 ] for some k ∈ N.

3.3-4. Let T : V → V be a linear operator, with U ⊆ V a general subspace. Define the set
    U^T = { Σ_{i=0}^{k} T^i (ui ) : k ∈ N, ui ∈ U for i = 0, . . . , k },

where it is understood that T^0 = I.

(a) Show that U^T is a subspace of V .
(b) Show that U^T is T -invariant.
(c) Show that if W ⊆ V is a T -invariant subspace such that U ⊆ W , then U ⊆ U^T ⊆ W .

4 Inner Products and Friends

In this next portion, we’re going to add an additional structure to our vector spaces. We’ll actually
see a list of three structures, but they’re all closely related.

4.1 Measurement Devices

What our vector spaces have been missing up until now is some method of measurement. For
example, you’re probably familiar with the idea that a vector should have a length (or magnitude,


if you’re a physicist). If vectors have length, we should be able to measure the distance between
two vectors. If we’re lucky, maybe we can even find the angle between two vectors.
All of this becomes much more abstract as the vector spaces themselves change. For example,
it’s a bit weird to ask ourselves the distance between the vectors f (x) = sin(x) and g(x) = ex in
C(R), or to ask the angle between them. But nonetheless, we can do this.

4.1.1 Inner Products

The inner product is the most powerful measurement device we'll examine. It gives us the ability to
measure angles, lengths, and distances.
Definition 4.1
Given a real vector space V , an inner product on V is a map h·, ·i : V × V → R satisfying

1. [Symmetric] hx, yi = hy, xi for every x, y ∈ V ,

2. [Linear] hax + by, zi = a hx, zi + b hy, zi for all x, y, z ∈ V and a, b ∈ R

3. [Positive Definite] hx, xi ≥ 0 and hx, xi = 0 if and only if x = 0.

Combining the symmetry and linearity properties of an inner product tells us that an inner product is actually bilinear, or linear in each of its components:

hz, ax + byi = hax + by, zi = a hx, zi + b hy, zi = a hz, xi + b hz, yi .

For this reason, you might see an inner product defined as a positive definite, symmetric, bilinear
mapping.
While there are many different kinds of inner products, the one with which we will be most
concerned is the Euclidean inner product, also known as simply the dot product. Given two vectors
x = (x1 , . . . , xn ) and y = (y1 , . . . , yn ) in Rn , we write
    ⟨x, y⟩ = x · y = Σ_{i=1}^{n} xi yi = x1 y1 + x2 y2 + · · · + xn yn .

Example 4.2

Show that the Euclidean inner product satisfies the criteria of Definition 4.1.

Solution. Symmetry is evident, so let's look at linearity. Suppose x, y, z ∈ R^n and a, b ∈ R, so that

    (ax + by) · z = Σ_{i=1}^{n} (a xi + b yi ) zi = Σ_{i=1}^{n} [a xi zi + b yi zi ] = a Σ_{i=1}^{n} xi zi + b Σ_{i=1}^{n} yi zi
                  = a(x · z) + b(y · z).


Finally, we show positive definiteness. Indeed,

    x · x = Σ_{i=1}^{n} xi^2 ≥ 0.

If x = 0 then clearly x · x = 0. Conversely, if x · x = 0, then since x · x is a sum of non-negative terms, each term in its evaluation must be zero; that is, xi^2 = 0 for all i = 1, . . . , n. This in turn implies that xi = 0, so x = 0.
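If you want to see these three axioms numerically, the following Python sketch checks them for a few random vectors in R^5 (the vectors, the dimension, and the scalars a, b are arbitrary choices, not part of the example).

    import numpy as np

    rng = np.random.default_rng(0)
    x, y, z = rng.normal(size=(3, 5))   # three random vectors in R^5
    a, b = 2.0, -3.0

    print(np.isclose(x @ y, y @ x))                                     # symmetric
    print(np.isclose((a * x + b * y) @ z, a * (x @ z) + b * (y @ z)))   # linear
    print(x @ x >= 0, np.isclose(np.zeros(5) @ np.zeros(5), 0.0))       # positive definite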

If V = C[−π, π] is the collection of continuous functions on [−π, π], we can define an inner
product via

    ⟨f, g⟩ = (1/2π) ∫_{−π}^{π} f (x) g(x) dx.    (4.1)

In fact, this is one of the most important inner products in mathematics and physics, with an astounding number of real-life applications like encryption, compression, and signal analysis.
Definition 4.3
If V is an inner-product space, two vectors x, y ∈ V are said to be orthogonal if hx, yi = 0.

Example 4.4

Let C[−π, π] be endowed with the inner product given in (4.1). Show that fn (x) = sin(nπx)
and gm (x) = cos(mπx) are orthogonal for all m, n ∈ Z.

Solution. Recall that⁵

    sin(α) cos(β) = [sin(α − β) + sin(α + β)] / 2,

so that

    sin(nπx) cos(mπx) = [sin((n − m)πx) + sin((n + m)πx)] / 2.

For any k ∈ Z, sin(kπx) is odd on the interval [−π, π], and hence integrates to zero. Thus

    ⟨fn , gm ⟩ = (1/2π) ∫_{−π}^{π} sin(nπx) cos(mπx) dx = (1/4π) ∫_{−π}^{π} [sin((n − m)πx) + sin((n + m)πx)] dx = 0,

showing that these vectors are orthogonal for all n, m ∈ Z as required.
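Since the integrand is odd, the integral is exactly zero; the Python sketch below simply confirms this numerically with a Riemann sum (the values of n, m and the grid size are arbitrary choices).

    import numpy as np

    # Riemann-sum approximation of <f_n, g_m> = (1/2pi) * integral of sin(n*pi*x)cos(m*pi*x)
    # over [-pi, pi]; the mean over a uniform grid already includes the 1/(2pi) factor.
    n, m = 3, 5
    x = np.linspace(-np.pi, np.pi, 200_000, endpoint=False)
    inner = np.mean(np.sin(n * np.pi * x) * np.cos(m * np.pi * x))
    print(abs(inner) < 1e-4)   # True: f_n and g_m are (numerically) orthogonal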

If V = Mn (R), recall that the trace of A ∈ Mn (R) is the sum of its diagonal elements. We can
define the Hilbert-Schmidt inner product

hA, Bi = Tr(AT B). (4.2)

You will show in Exercise 4.1-3 that this is really just the Euclidean inner product.
⁵ If you've never seen this before, use the angle sum identities on the right hand side.


Remark 4.5 Very early in these notes, I mentioned that we’d be dealing with real vector
spaces. More generally, you can study vector spaces over any field F , where we demand
that the scalars must come from F . If F = C for example, we have complex vector spaces.
The only change we must make to the definition of an inner product is that the symmetry property becomes ⟨x, y⟩ = \overline{⟨y, x⟩}, where the bar indicates complex conjugation.
In both (4.1) and (4.2), throw in a complex conjugate over one of the entries, and you now
have the complex inner products. These two inner products are the inner products used in
Quantum Mechanics. In the former case, Equation (4.1) describes the inner product on the
space of wave functions (the Schrödinger picture), while Equation (4.2) describes the inner
product on the space of quantum operators (the Heisenberg picture).

One final property we’ll need for the next section is the following inequality:
Proposition 4.6: Cauchy-Schwarz

If V is a real vector space with an inner product ⟨·, ·⟩, then for every x, y ∈ V we have

    ⟨x, y⟩² ≤ ⟨x, x⟩⟨y, y⟩,

with equality if and only if x and y are linearly dependent.

Proof. If x = 0 both sides are zero and the result is trivially true, so assume x ≠ 0. Let p = ⟨x, y⟩/⟨x, x⟩, which you may recognize as the projection coefficient of y onto x. Now

    ⟨y − px, y − px⟩ = ⟨y, y⟩ − p⟨y, x⟩ − p⟨x, y⟩ + p²⟨x, x⟩
                     = ⟨y, y⟩ − 2⟨x, y⟩²/⟨x, x⟩ + ⟨x, y⟩²⟨x, x⟩/⟨x, x⟩²
                     = ⟨y, y⟩ − ⟨x, y⟩²/⟨x, x⟩.

This term is always non-negative by the positive-definite property of the inner product, hence

    ⟨y, y⟩ − ⟨x, y⟩²/⟨x, x⟩ ≥ 0   ⇒   ⟨x, y⟩² ≤ ⟨x, x⟩⟨y, y⟩,

from which the inequality follows. For equality, note that the left-hand side ⟨y − px, y − px⟩ is zero if and only if y − px = 0, or y = px, showing that x and y are linearly dependent.

4.1.2 Norms

The next structure is called a norm, and prescribes a way of measuring the length of a vector.


Definition 4.7
Let V be a real vector space. A norm on V is a map k·k : V → R satisfying,

1. [Non-degenerate] kxk ≥ 0 for all x ∈ V and kxk = 0 if and only if x = 0,

2. [Homogeneous] kαxk = |α|kxk for all x ∈ V and α ∈ R,

3. [Triangle Inequality] kx + yk ≤ kxk + kyk.

The Euclidean norm is induced from the Euclidean inner product

    ‖x‖ := √⟨x, x⟩ = (Σ_{i=1}^{n} xi^2)^{1/2} = √(x1^2 + x2^2 + · · · + xn^2).

Recognize that this generalizes the Pythagorean Theorem in R2 , since if x = (x, y) then the vector x looks like the hypotenuse of a triangle with side lengths x and y. The length of the hypotenuse is just √(x² + y²) = ‖x‖. I will leave it as an exercise to show that the Euclidean norm is indeed a norm.
Above I commented that we could define the Euclidean norm using the Euclidean inner product.
This extends more generally:
Proposition 4.8
If (V, ⟨·, ·⟩) is an inner product space, then ‖x‖ = √⟨x, x⟩ defines a norm on V .

Proof. Non-degeneracy and homogeneity follow immediately from the definition of an inner product, as you should check. All that remains to be shown is the triangle inequality, wherein

    ‖x + y‖² = ⟨x + y, x + y⟩ = ⟨x, x⟩ + 2⟨x, y⟩ + ⟨y, y⟩
             ≤ ⟨x, x⟩ + 2‖x‖‖y‖ + ⟨y, y⟩        by Cauchy-Schwarz
             = (‖x‖ + ‖y‖)².

Taking the square root of both sides gives the desired result.

Knowing that inner products induce norms, it’s natural to ask whether every norm comes from
an inner product. The answer is no, via the following theorem which is surprisingly difficult to
prove:
Theorem 4.9

If (V, ‖·‖) is a normed vector space, then the norm is induced by an inner product if and only if

    2‖x‖² + 2‖y‖² = ‖x + y‖² + ‖x − y‖²   for all x, y ∈ V  (the parallelogram law).

The converse direction is a matter of algebra (Exercise ??), but the forward direction is quite
tricky. If you’re interested in trying it, you first need a candidate for what the inner product should


look like, in terms of the norm. You can check that

    ⟨x, y⟩ = (‖x + y‖² − ‖x − y‖²) / 4

is such a candidate.
On the other hand, there are plenty of norms which do not come from inner products. A popular example on R^n is the family of p-norms: if p ≥ 1 is a real number, define

    ‖x‖_p = (Σ_{i=1}^{n} |xi|^p)^{1/p}.

The p-norm comes from an inner product if and only if p = 2, which is precisely the Euclidean
norm.
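One way to see this is through Theorem 4.9: the parallelogram identity holds for the 2-norm but fails for other p-norms. A small Python sketch (the two test vectors are arbitrary; a single failing pair is enough to rule out an inner product) is below.

    import numpy as np

    def p_norm(v, p):
        return np.sum(np.abs(v) ** p) ** (1.0 / p)

    x = np.array([1.0, 2.0, -1.0])
    y = np.array([0.5, -3.0, 2.0])

    for p in (1, 2, 3):
        lhs = 2 * p_norm(x, p) ** 2 + 2 * p_norm(y, p) ** 2
        rhs = p_norm(x + y, p) ** 2 + p_norm(x - y, p) ** 2
        print(p, np.isclose(lhs, rhs))   # only p = 2 satisfies the parallelogram law here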

4.1.3 Metrics

Finally, one has a metric. Metrics are the most flexible, least rigid structure we’ll impose on a
space. Loosely speaking, metrics prescribe a method for determining the distance between two
vectors.
Definition 4.10
A set X with a function d : X × X → R is said to be a metric space if

1. [Symmetry] d(x, y) = d(y, x) for all x, y ∈ X,

2. [Non-degenerate] d(x, y) ≥ 0 for all x, y ∈ X, with d(x, y) = 0 if and only if x = y.

3. [Triangle Inequality] d(x, z) ≤ d(x, y) + d(y, z) for all x, y, z ∈ X.

Note that a metric space does not need to be a vector space: The definition of a metric has
no mention of addition, multiplication, or even of a 0 element. However, just as inner products
induced norms, so too do norms induce metrics.
Proposition 4.11

If V is a real vector space with a norm k·k, then the function d : V × V → R given by
d(x, y) = kx − yk, defines a metric.

That the three properties of a metric are satisfied is almost immediate, and I’ll leave the proof
to Exercise 4.1-5. This means that from the Euclidean norm we have the Euclidean metric. If
x = (x1 , . . . , xn ) and y = (y1 , . . . , yn ) then the Euclidean metric is
    d(x, y) = ‖x − y‖ = (Σ_{i=1}^{n} (xi − yi)²)^{1/2} = √((x1 − y1)² + · · · + (xn − yn)²).

In R2 , this agrees with the usual distance formula. As with inner product-induced norms, one
can ask whether all metrics are induced by norms. Generally no, with the following proposition
indicating when it is the case.


Proposition 4.12

If V is a vector space, and d is a metric on V , then d is induced by a norm if and only if d


satisfies the following two properties:

1. [Translation Invariance:] d(x + z, y + z) = d(x, y) for all x, y, z ∈ V ,

2. [Homogeneous:] For any c ∈ R, d(cx, cy) = |c|d(x, y)

Proposition 4.12 is relatively straightforward to prove, and has been left to Exercise 4.1-6. A
straightforward example of a metric which is not induced by a norm is the discrete metric. If X is
any set (say a vector space), define
    d(x, y) = 0 if x = y, and d(x, y) = 1 otherwise.

You can check that this is indeed a metric, and that it is not induced by a norm.

Exercises

4.1-1. Let C[−π, π] be endowed with the inner product given in Equation (4.1). We’ve already
shown that {fn : n ∈ Z} and {gm : m ∈ Z} are mutually orthogonal.

(a) Show that hfn , fm i = 0 if n ≠ m and hfn , fn i = 1.


(b) Show that hgn , gm i = 0 if n ≠ m and hgn , gn i = 1.

4.1-2. Let Mn (R) be endowed with the Hilbert-Schmidt inner product defined in Equation (4.2).
Show that this is indeed an inner product.

4.1-3. Let Mn (R) be endowed with the Hilbert-Schmidt inner product. Fix the isomorphism Mn (R) ≅ R^{n²} by “stacking the columns.” Show that the Hilbert-Schmidt norm agrees with the Euclidean norm under this transformation.

4.1-4. If V is a vector space, two norms ‖·‖1 and ‖·‖2 are said to be equivalent if there exist positive real numbers α, β such that α‖x‖1 ≤ ‖x‖2 ≤ β‖x‖1 for all x ∈ V .

(a) Show that being equivalent defines an equivalence relation on the set of norms.
(b) Show that the following two norms are equivalent on R^n :

    ‖x‖1 = max_{i∈{1,...,n}} |xi |   and   ‖x‖2 = (Σ_{i=1}^{n} xi^2)^{1/2}.

4.1-5. Prove Proposition 4.11 by showing that the metric induced by a norm is indeed a metric.

4.1-6. Prove Proposition 4.12.

4.1-7. Let V be a vector space, and S = {b1 , . . . , bn } be a set of pairwise orthogonal vectors. If the
zero vector is not in S, show that S is linearly independent.


4.1-8. If V is a vector space, a map B : V × V → R is said to be a bilinear form if it is linear in


each of its components; that is, if v1 , v2 , v3 ∈ V and c ∈ R then
B(v1 + cv2 , v3 ) = B(v1 , v3 ) + cB(v2 , v3 ) and B(v1 , v2 + cv3 ) = B(v1 , v2 ) + cB(v1 , v3 ).
(a) Confirm that each map below is bilinear:
i. Multiplication on R: M : R × R → R, (x, y) 7→ xy.
 
ii. The determinant on R2 : D : R2 × R2 → R, (x, y) 7→ det x y .
iii. The inner product on V : I : V × V → R, (x, y) 7→ hx, yi.
(b) Show that if A ∈ Mn (R), the map B : R^n × R^n → R, (x, y) 7→ x^T Ay is bilinear.
(c) Show that the converse of (b) is true as well. If B : R^n × R^n → R is a bilinear map, there is a matrix A ∈ Mn (R) such that B(x, y) = x^T Ay.
(d) In light of (b) and (c), show that each property of the matrix is equivalent to the given property of the bilinear form:
i. AT = A if and only if B(x, y) = B(y, x) (Such matrices are said to be symmetric)
ii. A is symmetric and all of its eigenvalues are positive if and only if B(x, x) > 0 for
all non-zero x.
4.1-9. Recall in Euclidean R2 that if v, w ∈ R2 , then ⟨v, w⟩ = ‖v‖‖w‖ cos(θ). Let V be an arbitrary inner product space.

(a) Show that −1 ≤ ⟨v, w⟩/(‖v‖‖w‖) ≤ 1 for all non-zero v, w ∈ V .
(b) Argue that there exists a unique θ ∈ [0, π] such that cos(θ) = ⟨v, w⟩/(‖v‖‖w‖). Define this to be the angle between v and w.
(c) Find the angle between p(x) = 1 + x + x2 and q(x) = 2x − 3 in P2 (R) with the inner product ⟨p, q⟩ = ∫_0^1 p(x)q(x) dx.

4.2 Orthonormal Bases

4.2.1 Orthonormality

If V is an inner product space, Definition 4.3 told us that v, w ∈ V are said to be orthogonal if
hv, wi = 0. We’ve also learned that we can use a basis to reduce most computations in a vector
space to a finite number of computations on the basis elements. To play nicely with an inner
product, we might want to impose additional constraints on our basis:
Definition 4.13
If V is an inner product space, a basis B is said to be an orthogonal basis if hv, wi = 0 for all v, w ∈ B such that v ≠ w. It is further said to be an orthonormal basis if it is an orthogonal basis,
and hv, vi = 1 for each v ∈ B.

For example, in R2 the following two bases are both orthonormal in the Euclidean inner product:

    { (1, 0)^T , (0, 1)^T }   and   { (1/√2)(1, 1)^T , (1/√2)(1, −1)^T }.


Note that any non-zero vector can be made normal by dividing by its norm. Indeed, if v ∈ V is a non-zero vector, then v/‖v‖ is normal, since

    ‖ v/‖v‖ ‖ = (1/‖v‖)‖v‖ = 1.

Thus the only real obstruction to finding an orthonormal basis is to first find an orthogonal one. In our second orthonormal basis example above, had we started with the orthogonal basis

    { (1, 1)^T , (1, −1)^T },

then we arrive at an orthonormal basis by dividing each by its norm √2.
To see why orthogonal bases are particularly nice, consider the following theorem:
Theorem 4.14

If V is a finite dimensional inner product space with an orthogonal basis B = {b1 , b2 , . . . , bn },


then for any v ∈ V we can write v in the basis B by
    v = Σ_{i=1}^{n} (⟨v, bi ⟩/‖bi ‖²) bi .    (4.3)

Proof. Since B is a basis, we know that v = Σ_i ci bi for some choice of ci ∈ R. If we inner product this against b1 we get

    ⟨b1 , v⟩ = ⟨ b1 , Σ_{i=1}^{n} ci bi ⟩ = Σ_{i=1}^{n} ci ⟨b1 , bi ⟩.

Since B is orthogonal, we know that ⟨b1 , bi ⟩ is zero for all i ≠ 1, and when i = 1 we can rewrite this as ⟨b1 , b1 ⟩ = ‖b1 ‖², so our equation becomes

    ⟨b1 , v⟩ = c1 ‖b1 ‖²   ⇒   c1 = ⟨b1 , v⟩/‖b1 ‖².

Now this clearly didn’t depend on the fact that we used b1 , and if we repeat this process with any
other bi , we could get ci = hbi , vi /kbi k2 . These are precisely the coefficients given in Equation
(4.3).

Notice then that if B is an orthogonal basis, Theorem 4.14 says that the coordinate transformation has the form

    CB (v) = ( ⟨v, b1 ⟩/‖b1 ‖², . . . , ⟨v, bn ⟩/‖bn ‖² )^T .

This means that we don’t have to do any row reduction to establish the coordinate transformations.


Example 4.15

Consider R2 with the Euclidean inner product. Let B be the orthonormal basis

    B = { (1/√2)(1, 1)^T , (1/√2)(1, −1)^T }.

Write the vector v = (a, b)^T in this basis.

Solution. Since B is orthonormal, we just need to compute ⟨v, bi ⟩ for each bi ∈ B. Indeed,

    ⟨v, b1 ⟩ = (a + b)/√2   and   ⟨v, b2 ⟩ = (a − b)/√2.

Thus

    (a, b)^T = ((a + b)/√2) b1 + ((a − b)/√2) b2 .

We’ll see in the next section that the form of these coefficients has a particularly nice geometric
interpretation.
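Computationally, Theorem 4.14 with an orthonormal basis means the coordinates are nothing more than dot products. Here is a Python sketch of Example 4.15 with the arbitrary choice v = (3, −1); no row reduction is needed.

    import numpy as np

    b1 = np.array([1.0, 1.0]) / np.sqrt(2)
    b2 = np.array([1.0, -1.0]) / np.sqrt(2)
    v = np.array([3.0, -1.0])          # any vector (a, b); here a = 3, b = -1

    c1, c2 = v @ b1, v @ b2            # the coefficients <v, b_i>
    print(np.allclose(c1 * b1 + c2 * b2, v))   # True: v = c1 b1 + c2 b2
    print(np.isclose(c1, (3 - 1) / np.sqrt(2)))  # matches (a + b)/sqrt(2)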

4.2.2 Orthogonal Projections

Let R^n be endowed with the standard Euclidean inner product (the dot product), and recall from your previous linear algebra experience that if u, v ∈ R^n , we define the projection of v onto the subspace generated by u by

    proj_u (v) = (⟨v, u⟩/‖u‖²) u.

In the special case where u is normal (‖u‖ = 1), the projection reduces to proj_u (v) = ⟨v, u⟩ u. In fact, the reason that the ‖u‖² term appears is to ensure that we project onto a normal vector. Per our discussion in Section 4.2.1, u/‖u‖ is normal, so projecting onto it gives

    ⟨v, u/‖u‖⟩ (u/‖u‖) = (⟨v, u⟩/‖u‖²) u.

Nothing about our discussion above was particular to Rn or the Euclidean inner product, so we
can define projections in arbitrary spaces as follows:
Definition 4.16
Let V be an inner product space, and let u ∈ V be non-zero. If v ∈ V is any other vector, then

    proj_u (v) = (⟨v, u⟩/‖u‖²) u

is an element of span {u}, and we say that proj_u (v) is the projection of v onto the subspace spanned by u.


It should be evident that proj_u (v) ∈ span {u}, since it is just a scalar multiple of u. One of the key consequences of this definition is that v − proj_u (v) is then orthogonal (Definition 4.3) to u, and hence to any element in span {u}. Indeed,

    ⟨v − proj_u (v), u⟩ = ⟨v, u⟩ − (⟨v, u⟩/‖u‖²)⟨u, u⟩ = 0.
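A minimal Python sketch of this orthogonality, using two arbitrary vectors in R^3, is below.

    import numpy as np

    u = np.array([2.0, 1.0, -2.0])
    v = np.array([1.0, 4.0, 0.0])      # arbitrary vectors in R^3

    proj = (v @ u) / (u @ u) * u       # proj_u(v) = (<v, u>/||u||^2) u
    residual = v - proj
    print(np.isclose(residual @ u, 0.0))   # True: v - proj_u(v) is orthogonal to u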

Instead of limiting ourselves to one-dimensional subspaces spanned by a single element, we


want to consider how to project into arbitrary subspaces. One method is to abstract it away. For
example, if U is a subspace of V , and we can find a complementary subspace W so that V = U ⊕ W , then every element v ∈ V can be written uniquely as v = u + w, where u ∈ U and w ∈ W . We can then simply define projU (v) = u. However, when we do this we lose the orthogonality condition, since there's no reason that elements of U should be orthogonal to elements of W . This just means we have to be more clever in choosing our complementary subspace.
Definition 4.17
If V is a vector space endowed with an inner product h·, ·i, and S is a subspace, we define
the orthogonal complement to S as

S ⊥ = {v ∈ V : hv, wi = 0, ∀w ∈ S} .

Our goal is to say that V = U ⊕ U ⊥ for any subspace U of V . Before we can do that, we should
check a few things about U ⊥ . These are all good exercises, and have been left to Exercise 4.2-4.
Proposition 4.18

If V is an inner product space, the following are true:

1. If S is a subspace of V , then S ⊥ is a subspace of V .

2. If S ⊆ T ⊆ V are subspaces, then T ⊥ ⊆ S ⊥ .

3. If S is a subspace of V , then S ⊆ (S ⊥ )⊥ , and this is an equality if V is finite dimensional.

Definition 4.19
If V is a finite dimensional inner product space, let U be a subspace with orthogonal basis
{b1 , . . . , bk }. If v ∈ V , then the orthogonal projection of v onto U is
    projU (v) = Σ_{i=1}^{k} (⟨v, bi ⟩/‖bi ‖²) bi .    (4.4)

Knowing that U ⊥ is a subspace means we’re on the right track to showing that it is comple-
mentary to U . However, we’re going to need a way of writing arbitrary vectors as the sum of an
element in U and an element in U ⊥ . To do that, we’ll use the following:


Proposition 4.20

If V is a finite dimensional inner product space and U is a subspace of V , then for any v ∈ V
we have that projU (v) ∈ U and v − projU (v) ∈ U ⊥ .

Proof. Fix an orthogonal basis B = {b1 , . . . , bk } of U . Equation (4.4) makes it clear that projU (v) ∈ span {b1 , . . . , bk } = U , so we need only show the orthogonality part. To show that v − projU (v) ∈ U⊥, we need to show that it is orthogonal to every element of U . Suppose u ∈ U , and write u = Σ_i ci bi , so that

    ⟨v − projU (v), u⟩ = ⟨v, u⟩ − ⟨ Σ_{i=1}^{k} (⟨v, bi ⟩/‖bi ‖²) bi , u ⟩ = ⟨v, u⟩ − Σ_{i=1}^{k} (⟨v, bi ⟩/‖bi ‖²) ⟨bi , u⟩.    (4.5)

Now

    ⟨bi , u⟩ = ⟨ bi , Σ_{j=1}^{k} cj bj ⟩ = ci ‖bi ‖²   and   ⟨v, u⟩ = ⟨ v, Σ_{i=1}^{k} ci bi ⟩ = Σ_{i=1}^{k} ci ⟨v, bi ⟩,

so substituting this into (4.5) we get

    ⟨v − projU (v), u⟩ = Σ_{i=1}^{k} ci ⟨v, bi ⟩ − Σ_{i=1}^{k} (⟨v, bi ⟩/‖bi ‖²) ci ‖bi ‖²
                       = Σ_{i=1}^{k} ci ⟨v, bi ⟩ − Σ_{i=1}^{k} ci ⟨v, bi ⟩ = 0,

which is what we wanted to show.

Corollary 4.21

If V is a finite dimensional inner product space and U ⊆ V is a subspace, then V = U ⊕ U ⊥ .

Proof. It suffices to show that V = U + U ⊥ and U ∩ U ⊥ = {0}. To show that the intersection is
trivial, let v ∈ U ∩ U ⊥ be any element. Since v ∈ U ⊥ , we know that hv, wi = 0 for all w ∈ U . But
since v ∈ U this implies that hv, vi = 0. By positive-definiteness of the inner product, v = 0.
To show that V = U + U ⊥ , fix any element v ∈ V . Note that v = projU (v) + (v − projU (v)).
By Proposition 4.20, this is an element of U + U ⊥ , and since v was arbitrary, it must be the case
that V = U + U ⊥ .

Example 4.22

Take V = R3 with the standard inner product, v = (1, 0, −1)T , and let S = span {v}. Find
S⊥.


Solution. As dim(S) = 1 we know dim(S⊥) = 2, and as such S⊥ is a plane. Every element of S looks like αv = (α, 0, −α) for some α ∈ R, so the orthogonal complement is

    S⊥ = { y ∈ R3 : ⟨y, x⟩ = 0 for all x ∈ S }
       = { (x, y, z) : (x, y, z) · (α, 0, −α) = 0 for all α ∈ R }
       = { (x, y, z) : x − z = 0 }.
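In coordinates, an orthogonal complement is just a null space, so it can also be read off numerically. The Python sketch below does this for Example 4.22 via the SVD (the tolerance 1e-12 is an arbitrary numerical cutoff).

    import numpy as np

    v = np.array([[1.0, 0.0, -1.0]])        # S = span{v}, written as a 1x3 matrix

    # S-perp is the null space of this matrix: the right singular vectors
    # whose singular value is (numerically) zero.
    _, s, Vt = np.linalg.svd(v)
    rank = np.sum(s > 1e-12)
    basis_perp = Vt[rank:]                  # rows form an orthonormal basis of S-perp
    print(basis_perp.shape[0])              # 2, so S-perp is a plane
    print(np.allclose(basis_perp @ v.T, 0)) # True: every basis vector is orthogonal to v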

4.2.3 Gram-Schmidt Orthogonalization

So given a subspace, how do we find or construct an orthogonal basis? The key is to use Proposition
4.20 to inductively construct an orthogonal basis.
Corollary 4.23

Suppose V is a finite dimensional inner product space, B = {b1 , . . . , bk } is an orthogonal set of non-zero vectors, and U = span B. If v ∈ V is not in U , then v − projU (v) is non-zero, and B ∪ {v − projU (v)} is
an orthogonal set.

Proof. Proposition 4.20 already tells us that v − projU (v) is orthogonal to every element of B, so
it suffices to show that it’s non-zero. Let’s prove the contrapositive; namely, if v − projU (v) = 0
then v ∈ U . This follows quickly, since by hypothesis we must have
    v = projU (v) = Σ_{i=1}^{k} (⟨v, bi ⟩/‖bi ‖²) bi ∈ span B = U.

Recall from Exercise 4.1-7 that an orthogonal set which does not contain the zero vector is
automatically linearly independent. Together with Corollary 4.23, this means that if we have an
orthogonal, linearly independent set B, we can build successively larger linearly independent sets
by finding an element v which is not in the span of B, and adding v − projU (v) to our collection.
We are guaranteed that this new set of vectors is also orthogonal and linearly independent. If our
space is finite dimensional, we will eventually exhaust all elements of the space, and hence have
created an orthogonal basis. This is called the Gram-Schmidt Orthogonalization Algorithm:
Proposition 4.24: Gram-Schmidt Orthogonalization

Let V be a vector space, and let U be a finite dimensional subspace. If {b1 , . . . , bk } is a


basis for U , inductively define the vectors fi , i = 1, . . . , k as follows:

1. f1 = b1

2. Assume the vectors fi , i = 1, . . . , ` exist and let U` = span {f1 , . . . , f` }. Define f`+1 =
b`+1 − projU` (b`+1 ).

Then the set F = {f1 , . . . , fk } is an orthogonal basis for U .

Proof. It is evident that F is orthogonal, so we need only show that F spans U . For this, we will proceed by induction, and show that span {b1 , . . . , bℓ } = span {f1 , . . . , fℓ } for each ℓ = 1, . . . , k.


Since f1 = b1 , it's evident that span {b1 } = span {f1 }. Now suppose that span {b1 , . . . , bℓ } = Uℓ = span {f1 , . . . , fℓ }. It suffices to show that bℓ+1 ∈ span {f1 , . . . , fℓ , fℓ+1 } and vice-versa for fℓ+1 .

By definition, we know that fℓ+1 = bℓ+1 − projUℓ (bℓ+1 ), and that projUℓ (bℓ+1 ) ∈ Uℓ = span {b1 , . . . , bℓ }. Thus we can find scalars ci such that projUℓ (bℓ+1 ) = Σ_{i=1}^{ℓ} ci bi , and

    fℓ+1 = bℓ+1 − Σ_{i=1}^{ℓ} ci bi ∈ span {b1 , . . . , bℓ+1 }

as required. Conversely, we do the same thing. Since projUℓ (bℓ+1 ) ∈ Uℓ = span {f1 , . . . , fℓ } we find coefficients di such that

    fℓ+1 = bℓ+1 − projUℓ (bℓ+1 ) = bℓ+1 − Σ_{i=1}^{ℓ} di fi .

By rearranging this equation to solve for bℓ+1 , we have that bℓ+1 ∈ span {f1 , . . . , fℓ+1 }. Both inclusions give the desired result.

Before doing any examples, note that if we replace fi with cfi , then for any vector v we have

    (⟨v, cfi ⟩/‖cfi ‖²)(cfi ) = (c²⟨v, fi ⟩/(c²‖fi ‖²)) fi = (⟨v, fi ⟩/‖fi ‖²) fi .

This means that once we've found a vector fi , we can multiply it by a scalar without affecting
the orthogonality algorithm. This is useful, since the projection often involves a lot of fractions,
and carrying those fractions throughout the algorithm becomes cumbersome. On the other hand,
we shouldn’t be surprised by this. Multiplying a vector by a scalar affects neither its span, nor its
orthogonality (hci fi , cj fj i = ci cj hfi , fj i = 0), so this seems like a reasonable thing to do.

Example 4.25

Consider R4 with the standard Euclidean inner product. Find an orthogonal basis of the space

    U = span { (1, 1, 0, 0)^T , (0, 1, 1, 0)^T , (1, 1, 1, 0)^T }.

Solution. The question is posed in such a way that we have an obvious basis to work with. Let b1 , b2 , b3 be the basis vectors, in order, given in the problem statement. Working through the algorithm,


we have

    f1 = b1 = (1, 1, 0, 0)^T ,

    f̂2 = b2 − (⟨b2 , f1 ⟩/‖f1 ‖²) f1 = (0, 1, 1, 0)^T − (1/2)(1, 1, 0, 0)^T = (−1/2, 1/2, 1, 0)^T ,   take f2 = (−1, 1, 2, 0)^T ,

    f̂3 = b3 − (⟨b3 , f1 ⟩/‖f1 ‖²) f1 − (⟨b3 , f2 ⟩/‖f2 ‖²) f2 = (1, 1, 1, 0)^T − (2/2)(1, 1, 0, 0)^T − (2/6)(−1, 1, 2, 0)^T = (1/3, −1/3, 1/3, 0)^T ,   take f3 = (1, −1, 1, 0)^T .

Thus our resulting orthogonal basis is

    { (1, 1, 0, 0)^T , (−1, 1, 2, 0)^T , (1, −1, 1, 0)^T },

or, if we want an orthonormal basis,

    { (1/√2)(1, 1, 0, 0)^T , (1/√6)(−1, 1, 2, 0)^T , (1/√3)(1, −1, 1, 0)^T }.

Try Example 4.25 without rescaling f2 , and you’ll see how terrible this computation becomes
in even the simplest of cases.
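The algorithm itself is only a few lines of code. Below is a Python sketch of Proposition 4.24 (it assumes the input vectors are linearly independent and does no rescaling, so the fractions the remark above warns about appear in the output); running it on the vectors of Example 4.25 reproduces f̂2 and f̂3.

    import numpy as np

    def gram_schmidt(vectors):
        """Orthogonalize a list of linearly independent vectors (Proposition 4.24)."""
        basis = []
        for b in vectors:
            f = b.astype(float)
            for g in basis:
                f -= (f @ g) / (g @ g) * g    # subtract the projection onto the span so far
            basis.append(f)
        return basis

    b1, b2, b3 = np.array([1, 1, 0, 0]), np.array([0, 1, 1, 0]), np.array([1, 1, 1, 0])
    F = gram_schmidt([b1, b2, b3])
    print([list(f) for f in F])
    # [1, 1, 0, 0], [-0.5, 0.5, 1, 0], [1/3, -1/3, 1/3, 0] up to floating point,
    # i.e. the same vectors as Example 4.25 before rescaling.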

Exercises

4.2-1. Suppose B = {b1 , . . . , bn } is an orthogonal basis for V . If T : V → V is a linear operator,


determine MB (T ).

4.2-2. Convert the basis {1, x, x2 } into an orthogonal basis for P2 (R) using each inner product
below:

(a) hp, qi = p(0)q(0) + p(1)q(1) + p(2)q(2)


(b) ⟨p, q⟩ = ∫_0^1 p(x)q(x) dx.

4.2-3. Suppose {b1 , . . . , bn } is an orthonormal basis for an inner product space V . Show that if v = Σ_i ci bi then ‖v‖² = Σ_i ci² .

4.2-4. Let V be a finite dimensional inner product space. Show that for any subspace S ⊆ V ,
(S ⊥ )⊥ = S.

4.2-5. Let U, W ⊆ V be subspaces. Show that (U + W )⊥ = U ⊥ ∩ W ⊥ .

4.2-6. Show that v, w ∈ V are orthogonal if and only if kv + wk2 = kvk2 + kwk2 .


4.2-7. Let B = {b1 , . . . , bn } be an orthogonal basis for V . Show that for any v, w ∈ V ,

    ⟨v, w⟩ = Σ_{i=1}^{n} ⟨v, bi ⟩⟨w, bi ⟩ / ‖bi ‖² .

4.2-8. Let U ⊆ V be a subspace. Show that u ∈ U if and only if u = projU (u).

4.3 Isometries

When we discussed linear transformations and eventually isomorphisms, we talked about how we
wanted our functions to preserve the underlying structure. Since addition and scalar multiplication
were the operations of a vector space, linear transformations should preserve that. With inner
products, norms, and metrics, we’ve added an additional structure.
Definition 4.26
If (V, h·, ·iV ) and (W, h·, ·iW ) are inner product spaces, we say that the linear transformation
T : V → W is an isometry if hT x, T yiW = hx, yiV for all x, y ∈ V .

That is to say, an isometry is a linear transformation which preserves the inner product. Even
more insightful is what an isometry does to the induced norms. If k·kV and k·kW are the norms
induced by the inner products on V and W , then T is an isometry if

kT xk2W = hT x, T xiW = hx, xiV = kxk2V ,

or equivalently, kT xkW = kxkV . This is also the definition of an isometry on a normed vector
space, and it tells us that isometries preserve the length of vectors.
Exercise 4.2-3 gives us a nice example of an isometry. If V is a finite dimensional inner product space of dimension n, endowed with an orthonormal basis B = {b1 , . . . , bn }, then the coordinate transformation CB : V → R^n is an isometry if R^n is equipped with the Euclidean inner product. Indeed, note that if v = Σ_i ci bi then Exercise 4.2-3 says that

    ⟨v, v⟩_V = ‖v‖² = Σ_i ci² .

On the other hand, CB (v) = (c1 , c2 , . . . , cn )^T , which has squared Euclidean norm ‖CB (v)‖²_Euc = Σ_i ci² . Thus ⟨v, v⟩_V = ⟨CB (v), CB (v)⟩_Euc , as required.


Proposition 4.27

If T : V → V is a linear operator on the inner product space V , then the following are
equivalent

1. T is an isometry

2. If B = {b1 , . . . , bn } is an orthonormal basis for V , then so is T B = {T b1 , . . . , T bn }.

3. There exists an orthonormal basis B = {b1 , . . . , bn } of V such that T B =


{T b1 , . . . , T bn } is also an orthonormal basis.

4. kT vk = kvk for all v ∈ V

Proof. Let’s prove (1) ⇒ (2). Since T is an isometry, it preserves orthogonality and the norm, since
hT bi , T bj i = hbi , bj i. Hence T B is orthogonal, and it remains to show that it is a basis. We can
show that T B is a basis simply by showing that it does not contain the zero vector, since then it is
a maximally linearly independent set. Suppose then that T bi = 0 for one of the bi ∈ B, in which case

    0 = ‖T bi ‖² = ⟨T bi , T bi ⟩ = ⟨bi , bi ⟩ = ‖bi ‖².

This implies that bi = 0, which is impossible, since B is a basis. Thus none of the T B elements are
zero, and it’s an orthogonal basis.
The proof of (2) ⇒ (3) is trivial, so let's move to (3) ⇒ (4). Fix an arbitrary v ∈ V and write it in the basis B as v = Σ_i ci bi . Now T (v) = Σ_i ci T (bi ), and by Exercise 4.2-3 we have

    ‖T v‖² = Σ_{i=1}^{n} ci² = ‖v‖².

Finally, for (4) ⇒ (1), we know that ‖T v − T w‖² = ‖T (v − w)‖² = ‖v − w‖². Rewriting this in terms of the inner product gives

hT v − T w, T v − T wi = hT v, T vi + hT w, T wi − 2 hT v, T wi
= hv, vi + hw, wi − 2 hT v, T wi since kT vk2 = kvk2
= hv − w, v − wi since kT (v − w)k2 = kv − wk2
= hv, vi + hw, wi − 2 hv, wi .

Hence it must be the case that hT v, T wi = hv, wi, showing that T is an isometry.

By condition (2) of the above proposition, we can immediately conclude that all isometric
operators are isomorphisms.
Definition 4.28
A matrix A ∈ Mn (R) is said to be orthogonal if AT A = In = AAT .


For example, the matrix

    A = (1/√2) [ 1  1 ; 1  −1 ]

is an orthogonal matrix, since

    A^T A = (1/2) [ 1  1 ; 1  −1 ] [ 1  1 ; 1  −1 ] = (1/2) [ 2  0 ; 0  2 ] = [ 1  0 ; 0  1 ].

In fact, if you think about how matrix multiplication works, you'll see that the columns of an orthogonal matrix are precisely an orthonormal basis for R^n . This is because if we write such an orthogonal matrix in terms of its columns A = [ c1  c2  · · ·  cn ], then (A^T A)ij = ci^T cj = ⟨ci , cj ⟩. The requirement that this be the identity matrix says that ⟨ci , cj ⟩ = 1 if i = j and ⟨ci , cj ⟩ = 0 if i ≠ j. This is precisely what it means for the columns to be orthonormal, and there are precisely n of them, so they form a basis.
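This column criterion is easy to test numerically. The Python sketch below manufactures an orthonormal basis by applying a QR factorization to a random matrix (any other orthonormal basis would do) and checks both that the resulting matrix is orthogonal and that it acts as an isometry.

    import numpy as np

    rng = np.random.default_rng(1)
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # columns of Q: an orthonormal basis of R^3

    print(np.allclose(Q.T @ Q, np.eye(3)))          # Q is orthogonal
    x, y = rng.normal(size=(2, 3))
    print(np.isclose((Q @ x) @ (Q @ y), x @ y))     # so x -> Qx preserves the inner product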
Proposition 4.29

If V is a finite dimensional inner product space and T : V → V , the following are equivalent:

1. T is an isometry

2. If B is an orthonormal basis of V , then MB (T ) is orthogonal.

3. There exists an orthonormal basis B of V in which MB (T ) is orthogonal.

Proof. For (1) ⇒ (2), fix an arbitrary orthonormal basis B = {b1 , . . . , bn } of V , and recall that CB : V → R^n is an isometry if R^n is endowed with the Euclidean inner product. Now MB (T ) = [ CB (T (b1 ))  CB (T (b2 ))  · · ·  CB (T (bn )) ], and the claim that MB (T ) is orthogonal reduces to checking that its columns are orthonormal in the Euclidean inner product. Indeed,

    ⟨CB (T (bi )), CB (T (bj ))⟩_Euc = ⟨T (bi ), T (bj )⟩_V = ⟨bi , bj ⟩_V .

Thus the orthonormality of the columns of MB (T ) follows from the orthonormality of the original
basis.
The direction (2) ⇒ (3) is trivial since the Gram-Schmidt algorithm ensures the existence of
an orthonormal basis, so all that remains is (3) ⇒ (1). Let B be some orthonormal basis in which
MB (T ) is orthogonal, and again recall that CB : V → Rn is therefore an isometry. Thus
    ⟨T (bi ), T (bj )⟩_V = ⟨CB (T (bi )), CB (T (bj ))⟩_Euc = 1 if i = j, and 0 if i ≠ j.

Hence T B = {T (b1 ), . . . , T (bn )} is an orthonormal basis, showing that T is an isometry by Propo-


sition 4.27.

Exercises

4.3-1. If V is an inner product space, a map S : V → V is said to be distance preserving if


kSx − Syk = kx − yk, where k·k is the norm induced by the inner product.


(a) Show that if S(0) = 0, then S is a linear transformation.


(b) Conclude that if S(0) = 0, then S is an isometry.

4.3-2. Show that the composition of two isometries is again an isometry.

4.3-3. Let B and C be any two orthonormal bases of an inner product space V . Show that there is
an isometry T : V → V such that T takes B to C.

4.3-4. If V is a vector space, and T : V → V is a linear operator, we define the adjoint of T : V → V


to be the linear transformation T ∗ such that hT x, yi = hx, T ∗ yi for all x, y ∈ V .

(a) If V is Euclidean Rn and A ∈ Mn (R), let TA : Rn → Rn , x 7→ Ax. Show that TA∗ = TAT .
(b) Let V be a finite dimensional inner product space with orthonormal basis B, and T : V → V a linear operator. Show that MB (T ∗ ) = MB (T )^T .

A linear operator is symmetric if T ∗ = T .

4.3-5. Let V be a finite dimensional inner product space. If T : V → V is a linear operator, show
that the following are equivalent:

(a) T is symmetric.
(b) MB (T ) is symmetric for every orthonormal basis B of V .
(c) MB (T ) is symmetric for some orthonormal basis B of V .
(d) There is an orthonormal basis B = {b1 , . . . , bn } of V such that hbi , T bj i = hT bi , bj i.

4.3-6. Suppose T : V → V is a linear operator on the inner product space V , and U is a T -invariant
subspace of V . Show that U ⊥ is a T ∗ invariant subspace.

4.3-7. Recall that if V is a vector space, we define its dual space as the vector space V ∗ = L(V, R).

(a) Define a map B : V ∗ × V → R by (f, v) = f (v). Show that B is a bilinear map.


(b) If T : V → V , we define its dual T ∗ : V ∗ → V ∗ such that B(f, T v) = B(T ∗ f, v).
i. Let T : Rn → Rn be given by T (x) = Ax for some A ∈ Mn (R). What is T ∗ ?
ii. Let T : P2 (R) → P2 (R) be given by T (p) = p′ . Find T ∗ if the inner product is given by ⟨p, q⟩ = ∫_0^1 p(x)q(x) dx.

4.3-8. (a) If ⟨·, ·⟩ is an inner product on V , for each v ∈ V define the map fv : V → R by fv (w) = ⟨v, w⟩. Show that fv ∈ V ∗ for each v ∈ V .
(b) Suppose that V is a finite dimensional vector space. Show that the map T : V → V ∗
given by v 7→ fv is an isomorphism. (This result is also true if the vector space has
countable dimension, and V is “complete” in the induced norm. The proof is nearly
identical, but does require a few extra tricks.)
(c) Show that under the identification V ∼ = V ∗ , we can write h·, ·i : V × V → R as h·, ·i :
V ∗ × V → R, and that in this case the adjoint and dual operators coincide.

4.3-9. Let T : V → V be a linear operator. Show that any two of the following conditions imply the third (there are thus 3 choose 2 = 3 results that you have to show).


(a) T is symmetric.
(b) T is an involution T 2 = id.
(c) T is an isometry.

4.4 Diagonalizability

In Exercise 4.3-4 we learned about the adjoint of a linear operator, and what it means to be
symmetric. Let’s repeat it just so we have it here in front of us:
Definition 4.30
If V is an inner product space and T : V → V is a linear operator, the adjoint of T is a
linear map T ∗ : V → V such that hT x, yi = hx, T ∗ yi for all x, y ∈ V . A linear operator is
symmetric (or self-adjoint) if T = T ∗ .

From Exercise 4.3-4, we know that in an orthonormal basis the matrix representation of the adjoint is computed via the transpose. From Section 3.2 we also know that a linear operator is diagonalizable precisely when its eigenvectors form a basis for V . We are guaranteed that all symmetric operators are diagonalizable, as we'll show in this section.
Proposition 4.31

If T : V → V is symmetric linear operator on a finite dimensional vector space, then the


eigenvalues of T are real.

Proof. There is a way to prove this in general, but it requires that we have the notion of a sesquilin-
ear inner product, which isn’t worth introducing just for this proof. So instead, let’s recall that
eigenvalues are invariant under our choice of coordinate representative for the linear transformation.
Fix an orthonormal basis B of V so that A = MB (T ) is a symmetric matrix. Let λ be an eigenvalue of A, so that Av = λv for some (possibly complex) eigenvector v. It suffices to show that λ = λ̄, where the bar denotes complex conjugation. Writing v̄ for the entrywise conjugate of v and ‖v‖² = v̄^T v, we have

    λ‖v‖² = λ v̄^T v = v̄^T (λv) = v̄^T Av
          = (A^T v̄)^T v = (A v̄)^T v = (λ̄ v̄)^T v = λ̄ v̄^T v
          = λ̄ ‖v‖²,

where the step A v̄ = λ̄ v̄ follows by conjugating Av = λv, since A is real. Equating these gives λ = λ̄ as required.

Indeed, if you look at this proof, you're seeing something that looks a lot like an inner product. We know from Exercise 4.1-8 that every inner product on R^n is of the form ⟨x, y⟩ = x^T Ay for some positive definite symmetric matrix A. When our vectors live in C^n though, we have to change this to ⟨x, y⟩ = x̄^T Ay, and A becomes a positive definite “Hermitian” matrix.
Anyway, the point of having a symmetric matrix is that it would be impossible to diagonalize
a matrix over Rn if it has complex eigenvalues. Knowing that the eigenvalues of a symmetric
transformation are therefore real gives us a stronger chance at properly diagonalizing.


Proposition 4.32

If T : V → V is a symmetric operator on an inner product space V , and U is a T -invariant


subspace, then

1. T is a symmetric linear operator on U .

2. U ⊥ is also T invariant.

Proof.

1. This is immediate. Since U is itself a vector space, the restriction of ⟨·, ·⟩ to U makes U an inner product space. Moreover, since ⟨T x, y⟩ = ⟨x, T y⟩ holds for every x, y ∈ V , it certainly holds for every x, y ∈ U .

2. This follows immediately from Exercise 4.3-6. We know that U ⊥ is T ∗ invariant, but since T
is symmetric, T ∗ = T ; namely, U ⊥ is T -invariant.

Theorem 4.33

If T : V → V is a linear operator on a finite dimensional inner product space, then T is


symmetric if and only if V has an orthogonal basis consisting of eigenvectors of T .

Proof. Suppose that T is a symmetric operator; we will proceed by induction on the dimension of V . If dim V = 1 then every linear operator just acts by scalar multiplication: T (v) = cv for some c ∈ R. Clearly every non-zero vector in V is an eigenvector, and any one of these forms an orthogonal basis for V , so we're done.

Therefore, assume that every (n − 1)-dimensional inner product space with a symmetric operator admits an orthogonal basis of eigenvectors, and let's prove the result for dim V = n. Fix an eigenvalue λ of T , which we know is real, and let bn be an eigenvector associated to λ. Set U = span {bn }, and let U⊥ be its orthogonal complement. Clearly U is T -invariant, so U⊥ is T -invariant. Moreover, dim U⊥ = n − 1 and T is symmetric on U⊥, so U⊥ admits an orthogonal basis {b1 , . . . , bn−1 } of eigenvectors of T . The set B = {b1 , . . . , bn } is the desired basis.

Conversely, if B = {b1 , . . . , bn } is an orthogonal basis of eigenvectors of T , then after normalizing each bi , MB (T ) is diagonal, and hence trivially symmetric. By Exercise 4.3-5, T is symmetric.
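Theorem 4.33 is the real spectral theorem, and it is easy to watch it work numerically. The Python sketch below builds an arbitrary symmetric matrix and checks the conclusions; numpy's eigh routine is designed for exactly this symmetric case.

    import numpy as np

    rng = np.random.default_rng(2)
    B = rng.normal(size=(4, 4))
    A = (B + B.T) / 2                      # an arbitrary symmetric matrix

    evals, evecs = np.linalg.eigh(A)       # eigenvalues and eigenvectors of a symmetric matrix
    print(np.all(np.isreal(evals)))                           # real eigenvalues
    print(np.allclose(evecs.T @ evecs, np.eye(4)))            # orthonormal eigenvectors
    print(np.allclose(evecs @ np.diag(evals) @ evecs.T, A))   # A diagonalizes in that basis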

Exercises

4.4-1. Let P2 (R) be endowed with the Euclidean inner product




    ⟨a1 + b1 x + c1 x2 , a2 + b2 x + c2 x2 ⟩ = a1 a2 + b1 b2 + c1 c2

and define the linear operator T : P2 (R) → P2 (R) by

T (a + bx + cx2 ) = (5a + 16b − 14c) + (16a − b + 2c)x + (−14a + 2b + 14c)x2 .


Show that T is symmetric, and find an orthonormal basis for P2 (R) consisting of eigenvectors
of T .

4.4-2. Let T : V → V be a linear operator on an n-dimensional inner product space V . Show that T is


symmetric if and only if

• It’s characteristic polynomial pT (λ) splits over R; that is, pT can be written as a product
of linear factors.
• Whenever U is a T -invariant subspace of V , then U ⊥ is also a T invariant subspace of
V.
