
MATH3215 Hilbert Spaces and Fourier Analysis
Derek Harland

This module is about linear algebra in infinite dimensions. Many things that we
take for granted in finite dimensions (such as the existence of a basis or diagonalisability of a matrix) are much more subtle in infinite dimensions. This means that
infinite-dimensional linear algebra is more interesting than its finite-dimensional
counterpart, and also that we must proceed much more cautiously in infinite
dimensions. Techniques that you learnt in Analysis will be very important.
Infinite-dimensional linear algebra has many important applications to other
areas of mathematics, including:
• The Fourier series (and its counterpart, the Fourier transform)
• Partial differential equations
• Quantum mechanics
We will focus on the Fourier series in this module. This is the mathematics that
underlies digital storage of music and other media, including MP3 files.

1 Banach spaces and Hilbert spaces


1.1 Normed vector spaces
Definition 1.1. A complex vector space is a set E together with two operations:
addition, which sends x, y ∈ E to a vector x + y ∈ E, and scalar multiplication,
which sends λ ∈ C and x ∈ E to λx ∈ E. These are required to satisfy the
following axioms:
• x + y = y + x ∀x, y ∈ E (commutativity)
• (x + y) + z = x + (y + z) ∀x, y, z ∈ E (associativity of addition)
• λ(µx) = (λµ)x ∀λ, µ ∈ C, x ∈ E (associativity of scalar multiplication)
• λ(x + y) = λx + λy ∀λ ∈ C, x, y ∈ E (distributivity of vector addition)
• (λ + µ)x = λx + µx ∀λ, µ ∈ C, x ∈ E (distributivity of scalar addition)
• there is an element 0E ∈ E such that 0E + x = x ∀x ∈ E (additive identity)
• 1x = x ∀x ∈ E, where 1 ∈ C (multiplicative identity)
Warning: I will try to remember to denote the zero-element of a vector space
E by 0E , but will sometimes forget!
Note these axioms imply that every x ∈ E has an additive inverse −x := (−1)x such that −x + x = (−1)x + (1)x = (−1 + 1)x = 0x = 0E .

Definition 1.2. Let E be a complex vector space and let F ⊂ E. Then F is
called a subspace of E if
(i) 0E ∈ F
(ii) ∀x ∈ F, λ ∈ C λx ∈ F

(iii) ∀x, y ∈ F , x + y ∈ F .
Definition 1.3. Let E be a complex (or real) vector space. A norm on E is a
function k · k : E → R satisfying
(N1) kxk ≥ 0 ∀x ∈ E, and kxk = 0 if and only if x = 0E

(N2) kλxk = |λ|kxk ∀x ∈ E, λ ∈ C (or λ ∈ R)


(N3) kx + yk ≤ kxk + kyk ∀x, y ∈ E.
The pair (E, k · k) is called a normed vector space.

You should think of a norm as being a kind of distance function: kxk is the
“length” of the vector x, and kx − yk is the “distance” between two vectors x
and y. Note that axiom (N3) is often called the triangle inequality.
Example 1.4. Let E = R3 with $\|(x_1, x_2, x_3)\| = \sqrt{x_1^2 + x_2^2 + x_3^2} = \sqrt{x \cdot x}$. Note
that kxk equals what you would normally call the length of the vector x. We
claim that this is a norm: to prove it, one must check that all three axioms are
satisfied. Doing this is part of your exercise sheet!
Note that the triangle inequality says in this case that the length of one side
of a triangle is no bigger than the sum of the lengths of the other two sides.
This hopefully sounds plausible, even if you haven’t seen a proof. . .
For the next example you will need to recall some definitions from earlier
analysis modules. First, if D ⊂ R then a function f : D → R is called continuous
if
∀y ∈ D ∀ε > 0 ∃δ > 0 s.t. ∀x ∈ D, |x − y| < δ ⇒ |f (x) − f (y)| < ε.
A function f : D → C is called continuous if the real and imaginary parts of f
are both continuous. Note that if f : D → C is continuous then the function
D → R, t 7→ |f (t)| is also continuous (prove it using the fact that products and
compositions of continuous functions are continuous!)
Second, if S ⊂ R and M ∈ R are such that ∀x ∈ S, x ≤ M then M is called
an upper bound for S and S is said to be bounded above. If S is bounded above
then the supremum of S, denoted sup S, is the least upper bound for S. In
other words sup S is the unique number such that (i) sup S is an upper bound
for S and (ii) if M is any upper bound for S, then M ≥ sup S. Similarly, if S
is bounded from below the infimum of S, denoted inf S, is the greatest lower
bound for S.

Example 1.5. Let a, b ∈ R with a < b and let C[a, b] be the set of continuous
functions f : [a, b] → C. For any f ∈ C[a, b], let

$$ \|f\|_\infty = \sup_{t \in [a,b]} |f(t)| = \sup\{\, |f(t)| : t \in [a,b] \,\} . $$

We claim that (C[a, b], k · k∞ ) is a normed vector space. To prove this we must
prove that C[a, b] is a vector space and that k · k∞ is a norm.
It is not completely obvious that C[a, b] is a vector space – you need to
show for example that the sum of two continuous functions is continuous, and
that multiplying a continuous function by a constant gives another continuous
function. Thankfully you will have seen proofs of these in 2nd year analysis, so
I won’t prove them here.
Now we prove that k·k∞ is a norm. First, we need to check that the definition
of k · k∞ makes sense: kf k∞ is only well-defined if the set {|f (t)| : t ∈ [a, b]} is
bounded above. We can prove this using the Maximum Value Theorem from 2nd
year analysis. This states that there exists a c ∈ [a, b] such that |f (t)| ≤ |f (c)| for all t ∈ [a, b], so the set has the upper bound |f (c)|. Now we'll check that k · k∞
satisfies the three axioms:
(N1): Since |f (t)| ≥ 0 ∀t ∈ [a, b], kf k∞ ≥ 0. If f (t) ≠ 0 for some t ∈ [a, b] then kf k∞ > 0, so the only way it can happen that kf k∞ = 0 is if f (t) = 0 ∀t ∈ [a, b], i.e. f = 0.
(N2): If λ ∈ C then |λf (t)| = |λ||f (t)| ≤ |λ|kf k∞ because kf k∞ is an upper bound for {|f (t)| : t ∈ [a, b]}. Therefore |λ|kf k∞ is an upper bound for {|λf (t)| : t ∈ [a, b]}. Suppose for contradiction that M is another upper bound for this set which is less than |λ|kf k∞ : then 0 ≤ M < |λ|kf k∞ , so |λ| > 0, and M/|λ| is an upper bound for {|f (t)| : t ∈ [a, b]} which is less than kf k∞ . This contradicts the fact that kf k∞ is a least upper bound. Therefore no such M exists and |λ|kf k∞ equals the least upper bound kλf k∞ for {|λf (t)| : t ∈ [a, b]}.
(N3): Let f, g ∈ C[a, b]. Then for any t ∈ [a, b], |f (t)+g(t)| ≤ |f (t)|+|g(t)| ≤
kf k∞ + kgk∞ , so kf k∞ + kgk∞ is an upper bound for {|f (t) + g(t)| : t ∈ [a, b]}.
Since kf + gk∞ is a least upper bound for this set, kf + gk∞ ≤ kf k∞ + kgk∞ .
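If you want to experiment with the supremum norm, here is a small Python sketch of mine (assuming numpy is available; the grid maximum is only an approximation to the true supremum):

import numpy as np

t = np.linspace(0.0, 1.0, 10001)       # a fine grid on [a, b] = [0, 1]
f = np.exp(1j * np.pi * t)             # a continuous complex-valued function
g = t**2                               # another continuous function

def sup_norm(samples):
    # approximates ||f||_inf = sup |f(t)| by a maximum over grid points
    return np.max(np.abs(samples))

# the triangle inequality (N3) holds for the sampled values
print(sup_norm(f + g) <= sup_norm(f) + sup_norm(g))   # True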
Lemma 1.6. Any subspace of a normed vector space is a normed vector space.
Proof. Let (E, k · k) be a normed vector space and let F ⊂ E be a subspace. Let
k · kF : F → R denote the restriction of k · k : E → R to F . We leave it as an
exercise to check that k · kF is a norm on F .
Example 1.7. Let a < b and let P [a, b] be the set of polynomial functions, i.e.
$$ P[a,b] = \Big\{\, f : [a,b] \to \mathbb{C},\ f : t \mapsto \sum_{i=0}^{n} c_i t^i \ :\ n \in \mathbb{N},\ c_i \in \mathbb{C} \text{ for } i = 0, \dots, n \,\Big\} . $$

Then P [a, b] is a subset of C[a, b] because polynomials are always continuous.


You can check for yourself that P [a, b] is in fact a subspace of C[a, b]. Since
k · k∞ is a norm on C[a, b], the restriction of k · k∞ to P [a, b] is a norm on P [a, b].

The following lemma is very useful:
Lemma 1.8. Let (E, k · k) be a normed vector space. Then

|kxk − kyk| ≤ kx − yk ∀x, y ∈ E.

Proof. By the triangle inequality kx−yk+kyk ≥ kx−y+yk = kxk. Rearranging


this shows that
kxk − kyk ≤ kx − yk.
Similarly, since ky − xk + kxk ≥ kyk,

kyk − kxk ≤ ky − xk = kx − yk.

Definition 1.9. Let E and F be two normed vector spaces and let T : E → F
be a linear map. T is called a linear isometry if kT xkF = kxkE for all x ∈ E. It
is called an isometric isomorphism if it is a linear isometry and an isomorphism
of vector spaces.
One of the main uses of norms is in talking about convergent sequences.
Definition 1.10. Let (E, k·k) be a normed vector space, let (xn ) be a sequence
in E and let x ∈ E. We say that (xn ) converges to x and that x is a limit of
(xn ) if
∀ε > 0 ∃N ∈ N such that n > N ⇒ kxn − xk < ε.
This is denoted xn → x or limn→∞ xn = x.
In the cases (E, k · k) = (R, | · |) and (E, k · k) = (C, | · |), this definition
coincides with the definition of convergence that you encountered earlier in
your mathematical life.
Note that the definition says “a” limit, not “the” limit: there is nothing in
the definition to indicate that limits are unique! Nevertheless, we can prove:
Proposition 1.11. Let (xn ) be a sequence in a normed vector space (E, k · k)
and let x, y ∈ E such that xn → x and xn → y. Then x = y.
Proof. By positivity of norms and by the triangle inequality, 0 ≤ kx − yk =
kx − xn + xn − yk ≤ kx − xn k + kxn − yk for every n ∈ N. Since kx − xn k and
kxn − yk converge to 0 as n → ∞, it must be that kx − yk = 0. Then by axiom
(N1) x − y = 0E i.e. x = y.
Definition 1.12. Let (E, k·k) be a normed vector space. A function F : E → C
(or R) is called continuous if for every x ∈ E and every sequence (xn ) that
converges to x,
$$ \lim_{n\to\infty} F(x_n) = F(x) . $$

Lemma 1.13. Let (E, k · k) be a normed vector space. Then the function k · k :
E → R is continuous, i.e. whenever (xn ) is a sequence in E that converges to
x, kxn k → kxk.

Proof. Let x ∈ E and let (xn ) be a sequence that converges to x. Then

0 ≤ |kxn k − kxk| ≤ kxn − xk.

As n → ∞, kxn − xk → 0 so |kxn k − kxk| → 0 by the squeeze rule.


We’ll spend the rest of this subsection unpicking the meaning of convergence
for sequences of functions in C[a, b]. Recall that there are (at least) two different
types of convergence for sequences of functions:
Definition 1.14. Let a, b ∈ R with a < b, let (fn ) be a sequence of bounded
functions fn : [a, b] → C and let f : [a, b] → C be a bounded function. We say
that (fn ) converges to f uniformly if for every ε > 0 there exists an N ∈ N such that

n > N ⇒ |fn (t) − f (t)| < ε for every t ∈ [a, b].

We say that fn → f pointwise if for every t ∈ [a, b] and every ε > 0 there exists an N ∈ N such that

n > N ⇒ |fn (t) − f (t)| < ε.

Note the subtle difference: to prove uniform convergence we must choose the
same N for every t ∈ [a, b], whereas to prove pointwise convergence we can (if
we wish) choose different N ’s for different values of t.
Another way of thinking about these two types of convergence is the fol-
lowing: uniform convergence means that the sequence of functions (fn )
converges to f ∈ C[a, b] with respect to the norm k · k∞ ; pointwise convergence
means that for each t the sequence (fn (t)) of complex numbers converges to
f (t).
Uniform and pointwise convergence are related:
Proposition 1.15. A uniformly convergent sequence of functions is pointwise
convergent.
Proof. Let (fn ) be a sequence of functions fn : [a, b] → C and let f : [a, b] → C
such that fn → f uniformly, i.e. kfn − f k∞ → 0. Let t ∈ [a, b]. We must show
that |fn (t) − f (t)| → 0. Note that 0 ≤ |fn (t) − f (t)| ≤ kfn − f k∞ because kfn − f k∞ is an upper bound for {|fn (s) − f (s)| : s ∈ [a, b]}. Therefore by the squeeze
rule for sequences, |fn (t) − f (t)| → 0.
Note however that pointwise convergent sequences need not be uniformly
convergent: here’s a counter-example
Example 1.16. For each n ∈ N let fn : [0, 1] → C be the function

$$ f_n(t) = \begin{cases} 2nt & 0 \le t \le \frac{1}{2n} \\ 2 - 2nt & \frac{1}{2n} \le t \le \frac{1}{n} \\ 0 & \frac{1}{n} \le t \le 1 . \end{cases} $$

Let f : [0, 1] → C be the constant function f (t) = 0. It's helpful to sketch fn : each fn is a "tent" of height 1, peaking at t = 1/(2n) and vanishing for t ≥ 1/n.

These functions all belong to C[0, 1] (i.e. they are continuous and bounded).
We claim that fn → f pointwise, but that fn does not converge to f uniformly.
Showing that fn does not converge to f uniformly is easy: just note that kfn − f k∞ = 1 for every n ∈ N, because fn has maximum value 1 at t = 1/(2n). So kfn − f k∞ certainly doesn't converge to 0.
Showing that fn → f pointwise is slightly more tricky. If t = 0 then fn (t) = 0
for every n ∈ N and f (t) = 0, so fn (t) → f (t) as n → ∞. If 0 < t ≤ 1 then
fn (t) = 0 for all n > 1/t and f (t) = 0, so again fn (t) → f (t).
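If you would like to see this numerically, here is a short Python sketch of mine (numpy assumed) contrasting the two modes of convergence for these tent functions:

import numpy as np

def f_n(n, t):
    # the tent function of example 1.16: 2nt up to 1/(2n), back down to 0 at 1/n
    return np.where(t <= 1/(2*n), 2*n*t,
                    np.where(t <= 1/n, 2 - 2*n*t, 0.0))

t = np.linspace(0.0, 1.0, 100001)
print([float(np.max(f_n(n, t))) for n in [1, 10, 100]])  # stays at 1: not uniform

t0 = 0.3   # at a fixed point the values f_n(t0) are eventually 0: pointwise
print([float(f_n(n, np.array([t0]))[0]) for n in [1, 2, 4, 8]])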
Just as there are two types of convergence, there are in fact two types of
continuity. We’ve already encountered one type; here’s the other:

Definition 1.17. Let D ⊂ R. A function f : D → C is called uniformly


continuous if

∀ε > 0 ∃δ > 0 s.t. ∀x, y ∈ D, |x − y| < δ ⇒ |f (x) − f (y)| < ε.

The difference from the previous definition is rather subtle. Given ε > 0, to
prove that a function is continuous we need to find a δ > 0 for each y ∈ D, and
we can use different δ’s for different y’s. To prove that a function is uniformly
continuous we must use the same δ for every y ∈ D. To see in more detail how
the two definitions differ, here’s an example:
Example 1.18. Let f : (0, ∞) → C be the function f (t) = 1/t. Then f is
continuous but not uniformly continuous.
I hope you agree that f is continuous, so I will just prove that f is not uniformly continuous. To do so, I must show you an ε > 0 such that

∀δ > 0 ∃x, y ∈ (0, ∞) with |x − y| < δ and |f (x) − f (y)| ≥ ε.

Let ε = 1, and let δ be any positive real number. If δ ≥ 2 choose x = 1, y = 1/3. Then |x − y| = 2/3 < δ and |f (x) − f (y)| = 2 > 1 = ε.
If δ < 2 choose $x = \sqrt{\delta/2}$ and y = x − δ/2. Note that y > 0 because δ/2 < 1. Then |x − y| = δ/2 < δ and

$$ |f(x) - f(y)| = \left| \frac{1}{x} - \frac{1}{y} \right| = \frac{|y - x|}{|x||y|} > \frac{|y - x|}{|x|^2} = 1 = \varepsilon . $$

Thus for all possible values of δ I've shown you an example of x, y such that |x − y| < δ and |f (x) − f (y)| ≥ ε.
The reason this example works is because the slope of the graph of f (x) gets
very steep as x approaches 0. By making x and y small, we can arrange that
|x − y| is small and |f (x) − f (y)| is big.
Thankfully, in the cases that we care about, continuity and uniform continuity are the same:

Proposition 1.19. A function f : [a, b] → C is uniformly continuous if and


only if it is continuous.

Proof. First we prove that a uniformly continuous function is continuous. Sup-
pose that f is uniformly continuous. Let y ∈ [a, b] and let ε > 0. Since f is uniformly continuous there exists a δ > 0 such that |x − z| < δ ⇒ |f (x) − f (z)| < ε for every x, z ∈ [a, b]. In particular, in the case z = y we have |x − y| < δ ⇒ |f (x) − f (y)| < ε for every x ∈ [a, b]. So f is continuous.
We must also prove that a continuous function is uniformly continuous. We
do so using proof by contradiction.
Suppose that there exists a function f : [a, b] → C which is continuous and
not uniformly continuous. Saying that f is not uniformly continuous means
there exists an ε > 0 such that

∀δ > 0 ∃x, y ∈ [a, b] with |x − y| < δ and |f (x) − f (y)| ≥ ε.

For each n ∈ N let δn = 1/n and choose xn , yn ∈ [a, b] such that |xn − yn | < δn and |f (xn ) − f (yn )| ≥ ε.
Now (yn )n∈N is a bounded sequence in [a, b] so by the Bolzano-Weierstrass
theorem it has a convergent subsequence (yn(k) )k∈N (where n(1) < n(2) <
n(3) < . . .) and a limit y ∈ [a, b] such that yn(k) → y as k → ∞. Then it is also
true that xn(k) → y as k → ∞, because

$$ 0 \le |x_{n(k)} - y| \le |x_{n(k)} - y_{n(k)}| + |y_{n(k)} - y| < \frac{1}{n(k)} + |y_{n(k)} - y| \quad \forall k \in \mathbb{N}, $$

and since both 1/n(k) and |y_{n(k)} − y| tend to zero as k → ∞, |x_{n(k)} − y| → 0 as k → ∞.
If f is continuous then it is also true that f (yn(k) ) → f (y) and f (xn(k) ) →
f (y) as k → ∞. However, this is impossible because

|f (xn(k) ) − f (yn(k) )| ≥ ε > 0 ∀k ∈ N.

We have reached a contradiction, so conclude that every continuous function is


uniformly continuous.

1.2 Banach spaces


We have seen that a convergent sequence is a sequence where the points get
closer and closer to a limit. A Cauchy sequence is subtly different: it is a
sequence where the points get closer and closer to each other.

Definition 1.20. Let (E, k · k) be a normed vector space. A sequence (xn ) in


(E, k · k) is called Cauchy if for every ε > 0 there exists an N ∈ N such that

n, m > N ⇒ kxn − xm k < ε.

Proposition 1.21. Every convergent sequence in a normed vector space is


Cauchy.

Proof. Let (xn ) be a sequence in a normed vector space (E, k · k) that converges
to a limit x ∈ E. Let ε > 0. Since xn → x, there exists an N ∈ N such that

n > N ⇒ kxn − xk < ε/2.

It follows using the triangle inequality that

n, m > N ⇒ kxn − xm k ≤ kxn − xk + kx − xm k < ε/2 + ε/2 = ε.

In general, it is not true that every Cauchy sequence is convergent. Spaces


where Cauchy sequences do converge are very special, and have their own name:
Definition 1.22. A normed vector space (E, k·k) is said to be complete if every
Cauchy sequence (xn ) in E has a limit x ∈ E. A Banach space is a complete
normed vector space.
You have seen one example of a Banach space before: the Banach space R
of real numbers, with norm of x ∈ R given by |x|. The main difference between
real and rational numbers is that the reals are complete and the rationals are
not.
Completeness is important because it allows you to find solutions to equa-
tions without needing to actually write the solution down. For example, imagine
that you want to write down the solution to the equation x2 = 2. We know that
the solution is irrational so can’t be written down on any finite piece of paper.
However, it is possible to write down a sequence that converges to the solution;
for example, the sequence (xn ) defined by

$$ x_1 = 1, \qquad x_{n+1} = \frac{x_n^2 + 2}{2x_n} $$

is known to converge to √2, as does the sequence 1, 1.4, 1.41, 1.414, . . .. Completeness is the thing that tells us these sequences have limits. Similarly, a good
strategy for solving equations in Banach spaces is to start by writing a sequence
of approximate solutions and then to show that it is Cauchy and hence has a
limit.
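As a quick illustration (a sketch of mine, not needed for the theory), the recurrence above can be run in Python; the iterates form a Cauchy sequence whose limit √2 is guaranteed to exist because R is complete:

x = 1.0
for k in range(6):
    x = (x**2 + 2) / (2 * x)       # the recurrence x_{n+1} = (x_n^2 + 2)/(2 x_n)
    print(x, abs(x - 2**0.5))      # the error shrinks very rapidly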
Example 1.23. R3 with norm $\|(x,y,z)\| = \sqrt{x^2 + y^2 + z^2}$ is complete.
To prove this assertion, suppose that (xn ) is a Cauchy sequence in R3 . I
need to find an x ∈ R3 such that xn → x.
Consider the sequence (x1n ) in R formed using the first coordinate of each
vector. We claim that this is also Cauchy. To prove the claim, let ε > 0. Since (xn ) is a Cauchy sequence there exists an N ∈ N such that n, m > N ⇒ kxn − xm k < ε. Then for n, m > N ,

$$ (x_n^1 - x_m^1)^2 \le (x_n^1 - x_m^1)^2 + (x_n^2 - x_m^2)^2 + (x_n^3 - x_m^3)^2 = \|x_n - x_m\|^2 < \varepsilon^2 . $$

Therefore $|x_n^1 - x_m^1| < \varepsilon$, so this sequence is Cauchy. Similarly, the sequences $(x_n^2)$ and $(x_n^3)$ are Cauchy.

Since $(x_n^1)$ is Cauchy it has a limit $x^1 \in \mathbb{R}$, and similarly there exist $x^2, x^3 \in \mathbb{R}$ such that $x_n^2 \to x^2$ and $x_n^3 \to x^3$. Now xn converges to the limit $x = (x^1, x^2, x^3)$, because

$$ \|x_n - x\| = \sqrt{\sum_{i=1}^{3} (x_n^i - x^i)^2} \to \sqrt{0 + 0 + 0} = 0 . $$

So every Cauchy sequence (xn ) has a limit, and R3 is complete.
Example 1.24. C[a, b] with the supremum-norm k · k∞ is complete.
I will prove this using a similar strategy to the last example. Let (fn ) be a
Cauchy sequence in C[a, b]. I must find an f ∈ C[a, b] such that kfn − f k∞ → 0.
Let t be any real number in [a, b] and consider the sequence of complex numbers (fn (t)). I claim that this sequence is Cauchy. To prove this, let ε > 0. Since the sequence (fn ) of functions is Cauchy, I can choose an N ∈ N such that n, m > N ⇒ kfn − fm k∞ < ε. Then for n, m > N and t ∈ [a, b],

$$ |f_n(t) - f_m(t)| \le \sup_{s \in [a,b]} |f_n(s) - f_m(s)| = \|f_n - f_m\|_\infty < \varepsilon . $$

Therefore (fn (t)) is Cauchy.


Since the sequence (fn (t)) is Cauchy and C is complete this sequence has
a limit, which we call f (t). Doing this for every t ∈ [a, b] defines a function
f : [a, b] → C such that fn → f pointwise. We need to show that kfn −f k∞ → 0
(i.e. that fn → f uniformly) and that f ∈ C[a, b] (i.e. that f is continuous).
For uniform convergence, let ε > 0. Since (fn ) is Cauchy there is an N ∈ N such that |fn (t) − fm (t)| < ε/2 for all t ∈ [a, b], m, n > N . Since the function | · | : C → R is continuous, taking the limit as m → ∞ gives

|fn (t) − f (t)| ≤ ε/2 ∀t ∈ [a, b], n > N.

This implies that kfn − f k∞ ≤ ε/2 < ε for n > N . So kfn − f k∞ → 0.


It follows that f is continuous, because the uniform limit of a sequence of
continuous functions is continuous.
Example 1.25. The subspace P [a, b] ⊂ C[a, b] of polynomial functions with norm
k · k∞ is not complete.
To prove this, we give an example of a Cauchy sequence with no limit. Let
fn ∈ P [a, b] be the function

$$ f_n(t) = \sum_{i=0}^{n} \frac{t^i}{i!} . $$

Note that fn is the degree-n Taylor polynomial of the exponential function t 7→ exp(t). By Taylor's theorem, fn → exp pointwise, and in fact fn → exp uniformly on [a, b]. Thus
kfn − exp k∞ → 0 and it follows that fn is a Cauchy sequence. However, the
limit exp of fn is not in P [a, b], so P [a, b] is not complete.

1.3 Inner products
Definition 1.26. Let E be a complex vector space. An inner product on E is
a function h·, ·i : E × E → C satisfying:
(P1) $\langle x, y \rangle = \overline{\langle y, x \rangle}$ ∀x, y ∈ E (skew-symmetry);

(P2) hx, xi ≥ 0, and hx, xi = 0 if and only if x = 0E (positive definiteness);


(P3) hλx + µy, zi = λhx, zi + µhy, zi ∀λ, µ ∈ C, x, y, z ∈ E (linearity in first
slot).
The pair (E, h·, ·i) is called an inner product space in this case.

It follows immediately from the definition that an inner product is anti-linear


in its second slot:
Lemma 1.27. Any inner product on a complex vector space E satisfies

hz, λx + µyi = λ̄hz, xi + µ̄hz, yi ∀λ, µ ∈ C, x, y, z ∈ E.

Proof.

$$\begin{aligned} \langle z, \lambda x + \mu y \rangle &= \overline{\langle \lambda x + \mu y, z \rangle} \\ &= \overline{\lambda \langle x, z \rangle + \mu \langle y, z \rangle} \\ &= \bar\lambda\, \overline{\langle x, z \rangle} + \bar\mu\, \overline{\langle y, z \rangle} \\ &= \bar\lambda \langle z, x \rangle + \bar\mu \langle z, y \rangle . \end{aligned}$$

Lemma 1.28. Any inner product on a complex vector space E satisfies hx, 0E i =
0 and h0E , xi = 0 for all x ∈ E.
Proof. Since 0E = 0E − 0E ,

h0E , xi = h0E − 0E , xi = h0E , xi − h0E , xi = 0,

and similarly hx, 0E i = 0.


Note that one can also define inner products over real vector spaces. The
definition is similar, except that C is replaced by R and complex conjugations
are omitted. The dot product of vectors is an example of a real inner product
on R3 . In this course we will stick to complex inner products, but bear in mind
that there is also a “real” version!
Example 1.29. On Cn , $\langle x, y \rangle = \sum_{j=1}^{n} x_j \bar y_j$ is an inner product (where we denote x = (x1 , x2 , . . . , xn )). Proving this is an exercise!
Example 1.30. On C[a, b], $\langle f, g \rangle = \int_a^b f(t)\,\overline{g(t)}\,dt$ is an inner product, called the L2 inner product.

It should be obvious that this function h·, ·i is skew-symmetric and linear in the first slot. I'll prove that it's positive definite. If f : [a, b] → C is any function, then

$$ \langle f, f \rangle = \int_a^b |f(t)|^2\,dt \ge 0 $$

because |f (t)|2 ≥ 0. Clearly if f = 0 then hf, f i = 0. If f ≠ 0 then there exists an s ∈ [a, b] such that f (s) ≠ 0. Since |f (t)| is continuous there exists a δ > 0 (which we may take to satisfy δ ≤ b − a) such that $|f(t)| > \frac{1}{2}|f(s)|$ for t ∈ I := [a, b] ∩ (s − δ, s + δ). The length of this interval I is greater than or equal to δ. Therefore

$$ \langle f, f \rangle \ge \int_I |f(t)|^2\,dt \ge \int_I \left| \frac{f(s)}{2} \right|^2 dt \ge \frac{\delta\,|f(s)|^2}{4} > 0 . $$

So hf, f i = 0 =⇒ f = 0.
There is one more example to come, but before that we need to present two
important results:
Proposition 1.31 (Cauchy-Schwarz inequality). Let h·, ·i be an inner product on a complex vector space E. Let k · k : E → R be the function $\|x\| = \sqrt{\langle x, x \rangle}$. Then

|hx, yi| ≤ kxkkyk ∀x, y ∈ E.

Note that hx, xi has a square root because it is positive!
Proof. If x = 0E or y = 0E then kxkkyk = 0 by definition and hx, yi = 0 by lemma 1.28.
Suppose then that x ≠ 0E and y ≠ 0E . For any λ ∈ C it holds that

$$\begin{aligned} 0 &\le \langle \lambda x + y, \lambda x + y \rangle \\ &= \lambda \langle x, \lambda x + y \rangle + \langle y, \lambda x + y \rangle \\ &= \lambda\bar\lambda \langle x, x \rangle + \lambda \langle x, y \rangle + \bar\lambda \langle y, x \rangle + \langle y, y \rangle \\ &= |\lambda|^2 \langle x, x \rangle + \langle y, y \rangle + 2\,\mathrm{Re}(\lambda \langle x, y \rangle) . \end{aligned}$$

Therefore

$$ |\lambda|^2 \langle x, x \rangle + \langle y, y \rangle \ge -2\,\mathrm{Re}(\lambda \langle x, y \rangle) \qquad \forall \lambda \in \mathbb{C} . $$
Let us rearrange this by writing λ = ρe^{iθ}, where ρ = |λ|, and dividing through by ρ:

$$ \rho\|x\|^2 + \frac{1}{\rho}\|y\|^2 \ge -2\,\mathrm{Re}(e^{i\theta} \langle x, y \rangle) \qquad \forall \rho > 0,\ \theta \in \mathbb{R} . $$

Remember, this inequality holds for all ρ, θ. If we choose

θ = π − arg(hx, yi)

then the right hand side equals 2|hx, yi|. If we choose

ρ = kyk/kxk

then the left hand side equals 2kxkkyk. Therefore the Cauchy-Schwarz inequality follows from our inequality.

You might be wondering how I knew which values of ρ and θ to choose. The
answer is that they are the values that result in the strongest possible inequality,
i.e. they make the LHS as small as possible and the RHS as large as possible.
This observation might help you to remember the proof!
Proposition 1.32. Let h·, ·i be an inner product on a complex vector space E, and let k · k : E → R be the function $\|x\| = \sqrt{\langle x, x \rangle}$. Then k · k is a norm on E.
Notice that the norm k · k is defined using the inner product analogously to
how the length of a vector in R3 is defined using the dot product.
Proof. Axiom (N1) follows directly from the positivity of the inner product (P2).
Axiom (N2) follows from linearity and anti-linearity in the first and second slots:
$$ \|\lambda x\| = \sqrt{\langle \lambda x, \lambda x \rangle} = \sqrt{\lambda\bar\lambda \langle x, x \rangle} = |\lambda|\,\|x\| . $$

Finally, the triangle inequality (N3) is proved using the Cauchy-Schwarz in-
equality as follows:

kx + yk2 = hx + y, x + yi
= hx, xi + hy, yi + hx, yi + hy, xi
≤ kxk2 + kyk2 + 2kxkkyk
= (kxk + kyk)2 .

This proposition means that inner product spaces are special examples of
normed vector spaces. Thus one can talk about convergent sequences and Cauchy sequences in inner product spaces just as in normed vector spaces.
Note that the proposition implies that on E = Cn , $x \mapsto \big( \sum_{j=1}^{n} |x_j|^2 \big)^{1/2}$ is a norm, as claimed earlier. On C[a, b] we now have two norms: the norm kf k∞ = supt∈[a,b] |f (t)| introduced earlier, and the norm

$$ \|f\|_2 = \left( \int_a^b |f(t)|^2\,dt \right)^{1/2} $$

derived from the L2 inner product. We distinguish these using the subscript
“2” or “∞”; we refer to the former as the supremum norm and the latter as the
L2 norm.
Now we can look at the promised last example of an inner product space.
Example 1.33. Let `2 (N) be the set of sequences (xn )n∈N of complex numbers for which $\sum_{n=1}^{\infty} |x_n|^2$ converges; equivalently,

$$ \ell^2(\mathbb{N}) = \Big\{\, x = (x_1, x_2, x_3, \dots) : \sum_{n=1}^{\infty} |x_n|^2 < \infty \,\Big\} . $$

`2 (N) is a vector space, with addition and scalar multiplication given by:
x + y = (x1 + y1 , x2 + y2 , x3 + y3 , . . .), λx = (λx1 , λx2 , λx3 , . . .).
For these definitions to make sense we must check that if x, y ∈ `2 (N) then
x + y, λx ∈ `2 (N). By the triangle inequality for Cn ,

$$ \Big( \sum_{j=1}^{n} |x_j + y_j|^2 \Big)^{1/2} \le \Big( \sum_{j=1}^{n} |x_j|^2 \Big)^{1/2} + \Big( \sum_{j=1}^{n} |y_j|^2 \Big)^{1/2} \le \Big( \sum_{j=1}^{\infty} |x_j|^2 \Big)^{1/2} + \Big( \sum_{j=1}^{\infty} |y_j|^2 \Big)^{1/2} . $$

This holds for any n ∈ N, so taking the limit n → ∞ gives

$$ \Big( \sum_{j=1}^{\infty} |x_j + y_j|^2 \Big)^{1/2} \le \Big( \sum_{j=1}^{\infty} |x_j|^2 \Big)^{1/2} + \Big( \sum_{j=1}^{\infty} |y_j|^2 \Big)^{1/2} < \infty . $$

Therefore x + y ∈ `2 (N). I leave it as an exercise to show that λx ∈ `2 (N).


Let h·, ·i : `2 (N) × `2 (N) → C be the function

$$ \langle x, y \rangle = \sum_{j=1}^{\infty} x_j \bar y_j . $$

Before doing anything else we should check that this series converges, so that we
know hx, yi is a finite number. For any given n, we know by the Cauchy-Schwarz
inequality for Cn that

$$ \sum_{j=1}^{n} |x_j \bar y_j| \le \Big( \sum_{j=1}^{n} |x_j|^2 \Big)^{1/2} \Big( \sum_{j=1}^{n} |y_j|^2 \Big)^{1/2} \le \Big( \sum_{j=1}^{\infty} |x_j|^2 \Big)^{1/2} \Big( \sum_{j=1}^{\infty} |y_j|^2 \Big)^{1/2} . $$

Therefore, taking the limit as n → ∞,

$$ \sum_{j=1}^{\infty} |x_j \bar y_j| \le \Big( \sum_{j=1}^{\infty} |x_j|^2 \Big)^{1/2} \Big( \sum_{j=1}^{\infty} |y_j|^2 \Big)^{1/2} < \infty . $$

So the series converges (absolutely), and h·, ·i : `2 (N) × `2 (N) → C is a well-defined function.


Finally, I will show that h·, ·i is an inner product. For positivity, note that for all n ∈ N

$$ \sum_{j=1}^{n} |x_j|^2 \ge 0, $$

and hence, taking n → ∞, hx, xi ≥ 0. It can be shown similarly that if xm ≠ 0 for some m ∈ N then hx, xi ≥ |xm |2 > 0, so the only way that hx, xi can equal 0 is if xn = 0 for all n ∈ N. Linearity in the first slot is straightforward: for any n,
$$ \sum_{j=1}^{n} (\lambda x_j + \mu y_j)\,\bar z_j = \lambda \sum_{j=1}^{n} x_j \bar z_j + \mu \sum_{j=1}^{n} y_j \bar z_j , $$

and sending n → ∞ gives hλx + µy, zi = λhx, zi + µhy, zi. Skew-symmetry can
be proved similarly.
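As an informal check (my own sketch, assuming numpy), the truncated sums defining the `2 (N) inner product stabilise quickly for square-summable sequences, and the Cauchy-Schwarz bound used above is visible numerically:

import numpy as np

j = np.arange(1, 10**6 + 1, dtype=float)
x = 1.0 / j              # the sequence (1/n), which lies in l^2(N)
y = 1.0 / (j + 1.0)

inner = np.sum(x * np.conj(y))   # truncated version of <x, y> = sum x_j conj(y_j)
bound = np.sqrt(np.sum(np.abs(x)**2)) * np.sqrt(np.sum(np.abs(y)**2))
print(inner, bound)              # |<x, y>| <= ||x|| ||y||, as proved above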
We have seen that every inner product defines a norm. Now we will discuss
the question of the converse: can every norm be obtained from an inner product?
Proposition 1.34 (Parallelogram identity). Let E be an inner product space,
and let k · k be the associated norm. Then

kx + yk2 + kx − yk2 = 2kxk2 + 2kyk2 ∀x, y ∈ E.

Proof.

kx + yk2 + kx − yk2 = hx + y, x + yi + hx − y, x − yi
= hx, xi + hx, yi + hy, xi + hy, yi + hx, xi − hx, yi − hy, xi + hy, yi
= 2hx, xi + 2hy, yi
= 2kxk2 + 2kyk2

Now consider the space C[0, 1]. Does the supremum norm satisfy the par-
allelogram identity? Let f, g ∈ C[0, 1] be the functions f (t) = 1, g(t) = t.
Then
kf k∞ = 1, kgk∞ = 1, kf + gk∞ = 2, kf − gk∞ = 1.
So kf + gk2∞ + kf − gk2∞ = 5 and 2kf k2∞ + 2kgk2∞ = 4, so k · k∞ does not obey the
parallelogram identity. Therefore it is not obtained from any inner product. So
it’s certainly not true that every norm is obtained from an inner product.
Let’s ask a different question: given any norm, how can we tell whether it
can be obtained from an inner product? The next proposition (which we won’t
prove) answers this question.
Proposition 1.35. Let E be a vector space and let k · k be a norm on E which
satisfies the parallelogram identity. Let
$$ \langle x, y \rangle = \frac{1}{4}\Big( \|x+y\|^2 - \|x-y\|^2 + i\,\|x+iy\|^2 - i\,\|x-iy\|^2 \Big) . $$

Then h·, ·i is an inner product on E and $\|x\| = \sqrt{\langle x, x \rangle}$ ∀x ∈ E.
Thus every norm that satisfies the parallelogram identity comes from an
inner product.
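Here is a small numerical sketch of mine (using the standard inner product on C^4 purely as a test case) confirming that the polarization formula of proposition 1.35 recovers the inner product from its norm:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4) + 1j * rng.normal(size=4)
y = rng.normal(size=4) + 1j * rng.normal(size=4)

def norm(v):
    # the norm induced by <x, y> = sum x_j conj(y_j)
    return np.sqrt(np.sum(np.abs(v)**2))

polar = (norm(x + y)**2 - norm(x - y)**2
         + 1j * norm(x + 1j*y)**2 - 1j * norm(x - 1j*y)**2) / 4
print(polar, np.sum(x * np.conj(y)))   # the two values agree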
Since inner product spaces carry norms one can speak of convergence of
sequences in inner product spaces, and continuous functions on inner product
spaces. The next lemma is a useful result that utilises both of these ideas:
Lemma 1.36. Let E be an inner product space and let y ∈ E. Then x 7→ hy, xi
and x 7→ hx, yi are both continuous functions on E, i.e. for every x ∈ E and
every sequence (xn ) that converges to x,

hy, xn i → hy, xi and hxn , yi → hx, yi.

Proof. Let x ∈ E be any vector and let (xn ) be any sequence that converges to x. Then,
by the Cauchy-Schwarz inequality,

0 ≤ |hy, xn i − hy, xi| = |hy, xn − xi| ≤ kyk kxn − xk.

As n → ∞, kxn − xk → 0 and therefore |hy, xn i − hy, xi| → 0. It follows that


hy, xn i → hy, xi and that hxn , yi → hx, yi.

1.4 Hilbert spaces


Definition 1.37. An inner product space which is complete with respect to its induced norm is called a Hilbert space.
Obviously, any Hilbert space is also a Banach space (but not every Banach
space is a Hilbert space).
Example 1.38. Cn with the inner product introduced above is a Hilbert space.
Proving this is part of your exercise sheet!
Example 1.39. `2 (N) is a Hilbert space.
To prove this I must show that `2 (N) is complete. Let (xn ) be a Cauchy
sequence in `2 (N). The n-th element is written xn = (x1n , x2n , x3n , . . .) (i.e. I use
subscripts to denote different points in the sequence and superscripts to denote
the components of the vector).
For each i ∈ N, (xin ) is a Cauchy sequence in C. I leave it as an exercise to
show this – you need to use the fact that (xn ) is a Cauchy sequence and the
inequality

$$ |x_n^i - x_m^i|^2 \le \sum_{j=1}^{\infty} |x_n^j - x_m^j|^2 = \|x_n - x_m\|^2 . $$

Since C is complete, there is a limit xi ∈ C such that xin → xi as n → ∞. Now


let
x = (x1 , x2 , x3 , . . .),
i.e. x is the vector formed from all of the limits xi . We claim that x ∈ `2 (N)
and that xn → x.
Let ε > 0. Since (xn ) is Cauchy, there exists an N ∈ N such that

$$ \sum_{i=1}^{k} |x_n^i - x_m^i|^2 \le \|x_n - x_m\|^2 < \frac{\varepsilon^2}{2} $$

for all n, m > N and any k ∈ N. Taking the limit m → ∞ and using the fact
that xim → xi gives
k
X 2
|xin − xi |2 ≤
i=1
2

for all n > N and any k ∈ N. Therefore the sum on the left converges, so
xn − x ∈ `2 (N) for all n > N . Moreover,


$$ \Big( \sum_{i=1}^{\infty} |x_n^i - x^i|^2 \Big)^{1/2} \le \frac{\varepsilon}{\sqrt 2} < \varepsilon \tag{1} $$

for all n > N . Since ε was arbitrary, this inequality shows that kxn − xk → 0
as n → ∞.
It remains to show that x ∈ `2 (N). Thankfully, most of the work is already
done. It follows from (1) that kx − xn k is finite, and hence that x − xn ∈ `2 (N)
for some particular value of n. Therefore x = (x − xn ) + xn is a sum of two
elements of `2 (N); since `2 (N) is a vector space, x ∈ `2 (N).
Example 1.40. (C[a, b], k · k2 ) is not a Hilbert space. I will show this in the case
a = −1, b = 1, but a similar argument would work in any other case.
For any n ∈ N, let fn : [−1, 1] → C be the function,

$$ f_n(t) = \begin{cases} 0 & -1 \le t < 0 \\ nt & 0 \le t < 1/n \\ 1 & 1/n \le t \le 1 . \end{cases} $$

This function is continuous (draw a graph!). Moreover, I will now show that
the sequence (fn ) in C[−1, 1] is Cauchy with respect to the norm k · k2 .
Let ε > 0. Choose N ∈ N such that N > 1/ε2 . Then for all n, m > N ,

$$ \|f_n - f_m\|_2^2 = \int_{-1}^{1} |f_n(t) - f_m(t)|^2\,dt \le \int_{0}^{1/N} |f_n(t) - f_m(t)|^2\,dt \le \int_{0}^{1/N} 1\,dt = \frac{1}{N} . $$

Therefore kfn − fm k2 ≤ 1/√N < ε.
Despite the fact that (fn ) is Cauchy, it does not converge to a limit in
C[−1, 1]. I will prove this by contradiction: suppose that there is a function
f ∈ C[−1, 1] such that kfn − f k2 → 0. Then
$$ \|f_n - f\|_2^2 \ge \int_{-1}^{0} |f_n(t) - f(t)|^2\,dt = \int_{-1}^{0} |f(t)|^2\,dt \ge 0 . $$
Since kfn − f k2 → 0 as n → ∞, it must be that $\int_{-1}^{0} |f(t)|^2\,dt = 0$. Since f is continuous, this implies that f (t) = 0 ∀t ∈ [−1, 0]. Similarly, for any ε > 0 and any n ∈ N such that 1/n < ε,
$$ \|f_n - f\|_2^2 \ge \int_{\varepsilon}^{1} |f_n(t) - f(t)|^2\,dt = \int_{\varepsilon}^{1} |1 - f(t)|^2\,dt \ge 0 . $$

It follows as above that $\int_{\varepsilon}^{1} |1 - f(t)|^2\,dt = 0$, and hence that f (t) = 1 for all t ∈ [ε, 1]. Since this is true for every ε > 0, f (t) = 1 for all t ∈ (0, 1].
Therefore the function f is not continuous at 0, contradicting our assumption that f ∈ C[−1, 1].

Although (C[a, b], k · k2 ) is not a Hilbert space, there is a Hilbert space L2 [a, b] such that C[a, b] is a subspace of L2 [a, b]. Regarded as a subspace, C[a, b] is "dense" in L2 [a, b], meaning that every element of L2 [a, b] is a limit of a sequence in C[a, b]. The space L2 [a, b] can be constructed using a process of "completion", or it can be constructed using "Measure Theory". In this course we will not study L2 [a, b] in any detail.

2 Orthonormal bases and the Fourier series


2.1 Bases
There is more than one way to define a basis in a Hilbert space. Here is the
first definition, which actually applies in any kind of vector space:
Definition 2.1. Let E be a vector space and let S be a (possibly infinite)
subset. The span of S is the set of all finite linear combinations of vectors in S:
$$ \operatorname{sp} S = \Big\{\, x \in E : x = \sum_{i=1}^{n} \lambda_i v_i \ \text{for some}\ \lambda_1, \dots, \lambda_n \in \mathbb{C},\ v_1, \dots, v_n \in S \,\Big\} $$

We say that S is linearly independent if for all finite subsets {x1 , . . . , xn } ⊆ S


of distinct vectors xi ,
$$ \forall \lambda_1, \dots, \lambda_n \in \mathbb{C}, \qquad \sum_{i=1}^{n} \lambda_i x_i = 0_E \implies \lambda_1 = \dots = \lambda_n = 0 . $$

A Hamel basis (or algebraic basis) for E is a linearly independent set S ⊆ E such that sp S = E.
Notice that this definition only allows for finite linear combinations. This is because talking about infinite sums like $\sum_{i=1}^{\infty} \lambda_i x_i$ is dangerous – the sum may not converge! We'll see a way to get around that problem later.
Example 2.2. Consider the following vectors in `2 (N):

e1 = (1, 0, 0, 0, . . .)
e2 = (0, 1, 0, 0, . . .)
..
.

Each of these vectors is in `2 (N) because the sum of the squares of its coefficients
converges to 1. Let S = {ei : i ∈ N}. I claim that the set S is linearly inde-
pendent. Let {ei1 , . . . , ein } be any finite subset, where i1 , . . . in are all distinct.
Suppose that
λi1 ei1 + . . . + λin ein = (0, 0, 0, . . .)
By comparing the i1 -th coefficient on the LHS and RHS, we see that λi1 = 0.
Similarly, λi2 = . . . = λin = 0. Therefore S is linearly independent.

What is the span of this set? Let c0 (N) be the set of sequences which are eventually zero:

c0 (N) = {(x1 , x2 , x3 , . . .) : ∃N ∈ N such that n > N =⇒ xn = 0}

I claim that sp S = c0 (N). First I’ll show that sp S ⊆ c0 (N). Let x ∈ sp S. Then

x = λi1 ei1 + . . . + λin ein

for some coefficients λi1 , . . . , λin and vectors ei1 , . . . ein ∈ S. Let N = max{i1 , . . . , in }.
Then for m > N , xm = 0, so x is eventually zero. Now I'll show
that c0 (N) ⊂ sp S. Let x ∈ c0 (N). Then x is of the form

x = (x1 , x2 , . . . , xN , 0, 0, . . .)
for some N ∈ N. It follows that $x = \sum_{i=1}^{N} x_i e_i$, so x ∈ sp S.
Note that sp S ≠ `2 (N). For example, the vector x whose n-th entry is xn = 1/n belongs to `2 (N) (since $\sum_{n=1}^{\infty} \frac{1}{n^2}$ converges), but x is NOT a finite linear combination of the ei 's. (It can be proved that `2 (N) has a Hamel basis – however, the basis is uncountable and the proof does not tell you how to write it down.)
The preceding example illustrates the problem with using a Hamel basis: even though the set S looked like it could be a basis for `2 (N), it turned out
not to be one. We will solve this problem by introducing another (inequivalent)
definition of a basis, one which directly exploits the Hilbert space structure.
Definition 2.3. Let E be an inner product space. Two vectors x, y are called
orthogonal if hx, yi = 0. An orthonormal set in E is a subset S ⊂ E consisting
of pairwise-orthogonal unit vectors. In other words, for all x, y ∈ S,
$$ \langle x, y \rangle = \begin{cases} 1 & \text{if } x = y \\ 0 & \text{if } x \ne y . \end{cases} $$

An orthonormal sequence is a sequence (en ) in E such that


$$ \langle e_n, e_m \rangle = \begin{cases} 1 & \text{if } n = m \\ 0 & \text{if } n \ne m . \end{cases} $$

(This condition is sometimes abbreviated to hen , em i = δnm .)

Clearly, if (en ) is an orthonormal sequence then {en : n ∈ N} is an orthonor-


mal set.
Proposition 2.4. Any orthonormal subset of an inner product space is linearly
independent.

Proof. Let S be an orthonormal set and let x1 , . . . xn be n distinct vectors in S.
Suppose that

$$ \sum_{i=1}^{n} \lambda_i x_i = 0_E $$

for some λ1 , . . . , λn ∈ C. By taking the inner product with x1 , this equation


implies

$$ \sum_{i=1}^{n} \lambda_i \langle x_i, x_1 \rangle = 0 . $$

Since the set is orthonormal and the xi ’s are distinct, hxi , x1 i equals 1 when
i = 1 and 0 otherwise. Therefore the equation simplifies to λ1 = 0. Similarly, it
can be shown that λ2 = . . . = λn = 0. So S is linearly independent.
For example, the set S in example 2.2 is an orthonormal set, as you may
check for yourself. This proposition confirms that the set S in example 2.2 is
linearly independent.
Definition 2.5. An orthonormal sequence (en ) in an inner product space is
called a Hilbert basis, or countable orthonormal basis, if for every x ∈ E, there
exists a sequence (λn ) in C such that

$$ \lim_{n\to\infty} \sum_{i=1}^{n} \lambda_i e_i = x . $$
We often abbreviate the last condition to $\sum_{i=1}^{\infty} \lambda_i e_i = x$. Notice that the definition refers to the inner product in two places: first one needs an inner product to talk about vectors being "orthonormal", and second, one needs the induced norm to talk about convergence. If we did not have a norm the expression $\sum_{i=1}^{\infty} \lambda_i e_i$ would be meaningless.
Definition 2.6. A Hilbert space is called separable if it admits a countable
orthonormal basis.
Example 2.7. `2 (N) is a separable Hilbert space.
We will show that the orthonormal sequence e1 , e2 , . . . in example 2.2 is in fact a Hilbert basis. Let x ∈ `2 (N), and write x = (x1 , x2 , x3 , . . .) where $\sum_{i=1}^{\infty} |x_i|^2 < \infty$. We claim that

$$ x = \lim_{n\to\infty} \sum_{i=1}^{n} x_i e_i . $$

To prove this we must show that $\|x - \sum_{i=1}^{n} x_i e_i\| \to 0$ as n → ∞. Now

$$\begin{aligned} \Big\| x - \sum_{i=1}^{n} x_i e_i \Big\|^2 &= \|(0, \dots, 0, x_{n+1}, x_{n+2}, \dots)\|^2 \\ &= \sum_{i=n+1}^{\infty} |x_i|^2 \\ &= \sum_{i=1}^{\infty} |x_i|^2 - \sum_{i=1}^{n} |x_i|^2 . \end{aligned}$$

As n → ∞ the last expression here tends to zero, so $x = \sum_{i=1}^{\infty} x_i e_i$ as claimed.
Now we will explore some properties of countable orthonormal sets and bases.
The next lemma generalises the following simple observation: if ~x = (x1 , x2 , x3 )
is a vector in R3 and ~e1 = (1, 0, 0), ~e2 = (0, 1, 0), ~e3 = (0, 0, 1) is the standard
basis, then xi = ~x.~ei for i = 1, 2, 3.
Lemma 2.8. Let (en )n∈N be an orthonormal sequence in an inner product space E and suppose that x ∈ E and (λn ) is a sequence in C such that

$$ x = \sum_{i=1}^{\infty} \lambda_i e_i . $$

Then λi = hx, ei i for all i ∈ N.


Proof. Since by lemma 1.36 h·, ei i is continuous, for any i ∈ N we have

$$ \langle x, e_i \rangle = \lim_{n\to\infty} \Big\langle \sum_{j=1}^{n} \lambda_j e_j,\ e_i \Big\rangle = \lim_{n\to\infty} \sum_{j=1}^{n} \lambda_j \langle e_j, e_i \rangle . $$

Since the vectors (en )n∈N are orthonormal,

$$ \sum_{j=1}^{n} \lambda_j \langle e_j, e_i \rangle = \begin{cases} 0 & n < i \\ \lambda_i & n \ge i . \end{cases} $$

Therefore $\lim_{n\to\infty} \sum_{j=1}^{n} \lambda_j \langle e_j, e_i \rangle = \lambda_i$.
Corollary 2.9. An orthonormal sequence (en ) in an inner product space E is
a Hilbert basis if and only if, for every x ∈ E,

$$ x = \lim_{n\to\infty} \sum_{i=1}^{n} \langle x, e_i \rangle e_i . $$

Proof. If (en ) is a Hilbert basis then, for any x ∈ E, there is a sequence (λn ) in C such that $x = \sum_{i=1}^{\infty} \lambda_i e_i$. By lemma 2.8, λi = hx, ei i, so $x = \sum_{i=1}^{\infty} \langle x, e_i \rangle e_i$. Conversely, if $x = \sum_{i=1}^{\infty} \langle x, e_i \rangle e_i$ then $x = \sum_{i=1}^{\infty} \lambda_i e_i$ with λi = hx, ei i.

Corollary 2.10 (Parseval’s identity). Let (en )n∈N be a Hilbert basis for an
inner product space E. Then

$$ \langle x, y \rangle = \sum_{n=1}^{\infty} \langle x, e_n \rangle \langle e_n, y \rangle \qquad \forall x, y \in E . $$
Proof. By corollary 2.9, $x = \lim_{n\to\infty} \sum_{j=1}^{n} \langle x, e_j \rangle e_j$. Therefore, by lemma 1.36,

$$ \langle x, y \rangle = \lim_{n\to\infty} \Big\langle \sum_{j=1}^{n} \langle x, e_j \rangle e_j,\ y \Big\rangle = \lim_{n\to\infty} \sum_{j=1}^{n} \langle x, e_j \rangle \langle e_j, y \rangle . $$

Parseval’s identity is very powerful. It allows us to prove that there is es-


sentially only one example of a separable Hilbert space!
Theorem 2.11. Any infinite-dimensional separable Hilbert space is isometri-
cally isomorphic to `2 (N).
Proof. Let E be a Hilbert space with orthonormal basis (en )n∈N . Let T : E →
`2 (N) be the map

$$ T(x) = \big( \langle x, e_1 \rangle, \langle x, e_2 \rangle, \langle x, e_3 \rangle, \dots \big) \qquad \forall x \in E . $$

I will show that T is an isometric isomorphism. Let x ∈ E. By Parseval’s


identity,

$$ \sum_{n=1}^{\infty} |\langle x, e_n \rangle|^2 = \|x\|_E^2 . $$

This shows that T (x) ∈ `2 (N). Since $\|T(x)\|_{\ell^2(\mathbb{N})}^2 = \sum_{n=1}^{\infty} |\langle x, e_n \rangle|^2$, this identity also shows that kT (x)k`2 (N) = kxkE , i.e. T preserves distances. It follows from
my definition of T that T is linear.
The kernel of T is the zero subspace of E, because T (x) = 0`2 (N) =⇒ kxkE = kT (x)k`2 (N) = 0 =⇒ x = 0E . Therefore T is injective.
It remains to show that T is surjective. Let y = (y1 , y2 , . . .) ∈ `2 (N). Let

$$ x_n = \sum_{j=1}^{n} y_j e_j . $$

Assuming that this sequence converges to a limit x ∈ E, we will have (by lemma
2.8) that yj = hx, ej i, and hence that T (x) = y.
In order to show that (xn )n∈N is convergent, we first show that it’s Cauchy.
For n ≥ m,

$$ \|x_n - x_m\|_E^2 = \Big\| \sum_{j=m+1}^{n} y_j e_j \Big\|_E^2 = \sum_{j=m+1}^{n} |y_j|^2 \le \sum_{j=m+1}^{\infty} |y_j|^2 = \sum_{j=1}^{\infty} |y_j|^2 - \sum_{j=1}^{m} |y_j|^2 . $$

The right hand side of this equality tends to 0 as m → ∞ because y ∈ `2 (N). It
follows that the sequence (xn )n∈N is Cauchy. Since E is complete, this sequence
converges to a limit x ∈ E, and, as explained above, T (x) = y. So T is
surjective.
Here is one more important example of an orthonormal set:
Example 2.12. Consider the space C[−π, π], with inner product

$$ \langle f, g \rangle = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t)\,\overline{g(t)}\,dt . $$

(here I introduced a factor of 1/2π to make various formulae simpler). Let

S = {t 7→ exp(int) : n ∈ Z}.

I claim that S is an orthonormal set in C[−π, π]. To prove this, writing en for the function t 7→ exp(int), I need to calculate hen , em i. It is useful first to note the following: for any n ∈ Z such that n ≠ 0,
$$ \int_{-\pi}^{\pi} \exp(int)\,dt = \int_{-\pi}^{\pi} (\cos nt + i \sin nt)\,dt = \frac{1}{n} \big[ \sin nt - i \cos nt \big]_{t=-\pi}^{\pi} = 0, $$

since cos nt and sin nt are periodic. On the other hand, if n = 0,

$$ \int_{-\pi}^{\pi} \exp(0it)\,dt = \int_{-\pi}^{\pi} 1\,dt = 2\pi . $$

Summarising our results, we see that

$$ \frac{1}{2\pi} \int_{-\pi}^{\pi} \exp(int)\,dt = \begin{cases} 1 & n = 0 \\ 0 & n \ne 0 . \end{cases} $$

It follows that

$$ \langle e_n, e_m \rangle = \frac{1}{2\pi} \int_{-\pi}^{\pi} \exp(int)\,\overline{\exp(imt)}\,dt = \frac{1}{2\pi} \int_{-\pi}^{\pi} \exp(i(n-m)t)\,dt = \begin{cases} 1 & n = m \\ 0 & n \ne m . \end{cases} $$

I will prove later in this course that (en )n∈Z is not just an orthonormal set but is in fact a Hilbert basis for C[−π, π].
Definition 2.13. Given any function f ∈ C[−π, π] the Fourier coefficients of f are the complex numbers

$$ \hat f(n) = \langle f, e_n \rangle = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t) \exp(-int)\,dt \qquad \forall n \in \mathbb{Z} . $$

(Here the inner product and the functions en are as in the previous example.)
The Fourier series of f is

$$ \sum_{j=-\infty}^{\infty} \langle f, e_j \rangle e_j = \sum_{j=-\infty}^{\infty} \hat f(j) \exp(ijt) . $$

By corollary 2.9, this series converges to f with respect to the norm k · k2 if and only if (en )n∈Z is a Hilbert basis.
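The Fourier coefficients are easy to approximate numerically; here is a sketch of mine (numpy assumed) that does this by a Riemann sum for f (t) = t, whose coefficients are known to be fˆ(n) = i(−1)^n /n for n ≠ 0:

import numpy as np

def fourier_coefficient(f, n, m=20000):
    # approximates (1/2pi) * integral of f(t) exp(-int) over [-pi, pi]
    t = np.linspace(-np.pi, np.pi, m, endpoint=False)
    return np.mean(f(t) * np.exp(-1j * n * t))   # mean = (1/2pi) * Riemann sum

f = lambda t: t
for n in [1, 2, 3]:
    print(n, fourier_coefficient(f, n), 1j * (-1)**n / n)  # numerical vs exact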

2.2 Finite-dimensional projections


In this section we will investigate the following useful construction:
Definition 2.14. Let S = {e1 , e2 , . . . , en } be a set of n orthonormal vectors
in an inner product space. The projection defined by S is the linear map PS :
E → E defined by

$$ P_S(x) = \sum_{j=1}^{n} \langle x, e_j \rangle e_j . $$

(Exercise: check that this map is linear!).


In the case that E is finite-dimensional and the vectors in S form a basis,
PS : E → E is just the identity map, by corollary 2.9. More generally, PS (x) ∈ sp S, and PS (x) = x if x ∈ sp S.
You may have encountered maps similar to PS in 3-dimensional geometry.
Recall that a line through the origin in R3 has a direction vector v ∈ R3 (which
is non-zero). The projection of any vector x ∈ R3 onto a line is given by

x 7→ (x.e)e,

where $e := v/\sqrt{v \cdot v}$. This is similar to PS in the case that S contains just one vector. Similarly, given a pair e1 , e2 of orthonormal vectors in R3 , the map

x 7→ (x.e1 )e1 + (x.e2 )e2

is the projection onto the plane spanned by e1 and e2 .
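The following Python sketch (an illustration of mine, in R^3 with the standard inner product) computes PS and checks that x − PS (x) is perpendicular to the spanning vectors:

import numpy as np

def project(x, S):
    # P_S(x) = sum_j <x, e_j> e_j; note np.vdot(e, x) equals <x, e> in our convention
    return sum(np.vdot(e, x) * e for e in S)

e1 = np.array([1.0, 0.0, 0.0])
e2 = np.array([0.0, 1.0, 0.0])
x = np.array([1.0, 2.0, 3.0])

p = project(x, [e1, e2])
print(p)                                        # (1, 2, 0): projection onto the plane
print(np.vdot(e1, x - p), np.vdot(e2, x - p))   # both 0: x - P_S(x) is perpendicular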


Two vectors x, y in an inner product space are called perpendicular (i.e. orthogonal) if hx, yi = 0.
Lemma 2.15 (Pythagoras). If x and y are two perpendicular vectors in an inner product space, then kx + yk2 = kxk2 + kyk2 . (To prove it, expand kx + yk2 = hx + y, x + yi and use hx, yi = hy, xi = 0.)
Proposition 2.16. Let E be an inner product space and let S ⊂ E be a finite
orthonormal set. Then
(a) x − PS (x) is perpendicular to every vector y ∈ sp S, and
(b) PS (x) is the closest point in sp S to x, i.e. ∀y ∈ sp S,

kx − PS (x)k ≤ kx − yk, and kx − PS (x)k = kx − yk ⇔ y = PS (x).

Proof. First I show that x − PS (x) is perpendicular to y ∈ sp S. Let e1 , . . . , en be the elements of S. By corollary 2.9, y = PS (y). By direct calculation,

$$\begin{aligned} \langle P_S(y), x - P_S(x) \rangle &= \Big\langle \sum_{i=1}^{n} \langle y, e_i \rangle e_i,\ x - \sum_{j=1}^{n} \langle x, e_j \rangle e_j \Big\rangle \\ &= \sum_{i=1}^{n} \langle y, e_i \rangle \langle e_i, x \rangle - \sum_{i=1}^{n} \sum_{j=1}^{n} \langle y, e_i \rangle \overline{\langle x, e_j \rangle} \langle e_i, e_j \rangle \\ &= \sum_{i=1}^{n} \langle y, e_i \rangle \langle e_i, x \rangle - \sum_{i=1}^{n} \langle y, e_i \rangle \overline{\langle x, e_i \rangle} \\ &= 0, \end{aligned}$$

since $\langle e_i, x \rangle = \overline{\langle x, e_i \rangle}$.

Now I show that PS (x) is the closest point in sp S to x. Let y ∈ sp S. Then PS (x) − y ∈ sp S, so by part (a) and the Pythagoras lemma,

kx − yk2 = kx − PS (x) + PS (x) − yk2 = kx − PS (x)k2 + kPS (x) − yk2 .

Since kPS (x) − yk2 ≥ 0, kx − yk2 ≥ kx − PS (x)k2 . Moreover kx − yk2 =


kx − PS (x)k2 ⇔ kPS (x) − yk2 = 0 ⇔ y = PS (x).
Corollary 2.17. If S and S 0 are two finite orthonormal subsets of an inner product space E such that sp S = sp S 0 , then PS (x) = PS 0 (x) ∀x ∈ E.
Proof. By the previous proposition, kPS (x)−xk ≤ kPS 0 (x)−xk because PS 0 (x) ∈
sp S. Similarly, kPS 0 (x) − xk ≤ kPS (x) − xk because PS (x) ∈ sp S 0 . Therefore
kPS 0 (x) − xk = kPS (x) − xk, and, by the previous proposition, it must be that
PS (x) = PS 0 (x).
This corollary shows that the operator PS does not depend directly on the
orthonormal set S, but only on the finite-dimensional subspace F = sp S. There-
fore we refer to PS as the projection onto the subspace F , and often write it as
PF . The problem now arises: given a finite-dimensional subspace F of an inner product space, how do we find the projection operator onto F ? To do so, we
need to find an orthonormal basis S = {e1 , e2 , . . . , en }, and calculate the oper-
ator PS . Finding a basis for a finite-dimensional vector space is something you
have hopefully learnt how to do in the past, but you may not have learnt how
to find an orthonormal basis. There is an algorithm, called the Gram-Schmidt
process that takes a basis {f1 , f2 , . . . , fn } for an n-dimensional vector space F
and gives you an orthonormal basis {e1 , e2 , . . . , en }. Here’s how it works:

Step 1. Set e1 = f1 /kf1 k.


Now repeat the following. Suppose that you have found vectors {e1 , e2 , . . . , ek }
where 1 ≤ k ≤ n − 1. To find ek+1 :
Step 2. Set $g_{k+1} = f_{k+1} - \sum_{i=1}^{k} \langle f_{k+1}, e_i \rangle e_i$.
Step 3. Set ek+1 = gk+1 /kgk+1 k.

At the end of the process you will have an orthonormal basis {e1 , e2 , . . . , en }.
Note that steps 1 and 3 guarantee that hek , ek i = 1, and step 2 guarantees that
hej , ek i = 0 if j ≠ k. Note that in step 2 you are calculating fk+1 − PSk (fk+1 ), where Sk = {e1 , e2 , . . . , ek }. The vector gk+1 that you calculate in step 2 is always non-zero, because fk+1 ∉ sp{e1 , e2 , . . . , ek } = sp{f1 , f2 , . . . , fk }.
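Here is the algorithm written out as a short Python sketch of mine (for vectors in R^n or C^n with the standard inner product, numpy assumed):

import numpy as np

def gram_schmidt(F):
    # F: list of linearly independent vectors; returns an orthonormal list E
    E = []
    for f in F:
        g = f - sum(np.vdot(e, f) * e for e in E)   # step 2: g = f - P_S(f)
        E.append(g / np.linalg.norm(g))             # steps 1 and 3: normalise
    return E

F = [np.array([1.0, 1.0, 0.0]),
     np.array([1.0, 0.0, 1.0]),
     np.array([0.0, 1.0, 1.0])]
E = gram_schmidt(F)
print(np.round([[np.vdot(a, b) for b in E] for a in E], 10))  # the identity matrix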
I conclude this section by showing two useful things that you can prove using finite-dimensional projections. First, I'll show you a new way to prove the
Cauchy-Schwarz inequality.
Let x, y be two vectors in an inner product space E. We know the Cauchy-Schwarz inequality is easily proved if x or y is zero, so suppose without loss of generality that y ≠ 0E . Define e = y/kyk. Then S = {e} is an orthonormal set (because he, ei = 1). If we let u = PS (x) = hx, ei e and v = x − u then
u and v are orthogonal (prove this, either by calculating hu, vi directly or by
using the previous proposition). It follows that

kxk2 = kuk2 + kvk2 ≥ kuk2 .

The Cauchy-Schwarz inequality follows almost immediately, because


$$ \|u\|^2 = |\langle x, e \rangle|^2 = \Big| \Big\langle x, \frac{y}{\|y\|} \Big\rangle \Big|^2 = \frac{|\langle x, y \rangle|^2}{\|y\|^2} . $$

Note that this proof gives some geometrical insight: we have an equality |hx, yi| =
kxkkyk if and only if x = λy or y = λx for some λ ∈ C, i.e. equality holds in the
Cauchy-Schwarz inequality if and only if x and y are linearly dependent.
Using a similar argument, one can prove a more powerful inequality:
Proposition 2.18 (Bessel’s inequality). Let (en )n∈N be an orthonormal se-
quence in an inner product space. Then
Pn Pn
(a) For every n ∈ N, j=1 |hx, ej i|2 ≤ kxk2 , and j=1 |hx, ej i|2 = kxk2 if and
only if x ∈ sp{e1 , . . . , en }.
P∞ 2 2
(b) j=1 |hx, ej i| ≤ kxk .

Proof. For the first part, let Sn = {e1 , . . . , en }. Then by part (a) of proposition 2.16, PSn (x) is perpendicular to x − PSn (x) (since PSn (x) ∈ sp Sn ). So, by the Pythagoras lemma,

kxk2 = kx − PSn (x) + PSn (x)k2 = kx − PSn (x)k2 + kPSn (x)k2 .

Therefore kxk2 ≥ kPSn (x)k2 , and kxk2 = kPSn (x)k2 ⇔ x = PSn (x). If x = PSn (x) then x ∈ sp Sn , and conversely, if x ∈ sp Sn then PSn (x) = x. By direct calculation, one finds that $\|P_{S_n}(x)\|^2 = \sum_{j=1}^{n} |\langle x, e_j \rangle|^2$. This completes the proof of the first part.
The second part follows from the first part by taking the limit n → ∞.

2.3 Fejér’s theorem
In this section we will take our first steps towards our goal of proving that the
Fourier series for a function f ∈ C[−π, π] converges to f . To begin with, we restrict attention to a slightly smaller set of functions.

Definition 2.19. A function f : R → C is called 2π-periodic if f (t + 2π) = f (t)


for all t ∈ R. The space of 2π-periodic continuous functions is the vector space

C(R; 2π) = {f : R → C : f is continuous, and f (t + 2π) = f (t) ∀t ∈ R}.

Let C= [−π, π] = {f ∈ C[−π, π] : f (−π) = f (π)}. Then C= [−π, π] is a


subspace of C[−π, π] (check this!). Given any function f ∈ C= [−π, π], we can
construct a 2π-periodic function g ∈ C(R; 2π) by setting

g(t) = f (t − 2mπ) if (2m − 1)π < t ≤ (2m + 1)π for m ∈ Z.

This g is continuous because f (π) = f (−π). Conversely, given the function g


we can recover f simply by restricting g to the interval [−π, π]. Thus there
is a bijection from C= [−π, π] to C(R; 2π); in fact these two vector spaces are
isomorphic.
In this subsection we will study convergence of the Fourier series for 2π-
periodic functions, and later we will go back to look at functions in C[a, b].
We won’t actually study the Fourier series directly, but instead will study
its Cesàro means.
Definition 2.20. Let x0 , x1 , x2 , . . . be a sequence in a complex vector space, and consider the associated series $\sum_{n=0}^{\infty} x_n$. The partial sums of this series are

$$ s_n := \sum_{i=0}^{n} x_i, \qquad n \in \mathbb{N} \cup \{0\} . $$

The Cesàro means of the series are

$$ \sigma_n := \frac{1}{n+1} \sum_{i=0}^{n} s_i, \qquad n \in \mathbb{N} \cup \{0\} . $$

In other words, the nth Cesàro mean is the mean (average) value of the first
n + 1 partial sums. Recall that the series is said to converge if there is a limit s
such that sn → s (with respect to some norm). It can be shown (and we won’t
do so here) that if sn → s then also σn → s. However, it can happen that the
Cesàro means converge, even when the partial sums do not.
Example 2.21. Let (xn ) be the sequence in C given by xn = (−1)n for n ∈ N ∪ {0}. Then the partial sums of the associated series are

$$ s_n = \sum_{i=0}^{n} (-1)^i = \begin{cases} 0 & \text{if } n \text{ is odd} \\ 1 & \text{if } n \text{ is even.} \end{cases} $$

Clearly, these don't converge. The Cesàro means are

$$ \sigma_n = \begin{cases} \dfrac{1}{2} & \text{if } n \text{ is odd} \\ \dfrac{n+2}{2(n+1)} & \text{if } n \text{ is even.} \end{cases} $$

Clearly σn → 1/2 as n → ∞.
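This example is easy to reproduce numerically; the following Python sketch (mine) computes the partial sums and Cesàro means side by side:

import numpy as np

x = np.array([(-1)**n for n in range(20)], dtype=float)
s = np.cumsum(x)                                   # partial sums s_0, s_1, ...
sigma = np.cumsum(s) / np.arange(1, len(s) + 1)    # Cesaro means sigma_n
print(s[-4:])       # keeps oscillating between 0 and 1
print(sigma[-4:])   # settles towards 1/2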
The partial sums of the Fourier series of a 2π-periodic continuous function f are

$$ s_n(f)(t) = \sum_{j=-n}^{n} \hat f(j) \exp(ijt) . $$

The Cesàro means are

$$ \sigma_n(f)(t) = \frac{1}{n+1} \sum_{k=0}^{n} s_k(f)(t) . $$

In general, the partial sums of the Fourier series do not converge to f pointwise.
However, it can be proved (though we won't prove it here) that the partial sums do
converge if f is a smooth function. We will only prove that the Cesàro means
of f converge to f . First, we explore a few different ways of writing the partial
sums.

Proposition 2.22. Let f be a 2π-periodic continuous function. Then

$$ \sigma_n(f)(t) = \sum_{j=-n}^{n} \Big( 1 - \frac{|j|}{n+1} \Big) \hat f(j) \exp(ijt) . $$

Proof. From the definitions,

$$ \sigma_n(f)(t) = \frac{1}{n+1} \sum_{k=0}^{n} \sum_{j=-k}^{k} \hat f(j) \exp(ijt) . $$

The values of j and k summed over form a triangle in the j, k-plane defined by
−k ≤ j ≤ k and 0 ≤ k ≤ n. For a given j-coordinate, there are n + 1 − |j| points
in this triangle, whose k-coordinates are k = |j|, |j| + 1, . . . , n. Therefore, the
total coefficient of fˆ(j) exp(ijt) in the sum is (n + 1 − |j|)/(n + 1) = 1 − |j|/(n +
1).

For our next way of re-writing the Cesàro means, we need some more termi-
nology.
Definition 2.23. Let f, g be two 2π-periodic continuous functions. The convolution of f and g is the function f ∗ g such that

$$ f * g(t) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t-s)\,g(s)\,ds . $$

Note that f ∗ g is also 2π-periodic and continuous.
Definition 2.24. The Fejér kernel is the sequence of 2π-periodic functions Kn : R → R defined by

$$ K_n(t) = \sum_{j=-n}^{n} \Big( 1 - \frac{|j|}{n+1} \Big) \exp(ijt) . $$

We can now give another way of writing the Cesàro means, using convolution and the Fejér kernel:
Proposition 2.25. Let f be a 2π-periodic continuous function. Then σn (f ) =
Kn ∗ f .
Proof. By the previous proposition and the definition of the Fourier coefficients,

$$\begin{aligned} \sigma_n(f)(t) &= \sum_{j=-n}^{n} \Big( 1 - \frac{|j|}{n+1} \Big) \exp(ijt)\,\frac{1}{2\pi} \int_{-\pi}^{\pi} f(s) \exp(-ijs)\,ds \\ &= \frac{1}{2\pi} \int_{-\pi}^{\pi} f(s) \sum_{j=-n}^{n} \Big( 1 - \frac{|j|}{n+1} \Big) \exp(ij(t-s))\,ds \\ &= \frac{1}{2\pi} \int_{-\pi}^{\pi} f(s)\,K_n(t-s)\,ds \\ &= K_n * f(t) . \end{aligned}$$

Before moving on, let’s make a note of a useful property of the convolution:
Proposition 2.26. Convolution is a commutative product; i.e. f ∗ g = g ∗ f for
all 2π-periodic continuous functions f and g.
Proof. First make the change of integration variable u = t − s:

$$\begin{aligned} f * g(t) &= \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t-s)\,g(s)\,ds \\ &= -\frac{1}{2\pi} \int_{t+\pi}^{t-\pi} f(u)\,g(t-u)\,du \\ &= \frac{1}{2\pi} \int_{t-\pi}^{t+\pi} f(u)\,g(t-u)\,du \\ &= \frac{1}{2\pi} \int_{-\pi}^{\pi} f(u)\,g(t-u)\,du - \frac{1}{2\pi} \int_{-\pi}^{t-\pi} f(u)\,g(t-u)\,du + \frac{1}{2\pi} \int_{\pi}^{t+\pi} f(u)\,g(t-u)\,du . \end{aligned}$$

Since f and g are 2π-periodic, the last two terms cancel, leaving g ∗ f (t).
The definition of the Fejér kernel as a sum is a bit difficult to work with.
Fortunately, there is a much neater expression:

Proposition 2.27. Kn (t) is a 2π-periodic, even and continuous function, such that

$$ K_n(t) = \begin{cases} n+1 & t = 0 \\ \dfrac{1}{n+1} \left( \dfrac{\sin\big( (n+1)\frac{t}{2} \big)}{\sin\frac{t}{2}} \right)^{2} & 0 < t \le \pi . \end{cases} $$

Proof. Kn is 2π-periodic and continuous because it is a sum of 2π-periodic


continuous functions. We leave it as an exercise to check from the definition
that Kn (−t) = Kn (t).
From the definition,

$$ K_n(0) = \sum_{j=-n}^{n} \Big( 1 - \frac{|j|}{n+1} \Big) = 2n+1 - \frac{2}{n+1} \sum_{j=1}^{n} j = 2n+1 - \frac{2}{n+1}\,\frac{n(n+1)}{2} = n+1 . $$

It remains to verify the formula for Kn (t) in the case t ≠ 0. We start by noting that

$$ \frac{\sin\big( (n+1)\frac{t}{2} \big)}{\sin\frac{t}{2}} = \frac{i e^{-i(n+1)t/2} - i e^{i(n+1)t/2}}{i e^{-it/2} - i e^{it/2}} = e^{-int/2}\,\frac{1 - r^{n+1}}{1 - r} , $$

where $r = e^{it} \ne 1$.
where r = eit 6= 1. We recognise the last expression as the formula for a


geometric series; thus

sin(n + 1) 2t
= e−int/2 (1 + r + . . . + rn ).
sin 2t

If we square this expression then the coefficient of 1 will be 1, because there


is one way to write 1 as a product rj rk for 0 ≤ j, k ≤ n. The coefficient of r
will be 2, because r = r0 r1 = r1 r0 can be written as a product in two ways.
The term with the largest coefficient will be that of rn ; this is n + 1, because
rn = rn r0 = rn−1 r1 = . . . = r0 rn . For j ≥ n the coefficients of rj will decrease
with j, reaching 1 when j = 2n. Thus
$$\begin{aligned} \left( \frac{\sin\big( (n+1)\frac{t}{2} \big)}{\sin\frac{t}{2}} \right)^{2} &= e^{-int}\,(1 + 2r + 3r^2 + \dots + n r^{n-1} + (n+1) r^{n} + n r^{n+1} + \dots + r^{2n}) \\ &= e^{-int} + 2e^{-i(n-1)t} + 3e^{-i(n-2)t} + \dots + n e^{-it} + (n+1) + n e^{it} + \dots + e^{int} \\ &= \sum_{j=-n}^{n} (n+1-|j|)\,e^{ijt} \\ &= (n+1)\,K_n(t) . \end{aligned}$$

It’s a good idea to try to plot a graph of Kn (t) for fixed n (I’ll do so in the
lecture).
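For those who want to plot it themselves, here is a sketch in Python (mine, numpy assumed) that evaluates Kn both from the defining sum and from the closed form above; comparing the two outputs is a useful sanity check:

import numpy as np

def K_sum(n, t):
    # the Fejer kernel from its defining sum
    return sum((1 - abs(j)/(n+1)) * np.exp(1j*j*t) for j in range(-n, n+1)).real

def K_closed(n, t):
    # the closed form of proposition 2.27 (valid for t != 0)
    return (np.sin((n+1)*t/2) / np.sin(t/2))**2 / (n+1)

n, t = 5, np.linspace(0.01, np.pi, 5)
print(K_sum(n, t))
print(K_closed(n, t))    # the two agree; K_n(0) would equal n + 1 = 6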
Now that we have several ways to write down Kn we can summarise some
of its important properties in the next proposition:

Proposition 2.28. For any n ∈ N, the Fejér kernel Kn satisfies:
(i) Kn (t) ≥ 0 ∀t ∈ R.
(ii) $\frac{1}{2\pi} \int_{-\pi}^{\pi} K_n(t)\,dt = 1$.
(iii) For all ε > 0 and δ > 0 there exists Nε,δ ∈ N such that

∀n ∈ N, t ∈ [δ, π], n > Nε,δ =⇒ Kn (t) < ε.

Proof. Part (i) follows from the previous proposition, as Kn (t) is either a square of a real number or equals n + 1 (according to the value of t). Part (ii) follows from the definition of Kn , and the fact that $\frac{1}{2\pi} \int_{-\pi}^{\pi} \exp(ijt)\,dt$ equals 1 if j = 0 and 0 otherwise.
For part (iii), let ε, δ > 0 be given. By the previous proposition, if t ∈ [δ, π] then

$$ K_n(t) = \frac{1}{n+1}\,\frac{\sin^2((n+1)t/2)}{\sin^2(t/2)} \le \frac{1}{n+1}\,\frac{1}{\sin^2(t/2)} \le \frac{1}{n+1}\,\frac{1}{\sin^2(\delta/2)} . $$

So we choose Nε,δ ∈ N such that

$$ N_{\varepsilon,\delta} > \frac{1}{\varepsilon \sin^2(\delta/2)} . $$

It follows that Kn (t) < ε whenever t ∈ [δ, π] and n > Nε,δ .


Finally, we are ready to prove the first important theorem in this section:
Theorem 2.29 (Fejér). Let f be 2π-periodic and continuous. Then the Cesàro
means of the Fourier series of f converge to f uniformly, i.e.

kf − σn (f )k∞ → 0 as n → ∞.

Remember, we wanted to show that the Fourier series for f converges to f .


This theorem doesn't quite do that, as it only shows that the Cesàro means of
the Fourier series converge. However, in other senses the theorem is very good:
it gives us the best possible type of convergence, namely uniform convergence
(recall that uniform convergence implies pointwise convergence, see proposition
1.15).
Proof. Let ε > 0. To show that σn (f ) converges to f uniformly we must find N ∈ N such that

∀n ∈ N, ∀t ∈ [−π, π], n > N =⇒ |σn (f )(t) − f (t)| < ε.

Let's think about what this means. By proposition 2.25 (and proposition 2.26),

$$ \sigma_n(f)(t) = K_n * f(t) = \frac{1}{2\pi} \int_{-\pi}^{\pi} K_n(s)\,f(t-s)\,ds . $$

By proposition 2.28 part (ii),


Z π Z π
1 1
f (t) = f (t) Kn (s)ds = Kn (s)f (t)ds.
2π −π 2π −π

Therefore
$$|\sigma_n(f)(t) - f(t)| = \left|\frac{1}{2\pi}\int_{-\pi}^{\pi} K_n(s)\big(f(t-s) - f(t)\big)\,ds\right| \le \frac{1}{2\pi}\int_{-\pi}^{\pi} K_n(s)\,\big|f(t-s) - f(t)\big|\,ds. \tag{2}$$
(Here we have used the fact that Kn (s) ≥ 0 – see proposition 2.28 part (i).) We need to show that the integral on the right is less than ε for sufficiently large n. Our strategy will be to split the domain of integration into a small interval [−δ, δ] containing the origin, and its complement [−π, −δ) ∪ (δ, π]. We will do so using the continuity of f and the properties of the Fejér kernel Kn .
First let's use the continuity of f . Since f is continuous it is also uniformly continuous, by proposition 1.19. Therefore there exists a δ > 0 such that
$$\forall s, t \in \mathbb{R},\qquad -\delta \le s \le \delta \implies |f(t-s) - f(t)| < \frac{\varepsilon}{2}.$$
Without loss of generality, we may assume δ < π. It follows that
$$\frac{1}{2\pi}\int_{-\delta}^{\delta} K_n(s)\,|f(t-s) - f(t)|\,ds < \frac{1}{2\pi}\int_{-\delta}^{\delta} K_n(s)\,\frac{\varepsilon}{2}\,ds < \frac{1}{2\pi}\int_{-\pi}^{\pi} K_n(s)\,\frac{\varepsilon}{2}\,ds = \frac{\varepsilon}{2} \tag{3}$$
(where we have used that Kn (s) ≥ 0 and $\frac{1}{2\pi}\int_{-\pi}^{\pi} K_n(s)\,ds = 1$).
Now we'll make use of the properties of Kn (s). Let
$$M = \|f\|_\infty = \sup_{t\in[-\pi,\pi]} |f(t)| = \sup_{t\in\mathbb{R}} |f(t)|.$$
By proposition 2.28, we can choose an N ∈ N such that
$$\forall n \in \mathbb{N},\ \forall s \in \mathbb{R},\qquad n > N \text{ and } |s| \in [\delta, \pi] \implies 0 \le K_n(s) \le \frac{\varepsilon}{4M}.$$
Therefore
$$\frac{1}{2\pi}\int_{\delta}^{\pi} K_n(s)\,|f(t-s) - f(t)|\,ds \le \frac{1}{2\pi}\int_{\delta}^{\pi} K_n(s)\big(|f(t-s)| + |f(t)|\big)\,ds$$
$$\le \frac{1}{2\pi}\int_{\delta}^{\pi} \frac{\varepsilon}{4M}\,(M + M)\,ds = \frac{\varepsilon}{2}\,\frac{\pi-\delta}{2\pi} < \frac{\varepsilon}{4}. \tag{4}$$
Similarly,
$$\frac{1}{2\pi}\int_{-\pi}^{-\delta} K_n(s)\,|f(t-s) - f(t)|\,ds < \frac{\varepsilon}{4}. \tag{5}$$
Therefore, by equations (2) to (5),
$$|\sigma_n(f)(t) - f(t)| < \frac{\varepsilon}{2} + \frac{\varepsilon}{4} + \frac{\varepsilon}{4} = \varepsilon,$$
as required.
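As a (non-examinable) illustration of Fejér's theorem, the following Python sketch computes the Cesàro means using the weighted partial-sum form σn (f ) = Σ_{|j|≤n} (1 − |j|/(n+1)) fˆ(j) e^{ijt} (which matches the Fejér-kernel convolution above), for the 2π-periodic function f (t) = |t|, and watches the sup-norm error shrink. All names are my own and the integrals are approximated on a uniform grid.

import numpy as np

N = 4096
t = np.linspace(-np.pi, np.pi, N, endpoint=False)   # one full period
f = np.abs(t)                                       # continuous, 2*pi-periodic extension

def coeff(j):
    # hat{f}(j) = (1/2pi) * integral f(t) e^{-ijt} dt, approximated by a Riemann sum
    return np.mean(f * np.exp(-1j * j * t))

def cesaro_mean(n):
    # sigma_n(f) = sum_{|j| <= n} (1 - |j|/(n+1)) hat{f}(j) e^{ijt}
    js = np.arange(-n, n + 1)
    w = 1 - np.abs(js) / (n + 1)
    c = np.array([coeff(j) for j in js])
    return np.real((w * c) @ np.exp(1j * np.outer(js, t)))

for n in (4, 16, 64):
    print(n, np.max(np.abs(f - cesaro_mean(n))))    # the sup-norm error decreases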

2.4 Convergence of the Fourier series


In the last subsection we studied the Cesàro means of the Fourier series; in this
section we return to studying the Fourier series itself. We wish to show that the
set
{exp(int) : n ∈ Z}
is a countable orthonormal basis for C[−π, π] with respect to the usual inner
product. This is equivalent to showing that the Fourier series for any function
f ∈ C[−π, π] converges to f with respect to the norm k · k2 . We begin with a
few simple observations.
Lemma 2.30. For any f ∈ C[−π, π], kf k2 ≤ kf k∞ .
Proof. By definition,
$$\|f\|_2 = \left(\frac{1}{2\pi}\int_{-\pi}^{\pi} |f(s)|^2\,ds\right)^{\frac12} \quad\text{and}\quad \|f\|_\infty = \sup_{s\in[-\pi,\pi]} |f(s)|.$$
It follows that $0 \le |f(s)|^2 \le \|f\|_\infty^2$. Therefore
$$\|f\|_2 \le \left(\frac{1}{2\pi}\int_{-\pi}^{\pi} \|f\|_\infty^2\,ds\right)^{\frac12} = \left(\frac{1}{2\pi}\,2\pi\,\|f\|_\infty^2\right)^{\frac12} = \|f\|_\infty.$$
Using this lemma we can prove the following corollary of Fejér’s theorem:

Corollary 2.31. Let f ∈ C= [−π, π]. Then the Cesàro means σn (f ) of the Fourier series of f converge to f with respect to the norm k · k2 , i.e.
$$\lim_{n\to\infty} \|f - \sigma_n(f)\|_2 = 0.$$
Proof. As explained at the beginning of the last section, if f ∈ C= [−π, π] then f (−π) = f (π) and f can be extended to a 2π-periodic function g. Therefore, by the previous lemma,
$$0 \le \|f - \sigma_n(f)\|_2 \le \|f - \sigma_n(f)\|_\infty = \|g - \sigma_n(g)\|_\infty.$$
As n → ∞, kg − σn (g)k∞ → 0 by Fejér's theorem, so kf − σn (f )k2 → 0 by the squeeze rule.
The previous corollary applies only to functions in C= [−π, π], whereas we
would like to know about the Fourier series for functions in C[−π, π]. To this
end, we prove the following lemma:
Lemma 2.32. Let f ∈ C[−π, π] and let ε > 0. Then there is a function g ∈ C= [−π, π] such that kg − f k2 < ε.
(Sometimes people paraphrase this result by saying that C= [−π, π] is dense
in C[−π, π]).
Proof. Let
$$g(t) = \begin{cases} f(t) & -\pi + \delta \le t \le \pi \\ (t+\pi)A + B & -\pi \le t < -\pi + \delta, \end{cases}$$
for some constants A, B, δ ∈ R with δ > 0. We need g to belong to C= [−π, π], so choose the constants A and B to ensure that g is continuous at t = −π + δ and g(−π) = g(π). Thus we require
$$A\delta + B = f(-\pi + \delta), \qquad B = f(\pi).$$
Solving for A and B gives
$$A = \frac{f(-\pi+\delta) - f(\pi)}{\delta} \quad\text{and}\quad B = f(\pi).$$
With these choices for A, B, we have that g ∈ C= [−π, π].
Now, for any t ∈ [−π, δ − π],
$$|g(t)| \le \max\{|f(\pi)|, |f(-\pi+\delta)|\} \le \|f\|_\infty,$$
so
$$|f(t) - g(t)| \le |f(t)| + |g(t)| \le 2\|f\|_\infty.$$
Therefore
$$\|f-g\|_2^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi} |f(t)-g(t)|^2\,dt = \frac{1}{2\pi}\int_{-\pi}^{\delta-\pi} |f(t)-g(t)|^2\,dt \le \frac{\delta}{2\pi}\,4\|f\|_\infty^2.$$
Let us choose δ such that
$$\delta < 2\pi\left(\frac{\varepsilon}{2\|f\|_\infty}\right)^2;$$
it then follows that kf − gk2 < ε as required.


Now we are able to prove the main theorem of this subsection:
Theorem 2.33. For any j ∈ Z, let ej ∈ C[−π, π] be the function ej (t) =
exp(ijt), and let S = {ej : j ∈ Z}. Then S is a countable orthonormal basis
for C[−π, π].
Proof. We have already shown that the set S is orthonormal, so I only need to show that
$$\lim_{n\to\infty}\Big\| f - \sum_{j=-n}^{n} \langle f, e_j\rangle e_j \Big\|_2 = 0.$$
I will introduce some notation: let Pn be the projection defined by the orthonormal set {e−n , e1−n , . . . , en−1 , en }:
$$P_n(f) := \sum_{j=-n}^{n} \langle f, e_j\rangle e_j.$$
Then Pn is the orthogonal projection onto the subspace
$$F_n = \operatorname{sp}\{e_{-n}, e_{1-n}, \dots, e_{n-1}, e_n\}.$$
I need to show that kf − Pn (f )k2 → 0.


Let ε > 0. I must find an N ∈ N such that n > N =⇒ kf − Pn (f )k2 < ε. First, by lemma 2.32, there exists a g ∈ C= [−π, π] such that kf − gk2 < ε/2. By corollary 2.31 there exists an N ∈ N such that
n > N =⇒ kg − σn (g)k2 < ε/2.
Therefore
n > N =⇒ kf − σn (g)k2 ≤ kf − gk2 + kg − σn (g)k2 < ε.
Now, for any n ∈ N, σn (g) ∈ Fn , because (by proposition 2.22) σn (g) is a linear combination of the functions e−n , e1−n , . . . , en−1 , en . Therefore, by proposition 2.16, Pn (f ) is the closest point in Fn to f , so
∀n ∈ N, kf − Pn (f )k2 ≤ kf − σn (g)k2 .
Therefore
n > N =⇒ kf − Pn (f )k2 < ε.

Corollary 2.34. The Fourier series for any function f ∈ C[−π, π] converges
to f with respect to the norm k · k2 .
Corollary 2.35 (Parseval's identity). For any two functions f, g ∈ C[−π, π],
$$\frac{1}{2\pi}\int_{-\pi}^{\pi} f(t)\overline{g(t)}\,dt = \sum_{n=-\infty}^{\infty} \hat f(n)\overline{\hat g(n)}.$$
Proof. The left hand side is the inner product hf, gi and the right hand side is $\sum_{n=-\infty}^{\infty} \langle f, e_n\rangle\langle e_n, g\rangle$ (where en (t) := exp(int)). Therefore the result follows from Parseval's identity (proposition 2.10) and the fact that the en form a countable orthonormal basis.
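A quick (non-examinable) numerical check of Parseval's identity with g = f : for f (t) = t the left hand side is (1/2π)∫ t² dt = π²/3, and the truncated sum of |fˆ(n)|² creeps up to the same value. The Python sketch below is my own; since |fˆ(n)|² = 1/n² for n ≠ 0, the discarded tail of the series is roughly 2/200 = 0.01 here.

import numpy as np

N = 4096
t = np.linspace(-np.pi, np.pi, N, endpoint=False)
f = t                                        # hat{f}(n) = i(-1)^n / n for n != 0

def coeff(n):
    return np.mean(f * np.exp(-1j * n * t))  # (1/2pi) * integral over one period

lhs = np.mean(np.abs(f) ** 2)                # = pi^2 / 3 = 3.2899...
rhs = sum(abs(coeff(n)) ** 2 for n in range(-200, 201))
print(lhs, rhs)                              # rhs sits below lhs by the tail, ~ 0.01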
Corollary 2.36 (Riemann-Lebesgue lemma). For any function f ∈ C[−π, π], fˆ(n) → 0 as n → ±∞.
Proof. By the previous corollary, $\sum_{n=-\infty}^{\infty} |\hat f(n)|^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi} |f(t)|^2\,dt < \infty$. The terms of a convergent series tend to zero, so fˆ(n) → 0 as n → ∞ and as n → −∞.

2.5 Solving differential equations


The Fourier series was invented by Joseph Fourier to solve differential equations.
In fact, it is extremely useful for solving many famous partial differential equa-
tions, including the heat equation, the wave equation, and the Poisson equation.
We will discuss the wave equation below, but other equations can be treated in
a similar way.
First we need to introduce some variants of the Fourier series. The Fourier
series involving exponential functions exp(int) that we have studied until now
is the most mathematically elegant Fourier series. However, there are several
other incarnations of the Fourier series that are more useful in applications.
The real Fourier series. Let f : [−π, π] → R be a real continuous function. The
usual (complex) Fourier coefficients and Fourier series for f are:
$$\hat f(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(t)\exp(-int)\,dt, \qquad \sum_{n=-\infty}^{\infty} \hat f(n)\exp(int).$$

Using Euler's formula, the Fourier series may be rewritten as
$$\hat f(0) + \sum_{n=1}^{\infty}\Big(\hat f(n)(\cos nt + i\sin nt) + \hat f(-n)(\cos nt - i\sin nt)\Big)$$
$$= \hat f(0) + \sum_{n=1}^{\infty}\Big(\big(\hat f(n) + \hat f(-n)\big)\cos nt + \big(i\hat f(n) - i\hat f(-n)\big)\sin nt\Big).$$
We therefore define the real Fourier coefficients
$$a_n = \hat f(n) + \hat f(-n) = \frac{1}{\pi}\int_{-\pi}^{\pi} f(t)\cos nt\,dt,$$
$$b_n = i\hat f(n) - i\hat f(-n) = \frac{1}{\pi}\int_{-\pi}^{\pi} f(t)\sin nt\,dt,$$
and the real Fourier series
$$\frac{a_0}{2} + \sum_{n=1}^{\infty}\big(a_n\cos nt + b_n\sin nt\big).$$

Like the complex Fourier series, the real Fourier series converges to f . Note
that an is defined for n ≥ 0 and bn is defined for n ≥ 1, and an and bn are both
real as long as f is a real continuous function.
If f is an odd function then an = 0 ∀n, because cos(nt) is even and the integral of a product of odd and even functions vanishes. Similarly, if f is an even function then bn = 0 ∀n.
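To see the real Fourier series in action, here is a brief (non-examinable) Python sketch of my own that computes an and bn by numerical integration for the even function f (t) = t² and checks that the bn vanish and the partial sums converge.

import numpy as np

N = 4096
t = np.linspace(-np.pi, np.pi, N, endpoint=False)
f = t ** 2                                    # an even function

def a(n):
    return 2 * np.mean(f * np.cos(n * t))     # (1/pi) * integral f(t) cos(nt) dt

def b(n):
    return 2 * np.mean(f * np.sin(n * t))     # (1/pi) * integral f(t) sin(nt) dt

S = a(0) / 2 + sum(a(n) * np.cos(n * t) + b(n) * np.sin(n * t) for n in range(1, 40))
print(np.max(np.abs(f - S)))                  # small: the partial sum is close to f
print(max(abs(b(n)) for n in range(1, 6)))    # ~ 0, since f is even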
The sine Fourier series. Let f : [0, π] → R be a continuous function such that
f (0) = 0. We construct a Fourier series for f using the following trick. Define
a function f˜ : [−π, π] → R as follows:
$$\tilde f(t) = \begin{cases} f(t) & 0 \le t \le \pi \\ -f(-t) & -\pi \le t < 0. \end{cases}$$

Now f̃ is continuous, because f (0) = 0, and also odd. The Fourier series for f̃ converges to f̃ on the interval [−π, π], and therefore it also converges to f on the interval [0, π]. Since f̃ is an odd function, the coefficients an in its real Fourier series are all zero and the bn coefficients are given by
$$b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} \tilde f(t)\sin nt\,dt = \frac{2}{\pi}\int_{0}^{\pi} f(t)\sin nt\,dt.$$

Therefore the series takes the form
$$\sum_{n=1}^{\infty} B_n \sin nt, \qquad B_n = \frac{2}{\pi}\int_{0}^{\pi} f(t)\sin nt\,dt.$$

This is the sine Fourier series for f , and Bn are known as the sine Fourier
coefficients. The series converges to f on the interval [0, π].
The cosine Fourier series. Let f : [0, π] → R be any continuous function.
Similar to above, we define a function f̃ : [−π, π] → R:
$$\tilde f(t) = \begin{cases} f(t) & 0 \le t \le \pi \\ f(-t) & -\pi \le t < 0. \end{cases}$$

This function is continuous and even. Therefore the coefficients bn in its real Fourier series are zero and the coefficients an are given by
$$a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} \tilde f(t)\cos nt\,dt = \frac{2}{\pi}\int_{0}^{\pi} f(t)\cos nt\,dt.$$

Therefore the real Fourier series is
$$\frac{A_0}{2} + \sum_{n=1}^{\infty} A_n \cos nt, \qquad A_n = \frac{2}{\pi}\int_{0}^{\pi} f(t)\cos nt\,dt.$$

This is the cosine Fourier series for f , and An are the cosine Fourier coefficients.
The series converges to f on the interval [0, π].
One partial differential equation that can be solved using the Fourier series
is the wave equation:
$$\frac{\partial^2 y}{\partial t^2} = c^2\,\frac{\partial^2 y}{\partial x^2} \tag{6}$$
This is a differential equation for a function y(x, t) of two variables. The equa-
tion can be used to model a taut vibrating string: y(x, t) is the displacement
of the string at position x and time t, and c ∈ R is a real constant related to
the physical properties of the string. We assume that the ends of the string
are held fixed at positions x = 0 and x = π; therefore we impose the boundary
conditions
y(0, t) = 0 and y(π, t) = 0 ∀t ≥ 0. (7)
Suppose that at time t = 0,
$$y(x, 0) = f(x), \qquad \frac{\partial y}{\partial t}(x, 0) = 0 \quad \forall x \in [0, \pi], \tag{8}$$
where f (x) is a continuous function of x ∈ [0, π]. These initial conditions de-
scribe a string that has been “plucked”. We aim to model the future oscillation
of the string by solving the wave equation subject to these boundary conditions
and initial conditions.
To solve the system (6), (7), (8), let's consider the sine Fourier coefficients of y:
$$B_n(t) = \frac{2}{\pi}\int_{0}^{\pi} y(x, t)\sin nx\,dx.$$
(it is wise to use the sine Fourier series rather than any other Fourier series
because y(x, t) vanishes at x = 0 and at x = π). Note that because y depends
on both x and t, the Fourier coefficients depend on time t. Let us suppose that y
solves the wave equation and boundary conditions, and determine a differential

equation solved by Bn :
$$\frac{d^2 B_n}{dt^2}(t) = \frac{2}{\pi}\int_0^{\pi} \frac{\partial^2 y}{\partial t^2}(x,t)\sin nx\,dx$$
$$= c^2\,\frac{2}{\pi}\int_0^{\pi} \frac{\partial^2 y}{\partial x^2}(x,t)\sin nx\,dx$$
$$= -c^2\,\frac{2}{\pi}\int_0^{\pi} \frac{\partial y}{\partial x}(x,t)\,\frac{d}{dx}\sin nx\,dx + c^2\,\frac{2}{\pi}\left[\frac{\partial y}{\partial x}(x,t)\sin nx\right]_{x=0}^{x=\pi}$$
$$= c^2\,\frac{2}{\pi}\int_0^{\pi} y(x,t)\,\frac{d^2}{dx^2}\sin nx\,dx - c^2\,\frac{2}{\pi}\left[y(x,t)\,\frac{d}{dx}\sin nx\right]_{x=0}^{x=\pi}$$
$$= -c^2 n^2\,\frac{2}{\pi}\int_0^{\pi} y(x,t)\sin nx\,dx.$$
Here in the first line we exchanged integrals and derivatives – this is fine as
long as ∂ 2 y/∂t2 exists and is continuous. The two boundary terms arising from
integration by parts vanish because sin(nx) and y(x, t) vanish when x = 0, π.
Our conclusion therefore is that
$$\frac{d^2 B_n}{dt^2}(t) + c^2 n^2 B_n(t) = 0.$$
So we have infinitely-many ordinary differential equations (labelled by n ∈
N) to solve. These equations are particular examples of second order linear
equations, whose general solution you should have seen before (in MATH1012,
for example):
Bn (t) = Cn cos cnt + Dn sin cnt.
Here Cn , Dn are real constants for all n ∈ N. To fix these constants we consider
the initial conditions. These are
$$B_n(0) = \frac{2}{\pi}\int_0^{\pi} y(x,0)\sin nx\,dx = \frac{2}{\pi}\int_0^{\pi} f(x)\sin nx\,dx,$$
$$\frac{dB_n}{dt}(0) = \frac{2}{\pi}\int_0^{\pi} \frac{\partial y}{\partial t}(x,0)\sin nx\,dx = 0.$$

From the general solution, Bn (0) = Cn and dBn /dt(0) = cnDn . Therefore Dn = 0 and
$$C_n = \frac{2}{\pi}\int_0^{\pi} f(x)\sin nx\,dx.$$
Therefore the solution y(x, t) to the system (6), (7), (8) can be described using
its sine Fourier series:

$$y(x,t) = \sum_{n=1}^{\infty} B_n(t)\sin nx = \sum_{n=1}^{\infty} C_n \cos cnt\,\sin nx.$$
This series converges to y (with respect to the norm k · k2 ) because y(x, t) is
continuous.
To summarise, by calculating the constants Cn you can write down a solu-
tion of the system (6), (7), (8) that takes the form of a series. Note that the
constants Cn are nothing other than the sine Fourier coefficients of the function
f . Note also that we did not prove that the wave equation has a solution; we
merely found a formula for its solution assuming that it has one (proving that
differential equations have solutions is a topic all of its own).
The piece of magic that makes all of this work is the relation
$$\frac{d^2}{dx^2}\sin nx = -n^2\sin nx$$
that appeared in our integration by parts formula. This relation can be inter-
preted as saying that sin nx is an eigenvector of the linear operator d2 /dx2 with
eigenvalue −n2 . We will come back to eigenvectors and eigenvalues later in the
course.
Example 2.37. Solve the system (6), (7), (8) with initial conditions given by
f (x) = x(π − x).
To write a series solution, we just need to know the sine Fourier coefficients of
f . These are
$$C_n = \frac{2}{\pi}\int_0^{\pi} x(\pi - x)\sin nx\,dx$$
$$= -\frac{2}{\pi}\int_0^{\pi} (\pi - 2x)\,\frac{-\cos nx}{n}\,dx + \frac{2}{\pi}\left[x(\pi-x)\,\frac{-\cos nx}{n}\right]_{x=0}^{x=\pi}$$
$$= \frac{2}{\pi}\int_0^{\pi} (\pi - 2x)\,\frac{\cos nx}{n}\,dx$$
$$= -\frac{2}{\pi}\int_0^{\pi} (-2)\,\frac{\sin nx}{n^2}\,dx + \frac{2}{\pi}\left[(\pi - 2x)\,\frac{\sin nx}{n^2}\right]_{x=0}^{x=\pi}$$
$$= \frac{4}{\pi n^2}\int_0^{\pi} \sin nx\,dx$$
$$= \frac{4}{\pi n^3}\big[-\cos nx\big]_{x=0}^{x=\pi}$$
$$= \frac{4}{\pi n^3}\big(1 + (-1)^{n+1}\big)$$
$$= \begin{cases} \dfrac{8}{\pi n^3} & n \text{ is odd} \\[4pt] 0 & n \text{ is even.} \end{cases}$$

Therefore the series solution is
$$y(x,t) = \sum_{n=1}^{\infty} C_n\cos cnt\,\sin nx = \sum_{j=1}^{\infty} \frac{8}{\pi(2j-1)^3}\cos\big((2j-1)ct\big)\,\sin\big((2j-1)x\big).$$
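Here is a short (non-examinable) Python sketch, with names of my own, that sums this series; at t = 0 it should reproduce the initial profile f (x) = x(π − x), which is a good way to check the coefficients.

import numpy as np

c = 1.0                                       # wave speed (an arbitrary choice here)
x = np.linspace(0, np.pi, 400)

def y(x, t, terms=50):
    n = 2 * np.arange(terms) + 1              # only odd n contribute
    C = 8 / (np.pi * n ** 3)
    return np.sum((C * np.cos(c * n * t))[:, None] * np.sin(np.outer(n, x)), axis=0)

# At t = 0 the series should give the plucked profile f(x) = x*(pi - x)
print(np.max(np.abs(y(x, 0.0) - x * (np.pi - x))))    # ~ 1e-4 with 50 terms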

3 Subspaces
In this section we return to our study of subspaces. Recall that so far we
have only discussed finite-dimensional subspaces of inner product spaces – the
study of infinite-dimensional subspaces requires more care. The two important
definitions are:
Definition 3.1. Let E be a normed vector space and let F ⊂ E be a subspace.
We say that F is closed if every convergent sequence in F has its limit in F .
That is, for all sequences (xn ) in F and all points x ∈ E,

xn → x =⇒ x ∈ F.

Definition 3.2. Let E be a normed vector space and let F ⊂ E be a subspace. We say that F is complete if every Cauchy sequence in F has a limit in F .
Example 3.3. Let E = ℓ2 (N) and let c0 (N) = {(x1 , x2 , x3 , . . .) ∈ ℓ2 (N) : ∃N ∈ N s.t. n > N =⇒ xn = 0} (vectors in c0 (N) are eventually zero). Then c0 (N) is not a closed subspace.
To prove this, we need to give one example of a convergent sequence in c0 (N) whose limit is not in c0 (N). Let
y1 = (1, 0, 0, 0, 0, . . .)
y2 = (1, 1/2, 0, 0, . . .)
y3 = (1, 1/2, 1/3, 0, . . .)
. . .
Then (yn ) converges to the vector y = (1, 1/2, 1/3, 1/4, . . .). However, y ∉ c0 (N) because none of its entries are zero. So c0 (N) is not closed.
Recall that the space c0 (N) is not only non-closed but also non-complete. In fact, closedness and completeness are closely related:
Proposition 3.4. A complete subspace of a normed vector space is closed.
Proof. Let F be a complete subspace of a normed vector space E. Let (xn ) be a sequence in F and x a vector in E such that xn → x. Since the sequence (xn ) is convergent, it is Cauchy (proposition 1.21). Since F is complete, (xn ) has a limit y ∈ F . Since limits of sequences are unique (proposition 1.11), x = y, so x ∈ F . Therefore F is closed.
Proposition 3.5. A closed subspace of a complete normed vector space is com-
plete.
Proof. Let F be a closed subspace of a complete normed vector space E. Let
(xn ) be a Cauchy sequence in F . Then (xn ) is also a Cauchy sequence in E.
Since E is complete, (xn ) converges to a limit x ∈ E. Since F is closed, x ∈ F .
Therefore every Cauchy sequence in F has a limit in F .

Since the subspace c0 (N) of ℓ2 (N) is not closed it is also not complete, by proposition 3.4 (you also proved in your exercise sheet that c0 (N) is not complete). We saw in example 1.25 that the subspace P [a, b] of polynomial functions in C[a, b] is not complete with respect to the norm k · k∞ . Since (C[a, b], k · k∞ ) is complete, this subspace is also not closed, by proposition 3.5.
Example 3.6. Any finite-dimensional subspace of an inner product space is both
complete and closed.
This follows from the fact that any n-dimensional subspace F of an inner
product space E is isometrically isomorphic to Cn (with its usual inner product).
Since Cn is complete (as you showed on an exercise sheet), F must also be
complete, and hence closed by proposition 3.4.
To construct an isometric isomorphism from F to Cn , choose an orthonormal basis {e1 , . . . , en } for F (by using the Gram-Schmidt process). Then let L : Cn → F be the linear map $L(z) = \sum_{i=1}^{n} z_i e_i$. It's an exercise to check that L is an isometric isomorphism!

3.1 Infinite-dimensional projections


Recall that earlier we studied projections onto finite-dimensional subspaces. We saw that the projection of a vector x ∈ E onto a finite-dimensional subspace F was the closest point in F to x. Now we will study this idea more generally.
Theorem 3.7. Let F be a complete subspace of an inner product space E and
let x ∈ E. Then there is a unique vector z ∈ F such that

kx − zk ≤ kx − yk ∀y ∈ F.

In other words, there is a unique closest point in F to x.


Proof. If x ∈ F we may choose z = x. Then kx − zk = 0 and it follows from the definition of a norm that z is the unique closest point in F to x.
If x ∉ F we construct z as follows. Let
$$d = \inf\{\|x - y\|^2 : y \in F\}.$$
This infimum exists because the set is bounded from below by zero. Since d is a greatest lower bound, for every n ∈ N there must exist a yn ∈ F such that kx − yn k2 < d + 1/n (otherwise d + 1/n would be a lower bound greater than d). Since d is a lower bound, kx − yn k2 ≥ d ∀n ∈ N. So, by the squeeze rule, kx − yn k2 → d as n → ∞.
Now we'll show that this sequence yn must be Cauchy. By the parallelogram identity,
$$\|y_n - y_m\|^2 = \|(y_n - x) - (y_m - x)\|^2 = 2\|y_n - x\|^2 + 2\|y_m - x\|^2 - \|y_n + y_m - 2x\|^2$$
$$= 2\|y_n - x\|^2 + 2\|y_m - x\|^2 - 4\left\|\tfrac{y_n + y_m}{2} - x\right\|^2 \le 2\|y_n - x\|^2 + 2\|y_m - x\|^2 - 4d,$$
where in the last line we used the fact that (yn + ym )/2 ∈ F and the definition of d. Given ε > 0, we may choose N ∈ N such that n > N =⇒ kyn − xk2 − d < ε2 /4. It follows that n, m > N =⇒ kyn − ym k < ε, so yn is a Cauchy sequence.
Since yn is Cauchy and F is complete, yn converges to a limit z ∈ F . By the continuity of the norm,
$$\|z - x\|^2 = \lim_{n\to\infty} \|y_n - x\|^2 = d.$$
Therefore kz − xk2 ≤ ky − xk2 for all y ∈ F (by the definition of d), and hence kz − xk ≤ ky − xk. Thus we have shown existence of z ∈ F with the required property.
It remains to show uniqueness. Suppose that z′ ∈ F is another point in F such that kx − z′k2 = d. We must show that in fact z = z′. By the parallelogram identity,
$$2\|x - z\|^2 + 2\|x - z'\|^2 = \|2x - z - z'\|^2 + \|z - z'\|^2.$$
Rearranging and using (z + z′)/2 ∈ F gives
$$\|z - z'\|^2 = 4d - 4\left\|x - \tfrac{z+z'}{2}\right\|^2 \le 0.$$
This implies that kz − z′k = 0 and that z = z′.


This proof is the first one where we've made essential use of completeness. Although we don't have a formula for z, we were able to prove its existence by constructing a Cauchy sequence and appealing to completeness.
Definition 3.8. Let F be a complete subspace of an inner product space E. The projection onto F is the map PF : E → F such that PF (x) is the closest point in F to x, i.e.
kPF (x) − xk ≤ ky − xk ∀y ∈ F.

Note that PF (x) exists and is unique by the preceding theorem. However,
we haven’t yet shown that PF is linear.
If F is finite-dimensional with an orthonormal basis {e1 , . . . , en }, proposition
2.16 tells us that PF is the map introduced in definition 2.14:
$$P_F(x) = \sum_{j=1}^{n} \langle x, e_j\rangle e_j.$$
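In finite dimensions this formula is easy to compute with. The following (non-examinable) Python sketch, with names of my own, projects a vector of C⁵ onto a two-dimensional subspace and verifies the orthogonality property of proposition 3.9 below.

import numpy as np

rng = np.random.default_rng(0)

# Two orthonormal vectors in C^5 (the columns of Q), via a QR decomposition
A = rng.standard_normal((5, 2)) + 1j * rng.standard_normal((5, 2))
Q, _ = np.linalg.qr(A)

def project(x, Q):
    # P_F(x) = sum_j <x, e_j> e_j, where the e_j are the columns of Q
    return Q @ (Q.conj().T @ x)

x = rng.standard_normal(5) + 1j * rng.standard_normal(5)
z = project(x, Q)
print(np.abs(Q.conj().T @ (x - z)).max())    # ~ 1e-16: x - z is orthogonal to F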
Proposition 3.9. Let F be a complete subspace of an inner product space E, let x ∈ E and let z ∈ F . Then z = PF (x) if and only if x − z is orthogonal to F , i.e.
hx − z, yi = 0 ∀y ∈ F.
This proposition implies that, as in finite dimensions, the projection PF is
orthogonal.

Proof. Suppose that x − z is perpendicular to F . Then, for any y ∈ F , Pythagoras tells us that
$$\|x - y\|^2 = \|x - z + z - y\|^2 = \|x - z\|^2 + \|z - y\|^2 \ge \|x - z\|^2.$$
So z is the closest point in F to x, so z = PF (x).
Now suppose that z = PF (x), i.e. that z is the closest point in F to x. Suppose for contradiction that there exists a y ∈ F such that hx − z, yi ≠ 0. By multiplying y by a phase if necessary, we may assume without loss of generality that hx − z, yi = α where α ∈ R and α > 0. Let t > 0 and consider
$$\|x - (z + ty)\|^2 = \|x - z\|^2 - 2t\alpha + t^2\|y\|^2.$$
If t = α/kyk2 then
$$\|x - (z + ty)\|^2 = \|x - z\|^2 - \frac{2\alpha^2}{\|y\|^2} + \frac{\alpha^2\|y\|^2}{\|y\|^4} = \|x - z\|^2 - \frac{\alpha^2}{\|y\|^2}.$$
Thus z + ty is a point in F which is closer to x than z, contradicting our assumption. So x − z is orthogonal to F .
Proposition 3.10. Let F be a complete subspace of an inner product space E.
Then PF : E → F is a linear map.
Proof. Let x1 , x2 ∈ E, λ1 , λ2 ∈ C, X = λ1 x1 + λ2 x2 and Z = λ1 PF (x1 ) + λ2 PF (x2 ). We must show that PF (X) = Z. By proposition 3.9,
hX − Z, yi = λ1 hx1 − PF (x1 ), yi + λ2 hx2 − PF (x2 ), yi = 0 ∀y ∈ F.
Therefore, again by proposition 3.9, Z = PF (X).

3.2 Orthogonal complements


Definition 3.11. Let F be a subspace of an inner product space E. The
orthogonal complement of F is
F ⊥ = {x ∈ E : hx, yi = 0 ∀y ∈ F }.
Proposition 3.12. The orthogonal complement of a subspace of an inner prod-
uct space is a closed subspace.
Proof. Let F ⊂ E be a subspace of an inner product space E. If w, x ∈ F ⊥ and λ ∈ C then for all y ∈ F ,
hλx, yi = λhx, yi = 0 and hw + x, yi = hw, yi + hx, yi = 0,
so w + x ∈ F ⊥ and λx ∈ F ⊥ . Also h0E , yi = 0 for all y ∈ F so 0E ∈ F ⊥ . Therefore F ⊥ is a subspace.
To show F ⊥ is closed, suppose that (xn ) is a sequence in F ⊥ such that
xn → x for some x ∈ E. Then, by continuity of the inner product (lemma
1.36),
$$\langle x, y\rangle = \lim_{n\to\infty}\langle x_n, y\rangle = \lim_{n\to\infty} 0 = 0 \quad \forall y \in F.$$
So x ∈ F ⊥ . Therefore F ⊥ is closed.

Example 3.13. Consider E = C[−a, a] with its usual inner product. Let Feven and Fodd be the spaces of even and odd functions. These are both subspaces of C[−a, a]. We claim that
$$F_{\mathrm{even}}^{\perp} = F_{\mathrm{odd}}.$$
It follows that Fodd is a closed subspace of C[−a, a].
The crucial observation is that
$$\forall f \in F_{\mathrm{odd}},\ \forall g \in F_{\mathrm{even}},\qquad \langle f, g\rangle = \int_{-a}^{a} f(t)\overline{g(t)}\,dt = 0,$$
because $f(t)\overline{g(t)}$ is an odd function.
First, suppose f ∈ Fodd . Then by the above observation, for any g ∈ Feven , hf, gi = 0. So $f \in F_{\mathrm{even}}^{\perp}$, and $F_{\mathrm{odd}} \subseteq F_{\mathrm{even}}^{\perp}$.
Now suppose that $f \in F_{\mathrm{even}}^{\perp}$. We may write f = g + h, where g(t) = (f (t) + f (−t))/2 and h(t) = (f (t) − f (−t))/2. Notice that g is even and h is odd. Since g ∈ Feven and $f \in F_{\mathrm{even}}^{\perp}$, hf, gi = 0. On the other hand,
hf, gi = hg + h, gi = hg, gi + hh, gi = hg, gi
by the above observation. Therefore kgk22 = 0, so g = 0 and f = h ∈ Fodd . Thus $F_{\mathrm{even}}^{\perp} \subseteq F_{\mathrm{odd}}$.
A similar argument shows that $F_{\mathrm{odd}}^{\perp} = F_{\mathrm{even}}$, so Feven is also a closed subspace.
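A small (non-examinable) numerical illustration of this decomposition, in a Python sketch of my own: split f (t) = e^t into its even and odd parts and check that they are orthogonal (up to the normalising constant in the inner product).

import numpy as np

a = np.pi
N = 4096
t = (np.arange(N) + 0.5) * (2 * a / N) - a    # midpoint grid, symmetric about 0

f = np.exp(t)                                 # neither even nor odd
g = (np.exp(t) + np.exp(-t)) / 2              # even part
h = (np.exp(t) - np.exp(-t)) / 2              # odd part

print(np.max(np.abs(f - (g + h))))            # 0: f = g + h
print(np.mean(g * h))                         # ~ 0: <g, h> vanishes (odd integrand)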
Theorem 3.14. Let H be a Hilbert space and let F ⊂ H be a closed subspace.
Then H = F ⊕ F ⊥ and (F ⊥ )⊥ = F .
Proof. First we show that H = F ⊕ F ⊥ . This means that (i) F ∩ F ⊥ = {0H }
and (ii) for every x ∈ H there exist y ∈ F and z ∈ F ⊥ such that x = y + z.
Firstly, if x ∈ F ∩ F ⊥ then hx, yi = 0 for all y ∈ F (since x ∈ F ⊥ ). In
particular hx, xi = 0 (since x ∈ F ). So x = 0H . Thus F ∩ F ⊥ ⊆ {0H }, and
F ∩ F ⊥ = {0H } because F and F ⊥ are subspaces.
Secondly, since F is closed and H is complete F is complete. Therefore there
is a projection PF : H → F . Given x ∈ H, let y = PF (x) and z = x − PF (x) so
that x = y + z. Then y ∈ F by definition, and z ∈ F ⊥ by proposition 3.9. So
H = F ⊕ F ⊥.
Now we show that (F ⊥ )⊥ = F .
If x ∈ F then hx, yi = 0 for all y ∈ F ⊥ , so x ∈ (F ⊥ )⊥ . Therefore

F ⊆ (F ⊥ )⊥ .

Now suppose that x ∈ (F ⊥ )⊥ . Since PF (x) ∈ F ⊆ (F ⊥ )⊥ , x − PF (x) ∈ (F ⊥ )⊥ . On the other hand, by proposition 3.9, x − PF (x) ∈ F ⊥ . Then x − PF (x) ∈ F ⊥ ∩ (F ⊥ )⊥ = {0H }, so x = PF (x). Thus x ∈ F , and

(F ⊥ )⊥ ⊆ F.

Altogether, we have that F = (F ⊥ )⊥ .

4 Linear operators
4.1 Bounded linear operators
Recall that a linear operator (or linear map) T is a function T : E → F between
two vector spaces such that:
(i) T (x + y) = T (x) + T (y) ∀x, y ∈ E; and
(ii) T (λx) = λT (x) ∀λ ∈ C, x ∈ E.
In this section we will study linear operators of the following type:

Definition 4.1. Let E, F be two normed vector spaces (with E non-trivial) and let T : E → F be a linear operator. Then T is called bounded if the set
{kT ek : e ∈ E and kek = 1}
is bounded from above. If T is bounded we define the operator norm of T by
$$\|T\|_{op} = \sup\{\|Te\| : e \in E,\ \|e\| = 1\} = \sup_{e\in E,\ \|e\|=1}\|Te\|.$$
The set of all bounded linear operators from E to F is denoted B(E, F ).


Example 4.2. Let E be an inner product space and let F ⊂ E be a complete subspace such that F ≠ {0E }. Then PF : E → F is a bounded linear operator, and kPF kop = 1.
Note that PF is a linear operator, by proposition 3.10. First I’ll show that PF
is bounded. Let e ∈ E be such that kek = 1. Then, since he − PF (e), PF (e)i = 0
(proposition 3.9)

1 = kek2 = ke − PF (e) + PF (e)k2 = ke − PF (e)k2 + kPF (e)k2 ≥ kPF (e)k2 .

Therefore kPF (e)k ≤ 1. It follows that the set

{kPF (e)k : e ∈ E and kek = 1}

is bounded above by 1. Therefore PF is bounded.


Since 1 is an upper bound for the set described above and kPF kop is the least upper bound, kPF kop ≤ 1. It remains to show kPF kop ≥ 1. Let e ∈ F be any vector in F with length 1 (such a vector exists because F ≠ {0E }). Then PF (e) = e, so kPF kop ≥ kPF (e)k = kek = 1. Therefore kPF kop = 1 as claimed.
Example 4.3. Let E be an inner product space and let y ∈ E be any vector.
Let fy : E → C be the function such that

fy (x) = hx, yi.

Then fy is a bounded linear operator and kfy kop = kyk.

The fact that fy is a linear operator follows immediately from the definition
of the inner product (check this!). Let e ∈ E be any vector of unit length; then
by the Cauchy-Schwarz inequality

|fy (e)| = |he, yi| ≤ kekkyk = kyk.

Therefore the operator fy is bounded. Moreover kfy kop ≤ kyk, because kyk is
an upper bound and kfy kop is a least upper bound for the set

{|fy (e)| : e ∈ E and kek = 1}

It remains to show kfy kop ≥ kyk. Suppose that y ≠ 0E and let e = y/kyk. Then
$$|f_y(e)| = \frac{\langle y, y\rangle}{\|y\|} = \|y\|.$$
Then kfy kop ≥ kyk, because kyk belongs to the set above and kfy kop is an upper bound for this set. Therefore kfy kop = kyk when y ≠ 0E . The case y = 0E is left as an exercise.
Example 4.4. Consider C[−π, π] with the usual inner product and let T : C[−π, π] → C[−π, π] be the operator T (f ) = df /dt. Then T is not bounded. Consider for example the function en (t) = exp(int), where n ∈ N. Then ken k2 = 1 and T (en ) = inen , so kT (en )k2 = n. Thus the set {kT (f )k2 : f ∈ C[−π, π], kf k2 = 1} is not bounded above, so T is not bounded.
The next lemma provides an alternative way to check boundedness of linear
operators:
Lemma 4.5. Let E, F be two normed vector spaces (with E non-trivial) and
let T : E → F be a linear operator. Then T is bounded if and only if

∃M ≥ 0 such that kT xk ≤ M kxk ∀x ∈ E. (M)

If T is bounded then the smallest such M is equal to kT kop .


Proof. Suppose that condition (M) holds and let Mmin be the smallest M for
which this condition holds. If e ∈ E and kek = 1 then kT (e)k ≤ Mmin kek =
Mmin . Therefore T is bounded. Moreover, kT kop ≤ Mmin (because kT kop is a
least upper bound).
Conversely, suppose that T is bounded. Let x ∈ E. If x ≠ 0E then
$$\|T(x)\| = \|x\|\,\Big\|T\Big(\frac{x}{\|x\|}\Big)\Big\| \le \|x\|\,\|T\|_{op}.$$
If x = 0E then kxk = 0 = kT (x)k, so it still holds that kT (x)k ≤ kT kop kxk. Thus (M) holds with M = kT kop . Moreover, Mmin ≤ kT kop (because Mmin is the smallest M for which (M) holds).
Thus we have established equivalence of boundedness and condition (M). If either holds we have shown that kT kop ≤ Mmin ≤ kT kop , and hence that kT kop = Mmin .

Example 4.6. Let T : Cn → Cn be the map T (x) = Ax, where A is the diagonal n × n matrix
$$A = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}.$$
I claim that T is bounded and that kT kop = L, where L := maxi |λi |. Let x ∈ Cn . Then
$$\|T(x)\| = \Big(\sum_{i=1}^{n} |\lambda_i x_i|^2\Big)^{\frac12} \le \Big(L^2 \sum_{i=1}^{n} |x_i|^2\Big)^{\frac12} = L\|x\|.$$
So T is bounded by lemma 4.5, and kT kop ≤ L. Now let i be such that |λi | = L and let x ∈ Cn be such that xi = 1 and xj = 0 if j ≠ i. Then
$$\|T(x)\| = \|\lambda_i x\| = |\lambda_i|\,\|x\| = L\|x\|.$$
Since kT (x)k ≤ kT kop kxk, L ≤ kT kop . Therefore kT kop = L.


More generally, any linear operator with (non-trivial) finite-dimensional do-
main is bounded.
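For a concrete check, here is a (non-examinable) Python sketch of my own comparing the operator norm of a diagonal matrix, computed as the largest singular value, with maxi |λi |.

import numpy as np

lam = np.array([1.5, -0.3 + 2j, 0.7j])
A = np.diag(lam)

print(np.linalg.norm(A, 2))       # operator norm = largest singular value
print(np.max(np.abs(lam)))        # the same value: |-0.3 + 2i| = 2.0223...

rng = np.random.default_rng(1)
x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
print(np.linalg.norm(A @ x) <= np.max(np.abs(lam)) * np.linalg.norm(x) + 1e-12)  # True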
Recall that previously we defined continuous functions from E → C. The
following definition generalises this:
Definition 4.7. Let E and F be two normed vector spaces and let T : E → F
be a function. Then T is called continuous if for every convergent sequence (xn ) in E such that xn → x, T (xn ) → T (x) in F .
The next proposition shows why bounded linear operators are important:
Proposition 4.8. A linear operator is bounded if and only if it is continuous.
Proof. First suppose that T : E → F is a bounded linear operator. Let (xn ) be a convergent sequence in E such that xn → x. Then by lemma 4.5
kT (xn ) − T (x)k = kT (xn − x)k ≤ kT kop kxn − xk.
So T (xn ) → T (x) by the squeeze rule. Therefore T is continuous.
Now let T : E → F be a continuous linear operator. Suppose for contradiction that T is not bounded. Then for each n ∈ N there must be a vector en ∈ E such that ken k = 1 and kT (en )k ≥ n. Let xn = en /n. Then kxn k = 1/n so xn → 0E as n → ∞. Since T is continuous it follows that T (xn ) → T (0E ) = 0F . However, kT (xn )k = kT (en )k/n ≥ 1, so T (xn ) ↛ 0F , and we have reached a contradiction. Thus every continuous linear operator is bounded.
Recall that, given two linear maps S, T : E → F , the sum S + T is the linear
map defined by
(S + T )(x) = S(x) + T (x) ∀x ∈ E.

Similarly, for λ ∈ C the linear map λT is defined by

(λT )(x) = λT (x) ∀x ∈ E.

The space of linear maps from E to F forms a vector space with respect to
these operations. Note that the zero-vector in this vector space is the map
0E,F : E → F such that 0E,F (x) = 0F ∀x ∈ E.
Proposition 4.9. Let E, F be two normed vector spaces. Then (B(E, F ), k·kop )
is a normed vector space.
Proof. We first show that B(E, F ) is a subspace of the vector space of all linear
maps E → F . Let S, T ∈ B(E, F ) and λ ∈ C. We must show that S + T ∈
B(E, F ), λT ∈ B(E, F ) and 0E,F ∈ B(E, F ).
Let e ∈ E with kek = 1. Then

k(S + T )(e)k = kS(e) + T (e)k ≤ kS(e)k + kT (e)k ≤ kSkop + kT kop

by the triangle inequality for F and the definition of k · kop . Therefore S + T ∈ B(E, F ), and
kS + T kop ≤ kSkop + kT kop .
Also,
k(λT )(e)k = kλT (e)k = |λ|kT (e)k ≤ |λ|kT kop .
Therefore λT is bounded, and

kλT kop ≤ |λ|kT kop .

Finally 0E,F ∈ B(E, F ) because k0E,F k = 0 (Exercise!).


Now we show that k · kop is a norm. We have already established the triangle
inequality.
Let λ ∈ C and T ∈ B(E, F ). If λ = 0 then kλT kop = k0E,F kop = 0 and |λ|kT kop = 0, so kλT kop = |λ|kT kop . Now consider the case that λ ≠ 0. By the inequality proved above, kλT kop ≤ |λ|kT kop . By the same inequality,
$$\|T\|_{op} = \Big\|\frac{1}{\lambda}(\lambda T)\Big\|_{op} \le \frac{1}{|\lambda|}\,\|\lambda T\|_{op},$$
so |λ|kT kop ≤ kλT kop . Therefore kλT kop = |λ|kT kop .


Finally, we show positivity. By definition kT kop is the supremum of a set of non-negative numbers, so kT kop ≥ 0. As has already been noted, if T = 0E,F then kT kop = 0.
Suppose that T ≠ 0E,F . Then there exists a vector x such that T (x) ≠ 0F . Clearly x ≠ 0E . Let e = x/kxk; then kek = 1 and T (e) ≠ 0F . So kT kop ≥ kT (e)k > 0. Thus T ≠ 0E,F =⇒ kT kop > 0.
Proposition 4.10. Let E be a non-trivial normed vector space and let F be a complete normed vector space. Then B(E, F ) is complete.

Proof. Let (Tn ) be a Cauchy sequence in B(E, F ). We claim that (kTn kop ) is a Cauchy sequence in R. By lemma 1.8,
$$\forall n, m \in \mathbb{N},\qquad \big|\,\|T_n\|_{op} - \|T_m\|_{op}\,\big| \le \|T_n - T_m\|_{op}.$$
Let ε > 0. Since (Tn ) is Cauchy there exists an N ∈ N such that n, m > N =⇒ kTn − Tm kop < ε. So n, m > N =⇒ |kTn kop − kTm kop | < ε. Therefore (kTn kop ) is a Cauchy sequence. Since R is complete this Cauchy sequence converges; let
$$L = \lim_{n\to\infty} \|T_n\|_{op}.$$
Since kTn kop ≥ 0 ∀n ∈ N, L ≥ 0.


Now let x ∈ E. We claim that (Tn (x)) is a Cauchy sequence in F . This is clearly the case if x = 0E , so consider the case x ≠ 0E . Note that
kTn (x) − Tm (x)k ≤ kTn − Tm kop kxk
by lemma 4.5. Given ε > 0 we can choose N such that n, m > N =⇒ kTn − Tm kop < ε/kxk. Then n, m > N =⇒ kTn (x) − Tm (x)k < ε, so (Tn (x)) is Cauchy. Since F is complete this Cauchy sequence converges; let
$$T(x) = \lim_{n\to\infty} T_n(x).$$

By doing this for every x ∈ E we obtain a map T : E → F . I claim that T is linear, that T is bounded, and that Tn → T with respect to k · kop . This is enough to show that B(E, F ) is complete.
For linearity, let x, y ∈ E and λ, µ ∈ C. Then
$$T(\lambda x + \mu y) = \lim_{n\to\infty} T_n(\lambda x + \mu y) = \lambda \lim_{n\to\infty} T_n(x) + \mu \lim_{n\to\infty} T_n(y) = \lambda T(x) + \mu T(y)$$
by the linearity of limits (exercise sheet 1, problem 6). Therefore T is linear.
To show that T is bounded, note that by the continuity of k · k (lemma 1.13),
$$\|T(x)\| = \Big\|\lim_{n\to\infty} T_n(x)\Big\| = \lim_{n\to\infty} \|T_n(x)\| \le \lim_{n\to\infty} \|T_n\|_{op}\,\|x\| = L\|x\|.$$
This holds for all x ∈ E so T is bounded by lemma 4.5.


Finally, I show that Tn → T . Let ε > 0. Since (Tn ) is Cauchy I can choose N ∈ N such that n, m > N =⇒ kTn − Tm kop < ε. Let x ∈ E. Then
n, m > N =⇒ kTn (x) − Tm (x)k < εkxk.
Taking m → ∞ and using the continuity of k · k gives
n > N =⇒ kTn (x) − T (x)k ≤ εkxk.
Since this holds for every x ∈ E, by lemma 4.5
n > N =⇒ kTn − T kop ≤ ε.
Therefore Tn → T with respect to the norm k · kop .

To conclude this subsection we focus on a special case.
Definition 4.11. Let E be a normed vector space. The dual of E is the space
E ∗ = B(E, C) of bounded linear operators from E to C. Elements of B(E, C)
are called bounded linear functionals.
By propositions 4.9 and 4.10, E ∗ is a Banach space (i.e. a complete normed
vector space).
For example, if E is an inner product space and y ∈ E we saw in example
4.3 that the map
fy : E → C, x 7→ hx, yi
is a bounded linear functional. In general, not all bounded linear functionals are of this type (for example, the functional T : C[−1, 1] → C, $T(g) = \int_0^1 g(t)\,dt$, is not). However, things are different if E is complete:
Theorem 4.12 (Riesz-Fréchet). Let H be a Hilbert space. Then the map F :
H → H ∗ given by F : y 7→ fy is a bijective anti-linear isometry.
Proof. Saying that F is "anti-linear" means that fλx+µy = λ̄fx + µ̄fy ∀x, y ∈ H, λ, µ ∈ C. This follows immediately from the definition of fy and the properties of the inner product.
Saying that F is an isometry means that kfy kop = kyk ∀y ∈ H. We showed
this in example 4.3.
Since F is an anti-linear isometry it is automatically injective: if fx = fy
then kx − yk = kfx−y kop = kfx − fy kop = 0 so x = y.
Finally, we must show that F is surjective. Let T ∈ H ∗ . We seek a y ∈ H
such that T = fy . Let

K = ker T = {x ∈ H : T (x) = 0}.

In the case K = H, T (x) = 0 for all x ∈ H. Therefore we may choose y = 0H


so that T = fy .
Now consider the case that K 6= H. First I show that K is closed. If (xn ) is
a sequence in K and x ∈ H such that xn → x then
 
T (x) = T lim xn = lim T (xn ) = lim 0 = 0,
n→∞ n→∞ n→∞

because T is continuous. Therefore x ∈ K, so K is closed.


Since K is closed, H = K ⊕ K ⊥ , by theorem 3.14. Since K ≠ H this means that K ⊥ ≠ {0H }. Therefore I can choose a non-zero vector z ∈ K ⊥ . Note that T (z) ≠ 0 because z ∉ K. I will choose

y = λz

for some λ ∈ C.
Let x ∈ H be any vector. I can write
$$x = \Big(x - \frac{T(x)}{T(z)}z\Big) + \frac{T(x)}{T(z)}z.$$

The second vector on the right is a scalar multiple of z, so belongs to K ⊥ . The first belongs to K = ker T , because
$$T\Big(x - \frac{T(x)}{T(z)}z\Big) = T(x) - \frac{T(x)}{T(z)}T(z) = 0.$$
Therefore
$$\langle x, \lambda z\rangle = \bar\lambda\Big\langle x - \frac{T(x)}{T(z)}z,\ z\Big\rangle + \bar\lambda\Big\langle \frac{T(x)}{T(z)}z,\ z\Big\rangle = 0 + \frac{\bar\lambda\|z\|^2}{T(z)}\,T(x).$$
Therefore if we choose λ so that λ̄ = T (z)/kzk2 , i.e. $\lambda = \overline{T(z)}/\|z\|^2$, we will have that hx, yi = T (x). So T = fy , where $y = \overline{T(z)}\,z/\|z\|^2$.

4.2 Self-adjoint operators and the adjoint of a bounded linear operator
The main theorem and definition in this section are:
Definition 4.13. Let E, F be two inner product spaces and let T : E → F be a bounded linear operator. A linear operator T ∗ : F → E is called the adjoint of T if
hT (x), yi = hx, T ∗ (y)i ∀x ∈ E, y ∈ F.
An operator T : E → E is called Hermitian or self-adjoint if T is its own adjoint, i.e. if
hT (x), yi = hx, T (y)i ∀x, y ∈ E.
Theorem 4.14. Let H be a Hilbert space and let T : H → H be a bounded linear
operator. Then there exists a unique bounded linear operator T ∗ : H → H such
that
hT (x), yi = hx, T ∗ (y)i ∀x, y ∈ H.
Moreover, kT ∗ kop ≤ kT kop .
Self-adjoint operators are useful for two reasons: first, lots of the operators
that people need to study (for example in quantum mechanics and in the the-
ory of differential equations) are self-adjoint; and second, it’s easier to prove
theorems about self-adjoint operators than more general operators.
Example 4.15. The projection operator PF : H → H onto a closed subspace
F ⊂ H of a Hilbert space H is self-adjoint.
To show this, let x, y ∈ H. Then by proposition 3.9,

hPF (x), yi = hPF (x), y − PF (y)i + hPF (x), PF (y)i = hPF (x), PF (y)i,

since PF (x) ∈ F . Similarly, hx, PF (y)i = hPF (x), PF (y)i. So

hPF (x), yi = hx, PF (y)i,

i.e. PF is self-adjoint.

Example 4.16. Let L, R : `2 (N) → `2 (N) be the operators,

L(x1 , x2 , x3 , . . .) = (x2 , x3 , x4 , . . .)
R(x1 , x2 , x3 , . . .) = (0, x1 , x2 , . . .).

They are known as the “left shift” and “right shift” operators. Note that these
are both bounded, since for a unit vector x,
$$\|L(x)\| = \big(|x_2|^2 + |x_3|^2 + |x_4|^2 + \dots\big)^{\frac12} \le \Big(\sum_{j=1}^{\infty} |x_j|^2\Big)^{\frac12} = 1$$
and
$$\|R(x)\| = \big(0^2 + |x_1|^2 + |x_2|^2 + \dots\big)^{\frac12} = \Big(\sum_{j=1}^{\infty} |x_j|^2\Big)^{\frac12} = 1.$$

I claim that R = L∗ . To show this, let x, y ∈ `2 (N). By direct calculation,

hL(x), yi = x2 ȳ1 + x3 ȳ2 + x4 ȳ3 + . . .


hx, R(y)i = x1 × 0 + x2 ȳ1 + x3 ȳ2 + . . . .

So hL(x), yi = hx, R(y)i for all x, y ∈ `2 (N) i.e. R = L∗ .


It is also true that L = R∗ – I leave this as an exercise.
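The adjoint relation hL(x), yi = hx, R(y)i is easy to test numerically on truncated sequences. Below is a (non-examinable) Python sketch of my own; on vectors of length 50 the truncated shifts satisfy the relation exactly.

import numpy as np

def left_shift(x):
    return np.append(x[1:], 0.0)              # (x2, x3, ..., 0)

def right_shift(x):
    return np.concatenate(([0.0], x[:-1]))    # (0, x1, x2, ...), last entry dropped

rng = np.random.default_rng(2)
x = rng.standard_normal(50) + 1j * rng.standard_normal(50)
y = rng.standard_normal(50) + 1j * rng.standard_normal(50)

inner = lambda u, v: np.sum(u * np.conj(v))   # <u, v> = sum_j u_j * conj(v_j)
print(np.isclose(inner(left_shift(x), y), inner(x, right_shift(y))))   # True: R = L*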
Now I’ll prove the main theorem 4.14.
Proof. Let T : H → H be a bounded linear operator on a Hilbert space H.
First, I’ll show that the function T ∗ : H → H is unique (assuming that it
exists). Suppose T1∗ and T2∗ are two functions satisfying

hx, T1∗ (y)i = hT (x), yi = hx, T2∗ (y)i ∀x, y ∈ H.

Then
hx, T1∗ (y) − T2∗ (y)i = 0 ∀x, y ∈ H.
In particular, choosing x = T1∗ (y) − T2∗ (y) gives

kT1∗ (y) − T2∗ (y)k2 = 0 ∀y ∈ H,

so T1∗ (y) = T2∗ (y) for all y ∈ H. Thus T ∗ is unique.


Now I'll show that T ∗ is linear (again, assuming it exists). Let λ1 , λ2 ∈ C, y1 , y2 ∈ H, Y = λ1 y1 + λ2 y2 and Z = λ1 T ∗ (y1 ) + λ2 T ∗ (y2 ). Then for all x ∈ H,
$$\langle x, T^*(Y)\rangle = \langle T(x), Y\rangle = \bar\lambda_1\langle T(x), y_1\rangle + \bar\lambda_2\langle T(x), y_2\rangle = \bar\lambda_1\langle x, T^*(y_1)\rangle + \bar\lambda_2\langle x, T^*(y_2)\rangle = \langle x, Z\rangle.$$

Thus
hx, T ∗ (Y ) − Zi = 0 ∀x ∈ H.

52
In particular, choosing x = T ∗ (Y ) − Z gives

kT ∗ (Y ) − Zk2 = 0

and hence T ∗ (Y ) = Z. So T ∗ is linear.


Finally, I’ll show that T ∗ exists and is bounded. For any y ∈ H, let gy :
H → C be the linear functional

gy (x) = hT (x), yi.

Then gy is bounded and kgy kop ≤ kykkT kop , because if x has unit length,

|gy (x)| = |hT (x), yi| ≤ kT (x)kkyk ≤ kT kop kyk,

where I have used the Cauchy-Schwarz inequality and the fact that T is bounded.
Therefore by the Riesz-Fréchet theorem 4.12 there exists a vector z ∈ H
such that
gy (x) = hx, zi.
I will define T ∗ (y) := z. Then

hT (x), yi = gy (x) = hx, T ∗ (y)i ∀x, y ∈ H.

This establishes the existence of the map T ∗ . Moreover, since the Riesz-Fréchet map is an isometry,
kT ∗ (y)k = kgy kop ≤ kT kop kyk ∀y ∈ H.
Therefore T ∗ is bounded and kT ∗ kop ≤ kT kop .


Lemma 4.17. Let S, T : H → H be bounded linear operators on a Hilbert space H and let λ, µ ∈ C. Then
(i) (T ∗ )∗ = T
(ii) kT ∗ kop = kT kop
(iii) (λS + µT )∗ = λ̄S ∗ + µ̄T ∗
(iv) (ST )∗ = T ∗ S ∗
(v) T ∗ T and T T ∗ are self-adjoint.
Proof. First, ∀x, y ∈ H

hx, T (y)i = hT (y), xi = hy, T ∗ (x)i = hT ∗ (x), yi.

So (T ∗ )∗ = T by the uniqueness of the adjoint.


Second, from theorem 4.14

kT kop = k(T ∗ )∗ kop ≤ kT ∗ kop ≤ kT kop

so kT kop = kT ∗ kop .

Third, ∀x, y ∈ H
$$\langle x, \bar\lambda S^*(y) + \bar\mu T^*(y)\rangle = \lambda\langle x, S^*(y)\rangle + \mu\langle x, T^*(y)\rangle = \langle \lambda S(x) + \mu T(x), y\rangle.$$
So λ̄S ∗ + µ̄T ∗ = (λS + µT )∗ by uniqueness of the adjoint.


Fourth, for all x, y ∈ H,

hx, T ∗ (S ∗ (y))i = hT (x), S ∗ (y)i = hS(T (x)), yi.

So (ST )∗ = T ∗ S ∗ by uniqueness of the adjoint.


Fifth, (T T ∗ )∗ = (T ∗ )∗ T ∗ = T T ∗ and (T ∗ T )∗ = T ∗ (T ∗ )∗ = T ∗ T by the previous parts.

The eigenvalues and eigenvectors of self-adjoint operators have very nice properties:
Proposition 4.18. Let T : E → E be a self-adjoint operator on an inner
product space E. Then

(i) the eigenvalues of T are real;


(ii) eigenvectors of T corresponding to distinct eigenvalues are orthogonal; and
(iii) ker(T ) = im(T )⊥ .
Proof. For part (i), suppose that v is an eigenvector of T with eigenvalue λ ∈ C.
Then
λhv, vi = hλv, vi = hT (v), vi = hv, T (v)i = hv, λvi = λ̄hv, vi.
Since v is non-zero, hv, vi ≠ 0, so it follows that λ = λ̄, i.e. λ ∈ R.
For part (ii), suppose that u, v are eigenvectors such that T (u) = λu, T (v) =
µv and λ 6= µ. Then

λhu, vi = hλu, vi = hT (u), vi = hu, T (v)i = hu, µvi = µ̄hu, vi.

Since λ, µ ∈ R this implies that (λ − µ)hu, vi = 0. Since λ 6= µ it follows that


hu, vi = 0.
For part (iii), let u ∈ ker(T ). Then for all vectors T (v) ∈ im(T ) it holds that
hu, T (v)i = hT (u), vi = h0E , vi = 0.
Therefore u ∈ im(T )⊥ , and hence ker(T ) ⊆ im(T )⊥ . Conversely, let u ∈
im(T )⊥ . Then
hT (u), T (u)i = hu, T (T (u))i = 0
because T (T (u)) ∈ im(T ). Therefore T (u) = 0E , so u ∈ ker(T ) and hence
im(T )⊥ ⊆ ker(T ). Thus im(T )⊥ = ker(T ).
The next theorem (which will be useful later on) gives a handy characterisation of the norm of a self-adjoint operator.
Theorem 4.19. Let T : H → H be a self-adjoint bounded linear operator on a
Hilbert space H. Then

$$\|T\|_{op} = \sup_{e\in H,\ \|e\|=1} |\langle T(e), e\rangle|.$$
Proof. Let
$$s = \sup_{e\in H,\ \|e\|=1} |\langle T(e), e\rangle|.$$

We must show that s = kT kop . By the Cauchy-Schwarz inequality, for all unit vectors e ∈ H,
|hT (e), ei| ≤ kT (e)k kek = kT (e)k ≤ kT kop .
Since s is a least upper bound,

s ≤ kT kop .

Now let x, y ∈ H be any two vectors. Then
$$\langle T(x+y), x+y\rangle = \langle T(x), x\rangle + \langle T(y), y\rangle + \langle T(x), y\rangle + \langle T(y), x\rangle = \langle T(x), x\rangle + \langle T(y), y\rangle + 2\,\mathrm{Re}\,\langle T(x), y\rangle,$$
where we used the fact that T is self-adjoint (so that $\langle T(y), x\rangle = \langle y, T(x)\rangle = \overline{\langle T(x), y\rangle}$). Similarly,
$$\langle T(x-y), x-y\rangle = \langle T(x), x\rangle + \langle T(y), y\rangle - 2\,\mathrm{Re}\,\langle T(x), y\rangle.$$
Subtracting these gives
$$4\,\mathrm{Re}\,\langle T(x), y\rangle = \langle T(x+y), x+y\rangle - \langle T(x-y), x-y\rangle \le |\langle T(x+y), x+y\rangle| + |\langle T(x-y), x-y\rangle|$$
$$\le s\|x+y\|^2 + s\|x-y\|^2 = 2s\|x\|^2 + 2s\|y\|^2,$$
where the last equality is the parallelogram identity.

Now let e be any unit vector, choose y = T (e) and x = kT (e)ke. Then
hT (x), yi = kT (e)k3 and kxk2 = kyk2 = kT (e)k2 , so the inequality says that

4kT (e)k3 ≤ 4skT (e)k2 .

If kT (e)k > 0 we can divide through to obtain

kT (e)k ≤ s.

If kT (e)k = 0 then it is still true that kT (e)k ≤ s, because s ≥ 0. Since this


inequality holds for all unit vectors e we conclude that

kT kop ≤ s.

Thus we have shown kT kop = s.

4.3 Hilbert-Schmidt operators
I begin this section by reminding you of some things you will have encountered
before in finite-dimensional linear algebra. Let A be an n × n matrix and let
T : Cn → Cn be the linear map given by T (x) = Ax ∀x ∈ Cn . Let
$$e_1 = \begin{pmatrix}1\\0\\\vdots\\0\end{pmatrix},\quad e_2 = \begin{pmatrix}0\\1\\\vdots\\0\end{pmatrix},\quad \dots,\quad e_n = \begin{pmatrix}0\\0\\\vdots\\1\end{pmatrix}$$
be the standard orthonormal basis for Cn . Then T (ej ) equals the jth column
of A, and
hT (ej ), ei i
equals the entry Aij of A in the ith row and jth column. Conversely, given
any linear map T : Cn → Cn , we can construct a square matrix A by setting
Aij = hT (ej ), ei i; it can then be shown that T (x) = Ax for every x ∈ Cn . Thus
every linear map from Cn to Cn can be described as multiplication by some
matrix.
In infinite-dimensional linear algebra the situation is different: not every
linear map can be described by a matrix. Those that can have a special name:
Definition 4.20. Let E be an inner product space with countable orthonormal
basis (en )n∈N and let T : E → E be a bounded linear operator. The matrix
elements of T with respect to this basis are the complex numbers

hT (ej ), ei i i, j ∈ N.

The Hilbert-Schmidt norm of T is
$$\|T\|_{HS} := \Big(\sum_{i,j=1}^{\infty} |\langle T(e_j), e_i\rangle|^2\Big)^{\frac12}.$$
T is called Hilbert-Schmidt if this sum converges and T is bounded. The space of all Hilbert-Schmidt operators is denoted HS(E, E).
Example 4.21. Let λ1 , λ2 , λ3 . . . ∈ C and let T : `2 (N) → `2 (N) be the linear
operator
T (x1 , x2 , x3 , . . .) = (λ1 x1 , λ2 x2 , λ3 x3 , . . .).
Let us choose the usual basis

e1 = (1, 0, 0, . . .)
e2 = (0, 1, 0, . . .)
..
.

56
Then T (ej ) = λj ej . So the matrix elements of T w.r.t. this basis are
$$\langle T(e_j), e_i\rangle = \lambda_j\langle e_j, e_i\rangle = \begin{cases} \lambda_j & j = i \\ 0 & j \ne i. \end{cases}$$
Suppose that the sequence (λn )n∈N is bounded, i.e. there exists an L > 0 such
that |λn | ≤ L ∀n ∈ N. We leave it as an exercise to show that kT (x)k ≤ Lkxk
for all x ∈ `2 (N), so T is bounded.
Now
$$\sum_{i,j=1}^{\infty} |\langle T(e_j), e_i\rangle|^2 = \sum_{j=1}^{\infty} |\lambda_j|^2.$$
Therefore T is Hilbert-Schmidt if and only if λ := (λ1 , λ2 , λ3 , . . .) belongs to ℓ2 (N). If T is Hilbert-Schmidt then
$$\|T\|_{HS} = \Big(\sum_{j=1}^{\infty} |\lambda_j|^2\Big)^{\frac12} = \|\lambda\|.$$
For example, if λj = 2−j/2 then $\|T\|_{HS} = \big(\sum_{j=1}^{\infty} 2^{-j}\big)^{\frac12} = 1$, using the formula for a geometric series.
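A tiny (non-examinable) check of the last claim in Python (my own sketch): truncating the diagonal to 60 terms already gives a Hilbert-Schmidt norm very close to 1.

import numpy as np

j = np.arange(1, 61)
lam = 2.0 ** (-j / 2)                     # lambda_j = 2^{-j/2}
print(np.sqrt(np.sum(np.abs(lam) ** 2)))  # ~ 1.0, since sum_{j>=1} 2^{-j} = 1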
Example 4.22. Consider C[−π, π] with its inner product $\langle f, g\rangle = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(t)\overline{g(t)}\,dt$ and orthonormal basis S = {en : n ∈ Z}, where en (t) = exp(int). Let
$$T(f) = \frac{df}{dt}.$$
Let us calculate the matrix elements of T with respect to this basis. We have
$$T(e_j)(t) = \frac{d}{dt}\exp(ijt) = ij\,e_j(t).$$
So the matrix elements are
$$\langle T(e_j), e_k\rangle = ij\,\langle e_j, e_k\rangle = \begin{cases} ij & j = k \\ 0 & j \ne k \end{cases}$$
for j, k ∈ Z. This operator is not Hilbert-Schmidt, because it is not even bounded.
Notice that the definition of a Hilbert-Schmidt operator involves a choice
of orthonormal basis. On the other hand, when doing linear algebra it is good
practice to make definitions which do not depend on any choice of basis. With
this in mind, I’ll now show that the definition of a Hilbert-Schmidt operator is
independent of the choice of basis. First I need:
Lemma 4.23. Let H be a Hilbert space and let (en )n∈N and (fn )n∈N be two orthonormal bases for H. Let T be a Hilbert-Schmidt operator with respect to the basis (en )n∈N . Then
$$\sum_{i,j=1}^{\infty} |\langle T(f_j), f_i\rangle|^2 = \sum_{i,j=1}^{\infty} |\langle T(e_j), e_i\rangle|^2.$$
Proof. By Parseval's identity, for any vector x ∈ H,
$$\sum_{i=1}^{\infty} |\langle f_i, x\rangle|^2 = \sum_{i=1}^{\infty} |\langle x, f_i\rangle|^2 = \|x\|^2 = \sum_{i=1}^{\infty} |\langle x, e_i\rangle|^2 = \sum_{i=1}^{\infty} |\langle e_i, x\rangle|^2.$$
I will prove the lemma by using this identity twice:
$$\sum_{j=1}^{\infty}\sum_{i=1}^{\infty} |\langle T(f_j), f_i\rangle|^2 = \sum_{j=1}^{\infty}\sum_{i=1}^{\infty} |\langle T(f_j), e_i\rangle|^2 = \sum_{i,j=1}^{\infty} |\langle f_j, T^*(e_i)\rangle|^2$$
$$= \sum_{i,j=1}^{\infty} |\langle e_j, T^*(e_i)\rangle|^2 = \sum_{i,j=1}^{\infty} |\langle T(e_j), e_i\rangle|^2.$$

Corollary 4.24. The property of being Hilbert-Schmidt, and the Hilbert-Schmidt norm, are independent of the choice of countable orthonormal basis.
Proof. Let (en )n∈N and (fn )n∈N be two orthonormal bases for an inner product space E and let T : E → E be a linear operator. Suppose that T is Hilbert-Schmidt with respect to (en )n∈N . Then by the lemma, $\sum_{i,j=1}^{\infty} |\langle T(f_j), f_i\rangle|^2$ converges to $\|T\|_{HS}^2$, so T is Hilbert-Schmidt with respect to (fn )n∈N with the same norm. Similarly, if T is Hilbert-Schmidt with respect to (fn )n∈N then $\sum_{i,j=1}^{\infty} |\langle T(e_j), e_i\rangle|^2$ converges to $\|T\|_{HS}^2$.
Recall that the rank of a linear operator T : E → F is the dimension of its
image, i.e.
rank(T ) = dim{T (x) : x ∈ E}.
An operator is said to have finite rank if its image is finite-dimensional.
Proposition 4.25. A bounded linear operator of finite rank on an inner product
space with countable orthonormal basis is Hilbert-Schmidt.
Proof. Let T : H → H be bounded and of finite rank. Using the Gram-Schmidt
process we may choose an orthonormal basis {f1 , . . . , fN } for the image of T .
This can be extended to a countable orthonormal basis (fn )n∈N for H (see
below). With respect to this basis, the matrix elements hT (fj ), fi i vanish if
i > N because T (fj ) is a linear combination of f1 , . . . , fN , all of which are
orthogonal to fi . Therefore
$$\sum_{i,j=1}^{\infty} |\langle T(f_j), f_i\rangle|^2 = \sum_{i=1}^{N}\sum_{j=1}^{\infty} |\langle T(f_j), f_i\rangle|^2 = \sum_{i=1}^{N}\sum_{j=1}^{\infty} |\langle f_j, T^*(f_i)\rangle|^2 = \sum_{i=1}^{N} \|T^*(f_i)\|^2,$$
where the last equality follows from Parseval's identity. Therefore the sum is finite and T is Hilbert-Schmidt.

In the proof I made use of the following:
Lemma. Let {f1 , . . . fN } be a set of N orthonormal vectors in a separable
Hilbert space H. Then there exists vectors fN +n for n ∈ N such that (fn )n∈N is
a countable orthonormal basis.
For completeness, here is a proof, which is not examinable.
Proof. Let (en )n∈N be any countable orthonormal basis for H. Let Fm =
sp{f1 , . . . , fN , e1 , . . . , em } and let gm be the orthogonal projection of em onto the
orthogonal complement of Fm−1 , i.e. gm = em − PFm−1 (em ). Note that by con-
struction Fm = sp{f1 , . . . , fN , g1 , . . . , gm }, and that the vectors f1 , . . . , fN , g1 , g2 , . . .
are orthogonal.
I claim that at most N of these vectors are zero. I prove this claim by contradiction: suppose that gm1 = gm2 = . . . = gmN+1 = 0E for some m1 < m2 < . . . < mN+1 = M . Consider the subspace FM = sp{f1 , . . . , fN , e1 , . . . , eM }. The assumption gM = 0E implies that eM ∈ FM−1 , so eM can be written as a linear combination of the remaining vectors and FM = sp{f1 , . . . , fN , e1 , . . . , eM−1 }. Similarly, each of the vectors emi can in turn be removed from the spanning set, so FM can be written as the span of N + M − (N + 1) = M − 1 vectors and dim FM ≤ M − 1. However, since e1 , . . . , eM are linearly independent we know that dim FM ≥ M , so we have reached a contradiction.
Given that only finitely many of the gn are zero, let (gk(n) )n∈N be the subsequence consisting of all non-zero vectors in the sequence. Let fN+n = gk(n) /kgk(n) k for n ∈ N. Then (fn )n∈N is an orthonormal sequence. I need to show this is a countable orthonormal basis. Let x ∈ H be any vector; I need to show that $x = \sum_{i=1}^{\infty}\langle x, f_i\rangle f_i$. Now f1 , . . . , fN+n is an orthonormal basis for Fk(n) , so
$$\Big\|x - \sum_{i=1}^{N+n}\langle x, f_i\rangle f_i\Big\| = \|x - P_{F_{k(n)}}(x)\| \le \Big\|x - \sum_{i=1}^{k(n)}\langle x, e_i\rangle e_i\Big\|,$$
where I have used the fact that e1 , . . . , ek(n) ∈ Fk(n) and PFk(n) (x) is the closest point in Fk(n) to x. Since (en )n∈N is an orthonormal basis the RHS tends to zero as n → ∞, so $x = \lim_{n\to\infty}\sum_{i=1}^{N+n}\langle x, f_i\rangle f_i$ by the squeeze rule.
Note that the way (fn )n∈N is constructed in this proof is essentially the
Gram-Schmidt process. The only differences are that there are infinitely-many
rather than finitely-many vectors, and there is an additional step of removing
zero-vectors.
This proposition gives many more examples of Hilbert-Schmidt operators:
for example, the orthogonal projection PF onto any finite-dimensional subspace
is Hilbert-Schmidt, and the composition of such a projection with any bounded
linear operator is also Hilbert-Schmidt.
So far I haven’t said very much about the Hilbert-Schmidt norm of a Hilbert-
Schmidt operator. The two most useful properties of the Hilbert-Schmidt norm
are described in the next two propositions. First, the Hilbert-Schmidt norm
gives us information about the operator norm:

Proposition 4.26. Let T : E → E be a Hilbert-Schmidt operator. Then
kT kop ≤ kT kHS .
Proof. In the proof we will need the Cauchy-Schwarz inequality for ℓ2 (N), which says that for any two sequences (yj )j∈N and (zj )j∈N of complex numbers such that $\sum_j |y_j|^2 < \infty$ and $\sum_j |z_j|^2 < \infty$,
$$\Big|\sum_{j=1}^{\infty} y_j\bar z_j\Big|^2 \le \Big(\sum_{j=1}^{\infty} |y_j|^2\Big)\Big(\sum_{j=1}^{\infty} |z_j|^2\Big).$$

Let (en )n∈N be an orthonormal basis for E and let x ∈ E be a unit vector. Then $x = \sum_{j=1}^{\infty} x_j e_j$ for some xj ∈ C. Then
$$\|T(x)\|^2 = \sum_{i=1}^{\infty} |\langle T(x), e_i\rangle|^2 \qquad\text{by Parseval's identity}$$
$$= \sum_{i=1}^{\infty} \Big|\Big\langle T\Big(\sum_{j=1}^{\infty} x_j e_j\Big), e_i\Big\rangle\Big|^2 = \sum_{i=1}^{\infty} \Big|\sum_{j=1}^{\infty} x_j\langle T(e_j), e_i\rangle\Big|^2 \qquad\text{by continuity of } T \text{ and } \langle\cdot,\cdot\rangle$$
$$\le \sum_{i=1}^{\infty} \Big(\sum_{k=1}^{\infty} |x_k|^2\Big)\Big(\sum_{j=1}^{\infty} |\langle T(e_j), e_i\rangle|^2\Big) \qquad\text{by Cauchy-Schwarz}$$
$$= \|T\|_{HS}^2 \qquad\text{since } \|x\|^2 = 1.$$

Therefore kT kHS is an upper bound on kT (x)k over unit vectors x, and since kT kop is the least upper bound it follows that kT kop ≤ kT kHS .
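In finite dimensions this inequality is the familiar fact that the largest singular value is at most the Frobenius norm; a one-line (non-examinable) check in Python, my own sketch:

import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))

op_norm = np.linalg.norm(A, 2)        # operator norm (largest singular value)
hs_norm = np.linalg.norm(A, 'fro')    # Hilbert-Schmidt (Frobenius) norm
print(op_norm <= hs_norm + 1e-12)     # True: ||A||_op <= ||A||_HS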

The second proposition says that kT kHS is a norm (in fact, it says a little
more):
Proposition 4.27. Let H be a separable Hilbert space with countable orthonor-
mal basis (en )n∈N . Then HS(H, H) is an inner product space, with inner prod-
uct given by

X
hS, T iHS = hS(ei ), T (ei )i.
i=1

The norm induced by this inner product equals the Hilbert-Schmidt norm kT kHS .

4.4 Compact operators


I’ll begin this section with a few reminders. Given a sequence (xn ) in a vector
space E, a subsequence is a sequence (yn ) = (xk(n) ), where k : N → N is a

60
strictly increasing function (n > m =⇒ k(n) > k(m)). For example, the
function k(n) = 2n picks out the subsequence of even terms: x2 , x4 , x6 , . . ..
Note that if (xn ) converges to x then any subsequence of (xn ) also converges to
x. Another reminder is:
Definition 4.28. A sequence (xn ) in a normed vector space E is called bounded if there exists a constant M > 0 such that kxn k ≤ M ∀n ∈ N.
You have seen both of these notions before in the Bolzano-Weierstrass the-
orem, which says that every bounded sequence in R (or C) has a convergent
subsequence. There is a generalisation of this theorem to finite-dimensional
inner product spaces:

Proposition 4.29. Every bounded sequence in a finite-dimensional inner product space has a convergent subsequence.
Proof. Let (xn ) be a sequence in a finite-dimensional inner product space E and suppose there exists M > 0 such that kxn k ≤ M ∀n ∈ N. Let e1 , . . . , em be an orthonormal basis for E, and write
$$x_n = \sum_{j=1}^{m} \lambda_n^j e_j$$
for some λjn ∈ C, where n ∈ N and 1 ≤ j ≤ m.
Consider the sequence (λ1n ) in C. Since the sequence (xn ) is bounded,
$$|\lambda_n^1|^2 \le \sum_{j=1}^{m} |\lambda_n^j|^2 = \|x_n\|^2 \le M^2$$
for all n ∈ N. So |λ1n | ≤ M and (λ1n ) is a bounded sequence in C. By the Bolzano-Weierstrass theorem it has a subsequence (λ1k1 (n) ) that converges to a limit λ1 .
Now consider the sequence (λ2k1 (n) ) in C. As above one can show that |λ2k1 (n) | ≤ M so this sequence is bounded. By the Bolzano-Weierstrass theorem, it has a subsequence (λ2k2 (n) ) that converges to a limit λ2 .
By repeating this process, we end up with a subsequence (λm km (n) ) that converges to a limit λm . Note also that λjkm (n) → λj for any j ≤ m, because this sequence is a subsequence of (λjkj (n) ). I claim now that $x_{k_m(n)} \to x = \sum_{j=1}^{m} \lambda^j e_j$. This follows from:
$$\|x_{k_m(n)} - x\| = \Big\|\sum_{j=1}^{m} (\lambda_{k_m(n)}^j - \lambda^j)e_j\Big\| \le \Big(\sum_{j=1}^{m} |\lambda_{k_m(n)}^j - \lambda^j|^2\Big)^{\frac12} \to 0.$$

Note that there is no such theorem in infinite-dimensional normed vector
spaces.
Definition 4.30. Let H be a Hilbert space and let T : H → H be a bounded
linear operator. Then T is called compact if, for every bounded sequence (xn )
in H, the image (T (xn )) has a convergent subsequence.
Example 4.31. The identity operator on an infinite-dimensional Hilbert space
is not compact.
Recall that the identity operator is IH : H → H such that IH (x) = x
∀x ∈ H. To show that it’s not compact I need to give an example of a bounded
sequence whose image has no convergent subsequence.
Let (en )n∈N be an orthonormal sequence, i.e. a sequence such that hen , en i =
1 and hen , em i = 0 if n 6= m. Such a sequence exists because H is infinite-
dimensional. Then the sequence (en ) is bounded. Its image is the same se-
quence (IH (en )) = (en ). I will show by contradiction that this sequence has no
convergent subsequence. Let (ek(n) ) be any subsequence. Then for n ≠ m,
$$\|e_{k(n)} - e_{k(m)}\| = \big(\langle e_{k(n)}, e_{k(n)}\rangle + \langle e_{k(m)}, e_{k(m)}\rangle - \langle e_{k(n)}, e_{k(m)}\rangle - \langle e_{k(m)}, e_{k(n)}\rangle\big)^{\frac12} = \sqrt{2}.$$
This subsequence is not Cauchy, since if ε = √2/2 there is no N such that n, m > N =⇒ kek(n) − ek(m) k < ε. Since it is not Cauchy, it is also not convergent. Therefore the sequence has no convergent subsequence.
Example 4.32. Any bounded linear operator of finite rank is compact.
To prove this I’ll make use of the Bolzano-Weierstrass theorem. Let T :
H → H be a bounded linear operator of finite rank and let (xn ) be a bounded
sequence in H. Then there exists an M > 0 such that kxn k < M ∀n ∈ N.
Therefore
kT (xn )k ≤ kT kop kxn k ≤ kT kop M ∀n ∈ N.
So (T (xn )) is a bounded sequence. Also, T (xn ) is a sequence in the image of T ,
which is finite-dimensional (since T has finite rank). Thus T (xn ) is a bounded
sequence in a finite-dimensional vector space, so by proposition 4.29 it has a
convergent subsequence. Therefore T is compact.
Proposition 4.33. Let (Tn ) be a sequence of compact bounded linear operators
on a Hilbert space H and suppose that T is a bounded linear operator on H such
that Tn → T with respect to the operator norm. Then T is compact.
Note: saying Tn → T w.r.t. the operator norm means that kTn − T kop → 0.
Proof. Let (xn ) be a bounded sequence in H. We must find a convergent sub-
sequence of T (xn ).
Since T1 is compact and (xn ) is bounded the sequence (T1 (xn )) has a subse-
quence (T1 (xk1 (n) )) which converges to a limit y1 ∈ H. Since T2 is compact and
(xk1 (n) ) is bounded the sequence (T2 (xk1 (n) )) has a subsequence (T2 (xk2 (n) ))
which converges to a limit $y_2 \in H$. By repeating this argument, we find subsequences $(x_{k_j(n)})$ and vectors $y_j \in H$ such that

• (xkj+1 (n) ) is a subsequence of (xkj (n) ) and
• Tj (xkj (n) ) converges to yj as n → ∞.
Let’s arrange these sequences in a big table, with the j-th sequence in the j-th
row:
x_{k_1(1)}  x_{k_1(2)}  x_{k_1(3)}  ...
x_{k_2(1)}  x_{k_2(2)}  x_{k_2(3)}  ...
x_{k_3(1)}  x_{k_3(2)}  x_{k_3(3)}  ...
    ...         ...         ...
x_{k_j(1)}  x_{k_j(2)}  x_{k_j(3)}  ...
    ...         ...         ...
Now let's choose the sequence given by the diagonal entries: $(z_n) = (x_{k_n(n)})$. Then, for any $j \in \mathbb{N}$, $T_j(z_n) \to y_j$ as $n \to \infty$. This is because, if we ignore the first $j - 1$ terms, $(T_j(z_n))$ is a subsequence of $(T_j(x_{k_j(n)}))$.
I claim that $(T(z_n))$ is a Cauchy sequence. To show this, let $\varepsilon$ be any positive real number. Consider
$$\|T(z_n) - T(z_m)\| \le \|T(z_n) - T_l(z_n)\| + \|T_l(z_n) - T_l(z_m)\| + \|T_l(z_m) - T(z_m)\|$$
$$\le \|T - T_l\|_{op} \|z_n\| + \|T_l(z_n) - T_l(z_m)\| + \|T - T_l\|_{op} \|z_m\|$$
$$\le 2M \|T - T_l\|_{op} + \|T_l(z_n) - T_l(z_m)\|,$$
where $M > 0$ is such that $\|x_n\| \le M$ $\forall n \in \mathbb{N}$. I choose $l \in \mathbb{N}$ such that $\|T - T_l\|_{op} < \varepsilon/4M$. Having fixed this $l$, I use the fact that $(T_l(z_n))$ is convergent and hence Cauchy to choose $N \in \mathbb{N}$ such that $n, m > N \implies \|T_l(z_n) - T_l(z_m)\| < \varepsilon/2$. Then we have
$$n, m > N \implies \|T(z_n) - T(z_m)\| < 2M \cdot \frac{\varepsilon}{4M} + \frac{\varepsilon}{2} = \varepsilon.$$
Since (T (zn )) is Cauchy and H is complete, (T (zn )) is convergent. Therefore
(T (xn )) has a convergent subsequence (T (zn )).

Corollary 4.34. Any Hilbert-Schmidt operator is compact.


Proof. Let $T : H \to H$ be a Hilbert-Schmidt operator on a Hilbert space $H$ with countable orthonormal basis $(e_n)_{n \in \mathbb{N}}$. For any $n \in \mathbb{N}$, let $T_n : H \to H$ be the linear operator
$$T_n(x) = \sum_{j=1}^{n} \langle T(x), e_j \rangle e_j.$$

In other words, Tn (x) = Pn (T (x)), where Pn : H → H is the projection operator


onto Fn = sp{e1 , . . . , en }. Then Tn is bounded (because it is a composition of
bounded operators) and of finite rank (because its image is contained in the
finite-dimensional vector space Fn ). Therefore Tn is compact.

I claim that $T_n \to T$ in both the operator norm and the Hilbert-Schmidt norm. Consider first the Hilbert-Schmidt norm. The matrix elements of $T - T_n$ are
$$\langle T(e_j) - T_n(e_j), e_i \rangle = \langle T(e_j), e_i \rangle - \left\langle \sum_{k=1}^{n} \langle T(e_j), e_k \rangle e_k, e_i \right\rangle = \begin{cases} 0 & i \le n \\ \langle T(e_j), e_i \rangle & i > n. \end{cases}$$
Therefore
$$\|T - T_n\|_{HS}^2 = \sum_{i=n+1}^{\infty} \sum_{j=1}^{\infty} |\langle T(e_j), e_i \rangle|^2 = \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} |\langle T(e_j), e_i \rangle|^2 - \sum_{i=1}^{n} \sum_{j=1}^{\infty} |\langle T(e_j), e_i \rangle|^2 \to 0 \quad \text{as } n \to \infty,$$
because the full double sum is finite ($T$ is Hilbert-Schmidt) and the subtracted partial sums converge to it.
Now kT − Tn kop ≤ kT − Tn kHS , by proposition 4.26. Therefore kT − Tn kop → 0
as n → ∞.
We have shown that T is a limit of a sequence of compact operators. There-
fore T is compact by proposition 4.33.
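Here is a numerical sketch of this corollary (the matrix entries $1/(ij)$ are an invented Hilbert-Schmidt example, and the $200 \times 200$ truncation stands in for $\ell^2$): $T_n = P_n T$ zeroes all but the first $n$ output coordinates, and the Hilbert-Schmidt norm of $T - T_n$, which is the Frobenius norm of the matrix, shrinks to 0.

    import numpy as np

    # A "matrix" with sum of |t_ij|^2 finite, i.e. a Hilbert-Schmidt operator.
    N = 200
    i = np.arange(1, N + 1)
    T = 1.0 / np.outer(i, i)

    for n in [5, 20, 80]:
        Tn = T.copy()
        Tn[n:, :] = 0.0                    # project the output onto sp{e_1,...,e_n}
        print(n, np.linalg.norm(T - Tn, 'fro'))   # decreasing towards 0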

4.5 The spectrum of a bounded linear operator


Definition 4.35. Let S, T : E → E be two linear operators on a normed vector
space E. If T S = IE and ST = IE we say that S is an inverse of T and write
S = T −1 . If S is a bounded linear operator we say that T has a bounded inverse.
Definition 4.36. Let T : E → E be a bounded linear operator on a normed
vector space E. The spectrum of T is the set
σ(T ) = {λ ∈ C : T − λIE has no bounded inverse}.
The resolvent set of T is
ρ(T ) = C \ σ(T ) = {λ ∈ C : T − λIE has a bounded inverse}.
Proposition 4.37. All eigenvalues of a bounded linear operator belong to its spectrum.
Proof. Suppose that T : E → E is a bounded linear operator and v ∈ E is an
eigenvector of T with eigenvalue λ. I need to show that λ ∈ σ(T ). To show this,
I need to show that T − λIE has no bounded inverse. Suppose for contradiction
that it does. Then
v = (T − λIE )−1 ((T − λIE )(v)) = (T − λIE )−1 (0E ) = 0E ,
which contradicts the fact that eigenvectors must be non-zero.

Example 4.38. If $E$ is finite-dimensional and $T : E \to E$ is a bounded linear operator, then $\lambda \in \sigma(T)$ if and only if $\lambda$ is an eigenvalue of $T$. I've already shown that if $\lambda$ is an eigenvalue then $\lambda \in \sigma(T)$. Suppose that $\lambda$ is not an eigenvalue; I must show that $\lambda \notin \sigma(T)$. Since $\lambda$ is not an eigenvalue there are no non-zero vectors $v$ such that $T(v) = \lambda v$, i.e.
$$\ker(T - \lambda I_E) = \{v \in E : (T - \lambda I_E)(v) = 0_E\} = \{0_E\}.$$
Therefore $\operatorname{nullity}(T - \lambda I_E) = \dim \ker(T - \lambda I_E) = 0$ and $T - \lambda I_E$ is injective. By the rank-nullity theorem,
$$\dim \operatorname{im}(T - \lambda I_E) = \operatorname{rank}(T - \lambda I_E) = \dim(E) - \operatorname{nullity}(T - \lambda I_E) = \dim(E),$$
so $\operatorname{im}(T - \lambda I_E) = E$ and $T - \lambda I_E$ is surjective. Since $T - \lambda I_E$ is a bijection it has an inverse $(T - \lambda I_E)^{-1}$, and since this linear operator has finite-dimensional domain it is bounded.
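The dichotomy in this example can be seen numerically. In the following sketch (the random $5 \times 5$ complex matrix is an invented stand-in for $T$), $T - \lambda I$ is singular, i.e. has no bounded inverse, exactly when $\lambda$ is an eigenvalue.

    import numpy as np

    rng = np.random.default_rng(0)
    T = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
    lam = np.linalg.eigvals(T)[0]          # an eigenvalue, so in the spectrum

    print(np.linalg.cond(T - lam * np.eye(5)))        # enormous: singular
    # lam + 1 is generically not an eigenvalue of a random matrix:
    print(np.linalg.cond(T - (lam + 1) * np.eye(5)))  # moderate: invertible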
These two examples show that the spectrum of a bounded linear operator is
very closely related to its set of eigenvalues. Eigenvalues have many important
applications. For example, recall that our earlier discussion of the vibrating
string revolved around the functions sin(nx), with n ∈ N. These are eigenvectors
of the operator $y \mapsto d^2y/dx^2$ with eigenvalue $-n^2$. Similarly, if one wants to study a vibrating surface (such as the skin of a drum, or a metal plate) it is important to know about the eigenvalues and eigenvectors of the appropriate
operator.
In the theory of Hilbert spaces one tends to study the spectrum of an oper-
ator rather than its set of eigenvalues. As we shall see, the spectrum is “better-
behaved” than the set of eigenvalues, i.e. the spectrum has lots of nice properties
which are not shared by the set of eigenvalues. We will see some of those prop-
erties in a moment, but in order to derive those properties I first need to show
you some preliminary results. The proofs of the next two lemmas are left as
exercises.
Lemma 4.39. If S, T : E → E are two bounded linear operators on a normed
vector space E then ST is a bounded linear operator and kST kop ≤ kSkop kT kop .
Lemma 4.40. If E is a normed vector space and (Sn ) and (Tn ) are two conver-
gent sequences in B(E, E) such that Sn → S ∈ B(E, E) and Tn → T ∈ B(E, E)
then Sn Tn → ST .
I can use these to prove:
Proposition 4.41. Let A : H → H be a bounded operator on a Hilbert space
such that $\|A\|_{op} < 1$. Then $I_H - A$ has a bounded inverse given by
$$\sum_{n=0}^{\infty} A^n.$$

Proof. For any $n \in \mathbb{N}$ let
$$B_n = \sum_{j=0}^{n} A^j = I_H + A + A^2 + \ldots + A^n.$$

I claim that $(B_n)$ is a Cauchy sequence in $B(H, H)$. Let $r = \|A\|_{op} < 1$. Then for $n > m$,
$$\|B_n - B_m\|_{op} = \left\| \sum_{j=m+1}^{n} A^j \right\|_{op} \le \sum_{j=m+1}^{n} \|A^j\|_{op} \le \sum_{j=m+1}^{n} \|A\|_{op}^j = \frac{r^{m+1} - r^{n+1}}{1 - r} \le \frac{r^{m+1}}{1 - r}.$$
Note that the RHS tends to zero as $m \to \infty$ because $r < 1$. Therefore, given $\varepsilon > 0$ there is an $N \in \mathbb{N}$ such that $m > N \implies r^{m+1}/(1 - r) < \varepsilon$. Then $n > m > N \implies \|B_n - B_m\|_{op} < \varepsilon$, so the sequence $(B_n)$ is Cauchy. Since $H$ is complete, $B(H, H)$ is complete by proposition 4.10 and hence $B_n$ converges to a limit $B \in B(H, H)$.
Now I'll show that $B$ is an inverse of $I_H - A$. Notice that
$$(I_H - A)B_n = I_H - A^{n+1} = B_n(I_H - A).$$
Since $\|A^{n+1}\|_{op} \le r^{n+1} \to 0$ as $n \to \infty$,
$$(I_H - A)B = \lim_{n \to \infty} (I_H - A)B_n = I_H = \lim_{n \to \infty} B_n(I_H - A) = B(I_H - A).$$
So $B$ is a bounded inverse of $I_H - A$.
Note that the series $\sum_{j=0}^{\infty} A^j$ for $(I_H - A)^{-1}$ is very similar to the Taylor series $(1 - z)^{-1} = 1 + z + z^2 + \ldots$. This observation might help you remember the series!
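As a finite-dimensional numerical sketch of proposition 4.41 (the $4 \times 4$ random matrix and the rescaling to norm 0.9 are invented for illustration): when $\|A\|_{op} < 1$, the partial sums $B_n = I + A + \ldots + A^n$ converge to the inverse of $I - A$.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((4, 4))
    A *= 0.9 / np.linalg.norm(A, 2)        # rescale so that ||A||_op = 0.9 < 1

    exact = np.linalg.inv(np.eye(4) - A)
    B = np.eye(4)                          # B_0 = I
    power = np.eye(4)
    for n in range(1, 200):
        power = power @ A                  # power = A^n
        B += power                         # B_n = B_{n-1} + A^n
    print(np.linalg.norm(B - exact, 2))    # essentially 0: the series converged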
The next theorem outlines some of the important properties of the spectrum.
Theorem 4.42. Let T : H → H be a bounded linear operator on a Hilbert space
H. Then
(i) the spectrum of $T$ is contained in the closed disc of radius $\|T\|_{op}$:
$$\sigma(T) \subseteq \{\lambda \in \mathbb{C} : |\lambda| \le \|T\|_{op}\};$$

(ii) σ(T ) is closed;


(iii) σ(T ) is non-empty.
Proof. For part (i), I will suppose that $|\lambda| > \|T\|_{op}$ and show that $T - \lambda I_H$ has a bounded inverse (and hence that $\lambda \notin \sigma(T)$). Notice that the operator $\lambda^{-1}T$ has norm $\|\lambda^{-1}T\|_{op} = \|T\|_{op}/|\lambda| < 1$. So by proposition 4.41 $I_H - \lambda^{-1}T$ has a bounded inverse $B$. It follows that $-\lambda^{-1}B$ is a bounded inverse of $T - \lambda I_H$, because
$$(-\lambda^{-1}B)(T - \lambda I_H) = B(-\lambda^{-1}T + I_H) = I_H \quad \text{and} \quad (T - \lambda I_H)(-\lambda^{-1}B) = (-\lambda^{-1}T + I_H)B = I_H.$$

To prove part (ii) I will first show that if $\lambda \in \rho(T)$ then there exists $\varepsilon > 0$ such that
$$|\mu - \lambda| < \varepsilon \implies \mu \in \rho(T)$$
(if you are studying topology or metric spaces, you will recognise that this is the same as saying that $\rho(T)$ is an open subset of $\mathbb{C}$). To prove my claim, suppose that $\lambda \in \rho(T)$. Then for any $\mu \in \mathbb{C}$,
$$T - \mu I_H = (T - \lambda I_H) + (\lambda - \mu)I_H = \left( I_H + (\lambda - \mu)(T - \lambda I_H)^{-1} \right)(T - \lambda I_H).$$
Thus $T - \mu I_H$ will be invertible if $I_H - A$ is invertible, where $A := (\mu - \lambda)(T - \lambda I_H)^{-1}$. By proposition 4.41, $I_H - A$ is invertible if $\|A\|_{op} < 1$. Now
$$\|A\|_{op} = |\mu - \lambda| \, \|(T - \lambda I_H)^{-1}\|_{op},$$
so $\|A\|_{op} < 1$ if $|\mu - \lambda| < \varepsilon := 1/\|(T - \lambda I_H)^{-1}\|_{op}$. Thus we have shown that if $|\mu - \lambda| < \varepsilon$ then $\mu \in \rho(T)$.
Having proved the claim, I now show that $\sigma(T)$ is closed. This means that every convergent sequence $(\lambda_n)$ in $\sigma(T)$ has its limit in $\sigma(T)$. Suppose for contradiction that $(\lambda_n)$ is a sequence in $\sigma(T)$ such that $\lambda_n \to \lambda$ and $\lambda \notin \sigma(T)$. Then there exists an $\varepsilon > 0$ such that $|\mu - \lambda| < \varepsilon \implies \mu \notin \sigma(T)$. Since $\lambda_n \to \lambda$ there exists $N \in \mathbb{N}$ such that $n > N \implies |\lambda_n - \lambda| < \varepsilon$. Then we have that $n > N \implies \lambda_n \notin \sigma(T)$, which contradicts the statement that $(\lambda_n)$ is a sequence in $\sigma(T)$. So $\sigma(T)$ is closed.
The proof of part (iii) is omitted – it involves a generalisation of Liouville’s
theorem from complex analysis, so is beyond our scope.
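Part (i) is easy to check numerically. In this sketch (the random complex matrix is an invented example, not from the notes) every eigenvalue, and hence every spectral point in finite dimensions, lies in the closed disc of radius $\|T\|_{op}$, the largest singular value.

    import numpy as np

    rng = np.random.default_rng(2)
    T = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
    op_norm = np.linalg.norm(T, 2)         # operator norm = largest singular value
    print(max(abs(np.linalg.eigvals(T))) <= op_norm + 1e-12)   # True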
The next proposition is another example of a nice property of the spectrum.
Proposition 4.43. Let T ∈ B(H, H) be a bounded linear operator on a Hilbert
space H. Then λ ∈ σ(T ) ⇐⇒ λ̄ ∈ σ(T ∗ ).
Proof. Suppose that $\lambda \in \rho(T)$, i.e. that $(T - \lambda I_H)^{-1}$ exists. Then taking the adjoint of the LHS and RHS of
$$I_H = (T - \lambda I_H)(T - \lambda I_H)^{-1}$$
gives
$$I_H = \left( (T - \lambda I_H)^{-1} \right)^* (T^* - \bar{\lambda} I_H).$$
Similarly, taking the adjoint of
$$I_H = (T - \lambda I_H)^{-1}(T - \lambda I_H)$$
gives
$$I_H = (T^* - \bar{\lambda} I_H) \left( (T - \lambda I_H)^{-1} \right)^*.$$
So $T^* - \bar{\lambda} I_H$ is invertible with inverse $\left( (T - \lambda I_H)^{-1} \right)^*$, and $\bar{\lambda} \in \rho(T^*)$. By a similar argument, $\bar{\lambda} \in \rho(T^*)$ implies that $\lambda \in \rho(T)$. So we have shown $\bar{\lambda} \in \rho(T^*)$ $\iff$ $\lambda \in \rho(T)$, which is equivalent to $\bar{\lambda} \in \sigma(T^*)$ $\iff$ $\lambda \in \sigma(T)$.
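A finite-dimensional sketch of this proposition (the random complex matrix is an invented example; in $\mathbb{C}^n$ with the standard inner product the adjoint is the conjugate transpose): the eigenvalues of $T^*$ are exactly the conjugates of the eigenvalues of $T$.

    import numpy as np

    rng = np.random.default_rng(3)
    T = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
    spec_T = np.linalg.eigvals(T)
    spec_adj = np.linalg.eigvals(T.conj().T)   # spectrum of the adjoint T*
    print(np.allclose(np.sort_complex(np.conj(spec_T)),
                      np.sort_complex(spec_adj)))   # True, up to reordering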

Example 4.44. Consider the left- and right-shift operators L : `2 (N) → `2 (N),
R : `2 (N) → `2 (N) defined by

L(x1 , x2 , x3 , . . .) = (x2 , x3 , x4 , . . .)
R(x1 , x2 , x3 , . . .) = (0, x1 , x2 , . . .).

We’ve already seen that these are bounded, with kLkop = kRkop = 1, and that
L = R∗ and R = L∗ . Let’s work out what their spectra are. Since eigenvalues
always belong to the spectrum, we’ll find their eigenvalues first.
Let $\lambda \in \mathbb{C}$, and let
$$x = (1, \lambda, \lambda^2, \ldots).$$
Then $x \in \ell^2(\mathbb{N})$ if $|\lambda| < 1$, because in that case $\sum_{n=0}^{\infty} |\lambda|^{2n}$ is a convergent geometric series. Moreover,
$$L(x) = (\lambda, \lambda^2, \lambda^3, \ldots) = \lambda x,$$
so $x$ is an eigenvector of $L$ with eigenvalue $\lambda$. Thus

{λ ∈ C : |λ| < 1} ⊂ σ(L).

On the other hand, R has no eigenvectors. To show this, suppose that x ∈ `2 (N)
satisfies
λ(x1 , x2 , x3 , . . .) = R(x1 , x2 , x3 , . . .) = (0, x1 , x2 , . . .).
If λ = 0 this clearly implies that xn = 0 ∀n ∈ N. If λ 6= 0 then the equation
λx1 = 0 implies that x1 = 0, the equation λx2 = x1 in turn implies that x2 = 0,
and so on, so that again xn = 0 ∀n ∈ N. Thus x has to be zero, so cannot be
an eigenvector of R.
Now let’s consider the spectra of L and R. By theorem 4.42 and our calcu-
lation of eigenvalues we know that

{λ ∈ C : |λ| < 1} ⊆ σ(L) ⊆ {λ ∈ C : |λ| ≤ 1}.

Thus it remains to determine whether complex numbers of modulus 1 belong to $\sigma(L)$. Suppose that $|\lambda| = 1$. Then $\lambda$ is the limit of the sequence $\lambda_n = \frac{n}{n+1}\lambda$. Moreover, $\lambda_n \in \sigma(L)$ $\forall n \in \mathbb{N}$ because $|\lambda_n| < 1$. Since $\sigma(L)$ is closed, $\lambda \in \sigma(L)$. Therefore
$$\sigma(L) = \{\lambda \in \mathbb{C} : |\lambda| \le 1\}.$$
Then, by proposition 4.43,

σ(R) = {λ ∈ C : λ̄ ∈ σ(L)} = {λ ∈ C : |λ| ≤ 1}.

Thus $\sigma(R)$ is uncountably infinite, even though $R$ has no eigenvalues. This is very different to the finite-dimensional situation, where the spectrum equals the set of all eigenvalues.
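Here is a numerical illustration of this example (the truncation sizes and the value of $\lambda$ are arbitrary choices, and truncations of $\ell^2(\mathbb{N})$ are only an approximation). First, $x = (1, \lambda, \lambda^2, \ldots)$ is an eigenvector of $L$, up to a truncation error of size $|\lambda|^N$; second, any finite truncation of $R$ is nilpotent, so its only matrix eigenvalue is 0, even though $\sigma(R)$ is the whole closed unit disc.

    import numpy as np

    N, lam = 400, 0.5 + 0.3j               # |lam| < 1
    x = lam ** np.arange(N)                # (1, lam, lam^2, ...)
    Lx = np.append(x[1:], 0.0)             # left shift of the truncated vector
    print(np.linalg.norm(Lx - lam * x))    # ~ 0: the error is |lam|^N

    R = np.diag(np.ones(7), k=-1)          # 8x8 truncated right-shift matrix
    print(np.allclose(np.linalg.matrix_power(R, 8), 0))   # True: R^8 = 0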

4.6 The spectral theorem
In this section we will study in detail the eigenvalues and spectrum of operators
which are both compact and self-adjoint. I’ll begin with an example, but to set
the example up I need a lemma (whose proof is left as an exercise).
Lemma 4.45. Let $H$ be a Hilbert space, let $(e_n)$ be an orthonormal sequence in $H$, and let $(\lambda_n)$ be a bounded sequence in $\mathbb{C}$. Then
$$T(x) = \sum_{n=1}^{\infty} \lambda_n \langle x, e_n \rangle e_n$$
defines a bounded linear operator $T : H \to H$.


Example 4.46. Let $H$ be a Hilbert space, let $(e_n)$ be an orthonormal sequence in $H$, and let $(\lambda_n)$ be a sequence of real numbers such that $|\lambda_{n+1}| \le |\lambda_n|$ $\forall n \in \mathbb{N}$ and $\lambda_n \to 0$. Let $T : H \to H$ be the operator such that
$$T(x) = \sum_{n=1}^{\infty} \lambda_n \langle x, e_n \rangle e_n.$$
Then $T$ is a bounded linear operator, by lemma 4.45. $T$ is also self-adjoint, because the $\lambda_n$ are real:
$$\langle T(x), y \rangle = \left\langle \sum_{n=1}^{\infty} \lambda_n \langle x, e_n \rangle e_n, y \right\rangle = \sum_{n=1}^{\infty} \lambda_n \langle x, e_n \rangle \langle e_n, y \rangle = \left\langle x, \sum_{n=1}^{\infty} \lambda_n \langle y, e_n \rangle e_n \right\rangle = \langle x, T(y) \rangle.$$
Finally, $T$ is compact, because it is the limit of the sequence of compact operators
$$T_n(x) = \sum_{j=1}^{n} \lambda_j \langle x, e_j \rangle e_j.$$
Indeed,
$$\|(T - T_n)(x)\|^2 = \sum_{j=n+1}^{\infty} |\lambda_j \langle x, e_j \rangle|^2 \le \lambda_{n+1}^2 \sum_{j=n+1}^{\infty} |\langle x, e_j \rangle|^2 \le \lambda_{n+1}^2 \|x\|^2,$$
so $\|T - T_n\|_{op} \le |\lambda_{n+1}| \to 0$.
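A numerical sketch of this example (the eigenvalues $1/n$ and the truncation size 100 are invented choices): $T$ is represented by the diagonal matrix with entries $\lambda_n = 1/n$, and the finite-rank truncations $T_n$ approximate $T$ in operator norm with error exactly $|\lambda_{n+1}|$.

    import numpy as np

    N = 100
    lam = 1.0 / np.arange(1, N + 1)        # lambda_n = 1/n, decreasing to 0
    T = np.diag(lam)                       # the e_n are the standard basis vectors

    for n in [3, 10, 30]:
        Tn = T.copy()
        Tn[n:, n:] = 0.0                   # keep only the first n terms of the sum
        print(n, np.linalg.norm(T - Tn, 2), lam[n])   # the last two values agree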
We will see later that every compact self-adjoint operator on a Hilbert space
is of this type. This is an important result – it is an infinite-dimensional analog
of diagonalising a matrix. Just as diagonalising a matrix makes it easier to do
calculations and prove theorems, so too with operators.
Notice that the operator in the example has eigenvalues $\lambda_1, \lambda_2, \ldots$ (because $T(e_n) = \lambda_n e_n$). As we will see later, this information tells us exactly what its spectrum is.

Proposition 4.47. Let T : H → H be a compact self-adjoint operator on a
Hilbert space H. Then T has an eigenvalue λ = ±kT kop .
Proof. In the case kT kop = 0, we have T = 0H,H so 0 is certainly an eigenvalue
of T .
Consider then the case $\|T\|_{op} \ne 0$. By proposition 4.19,
$$\|T\|_{op} = \sup\{|\langle T(x), x \rangle| : x \in H, \|x\| = 1\}.$$
Therefore for each $n \in \mathbb{N}$ there exists a unit vector $e_n$ such that $\|T\|_{op} - \frac{1}{n} < |\langle T(e_n), e_n \rangle| \le \|T\|_{op}$. Since $T$ is self-adjoint, each $\langle T(e_n), e_n \rangle$ is real, so this bounded sequence of real numbers must have a subsequence indexed by $(e_{k(n)})$ such that
$$\langle T(e_{k(n)}), e_{k(n)} \rangle \to \lambda,$$
where $\lambda = \pm\|T\|_{op}$.
where λ = ±kT kop . Since T is compact and the subsequence (ek(n) ) is bounded,
this subsequence must have a subsequence (em(n) ) such that T (em(n) ) converges
to a limit z. Thus we have that

hT (em(n) ), em(n) i → λ = ±kT kop and T (em(n) ) → z.

It follows that
$$\|T(e_{m(n)}) - \lambda e_{m(n)}\|^2 = \|T(e_{m(n)})\|^2 + \lambda^2 \|e_{m(n)}\|^2 - 2\lambda \langle T(e_{m(n)}), e_{m(n)} \rangle \le 2\lambda^2 - 2\lambda \langle T(e_{m(n)}), e_{m(n)} \rangle \to 0 \quad \text{as } n \to \infty,$$

so T (em(n) ) − λem(n) → 0. Since in addition T (em(n) ) → z, we deduce that

$$\lambda e_{m(n)} \to z.$$
It follows that
$$T(z) = \lim_{n \to \infty} \lambda T(e_{m(n)}) = \lambda z.$$
Moreover $\|z\| = \lim_{n \to \infty} \|\lambda e_{m(n)}\| = |\lambda| = \|T\|_{op} \ne 0$, so $z$ is an eigenvector of $T$ with eigenvalue $\lambda$.
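This proposition is easy to check for matrices. In the following sketch (the random symmetric matrix is an invented example; real symmetric matrices are self-adjoint, and in finite dimensions every bounded operator is compact) the operator norm is attained as the modulus of an eigenvalue.

    import numpy as np

    rng = np.random.default_rng(4)
    A = rng.standard_normal((6, 6))
    T = (A + A.T) / 2                      # symmetric, hence self-adjoint

    op_norm = np.linalg.norm(T, 2)
    eigs = np.linalg.eigvalsh(T)           # real eigenvalues of a symmetric matrix
    print(op_norm, max(abs(eigs)))         # equal: +-||T||_op is an eigenvalue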
Theorem 4.48 (Spectral theorem). Let T be a non-zero compact self-adjoint
operator on a Hilbert space H.

(i) If $T$ is of finite rank then there exists an orthonormal subset $\{e_1, e_2, \ldots, e_r\} \subset H$ of size $r = \operatorname{rank}(T)$ and real numbers $\lambda_1, \lambda_2, \ldots, \lambda_r \in \mathbb{R}$ such that
$$\|T\|_{op} = |\lambda_1| \ge |\lambda_2| \ge \ldots \ge |\lambda_r| > 0 \quad \text{and} \quad T(x) = \sum_{n=1}^{r} \lambda_n \langle x, e_n \rangle e_n \ \ \forall x \in H.$$

(ii) If $T$ is of infinite rank then there exists an orthonormal sequence $(e_n)_{n \in \mathbb{N}}$ in $H$ and a sequence $(\lambda_n)$ of non-zero real numbers such that
$$\|T\|_{op} = |\lambda_1| \ge |\lambda_2| \ge |\lambda_3| \ge \ldots, \quad \lim_{n \to \infty} \lambda_n = 0 \quad \text{and} \quad T(x) = \sum_{n=1}^{\infty} \lambda_n \langle x, e_n \rangle e_n \ \ \forall x \in H.$$

Proof. Let r ∈ N ∪ {0, ∞} be the rank of the compact self-adjoint operator T .


If r = 0 then T = 0H,H and so there is nothing to prove, so suppose that r > 0.
I will prove the theorem by using proposition 4.47 repeatedly.
First, by that proposition there exists an eigenvector e1 of T with eigenvalue
λ1 = ±kT kop . Without loss of generality we can assume ke1 k = 1. Let H1 =
{e1 }⊥ . If x ∈ H1 then hT (x), e1 i = hx, T (e1 )i = λ1 hx, e1 i = 0, so T (x) ∈ H1 .
Thus I can define an operator

T1 : H1 → H1 , T1 (x) := T (x).

Since T is compact and self-adjoint, T1 is also compact and self-adjoint.


If $r = 1$ then $T_1$ must be the zero operator, and we have (more or less) proved the theorem. If not, we know by proposition 4.47 that there is a unit eigenvector $e_2 \in H_1$ of $T_1$ with eigenvalue $\lambda_2 = \pm\|T_1\|_{op}$. Note that $e_2$ is also an eigenvector of $T$, and that $\langle e_2, e_1 \rangle = 0$. Let

H2 = {e1 , e2 }⊥ .

As before T (H2 ) ⊆ H2 so I can define a compact self-adjoint operator

T2 : H2 → H2 , T2 (x) := T (x).

By repeating this process, we end up with either


(i) a finite set $\{e_1, \ldots, e_r\}$ of eigenvectors of $T$, or
(ii) if r = ∞, an orthonormal sequence (en ) of eigenvectors of T .
Moreover, the eigenvalue λn+1 associated with en+1 equals ±kTn kop , where

Tn : Hn → Hn , Tn (x) := T (x) and Hn := {e1 , . . . , en }⊥ .

Since Hn+1 ⊂ Hn ,

{T (e) : e ∈ Hn+1 , kek = 1} ⊆ {T (e) : e ∈ Hn , kek = 1}

and so kTn+1 kop ≤ kTn kop . Thus |λn+1 | ≤ |λn | as claimed.


In case (ii) I also need to prove that $\lambda_n \to 0$ as $n \to \infty$. Since the sequence $(e_n)$ is bounded and $T$ is compact there is a subsequence $(e_{k(n)})$ such that $(T(e_{k(n)}))$ is convergent. This convergent sequence is also Cauchy, so given $\varepsilon > 0$ there exists $N \in \mathbb{N}$ such that $n, m > N \implies \|T(e_{k(n)}) - T(e_{k(m)})\| < \varepsilon$. Now
$$\|T(e_{k(n)}) - T(e_{k(m)})\|^2 = \|\lambda_{k(n)} e_{k(n)} - \lambda_{k(m)} e_{k(m)}\|^2 = \lambda_{k(n)}^2 + \lambda_{k(m)}^2 \ge \lambda_{k(n)}^2.$$
Thus $n > N \implies |\lambda_{k(n)}| < \varepsilon$, so $|\lambda_{k(n)}| \to 0$. Since the sequence $(|\lambda_n|)$ is decreasing, it must converge to the same limit as its subsequence, so $|\lambda_n| \to 0$ and $\lambda_n \to 0$.
Finally, I prove the two formulae for $T(x)$. If $r$ is finite, $\{e_1, \ldots, e_r\}$ is an orthonormal basis for $\operatorname{im}(T)$. Therefore
$$T(x) = \sum_{n=1}^{r} \langle T(x), e_n \rangle e_n = \sum_{n=1}^{r} \langle x, T(e_n) \rangle e_n = \sum_{n=1}^{r} \lambda_n \langle x, e_n \rangle e_n.$$

Now consider the case $r = \infty$. Let
$$H_\infty = \{e_1, e_2, \ldots\}^\perp.$$
As before, if $x \in H_\infty$ then $T(x) \in H_\infty$, so I can define
$$T_\infty : H_\infty \to H_\infty, \qquad T_\infty(x) := T(x).$$
Now $\|T_\infty\|_{op} \le \|T_n\|_{op} = |\lambda_{n+1}|$ for all $n \in \mathbb{N}$, because $H_\infty \subset H_n$. Since $|\lambda_n| \to 0$, $\|T_\infty\|_{op} = 0$. Therefore $T_\infty$ is the zero operator. Now
$$x - \sum_{n=1}^{\infty} \langle x, e_n \rangle e_n \in H_\infty,$$
because $\left\langle x - \sum_{n=1}^{\infty} \langle x, e_n \rangle e_n, e_m \right\rangle = 0$ for all $m \in \mathbb{N}$. Therefore
$$T(x) = T\left( x - \sum_{n=1}^{\infty} \langle x, e_n \rangle e_n \right) + T\left( \sum_{n=1}^{\infty} \langle x, e_n \rangle e_n \right) = 0 + \sum_{n=1}^{\infty} \langle x, e_n \rangle T(e_n) = \sum_{n=1}^{\infty} \lambda_n \langle x, e_n \rangle e_n.$$

Corollary 4.49. Let T be a non-zero compact self-adjoint operator on a Hilbert


space H. Then σ(T ) consists of the eigenvalues of T and 0 ∈ C.
Proof. We saw earlier that the eigenvalues $\lambda_n$ of $T$ belong to the spectrum. In the case that $r = \operatorname{rank}(T) < \infty$, zero belongs to the spectrum because any non-zero vector perpendicular to the eigenvectors $e_1, \ldots, e_r$ is an eigenvector with eigenvalue 0. In the case that $r = \infty$, zero belongs to the spectrum because $\lambda_n \to 0$ and the spectrum is closed (theorem 4.42).

Now suppose that $\lambda \ne \lambda_n$ $\forall n \in \mathbb{N}$ and $\lambda \ne 0$; it remains to show that $\lambda \notin \sigma(T)$. As in the proof of theorem 4.48 define
$$P(x) = x - \sum_{n=1}^{r} \langle x, e_n \rangle e_n$$
(where $r \in \mathbb{N} \cup \{0, \infty\}$). Then
$$(T - \lambda I_H)(x) = \sum_{n=1}^{r} \lambda_n \langle x, e_n \rangle e_n - \lambda x = \sum_{n=1}^{r} (\lambda_n - \lambda) \langle x, e_n \rangle e_n - \lambda P(x).$$

I seek an inverse for $T - \lambda I_H$ of the form
$$R(x) := \sum_{n=1}^{r} \mu_n \langle x, e_n \rangle e_n + \mu P(x).$$
This will be a bounded linear operator if either $r < \infty$, or $r = \infty$ and the sequence $(\mu_n)$ is bounded, by lemma 4.45. Assuming that this is the case, we find that
$$R\big( (T - \lambda I_H)(x) \big) = \sum_{n=1}^{r} (\lambda_n - \lambda) \langle x, e_n \rangle R(e_n) - \lambda R(P(x)) = \sum_{n=1}^{r} \mu_n (\lambda_n - \lambda) \langle x, e_n \rangle e_n - \lambda \mu P(x),$$
because $P(e_n) = 0$, $\langle P(x), e_n \rangle = 0$ and $P(P(x)) = P(x)$. Thus for $R$ to be an inverse of $T - \lambda I_H$ I need that
$$\mu_n = \frac{1}{\lambda_n - \lambda}, \qquad \mu = -\frac{1}{\lambda}.$$

If these identities are satisfied it also holds that $(T - \lambda I_H)(R(x)) = x$, as you can check for yourself.
It remains to show that the sequence $(\mu_n)$ is bounded in the case $r = \infty$. I need to find a $d > 0$ such that $|\lambda - \lambda_n| \ge d$ $\forall n \in \mathbb{N}$. To do so, set $\varepsilon = |\lambda|/2$; since $\lambda_n \to 0$ there exists an $N \in \mathbb{N}$ such that $n > N \implies |\lambda_n| < |\lambda|/2$. So, by the reverse triangle inequality,
$$n > N \implies |\lambda - \lambda_n| \ge |\lambda| - |\lambda_n| > |\lambda| - |\lambda|/2 = |\lambda|/2.$$
If we set
$$d = \min\{|\lambda - \lambda_1|, |\lambda - \lambda_2|, \ldots, |\lambda - \lambda_N|, |\lambda|/2\}$$
then $|\lambda - \lambda_n| \ge d > 0$ for all $n \in \mathbb{N}$, and hence $|\mu_n| \le 1/d$ $\forall n \in \mathbb{N}$. So $(\mu_n)$ is bounded, $R$ is a bounded inverse of $T - \lambda I_H$, and $\lambda \in \rho(T)$.
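Here is a sketch of the inverse constructed in this proof, on a diagonal model (the eigenvalues $1/n$, the size 50 and the point $\lambda = 2 + i$ are all invented for illustration). In this model every basis vector is an eigenvector, so $P = 0$ and $R$ is diagonal with entries $\mu_n = 1/(\lambda_n - \lambda)$.

    import numpy as np

    N = 50
    lam_n = 1.0 / np.arange(1, N + 1)      # eigenvalues, tending to 0
    T = np.diag(lam_n)
    lam = 2.0 + 1.0j                       # neither 0 nor an eigenvalue

    R = np.diag(1.0 / (lam_n - lam))       # the candidate inverse; here P = 0
    I = np.eye(N)
    print(np.allclose(R @ (T - lam * I), I))   # True
    print(np.allclose((T - lam * I) @ R, I))   # True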
