Notes
Fourier Analysis
Derek Harland
This module is about linear algebra in infinite dimensions. Many things that we
take for granted in finite dimensions (such as the existence of a basis, or the
diagonalisability of a matrix) are much more subtle in infinite dimensions. This means that
infinite-dimensional linear algebra is more interesting than its finite-dimensional
counterpart, and also that we must proceed much more cautiously in infinite
dimensions. Techniques that you learnt in Analysis will be very important.
Infinite-dimensional linear algebra has many important applications to other
areas of mathematics, including:
• The Fourier series (and its counterpart, the Fourier transform)
• Partial differential equations
• Quantum mechanics
We will focus on the Fourier series in this module. This is the mathematics that
underlies digital storage of music and other media, including MP3 files.
Definition 1.2. Let E be a complex vector space and let F ⊂ E. Then F is
called a subspace of E if
(i) 0E ∈ F
(ii) ∀x ∈ F, ∀λ ∈ C, λx ∈ F
(iii) ∀x, y ∈ F , x + y ∈ F .
Definition 1.3. Let E be a complex (or real) vector space. A norm on E is a
function k · k : E → R satisfying
(N1) kxk ≥ 0 ∀x ∈ E, and kxk = 0 if and only if x = 0E ;
(N2) kλxk = |λ|kxk ∀x ∈ E, ∀λ ∈ C;
(N3) kx + yk ≤ kxk + kyk ∀x, y ∈ E.
You should think of a norm as being a kind of distance function: kxk is the
“length” of the vector x, and kx − yk is the “distance” between two vectors x
and y. Note that axiom (N3) is often called the triangle inequality.
Example 1.4. Let E = R3 with k(x1 , x2 , x3 )k = √(x1² + x2² + x3²) = √(x.x). Note
that kxk equals what you would normally call the length of the vector x. We
claim that this is a norm: to prove it, one must check that all three axioms are
satisfied. Doing this is part of your exercise sheet!
Note that the triangle inequality says in this case that the length of one side
of a triangle is no bigger than the sum of the lengths of the other two sides.
This hopefully sounds plausible, even if you haven’t seen a proof. . .
For the next example you will need to recall some definitions from earlier
analysis modules. First, if D ⊂ R then a function f : D → R is called continuous
if
∀y ∈ D ∀ε > 0 ∃δ > 0 s.t. ∀x ∈ D, |x − y| < δ ⇒ |f (x) − f (y)| < ε.
A function f : D → C is called continuous if the real and imaginary parts of f
are both continuous. Note that if f : D → C is continuous then the function
D → R, t 7→ |f (t)| is also continuous (prove it using the fact that products and
compositions of continuous functions are continuous!)
Second, if S ⊂ R and M ∈ R are such that ∀x ∈ S, x ≤ M then M is called
an upper bound for S and S is said to be bounded above. If S is bounded above
then the supremum of S, denoted sup S, is the least upper bound for S. In
other words sup S is the unique number such that (i) sup S is an upper bound
for S and (ii) if M is any upper bound for S, then M ≥ sup S. Similarly, if S
is bounded from below the infimum of S, denoted inf S, is the greatest lower
bound for S.
Example 1.5. Let a, b ∈ R with a < b and let C[a, b] be the set of continuous
functions f : [a, b] → C. For any f ∈ C[a, b], let
kf k∞ = sup{|f (t)| : t ∈ [a, b]}.
We claim that (C[a, b], k · k∞ ) is a normed vector space. To prove this we must
prove that C[a, b] is a vector space and that k · k∞ is a norm.
It is not completely obvious that C[a, b] is a vector space – you need to
show for example that the sum of two continuous functions is continuous, and
that multiplying a continuous function by a constant gives another continuous
function. Thankfully you will have seen proofs of these in 2nd year analysis, so
I won’t prove them here.
Now we prove that k·k∞ is a norm. First, we need to check that the definition
of k · k∞ makes sense: kf k∞ is only well-defined if the set {|f (t)| : t ∈ [a, b]} is
bounded above. We can prove this using the Maximum Value Theorem from 2nd
year analysis. This states that there exists a c ∈ [a, b] such that |f (t)| ≤ |f (c)|
for all t ∈ [a, b], so the set has the upper bound |f (c)|. Now we'll check that k · k∞
satisfies the three axioms:
(N1): Since |f (t)| ≥ 0 ∀t ∈ [a, b], kf k∞ ≥ 0. If f (t) ≠ 0 for some t ∈ [a, b]
then kf k∞ > 0, so the only way it can happen that kf k∞ = 0 is if f (t) = 0
∀t ∈ [a, b], i.e. f = 0.
(N2): If λ ∈ C then |λf (t)| = |λ||f (t)| ≤ |λ|kf k∞ because kf k∞ is an
upper bound for {|f (t)| : t ∈ [a, b]}. Therefore |λ|kf k∞ is an upper bound for
{|λf (t)| : t ∈ [a, b]}. Suppose for contradiction that M is another upper bound
for this set which is less than |λ|kf k∞ : then 0 ≤ M < |λ|kf k∞ , so |λ| > 0, and
M/|λ| is an upper bound for {|f (t)| : t ∈ [a, b]} which is less than kf k∞ . This
contradicts the fact that kf k∞ is a least upper bound. Therefore no such M
exists and |λ|kf k∞ equals the least upper bound kλf k∞ for {|λf (t)| : t ∈ [a, b]}.
(N3): Let f, g ∈ C[a, b]. Then for any t ∈ [a, b], |f (t)+g(t)| ≤ |f (t)|+|g(t)| ≤
kf k∞ + kgk∞ , so kf k∞ + kgk∞ is an upper bound for {|f (t) + g(t)| : t ∈ [a, b]}.
Since kf + gk∞ is a least upper bound for this set, kf + gk∞ ≤ kf k∞ + kgk∞ .
Lemma 1.6. Any subspace of a normed vector space is a normed vector space.
Proof. Let (E, k · k) be a normed vector space and let F ⊂ E be a subspace. Let
k · kF : F → R denote the restriction of k · k : E → R to F . We leave it as an
exercise to check that k · kF is a norm on F .
Example 1.7. Let a < b and let P [a, b] be the set of polynomial functions, i.e.
P [a, b] = {f : [a, b] → C, f : t ↦ Σ_{i=0}^n ci t^i : n ∈ N, ci ∈ C for i = 0, . . . , n}.
Then P [a, b] is a subspace of C[a, b], and hence a normed vector space by lemma 1.6.
The following lemma is very useful:
Lemma 1.8. Let (E, k · k) be a normed vector space. Then |kxk − kyk| ≤ kx − yk for all x, y ∈ E.
Definition 1.9. Let E and F be two normed vector spaces and let T : E → F
be a linear map. T is called a linear isometry if kT xkF = kxkE for all x ∈ E. It
is called an isometric isomorphism if it is a linear isometry and an isomorphism
of vector spaces.
One of the main uses of norms is in talking about convergent sequences.
Definition 1.10. Let (E, k·k) be a normed vector space, let (xn ) be a sequence
in E and let x ∈ E. We say that (xn ) converges to x and that x is a limit of
(xn ) if
∀ε > 0 ∃N ∈ N such that n > N ⇒ kxn − xk < ε.
This is denoted xn → x or limn→∞ xn = x.
In the cases (E, k · k) = (R, | · |) and (E, k · k) = (C, | · |), this definition
coincides with the definition of convergence that you encountered earlier in
your mathematical life.
Note that the definition says “a” limit, not “the” limit: there is nothing in
the definition to indicate that limits are unique! Nevertheless, we can prove:
Proposition 1.11. Let (xn ) be a sequence in a normed vector space (E, k · k)
and let x, y ∈ E such that xn → x and xn → y. Then x = y.
Proof. By positivity of norms and by the triangle inequality, 0 ≤ kx − yk =
kx − xn + xn − yk ≤ kx − xn k + kxn − yk for every n ∈ N. Since kx − xn k and
kxn − yk converge to 0 as n → ∞, it must be that kx − yk = 0. Then by axiom
(N1) x − y = 0E i.e. x = y.
Definition 1.12. Let (E, k·k) be a normed vector space. A function F : E → C
(or R) is called continuous if for every x ∈ E and every sequence (xn ) that
converges to x,
lim_{n→∞} F (xn ) = F (x).
Lemma 1.13. Let (E, k · k) be a normed vector space. Then the function k · k :
E → R is continuous, i.e. whenever (xn ) is a sequence in E that converges to
x, kxn k → kxk.
Proof. Let x ∈ E and let (xn ) be a sequence that converges to x. Then
|kxn k − kxk| ≤ kxn − xk → 0 by lemma 1.8, so kxn k → kxk.
Definition 1.14. Let (fn ) be a sequence of functions fn : [a, b] → C and let
f : [a, b] → C. We say that fn → f uniformly if
∀ε > 0 ∃N ∈ N such that n > N ⇒ |fn (t) − f (t)| < ε ∀t ∈ [a, b],
and that fn → f pointwise if
∀t ∈ [a, b] ∀ε > 0 ∃N ∈ N such that n > N ⇒ |fn (t) − f (t)| < ε.
Note the subtle difference: to prove uniform convergence we must choose the
same N for every t ∈ [a, b], whereas to prove pointwise convergence we can (if
we wish) choose different N ’s for different values of t.
Another way of thinking about these two types of convergence is the fol-
lowing: uniform convergence means that that the sequence of functions (fn )
converges to f ∈ C[a, b] with respect to the norm k · k∞ ; pointwise convergence
means that for each t the sequence (fn (t)) of complex numbers converges to
f (t).
Uniform and pointwise convergence are related:
Proposition 1.15. A uniformly convergent sequence of functions is pointwise
convergent.
Proof. Let (fn ) be a sequence of functions fn : [a, b] → C and let f : [a, b] → C
such that fn → f uniformly, i.e. kfn − f k∞ → 0. Let t ∈ [a, b]. We must show
that |fn (t) − f (t)| → 0. Note that 0 ≤ |fn (t) − f (t)| ≤ kfn − f k∞ because
kfn − f k∞ is an upper bound for {|fn (t) − f (t)| : t ∈ [a, b]}. Therefore by the squeeze
rule for sequences, |fn (t) − f (t)| → 0.
Note however that pointwise convergent sequences need not be uniformly
convergent: here's a counterexample.
Example 1.16. For each n ∈ N let fn : [0, 1] → C be the function
fn (t) = 2nt for 0 ≤ t ≤ 1/(2n),
fn (t) = 2 − 2nt for 1/(2n) ≤ t ≤ 1/n,
fn (t) = 0 for 1/n ≤ t ≤ 1.
These functions all belong to C[0, 1] (i.e. they are continuous and bounded). Let
f ∈ C[0, 1] be the zero function, f (t) = 0 ∀t ∈ [0, 1].
We claim that fn → f pointwise, and that fn ↛ f uniformly.
Showing that fn ↛ f uniformly is easy: just note that kfn − f k∞ = 1 for
every n ∈ N, because fn has maximum value 1 at t = 1/(2n). So kfn − f k∞
certainly doesn’t converge to 0.
Showing that fn → f pointwise is slightly more tricky. If t = 0 then fn (t) = 0
for every n ∈ N and f (t) = 0, so fn (t) → f (t) as n → ∞. If 0 < t ≤ 1 then
fn (t) = 0 for all n > 1/t and f (t) = 0, so again fn (t) → f (t).
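The two modes of convergence in this example can be checked numerically. The following sketch (assuming NumPy is available) evaluates the tent functions on a grid: at any fixed t the values eventually vanish, while the supremum over the grid stays at 1.

```python
import numpy as np

def f_n(n, t):
    # Tent function of Example 1.16: rises to 1 at t = 1/(2n), returns to 0 at 1/n.
    t = np.asarray(t, dtype=float)
    return np.where(t <= 1/(2*n), 2*n*t,
                    np.where(t <= 1/n, 2 - 2*n*t, 0.0))

# Pointwise: at a fixed t in (0, 1], f_n(t) = 0 once n > 1/t.
print(float(f_n(10, 0.3)))  # 0.0

# No uniform convergence: the sup over a fine grid stays at 1 for every n.
grid = np.linspace(0, 1, 100001)
print([round(float(np.max(np.abs(f_n(n, grid)))), 6) for n in (5, 50, 500)])
```

Note that the grid maximum equals 1 only because the peaks t = 1/(2n) happen to lie on the grid; in general a grid maximum only approximates a supremum from below.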
Just as there are two types of convergence, there are in fact two types of
continuity. We’ve already encountered one type; here’s the other:
Definition 1.17. Let D ⊂ R. A function f : D → C is called uniformly continuous if
∀ε > 0 ∃δ > 0 s.t. ∀x, y ∈ D, |x − y| < δ ⇒ |f (x) − f (y)| < ε.
The difference from the previous definition is rather subtle. Given ε > 0, to
prove that a function is continuous we need to find a δ > 0 for each y ∈ D, and
we can use different δ's for different y's. To prove that a function is uniformly
continuous we must use the same δ for every y ∈ D. To see in more detail how
the two definitions differ, here’s an example:
Example 1.18. Let f : (0, ∞) → C be the function f (t) = 1/t. Then f is
continuous but not uniformly continuous.
I hope you agree that f is continuous, so I will just prove that f is not
uniformly continuous. To do so, I must show you an ε > 0 such that for every
δ > 0 there exist x, y ∈ (0, ∞) with |x − y| < δ and |f (x) − f (y)| ≥ ε.
Take ε = 1. Given any δ > 0, choose x with 0 < x < min(δ, 1) and set
y = x − x². Then 0 < y < x and |x − y| = x² < δ, while
|f (x) − f (y)| = |1/x − 1/y| = |y − x| / (|x||y|) > |y − x| / |x|² = 1 = ε.
Thus for all possible values of δ I've shown you an example of x, y such that
|x − y| < δ and |f (x) − f (y)| ≥ ε.
The reason this example works is because the slope of the graph of f (x) gets
very steep as x approaches 0. By making x and y small, we can arrange that
|x − y| is small and |f (x) − f (y)| is big.
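This failure is easy to witness numerically. A sketch (the specific choice y = x − x² is just one convenient witness; any points sliding towards 0 work):

```python
# Witnesses for the failure of uniform continuity of f(t) = 1/t on (0, infinity):
# take eps = 1; for any delta, pick 0 < x < min(delta, 1) and y = x - x^2.
f = lambda t: 1 / t

for delta in (0.1, 0.01, 0.001):
    x = min(delta, 1) / 2          # any x with 0 < x < min(delta, 1) works
    y = x - x * x                  # then |x - y| = x^2 < delta
    print(abs(x - y) < delta, abs(f(x) - f(y)) >= 1)  # True True for every delta
```

In fact |f(x) − f(y)| = x/y = 1/(1 − x) > 1 for this choice, no matter how small δ is.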
Thankfully, in the cases that we care about continuity and uniform continuity
are the same:
Theorem 1.19. Let f : [a, b] → C. Then f is continuous if and only if it is uniformly continuous.
Proof. First we prove that a uniformly continuous function is continuous. Sup-
pose that f is uniformly continuous. Let y ∈ [a, b] and let ε > 0. Since f is uni-
formly continuous there exists a δ > 0 such that |x − z| < δ ⇒ |f (x) − f (z)| < ε
for every x, z ∈ [a, b]. In particular, in the case z = y we have |x − y| < δ ⇒
|f (x) − f (y)| < ε for every x ∈ [a, b]. So f is continuous.
We must also prove that a continuous function is uniformly continuous. We
do so using proof by contradiction.
Suppose that there exists a function f : [a, b] → C which is continuous and
not uniformly continuous. Saying that f is not uniformly continuous means
there exists an ε > 0 such that for every δ > 0 there exist x, y ∈ [a, b] with
|x − y| < δ and |f (x) − f (y)| ≥ ε.
For each n ∈ N let δn = 1/n and choose xn , yn ∈ [a, b] such that |xn − yn | < δn
and |f (xn ) − f (yn )| ≥ ε.
Now (yn )n∈N is a bounded sequence in [a, b] so by the Bolzano-Weierstrass
theorem it has a convergent subsequence (yn(k) )k∈N (where n(1) < n(2) <
n(3) < . . .) and a limit y ∈ [a, b] such that yn(k) → y as k → ∞. Then it is also
true that xn(k) → y as k → ∞, because
0 ≤ |xn(k) − y| ≤ |xn(k) − yn(k) | + |yn(k) − y| < 1/n(k) + |yn(k) − y| ∀k ∈ N,
and since both 1/n(k) and |yn(k) − y| tend to zero as k → ∞, |xn(k) − y| → 0 as
k → ∞.
If f is continuous then it is also true that f (yn(k) ) → f (y) and f (xn(k) ) →
f (y) as k → ∞. However, this is impossible because |f (xn(k) ) − f (yn(k) )| ≥ ε for
every k ∈ N, whereas f (xn(k) ) − f (yn(k) ) → f (y) − f (y) = 0. This contradiction
completes the proof.
Proof. Let (xn ) be a sequence in a normed vector space (E, k · k) that converges
to a limit x ∈ E. Let ε > 0. Since xn → x, there exists an N ∈ N such that
n > N ⇒ kxn − xk < ε/2. Then for all n, m > N ,
kxn − xm k ≤ kxn − xk + kx − xm k < ε/2 + ε/2 = ε,
so (xn ) is Cauchy.
For example, the sequence defined by
x1 = 1,   xn+1 = (xn² + 2) / (2xn )
is known to converge to √2, as does the sequence 1, 1.4, 1.41, 1.414, . . .. Completeness is the thing that tells us these sequences have limits. Similarly, a good
strategy for solving equations in Banach spaces is to start by writing a sequence
of approximate solutions and then to show that it is Cauchy and hence has a
limit.
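The recursion above is easy to run; a quick sketch:

```python
# The recursion x_1 = 1, x_{n+1} = (x_n^2 + 2) / (2 x_n): a Cauchy sequence of
# rationals whose limit is sqrt(2) (it is Newton's method applied to x^2 - 2 = 0).
x = 1.0
for _ in range(6):
    x = (x * x + 2) / (2 * x)
print(x)  # 1.41421356...
```

Of course the point of completeness is that the limit exists in the ambient space: the sequence is rational but √2 is not, just as a Cauchy sequence of polynomials below has a non-polynomial limit.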
Example 1.23. R3 with norm k(x, y, z)k = √(x² + y² + z²) is complete.
To prove this assertion, suppose that (xn ) is a Cauchy sequence in R3 . I
need to find an x ∈ R3 such that xn → x.
Consider the sequence (x1n ) in R formed using the first coordinate of each
vector. We claim that this is also Cauchy. To prove the claim, let ε > 0.
Since (xn ) is a Cauchy sequence there exists an N ∈ N such that n, m > N ⇒
kxn − xm k < ε. Then for n, m > N ,
(x1n − x1m )² ≤ (x1n − x1m )² + (x2n − x2m )² + (x3n − x3m )² = kxn − xm k² < ε².
Since (x1n ) is Cauchy it has a limit x1 ∈ R, and similarly there exist x2 , x3 ∈ R
such that x2n → x2 and x3n → x3 . Now xn converges to the limit x = (x1 , x2 , x3 ),
because
kxn − xk = √( Σ_{i=1}^3 (xin − xi )² ) → √(0 + 0 + 0) = 0.
Here is an example showing that (P [a, b], k · k∞ ) is not complete. For each
n ∈ N let fn ∈ P [a, b] be the polynomial fn (t) = Σ_{i=0}^n t^i /i!.
Note that fn is the order n Taylor series for the exponential function t 7→ exp(t).
By Taylor’s theorem, fn → exp pointwise, and in fact fn → exp uniformly. Thus
kfn − exp k∞ → 0, and it follows that (fn ) is a Cauchy sequence in P [a, b]. However, the
limit exp of (fn ) is not in P [a, b], so P [a, b] is not complete.
1.3 Inner products
Definition 1.26. Let E be a complex vector space. An inner product on E is
a function h·, ·i : E × E → C satisfying:
(P1) hx, yi equals the complex conjugate of hy, xi, ∀x, y ∈ E (skew-symmetry);
(P2) hx, xi ≥ 0 ∀x ∈ E, and hx, xi = 0 if and only if x = 0E (positivity);
(P3) hλx + µy, zi = λhx, zi + µhy, zi ∀x, y, z ∈ E, ∀λ, µ ∈ C (linearity in the first slot).
Lemma 1.28. Any inner product on a complex vector space E satisfies hx, 0E i =
0 and h0E , xi = 0 for all x ∈ E.
Proof. Since 0E = 0E − 0E , linearity in the first slot gives h0E , xi = h0E , xi −
h0E , xi = 0 for any x ∈ E. Then hx, 0E i, being the complex conjugate of h0E , xi,
is also 0.
Example 1.29. Let a < b and define h·, ·i : C[a, b] × C[a, b] → C by
hf, gi = ∫_a^b f (t)ḡ(t) dt
(the L2 inner product). We claim that this is an inner product on C[a, b].
It should be obvious that this function h·, ·i is skew-symmetric and linear
in the first slot. I’ll prove that it’s positive definite. If f : [a, b] → C is any
function, then
hf, f i = ∫_a^b |f (t)|² dt ≥ 0
because |f (t)|² ≥ 0. Clearly if f = 0 then hf, f i = 0. If f ≠ 0 then there exists
an s ∈ [a, b] such that f (s) ≠ 0. Since |f (t)| is continuous there exists a δ > 0
such that |f (t)| > |f (s)|/2 for t ∈ I := [a, b] ∩ (s − δ, s + δ); we may assume
δ ≤ b − a, so the length of the interval I is greater than or equal to δ. Therefore
hf, f i ≥ ∫_I |f (t)|² dt ≥ ∫_I |f (s)/2|² dt ≥ δ|f (s)|²/4 > 0.
So hf, f i = 0 =⇒ f = 0.
There is one more example to come, but before that we need to present two
important results:
Proposition 1.31 (Cauchy-Schwarz inequality). Let h·, ·i be an inner product
on a complex vector space E. Let k · k : E → R be the function kxk = √hx, xi.
Then
|hx, yi| ≤ kxkkyk ∀x, y ∈ E.
Note that hx, xi has a square root because it is positive!
Proof. If x = 0E or y = 0E then kxkkyk = 0 by definition and hx, yi = 0 by
lemma 1.28, so the inequality holds.
Suppose then that x 6= 0E and y 6= 0E . For any λ ∈ C it holds that
0 ≤ hλx + y, λx + yi
= λhx, λx + yi + hy, λx + yi
= λλ̄hx, xi + λhx, yi + λ̄hy, xi + hy, yi
= |λ|² hx, xi + hy, yi + 2 Re(λhx, yi).
Therefore
|λ|² hx, xi + hy, yi ≥ −2 Re(λhx, yi) ∀λ ∈ C.
Let us rearrange this by writing λ = ρe^{iθ} , where ρ = |λ|, and dividing through
by ρ:
ρkxk² + (1/ρ)kyk² ≥ −2 Re(e^{iθ} hx, yi) ∀ρ > 0, θ ∈ R.
Remember, this inequality holds for all ρ, θ. If we choose
θ = π − arg(hx, yi)
then the right hand side equals 2|hx, yi|. If we choose
ρ = kyk/kxk
then the left hand side equals 2kxkkyk. Therefore the Cauchy-Schwarz inequal-
ity follows from our inequality.
You might be wondering how I knew which values of ρ and θ to choose. The
answer is that they are the values that result in the strongest possible inequality,
i.e. they make the LHS as small as possible and the RHS as large as possible.
This observation might help you to remember the proof!
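The inequality itself is easy to sanity-check numerically on Cn. A sketch using the standard inner product on C^5 and randomly generated vectors:

```python
import random

random.seed(0)

def rand_vec(dim=5):
    return [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(dim)]

def holds(x, y):
    # Standard inner product on C^n, linear in the first slot, and the induced norm.
    inner = sum(a * b.conjugate() for a, b in zip(x, y))
    nx = sum(abs(a) ** 2 for a in x) ** 0.5
    ny = sum(abs(b) ** 2 for b in y) ** 0.5
    return abs(inner) <= nx * ny + 1e-12  # small tolerance for rounding

print(all(holds(rand_vec(), rand_vec()) for _ in range(1000)))  # True
```

Random testing is of course no substitute for the proof, but it is a useful way to catch sign or conjugation mistakes when working with complex inner products.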
Proposition 1.32. Let h·, ·i be an inner product on a complex vector space E,
and let k · k : E → R be the function kxk = √hx, xi. Then k · k is a norm on E.
Notice that the norm k · k is defined using the inner product analogously to
how the length of a vector in R3 is defined using the dot product.
Proof. Axiom (N1) follows directly from the positivity of the inner product (P2).
Axiom (N2) follows from linearity and anti-linearity in the first and second slots:
kλxk = √hλx, λxi = √(λλ̄hx, xi) = |λ|kxk.
Finally, the triangle inequality (N3) is proved using the Cauchy-Schwarz in-
equality as follows:
kx + yk² = hx + y, x + yi
= hx, xi + hy, yi + hx, yi + hy, xi
= kxk² + kyk² + 2 Re hx, yi
≤ kxk² + kyk² + 2kxkkyk
= (kxk + kyk)².
This proposition means that inner product spaces are special examples of
normed vector spaces. Thus one can talk about convergent sequences and Cauchy
sequences in inner product spaces, just as in normed vector spaces.
Note that the proposition implies that on E = Cn , x ↦ √( Σ_{j=1}^n |xj |² ) is
a norm, as claimed earlier. On C[a, b] we now have two norms: the norm
kf k∞ = sup_{t∈[a,b]} |f (t)| introduced earlier, and the norm
kf k2 = ( ∫_a^b |f (t)|² dt )^{1/2}
derived from the L2 inner product. We distinguish these using the subscript
“2” or “∞”; we refer to the former as the supremum norm and the latter as the
L2 norm.
Now we can look at the promised last example of an inner product space.
Example 1.33. Let `2 (N) be the set of sequences (xn )n∈N of complex numbers
for which Σ_{n=1}^∞ |xn |² converges; equivalently,
`2 (N) = { x = (x1 , x2 , x3 , . . .) : Σ_{n=1}^∞ |xn |² < ∞ }.
`2 (N) is a vector space, with addition and scalar multiplication given by:
x + y = (x1 + y1 , x2 + y2 , x3 + y3 , . . .), λx = (λx1 , λx2 , λx3 , . . .).
For these definitions to make sense we must check that if x, y ∈ `2 (N) then
x + y, λx ∈ `2 (N). By the triangle inequality for Cn ,
( Σ_{j=1}^n |xj + yj |² )^{1/2} ≤ ( Σ_{j=1}^n |xj |² )^{1/2} + ( Σ_{j=1}^n |yj |² )^{1/2} ≤ ( Σ_{j=1}^∞ |xj |² )^{1/2} + ( Σ_{j=1}^∞ |yj |² )^{1/2} .
The right hand side is finite and independent of n, so letting n → ∞ shows that
x + y ∈ `2 (N); the case of λx is easier. Define h·, ·i : `2 (N) × `2 (N) → C by
hx, yi = Σ_{j=1}^∞ xj ȳj .
Before doing anything else we should check that this series converges, so that we
know hx, yi is a finite number. For any given n, we know by the Cauchy-Schwarz
inequality for Cn that
| Σ_{j=1}^n xj ȳj | ≤ ( Σ_{j=1}^n |xj |² )^{1/2} ( Σ_{j=1}^n |yj |² )^{1/2} ≤ ( Σ_{j=1}^∞ |xj |² )^{1/2} ( Σ_{j=1}^∞ |yj |² )^{1/2} ,
and hence, taking n → ∞, the series defining hx, yi converges absolutely. Positivity
is clear: hx, xi = Σ_{j=1}^∞ |xj |² ≥ 0, and if xm ≠ 0 for some m ∈ N then
hx, xi ≥ |xm |² > 0, so the only way that hx, xi can equal 0 is if
xn = 0 for all n ∈ N. Linearity in the first slot is straightforward: for any n,
Σ_{j=1}^n (λxj + µyj )z̄j = λ Σ_{j=1}^n xj z̄j + µ Σ_{j=1}^n yj z̄j ,
and sending n → ∞ gives hλx + µy, zi = λhx, zi + µhy, zi. Skew-symmetry can
be proved similarly.
We have seen that every inner product defines a norm. Now we will discuss
the question of the converse: can every norm be obtained from an inner product?
Proposition 1.34 (Parallelogram identity). Let E be an inner product space,
and let k · k be the associated norm. Then
kx + yk² + kx − yk² = 2kxk² + 2kyk² ∀x, y ∈ E.
Proof.
kx + yk² + kx − yk² = hx + y, x + yi + hx − y, x − yi
= hx, xi + hx, yi + hy, xi + hy, yi + hx, xi − hx, yi − hy, xi + hy, yi
= 2hx, xi + 2hy, yi
= 2kxk² + 2kyk².
Now consider the space C[0, 1]. Does the supremum norm satisfy the par-
allelogram identity? Let f, g ∈ C[0, 1] be the functions f (t) = 1, g(t) = t.
Then
kf k∞ = 1, kgk∞ = 1, kf + gk∞ = 2, kf − gk∞ = 1.
So kf + gk²∞ + kf − gk²∞ = 5 and 2kf k²∞ + 2kgk²∞ = 4, so k · k∞ does not obey the
parallelogram identity. Therefore it is not obtained from any inner product. So
it's certainly not true that every norm is obtained from an inner product.
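This counterexample can be reproduced numerically. A sketch, with sup norms computed by maximising over a grid (an approximation in general, though exact here because the maxima of these particular functions lie on the grid):

```python
import numpy as np

t = np.linspace(0, 1, 10001)
f = np.ones_like(t)   # f(t) = 1 on [0, 1]
g = t                 # g(t) = t on [0, 1]
sup = lambda h: float(np.max(np.abs(h)))

lhs = sup(f + g) ** 2 + sup(f - g) ** 2   # 2^2 + 1^2 = 5
rhs = 2 * sup(f) ** 2 + 2 * sup(g) ** 2   # 2 + 2 = 4
print(lhs, rhs)  # 5.0 4.0 -- the parallelogram identity fails
```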
Let’s ask a different question: given any norm, how can we tell whether it
can be obtained from an inner product? The next proposition (which we won’t
prove) answers this question.
Proposition 1.35. Let E be a vector space and let k · k be a norm on E which
satisfies the parallelogram identity. Let
hx, yi = (1/4) ( kx + yk² − kx − yk² + i kx + iyk² − i kx − iyk² ).
Then h·, ·i is an inner product on E and kxk = √hx, xi ∀x ∈ E.
Thus every norm that satisfies the parallelogram identity comes from an
inner product.
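The polarisation formula of proposition 1.35 can be checked numerically against the standard inner product on C^n. A sketch (the convention here is linearity in the first slot, matching the notes):

```python
def inner(x, y):
    # Standard inner product on C^n, linear in the first slot.
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm_sq(x):
    return sum(abs(a) ** 2 for a in x)

def polarised(x, y):
    # (1/4)(kx+yk^2 - kx-yk^2 + i kx+iyk^2 - i kx-iyk^2)
    comb = lambda c: [a + c * b for a, b in zip(x, y)]
    return (norm_sq(comb(1)) - norm_sq(comb(-1))
            + 1j * norm_sq(comb(1j)) - 1j * norm_sq(comb(-1j))) / 4

x, y = [1 + 2j, 3 - 1j], [0.5j, 2 + 2j]
print(abs(inner(x, y) - polarised(x, y)) < 1e-12)  # True
```

With the other common convention (linearity in the second slot) the i and −i terms swap, which is a standard source of sign errors.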
Since inner product spaces carry norms one can speak of convergence of
sequences in inner product spaces, and continuous functions on inner product
spaces. The next lemma is a useful result that utilises both of these ideas:
Lemma 1.36. Let E be an inner product space and let y ∈ E. Then x ↦ hy, xi
and x ↦ hx, yi are both continuous functions on E, i.e. for every x ∈ E and
every sequence (xn ) that converges to x, hy, xn i → hy, xi and hxn , yi → hx, yi.
Proof. Let x ∈ E be any vector and let (xn ) be any sequence that converges to
x. Then, by the Cauchy-Schwarz inequality,
|hxn , yi − hx, yi| = |hxn − x, yi| ≤ kxn − xkkyk → 0,
so hxn , yi → hx, yi. The proof for x ↦ hy, xi is similar.
for all n, m > N and any k ∈ N. Taking the limit m → ∞ and using the fact
that xim → xi gives
Σ_{i=1}^k |xin − xi |² ≤ ε²/4
for all n > N and any k ∈ N. Therefore the sum on the left converges, so
xn − x ∈ `2 (N) for all n > N . Moreover,
( Σ_{i=1}^∞ |xin − xi |² )^{1/2} ≤ ε/2 < ε          (1)
for all n > N . Since ε was arbitrary, this inequality shows that kxn − xk → 0
as n → ∞.
It remains to show that x ∈ `2 (N). Thankfully, most of the work is already
done. It follows from (1) that kx − xn k is finite, and hence that x − xn ∈ `2 (N)
for some particular value of n. Therefore x = (x − xn ) + xn is a sum of two
elements of `2 (N); since `2 (N) is a vector space, x ∈ `2 (N).
Example 1.40. (C[a, b], k · k2 ) is not a Hilbert space. I will show this in the case
a = −1, b = 1, but a similar argument would work in any other case.
For any n ∈ N, let fn : [−1, 1] → C be the function
fn (t) = 0 for −1 ≤ t < 0,   fn (t) = nt for 0 ≤ t < 1/n,   fn (t) = 1 for 1/n ≤ t ≤ 1.
This function is continuous (draw a graph!). Moreover, I will now show that
the sequence (fn ) in C[−1, 1] is Cauchy with respect to the norm k · k2 .
Let ε > 0. Choose N ∈ N such that N > 1/ε². Then for all n, m > N ,
kfn − fm k2² = ∫_{−1}^1 |fn (t) − fm (t)|² dt ≤ ∫_0^{1/N} |fn (t) − fm (t)|² dt ≤ ∫_0^{1/N} 1 dt = 1/N.
Therefore kfn − fm k2 ≤ 1/√N < ε.
Despite the fact that (fn ) is Cauchy, it does not converge to a limit in
C[−1, 1]. I will prove this by contradiction: suppose that there is a function
f ∈ C[−1, 1] such that kfn − f k2 → 0. Then
Z 0 Z 0
kfn − f k22 ≥ |fn (t) − f (t)|2 dt = |f (t)|2 dt ≥ 0.
−1 −1
Since kfn − f k2 → 0 as n → ∞, it must be that ∫_{−1}^0 |f (t)|² dt = 0. Since f is
continuous, this implies that f (t) = 0 ∀t ∈ [−1, 0]. Similarly, for any ε > 0 and
any n ∈ N such that 1/n < ε,
kfn − f k2² ≥ ∫_ε^1 |fn (t) − f (t)|² dt = ∫_ε^1 |1 − f (t)|² dt ≥ 0.
It follows as above that ∫_ε^1 |1 − f (t)|² dt = 0, and hence that f (t) = 1 for
all t ∈ [ε, 1]. Since this is true for every ε > 0, f (t) = 1 for all t ∈ (0, 1].
Therefore the function f is not continuous at 0, contradicting our assumption
that f ∈ C[−1, 1].
Although (C[a, b], k · k2 ) is not a Hilbert space, there is a Hilbert space L2 [a, b]
such that C[a, b] is a subspace of L2 [a, b]. Regarded as a subspace, C[a, b] is
“dense” in L2 [a, b], meaning that every element of L2 [a, b] is a limit of a sequence
in C[a, b]. The space L2 [a, b] can be constructed using a process of “completion”,
or it can be constructed using “Measure Theory”. In this course we will not
study L2 [a, b] in any detail.
Definition 2.1. Let E be a vector space. A Haar basis (or algebraic basis) for
E is a linearly independent set S ⊆ E such that sp S = E. (Here sp S denotes
the span of S: the set of all finite linear combinations of elements of S.)
Notice that this definition only allows for finite linear combinations. This is
because talking about infinite sums like Σ_{i=1}^∞ λi xi is dangerous – the sum may
not converge! We'll see a way to get around that problem later.
Example 2.2. Consider the following vectors in `2 (N):
e1 = (1, 0, 0, 0, . . .)
e2 = (0, 1, 0, 0, . . .)
..
.
Each of these vectors is in `2 (N) because the sum of the squares of its coefficients
converges to 1. Let S = {ei : i ∈ N}. I claim that the set S is linearly inde-
pendent. Let {ei1 , . . . , ein } be any finite subset, where i1 , . . . in are all distinct.
Suppose that
λi1 ei1 + . . . + λin ein = (0, 0, 0, . . .)
By comparing the i1 -th coefficient on the LHS and RHS, we see that λi1 = 0.
Similarly, λi2 = . . . = λin = 0. Therefore S is linearly independent.
What is the span of this set? Let c0 (N) be the set of sequences which are
eventually zero:
c0 (N) = {x ∈ `2 (N) : ∃N ∈ N such that xn = 0 ∀n > N }.
I claim that sp S = c0 (N). First I'll show that sp S ⊆ c0 (N). Let x ∈ sp S. Then
x = λi1 ei1 + . . . + λin ein
for some coefficients λi1 , . . . , λin and vectors ei1 , . . . , ein ∈ S. Let N = max{i1 , . . . , in }.
Then for m > N , xm = 0, so x is eventually zero. Now I'll show
that c0 (N) ⊂ sp S. Let x ∈ c0 (N). Then x is of the form
x = (x1 , x2 , . . . , xN , 0, 0, . . .)
for some N ∈ N. It follows that x = Σ_{i=1}^N xi ei , so x ∈ sp S.
Note that sp S ≠ `2 (N). For example, the vector x whose n-th entry is
xn = 1/n belongs to `2 (N) (since Σ_{n=1}^∞ 1/n² converges), but x is NOT a finite
linear combination of the ei 's. (It can be proved that `2 (N) has a Haar basis –
however, the basis is uncountable and the proof does not tell you how to write
it down.)
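Although x = (1, 1/2, 1/3, . . .) is not a finite linear combination of the ei 's, its finite truncations do approach it in the `2 norm, which is a first hint at the different notion of basis introduced below. A numerical sketch (the infinite sums are approximated by truncating at N = 10^6):

```python
import math

# x_n = 1/n lies in l2(N): the sum of |x_n|^2 is the convergent series sum 1/n^2.
N = 10 ** 6  # truncation used to approximate the infinite sums
total = sum(1.0 / n ** 2 for n in range(1, N + 1))
print(round(total, 4))  # 1.6449, close to pi^2/6

# l2 distance from x to its first-m truncation: (sum_{n>m} 1/n^2)^(1/2) -> 0.
for m in (10, 100, 1000):
    dist = math.sqrt(sum(1.0 / n ** 2 for n in range(m + 1, N + 1)))
    print(m, round(dist, 4))
```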
The preceding example illustrates the problem with using a Haar
basis: even though the set S looked like it could be a basis for `2 (N), it turned out
not to be one. We will solve this problem by introducing another (inequivalent)
definition of a basis, one which directly exploits the Hilbert space structure.
Definition 2.3. Let E be an inner product space. Two vectors x, y are called
orthogonal if hx, yi = 0. An orthonormal set in E is a subset S ⊂ E consisting
of pairwise-orthogonal unit vectors. In other words, for all x, y ∈ S,
hx, yi = 1 if x = y,   hx, yi = 0 if x ≠ y.
Proposition 2.4. Any orthonormal set in an inner product space is linearly independent.
Proof. Let S be an orthonormal set and let x1 , . . . , xn be n distinct vectors in S.
Suppose that
Σ_{i=1}^n λi xi = 0E .
Take the inner product of both sides with x1 . Since the set is orthonormal and
the xi 's are distinct, hxi , x1 i equals 1 when i = 1 and 0 otherwise, so the equation
simplifies to λ1 = 0. Similarly, it can be shown that λ2 = . . . = λn = 0. So S is
linearly independent.
For example, the set S in example 2.2 is an orthonormal set, as you may
check for yourself. This proposition confirms that the set S in example 2.2 is
linearly independent.
Definition 2.5. An orthonormal sequence (en ) in an inner product space is
called a Hilbert basis, or countable orthonormal basis, if for every x ∈ E, there
exists a sequence (λn ) in C such that
lim_{n→∞} Σ_{i=1}^n λi ei = x.
We often abbreviate the last condition to Σ_{i=1}^∞ λi ei = x. Notice that the
definition refers to the inner product in two places: first one needs an inner
product to talk about vectors being “orthonormal”, and second, one needs the
induced norm to talk about convergence. If we did not have a norm the expression
Σ_{i=1}^∞ λi ei would be meaningless.
Definition 2.6. A Hilbert space is called separable if it admits a countable
orthonormal basis.
Example 2.7. `2 (N) is a separable Hilbert space.
We will show that the orthonormal sequence e1 , e2 , . . . of example 2.2 is a
Hilbert basis. Let x ∈ `2 (N), and write x = (x1 , x2 , x3 , . . .) where Σ_{i=1}^∞ |xi |² < ∞.
We claim that
x = lim_{n→∞} Σ_{i=1}^n xi ei .
To prove this we must show that kx − Σ_{i=1}^n xi ei k → 0 as n → ∞. Now
kx − Σ_{i=1}^n xi ei k² = k(0, . . . , 0, xn+1 , xn+2 , . . .)k²
= Σ_{i=n+1}^∞ |xi |²
= Σ_{i=1}^∞ |xi |² − Σ_{i=1}^n |xi |².
As n → ∞ the last expression here tends to zero, so x = Σ_{i=1}^∞ xi ei as claimed.
Now we will explore some properties of countable orthonormal sets and bases.
The next lemma generalises the following simple observation: if ~x = (x1 , x2 , x3 )
is a vector in R3 and ~e1 = (1, 0, 0), ~e2 = (0, 1, 0), ~e3 = (0, 0, 1) is the standard
basis, then xi = ~x.~ei for i = 1, 2, 3.
Lemma 2.8. Let (en )n∈N be an orthonormal set in an inner product space E
and suppose that x ∈ E and (λn ) is a sequence in C such that
x = Σ_{i=1}^∞ λi ei .
Then λn = hx, en i for every n ∈ N.
Corollary 2.10 (Parseval’s identity). Let (en )n∈N be a Hilbert basis for an
inner product space E. Then
hx, yi = Σ_{n=1}^∞ hx, en ihen , yi ∀x, y ∈ E.
Proof. By corollary 2.9, x = lim_{n→∞} Σ_{j=1}^n hx, ej iej . Therefore, by lemma 1.36,
hx, yi = h lim_{n→∞} Σ_{j=1}^n hx, ej iej , yi = lim_{n→∞} h Σ_{j=1}^n hx, ej iej , yi = lim_{n→∞} Σ_{j=1}^n hx, ej ihej , yi.
Assuming that this sequence converges to a limit x ∈ E, we will have (by lemma
2.8) that yj = hx, ej i, and hence that T (x) = y.
In order to show that (xn )n∈N is convergent, we first show that it’s Cauchy.
For n ≥ m,
kxn − xm kE² = k Σ_{j=m+1}^n yj ej kE² = Σ_{j=m+1}^n |yj |² ≤ Σ_{j=m+1}^∞ |yj |² = Σ_{j=1}^∞ |yj |² − Σ_{j=1}^m |yj |².
The right hand side of this equality tends to 0 as m → ∞ because y ∈ `2 (N). It
follows that the sequence (xn )n∈N is Cauchy. Since E is complete, this sequence
converges to a limit x ∈ E, and, as explained above, T (x) = y. So T is
surjective.
Here is one more important example of an orthonormal set:
Example 2.12. Consider the space C[−π, π], with inner product
hf, gi = (1/2π) ∫_{−π}^π f (t)ḡ(t) dt.
For each n ∈ Z let en : [−π, π] → C be the function en : t ↦ exp(int), and let
S = {en : n ∈ Z}.
I claim that S is an orthonormal set in C[−π, π]. To prove this, I need to calculate
hen , em i. It is useful first to note the following: for any n ∈ Z such that n ≠ 0,
∫_{−π}^π exp(int) dt = ∫_{−π}^π (cos nt + i sin nt) dt = (1/n) [sin nt − i cos nt]_{t=−π}^{t=π} = 0.
It follows that
hen , em i = (1/2π) ∫_{−π}^π exp(int) exp(−imt) dt = (1/2π) ∫_{−π}^π exp(i(n − m)t) dt,
which equals 1 if n = m and 0 if n ≠ m.
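These orthonormality relations can be verified numerically. Averaging over an equispaced grid reproduces the integral (1/2π)∫ exactly for these trigonometric integrands, by the discrete orthogonality of roots of unity. A sketch:

```python
import numpy as np

def inner(f, g, M=4096):
    # hf, gi = (1/2 pi) * integral over [-pi, pi) of f(t) * conj(g(t)) dt,
    # approximated by a plain average over M equispaced sample points.
    t = np.linspace(-np.pi, np.pi, M, endpoint=False)
    return np.mean(f(t) * np.conj(g(t)))

e = lambda n: (lambda t: np.exp(1j * n * t))

print(round(abs(inner(e(3), e(3))), 8))  # 1.0
print(round(abs(inner(e(3), e(5))), 8))  # 0.0
```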
I will prove later in this course that (en )n∈Z is not just an orthonormal
set but is in fact a Hilbert basis for C[−π, π].
Definition 2.13. Given any function f ∈ C[−π, π] the Fourier coefficients of
f are the complex numbers
fˆ(n) = hf, en i = (1/2π) ∫_{−π}^π f (t) exp(−int) dt ∀n ∈ Z.
(Here the inner product and the functions en are as in the previous example.)
The Fourier series of f is
Σ_{j=−∞}^∞ hf, ej iej = Σ_{j=−∞}^∞ fˆ(j) exp(ijt).
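As a concrete check, for the sawtooth f(t) = t one can show by integrating by parts that fˆ(n) = i(−1)^n /n for n ≠ 0. A numerical sketch comparing this closed form with a grid approximation of the defining integral:

```python
import numpy as np

M = 1 << 16
t = np.linspace(-np.pi, np.pi, M, endpoint=False)
f = t  # the sawtooth f(t) = t on [-pi, pi)

def fhat(n):
    # fhat(n) = (1/2 pi) * integral of f(t) exp(-i n t) dt, approximated on the grid.
    return np.mean(f * np.exp(-1j * n * t))

for n in (1, 2, 3):
    exact = 1j * (-1) ** n / n  # from integration by parts
    print(n, abs(fhat(n) - exact) < 1e-3)  # True
```

The grid average is only an approximation here (f is discontinuous as a 2π-periodic function), but with this many sample points the error is far below the tolerance used.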
x ↦ (x.e)e,
where e := v/√(v.v). This is similar to PS in the case that S contains just one
vector. Similarly, given a pair e1 , e2 of orthonormal vectors in R3 , the map
Proof. First I show that x − PS (x) is perpendicular to every y ∈ sp S. Let
e1 , . . . , en be the elements of S. By corollary 2.9, y = PS (y). By direct calculation,
hPS (y), x − PS (x)i = h Σ_{i=1}^n hy, ei iei , x − Σ_{j=1}^n hx, ej iej i
= Σ_{i=1}^n hy, ei ihei , xi − Σ_{i=1}^n Σ_{j=1}^n hy, ei ihei , ej ihej , xi
= Σ_{i=1}^n hy, ei ihei , xi − Σ_{i=1}^n hy, ei ihei , xi
= 0.
At the end of the process you will have an orthonormal basis {e1 , e2 , . . . , en }.
Note that steps 1 and 3 guarantee that hek , ek i = 1, and step 2 guarantees that
hej , ek i = 0 if j ≠ k. Note that in step 2 you are calculating fk+1 − PSk (fk+1 ),
where Sk = {e1 , e2 , . . . , ek }. The vector gk+1 that you calculate in step 2 is
always non-zero, because fk+1 ∉ sp{e1 , e2 , . . . , ek } = sp{f1 , f2 , . . . , fk }.
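The procedure described above is easy to implement for vectors in Cn. A sketch (with the convention hx, yi = Σ xi ȳi, linear in the first slot, matching the notes):

```python
import numpy as np

def gram_schmidt(vectors):
    # Orthonormalise linearly independent vectors: subtract the projection onto
    # the span of the e's found so far (step 2), then normalise (steps 1 and 3).
    basis = []
    for f in vectors:
        g = f - sum(np.vdot(e, f) * e for e in basis)  # g = f - P_S(f)
        basis.append(g / np.linalg.norm(g))            # normalise
    return basis

e1, e2 = gram_schmidt([np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 1.0])])
print(round(float(np.vdot(e1, e1).real), 6), round(abs(np.vdot(e1, e2)), 6))  # 1.0 0.0
```

Here np.vdot(e, f) computes Σ ēi fi , which equals hf, ei in the convention above, so np.vdot(e, f) * e is the projection hf, eie.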
I conclude this section by showing two useful things that you can prove
using finite-dimensional projections. First, I'll show you a new way to prove the
Cauchy-Schwarz inequality.
Let x, y be two vectors in an inner product space E. We know the Cauchy-
Schwarz inequality is easily proved if x or y is zero, so suppose without
loss of generality that y ≠ 0E . Define e = y/kyk. Then S = {e} is an orthonor-
mal set (because he, ei = 1). If we let u = PS (x) = hx, eie and v = x − u then
u and v are orthogonal (prove this, either by calculating hu, vi directly or by
using the previous proposition). It follows that
kxk² = kuk² + kvk² ≥ kuk² = |hx, ei|² = |hx, yi|²/kyk²,
which rearranges to give |hx, yi| ≤ kxkkyk.
Note that this proof gives some geometrical insight: we have an equality |hx, yi| =
kxkkyk if and only if x = λy or y = λx for some λ ∈ C, i.e. equality holds in the
Cauchy-Schwarz inequality if and only if x and y are linearly dependent.
Using a similar argument, one can prove a more powerful inequality:
Proposition 2.18 (Bessel's inequality). Let (en )n∈N be an orthonormal se-
quence in an inner product space E and let x ∈ E. Then
(a) For every n ∈ N, Σ_{j=1}^n |hx, ej i|² ≤ kxk², and Σ_{j=1}^n |hx, ej i|² = kxk² if and
only if x ∈ sp{e1 , . . . , en }.
(b) Σ_{j=1}^∞ |hx, ej i|² ≤ kxk².
Proof. For the first part, let Sn = {e1 , . . . , en }. Then by part (a) of proposition
2.16, PSn (x) is perpendicular to x − PSn (x) (since PSn (x) ∈ sp Sn ). So, by the
Pythagoras lemma,
kxk² = kPSn (x)k² + kx − PSn (x)k².
Therefore kxk² ≥ kPSn (x)k², and kxk² = kPSn (x)k² ⇔ x = PSn (x). If x =
PSn (x) then x ∈ sp Sn , and conversely, if x ∈ sp Sn then PSn (x) = x. By direct
calculation, one finds that kPSn (x)k² = Σ_{j=1}^n |hx, ej i|². This completes the
proof of the first part.
The second part follows from the first part by taking the limit n → ∞.
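Bessel's inequality can be spot-checked in C^8, taking the standard basis vectors as the orthonormal sequence, so that hx, ej i is just the j-th entry of x. A sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(8) + 1j * rng.standard_normal(8)

norm_sq = float(np.vdot(x, x).real)  # kxk^2

# With e_j the j-th standard basis vector of C^8, hx, e_ji = x_j.
for n in (2, 4, 8):
    bessel = float(sum(abs(x[j]) ** 2 for j in range(n)))
    print(n, bessel <= norm_sq + 1e-12)  # True; equality when n = 8
```

Equality at n = 8 reflects part (a): x certainly lies in the span of all eight basis vectors.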
2.3 Fejér’s theorem
In this section we will take our first steps towards our goal of proving that the
Fourier series for a function f ∈ C[−π, π] converges to f . In this section, we
restrict attention to a slightly smaller set of functions.
Given a series Σ_{n=0}^∞ xn (of complex numbers, or of vectors in a normed
vector space) with partial sums sn = Σ_{i=0}^n xi , the Cesàro means of the series
are
σn = (s0 + s1 + . . . + sn )/(n + 1).
In other words, the nth Cesàro mean is the mean (average) value of the first
n + 1 partial sums. Recall that the series is said to converge if there is a limit s
such that sn → s (with respect to some norm). It can be shown (and we won’t
do so here) that if sn → s then also σn → s. However, it can happen that the
Cesàro means converge, even when the partial sums do not.
Example 2.21. Let (x_n) be the sequence in C given by x_n = (−1)^n for n ∈
N ∪ {0}. Then the partial sums of (x_n) are

s_n = ∑_{i=0}^{n} (−1)^i = { 0 if n is odd, 1 if n is even. }

Clearly, these don't converge. The Cesàro means are

σ_n = { 1/2 if n is odd, (n + 2)/(2(n + 1)) if n is even. }

Clearly σ_n → 1/2 as n → ∞.
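The example above is easy to check numerically; the following Python sketch (an illustration, not part of the notes) computes the partial sums and Cesàro means of 1 − 1 + 1 − 1 + · · ·:

```python
# Partial sums oscillate between 0 and 1; the Cesàro means converge to 1/2.

def partial_sums(terms):
    s, out = 0, []
    for t in terms:
        s += t
        out.append(s)
    return out

def cesaro_means(terms):
    # sigma_n = (s_0 + ... + s_n) / (n + 1), computed with a running total
    sums = partial_sums(terms)
    out, running = [], 0
    for k, s in enumerate(sums):
        running += s
        out.append(running / (k + 1))
    return out

terms = [(-1) ** n for n in range(2000)]   # x_n = (-1)^n, n = 0, 1, 2, ...
sigma = cesaro_means(terms)
print(abs(sigma[-1] - 0.5) < 1e-9)  # prints True: the means approach 1/2
```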
The partial sums of the Fourier series of a 2π-periodic continuous function
f are
s_n(f)(t) = ∑_{j=−n}^{n} f̂(j) exp(ijt).
In general, the partial sums of the Fourier series do not converge to f pointwise.
However, it can be proved (and we won’t prove here) that the partial sums do
converge if f is a smooth function. We will only prove that the Cesàro means
of f converge to f . First, we explore a few different ways of writing the partial
sums.
Writing σ_n(f)(t) = (1/(n+1)) ∑_{k=0}^{n} ∑_{j=−k}^{k} f̂(j) exp(ijt), the values of j and k summed over form a triangle in the j, k-plane defined by
−k ≤ j ≤ k and 0 ≤ k ≤ n. For a given j-coordinate, there are n + 1 − |j| points
in this triangle, whose k-coordinates are k = |j|, |j| + 1, . . . , n. Therefore, the
total coefficient of f̂(j) exp(ijt) in the sum is (n + 1 − |j|)/(n + 1) = 1 − |j|/(n + 1), and hence

σ_n(f)(t) = ∑_{j=−n}^{n} (1 − |j|/(n + 1)) f̂(j) exp(ijt).
For our next way of re-writing the Cesàro means, we need some more terminology.

Definition 2.23. Let f, g be two 2π-periodic continuous functions. The convolution of f and g is the function f ∗ g such that

f ∗ g(t) = (1/2π) ∫_{−π}^{π} f(t − s) g(s) ds.
Note that f ∗ g is also 2π-periodic and continuous.
Definition 2.24. The Fejér kernel is the sequence of 2π-periodic functions
K_n : R → R defined by

K_n(t) = ∑_{j=−n}^{n} (1 − |j|/(n + 1)) exp(ijt).
Now we can give another way of writing the Cesàro means using convolution and the Fejér kernel:
Proposition 2.25. Let f be a 2π-periodic continuous function. Then σn (f ) =
Kn ∗ f .
Proof. By the previous proposition and the definition of the Fourier coefficients,

σ_n(f)(t) = ∑_{j=−n}^{n} (1 − |j|/(n + 1)) exp(ijt) · (1/2π) ∫_{−π}^{π} f(s) exp(−ijs) ds
 = (1/2π) ∫_{−π}^{π} f(s) ∑_{j=−n}^{n} (1 − |j|/(n + 1)) exp(ij(t − s)) ds
 = (1/2π) ∫_{−π}^{π} f(s) K_n(t − s) ds
 = K_n ∗ f(t).
Before moving on, let’s make a note of a useful property of the convolution:
Proposition 2.26. Convolution is a commutative product; i.e. f ∗ g = g ∗ f for
all 2π-periodic continuous functions f and g.
Proof. First make a change of integration variable u = t − s:

f ∗ g(t) = (1/2π) ∫_{−π}^{π} f(t − s) g(s) ds
 = −(1/2π) ∫_{t+π}^{t−π} f(u) g(t − u) du
 = (1/2π) ∫_{t−π}^{t+π} f(u) g(t − u) du
 = (1/2π) ∫_{−π}^{π} f(u) g(t − u) du − (1/2π) ∫_{−π}^{t−π} f(u) g(t − u) du + (1/2π) ∫_{π}^{t+π} f(u) g(t − u) du
Since f and g are 2π-periodic, the last two terms cancel, leaving g ∗ f (t).
The definition of the Fejér kernel as a sum is a bit difficult to work with.
Fortunately, there is a much neater expression:
Proposition 2.27. K_n(t) is a 2π-periodic, even and continuous function, such
that

K_n(t) = n + 1 if t = 0, and K_n(t) = (1/(n + 1)) (sin((n + 1)t/2) / sin(t/2))² if 0 < |t| ≤ π.

Proof. Let r = exp(it), where 0 < |t| ≤ π. Then

sin((n + 1)t/2) / sin(t/2) = e^{−int/2} (1 + r + . . . + r^n).

Taking the squared modulus of both sides, and noting that when the product |1 + r + . . . + r^n|² is expanded the power r^j occurs exactly n + 1 − |j| times, gives

sin²((n + 1)t/2) / sin²(t/2) = |1 + r + . . . + r^n|² = ∑_{j=−n}^{n} (n + 1 − |j|) e^{ijt} = (n + 1) K_n(t).

The value K_n(0) = n + 1 follows by setting t = 0 in the defining sum.
It’s a good idea to try to plot a graph of Kn (t) for fixed n (I’ll do so in the
lecture).
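As a substitute for a plot, the following Python sketch (an illustration, not part of the notes) checks the closed form of proposition 2.27 against the defining sum at a few sample points:

```python
# Compare K_n(t) computed from the defining sum with the closed form
# (1/(n+1)) * sin((n+1)t/2)^2 / sin(t/2)^2 at several values of n and t.
import cmath
import math

def fejer_sum(n, t):
    # K_n(t) = sum_{j=-n}^{n} (1 - |j|/(n+1)) e^{ijt}; the result is real
    return sum((1 - abs(j) / (n + 1)) * cmath.exp(1j * j * t)
               for j in range(-n, n + 1)).real

def fejer_closed(n, t):
    # closed form of proposition 2.27, valid for 0 < |t| <= pi
    return (math.sin((n + 1) * t / 2) ** 2
            / ((n + 1) * math.sin(t / 2) ** 2))

for n in (1, 2, 5):
    for t in (0.3, 1.0, math.pi / 2, 3.0):
        assert abs(fejer_sum(n, t) - fejer_closed(n, t)) < 1e-10
        assert fejer_sum(n, t) >= 0   # K_n is non-negative (prop. 2.28(i))
print("closed form agrees with the defining sum")
```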
Now that we have several ways to write down Kn we can summarise some
of its important properties in the next proposition:
Proposition 2.28. For any n ∈ N, the Fejér kernel K_n satisfies:

(i) K_n(t) ≥ 0 ∀t ∈ R.

(ii) (1/2π) ∫_{−π}^{π} K_n(t) dt = 1.

(iii) For all ε > 0 and δ > 0 there exists N_{ε,δ} ∈ N such that n ≥ N_{ε,δ} =⇒ K_n(t) < ε for all t with δ ≤ |t| ≤ π.

Proof. Part (i) follows from the previous proposition, as K_n(t) is either a square
of a real number or equals n + 1 (according to the value of t). Part (ii) follows
from the definition of K_n, and the fact that (1/2π) ∫_{−π}^{π} exp(ijt) dt = 1 if j = 0
and 0 otherwise.
For part (iii), let ε, δ > 0 be given. By the previous proposition, if δ ≤ |t| ≤ π
then

K_n(t) = (1/(n + 1)) sin²((n + 1)t/2) / sin²(t/2)
 ≤ (1/(n + 1)) · 1/sin²(t/2)
 ≤ (1/(n + 1)) · 1/sin²(δ/2).

So it suffices to choose N_{ε,δ} ∈ N such that

N_{ε,δ} > 1/(ε sin²(δ/2));

then n ≥ N_{ε,δ} implies K_n(t) ≤ 1/((n + 1) sin²(δ/2)) < ε.
Fejér's theorem states: if f is a 2π-periodic continuous function, then

‖f − σ_n(f)‖_∞ → 0 as n → ∞.
Let's think about what this means. By proposition 2.25,

σ_n(f)(t) = K_n ∗ f(t) = (1/2π) ∫_{−π}^{π} K_n(s) f(t − s) ds.

Therefore

|σ_n(f)(t) − f(t)| = |(1/2π) ∫_{−π}^{π} K_n(s) (f(t − s) − f(t)) ds|
 ≤ (1/2π) ∫_{−π}^{π} K_n(s) |f(t − s) − f(t)| ds.   (2)
(Here we have used part (ii) of proposition 2.28 to write f(t) = (1/2π) ∫_{−π}^{π} K_n(s) f(t) ds, and the fact that K_n(s) ≥ 0 – see proposition 2.28 part (i).) We
need to show that the integral on the right is less than ε for sufficiently large
n. Our strategy will be to split the domain of integration into a small interval
[−δ, δ] containing the origin, and its complement [−π, −δ) ∪ (δ, π]. We will do
so using the continuity of f and the properties of the Fejér kernel K_n.

First let's use the continuity of f. Since f is continuous it is also uniformly
continuous, by proposition 1.19. Therefore there exists a δ > 0 such that

∀s, t ∈ R, −δ ≤ s ≤ δ =⇒ |f(t − s) − f(t)| < ε/2.
Without loss of generality, we may assume δ < π. It follows that

(1/2π) ∫_{−δ}^{δ} K_n(s) |f(t − s) − f(t)| ds < (1/2π) ∫_{−δ}^{δ} K_n(s) (ε/2) ds
 ≤ (1/2π) ∫_{−π}^{π} K_n(s) (ε/2) ds
 = ε/2   (3)

(where we have used that K_n(s) ≥ 0 and (1/2π) ∫_{−π}^{π} K_n(s) ds = 1).
Now we'll make use of the properties of K_n(s). Let M = ‖f‖_∞ , and let
N = N_{ε/(4M),δ} be as in proposition 2.28 part (iii), so that n ≥ N implies
K_n(s) < ε/(4M) for all s with δ ≤ |s| ≤ π. Therefore, for n ≥ N,

(1/2π) ∫_{δ}^{π} K_n(s) |f(t − s) − f(t)| ds ≤ (1/2π) ∫_{δ}^{π} K_n(s) (|f(t − s)| + |f(t)|) ds
 ≤ (1/2π) ∫_{δ}^{π} (ε/(4M)) (M + M) ds
 = (ε/2) (π − δ)/(2π)
 < ε/4.   (4)
Similarly,

(1/2π) ∫_{−π}^{−δ} K_n(s) |f(t − s) − f(t)| ds < ε/4.   (5)

Therefore, by equations (2) to (5),

|σ_n(f)(t) − f(t)| < ε/2 + ε/4 + ε/4 = ε,

as required.
Using this lemma we can prove the following corollary of Fejér’s theorem:
Corollary 2.31. Let f ∈ C[−π, π] with f(−π) = f(π). Then the Cesàro means σ_n(f) of the
Fourier series of f converge to f with respect to the norm ‖ · ‖₂, i.e.
lim_{n→∞} ‖f − σ_n(f)‖₂ = 0.
Let us choose δ such that

δ < 2π (ε/(2‖f‖_∞))².

Therefore

∀n ∈ N, ‖f − P_n(f)‖₂ ≤ ‖f − σ_n(g)‖₂.

Therefore

∀n ∈ N, n > N =⇒ ‖f − P_n(f)‖₂ < ε.
Corollary 2.34. The Fourier series for any function f ∈ C[−π, π] converges
to f with respect to the norm k · k2 .
Corollary 2.35 (Parseval's identity). For any two functions f, g ∈ C[−π, π],

(1/2π) ∫_{−π}^{π} f(t) \overline{g(t)} dt = ∑_{n=−∞}^{∞} f̂(n) \overline{ĝ(n)}.

Proof. The left hand side is the inner product ⟨f, g⟩ and the right hand side
is ∑_{n=−∞}^{∞} ⟨f, e_n⟩⟨e_n, g⟩ (where e_n(t) := exp(int)). Therefore the result follows
from Parseval's identity (proposition 2.10) and the fact that the e_n form a
countable orthonormal basis.
Corollary 2.36 (Riemann-Lebesgue lemma). For any function f ∈ C[−π, π],
fˆ(n) → 0 as n → ±∞.
Proof. By the previous corollary, ∑_{n=−∞}^{∞} |f̂(n)|² = (1/2π) ∫_{−π}^{π} |f(t)|² dt < ∞.
Therefore f̂(n) → 0 as n → ∞, and as n → −∞.
We therefore define the real Fourier coefficients

a_n = f̂(n) + f̂(−n) = (1/π) ∫_{−π}^{π} f(t) cos nt dt,
b_n = i f̂(n) − i f̂(−n) = (1/π) ∫_{−π}^{π} f(t) sin nt dt.

Like the complex Fourier series, the real Fourier series converges to f. Note
that a_n is defined for n ≥ 0 and b_n is defined for n ≥ 1, and a_n and b_n are both
real as long as f is a real continuous function.
If f is an odd function then a_n = 0 ∀n, because cos(nt) is even and the
integral of a product of odd and even functions vanishes. Similarly, if f is an
even function then b_n = 0 ∀n.
The sine Fourier series. Let f : [0, π] → R be a continuous function such that
f (0) = 0. We construct a Fourier series for f using the following trick. Define
a function f̃ : [−π, π] → R as follows:

f̃(t) = { f(t) if 0 ≤ t ≤ π, −f(−t) if −π ≤ t < 0. }

Now f̃ is continuous, because f(0) = 0, and also odd. The Fourier series for f̃
converges to f̃ on the interval [−π, π], and therefore it also converges to f on
the interval [0, π]. Since f̃ is an odd function, the coefficients a_n in its real
Fourier series are all zero and the b_n coefficients are given by

b_n = (1/π) ∫_{−π}^{π} f̃(t) sin nt dt = (2/π) ∫_{0}^{π} f(t) sin nt dt.

This is the sine Fourier series for f, and the b_n are known as the sine Fourier
coefficients. The series converges to f on the interval [0, π].
The cosine Fourier series. Let f : [0, π] → R be any continuous function.
Similar to above, we define a function f̃ : [−π, π] → R:

f̃(t) = { f(t) if 0 ≤ t ≤ π, f(−t) if −π ≤ t < 0. }

This function is continuous and even. Therefore the coefficients b_n in its real
Fourier series are zero and the coefficients a_n are given by

a_n = (1/π) ∫_{−π}^{π} f̃(t) cos nt dt = (2/π) ∫_{0}^{π} f(t) cos nt dt.

This is the cosine Fourier series for f, and the a_n are the cosine Fourier coefficients.
The series converges to f on the interval [0, π].
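The sine coefficient formula can be checked numerically. The following Python sketch (illustrative, not part of the notes) approximates the sine coefficients of f(t) = t on [0, π] by a midpoint rule and compares them with the exact values 2(−1)^{n+1}/n, which follow from a standard integration by parts:

```python
# Sine Fourier coefficients b_n = (2/pi) * integral_0^pi f(t) sin(nt) dt,
# approximated by the midpoint rule for f(t) = t.
import math

def sine_coeff(f, n, steps=20_000):
    h = math.pi / steps
    total = sum(f((k + 0.5) * h) * math.sin(n * (k + 0.5) * h)
                for k in range(steps))
    return 2.0 / math.pi * total * h

for n in range(1, 6):
    exact = 2.0 * (-1) ** (n + 1) / n   # exact value for f(t) = t
    assert abs(sine_coeff(lambda t: t, n) - exact) < 1e-6
print("midpoint rule matches the exact sine coefficients")
```

Note that f(t) = t satisfies f(0) = 0, as the construction of the sine series requires.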
One partial differential equation that can be solved using the Fourier series
is the wave equation:
∂²y/∂t² = c² ∂²y/∂x²   (6)
This is a differential equation for a function y(x, t) of two variables. The equa-
tion can be used to model a taut vibrating string: y(x, t) is the displacement
of the string at position x and time t, and c ∈ R is a real constant related to
the physical properties of the string. We assume that the ends of the string
are held fixed at positions x = 0 and x = π; therefore we impose the boundary
conditions
y(0, t) = 0 and y(π, t) = 0 ∀t ≥ 0. (7)
Suppose that at time t = 0,

y(x, 0) = f(x),  ∂y/∂t(x, 0) = 0  ∀x ∈ [0, π],   (8)
where f (x) is a continuous function of x ∈ [0, π]. These initial conditions de-
scribe a string that has been “plucked”. We aim to model the future oscillation
of the string by solving the wave equation subject to these boundary conditions
and initial conditions.
To solve the system (6), (7), (8), let's consider the sine Fourier coefficients of
y:

B_n(t) = (2/π) ∫_{0}^{π} y(x, t) sin nx dx.
(it is wise to use the sine Fourier series rather than any other Fourier series
because y(x, t) vanishes at x = 0 and at x = π). Note that because y depends
on both x and t, the Fourier coefficients depend on time t. Let us suppose that y
solves the wave equation and boundary conditions, and determine a differential
equation solved by B_n:

d²B_n/dt²(t) = (2/π) ∫_{0}^{π} ∂²y/∂t²(x, t) sin nx dx
 = (2c²/π) ∫_{0}^{π} ∂²y/∂x²(x, t) sin nx dx
 = −(2c²/π) ∫_{0}^{π} ∂y/∂x(x, t) (d/dx) sin nx dx + (2c²/π) [∂y/∂x(x, t) sin nx]_{x=0}^{x=π}
 = (2c²/π) ∫_{0}^{π} y(x, t) (d²/dx²) sin nx dx − (2c²/π) [y(x, t) (d/dx) sin nx]_{x=0}^{x=π}
 = −c²n² (2/π) ∫_{0}^{π} y(x, t) sin nx dx.
Here in the first line we exchanged integrals and derivatives – this is fine as
long as ∂ 2 y/∂t2 exists and is continuous. The two boundary terms arising from
integration by parts vanish because sin(nx) and y(x, t) vanish when x = 0, π.
Our conclusion therefore is that

d²B_n/dt²(t) + c²n² B_n(t) = 0.
So we have infinitely-many ordinary differential equations (labelled by n ∈
N) to solve. These equations are particular examples of second order linear
equations, whose general solution you should have seen before (in MATH1012,
for example):
Bn (t) = Cn cos cnt + Dn sin cnt.
Here C_n, D_n are real constants for all n ∈ N. To fix these constants we consider
the initial conditions. These are

B_n(0) = (2/π) ∫_{0}^{π} y(x, 0) sin nx dx = (2/π) ∫_{0}^{π} f(x) sin nx dx,

dB_n/dt(0) = (2/π) ∫_{0}^{π} ∂y/∂t(x, 0) sin nx dx = 0.
From the general solution, B_n(0) = C_n and dB_n/dt(0) = cn D_n. Therefore
D_n = 0 and

C_n = (2/π) ∫_{0}^{π} f(x) sin nx dx.
Therefore the solution y(x, t) to the system (6), (7), (8) can be described using
its sine Fourier series:

y(x, t) = ∑_{n=1}^{∞} B_n(t) sin nx = ∑_{n=1}^{∞} C_n cos(cnt) sin nx.
This series converges to y (with respect to the norm k · k2 ) because y(x, t) is
continuous.
To summarise, by calculating the constants Cn you can write down a solu-
tion of the system (6), (7), (8) that takes the form of a series. Note that the
constants Cn are nothing other than the sine Fourier coefficients of the function
f . Note also that we did not prove that the wave equation has a solution; we
merely found a formula for its solution assuming that it has one (proving that
differential equations have solutions is a topic all of its own).
The piece of magic that makes all of this work is the relation

d²/dx² sin nx = −n² sin nx
that appeared in our integration by parts formula. This relation can be inter-
preted as saying that sin nx is an eigenvector of the linear operator d2 /dx2 with
eigenvalue −n2 . We will come back to eigenvectors and eigenvalues later in the
course.
Example 2.37. Solve the system (6), (7), (8) with initial conditions given by
f (x) = x(π − x).
To write a series solution, we just need to know the sine Fourier coefficients of
f . These are
C_n = (2/π) ∫_{0}^{π} x(π − x) sin nx dx
 = −(2/π) ∫_{0}^{π} (π − 2x) (−cos nx / n) dx + (2/π) [x(π − x) (−cos nx / n)]_{x=0}^{x=π}
 = (2/π) ∫_{0}^{π} (π − 2x) (cos nx / n) dx
 = −(2/π) ∫_{0}^{π} (−2) (sin nx / n²) dx + (2/π) [(π − 2x) (sin nx / n²)]_{x=0}^{x=π}
 = (4/(πn²)) ∫_{0}^{π} sin nx dx
 = (4/(πn³)) [−cos nx]_{x=0}^{x=π}
 = (4/(πn³)) (1 + (−1)^{n+1})
 = { 8/(πn³) if n is odd, 0 if n is even. }

Therefore the solution is y(x, t) = ∑_{n odd} (8/(πn³)) cos(cnt) sin nx.
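This calculation is easy to check numerically. The following Python sketch (illustrative, not part of the notes) approximates C_n by a midpoint-rule integral and compares it with 8/(πn³) for odd n and 0 for even n:

```python
# Numerical check of Example 2.37:
# C_n = (2/pi) * integral_0^pi x(pi - x) sin(nx) dx.
import math

def C(n, steps=20_000):
    h = math.pi / steps
    total = sum(((k + 0.5) * h) * (math.pi - (k + 0.5) * h)
                * math.sin(n * (k + 0.5) * h) for k in range(steps))
    return 2.0 / math.pi * total * h

for n in range(1, 7):
    exact = 8.0 / (math.pi * n ** 3) if n % 2 == 1 else 0.0
    assert abs(C(n) - exact) < 1e-6
print("C_n matches 8/(pi n^3) for odd n and 0 for even n")
```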
3 Subspaces
In this section we return to our study of subspaces. Recall that so far we
have only discussed finite-dimensional subspaces of inner product spaces – the
study of infinite-dimensional subspaces requires more care. The two important
definitions are:
Definition 3.1. Let E be a normed vector space and let F ⊂ E be a subspace.
We say that F is closed if every convergent sequence in F has its limit in F .
That is, for all sequences (xn ) in F and all points x ∈ E,
xn → x =⇒ x ∈ F.
y₁ = (1, 0, 0, 0, 0, . . .)
y₂ = (1, 1/2, 0, 0, . . .)
y₃ = (1, 1/2, 1/3, 0, . . .)
. . .

Each y_n lies in c₀(N) (it has only finitely many non-zero entries), while the limit (1, 1/2, 1/3, . . .) belongs to ℓ²(N) but not to c₀(N).
Since the subspace c0 (N) of `2 (N) is not closed it is also not complete, by
proposition 3.4 (you also proved in your exercise sheet that c0 (N) is not com-
plete). We saw in example 1.25 that the subspace P [a, b] of polynomial functions
in C[a, b] is not complete with respect to the norm k · k∞ . Therefore this space
is also not closed, by proposition 3.4.
Example 3.6. Any finite-dimensional subspace of an inner product space is both
complete and closed.
This follows from the fact that any n-dimensional subspace F of an inner
product space E is isometrically isomorphic to Cn (with its usual inner product).
Since Cn is complete (as you showed on an exercise sheet), F must also be
complete, and hence closed by proposition 3.4.
To construct an isometric isomorphism from F to Cⁿ choose an orthonormal
basis {e₁, . . . , e_n} for F (by using the Gram-Schmidt process). Then let L :
Cⁿ → F be the linear map L(z) = ∑_{i=1}^{n} z_i e_i. It's an exercise to check that L is
an isometric isomorphism!
‖x − z‖ ≤ ‖x − y‖ ∀y ∈ F.

Let

d = inf{‖x − y‖² : y ∈ F}.

This infimum exists because the set is bounded from below by zero. Since d is
a greatest lower bound, for every n ∈ N there must exist a y_n ∈ F such that
‖x − y_n‖² < d + 1/n (otherwise d + 1/n would be a lower bound greater than
d). Since d is a lower bound, ‖x − y_n‖² ≥ d ∀n ∈ N. So, by the squeeze rule,
‖x − y_n‖² → d as n → ∞.
Now we’ll show that this sequence yn must be Cauchy. By the parallelogram
identity,
‖y_n − y_m‖² = 2‖y_n − x‖² + 2‖y_m − x‖² − 4‖x − (y_n + y_m)/2‖²
 ≤ 2‖y_n − x‖² + 2‖y_m − x‖² − 4d,

where in the last line we used the fact that (y_n + y_m)/2 ∈ F and the definition of d.
Given ε > 0, we may choose N ∈ N such that n > N =⇒ ‖y_n − x‖² − d < ε²/4.
It follows that n, m > N =⇒ ‖y_n − y_m‖ < ε, so (y_n) is a Cauchy sequence.
Since (y_n) is Cauchy and F is complete, (y_n) converges to a limit z ∈ F. By the
continuity of the norm,

‖z − x‖ = lim_{n→∞} ‖y_n − x‖ = √d.

If z′ ∈ F also satisfies ‖z′ − x‖ = √d then, by the parallelogram identity and the fact that ‖x − (z + z′)/2‖² ≥ d,

‖z − z′‖² = 4d − 4‖x − (z + z′)/2‖² ≤ 0.
‖P_F(x) − x‖ ≤ ‖y − x‖ ∀y ∈ F.
Note that PF (x) exists and is unique by the preceding theorem. However,
we haven’t yet shown that PF is linear.
If F is finite-dimensional with an orthonormal basis {e₁, . . . , e_n}, proposition
2.16 tells us that P_F is the map introduced in definition 2.14:

P_F(x) = ∑_{j=1}^{n} ⟨x, e_j⟩ e_j.
Proof. Suppose that x − z is perpendicular to F. Then, for any y ∈ F,
Pythagoras tells us that

‖x − y‖² = ‖x − z + z − y‖² = ‖x − z‖² + ‖z − y‖² ≥ ‖x − z‖².

So z is the closest point in F to x, i.e. z = P_F(x).
Now suppose that z = PF (x) i.e. that z is the closest point in F to x.
Suppose for contradiction that there exists a y ∈ F such that hx − z, yi 6= 0.
By multiplying y with a phase if necessary, we may assume without loss of
generality that hx − z, yi = α where α ∈ R and α > 0. Let t > 0 and consider
‖x − (z + ty)‖² = ‖x − z‖² − 2tα + t²‖y‖².

If t = α/‖y‖² then

‖x − (z + ty)‖² = ‖x − z‖² − 2α²/‖y‖² + α²‖y‖²/‖y‖⁴ = ‖x − z‖² − α²/‖y‖².
Thus z + ty is a point in F which is closer to x than z, contradicting our
assumption. So x − z is orthogonal to F .
Proposition 3.10. Let F be a complete subspace of an inner product space E.
Then PF : E → F is a linear map.
Proof. Let x₁, x₂ ∈ E, λ₁, λ₂ ∈ C, X = λ₁x₁ + λ₂x₂ and Z = λ₁P(x₁) + λ₂P(x₂).
We must show that P (X) = Z. By proposition 3.9,
hX − Z, yi = λ1 hx1 − P (x1 ), yi + λ2 hx2 − P (x2 ), yi = 0 ∀y ∈ F.
Therefore, again by proposition 3.9, Z = PF (X).
Example 3.13. Consider E = C[−a, a] with its usual inner product. Let Feven
and Fodd be the spaces of even and odd functions. These are both subspaces of
C[−a, a]. We claim that
F_even^⊥ = F_odd.

It follows that F_odd is a closed subspace of C[−a, a].
The crucial observation is that

∀f ∈ F_odd, ∀g ∈ F_even,  ⟨f, g⟩ = ∫_{−a}^{a} f(t) \overline{g(t)} dt = 0,

because f \overline{g} is odd and the integral of an odd function over [−a, a] vanishes.
F ⊆ (F ⊥ )⊥ .
(F ⊥ )⊥ ⊆ F.
4 Linear operators
4.1 Bounded linear operators
Recall that a linear operator (or linear map) T is a function T : E → F between
two vector spaces such that:
(i) T(x + y) = T(x) + T(y) ∀x, y ∈ E; and
(ii) T(λx) = λT(x) ∀x ∈ E, λ ∈ C.
In this section we will study linear operators of the following type:
The fact that f_y is a linear operator follows immediately from the definition
of the inner product (check this!). Let e ∈ E be any vector of unit length; then
by the Cauchy-Schwarz inequality

|f_y(e)| = |⟨e, y⟩| ≤ ‖e‖‖y‖ = ‖y‖.

Therefore the operator f_y is bounded. Moreover ‖f_y‖_op ≤ ‖y‖, because ‖y‖ is
an upper bound and ‖f_y‖_op is a least upper bound for the set

{|f_y(e)| : e ∈ E, ‖e‖ = 1}.

It remains to show ‖f_y‖_op ≥ ‖y‖. Suppose that y ≠ 0_E and let e = y/‖y‖.
Then

|f_y(e)| = ⟨y, y⟩/‖y‖ = ‖y‖.

Then ‖f_y‖_op ≥ ‖y‖, because ‖y‖ belongs to the set above and ‖f_y‖_op is an upper
bound for this set. Therefore ‖f_y‖_op = ‖y‖ when y ≠ 0_E. The case y = 0_E is
left as an exercise.
Example 4.4. Consider C[−π, π] with the usual inner product and let T :
C[−π, π] → C[−π, π] be the operator T(f) = df/dt. Then T is not bounded.
Consider for example the function e_n(t) = exp(int), where n ∈ N. Then
‖e_n‖₂ = 1 and T(e_n) = in e_n, so ‖T(e_n)‖₂ = n. Thus the set {‖T(f)‖₂ : f ∈
C[−π, π], ‖f‖₂ = 1} is not bounded above, so T is not bounded.
The next lemma provides an alternative way to check boundedness of linear
operators:
Lemma 4.5. Let E, F be two normed vector spaces (with E non-trivial) and
let T : E → F be a linear operator. Then T is bounded if and only if there
exists an L ≥ 0 such that ‖T(x)‖ ≤ L‖x‖ for all x ∈ E; in that case ‖T‖_op is
the least such L.
Example 4.6. Let T : Cⁿ → Cⁿ be the map T(x) = Ax, where A is the diagonal
n × n matrix

A = diag(λ₁, λ₂, . . . , λ_n).

I claim that T is bounded and that ‖T‖_op = L, where L := max_i |λ_i|. Let
x ∈ Cⁿ. Then

‖T(x)‖ = (∑_{i=1}^{n} |λ_i x_i|²)^{1/2} ≤ (L² ∑_{i=1}^{n} |x_i|²)^{1/2} = L‖x‖.
So T is bounded by lemma 4.5, and ‖T‖_op ≤ L. Now let i be such that |λ_i| = L
and let x ∈ Cⁿ be such that x_i = 1 and x_j = 0 if j ≠ i. Then ‖T(x)‖ = |λ_i| = L = L‖x‖, so ‖T‖_op ≥ L and therefore ‖T‖_op = L.
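The claim of example 4.6 can be illustrated numerically. The following Python sketch (not part of the notes) checks, for a sample choice of diagonal entries, that ‖T(x)‖ ≤ L‖x‖ and that equality is attained at the appropriate basis vector:

```python
# Diagonal operator T(x) = (l_1 x_1, ..., l_n x_n): operator norm max_i |l_i|.
import math

lam = [2.0, -3.0, 1.0 + 1.0j]       # sample diagonal entries
L = max(abs(l) for l in lam)        # here L = 3

def T(x):
    return [l * xi for l, xi in zip(lam, x)]

def norm(x):
    return math.sqrt(sum(abs(xi) ** 2 for xi in x))

x = [1.0, 2.0, -1.0]
assert norm(T(x)) <= L * norm(x) + 1e-12       # ||T(x)|| <= L ||x||
e = [0.0, 1.0, 0.0]                            # basis vector where |l_i| = L
assert abs(norm(T(e)) - L * norm(e)) < 1e-12   # equality is attained
print("operator norm of the diagonal map equals max |lambda_i|")
```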
Similarly, for λ ∈ C the linear map λT is defined by (λT)(x) := λ T(x) ∀x ∈ E.
The space of linear maps from E to F forms a vector space with respect to
these operations. Note that the zero-vector in this vector space is the map
0E,F : E → F such that 0E,F (x) = 0F ∀x ∈ E.
Proposition 4.9. Let E, F be two normed vector spaces. Then (B(E, F ), k·kop )
is a normed vector space.
Proof. We first show that B(E, F ) is a subspace of the vector space of all linear
maps E → F . Let S, T ∈ B(E, F ) and λ ∈ C. We must show that S + T ∈
B(E, F ), λT ∈ B(E, F ) and 0E,F ∈ B(E, F ).
Let e ∈ E with ‖e‖ = 1. Then

‖(S + T)(e)‖ = ‖S(e) + T(e)‖ ≤ ‖S(e)‖ + ‖T(e)‖ ≤ ‖S‖_op + ‖T‖_op,

so S + T is bounded, with ‖S + T‖_op ≤ ‖S‖_op + ‖T‖_op. Similarly, for λ ≠ 0,

‖T‖_op = ‖(1/λ)(λT)‖_op ≤ (1/|λ|) ‖λT‖_op.
Proof. Let (T_n) be a Cauchy sequence in B(E, F). We claim that (‖T_n‖_op) is a
Cauchy sequence in R. By lemma 1.8,

|‖T_n‖_op − ‖T_m‖_op| ≤ ‖T_n − T_m‖_op.

Let ε > 0. Since (T_n) is Cauchy there exists an N ∈ N such that n, m > N =⇒
‖T_n − T_m‖_op < ε. So n, m > N =⇒ |‖T_n‖_op − ‖T_m‖_op| < ε. Therefore (‖T_n‖_op)
is a Cauchy sequence. Since R is complete this Cauchy sequence converges; let
L = lim_{n→∞} ‖T_n‖_op.
T(λx + µy) = lim_{n→∞} T_n(λx + µy) = λ lim_{n→∞} T_n(x) + µ lim_{n→∞} T_n(y) = λT(x) + µT(y),

‖T(x)‖ = ‖lim_{n→∞} T_n(x)‖ = lim_{n→∞} ‖T_n(x)‖ ≤ lim_{n→∞} ‖T_n‖_op ‖x‖ = L‖x‖.
To conclude this subsection we focus on a special case.
Definition 4.11. Let E be a normed vector space. The dual of E is the space
E ∗ = B(E, C) of bounded linear operators from E to C. Elements of B(E, C)
are called bounded linear functionals.
By propositions 4.9 and 4.10, E ∗ is a Banach space (i.e. a complete normed
vector space).
For example, if E is an inner product space and y ∈ E we saw in example
4.3 that the map
fy : E → C, x 7→ hx, yi
is a bounded linear functional. In general, not all bounded linear functionals
are of this type (for example, the functional T : C[−1, 1] → C, T(g) = ∫_{0}^{1} g(t) dt
is not). However, things are different if E is complete:
Theorem 4.12 (Riesz-Fréchet). Let H be a Hilbert space. Then the map F :
H → H ∗ given by F : y 7→ fy is a bijective anti-linear isometry.
Proof. Saying that F is "anti-linear" means that f_{λx+µy} = λ̄f_x + µ̄f_y ∀x, y ∈
H, λ, µ ∈ C. This follows immediately from the definition of f_y and the properties
of the inner product.
Saying that F is an isometry means that kfy kop = kyk ∀y ∈ H. We showed
this in example 4.3.
Since F is an anti-linear isometry it is automatically injective: if fx = fy
then kx − yk = kfx−y kop = kfx − fy kop = 0 so x = y.
Finally, we must show that F is surjective. Let T ∈ H*. We seek a y ∈ H
such that T = f_y. If T = 0 then y = 0_H works, so suppose T ≠ 0. Let K = ker T;
then K ≠ H, and there is a non-zero vector z ∈ K^⊥, so that T(z) ≠ 0. We will
show that

y = λz

for some λ ∈ C.

Let x ∈ H be any vector. I can write

x = (x − (T(x)/T(z)) z) + (T(x)/T(z)) z.
The second vector on the right is a scalar multiple of z, so belongs to K^⊥. The
first belongs to K = ker T, because

T(x − (T(x)/T(z)) z) = T(x) − (T(x)/T(z)) T(z) = 0.

Therefore

⟨x, λz⟩ = λ̄⟨x − (T(x)/T(z)) z, z⟩ + λ̄⟨(T(x)/T(z)) z, z⟩ = 0 + (λ̄‖z‖²/T(z)) T(x).

Choosing λ so that λ̄ = T(z)/‖z‖² gives ⟨x, λz⟩ = T(x) for every x ∈ H, i.e. T = f_{λz}.
hPF (x), yi = hPF (x), y − PF (y)i + hPF (x), PF (y)i = hPF (x), PF (y)i,
i.e. PF is self-adjoint.
Example 4.16. Let L, R : `2 (N) → `2 (N) be the operators,
L(x1 , x2 , x3 , . . .) = (x2 , x3 , x4 , . . .)
R(x1 , x2 , x3 , . . .) = (0, x1 , x2 , . . .).
They are known as the “left shift” and “right shift” operators. Note that these
are both bounded, since for a unit vector x,

‖L(x)‖ = (|x₂|² + |x₃|² + |x₄|² + . . .)^{1/2} ≤ (∑_{j=1}^{∞} |x_j|²)^{1/2} = 1

and

‖R(x)‖ = (0 + |x₁|² + |x₂|² + . . .)^{1/2} = (∑_{j=1}^{∞} |x_j|²)^{1/2} = 1.
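These inequalities are easy to test on truncated sequences. The following Python sketch (not part of the notes; it models finitely-supported sequences by finite lists) checks them, along with the identity L(R(x)) = x:

```python
# Left and right shift on a finite-list model of finitely-supported
# sequences: R is an isometry, L never increases the norm, and L R = id.
import math

def Lshift(x):
    return x[1:] + [0.0]

def Rshift(x):
    return [0.0] + x[:-1]   # truncated model of (0, x_1, x_2, ...)

def norm(x):
    return math.sqrt(sum(abs(v) ** 2 for v in x))

x = [1.0, -2.0, 0.5, 0.0, 0.0]   # trailing zeros leave room to shift right
assert norm(Lshift(x)) <= norm(x)                 # ||L(x)|| <= ||x||
assert abs(norm(Rshift(x)) - norm(x)) < 1e-12     # R preserves the norm
assert Lshift(Rshift(x)) == x                     # L undoes R
print("L R = identity, R is an isometry, ||L(x)|| <= ||x||")
```

Note that R(L(x)) differs from x in the first entry, so L and R are not inverses of each other.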
Then

⟨x, T₁*(y) − T₂*(y)⟩ = 0 ∀x, y ∈ H.

In particular, choosing x = T₁*(y) − T₂*(y) gives ‖T₁*(y) − T₂*(y)‖² = 0, so
T₁* = T₂*. Thus

⟨x, T*(Y) − Z⟩ = 0 ∀x ∈ H.
In particular, choosing x = T*(Y) − Z gives

‖T*(Y) − Z‖² = 0.

Then g_y is bounded and ‖g_y‖_op ≤ ‖y‖‖T‖_op, because if x has unit length,

|g_y(x)| = |⟨T(x), y⟩| ≤ ‖T(x)‖‖y‖ ≤ ‖T‖_op ‖y‖,

where I have used the Cauchy-Schwarz inequality and the fact that T is bounded.
Therefore by the Riesz-Fréchet theorem 4.12 there exists a vector z ∈ H
such that
gy (x) = hx, zi.
I will define T ∗ (y) := z. Then
so kT kop = kT ∗ kop .
Third, ∀x, y ∈ H

⟨x, λ̄S*(y) + µ̄T*(y)⟩ = λ⟨x, S*(y)⟩ + µ⟨x, T*(y)⟩ = ⟨λS(x) + µT(x), y⟩.
Theorem 4.19. Let T : H → H be a self-adjoint bounded linear operator on a
Hilbert space H. Then

‖T‖_op = sup_{e∈H, ‖e‖=1} |⟨T(e), e⟩|.

Proof. Let

s = sup_{e∈H, ‖e‖=1} |⟨T(e), e⟩|.
We must show that s = ‖T‖_op. By the Cauchy-Schwarz inequality, for all unit
vectors e ∈ H,

|⟨T(e), e⟩| ≤ ‖T(e)‖‖e‖ = ‖T(e)‖ ≤ ‖T‖_op.

Since s is a least upper bound,

s ≤ ‖T‖_op.
Now let e be any unit vector, choose y = T(e) and x = ‖T(e)‖e. Then
⟨T(x), y⟩ = ‖T(e)‖³ and ‖x‖² = ‖y‖² = ‖T(e)‖², so the inequality |⟨T(x), y⟩| ≤
(s/2)(‖x‖² + ‖y‖²) established earlier in the proof gives ‖T(e)‖³ ≤ s‖T(e)‖², i.e.
‖T(e)‖ ≤ s. Since ‖T‖_op is the least upper bound of ‖T(e)‖ over unit vectors,

‖T‖_op ≤ s.
4.3 Hilbert-Schmidt operators
I begin this section by reminding you of some things you will have encountered
before in finite-dimensional linear algebra. Let A be an n × n matrix and let
T : Cⁿ → Cⁿ be the linear map given by T(x) = Ax ∀x ∈ Cⁿ. Let

e₁ = (1, 0, . . . , 0), e₂ = (0, 1, 0, . . . , 0), . . . , e_n = (0, . . . , 0, 1)
be the standard orthonormal basis for Cn . Then T (ej ) equals the jth column
of A, and
hT (ej ), ei i
equals the entry Aij of A in the ith row and jth column. Conversely, given
any linear map T : Cn → Cn , we can construct a square matrix A by setting
Aij = hT (ej ), ei i; it can then be shown that T (x) = Ax for every x ∈ Cn . Thus
every linear map from Cn to Cn can be described as multiplication by some
matrix.
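The correspondence between linear maps and matrices described above can be demonstrated in a few lines of Python (an illustrative sketch, not part of the notes):

```python
# Recover the matrix of a linear map T : C^2 -> C^2 from its matrix
# elements A_ij = <T(e_j), e_i>, as described in the text.

def inner(u, v):
    return sum(a * b.conjugate() for a, b in zip(u, v))

A = [[1, 2j], [3, 4]]          # a sample 2x2 matrix

def T(x):                      # T(x) = A x
    return [sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]

e = [[1, 0], [0, 1]]           # standard orthonormal basis of C^2
recovered = [[inner(T(e[j]), e[i]) for j in range(2)] for i in range(2)]
assert recovered == [[1, 2j], [3, 4]]
print("A_ij = <T(e_j), e_i> recovers the matrix")
```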
In infinite-dimensional linear algebra the situation is different: not every
linear map can be described by a matrix. Those that can have a special name:
Definition 4.20. Let E be an inner product space with countable orthonormal
basis (en )n∈N and let T : E → E be a bounded linear operator. The matrix
elements of T with respect to this basis are the complex numbers
hT (ej ), ei i i, j ∈ N.
e1 = (1, 0, 0, . . .)
e2 = (0, 1, 0, . . .)
..
.
Then T(e_j) = λ_j e_j. So the matrix elements of T w.r.t. this basis are

⟨T(e_j), e_i⟩ = λ_j ⟨e_j, e_i⟩ = { λ_j if j = i, 0 if j ≠ i. }

Suppose that the sequence (λ_n)_{n∈N} is bounded, i.e. there exists an L > 0 such
that |λ_n| ≤ L ∀n ∈ N. We leave it as an exercise to show that ‖T(x)‖ ≤ L‖x‖
for all x ∈ ℓ²(N), so T is bounded.

Now

∑_{i,j=1}^{∞} |⟨T(e_j), e_i⟩|² = ∑_{j=1}^{∞} |λ_j|².
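The following Python sketch (not part of the notes) illustrates this computation on a truncated model of ℓ²(N) with a sample choice of λ_j:

```python
# For the diagonal operator T(e_j) = l_j e_j, the sum of the squared
# matrix elements |<T(e_j), e_i>|^2 reduces to sum_j |l_j|^2.

def inner(u, v):
    return sum(a * b.conjugate() for a, b in zip(u, v))

lam = [1.0, 0.5, 0.25, 0.125]   # sample diagonal entries
n = len(lam)
e = [[1.0 if i == j else 0.0 for i in range(n)] for j in range(n)]

def T(x):
    return [l * xi for l, xi in zip(lam, x)]

hs_sq = sum(abs(inner(T(e[j]), e[i])) ** 2
            for i in range(n) for j in range(n))
assert abs(hs_sq - sum(abs(l) ** 2 for l in lam)) < 1e-12
print("sum of |<T(e_j), e_i>|^2 equals sum of |lambda_j|^2")
```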
Proof. By Parseval's identity, for any vector x ∈ E,

∑_{i=1}^{∞} |⟨f_i, x⟩|² = ∑_{i=1}^{∞} |⟨x, f_i⟩|² = ‖x‖² = ∑_{i=1}^{∞} |⟨x, e_i⟩|² = ∑_{i=1}^{∞} |⟨e_i, x⟩|².

Therefore the sum is finite and T is Hilbert-Schmidt.
In the proof I made use of the following:
Lemma. Let {f1 , . . . fN } be a set of N orthonormal vectors in a separable
Hilbert space H. Then there exists vectors fN +n for n ∈ N such that (fn )n∈N is
a countable orthonormal basis.
For completeness, here is a proof, which is not examinable.
Proof. Let (en )n∈N be any countable orthonormal basis for H. Let Fm =
sp{f1 , . . . , fN , e1 , . . . , em } and let gm be the orthogonal projection of em onto the
orthogonal complement of Fm−1 , i.e. gm = em − PFm−1 (em ). Note that by con-
struction Fm = sp{f1 , . . . , fN , g1 , . . . , gm }, and that the vectors f1 , . . . , fN , g1 , g2 , . . .
are orthogonal.
I claim that at most N of these vectors are zero. I prove this claim by contradiction:
suppose that g_{m₁} = g_{m₂} = . . . = g_{m_{N+1}} = 0_E for some m₁ < m₂ < . . . <
m_{N+1} = M. Consider the subspace F_M = sp{f₁, . . . , f_N, e₁, . . . , e_M}. The
assumption g_M = 0_E implies that e_M ∈ F_{M−1}, so e_M can be written as a linear
combination of the remaining vectors and F_M = sp{f₁, . . . , f_N, e₁, . . . , e_{M−1}}.
Similarly, each of the vectors e_{m_i} can in turn be removed from the spanning
set, so F_M can be written as the span of N + M − (N + 1) = M − 1 vectors and
dim F_M ≤ M − 1. However, since e₁, . . . , e_M are linearly independent we know
that dim F_M ≥ M, so we have reached a contradiction.
Given that only finitely many of the g_n are zero, let (g_{k(n)})_{n∈N} be the subsequence
consisting of all non-zero vectors in the sequence. Let f_{N+n} = g_{k(n)}/‖g_{k(n)}‖
for n ∈ N. Then (f_n)_{n∈N} is an orthonormal sequence. I need to show this is
a countable orthonormal basis. Let x ∈ H be any vector; I need to show that
x = ∑_{i=1}^{∞} ⟨x, f_i⟩f_i. Now f₁, . . . , f_{N+n} is an orthonormal basis for F_{k(n)}, so

‖x − ∑_{i=1}^{N+n} ⟨x, f_i⟩f_i‖ = ‖x − P_{F_{k(n)}}(x)‖ ≤ ‖x − ∑_{i=1}^{k(n)} ⟨x, e_i⟩e_i‖,

where I have used the fact that e₁, . . . , e_{k(n)} ∈ F_{k(n)} and P_{F_{k(n)}}(x) is the closest
point in F_{k(n)} to x. Since (e_n)_{n∈N} is an orthonormal basis the RHS tends to
zero as n → ∞, so x = lim_{n→∞} ∑_{i=1}^{N+n} ⟨x, f_i⟩f_i by the squeeze rule.
Note that the way (fn )n∈N is constructed in this proof is essentially the
Gram-Schmidt process. The only differences are that there are infinitely-many
rather than finitely-many vectors, and there is an additional step of removing
zero-vectors.
This proposition gives many more examples of Hilbert-Schmidt operators:
for example, the orthogonal projection PF onto any finite-dimensional subspace
is Hilbert-Schmidt, and the composition of such a projection with any bounded
linear operator is also Hilbert-Schmidt.
So far I haven’t said very much about the Hilbert-Schmidt norm of a Hilbert-
Schmidt operator. The two most useful properties of the Hilbert-Schmidt norm
are described in the next two propositions. First, the Hilbert-Schmidt norm
gives us information about the operator norm:
Proposition 4.26. Let T : E → E be a Hilbert-Schmidt operator. Then
kT kop ≤ kT kHS .
Proof. In the proof we will need the Cauchy-Schwarz inequality for ℓ²(N), which
says that for any two sequences (y_j)_{j∈N} and (z_j)_{j∈N} of complex numbers such
that ∑_j |y_j|² < ∞ and ∑_j |z_j|² < ∞,

|∑_{j=1}^{∞} y_j z̄_j|² ≤ (∑_{j=1}^{∞} |y_j|²)(∑_{j=1}^{∞} |z_j|²).

Let x ∈ E be a unit vector and write x_j = ⟨x, e_j⟩. Then, by Parseval's identity,

‖T(x)‖² = ∑_{i=1}^{∞} |⟨T(x), e_i⟩|² = ∑_{i=1}^{∞} |∑_{j=1}^{∞} x_j ⟨T(e_j), e_i⟩|²
 ≤ ∑_{i=1}^{∞} (∑_{k=1}^{∞} |x_k|²)(∑_{j=1}^{∞} |⟨T(e_j), e_i⟩|²)   by Cauchy-Schwarz
 = ‖x‖² ‖T‖²_HS = ‖T‖²_HS.

Therefore ‖T‖_HS is an upper bound for ‖T(x)‖ over unit vectors x, and since ‖T‖_op is
a least upper bound it follows that ‖T‖_op ≤ ‖T‖_HS.
The second proposition says that kT kHS is a norm (in fact, it says a little
more):
Proposition 4.27. Let H be a separable Hilbert space with countable orthonormal
basis (e_n)_{n∈N}. Then HS(H, H) is an inner product space, with inner product
given by

⟨S, T⟩_HS = ∑_{i=1}^{∞} ⟨S(e_i), T(e_i)⟩.
The norm induced by this inner product equals the Hilbert-Schmidt norm kT kHS .
Recall that a subsequence of a sequence (x_n) is a sequence of the form (x_{k(n)}), where k : N → N is a
strictly increasing function (n > m =⇒ k(n) > k(m)). For example, the
function k(n) = 2n picks out the subsequence of even terms: x₂, x₄, x₆, . . ..
Note that if (x_n) converges to x then any subsequence of (x_n) also converges to
x. Another reminder is:
Definition 4.28. A sequence (x_n) in a normed vector space E is called
bounded if there exists a constant M > 0 such that ‖x_n‖ ≤ M ∀n ∈ N.
You have seen both of these notions before in the Bolzano-Weierstrass the-
orem, which says that every bounded sequence in R (or C) has a convergent
subsequence. There is a generalisation of this theorem to finite-dimensional
inner product spaces:
Note that there is no such theorem in infinite-dimensional normed vector
spaces.
Definition 4.30. Let H be a Hilbert space and let T : H → H be a bounded
linear operator. Then T is called compact if, for every bounded sequence (xn )
in H, the image (T (xn )) has a convergent subsequence.
Example 4.31. The identity operator on an infinite-dimensional Hilbert space
is not compact.
Recall that the identity operator is I_H : H → H such that I_H(x) = x
∀x ∈ H. To show that it's not compact I need to give an example of a bounded
sequence whose image has no convergent subsequence.
Let (en )n∈N be an orthonormal sequence, i.e. a sequence such that hen , en i =
1 and hen , em i = 0 if n 6= m. Such a sequence exists because H is infinite-
dimensional. Then the sequence (en ) is bounded. Its image is the same se-
quence (IH (en )) = (en ). I will show by contradiction that this sequence has no
convergent subsequence. Let (e_{k(n)}) be any subsequence. Then for n ≠ m,

‖e_{k(n)} − e_{k(m)}‖ = (⟨e_{k(n)}, e_{k(n)}⟩ + ⟨e_{k(m)}, e_{k(m)}⟩ − ⟨e_{k(n)}, e_{k(m)}⟩ − ⟨e_{k(m)}, e_{k(n)}⟩)^{1/2} = √2.
This subsequence is not Cauchy, since if ε = √2/2 there is no N such that
n, m > N =⇒ ‖e_{k(n)} − e_{k(m)}‖ < ε. Since it is not Cauchy, it is also not
convergent. Therefore the sequence has no convergent subsequence.
Example 4.32. Any bounded linear operator of finite rank is compact.
To prove this I’ll make use of the Bolzano-Weierstrass theorem. Let T :
H → H be a bounded linear operator of finite rank and let (xn ) be a bounded
sequence in H. Then there exists an M > 0 such that kxn k < M ∀n ∈ N.
Therefore
kT (xn )k ≤ kT kop kxn k ≤ kT kop M ∀n ∈ N.
So (T (xn )) is a bounded sequence. Also, T (xn ) is a sequence in the image of T ,
which is finite-dimensional (since T has finite rank). Thus T (xn ) is a bounded
sequence in a finite-dimensional vector space, so by proposition 4.29 it has a
convergent subsequence. Therefore T is compact.
Proposition 4.33. Let (Tn ) be a sequence of compact bounded linear operators
on a Hilbert space H and suppose that T is a bounded linear operator on H such
that Tn → T with respect to the operator norm. Then T is compact.
Note: saying Tn → T w.r.t. the operator norm means that kTn − T kop → 0.
Proof. Let (xn ) be a bounded sequence in H. We must find a convergent sub-
sequence of T (xn ).
Since T1 is compact and (xn ) is bounded the sequence (T1 (xn )) has a subse-
quence (T1 (xk1 (n) )) which converges to a limit y1 ∈ H. Since T2 is compact and
(xk1 (n) ) is bounded the sequence (T2 (xk1 (n) )) has a subsequence (T2 (xk2 (n) ))
which converges to a limit y2 ∈ H. By repeating this argument, we find subse-
quences (xkj (n) and vectors yj ∈ H such that
• (xkj+1 (n) ) is a subsequence of (xkj (n) ) and
• Tj (xkj (n) ) converges to yj as n → ∞.
Let’s arrange these sequences in a big table, with the j-th sequence in the j-th row:
$$\begin{array}{cccc}
x_{k_1(1)} & x_{k_1(2)} & x_{k_1(3)} & \dots \\
x_{k_2(1)} & x_{k_2(2)} & x_{k_2(3)} & \dots \\
x_{k_3(1)} & x_{k_3(2)} & x_{k_3(3)} & \dots \\
\vdots & \vdots & \vdots \\
x_{k_j(1)} & x_{k_j(2)} & x_{k_j(3)} & \dots \\
\vdots & \vdots & \vdots
\end{array}$$
Now let’s choose the sequence given by the diagonal entries: $(z_n) = (x_{k_n(n)})$. Then, for any $j \in \mathbb{N}$, $T_j(z_n) \to y_j$ as $n \to \infty$. This is because, if we ignore the first $j-1$ terms, $(T_j(z_n))$ is a subsequence of $(T_j(x_{k_j(n)}))$.
I claim that $(T(z_n))$ is a Cauchy sequence. To show this, let $\epsilon$ be any positive real number. Consider
$$\begin{aligned}
\|T(z_n) - T(z_m)\| &\le \|T(z_n) - T_l(z_n)\| + \|T_l(z_n) - T_l(z_m)\| + \|T_l(z_m) - T(z_m)\| \\
&\le \|T - T_l\|_{op}\|z_n\| + \|T_l(z_n) - T_l(z_m)\| + \|T - T_l\|_{op}\|z_m\| \\
&\le 2M\|T - T_l\|_{op} + \|T_l(z_n) - T_l(z_m)\|,
\end{aligned}$$
I claim that $T_n \to T$ in both the operator norm and the Hilbert-Schmidt norm. Consider first the Hilbert-Schmidt norm. The matrix elements of $T - T_n$ are
$$\langle T(e_j) - T_n(e_j), e_i \rangle = \langle T(e_j), e_i \rangle - \left\langle \sum_{k=1}^{n} \langle e_j, e_k \rangle T(e_k), e_i \right\rangle = \begin{cases} 0 & j \le n \\ \langle T(e_j), e_i \rangle & j > n. \end{cases}$$
Therefore
$$\|T - T_n\|_{HS}^2 = \sum_{j=n+1}^{\infty}\sum_{i=1}^{\infty} |\langle T(e_j), e_i\rangle|^2 = \sum_{j=1}^{\infty}\sum_{i=1}^{\infty} |\langle T(e_j), e_i\rangle|^2 - \sum_{j=1}^{n}\sum_{i=1}^{\infty} |\langle T(e_j), e_i\rangle|^2 \to 0 \quad \text{as } n \to \infty.$$
Now kT − Tn kop ≤ kT − Tn kHS , by proposition 4.26. Therefore kT − Tn kop → 0
as n → ∞.
We have shown that T is a limit of a sequence of compact operators. There-
fore T is compact by proposition 4.33.
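A numerical sketch of this tail estimate in Python with numpy (the matrix $T_{ij} = ((i+1)(j+1))^{-3/2}$ and the truncation size are illustrative choices, not from the notes): cutting $T$ down to its first $n$ columns, as the operators $T_n$ do on the basis vectors, makes the Hilbert-Schmidt distance to $T$ shrink as $n$ grows.

```python
import numpy as np

size = 200
i, j = np.indices((size, size))
T = 1.0 / (((i + 1.0) * (j + 1.0)) ** 1.5)   # square-summable entries

def hs_norm(A):
    # Hilbert-Schmidt norm: square root of the sum of squared entries.
    return np.sqrt(np.sum(np.abs(A) ** 2))

tails = []
for n in (5, 20, 80):
    Tn = T.copy()
    Tn[:, n:] = 0.0   # T_n agrees with T on e_1, ..., e_n and kills the rest
    tails.append(hs_norm(T - Tn))

# The tail of the double sum decreases as n increases.
assert tails[0] > tails[1] > tails[2]
```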
Example 4.38. If E is finite-dimensional and T : E → E then λ ∈ σ(T ) if and
only if λ is an eigenvalue of T. I’ve already shown that if λ is an eigenvalue then λ ∈ σ(T). Suppose that λ is not an eigenvalue; I must show that λ ∉ σ(T).
Since λ is not an eigenvalue there are no non-zero vectors v such that T (v) = λv,
i.e.
ker(T − λIE ) = {v ∈ E : (T − λIE )(v) = 0E } = {0E }.
Therefore nullity(T − λIE ) = dim ker(T − λIE ) = 0 and T is injective. By the
rank-nullity theorem,
dim im(T − λIE ) = rank(T − λIE ) = dim(E) − nullity(T − λIE ) = dim(E),
so im(T − λIE ) = E and T − λIE is surjective. Since T − λIE is a bijection it
has an inverse (T − λIE )−1 , and since this linear operator has finite-dimensional
domain it is bounded.
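A hedged numerical check of this example in Python with numpy (the particular matrix and the value of $\lambda$ are illustrative choices): for a number $\lambda$ that is not an eigenvalue, $T - \lambda I$ is invertible, so $\lambda \notin \sigma(T)$.

```python
import numpy as np

T = np.array([[2.0, 1.0],
              [0.0, 3.0]])
eigs = np.linalg.eigvals(T)          # eigenvalues of T: {2, 3}

lam = 5.0                            # not an eigenvalue of T
assert not np.any(np.isclose(eigs, lam))

# T - lam*I is invertible, so lam is in the resolvent set, not the spectrum.
inv = np.linalg.inv(T - lam * np.eye(2))
assert np.allclose((T - lam * np.eye(2)) @ inv, np.eye(2))
```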
These two examples show that the spectrum of a bounded linear operator is
very closely related to its set of eigenvalues. Eigenvalues have many important
applications. For example, recall that our earlier discussion of the vibrating
string revolved around the functions sin(nx), with n ∈ N. These are eigenvectors
of the operator $y \mapsto d^2y/dx^2$ with eigenvalue $-n^2$. Similarly, if one wants to study a vibrating surface (such as the skin of a drum, or a metal plate) it is
important to know about the eigenvalues and eigenvectors of the appropriate
operator.
In the theory of Hilbert spaces one tends to study the spectrum of an oper-
ator rather than its set of eigenvalues. As we shall see, the spectrum is “better-
behaved” than the set of eigenvalues, i.e. the spectrum has lots of nice properties
which are not shared by the set of eigenvalues. We will see some of those prop-
erties in a moment, but in order to derive those properties I first need to show
you some preliminary results. The proofs of the next two lemmas are left as
exercises.
Lemma 4.39. If S, T : E → E are two bounded linear operators on a normed
vector space E then ST is a bounded linear operator and kST kop ≤ kSkop kT kop .
Lemma 4.40. If E is a normed vector space and (Sn ) and (Tn ) are two conver-
gent sequences in B(E, E) such that Sn → S ∈ B(E, E) and Tn → T ∈ B(E, E)
then Sn Tn → ST .
I can use these to prove:
Proposition 4.41. Let A : H → H be a bounded operator on a Hilbert space
such that $\|A\|_{op} < 1$. Then $I_H - A$ has a bounded inverse given by
$$\sum_{n=0}^{\infty} A^n.$$
I claim that $(B_n)$ is a Cauchy sequence in $B(H,H)$. Let $r = \|A\|_{op} < 1$. Then for $n > m$,
$$\|B_n - B_m\|_{op} = \left\| \sum_{j=m+1}^{n} A^j \right\|_{op} \le \sum_{j=m+1}^{n} \|A^j\|_{op} \le \sum_{j=m+1}^{n} \|A\|_{op}^j = \frac{r^{m+1} - r^{n+1}}{1-r} \le \frac{r^{m+1}}{1-r}.$$
Note that the RHS tends to zero as $m \to \infty$ because $r < 1$. Therefore, given $\epsilon > 0$ there is an $N \in \mathbb{N}$ such that $m > N \implies r^{m+1}/(1-r) < \epsilon$. Then $n > m > N \implies \|B_n - B_m\|_{op} < \epsilon$, so the sequence $(B_n)$ is Cauchy. Since $H$ is complete, $B(H,H)$ is complete by proposition 4.10 and hence $B_n$ converges to a limit $B \in B(H,H)$.
Now I’ll show that $B$ is an inverse of $I_H - A$. Notice that
$$(I_H - A)B_n = B_n(I_H - A) = I_H - A^{n+1} \to I_H,$$
since $\|A^{n+1}\|_{op} \le r^{n+1} \to 0$. Taking the limit $n \to \infty$ gives $(I_H - A)B = B(I_H - A) = I_H$. So $B$ is a bounded inverse of $I_H - A$.
Note that the series $\sum_{j=0}^{\infty} A^j$ for $(I_H - A)^{-1}$ is very similar to the Taylor series $(1-z)^{-1} = 1 + z + z^2 + \dots$. This observation might help you remember the series!
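The Neumann series above can be sanity-checked numerically in Python with numpy, using a matrix as a finite-dimensional stand-in for the operator $A$ (the particular matrix and the number of terms are illustrative choices): when the operator norm of $A$ is below 1, the partial sums $B_n = \sum_{j=0}^n A^j$ converge to the inverse of $I - A$.

```python
import numpy as np

A = np.array([[0.2, 0.1],
              [0.0, 0.3]])
assert np.linalg.norm(A, 2) < 1      # spectral norm = operator norm on R^2

# Accumulate the partial sum B_n = I + A + A^2 + ... + A^99.
B = np.zeros((2, 2))
term = np.eye(2)
for _ in range(100):
    B = B + term
    term = term @ A

exact = np.linalg.inv(np.eye(2) - A)
assert np.allclose(B, exact)
```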
The next theorem outlines some of the important properties of the spectrum.
Theorem 4.42. Let T : H → H be a bounded linear operator on a Hilbert space
H. Then
(i) the spectrum of T is contained in the closed disc of radius $\|T\|_{op}$:
To prove part (ii) I will first show that if $\lambda \in \rho(T)$ then there exists $\epsilon > 0$ such that
$$|\mu - \lambda| < \epsilon \implies \mu \in \rho(T).$$
(If you are studying topology or metric spaces, you will recognise that this is the same as saying that $\rho(T)$ is an open subset of $\mathbb{C}$.) To prove my claim, suppose that $\lambda \in \rho(T)$. Then for any $\mu \in \mathbb{C}$,
so $\|A\|_{op} < 1$ if $|\mu - \lambda| < \epsilon := 1/\|(T - \lambda I_H)^{-1}\|_{op}$. Thus we have shown that if $|\mu - \lambda| < \epsilon$ then $\mu \in \rho(T)$.
Having proved the claim, I now show that $\sigma(T)$ is closed. This means that every convergent sequence $(\lambda_n)$ in $\sigma(T)$ has its limit in $\sigma(T)$. Suppose for contradiction that $(\lambda_n)$ is a sequence in $\sigma(T)$ such that $\lambda_n \to \lambda$ and $\lambda \notin \sigma(T)$. Then there exists an $\epsilon > 0$ such that $|\mu - \lambda| < \epsilon \implies \mu \notin \sigma(T)$. Since $\lambda_n \to \lambda$ there exists $N \in \mathbb{N}$ such that $n > N \implies |\lambda_n - \lambda| < \epsilon$. Then we have that $n > N \implies \lambda_n \notin \sigma(T)$, which contradicts the statement that $(\lambda_n)$ is a sequence in $\sigma(T)$. So $\sigma(T)$ is closed.
The proof of part (iii) is omitted – it involves a generalisation of Liouville’s
theorem from complex analysis, so is beyond our scope.
The next proposition is another example of a nice property of the spectrum.
Proposition 4.43. Let T ∈ B(H, H) be a bounded linear operator on a Hilbert
space H. Then λ ∈ σ(T ) ⇐⇒ λ̄ ∈ σ(T ∗ ).
Proof. Suppose that $\lambda \in \rho(T)$, i.e. that $(T - \lambda I_H)^{-1}$ exists. Then taking the adjoint of the LHS and RHS of
$$I_H = (T - \lambda I_H)(T - \lambda I_H)^{-1}$$
gives
$$I_H = \left((T - \lambda I_H)^{-1}\right)^* (T^* - \bar\lambda I_H).$$
Similarly, taking the adjoint of
$$I_H = (T - \lambda I_H)^{-1}(T - \lambda I_H)$$
gives
$$I_H = (T^* - \bar\lambda I_H)\left((T - \lambda I_H)^{-1}\right)^*.$$
So $T^* - \bar\lambda I_H$ is invertible with inverse $\left((T - \lambda I_H)^{-1}\right)^*$, and $\bar\lambda \in \rho(T^*)$. By a similar argument, $\bar\lambda \in \rho(T^*)$ implies that $\lambda \in \rho(T)$. So we have shown $\bar\lambda \in \rho(T^*) \iff \lambda \in \rho(T)$, which is equivalent to $\bar\lambda \in \sigma(T^*) \iff \lambda \in \sigma(T)$.
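In finite dimensions, where the spectrum is the set of eigenvalues and the adjoint is the conjugate transpose, this proposition can be checked numerically in Python with numpy (the particular complex matrix is an illustrative choice): the eigenvalues of $T^*$ are the complex conjugates of those of $T$.

```python
import numpy as np

T = np.array([[1.0 + 2.0j, 0.5],
              [0.0, 3.0 - 1.0j]])

eigs_T = np.linalg.eigvals(T)
eigs_Tstar = np.linalg.eigvals(T.conj().T)   # adjoint = conjugate transpose

# The spectrum of T* is the complex conjugate of the spectrum of T.
assert np.allclose(np.sort_complex(eigs_Tstar),
                   np.sort_complex(np.conj(eigs_T)))
```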
Example 4.44. Consider the left- and right-shift operators L : `2 (N) → `2 (N),
R : `2 (N) → `2 (N) defined by
L(x1 , x2 , x3 , . . .) = (x2 , x3 , x4 , . . .)
R(x1 , x2 , x3 , . . .) = (0, x1 , x2 , . . .).
We’ve already seen that these are bounded, with kLkop = kRkop = 1, and that
L = R∗ and R = L∗ . Let’s work out what their spectra are. Since eigenvalues
always belong to the spectrum, we’ll find their eigenvalues first.
Let λ ∈ C, and let
x = (1, λ, λ2 , . . .).
Then $x \in \ell^2(\mathbb{N})$ if $|\lambda| < 1$, because in that case $\sum_{n=0}^{\infty} |\lambda|^{2n}$ is a convergent geometric series. Moreover,
$$L(x) = (\lambda, \lambda^2, \lambda^3, \dots) = \lambda x,$$
so every $\lambda$ with $|\lambda| < 1$ is an eigenvalue of $L$.
On the other hand, R has no eigenvectors. To show this, suppose that x ∈ `2 (N)
satisfies
λ(x1 , x2 , x3 , . . .) = R(x1 , x2 , x3 , . . .) = (0, x1 , x2 , . . .).
If λ = 0 this clearly implies that xn = 0 ∀n ∈ N. If λ ≠ 0 then the equation
λx1 = 0 implies that x1 = 0, the equation λx2 = x1 in turn implies that x2 = 0,
and so on, so that again xn = 0 ∀n ∈ N. Thus x has to be zero, so cannot be
an eigenvector of R.
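A numerical sketch of this contrast on a truncation of $\ell^2(\mathbb{N})$, in Python with numpy (the value $\lambda = 0.5$ and the truncation length are illustrative choices): the vector $x = (1, \lambda, \lambda^2, \dots)$ satisfies $L(x) = \lambda x$ exactly in each retained coordinate, while $R(x)$ already fails to be a multiple of $x$ in its first entry.

```python
import numpy as np

lam = 0.5
n = 50
x = lam ** np.arange(n)              # truncation of (1, lam, lam^2, ...)

# Left shift: drop the first entry. In each coordinate, L(x) = lam * x.
Lx = x[1:]
assert np.allclose(Lx, lam * x[:-1])

# Right shift: insert a leading zero. R(x) is not a multiple of x.
Rx = np.concatenate(([0.0], x[:-1]))
assert not np.isclose(Rx[0], lam * x[0])
```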
Now let’s consider the spectra of L and R. By theorem 4.42 and our calcu-
lation of eigenvalues we know that
4.6 The spectral theorem
In this section we will study in detail the eigenvalues and spectrum of operators
which are both compact and self-adjoint. I’ll begin with an example, but to set
the example up I need a lemma (whose proof is left as an exercise).
Lemma 4.45. Let $H$ be a Hilbert space, let $(e_n)$ be an orthonormal sequence in $H$, and let $(\lambda_n)$ be a bounded sequence in $\mathbb{C}$. Then
$$T(x) = \sum_{n=1}^{\infty} \lambda_n \langle x, e_n\rangle e_n$$
defines a bounded linear operator $T : H \to H$.
Indeed,
$$\|(T - T_n)(x)\|^2 = \sum_{j=n+1}^{\infty} |\lambda_j \langle x, e_j\rangle|^2 \le |\lambda_{n+1}|^2 \sum_{j=n+1}^{\infty} |\langle x, e_j\rangle|^2 \le |\lambda_{n+1}|^2 \|x\|^2,$$
so $\|T - T_n\|_{op} \le |\lambda_{n+1}| \to 0$.
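On a finite truncation this operator is just a diagonal matrix, which makes the estimate easy to check numerically in Python with numpy (the choice $\lambda_n = 1/n$ and the truncation size are illustrative assumptions): cutting the series after $n$ terms changes $T$ by exactly $|\lambda_{n+1}|$ in operator norm when the $|\lambda_n|$ are decreasing.

```python
import numpy as np

lam = 1.0 / np.arange(1, 11)          # lambda_n = 1/n, decreasing to 0
T = np.diag(lam)                      # T(x) = sum lambda_n <x, e_n> e_n

n = 4
Tn = np.diag(np.concatenate((lam[:n], np.zeros(10 - n))))   # first n terms only

# Operator (spectral) norm of the diagonal tail is its largest entry.
assert np.isclose(np.linalg.norm(T - Tn, 2), lam[n])        # = |lambda_{n+1}|
```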
We will see later that every compact self-adjoint operator on a Hilbert space
is of this type. This is an important result – it is an infinite-dimensional analog
of diagonalising a matrix. Just as diagonalising a matrix makes it easier to do
calculations and prove theorems, so too with operators.
Notice that the operator in the example has eigenvalues λ1, λ2, . . . (because T(en) = λn en). As we will see later, this information tells us exactly what its spectrum is.
Proposition 4.47. Let T : H → H be a compact self-adjoint operator on a
Hilbert space H. Then T has an eigenvalue λ = ±kT kop .
Proof. In the case kT kop = 0, we have T = 0H,H so 0 is certainly an eigenvalue
of T .
Consider then the case $\|T\|_{op} \neq 0$. By proposition 4.19,
$$\|T\|_{op} = \sup_{\|x\|=1} |\langle T(x), x\rangle|.$$
Therefore for each $n \in \mathbb{N}$ there exists a unit vector $e_n$ such that $\|T\|_{op} - \frac{1}{n} < |\langle T(e_n), e_n\rangle| \le \|T\|_{op}$. This sequence $(e_n)$ must have a subsequence $(e_{k(n)})$ such that
$$\langle T(e_{k(n)}), e_{k(n)}\rangle \to \lambda,$$
where λ = ±kT kop . Since T is compact and the subsequence (ek(n) ) is bounded,
this subsequence must have a subsequence (em(n) ) such that T (em(n) ) converges
to a limit $z$. It follows that
$$\lambda e_{m(n)} \to z,$$
and hence that
$$T(z) = \lim_{n\to\infty} \lambda T(e_{m(n)}) = \lambda z.$$
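A numerical check of this proposition in finite dimensions, in Python with numpy (the random symmetric matrix is an illustrative stand-in: a symmetric matrix is self-adjoint, and every operator on a finite-dimensional space is compact): some eigenvalue has absolute value equal to the operator norm.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
T = (M + M.T) / 2                    # symmetric, hence self-adjoint

op_norm = np.linalg.norm(T, 2)       # spectral norm = operator norm
eigs = np.linalg.eigvalsh(T)         # real eigenvalues of a symmetric matrix

# One of the eigenvalues equals +op_norm or -op_norm.
assert np.any(np.isclose(np.abs(eigs), op_norm))
```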
(ii) If T is of infinite rank then there exists an orthonormal sequence (en)n∈N in H and a sequence (λn) of non-zero real numbers such that
T1 : H1 → H1 , T1 (x) := T (x).
H2 = {e1 , e2 }⊥ .
T2 : H2 → H2 , T2 (x) := T (x).
Since Hn+1 ⊂ Hn ,
$\epsilon > 0$ there exists $N \in \mathbb{N}$ such that $n, m > N \implies \|T(e_{k(n)}) - T(e_{k(m)})\| < \epsilon$. Now
$$\|T(e_{k(n)}) - T(e_{k(m)})\|^2 = \|\lambda_{k(n)} e_{k(n)} - \lambda_{k(m)} e_{k(m)}\|^2 = \lambda_{k(n)}^2 + \lambda_{k(m)}^2 \ge \lambda_{k(n)}^2.$$
H∞ = {e1 , e2 , . . .}⊥ .
T∞ : H∞ → H∞ , T∞ (x) := T (x).
Now kT∞ kop ≤ kTn kop = |λn | for all n ∈ N, because H∞ ⊂ Hn . Since |λn | → 0,
kT∞ k = 0. Therefore T∞ is the zero operator. Now
$$x - \sum_{n=1}^{\infty} \langle x, e_n\rangle e_n \in H_\infty,$$
because $\left\langle x - \sum_{n=1}^{\infty} \langle x, e_n\rangle e_n, e_m\right\rangle = 0$ for all $m \in \mathbb{N}$. Therefore
$$T(x) = T\left(x - \sum_{n=1}^{\infty} \langle x, e_n\rangle e_n\right) + T\left(\sum_{n=1}^{\infty} \langle x, e_n\rangle e_n\right) = 0 + \sum_{n=1}^{\infty} \langle x, e_n\rangle T(e_n) = \sum_{n=1}^{\infty} \lambda_n \langle x, e_n\rangle e_n.$$
Now suppose that $\lambda \neq \lambda_n$ $\forall n \in \mathbb{N}$ and $\lambda \neq 0$; it remains to show that $\lambda \notin \sigma(T)$. As in the proof of theorem 4.48 define
$$P(x) = x - \sum_{n=1}^{r} \langle x, e_n\rangle e_n.$$
If these identities are satisfied it also holds that (T − λIH )(R(x)) = x, as you
can check for yourself.
It remains to show that the sequence $\mu_n$ is bounded in the case $r = \infty$. I need to find a $d > 0$ such that $|\lambda - \lambda_n| \ge d$ $\forall n \in \mathbb{N}$. To do so, set $\epsilon = |\lambda|/2$; since $\lambda_n \to 0$ there exists an $N \in \mathbb{N}$ such that $n > N \implies |\lambda_n| < |\lambda|/2$. So $n > N \implies |\lambda - \lambda_n| \ge |\lambda| - |\lambda_n| > |\lambda|/2$. If we set
$$d = \min\{|\lambda - \lambda_1|, |\lambda - \lambda_2|, \dots, |\lambda - \lambda_N|, |\lambda|/2\}$$
then $|\lambda - \lambda_n| \ge d$ for all $n \in \mathbb{N}$.