3103 Handout 1
Elementary analysis mostly studies real-valued (or complex-valued) functions on the real
line R or on n-dimensional space Rn . Functional analysis, by contrast, shifts the point of
view: we collect all the functions of a given class (for instance, all bounded continuous
functions) into a space of functions, and we study that space (and operations on it) as an
object in its own right. Since spaces of functions are nearly always infinite-dimensional, we
are led to study analysis on infinite-dimensional vector spaces, of which the most important
cases are Banach spaces and Hilbert spaces.
Before we get down to the detailed study of functional analysis, here are two examples
that show how functional-analysis ideas arise already in elementary analysis:
Ordinary differential equations. Recall that we can solve a first-order linear inho-
mogeneous ordinary differential equation (ODE)
\frac{dy}{dt} + p(t)\, y = q(t)     (1.1)
[where p(t) and q(t) are given functions and y(t) is the unknown function] by the method of
integrating factors. Now, what about a second-order linear inhomogeneous ODE
\frac{d^2 y}{dt^2} + p_1(t)\,\frac{dy}{dt} + p_0(t)\, y = q(t) ?     (1.2)
In general this is hard, but in the case of constant coefficients, i.e.

\frac{d^2 y}{dt^2} + c_1\,\frac{dy}{dt} + c_0\, y = q(t) ,     (1.3)
we can factor the equation into a pair of first-order ODEs, which can then be solved in
succession. You probably recall the method: let α and β be the roots of the quadratic
polynomial \lambda^2 + c_1 \lambda + c_0, so that

\lambda^2 + c_1 \lambda + c_0 = (\lambda - \alpha)(\lambda - \beta) .     (1.4)

The differential operator on the left-hand side of (1.3) then factors in the same way:

\frac{d^2}{dt^2} + c_1\,\frac{d}{dt} + c_0 = \left(\frac{d}{dt} - \alpha\right)\left(\frac{d}{dt} - \beta\right) ,     (1.6)
so that the equation (1.3) becomes
\left(\frac{d}{dt} - \alpha\right)\left(\frac{d}{dt} - \beta\right) y = q(t) .     (1.7)
If we define

z = \left(\frac{d}{dt} - \beta\right) y ,     (1.8)
we can rewrite the second-order equation (1.7) as the pair of first-order equations
\left(\frac{d}{dt} - \alpha\right) z = q(t)     (1.9a)

\left(\frac{d}{dt} - \beta\right) y = z     (1.9b)
So we can first solve (1.9a) for the unknown function z(t), and then solve (1.9b) for the
unknown function y(t).
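To see the two-step procedure in action, here is a minimal computational sketch (mine, not part of the handout) using sympy; the right-hand side q(t) = e^{3t} and the roots α = 1, β = 2 are hypothetical choices made purely for illustration.

```python
# Illustrative sketch of the factoring method (1.9a)-(1.9b), assuming the
# hypothetical example y'' - 3y' + 2y = exp(3t), whose characteristic roots
# are alpha = 1 and beta = 2.
import sympy as sp

t = sp.symbols('t')
alpha, beta = 1, 2
q = sp.exp(3 * t)
z, y = sp.Function('z'), sp.Function('y')

# Step 1: solve the first-order equation (d/dt - alpha) z = q(t)  [cf. (1.9a)].
z_sol = sp.dsolve(sp.Eq(z(t).diff(t) - alpha * z(t), q), z(t)).rhs
# Rename its integration constant so it is not confused with the one from step 2.
z_sol = z_sol.subs(sp.Symbol('C1'), sp.Symbol('A'))

# Step 2: solve (d/dt - beta) y = z(t)  [cf. (1.9b)] with the z just found.
y_sol = sp.dsolve(sp.Eq(y(t).diff(t) - beta * y(t), z_sol), y(t)).rhs

print(sp.simplify(y_sol))   # general solution, containing two free constants
```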
Now, what have we done here? In particular, what are the objects
\frac{d}{dt}\,, \quad \frac{d^2}{dt^2}\,, \quad \frac{d}{dt} - \alpha\,, \quad \text{etc.?}     (1.10)
The answer is that they are linear operators on a space of functions (which map it into
another, not necessarily the same, space of functions). We have some choices in how we
make this precise. For instance, let C k (a, b) be the vector space of real-valued functions on
the interval (a, b) ⊆ R that are k times continuously differentiable (for k = 0 this is just the
space of continuous functions); and let C ∞ (a, b) be the vector space of real-valued functions
on the interval (a, b) ⊆ R that are infinitely differentiable. Then d/dt can be considered
as a linear operator mapping C k (a, b) into C k−1 (a, b) for any k ≥ 1, or as a linear operator
mapping C ∞ (a, b) into itself. Likewise, d2 /dt2 can be considered as a linear operator mapping
C k (a, b) into C k−2 (a, b) for any k ≥ 2, or as a linear operator mapping C ∞ (a, b) into itself.
Note that all these spaces of functions are infinite-dimensional . (Why? You should
supply a proof.) So we are inexorably led to study analysis on infinite-dimensional vector
spaces. Furthermore, we see that among the important objects are linear operators that
map one infinite-dimensional vector space to another.
Remark: Differential operators have the unfortunate property that they reduce the
“smoothness” of a function. As a result, they do not map any of the spaces C k into itself;
rather, they map C k into a larger space such as C k−1 or C k−2 . If we insist on having a
space that is mapped into itself, we have to work with C ∞ , which turns out to be harder to
handle than the spaces C k (it is a “Fréchet space” rather than a “Banach space”). On the
other hand, for many purposes we want to map a space into itself. One way of solving this
dilemma is to rewrite the differential equation (together with its initial conditions) as an
integral equation, and then apply methods of functional analysis to this integral equation.
Since integral operators increase the smoothness of a function — for instance, the indefinite
integral of a C k function is C k+1 — they do map the spaces C k into themselves (in fact, into
proper subsets of themselves), and the functional-analytic treatment of integral equations is
straightforward.
Fourier analysis. Let f be a continuous function (either real-valued or complex-valued)
defined on the interval [−π, π]. Then you will recall that its Fourier coefficients \{c_n\}_{n=-\infty}^{\infty} are complex numbers defined by

c_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-int} f(t)\, dt     (1.11)
for n = . . . , −2, −1, 0, 1, 2, . . . . It is easy to see that the sequence {cn } is bounded . (Why?
You should supply a proof.) Therefore, the Fourier transform for continuous functions on
[−π, π] can be considered as a linear operator F mapping the space C[−π, π] of continuous
complex-valued functions on the interval [−π, π] into the space ℓ∞ (Z) of bounded doubly
infinite sequences of complex numbers. (I will explain the funny notation ℓ∞ next week.)
Once again, both of these spaces are infinite-dimensional . (Why? You should supply a
proof.) So we are again led to study analysis on infinite-dimensional vector spaces. And we
see once again the key role played by linear operators.
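As a quick numerical illustration (mine, not from the handout), one can approximate the integral (1.11) by a Riemann sum and observe the boundedness of the resulting coefficients; the function f(t) = t² is an arbitrary example.

```python
# Numerical sketch: approximate the Fourier coefficients (1.11) of the arbitrary
# example f(t) = t^2 on [-pi, pi] and check that |c_n| <= max |f|.
import numpy as np

f = lambda t: t**2
t = np.linspace(-np.pi, np.pi, 20001)
dt = t[1] - t[0]

def fourier_coefficient(n):
    # c_n = (1/2pi) * integral of exp(-i n t) f(t) dt, approximated by a Riemann sum
    return np.sum(np.exp(-1j * n * t) * f(t)) * dt / (2 * np.pi)

coeffs = {n: fourier_coefficient(n) for n in range(-5, 6)}
sup_f = np.max(np.abs(f(t)))
print(all(abs(c) <= sup_f + 1e-9 for c in coeffs.values()))   # True: the sequence is bounded
for n, c in coeffs.items():
    print(n, round(abs(c), 4))
```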
Similar reasoning applies to the Fourier transform on the whole real line R. For instance,
let f be a continuous function (either real-valued or complex-valued) defined on R that is
absolutely integrable, i.e. satisfies
\int_{-\infty}^{\infty} |f(t)|\, dt < \infty .     (1.12)
(Why didn’t we need to impose such a condition when we were working on the closed bounded
interval [−π, π]? How would things be different if we had chosen instead to work on the open
interval (−π, π)?) Then its Fourier transform is the function \hat{f}(\omega) defined for \omega \in R by

\hat{f}(\omega) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-i\omega t} f(t)\, dt .     (1.13)
Then it is easy to see that the function \hat{f} is well-defined (why?) and bounded (why?); and with some work it can also be shown that \hat{f} is continuous. Therefore, the Fourier transform
for functions on R can be considered as a linear operator F mapping the space C(R) ∩ L1 (R)
of complex-valued functions that are continuous and absolutely integrable on R into the
space C(R) ∩ L∞ (R) of complex-valued functions that are continuous and bounded on R.
(I will explain the funny notations L1 and L∞ next week.) Once again, these spaces are
infinite-dimensional.
Remark: As always in applying functional analysis, we have some choice concerning the
space of functions where we take our operator to act. I chose to make the Fourier transform
act on the spaces C[−π, π] and C(R) ∩ L1 (R) solely for illustrative purposes, so that the
integral could be understood as the ordinary Riemann integral (together with the standard
limit definition when one or both of the limits of integration is infinite). This is not, in
fact, the cleanest way to understand the Fourier transform. The cleanest approach uses the
spaces L1 [−π, π] and L1 (R) of functions that are Lebesgue-measurable (but not necessarily
continuous) and absolutely integrable, or else the spaces L2 [−π, π] and L2 (R) of functions
that are Lebesgue-measurable and whose squares are absolutely integrable. But this requires
an understanding of Lebesgue integration, which I am not assuming for this course. If you
do know a bit about Lebesgue integration, then a few weeks from now (after we have studied
the elementary theory of Hilbert spaces) you might want to look at Saxe, Chapter 4; Rynne and Youngson, Section 3.5; or Kreyszig, Sections 3.4 and 3.5.
Now, the space Rn possesses two structures that are relevant here, namely a vector-space structure (we can add elements of Rn and multiply them by real scalars) and a convergence structure (we know what it means for a sequence of points of Rn to converge).
The vector-space structure of Rn is the subject of linear algebra, while the convergence structure of Rn is the subject of elementary real analysis. These two structures fit together nicely in the sense that

(a) if the sequence x1, x2, . . . converges to x, and the sequence y1, y2, . . . converges to y, then the sequence x1 + y1, x2 + y2, . . . converges to x + y; and

(b) if the sequence x1, x2, . . . converges to x, and α is a real number, then the sequence αx1, αx2, . . . converges to αx.
The study of infinite-dimensional vector spaces will fit this same pattern: each space
under study will have both a vector-space structure and a convergence structure; we must
study both structures, and we must make sure that they fit together nicely in the sense of
(a) and (b) above. Now, the algebraic part of this study is not very different from what
you already learned in your linear-algebra course: indeed, many of the results of linear
algebra hold equally well for finite-dimensional or infinite-dimensional vector spaces. (The
exceptions are, of course, results explicitly referring to bases, dimension, etc.) On the other
hand, the topological behavior of infinite-dimensional spaces is quite different from that of
finite-dimensional spaces, and much of this course will be devoted to studying precisely those
differences.
Now, the most general setting for studying questions of convergence is the structure
known as a topological space. So, should we start this course by studying topological
spaces? Well, we could do so (and some courses in functional analysis do just that); but
the concept of a topological space is rather abstract, and the behavior of general topological
spaces can be rather pathological. Therefore, it makes sense to be more modest, and to
begin by studying a subclass of topological spaces that is
(a) rich enough to include most (though not all) of the function spaces that are relevant
for functional analysis,
(b) sufficiently simple so that convergence can be easily visualized and much (though not
all) of the intuition from Rn can be carried over.
The metric spaces are a subclass of topological spaces that have these two properties. Most
(though not all) of the important function spaces are metrizable (and indeed are Banach
spaces); and the theory of metric spaces is quite a bit simpler than the general theory of
topological spaces. So this is where we shall start.
You have already studied the elementary theory of metric spaces in Real Analysis (module
7102), so most of the rest of this handout should constitute review for you. You should make
sure that you know this material well and are capable of filling in all the missing proofs.
Please consult me as soon as possible if you have trouble with any of this material.
Definition 1.1 A metric space (X, d) is defined to be a set X together with a function d: X × X → R satisfying the following four conditions:

(i) d(x, y) ≥ 0 for all x, y ∈ X;

(ii) d(x, y) = 0 if and only if x = y;

(iii) d(x, y) = d(y, x) for all x, y ∈ X (symmetry);

(iv) d(x, z) ≤ d(x, y) + d(y, z) for all x, y, z ∈ X (triangle inequality).

Conditions (i)–(iii) are usually trivial to check in any concrete example, while the triangle inequality (iv) may require more work.
I stress that the metric space is the pair (X, d), i.e. the set X together with the metric d.
However, we shall often refer informally to “the metric space X” whenever it is understood
from the context what the metric d is.
Most of the examples of metric spaces that we will consider are also normed linear spaces
(or subspaces thereof) — and although we will begin the study of normed linear spaces
in earnest about 2 weeks from now, it seems sensible to give the definition without delay,
because norms are slightly easier to work with than general metrics and it would be silly to
deprive ourselves of this convenience. Roughly speaking, a norm is a real-valued function
that assigns a “length” to each vector. Here is the precise definition:
Definition 1.2 Let X be a vector space over the field R of real numbers (or the field C of complex numbers). Then a norm on X is a function that assigns to each vector x ∈ X a real number ‖x‖, satisfying the following four conditions:

(i) ‖x‖ ≥ 0 for all x ∈ X;

(ii) ‖x‖ = 0 if and only if x = 0;

(iii) ‖λx‖ = |λ| ‖x‖ for all x ∈ X and all λ ∈ R (or C) (homogeneity);

(iv) ‖x + y‖ ≤ ‖x‖ + ‖y‖ for all x, y ∈ X (triangle inequality).

The pair (X, ‖·‖) consisting of a vector space X together with a norm ‖·‖ on it is called a normed linear space.
I stress once again that the normed linear space is the pair (X, ‖·‖). The same vector
space X can be equipped with many different norms, and these give rise to different normed
linear spaces. However, we shall often refer informally to “the normed linear space X”
whenever it is understood from the context what the norm is.
The point now is that every normed linear space can be given the structure of a metric
space in an obvious way, namely we define the metric by
d(x, y) = ‖x − y‖ .     (1.14)
Indeed, properties (i), (ii) and (iv) of the metric follow trivially from the corresponding
properties of the norm, while property (iii) of the metric follows from the special case λ = −1
of property (iii) of the norm (why?).
Of course, the homogeneity property of the norm holds for all real (or complex) num-
bers λ, not just λ = −1, so normed linear spaces constitute a special (but important) subclass
of metric vector spaces.
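Here is a tiny sketch (mine, purely illustrative) of the recipe (1.14): any norm yields a metric, and the symmetry of that metric is exactly the λ = −1 case of homogeneity. The function names are my own.

```python
# Sketch: building a metric from a norm as in (1.14), using the l^1 norm on R^n
# (Example 3 below) as an illustrative choice.
def norm_l1(x):
    return sum(abs(xi) for xi in x)

def metric_from_norm(norm):
    # d(x, y) = ||x - y||
    return lambda x, y: norm([xi - yi for xi, yi in zip(x, y)])

d = metric_from_norm(norm_l1)
x, y = [1.0, -2.0, 3.0], [0.5, 0.0, -1.0]
print(d(x, y), d(y, x))   # equal, since ||-(x - y)|| = |-1| * ||x - y||
print(d(x, x))            # 0
```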
Example 1. The discrete metric. Let X be any set, and define d(x, y) = 0 if x = y and d(x, y) = 1 if x ≠ y. It is easy to see that d is a metric on X (why?). It is called the discrete metric. As we shall
see, it corresponds to a rather uninteresting “space of isolated points”, in which a sequence
{xn } can converge to x only if it is eventually equal to x (i.e. xn = x for all n ≥ some n0 ).
Note that, even if X happens to be a vector space, this metric does not arise from a
norm (why?).
Example 2. The real line with the usual norm. Let X be the set R of real numbers,
considered as a one-dimensional vector space over the field of real numbers, and define
d(x, y) = |x − y| . (1.17)
Example 3. Rn with the ℓ1 norm. Fix an integer n ≥ 1, and let X be the space Rn of ordered n-tuples of real numbers (considered as a vector space over the field of real numbers). For x = (x1, x2, . . . , xn), define

‖x‖₁ = \sum_{i=1}^{n} |x_i| .     (1.18)
It is easy to see that ‖·‖₁ is a norm on Rn (you should supply a proof). It is sometimes called
the “Manhattan norm” on Rn (do you see why?). More commonly it is called the “ℓ1 norm”,
for reasons to be discussed later when we study the ℓ1 sequence space (Example 12 below).
Let us remark once again that exactly the same formula defines a norm on Cn , considered
as a vector space over the field of complex numbers.
Example 4. Rn with the ℓ∞ norm. Consider again Rn, but now with the norm

‖x‖∞ = \max_{1 \le i \le n} |x_i| .     (1.19)

Once again you should verify that ‖·‖∞ is a norm on Rn. It is called the “max norm” or “sup norm” or “uniform norm” or “ℓ∞ norm”; the latter terminology will be clarified later
when we study the ℓ∞ sequence space (Example 11 below). Once again, exactly the same
formula defines a norm on Cn .
Example 5. Rn with the Euclidean norm. Consider again Rn, but now with the usual Euclidean norm

‖x‖₂ = \sqrt{ \sum_{i=1}^{n} |x_i|^2 } .     (1.20)
The proof that ‖·‖₂ is a norm (in particular, that it satisfies the triangle inequality) is slightly less trivial than in the preceding examples. It needs the Cauchy–Schwarz inequality

\sum_{i=1}^{n} x_i y_i \;\le\; \left( \sum_{i=1}^{n} x_i^2 \right)^{1/2} \left( \sum_{i=1}^{n} y_i^2 \right)^{1/2}     (1.21)
for real numbers x1 , . . . , xn and y1 , . . . , yn , which you will prove in Problem 1(a) of Problem
Set #1. Assuming this, we then have
\sum_{i=1}^{n} (x_i + y_i)^2 \;=\; \sum_{i=1}^{n} x_i^2 \;+\; 2 \sum_{i=1}^{n} x_i y_i \;+\; \sum_{i=1}^{n} y_i^2     (1.22a)

\;\le\; \sum_{i=1}^{n} x_i^2 \;+\; 2 \left( \sum_{i=1}^{n} x_i^2 \right)^{1/2} \left( \sum_{i=1}^{n} y_i^2 \right)^{1/2} \;+\; \sum_{i=1}^{n} y_i^2     (1.22b)

\;=\; \left[ \left( \sum_{i=1}^{n} x_i^2 \right)^{1/2} + \left( \sum_{i=1}^{n} y_i^2 \right)^{1/2} \right]^2 ,     (1.22c)
where the middle step used the Cauchy–Schwarz inequality. Taking square roots, we have
\sqrt{ \sum_{i=1}^{n} (x_i + y_i)^2 } \;\le\; \sqrt{ \sum_{i=1}^{n} x_i^2 } \;+\; \sqrt{ \sum_{i=1}^{n} y_i^2 } ,     (1.23)

which is precisely the triangle inequality ‖x + y‖₂ ≤ ‖x‖₂ + ‖y‖₂.
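The following sketch (mine, not part of the handout) computes the three norms of Examples 3–5 for a sample vector and checks the Cauchy–Schwarz inequality (1.21) and the triangle inequality (1.23) numerically; the random vectors are arbitrary.

```python
# Numerical check of (1.21) and (1.23) for randomly chosen vectors in R^n.
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.standard_normal(10), rng.standard_normal(10)

norm1 = np.sum(np.abs(x))          # ||x||_1   (Example 3)
norm_inf = np.max(np.abs(x))       # ||x||_inf (Example 4)
norm2 = np.sqrt(np.sum(x**2))      # ||x||_2   (Example 5)
print(norm1, norm_inf, norm2)

# Cauchy-Schwarz (1.21): sum x_i y_i <= (sum x_i^2)^{1/2} (sum y_i^2)^{1/2}
lhs = np.sum(x * y)
rhs = np.sqrt(np.sum(x**2)) * np.sqrt(np.sum(y**2))
print(lhs <= rhs + 1e-12)          # True

# Triangle inequality (1.23): ||x + y||_2 <= ||x||_2 + ||y||_2
print(np.sqrt(np.sum((x + y)**2)) <= np.sqrt(np.sum(x**2)) + np.sqrt(np.sum(y**2)) + 1e-12)
```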
Remark. Examples 3–5 show that the same space X can be equipped with many
different norms (and thus many different metrics). We shall see later that these three metrics
on Rn are all “equivalent” in a sense to be defined later — in particular, they give rise to the
same convergence structure — so it doesn’t matter which one we use. But not all metrics
on a given space X are equivalent! For instance, the discrete metric (Example 1) clearly
gives rise to a different convergence structure on X = R than is given by the usual metric
(Example 2). Here is another “funny” metric on R, which we will study again later:
Example 6. The real line with the tanh metric. Let X be the set R of real numbers, with the distance d(x, y) = | tanh x − tanh y |.
Example 7. Spaces of bounded functions, with the sup norm. Let A be any set,
and let X = B(A) be the set of bounded real-valued functions on A (please remind yourself
exactly what it means for a real-valued function on A to be bounded!). The set B(A) is a
vector space under the usual operations of pointwise addition and pointwise multiplication
by scalars. Then, for any f ∈ B(A) we define

‖f‖∞ = \sup_{a \in A} |f(a)| .     (1.26)

You should prove that ‖·‖∞ is a norm on B(A). It is called, not surprisingly, the “sup norm” on B(A). Why did we need to restrict attention here to bounded functions?
Note that Rn with the sup norm (Example 4) is the special case in which A = {1, 2, . . . , n}.
Note also that exactly the same formula would define a norm on the space BC (A) of
bounded complex -valued functions on A, considered as a vector space over the field of com-
plex numbers.
Example 8. Spaces of bounded continuous functions, with the sup norm. Let
A be any subset of the real line, and let X = C(A) be the vector space of bounded continuous
real-valued functions on A. Use once again the sup norm (1.26). Since ‖·‖∞ is a norm on B(A), it is also a norm on the linear subspace C(A) ⊆ B(A) [why?]. Why did we need to
restrict attention here to bounded continuous functions?
The most important case is when A is a closed bounded subset of the real line, such as a
closed interval A = [a, b]. Then all continuous real-valued functions on A are automatically
bounded — you proved this nontrivial fact in your first analysis course, and we will generalize
it a week or two from now in connection with the concept of compactness. (Exercise: Show
by example that if A is either not closed or not bounded, then a continuous real-valued
function on A need not be bounded.)
We will see soon that, instead of requiring A to be a subset of the real line, we could take
A to be any metric space. Once we have defined what it means for a real-valued function
on a metric space to be continuous, we will see that the set C(A) of bounded continuous
real-valued functions on A, equipped with the sup norm, is always a normed linear space. So this shows a way of building new (and more complicated) metric spaces (or normed
linear spaces) from old ones. For instance, starting from the real line R, we can build the
space C(R) of bounded continuous real-valued functions on R, then the space C(C(R)) of
bounded continuous real-valued functions on C(R), and so forth.
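As a rough numerical illustration (mine), the sup distance between two continuous functions on an interval can be approximated by sampling on a fine grid; this only approximates the supremum, of course, and the functions chosen are arbitrary.

```python
# Approximate sketch of d_inf(f, g) = sup_{t in [a,b]} |f(t) - g(t)| on a grid.
import numpy as np

def sup_distance(f, g, a, b, num_points=100001):
    t = np.linspace(a, b, num_points)
    return np.max(np.abs(f(t) - g(t)))     # grid approximation of the sup

print(sup_distance(np.sin, np.cos, 0.0, np.pi))   # approximately sqrt(2) = 1.4142...
```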
Example 9. C[a, b] with the L1 norm. Let X = C[a, b] be the vector space of continuous real-valued functions on [a, b] (which are automatically bounded, as we have just observed). Now define

‖f‖₁ = \int_a^b |f(t)|\, dt .     (1.27)

You should prove that ‖·‖₁ is a norm on C[a, b].

Example 10. C[a, b] with the L2 norm. Consider again C[a, b], but now with the norm

‖f‖₂ = \left( \int_a^b |f(t)|^2\, dt \right)^{1/2} .     (1.28)

The proof that ‖·‖₂ is a norm will again require the Cauchy–Schwarz inequality, this time for continuous functions on [a, b]; see Problem 1(b) of Problem Set #1.
Exactly the same formula would define a norm on the space CC (A) of bounded continuous
complex -valued functions on A; to prove this one has to use the complex Cauchy–Schwarz
inequality.
Example 11. The sequence spaces ℓ∞ and c0 . We denote by ℓ∞ the vector space of
bounded infinite sequences x = (x1, x2, . . .) of real numbers, and on this set we define

‖x‖∞ = \sup_{i \ge 1} |x_i| .     (1.29)

You should prove that ‖·‖∞ is a norm on ℓ∞. Can you see that this is actually a special case of Example 7, and also of Example 8?

We denote by c0 the set of infinite sequences x = (x1, x2, . . .) of real numbers that are convergent to zero (i.e. \lim_{n\to\infty} x_n = 0). We have c0 ⊊ ℓ∞ (why?). Since ‖·‖∞ is a norm on
ℓ∞ , it follows that it is also a norm on the linear subspace c0 ⊂ ℓ∞ .
Once again, we can here replace real numbers by complex numbers if we wish.
Example 12. The sequence space ℓ1. We denote by ℓ1 the set of infinite sequences x = (x1, x2, . . .) of real numbers that are absolutely summable, i.e. satisfy \sum_{i=1}^{\infty} |x_i| < \infty. On this set we define

‖x‖₁ = \sum_{i=1}^{\infty} |x_i| .     (1.30)
Example 13. The sequence space ℓ2. We denote by ℓ2 the set of infinite sequences x = (x1, x2, . . .) of real numbers that are absolutely square-summable, i.e. satisfy \sum_{i=1}^{\infty} |x_i|^2 < \infty. On this set we define

‖x‖₂ = \left( \sum_{i=1}^{\infty} |x_i|^2 \right)^{1/2} .     (1.31)
Once again the only nontrivial point is the triangle inequality. To verify it, we start from the finite-n triangle inequality (1.23),

\left( \sum_{i=1}^{n} (x_i + y_i)^2 \right)^{1/2} \;\le\; \left( \sum_{i=1}^{n} x_i^2 \right)^{1/2} + \left( \sum_{i=1}^{n} y_i^2 \right)^{1/2} ,     (1.32)

which holds for all n. Next we observe that the finite sums on the right-hand side are bounded above by the corresponding infinite sums, so that

\left( \sum_{i=1}^{n} (x_i + y_i)^2 \right)^{1/2} \;\le\; \left( \sum_{i=1}^{\infty} x_i^2 \right)^{1/2} + \left( \sum_{i=1}^{\infty} y_i^2 \right)^{1/2}     (1.33)

for all n. Now we can take n → ∞: the left-hand side converges to \left( \sum_{i=1}^{\infty} (x_i + y_i)^2 \right)^{1/2} [why?], which proves what is needed.
This is a typical type of argument that is routinely used to deduce inequalities for infinite
sequences from those for finite sequences, and you should make sure that you understand
well all the steps in it.
Example 14. The sequence spaces ℓp for 1 ≤ p < ∞. We will define these sequence
spaces later; they include ℓ1 and ℓ2 as the most important special cases.
Example 15. The sequence space RN . We denote by RN the set of all infinite
sequences x = (x1, x2, . . .) of real numbers, bounded or unbounded. We define

d(x, y) = \sum_{j=1}^{\infty} 2^{-j}\, \frac{|x_j - y_j|}{1 + |x_j - y_j|} .     (1.34)
First of all, do you see why d(x, y) is well-defined and finite for all x, y ∈ RN ? In Problem 2
of Problem Set #1 you will prove that d is a metric on RN .
Example 15 is an interesting example of a metrizable topological vector space whose
topology does not arise from a norm.
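To see concretely why (1.34) is finite even for unbounded sequences, here is a small sketch (mine, not in the handout): each term is at most 2^{-j}, so a truncated sum already determines d(x, y) to within 2^{-J}. The particular sequences are arbitrary.

```python
# Sketch: truncated evaluation of the metric (1.34) on R^N.
# Each term is <= 2^{-j}, so truncating at j = J leaves an error of at most 2^{-J}.
def d_RN(x, y, J=60):
    """x, y: functions j -> j-th entry (j = 1, 2, ...), possibly unbounded sequences."""
    total = 0.0
    for j in range(1, J + 1):
        diff = abs(x(j) - y(j))
        total += 2.0**(-j) * diff / (1.0 + diff)
    return total

# Two sequences, one of them unbounded: x_j = j^2 and y_j = 0.
print(d_RN(lambda j: j**2, lambda j: 0))   # finite, and strictly less than 1
```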
The special case of Rn is assumed to be familiar; and most (but not all) of the facts that hold in Rn will carry over to general metric spaces. So you can use your knowledge of Rn, as well as rough sketches on
a piece of paper (R2 ), to gain intuition about general metric spaces that is often (but not
always) correct. Later we will examine in detail some results in Rn that do not carry over
to general metric spaces (notably those involving compactness).
Most of the proofs of results in this section are fairly easy, and they will be left as
exercises for you. You should make sure you can do them! (If you need help, consult one of
the textbooks such as Kolmogorov–Fomin, Kreyszig, Giles, or Dieudonné; and if that doesn’t
help, come see me.) These elementary proofs illustrate techniques that you will need to use
later, in more complicated contexts.
We fix, once and for all, a metric space (X, d). Given a point x ∈ X and a real number r > 0, we define the open ball of center x and radius r to be the set B(x, r) = {y ∈ X : d(x, y) < r}.
A subset A ⊆ X is said to be open if, for every x ∈ A, there exists r > 0 (depending in
general on x) such that B(x, r) ⊆ A.
Proposition 1.3 In a metric space (X, d), every open ball B(x, r) is an open set.

Proof. Consider an open ball B(x, r). We need to check that for each y ∈ B(x, r), there
exists r ′ > 0 such that B(y, r ′) ⊆ B(x, r). Before reading further, you should draw a picture
of this situation and figure out what r ′ should be!
Did you guess that we should take r ′ = r − d(x, y)? To see that this works, consider any
z ∈ B(y, r ′), i.e. any z for which d(y, z) < r − d(x, y). Then by the triangle inequality we
have
d(x, z) ≤ d(x, y) + d(y, z) < r , (1.37)
which implies that z ∈ B(x, r). Since z was an arbitrary point of B(y, r ′), we have proven
that B(y, r ′) ⊆ B(x, r).
Proposition 1.4

(a) The empty set ∅ and the whole space X are open sets.

(b) The union of an arbitrary collection of open sets is open.

(c) The intersection of a finite collection of open sets is open.

(Footnote: the sphere of center x and radius r is the set S(x, r) = {y ∈ X : d(x, y) = r}. I haven't even bothered to introduce this latter concept here, as it is of little relevance to most of our work.)
Given a set A ⊆ X, a point x ∈ A is called an interior point of A if there exists r > 0 such that B(x, r) ⊆ A; the set of all interior points of A is called the interior of A and is denoted A◦.

Proposition 1.5 For any set A, A◦ is the largest open set contained in A.

You should supply the proof. Note that you have to prove two things: that A◦ is indeed open; and that if B is any open set contained in A, then B ⊆ A◦.
Proposition 1.6
(c) If A ⊆ B, then A◦ ⊆ B ◦ .
Proposition 1.7 In a metric space (X, d), every closed ball B̄(x, r) = {y ∈ X : d(x, y) ≤ r} is a closed set.

Proof. Consider a closed ball B̄(x, r). We need to show that its complement X \ B̄(x, r) is open, i.e. that for each y ∈ X \ B̄(x, r) there exists r′ > 0 such that B(y, r′) ⊆ X \ B̄(x, r). Before reading further, you should draw a picture of this situation and figure out what r′ should be!

Did you guess that we should take r′ = d(x, y) − r? Note that d(x, y) > r because y ∉ B̄(x, r), hence r′ > 0. To see that this works, consider any z ∈ B(y, r′). Then d(x, z) ≥ d(x, y) − d(y, z) (why is this a consequence of the triangle inequality?); and since d(y, z) < r′ = d(x, y) − r, we have d(x, z) > r, hence z ∉ B̄(x, r), hence z ∈ X \ B̄(x, r). Since z was an arbitrary point of B(y, r′), we have proven that B(y, r′) ⊆ X \ B̄(x, r).
We have the following “dual” of Proposition 1.4, which is obtained by taking complements
everywhere and using the standard rules of Boolean algebra (hence the roles of union and
intersection get reversed):
Proposition 1.8

(a) The empty set ∅ and the whole space X are closed sets.

(b) The intersection of an arbitrary collection of closed sets is closed.

(c) The union of a finite collection of closed sets is closed.
Proposition 1.9 The closure of A is the complement of the interior of the complement of
A.
Thus, just as open and closed sets are “dual” under complementation, so interior and
closure are “dual” under complementation.
Warning: The closure of an open ball B(x, r) is always contained in the closed ball B̄(x, r) (you should verify this!); but in a general metric space it is not necessarily equal to B̄(x, r). For instance, in a discrete metric space we have B(x, 1) = {x} for each point x (why?), and the closure of {x} is itself (why?); but B̄(x, 1) = X (why?).
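Here is a tiny sketch (mine) of the phenomenon just described, on a small finite set with the discrete metric: the open ball of radius 1 around a point is just that point, while the closed ball of radius 1 is the whole space.

```python
# Discrete metric on a small set: open ball B(x,1) = {x}, closed ball = whole set.
X = {'a', 'b', 'c', 'd'}

def d_discrete(x, y):
    return 0 if x == y else 1

open_ball = {y for y in X if d_discrete('a', y) < 1}
closed_ball = {y for y in X if d_discrete('a', y) <= 1}
print(open_ball)     # {'a'}
print(closed_ball)   # all of X
```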
Proposition 1.10 For any set A, the closure Ā is the smallest closed set containing A.
4
Kolmogorov–Fomin use the term “contact point”, and Rynne–Youngson use the term “closure point”.
Warning: Do not confuse this with the related but different notion of a limit point (or accumulation
point) of A, which is a point x ∈ X such that every neighborhood of x contains infinitely many points of A.
5
Kolmogorov–Fomin use the notation [A] in place of Ā. Rynne–Youngson use both of the notations Ā and A⁻.
Proposition 1.11
(c) If A ⊆ B, then Ā ⊆ B̄.
Given two nonempty subsets A, B ⊆ X, we define the distance between A and B by d(A, B) = \inf \{ d(x, y) : x ∈ A, y ∈ B \}. When A is reduced to a single point x, we write d(x, B) as a synonym for d({x}, B). Note that the infimum in d(A, B) need not be attained, i.e. there need not exist any pair x ∈ A and y ∈ B such that d(x, y) = d(A, B); all we know, a priori, is that there exist pairs x, y that make d(x, y) arbitrarily close to d(A, B). See Problem 3 on Problem Set #1.
Lemma 1.12 For any nonempty set A ⊆ X and any x, y ∈ X, we have |d(x, A) − d(y, A)| ≤ d(x, y).

We will see later, after defining continuous functions, that this says that the function x ↦ d(x, A) is a continuous (in fact, Lipschitz-continuous) real-valued function on X.

Proof of Lemma 1.12. For every z ∈ A we have d(x, z) ≤ d(x, y) + d(y, z); taking the infimum over z ∈ A on both sides gives d(x, A) ≤ d(x, y) + d(y, A), so d(x, A) − d(y, A) ≤ d(x, y). Doing the same thing with the roles of x and y reversed, one concludes that d(x, A) − d(y, A) ≥ −d(x, y). Putting these two inequalities together proves the Lemma.
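A quick numerical sanity check (mine, not part of the handout) of Lemma 1.12, for a finite set A ⊂ R² where the infimum is simply a minimum:

```python
# Check |d(x, A) - d(y, A)| <= d(x, y) for a finite set A in the plane.
import math

A = [(0.0, 0.0), (3.0, 4.0), (-1.0, 2.0)]      # an arbitrary finite set

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def dist_to_set(p, A):
    return min(dist(p, a) for a in A)           # infimum = minimum for a finite set

x, y = (1.0, 1.0), (2.5, -0.5)
print(abs(dist_to_set(x, A) - dist_to_set(y, A)) <= dist(x, y))   # True
```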
Proposition 1.13 For any nonempty set A ⊆ X and any r > 0, the set Vr (A) = {x ∈
X: d(x, A) < r} is an open neighborhood of A.
You should prove Proposition 1.13. Obviously Vr (A) contains A (why?), so you need
only prove that Vr (A) is open; that is, you need to prove that for any x ∈ Vr (A), there exists
r ′ > 0 such that B(x, r ′ ) ⊆ Vr (A). You should draw a picture to figure out what r ′ should
be taken to be, and then use Lemma 1.12 to complete the proof.
Proposition 1.14 For any nonempty set A ⊆ X and any x ∈ X, we have d(x, A) = 0 if and only if x ∈ Ā.

Proof. If d(x, A) = 0, then for every ǫ > 0 there exists y ∈ A with d(x, y) < ǫ; or in other words, for every ǫ > 0 we have B(x, ǫ) ∩ A ≠ ∅. This proves that x is a cluster point of A (why?), i.e. x ∈ Ā.

Conversely, if x ∈ Ā, then for every ǫ > 0 we have B(x, ǫ) ∩ A ≠ ∅, i.e. there exists y ∈ A with d(x, y) < ǫ. But this shows that d(x, A) = 0.

In particular, combining Proposition 1.14 with Proposition 1.13, we see that for any nonempty set A ⊆ X,

Ā = \bigcap_{n=1}^{\infty} V_{1/n}(A) .     (1.41)
(a) Every closed set is the intersection of a decreasing sequence of open sets.
(b) Every open set is the union of an increasing sequence of closed sets.
Indeed, (a) follows from (1.41), while (b) follows from (a) by taking complements.
The key word here is “sequence”: that is, every closed set is the intersection of a countably
infinite collection of open sets. Warning: This does not hold in arbitrary topological spaces.
Lemma 1.16 A set B ⊆ Y is open in the subspace Y if and only if there exists an open set
A in X such that B = A ∩ Y .
Proof. If y ∈ Y and r > 0, then B(y, r) ∩ Y is the open ball of center y and radius r in
the subspace Y (why?).
Now, if A is open in X and y ∈ A ∩ Y , then there exists r > 0 such that B(y, r) ⊆ A,
hence y ∈ B(y, r) ∩ Y ⊆ A ∩ Y , which shows that A ∩ Y is open in Y .
Conversely, if B ⊆ Y is open in the subspace Y , then for each y ∈ B there exists a
number r(y) > 0 such that B(y, r(y)) ∩ Y ⊆ B. This shows that

B \;=\; \bigcup_{y \in B} \bigl( B(y, r(y)) \cap Y \bigr) \;=\; \Bigl( \bigcup_{y \in B} B(y, r(y)) \Bigr) \cap Y     (1.42)

(you should explain both of these equalities!), or in other words B = A ∩ Y where A = \bigcup_{y \in B} B(y, r(y)) is open in X (why?).
Proposition 1.17 If A is dense with respect to B, and B is dense with respect to C, then
A is dense with respect to C.
A set A that is dense with respect to the whole space X is called everywhere dense,
or simply dense in X. Such sets are characterized by the fact that Ā = X, or equivalently that every nonempty open set contains a point of A.
I assume that you are familiar, from your previous courses in analysis (or set theory),
with the classification of sets as finite, countably infinite, or uncountably infinite.6 We
say that a set is countable if it is either finite or countably infinite.7 We then make the
following important definition: A metric space X is said to be separable if there exists in X
a countable dense set. As we shall see, separable metric spaces are easier to work with than
general metric spaces, because they are in a certain sense “not too large”, i.e. they can be
“well approximated” by a countable subset.
For example, the real line R with the usual metric (Example 2) is separable, because the
set Q of rational numbers is dense in R (why?), and Q is countably infinite (why?). More
generally, Rn with any of the usual metrics (Examples 3–5) is separable (why?). At the other
extreme, a discrete metric space (Example 1) is separable if and only if the underlying set
X is countable (why?).
We now proceed to study the separability of the sequence spaces ℓ∞ , c0 , ℓ1 and ℓ2 . Let
us first show that the space ℓ∞ of bounded sequences (Example 11) is not separable:
6
If not, then you should carefully study Kolmogorov–Fomin, Chapter 1, Sections 1 and 2 (or the equivalent
discussion in another book) without delay. In this course I will make use of basic facts about finite and infinite
sets without further comment. See also Handout #0.
7
The terms denumerable and nondenumerable are also used as synonyms for “countably infinite”
and “uncountably infinite”, respectively; and the term at most denumerable is also used as a synonym
for “finite or countably infinite”.
Proposition 1.18 ℓ∞ is not separable.

Proof. Consider the subset S ⊂ ℓ∞ consisting of all sequences whose entries are all 0 or 1. This set is uncountable (why?), and any two distinct elements x, y ∈ S satisfy d∞(x, y) = 1, so the open balls B(x, 1/2) with x ∈ S are pairwise disjoint. Now let D be any dense subset of ℓ∞: each ball B(x, 1/2) must contain at least one point of D, and these points are distinct for distinct x ∈ S. Hence D is uncountable, so ℓ∞ contains no countable dense set.
By a similar technique you will prove, in Problem 5 of Problem Set #1, that the space
C(R) of bounded continuous real-valued functions on the whole real line R (Example 8) is
not separable. And in Problem 6 you will abstract this construction to prove a more general
result about separability of metric spaces.
On the other hand, the subspace c0 ⊊ ℓ∞, which consists of sequences that are convergent to zero, is separable:

Proposition 1.19 c0 is separable.
Proof. Let S be the subset of c0 consisting of sequences with rational entries, of which at
most finitely many are nonzero. This is a countably infinite set (why?).8 We will show that
S is dense in c0 . To do this, we must show that for any x ∈ c0 and any ǫ > 0, there exists
y ∈ S such that d∞ (x, y) ≤ ǫ. We construct the needed y in two steps:
Firstly, x = (x1, x2, . . .) ∈ c0 means that \lim_{n\to\infty} x_n = 0, i.e. for any ǫ > 0 there exists an integer N such that |x_n| ≤ ǫ whenever n > N. Secondly, we choose rational numbers y1, y2, . . . , yN such that |x_i − y_i| ≤ ǫ for 1 ≤ i ≤ N. So if we define y = (y1, y2, . . . , yN, 0, 0, . . .), we have y ∈ S and d∞(x, y) ≤ ǫ (why?). This completes the proof.
This proof is typical of many proofs in analysis: First we “cut off” an infinite sequence
by approximating it (to within a small error ǫ) by a finite sequence; and then we further
approximate that finite sequence (again to within a small error ǫ) by a finite sequence of
some desired special type. The total error committed in this approximation is at worst the
8
Since this is a very important argument that we will use over and over again throughout this course, let
me make the reasoning explicit in case you were unable to answer the “why?” for yourself:
Let SN be the subset of c0 consisting of sequences with rational entries of which at most the first N
entries are nonzero, i.e. sequences of the form (x1 , x2 , . . . , xN , 0, 0, 0, . . .) with x1 , . . . , xN ∈ Q. Then SN
is in obvious bijection with Q^N (why?), and Q^N is a countably infinite set because it is a finite Cartesian product of countably infinite sets [cf. Theorem 0.3(c) from Handout #0]. But then S = \bigcup_{N=1}^{\infty} S_N is a countably infinite union of countably infinite sets, hence also countably infinite [cf. Theorem 0.3(b) from Handout #0].
This two-step process — countable union of finite Cartesian products — is crucial, since a countably
infinite Cartesian product of countably infinite sets is in general not countably infinite.
sum of the errors committed in the two steps, by the triangle inequality.9 You will use this
technique, in Problem 7 of Problem Set #1, to prove the separability of the sequence spaces
ℓ1 and ℓ2 .
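Here is a small sketch (mine, not from the handout) of this two-step approximation: truncate the sequence after N terms, then replace each remaining entry by a nearby rational. The sequence x_n = 1/n and the tolerance are arbitrary illustrative choices.

```python
# Sketch of the two-step approximation in the proof above: given x in c_0 and
# eps > 0, build a finitely supported rational sequence y with sup_n |x_n - y_n| <= eps.
from fractions import Fraction

def approximate(x, eps, scan=10_000):
    # Step 1: find N with |x_n| <= eps for all n > N.  (We simply scan a long
    # prefix, which suffices for the monotone example below.)
    N = 0
    for n in range(1, scan + 1):
        if abs(x(n)) > eps:
            N = n
    # Step 2: replace the first N entries by nearby rationals (error far below eps);
    # all later entries of y are 0.
    return [Fraction(x(n)).limit_denominator(10**6) for n in range(1, N + 1)]

y = approximate(lambda n: 1.0 / n, eps=0.01)
print(len(y), y[:3])    # 99 nonzero rational entries, starting 1, 1/2, 1/3
```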
Later in this course we will see that the spaces C[a, b] of (bounded) continuous real-valued
functions on a closed bounded interval of the real line are also separable, but the proof is
more difficult.
Continuous mappings
One of the most fundamental concepts in analysis on R (or Rn ) is that of a continuous
real-valued function. Intuitively, the idea is that a function f is continuous at a point x0
in case f (x) can be made arbitrarily close to f (x0 ) by taking x sufficiently close to x0 . This
idea can be made more precise in either of two equivalent ways: the familiar ǫ–δ way, or by
using neighborhoods:
ǫ–δ definition. A function f is continuous at a point x0 if, for each ǫ > 0, there
exists δ > 0 such that |x − x0 | < δ implies |f (x) − f (x0 )| < ǫ.
Neighborhood definition. A function f is continuous at a point x0 if, for each neighborhood V of f(x0), there exists a neighborhood U of x0 such that x ∈ U implies f(x) ∈ V.

You should make sure that you understand why these two definitions are equivalent. That is, you should sketch for yourself the proof that each property implies the other.
Here we will generalize the concept of continuity so that both the domain and the range
are allowed to be arbitrary metric spaces. The definition will be an almost exact copy of the
definition from elementary analysis; all we do is replace the absolute value by the metric.
Here is the precise definition: Let (X, dX ) and (Y, dY ) be two metric spaces.
ǫ–δ definition. A mapping f : X → Y is said to be continuous at the point
x0 ∈ X if, for each ǫ > 0, there exists δ > 0 such that dX (x, x0 ) < δ implies
dY (f (x), f (x0 )) < ǫ.
Neighborhood definition. A mapping f : X → Y is said to be continuous at the point x0 ∈ X if, for each neighborhood V of f(x0) in Y, there exists a neighborhood U of x0 in X such that f[U] ⊆ V.

You should make sure, once again, that you understand why the two definitions are equivalent. (The reasoning will be the same as it was in R.) Here is another rephrasing:
Neighborhood definition (rephrased). A mapping f : X → Y is continuous
at the point x0 ∈ X if, for each neighborhood V of f (x0 ) in Y , f −1 [V ] is a
neighborhood of x0 in X.
9
In the specific case above, we didn’t need to use the triangle inequality, because we were dealing with the
sup metric. But it would have been no harm if we had used it: the final bound would have been d∞ (x, y) ≤ 2ǫ
instead of ǫ, but that would have been no problem, because we could have used ǫ/2 instead of ǫ in the two
approximation steps.
You should make sure that you understand why this is equivalent to the previous phrasing
of the neighborhood definition.
Remark: In general topological spaces, the ǫ–δ definition makes no sense because there
is in general no metric available, but the neighborhood definition still makes sense and is
taken as the definition of a continuous mapping.
A mapping f : X → Y is said to be continuous on X (or simply continuous without
further qualification) if it is continuous at every point of X. Continuity is characterized by
any one of the following equivalent conditions:
Proposition 1.20 Let X and Y be metric spaces, and let f be a mapping of X into Y. Then the following properties are equivalent:

(a) f is continuous;

(b) for every open set U ⊆ Y, the inverse image f⁻¹[U] is open in X;

(c) for every closed set C ⊆ Y, the inverse image f⁻¹[C] is closed in X;

(d) for every set A ⊆ X, we have f[Ā] ⊆ \overline{f[A]}.
Proof. (a) =⇒ (d): Let x0 be any point in Ā, and let V be any neighborhood of f(x0) in Y. Then, by hypothesis, f⁻¹[V] is a neighborhood of x0 in X; so there exists a point x ∈ A ∩ f⁻¹[V] (why?), which means that f(x) ∈ f[A] ∩ V. So every neighborhood of f(x0) contains a point of f[A], which means that f(x0) ∈ \overline{f[A]}. [Remark: This argument uses only the continuity of f at x0.]
(d) =⇒ (c): Let C be a closed set in Y, and define A = f⁻¹[C]. Then, by (d), f[Ā] ⊆ \overline{f[A]}. Now, f[A] = f[f⁻¹[C]] = C ∩ f[X] ⊆ C (why?), so \overline{f[A]} ⊆ C̄ = C (why?). So we have shown that f[Ā] ⊆ C, which implies that Ā ⊆ f⁻¹[C] = A. Since the reverse inclusion A ⊆ Ā is obvious, we have Ā = A and hence A is closed.
(c) =⇒ (b): In fact, each of (b) and (c) immediately implies the other, because f −1 [Y \
S] = X \ f −1 [S] for any subset S ⊆ Y . (You should think carefully about why this is true,
and why the corresponding equality for direct images is not in general true.)
(b) =⇒ (a): Consider any x0 ∈ X, and let V be a neighborhood of f (x0 ). Then there
exists an open neighborhood W of f (x0 ) that is contained in V (why?); and f −1 [W ] is an
open set containing x0 that is contained in f −1 [V ], so f −1 [V ] is a neighborhood of x0 in
X. This proves that f is continuous at x0 ; and since x0 is arbitrary, this proves that f is
continuous everywhere.
It should be stressed that parts (b) and (c) of Proposition 1.20 refer to the inverse images
of sets under the function f . By contrast, the direct image of an open (resp. closed) set by
a continuous mapping need not be open (resp. closed). For instance, the map f (x) = x2 is
continuous from R to R, but the image of the open set (−1, 1) is the non-open set [0, 1).
Likewise, the map g(x) = tanh x is continuous from R to R, but the image of the closed set
R is the non-closed set (−1, 1). [See, however, next week concerning the direct images of the
special class of closed sets called “compact sets”.]
Convergence of sequences
Another important concept in analysis is that of convergence of sequences. I assume that
you recall the definition of convergence of sequences in R or Rn , and I pass immediately to
give the definition for arbitrary metric spaces.
Let (X, d) be a metric space, let (x_n)_{n=1}^{\infty} be a sequence of points of X, and let a be a point of X. Then:

ǫ–n0 definition. The sequence (x_n)_{n=1}^{\infty} converges to a (or has a as a limit) in case, for every ǫ > 0, there exists an integer n0 such that d(x_n, a) < ǫ whenever n ≥ n0.

Neighborhood definition. The sequence (x_n)_{n=1}^{\infty} converges to a in case, for every neighborhood V of a, there exists an integer n0 such that x_n ∈ V whenever n ≥ n0.
Once again you should make sure you understand why the two definitions are equivalent.
We can also use the standard definition of convergence of sequences in R (which is a
special case of the metric-space definition) to rephrase the ǫ–n0 definition as follows:
ǫ–n0 definition (rephrased). The sequence (x_n)_{n=1}^{\infty} converges to a if and only if \lim_{n\to\infty} d(x_n, a) = 0.
Proposition 1.21 A sequence in a metric space has at most one limit.

Proof. Suppose that the sequence (x_n)_{n=1}^{\infty} converges to both a and b. Then, for each ǫ > 0, there exists an integer n0 such that d(x_n, a) < ǫ whenever n ≥ n0, and also an integer n1 such that d(x_n, b) < ǫ whenever n ≥ n1. Now take any n ≥ max(n0, n1): we have d(x_n, a) < ǫ and d(x_n, b) < ǫ, hence by the triangle inequality d(a, b) < 2ǫ. But since ǫ was arbitrary, we can conclude that d(a, b) = 0 (why?), which means that a = b (why?).
Since a sequence can have at most one limit, it makes sense to write \lim_{n\to\infty} x_n = a as a shorthand for the statement that the sequence (x_n)_{n=1}^{\infty} converges to a. We also use the notation x_n \xrightarrow{\,n\to\infty\,} a, or simply x_n → a.
The concept of the closure of a set can be rephrased in terms of sequences:

Proposition 1.22 Let A be a subset of X, and let x ∈ X. Then x ∈ Ā if and only if there exists a sequence of points of A that converges to x.

Proof. Suppose first that (x_n)_{n=1}^{\infty} is a sequence of points of A that converges to a point x ∈ X. This means that for every neighborhood V of x, there exists an integer n0 such that x_n ∈ V whenever n ≥ n0. But this means, in particular, that V ∩ A is nonempty. Hence x ∈ Ā.

Conversely, suppose that x ∈ Ā. Then, for every positive integer n, the set A ∩ B(x, 1/n) is nonempty, so choose (arbitrarily) a point x_n ∈ A ∩ B(x, 1/n). Then it is easy to see (e.g. using the ǫ–n0 definition) that the sequence (x_n)_{n=1}^{\infty} converges to x.
Warning: This result does not hold in arbitrary topological spaces. Or rather, only half of it does: every limit of a sequence of points in A indeed belongs to Ā, but points in Ā need not arise as limits of sequences of points in A. Rather, the concept of “sequence” has to
be replaced by the more general concept of net, and then an analogue of Proposition 1.22
holds.
Proposition 1.23 Let X and Y be metric spaces, let f be a map from X to Y, and let x⋆ be a point in X. Then f is continuous at x⋆ if and only if, for each sequence (x_n)_{n=1}^{\infty} of points of X that converges to x⋆, we have \lim_{n\to\infty} f(x_n) = f(x⋆).
Proof. Suppose first that f is continuous at x⋆ and that the sequence (x_n)_{n=1}^{\infty} converges to
x⋆ . Then, for every ǫ > 0 we can choose δ > 0 such that d(x, x⋆ ) < δ implies d(f (x), f (x⋆ )) <
ǫ (why?). And given δ we can choose n0 such that d(xn , x⋆ ) < δ whenever n ≥ n0 . It follows
that d(f (xn ), f (x⋆ )) < ǫ whenever n ≥ n0 . But this shows that lim f (xn ) = f (x⋆ ).
n→∞
Conversely, suppose that f is not continuous at x⋆. Then there exists ǫ > 0 such that for all δ > 0 there exists a point x with d(x, x⋆) < δ such that d(f(x), f(x⋆)) ≥ ǫ. [You should go carefully through the quantifiers here to understand why this is true!] Taking δ = 1/n for each positive integer n, we can choose points x_n satisfying d(x_n, x⋆) < 1/n and d(f(x_n), f(x⋆)) ≥ ǫ. We then have x_n → x⋆ but f(x_n) ↛ f(x⋆) (why?).
Warning: Once again, this result does not hold in arbitrary topological spaces; rather,
sequences must once again be replaced by nets.
as n → ∞, but the sequence (xn ) does not converge to any finite limit.
Clearly, what we need to look at is not the distance from xn to the next element xn+1 ,
but rather the distances from xn to all subsequent elements in the sequence. Here is the
precise definition:
Let (X, d) be a metric space, and let (x_n)_{n=1}^{\infty} be a sequence of points of X. Then the sequence (x_n) is called a Cauchy sequence if, for every ǫ > 0, there exists an integer n0 such that d(x_m, x_n) < ǫ whenever m, n ≥ n0.
Proposition 1.24 In any metric space, every convergent sequence is a Cauchy sequence.

Proof. Suppose that the sequence (x_n) converges to a point a. Then, for each ǫ > 0 there
exists an integer n0 such that d(xn , a) < ǫ/2 for all n ≥ n0 (why?). But then, by the triangle
inequality, d(xm , xn ) < ǫ for all m, n ≥ n0 .
Unfortunately, the converse is not in general true: in a general metric space, a Cauchy
sequence need not converge. For instance, consider the sequence (x_n)_{n=1}^{\infty} of real numbers defined by

x_n = \sum_{k=1}^{n} \frac{1}{k^2} .     (1.45)
Then (x_n) is an increasing sequence that converges to the limit

a = \sum_{k=1}^{\infty} \frac{1}{k^2} = \frac{\pi^2}{6} .     (1.46)
And by Proposition 1.24, (xn ) is a Cauchy sequence as well. So far everything looks fine;
but now consider (xn ) as a sequence in the metric space Q of rational numbers (equipped
with the usual metric d(x, y) = |x − y| that it inherits as a subspace of the metric space R).
It is still a Cauchy sequence (why?), but it no longer converges (why?).10
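As a concrete illustration (mine, not in the handout), the partial sums (1.45) can be computed in exact rational arithmetic: every x_n lies in Q and the partial sums get arbitrarily close to one another, yet their limit π²/6 lies outside Q.

```python
# Illustrative sketch: the partial sums (1.45) computed exactly in Q.  They are
# all rational and form a Cauchy sequence, but their limit pi^2/6 is irrational.
from fractions import Fraction
import math

def x(n):
    return sum(Fraction(1, k * k) for k in range(1, n + 1))

x100, x200 = x(100), x(200)
print(float(x100), float(x200), math.pi**2 / 6)   # both partial sums are close to pi^2/6
print(float(x200 - x100))                         # the gap between them is already small
```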
The culprit in this example is not the specific sequence (xn ), but rather the metric space
of rational numbers; in the real numbers this kind of disaster could not happen, as we shall
soon see. Let us therefore make a definition that characterizes the “good” metric spaces: A metric space (X, d) is called complete if every Cauchy sequence converges.
We have just shown that the metric space Q of rational numbers is not complete. On
the other hand, the Cauchy convergence criterion from elementary real analysis states that
a sequence of real numbers is convergent if and only if it is Cauchy, or in other words:
10
Maybe you don’t like this example because the proof of (1.46) is not completely trivial (the easiest proof
uses Fourier series) and the proof of the irrationality of π 2 /6 is decidedly nontrivial. If so, here is a more
elementary example: Define the sequence (x_n)_{n=1}^{\infty} by the initial condition x1 = 1 and the recursion

x_{n+1} = \frac{1}{2}\left( x_n + \frac{2}{x_n} \right) .
Can you see what limit a this sequence is going to converge to? [Hint: Assume that xn → a, pass to the limit
n → ∞ in the recursion equation, and then solve the resulting equation for a. This doesn’t prove that (xn )
converges, but it does show what the limit must be if the sequence converges at all. With a bit more work
you can then prove that the sequence does indeed converge.] You may recognize this recursion as Newton’s
method xn+1 = xn − f (xn )/f ′ (xn ) for the function f (x) = x2 − 2.
Proposition 1.25 The real line R (with its usual metric as in Example 2) is complete.
I won’t prove this here, since it would take me too far afield and I assume that you have
seen this proof in your previous course of real analysis. Let me just recall that the proof of
the Cauchy convergence criterion obviously must depend on some “completeness” property
of the real numbers that distinguishes them from the “incomplete” rationals. Any one of the
following three properties could be used as the starting point from which one could prove
the other two properties as well as the Cauchy convergence criterion:11
The least upper bound property. Every nonempty set of real numbers which
has an upper bound has a least upper bound.
The Bolzano–Weierstrass property. Every bounded sequence of real numbers has a convergent subsequence.

For instance, the completeness of R can be derived from the Bolzano–Weierstrass property by using the following two facts, which are valid in arbitrary metric spaces (their proofs are easy and I leave them to you):
Proposition 1.26 Every Cauchy sequence in a metric space is bounded (i.e. it is contained in some ball of finite radius).

Proposition 1.27 Let (X, d) be a metric space and let (x_n)_{n=1}^{\infty} be a Cauchy sequence in X. If some subsequence of (x_n) converges to a point a, then the whole sequence (x_n) converges to a.
Proposition 1.28 The metric space Rn with any of its usual metrics (Examples 3–5) is
complete.
Let me do it for the sup metric d∞ ; I leave the proof to you for the metrics d1 and d2 .
Proof. Let x^{(1)}, x^{(2)}, x^{(3)}, . . . be a Cauchy sequence in Rn, where x^{(i)} = (x_1^{(i)}, . . . , x_n^{(i)}). This means that, for each ǫ > 0 there exists n0 such that d∞(x^{(i)}, x^{(j)}) < ǫ whenever i, j ≥ n0. But by the definition of d∞ this means that

|x_k^{(i)} − x_k^{(j)}| < ǫ     (1.47)
11
In case you wonder where these properties come from, the best approach is to develop the real numbers
“constructively” from the rationals. One way is to use the method of Dedekind cuts: see e.g. Landau,
Foundations of Analysis. Another way is to define the real numbers as equivalence classes of Cauchy
sequences of rationals — this is a special case of the general construction of “completion of a metric space”
that we will discuss very soon.
for each index k (1 ≤ k ≤ n) whenever i, j ≥ n0. Therefore the sequence (x_k^{(i)})_{i=1}^{\infty} is a Cauchy sequence in R for each index k, so it converges to some limit x_k. Moreover, taking the limit j → ∞ in (1.47), we conclude that

|x_k^{(i)} − x_k| ≤ ǫ     (1.48)

for each index k (1 ≤ k ≤ n) whenever i ≥ n0 (why? why did the < turn into ≤?). But this means that if we define x = (x_1, . . . , x_n), we have proven that d∞(x^{(i)}, x) ≤ ǫ whenever i ≥ n0. And since ǫ was arbitrary, this shows that x^{(i)} → x in the metric space (Rn, d∞).
Next week we shall use a similar technique to prove the completeness of the sequence
spaces ℓ∞ , c0 , ℓ1 and ℓ2 as well as of the space B(A) of bounded functions on an arbitrary
set A. The space C(A) of bounded continuous functions, equipped with the sup norm, is
also complete, and we shall prove this a few weeks from now (it is not difficult). On the
other hand, the spaces C[a, b] of bounded continuous functions with the L1 or L2 metric
(Examples 9 and 10) are not complete, as we shall also show next week.