
Measure and Integration (under construction)

Gustav Holzegel∗
April 24, 2017

Abstract
These notes accompany the lecture course “Measure and Integration”
at Imperial College London (Autumn 2016). They follow very closely the
text “Real Analysis” by Stein-Shakarchi; in fact, most proofs are simple
rephrasings of the proofs presented in that book.

Contents
1 Motivation
1.1 Quick review of the Riemann integral
1.2 Drawbacks of the class R, motivation of the Lebesgue theory
1.2.1 Limits of functions
1.2.2 Length of curves
1.2.3 The Fundamental Theorem of Calculus
1.3 Measures of sets in R
1.4 Literature and Further Reading

2 Measure Theory: Lebesgue Measure in R^d
2.1 Preliminaries and Notation
2.2 Volume of Rectangles and Cubes
2.3 The exterior measure
2.3.1 Examples
2.3.2 Properties of the exterior measure
2.4 The class of (Lebesgue) measurable sets
2.4.1 The property of countable additivity
2.4.2 Regularity properties of the Lebesgue measure
2.4.3 Invariance properties of the Lebesgue measure
2.4.4 σ-algebras and Borel sets
2.5 Construction of a non-measurable set
2.6 Measurable Functions
2.6.1 Some abstract preliminary remarks
2.6.2 Definitions and equivalent formulations of measurability
2.6.3 Properties 1: Behaviour under compositions
2.6.4 Properties 2: Behaviour under limits
2.6.5 Properties 3: Behaviour of sums and products
∗ Imperial College London, Department of Mathematics, South Kensington Campus, London SW7 2AZ, United Kingdom.
2.6.6 The notion of “almost everywhere”
2.7 Building blocks of integration theory
2.7.1 Simple functions
2.7.2 Step functions
2.8 Approximation Theorems
2.8.1 Approximating a measurable function by simple functions
2.8.2 Approximating a measurable function by step functions
2.9 Littlewood’s Three Principles

3 Integration Theory: The Lebesgue Integral
3.1 Simple Functions
3.2 Bounded functions on sets of finite measure
3.2.1 Riemann integrable functions are Lebesgue integrable
3.3 Non-negative functions
3.4 The General case and the notion of integrable
3.5 Aside: Complex-valued functions
3.6 The space of integrable functions as a normed vector space
3.7 Dense families in L^1(R^d)
3.8 Fubini’s Theorem
3.8.1 Slices of measurable sets and functions
3.8.2 Statement and Discussion of Fubini’s and Tonelli’s Theorem
3.8.3 The proof of Tonelli’s Theorem (using Fubini)
3.8.4 The proof of Fubini’s Theorem

4 Differentiation and Integration
4.1 Differentiation of the Integral
4.1.1 Proof of the Lebesgue Differentiation Theorem
4.1.2 Proof of Proposition 4.1
4.1.3 Final Remarks
4.2 Differentiation of functions
4.2.1 Functions of bounded variation
4.2.2 Examples of functions of bounded variation
4.2.3 Characterisation of functions of bounded variation
4.2.4 Bounded variation implies differentiable a.e.
4.2.5 Absolute Continuity and the Fundamental Theorem of the Lebesgue integral
4.2.6 The proof of Theorem 4.8
4.2.7 The proof of Theorem 4.6 (non examinable)

5 Abstract Measure Theory
5.1 Measure Spaces: Definition and basic examples
5.2 Exterior measure and Carathéodory’s theorem
5.3 Premeasures and the extension theorem
5.3.1 Construction of a measure from a premeasure
5.3.2 The proof of Proposition 5.1
5.4 A further example: Hausdorff measure
5.5 Integration on a general measure space
5.6 Construction of product measures
5.7 General Fubini theorem

6 The change of variables formula
6.1 An example illustrating Theorem 6.1
6.2 A reformulation of Theorem 6.1
6.3 Proof of Theorem 6.2

7 Mastery Material: L^p-spaces
7.1 Definition
7.2 The Hölder inequality
7.3 Minkowski’s inequality

1 Motivation
1.1 Quick review of the Riemann integral
In your second year analysis course you defined the Riemann integral. Let us
remind ourselves of the simplest situation and consider a bounded real-valued
function f defined on the interval [a, b].
• A partition P of [a, b] is a finite set of points x0 , x1 , ..., xn with

a = x0 ≤ x1 ≤ x2 ≤ ... ≤ xn−1 ≤ xn = b .

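• Given a partition P, we define

\[ M_i = \sup_{x_{i-1} \le x \le x_i} f(x) \quad \text{and} \quad m_i = \inf_{x_{i-1} \le x \le x_i} f(x) \]

and

\[ U(P, f) = \sum_{i=1}^{n} M_i \, (x_i - x_{i-1}) \quad \text{(the upper sum)}, \]
\[ L(P, f) = \sum_{i=1}^{n} m_i \, (x_i - x_{i-1}) \quad \text{(the lower sum)}. \]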

• Finally, we define the upper and lower Riemann integrals of f on [a, b] as

\[ \overline{\int_a^b} f \, dx := \inf_P U(P, f) \quad \text{and} \quad \underline{\int_a^b} f \, dx := \sup_P L(P, f). \tag{1} \]

Note that since f is bounded these objects are well-defined.
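The definitions above can be tried out numerically. The following is a minimal sketch, not part of the course material: it assumes f is increasing (so that the sup and inf on each subinterval sit at the endpoints), uses a uniform partition, and the function names are made up for illustration.

```python
from fractions import Fraction


def riemann_sums_increasing(f, a, b, n):
    """Upper and lower sums U(P, f), L(P, f) on the uniform partition of [a, b]
    into n subintervals, for a function f that is increasing on [a, b].

    Assumption: for increasing f the sup on [x_{i-1}, x_i] is f(x_i) and the
    inf is f(x_{i-1}), so M_i and m_i need no numerical search.
    """
    xs = [a + (b - a) * Fraction(i, n) for i in range(n + 1)]
    upper = sum(f(xs[i]) * (xs[i] - xs[i - 1]) for i in range(1, n + 1))
    lower = sum(f(xs[i - 1]) * (xs[i] - xs[i - 1]) for i in range(1, n + 1))
    return upper, lower


def square(x):
    return x * x


upper, lower = riemann_sums_increasing(square, Fraction(0), Fraction(1), 100)
# For increasing f on a uniform partition the sums telescope:
# U - L = (f(b) - f(a)) * (b - a) / n.
gap = upper - lower
```

Since the gap shrinks like 1/n, the upper and lower integrals of this particular f agree.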


Definition 1.1. We say that f is Riemann-integrable over [a, b] if the upper
and the lower Riemann integrals agree. In this case we write f ∈ R and denote
the common value by $\int_a^b f \, dx$.
The theory then proceeds along the following lines:
Definition 1.2. We say that the partition P ⋆ is a refinement of P if P ⋆ ⊃ P .
Given two partitions P1 and P2 we define their common refinement as P ⋆ =
P1 ∪ P2 .
One easily shows

Theorem 1.1. If P ⋆ is a refinement of P then
L (P, f ) ≤ L (P ⋆ , f ) and U (P, f ) ≥ U (P ⋆ , f ) .
Theorem 1.2. We have
\[ \underline{\int_a^b} f \, dx \le \overline{\int_a^b} f \, dx . \]

Proof. Let P⋆ be the common refinement of two arbitrary partitions P1 and P2.
We have
\[ L(P_1, f) \le L(P^\star, f) \le U(P^\star, f) \le U(P_2, f) \]
by the previous theorem. Fixing P2 and taking the sup over all partitions P1
we obtain
\[ \underline{\int_a^b} f \, dx \le U(P_2, f) \]
for any partition P2. Taking now the infimum over all partitions P2, we are
done.
Theorem 1.3. The function f is Riemann integrable if and only if for every
ǫ > 0 there exists a partition P such that
U (P, f ) − L (P, f ) < ǫ .
Proof. The function f being Riemann integrable means that, for ε > 0 prescribed,
there exist partitions P1, P2 (with common refinement P⋆) such that
\[ U(P^\star, f) - \overline{\int_a^b} f \, dx \le U(P_1, f) - \overline{\int_a^b} f \, dx < \frac{\epsilon}{2}, \]
\[ \underline{\int_a^b} f \, dx - L(P^\star, f) \le \underline{\int_a^b} f \, dx - L(P_2, f) < \frac{\epsilon}{2}. \]
Since the upper and lower integrals agree, adding the two inequalities proves
the first direction. Conversely, to check whether f is Riemann integrable, it
suffices to show inf_P U(P, f) − sup_P L(P, f) < ε for any ε > 0. With ε > 0
prescribed we take the partition P̂ promised by the assumption to obtain
\[ \inf_P U(P, f) - \sup_P L(P, f) \le U(\hat{P}, f) - L(\hat{P}, f) < \epsilon . \]
One can then use the criterion of Theorem 1.3 to prove


Theorem 1.4. If f is continuous on [a, b] then it is Riemann integrable.
Proof. Exercise. Hint: Use that a continuous function on a compact interval is
uniformly continuous.

1.2 Drawbacks of the class R, motivation of the Lebesgue theory
In this course you will learn about a new integral, the Lebesgue integral, which
will allow us to integrate a much larger class of functions. Why would one want
to do that?

1.2.1 Limits of functions
Well, one main drawback of the class of Riemann integrable functions R is that
it does not behave well under taking limits.
To see this, consider the function
\[ f(x) = \begin{cases} 1 & \text{if } x \in \mathbb{Q} \cap [0, 1], \\ 0 & \text{otherwise.} \end{cases} \tag{2} \]
This function is not Riemann integrable (why?). On the other hand, for (xn )
an enumeration of the rational numbers in [0, 1], the function

\[ f_n(x) = \begin{cases} 1 & \text{if } x \in \{x_1, x_2, \dots, x_n\}, \\ 0 & \text{otherwise} \end{cases} \]
is Riemann integrable for every n (why? with what value?) and we have fn → f
pointwise. We conclude that the limit of a sequence of Riemann integrable
functions does not have to be Riemann integrable.
A perhaps less academic example is provided by the following sequence of
functions that you will construct on the first example sheet: (fn ) is a sequence
of continuous functions fn : [0, 1] → R with fn → f pointwise such that
• 0 ≤ fn ≤ 1
• (fn ) decreases monotonically as n → ∞
• f is not Riemann integrable
The above implies that $s_n = \int_0^1 f_n \, dx$ is a decreasing sequence of
non-negative numbers and hence converges. It is very tempting to define
$\int_0^1 f \, dx$ to be that limit. The Lebesgue integral will allow us to do
this in this particular situation (“monotone convergence theorem”) and in much
more general ones.

1.2.2 Length of curves


Let Γ (t) = (x (t) , y (t)) for a ≤ t ≤ b be a curve in the plane with x and y
continuous functions. We can define the length L of the curve as the supremum
of the length of polygonal approximations (picture). We call Γ a rectifiable
curve if L < ∞. You may recall that if x and y are continuously differentiable,
then the above limiting procedure leads to the formula
\[ L = \int_a^b \sqrt{\dot{x}^2(t) + \dot{y}^2(t)} \, dt . \]

A natural question is whether weaker conditions on x and y suffice to guarantee
rectifiability of Γ and whether we can make sense of the formula above in this
case. This will lead us to functions of bounded variation.
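To make the polygonal approximations concrete, here is a small sketch. The choice of curve (a quarter of the unit circle) and the helper names are assumptions made for illustration only; equally spaced parameter values are just one convenient family of polygons, whereas the definition of L takes the supremum over all of them.

```python
import math


def polygonal_length(gamma, a, b, n):
    """Length of the polygon through gamma(t_0), ..., gamma(t_n),
    where the t_i are equally spaced in [a, b]."""
    points = [gamma(a + (b - a) * i / n) for i in range(n + 1)]
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def quarter_circle(t):
    # Gamma(t) = (cos t, sin t) for 0 <= t <= pi/2; its length is pi/2.
    return (math.cos(t), math.sin(t))


lengths = [
    polygonal_length(quarter_circle, 0.0, math.pi / 2, n)
    for n in (1, 10, 100, 1000)
]
```

For this smooth curve the polygonal lengths increase towards π/2, the value given by the integral formula.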

1.2.3 The Fundamental Theorem of Calculus


For F a differentiable function whose derivative is Riemann integrable on [a, b]
we have the FTC
\[ \int_a^b F'(x) \, dx = F(b) - F(a) . \tag{3} \]

Now there are functions whose derivative is not Riemann integrable. Can we
still make sense of the formula above? What is the class of functions for which
an identity as above holds?

1.3 Measures of sets in R


As we shall see, the key to answering the above questions lies in understanding
the size or “measure” of sets in Rd . In particular, discussing the case d = 1 for
simplicity, we will construct a function (“measure”) defined on a certain class
of subsets of R (denoted by ℓ)
\[ m : \ell \to \mathbb{R}^+_0 \cup \{\infty\} \]
with the following properties:

1. m(E) = b − a if E is the interval [a, b], a ≤ b.
2. $m(E) = \sum_{n=1}^{\infty} m(E_n)$ whenever $E = \bigcup_{n=1}^{\infty} E_n$ is a disjoint union
(countable additivity).
In particular, m(E1 ∪ E2) = m(E1) + m(E2) if E1, E2 are disjoint
(finite additivity).
3. m(E + h) = m(E) for every h ∈ R
(translation invariance).
We will show existence and uniqueness of such a measure, the Lebesgue measure,
provided one restricts ℓ to the class of “measurable sets”. This class will be
closed under countable unions and intersections as well as taking complements
and contain moreover the open sets. But it won’t comprise all subsets of R!
There are non-measurable sets. In other words there is no function m with the
above properties that is defined on all sets of R.1
Once we have identified the class of measurable sets, we shall define measur-
able functions and then construct the Lebesgue integral. Armed with this, we
shall be able to answer some of the questions raised in Section 1.2.
This will probably take three quarters of the course. In the last part we will
discuss what is called “abstract measure theory”, which will be important, for
instance, in applications to probability.

1.4 Literature and Further Reading


As mentioned, Sections 2-5 of these notes will follow very closely the book “Real
Analysis” by Stein-Shakarchi (Princeton University Press) covering Chapters 1-3
and the first half of Chapter 6 of that book. For Section 5.6 I also used “Measure
Theory and Integration” by M. Taylor (Graduate Studies in Mathematics). The
proof of the change of variables formula in Section 6 is taken from “Analysis II”
by Theodor Bröcker (Spektrum Lehrbuch; in German). I also recommend the
“classics”: Folland’s “Real Analysis” (Wiley) and Rudin’s “Real and Complex
Analysis” (Mc Graw Hill). These books take a more advanced point of view
than these notes but will be nice and accessible as complementary reading.
1 The construction of such non-measurable sets involves the axiom of choice. We will
construct such a set later. In higher dimensions, one has the Banach-Tarski paradox, which
proves (using again the axiom of choice) that the unit ball in R^3 can be decomposed into a
finite number of disjoint sets Ai (i ≥ 5) which can be reassembled, using only translations and
rotations applied to the Ai, into two copies of the unit ball.

2 Measure Theory: Lebesgue Measure in Rd
2.1 Preliminaries and Notation
• x = (x1 , .., xd ) a point in Rd
• $|x| = \sqrt{x_1^2 + \dots + x_d^2}$ the Euclidean norm
• d (x, y) = |x − y| the distance of two points x, y ∈ Rd
• d (E, F ) = inf x∈E;y∈F |x − y| distance between two sets E, F ⊂ Rd .
• Br (x) := {y ∈ Rd | |x − y| < r} open ball around x of radius r in Rd .
• recall definitions of open, closed, bounded
• E ⊂ Rd is compact if for any open cover $E \subset \bigcup_{\alpha \in A} U_\alpha$ (Uα open) there
exists a finite subcover, i.e. $E \subset \bigcup_{j \in J} U_j$ for J a finite subset of A.
• recall that a subset E ⊂ Rd is compact if and only if it is closed and
bounded (Heine-Borel)
• x ∈ Rd is a limit point of E ⊂ Rd if for every r > 0 the ball Br (x) contains
points of E.
• x ∈ Rd is an isolated point of E ⊂ Rd if for some r > 0 the ball Br (x)
satisfies Br (x) ∩ E = {x}.
• A closed set is perfect if it does not contain any isolated points.
• The closure of E ⊂ Rd, denoted $\overline{E}$, is the union of E and all its limit
points.
• The boundary of E ⊂ Rd, denoted ∂E, is the closure of E without the
interior of E, i.e. $\partial E = \overline{E} \setminus \operatorname{int}(E)$.

2.2 Volume of Rectangles and Cubes


Our first basic tool to measure the size of sets will be closed rectangles, defined
as follows
R = [a1 , b1 ] × [a2 , b2 ] × ... × [ad , bd ]
with ai ≤ bi for all i = 1, ..., d. Note that R is closed and has axes parallel to
the coordinate axes of Rd . We define the length of the sides of R as bi − ai for
i = 1, ..., d and the volume of R as

|R| = (b1 − a1 ) · ... · (bd − ad ) (4)

A closed rectangle with bi − ai = c for all i = 1, ..., d is called a cube.


We make the obvious definition for an open rectangle.
A union of closed rectangles is said to be almost disjoint if the interiors of
the rectangles are disjoint.

Lemma 2.1. If a rectangle is the almost disjoint union of finitely many
rectangles, say $R = \bigcup_{k=1}^{N} R_k$, then
\[ |R| = \sum_{k=1}^{N} |R_k| . \]

Proof. One first extends the sides of the rectangles Rk to obtain new rectangles
R̃1 , ..., R̃M as shown in the figure below.

[Figure: a rectangle R decomposed into rectangles R1, ..., R5.]

This way one has the almost disjoint unions
\[ R = \bigcup_{j=1}^{M} \tilde{R}_j \quad \text{and} \quad R_k = \bigcup_{j \in J_k} \tilde{R}_j \quad \text{for } k = 1, \dots, N, \]
where the sets J1, ..., JN form a partition of the integers from 1 to M. We claim that
\[ |R| = \sum_{j=1}^{M} |\tilde{R}_j| . \]
This follows from writing out the volumes on the left and on the right and
applying the distributive law. The same argument can be applied to the rectangles
Rk, so
\[ |R_k| = \sum_{j \in J_k} |\tilde{R}_j| . \]
Combining the above, we obtain
\[ |R| = \sum_{j=1}^{M} |\tilde{R}_j| = \sum_{k=1}^{N} \sum_{j \in J_k} |\tilde{R}_j| = \sum_{k=1}^{N} |R_k| . \]
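The distributive-law step in the proof can be checked on a concrete grid. The sketch below is only an illustration: the cut points and the helper names `grid_cells` and `volume` are invented here, and exact rational arithmetic is used so that the two sides can be compared for equality.

```python
from fractions import Fraction
from itertools import product


def grid_cells(cuts_per_axis):
    """All cells of the grid obtained by cutting each side [a_i, b_i] of a
    rectangle at the given points.

    cuts_per_axis[i] is an increasing list a_i = c_0 < c_1 < ... < c_m = b_i.
    Each cell is a tuple of (left, right) pairs, one pair per axis.
    """
    intervals = [list(zip(cuts, cuts[1:])) for cuts in cuts_per_axis]
    return list(product(*intervals))


def volume(cell):
    """Volume of a rectangle given as a tuple of (left, right) pairs."""
    result = Fraction(1)
    for left, right in cell:
        result *= right - left
    return result


# Hypothetical example: R = [0, 2] x [-1, 3], cut into a 2 x 3 grid.
cuts = [
    [Fraction(0), Fraction(1, 3), Fraction(2)],
    [Fraction(-1), Fraction(0), Fraction(1, 2), Fraction(3)],
]
big_rectangle = tuple((c[0], c[-1]) for c in cuts)
cells = grid_cells(cuts)
total = sum(volume(cell) for cell in cells)
```

Expanding the product of the side lengths of R, each written as a sum of the lengths of its pieces, gives exactly one term per grid cell, which is why `total` agrees with the volume of R.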

Lemma 2.2. If $R \subset \bigcup_{k=1}^{N} R_k$ with R and Rk (closed) rectangles, then
$|R| \le \sum_{k=1}^{N} |R_k|$.

Proof. Exercise. Repeat the previous proof.


Our idea will now be to approximate an arbitrary set in Rd by rectangles.
We first note the following

Theorem 2.1. Every open subset U ⊂ Rd , d ≥ 1 can be written as a countable
union of almost disjoint closed cubes.
Proof. We construct the union in a (countably infinite) sequence of steps. We
start with the grid of mesh 1 on Rd with lines parallel to the coordinate axes.
A cube Q of the grid is accepted if Q ⊂ U, rejected if Q ⊂ U^c and tentatively
accepted otherwise. In the second step we bisect the tentatively accepted cubes
into cubes of side length 2^{-1} and again accept, reject or tentatively accept
the subcubes. Continuing this procedure indefinitely, we obtain a countable union of
accepted cubes and we claim that this union is U. To see this, note first that
taking the grid of mesh size 2^{-N} of Rd we know that any cube of this grid contained in U
has either been accepted or is contained in a cube that has been accepted in a
previous step. Therefore, it suffices to show that x ∈ U is contained in a cube
of the grid of mesh size 2^{-N} contained in U for large enough N. But this is
easily deduced from the fact that U is open.
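The accept/reject/bisect procedure of the proof can be run for finitely many steps on a concrete open set. The following sketch assumes U is the open unit disc in R^2 (an illustrative choice, for which containment in U and in U^c can be tested exactly from the nearest and farthest point of a square); the function name is hypothetical.

```python
import math


def dyadic_squares_in_disc(max_depth):
    """Accepted closed squares (x, y, side) for the open unit disc
    U = {p in R^2 : |p| < 1}, after steps 0, 1, ..., max_depth."""
    accepted = []
    # Step 0: the four squares of the mesh-1 grid that meet the interior of
    # [-1, 1]^2. All other squares of that grid lie in the complement of U.
    tentative = [(float(i), float(j), 1.0) for i in (-1, 0) for j in (-1, 0)]
    for _ in range(max_depth + 1):
        next_tentative = []
        for x, y, s in tentative:
            # Squared distance from the origin to the farthest corner.
            far = max(abs(x), abs(x + s)) ** 2 + max(abs(y), abs(y + s)) ** 2
            # Squared distance from the origin to the nearest point of the square.
            nx = min(max(0.0, x), x + s)
            ny = min(max(0.0, y), y + s)
            near = nx * nx + ny * ny
            if far < 1.0:
                accepted.append((x, y, s))      # Q is contained in U
            elif near >= 1.0:
                continue                        # Q is contained in U^c
            else:
                h = s / 2                       # tentatively accepted: bisect
                next_tentative.extend(
                    (x + dx, y + dy, h) for dx in (0.0, h) for dy in (0.0, h)
                )
        tentative = next_tentative
    return accepted


squares = dyadic_squares_in_disc(6)
area = sum(s * s for _, _, s in squares)
```

The accepted squares are almost disjoint and contained in U, so their total area stays below π and approaches it as `max_depth` grows.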

2.3 The exterior measure


Theorem 2.1 suggests defining the measure of an arbitrary open set U as
\[ m(U) = \sum_{j=1}^{\infty} |R_j| \quad \text{where } U = \bigcup_{j=1}^{\infty} R_j \text{ with the } R_j \text{ almost disjoint.} \]

However, we would still have to show independence of this quantity from the
decomposition. We’re now going to achieve this and much more.
Definition 2.1. For E ⊂ Rd any subset of Rd we define the exterior measure
of E by
\[ m_\star(E) = \inf_{E \subset \bigcup_{j=1}^{\infty} Q_j} \; \sum_{j=1}^{\infty} |Q_j| \]
where we are taking the infimum over all countable coverings of E by closed
cubes.
Note that 0 ≤ m⋆ ≤ ∞. Remarkably, replacing countable by finite in the
above definition would yield a different quantity (see Example Sheet 1).
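One way to see why countable coverings matter is the set Q ∩ [0, 1] in dimension d = 1, where closed cubes are closed intervals. A finite union of closed intervals covering this set is closed, hence contains [0, 1], so its total length is at least 1 by Lemma 2.2. Countably many intervals can do much better. The sketch below is an illustration only (the enumeration and function names are assumptions, not taken from the example sheet): the j-th rational gets an interval of length ε/2^j.

```python
from fractions import Fraction
from math import gcd


def rationals_in_unit_interval():
    """Enumerate Q ∩ [0, 1] without repetition: 0, 1, 1/2, 1/3, 2/3, 1/4, ..."""
    yield Fraction(0)
    yield Fraction(1)
    q = 2
    while True:
        for p in range(1, q):
            if gcd(p, q) == 1:
                yield Fraction(p, q)
        q += 1


def cover_first_rationals(count, eps):
    """Closed intervals, stored as (centre, length), covering the first
    `count` rationals; the j-th one has length eps / 2**j."""
    cover = []
    for j, x in zip(range(1, count + 1), rationals_in_unit_interval()):
        cover.append((x, eps / 2**j))
    return cover


eps = Fraction(1, 100)
cover = cover_first_rationals(50, eps)
total_length = sum(length for _, length in cover)
```

However many rationals are covered this way, the total length stays below ε, which is the mechanism behind the ε/2^j-trick used repeatedly in these notes.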

2.3.1 Examples
Let us compute the exterior measure for some elementary sets to see whether it
agrees with our intuitive definition of volume above.

1. The exterior measure of a point is zero. (Why?)


2. The exterior measure of a closed cube Q is equal to its volume.
To see this, note first that since Q covers itself we have m⋆(Q) ≤ |Q|. To
show the reverse direction we need to show that any covering (Qj) of Q by
closed cubes satisfies
\[ \sum_{j=1}^{\infty} |Q_j| \ge |Q| . \]
In fact, it suffices to show that any such covering satisfies
\[ \sum_{j=1}^{\infty} |Q_j| \ge |Q| - \epsilon \]
for any ε > 0. To show the latter, given ε > 0 we choose for each j an
open cube Sj with Qj ⊂ Sj and |Sj| ≤ |Qj| + ε/2^j. Then $\bigcup_{j=1}^{\infty} S_j$ is an open
cover of Q. Since Q is compact, there is a finite open subcover $\bigcup_{j=1}^{N} S_{k_j}$
of Q and clearly also $Q \subset \bigcup_{j=1}^{N} \overline{S_{k_j}}$, where $\overline{S_{k_j}}$ denotes the
closure of $S_{k_j}$. Now we can apply Lemma 2.2 to
conclude $|Q| \le \sum_{j=1}^{N} |S_{k_j}|$ and therefore
\[ |Q| \le \sum_{j=1}^{N} |S_{k_j}| \le \sum_{j=1}^{N} \left( |Q_{k_j}| + \frac{\epsilon}{2^{k_j}} \right) \le \sum_{j=1}^{\infty} |Q_j| + \epsilon \]
as desired.
3. The exterior measure of an open cube Q is equal to its volume.
Again, we have $m_\star(Q) \le |\overline{Q}| = |Q|$ since the closed cube $\overline{Q}$ covers Q.
In the reverse direction observe that we can find, for any ε > 0, a closed
cube Q_in contained in Q such that |Q_in| ≥ |Q| − ε. Therefore, we have
m⋆(Q) ≥ m⋆(Q_in) = |Q_in| ≥ |Q| − ε, the first inequality holding because any
covering of Q is also one of Q_in and the equality by Example 2.
4. The exterior measure of a rectangle R is equal to its volume.
We sketch the argument. First of all, arguing as for the cube in 2. above,
we obtain |R| ≤ m⋆(R).
For the reverse direction consider a grid of cubes of side length 1/k on Rd and
denote by Q̃ the collection of cubes entirely contained in R and by Q̃′ the
collection of cubes intersecting both R and the complement R^c. Clearly for fixed k there are
only finitely many cubes in Q̃ and Q̃′. In fact, it is easy to see that the
number of cubes in Q̃′ is smaller than C k^{d−1} for some constant
C depending only on the side lengths of R. We finally note that
\[ R \subset \bigcup_{Q \in \tilde{Q} \cup \tilde{Q}'} Q , \]
where the right hand side is a finite union of almost disjoint cubes.
By monotonicity and Lemma 2.1, and since each cube has volume k^{−d}, we
have
\[ m_\star(R) \le \sum_{Q \in \tilde{Q}} |Q| + \sum_{Q \in \tilde{Q}'} |Q| \le |R| + \frac{C}{k} \]
and choosing k large we obtain m⋆(R) ≤ |R| + ε for any ε > 0.
5. The exterior measure of Rd is infinite, m⋆(Rd) = ∞, as any covering of
Rd must in particular cover arbitrarily large cubes.
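The grid argument of Example 4 can be made quantitative in a simple case. The sketch below assumes the rectangle has one corner at the origin, R = [0, a_1] × ... × [0, a_d], and that the grid of mesh 1/k passes through the origin; these are simplifying assumptions for illustration, and the function name is hypothetical. It counts every closed grid cube meeting R (including those touching R only along its boundary), so it gives an upper bound for the sum in the example.

```python
from fractions import Fraction
from math import floor


def grid_cover_volume(sides, k):
    """Total volume of the closed cubes of the mesh-1/k grid that meet
    R = [0, a_1] x ... x [0, a_d], where sides = (a_1, ..., a_d)."""
    total = Fraction(1)
    for a in sides:
        # Grid intervals [i/k, (i+1)/k] meeting [0, a] are i = -1, 0, ..., floor(a*k).
        count = floor(a * k) + 2
        total *= Fraction(count, k)
    return total


sides = (Fraction(1, 3), Fraction(1, 2))
volume_R = sides[0] * sides[1]
# For d = 2: (a + 2/k)(b + 2/k) <= ab + C/k with C = 2(a + b) + 4 and k >= 1.
C = 2 * (sides[0] + sides[1]) + 4
excess = {k: grid_cover_volume(sides, k) - volume_R for k in (10, 100, 1000)}
```

The excess over |R| is bounded by C/k, mirroring the estimate m⋆(R) ≤ |R| + C/k in the text.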

2.3.2 Properties of the exterior measure


In the following we denote by Ei and E subsets of Rd .

Proposition 2.1. The exterior measure m⋆ satisfies the following properties:
1. If E1 ⊂ E2 then m⋆(E1) ≤ m⋆(E2) (monotonicity)
2. If $E = \bigcup_{j=1}^{\infty} E_j$, then $m_\star(E) \le \sum_{j=1}^{\infty} m_\star(E_j)$ (countable subadditivity)
3. If E ⊂ Rd, then
\[ m_\star(E) = \inf_{E \subset U} m_\star(U) \]
with the infimum taken over all open sets U that contain E.
4. If E = E1 ∪ E2 and d(E1, E2) > 0, then m⋆(E) = m⋆(E1) + m⋆(E2)
(finite additivity for sets with positive distance)
5. If $E = \bigcup_{j=1}^{\infty} Q_j$ is a union of almost disjoint cubes,
then $m_\star(E) = \sum_{j=1}^{\infty} |Q_j|$ (countable additivity for almost disjoint cubes)
Before we prove this, let us make some remarks. We note first that 5. gives
us (in view of Theorem 2.1) a notion of volume of an arbitrary open set which
is independent of the decomposition into cubes.
We also remark that one cannot conclude in general that if E1 and E2 are
disjoint sets in Rd, then m⋆(E1) + m⋆(E2) = m⋆(E1 ∪ E2) holds.2 However, for
the class of sets (“Lebesgue measurable sets”) that we are going to define in
the next section, this property does hold; in fact it does so for countably many
disjoint sets (countable additivity)!
Proof. The first property follows since any covering of E2 by closed cubes is also
a covering of E1. For the second property we may assume m⋆(Ej) < ∞ for all j
and use the ε/2^j-trick: With ε > 0
arbitrary and fixed, we choose for any Ej a covering by closed cubes Q_{j,n} with
\[ m_\star(E_j) \ge \sum_{n=1}^{\infty} |Q_{j,n}| - \frac{\epsilon}{2^j} . \]
Since $E \subset \bigcup_{j,n} Q_{j,n}$ is a covering by closed cubes we have
\[ m_\star(E) \le \sum_{j,n} |Q_{j,n}| = \sum_{j=1}^{\infty} \sum_{n=1}^{\infty} |Q_{j,n}| \le \sum_{j=1}^{\infty} \left( m_\star(E_j) + \frac{\epsilon}{2^j} \right) \le \sum_{j=1}^{\infty} m_\star(E_j) + \epsilon \]
where we inserted the previous estimate in the third step. Since this inequality
holds for any ε > 0 we are done.
To prove 3. we note that by 1. we clearly have m⋆(E) ≤ inf m⋆(U), so we
only need to show the ≥-direction. For this we can assume in addition that
m⋆(E) < ∞ as otherwise the inequality holds trivially. To prove the inequality,
it clearly suffices to construct for any ε > 0 an open set U with E ⊂ U and
\[ m_\star(U) \le m_\star(E) + \epsilon . \tag{5} \]
To construct this U for ε > 0 given, we use the ε/2^j-trick: We first cover E with
closed cubes such that
\[ m_\star(E) \ge \sum_{j=1}^{\infty} |Q_j| - \frac{\epsilon}{2} \tag{6} \]
and then, for each cube Qj choose an open cube Q_j^0 with Qj ⊂ Q_j^0 and
$|Q_j| \ge |Q_j^0| - \frac{\epsilon}{2} \frac{1}{2^j}$. We then define the union $U := \bigcup_{j=1}^{\infty} Q_j^0$ and claim it satisfies the
desired inequality. To check this we note
\[ m_\star(U) \le \sum_{j=1}^{\infty} |Q_j^0| \le \sum_{j=1}^{\infty} \left( |Q_j| + \frac{\epsilon}{2} \frac{1}{2^j} \right) \le m_\star(E) + \epsilon \]
where the first step follows from countable subadditivity (together with Example 3) and the last from inserting (6).
2 See Example Sheet 2 after having read Section 2.5.
We leave the proof of 4. as an exercise. [Outline: Note that the ≤-direction
follows from subadditivity. For ≥ choose 0 < δ < d(E1, E2) and cover E =
E1 ∪ E2 by cubes such that $m_\star(E) \ge \sum_{j=1}^{\infty} |Q_j| - \epsilon$. Refine the cubes such that
they all have diameter smaller than δ. Note that each cube can then intersect
at most one of E1 and E2. Conclude.]
For 5. we note that the ≤ direction follows from countable subadditivity and
Example 2. To show ≥, we give ourselves ε of room. Let ε > 0 be fixed. For each Qj choose a closed
cube Q̃j strictly contained in Qj with
\[ |\tilde{Q}_j| \ge |Q_j| - \frac{\epsilon}{2^j} . \]
For fixed N, the cubes Q̃1, ..., Q̃N are disjoint and compact, hence (by an Exercise on
Example Sheet 2) a positive distance apart. We can hence apply 4. and conclude
\[ m_\star(E) \ge m_\star\left( \bigcup_{j=1}^{N} \tilde{Q}_j \right) = \sum_{j=1}^{N} m_\star(\tilde{Q}_j) = \sum_{j=1}^{N} |\tilde{Q}_j| \ge \sum_{j=1}^{N} \left( |Q_j| - \frac{\epsilon}{2^j} \right) \]
and hence after taking the limit N → ∞ that
\[ m_\star(E) \ge \sum_{j=1}^{\infty} |Q_j| - \epsilon \]
for any ε > 0. The desired inequality follows.

2.4 The class of (Lebesgue) measurable sets


The motivation to define a class of measurable subsets on Rd that does not
comprise all subsets of Rd is to have a class of sets for which countable additivity
holds (cf. Theorem 2.2 below).
Definition 2.2. A subset E ⊂ Rd is (Lebesgue) measurable if for any ǫ > 0
there exists an open set U with E ⊂ U and

m⋆ (U \ E) ≤ ǫ .

If E is (Lebesgue) measurable we define its (Lebesgue) measure m (E) to be
m (E) = m⋆ (E).
How big is the class of Lebesgue measurable sets? An answer is given by
the following Proposition, which shows that this class of sets is closed under
taking countable intersections, countable unions and complements (something
we will call a σ-algebra of sets below, see Section 2.4.4), and contains the open
and closed sets.

Proposition 2.2. The following sets are (Lebesgue) measurable:
1. Every open set is measurable.
2. Sets of exterior measure 0 are measurable.
3. A countable union of measurable sets is measurable.
4. Closed sets are measurable.
5. The complement of a measurable set is measurable.
6. A countable intersection of measurable sets is measurable.
Proof. Item (1) follows from the definition and (2) is a simple consequence of
the third property of the exterior measure. Namely, if E has exterior measure zero, there
exists a U open with E ⊂ U and m⋆(U) ≤ 0 + ε. Since U \ E ⊂ U, monotonicity
implies m⋆(U \ E) ≤ ε. For item (3) we use the standard ε/2^n-trick. Let A1, A2, ...
be measurable sets. This means that we can pick Ui open such that Ai ⊂ Ui
and m⋆(Ui \ Ai) ≤ ε/2^i. Now ∪i Ai ⊂ ∪i Ui, the union ∪i Ui is open, and hence by
monotonicity and countable subadditivity
\[ m_\star\left( \bigcup_i U_i \setminus \bigcup_i A_i \right) \le m_\star\left( \bigcup_i (U_i \setminus A_i) \right) \le \sum_{i=1}^{\infty} m_\star(U_i \setminus A_i) \le \epsilon . \]

Turning to (4) we first observe that it suffices to prove the claim for closed and
bounded (hence compact) sets. This is because we can write an arbitrary closed
set F as a countable union of compact sets, $F = \bigcup_{n=1}^{\infty} \left( \overline{B_n(0)} \cap F \right)$. We can
then use item (3). We will therefore assume now that F is compact, so in
particular m⋆(F) < ∞. Let ε > 0 be prescribed. By property (3) of the exterior
measure we find a U open such that F ⊂ U and
\[ m_\star(U) \le m_\star(F) + \epsilon . \]
Since F is closed, U \ F is open and by Theorem 2.1 we can express it as a countable
union of almost disjoint closed cubes, $U \setminus F = \bigcup_{j=1}^{\infty} Q_j$. Clearly it suffices to
show that the exterior measure of this union is at most ε. To do this we observe that for any N, the
union $K = \bigcup_{j=1}^{N} Q_j$ is compact. Since K and F are disjoint, they are a positive
distance apart and we conclude
\[ m_\star(U) \ge m_\star(K \cup F) = m_\star(K) + m_\star(F) = \sum_{j=1}^{N} m_\star(Q_j) + m_\star(F) . \]
Since m⋆(F) < ∞ we can subtract it and combine the above with the
inequality m⋆(U) ≤ m⋆(F) + ε to obtain for any N
\[ \sum_{j=1}^{N} m_\star(Q_j) \le \epsilon . \]
Taking the limit N → ∞ we obtain the desired result as
\[ m_\star(U \setminus F) = m_\star\left( \bigcup_j Q_j \right) = \sum_{j=1}^{\infty} m_\star(Q_j) \le \epsilon . \]

To show (5) we proceed as follows. For every n ≥ 1 we choose an Un open with
E ⊂ Un and m⋆(Un \ E) ≤ 1/n, which we can do since E is measurable. We now
note that (Un)^c is closed (hence measurable by (4)) and that $S = \bigcup_{n=1}^{\infty} (U_n)^c$ is
also measurable by (3). We now easily check the inclusions
\[ S \subset E^c \quad \text{and} \quad E^c \setminus S \subset U_n \setminus E \quad \text{for any } n \]
to conclude
\[ m_\star(E^c \setminus S) \le m_\star(U_n \setminus E) \le \frac{1}{n} \]
and hence that m⋆(E^c \ S) = 0. Since E^c = S ∪ (E^c \ S) is a union of two
measurable sets it is measurable.
For (6) it suffices to note that by de Morgan’s laws
\[ \bigcap_{j=1}^{\infty} E_j = \left( \bigcup_{j=1}^{\infty} (E_j)^c \right)^c \]
and use (5) and (3).

2.4.1 The property of countable additivity


The Lebesgue measurable sets satisfy the following crucial property which is
called countable additivity:
Theorem 2.2. If E1, E2, ... are disjoint measurable sets and $E = \bigcup_{j=1}^{\infty} E_j$, then
\[ m(E) = \sum_{j=1}^{\infty} m(E_j) . \]

Proof. We first claim that it suffices to prove this for the Ej being bounded.
Why? Suppose we had this result and we are trying to prove the general case.
We take (Qk), k = 1, 2, ..., the sequence of closed cubes of side length k centred
at the origin. We have Qk ⊂ Qk+1 for all
k ≥ 1 and we define S1 = Q1 and Sk = Qk \ Qk−1 for all k ≥ 2. We define
\[ E_{j,k} = E_j \cap S_k , \]
which are measurable and bounded sets, disjoint for all j and k. Then
\[ E = \bigcup_{j,k} E_{j,k} \quad \text{and} \quad E_j = \bigcup_{k=1}^{\infty} E_{j,k} \]
are both disjoint unions of bounded measurable sets. Since we are assuming we
have the result in this case, we conclude
\[ m(E) = m\left( \bigcup_{j,k} E_{j,k} \right) = \sum_{j,k} m(E_{j,k}) = \sum_{j=1}^{\infty} \sum_{k=1}^{\infty} m(E_{j,k}) = \sum_{j=1}^{\infty} m(E_j) . \]
So now let’s prove it assuming that the Ej are bounded. Clearly “≤” holds by
countable subadditivity, so we only need to prove “≥” (and we can assume m (E) < ∞).
Recall the idea of how we proved this in the case of cubes: We found strictly
smaller closed cubes, then worked for finite N. The Lebesgue measurability
gives us the following analogue:

Lemma 2.3. Let E ⊂ Rd be measurable. Then for every ǫ > 0 there exists a
closed set F ⊂ E with m (E \ F ) ≤ ǫ.
Proof. Apply the definition of measurability to the complement: There exists
an open U with E c ⊂ U and m (U \ E c ) ≤ ǫ. If we define F = U c , then F is
closed and in view of E \ F = F c \ E c we have m (E \ F ) ≤ ǫ.
In particular, we can find a closed set Fj in each Ej with m(Ej \ Fj) ≤ ε/2^j.
Now for fixed N the F1, ..., FN are closed, bounded (hence compact) and
disjoint, hence a positive distance apart and we can apply the properties of the
exterior measure to conclude for all N
\[ m(E) \ge m\left( \bigcup_{j=1}^{N} F_j \right) = \sum_{j=1}^{N} m(F_j) \ge \sum_{j=1}^{N} \left( m(E_j) - \frac{\epsilon}{2^j} \right) \ge \sum_{j=1}^{N} m(E_j) - \epsilon . \]
The step in the middle follows from subadditivity applied to Ej = Fj ∪ (Ej \ Fj).
Taking the limit as N → ∞ and observing that this inequality holds for any
ε > 0 we have shown the desired inequality.

2.4.2 Regularity properties of the Lebesgue measure


The following Proposition establishes certain “continuity” properties (from above
and below) of the Lebesgue measure
Proposition 2.3. Let E1, E2, ... be a countable collection of measurable subsets
of Rd.
1. If the Ek are increasing to E in that Ek ⊂ Ek+1 for all k and $E = \bigcup_{i=1}^{\infty} E_i$,
then
\[ m(E) = \lim_{N \to \infty} m(E_N) . \]
2. If the Ek are decreasing to E in that Ek ⊃ Ek+1 for all k and $E = \bigcap_{i=1}^{\infty} E_i$,
and m (Ek) < ∞ for some k, then
\[ m(E) = \lim_{N \to \infty} m(E_N) . \]

Remark 2.1. Note the additional condition m (Ek) < ∞ for some k in the decreasing
case. To see that the conclusion is not generally valid without this condition
consider the case En = [n, ∞), for which m (En) = ∞ for all n while the
intersection of the En is empty.
Proof. For the first part, set G1 = E1 and Gk = Ek \ Ek−1 for k ≥ 2. The Gk are then
disjoint and measurable and ∪k Gk = ∪k Ek. Now apply countable additivity for
disjoint measurable sets (Theorem 2.2) to obtain
\[ m\left( \bigcup_{i=1}^{\infty} E_i \right) = m\left( \bigcup_{i=1}^{\infty} G_i \right) = \sum_{i=1}^{\infty} m(G_i) = \lim_{N \to \infty} \sum_{i=1}^{N} m(G_i) = \lim_{N \to \infty} m\left( \bigcup_{i=1}^{N} G_i \right) \]
and noting that $E_N = \bigcup_{i=1}^{N} G_i$ the proof is complete.


For the second part we first note that wlog m (E1) < ∞ as we can always
forget about finitely many elements in the sequence. We now reduce it to case
1 by defining Fj = E1 \ Ej which is clearly an increasing sequence F1 ⊂ F2 ⊂
F3 ⊂ ... and also m (Fj) = m (E1) − m (Ej) since E1 = Fj ∪ Ej is a disjoint
union of measurable sets. We also observe that $\bigcup_{j=1}^{\infty} F_j = E_1 \setminus E$. Combining
these things and using the first part we find
\[ \lim_{j \to \infty} \left( m(E_1) - m(E_j) \right) = \lim_{j \to \infty} m(F_j) = m\left( \bigcup_{j=1}^{\infty} F_j \right) = m(E_1 \setminus E) = m(E_1) - m(E) . \]
As m (E1) < ∞, we can subtract it from both sides and obtain the result.

2.4.3 Invariance properties of the Lebesgue measure


The Lebesgue measure is invariant under translations, rotations and reflections.
In this section we discuss the translation invariance.
For E ⊂ Rd measurable we define the h-translated set

Eh = E + h := {x + h | x ∈ E} for h ∈ Rd fixed.

Theorem 2.3. If E is measurable, then the h-translated set Eh is also
measurable and m (E) = m (Eh ).
Proof. The conclusion clearly holds if E is a cube. This observation allows us
to conclude that m⋆ (E) = m⋆ (Eh ) holds for an arbitrary set E, since given
any covering Qj of E by cubes, the h-translated cubes cover Eh and have the
same total volume (and conversely, translating by −h). Finally, if E is
measurable and, for given ε > 0, U is open with E ⊂ U and m⋆ (U \ E) ≤ ε, then the
translated set Uh is also open and satisfies Eh ⊂ Uh and
m⋆ (Uh \ Eh ) = m⋆ ((U \ E)h ) ≤ ε. Hence Eh is measurable.
In the same way we can prove the invariance under reflections and rotations.
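The translation invariance can be checked by hand for finite unions of intervals in R. The following Python sketch is only an illustration under that assumption; the function names `measure_of_union` and `translate` are hypothetical helpers, not notation from the notes.

```python
from fractions import Fraction

def measure_of_union(intervals):
    """Lebesgue measure of a finite union of closed intervals [a, b] in R."""
    total = Fraction(0)
    current_start = current_end = None
    for a, b in sorted(intervals):
        if current_end is None or a > current_end:
            # A new connected piece starts; add up the previous one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = a, b
        else:
            # Overlap with the current piece: extend it.
            current_end = max(current_end, b)
    if current_end is not None:
        total += current_end - current_start
    return total

def translate(intervals, h):
    """The h-translated set E_h = E + h."""
    return [(a + h, b + h) for a, b in intervals]

# E = [0, 1] u [1/2, 2] u [5, 6] = [0, 2] u [5, 6]
E = [(Fraction(0), Fraction(1)), (Fraction(1, 2), Fraction(2)), (Fraction(5), Fraction(6))]
h = Fraction(-7, 3)
print(measure_of_union(E), measure_of_union(translate(E, h)))
```

Exact rational arithmetic is used so that the two measures can be compared with `==` rather than up to rounding.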

2.4.4 σ-algebras and Borel sets


Definition 2.3. A σ-algebra is a non-empty collection of subsets of Rd that is
closed under countable unions, countable intersections and complements.
Note that the empty set and the set E = Rd are contained in any σ-algebra
(why?). To give some examples, we note that clearly the collection of all subsets
of Rd forms a σ-algebra, Mall . We have also seen the σ-algebra of Lebesgue
measurable sets (cf. Proposition 2.2), MLebesgue . It turns out that (assuming
the axiom of choice) MLebesgue is properly contained in Mall . In other words,
there do exist non-measurable subsets of Rd (cf. Section 2.5). As another example
we will construct the Borel σ-algebra, BR below, which will turn out to
be properly contained in MLebesgue .
We note that it is easy to show that the intersection of two σ-algebras M and
N is again a σ algebra. (Clearly, if A ∈ M ∩ N then A ∈ M and A ∈ N . Hence
Ac ∈ M and Ac ∈ N and therefore Ac ∈ M ∩ N . Similarly for countable unions
and intersections.) In fact, using the argument sketched in the bracket one shows
that the intersection of an arbitrary collection (not necessarily countable) of σ-
algebras is again a σ-algebra. This observation allows us to prove the following
Theorem:
Theorem 2.4. If F is an arbitrary collection of subsets of Rd , there exists a
unique smallest σ-algebra M which contains F .
We call the σ-algebra promised in the theorem the σ-algebra generated by F
and denote it by M (F ).

Proof. Take the intersection of all σ-algebras which contain F . Note this inter-
section is non-empty as Mall is a σ-algebra containing F . The intersection is
the smallest σ algebra containing F in the sense that M (F ) is contained in any
σ-algebra which includes the sets from F . This also gives the uniqueness.
Definition 2.4. If U denotes the collection of all open sets in Rd , then M (U)
is called the Borel σ-algebra, denoted BR (containing the Borel sets).
Observation 2.1. The Borel σ-algebra can also be generated by closed cubes,
i.e. if Q denotes the collection of all closed cubes in Rd , then M (Q) = BR .
To verify the Observation note first that any open set lies in the σ-algebra
of closed cubes by Theorem 2.1 and conversely any cube lies in the Borel σ-
algebra. We combine this with the following general fact: If E is any collection
of subsets of Rd satisfying E ⊂ M (F ), then M (E) ⊂ M (F ). (Indeed, M (F )
is a σ-algebra containing E, so the smallest σ-algebra containing E must be
contained in it.)
It is of course natural to ask whether the Borel sets are properly contained in
the Lebesgue measurable sets. The answer is yes and the following Proposition
(as well as the exercises of Example Sheet 3) clarifies the relation between the
two. First we need a definition
Definition 2.5. A Gδ -set is a countable intersection of open sets. An Fσ -set
is a countable union of closed sets.
Proposition 2.4. Let E ⊂ Rd . Then E is measurable
1. if and only if E is Gδ with a set of measure zero removed,
2. if and only if E is the union of an Fσ and a set of measure zero.
In particular, any Lebesgue measurable set can be obtained from a Borel set
by adjoining a set of measure zero to the latter.
Proof. If E satisfies the conditions in (1) or (2) then E is measurable since Gδ ,
Fσ and sets of measure zero are measurable.
For the converse of (1) let now E be measurable. We choose for any n ≥ 1
an open set Un ⊃ E with m (Un \ E) ≤ 1/n. Then clearly $S = \bigcap_{n=1}^\infty U_n$ is a Gδ containing
E and the measure of the complement S \ E is zero in view of m (S \ E) ≤
m (Un \ E) ≤ 1/n being true for all n ≥ 1. Write E = S \ (S \ E).
For the converse of (2) let E be measurable and choose for any n ≥ 1 a
closed set Fn ⊂ E with m (E \ Fn ) ≤ 1/n. Then clearly $S = \bigcup_{n=1}^\infty F_n$ is an Fσ which
is contained in E and the complement E \ S has measure zero in view of
m (E \ S) ≤ m (E \ Fn ) ≤ 1/n being true for all n ≥ 1. Write E = S ∪ (E \ S).

2.5 Construction of a non-measurable set


We construct a non-measurable set in d = 1. We start with X = [0, 1] and
define the following equivalence relation on X.

x∼y if x − y ∈ Q. (Note that x − y ∈ [−1, 1].)

It is easy to check this is indeed an equivalence relation. By the fundamen-
tal theorem of equivalence relations X can be partitioned as the union of the
(disjoint) equivalence classes
\[ X = [0, 1] = \bigcup_{\alpha} E_\alpha . \]

To construct the non-measurable set N we choose from every equivalence class
Eα precisely one element xα (this uses the axiom of choice). The set N is the
collection of these chosen xα :
\[ N := \{x_\alpha\} . \]

We claim that N is not measurable. Suppose it were. Take $(r_k)_{k=1}^\infty$ an enumeration
of the rationals in [−1, 1] and consider the translates

Nk := N + rk

• We first note that the Nk are disjoint. Indeed, if u ∈ Nk ∩ Nk′ for k ≠ k′,
then

u = xα + rk = xβ + rk′ and hence xα − xβ = rk′ − rk ≠ 0,

from which we conclude α ≠ β and that xα and xβ are in the same equivalence
class, which is impossible, since we selected precisely one element
from each equivalence class.
• We have Nk ⊂ [−1, 2] for any k and in particular $\bigcup_{k=1}^\infty N_k \subset [-1, 2]$.
This follows easily from the fact that N ⊂ [0, 1] and rk ∈ [−1, 1].
• We have $[0, 1] \subset \bigcup_k N_k$.
To see this, let x ∈ [0, 1]. Since x sits in some equivalence class, we have
x = xα + rk for some xα and some rk ∈ [−1, 1]. It follows that x ∈ Nk for
some k.
• We have m (N ) = m (Nk ) by translation invariance.
Combining the facts above we have
\[ [0, 1] \subset \bigcup_k N_k \subset [-1, 2] \]
with the union being a disjoint union. Monotonicity and the countable additivity
for measurable sets imply that
\[ 1 \le \sum_{k=1}^\infty m(N_k) \le 3 . \]
This is a contradiction both in the case where m (N ) = m (Nk ) = 0 (the sum is then 0)
and when m (N ) = m (Nk ) > 0 (the sum is then infinite).
Remark 2.2. We can use the non measurable set N to show that finite addi-
tivity generally fails for the exterior measure as was claimed in Section 2.3.2.

2.6 Measurable Functions
From measurable sets we now turn to measurable functions. To motivate the
definition, it is actually worth taking a step back viewing things from a slightly
more abstract point of view.

2.6.1 Some abstract preliminary remarks


Remember that a topological space (X, τ ) is a set X together with a collection τ
of subsets of X (which contains the empty set and X itself) which are declared
to be open and which are closed under arbitrary unions and finite intersections.
If X and Y are topological spaces, then f : X → Y is continuous if f −1 (V) is
open for any open set V in Y .
We can define a measure space (X, M) as a set X together with a collection of
subsets M (which contains the empty set and X itself) which are declared to be
measurable and which are closed under countable unions, countable intersections
and taking complements.
If X is a measure space and Y is a topological space, then we say that
f : X → Y is measurable provided that f −1 (V) is measurable for all open sets
V in Y .
In what we do in the next 4-5 weeks, X will always be Rd and Y will either
be R or the extended reals $\overline{\mathbb{R}} = [-\infty, \infty]$. Recall the latter is a topological space whose open
sets are the segments (a, b), [−∞, a), (b, ∞] for a, b ∈ R and unions thereof.
Because for us Y = R or Y = $\overline{\mathbb{R}}$, you can in principle forget about the
abstract point of view just described. However, it is useful to have it at the
back of your mind.

2.6.2 Definitions and equivalent formulations of measurability


For E a measurable subset of Rd we will consider functions f : Rd ⊃ E → R,
called finite (real)-valued functions, and f : Rd ⊃ E → $\overline{\mathbb{R}}$, called extended real-valued
functions. We’ll be most interested in functions defined on all of Rd ,
i.e. the case E = Rd .
Definition 2.6. Let Y be a topological space (for us R or $\overline{\mathbb{R}}$). A function
f : Rd ⊃ E → Y is measurable if f −1 (V) is measurable for all V open in Y .
Proposition 2.5. Let Y = R. Then the following are equivalent
1. f is measurable
2. f −1 ((a, ∞)) is measurable for all a ∈ R
3. f −1 ([a, ∞)) is measurable for all a ∈ R
4. f −1 ((−∞, a)) is measurable for all a ∈ R
5. f −1 (a, b) is measurable for all a, b ∈ R.
6. ...
Proof. (1) =⇒ (2) is trivial. To get (3) from (2) we observe
\[ f^{-1}\big([a, \infty)\big) = f^{-1}\Big(\bigcap_{n=1}^\infty \big(a - \tfrac{1}{n}, \infty\big)\Big) = \bigcap_{n=1}^\infty f^{-1}\Big(\big(a - \tfrac{1}{n}, \infty\big)\Big), \]
where we noted that the inverse image commutes with arbitrary intersections.
To get (4) from (3) we note
\[ f^{-1}\big((-\infty, a)\big) = f^{-1}\big([a, \infty)^c\big) = \Big(f^{-1}\big([a, \infty)\big)\Big)^c , \]
using that the inverse image commutes with taking the complement. I leave the
implication (4) to (5) to you and we conclude (5) =⇒ (1) by noting that any
open set U can be written as $\bigcup_{n=1}^\infty (a_n, b_n)$ (why?).

Using the same arguments one proves


Proposition 2.6. Let Y = $\overline{\mathbb{R}}$. Then the following are equivalent
1. f is measurable
2. f −1 ((a, ∞]) is measurable for all a ∈ R
3. f −1 ([a, ∞]) is measurable for all a ∈ R
4. f −1 ([−∞, a)) is measurable for all a ∈ R
5. f −1 ((a, b)) is measurable for all a, b ∈ R and f −1 ({∞}) and f −1 ({−∞}) are
measurable
6. ...
Proof. Exercise.
To sum up this discussion from a practical point of view, in order to check
whether a given function f : Rd ⊃ E → R is measurable it suffices to check
whether the sets {x ∈ E | f (x) > a} are measurable for all a. Note the latter
sets are equal to f −1 ((a, ∞]) if Y = $\overline{\mathbb{R}}$ and equal to f −1 ((a, ∞)) if Y = R.
Alternatively it suffices to check that the sets {x ∈ E | f (x) < a} are measurable
for all a. Etc.
The key ingredient in the above propositions was that the inverse image
commutes with arbitrary unions, intersections and complements combined with
the σ-algebra structure on the measurable sets. We can formulate this a bit
more abstractly as follows:
Proposition 2.7. The following are equivalent:
1. The function f : Rd ⊃ E → Y is measurable
2. The set f −1 (F ) is measurable for any F ∈ BY , the Borel σ-algebra of Y .
3. The set f −1 (F ) is measurable for any F ∈ F where F is any collection
of sets generating BY , the Borel σ-algebra of Y .
Proof. The implications (2) =⇒ (3) and (2) =⇒ (1) are immediate.
Let N be the collection of all subsets F of Y such that f −1 (F ) is measurable.
One easily checks that this is a σ-algebra on Y using that the
inverse image commutes with unions, intersections and complements (do it!).
(1) =⇒ (2): If f is measurable, then the σ-algebra N must contain the
open sets in Y and since the Borel σ-algebra is by definition the smallest such
σ-algebra we have BY ⊂ N and in particular the inverse image of any Borel set
must be measurable.
(3) =⇒ (2): If (3) holds, then F ⊂ N . Hence M (F ) ⊂ N and since
M (F ) = BY we are done.

Note that in view of Proposition 2.7, Proposition 2.5 follows immediately
from the observation that the intervals appearing in it (individually) generate
BR and similarly for Proposition 2.6.

2.6.3 Properties 1: Behaviour under compositions


Proposition 2.8. Let Y be a topological space.
1. If f : Rd → Y is continuous, then it is measurable.
2. If Z is a topological space and Φ : Y → Z is continuous, then the compo-
sition Φ ◦ f is measurable if f : Rd → Y is measurable.
Proof. For (1) we simply note that by continuity $f^{-1}(V)$ is open (hence measurable)
for any V ⊂ Y open. For (2) we observe that $(\Phi \circ f)^{-1}(W) = f^{-1}\big(\Phi^{-1}(W)\big)$
and since $\Phi^{-1}(W)$ is open for any open W ⊂ Z we conclude
that the composition is measurable if f is.


Warning: The composition of two measurable functions does not have to be
measurable, even if the Φ in the composition f ◦ Φ is continuous. However, the
composition of two Borel measurable functions is Borel measurable.

2.6.4 Properties 2: Behaviour under limits


We recall that if (an ) is a sequence in R we can consider the sequence

bk = sup (ak , ak+1 , ....)

which is clearly a non-increasing sequence in k: b1 ≥ b2 ≥ ..... In particular we
can define
\[ B = \inf (b_1, b_2, \ldots) . \]
We call B the upper limit of the (an ) and use the notation
\[ B = \limsup_{n\to\infty} a_n = \inf_{k\ge 1} \sup_{n\ge k} a_n . \tag{7} \]
It can be proved that there exists a subsequence of (an ) that converges to B
and that B is the largest number with this property.
The lim inf an is defined analogously replacing sup by inf and conversely in
the above definition.
Recall that lim inf an = lim sup an if and only if the limit lim an exists.
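To make the definition of the upper limit concrete, here is a small Python sketch computing the tail suprema bk for a finite initial segment of a sequence. Truncating to finitely many terms is an assumption made for illustration only (for the chosen example the tail suprema of the truncated list coincide with the true ones), and `tail_suprema` is a hypothetical helper name.

```python
from fractions import Fraction

def tail_suprema(a):
    """Return b_k = sup(a_k, a_{k+1}, ...) for a non-empty finite list a."""
    b = [None] * len(a)
    running = a[-1]
    # Walk backwards so each tail supremum reuses the previous one.
    for k in range(len(a) - 1, -1, -1):
        running = max(running, a[k])
        b[k] = running
    return b

# a_n = (-1)^n (1 + 1/n) for n = 1, ..., 2000.
# Here lim sup a_n = 1 and lim inf a_n = -1, so the limit does not exist.
a = [(-1) ** n * (1 + Fraction(1, n)) for n in range(1, 2001)]
b = tail_suprema(a)
print(float(b[0]), float(b[-1]))
```

The list `b` is non-increasing and stays above 1, in line with B = inf (b1 , b2 , ....) = 1 for this sequence.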
Suppose now (fn ) is a sequence of measurable functions fn : Rd → R. Then
we can define pointwise the functions
   
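\[ \big(\sup_n f_n\big)(x) := \sup_n \big(f_n(x)\big) \quad\text{and}\quad \big(\inf_n f_n\big)(x) := \inf_n \big(f_n(x)\big), \]
\[ \big(\limsup_{n\to\infty} f_n\big)(x) := \limsup_{n\to\infty} \big(f_n(x)\big) \quad\text{and}\quad \big(\liminf_{n\to\infty} f_n\big)(x) := \liminf_{n\to\infty} \big(f_n(x)\big). \]

Theorem 2.5. If (fn ) is a sequence of measurable functions as above, then the
four functions above are all measurable.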

Proof. We observe that supn fn is measurable if the set {x| supn (fn (x)) > a}
is measurable for all a ∈ R. But the latter set can be written as ∪n {x|fn (x) >
a} = ∪n fn−1 ((a, ∞]) and this set is clearly measurable. The inf is done analo-
gously and for the lim sup it follows from the definition (7).
Corollary 2.1. If the sequence in the theorem converges to f , i.e. limn→∞ fn (x) =
f (x) for every x, then f is measurable.
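The set identity used in the proof of Theorem 2.5, namely that {x | supn fn (x) > a} is the union over n of the sets {x | fn (x) > a}, can be sanity-checked on sample points. The sketch below assumes a finite sample family fn (x) = sin(nx) and a finite grid of points, both chosen purely for illustration.

```python
import math

def f(n, x):
    """Sample family f_n(x) = sin(n x); any family of functions would do."""
    return math.sin(n * x)

ns = range(1, 21)
xs = [i / 10 for i in range(-50, 51)]
levels = [-0.9, -0.5, 0.0, 0.5, 0.9]

def sup_superlevel(a):
    """The set {x | sup_n f_n(x) > a}, restricted to the sample points."""
    return {x for x in xs if max(f(n, x) for n in ns) > a}

def union_of_superlevels(a):
    """The union over n of {x | f_n(x) > a}, restricted to the sample points."""
    return {x for n in ns for x in xs if f(n, x) > a}

identity_holds = all(sup_superlevel(a) == union_of_superlevels(a) for a in levels)
print(identity_holds)
```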

2.6.5 Properties 3: Behaviour of sums and products


Theorem 2.6. Let Y = R or Y = $\overline{\mathbb{R}}$. If f : Rd ⊃ E → Y and g : Rd ⊃ E → Y
are measurable, then
1. −f is measurable
2. f 2 is measurable
3. k · f for k ∈ R is measurable; f + g and f · g are measurable if both f and g
are finite valued (i.e. Y = R).
4. The functions max (f, g), min (f, g) and |f | are measurable
Proof. The first follows from {x | − f (x) > a} = {x | f (x) < −a} for every
a ∈ R. For the second, we have for a ≥ 0 the identity
\[ \{x \mid f^2(x) > a\} = \{x \mid f(x) > a^{1/2}\} \cup \{x \mid f(x) < -a^{1/2}\} \]
and for a < 0 that {x | f 2 (x) > a} = {x | f 2 (x) > 0} ∪ {x | f (x) = 0}. For the
third assertion, observe that for k > 0
\[ \{x \mid k \cdot f(x) > a\} = \Big\{x \;\Big|\; f(x) > \frac{a}{k}\Big\} \]
(for k < 0 the inequality on the right is reversed, and the case k = 0 is trivial), that
\[ \{x \mid (f + g)(x) > a\} = \bigcup_{r \in \mathbb{Q}} \{x \mid f(x) > a - r\} \cap \{x \mid g(x) > r\}, \]
and the formula
\[ fg = \frac{1}{4}\Big[(f + g)^2 - (f - g)^2\Big]. \]
For the fourth we note that
\[ \{x \mid \max(f, g)(x) > a\} = \{x \mid f(x) > a\} \cup \{x \mid g(x) > a\}, \]
\[ \{x \mid \min(f, g)(x) > a\} = \{x \mid f(x) > a\} \cap \{x \mid g(x) > a\}, \]
and the formula |f | = max (f, 0) − min (f, 0) for the absolute value.
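The identities behind the fourth assertion and the product formula can be checked on sample points: max (f, g) exceeds a exactly where f or g does, and min (f, g) exceeds a exactly where both do. The following Python sketch does this with exact rational arithmetic; the functions f and g and the grids are arbitrary choices made for illustration.

```python
from fractions import Fraction

f = lambda x: x * x - 1
g = lambda x: 2 - x

xs = [Fraction(i, 4) for i in range(-20, 21)]
levels = [Fraction(j, 2) for j in range(-6, 7)]

def superlevel(func, a):
    """The set {x | func(x) > a}, restricted to the sample points xs."""
    return {x for x in xs if func(x) > a}

# {max(f, g) > a} is the union of {f > a} and {g > a}.
max_ok = all(
    superlevel(lambda x: max(f(x), g(x)), a) == superlevel(f, a) | superlevel(g, a)
    for a in levels
)
# {min(f, g) > a} is the intersection of {f > a} and {g > a}.
min_ok = all(
    superlevel(lambda x: min(f(x), g(x)), a) == superlevel(f, a) & superlevel(g, a)
    for a in levels
)
# fg = ((f + g)^2 - (f - g)^2) / 4 and |f| = max(f, 0) - min(f, 0).
product_ok = all(
    f(x) * g(x) == ((f(x) + g(x)) ** 2 - (f(x) - g(x)) ** 2) / 4 for x in xs
)
abs_ok = all(abs(f(x)) == max(f(x), 0) - min(f(x), 0) for x in xs)
print(max_ok, min_ok, product_ok, abs_ok)
```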

2.6.6 The notion of “almost everywhere”


Let f : E → Y and g : E → Y be two functions. We say that f is equal to
g almost everywhere if the set {x | f (x) ≠ g(x)} has measure zero. More
generally we define a statement to hold almost everywhere, if it holds except for
a measure zero set.
We conclude that if f = g for almost every x ∈ E as above, then f is
measurable if g is. This follows as the sets {x | f (x) > a} and {x | g(x) > a}

differ by a set of measure zero (which is measurable). Indeed, if N is the set
where the two sets differ, then

{x | f (x) > a} = ({x | g (x) > a} ∩ N c ) ∪ ({x | f (x) > a} ∩ N )

and the first set is measurable by the measurability of g and the second because
it is a subset of a set of measure 0.

2.7 Building blocks of integration theory


Given a set E ⊂ Rd we define the characteristic function of E as

\[ \chi_E(x) = \begin{cases} 1 & \text{if } x \in E, \\ 0 & \text{if } x \notin E. \end{cases} \]

It is easy to see that the function χE is measurable if and only if the set E is
measurable.

2.7.1 Simple functions


A simple function is defined as a finite linear combination (with real coefficients)
of characteristic functions, i.e.
\[ f(x) = \sum_{k=1}^N a_k \chi_{E_k}(x) \]
for some N ∈ N, where ak ∈ R and the Ek are measurable sets. Equivalently,
we may define a simple function as a measurable function f : Rd → R, which
assumes only finitely many distinct values. Note that a simple function has
a canonical expression where all the ak are distinct and the Ek are disjoint
measurable sets (why?). The simple functions will constitute the building blocks
for the Lebesgue integral that we are about to define.

2.7.2 Step functions


Step functions are a narrower class of functions and form the building blocks for
the Riemann integral that you already know. They are defined as finite linear
combinations of characteristic functions of rectangles, i.e.
\[ f(x) = \sum_{k=1}^N a_k \chi_{R_k}(x) \]

with the Rk being rectangles.

2.8 Approximation Theorems


2.8.1 Approximating a measurable function by simple functions
Our first approximation theorem concerns approximating non-negative measur-
able functions by simple functions. (It works in the same way whether the range
is [0, ∞] or [0, ∞).)

Theorem 2.7. Let f : Rd → [0, ∞] be measurable and non-negative. Then
there exists an increasing sequence of non-negative simple functions $(\varphi_k)_{k=1}^\infty$
that converges pointwise to f :
\[ \varphi_k(x) \le \varphi_{k+1}(x) \ \text{for all } k \text{ and } x \quad\text{and}\quad \lim_{k\to\infty} \varphi_k(x) = f(x) \ \text{for all } x. \]

Proof. We first truncate f as follows: We let QN denote the cube of side length N
centred at the origin and define
\[ F_N(x) = \begin{cases} f(x) & \text{if } x \in Q_N \text{ and } f(x) \le N, \\ N & \text{if } x \in Q_N \text{ and } f(x) > N, \\ 0 & \text{otherwise.} \end{cases} \]
It is easy to see that FN is still measurable and that FN (x) → f (x) for every
x as N → ∞. We now approximate FN by a sequence of simple functions by
partitioning the range as follows: We decompose the range [0, N ] into N · M
intervals of length 1/M and define
\[ E_{\ell,M} = \Big\{ x \in Q_N \;\Big|\; \frac{\ell}{M} < F_N(x) \le \frac{\ell + 1}{M} \Big\} \quad \text{for } 0 \le \ell < NM . \]
These sets are all measurable and we can thus define the simple function
\[ F_{N,M}(x) = \sum_{\ell=0}^{NM-1} \frac{\ell}{M}\, \chi_{E_{\ell,M}}(x) . \]
Note that by construction we have $0 \le F_N(x) - F_{N,M}(x) \le \frac{1}{M}$ for all x. We finally
choose $M = N = 2^k$ and define a sequence of simple functions via
\[ \varphi_k(x) := F_{2^k, 2^k}(x) . \]
By construction we have $0 \le F_{2^k}(x) - \varphi_k(x) \le \frac{1}{2^k}$ for all x and $\varphi_k(x) \to f(x)$ for
all x as k → ∞. Finally, $\varphi_k$ is also increasing (why?).
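The construction in the proof can be carried out explicitly. The following Python sketch evaluates the k-th approximant at a point, assuming d = 1 (so that QN is the interval [−N/2, N/2]) and using exact rational arithmetic; the name `phi` and its signature are our own choices, not part of the notes.

```python
from fractions import Fraction
import math

def phi(f, k, x):
    """Value at x of the simple function F_{N,M} with N = M = 2**k (case d = 1)."""
    N = M = 2 ** k
    # Q_N is the interval of side length N centred at the origin.
    if abs(x) > Fraction(N, 2):
        return Fraction(0)
    y = min(f(x), N)  # the truncation F_N(x)
    if y <= 0:
        return Fraction(0)
    # y lies in (l/M, (l+1)/M] exactly for l = ceil(M*y) - 1.
    l = math.ceil(M * y) - 1
    return Fraction(l, M)

f = lambda x: x * x
x = Fraction(5, 7)
print([phi(f, k, x) for k in range(6)], f(x))
```

For a fixed x the printed values increase with k and approach f (x) from below, with an error of at most 1/2^k once x lies in the cube and f (x) ≤ 2^k.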
We next remove the assumption that f should be non-negative:

Theorem 2.8. Let f : Rd → [−∞, ∞] be measurable. Then there exists a
sequence of simple functions $(\varphi_k)_{k=1}^\infty$ that satisfies
\[ |\varphi_k(x)| \le |\varphi_{k+1}(x)| \ \text{for all } k \text{ and } x \quad\text{and}\quad \lim_{k\to\infty} \varphi_k(x) = f(x) \ \text{for all } x. \]
In particular, $|\varphi_k(x)| \le |f(x)|$ holds for all x and all k.



Proof. We decompose $f(x) = f^+(x) - f^-(x)$ where $f^+(x) := \max(f(x), 0)$
and $f^-(x) := \max(-f(x), 0)$ are the positive and the negative part of f
respectively. Recall that the latter are measurable by Theorem 2.6. Since both
$f^+$ and $f^-$ are non-negative, we can apply the previous Theorem providing us
with simple functions $\varphi_k \to f^+$ and $\psi_k \to f^-$. Hence $\Phi_k := \varphi_k - \psi_k$ satisfies
$\Phi_k(x) \to f(x)$ for all x as k → ∞. Finally, note that $|\Phi_k(x)| = \varphi_k(x) + \psi_k(x)$
(since if $\varphi_k(x) \ne 0$ for some x then $\psi_k(x) = 0$ for this x and conversely!), so
$|\Phi_k|$ is indeed increasing. The last claim is straightforward.

2.8.2 Approximating a measurable function by step functions
We can also approximate a measurable function by the simpler step functions.
The price we pay is that the convergence is only almost everywhere. Recall
the measurable function (2) from the introduction to illustrate that one can-
not expect the convergence to hold everywhere when approximating with step
functions.
We first isolate the following proposition:
Proposition 2.9. Let $h = \sum_{k=1}^N a_k \chi_{E_k}$ be a simple function with m (Ek ) < ∞
for all k. Then for any ε > 0 there exists a step function ϕ such that
\[ m\big(\{x \mid \varphi(x) \ne h(x)\}\big) \le \epsilon . \]
In other words, a step function approximates the simple function up to a set
of arbitrarily small measure.
Proof. One first convinces oneself that it suffices to prove the result for h = χE
and E measurable with m (E) < ∞ (why?). To prove the latter, recall from
Example Sheet 2 that one can approximate E by finitely many closed cubes such
that $m\big(E \,\triangle\, \bigcup_{j=1}^N Q_j\big) \le \frac{\epsilon}{2}$. Moreover one can (using the usual procedure of
extending the sides of the cubes to form almost disjoint rectangles) find disjoint
rectangles with $\bigcup_{j=1}^N Q_j = \bigcup_{j=1}^M R_j$. We then define $\varphi(x) = \sum_{k=1}^M \chi_{R_k}(x)$.

Theorem 2.9. Let f : Rd → R be measurable. Then there exists a sequence of
step functions $(\varphi_k)_{k=1}^\infty$ that converges pointwise to f for almost every x.
Proof. We first note that f can be approximated by a sequence (ψk ) of simple
functions by Theorem 2.8 (with all Ej appearing in each of the ψk having
finite measure, as in the construction of Theorem 2.7). For each ψk we apply
Proposition 2.9 to find a step function ϕk approximating ψk such that
\[ m\big(\{x \mid \varphi_k(x) \ne \psi_k(x)\}\big) \le \frac{1}{2^k} \quad \text{for all } k. \]
We claim that ϕk → f except on a set of measure 0. To see this, we use the
Borel-Cantelli Lemma (Example Sheet 2). Indeed, the fact that
\[ \sum_{k=1}^\infty m\big(\{x \mid \varphi_k(x) \ne \psi_k(x)\}\big) < \infty \]
allows us to conclude that the set
\[ M = \{x \mid \varphi_k(x) \ne \psi_k(x) \text{ for infinitely many } k\} \]
has measure zero. We have Rd = M ∪ M c and for any x ∈ M c we have ϕk (x) =
ψk (x) for all k ≥ K for sufficiently large K. Hence ϕk (x) → f (x) as k → ∞ for
any x ∈ M c .

2.9 Littlewood's Three Principles
Before we turn to the integration theory let us introduce three heuristic prin-
ciples which nicely summarise the relation of the new notions of “measurable
set” and “measurable function” that we introduced above to the more familiar
notions of rectangles and continuous functions:

1. Every measurable set is nearly a finite union of closed cubes. (See Example
Sheet 2, Exercise 3c for the precise statement.)
2. Every measurable function is nearly continuous.
3. Every convergent sequence of measurable functions is nearly uniformly
convergent.

Let us first make item 3 more precise (Egoroff’s Theorem) and then use it
to formulate and prove item 2 precisely (Lusin’s Theorem).
Theorem 2.10 (Egoroff). Suppose (fk ) is a sequence of measurable functions
fk : E → R with E measurable and m (E) < ∞. Assume that

fk → f a.e. on E.

Given ε > 0 we can find a closed set Aε ⊂ E such that m (E \ Aε ) ≤ ε and
fk → f uniformly on Aε .
Remark 2.3. Here is an example illustrating the theorem. Let fn : [−1, 1] → R,
\[ f_n(x) = \begin{cases} \dfrac{1}{nx} & \text{if } x > 0, \\ 0 & \text{if } x \le 0. \end{cases} \]
Then fn (x) → 0 for all x but not uniformly near zero.
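For this example the closed set promised by Egoroff's Theorem can be written down by hand: removing the interval (0, δ) leaves the closed set [−1, 0] ∪ [δ, 1], on which the supremum of |fn | equals 1/(nδ). The Python sketch below records this, reading the example as fn (x) = 1/(nx) for x > 0; the helper name `sup_on_A` is hypothetical.

```python
from fractions import Fraction

def f(n, x):
    """f_n(x) = 1/(n x) for x > 0 and 0 for x <= 0, on [-1, 1]."""
    return 1 / (n * x) if x > 0 else Fraction(0)

def sup_on_A(n, delta):
    """Supremum of |f_n| over the closed set [-1, 0] u [delta, 1].

    f_n is decreasing on (0, 1], so the supremum is attained at x = delta.
    """
    return f(n, delta)

delta = Fraction(1, 10)  # the removed set (0, delta) has measure delta
print([sup_on_A(n, delta) for n in (1, 10, 100, 1000)])  # tends to 0
print([f(n, Fraction(1, n * n)) for n in (1, 10, 100)])   # grows like n near zero
```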


Remark 2.4. If m (E) = ∞ the result may not hold as the example of the mov-
ing bump fn = χ[n,n+1] illustrates. The fn converge to zero but not uniformly,
even after removing a set of large measure.
Remark 2.5. The theorem remains true for the target space being the extended
reals $\overline{\mathbb{R}}$. (How do we define uniform convergence in this case?)
Proof. We first observe that it suffices to prove the result for fk → f everywhere
on E (see footnote 3 below). For later purposes we also define an N (depending only on ε > 0) by
$\sum_{n=N}^\infty \frac{1}{2^n} < \frac{\epsilon}{2}$. Now let us start the proof. We define the set
\[ E_k^n = \Big\{ x \in E \;\Big|\; |f_j(x) - f(x)| < \frac{1}{n} \text{ for all } j > k \Big\} . \]
The set $E_k^n$ contains all x in the domain for which fj (x) is already 1/n-close to
the value of the limit function f (x) for all j bigger than k. Note that for fixed
n, any x is eventually contained in some $E_k^n$ by the assumption that fj → f
everywhere, so in particular $E = \bigcup_k E_k^n$ for any n. We also have $E_k^n \subset E_{k+1}^n$
and hence Proposition 2.3 applies, giving $\lim_{k\to\infty} m(E_k^n) = m(E)$. In view of
$m(E_k^n) + m(E \setminus E_k^n) = m(E)$ we conclude that for any n there exists a $k_n$ such
that $m(E \setminus E_k^n) \le \frac{1}{2^n}$ holds for all $k \ge k_n$. We now define
\[ \tilde{A}_\epsilon = \bigcap_{n \ge N} E_{k_n}^n . \]
The point is that with this definition, fj converges uniformly to f on $\tilde{A}_\epsilon$. To see
this, recall that what we have to show is that given δ > 0 there exists a J such that
|fj (x) − f (x) | < δ holds for all j > J and all $x \in \tilde{A}_\epsilon$. Now indeed if we choose
n ≥ N with 1/n < δ we have for any $x \in \tilde{A}_\epsilon$ (hence $x \in E_{k_n}^n$ for any n ≥ N ) the
inequality
\[ |f_j(x) - f(x)| < \frac{1}{n} < \delta \quad \text{for all } j > k_n . \]
Note that $k_n$ just depends on δ (and ε).
Finally, we claim that $m(E \setminus \tilde{A}_\epsilon) < \frac{\epsilon}{2}$. This simply follows from observing
that
\[ E \setminus \bigcap_{n \ge N} E_{k_n}^n = \bigcup_{n \ge N} \big( E \setminus E_{k_n}^n \big) \]
and using subadditivity of the measure together with our choice of N . To finish
the proof, we choose a closed subset $A_\epsilon \subset \tilde{A}_\epsilon$ with $m(\tilde{A}_\epsilon \setminus A_\epsilon) < \frac{\epsilon}{2}$ (cf. Lemma
2.3). As a result we have $m(E \setminus A_\epsilon) \le m(E \setminus \tilde{A}_\epsilon) + m(\tilde{A}_\epsilon \setminus A_\epsilon) < \epsilon$ and of
course fj converge uniformly on the subset $A_\epsilon \subset \tilde{A}_\epsilon$.

Footnote 3: Indeed, if we prove the result for this case, then for general f we change f on a set of
measure 0 (say N ) to $\tilde{f}$ such that $f_k \to \tilde{f}$ everywhere. Applying the result we obtain a closed
set Aε with $f_k \to \tilde{f}$ uniformly on Aε and m (E \ Aε ) < ε. We then choose an open set U with
N ⊂ U and m (U ) ≤ ε. We now have that $f_k \to \tilde{f} = f$ uniformly on the closed set Aε ∩ U c
and also that m (E \ (Aε ∩ U c )) ≤ m (E \ Aε ) + m (U ) ≤ ε + ε = 2ε.
We are now ready to formulate and prove the second of Littlewood’s principles.
Theorem 2.11 (Lusin). Let f : E → R be measurable and finite valued, with
m (E) < ∞. Then for any ε > 0 there exists a closed set Fε with Fε ⊂ E and
m (E \ Fε ) ≤ ε such that the restriction $f|_{F_\epsilon}$ is continuous.

Remark 2.6. Note that the theorem does not make the (stronger) statement
that f is continuous on E at the points of Fǫ . Again the example of (2) is useful
here. That function is discontinuous at all points of [0, 1]. What is Fǫ ?
Proof. We use the result from Example Sheet 3 that every measurable function
is the pointwise limit almost everywhere of a sequence of continuous functions
together with Egoroff’s Theorem.
Given f we find (fn ) continuous with fn → f almost everywhere. Using
Egoroff’s Theorem we find a closed set Fε ⊂ E where the convergence is uniform
and such that m (E \ Fε ) ≤ ε. But the limit of a uniformly convergent sequence
of continuous functions is continuous, proving that $f|_{F_\epsilon}$ is continuous.

3 Integration Theory: The Lebesgue Integral
From now on all functions that appear are assumed to be measurable. Our goal
is to define the Lebesgue integral of a measurable function. To achieve this, we
proceed in stages. We first define the integral for simple functions, then for bounded
functions supported on a set of finite measure, then for non-negative functions and
finally in the general case.

3.1 Simple Functions


Given a simple function in canonical form (ak distinct and non-zero, Ek disjoint)
\[ \varphi(x) = \sum_{k=1}^N a_k \chi_{E_k}(x) \]
(recall by definition m (Ek ) < ∞) we define the Lebesgue integral of ϕ as
\[ \int_{\mathbb{R}^d} \varphi(x)\,dx := \sum_{k=1}^N a_k\, m(E_k) . \tag{8} \]

We can also define the integral of ϕ over a measurable subset E ⊂ Rd with
m (E) < ∞ by observing that ϕ (x) · χE (x) is still a simple function (why?) and
setting
\[ \int_E \varphi(x)\,dx := \int_{\mathbb{R}^d} \varphi(x) \cdot \chi_E(x)\,dx . \tag{9} \]

Note that with this definition we can already integrate the function (2) from
the introduction since the rationals in [0, 1] are measurable with measure zero.
Proposition 3.1. Let ϕ, ψ be simple functions. The Lebesgue integral defined
as in (8) satisfies
1. Independence of the representation: If $\varphi = \sum_{k=1}^N a_k \chi_{E_k}$ is any representation
of ϕ (not necessarily canonical), then
\[ \int_{\mathbb{R}^d} \varphi(x)\,dx = \sum_{k=1}^N a_k\, m(E_k) . \]
2. Linearity: For λ, µ ∈ R we have
\[ \int (\lambda\varphi + \mu\psi) = \lambda \int \varphi + \mu \int \psi . \]
3. Additivity: If E and F are disjoint subsets of finite measure, then
\[ \int_{E \cup F} \varphi = \int_E \varphi + \int_F \varphi . \]
4. Monotonicity:
\[ \varphi \le \psi \text{ on } \mathbb{R}^d \implies \int_{\mathbb{R}^d} \varphi \le \int_{\mathbb{R}^d} \psi . \]
5. Triangle inequality: If ϕ is simple, then so is |ϕ| and
\[ \Big| \int \varphi \Big| \le \int |\varphi| . \]
We have allowed ourselves to write the shorthand $\int$ instead of $\int_{\mathbb{R}^d}$ above.
Proof. The proof of (1) is a bit fiddly and is relegated to Example Sheet 4. For
(2) we note that, writing $\psi = \sum_{\ell=1}^M b_\ell \chi_{F_\ell}$,
\[ \lambda\varphi + \mu\psi = \lambda \sum_{k=1}^N a_k \chi_{E_k} + \mu \sum_{\ell=1}^M b_\ell \chi_{F_\ell} \]
and by the independence of the representation proven in (1) we have
\[ \int (\lambda\varphi + \mu\psi) = \lambda \sum_{k=1}^N a_k\, m(E_k) + \mu \sum_{\ell=1}^M b_\ell\, m(F_\ell) = \lambda \int \varphi + \mu \int \psi . \]
For (3) we simply note χE∪F = χE + χF and use the linearity established in
(2). For (4) we note that if η ≥ 0 then the canonical form is everywhere
non-negative and hence $\int \eta \ge 0$ by definition. Setting η = ψ − ϕ and using linearity
the result follows. For (5), we put ϕ in canonical form, $\varphi = \sum_{k=1}^N a_k \chi_{E_k}$, hence
$|\varphi| = \sum_{k=1}^N |a_k| \chi_{E_k}$ is simple and observe
\[ \Big| \int \varphi \Big| = \Big| \sum_{k=1}^N a_k\, m(E_k) \Big| \le \sum_{k=1}^N |a_k|\, m(E_k) = \int |\varphi| . \]
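Independence of the representation and linearity can be tried out on simple functions whose sets Ek are intervals in R. The sketch below assumes half-open intervals [left, right) and represents a simple function as a list of pairs (ak , Ek ); all function names are our own illustrative choices.

```python
from fractions import Fraction

def integral(terms):
    """Sum of a_k * m(E_k) for terms (a_k, (left, right)), E_k an interval."""
    return sum((a * (right - left) for a, (left, right) in terms), Fraction(0))

def evaluate(terms, x):
    """Value of sum_k a_k * chi_{E_k}(x), with E_k = [left, right)."""
    return sum((a for a, (left, right) in terms if left <= x < right), Fraction(0))

def linear_combination(lam, phi, mu, psi):
    """A (generally non-canonical) representation of lam*phi + mu*psi."""
    return [(lam * a, E) for a, E in phi] + [(mu * b, F) for b, F in psi]

# phi = chi_[0,2) + chi_[1,3): a representation with overlapping sets ...
overlapping = [(Fraction(1), (0, 2)), (Fraction(1), (1, 3))]
# ... and the same function written with pairwise disjoint sets.
disjoint = [(Fraction(1), (0, 1)), (Fraction(2), (1, 2)), (Fraction(1), (2, 3))]
psi = [(Fraction(5), (0, 1))]

print(integral(overlapping), integral(disjoint))
print(integral(linear_combination(2, overlapping, -3, psi)))
```

Both representations of ϕ give the same integral, and the integral of 2ϕ − 3ψ equals twice the integral of ϕ minus three times the integral of ψ.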

Observation 3.1. If f = g almost everywhere for f, g simple, then $\int f = \int g$.
Indeed, if h = 0 almost everywhere, then its canonical representation must
look like $h = \sum_{k=1}^N a_k \chi_{E_k}$ with m (Ek ) = 0, which implies $\int h = 0$.

3.2 Bounded functions on sets of finite measure


We now extend our definition of the Lebesgue integral from simple functions to
bounded functions supported on sets of finite measure.
We define the support of f : Rd → R to be
supp f := {x | f (x) ≠ 0}
Note that the set supp f is measurable if f is measurable. We say that f is
supported on E if f (x) = 0 whenever x ∉ E.
Lemma 3.1. Let f be a bounded function supported on a set E of finite measure.
If (ϕn ) is any sequence of simple functions bounded by M , supported on E, and
such that ϕn (x) → f (x) for almost every x, then
1. The limit $\lim_{n\to\infty} \int \varphi_n$ exists.
2. Moreover,
\[ f = 0 \text{ almost everywhere} \implies \lim_{n\to\infty} \int \varphi_n = 0 . \]

Remark 3.1. Note that by Theorem 2.8, given an f as in the Lemma, there
always exists a (ϕn ) satisfying the assumptions in the Lemma.
Proof. Defining $I_n = \int \varphi_n$ we set out to prove that In is a Cauchy sequence. Let
us fix ε > 0 arbitrary. Given the sequence (ϕn ) as in the Lemma, we first use
Egoroff’s Theorem to find a closed subset F of E where the convergence of ϕn
is uniform and such that m (E \ F ) ≤ ε. Given that the convergence is uniform
on F , we can find for the prescribed ε > 0 an N such that |ϕm (x) − ϕn (x) | ≤ ε
holds for all m, n ≥ N and all x ∈ F . But then for m, n ≥ N we have
\[ |I_m - I_n| \le \int_E |\varphi_m - \varphi_n| = \int_F |\varphi_m - \varphi_n| + \int_{E \setminus F} |\varphi_m - \varphi_n| \le \epsilon\, m(F) + 2M\epsilon , \]
where we used the triangle inequality for the second integral in the last step.
Since m (F ) ≤ m (E) < ∞ and ε is arbitrary we conclude that Im is Cauchy
and (1) is proven. For (2) one simply repeats the above argument now showing
that for any ε > 0 one can find N such that |In | ≤ εm (E) + εM for n ≥ N .
Using the Lemma we can define the Lebesgue integral for f : Rd → R a
bounded function supported on a set of finite measure:
Definition 3.1. Let f : Rd → R be a bounded function supported on a set E of
finite measure. We define its Lebesgue integral as
\[ \int_{\mathbb{R}^d} f(x)\,dx := \lim_{n\to\infty} \int_{\mathbb{R}^d} \varphi_n(x)\,dx \tag{10} \]
where ϕn is any sequence of simple functions satisfying |ϕn | ≤ M for some
constant M , each ϕn supported on E and ϕn (x) → f (x) for a.e. x as n → ∞.
We already know that the limit exists but of course we need to show it
is independent of the particular approximation (ϕn ). Suppose therefore we
have two sequences (ϕn ) and (ψn ) with the above properties. Then (ϕn − ψn )
converges to zero almost everywhere, is bounded by 2M and is supported on a set
of finite measure. By part (2) of Lemma 3.1 we conclude $\lim \int_{\mathbb{R}^d} \varphi_n = \lim \int_{\mathbb{R}^d} \psi_n$
(the individual limits existing by part (1)).
As in (9), we define the Lebesgue integral of a bounded function with
m (supp(f )) < ∞ over a measurable subset E ⊂ Rd of finite measure by
\[ \int_E f(x)\,dx := \int_{\mathbb{R}^d} f(x) \cdot \chi_E(x)\,dx . \tag{11} \]

Proposition 3.2. The Lebesgue integral on bounded functions supported on
sets of finite measure defined in (10) and (11) is linear, additive, monotone and
satisfies the triangle inequality (cf. Proposition 3.1 for the precise formulation).
Proof. Follows by approximating with simple functions and taking limits.
We can now prove our first convergence theorem for the Lebesgue integral
which is a statement about interchanging the limit with the integral.
Theorem 3.1 (Bounded Convergence Theorem). Let (fn ) be a sequence of
measurable functions fn : Rd → R with
• |fn (x) | ≤ M for all n,
• each fn supported on E with m (E) < ∞,
• fn (x) → f (x) for almost every x as n → ∞.
Then f is measurable, bounded a.e., supported on E for a.e. x and
\[ \int_E |f_n - f| \to 0 \quad \text{as } n \to \infty , \]
which by the triangle inequality immediately implies
\[ \int_E f_n \to \int_E f \quad \text{as } n \to \infty . \tag{12} \]

Remark 3.2. We will only prove the theorem with the additional assumption
that |f (x) | ≤ M holds for all x (it is easy to see the assumptions already imply
this a.e.). Otherwise $\int_E |f_n - f|$ appearing in the conclusion is not (yet) defined!
In Section 3.3 we will define the integral for unbounded functions and also see
that the integrand can always be changed on a set of measure 0 without affecting
the integral. Hence the conclusion of the Theorem remains true in this case.
Proof. The limiting function f is measurable combining Corollary 2.1 and the
remark in Section 2.6.6. We also have from combining the first and the third
item that |f (x) | ≤ M for a.e. x and combining the second and the third
that f is supported on E except for a set of measure 0. It remains to show
the estimate. The proof is almost identical to the one of Lemma 3.1. Given
ε > 0 we find (by Egoroff’s theorem) F ⊂ E with fn → f uniformly on F and
m (E \ F ) ≤ ε. By the uniformity on F we can find N such that
|fn (x) − f (x) | ≤ ε for all x ∈ F and all n ≥ N . But then
\[ \int_E |f_n - f|\,dx = \int_F |f_n - f|\,dx + \int_{E \setminus F} |f_n - f|\,dx \le \epsilon\, m(E) + 2M\, m(E \setminus F) \le \epsilon\,\big(m(E) + 2M\big) \]
holds for all n ≥ N which proves the claim. (Note that we have used Remark
3.2 here.)
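As a sanity check of the theorem, one can take fn (x) = x^n on E = [0, 1] (an illustrative choice, not taken from the notes): |fn | ≤ 1, fn → 0 for every x < 1 and hence a.e., and the integral of |fn − 0| over E equals 1/(n + 1) → 0. The sketch below compares a midpoint Riemann sum (adequate here because each fn is continuous) with this closed form; the helper name `midpoint_integral` is hypothetical.

```python
def midpoint_integral(func, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))


def f(n):
    """Return f_n(x) = x**n, which is bounded by M = 1 on E = [0, 1]."""
    return lambda x: x ** n


# f_n -> 0 for every x in [0, 1), i.e. almost everywhere on E,
# so the theorem predicts that the integral of |f_n - 0| tends to 0.
integrals = {n: midpoint_integral(f(n), 0.0, 1.0) for n in (1, 10, 100)}
for n, value in integrals.items():
    print(n, value, 1.0 / (n + 1))  # numerical value next to the exact 1/(n+1)
```

The convergence fn → 0 is not uniform on [0, 1], so this is a case where the Riemann-type argument via uniform convergence does not apply directly.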
Remark 3.3. Note that the conclusion of the theorem can be phrased as the
interchange of the limit and the integral in this particular situation:
\[ \lim_{n\to\infty} \int_E f_n = \int_E \lim_{n\to\infty} f_n . \]
For the Riemann integral we needed uniform convergence to draw this conclusion.
Here we obtain uniform convergence up to an arbitrarily small set from
Egoroff’s theorem, which together with the boundedness of the functions
involved allows us to draw the above conclusion.
Remark 3.4. Note that boundedness is indeed essential as the example of
$f_n(x) = n\,\chi_{(0, 1/n]}(x)$ shows.
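The failure in Remark 3.4 can be made concrete with exact rational arithmetic. The following is a minimal sketch (function names are illustrative assumptions): every fn has integral 1, while fn (x) → 0 at each fixed x, so the limit of the integrals is 1 but the integral of the limit is 0.

```python
from fractions import Fraction


def f(n, x):
    """f_n(x) = n on (0, 1/n] and 0 elsewhere."""
    return n if 0 < x <= Fraction(1, n) else 0


def integral(n):
    """Integral of the simple function f_n: its value n times the length 1/n."""
    return n * Fraction(1, n)


x = Fraction(1, 50)  # an arbitrary fixed point in (0, 1]
pointwise = [f(n, x) for n in (10, 50, 51, 1000)]
integrals = [integral(n) for n in (10, 50, 51, 1000)]
print(pointwise)   # values of f_n at the fixed point x: zero as soon as n > 1/x
print(integrals)   # the integrals of f_n do not depend on n
```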

3.2.1 Riemann integrable functions are Lebesgue integrable
Using the bounded convergence theorem, we can show that a Riemann integrable
function is measurable and that the Riemann integral agrees with the Lebesgue
integral in this case.
Theorem 3.2. Suppose f is Riemann integrable on the closed interval [a, b].
Then f is measurable and
\[ \int^{R}_{[a,b]} f(x)\,dx = \int^{L}_{[a,b]} f(x)\,dx . \]

Proof. Since f is Riemann integrable it is bounded, say |f (x) | ≤ M . Moreover,
there exist sequences of step functions (ϕk ) and (ψk ) with |ϕk (x) | ≤ M and
|ψk (x) | ≤ M for all k and x ∈ [a, b], such that

ϕ1 (x) ≤ ϕ2 (x) ≤ ... ≤ f (x) ≤ ... ≤ ψ2 (x) ≤ ψ1 (x)

(cf. Section 1.1, refining the partitions of the upper and lower Riemann sums)
and with
\[ \lim_{k\to\infty} \int^{R}_{[a,b]} \varphi_k(x)\,dx = \int^{R}_{[a,b]} f(x)\,dx = \lim_{k\to\infty} \int^{R}_{[a,b]} \psi_k(x)\,dx . \tag{13} \]

Now on step functions the Riemann and the Lebesgue integral agree by defini-
tion, so for all k we have
\[ \int^{R}_{[a,b]} \varphi_k(x)\,dx = \int^{L}_{[a,b]} \varphi_k(x)\,dx , \qquad \int^{R}_{[a,b]} \psi_k(x)\,dx = \int^{L}_{[a,b]} \psi_k(x)\,dx . \]

We now define pointwise ϕ̃ (x) = limk→∞ ϕk (x) and ψ̃ (x) = limk→∞ ψk (x),
these limits existing by monotonicity and boundedness of f (x). The functions
ϕ̃ and ψ̃ are measurable (Corollary 2.1), bounded and supported on [a, b]. Hence
the Bounded Convergence Theorem yields
\[ \lim_{k\to\infty} \int^{L}_{[a,b]} \varphi_k(x)\,dx = \int^{L}_{[a,b]} \tilde{\varphi}(x)\,dx \quad \text{and} \quad \lim_{k\to\infty} \int^{L}_{[a,b]} \psi_k(x)\,dx = \int^{L}_{[a,b]} \tilde{\psi}(x)\,dx . \]
Combining this with the previous equalities we easily obtain
\[ \int^{L}_{[a,b]} \big( \tilde{\psi}(x) - \tilde{\varphi}(x) \big)\,dx = 0 \quad \text{where also } \tilde{\psi} - \tilde{\varphi} \ge 0 , \]
from which we claim we can conclude ϕ̃ = ψ̃ almost everywhere (Exercise, or
see item (6) of Proposition 3.3 below). But then we must have ϕ̃ (x) ≤ f (x) ≤
ψ̃ (x) = ϕ̃ (x) almost everywhere and hence f = ϕ̃ and f = ψ̃ almost everywhere;
in particular f is measurable.
Finally, since ϕk → f a.e. by construction we have by definition of the Lebesgue
integral and (13)
\[ \int^{L}_{[a,b]} f(x)\,dx = \lim_{k\to\infty} \int^{L}_{[a,b]} \varphi_k(x)\,dx = \lim_{k\to\infty} \int^{R}_{[a,b]} \varphi_k(x)\,dx = \int^{R}_{[a,b]} f(x)\,dx . \]

3.3 Non-negative functions
We proceed further in enlarging our class of functions that we can integrate.
We now consider measurable extended valued functions

f : Rd → [0, ∞]
Not only are these functions potentially unbounded, they can also take the
value +∞ on a measurable set and they may also be supported on a set of
infinite measure, for instance all of Rd . We define the extended Lebesgue
integral for these functions by
\[ \int_{\mathbb{R}^d} f(x)\,dx := \sup_g \int g(x)\,dx \tag{14} \]

where we take the supremum over all measurable functions g such that 0 ≤ g ≤
f , such that g is bounded and supported on a set of finite measure.
Now the expression on the left is either finite or infinite. In the first case
we shall say that f is Lebesgue integrable. As usual we define for E ⊂ Rd
measurable the integral
\[ \int_E f(x)\,dx := \int_{\mathbb{R}^d} f(x)\,\chi_E(x)\,dx . \tag{15} \]
Example 3.1. The function $F_a(x) = \left(1 + |x|^2\right)^{-a/2}$ is integrable for a > d.
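In dimension d = 1 the threshold in Example 3.1 can be observed through closed-form antiderivatives. The sketch below (the function name is an illustrative assumption) evaluates the integral of Fa over [−R, R] for a = 2 and for the borderline case a = 1 = d: the first stays bounded by π, the second grows like 2 log(2R).

```python
import math


def truncated_integral(a, radius):
    """Exact integral of (1 + x^2)^(-a/2) over [-radius, radius] for a in {1, 2} (d = 1)."""
    if a == 2:
        return 2.0 * math.atan(radius)    # an antiderivative of 1/(1+x^2) is atan
    if a == 1:
        return 2.0 * math.asinh(radius)   # an antiderivative of (1+x^2)^(-1/2) is asinh
    raise ValueError("closed form only provided for a = 1 and a = 2")


for radius in (1e1, 1e3, 1e6):
    print(radius, truncated_integral(2, radius), truncated_integral(1, radius))
```

Since the truncated integrands are bounded and supported on sets of finite measure, each value is a lower bound for the supremum in (14).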
The extended Lebesgue integral has the familiar properties of the integral:
Proposition 3.3. The integral defined in (14) and (15) satisfies the following:
1. Linearity: If f, g ≥ 0 and λ, µ ∈ R are both positive, then
\[ \int (\lambda f + \mu g) = \lambda \int f + \mu \int g . \]
2. Additivity: If E and F are disjoint measurable subsets of Rd and f ≥ 0, then
\[ \int_{E \cup F} f = \int_E f + \int_F f . \]
3. Monotonicity: If 0 ≤ f ≤ g then $\int f \le \int g$.
4. If g is integrable and 0 ≤ f ≤ g then f is integrable.
5. If f is integrable then f (x) < ∞ for a.e. x.
6. If $\int f = 0$, then f (x) = 0 for a.e. x.
Proof. The third is immediate from the definition and so is the fourth. For
the first, note that it suffices to show the statement with λ = µ = 1 since
$\int \lambda f = \lambda \int f$ for λ > 0 follows again immediately from the definition. To show
it for λ = µ = 1 we first show the ≥ direction. Let ϕ1 ≤ f and ϕ2 ≤ g be
two bounded functions supported on a set of finite measure. Then we have
ϕ1 + ϕ2 ≤ f + g and hence
\[ \int (f + g) \ge \int (\varphi_1 + \varphi_2) = \int \varphi_1 + \int \varphi_2 \]
and taking the sup over all ϕ1 and ϕ2 we obtain the first direction. For the
second, we let η ≥ 0 be bounded, supported on a set of finite measure with
η ≤ f + g. We then define η1 (x) = min (f (x) , η (x)) and η2 = η − η1 . We note
η1 ≤ f and η2 ≤ g and hence
\[ \int \eta = \int (\eta_1 + \eta_2) = \int \eta_1 + \int \eta_2 \le \int f + \int g . \]
Taking the sup over all η we obtain the ≤ direction and item 1 is proven. The
second item then immediately follows using the linearity.
It remains to prove (5) and (6). For (5) we let $E_k = \{x \mid f(x) \ge k\}$
and $E_\infty = \{x \mid f(x) = \infty\}$. We then have $E_k \supset E_{k+1}$ and $E_\infty = \bigcap_k E_k$. The
integrability tells us that $\infty > \int f \ge \int f \chi_{E_k} \ge k \cdot m(E_k)$ and hence $m(E_k) \to 0$
as k → ∞. Combining this with Proposition 2.3 we obtain $m(E_\infty) = 0$.
For (6) we let $E_k = \{x \mid f(x) > \tfrac{1}{k}\}$. Since $\{x \mid f(x) > 0\} = \bigcup_{k=1}^{\infty} E_k$ it
suffices to show that $m(E_k) = 0$ for all k to conclude $m(\{x \mid f(x) > 0\}) = 0$.
We can infer this statement from $0 = \int f \ge \int f \chi_{E_k} \ge \tfrac{1}{k}\, m(E_k)$ for all k ≥ 1.

Note that the converse of (6) is also true. If η ≥ 0 and η = 0 a.e. then
$\int \eta = \sup_g \int g(x)\,dx = 0$. (Otherwise there would exist a bounded function
supported on a set of measure zero with non-zero integral, a contradiction.)
This shows that the extra assumption in Remark 3.2 is unnecessary for the
conclusion of Theorem 3.1 to be valid.
Next we will try to prove convergence results for the extended Lebesgue
integral. In particular, we can revisit the example of Remark 3.4 and ask about
a general statement regarding the exchange of the limit and the integral if the
functions under consideration are not uniformly bounded.
Lemma 3.2 (Fatou’s Lemma). Let (fn ) be a sequence of measurable functions
with fn ≥ 0. If limn→∞ fn (x) = f (x) for almost every x, then
\[ \int f \le \liminf_{n\to\infty} \int f_n . \]
Note that both the left hand side and the right hand side may be +∞.
Proof. We first note that it suffices to prove
\[ \int g \le \liminf_{n\to\infty} \int f_n \]
for any g with 0 ≤ g ≤ f bounded and supported on a set of finite measure
because we can then take the supremum on the left to obtain the result. This
observation allows us to use the Bounded Convergence Theorem: We first define
gn (x) := min (g (x) , fn (x)) and observe that gn ≥ 0 is measurable, bounded
and supported on a set of finite measure. Moreover, gn (x) → g (x) for a.e. x as
follows from gn (x) − g (x) = min (0, fn (x) − f (x) + f (x) − g (x)). Monotonicity
of the integral and the bounded convergence theorem yield
\[ \int f_n \ge \int g_n \to \int g , \]
so taking the lim inf on the left already produces the desired result.
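The inequality in Fatou’s Lemma can be strict. A standard illustration on R (not taken from the notes) is a bump of fixed mass escaping to infinity:

```latex
f_n = \chi_{[n,\,n+1]}, \qquad
f_n(x) \to f(x) = 0 \ \text{ for every } x \in \mathbb{R}, \qquad
\int f = 0 \;<\; 1 = \liminf_{n\to\infty} \int f_n .
```

Here each fn is bounded, but the supports are not contained in a common set of finite measure, so the Bounded Convergence Theorem does not apply.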

For a general sequence of measurable non-negative functions the inequality
given by Fatou’s Lemma is the best one can do. However, under additional
assumptions the interchange of the limit and the integral (and hence equality)
can be inferred. The following theorem provides such a setting and is one of the
cornerstones of the Lebesgue theory of integration. It will be used many
times in the sequel so it is important to know this statement well!
Theorem 3.3 (Monotone Convergence Theorem, MCT). Let (fn ) be a sequence
of (extended real-valued) measurable functions with fn ≥ 0 and fn → f a.e.
Suppose in addition fn (x) ≤ fn+1 (x) holds a.e. in x for any n. Then
\[ \lim_{n\to\infty} \int f_n = \int f . \tag{16} \]
Proof. We have $\int f_n \le \int f$ for any n by monotonicity of the integral. Hence
\[ \limsup_{n\to\infty} \int f_n \le \int f \le \liminf_{n\to\infty} \int f_n \]
with the second inequality being Fatou’s Lemma.
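To see the MCT at work on an unbounded function, consider f (x) = x^(−1/2) on (0, 1] with the truncations fn = min(f, n), which increase to f. This is an illustrative choice, not an example from the notes. Each truncated integral can be computed by hand as 2 − 1/n, and the sketch below evaluates it exactly.

```python
from fractions import Fraction


def truncated_integral(n):
    """Exact integral over (0, 1] of f_n = min(x**-0.5, n).

    f_n equals n on (0, 1/n^2] and x**-0.5 on (1/n^2, 1], so the integral is
    n * (1/n^2) + (2 - 2/n) = 2 - 1/n.
    """
    plateau = n * Fraction(1, n * n)   # contribution of the capped part
    tail = 2 - Fraction(2, n)          # integral of x**-0.5 over (1/n^2, 1]
    return plateau + tail


values = [truncated_integral(n) for n in (1, 2, 10, 1000)]
print(values)  # an increasing sequence with supremum 2
```

By (16) the limit 2 is the value of the extended Lebesgue integral of f over (0, 1].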


Remark 3.5. We write the shorthand fn ↗ f a.e. if fn → f a.e. and fn (x) ≤
fn+1 (x) a.e. in x for any n.
Remark 3.6. Note that both sides of the equality in (16) can be +∞, that is
(16) holds in the extended sense.
Remark 3.7. Note that the MCT provides a useful tool to compute $\int f$ via
simple functions and the approximation of Theorem 2.7.
Before we turn to the other cornerstone of the Lebesgue theory, the dom-
inated convergence theorem (DCT), we enlarge our class of functions for the
integral one more time.

3.4 The General case and the notion of integrable


Let f : Rd → R be measurable.4
Definition 3.2. We say that f above is integrable if |f | is Lebesgue integrable
in the sense of the previous section. For such f we define
\[ \int f = \int f^+ - \int f^- \]

where f + = max (f (x), 0) and f − = max (−f (x), 0) are the positive and negative
part of f respectively.
Remark 3.8. Note that f = f + − f − and |f | = f + + f − and hence f ± ≤ |f |,
so f ± are indeed integrable. Note also that given two different decompositions of
f into a difference of non-negative functions, i.e. $f = f^+ - f^- = g^+ - g^-$, one
has $f^+ + g^- = g^+ + f^-$ and hence $\int f^+ + \int g^- = \int g^+ + \int f^-$ by linearity
of the integral on non-negative functions. It follows that $\int f^+ - \int f^- = \int g^+ - \int g^-$
and hence that the value of the integral of f is independent of the decomposition.
4 We could allow the extended real line $\overline{\mathbb{R}}$ here but we will see below that the notion of integrability requires the
set of points where f (x) = ±∞ to have measure zero.
Remark 3.9. We can always modify f on a set of measure zero without affecting
the integrability of f or the value of the integral (cf. the comments after the proof
of Proposition 3.3). Therefore, we may adopt the convention that a function f
can be undefined on a set of measure zero. Cf. also Footnote 4.
Proposition 3.4. The integral defined above is linear, additive, monotone and
satisfies the triangle inequality.
Proof. Exercise.
We next prove two interesting regularity properties of integrable functions:
Proposition 3.5. Suppose f is integrable on Rd . Then for every ε > 0
1. There exists a set of finite measure B (a large ball, for instance) such that
\[ \int_{B^c} |f| < \epsilon \qquad \text{(“vanishing at infinity”)} \]
2. There is a δ > 0 such that
\[ \int_E |f| < \epsilon \quad \text{for all } E \text{ with } m(E) < \delta \qquad \text{(“absolute continuity”)} \]

The first statement expresses the intuitive fact that the function f has to
go to zero at infinity in a suitable sense in order to be integrable. The second
property says that for fixed f , if one integrates over sufficiently small sets, the
integral is small as well. The name absolute continuity will become clearer to
us later (see Section 4.2.5 and also Example Sheet 5).
Proof. Wlog f ≥ 0 since otherwise we look at |f |.
For the first statement let $B_n$ be the ball of radius n centred at the origin
and $f_n(x) = f(x)\,\chi_{B_n}(x)$. Note that fn ↗ f and hence by the MCT (Theorem
3.3), $\int f_n \to \int f < \infty$. This means there exists an N such that $\int f - \int f_N < \epsilon$
which is equivalent to $\int f(x)\,\chi_{(B_N)^c}(x) < \epsilon$ and hence the desired result.
For the second statement we set $E_n = \{x \mid f(x) \le n\}$ and $f_n(x) := f(x)\,\chi_{E_n}(x)$.
Noting that fn ↗ f a.e. and hence by the MCT $\int f_n \to \int f$, we
conclude the existence of an N with $\int f - \int f_N < \frac{\epsilon}{2}$. But this means that
\[ \int_E f = \int_E (f - f_N) + \int_E f_N < \frac{\epsilon}{2} + m(E)\, N . \]
Now if we choose $\delta < \frac{\epsilon}{2N}$, then m (E) < δ implies that $\int_E f < \epsilon$ as desired.
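For a concrete feel of absolute continuity, take f (x) = x^(−1/2) on (0, 1] (an illustrative choice, not from the notes). Since f is decreasing, among all measurable sets E ⊂ (0, 1] of measure δ the integral of f over E is largest for E = (0, δ], where it equals 2√δ. The sketch below, with hypothetical helper names, therefore uses δ = (ε/2)^2; this is a different (sharper) choice than the δ < ε/(2N) produced by the general proof.

```python
import math


def worst_case_integral(delta):
    """Integral of x**-0.5 over (0, delta], the worst set of measure delta in (0, 1]."""
    return 2.0 * math.sqrt(delta)


def delta_for(eps):
    """A delta that works for the given eps, obtained from 2 * sqrt(delta) = eps."""
    return (eps / 2.0) ** 2


for eps in (1.0, 0.1, 0.01):
    delta = delta_for(eps)
    print(eps, delta, worst_case_integral(delta))  # the last value reproduces eps
```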
We are ready to prove the other cornerstone of the Lebesgue Theory, the
dominated convergence theorem (DCT):
Theorem 3.4 (Dominated Convergence Theorem (DCT)). Let (fn ) be a sequence
of measurable functions with fn → f a.e. If |fn (x) | ≤ g (x) where g is
integrable, then f is integrable and
\[ \int |f_n - f| \to 0 \quad \text{as } n \to \infty. \]
Hence by the triangle inequality
\[ \lim_{n\to\infty} \int f_n = \int f . \]
Proof. We provide two proofs. One is via Fatou’s Lemma. Starting from
|fn − f | ≤ 2g a.e., which holds by the triangle inequality, we apply Fatou’s Lemma to
the sequence of non-negative (after a change on a set of measure zero) functions
2g − |f − fn |, which converges to 2g a.e., to obtain
\[ \int 2g \le \int 2g + \liminf_{n\to\infty} \Big( - \int |f_n - f| \Big) . \]
Since $\int g$ is finite by assumption we obtain $\limsup_{n\to\infty} \int |f_n - f| \le 0$ which
proves the result.
The other proof uses Proposition 3.5. Given ε > 0 we first choose a large ball
$B_M$ such that $\int_{(B_M)^c} g < \epsilon$ by (1) of Proposition 3.5. We next invoke Egoroff’s
theorem to choose $X \subset B_M$ with m (X) < δ such that fn → f uniformly on
$B_M \setminus X$. Here δ > 0 is chosen as in (2) of Proposition 3.5, i.e. in particular so
that $\int_X g < \epsilon$. Using the uniform convergence on $B_M \setminus X$ we choose N large
such that $|f_n(x) - f(x)| < \frac{\epsilon}{m(B_M)}$ holds for all n ≥ N and all $x \in B_M \setminus X$.
Combining everything we obtain for n ≥ N
\[ \int |f_n - f| = \int_{(B_M)^c} |f_n - f| + \int_{B_M \setminus X} |f_n - f| + \int_X |f_n - f| \le \int_{(B_M)^c} 2g + \epsilon + \int_X 2g < 2\epsilon + \epsilon + 2\epsilon = 5\epsilon \]

which is what we needed to prove.
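A numerical illustration of the DCT on all of R, under the assumption of the illustrative family fn (x) = exp(−x^2) cos(x/n), which is dominated by the integrable function g (x) = exp(−x^2) and converges pointwise to exp(−x^2). The classical formula for the integral of exp(−x^2) cos(bx) over R, namely √π exp(−b^2/4), gives the exact values. The helper `midpoint_integral` is a hypothetical name, and truncating to [−10, 10] is a numerical shortcut.

```python
import math


def midpoint_integral(func, a, b, steps=20_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))


def f(n):
    """f_n(x) = exp(-x^2) * cos(x / n), dominated by g(x) = exp(-x^2)."""
    return lambda x: math.exp(-x * x) * math.cos(x / n)


# The tails of g outside [-10, 10] are far below floating-point resolution,
# so the truncated integral stands in for the integral over R.
approximations = {n: midpoint_integral(f(n), -10.0, 10.0) for n in (1, 5, 50)}
limit = math.sqrt(math.pi)  # integral of the pointwise limit exp(-x^2)
for n, value in approximations.items():
    print(n, value, abs(value - limit))
```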

3.5 Aside: Complex-valued functions


So far we have considered real-valued functions f : Rd → R. The theory easily
extends to complex-valued functions f : Rd → C. Indeed, we say that f (x) =
u (x) + iv (x) is Lebesgue integrable if $|f(x)| = \sqrt{u(x)^2 + v(x)^2}$ is integrable.
It is easy to see that f is Lebesgue integrable if and only if both u and v are
integrable. We define the Lebesgue integral as
\[ \int f(x)\,dx := \int u(x)\,dx + i \int v(x)\,dx . \]

As considering complex valued functions does not really add anything concep-
tually new to the theory we will continue to develop it for real valued functions.

3.6 The space of integrable functions as a normed vector space

The class of integrable functions forms a vector space, which we shall denote
$L^1(\mathbb{R}^d)$. It is equipped with a (semi)norm
\[ \|f\|_{L^1(\mathbb{R}^d)} := \int_{\mathbb{R}^d} |f(x)|\,dx . \]
It is only a semi-norm because $\|f\|_{L^1(\mathbb{R}^d)} = 0$ only implies f = 0 a.e.

However, it is not hard to show the following:

Exercise 3.1. Show that $L^1(\mathbb{R}^d)$ is a normed vector space if we define the
elements to be equivalence classes of functions agreeing almost everywhere.

Recall that a norm induces a metric, here $d(x, y) = \|x - y\|_{L^1(\mathbb{R}^d)}$, which
makes L1 a metric space. Recall also that a metric space X is called complete if
every Cauchy sequence (xn ) in X converges to a limit x ∈ X, i.e. d (xn , x) → 0
as n → ∞.
Theorem 3.5 (Riesz-Fischer). The vector space $L^1(\mathbb{R}^d)$ is complete with respect
to the metric induced by the norm.


Remark 3.10. The theorem expresses how the Lebesgue integral can be under-
stood as the completion of the Riemann integral. See also .... below.
Proof. Let (fn ) be a Cauchy sequence. We first find a candidate limit f with
f ∈ L1 and then prove fn → f in the L1 -norm.
The key idea is to extract from (fn ) a subsequence which converges pointwise
to some f (using the completeness of R).5 To do this, we first construct a
subsequence (fnj ) with

\[ \|f_{n_{j+1}} - f_{n_j}\|_{L^1} \le \frac{1}{2^j} \quad \text{for all } j \ge 1. \tag{17} \]
We then define
\[ g_k(x) = |f_{n_1}(x)| + \sum_{j=1}^{k} |f_{n_{j+1}}(x) - f_{n_j}(x)| , \qquad g(x) = |f_{n_1}(x)| + \sum_{j=1}^{\infty} |f_{n_{j+1}}(x) - f_{n_j}(x)| . \]
Note that g makes sense as an extended valued function. Since $g_k \nearrow g$ almost
everywhere as k → ∞ we have that g is measurable and by the MCT that
\[ \int g = \lim_{k\to\infty} \int g_k = \int |f_{n_1}(x)| + \lim_{k\to\infty} \sum_{j=1}^{k} \|f_{n_{j+1}} - f_{n_j}\|_{L^1} < \infty , \]
the last inequality following from the property (17) and that fn1 ∈ L1 . We
conclude that g is integrable and hence g (x) < ∞ for a.e. x. This implies that
the sum converges absolutely for a.e. x which means that the right hand side of
\[ f_{n_{k+1}}(x) = f_{n_1}(x) + \sum_{j=1}^{k} \big( f_{n_{j+1}}(x) - f_{n_j}(x) \big) \]
converges for a.e. x, hence so does the left hand side. We conclude that the
subsequence $(f_{n_k})$ converges pointwise for a.e. x to some limiting function which
we call f . Since |f (x) | ≤ g (x) for a.e. x we conclude that f is integrable, so
$f \in L^1(\mathbb{R}^d)$. It remains to show that $f_{n_k} \to f$ in L1 . But this is immediate
from $|f(x) - f_{n_k}(x)| \le 2g(x)$ for a.e. x and the dominated convergence theorem.
5 Going to a subsequence is necessary as we can have $\|f_n - f\|_{L^1} \to 0$ for some (fn ) and f
such that fn (x) → f (x) for no x! See Example Sheet 6.

To finish the proof we recall that if a subsequence of a Cauchy sequence
converges to a limit f , then so must the entire sequence.6 Hence fn → f in L1
and the theorem is proven.
Corollary 3.1. Let fn → f in L1 . Then there exists a subsequence (fnk ) with
fnk → f a.e. pointwise.
Proof. The assumption implies that (fn ) is Cauchy in L1 . Then repeat the
construction in the proof of Theorem 3.5.
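A standard example where passing to a subsequence is genuinely needed is the so-called typewriter sequence on [0, 1): indicators of dyadic intervals that sweep across [0, 1) again and again with shrinking length. It is offered here as an illustration and is not necessarily the example on the example sheet. The sketch below (all names are illustrative) checks that the L1 norms shrink, that the full sequence keeps returning to the value 1 at a fixed point, and that the subsequence with indices 2^k settles at 0 there.

```python
from fractions import Fraction


def interval(n):
    """Dyadic interval [j/2^k, (j+1)/2^k) for n = 2^k + j with 0 <= j < 2^k."""
    k = n.bit_length() - 1
    j = n - (1 << k)
    return Fraction(j, 1 << k), Fraction(j + 1, 1 << k)


def f(n, x):
    """Indicator function of interval(n), evaluated at x."""
    left, right = interval(n)
    return 1 if left <= x < right else 0


def l1_norm(n):
    """L1 norm of f_n, i.e. the length of its interval."""
    left, right = interval(n)
    return right - left


x = Fraction(1, 3)
full = [f(n, x) for n in range(1, 64)]       # hits 1 once in every dyadic generation
sub = [f(2 ** k, x) for k in range(0, 6)]    # subsequence n_k = 2^k
print(sum(full), sub, l1_norm(63))
```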

3.7 Dense families in $L^1(\mathbb{R}^d)$
We next consider certain families of simple (both in the colloquial and the precise
sense) functions which are dense in L1 . Recall the definition of dense:
Definition 3.3. A family G of integrable functions is dense in L1 if for any
f ∈ L1 and any ε > 0 we can find a g ∈ G with $\|f - g\|_{L^1} < \epsilon$.
Why are dense families useful? In a typical application one wants to establish
an identity for integrable functions which involves the L1 -norm. To prove the
identity, it may be simpler to prove it for a dense family of functions in L1
because a (say) continuous function is much easier to manipulate than a general
element of L1 . Finally, a density argument allows one to extend the identity to
all L1 -functions. Example Sheet 5 provides an example.
Theorem 3.6. The following families of functions are dense in $L^1(\mathbb{R}^d)$:
1. simple functions
2. step functions
3. continuous functions of compact support
Proof. Exercise. Outline: For the first note that one may assume f ≥ 0 as one
can approximate separately for f + and f − . Then approximate f with (ϕk ) an
increasing sequence of simple functions converging pointwise to f and apply the
MCT to show convergence in L1 . For the second part it suffices to approximate
the characteristic function of a set of finite measure by a step function (why?).
For this Problem 3 from Example Sheet 2 will be handy. Finally for the third
conclusion one needs to smooth the edges of a step function.

3.8 Fubini’s Theorem


We now turn to an important analytical tool in the integration theory. It
allows one to convert a d-dimensional integral into a d1 -dimensional and a
d2 -dimensional one (d1 + d2 = d).
To see that this is not entirely trivial, let us start by trying to integrate the
function $f(x, y) = \frac{x^2 - y^2}{(x^2 + y^2)^2}$ over the unit square [0, 1] × [0, 1]. Naively we might
do this in two ways. First integrate in y and then in x
\[ \int_{[0,1]\times[0,1]} \frac{x^2 - y^2}{(x^2 + y^2)^2}\,dy\,dx \overset{?}{=} \int_0^1 dx \int_0^1 dy\, \frac{x^2 - y^2}{(x^2 + y^2)^2} = \frac{\pi}{4} , \]
or first in x and then in y
\[ \int_{[0,1]\times[0,1]} \frac{x^2 - y^2}{(x^2 + y^2)^2}\,dy\,dx \overset{?}{=} \int_0^1 dy \int_0^1 dx\, \frac{x^2 - y^2}{(x^2 + y^2)^2} = -\frac{\pi}{4} . \]
6 Let ε > 0 be given. Choose N such that $\|f_n - f_m\| < \frac{\epsilon}{2}$ for m, n ≥ N . Choose $n_k \ge N$
such that $\|f - f_{n_k}\| < \frac{\epsilon}{2}$, so in particular $\|f_n - f_{n_k}\| < \frac{\epsilon}{2}$ for all n ≥ N . The triangle
inequality now implies $\|f_n - f\| < \epsilon$ for all n ≥ N as desired.

The fact that the result depends on the order in which the integration is carried
out tells us that some care is needed to state assumptions when a d-dimensional
integral can be computed in terms of iterated ones.
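The two values ±π/4 can be checked using the elementary antiderivatives y/(x^2 + y^2) (in y) and −x/(x^2 + y^2) (in x) of the integrand. The sketch below is a numerical sanity check under these closed forms; `midpoint_integral` and the other helper names are hypothetical.

```python
import math


def midpoint_integral(func, a, b, steps=20_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))


def f(x, y):
    """The integrand (x^2 - y^2) / (x^2 + y^2)^2, defined away from the origin."""
    return (x * x - y * y) / (x * x + y * y) ** 2


def inner_dy(x):
    """Integral of f(x, y) over y in [0, 1], from the antiderivative y / (x^2 + y^2)."""
    return 1.0 / (1.0 + x * x)


def inner_dx(y):
    """Integral of f(x, y) over x in [0, 1], from the antiderivative -x / (x^2 + y^2)."""
    return -1.0 / (1.0 + y * y)


# Spot-check one inner closed form numerically at a point away from the singularity.
spot_check = midpoint_integral(lambda y: f(0.5, y), 0.0, 1.0)

dy_then_dx = midpoint_integral(inner_dy, 0.0, 1.0)   # integrate in y first, then in x
dx_then_dy = midpoint_integral(inner_dx, 0.0, 1.0)   # integrate in x first, then in y
print(spot_check, inner_dy(0.5), dy_then_dx, dx_then_dy)
```

The two iterated values differ only in sign, reflecting the antisymmetry f (x, y) = −f (y, x).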

3.8.1 Slices of measurable sets and functions


We begin by setting up some notation.
• We let Rd = Rd1 × Rd2 with d = d1 + d2 and d1 , d2 ≥ 1.
• For E ⊂ Rd1 × Rd2 a subset we define the slices of E

Ex = {y ∈ Rd2 | (x, y) ∈ E} and Ey = {x ∈ Rd1 | (x, y) ∈ E}

• For a measurable function f : Rd1 × Rd2 → R we define

the slice corresponding to y ∈ Rd2 as the function f y (x) := f (x, y) with y fixed

the slice corresponding to x ∈ Rd1 as the function f x (y) := f (x, y) with x fixed

The big question now is:


• If E is a measurable set, are Ex and Ey also measurable?
• If f is a measurable function, are the slices f x and f y also measurable?
It is not hard to see that the answer is generally no. Take R2 and the set
E := N × {0} ⊂ R × {0} ⊂ R2 , with N a non-measurable set in R. As a subset
of a measure zero set in R2 , E is measurable. However, the slice $E^{y=0} = N$ is
not measurable in R.
What rescues us is that measurability holds for almost every slice.
Let us first state the two fundamental theorems of this section and then
discuss their content in a sequence of remarks.

3.8.2 Statement and Discussion of Fubini’s and Tonelli’s Theorem


Theorem 3.7 (Fubini). Let f : Rd1 × Rd2 → R be integrable on Rd1 × Rd2 .
Then for almost every y ∈ Rd2
1. The slice $f^y$ is integrable in Rd1
2. The function defined by $y \mapsto \int_{\mathbb{R}^{d_1}} f^y(x)\,dx$ is integrable in Rd2 .
Moreover, the integral of f can be computed iteratively
3. \[ \int_{\mathbb{R}^{d_2}} dy \int_{\mathbb{R}^{d_1}} f(x, y)\,dx = \int_{\mathbb{R}^{d_2}} dy \int_{\mathbb{R}^{d_1}} f^y(x)\,dx = \int_{\mathbb{R}^d} f . \]
Remark 3.11. Recall by definition all integrable functions are in particular
measurable.

Theorem 3.8 (Tonelli). Let f : Rd1 × Rd2 → [0, ∞] be measurable and non-negative
on Rd1 × Rd2 . Then for almost every y ∈ Rd2
1. The slice $f^y$ is measurable on Rd1
2. The function defined by $y \mapsto \int_{\mathbb{R}^{d_1}} f^y(x)\,dx$ is measurable on Rd2 .
Moreover, the integral of f can be computed iteratively
3. \[ \int_{\mathbb{R}^{d_2}} dy \int_{\mathbb{R}^{d_1}} f(x, y)\,dx = \int_{\mathbb{R}^d} f \quad \text{in the extended sense.} \]

Remark 3.12. Both theorems are symmetric in x and y, i.e. the conclusion
(in say Fubini) is also that $f^x$ is integrable in Rd2 for a.e. x ∈ Rd1 , that $x \mapsto \int_{\mathbb{R}^{d_2}} f^x(y)\,dy$
is integrable in Rd1 and that
\[ \int_{\mathbb{R}^{d_1}} dx \int_{\mathbb{R}^{d_2}} f(x, y)\,dy = \int_{\mathbb{R}^d} f . \]
Remark 3.13. Note that the function in (2) of Fubini, $y \mapsto \int_{\mathbb{R}^{d_1}} f^y(x)\,dx$, is
defined for almost every y. This is consistent with our earlier convention that
an integrable function can be undefined on a set of measure zero, cf. Remark
3.9. Similarly for (2) in Tonelli’s Theorem, where $y \mapsto \int_{\mathbb{R}^{d_1}} f^y(x)\,dx$ is a
measurable function on Rd2 minus a set of measure 0 and hence agrees a.e. with
a measurable function on Rd2 .
Remark 3.14. Note that in Fubini’s Theorem we are assuming that f is
integrable. In Tonelli’s theorem this is not assumed and in particular both sides of
(3) in Tonelli’s Theorem can be infinite. The point is that if that is the case,
both the iterated integrals and the d-dimensional one have to yield +∞! This
provides a useful strategy to compute $\int_{\mathbb{R}^d} f$ for an arbitrary measurable function:
First compute $\int_{\mathbb{R}^d} |f|$. To compute this, one can by Tonelli’s theorem use ANY
convenient iterated integration:
• If one of them yields +∞ then all of them have to and we can conclude
by (3) of Tonelli that f is not integrable.
• If one of them yields a number smaller than +∞, then any iteration of
integrals has to yield that number and (3) of Tonelli implies that f is
integrable. Now the assumptions of Fubini’s theorem hold for this f and
we are allowed to compute $\int_{\mathbb{R}^d} f$ using any version of iterated integrals.
Examples of this will be seen on Example Sheet 6.

Remark 3.15. Revisiting the example in the beginning we conclude that this f
cannot be integrable over the unit square. (Exercise: Show this directly.)
Remark 3.16. Even if both of the iterated integrals exist and agree one cannot
infer that f is integrable over Rd . Try the function defined by $f(x, y) = \frac{xy}{(x^2 + y^2)^2}$
for $x^2 + y^2 \ne 0$ and f (x, y) = 0 for x = y = 0. [The proof is easiest in
polar coordinates which we have not introduced rigorously yet but give you an
immediate intuition of how things fail here.]
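The polar-coordinate heuristic for Remark 3.16 can be sketched as follows. This is a formal computation only, since polar coordinates have not been justified in these notes. Each slice of f is odd, so both iterated integrals vanish, whereas the integral of |f| over an annulus around the origin blows up logarithmically:

```latex
\left| f(r\cos\theta,\, r\sin\theta) \right| = \frac{|\cos\theta \,\sin\theta|}{r^{2}},
\qquad
\int_{\epsilon < |(x,y)| < 1} |f|
  = \int_{0}^{2\pi} |\cos\theta \,\sin\theta| \, d\theta \int_{\epsilon}^{1} \frac{1}{r^{2}} \, r \, dr
  = 2 \log\frac{1}{\epsilon} \;\longrightarrow\; \infty
  \quad \text{as } \epsilon \to 0 .
```

By monotonicity the integral of |f| over R2 is then infinite, so f is not integrable even though the iterated integrals exist and agree.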

3.8.3 The proof of Tonelli’s Theorem (using Fubini)
Since we want to apply Fubini’s theorem, we start with the truncation

\[ f_k(x, y) = \begin{cases} f(x, y) & \text{if } |(x, y)| \le k \text{ and } f(x, y) < k , \\ 0 & \text{otherwise.} \end{cases} \]
Clearly each fk is measurable and in fact integrable. We clearly have fk ↗ f
and by the MCT also the limit (in the extended sense)
\[ \int_{\mathbb{R}^d} f_k(x, y) \to \int_{\mathbb{R}^d} f(x, y) . \tag{18} \]

By Fubini’s theorem applied to fk , there exists an Ek with m (Ek ) = 0 such that
$f_k^y$ is integrable (in particular measurable) for all y ∈ (Ek )c . Set $E = \bigcup_k E_k$.
Then m (E) = 0 and $f_k^y$ is integrable (in particular measurable) for all y ∈ E c
and all k. Since $f_k^y \nearrow f^y$, the function $f^y$ (being the limit of a sequence of
measurable functions) is also measurable, proving (1) of Tonelli. Further, the
MCT implies that
\[ \int_{\mathbb{R}^{d_1}} f_k^y = \int_{\mathbb{R}^{d_1}} f_k(x, y)\,dx \nearrow \int_{\mathbb{R}^{d_1}} f(x, y)\,dx = \int_{\mathbb{R}^{d_1}} f^y \quad \text{for } y \in E^c . \tag{19} \]
Note that the right hand side may well be +∞ for some y ∈ E c !
Applying Fubini again, we know that $y \mapsto \int_{\mathbb{R}^{d_1}} f_k(x, y)\,dx$ is a sequence of
integrable (hence measurable) functions defined almost everywhere on Rd2 . By
(19) this sequence increases to the function $y \mapsto \int_{\mathbb{R}^{d_1}} f(x, y)\,dx$, hence the latter is a
measurable function, proving (2) of Tonelli.
By the remarks in the previous paragraph, we can apply the MCT again to
(19) obtaining
\[ \int_{\mathbb{R}^{d_2}} dy \int_{\mathbb{R}^{d_1}} dx\, f_k(x, y) \to \int_{\mathbb{R}^{d_2}} dy \int_{\mathbb{R}^{d_1}} dx\, f(x, y) . \tag{20} \]
Finally, we also know, by part (3) of Fubini applied to fk , the equality
\[ \int_{\mathbb{R}^{d_2}} dy \int_{\mathbb{R}^{d_1}} dx\, f_k(x, y) = \int_{\mathbb{R}^d} f_k(x, y) . \tag{21} \]

Combining (20), (21) and (18) yields the statement (3) of Tonelli’s theorem.

3.8.4 The proof of Fubini’s Theorem


We let F ⊂ $L^1(\mathbb{R}^d)$ be the set of integrable functions satisfying all three
conclusions of Fubini’s theorem and prove $L^1(\mathbb{R}^d)$ ⊂ F . The proof has four steps:
(1) Prove that F is closed under finite linear combinations
(2) Prove that F is closed under monotone limits
(3) Prove that f = χE with E a measurable set of finite measure is in F . This
will be proven along the following lines:
(a) Prove it for E an open cube.
(b) Prove it for E the boundary of a closed cube.
(c) Prove it for E a finite union of closed cubes.
(d) Prove it for E open and of finite measure.
(e) Prove it for E a Gδ of finite measure.
(f) Prove it for E having measure zero.
(g) Prove it for E an arbitrary finite measure set.
(4) Conclude that any f ∈ $L^1(\mathbb{R}^d)$ is in F by approximating f with simple
functions and using the previous steps.


Step 1. Let $(f_n)_{n=1}^{N} \subset F$. Then for each n we have a set $A_n$ with $m(A_n) = 0$
and $f_n^y$ being integrable on Rd1 for all $y \in (A_n)^c$. Defining $A = \bigcup_{n=1}^{N} A_n$ we
have m (A) = 0 and that $f_n^y$ is measurable and integrable for any n and any
y ∈ Ac . Clearly any linear combination of the $f_n^y$ satisfies the same statement
proving conclusion (1). Moreover the linearity of the integral immediately
implies conclusions (2) and (3).

Step 2. We prove that if (fk ) is a sequence in F with fk ↗ f for some
$f \in L^1(\mathbb{R}^d)$, then f ∈ F . Note this immediately implies the same statement
for fk ↘ f since in this case −fk ↗ −f .
To prove this statement, note that we can restrict ourselves to fk ≥ 0 as
otherwise we can consider fk − f1 ≥ 0. We then have immediately
\[ \lim_{k\to\infty} \int_{\mathbb{R}^d} f_k(x, y)\,dx\,dy = \int_{\mathbb{R}^d} f(x, y) \tag{22} \]
from the MCT. Furthermore, by the assumption that fk ∈ F , we have for each
k a set Ak with m (Ak ) = 0 and $f_k^y$ being (measurable and) integrable on Rd1
for y ∈ (Ak )c . Setting as usual $A = \bigcup_{k=1}^{\infty} A_k$ we have m (A) = 0 and $f_k^y$ being
integrable for all k and all y ∈ Ac .
Now from the fact that $f_k^y \nearrow f^y$, it follows that $f^y$ is measurable and the
MCT produces
\[ \int_{\mathbb{R}^{d_1}} f_k^y(x)\,dx \nearrow \int_{\mathbb{R}^{d_1}} f^y(x)\,dx \tag{23} \]
with the left hand side being integrable (hence measurable) as a function of y by assumption. It
follows that the right hand side is measurable and applying the MCT again
yields
\[ \int_{\mathbb{R}^{d_2}} dy \int_{\mathbb{R}^{d_1}} f_k^y(x)\,dx \to \int_{\mathbb{R}^{d_2}} dy \int_{\mathbb{R}^{d_1}} f^y(x)\,dx . \tag{24} \]
By (3) of Fubini applied to fk ∈ F we know that the left hand side satisfies
\[ \int_{\mathbb{R}^{d_2}} dy \int_{\mathbb{R}^{d_1}} f_k^y(x)\,dx = \int_{\mathbb{R}^d} f_k(x, y) \to \int_{\mathbb{R}^d} f(x, y) \tag{25} \]
with the second (limit) statement being simply (22) from above. Combining
(24) and (25) yields the conclusion (3) of Fubini for f , namely
\[ \int_{\mathbb{R}^{d_2}} dy \int_{\mathbb{R}^{d_1}} f^y(x)\,dx = \int_{\mathbb{R}^d} f(x, y) < \infty . \tag{26} \]
Here the < ∞ follows from the assumption that f ∈ L1 . We now see that (26)
implies that the measurable function $y \mapsto \int_{\mathbb{R}^{d_1}} f^y(x)\,dx$ is integrable
(which is conclusion (2) of Fubini) and hence finite for almost every y; from
$\int_{\mathbb{R}^{d_1}} f^y(x)\,dx < \infty$ for almost every
y we conclude that $f^y$ is integrable (which is conclusion (1) of Fubini).

Step 3.
(a) Let E be a bounded open cube in Rd , E = Q1 × Q2 with Qi an open cube in
Rdi . For each fixed y, the characteristic function χE (x, y) is measurable
in x and integrable with (recall the notation (4))
\[ g(y) = \int_{\mathbb{R}^{d_1}} \chi_E(x, y)\,dx = |Q_1|\,\chi_{Q_2}(y) \]
as it gives the volume of Q1 if y ∈ Q2 and zero otherwise (draw a picture!).
Now g (y) is measurable and integrable with
\[ \int_{\mathbb{R}^{d_2}} g(y)\,dy = |Q_1|\,|Q_2| . \]
Since also $\int_{\mathbb{R}^d} \chi_E(x, y) = |E| = |Q_1|\,|Q_2|$ we have established all three
conclusions of Fubini’s theorem and hence that χE ∈ F .
(b) Let E be the boundary of a closed cube. We have $\int_{\mathbb{R}^d} \chi_E(x, y) = 0$
since the boundary is a measure zero set in Rd . On the other hand, we
observe that for almost every y, the slice $E^y$ has measure 0 (what are
the exceptions? draw a picture!). Hence $g(y) = \int_{\mathbb{R}^{d_1}} \chi_E(x, y)\,dx = 0$
for a.e. y. Since g (y) is zero almost everywhere, it is integrable with
$\int_{\mathbb{R}^{d_2}} g(y)\,dy = 0$. This establishes all three conclusions of Fubini and
hence χE ∈ F .
(c) Let E be a finite union of almost disjoint closed cubes, $E = \bigcup_{k=1}^{K} Q_k$. If
we let Q̃k denote the interior of Qk we can write
\[ \chi_E = \sum_k \chi_{\tilde{Q}_k} + \sum_\ell \chi_{A_\ell} \]
where Aℓ denotes the various (finitely many!) boundary components of
the finite union. Step 1 immediately gives χE ∈ F .
(d) Let E be open and of finite measure. By Theorem 2.1 we can write
$E = \bigcup_{j=1}^{\infty} Q_j$. We define the sequence (fk ) of integrable functions $f_k = \chi_{\bigcup_{j=1}^{k} Q_j}$
which by the previous step is a sequence in F . Clearly also fk ↗ χE and
$\chi_E \in L^1(\mathbb{R}^d)$ since E has finite measure. Step 2 implies χE ∈ F .
(e) Let E be a Gδ of finite measure, i.e. $E = \bigcap_{j=1}^{\infty} \tilde{U}_j$ with Ũj open. Since E
has finite measure we can find a Ũ0 open with E ⊂ Ũ0 and $m(\tilde{U}_0) < \infty$.
Then the sequence
\[ U_k = \bigcap_{j=1}^{k} \tilde{U}_j \cap \tilde{U}_0 \]
is a sequence of open sets which decreases to E, hence fk = χUk ↘ χE
and by Step 2 we conclude χE ∈ F .
(f) Let E be a set of measure zero. There is a Gδ -set G with E ⊂ G and
m (G) = 0 (why?). We know that χG ∈ F by the previous step and from
\[ \int_{\mathbb{R}^{d_2}} dy \int_{\mathbb{R}^{d_1}} dx\, \chi_G(x, y) = \int_{\mathbb{R}^d} \chi_G = 0 \]
we infer $\int_{\mathbb{R}^{d_1}} \chi_G(x, y)\,dx = 0$ for a.e. y. Since 0 ≤ χE ≤ χG , the same
statement holds for χE . The three conclusions of Fubini are now immediate
and we conclude χE ∈ F .
(g) Let E be an arbitrary measurable set of finite measure. By Proposition
2.4 we can write E = G \ N for G a Gδ -set and N a set of measure
zero contained in G. Therefore χE = χG − χN and since this is a finite
linear combination of functions belonging to F we conclude by Step 1 that
χE ∈ F .
Step 4. We now conclude the proof. If $f \in L^1(\mathbb{R}^d)$ we have f = f + − f −
and we will show f + ∈ F and f − ∈ F separately. To show f + ∈ F pick (by
Theorem 2.7) an increasing sequence of simple functions (ϕk ) with ϕk ↗ f + .
Each ϕk is in F by Steps 1, 2 and 3 and the definition of a simple function. By
Step 2 we conclude f + ∈ F . Of course f − ∈ F is proven analogously.

4 Differentiation and Integration
Now that we have defined a new integral, the Lebesgue integral, we shall investi-
gate its relation with differentiation. In your first year analysis courses you met
this relation as the Fundamental Theorem of Calculus (involving the Riemann
integral).

4.1 Differentiation of the Integral


We first would like to investigate whether the following theorem is true:
Theorem 4.1. Given f : R → R integrable on [a, b], the function
\[ F(x) = \int_a^x f(y)\,dy , \qquad a \le x \le b \tag{27} \]
is differentiable for almost every x ∈ [a, b] and F ′ (x) = f (x) holds for a.e. x ∈
[a, b].
Note that if f is continuous this statement holds by the fundamental theorem
of calculus. The above theorem indeed turns out to be true as stated. To prove
it, we will be led to the averaging problem.
It is easy to see that Theorem 4.1 follows if we can show that for almost
every x we have
\[ \lim_{h\to 0^+} \frac{1}{h} \int_x^{x+h} f(y)\,dy = \lim_{h\to 0^+} \frac{1}{h} \int_{x-h}^{x} f(y)\,dy = f(x) . \tag{28} \]

We reformulate this as the averaging problem:


1
Z
(AVP) Does lim f (y) dy = f (x) hold for a.e. x,
|I|→0 |I| I
x∈I

where I denotes a (say open) interval containing x. Below we will study the
averaging problem in dimension d. More precisely, we will prove the following
Theorem 4.2. (Lebesgue-Differentiation-Theorem) Suppose f is integrable on
Rd . Then
\[ \lim_{\substack{m(B)\to 0 \\ x\in B}} \frac{1}{m(B)}\int_B f(y)\,dy = f(x) \quad \text{holds for a.e. } x, \]

where B denotes an open ball containing x.


By repeating the proof of Theorem 4.2 with the limits in (28) replacing the
expressions in Theorem 4.2, we also obtain
Theorem 4.3. Suppose f is integrable on R. Then (28) holds for a.e. x hence
proving Theorem 4.1.
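
To get a feeling for the "almost every x" in the averaging problem, here is a small sketch in exact rational arithmetic. It is an illustration only, under the assumption f = χ_[0,1] in dimension one and symmetric intervals (x − h, x + h); the function name is made up for this example. At the interior point x = 1/2 the averages equal f(x) = 1 for all small h, while at the jump point x = 0 they stay at 1/2, which is neither the value f(0) = 1 nor a contradiction, since a single point has measure zero.

```python
from fractions import Fraction

def average_of_indicator(x, h):
    """Average of f = indicator of [0, 1] over the interval (x - h, x + h)."""
    left, right = x - h, x + h
    # Length of the overlap of (x - h, x + h) with [0, 1].
    overlap = max(Fraction(0), min(right, Fraction(1)) - max(left, Fraction(0)))
    return overlap / (2 * h)

radii = [Fraction(1, 10**k) for k in range(1, 6)]
interior = [average_of_indicator(Fraction(1, 2), h) for h in radii]
jump = [average_of_indicator(Fraction(0), h) for h in radii]
print(interior)  # averages around x = 1/2
print(jump)      # averages around the jump point x = 0
```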

4.1.1 Proof of the Lebesgue Differentiation Theorem
To prove Theorem 4.2 we shall use the observation that the Theorem is true
for f being continuous and that the continuous functions are dense in L1 . To
estimate the error-terms that arise in the approximation by continuous functions
we shall need the important Hardy-Littlewood maximal function.
Definition 4.1. Let f : Rd → R be integrable. The maximal function f ⋆ is
defined by
\[ f^\star(x) = \sup_{B \ni x} \frac{1}{m(B)}\int_B |f(y)|\,dy \]

where the sup is taken over all balls containing x.


Compared with the averages on the left hand side of the equality in Theorem
4.2 we replace f by its absolute value and instead of taking the limit for small
balls we take the sup over all balls which contain x.
Remark 4.1. There are other variants of the maximal function. One can replace
the balls by balls centred around x or the balls by cubes (or more general sets
of “bounded eccentricity”). Furthermore, in the 1-dimensional case one can define
\[ f_R^\star(x) = \sup_{h>0} \frac{1}{h}\int_x^{x+h} |f(y)|\,dy \quad \text{and} \quad f_L^\star(x) = \sup_{h>0} \frac{1}{h}\int_{x-h}^{x} |f(y)|\,dy . \]
All results about the maximal function proven in Proposition 4.1 below also hold for
these variants (the proof being exactly the same).
Proposition 4.1. Let f : Rd → R be integrable. Then the maximal function
f ⋆ satisfies:
(1) f ⋆ is measurable
(2) f ⋆ (x) < ∞ for a.e. x.
(3) f ⋆ satisfies the estimate
\[ m\big(\{x \in \mathbb{R}^d \mid f^\star(x) > \alpha\}\big) \le \frac{3^d}{\alpha}\,\|f\|_{L^1(\mathbb{R}^d)} . \]

As we shall see, f ⋆ is in general not integrable, i.e. not in L^1(R^d). The
conclusion (3) serves as a substitute: It does not control the L^1-norm of f ⋆
but the measure of the set on the left. To understand this better, recall that
by Chebychev’s inequality (Example Sheet 2) we have for arbitrary g ∈ L^1 the
inequality
\[ m\big(\{x \in \mathbb{R}^d \mid g(x) > \alpha\}\big) \le \frac{1}{\alpha}\,\|g\|_{L^1(\mathbb{R}^d)} . \]
Hence controlling the left hand side for f ⋆ in (3) is indeed weaker than controlling
the L^1-norm of f ⋆ (which, as mentioned, is generally impossible).
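
A concrete instance may help here. The following sketch assumes d = 1 and f = χ_[0,1] (so ‖f‖_{L^1} = 1); the closed form for f ⋆ used below is computed by hand and is an assumption of this example, not something proven in the notes: f ⋆(x) = 1 on [0, 1], f ⋆(x) = 1/x for x > 1 and f ⋆(x) = 1/(1 − x) for x < 0. In particular f ⋆ decays only like 1/|x| and is therefore not integrable, while the level sets still satisfy the bound in (3).

```python
from fractions import Fraction

def maximal_indicator(x):
    """Hand-computed maximal function of f = indicator of [0, 1] in d = 1."""
    x = Fraction(x)
    if 0 <= x <= 1:
        return Fraction(1)
    if x > 1:
        return 1 / x          # best interval is (0, x + small)
    return 1 / (1 - x)        # best interval is (x - small, 1)

def level_set_measure(alpha):
    """m({f* > alpha}) for 0 < alpha < 1: the level set is (1 - 1/alpha, 1/alpha)."""
    alpha = Fraction(alpha)
    return 2 / alpha - 1

alphas = [Fraction(1, k) for k in range(2, 8)]
weak_bounds = [(level_set_measure(a), 3 / a) for a in alphas]  # (measure, 3^d/alpha * ||f||)
print(weak_bounds)
```

Note that for these α the level set has measure 2/α − 1 > 1/α, so the Chebychev-type bound with constant 1 fails for f ⋆ and a larger constant such as 3^d is really needed.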
Proof. In order not to disturb the proof of Theorem 4.2 we postpone the proof
to Section 4.1.2.
Proof of Theorem 4.2. It suffices to show that for each α > 0 the set
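\[ E_\alpha = \Big\{ x \;\Big|\; \limsup_{\substack{m(B)\to 0 \\ x\in B}} \Big| \frac{1}{m(B)}\int_B f(y)\,dy - f(x) \Big| > 2\alpha \Big\} \]
has measure zero.7 To do this, we fix α and show that for any ε > 0 the measure
m (Eα ) is bounded by a constant (depending only on α and d) times ε, hence m (Eα ) = 0.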
Fix α and let ε > 0. We choose a continuous function of compact support g
with
\[ \|f - g\|_{L^1(\mathbb{R}^d)} < \varepsilon . \]
Since g is continuous, we have for all x that (why?)
\[ \lim_{\substack{m(B)\to 0 \\ x\in B}} \frac{1}{m(B)}\int_B g(y)\,dy = g(x) . \]

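We estimate the difference in the expression defining Eα by
\begin{align*}
\Big| \frac{1}{m(B)}\int_B f(y)\,dy - f(x) \Big| \le{} & \Big| \frac{1}{m(B)}\int_B \big(f(y) - g(y)\big)\,dy \Big| \\
& + \Big| \frac{1}{m(B)}\int_B g(y)\,dy - g(x) \Big| + |g(x) - f(x)| . \tag{29}
\end{align*}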

Taking the lim sup we observe that the second term on the right goes to zero
while the first term can be estimated by the maximal function of f − g, as clearly the
lim sup is dominated by the sup over all balls. Hence
\[ \limsup_{\substack{m(B)\to 0 \\ x\in B}} \Big| \frac{1}{m(B)}\int_B f(y)\,dy - f(x) \Big| \le (f - g)^\star(x) + |g(x) - f(x)| . \tag{30} \]

If we now define the sets

Fα = {x | (f − g)⋆ (x) > α} and Gα = {x | |f (x) − g(x)| > α} ,

then Eα ⊂ Fα ∪ Gα , as clearly at least one summand on the right of (30) has to be bigger
than α in order for the sum to be bigger than 2α, as is
required by the definition of Eα . But then
\[ m(G_\alpha) \le \frac{1}{\alpha}\,\|f - g\|_{L^1(\mathbb{R}^d)} \quad \text{by Chebychev,} \tag{31} \]
\[ m(F_\alpha) \le \frac{3^d}{\alpha}\,\|f - g\|_{L^1(\mathbb{R}^d)} \quad \text{by (3) of Proposition 4.1.} \tag{32} \]
We conclude
\[ m(E_\alpha) \le \frac{3^d}{\alpha}\,\varepsilon + \frac{1}{\alpha}\,\varepsilon \]
and since we have shown this for arbitrary ε > 0, m (Eα ) = 0.
7 Here
\[ \limsup_{\substack{m(B)\to 0 \\ x\in B}} |...| := \inf_{\delta>0} \ \sup_{\substack{B \ni x \\ m(B) < \delta}} |...| . \]
If the Eα have measure zero, then ∪_{n=1}^∞ E_{1/n} has measure zero and hence lim sup |...| = 0 for
almost every x. This implies that the limit in Theorem 4.2 exists and is equal to f (x) for
a.e. x.

4.1.2 Proof of Proposition 4.1
The first assertion follows from observing that Eα = {x ∈ R^d | f ⋆ (x) > α}
is open. Indeed, if x ∈ Eα , there is an open ball B with x ∈ B and
\[ \frac{1}{m(B)}\int_B |f(y)|\,dy > \alpha . \]
But then, since a small neighborhood of x is also contained
in B, we have f ⋆ (x̃) > α for all points x̃ in that neighborhood.
Assertion (2) follows from (3) by observing that
\[ \{x \mid f^\star(x) = \infty\} = \bigcap_{n=1}^{\infty} \{x \in \mathbb{R}^d \mid f^\star(x) > n\} \]
and hence by monotonicity and (3), m ({x | f ⋆ (x) = ∞}) ≤ (3^d / n) ‖f‖_{L^1} for any n,
which proves m ({x | f ⋆ (x) = ∞}) = 0 as desired.
To prove (3) we use the following version of the Vitali Covering Lemma
Lemma 4.1. Suppose B = {B1 , B2 , ..., BN } is a finite collection of open balls
in Rd . Then there exists a disjoint subcollection Bi1 , ..., Bik of B that satisfies
\[ m\Big( \bigcup_{n=1}^{N} B_n \Big) \le 3^d \sum_{j=1}^{k} m\big(B_{i_j}\big) . \]

In other words, we can always pick a disjoint subcollection which covers
a fixed fraction of the original collection, the fraction not depending on the
number of balls in the collection.
Proof. The proof relies on the simple observation that if two balls B and B ′
intersect and ρ(B) ≥ ρ(B ′ ) (with ρ denoting the radius), then B ′ is contained
in a ball B̃ which is concentric with B and has 3 times the radius of B (draw a
picture!).
We then construct the subcollection as follows. In the first step we pick Bi1
to be the largest ball in the collection. We then delete from the collection B the
ball Bi1 and all balls intersecting it.
In the second step, we pick Bi2 to be the largest ball in the remaining
collection and delete from the latter Bi2 and all balls intersecting Bi2 . And so
on.
After finitely many steps we obtain a disjoint subcollection of balls Bi1 , ...,
Bik which is such that the union of the B̃ij (the balls with three times the
radius) contains the union of the original Bn . Therefore
\[ m\Big( \bigcup_{n=1}^{N} B_n \Big) \le m\Big( \bigcup_{j=1}^{k} \tilde{B}_{i_j} \Big) \le \sum_{j=1}^{k} m\big(\tilde{B}_{i_j}\big) = 3^d \sum_{j=1}^{k} m\big(B_{i_j}\big) , \]

where we have used the dilation invariance of the Lebesgue measure in the last
step.
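
The selection procedure in the proof is a greedy algorithm and can be tried out directly. The sketch below is an illustration in dimension d = 1 only, where "balls" are open intervals given as (centre, radius); the function names and the sample collection are made up for this example. Processing the balls in order of decreasing radius and keeping a ball exactly when it is disjoint from all balls kept so far is the same as repeatedly picking the largest remaining ball and deleting everything that intersects it.

```python
def greedy_disjoint(balls):
    """Greedy largest-first selection of pairwise disjoint open intervals.

    balls: list of (centre, radius) pairs describing (centre - radius, centre + radius).
    """
    chosen = []
    for c, r in sorted(balls, key=lambda ball: ball[1], reverse=True):
        # Two open intervals are disjoint iff the centres are at least r + r2 apart.
        if all(abs(c - c2) >= r + r2 for c2, r2 in chosen):
            chosen.append((c, r))
    return chosen

def union_length(balls):
    """Lebesgue measure of the union of the intervals (merge sorted intervals)."""
    total, cur_lo, cur_hi = 0, None, None
    for lo, hi in sorted((c - r, c + r) for c, r in balls):
        if cur_hi is None or lo > cur_hi:
            if cur_hi is not None:
                total += cur_hi - cur_lo
            cur_lo, cur_hi = lo, hi
        else:
            cur_hi = max(cur_hi, hi)
    if cur_hi is not None:
        total += cur_hi - cur_lo
    return total

balls = [(0, 2), (1, 1), (3, 1), (5, 1), (6, 2)]
chosen = greedy_disjoint(balls)
selected_length = sum(2 * r for _, r in chosen)
print(chosen, union_length(balls), 3 * selected_length)
```

In this sample the selected balls happen to cover the whole union up to endpoints, so the factor 3^d = 3 is far from being attained; the lemma only guarantees the inequality.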
With the Lemma we can prove (3) of the Proposition. We let
Eα = {x | f ⋆ (x) > α} .
Given x ∈ Eα there exists a ball Bx containing x with
\[ \frac{1}{m(B_x)}\int_{B_x} |f(y)|\,dy > \alpha \quad \text{or equivalently} \quad m(B_x) < \frac{1}{\alpha}\int_{B_x} |f(y)|\,dy . \tag{33} \]

We fix an arbitrary compact set K ⊂ Eα . We have K ⊂ ∪_{x∈Eα} Bx and by
compactness, K ⊂ ∪_{n=1}^N Bn for a finite subcollection. Now
\[ m(K) \le m\Big( \bigcup_{n=1}^{N} B_n \Big) \le 3^d \sum_{j=1}^{k} m\big(B_{i_j}\big) \le \frac{3^d}{\alpha} \sum_{j=1}^{k} \int_{B_{i_j}} |f(y)|\,dy \]
where we have used the monotonicity in the first, the covering Lemma in the
second and (33) in the third step. Since the balls Bij are disjoint, it is now clear that we have
\[ m(K) \le \frac{3^d}{\alpha} \int_{\bigcup_j B_{i_j}} |f(y)|\,dy \le \frac{3^d}{\alpha} \int_{\mathbb{R}^d} |f(y)|\,dy . \]
Since this holds for all compact subsets K of Eα the estimate holds for Eα itself.
(Exhaust Eα by increasing compact sets and take the limit.)

4.1.3 Final Remarks


Note that applying the Lebesgue Differentiation Theorem to |f | one obtains
Corollary 4.1. We have f ⋆ (x) ≥ |f (x)| for almost every x.
Secondly, note that the Lebesgue Differentiation Theorem holds under the
weaker hypothesis that f ∈ L^1_loc(R^d), since differentiability is a local property.
Here f ∈ L^1_loc(R^d) if for every ball B the function f (x)χB (x) is integrable. For
instance f (x) = e^{|x|^2} is in L^1_loc(R^d) but not integrable.


4.2 Differentiation of functions


We now want to ask the following question: What conditions on F : R → R
guarantee that a) F is differentiable almost everywhere and b) the identity
\[ F(b) - F(a) = \int_a^b F'(x)\,dx \tag{34} \]
holds. We know that if F is continuously differentiable, then the above identity
holds by the fundamental theorem of calculus.8 We also know, by what we
did in the previous section, that if F is given as the indefinite integral of an
integrable function, F (x) = ∫_a^x f (y) dy, then a) and b) are true. But how do we
characterise such functions? This circle of questions leads us to the study of
functions of bounded variation, which itself is intimately connected to the study
of rectifiability of curves in the plane.

4.2.1 Functions of bounded variation


Let γ be a curve in the plane γ : [a, b] → R2 with γ(t) = (x(t), y(t)), where we
assume x and y to be continuous real valued functions on the interval [a, b].
We say that γ is rectifiable if there exists an M < ∞ such that for any
partition a = t0 < t1 < ... < tN = b of [a, b] we have
\[ \sum_{j=1}^{N} |\gamma(t_j) - \gamma(t_{j-1})| \le M . \]

8 In fact, F differentiable everywhere on [a, b] and F ′ Riemann integrable is sufficient.

The length of a rectifiable curve is defined as the smallest such M or,
equivalently, as the sup over all partitions, i.e.
\[ L(\gamma) = \sup_{a = t_0 < t_1 < ... < t_N = b} \ \sum_{j=1}^{N} |\gamma(t_j) - \gamma(t_{j-1})| . \]

The question we now ask is: What conditions on x(t) and y(t) for a given curve
guarantee that the curve is rectifiable? If x and y are continuously differentiable,
we know that this is the case and we can even establish the formula
\[ L(\gamma) = \int_a^b |\dot{\gamma}(t)|\,dt = \int_a^b \sqrt{\dot{x}^2(t) + \dot{y}^2(t)}\,dt . \]
But what about weaker conditions?
Let F : [a, b] → R be a function, not necessarily continuous. Then given a
partition P of [a, b], say a = t0 < t1 < ... < tN = b, we define
\[ V_{F,P} = \sum_{j=1}^{N} |F(t_j) - F(t_{j-1})| \quad \text{to be the variation of } F \text{ on } P . \]

Definition 4.2. The function F : [a, b] → R is of bounded variation if there
exists an M < ∞ such that VF,P ≤ M holds for all partitions P of [a, b].
Observation 4.1. If P1 is a refinement of P2 (meaning that every point of P2
is also a point in P1 ), then VF,P1 ≥ VF,P2 .
This is easily seen by considering a finite sequence of one-point refinements
and the triangle inequality.
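
The effect of a refinement can be seen on a small example. The sketch below is an illustration only: the function F(t) = (t − 1/2)^2 on [0, 1] and the two partitions are chosen for this example, and exact rational arithmetic is used to avoid rounding issues. The coarse partition {0, 1} does not see any variation at all, while adding the point 1/2 reveals it.

```python
from fractions import Fraction

def variation(F, partition):
    """V_{F,P}: sum of |F(t_j) - F(t_{j-1})| over consecutive points of P."""
    return sum(abs(F(t1) - F(t0)) for t0, t1 in zip(partition, partition[1:]))

def F(t):
    return (t - Fraction(1, 2)) ** 2

coarse = [Fraction(0), Fraction(1)]
fine = [Fraction(0), Fraction(1, 2), Fraction(1)]  # one-point refinement of coarse

print(variation(F, coarse), variation(F, fine))
```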
Theorem 4.4. A curve γ : [a, b] → R2 , γ(t) = (x(t), y(t)) is rectifiable if and
only if both the functions x and y are of bounded variation.
Proof. This follows by observing that for any partition P we have
\begin{align*}
\sum_{j=1}^{N} \sqrt{(x(t_j) - x(t_{j-1}))^2 + (y(t_j) - y(t_{j-1}))^2}
&\le \sum_{j=1}^{N} \big( |x(t_j) - x(t_{j-1})| + |y(t_j) - y(t_{j-1})| \big) \\
&\le 2 \sum_{j=1}^{N} \sqrt{(x(t_j) - x(t_{j-1}))^2 + (y(t_j) - y(t_{j-1}))^2} . \tag{35}
\end{align*}

4.2.2 Examples of functions of bounded variation


1. If F : [a, b] → R is monotone and bounded, then F is of bounded variation.
To see this, say that F is increasing (otherwise consider −F ) and |F (x)| ≤
M . Then we have for any partition with a = t0 < t1 < ... < tN = b that
\[ \sum_{j=1}^{N} |F(t_j) - F(t_{j-1})| = \sum_{j=1}^{N} \big( F(t_j) - F(t_{j-1}) \big) = F(b) - F(a) \le 2M . \]

2. If F : [a, b] → R is Lipschitz on [a, b] then F is of bounded variation.
To see this, let |F (y) − F (x)| ≤ L|y − x| for all x, y ∈ [a, b].9 But then
\[ \sum_{j=1}^{N} |F(t_j) - F(t_{j-1})| \le \sum_{j=1}^{N} L\,|t_j - t_{j-1}| = L(b - a) \]

holds for any partition.


3. Let a, b > 0. The function F : [0, 1] → R given by
\[ F(x) = \begin{cases} x^a \sin(x^{-b}) & \text{if } 0 < x \le 1 \\ 0 & \text{if } x = 0 \end{cases} \]

is of bounded variation on [0, 1] if and only if a > b. See Exercise Sheet 7.

Note that (as the last example shows) F continuous does not imply F of bounded
variation (and neither the other way around, as a simple jump function shows)!

4.2.3 Characterisation of functions of bounded variation


The next theorem shows that the first example above in some sense captures all
functions of bounded variation:
Theorem 4.5. A real valued function F : [a, b] → R is of bounded variation if
and only if F is the difference of two increasing functions.
Proof. We first define the following functions:
\[ T_F(a, x) = \sup_{P \text{ of } [a,x]} \ \sum_{j=1}^{N} |F(t_j) - F(t_{j-1})| \]
the total variation of F on [a, x] for a ≤ x ≤ b,
\[ P_F(a, x) = \sup_{P \text{ of } [a,x]} \ \sum_{(+)} \big( F(t_j) - F(t_{j-1}) \big) \]
the positive variation of F (where the sum is over those j for which F (tj ) −
F (tj−1 ) ≥ 0), and
\[ N_F(a, x) = \sup_{P \text{ of } [a,x]} \ \sum_{(-)} -\big( F(t_j) - F(t_{j-1}) \big) \]
the negative variation of F (where the sum is over those j for which F (tj ) −
F (tj−1 ) ≤ 0). Observe that, if F is of bounded variation, TF and also PF and NF
are increasing and bounded as functions of x,
hence functions of bounded variation. The functions are related as follows:
Lemma 4.2. Suppose F : [a, b] → R is of bounded variation on [a, b]. Then for
all a ≤ x ≤ b we have the relations:

F (x) − F (a) = PF (a, x) − NF (a, x)

TF (a, x) = PF (a, x) + NF (a, x)


9 Note this is in particular satisfied if F is differentiable everywhere and |F ′ | ≤ L (use the
mean value theorem).

Proof. Note that the above relations would clearly hold for any partition if there
was no sup in the definition of NF , PF and TF . The idea of the proof is therefore
to borrow an ε. Let ε > 0 be given. We pick partitions P1 and P2 such that
\[ 0 \le P_F - \sum_{(+) \in P_1} \big( F(t_j) - F(t_{j-1}) \big) < \varepsilon , \qquad 0 \le N_F - \sum_{(-) \in P_2} -\big( F(t_j) - F(t_{j-1}) \big) < \varepsilon . \]
Refining to a common partition P we have (why?)
\[ 0 \le P_F - \sum_{(+) \in P} \big( F(t_j) - F(t_{j-1}) \big) < \varepsilon , \qquad 0 \le N_F - \sum_{(-) \in P} -\big( F(t_j) - F(t_{j-1}) \big) < \varepsilon . \]
Now since for the partition P we have
\[ F(x) - F(a) = \sum_{(+) \in P} \big( F(t_j) - F(t_{j-1}) \big) + \sum_{(-) \in P} \big( F(t_j) - F(t_{j-1}) \big) \]
we can add NF − PF on both sides and use the above estimates to obtain
\[ |F(x) - F(a) - P_F + N_F| \le 2\varepsilon . \]

Since ε was arbitrary the first relation is established. For the second relation
we note that for any partition P of [a, x], a = t0 < t1 < ... < tN = x we have
\[ \sum_{j=1}^{N} |F(t_j) - F(t_{j-1})| = \sum_{(+) \in P} \big( F(t_j) - F(t_{j-1}) \big) + \sum_{(-) \in P} -\big( F(t_j) - F(t_{j-1}) \big) . \]
It follows that TF ≤ PF + NF (why?). For the other direction, start from
\[ \sum_{(+) \in P} \big( F(t_j) - F(t_{j-1}) \big) + \sum_{(-) \in P} -\big( F(t_j) - F(t_{j-1}) \big) \le T_F \tag{36} \]
which holds for any partition. Given ε > 0 find partitions P1 and P2 with
\[ 0 \le P_F - \sum_{(+) \in P_1} \big( F(t_j) - F(t_{j-1}) \big) < \varepsilon , \qquad 0 \le N_F - \sum_{(-) \in P_2} -\big( F(t_j) - F(t_{j-1}) \big) < \varepsilon . \]
In particular, refining to a common partition P the two estimates hold replacing
P1 and P2 by P. Combining this with (36) yields PF + NF ≤ TF + 2ε and since
ε was arbitrary PF + NF ≤ TF as desired.
Using the Lemma, we easily complete the proof of Theorem 4.5. Note that if
F1 and F2 are increasing and bounded, then they both are of bounded variation
and hence their difference is of bounded variation. Conversely if F is of bounded
variation, we can set

F1 (x) = PF (a, x) + F (a) and F2 (x) = NF (a, x)

Both F1 and F2 are increasing and bounded and their difference is F (x) by the
Lemma.
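
As remarked in the proof of Lemma 4.2, the two relations hold exactly on any fixed partition, before taking the sup. The following sketch checks this numerically; it is an illustration only, with F(t) = (t − 1/2)^2 on [0, 1] and a uniform partition chosen for this example (the helper name is made up). Because this particular partition contains the turning point 1/2 of F, the partition sums already coincide with the sups PF (0, 1), NF (0, 1) and TF (0, 1).

```python
from fractions import Fraction

def partition_sums(F, partition):
    """Return the (positive, negative, total) variation sums of F on one partition."""
    increments = [F(t1) - F(t0) for t0, t1 in zip(partition, partition[1:])]
    positive = sum(d for d in increments if d > 0)
    negative = sum(-d for d in increments if d < 0)
    total = sum(abs(d) for d in increments)
    return positive, negative, total

def F(t):
    return (t - Fraction(1, 2)) ** 2

partition = [Fraction(k, 4) for k in range(5)]  # 0, 1/4, 1/2, 3/4, 1
positive, negative, total = partition_sums(F, partition)
print(positive, negative, total)
```

Here F(1) − F(0) = 0 is recovered as the difference of the positive and negative sums, while their sum gives the total variation.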

4.2.4 Bounded variation implies differentiable a.e.
Now that we have introduced the class of functions of bounded variation we can
answer the question a) posed at the beginning of Section 4.2 by the following
Theorem 4.6. If F : [a, b] → R is of bounded variation, then F is differentiable
a.e., i.e. the limit
\[ \lim_{h \to 0} \frac{F(x + h) - F(x)}{h} \]
exists for almost every x ∈ [a, b].
The proof of this theorem is quite intricate and we will postpone it to Sec-
tion 4.2.7, where we prove it under the additional assumption that F is also
continuous. For the general case, see Stein-Shakarchi.
Even the weaker statement (assuming that F is also continuous) leads to
the following corollaries, the first one following from the earlier observation that
Lipschitz functions are of bounded variation:
Corollary 4.2 (Rademacher’s theorem in 1 dimension). If F : [a, b] → R is
Lipschitz, then it is differentiable a.e.
Corollary 4.3. If F : [a, b] → R is increasing and continuous, then F ′ exists
almost everywhere. Moreover F ′ is integrable and
\[ \int_a^b F'(x)\,dx \le F(b) - F(a) . \tag{37} \]
a

Remark 4.2. Recall that ideally we would like to show the equality (34) instead
of the inequality in the corollary. However, the example of the Cantor-Lebesgue
function, studied in detail on Example Sheets 1, 3 and 8, shows that one
cannot expect equality to hold without additional assumptions (beyond bounded
variation) on F . The additional condition guaranteeing (34) will be that of
absolute continuity. See Section 4.2.5.
Proof. Note that F is certainly in BV and hence the derivative F ′ exists a.e. by
Theorem 4.6. In particular, for n ≥ 1 the difference quotients
\[ G_n(x) = \frac{F(x + \frac{1}{n}) - F(x)}{\frac{1}{n}} \ge 0 \]
form a sequence of non-negative measurable functions converging to F ′ , which
is hence a measurable function. Fatou’s Lemma (Lemma 3.2) applies and yields
\[ \int_a^b F'(x)\,dx = \int_a^b \lim_{n\to\infty} G_n(x)\,dx \le \liminf_{n\to\infty} \int_a^b G_n(x)\,dx . \]
We finally extend F to a continuous function on all of R and observe that the
integral on the right hand side can be written as
\begin{align*}
\int_a^b G_n(x)\,dx &= \frac{1}{1/n} \int_a^b F(x + 1/n)\,dx - \frac{1}{1/n} \int_a^b F(x)\,dx \\
&= \frac{1}{1/n} \int_{a + \frac{1}{n}}^{b + \frac{1}{n}} F(y)\,dy - \frac{1}{1/n} \int_a^b F(y)\,dy \\
&= \frac{1}{1/n} \int_b^{b + \frac{1}{n}} F(y)\,dy - \frac{1}{1/n} \int_a^{a + \frac{1}{n}} F(y)\,dy \tag{38}
\end{align*}
from which it is manifest that the right hand side converges (independently of
the extension) to F (b) − F (a) as desired, in view of the continuity of F .

4.2.5 Absolute Continuity and the Fundamental Theorem of the Lebesgue integral
We will now tackle the problem of finding necessary and sufficient conditions
on F that guarantee the validity of the identity (34). It turns out that the
correct notion is that of absolute continuity:
Definition 4.3. The function F : [a, b] → R is absolutely continuous if for
any ε > 0 there exists a δ > 0 such that for any finite collection of disjoint intervals
(ak , bk ), k = 1, ..., N , contained in [a, b] we have
\[ \sum_{k=1}^{N} (b_k - a_k) < \delta \implies \sum_{k=1}^{N} |F(b_k) - F(a_k)| < \varepsilon . \]
One immediately deduces that absolute continuity implies continuity, in fact
uniform continuity (why?).
Exercise 4.1. Show that if F is Lipschitz, then it is absolutely continuous.
Give an example of a function which is not Lipschitz but absolutely continuous.
Lemma 4.3. If F : [a, b] → R is absolutely continuous, then it is of bounded
variation on [a, b].
Proof. Let δ be the δ associated with the choice ε = 1 in the definition of
absolute continuity. Fix once and for all a partition P with mesh-size smaller
than δ, say a = t0 < t1 < ... < tN = b. On each interval [ti−1 , ti ], the total
variation satisfies TF (ti−1 , ti ) ≤ 1 by the definition of absolute continuity. But since
TF (a, b) = Σ_{i=1}^N TF (ti−1 , ti ) ≤ N the lemma is proven.
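
The converse of Lemma 4.3 fails, and the Cantor-Lebesgue function mentioned in Remark 4.2 is the standard witness: it is increasing (hence of bounded variation) but not absolutely continuous. The sketch below illustrates this with exact rational arithmetic. It assumes the usual description of the Cantor-Lebesgue function via ternary digits; the function names are made up for this example. The 2^n intervals of the n-th stage of the Cantor construction have total length (2/3)^n, which becomes arbitrarily small, yet the increments of the function over them always add up to 1.

```python
from fractions import Fraction

def cantor(x, max_digits=64):
    """Cantor-Lebesgue function on [0, 1], evaluated via the ternary digits of x."""
    x = Fraction(x)
    if x >= 1:
        return Fraction(1)
    value, weight = Fraction(0), Fraction(1, 2)
    for _ in range(max_digits):
        if x == 0:
            break
        x *= 3
        digit = int(x)           # next ternary digit: 0, 1 or 2
        x -= digit
        if digit == 1:           # x lies in (the closure of) a removed middle third
            return value + weight
        value += weight * (digit // 2)
        weight /= 2
    return value

def stage_intervals(n):
    """The 2^n closed intervals remaining after n steps of the Cantor construction."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        refined = []
        for a, b in intervals:
            third = (b - a) / 3
            refined += [(a, a + third), (b - third, b)]
        intervals = refined
    return intervals

n = 5
intervals = stage_intervals(n)
total_length = sum(b - a for a, b in intervals)
total_rise = sum(cantor(b) - cantor(a) for a, b in intervals)
print(total_length, total_rise)
```

So for ε = 1 no δ as in Definition 4.3 can exist (shrinking the intervals slightly to make them disjoint and open does not change the picture, by continuity).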
Recall also the following fact, proven on Example Sheet 7:
Lemma 4.4. If F : [a, b] → R is continuous and of bounded variation, then its
total variation is continuous. As a consequence, the F1 and F2 in the decom-
position F = F1 − F2 of Theorem 4.5 are both continuous (in addition to being
increasing and bounded).
Why is the assumption of absolute continuity the correct notion to establish
the identity (34)? To see this, note first that if F : [a, b] → R is given by
\[ F(x) = \int_a^x f(y)\,dy \]
with f integrable on [a, b], then F is absolutely continuous. This follows from
Proposition 3.5 and hence justifies the name introduced there. From this observation
it is immediate that absolute continuity of F is a necessary condition if
F is to satisfy the identity (34) (with b replaced by any x ∈ [a, b]): Indeed, if (34) holds,
then F ′ is integrable and therefore the right hand side of (34) is absolutely continuous
as a function of the upper limit. Hence so is the left hand side.
The next theorem shows that absolute continuity is also a sufficient condi-
tion.

Theorem 4.7 (Fundamental Theorem of Lebesgue integration).
1. If F : [a, b] → R is absolutely continuous on [a, b], then F ′ exists almost
everywhere and is integrable. Moreover
\[ F(x) - F(a) = \int_a^x F'(y)\,dy \quad \text{holds for all } a \le x \le b . \tag{39} \]
2. If f is integrable on [a, b], there exists an absolutely continuous function
F such that F ′ (x) = f (x) a.e., for instance
\[ F(x) = \int_a^x f(y)\,dy . \]
Note that we already proved the second part of the theorem: Indeed we have
observed that the expression for F is absolutely continuous and by the Lebesgue
differentiation theorem we know that F ′ (x) = f (x) holds almost everywhere.
Hence the difficulty is proving the first part. I claim the first part will follow if
we can prove
Theorem 4.8. If F : [a, b] → R is absolutely continuous on [a, b], then F ′ exists
a.e. Moreover, if F ′ (x) = 0 for a.e. x, then F is constant.
Indeed, assuming Theorem 4.8 for the moment, the proof of Theorem 4.7
becomes rather short. We first observe that F absolutely continuous implies that
F is continuous and of bounded variation (Lemma 4.3) and hence that the F1
and F2 in the decomposition F = F1 − F2 are increasing and continuous (Lemma
4.4). Corollary 4.3 then implies that (F1 and F2 and hence) F is differentiable
a.e. and also that (F1′ and F2′ and hence) F ′ is integrable on [a, b].
We now define G(x) = ∫_a^x F ′ (y) dy. Clearly G is absolutely continuous and
hence so is G(x) − F (x). Theorem 4.1 implies that G′ (x) = F ′ (x) a.e. We
conclude that the function G − F is absolutely continuous and has derivative zero
a.e. and, applying Theorem 4.8, that G − F is constant. Observing (G − F )(x) =
(G − F )(a) = −F (a) produces the identity (39).
It remains to prove Theorem 4.8. Just like the proof of Theorem 4.6 (which
we postponed to Section 4.2.7), the proof is quite intricate and isolated in the
following subsection 4.2.6. While you don’t have to remember the details of
the proof of these theorems you should realise that they (together with the
Lebesgue differentiation theorem) are at the heart of the Fundamental Theorem
of Lebesgue integration, Theorem 4.7.

4.2.6 The proof of Theorem 4.8


The proof of Theorem 4.8 relies on a more elaborate version of the Vitali covering
lemma than the one we have already seen in Lemma 4.1. You may wonder at
this point why covering Lemmas appear at all in this context. In a typical
situation we would like to determine the measure of some set which we happen
to know is covered by balls. Determining the measure is of course much easier
if we can select a finite disjoint subcollection of balls which covers the set (or at
least a large fraction of it).
Definition 4.4. A collection B of balls {B} is a Vitali covering of a set E if
for every x ∈ E and η > 0 there is a ball B ∈ B such that x ∈ B and m(B) < η.

In other words, in a Vitali covering of E every point is covered by arbitrarily
small balls. The next lemma asserts that given a Vitali covering of a set of finite
measure, we can pick finitely many disjoint balls which cover the set E up to an arbitrary
prescribed δ > 0:
Lemma 4.5. Suppose E ⊂ Rd is a set of finite measure, m (E) < ∞, and B
is a Vitali covering of E. Then, for any δ > 0 we can find finitely many balls
B1 , B2 , ..., BN in B which are disjoint and so that
\[ \sum_{j=1}^{N} m(B_j) \ge m(E) - \delta . \]
Moreover, we can select the balls such that also
\[ m\Big( E \setminus \bigcup_{j=1}^{N} B_j \Big) < 2\delta . \]

Note that the first estimate alone does still allow a large fraction of E not
to be covered by balls since the positivity on the left hand side of the inequality
could come from a large ball which lies mostly outside of E. The second estimate
excludes that possibility stating that the set that is not covered by balls of the
finite subcollection is also small. (Draw some pictures!)
Proof. The idea of the proof of the Lemma is to approximate the set E from
inside using compact sets, and then use the elementary covering Lemma 4.1 to
extract a finite disjoint subcollection covering at least a part of E. One then
looks at the part not yet covered and – in case it is still too large – approximates
it again from inside by a compact set, applies the old covering lemma and so
on. This procedure will eventually lead to the δ approximation claimed.
The details are as follows. It clearly suffices to prove the estimates for
δ < m (E). Fix such a δ. Let us also pick an open set U with E ⊂ U and
m (U) < m (E) + δ.
Step 1: We pick a compact set E1′ ⊂ E with m (E1′ ) ≥ δ
(why can we do this? Recall m (E) > δ and Example Sheet 3!). We cover E1′ by balls from B such
that every ball of the covering also lies in U (this is possible because E and
hence E1′ are covered by balls of arbitrarily small radius). Using compactness
we choose a finite sub-collection of balls covering E1′ (and contained in U) and
using Lemma 4.1 a finite disjoint subcollection B1 , ..., BN1 such that
\[ \sum_{i=1}^{N_1} m(B_i) \ge \frac{1}{3^d}\, m(E_1') \ge \frac{\delta}{3^d} . \]

Step 2: If
\[ \sum_{i=1}^{N_1} m(B_i) \ge m(E) - \delta \]
then the first estimate is already proven. Otherwise, we have
\[ \sum_{i=1}^{N_1} m(B_i) < m(E) - \delta \]
and hence
\[ E_2 = E \setminus \bigcup_{i=1}^{N_1} B_i \]
has measure m (E2 ) > δ (why?). We then repeat the procedure of Step 1, i.e. we
find a compact subset E2′ ⊂ E2 with m (E2′ ) ≥ δ. We can cover the set E2′ by finitely
many balls contained in U and disjoint from ∪_{i=1}^{N1} Bi (why? – note that any
point in E2 has finite distance from ∪_{i=1}^{N1} Bi ). Using the old covering Lemma
4.1, we select a finite disjoint collection of these balls BN1 +1 , ..., BN2 such that
\[ \sum_{i=N_1+1}^{N_2} m(B_i) \ge \frac{1}{3^d}\, m(E_2') \ge \frac{\delta}{3^d} . \]
Overall, we now have a subcollection of disjoint balls B1 , ..., BN2 with
\[ \sum_{i=1}^{N_2} m(B_i) \ge 2\,\frac{\delta}{3^d} . \]

Step 3: We repeat Step 2, i.e. we ask whether
\[ \sum_{i=1}^{N_2} m(B_i) \ge m(E) - \delta \]
in which case we are done or
\[ \sum_{i=1}^{N_2} m(B_i) < m(E) - \delta \]
in which case we repeat the procedure in Step 2. If the procedure has not
terminated after k steps we have the estimate
\[ \sum_{i=1}^{N_k} m(B_i) \ge k\,\frac{\delta}{3^d} . \]
This yields a contradiction if k ≥ 3^d (m(E) − δ)/δ, because then the right hand side
is ≥ m (E) − δ in contradiction with the procedure not having terminated.
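To prove the second estimate, we observe that since all balls in the iteration
above are contained in U we have the disjoint union
\[ \Big( E \setminus \bigcup_{i=1}^{N} B_i \Big) \cup \Big( \bigcup_{i=1}^{N} B_i \Big) \subset U . \]
From this we deduce
\[ m\Big( E \setminus \bigcup_{i=1}^{N} B_i \Big) \le m(U) - m\Big( \bigcup_{i=1}^{N} B_i \Big) < m(E) + \delta - \big( m(E) - \delta \big) = 2\delta . \]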

Using the Lemma (in dimension d = 1), we can now complete the proof
of Theorem 4.8. Note that the difficulty is to prove that F ′ = 0 implies F is
constant, since the existence of F ′ follows from the fact that F is of bounded
variation and Theorem 4.6.
It clearly suffices to show F (b) = F (a) as we can then replace [a, b] by an
arbitrarily small subinterval. We let
E := {x ∈ (a, b) | F ′ (x) exists and is zero}
and we know m (E) = b − a by assumption. We have for each x ∈ E that
\[ \lim_{h \to 0} \frac{F(x + h) - F(x)}{h} = 0 . \]
Fix now ε > 0. In view of the existence of the above limit, we can find for each
η > 0 an open interval around each x ∈ E, Ix = (ax , bx ) ⊂ [a, b] with length
smaller than η, i.e. bx − ax < η, such that
|F (bx ) − F (ax )| ≤ ε(bx − ax ) .
The collection of these intervals forms a Vitali covering of the set E. Hence we
can apply Lemma 4.5: For any δ > 0 (which we will choose momentarily dependent
on ε) we can select finitely many disjoint intervals Ii = Ixi = (ai , bi )
with 1 ≤ i ≤ N such that
\[ \sum_{i=1}^{N} m(I_i) \ge m(E) - \delta = (b - a) - \delta . \]
On each interval Ii we have |F (bi ) − F (ai )| ≤ ε (bi − ai ) as this is a property of
all “balls” in the Vitali covering. Summing this inequality yields
\[ \sum_{i=1}^{N} |F(b_i) - F(a_i)| \le \varepsilon (b - a) \]
because the intervals (ai , bi ) are disjoint and contained in [a, b].
We now consider the complement of ∪_{j=1}^N Ij in [a, b], denoted A. It consists
of finitely many closed intervals, A = ∪_{k=1}^M [αk , βk ], with m (A) ≤ δ.
The idea is to use the absolute continuity on these disjoint intervals. Indeed,
we choose δ sufficiently small (depending only on ε) such that m (A) ≤ δ implies
\[ \sum_{k=1}^{M} |F(\beta_k) - F(\alpha_k)| \le \varepsilon . \]
As a consequence we then have from using the triangle inequality repeatedly
\[ |F(b) - F(a)| \le \sum_{j=1}^{N} |F(b_j) - F(a_j)| + \sum_{k=1}^{M} |F(\beta_k) - F(\alpha_k)| \le \varepsilon (b - a) + \varepsilon . \]
Since this holds for any ε > 0 we conclude F (b) = F (a) as desired.

4.2.7 The proof of Theorem 4.6 (non examinable)


I will add this non-examinable material to the notes at some point. It will not
be lectured in class.

5 Abstract Measure Theory
So far we have successfully dealt with the problem of defining a measure for sets
in Rd . We recall that the main steps of the analysis were
1. an elementary notion of measure for the simplest sets (rectangles or cubes)
2. the introduction of an exterior measure (defined on all subsets of Rd )
which assigned a “measure” as the infimum of countable coverings by
cubes and was consistent with the elementary measure on the rectangles
3. the introduction of the class of Lebesgue measurable sets which satisfied
the desired property of countable additivity
Given the class of Lebesgue measurable sets, we then defined measurable
functions f : Rd → R and developed an integration theory which allowed us to
integrate a much larger class of functions than the class of Riemann integrable
functions.
Our goal in this section is to develop the abstract framework that will allow
us to construct general measure spaces. The above “pedestrian” construction
of the Lebesgue measure on Rd can then be viewed as a particular example of
the abstract construction. More interestingly perhaps, the general construction
allows us to construct many interesting measure spaces which appear in probability
and geometric measure theory.

5.1 Measure Spaces: Definition and basic examples


Definition 5.1. A measure space consists of a set X equipped with two fun-
damental objects
1. A σ-algebra M of sets (i.e. a non-empty collection of subsets of X
which is closed under complements and countable unions and intersec-
tions), which are called measurable sets
2. A measure
µ : M → [0, ∞]
i.e. a function with the property of being countably additive, i.e. if E1 , E2 , ...
is a countable collection of disjoint sets in M, then
\[ \mu\Big( \bigcup_{n=1}^{\infty} E_n \Big) = \sum_{n=1}^{\infty} \mu(E_n) . \]
A measure space is denoted (X, M, µ). Sometimes (X, µ) or even just X is
used, if the σ-algebra and the measure that are being used are clear from the
context.
Exercise 5.1. Let (X, M, µ) be a measure space. Show that the measure is
monotone (i.e. A ⊂ B implies µ (A) ≤ µ (B) for A, B ∈ M) and that for any
countable collection of sets (En ) in M one has µ (∪_{n=1}^∞ En ) ≤ Σ_{n=1}^∞ µ (En ).

We make three more definitions of properties that a measure space might
have or not have. Their meaning will become clearer in the discussion of the
examples that follow immediately.

Definition 5.2.
1. We say (X, M, µ) is finite if µ (X) < ∞.

2. We say (X, M, µ) is σ-finite if there exists a countable collection (Ei )_{i=1}^∞
of sets of finite measure (µ (Ei ) < ∞) such that X = ∪_{i=1}^∞ Ei .
3. We say that (X, M, µ) is complete if the following statement is true:
Given any E ∈ M with µ (E) = 0, any F ⊂ E is also in M (and has
necessarily µ (F ) = 0).
We give a couple of examples of general measure spaces:

1. If X is a non-empty set and M = {X, ∅} we can define µ (X) arbitrarily
(and µ (∅) = 0) and obtain a (rather trivial) measure space.
2. If X = Rd and M is the collection of Lebesgue measurable sets, then for
any measurable non-negative function f : Rd → R, the function µ : M →
[0, ∞],
\[ \mu(E) = \int_E f\,dx , \]
defines a measure and hence (Rd , M, µ) is a measure space. [You should
verify this, i.e. check countable additivity using the properties of the integral
(additivity, MCT) proven earlier.] The choice f = 1 leads to the
familiar Lebesgue measure.
A variant of this example is produced by replacing the σ-algebra M with
the (smaller) σ-algebra of Borel sets on Rd , denoted M̃. Then (Rd , M̃, µ)
is also a measure space. However, unlike (Rd , M, µ) with f = 1, it is not complete.
To see this recall that there are Borel sets of measure zero which contain
sets which are only Lebesgue- but not Borel measurable.10
3. Let X be a non-empty set and M = P (X) be the σ-algebra of all subsets
of X. Fix x0 ∈ X. We can define the measure
\[ \mu(E) = \begin{cases} 1 & \text{if } x_0 \in E \\ 0 & \text{otherwise} \end{cases} \]
for E ∈ M. This is the so-called Dirac measure.



4. Let X = (xn )_{n=1}^∞ be a countable set and again M = P (X) be the σ-algebra
of all subsets of X. If (µn )_{n=1}^∞ is a sequence in [0, ∞] we can
define the measure µ by
µ ({xn }) = µn .
By countable additivity we have µ (E) = Σ_{xn ∈E} µn for E ∈ M. If
µn = 1 for all n the measure µ is called the counting measure as it counts
the elements of a set.
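
The last two examples are easy to play with on a finite set, where every subset is measurable and countable additivity reduces to finite additivity. The sketch below is an illustration only; the helper names `dirac` and `weighted` and the sample set are made up for this example.

```python
def dirac(x0):
    """Dirac measure at x0: mu(E) = 1 if x0 lies in E, and 0 otherwise."""
    return lambda E: 1 if x0 in E else 0

def weighted(weights):
    """Measure on a finite or countable set given by point masses mu({x}) = weights[x]."""
    return lambda E: sum(weights[x] for x in E)

X = {0, 1, 2, 3, 4}
delta = dirac(2)
counting = weighted({x: 1 for x in X})   # all point masses equal to 1

A, B = {0, 2}, {3, 4}                    # two disjoint subsets of X
print(delta(A), delta(B), delta(A | B))
print(counting(A), counting(B), counting(A | B))
```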
10 This can be seen from Exercise 5 on Example Sheet 3 where you constructed an injective,

strictly increasing function g : [0, 1] → [0, 1] whose image was contained in C. Since g is
monotone it is Borel measurable (Exercise 3 on Example Sheet 3), i.e. it pulls back Borel sets
to Borel sets. Taking N a non-measurable subset of [0, 1] we know that F = g (N ) is Lebesgue
measurable with measure zero because it is a subset of C, which has measure 0. However,
F = g (N ) cannot be Borel measurable because if it was, g −1 (F ) = N would have to be a
Borel set.

5.2 Exterior measure and Carathéodory’s theorem
While we already gave a few examples of general measure spaces, a natural
question is how to construct more interesting examples. Here the notion of an
exterior or outer measure is key.
Definition 5.3. Let X be a non-empty set. An exterior measure (or “outer
measure”) µ⋆ on X is a function µ⋆ : P (X) → [0, ∞] defined on all subsets of
X satisfying
1. µ⋆ (∅) = 0.
2. Monotonicity: If E1 ⊂ E2 then µ⋆ (E1 ) ≤ µ⋆ (E2 ).
3. Subadditivity: If E1 , E2 , ... is a countable family of sets, then

\[ \mu_\star\Big( \bigcup_{n=1}^{\infty} E_n \Big) \le \sum_{n=1}^{\infty} \mu_\star(E_n) . \]

We will give examples of exterior measures below (see Section ... for the
exterior Hausdorff measure), here we only note that the exterior measure we
defined on sets in Rd in Section 2.3 satisfies the definition.
We have now reached a critical point. The key step to define the Lebesgue
measure from the exterior measure was to give up on measuring all subsets
of Rd and instead define a class of measurable sets on which the measure was
countably additive. However, Definition 2.2 explicitly used the topology of
Rd , i.e. the notion of an open set, which a general measure space does not
come equipped with. Here Carathéodory found a very clever criterion which
works in the general case (and reduces to our old criterion in the Lebesgue case,
cf. Example Sheet 9):
Definition 5.4. A set E ⊂ X is (Carathéodory) measurable if for all sets
A ⊂ X one has

µ⋆ (A) = µ⋆ (E ∩ A) + µ⋆ (E c ∩ A) . (40)

In other words, a measurable set separates any set into two parts which
behave well with respect to the exterior measure. As mentioned, the condition
can be seen to be equivalent to the condition of being Lebesgue measurable in
the case of X = Rd and the exterior measure of Section 2.3 (Example Sheet 9).
Observation 5.1. To show that a set E ⊂ X is measurable, it suffices to check
whether the inequality

µ⋆ (A) ≥ µ⋆ (E ∩ A) + µ⋆ (E c ∩ A)

holds for all A ⊂ X, as the reverse inequality holds by the subadditivity of the
exterior measure.
The observation immediately implies that sets of exterior measure 0 are
measurable: if µ⋆ (E) = 0 then µ⋆ (E ∩ A) ≤ µ⋆ (E) = 0 by monotonicity, while
µ⋆ (A) ≥ µ⋆ (E c ∩ A) again holds by monotonicity.
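To see that condition (40) can genuinely fail, it may help to look at a toy case. The following two-point example is hypothetical (it does not appear in the notes) and is meant only as an illustration of how the criterion is tested.

```latex
% Hypothetical toy example (not from the notes): X = \{a, b\}.
\[
\mu_\star(\emptyset) = 0, \qquad
\mu_\star(\{a\}) = \mu_\star(\{b\}) = \mu_\star(X) = 1 .
\]
% Monotonicity and subadditivity can be checked by inspection.
% Test E = \{a\} against A = X in condition (40):
\[
\mu_\star(X) = 1 \;\neq\; 2
  = \mu_\star(\{a\} \cap X) + \mu_\star(\{a\}^c \cap X) ,
\]
% so \{a\} (and by symmetry \{b\}) is not measurable; only \emptyset and X are.
```

In this sketch the σ-algebra of measurable sets is {∅, X}, so the resulting measure space is of the trivial type described in the first example of Section 5.1.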
Theorem 5.1. Given an exterior measure µ⋆ on a set X, the collection M of
(Carathéodory) measurable sets forms a σ-algebra. Moreover, µ⋆ restricted to
M is a measure.

Proof. In view of the symmetry of the condition (40), we clearly have that
E ∈ M implies E c ∈ M. It is also easily checked that ∅ ∈ M and hence
X ∈ M.
Having shown non-emptiness and closure under complements, we note that
it suffices to show that the class M is closed under disjoint countable unions
and that we have countable additivity on M.11 To establish this, we first show
that M is closed under finite unions and that µ⋆ is finitely additive on M
(Step 1) and then move to the countable disjoint case (Step 2).
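Step 1: Let E1 , E2 ∈ M and A ⊂ X be arbitrary. We first use the
condition (40) for E2 (with the set A) and then for E1 (with the sets E2 ∩ A
and E2c ∩ A) to produce the identity

\begin{align}
\mu_\star(A) &= \mu_\star(E_2 \cap A) + \mu_\star(E_2^c \cap A) \nonumber \\
 &= \mu_\star(E_1 \cap E_2 \cap A) + \mu_\star(E_1^c \cap E_2 \cap A) \nonumber \\
 &\quad + \mu_\star(E_1 \cap E_2^c \cap A) + \mu_\star(E_1^c \cap E_2^c \cap A) . \tag{41}
\end{align}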

The last term can be written as µ⋆ ((E1 ∪ E2 )c ∩ A). For the other three terms
on the right hand side note that

E1 ∪ E2 = (E1 ∩ E2 ) ∪ (E1c ∩ E2 ) ∪ (E1 ∩ E2c )

and hence by the subadditivity of µ⋆ we obtain

µ⋆ (A) ≥ µ⋆ ((E1 ∪ E2 ) ∩ A) + µ⋆ ((E1 ∪ E2 )c ∩ A)

which proves E1 ∪ E2 ∈ M in view of the above observation. Note that this also
implies that E1 ∩ E2 ∈ M since the latter can be written as the complement of
the union of two sets in M. To show that µ⋆ is finitely additive assume that
E1 and E2 are disjoint and observe that

µ⋆ (E1 ∪ E2 ) = µ⋆ (E1 ∩ (E1 ∪ E2 )) + µ⋆ (E1c ∩ (E1 ∪ E2 )) = µ⋆ (E1 ) + µ⋆ (E2 )

where the first equality follows from the fact that E1 is measurable and the
second equality exploits the assumption that E1 ∩ E2 = ∅.

Step 2: Now let E1 , E2 , ... be a countable collection of disjoint sets in M.
Define

\[ G_n = \bigcup_{j=1}^{n} E_j \quad \text{and} \quad G = \bigcup_{j=1}^{\infty} E_j . \]

We clearly have Gn ∈ M by Step 1 and G^c ⊂ (Gn )^c by definition. For A ⊂ X
arbitrary we start from

µ⋆ (A) = µ⋆ (Gn ∩ A) + µ⋆ ((Gn )c ∩ A) ≥ µ⋆ (Gn ∩ A) + µ⋆ (Gc ∩ A) (42)

and try to deal with the first term on the right hand side. We have

µ⋆ (Gn ∩ A) = µ⋆ (En ∩ (Gn ∩ A)) + µ⋆ ((En )c ∩ (Gn ∩ A))


= µ⋆ (En ∩ A) + µ⋆ (Gn−1 ∩ A) (43)
11 You should verify this. Note that any countable union can be written as a disjoint count-

able union (how?) and that closure under countable intersection follows using closure under
complements and countable unions via de Morgan’s laws.

where the second step exploits the disjointness of the Ej . An easy induction of
the formula (43) yields

\[ \mu_\star(G_n \cap A) = \sum_{j=1}^{n} \mu_\star(E_j \cap A) . \]

Now (42) becomes

\[ \mu_\star(A) \ge \sum_{j=1}^{n} \mu_\star(E_j \cap A) + \mu_\star(G^c \cap A) \]

and taking the limit as n → ∞ yields

\[ \mu_\star(A) \ge \sum_{j=1}^{\infty} \mu_\star(E_j \cap A) + \mu_\star(G^c \cap A)
   \ge \mu_\star(G \cap A) + \mu_\star(G^c \cap A) \ge \mu_\star(A) , \tag{44} \]
where the last two inequalities both follow from the subadditivity of the exterior
measure. This shows (again in view of Observation 5.1) that G is measurable.
Moreover all inequalities in (44) are actually equalities and hence taking A = G
in (44) also yields countable additivity of µ⋆ on M.
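For readers who want the two short steps of Step 2 written out, here is a sketch of the induction on (43) and of the choice A = G in (44). It uses only the notation of the proof, together with the convention G_1 = E_1.

```latex
% Unrolling (43): each application peels off one of the disjoint sets E_j.
\begin{align*}
\mu_\star(G_n \cap A)
 &= \mu_\star(E_n \cap A) + \mu_\star(G_{n-1} \cap A) \\
 &= \mu_\star(E_n \cap A) + \mu_\star(E_{n-1} \cap A) + \mu_\star(G_{n-2} \cap A) \\
 &= \dots = \sum_{j=1}^{n} \mu_\star(E_j \cap A) \qquad (G_1 = E_1) .
\end{align*}
% Choosing A = G in (44), where every inequality is in fact an equality:
\[
\mu_\star(G) = \sum_{j=1}^{\infty} \mu_\star(E_j \cap G) + \mu_\star(G^c \cap G)
             = \sum_{j=1}^{\infty} \mu_\star(E_j) .
\]
```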
As a final remark we note that the measure space obtained in the above
theorem is complete. Indeed, if E has measure zero and F ⊂ E then the
exterior measure of F is zero by monotonicity (recall the exterior measure is
defined on all subsets). Since we observed earlier that sets of exterior measure
zero are measurable, the claim follows.

5.3 Premeasures and the extension theorem


So far we have seen how to construct a general measure from an exterior mea-
sure using Carathéodory’s criterion (40). This of course shifts the problem to
constructing an exterior measure for subsets of a general X. This is typically
done via a premeasure, which is a notion of measure on a smaller, more elemen-
tary class of sets (similar to the rectangles we used in the case of the Lebesgue
measure).
Definition 5.5. Let X be a set. An algebra (of sets) in X is a non-empty
collection of subsets of X that is closed under complements, finite unions and
finite intersections.
Example 5.1. The collection of sets arising as finite disjoint unions of sets of
the form (a, b], (a, ∞) and ∅ with −∞ ≤ a < b < ∞, forms an algebra on R.
Definition 5.6. Let A be an algebra of sets in X. A premeasure on A is a
function µ0 : A → [0, ∞] that satisfies
1. µ0 (∅) = 0
2. If E1 , E2 , ... is a countable collection of disjoint sets in A with
$\bigcup_{k=1}^{\infty} E_k \in \mathcal{A}$, then

\[ \mu_0\Big( \bigcup_{k=1}^{\infty} E_k \Big) = \sum_{k=1}^{\infty} \mu_0(E_k) . \]
In particular, µ0 is finitely additive on A.

Note that $\bigcup_{k=1}^{\infty} E_k \in \mathcal{A}$ in the second item is an
assumption unless the union happens to be finite. Note also that a premeasure
is monotone (why?).
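One possible way to see the monotonicity is sketched below. It assumes E, F ∈ A with E ⊂ F and uses only finite additivity and the fact that an algebra is closed under complements and finite intersections.

```latex
% Assumption: E, F \in \mathcal{A} with E \subset F.
% F \setminus E = F \cap E^c \in \mathcal{A} because \mathcal{A} is an algebra,
% and F is the disjoint union of E and F \setminus E.
\[
\mu_0(F) = \mu_0(E) + \mu_0(F \setminus E) \ge \mu_0(E) .
\]
```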

5.3.1 Construction of a measure from a premeasure


We now show how to construct a general measure from a premeasure. This gives
an alternative construction of the Lebesgue measure, which we will describe
below. The key is the following
Proposition 5.1. Let X be a set and µ0 be a premeasure on an algebra of sets
A in X. Define µ⋆ : P (X) → [0, ∞] by

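\[ \mu_\star(E) = \inf\Big\{ \sum_{j=1}^{\infty} \mu_0(E_j) \;\Big|\;
   E \subset \bigcup_{j=1}^{\infty} E_j \text{ where } E_j \in \mathcal{A} \text{ for all } j \Big\} . \]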

Then
1. µ⋆ is an exterior measure on X
2. µ⋆ (E) = µ0 (E) for all E ∈ A
3. All sets in A are (Carathéodory) measurable (i.e. (40) holds)
The above proposition generates an exterior measure µ⋆ from a premeasure
µ0 . We can then apply Carathéodory’s theorem (Theorem 5.1) to construct
from µ⋆ a measure µ on the σ-algebra of Carathéodory measurable sets MC .
Now since by the above Proposition A ⊂ MC , we have that the σ-algebra
generated by A,12 denoted M (A), is contained in MC and hence in particular
µ restricts to a measure on M (A). (Of course MC can be strictly larger than
M (A)!) These considerations therefore establish the following
Theorem 5.2 (Hahn-extension). Let X be a set and µ0 be a premeasure on an
algebra of sets A in X. Denote by M the σ-algebra generated by A. Then there
exists a measure µ on M that extends µ0 .
We make an important remark about uniqueness. If the premeasure µ0 is
σ-finite (i.e. if X can be written as $X = \bigcup_{i=1}^{\infty} E_i$ for a
countable collection (Ei ) of sets in A with µ0 (Ei ) < ∞ for all i) then the
measure µ whose existence is promised in the theorem is unique (see Question 4
on Example Sheet 9).
Example 5.2. Combining Theorem 5.2 and Example 5.1 we outline another
construction of the Lebesgue measure on the Borel sets of R (cf. the third example
below Definition 5.2). One starts with the algebra A of intervals in Example 5.1
and defines the premeasure µ0 (I) = |I| on the intervals in A. Since A generates
the Borel σ-algebra on R and since µ0 is σ-finite, Theorem 5.2 generates a
unique measure on the Borel σ-algebra on R. The completion of this measure is
precisely the Lebesgue measure defined on the σ-algebra of Lebesgue measurable
sets. This last step (completion) will be carried out in Question 2 of Sheet 9.
12 Recall this is the smallest σ-algebra containing the sets of A.

5.3.2 The proof of Proposition 5.1
To prove the first part note first that µ⋆ is well-defined since we can choose Ej =
X for all j. We also easily see µ⋆ (∅) = 0 and E1 ⊂ E2 implies µ⋆ (E1 ) ≤ µ⋆ (E2 ).
To establish the subadditivity property we repeat the proof of Proposition 2.1.
We fix ε > 0 and given subsets E1 , E2 , ... of X we choose for each Ei a
collection (Ei,j ) in A with $E_i \subset \bigcup_{j=1}^{\infty} E_{i,j}$ and
$\mu_\star(E_i) + \frac{\epsilon}{2^i} \ge \sum_{j=1}^{\infty} \mu_0(E_{i,j})$.
Then the Ei,j form a countable collection of sets in A which covers
$\bigcup_i E_i$ and hence

\[ \mu_\star\Big( \bigcup_i E_i \Big) \le \sum_{i,j} \mu_0(E_{i,j}) \le \sum_i \mu_\star(E_i) + \epsilon . \]

Since this holds for any ε > 0 we are done.
To prove the second part (restriction of µ⋆ to A coincides with µ0 ) we suppose
E ∈ A. Clearly µ⋆ (E) ≤ µ0 (E) since E covers itself. To prove the reverse
inequality we let $E \subset \bigcup_{j=1}^{\infty} E_j$ with Ej ∈ A for all j
be any covering of E. We then define the sets

\[ E_k' = E \cap \Big( E_k \setminus \bigcup_{j=1}^{k-1} E_j \Big) \]

and note that the Ek′ are disjoint elements of A, that Ek′ ⊂ Ek and that
$E = \bigcup_{k=1}^{\infty} E_k'$ (check this!). By the countable additivity of
the premeasure we then have

\[ \mu_0(E) = \sum_{k=1}^{\infty} \mu_0(E_k') \le \sum_{k=1}^{\infty} \mu_0(E_k) \]

and taking the infimum over all coverings of E by (Ek ) in A yields the claim as
this turns the right hand side into µ⋆ (E).
To prove the third part (all sets in A are measurable for µ⋆ ) we let A be an
arbitrary subset of X, E ∈ A and ε > 0. It suffices to show

\[ \epsilon + \mu_\star(A) \ge \mu_\star(E \cap A) + \mu_\star(E^c \cap A) . \tag{45} \]

To prove this, we find a countable collection E1 , E2 , ... in A with
$A \subset \bigcup_{j=1}^{\infty} E_j$ and

\[ \sum_{j=1}^{\infty} \mu_0(E_j) \le \mu_\star(A) + \epsilon . \tag{46} \]

Since µ0 is a premeasure, it is finitely additive and we have

\[ \sum_{j=1}^{n} \mu_0(E_j) = \sum_{j=1}^{n} \mu_0(E \cap E_j) + \sum_{j=1}^{n} \mu_0(E^c \cap E_j) . \]

Taking the limit n → ∞ (note that all terms are increasing in n) we finally find

\[ \sum_{j=1}^{\infty} \mu_0(E_j) = \sum_{j=1}^{\infty} \mu_0(E \cap E_j) + \sum_{j=1}^{\infty} \mu_0(E^c \cap E_j)
   \ge \mu_\star(E \cap A) + \mu_\star(E^c \cap A) , \]

with the last inequality following since $\bigcup_{j=1}^{\infty} (E \cap E_j)$ is
a countable union of sets in A which covers E ∩ A (and similarly for E^c ∩ A).
Combining this with (46) yields (45) as desired.

5.4 A further example: Hausdorff measure
In this section we present an application of Carathéodory’s construction to con-
struct the α-dimensional Hausdorff measure for sets in Rd . The discussion will
be very informal and should merely illustrate that the abstract construction
that we went through has interesting applications.
The heuristic idea for Hausdorff measure is to measure the α-dimensional
volume of sets in Rd for α < d. For instance a sphere in R3 should have non-
trivial 2-dimensional Hausdorff measure (namely its area) while its Lebesgue
measure is of course zero. Similarly an interval of length 2 on the x-axis in R3
should have 1-dimensional Hausdorff measure equal to 2 etc.
The key idea to construct a measure with these properties lies in the scaling
properties of a set. Given a subset E ⊂ Rd , suppose that scaling the set E by
n can be written as adjoining m almost disjoint copies of the original set, i.e.

nE = E1 ∪ E2 ∪ ... ∪ Em

where the Ei are almost disjoint congruent copies of E. For instance, if you
scale the unit interval in R3 on the x-axis by n the resulting set is

\[ [0, n] \times \{0\} \times \{0\} = \bigcup_{j=1}^{n} [j-1, j] \times \{0\} \times \{0\} \]

so the above holds with m = n. The same example with a rectangle yields
m = n^2 . It is intuitively clear that the exponent α in the relation m = n^α
is what we would call the dimension of the set under consideration.
For a less trivial example, consider the Cantor set. It is easy to see
that scaling the Cantor set C by a factor of 3, we obtain two disjoint copies of
the Cantor set, so in this case we have 2 = 3^α and it would be tempting to say
that the Cantor set has fractional dimension log 2 / log 3.
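The scaling heuristic above amounts to solving m = n^α for α. As a small worked illustration (the decimal value is a rounded approximation added here, not taken from the notes):

```latex
% Solving the heuristic scaling relation m = n^\alpha for the exponent.
\[
m = n^{\alpha} \;\Longrightarrow\; \alpha = \frac{\log m}{\log n} .
\]
% Interval: m = n gives \alpha = 1.  Rectangle: m = n^2 gives \alpha = 2.
% Cantor set: n = 3, m = 2.
\[
\alpha = \frac{\log 2}{\log 3} \approx 0.631 .
\]
```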
We now give the definition which formalises the above considerations. For
any E ⊂ Rd we define the exterior α-dimensional Hausdorff measure of E as

\[ m^\star_\alpha(E) := \lim_{\delta \to 0} H^\delta_\alpha(E) \]

where

\[ H^\delta_\alpha(E) = \inf\Big\{ \sum_k (\operatorname{diam} F_k)^\alpha \;\Big|\;
   E \subset \bigcup_{k=1}^{\infty} F_k , \ \operatorname{diam} F_k \le \delta \text{ for all } k \Big\} . \]

Here the diameter of a set A is defined as diam A = sup{|x − y| : x, y ∈ A}. Note
that $H^\delta_\alpha(E)$ is well defined because countably many balls of
diameter δ cover all of Rd . Note also that as δ decreases, $H^\delta_\alpha(E)$
increases because we are taking the infimum over fewer coverings (the elements
Fk in the covering are restricted to be smaller in diameter). Hence the limit
exists (possibly with the value +∞).
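As an illustration of how the definition is used, the following sketch gives an upper bound for the Cantor set C with α = log 2 / log 3. It assumes the standard construction of C, in which the k-th stage consists of 2^k closed intervals of length 3^{-k}. Only the upper bound is shown; a matching lower bound requires more work and is not attempted here.

```latex
% Assumption: C is covered by the 2^k closed intervals I_1, ..., I_{2^k}
% of length 3^{-k} from the k-th stage of the standard construction.
% Choose k so large that 3^{-k} \le \delta. Then
\[
H^\delta_\alpha(C) \le \sum_{i=1}^{2^k} (\operatorname{diam} I_i)^\alpha
  = 2^k \, 3^{-k\alpha} = \Big( \frac{2}{3^\alpha} \Big)^{k} = 1
  \qquad \text{for } \alpha = \frac{\log 2}{\log 3} ,
\]
% since 3^\alpha = 2 for this choice of \alpha. Letting \delta \to 0:
\[
m^\star_\alpha(C) \le 1 .
\]
```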
We remark that the coverings by Fk in the definition cannot be replaced
by coverings by balls of diameter smaller than δ. This would yield a different
quantity. This makes the Hausdorff measure of a set hard to compute in general.
One can check that m⋆α is monotone and sub-additive and hence indeed
an exterior measure. It moreover satisfies that if the distance of two sets E1
and E2 is strictly positive then we have m⋆α (E1 ∪ E2 ) = m⋆α (E1 ) + m⋆α (E2 ),
i.e. additivity (this makes m⋆α a so-called metric exterior measure). The first
two facts allow us to apply Carathéodory’s theorem (Theorem 5.1) to construct
from m⋆α a measure mα on the σ-algebra of Carathéodory measurable sets. The
third fact allows one to prove that the σ-algebra of Carathéodory measurable
sets contains the closed subsets and hence in particular the σ-algebra of Borel
sets. The measure mα restricted to the Borel sets is commonly known as the
α-dimensional Hausdorff measure on Rd . More on this in Stein-Shakarchi.

5.5 Integration on a general measure space


Our next task is to develop the analogue of the integration theory for the
Lebesgue integral to a general measure space (X, M, µ). We assume for sim-
plicity that the measure space (X, M, µ) is also σ-finite. The punchline is:
Everything that we did for the Lebesgue integral generalises and the proofs go
through almost word by word. This is why we merely collect rather informally
the main points.
Measurable functions are defined as before: A function f : X → [−∞, ∞] on
a measure space (X, M, µ) is measurable if f −1 ([−∞, a)) = {x ∈ X | f (x) <
a} ∈ M for all a ∈ R. Properties of measurable functions (the limit of a
sequence of measurable functions is measurable etc.) continue to hold.
The notion of almost everywhere (“a.e.”) is defined with respect to the measure
µ, for instance f = g a.e. means that µ ({x ∈ X | f (x) ≠ g(x)}) = 0.
We can define simple functions on X as before as finite linear combinations
of characteristic functions of measurable sets of finite measure,

\[ \sum_{k=1}^{N} a_k \chi_{E_k} . \]

The approximation theorems of Section 2.8 continue to hold true. This actually
needs the σ-finite condition.13
Egorov’s theorem remains true (check this!).
The integral can be defined via the same four stage procedure that we carried
out for the Lebesgue integral leading to

\[ \int_X f(x) \, d\mu(x) , \]

the integral of a measurable function over a general measure space (which is
again linear, monotone, etc). We say that f is integrable if

\[ \int_X |f(x)| \, d\mu(x) < \infty . \]

Finally, the important convergence theorems (Fatou’s Lemma, the Monotone
Convergence Theorem and the Dominated Convergence Theorem) all continue
to hold. Our final goal, which does not immediately generalise, is to prove a
general Fubini theorem for the integral on a general measure space.
13 Think of the first step in the proof of the approximation theorems where we truncate F to

be defined on a set of finite measure (a cube in the Lebesgue case) which in the limit exhausts
Rd .

5.6 Construction of product measures
We finally discuss product measures. The idea is the following. Given two
measure spaces (X, M, µ) and (Y, N , ν) we would like to construct a σ-algebra
“M ⊗ N ” of subsets of the Cartesian product X × Y and a product measure
“µ × ν” on M ⊗ N .
Why could such a thing be useful? On the one hand, the construction below
will provide another way to construct the Lebesgue measure on R2 = R × R
(and more generally, Rn ) from the Lebesgue measure on R.14 On the other
hand, think of an application in probability: Given a measure space X = {h, t}
representing a head-tail-experiment with a measure on P (X) determined by
µ (h) = µ (t) = 1/2, we would like to consider n experiments (or perhaps even
infinitely many), i.e. the space X × X × ... × X equipped with a corresponding
product measure.
Given the setting in the first paragraph, we consider the algebra M ⊠ N of
finite disjoint unions of rectangles M × N ⊂ X × Y with M ∈ M and N ∈ N .
Exercise 5.2. Check that M ⊠ N is indeed an algebra.
Hint: Use Problem 3 of Sheet 9.
We let M ⊗ N denote the σ-algebra generated by M ⊠ N . A particular
example is given by the Borel-algebras M = BR and N = BR for which M⊗N =
BR2 (Exercise).
We now define µ0 : M ⊠ N → [0, ∞] by

\[ \mu_0\Big( \bigcup_{j=1}^{N} (M_j \times N_j) \Big) = \sum_{j=1}^{N} \mu(M_j) \, \nu(N_j) \]

with the convention that 0 · ∞ = ∞ · 0 = 0 on the right hand side.


Lemma 5.1. µ0 is a premeasure on M ⊠ N .
Proof. It is not hard to see that µ0 (∅) = 0; the difficulty is to prove
that µ0 is countably additive.
We prove that if a rectangle $M \times N = \bigcup_{j=1}^{\infty} (M_j \times N_j)$
is a countable union of disjoint rectangles, then we have

\[ \mu_0(M \times N) = \sum_{j=1}^{\infty} \mu(M_j) \, \nu(N_j) . \tag{47} \]

Note that this statement implies that if a finite disjoint union of rectangles is
a countable union of disjoint rectangles, then additivity holds (which is what
we actually need to prove as any element of M ⊠ N is a finite disjoint union of
rectangles).
To prove (47) we first note that

\[ \chi_M(x) \, \chi_N(y) = \chi_{M \times N}(x, y)
   = \sum_{j=1}^{\infty} \chi_{M_j \times N_j}(x, y)
   = \sum_{j=1}^{\infty} \chi_{M_j}(x) \, \chi_{N_j}(y) \]

14 Recall that Example 5.2 provided an outline for abstractly constructing Lebesgue measure

on R from a premeasure on the intervals. This construction, followed by the construction of


Lebesgue measure on Rn via product measures is the presentation given in many books on
measure theory.

and then integrate with respect to x to obtain, using the MCT, the identity

\[ \mu(M) \, \chi_N(y) = \sum_{j=1}^{\infty} \mu(M_j) \, \chi_{N_j}(y) . \]

Integrating again, this time in y and using once more the MCT we obtain the
desired (47).
Given Lemma 5.1, we can apply the Hahn extension theorem, Theorem 5.2
above to obtain a (unique if µ and ν are both σ-finite – why?) product measure
on M ⊗ N which extends µ0 , which we denote by µ × ν. In the case of Lebesgue
measure on M = N = BR one checks that the product is the Lebesgue measure
on BR2 that we defined via rectangles.
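For the head-tail experiment mentioned at the start of this section, the product measure of two tosses can be computed directly from the definition of µ0. The event "at least one head" below is an illustrative choice made here, not one taken from the notes.

```latex
% X = \{h, t\} with \mu(\{h\}) = \mu(\{t\}) = 1/2, two tosses: X \times X.
\[
(\mu \times \mu)(\{(h,h)\}) = \mu(\{h\}) \, \mu(\{h\}) = \tfrac{1}{4} .
\]
% Illustrative event: at least one head, written as a finite disjoint
% union of rectangles, E = (\{h\} \times X) \cup (\{t\} \times \{h\}).
\[
(\mu \times \mu)(E) = \mu(\{h\}) \, \mu(X) + \mu(\{t\}) \, \mu(\{h\})
  = \tfrac{1}{2} \cdot 1 + \tfrac{1}{2} \cdot \tfrac{1}{2} = \tfrac{3}{4} .
\]
```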

5.7 General Fubini theorem


Let (X, M, µ) and (Y, N , ν) be σ-finite measure spaces and (X × Y, M ⊗ N , µ × ν)
be the (unique) product measure space defined in the previous section.
Given a measurable function f on (X × Y, M ⊗ N , µ × ν), we would like to
understand whether the identity
\[ \int_X \Big( \int_Y f(x, y) \, d\nu(y) \Big) d\mu(x) = \int_{X \times Y} f(x, y) \, d(\mu \times \nu) \]

between the iterated integrals and the integral with respect to the product
measure holds (and whether the left hand side actually makes sense). This will
be the general Fubini theorem for product measures.
To state it, we make the familiar (from the Lebesgue case, cf. Section 3.8.1)
definitions of slices (also called sections):
• For a subset E ⊂ X × Y we define the slices/sections of E

Ex = {y ∈ Y | (x, y) ∈ E} and E^y = {x ∈ X | (x, y) ∈ E}.

• For a measurable function f : X × Y → R we define

the slice corresponding to y ∈ Y fixed as the function f^y (x) := f (x, y) ,

the slice corresponding to x ∈ X fixed as the function fx (y) := f (x, y) .

It is not too hard to show that if f is measurable on X × Y then fx is
measurable on Y and f^y is measurable on X. (Outline: Start with F = {E ⊂
X × Y | Ex ∈ N for all x and E^y ∈ M for all y} and prove F is a σ-algebra
containing the rectangles, so F ⊃ M ⊗ N . Now observe (fx )−1 (S) = (f −1 (S))x .)
We are ready to state the general Fubini-Tonelli theorem:
Theorem 5.3 (Fubini-Tonelli). Let (X, M, µ) and (Y, N , ν) be σ-finite mea-
sure spaces and (X × Y, M ⊗ N , µ × ν) be the (unique) product measure space
defined in the previous section. Then

1. (Tonelli, i.e. assuming f measurable non-negative)
If f : X × Y → [0, ∞] is measurable, then the functions

\[ g(x) = \int_Y f_x \, d\nu \quad \text{and} \quad h(y) = \int_X f^y \, d\mu \tag{48} \]

are measurable on X and Y respectively. Moreover the identity

\begin{align}
\int_{X \times Y} f(x, y) \, d(\mu \times \nu)
 &= \int_X \Big( \int_Y f(x, y) \, d\nu(y) \Big) d\mu(x) \nonumber \\
 &= \int_Y \Big( \int_X f(x, y) \, d\mu(x) \Big) d\nu(y) \tag{49}
\end{align}

holds.
2. (Fubini, i.e. assuming f integrable)
If f : X × Y → R is in L1 (X × Y, µ × ν), then fx ∈ L1 (Y, ν) for a.e. x and
f^y ∈ L1 (X, µ) for a.e. y. Moreover, the functions (48) are in L1 (X, µ)
and L1 (Y, ν) respectively and the formula (49) holds.
We won’t prove the general Fubini-Tonelli theorem here since we already
went through the proof in the Lebesgue case. You can however easily deduce
the second statement from the first.
A nice application of the general Fubini-Tonelli theorem is given in Question
6 on Example sheet 9.
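A standard illustration of the Tonelli part (not necessarily the application in Question 6, which is not reproduced here) is the interchange of summation for non-negative double series. The sketch assumes X = Y = N with the counting measure, which is σ-finite, and writes a_{mn} = f(m, n).

```latex
% Assumption: X = Y = \mathbb{N}, \mu = \nu = counting measure,
% f(m, n) = a_{mn} \ge 0. Then (49) reads
\[
\sum_{(m,n) \in \mathbb{N} \times \mathbb{N}} a_{mn}
  = \sum_{m=1}^{\infty} \Big( \sum_{n=1}^{\infty} a_{mn} \Big)
  = \sum_{n=1}^{\infty} \Big( \sum_{m=1}^{\infty} a_{mn} \Big) ,
\]
% where all three expressions may equal +\infty simultaneously.
```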

6 The change of variables formula
In this section we prove the famous change of variables formula. The proof
exhibits nicely many of the measure theoretic tools that we developed.
Theorem 6.1. Let U ⊂ Rn be open, ϕ : U → V be a C 1 diffeomorphism15 with
an open set V ⊂ Rn . Then
1. f : V → R is integrable if and only if the function (f ◦ ϕ) | det Dϕ| : U → R
is integrable
2. The following change of variables formula holds:

\[ \int_{V = \varphi(U)} f(y) \, dy = \int_U (f \circ \varphi)(x) \, |\det D\varphi(x)| \, dx . \tag{50} \]

6.1 An example illustrating Theorem 6.1


Before we prove the theorem, let us illustrate it with a concrete example. Sup-
pose we want to integrate f : R2 → R,

\[ f(y_1, y_2) = e^{-(y_1)^2 - (y_2)^2} , \]

over R2 .16 One way to compute the integral is to go to polar coordinates (x1 , x2 )
(which you should think of as x1 = r and x2 = φ)

\[ \varphi : (x_1, x_2) \mapsto (y_1 = x_1 \cos(x_2), \ y_2 = x_1 \sin(x_2)) . \]

Now ϕ maps (0, ∞) × [0, 2π] → R2 \ {0} but not diffeomorphically (the domain
is not open and ϕ is not injective on it). However, ϕ is clearly a
diffeomorphism of two open subsets of R2 when restricted to a map
U := (0, ∞) × (0, 2π) → V := R2 \ {(y1 , y2 ) | y1 ≥ 0, y2 = 0}. Its differential is

\[ D\varphi(x_1, x_2) = \begin{pmatrix} \cos x_2 & -x_1 \sin x_2 \\ \sin x_2 & x_1 \cos x_2 \end{pmatrix} , \]

which has Jacobi determinant det Dϕ (x1 , x2 ) = x1 > 0 on the domain considered.
Note that V = ϕ (U) differs from R2 by a set of measure zero, so the left hand
side of (50) is precisely the integral we want to compute, namely

\[ \int_{\mathbb{R}^2} e^{-(y_1)^2 - (y_2)^2} \, dy_1 dy_2 = \int_V e^{-(y_1)^2 - (y_2)^2} \, dy_1 dy_2 . \tag{51} \]

For the right hand side of (50) we note that (f ◦ ϕ) (x1 , x2 ) = e^{−(x1 )^2} and hence

\[ \int_U e^{-(x_1)^2} x_1 \, dx_1 dx_2 = \int_0^{\infty} dx_1 \int_0^{2\pi} e^{-(x_1)^2} x_1 \, dx_2 = \pi , \]

with the first step following from Fubini and the last step from a simple integra-
tion by substitution. We conclude that the desired integral (51) has the value
π and hence, as a corollary, that $\int_{-\infty}^{\infty} dy \, e^{-y^2} = \sqrt{\pi}$.
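The "simple integration by substitution" in this example can be spelled out as follows. The substitution variable u is notation introduced here for illustration.

```latex
% Inner integral in x_2 is trivial; for the radial integral substitute
% u = (x_1)^2, du = 2 x_1 \, dx_1 (u is notation introduced for this sketch).
\[
\int_0^{\infty} e^{-(x_1)^2} x_1 \, dx_1
  = \frac{1}{2} \int_0^{\infty} e^{-u} \, du = \frac{1}{2} ,
\qquad
2\pi \cdot \frac{1}{2} = \pi .
\]
% Combined with Fubini applied to (51):
\[
\Big( \int_{-\infty}^{\infty} e^{-y^2} \, dy \Big)^{2} = \pi
\;\Longrightarrow\;
\int_{-\infty}^{\infty} e^{-y^2} \, dy = \sqrt{\pi} .
\]
```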
15 Recall that this means that ϕ is C 1 , bijective and that its inverse is also a C 1 map.
16 By Fubini we know that the result is
$\big( \int_{-\infty}^{\infty} e^{-y^2} \, dy \big)^2$ and hence in particular that f is
integrable but it is not immediate how to compute the one-dimensional integral!

6.2 A reformulation of Theorem 6.1
We next give an equivalent formulation of Theorem 6.1:
Theorem 6.2. Let U ⊂ Rn be open, ϕ : U → V be a C 1 diffeomorphism with
an open set V ⊂ Rn . Then we have for any measurable set A in U
\[ \mu_n(\varphi(A)) = \int_A |\det D\varphi(x)| \, d\mu_n \tag{52} \]

where µn denotes the n-dimensional Lebesgue measure.


Theorem 6.3. Theorem 6.2 and Theorem 6.1 are equivalent, i.e. one can be
deduced from the other.
Proof. It is clear that Theorem 6.1 implies Theorem 6.2. Indeed, apply Theorem
6.1 with f = χϕ(A) for any measurable set A in U and note f ◦ ϕ = χA . (If
ϕ (A) has infinite Lebesgue measure, then χϕ(A) ≥ 0 and χA | det Dϕ(x)| ≥ 0
both are not integrable by Theorem 6.1 and hence (50) holds as +∞ = +∞.)
Now we claim that Theorem 6.2 also implies Theorem 6.1. To see this,
assume Theorem 6.2 is true and let f be integrable (Direction 1). We decompose
f = f+ − f− with f+ , f− ≥ 0 and show the integrability of (f ◦ ϕ)| det Dϕ| and
(50) separately for f+ and f− .
Pick (fn ) a sequence of simple functions with fn ↗ f+ (Theorem 2.7), say
$f_n = \sum_{k=1}^{k_n} a_k \chi_{\varphi(A_k)}$ (why can we write fn like this?).
By the linearity of the integral we have that (50) holds for any of the fn :
Indeed, for each fn (50) is simply a finite sum of identities (52). But since
(50) holds for any fn and since fn increases to f+ (which is integrable since
f is) and fn ◦ ϕ increases to f+ ◦ ϕ, the monotone convergence theorem implies
that (f+ ◦ ϕ)| det Dϕ| is also integrable and that (50) holds for f+ . Of course
f− is treated entirely analogously.
We finally treat the case where (f ◦ ϕ)| det Dϕ| (instead of f ) is assumed to
be integrable (Direction 2). Applying what we have already shown with ϕ−1 in
place of ϕ we conclude that (f ◦ ϕ ◦ ϕ−1 ) |(det Dϕ) ◦ ϕ−1 | | det Dϕ−1 | = f is
integrable and that (50) holds.

6.3 Proof of Theorem 6.2


The proof will consist of one preliminary observation followed by 5 steps.
Preliminary Observation: Note that the desired identity (52) is an identity
of measures on U. Indeed, both maps

\[ A \mapsto \mu(\varphi(A)) \tag{53} \]

and

\[ A \mapsto \int_A |\det D\varphi(x)| \, dx \tag{54} \]

are easily seen to be countably additive.

Step 1. Observe that it suffices to prove the following local statement:


Every point p ∈ U has an open neighbourhood W such that Theorem 6.2 holds
for ϕ|W : W → ϕ (W).

To see this, assume the local statement was proven. Cover U with such
open neighbourhoods Wx in which the identity holds. Now because Rn has a
countable basis of its topology we can select a countable subcover (Wi ) (why?).
Given this countable subcover $U \subset \bigcup_{i=1}^{\infty} W_i$, let A be an
arbitrary measurable set in U. Define
$\tilde{W}_i = A \cap \big( W_i \setminus \bigcup_{j=1}^{i-1} W_j \big)$. Then the
W̃i are pairwise disjoint, measurable and their union is A. Since the identity
(52) holds for any W̃i ⊂ Wi and both sides of (52) are countably additive, it
holds for A itself.

Step 2. Theorem 6.2 (hence Theorem 6.1) holds if ϕ is a permutation of


coordinates.

Step 3. Theorem 6.2 holds for n = 1, i.e. U ⊂ R.


To see this, we first note that if A = [a, b] is an interval, then ϕ (A) is the
closed interval with endpoints ϕ(a) and ϕ(b), and either ϕ′ > 0 or ϕ′ < 0 on A
as ϕ is a C 1 -diffeomorphism. In the first case

\[ \varphi(b) - \varphi(a) = \int_a^b \varphi' \, dx \]

while in the second

\[ \varphi(a) - \varphi(b) = \int_a^b -\varphi' \, dx , \]

hence verifying $\mu(\varphi(A)) = \int_A |\varphi'(x)| \, dx$ when A is an
interval. This holds for any finite interval (not necessarily closed) and by
countable additivity of the identity (52), the latter also holds for intervals
of infinite length. Finite disjoint unions of intervals in U form an algebra A
of sets in U. The two measures (53) and (54) agree on A hence define a
premeasure on A which is moreover σ-finite (since one can write U as a
countable disjoint union of intervals of finite length). Therefore, since both
(53) and (54) extend the same σ-finite premeasure they must agree on the
extension (cf. Q4 of Sheet 9).
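To see the role of the absolute value in the orientation-reversing case, one can check the identity by hand on a simple example. The choice ϕ(x) = 1/x on U = (0, ∞) with A = [1, 2] is an illustration introduced here, not one from the notes.

```latex
% Illustrative choice: \varphi(x) = 1/x on U = (0, \infty), A = [1, 2].
% Then \varphi' (x) = -1/x^2 < 0 and \varphi(A) = [1/2, 1].
\[
\mu(\varphi(A)) = 1 - \tfrac{1}{2} = \tfrac{1}{2} ,
\qquad
\int_1^2 |\varphi'(x)| \, dx = \int_1^2 \frac{dx}{x^2}
  = \Big[ -\frac{1}{x} \Big]_1^2 = -\tfrac{1}{2} + 1 = \tfrac{1}{2} .
\]
```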

Step 4. If Theorem 6.2 holds for ψ : U → W and ρ : W → V, then it holds


for the composition ρ ◦ ψ : U → V.
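Indeed, note

\begin{align}
\mu(\rho \circ \psi(A)) &= \int_{\psi(A)} |\det D\rho(z)| \, dz \nonumber \\
 &= \int_A |\det D\rho(\psi(x))| \, |\det D\psi(x)| \, dx
  = \int_A |\det D(\rho \circ \psi)(x)| \, dx , \tag{55}
\end{align}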

where the first step follows from Theorem 6.2 holding for ρ, the second from
Theorem 6.2 (hence Theorem 6.1) holding for ψ and the third from the chain
rule and the properties of the determinant.
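The second and third steps in (55) can be unpacked as below. The auxiliary function f is notation introduced here for the sketch: it is the integrand to which Theorem 6.1 (for ψ) is applied.

```latex
% Second step: apply (50) for \psi : U \to W to the (assumed notation)
% function f = |\det D\rho| \, \chi_{\psi(A)} on W, noting
% (f \circ \psi)(x) = |\det D\rho(\psi(x))| \, \chi_A(x):
\[
\int_{\psi(A)} |\det D\rho(z)| \, dz
  = \int_W f(z) \, dz
  = \int_U (f \circ \psi)(x) \, |\det D\psi(x)| \, dx
  = \int_A |\det D\rho(\psi(x))| \, |\det D\psi(x)| \, dx .
\]
% Third step: chain rule and multiplicativity of the determinant.
\[
D(\rho \circ \psi)(x) = D\rho(\psi(x)) \, D\psi(x)
\;\Longrightarrow\;
|\det D(\rho \circ \psi)(x)| = |\det D\rho(\psi(x))| \, |\det D\psi(x)| .
\]
```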

Step 5. We prove the local statement by induction on the dimension. The


case n = 1 is Step 3. Our induction assumption is that the local statement
(hence the global one) holds for n − 1 and we are considering a general diffeo-
morphism ϕ : U → V ⊂ Rn locally near p ∈ U.
We first claim that wlog we can restrict to ϕ : (x1 , ..., xn ) ↦ (ϕ1 (x), ..., ϕn (x))
satisfying $\frac{\partial \varphi_1}{\partial x_1}(p) \neq 0$. Indeed, since for a
general ϕ we have $\frac{\partial \varphi_1}{\partial x_i}(p) \neq 0$ for some
i ∈ {1, 2, ..., n} (in view of the Jacobian
$\frac{\partial \varphi_i}{\partial x_j}$ having full rank) we can permute the
coordinates xi to achieve $\frac{\partial \varphi_1}{\partial x_1}(p) \neq 0$ (and
use Steps 2+4).
We next claim that wlog we can even assume that ϕ keeps the first coordinate
fixed, i.e. that ϕ has the form ϕ (t, x) = (t, ϕt (x)). (Note that with this the
map ϕt : Ut := U ∩ {x1 = t} → {t} × Rn−1 is again a diffeomorphism onto its
image in view of

\[ D\varphi(t, x) = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ \ast & & & \\ \vdots & & D\varphi_t & \\ \ast & & & \end{pmatrix} \]

and det Dϕ = det Dϕt .) To verify the claim, suppose one has established (52)
for such ϕ. Then, by Steps 2 and 4 one has also shown it for any ϕ which keeps
one of the coordinates (not necessarily the first) fixed. Moreover, one can
write a general ϕ near p as the composition of two diffeomorphisms each of which
fixes at least one coordinate: Indeed, given general ϕ with
$\frac{\partial \varphi_1}{\partial x_1}(p) \neq 0$, let
ψ : (x1 , ..., xn ) ↦ (ϕ1 (x), x2 , ..., xn ). Then ψ is a local diffeomorphism at p
and hence ρ = ψ ◦ ϕ−1 is a diffeomorphism at ϕ(p) ∈ V which keeps the first
coordinate fixed.

[Diagram: ψ : U → W, ρ−1 : W → V and ϕ : U → V, with ϕ = ρ−1 ◦ ψ.]

We have ϕ = ρ−1 ◦ ψ and since both ρ and ψ fix at least one coordinate, Steps
2 and 4 imply (52) for general ϕ.
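We finally prove the result for ϕ of the form ϕ (t, x) = (t, ϕt (x)) using the
induction assumption: First, by Fubini, we have

\[ \mu_n(\varphi(A)) = \int_{\mathbb{R}^n} \chi_{\varphi(A)} \, dt \, dy_2 \dots dy_n
   = \int_{\mathbb{R}} dt \, \mu_{n-1}((\varphi(A))_t) . \tag{56} \]

Next observe that (ϕ (A))t = ϕt (At ) and then use the induction assumption and
Fubini again (together with | det Dϕ| = | det Dϕt |) to conclude

\begin{align}
\mu_n(\varphi(A)) &= \int_{\mathbb{R}} dt \, \mu_{n-1}(\varphi_t(A_t))
  = \int_{\mathbb{R}} dt \int_{A_t} |\det D\varphi_t| \, d\mu_{n-1} \nonumber \\
 &= \int_{\mathbb{R}^n} dt \, \chi_{A_t} |\det D\varphi_t| \, dx_2 \dots dx_n
  = \int_A |\det D\varphi| \, d\mu_n . \tag{57}
\end{align}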

7 Mastery Material: Lp -spaces
The material in this section is relevant only for the Mastery Question. The main
points here are the Hölder and the Minkowski inequality, which you should know
and be able to apply.
I leave some gaps in the proofs below which you should fill in on your own.
If you need help, a good reference is provided by the first three pages of Section
6 in Folland’s book (cf. Section 1.4).

7.1 Definition
Let (X, M, µ) be a measure space and f a (say real-valued) measurable function
on X. For 1 ≤ p < ∞ we define

\[ \|f\|_{L^p} := \Big( \int_X |f|^p \, d\mu(x) \Big)^{1/p} \]

and

\[ L^p(X, \mathcal{M}, \mu) = \{ f : X \to \mathbb{R} \mid f \text{ is measurable and } \|f\|_{L^p} < \infty \} , \]
the space of measurable functions whose Lp -norm is finite. We sometimes write
Lp (X, µ), or simply Lp (X), or even just Lp for Lp (X, M, µ) to simplify the
notation provided no confusion arises. If one identifies two functions which are
equal almost everywhere, the space Lp (X) can be shown to be a complete normed
vector space by adapting the proof we gave for L1 (Rn ) in Section 3.6,
Theorem 3.5.
In the following we fix a σ-finite measure space (X, M, µ) and write Lp for
Lp (X, M, µ).

7.2 The Hölder inequality


Theorem 7.1. Let 1 < p < ∞ and q satisfy 1/p + 1/q = 1. Let f ∈ Lp and g ∈ Lq .
Then the product f g ∈ L1 with the inequality

\[ \|fg\|_{L^1} \le \|f\|_{L^p} \, \|g\|_{L^q} . \]

Remark 7.1. If the exponents p and q in the spaces Lp and Lq are related by
1/p + 1/q = 1, one says that the exponents are conjugate or dual to one another.

Proof. Step 1. Prove the following inequality for A, B ≥ 0 and 0 ≤ θ ≤ 1:

\[ A^{\theta} B^{1-\theta} \le \theta A + (1 - \theta) B \]

Hint: Wlog B ≠ 0. Setting A = ÃB it suffices to prove Ã^θ ≤ θÃ + (1 − θ) for
Ã ≥ 0, which can be done using elementary calculus.

Step 2. Note that we can assume ‖f‖Lp ≠ 0 and ‖g‖Lq ≠ 0 as otherwise f g = 0
a.e. and the inequality is trivially satisfied. Replacing f by f /‖f‖Lp and g
by g/‖g‖Lq it suffices to prove ‖f g‖L1 ≤ 1 for f having Lp -norm equal to 1
and g having Lq -norm equal to 1.

Step 3. Set A = |f (x)|^p and B = |g(x)|^q and θ = 1/p, apply the inequality
from Step 1 and integrate it to obtain ‖f g‖L1 ≤ 1 as desired.
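One way to fill in Step 3 is sketched below, under the normalisation of Step 2 (both norms equal to 1) and using 1 − θ = 1/q.

```latex
% Step 1 with A = |f(x)|^p, B = |g(x)|^q, \theta = 1/p, 1 - \theta = 1/q:
\[
|f(x)| \, |g(x)| = A^{1/p} B^{1/q} \le \frac{1}{p} |f(x)|^p + \frac{1}{q} |g(x)|^q .
\]
% Integrate over X and use \|f\|_{L^p} = \|g\|_{L^q} = 1:
\[
\|fg\|_{L^1} \le \frac{1}{p} \|f\|_{L^p}^p + \frac{1}{q} \|g\|_{L^q}^q
  = \frac{1}{p} + \frac{1}{q} = 1 .
\]
```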

7.3 Minkowski’s inequality
Theorem 7.2. Let 1 ≤ p < ∞ and f, g ∈ Lp . Then f + g ∈ Lp with the
inequality

\[ \|f + g\|_{L^p} \le \|f\|_{L^p} + \|g\|_{L^p} . \]

Proof. The case p = 1 is easy (why?) so we let p > 1. To verify that f + g ∈ Lp
we first note that (why?)

\[ |f(x) + g(x)|^p \le 2^p \big( |f(x)|^p + |g(x)|^p \big) . \]

To prove the inequality, we observe

\[ |f(x) + g(x)|^p \le |f(x)| \cdot |f(x) + g(x)|^{p-1} + |g(x)| \cdot |f(x) + g(x)|^{p-1} . \]

If q is the conjugate exponent of p, that is 1/p + 1/q = 1, then we see that
|f (x) + g(x)|^{p−1} is in Lq (why?). Therefore we can apply Hölder’s inequality
on the right hand side and this leads to the result after some algebra using
(p − 1)q = p.
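The "some algebra" at the end of the proof can be sketched as follows. The sketch assumes ‖f + g‖Lp ≠ 0 (otherwise there is nothing to prove) and uses that this norm is finite by the first estimate in the proof.

```latex
% Integrate the pointwise bound and apply Hoelder to each term:
\[
\|f+g\|_{L^p}^p \le \big( \|f\|_{L^p} + \|g\|_{L^p} \big) \, \big\| |f+g|^{p-1} \big\|_{L^q} .
\]
% Since (p-1) q = p:
\[
\big\| |f+g|^{p-1} \big\|_{L^q}
  = \Big( \int_X |f+g|^{(p-1)q} \, d\mu \Big)^{1/q}
  = \|f+g\|_{L^p}^{p/q} .
\]
% Divide by \|f+g\|_{L^p}^{p/q} (assumed nonzero and finite); p - p/q = 1:
\[
\|f+g\|_{L^p} = \|f+g\|_{L^p}^{\,p - p/q} \le \|f\|_{L^p} + \|g\|_{L^p} .
\]
```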
