Measure 1
Gustav Holzegel∗
April 24, 2017
Abstract
These notes accompany the lecture course “Measure and Integration”
at Imperial College London (Autumn 2016). They follow very closely the
text “Real Analysis” by Stein and Shakarchi; in fact, most proofs are simple
rephrasings of the proofs presented in that book.
Contents
1 Motivation
1.1 Quick review of the Riemann integral
1.2 Drawbacks of the class R, motivation of the Lebesgue theory
1.2.1 Limits of functions
1.2.2 Length of curves
1.2.3 The Fundamental Theorem of Calculus
1.3 Measures of sets in R
1.4 Literature and Further Reading
2.6.6 The notion of “almost everywhere”
2.7 Building blocks of integration theory
2.7.1 Simple functions
2.7.2 Step functions
2.8 Approximation Theorems
2.8.1 Approximating a measurable function by simple functions
2.8.2 Approximating a measurable function by step functions
2.9 Littlewood's Three Principles
6 The change of variables formula
6.1 An example illustrating Theorem 6.1
6.2 A reformulation of Theorem 6.1
6.3 Proof of Theorem 6.2
1 Motivation
1.1 Quick review of the Riemann integral
In your second year analysis course you defined the Riemann integral. Let us
remind ourselves of the simplest situation and consider a bounded real-valued
function f defined on the interval [a, b].
• A partition P of [a, b] is a finite set of points x0 , x1 , ..., xn with
a = x0 ≤ x1 ≤ x2 ≤ ... ≤ xn−1 ≤ xn = b .
and
\[ U(P,f) = \sum_{i=1}^{n} M_i\,(x_i - x_{i-1}) \quad \text{(the upper sum)}, \]
\[ L(P,f) = \sum_{i=1}^{n} m_i\,(x_i - x_{i-1}) \quad \text{(the lower sum)}. \]
Theorem 1.1. If P ⋆ is a refinement of P then
L (P, f ) ≤ L (P ⋆ , f ) and U (P, f ) ≥ U (P ⋆ , f ) .
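To make the upper and lower sums concrete, here is a small Python sketch (not part of the original notes). It assumes, as is standard, that M_i and m_i denote the supremum and infimum of f on [x_{i−1}, x_i], and it restricts to an increasing f, so that these are attained at the right and left endpoints. The helper name `riemann_sums_increasing` is ours.

```python
from fractions import Fraction

def riemann_sums_increasing(f, a, b, n):
    """Upper and lower sums U(P, f), L(P, f) for an increasing f on the
    uniform partition of [a, b] into n pieces.

    Assumption: f is increasing, so sup and inf over [x_{i-1}, x_i]
    are f(x_i) and f(x_{i-1}) respectively.
    """
    a, b = Fraction(a), Fraction(b)
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    upper = sum(f(xs[i]) * (xs[i] - xs[i - 1]) for i in range(1, n + 1))
    lower = sum(f(xs[i - 1]) * (xs[i] - xs[i - 1]) for i in range(1, n + 1))
    return upper, lower

square = lambda x: x * x
U4, L4 = riemann_sums_increasing(square, 0, 1, 4)
U8, L8 = riemann_sums_increasing(square, 0, 1, 8)  # partition into 8 refines the one into 4
```

For f(x) = x² on [0, 1] the partition into 8 pieces refines the one into 4, and one can check that L4 ≤ L8 ≤ 1/3 ≤ U8 ≤ U4, in line with Theorem 1.1.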
Theorem 1.2. We have
\[ \underline{\int_a^b} f\,dx \;\le\; \overline{\int_a^b} f\,dx . \]
1.2.1 Limits of functions
One main drawback of the class R of Riemann integrable functions is that
it does not behave well under taking limits.
To see this, consider the function
\[ f(x) = \begin{cases} 1 & \text{if } x \in \mathbb{Q} \cap [0,1], \\ 0 & \text{otherwise.} \end{cases} \tag{2} \]
This function is not Riemann integrable (why?). On the other hand, for (xn )
an enumeration of the rational numbers in [0, 1], the function
\[ f_n(x) = \begin{cases} 1 & \text{if } x \in \{x_1, x_2, \ldots, x_n\}, \\ 0 & \text{otherwise} \end{cases} \]
is Riemann integrable for every n (why? with what value?) and we have fn → f
pointwise. We conclude that the limit of a sequence of Riemann integrable
functions does not have to be Riemann integrable.
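The construction can be sketched in Python, using exact rational arithmetic. The particular enumeration below (ordered by denominator) is our choice for illustration; any enumeration of the rationals in [0, 1] works equally well.

```python
from fractions import Fraction
from itertools import islice

def rationals_in_unit_interval():
    """Yield every rational in [0, 1] exactly once: 0, 1, 1/2, 1/3, 2/3, ..."""
    yield Fraction(0)
    yield Fraction(1)
    q = 2
    while True:
        for p in range(1, q):
            x = Fraction(p, q)
            if x.denominator == q:  # skip fractions that reduce to an earlier one
                yield x
        q += 1

def f_n(n):
    """Indicator function of the first n points x_1, ..., x_n of the enumeration."""
    points = set(islice(rationals_in_unit_interval(), n))
    return lambda x: 1 if x in points else 0

first_six = list(islice(rationals_in_unit_interval(), 6))
```

Each f_n vanishes outside a finite set, while every fixed rational x in [0, 1] satisfies f_n(x) = 1 as soon as n is large enough that x has appeared in the enumeration.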
A perhaps less academic example is provided by the following sequence of
functions that you will construct on the first example sheet: (fn ) is a sequence
of continuous functions fn : [0, 1] → R with fn → f pointwise such that
• 0 ≤ fn ≤ 1
• (fn ) decreases monotonically as n → ∞
• f is not Riemann integrable
The above implies that $s_n = \int_0^1 f_n\,dx$ is a decreasing sequence of positive numbers
and hence converges. It is very tempting to define $\int_0^1 f\,dx$ to be that
limit. The Lebesgue integral will allow us to do this in this particular situation
(“monotone convergence theorem”) and in much more general ones.
Now there are functions whose derivative is not Riemann integrable. Can we
still make sense of the formula above? What is the class of functions for which
an identity as above holds?
construct such a set later. In higher dimensions, one has the Banach-Tarski paradox, which
proves (using again the axiom of choice) that the unit ball in $\mathbb{R}^3$ can be decomposed into a
finite number of disjoint sets $A_i$ (i ≥ 5) which can be reassembled – using only translations and
rotations applied to the $A_i$ – into two copies of the unit ball.
2 Measure Theory: Lebesgue Measure in Rd
2.1 Preliminaries and Notation
• x = (x1 , .., xd ) a point in Rd
• $|x| = \sqrt{(x_1)^2 + \cdots + (x_d)^2}$ the Euclidean norm
• d (x, y) = |x − y| the distance of two points x, y ∈ Rd
• d (E, F ) = inf x∈E;y∈F |x − y| distance between two sets E, F ⊂ Rd .
• Br (x) := {y ∈ Rd | |x − y| < r} open ball around x of radius r in Rd .
• recall definitions of open, closed, bounded
• $E \subset \mathbb{R}^d$ is compact if for any open cover $E \subset \bigcup_{\alpha \in A} U_\alpha$ ($U_\alpha$ open) there
exists a finite subcover, i.e. $E \subset \bigcup_{j \in J} U_j$ for J a finite subset of A.
Lemma 2.1. If a rectangle is the almost disjoint union of finitely many rectangles,
say $R = \bigcup_{k=1}^{N} R_k$, then
\[ |R| = \sum_{k=1}^{N} |R_k| . \]
Proof. One first extends the sides of the rectangles Rk to obtain new rectangles
R̃1 , ..., R̃M as shown in the figure below.
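[Figure: a rectangle subdivided into almost disjoint rectangles R1, ..., R5, with their sides extended to form a grid.]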
This follows from writing out the volumes on the left and on the right and applying
the distributive law. The same argument can be applied to the rectangles $R_k$, so
\[ |R_k| = \sum_{j \in J_k} |\tilde{R}_j| . \]
Lemma 2.2. If $R \subset \bigcup_{k=1}^{N} R_k$ with R and $R_k$ (closed) rectangles, then $|R| \le \sum_{k=1}^{N} |R_k|$.
Theorem 2.1. Every open subset U ⊂ Rd , d ≥ 1 can be written as a countable
union of almost disjoint closed cubes.
Proof. We construct the union in a (countably infinite) sequence of steps. We
start with the grid of mesh 1 on $\mathbb{R}^d$ with lines parallel to the coordinate axes.
A cube Q of the grid is accepted if $Q \subset U$, rejected if $Q \subset U^c$ and tentatively
accepted otherwise. In the second step we bisect the tentatively accepted cubes
into cubes of side length $2^{-1}$ and again accept, reject or tentatively accept the
subcubes. Continuing this procedure indefinitely, we obtain a countable union of
accepted cubes and we claim that this union is U. To see this, note first that,
taking the grid of mesh size $2^{-N}$ on $\mathbb{R}^d$, any cube of this grid contained in U
has either been accepted or is contained in a cube that has been accepted in a
previous step. Therefore, it suffices to show that every $x \in U$ is contained in a cube
of the grid of mesh size $2^{-N}$ that is itself contained in U, for large enough N.
But this is easily deduced from the fact that U is open.
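The accept/reject/bisect procedure can be sketched in Python for the simplest case d = 1 and U = (a, b) an open interval. This is only an illustration under these assumptions; the function name `accepted_cubes` and the cut-off after finitely many bisections are ours (the proof continues indefinitely).

```python
import math
from fractions import Fraction

def accepted_cubes(a, b, max_bisections):
    """Closed dyadic intervals accepted for the open interval U = (a, b).

    Mirrors the proof in dimension d = 1: start from the grid of mesh 1,
    accept a cube contained in U, reject a cube contained in the
    complement of U, and bisect the tentatively accepted cubes.
    The procedure is cut off after max_bisections bisections.
    """
    a, b = Fraction(a), Fraction(b)
    accepted = []

    def visit(lo, hi, bisections_left):
        if a < lo and hi < b:          # [lo, hi] is contained in U: accept
            accepted.append((lo, hi))
        elif hi <= a or lo >= b:       # [lo, hi] is contained in U^c: reject
            return
        elif bisections_left > 0:      # tentatively accepted: bisect
            mid = (lo + hi) / 2
            visit(lo, mid, bisections_left - 1)
            visit(mid, hi, bisections_left - 1)

    for k in range(math.floor(a), math.ceil(b)):
        visit(Fraction(k), Fraction(k + 1), max_bisections)
    return accepted

cubes = accepted_cubes(0, 1, 5)
total_length = sum(hi - lo for lo, hi in cubes)
```

For U = (0, 1) the accepted intervals are [1/4, 1/2], [1/2, 3/4], [1/8, 1/4], [3/4, 7/8], and so on; after K ≥ 2 bisections their total length is 1 − 2^{1−K}, which tends to the length of U.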
However, we would still have to show independence of this quantity from the
decomposition. We’re now going to achieve this and much more.
Definition 2.1. For E ⊂ Rd any subset of Rd we define the exterior measure
of E by
\[ m_\star(E) = \inf_{E \subset \bigcup_{j=1}^{\infty} Q_j} \; \sum_{j=1}^{\infty} |Q_j| \]
where we are taking the infimum over all countable coverings of E by closed
cubes.
Note that 0 ≤ m⋆ ≤ ∞. Remarkably, replacing countable by finite in the
above definition would yield a different quantity (see Example Sheet 1).
2.3.1 Examples
Let us compute the exterior measure for some elementary sets to see whether it
agrees with our intuitive definition of volume above.
In fact, it suffices to show that any covering $(Q_j)$ satisfies
\[ \sum_{j=1}^{\infty} |Q_j| \ge |Q| - \epsilon \]
for any $\epsilon > 0$. To show the latter, given $\epsilon > 0$ we choose for each j an
open cube $S_j$ with $Q_j \subset S_j$ and $|S_j| \le |Q_j| + \frac{\epsilon}{2^j}$. Then $\bigcup_{j=1}^{\infty} S_j$ is an open
cover of Q. Since Q is compact, there is a finite open subcover $\bigcup_{j=1}^{N} S_{k_j}$
of Q and clearly also $Q \subset \bigcup_{j=1}^{N} \overline{S_{k_j}}$. Now we can apply Lemma 2.2 to
conclude $|Q| \le \sum_{j=1}^{N} |S_{k_j}|$ and therefore
\[ |Q| \le \sum_{j=1}^{N} |S_{k_j}| \le \sum_{j=1}^{N} \Big( |Q_{k_j}| + \frac{\epsilon}{2^{k_j}} \Big) \le \sum_{j=1}^{\infty} |Q_j| + \epsilon \]
as desired.
3. The exterior measure of an open cube Q is equal to its volume.
Again, we have $m_\star(Q) \le |\overline{Q}| = |Q|$ since the closed cube $\overline{Q}$ covers Q.
In the reverse direction observe that we can find, for any $\epsilon > 0$, a closed
cube $Q_{in}$ contained in Q such that $|Q_{in}| \ge |Q| - \epsilon$. Therefore, we have
$m_\star(Q) \ge m_\star(Q_{in}) \ge |Q| - \epsilon$, the first inequality holding because any
covering of Q is also one of $Q_{in}$.
4. The exterior measure of a rectangle R is equal to its volume.
We sketch the argument. First of all, arguing as for the cube in 2. above,
we obtain |R| ≤ m⋆ (R).
For the reverse direction consider a grid of cubes of length 1/k on Rd and
denote by Q̃ the cubes entirely contained in R and by Q̃′ those cubes
intersecting both R and the complement Rc . Clearly for fixed k there are
only finitely many cubes in Q̃ and Q̃′ . In fact, it is easy to see that the
number of cubes in $\tilde{Q}'$ is smaller than $C k^{d-1}$ for some uniform constant
C depending only on the side lengths of R. We finally note that
\[ R \subset \bigcup_{Q \in \tilde{Q} \cup \tilde{Q}'} Q \]
where the right hand side is a rectangle expressed as the almost disjoint union
of finitely many cubes. By monotonicity and Lemma 2.1 we
have
\[ m_\star(R) \le \sum_{Q \in \tilde{Q}} |Q| + \sum_{Q \in \tilde{Q}'} |Q| \le |R| + \frac{C}{k} \]
and choosing k large we obtain $m_\star(R) \le |R| + \epsilon$ for any $\epsilon > 0$.
5. The exterior measure of $\mathbb{R}^d$ is infinite, $m_\star(\mathbb{R}^d) = \infty$, as any covering of
Proposition 2.1. The exterior measure m⋆ satisfies the following properties:
1. If $E_1 \subset E_2$ then $m_\star(E_1) \le m_\star(E_2)$ (monotonicity)
2. If $E = \bigcup_{j=1}^{\infty} E_j$, then $m_\star(E) \le \sum_{j=1}^{\infty} m_\star(E_j)$ (countable subadditivity)
3. If $E \subset \mathbb{R}^d$, then
\[ m_\star(E) = \inf_{E \subset U} m_\star(U) \]
with the infimum taken over all open sets U that contain E.
4. If $E = E_1 \cup E_2$ and $d(E_1, E_2) > 0$, then $m_\star(E) = m_\star(E_1) + m_\star(E_2)$
(finite additivity for sets with positive distance)
5. If $E = \bigcup_{j=1}^{\infty} Q_j$ is a union of almost disjoint cubes,
then $m_\star(E) = \sum_{j=1}^{\infty} |Q_j|$ (countable additivity for almost disjoint cubes)
Before we prove this, let us make some remarks. We note first that 5. gives
us (in view of Theorem 2.1) a notion of volume of an arbitrary open set which
is independent of the decomposition into cubes.
We also remark that one cannot conclude in general that if $E_1$ and $E_2$ are
disjoint sets in $\mathbb{R}^d$, then $m_\star(E_1) + m_\star(E_2) = m_\star(E_1 \cup E_2)$ holds. However, for
the class of sets (“Lebesgue measurable sets”) that we are going to define in
the next section, this property does hold; in fact it does so for countably many
disjoint sets (countable additivity)!
Proof. The first property follows since any covering of $E_2$ by closed cubes is also
a covering of $E_1$. For the second property we use the $\frac{\epsilon}{2^j}$-trick: With $\epsilon > 0$
arbitrary and fixed, we choose for each $E_j$ a covering by closed cubes $Q_{j,n}$ with
\[ m_\star(E_j) \ge \sum_{n=1}^{\infty} |Q_{j,n}| - \frac{\epsilon}{2^j} . \]
Since $E \subset \bigcup_{j,n} Q_{j,n}$ is a covering by closed cubes we have
\[ m_\star(E) \le \sum_{j,n} |Q_{j,n}| = \sum_{j=1}^{\infty} \sum_{n=1}^{\infty} |Q_{j,n}| \le \sum_{j=1}^{\infty} \Big( m_\star(E_j) + \frac{\epsilon}{2^j} \Big) \le \sum_{j=1}^{\infty} m_\star(E_j) + \epsilon \]
where we inserted the previous estimate in the third step. Since this inequality
holds for any $\epsilon > 0$ we are done.
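The same trick shows, for instance, that a countable set of points in R has exterior measure zero: cover the j-th point by a closed interval of length ε/2^j, so that the total length stays below ε. The following Python sketch checks this arithmetic for finitely many points; the function name and the sample points are ours and purely illustrative.

```python
from fractions import Fraction

def cover_with_shrinking_intervals(points, eps):
    """Cover the j-th point (j = 1, 2, ...) by a closed interval of length eps / 2**j."""
    eps = Fraction(eps)
    cover = []
    for j, x in enumerate(points, start=1):
        half = eps / 2 ** (j + 1)      # half of the length eps / 2**j
        cover.append((x - half, x + half))
    return cover

points = [Fraction(p, 10) for p in range(11)]   # 0, 1/10, ..., 1
eps = Fraction(1, 100)
cover = cover_with_shrinking_intervals(points, eps)
total_length = sum(hi - lo for lo, hi in cover)
```

With 11 points the total length is ε(1 − 2^{−11}), and it stays below ε however many points are added, since the geometric series of the lengths sums to ε.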
To prove 3. we note that by 1. we clearly have m⋆ (E) ≤ inf m⋆ (U), so we
only need to show the ≥-direction. For this we can assume in addition that
m⋆ (E) < ∞ as otherwise the inequality holds trivially. To prove the inequality,
it clearly suffices to construct for any ǫ > 0 a set U with
and then, for each cube $Q_j$ choose an open cube $Q_j^0$ with $Q_j \subset Q_j^0$ and
$|Q_j| \ge |Q_j^0| - \frac{\epsilon}{2} \frac{1}{2^j}$. We then define the union $U := \bigcup_{j=1}^{\infty} Q_j^0$ and claim it satisfies the
desired inequality. To check this we note
\[ m_\star(U) \le \sum_{j=1}^{\infty} |Q_j^0| \le \sum_{j=1}^{\infty} \Big( |Q_j| + \frac{\epsilon}{2} \frac{1}{2^j} \Big) \le m_\star(E) + \epsilon \]
where the first step follows from monotonicity and the last from inserting (6).
We leave the proof of 4. as an exercise. [Outline: Note that the ≤-direction
follows from subadditivity. For ≥ choose $0 < \delta < d(E_1, E_2)$ and cover
$E = E_1 \cup E_2$ by cubes such that $m_\star(E) \ge \sum_{j=1}^{\infty} |Q_j| - \epsilon$. Refine the cubes such that
they all have diameter smaller than δ. Note that each cube can then intersect
at most one of $E_1$ and $E_2$. Conclude.]
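For 5. we note that the ≤ direction follows from countable subadditivity. To show ≥,
we give ourselves $\epsilon$ of room. Let $\epsilon > 0$ be fixed. For each $Q_j$ choose a closed
cube $\tilde{Q}_j$ strictly contained in $Q_j$ with
\[ |\tilde{Q}_j| \ge |Q_j| - \frac{\epsilon}{2^j} . \]
For fixed N, the cubes $\tilde{Q}_1, \ldots, \tilde{Q}_N$ are disjoint and compact, hence (by an Exercise on
Example Sheet 2) a positive distance apart. We can hence apply 4. and conclude
\[ m_\star(E) \ge m_\star\Big( \bigcup_{j=1}^{N} \tilde{Q}_j \Big) = \sum_{j=1}^{N} m_\star(\tilde{Q}_j) = \sum_{j=1}^{N} |\tilde{Q}_j| \ge \sum_{j=1}^{N} \Big( |Q_j| - \frac{\epsilon}{2^j} \Big) . \]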
m⋆ (U \ E) ≤ ǫ .
Proposition 2.2. The following sets are (Lebesgue) measurable:
1. Every open set is measurable.
2. Sets of exterior measure 0 are measurable.
3. A countable union of measurable sets is measurable.
4. Closed sets are measurable.
5. The complement of a measurable set is measurable.
6. A countable intersection of measurable sets is measurable.
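Proof. Item (1) follows from the definition and (2) is a simple consequence of
the third property of the exterior measure. Namely, if E has exterior measure zero, there
exists an open U with $E \subset U$ and $m_\star(U) \le 0 + \epsilon$. Since $U \setminus E \subset U$, monotonicity
implies $m_\star(U \setminus E) \le \epsilon$. For item (3) we use the standard $\frac{\epsilon}{2^n}$-trick. Let $A_1, A_2, \ldots$
be measurable sets. This means that we can pick $U_i$ open such that $A_i \subset U_i$
and $m_\star(U_i \setminus A_i) \le \frac{\epsilon}{2^i}$. Now $\bigcup_i A_i \subset \bigcup_i U_i$ and hence by monotonicity and subadditivity
\[ m_\star\Big( \bigcup_i U_i \setminus \bigcup_i A_i \Big) \le m_\star\Big( \bigcup_i (U_i \setminus A_i) \Big) \le \sum_{i=1}^{\infty} m_\star(U_i \setminus A_i) \le \epsilon . \]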
Turning to (4) we first observe that it suffices to prove the claim for closed and
bounded (hence compact) sets. This is because we can write an arbitrary closed
set F as a countable union of compact sets, $F = \bigcup_{n=1}^{\infty} \big( \overline{B_n(0)} \cap F \big)$. We can
then use item (3). We will therefore assume now that F is compact, so in
particular $m_\star(F) < \infty$. Let $\epsilon > 0$ be prescribed. By property (3) of the exterior
measure we find an open U such that $F \subset U$ and $m_\star(U) \le m_\star(F) + \epsilon$. Since
F is closed, $U \setminus F$ is open and by Theorem 2.1 we can express it as a countable
union of almost disjoint closed cubes, $U \setminus F = \bigcup_{j=1}^{\infty} Q_j$. Clearly it suffices to
show that the measure of this union is $\epsilon$-small. To do this we observe that for any N, the
union $K = \bigcup_{j=1}^{N} Q_j$ is compact. Since K and F are disjoint, they are a positive
distance apart and we conclude
\[ m_\star(U) \ge m_\star(K \cup F) = m_\star(K) + m_\star(F) = \sum_{j=1}^{N} m_\star(Q_j) + m_\star(F) . \]
Since $m_\star(F) < \infty$ we can subtract it and combine the above with the
inequality $m_\star(U) \le m_\star(F) + \epsilon$ to obtain for any N
\[ \sum_{j=1}^{N} m_\star(Q_j) \le \epsilon . \]
note that $(U_n)^c$ is closed (hence measurable by (4)) and that $S = \bigcup_{n=1}^{\infty} (U_n)^c$ is
also measurable by (3). We now easily check the inclusions
to conclude
\[ m_\star(E^c \setminus S) \le m_\star(U_n \setminus E) \le \frac{1}{n} \]
and hence that m⋆ (E c \ S) = 0. Since E c = S ∪ (E c \ S) is a union of two
measurable sets it is measurable.
For (6) it suffices to note that by de Morgan’s laws
\[ \bigcap_{j=1}^{\infty} E_j = \Big( \bigcup_{j=1}^{\infty} (E_j)^c \Big)^c \]
\[ m(E) = \sum_{j=1}^{\infty} m(E_j) . \]
Proof. We first claim that it suffices to prove this for the Ej being bounded.
Why? Suppose we had this result and we are trying to prove the general case.
We take $(Q_k)_{k=1}^{\infty}$ the sequence of cubes of side length k. We have $Q_k \subset Q_{k+1}$ for all
$k \ge 1$ and we define $S_1 = Q_1$ and $S_k = Q_k \setminus Q_{k-1}$ for all $k \ge 2$. We define
\[ E_{j,k} = E_j \cap S_k \]
which are measurable and bounded sets, disjoint for all j and k. We have that
\[ E = \bigcup_{j,k} E_{j,k} \quad \text{and} \quad E_j = \bigcup_{k=1}^{\infty} E_{j,k} \]
are both disjoint unions of bounded measurable sets. Since we are assuming we
have the result in this case, we conclude
\[ m(E) = m\Big( \bigcup_{j,k} E_{j,k} \Big) = \sum_{j,k} m(E_{j,k}) = \sum_{j=1}^{\infty} \sum_{k=1}^{\infty} m(E_{j,k}) = \sum_{j=1}^{\infty} m(E_j) . \]
So now let’s prove it assuming that the $E_j$ are bounded. Clearly “≤” holds by
countable subadditivity, so we only need to prove “≥” (and we can assume m (E) < ∞).
Recall the idea of how we proved this in the case of cubes: We found strictly
smaller closed cubes, then worked with finitely many of them (fixed N). Lebesgue
measurability gives us the following analogue:
Lemma 2.3. Let E ⊂ Rd be measurable. Then for every ǫ > 0 there exists a
closed set F ⊂ E with m (E \ F ) ≤ ǫ.
Proof. Apply the definition of measurability to the complement: There exists
an open U with E c ⊂ U and m (U \ E c ) ≤ ǫ. If we define F = U c , then F is
closed and in view of E \ F = F c \ E c we have m (E \ F ) ≤ ǫ.
In particular, we can find a closed set $F_j$ in each $E_j$ with $m(E_j \setminus F_j) \le \frac{\epsilon}{2^j}$.
Now for fixed N the $F_1, \ldots, F_N$ are closed, bounded (hence compact) and
disjoint, hence a positive distance apart and we can apply the properties of the
exterior measure to conclude for all N
\[ m(E) \ge m\Big( \bigcup_{j=1}^{N} F_j \Big) = \sum_{j=1}^{N} m(F_j) \ge \sum_{j=1}^{N} \Big( m(E_j) - \frac{\epsilon}{2^j} \Big) \ge \sum_{j=1}^{N} m(E_j) - \epsilon . \]
Remark 2.1. The bold emphasises the additional condition in the decreasing
case. To see that the conclusion is not generally valid without this condition
consider the case En = [n, ∞).
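Proof. For the first part, set $G_1 = E_1$ and $G_k = E_k \setminus E_{k-1}$ for $k \ge 2$. The $G_k$ are then
disjoint and measurable and $\bigcup_k G_k = \bigcup_k E_k$. Now apply countable additivity for
disjoint measurable sets (Theorem 2.2) to obtain
\[ m\Big( \bigcup_{i=1}^{\infty} E_i \Big) = m\Big( \bigcup_{i=1}^{\infty} G_i \Big) = \sum_{i=1}^{\infty} m(G_i) = \lim_{N \to \infty} \sum_{i=1}^{N} m(G_i) = \lim_{N \to \infty} m\Big( \bigcup_{i=1}^{N} G_i \Big) \]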
$F_3 \subset \ldots$ and also $m(F_j) = m(E_1) - m(E_j)$ since $E_1 = F_j \cup E_j$ is a disjoint
union of measurable sets. We also observe that $\bigcup_{j=1}^{\infty} F_j = E_1 \setminus E$. Combining
these things and using the first part we find
As m (E1 ) < ∞, we can subtract it from both sides and obtain the result.
Eh = E + h := {x + h | x ∈ E} for h ∈ Rd fixed.
Proof. Take the intersection of all σ-algebras which contain F . Note this inter-
section is non-empty as Mall is a σ-algebra containing F . The intersection is
the smallest σ algebra containing F in the sense that M (F ) is contained in any
σ-algebra which includes the sets from F . This also gives the uniqueness.
Definition 2.4. If U denotes the collection of all open sets in Rd , then M (U)
is called the Borel σ-algebra, denoted BR (containing the Borel sets).
Observation 2.1. The Borel σ-algebra can also be generated by closed cubes,
i.e. if Q denotes the collection of all closed cubes in Rd , then M (Q) = BR .
To verify the Observation note first that any open set lies in the σ-algebra
of closed cubes by Theorem 2.1 and conversely any cube lies in the Borel σ-
algebra. We combine this with the following general fact: If E is any collection
of subsets of Rd satisfying E ⊂ M (F ), then M (E) ⊂ M (F ). (Indeed, M (F )
is a σ-algebra containing E, so the smallest σ-algebra containing E must be
contained in it.)
It is of course natural to ask whether the Borel sets are properly contained in
the Lebesgue measurable sets. The answer is yes and the following Proposition
(as well as the exercises of Example Sheet 3) clarifies the relation between the
two. First we need a definition
Definition 2.5. A $G_\delta$-set is a countable intersection of open sets. An $F_\sigma$-set
is a countable union of closed sets.
Proposition 2.4. Let E ⊂ Rd . Then E is measurable
1. if and only if E is a $G_\delta$-set with a set of measure zero removed,
2. if and only if E is the union of an $F_\sigma$-set and a set of measure zero.
In particular, any Lebesgue measurable set can be obtained from a Borel set
by adjoining a set of measure zero to the latter.
Proof. If E satisfies the conditions in (1) or (2) then E is measurable since Gδ ,
Fσ and sets of measure zero are measurable.
For the converse of (1) let now E be measurable. We choose for any $n \ge 1$
an open $U_n$ with $E \subset U_n$ and $m(U_n \setminus E) \le \frac{1}{n}$. Then clearly $S = \bigcap_{n=1}^{\infty} U_n$ is a $G_\delta$ containing
E and the measure of the complement $S \setminus E$ is zero in view of
$m(S \setminus E) \le m(U_n \setminus E) \le \frac{1}{n}$ being true for all $n \ge 1$. Write $E = S \setminus (S \setminus E)$.
For the converse of (2) let E be measurable and choose for any $n \ge 1$ a
closed $F_n \subset E$ with $m(E \setminus F_n) \le \frac{1}{n}$. Then clearly $S = \bigcup_{n=1}^{\infty} F_n$ is an $F_\sigma$ which
is contained in E and the complement $E \setminus S$ has measure zero in view of
$m(E \setminus S) \le m(E \setminus F_n) \le \frac{1}{n}$ being true for all $n \ge 1$. Write $E = S \cup (E \setminus S)$.
It is easy to check this is indeed an equivalence relation. By the fundamen-
tal theorem of equivalence relations X can be partitioned as the union of the
(disjoint) equivalence classes
\[ X = [0,1] = \bigcup_{\alpha} E_\alpha \]
Nk := N + rk
from which we conclude $\alpha \ne \beta$ and that $x_\alpha$ and $x_\beta$ are in the same equiv-
alence class, which is impossible, since we selected precisely one element
from each equivalence class.
• We have $N_k \subset [-1, 2]$ for any k and in particular $\bigcup_{k=1}^{\infty} N_k \subset [-1, 2]$.
This follows easily from the fact that $N \subset [0, 1]$ and $r_k \in [-1, 1]$.
• We have $[0, 1] \subset \bigcup_k N_k$.
To see this, let x ∈ [0, 1]. Since x sits in some equivalence class, we have
x = xα + rk for some xα and some rk ∈ [−1, 1]. It follows that x ∈ Nk for
some k.
• We have m (N ) = m (Nk ) by translation invariance.
Combining the facts above we have
\[ [0,1] \subset \bigcup_{k} N_k \subset [-1, 2] \]
with the union being a disjoint union. Monotonicity and the countable additivity
for measurable sets imply that
\[ 1 \le \sum_{k=1}^{\infty} m(N_k) \le 3 \]
2.6 Measurable Functions
From measurable sets we now turn to measurable functions. To motivate the
definition, it is actually worth taking a step back viewing things from a slightly
more abstract point of view.
where we noted that the inverse image commutes with arbitrary intersections.
To get (4) from (3) we note $f^{-1}((-\infty, a)) = f^{-1}([a, \infty)^c) = \big( f^{-1}([a, \infty)) \big)^c$
using that the inverse image commutes with taking the complement. I leave the
implication (4) to (5) to you and we conclude (5) =⇒ (1) by noting that any
open set U can be written as $\bigcup_{n=1}^{\infty} (a_n, b_n)$ (why?).
Note that in view of Proposition 2.7, Proposition 2.5 follows immediately
from the observation that the intervals appearing in it (individually) generate
BR and similarly for Proposition 2.6.
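\[ \Big( \limsup_{n \to \infty} f_n \Big)(x) := \limsup_{n \to \infty} \big( f_n(x) \big) \quad \text{and} \quad \Big( \liminf_{n \to \infty} f_n \Big)(x) := \liminf_{n \to \infty} \big( f_n(x) \big) \]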
Proof. We observe that $\sup_n f_n$ is measurable if the set $\{x \mid \sup_n f_n(x) > a\}$
is measurable for all $a \in \mathbb{R}$. But the latter set can be written as
$\bigcup_n \{x \mid f_n(x) > a\} = \bigcup_n f_n^{-1}((a, \infty])$ and this set is clearly measurable. The inf is done
analogously and for the lim sup it follows from the definition (7).
Corollary 2.1. If the sequence in the theorem converges to f, i.e. $\lim_{n \to \infty} f_n(x) = f(x)$
for every x, then f is measurable.
and for $a < 0$ that $\{x \mid f^2(x) > a\} = \{x \mid f^2(x) > 0\} \cup \{x \mid f(x) = 0\}$. For the
third assertion, observe that
\[ \{x \mid k \cdot f(x) > a\} = \Big\{ x \;\Big|\; f(x) > \frac{a}{k} \Big\} , \]
\[ \{x \mid (f+g)(x) > a\} = \bigcup_{r \in \mathbb{Q}} \{x \mid f(x) > a - r\} \cap \{x \mid g(x) > r\} , \]
differ by a set of measure zero (which is measurable). Indeed, if N is the set
where the two sets differ, then
and the first set is measurable by the measurability of g and the second because
it is a subset of a set of measure 0.
It is easy to see that the function χE is measurable if and only if the set E is
measurable.
Theorem 2.7. Let $f : \mathbb{R}^d \to [0, \infty]$ be measurable and non-negative. Then
there exists an increasing sequence of non-negative simple functions $(\varphi_k)_{k=1}^{\infty}$
that converges pointwise to f:
\[ \varphi_k(x) \le \varphi_{k+1}(x) \text{ for all } k \text{ and } x \quad \text{and} \quad \lim_{k \to \infty} \varphi_k(x) = f(x) \text{ for all } x . \]
It is easy to see that FN is still measurable and that FN (x) → f (x) for every
x as N → ∞. We now approximate FN by a sequence of simple functions by
partitioning the range as follows: We decompose the range [0, N] into $N \cdot M$
intervals of length 1/M and define
\[ E_{\ell,M} = \Big\{ x \in Q_N \;\Big|\; \frac{\ell}{M} < F_N(x) \le \frac{\ell+1}{M} \Big\} \quad \text{for } 0 \le \ell < NM . \]
These sets are all measurable and we can thus define the simple function
\[ F_{N,M}(x) = \sum_{\ell=0}^{NM-1} \frac{\ell}{M} \, \chi_{E_{\ell,M}}(x) . \]
Note that by construction we have $F_N(x) - F_{N,M}(x) \le \frac{1}{M}$ for all x. We finally
choose $M = N = 2^k$ and define a sequence of simple functions via
\[ \varphi_k(x) = F_{2^k, 2^k}(x) . \]
By construction we have $F_{2^k}(x) - \varphi_k(x) \le \frac{1}{2^k}$ for all x and $\varphi_k(x) \to f(x)$ for
all x as $k \to \infty$. Finally, $\varphi_k$ is also increasing (why?).
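The construction can be tried out in Python in dimension d = 1. This is a sketch under assumptions that we state explicitly: we take Q_N to be the interval [−N, N] and the truncation F_N(x) to be min(f(x), N) for x in Q_N and 0 otherwise, and we evaluate only at rational points so that the arithmetic is exact. The function name `phi` is ours.

```python
import math
from fractions import Fraction

def phi(f, k):
    """k-th simple approximation of a non-negative function f on the real line.

    Assumptions (ours): Q_N = [-N, N] and F_N = min(f, N) on Q_N, 0 outside,
    with M = N = 2**k as in the proof.
    """
    N = M = 2 ** k

    def phi_k(x):
        x = Fraction(x)
        if abs(x) > N:                  # x lies outside the cube Q_N
            return Fraction(0)
        F_N = min(Fraction(f(x)), N)    # truncation of f at height N
        if F_N == 0:                    # x belongs to none of the sets E_{l,M}
            return Fraction(0)
        ell = math.ceil(F_N * M) - 1    # the unique l with l/M < F_N <= (l+1)/M
        return Fraction(ell, M)

    return phi_k

square = lambda x: x * x
phi_2 = phi(square, 2)   # N = M = 4
```

For f(x) = x² one finds for example φ_1(1) = 1/2 and φ_2(1) = 3/4, so the values increase towards f(1) = 1 while the error is at most 2^{−k} once f(x) ≤ 2^k and |x| ≤ 2^k.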
We next remove the assumption that f should be non-negative:
\[ |\varphi_k(x)| \le |\varphi_{k+1}(x)| \text{ for all } k \text{ and } x \quad \text{and} \quad \lim_{k \to \infty} \varphi_k(x) = f(x) \text{ for all } x . \]
2.8.2 Approximating a measurable function by step functions
We can also approximate a measurable function by the simpler step functions.
The price we pay is that the convergence is only almost everywhere. Recall
the measurable function (2) from the introduction to illustrate that one can-
not expect the convergence to hold everywhere when approximating with step
functions.
We first isolate the following proposition:
Proposition 2.9. Let $h = \sum_{k=1}^{N} a_k \chi_{E_k}$ be a simple function with $m(E_k) < \infty$
for all k. Then for any $\epsilon > 0$ there exists a step function $\varphi$ such that
2.9 Littlewood's Three Principles
Before we turn to the integration theory let us introduce three heuristic prin-
ciples which nicely summarise the relation of the new notions of “measurable
set” and “measurable function” that we introduced above to the more familiar
notions of rectangles and continuous functions:
1. Every measurable set is nearly a finite union of closed cubes. (See Example
Sheet 2, Exercise 3c for the precise statement.)
2. Every measurable function is nearly continuous.
3. Every convergent sequence of measurable functions is nearly uniformly
convergent.
Let us first make item 3 more precise (Egoroff’s Theorem) and then use it
to formulate and prove item 2 precisely (Lusin’s Theorem).
Theorem 2.10 (Egoroff). Suppose (fk ) is a sequence of measurable functions
fk : E → R with E measurable and m (E) < ∞. Assume that
fk → f a.e. on E.
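measure 0 (say N) to $\tilde{f}$ such that $f_k \to \tilde{f}$ everywhere. Applying the result we obtain a closed
set $A_\epsilon$ with $f_k \to \tilde{f}$ uniformly on $A_\epsilon$ and $m(E \setminus A_\epsilon) < \epsilon$. We then choose an open set U with
$N \subset U$ and $m(U) \le \epsilon$. We now have that $f_k \to \tilde{f} = f$ uniformly on the closed set $A_\epsilon \cap U^c$
and also that $m(E \setminus (A_\epsilon \cap U^c)) \le m(E \setminus A_\epsilon) + m(U) \le \epsilon + \epsilon = 2\epsilon$.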
everywhere, so in particular $E = \bigcup_k E_k^n$ for any n. We also have $E_k^n \subset E_{k+1}^n$
and hence Proposition 2.3 applies, giving $\lim_{k \to \infty} m(E_k^n) = m(E)$. In view of
$m(E_k^n) + m(E \setminus E_k^n) = m(E)$ we conclude that for any n there exists a $k_n$ such
that $m(E \setminus E_k^n) \le \frac{1}{2^n}$ holds for all $k \ge k_n$. We now define
\[ \tilde{A}_\epsilon = \bigcap_{n \ge N} E_{k_n}^n \]
The point is that with this definition, $f_j$ converges uniformly to f on $\tilde{A}_\epsilon$. To see
this, recall that what we have to show is that given $\delta > 0$ there exists a J such that
$|f_j(x) - f(x)| < \delta$ holds for all $j > J$ and all $x \in \tilde{A}_\epsilon$. Now indeed if we choose
$n \ge N$ with $\frac{1}{n} < \delta$ we have for any $x \in \tilde{A}_\epsilon$ (hence $x \in E_{k_n}^n$ for any $n \ge N$) the
inequality
\[ |f_j(x) - f(x)| < \frac{1}{n} < \delta \quad \text{for all } j > k_n . \]
Note that $k_n$ just depends on $\delta$ (and $\epsilon$).
Finally, we claim that $m(E \setminus \tilde{A}_\epsilon) < \frac{\epsilon}{2}$. This simply follows from observing
that
\[ E \setminus \bigcap_{n \ge N} E_{k_n}^n = \bigcup_{n \ge N} \big( E \setminus E_{k_n}^n \big) \]
Remark 2.6. Note that the theorem does not make the (stronger) statement
that f is continuous on E at the points of Fǫ . Again the example of (2) is useful
here. That function is discontinuous at all points of [0, 1]. What is Fǫ ?
Proof. We use the result from Example Sheet 3 that every measurable function
is the pointwise limit almost everywhere of a sequence of continuous functions
together with Egoroff’s Theorem.
Given f we find (fn ) continuous with fn → f almost everywhere. Using
Egoroff’s Theorem we find a closed set Fǫ ⊂ E where the convergence is uniform
and such that m (E \ Fǫ ) ≤ ǫ. But the limit of a uniformly convergent sequence
of continuous functions is continuous proving that f |Fǫ
3 Integration Theory: The Lebesgue Integral
From now on all functions that appear are assumed to be measurable. Our goal
is to define the Lebesgue integral of a measurable function. To achieve this, we
proceed in stages. We first define the integral for simple functions, then for bounded
functions supported on a set of finite measure, then for non-negative functions and
finally in the general case.
Note that with this definition we can already integrate the function (2) from
the introduction since the rationals in [0, 1] are measurable with measure zero.
Proposition 3.1. Let ϕ, ψ be simple functions. The Lebesgue integral defined
as in (8) satisfies
1. Independence of the representation: If $\varphi = \sum_{k=1}^{N} a_k \chi_{E_k}$ is any representation
of $\varphi$ (not necessarily canonical), then
\[ \int_{\mathbb{R}^d} \varphi(x)\,dx = \sum_{k=1}^{N} a_k \, m(E_k) . \]
4. Monotonicity:
\[ \varphi \le \psi \text{ on } \mathbb{R}^d \implies \int_{\mathbb{R}^d} \varphi \le \int_{\mathbb{R}^d} \psi . \]
5. Triangle inequality: If $\varphi$ is simple, then so is $|\varphi|$ and
\[ \Big| \int \varphi \Big| \le \int |\varphi| . \]
We have allowed ourselves to write the shorthand $\int$ instead of $\int_{\mathbb{R}^d}$ above.
Proof. The proof of (1) is a bit fiddly and is relegated to Example Sheet 4. For
(2) we note that
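\[ \lambda\varphi + \mu\psi = \lambda \sum_{k=1}^{N} a_k \chi_{E_k} + \mu \sum_{\ell=1}^{M} b_\ell \chi_{F_\ell} \]
For (3) we simply note $\chi_{E \cup F} = \chi_E + \chi_F$ (for disjoint E and F) and use the linearity established in
(2). For (4) we note that if $\eta \ge 0$ then the canonical form is everywhere non-negative
and hence $\int \eta \ge 0$ by definition. Setting $\eta = \psi - \varphi$ and using linearity
the result follows. For (5), we put $\varphi$ in canonical form, $\varphi = \sum_{k=1}^{N} a_k \chi_{E_k}$, hence
$|\varphi| = \sum_{k=1}^{N} |a_k| \chi_{E_k}$ is simple and observe
\[ \Big| \int \varphi \Big| = \Big| \sum_{k=1}^{N} a_k \, m(E_k) \Big| \le \sum_{k=1}^{N} |a_k| \, m(E_k) = \int |\varphi| . \]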
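The independence of the representation in item (1) can be checked on a small example in Python. The sketch below is ours and makes simplifying assumptions: d = 1 and every set E_k is a half-open interval [l_k, r_k), so that m(E_k) = r_k − l_k. It compares the sum over an arbitrary (overlapping) representation with the sum over the canonical form, in which the function takes distinct non-zero values on disjoint sets.

```python
from fractions import Fraction

def integral(terms):
    """Sum of a_k * m(E_k) for terms (a_k, (l_k, r_k)) with E_k = [l_k, r_k)."""
    return sum(Fraction(a) * (Fraction(r) - Fraction(l)) for a, (l, r) in terms)

def canonical_form(terms):
    """Map each non-zero value c of the simple function to the list of
    disjoint intervals on which the function equals c."""
    cuts = sorted({Fraction(p) for _, (l, r) in terms for p in (l, r)})
    level_sets = {}
    for lo, hi in zip(cuts, cuts[1:]):
        # value of the function on [lo, hi): add up all terms covering it
        value = sum(Fraction(a) for a, (l, r) in terms if l <= lo and hi <= r)
        if value != 0:
            level_sets.setdefault(value, []).append((lo, hi))
    return level_sets

def integral_canonical(level_sets):
    """Sum of c * m({phi = c}) over the distinct non-zero values c."""
    return sum(c * sum(hi - lo for lo, hi in pieces)
               for c, pieces in level_sets.items())

# phi = 2*chi_[0,2) + 3*chi_[1,3) - 5*chi_[1,2): equals 2 on [0,1), 0 on [1,2), 3 on [2,3)
phi_terms = [(2, (0, 2)), (3, (1, 3)), (-5, (1, 2))]
levels = canonical_form(phi_terms)
```

Both ways of computing give the value 5 here: 2·2 + 3·2 − 5·1 from the overlapping representation and 2·1 + 3·1 from the canonical one.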
Observation 3.1. If f = g almost everywhere for f, g simple, then $\int f = \int g$.
Indeed, if h = 0 almost everywhere, then its canonical representation must
look like $h = \sum_{k=1}^{N} a_k \chi_{E_k}$ with $m(E_k) = 0$, which implies $\int h = 0$
(apply this to h = f − g and use linearity).
Remark 3.1. Note that by Theorem 2.8, given an f as in the Lemma, there
always exists a (ϕn ) satisfying the assumptions in the Lemma.
Proof. Defining $I_n = \int \varphi_n$ we set out to prove that $(I_n)$ is a Cauchy sequence. Let
us fix $\epsilon > 0$ arbitrary. Given the sequence $(\varphi_n)$ as in the Lemma, we first use
Egoroff’s Theorem to find a closed subset F of E where the convergence of $\varphi_n$
is uniform and such that $m(E \setminus F) \le \epsilon$. Given that the convergence is uniform
on F, we can find for the prescribed $\epsilon > 0$ an N such that $|\varphi_m(x) - \varphi_n(x)| \le \epsilon$
holds for all $m, n \ge N$ and all $x \in F$. But then for $m, n \ge N$ we have
\[ |I_m - I_n| \le \int_E |\varphi_m - \varphi_n| = \int_F |\varphi_m - \varphi_n| + \int_{E \setminus F} |\varphi_m - \varphi_n| \le \epsilon \, m(F) + 2M\epsilon \]
where we used the triangle inequality for the second integral in the last step.
Since $m(F) \le m(E) < \infty$ and $\epsilon$ is arbitrary we conclude that $(I_n)$ is Cauchy
and (1) is proven. For (2) one simply repeats the above argument now showing
that for any $\epsilon > 0$ one can find N such that $|I_n| \le \epsilon \, m(E) + \epsilon M$ for $n \ge N$.
Using the Lemma we can define the Lebesgue integral for f : Rd → R a
bounded function supported on a set of finite measure:
Definition 3.1. Let f : Rd → R be a bounded function supported on a set E of
finite measure. We define its Lebesgue integral as
\[ \int_{\mathbb{R}^d} f(x)\,dx := \lim_{n \to \infty} \int_{\mathbb{R}^d} \varphi_n(x)\,dx \tag{10} \]
• each fn supported on E with m (E) < ∞,
• fn (x) → f (x) for almost every x as n → ∞.
Then f is measurable, bounded a.e., supported on E for a.e. x and
\[ \int_E |f_n - f| \to 0 \quad \text{as } n \to \infty , \]
Remark 3.2. We will only prove the theorem with the additional assumption
that $|f(x)| \le M$ holds for all x (it is easy to see the assumptions already imply
this a.e.). Otherwise $\int_E |f_n - f|$ appearing in the conclusion is not (yet) defined!
In Section 3.3 we will define the integral for unbounded functions and also see
that the integrand can always be changed on a set of measure 0 without affecting
the integral. Hence the conclusion of the Theorem remains true in this case.
Proof. The limiting function f is measurable by combining Corollary 2.1 and the
remark in Section 2.6.6. We also have from combining the first and the third
item that $|f(x)| \le M$ for a.e. x and from combining the second and the third
that f is supported on E except for a set of measure 0. It remains to show
the estimate. The proof is almost identical to the one of Lemma 3.1. Given
$\epsilon > 0$ we find (by Egoroff’s theorem) $F \subset E$ with $f_n \to f$ uniformly on F and
$m(E \setminus F) \le \epsilon$. By the uniformity on F we can find N such that
$|f_n(x) - f(x)| \le \epsilon$ holds for all $x \in F$ and all $n \ge N$. But then
\[ \int_E |f_n - f|\,dx = \int_F |f_n - f|\,dx + \int_{E \setminus F} |f_n - f|\,dx \le \epsilon \, m(E) + 2M \, m(E \setminus F) \le \epsilon \, (m(E) + 2M) \]
holds for all $n \ge N$ which proves the claim. (Note that we have used Remark
3.2 here.)
Remark 3.3. Note that the conclusion of the theorem can be phrased as the
interchange of the limit and the integral in this particular situation:
\[ \lim_{n \to \infty} \int_E f_n = \int_E \lim_{n \to \infty} f_n . \]
For the Riemann integral we needed uniform convergence to draw this conclusion.
Here we obtain uniform convergence up to an arbitrarily small set from
Egoroff’s theorem, which together with the boundedness of the functions involved
allows us to draw the above conclusion.
Remark 3.4. Note that boundedness is indeed essential as the example of
$f_n(x) = n \chi_{(0, \frac{1}{n}]}(x)$ shows.
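A few lines of Python make the failure explicit. The sketch (with helper names of our choosing) evaluates f_n pointwise and computes its integral as value times measure, which is how the integral of a simple function is defined.

```python
from fractions import Fraction

def f(n, x):
    """f_n(x) = n * chi_{(0, 1/n]}(x)."""
    return n if 0 < x <= Fraction(1, n) else 0

def integral_f(n):
    """Integral of the simple function f_n: the value n times m((0, 1/n]) = 1/n."""
    return n * Fraction(1, n)

x = Fraction(1, 10)
values_at_x = [f(n, x) for n in range(1, 21)]
```

For every fixed x the values f_n(x) are eventually 0 (here as soon as n > 10), so f_n → 0 pointwise, yet the integral of f_n equals 1 for every n. The limit and the integral cannot be interchanged because the f_n are not uniformly bounded.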
3.2.1 Riemann integrable functions are Lebesgue integrable
Using the bounded convergence theorem, we can show that a Riemann integrable
function is measurable and that the Riemann integral agrees with the Lebesgue
integral in this case.
Theorem 3.2. Suppose f is Riemann integrable on the closed interval [a, b].
Then f is measurable and
\[ \int^{R}_{[a,b]} f(x)\,dx = \int^{L}_{[a,b]} f(x)\,dx . \]
(cf. Section 1.1, refining the partitions of the upper and lower Riemann sums)
and with
Z R Z R Z R
lim ϕk (x) dx = f (x) dx = lim ψk (x) dx . (13)
k→∞ [a,b] [a,b] k→∞ [a,b]
Now on step functions the Riemann and the Lebesgue integral agree by defini-
tion, so for all k we have
\[
  \int^{R}_{[a,b]} \varphi_k(x)\,dx = \int^{L}_{[a,b]} \varphi_k(x)\,dx , \qquad
  \int^{R}_{[a,b]} \psi_k(x)\,dx = \int^{L}_{[a,b]} \psi_k(x)\,dx .
\]
We now define pointwise ϕ̃ (x) = limk→∞ ϕk (x) and ψ̃ (x) = limk→∞ ψk (x),
these limits existing by monotonicity and boundedness of f (x). The functions
ϕ̃ and ψ̃ are measurable (Corollary 2.1), bounded and supported on [a, b]. Hence
the Bounded Convergence Theorem yields
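\[
  \lim_{k\to\infty} \int^{L}_{[a,b]} \varphi_k(x)\,dx = \int^{L}_{[a,b]} \tilde{\varphi}(x)\,dx
  \quad \text{and} \quad
  \lim_{k\to\infty} \int^{L}_{[a,b]} \psi_k(x)\,dx = \int^{L}_{[a,b]} \tilde{\psi}(x)\,dx .
\]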
3.3 Non-negative functions
We proceed further in enlarging our class of functions that we can integrate.
We now consider measurable extended valued functions
f : Rd → [0, ∞]
Not only are these functions potentially unbounded, they can also take the
value $+\infty$ on a measurable set and they may also be supported on a set of
infinite measure, for instance all of $\mathbb{R}^d$. We define the extended Lebesgue
integral for these functions by
\[
  \int_{\mathbb{R}^d} f(x)\,dx := \sup_g \int g(x)\,dx \tag{14}
\]
where we take the supremum over all measurable functions g such that 0 ≤ g ≤
f , such that g is bounded and supported on a set of finite measure.
Now the expression on the left is either finite or infinite. In the first case
we shall say that f is Lebesgue integrable. As usual we define for E ⊂ Rd
measurable the integral
\[
  \int_E f(x)\,dx := \int_{\mathbb{R}^d} f(x)\,\chi_E(x)\,dx \tag{15}
\]
Example 3.1. The function $F_a(x) = (1 + |x|^2)^{-a/2}$ is integrable for $a > d$.
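For orientation, the threshold $a > d$ can be checked by hand in dimension $d = 1$ (an illustrative special case added here, using that the integral over $\mathbb{R}$ is the monotone limit of the integrals over $[-R, R]$):

```latex
% d = 1, a = 2 > d: integrable
\int_{\mathbb{R}} \frac{dx}{1 + x^2} = \lim_{R\to\infty} 2 \arctan R = \pi < \infty ,
% d = 1, a = 1 = d: not integrable
\qquad
\int_{-R}^{R} \frac{dx}{\sqrt{1 + x^2}} = 2 \operatorname{arsinh} R \longrightarrow \infty
\quad (R \to \infty).
```

So in one dimension the borderline exponent $a = d$ already fails, consistent with the condition stated in the example.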
The extended Lebesgue integral has the familiar properties of the integral:
Proposition 3.3. The integral defined in (14) and (15) satisfies the following:
1. Linearity: If f, g ≥ 0 and λ, µ ∈ R are both positive, then
\[
  \int (\lambda f + \mu g) = \lambda \int f + \mu \int g
\]
and taking the sup over all ϕ1 and ϕ2 we obtain the first direction. For the
second, we let η ≥ 0 be bounded, supported on a set of finite measure with
η ≤ f + g. We then define η1 (x) = min (f (x) , η (x)) and η2 = η − η1 . We note
η1 ≤ f and η2 ≤ g and hence
\[
  \int \eta = \int (\eta_1 + \eta_2) = \int \eta_1 + \int \eta_2 \le \int f + \int g .
\]
Taking the sup over all η we obtain the ≤ direction and item 1 is proven. The
second item then immediately follows using the linearity.
It remains to prove (5) and (6). For (5) we let $E_k = \{x \mid f(x) \ge k\}$
and $E_\infty = \{x \mid f(x) = \infty\}$. We then have $E_k \supset E_{k+1}$ and
$E_\infty = \bigcap_k E_k$. The integrability tells us that
$\infty > \int f \ge \int f \chi_{E_k} \ge k \cdot m(E_k)$ and hence $m(E_k) \to 0$
as $k \to \infty$. Combining this with Proposition 2.3 we obtain $m(E_\infty) = 0$.
For (6) we let $E_k = \{x \mid f(x) > \frac{1}{k}\}$. Since
$\{x \mid f(x) > 0\} = \bigcup_{k=1}^{\infty} E_k$ it suffices to show that $m(E_k) = 0$
for all $k$ to conclude $m(\{x \mid f(x) > 0\}) = 0$.
We can infer this statement from $0 = \int f \ge \int f \chi_{E_k} \ge \frac{1}{k}\, m(E_k)$ for all $k \ge 1$.
Note that the converse of (6) is also true. If $\eta \ge 0$ and $\eta = 0$ a.e. then
$\int \eta = \sup_g \int g(x)\,dx = 0$. (Otherwise there would exist a bounded function
supported on a set of measure zero with non-zero integral, a contradiction.)
This shows that the extra assumption in Remark 3.2 is unnecessary for the
conclusion of Theorem 3.1 to be valid.
Next we will try to prove convergence results for the extended Lebesgue
integral. In particular, we can revisit the example of Remark 3.4 and ask about
a general statement regarding the exchange of the limit and the integral if the
functions under consideration are not uniformly bounded.
Lemma 3.2 (Fatou’s Lemma). Let (fn ) be a sequence of measurable functions
with fn ≥ 0. If limn→∞ fn (x) = f (x) for almost every x, then
\[
  \int f \le \liminf_{n\to\infty} \int f_n .
\]
Note that both the left hand side and the right hand side may be +∞.
Proof. We first note that it suffices to prove
\[
  \int g \le \liminf_{n\to\infty} \int f_n
\]
so taking the lim inf on the left already produces the desired result.
For a general sequence of measurable non-negative functions the inequality
given by Fatou’s Lemma is the best one can do. However, under additional
assumptions the interchange of the limit and the integral (and hence equality)
can be inferred. The following theorem provides such a setting and is one of the
cornerstones of the Lebesgue theory of integration. It will be used many
times in the sequel, so it is important to know this statement well!
Theorem 3.3 (Monotone Convergence Theorem, MCT). Let (fn ) be a sequence
of (extended real-valued) measurable functions with fn ≥ 0 and fn → f a.e.
Suppose in addition fn (x) ≤ fn+1 (x) holds a.e. in x for any n. Then
\[
  \lim_{n\to\infty} \int f_n = \int f . \tag{16}
\]
Proof. We have $\int f_n \le \int f$ for any $n$ by monotonicity of the integral. Hence
\[
  \limsup_{n\to\infty} \int f_n \le \int f \le \liminf_{n\to\infty} \int f_n
\]
where f + = max (f (x), 0) and f − = max (−f (x), 0) are the positive and negative
part of f respectively.
Remark 3.8. Note that $f = f^+ - f^-$ and $|f| = f^+ + f^-$ and hence $f^\pm \le |f|$,
so $f^\pm$ are indeed integrable. Note also that given two different decompositions of
$f$ into a difference of non-negative functions, i.e. $f = f^+ - f^- = g^+ - g^-$, one
has $f^+ + g^- = f^- + g^+$ and hence $\int f^+ + \int g^- = \int f^- + \int g^+$ by linearity of the
integral on non-negative functions. It follows that $\int f^+ - \int f^- = \int g^+ - \int g^-$
and hence that the value of the integral of $f$ is independent of the decomposition.
4 We could allow R here but we will see below that the notion of integrability requires the
Remark 3.9. We can always modify f on a set of measure zero without affecting
the integrability of f or the value of the integral (cf. the comments after the proof
of Proposition 3.3). Therefore, we may adopt the convention that a function f
can be undefined on a set of measure zero. Cf. also Footnote 4.
Proposition 3.4. The integral defined above is linear, additive, monotone and
satisfies the triangle inequality.
Proof. Exercise.
We next prove two interesting regularity properties of integrable functions:
Proposition 3.5. Suppose f is integrable on Rd . Then for every ǫ > 0
1. There exists a set of finite measure B (a large ball, for instance) such that
\[
  \int_{B^c} |f| < \epsilon \qquad \text{“vanishing at infinity”}
\]
The first statement expresses the intuitive fact that the function f has to
go to zero at infinity in a suitable sense in order to be integrable. The second
property says that for fixed f , if one integrates over sufficiently small sets, the
integral is small as well. The name absolute continuity will become clearer to
us later (see Section 4.2.5 and also Example Sheet 5).
Proof. Wlog $f \ge 0$ since otherwise we look at $|f|$.
For the first statement let $B_n$ be the ball of radius $n$ centred at the origin
and $f_n(x) = f(x)\,\chi_{B_n}(x)$. Note that $f_n \nearrow f$ and hence by the MCT
(Theorem 3.3), $\int f_n \to \int f < \infty$. This means there exists an $N$ such that
$\int f - \int f_N < \epsilon$, which is equivalent to $\int f(x)\,\chi_{(B_N)^c}(x)\,dx < \epsilon$
and hence the desired result.
For the second statement we set $E_n = \{x \mid f(x) \le n\}$ and
$f_n(x) := f(x)\,\chi_{E_n}(x)$. Noting that $f_n \nearrow f$ and hence by the MCT
$\int f_n \to \int f$, we conclude the existence of an $N$ with
$\int f - \int f_N < \frac{\epsilon}{2}$. But this means that
\[
  \int_E f = \int_E (f - f_N) + \int_E f_N < \frac{\epsilon}{2} + m(E)\,N .
\]
Now if we choose $\delta < \frac{\epsilon}{2N}$, then $m(E) < \delta$ implies that $\int_E f < \epsilon$ as desired.
We are ready to prove the other cornerstone of the Lebesgue Theory, the
dominated convergence theorem (DCT):
Theorem 3.4 (Dominated Convergence Theorem (DCT)). Let (fn ) be a se-
quence of measurable functions with fn → f a.e. If |fn (x) | ≤ g (x) where g is
integrable, then f is integrable and
\[
  \int |f_n - f| \to 0 \quad \text{as } n \to \infty .
\]
Proof. We provide two proofs. One is via Fatou’s Lemma. Starting from
$|f_n - f| \le 2g$ a.e., which holds by the triangle inequality, we apply Fatou’s Lemma to
the sequence of non-negative (after a change on a set of measure zero) functions
$2g - |f - f_n|$ to obtain
\[
  \int 2g \le \liminf_{n\to\infty} \int \big( 2g - |f_n - f| \big)
  = \int 2g + \liminf_{n\to\infty} \Big( -\int |f_n - f| \Big) .
\]
Since $\int g$ is finite by assumption we obtain $\limsup_{n\to\infty} \int |f_n - f| \le 0$, which
proves the result.
The other proof uses Proposition 3.5. Given $\epsilon > 0$ we first choose a large ball
$B_M$ such that $\int_{(B_M)^c} g < \epsilon$ by (1) of Proposition 3.5. We next invoke Egoroff’s
theorem to choose $X \subset B_M$ with $m(X) < \delta$ such that $f_n \to f$ uniformly on
$B_M \setminus X$. Here $\delta > 0$ is chosen as in (2) of Proposition 3.5, i.e. in particular so
that $\int_X g < \epsilon$. Using the uniform convergence on $B_M \setminus X$ we choose $N$ large
such that $|f_n(x) - f(x)| < \frac{\epsilon}{m(B_M)}$ holds for all $n \ge N$ and all $x \in B_M \setminus X$.
Combining everything we obtain for $n \ge N$
\[
  \int |f_n - f| = \int_{(B_M)^c} |f_n - f| + \int_{B_M \setminus X} |f_n - f| + \int_X |f_n - f| ,
\]
\[
  \int |f_n - f| \le \int_{(B_M)^c} 2g + \epsilon + \int_X 2g < 2\epsilon + \epsilon + 2\epsilon = 5\epsilon .
\]
As considering complex valued functions does not really add anything concep-
tually new to the theory we will continue to develop it for real valued functions.
Exercise 3.1. Show that $L^1(\mathbb{R}^d)$ is a normed vector space if we define the ele-
\[
  \| f_{n_{j+1}} - f_{n_j} \|_{L^1} \le \frac{1}{2^j} \quad \text{for all } j \ge 1. \tag{17}
\]
We then define
\[
  g_K(x) = |f_{n_1}(x)| + \sum_{j=1}^{K} |f_{n_{j+1}}(x) - f_{n_j}(x)|
\]
\[
  g(x) = |f_{n_1}(x)| + \sum_{j=1}^{\infty} |f_{n_{j+1}}(x) - f_{n_j}(x)|
\]
the last inequality following from the property (17) and that $f_{n_1} \in L^1$. We
conclude that $g$ is integrable and hence $g(x) < \infty$ for a.e. $x$. This implies that
the sum converges absolutely for a.e. $x$, which means that the right hand side of
\[
  f_{n_{k+1}}(x) = f_{n_1}(x) + \sum_{j=1}^{k} \big( f_{n_{j+1}}(x) - f_{n_j}(x) \big)
\]
converges for a.e. $x$, hence so does the left hand side. We conclude that the
subsequence $(f_{n_j})$ converges pointwise for a.e. $x$ to some limiting function which
we call $f$. Since $|f(x)| \le g(x)$ for a.e. $x$ we conclude that $f$ is integrable, so
$f \in L^1(\mathbb{R}^d)$. It remains to show that $f_{n_j} \to f$ in $L^1$. But this is immediate
from $|f(x) - f_{n_j}(x)| \le 2g(x)$ for a.e. $x$ and the dominated convergence theorem.
5 Going to a subsequence is necessary as we can have $\|f_n - f\|_{L^1} \to 0$ for some $(f_n)$ and $f$
such that $f_n(x) \to f(x)$ for no $x$! See Example Sheet 6.
To finish the proof we recall that if a subsequence of a Cauchy sequence
converges to a limit f , then so must the entire sequence.6 Hence fn → f in L1
and the theorem is proven.
Corollary 3.1. Let fn → f in L1 . Then there exists a subsequence (fnk ) with
fnk → f a.e. pointwise.
Proof. The assumption implies that (fn ) is Cauchy in L1 . Then repeat the
construction in the proof of Theorem 3.5.
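The passage to a subsequence in Corollary 3.1 cannot be dispensed with in general. A minimal sketch, assuming the standard "typewriter" sequence of indicator functions of dyadic intervals in [0, 1) (this choice and the helper names `typewriter`, `l1_norm` and `value` are illustrative assumptions, not necessarily the example on the Example Sheet): the L1 norms tend to zero, yet at a fixed point the values return to 1 once in every dyadic generation.

```python
from fractions import Fraction


def typewriter(n):
    """Return (a, b) such that f_n is the indicator of [a, b) inside [0, 1).

    Writing n = 2**k + j with 0 <= j < 2**k, the n-th function is the
    indicator of the dyadic interval [j / 2**k, (j + 1) / 2**k).
    """
    k = n.bit_length() - 1  # largest k with 2**k <= n
    j = n - 2**k
    return Fraction(j, 2**k), Fraction(j + 1, 2**k)


def l1_norm(n):
    # The L^1 norm of an indicator function is the length of its interval.
    a, b = typewriter(n)
    return b - a


def value(n, x):
    # Pointwise value f_n(x).
    a, b = typewriter(n)
    return 1 if a <= x < b else 0


x = Fraction(1, 3)
# Indices n < 2**10 at which f_n(x) = 1: exactly one per generation k = 0, ..., 9.
hits = [n for n in range(1, 2**10) if value(n, x) == 1]
```

Here `l1_norm(n)` equals 2**(-k) for n in the k-th generation, so the sequence converges to 0 in L1, while `hits` keeps growing as the range is extended; the values f_n(1/3) therefore do not converge. The subsequence n = 2**k + (2**k - 1), whose intervals shrink towards the right endpoint, does converge to 0 at every point of [0, 1).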
3.7 Dense families in $L^1(\mathbb{R}^d)$
We next consider certain families of simple (both in the colloquial and the precise
sense) functions which are dense in $L^1$. Recall the definition of dense:
Definition 3.3. A family $G$ of integrable functions is dense in $L^1$ if for any
$f \in L^1$ and any $\epsilon > 0$ we can find a $g \in G$ with $\|f - g\|_{L^1} < \epsilon$.
Why are dense families useful? In a typical application one wants to establish
an identity for integrable functions which involves the L1 -norm. To prove the
identity, it may be simpler to prove it for a dense family of functions in L1
because a (say) continuous function is much easier to manipulate than a general
element of L1 . Finally, a density argument allows one to extend the identity to
all L1 -functions. Example Sheet 5 provides an example.
Theorem 3.6. The following families of functions are dense in $L^1(\mathbb{R}^d)$:
1. simple functions
2. step functions
3. continuous functions of compact support
Proof. Exercise. Outline: For the first note that one may assume f ≥ 0 as one
can approximate separately for f + and f − . Then approximate f with (ϕk ) an
increasing sequence of simple functions converging pointwise to f and apply the
MCT to show convergence in L1 . For the second part it suffices to approximate
the characteristic function of a set of finite measure by a step function (why?).
For this Problem 3 from Example Sheet 2 will be handy. Finally for the third
conclusion one needs to smooth the edges of a step function.
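The first step of the outline can be made concrete in a simple case. The sketch below assumes f(x) = x on [0, 1] and the dyadic truncations phi_k(x) = floor(2^k x) / 2^k, which form an increasing sequence of simple functions converging pointwise to f (both choices, and the function name `simple_approx_error`, are illustrative assumptions). It computes the L1 distance between f and phi_k exactly with rational arithmetic.

```python
from fractions import Fraction


def simple_approx_error(k):
    """Exact L^1([0,1]) distance between f(x) = x and phi_k(x) = floor(2^k x) / 2^k."""
    h = Fraction(1, 2**k)  # width of each dyadic interval
    total = Fraction(0)
    for j in range(2**k):
        a, b = j * h, (j + 1) * h
        # On [a, b) the simple function phi_k equals a, so the error on this
        # interval is the integral of (x - a) over [a, b], i.e. (b - a)^2 / 2.
        total += (b - a) ** 2 / 2
    return total


errors = [simple_approx_error(k) for k in range(6)]
```

The errors halve with every step, in line with what the MCT argument of the outline predicts qualitatively: the integrals of phi_k increase to the integral of f, so the L1 distance tends to zero.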
or first in x and then in y
\[
  \int_{[0,1]\times[0,1]} \frac{x^2 - y^2}{(x^2 + y^2)^2}\,dy\,dx
  \overset{?}{=} \int_0^1 dy \int_0^1 dx\, \frac{x^2 - y^2}{(x^2 + y^2)^2} = -\frac{\pi}{4} .
\]
The fact that the result depends on the order in which the integration is carried
out tells us that some care is needed to state assumptions when a d-dimensional
integral can be computed in terms of iterated ones.
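Both iterated integrals can be computed by hand, since the integrand has an explicit antiderivative in each variable. The following worked computation (added for illustration) shows where the two signs come from:

```latex
% Inner integral in x (y > 0 fixed): \partial_x \big( -x/(x^2+y^2) \big) = (x^2-y^2)/(x^2+y^2)^2
\int_0^1 \frac{x^2 - y^2}{(x^2 + y^2)^2}\,dx
  = \Big[ -\frac{x}{x^2 + y^2} \Big]_{x=0}^{x=1} = -\frac{1}{1 + y^2},
\qquad
\int_0^1 -\frac{dy}{1 + y^2} = -\frac{\pi}{4} .
% Inner integral in y (x > 0 fixed): \partial_y \big( y/(x^2+y^2) \big) = (x^2-y^2)/(x^2+y^2)^2
\int_0^1 \frac{x^2 - y^2}{(x^2 + y^2)^2}\,dy
  = \Big[ \frac{y}{x^2 + y^2} \Big]_{y=0}^{y=1} = \frac{1}{1 + x^2},
\qquad
\int_0^1 \frac{dx}{1 + x^2} = +\frac{\pi}{4} .
```

The integrand is antisymmetric under exchanging $x$ and $y$, which is why the two orders differ exactly by a sign.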
the slice corresponding to $y \in \mathbb{R}^{d_2}$ as the function $f^y(x) := f(x, y)$ with $y$ fixed
the slice corresponding to $x \in \mathbb{R}^{d_1}$ as the function $f^x(y) := f(x, y)$ with $x$ fixed
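Theorem 3.8 (Tonelli). Let $f : \mathbb{R}^{d_1} \times \mathbb{R}^{d_2} \to [0, \infty]$ be measurable and non-negative
on $\mathbb{R}^{d_1} \times \mathbb{R}^{d_2}$. Then for almost every $y \in \mathbb{R}^{d_2}$
1. The slice $f^y$ is measurable on $\mathbb{R}^{d_1}$
2. The function defined by $y \mapsto \int_{\mathbb{R}^{d_1}} f^y(x)\,dx$ is measurable on $\mathbb{R}^{d_2}$.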
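Remark 3.12. Both theorems are symmetric in $x$ and $y$, i.e. the conclusion
(in say Fubini) is also that $f^x$ is integrable on $\mathbb{R}^{d_2}$ for a.e. $x$, that
$x \mapsto \int_{\mathbb{R}^{d_2}} f^x(y)\,dy$ is integrable in $\mathbb{R}^{d_1}$ and that
\[
  \int_{\mathbb{R}^{d_1}} dx \int_{\mathbb{R}^{d_2}} f(x, y)\,dy = \int_{\mathbb{R}^d} f .
\]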
Remark 3.13. Note that the function in (2) of Fubini, $y \mapsto \int_{\mathbb{R}^{d_1}} f^y(x)\,dx$, is
defined for almost every $y$. This is consistent with our earlier convention that
an integrable function can be undefined on a set of measure zero, cf. Remark
3.9. Similarly for (2) in Tonelli’s Theorem, where $y \mapsto \int_{\mathbb{R}^{d_1}} f^y(x)\,dx$ is a
measurable function on $\mathbb{R}^{d_2}$ minus a set of measure 0 and hence agrees a.e. with
a measurable function on $\mathbb{R}^{d_2}$.
Remark 3.14. Note that in Fubini’s Theorem we are assuming that $f$ is integrable.
In Tonelli’s theorem this is not assumed and in particular both sides of
(3) in Tonelli’s Theorem can be infinite. The point is that if that is the case,
both the iterated integrals and the $d$-dimensional one have to yield $+\infty$! This
provides a useful strategy to compute $\int_{\mathbb{R}^d} f$ for an arbitrary measurable function:
First compute $\int_{\mathbb{R}^d} |f|$. To compute this, one can by Tonelli’s theorem use ANY
convenient iterated integration:
• If one of them yields $+\infty$ then all of them have to and we can conclude
by (3) of Tonelli that $f$ is not integrable.
• If one of them yields a number smaller than $+\infty$, then any iteration of
integrals has to yield that number and (3) of Tonelli implies that $f$ is
integrable. Now the assumptions of Fubini’s theorem hold for this $f$ and
we are allowed to compute $\int_{\mathbb{R}^d} f$ using any version of iterated integrals.
Examples of this will be seen on Example Sheet 6.
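A classical illustration of this strategy (a worked example added here, not taken from the Example Sheet) is the function $f(x, y) = x e^{-x^2(1+y^2)}$ for $x, y > 0$, extended by zero. It is non-negative, so Tonelli applies directly, and comparing the two orders of integration evaluates the Gaussian integral. The second line uses the substitution $u = xy$ in the inner integral:

```latex
% f(x,y) = x e^{-x^2(1+y^2)} for x, y > 0 and f = 0 otherwise; f \ge 0, so Tonelli applies.
\int_0^\infty dy \int_0^\infty dx\; x e^{-x^2(1+y^2)}
  = \int_0^\infty \frac{dy}{2(1 + y^2)} = \frac{\pi}{4} < \infty ,
% so f is integrable and the other order has to give the same number:
\int_0^\infty dx\; x e^{-x^2} \int_0^\infty dy\; e^{-x^2 y^2}
  = \int_0^\infty dx\; e^{-x^2} \int_0^\infty du\; e^{-u^2}
  = \Big( \int_0^\infty e^{-u^2}\,du \Big)^2 ,
\qquad \text{hence} \qquad
\int_0^\infty e^{-u^2}\,du = \frac{\sqrt{\pi}}{2} .
```

The point of the example is that the convenient order (first in $x$) settles integrability, after which the less convenient order is known to give the same value.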
Remark 3.15. Revisiting the example in the beginning we conclude that this f
cannot be integrable over the unit square. (Exercise: Show this directly.)
Remark 3.16. Even if both of the iterated integrals exist and agree one cannot
infer that $f$ is integrable over $\mathbb{R}^d$. Try the function defined by
$f(x, y) = \frac{xy}{(x^2 + y^2)^2}$ for $x^2 + y^2 \neq 0$ and $f(x, y) = 0$ for $x = y = 0$.
[The proof is easiest in polar coordinates which we have not introduced rigorously
yet but give you an immediate intuition of how things fail here.]
3.8.3 The proof of Tonelli’s Theorem (using Fubini)
Since we want to apply Fubini’s theorem, we start with the truncation
\[
  f_k(x, y) =
  \begin{cases}
    f(x, y) & \text{if } |(x, y)| \le k \text{ and } f(x, y) < k, \\
    0 & \text{otherwise.}
  \end{cases}
\]
Note that the right hand side may well be $+\infty$ for some $y \in E^c$!
Applying Fubini again, we know that $y \mapsto \int_{\mathbb{R}^{d_1}} f_k(x, y)\,dx$ is a sequence of
integrable (hence measurable) functions defined almost everywhere on $\mathbb{R}^{d_2}$. By
(19) this sequence increases to the function $\int_{\mathbb{R}^{d_1}} f(x, y)\,dx$, hence the latter is a
measurable function, proving (2) of Tonelli.
By the remarks in the previous paragraph, we can apply the MCT again to
(19) obtaining
\[
  \int_{\mathbb{R}^{d_2}} dy \int_{\mathbb{R}^{d_1}} dx\, f_k(x, y) \to \int_{\mathbb{R}^{d_2}} dy \int_{\mathbb{R}^{d_1}} dx\, f(x, y) . \tag{20}
\]
Combining (20), (21) and (18) yields the statement (3) of Tonelli’s theorem.
(b) Prove it for E the boundary of a closed cube.
(c) Prove it for E a finite union of closed cubes.
(d) Prove it for E open and of finite measure.
(e) Prove it for E a Gδ of finite measure.
(f) Prove it for E having measure zero.
(g) Prove it for E an arbitrary finite measure set.
(4) Conclude that any $f \in L^1(\mathbb{R}^d)$ is in $F$ by approximating $f$ with simple
from the MCT. Furthermore, by the assumption that $f_k \in F$, we have for each
$k$ a set $A_k$ with $m(A_k) = 0$ and $f_k^y$ being (measurable and) integrable on $\mathbb{R}^{d_1}$
for $y \in (A_k)^c$. Setting as usual $A = \bigcup_{k=1}^{\infty} A_k$ we have $m(A) = 0$ and $f_k^y$ being
integrable for all $k$ and all $y \in A^c$.
Now from the fact that $f_k^y \nearrow f^y$, it follows that $f^y$ is measurable and the
MCT produces
\[
  \int_{\mathbb{R}^{d_1}} f_k^y(x)\,dx \nearrow \int_{\mathbb{R}^{d_1}} f^y(x)\,dx \tag{23}
\]
with the left hand side being integrable (hence measurable) by assumption. It
follows that the right hand side is measurable and applying the MCT again
yields
\[
  \int_{\mathbb{R}^{d_2}} dy \int_{\mathbb{R}^{d_1}} f_k^y(x)\,dx \to \int_{\mathbb{R}^{d_2}} dy \int_{\mathbb{R}^{d_1}} f^y(x)\,dx . \tag{24}
\]
By (3) of Fubini applied to fk ∈ F we know that the left hand side satisfies
\[
  \int_{\mathbb{R}^{d_2}} dy \int_{\mathbb{R}^{d_1}} f_k^y(x)\,dx = \int_{\mathbb{R}^d} f_k(x, y) \to \int_{\mathbb{R}^d} f(x, y) \tag{25}
\]
with the second (limit) statement being simply (22) from above. Combining
(24) and (25) yields the conclusion (3) of Fubini for f , namely
\[
  \int_{\mathbb{R}^{d_2}} dy \int_{\mathbb{R}^{d_1}} f^y(x)\,dx = \int_{\mathbb{R}^d} f(x, y) < \infty . \tag{26}
\]
Here the $< \infty$ follows from the assumption that $f \in L^1$. We now see that (26)
implies that the measurable function $y \mapsto \int_{\mathbb{R}^{d_1}} f^y(x)\,dx$ is integrable
(which is conclusion (2) of Fubini) and from $\int_{\mathbb{R}^{d_1}} f^y(x)\,dx < \infty$ for almost every
$y$ we conclude that $f^y$ is integrable for almost every $y$ (which is conclusion (1) of Fubini).
Step 3.
(a) Let $E$ be a bounded open cube in $\mathbb{R}^d$, $E = Q_1 \times Q_2$ with $Q_i$ an open cube in
$\mathbb{R}^{d_i}$. For each fixed $y$, the characteristic function $\chi_E(x, y)$ is measurable
in $x$ and integrable with (recall the notation (4))
\[
  g(y) = \int_{\mathbb{R}^{d_1}} \chi_E(x, y)\,dx = |Q_1|\,\chi_{Q_2}(y)
\]
(f) Let E be a set of measure zero. There is a Gδ -set G with E ⊂ G and
m (G) = 0 (why?). We know that χG ∈ F by the previous step and from
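\[
  \int_{\mathbb{R}^{d_2}} dy \int_{\mathbb{R}^{d_1}} dx\, \chi_G(x, y) = \int_{\mathbb{R}^d} \chi_G = 0
\]
we infer $\int_{\mathbb{R}^{d_1}} \chi_G(x, y)\,dx = 0$ for a.e. $y$. Since $0 \le \chi_E \le \chi_G$, the same statement
holds for $\chi_E$. The three conclusions of Fubini are now immediate
and we conclude $\chi_E \in F$.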
(g) Let $E$ be an arbitrary measurable set of finite measure. By Proposition
2.4 we can write $E = G \setminus N$ for $G$ a $G_\delta$-set and $N$ a set of measure
zero contained in $G$. Therefore $\chi_E = \chi_G - \chi_N$ and since this is a finite
linear combination of functions belonging to $F$ we conclude by Step 1 that
$\chi_E \in F$.
Step 4. We now conclude the proof. If $f \in L^1(\mathbb{R}^d)$ we have $f = f^+ - f^-$
4 Differentiation and Integration
Now that we have defined a new integral, the Lebesgue integral, we shall investi-
gate its relation with differentiation. In your first year analysis courses you met
this relation as the Fundamental Theorem of Calculus (involving the Riemann
integral).
is differentiable for almost every x ∈ [a, b] and F ′ (x) = f (x) holds for a.e. x ∈
[a, b].
Note that if f is continuous this statement holds by the fundamental theorem
of calculus. The above theorem indeed turns out to be true as stated. To prove
it, we will be led to the averaging problem.
It is easy to see that Theorem 4.1 follows if we can show that for almost
every x we have
\[
  \lim_{h\to 0^+} \frac{1}{h} \int_x^{x+h} f(y)\,dy = \lim_{h\to 0^+} \frac{1}{h} \int_{x-h}^{x} f(y)\,dy = f(x) . \tag{28}
\]
where I denotes a (say open) interval containing x. Below we will study the
averaging problem in dimension d. More precisely, we will prove the following
Theorem 4.2. (Lebesgue-Differentiation-Theorem) Suppose f is integrable on
Rd . Then
\[
  \lim_{\substack{m(B)\to 0 \\ x \in B}} \frac{1}{m(B)} \int_B f(y)\,dy = f(x) \quad \text{holds for a.e. } x,
\]
4.1.1 Proof of the Lebesgue Differentiation Theorem
To prove Theorem 4.2 we shall use the observation that the Theorem is true
for f being continuous and that the continuous functions are dense in L1 . To
estimate the error-terms that arise in the approximation by continuous functions
we shall need the important Hardy-Littlewood maximal function.
Definition 4.1. Let f : Rd → R be integrable. The maximal function f ⋆ is
defined by
\[
  f^\star(x) = \sup_{B \ni x} \frac{1}{m(B)} \int_B |f(y)|\,dy
\]
\[
  m\big( \{ x \in \mathbb{R}^d \mid f^\star(x) > \alpha \} \big) \le \frac{3^d}{\alpha}\, \|f\|_{L^1(\mathbb{R}^d)} .
\]
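To get a feeling for the maximal function and this estimate, consider $d = 1$ and $f = \chi_{[0,1]}$ (a worked example added for illustration). Taking open intervals that contain $x$ and just reach the far end of $[0,1]$ gives:

```latex
f^\star(x) =
\begin{cases}
  1 & 0 \le x \le 1, \\
  1/x & x > 1, \\
  1/(1 - x) & x < 0,
\end{cases}
\qquad
m\big( \{ f^\star > \alpha \} \big) = \frac{2}{\alpha} - 1 \le \frac{3}{\alpha}\, \|f\|_{L^1(\mathbb{R})}
\quad (0 < \alpha < 1).
% f^\star decays only like 1/|x| and is therefore not integrable, although f is.
```

In particular one cannot expect $f^\star \in L^1$, which is why the estimate is phrased in terms of the measure of the set where $f^\star$ is large.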
has measure zero.7 To do this, we fix α and show for any ǫ > 0 we have
m (Eα ) < ǫ, hence m (Eα ) = 0.
Fix $\alpha$ and let $\epsilon > 0$. We choose a continuous function of compact support $g$
with
\[
  \|f - g\|_{L^1(\mathbb{R}^d)} < \epsilon .
\]
Since $g$ is continuous, we have for all $x$ that (why?)
\[
  \lim_{\substack{m(B)\to 0 \\ x \in B}} \frac{1}{m(B)} \int_B g(y)\,dy = g(x) .
\]
Taking the lim sup we observe that the second term on the right goes to zero
while the first term can be estimated by the maximal function as clearly the
lim sup is dominated by the sup over all balls. Hence
\[
  \limsup_{\substack{m(B)\to 0 \\ x \in B}} \left| \frac{1}{m(B)} \int_B f(y)\,dy - f(x) \right|
  \le (f - g)^\star(x) + |g(x) - f(x)| . \tag{30}
\]
4.1.2 Proof of Proposition 4.1
The first assertion follows from observing that $E_\alpha = \{x \in \mathbb{R}^d \mid f^\star(x) > \alpha\}$
is open. Indeed, if $x \in E_\alpha$, there is an open ball $B$ with $x \in B$ and
$\frac{1}{m(B)} \int_B |f(y)|\,dy > \alpha$. But then, since a small neighborhood of $x$ is also contained
in $B$ we have $f^\star(\tilde{x}) > \alpha$ for all points $\tilde{x}$ in that neighborhood.
Assertion (3) follows from (2) by observing that
\[
  \{x \mid f^\star(x) = \infty\} = \bigcap_{n=1}^{\infty} \{x \in \mathbb{R}^d \mid f^\star(x) > n\}
\]
and hence by monotonicity and (2), $m(\{x \mid f^\star(x) = \infty\}) \le \frac{3^d}{n} \|f\|_{L^1}$ for any $n$,
which proves $m(\{x \mid f^\star(x) = \infty\}) = 0$ as desired.
To prove (2) we use the following version of the Vitali Covering Lemma:
Lemma 4.1. Suppose $\mathcal{B} = \{B_1, B_2, ..., B_N\}$ is a finite collection of open balls
in $\mathbb{R}^d$. Then there exists a disjoint subcollection $B_{i_1}, ..., B_{i_k}$ of $\mathcal{B}$ that satisfies
\[
  m\Big( \bigcup_{n=1}^{N} B_n \Big) \le 3^d \sum_{j=1}^{k} m(B_{i_j}) .
\]
where we have used the dilation invariance of the Lebesgue measure in the last
step.
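The standard proof of this lemma selects the balls greedily: take a ball of largest radius, discard every ball meeting it, and repeat with what is left. A minimal sketch of this selection in dimension d = 1, where balls are open intervals (a, b) and the constant is 3^1 = 3 (the function names `vitali_select` and `union_length` and the sample intervals are illustrative assumptions):

```python
def vitali_select(intervals):
    """Greedily pick pairwise disjoint intervals from a finite list of open intervals (a, b)."""
    chosen = []
    # Consider the longest intervals first.
    for a, b in sorted(intervals, key=lambda ab: ab[1] - ab[0], reverse=True):
        # Open intervals (a, b) and (c, d) are disjoint iff b <= c or d <= a.
        if all(b <= c or d <= a for c, d in chosen):
            chosen.append((a, b))
    return chosen


def union_length(intervals):
    """Lebesgue measure of a finite union of intervals."""
    total, right = 0.0, float("-inf")
    for a, b in sorted(intervals):
        if b > right:
            # Only the part of (a, b) to the right of what is already covered counts.
            total += b - max(a, right)
            right = b
    return total


balls = [(0.0, 4.0), (3.0, 5.0), (4.5, 6.0), (5.5, 9.0), (8.0, 8.5), (10.0, 11.0)]
picked = vitali_select(balls)
covered = union_length(balls)               # m of the union of all intervals
selected = sum(b - a for a, b in picked)    # sum of m(B_{i_j}) over the disjoint picks
```

Every discarded interval meets a selected interval that is at least as long, so it is contained in the threefold dilate of that selected interval; this is where the factor 3^d in the lemma comes from. In the sample data `covered` is 10 while `3 * selected` is 25.5.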
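With the Lemma we can prove (2) of the Proposition. We let
\[
  E_\alpha = \{x \mid f^\star(x) > \alpha\} .
\]
Given $x \in E_\alpha$ there exists a ball $B_x$ containing $x$ with
\[
  \frac{1}{m(B_x)} \int_{B_x} |f(y)|\,dy > \alpha \quad \text{or equivalently} \quad
  m(B_x) < \frac{1}{\alpha} \int_{B_x} |f(y)|\,dy . \tag{33}
\]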
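We fix an arbitrary compact set $K \subset E_\alpha$. We have $K \subset \bigcup_{x \in E_\alpha} B_x$ and by
compactness, $K \subset \bigcup_{n=1}^{N} B_n$ for a finite subcollection. Now
\[
  m(K) \le m\Big( \bigcup_{n=1}^{N} B_n \Big) \le 3^d \sum_{j=1}^{k} m(B_{i_j})
  \le \frac{3^d}{\alpha} \sum_{j=1}^{k} \int_{B_{i_j}} |f(y)|\,dy
\]
where we have used the monotonicity in the first, the covering Lemma in the
second and (33) in the third step. Since the balls $B_{i_j}$ are disjoint, it is clear that we have
\[
  m(K) \le \frac{3^d}{\alpha} \int_{\bigcup_j B_{i_j}} |f(y)|\,dy \le \frac{3^d}{\alpha} \int_{\mathbb{R}^d} |f(y)|\,dy .
\]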
Since this holds for all compact subsets K of Eα the estimate holds for Eα itself
(Exhaust Eα by increasing compact sets and take the limit.)
The length of a rectifiable curve is defined as the smallest such $M$ or,
equivalently, as the sup over all partitions, i.e.
\[
  L(\gamma) = \sup_{a = t_0 < t_1 < ... < t_N = b} \sum_{j=1}^{N} |\gamma(t_j) - \gamma(t_{j-1})|
\]
The question we now ask is: What conditions on $x(t)$ and $y(t)$ for a given curve
guarantee that the curve is rectifiable? If $x$ and $y$ are continuously differentiable,
we know that this is the case and we can even establish the formula
\[
  L(\gamma) = \int_a^b |\dot{\gamma}(t)|\,dt = \int_a^b \sqrt{\dot{x}^2(t) + \dot{y}^2(t)}\,dt .
\]
But what about weaker conditions?
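The definition of the length as a supremum over partitions can be explored numerically. A minimal sketch, assuming the unit circle gamma(t) = (cos t, sin t) on [0, 2*pi] and uniform partitions (the helper names `polygonal_length`, `circle` and `uniform_partition` are illustrative assumptions): the polygonal lengths increase under refinement and approach the value 2*pi given by the integral formula.

```python
import math


def polygonal_length(gamma, partition):
    """Sum of |gamma(t_j) - gamma(t_{j-1})| over consecutive partition points."""
    points = [gamma(t) for t in partition]
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def circle(t):
    # Continuously differentiable curve with |gamma'(t)| = 1.
    return (math.cos(t), math.sin(t))


def uniform_partition(a, b, n):
    # n + 1 equally spaced points a = t_0 < t_1 < ... < t_n = b.
    return [a + (b - a) * j / n for j in range(n + 1)]


# Each partition refines the previous one, so the lengths can only increase.
lengths = [polygonal_length(circle, uniform_partition(0.0, 2 * math.pi, n))
           for n in (4, 8, 16, 1024)]
```

For n = 4 the polygon is the inscribed square with perimeter 4 * sqrt(2); every inscribed polygon is shorter than the circle, so 2*pi is an admissible bound M and, in the limit, the smallest one.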
Let F : [a, b] → R be a function, not necessarily continuous. Then given a
partition P of [a, b], say a = t0 < t1 < ... < tN = b, we define
\[
  V_{F,P} = \sum_{j=1}^{N} |F(t_j) - F(t_{j-1})|
\]
to be the variation of $F$ on $P$.
2. If F : [a, b] → R is Lipschitz on [a, b] then F is of bounded variation.
To see this, let $|F(y) - F(x)| \le L|y - x|$ for all $x, y \in [a, b]$.9 But then
\[
  \sum_{j=1}^{N} |F(t_j) - F(t_{j-1})| \le \sum_{j=1}^{N} L\,|t_j - t_{j-1}| = L(b - a)
\]
Note that (as the last example shows) F continuous does not imply F of bounded
variation (and neither the other way around, as a simple jump function shows)!
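The Lipschitz bound on the variation can be checked numerically. A minimal sketch, assuming F = sin on [0, 2*pi] (Lipschitz with constant L = 1) and uniform partitions; the names `variation` and `uniform_partition` are illustrative assumptions:

```python
import math


def variation(F, partition):
    """V_{F,P}: sum of |F(t_j) - F(t_{j-1})| over the partition P."""
    return sum(abs(F(t) - F(s)) for s, t in zip(partition, partition[1:]))


def uniform_partition(a, b, n):
    # n + 1 equally spaced points a = t_0 < t_1 < ... < t_n = b.
    return [a + (b - a) * j / n for j in range(n + 1)]


a, b = 0.0, 2 * math.pi
# sin is Lipschitz with constant L = 1, so every V_{F,P} is at most L * (b - a).
coarse = variation(math.sin, uniform_partition(a, b, 4))
fine = variation(math.sin, uniform_partition(a, b, 1000))
```

Both values stay below L * (b - a) = 2*pi. They are close to 4, the total variation of sin over one period, because both partitions contain the turning points pi/2 and 3*pi/2 up to rounding.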
the positive variation of $F$ (where the sum is over those $j$ for which
$F(t_j) - F(t_{j-1}) \ge 0$) and
\[
  N_F(a, x) = \sup_{P \text{ of } [a,x]} \sum_{(-)} -\big( F(t_j) - F(t_{j-1}) \big)
\]
the negative variation of $F$ (where the sum is over those $j$ for which
$F(t_j) - F(t_{j-1}) \le 0$). Observe that $T_F$ and also $P_F$ and $N_F$ are increasing and bounded,
hence functions of bounded variation. The functions are related as follows:
Lemma 4.2. Suppose F : [a, b] → R is of bounded variation on [a, b]. Then for
all a ≤ x ≤ b we have the relations:
Proof. Note that the above relations would clearly hold for any partition if there
was no sup in the definition of NF , PF and TF . The idea of the proof is therefore
to borrow an ǫ. Let ǫ > 0 be given. We pick partitions P1 and P2 such that
\[
  0 \le P_F - \sum_{(+) \in P_1} \big( F(t_j) - F(t_{j-1}) \big) < \epsilon , \qquad
  0 \le N_F - \sum_{(-) \in P_2} -\big( F(t_j) - F(t_{j-1}) \big) < \epsilon
\]
we can add NF − PF on both sides and use the above estimates to obtain
Since ǫ was arbitrary the first relation is established. For the second estimate
we note that for any partition P of [a, x], a = t0 < t1 < ... < tN = x we have
\[
  \sum_{j=1}^{N} |F(t_j) - F(t_{j-1})| = \sum_{(+) \in P} \big( F(t_j) - F(t_{j-1}) \big)
  + \sum_{(-) \in P} -\big( F(t_j) - F(t_{j-1}) \big) .
\]
which holds for any partition. Given ǫ > 0 find partitions P1 and P2 with
\[
  0 \le P_F - \sum_{(+) \in P_1} \big( F(t_j) - F(t_{j-1}) \big) < \epsilon , \qquad
  0 \le N_F - \sum_{(-) \in P_2} -\big( F(t_j) - F(t_{j-1}) \big) < \epsilon
\]
Both F1 and F2 are increasing and bounded and their difference is F (x) by the
Lemma.
4.2.4 Bounded variation implies differentiable a.e.
Now that we have introduced the class of functions of bounded variation we can
answer the question a) posed at the beginning of Section 4.2 by the following
Theorem 4.6. If F : [a, b] → R is of bounded variation, then F is differentiable
a.e., i.e. the limit
\[
  \lim_{h\to 0} \frac{F(x + h) - F(x)}{h}
\]
exists for almost every x ∈ [a, b].
The proof of this theorem is quite intricate and we will postpone it to Sec-
tion 4.2.7, where we prove it under the additional assumption that F is also
continuous. For the general case, see Stein-Shakarchi.
Even the weaker statement (assuming that F is also continuous) leads to
the following corollaries, the first one following from the earlier observation that
Lipschitz functions are of bounded variation:
Corollary 4.2 (Rademacher’s theorem in 1 dimension). If F : [a, b] → R is
Lipschitz, then it is differentiable a.e.
Corollary 4.3. If F : [a, b] → R is increasing and continuous, then F ′ exists
almost everywhere. Moreover F ′ is integrable and
\[
  \int_a^b F'(x)\,dx \le F(b) - F(a) . \tag{37}
\]
Remark 4.2. Recall that ideally we would like to show the equality (34) instead
of the inequality in the corollary. However, the example of the Cantor-Lebesgue
function, studied in detail on Example Sheets 1, 3 and 8, shows that one
cannot expect equality to hold without additional assumptions (beyond bounded
variation) on $F$. The additional condition guaranteeing (34) will be that of
absolute continuity. See Section 4.2.5.
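For orientation, the failure of equality for the Cantor-Lebesgue function can be summarised in one line (a short worked computation added here, using only the standard middle-thirds construction):

```latex
% Cantor-Lebesgue function F on [0,1]: continuous, increasing, F(0) = 0, F(1) = 1.
% F is constant on each middle-third interval removed in the construction, and
m\big( \text{intervals removed after } n \text{ steps} \big)
  = \sum_{k=1}^{n} \frac{2^{k-1}}{3^{k}} = 1 - \Big( \frac{2}{3} \Big)^{n} \longrightarrow 1 ,
% so F' = 0 almost everywhere and the inequality (37) is strict:
\qquad
\int_0^1 F'(x)\,dx = 0 < 1 = F(1) - F(0) .
```

So $F$ is continuous, increasing and of bounded variation, yet all of its growth happens on a set of measure zero.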
Proof. Note that F is certainly in BV and hence the derivative F ′ exists a.e. by
Theorem 4.6. In particular, for n ≥ 1 the difference quotients
\[
  G_n(x) = \frac{F(x + \frac{1}{n}) - F(x)}{\frac{1}{n}} \ge 0
\]
from which it is manifest that the right hand side converges (independently of
the extension) to F (b) − F (a) as desired, in view of the continuity of F .
with $f$ integrable on $[a, b]$, then $F$ is absolutely continuous. This follows from
Proposition 3.5 and hence justifies the name introduced there. From this observation
it is immediate that absolute continuity of $F$ is a necessary condition if
$F$ is to satisfy the identity (34): Indeed, if (34) holds, then $F'$ is integrable and
therefore the right hand side of (34) is absolutely continuous. Hence so is the
left hand side.
The next theorem shows that absolute continuity is also a sufficient condi-
tion.
Theorem 4.7 (Fundamental Theorem of Lebesgue integration).
1. If F : [a, b] → R is absolutely continuous on [a, b], then F ′ exists almost
everywhere and is integrable. Moreover
\[
  F(x) - F(a) = \int_a^x F'(y)\,dy \quad \text{holds for all } a \le x \le b. \tag{39}
\]
Note that we already proved the second part of the theorem: Indeed we have
observed that the expression for $F$ is absolutely continuous and by the Lebesgue
differentiation theorem we know that $F'(x) = f(x)$ holds almost everywhere.
Hence the difficulty is proving the first part. I claim the first part will follow if
we can prove
Theorem 4.8. If F : [a, b] → R is absolutely continuous on [a, b], then F ′ exists
a.e. Moreover, if F ′ (x) = 0 for a.e. x, then F is constant.
Indeed, assuming Theorem 4.8 for the moment, the proof of Theorem 4.7
becomes rather short. We first observe that F absolutely continuous implies that
F is continuous and of bounded variation (Lemma 4.3) and hence that the F1
and F2 in the decomposition F = F1 −F2 are increasing and continuous (Lemma
4.4). Corollary 4.3 then implies that ($F_1$ and $F_2$, hence) $F$ is differentiable
a.e. and also that ($F_1'$ and $F_2'$, hence) $F'$ is integrable on $[a, b]$.
We now define $G(x) = \int_a^x F'(y)\,dy$. Clearly $G$ is absolutely continuous and
hence so is $G(x) - F(x)$. Theorem 4.1 implies that $G'(x) = F'(x)$ a.e. We
conclude that the function $G - F$ is absolutely continuous and has derivative zero
a.e. and, applying Theorem 4.8, that $G - F$ is constant. Observing
$(G - F)(x) = (G - F)(a) = -F(a)$ produces the identity (39).
It remains to prove Theorem 4.8. Just like the proof of Theorem 4.6 (which
we postponed to Section 4.2.7), the proof is quite intricate and isolated in the
following subsection 4.2.6. While you don’t have to remember the details of
the proof of these theorems you should realise that they (together with the
Lebesgue differentiation theorem) are at the heart of the Fundamental Theorem
of Lebesgue integration, Theorem 4.7.
In other words, in a Vitali covering of $E$ every point is covered by arbitrarily
small balls. The next lemma asserts that given a Vitali covering of a set of finite
measure, we can pick finitely many balls which cover the set $E$ up to an arbitrary
prescribed $\delta > 0$:
Lemma 4.5. Suppose E ⊂ Rd is a set of finite measure, m (E) < ∞, and B
is a Vitali covering of E. Then, for any δ > 0 we can find finitely many balls
B1 , B2 , ..., BN in B which are disjoint and so that
\[
  \sum_{j=1}^{N} m(B_j) \ge m(E) - \delta .
\]
Note that the first estimate alone does still allow a large fraction of E not
to be covered by balls since the positivity on the left hand side of the inequality
could come from a large ball which lies mostly outside of E. The second estimate
excludes that possibility stating that the set that is not covered by balls of the
finite subcollection is also small. (Draw some pictures!)
Proof. The idea of the proof of the Lemma is to approximate the set E from
inside using compact sets, and then use the elementary covering Lemma 4.1 to
extract a finite disjoint subcollection covering at least a part of E. One then
looks at the part not yet covered and – in case it is still too large – approximates
it again from inside by a compact set, applies the old covering lemma and so
on. This procedure will eventually lead to the δ approximation claimed.
The details are as follows. It clearly suffices to prove the estimates for
δ < m (E). Fix such a δ. Let us also pick an open set U with E ⊂ U and
m (U) < m (E) + δ.
Step 1: Since $m(E) > \delta$, we can pick a compact set $E_1' \subset E$ with $m(E_1') \ge \delta$
(why can we do this? Example Sheet 3!). We cover E1′ by balls from B such
that every ball of the covering also lies in U (this is possible because E and
hence E1′ are covered by balls of arbitrarily small radius). Using compactness
we choose a finite sub-collection of balls covering E1′ (and contained in U) and
using Lemma 4.1 a finite disjoint subcollection B1 , ...BN1 such that
\[
  \sum_{i=1}^{N_1} m(B_i) \ge \frac{1}{3^d}\, m(E_1') \ge \frac{\delta}{3^d} .
\]
Step 2: If
\[
  \sum_{i=1}^{N_1} m(B_i) \ge m(E) - \delta
\]
and hence
\[
  E_2 = E \setminus \bigcup_{i=1}^{N_1} B_i
\]
has measure $m(E_2) > \delta$ (why?). We then repeat the procedure of Step 1, i.e. we
find a compact subset $E_2'$ with $m(E_2') \ge \delta$. We can cover the set $E_2'$ by finitely
many balls contained in $U$ and disjoint from $\bigcup_{i=1}^{N_1} B_i$ (why? Note that any
point in $E_2'$ has finite distance from $\bigcup_{i=1}^{N_1} B_i$). Using the old covering Lemma
4.1, we select a finite disjoint collection of these balls $B_{N_1+1}, ..., B_{N_2}$ such that
\[
  \sum_{i=N_1+1}^{N_2} m(B_i) \ge \frac{1}{3^d}\, m(E_2') \ge \frac{\delta}{3^d} .
\]
in which case we repeat the procedure in Step 2. If the procedure has not
terminated after k steps we have the estimate
\[
  \sum_{i=1}^{N_k} m(B_i) \ge k\, \frac{\delta}{3^d} .
\]
Using the Lemma (in dimension d = 1), we can now complete the proof
of Theorem 4.8. Note that the difficulty is to prove that F ′ = 0 implies F is
constant, since the existence of F ′ follows from the fact that F is of bounded
variation and Theorem 4.6.
It clearly suffices to show F (b) = F (a) as we can then replace [a, b] by an
arbitrarily small subinterval. We let
E := {x ∈ (a, b) | F ′ (x) exists and is zero}
and we know m (E) = b − a by assumption. We have for each x ∈ E that
\[
  \lim_{h\to 0} \frac{F(x + h) - F(x)}{h} = 0 .
\]
Fix now ǫ > 0. In view of the existence of the above limit, we can find for each
η > 0 an open interval around each x ∈ E, Ix = (ax , bx ) ⊂ [a, b] with length
smaller than η, i.e. bx − ax < η, such that
|F (bx ) − F (ax )| ≤ ǫ(bx − ax ) .
The collection of these intervals forms a Vitali covering of the set $E$. Hence we
can apply Lemma 4.5: For any $\delta > 0$ (which we will choose momentarily, dependent
on $\epsilon$) we can select finitely many disjoint intervals $I_i = I_{x_i} = (a_{x_i}, b_{x_i})$
with $1 \le i \le N$ such that
\[
  \sum_{i=1}^{N} m(I_i) \ge m(E) - \delta = (b - a) - \delta
\]
because the intervals (ai , bi ) are disjoint and contained in [a, b].
We now consider the complement of $\bigcup_{j=1}^{N} I_j$ in [a, b], denoted A. It consists
of finitely many closed intervals with m (A) < δ, so $A = \bigcup_{k=1}^{M} [\alpha_k, \beta_k]$.
The idea is to use the absolute continuity on these disjoint intervals. Indeed,
we choose δ sufficiently small (depending only on ǫ) such that m (A) < δ implies
\[
\sum_{k=1}^{M} |F(\beta_k) - F(\alpha_k)| \leq \epsilon .
\]
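Since the intervals $I_i$ and $[\alpha_k, \beta_k]$ together make up [a, b], the triangle inequality gives
\[
|F(b) - F(a)| \leq \sum_{k=1}^{M} |F(\beta_k) - F(\alpha_k)| + \sum_{i=1}^{N} |F(b_{x_i}) - F(a_{x_i})| \leq \epsilon + \epsilon (b - a) .
\]
Since this holds for any ε > 0 we conclude F (b) = F (a) as desired.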
5 Abstract Measure Theory
So far we have successfully dealt with the problem of defining a measure for sets
on Rd . We recall that the main steps of the analysis were
1. an elementary notion of measure for the simplest sets (rectangles or cubes)
2. the introduction of an exterior measure (defined on all subsets of Rd )
which assigned a “measure” as the infimum of countable coverings by
cubes and was consistent with the elementary measure on the rectangles
3. the introduction of the class of Lebesgue measurable sets which satisfied
the desired property of countable additivity
Given the class of Lebesgue measurable sets, we then defined measurable
functions f : Rd → R and developed an integration theory which allowed us to
integrate a much larger class of functions than the class of Riemann integrable
functions.
Our goal in this section is to develop the abstract framework that will allow
us to construct general measure spaces. The above “pedestrian” construction
of the Lebesgue measure on Rd can then be viewed as a particular example of
the abstract construction. More interestingly perhaps, the general construction
allows us to construct many interesting measure spaces which appear in probability
and geometric measure theory.
Definition 5.2.
1. We say (X, M, µ) is finite if µ (X) < ∞.
2. We say (X, M, µ) is σ-finite if there exists a countable collection $(E_i)_{i=1}^{\infty}$
of sets of finite measure (µ (Ei ) < ∞) such that $X = \bigcup_{i=1}^{\infty} E_i$.
3. We say that (X, M, µ) is complete if the following statement is true:
Given any E ∈ M with µ (E) = 0, any F ⊂ E is also in M (and has
necessarily µ (F ) = 0).
We give a couple of examples of general measure spaces:
verify this, i.e. check countable additivity using the properties of the in-
tegral (additivity, MCT) proven earlier.] The choice f = 1 leads to the
familiar Lebesgue measure.
A variant of this example is produced by replacing the σ-algebra M with
the (smaller) σ-algebra of Borel-sets on Rd , denoted M̃. Then (Rd , M̃, µ)
is also a measure space. However, unlike (Rd , M, µ) it is not complete.
To see this recall that there are Borel sets of measure zero which contain
sets which are only Lebesgue- but not Borel measurable.10
3. Let X be a non-empty set and M = P (X) be the σ-algebra of all subsets
of X. Fix x0 ∈ X. We can define the measure
\[
\mu(E) = \begin{cases} 1 & \text{if } x_0 \in E , \\ 0 & \text{otherwise.} \end{cases}
\]
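For a finite set X the point measure above can be checked by brute force. The following Python sketch is only an illustration: the three-point set, the chosen point "b" and the helper names `powerset` and `dirac` are hypothetical. On a finite set countable additivity reduces to additivity over disjoint pairs, which is what the sketch verifies.

```python
from itertools import combinations


def powerset(xs):
    """All subsets of xs, as frozensets."""
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]


def dirac(x0):
    """The measure E -> 1 if x0 lies in E, and 0 otherwise."""
    return lambda E: 1 if x0 in E else 0


X = {"a", "b", "c"}
mu = dirac("b")

# additivity on all pairs of disjoint subsets of X
additive = all(
    mu(E | F) == mu(E) + mu(F)
    for E in powerset(X)
    for F in powerset(X)
    if not (E & F)
)
print(additive, mu(frozenset(X)))
```

Since every subset of X is measurable here, the resulting measure space is finite and complete in the sense of Definition 5.2.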
strictly increasing function g : [0, 1] → [0, 1] whose image was contained in C. Since g is
monotone it is Borel measurable (Exercise 3 on Example Sheet 3), i.e. it pulls back Borel sets
to Borel sets. Taking N a non-measurable subset of [0, 1] we know that F = g (N ) is Lebesgue
measurable with measure zero because it is a subset of C, which has measure 0. However,
F = g (N ) cannot be Borel measurable because if it was, g −1 (F ) = N would have to be a
Borel set.
5.2 Exterior measure and Carathéodory’s theorem
While we already gave a few examples of general measure spaces, a natural
question is how to construct more interesting examples. Here the notion of an
exterior or outer measure is key.
Definition 5.3. Let X be a non-empty set. An exterior measure (or “outer
measure”) µ⋆ on X is a function µ⋆ : P (X) → [0, ∞] defined on all subsets of
X satisfying
1. µ⋆ (∅) = 0.
2. Monotonicity: If E1 ⊂ E2 then µ⋆ (E1 ) ≤ µ⋆ (E2 ).
3. Subadditivity: If E1 , E2 , ... is a countable family of sets, then
\[
\mu_\star \Big( \bigcup_{n=1}^{\infty} E_n \Big) \leq \sum_{n=1}^{\infty} \mu_\star(E_n) .
\]
We will give examples of exterior measures below (see Section ... for the
exterior Hausdorff measure), here we only note that the exterior measure we
defined on sets in Rd in Section 2.3 satisfies the definition.
We have now reached a critical point. The key step to define the Lebesgue
measure from the exterior measure was to give up on measuring all subsets
of Rd and instead define a class of measurable sets on which the measure was
countably additive. However, Definition 2.2 explicitly used the topology of
Rd , i.e. the notion of an open set, which a general measure space does not
come equipped with. Here Carathéodory found a very clever criterion which
works in the general case (and reduces to our old criterion in the Lebesgue case,
cf. Example Sheet 9):
Definition 5.4. A set E ⊂ X is (Carathéodory) measurable if for all sets
A ⊂ X one has
µ⋆ (A) = µ⋆ (E ∩ A) + µ⋆ (E c ∩ A) . (40)
In other words, a measurable set separates any set into two parts which
behave well with respect to the exterior measure. As mentioned, the condition
can be seen to be equivalent to the condition of being Lebesgue measurable in
the case of X = Rd and the exterior measure of Section 2.3 (Example Sheet 9).
Observation 5.1. To show that a set E ⊂ X is measurable, it suffices to check
whether the inequality
µ⋆ (A) ≥ µ⋆ (E ∩ A) + µ⋆ (E c ∩ A)
holds for all A ⊂ X, as the reverse inequality holds by the subadditivity of the
exterior measure.
The observation immediately implies that sets of exterior measure 0 are
measurable: if µ⋆ (E) = 0, then µ⋆ (E ∩ A) ≤ µ⋆ (E) = 0 and µ⋆ (A) ≥ µ⋆ (E c ∩ A), both by monotonicity.
Theorem 5.1. Given an exterior measure µ⋆ on a set X, the collection M of
(Carathéodory) measurable sets forms a σ-algebra. Moreover, µ⋆ restricted to
M is a measure.
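On a finite set, condition (40) can be tested directly for every pair (E, A). The following Python sketch does this for two exterior measures on a three-point set. Both examples (`coarse_outer` and `counting_outer`) and all function names are hypothetical and chosen only to illustrate how much, or how little, can be measurable.

```python
from itertools import combinations


def powerset(xs):
    """All subsets of xs, as frozensets."""
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]


def caratheodory_measurable(X, outer):
    """All E in P(X) with outer(A) == outer(A and E) + outer(A minus E) for every A in P(X)."""
    subsets = powerset(X)
    return [
        E for E in subsets
        if all(outer(A) == outer(A & E) + outer(A - E) for A in subsets)
    ]


def coarse_outer(A):
    """Exterior measure on {0, 1, 2}: 0 on the empty set, 2 on X, 1 on all other sets."""
    if not A:
        return 0
    return 2 if len(A) == 3 else 1


def counting_outer(A):
    """Counting measure, viewed as an exterior measure."""
    return len(A)


X = {0, 1, 2}
M_coarse = caratheodory_measurable(X, coarse_outer)
M_counting = caratheodory_measurable(X, counting_outer)
print(sorted(map(sorted, M_coarse)), len(M_counting))
```

For `coarse_outer` only the empty set and X pass the test, which is still a σ-algebra as Theorem 5.1 asserts. For the counting measure every subset is measurable.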
Proof. In view of the symmetry of the condition (40), we clearly have that
E ∈ M implies E c ∈ M. It is also easily checked that ∅ ∈ M and hence
X ∈ M.
Having shown non-emptyness and closure under complements, we note that
it suffices to show that the class M is closed under disjoint countable unions
and that we have countable additivity on M.11 To establish this, we first show
that M is closed under finite unions and finitely additive on M (Step 1) and
then move to the countable disjoint case (Step 2).
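Step 1: Let E1 , E2 ∈ M and A ⊂ X be arbitrary. We first use that the
condition (40) holds for E1 and E2 to write
\[
\mu_\star(A) = \mu_\star(E_1 \cap A) + \mu_\star(E_1^c \cap A)
= \mu_\star(E_1 \cap E_2 \cap A) + \mu_\star(E_1 \cap E_2^c \cap A) + \mu_\star(E_1^c \cap E_2 \cap A) + \mu_\star(E_1^c \cap E_2^c \cap A) .
\]
The last term can be written as µ⋆ ((E1 ∪ E2 )c ∩ A). For the other three terms
on the right hand side note that
\[
(E_1 \cup E_2) \cap A = (E_1 \cap E_2 \cap A) \cup (E_1 \cap E_2^c \cap A) \cup (E_1^c \cap E_2 \cap A) ,
\]
so that subadditivity yields
\[
\mu_\star(A) \geq \mu_\star((E_1 \cup E_2) \cap A) + \mu_\star((E_1 \cup E_2)^c \cap A) ,
\]
which proves E1 ∪ E2 ∈ M in view of the above observation. Note that this also
implies that E1 ∩ E2 ∈ M since the latter can be written as the complement of
the union of two sets in M. To show that µ⋆ is finitely additive assume that
E1 and E2 are disjoint and observe that
\[
\mu_\star(E_1 \cup E_2) = \mu_\star(E_1 \cap (E_1 \cup E_2)) + \mu_\star(E_1^c \cap (E_1 \cup E_2)) = \mu_\star(E_1) + \mu_\star(E_2) ,
\]
where the first equality follows from the fact that E1 is measurable and the
second equality exploits the assumption that E1 ∩ E2 = ∅.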
and try to deal with the first term on the right hand side. We have
able union (how?) and that closure under countable intersection follows using closure under
complements and countable unions via de Morgan’s laws.
where the second step exploits the disjointness of the Ej . An easy induction on
the formula (43) yields
\[
\mu_\star(G_n \cap A) = \sum_{j=1}^{n} \mu_\star(E_j \cap A) .
\]
Note that $\bigcup_{k=1}^{\infty} E_k \in \mathcal{A}$ in the second item is an assumption unless the union
happens to be finite. Note also that a premeasure is monotone (why?).
Then
1. µ⋆ is an exterior measure on X
2. µ⋆ (E) = µ0 (E) for all E ∈ A
3. All sets in A are (Carathéodory) measurable (i.e. (40) holds)
The above proposition generates an exterior measure µ⋆ from a premeasure
µ0 . We can then apply Carathéodory’s theorem (Theorem 5.1) to construct
from µ⋆ a measure µ on the σ-algebra of Carathéodory measurable sets MC .
Now since by the above Proposition A ⊂ MC , we have that the σ-algebra
generated by A,12 denoted M (A), is contained in MC and hence in particular
µ restricts to a measure on M (A). (Of course MC can be strictly larger than
M (A)!) These considerations therefore establish the following
Theorem 5.2 (Hahn-extension). Let X be a set and µ0 be a premeasure on an
algebra of sets A in X. Denote by M the σ-algebra generated by A. Then there
exists a measure µ on M that extends µ0 .
We make an important remark about the uniqueness. If the premeasure µ0 is
σ-finite (i.e. if X can be written as $X = \bigcup_{i=1}^{\infty} E_i$ for a countable collection (Ei )
of sets in A with µ0 (Ei ) < ∞) then the measure µ whose existence is promised
in the theorem is unique (see Question 4 on Example Sheet 9).
Example 5.2. Combining Theorem 5.2 and Example 5.1 we outline another
construction of the Lebesgue measure on the Borel sets of R (cf. the third example
below Definition 5.2). One starts with the algebra A of intervals in Example 5.1
and defines the premeasure µ0 (I) = |I| on the intervals in A. Since A generates
the Borel σ-algebra on R and since µ0 is σ-finite, Theorem 5.2 generates a
unique measure on the Borel σ-algebra on R. The completion of this measure is
precisely the Lebesgue measure defined on the σ-algebra of Lebesgue measurable
sets. This last step (completion) will be carried out in Question 2 of Sheet 9.
12 Recall this is the smallest σ-algebra containing the sets of A.
5.3.2 The proof of Proposition 5.1
To prove the first part note first that µ⋆ is well-defined since we can choose Ej =
X for all j. We also easily see µ⋆ (∅) = 0 and E1 ⊂ E2 implies µ⋆ (E1 ) ≤ µ⋆ (E2 ).
To establish the subadditivity property we repeat the proof of Proposition 2.1.
We fix ε > 0 and given subsets E1 , E2 , ... of X we choose for each Ei a collection
$(E_{i,j})_j$ in A with $E_i \subset \bigcup_{j=1}^{\infty} E_{i,j}$ and $\mu_\star(E_i) + \frac{\epsilon}{2^i} \geq \sum_{j=1}^{\infty} \mu_0(E_{i,j})$. Then
$(E_{i,j})_{i,j}$ is a countable collection of sets in A which covers $\bigcup_i E_i$ and hence
$\mu_\star(\bigcup_i E_i) \leq \sum_{i,j} \mu_0(E_{i,j}) \leq \sum_i \mu_\star(E_i) + \epsilon$. Since this holds for any ε > 0 we
are done.
To prove the second part (restriction of µ⋆ to A coincides with µ0 ) we suppose
E ∈ A. Clearly µ⋆ (E) ≤ µ0 (E) since E covers itself. To prove the reverse
inequality we let $E \subset \bigcup_{j=1}^{\infty} E_j$ with Ej ∈ A for all j be any covering of E. We
then define the sets
\[
E_k' = E \cap \Big( E_k \setminus \bigcup_{j=1}^{k-1} E_j \Big)
\]
and note that the Ek′ are disjoint elements of A, that Ek′ ⊂ Ek and that
$E = \bigcup_{k=1}^{\infty} E_k'$ (check this!). By the countable additivity of the premeasure we then
have
\[
\mu_0(E) = \sum_{k=1}^{\infty} \mu_0(E_k') \leq \sum_{k=1}^{\infty} \mu_0(E_k)
\]
and taking the infimum over all coverings of E by (Ek ) in A yields the claim as
this turns the right hand side into µ⋆ (E).
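The disjointification used in the second part is easy to try out on finite sets. In the Python sketch below the counting measure plays the role of the premeasure µ0. This choice, the sample sets and the name `disjointify` are assumptions made only for illustration.

```python
def disjointify(E, cover):
    """Return the pieces E'_k = E intersected with (E_k minus (E_1 u ... u E_{k-1}))."""
    seen = set()        # union of the cover sets processed so far
    pieces = []
    for Ek in cover:
        pieces.append((E & Ek) - seen)
        seen |= Ek
    return pieces


E = set(range(6))
cover = [{0, 1, 2, 7}, {2, 3, 4}, {4, 5, 6}, {0, 5}]
pieces = disjointify(E, cover)

# With the counting measure as mu_0:
#   mu_0(E) = sum of mu_0(E'_k) <= sum of mu_0(E_k)
lhs = len(E)
middle = sum(len(p) for p in pieces)
rhs = sum(len(Ek) for Ek in cover)
print(pieces, lhs, middle, rhs)
```

The pieces are pairwise disjoint, each lies inside the corresponding cover set, and together they give back E, which is exactly what the "check this!" in the proof asks for.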
To prove the third part (all sets in A are measurable for µ⋆ ) we let A be an
arbitrary subset of X, E ∈ A and ǫ > 0. It suffices to show
ǫ + µ⋆ (A) ≥ µ⋆ (E ∩ A) + µ⋆ (E c ∩ A) . (45)
To prove this, we find a countable collection E1 , E2 , ... in A with $A \subset \bigcup_{j=1}^{\infty} E_j$
and
\[
\sum_{j=1}^{\infty} \mu_0(E_j) \leq \mu_\star(A) + \epsilon . \tag{46}
\]
Taking the limit n → ∞ (note that all terms are increasing in n) we finally find
\[
\sum_{j=1}^{\infty} \mu_0(E_j) = \sum_{j=1}^{\infty} \mu_0(E \cap E_j) + \sum_{j=1}^{\infty} \mu_0(E^c \cap E_j) \geq \mu_\star(E \cap A) + \mu_\star(E^c \cap A) ,
\]
with the last inequality following since $\bigcup_{j=1}^{\infty} (E \cap E_j)$ is a countable union of sets
in A which covers E ∩ A. Combining this with (46) yields (45) as desired.
5.4 A further example: Hausdorff measure
In this section we present an application of Carathéodory’s construction to con-
struct the α-dimensional Hausdorff measure for sets in Rd . The discussion will
be very informal and should merely illustrate that the abstract construction
that we went through has interesting applications.
The heuristic idea for Hausdorff measure is to measure the α-dimensional
volume of sets in Rd for α < d. For instance a sphere in R3 should have non-trivial
2-dimensional Hausdorff-measure (namely its area) while its Lebesgue
measure is of course zero. Similarly an interval of length 2 on the x-axis in R3
should have 1-dimensional Hausdorff-measure equal to 2 etc.
The key idea to construct a measure with these properties lies in the scaling
properties of a set. Given a subset E ⊂ Rd , suppose that scaling the set E by
n can be written as adjoining m almost disjoint copies of the original set, i.e.
nE = E1 ∪ E2 ∪ ... ∪ Em
where the Ei are disjoint congruent copies of E. For instance, if you scale the
unit interval in R3 on the x-axis by n the resulting set is
\[
[0, n] \times \{0\} \times \{0\} = \bigcup_{j=1}^{n} [j-1, j] \times \{0\} \times \{0\}
\]
so the above holds with m = n. The same example with a rectangle yields
$m = n^2$. It is intuitively clear that the exponent α in the relation $m = n^\alpha$ is what
we would call the dimension of the set under consideration.
For a less trivial example, consider the Cantor set. It is easy to see
that scaling the Cantor set C by a factor of 3, we obtain two disjoint copies of
the Cantor set, so in this case we have $2 = 3^\alpha$ and it would be tempting to say
that the Cantor set has fractional dimension $\frac{\log 2}{\log 3}$.
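The heuristic relation m = n^α can be solved for α numerically. The short Python sketch below does this for the three examples just discussed. The function name `scaling_dimension` is hypothetical, and the computation only illustrates the heuristic, not the Hausdorff measure itself.

```python
import math


def scaling_dimension(n: float, m: float) -> float:
    """Solve m = n**alpha for alpha, where scaling by n produces m copies."""
    return math.log(m) / math.log(n)


interval = scaling_dimension(5, 5)     # scaling an interval by 5 gives 5 copies
rectangle = scaling_dimension(4, 16)   # scaling a rectangle by 4 gives 16 copies
cantor = scaling_dimension(3, 2)       # scaling the Cantor set by 3 gives 2 copies
print(interval, rectangle, cantor)
```

The first two values recover the dimensions 1 and 2, while the Cantor set yields a value strictly between 0 and 1.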
We now give the definition which formalises the above considerations. For
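any E ⊂ Rd we define the exterior α-dimensional Hausdorff-measure of E as
\[
m^\star_\alpha(E) = \lim_{\delta \to 0} H^\delta_\alpha(E) ,
\]
where
\[
H^\delta_\alpha(E) = \inf \Big\{ \sum_k (\operatorname{diam} F_k)^\alpha \;\Big|\; E \subset \bigcup_{k=1}^{\infty} F_k , \ \operatorname{diam} F_k \leq \delta \text{ for all } k \Big\} .
\]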
Here the diameter of a set A is defined as diamA = sup{|x − y|, x, y ∈ A}. Note
that $H^\delta_\alpha(E)$ is well defined because countably many balls of diameter δ cover
all of Rd . Note also that as δ decreases, $H^\delta_\alpha(E)$ increases because we are taking
the infimum over fewer sets (the elements Fk in the covering are restricted to
be smaller in diameter). Hence the limit is actually defined.
We remark that the coverings by Fk in the definition cannot be replaced
by coverings by balls of diameter smaller than δ. This would yield a different
quantity. This makes the Hausdorff-measure of a set hard to compute in general.
One can check that m⋆α is monotone and sub-additive and hence indeed
an exterior measure. It moreover satisfies that if the distance of two sets E1
and E2 is strictly positive then we have m⋆α (E1 ∪ E2 ) = m⋆α (E1 ) + m⋆α (E2 ),
i.e. additivity (this makes m⋆α a so-called metric exterior measure). The first
two facts allow us to apply Carathéodory’s theorem (Theorem 5.1) to construct
from m⋆α a measure mα on the σ-algebra of Carathéodory measurable sets. The
third fact allows one to prove that the σ-algebra of Carathéodory measurable
sets contains the closed subsets and hence in particular the σ-algebra of Borel
sets. The measure mα restricted to the Borel sets is commonly known as the
α-dimensional Hausdorff measure on Rd . More on this in Stein-Shakarchi.
The approximation theorems of Section 2.8 continue to hold true. This actually
needs the σ-finite condition.13
Egorov’s theorem remains true (check this!).
The integral can be defined via the same four stage procedure that we carried
out for the Lebesgue integral leading to
\[
\int_X f(x) \, d\mu(x)
\]
be defined on a set of finite measure (a cube in the Lebesgue case) which in the limit exhausts
Rd .
5.6 Construction of product measures
We finally discuss product measures. The idea is the following. Given two
measure spaces (X, M, µ) and (Y, N , ν) we would like to construct a σ-algebra
“M ⊗ N ” of subsets of the Cartesian product X × Y and a product measure
“µ × ν” on M ⊗ N .
Why could such a thing be useful? On the one hand, the construction below
will provide another way to construct the Lebesgue measure on R2 = R × R
(and more generally, Rn ) from the Lebesgue measure on R.14 On the other
hand, think of an application in probability: Given a measure space X = {h, t}
representing a head-tail-experiment with a measure on P (X) determined by
µ (h) = µ (t) = 1/2, we would like to consider n experiments (or perhaps even
infinitely many), i.e. the space X × X × ... × X equipped with a corresponding
product measure.
Given the setting in the first paragraph, we consider the algebra M ⊠ N of
finite disjoint unions of rectangles M × N ⊂ X × Y with M ∈ M and N ∈ N .
Exercise 5.2. Check that M ⊠ N is indeed an algebra.
Hint: Use Problem 3 of Sheet 9.
We let M ⊗ N denote the σ-algebra generated by M ⊠ N . A particular
example is given by the Borel-algebras M = BR and N = BR for which M⊗N =
BR2 (Exercise).
We now define µ0 : M ⊠ N → [0, ∞] by
\[
\mu_0 \Big( \bigcup_{j=1}^{N} (M_j \times N_j) \Big) = \sum_{j=1}^{N} \mu(M_j)\, \nu(N_j)
\]
Note that this statement implies that if a finite disjoint union of rectangles is
a countable union of disjoint rectangles, then additivity holds (which is what
we actually need to prove as any element of M ⊠ N is a finite disjoint union of
rectangles).
To prove (47) we first note that
\[
\chi_M(x)\, \chi_N(y) = \chi_{M \times N}(x, y) = \sum_{j=1}^{\infty} \chi_{M_j \times N_j}(x, y) = \sum_{j=1}^{\infty} \chi_{M_j}(x)\, \chi_{N_j}(y)
\]
14 Recall that Example 5.2 provided an outline for abstractly constructing Lebesgue measure
and then integrate with respect to x to obtain – using the MCT – the identity
\[
\mu(M)\, \chi_N(y) = \sum_{j=1}^{\infty} \mu(M_j)\, \chi_{N_j}(y) .
\]
Integrating again, this time in y and using once more the MCT we obtain the
desired (47).
Given Lemma 5.1, we can apply the Hahn extension theorem, Theorem 5.2
above to obtain a (unique if µ and ν are both σ-finite – why?) product measure
on M ⊗ N which extends µ0 , which we denote by µ × ν. In the case of Lebesgue
measure on M = N = BR one checks that the product is the Lebesgue measure
on BR2 that we defined via rectangles.
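For the head-tail experiment mentioned at the beginning of this section everything is finite and the product measure can be written down by hand. The Python sketch below assumes that on a finite space the product measure is obtained by multiplying the weights of the individual outcomes. The helper names are hypothetical. It checks the defining property on rectangles and the additivity over a decomposition of a rectangle into disjoint rectangles.

```python
from fractions import Fraction
from itertools import combinations, product

X = ("h", "t")
mu = {"h": Fraction(1, 2), "t": Fraction(1, 2)}   # fair coin on P(X)


def subsets(xs):
    """All subsets of xs, as Python sets."""
    return [set(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]


def mu_of(M):
    """mu(M) for a subset M of X."""
    return sum((mu[x] for x in M), Fraction(0))


def product_measure(E):
    """(mu x mu)(E) for a set E of pairs of outcomes."""
    return sum((mu[x] * mu[y] for (x, y) in E), Fraction(0))


def rectangle(M, N):
    """The rectangle M x N as a set of pairs."""
    return set(product(M, N))


# (mu x mu)(M x N) = mu(M) mu(N) on every rectangle
rectangles_ok = all(
    product_measure(rectangle(M, N)) == mu_of(M) * mu_of(N)
    for M in subsets(X)
    for N in subsets(X)
)

# X x X written as a disjoint union of three rectangles
whole = rectangle(set(X), set(X))
parts = [rectangle({"h"}, {"h", "t"}), rectangle({"t"}, {"h"}), rectangle({"t"}, {"t"})]
total = sum((product_measure(P) for P in parts), Fraction(0))
print(rectangles_ok, total)
```

Exact fractions are used instead of floating point numbers so that the additivity check is an exact equality.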
between the iterated integrals and the integral with respect to the product
measure holds (and whether the left hand side actually makes sense). This will
be the general Fubini theorem for product measures.
To state it, we make the familiar (from the Lebesgue case, cf. Section 3.8.1)
definitions of slices (also called sections): We define
• For E ⊂ X × Y a subset we define the slices/ sections of E
1. (Tonelli, i.e. assuming f measurable non-negative)
If f : X × Y → [0, ∞] is measurable, then the functions
\[
g(x) = \int_Y f_x \, d\nu \quad \text{and} \quad h(y) = \int_X f^y \, d\mu \tag{48}
\]
holds.
2. (Fubini, i.e. assuming f integrable)
If f : X × Y → R is in L1 (X × Y, µ × ν), then fx ∈ L1 (Y, ν) for a.e. x and
f y ∈ L1 (X, µ) for a.e. y. Moreover, the functions (48) are in L1 (X, µ)
and L1 (Y, ν) respectively and the formula (49) holds.
We won’t prove the general Fubini-Tonelli theorem here since we already
went through the proof in the Lebesgue case. You can however easily deduce
the second statement from the first.
A nice application of the general Fubini-Tonelli theorem is given in Question
6 on Example sheet 9.
6 The change of variables formula
In this section we prove the famous change of variables formula. The proof
exhibits nicely many of the measure theoretic tools that we developed.
Theorem 6.1. Let U ⊂ Rn be open, ϕ : U → V be a C 1 diffeomorphism15 with
an open set V ⊂ Rn . Then
1. f : V → R is integrable if and only if the function (f ◦ ϕ) | det Dϕ| : U → R
is integrable
2. The following change of variables formula holds:
\[
\int_{V = \varphi(U)} f(y) \, dy = \int_U (f \circ \varphi)(x)\, |\det D\varphi(x)| \, dx . \tag{50}
\]
over R2 .16 One way to compute the integral is to go to polar coordinates (x1 , x2 )
(which you should think of as x1 = r and x2 = φ)
Now ϕ maps (0, ∞) × [0, 2π] → R2 \ {0} but not diffeomorphically. However, ϕ
is clearly a diffeomorphism of two open subsets of R2 when restricted to a map
U := (0, ∞) × (0, 2π) → V := R2 \ {(y1 ≥ 0, y2 = 0)}. Its differential is
\[
D\varphi(x_1, x_2) = \begin{pmatrix} \cos x_2 & -x_1 \sin x_2 \\ \sin x_2 & x_1 \cos x_2 \end{pmatrix} ,
\]
which has Jacobi determinant det Dϕ (x1 , x2 ) = x1 > 0 on the domain considered.
Note that V = ϕ (U) differs from R2 by a set of measure zero, so the left hand
side of (50) is precisely the integral we want to compute, namely
\[
\int_{\mathbb{R}^2} e^{-(y_1)^2 - (y_2)^2} \, dy_1 dy_2 = \int_V e^{-(y_1)^2 - (y_2)^2} \, dy_1 dy_2 . \tag{51}
\]
For the right hand side of (50) we note that $(f \circ \varphi)(x_1, x_2) = e^{-(x_1)^2}$ and hence
\[
\int_U e^{-(x_1)^2} x_1 \, dx_1 dx_2 = \int_0^{\infty} dx_1 \int_0^{2\pi} e^{-(x_1)^2} x_1 \, dx_2 = \pi ,
\]
with the first step following from Fubini and the last step from a simple integration
by substitution. We conclude that the desired integral (51) has the value
π and hence, as a corollary, that $\int_{-\infty}^{\infty} dy \, e^{-y^2} = \sqrt{\pi}$.
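The value π can be confirmed numerically. The Python sketch below approximates both the polar integral and the one-dimensional Gaussian integral with a midpoint rule. The cut-off radius R = 10, the number of subintervals and the helper name `midpoint` are arbitrary choices made for this illustration.

```python
import math


def midpoint(f, a, b, n=20000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def radial_integrand(r):
    """exp(-r^2) * r, integrated exactly over the angle, giving the factor 2*pi."""
    return 2.0 * math.pi * math.exp(-r * r) * r


def gaussian(y):
    return math.exp(-y * y)


R = 10.0  # the integrands are negligible beyond this radius
polar_value = midpoint(radial_integrand, 0.0, R)
one_dim_value = midpoint(gaussian, -R, R)
print(polar_value, one_dim_value ** 2, math.pi)
```

Both `polar_value` and the square of `one_dim_value` should agree with π up to the discretisation error, in line with (51) and the footnote on Fubini.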
15 Recall that this means that ϕ is C 1 , bijective and that its inverse is also a C 1 map.
16 By Fubini we know that the result is $\big( \int_{-\infty}^{\infty} e^{-y^2} \, dy \big)^2$ and hence in particular that f is
integrable but it is not immediate how to compute the one-dimensional integral!
6.2 A reformulation of Theorem 6.1
We next give an equivalent formulation of Theorem 6.1:
Theorem 6.2. Let U ⊂ Rn be open, ϕ : U → V be a C 1 diffeomorphism with
an open set V ⊂ Rn . Then we have for any measurable set A in U
\[
\mu_n(\varphi(A)) = \int_A |\det D\varphi(x)| \, d\mu_n \tag{52}
\]
\[
A \mapsto \mu(\varphi(A)) \tag{53}
\]
and
\[
A \mapsto \int_A |\det D\varphi(x)| \, dx \tag{54}
\]
To see this, assume the local statement was proven. Cover U with such
open neighbourhoods Wx in which the identity holds. Now because Rn has a
countable basis of its topology we can select a countable subcover (Wi ) (why?).
Given this countable subcover $U \subset \bigcup_{i=1}^{\infty} W_i$, let A be an arbitrary measurable
set in U. Define $\tilde{W}_i = A \cap \big( W_i \setminus \bigcup_{j=1}^{i-1} W_j \big)$. Then the W̃i are pairwise disjoint
and their union is A. Since the identity of measures (52) holds for any W̃i and
is countably additive, it holds for A itself.
hence verifying $\mu(\varphi(A)) = \int_A |\varphi'(x)| \, dx$ when A is an interval. This holds for
any finite interval (not necessarily closed) and by countable additivity of the
identity (52), the latter also holds for intervals of infinite length. Finite disjoint
unions of intervals in U form an algebra A of sets in U. The two measures (53)
and (54) agree on A hence define a premeasure on A which is moreover σ-finite
(since one can write U as a countable disjoint union of intervals of finite length).
Therefore, since both (53) and (54) extend the same σ-finite premeasure they
must agree on the extension (cf. Q4 of Sheet 9).
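The one-dimensional identity for intervals is easy to test numerically. The Python sketch below uses the decreasing diffeomorphism ϕ(x) = e^{-x} on the interval (0, 2), which shows why the absolute value of ϕ′ is needed. The choice of ϕ, the interval and the helper name `midpoint` are assumptions made only for illustration.

```python
import math


def midpoint(f, a, b, n=20000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def phi(x):
    return math.exp(-x)


def dphi(x):
    return -math.exp(-x)   # derivative of phi, negative everywhere


a, b = 0.0, 2.0
# phi is monotone, so phi((a, b)) is the interval between phi(b) and phi(a)
image_length = abs(phi(b) - phi(a))
integral_of_abs_derivative = midpoint(lambda x: abs(dphi(x)), a, b)
print(image_length, integral_of_abs_derivative)
```

Without the absolute value the integral would come out as the negative of the length of the image.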
where the first step follows from Theorem 6.2 holding for ρ, the second from
Theorem 6.2 (hence Theorem 6.1) holding for ψ and the third from the chain
rule and the properties of the determinant.
some i ∈ {1, 2, ..., n} (in view of the Jacobian $\frac{\partial \varphi_i}{\partial x_j}$ having full rank) we can permute
the coordinates xi to achieve $\frac{\partial \varphi_1}{\partial x_1}(p) \neq 0$ (and use Steps 2+4).
We next claim that wlog we can even assume that ϕ keeps the first coordinate
fixed, i.e. that ϕ has the form ϕ : (t, x) = (t, ϕt (x)). (Note that with this the
map ϕt : Ut := U ∩ {x1 = t} → {t} × Rn−1 is again a diffeomorphism in view of
\[
D\varphi(t, x) = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ \ast & & & \\ \vdots & & D\varphi_t & \\ \ast & & & \end{pmatrix}
\]
and det Dϕ = det Dϕt .) To verify the claim, suppose one has established (52)
for such ϕ. Then, by Step 2 one has also shown it for any ϕ which keeps one
of the coordinates (not necessarily the first) fixed. Moreover, one can write
a general ϕ near p as the composition of two diffeomorphisms each of which
fixes at least one coordinate: Indeed, given general ϕ with $\frac{\partial \varphi_1}{\partial x_1}(p) \neq 0$, let
[Diagram: ϕ : U → V]
We have $\varphi = \rho^{-1} \circ \psi$ and since both ρ and ψ fix at least one coordinate, Steps
2 and 4 imply (52) for general ϕ.
We finally prove the result for ϕ of the form ϕ : (t, x) = (t, ϕt (x)) using the
induction assumption: First, by Fubini, we have
\[
\mu_n(\varphi(A)) = \int_{\mathbb{R}^n} \chi_{\varphi(A)} \, dt \, dy_2 \ldots dy_n = \int_{\mathbb{R}} dt \, \mu_{n-1}((\varphi(A))_t) \tag{56}
\]
and then use the induction assumption and Fubini again (together with | det Dϕ| =
| det Dϕt |) to conclude
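\[
\begin{aligned}
\mu_n(\varphi(A)) &= \int_{\mathbb{R}} dt \, \mu_{n-1}(\varphi_t(A_t)) = \int_{\mathbb{R}} dt \int_{A_t} |\det D\varphi_t| \, d\mu_{n-1} \\
&= \int_{\mathbb{R}^n} dt \, \chi_{A_t} |\det D\varphi_t| \, dx_2 \ldots dx_n = \int_A |\det D\varphi| \, d\mu_n .
\end{aligned}
\tag{57}
\]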
7 Mastery Material: Lp -spaces
The material in this section is relevant only for the Mastery Question. The main
points here are the Hölder and the Minkowski inequality, which you should know
and be able to apply.
I leave some gaps in the proofs below which you should fill in on your own.
If you need help, a good reference is provided by the first three pages of Section
6 in Folland’s book (cf. Section 1.4).
7.1 Definition
Let (X, M, µ) be a measure space and f a (say real-valued) measurable function
on X. For 1 ≤ p < ∞ we define
\[
\|f\|_{L^p} := \Big( \int |f|^p \, d\mu(x) \Big)^{1/p}
\]
and
\[
L^p(X, \mathcal{M}, \mu) = \{ f : X \to \mathbb{R} \mid f \text{ is measurable and } \|f\|_{L^p} < \infty \} ,
\]
the space of measurable functions whose Lp -norm is finite. We sometimes write
Lp (X, µ), or simply Lp (X), or even just Lp for Lp (X, M, µ) to simplify the
notation provided no confusion arises. If one identifies two functions which are
equal almost everywhere, the space Lp (X) can be shown to be a complete vector
space by adapting the proof we gave for L1 (Rn ) in Section 3.6, Theorem 3.5.
In the following we fix a σ-finite measure space (X, M, µ) and write Lp for
Lp (X, M, µ) below.
Step 3. Set $A = |f(x)|^p$, $B = |g(x)|^q$ and $\theta = \frac{1}{p}$, apply the inequality from
Step 1 and integrate it to obtain $\|fg\|_{L^1} \leq 1$ as desired.
7.3 Minkowski’s inequality
Theorem 7.2. Let 1 ≤ p < ∞ and f, g ∈ Lp . Then f + g ∈ Lp with the
inequality
\[
\|f + g\|_{L^p} \leq \|f\|_{L^p} + \|g\|_{L^p} .
\]
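Both inequalities of this section can be tried out on a finite measure space, where the integral is a weighted sum. In the Python sketch below the four-point space, its weights and the functions f and g are made-up sample data, and the helper names are hypothetical. The Hölder check uses the conjugate exponent q with 1/p + 1/q = 1.

```python
def lp_norm(values, weights, p):
    """(sum over x of |f(x)|^p * mu({x}))^(1/p) on a finite measure space."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1.0 / p)


weights = [0.5, 1.0, 2.0, 0.25]   # mu({x}) for the four points of X
f = [1.0, -2.0, 3.0, 0.5]
g = [-1.0, 4.0, 0.25, 2.0]


def minkowski_gap(p):
    """||f||_p + ||g||_p - ||f + g||_p, which Minkowski's inequality says is >= 0."""
    f_plus_g = [u + v for u, v in zip(f, g)]
    return lp_norm(f, weights, p) + lp_norm(g, weights, p) - lp_norm(f_plus_g, weights, p)


def hoelder_gap(p):
    """||f||_p * ||g||_q - ||f g||_1 for p > 1, which Hoelder's inequality says is >= 0."""
    q = p / (p - 1.0)
    fg = [u * v for u, v in zip(f, g)]
    return lp_norm(f, weights, p) * lp_norm(g, weights, q) - lp_norm(fg, weights, 1)


for p in (1.5, 2, 3, 7):
    print(p, minkowski_gap(p), hoelder_gap(p))
```

A numerical check of this kind is of course no proof, but it is a quick way to catch a wrong exponent when applying the inequalities.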
Proof. The case p = 1 is easy (why?) so we let p > 1. To verify that f + g ∈ Lp
we first note that (why?)