
Some results from measure theory and

measure-theoretic probability

Bent Nielsen

University of Oxford

Michaelmas 2024
Part 1

Comments welcome — do let me know if you spot any mistakes

Revised version of slides written by Anders Kock 2022

1/89
Introduction

The present slide set contains some basic definitions and
results from measure theory and measure-theoretic probability.
Since we are going to use many of these results repeatedly, I
gather them here so that we can refer to them later.
We are not going to prove many of the results in this slide set,
and hence I do not expect you to be able to do so.
See the slides for the Advanced Mathematics course in TT on the
website for proofs of (almost all) results presented in the
present slide deck. Those slides also contain further results.
Rather, the expectation is that we can apply the results and
that you have a place to look up what you need.
Some textbooks that provide a (much) more
detailed/complete treatment are mentioned next.

2/89
Some textbooks
Some examples of textbooks on measure-theoretic probability:
Resnick (2019) and Shiryaev (1996): Many examples, focus
on probability, good for self-study.
Kolmogorov and Fomin (1970) and Dudley (2018):
Comprehensive books on real analysis and probability.
The books Lehmann and Casella (1998) and Lehmann and
Romano (2005) also contain crash-course style material like
the one in these slides with particular emphasis on
applications in statistics and econometrics.
Liese and Miescke (2008): The appendix has a collection of
results. We shall frequently use the notation and presentation
style from this book.
The recent book Axler (2021) has a lot of intuition and many
examples of why the theory is the way it is [why σ-algebras,
why Lebesgue integration and not Riemann?]. It may be
interesting for self-study. Available for free on Axler’s website.
3/89
Why the measure-theoretic approach?

The measure-theoretic approach allows us to treat discrete
distributions, continuous distributions, and distributions that
are neither in a unified way.
Sequences of measurable functions (and random variables) are
closed under pointwise limits.
This is in contrast to continuous functions underlying much of
Riemann-integration.
We can interchange pointwise limits and integration under
rather general conditions. This is again in contrast to
Riemann-integration.
You will find it used in many papers (also outside
econometrics), so it is useful not to be “scared” when
encountering σ-algebras, Lebesgue measures, the Lebesgue
integral and terminology like “almost everywhere” and
“almost surely”!
4/89
Plan

Introduce
1 σ-algebras.
2 Measures and their properties.
3 Measurable functions.
4 The Lebesgue integral.
Linearity, monotonicity, dealing with null sets.
The Monotone and Dominated Convergence Theorems.
Markov, Hölder, Jensen, Minkowski inequalities.
Induced measures and the substitution rule.
Tonelli’s Theorem.
5 Densities and conditioning.
The primary examples will be probability spaces and random
variables.

5/89
σ-algebras

Sets.
Definition of σ-algebras.
Generated σ-algebras.
Borel sets on R.

6/89
Sets¹

Basic definitions.
A = {a, b, c, . . . } is a set with elements a, b, c, . . . .
A ⊂ B, that is A is a subset of B, if every element in A
belongs to B.
A = B ⇔ A ⊂ B and B ⊂ A.
∅ is the empty set, which contains no elements.
Operations on sets.
A ∪ B and A ∩ B are the union / intersection of A and B,
consisting of all elements that are in A or/and B.
∪α Aα and ∩α Aα are unions and intersections. The index sets
can be finite or infinite in the countable or uncountable sense.
A, B are disjoint if A ∩ B = ∅.
Consider a subset A of a basic set R. Then, the complement
Ac is the set of elements of R that are not in A.
¹ Kolmogorov and Fomin (1970, p. 1–4) 7/89
σ-algebra

Consider a non-empty set X .


Definition 1 (σ-algebra)

A collection of subsets A of X is called a σ-algebra in X if the


following conditions are met:
(σ1) X ∈ A.
(σ2) For all A ∈ A it also holds that Ac ∈ A.
(σ3) For any sequence of sets (An )n∈N in A, it holds that ∪n∈N An ∈ A.

The sets in A are called the measurable sets.


The terminology “σ-algebra” comes from the fact that a
countable number of operations take place in (σ3).
The term σ-field is also frequently used instead of σ-algebra.

8/89
Examples of σ-algebras

The systems

{∅, X } , P(X ) = {A : A ⊆ X } and {∅, A, Ac , X }

are all σ-algebras.

{∅, X } and P(X ) are the smallest and largest σ-algebras in X ,


respectively: Every other σ-algebra must contain the elements
of {∅, X }; and the elements of every other σ-algebra are contained
in P(X ).

9/89
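The axioms (σ1)-(σ3) can be checked mechanically on a finite set, where closure under countable unions reduces to closure under pairwise unions. A minimal sketch in Python (illustrative only, not part of the course material):

```python
from itertools import chain, combinations

# Illustrative sketch: on a finite X, closure under countable unions reduces
# to closure under pairwise unions, so the axioms can be checked directly.
def is_sigma_algebra(X, F):
    F = {frozenset(A) for A in F}
    if frozenset(X) not in F:                        # (sigma-1)
        return False
    if any(frozenset(X) - A not in F for A in F):    # (sigma-2): complements
        return False
    return all(A | B in F for A in F for B in F)     # (sigma-3): unions

X = {1, 2, 3}
A = {1}
trivial = [set(), X]
four = [set(), A, X - A, X]
power = [set(s) for s in chain.from_iterable(combinations(X, r) for r in range(4))]

print(is_sigma_algebra(X, trivial), is_sigma_algebra(X, four), is_sigma_algebra(X, power))
print(is_sigma_algebra(X, [set(), {1}, X]))   # False: not closed under complements
```

The three systems from the slide pass; dropping the complement {2, 3} from the third system breaks axiom (σ2).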
Some further stability properties of σ-algebras

The following lemma tells us that within a σ-algebra A one can


work freely with the usual set operations without “falling out” out
of A as long as countably many set operations are involved.
Lemma 2 (Some further stability properties of σ-algebras)
If A is a σ-algebra in X , then
1 ∅ ∈ A.
2 If A1 , . . . , AN is a finite collection of sets in A,
then A1 ∪ · · · ∪ AN ∈ A.
3 If A, B ∈ A, then A ∩ B ∈ A.
4 If A, B ∈ A, then A \ B ∈ A.
5 If (An )n∈N is a sequence of sets in A, then ∩n∈N An ∈ A.

(1) and (2) are trivial. (3) follows from A ∩ B = (Ac ∪ B c )c ∈ A.


(4) follows from A \ B = A ∩ B c and (3).
(5) follows from ∩n∈N An = (∪n∈N Acn )c ∈ A.
10/89
Intersection of σ-algebras

You will show the following, which plays an important role in


asserting the existence of “generated” σ-algebras, in the exercises.
Theorem 3 (Intersection of σ-algebras)

Let (Ai )i∈I be an (arbitrary) family of σ-algebras in X . Then also


the system
∩i∈I Ai = {A ⊆ X : A ∈ Ai for all i ∈ I }

is a σ-algebra in X .

Note, “an arbitrary family” can have a finite, countable or
uncountable index set I .

11/89
Generated σ-algebras

Let a, b ∈ R such that a < b.


Because ∩n∈N (a − 1/n, b + 1/n) = [a, b], the family of open
intervals Dopen in R does not form a σ-algebra.
Because ∪n∈N [a + 1/n, b − 1/n] = (a, b), the family of closed
intervals Dclosed in R does not form a σ-algebra.

Given this observation, one can ask which subsets of R we must


add to a family of sets D in order to obtain a σ-algebra.
More generally, for any family of sets D in a set X , one can ask if
there exists a (smallest) σ-algebra containing it.
The answer turns out to be yes!

12/89
Generated σ-algebra and Borel σ-algebra

Let D be a family of subsets in X .


It is not difficult to see that the smallest σ-algebra
containing D is the intersection of all σ-algebras containing D
as in Theorem 3 [observe that D is always contained in P(X )].
This σ-algebra is denoted the σ-algebra generated by D and
written σ(D).

Definition 4 (Borel σ-algebra in Rd )

The Borel σ-algebra in Rd is the σ-algebra generated by the


system of closed half-planes. It is denoted by B(Rd ), that is

B(Rd ) = σ({(−∞, b1 ] × · · · × (−∞, bd ] : b1 , . . . , bd ∈ R}).

The elements of B(Rd ) are called Borel sets.


Definition 4 is common in probability texts. In mathematics, this definition
would be an example of a Borel σ-algebra.
13/89
Borel sets on R

Let a ≤ b.
Examples of Borel sets:
Half-open intervals: (a, b] = (−∞, b]\(−∞, a].
Open half-lines: (−∞, b) = ∪n∈N (−∞, b − 1/n].
Closed intervals: [a, b] = (−∞, b]\(−∞, a).
Open intervals: (a, b) = (−∞, b)\(−∞, a].
The above sets could also be used as a basis for generating the Borel σ-algebra.
Further examples of Borel sets:
Points: {a} = [a, a].

One can construct a (complicated) set that is not Borel.


The construction requires the axiom of choice.
(Kolmogorov and Fomin, 1970, Section 3.7, Problem 26.7).

14/89
Measures and their properties

Measures.
Examples.
Properties.

15/89
Measures

The next topic is measures.


Measures give the “size” of sets.
Measures are defined on measurable spaces, which we define
next.

Definition 5 (Measurable space)


A measurable space is a pair (X , A), where X is a non-empty set
and A is a σ-algebra in X .

Example: (Rd , B(Rd )) is a measurable space.

16/89
Definition 6
Let (X , A) be a measurable space. A measure µ on (X , A) is a
mapping µ : A → [0, ∞] satisfying the following two conditions.
1 µ(∅) = 0.
2 µ is countably additive, that is, for every sequence (An )n∈N of
pairwise disjoint sets in A it holds that

µ(∪n∈N An ) = Σn∈N µ(An ). (1)

If µ is a measure on (X , A), the triple (X , A, µ) is called a


measure space.

Note that the left-hand side in (1) is well-defined
since ∪n∈N An ∈ A (by σ3) and the right-hand side is
well-defined as it is a sum of non-negative terms.

17/89
Some terminology

If µ(X ) = 1, then we say that µ is a probability measure.


Thus, probability measures are just special measures. In this
case, we often write P instead of µ.
If µ(X ) < ∞ we say that µ is finite.
If there exists a sequence (An )n∈N in A such that µ(An ) < ∞
for all n ∈ N and ∪n∈N An = X , then we say that µ is σ-finite.

18/89
Examples of measures

P = N(0, 1) is a probability measure on (R, B(R)).


The Lebesgue measure λd on (Rd , B(Rd )) is the measure that
assigns “volumes to cubes” in the sense of

λd ([a1 , b1 ] × · · · × [ad , bd ]) = (b1 − a1 ) · · · (bd − ad )

for all a1 , b1 , . . . , ad , bd ∈ R with ai < bi , i = 1, . . . , d. The
Lebesgue measure is σ-finite since ∪n∈N [−n, n]d = Rd
and λd ([−n, n]d ) = (2n)d < ∞.

Consider the measurable space (X , A). For an element a ∈ X ,
we define the Dirac measure δa as

δa (A) = 0 if a ∈ Ac and δa (A) = 1 if a ∈ A.

19/89
Counting measure

Counting measure: Consider the measurable space (X , A). The


counting measure τ is then defined as

τ (A) = number of elements in A.

That is, τ (A) < ∞ if and only if A has finitely many elements.

20/89
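Both the Dirac and the counting measure are easy to realise on finite sets; the following Python sketch (illustrative only) checks additivity on a disjoint pair of sets:

```python
# Illustrative sketch: the Dirac and counting measures on finite sets,
# with additivity checked on a disjoint pair.
def dirac(a):
    # delta_a(A) = 1 if a in A, else 0
    return lambda A: 1 if a in A else 0

def counting(A):
    # tau(A) = number of elements of A (finite sets only here)
    return len(A)

A1, A2 = {1, 2}, {5}          # disjoint sets
union = A1 | A2
d = dirac(2)
print(d(union), d(A1) + d(A2))                       # Dirac: additive
print(counting(union), counting(A1) + counting(A2))  # counting: additive
```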
Fundamental properties of measures

Theorem 7
Let (X , A, µ) be a measure space.
1 µ is finitely additive, that is, if A1 , . . . , AN is a finite collection of
pairwise disjoint sets in A, then µ(A1 ∪ · · · ∪ AN ) = µ(A1 ) + · · · + µ(AN ).
2 If A, B ∈ A and A ⊆ B, then µ(A) ≤ µ(B).
3 If A, B ∈ A, A ⊆ B and µ(A) < ∞, then µ(B \ A) = µ(B) − µ(A).
4 For any sequence of sets (An )n∈N in A it holds that

µ(∪n∈N An ) ≤ Σn∈N µ(An ).

5 Let (An )n∈N be a sequence of sets in A satisfying A1 ⊆ A2 ⊆ . . ..


Then, µ(∪n∈N An ) = limn→∞ µ(An ).
6 Let (An )n∈N be a sequence of sets in A satisfying A1 ⊇ A2 ⊇ . . .
and µ(A1 ) < ∞. Then, µ(∩n∈N An ) = limn→∞ µ(An ).
21/89
Comments

The properties of measures are as one would hope/expect.


We shall primarily be dealing with probability measures.
Clearly, the finiteness requirements in parts (3) and (6) of
Theorem 7 are satisfied in this case.
However, these finiteness requirements cannot be dispensed
with!
To see this in (6), consider the counting measure τ
on (N, P(N)). With An = {n, n + 1, . . .}, n ∈ N, it holds that

τ (∩n∈N An ) = τ (∅) = 0 but limn→∞ τ (An ) = ∞.

(4) of Theorem 7 is also referred to as the union bound or


Boole’s inequality.

22/89
Measurable functions

Functions & continuity.


Measurable functions.
Probability spaces and random variables.
Stability properties when taking limits.

23/89
Functions²
Functions.
Let X and Y be arbitrary sets. A rule associating a unique
element y = f (x) ∈ Y with each element x ∈ X is said to be
a function f on X .
X is the domain of f .
Y = {f (x) for x ∈ X } is the range of f .
If X ̸⊂ R then f is sometimes said to be a mapping. Then f
maps X into Y .
Images and preimages.
If x ∈ X then y = f (x) is the image of x.
Every x ∈ X with y ∈ Y as its image is called a preimage of y .
The set of x ∈ X whose images belong to a set B ⊂ Y is the
preimage of B, denoted f −1 (B).
If no y ∈ B has a preimage, then f −1 (B) = ∅.
Example: Define the function f : R → R via f (x) = x². Then
f −1 ([1, 4]) = {x ∈ R : x² ∈ [1, 4]} = [−2, −1] ∪ [1, 2].
² Kolmogorov and Fomin (1970, p. 4–6, 44, 87) 24/89
Some results for images and preimages.
f −1 (A ∩ B) = f −1 (A) ∩ f −1 (B).
f −1 (A ∪ B) = f −1 (A) ∪ f −1 (B).
f (A ∪ B) = f (A) ∪ f (B).
In general, f (A ∩ B) ̸= f (A) ∩ f (B).
Continuity.
Let the real function f map X ⊂ R into Y ⊂ R.
f is continuous at the point x0 if, ∀ϵ > 0, ∃δ > 0 such that
|f (x) − f (x0 )| < ϵ whenever |x − x0 | < δ.
f is continuous on X if it is continuous at all x0 ∈ X .
Theorem: f is continuous on X if and only if the preimage
f −1 (B) of any open set B ⊂ Y is open (in X ).
These considerations generalize to metric spaces and to
topological spaces.

25/89
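The identities for preimages, and the failure of the corresponding identity for images under intersections, can be verified on a finite domain. A Python sketch (illustrative only), reusing f (x) = x² from the earlier example:

```python
# Illustrative sketch on a finite domain: preimages respect intersections
# and unions, while images in general do not respect intersections.
def preimage(f, domain, B):
    return {x for x in domain if f(x) in B}

def image(f, A):
    return {f(x) for x in A}

f = lambda x: x * x
domain = set(range(-3, 4))
A, B = {1, 4}, {4, 9}

assert preimage(f, domain, A & B) == preimage(f, domain, A) & preimage(f, domain, B)
assert preimage(f, domain, A | B) == preimage(f, domain, A) | preimage(f, domain, B)

S, T = {-2}, {2}
print(image(f, S & T), image(f, S) & image(f, T))   # set() versus {4}
```

The last line shows f (A ∩ B) ̸= f (A) ∩ f (B): the intersection of {−2} and {2} is empty, but both images equal {4}.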
Measurable functions

We shall now introduce measurable functions.


We shall link measurability and continuity of functions.

Definition 8 (Measurable function)


Let (X , A) and (Y, B) be measurable spaces and consider the
mapping f : X → Y. We say that f is measurable (or more
precisely A-B-measurable) if

f −1 (B) ∈ A for all B ∈ B.

The larger A is, the easier it is to be measurable.


The larger B is, the harder it is to be measurable.

26/89
Probability spaces and random variables
Let us put what we have introduced so far into a probabilistic
context.
Definition 9 (Probability space)
A probability space is a triple (Ω, F, P), where Ω is a non-empty
set, F is a σ-algebra in Ω and P is a probability measure on F.
The sets in F are called events and for A ∈ F we
interpret P(A) ∈ [0, 1] as the probability of the event A occurring.

Definition 10 (Random variable)


Let (Ω, F, P) be a probability space and consider the measurable
space (R, B(R)). A random variable is mapping X : Ω → R that
is F-B(R)-measurable.

Thus, a random variable X is just a measurable function defined


on a probability space mapping into R. Random
vectors X : Ω → Rd are defined analogously.
27/89
Probability measures and distribution functions

Consider the probability space (R, B(R), P).

Definition 4 shows B(R) = σ({(−∞, b] : b ∈ R}).

This implies that a distribution function FP exists so that

FP (b) = P((−∞, b]).

Two results can be proved.


The function FP generates exactly one probability measure on
(R, B(R)), namely P.
Any distribution function F generates exactly one probability
measure on (R, B(R)).
These results generalize to Rd .

28/89
Continuity implies measurability

The Euclidean norm is ∥x∥ = (x1² + · · · + xd²)1/2 for x ∈ Rd .

Theorem 11 (Continuity implies measurability)


Consider Rd and Rk with Euclidean norms. Then every
continuous mapping f : Rd → Rk is B(Rd )-B(Rk )-measurable.

In words: Every continuous function is Borel measurable.

29/89
Non-continuous measurable functions

Although all continuous functions are measurable, not all


measurable functions are continuous.
In this sense, measurability is a property that more functions
have than continuity.

Let (X , A) be a measurable space and define for every A ∈ A the


indicator function 1A : X → R as
(
1 if x ∈ A,
1A (x) =
0 if x ∈ Ac .

30/89
Let A ∈ A. Clearly, for any B ⊆ R

1A−1 (B) = X if 0, 1 ∈ B,
1A−1 (B) = A if 1 ∈ B and 0 ∈ B c ,
1A−1 (B) = Ac if 0 ∈ B and 1 ∈ B c ,
1A−1 (B) = ∅ if 0, 1 ∈ B c .

Thus, 1A is A-F-measurable irrespective of which σ-algebra F
the real line R is equipped with (e.g. F = B(R)).
Even the discontinuous indicator function is measurable!

In contrast, the function 1A for A ̸∈ A is non-measurable.

31/89
Stability properties of measurable functions
Let M(A) = {f : X → R : f is A-B(R)-measurable}.
In the exercises you will show:
Theorem 12 (Stability properties of measurable functions)
1 If f1 , . . . , fd : X → R are elements of M(A) and φ : Rd → R
is B(Rd )-B(R)-measurable, then

φ(f1 , . . . , fd ) : X → R

defined via x 7→ φ(f1 (x), . . . , fd (x)) is an element of M(A).


2 If f , g ∈ M(A), and c ∈ R then the functions

cf , f + g, f · g, f ∧ g, f ∨g

are elements of M(A). In particular, M(A) is a vector space.

Thus, measurability is preserved under transformations.


32/89
Measurability and limits

One frequently deals with sequences of measurable functions.


We shall see that the pointwise limit of a sequence of
functions in M(A) is also an element of M(A), provided that
the pointwise limits are elements of R for all x ∈ X .
This is very useful as the Lebesgue integral is only defined for
measurable functions.
Recall that continuity is not generally maintained under
pointwise limits, cf. fn : [0, 1] → [0, 1] where fn (x) = x n .
Then fn (x) → 1{1} (x), which is discontinuous at 1.

33/89
Let R̄ = {−∞} ∪ R ∪ {∞} be the extended real line. Define

M̄(A) = {f : X → R̄ : f is A-B(R̄)-measurable}
M̄(A)+ = {f ∈ M̄(A) : f (x) ≥ 0 for all x ∈ X }

This is useful since, e.g., supn∈N fn (x) could be ∞ for some x ∈ X


even if fn (x) ∈ R for all n ∈ N.
Theorem 13 (Limits and measurability)
1 Let (X , A) be a measurable space and let (fn )n∈N be a sequence of
functions in M̄(A). Then the functions

inf n∈N fn , supn∈N fn , lim inf n→∞ fn and lim supn→∞ fn

are also elements of M̄(A). Finally, if fn is pointwise convergent,
that is, if f (x) := limn→∞ fn (x) exists in R̄ for all x ∈ X , then also
the limiting function f ∈ M̄(A).
2 If (fn )n∈N is a sequence of functions in M(A) and
f (x) := limn→∞ fn (x) exists in R for all x ∈ X , then also f ∈ M(A).
34/89
Big picture

Don’t get too bogged down in measurability details.


This is not essential for us, though I shall occasionally point
out measurability issues, e.g. in connection with extremum
estimation (measurable selection).
The most important thing is that we can properly apply the,
luckily intuitive, results for Lebesgue integration to come next.

35/89
Lebesgue integrals

Definitions.
Linearity.
Monotone Convergence Theorem.
Dominated Convergence Theorem.
Inequalities.
Induced measure and substitution.
Product measures and double integrals.

36/89
The Lebesgue integral

We shall now introduce the Lebesgue integral and study its main
properties. It has several advantages over the Riemann integral.
1 It is defined for a broader class of functions.
2 It is much more stable under pointwise limits of sequences of
functions. That is, pointwise limits and integration can often
be interchanged. For the Riemann integral we typically need
uniform convergence.
3 The Lebesgue integral is easily defined for functions on an
arbitrary measure space (X , A, µ). The Riemann integral is
defined for functions on R. This is important in probability
theory where the random variables are defined on a probability
space (Ω, F, P) and expected values are Lebesgue integrals
E X = ∫ X (ω) P(dω).

37/89
The Lebesgue integral
Given a measure space (X , A, µ), ai ∈ R
and Ai ∈ A, i = 1, . . . , n, we call s : X → R defined via

s(x) = Σni=1 ai 1Ai (x)

a simple function.
Denote by SM(A) and SM(A)+ , respectively, the sets of
simple and of non-negative simple functions.
For s ∈ SM(A)+ one defines

∫ s dµ = Σni=1 ai µ(Ai ) ∈ [0, ∞],

which is well-defined since Ai ∈ A and ai ≥ 0.
In particular, it is important that

∫A 1 dµ = µ(A), A ∈ A.
38/89
The Lebesgue integral on M̄(A)+
Definition 14
Let (X , A, µ) be a measure space and f : X → [0, ∞] a function
in M̄(A)+ . We then define the µ-integral ∫ f dµ of f via

∫ f dµ := sup{ ∫ s dµ : s ∈ SM(A)+ , s ≤ f } ∈ [0, ∞].

It is often useful that for f ∈ M̄(A)+ there exists a sequence of
non-negative simple functions sn such that sn (x) ↑ f (x) for
all x ∈ X and that

∫ f dµ = limn→∞ ∫ sn dµ.

In particular, as n → ∞, the non-negative simple functions

sn (x) := Σ_{j=1}^{n2^n} ((j − 1)/2^n ) 1_{f −1 ([(j−1)/2^n , j/2^n ))} (x) + n 1_{f −1 ([n,∞])} (x) ↑ f (x).
39/89
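The dyadic approximation above can be illustrated numerically. The sketch below (Python, illustrative only) approximates the Lebesgue integral of f (x) = x on [0, 1] by averaging sn over a fine grid of midpoints; the exact value is 1/2:

```python
import math

# Illustrative sketch: the dyadic staircase s_n from the slide, applied to
# f(x) = x on [0, 1]; the Lebesgue integral of f is 1/2.
def s_n(f, n, x):
    # round f(x) down to the grid {0, 1/2**n, 2/2**n, ...}, capped at n
    y = f(x)
    if y >= n:
        return n
    return math.floor(y * 2 ** n) / 2 ** n

f = lambda x: x
grid = [(k + 0.5) / 10 ** 5 for k in range(10 ** 5)]   # midpoints in [0, 1]
approxs = []
for n in (1, 2, 4, 8):
    approxs.append(sum(s_n(f, n, x) for x in grid) / len(grid))
print(approxs)   # increases towards 0.5, as s_n increases to f
```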
Integrating general functions

We have defined the µ-integral for f ∈ M̄(A)+ .
To integrate f ∈ M̄(A), that is, functions that need not be
non-negative, define

f + (x) = max(f (x), 0) and f − (x) = − min(f (x), 0).

Observe that f + and f − are non-negative and

f = f + − f − and |f | = f + + f − .

Furthermore, by an argument based on part 2 of Theorem 12
it follows that f + and f − are elements of M̄(A)+ [for which
we have defined the integral].³

³ Theorem 12 is for elements of M(A) rather than M̄(A) 40/89
Definition 15 (L(µ) and L1 (µ))
For a measure space (X , A, µ) we define

L(µ) := { f ∈ M̄(A) : ∫ f + dµ ∧ ∫ f − dµ < ∞ }
L1 (µ) := { f ∈ M(A) : ∫ f + dµ ∨ ∫ f − dµ < ∞ }

Observe that L1 (µ) consists of functions taking real values only.

Definition 16 (The general integral)
For a function f ∈ L(µ) we define

∫ f dµ := ∫ f + dµ − ∫ f − dµ

Observe that since f ∈ L(µ), we never run into “∞ − ∞” issues.

41/89
µ-a.e. and “almost surely”

It turns out that the µ-integral “does not care about null sets”:
If f , g ∈ L(µ) and µ(f ̸= g ) = 0, then

∫ f dµ = ∫ g dµ.

Definition 17
A subset N of X is called a µ-null set if there exists an A ∈ A such
that

N ⊆ A and µ(A) = 0.

Remark: Consider the null sets for the Lebesgue measure on R.
A σ-algebra can be generated from those together with the Borel σ-algebra,
and the Lebesgue measure can be extended to that σ-algebra.
Even so, there exist sets in R that are not in that σ-algebra.
Those sets are said to be non-measurable. 42/89
Examples of λ1 -null sets

Consider the measure space (R, B(R), λ1 ). As mentioned
before, the singleton {x} is a Borel set: For x ∈ R

{x} = ∩n∈N (x − n−1 , x + n−1 ) ∈ B(R),

such that by part (6) of Theorem 7

λ1 ({x}) = limn→∞ λ1 ((x − n−1 , x + n−1 )) = limn→∞ 2/n = 0.

The set of rational numbers Q can be formed by countable
unions of singletons. Thus, Q is a Borel set and

λ1 (Q) = λ1 (∪x∈Q {x}) = Σx∈Q λ1 ({x}) = 0.

43/89
Consider (X , A, µ). We say that a property holds for µ-almost
all x ∈ X if the property holds for all x ∈ X \N where
N is a µ-null set. [common in mathematics]
N ∈ A and µ(N) = 0. [common in probability]
We also say the property holds µ-almost everywhere (a.e.).
In probability, we say almost surely (a.s.) or with probability
one.
Examples:
If µ(x ∈ X : f (x) ̸= g (x)) = 0 we say that f = g µ-almost
everywhere, or for µ-almost every x or µ-a.e.
If µ(x ∈ X : limn→∞ fn (x) does not exist) = 0, we say that fn
converges for µ-almost every x.

44/89
Linearity over L1 (µ)

Prior to stating linearity of the integral, let us note that the
following expressions are used interchangeably for ∫ f dµ:

∫ f dµ, ∫X f dµ, ∫ f (x) µ(dx), ∫ f (x) dµ(x).

45/89
Theorem 18 (Linearity and other properties of the integral)

Let (X , A, µ) be a measure space, let f , g ∈ L1 (µ) and a ∈ R. Then,

1 ∫ af dµ = a ∫ f dµ.
2 ∫ (f + g ) dµ = ∫ f dµ + ∫ g dµ.
3 If f ≤ g µ-a.e. then ∫ f dµ ≤ ∫ g dµ and

∫ f dµ = ∫ g dµ ⇐⇒ f = g µ-a.e.

4 If f ≥ 0 µ-a.e. then

∫ f dµ = 0 ⇐⇒ f = 0 µ-a.e.

and

∫ f dµ < ∞ =⇒ f < ∞ µ-a.e.

5 |∫ f dµ| ≤ ∫ |f | dµ.

Observe that we do not run into ∞ − ∞ issues in calculating f + g
in (2) since, by virtue of f , g ∈ L1 (µ), they take real values only.
46/89
Integrating over subsets A ∈ A

Let us also mention that for f ∈ M̄(A) and A ∈ A such
that f 1A ∈ L(µ) we define

∫A f dµ := ∫ f 1A dµ.

47/89
Interchanging limits and integration

One of the appeals of the µ-integral is the ease with which it


allows interchanging integration and limits.
This is another advantage over the Riemann integral, which
generally only allows interchanging integration and limits on
bounded intervals under uniform rather than pointwise
convergence of a sequence of functions fn to f .
We shall now state and illustrate by examples two important
theorems on interchanging the order of limits and integration:
1 The Monotone Convergence Theorem.
2 The Dominated Convergence Theorem.

48/89
Monotone Convergence Theorem

The Monotone Convergence Theorem gives sufficient conditions


for when we can interchange limits and integration.
Theorem 19 (Monotone Convergence Theorem)
Let f , f1 , f2 , f3 , . . . be functions in M̄(A)+
satisfying f1 ≤ f2 ≤ f3 ≤ . . . µ-a.e. Then, if f = limn→∞ fn µ-a.e.,
it holds that

limn→∞ ∫ fn dµ = ∫ f dµ.

The Monotone Convergence Theorem allows us to


interchange integration and pointwise limits for an increasing
sequence of non-negative functions.
Note that we only need f1 (x) ≤ f2 (x) ≤ . . . for µ-almost all x.

49/89
Prior to illustrating the Monotone Convergence Theorem, let
us note that in case (Ω, F, P) is a probability space upon
which a random variable X with values in (R, B(R)) is
defined, then we write

E X := ∫Ω X (ω) P(dω) = ∫Ω X dP, X ∈ L(P),

for the expected value of X .


The distribution PX of X is the probability measure on B(R)
defined as

PX (A) := P(X ∈ A) = P(ω ∈ Ω : X (ω) ∈ A) = P(X −1 (A)),

for A ∈ B(R). Observe that X −1 (A) ∈ F.


[We shall see that the distribution of a random variable X is
nothing else than a so-called image measure.]

50/89
Example: AR(1)

Let (εt )t∈Z be a sequence of random variables on the


probability space (Ω, F, P) with values in (R, B(R)) such
that E |εt | ≤ C < ∞ for all t ∈ Z and some C > 0.
Let

Yt = αYt−1 + εt , t ∈ N, |α| < 1.

We know that the solution to this autoregressive equation is

Σ∞i=0 αi εt−i .

But is this series well-defined?

51/89
Consider first Σ∞i=0 |α|i |εt−i |.
Clearly, limN→∞ ΣNi=0 |α|i |εt−i | exists in [0, ∞].
Since ΣNi=0 |α|i |εt−i | is F-B(R)-measurable (by Theorem 12)
and using that limits preserve measurability (by Theorem 13),
we conclude that limN→∞ ΣNi=0 |α|i |εt−i | is F-B(R̄)-measurable.
Since ΣNi=0 |α|i |εt−i | ↑ Σ∞i=0 |α|i |εt−i |, the Monotone
Convergence Theorem (and linearity of the integral) yields
that

E Σ∞i=0 |α|i |εt−i | = limN→∞ ΣNi=0 |α|i E |εt−i | ≤ C /(1 − |α|) < ∞.

Thus, Σ∞i=0 |α|i |εt−i | < ∞ P-a.s. [cf. Theorem 18, part 4]
Since Σ∞i=0 αi εt−i converges absolutely P-a.s., it also
converges P-a.s.

52/89
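A simulation sketch (Python, illustrative only; Gaussian noise and α = 0.8 are arbitrary choices satisfying the assumptions) showing that the partial sums settle down, in line with the almost-sure absolute convergence just derived:

```python
import random

# Illustrative simulation: alpha = 0.8 and standard Gaussian noise are
# arbitrary choices with |alpha| < 1 and E|eps_t| <= C < infinity.
random.seed(0)
alpha, N = 0.8, 400
eps = [random.gauss(0.0, 1.0) for _ in range(N + 1)]

partial, abs_partial, out = 0.0, 0.0, []
for i in range(N + 1):
    partial += alpha ** i * eps[i]                # partial sum of alpha^i eps
    abs_partial += abs(alpha) ** i * abs(eps[i])  # absolute partial sum
    out.append(partial)
print(out[100], out[400], abs_partial)            # partial sums stabilise
```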
Lebesgue’s Dominated Convergence Theorem

Theorem 20 (Lebesgue’s Dominated Convergence Theorem)

Let f , f1 , f2 , f3 , . . . be functions in M̄(A) such
that f = limn→∞ fn µ-a.e. If there exists a function g ∈ M̄(A)+
such that
1 |fn | ≤ g µ-a.e. for all n ∈ N.
2 ∫ g dµ < ∞.
then

limn→∞ ∫ fn dµ = ∫ f dµ and limn→∞ ∫ |fn − f | dµ = 0.

A function g satisfying the conditions of the theorem is often
referred to as an integrable majorant of the sequence (fn )n∈N ,
i.e. “g dominates the fn ” — hence the name of the theorem.

53/89
Example
Consider the measure space (R, B(R), λ1 ) and let f ∈ L1 (λ1 ). We
show that

limn→∞ (1/(2n)) ∫[−n,n] f (x) λ1 (dx) = limn→∞ ∫ (1/(2n)) f (x) 1[−n,n] (x) λ1 (dx) = 0, (2)

via the Dominated Convergence Theorem.
Observe that

fn (x) := (1/(2n)) f (x) 1[−n,n] (x) → 0 for all x ∈ R.

Furthermore, for all n ∈ N,

|fn (x)| ≤ (1/2)|f (x)| and ∫ (1/2)|f (x)| λ1 (dx) < ∞

since f ∈ L1 (λ1 ). The Dominated Convergence Theorem now
yields (2). 54/89
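A numerical sketch of (2) (Python, illustrative only), taking f (x) = e^{−|x|} ∈ L1 (λ1 ) as a concrete choice, whose integral over R is 2:

```python
import math

# Illustrative sketch of (2) with the concrete choice f(x) = exp(-|x|).
def riemann(f, a, b, m=200000):
    # midpoint-rule approximation of the integral over [a, b]
    h = (b - a) / m
    return h * sum(f(a + (k + 0.5) * h) for k in range(m))

f = lambda x: math.exp(-abs(x))
vals = [riemann(f, -n, n) / (2 * n) for n in (1, 10, 100)]
print(vals)   # decreasing towards 0, roughly like 1/n
```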
Interchanging integration and differentiation

Theorem 21 (Interchanging integration and differentiation)

Let ft (x) be a function in M(A), where t is a continuous, real
parameter with a < t < b for some −∞ ≤ a < b ≤ ∞.
Further, ft is in L1 (λ), where λ is the Lebesgue measure.
Suppose ft has a derivative in t satisfying |(∂/∂t) ft (x)| ≤ g (x) λ-a.e.
for a < t < b, where g ∈ M(A)+ is integrable. Then

(∂/∂t) ∫ ft (x) λ(dx) = ∫ (∂/∂t) ft (x) λ(dx).

Proof. We will use the Dominated Convergence Theorem.
The derivative is defined as the limit

(∂/∂t) ft (x) = limh↓0 (ft+h (x) − ft (x))/h.

55/89
By the mean value theorem we get

(ft+h (x) − ft (x))/h = (1/h) · h · (∂/∂t) ft (x)|t=t∗ = (∂/∂t) ft (x)|t=t∗

for an intermediate point t < t∗ < t + h. Now, as a < t < b, then
for small h we get a < t∗ < b. Thus, we can bound λ-a.e.

|(ft+h (x) − ft (x))/h| ≤ g (x),

which is integrable.
Thus, we can construct a dominated sequence of functions,
fn (x) = (ft+h (x) − ft (x))/h with h = 1/n, and interchange limits
and integration by appealing to the Dominated Convergence
Theorem.

56/89
Relationship to Riemann integral

It is often useful that integrals with respect to the Lebesgue


measure can be calculated via the Riemann integral over bounded
intervals.
Theorem 22
Let a and b be real numbers such that a < b and let f : [a, b] → R
be Riemann integrable and B([a, b])-B(R)-measurable. Then,

∫[a,b] f dλ1 = ∫ab f (x) dx,

the right-hand integral being the Riemann integral.

In the above, B([a, b]) := {A ∩ [a, b] : A ∈ B(R)}.

57/89
A useful consequence

Consider the measure space (R, B(R), λ1 ) and let f ∈ L1 (λ1 ).
Then, by the Dominated Convergence Theorem,

∫ f dλ1 = limn→∞ ∫ f 1[−n,n] dλ1 = limn→∞ ∫[−n,n] f dλ1 , (3)

where we used that f 1[−n,n] → f and |f 1[−n,n] | ≤ |f | for all n ∈ N.
Thus, if f is also Riemann integrable, we can calculate the
Lebesgue integral ∫ f dλ1 as

∫ f dλ1 = limn→∞ ∫−nn f (x) dx.

Of course, (3) remains valid in a situation where the first equality
can be ensured via the Monotone Convergence Theorem (such as
when f ≥ 0).
58/89
Example
Let us determine the value of the integral ∫0∞ x e−x λ1 (dx).
Observe that x e−x 1[0,n) (x) ↑ x e−x 1[0,∞) (x) for all x ∈ R. Thus,
by the Monotone Convergence Theorem and Theorem 22,

∫0∞ x e−x λ1 (dx) = ∫ x e−x 1[0,∞) (x) λ1 (dx) = limn→∞ ∫ x e−x 1[0,n) (x) λ1 (dx)
= limn→∞ ∫[0,n] x e−x λ1 (dx) = limn→∞ ∫0n x e−x dx,

the last integral being Riemann. Using integration by parts,

∫0n x e−x dx = [−x e−x ]n0 + ∫0n e−x dx = −n e−n − e−n + 1,

and so

∫0∞ x e−x λ1 (dx) = limn→∞ ∫0n x e−x dx = limn→∞ (1 − (n + 1) e−n ) = 1.
59/89
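The closed form 1 − (n + 1)e^{−n} can be compared against a midpoint-rule approximation of the truncated Riemann integrals (Python, illustrative only):

```python
import math

# Illustrative check: compare midpoint-rule values of int_0^n x e^{-x} dx
# with the closed form 1 - (n + 1) e^{-n} derived on the slide.
def riemann(f, a, b, m=100000):
    h = (b - a) / m
    return h * sum(f(a + (k + 0.5) * h) for k in range(m))

f = lambda x: x * math.exp(-x)
for n in (1, 5, 20):
    print(n, riemann(f, 0, n), 1 - (n + 1) * math.exp(-n))
```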
Improper integrals

Some integrals exist as improper Riemann integrals but not as
Lebesgue integrals. An example is the Dirichlet integral. It can be
shown that

∫−∞∞ (sin x)/x dx = π

as an improper Riemann integral. The argument involves showing
that limb→∞ ∫0b x −1 sin x dx exists. However, it can also be shown
that

∫−∞∞ |(sin x)/x| dx = ∞.

Thus, the function (sin x)/x is not Lebesgue integrable.
The generalized Riemann integral has the Riemann, improper Riemann
and Lebesgue integrals as special cases (Bartle and Sherbert,
2011).

60/89
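Numerically (Python, illustrative only; by symmetry the half-line integral tends to π/2), the signed truncated integrals stabilise while the integrals of |(sin x)/x| keep growing:

```python
import math

# Illustrative sketch: the improper Riemann integral of (sin x)/x over
# [0, infinity) is pi/2 by symmetry with the slide's value pi, while the
# integral of |(sin x)/x| diverges, so (sin x)/x is not in L1.
def riemann(f, a, b, m=400000):
    h = (b - a) / m
    return h * sum(f(a + (k + 0.5) * h) for k in range(m))

sinc = lambda x: math.sin(x) / x if x != 0 else 1.0
vals = {}
for b in (10.0, 100.0):
    vals[b] = (riemann(sinc, 0, b), riemann(lambda x: abs(sinc(x)), 0, b))
print(vals)   # signed integrals near pi/2; absolute integrals keep growing
```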
Regularity conditions: Example

Let us illustrate a situation where one cannot interchange integration
and limits. Consider the probability space

([0, 1], B([0, 1]), λ1 ) and fn (x) = n 1[0,1/n] (x), n ∈ N.

Clearly, fn (x) → 0 for λ1 -a.e. x ∈ [0, 1]. [Observe that {0} is a
Lebesgue null set.]
However, [use either the above relationship to the
Riemann integral or that λ1 ([0, 1/n]) = 1/n]

∫ fn dλ1 = 1 ̸= 0 = ∫ 0 dλ1 .

Thus, we should carefully check the regularity conditions of the
Dominated and Monotone Convergence Theorems prior to applying
them.
61/89
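The counterexample is easy to reproduce numerically (Python, illustrative only):

```python
# Illustrative sketch: each f_n = n * 1_[0, 1/n] integrates to 1 over [0, 1],
# yet f_n(x) -> 0 for every fixed x > 0.
def f(n, x):
    return float(n) if 0 <= x <= 1.0 / n else 0.0

def integral(n, m=100000):
    # midpoint rule on [0, 1]
    h = 1.0 / m
    return h * sum(f(n, (k + 0.5) * h) for k in range(m))

for n in (1, 10, 100):
    print(n, integral(n), f(n, 0.3))   # integral stays 1, f_n(0.3) -> 0
```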
Some useful inequalities

For f , g ∈ M̄(A), the following inequalities hold,
writing ∥f ∥p = (∫ |f |p dµ)1/p :

Markov: µ(|f | ≥ t) ≤ (1/t p ) ∫ |f |p dµ, p ∈ (0, ∞), t > 0.

Hölder: ∫ |fg | dµ ≤ ∥f ∥p ∥g ∥q , 1/p + 1/q = 1, p > 1.

Cauchy-Schwarz: ∫ |fg | dµ ≤ ∥f ∥2 ∥g ∥2 .

Minkowski: ∥f + g ∥p ≤ ∥f ∥p + ∥g ∥p , p ≥ 1.

62/89
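These inequalities hold exactly for the empirical measure of any finite sample, which gives a cheap sanity check (Python, illustrative only; the Gaussian sample and p = 2, t = 1.5 are arbitrary choices):

```python
import random

# Illustrative sanity check: Markov, Cauchy-Schwarz and Minkowski under the
# empirical (probability) measure of a Gaussian sample.
random.seed(1)
n = 10000
f = [random.gauss(0, 1) for _ in range(n)]
g = [random.gauss(0, 1) for _ in range(n)]

def norm(h, p):
    # empirical L^p norm
    return (sum(abs(x) ** p for x in h) / n) ** (1.0 / p)

t = 1.5
markov_lhs = sum(abs(x) >= t for x in f) / n
markov_rhs = norm(f, 2) ** 2 / t ** 2
cs_lhs = sum(abs(a * b) for a, b in zip(f, g)) / n
mink_lhs = norm([a + b for a, b in zip(f, g)], 2)

assert markov_lhs <= markov_rhs
assert cs_lhs <= norm(f, 2) * norm(g, 2)
assert mink_lhs <= norm(f, 2) + norm(g, 2)
print(markov_lhs, markov_rhs)
```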
Jensen’s inequality

Jensen’s inequality is for probability measures. Consider a


probability space (Ω, F, P) upon which a random vector X
with values in Rm is defined.

Lemma 23 (Jensen’s inequality)


Let C be an open and convex subset of Rm and let L : C → R be
a convex function. If X is a random vector with P(X ∈ C ) = 1
and E ||X || < ∞,4 then
1 E X ∈ C.
2 L(E X ) ≤ E L(X ).

Since x ↦ x² is convex, one has (E X )² ≤ E X ².
If g : C → R is concave, then −g is convex and so

−g (E X ) ≤ E[−g (X )] ⇐⇒ E g (X ) ≤ g (E X ).

⁴ Here, for any x ∈ Rm , ∥x∥ = (x1² + · · · + xm²)1/2 denotes the Euclidean norm. 63/89
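Jensen's inequality likewise holds exactly under the empirical measure of a sample; a Python sketch (illustrative only) with the convex function x² and the concave function log on positive data:

```python
import random, math

# Illustrative sketch: under the empirical measure of a positive sample,
# Jensen's inequality holds exactly for convex x**2 and concave log.
random.seed(2)
xs = [random.random() + 0.5 for _ in range(100000)]   # values in [0.5, 1.5)
mean = sum(xs) / len(xs)
mean_sq = sum(x * x for x in xs) / len(xs)
mean_log = sum(math.log(x) for x in xs) / len(xs)

assert mean ** 2 <= mean_sq          # L(E X) <= E L(X), L convex
assert mean_log <= math.log(mean)    # E g(X) <= g(E X), g concave
print(mean ** 2, mean_sq, mean_log, math.log(mean))
```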
Induced measure and substitution rule
Let (X , A) and (Y, B) be measurable spaces and T : X → Y
be A-B-measurable.
If µ is a measure on A, then µ ◦ T −1 , defined by

(µ ◦ T −1 )(B) := µ(T −1 (B)) = µ(x ∈ X : T (x) ∈ B), B ∈ B

is a measure on B which is called the induced measure/image


measure/push-forward measure (of µ under the mapping T ).
One frequently writes µT for the image measure.
Let us verify that µT is a measure on B:
Clearly, µT (∅) = µ(T −1 (∅)) = µ(∅) = 0, and for (Bn )n∈N a disjoint
sequence of sets in B one has, since µ is a measure on A,

µT (∪n∈N Bn ) = µ(T −1 (∪n∈N Bn )) = µ(∪n∈N T −1 (Bn ))
= Σn∈N µ(T −1 (Bn )) = Σn∈N µT (Bn ).
64/89
Before getting to the substitution rule, we will consider an example
of applying the substitution rule to the Riemann integral

I = ∫ f (x) dG (T −1 (x)).

Let G have derivative g . By the chain rule,

(∂/∂x) G (T −1 (x)) = g (T −1 (x)) (∂/∂x) T −1 (x) = g (T −1 (x)) / T ′ (T −1 (x)).

Insert above to get

I = ∫ f (x) g (T −1 (x)) / T ′ (T −1 (x)) dx.

Substitute x = T (y ) with dx = T ′ (y ) dy to get

I = ∫ f (T (y )) g (y ) T ′ (y ) / T ′ (y ) dy .

Cancel the T ′ (y ) terms and write g (y ) dy = dG (y ) to get

I = ∫ f (x) dG (T −1 (x)) = ∫ f (T (y )) dG (y ).
65/89
We now present a “substitution rule” that we shall frequently use
and that tells us how to integrate with respect to the image
measure µT = µ ◦ T −1 .
Lemma 24 (Substitution rule)
Let (X, A, µ) be a measure space, (Y, B) a measurable space and T : X → Y be A-B-measurable. For µT the image measure of µ under T one has

L(µT) = {f ∈ M(B) : f ◦ T ∈ L(µ)},
L1(µT) = {f ∈ M(B) : f ◦ T ∈ L1(µ)}.

In addition, it holds for any function f ∈ L(µT) that

∫ f dµT = ∫ (f ◦ T) dµ. (4)
66/89
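On a finite measure space both sides of (4) can be computed exactly; a small sketch with an arbitrary weighted measure µ and the illustrative choices T(x) = x² and f(y) = y + 1:

```python
from fractions import Fraction

# Exact check of (4) on a finite measure space (all numbers are
# arbitrary illustrative choices): integrating f against the image
# measure mu_T equals integrating f o T against mu.
mu = {1: Fraction(1, 3), 2: Fraction(1, 6), 3: Fraction(1, 4), 4: Fraction(1, 4)}
T = lambda x: x * x
f = lambda y: y + 1

# Left-hand side: build mu_T = mu o T^-1 on Y = T(X), then integrate f.
mu_T = {}
for x, w in mu.items():
    mu_T[T(x)] = mu_T.get(T(x), Fraction(0)) + w
lhs = sum(f(y) * w for y, w in mu_T.items())

# Right-hand side: integrate f o T directly against mu.
rhs = sum(f(T(x)) * w for x, w in mu.items())
assert lhs == rhs
```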
Product measures and Tonelli’s Theorem
Definition 25 (Product σ-algebra)
Let (X, A) and (Y, B) be measurable spaces. We call A ⊗ B = σ(A × B : A ∈ A, B ∈ B) the product σ-algebra, and (X × Y, A ⊗ B) the product space.

Definition 26 (Product measure)
Let (X, A, µ) and (Y, B, ν) be σ-finite measure spaces. The product measure µ ⊗ ν on A ⊗ B is the unique measure satisfying (µ ⊗ ν)(A × B) = µ(A) · ν(B) for all A ∈ A and B ∈ B.

Typical product spaces that we shall consider are

(Rd1 × Rd2, B(Rd1) ⊗ B(Rd2))

with the product measure λd1 ⊗ λd2.
67/89
Example: Relationship to independence
Let X and Y be random variables on the probability space (Ω, F, P), both with values in (R, B(R)).
Denote by PX,Y the joint distribution of X and Y, that is,

PX,Y(C) = P ◦ (X, Y)⁻¹(C), C ∈ B(R²).

Recall that we say that X and Y are independent if

PX,Y(A × B) = PX(A) · PY(B) for all A, B ∈ B(R),

that is, the joint distribution is the product measure of the marginals, i.e.

PX,Y = PX ⊗ PY,

where PX = P ◦ X⁻¹ and PY = P ◦ Y⁻¹.
68/89
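A finite example: if the joint distribution is built as the product of the marginals, the defining rectangle identity holds for every A × B. The marginals below are arbitrary illustrative numbers:

```python
from fractions import Fraction

# Two marginals (arbitrary numbers) and the joint built as their
# product: the rectangle identity P_{X,Y}(A x B) = P_X(A) * P_Y(B)
# then holds for every pair of sets A and B.
pX = {0: Fraction(1, 3), 1: Fraction(2, 3)}
pY = {0: Fraction(1, 4), 1: Fraction(3, 4)}
pXY = {(x, y): pX[x] * pY[y] for x in pX for y in pY}

for A in [{0}, {1}, {0, 1}]:
    for B in [{0}, {1}, {0, 1}]:
        joint = sum(w for (x, y), w in pXY.items() if x in A and y in B)
        assert joint == sum(pX[x] for x in A) * sum(pY[y] for y in B)
```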
Integration with respect to product measures
Tonelli's Theorem tells us how to integrate with respect to product measures.
In particular, it allows us to change the order of integration.

Theorem 27 (Tonelli's Theorem)
Let (X, A, µ) and (Y, B, ν) be σ-finite measure spaces and consider the product space (X × Y, A ⊗ B, µ ⊗ ν). For every function f : X × Y → [0, ∞] from M(A ⊗ B)+ it holds that

∫_X (∫_Y f(x, y) ν(dy)) µ(dx) = ∫_{X×Y} f(x, y) (µ ⊗ ν)(dx, dy) = ∫_Y (∫_X f(x, y) µ(dx)) ν(dy).

Fubini's Theorem extends this to general (not necessarily non-negative) f in M(A ⊗ B) that are µ ⊗ ν-integrable.
69/89
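On a grid the iterated integrals become iterated Riemann sums, and for non-negative f they agree in either order; a numeric sketch with an arbitrary non-negative integrand:

```python
import numpy as np

# Tonelli on a grid: replace the integrals by Riemann sums over a
# rectangle. For a non-negative integrand the iterated sums agree in
# either order. Grid sizes and the integrand are arbitrary choices.
x = np.linspace(0.0, 1.0, 200)
y = np.linspace(0.0, 2.0, 300)
dx, dy = x[1] - x[0], y[1] - y[0]
X, Y = np.meshgrid(x, y, indexing="ij")
f = np.exp(-X * Y) * (1 + Y)              # f >= 0 on [0, 1] x [0, 2]

dy_first = np.sum(np.sum(f, axis=1) * dy * dx)   # integrate in y, then x
dx_first = np.sum(np.sum(f, axis=0) * dx * dy)   # integrate in x, then y
assert np.isclose(dy_first, dx_first)
```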
Absolute continuity and domination
We shall often encounter distributions that have a density.
Let (X, A) be a measurable space and µ, ν be σ-finite measures on A.
If ν(A) = 0 implies µ(A) = 0 for all A ∈ A, then we say that µ is absolutely continuous with respect to ν.
We also say that ν dominates µ, and write µ ≪ ν.
In words: µ must assign measure 0 to every set to which ν assigns measure 0.
70/89
Densities and conditioning
Densities through Radon-Nikodym derivatives.
Conditional expectation.
Conditional distributions through stochastic kernels.
71/89
Radon-Nikodym

Theorem 28 (Radon-Nikodym)
If µ and ν are σ-finite measures on A and µ ≪ ν, then there exists a ν-a.e. uniquely determined f ∈ M(A)+, called the density of µ with respect to ν, such that for every A ∈ A

µ(A) = ∫_A f dν  and  ∫ h dµ = ∫ h f dν,

where h ∈ L(µ) = {h ∈ M(A) : hf ∈ L(ν)}.

The function f is also denoted by dµ/dν and called the Radon-Nikodym derivative of µ with respect to ν.
72/89
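On a finite space a Radon-Nikodym derivative is simply the ratio of point masses wherever ν puts positive mass; a sketch with arbitrary measures satisfying µ ≪ ν, verifying both displays of Theorem 28:

```python
from fractions import Fraction

# On a finite space, dmu/dnu is the ratio of point masses wherever nu
# puts positive mass. The measures below are arbitrary with mu << nu.
points = [0, 1, 2]
nu = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}
mu = {0: Fraction(1, 8), 1: Fraction(1, 2), 2: Fraction(3, 8)}

f = {x: mu[x] / nu[x] for x in points}   # Radon-Nikodym derivative dmu/dnu

# mu(A) = integral_A f dnu for several sets A:
for A in [{0}, {1, 2}, {0, 1, 2}]:
    assert sum(mu[x] for x in A) == sum(f[x] * nu[x] for x in A)

# integral h dmu = integral h f dnu for an arbitrary h:
h = {0: 3, 1: -1, 2: 5}
assert sum(h[x] * mu[x] for x in points) == sum(h[x] * f[x] * nu[x] for x in points)
```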
Example: Normal distribution
For η ∈ R and σ² ∈ (0, ∞) we have that the Normal distribution N(η, σ²) is the measure on (R, B(R)) with density

fη,σ²(x) = (2πσ²)^{−1/2} e^{−(x−η)²/(2σ²)}, x ∈ R,

with respect to the Lebesgue measure λ1.
That is,

N(η, σ²)(A) = ∫_A fη,σ²(x) λ1(dx), A ∈ B(R).

In case A is an interval, we thus have

N(η, σ²)(A) = ∫_A fη,σ²(x) dx,

the last integral being a Riemann integral.
73/89
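The interval case can be checked numerically: a midpoint Riemann sum of the density against the closed form Φ(b) − Φ(a) expressed via the error function (the interval, grid size, and η = 0, σ² = 1 are arbitrary illustrative choices):

```python
import math

# Midpoint Riemann sum of the N(0, 1) density over [a, b], compared
# with the closed form Phi(b) - Phi(a) via the error function. The
# interval and grid size are arbitrary choices.
def density(x, eta=0.0, sigma2=1.0):
    return math.exp(-(x - eta) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

a, b, n = -1.0, 1.0, 20_000
h = (b - a) / n
riemann = sum(density(a + (k + 0.5) * h) * h for k in range(n))

Phi = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))
assert abs(riemann - (Phi(b) - Phi(a))) < 1e-8
```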
Example: Poisson distribution
For λ ∈ (0, ∞) we have that the Poisson distribution Poi(λ) is the measure on (N0, P(N0)) with density

fλ(x) = e^{−λ} λ^x / x!, x ∈ N0,

with respect to the counting measure τ.
That is,

Poi(λ)(A) = ∫_A fλ(x) τ(dx) = ∑_{x∈A} e^{−λ} λ^x / x!, A ∈ P(N0),

the last equality following from Exercise Set 1.
74/89
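A quick numeric check (λ = 2 is an arbitrary choice) that the density sums to 1 over N0 and that Poi(λ)(A) is the sum of the density over A:

```python
import math

# Poi(lam)(A) is the sum of the density over A; summing over a long
# initial segment of N0 recovers total mass 1 up to a negligible tail.
# lam = 2 is an arbitrary illustrative choice.
lam = 2.0
f = lambda x: math.exp(-lam) * lam ** x / math.factorial(x)

p_A = sum(f(x) for x in [0, 2, 4])          # Poi(lam)({0, 2, 4})
total = sum(f(x) for x in range(150))       # approximates Poi(lam)(N0) = 1

assert 0 < p_A < 1
assert abs(total - 1.0) < 1e-12
```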
Conditional expectations
Definition 29 (Conditional expectation)
Let (Ω, F, P) be a probability space, X ∈ L1(P) a random variable on it and G a σ-algebra with G ⊆ F. The random variables Y satisfying
1 Y is G-B(R)-measurable,
2 ∫_A X dP = ∫_A Y dP for all A ∈ G,
are called a conditional expectation of X given G. We write E(X|G) for any random variable Y satisfying (1) and (2) above.
Important fact: For any X ∈ L1 (P) and any σ-algebra G ⊆ F a
conditional expectation always exists and is P-a.s. unique.
Conditional expectations satisfy all the intuitive rules that we
would like them to (linearity, monotonicity,...), cf.,
e.g. Hoffmann-Jørgensen (1994) Section 6.8 or Dudley (2018)
page 338.
75/89
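For a finite Ω and a σ-algebra generated by a partition, the conditional expectation is the block-wise weighted average of X, and properties (1)-(2) can be checked exactly; the probability space below is an arbitrary illustration:

```python
from fractions import Fraction

# Finite-Omega sketch: G is generated by the partition of
# Omega = {0, ..., 5} into even and odd outcomes, and E(X|G) is the
# block-wise weighted average of X. All numbers are arbitrary.
P = {w: Fraction(1, 6) for w in range(6)}    # uniform probability
X = {w: w * w for w in range(6)}             # a random variable on Omega
blocks = [{0, 2, 4}, {1, 3, 5}]              # partition generating G

Y = {}
for block in blocks:
    avg = sum(X[w] * P[w] for w in block) / sum(P[w] for w in block)
    for w in block:
        Y[w] = avg                           # E(X|G) is constant on each block

# Property (2): int_A X dP = int_A Y dP for every A in G
# (the unions of partition blocks):
for A in [set(), blocks[0], blocks[1], blocks[0] | blocks[1]]:
    assert sum(X[w] * P[w] for w in A) == sum(Y[w] * P[w] for w in A)
```

Being constant on each block, Y is measurable with respect to the σ-algebra generated by the partition, so property (1) holds as well.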
Let (T , T ) be a measurable space. For T : Ω → T we
let σ(T) ⊆ F denote the σ-algebra generated by T, i.e.

σ(T) = {T⁻¹(B) : B ∈ T}.

σ(T) is clearly the smallest σ-algebra in Ω that makes T measurable when T is equipped with T.
We write E(X|T) := E(X|σ(T)).
The so-called factorization lemma yields that since E(X|T) is σ(T)-B(R)-measurable there exists a φ : T → R, T-B(R)-measurable, such that

E(X|T)(ω) = φ(T(ω)) for all ω ∈ Ω.
76/89
Let us characterize the function φ. By the definition of a conditional expectation it satisfies^5

∫_{T⁻¹(B)} φ(T(ω)) P(dω) = ∫_{T⁻¹(B)} X(ω) P(dω) for all B ∈ T,

or, equivalently via the substitution rule,

∫_B φ(t) PT(dt) = ∫_{T⁻¹(B)} X(ω) P(dω) for all B ∈ T,

where PT = P ◦ T⁻¹.
A T -B(R)-measurable function φ satisfying the above two
displays is also called a conditional expectation of X given T = t.
One often uses the notation E(X |T = t) := φ(t) for any such
function φ.

5: Observe that a typical element of σ(T) is of the form T⁻¹(B) for B ∈ T. 77/89
Stochastic kernels and conditional distributions

Let us briefly abstract from the exact estimation setting studied so far.
The following treatment and introduction of conditional distributions is taken from page 625 in Liese and Miescke (2008).
78/89
Stochastic kernels
Definition 30 (Stochastic kernels)
For two measurable spaces (X, A) and (Y, B), a mapping K : B × X → [0, 1] is called a stochastic kernel if for every B ∈ B the function x ↦ K(B, x) is A-B([0, 1])-measurable, and for every x ∈ X it holds that K(·, x) is a probability measure on B.

Stochastic kernels are also referred to as Markov kernels, conditioning kernels, or simply kernels.
We shall now see that stochastic kernels are intimately linked to conditional distributions.
79/89
Conditional distribution
Definition 31 (Conditional distribution)
Let X and Y be random variables on the probability space
(Ω, F, P) with values in the measurable spaces (X , A) and (Y, B),
respectively6 . The kernel K : B × X → [0, 1] is called a regular
conditional distribution of Y given X if
P(X ∈ A, Y ∈ B) = ∫_A K(B, x) PX(dx) for all A ∈ A, B ∈ B,

where PX = P ◦ X⁻¹ is the distribution of X under P.

We can think of K(B, x) as the probability of Y falling in the set B, given/conditional on X = x.
We often write PY|X for the conditional distribution K.
Observe also that

P(X ∈ A, Y ∈ B) = P ◦ (X, Y)⁻¹(A × B) = PX,Y(A × B).
6: Both of (X, A) and (Y, B) equal (R, B(R)). 80/89
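A discrete sketch: build a joint distribution from a marginal PX and a kernel K, then verify the defining identity on every rectangle (the numbers are arbitrary illustrative choices):

```python
from fractions import Fraction

# Discrete sketch: a marginal pX and a kernel K (rows are probability
# measures) determine the joint distribution, which then satisfies the
# defining identity on every rectangle. All numbers are arbitrary.
pX = {0: Fraction(1, 2), 1: Fraction(1, 2)}
K = {  # K[x][y] = K({y}, x)
    0: {0: Fraction(3, 4), 1: Fraction(1, 4)},
    1: {0: Fraction(1, 3), 1: Fraction(2, 3)},
}
pXY = {(x, y): pX[x] * K[x][y] for x in pX for y in (0, 1)}

for A in [{0}, {1}, {0, 1}]:
    for B in [{0}, {1}, {0, 1}]:
        joint = sum(w for (x, y), w in pXY.items() if x in A and y in B)
        kernel_side = sum(sum(K[x][y] for y in B) * pX[x] for x in A)
        assert joint == kernel_side
```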
Some remarks
Observe that

P(Y ∈ B) = P(X ∈ X, Y ∈ B) = ∫_X K(B, x) PX(dx).

Thus, in accordance with intuition, the unconditional probability of Y ∈ B can be found by averaging the conditional probability of Y ∈ B (that is, K(B, x)) over all values of x ∈ X.
Note also that if K(B, x) = µ(B) for all x ∈ X and a measure µ on B, then X and Y are independent and PX,Y(A × B) = PX(A) · µ(B) and

PY(B) = P(Y ∈ B) = ∫_X K(B, x) PX(dx) = µ(B).

“If the conditional distribution of Y given X does not depend on x then Y and X are independent.”
81/89
Existence of conditional distribution
One may ask: when does a conditional distribution of Y given X exist?
As we will more or less exclusively be dealing with random vectors (that take values in (Rp, B(Rp))), the conditional distribution of Y given X will always exist, cf. page 625 in Liese and Miescke (2008), Theorem A.37.
In fact, it suffices that Y is a so-called complete separable metric space equipped with the corresponding Borel σ-algebra.
82/89
Finding a conditional distribution via densities
In case the random variables under consideration have a joint density, it is easy to find a conditional distribution:

Suppose that X and Y are random variables with values in (X, A) and (Y, B), respectively.
Assume that there are σ-finite measures µ and ν on A and B, respectively, such that PX,Y = P ◦ (X, Y)⁻¹ ≪ µ ⊗ ν.
Set fX,Y = dPX,Y/d(µ ⊗ ν).
Let µ(A) = 0. Then

PX(A) = PX,Y(A × Y) = 0,

since (µ ⊗ ν)(A × Y) = µ(A) · ν(Y) = 0 and PX,Y ≪ µ ⊗ ν.
Thus, PX ≪ µ and similarly PY ≪ ν when PX,Y ≪ µ ⊗ ν.
83/89
Define

fX(x) := (dPX/dµ)(x) = ∫ fX,Y(x, y) ν(dy)

and

fY(y) := (dPY/dν)(y) = ∫ fX,Y(x, y) µ(dx),

which are called the marginal densities.

Observe that since PX ≪ µ and PY ≪ ν (by the previous slide), dPX/dµ and dPY/dν exist by the Radon-Nikodym Theorem.

You will show the two equalities (that are not definitions) in the exercises.
84/89
Definition 32 (Conditional distribution via densities)
The function
fY|X(y|x) = fX,Y(x, y)/fX(x)  if fX(x) > 0,
fY|X(y|x) = fY(y)             if fX(x) = 0,

is called the conditional density of Y, given X = x. The stochastic kernel K(B, x) = ∫_B fY|X(y|x) ν(dy) is called the regular conditional distribution of Y, given X = x, based on the conditional density.
Since many common distributions, such as the ones in the exponential family, are given by their densities, we can thus “easily” find conditional distributions for these.
85/89
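A grid-based sketch of this construction: discretize a joint density, form fY|X = fX,Y/fX, and check that each K(·, x) is (approximately) a probability measure; the density proportional to x + y on [0, 1]² is an arbitrary positive choice:

```python
import numpy as np

# Grid sketch of the conditional density: discretize a joint density,
# form f_{Y|X} = f_{X,Y} / f_X, and check that each K(., x) is a
# probability measure on the grid. The density is an arbitrary
# positive choice on [0, 1]^2.
x = np.linspace(0.01, 1.0, 100)
y = np.linspace(0.01, 1.0, 120)
dx, dy = x[1] - x[0], y[1] - y[0]
X, Y = np.meshgrid(x, y, indexing="ij")

f_joint = X + Y
f_joint /= np.sum(f_joint) * dx * dy       # normalize on the grid
f_X = np.sum(f_joint, axis=1) * dy         # marginal density of X
f_cond = f_joint / f_X[:, None]            # f_{Y|X}(y|x) on the grid

# Each conditional density integrates to 1, and f_cond * f_X
# reconstructs the joint density:
assert np.allclose(np.sum(f_cond, axis=1) * dy, 1.0)
assert np.allclose(f_cond * f_X[:, None], f_joint)
```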
Summarizing advantages of the measure theoretic approach
Measurability of functions is preserved under pointwise limits. This is not the case for continuity (underlying much of Riemann integration).
Limits and integration can easily be interchanged.
It provides a unified framework for dealing with random variables — no matter whether they have continuous or discrete distributions or neither.
86/89
Dirichlet’s function
The following is an example of how the Lebesgue integral handles pointwise limits better than the Riemann integral:
Consider Dirichlet's function D = 1_{Q∩[0,1]} on (R, B(R)).
D is not Riemann-integrable, as all lower Riemann sums are 0 and all upper Riemann sums are 1.
But ∫_0^1 D dλ1 = λ1(Q ∩ [0, 1]) = 0.
It gets worse: Since Q ∩ [0, 1] is countable, we can write it as Q ∩ [0, 1] = {xk : k ∈ N}.
Clearly, fn := ∑_{k=1}^n 1_{xk} ↑ D.
For every n, fn is Riemann integrable and ∫_0^1 fn(x) dx = 0.^7
7
Assume without loss of generality that x1 < . . . < xn such that there is a
strictly positive distance r between the xi . Hence, in any partition of [0, 1]
consisting of intervals of length at most r /2 at most n elements contain an xi .
Taking the infimum over such partitions one sees that the upper Riemann
integral is 0 [just like the lower Riemann integral clearly is] 87/89
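The failure of Riemann integrability can be made concrete: representing each tag point as q + c√2 with q, c rational (so the point is rational exactly when c = 0), Riemann sums over rational tags equal 1 while sums over irrational tags equal 0, at every mesh size:

```python
from fractions import Fraction

# Concrete Riemann sums for D = 1_{Q cap [0,1]}. A tag point is stored
# as q + c*sqrt(2) with q, c rational; it is rational exactly when
# c == 0. The mesh n and the shift eps are arbitrary (eps*sqrt(2) < 1/n
# keeps each shifted tag inside its subinterval).
def D(q, c):
    """Dirichlet's function evaluated at q + c*sqrt(2)."""
    return 1 if c == 0 else 0

n = 1000
mesh = Fraction(1, n)
eps = Fraction(1, 10 ** 9)

rational_tags = sum(D(Fraction(k, n), 0) * mesh for k in range(n))
irrational_tags = sum(D(Fraction(k, n), eps) * mesh for k in range(n))

assert rational_tags == 1     # every tag rational: every Riemann sum is 1
assert irrational_tags == 0   # every tag irrational: every Riemann sum is 0
```

Since both choices of tags are admissible for every partition, the Riemann sums cannot converge to a single value.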
But since D is not Riemann-integrable, we see that pointwise limits do not preserve Riemann-integrability and it does not make sense to write

lim_{n→∞} ∫_0^1 fn(x) dx = ∫_0^1 D(x) dx.

However, in accordance with the Monotone Convergence Theorem,^8

∫_0^1 fn dλ1 = ∑_{k=1}^n λ1({xk}) = 0 = ∫_0^1 D dλ1.
8: Alternatively, we can use the Dominated Convergence Theorem as stated in Theorem 20. 88/89
References
Axler, S. (2021): Measure, integration & real analysis, Springer.
Bartle, R. G. and D. R. Sherbert (2011): Introduction to
Real Analysis, Wiley, 4th ed.
Dudley, R. M. (2018): Real analysis and probability, CRC Press.
Hoffmann-Jørgensen, J. (1994): Probability with a view
toward Statistics, vol. 1, Chapman & Hall.
Kolmogorov, A. N. and S. V. Fomin (1970): Introductory
Real Analysis, Dover.
Lehmann, E. and G. Casella (1998): Theory of Point
Estimation, Springer.
Lehmann, E. and J. Romano (2005): Testing Statistical
Hypotheses, Springer.
Liese, F. and K.-J. Miescke (2008): Statistical Decision
Theory, Springer.
Resnick, S. (2019): A probability path, Springer.
Shiryaev, A. N. (1996): Probability, Springer, 2nd ed. 89/89