Some results from measure theory and

measure-theoretic probability

Bent Nielsen

University of Oxford

Michaelmas 2024
Part 1

Comments welcome — do let me know if you spot any mistakes

Revised version of slides written by Anders Kock 2022

1/89
Introduction

The present slide set contains some basic definitions and

results from measure theory and measure-theoretic probability.
Since we are going to use many of these results repeatedly, I
gather them here such that we can refer to them later.
We are not going to prove many (if any) results in this slide set,
and I do not expect you to be able to prove them.
Rather, the expectation is that you can apply the results and
that you have a place to look up what you need.
See the slides for the Advanced Mathematics course in TT on the
website for proofs of (almost all) results presented in the
present slide deck. Those slides also contain further results.
Some textbooks that provide a (much) more
detailed/complete treatment are mentioned next.

2/89
Some textbooks
Some examples of textbooks on measure-theoretic probability:
Resnick (2019) and Shiryaev (1996): Many examples, focus
on probability, good for self-study.
Kolmogorov and Fomin (1970) and Dudley (2018):
Comprehensive books on real analysis and probability.
The books Lehmann and Casella (1998) and Lehmann and
Romano (2005) also contain crash-course style material like
the one in these slides with particular emphasis on
applications in statistics and econometrics.
Liese and Miescke (2008): The appendix has a collection of
results. We shall frequently use the notation and presentation
style from this book.
The recent book Axler (2021) has a lot of intuition and many
examples of why the theory is the way it is [why σ-algebras,
why Lebesgue integration and not Riemann?]. It may be
interesting for self-study. Available for free on Axler’s website.
3/89
Why the measure-theoretic approach?

The measure-theoretic approach allows us to treat discrete
distributions, continuous distributions and distributions that are
neither of these in a unified way.
The class of measurable functions (and hence of random variables)
is closed under pointwise limits of sequences.
This is in contrast to the continuous functions underlying much of
Riemann integration.
We can interchange pointwise limits and integration under
rather general conditions. This is again in contrast to
Riemann integration.
You will find it used in many papers (also outside
econometrics), so it is useful not to be “scared” when
encountering σ-algebras, Lebesgue measures, the Lebesgue
integral and terminology like “almost everywhere” and
“almost surely”!
4/89
Plan

Introduce
1 σ-algebras.
2 Measures and their properties.
3 Measurable functions.
4 The Lebesgue integral.
Linearity, monotonicity, dealing with null sets.
The Monotone and Dominated Convergence Theorems.
Markov, Hölder, Jensen, Minkowski inequalities.
Induced measures and the substitution rule.
Tonelli’s Theorem.
5 Densities and conditioning.
The primary examples will be probability spaces and random
variables.

5/89
σ-algebras

Sets.
Definition of σ-algebras.
Generated σ-algebras.
Borel sets on R.

6/89
Sets¹

Basic definitions.
A = {a, b, c, . . . } is a set with elements a, b, c, . . . .
A ⊂ B, that is A is a subset of B, if every element in A
belongs to B.
A = B ⇔ A ⊂ B and B ⊂ A.
∅ is the empty set, which contains no elements.
Operations on sets.
A ∪ B is the union of A and B, consisting of all elements that are
in A or in B (or in both). A ∩ B is the intersection of A and B,
consisting of all elements that are in both A and B.
∪α Aα and ∩α Aα are unions and intersections. The index sets
can be finite or infinite in the countable or uncountable sense.
A, B are disjoint if A ∩ B = ∅.
Consider a subset A of a basic set R. Then, the complement
Ac is the set of elements of R that are not in A.
¹ Kolmogorov and Fomin (1970, p. 1–4)
7/89
σ-algebra

Consider a non-empty set X .

Definition 1 (σ-algebra)

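A collection $\mathcal{A}$ of subsets of $\mathcal{X}$ is called a σ-algebra in $\mathcal{X}$ if the
following conditions are met:
(σ1) $\mathcal{X} \in \mathcal{A}$.
(σ2) For all $A \in \mathcal{A}$ it also holds that $A^c \in \mathcal{A}$.
(σ3) For any sequence of sets $(A_n)_{n \in \mathbb{N}}$ in $\mathcal{A}$, it holds that $\bigcup_{n \in \mathbb{N}} A_n \in \mathcal{A}$.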

The sets in A are called the measurable sets.

The terminology “σ-algebra” comes from the fact that a
countable number of operations take place in (σ3).
The term σ-field is also frequently used instead of σ-algebra.

8/89
Examples of σ-algebras

The systems

{∅, X } , P(X ) = {A : A ⊆ X } and {∅, A, Ac , X }

are all σ-algebras.

{∅, X } and P(X ) are the smallest and largest σ-algebras in X ,

respectively: Every other σ-algebra must contain the elements
of {∅, X }; and the elements of every other σ-algebra are contained
in P(X ).
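The three examples can be checked mechanically when X is finite. The following is a minimal Python sketch under that assumption: on a finite set there are only finitely many distinct subsets, so closure under pairwise unions already gives (σ3). The helper names `power_set` and `is_sigma_algebra` and the concrete sets are illustrative and not part of the slides.

```python
from itertools import chain, combinations

def power_set(X):
    """All subsets of the finite set X, as frozensets."""
    items = list(X)
    subsets = chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    return {frozenset(s) for s in subsets}

def is_sigma_algebra(X, collection):
    """Check (σ1)-(σ3) for a collection of subsets of a finite set X."""
    X = frozenset(X)
    sets = {frozenset(A) for A in collection}
    if X not in sets:                                      # (σ1)
        return False
    if any(X - A not in sets for A in sets):               # (σ2)
        return False
    # (σ3): for finite X, pairwise unions suffice.
    return all(A | B in sets for A in sets for B in sets)

X = {1, 2, 3, 4}
A = {1, 2}
trivial = [set(), X]                      # {∅, X}
generated_by_A = [set(), A, X - A, X]     # {∅, A, A^c, X}
not_closed = [set(), {1}, X]              # the complement of {1} is missing
```

Here `is_sigma_algebra(X, not_closed)` fails at (σ2), which illustrates why one has to add sets to an arbitrary family before it becomes a σ-algebra.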

9/89
Some further stability properties of σ-algebras

The following lemma tells us that within a σ-algebra A one can

work freely with the usual set operations without “falling out”
of A as long as only countably many set operations are involved.
Lemma 2 (Some further stability properties of σ-algebras)
If A is a σ-algebra in X , then
1 ∅ ∈ A.
2 If $A_1, \dots, A_N$ is a finite collection of sets in $\mathcal{A}$,
then $\bigcup_{n=1}^{N} A_n \in \mathcal{A}$.
3 If A, B ∈ A, then A ∩ B ∈ A.
4 If A, B ∈ A, then A \ B ∈ A.
5 If (An )n∈N is a sequence of sets in A, then ∩n∈N An ∈ A.

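(1) follows from (σ1) and (σ2) since $\emptyset = \mathcal{X}^c$. (2) follows from (σ3) by setting $A_n = \emptyset$ for $n > N$.
(3) follows from $A \cap B = (A^c \cup B^c)^c \in \mathcal{A}$.

(4) follows from $A \setminus B = A \cap B^c$ and (3).
(5) follows from $\bigcap_{n \in \mathbb{N}} A_n = \big( \bigcup_{n \in \mathbb{N}} A_n^c \big)^c \in \mathcal{A}$.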
10/89
Intersection of σ-algebras

In the exercises you will show the following result, which plays an
important role in asserting the existence of “generated” σ-algebras.
Theorem 3 (Intersection of σ-algebras)

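Let $(\mathcal{A}_i)_{i \in I}$ be an (arbitrary) family of σ-algebras in $\mathcal{X}$. Then also
the system
\[ \bigcap_{i \in I} \mathcal{A}_i = \{ A \subseteq \mathcal{X} : A \in \mathcal{A}_i \text{ for all } i \in I \} \]
is a σ-algebra in $\mathcal{X}$.

Note that an “arbitrary family” can have a finite, countable or
uncountable index set $I$.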

11/89
Generated σ-algebras

Let a, b ∈ R such that a < b.

Because $\bigcap_{n=1}^{\infty} (a - \tfrac{1}{n}, b + \tfrac{1}{n}) = [a, b]$, the family of open
intervals $\mathcal{D}_{open}$ in R does not form a σ-algebra.
Because $\bigcup_{n=1}^{\infty} [a + \tfrac{1}{n}, b - \tfrac{1}{n}] = (a, b)$, the family of closed
intervals $\mathcal{D}_{closed}$ in R does not form a σ-algebra.

Given this observation, one can ask which subsets of R we must

add to a family of sets D in order to obtain a σ-algebra.
More generally, for any family of sets D in a set X , one can ask if
there exists a (smallest) σ-algebra containing it.
The answer turns out to be yes!

12/89
Generated σ-algebra and Borel σ-algebra

Let D be a family of subsets in X .

It is not difficult to see that the smallest σ-algebra
containing D is the intersection of all σ-algebras containing D
as in Theorem 3 [observe that D is always contained in P(X )].
This σ-algebra is called the σ-algebra generated by D and
written σ(D).

Definition 4 (Borel σ-algebra in Rd )

The Borel σ-algebra in Rd is the σ-algebra generated by the

system of closed half-planes. It is denoted by B(Rd ), that is

\[ \mathcal{B}(\mathbb{R}^d) = \sigma\big( \{ (-\infty, b_1] \times \cdots \times (-\infty, b_d] : b_1, \dots, b_d \in \mathbb{R} \} \big). \]

The elements of B(Rd ) are called Borel sets.

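Definition 4 is common in probability texts. In mathematics, a Borel σ-algebra is more
generally the σ-algebra generated by the open sets of a topological space; Definition 4
would then be an example of a Borel σ-algebra.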
13/89
Borel sets on R

Let a ≤ b.
Examples of Borel sets:
Half-open intervals: (a, b] = (−∞, b]\(−∞, a].
Open half-lines: (−∞, b) = ∪n∈N (−∞, b − 1/n].
Closed intervals: [a, b] = (−∞, b]\(−∞, a).
Open intervals: (a, b) = (−∞, b)\(−∞, a].
The above families of sets could also be used as generators for the Borel σ-algebra.
Further examples of Borel sets:
Points: {a} = [a, a].

One can construct a (complicated) set that is not Borel.

The construction requires the axiom of choice.
(Kolmogorov and Fomin, 1970, Section 3.7, Problem 26.7).

14/89
Measures and their properties

Measures.
Examples.
Properties.

15/89
Measures

The next topic is measures.

Measures give the “size” of sets.
Measures are defined on measurable spaces, which we define
next.

Definition 5 (Measurable space)

A measurable space is a pair (X , A), where X is a non-empty set
and A is a σ-algebra in X .

Example: (Rd , B(Rd )) is a measurable space.

16/89
Definition 6
Let (X , A) be a measurable space. A measure µ on (X , A) is a
mapping µ : A → [0, ∞] satisfying the following two conditions.
1 µ(∅) = 0.
2 µ is countably additive, that is for every sequence $(A_n)_{n \in \mathbb{N}}$ of
disjoint sets in $\mathcal{A}$ it holds that
\[ \mu\Big( \bigcup_{n \in \mathbb{N}} A_n \Big) = \sum_{n \in \mathbb{N}} \mu(A_n). \qquad (1) \]

If µ is a measure on (X , A), the triple (X , A, µ) is called a

measure space.

Note that the left-hand side in (1) is well-defined
since $\bigcup_{n \in \mathbb{N}} A_n \in \mathcal{A}$ (by σ3) and the right-hand side is
well-defined as it is a sum of non-negative terms.

17/89
Some terminology

If µ(X ) = 1, then we say that µ is a probability measure.

Thus, probability measures are just special measures. In this
case, we often write P instead of µ.
If µ(X ) < ∞ we say that µ is finite.
If there exists a sequence $(A_n)_{n \in \mathbb{N}}$ in $\mathcal{A}$ such that $\mu(A_n) < \infty$
for all $n \in \mathbb{N}$ and $\bigcup_{n \in \mathbb{N}} A_n = \mathcal{X}$, then we say that µ is σ-finite.

18/89
Examples of measures

P = N(0, 1) is a probability measure on (R, B(R)).

The Lebesgue measure $\lambda_d$ on $(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d))$ is the measure that
assigns “volumes to cubes” in the sense of
\[ \lambda_d\big( [a_1, b_1] \times \dots \times [a_d, b_d] \big) = (b_1 - a_1) \cdot \ldots \cdot (b_d - a_d) \]
for all $a_1, b_1, \dots, a_d, b_d \in \mathbb{R}$ with $a_i < b_i$, $i = 1, \dots, d$. The
Lebesgue measure is σ-finite since $\bigcup_{n \in \mathbb{N}} [-n, n]^d = \mathbb{R}^d$
and $\lambda_d([-n, n]^d) = (2n)^d < \infty$.

Consider the measurable space (X , A). For an element a ∈ X ,

we define the Dirac measure $\delta_a$ as
\[ \delta_a(A) = \begin{cases} 1 & \text{if } a \in A, \\ 0 & \text{if } a \in A^c. \end{cases} \]

19/89
Counting measure

Counting measure: Consider the measurable space (X , A). The

counting measure τ is then defined as

τ (A) = number of elements in A.

That is, τ (A) < ∞ if and only if A has finitely many elements.
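As a small illustration, the Dirac and counting measures can be written down directly for finite sets. This is a sketch under the assumption that sets are represented as finite Python sets; the names `dirac` and `counting` are illustrative. It checks additivity over three disjoint sets.

```python
def dirac(a):
    """Dirac measure δ_a: returns 1 if a lies in the set, else 0."""
    return lambda A: 1 if a in A else 0

def counting(A):
    """Counting measure τ for a finite set given as a Python set."""
    return len(A)

delta_2 = dirac(2)
A1, A2, A3 = {1, 2}, {3}, {4, 5}      # pairwise disjoint sets
union = A1 | A2 | A3

# Additivity over the disjoint sets, for both measures.
dirac_sum = delta_2(A1) + delta_2(A2) + delta_2(A3)
counting_sum = counting(A1) + counting(A2) + counting(A3)
```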

20/89
Fundamental properties of measures

Theorem 7
Let (X , A, µ) be a measure space.
1 µ is finitely additive, that is if $A_1, \dots, A_N$ is a finite collection of
disjoint sets in $\mathcal{A}$, then $\mu\big( \bigcup_{n=1}^{N} A_n \big) = \sum_{n=1}^{N} \mu(A_n)$.
2 If A, B ∈ A and A ⊆ B, then µ(A) ≤ µ(B).
3 If A, B ∈ A, A ⊆ B and µ(A) < ∞, then µ(B \ A) = µ(B) − µ(A).
4 For any sequence of sets $(A_n)_{n \in \mathbb{N}}$ in $\mathcal{A}$ it holds that
\[ \mu\Big( \bigcup_{n \in \mathbb{N}} A_n \Big) \le \sum_{n \in \mathbb{N}} \mu(A_n). \]

5 Let (An )n∈N be a sequence of sets in A satisfying A1 ⊆ A2 ⊆ . . ..

Then, µ(∪n∈N An ) = limn→∞ µ(An ).
6 Let (An )n∈N be a sequence of sets in A satisfying A1 ⊇ A2 ⊇ . . .
and µ(A1 ) < ∞. Then, µ(∩n∈N An ) = limn→∞ µ(An ).
21/89
Comments

The properties of measures are as one would hope/expect.

We shall primarily be dealing with probability measures.
Clearly, the finiteness requirements in parts (3) and (6) of
Theorem 7 are satisfied in this case.
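However, these finiteness requirements cannot be dispensed
with!
To see this in (6) consider the counting measure τ
on (N, P(N)). With $A_n = \{n, n+1, \dots\}$, $n \in \mathbb{N}$, it holds that
\[ \tau\Big( \bigcap_{n \in \mathbb{N}} A_n \Big) = \tau(\emptyset) = 0 \quad \text{but} \quad \lim_{n \to \infty} \tau(A_n) = \lim_{n \to \infty} \infty = \infty. \]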

(4) of Theorem 7 is also referred to as the union bound or

Boole’s inequality.

22/89
Measurable functions

Functions & continuity.

Measurable functions.
Probability spaces and random variables.
Stability properties when taking limits.

23/89
Functions²
Functions.
Let X and Y be arbitrary sets. A rule associating a unique
element y = f (x) ∈ Y with each element x ∈ X is said to be
a function f on X .
X is the domain of f .
The set {f (x) : x ∈ X } ⊂ Y is the range of f .
If X ̸⊂ R then f is sometimes said to be a mapping. Then f
maps X into Y .
Images and preimages.
If x ∈ X then y = f (x) is the image of x.
Every x ∈ X with y ∈ Y as its image is called a preimage of y .
The set of x ∈ X whose images belong to a set B ⊂ Y is the
preimage of B, denoted f −1 (B).
If no y ∈ B has a preimage, then f −1 (B) = ∅.
Example: Define the function $f : \mathbb{R} \to \mathbb{R}$ via $f(x) = x^2$. Then
$f^{-1}([1, 4]) = \{ x \in \mathbb{R} : x^2 \in [1, 4] \} = [-2, -1] \cup [1, 2]$.
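The preimage in this example can be approximated on a grid. The following is a rough sketch, assuming a finite grid of points in [−3, 3] stands in for R and that the set B is given as a membership predicate; `preimage`, `grid` and `in_1_4` are illustrative names.

```python
def preimage(f, domain, B):
    """Preimage f^{-1}(B) restricted to a finite domain; B is a membership predicate."""
    return [x for x in domain if B(f(x))]

grid = [k / 4 for k in range(-12, 13)]      # -3.0, -2.75, ..., 3.0
in_1_4 = lambda y: 1 <= y <= 4               # membership in [1, 4]
pre = preimage(lambda x: x * x, grid, in_1_4)

# Grid points of [-2, -1] ∪ [1, 2], for comparison.
expected = [x for x in grid if 1 <= abs(x) <= 2]
```

The grid points in `pre` are exactly those lying in [−2, −1] ∪ [1, 2], in line with the formula above.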

² Kolmogorov and Fomin (1970, p. 4-6, 44, 87)
24/89
Some results for images and preimages.
$f^{-1}(A \cap B) = f^{-1}(A) \cap f^{-1}(B)$.
$f^{-1}(A \cup B) = f^{-1}(A) \cup f^{-1}(B)$.
$f(A \cup B) = f(A) \cup f(B)$.
In general, $f(A \cap B) \neq f(A) \cap f(B)$.
Continuity.
Let the real function f map X ⊂ R into Y ⊂ R.
f is continuous at the point x0 if, ∀ϵ > 0, ∃δ > 0 such that
|f (x) − f (x0 )| < ϵ whenever |x − x0 | < δ.
f is continuous on X if it is continuous at all x0 ∈ X .
Theorem: f is continuous on X if and only if the preimage
f −1 (B) of any open set B ⊂ Y is open (in X ).
These considerations generalize to metric spaces and to
topological spaces.

25/89
Measurable functions

We shall now introduce measurable functions.

We shall link measurability and continuity of functions.

Definition 8 (Measurable function)

Let (X , A) and (Y, B) be measurable spaces and consider the
mapping f : X → Y. We say that f is measurable (or more
precisely A-B-measurable) if

f −1 (B) ∈ A for all B ∈ B.

The larger A is, the easier it is to be measurable.

The larger B is, the harder it is to be measurable.

26/89
Probability spaces and random variables
Let us put what we have introduced so far into a probabilistic
context.
Definition 9 (Probability space)
A probability space is a triple (Ω, F, P), where Ω is a non-empty
set, F is a σ-algebra in Ω and P is a probability measure on F.
The sets in F are called events and for A ∈ F we
interpret P(A) ∈ [0, 1] as the probability of the event A occurring.

Definition 10 (Random variable)

Let (Ω, F, P) be a probability space and consider the measurable
space (R, B(R)). A random variable is a mapping X : Ω → R that
is F-B(R)-measurable.

Thus, a random variable X is just a measurable function defined

on a probability space mapping into R. Random
vectors X : Ω → Rd are defined analogously.
27/89
Probability measures and distribution functions

Consider the probability space (R, B(R), P).

Definition 4 shows B(R) = σ((−∞, b] for b ∈ R).

This implies that a distribution function $F_P$ exists so that

$F_P(b) = P((-\infty, b])$.

Two results can be proved.

The function FP generates exactly one probability measure on
(R, B(R)), namely P.
Any distribution function F generates exactly one probability
measure on (R, B(R)).
These results generalize to Rd .

28/89
Continuity implies measurability

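The Euclidean norm is $\|x\| = \big( \sum_{i=1}^{d} x_i^2 \big)^{1/2}$ for $x \in \mathbb{R}^d$.

Theorem 11 (Continuity implies measurability)

Consider $\mathbb{R}^d$ and $\mathbb{R}^k$ with Euclidean norms. Then every
continuous mapping $f : \mathbb{R}^d \to \mathbb{R}^k$ is $\mathcal{B}(\mathbb{R}^d)$-$\mathcal{B}(\mathbb{R}^k)$-measurable.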

In words: Every continuous function is Borel measurable.

29/89
Non-continuous measurable functions

Although all continuous functions are measurable, not all

measurable functions are continuous.
In this sense, measurability is a property that more functions
have than continuity.

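Let $(\mathcal{X}, \mathcal{A})$ be a measurable space and define for every $A \in \mathcal{A}$ the
indicator function $1_A : \mathcal{X} \to \mathbb{R}$ as
\[ 1_A(x) = \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{if } x \in A^c. \end{cases} \]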

30/89
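Let $A \in \mathcal{A}$. Clearly, for any $B \subseteq \mathbb{R}$
\[ 1_A^{-1}(B) = \begin{cases} \mathcal{X} & \text{if } 0, 1 \in B, \\ A & \text{if } 1 \in B \text{ and } 0 \in B^c, \\ A^c & \text{if } 0 \in B \text{ and } 1 \in B^c, \\ \emptyset & \text{if } 0, 1 \in B^c. \end{cases} \]

Thus, $1_A$ is $\mathcal{A}$-$\mathcal{F}$-measurable irrespective of which σ-algebra $\mathcal{F}$ the set $\mathbb{R}$ is
equipped with (e.g. $\mathcal{F} = \mathcal{B}(\mathbb{R})$).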
Even the discontinuous indicator function is measurable!

In contrast, the function 1A for A ̸∈ A is non-measurable.
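The four cases can be checked on a small finite example. This is a sketch assuming a four-point set with the σ-algebra {∅, A, A^c, X}; the function names and the sets chosen are illustrative. Since an indicator only takes the values 0 and 1, it suffices to test the four target sets that contain both, only 1, only 0, or neither of these values.

```python
def preimage(f, X, B):
    """Preimage f^{-1}(B) as a frozenset, for a finite set X."""
    return frozenset(x for x in X if f(x) in B)

def is_measurable(f, X, sigma_algebra, target_sets):
    """True if f^{-1}(B) lies in the σ-algebra for every tested target set B."""
    sets = {frozenset(S) for S in sigma_algebra}
    return all(preimage(f, X, B) in sets for B in target_sets)

X = {1, 2, 3, 4}
A = {1, 2}
sigma_algebra = [set(), A, X - A, X]
target_sets = [{0, 1}, {1}, {0}, set()]          # the four cases for B

indicator_A = lambda x: 1 if x in A else 0       # A belongs to the σ-algebra
indicator_C = lambda x: 1 if x in {1} else 0     # {1} does not belong to it
```

With these choices `indicator_A` passes the check while `indicator_C` fails, because its preimage of {1} is the set {1}, which is not in the σ-algebra.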

31/89
Stability properties of measurable functions
Let M(A) = {f : X → R : f is A-B(R)-measurable}.
In the exercises you will show:
Theorem 12 (Stability properties of measurable functions)
1 If f1 , . . . , fd : X → R are elements of M(A) and φ : Rd → R
is B(Rd )-B(R)-measurable, then

φ(f1 , . . . , fd ) : X → R

defined via $x \mapsto \varphi(f_1(x), \dots, f_d(x))$ is an element of M(A).

2 If f , g ∈ M(A), and c ∈ R then the functions

cf , f + g, f · g, f ∧ g, f ∨g

are elements of M(A). In particular, M(A) is a vector space.

Thus, measurability is preserved under transformations.

32/89
Measurability and limits

One frequently deals with sequences of measurable functions.

We shall see that the pointwise limit of a sequence of
functions in M(A) is also an element of M(A), provided that
the pointwise limits are elements of R for all x ∈ X .
This is very useful as the Lebesgue integral is only defined for
measurable functions.
Recall that continuity is not generally maintained under
pointwise limits, cf. $f_n : [0, 1] \to [0, 1]$ where $f_n(x) = x^n$.
Then $f_n(x) \to 1_{\{1\}}(x)$, which is discontinuous at 1.
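A quick numerical look at this sequence, as a sketch: the evaluation points and the index n = 2000 are arbitrary choices for illustration.

```python
def f(n, x):
    """f_n(x) = x^n on [0, 1]."""
    return x ** n

points = [0.0, 0.5, 0.9, 1.0]
# Pointwise limit: the indicator of the singleton {1}.
limit = [1.0 if x == 1.0 else 0.0 for x in points]
values = [f(2000, x) for x in points]
```

Each f_n is continuous, but the values at x < 1 collapse to 0 while f_n(1) = 1 for every n, so the limit has a jump at 1.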

33/89
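Let $\overline{\mathbb{R}} = \{-\infty\} \cup \mathbb{R} \cup \{\infty\}$ be the extended real line. Define
\[ \overline{\mathcal{M}}(\mathcal{A}) = \{ f : \mathcal{X} \to \overline{\mathbb{R}} : f \text{ is } \mathcal{A}\text{-}\mathcal{B}(\overline{\mathbb{R}})\text{-measurable} \}, \]
\[ \overline{\mathcal{M}}(\mathcal{A})^+ = \{ f \in \overline{\mathcal{M}}(\mathcal{A}) : f(x) \ge 0 \text{ for all } x \in \mathcal{X} \}. \]

This is useful since, e.g., $\sup_{n \in \mathbb{N}} f_n(x)$ could be ∞ for some $x \in \mathcal{X}$
even if $f_n(x) \in \mathbb{R}$ for all $n \in \mathbb{N}$.
Theorem 13 (Limits and measurability)
1 Let $(\mathcal{X}, \mathcal{A})$ be a measurable space and let $(f_n)_{n \in \mathbb{N}}$ be a sequence of
functions in $\overline{\mathcal{M}}(\mathcal{A})$. Then the functions
\[ \inf_{n \in \mathbb{N}} f_n, \quad \sup_{n \in \mathbb{N}} f_n, \quad \liminf_{n \to \infty} f_n \quad \text{and} \quad \limsup_{n \to \infty} f_n \]
are also elements of $\overline{\mathcal{M}}(\mathcal{A})$. Finally, if $f_n$ is pointwise convergent,
that is if $f(x) := \lim_{n \to \infty} f_n(x)$ exists in $\overline{\mathbb{R}}$ for all $x \in \mathcal{X}$, then also
the limiting function $f \in \overline{\mathcal{M}}(\mathcal{A})$.
2 If $(f_n)_{n \in \mathbb{N}}$ is a sequence of functions in $\mathcal{M}(\mathcal{A})$ and
$f(x) := \lim_{n \to \infty} f_n(x)$ exists in $\mathbb{R}$ for all $x \in \mathcal{X}$ then also $f \in \mathcal{M}(\mathcal{A})$.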
34/89
Big picture

Don’t get too bogged down in measurability details.

This is not essential for us, though I shall occasionally point
out measurability issues, e.g. in connection with extremum
estimation (measurable selection).
The most important thing is that we can properly apply the,
luckily intuitive, results for Lebesgue integration to come next.

35/89
Lebesgue integrals

Definitions.
Linearity.
Monotone Convergence Theorem.
Dominated Convergence Theorem.
Inequalities.
Induced measure and substitution.
Product measures and double integrals.

36/89
The Lebesgue integral

We shall now introduce the Lebesgue integral and study its main
properties. It has several advantages over the Riemann integral.
1 It is defined for a broader class of functions.
2 It is much more stable under pointwise limits of sequences of
functions. That is, pointwise limits and integration can often
be interchanged. For the Riemann integral we typically need
uniform convergence.
3 The Lebesgue integral is easily defined for functions on an
arbitrary measure space (X , A, µ). The Riemann integral is
defined for functions on R. This is important in probability
theory where the random variables are defined on a probability
space (Ω, F, P) and expected values are Lebesgue integrals
\[ E X = \int X(\omega) \, P(d\omega). \]

37/89
The Lebesgue integral
Given a measure space $(\mathcal{X}, \mathcal{A}, \mu)$, $a_i \in \mathbb{R}$
and $A_i \in \mathcal{A}$, $i = 1, \dots, n$, we call $s : \mathcal{X} \to \mathbb{R}$ defined via
\[ s(x) = \sum_{i=1}^{n} a_i 1_{A_i}(x) \]
a simple function.
Denote by SM(A) and SM(A)+ , respectively, the set of
simple and non-negative simple functions.
For s ∈ SM(A)+ one defines
\[ \int s \, d\mu = \sum_{i=1}^{n} a_i \mu(A_i) \in [0, \infty], \]
which is well-defined since $A_i \in \mathcal{A}$ and $a_i \ge 0$.

In particular, it is important that
\[ \int 1_A \, d\mu = \mu(A), \qquad A \in \mathcal{A}. \]
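The definition of the integral of a non-negative simple function translates directly into code. Here is a minimal sketch, assuming a finite measure given by point masses on {1, ..., 6}; the weights (a fair die) and the function `integrate_simple` are illustrative assumptions, not from the slides.

```python
def integrate_simple(terms, mu):
    """Integral of s = Σ a_i 1_{A_i} with a_i ≥ 0, computed as Σ a_i µ(A_i)."""
    return sum(a * mu(A) for a, A in terms)

# Hypothetical measure on X = {1, ..., 6}: equal point masses of 1/6.
weights = {x: 1 / 6 for x in range(1, 7)}
mu = lambda A: sum(weights[x] for x in A)

s = [(2.0, {1, 2, 3}), (5.0, {3, 4})]       # s = 2·1_{1,2,3} + 5·1_{3,4}
value = integrate_simple(s, mu)              # 2·(3/6) + 5·(2/6)

# Cross-check: sum s(x) against the point masses directly.
pointwise = sum(sum(a for a, A in s if x in A) * w for x, w in weights.items())
```

The sets A_i need not be disjoint: both ways of computing give the same value, which reflects that the integral does not depend on the chosen representation of s.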
38/89
The Lebesgue integral on M(A)+
Definition 14
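Let $(\mathcal{X}, \mathcal{A}, \mu)$ be a measure space and $f : \mathcal{X} \to [0, \infty]$ a function
in $\overline{\mathcal{M}}(\mathcal{A})^+$. We then define the µ-integral $\int f \, d\mu$ of f via
\[ \int f \, d\mu := \sup\Big\{ \int s \, d\mu : s \in SM(\mathcal{A})^+, \ s \le f \Big\} \in [0, \infty]. \]

It is often useful that for $f \in \overline{\mathcal{M}}(\mathcal{A})^+$ there exists a sequence of
non-negative simple functions $s_n$ such that $s_n(x) \uparrow f(x)$ for
all $x \in \mathcal{X}$ and that
\[ \int f \, d\mu = \lim_{n \to \infty} \int s_n \, d\mu. \]

In particular, as n → ∞, the non-negative simple functions
\[ s_n(x) := \sum_{j=1}^{n 2^n} \frac{j-1}{2^n} 1_{f^{-1}([\frac{j-1}{2^n}, \frac{j}{2^n}))}(x) + n 1_{f^{-1}([n, \infty])}(x) \uparrow f(x). \]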
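The dyadic approximation can be evaluated pointwise: if f(x) < n, then s_n(x) is the left endpoint of the dyadic interval of length 2^{-n} containing f(x), and otherwise s_n(x) = n. This is a short sketch, with f(x) = x² and the evaluation points chosen purely for illustration.

```python
import math

def s_n(f_x, n):
    """Value of the n-th dyadic simple approximation where f takes the value f_x ≥ 0."""
    if f_x >= n:
        return float(n)
    # f_x lies in [(j-1)/2^n, j/2^n) for exactly one j; return the left endpoint.
    return math.floor(f_x * 2 ** n) / 2 ** n

f = lambda x: x * x
xs = [0.0, 0.3, 1.7, 2.5, 10.0]
approx = {x: [s_n(f(x), n) for n in range(1, 12)] for x in xs}
```

For each x the values in `approx[x]` increase with n and stay below f(x); once n exceeds f(x), the gap is smaller than 2^{-n}.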
39/89
Integrating general functions

We have defined the µ-integral for $f \in \overline{\mathcal{M}}(\mathcal{A})^+$.

To integrate $f \in \overline{\mathcal{M}}(\mathcal{A})$, that is functions that need not be
non-negative, define
\[ f^+(x) = \max(f(x), 0) \quad \text{and} \quad f^-(x) = -\min(f(x), 0). \]

Observe that $f^+$ and $f^-$ are non-negative and
\[ f = f^+ - f^- \quad \text{and} \quad |f| = f^+ + f^-. \]

Furthermore, by an argument based on part 2 of Theorem 12
it follows that $f^+$ and $f^-$ are elements of $\overline{\mathcal{M}}(\mathcal{A})^+$ [for which
we have defined the integral].³

³ Theorem 12 is for elements of $\mathcal{M}(\mathcal{A})$ rather than $\overline{\mathcal{M}}(\mathcal{A})$.
40/89
Definition 15 (L(µ) and L1 (µ))
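For a measure space $(\mathcal{X}, \mathcal{A}, \mu)$ we define
\[ \mathcal{L}(\mu) := \Big\{ f \in \mathcal{M}(\mathcal{A}) : \int f^+ d\mu \wedge \int f^- d\mu < \infty \Big\}, \]
\[ \mathcal{L}^1(\mu) := \Big\{ f \in \mathcal{M}(\mathcal{A}) : \int f^+ d\mu \vee \int f^- d\mu < \infty \Big\}. \]

Observe that $\mathcal{L}^1(\mu)$ consists of functions taking real values only.

Definition 16 (The general integral)

For a function $f \in \mathcal{L}(\mu)$ we define
\[ \int f \, d\mu := \int f^+ d\mu - \int f^- d\mu. \]

Observe that since $f \in \mathcal{L}(\mu)$, we never run into “∞ − ∞” issues.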

41/89
µ-a.e. and “almost surely”

It turns out that the µ-integral “does not care about null sets”.
If $f, g \in \mathcal{L}(\mu)$ and $\mu(f \neq g) = 0$, then
\[ \int f \, d\mu = \int g \, d\mu. \]

Definition 17
A subset N of X is called a µ-null set if there exists an A ∈ A such
that

N ⊆ A and µ(A) = 0.

Remark: Consider the null sets for the Lebesgue measure on R.

A σ-algebra can be generated from those and the Borel σ-algebra.
The Lebesgue measure can be extended to that σ-algebra.
Even so, there exist sets in R that are not in that σ-algebra.
Those sets are said to be non-measurable.
42/89
Examples of λ1 -null sets

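Consider the measure space $(\mathbb{R}, \mathcal{B}(\mathbb{R}), \lambda_1)$. As mentioned
before, the singleton {x} is a Borel set: For $x \in \mathbb{R}$
\[ \{x\} = \bigcap_{n=1}^{\infty} [x - n^{-1}, x + n^{-1}] \in \mathcal{B}(\mathbb{R}), \]
such that by part (6) of Theorem 7
\[ \lambda_1(\{x\}) = \lim_{n \to \infty} \lambda_1\big( [x - n^{-1}, x + n^{-1}] \big) = \lim_{n \to \infty} 2/n = 0. \]

The set of rational numbers Q can be formed by countable
unions of singletons. Thus, Q is a Borel set and
$\lambda_1(\mathbb{Q}) = \lambda_1\big( \bigcup_{x \in \mathbb{Q}} \{x\} \big) = \sum_{x \in \mathbb{Q}} \lambda_1(\{x\}) = 0$.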

43/89
Consider (X , A, µ). We say that a property holds for µ-almost
all x ∈ X if the property holds for all x ∈ X \ N, where either
N is a µ-null set [common in mathematics], or
N ∈ A and µ(N) = 0 [common in probability].
We also say the property holds µ-almost everywhere (a.e.).
In probability, we say almost surely (a.s.) or with probability
one.
Examples:
If µ(x ∈ X : f (x) ̸= g (x)) = 0 we say that f = g µ-almost
everywhere, or for µ-almost every x or µ-a.e.
If µ(x ∈ X : limn→∞ fn (x) does not exist) = 0, we say that fn
converges for µ-almost every x.

44/89
Linearity over L1 (µ)

Prior to stating linearity of the integral, let us note that the
following expressions are used interchangeably for $\int f \, d\mu$:
\[ \int f \, d\mu, \quad \int f(x) \, \mu(dx), \quad \int_{\mathcal{X}} f(x) \, \mu(dx), \quad \int_{\mathcal{X}} f(x) \, d\mu(x). \]

45/89
Theorem 18 (Linearity and other properties of the integral)

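Let $(\mathcal{X}, \mathcal{A}, \mu)$ be a measure space, let $f, g \in \mathcal{L}^1(\mu)$ and $a \in \mathbb{R}$. Then,

1 $\int a f \, d\mu = a \int f \, d\mu$.
2 $\int (f + g) \, d\mu = \int f \, d\mu + \int g \, d\mu$.
3 If f ≤ g µ-a.e. then $\int f \, d\mu \le \int g \, d\mu$ and
\[ \int f \, d\mu = \int g \, d\mu \iff f = g \ \mu\text{-a.e.} \]
4 If f ≥ 0 µ-a.e. then
\[ \int f \, d\mu = 0 \iff f = 0 \ \mu\text{-a.e.} \]
and
\[ \int f \, d\mu < \infty \implies f < \infty \ \mu\text{-a.e.} \]
5 $\big| \int f \, d\mu \big| \le \int |f| \, d\mu$.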

Observe that we do not run into ∞ − ∞ issues in calculating f + g

in (2) since by virtue of f , g ∈ L1 (µ), they take real values only.
46/89
Integrating over subsets A ∈ A

Let us also mention that for $f \in \mathcal{M}(\mathcal{A})$ and $A \in \mathcal{A}$ such
that $f 1_A \in \mathcal{L}(\mu)$ we define
\[ \int_A f \, d\mu := \int f 1_A \, d\mu. \]

47/89
Interchanging limits and integration

One of the appeals of the µ-integral is the ease with which it

allows interchanging integration and limits.
This is another advantage over the Riemann integral, which
generally only allows interchanging integration and limits on
bounded intervals under uniform rather than pointwise
convergence of a sequence of functions fn to f .
We shall now state and illustrate by examples two important
theorems on interchanging the order of limits and integration:
1 The Monotone Convergence Theorem.
2 The Dominated Convergence Theorem.

48/89
Monotone Convergence Theorem

The Monotone Convergence Theorem gives sufficient conditions

for when we can interchange limits and integration.
Theorem 19 (Monotone Convergence Theorem)
Let $f, f_1, f_2, f_3, \dots$ be functions in $\overline{\mathcal{M}}(\mathcal{A})^+$
satisfying $f_1 \le f_2 \le f_3 \le \dots$ µ-a.e. Then, if $f = \lim_{n \to \infty} f_n$ µ-a.e.
it holds that
\[ \lim_{n \to \infty} \int f_n \, d\mu = \int f \, d\mu. \]

The Monotone Convergence Theorem allows us to

interchange integration and pointwise limits for an increasing
sequence of non-negative functions.
Note that we only need f1 (x) ≤ f2 (x) ≤ . . . for µ-almost all x.

49/89
Prior to illustrating the Monotone Convergence Theorem, let
us note that in case (Ω, F, P) is a probability space upon
which a random variable X with values in (R, B(R)) is
defined, then we write
\[ E X := \int_{\Omega} X(\omega) \, P(d\omega) = \int_{\Omega} X \, dP, \qquad X \in \mathcal{L}(P), \]
for the expected value of X .

The distribution PX of X is the probability measure on B(R)
defined as

PX (A) := P(X ∈ A) = P(ω ∈ Ω : X (ω) ∈ A) = P(X −1 (A)),

for A ∈ B(R). Observe that X −1 (A) ∈ F.

[We shall see that the distribution of a random variable X is
nothing else than a so-called image measure.]

50/89
Example: AR(1)

Let (εt )t∈Z be a sequence of random variables on the

probability space (Ω, F, P) with values in (R, B(R)) such
that E |εt | ≤ C < ∞ for all t ∈ Z and some C > 0.
Let

Yt = αYt−1 + εt , t ∈ N, |α| < 1.

We know that the solution to this autoregressive equation is
\[ \sum_{i=0}^{\infty} \alpha^i \varepsilon_{t-i}. \]

But is this series well-defined?

51/89
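Consider first $\sum_{i=0}^{\infty} |\alpha|^i |\varepsilon_{t-i}|$.
Clearly, $\lim_{N \to \infty} \sum_{i=0}^{N} |\alpha|^i |\varepsilon_{t-i}|$ exists in [0, ∞].
Since $\sum_{i=0}^{N} |\alpha|^i |\varepsilon_{t-i}|$ is $\mathcal{F}$-$\mathcal{B}(\mathbb{R})$-measurable (by Theorem 12)
and using that limits preserve measurability (by Theorem 13),
we conclude that $\lim_{N \to \infty} \sum_{i=0}^{N} |\alpha|^i |\varepsilon_{t-i}|$
is $\mathcal{F}$-$\mathcal{B}(\overline{\mathbb{R}})$-measurable.
Since $\sum_{i=0}^{N} |\alpha|^i |\varepsilon_{t-i}| \uparrow \sum_{i=0}^{\infty} |\alpha|^i |\varepsilon_{t-i}|$, the Monotone
Convergence Theorem (and linearity of the integral) yields
that
\[ E \sum_{i=0}^{\infty} |\alpha|^i |\varepsilon_{t-i}| = \lim_{N \to \infty} \sum_{i=0}^{N} |\alpha|^i E |\varepsilon_{t-i}| \le C / (1 - |\alpha|) < \infty. \]
Thus, $\sum_{i=0}^{\infty} |\alpha|^i |\varepsilon_{t-i}| < \infty$ P-a.s. [cf. Theorem 18, part 4]
Since $\sum_{i=0}^{\infty} \alpha^i \varepsilon_{t-i}$ converges absolutely P-a.s., it also
converges P-a.s.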
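The bound in the Monotone Convergence step is a geometric series, which can be checked numerically. This is a sketch with illustrative values α = −0.8 and C = 2 (not taken from the slides), replacing each E|ε_{t−i}| by its upper bound C; the function name `partial_bound` is hypothetical.

```python
def partial_bound(alpha, C, N):
    """Upper bound Σ_{i=0}^N |α|^i C on E Σ_{i=0}^N |α|^i |ε_{t-i}|."""
    return sum(abs(alpha) ** i * C for i in range(N + 1))

alpha, C = -0.8, 2.0                       # illustrative values only
bounds = [partial_bound(alpha, C, N) for N in range(0, 200)]
limit = C / (1 - abs(alpha))               # C / (1 - |α|)
```

The partial bounds increase in N and approach C/(1 − |α|), which is the quantity dominating the expectation in the display above.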

52/89
Lebesgue’s Dominated Convergence Theorem

Theorem 20 (Lebesgue’s Dominated Convergence Theorem)

Let $f, f_1, f_2, f_3, \dots$ be functions in $\mathcal{M}(\mathcal{A})$ such
that $f = \lim_{n \to \infty} f_n$ µ-a.e. If there exists a function $g \in \overline{\mathcal{M}}(\mathcal{A})^+$
such that
1 $|f_n| \le g$ µ-a.e. for all $n \in \mathbb{N}$,
2 $\int g \, d\mu < \infty$,
then
\[ \lim_{n \to \infty} \int f_n \, d\mu = \int f \, d\mu \quad \text{and} \quad \lim_{n \to \infty} \int |f_n - f| \, d\mu = 0. \]

A function g satisfying the conditions of the theorem is often
referred to as an integrable majorant of the sequence $(f_n)_{n \in \mathbb{N}}$,
i.e. “g dominates the $f_n$”, hence the name of the theorem.

53/89
Example
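Consider the measure space $(\mathbb{R}, \mathcal{B}(\mathbb{R}), \lambda_1)$ and let $f \in \mathcal{L}^1(\lambda_1)$. We
show that
\[ \lim_{n \to \infty} \frac{1}{2n} \int_{[-n,n]} f(x) \, \lambda_1(dx) = \lim_{n \to \infty} \int \frac{1}{2n} f(x) 1_{[-n,n]}(x) \, \lambda_1(dx) = 0 \qquad (2) \]
via the Dominated Convergence Theorem.

Observe that
\[ f_n(x) := \frac{1}{2n} f(x) 1_{[-n,n]}(x) \to 0 \quad \text{for all } x \in \mathbb{R}. \]
Furthermore, for all $n \in \mathbb{N}$,
\[ |f_n(x)| \le \frac{1}{2} |f(x)| \quad \text{and} \quad \int \frac{1}{2} |f(x)| \, \lambda_1(dx) < \infty \]
since $f \in \mathcal{L}^1(\lambda_1)$. The Dominated Convergence Theorem now
yields (2).
54/89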
Interchanging integration and differentiation

Theorem 21 (Interchanging integration and differentiation)

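Let $f_t(x)$ be a function in $\mathcal{M}(\mathcal{A})$ where t is a continuous, real
parameter, so that a < t < b for some −∞ ≤ a < b ≤ ∞.
Further, $f_t(x)$ is in $\mathcal{L}^1(\lambda)$, where λ is the Lebesgue measure.
Suppose $f_t(x)$ has a derivative with respect to t satisfying $\big| \frac{\partial}{\partial t} f_t(x) \big| \le g(x)$ λ-a.e.
for a < t < b where $g \in \overline{\mathcal{M}}(\mathcal{A})^+$ is integrable. Then
\[ \frac{\partial}{\partial t} \int f_t(x) \, \lambda(dx) = \int \frac{\partial}{\partial t} f_t(x) \, \lambda(dx). \]

Proof. We will use the Dominated Convergence Theorem.

The derivative is defined as the limit
\[ \frac{\partial}{\partial t} f_t(x) = \lim_{h \downarrow 0} \frac{f_{t+h}(x) - f_t(x)}{h}. \]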

55/89
By the mean value theorem we get
\[ \frac{f_{t+h}(x) - f_t(x)}{h} = \frac{1}{h} \, h \, \frac{\partial}{\partial t} f_t(x) \Big|_{t = t^*} = \frac{\partial}{\partial t} f_t(x) \Big|_{t = t^*} \]
for an intermediate point $t < t^* < t + h$. Now, as a < t < b, then
for small h we get $a < t^* < b$. Thus, we can bound λ-a.e.
\[ \Big| \frac{f_{t+h}(x) - f_t(x)}{h} \Big| \le g(x), \]
which is integrable.
Thus, we can construct a dominated sequence of functions,
fn (x) = (ft+h (x) − ft (x))/h with h = 1/n and interchange limits
and integration by appealing to the Dominated Convergence
Theorem.

56/89
Relationship to Riemann integral

It is often useful that integrals with respect to the Lebesgue

measure can be calculated via the Riemann integral over bounded
intervals.
Theorem 22
Let a and b be real numbers such that a < b and let $f : [a, b] \to \mathbb{R}$
be Riemann integrable and $\mathcal{B}([a, b])$-$\mathcal{B}(\mathbb{R})$-measurable. Then,
\[ \int_{[a,b]} f \, d\lambda_1 = \int_a^b f(x) \, dx, \]
the right-hand integral being the Riemann integral.

In the above $\mathcal{B}([a, b]) := \{ A \cap [a, b] : A \in \mathcal{B}(\mathbb{R}) \}$.

57/89
A useful consequence

Consider the measure space (R, B(R), λ1 ) and let f ∈ L1 (λ1 ).

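Then, by the Dominated Convergence Theorem,
\[ \int f \, d\lambda_1 = \lim_{n \to \infty} \int f 1_{[-n,n]} \, d\lambda_1 = \lim_{n \to \infty} \int_{[-n,n]} f \, d\lambda_1, \qquad (3) \]
where we used that $f 1_{[-n,n]} \to f$ and $|f 1_{[-n,n]}| \le |f|$ for all $n \in \mathbb{N}$.

Thus, if f is also Riemann integrable, we can calculate the
Lebesgue integral $\int f \, d\lambda_1$ as
\[ \int f \, d\lambda_1 = \lim_{n \to \infty} \int_{-n}^{n} f(x) \, dx. \]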

Of course, (3) remains valid in a situation where the first equality

can be ensured via the Monotone Convergence Theorem (such as
when f ≥ 0).
58/89
Example
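Let us determine the value of the integral $\int_0^{\infty} x e^{-x} \, \lambda_1(dx)$.
Observe that $x e^{-x} 1_{[0,n)}(x) \uparrow x e^{-x} 1_{[0,\infty)}(x)$ for all $x \in \mathbb{R}$. Thus,
by the Monotone Convergence Theorem and Theorem 22
\[ \int_0^{\infty} x e^{-x} \, \lambda_1(dx) = \int x e^{-x} 1_{[0,\infty)}(x) \, \lambda_1(dx) = \lim_{n \to \infty} \int x e^{-x} 1_{[0,n)}(x) \, \lambda_1(dx) \]
\[ = \lim_{n \to \infty} \int_0^n x e^{-x} \, \lambda_1(dx) = \lim_{n \to \infty} \int_0^n x e^{-x} \, dx, \]
the last integral being Riemann. Using partial integration
\[ \int_0^n x e^{-x} \, dx = [-x e^{-x}]_0^n + \int_0^n e^{-x} \, dx = -n e^{-n} - e^{-n} + 1, \]
and so
\[ \int_0^{\infty} x e^{-x} \, \lambda_1(dx) = \lim_{n \to \infty} \int_0^n x e^{-x} \, dx = \lim_{n \to \infty} \big( 1 - (n + 1) e^{-n} \big) = 1. \]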
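The closed form 1 − (n + 1)e^{−n} can be compared with a numerical Riemann sum. This is a rough sketch using a midpoint rule; the helper `midpoint_integral`, the step count and the values of n are arbitrary illustrative choices.

```python
import math

def midpoint_integral(f, a, b, steps=20000):
    """Midpoint Riemann sum of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

f = lambda x: x * math.exp(-x)
closed_form = lambda n: 1 - (n + 1) * math.exp(-n)    # formula derived above

ns = [1, 2, 5, 10, 20, 40]
numeric = [midpoint_integral(f, 0.0, n) for n in ns]
exact = [closed_form(n) for n in ns]
```

The truncated integrals increase with n and approach 1, as the Monotone Convergence Theorem predicts.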
59/89
Improper integrals

Some integrals exist as improper Riemann integrals but not as
Lebesgue integrals. An example is the Dirichlet integral. It can be
shown that
\[ \int_{-\infty}^{\infty} \frac{\sin x}{x} \, dx = \pi \]
as an improper Riemann integral. The argument involves showing
that $\lim_{b \to \infty} \int_0^b x^{-1} \sin x \, dx$ exists. However, it can also be shown
that
\[ \int_{-\infty}^{\infty} \Big| \frac{\sin x}{x} \Big| \, dx = \infty. \]
Thus, the function (sin x)/x is not Lebesgue integrable.
The generalized Riemann integral has Riemann, improper Riemann
and Lebesgue integrals as special cases (Bartle and Sherbert,
2011).

60/89
Regularity conditions: Example

Let us illustrate a situation where one can’t interchange integration

and limits. Consider the probability space

$([0, 1], \mathcal{B}([0, 1]), \lambda_1)$ and $f_n(x) = n 1_{[0, 1/n]}(x)$, $n \in \mathbb{N}$.

Clearly, fn (x) → 0 for λ1 -a.e. x ∈ [0, 1]. [Observe that {0} is a

Lebesgue null set].
However, [use either the above relationship to the
Riemann-integral or that λ1 ([0, 1/n]) = 1/n]
\[ \int f_n \, d\lambda_1 = 1 \neq 0 = \int 0 \, d\lambda_1. \]

Thus, we should carefully check the regularity conditions of the

Dominated and Monotone Convergence Theorems prior to applying
them.
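The counterexample can be made concrete with exact rational arithmetic. This is a sketch that uses the fact that each f_n is a simple function, so its integral is n · λ_1([0, 1/n]); the point x = 1/100 is an arbitrary illustrative choice.

```python
from fractions import Fraction

def f(n, x):
    """f_n(x) = n · 1_{[0, 1/n]}(x)."""
    return n if 0 <= x <= Fraction(1, n) else 0

def integral(n):
    """Integral of f_n with respect to λ_1, computed as n · λ_1([0, 1/n])."""
    return n * Fraction(1, n)

x = Fraction(1, 100)                        # any fixed x in (0, 1]
tail = [f(n, x) for n in range(101, 200)]   # f_n(x) = 0 as soon as 1/n < x
```

Every integral equals 1 while the pointwise limit is 0 away from the origin. No integrable majorant exists here, and the sequence is not increasing, so neither theorem applies.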
61/89
Some useful inequalities

For $f, g \in \mathcal{M}(\mathcal{A})$, the following inequalities hold,
writing $\|f\|_p = \big( \int |f|^p \, d\mu \big)^{1/p}$:

Markov: $\mu(|f| \ge t) \le \frac{1}{t^p} \int |f|^p \, d\mu$, $p \in (0, \infty)$, $t > 0$.

Hölder: $\int |fg| \, d\mu \le \|f\|_p \|g\|_q$, $\frac{1}{p} + \frac{1}{q} = 1$, $p > 1$.

Cauchy-Schwarz: $\int |fg| \, d\mu \le \|f\|_2 \|g\|_2$.

Minkowski: $\|f + g\|_p \le \|f\|_p + \|g\|_p$, $p \ge 1$.
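A numerical sanity check of these inequalities, as a sketch: it assumes the counting measure on a four-point set, so that integrals become finite sums. The vectors `f`, `g`, the exponents and the threshold `t` are illustrative values only.

```python
def norm(f, p):
    """||f||_p with respect to the counting measure on a finite index set."""
    return sum(abs(v) ** p for v in f) ** (1 / p)

f = [0.5, -2.0, 3.0, 1.5]                  # illustrative function values
g = [1.0, 4.0, -0.5, 2.0]
p, q = 3.0, 1.5                            # conjugate exponents: 1/3 + 1/1.5 = 1
t = 1.0

markov_lhs = sum(1 for v in f if abs(v) >= t)            # τ(|f| ≥ t)
markov_rhs = sum(abs(v) ** p for v in f) / t ** p
holder_lhs = sum(abs(a * b) for a, b in zip(f, g))       # ∫ |fg| dτ
holder_rhs = norm(f, p) * norm(g, q)
cauchy_rhs = norm(f, 2) * norm(g, 2)
minkowski_lhs = norm([a + b for a, b in zip(f, g)], p)
minkowski_rhs = norm(f, p) + norm(g, p)
```

Such a check is of course no proof, but it is a cheap way to catch a misremembered exponent or a missing absolute value.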

62/89
Jensen’s inequality

Jensen’s inequality is for probability measures. Consider a

probability space (Ω, F, P) upon which a random vector X
with values in Rm is defined.

Lemma 23 (Jensen’s inequality)

Let C be an open and convex subset of $\mathbb{R}^m$ and let $L : C \to \mathbb{R}$ be
a convex function. If X is a random vector with P(X ∈ C ) = 1
and $E \|X\| < \infty$,⁴ then
1 E X ∈ C.
2 L(E X ) ≤ E L(X ).

Since $x \mapsto x^2$ is convex one has $(E X)^2 \le E X^2$.

If g : C → R is concave then −g is convex and so
\[ -g(E X) \le E[-g(X)] \iff E g(X) \le g(E X). \]

⁴ Here, for any $x \in \mathbb{R}^m$, $\|x\| = \sqrt{\sum_{i=1}^{m} x_i^2}$ denotes the Euclidean norm.
63/89
Induced measure and substitution rule
Let (X , A) and (Y, B) be measurable spaces and T : X → Y
be A-B-measurable.
If µ is a measure on A, then µ ◦ T −1 , defined by

(µ ◦ T −1 )(B) := µ(T −1 (B)) = µ(x ∈ X : T (x) ∈ B), B ∈ B

is a measure on B which is called the induced measure/image

measure/push-forward measure (of µ under the mapping T ).
One frequently writes µT for the image measure.
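Let us verify that $\mu_T$ is a measure on $\mathcal{B}$:
Clearly, $\mu_T(\emptyset) = \mu(T^{-1}(\emptyset)) = \mu(\emptyset) = 0$ and for $(B_n)_{n \in \mathbb{N}}$ a disjoint
sequence of sets in $\mathcal{B}$ one has, since the preimages $T^{-1}(B_n)$ are disjoint sets in $\mathcal{A}$ and µ is a measure on $\mathcal{A}$,
\[ \mu_T\Big( \bigcup_{n \in \mathbb{N}} B_n \Big) = \mu\Big( T^{-1}\Big( \bigcup_{n \in \mathbb{N}} B_n \Big) \Big) = \mu\Big( \bigcup_{n \in \mathbb{N}} T^{-1}(B_n) \Big) = \sum_{n \in \mathbb{N}} \mu\big( T^{-1}(B_n) \big) = \sum_{n \in \mathbb{N}} \mu_T(B_n). \]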
64/89
Before getting to the substitution rule, we will consider an example
of applying the substitution rule to the Riemann integral
\[ I = \int f(x) \, dG(T^{-1}(x)). \]
Let G have derivative g . By the chain rule
\[ \frac{\partial}{\partial x} G(T^{-1}(x)) = g(T^{-1}(x)) \frac{\partial}{\partial x} T^{-1}(x) = g(T^{-1}(x)) \frac{1}{T'(T^{-1}(x))}. \]
Insert above to get
\[ I = \int f(x) g(T^{-1}(x)) \frac{1}{T'(T^{-1}(x))} \, dx. \]
Substitute x = T (y ) with $dx = T'(y) \, dy$ to get
\[ I = \int f(T(y)) g(y) \frac{1}{T'(y)} T'(y) \, dy. \]
Cancel the $T'(y)$ terms and write $g(y) \, dy = dG(y)$ to get
\[ I = \int f(x) \, dG(T^{-1}(x)) = \int f(T(y)) \, dG(y). \]
We now present a “substitution rule” that we shall frequently use
and that tells us how to integrate with respect to the image
measure $\mu_T = \mu \circ T^{-1}$.
Lemma 24 (Substitution rule)

Let (X , A, µ) be a measure space, (Y, B) a measurable space

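and $T : \mathcal{X} \to \mathcal{Y}$ be $\mathcal{A}$-$\mathcal{B}$-measurable. For $\mu_T$ the image measure
of $\mu$ under $T$ one has

$L(\mu_T) = \{f \in M(\mathcal{B}) : f \circ T \in L(\mu)\},$

$L_1(\mu_T) = \{f \in M(\mathcal{B}) : f \circ T \in L_1(\mu)\}.$

In addition, it holds for any function $f \in L(\mu_T)$ that

$\int f \, d\mu_T = \int (f \circ T) \, d\mu. \quad (4)$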
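
On a finite space the substitution rule can be verified by direct computation. The following is a minimal sketch, assuming a hypothetical measure given by point masses on {-2, ..., 2}, the map T(x) = x², and an arbitrary test function f; exact rational arithmetic is used so that both sides of (4) can be compared with `==`.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical finite measure on X = {-2, ..., 2}, given by its point masses.
mu = {
    -2: Fraction(1, 2),
    -1: Fraction(1),
    0: Fraction(3, 2),
    1: Fraction(1),
    2: Fraction(2),
}


def T(x):
    """Measurable map T : X -> Y (every map is measurable on a finite space)."""
    return x * x


def push_forward(measure, mapping):
    """Image measure: mu_T({y}) = mu(T^{-1}({y}))."""
    image = defaultdict(Fraction)
    for x, mass in measure.items():
        image[mapping(x)] += mass
    return dict(image)


def integrate(func, measure):
    """Integral of func with respect to a measure given by point masses."""
    return sum(func(point) * mass for point, mass in measure.items())


def f(y):
    """Arbitrary test function on Y."""
    return 3 * y + 1


mu_T = push_forward(mu, T)
lhs = integrate(f, mu_T)                   # integral of f d(mu_T)
rhs = integrate(lambda x: f(T(x)), mu)     # integral of (f o T) d(mu)
print(mu_T, lhs, rhs)
```

The left-hand side sums over the three points of the image {0, 1, 4}, the right-hand side over the five points of the original space; the two agree because `push_forward` merely regroups the masses.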

Product measures and Tonelli’s Theorem

Definition 25 (Product σ-algebra)

Let (X , A) and (Y, B) be measurable spaces. We
call A ⊗ B = σ(A × B : A ∈ A, B ∈ B) the product σ-algebra,
and (X × Y, A ⊗ B) the product space.

Definition 26 (Product measure)

Let (X , A, µ) and (Y, B, ν) be σ-finite measure spaces. The
product measure µ ⊗ ν on A ⊗ B is the unique measure
satisfying (µ ⊗ ν)(A × B) = µ(A) · ν(B) for all A ∈ A and B ∈ B.

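Typical product spaces that we shall consider are

$\left( \mathbb{R}^{d_1} \times \mathbb{R}^{d_2}, \; \mathcal{B}(\mathbb{R}^{d_1}) \otimes \mathcal{B}(\mathbb{R}^{d_2}) \right)$

with the product measure $\lambda_{d_1} \otimes \lambda_{d_2}$.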

Example: Relationship to independence

Let X and Y be random variables on the probability

space (Ω, F, P), both with values in (R, B(R)).
Denote by $P_{X,Y}$ the joint distribution of $X$ and $Y$, that is

$P_{X,Y}(C) = (P \circ (X, Y)^{-1})(C), \quad C \in \mathcal{B}(\mathbb{R}^2).$

Recall that we say that $X$ and $Y$ are independent if

$P_{X,Y}(A \times B) = P_X(A) \cdot P_Y(B)$ for all $A, B \in \mathcal{B}(\mathbb{R})$,

that is, the joint distribution is the product measure of the
marginals, i.e.

$P_{X,Y} = P_X \otimes P_Y,$

where $P_X = P \circ X^{-1}$ and $P_Y = P \circ Y^{-1}$.

Integration with respect to product measures

Tonelli’s Theorem tells us how to integrate with respect to

product measures.
In particular, it allows us to change the order of integration.

Theorem 27 (Tonelli’s Theorem)

Let (X , A, µ) and (Y, B, ν) be σ-finite measure spaces and
consider the product space (X × Y, A ⊗ B, µ ⊗ ν). For every
function f : X × Y → [0, ∞] from M(A ⊗ B)+ it holds that
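$\int_{\mathcal{X}} \left[ \int_{\mathcal{Y}} f(x, y) \, \nu(dy) \right] \mu(dx) = \int_{\mathcal{X} \times \mathcal{Y}} f(x, y) \, (\mu \otimes \nu)(dx, dy) = \int_{\mathcal{Y}} \left[ \int_{\mathcal{X}} f(x, y) \, \mu(dx) \right] \nu(dy).$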

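Fubini’s Theorem extends these equalities to functions $f \in M(\mathcal{A} \otimes \mathcal{B})$ that are integrable with respect to $\mu \otimes \nu$.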
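
Why a sign or integrability condition is needed can be seen with counting measure on N0 × N0, where integrals are sums. The array `a` below is a standard textbook counterexample (it is not taken from these slides): it takes both signs and is not absolutely summable, and its two iterated sums differ. Each row and column has finite support, so the inner sums in this sketch are exact, and the outer partial sums do not change with the (hypothetical) cut-off `N`.

```python
def a(m, n):
    """a(m, m) = 1, a(m, m + 1) = -1, and 0 elsewhere (m, n in N0)."""
    if n == m:
        return 1
    if n == m + 1:
        return -1
    return 0


def row_sum(m):
    """Sum over n of a(m, n); only n = m and n = m + 1 contribute."""
    return sum(a(m, n) for n in range(m + 2))


def col_sum(n):
    """Sum over m of a(m, n); only m = n - 1 and m = n contribute."""
    return sum(a(m, n) for m in range(n + 1))


N = 50  # cut-off for the outer sums; every row sum and every col_sum(n), n >= 1, is 0
rows_first = sum(row_sum(m) for m in range(N))   # sum over m of (sum over n)
cols_first = sum(col_sum(n) for n in range(N))   # sum over n of (sum over m)
print(rows_first, cols_first)
```

This does not contradict Tonelli's Theorem, since `a` is not non-negative, and the sum of |a(m, n)| over all (m, n) is infinite, so `a` is not integrable with respect to the product of the counting measures either.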

Absolute continuity and domination

We shall often encounter distributions that have a density.

Let (X , A) be a measurable space and µ, ν be σ-finite
measures on A.
If ν(A) = 0 implies µ(A) = 0 for every A ∈ A, then we say that µ is
absolutely continuous with respect to ν.
We also say that ν dominates µ and write µ ≪ ν.
In words, every set to which ν assigns measure 0 must also be
assigned measure 0 by µ.

Densities and conditioning

Densities through Radon-Nikodym derivatives.

Conditional expectation.
Conditional distributions through stochastic kernels.

Radon-Nikodym

Theorem 28 (Radon-Nikodym)

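If $\mu$ and $\nu$ are σ-finite measures on $\mathcal{A}$ and $\mu \ll \nu$, then there exists
a $\nu$-a.e. uniquely determined $f \in M(\mathcal{A})_+$, called the density of $\mu$
with respect to $\nu$, such that for every $A \in \mathcal{A}$

$\mu(A) = \int_A f \, d\nu \quad \text{and} \quad \int h \, d\mu = \int h f \, d\nu,$

where $h \in L(\mu) = \{h \in M(\mathcal{A}) : hf \in L(\nu)\}$.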

The function f is also denoted by dµ/dν and called the

Radon-Nikodym derivative of µ with respect to ν.
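
On a finite set the Radon-Nikodym derivative is simply a ratio of point masses wherever ν is positive. The sketch below assumes two hypothetical measures `mu` and `nu` on a four-point set and checks both displayed identities exactly; the value of the density on ν-null points is arbitrary, which reflects the ν-a.e. uniqueness.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical measures on {"a", "b", "c", "d"}, given by their point masses.
nu = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4), "d": Fraction(0)}
mu = {"a": Fraction(1, 8), "b": Fraction(3, 4), "c": Fraction(0), "d": Fraction(0)}


def is_dominated(m, dominating):
    """True if m << dominating, i.e. every dominating-null point is m-null."""
    return all(m[x] == 0 for x in dominating if dominating[x] == 0)


def density(m, dominating):
    """A version of dm/d(dominating); on null points the value 0 is an arbitrary choice."""
    return {
        x: (m[x] / dominating[x] if dominating[x] > 0 else Fraction(0))
        for x in dominating
    }


def all_subsets(points):
    """All subsets of a finite collection of points, as tuples."""
    points = list(points)
    for r in range(len(points) + 1):
        yield from combinations(points, r)


f = density(mu, nu)

# mu(A) equals the integral of f over A with respect to nu, for every subset A.
first_identity = all(
    sum(mu[x] for x in A) == sum(f[x] * nu[x] for x in A) for A in all_subsets(nu)
)

# Integral of h d(mu) equals the integral of h*f d(nu) for a test function h.
h = {"a": Fraction(2), "b": Fraction(-1), "c": Fraction(5), "d": Fraction(7)}
integral_h_dmu = sum(h[x] * mu[x] for x in mu)
integral_hf_dnu = sum(h[x] * f[x] * nu[x] for x in nu)
print(f, first_identity, integral_h_dmu, integral_hf_dnu)
```

In this example µ ≪ ν holds but ν ≪ µ fails, because µ assigns mass 0 to the point "c" while ν does not.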

Example: Normal distribution

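For $\eta \in \mathbb{R}$ and $\sigma^2 \in (0, \infty)$ we have that the Normal
distribution $N(\eta, \sigma^2)$ is the measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ with density

$f_{\eta,\sigma^2}(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-(x-\eta)^2 / (2\sigma^2)}, \quad x \in \mathbb{R},$

with respect to the Lebesgue measure $\lambda_1$.
That is,

$N(\eta, \sigma^2)(A) = \int_A f_{\eta,\sigma^2}(x) \, \lambda_1(dx), \quad A \in \mathcal{B}(\mathbb{R}).$

In case $A$ is an interval, we thus have

$N(\eta, \sigma^2)(A) = \int_A f_{\eta,\sigma^2}(x) \, dx,$

the last integral being a Riemann integral.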
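
For an interval A = [a, b] the Riemann integral of the density can be approximated directly. The sketch below compares a midpoint Riemann sum with the closed form obtained from the error function `math.erf`; the parameter values and the number of subintervals are hypothetical choices for illustration.

```python
import math


def normal_density(x, eta, sigma2):
    """Density of N(eta, sigma2) with respect to Lebesgue measure."""
    return math.exp(-((x - eta) ** 2) / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)


def normal_interval_riemann(a, b, eta, sigma2, n=10_000):
    """Midpoint Riemann sum for N(eta, sigma2)([a, b]) with n subintervals."""
    width = (b - a) / n
    return width * sum(
        normal_density(a + (i + 0.5) * width, eta, sigma2) for i in range(n)
    )


def normal_interval_erf(a, b, eta, sigma2):
    """The same probability via the error function: F(b) - F(a)."""
    scale = math.sqrt(2.0 * sigma2)
    return 0.5 * (math.erf((b - eta) / scale) - math.erf((a - eta) / scale))


approx = normal_interval_riemann(-1.0, 2.0, eta=0.5, sigma2=2.0)
exact = normal_interval_erf(-1.0, 2.0, eta=0.5, sigma2=2.0)
print(approx, exact)
```

The two numbers agree up to the discretisation error of the midpoint rule, which shrinks as `n` grows.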

Example: Poisson distribution

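For $\lambda \in (0, \infty)$ we have that the Poisson distribution $\mathrm{Poi}(\lambda)$ is the
measure on $(\mathbb{N}_0, \mathcal{P}(\mathbb{N}_0))$ with density

$f_\lambda(x) = \frac{e^{-\lambda} \lambda^x}{x!}, \quad x \in \mathbb{N}_0,$

with respect to the counting measure $\tau$.
That is,

$\mathrm{Poi}(\lambda)(A) = \int_A f_\lambda(x) \, \tau(dx) = \sum_{x \in A} \frac{e^{-\lambda} \lambda^x}{x!}, \quad A \in \mathcal{P}(\mathbb{N}_0),$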

the last equality following from Exercise Set 1.
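
Since integration against the counting measure is summation, Poi(λ)(A) can be computed directly for finite A. The sketch below evaluates the density on the log scale (a numerical convenience, not something the slides prescribe) and compares the probability of the even integers, truncated at a hypothetical cut-off of 200, with the standard identity P(even) = (1 + e^{-2λ})/2.

```python
import math


def poisson_density(x, lam):
    """Density of Poi(lam) w.r.t. counting measure, evaluated on the log scale."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))


def poisson_measure(A, lam):
    """Poi(lam)(A) for a finite set A of non-negative integers: a sum over A."""
    return sum(poisson_density(x, lam) for x in set(A))


lam = 3.0
evens = range(0, 200, 2)  # truncation of {0, 2, 4, ...}; the tail mass is negligible here
prob_even = poisson_measure(evens, lam)
closed_form = 0.5 * (1.0 + math.exp(-2.0 * lam))
total_mass = poisson_measure(range(200), lam)
print(prob_even, closed_form, total_mass)
```

The total mass over {0, ..., 199} is numerically indistinguishable from 1 for λ = 3, which is why the truncation is harmless in this example.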

Conditional expectations

Definition 29 (Conditional expectation)

Let $(\Omega, \mathcal{F}, P)$ be a probability space, $X \in L_1(P)$ a random variable
on it and $\mathcal{G} \subseteq \mathcal{F}$ a σ-algebra. Any random variable $Y$ satisfying
1. $Y$ is $\mathcal{G}$-$\mathcal{B}(\mathbb{R})$-measurable,
2. $\int_A X \, dP = \int_A Y \, dP$ for all $A \in \mathcal{G}$,
is called a conditional expectation of $X$ given $\mathcal{G}$. We
write $E(X|\mathcal{G})$ for any random variable $Y$ satisfying (1) and (2)
above.
Important fact: For any X ∈ L1 (P) and any σ-algebra G ⊆ F a
conditional expectation always exists and is P-a.s. unique.
Conditional expectations satisfy all the intuitive rules that we
would like them to (linearity, monotonicity,...), cf.,
e.g. Hoffmann-Jørgensen (1994) Section 6.8 or Dudley (2018)
page 338.
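
When G is generated by a finite partition of Ω into blocks of positive probability, a conditional expectation is obtained by averaging X over each block. The sketch below, on a hypothetical six-point probability space, constructs such a Y and checks property (2) on every set of G, that is, on every union of blocks. Property (1) holds by construction because Y is constant on each block.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite probability space: a fair die, with X(w) = w^2.
omega = [1, 2, 3, 4, 5, 6]
P = {w: Fraction(1, 6) for w in omega}
X = {w: Fraction(w * w) for w in omega}

# G is the sigma-algebra generated by this partition (all unions of blocks).
partition = [{1, 2}, {3, 4, 5}, {6}]


def conditional_expectation(X, P, partition):
    """E(X|G): on each block, the P-weighted average of X over that block.

    Assumes every block has strictly positive probability.
    """
    Y = {}
    for block in partition:
        mass = sum(P[w] for w in block)
        average = sum(X[w] * P[w] for w in block) / mass
        for w in block:
            Y[w] = average
    return Y


def integral_over(Z, A):
    """Integral of Z over the set A with respect to P."""
    return sum(Z[w] * P[w] for w in A)


def defining_property_holds(X, Y, partition):
    """Check that the integrals of X and Y agree on every union of blocks."""
    for r in range(len(partition) + 1):
        for blocks in combinations(partition, r):
            A = set().union(*blocks)
            if integral_over(X, A) != integral_over(Y, A):
                return False
    return True


Y = conditional_expectation(X, P, partition)
print(Y, defining_property_holds(X, Y, partition))
```

Taking A = Ω in property (2) recovers E Y = E X, the law of iterated expectations in its simplest form.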
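Let $(\mathcal{T}, \mathfrak{T})$ be a measurable space. For $T : \Omega \to \mathcal{T}$ we
let $\sigma(T) \subseteq \mathcal{F}$ denote the σ-algebra generated by $T$, i.e.

$\sigma(T) = \{T^{-1}(B) : B \in \mathfrak{T}\}.$

$\sigma(T)$ is clearly the smallest σ-algebra on $\Omega$ that makes $T$
measurable when $\mathcal{T}$ is equipped with $\mathfrak{T}$.
We write $E(X|T) := E(X|\sigma(T))$.
The so-called factorization lemma yields that, since $E(X|T)$
is $\sigma(T)$-$\mathcal{B}(\mathbb{R})$-measurable, there exists
a $\mathfrak{T}$-$\mathcal{B}(\mathbb{R})$-measurable $\varphi : \mathcal{T} \to \mathbb{R}$ such that

$E(X|T)(\omega) = \varphi(T(\omega))$ for all $\omega \in \Omega$.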

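Let us characterize the function $\varphi$. By the definition of a
conditional expectation it satisfies (see footnote 5)

$\int_{T^{-1}(B)} \varphi(T(\omega)) \, P(d\omega) = \int_{T^{-1}(B)} X(\omega) \, P(d\omega)$ for all $B \in \mathfrak{T}$,

where $\mathfrak{T}$ denotes the σ-algebra on the range space of $T$, or, equivalently via the substitution rule,

$\int_B \varphi(t) \, P_T(dt) = \int_{T^{-1}(B)} X(\omega) \, P(d\omega)$ for all $B \in \mathfrak{T}$,

where $P_T = P \circ T^{-1}$.
A $\mathfrak{T}$-$\mathcal{B}(\mathbb{R})$-measurable function $\varphi$ satisfying the above two
displays is also called a conditional expectation of $X$ given $T = t$.
One often uses the notation $E(X|T = t) := \varphi(t)$ for any such
function $\varphi$.

Footnote 5: Observe that a typical element of $\sigma(T)$ is of the form $T^{-1}(B)$ for $B \in \mathfrak{T}$.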
Stochastic kernels and conditional distributions

Let us briefly abstract from the exact estimation setting

studied so far.
The following treatment and introduction of conditional
distributions is taken from page 625 in Liese and Miescke
(2008).

Stochastic kernels

Definition 30 (Stochastic kernels)

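For two measurable spaces $(\mathcal{X}, \mathcal{A})$ and $(\mathcal{Y}, \mathcal{B})$ a mapping
$K : \mathcal{B} \times \mathcal{X} \to [0, 1]$ is called a stochastic kernel if for every $B \in \mathcal{B}$
the function $x \mapsto K(B, x)$ is $\mathcal{A}$-$\mathcal{B}([0, 1])$-measurable, and for every
$x \in \mathcal{X}$, it holds that $K(\cdot, x)$ is a probability measure on $\mathcal{B}$.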

Stochastic kernels are also referred to as Markov kernels,

conditioning kernels, or simply kernels.
We shall now see that stochastic kernels are intimately linked
to conditional distributions.

Conditional distribution
Definition 31 (Conditional distribution)
Let $X$ and $Y$ be random variables on the probability space
$(\Omega, \mathcal{F}, P)$ with values in the measurable spaces $(\mathcal{X}, \mathcal{A})$ and $(\mathcal{Y}, \mathcal{B})$,
respectively (see footnote 6). The kernel $K : \mathcal{B} \times \mathcal{X} \to [0, 1]$ is called a regular
conditional distribution of $Y$ given $X$ if

$P(X \in A, Y \in B) = \int_A K(B, x) \, P_X(dx)$ for all $A \in \mathcal{A}$, $B \in \mathcal{B}$,

where $P_X = P \circ X^{-1}$ is the distribution of $X$ under $P$.

We can think of $K(B, x)$ as the probability of $Y$ falling in the set
$B$, given/conditional on $X = x$.
We often write $P_{Y|X}$ for the conditional distribution $K$.
Observe also that
$P(X \in A, Y \in B) = (P \circ (X, Y)^{-1})(A \times B) = P_{X,Y}(A \times B).$

Footnote 6: Both of $(\mathcal{X}, \mathcal{A})$ and $(\mathcal{Y}, \mathcal{B})$ equal $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$.
Some remarks

Observe that

$P(Y \in B) = P(X \in \mathcal{X}, Y \in B) = \int_{\mathcal{X}} K(B, x) \, P_X(dx).$

Thus, in accordance with intuition, the unconditional
probability of $Y \in B$ can be found by averaging the
conditional probability of $Y \in B$ (that is, $K(B, x)$) over all
values of $x \in \mathcal{X}$.
Note also that if $K(B, x) = \mu(B)$ for all $x \in \mathcal{X}$ and some
probability measure $\mu$ on $\mathcal{B}$, then

$P_Y(B) = P(Y \in B) = \int_{\mathcal{X}} K(B, x) \, P_X(dx) = \mu(B)$

and hence $P_{X,Y}(A \times B) = \int_A \mu(B) \, P_X(dx) = P_X(A) \cdot \mu(B) = P_X(A) \cdot P_Y(B)$,
so that $X$ and $Y$ are independent.

“If the conditional distribution of Y given X does not depend
on x then Y and X are independent”.
Existence of conditional distribution

One may ask: when does a conditional distribution of Y given

X exist?
As we will more or less exclusively be dealing with random
vectors (that take values in $(\mathbb{R}^p, \mathcal{B}(\mathbb{R}^p))$), the conditional
distribution of $Y$ given $X$ will always exist, cf. page 625 in
Liese and Miescke (2008), Theorem A.37.
In fact it suffices that $\mathcal{Y}$ is a so-called complete separable
metric space equipped with the corresponding Borel σ-algebra.

Finding a conditional distribution via densities

In case the random variables under consideration have a joint

density, it is easy to find a conditional distribution:

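Suppose that $X$ and $Y$ are random variables with values
in $(\mathcal{X}, \mathcal{A})$ and $(\mathcal{Y}, \mathcal{B})$, respectively.
Assume that there are σ-finite measures $\mu$ and $\nu$ on $\mathcal{A}$ and $\mathcal{B}$,
respectively, such that $P_{X,Y} = P \circ (X, Y)^{-1} \ll \mu \otimes \nu$.
Set $f_{X,Y} = \frac{dP_{X,Y}}{d(\mu \otimes \nu)}$.
Let $\mu(A) = 0$. Then

$P_X(A) = P_{X,Y}(A \times \mathcal{Y}) = 0,$

since $(\mu \otimes \nu)(A \times \mathcal{Y}) = \mu(A) \cdot \nu(\mathcal{Y}) = 0$ and $P_{X,Y} \ll \mu \otimes \nu$.

Thus, $P_X \ll \mu$ and similarly $P_Y \ll \nu$ when $P_{X,Y} \ll \mu \otimes \nu$.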

Define

$f_X(x) := \frac{dP_X}{d\mu}(x) = \int f_{X,Y}(x, y) \, \nu(dy)$

and

$f_Y(y) := \frac{dP_Y}{d\nu}(y) = \int f_{X,Y}(x, y) \, \mu(dx),$

which are called the marginal densities.

Observe that since $P_X \ll \mu$ and $P_Y \ll \nu$ (by the previous
slide), $\frac{dP_X}{d\mu}$ and $\frac{dP_Y}{d\nu}$ exist by the Radon-Nikodym Theorem.

You will show the two equalities (that are not definitions) in
the exercises.

Definition 32 (Conditional distribution via densities)
The function

$f_{Y|X}(y|x) = \begin{cases} \frac{f_{X,Y}(x, y)}{f_X(x)} & \text{if } f_X(x) > 0, \\ f_Y(y) & \text{if } f_X(x) = 0, \end{cases}$

is called the conditional density of $Y$, given $X = x$. The stochastic
kernel $K(B, x) = \int_B f_{Y|X}(y|x) \, \nu(dy)$ is called the regular
conditional distribution of $Y$, given $X = x$, based on the
conditional density.

Since many common distributions, such as the ones in the

exponential family, are given by their densities we can thus “easily”
find conditional distributions for these.
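
The construction can be traced on a discrete example in which µ and ν are counting measures, so that densities are probability mass functions and integrals are sums. The joint density below is a hypothetical table; the sketch builds the marginal densities, the conditional density (including the case f_X(x) = 0), and the kernel K, and then checks the defining identity of a regular conditional distribution on all product sets A × B.

```python
from fractions import Fraction
from itertools import chain, combinations

xs, ys = [0, 1, 2], [0, 1]

# Hypothetical joint density f_{X,Y} with respect to counting measure on xs x ys.
joint = {
    (0, 0): Fraction(1, 10), (0, 1): Fraction(3, 10),
    (1, 0): Fraction(2, 10), (1, 1): Fraction(4, 10),
    (2, 0): Fraction(0), (2, 1): Fraction(0),
}

# Marginal densities: integrate out the other coordinate.
f_X = {x: sum(joint[(x, y)] for y in ys) for x in xs}
f_Y = {y: sum(joint[(x, y)] for x in xs) for y in ys}


def conditional_density(y, x):
    """f_{Y|X}(y|x) as in the definition, with the fallback f_Y(y) when f_X(x) = 0."""
    if f_X[x] > 0:
        return joint[(x, y)] / f_X[x]
    return f_Y[y]


def K(B, x):
    """Kernel K(B, x): integral of the conditional density over B (a sum here)."""
    return sum(conditional_density(y, x) for y in B)


def subsets(points):
    """All subsets of a finite list of points, as tuples."""
    return chain.from_iterable(combinations(points, r) for r in range(len(points) + 1))


def kernel_identity_holds():
    """Check P(X in A, Y in B) = integral over A of K(B, x) dP_X for all A, B."""
    for A in subsets(xs):
        for B in subsets(ys):
            direct = sum(joint[(x, y)] for x in A for y in B)
            via_kernel = sum(K(B, x) * f_X[x] for x in A)
            if direct != via_kernel:
                return False
    return True


print(K((1,), 0), K((1,), 2), kernel_identity_holds())
```

The fallback value f_Y(y) on {f_X = 0} ensures that K(·, x) is a probability measure for every x, including x = 2, even though such x carry no P_X-mass and therefore do not affect the identity.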

Summarizing advantages of the measure theoretic approach

Measurability of functions is preserved under pointwise limits.

This is not the case for continuity (underlying much of
Riemann integration).
Limits and integration can be interchanged under mild conditions
(cf. the Monotone and Dominated Convergence Theorems).
It provides a unified framework for dealing with random
variables — no matter whether they have continuous or
discrete distributions or neither.

Dirichlet’s function

The following is an example of how the Lebesgue integral handles

pointwise limits better than the Riemann integral:
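Consider Dirichlet’s function $D = 1_{\mathbb{Q} \cap [0,1]}$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$.
$D$ is not Riemann-integrable on $[0, 1]$ as all lower Riemann sums are 0
and all upper Riemann sums are 1.
But $\int_0^1 D \, d\lambda_1 = \lambda_1(\mathbb{Q} \cap [0, 1]) = 0$.
It gets worse: Since $\mathbb{Q} \cap [0, 1]$ is countable, we can write it
as $\mathbb{Q} \cap [0, 1] = \{x_k : k \in \mathbb{N}\}$.
Clearly, $f_n := \sum_{k=1}^n 1_{\{x_k\}} \uparrow D$.
For every $n$, $f_n$ is Riemann integrable and $\int_0^1 f_n(x) \, dx = 0$ (see footnote 7).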

Footnote 7: Assume without loss of generality that $x_1 < \ldots < x_n$, so that there is a
strictly positive minimal distance $r$ between the $x_i$. Hence, in any partition of $[0, 1]$
consisting of intervals of length at most $r/2$, at most $2n$ intervals contain an $x_i$
(an $x_i$ may be an endpoint shared by two adjacent intervals).
Taking the infimum over such partitions one sees that the upper Riemann
integral is 0 [just like the lower Riemann integral clearly is].
But since $D$ is not Riemann-integrable we see that pointwise
limits do not preserve Riemann-integrability and it does not
make sense to write

$\lim_{n \to \infty} \int_0^1 f_n(x) \, dx = \int_0^1 D(x) \, dx.$

However, in accordance with the Monotone Convergence
Theorem (see footnote 8),

$\int_0^1 f_n \, d\lambda_1 = \sum_{k=1}^n \lambda_1(\{x_k\}) = 0 = \int_0^1 D \, d\lambda_1.$

Footnote 8: Alternatively, we can use the Dominated Convergence Theorem as stated
in Theorem 20.
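
The argument in footnote 7 can be made concrete by computing upper Riemann sums of the indicator of finitely many rationals exactly. The sketch below is an illustration under assumptions of its own: it enumerates rationals in [0, 1] by increasing denominator (one possible enumeration, not one fixed by the slides) and uses uniform partitions into m closed intervals. The upper sum is the number of intervals that contain at least one of the points, divided by m, and is therefore at most 2n/m.

```python
import math
from fractions import Fraction


def rationals_in_unit_interval(n):
    """First n distinct rationals in [0, 1], enumerated by increasing denominator."""
    seen, out = set(), []
    q = 1
    while len(out) < n:
        for p in range(q + 1):
            x = Fraction(p, q)
            if x not in seen:
                seen.add(x)
                out.append(x)
                if len(out) == n:
                    break
        q += 1
    return out


def upper_riemann_sum(points, m):
    """Upper sum of the indicator of `points` over the uniform partition of
    [0, 1] into m closed intervals [j/m, (j+1)/m], j = 0, ..., m - 1."""
    hit = set()
    for x in points:
        j = math.floor(x * m)
        if j < m:
            hit.add(j)          # x lies in [j/m, (j+1)/m]
        if x * m == j and j > 0:
            hit.add(j - 1)      # a grid point also lies in the interval to its left
    return Fraction(len(hit), m)


n = 5
points = rationals_in_unit_interval(n)
coarse = upper_riemann_sum(points, 10)
fine = upper_riemann_sum(points, 1000)
print(points, coarse, fine)
```

As m grows the upper sums of f_n shrink towards 0, whereas every interval of every partition contains a rational, so the upper sums of D itself remain equal to 1.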
References
Axler, S. (2021): Measure, integration & real analysis, Springer.
Bartle, R. G. and D. R. Sherbert (2011): Introduction to
Real Analysis, Wiley, 4th ed.
Dudley, R. M. (2018): Real analysis and probability, CRC Press.
Hoffmann-Jørgensen, J. (1994): Probability with a view
toward Statistics, vol. 1, Chapman & Hall.
Kolmogorov, A. N. and S. V. Fomin (1970): Introductory
Real Analysis, Dover.
Lehmann, E. and G. Casella (1998): Theory of Point
Estimation, Springer.
Lehmann, E. and J. Romano (2005): Testing Statistical
Hypotheses, Springer.
Liese, F. and K.-J. Miescke (2008): Statistical Decision
Theory, Springer.
Resnick, S. (2019): A probability path, Springer.
Shiryaev, A. N. (1996): Probability, Springer, 2nd ed.
