(De Gruyter Graduate) Marc Kesseböhmer, Sara Munday, Bernd Otto Stratmann - Infinite Ergodic Theory of Numbers-De Gruyter (2016)
Volume 62
Positive Dynamical Systems in Discrete Time: Theory, Models, and
Applications
Ulrich Krause, 2015
ISBN 978-3-11-036975-5, e-ISBN (PDF) 978-3-11-036569-6,
e-ISBN (EPUB) 978-3-11-039134-3
Volume 61
Markov Operators, Positive Semigroups and Approximation
Processes
Francesco Altomare, Mirella Cappelletti, Vita Leonessa,
Ioan Rasa, 2014
ISBN 978-3-11-037274-8, e-ISBN (PDF) 978-3-11-036697-6,
e-ISBN (EPUB) 978-3-11-038641-7
Volume 59
Topological Dynamical Systems: An Introduction to the Dynamics
of Continuous Mappings
Jan Vries, 2014
ISBN 978-3-11-034073-0, e-ISBN (PDF) 978-3-11-034240-6,
e-ISBN (EPUB) 978-3-11-037459-9
Infinite Ergodic
Theory of Numbers
Mathematics Subject Classification 2010
37Axx, 11B57, 11J70, 11J83, 60K05
Authors
Dr. Sara Munday
Università di Bologna
Department of Mathematics
ISBN 978-3-11-043941-0
e-ISBN (PDF) 978-3-11-043942-7
e-ISBN (EPUB) 978-3-11-043085-1
www.degruyter.com
Preface
This book is intended to serve as an introduction to Infinite Ergodic Theory for
advanced undergraduate and PhD students and should be appropriate as a text for
a seminar or reading course. We hope that some aspects of the presented material
will be of interest also to researchers. The prerequisites we have assumed are a certain
familiarity with measure theory and (to a lesser extent) basic concepts from functional
analysis. For the rest, we have attempted to be as self-contained as possible.
One of the fundamental objectives of ergodic theory is to investigate dynamical
systems from a measure-theoretic perspective. Central to this conception are invariant
measures – measures which remain unchanged under the dynamics. In classical
ergodic theory these measures are probability measures, whereas the topic of infinite
ergodic theory is systems which preserve infinite measures. This seemingly small
change engenders radically different results, as we will see. A systematic approach to
this part of ergodic theory has been provided in the foundational work of Jon Aaronson
[Aar97], which is still a major source for a coherent presentation and also guided us in
the preparation of this graduate text. In addition to Aaronson’s book, we also found
helpful the book by Dajani and Kraaikamp [DK02] and the survey article by Stefano
Isola [Iso11], as well as unpublished lecture notes by Omri Sarig, Maximilian Thaler,
and Roland Zweimüller.
Our aim with this book is to illuminate various aspects of infinite ergodic theory
by using several concrete examples of dynamical systems that are strongly linked to
number theory. We will use these examples to analyse some explicit questions (like the
asymptotic behaviour of sum-level sets) to illustrate not only the powerful methods
from infinite ergodic theory but also the strong connection between infinite ergodic
theory and renewal theory.
Another theme in the book is elementary Diophantine approximation. We give
some classical results for continued fractions in Chapter 1, and, with the intention of
closing a circle of ideas, in Chapter 5 we show how the analysis of the sum-level sets
for the continued fraction expansion also gives rise to some further Diophantine-type
results. The final application in Chapter 5 is to establish a uniform law for the
Stern–Brocot sequence, which contrasts with a famous analogous result for the Farey sequence.
The book is organised with chapters containing general theory sandwiched
between chapters devoted more to our examples and to applications. The first chapter
consists of a little necessary background material and the introduction of all our main
examples. It is followed by a chapter containing some standard results in ergodic
theory and the beginnings of the theory for infinite systems. Chapter 3 is then devoted
to renewal theory and its application to certain piecewise-linear systems. We return
to more general infinite ergodic theory for Chapter 4, and finally, in Chapter 5 we see
some applications of this theory to the Gauss and Farey systems.
August 2016,
Marc Kesseböhmer and Sara Munday
Contents
Mathematical symbols
Bibliography
Index
Mathematical symbols
∀  ‘for all’.
∃  ‘exists’.
$0^n$  n consecutive appearances of the symbol 0.
$\|\cdot\|$  operator norm.
$\|\cdot\|_p$  p-norm for p ∈ [1, ∞].
∅  empty set.
⟹  ‘implies’.
⟺  ‘equivalent’.
#E  cardinality of E.
:=  ‘equal by definition’.
$J_i := d(\lambda \circ T_i)/d\lambda$;  Jacobian for the i-th inverse branch of the interval map T.
$s_n := q_{n-1}/q_n$.
σ  shift map of $E^{\mathbb{N}}$.
$S_n = B_n \setminus B_{n-1}$;  vertices of the Stern–Brocot Tree.
$S_n^A(h)$  ergodic sums for the induced system with respect to h : A → R.
Z  set of integers.
1 Number-theoretical dynamical systems
Throughout this book, a (topological) dynamical system (X, T) means simply a
non-empty metric space X and a continuous map T : X → X. In this chapter, we
introduce various dynamical systems that generate real number expansions. Such
systems will be referred to as number-theoretic dynamical systems. The examples we
will consider here are mainly constructed over the unit interval. For further simple
examples of number-theoretic dynamical systems, including that of the map that gives
rise to the familiar decimal expansion, we refer to Dajani and Kraaikamp [DK02].
\[
[x_1, x_2, x_3, \ldots] := \cfrac{1}{x_1 + \cfrac{1}{x_2 + \cfrac{1}{x_3 + \cdots}}}, \tag{1.1}
\]
\[
[x_1, \ldots, x_n] := \cfrac{1}{x_1 + \cfrac{1}{x_2 + \cdots + \cfrac{1}{x_n}}}. \tag{1.2}
\]
Notice that for x n ≥ 2, the continued fractions [x1 , . . . , x n − 1, 1] and [x1 , . . . , x n ]
represent the same number. It is typical to use only the latter expression, but on
occasion we find it helpful to have the option of using either.
We cannot immediately assign a value to an infinite continued fraction, so, for
the time being, it should be thought of only as a formal notation, akin to that for an
infinite series.
In the theory of continued fractions, a particularly important role is played
by the initial segments of each (finite or infinite) continued fraction. For a given
continued fraction [x1 , x2 , x3 , . . .], we consider the sequence of rational numbers
([x1 , . . . , x k ])k≥1 . For each k ∈ N, we will write p k /q k := [x1 , . . . , x k ], where the positive
integers p k and q k are required to be coprime. Then p k /q k will be called the k-th
convergent to the continued fraction [x1 , x2 , x3 , . . .]. In particular,
\[
\frac{p_1}{q_1} = \frac{1}{x_1} \quad\text{and}\quad \frac{p_2}{q_2} = \frac{1}{x_1 + 1/x_2} = \frac{x_2}{x_1 x_2 + 1}.
\]
A finite continued fraction [x1 , . . . , x n ] has only n convergents. Each infinite continued fraction has an
infinite sequence of convergents. The justification for this terminology will come
in Proposition 1.1.3, but first let us give the recurrence relations that describe the
formation of the convergents. Here, we make the further definition that p0 := 0 and
q0 := 1.
Proof. The statements in (a) and (b) can be proved by induction; we leave this as an
exercise for the reader. For part (c), multiplying part (a) by q n and part (b) by p n , then
subtracting the first from the second yields
Corollary 1.1.2.
(a) For all n ≥ 1,
\[
\frac{p_{n-1}}{q_{n-1}} - \frac{p_n}{q_n} = \frac{(-1)^n}{q_n q_{n-1}}.
\]
(b) For all n ≥ 2,
\[
\frac{p_{n-2}}{q_{n-2}} - \frac{p_n}{q_n} = \frac{(-1)^{n-1} x_n}{q_n q_{n-2}}.
\]
But then,
\[
\frac{p_{2m}}{q_{2m}} < \frac{p_{2m+1}}{q_{2m+1}} < \frac{p_{2k+1}}{q_{2k+1}},
\]
where the last inequality comes again from Corollary 1.1.2 (a). This contradiction
finishes the proof of part (c). Finally, for part (d), it suffices to show that
\[
q_n \ge 2^{\frac{n-1}{2}}. \tag{1.3}
\]
Note that for x = [x1 , x2 , x3 , . . .] and n ∈ N, the following recurrence relation holds:
\[
r_n = x_n + \frac{1}{r_{n+1}}.
\]
Since q n = x n q n−1 + q n−2 , we have that
\[
s_n^{-1} = \frac{q_n}{q_{n-1}} = x_n + \frac{1}{q_{n-1}/q_{n-2}}.
\]
s n = [x n , x n−1 , . . . , x1 ].
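The identity $s_n = [x_n, x_{n-1}, \ldots, x_1]$ can be checked on a small example. The following Python sketch compares the ratios $q_{n-1}/q_n$ with the reversed continued fractions, using exact rational arithmetic. The function names are ours, and the starting value $q_{-1} := 0$ is an assumed convention, chosen so that the recurrence $q_n = x_n q_{n-1} + q_{n-2}$ gives $q_1 = x_1$.

```python
from fractions import Fraction

def cf_value(digits):
    """Exact value of the finite continued fraction [x1, ..., xk]."""
    value = Fraction(0)
    for a in reversed(list(digits)):
        value = 1 / (a + value)
    return value

def denominators(digits):
    """Return [q_1, ..., q_n] from q_n = x_n q_{n-1} + q_{n-2}."""
    q_prev, q = 0, 1          # assumed q_{-1} := 0, and q_0 := 1
    out = []
    for a in digits:
        q, q_prev = a * q + q_prev, q
        out.append(q)
    return out

def s_values(digits):
    """Return [s_1, ..., s_n] with s_n = q_{n-1} / q_n."""
    qs = [1] + denominators(digits)      # q_0, q_1, ..., q_n
    return [Fraction(qs[n - 1], qs[n]) for n in range(1, len(qs))]

digits = [4, 3, 2]
reversed_values = [cf_value(digits[:n][::-1]) for n in range(1, len(digits) + 1)]
print(s_values(digits) == reversed_values)
```

For the digits 4, 3, 2 the denominators are 4, 13, 30, so the ratios are 1/4, 4/13 and 13/30, which agree with [4], [3, 4] and [2, 3, 4].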
Proof. We will prove the theorem by induction. For n = 1, on recalling the definitions
p0 := 0 and q0 := 1, we deduce that
\[
\frac{p_1 r_2 + p_0}{q_1 r_2 + q_0} = \frac{r_2 + 0}{x_1 r_2 + 1} = \frac{1}{x_1 + 1/r_2} = x.
\]
Now assume that the statement is true for some n ∈ N. Then, in light of
Theorem 1.1.1 (c), we obtain that
Proof. Using Theorem 1.1.5 and Theorem 1.1.1 (c), we infer that
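\begin{align*}
\left| x - \frac{p_n}{q_n} \right|
&= \left| \frac{p_n r_{n+1} + p_{n-1}}{q_n r_{n+1} + q_{n-1}} - \frac{p_n}{q_n} \right| \\
&= \left| \frac{p_n q_n r_{n+1} + p_{n-1} q_n - p_n q_n r_{n+1} - p_n q_{n-1}}{(q_n r_{n+1} + q_{n-1})\, q_n} \right| \\
&= \frac{\left| q_n p_{n-1} - p_n q_{n-1} \right|}{q_n^2 (r_{n+1} + s_n)} = \frac{1}{q_n^2 (r_{n+1} + s_n)}.
\end{align*}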
The first inequality follows since x n+1 < r n+1 , the second since x n+1 ≥ 1.
We end this section with the important result that every real number admits a con-
tinued fraction expansion. The basis of the proof is simply the Euclidean algorithm,
the algorithm for finding the greatest common divisor of two integers. Before stating
the theorem, recall that for each positive real number x the notation ⌊x⌋ denotes the
greatest integer not exceeding x and {x} denotes the fractional part of x, that is,
x = ⌊x⌋ + {x}.
Theorem 1.1.7. To every real number x ∈ (0, 1] there corresponds a continued fraction
with value equal to x. This continued fraction is infinite if and only if x is irrational.
Moreover, every irrational number has a unique continued fraction expansion.
Proof. If x = 1, then the continued fraction [1] := 1/1 is the one sought. So, suppose
that x ∈ (0, 1). Then set $r_1 := 1/x$ and define $x_1 := \lfloor r_1 \rfloor$, so that
\[
x = \frac{1}{x_1 + \{r_1\}}.
\]
If $r_1$ is an integer, we are finished. Otherwise, if $\{r_1\} \in (0, 1)$, then we set $r_2 := 1/\{r_1\}$,
so that
\[
x = \frac{1}{x_1 + 1/r_2}.
\]
Suppose that the numbers $r_1, r_2, \ldots, r_n$ have been defined and, if $r_n$ is not an integer,
let $x_n := \lfloor r_n \rfloor$ and set $r_{n+1} := 1/\{r_n\}$. Then we obtain the relation
\[
r_n = x_n + \frac{1}{r_{n+1}}.
\]
If the number x happens to be a rational number, then each r n as defined above will
also be rational. In this case, the process must stop after a finite number of steps.
Indeed, if r n = a/b and r n is not already an integer, then
\[
r_{n+1} = \frac{1}{r_n - x_n} = \frac{1}{a/b - x_n} = \frac{b}{a - x_n b}.
\]
Therefore, r n+1 has a smaller denominator than r n and it follows that if we consider
the sequence r1 , r2 , r3 , . . ., we must eventually come to an integer. If that integer is
r k , then the number x is represented by the finite continued fraction [x1 , x2 , . . . , x k ],
where x k := r k > 1 (if r k = 1, then r k−1 must also be an integer, so we replace the two
final terms x k−1 and 1 by the single integer x k−1 + 1).
If x is irrational, then each r n must also be irrational and the above-described
process will not terminate. Then, by the definition of x k and Proposition 1.1.3, where
p n /q n := [x1 , x2 , . . . , x n ] as before, we have for each n ≥ 1 that
\[
\frac{p_{2n}}{q_{2n}} < x < \frac{p_{2n+1}}{q_{2n+1}}.
\]
This means that the continued fraction [x1 , x2 , x3 , . . .] has as its value the given
irrational number x.
It only remains to show that each infinite expansion is unique (recall that the
finite expansions are not unique, since the value of [x1 , . . . , x n ] is equal to the value of
[x1 , . . . , x n − 1, 1], for x n ≥ 2). So, fix
This proves uniqueness of the continued fraction expansion for all irrational numbers
from the unit interval.
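The construction in the proof of Theorem 1.1.7 is effectively an algorithm, and for rational numbers it can be run exactly. The following Python sketch is one possible rendering: it computes the digits $x_n := \lfloor r_n \rfloor$ with $r_{n+1} := 1/\{r_n\}$ and then rebuilds the convergents from the recurrence relations. The function names are ours, and the starting values $p_{-1} := 1$, $q_{-1} := 0$ are an assumed convention, chosen so that $p_1/q_1 = 1/x_1$.

```python
from fractions import Fraction
from math import floor

def continued_fraction(x: Fraction) -> list[int]:
    """Digits [x1, ..., xk] of a rational x in (0, 1]."""
    if not (0 < x <= 1):
        raise ValueError("x must lie in (0, 1]")
    if x == 1:
        return [1]
    digits = []
    r = 1 / x                      # r_1 := 1/x
    while True:
        a = floor(r)               # x_n := floor(r_n)
        digits.append(a)
        frac = r - a               # {r_n}
        if frac == 0:              # r_n is an integer, so the process stops
            return digits
        r = 1 / frac               # r_{n+1} := 1/{r_n}

def convergents(digits):
    """Convergents p_n/q_n via p_n = x_n p_{n-1} + p_{n-2}, and likewise for q_n."""
    p_prev, q_prev = 1, 0          # assumed values for p_{-1}, q_{-1}
    p, q = 0, 1                    # p_0 := 0, q_0 := 1
    out = []
    for a in digits:
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
        out.append(Fraction(p, q))
    return out

print(continued_fraction(Fraction(7, 30)))   # digits of 7/30
print(convergents([4, 3, 2]))                # its convergents
```

For x = 7/30 the remainders are 30/7, 7/2 and 2, so the digits are 4, 3, 2 and the convergents are 1/4, 3/13 and 7/30. Because each remainder after the first exceeds 1, the last digit produced this way is always at least 2 when x < 1.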
1.1 Continued fractions and Diophantine approximation | 7
Theorem 1.1.8. For all irrational numbers x = [x1 , x2 , x3 , . . .] and for all n ∈ N, we have
that the inequality
\[
\left| x - \frac{p_i}{q_i} \right| < \frac{1}{2 q_i^2}
\]
Proof. By way of contradiction, suppose that the statement in the theorem is false.
This means that there exists some n ∈ N such that the inequality
\[
\left| x - \frac{p_i}{q_i} \right| \ge \frac{1}{2 q_i^2}
\]
r i+1 + s i ≤ 2, for i = n, n + 1.
r n+2 ≤ 2 − s n+1 .
\[
0 \ge (s_{n+1} - 1)^2.
\]
Since for each n ∈ N we have that $s_{n+1} \ne 1$, this contradiction finishes the proof.
Theorem 1.1.9 (Hurwitz’s Theorem I). For all irrational numbers x = [x1 , x2 , x3 , . . .] and
for all n ∈ N, we have that the inequality
\[
\left| x - \frac{p_i}{q_i} \right| < \frac{1}{\sqrt{5}\, q_i^2}
\]
Proceeding for i = n and i = n + 1 as in (i) and (ii) in the previous proof, we derive the
inequality
\[
s_{n+1}^2 - \sqrt{5}\, s_{n+1} + 1 \le 0. \tag{1.4}
\]
The strict inequality follows from the fact that γ and γ * are both irrational and s n is
rational. Using this, we obtain that
\[
s_{n+2} = \frac{1}{x_{n+2} + s_{n+1}} \le \frac{1}{1 + s_{n+1}} < \frac{1}{1 + \gamma^*} = \gamma^*,
\]
about the golden mean and why it is supposed to be so interesting and/or important;
unfortunately, a lot of these are simply wrong (see [Mar92]). However, the continued
fraction expansion of γ does give a clue as to why this particular number turns up so
often. Observe that γ is one of the two roots of the equation $x^2 - x - 1 = 0$. Writing this
another way, we have that
\[
\gamma = 1 + \frac{1}{\gamma}.
\]
Then, substituting for the γ which appears on the right-hand side of the above
equality, we obtain that
\[
\gamma = 1 + \cfrac{1}{1 + \cfrac{1}{\gamma}}.
\]
Clearly this process can be repeated infinitely often, to yield the continued fraction
expansion γ = 1 + [1, 1, 1, . . .], where the fractional part consists of infinitely many
ones. The other number γ * appearing in the proof of Theorem 1.1.9 is simply equal to
γ − 1 = [1, 1, 1, . . .]. Since we are mostly concerned in this book with numbers in the
unit interval, we shall, in a slight abuse of terminology, also refer to this number as
the golden mean.
The next theorem shows that the constant $1/\sqrt{5}$ that appears in Theorem 1.1.9
cannot be improved for arbitrary irrational numbers. The proof again relies upon the
golden mean.
Theorem 1.1.11 (Hurwitz’s Theorem II). For the golden mean γ* we have that the inequality
\[
\left| \gamma^* - \frac{p_n}{q_n} \right| \le \frac{C}{q_n^2}
\]
is satisfied for at most finitely many convergents $p_n/q_n$ if and only if $C < 1/\sqrt{5}$.
Proof. Firstly, note that since γ * = [1, 1, 1, . . .], we have for the remainders that r n :=
x n + [x n+1 , x n+2 , . . .] is in this case given by r n = 1 + [1, 1, 1, . . .] = 1/γ * , for all n ∈ N.
Secondly, note that
where for δ n we have that limn→∞ δ n = 0. Hence, from these two observations it
follows that
\[
r_{n+1} + s_n = \frac{1}{\gamma^*} + \gamma^* + \delta_n = \frac{\sqrt{5} + 1}{2} + \frac{\sqrt{5} - 1}{2} + \delta_n = \sqrt{5} + \delta_n.
\]
where the last inequality can only be fulfilled for finitely many n, due to the fact that
$\sqrt{5} + \rho < \sqrt{5} + \delta_n$ can be satisfied for at most finitely many n.
Corollary 1.1.12. For each irrational number x ∈ (0, 1], the inequality
\[
\left| x - \frac{p_n}{q_n} \right| \le \frac{K}{q_n^2}
\]
is fulfilled for infinitely many reduced $p_n/q_n$ as long as $K \ge 1/\sqrt{5}$.
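The role of the constant $1/\sqrt{5}$ for the golden mean can be observed numerically. The following Python sketch computes $q_n^2\,|\gamma^* - p_n/q_n|$ for the convergents of $\gamma^* = [1, 1, 1, \ldots]$, which by the computation in the proof of Theorem 1.1.11 equals $1/(\sqrt{5} + \delta_n)$ with $\delta_n \to 0$. It uses floating-point arithmetic, so it is only an illustration for moderate n; the names and the starting values $p_{-1} := 1$, $q_{-1} := 0$ are our assumptions.

```python
from math import sqrt

GOLDEN_STAR = (sqrt(5) - 1) / 2    # gamma* = [1, 1, 1, ...]

def scaled_errors(n_max: int) -> list[float]:
    """Return q_n^2 * |gamma* - p_n/q_n| for n = 1, ..., n_max."""
    p_prev, q_prev = 1, 0           # assumed values for p_{-1}, q_{-1}
    p, q = 0, 1                     # p_0 := 0, q_0 := 1
    out = []
    for _ in range(n_max):
        p, p_prev = p + p_prev, p   # every digit x_n equals 1
        q, q_prev = q + q_prev, q
        out.append(q * q * abs(GOLDEN_STAR - p / q))
    return out

errors = scaled_errors(15)
print(errors[-1], 1 / sqrt(5))      # the two numbers should be close
```

Rounding error grows with $q_n^2$, so for larger n an exact or high-precision representation of $\gamma^*$ would be needed.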
We now want to investigate some further results in the vein of Hurwitz’s Theorems
and Corollary 1.1.12 given above. Before that, it will be useful to make some further
definitions.
Definition 1.1.13.
(a) Let c denote a fixed positive real number. An irrational number x is said to be
c-approximable if and only if the inequality
\[
\left| x - \frac{p}{q} \right| < \frac{c}{q^2}
\]
So, for every irrational number x it follows directly from Theorem 1.1.9 (Hurwitz’s
Theorem I) that $\nu(x) \le 1/\sqrt{5}$. From Hurwitz’s Theorem II and the fact that if x and y are
equivalent then ν (x) = ν (y) (see Exercise 1.6.6), it follows that if an irrational number
x is noble, then $\nu(x) = 1/\sqrt{5}$.
is fulfilled for all i ∈ {n, n + 1, n + 2}, then it follows that x n+2 < N.
Proof. We proceed as in the proofs of Theorems 1.1.8 and 1.1.9, with 2 and $\sqrt{5}$,
respectively, now replaced by $\sqrt{N^2 + 4}$. In this way, considering in sequence i = n and
i = n + 1, we derive
Proposition 1.1.16.
(a) Fix N ∈ N. If x is an irrational number in [0, 1] such that $x \notin B_N$, then
\[
\left| x - \frac{p_n}{q_n} \right| \le \frac{1}{q_n^2 \sqrt{N^2 + 4}}
\]
is fulfilled for infinitely many n ∈ N. In other words, we have that $\nu(x) \le 1/\sqrt{N^2 + 4}$.
(b) For each x ∈ B there exists a constant C > 0 such that for all n ∈ N we have
\[
\left| x - \frac{p_n}{q_n} \right| > \frac{C}{q_n^2}.
\]
Proof. The first part is an immediate consequence of Theorem 1.1.14. For the second
part, fix x = [x1 , x2 , x3 , . . .] ∈ B. Then there exist numbers M and m0 such that x n < M
for all n ≥ m0 . Using this, for such an n we derive the inequality
and hence
\[
\left| x - \frac{p_n}{q_n} \right| > \frac{1}{q_n^2 (M + 2)}, \quad \text{for all } n \ge m_0.
\]
For each of the finitely many n < m0 we have that there exists a number c n > 0 such
that
\[
\left| x - \frac{p_n}{q_n} \right| > \frac{c_n}{q_n^2}.
\]
For certain types of point, which we define below, the orbit is very easy to determine.
1.2 Topological Dynamical Systems | 13
Suppose that we are given two dynamical systems (X, T) and (Y , S). It is desirable to
have conditions under which these two systems should be considered dynamically
equivalent, that is, as in some dynamical sense, “the same”. The sense we are after is
that their orbits should behave in the same way. The following definition does exactly
this job.
Definition 1.2.2. Two dynamical systems (X, T) and (Y , S) are said to be topologically
conjugate if there exists a homeomorphism h : X → Y, called a conjugacy map, such
that
h ◦ T = S ◦ h.
In other words, (X, T) and (Y , S) are topologically conjugate if there exists a homeo-
morphism h such that the following diagram commutes:
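\[
\begin{array}{ccc}
X & \xrightarrow{\;T\;} & X \\
{\scriptstyle h}\downarrow & & \downarrow{\scriptstyle h} \\
Y & \xrightarrow{\;S\;} & Y
\end{array}
\]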
Remark 1.2.3.
1. Topological conjugacy defines an equivalence relation on the space of all topolo-
gical dynamical systems.
2. If two dynamical systems (X, T) and (Y , S) are topologically conjugate via a
conjugacy map h, then all of their corresponding iterates are topologically
conjugate by means of h. That is, $h \circ T^n = S^n \circ h$ for all n ≥ 1. Therefore, there
exists a one-to-one correspondence between the orbits of T and those of S.
Suppose that we are given two topologically conjugate dynamical systems, (X, T) and
(Y , S). If T(x) = x, it follows that S(h(x)) = h(T(x)) = h(x),
and so h(x) is a fixed point of the map S. Thus, there is a one-to-one correspondence
between the fixed points of T and the fixed points of S. In particular, if the number of
fixed points of T and S are not equal, the systems cannot be topologically conjugate.
The number of fixed points is an example of a topological conjugacy invariant. The
number of periodic points of each prime period is similarly seen to be a topological
conjugacy invariant.
Example 1.2.4.
(a) Consider the two maps of the unit circle T2 : R/Z → R/Z and T3 : R/Z → R/Z,
defined by setting
The map T2 has a single fixed point at 0 and the set of fixed points of the map T3 is
{0, 1/2}. Therefore, the number of fixed points is not the same and so the systems
(R/Z, T2 ) and (R/Z, T3 ) cannot be topologically conjugate.
(b) Let f : [0, 1] → [0, 1] be given by $f(x) := \sqrt{x}$ and g : [0, 1] → [0, 1] be given by
g(x) := 3x(1 − x). Then the set of fixed points of f is {0, 1}, whereas the set of fixed
points of g is {0, 2/3}. However, although they have the same number of fixed
points, there is no topological conjugacy map between ([0, 1], f ) and ([0, 1], g).
This can be seen by noting that every homeomorphism h : [0, 1] → [0, 1] is either
strictly increasing or strictly decreasing. In order to have h be a conjugating
homeomorphism between f and g, we would have to have h(0) = 0 and h(1) = 2/3
or vice versa. But this is simply not possible for a strictly monotonic function that
also has to map [0, 1] onto [0, 1].
In the next definition, we give a weaker notion than that of topological conjugacy.
Definition 1.2.5. Let (X, T) and (Y , S) be two dynamical systems. If there exists a
continuous surjection h : X → Y which satisfies h ◦ T = S ◦ h, then S is called a
(topological) factor of T. The map h is thereafter called a factor map.
In general, the existence of a factor map between two systems is not sufficient to make
them topologically conjugate. Nonetheless, if (Y , S) is a factor of (X, T), then every
orbit of T is projected to an orbit of S. As every factor map is by definition surjective,
this means that all of the orbits of S have an analogue in T. However, as a factor map
may not be injective, more than one orbit of T may be projected to the same orbit of S.
In other words, some orbits of S may have more than one analogue in T. Therefore, the
dynamical system (X, T) can in this sense be thought of as being more “complicated”
than the factor (Y , S).
In the following subsection, we introduce the first of the examples of
number-theoretic dynamical systems that will be used to illustrate various concepts
throughout the book. We will soon see that this map is related to the continued fraction
expansion introduced in Section 1.1.1.
The map G is referred to as the Gauss map and its graph is shown in Fig. 1.1.
Let us also define here the inverse branches G n : (0, 1) → (1/(n + 1), 1/n) of the Gauss
map. These are given, for each n ∈ N, by
\[
G_n(x) := \frac{1}{x + n}.
\]
In the following proposition, we shall show how the map G acts on points in the unit
interval written in terms of their continued fraction expansion.
Proposition 1.2.7. If x = [x1 , x2 , x3 , . . .] ∈ [0, 1], where the continued fraction expansion
of x is either infinite or finite and consists of at least two elements, then G(x) =
[x2 , x3 , x4 . . .]. Moreover, if x = [x1 ], then G(x) = 0.
Proof. If x = [x1 , x2 , x3 , . . .], then directly from the definition of G, we have that
\[
G(x) = \frac{1}{x} - \left\lfloor \frac{1}{x} \right\rfloor = x_1 + \cfrac{1}{x_2 + \cfrac{1}{x_3 + \cdots}} - x_1 = [x_2, x_3, \ldots].
\]
Fig. 1.1. The Gauss map G : [0, 1] → [0, 1], $G(x) := 1/x - \lfloor 1/x \rfloor$ for x ∈ (0, 1] and G(0) := 0.
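Proposition 1.2.7 can be tried out directly on rational points, where exact arithmetic is available. The following Python sketch implements the Gauss map in the form $G(x) = 1/x - \lfloor 1/x \rfloor$ and reads off the continued fraction digits along the orbit of x; the function names are ours.

```python
from fractions import Fraction
from math import floor

def gauss(x: Fraction) -> Fraction:
    """Gauss map: G(x) = 1/x - floor(1/x) for x in (0, 1], and G(0) = 0."""
    if x == 0:
        return Fraction(0)
    y = 1 / x
    return y - floor(y)

def digits_via_gauss(x: Fraction) -> list[int]:
    """Read off x_1, x_2, ... as floor(1/G^n(x)) until the orbit reaches 0."""
    out = []
    while x != 0:
        out.append(floor(1 / x))
        x = gauss(x)
    return out

x = Fraction(7, 30)                    # 7/30 = [4, 3, 2]
print(digits_via_gauss(x))             # digits of x
print(digits_via_gauss(gauss(x)))      # digits of G(x): the first digit is dropped
```

Here G(7/30) = 2/7 = [3, 2], so applying G removes the leading digit 4, and a point of the form [x1] = 1/x1 is sent to 0.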
Using the latter proposition, we can very easily identify the fixed points and periodic
points for the map G. First of all, by definition, G has a fixed point at 0. There
are countably many more fixed points, given by the points x = [n, n, n, . . .], for
n ∈ N. For n = 1, this is the golden mean, γ*. For n = 2, we have the fixed point
$\sqrt{2} - 1 = [2, 2, 2, \ldots]$. The periodic points for G are simply the periodic continued
fractions, that is, points with continued fraction expansions of the form $[\overline{x_1, \ldots, x_k}] :=
[x_1, \ldots, x_k, x_1, \ldots, x_k, x_1, \ldots]$, with the block $x_1, \ldots, x_k$ repeating infinitely many
times. It remains to describe the pre-periodic points. The pre-periodic points of
the trivial fixed point 0 are the rational numbers. The pre-periodic points for all
other fixed points are numbers of the form x = [x1 , . . . , x k , n, n, n, . . .], that is, with
finitely many elements that can take any value, before infinitely many of the same
element n, for some n ∈ N. In particular, the pre-periodic points for the fixed point
γ * are the set of noble numbers (cf. Definition 1.1.13 (d)). Finally, the pre-periodic
points for the remaining periodic points are those points in the unit interval with
eventually periodic continued fraction expansions, that is, numbers of the form x =
$[x_1, \ldots, x_m, \overline{x_{m+1}, \ldots, x_{m+k}}]$. These eventually periodic expansions are the subject of
the following theorem. Before stating the theorem, recall that a quadratic surd is an
irrational root of a quadratic equation with integer coefficients. These are also called
algebraic numbers of degree two.
Theorem 1.2.8 (Lagrange’s Theorem). Every quadratic surd has an eventually periodic
continued fraction expansion; conversely, every eventually periodic continued fraction
represents a quadratic surd.
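The simplest instance of this theorem is given by the non-trivial fixed points of G. A point x = [n, n, n, . . .] satisfies x = 1/(n + x), that is, $x^2 + nx - 1 = 0$, so that $x = (\sqrt{n^2 + 4} - n)/2$, a quadratic surd. The following Python sketch checks numerically, in floating-point arithmetic and hence only approximately, that these points are fixed by the Gauss map; the function names are ours.

```python
from math import floor, sqrt, isclose

def gauss(x: float) -> float:
    """Gauss map G(x) = 1/x - floor(1/x) for x in (0, 1], and G(0) = 0."""
    if x == 0.0:
        return 0.0
    y = 1.0 / x
    return y - floor(y)

def fixed_point(n: int) -> float:
    """Positive root of x^2 + n x - 1 = 0, i.e. x = 1/(n + x) = [n, n, n, ...]."""
    return (sqrt(n * n + 4) - n) / 2

for n in range(1, 6):
    x = fixed_point(n)
    print(n, x, isclose(gauss(x), x, abs_tol=1e-9))
```

For n = 1 this recovers the golden mean γ* and for n = 2 the point $\sqrt{2} - 1$.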
Recall Proposition 1.2.7, where we showed how the Gauss map acts on the continued
fraction expansions. In this section, we will introduce some more general theory
which will illuminate the idea behind this proposition, namely, the beginnings
of the theory of symbolic dynamics and its connections to topological dynamical
systems which admit a Markov partition (see Section 1.2.5 for the definition of
Markov partitions in the context of interval maps). We will only provide a very
short introduction, and for more details we refer, for example, to Lind and Marcus
[LM95].
Definition 1.2.9.
(a) Let E be a countable, possibly infinite set containing at least two elements. The
set E will be referred to as an alphabet. The elements of E will be called letters or
symbols.
(b) For each n ∈ N we shall denote by $E^n$ the set of all words comprising n letters from
the alphabet E. For later convenience, we also denote the empty word (that is, the
word having no letters) by ε. For instance, if E = {0, 1} then
$E^1 = E$ and $E^2 = \{(0, 0), (0, 1), (1, 0), (1, 1)\}$, whereas
$E^3 = \{(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1)\}$.
(c) We will denote by $E^* := \bigcup_{n \in \mathbb{N}} E^n$ the set of all finite non-empty words over the
alphabet E. The set of all infinite words will be denoted by $E^{\mathbb{N}}$. In other words,
\[
E^{\mathbb{N}} := \left\{ \omega = (\omega_i)_{i=1}^{\infty} : \omega_i \in E \text{ for all } i \in \mathbb{N} \right\}.
\]
(d) We define the length |ω| of a word ω to be the number of letters of which it consists.
That is, for every ω ∈ E* , the length of ω is the unique n ∈ N such that ω ∈ E n . For
ω ∈ EN , we have that |ω| = ∞. Furthermore, |ε| = 0.
(e) If ω ∈ E* ∪ EN and n ∈ N does not exceed the length of ω, we define the initial block
ω|n to be the initial n-length word of ω, that is, the subword ω1 ω2 . . . ω n .
(f) Given two words ω, τ ∈ E* ∪ EN , we define their wedge ω ∧ τ ∈ {ε} ∪ E* ∪ EN to be
their longest common initial block. For example, if we again let E = {0, 1} and we
have words ω = (0, 0, 1, 0, 1, . . .) and τ = (0, 0, 1, 1, 0, . . .), then ω ∧ τ = (0, 0, 1).
On the other hand, if γ = (1, 0, 1, 0, 1, . . .) then ω ∧ γ = ε. Of course, if two (finite
or infinite) words ω and τ are equal, then ω ∧ τ = ω = τ.
Let us now introduce a metric on the space EN which reflects the idea that two words
are close if they share a long initial block. In other words, the longer their common
initial subword, the closer two words are. We leave the proof that this genuinely
defines a metric, in fact an ultrametric, as an exercise (see Exercise 1.6.10).
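To make the notions of wedge and distance concrete, the following Python sketch works with finite tuples standing in for initial blocks of infinite words. The wedge is computed as in Definition 1.2.9 (f). For the distance we assume the common choice $d(\omega, \tau) := 2^{-|\omega \wedge \tau|}$ for distinct words and 0 for equal words; this formula is our assumption for illustration and should be compared with the definition actually used in the text.

```python
def wedge(omega, tau):
    """Longest common initial block of two words given as tuples."""
    block = []
    for a, b in zip(omega, tau):
        if a != b:
            break
        block.append(a)
    return tuple(block)

def dist(omega, tau):
    """Assumed metric d = 2^(-|omega ^ tau|); equal words have distance 0."""
    if omega == tau:
        return 0.0
    return 2.0 ** (-len(wedge(omega, tau)))

# Truncations of the example words from Definition 1.2.9 (f).
omega = (0, 0, 1, 0, 1)
tau = (0, 0, 1, 1, 0)
gamma = (1, 0, 1, 0, 1)
print(wedge(omega, tau), dist(omega, tau))      # common block (0, 0, 1)
print(wedge(omega, gamma), dist(omega, gamma))  # empty common block
```

With this choice the three sample words satisfy the ultrametric inequality d(ω, γ) ≤ max{d(ω, τ), d(τ, γ)}, since ω and τ are close to each other while both are at distance 1 from γ.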
Definition 1.2.11. Given a finite word ω ∈ E* , the cylinder set [ω] generated by ω is the
set of all infinite words with initial block ω, that is,
We now introduce the shift map, which is defined by dropping the first letter of each
word and shifting all the remaining letters one place to the left.
Definition 1.2.12. The shift map $\sigma : E^{\mathbb{N}} \to E^{\mathbb{N}}$ is defined by setting $\sigma(\omega) = \sigma((\omega_i)_{i \ge 1}) :=
(\omega_{i+1})_{i \ge 1}$. That is,
It is evident that if ω = τ, then both d(ω, τ) and d(σ(ω), σ(τ)) are equal to zero.
The system (EN , σ) will be referred to as the full shift system. Let us now consider
another useful construction, that of sub-shifts. These are restrictions of the shift map
to certain closed and shift-invariant subsets of EN and are easiest to define in terms of
an incidence matrix, that is, an (E × E)-matrix consisting entirely of 0s and 1s.
Definition 1.2.13. Let $A = (A_{ij})_{i,j \in E}$ be an incidence matrix. The set of all infinite
A-admissible words is defined to be
\[
E_A^{\mathbb{N}} := \left\{ \omega \in E^{\mathbb{N}} : A_{\omega_n \omega_{n+1}} = 1, \text{ for all } n \in \mathbb{N} \right\}.
\]
Note that if the n-th row of A does not contain any 1, then no word can contain the
letter n. This letter then may as well be thrown out of the alphabet. We will therefore
impose the condition that every row of A contains at least one 1. Further, notice that
if all the entries of the incidence matrix A are equal to 1, then $E_A^{\mathbb{N}} = E^{\mathbb{N}}$. However, if A
has at least one 0 entry, then $E_A^{\mathbb{N}}$ is a proper subset of $E^{\mathbb{N}}$, called a sub-shift (of finite
type). In particular, if A is the identity matrix then $E_A^{\mathbb{N}} = \{(e, e, e, \ldots) : e \in E\}$, that is,
$E_A^{\mathbb{N}}$ is the set of all constant words, which are the fixed points of σ in $E^{\mathbb{N}}$. We will now
consider a more interesting sub-shift.
shift. One reason for this name can be discovered on considering the cardinality of
the sets $E_A^n$, for n ∈ N. We have (and the reader is advised to draw a diagram of
the A-admissible words to see why) that the sequence $(\# E_A^n)_{n \ge 1}$ coincides with the
sequence (2, 3, 5, 8, 13, 21, 34, . . .). This latter sequence is, of course, the sequence
of Fibonacci numbers starting with $f_2 = 2$ and $f_3 = 3$ instead of $f_0 = 1$ and $f_1 = 1$.
Observe that we have for the sequence of convergents $(p_n/q_n)_{n \ge 1}$ to $\gamma = (1 + \sqrt{5})/2$
that $p_n/q_n = f_{n+2}/f_{n+1}$ for each n ∈ N.
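The count of admissible words can be reproduced by brute force. The following Python sketch assumes, for illustration, the alphabet E = {0, 1} and an incidence matrix that forbids only the transition from 1 to 1; this particular matrix is our assumption and should be compared with the one defining the sub-shift in the text. It enumerates all words of length n and keeps the A-admissible ones.

```python
from itertools import product

# Assumed incidence matrix over E = {0, 1}: only the transition 1 -> 1 is forbidden.
A = {(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def admissible_words(n, alphabet=(0, 1), incidence=A):
    """All words of length n whose consecutive letters are allowed by the matrix."""
    return [w for w in product(alphabet, repeat=n)
            if all(incidence[(w[i], w[i + 1])] == 1 for i in range(n - 1))]

counts = [len(admissible_words(n)) for n in range(1, 8)]
print(counts)
```

Under this assumption the counts for n = 1, . . . , 7 are 2, 3, 5, 8, 13, 21, 34, in agreement with the sequence quoted above.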
In order to define the shift map on a sub-shift space $E_A^{\mathbb{N}}$, we must first verify that these
spaces are σ-invariant, which means that $\sigma(E_A^{\mathbb{N}}) \subseteq E_A^{\mathbb{N}}$. So, let $\omega \in E_A^{\mathbb{N}}$. Then $A_{\omega_n \omega_{n+1}} = 1$
for every n ∈ N. In particular, $A_{(\sigma\omega)_n (\sigma\omega)_{n+1}} = A_{\omega_{n+1} \omega_{n+2}} = 1$ for all n ∈ N. Thus, $\sigma\omega \in E_A^{\mathbb{N}}$
and it therefore follows that $E_A^{\mathbb{N}}$ is σ-invariant. Therefore, the restriction $\sigma : E_A^{\mathbb{N}} \to E_A^{\mathbb{N}}$ is
well defined.
Let us now return to the Gauss map. In light of the above description of symbolic
dynamics and the shift map, we can now rephrase Proposition 1.2.7 in the following
way: The Gauss map acts on the irrational points of the unit interval like the full shift
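map acts on the space $\mathbb{N}^{\mathbb{N}}$. More precisely, we have that the topological dynamical
systems (I, G) and $(\mathbb{N}^{\mathbb{N}}, \sigma)$ are topologically conjugate under the map $h : I \to \mathbb{N}^{\mathbb{N}}$ defined
by $h([x_1, x_2, x_3, \ldots]) = (x_1, x_2, x_3, \ldots) \in \mathbb{N}^{\mathbb{N}}$. In other words, we obtain the following
commuting diagram:
\[
\begin{array}{ccc}
I & \xrightarrow{\;G\;} & I \\
{\scriptstyle h}\downarrow & & \downarrow{\scriptstyle h} \\
\mathbb{N}^{\mathbb{N}} & \xrightarrow{\;\sigma\;} & \mathbb{N}^{\mathbb{N}}
\end{array}
\]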
We leave it as an exercise for the reader to verify that the map h is genuinely a
conjugacy map between the Gauss and shift systems (see Exercise 1.6.12).
Recall the sequence of convergents to each irrational number x ∈ [0, 1] introduced
in Subsection 1.1.1; these are the equivalent of initial blocks for the Gauss map and so
we can use them to define Gauss cylinder sets.
Definition 1.2.15. For each choice x1 , . . . , x k ∈ N, define the k-th level Gauss cylinder
set C(x1 , . . . , x k ) by
The first level of these cylinder sets are given by the sets, for n ∈ N,
for some y1 , y2 , . . . ∈ N. Therefore, since the values of [y1 , y2 , . . .] lie between 0 and 1,
we see that such a point is bigger than 1/(n +1) and smaller than 1/n. Therefore, C(n) =
(1/(n + 1), 1/n) ∩ I. In general, if we fix some x1 , . . . , x k ∈ N and let x ∈ C(x1 , . . . , x k ),
we have that
\[
x = \cfrac{1}{x_1 + \cfrac{1}{\ddots + \cfrac{1}{x_k + [y_1, y_2, \ldots]}}}
\]
and again, since the values of [y1 , y2 , . . .] range from 0 to 1, we infer that x can
take any value in the set ([x1 , . . . , x k ], [x1 , . . . , (x k + 1)])± ∩ I, where the notation (·, ·)±
indicates that the rational number [x1 , . . . , x k ] may be the left or right endpoint of
C(x1 , . . . , x k ) depending upon whether k is even or odd, respectively. Thus, for the
Lebesgue measure λ(C(x1 , . . . , x k )) of this set, we find that
\[
\lambda(C(x_1, \ldots, x_k)) = \left| \frac{p_k}{q_k} - \frac{p_k + p_{k-1}}{q_k + q_{k-1}} \right| = \frac{1}{q_k^2 \left( 1 + \dfrac{q_{k-1}}{q_k} \right)}.
\]
Here, the last equality comes from Theorem 1.1.1 (c). From this, we immediately obtain
that
\[
\frac{1}{2 q_k^2} \le \lambda(C(x_1, \ldots, x_k)) \le \frac{1}{q_k^2}. \tag{1.7}
\]
One easily verifies that the second ratio on the right-hand side of the above equality is
bounded above by 2 and below by 1/3, so finally we find that
\[
\frac{1}{3 n^2} \le \frac{\lambda(C(x_1, \ldots, x_k, n))}{\lambda(C(x_1, \ldots, x_k))} \le \frac{2}{n^2}. \tag{1.8}
\]
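The estimates (1.7) and (1.8) are easy to test on examples. The following Python sketch computes the length of a Gauss cylinder as the distance between the rational numbers $[x_1, \ldots, x_k]$ and $[x_1, \ldots, x_k + 1]$, using exact arithmetic, and then checks both bounds for one sample cylinder. The function names and the sample digits are ours.

```python
from fractions import Fraction

def cf_value(digits):
    """Exact value of the finite continued fraction [x1, ..., xk]."""
    value = Fraction(0)
    for a in reversed(list(digits)):
        value = 1 / (a + value)
    return value

def cylinder_length(digits):
    """Lebesgue measure of C(x1, ..., xk): distance from [x1..xk] to [x1..xk + 1]."""
    bumped = list(digits[:-1]) + [digits[-1] + 1]
    return abs(cf_value(digits) - cf_value(bumped))

d = (3, 1, 4)
lam = cylinder_length(d)
q = cf_value(d).denominator            # q_k, since p_k/q_k is in lowest terms
print(lam, Fraction(1, 2 * q * q) <= lam <= Fraction(1, q * q))      # (1.7)
for n in range(1, 7):
    ratio = cylinder_length(d + (n,)) / lam
    print(n, Fraction(1, 3 * n * n) <= ratio <= Fraction(2, n * n))  # (1.8)
```

For the digits 3, 1, 4 the endpoints are 5/19 and 6/23, so the cylinder has length 1/437, which indeed lies between 1/722 and 1/361.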
Let us now consider another area of elementary number theory related to the con-
tinued fraction expansion. In this subsection, we will prove some results concerning
the Lebesgue measure λ of various sets of numbers, beginning with the set of badly
approximable numbers that was introduced in Definition 1.1.15. The proofs will utilise
the Gauss cylinder sets defined above.
Theorem 1.2.16. Where B denotes the set of badly approximable numbers, we have that
λ(B) = 0.
\[
A_N^{(n)} := \left\{ [x_1, x_2, \ldots] \in I : x_i < N \text{ for all } 1 \le i \le n \right\}.
\]
\[
A_N^{(n+1)} = \bigcup_{\substack{(x_1, \ldots, x_n) \\ x_i < N,\ 1 \le i \le n}} \ \bigcup_{k : k < N} C(x_1, \ldots, x_n, k).
\]
Then,
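\begin{align*}
\lambda\Bigl( \bigcup_{k : k < N} C(x_1, \ldots, x_n, k) \Bigr)
&= \left| \frac{p_n + p_{n-1}}{q_n + q_{n-1}} - \frac{p_n N + p_{n-1}}{q_n N + q_{n-1}} \right| \\
&= \frac{N - 1}{q_n^2 (1 + s_n)(N + s_n)} \\
&< \frac{N - 1}{q_n^2 N (1 + s_n)} = \frac{N - 1}{N}\, \lambda(C(x_1, \ldots, x_n)).
\end{align*}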
Thus,
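\begin{align*}
\lambda\bigl( A_N^{(n+1)} \bigr)
&= \lambda\Bigl( \bigcup_{\substack{(x_1, \ldots, x_n) \\ x_i < N,\ 1 \le i \le n}} \ \bigcup_{k : k < N} C(x_1, \ldots, x_n, k) \Bigr) \\
&= \sum_{\substack{(x_1, \ldots, x_n) \\ x_i < N,\ 1 \le i \le n}} \lambda\Bigl( \bigcup_{k : k < N} C(x_1, \ldots, x_n, k) \Bigr) \\
&\le \frac{N - 1}{N} \sum_{\substack{(x_1, \ldots, x_n) \\ x_i < N,\ 1 \le i \le n}} \lambda(C(x_1, \ldots, x_n)) = \Bigl( 1 - \frac{1}{N} \Bigr) \lambda\bigl( A_N^{(n)} \bigr).
\end{align*}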
λ (AN ) = 0.
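Finally, observing that $B = \bigcup_{N \in \mathbb{N}} A_N$, we have
\[
\lambda(B) = \lambda\Bigl( \bigcup_{N \in \mathbb{N}} A_N \Bigr) \le \sum_{N \in \mathbb{N}} \lambda(A_N) = 0.
\]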
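The geometric decay used in this proof can be observed in a small case. The following Python sketch computes the exact Lebesgue measure of $A_N^{(n)}$ by summing the lengths of all cylinders $C(x_1, \ldots, x_n)$ with digits below N, and compares consecutive levels with the factor 1 − 1/N. The function names are ours, and the enumeration is only feasible for small N and n, since it visits $(N-1)^n$ cylinders.

```python
from fractions import Fraction
from itertools import product

def cf_value(digits):
    """Exact value of the finite continued fraction [x1, ..., xk]."""
    value = Fraction(0)
    for a in reversed(list(digits)):
        value = 1 / (a + value)
    return value

def cylinder_length(digits):
    """Lebesgue measure of C(x1, ..., xk): distance from [x1..xk] to [x1..xk + 1]."""
    bumped = list(digits[:-1]) + [digits[-1] + 1]
    return abs(cf_value(digits) - cf_value(bumped))

def measure_A(N, n):
    """Exact measure of A_N^(n): the first n digits are all smaller than N."""
    return sum(cylinder_length(d) for d in product(range(1, N), repeat=n))

N = 3
levels = [measure_A(N, n) for n in range(1, 7)]
for n in range(len(levels) - 1):
    # Each step loses at least the factor 1 - 1/N.
    print(n + 1, levels[n + 1] <= (1 - Fraction(1, N)) * levels[n])
```

For N = 3 the first two values are 2/3 and 29/84, and every level is bounded by $(2/3)^n$, consistent with the limit $\lambda(A_N) = 0$.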
Corollary 1.2.17. Let $W := \{ [x_1, x_2, \ldots] \in I : \limsup_{n \to \infty} x_n = \infty \}$. Then,
λ(W ) = 1.
Proof. The set W is simply the complement of the set B of badly approximable
numbers.
We have now seen that the set of badly approximable numbers does not contribute
to sets of irrational numbers of positive Lebesgue measure. Hence, if we want to
investigate sets of positive measure, then we have to look for irrationals which
are more rapidly approximated by their approximants than is the case for badly
approximable irrationals. Our next aim is to prove a theorem, originally due to Borel
and Bernstein¹, which will give us some information in this direction. In order to prove
this theorem, we will need the following well-known and extremely useful result. The
proof is neither long nor complicated, so we include it here for completeness.
λ(C∞ ) = 0.
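Proof. The convergence of $\sum_{n=1}^{\infty} \lambda(C_n)$ implies that for each ε > 0, there exists some
n(ε) ∈ N such that
\[
\sum_{n \ge n(\varepsilon)} \lambda(C_n) < \varepsilon.
\]
\[
C_\infty \subset \bigcup_{n \ge n(\varepsilon)} C_n.
\]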
1 Borel’s original article [Bor09] contained a mistake, which was observed and corrected by Bernstein
[Ber12b, Ber12a]
24 | 1 Number-theoretical dynamical systems
We are now in a position to prove the second and final main theorem of this
subsection.
λ(Wφ ) = 1,
(b) If the series $\sum_{n=1}^{\infty} 1/\varphi(n)$ converges, then
\[
\lambda(W_\varphi) = 0.
\]
Proof. To prove part (a) we show that the complement of Wφ has measure zero. The
proof follows along the same lines as the proof of Theorem 1.2.16. As before, we obtain
that
\[
\lambda\Big( \bigcup_{1 \le k < \varphi(n+1)} C(x_1, \dots, x_n, k) \Big) < \Big( 1 - \frac{1}{\varphi(n+1)} \Big) \lambda\big( C(x_1, \dots, x_n) \big).
\]
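Using the fact that $1 - x < e^{-x}$ for each 0 < x < 1, we have that
\[
\lambda\big( B_\varphi^{(m,n+1)} \big) < e^{-\sum_{k=m}^{n} 1/\varphi(k+1)} \, \lambda\big( B_\varphi^{(m,m)} \big),
\]
which implies, since by assumption the series $\sum_{k=m}^{n} 1/\varphi(k+1)$ gets arbitrarily large,
that for each m ∈ N,
\[
\lim_{n \to \infty} \lambda\big( B_\varphi^{(m,n)} \big) = 0.
\]
Hence, as $B_\varphi^{m} := \{ [x_1, x_2, \dots] \in I : x_i \le \varphi(i) \text{ for all } i \ge m \} \subset B_\varphi^{(m,n)}$ for all n ∈ N, we finally
obtain that $\lambda(B_\varphi^{m}) = 0$. Consequently,
\[
\lambda(W_\varphi^{C}) = \lambda\Big( \bigcup_{m \ge 1} B_\varphi^{m} \Big) = 0.
\]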
It remains to prove part (b). For this, we aim to use the Borel–Cantelli Lemma. To that
end, define the sets $W_\varphi^{(n)}$, for n ∈ N, by setting
So, in order to apply the Borel–Cantelli Lemma, it suffices to show that the series
$\sum_{n=1}^{\infty} \lambda\big( W_\varphi^{(n)} \big)$ converges. Indeed,
\[
\lambda\big( W_\varphi^{(n+1)} \big) = \lambda\Big( \bigcup_{(x_1, \dots, x_n) \in \mathbb{N}^n} \ \bigcup_{k > \varphi(n+1)} C(x_1, \dots, x_n, k) \Big) \tag{1.9}
\]
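and
\[
\begin{aligned}
\lambda\Big( \bigcup_{k > \varphi(n+1)} C(x_1, \dots, x_n, k) \Big)
&= \left| \frac{p_n \varphi(n+1) + p_{n-1}}{q_n \varphi(n+1) + q_{n-1}} - \frac{p_n}{q_n} \right| \\
&= \frac{1}{q_n^2 (1 + s_n)} \cdot \frac{1 + s_n}{\varphi(n+1) + s_n} \\
&< \frac{2}{\varphi(n+1)} \, \lambda(C(x_1, \dots, x_n)).
\end{aligned} \tag{1.10}
\]
and so,
\[
\sum_{n=1}^{\infty} \lambda\big( W_\varphi^{(n+1)} \big) < 2 \sum_{n=1}^{\infty} \frac{1}{\varphi(n)} < \infty.
\]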
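Proof. This follows from Theorem 1.2.19 immediately on the observation that
\[
\sum_{n=2}^{\infty} \frac{1}{n \log n} = \infty \quad \text{and} \quad \sum_{n=2}^{\infty} \frac{1}{n (\log n)^{1+\varepsilon}} < \infty, \quad \text{for all } \varepsilon > 0.
\]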
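The dichotomy between these two series is easy to miss numerically, because the divergence of the first one is extremely slow. The sketch below is an illustration only, not part of the proof: it sums from n = 2 (since log 1 = 0) and uses the arbitrary choice ε = 1/2. The partial sums of the first series grow like log log N, while those of the second stay below the bound given by the integral test.

```python
import math

def divergent_term(n):
    """n-th term of the series sum 1/(n log n)."""
    return 1.0 / (n * math.log(n))

def convergent_term(n, eps):
    """n-th term of the series sum 1/(n (log n)^(1+eps))."""
    return 1.0 / (n * math.log(n) ** (1.0 + eps))

def partial_sum(term, n_max):
    """Sum term(n) for n = 2, ..., n_max."""
    return sum(term(n) for n in range(2, n_max + 1))

eps = 0.5  # illustrative choice of epsilon
for n_max in (10**2, 10**3, 10**4, 10**5):
    s_div = partial_sum(divergent_term, n_max)
    s_conv = partial_sum(lambda n: convergent_term(n, eps), n_max)
    # s_div - log(log(n_max)) settles down, reflecting growth of order log log N.
    print(n_max, round(s_div, 4), round(s_div - math.log(math.log(n_max)), 4), round(s_conv, 4))
```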
In this subsection, we introduce the idea of a Markov partition. This notion will be
used repeatedly to build symbolic codings for our various examples. We will restrict
the discussion to maps defined on the closed unit interval [0, 1] which are either
already continuous or become continuous when considered as functions on the circle
R/Z ≅ [0, 1), identifying the points 0 and 1, with the exception of a countable set of
points (denoted by E ) where the map can be discontinuous. We will refer to such
maps as Markov interval maps. To make this clearer, all the possibilities are covered
by considering the tent map, the doubling map (modulo 1), as in Example 1.2.4 (see
also Fig. 1.2), and the Gauss map (modulo 1) (see Fig. 1.1), for which the exceptional
set is E = {0}.
Fig. 1.2. The graphs of two Markov interval maps, M1 and M2. The map M1 is continuous on [0, 1] whereas M2 has
to be considered as a continuous function on the circle R/Z ≅ [0, 1).
Example 1.2.22. We give here the most natural Markov partition for the Gauss map,
G : R/Z → R/Z. For each i ∈ N, let A i := (1/(i + 1), 1/i), so that M is the collection of
sets M = {(1/(i + 1), 1/i) : i ∈ N}. Then the set E is equal to the singleton {0}. Notice
that M in fact coincides with the family of first level Gauss cylinder sets, up to sets of
measure zero. We must check that the three properties defining a Markov partition are
satisfied for this choice of M. Property (a) is clearly satisfied, by the definition of the
Gauss map. Property (b) is also straightforward to check. For property (c), it is enough
to notice that for all i ∈ N we have
G (A i ) = (0, 1).
Let us now consider collections U of subsets of X and define some operations on these
collections.
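\[
\mathcal{U} \vee \mathcal{V} := \{ U \cap V : U \in \mathcal{U},\ V \in \mathcal{V} \}.
\]
\[
T^{-1} \mathcal{U} := \{ T^{-1}(U) : U \in \mathcal{U} \}.
\]
\[
\mathcal{U}^n := \bigvee_{k=0}^{n-1} T^{-k}(\mathcal{U}) = \mathcal{U} \vee T^{-1}(\mathcal{U}) \vee \dots \vee T^{-(n-1)}(\mathcal{U}).
\]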
These operations on collections of sets can now be used to define a certain type of
Markov partition.
Definition 1.2.24. A Markov partition M for a dynamical system ([0, 1], T) is said to be
shrinking if the diameter of the largest element in each refinement Mn shrinks to zero
Each shrinking Markov partition gives rise to a canonical coding, in the following way.
(Here we are following [Adl98], and we refer to this paper for further discussion.)
defined by
\[
\pi((\omega_1, \omega_2, \omega_3, \dots)) := \bigcap_{n=0}^{\infty} \big( A_{\omega_1} \cap T^{-1} A_{\omega_2} \cap \dots \cap T^{-n} A_{\omega_{n+1}} \big).
\]
Note that this map is well defined, since the intersection on the right-hand side
above is a singleton, due to Cantor’s Intersection Theorem (since the sequence
(A ω1 ∩ T −1 A ω2 ∩ · · · ∩ T −n A ω n+1 )n≥0 is a nested sequence of compact sets with diamet-
ers shrinking to zero; see Exercise 1.6.14 if you have not encountered this useful result
before). Due to the restriction that the exceptional set E be countable, we obtain a
unique coding for all but countably many points.
Theorem 1.2.26. The map π is a factor map from the system $(E_A^{\mathbb{N}}, \sigma)$ to the system
$\big( [0, 1] \setminus \bigcup_{n=0}^{\infty} T^{-n}(E),\ T \big)$.
Proof. Recall that the definition of a factor map is that π should be a continuous
surjection from $E_A^{\mathbb{N}}$ to $[0, 1] \setminus \bigcup_{n=0}^{\infty} T^{-n}(E)$ such that π ◦ σ = T ◦ π. It is easily
demonstrated that π is uniformly continuous. Indeed, for this it is enough to notice
that $\operatorname{diam}(\pi([x_1, \dots, x_n])) \le \sup\{ \operatorname{diam}(M) : M \in \mathcal{M}^n \} \to 0$, for n → ∞, and uniformly
for all admissible n-cylinders.
To see that π is surjective, we argue inductively. Let $x \in [0, 1] \setminus \bigcup_{n=0}^{\infty} T^{-n}(E)$ be
an element of $A_{\omega_1} \cap T^{-1} A_{\omega_2} \cap \dots \cap T^{-(n-1)} A_{\omega_n}$ for some n ∈ N. Since the collection
$\{ M : M \in \mathcal{M}^{n+1} \}$ covers the set
\[
\big( A_{\omega_1} \cap T^{-1} A_{\omega_2} \cap \dots \cap T^{-(n-1)} A_{\omega_n} \big) \setminus \bigcup_{n=0}^{\infty} T^{-n}(E),
\]
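On the other hand, using the continuity of the restriction of T to $A_{\omega_1}$ we have that
\[
\begin{aligned}
T \circ \pi(\omega) &= T\Big( \bigcap_{n=0}^{\infty} \big( A_{\omega_1} \cap T^{-1} A_{\omega_2} \cap \dots \cap T^{-n} A_{\omega_{n+1}} \big) \Big) \\
&= \bigcap_{n=0}^{\infty} \big( T A_{\omega_1} \cap A_{\omega_2} \cap \dots \cap T^{-(n-1)} A_{\omega_{n+1}} \big) \\
&= \bigcap_{n=0}^{\infty} \big( A_{\omega_2} \cap T^{-1} A_{\omega_3} \cap \dots \cap T^{-(n-1)} A_{\omega_{n+1}} \big),
\end{aligned}
\]
where the final equality comes from the fact that $T(A_{\omega_1}) \supset A_{\omega_2}$, since ω belongs to the
set $E_A^{\mathbb{N}}$. This finishes the proof.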
Example 1.2.27. Let us return once more to the Gauss map and to its canonical Markov
partition given in Example 1.2.22. We can easily see that the Markov partition M given
in that example is shrinking. Indeed, the refinement Mn coincides with the collection
of n-th level Gauss cylinder sets, up to sets of measure zero, and the largest element
of this collection is C(1, . . . , 1). We saw in (1.7) that
\[
\frac{1}{2 q_n^2} \le \lambda(C(x_1, \dots, x_n)) \le \frac{1}{q_n^2},
\]
and since for C(1, . . . , 1) we have that q n is equal to the (n + 1)-th Fibonacci number, it
is clear that as n tends to infinity, the diameter of the cylinder set with code consisting
of n 1s shrinks to zero.
Since M is shrinking, we can therefore obtain a coding from it. Recall that the
set E in this case is equal to the singleton {0}. Therefore, the set of points where the
coding is undefined is here simply equal to the set of pre-periodic points of 0, that is,
the rational numbers in [0, 1]. So, for any point $\omega = (x_1, x_2, x_3, \dots) \in \mathbb{N}^{\mathbb{N}}$, we have that
In other words, we have that π(ω) lies in $A_{x_1} = (1/(x_1 + 1), 1/x_1)$, G(π(ω)) lies in $A_{x_2} =
(1/(x_2 + 1), 1/x_2)$, $G^2(\pi(\omega))$ lies in $A_{x_3} = (1/(x_3 + 1), 1/x_3)$ and so on. Thus, the first
continued fraction element of π(ω) is equal to x1 , the second is equal to x2 , the third
is equal to x3 and so on. Therefore, we have shown that the coding that arises from the
Markov partition M coincides with the continued fraction expansion of each irrational
number.
Note that in this case the map $\pi : \mathbb{N}^{\mathbb{N}} \to I$ is actually a topological conjugacy map,
since the endpoints of all the intervals in M lie in the set $G^{-1}(0)$ and hence, the map
is injective as well as surjective.
Remark 1.2.28. The ideas of this section can be extended much further, to the setting
of graph-directed Markov systems. For this, we refer the reader to the textbook by
Mauldin and Urbański [MU03].
In this section, we introduce the second of our main examples, namely, the Farey
map. First we give the definition and show how the Farey map is related to the Gauss
map. We then define a Markov partition for the Farey map and describe the coding
associated with it. Finally, in the second subsection, we give some basic topological
properties of the Farey map.
Let us now define a transformation on the unit interval that is closely related to the
Gauss map.
Fig. 1.3. The Farey map, F : [0, 1] → [0, 1]. At the fixed point 0 the map F has slope 1. The second
fixed point is γ* = (√5 − 1)/2.
1.3 The Farey map: definition and topological properties | 31
For later use, let us also give the inverse branches of F. These are the functions
F0 : (0, 1) → (0, 1/2) and F1 : (0, 1) → (1/2, 1) which are defined by
\[
F_0(x) := \frac{x}{1 + x} \quad \text{and} \quad F_1(x) := \frac{1}{1 + x}.
\]
It is easily shown that the action of the map F on a point x = [x1 , x2 , x3 , . . .] ∈ (0, 1) is
as follows:
\[
F(x) = \begin{cases}
[x_1 - 1, x_2, x_3, \dots] & \text{if } x_1 > 1; \\
[x_2, x_3, x_4, \dots] & \text{if } x_1 = 1.
\end{cases}
\]
For this reason, the Farey map is sometimes referred to as the slow continued fraction
map, whereas the Gauss map is referred to as the fast continued fraction map. We can
describe the relationship between the Farey and Gauss maps more precisely. To do
this, we introduce the idea of a jump transformation, which is also often referred to as
Schweiger’s jump transformation [Sch95].
Note that ρ(x) is finite for all x ∈ (0, 1]. Then, let the map F * : [0, 1] → [0, 1] be defined by
\[
F^*(x) := \begin{cases}
F^{\rho(x) + 1}(x) & \text{if } x \ne 0; \\
0 & \text{if } x = 0.
\end{cases}
\]
Lemma 1.3.3. The jump transformation F * of the Farey map coincides with the Gauss
map.
Proof. Fix n ≥ 2 and let x = [n, x2 , x3 , . . .], so x ∈ (1/(n + 1), 1/n]. Then, we have that
On the other hand, if x = [1, x2 , x3 , . . .] ∈ (1/2, 1], then ρ(x) is equal to zero and so we
have that F * (x) = F(x), which is again equal to G(x), since G|(1/2,1] = F |(1/2,1] .
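Lemma 1.3.3 can be tested on rational points in exact arithmetic. The sketch below rests on two assumptions that are not spelled out in the surrounding text: the Farey map is taken to be x/(1 − x) on [0, 1/2] and (1 − x)/x on (1/2, 1] (the inverses of the branches F0 and F1 given above), and ρ(x) is taken to be the number of iterations of F needed before the orbit of x first enters (1/2, 1].

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def farey(x):
    """Farey map, assumed to invert F0 on [0, 1/2] and F1 on (1/2, 1]."""
    return x / (1 - x) if x <= HALF else (1 - x) / x

def gauss(x):
    """Gauss map G(x) = 1/x - floor(1/x), with G(0) = 0."""
    if x == 0:
        return Fraction(0)
    y = 1 / x
    return y - (y.numerator // y.denominator)

def jump(x):
    """Jump transformation F*: iterate F until the orbit has passed through (1/2, 1] once."""
    if x == 0:
        return Fraction(0)
    while x <= HALF:      # rho(x) steps outside (1/2, 1]
        x = farey(x)
    return farey(x)       # one further step, giving F^(rho(x)+1)(x)

x = Fraction(3, 7)        # 3/7 = [2, 3]
print(jump(x), gauss(x))  # both equal 1/3 = [3]
```

Because every positive rational in [0, 1/2] moves one step closer to (1/2, 1] under each application of `farey`, the loop in `jump` terminates for every rational input.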
Our next aim is to describe a coding generated by the Farey map. The two open
sets {B0 := (0, 1/2), B1 := (1/2, 1)} form a Markov partition for F. In this case, the
exceptional set E is the empty set. That the three conditions for a Markov partition
are satisfied is easy to check, so we leave this as an exercise.
The above Markov partition for the Farey map yields a coding of the numbers in
[0, 1] as follows. Each element $\omega = (x_1, x_2, x_3, \dots) \in \{0, 1\}^{\mathbb{N}}$ is mapped by π to the
We will use the notation $\pi(\omega) = x =: \langle x_1, x_2, x_3, \dots \rangle \in [0, 1]$. This coding will be referred
to as the Farey coding. The Farey coding is related to the continued fraction expansion
of x in a straightforward way. Indeed, if $x = [x_1, x_2, x_3, \dots]$, then the Farey coding of
x is given by $x = \langle 0^{x_1 - 1}, 1, 0^{x_2 - 1}, 1, 0^{x_3 - 1}, 1, \dots \rangle$, where $0^n$ denotes the sequence of
n ∈ N consecutive appearances of the symbol 0 and $0^0$ is understood to mean the
appearance of no zeros between two consecutive 1s. This follows directly from the fact
that the Gauss map is the jump transformation on (1/2, 1] of the Farey map.
We have described the coding for irrational numbers; let us now consider the
rational numbers. We can still define a code for these points; it will just no longer
be unique. So, let $x = [x_1, \dots, x_k]$. In this case we obtain two infinite codings, namely,
we have that
\[
x = \langle 0^{x_1 - 1}, 1, 0^{x_2 - 1}, 1, \dots, 0^{x_k - 1}, 1, 0, 0, 0, \dots \rangle
\]
and
\[
x = \langle 0^{x_1 - 1}, 1, 0^{x_2 - 1}, 1, \dots, 0^{x_k - 2}, 1, 1, 0, 0, 0, \dots \rangle.
\]
Indeed, take for example the point 1/2. Observe that 1/2 ∈ B1 but also 1/2 ∈ B0 ,
so the first entry in the Farey coding for 1/2 can be either 1 or 0. Then F(1/2) = 1
and $F^n(1/2) = 0$ for all n ≥ 2, therefore we obtain that $1/2 = \langle 1, 1, 0, 0, 0, \dots \rangle$ and
$1/2 = \langle 0, 1, 0, 0, 0, \dots \rangle$. The coding for any rational number can be deduced from this
example, since every rational number is eventually mapped to 1/2 by the map F (or,
to put it another way, the set of rational numbers coincides with the iteration of the
point 1/2 under the inverse branches of the Farey map).
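The passage from continued fraction digits to the Farey coding can be sketched as follows. The code assumes the Farey map is x/(1 − x) on [0, 1/2] and (1 − x)/x on (1/2, 1], reads off the symbol 0 on (0, 1/2) and 1 on (1/2, 1), and stops when the orbit reaches 1/2, where the symbol is ambiguous. The input is assumed to be a finite expansion [x_1, ..., x_k] with x_k ≥ 2; all function names are illustrative.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def farey(x):
    """Farey map (assumed form: inverse of F0 on [0, 1/2], of F1 on (1/2, 1])."""
    return x / (1 - x) if x <= HALF else (1 - x) / x

def cf_to_fraction(digits):
    """Value of the finite continued fraction [x_1, ..., x_k]."""
    x = Fraction(0)
    for d in reversed(digits):
        x = 1 / (d + x)
    return x

def farey_code_from_cf(digits):
    """Block 0^(x_i - 1), 1 for each continued fraction digit x_i."""
    code = []
    for d in digits:
        code += [0] * (d - 1) + [1]
    return code

def itinerary_until_half(x):
    """Symbols of the F-orbit of x, recorded until the orbit hits 1/2."""
    symbols = []
    while x != HALF:
        symbols.append(0 if x < HALF else 1)
        x = farey(x)
    return symbols

digits = [2, 3, 1, 4]
x = cf_to_fraction(digits)
print(farey_code_from_cf(digits))
print(itinerary_until_half(x))
```

The itinerary agrees with the digit-derived code on its first x_1 + ... + x_k − 2 entries; the two remaining entries are where the two admissible codings of a rational number differ.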
Recall that F acts on x = [x1 , x2 , x3 , . . .] in the following way:
\[
F(x) := \begin{cases}
[x_1 - 1, x_2, x_3, \dots] & \text{for } x_1 \ge 2; \\
[x_2, x_3, x_4, \dots] & \text{for } x_1 = 1.
\end{cases}
\]
So, again, the system (I, F) can be thought of as acting like the shift map σ on the shift
space $E^{\mathbb{N}}$, where this time the alphabet E = {0, 1} is finite. This is a potential advantage
of the map F over the map G in certain situations, as the space $\{0, 1\}^{\mathbb{N}}$ is a compact
metric space, whereas $\mathbb{N}^{\mathbb{N}}$ is not.
We are now in a position to define the cylinder sets with respect to the Farey map.
Definition 1.3.4. For each n-tuple $(x_1, \dots, x_n) \in \{0, 1\}^n$, define the Farey cylinder set
$\widehat{C}(x_1, \dots, x_n)$ by setting
Note that the Farey cylinder sets coincide with the refinements of the Markov partition
$\{B_0, B_1\}$ under the inverse branches $F_0$ and $F_1$ of F. That is, the cylinder set
$\widehat{C}(x_1, \dots, x_n)$ coincides with the set $F_{x_1} \circ F_{x_2} \circ \dots \circ F_{x_n}((0, 1))$. We will refer to these
successive refinements as the Farey decomposition.
There is another way to describe these cylinder sets, in terms of the classical
construction of Stern–Brocot intervals (cf. [Ste58], [Bro61]). For each n ≥ 0, the
elements of the n-th member of the Stern–Brocot sequence
\[
\mathcal{B}_n := \left\{ \frac{s_{n,k}}{t_{n,k}} : k = 1, \dots, 2^n + 1 \right\}
\]
From this arises the set $\mathcal{T}_n$ of Stern–Brocot intervals of order n which is given by
\[
\mathcal{T}_n := \left\{ \left[ \frac{s_{n,k}}{t_{n,k}}, \frac{s_{n,k+1}}{t_{n,k+1}} \right] : k = 1, \dots, 2^n \right\}.
\]
It is straightforward to check that these intervals are precisely the same intervals as
the set
Returning to the Stern–Brocot sequence, the n-th member of this sequence consists of
$2^n + 1$ proper fractions and the n-th member of the sequence can be obtained from
the (n − 1)-th member by adding in the mediant of each neighbouring pair, where
we remind the reader that the mediant of two rational numbers a/b and a′/b′ is by
definition the rational number (a + a′)/(b + b′). The first few of these sequences are
given by:
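\[
\begin{aligned}
\mathcal{B}_0 &= \left\{ \tfrac{0}{1}, \tfrac{1}{1} \right\}, \quad
\mathcal{B}_1 = \left\{ \tfrac{0}{1}, \tfrac{1}{2}, \tfrac{1}{1} \right\}, \quad
\mathcal{B}_2 = \left\{ \tfrac{0}{1}, \tfrac{1}{3}, \tfrac{1}{2}, \tfrac{2}{3}, \tfrac{1}{1} \right\}, \\
\mathcal{B}_3 &= \left\{ \tfrac{0}{1}, \tfrac{1}{4}, \tfrac{1}{3}, \tfrac{2}{5}, \tfrac{1}{2}, \tfrac{3}{5}, \tfrac{2}{3}, \tfrac{3}{4}, \tfrac{1}{1} \right\}, \\
\mathcal{B}_4 &= \left\{ \tfrac{0}{1}, \tfrac{1}{5}, \tfrac{1}{4}, \tfrac{2}{7}, \tfrac{1}{3}, \tfrac{3}{8}, \tfrac{2}{5}, \tfrac{3}{7}, \tfrac{1}{2}, \tfrac{4}{7}, \tfrac{3}{5}, \tfrac{5}{8}, \tfrac{2}{3}, \tfrac{5}{7}, \tfrac{3}{4}, \tfrac{4}{5}, \tfrac{1}{1} \right\}, \ \dots
\end{aligned}
\]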
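The mediant construction just described is short enough to write out. The following sketch stores each fraction as a pair (numerator, denominator) and builds the n-th member from the pair 0/1, 1/1 by repeatedly inserting mediants; the function name is illustrative.

```python
def stern_brocot(n):
    """Return the n-th member of the Stern–Brocot sequence as a list of (s, t) pairs."""
    seq = [(0, 1), (1, 1)]
    for _ in range(n):
        refined = [seq[0]]
        for (a, b), (c, d) in zip(seq, seq[1:]):
            refined.append((a + c, b + d))  # mediant of the neighbouring pair
            refined.append((c, d))
        seq = refined
    return seq

for level in range(4):
    print(level, stern_brocot(level))
```

Each refinement step doubles the number of intervals, which is why the n-th member has 2^n + 1 entries.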
Fig. 1.4. The dyadic Stern–Brocot tree with root $s_{1,2}/t_{1,2} = 1/2$. For each $s_{n,2k}/t_{n,2k}$, n ∈ N and k =
$1, \dots, 2^{n-1}$, we have the two offspring $s_{n+1,2k}/t_{n+1,2k}$ and $s_{n+1,2k+2}/t_{n+1,2k+2}$. For each n, the missing
elements from $\mathcal{B}_n \setminus \{0, 1\}$ are marked in grey.
This sequence also gives rise to the Stern–Brocot Tree as shown in Fig. 1.4. The vertices
of the n-th generation of the Stern–Brocot Tree will be denoted by Sn := Bn \ Bn−1 , for
n ∈ N and the sequence (Sn )n∈N is called the even Stern–Brocot sequence.
Remark 1.3.5. The Markov partition {B0 , B1 } given above is not the only reasonable
choice for the Farey map. We might instead choose to use the partition M, that is, the
partition defined in Example 1.2.22 for the Gauss map. It is not hard to verify that this is
also a shrinking Markov partition for the Farey map, and so yields a different coding for
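the points of [0, 1]. In this case, we obtain a coding map $\pi : \mathbb{N}_A^{\mathbb{N}} \to [0, 1] \setminus \bigcup_{n \in \mathbb{N}} F^{-n}(0)$,
where the sub-shift $\mathbb{N}_A^{\mathbb{N}}$ is determined by the infinite transition matrix given by
\[
A_{ij} = \begin{cases}
1 & \text{if } i = 1 \text{ and } j \in \mathbb{N}; \\
1 & \text{if } i = n \text{ and } j = n - 1, \text{ for } n \ge 2; \\
0 & \text{otherwise.}
\end{cases}
\]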
This sub-shift is known as the (infinite) renewal shift. We will return to the subject of
renewal theory in Chapter 3.
We now come to the study of the Farey map from a topological point of view. Let us
first introduce another well-studied dynamical system, which we shall shortly show
to be topologically conjugate to the Farey map.
[Fig. 1.5. The graph of the map T on [0, 1].]
The map T is referred to as the tent map; the reason for this name is clear on inspection
of the graph of T, shown in Fig. 1.5.
It turns out, and we shall prove this shortly, that the conjugating homeomorphism
between the Farey map and the tent map coincides with Minkowski’s question-mark
function, which we shall denote by Q : [0, 1] → [0, 1]. This remarkable function
was originally introduced by Minkowski [Min10] and later investigated by Denjoy
[Den38] and Salem [Sal43], amongst others. Minkowski’s original motivation behind
the definition of the function that now bears his name was to highlight the intriguing
property of continued fractions that was described in Lagrange’s Theorem (see
Theorem 1.2.8). Recall that this theorem states that the set of irrational algebraic
numbers of degree two corresponds precisely to the set of real numbers that admit
an eventually periodic continued fraction expansion. In other words, if x ∈ [0, 1] can
be written as a continued fraction of the form $[x_1, \dots, x_m, \overline{x_{m+1}, \dots, x_{m+k}}]$, then x is an
irrational root of some quadratic polynomial and, moreover, the converse statement
also holds. Minkowski designed the function Q to map the quadratic surds into the
non-dyadic rationals in a continuous and order-preserving way (we leave the proof
of this to Exercise 1.6.15). The question-mark function is constructed in the following
way. First, define Q(0) = Q(0/1) := 0 and Q(1) = Q(1/1) := 1. Then, define
\[
Q\left( \frac{p + p'}{q + q'} \right) := \frac{Q(p/q) + Q(p'/q')}{2}.
\]
In other words, the function Q is successively defined on all the rational numbers
in the unit interval by taking mediants of those that have already been defined. The
definition of Q is extended to all of [0, 1] by continuity (since any uniformly continuous
function from a dense set of a metric space E into another metric space can be uniquely
extended to a continuous function on all of E).
Fig. 1.6. On the left, the graphs of the functions $Q_n$, n = 1, 2, 3, and on the right, an approximation to
the graph of the Minkowski question-mark function, Q : [0, 1] → [0, 1].
Another way to think about the question-mark function is as a uniform limit of
the sequence of piecewise linear functions (Q n )n∈N , where each Q n : [0, 1] → [0, 1]
is defined by mapping the n-th level Stern–Brocot fractions, arranged in increasing
order, onto the set $\{ p/2^n : 0 \le p \le 2^n \}$ and then joining these image points by straight
line segments. Above, in Fig. 1.6, can be found an illustration of the first few of these
functions and also an approximation of the graph of Q itself.
Denjoy demonstrated that the function Q is given by the following formula:
\[
Q([x_1, x_2, x_3, \dots]) = -2 \sum_{k=1}^{\infty} (-1)^k \, 2^{-\sum_{i=1}^{k} x_i}.
\]
Later, Salem derived the most important properties of Q from this formula, including
the facts that Q is strictly increasing and singular with respect to Lebesgue measure,
which means that the derivative of Q is equal to zero, Lebesgue-almost everywhere.
The function Q is for this reason referred to as a slippery Devil’s staircase, a term
coined by Gutzwiller and Mandelbrot in [GM88]. The (multifractal) fractal nature of
Q is investigated in [KS08b], see also [KS07].
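The two descriptions of Q, by Stern–Brocot levels and by Denjoy's formula, can be compared on rational points. The sketch below makes one assumption beyond the text: for a rational x = [x_1, ..., x_k] the series is simply truncated after k terms. It then checks that the j-th fraction of the n-th Stern–Brocot member (counting from j = 0) is sent to j/2^n, as in the definition of Q_n.

```python
from fractions import Fraction

def stern_brocot(n):
    """n-th member of the Stern–Brocot sequence, built by inserting mediants."""
    seq = [(0, 1), (1, 1)]
    for _ in range(n):
        refined = [seq[0]]
        for (a, b), (c, d) in zip(seq, seq[1:]):
            refined += [(a + c, b + d), (c, d)]
        seq = refined
    return seq

def question_mark(x):
    """Denjoy's formula, truncated at the last continued fraction digit of a rational x."""
    total, exponent, sign = Fraction(0), 0, 1
    while x != 0:
        y = 1 / x
        digit = y.numerator // y.denominator  # next continued fraction digit
        exponent += digit                     # x_1 + ... + x_k
        total += Fraction(sign, 2 ** exponent)
        sign = -sign
        x = y - digit
    return 2 * total

level = 3
for j, (s, t) in enumerate(stern_brocot(level)):
    print(f"{s}/{t} -> {question_mark(Fraction(s, t))} (expected {Fraction(j, 2 ** level)})")
```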
Also, recall that the distribution function ∆ μ of a measure μ with support in [0, 1]
is defined for each x ∈ [0, 1] by
Proposition 1.3.7. The dynamical systems ([0, 1], F) and ([0, 1], T) are topologically
conjugate and the conjugating homeomorphism Q is given by
\[
Q([x_1, x_2, x_3, \dots]) := -2 \sum_{k=1}^{\infty} (-1)^k \, 2^{-\sum_{i=1}^{k} x_i}.
\]
That is, the conjugating homeomorphism between the Farey map and the tent map is
Minkowski’s question-mark function. Moreover, the map Q is equal to the distribution
function ∆ μ0 of the measure of maximal entropy μ0 := λ ◦ Q for the Farey map.
Remark 1.3.8. The reader unfamiliar with the concept of measures of maximal
entropy should not be unduly alarmed by this terminology. For our purposes, it is
enough to know that μ0 assigns mass $2^{-n}$ to each n-th level Farey cylinder set. For
further reading we refer to [Wal82]. Also note that since Q is a homeomorphism the
measure λ ◦ Q is well defined.
Proof of Proposition 1.3.7. We will first show that the map Q is the conjugating homeo-
morphism from F to the tent system. For this, suppose first that x ∈ [0, 1/2]. Then, Q(x)
is an element of [0, 1/2] and we have that
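\[
\begin{aligned}
T(Q(x)) &= 2 \Big( -2 \sum_{k=1}^{\infty} (-1)^k \, 2^{-\sum_{i=1}^{k} x_i} \Big) = -2 \sum_{k=1}^{\infty} (-1)^k \, 2^{-(x_1 - 1) - \sum_{i=2}^{k} x_i} \\
&= Q([x_1 - 1, x_2, x_3, \dots]) = Q(F(x)).
\end{aligned}
\]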
Now, suppose that x ∈ (1/2, 1], that is, x = [1, x2 , x3 , . . .]. Then, it follows that Q(x) ∈
(1/2, 1] and we have that
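\[
\begin{aligned}
T(Q(x)) &= 2 - 2 \Big( 2 \cdot 2^{-1} - 2 \sum_{k=2}^{\infty} (-1)^k \, 2^{-1 - \sum_{i=2}^{k} x_i} \Big) \\
&= 2 \sum_{k=2}^{\infty} (-1)^k \, 2^{-\sum_{i=2}^{k} x_i} = Q([x_2, x_3, \dots]) = Q(F(x)).
\end{aligned}
\]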
It only remains to show that Q is equal to the distribution function of μ 0 . Indeed, for
each x ∈ [0, 1] we have
∆ μ0 (x) = μ0 ([0, x]) = λ ◦ Q([0, x]) = λ([Q(0), Q(x)]) = λ([0, Q(x)]) = Q(x).
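The conjugacy relation Q ◦ F = T ◦ Q can be spot-checked on rational points. In this sketch the Farey map is assumed to be x/(1 − x) on [0, 1/2] and (1 − x)/x on (1/2, 1], the tent map is assumed to be 2x on [0, 1/2] and 2 − 2x on (1/2, 1], and Q is evaluated by truncating the series at the last continued fraction digit. This is a numerical illustration on a finite set of points, not a substitute for the proof.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def farey(x):
    """Farey map (assumed explicit form)."""
    return x / (1 - x) if x <= HALF else (1 - x) / x

def tent(x):
    """Tent map (assumed explicit form)."""
    return 2 * x if x <= HALF else 2 - 2 * x

def question_mark(x):
    """Q at a rational point, via the alternating series over its continued fraction digits."""
    total, exponent, sign = Fraction(0), 0, 1
    while x != 0:
        y = 1 / x
        digit = y.numerator // y.denominator
        exponent += digit
        total += Fraction(sign, 2 ** exponent)
        sign = -sign
        x = y - digit
    return 2 * total

def conjugacy_holds(x):
    """True if Q(F(x)) equals T(Q(x)) at the point x."""
    return question_mark(farey(x)) == tent(question_mark(x))

print(all(conjugacy_holds(Fraction(p, q)) for q in range(1, 20) for p in range(q + 1)))
```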
Definition 1.3.9. A map S : (X, d) → (X, d) of a metric space (X, d) into itself is said to
be Hölder continuous with exponent κ > 0 if there exists a positive constant C > 0 such
that
We will show now that Minkowski’s question-mark function, the conjugating homeomorphism
between the Farey and tent systems, is Hölder continuous with exponent
log 2/(2 log γ), where we recall that γ := (1 + √5)/2 denotes the golden mean. This
result was also originally proved by Salem [Sal43]. First, we give a useful lemma.
Lemma 1.3.10. For the denominator $q_n$ of the n-th convergent of $x = [x_1, x_2, x_3, \dots]$, we
have that $q_n < \gamma^{x_1 + \dots + x_n}$, for all n ∈ N.
Proof. We will prove this by induction. Certainly, for $p_1/q_1 = 1/x_1$, the inequality $x_1 =
q_1 < \gamma^{x_1}$ holds. So, fix n ∈ N and suppose that $q_k < \gamma^{x_1 + \dots + x_k}$ for all k < n. Recall that
$q_n = x_n q_{n-1} + q_{n-2}$. Therefore,
Proposition 1.3.11. The map Q is s-Hölder continuous for s = log 2/(2 log γ ), but not for
any s > log 2/(2 log γ ).
Recalling that $p_n q_{n-1} - p_{n-1} q_n = (-1)^{n+1}$, it follows, in light of Lemma 1.3.10, that
\[
\lambda(C(x_1, \dots, x_n)) = \frac{1}{q_n q_{n+1}} > \gamma^{-(2 s_n + 1)}.
\]
Now, let x and y be two arbitrary different irrational numbers in [0, 1]. There must be
a first time during the backwards iteration of [0, 1] under the inverse branches of F in
which a Farey cylinder set appears between the numbers x and y. Say that this cylinder
set appears in the p-th stage of the Farey decomposition. If we iterate one more time,
it is clear that there are two (p + 1)-th level Farey cylinder sets fully contained in the
interval (x, y); moreover, one of these also has to be a Gauss cylinder set. Let this Gauss
cylinder set be denoted by $C(z_1, z_2, \dots, z_k)$, where $\sum_{j=1}^{k} z_j = p + 1$. This leads to the
observation that, as C(z1 , z2 , . . . , z k ) is contained in the interval (x, y), we have
Consider the interval (x, y) again. By construction, it is contained inside two neigh-
bouring (p − 1)-th level Farey intervals, and so
Now fix s > log(2)/(2 log(γ )) and let x n , y n be the left and right boundary point of the
Gauss cylinder C(1, . . . , 1) of length n ∈ N. Then there exists a constant c > 0 such that
on the one hand
\[
|x_n - y_n| = \lambda(C(1, \dots, 1)) = \frac{1}{f_n f_{n+1}} \le c \gamma^{-2n}.
\]
In this section we will introduce and study the properties of our other main examples.
These are two families of dynamical systems, the α-Lüroth and α-Farey systems, which
are both indexed by partitions α of the unit interval. We shall first introduce the
class of partitions of [0, 1] we are interested in and then define the α-Lüroth map
L α . We then describe the expansion of real numbers that can be derived from this
map, again in terms of a Markov partition. Next, we introduce the family of α-Farey
maps and develop topological properties for these maps similar to those described in
Section 1.3.2 for the Farey map.
Definition 1.4.1. For a given partition α ∈ A, the α-Lüroth map L α : [0, 1] → [0, 1] is
given by
\[
L_\alpha(x) := \begin{cases}
(t_n - x)/a_n & \text{for } x \in A_n,\ n \in \mathbb{N}; \\
0 & \text{if } x = 0.
\end{cases}
\]
In other words, the map L α consists of countably many linear branches that send A n
onto [0, 1), for each n ∈ N.
We now define a Markov partition for the map L α . Let E := {0} and let α̊ := {B n : n ∈ N},
where B n := Int(A n ) denotes the interior of A n for each n ∈ N. It is easy to verify that the
collection of sets α̊ constitutes a Markov partition. Indeed, the properties (a) and (b) of
Definition 1.2.21 are obviously satisfied and property (c) follows from the observation
that for all n ∈ N we have L α (B n ) = (0, 1).
1.4 Two further examples | 41
For later use, let us also define the inverse branches of L α . These are the countable
family of maps L α,n : (0, 1) → B n defined for each n ∈ N by
L α,n (x) := t n − a n x.
In order to construct a coding from this Markov partition, we must first show that it is
shrinking (cf. Definition 1.2.24). To do this, we calculate the Lebesgue measure of the
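intervals that make up the refinements $\mathring{\alpha}^n$ of α̊. First of all, the size of each element
$B_n := (t_{n+1}, t_n)$ of α̊ is equal to $a_n$. The refinement $\mathring{\alpha}^2$ is given by
\[
\begin{aligned}
\mathring{\alpha}^2 = \mathring{\alpha} \vee L_\alpha^{-1}(\mathring{\alpha}) = L_\alpha^{-1}(\mathring{\alpha})
&= \bigcup_{k \in \mathbb{N}} \bigcup_{n \in \mathbb{N}} L_{\alpha,n}(B_k) \\
&= \bigcup_{k \in \mathbb{N}} \bigcup_{n \in \mathbb{N}} (t_n - a_n t_k,\ t_n - a_n t_{k+1}).
\end{aligned}
\]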
and
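so that the size of $B_{\ell_1, \dots, \ell_n}$ is equal to $a_{\ell_1} \cdots a_{\ell_n}$. To shorten this notation, let us write
$[\ell_1, \dots, \ell_n]_\alpha := t_{\ell_1} - a_{\ell_1} t_{\ell_2} + \dots + (-1)^{n-1} a_{\ell_1} \cdots a_{\ell_{n-1}} t_{\ell_n}$. Then,
\[
\begin{aligned}
\mathring{\alpha}^{n+1} = L_\alpha^{-1}(\mathring{\alpha}^n)
&= \bigcup_{k \in \mathbb{N}} \ \bigcup_{(\ell_1, \dots, \ell_n) \in \mathbb{N}^n} L_{\alpha,k}(B_{\ell_1, \dots, \ell_n}) \\
&= \bigcup_{k \in \mathbb{N}} \ \bigcup_{(\ell_1, \dots, \ell_n) \in \mathbb{N}^n} \big( t_k - a_k [\ell_1, \dots, \ell_n]_\alpha,\ t_k - a_k [\ell_1, \dots, \ell_n + 1]_\alpha \big)_\pm \\
&= \bigcup_{(\ell_1, \dots, \ell_n, \ell_{n+1}) \in \mathbb{N}^{n+1}} \big( [\ell_1, \dots, \ell_{n+1}]_\alpha,\ [\ell_1, \dots, \ell_{n+1} + 1]_\alpha \big)_\pm,
\end{aligned}
\]
where we recall that the notation $(\cdot, \cdot)_\pm$ means that the endpoints are not necessarily
in the correct order. Thus, the size of an interval $B_{\ell_1, \dots, \ell_{n+1}}$ is equal to $a_{\ell_1} \cdots a_{\ell_{n+1}}$. From
this, we can deduce that the partition α̊ is shrinking. Since $\sum_{k=1}^{\infty} a_k = 1$, it follows that
there exists (at least) one of the $a_k$ with maximum size. Call it $a_{\max}$. Then, the largest
element of $\mathring{\alpha}^n$ has size $(a_{\max})^n$ and, as $0 < a_{\max} < 1$, this clearly tends to zero as n tends
to infinity.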
Now we can utilise the shrinking Markov partition α̊ to obtain a coding for
all the points in the set $[0, 1] \setminus \bigcup_{n=0}^{\infty} L_\alpha^{-n}(0)$. Exactly as was the case for the Gauss
map before, if x lies in this set, we find a sequence $(\ell_1, \ell_2, \dots) \in \mathbb{N}^{\mathbb{N}}$ such that
$x \in \bigcap_{n=0}^{\infty} \big( B_{\ell_1} \cap L_\alpha^{-1} B_{\ell_2} \cap \dots \cap L_\alpha^{-n} B_{\ell_{n+1}} \big)$. Thus, $x \in B_{\ell_1}$ and therefore,
So, $x = t_{\ell_1} - a_{\ell_1} L_\alpha(x)$. Then, $L_\alpha(x) \in B_{\ell_2}$ and a similar calculation leads us to the identity
\[
x = t_{\ell_1} + \sum_{n=2}^{\infty} (-1)^{n-1} \Big( \prod_{i<n} a_{\ell_i} \Big) t_{\ell_n} = t_{\ell_1} - a_{\ell_1} t_{\ell_2} + a_{\ell_1} a_{\ell_2} t_{\ell_3} - \cdots.
\]
This will be called the α-Lüroth expansion of the point x. To shorten the notation, we
denote these infinite series expansions by $x = [\ell_1, \ell_2, \ell_3, \dots]_\alpha$.
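To make the expansion concrete, here is a minimal sketch for one specific choice of partition, namely A_n = (1/(n+1), 1/n], so that t_n = 1/n and a_n = 1/(n(n+1)). This choice and all function names are assumptions made for illustration. The code reads off the digits by iterating L_α and rebuilds x from the alternating series; the truncation error after k digits is at most a_{ℓ_1} ··· a_{ℓ_k}, because x = [ℓ_1, ..., ℓ_k]_α + (−1)^k a_{ℓ_1} ··· a_{ℓ_k} L_α^k(x).

```python
from fractions import Fraction

def t(n):
    """Right endpoint t_n of A_n for the assumed partition A_n = (1/(n+1), 1/n]."""
    return Fraction(1, n)

def a(n):
    """Length a_n of A_n for the assumed partition."""
    return Fraction(1, n * (n + 1))

def luroth_map(x):
    """L_alpha(x) = (t_n - x)/a_n for x in A_n, and L_alpha(0) = 0."""
    if x == 0:
        return Fraction(0)
    n = int(1 / x)  # x lies in A_n exactly when n = floor(1/x)
    return (t(n) - x) / a(n)

def luroth_digits(x, count):
    """First `count` digits l_1, l_2, ... of x (fewer if the orbit reaches 0)."""
    digits = []
    for _ in range(count):
        if x == 0:
            break
        digits.append(int(1 / x))
        x = luroth_map(x)
    return digits

def luroth_value(digits):
    """Finite expansion t_{l1} - a_{l1} t_{l2} + a_{l1} a_{l2} t_{l3} - ..."""
    total, weight = Fraction(0), Fraction(1)
    for l in digits:
        total += weight * t(l)
        weight *= -a(l)
    return total

x = Fraction(5, 13)
digits = luroth_digits(x, 6)
print(digits, luroth_value(digits), abs(x - luroth_value(digits)))
```

The point 5/13 has a periodic digit sequence for this partition, which illustrates that rational numbers need not have finite expansions of this type.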
Notice that the infinite α-Lüroth expansions match with the finite ones we ob-
tained above whilst calculating the endpoints of the intervals of the refined partitions
α̊ n . The main difference is that each infinite expansion is unique (as in the case of
infinite continued fractions), whereas the finite ones can be written in either of the
two ways:
and
Example 1.4.2.
(a) Define the harmonic partition $\alpha_H$ by setting
\[
\alpha_H := \left\{ A_n := \left( \frac{1}{n+1}, \frac{1}{n} \right] : n \ge 1 \right\}.
\]
This map can be found in the literature where it is often referred to as the
alternating Lüroth map. For references and more on the historical background to
this, see Section 1.5.
With respect to the map L α H , in exactly the way outlined above using the Markov
partition α̊ H , the corresponding series expansion of some arbitrary x ∈ [0, 1] turns
out to be
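\[
\begin{aligned}
x &= \sum_{n=1}^{\infty} (-1)^{n-1} (\ell_n + 1) \prod_{k=1}^{n} \big( \ell_k (\ell_k + 1) \big)^{-1} \\
&= \frac{1}{\ell_1} - \frac{1}{\ell_1 (\ell_1 + 1) \ell_2} + \frac{1}{\ell_1 (\ell_1 + 1) \ell_2 (\ell_2 + 1) \ell_3} - \cdots,
\end{aligned}
\]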
Remark 1.4.3.
1. The name “α-Lüroth” for these maps is in honour of the German mathematician
J. Lüroth, for his 1883 paper [Lür83] which develops a particular series expansion
of real numbers which is related to the expansions derived above. For more
details, we refer once more to Section 1.5.
2. Note that the α-Lüroth expansion is a particular type of generalised Lüroth series, a
concept which was introduced by Barrionuevo et al. in [BBDK96] (also see [DK02]).
Before going any further, let us describe the action of the map L α on the expansions it
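generates. For each $x = [\ell_1, \ell_2, \ell_3, \dots]_\alpha$, we have, since $x \in A_{\ell_1}$, that
\[
\begin{aligned}
L_\alpha(x) = (t_{\ell_1} - x)/a_{\ell_1} &= \big( t_{\ell_1} - (t_{\ell_1} - a_{\ell_1} t_{\ell_2} + a_{\ell_1} a_{\ell_2} t_{\ell_3} - \dots) \big)/a_{\ell_1} \\
&= t_{\ell_2} - a_{\ell_2} t_{\ell_3} + \dots = [\ell_2, \ell_3, \ell_4, \dots]_\alpha.
\end{aligned}
\]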
This shows that L α , just like the Gauss map, can be thought of as acting as the shift map
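on the space $\mathbb{N}^{\mathbb{N}}$, at least for those points in [0, 1] with infinite α-Lüroth expansions.
That is, $L_\alpha : I_\alpha \to I_\alpha$ and $\sigma : \mathbb{N}^{\mathbb{N}} \to \mathbb{N}^{\mathbb{N}}$ are topologically conjugate via the conjugacy map
$h : \mathbb{N}^{\mathbb{N}} \to I_\alpha$ given by $h(\ell_1 \ell_2 \ell_3 \dots) = [\ell_1, \ell_2, \ell_3, \dots]_\alpha$.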
For each $x = [\ell_1, \ell_2, \ell_3, \dots]_\alpha \in [0, 1]$, just as was done for the continued fraction
expansion, if we truncate the α-Lüroth expansion of x after k entries, then we obtain
the k-th α-Lüroth convergent of x, that is, for each k ∈ N we obtain the finite α-Lüroth
expansion $r_k^{(\alpha)}(x)$, given by
\[
r_k^{(\alpha)}(x) := [\ell_1, \dots, \ell_k]_\alpha = t_{\ell_1} - a_{\ell_1} t_{\ell_2} + \dots + (-1)^{k-1} a_{\ell_1} \cdots a_{\ell_{k-1}} t_{\ell_k}.
\]
The behaviour of these convergents is exactly like that of the continued fraction
convergents, as shown in the following proposition.
Proposition 1.4.4. Let $x = [\ell_1, \ell_2, \ell_3, \dots]_\alpha \in I_\alpha$. Then, the sequence of α-Lüroth convergents
of x satisfies the following four properties.
(a) The sequence $\big( r_{2n}^{(\alpha)}(x) \big)_{n \ge 1}$ of even convergents is increasing.
(b) The sequence $\big( r_{2n-1}^{(\alpha)}(x) \big)_{n \ge 1}$ of odd convergents is decreasing.
(c) Every convergent of odd order is greater than every convergent of even order.
(d) $\lim_{n \to \infty} \big( r_{n+1}^{(\alpha)}(x) - r_n^{(\alpha)}(x) \big) = 0$.
Proof. The proof is very similar to that of Proposition 1.1.3 and, as such, is left as an
exercise.
Definition 1.4.5. For each k-tuple $(\ell_1, \dots, \ell_k)$ of positive integers, define the α-Lüroth
cylinder set $C_\alpha(\ell_1, \dots, \ell_k)$ associated with the α-Lüroth expansion by
Observe once again that these cylinder sets coincide up to sets of measure zero with
the elements of the refinements α̊ n of the Markov partition α̊.
Let us now introduce a second family of maps, indexed by the same collection A of
partitions of [0, 1] as were used in the definition of L α . We will soon see that these
new maps are related to the maps L α in the same way the Farey map is related to the
Gauss map.
Although the formula looks a bit cryptic, all that the transformation F α does is to map
the set A1 linearly onto the interval [0, 1) and, for each n ≥ 2, map the interval A n
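linearly onto the interval $A_{n-1}$. In particular, notice that $F_\alpha|_{A_1} = L_\alpha|_{A_1}$. The action of
$F_\alpha$ on each point $x = [\ell_1, \ell_2, \dots]_\alpha \in [0, 1]$ is given by
\[
F_\alpha(x) := \begin{cases}
[\ell_2, \ell_3, \dots]_\alpha & \text{for } \ell_1 = 1; \\
[\ell_1 - 1, \ell_2, \ell_3, \dots]_\alpha & \text{for } \ell_1 \ge 2.
\end{cases}
\]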
Notice that the map F α acts on the α-Lüroth expansion of x in precisely the same way
as the Farey map acts on the continued fraction expansion of each point x ∈ [0, 1].
Definition 1.4.7. Let the two inverse branches of the map F α be denoted by
With the convention that F α,0 (0) = 0, these two branches are given by
\[
F_{\alpha,0}(x) := \frac{a_{n+1}}{a_n} (x - t_{n+1}) + t_{n+2}, \quad \text{for } x \in A_n,\ n \ge 1
\]
and
Note that F α,0 maps the interval A n into the interval A n+1 , for each n ∈ N.
Example 1.4.8.
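(a) For the harmonic partition $\alpha_H := \{ A_n := (1/(n+1), 1/n] : n \in \mathbb{N} \}$, we obtain the
$\alpha_H$-Farey map $F_{\alpha_H}$, which is given explicitly by
\[
F_{\alpha_H}(x) := \begin{cases}
2 - 2x & \text{for } x \in (1/2, 1]; \\
\dfrac{n+1}{n-1} \, x - \dfrac{1}{n(n-1)} & \text{for } x \in (1/(n+1), 1/n],\ n \ge 2.
\end{cases}
\]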
We now show that the relationship between the maps L α and F α is exactly the same
as the relationship between the maps G and F, that is, L α is a jump transformation of
F α . More precisely, we make the following definition.
Fig. 1.8. The $\alpha_D$-Lüroth and $\alpha_D$-Farey map, which coincides with the tent map $T$, where $t_n = (1/2)^{n-1}$, $n \in \mathbb{N}$.
Notice that the map $\rho_\alpha$ is finite everywhere on (0, 1]. Then, let the map $F^*_\alpha : [0, 1] \to [0, 1]$ be defined by
$$F^*_\alpha(x) := \begin{cases} F_\alpha^{\rho_\alpha(x)+1}(x) & \text{if } x \neq 0; \\ 0 & \text{if } x = 0. \end{cases}$$
We then obtain the following result. Note that the proof can be copied line by line from
the proof of the corresponding result for the Farey and Gauss maps, so we omit it here.
or
In particular, this means that if we instead write x in its α-Farey coding, that is, $x = \langle x_1, x_2, x_3, \ldots \rangle_\alpha$, then
Therefore, the map $F_\alpha : [0, 1] \to [0, 1]$ is a factor of the shift map σ on the shift
space $\{0, 1\}^{\mathbb{N}}$, via the factor map $h : \{0, 1\}^{\mathbb{N}} \to [0, 1]$ defined by $h((x_1, x_2, x_3, \ldots)) := \langle x_1, x_2, x_3, \ldots \rangle_\alpha$.
Let us now define the cylinder sets associated with the map F α . These once more
coincide with the refinements of the Markov partition given above for F α .
Definition 1.4.11. For each n-tuple $(x_1, \ldots, x_n) \in \{0, 1\}^n$, define the α-Farey
cylinder set $\widehat{C}_\alpha(x_1, \ldots, x_n)$ by setting
$$\widehat{C}_\alpha(x_1, \ldots, x_n) := \{\langle y_1, y_2, \ldots \rangle_\alpha : y_i = x_i \text{ for } 1 \le i \le n\}.$$
By analogy with the Farey decomposition described after Definition 1.3.4, we will
call the set of cylinder sets $\{\widehat{C}_\alpha(x_1, \ldots, x_n) : (x_1, \ldots, x_n) \in \{0, 1\}^n\}$ the n-th level
α-Farey decomposition. Observe that we have the relation $\widehat{C}_\alpha(x_1, \ldots, x_n) = F_{\alpha,x_1} \circ \cdots \circ F_{\alpha,x_n}([0, 1])$.
Notice that every α-Lüroth cylinder set is also an α-Farey cylinder set, whereas the
converse of this statement is not true. The precise description of the correspondence
is that any α-Farey cylinder set which has the form $\widehat{C}_\alpha(0^{\ell_1-1}, 1, \ldots, 0^{\ell_k-1}, 1)$ coincides
with the α-Lüroth cylinder set $C_\alpha(\ell_1, \ldots, \ell_k)$, but if an α-Farey cylinder set is defined by
a finite word ending in the symbol 0, then it cannot be translated to a single α-Lüroth
cylinder set. However, we do have the relation
$$\widehat{C}_\alpha(0^{\ell_1-1}, 1, 0^{\ell_2-1}, 1, \ldots, 0^{\ell_k-1}, 1, 0^m) = \bigcup_{n \ge m+1} C_\alpha(\ell_1, \ell_2, \ldots, \ell_k, n).$$
It therefore follows that for the Lebesgue measure of this interval we have that
$$\lambda\bigl(\widehat{C}_\alpha(0^{\ell_1-1}, 1, 0^{\ell_2-1}, 1, \ldots, 0^{\ell_k-1}, 1, 0^m)\bigr) = \sum_{n \ge m+1} \lambda\bigl(C_\alpha(\ell_1, \ell_2, \ldots, \ell_k, n)\bigr).$$
In addition, we can identify the endpoints of each α-Farey cylinder set. If we consider
the set $\widehat{C}_\alpha(0^{\ell_1-1}, 1, \ldots, 0^{\ell_k-1}, 1)$, then we already know the endpoints of this interval
(since it is also equal to an α-Lüroth cylinder set). On the other hand, the endpoints
of the set $\widehat{C}_\alpha(0^{\ell_1-1}, 1, 0^{\ell_2-1}, 1, \ldots, 0^{\ell_k-1}, 1, 0^m)$ are given by $[\ell_1, \ldots, \ell_k, m + 1]_\alpha$ and
$[\ell_1, \ldots, \ell_k]_\alpha$.
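The translation between α-Farey words and α-Lüroth digits can be sketched in a few lines of Python. The sketch assumes that an α-Lüroth cylinder has length $a_{\ell_1} \cdots a_{\ell_k}$ and that $\sum_{n \ge m+1} a_n = t_{m+1}$, so that a Farey word with $m$ trailing zeros has measure $a_{\ell_1} \cdots a_{\ell_k} \, t_{m+1}$. The harmonic tails and all function names are illustrative choices.

```python
from fractions import Fraction
from itertools import product


def tail(n):
    """Tail t_n = 1/n of the harmonic partition (illustrative choice)."""
    return Fraction(1, n)


def atom_length(n):
    """Length a_n = t_n - t_{n+1} of the n-th partition element."""
    return tail(n) - tail(n + 1)


def farey_word_to_luroth(word):
    """Split a 0/1 word into digits (blocks 0^(l-1) 1) and the number m of trailing zeros."""
    digits, zeros = [], 0
    for symbol in word:
        if symbol == 1:
            digits.append(zeros + 1)
            zeros = 0
        else:
            zeros += 1
    return digits, zeros


def farey_cylinder_measure(word):
    """Lebesgue measure of the alpha-Farey cylinder coded by `word`."""
    digits, m = farey_word_to_luroth(word)
    measure = Fraction(1)
    for digit in digits:
        measure *= atom_length(digit)
    return measure * tail(m + 1)     # tail(1) = 1 when there are no trailing zeros


def level_total(n):
    """Total measure of all n-th level alpha-Farey cylinders."""
    return sum(farey_cylinder_measure(word) for word in product((0, 1), repeat=n))
```

Since the n-th level cylinders partition the unit interval up to endpoints, `level_total(n)` should equal 1 for every `n`.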
Let us now consider the topological properties of the maps F α . Perhaps by now
the reader will not be surprised to learn that they are essentially the same as the
topological properties of the Farey map F. Again, the proofs can be closely modelled
after the proofs of the corresponding results for the Farey map and so we leave many
of them as exercises.
Before stating the first proposition, we remind the reader that the measure of
maximal entropy $\mu_\alpha$ for the system $F_\alpha$ is the measure that assigns mass $2^{-n}$ to each
n-th level α-Farey cylinder set, for each n ∈ N.
Proposition 1.4.12. The dynamical systems $([0, 1], F_\alpha)$ and $([0, 1], T)$ are
topologically conjugate and the conjugating homeomorphism is given, for each
$x = [\ell_1, \ell_2, \ell_3, \ldots]_\alpha$, by
$$\theta_\alpha(x) := -2 \sum_{k=1}^{\infty} (-1)^k \, 2^{-\sum_{i=1}^{k} \ell_i}.$$
Moreover, the map $\theta_\alpha$ is equal to the distribution function of the measure of maximal
entropy $\mu_\alpha$ for the α-Farey map.
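The conjugacy relation $\theta_\alpha \circ F_\alpha = T \circ \theta_\alpha$ can be checked on the level of digits. The sketch below works with finite digit lists (that is, with partial sums of the series), for which the relation already holds exactly. The function names are hypothetical, and the tent map is taken to be $T(y) = 2y$ for $y \le 1/2$ and $T(y) = 2 - 2y$ otherwise.

```python
from fractions import Fraction


def theta(digits):
    """Partial sum of theta_alpha for a finite list of alpha-Lüroth digits."""
    total, exponent = Fraction(0), 0
    for k, digit in enumerate(digits, start=1):
        exponent += digit
        total += Fraction((-1) ** k, 2 ** exponent)
    return -2 * total


def farey_shift(digits):
    """Action of F_alpha on digits: drop a leading 1, otherwise decrement the first digit."""
    if digits[0] == 1:
        return digits[1:]
    return [digits[0] - 1] + digits[1:]


def tent(y):
    """The tent map T on [0, 1]."""
    return 2 * y if y <= Fraction(1, 2) else 2 - 2 * y
```

Note that `theta` depends only on the digits and not on the partition α; the partition enters only through the map from points to digits.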
In order to determine the Hölder and sub-Hölder exponents of $\theta_\alpha$, let us first define
$\kappa(n) := -n \log 2/\log a_n$ and set
$$\kappa_+ := \inf\{\kappa(n) : n \in \mathbb{N}\} \quad \text{and} \quad \kappa_- := \sup\{\kappa(n) : n \in \mathbb{N}\}.$$
Proposition 1.4.13. We have that the map θ α is κ+ -Hölder continuous and, provided that
κ− is finite, κ− -sub-Hölder continuous.
This can be seen by simply calculating the image of the endpoints of this cylinder,
or by noting that every α-Lüroth cylinder set $C_\alpha(\ell_1, \ell_2, \ldots, \ell_k)$ is an n-th level α-Farey
cylinder set, where $n = \sum_{j=1}^{k} \ell_j$.
From this point, the proof that θ α is κ+ -Hölder continuous is completed in precisely
the same way as the proof that Q is log 2/(2 log γ )-Hölder continuous and we leave the
details to the reader.
Suppose now that $\kappa_+$ is equal to zero. Then, we have that for each q ∈ N there
exists $m_0 \in \mathbb{N}$ with the property that for every $m \ge m_0$,
$$\kappa(m) = \frac{m \log 2}{-\log a_m} < \frac{1}{q}, \quad \text{or, equivalently,} \quad a_m < 2^{-qm}.$$
The proof can be completed analogously to the proof of the Hölder continuity of
θ α from this point on and the details are left as an exercise for the reader (see
Exercise 1.6.19).
Example 1.4.14. For the conjugacy map $\theta_{\alpha_H}$ between the map $F_{\alpha_H}$, arising from the
harmonic partition, and the tent map T, we have that $\theta_{\alpha_H}$ is (log 4/ log 6)-Hölder
continuous. To show this, first observe that
$$\kappa(1) = \frac{-\log 2}{\log(1/2)} = 1 > \frac{2 \log 2}{\log 6} = \kappa(2) \quad \text{and} \quad \frac{2 \log 2}{\log 6} < \frac{3 \log 2}{\log 12} = \kappa(3).$$
Then, since $6^n > (n^2 + n)^2$ for n ≥ 3, we have that for all n ≥ 3,
$$\kappa(n) = \frac{n \log 2}{\log(n(n + 1))} > \frac{2 \log 2}{\log 6} = \kappa(2).$$
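The inequalities above can be checked numerically. The following sketch evaluates $\kappa(n)$ for the harmonic partition, where $a_n = 1/(n(n+1))$, over a finite range of $n$; the range 1 to 50 is an arbitrary illustrative cut-off and a numerical check of this kind is of course no substitute for the argument in the text.

```python
import math


def kappa(n):
    """kappa(n) = -n log 2 / log a_n for the harmonic partition, a_n = 1/(n(n+1))."""
    return n * math.log(2) / math.log(n * (n + 1))


values = {n: kappa(n) for n in range(1, 51)}
n_min = min(values, key=values.get)      # index at which kappa is smallest
```

Here `n_min` should come out as 2, so that the smallest value found is $\kappa(2) = \log 4/\log 6$.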
$$|\theta_{\alpha_H}(x) - \theta_{\alpha_H}(y)| \gg |x - y|^{\kappa}.$$
In particular, this inequality has to be satisfied for x = 0 and y given by $[n]_{\alpha_H} = 1/n$
successively, for each n ∈ N. But here we have that $|\theta_{\alpha_H}(0) - \theta_{\alpha_H}(1/n)| = 2^{-(n-1)}$ and
$|0 - 1/n| = 1/n$, which implies that there can be no such κ.
Notice that, in line with the fact that there is no sub-Hölder continuity in this case,
we also have that κ− is infinite.
Remark 1.4.15. The reasoning given above for why the map θ α H fails to be sub-Hölder
continuous also works for the map Q (the conjugacy map between the Farey and
tent maps). In other words, there is no positive constant κ such that the map Q is
κ-sub-Hölder continuous.
Let us now introduce some particular classes of partitions that will be useful in the
chapters that follow, particularly in Chapter 3. Before beginning this task, we first
recall the definition of a slowly varying function.
$$\lim_{x \to \infty} \frac{\psi(xy)}{\psi(x)} = 1, \quad \text{for all } y > 0.$$
In the following proposition, we list some of the useful properties that slowly varying
functions satisfy. From this list, it should be clear that the idea behind a slowly varying
function is that it behaves like a logarithmic function.
(b) $\lim_{x \to \infty} \dfrac{\log(\psi(x))}{\log(x)} = 0$.
(c) For any −∞ < a < ∞, the functions $\psi^a$, ψ · φ and ψ + φ are all slowly varying.
(b) The partition α is said to be expansive of exponent θ ≥ 0 if the tails of the partition
satisfy the power law
$$t_n = \psi(n) \cdot n^{-\theta},$$
Notice that if α is expanding, one immediately verifies that α is of finite type. This
can be seen, for instance, by applying the ratio test for series convergence. The next
proposition describes the situation for expansive partitions.
Proof. Suppose first that α is expansive of exponent θ ∈ [0, 1). Then, by Proposition 1.4.17,
for all ε > 0 there exists $n_0 \in \mathbb{N}$ such that if $n \ge n_0$, then we have that
$\psi(n) \ge n^{-\varepsilon}$. Let ε > 0 be sufficiently small such that θ + ε ∈ (0, 1). Then,
$$\sum_{n=1}^{\infty} t_n = \sum_{n=1}^{n_0-1} \psi(n) \cdot n^{-\theta} + \sum_{n=n_0}^{\infty} \psi(n) \cdot n^{-\theta} \ge \sum_{n=n_0}^{\infty} n^{-(\theta+\varepsilon)} \ge \sum_{n=n_0}^{\infty} n^{-1}.$$
Consequently, $\sum_{n=1}^{\infty} t_n = \infty$ and α is of infinite type. Now suppose that α is expansive of
exponent θ > 1. Then, again by Proposition 1.4.17, for all ε > 0 there exists $n_0 \in \mathbb{N}$ such
that if $n \ge n_0$, then we have that $\psi(n) \le n^{\varepsilon}$. For ε > 0 small enough such that θ − ε > 1,
we then have that
$$\sum_{n=n_0}^{\infty} t_n = \sum_{n=n_0}^{\infty} \psi(n) \cdot n^{-\theta} \le \sum_{n=n_0}^{\infty} n^{-(\theta-\varepsilon)} < \infty.$$
Therefore, in this case, the partition α is of finite type. It only remains to prove the
third assertion, which can be done by considering the following two examples. First,
let $t_1 := 1$ and for each n ≥ 2, let $t_n := (n \log n)^{-1}$. The partition α defined in such a way satisfies
$$\sum_{n=1}^{\infty} t_n = 1 + \sum_{n=2}^{\infty} \frac{1}{n \log n},$$
which diverges. So, in this first case, the partition is of infinite type. On the other hand,
if now we define a partition by setting $t_1 := 1$ and $t_n := n^{-1} \cdot (\log n)^{-2}$ for n ≥ 2, we obtain
that
$$\sum_{n=1}^{\infty} t_n = 1 + \sum_{n=2}^{\infty} \frac{1}{n (\log n)^2},$$
which is a convergent series, so in this case we have that the partition is of finite type.
This finishes the proof.
Fig. 1.9 illustrates two α-Farey maps with α expansive. The graph on the left-hand
side has α with exponent θ = 2, so satisfies the condition of the second part of
Proposition 1.4.19. The graph on the right-hand side has α with exponent θ = 1/2, so
it satisfies the condition given in the first part of Proposition 1.4.19.
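The difference between the two partitions can also be seen numerically by comparing partial sums of the tails. The sketch below is only an illustration: the function names and cut-offs are arbitrary, and finite partial sums can suggest but never prove convergence or divergence.

```python
import math


def partial_sum(tail, N):
    """Partial sum t_1 + ... + t_N of the tails."""
    return sum(tail(n) for n in range(1, N + 1))


def finite_type_tail(n):
    """t_n = 1/n^2, exponent theta = 2."""
    return 1.0 / n ** 2


def infinite_type_tail(n):
    """t_n = 1/sqrt(n), exponent theta = 1/2."""
    return 1.0 / math.sqrt(n)


bounded = partial_sum(finite_type_tail, 10 ** 4)       # stays below pi^2 / 6
unbounded = partial_sum(infinite_type_tail, 10 ** 4)   # grows roughly like 2 sqrt(N)
```

For the first partition the partial sums stay below $\pi^2/6$, whereas for the second they exceed $2\sqrt{N+1} - 2$ by comparison with the integral of $x^{-1/2}$.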
Fig. 1.9. The graphs of two α-Farey maps with α expansive. The partition on the left is of finite type
with α given by $t_n := 1/n^2$, n ∈ N and the partition on the right is of infinite type with α given by
$t_n := 1/\sqrt{n}$, n ∈ N.

In this section, we consider some easily-obtained results for the α-Lüroth expansion
which are analogous to the metrical Diophantine results already given above
for the continued fraction expansion. We will first consider the equivalent of the
badly-approximable numbers.
and set
$$B_\alpha := \bigcup_{N \in \mathbb{N}} B_{\alpha,N}.$$
Lemma 1.4.21.
λ (Bα ) = 0.
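Proof. First notice that we can write the set of badly α-approximable numbers in the
following way:
$$B_\alpha = \bigcup_{N \in \mathbb{N}} A_{\alpha,N},$$
where
$$A^{(n)}_{\alpha,N} := \{ x = [\ell_1, \ell_2, \ldots]_\alpha \in I_\alpha : \ell_k \le N \text{ for all } 1 \le k \le n \}.$$
$$A^{(n+1)}_{\alpha,N} = \bigcup_{\substack{\ell_1, \ldots, \ell_{n+1} \\ \ell_i \le N,\ 1 \le i \le n+1}} C_\alpha(\ell_1, \ldots, \ell_{n+1}) = \bigcup_{\substack{\ell_1, \ldots, \ell_n \\ \ell_i \le N,\ 1 \le i \le n}} \ \bigcup_{k \le N} C_\alpha(\ell_1, \ldots, \ell_n, k).$$
Since the last term above is simply a constant and since $0 < \sum_{k=1}^{N} a_k < 1$, this shows
that $\lambda(A_{\alpha,N}) = 0$, for any N ∈ N. Finally, we have that
$$\lambda\Bigl(\bigcup_{N=1}^{\infty} A_{\alpha,N}\Bigr) \le \sum_{N=1}^{\infty} \lambda(A_{\alpha,N}) = 0.$$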
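The decay of the measure of the sets $A^{(n)}_{\alpha,N}$ can be made concrete by brute force. The sketch below assumes that an α-Lüroth cylinder $C_\alpha(\ell_1, \ldots, \ell_n)$ has length $a_{\ell_1} \cdots a_{\ell_n}$, so that summing over all digit tuples bounded by $N$ gives $\bigl(\sum_{k=1}^{N} a_k\bigr)^n$. The harmonic partition and the function names are illustrative choices.

```python
from fractions import Fraction
from itertools import product


def atom_length(k):
    """a_k = 1/(k(k+1)) for the harmonic partition (illustrative choice)."""
    return Fraction(1, k * (k + 1))


def measure_bounded_digits(N, n):
    """Sum of the cylinder lengths over all digit tuples of length n with entries <= N."""
    total = Fraction(0)
    for digits in product(range(1, N + 1), repeat=n):
        cylinder = Fraction(1)
        for digit in digits:
            cylinder *= atom_length(digit)   # length of C(l_1..l_n) is a_{l_1} ... a_{l_n}
        total += cylinder
    return total


def closed_form(N, n):
    """(a_1 + ... + a_N)^n, which tends to 0 as n grows."""
    return sum(atom_length(k) for k in range(1, N + 1)) ** n
```

For the harmonic partition, $\sum_{k=1}^{N} a_k = 1 - 1/(N+1)$, so the measures decay geometrically in $n$ for each fixed $N$.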
Proof. Notice that the complement of $W_\alpha$ is the set of all those α-irrational numbers
with bounded α-Lüroth elements. The corollary then follows directly from
Lemma 1.4.21.
Although the sets Aα,N defined in the proof of Lemma 1.4.21 have Lebesgue measure
zero for every N ∈ N, we can still distinguish between their sizes by calculating their
Hausdorff dimension. Luckily, this is very easy to do, as the next lemma demonstrates.
The reader unfamiliar with the Hausdorff dimension of a set can either refer, for
instance, to the book by Falconer [Fal14], or can safely ignore the next two results,
as they will not be needed for anything that follows.
$$\dim_H(A_{\alpha,N}) = s, \quad \text{where } s \text{ is given by } \sum_{i=1}^{N} a_i^s = 1.$$
Proof. All that is required to prove this statement is to notice that for each N ∈ N the
set Aα,N is an invariant set for a finite iterated function system {L α,1 , . . . , L α,N }, where
L α,n denotes the n-th inverse branch of the map L α . Recall that these are given by
L α,n (x) := t n − a n x. That Aα,N is an invariant set for this system means that
$$A_{\alpha,N} = \bigcup_{i=1}^{N} L_{\alpha,i}(A_{\alpha,N}).$$
Then, since these inverse branches are contracting similarities, that is, they satisfy
the equality |L α,i (x) − L α,i (y)| = a i |x − y| for all x, y ∈ [0, 1], we have that the dimension
of Aα,N can be deduced directly from an application of Hutchinson’s Formula (see
[Fal14], Theorem 9.3).
This observation can be used to calculate the Hausdorff dimension of the set Bα , as
follows.
$$\dim_H(B_\alpha) = 1.$$
Proof. Since $B_\alpha := \bigcup_{N \in \mathbb{N}} A_{\alpha,N}$, we have that
$$\dim_H(B_\alpha) = \sup\{\dim_H(A_{\alpha,N}) : N \in \mathbb{N}\}.$$
Then, by Lemma 1.4.23, $\dim_H(A_{\alpha,N}) = s$, where s is given by $\sum_{i=1}^{N} a_i^s = 1$ and
$\dim_H(A_{\alpha,N+1}) = t$, where t is given by $\sum_{i=1}^{N+1} a_i^t = 1$. Therefore, $a_1^t + \cdots + a_N^t < 1$ and
so s < t. In other words,
$$\dim_H(A_{\alpha,N}) < \dim_H(A_{\alpha,N+1}).$$
Furthermore, as $\sum_{i=1}^{\infty} a_i = 1$, it follows that $\dim_H(B_\alpha) = 1$.
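The dimension value $s$ in Lemma 1.4.23 is easy to approximate numerically. The sketch below solves $\sum_{i=1}^{N} a_i^s = 1$ by bisection on $[0, 1]$, using the fact that the left-hand side is decreasing in $s$. The harmonic partition, the tolerance and the function names are assumptions made for the illustration.

```python
def hutchinson_dimension(ratios, tol=1e-12):
    """Solve sum(r**s for r in ratios) == 1 for s in [0, 1] by bisection."""
    def excess(s):
        return sum(r ** s for r in ratios) - 1.0

    lo, hi = 0.0, 1.0            # excess(0) >= 0 and excess(1) < 0 when sum(ratios) < 1
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def harmonic_ratios(N):
    """Contraction ratios a_1, ..., a_N of the harmonic partition."""
    return [1.0 / (k * (k + 1)) for k in range(1, N + 1)]


dims = [hutchinson_dimension(harmonic_ratios(N)) for N in (1, 2, 5, 50, 500)]
```

The computed values in `dims` increase with $N$ and stay below 1, in line with the monotonicity used in the proof above.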
Note that similarly, the Hausdorff dimension of the set of badly approximable numbers
(for the continued fraction expansion) is also known to be equal to 1. However, the
proof is much more involved, so we simply refer to [Jar29]. Let us now consider the
result analogous to Theorem 1.2.19.
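Theorem 1.4.25.
(a) Let φ : N → N be a function such that the series $\sum_{n=1}^{\infty} t_{\varphi(n)}$ diverges. Where the set
$B_{\alpha,\varphi}$ is defined by
$$B_{\alpha,\varphi} := \{x = [\ell_1, \ell_2, \ldots]_\alpha \in I_\alpha : \ell_n < \varphi(n) \text{ for all } n \in \mathbb{N}\},$$
we have that
$$\lambda(B_{\alpha,\varphi}) = 0.$$
(b) Let φ : N → N be a function such that the series $\sum_{n=1}^{\infty} t_{\varphi(n)}$ converges. Where the set
$W_{\alpha,\varphi}$ is defined to be
$$W_{\alpha,\varphi} := \{x = [\ell_1, \ell_2, \ldots]_\alpha \in I_\alpha : \ell_n \ge \varphi(n) \text{ for infinitely many } n \in \mathbb{N}\},$$
we have that
$$\lambda(W_{\alpha,\varphi}) = 0.$$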
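Proof. For the proof of part (a), we proceed similarly to the proof of Lemma 1.4.21.
Define the sets $B^{(n)}_{\alpha,\varphi}$ by setting
$$B^{(n)}_{\alpha,\varphi} := \{x = [\ell_1, \ell_2, \ldots]_\alpha \in I_\alpha : \ell_k < \varphi(k) \text{ for all } 1 \le k \le n\}.$$
Thus,
$$\lambda\bigl(B^{(n+1)}_{\alpha,\varphi}\bigr) = \bigl(1 - t_{\varphi(n+1)}\bigr)\, \lambda\bigl(B^{(n)}_{\alpha,\varphi}\bigr) = \cdots = \prod_{k=1}^{n} \bigl(1 - t_{\varphi(k+1)}\bigr)\, \lambda\bigl(B^{(1)}_{\alpha,\varphi}\bigr).$$
Now, since $1 - x \le e^{-x}$ for all 0 < x < 1, we then have that
$$\lambda\bigl(B^{(n+1)}_{\alpha,\varphi}\bigr) \le e^{-\sum_{k=1}^{n} t_{\varphi(k+1)}}\, \lambda\bigl(B^{(1)}_{\alpha,\varphi}\bigr).$$
Consequently, as the sum $\sum_{k=1}^{n} t_{\varphi(k+1)}$ can be made arbitrarily large as n increases,
we have that $\lim_{n \to \infty} \lambda\bigl(B^{(n)}_{\alpha,\varphi}\bigr) = 0$.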
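The product formula and the exponential bound can be compared on a concrete case. The sketch below uses the harmonic partition ($t_n = 1/n$) and the hypothetical choice $\varphi(n) = n + 1$, for which $\sum_n t_{\varphi(n)}$ diverges. It also assumes that $\lambda\bigl(B^{(1)}_{\alpha,\varphi}\bigr) = 1 - t_{\varphi(1)}$, which follows from $\sum_{k < m} a_k = 1 - t_m$.

```python
import math
from fractions import Fraction


def tail(n):
    """t_n = 1/n for the harmonic partition (illustrative choice)."""
    return Fraction(1, n)


def phi(n):
    """Hypothetical choice phi(n) = n + 1, for which the series of t_{phi(n)} diverges."""
    return n + 1


def measure_B(n):
    """Product of (1 - t_{phi(k)}) over k = 1..n, the measure of the n-th set."""
    result = Fraction(1)
    for k in range(1, n + 1):
        result *= 1 - tail(phi(k))
    return result


def exponential_bound(n):
    """exp(-(t_{phi(1)} + ... + t_{phi(n)})), an upper bound for measure_B(n)."""
    return math.exp(-sum(float(tail(phi(k))) for k in range(1, n + 1)))
```

In this example the product telescopes to $1/(n+1)$, which indeed tends to zero and stays below the exponential bound.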
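Concerning the proof of part (b), we will again use the Borel–Cantelli Lemma.
Notice that
$$W_{\alpha,\varphi} = \limsup_{n \to \infty} W^{(n)}_{\alpha,\varphi}, \quad \text{where } W^{(n)}_{\alpha,\varphi} := \{x \in I_\alpha : \ell_n \ge \varphi(n)\}.$$
By the Borel–Cantelli Lemma, it therefore suffices to show that
$$\sum_{n=1}^{\infty} \lambda\bigl(W^{(n)}_{\alpha,\varphi}\bigr) < \infty.$$
Indeed,
$$\begin{aligned}
\lambda\bigl(W^{(n)}_{\alpha,\varphi}\bigr) &= \lambda\Bigl(\bigcup_{(\ell_1, \ldots, \ell_{n-1}) \in \mathbb{N}^{n-1}} \ \bigcup_{k : k \ge \varphi(n)} C_\alpha(\ell_1, \ldots, \ell_{n-1}, k)\Bigr) \\
&= \sum_{(\ell_1, \ldots, \ell_{n-1}) \in \mathbb{N}^{n-1}} \ \sum_{k : k \ge \varphi(n)} \lambda\bigl(C_\alpha(\ell_1, \ldots, \ell_{n-1}, k)\bigr) \\
&= \sum_{(\ell_1, \ldots, \ell_{n-1}) \in \mathbb{N}^{n-1}} \ \sum_{k : k \ge \varphi(n)} a_{\ell_1} \cdots a_{\ell_{n-1}} a_k = t_{\varphi(n)}.
\end{aligned}$$
By assumption, the series $\sum_{n=1}^{\infty} t_{\varphi(n)}$ converges and so, therefore, does the series
$\sum_{n=1}^{\infty} \lambda\bigl(W^{(n)}_{\alpha,\varphi}\bigr)$. This finishes the proof.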
Remark 1.4.26. Notice that if the partition α is of finite type, that is, if α is such that
$\sum_{n=1}^{\infty} t_n$ converges, then we have that $\lambda(B_{\alpha,\varphi}) = 1$ for any arbitrary increasing function
φ : N → N.
The Farey map is named for John Farey (1766–1826), who was not a mathematician,
but a geologist. Farey’s one contribution to Mathematics was the article On a curious
property of vulgar fractions [Far16], in which he defines Farey sequences in the
following way. For each n ∈ N, list all the rationals between 0 and 1 which, when
expressed in their lowest terms, have denominator at most equal to n. Denoting the
n-th Farey sequence by Fn , the first few are given by
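$$F_1 := \Bigl\{\tfrac{0}{1}, \tfrac{1}{1}\Bigr\}, \quad F_2 := \Bigl\{\tfrac{0}{1}, \tfrac{1}{2}, \tfrac{1}{1}\Bigr\}, \quad F_3 := \Bigl\{\tfrac{0}{1}, \tfrac{1}{3}, \tfrac{1}{2}, \tfrac{2}{3}, \tfrac{1}{1}\Bigr\},$$
$$F_4 := \Bigl\{\tfrac{0}{1}, \tfrac{1}{4}, \tfrac{1}{3}, \tfrac{1}{2}, \tfrac{2}{3}, \tfrac{3}{4}, \tfrac{1}{1}\Bigr\}, \quad F_5 := \Bigl\{\tfrac{0}{1}, \tfrac{1}{5}, \tfrac{1}{4}, \tfrac{1}{3}, \tfrac{2}{5}, \tfrac{1}{2}, \tfrac{3}{5}, \tfrac{2}{3}, \tfrac{3}{4}, \tfrac{4}{5}, \tfrac{1}{1}\Bigr\}, \ \ldots$$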
The curious property of Farey’s title is that each member of the sequence is equal to the
mediant of its two neighbours. Recall that the mediant of two rational numbers $a/b$
and $a'/b'$ is by definition the rational number $(a + a')/(b + b')$. Farey did not himself
provide a proof of his discovered property² and he was doubtless not the first to notice
it. Cauchy supplied the necessary proof in the same year that Farey’s article appeared.
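Farey's definition and his "curious property" are easy to reproduce on a computer. The following Python sketch builds the n-th Farey sequence directly from the definition and checks that each interior term equals the mediant of its two neighbours once the mediant is reduced to lowest terms; the function names are illustrative.

```python
from fractions import Fraction


def farey_sequence(n):
    """All reduced fractions in [0, 1] with denominator at most n, in increasing order."""
    return sorted({Fraction(p, q) for q in range(1, n + 1) for p in range(q + 1)})


def has_mediant_property(seq):
    """Check that each interior term is the (reduced) mediant of its two neighbours."""
    return all(
        Fraction(left.numerator + right.numerator,
                 left.denominator + right.denominator) == mid
        for left, mid, right in zip(seq, seq[1:], seq[2:])
    )
```

For instance, `farey_sequence(5)` has eleven elements, from 0/1 up to 1/1.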
We have already seen that if we iterate the point 1/2 under the two inverse
branches of the Farey map, each time one of the Farey fractions turns up. However,
as we have already pointed out, strictly speaking it is not the Farey sequence which
appears in this manner, but rather the Stern–Brocot sequence. The Stern–Brocot
sequence was independently discovered by the German number-theorist Moritz Stern
[Ste58] and the French clockmaker Achille Brocot [Bro61]. (Brocot used the sequences
to design systems of gears. For information on these sorts of applications, see Chapter
IV of Rockett and Szűsz [RS92].) For this reason, it would perhaps be more reasonable
to refer to the Farey map as the Stern–Brocot map. However, we choose to stick with
convention on this point.
2 That Farey did not give a proof of his curious property was pointed out by Hardy [HW08], with the
rather unfriendly comment that Farey was “at the best an indifferent mathematician”.
1.5 Notes and historical remarks | 59
In the paper [Lür83], J. Lüroth introduces a series representation of real numbers from
the unit interval. His starting point is the observation that for every real number x in
the interval (0, 1), either $x = 1/\ell$, for some positive integer $\ell \ge 2$, or $1/x$ lies between
two successive positive integers $\ell_1$ and $\ell_1 + 1$ and so
$$x = \frac{1}{\ell_1 + 1} + x',$$
Clearly, this process either continues until such a time as one of the x i is equal to the
reciprocal of a positive integer that is at least equal to 2, or continues indefinitely. For
the special case that x = 1, we notice that 1 = 1/2 + 1/4 + 1/8 + . . .. In each case, this
gives the series expansion now called the Lüroth expansion of a real number in [0, 1].
Each finite expansion of the form above represents a rational number. Suppose
now that x ∈ [0, 1] has an infinite Lüroth expansion. Since each $\ell_k$ is at least equal to
1, for the k-th term in the Lüroth expansion of x we have that
$$\frac{1}{\ell_1(\ell_1 + 1) \cdots \ell_{k-1}(\ell_{k-1} + 1)(\ell_k + 1)} \le \frac{1}{2^k}.$$
We will use Lüroth’s original notation and write $x = S(\ell_1, \ell_2, \ldots)$ for this sum. For
instance, we have that 1 = S(1, 1, 1, . . .). The next observation in [Lür83] is that if
x ∈ [0, 1] has a finite Lüroth expansion, that is, if $x = S(\ell_1, \ell_2, \ldots, \ell_k)$ for some k ∈ N,
then
mentioned, each finite Lüroth expansion represents a rational number, but it is easy
to see, using only the sum of a geometric series, that each (eventually) periodic infinite
Lüroth expansion is also a rational number. Of course, each finite Lüroth expansion
can also be written as an eventually periodic expansion; in this case the periodic
part consists of infinitely many ones. The proofs of these statements are also given in
[Lür83].
It seems probable that Lüroth was thinking of a generalisation of the decimal
expansion of a real number when he introduced his infinite series expansion. He
states that the given expansion has many similarities with the representation through
infinite decimal expansions and asks whether or not it is possible to characterise the
numbers which have a finite Lüroth expansion in any other way, that is, as in the
way that rational numbers with finite decimal representations are exactly those with
denominators equal to $2^n 5^m$ for some positive integers n and m. As of the present
moment, we are unaware of any answer to this question.
The Lüroth expansion can also be generated by a dynamical system, L : [0, 1] →
[0, 1]. The map L is referred to as the Lüroth map and it is defined by
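$$L(x) := \begin{cases} n(n + 1)x - n & \text{for } x \in \Bigl[\dfrac{1}{n+1}, \dfrac{1}{n}\Bigr),\ n \ge 2; \\[1ex] 2x - 1 & \text{for } x \in \Bigl[\dfrac{1}{2}, 1\Bigr]; \\[1ex] 0 & \text{for } x = 0. \end{cases}$$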
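The way the Lüroth map generates the digits of the series expansion can be sketched as follows. The sketch assumes the digit convention $\ell = n$ whenever the current point lies in $[1/(n+1), 1/n)$ (and $\ell = 1$ on $[1/2, 1]$), chosen so that the series agrees with the k-th term displayed earlier; the function names are hypothetical and exact rational arithmetic is used throughout.

```python
import math
from fractions import Fraction


def luroth_step(x):
    """Return (L(x), n), where n indexes the branch [1/(n+1), 1/n) containing x."""
    if x == 0:
        return Fraction(0), None
    n = max(1, math.ceil(1 / x) - 1)     # n = 1 also covers the endpoint x = 1
    return n * (n + 1) * x - n, n


def luroth_digits(x, max_digits=10):
    """Digits read off along the orbit of x, stopping if the orbit reaches 0."""
    digits = []
    while x != 0 and len(digits) < max_digits:
        x, n = luroth_step(x)
        digits.append(n)
    return digits


def luroth_sum(digits):
    """Evaluate the finite series S(l_1, ..., l_k)."""
    total, scale = Fraction(0), Fraction(1)
    for digit in digits:
        total += scale / (digit + 1)
        scale /= digit * (digit + 1)
    return total
```

For example, 3/4 has the finite expansion S(1, 1), whereas 2/5 is a fixed point of the map and so has the periodic expansion S(2, 2, 2, ...), a small instance of the fact that eventually periodic expansions represent rational numbers.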
all positive slopes instead of all negative slopes. The map L α H was first described by
S. Kalpazidou, A. Knopfmacher and J. Knopfmacher [KKK91] in the early 1990s; they
called it the alternating Lüroth map and established some of its basic properties. The
Lüroth map and, to a lesser extent, the alternating Lüroth map have been studied
by several authors. In addition to those already cited above, these works include
[BBDK96], [DK96], [Gal72], [Gan01], [Šal68], [SW07] and in particular [DK02].
1.6 Exercises
Exercise 1.6.1 (Dirichlet’s Approximation Theorem). Fix x ∈ R. Prove that for every
N ∈ N there exist p, q ∈ Z with 1 ≤ q ≤ N such that
$$|xq - p| \le \frac{1}{N}$$
and deduce that for infinitely many co-prime integers p and q we have that
$$\Bigl|x - \frac{p}{q}\Bigr| \le \frac{1}{q^2}.$$
Hint: Apply the Pigeonhole Principle to the ‘pigeons’ $kx - \lfloor kx \rfloor$, for k = 0, . . . , N and the
‘holes’ $[\ell/N, (\ell + 1)/N)$, $\ell$ = 0, . . . , N − 1.
π ≈ 3.141592653589793 . . .
(i) Find the first four elements in the continued fraction expansion of π.
(ii) Determine the first four convergents of π.
then x satisfies
Exercise 1.6.4. Let $(f_n)$ denote the Fibonacci sequence, that is, $f_0 := 1$, $f_1 := 1$ and
$f_{n+2} := f_{n+1} + f_n$. Show that for the generating function we have
$$\sum_{k=0}^{\infty} f_k z^k = \frac{z}{1 - z - z^2}$$
Exercise 1.6.5. Taking inspiration from the proof of Hurwitz’s Theorem II, prove that
for $x := \sqrt{2} - 1 = [2, 2, 2, \ldots]$, we have $\nu(x) = 1/\sqrt{8}$.
Exercise 1.6.6. Prove that if x and y are two equivalent irrational numbers (in the
sense of Definition 1.1.13 (c)), then ν (x) = ν (y).
Exercise 1.6.7. Let x and y be two equivalent irrational numbers. Show that there exist
integers a, b, c and d such that
$$x = \frac{ay + b}{cy + d},$$
with ad − bc = ±1.
Exercise 1.6.8. Show that for every n ∈ N, the continued fraction expansion of
$\sqrt{n^2 + 1} - n$ is given by the periodic expansion
$$\sqrt{n^2 + 1} - n = [\overline{2n}] = \cfrac{1}{2n + \cfrac{1}{2n + \cdots}}.$$
Exercise 1.6.9. Let $e = \sum_{k=0}^{\infty} 1/k! = 2.71828\ldots$ be the base of the natural logarithm,
and note that it is known that the continued fraction expansion of e is given by
$$e = [a_0; a_1, a_2, a_3, \ldots],$$
where $a_0 := 2$, $a_1 := 1$, and for n ≥ 1 we have that $a_{3n-1} := 2n$ and $a_{3n} = a_{3n+1} := 1$. Show
that the following statements are true for every integer n ≥ 1.
(i) For exactly one element i ∈ {n, n + 1, n + 2}, we have that
$$\Bigl|e - \frac{p_i}{q_i}\Bigr| < \frac{3}{2(i + 2)q_i^2}.$$
Exercise 1.6.10. Show that the function d : E∞ × E∞ → [0, 1] defined in Definition 1.2.10
is really a metric.
Exercise 1.6.11. Show that the space NN equipped with the metric d from Defini-
tion 1.2.10 is a complete metric space which is not locally compact.
is a topological conjugacy map between the Gauss system and the full shift map on
the shift space NN .
Exercise 1.6.13. Show that the tent map and the map $L : [0, 1] \to [0, 1]$, $x \mapsto 4x(1 - x)$,
are conjugated via
$$\psi : [0, 1] \to [0, 1], \quad x \mapsto \frac{1}{2} - \frac{1}{2}\cos(\pi x).$$
Exercise 1.6.14. Prove Cantor’s Intersection Theorem: If $(S_n)_{n \in \mathbb{N}}$ is a decreasing
sequence (so $S_{n+1} \subseteq S_n$) of non-empty compact sets in R (more generally, a complete
metric space), with diameters shrinking to zero, then the intersection $S := \bigcap_{n \in \mathbb{N}} S_n$ is
a singleton.
Exercise 1.6.15. Show that Minkowski’s question-mark function Q maps the set of
rational numbers onto the dyadic rationals and maps the quadratic surds onto the
set of non-dyadic rationals and show that their order is preserved.
Exercise 1.6.16. Re-derive the formula given by Denjoy for the function Q from the
definition given in terms of mediants.
Exercise 1.6.18. Provide the missing details in the proof that the map $\theta_\alpha$ is $\kappa_+$-Hölder
continuous.
Exercise 1.6.19. Supply the missing details in the proof that $\theta_\alpha$ is $\kappa_-$-sub-Hölder
continuous.
Remark 2.1.2. For every T-invariant measure μ and any set A ∈ B of finite μ-measure,
we have
In practice, it can be difficult to check that a map preserves a given measure using
only this definition, as it is often the case that no specific information is known about
a general measurable set. However, it is enough to have knowledge of a particular class
of sets that generates the σ-algebra of measurable sets, as we now show. (Recall that
the σ-algebra generated by a collection C of subsets of X is the smallest, in the sense
of inclusion, σ-algebra that contains all the sets in C .)
2.1 Invariant measures | 65
Example 2.1.4. The collection of all subintervals of [0, 1] together with the empty set
generates the Borel σ-algebra on [0, 1] and is closed under taking intersections.
To motivate the definition of T-invariance from the stochastic point of view let us
consider a probability space (X, B , μ) together with a measurable transformation
T : X → X. Then the stochastic process given by $(g \circ T^k)_{k \in \mathbb{N}_0}$ defined on (X, B , μ)
is stationary for every integrable function g : X → R if and only if μ is T-invariant.
Stationarity follows from T-invariance by noting that for every Borel set B and every
k ∈ N we have
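Example 2.1.5. For the tent map T : [0, 1] → [0, 1] it is easily seen that for any interval
(a, b) ⊂ [0, 1] we have that
$$\begin{aligned}
\lambda(T^{-1}(a, b)) &= \lambda\Bigl(\Bigl(\frac{a}{2}, \frac{b}{2}\Bigr) \cup \Bigl(\frac{2 - b}{2}, \frac{2 - a}{2}\Bigr)\Bigr) \\
&= \frac{1}{2}(b - a) + \frac{1}{2}(b - a) = b - a = \lambda((a, b)).
\end{aligned}$$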
Thus, it follows from Lemma 2.1.3 that the tent map preserves the Lebesgue measure.
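The computation in Example 2.1.5 can be mirrored in code in two ways: exactly, via the two preimage intervals, and empirically, by counting how many points of a fine uniform grid are mapped into $(a, b)$. The grid size and the function names below are arbitrary illustrative choices.

```python
from fractions import Fraction


def tent(x):
    """The tent map T on [0, 1]."""
    return 2 * x if x <= 0.5 else 2 - 2 * x


def tent_preimage(a, b):
    """The two intervals making up the preimage of (a, b) under the tent map."""
    return [(a / 2, b / 2), ((2 - b) / 2, (2 - a) / 2)]


def total_length(intervals):
    """Sum of the lengths of a list of disjoint intervals."""
    return sum(right - left for left, right in intervals)


def grid_estimate(a, b, M=100_000):
    """Fraction of grid midpoints (k + 1/2)/M whose image under T lies in (a, b)."""
    hits = sum(1 for k in range(M) if a < tent((k + 0.5) / M) < b)
    return hits / M
```

Both quantities should agree with $b - a$, the first exactly and the second up to an error of order $1/M$.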
Definition 2.1.6. Let (X, B , μ) be a measure space and let $M(\mathcal{B})$ denote the set of
all real-valued B-measurable functions. We now define an equivalence relation by
identifying two elements $f, g \in M(\mathcal{B})$ if they satisfy
$$f \sim g :\iff \mu(\{f \neq g\}) = 0. \tag{2.1}$$
Then we define
$$\mathcal{M}(\mathcal{B}) := M(\mathcal{B})/\sim.$$
66 | 2 Basic ergodic theory
Further, define
$$L^1(\mu) := \Bigl\{ f : X \to \mathbb{R} : f \text{ measurable and } \int |f| \, d\mu < \infty \Bigr\} / \sim$$
and
If it is clear from the context, we will not distinguish between measurable functions
and their equivalence classes given by (2.1). For any subset F of M(B), we let F + denote
the subset of non-negative elements from F .
For the case of a finite measure space, the following proposition gives a useful
characterisation of T-invariant measures in terms of the space of L1 (μ) functions.
Proof. Suppose first that (2.2) holds. In that case, for any measurable set B we can let
$f = 1_B$ to obtain that
$$\mu(B) = \int 1_B \, d\mu = \int 1_B \circ T \, d\mu = \mu(T^{-1}(B)).$$
On the other hand, if T preserves the measure μ, then (2.2) holds for all characteristic
functions and therefore it also holds for all simple functions. If $f \in L^1(\mu)$, we can find a
sequence $(f_n)_{n \ge 1}$ of simple functions increasing to f ; moreover, the sequence $(f_n \circ T)_{n \ge 1}$
is a sequence of simple functions increasing to f ◦ T. We can then apply the Monotone
Convergence Theorem to deduce that
$$\int f \, d\mu = \lim_{n \to \infty} \int f_n \, d\mu = \lim_{n \to \infty} \int f_n \circ T \, d\mu = \int f \circ T \, d\mu.$$
Our aim now is to describe the invariant measures for our main examples. Let us begin
with the α-Lüroth systems.
Proof. Recall from Section 1.4.1 that the inverse branches L α,n : [0, 1) → A n of L α are
given by L α,n (x) := t n − a n x, for all n ∈ N. In order to show that λ is L α -invariant, by
Lemma 2.1.3 it is enough to show that for every subinterval I contained in [0, 1], we
have that
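$$\lambda(I) = \lambda(L_\alpha^{-1}(I)).$$
In fact, since the Lebesgue measure is non-atomic, it suffices to let I := [a, b] for some
0 ≤ a < b ≤ 1. A straightforward calculation shows that
$$\begin{aligned}
\lambda(L_\alpha^{-1}[a, b]) &= \lambda\Bigl(\bigcup_{n=1}^{\infty} L_{\alpha,n}([a, b])\Bigr) = \sum_{n=1}^{\infty} \lambda(L_{\alpha,n}([a, b])) \\
&= \sum_{n=1}^{\infty} |(t_n - a_n a) - (t_n - a_n b)| = \sum_{n=1}^{\infty} a_n (b - a) \\
&= b - a = \lambda([a, b]).
\end{aligned}$$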
As you will show in Exercise 2.6.1, the Lebesgue measure is not preserved by the
Gauss map. However, Gauss observed in 1845 that the map G does preserve a
Lebesgue-absolutely continuous measure, which we now define.
Definition 2.1.9. The Gauss measure $m_G$ is given, for all Borel measurable sets A ⊆
[0, 1], by
$$m_G(A) := \frac{1}{\log 2} \int_A \frac{1}{1 + x} \, d\lambda(x).$$
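Proof. It suffices to show that $m_G([0, b]) = m_G \circ G^{-1}([0, b])$, for any 0 ≤ b ≤ 1. Recalling
that $G_n$ refers to the n-th inverse branch of G, which is given by $G_n(x) := 1/(x + n)$, we
have that
$$G^{-1}([0, b]) = \bigcup_{n=1}^{\infty} G_n([0, b]) = \bigcup_{n=1}^{\infty} \Bigl[\frac{1}{n + b}, \frac{1}{n}\Bigr].$$
Using the identity
$$\frac{1 + \frac{1}{n}}{1 + \frac{1}{n+b}} = \frac{\frac{n+1}{n} \cdot \frac{n+b}{n+1}}{\frac{n+b+1}{n+b} \cdot \frac{n+b}{n+1}} = \frac{1 + \frac{b}{n}}{1 + \frac{b}{n+1}},$$
we obtain
$$\begin{aligned}
m_G(G^{-1}([0, b])) &= \frac{1}{\log 2} \sum_{n=1}^{\infty} \int_{1/(n+b)}^{1/n} \frac{1}{1 + x} \, dx \\
&= \frac{1}{\log 2} \sum_{n=1}^{\infty} \Bigl( \log\Bigl(1 + \frac{1}{n}\Bigr) - \log\Bigl(1 + \frac{1}{n + b}\Bigr) \Bigr) \\
&= \frac{1}{\log 2} \sum_{n=1}^{\infty} \Bigl( \log\Bigl(1 + \frac{b}{n}\Bigr) - \log\Bigl(1 + \frac{b}{n + 1}\Bigr) \Bigr) \\
&= \frac{1}{\log 2} \sum_{n=1}^{\infty} \int_{b/(n+1)}^{b/n} \frac{1}{1 + x} \, dx \\
&= \sum_{n=1}^{\infty} m_G\Bigl(\Bigl[\frac{b}{n + 1}, \frac{b}{n}\Bigr]\Bigr) \\
&= m_G([0, b]).
\end{aligned}$$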
Remark 2.1.11. In order to make a guess at how Gauss arrived at his invariant measure,
we make the simple arithmetic observation that
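$$\sum_{n=1}^{\infty} \frac{1}{(n + x)(n + 1 + x)} = \sum_{n=1}^{\infty} \Bigl( \frac{1}{n + x} - \frac{1}{n + 1 + x} \Bigr) = \frac{1}{1 + x},$$
which is equivalent to
$$\sum_{n=1}^{\infty} \frac{1}{(n + x)^2} \cdot \frac{1}{1 + \frac{1}{n + x}} = \frac{1}{1 + x}.$$
This infinite sum can be expressed in terms of the Gauss map G as follows, with
$h_G(x) := 1/((1 + x) \log 2)$, and, as usual, with $G_n$ referring to the n-th inverse branch
of G,
$$\sum_{n=1}^{\infty} |G_n'(x)| \, h_G(G_n(x)) = h_G(x).$$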
n=1
The significance of this formula will be apparent shortly, when we come to describe
the transfer operator.
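The identity in Remark 2.1.11 can be verified exactly for partial sums, since the series telescopes. The sketch below drops the common factor $1/\log 2$ from both sides and uses exact rational arithmetic; the function name is hypothetical.

```python
from fractions import Fraction


def partial_transfer_sum(x, N):
    """Sum over n = 1..N of |G_n'(x)| / (1 + G_n(x)), where G_n(x) = 1/(x + n)."""
    total = Fraction(0)
    for n in range(1, N + 1):
        g = 1 / (x + n)          # n-th inverse branch of the Gauss map
        dg = g * g               # |G_n'(x)| = 1/(x + n)^2
        total += dg / (1 + g)    # equals 1/((x + n)(x + n + 1))
    return total
```

By telescoping, the partial sum up to $N$ equals $1/(1 + x) - 1/(N + 1 + x)$, which converges to $1/(1 + x)$ as $N$ grows.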
Finally, we finish this section with the observation that the Gauss measure belongs to
the same measure class as the Lebesgue measure.
2.2 Recurrence and conservativity | 69
Proposition 2.1.12. For any Borel set B ⊆ [0, 1], we have that
$$\frac{\lambda(B)}{2 \log 2} \le m_G(B) \le \frac{\lambda(B)}{\log 2}.$$
and
$$m_G(B) = \frac{1}{\log 2} \int_B \frac{1}{1 + x} \, d\lambda(x) \ge \frac{1}{\log 2} \int_B \frac{1}{2} \, d\lambda(x) = \frac{\lambda(B)}{2 \log 2}.$$
Let us now study a general property of invariant measures by taking a detour to present
one of the fundamental results of ergodic theory, namely, Halmos’s Recurrence
Theorem [Hal56]. This theorem states that for a conservative transformation (which
will be defined momentarily) on a σ-finite measure space, almost all points of a given
set return infinitely often to that set under iteration. Although it is relatively easy to
prove, the importance of this theorem should not be underestimated, as it is really
one of the very few completely general theorems in all of ergodic theory. Theorems of
this type were initially established for finite systems, but the proofs are essentially the
same.
Note that if W ⊆ X is a wandering set for a map T : X → X, then this implies that
$$\sum_{n=0}^{\infty} 1_W \circ T^n \le 1.$$
Proposition 2.2.3. Any measure-preserving system (X, B , μ, T) with μ(X) finite is con-
servative.
Proof. Fix W ∈ B and assume that T −k (W) defines a disjoint family of sets for k ∈ N0 .
Then
$$\mu(X) \ge \mu\Bigl(\bigcup_{k \in \mathbb{N}} T^{-k} W\Bigr) = \sum_{k \in \mathbb{N}} \mu(T^{-k} W) = \sum_{k \in \mathbb{N}} \mu(W)$$
Let us recall that the symmetric difference $A \,\triangle\, B$ of two sets A and B is defined by
$A \,\triangle\, B := (A \setminus B) \cup (B \setminus A) = (A \cup B) \setminus (A \cap B)$. Hereafter, we will use the notation
“A = B mod μ”, (respectively, “A ⊂ B mod μ”) to indicate that two sets are equal
(respectively, A is contained in B) up to a set of μ-measure zero, i.e., $\mu(A \,\triangle\, B) = 0$
(respectively, μ(A \ B) = 0).
Our first claim is that μ(N) = 0. To show this, let x ∈ N. Then for every n ≥ 1 we have
that $T^n(x) \notin B$, and therefore $T^n(x) \notin N$. This shows that $N \cap T^{-n}(N) = \emptyset$, for all n ≥ 1.
Hence, it follows that for all i, j ∈ N such that j < i, we have that
$$T^{-j}(N) \cap T^{-i}(N) = T^{-j}\bigl(N \cap T^{-(i-j)}(N)\bigr) = T^{-j}(\emptyset) = \emptyset.$$
So, the preimages $\{T^{-n}(N) : n = 0, 1, 2, \ldots\}$ of N under the iterates of T form a pairwise
disjoint family of sets, that is to say, N is a wandering subset of A. By our assumption
on A and since T is assumed to be non-singular, it follows that $0 = \mu(N) = \mu(T^{-n} N)$, for
all n ∈ N. Since $N = B \setminus \bigcup_{n=1}^{\infty} T^{-n} B$ this shows that for all n ≥ 0 we have
$$T^{-n} B \subset \bigcup_{k > n} T^{-k} B \mod \mu.$$
Consequently,
$$\bigcup_{k > n} T^{-k} B = T^{-n-1} B \cup \bigcup_{k > n+1} T^{-k} B = \bigcup_{k > n+1} T^{-k} B \mod \mu.$$
$$B \subset \bigcup_{k > 0} T^{-k} B = \bigcup_{k > 1} T^{-k} B = \cdots = \bigcap_{n \in \mathbb{N}} \bigcup_{k \ge n} T^{-k} B$$
For the reverse implication we assume that μ(A ∩ W) > 0 for some wandering set W ∈ B.
Then for the set B := A ∩ W of positive measure we have T −n B ∩ B = ∅, for all n ∈ N.
Note that the recurrence property for B in Halmos’s Recurrence Theorem can be stated
equivalently as follows
$$\sum_{n=1}^{\infty} 1_B \circ T^n = \infty \quad \mu\text{-a.e. on } B.$$
,
n n ,
g ◦ T k f dμ = 1 W N · g ◦ T n−k · f dμ
f
k=0 k=0
W fN
n ,
= 1 W N ◦ T k · f ◦ T k · g ◦ T n dμ
f
k=0
,
n
= g ◦ Tn 1 T −k W N · f ◦ T k dμ
f
k=0
,
n ,
≤ g ◦ T n · 1W N · f ◦ T k dμ ≤ N g dμ < ∞.
f
k=0
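Since f > 0 this shows that μ-a.e. on $W_f^N$ the infinite sum $\sum_{k=0}^{\infty} g \circ T^k$ is finite. Hence,
$\bigl\{\sum_{k=0}^{\infty} f \circ T^k < \infty\bigr\} = \bigcup_N W_f^N \subset \bigl\{\sum_{k=0}^{\infty} g \circ T^k < \infty\bigr\}$. Taking complements proves the
inclusion.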
Now Lemma 2.2.7 gives rise to the definition of the Hopf decomposition of X with
respect to a measure-preserving transformation T.
for some g ∈ L1 (μ) such that g > 0 and the dissipative part D T := X \ C T .
Proof. To prove (a) fix f ∈ L1 (μ) with f > 0 and let W ∈ WT with μ(W) > 0. Then, for
every n ∈ N, we have
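$$\begin{aligned}
\int_W \sum_{k=0}^{n} f \circ T^k \, d\mu &= \sum_{k=0}^{n} \int 1_W \cdot f \circ T^{n-k} \, d\mu \\
&= \sum_{k=0}^{n} \int 1_W \circ T^k \cdot f \circ T^n \, d\mu \\
&= \int f \circ T^n \cdot \sum_{k=0}^{n} 1_W \circ T^k \, d\mu \le \int f \, d\mu < \infty.
\end{aligned}$$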
Since we suppose that the measure of W is positive, we have that the infinite sum
$\sum_{k=0}^{\infty} f \circ T^k$ must converge μ almost surely on W. Hence, W is contained in the
dissipative part.
Towards part (b) fix a measurable subset A of the dissipative part with positive
measure and without loss of generality we also assume μ(A) to be finite. If we make the
assumption that μ(A ∩ W) = 0 for all wandering sets W ∈ B, then Halmos’s Recurrence
Theorem (Theorem 2.2.6) would imply that $\sum_{n \in \mathbb{N}} 1_A \circ T^n = \infty$ almost everywhere on A.
But then Lemma 2.2.7 with g := 1 A , would imply that A is a subset of the conservative
part. This contradiction finishes the proof.
Remark 2.2.10. What we have just proved is that the dissipative part D T of a
measure-preserving system is the measurable union of WT . This means by definition
that the collection WT is hereditary (i.e., measurable subsets of wandering sets are
wandering sets) and that the properties of WT described in (a) (that WT is said to
cover D T ) and (b) (that WT is said to saturate D T ) are fulfilled. In fact, any measurable
set with these properties is uniquely determined mod μ. To see why, suppose there
were two measurable unions D and D′ such that μ(D \ D′) > 0. Then by property (b)
there exists a wandering set W ⊂ D \ D′ with positive measure and by property (a)
this set must lie in D′, which gives a contradiction. The same argument with D and D′
interchanged gives that D = D′ mod μ.
The measurable union for a hereditary family of measurable sets (like WT ) always
exists and can also be constructed abstractly.
CT = X mod μ.
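\[
\bigcup_{k \ge n} T^{-k} A = T^{-n} \bigcup_{k \ge 0} T^{-k} A = T^{-n} X = X \quad \text{mod } \mu,
\]
\[
\bigcap_{n \in \mathbb{N}} \bigcup_{k \ge n} T^{-k} A = X \quad \text{mod } \mu.
\]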
Proof. This theorem is a direct consequence of Lemma 2.2.7 and Corollary 2.2.11 since
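for $g := \mathbf{1}_A \in L^1_+(\mu)$ and $f \in L^1(\mu)$ with $f > 0$ arbitrary, we obtain with the help of
Remark 2.2.13
\[
X = \Big\{ x \in X : \sum_{k=0}^{\infty} \mathbf{1}_A(T^k(x)) = \infty \Big\} \subset \Big\{ x \in X : \sum_{k=0}^{\infty} f(T^k(x)) = \infty \Big\} \quad \text{mod } \mu.
\]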
Let us end this section with another recurrence theorem, this one giving information
about distances of orbits mapped by a measurable function to a metric space.
Proof. Fix $n \in \mathbb{N}$ and let $B \subset M$ be a measurable set of diameter less than $1/n$. Then by Halmos’s
Recurrence Theorem 2.2.6 we have $\mu$-a.e. on $f^{-1}B$
\[
\sum_{k=0}^{\infty} \mathbf{1}_{f^{-1}B} \circ T^k = \infty.
\]
Consequently, there exists a $\mu$-null set $N_B \in \mathcal{B}$ such that for all $x \in f^{-1}B \setminus N_B$ we have
\[
\liminf_{k \to \infty} d\big(f(x), f \circ T^k(x)\big) \le \frac{1}{n}.
\]
By the separability of $M$ there exists up to measure zero a countable cover of $X$ by sets
of the form $f^{-1}B \setminus N_B$ with measurable subsets $B \subset M$ of diameter less than $1/n$. The
union of these sets $U_n$ has full measure and for all $x \in U_n$ we have
\[
\liminf_{k \to \infty} d\big(f(x), f \circ T^k(x)\big) \le \frac{1}{n}.
\]
2.3 The transfer operator | 75
The claim of the theorem then holds for all $x$ in $\bigcap_{n \in \mathbb{N}} U_n$, which is a set of full
measure.
Now, we remind the reader of absolutely continuous measures and the Radon–Nikodým
Theorem, which will be utilised heavily for the rest of this section.
Definition 2.3.1. Let $\mu$ and $\nu$ be two measures on a measurable space $(X, \mathcal{B})$. Then $\nu$ is
called absolutely continuous with respect to $\mu$, or sometimes $\mu$-absolutely continuous,
and written $\nu \ll \mu$, if for all $B \in \mathcal{B}$ with $\mu(B) = 0$ we have that $\nu(B) = 0$. Moreover, if for
the two measures $\mu$ and $\nu$ we have that $\nu \ll \mu$ as well as $\mu \ll \nu$, then the two measures
are said to be equivalent and we write $\mu \sim \nu$. We will also refer to equivalent measures
as being in the same measure class.
Note that the definition of a non-singular transformation given in Definition 2.2.4 can
also be phrased in the following way: The transformation T : X → X is non-singular if
the measure μ ◦ T −1 is absolutely continuous with respect to μ.
Remark 2.3.2. Let us point out here that Proposition 2.1.12 shows that the Gauss
measure m G is absolutely continuous with respect to the Lebesgue measure λ and vice
versa. Hence, the two measures are equivalent.
Lemma 2.3.3. Let $(X, \mathcal{B}, \mu)$ be a σ-finite measure space and $g \in M_+(\mathcal{B})$. Then the
integral
\[
\mu_g(A) := \int_A g \, d\mu, \quad A \in \mathcal{B}, \tag{2.3}
\]
defines a σ-finite measure $\mu_g$ on $(X, \mathcal{B})$, which is absolutely continuous with respect to
$\mu$. We have $g \in L^1_+(\mu)$ if and only if $\mu_g$ is finite. We have $\mu_g = \mu_h$ for two measurable
non-negative functions $g, h$ if and only if $g \sim h$, in other words, if $g = h$ as elements of
$M_+(\mathcal{B})$.
The unique function g ∈ M+ (B) with ν = μ g is called the density of ν with respect
to μ.
Proof. The fact that $\mu_g$ is a σ-additive set function (and hence defines a measure)
follows from the Monotone Convergence Theorem. Since $\mu$ is σ-finite there exists a
sequence of measurable sets $B_1 \subset B_2 \subset \cdots$ such that $\bigcup_{k} B_k = X$ and $\mu(B_k) < \infty$. Since
$g < \infty$ on $X$ we have that $A_k := B_k \cap \{g \le k\}$ defines an increasing sequence of sets with
union equal to $X$ for which we have $\mu_g(A_k) \le k\mu(B_k) < \infty$. This shows that also $\mu_g$ is
σ-finite. Clearly, $\mu_g \ll \mu$ and $g \in L^1_+(\mu)$ if and only if $\mu_g$ is finite.
Theorem 2.3.4 (The Radon–Nikodým Theorem). Let $\mu$ and $\nu$ be two σ-finite measures
on a measurable space $(X, \mathcal{B})$ such that $\nu \ll \mu$. Then there exists a unique element
$h \in M_+(\mathcal{B})$ such that $\nu = \mu_h$, that is for every set $B \in \mathcal{B}$, we have
\[
\nu(B) = \int_B h \, d\mu.
\]
Moreover, if the measure $\nu$ is finite, then this almost surely unique function $h$ belongs to
$L^1(\mu)$.
Proof. See, for instance, Theorem 6.10 in Rudin [Rud87], (where the theorem is proved
in slightly greater generality than stated here).
Remark 2.3.5. The unique density h appearing in Theorem 2.3.4 is often referred to as
the Radon–Nikodým derivative of ν with respect to μ and denoted by h = dν /dμ.
Our aim now is to obtain invariant measures which are absolutely continuous with respect to
a given reference measure. Throughout, we consider a non-singular transformation
$T : X \to X$ on a σ-finite measure space $(X, \mathcal{B}, \mu)$. Let $g \in M_+(\mathcal{B})$ and suppose that $\mu_g$ has
density $g$ with respect to $\mu$, that is for all measurable sets $A$ we have $\mu_g(A) := \int_A g \, d\mu$.
Then, since $\mu_g \ll \mu$, we have $\mu_g \circ T^{-1} \ll \mu \circ T^{-1} \ll \mu$. Thus, via the Radon–Nikodým
Theorem, we can define the operator $\widehat{T}_\mu : M_+(\mathcal{B}) \to M_+(\mathcal{B})$ by
\[
\widehat{T}_\mu(g) := \frac{d(\mu_g \circ T^{-1})}{d\mu}.
\]
Definition 2.3.6. The operator $\widehat{T}_\mu : L^1(\mu) \to L^1(\mu)$ defined above is called the transfer
operator of $T$ with respect to the measure $\mu$.
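Then, an approximation argument shows that for all $f \in L^\infty(\mu)$, we equivalently have
that
\[
\int_X f \cdot \widehat{T}(g) \, d\mu = \int_X (f \circ T) \cdot g \, d\mu. \tag{2.4}
\]
Inductively, it follows for all $k \in \mathbb{N}$ that $\int_X f \cdot \widehat{T}^k(g) \, d\mu = \int_X (f \circ T^k) \cdot g \, d\mu$. Furthermore,
the relation in (2.4) characterises $\widehat{T}(g)$. Indeed, suppose there exist $g_1$ and $g_2$ in $L^1(\mu)$
such that for all $f \in L^\infty(\mu)$ we have
\[
\int f g_1 \, d\mu = \int (f \circ T) g \, d\mu = \int f g_2 \, d\mu.
\]
Then $\int f (g_1 - g_2) \, d\mu = 0$ for all $f \in L^\infty(\mu)$, and thus $g_1 \sim g_2$.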
Remark 2.3.7. The relation in (2.4) shows that the transfer operator captures the
evolution of probability densities under the action of T : [0, 1] → [0, 1] in the following
sense. Suppose that $X_0$ denotes a $[0, 1]$-valued random variable with a distribution
absolutely continuous with respect to $\mu$ and density $g$. That is, $P(X_0 \in A) = \int_A g \, d\mu$, for
all $A \in \mathcal{B}$. Then, for all $n \in \mathbb{N}$, the distribution of the random variable $X_n := T^n \circ X_0$ also
has a density with respect to $\mu$, and that density coincides with $\widehat{T}^n g$.
Remark 2.3.8. For a σ-finite measure $\mu$, it is a fact that $(L^1(\mu))^* \cong L^\infty(\mu)$. (Here the star
denotes the dual space of $L^1(\mu)$, where we recall that if $X$ is a normed linear space, then
the dual space $X^*$ is defined to be the set of all continuous linear functionals $f : X \to \mathbb{R}$.)
Hence, the operator
\[
U_T : L^\infty(\mu) \to L^\infty(\mu), \quad f \mapsto f \circ T
\]
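We now formulate a dual version of Proposition 2.2.9. For this fix some $f \in L^1(\mu)$ with
$f > 0$ and define the set $C_T$ and its complement $D_T$ by setting
\[
C_T := \Big\{ \sum_{k \in \mathbb{N}} \widehat{T}^k f = \infty \Big\} \quad \text{and} \quad D_T := \Big\{ \sum_{k \in \mathbb{N}} \widehat{T}^k f < \infty \Big\}.
\]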
Remark 2.3.10. The last proposition shows in particular that for a non-singular
dynamical system, the set D T is the measurable union of the wandering sets (cf.
Remark 2.2.10). Hence, this decomposition is
– independent of the chosen positive integrable function f and
– for a measure-preserving system coincides with the previously-defined Hopf
decomposition.
Corollary 2.3.11. Let $(X, \mathcal{B}, \mu, T)$ be a non-singular dynamical system. Then the system
is conservative if and only if
\[
C_T = X \quad \text{mod } \mu.
\]
Proof. The proof follows exactly along the lines of the proof of Corollary 2.2.11.
The reason for naming $\widehat{T}$ the transfer operator should now be clear. The idea behind
this operator is that first of all it transfers the action of $T$ on $X$ to an action on $L^1(\mu)$
and secondly, it transfers in this way the measure-theoretic problem of finding a
$T$-invariant measure to the functional-analytic problem of finding a fixed point for
the operator $\widehat{T}$. This is all very well, but without an explicit formula for $\widehat{T}$, we are still
not really any closer to actually writing down an invariant measure for our remaining
examples. In the following section, we will address this issue.
Let us now specialize somewhat to the case that T is a continuous map of the circle R/Z
such that T admits a Markov partition {A i : i ≥ 1}, as in Definition 1.2.21. Let us further
assume that the map T consists of full branches, that is, that T |A i (A i ) = (0, 1) for all
i ≥ 1. We are interested in finding invariant measures that are absolutely continuous
with respect to the Lebesgue measure λ, since these retain some physical meaning, so
instead of the transfer operator associated to some arbitrary σ-finite measure μ, we
shall consider the special case of the transfer operator with respect to λ.
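Remark 2.3.14. If $\mu \sim \lambda$ and $h := d\mu/d\lambda$ denotes the a.e. positive density of $\mu$ with
respect to $\lambda$ (see Exercise 2.6.13), then the operators $\widehat{T}_\mu$ and $\widehat{T}_\lambda$ are related as follows:
\[
\widehat{T}_\mu(f) = \frac{1}{h} \widehat{T}_\lambda(h \cdot f).
\]
Indeed, for every $A \in \mathcal{B}$, using the identity in (2.4) twice, first for the measure $\mu$ and
then for $\lambda$, we obtain that
\begin{align*}
\int_A \widehat{T}_\mu(f) \, d\mu
&= \int_{[0,1]} \mathbf{1}_A \cdot \widehat{T}_\mu(f) \, d\mu = \int_{[0,1]} (\mathbf{1}_A \circ T) \cdot f \, d\mu \\
&= \int_{[0,1]} (\mathbf{1}_A \circ T) \cdot f h \, d\lambda = \int_{[0,1]} \mathbf{1}_A \cdot \widehat{T}_\lambda(h \cdot f) \, d\lambda \\
&= \int_A \frac{1}{h} \widehat{T}_\lambda(h \cdot f) \, d\mu.
\end{align*}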
Our aim here, as advertised at the end of the last section, is to define another, related,
operator which has a more concrete form. For this, we will need the following lemma.
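Lemma 2.3.15. Let $\varphi : (a, b) \to (0, 1)$ be continuously differentiable such that either
$\varphi' > 0$ or $\varphi' < 0$. Then we have
\[
\frac{d(\lambda \circ \varphi^{-1})}{d\lambda} = \big| (\varphi^{-1})' \big|.
\]
Proof. We only consider the case $\varphi' > 0$. Let $g := \mathbf{1}_J$ for some subinterval $J$ of $(0, 1)$.
Then by the substitution rule of integration we have
\begin{align*}
\lambda \circ \varphi^{-1}(J) &= \int g \circ \varphi \, d\lambda = \int g \circ \varphi \cdot \varphi' / \varphi' \, d\lambda = \int g \circ \varphi \cdot (\varphi^{-1})' \circ \varphi \cdot \varphi' \, d\lambda \\
&= \int g \cdot (\varphi^{-1})' \, d\lambda = \int_J (\varphi^{-1})' \, d\lambda.
\end{align*}
Since the subintervals of $(0, 1)$ generate $\mathcal{B}$ we have verified the claim of the lemma.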
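As a quick numerical illustration of Lemma 2.3.15, the following sketch compares both sides of the identity for one increasing map. The choice $\varphi(x) = x^2$ on $(0, 1)$, the function names and the midpoint rule are assumptions made for this example only; they are not taken from the text.

```python
import math


def phi_inverse(y):
    """Inverse of the (assumed) example map phi(x) = x**2 on (0, 1)."""
    return math.sqrt(y)


def phi_inverse_derivative(y):
    """Derivative of phi_inverse, i.e. the claimed density of lambda o phi^{-1}."""
    return 0.5 / math.sqrt(y)


def pushforward_measure(a, b):
    """lambda(phi^{-1}((a, b))): the length of the preimage interval."""
    return phi_inverse(b) - phi_inverse(a)


def integrate_density(a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of (phi^{-1})' over (a, b)."""
    width = (b - a) / steps
    return width * sum(
        phi_inverse_derivative(a + (i + 0.5) * width) for i in range(steps)
    )


print(pushforward_measure(0.25, 0.81), integrate_density(0.25, 0.81))
```

Both printed numbers should agree up to the quadrature error, which is what the lemma predicts for $J = (0.25, 0.81)$.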
For an interval map T, as described above, we have that the maps T |A i are invertible
with measurable inverse branches $T_i := (T|_{A_i})^{-1}$. Further, suppose that $\lambda \circ T_i \ll \lambda$.
\[
J_i := \frac{d(\lambda \circ T_i)}{d\lambda}.
\]
Proposition 2.3.16. For each $g \in M_+(\mathcal{B})$ or $g \in L^1(\mu)$, we have
\[
\widehat{T}_\lambda(g) = \sum_{i \ge 1} g \circ T_i \cdot J_i.
\]
Definition 2.3.17. Suppose that the dynamical system $([0, 1], T)$ is such that all the
inverse branches $T_i := (T|_{A_i})^{-1}$ are additionally continuously differentiable. Then the
Ruelle operator $P_T$ is defined for any function $f : [0, 1] \to \mathbb{R}$ such that the right-hand
side makes sense for $x \in (0, 1)$ to be
\[
P_T(f)(x) := \sum_{i \ge 1} (f \circ T_i)(x) \cdot |T_i'(x)|. \tag{2.5}
\]
\[
P_T = \widehat{T}_\lambda.
\]
Proof. We have that if f , g are two integrable functions in the same L1 (λ) equivalence
class, then also
To see this, it suffices to note that under strictly monotone continuously differentiable
functions, Lebesgue null-sets are mapped to null-sets.
Since by Lemma 2.3.15 we have $J_i = |T_i'|$, the second statement follows from
Proposition 2.3.16.
Proposition 2.3.19. For the Farey system ([0, 1], B , F), the λ-absolutely continuous
measure νF defined by the density h F , given by h F (x) := 1/x, is an F-invariant measure.
Moreover, the measure νF is infinite and σ-finite.
Proof. First observe that according to (2.5), the Ruelle operator $P_F$ for the map $F$ acts
in the following way: For $f : [0, 1] \to \mathbb{R}$ measurable,
\[
P_F(f) = |F_0'| \cdot (f \circ F_0) + |F_1'| \cdot (f \circ F_1),
\]
where $F_0(x) := x/(1 + x)$ and $F_1(x) := 1/(1 + x)$ denote the inverse branches of the
Farey map (as defined in Section 1.3.1). In order to show that the density $h_F$ defines
an $F$-invariant measure absolutely continuous with respect to $\lambda$, it suffices to show
that $P_F(h_F) = h_F$. Indeed, we have
\[
F_0'(x) = \frac{1}{(1 + x)^2} \quad \text{and} \quad F_1'(x) = \frac{-1}{(1 + x)^2}.
\]
The following calculation then finishes the proof:
\begin{align*}
P_F(h_F)(x) &= \frac{1}{(1 + x)^2} \left( h_F\!\left( \frac{x}{1 + x} \right) + h_F\!\left( \frac{1}{1 + x} \right) \right) \\
&= \frac{(1 + x) + x(1 + x)}{x(1 + x)^2} = \frac{1}{x} = h_F(x).
\end{align*}
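The fixed point equation $P_F(h_F) = h_F$ can also be checked numerically. The sketch below is only an illustration: it evaluates the Ruelle operator with the two inverse branches and the weight $1/(1+x)^2$ from the proof, and the function names `h_F` and `ruelle_farey` are chosen here for convenience.

```python
def h_F(x):
    """Density 1/x of the Farey-invariant measure nu_F."""
    return 1.0 / x


def ruelle_farey(f, x):
    """Evaluate P_F(f)(x) for x in (0, 1] using the two inverse branches."""
    branch0 = x / (1.0 + x)          # F_0(x)
    branch1 = 1.0 / (1.0 + x)        # F_1(x)
    weight = 1.0 / (1.0 + x) ** 2    # |F_0'(x)| = |F_1'(x)|
    return weight * (f(branch0) + f(branch1))


for x in (0.1, 0.5, 0.9):
    print(x, ruelle_farey(h_F, x), h_F(x))
```

For comparison, the constant function 1 is not fixed: the same routine returns $2/(1+x)^2$, so Lebesgue measure itself is not $F$-invariant.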
Before moving on to consider invariant measures for the α-Farey systems, let us
make an interesting observation about fractions in the Stern–Brocot tree (which was
introduced directly after Definition 1.3.4). For each reduced fraction v/w ∈ (0, 1) and
each $n \in \mathbb{N}_0$, we have that
\[
\sum_{p/q \in F^{-n}(v/w)} \frac{1}{pq} = \frac{1}{vw}. \tag{2.6}
\]
The identity in this special case was first noted by the Canadian music theorist Pierre
Lamothe (see reference in [Gut11]), whereas the general case originates in [KS12a]. To
first prove (2.6) in an elementary way, suppose that the statement is true for all reduced
fractions $v/w \in (0, 1)$ and for some $n \in \mathbb{N}_0$. Then,
\begin{align*}
\sum_{p/q \in F^{-(n+1)}(v/w)} \frac{1}{pq}
&= \sum_{p/q \in F^{-n}(F^{-1}(v/w))} \frac{1}{pq} \\
&= \sum_{p/q \in F^{-n}\left(\frac{v}{v+w}\right)} \frac{1}{pq} + \sum_{p/q \in F^{-n}\left(\frac{w}{v+w}\right)} \frac{1}{pq} \\
&= \frac{1}{v(v + w)} + \frac{1}{w(v + w)} = \frac{1}{vw}.
\end{align*}
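The identity (2.6) lends itself to an exact check with rational arithmetic. The following sketch builds $F^{-n}(v/w)$ by repeatedly applying the inverse branches $F_0(x) = x/(1+x)$ and $F_1(x) = 1/(1+x)$; the helper names are illustrative and not from the text.

```python
from fractions import Fraction


def farey_preimages(x, n):
    """Return the 2**n points of F^{-n}(x) for a rational x in (0, 1)."""
    points = [x]
    for _ in range(n):
        # Each point p has the two preimages F_0(p) and F_1(p).
        points = [y for p in points for y in (p / (1 + p), 1 / (1 + p))]
    return points


def preimage_sum(v, w, n):
    """Sum of 1/(p*q) over all reduced fractions p/q in F^{-n}(v/w)."""
    return sum(
        Fraction(1, y.numerator * y.denominator)
        for y in farey_preimages(Fraction(v, w), n)
    )


print(preimage_sum(2, 5, 4), Fraction(1, 2 * 5))
```

`Fraction` keeps every preimage in lowest terms, which matches the requirement in (2.6) that the fractions $p/q$ be reduced.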
Alternatively, the equality in (2.6) can be deduced as a special case from the fixed point
equation for the Ruelle operator $P_F$ for the map $F$. As in Proposition 2.3.19, let $h_F(x) :=
1/x$ denote the density which satisfies $P_F(h_F) = h_F$. For all $x \in (0, 1)$ and all $n \in \mathbb{N}_0$, we
then have that
\[
\sum_{y \in F^{-n}(x)} \big| (F^n)'(y) \big|^{-1} h_F(y) = h_F(x). \tag{2.7}
\]
(You are asked to prove the above statement in Exercise 2.6.12.) A straightforward
calculation, which we leave to Exercise 2.6.14, shows that
\[
\left| (F^n)'\!\left( \frac{p}{q} \right) \right| = \frac{q^2}{w^2}, \quad \text{for all } \frac{p}{q} \in F^{-n}\!\left( \frac{v}{w} \right).
\]
\[
\sum_{p/q \in F^{-n}(v/w)} \left| (F^n)'\!\left( \frac{p}{q} \right) \right|^{-1} h_F\!\left( \frac{p}{q} \right) = h_F\!\left( \frac{v}{w} \right) = \frac{w}{v},
\]
Proposition 2.3.20. For each α-Farey system $([0, 1], \mathcal{B}, F_\alpha)$, the $\lambda$-absolutely continuous
measure $\nu_\alpha$, given by the density $h_{F_\alpha}$, which is defined, up to multiplication by a
constant, by
\[
h_{F_\alpha} := \frac{d\nu_\alpha}{d\lambda} = \sum_{n=1}^{\infty} \frac{t_n}{a_n} \cdot \mathbf{1}_{A_n},
\]
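Proof. We will prove this as in the case of the Farey map by considering the Ruelle
operator $P_{F_\alpha}$ for the map $F_\alpha$, which acts on measurable functions $f : [0, 1] \to \mathbb{R}$ by
\[
P_{F_\alpha}(f) = |F_{\alpha,0}'| \cdot (f \circ F_{\alpha,0}) + |F_{\alpha,1}'| \cdot (f \circ F_{\alpha,1}).
\]
\[
P_{F_\alpha}(h_{F_\alpha}) = h_{F_\alpha}.
\]
To see this, note that for the inverse branches $F_{\alpha,1}$ and $F_{\alpha,0}$ and the density $h_{F_\alpha}$, an
easy computation shows that we have
\[
h_{F_\alpha} \circ F_{\alpha,1} = t_1/a_1 \cdot \mathbf{1}_{[0,1]} \quad \text{and} \quad h_{F_\alpha} \circ F_{\alpha,0} = \sum_{n=1}^{\infty} t_{n+1}/a_{n+1} \cdot \mathbf{1}_{A_n}.
\]
\[
|F_{\alpha,1}'| = a_1 \cdot \mathbf{1}_{[0,1]} \quad \text{and} \quad |F_{\alpha,0}'| = \sum_{n=1}^{\infty} a_{n+1}/a_n \cdot \mathbf{1}_{A_n}.
\]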
Regarding the second statement of the proposition, the σ-finiteness of the measure $\nu_\alpha$
follows from the fact that $\nu_\alpha(A_n) < \infty$ for all $n \in \mathbb{N}$, and a simple calculation shows that
for each $n \in \mathbb{N}$ we have that
\[
\nu_\alpha\Big( \bigcup_{k=1}^{n} A_k \Big) = \sum_{k=1}^{n} \nu_\alpha(A_k) = \sum_{k=1}^{n} \int_{A_k} h_{F_\alpha} \, d\lambda = \sum_{k=1}^{n} \frac{t_k}{a_k} \cdot a_k = \sum_{k=1}^{n} t_k.
\]
Recalling that $\alpha$ is of infinite type provided that $\sum_{k=1}^{\infty} t_k$ diverges, the proof is finished.
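To make the computation in the proof concrete, the sketch below checks the fixed point equation coefficient by coefficient on each $A_n$ for one particular partition. The harmonic choice $a_n = 1/(n(n+1))$, $t_n = 1/n$ is an assumption made for this example, as are the function names; only the branch formulas displayed in the proof are used.

```python
from fractions import Fraction


def a(n):
    """Length of A_n for the (assumed) harmonic partition."""
    return Fraction(1, n * (n + 1))


def t(n):
    """Tail sum t_n = a_n + a_{n+1} + ..., which equals 1/n here."""
    return Fraction(1, n)


def density_coeff(n):
    """Value t_n / a_n of the density h on A_n."""
    return t(n) / a(n)


def ruelle_coeff(n):
    """Value of P(h) on A_n, assembled from the two branch formulas."""
    branch1 = a(1) * (t(1) / a(1))                         # |F'_{a,1}| * (h o F_{a,1})
    branch0 = (a(n + 1) / a(n)) * (t(n + 1) / a(n + 1))    # |F'_{a,0}| * (h o F_{a,0})
    return branch1 + branch0


def measure_of_union(n):
    """nu_alpha(A_1 u ... u A_n) = sum of (t_k / a_k) * a_k."""
    return sum(density_coeff(k) * a(k) for k in range(1, n + 1))


print([ruelle_coeff(n) == density_coeff(n) for n in range(1, 6)])
print(measure_of_union(4))
```

For this partition `measure_of_union(n)` is the $n$-th harmonic number, so the measure is infinite, in line with the infinite-type criterion recalled above.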
To complete the picture for our main examples, we recall the following facts for the
Gauss and the α-Lüroth system.
Proposition 2.3.21.
(a) For the Gauss system $([0, 1], \mathcal{B}, G)$, the $\lambda$-absolutely continuous measure $m_G$, given
by the density $h_G$, which is defined by
\[
h_G(x) := \frac{dm_G}{d\lambda}(x) = \frac{1}{\log 2} \, \frac{1}{1 + x},
\]
\[
h_{L_\alpha}(x) := \frac{dm_\alpha}{d\lambda}(x) = \mathbf{1}_{[0,1]},
\]
is an $L_\alpha$-invariant probability measure. In other words, $m_\alpha$ coincides with $\lambda$.
Proof. The statements in (a) and (b) have already been obtained in Proposition 2.1.10
and Proposition 2.1.8, respectively. However, for the Gauss map G this statement can
now be derived alternatively by considering the Ruelle operator P G for G, which is
defined, for measurable functions $f : [0, 1] \to \mathbb{R}$ and $x \in [0, 1]$, by
\[
P_G(f)(x) := \sum_{n=1}^{\infty} |G_n'(x)| \, f(G_n(x)),
\]
with G n referring to the n-th inverse branch of G (see also Remark 2.1.11). One then
immediately verifies that P G (h G ) = h G , from which the assertion follows. The proof for
L α , using the Ruelle operator P L α for L α , is left as an exercise (see Exercise 2.6.4).
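The verification of $P_G(h_G) = h_G$ can be sketched numerically as follows. The code assumes the standard form $G_n(x) = 1/(n + x)$ of the $n$-th inverse branch of the Gauss map, so that $|G_n'(x)| = 1/(n + x)^2$, and it truncates the infinite series; the function names and the number of terms are choices made for this example.

```python
import math


def h_G(x):
    """Gauss density (1 / log 2) * 1 / (1 + x)."""
    return 1.0 / (math.log(2) * (1.0 + x))


def ruelle_gauss(f, x, terms=100_000):
    """Truncated series for P_G(f)(x) with inverse branches G_n(x) = 1/(n + x)."""
    total = 0.0
    for n in range(1, terms + 1):
        branch = 1.0 / (n + x)                 # G_n(x)
        total += branch * branch * f(branch)   # |G_n'(x)| = 1/(n + x)^2
    return total


print(ruelle_gauss(h_G, 0.3), h_G(0.3))
```

For $f = h_G$ the summands telescope to $\frac{1}{\log 2}\big(\frac{1}{n+x} - \frac{1}{n+x+1}\big)$, so the truncation error after $N$ terms is $\frac{1}{\log 2} \cdot \frac{1}{N+1+x}$.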
Remark 2.3.22.
1. It is clear that the measures νF , να , m G and m α are all absolutely continuous with
respect to λ; indeed, they are defined that way. The converse is also true, since
their densities are strictly positive. Hence, all these measures are equivalent to λ.
2. The reader may have encountered the Bogolyubov–Krylov Theorem, which states
that for an arbitrary continuous map T : X → X on a compact metric space X there
always exists a T-invariant probability measure. However, this theorem does not
say anything about whether the measure is absolutely continuous with respect to
Lebesgue measure. There may be several invariant measures, but we will shortly
see in Section 2.4.5 that in our leading examples the λ-absolutely continuous ones
given by the densities h F and h F α are unique for their respective systems.
3. We can use Maharam’s Recurrence Theorem (see Theorem 2.2.14) to show that the
Farey system $([0, 1], F, \nu_F)$ and any α-Farey system $([0, 1], F_\alpha, \nu_\alpha)$ is conservative.
Indeed, this can be seen immediately on observing that both $\bigcup_{n=0}^{\infty} F^{-n}((1/2, 1])$
and $\bigcup_{n=0}^{\infty} F_\alpha^{-n}(A_1)$ are equal to $(0, 1]$.
Before moving on from the subject of invariant measures, let us discuss another way of
obtaining an invariant measure for an infinite measure system, this time by way of the
jump transformation. Recall that in Definitions 1.3.2 and 1.4.9, we introduced specific
jump transformations for the maps F and F α , respectively. We also proved that the
Gauss map G is the jump transformation of the Farey map F with respect to the set
[1/2, 1] and the α-Lüroth map L α is the jump transformation of the α-Farey map F α
with respect to the set A1 . Let us now give a more general definition.
To shorten the notation, let us write $\{p = n\}$ for the set $\{x \in X : p(x) = n\}$, so that $T_E^* = T^n$
on $\{p = n\}$. We also write $\{p > n\}$ for the set $\{x \in X : p(x) > n\}$. Observe that $\{p > 0\} = X$
and $\{p > n\} = \bigcup_{k=n+1}^{\infty} \{p = k\}$ for $n \ge 1$.
Example 2.3.24. In the case of the Farey map, setting E := [1/2, 1] (so the jump
transformation F *E coincides with the Gauss map, as was already shown earlier), yields
that the sets {p = n} are given by the first level Gauss cylinder sets, that is, for each
n ∈ N, we have that {p = n} = C(n). Also, in this case we have that {p > 0} = [0, 1] and
{p > n } is equal to the Farey cylinder with code consisting of n zeros, for n ≥ 1.
One of the basic ideas behind the concept of the jump transformation is to find a set
E for a given map T such that the jump transformation T E* is easier to understand
than the original map T. The hope is to find a jump transformation which turns out
to be a map that has already been studied earlier. The following lemma then yields
information about T. (Throughout this section, it should be helpful to keep the Gauss
map and the Farey map in mind, along with their invariant measures.)
Lemma 2.3.25. With the notation above, assume that the map $T_E^* : X \to X$ preserves
a finite measure $\nu$. We then have that the measure $\mu$, defined for any measurable set
$B \in \mathcal{B}$ by
\[
\mu(B) := \sum_{n \ge 0} \nu\big( T^{-n}(B) \cap \{p > n\} \big),
\]
is $T$-invariant.
As an example, observe that if in the above lemma we put $B = E$, then we obtain that
$\mu(E) = \nu(X)$. Also, let us consider the situation for the Farey and Gauss systems. We
already know that the Gauss map $G$ preserves the probability measure $m_G$. So, if we
set $E := [1/2, 1]$, then Lemma 2.3.25 tells us that the Farey map $F$ preserves the measure
$\mu$, where $\mu$ is given for arbitrary measurable $B \in \mathcal{B}$ by
\[
\mu(B) := m_G(B) + \sum_{n \ge 1} m_G\big( F^{-n}(B) \cap \widehat{C}(\underbrace{0, \ldots, 0}_{n}) \big).
\]
Here, $\widehat{C}(\underbrace{0, \ldots, 0}_{n})$ denotes the level $n$ Farey cylinder set with code consisting of $n$ zeros,
for each $n \ge 1$. To get an idea of what this measure actually looks like, let us calculate
$\mu([1/(k + 1), 1/k])$, for some $k \ge 2$ (note that for $k = 1$, this is precisely $\mu(E)$ and we
already know that $\mu(E)$ is equal to $m_G(X)$). Observe first that $F^{-n}(B) \cap \widehat{C}(\underbrace{0, \ldots, 0}_{n})$ is
equal to $F_0^n(B)$, for any measurable set $B$. Therefore, we have that
\[
\mu(B) = \sum_{n \ge 0} m_G(F_0^n(B)).
\]
Recalling that $F_0(x) = x/(1 + x)$, one immediately verifies that $F_0^n(x) = x/(1 + nx)$. Thus,
\begin{align*}
\mu([1/(k + 1), 1/k]) &= \sum_{n \ge 0} m_G\big( F_0^n([1/(k + 1), 1/k]) \big) \\
&= \frac{1}{\log 2} \sum_{n \ge 0} \int_{1/(k+n+1)}^{1/(k+n)} \frac{1}{1 + x} \, d\lambda(x) \\
&= \frac{1}{\log 2} \sum_{n \ge 0} \left( \log\!\left( 1 + \frac{1}{k + n} \right) - \log\!\left( 1 + \frac{1}{k + n + 1} \right) \right) \\
&= \frac{1}{\log 2} \sum_{n \ge 0} \log\!\left( \frac{k + n + 1}{k + n} \cdot \frac{k + n + 1}{k + n + 2} \right) \\
&= \frac{\log\big( (k + 1)/k \big)}{\log 2}.
\end{align*}
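The telescoping series above can be checked numerically. The following sketch sums $m_G(F_0^n([1/(k+1), 1/k]))$ over a finite range of $n$ and compares the result with the closed form $\log((k+1)/k)/\log 2$; the truncation level and the helper names are assumptions of this example.

```python
import math


def m_G(a, b):
    """Gauss measure of the interval [a, b] contained in [0, 1]."""
    return (math.log1p(b) - math.log1p(a)) / math.log(2)


def F0_iter(x, n):
    """Closed form x / (1 + n x) of the n-fold iterate of F_0(x) = x / (1 + x)."""
    return x / (1 + n * x)


def F0_compose(x, n):
    """The same iterate, computed by applying F_0 n times."""
    for _ in range(n):
        x = x / (1 + x)
    return x


def mu_interval(k, terms=100_000):
    """Truncated series sum_{n >= 0} m_G(F_0^n([1/(k+1), 1/k]))."""
    return sum(
        m_G(F0_iter(1 / (k + 1), n), F0_iter(1 / k, n)) for n in range(terms)
    )


def closed_form(k):
    """Limit value log((k + 1) / k) / log 2 from the computation above."""
    return math.log((k + 1) / k) / math.log(2)


print(mu_interval(3), closed_form(3))
```

The neglected tail after $N$ terms equals $\log(1 + 1/(k+N))/\log 2$, so the two printed values differ by roughly $10^{-5}$ at the default truncation.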
Recalling the measure $\nu_F$ obtained via the Ruelle operator in Proposition 2.3.19, we
deduce that for the sets $[1/(k + 1), 1/k]$, for $k \ge 1$,
\[
\frac{1}{\log 2} \, \nu_F([1/(k + 1), 1/k]) = \mu([1/(k + 1), 1/k]).
\]
It will turn out later that these two measures really are equal up to multiplication by a
constant (see Theorem 2.4.35).
Definition 2.4.1. The dynamical system (X, B , μ, T) is said to be ergodic provided that
whenever A ∈ B is such that T −1 (A) = A we have that either μ(A) = 0 or μ(X \ A) = 0.
In other words, an ergodic transformation has only trivial invariant subsets. We will
often simply say that the map T is ergodic with respect to μ or that μ is an ergodic
measure for T. If the system (X, B , μ, T) is measure-preserving and T is ergodic, we
will call the system an ergodic measure-preserving system.
Remark 2.4.2. Note that it is often useful to talk about T-invariant ergodic measures,
and in many books ergodicity is only defined for T-invariant measures. However, this
is not necessary. It can also be of interest to talk about ergodic measures that are not
T-invariant.
Let us begin by considering the existence of sweep-out sets (see Definition 2.2.12) for
conservative, ergodic systems.
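Proof. Suppose that $E \in \mathcal{B}$ is such that $\mu(E) > 0$. Further, let $E' := E \setminus W$, where $W :=
\{x \in E : T^n(x) \notin E \text{ for all } n \ge 1\}$. Then, by conservativity (see Halmos’s Recurrence
Theorem 2.2.6), we have that $\mu(W) = 0$ and thus, $\mu(E') > 0$. Now, noting that $x \in
\bigcup_{n=0}^{\infty} T^{-n}(E')$ if and only if $x \in \bigcup_{n=1}^{\infty} T^{-n}(E')$, we have that
\[
T^{-1}\Big( \bigcup_{n=0}^{\infty} T^{-n}(E') \Big) = \bigcup_{n=0}^{\infty} T^{-n}(E'),
\]
and thus, since $\mu\big( \bigcup_{n=0}^{\infty} T^{-n}(E') \big) > 0$, the ergodicity of $T$ implies that
\[
\bigcup_{n=0}^{\infty} T^{-n}(E) = \bigcup_{n=0}^{\infty} T^{-n}(E') = X \quad \text{mod } \mu.
\]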
2.4 Ergodicity and exactness | 89
(Recall that we use the notation A = B mod μ to mean that the two sets A and B are
equal up to a set of μ-measure zero.)
Proof. Suppose first that the condition stated for measurable functions holds. Then
let $A \in \mathcal{B}$ be either a wandering set or a $T$-invariant proper subset of $X$ and suppose
that $\mu(A) > 0$. It follows that
\[
\Big\{ \sum_{k=0}^{\infty} \mathbf{1}_A \circ T^k = \infty \Big\} \ne X \quad \text{mod } \mu,
\]
contradicting our assumption for $f = \mathbf{1}_A$. Hence, such a set $A$ does not exist and $T$ is
conservative and ergodic.
The reverse implication follows from the fact that for a non-negative measurable
function $f$ with $\int f \, d\mu > 0$ there must exist $\delta > 0$ with $\mu(\{f > \delta\}) > 0$. Hence, $\delta \mathbf{1}_{\{f > \delta\}} \le f$
and by Lemma 2.4.3 and Remark 2.2.13 we find $\mu$-a.e. that
\[
\sum_{k=0}^{\infty} f \circ T^k \ge \delta \sum_{k=0}^{\infty} \mathbf{1}_{\{f > \delta\}} \circ T^k = \infty.
\]
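Proof. Suppose first that $T$ is conservative and ergodic. Since by Lemma 2.4.3 any $A \in \mathcal{B}$
with $\mu(A) > 0$ is a sweep-out set, we have that
\[
\int \sum_{k=0}^{\infty} \widehat{T}^k f \cdot \mathbf{1}_A \, d\mu = \int f \cdot \sum_{k=0}^{\infty} \mathbf{1}_A \circ T^k \, d\mu = \infty.
\]
Therefore, it follows that $\sum_{k=0}^{\infty} \widehat{T}^k f = \infty$ $\mu$-a.e. on $X$.
For the reverse implication fix a wandering set $W \in \mathcal{B}$ with $\mu(W) > 0$ and $f \in L^1_+(\mu)$
with $\int f \, d\mu > 0$. Then we obtain the contradiction
\[
\infty = \int \sum_{k=0}^{\infty} \widehat{T}^k f \cdot \mathbf{1}_W \, d\mu = \int f \cdot \sum_{k=0}^{\infty} \mathbf{1}_W \circ T^k \, d\mu \le \int f \, d\mu < \infty.
\]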
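If $A$ is $T$-invariant with $\mu(A) > 0$ and $B$ a measurable subset of $X \setminus A$ with $0 < \mu(B) < \infty$
then similarly we find the contradiction
\[
\infty = \int \sum_{k=0}^{\infty} \widehat{T}^k \mathbf{1}_B \cdot \mathbf{1}_A \, d\mu = \int \mathbf{1}_B \cdot \sum_{k=0}^{\infty} \mathbf{1}_A \circ T^k \, d\mu = \sum_{k=0}^{\infty} \int \mathbf{1}_B \cdot \mathbf{1}_A \, d\mu = 0.
\]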
Proposition 2.4.6. For a non-singular dynamical system $(X, \mathcal{B}, \mu, T)$, the following are
equivalent:
(a) $T$ is ergodic with respect to $\mu$.
(b) For $B \in \mathcal{B}$, if $B = T^{-1}B$ mod $\mu$ (that is $\mu(B \,\triangle\, T^{-1}(B)) = 0$), then either $\mu(B) = 0$ or
$\mu(X \setminus B) = 0$.
(c) For $f : X \to \mathbb{R}$ measurable, if $f \circ T = f$ $\mu$-a.e., then $f$ is $\mu$-a.e. equal to a constant.
Proof. First we prove that (a) implies (b). Suppose that $T$ is ergodic and let $B$ be a
$\mu$-almost-invariant measurable set, that is, $\mu(B \,\triangle\, T^{-1}(B)) = 0$. We aim to construct a
$T$-invariant set $A$ from $B$ such that $A$ has the same $\mu$-measure as $B$. So, define
\[
A := \bigcap_{n=0}^{\infty} \bigcup_{k=n}^{\infty} T^{-k}(B).
\]
Moreover, since
\[
B \,\triangle\, T^{-k}(B) \subset \bigcup_{i=0}^{k-1} T^{-i}(B) \,\triangle\, T^{-(i+1)}(B) = \bigcup_{i=0}^{k-1} T^{-i}\big( B \,\triangle\, T^{-1}(B) \big)
\]
and since the system is non-singular, we also have that $\mu(B \,\triangle\, T^{-k}(B)) = 0$. Let $B_n :=
\bigcup_{k=n}^{\infty} T^{-k}(B)$ and notice that the sequence $(B_n)_{n \ge 1}$ is a decreasing sequence of sets
with the property that $\mu(B_n \,\triangle\, B) = 0$, for each $n \in \mathbb{N}$, and $\bigcap_{n \in \mathbb{N}} B_n = A$. It follows that
$\mu(A \,\triangle\, B) = 0$ and so,
\[
\mu(A) = \mu(B).
\]
Furthermore, it is clear that the set $A$ is $T$-invariant. Hence, the ergodicity of $T$ implies
that $\mu(A) = 0$ or $\mu(X \setminus A) = 0$. Consequently, either $\mu(B) = 0$ or $\mu(X \setminus B) = 0$, which proves
the first implication.
Now we prove the implication from (b) to (c). Let the system be ergodic and
suppose that for the measurable function f : X → R we have that f ◦ T = f μ-a.e. For
each c ∈ R, we make the observation that the set D c := {f ≤ c} := {x ∈ X : f (x) ≤ c} is
= X mod μ.
Proposition 2.4.7. For a conservative non-singular dynamical system (X, B , μ, T), the
following are equivalent:
(a) T is ergodic with respect to μ.
(b) For $A \in \mathcal{B}$, if $\mu(A) > 0$, then $\bigcup_{n=0}^{\infty} T^{-n}(A) = X$ mod $\mu$.
(c) For A, B ∈ B, if μ(A)μ(B) > 0, then there exists n ∈ N such that
Proof. We will prove the string of implications (a) implies (b) implies (c) implies (a).
The implication from (a) to (b) is a consequence of Lemma 2.4.3 since the system
is supposed to be non-singular, ergodic and conservative. To prove that (b) implies (c),
let A and B be sets of positive measure. Since (b) holds, we have that
\[
\bigcup_{n=1}^{\infty} T^{-n}(A) = X \quad \text{mod } \mu,
\]
It follows that there must be at least one n ≥ 1 such that μ(B ∩ T −n (A)) > 0.
Suppose now that (c) holds and let A be a T-invariant set. Then we have for all
n ∈ N that
So, by (c), either μ(A) = 0 or μ(X \ A) = 0, proving that T is ergodic. This finishes the
string of equivalences.
Next we state the following important uniqueness result for finite invariant ergodic
measures.
Proposition 2.4.8. Let $(X, \mathcal{B}, \mu, T)$ be an ergodic invariant system with $\mu(X) = 1$ and let
$\nu$ be another $T$-invariant probability measure on $(X, \mathcal{B})$ with $\nu \ll \mu$. Then we have that
$\nu = \mu$.
Proof. Since $\nu \ll \mu$, the Radon–Nikodým Theorem implies that there exists a density
$f \in L^1_+(\mu)$ with $d\nu = f \, d\mu$. We are going to prove that $f$ is constant $\mu$-almost everywhere.
Since $\nu$ and $\mu$ are both probability measures, this then guarantees that $f = 1$. In fact,
for $r \in \mathbb{R}$ and all $B \subset \{f > r\}$ with positive $\mu$-measure we have
\[
\nu(B) - r\mu(B) = \int_B (f(x) - r) \, d\mu(x) > 0,
\]
that is,
\[
\nu(B) > r\mu(B).
\]
Similarly, for all $C \subset F_r := \{f \le r\}$ it follows that $\nu(C) \le r\mu(C)$. Making use of Remark 2.1.2
and since $T^{-1}(F_r) \setminus F_r \subset \{f > r\}$, we either have $\mu\big( T^{-1}(F_r) \setminus F_r \big) = 0$ or
Since T is ergodic, it follows that μ (F r ) ∈ {0, 1}. This shows that f has to be μ-a.e.
equal to a constant.
Let us now prove that the Gauss map is ergodic with respect to the Gauss measure.
Before beginning, we introduce the notation “$a \asymp b$”, which means that there exists a
positive constant $C$ such that $C^{-1} a \le b \le C a$.
Lemma 2.4.9. The Gauss map G is ergodic with respect to the Gauss measure m G .
Proof. The first and most important step in the proof is to show that for any given
continued fraction cylinder set C(x1 , . . . , x n ) and for all measurable sets B, we have
that
In fact, it is sufficient to demonstrate (2.8) for all intervals of the form B := [c, d] ⊆ [0, 1],
since the set of all Borel sets satisfying (2.8) for a fixed constant can be shown to be a
monotone class.
Thus, let $B$ be some fixed interval in $[0, 1]$ and let $p_n/q_n := [x_1, \ldots, x_{n-1}, x_n]$ and
$p_{n-1}/q_{n-1} := [x_1, \ldots, x_{n-1}]$ denote the $n$-th and $(n - 1)$-th approximants of the number
$x = [x_1, x_2, \ldots] \in [0, 1]$. Notice that $x \in G^{-n}(B) \cap C(x_1, \ldots, x_n)$ if and only if $G^n(x) =
[x_{n+1}, x_{n+2}, \ldots] \in B$. Since $G^n$ is monotonic on each cylinder set $C(x_1, \ldots, x_n)$, it follows
that $G^{-n}(B) \cap C(x_1, \ldots, x_n)$ is an interval with endpoints given by
\[
\frac{p_n + p_{n-1} c}{q_n + q_{n-1} c} \quad \text{and} \quad \frac{p_n + p_{n-1} d}{q_n + q_{n-1} d},
\]
for some $c, d > 0$. (This follows from Theorem 1.1.5 and the fact that in this case we
have $r_{n+1} = c^{-1}$ or $r_{n+1} = d^{-1}$.) Therefore, the Lebesgue measure of the intersection
$G^{-n}(B) \cap C(x_1, \ldots, x_n)$ is equal to
\begin{align*}
\left| \frac{p_n + p_{n-1} d}{q_n + q_{n-1} d} - \frac{p_n + p_{n-1} c}{q_n + q_{n-1} c} \right|
&= \frac{\left| p_n q_{n-1} c + p_{n-1} q_n d - p_n q_{n-1} d - p_{n-1} q_n c \right|}{(q_n + q_{n-1} c)(q_n + q_{n-1} d)} \\
&= |d - c| \, \frac{1}{(q_n + q_{n-1} c)(q_n + q_{n-1} d)},
\end{align*}
by Theorem 1.1.1 (c). On the other hand, recall that the Lebesgue measure of the
cylinder set $C(x_1, \ldots, x_n)$ is given by
\[
\lambda(C(x_1, \ldots, x_n)) = \frac{1}{q_n (q_n + q_{n-1})},
\]
\begin{align*}
\lambda\big( G^{-n}(B) \cap C(x_1, \ldots, x_n) \big) &= \lambda(B) \lambda(C(x_1, \ldots, x_n)) \, \frac{q_n (q_n + q_{n-1})}{(q_n + q_{n-1} c)(q_n + q_{n-1} d)} \\
&\asymp \lambda(B) \lambda(C(x_1, \ldots, x_n)).
\end{align*}
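The interval computation in this step can be reproduced exactly with rational arithmetic. The sketch below uses the usual recursion $p_n = x_n p_{n-1} + p_{n-2}$, $q_n = x_n q_{n-1} + q_{n-2}$ for the approximants (with $p_{-1} = 1$, $p_0 = 0$, $q_{-1} = 0$, $q_0 = 1$, which is assumed here to match the convention of Chapter 1); the function names and the sample digits are illustrative.

```python
from fractions import Fraction


def convergents(digits):
    """Return (p_n, q_n, p_{n-1}, q_{n-1}) for [x_1, ..., x_n]."""
    p_prev, q_prev = 1, 0   # p_{-1}, q_{-1}
    p, q = 0, 1             # p_0, q_0
    for x in digits:
        p, p_prev = x * p + p_prev, p
        q, q_prev = x * q + q_prev, q
    return p, q, p_prev, q_prev


def preimage_length(digits, c, d):
    """Length of G^{-n}([c, d]) intersected with the cylinder C(x_1, ..., x_n)."""
    p, q, pp, qq = convergents(digits)
    left = (p + pp * c) / (q + qq * c)
    right = (p + pp * d) / (q + qq * d)
    return abs(right - left)


def formula_length(digits, c, d):
    """The closed form |d - c| / ((q_n + q_{n-1} c)(q_n + q_{n-1} d))."""
    _, q, _, qq = convergents(digits)
    return abs(d - c) / ((q + qq * c) * (q + qq * d))


def cylinder_length(digits):
    """Lebesgue measure 1 / (q_n (q_n + q_{n-1})) of the cylinder."""
    _, q, _, qq = convergents(digits)
    return Fraction(1, q * (q + qq))


digits, c, d = [3, 1, 4], Fraction(1, 5), Fraction(3, 4)
ratio = preimage_length(digits, c, d) / ((d - c) * cylinder_length(digits))
print(preimage_length(digits, c, d), formula_length(digits, c, d), ratio)
```

Since $q_{n-1} \le q_n$ and $c, d \in [0, 1]$, the printed ratio always lies between $1/2$ and $2$, which is one admissible choice of the constant hidden in the comparability sign.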
Clearly, this also holds for finite unions of (disjoint) cylinder sets and, since finite
unions of cylinder sets generate the Borel σ-algebra, this implies that
which shows that m G (A) ∈ {0, 1}. This finishes the proof.
We can almost immediately obtain a stronger result about the Gauss map by only
slightly altering the above proof. We aim to show that the Gauss map is an exact
transformation. We first give the definition of exactness, for which we recall from
Definition 2.2.4 that a transformation is said to be non-singular if it preserves sets of
measure zero.
Remark 2.4.11.
1. This definition of exactness only makes sense for non-invertible transformations.
Indeed, if T : X → X is invertible, then it follows immediately that T −n (B) = B
for every n ∈ N. The correct corresponding property for invertible systems is
the K-property, named for Kolmogorov, who introduced it. For more details, see
[Par81] and references therein.
2. The tail σ-algebra is not an immediately transparent object. It helps to remember
that it is an intersection of sets of sets. In particular, this means that if $B \in
\bigcap_{n \in \mathbb{N}} T^{-n}(\mathcal{B})$, then $B \in T^{-n}(\mathcal{B})$ for all $n \in \mathbb{N}$. Thus, there exists a sequence of sets
$(B_1, B_2, B_3, \ldots)$ such that $B = T^{-n}(B_n)$ for every $n \in \mathbb{N}$. Another way of thinking of
this is to note that $T^{-n}(T^n(B)) = T^{-n}(T^n(T^{-n}(B_n))) = T^{-n}(B_n) = B$, for all $n \in \mathbb{N}$.
3. It is easy to see that an exact transformation must be ergodic, for if the map T : X →
X is exact and B is a measurable subset of X such that T −1 (B) = B, then T −n (B) = B
for all n ∈ N and so, the set B belongs to the tail σ-algebra and hence, either μ(B)
or μ(X \ B) is equal to zero.
Theorem 2.4.12. The Gauss map G is exact with respect to the Gauss measure m G .
Proof. Let $B \in \mathcal{B}$ be such that $B$ lies in the tail σ-algebra $\bigcap_{n \in \mathbb{N}} G^{-n}(\mathcal{B})$. As noted in
Remark 2.4.11, this implies that there exists a sequence of sets $(B_n)_{n \ge 1}$ such that $B =
G^{-n}(B_n)$ for every $n \in \mathbb{N}$. We have shown in the proof of Lemma 2.4.9, in (2.8), that for
cylinder sets C(x1 , . . . , x n ) we have
But this implies, since m G (B n ) = m G (G−n (B n )) and the measure m G is G-invariant, that
for every cylinder set C(x1 , . . . , x n ), we have that
This also holds for finite unions of cylinder sets and thus, since the cylinder sets
generate the Borel σ-algebra, we deduce that
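Lemma 2.4.13. Let $\widehat{C}$ denote an arbitrary level $n + 1$ Farey cylinder set with code ending
in the symbol 1. For each $B \in \mathcal{B}$ such that $B \subseteq \widehat{C}$, we then have that
\[
\lambda(B) \asymp \lambda(\widehat{C}) \, \lambda(F^n(B)).
\]
Proof. Say that for some $x_1, \ldots, x_k \in \mathbb{N}$ with $\sum_{i=1}^{k} x_i = n + 1$ we have
\[
\widehat{C} := \widehat{C}\big( 0^{x_1 - 1}, 1, 0^{x_2 - 1}, \ldots, 1, 0^{x_k - 1}, 1 \big).
\]
\[
B = G^{-k}(E) \cap C(x_1, \ldots, x_k) = F^{-(n+1)}(E) \cap \widehat{C}\big( 0^{x_1 - 1}, 1, 0^{x_2 - 1}, \ldots, 1, 0^{x_k - 1}, 1 \big).
\]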
Since E = G k (B) = F n+1 (B) and since λ m G , by Proposition 2.1.12, it follows from (2.8)
that
Here the final inequality follows by using the change of variables formula.
Let us now prove directly that an α-Lüroth system ([0, 1], L α ) is exact with respect to
the Lebesgue measure. The proof follows along the same lines as that for the Gauss
map, so we give only a sketch here and leave the details as an exercise for the reader.
we have that
One immediately verifies that this also holds for a finite union of L α -cylinder sets. From
this, we deduce that
This shows that λ(B) = 0 or λ(B) = 1, and hence finishes the proof.
2.4.2 Ergodic theorems for probability spaces and consequences for the Gauss and
α-Lüroth systems
The first major result in ergodic theory was published in 1931 by G.D. Birkhoff
[Bir31]¹. This result is known as the pointwise ergodic theorem and it gives a precise
relationship between the average of an integrable function evaluated along the orbit of
a typical point (the time average) and the integral of the function (the space average).
There are now a great variety of proofs available; the interested reader is referred to
either [Wal82] or [EW11] (and references therein). In Chapter 4 we will prove the more
general Chacon–Ornstein Ergodic Theorem and then show how to derive Birkhoff’s
Pointwise Ergodic Theorem from this more general theorem. Let us here simply state
the result for the case of an ergodic probability-measure-preserving system.
1 Birkhoff’s Pointwise Ergodic Theorem, although published first, was not the first ergodic theorem
to be proved. The work of von Neumann [Neu32] predates that of Birkhoff. In [Neu32] von Neumann
proves what is now called the Mean Ergodic Theorem. See the book [EW11] for an exposition of this
result and further references.
2.4 Ergodicity and exactness | 97
Since we have already proved that the Gauss and α-Lüroth systems are ergodic and
probability-measure-preserving, we may apply Birkhoff’s Pointwise Ergodic Theorem
to easily obtain the following interesting number-theoretic results. The original proofs
(in the continued fraction case, that is) of most of these statements were decidedly
more complicated.
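Proposition 2.4.17. For λ-almost every real number $x = [x_1, x_2, x_3, \dots] \in [0, 1]$, the
following statements hold:
(a) The element j appears in the continued fraction expansion of x with frequency
\[ \frac{2\log(1+j) - \log(j) - \log(2+j)}{\log 2}. \]
More generally, each finite block $y_1, \dots, y_n$ of elements appears with frequency
$m_G(C(y_1, \dots, y_n))$.
(d) For the growth rate of the denominators $q_n$ of the approximants to x, we have
\[ \lim_{n\to\infty} \frac{1}{n}\log(q_n) = \frac{\pi^2}{12\log 2}. \]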
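Proof. For the first statement, first notice that the element j appears in the first n
elements of the continued fraction expansion of an irrational number x with frequency
\[ \frac{1}{n}\,\#\{i : i \le n,\ x_i = j\} = \frac{1}{n}\,\#\Bigl\{ i : i \le n,\ G^{i-1}(x) \in \Bigl(\frac{1}{j+1}, \frac{1}{j}\Bigr] \Bigr\}. \]
Applying Birkhoff’s Pointwise Ergodic Theorem to the indicator function
$f := 1_{(1/(j+1),\,1/j]}$ then gives, for λ-a.e. x, that
\[
\begin{aligned}
\lim_{n\to\infty} \frac{1}{n}\,\#\{i : i \le n,\ x_i = j\}
 &= \int f \, dm_G = \frac{1}{\log 2}\int_{1/(j+1)}^{1/j} \frac{1}{1+y}\, d\lambda(y) \\
 &= \frac{1}{\log 2}\Bigl( \log\Bigl(1 + \frac{1}{j}\Bigr) - \log\Bigl(1 + \frac{1}{j+1}\Bigr) \Bigr) \\
 &= \frac{2\log(1+j) - \log(j) - \log(2+j)}{\log 2}.
\end{aligned}
\]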
The proof of the remaining part of the first statement follows similarly, on choosing
f := 1 C(y1 ,...,y n ) .
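As a rough numerical illustration of the frequency formula in the first statement, the following sketch compares empirical digit frequencies with the limiting values. It is only a sanity check under stated assumptions: the "typical" point is replaced by a random rational with an 8000-bit denominator (its first couple of thousand digits are expected to behave like those of a random real), and the helper names `gauss_kuzmin` and `cf_digits` are ours, not the book's.

```python
import math
import random

def gauss_kuzmin(j):
    """Limiting frequency of the digit j, as computed in the proof above."""
    return (2 * math.log(1 + j) - math.log(j) - math.log(2 + j)) / math.log(2)

def cf_digits(p, q, n):
    """First n continued fraction digits of p/q in (0, 1), via Euclid's algorithm."""
    digits = []
    while p != 0 and len(digits) < n:
        a, r = divmod(q, p)      # 1/x = q/p = a + r/p, so the next digit is a
        digits.append(a)
        p, q = r, p              # Gauss map: x -> 1/x - a = r/p
    return digits

# Assumption: a random 8000-bit rational stands in for a Lebesgue-typical point.
random.seed(0)
BITS = 8000
q0 = 1 << BITS
p0 = random.randrange(1, q0)
digits = cf_digits(p0, q0, 2000)

for j in (1, 2, 3):
    empirical = digits.count(j) / len(digits)
    print(j, round(empirical, 3), round(gauss_kuzmin(j), 3))
```

The exact integer arithmetic avoids the loss of precision that a floating-point iteration of the Gauss map would suffer after a few dozen steps.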
For part (b), define the function $f : (0, 1) \to \mathbb{R}$ by setting $f(x) := \log n$, for
$x \in (1/(n + 1), 1/n)$. It is easy to check that the function f is in $L^1(\lambda)$ (and hence in
$L^1(m_G)$, since λ and $m_G$ are comparable). By Birkhoff’s Pointwise Ergodic Theorem,
we therefore have for λ-a.e. x that
\[ \lim_{n\to\infty} \frac{1}{n}\sum_{j=1}^{n} \log x_j = \lim_{n\to\infty} \frac{1}{n}\sum_{j=0}^{n-1} f(G^j(x)) = \int f \, dm_G. \]
A simple calculation shows that this yields the identity in part (b).
Proving part (c) requires a little more effort. Let now the function f be defined by
$f(x) := \lfloor 1/x \rfloor = x_1$, that is, f(x) is defined to be equal to the first element in the continued
fraction expansion of x. We then have that
\[ \frac{1}{n}(x_1 + x_2 + \dots + x_n) = \frac{1}{n}\sum_{j=0}^{n-1} f(G^j(x)). \]
However, we cannot directly apply Birkhoff’s Pointwise Ergodic Theorem because the
function f is not integrable in this instance. To overcome this, define for each N ∈ N,
\[ f_N(x) := \begin{cases} f(x) & \text{if } f(x) \le N; \\ 0 & \text{otherwise.} \end{cases} \]
The function $f_N$ is in $L^1(\lambda)$ and so, by Birkhoff’s Pointwise Ergodic Theorem, we have
that
\[ \liminf_{n\to\infty} \frac{1}{n}\sum_{j=0}^{n-1} f(G^j(x)) \ge \lim_{n\to\infty} \frac{1}{n}\sum_{j=0}^{n-1} f_N(G^j(x)) = \int_0^1 f_N \, dm_G. \]
The fact that the above integral tends to infinity as N tends to infinity finishes the proof
of part (c).
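The divergence of the truncated integrals used for part (c) can be made concrete. The sketch below evaluates $\int f_N \, dm_G$ as a finite sum, using that $f_N = j$ on $(1/(j+1), 1/j]$ for $j \le N$ and that this interval has Gauss measure $\log_2(1 + 1/(j(j+2)))$; the function name `truncated_mean` is our own, and the comparison with $\log N / \log 2$ is only meant to indicate the logarithmic growth.

```python
import math

def truncated_mean(N):
    """Integral of f_N with respect to the Gauss measure m_G, as a finite sum."""
    # m_G((1/(j+1), 1/j]) = log(1 + 1/(j(j+2))) / log 2, and f_N = j there for j <= N.
    total = sum(j * math.log1p(1.0 / (j * (j + 2))) for j in range(1, N + 1))
    return total / math.log(2)

for N in (10, 1000, 100000):
    # Second column: the truncated integral; third column: log N / log 2 for comparison.
    print(N, round(truncated_mean(N), 3), round(math.log(N) / math.log(2), 3))
```

The printed values keep growing with N, which is the numerical counterpart of the statement that f itself is not integrable.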
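In order to prove part (d), first observe that if $x = [x_1, x_2, x_3, \dots]$, then
\[ \frac{p_n(x)}{q_n(x)} = \frac{1}{x_1 + [x_2, \dots, x_n]} = \frac{1}{x_1 + \dfrac{p_{n-1}(G(x))}{q_{n-1}(G(x))}} = \frac{q_{n-1}(G(x))}{x_1\, q_{n-1}(G(x)) + p_{n-1}(G(x))}. \]
This shows that $p_n(x) = q_{n-1}(G(x))$ for every n ∈ N, since the approximants are in
reduced form. It follows that
\[ \frac{1}{q_n(x)} = \prod_{j=0}^{n-1} \frac{p_{n-j}(G^j(x))}{q_{n-j}(G^j(x))}, \]
so that
\[ -\frac{1}{n}\log(q_n(x)) = \frac{1}{n}\sum_{j=0}^{n-1} \log\frac{p_{n-j}(G^j(x))}{q_{n-j}(G^j(x))}
 = \frac{1}{n}\sum_{j=0}^{n-1} \log G^j(x) + \frac{1}{n}\sum_{j=0}^{n-1}\Bigl( \log\frac{p_{n-j}(G^j(x))}{q_{n-j}(G^j(x))} - \log G^j(x) \Bigr). \]
First noticing that the second term on the right-hand side tends to zero as n tends
to infinity, since $p_{n-j}(G^j(x))/q_{n-j}(G^j(x))$ is a good approximation to $G^j(x)$ for large n, we have by
Birkhoff’s Pointwise Ergodic Theorem, applied to $f(x) := \log x$, that
\[ \lim_{n\to\infty} -\frac{1}{n}\log q_n = \lim_{n\to\infty} \frac{1}{n}\sum_{j=0}^{n-1} f(G^j(x)) = \frac{1}{\log 2}\int_0^1 \frac{\log x}{1+x}\, d\lambda(x) = -\frac{\pi^2}{12\log 2}. \]
Finally, part (e) follows from part (d). This finishes the proof of the proposition.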
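The growth rate in part (d) can be checked numerically along the same lines. The following is a minimal sketch, assuming again that a random rational with a very large denominator is an acceptable stand-in for a typical point; the denominators are computed from the standard recurrence $q_k = x_k q_{k-1} + q_{k-2}$ with $q_{-1} = 0$, $q_0 = 1$, and the function name `log_denominator` is hypothetical.

```python
import math
import random

def log_denominator(p, q, n):
    """log(q_n) for the n-th approximant of p/q in (0, 1)."""
    q_prev, q_cur = 0, 1                     # q_{-1} = 0, q_0 = 1
    for _ in range(n):
        if p == 0:
            raise ValueError("expansion terminated before n digits")
        a, r = divmod(q, p)                  # next digit a, remainder r
        q_prev, q_cur = q_cur, a * q_cur + q_prev
        p, q = r, p                          # apply the Gauss map
    return math.log(q_cur)

# Assumption: a random 16000-bit rational replaces a Lebesgue-typical point.
random.seed(1)
BITS, n = 16000, 4000
q0 = 1 << BITS
p0 = random.randrange(1, q0)

rate = log_denominator(p0, q0, n) / n
levy = math.pi ** 2 / (12 * math.log(2))
print(round(rate, 4), round(levy, 4))
```

For a single sample the agreement is only approximate, since the theorem is an almost-everywhere limit statement and n is finite.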
Remark 2.4.18. Part (a) of the above proposition implies that for λ-a.e. x ∈ [0, 1], the
continued fraction expansion of x contains two 1s in a row infinitely often. Notice that
this implies that the same property holds for the Farey coding of almost every point.
This will turn out to be useful later on, when we prove that the Farey map is exact.
Part (c) says that the arithmetic mean of the first n continued fraction digits
diverges a.e. as n tends to infinity. Nevertheless, there exist meaningful stochastic laws
describing the continued fraction digits in greater depth. Lévy [Lév52] was the first
to derive non-degenerate limit laws in the context of continued fractions; namely, we
100 | 2 Basic ergodic theory
have that the continued fraction digits belong to the domain of attraction of a stable
law with characteristic exponent 1. More precisely, we have the following convergence
in distribution with respect to any absolutely continuous probability measure $\mu \ll \lambda$:
\[ \frac{S_k}{k/\log 2} - \log k \xrightarrow{\ \mu\ } F, \]
where F has a stable distribution (cf. [Hei87] and [Phi88], and for related results see
also [Hen00]).
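Khinchin showed that for a suitable normalising sequence a weak law of large
numbers holds (cf. [Khi35]). That is,
\[ \frac{S_n}{n\log n} \to \frac{1}{\log 2} \]
\[ \sum_{k=1}^{\infty} \frac{1}{n_k} < \infty \quad\text{and}\quad \lim_{k\to\infty} \frac{S_k}{n_k} = 0 \quad \lambda\text{-a.e.} \]
or
\[ \sum_{k=1}^{\infty} \frac{1}{n_k} = \infty \quad\text{and}\quad \lim_{k\to\infty} \frac{S_k}{n_k} = \infty \quad \lambda\text{-a.e.} \]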
On the other hand, Diamond and Vaaler have shown in [DV86] that for the trimmed
sum
\[ S_n := \sum_{i=1}^{n} x_i - \max_{1\le \ell\le n} x_\ell \]
we have
\[ \lim_{n\to\infty} \frac{S_n}{n\log n} = \frac{1}{\log 2} \quad \lambda\text{-a.e.} \]
Finally, let us mention two further related results, namely, the extreme value law for
continued fractions by Galambos [Gal73, JKS13] and Philipp’s law [Phi76]. The extreme
value law states
\[ \lim_{n\to\infty} \lambda\Bigl(\Bigl\{ \max_{1\le k\le n} x_k < \frac{ns}{\log(2)} \Bigr\}\Bigr) = \exp(-1/s), \]
and Philipp’s law states
\[ \liminf_{n\to\infty}\ \max_{1\le k\le n} x_k \cdot \frac{\log\log n}{n} = \frac{1}{\log 2}. \]
Let us now turn our attention to the α-Lüroth systems. In light of the fact that we proved
in Section 2.4.1 that each map L α is ergodic with respect to the Lebesgue measure, we
can use Birkhoff’s Pointwise Ergodic Theorem to obtain various statements about the
α-Lüroth elements of λ-a.e. real number x ∈ [0, 1].
Proposition 2.4.19. Let $L_\alpha$ denote the α-Lüroth map for the partition $\alpha = \{A_n : n \in \mathbb{N}\}$
with $\lambda(A_n) = a_n$ and tails $t_n = \sum_{k=n}^{\infty} a_k$, as before. Then, for λ-a.e. $x = [\ell_1, \ell_2, \ell_3, \dots]_\alpha \in [0, 1]$,
the following statements hold:
(a) $\displaystyle \lim_{n\to\infty} \frac{1}{n}\,\#\{j \le n : \ell_j = k\} = a_k$, for each k ∈ N.
(b) $\displaystyle \lim_{n\to\infty} \frac{1}{n}\log\Bigl( \prod_{j=1}^{n} \ell_j \Bigr) = \sum_{k=1}^{\infty} a_k \log k$.
(c) $\displaystyle \lim_{n\to\infty} \frac{1}{n}\sum_{j=1}^{n} \ell_j = \sum_{k=1}^{\infty} t_k$.
(d) For each k ∈ N, every finite sequence $y_1, \dots, y_k$ of positive integers appears infinitely
often in the α-Lüroth expansion of x.
(e) With the additional assumption on the partition α that $a_n \le t_{n+1}$ for sufficiently large
n ∈ N, we have that
\[ \lim_{n\to\infty} \frac{1}{n}\log\bigl| x - r_n^{(\alpha)} \bigr| = \sum_{k=1}^{\infty} a_k \log a_k. \]
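Then, let f be given by $f(x) := \log(a_{\ell_1(x)})$. Using (2.9), we have that
\[
\begin{aligned}
\lim_{n\to\infty} \frac{1}{n}\sum_{j=0}^{n-1} f \circ L_\alpha^{j}(x)
 &= \lim_{n\to\infty} \frac{1}{n}\sum_{j=1}^{n} \log a_{\ell_j(x)}
  = \lim_{n\to\infty} \frac{1}{n}\log\bigl| x - r_n^{(\alpha)} \bigr| \\
 &= \int_{[0,1]} \log a_{\ell_1(x)} \, d\lambda(x)
  = \sum_{k=1}^{\infty} \int_{A_k} \log a_{\ell_1(x)} \, d\lambda(x) \\
 &= \sum_{k=1}^{\infty} a_k \log a_k.
\end{aligned}
\]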
Remark 2.4.20.
1. The lists given in Propositions 2.4.17 and 2.4.19 give only a small sample of the
possible results obtainable using Birkhoff’s Pointwise Ergodic Theorem in this
manner. The reader is invited to think of others.
2. It is immediately clear that the densities of the appearances of the digits in the
α-Lüroth expansion constitute a probability vector, as they are just given by the
associated a k s. Calculating the sum of the frequencies appearing in part (a) of
Proposition 2.4.17 shows that the same is true for the Gauss map.
3. The extra condition on α given in part (e) of the previous proposition is equivalent
to the requirement that a n /t n ≤ 1/2, for all n sufficiently large. For the example
of the alternating Lüroth map, this condition is met. It is also satisfied for any
expansive partition of exponent θ > 0 and for expanding partitions with ρ < 2.
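The equivalence stated in item 3 and the right-hand side of part (c) of Proposition 2.4.19 can be checked with exact rational arithmetic. The sketch below is illustrative only: the two partitions are assumed examples given by closed forms (a harmonic one with $a_n = 1/(n(n+1))$, $t_n = 1/n$, and a dyadic one with $a_n = 2^{-n}$, $t_n = 2^{-(n-1)}$), and the function names are hypothetical. It uses the elementary identity $\sum_{k=1}^{K} k\,a_k = \sum_{k=1}^{K} t_k - K\,t_{K+1}$, which follows from $t_k = a_k + t_{k+1}$.

```python
from fractions import Fraction

def condition_pairs(a, t, n_max):
    """For n = 1..n_max, evaluate both 'a_n <= t_{n+1}' and 'a_n / t_n <= 1/2'."""
    return [(a(n) <= t(n + 1), a(n) / t(n) <= Fraction(1, 2))
            for n in range(1, n_max + 1)]

def mean_digit_identity(a, t, K):
    """Return (sum_{k<=K} k a_k, sum_{k<=K} t_k - K t_{K+1}); the two agree exactly."""
    lhs = sum(k * a(k) for k in range(1, K + 1))
    rhs = sum(t(k) for k in range(1, K + 1)) - K * t(K + 1)
    return lhs, rhs

# Assumed example partitions (closed forms chosen for illustration).
harmonic = (lambda n: Fraction(1, n * (n + 1)), lambda n: Fraction(1, n))
dyadic = (lambda n: Fraction(1, 2 ** n), lambda n: Fraction(1, 2 ** (n - 1)))

for name, (a, t) in (("harmonic", harmonic), ("dyadic", dyadic)):
    pairs = condition_pairs(a, t, 50)
    lhs, rhs = mean_digit_identity(a, t, 50)
    print(name, all(x == y for x, y in pairs), all(x for x, _ in pairs), float(lhs))
```

For the dyadic example the partial sums of $k\,a_k$ approach 2, whereas for the harmonic example they grow like the harmonic series, so the limit in part (c) is infinite there.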
Proof. To prove convergence in $L^1(\mu)$ we first verify the claim for bounded functions
and then use the fact that $L^\infty(\mu)$ is dense in $L^1(\mu)$. Let $h \in L^\infty(\mu) \subset L^1(\mu)$. For
notational convenience we write $S_n h := \sum_{j=0}^{n-1} h \circ T^j$. Since $\|h \circ T\|_\infty = \|h\|_\infty$ we also
have that the a.e. defined limit $h^* = \lim n^{-1} S_n h$ is in $L^\infty(\mu)$. Hence, a.e. we have
$|n^{-1} S_n h - h^*| \to 0$ and by Lebesgue’s Dominated Convergence Theorem, it follows that
$\|n^{-1} S_n h - h^*\|_1 \to 0$. Since $(n^{-1} S_n h)_{n\ge 1}$ therefore is a Cauchy sequence in the Banach space $L^1(\mu)$,
for every ε > 0 there exists N(ε, h) ∈ N, such that for all k > 0 and n > N(ε, h) we have
\[ \Bigl\| \frac{1}{n} S_n h - \frac{1}{n+k} S_{n+k} h \Bigr\|_1 < \varepsilon. \]
For ε > 0 and each $f \in L^1(\mu)$ we can find $h \in L^\infty(\mu)$ with $\|f - h\|_1 < \varepsilon/4$. Then for
n > N(ε/2, h) and k > 0
\[
\begin{aligned}
\Bigl\| \frac{1}{n} S_n f - \frac{1}{n+k} S_{n+k} f \Bigr\|_1
 &\le \Bigl\| \frac{1}{n} S_n f - \frac{1}{n} S_n h \Bigr\|_1
  + \Bigl\| \frac{1}{n} S_n h - \frac{1}{n+k} S_{n+k} h \Bigr\|_1 \\
 &\quad + \Bigl\| \frac{1}{n+k} S_{n+k} h - \frac{1}{n+k} S_{n+k} f \Bigr\|_1 \\
 &\le 2\|f - h\|_1 + \varepsilon/2 < \varepsilon.
\end{aligned}
\]
Therefore $\bigl(\frac{1}{n} S_n f\bigr)_{n\ge 1}$ is a Cauchy sequence in $L^1(\mu)$ and consequently must have a
limit. This finishes the proof.
Let us now turn our attention back to infinite measure-preserving systems. It turns
out, and we give a straightforward proof of this at the end of the section, that in the
case of a dynamical system that preserves an infinite measure, Birkhoff’s Pointwise
Ergodic Theorem is replaced by the following statement.
\[ \lim_{n\to\infty} \frac{1}{n}\sum_{j=0}^{n-1} f \circ T^j(x) = 0. \]
We delay the proof of the above theorem until after the statement of a stronger ergodic
theorem (Theorem 2.4.24); for the moment, let us consider again the original statement
of Birkhoff’s Pointwise Ergodic Theorem given in Theorem 2.4.16. So, let T : (X, B , μ) →
(X, B , μ) be a probability-measure-preserving system and let A ∈ B be a measurable
set. Define $S_n(A) := \sum_{j=0}^{n-1} 1_A \circ T^j$, that is, the function $S_n(A)$ evaluated at a point x
simply counts the number of visits the orbit of x makes to the set A before time n. We
shall, following Zweimüller [Zwe04], call $S_n(A)$ the occupation time of A. Birkhoff’s
Pointwise Ergodic Theorem then implies that
\[ \lim_{n\to\infty} \frac{1}{n} S_n(A)(x) = \mu(A), \quad \text{for } \mu\text{-a.e. } x \in X. \]
This tells us three things. Firstly, it shows that the rate at which the occupation time
of A diverges is asymptotically the same for μ-a.e. x ∈ X. Secondly, it proves that this
rate depends on A only through the measure of the set A and, thirdly, it identifies the
occupation time as being proportional to n.
For infinite systems, however, the infinite-measure version of Birkhoff’s Pointwise
Ergodic Theorem, given above in Theorem 2.4.22, only provides an upper bound for
S n (A), but it gives no information on how the asymptotic behaviour of S n (A)(x) is
related to A and to x. It is natural to ask whether a sequence (c n )n∈N of normalising
constants can be found such that for all A ∈ B, we have that
\[ \lim_{n\to\infty} \frac{1}{c_n} S_n(A)(x) = \mu(A), \quad \text{for } \mu\text{-a.e. } x \in X. \]
Unfortunately, this is simply not possible, as the next theorem shows.
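\[ \liminf_{n\to\infty} \frac{1}{c_n}\sum_{j=0}^{n-1} f \circ T^j(x) = 0, \quad \mu\text{-a.e.}, \]
or there exists a sequence $(n_k) \in \mathbb{N}^{\mathbb{N}}$ with $n_k \to \infty$ such that for all $f \in L^1_+(\mu)$ with $\int f \, d\mu > 0$,
we have,
\[ \lim_{k\to\infty} \frac{1}{c_{n_k}}\sum_{j=0}^{n_k-1} f \circ T^j = \infty, \quad \mu\text{-a.e.} \]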
Aaronson’s result shows that in the situation of an infinite invariant measure, the
asymptotic behaviour of ergodic sums (or, more specifically, occupation times) is
extremely complicated. Despite this negative result by Aaronson, there are plenty of
interesting qualitative and quantitative characterisations for infinite dynamical systems.
Our first aim in this direction is to show that although the pointwise asymptotics
of the ergodic sum of an integrable function f crucially depends on the point x ∈ X
chosen, it only depends upon the function f through its expected value $\int_X f \, d\mu$.
We will prove this theorem shortly, in Section 2.4.6, using the technique of inducing. (In
Chapter 4 we will give another, alternative proof of this theorem, by showing how to
derive it from the more general Chacon–Ornstein Ergodic Theorem.) Before doing either
of those things, let us now show how to prove Theorem 2.4.22 using Hopf’s theorem:
Assume that the measure μ is infinite. By the σ-finiteness of the space (X, B , μ) we have
that for each m ∈ N, there exists some set B m ∈ B such that m ≤ μ(B m ) < ∞. Applying
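Hopf’s Ratio Ergodic Theorem to the functions $f \in L^1(\mu)$ and $1_{B_m}$ yields that
\[ 0 \le \limsup_{n\to\infty} \frac{1}{n}\sum_{j=0}^{n-1} f \circ T^j \le \lim_{n\to\infty} \frac{\sum_{j=0}^{n-1} f \circ T^j}{S_n(1_{B_m})} = \frac{\int_X f \, d\mu}{\mu(B_m)}, \quad \mu\text{-a.e. on } X. \]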
Here, the second inequality above comes from the fact that S n (1 B m ) ≤ n. Since m was
arbitrary and limm→∞ μ(B m ) = ∞, the proof of Theorem 2.4.22 is complete.
2.4.4 Inducing
In this section we will introduce and study induced maps. The idea behind these maps
is similar to that of the jump transformation introduced earlier (see Definitions 1.3.2
and 2.3.23). The basic construction goes back to Kakutani [Kak43] and Rokhlin
[Roh48]. In essence, it consists of viewing an infinite measure-preserving system
through the window of a set of finite measure. Recall that for a non-singular system
T : (X, B , μ) → (X, B , μ), a set A is called a sweep-out set for T if
$\bigcup_{n=0}^{\infty} T^{-n}(A) = X \bmod \mu$, and also that we showed in Lemma 2.4.3 that for conservative, ergodic
transformations, every set A ∈ B with 0 < μ(A) < ∞ is a sweep-out set.
Definition 2.4.25. Let A be a sweep-out set for the non-singular, conservative
transformation T : X → X and define the function φ : X → N by setting
\[ \varphi(x) := \inf\{ n \ge 1 : T^n(x) \in A \}. \]
Since T is conservative, we have that
\[ A = A^* := A \cap \bigcap_{n\in\mathbb{N}} \bigcup_{k\ge n} T^{-k}A \mod \mu. \]
In the context of inducing we will always assume that $A = A^*$, which guarantees that
φ(x) < ∞ for all x ∈ A. When we restrict the function φ to the set A, the map φ is called
the return time to A. Finally, the induced map $T_A : A \to A$ of T on $A = A^*$ is defined to be
\[ T_A(x) := T^{\varphi(x)}(x). \]
Further, the induced system (A, BA , m|A , T A ) is also non-singular and conservative.
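we have that
\[
\begin{aligned}
T_A^{-1}(A \cap B) &= \bigcup_{n=1}^{\infty} A \cap \{\varphi = n\} \cap T_A^{-1}(A \cap B) \\
 &= \bigcup_{n=1}^{\infty} A \cap \{\varphi = n\} \cap T_A^{-1}(A) \cap T_A^{-1}(B) \\
 &= \bigcup_{n=1}^{\infty} A \cap \{\varphi = n\} \cap T^{-n}(B).
\end{aligned}
\]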
This identity immediately implies that the induced system is non-singular. To see
that the induced system is also conservative note that by the definition of φ and the
conservativity of T we have for all $B \in \mathcal{B}_A$ that
\[ \sum_{k\in\mathbb{N}} 1_B \circ T_A^{k} = \sum_{n\in\mathbb{N}} 1_B \circ T^{n} = \infty \quad \text{a.e. on } B. \]
Let us now turn to measure-theoretic questions. Properties of the induced map can be
used to deduce interesting properties of the original system and vice versa, as we shall
show in the following three propositions. First we assume some knowledge of T A and
use this to investigate T.
\[ m(B) := \sum_{n=0}^{\infty} \nu\bigl(A \cap \{\varphi > n\} \cap T^{-n}(B)\bigr), \quad \text{for all } B \in \mathcal{B} \]
Proof. The proof of part (a) follows from Lemma 2.4.26, similarly to the proof of
Lemma 2.3.25. We leave the details as an exercise. Part (b) follows directly from
Maharam’s Recurrence Theorem (see Theorem 2.2.14).
Proof. To prove (a) we first claim that for any measurable T-invariant set B = T −1 (B),
the intersection A ∩ B is invariant under $T_A$. Indeed, in light of Lemma 2.4.26, we have
that
\[ T_A^{-1}(A \cap B) = \bigcup_{n=1}^{\infty} A \cap \{\varphi = n\} \cap T^{-n}(B) = \bigcup_{n=1}^{\infty} A \cap \{\varphi = n\} \cap B = A \cap B. \]
Analogously, the second case yields that $m(B^c) = 0$ and the proof is finished.
For the proof of part (b) we make use of Lemma 2.4.3 in the following way. Assume
that we have a set B ∈ BA with T A−1 (B) = B, μ A (B) > 0 and such that B is not equal to
A mod μ. Then the set A \ B has positive measure and for all x ∈ A \ B we find the
contradiction
\[ 0 = \sum_{k=0}^{\infty} 1_B \circ T_A^{k}(x) = \sum_{n=0}^{\infty} 1_B \circ T^{n}(x) = \infty. \]
Let us now present a converse to Proposition 2.4.28. That is, we now assume some
knowledge concerning the original map T and use this knowledge to obtain facts
about the induced map T A .
Fig. 2.1. The induced map $F_A$ of the Farey map on the interval A := [1/2, 1].
Example 2.4.30. For the Farey map F, let A := [1/2, 1]. Then, the induced map F A :
A → A is given by
So, in this case, the sets {φ = n} are equal to the collection of second-level Gauss
cylinder sets {C(1, n) : n ∈ N}. In fact, we can explicitly calculate the map F A as
follows:
\[ F_A(x) = \frac{1 - x}{nx - (n - 1)}, \quad \text{for } x = [1, n, x_3, x_4, \dots] \in C(1, n). \]
Also, note that the action of F A on the continued fraction expansion of a point x =
[1, x2 , x3 , . . .] ∈ A is given by F A ([1, x2 , x3 , x4 , . . .]) = [1, x3 , x4 , . . .]. (You will be asked
to check this in Exercise 2.6.5).
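The claims of Example 2.4.30 can be tested on a concrete point with exact rational arithmetic. In the sketch below the Farey map is taken to be $x \mapsto x/(1-x)$ on $[0, 1/2]$ and $x \mapsto (1-x)/x$ on $(1/2, 1]$ (the map whose inverse branches are $x \mapsto x/(1+x)$ and $x \mapsto 1/(1+x)$); the helper names `farey`, `from_cf` and `induced` are our own, and a finite continued fraction stands in for a general point of A.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def farey(x):
    """Farey map: x/(1-x) on [0, 1/2] and (1-x)/x on (1/2, 1]."""
    return x / (1 - x) if x <= HALF else (1 - x) / x

def from_cf(digits):
    """Rational number with finite continued fraction expansion [d_1, ..., d_m]."""
    x = Fraction(0)
    for a in reversed(digits):
        x = 1 / (a + x)
    return x

def induced(x):
    """Return (phi(x), F_A(x)) for A = [1/2, 1], iterating F until the orbit returns to A."""
    steps, y = 1, farey(x)
    while y < HALF:
        y = farey(y)
        steps += 1
    return steps, y

n = 4
x = from_cf([1, n, 2, 7, 3])           # a point of C(1, n)
phi, y = induced(x)
print(phi == n)                         # the return time on C(1, n) is n
print(y == (1 - x) / (n * x - (n - 1))) # the closed formula for F_A
print(y == from_cf([1, 2, 7, 3]))       # F_A deletes the second digit
```

All three comparisons are exact, since `Fraction` avoids rounding altogether.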
Proposition 2.4.31. The Farey map F is ergodic with respect to the measure νF .
Proof. To shorten the notation in what follows, let us denote the Borel σ-algebra on
[0, 1] by B and the Borel σ-algebra on [1/2, 1] =: A by BA .
It was shown in Proposition 2.3.19 that the map F preserves the σ-finite Borel
measure νF on the unit interval which is defined by the density h F , given by h F (x) =
1/x. It therefore follows from Proposition 2.4.29 that F A preserves the measure νF|A .
We will now show that the induced system $(A, \mathcal{B}_A, \nu_F|_A, F_A)$ of the Farey map and
the Gauss system $([0, 1], \mathcal{B}, m_G, G)$ are measure-theoretically isomorphic. Recall that
this means that there exist sets X ⊆ [1/2, 1] and Y ⊆ [0, 1] such that $\nu_F(X) = m_G(Y) = 1$
and a measure-preserving function ψ : X → Y such that $\psi \circ F_A = G \circ \psi$. Here, remember
that the function ψ being measure-preserving means that $\nu_F|_A \circ \psi^{-1}(B) = m_G(B)$, for all
$B \in \mathcal{B}$. Indeed, it suffices to let X = [1/2, 1), Y = (0, 1] and the function ψ be equal to the
right-hand branch of the Farey map itself, that is, let $\psi(x) := F|_A(x)$ for all x ∈ [1/2, 1).
Then, if x = [1, x2 , x3 , . . .] ∈ A, we have that
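Furthermore,
\[
\begin{aligned}
\nu_F|_A\bigl((F|_A)^{-1}([a, b])\bigr) &= \nu_F|_A\Bigl(\Bigl[\frac{1}{1+b}, \frac{1}{1+a}\Bigr]\Bigr) = \frac{1}{\log 2}\int_{1/(1+b)}^{1/(1+a)} \frac{1}{x}\, dx \\
 &= \frac{1}{\log 2}\cdot \log\frac{b+1}{a+1} = m_G([a, b]).
\end{aligned}
\]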
It is clear that two measure-theoretically isomorphic systems are either both ergodic
or both not ergodic. Since we already know that G is ergodic, it follows from the
argument above that F A is also ergodic with respect to νF|A . Therefore, we can use
Proposition 2.4.28 to conclude that the map F is ergodic with respect to νF .
Remark 2.4.32. An argument similar to the one we have just given for F can be used
to show that each α-Farey map F α is ergodic with respect to the invariant measure
discovered in Proposition 2.3.20. We leave the details to Exercise 2.6.9.
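(a) $\displaystyle \mu(B) = \sum_{n=0}^{\infty} \mu\bigl(A \cap \{\varphi > n\} \cap T^{-n}(B)\bigr)$, for all $B \in \mathcal{B}$.
(b) $\displaystyle \mu(X) = \int_A \varphi \, d\mu$.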
Proof. By Lemma 2.4.3, every set A satisfying 0 < μ(A) < ∞ is a sweep-out set for T.
Therefore, the function φ is well defined. Observe that for all n ≥ 0,
(The proof of this fact is left to Exercise 2.6.8.) Now suppose that for all B ∈ B and some
fixed n ∈ N, we have
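\[ \mu(B) = \sum_{k=0}^{n} \mu\bigl(A \cap \{\varphi > k\} \cap T^{-k}(B)\bigr) + \mu\bigl(A^c \cap \{\varphi > n\} \cap T^{-n}(B)\bigr). \tag{2.11} \]
By the T-invariance of μ, this gives
\[
\begin{aligned}
\mu(B) &= \sum_{k=0}^{n} \mu\bigl(A \cap \{\varphi > k\} \cap T^{-k}(B)\bigr) + \mu\bigl(T^{-1}(A^c \cap \{\varphi > n\}) \cap T^{-(n+1)}(B)\bigr) \\
 &= \sum_{k=0}^{n} \mu\bigl(A \cap \{\varphi > k\} \cap T^{-k}(B)\bigr) + \mu\bigl(A \cap \{\varphi > n+1\} \cap T^{-(n+1)}(B)\bigr) \\
 &\quad + \mu\bigl(A^c \cap \{\varphi > n+1\} \cap T^{-(n+1)}(B)\bigr).
\end{aligned}
\]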
Incidentally, notice that the above calculation also shows that, for all n ≥ 0,
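\[
\begin{aligned}
\mu\bigl(A^c \cap \{\varphi = n + 1\}\bigr) &= \mu(A) - \sum_{k=0}^{n} \mu\bigl(A \cap \{\varphi = k + 1\}\bigr) \\
 &= \mu\Bigl(A \setminus \bigcup_{k=0}^{n} \bigl(A \cap \{\varphi = k + 1\}\bigr)\Bigr)
\end{aligned}
\]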
where the last two equalities are due to the observation made at the end of case
(i) and the fact that the set A is assumed to be of finite measure. This finishes the
proof of case (iii) and so completes the proof of part (a).
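For part (b), if we substitute X for B into part (a), we obtain that
\[ \mu(X) = \sum_{n=0}^{\infty} \mu\bigl(A \cap \{\varphi > n\} \cap T^{-n}(X)\bigr) = \sum_{n=0}^{\infty} \mu\bigl(A \cap \{\varphi > n\}\bigr) = \int_A \varphi \, d\mu. \]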
Remark 2.4.34. The result in Proposition 2.4.33 (b) is known as Kac’s formula.
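Kac's formula can be illustrated on the Farey system itself, using only facts stated in this section: the invariant density $h_F(x) = 1/x$, the set $A = [1/2, 1]$, and the fact that the return time equals n on $C(1, n) = [n/(n+1), (n+1)/(n+2)]$. The sketch below evaluates the truncated integral $\int_A \min(\varphi, N)\, d\nu_F$ and compares it with $\log(N+1) = \nu_F([1/(N+1), 1])$; the endpoints of $C(1, n)$ and the function names are our own working assumptions, derived from the continued fraction description of the cylinder.

```python
import math

def nu_F(a, b):
    """nu_F([a, b]) for the density h_F(x) = 1/x, where 0 < a <= b."""
    return math.log(b / a)

def truncated_return_integral(N):
    """Integral over A = [1/2, 1] of min(phi, N) with respect to nu_F."""
    # phi = n on C(1, n) = [n/(n+1), (n+1)/(n+2)].
    total = sum(n * nu_F(n / (n + 1), (n + 1) / (n + 2)) for n in range(1, N + 1))
    # phi > N on [(N+1)/(N+2), 1], where min(phi, N) = N.
    total += N * nu_F((N + 1) / (N + 2), 1.0)
    return total

for N in (1, 10, 1000):
    print(N, round(truncated_return_integral(N), 6), round(math.log(N + 1), 6))
```

The truncated integrals grow like $\log(N+1)$ without bound, so the return time is not $\nu_F$-integrable on A, in agreement with the fact that $\nu_F$ is an infinite measure.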
Proof. Let m1 and m2 be two non-zero, T-invariant σ-finite measures that are both
absolutely continuous with respect to μ. Then, let B ∈ B with μ(B) > 0. Since T is
conservative and ergodic, Lemma 2.4.3 implies that the set B is a sweep-out set for
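T with respect to the measure μ. That is, we have
\[ \bigcup_{n=0}^{\infty} T^{-n}(B) = X \mod \mu. \]
Therefore, since $\mu\bigl(X \setminus \bigcup_{n=0}^{\infty} T^{-n}(B)\bigr) = 0$, we also have that
\[ m_1\Bigl(X \setminus \bigcup_{n=0}^{\infty} T^{-n}(B)\Bigr) = 0 \quad\text{and}\quad m_2\Bigl(X \setminus \bigcup_{n=0}^{\infty} T^{-n}(B)\Bigr) = 0. \]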
In other words, the set B is also a sweep-out set for T with respect to m1 and m2 .
In particular, m1 (B), m2 (B) > 0, so the measures m1 and m2 are in fact in the same
measure class as μ.
Now choose A ∈ B such that 0 < m1 (A) < ∞ and 0 < m2 (A) < ∞. We may assume,
without loss of generality, that m1 (A) = m2 (A) = 1. Then, the measures m1 |A and m2 |A
are equivalent ergodic T-invariant probability measures for the dynamical system
given by T A : (A, BA ) → (A, BA ). Thus, according to Proposition 2.4.8, we have that m1 =
m2 on BA . The formula in Proposition 2.4.33 (a) then yields that m1 = m2 on all of B.
Proof. First, both F and F α are non-singular with respect to λ, since νF , να and λ are
in the same measure class. Then, as F and F α are both conservative and ergodic (see
Proposition 2.4.31 and Exercise 2.6.9), an application of Theorem 2.4.35 gives that both
νF and να are unique.
Now, we will turn our attention to a proof of Hopf’s Ratio Ergodic Theorem. The proof
we will shortly present is due originally to Zweimüller [Zwe04]. It exploits the idea of
inducing in a way that will allow us to apply the finite measure version of Birkhoff’s
Pointwise Ergodic Theorem.
Before we begin the proof, let us first fix some notation. Throughout, the system
(X, B , μ, T) is assumed to be conservative, ergodic and measure-preserving. For f ∈
L1 (μ), we denote ergodic sums for the system T by
\[ S_n(f) := \sum_{j=0}^{n-1} f \circ T^j. \]
\[ S_n^A(h) := \sum_{j=0}^{n-1} h \circ T_A^j. \]
\[ \varphi_n := S_n^A(\varphi) = \sum_{j=0}^{n-1} \varphi \circ T_A^j, \tag{2.12} \]
where φ : A → N is the return time function on A. Note that for a specific x ∈ A, the j-th
summand φ ◦ T Aj (x) inside this sum is equal to the length of the j-th excursion of the
orbit (T n (x))n≥0 to the set A. To have a more concrete idea of what this means, it helps
to think in terms of continued fractions. So, if x = [1, x1 , x2 , x3 , . . .] ∈ A := [1/2, 1] and
if F A denotes the Farey map induced on A, then φ1 (x) := φ(x) = x1 , φ2 (x) := φ(x) +
φ(F A (x)) = x1 + x2 , and so on; in general,
\[ \varphi_n(x) := \sum_{i=1}^{n} x_i. \]
The idea of chopping up the orbits of points under T into pieces corresponding to each
excursion to the set A is a useful one. We can also apply this idea to obtain the induced
version of a function f : X → R, by adding up the values of the function observed during
the first excursion and then represent these as a single function.
\[ f^A(x) := \sum_{j=0}^{\varphi(x)-1} f \circ T^j(x). \]
(a)
\[ S_{\varphi_n}(f) = S_n^A(f^A), \quad \text{for all } n \in \mathbb{N}. \]
(b)
\[ \int_X f \, d\mu = \int_A f^A \, d\mu. \]
Proof. To prove part (a), we observe that for any n ∈ N, the section of orbit
$x, T(x), \dots, T^{\varphi_n(x)-1}(x)$ that determines the sum $S_{\varphi_n}(f)$ consists of n complete
excursions to A (that is, $T^{\varphi_n(x)}(x) \in A$). Therefore, we have that
\[ S_{\varphi_n}(f) = S_\varphi(f) + (S_\varphi(f)) \circ T_A + \dots + (S_\varphi(f)) \circ T_A^{n-1} = S_n^A(f^A). \]
To prove part (b), let $f := 1_B$ for some $B \in \mathcal{B}$. Using Proposition 2.4.33 (a), we then have
that
\[
\begin{aligned}
\int_X 1_B \, d\mu = \mu(B) &= \sum_{n=0}^{\infty} \mu\bigl(A \cap \{\varphi > n\} \cap T^{-n}(B)\bigr) \\
 &= \int_A \sum_{n=0}^{\infty} 1_{A \cap \{\varphi > n\}} \cdot 1_B \circ T^n \, d\mu \\
 &= \int_A \Bigl( \sum_{n=0}^{\varphi - 1} 1_B \circ T^n \Bigr) d\mu = \int_A (1_B)^A \, d\mu.
\end{aligned}
\]
Hence, the assertion in part (b) holds for characteristic functions. A standard
argument from measure theory then finishes the proof; we leave the details as an
exercise.
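Part (a) of the lemma is a purely combinatorial regrouping of an ergodic sum into excursions, and it can be verified exactly for the Farey map induced on A = [1/2, 1]. The following sketch does this for one rational starting point; the observable `f`, the starting point and the helper names are arbitrary choices of ours, and the Farey map is again taken to be $x \mapsto x/(1-x)$ on $[0,1/2]$ and $x \mapsto (1-x)/x$ on $(1/2,1]$.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def farey(x):
    """Farey map: x/(1-x) on [0, 1/2] and (1-x)/x on (1/2, 1]."""
    return x / (1 - x) if x <= HALF else (1 - x) / x

def from_cf(digits):
    """Rational number with finite continued fraction expansion [d_1, ..., d_m]."""
    x = Fraction(0)
    for a in reversed(digits):
        x = 1 / (a + x)
    return x

def first_return(x, f):
    """Return (phi(x), f^A(x), F_A(x)) for a point x of A = [1/2, 1]."""
    steps, total, y = 0, Fraction(0), x
    while True:
        total += f(y)            # accumulate f along the excursion
        y = farey(y)
        steps += 1
        if y >= HALF:            # the orbit is back in A
            return steps, total, y

f = lambda y: y * y              # an arbitrary test observable (assumption)
digits = [1, 3, 1, 4, 2, 5, 6]
x = from_cf(digits)
n = 4

# Right-hand side: S_n^A(f^A), summed along the induced orbit.
phi_n, rhs, y = 0, Fraction(0), x
for _ in range(n):
    phi, f_A, y = first_return(y, f)
    phi_n += phi
    rhs += f_A

# Left-hand side: the ordinary ergodic sum S_{phi_n}(f).
lhs, z = Fraction(0), x
for _ in range(phi_n):
    lhs += f(z)
    z = farey(z)

print(phi_n, lhs == rhs, z == y)
```

Here `phi_n` comes out as the sum of the digits following the leading 1, which matches the description of $\varphi_n$ in terms of continued fractions given before the lemma.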
Remark 2.4.39. The latter proposition also yields Kac’s formula (see Remark 2.4.34)
as a corollary, by simply choosing f := 1 X . In particular, note that this formula implies
that the T-invariant measure μ is infinite if and only if the return-time function to any
set A of positive finite measure is non-integrable.
Proof of Theorem 2.4.24. Let A be a sweep-out set for T. First observe that it suffices to
prove that for all f ∈ L1 (μ), we have that
\[ \lim_{n\to\infty} \frac{S_n(f)}{S_n(1_A)}(x) = \frac{\int_X f \, d\mu}{\mu(A)}, \quad \text{for } \mu\text{-a.e. } x \in A. \tag{2.13} \]
Indeed, the set of points where this limit exists and is equal to the right-hand side of
the equality in (2.13) is T-invariant and of strictly positive μ-measure (since μ(A) > 0).
Therefore, the correct limit must be attained μ-a.e., by ergodicity. Then, if the same
assertion is made for $g \in L^1(\mu)$, with the extra conditions that g ≥ 0 and $\int_X g \, d\mu > 0$,
the assertion of the theorem follows immediately.
Therefore, we are left only to give a proof of (2.13). For this, consider the
induced map T A . In light of Proposition 2.4.29, we have that T A is an ergodic
measure-preserving transformation on the finite measure space (A, BA , μ|A ). We can
therefore apply Birkhoff’s Pointwise Ergodic Theorem to T A and the induced function
f A , which is integrable by Lemma 2.4.38, to deduce that
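\[ \lim_{n\to\infty} \frac{S_{\varphi_n}(f)}{S_{\varphi_n}(1_A)} = \lim_{n\to\infty} \frac{S_n^A(f^A)}{n} = \frac{1}{\mu(A)}\int_A f^A \, d\mu|_A = \frac{\int_X f \, d\mu}{\mu(A)}, \quad \mu\text{-a.e. on } A. \tag{2.14} \]
This proves (2.13) for μ-a.e. x ∈ A for the subsequence $(\varphi_n(x))_{n\ge 1}$. It remains to
demonstrate convergence for the full sequence.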
2.5 Exactness revisited | 115
By the linearity of the integral, we may assume without loss of generality that f ≥ 0.
Then the sequence S n (f ) n≥1 is non-decreasing in n. Now, for a.e. x ∈ A we find for
every k ∈ N a positive integer n such that φ n−1 (x) ≤ k < φ n (x). Therefore, observing that
S k (1 A )(x) = n − 1 and using Lemma 2.4.38 (a), we have
Definition 2.5.1. Let (X, B , μ) be a σ-finite measure space and let T : (X, B , μ) →
(X, B , μ) denote a bi-measurable map (that is, T is measurable and T(B) ∈ B for all
B ∈ B). Then T is said to satisfy the intersection property with respect to the measure μ
provided that for every A ∈ B with positive measure, there exists some k ≥ 1, depending
on A, such that μ(T k (A) ∩ T k+1 (A)) > 0.
Lemma 2.5.2. Let the map T : (X, B , μ) → (X, B , μ) be bi-measurable and ergodic with
respect to μ. If T satisfies the intersection property, then T is exact.
Proof. Suppose that T is bi-measurable, ergodic and satisfies the intersection property,
and let $A \in \bigcap_{m\in\mathbb{N}} T^{-m}(\mathcal{B})$. Suppose that μ(A) > 0. In order to show that T is
exact, we have to show that the complement of A has μ-measure equal to zero. Since
A belongs to the tail σ-algebra, we have that $T^{-m}(T^m(A)) = A$, for all m ≥ 0. We then
have for all m ≥ 0,
Lemma 2.5.3. Consider the Farey system ([0, 1], B , F), and let A be given such that
λ(A) > 0. Then
\[ \limsup_{n\to\infty} \lambda\bigl(F^n(A) \cap C(1)\bigr) = \lambda(C(1)). \]
Proof. We always have $\lambda(F^n(A) \cap C(1)) \le \lambda(C(1))$. Hence we are left to show
that there exists a strictly increasing sequence of positive integers $(n_k)_{k\ge 1}$ such
that $\lim_{k\to\infty} \lambda(F^{n_k}(A) \cap C(1)) = \lambda(C(1))$. For this let x, with Farey coding $(x_1, x_2, x_3, \dots)$, be a
Lebesgue-density point² of A and recall that $(C(x_1, \dots, x_n))_{n\ge 1}$ denotes the shrinking
family of Farey cylinder sets each containing x. Note that, since F is ergodic (or by
using Halmos’s Recurrence Theorem), we can certainly choose x such that there exists
a sequence $(n_k)_{k\ge 1}$ such that $x_{n_k+1} = 1$, for all k ∈ N. To shorten the notation, let us
define for all k ∈ N,
\[ D_k := C(x_1, \dots, x_{n_k}, x_{n_k+1}) = C(x_1, \dots, x_{n_k}, 1). \]
We have that $F^{n_k}$ is bijective on $D_k$ and we have that $F^{n_k}(D_k) = C(1)$. Since x is a
Lebesgue-density point of A, it follows that
\[ \lim_{k\to\infty} \frac{\lambda(A \cap D_k)}{\lambda(D_k)} = 1 \quad\text{and}\quad \lim_{k\to\infty} \frac{\lambda(D_k \setminus A)}{\lambda(D_k)} = 0. \tag{2.15} \]
2 A good reference for the Lebesgue density theorem and Lebesgue density points is Rudin [Rud87];
see in particular Theorem 7.2.
Proposition 2.5.4. Let A be given such that λ(A) > 0. Then for the Farey map F we have
that
\[ \limsup_{n\to\infty} \lambda\bigl(F^n(A) \cap F^{n+1}(A) \cap C(1)\bigr) = \lambda(C(1)). \]
Proof. Obviously, $\limsup_{n\to\infty} \lambda(F^n(A) \cap F^{n+1}(A) \cap C(1)) \le \lambda(C(1))$. As in the proof
of Lemma 2.5.3, let x, with Farey coding $(x_1, x_2, \dots)$, be a Lebesgue-density point of A. In light of
Remark 2.4.18, we have that there exists a sequence $(m_k)_{k\ge 1}$ such that $x_{m_k+1} = x_{m_k+2} = 1$,
for all k. Therefore, for both of the sequences $(C(x_1, \dots, x_{m_k}, x_{m_k+1}))_{k\ge 1}$ and
$(C(x_1, \dots, x_{m_k}, x_{m_k+1}, x_{m_k+2}))_{k\ge 1}$ we can proceed as in the proof of Lemma 2.5.3, which
yields that
Corollary 2.5.5. For the Farey system ([0, 1], B , F), let A be given such that λ(A) > 0.
Then there exists n ∈ N such that
\[ \lambda\bigl(F^n(A) \cap F^{n+1}(A)\bigr) > 0. \]
In other words, we have that the Farey map F satisfies the intersection property with
respect to the Lebesgue measure λ.
Theorem 2.5.6. The Farey map F is exact with respect to the infinite invariant
measure $\nu_F$.
Proof. First suppose that T is exact and that $f \in L^1(X, \mathcal{B}, \mu)$ has zero expectation, that
is, suppose that $\int_X f \, d\mu = 0$. Then, since $\|\widehat{T}\| = 1$ (see Exercise 2.6.11), the sequence
$(\|\widehat{T}^n(f)\|_1)_{n\ge 1}$ is bounded. To show that its limit is zero, fix a subsequence $(n_k)_{k\ge 1}$ such
that
\[ \lim_{k\to\infty} \bigl\|\widehat{T}^{n_k}(f)\bigr\|_1 = \limsup_{n\to\infty} \bigl\|\widehat{T}^{n}(f)\bigr\|_1 < \infty. \]
If $(g_n)_{n\ge 1}$ is defined by
\[ g_n := \operatorname{sign}\bigl(\widehat{T}^n(f)\bigr) \in L^\infty(X, \mathcal{B}, \mu), \]
as subsets of the weak-* compact unit ball in $L^1(X, \mathcal{B}, \mu)^*$. Since $G_K \supset G_{K+1}$ for all
K ∈ N, the finite intersection property of compact sets implies that $\bigcap_{K\in\mathbb{N}} G_K \ne \emptyset$. Fix
$g \in \bigcap_{K\in\mathbb{N}} G_K$. By definition, $g \in L^1(X, T^{-n_K}\mathcal{B}, \mu)^* \cong L^\infty(X, T^{-n_K}\mathcal{B}, \mu)$ for all K ∈ N, so
we have that g is measurable with respect to the tail σ-algebra $\bigcap_{n\in\mathbb{N}} T^{-n}\mathcal{B}$. Therefore,
by the exactness of T, we have that g must be constant μ-a.e., that is, g = c
μ-a.e., for some c ∈ R. Since g is an accumulation point of the sequence $(g_{n_k} \circ T^{n_k})_{k\ge 1}$,
there exists a subsequence $(n_{k_\ell})_{\ell\ge 1}$ such that $\lim_{\ell\to\infty} \int g_{n_{k_\ell}} \circ T^{n_{k_\ell}} \cdot f \, d\mu = \int g \cdot f \, d\mu$.
This gives that
\[
\begin{aligned}
0 \le \limsup_{n\to\infty} \bigl\|\widehat{T}^{n}(f)\bigr\|_1 &= \lim_{k\to\infty} \bigl\|\widehat{T}^{n_k}(f)\bigr\|_1 = \lim_{k\to\infty} \int g_{n_k} \circ T^{n_k} \cdot f \, d\mu \\
 &= \lim_{\ell\to\infty} \int g_{n_{k_\ell}} \circ T^{n_{k_\ell}} \cdot f \, d\mu = \int g \cdot f \, d\mu = c \cdot \int f \, d\mu = 0.
\end{aligned}
\]
In order to prove the converse, we assume that T is not exact and construct
$f \in L^1(X, \mathcal{B}, \mu)$ with $\int f \, d\mu = 0$ and $\liminf_{n\to\infty} \|\widehat{T}^n(f)\|_1 > 0$. To that end, choose
$A \in \bigcap_{n\in\mathbb{N}} T^{-n}\mathcal{B}$ such that 0 < μ(A) < μ(X), which is possible by the σ-finiteness of μ.
For the same reason there exists a measurable set B ⊂ X \ A such that 0 < μ(B) < ∞. For
$f := 1_A - (\mu(A)/\mu(B))\, 1_B$, we have that $f \in L^1(X, \mathcal{B}, \mu)$, $\int f \, d\mu = 0$ and $\int_A f \, d\mu = \mu(A) > 0$.
Since $A \in \bigcap_{n\in\mathbb{N}} T^{-n}\mathcal{B}$, there exists a sequence $(A_n)_{n\ge 1}$ in $\mathcal{B}$ such that $A = T^{-n}A_n$, for
each n ∈ N. This yields that for all n ∈ N, we have
\[
\begin{aligned}
\bigl\|\widehat{T}^{n}(f)\bigr\|_1 &\ge \int_{A_n} \bigl|\widehat{T}^n f\bigr| \, d\mu \ge \int_{A_n} \widehat{T}^n f \, d\mu = \int 1_{A_n} \widehat{T}^n f \, d\mu \\
 &= \int 1_{A_n} \circ T^n \, f \, d\mu = \int_A f \, d\mu > 0.
\end{aligned}
\]
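Proof. First note that since T preserves the probability measure μ, we have that
$\mu \circ T^{-1} = \mu$, which implies $d(\mu \circ T^{-1})/d\mu = 1$ and hence, $\widehat{T} 1 = 1$. Using this, it
follows that for each $f \in L^1(X, \mathcal{B}, \mu)$ we have that $\widehat{T}(f - \int_X f \, d\mu) = \widehat{T}(f) - \int_X f \, d\mu$.
Since $(f - \int_X f \, d\mu) \in L^1(X, \mathcal{B}, \mu)$ and $\int_X (f - \int_X f \, d\mu)\, d\mu = 0$, we can apply Lin’s
Criterion, which gives that $\lim_{n\to\infty} \|\widehat{T}^n(f - \int_X f \, d\mu)\|_1 = 0$. Using this and the fact
that $1_A - \int_X 1_A \, d\mu \in L^1(X, \mathcal{B}, \mu)$, we obtain
\[
\begin{aligned}
\lim_{n\to\infty} \mu\bigl(A \cap T^{-n}(B)\bigr) &= \lim_{n\to\infty} \int_X 1_A\, (1_B \circ T^n) \, d\mu = \lim_{n\to\infty} \int_X (\widehat{T}^n 1_A)\, 1_B \, d\mu \\
 &= \lim_{n\to\infty} \int_X \widehat{T}^n\bigl(1_A - \mu(A)\bigr)\, 1_B \, d\mu + \mu(A)\mu(B) \\
 &= \mu(A)\mu(B).
\end{aligned}
\]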
2.6 Exercises
Exercise 2.6.1. Show that the Lebesgue measure is not invariant under the Gauss map.
Exercise 2.6.2. Show that the system (R, B , λ, T) defined in Example 2.3.12 is really
non-singular and that its Hopf decomposition is non-trivial with conservative part
given by $C_T = [0, 1]$.
Exercise 2.6.3. As before, let F0 : x → x/(1 + x) and F1 : x → 1/(1 + x) denote the two
inverse branches of the Farey map F. For ω = (ω1 , . . . , ω n ) ∈ {0, 1}n , n ∈ N, define
F ω := F ω1 ◦ . . . ◦ F ω n ,
En := {(ω1 , . . . , ω n ) ∈ {0, 1}n : # {i : ω i = 1} is even}, and On := {0, 1}n \ En .
(ii) Use the identity in (i) to obtain an alternative proof of the fixed point equation
P F (h) = h of the Ruelle operator P F for the map F. (See Proposition 2.3.19).
Exercise 2.6.4. Give a proof of Proposition 2.3.21 (b), by verifying the eigenequation
P L α h L α = h L α for the Ruelle operator P L α of an α-Lüroth map L α .
Exercise 2.6.5. Let F A : A → A denote the induced map of the Farey map F on
the set A := [1/2, 1]. Prove that F A ([x1 , x2 , x3 , x4 , . . .]) = [x1 , x3 , x4 , . . .], for all
[x1 , x2 , x3 , x4 , . . .] ∈ A.
Exercise 2.6.6. Let (X, B , μ, T) be an ergodic system with a finite invariant measure μ,
and suppose that E ∈ B is such that μ(E) > 0. Let (n k )k≥0 be the sequence of occurrence
times such that T n k (x) ∈ E for all k ≥ 0 (note that these are guaranteed to exist for μ-a.e.
x by Halmos’s Recurrence Theorem). Show that if we assume that n0 = 0, so x ∈ E, then
we have μ-a.e. that
\[
\lim_{k\to\infty} \frac{n_k}{k} = \frac{1}{\mu(E)}.
\]
Exercise 2.6.7. Prove that in an infinite ergodic system, the assumption of con-
servativity is necessary for the existence of sweep-out sets of finite measure. (See
Lemma 2.4.3.)
Exercise 2.6.8. Prove that where φ(x) := inf {n ≥ 1 : T n (x) ∈ A} and A is a sweep-out
set for T, we have
Exercise 2.6.9. Taking inspiration from the proof of Proposition 2.4.31, prove that the
map F α is ergodic with respect to the measure να defined in Proposition 2.3.20.
Exercise 2.6.10. Using the duality (L1(μ))* ≅ L∞(μ), give a formal proof that the
unitary operator U_T : L∞(μ) → L∞(μ), f ↦ f ◦ T, is the dual operator of T̂.
Exercise 2.6.11. Prove that ‖T̂‖ = 1.
Exercise 2.6.13. Let μ and ν be two σ-finite measures on (X, B) with μ ∼ ν . Show that
the Radon–Nikodým density dμ/dν is almost everywhere positive and that dν/dμ =
(dμ/dν)^{−1}.
Exercise 2.6.14. Show that if v/w is a reduced fraction in (0, 1), then for all p/q ∈
F^{−n}(v/w) we have that
\[
(F^n)'\Big(\frac{p}{q}\Big) = \frac{q^2}{w^2}.
\]
Exercise 2.6.15. Show that the statement in Proposition 2.5.4 still holds if we replace
C(1) by any arbitrary Farey-cylinder whose final symbol is equal to 1. (Of course, the
sequence (m k )k≥1 might be a different one).
Exercise 2.6.16. Fill in the gap in the proof of Lin’s Criterion: Prove that if g is
measurable with respect to the tail σ-algebra of T and T is exact, then g is constant
μ-a.e.
3 Renewal theory and α-sum-level sets
In this chapter we will mainly investigate certain subsets of the unit interval which
are defined in terms of the α-Lüroth expansion. However, in order to motivate this
exploration, in the first section we will describe the analogous problem for the
continued fraction expansion. In the second section, we first define the sets we are
interested in and then show how classical results in the field of renewal theory can be
used to obtain detailed information about the sets in question.
The first few of these sets are shown in Fig. 3.1, below. Directly from the definition, we
have that C1 = C(1) = [1/2, 1]. Likewise, it follows that for the next few sum-level sets
we have
\[
C_2 = [1/3, 2/3], \qquad C_3 = [1/4, 2/5] \cup [3/5, 3/4],
\]
and so on.
To begin the inspection of the sequence (C_n)_{n≥1} of these sets, let us consider the
lim-inf set, which is defined by
\[
\liminf_{n\to\infty} C_n := \bigcup_{N\in\mathbb{N}} \bigcap_{n\ge N} C_n.
\]
In order for an irrational number x to lie in all of the sets C_N, C_{N+1}, C_{N+2}, . . ., for some
N ∈ N, we must have that x = [x_1, . . . , x_k, 1, 1, 1, . . .], where ∑_{i=1}^{k} x_i = N. In other
words, the lim-inf set of the sequence (Cn )n≥1 is equal to the set of all noble numbers
(see Definition 1.1.13 (d)), that is, irrational numbers whose continued fraction digits
are from some point on always equal to 1. As we have already observed, this set is
countable. On the other hand, one immediately verifies that the lim-sup set¹
\[
\limsup_{n\to\infty} C_n := \bigcap_{N\in\mathbb{N}} \bigcup_{n\ge N} C_n
\]
is equal to the set of all irrational numbers in [0, 1]. Hence, at first sight, the sequence
of sum-level sets appears to be far away from being a canonical dynamical entity.
(However, in Chapter 5 we will show that this is actually not the case.)
Fig. 3.1. The first four sum-level sets. (Interval endpoints shown in the figure: C1: 1/2, 1; C2: 1/3, 2/3; C3: 1/4, 2/5, 3/5, 3/4; C4: 1/5, 2/7, 3/8, 3/7, 4/7, 5/8, 5/7, 4/5.)
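For the Lebesgue measure of the first four members of the sequence of the
sum-level sets (cf. Fig. 3.1) one immediately computes that
\[
\lambda(C_1) = \frac{1}{2}, \quad \lambda(C_2) = \frac{1}{3}, \quad \lambda(C_3) = \frac{3}{10}, \quad \lambda(C_4) = \frac{39}{140}.
\]
From this one might already start to suspect that the sequence (λ(C_n))_{n≥1} is decreasing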
for n tending to infinity. In fact, it was conjectured by Fiala and Kleban [FK10] that
λ (Cn ) tends to zero, as n tends to infinity. In Section 5.1, we will settle this conjecture
affirmatively, as well as prove some much stronger results. Before this, though, we will
consider the parallel, easier to analyse, situation for the α-Lüroth systems.
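The Lebesgue measures of the first few sum-level sets can be checked by direct enumeration. The following is a minimal sketch, under the assumption that C_n is the union of the continued fraction cylinder sets C(x_1, . . . , x_k) taken over all compositions (x_1, . . . , x_k) of n, and using the standard formula 1/(q_k(q_k + q_{k−1})) for the length of such a cylinder. The function names are illustrative only and are not taken from the text.

```python
from fractions import Fraction


def compositions(n):
    """Yield all tuples of positive integers whose entries sum to n."""
    if n == 0:
        yield ()
        return
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield (first,) + rest


def cylinder_length(digits):
    """Length of the continued fraction cylinder C(x_1, ..., x_k).

    Uses the recursion q_j = x_j * q_{j-1} + q_{j-2} with q_{-1} = 0, q_0 = 1.
    """
    q_prev, q = 0, 1
    for x in digits:
        q_prev, q = q, x * q + q_prev
    return Fraction(1, q * (q + q_prev))


def sum_level_measure(n):
    """Lebesgue measure of C_n, summed over the compositions of n.

    Distinct compositions of the same n give cylinders with disjoint interiors,
    since no composition of n is a proper prefix of another one.
    """
    return sum(cylinder_length(c) for c in compositions(n))


measures = [sum_level_measure(n) for n in range(1, 5)]
print(measures)
```

The exact rational arithmetic avoids any rounding, so the output can be compared directly with the interval endpoints in Fig. 3.1.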
3.2 Sum-level sets for the α-Lüroth expansion
In this section, we will study the sequence of the Lebesgue measures of the α-sum-level
sets for an arbitrary α-Lüroth map L α , for a given partition α. Analogous to the
sum-level sets for the continued fraction expansion, defined in the previous section,
¹ Recall that we have already encountered lim-sup sets in Chapter 1, specifically in Lemma 1.2.18 (the
Borel–Cantelli Lemma).
Also, for later convenience, we define C_0^{(α)} := [0, 1]. Our toolkit for the investigation
into the sequence (λ(C_n^{(α)}))_{n≥1} will consist of classical results from renewal theory.
Our aim here is to state and give some ideas of the proofs of some strong renewal
theorems due to Garsia/Lamperti [GL63] and Erickson [Eri70]. Before doing so, we first
state and prove the original discrete renewal theorem due to Erdős, Pollard and Feller
[EFP49]. We begin by defining a renewal pair.
Definition 3.2.1. Let (v_n)_{n≥1} be an infinite probability vector, that is, a sequence of
non-negative real numbers for which ∑_{n=1}^{∞} v_n = 1. Assume that associated to this
vector there exists a sequence (w_n)_{n≥0}, with w_0 := 1, which satisfies the renewal
equation:
\[
w_n = \sum_{m=1}^{n} v_m w_{n-m}, \quad \text{for all } n \in \mathbb{N}.
\]
A pair ((v n )n≥1 , (w n )n≥0 ) of sequences with these properties is referred to as a renewal
pair.
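Since w_0 = 1 is fixed, the renewal equation determines (w_n) recursively from (v_n). The following minimal sketch computes the first few terms exactly; the probability vector (1/2, 1/4, 1/4, 0, 0, . . .) and the function name are illustrative choices and do not come from the text.

```python
from fractions import Fraction


def renewal_sequence(v, n_max):
    """Return [w_0, ..., w_{n_max}] solving w_n = sum_{m=1}^n v_m * w_{n-m}, w_0 = 1.

    `v` is a list with v[0] = v_1, v[1] = v_2, ...; entries beyond the end of the
    list are treated as zero.
    """
    w = [Fraction(1)]
    for n in range(1, n_max + 1):
        w.append(sum(v[m - 1] * w[n - m] for m in range(1, min(n, len(v)) + 1)))
    return w


# Illustrative probability vector: v_1 = 1/2, v_2 = v_3 = 1/4.
v = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
w = renewal_sequence(v, 6)
print(w)
```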
Let us give a brief sketch of the original probabilistic motivation for this definition.
For further details and many examples, we refer the reader to Chapter XIII of Feller
[Fel68a].
Consider a sequence of independent identically distributed random variables
(T n )n∈N with values in N. (Just think of T n as the random discrete time between
the occurrence of a ‘recurrent event’, like the successive renewal of a burned-out
lightbulb.) For the distribution, we write v k := P(T1 = k) for each k ∈ N. Now the
probability of the occurrence of the event at time k ∈ N is given by
\[
w_k := P\Big(\sum_{i=1}^{\ell} T_i = k \ \text{for some } \ell \in \mathbb{N}_0\Big).
\]
Since the empty sum is by definition equal to 0, we have w_0 = 1. Using the fact that
the random variables (T_n) are independent and identically distributed,
we find
\[
\begin{aligned}
w_n &= P\Big(\sum_{i=1}^{\ell} T_i = n \ \text{for some } \ell \in \mathbb{N}_0\Big) \\
&= \sum_{m=1}^{n} P\Big(T_1 = m \ \text{and} \ \sum_{i=1}^{\ell} T_i = n \ \text{for some } \ell \in \mathbb{N}\Big) \\
&= P(\{T_1 = n\}) + \sum_{m=1}^{n-1} P\Big(T_1 = m \ \text{and} \ \sum_{i=2}^{\ell} T_i = n - m \ \text{for some } \ell \in \mathbb{N}\Big) \\
&= v_n + \sum_{m=1}^{n-1} v_m\, P\Big(\sum_{i=1}^{\ell} T_{i+1} = n - m \ \text{for some } \ell \in \mathbb{N}_0\Big) \\
&= \sum_{m=1}^{n} v_m w_{n-m}.
\end{aligned}
\]
This shows that the renewal equation w_n = ∑_{m=1}^{n} v_m w_{n−m} for n ∈ N has its natural
place in probability theory. In the following we will see how to determine the
asymptotic behaviour of (w_n) in terms of (v_n) just by analysing the renewal equation.
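The probabilistic interpretation can be checked exactly for small n. The event that some partial sum T_1 + . . . + T_ℓ equals n splits into disjoint events indexed by the compositions (k_1, . . . , k_ℓ) of n, each of probability v_{k_1} · · · v_{k_ℓ} by independence. The sketch below sums these products and compares the result with the recursion; the probability vector is again an illustrative assumption, and the helper names are not from the text.

```python
from fractions import Fraction
from math import prod


def compositions(n):
    """Yield all tuples of positive integers whose entries sum to n."""
    if n == 0:
        yield ()
        return
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield (first,) + rest


def v_at(v, k):
    """v_k for a finitely supported vector stored as v[0] = v_1, v[1] = v_2, ..."""
    return v[k - 1] if 1 <= k <= len(v) else Fraction(0)


def hitting_probability(v, n):
    """P(T_1 + ... + T_l = n for some l >= 0), as a sum over compositions of n."""
    return sum((prod(v_at(v, k) for k in c) for c in compositions(n)), Fraction(0))


def renewal_sequence(v, n_max):
    """w_0, ..., w_{n_max} from the renewal equation with w_0 = 1."""
    w = [Fraction(1)]
    for n in range(1, n_max + 1):
        w.append(sum(v_at(v, m) * w[n - m] for m in range(1, n + 1)))
    return w


v = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]   # illustrative choice
w = renewal_sequence(v, 8)
direct = [hitting_probability(v, n) for n in range(9)]
print(direct == w)
```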
We are now almost in a position to state and prove the classical discrete renewal
theorem. The proof we give here is essentially (with a few extra details inserted) the
original proof given in [EFP49]. Before we start, we make the following definitions for
a given renewal pair ((v n ), (w n )):
\[
d_v := \gcd\{n \in \mathbb{N} : v_n > 0\} \quad \text{and} \quad d_w := \gcd\{n \in \mathbb{N} : w_n > 0\}.
\]
Then, for all n with w_n = 0 we also have that v_n = 0, since using the renewal equation
gives that w_n = 0 = ∑_{m=1}^{n} v_m w_{n−m} and so each term in this sum must be equal to zero.
In particular, v_n w_0 = 0, but since w_0 = 1, it follows that v_n = 0. This implies that d_w is
a factor of d_v. It is also possible to show, using a fairly straightforward but somewhat
ungainly induction argument, that d_v is a factor of d_w. Thus, these two quantities are
always equal. We will also need the following elementary technical observation.
Lemma 3.2.2. If d is the greatest common divisor of the sequence of natural numbers
(n k )k≥1 , then there exist numbers K and M with the property that for each m ∈ N such
that m ≥ M there exist c1 , . . . , c K ∈ N such that
\[
m \cdot d = \sum_{k=1}^{K} c_k n_k.
\]
Proof. We can assume that d = 1 (otherwise just divide each of the n k by d), and
also that d is the greatest common divisor of the first K of the given numbers, that is,
g.c.d.(n_1, n_2, . . . , n_K) = 1. It is well known that there then exist integers b_1, b_2, . . . , b_K
such that²
\[
b_1 n_1 + \ldots + b_K n_K = 1.
\]
where i ≥ 0 and 0 ≤ r < n1 come from the division algorithm applied to m − M. Therein
lie the factors c k and (since bn1 > b k r), they are clearly positive integers.
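To see the statement of Lemma 3.2.2 at work, one can test numerically which integers are combinations of given numbers with positive integer coefficients. The sketch below does this by dynamic programming for the numbers 6, 10, 15, which are an illustrative choice with greatest common divisor 1; it is a brute-force check and not the argument of the proof.

```python
from functools import reduce
from math import gcd


def representable(m, nums):
    """True if m = sum_k c_k * n_k with every c_k a positive integer.

    Each n_k is used once up front; the remainder must then be a combination
    of the n_k with non-negative integer coefficients.
    """
    rest = m - sum(nums)
    if rest < 0:
        return False
    reachable = [True] + [False] * rest
    for t in range(1, rest + 1):
        reachable[t] = any(t >= n and reachable[t - n] for n in nums)
    return reachable[rest]


nums = (6, 10, 15)                      # illustrative numbers with gcd 1
d = reduce(gcd, nums)
not_representable = [m for m in range(1, 200) if not representable(m, nums)]
M = max(not_representable) + 1          # threshold valid within the tested range
print(d, M)
```

For this choice the largest integer that fails is 60, so every m ≥ 61 in the tested range has the required form.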
Finally, before stating the theorem, we also need the following elementary lemma. We
include the proof for completeness.
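Lemma 3.2.3. Let (b_n)_{n≥1} and (b'_n)_{n≥1} be two sequences of real numbers with the
property that lim_{n→∞}(b_n + b'_n) exists. Then, provided that we do not have lim inf_{n→∞} b_n =
−∞ and lim sup_{n→∞} b'_n = ∞, or vice versa, it follows that
\[
\lim_{n\to\infty} (b_n + b'_n) = \liminf_{n\to\infty} b_n + \limsup_{n\to\infty} b'_n.
\]
Proof. On the one hand, we have
\[
\lim_{n\to\infty} (b_n + b'_n) - \limsup_{n\to\infty} b'_n = \liminf_{n\to\infty} (b_n + b'_n) + \liminf_{n\to\infty} (-b'_n) \le \liminf_{n\to\infty} b_n,
\]
and, on the other hand,
\[
\lim_{n\to\infty} (b_n + b'_n) - \liminf_{n\to\infty} b_n = \limsup_{n\to\infty} (b_n + b'_n) + \limsup_{n\to\infty} (-b_n) \ge \limsup_{n\to\infty} b'_n.
\]
Combining these two inequalities gives the assertion.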
Theorem 3.2.4 (Discrete Renewal Theorem). Let ((v_n)_{n≥1}, (w_n)_{n≥0}) be a renewal pair
and suppose that d_v = 1. Then
\[
\lim_{n\to\infty} w_n = \frac{1}{\sum_{m=1}^{\infty} m \cdot v_m},
\]
where the limit is understood to be equal to zero if the series in the denominator diverges.
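A quick numerical experiment illustrates both the limit and the role of the condition d_v = 1. The sketch below uses floating point arithmetic and two illustrative vectors that are not taken from the text: (1/2, 1/4, 1/4), for which the series in the denominator equals 7/4, and (0, 1), for which d_v = 2.

```python
def renewal_sequence(v, n_max):
    """w_0, ..., w_{n_max} from w_n = sum_{m=1}^n v_m * w_{n-m}, with w_0 = 1.

    `v` holds v_1, v_2, ... as floats; entries beyond the list are zero.
    """
    w = [1.0]
    for n in range(1, n_max + 1):
        w.append(sum(v[m - 1] * w[n - m] for m in range(1, min(n, len(v)) + 1)))
    return w


# Aperiodic case: support {1, 2, 3}, so d_v = 1.
v_aperiodic = [0.5, 0.25, 0.25]
s = sum(m * vm for m, vm in enumerate(v_aperiodic, start=1))   # mean, here 1.75
w = renewal_sequence(v_aperiodic, 60)
print(w[60], 1 / s)

# Periodic case: v_2 = 1, so d_v = 2 and (w_n) alternates without converging.
w_periodic = renewal_sequence([0.0, 1.0], 6)
print(w_periodic)
```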
Proof. For ease of notation, throughout we denote s := ∑_{m=1}^{∞} m · v_m. First, we show by
induction that 0 ≤ w_n ≤ 1, for each n ∈ N0. To start, notice that w_1 = v_1 · w_0 = v_1 ≤ 1.
Assuming that w_k ≤ 1 for all k < n, the renewal equation then gives
\[
w_n = v_1 \cdot w_{n-1} + v_2 \cdot w_{n-2} + \cdots + v_n \cdot w_0 \le \sum_{k=1}^{n} v_k \le 1.
\]
² This can be seen, for instance, by considering moduli of integers. The interested reader is referred
to Section 2.9 of The Theory of Numbers, by Hardy and Wright [HW08].
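Let w := lim sup_{n→∞} w_n and pick a subsequence (w_{n_k})_{k∈N} with the property that
lim_{k→∞} w_{n_k} = w. Then, for all m ≥ 1, we have, via Lemma 3.2.3, that
\[
\begin{aligned}
w = \lim_{k\to\infty} w_{n_k} &= \lim_{k\to\infty} \Big( v_m \cdot w_{n_k - m} + \sum_{\substack{1 \le s \le n_k \\ s \ne m}} v_s \cdot w_{n_k - s} \Big) \\
&= \liminf_{k\to\infty} v_m \cdot w_{n_k - m} + \limsup_{k\to\infty} \sum_{\substack{1 \le s \le n_k \\ s \ne m}} v_s \cdot w_{n_k - s} \\
&\le v_m \liminf_{k\to\infty} w_{n_k - m} + \sum_{s \ne m} v_s \limsup_{k\to\infty} w_{n_k - s} \\
&\le v_m \liminf_{k\to\infty} w_{n_k - m} + (1 - v_m)\, w.
\end{aligned}
\]
From this it follows immediately that v_m w ≤ v_m lim inf_{k→∞} w_{n_k−m} and therefore,
provided that v_m > 0, we obtain
\[
w \le \liminf_{k\to\infty} w_{n_k - m} \le \limsup_{k\to\infty} w_{n_k - m} \le w.
\]
Thus,
\[
\lim_{k\to\infty} w_{n_k - m} = w. \tag{3.1}
\]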
Applying this argument many times over, one obtains that equation (3.1) holds for
all m such that there exist positive integers m_1, . . . , m_j with each v_{m_i} > 0 so that m =
m_1 + . . . + m_j. Given that d_v = 1, from Lemma 3.2.2 it follows that every large enough
m has this form (where we can do without the factors c_i, as there is no reason that the
integers m_i have to be distinct). In other words, there exists some M ∈ N such that (3.1)
holds for every m ≥ M.
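Now, for each n ∈ N0, set
\[
r_n := \sum_{m=n+1}^{\infty} v_m.
\]
Then r_0 = 1 and
\[
\sum_{m=1}^{\infty} m \cdot v_m = \sum_{m=1}^{\infty} v_m + \sum_{m=2}^{\infty} v_m + \sum_{m=3}^{\infty} v_m + \ldots = \sum_{n=0}^{\infty} r_n.
\]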
From the renewal equation and the fact that r_m − r_{m−1} = −v_m, we deduce that
\[
r_0 \cdot w_n = w_n = -\sum_{m=1}^{n} (r_m - r_{m-1})\, w_{n-m}
\]
and, by bringing the negative terms to the left-hand side, we can write this in the
following way:
\[
\sum_{m=0}^{n} r_m \cdot w_{n-m} = \sum_{m=0}^{n-1} r_m \cdot w_{n-1-m}. \tag{3.2}
\]
If we define the left-hand side of Equation (3.2) to be equal to s_n, then the right-hand
side of Equation (3.2) is equal to s_{n−1}. Note that s_0 = r_0 · w_0 = 1. Thus, in light of
Equation (3.2), we have that s_n = 1, for all n ∈ N. In particular,
\[
\sum_{i=0}^{n_k - M} r_i \cdot w_{n_k - (M+i)} = 1. \tag{3.3}
\]
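The two identities used at this point of the proof, namely that the tails r_n sum to the mean ∑ m · v_m and that the convolution ∑_{i=0}^{n} r_i · w_{n−i} is equal to 1 for every n, are easy to confirm in exact arithmetic. The sketch below does so for an illustrative finitely supported probability vector; the variable names follow the proof, while the particular vector is an assumption made for the example.

```python
from fractions import Fraction

v = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]   # v_1, v_2, v_3 (illustrative)
N = 20


def v_at(m):
    """v_m, with v_m = 0 outside the stored range."""
    return v[m - 1] if 1 <= m <= len(v) else Fraction(0)


# w_n from the renewal equation, starting from w_0 = 1.
w = [Fraction(1)]
for n in range(1, N + 1):
    w.append(sum(v_at(m) * w[n - m] for m in range(1, n + 1)))

# Tails r_n = sum_{m > n} v_m; the slice v[n:] holds v_{n+1}, v_{n+2}, ...
r = [sum(v[n:], Fraction(0)) for n in range(N + 1)]

# s_n = sum_{i=0}^n r_i * w_{n-i}, which should be identically 1.
s_values = [sum(r[i] * w[n - i] for i in range(n + 1)) for n in range(N + 1)]
mean = sum(m * v_at(m) for m in range(1, len(v) + 1))
print(set(s_values), sum(r), mean)
```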
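We will now show that w = 1/s. First, suppose that s is finite. In that case, for all ε > 0
there exists N ∈ N with
\[
r_0 + r_1 + \ldots + r_N \ge s - \varepsilon.
\]
Since all the terms in (3.3) are non-negative, for all k with n_k − M ≥ N we have
\[
1 \ge \sum_{i=0}^{N} r_i \cdot w_{n_k - (M+i)}.
\]
Letting k tend to infinity and using (3.1), this gives 1 ≥ w(r_0 + . . . + r_N) ≥ w(s − ε).
On the other hand, since w_n ≤ 1 for all n and ∑_{i>N} r_i ≤ ε, Equation (3.3) also yields
\[
1 \le \varepsilon + \sum_{i=0}^{N} r_i \cdot w_{n_k - (M+i)}.
\]
Letting k tend to infinity once more, we obtain
\[
1 \le \varepsilon + w \sum_{m=1}^{\infty} m \cdot v_m,
\]
that is, 1 ≤ ε + ws. Since ε > 0 was arbitrary, the two estimates combine to give w = 1/s.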
If we are instead in the situation that s is infinite, we have for all C > 0, that there
exists an N ∈ N such that
\[
r_0 + r_1 + \ldots + r_N > C,
\]
from which, in a similar way to the above, we obtain the inequality 1 ≥ Cw. Since C
can be arbitrarily large, it follows that w = 0. Notice that if lim sup_{n→∞} w_n = 0, then we
must have that lim_{n→∞} w_n = 0, as these are all non-negative numbers. Therefore, in the case
where s is infinite, the proof is finished.
In the case where s is finite, we also have to show that lim inf n→∞ w n = 1/s. This
proceeds analogously, starting by setting w := lim inf n→∞ w n and then choosing a
subsequence that achieves this lower limit.
Remark 3.2.5. In the above proof, if it so happens that v m > 0 for every m ∈ N, we
could dispense with the slight complication of having to use Lemma 3.2.2, since in
this situation we have that Equation (3.1) holds for every m ∈ N.
We will now state some stronger renewal results obtained by Garsia and Lamperti
[GL63], and by Erickson [Eri70]. Their results are for the case where the limit in the
statement of the discrete renewal theorem is equal to zero. They study the manner in
which the sequence (w n )n≥0 tends to zero, under a certain additional hypothesis which
we will now describe. Let the sequences ((v n )n≥1 , (w n )n≥0 ) be a given renewal pair and
let the two associated sequences (V n )n∈N and (W n )n∈N be defined, for all n ∈ N, by
\[
V_n := \sum_{k=n}^{\infty} v_k \quad \text{and} \quad W_n := \sum_{k=1}^{n} w_k. \tag{3.4}
\]
Then the principal assumption in these strong renewal results is that V_n satisfies
\[
V_n = \psi(n)\, n^{-\theta},
\]
for all n ∈ N, for some θ ∈ [0, 1] and for some slowly varying function ψ. (Recall
that slowly varying functions were defined in Section 1.4.4.) Before stating the
theorem, let us also remind the reader that the notation “f (n) ∼ g(n)” means that
limn→∞ f (n)/g(n) = 1. Finally, the constants appearing on the right-hand side of the
first two statements are given in terms of the gamma function (which was originally
introduced by Euler). The gamma function is an extension of the factorial function to
complex arguments, so we have Γ(n) = (n − 1)!, and, considered as an extension to the
open right-half plane, it has no zeros. For more details, we refer the interested reader
to the book Complex Analysis by Gamelin [Gam01].
Finally, for θ ∈ (0, 1/2) we have that the limit in the latter formula does not have to exist
in general. However, for θ ∈ (0, 1/2] it is shown in [GL63, Theorem 1.1] that one at least
has
\[
\liminf_{n\to\infty} n \cdot w_n \cdot V_n = \frac{1}{\Gamma(\theta)\Gamma(1-\theta)} = \frac{\sin \pi\theta}{\pi},
\]
and that the limit exists if we restrict the indices to a set of integers whose complement
is of zero density.³
We will not rigorously prove these strong renewal results, as the proofs are
decidedly non-trivial. However, we will provide a sketch of some of the main ideas.
The proof of the first statement in the strong renewal results by Garsia/Lamperti and
Erickson is reasonably straightforward, although it does use some fairly heavy ana-
lytic machinery. The deep result underlying this statement is Karamata’s Tauberian
Theorem, which we state below in the setting of power series (the proof can be found
in [Fel68b]). Before stating this theorem, let us recall the following definitions:
– A measurable function ψ : R+ → R+ is said to be slowly varying if
\[
\lim_{x\to\infty} \frac{\psi(xy)}{\psi(x)} = 1, \quad \text{for all } y > 0.
\]
f (x) = x ρ · ψ(x),
³ If we set A(n) := {1, . . . , n} ∩ A, then the density of a set of integers A is given, where the limit
exists, by d(A) := lim_{n→∞} #A(n)/n. For example, if A := {n² : n ∈ N}, then since #A(n) ≤ √n we have
that d(A) = 0.
Theorem 3.2.6 (Karamata’s Tauberian Theorem). Let b_n ≥ 0 for all n ∈ N0 and suppose
that the series
\[
B(s) := \sum_{n=0}^{\infty} b_n s^n
\]
converges for 0 ≤ s < 1. If ψ is slowly varying and 0 ≤ ρ < ∞, then the following two
statements are equivalent:
(a) \( B(s) \sim \dfrac{1}{(1-s)^{\rho}} \cdot \psi\Big(\dfrac{1}{1-s}\Big) \), as s → 1−.
(b) \( \sum_{k=0}^{n-1} b_k \sim \dfrac{n^{\rho} \cdot \psi(n)}{\Gamma(1+\rho)} \), as n → ∞.
Furthermore, if the sequence (b_n)_{n∈N} is monotonic and 0 < ρ < ∞, then (a) is equivalent to
(c) \( b_n \sim \dfrac{n^{\rho-1} \cdot \psi(n)}{\Gamma(\rho)} \), as n → ∞.
Finally, if for a family of sequences (b_n^x)_{x∈X} the asymptotic in (b) holds uniformly in
x ∈ X then so does the asymptotic in (a).
Proof. See [Fel68b], Theorem 5 in Section XIII.5 and, for the uniformity, a detailed
inspection of the proof of the Extended Continuity Theorem (Section XIII.1 Theorem
2a) is needed (cf. Exercises 3.3.6 and 3.3.7).
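As a sanity check of the three asymptotics in Karamata’s Tauberian Theorem, consider the simplest non-trivial example b_n := n + 1, for which B(s) = 1/(1 − s)², so that ρ = 2 and ψ ≡ 1. This example is our own illustrative choice; the sketch below merely evaluates the three ratios numerically at one large argument each and does not replace the proof.

```python
from math import gamma

rho = 2.0


def b(n):
    """Coefficients of the power series of 1/(1 - s)^2."""
    return n + 1


def B(s, terms=50_000):
    """Truncated power series sum_{n < terms} b_n s^n (the tail is negligible for s = 0.999)."""
    return sum(b(n) * s**n for n in range(terms))


s = 0.999
n = 10_000

ratio_a = B(s) * (1 - s) ** rho                                     # statement (a)
ratio_b = sum(b(k) for k in range(n)) / (n**rho / gamma(1 + rho))   # statement (b)
ratio_c = b(n) / (n ** (rho - 1) / gamma(rho))                      # statement (c)
print(ratio_a, ratio_b, ratio_c)
```

All three ratios are close to 1, as the theorem predicts for a monotonic sequence with ρ > 0.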
For the following discussion, we will need to use the notion of a generating function
(see also Chapter XI of [Fel68b]).
\[
C(s) := \sum_{n=0}^{\infty} c_n s^n.
\]
Then, if C(s) is convergent in some interval −s0 < s < s0 , we say that C(s) is the
generating function of the sequence (c i )i≥0 . Note that if the sequence (c i )i≥0 is bounded,
then C(s) certainly converges in the interval |s| < 1.
Now, for a given renewal pair ((v n )n≥1 , (w n )n≥0 ), and with W n and V n defined as in
(3.4), we wish to use Karamata’s Tauberian Theorem to prove that
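\[
W_n \sim (\Gamma(2-\theta)\Gamma(1+\theta))^{-1} \cdot n \cdot \Big(\sum_{k=1}^{n} V_k\Big)^{-1}.
\]
\[
\sum_{k=1}^{n} V_k \sim (1-\theta)^{-1} \cdot n^{1-\theta} \cdot \psi(n), \quad \text{as } n \to \infty.
\]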
\[
V(t) := \sum_{n=0}^{\infty} V_n t^n \quad \text{and} \quad W(t) := \sum_{n=0}^{\infty} w_n t^n.
\]
Recalling both that V_n = n^{−θ} ψ(n), where θ ∈ [0, 1] and ψ is a slowly varying function,
and that V_n is monotonically decreasing, we can apply Theorem 3.2.6 (c) =⇒ (a) to
obtain
\[
V(t) \sim \Gamma(1-\theta)\,(1-t)^{\theta-1}\, \psi\Big(\frac{1}{1-t}\Big).
\]
Multiplying out and gathering coefficients yields that V(t)W(t) = 1/(1 − t), or, in other
words,
\[
W(t) = \frac{1}{(1-t)\,V(t)}.
\]
Thus,
\[
W(t) \sim \frac{1}{\Gamma(1-\theta)\,\psi\big(\frac{1}{1-t}\big)}\,(1-t)^{-\theta}.
\]
The proofs of the other parts of the strong renewal results quoted above also rely on
Theorem 3.2.6, as well as on some intricate estimates of integrals. We leave the details
to the intrepid reader.
We begin our discussion with the crucial observation that the sequence of the
Lebesgue measures of these α-sum-level sets satisfies a renewal equation (as observed
in [WX11], [Mun11] and [KMS12]). Here, the role of the probability vector is filled by the
sequence of Lebesgue measures of the partition elements of α, that is, the sequence
(a m )m≥1 .
Lemma 3.2.8. We have that ((a_n)_{n≥1}, (λ(C_n^{(α)}))_{n≥0}) defines a renewal pair. That is, for each n ∈ N,
we have
\[
\lambda\big(C_n^{(\alpha)}\big) = \sum_{m=1}^{n} a_m\, \lambda\big(C_{n-m}^{(\alpha)}\big).
\]
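Proof. Since λ(C_0^{(α)}) = 1 and λ(C_1^{(α)}) = a_1, the assertion certainly holds for n = 1. For
n ≥ 2, the following calculation finishes the proof.
\[
\begin{aligned}
\lambda\big(C_n^{(\alpha)}\big) &= \lambda(C_\alpha(n)) + \sum_{m=1}^{n-1} \sum_{\substack{C_\alpha(\ell_1,\ldots,\ell_k,m) \in C_n^{(\alpha)} \\ k \in \mathbb{N}}} \lambda(C_\alpha(\ell_1, \ldots, \ell_k, m)) \\
&= \lambda(C_\alpha(n)) + \sum_{m=1}^{n-1} a_m \sum_{\substack{C_\alpha(\ell_1,\ldots,\ell_k) \in C_{n-m}^{(\alpha)} \\ k \in \mathbb{N}}} \lambda(C_\alpha(\ell_1, \ldots, \ell_k)) \\
&= a_n\, \lambda\big(C_0^{(\alpha)}\big) + \sum_{m=1}^{n-1} a_m\, \lambda\big(C_{n-m}^{(\alpha)}\big) = \sum_{m=1}^{n} a_m\, \lambda\big(C_{n-m}^{(\alpha)}\big).
\end{aligned}
\]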
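Lemma 3.2.8 makes the measures of the α-sum-level sets computable from the partition lengths a_n alone. The sketch below applies the recursion to two partitions chosen here purely for illustration: a_n = 2^{−n}, for which the tails t_k = ∑_{m≥k} a_m sum to 2, and a_n = 1/(n(n + 1)), for which t_k = 1/k and the series ∑ t_k diverges. The function name is hypothetical.

```python
from fractions import Fraction


def sum_level_measures(a, n_max):
    """Return [w_0, ..., w_{n_max}] with w_n = lambda(C_n^(alpha)), via the renewal equation.

    `a` is a function m -> a_m giving the Lebesgue measure of the m-th partition element.
    """
    w = [1]
    for n in range(1, n_max + 1):
        w.append(sum(a(m) * w[n - m] for m in range(1, n + 1)))
    return w


def dyadic(m):
    """a_m = 2^(-m), in exact arithmetic (illustrative partition)."""
    return Fraction(1, 2**m)


def harmonic(m):
    """a_m = 1/(m(m+1)), in floating point (illustrative partition)."""
    return 1.0 / (m * (m + 1))


w_dyadic = sum_level_measures(dyadic, 30)
w_harmonic = sum_level_measures(harmonic, 200)
print(w_dyadic[1:5], w_harmonic[2], w_harmonic[200])
```

In the first case every λ(C_n^{(α)}) with n ≥ 1 is exactly 1/2, which is the reciprocal of ∑ t_k; in the second case the values decay slowly.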
We are now in a position to prove our main results. The first of these is valid for
arbitrary partitions, but for the second we must restrict ourselves to partitions that
are either expansive of exponent θ ∈ [0, 1] or of finite type (recall that these were
introduced in Definition 1.4.18). The proof of the first statement of the first main result
again makes use of the notion of a generating function, which was defined above (see
Definition 3.2.7).
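Theorem 3.2.9. For the α-sum-level sets of an arbitrary given partition α ∈ A we have
that ∑_{n=1}^{∞} λ(C_n^{(α)}) diverges, and that
\[
\lim_{n\to\infty} \lambda\big(C_n^{(\alpha)}\big) =
\begin{cases}
0 & \text{if } F_\alpha \text{ is of infinite type;} \\
\big(\sum_{k=1}^{\infty} t_k\big)^{-1} & \text{if } F_\alpha \text{ is of finite type.}
\end{cases}
\]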
Proof. The general form of the discrete renewal theorem given in Theorem 3.2.4 above
can be applied directly to our specific situation. For this, fix some partition α ∈ A,
and set v n := λ(A n ) = a n , for each n ∈ N. Let us recall again that this is certainly a
probability vector. Then, put w_n := λ(C_n^{(α)}), for each n ∈ N0. In light of Lemma 3.2.8 and
the observation that w_0 = λ(C_0^{(α)}) = 1, we then have that these particular sequences
(v_n)_{n≥1} and (w_n)_{n≥0} are indeed a renewal pair. Consequently, an application of the
discrete renewal theorem immediately implies that
\[
\lim_{n\to\infty} \lambda\big(C_n^{(\alpha)}\big) = \Big(\sum_{k=1}^{\infty} k \cdot a_k\Big)^{-1} = \Big(\sum_{k=1}^{\infty} t_k\Big)^{-1},
\]
where this limit is equal to zero if ∑_{k=1}^{∞} t_k diverges. Note that by Lemma 2.3.20, the
divergence of the latter series is equivalent to the statement that the partition α is of
infinite type.
For the remaining assertion, let us consider the two generating functions V and
W, which are given by
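\[
V(s) := \sum_{n=1}^{\infty} v_n s^n \quad \text{and} \quad W(s) := \sum_{m=0}^{\infty} w_m s^m.
\]
Using the Cauchy product formula for the two power series in tandem with the renewal
equation provided in Lemma 3.2.8, we have that
\[
W(s)V(s) = \sum_{n=1}^{\infty} s^n \sum_{m=1}^{n} v_m w_{n-m} = \sum_{n=1}^{\infty} w_n s^n = W(s) - 1. \tag{3.5}
\]
Hence W(s) = 1/(1 − V(s)) for 0 ≤ s < 1 and, since lim_{s→1−} V(s) = ∑_{n=1}^{∞} v_n = 1, we obtain
lim_{s→1−} W(s) = ∞,
which shows that the series ∑_{n=0}^{∞} λ(C_n^{(α)}) diverges. This finishes the proof.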
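The relation (3.5) between the two generating functions can be tested numerically. In the sketch below, W(s) is approximated by a truncated power series built from the renewal equation and compared with the closed form 1/(1 − V(s)) obtained by solving (3.5) for W; the finitely supported vector (1/2, 1/4, 1/4) is an illustrative assumption, and the helper names are our own.

```python
v = [0.5, 0.25, 0.25]          # v_1, v_2, v_3 (illustrative probability vector)


def V(s):
    """Generating function V(s) = sum_{n >= 1} v_n s^n (a polynomial here)."""
    return sum(vn * s**n for n, vn in enumerate(v, start=1))


def W_truncated(s, terms=200):
    """Partial sum of W(s) = sum_{n >= 0} w_n s^n, with w_n from the renewal equation."""
    w = [1.0]
    for n in range(1, terms):
        w.append(sum(v[m - 1] * w[n - m] for m in range(1, min(n, len(v)) + 1)))
    return sum(wn * s**n for n, wn in enumerate(w))


def W_closed(s):
    """W(s) = 1 / (1 - V(s)), obtained by rearranging W(s)V(s) = W(s) - 1."""
    return 1.0 / (1.0 - V(s))


s = 0.5
print(W_truncated(s), W_closed(s))        # both close to 32/21
print(W_closed(0.9), W_closed(0.999))     # growth of W(s) as s approaches 1
```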
Theorem 3.2.10. For a given partition α which is either expansive of exponent θ ∈ [0, 1]
or of finite type, we have the following estimates for the asymptotic behaviour of the
Lebesgue measure of the α-sum-level sets.
(a) With K_θ := (Γ(2 − θ)Γ(1 + θ))^{−1} for α expansive of exponent θ ∈ [0, 1] and with K_θ := 1
for α of finite type, we have that
\[
\sum_{k=1}^{n} \lambda\big(C_k^{(\alpha)}\big) \sim K_\theta \cdot n \cdot \Big(\sum_{k=1}^{n} t_k\Big)^{-1}.
\]
(b) With k_θ := (Γ(2 − θ)Γ(θ))^{−1} for α expansive of exponent θ ∈ (1/2, 1] and with k_θ := 1
for α of finite type, we have that
\[
\lambda\big(C_n^{(\alpha)}\big) \sim k_\theta \cdot \Big(\sum_{k=1}^{n} t_k\Big)^{-1}.
\]
Moreover, if θ ∈ (0, 1/2), then the corresponding limit does not exist in general.
However, in this situation the existence of the limit is always guaranteed at least
on the complement of some set of integers of zero density.
Proof. The statements in the theorem concerning partitions α of finite type follow
easily from Theorem 3.2.9. Indeed, given that
\[
\lim_{n\to\infty} \frac{\lambda\big(C_n^{(\alpha)}\big)}{\big(\sum_{k=1}^{n} t_k\big)^{-1}} = \lim_{n\to\infty} \lambda\big(C_n^{(\alpha)}\big) \cdot \lim_{n\to\infty} \sum_{k=1}^{n} t_k = 1,
\]
the statement in part (b) follows immediately. The corresponding claim in part (a) follows
directly on considering the Cesàro average of the sequence of Lebesgue measures
of α-sum-level sets. Similarly to the proof of Theorem 3.2.9, the remainder of the proof
(that is, those parts concerning partitions that are expansive of exponent θ) follows
from straightforward applications of the strong renewal results of Garsia/Lamperti
and Erickson to the setting of the α-sum-level sets. For this we must set v_n := a_n, V_n := t_n
and w_n := λ(C_n^{(α)}), and recall that the so-defined pair of sequences ((v_n)_{n≥1}, (w_n)_{n≥0})
satisfies the conditions of a renewal pair.
3.3 Exercises
Exercise 3.3.1. Let (v_n) be an infinite probability vector with generating function V (see
Definition 3.2.7). Show that
\[
\lim_{s\to 1^-} \frac{1 - V(s)}{1 - s} = \sum_{m=1}^{\infty} m \cdot v_m.
\]
Exercise 3.3.2. Consider the renewal pair (v n , w n ) with the extra assumption that v1 :=
p ∈ (0, 1) and v2 := 1 − p. Determine the values of the sequence (w n ) explicitly with the
help of the generating functions V and W, and verify that the speed of convergence in
the renewal theorem is in fact exponential. (Hint: Use the relation (3.5) to show that
W = 1/(1 − V) is a rational function with two poles, one in s_1 = 1 and another one
in s_2. Find the residues and determine the power series of W by using the identity
(1 − s/s_k)^{−1} = ∑_{n≥0} (s/s_k)^n, for k = 1, 2.)
Exercise 3.3.3. Generalise the ideas of Exercise 3.3.2 to the case that more than two,
but only finitely many, of the v n are non-zero and such that d v = 1. (Hint: As an
intermediate step show that 1− V(s) is a polynomial of finite degree which has a simple
root in 1 and all the other roots are of modulus strictly greater than 1. Also make use
of Exercise 3.3.1.)
Exercise 3.3.4. In the following exercise we employ a very useful representation for
slowly varying functions. Assume that L is a slowly varying function. Then there
exist constants c ∈ R and a > 0, a bounded measurable function η and a continuous
function δ, both defined on [a, ∞), with lim_{x→∞} η(x) = c and lim_{x→∞} δ(x) = 0 such that
for all x ≥ a we have
\[
L(x) = \exp\Big( \eta(x) + \int_a^x \frac{\delta(t)}{t}\,dt \Big).
\]
2. We have
\[
\lim_{x\to\infty} \frac{\log(L(x))}{\log(x)} = 0.
\]
U x (t) ∼ t ρ ψ(t),
Hint: First show that for some δ ∈ (0, 1) the quotient ω_x(δ · τ)/(τ^{−ρ} ψ(1/τ)) stays uniformly
bounded as τ tends to zero (for this split the domain of integration into the points 2^k/τ,
k ∈ N and use integration by parts). Then use Exercise 3.3.4 (1) with a > 0 sufficiently
small and b < ∞ sufficiently large to split the domain of integration in the definition of
the Laplace transform into a convergent part and two negligible parts.
Exercise 3.3.7. Use Exercise 3.3.6 to prove the uniformity statement in Karamata’s
Tauberian Theorem 3.2.6.
Hint: Consider the distribution function U_x(t) := ∑_{k=0}^{⌊t⌋−1} b_k^x + (t − ⌊t⌋) b_{⌊t⌋}^x and make the
change of variables y = exp(−t).
4 Infinite ergodic theory
In this chapter we will make a deeper journey into infinite ergodic theory, taking
up where we left off in Chapter 2. In the following chapter, we shall then see some
applications of this general theory to continued fractions.
4.1 The functional analytic perspective and the Chacon–Ornstein Ergodic Theorem
Our first main result of this section will be the Chacon–Ornstein Ergodic Theorem
[CO60], which is stated completely in terms of linear operators acting on L1 (μ).
After having proved this powerful result, we will then see that it implies Birkhoff’s
Pointwise Ergodic Theorem (which we have already seen in Theorem 2.4.16), Hopf’s
Ergodic Theorem (cf. Theorem 2.4.24), as well as Hurewicz’s Ergodic Theorem (Corol-
lary 4.1.18), which we will specifically need later in this chapter. The proof, as you
might expect, is not trivial. Before stating and proving the Chacon–Ornstein Ergodic
Theorem, we will first collect a few useful observations concerning the functional
analytic nature of this part of the theory. In particular, we shall now study the
previously-defined transfer operator T̂ and Koopman operator U_T : f ↦ f ◦ T from a
broader functional-analytic perspective (recall that the Koopman operator U_T was first
mentioned in Remark 2.3.8).
Let (X, B , μ) denote a σ-finite measure space. We remind the reader that the space
of integrable functions L1 (μ), in which functions are identified if they are a.e. equal,
together with the norm ‖·‖_1 given by
\[
\|f\|_1 := \int |f|\,d\mu,
\]
defines a Banach space. The space L∞(μ) equipped with the norm ‖·‖_∞ given by
\[
\|f\|_\infty := \inf\{c \in \mathbb{R} : |f| \le c \ \text{a.e.}\}
\]
also defines a Banach space. Further, let L^p_+(μ) denote the set of non-negative
functions from L^p(μ) and recall that the non-negative part ψ⁺ of a measurable
real-valued function ψ is given by the measurable function ψ⁺ := max{ψ, 0}.
We shall now study bounded linear operators acting on L1(μ); these are linear
functions V : L1(μ) → L1(μ) with bounded operator norm, which is defined to be
\[
\|V\| := \sup_{\|f\|_1 = 1} \|V(f)\|_1.
\]
Lemma 4.1.2. For the Koopman and the transfer operator we have:
(a) If (X, B , μ, T) is a measure-preserving dynamical system, then U T : f → f ◦ T is a
positive contraction on L1 (μ).
(b) If (X, B, μ, T) is a non-singular dynamical system, then T̂ is a positive contraction
on L1(μ).
which shows that U T is well defined and a contraction. It is clear that U T is positive,
so the proof of part (a) is finished.
Towards part (b), as we recalled above, we have already seen that T̂ has norm 1.
Positivity follows from the fact that for a given f ∈ L^1_+(μ) we have, for all g ∈ L^∞_+(μ), that
\[
\int \widehat{T}f \cdot g\,d\mu = \int f \cdot g \circ T\,d\mu \ge 0.
\]
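The duality between the transfer operator and the Koopman operator, as well as the contraction property, can be made completely explicit on a finite state space. The following toy model is our own illustrative assumption and is not an example from the text: X = {0, 1, 2, 3} carries a measure μ with strictly positive point masses, T is an arbitrary map on X, and the transfer operator is given by (T̂f)(y) = μ(y)^{−1} ∑_{x : T(x) = y} f(x) μ(x).

```python
from fractions import Fraction

# Toy model (illustrative): four states, positive weights, an arbitrary map T.
X = range(4)
mu = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
T = [1, 2, 0, 0]          # T(0) = 1, T(1) = 2, T(2) = 0, T(3) = 0


def integral(f):
    """Integral of f with respect to mu."""
    return sum(f[x] * mu[x] for x in X)


def koopman(g):
    """Koopman operator: U_T g = g o T."""
    return [g[T[x]] for x in X]


def transfer(f):
    """Transfer operator: density of the image of the measure f dmu under T, relative to mu."""
    return [sum(f[x] * mu[x] for x in X if T[x] == y) / mu[y] for y in X]


f = [1, -2, 3, -5]
g = [2, 0, 1, 7]

lhs = integral([a * b for a, b in zip(transfer(f), g)])   # integral of (T^ f) * g
rhs = integral([a * b for a, b in zip(f, koopman(g))])    # integral of f * (g o T)
norm_f = integral([abs(a) for a in f])
norm_Tf = integral([abs(a) for a in transfer(f)])
print(lhs == rhs, norm_Tf, norm_f)
```

Here the two integrals agree exactly, and ‖T̂f‖_1 is strictly smaller than ‖f‖_1 because the values of f at the two preimages of the state 0 partly cancel.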
Definition 4.1.3. Let V : L1 (μ) → L1 (μ) be a bounded linear operator. Then the dual
of V, which will be denoted by V*, is an operator acting on the dual space (L1(μ))* ≅
L∞(μ) which is uniquely determined by the identity
\[
\int f \cdot V^*(g)\,d\mu := \int V(f) \cdot g\,d\mu,
\]
S_n f := ∑_{k=0}^{n−1} V^k f for n ∈ N ∪ {∞}, and we set S_0 f := 0.
Note that for V = U T , this definition coincides with our definition in the context of
dynamical systems as introduced in Section 2.4.6.
We can now continue towards the proof of the Chacon–Ornstein Ergodic Theorem
and its immediate corollaries, i.e., to Hopf’s, Birkhoff’s and Hurewicz’s Pointwise
Ergodic Theorems.
\[
\lim_{n\to\infty} \frac{V^n \varphi}{S_n g} = 0.
\]
\[
E_n := \{x \in X : V^n \varphi(x) > \varepsilon S_n g(x)\}.
\]
The aim is to show that ∑_{n=2}^{∞} μ_g(E_n) < ∞, for μ_g given by dμ_g := g dμ, where we
assume without loss of generality that ∫ g dμ = 1. This will be sufficient, since then
by the Borel–Cantelli Lemma (see Lemma 1.2.18), we have that the set of points which
lie in infinitely many of the sets E_n is of μ_g-measure equal to zero; then taking the
complement of this limsup set and noting that g > 0 as well as that ε > 0 was arbitrary,
this gives lim sup_{n→∞} V^n φ/S_n g ≤ 0 μ-a.e. Applying this result to −φ instead of φ shows that
μ-a.e. we also have lim inf_{n→∞} V^n φ/S_n g ≥ 0, giving finally the assertion.
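Now, since V is a positive operator and both 0 and ψ ∈ L1(μ) are less than or equal
to ψ⁺, we have 0 = V(0) ≤ V(ψ⁺) as well as V(ψ) ≤ V(ψ⁺), and hence (Vψ)⁺ ≤ (V(ψ⁺))⁺ =
V(ψ⁺). Using this and the fact that V(V^n φ − εS_n g) = V^{n+1} φ − S_{n+1} εg + εg, it follows that
\[
\begin{aligned}
\big(V^{n+1}\varphi - \varepsilon S_{n+1} g\big)^+ + \mathbb{1}_{E_{n+1}} \varepsilon g &= \mathbb{1}_{E_{n+1}} \big(V^{n+1}\varphi - \varepsilon S_{n+1} g\big) + \mathbb{1}_{E_{n+1}} \varepsilon g \\
&= \mathbb{1}_{E_{n+1}} \big(V^{n+1}\varphi - S_{n+1} \varepsilon g + \varepsilon g\big) \\
&= \mathbb{1}_{E_{n+1}} \big(V(V^n \varphi - \varepsilon S_n g)\big)^+ \\
&\le V\big((V^n \varphi - \varepsilon S_n g)^+\big).
\end{aligned}
\]
To shorten the notation below, let us set J_n := (V^n φ − εS_n g)⁺. Then the above
inequality together with the fact that V is a contraction implies that
\[
\varepsilon \int \mathbb{1}_{E_{n+1}}\, g\,d\mu \le \int (V J_n - J_{n+1})\,d\mu \le \int (\|V\| J_n - J_{n+1})\,d\mu \le \int (J_n - J_{n+1})\,d\mu.
\]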
\[
\le f + V\Big( \max_{1 \le j \le n} \sum_{k=0}^{j-2} V^k f \Big) = f + V(f_{n-1}) \le f + V(f_n).
\]
Before stating the next result, let us fix some notation. We let Q_n(φ, g) := S_n φ/S_n g and
define Q̃(φ, g) := sup_{n∈N} Q_n(φ, g).
Since ({max_{1≤k≤n} Q_k(φ − sg, g) > 0})_{n∈N} is an increasing sequence of sets with union
equal to {Q̃(φ, g) > s}, using the continuity of μ_g from below finishes the proof.
Let g ∈ L1(μ) such that g > 0. By letting s tend to infinity in the previous lemma, we
find that Q̃(φ, g) < ∞ μ-a.e., for each φ ∈ L1(μ), and in particular for those φ such
that φ > 0. In fact, this shows that {S_∞ g = ∞} = {S_∞ φ = ∞} mod μ and hence the set
{S_∞ g = ∞} is μ-a.e. independent of g.
Remark 4.1.8. This definition further generalizes our notion of Hopf decompositions.
Let (X, B , μ, T) be a measure-theoretical dynamical system. Then the following hold:
(a) If the system is measure-preserving, then
\[
C_T = C_{U_T} \mod \mu.
\]
(b) If the system is non-singular, then
\[
C_T = C_{\widehat{T}} \mod \mu.
\]
Lemma 4.1.9. Let V be a conservative positive contraction on L 1 (μ) and let φ ∈ L∞ (μ)
be given such that either V * φ ≥ φ or V * φ ≤ φ. Then V * φ = φ.
Proof. For the case that V * φ ≥ φ, let g ∈ L1 (μ) be fixed such that g > 0. We then have
that
\[
0 \le \int (V^*\varphi - \varphi) \sum_{k=0}^{n-1} V^k g\,d\mu = \int \varphi\,(V^n g - g)\,d\mu \le 2\,\|\varphi\|_\infty \int g\,d\mu < \infty.
\]
Since by assumption X_∞(g) = X, we have that S_n g = ∑_{k=0}^{n−1} V^k g is unbounded μ-a.e. on
X. Therefore, the latter inequality can be satisfied only if V*φ = φ.
The case V * φ ≤ φ can be treated in an analogous way and is left to the reader.
Lemma 4.1.11. Let V be a positive conservative contraction on L1 (μ) and let φ ∈ L∞ (μ)
be V *-invariant, that is, φ = V * φ. Then φ+ and 1{a<φ≤b} are also both V * -invariant, for
all a, b ∈ R.
Proof. Fix φ ∈ L∞ (μ) such that φ = V * φ. Since V * is a positive operator, we have that
V * φ+ ≥ (V * φ)+ = φ+ and hence we can apply Lemma 4.1.9, which tells us that V * φ+ =
φ+ . These observations together with Example 4.1.10 give that for each a ∈ R,
\[
V^*(\varphi - a\mathbb{1}_X) = \varphi - a\mathbb{1}_X \quad \text{and} \quad V^*(\varphi - a\mathbb{1}_X)^+ = (\varphi - a\mathbb{1}_X)^+.
\]
Next, observe that the sequence (h_n)_{n≥1} given by h_n := n(1/n − (1/n − (φ − a)⁺)⁺) converges
monotonically from below to the indicator function 1_{a<φ}. Since by the above
observations, all elements h_n in this sequence are V*-invariant, we get V*1_{a<φ} ≥
V*h_n = h_n ↗ 1_{a<φ}. Again by Lemma 4.1.9 we have that V*1_{a<φ} = 1_{a<φ}. Now, the
lemma follows on observing that 1_{a<φ≤b} = 1_{a<φ} − 1_{b<φ}.
For a positive conservative contraction, the fixed points of its dual can be characterized
in the following way.
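Lemma 4.1.12. Let V be a positive conservative contraction on L1(μ) and let φ ∈ L∞(μ).
Then the following equivalence holds:
\[
V^*\varphi = \varphi \iff V(\varphi \cdot h) = \varphi \cdot V(h) \ \text{for all } h \in L^1_+(\mu).
\]
Proof. Let φ ∈ L∞(μ) and assume that we have V(φ · h) = φ · V(h), for all h ∈ L^1_+(μ).
Note that by Example 4.1.10, we have that 1_X = V*1_X. Using this, it follows that for all
h ∈ L^1_+(μ) we have
\[
\begin{aligned}
\int V^*(\varphi) \cdot h\,d\mu &= \int \varphi \cdot V(h)\,d\mu = \int V(\varphi \cdot h)\,d\mu \\
&= \int V^*(\mathbb{1}_X) \cdot \varphi \cdot h\,d\mu = \int \varphi \cdot h\,d\mu.
\end{aligned}
\]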
Remark 4.1.13. The above lemma shows in particular that if all φ k ∈ L∞ (μ), k ∈
{1, . . . , n}, are V * -invariant then so is their product φ1 · · · φ n .
Definition 4.1.14. Let V : L1 (μ) → L1 (μ) be a bounded linear operator. Then the system
(L1 (μ), V) is called ergodic if the σ-algebra I := σ({f ∈ L∞ (μ) : V * f = f }) generated by
the V * -invariant functions is trivial.
Remark 4.1.15. Note that the σ-algebra I is trivial if and only if g ∈ {f ∈ L∞ (μ) :
V * f = f } implies that g is constant.
Lemma 4.1.16. For the transfer and the Koopman operator we have:
(a) If (X, B , μ, T) is a measure-preserving, conservative and ergodic dynamical system,
then (L1 (μ), U T ) is conservative and ergodic.
(b) If (X, B , μ, T) is a non-singular, conservative and ergodic dynamical system, then
$(L^1(\mu), \widehat{T})$ is conservative and ergodic.
Proof. Both systems (L1 (μ), U T ) and $(L^1(\mu), \widehat{T})$ are conservative by Remark 4.1.8. In
order to prove the ergodicity of U T , we have to show that the σ-algebra I generated
by the U T* -invariant functions is trivial, which is equivalent to the fact that all
U T* -invariant functions are constant. Fix φ ∈ L∞ (μ) such that U T* φ = φ. Then, by
Lemma 4.1.11, we can assume without loss of generality that φ ∈ L+∞ (μ)\ {0}. Therefore,
where μ φ is given by dμ φ := φdμ, we have for all f ∈ L1 (μ) that
\[ \int f \circ T \, d\mu_\varphi = \int U_T f \cdot \varphi \, d\mu = \int f \, U_T^*(\varphi) \, d\mu = \int f \varphi \, d\mu = \int f \, d\mu_\varphi. \]
Note that the conditional expectation is already characterised if the above equality
can be shown to hold for sets B ∈ F with F = σ(F ) and such that F is closed under
intersections and contains Ω.
Proof. Let us begin by proving the μ-a.e. convergence. For this, fix g ∈ L1 (μ) such that
g > 0. Next, let us define the set
\[ L := \bigl\{ \varphi g + \psi - V\psi : \varphi \in L^\infty(\mu),\ \psi \in L^1(\mu),\ V(\varphi h) = \varphi V(h) \text{ for all } h \in L^1_+(\mu) \bigr\}. \]
Our next step is to show that L is a dense subset of L1 (μ) and that L is closed. To show
the denseness, we use the general fact that a subspace is dense in a normed vector
space if and only if the annihilator of the subspace is equal to the null-space, which
is a consequence of the Hahn–Banach Theorem (cf. [Rud91, Theorem 4.7]). Hence, it
is sufficient to show that the following implication holds for each k ∈ L∞ (μ):
\[ \int k h \, d\mu = 0 \ \text{ for all } h \in L \implies k = 0. \qquad (4.1) \]
In order to show this, note that for each ψ ∈ L1 (μ), we have that ψ − Vψ belongs to L.
Hence, for each k ∈ L∞ (μ) that fulfills the left hand side of the above implication, we
have
\[ 0 = \int k\psi \, d\mu - \int k \, V\psi \, d\mu = \int (k - V^*k)\psi \, d\mu. \]
Since this must hold for all ψ ∈ L1 (μ), it follows that k = V * k. Using Lemma 4.1.12, we
have that kg ∈ L and hence, $\int k \cdot kg \, d\mu = 0$. Since kkg ≥ 0, it follows that k = 0, which
proves (4.1).
Now, to establish the μ-a.e. convergence, all that is left to show is that L is closed
in L1 (μ). For this, let us fix h ∈ L and δ > 0 such that for each ε > 0 there exists h ε ∈ L
with $\|h - h_\varepsilon\|_1 < \varepsilon \cdot \delta$. Since we have that
Since the set J is closed under taking intersections and generates I and since φ
is I -measurable, it follows from the characterisation of the conditional expectation
stated above that $\varphi = \mathbb{E}(f/g \mid \mathcal{I})$. For general f ∈ L1 (μ) the claim follows by approximating
f with functions from L.
Finally, if we additionally have that V is ergodic, then the σ-algebra I generated by
the V *-invariant functions is trivial by definition and hence, the limiting function is
μ-a.e. constant and equal to
\[ \mathbb{E}(f/g \mid \mathcal{I}) = \int f/g \, d\mu_g = \frac{\int f \, d\mu}{\int g \, d\mu}. \]
As already mentioned at the beginning, we end this section by first showing how the
Chacon–Ornstein Ergodic Theorem implies Hurewicz’s Ergodic Theorem, and then
Hopf’s Ergodic Theorem, and, in turn, Birkhoff’s Pointwise Ergodic Theorem.
Remark 4.1.19. We can also give a second proof of Hopf’s Ergodic Theorem (which
we already discussed in Chapter 2, see Theorem 2.4.24), using the Chacon–Ornstein
Ergodic Theorem. The argument works in precisely the same way as that given above.
First, we have seen in Section 4.1 that the Chacon–Ornstein Ergodic Theorem is
applicable to the Koopman operator U T : L1 (μ) → L1 (μ), given by U T f := f ◦ T.
Therefore, the assertion in Hopf’s Ergodic Theorem certainly holds for all f ∈ L1 (μ)
and for a particular g0 ∈ L+1 (μ) such that g0 > 0. The proof is then completed in exactly
the same way as in Hurewicz’s Ergodic Theorem for any g ∈ L1 (μ) with $\int g \, d\mu > 0$.
Now Birkhoff’s Pointwise Ergodic Theorem (see Theorem 2.4.16) follows immedi-
ately by choosing g = 1 X in Hopf’s Ergodic Theorem. Recall that in the proof we gave
of Hopf’s Ergodic Theorem in Chapter 2 using inducing, part of the argument was to
use Birkhoff’s Theorem, so this deduction is only reasonable now.
4.2 Pointwise dual ergodicity
In this section we will study an ergodicity property of the transfer operator associated
to a system.
Remark 4.2.2. For what follows it will be important to find a sequence (a n )n≥1 in
the asymptotic class of (r n ) that is strictly increasing. First fix f , g ∈ M(B) with f
a μ-integrable function and g a bounded function with g > 0, $\int g \, d\mu = 1$. Then fix a
μ-typical point x ∈ X that witnesses both the Hurewicz Ergodic Theorem, in the sense
that
\[ \lim_{n\to\infty} \sum_{k=0}^{n} \widehat{T}^k f(x) \Big/ \sum_{k=0}^{n} \widehat{T}^k g(x) = \int f \, d\mu, \]
Definition 4.2.3. For a set A ∈ B with 0 < μ(A) < ∞, the wandering rate of A with respect
to T is given by the sequence (w n (A))n≥1 , where
\[ w_n(A) := \mu\Bigl( \bigcup_{k=0}^{n} T^{-k}(A) \Bigr). \]
Where φ is the return time function with respect to A (as in Definition 2.4.25), we also
have that
\[ w_n(A) = \sum_{k=0}^{n} \mu(A \cap \{\varphi > k\}) = \mu(A) + \sum_{k=1}^{n} \mu\Bigl( T^{-k}A \setminus \bigcup_{\ell=0}^{k-1} T^{-\ell}A \Bigr). \qquad (4.2) \]
Definition 4.2.4. Let (X, B , μ, T) be pointwise dual ergodic with return sequence (r n ).
Then, a set A ∈ B with positive, finite measure is called a uniform set for f ∈ L+1 (μ) if
\[ \frac{1}{r_n} \sum_{k=0}^{n-1} \widehat{T}^k f \to \int f \, d\mu \quad \text{uniformly (mod } \mu\text{) on } A. \]
We will also call a set A ∈ B uniform if it is a uniform set for some f ∈ L+1 (μ) with
$\int f \, d\mu > 0$.
Lemma 4.2.5. Assume that the sequence (r n ) is regularly varying with exponent ρ ∈
[0, 1] and given by $r_n = \sum_{k=0}^{n} b_k$, for some non-negative sequence (b n ). If for some A ∈ B
with 0 < μ(A) < ∞ and f ∈ L+1 (μ) we have uniformly (mod μ) on A that
\[ \lim_{n\to\infty} \frac{1}{r_n} \sum_{k=0}^{n-1} \widehat{T}^k f = \int f \, d\mu, \]
then with $B(s) := \sum_{n=0}^{\infty} b_n s^n$, s ∈ [0, 1), we have uniformly (mod μ) on A that
\[ \lim_{s\to 1^-} \frac{1}{B(s)} \sum_{n=0}^{\infty} s^n \widehat{T}^n f = \int f \, d\mu. \]
Proof. Let r n = n ρ ψ(n) for some slowly varying function ψ. Since b n ≤ r n it follows that
B(s) is finite for all s ∈ [0, 1). Then the claim follows by applying Karamata’s Tauberian
Theorem (see Theorem 3.2.6) twice, first with the sequence (b n ) and then with the
sequence (T n f ). More precisely, first note that
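\[ \sum_{k=0}^{n} b_k \sim n^\rho \psi(n) \implies B(s) \sim \Gamma(1+\rho) \, \frac{1}{(1-s)^\rho} \, \psi\Bigl(\frac{1}{1-s}\Bigr). \]
\[ \sum_{k=0}^{n} \widehat{T}^k f \sim \int f \, d\mu \cdot r_n \sim \int f \, d\mu \cdot n^\rho \psi(n) \]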
Lemma 4.2.6 ([Aar81]). For f ∈ L1 (μ) or f ∈ M + (B), and A ∈ B such that 0 < μ(A) < ∞
we have for s ∈ (0, 1)
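\[ \int_A (1 - s^{\varphi}) \sum_{n=0}^{\infty} s^n \widehat{T}^n f \, d\mu = \sum_{n=0}^{\infty} s^n \int_{A_n} f \, d\mu, \]
where $A_n := T^{-n}A \setminus \bigcup_{k=0}^{n-1} T^{-k}A$, for n ∈ N, and A0 = A.
\[ = \sum_{n=0}^{\infty} s^n \int_{A_n} f \, d\mu + \int_A s^{\varphi} \sum_{n=0}^{\infty} s^n \widehat{T}^n f \, d\mu \]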
Proposition 4.2.7 (Asymptotic Renewal Equation [Aar81]). Let A be a uniform set with
regularly varying return sequence $r_n = \sum_{k=0}^{n} b_k$, for some non-negative sequence (b k ).
Then
\[ \int_A (1 - s^{\varphi}) \, d\mu \sim \frac{1}{B(s)}, \quad \text{for } s \to 1^-, \]
where $B(s) := \sum_{n=0}^{\infty} b_n s^n$.
Proof. Let A be a uniform set for f ∈ L+1 (μ). By Lemma 4.2.6 and with A n as defined
therein, we have
\[ \int_A (1 - s^{\varphi}) \sum_{n=0}^{\infty} s^n \widehat{T}^n f \, d\mu = \sum_{n=0}^{\infty} s^n \int_{A_n} f \, d\mu \to \int f \, d\mu, \]
for s → 1− , where the convergence follows by Abel’s Theorem. On the other hand,
making use now of Lemma 4.2.5 and the almost everywhere uniform convergence, we
find for s → 1−
\[ \int_A (1 - s^{\varphi}) \sum_{k=0}^{\infty} s^k \widehat{T}^k f \, d\mu \sim B(s) \int f \, d\mu \int_A (1 - s^{\varphi}) \, d\mu. \]
Proof. As shown in Remark 4.2.2, pointwise dual ergodicity implies that the sequence
(r n ) can be chosen to be $r_n = \sum_{k=0}^{n} b_k$ for some strictly positive sequence (b n ). On the
one hand, Lemma 4.2.6 for f = 1 X together with Proposition 4.2.7 implies, for s → 1− ,
\[ \sum_{k=0}^{\infty} s^k \mu(A_k) = \frac{1}{1-s} \int_A (1 - s^{\varphi}) \, d\mu \sim \frac{1}{(1-s)B(s)}, \]
with $B(s) := \sum_{k=0}^{\infty} s^k b_k$. Since, as in (4.2),
\[ w_n(A) = \mu(A) + \sum_{k=1}^{n} \mu(A_k) \sim n^{\alpha} \psi(n) \]
4.3 ψ-mixing, Darling–Kac sets and pointwise dual ergodicity | 151
for α ∈ [0, 1] and ψ some slowly varying function, Karamata’s Tauberian Theorem
gives on the other hand
\[ \sum_{k=1}^{\infty} s^k \mu(A_k) \sim \frac{\Gamma(1+\alpha)}{(1-s)^{\alpha}} \, \psi\Bigl(\frac{1}{1-s}\Bigr). \]
Hence we have
\[ B(s) \sim \frac{1}{(1-s)^{1-\alpha} \, \Gamma(1+\alpha) \, \psi(1/(1-s))}. \]
Applying in Karamata’s Tauberian Theorem the assertion (a) implies (b) to the
generating function B(s) gives
\[ r_n = \sum_{k=0}^{n} b_k \sim n^{1-\alpha} \, \frac{1}{\Gamma(2-\alpha)\Gamma(1+\alpha)\psi(n)}. \]
\[ \frac{1}{r_n} \sum_{k=0}^{n-1} \widehat{T}^k 1_A \to \mu(A) \quad \text{uniformly (mod } \mu\text{) on } A. \]
Let us now introduce a stronger mixing property than the one given in Definition 2.5.8.
We recall that the refinements U n of a collection of sets U were defined in Defini-
tion 1.2.23(c) and σ(U ) denotes the σ-algebra generated by U .
Definition 4.3.2. Let (X, B , μ, T) be a dynamical system with μ(X) < ∞ and let M be a
measurable partition of X. Then the system is said to be ψ-mixing with respect to M,
if there exists a sequence (ψ m )m≥0 of positive real numbers which tends to zero for m
tending to infinity, such that for all n ∈ N, $A \in \sigma(\mathcal{M}^n)$, B ∈ B and m ∈ N0 , we have that
\[ \mu\bigl( A \cap T^{-(m+n)}(B) \bigr) \le (1 + \psi_m) \, \mu(A) \, \mu(B), \]
Proof. Without loss of generality we assume that μ(A) = 1. Recall that since T is
conservative and ergodic we have by Corollary 2.4.5 that $\sum_{k=0}^{\infty} \widehat{T}^k 1_A = \infty$ and hence
the sequence (a n )n≥1 given for each n ∈ N by $a_n := \sum_{k=1}^{n} \mu(A \cap T^{-k}A)$ tends to infinity.
We will show that this sequence will witness the Darling–Kac property for A. To begin,
by the assumed ψ-mixing condition there exists a positive sequence (ψ n )n≥0 with
limn→∞ ψ n = 0 such that for B ∈ σ(Mk ) with k ∈ N, and for all n ∈ N0 we have
\[ \widehat{T}_A^{k+n} 1_B \le (1 + \psi_n) \mu(B) \]
\[ \widehat{T}_A^{k+n} 1_B \ge (1 - \psi_n) \mu(B). \]
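where, as in (2.12), we define $\varphi_k := \sum_{\ell=0}^{k-1} \varphi \circ T_A^{\ell}$. For the transfer operators, it follows
from (4.3) that
\[ \widehat{T}^n 1_A = \sum_{k=1}^{n} \widehat{T}_A^k 1_{\{\varphi_k = n\}} \quad \text{and} \quad \sum_{k=1}^{n} \widehat{T}^k 1_A = \sum_{k=1}^{n} \widehat{T}_A^k 1_{\{\varphi_k \le n\}}. \]
In particular, we have $a_n = \sum_{k=1}^{n} \mu|_A(\{\varphi_k \le n\})$. Now, for the upper bound we have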
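\begin{aligned}
\sum_{k=1}^{n} \widehat{T}^k 1_A &= \sum_{k=1}^{n} \widehat{T}_A^k 1_{\{\varphi_k \le n\}} \le \sum_{k=1}^{n+m} \widehat{T}_A^k 1_{\{\varphi_k \le n\}} \\
&\le m + \sum_{k=1}^{n} \widehat{T}_A^{k+m} 1_{\{\varphi_{k+m} \le n\}} \le m + \sum_{k=1}^{n} \widehat{T}_A^{k+m} 1_{\{\varphi_k \le n\}} \\
&\le m + \sum_{k=1}^{n} (1 + \psi_m) \mu|_A(\{\varphi_k \le n\}) = m + (1 + \psi_m) a_n,
\end{aligned}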
where for the last inequality we used the fact that {φ k ≤ n} ∈ σ(Mk ). Since this
inequality holds for every m, n ∈ N and since (a n ) is diverging it follows that
uniformly a.e.
\[ \limsup_{n\to\infty} \frac{1}{a_n} \sum_{k=1}^{n} \widehat{T}^k 1_A \le 1. \]
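\begin{aligned}
\sum_{k=1}^{n} \widehat{T}^k 1_A &\ge \sum_{k=1}^{n} \widehat{T}_A^{k+m} 1_{\{\varphi_{k+m} \le n\}} - m \\
&\ge \sum_{k=1}^{n} \widehat{T}_A^{k+m} 1_{\{\varphi_k \le n\}} - \sum_{k=1}^{n} \widehat{T}_A^{k+m} 1_{\{\varphi_k \le n \le \varphi_{k+m}\}} - m \\
&\ge \sum_{k=1}^{n} (1 - \psi_m) \mu|_A(\{\varphi_k \le n\}) - \sum_{k=1}^{n} \widehat{T}_A^{k+m} 1_{\{\varphi_k \le n \le \varphi_{k+m}\}} - m \\
&\ge (1 - \psi_m) a_n - m - \sum_{k=1}^{n} \widehat{T}_A^{k+m} 1_{\{\varphi_k \le n \le \varphi_{k+m}\}}.
\end{aligned}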
\[ \le (1 + \psi_0)^2 \sum_{k=1}^{n} \sum_{\ell=1}^{n} \mu|_A(\{\varphi_k = \ell\}) \, \mu|_A(\{\varphi_m > n - \ell\}). \]
Let us split up the last sum for ℓ = 1, . . . , n − p and ℓ = n − p + 1, . . . , n for a fixed p < n.
For the first part we have
\[ \sum_{k=1}^{n} \sum_{\ell=1}^{n-p} \mu|_A(\{\varphi_k = \ell\}) \, \mu|_A(\{\varphi_m > n - \ell\}) \le \mu|_A(\{\varphi_m > p\}) \sum_{k=1}^{n} \mu|_A(\{\varphi_k \le n - p\}) \]
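For the second part, using μ|A ({φ m > n − ℓ}) ≤ 1 we have
\begin{aligned}
\sum_{k=1}^{n} \sum_{\ell=n-p+1}^{n} \mu|_A(\{\varphi_k = \ell\}) \, \mu|_A(\{\varphi_m > n - \ell\})
&\le \sum_{k=1}^{n} \mu|_A(\{n - p \le \varphi_k \le n\}) \\
&\le \sum_{k=1}^{n} \mu|_A(\{\varphi_k \le n\}) - \sum_{k=1}^{n} \mu|_A(\{\varphi_k \le n - p\}) \\
&\le \sum_{k=1}^{n} \mu|_A(\{\varphi_k \le n\}) - \sum_{k=1}^{n-p} \mu|_A(\{\varphi_k \le n - p\}) \\
&= a_n - a_{n-p} \le p.
\end{aligned}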
\[ \liminf_{n\to\infty} \frac{1}{a_n} \sum_{k=1}^{n} \widehat{T}^k 1_A \ge 1 - \psi_m - (1 - \psi_0)^2 \mu|_A(\{\varphi_1 \ge p\}). \]
Since φ1 = φ is finite a.e. we have μ|A ({φ1 ≥ p}) → 0 for p → ∞. Since also ψ m → 0 for m → ∞,
this proves the uniform lower bound
\[ \liminf_{n\to\infty} \frac{1}{a_n} \sum_{k=1}^{n} \widehat{T}^k 1_A \ge 1. \]
Proof. To see the first claim, note that by Proposition 2.4.33 (a) we have for two
measurable sets A, C with finite measure and μ(A) > 0 that
\[ \sum_{k=0}^{\infty} \mu\bigl( A_k \cap T^{-k}(C) \bigr) = \sum_{k=0}^{\infty} \mu\bigl( A \cap T^{-k}(C) \cap \{\varphi > k\} \bigr) = \mu(C). \]
Hence, using the Monotone Convergence Theorem, for every measurable set C with
finite measure we have
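\[ \int_C \sum_{k=0}^{\infty} \widehat{T}^k(1_{A_k}) \, d\mu = \sum_{k=0}^{\infty} \int 1_C \, \widehat{T}^k(1_{A_k}) \, d\mu = \sum_{k=0}^{\infty} \mu\bigl( A_k \cap T^{-k}(C) \bigr) = \int_C 1_X \, d\mu. \]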
k
n−
n
μ B ∩ T −k (A ) ∩ T −(k+) (C)
=0 k=0
n
n−
−k − −m
= μ B∩T A ∩ T (C) \ T (A)
=0 k=0 m=1
n n−
n
= μ B ∩ T −k (C ∩ A) + μ B ∩ T −k T −1 (C−1 ) \ B .
k=0 =0 k=0
n−
n
μ B ∩ T −k T −1 (C−1 ) \ B
=0 k=0
n−1 n−(+1)+1
n
n−
= μ(B ∩ T −k (C ) − μ(B ∩ T −k (C )
=0 k=1 =1 k=0
n−1
n−
n−1
n−
n
= μ(B ∩ T −k (C )) − μ(B ∩ T −k (C )) − μ(B ∩ C )
=0 k=1 =1 k=1 =1
n
n
= μ(B ∩ T −k (C0 ) − μ(B ∩ C )
k=1 =1
n
n
= μ(B ∩ T −k (C \ A)) − μ(B ∩ C )
k=0 =0
n n−
n
μ(B ∩ T −k (C)) = μ B ∩ T −k (A ) ∩ T −(k+) (C)
k=0 =0 k=0
n
− −m
+ μ B∩T C\ T A .
=0 m=0
\[ \lim_{n\to\infty} \frac{1}{a_n} \sum_{k=0}^{n} \widehat{T}^k 1_B = \mu(B). \qquad (4.4) \]
Then the system is pointwise dual ergodic. In particular, the existence of a Darling–Kac
set implies pointwise dual ergodicity.
Proof. We are going to prove that the convergence in (4.4) in fact holds a.e. on X. Then
Hurewicz’s Ergodic Theorem combined with this observation proves pointwise dual
ergodicity.
First note that by the Chacon–Ornstein Lemma 4.1.4 and the assumption stated in
the proposition we have for all N ∈ N that
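\[ \frac{a_{n-N}}{a_n} = \frac{a_{n-N}}{\sum_{k=0}^{n-N} \widehat{T}^k 1_B} \cdot \frac{\sum_{k=0}^{n} \widehat{T}^k 1_B}{a_n} \cdot \Bigl( 1 - \frac{\sum_{k=n-N+1}^{n} \widehat{T}^k 1_B}{\sum_{k=0}^{n} \widehat{T}^k 1_B} \Bigr) \to 1. \]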
As before, for n ≥ 0, let $A_n := A \setminus \bigcup_{k=1}^{n} T^{-k}A$. By Egorov’s Theorem we may assume
without loss of generality that the convergence in (4.4) holds uniformly on A. Fix ε > 0.
Using the first part of Lemma 4.3.4 we find for almost every x ∈ X an n0 ∈ N such
that $\sum_{k=0}^{n_0} \widehat{T}^k 1_{A_k}(x) \ge (1 - \varepsilon)$. Further there exists n1 > n0 such that for all n ≥ n1 and
uniformly on A we have
\[ \frac{1}{a_{n-n_0}} \sum_{k=0}^{n-n_0} \widehat{T}^k(1_B) \ge (1 - \varepsilon)\mu(B) \quad \text{and} \quad \frac{a_{n-n_0}}{a_n} \ge (1 - \varepsilon). \]
Now using the second part of Lemma 4.3.4 gives for all n ≥ n1
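\begin{aligned}
\frac{1}{a_n} \sum_{k=0}^{n} \widehat{T}^k 1_B(x)
&= \frac{1}{a_n} \Bigl( \sum_{k=0}^{n} \widehat{T}^k \Bigl( 1_{A_k} \cdot \sum_{\ell=0}^{n-k} \widehat{T}^{\ell} 1_B \Bigr) + \sum_{k=0}^{n} \widehat{T}^k 1_{B \setminus \bigcup_{j=0}^{k} T^{-j}(A)} \Bigr)(x) \\
&\ge \frac{1}{a_n} \sum_{k=0}^{n_0} \widehat{T}^k \Bigl( 1_{A_k} \cdot \sum_{\ell=0}^{n-k} \widehat{T}^{\ell}(1_B) \Bigr)(x) \\
&\ge \frac{1}{a_n} \sum_{k=0}^{n_0} \widehat{T}^k \Bigl( 1_{A_k} \cdot \sum_{\ell=0}^{n-n_0} \widehat{T}^{\ell}(1_B) \Bigr)(x) \\
&\ge \frac{a_{n-n_0}}{a_n} (1 - \varepsilon) \mu(B) \sum_{k=0}^{n_0} \widehat{T}^k 1_{A_k}(x) \ge (1 - \varepsilon)^3 \mu(B).
\end{aligned}
This shows that a.e. we have $\liminf_{n\to\infty} \frac{1}{a_n} \sum_{k=0}^{n} \widehat{T}^k 1_B \ge \mu(B)$.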
Towards the upper bound for the limit superior, fix ε > 0. By Hurewicz’s Ergodic
Theorem we also have a.e. on A,
\[ \lim_{n\to\infty} \frac{1}{a_n} \sum_{k=0}^{n} \widehat{T}^k 1_A = \mu(A). \qquad (4.5) \]
Again by Egorov’s Theorem we find a set A′ ∈ B such that μ(A′) > (1 + ε)−1 μ(A) and the
convergence in (4.5) holds uniformly on A′. Hence there exists n0 ∈ N such that on A′
and for all n ≥ n0 we have $\frac{1}{a_n} \sum_{k=0}^{n} \widehat{T}^k 1_A \le (1 + \varepsilon)\mu(A)$. Using the second part
4.4 Exercises
Exercise 4.4.1. Let X := (0, ∞) and λ denote the Lebesgue measure restricted to X.
Consider
\[ V_1 : L^1(\lambda) \to L^1(\lambda), \qquad f \mapsto \bigl( x \mapsto V_1(f)(x) := e^{-x} f(x) \bigr). \]
Exercise 4.4.2. Let X := (0, ∞) and λ denote the Lebesgue measure restricted to X.
Consider
\[ V_2 : L^1(\lambda) \to L^1(\lambda), \qquad f \mapsto \Bigl( x \mapsto V_2(f)(x) := 1_{(0,1)}(x) \int f \, d\lambda \Bigr). \]
Exercise 4.4.3. Let f , g ∈ L1 (μ) with g ≥ 0. Prove with the help of Wiener’s Maximal
Inequality that the limit
\[ \lim_{n\to\infty} \frac{S_n f}{S_n g} = \frac{S_\infty f}{S_\infty g} \]
exists μ-a.e. on {S∞ g > 0} \ {S∞ g = ∞}. Go back to Exercises 4.4.1 and 4.4.2 and
consider the limit limn→∞ S n f /S n g on {S∞ g > 0} and {S∞ g = ∞}, respectively.
Exercise 4.4.5. For g ∈ L+1 (μ) we have that the set {S∞ g = ∞} is V * -invariant.
To begin, let us recall the definition of the sum-level sets (Cn )n≥1 for the continued
fraction expansion:
\[ \mathcal{C}_n := \Bigl\{ [x_1, x_2, x_3, \ldots] \in [0,1] : \sum_{i=1}^{k} x_i = n \text{ for some } k \in \mathbb{N} \Bigr\} = \bigcup_{k=1}^{n} \ \bigcup_{(x_1, \ldots, x_k) : \sum_{i=1}^{k} x_i = n} C(x_1, \ldots, x_k). \]
We claimed in the introduction to this chapter that the renewal theory arguments
as used in Chapter 3 are no longer sufficient to analyse the sum-level sets for the
continued fraction expansion. To see why, observe that if we wanted to prove a result
equivalent to Lemma 3.2.8 in this situation, we would be doomed to failure, as the
following calculation shows:
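\[ \frac{3}{10} = \lambda(\mathcal{C}_3) \ne \frac{1}{2} \cdot \lambda(\mathcal{C}_2) + \frac{1}{6} \cdot \lambda(\mathcal{C}_1) + \frac{1}{12} \cdot \lambda(\mathcal{C}_0) = \frac{1}{3}. \]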
(Recall that we calculated the values of the Lebesgue measure of the first four
sum-level sets at the beginning of Chapter 3.)
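The failed renewal identity can be reproduced with exact arithmetic. The sketch below assumes the standard cylinder-length formula λ(C(x1 , . . . , x k )) = 1/(q k (q k + q k−1 )), where q k is the denominator of the k-th convergent, and sums over all compositions of n; the function names are illustrative.

```python
from fractions import Fraction

def compositions(n):
    """Yield all tuples of positive integers summing to n."""
    if n == 0:
        yield ()
        return
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield (first,) + rest

def cylinder_length(digits):
    """Lebesgue measure of C(x_1, ..., x_k), namely 1 / (q_k (q_k + q_{k-1}))."""
    q_prev, q = 0, 1  # q_{-1} = 0, q_0 = 1
    for x in digits:
        q_prev, q = q, x * q + q_prev
    return Fraction(1, q * (q + q_prev))

def sum_level_measure(n):
    """lambda(C_n) as a sum of cylinder lengths over all compositions of n."""
    return sum(cylinder_length(c) for c in compositions(n))

# Right-hand side of the attempted renewal equation (lambda(C_0) = 1).
renewal_guess = (Fraction(1, 2) * sum_level_measure(2)
                 + Fraction(1, 6) * sum_level_measure(1)
                 + Fraction(1, 12) * sum_level_measure(0))
```

Here `sum_level_measure(3)` evaluates to 3/10 while `renewal_guess` evaluates to 1/3, matching the displayed calculation. The enumeration grows like 2^(n−1), so it is only practical for small n.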
Before stating the first main theorem, we need the following lemma which
provides the crucial link between the sequence of sum-level sets and the Farey map.
Note that this lemma contradicts our initial impression that the sequence of sum-level
sets is not a dynamical entity, despite its apparent strangeness.
160 | 5 Applications of infinite ergodic theory
F −(n−1) (C1 ) = Cn .
Proof. By computing the images of C1 under the inverse images F0 and F1 of the Farey
map, one immediately verifies that F −1 (C1 ) = C2 . We then proceed by way of induction
as follows. Assume that for some n ∈ N we have that F −(n−1) (C1 ) = Cn . Since F −n (C1 ) =
F −1 (F −(n−1) (C1 )) = F −1 (Cn ), it is then sufficient to show that F −1 (Cn ) = Cn+1 . For this, let
x = [x1 , x2 , x3 . . .] ∈ Cn be given. Then there exists ℓ ∈ N such that
\[ x \in C(x_1, \ldots, x_\ell) \quad \text{and} \quad \sum_{i=1}^{\ell} x_i = n. \]
By computing the images of x under the inverse branches F0 and F1 , one obtains that
F −1 (x) = {[1, x1 , x2 , . . .], [x1 + 1, x2 , . . .]}. Since we have that
\[ 1 + \sum_{i=1}^{\ell} x_i = (x_1 + 1) + \sum_{i=2}^{\ell} x_i = n + 1, \]
this shows that F −1 (x) ⊂ Cn+1 , and hence, F −1 (Cn ) ⊂ Cn+1 . The reverse inclusion Cn+1 ⊂
F −1 (Cn ) can be established by counting the Stern–Brocot intervals contained in Cn+1 .
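The action of the two inverse branches on continued fraction digits can be illustrated on a rational point. This sketch assumes the usual form of the Farey map, F(x) = x/(1 − x) on [0, 1/2] and F(x) = (1 − x)/x on (1/2, 1], with inverse branches F0 (x) = x/(1 + x) and F1 (x) = 1/(1 + x); the helper `cf_digits` is hypothetical and returns the finite expansion produced by the Euclidean algorithm.

```python
from fractions import Fraction

def farey(x):
    """Farey map on [0, 1] in the assumed form described above."""
    return x / (1 - x) if x <= Fraction(1, 2) else (1 - x) / x

def F0(x):
    """Inverse branch onto [0, 1/2]."""
    return x / (1 + x)

def F1(x):
    """Inverse branch onto [1/2, 1]."""
    return 1 / (1 + x)

def cf_digits(x):
    """Continued fraction digits of a rational x in (0, 1]."""
    digits = []
    while x != 0:
        inv = 1 / x
        a = inv.numerator // inv.denominator
        digits.append(a)
        x = inv - a
    return digits

x = Fraction(2, 5)        # x = [2, 2], digit sum 4
left, right = F0(x), F1(x)
```

For this point, `cf_digits(left)` is [3, 2] and `cf_digits(right)` is [1, 2, 2], so both preimages have digit sum 5, in line with the inclusion F −1 (Cn ) ⊂ Cn+1 .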
Remark 5.1.2. Notice that an analogous result also holds for the α-sum-level sets, but
that this observation was not necessary for the analysis given in Section 3.2.2.
We are now almost ready to prove our first main theorem. The proof will follow on
combining Lemma 5.1.1 with the next result which depends on the fact that F is exact
(as shown in Theorem 2.5.6), so that we can apply Lin’s criterion for exactness (see
Theorem 2.5.7). Let us also recall that the unique absolutely continuous invariant
measure for the Farey map, denoted by νF , is given by $\nu_F(A) := \int_A h_F(x) \, d\lambda(x)$, where
h F (x) := 1/x (see Proposition 2.3.19).
Proposition 5.1.3. For each measurable set C which satisfies νF (C) < ∞, we have that
\[ \lim_{n\to\infty} \lambda\bigl( F^{-n}(C) \bigr) = 0. \]
Proof. Let C ∈ B be given as stated in the proposition. So, for each A ∈ B for which
0 < νF (A) < ∞, we then have
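\begin{aligned}
\lambda\bigl( F^{-n}(C) \bigr) &= \nu_F\bigl( 1_{F^{-n}(C)} \cdot h_F^{-1} \bigr) = \nu_F\bigl( 1_C \circ F^n \cdot h_F^{-1} \bigr) \\
&= \nu_F\Bigl( 1_C \circ F^n \cdot \Bigl( h_F^{-1} - \frac{1_A}{\nu_F(A)} + \frac{1_A}{\nu_F(A)} \Bigr) \Bigr) \\
&\le \Bigl\| \widehat{F}^n \Bigl( h_F^{-1} - \frac{1_A}{\nu_F(A)} \Bigr) \Bigr\|_1 + \frac{\nu_F\bigl( F^{-n}(C) \cap A \bigr)}{\nu_F(A)} \\
&\le \Bigl\| \widehat{F}^n \Bigl( h_F^{-1} - \frac{1_A}{\nu_F(A)} \Bigr) \Bigr\|_1 + \frac{\nu_F(C)}{\nu_F(A)} \\
&\to \frac{\nu_F(C)}{\nu_F(A)}, \quad \text{for } n \text{ tending to infinity.}
\end{aligned}
Here, the limit follows from the fact that $\nu_F\bigl( h_F^{-1} - 1_A/\nu_F(A) \bigr) = 0$ and F is exact, and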
hence, Lin’s criterion is applicable. Therefore, by choosing A ∈ B such that νF (A) is
arbitrarily large, the proposition follows.
We can now easily apply this result to determine the limit of the sequence (λ(Cn ))n≥1 ,
which was our first objective.
Theorem 5.1.4.
\[ \lim_{n\to\infty} \lambda(\mathcal{C}_n) = 0. \]
Proof. The proof follows immediately by first putting C = C1 in Proposition 5.1.3, and
then using the fact that Cn = F −(n−1) (C1 ), for all n ∈ N, as shown in Lemma 5.1.1.
Remark 5.1.5.
1. The above theorem (and proof) can be found in [KS12b]. Also in that paper, there is
another proof of the same theorem, which is more elementary, in the sense that it
uses less infinite ergodic theory. The other proof depends upon first showing that
lim inf n→∞ λ(Cn ) = 0. This fact was first established by Fiala and Kleban [FK10],
but they did not provide a proof for the limit.
2. The arguments given above can be slightly modified to give a different proof of
the infinite-type part of Theorem 3.2.9. Note that this will not work for the finite
measure case.
5.2 ψ-mixing for the Gauss map and the Gauss problem
Recall that in Corollary 2.5.9, we used Lin’s criterion for exactness to show that if
a map T is exact, then it is also mixing (cf. Definition 2.5.8). It follows, in light
of Theorem 2.4.12, that the Gauss map G is mixing. Here, we want to show that G
satisfies the stronger property of ψ-mixing which was introduced in Chapter 4 (see
Definition 4.3.2). This property can sometimes, for instance in [Aar97], be found under
the name continued-fraction mixing precisely because the Gauss map satisfies it. The
proof that we will present here is very much inspired by the proof given in [Ios92].
Before proving that the Gauss map G is ψ-mixing, we must make some preliminary
remarks. First of all, recall that we have $\widehat{G} 1 = 1$. Secondly, we remind the reader that
the identity $\widehat{G} f = h_G^{-1} P_G(h_G f)$ was established in Proposition 2.3.18, where there we
where we have set p i (x) := (1 + x)/((i + x) (i + 1 + x)), for all i ∈ N. (With the pointwise
definition above kept in mind, the proof of this fact is simply a calculation; we leave
it to Exercise 5.7.2.) Notice that (p i (x))i≥1 is a probability vector; this will turn out to be
crucial later.
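That (p i (x))i≥1 sums to 1 can be seen from a telescoping decomposition; the following short derivation is a worked check and uses only the definition of p i given above.

```latex
\[
  p_i(x) = \frac{1+x}{(i+x)(i+1+x)}
         = (1+x)\left( \frac{1}{i+x} - \frac{1}{i+1+x} \right),
\]
\[
  \sum_{i=1}^{N} p_i(x)
    = (1+x)\left( \frac{1}{1+x} - \frac{1}{N+1+x} \right)
    = 1 - \frac{1+x}{N+1+x}
    \longrightarrow 1 \quad (N \to \infty).
\]
```

Since each p i (x) is positive for x ∈ [0, 1], this confirms that (p i (x))i≥1 is a probability vector.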
Further, let us introduce the set of functions of bounded variation,
where var f is defined to be var f := var[0,1] f and where var[a,b] f is defined, for [a, b] ⊂
[0, 1], to be
\[ \operatorname{var}_{[a,b]} f := \sup\Bigl\{ \sum_{i=1}^{n} |f(x_{i+1}) - f(x_i)| : a \le x_1 < \cdots < x_{n+1} \le b, \ n \in \mathbb{N} \Bigr\}. \]
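As an illustration of this definition, the sum inside the supremum can be evaluated for a concrete partition. The sketch below uses the hypothetical test function f(x) = |2x − 1|, whose variation on [0, 1] is 2, and shows that a partition which is too coarse underestimates the supremum.

```python
from fractions import Fraction

def variation(f, points):
    """Sum of |f(x_{i+1}) - f(x_i)| over an increasing list of points."""
    return sum(abs(f(b) - f(a)) for a, b in zip(points, points[1:]))

def tent(x):
    """Illustrative test function: decreasing on [0, 1/2], increasing on [1/2, 1]."""
    return abs(2 * x - 1)

fine = [Fraction(i, 10) for i in range(11)]   # contains the turning point 1/2
coarse = [Fraction(0), Fraction(1)]           # misses the turning point
```

With these partitions, `variation(tent, fine)` equals 2 and `variation(tent, coarse)` equals 0, which is why the definition takes a supremum over all partitions.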
Note that any function of bounded variation is in particular bounded. One can show
that any function of bounded variation f can be written as the difference g − h
of two bounded functions g and h, which are either both non-decreasing or both
non-increasing (you are asked to prove this in Exercise 5.7.1). More precisely, these
functions can be chosen, for x ∈ [0, 1], in the non-decreasing case to be
It is easy to see, and we leave it as an exercise, that in the non-decreasing case, var g =
g (1) − g (0) = var f and var h = h(1) − h(0) = var f + f (0) − f (1), and in the non-increasing
case, var g = var f and var h = var f − f (0) + f (1).
Proof. That $\widehat{G} f$ is bounded follows directly from the fact that $\sum_{i=1}^{\infty} p_i(x) = 1$ for all
x ∈ [0, 1]. For the proof of the remaining assertions, assume that f is non-decreasing
(for the non-increasing case consider −f instead) and let x, y ∈ [0, 1] be fixed such that
x < y. Then,
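\begin{aligned}
\widehat{G} f(x) - \widehat{G} f(y)
&= \sum_{i=1}^{\infty} p_i(x) f\Bigl( \frac{1}{x+i} \Bigr) - \sum_{i=1}^{\infty} p_i(y) f\Bigl( \frac{1}{y+i} \Bigr) \\
&= \sum_{i=1}^{\infty} (p_i(x) - p_i(y)) f\Bigl( \frac{1}{x+i} \Bigr) + \sum_{i=1}^{\infty} p_i(y) \Bigl( f\Bigl( \frac{1}{x+i} \Bigr) - f\Bigl( \frac{1}{y+i} \Bigr) \Bigr) \\
&\ge \sum_{i=1}^{\infty} (p_i(x) - p_i(y)) f\Bigl( \frac{1}{x+i} \Bigr) \\
&= \sum_{i=1}^{\infty} (p_i(x) - p_i(y)) \Bigl( f\Bigl( \frac{1}{x+i} \Bigr) - f\Bigl( \frac{1}{x+2} \Bigr) \Bigr) \ge 0,
\end{aligned}
where, in the final equality, and inequality, we have used the observations that
$\sum_{i=1}^{\infty} (p_i(x) - p_i(y)) = 0$, that p 1 is decreasing, and that p i is increasing for all i ≥ 3.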
Lemma 5.2.2. For each monotone and bounded function f : [0, 1] → R, we have that
\[ \operatorname{var} \widehat{G} f \le \frac{1}{2} \operatorname{var} f. \]
Proof. Using Lemma 5.2.1 and assuming first that f is non-decreasing, we obtain
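\begin{aligned}
\operatorname{var} \widehat{G} f = \widehat{G} f(0) - \widehat{G} f(1)
&= \sum_{i=1}^{\infty} p_i(0) f\Bigl( \frac{1}{i} \Bigr) - \sum_{i=1}^{\infty} p_i(1) f\Bigl( \frac{1}{1+i} \Bigr) \\
&= \frac{1}{2} f(1) - \sum_{i=2}^{\infty} (p_{i-1}(1) - p_i(0)) f\Bigl( \frac{1}{i} \Bigr) \\
&= \frac{1}{2} f(1) - \frac{1}{2} \sum_{i=2}^{\infty} \frac{2}{i(i+1)} f\Bigl( \frac{1}{i} \Bigr) \le \frac{1}{2} f(1) - \frac{1}{2} f(0) = \frac{1}{2} \operatorname{var} f,
\end{aligned}
where we used the facts that f is non-decreasing and that $(2/(i(i+1)))_{i \ge 2}$ is a
probability vector. For the non-increasing case, consider −f instead and observe that
var(−f ) = var f and $\widehat{G}(-f) = -\widehat{G} f$.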
Remark 5.2.3. Note that the constant 1/2 in the latter lemma is optimal. In order to see
this, choose f to be any non-decreasing function such that f |[0,1/2] = 0 and 0 < f (1) < ∞.
For this choice, the above calculation immediately shows that $\operatorname{var} \widehat{G} f = \frac{1}{2} \operatorname{var} f$.
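The optimality of the constant can be checked on a concrete function. The sketch below assumes the pointwise formula $\widehat{G} f(x) = \sum_{i \ge 1} p_i(x) f(1/(x+i))$ used in the calculations above and takes the illustrative choice f(x) = max(0, 2x − 1), which vanishes on [0, 1/2] and satisfies f(1) = 1. For this f only the term i = 1 contributes, so truncating the series introduces no error.

```python
from fractions import Fraction

def p(i, x):
    """Weights p_i(x) = (1 + x) / ((i + x)(i + 1 + x))."""
    return (1 + x) / ((i + x) * (i + 1 + x))

def f(x):
    """Illustrative non-decreasing function with f = 0 on [0, 1/2] and f(1) = 1."""
    return max(Fraction(0), 2 * x - 1)

def G_hat(func, x, terms=50):
    """Truncated series for the Gauss transfer operator (assumed pointwise form)."""
    return sum(p(i, x) * func(1 / (x + i)) for i in range(1, terms + 1))

var_f = f(Fraction(1)) - f(Fraction(0))
var_Gf = G_hat(f, Fraction(0)) - G_hat(f, Fraction(1))
```

Here `var_Gf` equals exactly one half of `var_f`, so the bound of Lemma 5.2.2 is attained.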
Now we are in a position to take the next step towards the proof that G is ψ-mixing,
namely, we will obtain a bound on the distance, arising from the supremum norm
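Proof. First observe that for each u ∈ [0, 1] and f ∈ BV, we have that
\[ |f(u)| - \Bigl| \int f \, dm_G \Bigr| \le \int |f(u) - f(x)| \, dm_G(x) \le \operatorname{var} f, \]
and hence, $\|f\|_\infty \le \bigl| \int f \, dm_G \bigr| + \operatorname{var} f$. It follows from (2.4) by setting f := 1 that
$\int \widehat{G}^n f \, dm_G = \int f \, dm_G$, and thus the above observation gives, for each n ∈ N,
\[ \Bigl\| \widehat{G}^n f - \int f \, dm_G \Bigr\|_\infty \le \operatorname{var}\Bigl( \widehat{G}^n f - \int f \, dm_G \Bigr) = \operatorname{var} \widehat{G}^n f. \]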
Lemma 5.2.5. For each B ∈ B, all m, n ∈ N0 and either every Gauss cylinder set C :=
C(x1 , . . . , x n ) of level n > 0 or for n = 0 and C := [0, 1], we have that
\[ \bigl| \lambda\bigl( G^{-n-m}(B) \cap C \bigr) - m_G(B) \lambda(C) \bigr| \le 2^{-m} \log 2 \cdot m_G(B) \lambda(C). \]
Proof. Using Lemma 5.2.4 and, as before, the relationship between $\widehat{G}$ and P G , we have
that
−n−m
λ G (B) ∩ C − m G (B) λ (C)
,
= 1 B ◦ G m+n · h−1
G · 1 C dm G − m B λ C
G ( ) ( )
,
= m+n h−1
1B · G 1
G C − λ ( C ) 1 B dm
G
, 1 1
1 m n −1 1
≤ 1 B dm G 1G G h G 1 C − λ ( C )1
∞
n h−1 n −1
≤ m G (B) 2−m 2 var G 1
G C − G h 1
G C (0) − G n
h −1
1
G C (1)
−1 n
= m G (B) 2−m 2 var h−1 n −1 n
G P G (1 C ) − h G P G (1 C ) (0) − h G P G (1 C ) (1)
Here, the final inequality can be seen as follows: Directly from the definition of P G ,
we have that P nG (1 C ) is equal to the derivative of the inverse branch of G n that maps
the unit interval onto the Gauss cylinder C. With p n /q n := [x1 , . . . , x n ], this derivative
is given for y ∈ [0, 1] by 1/(q n + yq n−1 )2 (the proof of this fact is left to Exercise 5.7.3).
Thus we obtain the formula
\[ h_G^{-1} P_G^n(1_C)(y) = \frac{\log 2 \, (1 + y)}{(q_n + y q_{n-1})^2}. \]
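The derivative formula for the inverse branch can be checked through difference quotients, which are exact for Möbius maps. The sketch assumes the standard convergent recurrences p k = x k p k−1 + p k−2 and q k = x k q k−1 + q k−2 and writes the inverse branch onto C(x1 , . . . , x n ) as y ↦ (p n + y p n−1 )/(q n + y q n−1 ); the function names are illustrative.

```python
from fractions import Fraction

def convergents(digits):
    """Return (p_{n-1}, q_{n-1}, p_n, q_n) for [x_1, ..., x_n]."""
    p_prev, q_prev, p, q = 1, 0, 0, 1  # (p_{-1}, q_{-1}, p_0, q_0)
    for x in digits:
        p_prev, q_prev, p, q = p, q, x * p + p_prev, x * q + q_prev
    return p_prev, q_prev, p, q

def inverse_branch(digits, y):
    """Inverse branch of G^n onto the cylinder C(x_1, ..., x_n), y a Fraction in [0, 1]."""
    p_prev, q_prev, p, q = convergents(digits)
    return (p + y * p_prev) / (q + y * q_prev)

digits = (2, 1, 3)                     # [2, 1, 3] = 4/11, with q_3 = 11, q_2 = 3
y1, y2 = Fraction(1, 3), Fraction(1, 2)
quotient = (inverse_branch(digits, y2) - inverse_branch(digits, y1)) / (y2 - y1)
length = abs(inverse_branch(digits, Fraction(1)) - inverse_branch(digits, Fraction(0)))
```

The absolute value of `quotient` is 1/((q n + y1 q n−1 )(q n + y2 q n−1 )) = 1/150, which tends to 1/(q n + y q n−1 )² as y1 , y2 → y, and `length` is 1/(q n (q n + q n−1 )) = 1/154, the Lebesgue measure of the cylinder.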
To shorten the notation, let us set $f_n := h_G^{-1} P_G^n(1_C)$. To obtain an upper bound on the
variation of this function, we first take the derivative:
\[ f_n'(y) = \frac{\log 2}{(q_n + y q_{n-1})^2} \Bigl( 1 - \frac{2 q_{n-1} (1 + y)}{q_n + y q_{n-1}} \Bigr), \]
and then observe that if x n = 1, this derivative is always negative (so the function
is monotonically decreasing), if x n ≥ 3, then the derivative is always positive (so the
function is monotonically increasing), and if x n = 2 we have that the function has a
maximum at the point $y = q_n/q_{n-1} - 2 \in (0, 1)$. Therefore, in the cases x n = 1 and x n ≥ 3,
we have immediately that
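\begin{aligned}
\operatorname{var}(f_n) = |f_n(0) - f_n(1)| &= \Bigl| \frac{\log 2}{q_n^2} - \frac{2 \log 2}{(q_n + q_{n-1})^2} \Bigr| \\
&\le \frac{\log 2}{q_n^2 + q_n q_{n-1}} \cdot \frac{\bigl| (q_{n-1}/q_n)^2 + 2 q_{n-1}/q_n - 1 \bigr|}{1 + q_{n-1}/q_n} \\
&\le \log 2 \, \lambda(C) \, \max\Bigl\{ \frac{|x^2 + 2x - 1|}{1 + x} : x \in [0, 1] \Bigr\} \\
&= \log 2 \, \lambda(C).
\end{aligned}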
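The value of the maximum in the last step can be confirmed numerically. The sketch below evaluates $|x^2 + 2x - 1|/(1 + x)$ exactly on a rational grid of [0, 1]; the grid size is an arbitrary choice, and the check is only a sanity test, not a proof.

```python
from fractions import Fraction

def g(x):
    """The function |x^2 + 2x - 1| / (1 + x) appearing in the bound."""
    return abs(x * x + 2 * x - 1) / (1 + x)

grid = [Fraction(i, 1000) for i in range(1001)]
grid_max = max(g(x) for x in grid)
```

On this grid the maximum is exactly 1 and is attained at both endpoints x = 0 and x = 1, consistent with the final equality in the display.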
Theorem 5.2.7. The system ([0, 1], B , m G , G) is ψ-mixing with respect to the partition
α H . More precisely, for all positive integers m, n ∈ N, any Gauss cylinder set C :=
C(x1 , . . . , x n ) of level n and every set B ∈ B, we have that
\[ \bigl| m_G\bigl( G^{-n-m} B \cap C \bigr) - m_G(B) \, m_G(C) \bigr| \le 2^{-m} \log 2 \cdot m_G(B) \, m_G(C). \]
Letting N tend to infinity in the above inequality and invoking Corollary 5.2.6 finishes
the proof of the theorem.
Remark 5.2.8. The history of ψ-mixing for the Gauss map is a rather long one, and
proving ψ-mixing for the Gauss map can be considered to be the first problem in the
metric theory of continued fractions. It originates in a letter which Gauss wrote to
Laplace on the 30th of January 1812, asking him to give an estimate of the error term
This problem, known as the generalised Gauss problem¹, remained open for a long
time, until R. O. Kuzmin [Kuz28] and P. Lévy [Lev29] independently and almost
simultaneously gave a solution. Kuzmin showed that $\rho_n \le c \kappa^{\sqrt{n}}$, for some positive
constants κ < 1 and c, whereas Lévy obtained the result that $\rho_n \le c \theta^n$, for positive
constants θ < 0.68 . . . and c. These results of Kuzmin and Lévy were then
followed by various improvements by several authors, among them W. Doeblin,
F. Schweiger and P. Szüsz. The currently most satisfying estimate has been obtained
by E. Wirsing [Wir74], who showed that the constant θ in Lévy’s estimate is equal to
0.30366300289873265860 . . ..
¹ Note that the actual Gauss problem was to show what we have obtained in Corollary 5.2.6.
5.3 Pointwise dual ergodicity for the Farey map
First let us recall from Chapter 2 the induced map FC1 of the Farey map on the
interval C1 := [1/2, 1], which was defined in Example 2.4.30. This map acts on points
[1, x2 , x3 , . . .] in C1 by
We showed in the proof of Proposition 2.4.31 that this induced system is measure-
theoretically isomorphic to the Gauss system. Given that G is ψ-mixing, as was shown
in the previous section (see Theorem 5.2.7), it follows immediately that the map FC1 is
also ψ-mixing. Before stating the first result, recall that pointwise dual ergodicity and
Darling–Kac sets were introduced in Definitions 4.2.1 and 4.3.1, respectively.
Lemma 5.3.1. The set C1 is a Darling–Kac set for the Farey map.
Proof. This follows directly from Proposition 4.3.3, in combination with the discussion
above.
Proof. Since, according to Lemma 5.3.1, the map F has a Darling–Kac set, the result
follows directly from Theorem 4.3.5.
Now we know that F is pointwise dual ergodic, we can obtain information about
wandering rates and return sequences. Let us denote the return sequence associated
to F by (v n )n≥1 . In order to determine the asymptotic type of (v n ), we must compute the
wandering rate (w n (C1 ))n≥1 , (see Definition 4.2.3), where we recall that the wandering
rate (w n (C)) for any C ∈ B is given by
\[ w_n(C) := \nu_F\Bigl( \bigcup_{k=1}^{n} F^{-(k-1)}(C) \Bigr). \]
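For C = C1 this quantity can be computed explicitly under one assumption that should be checked against Lemma 5.1.1: the union of the first n sum-level sets consists of the points whose first continued fraction digit is at most n, that is, the interval [1/(n + 1), 1]. With density h F (x) = 1/x this gives w n (C1 ) = log(n + 1). The sketch below sums the νF -measures of the intervals [1/(k + 1), 1/k]; the function name is illustrative.

```python
import math

def wandering_rate(n):
    """Sum of nu_F([1/(k+1), 1/k]) = log((k+1)/k) for k = 1, ..., n."""
    return math.fsum(math.log((k + 1) / k) for k in range(1, n + 1))

w_1000 = wandering_rate(1000)
w_2000 = wandering_rate(2000)
ratio = w_2000 / w_1000   # approaches 1 as n grows, but only logarithmically
```

The sum telescopes to log(n + 1), and the ratio w 2n /w n tends to 1, which is the slow variation referred to in the text.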
Next, observe that this wandering rate is slowly varying at infinity, where we recall
from Chapter 3 that this means
Lemma 5.3.3. For the map F, the return sequence (v n ) can be defined by setting
\[ v_n := \frac{n}{\log(n)}. \]
Proof. This follows on combining Proposition 4.2.8 with the fact that F is pointwise
dual ergodic.
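To make the combination explicit, here is a worked instance of the asymptotic formula derived in the proof of Proposition 4.2.8, under the assumption that the wandering rate satisfies w n (C1 ) ∼ log n, so that α = 0 and ψ = log in the notation used there.

```latex
\[
  w_n(\mathcal{C}_1) \sim n^{\alpha}\psi(n)
  \quad\text{with}\quad \alpha = 0,\ \psi(n) = \log n,
\]
\[
  v_n \sim \frac{n^{1-\alpha}}{\Gamma(2-\alpha)\,\Gamma(1+\alpha)\,\psi(n)}
      = \frac{n}{\Gamma(2)\,\Gamma(1)\,\log n}
      = \frac{n}{\log n}.
\]
```

Since a return sequence is only determined up to asymptotic equivalence, this justifies the choice of v n in the lemma.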
In this section, it will turn out to be helpful to have a formula for the transfer operator
$\widehat{F} := \widehat{F}_{\nu_F}$, as we had for $\widehat{G}$ in Section 5.2. Exactly as before, we have that for f ∈ L1 (νF ),
\[ \widehat{F}(f) = h_F^{-1} P_F(h_F f). \]
Now, we aim to obtain uniform sets for the system ([0, 1], B , νF , F). First recall the
general definition of a uniform set stated in Definition 4.2.4: Let (X, B , μ, T) be
pointwise dual ergodic with return sequence (r n ). Then, a set A ∈ B with positive, finite
measure is called a uniform set for f ∈ L+1 (μ) if
\[ \frac{1}{r_n} \sum_{k=0}^{n-1} \widehat{T}^k f \to \int f \, d\mu \quad \text{uniformly (mod } \mu\text{) on } A. \]
We introduce a set of functions D, in order to show that the set C1 is uniform for all of
the elements of D. So, define
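Proof. Let f ∈ D and consider the derivative of $\widehat{F}(f)$: By the monotonicity of f and f ′
we have
\[ (\widehat{F} f)'(x) = \underbrace{\frac{f'\bigl( \frac{x}{x+1} \bigr) - x f'\bigl( \frac{1}{x+1} \bigr)}{(x+1)^3}}_{>0} + \underbrace{\frac{f\bigl( \frac{1}{x+1} \bigr) - f\bigl( \frac{x}{x+1} \bigr)}{(x+1)^2}}_{>0}, \]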
5.4 Uniform and uniformly returning sets | 169
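\begin{aligned}
\frac{1}{v_n} \sum_{k=0}^{n} \widehat{F}^k f(1/2) \le \frac{1}{v_n} \sum_{k=0}^{n} \widehat{F}^k f(x) &\le \frac{1}{v_n} \sum_{k=0}^{n} \widehat{F}^k f(1) \\
&\le \frac{1}{v_n} \sum_{k=0}^{n+1} \widehat{F}^k f(1) \le \frac{1}{v_n} f(1) + \frac{1}{v_n} \sum_{k=0}^{n} \widehat{F}^k f(1/2)
\end{aligned}
Since $\lim_{n\to\infty} v_n^{-1} f(1) = 0$ the uniform convergence $v_n^{-1} \sum_{k=0}^{n} \widehat{F}^k f \to \int f \, d\nu_F$ follows
from pointwise dual ergodicity.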
We will now introduce a somewhat stronger notion than uniform sets, namely, that of
uniformly returning sets.
Remark 5.4.4. If A, B are sets of finite μ-measure and B is uniformly returning for 1 A ,
we immediately obtain a form of mixing for infinite systems, in the sense that
\[ w_n \, \mu\bigl( A \cap T^{-n}(B) \bigr) = w_n \int \widehat{T}^n(1_A) \cdot 1_B \, d\mu \to \mu(A)\mu(B). \]
looking at the problem of defining infinite mixing on the entire space from a more
physical point of view.
Proposition 5.4.5. For the Farey system ([0, 1], B , νF , F) we have that if v ∈ L1 (νF )
satisfies
\[ \lim_{n\to\infty} w_n \widehat{F}^n(v) = \int v \, d\nu_F \quad \text{almost everywhere uniformly on } \mathcal{C}_1, \]
n→∞
which gives
[Figure: curves labelled n = 1, 2, 4 and 9, plotted against x ∈ [0, 1].]
the unique element in A k such that F0 (x) = y. Using (5.1), the fact that $\widehat{F}(v) = h_F^{-1} P_F(h_F \cdot v)$
and the inductive hypothesis in tandem with the assumption that lim w n /w n+1 = 1, we
obtain that
\begin{aligned}
w_n \widehat{F}^n(v)(y) = w_n \widehat{F}^n(v)(F_0(x)) &= \frac{w_n \bigl( P_F^n(h_F \cdot v) \bigr)(F_0(x))}{h_F(F_0(x))} \\
&= \frac{w_n \bigl( P_F^{n+1}(h_F \cdot v) \bigr)(x) - |F_1'(x)| \cdot w_n \bigl( P_F^n(h_F \cdot v) \bigr)(F_1(x))}{h_F(F_0(x)) \cdot |F_0'(x)|} \\
&\sim \frac{h_F(x) - h_F(F_1(x)) \cdot |F_1'(x)|}{h_F(F_0(x)) \cdot |F_0'(x)|} \int v \, d\nu_F = \int v \, d\nu_F,
\end{aligned}
Let us now return to our collection D. To prove that C1 is indeed uniformly returning
for every element of D we need the following observation.
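\[ \widehat{F}^n f < \widehat{F}^{n-1} f, \quad \text{for all } n \in \mathbb{N}. \]
Proof. Fix f ∈ D. Again we use the fact that $\widehat{F} f(1) = f(1/2)$. Then for every x ∈ C1 we
have
\[ \widehat{F}^n f(x) \le \max\bigl\{ \widehat{F}^n f(x) : x \in \mathcal{C}_1 \bigr\} = \widehat{F}^n f(1) = \widehat{F}^{n-1} f(1/2) = \min\bigl\{ \widehat{F}^{n-1} f(x) : x \in \mathcal{C}_1 \bigr\} \le \widehat{F}^{n-1} f(x), \]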
Proposition 5.4.7. The set C1 is uniformly returning for every f ∈ D. That is, for all f ∈ D
we have
\[ \log(n) \, \widehat{F}^n(f) \to \int f \, d\nu_F, \quad \text{uniformly on } \mathcal{C}_1. \]
Proof. Let λ, η ∈ R be arbitrary fixed real numbers with 0 < λ < η < ∞. Putting
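\[ V_n := \sum_{k=0}^{n} \widehat{F}^k(f), \]
we have by the monotonicity of the sequence $\bigl( \widehat{F}^n(f)|_{\mathcal{C}_1} \bigr)_{n \in \mathbb{N}}$ that
\[ \frac{\widehat{F}^{\lfloor n\eta \rfloor}(f)}{V_n} \cdot (\lfloor n\eta \rfloor - \lfloor n\lambda \rfloor) \le \frac{V_{\lfloor n\eta \rfloor} - V_{\lfloor n\lambda \rfloor}}{V_n} \le \frac{\widehat{F}^{\lfloor n\lambda \rfloor}(f)}{V_n} \cdot (\lfloor n\eta \rfloor - \lfloor n\lambda \rfloor). \]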
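For each ε > 0 and all sufficiently large n we have
\[
  n(1 - \varepsilon)(\eta - \lambda) \le \lfloor n\eta \rfloor - \lfloor n\lambda \rfloor \le n(1 + \varepsilon)(\eta - \lambda).
\]
Hence,
\[
  \frac{n \widehat{F}^{\lfloor n\eta \rfloor}(f)}{V_n} \cdot (1 - \varepsilon)(\eta - \lambda)
  \le \frac{V_{\lfloor n\eta \rfloor} - V_{\lfloor n\lambda \rfloor}}{V_n}
  \le \frac{n \widehat{F}^{\lfloor n\lambda \rfloor}(f)}{V_n} \cdot (1 + \varepsilon)(\eta - \lambda).
\]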
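Since
\[
  \frac{V_{\lfloor n\eta \rfloor} - V_{\lfloor n\lambda \rfloor}}{V_n} \to \eta^{\alpha} - \lambda^{\alpha}
  \quad \text{as } n \to \infty \quad \mu\text{-a.e. uniformly on } C_1,
\]
we obtain on the one hand
\[
  \frac{1}{1 + \varepsilon} \cdot \frac{\eta^{\alpha} - \lambda^{\alpha}}{\eta - \lambda}
  \le \liminf_{n\to\infty} \frac{n \widehat{F}^{\lfloor n\lambda \rfloor}(f)}{V_n}
  \quad \mu\text{-a.e. uniformly on } C_1.
\]
Letting η ↘ λ and ε ↘ 0, this gives
\[
  \alpha \lambda^{\alpha - 1} \le \liminf_{n\to\infty} \frac{n \widehat{F}^{\lfloor n\lambda \rfloor}(f)}{V_n}
  \quad \mu\text{-a.e. uniformly on } C_1.
\]
On the other hand, we obtain similarly
\[
  \limsup_{n\to\infty} \frac{n \widehat{F}^{\lfloor n\eta \rfloor}(f)}{V_n} \le \alpha \eta^{\alpha - 1}
  \quad \mu\text{-a.e. uniformly on } C_1.
\]
Since λ and η were arbitrary, it follows for every c > 0 that
\[
  \frac{n \widehat{F}^{\lfloor nc \rfloor}(f)}{V_n} \to \alpha c^{\alpha - 1}
  \quad \mu\text{-a.e. uniformly on } C_1.
\]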
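Finally, using $V_{\lfloor nc \rfloor} \sim c^{\alpha} V_n$ μ-a.e. uniformly on $C_1$ and $\lfloor nc \rfloor \sim cn$, we obtain for
$m = \lfloor nc \rfloor$
\[
  \frac{m \widehat{F}^{m}(f)}{V_m}
  = \frac{\lfloor nc \rfloor}{n} \cdot \frac{n \widehat{F}^{\lfloor nc \rfloor}(f)}{V_n} \cdot \frac{V_n}{V_{\lfloor nc \rfloor}}
  \to \alpha \quad \mu\text{-a.e. uniformly on } C_1.
\]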
Remark 5.4.8. The material developed here for the set of functions D can be found
in greater generality in the context of operator renewal theory in various works,
including [Gou11, Gou04, MT15, MT12, Sar02, KKSS15, KKS15].
5.5 Finer asymptotics of Lebesgue measure of sum-level sets
Using the results obtained in the previous sections, we are now in a position to state
and prove our first result on the finer asymptotics of the sum-level sets.
Theorem 5.5.1.
\[
  \sum_{k=1}^{n} \lambda(C_k) \sim \frac{n}{\log_2 n}.
\]
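Proof. First, recall from Lemma 5.1.1 that $C_k = F^{-(k-1)}(C_1)$. Therefore,
\begin{align*}
  \frac{1}{v_n} \cdot \sum_{k=1}^{n} \lambda(C_k)
  &= \frac{1}{v_n} \int \sum_{k=0}^{n-1} 1_{C_1} \circ F^k \, d\lambda \\
  &= \frac{1}{v_n} \int \sum_{k=0}^{n-1} 1_{C_1} \circ F^k \cdot h_F^{-1} \, d\nu_F \\
  &= \int 1_{C_1} \, \frac{1}{v_n} \sum_{k=0}^{n-1} \widehat{F}^k(h_F^{-1}) \, d\nu_F.
\end{align*}
Since $C_1$ is a uniform set for $h_F^{-1} \in D$ we have that
$\lim_{n\to\infty} v_n^{-1} \sum_{k=0}^{n-1} \widehat{F}^k(h_F^{-1}) = \int h_F^{-1} \, d\nu_F = 1$ a.e. uniformly on $C_1$, and so
\[
  \lim_{n\to\infty} \frac{1}{v_n} \cdot \sum_{k=1}^{n} \lambda(C_k) = \log 2.
\]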
Our second, and final, theorem concerning the Lebesgue measure of the sum-level
sets gives a significant improvement of Theorem 5.1.4 and Theorem 5.5.1. That is, by
increasing the dosage of infinite ergodic theory, we are able to obtain the following
sharp estimate for the asymptotic behaviour of the Lebesgue measure of the sum-level
sets.
Theorem 5.5.2.
\[
  \lambda(C_n) \sim \frac{\log 2}{\log n}.
\]
Proof. We use again the fact from Lemma 5.1.1 that $C_k = F^{-(k-1)}(C_1)$. Therefore,
\begin{align*}
  w_n \cdot \lambda(C_n) &= \int w_n \, 1_{C_1} \circ F^n \, d\lambda \\
  &= \int w_n \, 1_{C_1} \circ F^n \cdot h_F^{-1} \, d\nu_F \\
  &= \int 1_{C_1} \, w_n \widehat{F}^n(h_F^{-1}) \, d\nu_F.
\end{align*}
a. e. uniformly on C1 , and so
Remark 5.5.3. We refer the interested reader to [Hee15] for an effective bound on the
error term of this asymptotic.
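To get a feeling for how slowly this asymptotic sets in, the measures λ(C_n) can be computed exactly for small n. The sketch below is an illustration under stated assumptions: it takes C_1 = [1/2, 1], uses C_n = F^{-(n-1)}(C_1) from Lemma 5.1.1, and pulls intervals back through the inverse branches F_0(x) = x/(1+x) and F_1(x) = 1/(1+x). The function name sum_level_measure is hypothetical.

```python
from fractions import Fraction
from math import log


def sum_level_measure(n):
    """Exact Lebesgue measure of C_n = F^{-(n-1)}(C_1), with C_1 = [1/2, 1].

    The preimage of an interval [a, b] under the Farey map is the union of
    F_0([a, b]) and F_1([a, b]); these overlap in at most one point, so the
    total length is the sum of the lengths.
    """
    intervals = [(Fraction(1, 2), Fraction(1))]
    for _ in range(n - 1):
        pulled_back = []
        for a, b in intervals:
            pulled_back.append((a / (1 + a), b / (1 + b)))  # F_0 is increasing
            pulled_back.append((1 / (1 + b), 1 / (1 + a)))  # F_1 is decreasing
        intervals = pulled_back
    return sum(b - a for a, b in intervals)


if __name__ == "__main__":
    # Compare lambda(C_n) with log(2)/log(n); the number of intervals is 2**(n-1).
    for n in (2, 4, 8, 12):
        exact = sum_level_measure(n)
        print(n, float(exact), log(2) / log(n))
```

Since the number of intervals doubles at each step, this brute-force approach is limited to small n, and because the convergence is only logarithmic the two columns should not be expected to agree closely in this range.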
Let us now give an application of the above theorem to elementary metrical
Diophantine analysis. We first state the following result, which is a consequence of
Theorem 1.2.19, given in Section 1.2.4.
Theorem 5.5.4. For λ-almost every $x = [x_1, x_2, x_3, \ldots] \in [0, 1]$, we have that
\[
  \limsup_{n\to\infty} \frac{\log(x_n/n)}{\log\log n} = 1.
\]
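Proof. In light of Corollary 1.2.20, for λ-a.e. $x = [x_1, x_2, x_3, \ldots] \in [0, 1]$ we have that for
all ε > 0 and all sufficiently large n ∈ N,
\[
  x_n < n (\log n)^{1+\varepsilon}.
\]
Taking logarithms of each side, it follows that for all sufficiently large n ∈ N,
\[
  \log(x_n) < \log(n) + (1 + \varepsilon) \log\log(n),
\]
or, rewriting this expression, again for all sufficiently large n ∈ N, we have
\[
  \frac{\log(x_n/n)}{\log\log(n)} < 1 + \varepsilon.
\]
Hence,
\[
  \limsup_{n\to\infty} \frac{\log(x_n/n)}{\log\log(n)} \le 1 + \varepsilon.
\]
Since ε > 0 was arbitrary, it follows that
\[
  \limsup_{n\to\infty} \frac{\log(x_n/n)}{\log\log(n)} \le 1.
\]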
On the other hand, it also follows from Corollary 1.2.20 that for Lebesgue-almost every
x = [x1 , x2 , x3 , . . .] ∈ [0, 1], we have that for infinitely many n ∈ N,
x n > n log(n).
Then,
\[
  1 \le \limsup_{n\to\infty} \frac{\log(x_n/n)}{\log\log(n)}.
\]
Proposition 5.5.5. For λ-almost every $x = [x_1, x_2, x_3, \ldots] \in [0, 1]$, we have that
\[
  \limsup_{n\to\infty} \frac{\log\big(x_{n+1} / \sum_{i=1}^{n} x_i\big)}{\log\log\big(\sum_{i=1}^{n} x_i\big)} \le 0.
\]
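\[
  \mathcal{A}^{\varepsilon}_n := \bigcup_{k\in\mathbb{N}} \Big\{ C(x_1, \ldots, x_{k+1}) : \sum_{i=1}^{k} x_i = n, \ x_{k+1} \ge n(\log n)^{\varepsilon} \Big\}
\]
and define
\[
  A^{\varepsilon}_n := \bigcup_{C \in \mathcal{A}^{\varepsilon}_n} C.
\]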
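\begin{align*}
  \lambda(A^{\varepsilon}_n)
  &= \sum_{k=1}^{n} \ \sum_{\substack{(x_1, \ldots, x_k) \\ \sum_{i=1}^{k} x_i = n}} \ \sum_{x_{k+1} \ge n(\log n)^{\varepsilon}} \lambda(C(x_1, \ldots, x_k, x_{k+1})) \\
  &\asymp \sum_{k=1}^{n} \ \sum_{\substack{(x_1, \ldots, x_k) \\ \sum_{i=1}^{k} x_i = n}} \frac{\lambda(C(x_1, \ldots, x_k))}{n(\log n)^{\varepsilon}} \\
  &= \frac{1}{n(\log n)^{\varepsilon}} \sum_{k=1}^{n} \ \sum_{\substack{(x_1, \ldots, x_k) \\ \sum_{i=1}^{k} x_i = n}} \lambda(C(x_1, \ldots, x_k)) \\
  &= \frac{\lambda(C_n)}{n(\log n)^{\varepsilon}} \sim \frac{\log 2}{n(\log n)^{1+\varepsilon}}.
\end{align*}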
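Hence, since the above calculation implies that the series $\sum_{n=1}^{\infty} \lambda(A^{\varepsilon}_n)$ converges, a
straightforward application of the Borel–Cantelli Lemma then yields, where
$A^{\varepsilon}_{\infty} := \limsup_{n\to\infty} A^{\varepsilon}_n$, that
\[
  \lambda(A^{\varepsilon}_{\infty}) = 0, \quad \text{for each } \varepsilon > 0.
\]
On considering the complement of the set $A^{\varepsilon}_{\infty}$ in [0, 1], we have now shown that, for
each ε > 0 and for λ-almost all $x = [x_1, x_2, x_3, \ldots]$,
\[
  x_{k+1} < \sum_{i=1}^{k} x_i \, \Big( \log \sum_{i=1}^{k} x_i \Big)^{\varepsilon}, \quad \text{for all } k \in \mathbb{N} \text{ sufficiently large.}
\]
By taking logarithms on both sides of the above inequality, we obtain, for all
sufficiently large k ∈ N, that
\[
  \frac{\log(x_{k+1}) - \log \sum_{i=1}^{k} x_i}{\log\log \sum_{i=1}^{k} x_i}
  = \frac{\log\big(x_{k+1} / \sum_{i=1}^{k} x_i\big)}{\log\log \sum_{i=1}^{k} x_i} < \varepsilon.
\]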
Remark 5.5.6. Using these ideas, there are other results related to continued fractions
and Diophantine analysis that can be obtained. For instance, for the random variable
\[
  X_n(x) := \max\Big\{ \sum_{i=1}^{k} x_i : \sum_{i=1}^{k} x_i \le n, \ k \in \mathbb{N}_0 \Big\}, \quad x \in I,
\]
the process n − X n is investigated in [KS08a]. In that paper, a uniform law and large
deviation law are derived.
For further interesting results in the context of continued fraction digit sums we
also refer to [GLJ93, GLJ96], wherein alternating sums of continued fraction digits are
considered.
5.6 Uniform distribution of the even Stern–Brocot sequence
Let us begin this section by recalling the concept of weak convergence of probability
measures. We say a sequence (μ n )n∈N of Borel probability measures on (R, B)
converges weakly to a Borel probability measure μ if for all f ∈ Cb (R) we have
\[
  \lim_{n\to\infty} \int f \, d\mu_n = \int f \, d\mu.
\]
For this we write w-limn μ n = μ. There are different effective ways to check weak
convergence despite the difficulty of trying to directly use the definition. We will need
the following two characterisations, which can be found in the standard literature
on probability, for example in [Gut13]. We recall that $\Delta_{\mu} : x \mapsto \mu((-\infty, x])$ denotes the
distribution function of μ.
– (Distribution function) $\operatorname{w-lim}_n \mu_n = \mu$ if and only if $\lim_{n\to\infty} \Delta_{\mu_n}(x) = \Delta_{\mu}(x)$ for every
x ∈ R that is a continuity point of $\Delta_{\mu}$.
– (Method of Moments) For a probability measure μ, the function
\[
  t \mapsto \int \exp(t \cdot x) \, d\mu(x)
\]
is called the moment generating function (you will see why in Exercise 5.7.6). We
have that if $\int \exp(t \cdot x) \, d\mu_n(x) \to \int \exp(t \cdot x) \, d\mu(x) \in \mathbb{R}$ for n tending to infinity and for all
t in a neighbourhood of 0, then $\operatorname{w-lim}_n \mu_n = \mu$.
Let δ x denote the Dirac measure in x, that is δ x (A) = 1 for x ∈ A and δ x (A) = 0 otherwise.
Our aim in this section is to prove the following theorem:
Theorem 5.6.1. For each rational number v/w ∈ (0, 1] we have that
\[
  \operatorname*{w-lim}_{n\to\infty} \ \log(n^{vw}) \sum_{p/q \in F^{-n}\{v/w\}} q^{-2} \, \delta_{p/q} = \lambda. \tag{5.2}
\]
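The atomic measures in (5.2) are easy to generate for small n, which may help in reading the statement. The following Python sketch is an illustration only. It assumes that the normalising factor is to be read as log(n^{vw}) = vw · log n, that p/q denotes a reduced fraction, and that the preimages F^{-n}{v/w} are obtained by applying the inverse branches F_0(x) = x/(1+x) and F_1(x) = 1/(1+x) n times. The function names are hypothetical.

```python
from fractions import Fraction
from math import log


def farey_preimages(x, n):
    """All 2**n points of F^{-n}{x}, via the inverse branches F_0 and F_1."""
    points = [x]
    for _ in range(n):
        points = [y for p in points for y in (p / (1 + p), 1 / (1 + p))]
    return points


def atom_mass_below(target, n, x):
    """Exact sum of q**(-2) over the atoms p/q in F^{-n}{target} with p/q <= x.

    Fraction keeps p/q in lowest terms, so .denominator is the reduced q.
    """
    return sum(Fraction(1, p.denominator ** 2)
               for p in farey_preimages(target, n) if p <= x)


def normalised_distribution(v, w, n, x):
    """Value at x of the distribution function of the measure in (5.2),
    ASSUMING the weight log(n**(v*w)) = v*w*log(n)."""
    return v * w * log(n) * float(atom_mass_below(Fraction(v, w), n, x))


if __name__ == "__main__":
    # For v/w = 1/2 the values should approach x (the distribution function
    # of Lebesgue measure on [0, 1]), but only at a logarithmic rate.
    for n in (4, 8, 12):
        print(n, [round(normalised_distribution(1, 2, n, k / 4), 3) for k in range(1, 5)])
```

Only small n are feasible here, since the number of atoms is 2^n, and the slow convergence means the printed values give a qualitative picture at best.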
In order to prove this result, we first prove the following proposition, and then one
further lemma, which will allow us to transfer the result stated below for intervals to
the atomic measures considered in Theorem 5.6.1.
Proof. Consider the family of functions $(\varphi_t)_{t\in[-1,1]}$ given by $\varphi_t : x \mapsto x \cdot \exp(t \cdot x)$. The
first aim is to show that for all t ∈ [−1, 1] we have
\[
  \widehat{F}\varphi_t \in D.
\]
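Indeed, for t ∈ [−1, 0] this is an immediate consequence of $\widehat{F}(D) \subset D$ (see Lemma 5.4.1)
by noting that $\varphi_t$ is increasing, concave with $\varphi_t(0) = 0$, that is $\varphi_t \in D$. For t ∈ (0, 1],
a straightforward computation shows that the first derivative of $\widehat{F}\varphi_t$ at x ∈ [0, 1] is
given by
\[
  (\widehat{F}\varphi_t)'(x)
  = \frac{\varphi_t'\big(\frac{x}{x+1}\big) - x \varphi_t'\big(\frac{1}{x+1}\big)}{(x+1)^3}
  + \frac{\varphi_t\big(\frac{1}{x+1}\big) - \varphi_t\big(\frac{x}{x+1}\big)}{(x+1)^2}.
\]
For the second derivative we then obtain
\begin{align*}
  (\widehat{F}\varphi_t)''(x)
  &= \frac{\big(-2xt - 6x + 2t + xt^2 + 2x^3 - 4tx^2 - 4\big) \exp\big(\frac{tx}{x+1}\big)}{(x+1)^6} \\
  &\quad + \frac{\big(2tx - 6x - 2t + xt^2 + 2x^3 + 4tx^2 - 4\big) \exp\big(\frac{t}{x+1}\big)}{(x+1)^6}.
\end{align*}
This immediately implies that $(\widehat{F}\varphi_t)'' \le 0$, for all t ∈ (0, 1]. Hence, $\widehat{F}\varphi_t$ is concave and
we have that $(\widehat{F}\varphi_t)'$ is decreasing on [0, 1]. Since $(\widehat{F}\varphi_t)'(1) = 0$, this shows that on
[0, 1] we have that $(\widehat{F}\varphi_t)' \ge 0$. Hence, $\widehat{F}\varphi_t \in D$, for all t ∈ [−1, 1].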
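The derivative computations above can be sanity-checked numerically. The following Python sketch is an illustration and not a substitute for the computation. It assumes the explicit form F-hat f(x) = (f(x/(1+x)) + x f(1/(1+x)))/(1+x) for the transfer operator, compares the stated second-derivative formula with a central finite difference, and checks its sign on a grid. The function names and the grid are hypothetical choices.

```python
import math


def phi(t, x):
    """phi_t(x) = x * exp(t * x)."""
    return x * math.exp(t * x)


def transfer_phi(t, x):
    """(F-hat phi_t)(x) under the ASSUMED explicit formula for F-hat."""
    return (phi(t, x / (1 + x)) + x * phi(t, 1 / (1 + x))) / (1 + x)


def second_derivative_formula(t, x):
    """The second derivative of F-hat phi_t as stated in the text."""
    s = 1 + x
    p1 = -2 * x * t - 6 * x + 2 * t + x * t**2 + 2 * x**3 - 4 * t * x**2 - 4
    p2 = 2 * t * x - 6 * x - 2 * t + x * t**2 + 2 * x**3 + 4 * t * x**2 - 4
    return (p1 * math.exp(t * x / s) + p2 * math.exp(t / s)) / s**6


def central_second_difference(t, x, h=1e-4):
    """Finite-difference approximation of the second derivative in x."""
    return (transfer_phi(t, x + h) - 2 * transfer_phi(t, x)
            + transfer_phi(t, x - h)) / h**2


def check_grid(tol=1e-5):
    """Return True if formula and finite difference agree and are <= 0 on a grid."""
    for i in range(1, 11):          # t = 0.1, 0.2, ..., 1.0
        t = i / 10
        for j in range(1, 20):      # x = 0.05, 0.10, ..., 0.95
            x = j / 20
            exact = second_derivative_formula(t, x)
            if abs(exact - central_second_difference(t, x)) > tol or exact > 0:
                return False
    return True


if __name__ == "__main__":
    print(check_grid())
```

A grid check of this kind cannot prove concavity, but it is a quick way to catch sign or transcription errors in a formula of this length.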
We proceed by noting that Proposition 5.4.7 combined with Proposition 5.4.5
guarantees that every compact interval contained in (0, 1] is a uniformly returning set
for φ t , for each t ∈ [−1, 1]. In order to complete the proof of the proposition, we employ
the method of moments as follows. For each [a, b] ⊂ (0, 1] and for each t ∈ [−1, 1], we
have
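\begin{align*}
  &\lim_{n\to\infty} \int \exp(tx) \cdot \frac{\log n}{\nu_F([a, b])} \cdot 1_{F^{-n}[a,b]}(x) \, d\lambda(x) \\
  &\quad = \lim_{n\to\infty} \frac{\log n}{\nu_F([a, b])} \cdot \nu_F\big(\varphi_t \cdot 1_{F^{-n}[a,b]}\big)
  = \lim_{n\to\infty} \frac{\log n}{\nu_F([a, b])} \cdot \nu_F\big(\widehat{F}^n \varphi_t \cdot 1_{[a,b]}\big) \\
  &\quad = \nu_F(\varphi_t) = \int \exp(tx) \, d\lambda(x).
\end{align*}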
For the next lemma we introduce the notation Φ for the free semi-group generated by
the inverse branches $F_0$ and $F_1$ of the Farey map F. Note that for each rational number
v/w ∈ (0, 1] we have that
\[
  \bigcup_{n\in\mathbb{N}} F^{-n}\{v/w\} = \big\{ g(v/w) : g \in \Phi \big\}.
\]
Moreover, note that the Φ-orbit of 1 is equal to the set of rational numbers contained in
(0, 1). (Note that this is just a slightly different way of repeating what we already knew
from Section 1.3, when obtaining the Farey coding of the rational numbers.) Then if we
associate matrices to the inverse branches $F_0 : x \mapsto \frac{x}{1+x}$ and $F_1 : x \mapsto \frac{1}{1+x}$, and observe
that
\[
  \left\{ \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} \right\}
  \subset \mathrm{GL}_2(\mathbb{Z}) := \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} : a, b, c, d \in \mathbb{Z}, \ |ad - bc| = 1 \right\},
\]
then to each g ∈ Φ we can associate a matrix $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ from $\mathrm{GL}_2(\mathbb{Z})$. The action of g on
$\mathbb{C}$ is given by $g : z \mapsto \frac{az+b}{cz+d}$. Thus g(1) = v/w, for some v, w ∈ N such that v < w and
gcd(v, w) = 1. Furthermore, for the modulus of the derivative of g at x we have that
$|g'(x)| = |cx + d|^{-2}$. To make this more precise, see also Exercise 5.7.5.
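The correspondence between words in the inverse branches and integer matrices can be made concrete as follows. This Python sketch is an illustration with hypothetical names; it represents an element g of Φ by a tuple of digits (0 for F_0, 1 for F_1), multiplies the corresponding matrices, and checks that g(1) is a reduced fraction v/w with |g'(1)| = w^{-2}.

```python
from fractions import Fraction
from itertools import product

# Matrices associated with the inverse branches F_0 and F_1.
BRANCH_MATRICES = {0: ((1, 0), (1, 1)), 1: ((0, 1), (1, 1))}


def matmul(A, B):
    """Product of two 2x2 integer matrices given as nested tuples."""
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))


def word_matrix(word):
    """Matrix of g = F_{word[0]} o F_{word[1]} o ... (composition of branches)."""
    M = ((1, 0), (0, 1))
    for digit in word:
        M = matmul(M, BRANCH_MATRICES[digit])
    return M


def mobius(M, x):
    """Action z -> (a z + b) / (c z + d) of the matrix M."""
    (a, b), (c, d) = M
    return (a * x + b) / (c * x + d)


def derivative_modulus(M, x):
    """|g'(x)| = |ad - bc| / (c x + d)^2, which is (c x + d)^(-2) here."""
    (a, b), (c, d) = M
    return abs(a * d - b * c) / (c * x + d) ** 2


def apply_branches(word, x):
    """Evaluate the same composition directly, innermost branch first."""
    for digit in reversed(word):
        x = x / (1 + x) if digit == 0 else 1 / (1 + x)
    return x


ONE = Fraction(1)
for length in range(1, 7):
    for word in product((0, 1), repeat=length):
        M = word_matrix(word)
        value = mobius(M, ONE)
        assert value == apply_branches(word, ONE)
        # Fraction is stored in lowest terms, so .denominator is w.
        assert derivative_modulus(M, ONE) == Fraction(1, value.denominator ** 2)
```

Working with matrices rather than nested function calls has the advantage that the derivative is read off directly from the bottom row (c, d).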
In the following we let Uε (x) denote the interval centred at x ∈ R of Euclidean
diameter diam(Uε (x)) equal to ε > 0.
Lemma 5.6.3. For each g ∈ Φ there exists a constant $C_g$ such that for all ε > 0 sufficiently
small and for all h ∈ Φ, we have
\[
  \big| \operatorname{diam}(h(U_{\varepsilon}(g(1)))) - \varepsilon |h'(g(1))| \big| \le \varepsilon C_g \operatorname{diam}(h(U_{\varepsilon}(g(1)))).
\]
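Proof. First we will prove the following bounded distortion property. Using
Exercise 5.7.5, we know that for each g ∈ Φ we can find m, n ∈ N such that $|g'(z)| = |mz + n|^{-2}$.
Now fix z ∈ (0, 1) and let 0 < ε < z/2 and $x, y \in U_{\varepsilon}(z)$. Then
\begin{align*}
  \sup_{g\in\Phi} \left| \frac{|g'(x)|}{|g'(y)|} - 1 \right|
  &\le \sup_{m,n\in\mathbb{N}} \left| \frac{(my + n)^2}{(mx + n)^2} - 1 \right|
  \le \sup_{m,n\in\mathbb{N}} \frac{\big| m^2 (y^2 - x^2) + 2nm(y - x) \big|}{(mx + n)^2} \\
  &\le \sup_{m,n\in\mathbb{N}} \frac{2m^2 + 2nm}{(mz/2 + n)^2} \, |y - x|
  \le \frac{8}{z^2} \, |x - y| \sup_{m,n\in\mathbb{N}} \frac{m^2 + mn}{(m + 2n)^2} \\
  &\le \frac{16}{z^2} \, |x - y|.
\end{align*}
Here, the last inequality can be seen by treating the two cases m ≤ n and m > n
separately. Now, fix g ∈ Φ. Then we have, for $0 < \varepsilon < g(1)^2/32$ and each h ∈ Φ,
\begin{align*}
  \left| \frac{\varepsilon |h'(g(1))|}{\operatorname{diam}(h(U_{\varepsilon}(g(1))))} - 1 \right|
  &\le \left| \frac{1}{\frac{1}{\varepsilon} \int_{g(1)-\varepsilon/2}^{g(1)+\varepsilon/2} \Big( \frac{|h'(\eta)|}{|h'(g(1))|} - 1 \Big) \, d\eta + 1} - 1 \right| \\
  &\le \frac{1}{1 - 16\varepsilon/g(1)^2} - 1 \le \frac{32\varepsilon}{g(1)^2}.
\end{align*}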
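By Proposition 5.6.2, we then have that $\operatorname{w-lim}_{n\to\infty} \nu_{g,\varepsilon,n} = \lambda$. Then observe that
\[
  \lim_{\varepsilon \searrow 0} \varepsilon u_{g,\varepsilon}
  = \lim_{\varepsilon \searrow 0} \frac{\varepsilon}{\log \dfrac{g(1) + \varepsilon/2}{g(1) - \varepsilon/2}} = g(1), \tag{5.3}
\]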
Using Lemma 5.6.3, we now obtain the following for all x ∈ [0, 1], where $\Delta^{(\nu)}_{g,\varepsilon,n}$ and
$\Delta^{(\rho)}_{g,n}$ denote the distribution functions of the measures $\nu_{g,\varepsilon,n}$ and $\rho_{g,n}$, respectively.
(ν) (ρ)
∆ g,ε,n (x) − ∆ g,n (x)
1
≤ log n εu g,ε diam(h(Uε (g(1))) − εg(1) |h (g(1))| + log n
ε h∈Φ:
n2
hg(1)∈F −(n−1) {g(1)}
1
≤ ε g(1) C g + |εu g,ε − g(1)| u g,ε log n |f (1)|
εu g,ε f ∈Φ:
f (1)∈F −(n−1) {g(1)}
n→∞ 1
→ ε g(1) C g + |εu g,ε − g(1)| ,
g(1)
where the convergence follows from Proposition 5.6.2 and (5.3). This inequality holds
for all x ∈ [0, 1], n ∈ N and the right-hand side vanishes for ε → 0. Hence, using the
condition for weak convergence in terms of distribution functions, we obtain that
\[
  \operatorname*{w-lim}_{n\to\infty} \rho_{g,n} = \lambda.
\]
The proof of Theorem 5.6.1 now follows, if we use in the definition of $\rho_{g,n}$ the fact that
g(1) can be written in the form of a reduced fraction v/w and that then $|g'(1)| = w^{-2}$,
as well as similarly, that f(1) can be written in the form of a reduced fraction p/q and
that then $|f'(1)| = q^{-2}$ (cf. Exercise 5.7.8).
Finally, let us recall here the even Stern–Brocot sequence which was first defined
in Section 1.3.1. For each n ≥ 0, the n-th member of the Stern–Brocot sequence is
denoted by Bn and the n-th member of the even Stern–Brocot sequence is defined
to be Sn := Bn \ Bn−1 . Recalling that Sn is exactly the set F −n ({1/2}), we obtain the
following immediate corollary.
5.7 Exercises
Exercise 5.7.1. Show that any function of bounded variation f can be written as
the difference g − h of two bounded functions g and h, which are either both
non-decreasing or both non-increasing.
Exercise 5.7.2. Show that if $p_i(x) := (1 + x)/((i + x)(i + 1 + x))$ and $f \in L^1(\lambda)$, then
\[
  \widehat{G} f(x) = \sum_{i=1}^{\infty} p_i(x) \, f\Big( \frac{1}{i + x} \Big).
\]
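Exercise 5.7.5. Let $\mathrm{GL}_2(\mathbb{Z})$ denote the group of invertible 2 × 2 matrices over the integers and let
$\operatorname{Aut}(\widehat{\mathbb{C}})$ denote the group of automorphisms of $\widehat{\mathbb{C}}$, that is, the set of bi-holomorphic
mappings of $\widehat{\mathbb{C}}$. Show that the map
\[
  \mathrm{GL}_2(\mathbb{Z}) \to \operatorname{Aut}(\widehat{\mathbb{C}}), \quad
  \begin{pmatrix} a & b \\ c & d \end{pmatrix} \mapsto \Big( z \mapsto \frac{az + b}{cz + d} \Big)
\]
defines a group homomorphism and determine its kernel. Further, show that for every
element $\varphi : z \mapsto \frac{az+b}{cz+d}$ in the image of this homomorphism we have $|\varphi'(z)| = |cz + d|^{-2}$.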
Exercise 5.7.6. The function $t \mapsto \int \exp(t \cdot x) \, d\mu(x)$ is called the moment generating
function of the probability distribution μ. Show that the k-th derivative of this function
in 0 can be used to find the k-th moment $M_k(\mu) := \int x^k \, d\mu(x)$, $k \in \mathbb{N}_0$, of μ, if it exists.
Exercise 5.7.7. Show that for each g ∈ Φ there exists a constant $\Delta_g$ such that for all ε > 0
sufficiently small and for all h ∈ Φ, we have
\[
  \big| \operatorname{diam}(h(U_{\varepsilon}(g(1)))) - \varepsilon |h'(g(1))| \big| \le \varepsilon^2 \, |(hg)'(1)| \, \Delta_g.
\]
Exercise 5.7.8. Show that for each p/q ∈ (0, 1) such that gcd(p, q) = 1, there exists a
unique element f ∈ Φ with f(1) = p/q, and we have $|f'(1)| = q^{-2}$.
Exercise 5.7.9. Prove that for all a, b ∈ (0, 1) with a < b we have
\[
  \lim_{n\to\infty} \log(n) \, \lambda\big( F^{-n}([a, b]) \big) = \log(b/a).
\]
Bibliography
[Aar81] J. Aaronson. The asymptotic distributional behaviour of transformations preserving
infinite measures. J. Anal. Math., 39:203–234, 1981.
[Aar97] J. Aaronson. An introduction to infinite ergodic theory, volume 50 of Mathematical
Surveys and Monographs. American Mathematical Society, Providence, RI, 1997.
[Adl98] R. L. Adler. Symbolic dynamics and Markov partitions. Bull. Amer. Math. Soc. (N.S.),
35(1):1–56, 1998.
[BBDK96] J. Barrionuevo, R. M. Burton, K. Dajani, and C. Kraaikamp. Ergodic properties of
generalized Lüroth series. Acta Arith., 74(4):311–327, 1996.
[Ber12a] F. Bernstein. Über eine Anwendung der Mengenlehre auf ein aus der Theorie der
säkularen Störungen herrührendes Problem. Math. Ann., 71:417–439, 1912.
[Ber12b] F. Bernstein. Über geometrische Wahrscheinlichkeit und über das Axiom der
beschränkten Arithmetisierbarkeit der Beobachtungen. Math. Ann., 72(4):585–587, 1912.
[Bir31] G. D. Birkhoff. Proof of the ergodic theorem. Proc. Natl. Acad. Sci. USA, 17:656–660,
1931.
[Bor09] E. Borel. Les probabilités denombrables et leurs applications arithmétiques. Rend. Circ.
Mat. Palermo, 27:247–271, 1909.
[Bro61] A. Brocot. Calcul des rouages par approximation, nouvelle méthode. Revue
chronométrique, 3:186–194, 1861.
[CO60] R. V. Chacon and D. S. Ornstein. A general ergodic theorem. Illinois J. Math., 4:153–160,
1960.
[Coh80] D. L. Cohn. Measure theory. Birkhäuser, Boston, Mass., 1980.
[Den38] A. Denjoy. Sur une fonction réelle de Minkowski. J. Math. Pures Appl. (9), 17:105–151,
1938.
[DK96] K. Dajani and C. Kraaikamp. On approximation by Lüroth series. J. Théor. Nombres
Bordeaux, 8(2):331–346, 1996.
[DK02] K. Dajani and C. Kraaikamp. Ergodic theory of numbers, volume 29 of Carus
Mathematical Monographs. Mathematical Association of America, Washington, DC,
2002.
[DS88] N. Dunford and J. T. Schwartz. Linear operators. Part I. Wiley Classics Library. John Wiley
& Sons, Inc., New York, 1988. General theory, With the assistance of William G. Bade and
Robert G. Bartle, Reprint of the 1958 original, A Wiley-Interscience Publication.
[Dud89] R. M. Dudley. Real analysis and probability. The Wadsworth & Brooks/Cole Mathematics
Series. Wadsworth & Brooks/Cole Advanced Books & Software, Pacific Grove, CA, 1989.
[DV86] H. G. Diamond and J. D. Vaaler. Estimates for partial sums of continued fraction partial
quotients. Pacific J. Math., 122(1):73–82, 1986.
[EFP49] P. Erdös, W. Feller, and H. Pollard. A property of power series with positive coefficients.
Bull. Amer. Math. Soc., 55:201–204, 1949.
[Eri70] K. B. Erickson. Strong renewal theorems with infinite mean. Trans. Amer. Math. Soc.,
151:263–291, 1970.
[EW11] M. Einsiedler and T. Ward. Ergodic theory with a view towards number theory, volume 259
of Graduate Texts in Mathematics. Springer-Verlag London, Ltd., London, 2011.
[Fal14] K. Falconer. Fractal geometry. John Wiley & Sons, Ltd., Chichester, third edition, 2014.
Mathematical foundations and applications.
[Far16] J. Farey. On a curious property of vulgar fractions. Phil. Mag. Ser. 1, 47(217):385–386,
1816.
[Fel68a] W. Feller. An introduction to probability theory and its applications. Vol. I. Third edition.
John Wiley & Sons, Inc., New York-London-Sydney, 1968.
[Fel68b] W. Feller. An introduction to probability theory and its applications. Vol. II. Third edition.
John Wiley & Sons, Inc., New York-London-Sydney, 1968.
[FK10] J. Fiala and P. Kleban. Intervals between Farey fractions in the limit of infinite level. Ann.
Sci. Math. Québec, 34(1):63–71, 2010.
[Gal72] J. Galambos. Some remarks on the Lüroth expansion. Czechoslovak Math. J.,
22(97):266–271, 1972.
[Gal73] J. Galambos. The largest coefficient in continued fractions and related problems. In
Diophantine approximation and its applications (Proc. Conf., Washington, D.C., 1972),
pages 101–109. Academic Press, New York, 1973.
[Gam01] T. W. Gamelin. Complex analysis. Undergraduate Texts in Mathematics. Springer-Verlag,
New York, 2001.
[Gan01] C. Ganatsiou. On some properties of the Lüroth-type alternating series representations
for real numbers. Int. J. Math. Math. Sci., 28(6):367–373, 2001.
[GL63] A. Garsia and J. Lamperti. A discrete renewal theorem with infinite mean. Comment.
Math. Helv., 37:221–234, 1962/1963.
[GLJ93] Y. Guivarc’h and Y. Le Jan. Asymptotic winding of the geodesic flow on modular surfaces
and continued fractions. Ann. Sci. École Norm. Sup. (4), 26(1):23–50, 1993.
[GLJ96] Y. Guivarc’h and Y. Le Jan. Note rectificative: “Asymptotic winding of the geodesic flow on
modular surfaces and continued fractions” (Ann. Sci. École Norm. Sup. (4) 26(1):23–50,
1993; MR1209912 (94a:58157)). Ann. Sci. École Norm. Sup. (4), 29(6):811–814, 1996.
[GM88] M. C. Gutzwiller and B. B. Mandelbrot. Invariant multifractal measures in chaotic
Hamiltonian systems, and related structures. Phys. Rev. Lett., 60(8):673–676, 1988.
[Gou04] S. Gouëzel. Sharp polynomial estimates for the decay of correlations. Israel J. Math.,
139:29–65, 2004.
[Gou11] S. Gouëzel. Correlation asymptotics from large deviations in dynamical systems with
infinite measure. Colloq. Math., 125(2):193–212, 2011.
[Gut11] S. B. Guthery. A motif of mathematics. Docent Press, Boston, MA, 2011. History and
application of the mediant and the Farey sequence.
[Gut13] A. Gut. Probability: A Graduate Course. Springer Texts in Statistics. Springer, New York,
second edition, 2013.
[Hal56] P. R. Halmos. Lectures on Ergodic Theory. Publications of the Mathematical Society of
Japan, no. 3. The Mathematical Society of Japan, 1956.
[Hee15] B. Heersink. An effective estimate for the Lebesgue measure of preimages of iterates of
the Farey map. Adv. Math., 291:621–634, 2016.
[Hei87] L. Heinrich. Rates of convergence in stable limit theorems for sums of exponentially
ψ-mixing random variables with an application to metric theory of continued fractions.
Math. Nachr., 131:149–165, 1987.
[Hen00] D. Hensley. The statistics of the continued fraction digit sum. Pacific J. Math.,
192(1):103–120, 2000.
[HW08] G. H. Hardy and E. M. Wright. An introduction to the theory of numbers. Oxford University
Press, Oxford, sixth edition, 2008. Revised by D. R. Heath-Brown and J. H. Silverman,
With a foreword by Andrew Wiles.
[Ios92] M. Iosifescu. A very simple proof of a generalization of the Gauss-Kuzmin-Lévy theorem
on continued fractions, and related questions. Rev. Roumaine Math. Pures Appl.,
37(10):901–914, 1992.
[Iso11] S. Isola. From infinite ergodic theory to number theory (and possibly back). Chaos,
Solitons and Fractals, 44(7):467–479, 2011.
[Mar92] G. Markowsky. Misconceptions about the golden ratio. Coll. Math. J., 23(1):2–19, 1992.
[Min10] H. Minkowski. Geometrie der Zahlen. In 2 Lieferungen. II. (Schluß-) Lieferung. Leipzig:
B. G. Teubner. VIII + S. 241–256 (1910), 1910.
[MN13] T. Miernowski and A. Nogueira. Exactness of the Euclidean algorithm and of the Rauzy
induction on the space of interval exchange transformations. Ergodic Theory Dynam.
Syst., 33(1):221–246, 2013.
[MT12] I. Melbourne and D. Terhesiu. Operator renewal theory and mixing rates for dynamical
systems with infinite measure. Invent. Math., 189(1):61–110, 2012.
[MT15] I. Melbourne and D. Terhesiu. Erratum to: Operator renewal theory and mixing rates for
dynamical systems with infinite measure. Invent. Math., 202(3):1269–1272, 2015.
[MU03] R. D. Mauldin and M. Urbański. Graph directed Markov systems, volume 148 of
Cambridge Tracts in Mathematics. Cambridge University Press, Cambridge, 2003.
Geometry and dynamics of limit sets.
[Mun11] S. Munday. Finite and infinite ergodic theory for linear and conformal dynamical systems.
PhD thesis, University of St. Andrews, 2011.
[Mun14] S. Munday. On the derivative of the α-Farey-Minkowski function. Discrete Contin. Dyn.
Syst., 34(2):709–732, 2014.
[Neu32] J. von Neumann. Proof of the quasi-ergodic hypothesis. Proc. Natl. Acad. Sci. USA,
18:70–82, 1932.
[Par81] W. Parry. Topics in ergodic theory, volume 75 of Cambridge Tracts in Mathematics.
Cambridge University Press, Cambridge–New York, 1981.
[Phi88] W. Philipp. Limit theorems for sums of partial quotients of continued fractions. Monatsh.
Math., 105(3):195–206, 1988.
[Phi76] W. Philipp. A conjecture of Erdős on continued fractions. Acta Arith., 28(4):379–386,
1975/76.
[Roh48] V. Rohlin. A “general” measure-preserving transformation is not mixing. Doklady Akad.
Nauk SSSR (N.S.), 60:349–351, 1948.
[Rok64] V. A. Rokhlin. Exact endomorphisms of a Lebesgue space. Transl., Ser. 2, Am. Math. Soc.,
39:1–36, 1964.
[RS92] A. M. Rockett and P. Szüsz. Continued fractions. World Scientific Publishing Co., Inc.,
River Edge, NJ, 1992.
[Rud87] W. Rudin. Real and complex analysis. McGraw-Hill Book Co., New York, third edition,
1987.
[Rud91] W. Rudin. Functional analysis. International Series in Pure and Applied Mathematics.
McGraw-Hill, Inc., New York, second edition, 1991.
[Sal43] R. Salem. On some singular monotonic functions which are strictly increasing. Trans.
Amer. Math. Soc., 53:427–439, 1943.
[Šal68] T. Šalát. Zur metrischen Theorie der Lürothschen Entwicklungen der reellen Zahlen.
Czechoslovak Math. J., 18(93):489–522, 1968.
[Sar02] O. Sarig. Subexponential decay of correlations. Invent. Math., 150(3):629–653, 2002.
[Sch95] F. Schweiger. Ergodic theory of fibred systems and metric number theory. Oxford Science
Publications. The Clarendon Press, Oxford University Press, New York, 1995.
[Sen76] E. Seneta. Regularly varying functions, volume 508 of Lecture Notes in Mathematics.
Springer-Verlag, Berlin–New York, 1976.
[Ste58] M. A. Stern. Ueber eine zahlentheoretische Funktion. J. Reine Angew. Math., 55:193–220,
1858.
[SW07] L.-M. Shen and J. Wu. On the error-sum function of Lüroth series. J. Math. Anal. Appl.,
329(2):1440–1445, 2007.