(De Gruyter Graduate) Marc Kesseböhmer, Sara Munday, Bernd Otto Stratmann - Infinite Ergodic Theory of Numbers-De Gruyter (2016)


Marc Kesseböhmer, Sara Munday, Bernd Otto Stratmann

Infinite Ergodic Theory of Numbers


De Gruyter Graduate
Mathematics Subject Classification 2010
37Axx, 11B57, 11J70, 11J83, 60K05

Authors
Dr. Sara Munday
Università di Bologna
Department of Mathematics

Professor Dr. Marc Kesseböhmer


Universität Bremen
Department of Mathematics

Professor Dr. Bernd O. Stratmann†

ISBN 978-3-11-043941-0
e-ISBN (PDF) 978-3-11-043942-7
e-ISBN (EPUB) 978-3-11-043085-1

Library of Congress Cataloging-in-Publication Data


A CIP catalog record for this book has been applied for at the Library of Congress.

Bibliographic information published by the Deutsche Nationalbibliothek


The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie;
detailed bibliographic data are available on the Internet at http://dnb.dnb.de.

© 2016 Walter de Gruyter GmbH, Berlin/Boston


Typesetting: Konvertus, Haarlem, NL
Printing and binding: CPI books GmbH, Leck
♾ Printed on acid-free paper
Printed in Germany

www.degruyter.com
Preface
This book is intended to serve as an introduction to Infinite Ergodic Theory for
advanced undergraduate and PhD students and should be appropriate as a text for
a seminar or reading course. We hope that some aspects of the presented material
will be of interest also to researchers. The prerequisites we have assumed are a certain
familiarity with measure theory and (to a lesser extent) basic concepts from functional
analysis. For the rest, we have attempted to be as self-contained as possible.
One of the fundamental objectives of ergodic theory is to investigate dynamical
systems from a measure-theoretic perspective. Central to this conception are invariant
measures – measures which remain unchanged under the dynamics. In classical
ergodic theory these measures are probability measures, whereas the topic of infinite
ergodic theory is systems which preserve infinite measures. This seemingly small
change engenders radically different results, as we will see. A systematic approach to
this part of ergodic theory has been provided in the foundational work of Jon Aaronson
[Aar97], which is still a major source for a coherent presentation and also guided us in
the preparation of this graduate text. In addition to Aaronson’s book, we also found
helpful the book by Dajani and Kraaikamp [DK02] and the survey article by Stefano
Isola [Iso11], as well as unpublished lecture notes by Omri Sarig, Maximilian Thaler,
and Roland Zweimüller.
Our aim with this book is to illuminate various aspects of infinite ergodic theory
by using several concrete examples of dynamical systems that are strongly linked to
number theory. We will use these examples to analyse some explicit questions (like the
asymptotic behaviour of sum-level sets) to illustrate not only the powerful methods
from infinite ergodic theory but also the strong connection between infinite ergodic
theory and renewal theory.
Another theme in the book is elementary Diophantine approximation. We give
some classical results for continued fractions in Chapter 1, and, with the intention of
closing a circle of ideas, in Chapter 5 we show how the analysis of the sum-level sets
for the continued fraction expansion also gives rise to some further Diophantine-type
results. The final application in Chapter 5 is to establish a uniform law for the
Stern-Brocot sequence contrasting a famous analogous result for the Farey sequence.
The book is organised with chapters containing general theory sandwiched
between chapters devoted more to our examples and to applications. The first chapter
consists of a little necessary background material and the introduction of all our main
examples. It is followed by a chapter containing some standard results in ergodic
theory and the beginnings of the theory for infinite systems. Chapter 3 is then devoted
to renewal theory and its application to certain piecewise-linear systems. We return
to more general infinite ergodic theory for Chapter 4, and finally, in Chapter 5 we see
some applications of this theory to the Gauss and Farey systems.

Unfortunately, our good friend, co-author and colleague Bernd O. Stratmann
died before the manuscript of this book was complete – we have tried our best to
finalise the book also in his spirit. We would like to extend our heartfelt thanks
for helpful conversations and for reading various sections to Kathryn Lorenz, Marco
Lenci, Giampaolo Cristadoro, Henna Koivusalo, Andrei Ghenciu, Mauro Artigiani,
Tushar Das, Niclas Technau, Arne Mosbach and Konstantin Schäfer.

August 2016,
Marc Kesseböhmer and Sara Munday
Contents

Mathematical symbols | IX

1 Number-theoretical dynamical systems | 1


1.1 Continued fractions and Diophantine approximation | 1
1.1.1 Continued fractions | 1
1.1.2 Elementary Diophantine approximation: Hurwitz’s Theorem and badly
approximable numbers | 7
1.2 Topological Dynamical Systems | 12
1.2.1 The Gauss map | 15
1.2.2 Symbolic dynamics | 16
1.2.3 A return to the Gauss map | 19
1.2.4 Elementary metrical Diophantine analysis | 21
1.2.5 Markov partitions for interval maps | 26
1.3 The Farey map: definition and topological properties | 30
1.3.1 The Farey map | 30
1.3.2 Topological properties of the Farey map | 34
1.4 Two further examples | 40
1.4.1 The α-Lüroth maps | 40
1.4.2 The α-Farey maps | 44
1.4.3 Topological properties of F α | 48
1.4.4 Expanding and expansive partitions | 51
1.4.5 Metrical Diophantine-like results for the α-Lüroth expansion | 53
1.5 Notes and historical remarks | 58
1.5.1 The Farey sequence | 58
1.5.2 The classical Lüroth series | 59
1.6 Exercises | 61

2 Basic ergodic theory | 64


2.1 Invariant measures | 64
2.1.1 Invariant measures for the Gauss and α-Lüroth system | 66
2.2 Recurrence and conservativity | 69
2.3 The transfer operator | 75
2.3.1 Jacobians and the change of variable formula | 75
2.3.2 Obtaining invariant measures via the transfer operator | 76
2.3.3 The Ruelle operator | 79
2.3.4 Invariant measures for F , F α , G and L α | 82
2.3.5 Invariant measures via the jump transformation | 86
2.4 Ergodicity and exactness | 88


2.4.1 Ergodicity of the systems G and L α | 92
2.4.2 Ergodic theorems for probability spaces and consequences for the
Gauss and α-Lüroth systems | 96
2.4.3 Ergodic theorems for infinite measures | 103
2.4.4 Inducing | 105
2.4.5 Uniqueness of the invariant measures for F and F α | 109
2.4.6 Proof of Hopf’s Ratio Ergodic Theorem | 112
2.5 Exactness revisited | 115
2.6 Exercises | 120

3 Renewal theory and α-sum-level sets | 122


3.1 Sum-level sets | 122
3.2 Sum-level sets for the α-Lüroth expansion | 123
3.2.1 Classical renewal results | 124
3.2.2 Renewal theory applied to the α-sum-level sets | 132
3.3 Exercises | 135

4 Infinite ergodic theory | 137


4.1 The functional analytic perspective and the Chacon–Ornstein
Ergodic Theorem | 137
4.2 Pointwise dual ergodicity | 147
4.3 ψ-mixing, Darling–Kac sets and pointwise dual ergodicity | 151
4.4 Exercises | 157

5 Applications of infinite ergodic theory | 159


5.1 Sum-level sets for the continued fraction expansion,
first investigations | 159
5.2 ψ-mixing for the Gauss map and the Gauss problem | 161
5.3 Pointwise dual ergodicity for the Farey map | 167
5.4 Uniform and uniformly returning sets | 168
5.5 Finer asymptotics of Lebesgue measure of sum-level sets | 173
5.6 Uniform distribution of the even Stern–Brocot sequence | 177
5.7 Exercises | 181

Bibliography | 183

Index | 188
Mathematical symbols
∀ ‘for all’.
∃ ‘exists’.
0^n n consecutive appearances of the symbol 0. 32
‖·‖ operator norm. 76
| · |p p-norm for p ∈ [1, ∞]. 137
∅ empty set.
=⇒ ‘implies’.
⇐⇒ ‘equivalent’.
#E cardinality of E. 18
:= ‘equal by definition’.

A⊆B topological closure of A in R. 26


A* set of recurrent points in A. 105
[a, b] closed interval.
[a, b) right open interval.
(a, b] left open interval.
(a, b) open interval.
A △ B symmetric difference of the sets A and B. 70
α̊ Markov partition for L α . 40
αD dyadic partition for L α . 43
αH harmonic partition for L α . 42

Aut(C) group of automorphisms acting on C.  182

B badly approximable numbers. 11


BA σ-algebra B restricted to the set A. 106, 108
Bα,N N-badly α-approximable numbers. 54
Bα,φ φ-badly α-approximable numbers. 56
BN subset of the set of badly approximable numbers. 11
Bn n-th member of the Stern–Brocot sequence. 33

B(s) := Σ_{n=0}^∞ b_n s^n. 131

C set of complex numbers.


C_α(ℓ_1, …, ℓ_k) α-Lüroth cylinder set. 44
Cb (R) real-valued bounded continuous functions. 177
Cn sum-level set for the continued fraction expansion. 122
C_T conservative part with respect to the measure-preserving
transformation T. 72
CV conservative part with respect to the contracting operator V. 141
C(x_1, …, x_n) Gauss cylinder. 19
Ĉ(x_1, …, x_n) Farey cylinder. 33
Ĉ_α(x_1, …, x_n) α-Farey cylinder. 48
C_T̂ conservative part with respect to T̂ for a non-singular
transformation T. 78

∆μ distribution function of a measure μ with support in [0, 1]. 36


δx Dirac measure in x. 177
diam(A) Euclidian diameter of the set A. 28
dimH (A) Hausdorff dimension of the set A. 55
dν /dμ Radon–Nikodým derivative of ν with respect to μ. 76
d(ω, τ) distance between ω and τ with respect to the metric d. 17
DT dissipative part with respect to the measure-preserving
transformation T. 72, 78, 141
dv greatest common divisor of {n > 0 : v n > 0}. 125

E set of exceptional points of a countable Markov partition. 26


E* set of all finite non-empty words over the alphabet E. 17
EN set of all infinite non-empty words over the alphabet E. 17
ε empty word; word having no letters. 17

F Farey map given by x ↦ x/(1 − x) on [0, 1/2], x ↦ (1 − x)/x
on (1/2, 1]. 30
F* jump transformation on (1/2, 1] of the Farey map F. 31

f^A(x) := Σ_{j=0}^{φ(x)−1} f ∘ T^j(x); induced version of f on A. 112
Fα α-Farey map. 44
F *α jump transformation on A1 of F α . 46
F α,j inverse branches of the α-Farey map F α , j = 0, 1. 45
Fj inverse branches of F, j = 0, 1. 31
Fn n-th Farey sequence. 58
F+ subset of non-negative functions from the function space F . 66

G Gauss map given by x ↦ 1/x − ⌊1/x⌋. 15


gcd greatest common divisor. 125
GL2 (Z) general linear group. 179
Gn n-th inverse branch of G. 15

hF invariant density of νF with respect to λ. 82


h Fα invariant density of νF α with respect to λ. 83
hG invariant density of m G with respect to λ. 85


h Lα invariant density of m α with respect to λ. 85

I set of irrational numbers.


Iα set of α-irrational numbers. 42
Int(A) interior of the set A. 40

Ji := d(λ ◦ T i )/dλ; Jacobian for the i-th inverse branch of the interval
map T. 81

κ± Hölder and sub-Hölder exponent. 49

L1 (μ) space of μ-integrable functions. 66, 137


[ℓ_1, ℓ_2, …, ℓ_k]_α α-Lüroth expansion. 42
[ℓ_1, ℓ_2, ℓ_3, …]_α α-Lüroth coding. 47
Lα α-Lüroth map. 40
L α,n inverse branches of L α . 67
λ Lebesgue measure. 67
L∞ (μ) space of μ-essentially bounded functions. 66, 137
L(x) Lüroth map. 60

M Markov partition. 27, 40


m |A measure m restricted to BA . 106
M(B) equivalence classes of B-measurable functions. 65
M(B ) set of B-measurable functions. 65
mG Gauss measure; invariant λ-absolutely continuous measure
for G. 67
μg measure with density g with respect to the measure μ. 75, 140

N set of natural numbers.


N0 set of natural numbers including 0.
ν (x) Hurwitz constant for x. 10
νF invariant λ-absolutely continuous measure for F. 82
νF α invariant λ-absolutely continuous measure for F α . 83
ν ≪ μ ν absolutely continuous with respect to μ. 75

O(x) forward orbit of x. 12


ω |n initial block (ω1 , ω2 , . . . , ω n ) of length n of ω ∈ E* . 17
|ω| length of a word ω. 17
[ω] cylinder set with respect to the finite word ω ∈ E* . 17
ω∧τ longest common initial block of two words ω and τ. 17
p(x) first passage time of x. 86


PF Ruelle operator for the map F. 82
P Fα Ruelle operator for the map F α . 84
PG Ruelle operator for the map G. 85
φ return time to the set A ∈ B. 105
φn := S An (φ). 112
p k /q k k-th convergent to the continued fraction. 2
PT Ruelle operator for the differential Markov interval map T. 81

Q set of rational numbers.


Qα set of α-rational numbers. 42
Q Minkowski’s question-mark function. 35

R set of real numbers.


ρ(x) hitting time under F of x to the interval (1/2, 1]. 31
ρ α (x) hitting time under F α of x to the interval A1 . 45
r_k^{(α)}(x) := [ℓ_1, …, ℓ_k]_α; α-Lüroth convergent of x. 44
rn n-th remainder. 4

sn := q n−1 /q n . 4
σ shift map of EN . 18
Sn = Bn \ Bn−1 ; vertices of the Stern–Brocot Tree. 34
S An (h) ergodic sums for the induced system with respect to h : A → R. 112

T E* jump transformation on T with respect to E. 86


θ α (x) conjugating homeomorphism between ([0, 1], F α )
and ([0, 1], T). 49
Ti measurable inverse branches for an interval map T. 80
Tk denotes the map x → kx mod 1, k > 2, on R/Z. 14
T̂_μ = T̂ transfer operator of T with respect to the measure μ. 76
Tn Stern–Brocot intervals of order n. 33
T:X→Y a map from X to Y.

U ∨V join of two collections of subsets of X. 27


Uε (x) open neighbourhood of x with diameter ε. 179
Un n-th refinement of a collection U . 27

V* dual operator of V. 138


var[a,b] f variation of f over the interval [a, b]. 162
var f total variation of f on [0, 1]. 162

V_n := Σ_{k=n}^∞ v_k. 129
((v n )n≥1 , (w n )n≥0 ) renewal pair. 124
Wα well α-approximable numbers. 55


Wα,φ φ-well α-approximable numbers. 56
w-limn μ n weak-* limit of the sequence of measures (μ n ). 177
WT collection of all wandering sets for a map T. 69

X non-empty metric space.


X* dual space; set of all continuous linear functionals f : X → R. 77
⟨x_1, x_2, …⟩_α α-Farey coding. 47
[x_1, x_2, x_3, …] regular continued fraction expansion of an irrational number. 1
⟨x_1, x_2, x_3, …⟩ Farey coding. 32
[x1 , x2 , . . . , x n ] regular continued fraction expansion of a rational number. 2
[\overline{x_1, …, x_k}] periodic continued fraction expansion. 16
(X, B, μ, T) measure-theoretic dynamical system. 64
⌊x⌋ the greatest integer not exceeding x. 5
{x} fractional part of x. 5
(X, B , μ) σ-finite measure space. 64
(X, T) (topological) dynamical system. 1

Z set of integers.
1 Number-theoretical dynamical systems
Throughout this book, a (topological) dynamical system (X, T) means simply a
non-empty metric space X and a continuous map T : X → X. In this chapter, we
introduce various dynamical systems that generate real number expansions. Such
systems will be referred to as number-theoretic dynamical systems. The examples we
will consider here are mainly constructed over the unit interval. For further simple
examples of number-theoretic dynamical systems, including that of the map that gives
rise to the familiar decimal expansion, we refer to Dajani and Kraaikamp [DK02].

1.1 Continued fractions and Diophantine approximation

In this chapter, the main goal is to introduce two well-studied number-theoretic
dynamical systems, namely, the Gauss map and the Farey map. Both of these maps
are related to the continued fraction expansion, which we introduce below. We give
here a very succinct introduction, but for more details there are several good books
available, for instance the classical text by Khintchine [Khi64] or the more modern
approach given by Rockett and Szűsz [RS92].

1.1.1 Continued fractions

An expression of the form

\[
[x_1, x_2, x_3, \ldots] := \cfrac{1}{x_1 + \cfrac{1}{x_2 + \cfrac{1}{x_3 + \cdots}}}, \tag{1.1}
\]

where each $x_i$, for $i \in \mathbb{N}$, is a positive integer, is called a regular continued fraction
expansion (or simply a continued fraction). We will refer to the numbers $x_1, x_2, x_3, \ldots$
as the elements of the continued fraction. The number of elements of the continued
fraction may be finite or infinite. In the first case, we will say we have a finite continued
fraction and in the second case we will say that we have an infinite continued fraction.
A finite continued fraction is the result of a finite number of rational operations and
hence it represents the rational number given by

\[
[x_1, \ldots, x_n] := \cfrac{1}{x_1 + \cfrac{1}{x_2 + \cdots + \cfrac{1}{x_n}}}. \tag{1.2}
\]

Notice that for x n ≥ 2, the continued fractions [x1 , . . . , x n − 1, 1] and [x1 , . . . , x n ]
represent the same number. It is typical to use only the latter expression, but on
occasion we find it helpful to have the option of using either.
We cannot immediately assign a value to an infinite continued fraction, so, for
the time being, it should be thought of only as a formal notation, akin to that for an
infinite series.
In the theory of continued fractions, a particularly important role is played
by the initial segments of each (finite or infinite) continued fraction. For a given
continued fraction [x1 , x2 , x3 , . . .], we consider the sequence of rational numbers
([x1 , . . . , x k ])k≥1 . For each k ∈ N, we will write p k /q k := [x1 , . . . , x k ], where the positive
integers p k and q k are required to be coprime. Then p k /q k will be called the k-th
convergent to the continued fraction [x1 , x2 , x3 , . . .]. In particular,

\[
\frac{p_1}{q_1} = \frac{1}{x_1} \quad\text{and}\quad \frac{p_2}{q_2} = \frac{1}{x_1 + 1/x_2} = \frac{x_2}{x_1 x_2 + 1}.
\]

For a finite continued fraction $[x_1, \ldots, x_n]$ with $n$ elements, we have

\[
\frac{p_n}{q_n} = [x_1, \ldots, x_n].
\]

In this case, there are only n convergents. Each infinite continued fraction has an
infinite sequence of convergents. The justification for this terminology will come
in Proposition 1.1.3, but first let us give the recurrence relations that describe the
formation of the convergents. Here, we make the further definition that p0 := 0 and
q0 := 1.

Theorem 1.1.1. For each $n \in \mathbb{N}$, we have that
(a) $p_{n+1} = x_{n+1} p_n + p_{n-1}$,
(b) $q_{n+1} = x_{n+1} q_n + q_{n-1}$,
(c) $q_n p_{n-1} - p_n q_{n-1} = (-1)^n$.

Proof. The statements in (a) and (b) can be proved by induction; we leave this as an
exercise for the reader. For part (c), multiplying part (a) by $q_n$ and part (b) by $p_n$, then
subtracting the first from the second yields

\[
q_{n+1} p_n - p_{n+1} q_n = -(q_n p_{n-1} - p_n q_{n-1}).
\]

It then suffices to notice that $q_1 p_0 - p_1 q_0 = -1$ to complete the proof of the
theorem.
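
The recurrences of Theorem 1.1.1 translate directly into a short computation. The following Python sketch is an illustration only; the function names `convergents` and `evaluate` and the sample elements are our own choices, not part of the text. It builds $p_n$ and $q_n$ from the elements, compares $p_n/q_n$ with a direct evaluation of $[x_1, \ldots, x_n]$, and checks part (c).

```python
from fractions import Fraction
from math import gcd


def convergents(elements):
    """Return the lists [p_0, ..., p_n] and [q_0, ..., q_n] for [x_1, ..., x_n]."""
    p = [0, 1]            # p_0 := 0 and p_1 = 1
    q = [1, elements[0]]  # q_0 := 1 and q_1 = x_1
    for x in elements[1:]:
        p.append(x * p[-1] + p[-2])  # Theorem 1.1.1 (a)
        q.append(x * q[-1] + q[-2])  # Theorem 1.1.1 (b)
    return p, q


def evaluate(elements):
    """Evaluate the finite continued fraction [x_1, ..., x_n] exactly."""
    value = Fraction(0)
    for x in reversed(elements):
        value = 1 / (x + value)
    return value


xs = [1, 2, 3, 4, 5]  # hypothetical sample elements
p, q = convergents(xs)
for n in range(1, len(xs) + 1):
    assert Fraction(p[n], q[n]) == evaluate(xs[:n])        # p_n/q_n = [x_1, ..., x_n]
    assert q[n] * p[n - 1] - p[n] * q[n - 1] == (-1) ** n  # Theorem 1.1.1 (c)
    assert gcd(p[n], q[n]) == 1                            # p_n and q_n are coprime
```

Exact rational arithmetic is used so that the checks are genuine equalities rather than floating-point comparisons.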
Part (c) of Theorem 1.1.1 has the following immediate and useful corollary.

Corollary 1.1.2.
(a) For all $n \geq 1$,

\[
\frac{p_{n-1}}{q_{n-1}} - \frac{p_n}{q_n} = \frac{(-1)^n}{q_n q_{n-1}}.
\]

(b) For all $n \geq 2$,

\[
\frac{p_{n-2}}{q_{n-2}} - \frac{p_n}{q_n} = \frac{(-1)^{n-1} x_n}{q_n q_{n-2}}.
\]

This corollary allows us to reach important conclusions about the sequence of
convergents to a continued fraction. Specifically, we have the following result.

Proposition 1.1.3. The sequence of convergents $(p_n/q_n)_{n \geq 1}$ of the continued fraction
$[x_1, x_2, x_3, \ldots]$ satisfies the following four properties:
(a) The sequence $(p_{2n}/q_{2n})_{n \geq 1}$ of even convergents is increasing.
(b) The sequence $(p_{2n-1}/q_{2n-1})_{n \geq 1}$ of odd convergents is decreasing.
(c) Every convergent of odd order is greater than every convergent of even order.
(d) $\displaystyle \lim_{n \to \infty} \left( \frac{p_{n+1}}{q_{n+1}} - \frac{p_n}{q_n} \right) = 0$.

Proof. The statements in parts (a) and (b) follow directly from Corollary 1.1.2 (b). For
part (c), observe that by Corollary 1.1.2 (a) we have that every odd convergent is greater
than the directly preceding even convergent. Thus, $p_{2k+1}/q_{2k+1}$ is greater than all of
the smaller-index even convergents: $p_2/q_2, p_4/q_4, \ldots, p_{2k}/q_{2k}$. Suppose that there
exists some convergent $p_{2m}/q_{2m}$, where $m > k$, such that

\[
\frac{p_{2m}}{q_{2m}} > \frac{p_{2k+1}}{q_{2k+1}}.
\]

But then,

\[
\frac{p_{2m}}{q_{2m}} < \frac{p_{2m+1}}{q_{2m+1}} < \frac{p_{2k+1}}{q_{2k+1}},
\]

where the first inequality comes again from Corollary 1.1.2 (a) and the second from part (b).
This contradiction finishes the proof of part (c). Finally, for part (d), since
$|p_{n+1}/q_{n+1} - p_n/q_n| = 1/(q_{n+1} q_n)$ by Corollary 1.1.2 (a), it suffices to show that

\[
q_n \geq 2^{\frac{n-1}{2}}. \tag{1.3}
\]

Indeed, directly from Theorem 1.1.1 (b), we have that

\[
q_n = x_n q_{n-1} + q_{n-2} \geq q_{n-1} + q_{n-2} \geq 2 q_{n-2}.
\]

Thus, repeated applications of the above inequality yield

\[
q_{2n} \geq 2^n q_0 = 2^n \quad\text{and}\quad q_{2n+1} \geq 2^n q_1 \geq 2^n.
\]

This finishes the proof.
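
The statements of Proposition 1.1.3 can be observed numerically on any finite list of elements. The sketch below is only an illustration under our own naming (`convergent_values`, `check_proposition`) and with arbitrarily chosen sample elements; it tests the monotonicity in (a) and (b), the separation in (c), and the bound (1.3).

```python
from fractions import Fraction


def convergent_values(elements):
    """Return ([p_1/q_1, ..., p_n/q_n], [q_1, ..., q_n]) for [x_1, ..., x_n]."""
    p_prev, p = 0, 1
    q_prev, q = 1, elements[0]
    values, denominators = [Fraction(p, q)], [q]
    for x in elements[1:]:
        p_prev, p = p, x * p + p_prev
        q_prev, q = q, x * q + q_prev
        values.append(Fraction(p, q))
        denominators.append(q)
    return values, denominators


def check_proposition(elements):
    """Check (a), (b), (c) and the bound (1.3); needs at least two elements."""
    values, denominators = convergent_values(elements)
    odd = values[0::2]   # p_1/q_1, p_3/q_3, ...
    even = values[1::2]  # p_2/q_2, p_4/q_4, ...
    increasing = all(a < b for a, b in zip(even, even[1:]))
    decreasing = all(a > b for a, b in zip(odd, odd[1:]))
    separated = max(even) < min(odd)
    # q_n >= 2^((n-1)/2) is equivalent to q_n^2 >= 2^(n-1); list index i equals n - 1
    bound = all(q * q >= 2 ** i for i, q in enumerate(denominators))
    return increasing and decreasing and separated and bound


print(check_proposition([3, 1, 4, 1, 5, 9, 2, 6]))
```

Comparing $q_n^2$ with $2^{n-1}$ keeps the check of (1.3) in integer arithmetic.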


Using the above proposition, it then makes sense to say that the value of the infinite
continued fraction [x1 , x2 , x3 , . . .] is equal to the real number that is the limit of the
sequence of convergents associated with it.

Definition 1.1.4. For $x = [x_1, x_2, x_3, \ldots] \in (0, 1]$ and $n \in \mathbb{N}$, let $r_n$ and $s_n$ be defined as
follows:

\[
r_n := \frac{1}{[x_n, x_{n+1}, x_{n+2}, \ldots]} \quad\text{and}\quad s_n := \frac{q_{n-1}}{q_n}.
\]

The number $r_n$ is called the $n$-th remainder of $x$.

Note that for $x = [x_1, x_2, x_3, \ldots]$ and $n \in \mathbb{N}$, the following recurrence relation holds:

\[
r_n = x_n + \frac{1}{r_{n+1}}.
\]

Since $q_n = x_n q_{n-1} + q_{n-2}$, we have that

\[
s_n^{-1} = \frac{q_n}{q_{n-1}} = x_n + \frac{1}{q_{n-1}/q_{n-2}}.
\]

Clearly, this process may be continued until $q_1/q_0 = x_1$ is reached. Therefore,

\[
s_n = [x_n, x_{n-1}, \ldots, x_1].
\]
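
As a quick sanity check of the identity $s_n = [x_n, x_{n-1}, \ldots, x_1]$, one can compare $q_{n-1}/q_n$ with the reversed continued fraction in exact arithmetic. This is a minimal sketch; the helper names `evaluate` and `denominators` and the sample elements are assumptions made for the illustration.

```python
from fractions import Fraction


def evaluate(elements):
    """Evaluate the finite continued fraction [x_1, ..., x_n] exactly."""
    value = Fraction(0)
    for x in reversed(elements):
        value = 1 / (x + value)
    return value


def denominators(elements):
    """Return [q_0, q_1, ..., q_n] using q_{k+1} = x_{k+1} q_k + q_{k-1}."""
    q = [1, elements[0]]
    for x in elements[1:]:
        q.append(x * q[-1] + q[-2])
    return q


xs = [3, 1, 4, 1, 5]  # hypothetical sample elements
q = denominators(xs)
for n in range(1, len(xs) + 1):
    s_n = Fraction(q[n - 1], q[n])
    assert s_n == evaluate(xs[:n][::-1])  # s_n = [x_n, x_{n-1}, ..., x_1]
```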

Theorem 1.1.5. For $x = [x_1, x_2, x_3, \ldots] \in (0, 1]$ and $n \in \mathbb{N}$, we have that

\[
x = \frac{p_n r_{n+1} + p_{n-1}}{q_n r_{n+1} + q_{n-1}}.
\]

Proof. We will prove the theorem by induction. For $n = 1$, on recalling the definitions
$p_0 := 0$ and $q_0 := 1$, we deduce that

\[
\frac{p_1 r_2 + p_0}{q_1 r_2 + q_0} = \frac{r_2 + 0}{x_1 r_2 + 1} = \frac{1}{x_1 + 1/r_2} = x.
\]

Now assume that the statement is true for some $n \in \mathbb{N}$. Then, in light of the relation
$r_{n+1} = x_{n+1} + 1/r_{n+2}$ and Theorem 1.1.1 (a) and (b), we obtain that

\[
\begin{aligned}
x = \frac{p_n r_{n+1} + p_{n-1}}{q_n r_{n+1} + q_{n-1}}
  &= \frac{p_n (x_{n+1} + 1/r_{n+2}) + p_{n-1}}{q_n (x_{n+1} + 1/r_{n+2}) + q_{n-1}} \\
  &= \frac{p_n x_{n+1} r_{n+2} + p_n + p_{n-1} r_{n+2}}{q_n x_{n+1} r_{n+2} + q_n + q_{n-1} r_{n+2}} \\
  &= \frac{p_{n+1} r_{n+2} + p_n}{q_{n+1} r_{n+2} + q_n}.
\end{aligned}
\]


This theorem allows us to calculate the distance between a real number $x =
[x_1, x_2, x_3, \ldots]$ and any of its convergents.

Corollary 1.1.6. For $x = [x_1, x_2, x_3, \ldots] \in (0, 1]$ and $n \in \mathbb{N}$, we have that

\[
\left| x - \frac{p_n}{q_n} \right| = \frac{1}{q_n^2 (r_{n+1} + s_n)} < \frac{1}{x_{n+1} q_n^2} \leq \frac{1}{q_n^2}.
\]

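Proof. Using Theorem 1.1.5 and Theorem 1.1.1 (c), we infer that

\[
\begin{aligned}
\left| x - \frac{p_n}{q_n} \right|
  &= \left| \frac{p_n r_{n+1} + p_{n-1}}{q_n r_{n+1} + q_{n-1}} - \frac{p_n}{q_n} \right| \\
  &= \left| \frac{p_n q_n r_{n+1} + p_{n-1} q_n - p_n q_n r_{n+1} - p_n q_{n-1}}{(q_n r_{n+1} + q_{n-1}) q_n} \right| \\
  &= \left| \frac{q_n p_{n-1} - p_n q_{n-1}}{q_n^2 (r_{n+1} + s_n)} \right| = \frac{1}{q_n^2 (r_{n+1} + s_n)}.
\end{aligned}
\]

The first inequality follows since $x_{n+1} < r_{n+1}$, the second since $x_{n+1} \geq 1$.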
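
The error formula of Corollary 1.1.6 can be checked exactly when $x$ is rational, as long as $n$ is smaller than the number of elements so that $r_{n+1}$ exists. The following sketch makes exactly that simplifying assumption; the names `evaluate` and `approximation_errors` and the sample elements are ours.

```python
from fractions import Fraction


def evaluate(elements):
    """Evaluate the finite continued fraction [x_1, ..., x_n] exactly."""
    value = Fraction(0)
    for x in reversed(elements):
        value = 1 / (x + value)
    return value


def approximation_errors(elements):
    """Yield (n, q_n, |x - p_n/q_n|, 1/(q_n^2 (r_{n+1} + s_n))) for 1 <= n < len(elements)."""
    x = evaluate(elements)
    p_prev, p, q_prev, q = 0, 1, 1, elements[0]
    for n in range(1, len(elements)):
        r_next = 1 / evaluate(elements[n:])  # r_{n+1} = 1/[x_{n+1}, x_{n+2}, ...]
        s_n = Fraction(q_prev, q)            # s_n = q_{n-1}/q_n
        yield n, q, abs(x - Fraction(p, q)), 1 / (q * q * (r_next + s_n))
        x_next = elements[n]                 # x_{n+1}; Python lists are 0-indexed
        p_prev, p = p, x_next * p + p_prev
        q_prev, q = q, x_next * q + q_prev


xs = [1, 2, 3, 4, 5, 6]  # hypothetical sample elements
for n, q_n, error, formula in approximation_errors(xs):
    assert error == formula                                           # the equality
    assert error < Fraction(1, xs[n] * q_n**2) <= Fraction(1, q_n**2)  # the two bounds
```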

We end this section with the important result that every real number admits a continued
fraction expansion. The basis of the proof is simply the Euclidean algorithm,
the algorithm for finding the greatest common divisor of two integers. Before stating
the theorem, recall that for each positive real number $x$ the notation $\lfloor x \rfloor$ denotes the
greatest integer not exceeding $x$ and $\{x\}$ denotes the fractional part of $x$, that is,
$x = \lfloor x \rfloor + \{x\}$.

Theorem 1.1.7. To every real number x ∈ (0, 1] there corresponds a continued fraction
with value equal to x. This continued fraction is infinite if and only if x is irrational.
Moreover, every irrational number has a unique continued fraction expansion.

Proof. If $x = 1$, then the continued fraction $[1] := 1/1$ is the one sought. So, suppose
that $x \in (0, 1)$. Then set $r_1 := 1/x$ and define $x_1 := \lfloor r_1 \rfloor$, so that

\[
x = \frac{1}{x_1 + \{r_1\}}.
\]

If $r_1$ is an integer, we are finished. Otherwise, if $\{r_1\} \in (0, 1)$, then we set $r_2 := 1/\{r_1\}$,
so that

\[
x = \frac{1}{x_1 + 1/r_2}.
\]

Suppose that the numbers $r_1, r_2, \ldots, r_n$ have been defined and, if $r_n$ is not an integer,
let $x_n := \lfloor r_n \rfloor$ and set $r_{n+1} := 1/\{r_n\}$. Then we obtain the relation

\[
r_n = x_n + \frac{1}{r_{n+1}}.
\]

If the number $x$ happens to be a rational number, then each $r_n$ as defined above will
also be rational. In this case, the process must stop after a finite number of steps.
Indeed, if $r_n = a/b$ and $r_n$ is not already an integer, then

\[
r_{n+1} = \frac{1}{r_n - x_n} = \frac{1}{a/b - x_n} = \frac{b}{a - x_n b}.
\]

Therefore, $r_{n+1}$ has a smaller denominator than $r_n$ and it follows that if we consider
the sequence $r_1, r_2, r_3, \ldots$, we must eventually come to an integer. If that integer is
$r_k$, then the number $x$ is represented by the finite continued fraction $[x_1, x_2, \ldots, x_k]$,
where $x_k := r_k > 1$ (if $r_k = 1$, then $r_{k-1}$ must also be an integer, so we replace the two
final terms $x_{k-1}$ and $1$ by the single integer $x_{k-1} + 1$).
If $x$ is irrational, then each $r_n$ must also be irrational and the above-described
process will not terminate. Then, by the definition of $x_k$ and Proposition 1.1.3, where
$p_n/q_n := [x_1, x_2, \ldots, x_n]$ as before, we have for each $n \geq 1$ that

\[
\frac{p_{2n}}{q_{2n}} < x < \frac{p_{2n+1}}{q_{2n+1}}.
\]

Thus, it follows by Proposition 1.1.3 (d) that

\[
\lim_{n \to \infty} \frac{p_n}{q_n} = x.
\]

This means that the continued fraction [x1 , x2 , x3 , . . .] has as its value the given
irrational number x.
It only remains to show that each infinite expansion is unique (recall that the
finite expansions are not unique, since the value of $[x_1, \ldots, x_n]$ is equal to the value of
$[x_1, \ldots, x_n - 1, 1]$, for $x_n \geq 2$). So, fix

\[
x := [x_1, x_2, x_3, \ldots] \quad\text{and}\quad x' := [x_1', x_2', x_3', \ldots],
\]

and suppose that $n + 1 := \min\{k \in \mathbb{N} : x_k \neq x_k'\}$. Since the first $n$ approximants of $x$
and $x'$ must coincide, but the remainders $r_{n+1}$ of $x$ and $r_{n+1}'$ of $x'$ differ, it follows from
Theorem 1.1.5 and the strict monotonicity of $r \mapsto (p_n r + p_{n-1})/(q_n r + q_{n-1})$ that

\[
x = \frac{p_n r_{n+1} + p_{n-1}}{q_n r_{n+1} + q_{n-1}} \neq \frac{p_n r_{n+1}' + p_{n-1}}{q_n r_{n+1}' + q_{n-1}} = x'.
\]

This proves uniqueness of the continued fraction expansion for all irrational numbers
from the unit interval.
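
For rational $x$, the construction in the proof of Theorem 1.1.7 is a terminating algorithm and can be run in exact arithmetic. The sketch below follows the steps $x_n := \lfloor r_n \rfloor$, $r_{n+1} := 1/\{r_n\}$; the function name `continued_fraction` is our own, and the code is restricted to rational input because for irrational $x$ any machine representation would only approximate the expansion.

```python
from fractions import Fraction


def continued_fraction(x):
    """Return the elements [x_1, ..., x_k] of a rational number x in (0, 1]."""
    x = Fraction(x)
    if not 0 < x <= 1:
        raise ValueError("x must lie in (0, 1]")
    elements = []
    r = 1 / x                                 # r_1 := 1/x
    while True:
        x_n = r.numerator // r.denominator    # x_n := floor(r_n)
        elements.append(x_n)
        fractional_part = r - x_n             # {r_n}
        if fractional_part == 0:              # r_n is an integer: the process stops
            return elements
        r = 1 / fractional_part               # r_{n+1} := 1/{r_n}


print(continued_fraction(Fraction(7, 10)))
```

As in the proof, the last element produced is greater than 1 unless $x = 1$, so the algorithm returns the shorter of the two finite expansions.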

1.1.2 Elementary Diophantine approximation: Hurwitz’s Theorem and badly approximable numbers

Let us now turn to some number-theoretic applications of continued fractions. One
useful property of the continued fraction expansion of an irrational number is that
it allows us to approximate the value of this number by rational numbers to within
any desired degree of accuracy. These approximations are given by the convergents
(this is why the convergents are sometimes also referred to as approximants). The
larger n ∈ N is chosen, the closer the rational number p n /q n comes in value to x =
[x1 , x2 , x3 , . . .]. The results presented below all concern the closeness, in absolute
value, of the convergents to the irrational number they are approximating. These sorts
of questions fall into the area of mathematics known as Diophantine approximation.
This is a vast and extremely active research area with many interesting and deep open
problems.

Theorem 1.1.8. For all irrational numbers $x = [x_1, x_2, x_3, \ldots]$ and for all $n \in \mathbb{N}$, we have
that the inequality

\[
\left| x - \frac{p_i}{q_i} \right| < \frac{1}{2 q_i^2}
\]

is fulfilled for at least one element $i \in \{n, n + 1\}$.

Proof. By way of contradiction, suppose that the statement in the theorem is false.
This means that there exists some $n \in \mathbb{N}$ such that the inequality

\[
\left| x - \frac{p_i}{q_i} \right| \geq \frac{1}{2 q_i^2}
\]

holds simultaneously for $i = n$ and $i = n + 1$. Since, in light of Corollary 1.1.6, we have
that $|x - p_i/q_i| = (q_i^2 (r_{i+1} + s_i))^{-1}$, this is equivalent to

\[
r_{i+1} + s_i \leq 2, \quad \text{for } i = n, n + 1.
\]

Let us consider each case separately.
(i) For $i = n$, we can rewrite this as $2 \geq r_{n+1} + s_n = x_{n+1} + 1/r_{n+2} + s_n$ and hence,

\[
\frac{1}{r_{n+2}} \leq 2 - (x_{n+1} + s_n) = 2 - \frac{1}{s_{n+1}}.
\]

(ii) For $i = n + 1$, we obtain that

\[
r_{n+2} \leq 2 - s_{n+1}.
\]

Combining (i) and (ii), we infer that $1 \leq 4 - 2 s_{n+1} - 2 s_{n+1}^{-1} + 1$. It therefore follows that
$0 \leq 2 - s_{n+1} - s_{n+1}^{-1}$, which finally implies that

\[
0 \geq (s_{n+1} - 1)^2.
\]

Since for each $n \in \mathbb{N}$ we have that $s_{n+1} \neq 1$, this contradiction finishes the proof.
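
Theorem 1.1.8 only promises one good convergent in each consecutive pair, and this can be seen in an example. The sketch below uses $x = \sqrt{3} - 1$, whose elements the code computes numerically rather than assumes; the choice of example, the 60-digit working precision and the names `expand` and `satisfied` are our own assumptions for the illustration.

```python
from decimal import Decimal, getcontext

getcontext().prec = 60  # working precision; an assumption, ample for 12 elements


def expand(x, count):
    """Return the first `count` continued fraction elements of an irrational x in (0, 1)."""
    elements = []
    r = 1 / x
    for _ in range(count):
        a = int(r)          # floor of the positive number r
        elements.append(a)
        r = 1 / (r - a)
    return elements


x = Decimal(3).sqrt() - 1
xs = expand(x, 12)

p_prev, p, q_prev, q = 0, 1, 1, xs[0]
satisfied = []  # satisfied[i - 1] records whether |x - p_i/q_i| < 1/(2 q_i^2)
for x_next in xs[1:]:
    satisfied.append(abs(x - Decimal(p) / Decimal(q)) < 1 / (2 * Decimal(q) ** 2))
    p_prev, p = p, x_next * p + p_prev
    q_prev, q = q, x_next * q + q_prev

print(xs)
print(satisfied)
# In every pair of consecutive indices at least one inequality holds.
assert all(a or b for a, b in zip(satisfied, satisfied[1:]))
```

For this particular number the inequality holds for the odd-indexed convergents and fails for the even-indexed ones among those computed, so neither index of a pair can be singled out in advance.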

Theorem 1.1.9 (Hurwitz’s Theorem I). For all irrational numbers $x = [x_1, x_2, x_3, \ldots]$ and
for all $n \in \mathbb{N}$, we have that the inequality

\[
\left| x - \frac{p_i}{q_i} \right| < \frac{1}{\sqrt{5}\, q_i^2}
\]

is fulfilled for at least one element $i \in \{n, n + 1, n + 2\}$.



Proof. As in the proof of the previous theorem (with 2 replaced by $\sqrt{5}$), assume by
way of contradiction that for each $i \in \{n, n + 1, n + 2\}$ we have that

\[
r_{i+1} + s_i \leq \sqrt{5}.
\]

Proceeding for $i = n$ and $i = n + 1$ as in (i) and (ii) in the previous proof, we derive the
inequality

\[
s_{n+1}^2 - \sqrt{5}\, s_{n+1} + 1 \leq 0. \tag{1.4}
\]

Analogously, by considering in turn $i = n + 1$ and $i = n + 2$, we deduce that

\[
s_{n+2}^2 - \sqrt{5}\, s_{n+2} + 1 \leq 0. \tag{1.5}
\]

By the quadratic formula, (1.4) and (1.5) yield, where $\gamma := (\sqrt{5} + 1)/2$ and $\gamma^* :=
(\sqrt{5} - 1)/2$,

\[
\gamma^* < s_i < \gamma, \quad \text{for } i = n + 1, n + 2. \tag{1.6}
\]

The strict inequality follows from the fact that $\gamma$ and $\gamma^*$ are both irrational and $s_i$ is
rational. Using this, we obtain that

\[
s_{n+2} = \frac{1}{x_{n+2} + s_{n+1}} \leq \frac{1}{1 + s_{n+1}} < \frac{1}{1 + \gamma^*} = \gamma^*,
\]

which contradicts (1.6). This finishes the proof.



Remark 1.1.10. In this context, the number $1/\sqrt{5}$ is called the Hurwitz number.
Sometimes this number is called the Hurwitz constant for the golden mean, and thus
there are many others in the sense of Definition 1.1.13 (b).

The number $\gamma := (\sqrt{5} + 1)/2$ which appears in the above proof is known as the golden
mean (or, sometimes, the golden ratio). There are a great many commonly-held beliefs
about the golden mean and why it is supposed to be so interesting and/or important;
unfortunately, a lot of these are simply wrong (see [Mar92]). However, the continued
fraction expansion of γ does give a clue as to why this particular number turns up so
often. Observe that $\gamma$ is one of the two roots of the equation $x^2 - x - 1 = 0$. Writing this
another way, we have that

\[
\gamma = 1 + \frac{1}{\gamma}.
\]

Then, substituting for the $\gamma$ which appears on the right-hand side of the above
equality, we obtain that

\[
\gamma = 1 + \cfrac{1}{1 + \cfrac{1}{\gamma}}.
\]

Clearly this process can be repeated infinitely often, to yield the continued fraction
expansion γ = 1 + [1, 1, 1, . . .], where the fractional part consists of infinitely many
ones. The other number γ * appearing in the proof of Theorem 1.1.9 is simply equal to
γ − 1 = [1, 1, 1, . . .]. Since we are mostly concerned in this book with numbers in the
unit interval, we shall, in a slight abuse of terminology, also refer to this number as
the golden mean.
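
The expansion $\gamma^* = [1, 1, 1, \ldots]$ can be watched converging by evaluating its truncations. In the sketch below (with our own helper name `evaluate` and an arbitrary choice of truncation lengths), the truncations are exact fractions whose numerators and denominators are consecutive Fibonacci numbers, and the printed floating-point gap to $(\sqrt{5} - 1)/2$ shrinks quickly.

```python
from fractions import Fraction
from math import sqrt


def evaluate(elements):
    """Evaluate the finite continued fraction [x_1, ..., x_n] exactly."""
    value = Fraction(0)
    for x in reversed(elements):
        value = 1 / (x + value)
    return value


golden = (sqrt(5) - 1) / 2  # floating-point value of gamma*
for n in (1, 2, 3, 4, 5, 10, 20, 40):
    value = evaluate([1] * n)
    print(n, value, abs(float(value) - golden))
```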

The next theorem shows that the constant $1/\sqrt{5}$ that appears in Theorem 1.1.9
cannot be improved for arbitrary irrational numbers. The proof again relies upon the
golden mean.

Theorem 1.1.11 (Hurwitz’s Theorem II). For the golden mean $\gamma^*$ we have that the inequality

\[
\left| \gamma^* - \frac{p_n}{q_n} \right| \leq \frac{C}{q_n^2}
\]

is satisfied for at most finitely many convergents $p_n/q_n$ if and only if $C < 1/\sqrt{5}$.

Proof. Firstly, note that since $\gamma^* = [1, 1, 1, \ldots]$, we have for the remainders that $r_n :=
x_n + [x_{n+1}, x_{n+2}, \ldots]$ is in this case given by $r_n = 1 + [1, 1, 1, \ldots] = 1/\gamma^*$, for all $n \in \mathbb{N}$.
Secondly, note that

\[
s_n = [x_n, x_{n-1}, \ldots, x_1] = \gamma^* + ([x_n, \ldots, x_1] - [x_{n+1}, x_{n+2}, \ldots]) = \gamma^* + \delta_n,
\]

where for $\delta_n$ we have that $\lim_{n \to \infty} \delta_n = 0$. Hence, from these two observations it
follows that

\[
r_{n+1} + s_n = \frac{1}{\gamma^*} + \gamma^* + \delta_n = \frac{\sqrt{5} + 1}{2} + \frac{\sqrt{5} - 1}{2} + \delta_n = \sqrt{5} + \delta_n.
\]

Now, if $C < 1/\sqrt{5}$ is given, say $C = 1/(\sqrt{5} + \rho)$ for some fixed $\rho > 0$, then

\[
\left| \gamma^* - \frac{p_n}{q_n} \right| = \frac{1}{q_n^2 (r_{n+1} + s_n)} = \frac{1}{q_n^2 (\sqrt{5} + \delta_n)} \leq \frac{1}{q_n^2 (\sqrt{5} + \rho)},
\]

where the last inequality can only be fulfilled for finitely many $n$, due to the fact that
$\sqrt{5} + \rho \leq \sqrt{5} + \delta_n$ can be satisfied for at most finitely many $n$.

Corollary 1.1.12. For each irrational number $x \in (0, 1]$, the inequality
\[ \left| x - \frac{p_n}{q_n} \right| \le \frac{K}{q_n^2} \]
is fulfilled for infinitely many reduced $p_n/q_n$ as long as $K \ge 1/\sqrt{5}$.

We now want to investigate some further results in the vein of Hurwitz’s Theorems
and Corollary 1.1.12 given above. Before that, it will be useful to make some further
definitions.

Definition 1.1.13.
(a) Let $c$ denote a fixed positive real number. An irrational number $x$ is said to be
$c$-approximable if and only if the inequality
\[ \left| x - \frac{p}{q} \right| < \frac{c}{q^2} \]
is satisfied for infinitely many reduced $p/q$.


(b) To each irrational number x we associate a non-negative real number ν (x),
defined by

ν (x) := inf {c > 0 : x is c-approximable},

and called the Hurwitz constant for x.


(c) Two irrational numbers $x$ and $y$ are called equivalent if and only if there exist $k, \ell \in \mathbb{N}$ such that the $k$-th remainder of $x$ is equal to the $\ell$-th remainder of $y$. In other
words, the continued fraction expansions of $x$ and $y$ are the same if we discard a
finite initial block from each.
(d) Irrational numbers equivalent to the golden mean are called noble numbers.

So, for every irrational number $x$ it follows directly from Theorem 1.1.9 (Hurwitz’s
Theorem I) that $\nu(x) \le 1/\sqrt{5}$. From Hurwitz’s Theorem II and the fact that if $x$ and $y$ are
equivalent then $\nu(x) = \nu(y)$ (see Exercise 1.6.6), it follows that if an irrational number
$x$ is noble, then $\nu(x) = 1/\sqrt{5}$.

Theorem 1.1.14. Let $N$ be some fixed positive integer. If $x = [x_1, x_2, x_3, \ldots]$ is irrational
such that for some $n \in \mathbb{N}$ we have that the inequality
\[ \left| x - \frac{p_i}{q_i} \right| > \frac{1}{q_i^2 \sqrt{N^2 + 4}} \]
is fulfilled for all $i \in \{n, n + 1, n + 2\}$, then it follows that $x_{n+2} < N$.

Proof. We proceed as in the proofs of Theorems 1.1.8 and 1.1.9, with $2$ and $\sqrt{5}$,
respectively, now replaced by $\sqrt{N^2 + 4}$. In this way, considering in sequence $i = n$ and
$i = n + 1$, we derive
\[ s_{n+1}^2 - \sqrt{N^2 + 4}\, s_{n+1} + 1 < 0. \]
Also, by considering $i = n + 1$ and $i = n + 2$, we similarly derive the inequality
\[ s_{n+2}^2 - \sqrt{N^2 + 4}\, s_{n+2} + 1 < 0. \]
Then, using the quadratic formula, we obtain that
\[ \frac{\sqrt{N^2 + 4} - N}{2} < s_i \quad\text{and}\quad s_i^{-1} < \frac{\sqrt{N^2 + 4} + N}{2}, \quad\text{for } i = n + 1, n + 2. \]
Using this, we then have that
\[ x_{n+2} = s_{n+1} + x_{n+2} - s_{n+1} = s_{n+2}^{-1} - s_{n+1} < \frac{\sqrt{N^2 + 4} + N}{2} - \frac{\sqrt{N^2 + 4} - N}{2} = N. \]
It follows on combining the above theorem with the second theorem of Hurwitz that if
$\nu(x) = 1/\sqrt{5}$, then for all large enough $n$, we have $x_n = 1$. That is, if $\nu(x) = 1/\sqrt{5}$, then
$x$ is a noble number. In light of the discussion immediately preceding Theorem 1.1.14,
we have that $\nu(x) = 1/\sqrt{5}$ if and only if $x$ is a noble number.
Let us now consider another interesting class of numbers, namely, all those with
bounded continued fraction elements.

Definition 1.1.15. For each N ∈ N, define

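\[ B_N := \{[x_1, x_2, \ldots] \in [0, 1] \setminus \mathbb{Q} : \exists\, n_0 \in \mathbb{N} \text{ such that } x_n < N \text{ for all } n \ge n_0\}. \]

The set of badly approximable numbers $B$ is then defined to be
\[ B := \bigcup_{N>0} B_N = \{x \in [0, 1] \setminus \mathbb{Q} : \text{there exists } N \in \mathbb{N} \text{ such that } x \in B_N\}. \]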

In other words, an irrational number $x$ belongs to the set $B_N$ if and only if it is
equivalent to some $y = [y_1, y_2, \ldots]$ such that $y_i < N$ for all $i \in \mathbb{N}$. The following
proposition clarifies why the elements in $B$ are called “badly approximable”.

Proposition 1.1.16.
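(a) Fix $N \in \mathbb{N}$. If $x$ is an irrational number in $[0, 1]$ such that $x \notin B_N$, then
\[ \left| x - \frac{p_n}{q_n} \right| \le \frac{1}{q_n^2 \sqrt{N^2 + 4}} \]
is fulfilled for infinitely many $n \in \mathbb{N}$. In other words, we have that $\nu(x) \le 1/\sqrt{N^2 + 4}$.
(b) For each $x \in B$ there exists a constant $C > 0$ such that for all $n \in \mathbb{N}$ we have
\[ \left| x - \frac{p_n}{q_n} \right| > \frac{C}{q_n^2}. \]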

Proof. The first part is an immediate consequence of Theorem 1.1.14. For the second
part, fix $x = [x_1, x_2, x_3, \ldots] \in B$. Then there exist numbers $M$ and $m_0$ such that $x_n < M$
for all $n \ge m_0$. Using this, for such an $n$ we derive the inequality
\[ r_{n+1} + s_n = x_{n+1} + [x_{n+2}, x_{n+3}, \ldots] + [x_n, \ldots, x_1] < M + 1 + 1 = M + 2 \]
and hence
\[ \left| x - \frac{p_n}{q_n} \right| > \frac{1}{q_n^2 (M + 2)}, \quad\text{for all } n \ge m_0. \]
For each of the finitely many $n < m_0$ we have that there exists a number $c_n > 0$ such
that
\[ \left| x - \frac{p_n}{q_n} \right| > \frac{c_n}{q_n^2}. \]
If we then define $C := \min\{1/(M + 2), c_0, c_1, \ldots, c_{m_0 - 1}\}$, the result follows.


Notice that in the above proof, it can be seen that the constant C depends on the value
of N for which x ∈ BN . In that sense, one might say that the noble numbers are the
“worst approximable” numbers.

1.2 Topological Dynamical Systems

Throughout this book, a (topological) dynamical system $(X, T)$ is a continuous map
$T : X \to X$ of a non-empty metric space $X$. The first objective in the study of a dynamical
system is the consideration of the orbits of points of $X$. The (forward) orbit $O(x)$ is
the set
\[ O(x) := \{T^n(x) : n \ge 0\}. \]

For certain types of point, which we define below, the orbit is very easy to determine.
1.2 Topological Dynamical Systems | 13

Definition 1.2.1. Let $T : X \to X$ be a dynamical system.
(a) If $T(x) = x$, then $x$ is called a fixed point for $T$.
(b) A point $x \in X$ is said to be periodic for $T$ if $T^n(x) = x$ for some $n \ge 1$. Then $n$ is called
a period of $x$. The smallest period of a periodic point $x$ is called the prime period of
$x$. Note that the fixed points for $T$ are the periodic points with prime period $n = 1$.
(c) A point $x \in X$ is said to be pre-periodic for $T$, if $T^k(x)$ is a periodic point for some
$k \ge 1$. In other words, $x$ is pre-periodic if $T^{k+n}(x) = T^k(x)$ for some $n \ge 1$.
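
The notions in Definition 1.2.1 can be explored numerically. The following is a minimal sketch, not part of the original text: it iterates a map in exact rational arithmetic and reports when an orbit first enters a cycle. The helper names `doubling` and `classify` are illustrative assumptions, and the doubling map is used only as a convenient test case.

```python
from fractions import Fraction

def doubling(x):
    """Doubling map x -> 2x (mod 1) on [0, 1), in exact rational arithmetic."""
    return (2 * x) % 1

def classify(T, x, max_steps=1000):
    """Return (k, n): the smallest k >= 0 such that T^k(x) is periodic,
    together with the prime period n of T^k(x).

    Returns None if no point of the orbit repeats within max_steps iterations.
    """
    seen = {}  # point -> index of its first visit
    step = 0
    while step <= max_steps:
        if x in seen:
            # First repetition: the orbit entered its cycle at index seen[x].
            return seen[x], step - seen[x]
        seen[x] = step
        x = T(x)
        step += 1
    return None

print(classify(doubling, Fraction(0)))      # (0, 1): a fixed point
print(classify(doubling, Fraction(1, 3)))   # (0, 2): periodic with prime period 2
print(classify(doubling, Fraction(1, 12)))  # (2, 2): pre-periodic, T^2(x) has period 2
```

Exact fractions are used so that equality of orbit points is decided without rounding error.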

Suppose that we are given two dynamical systems (X, T) and (Y , S). It is desirable to
have conditions under which these two systems should be considered dynamically
equivalent, that is, as in some dynamical sense, “the same”. The sense we are after is
that their orbits should behave in the same way. The following definition does exactly
this job.

Definition 1.2.2. Two dynamical systems (X, T) and (Y , S) are said to be topologically
conjugate if there exists a homeomorphism h : X → Y, called a conjugacy map, such
that

h ◦ T = S ◦ h.

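In other words, $(X, T)$ and $(Y, S)$ are topologically conjugate if there exists a homeomorphism $h$ such that the following diagram commutes:
\[
\begin{array}{ccc}
X & \xrightarrow{\;T\;} & X \\
{\scriptstyle h}\downarrow & & \downarrow{\scriptstyle h} \\
Y & \xrightarrow{\;S\;} & Y
\end{array}
\]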

Remark 1.2.3.
1. Topological conjugacy defines an equivalence relation on the space of all topological dynamical systems.
2. If two dynamical systems $(X, T)$ and $(Y, S)$ are topologically conjugate via a
conjugacy map $h$, then all of their corresponding iterates are topologically
conjugate by means of $h$. That is, $h \circ T^n = S^n \circ h$ for all $n \ge 1$. Therefore, there
exists a one-to-one correspondence between the orbits of $T$ and those of $S$.

Suppose that we are given two topologically conjugate dynamical systems, (X, T) and
(Y , S). If T(x) = x, it follows that

h(x) = h ◦ T(x) = S(h(x))

and so h(x) is a fixed point of the map S. Thus, there is a one-to-one correspondence
between the fixed points of T and the fixed points of S. In particular, if the number of
fixed points of T and S are not equal, the systems cannot be topologically conjugate.
The number of fixed points is an example of a topological conjugacy invariant. The
number of periodic points of each prime period is similarly seen to be a topological
conjugacy invariant.

Example 1.2.4.
(a) Consider the two maps of the unit circle T2 : R/Z → R/Z and T3 : R/Z → R/Z,
defined by setting

\[ T_2(x) := 2x \pmod 1 \quad\text{and}\quad T_3(x) := 3x \pmod 1. \]

The map $T_2$ has a single fixed point at $0$ and the set of fixed points of the map $T_3$ is
$\{0, 1/2\}$. Therefore, the number of fixed points is not the same and so the systems
$(\mathbb{R}/\mathbb{Z}, T_2)$ and $(\mathbb{R}/\mathbb{Z}, T_3)$ cannot be topologically conjugate.

(b) Let $f : [0, 1] \to [0, 1]$ be given by $f(x) := x^2$ and $g : [0, 1] \to [0, 1]$ be given by
$g(x) := 3x(1 - x)$. Then the set of fixed points of $f$ is $\{0, 1\}$, whereas the set of fixed
points of $g$ is $\{0, 2/3\}$. However, although they have the same number of fixed
points, there is no topological conjugacy map between $([0, 1], f)$ and $([0, 1], g)$.
This can be seen by noting that every homeomorphism $h : [0, 1] \to [0, 1]$ is either
strictly increasing or strictly decreasing. In order to have $h$ be a conjugating
homeomorphism between $f$ and $g$, we would have to have $h(0) = 0$ and $h(1) = 2/3$
or vice versa. But this is simply not possible for a strictly monotonic function that
also has to map $[0, 1]$ onto $[0, 1]$, since such a function sends the endpoints $\{0, 1\}$ to $\{0, 1\}$.
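
The fixed point counts in part (a) can be checked by direct enumeration. The sketch below is illustrative rather than taken from the text: it uses the elementary fact that $m^n x \equiv x \pmod 1$ forces $x = j/(m^n - 1)$, and the function names `times_m` and `fixed_points_of_iterate` are assumptions introduced here.

```python
from fractions import Fraction

def times_m(m):
    """Return the circle map x -> m*x (mod 1) acting on exact rationals in [0, 1)."""
    return lambda x: (m * x) % 1

def fixed_points_of_iterate(m, n):
    """All x in [0, 1) with T_m^n(x) = x, searched among the candidates j / (m**n - 1)."""
    T = times_m(m)
    d = m**n - 1
    points = []
    for j in range(d):
        x = Fraction(j, d)
        y = x
        for _ in range(n):
            y = T(y)
        if y == x:
            points.append(x)
    return points

print(fixed_points_of_iterate(2, 1))  # [Fraction(0, 1)]
print(fixed_points_of_iterate(3, 1))  # [Fraction(0, 1), Fraction(1, 2)]
```

Since the two lists have different lengths, the number of fixed points already distinguishes the two systems, in line with the example.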

In the next definition, we give a weaker notion than that of topological conjugacy.

Definition 1.2.5. Let (X, T) and (Y , S) be two dynamical systems. If there exists a
continuous surjection h : X → Y which satisfies h ◦ T = S ◦ h, then S is called a
(topological) factor of T. The map h is thereafter called a factor map.

In general, the existence of a factor map between two systems is not sufficient to make
them topologically conjugate. Nonetheless, if (Y , S) is a factor of (X, T), then every
orbit of T is projected to an orbit of S. As every factor map is by definition surjective,
this means that all of the orbits of S have an analogue in T. However, as a factor map
may not be injective, more than one orbit of T may be projected to the same orbit of S.
In other words, some orbits of S may have more than one analogue in T. Therefore, the
dynamical system (X, T) can in this sense be thought of as being more “complicated”
than the factor (Y , S).
In the following subsection, we introduce the first of the examples of
number-theoretic dynamical systems that will be used to illustrate various concepts
throughout the book. We will soon see that this map is related to the continued fraction
expansion introduced in Section 1.1.1.

1.2.1 The Gauss map

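Definition 1.2.6. Let $G : [0, 1] \to [0, 1]$ be defined by
\[
G(x) :=
\begin{cases}
\dfrac{1}{x} - \left\lfloor \dfrac{1}{x} \right\rfloor & \text{for } 0 < x \le 1; \\[1ex]
0 & \text{for } x = 0.
\end{cases}
\]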

The map G is referred to as the Gauss map and its graph is shown in Fig. 1.1.

Let us also define here the inverse branches $G_n : (0, 1) \to (1/(n + 1), 1/n)$ of the Gauss
map. These are given, for each $n \in \mathbb{N}$, by
\[ G_n(x) := \frac{1}{x + n}. \]
In the following proposition, we shall show how the map G acts on points in the unit
interval written in terms of their continued fraction expansion.

Proposition 1.2.7. If $x = [x_1, x_2, x_3, \ldots] \in [0, 1]$, where the continued fraction expansion
of $x$ is either infinite or finite and consists of at least two elements, then $G(x) = [x_2, x_3, x_4, \ldots]$. Moreover, if $x = [x_1]$, then $G(x) = 0$.

Proof. If $x = [x_1, x_2, x_3, \ldots]$, then directly from the definition of $G$, we have that
\[ G(x) = \frac{1}{x} - \left\lfloor \frac{1}{x} \right\rfloor = x_1 + \cfrac{1}{x_2 + \cfrac{1}{x_3 + \cdots}} - x_1 = [x_2, x_3, \ldots]. \]
If $x = [x_1] = 1/x_1$, then $G(x) = x_1 - x_1 = 0$.
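
The shifting behaviour described in Proposition 1.2.7 can be observed on rational inputs, where the expansion is finite. The following sketch is only an illustration under that restriction; the function names `gauss`, `cf_digits` and `from_digits` are assumptions made for this example.

```python
from fractions import Fraction
from math import floor

def gauss(x):
    """Gauss map G(x) = 1/x - floor(1/x) for 0 < x <= 1, and G(0) = 0."""
    if x == 0:
        return Fraction(0)
    y = 1 / Fraction(x)
    return y - floor(y)

def cf_digits(x):
    """Continued fraction digits [x1, x2, ...] of a rational x in (0, 1],
    read off as floor(1/x) along the orbit of x under the Gauss map."""
    digits = []
    x = Fraction(x)
    while x != 0:
        digits.append(floor(1 / x))
        x = gauss(x)
    return digits

def from_digits(digits):
    """Evaluate the finite continued fraction [x1, ..., xk]."""
    value = Fraction(0)
    for a in reversed(digits):
        value = 1 / (a + value)
    return value

x = Fraction(7, 16)
print(cf_digits(x))          # [2, 3, 2]
print(cf_digits(gauss(x)))   # [3, 2]: the first digit has been dropped
print(gauss(Fraction(1, 5))) # 0, since 1/5 = [5]
```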

Fig. 1.1. The Gauss map $G : [0, 1] \to [0, 1]$, $G(x) := \frac{1}{x} - \left\lfloor \frac{1}{x} \right\rfloor$ for $x \in (0, 1]$ and $G(0) := 0$.

Using the latter proposition, we can very easily identify the fixed points and periodic
points for the map G. First of all, by definition, G has a fixed point at 0. There
are countably many more fixed points, given by the points x = [n, n, n, . . .], for
$n \in \mathbb{N}$. For $n = 1$, this is the golden mean, $\gamma^*$. For $n = 2$, we have the fixed point
$\sqrt{2} - 1 = [2, 2, 2, \ldots]$. The periodic points for $G$ are simply the periodic continued
fractions, that is, points with continued fraction expansions of the form $[\overline{x_1, \ldots, x_k}] := [x_1, \ldots, x_k, x_1, \ldots, x_k, x_1, \ldots]$, with the block $x_1, \ldots, x_k$ repeating infinitely many
times. It remains to describe the pre-periodic points. The pre-periodic points of
the trivial fixed point $0$ are the rational numbers. The pre-periodic points for all
other fixed points are numbers of the form $x = [x_1, \ldots, x_k, n, n, n, \ldots]$, that is, with
finitely many elements that can take any value, before infinitely many of the same
element $n$, for some $n \in \mathbb{N}$. In particular, the pre-periodic points for the fixed point
$\gamma^*$ are the set of noble numbers (cf. Definition 1.1.13 (d)). Finally, the pre-periodic
points for the remaining periodic points are those points in the unit interval with
eventually periodic continued fraction expansions, that is, numbers of the form $x = [x_1, \ldots, x_m, \overline{x_{m+1}, \ldots, x_{m+k}}]$. These eventually periodic expansions are the subject of
the following theorem. Before stating the theorem, recall that a quadratic surd is an
irrational root of a quadratic equation with integer coefficients. These are also called
algebraic numbers of degree two.

Theorem 1.2.8 (Lagrange’s Theorem). Every quadratic surd has an eventually periodic
continued fraction expansion; conversely, every eventually periodic continued fraction
represents a quadratic surd.

Proof. See, for instance, Theorem 28 in Khintchine [Khi64].
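
As a small worked illustration (added here, not part of the original text), the non-trivial fixed points $[n, n, n, \ldots]$ of $G$ described before the theorem can be computed in closed form, and they are visibly quadratic surds, as Lagrange's Theorem predicts. If $x = [n, n, n, \ldots]$, then $x = 1/(n + x)$, so that

```latex
\[
x^2 + n x - 1 = 0
\quad\Longrightarrow\quad
x = \frac{-n + \sqrt{n^2 + 4}}{2},
\]
\[
n = 1:\; x = \frac{\sqrt{5} - 1}{2} = \gamma^*,
\qquad
n = 2:\; x = \frac{-2 + \sqrt{8}}{2} = \sqrt{2} - 1 .
\]
```

Only the positive root is taken, since $x$ lies in the unit interval.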


From this point on, we are most of the time no longer interested in the rational
numbers, as they are simply the countable set of pre-periodic points for the trivial
fixed point 0. In other words, from here on all continued fraction expansions are
assumed to be infinite. We will denote the irrational points in the unit interval by
I := [0, 1] \ Q.

1.2.2 Symbolic dynamics

Recall Proposition 1.2.7, where we showed how the Gauss map acts on the continued
fraction expansions. In this section, we will introduce some more general theory
which will illuminate the idea behind this proposition, namely, the beginnings
of the theory of symbolic dynamics and its connections to topological dynamical
systems which admit a Markov partition (see Section 1.2.5 for the definition of
Markov partitions in the context of interval maps). We will only provide a very
short introduction, and for more details we refer, for example, to Lind and Marcus
[LM95].

Definition 1.2.9.
(a) Let E be a countable, possibly infinite set containing at least two elements. The
set E will be referred to as an alphabet. The elements of E will be called letters or
symbols.
(b) For each $n \in \mathbb{N}$ we shall denote by $E^n$ the set of all words comprising $n$ letters from
the alphabet $E$. For later convenience, we also denote the empty word (that is, the
word having no letters) by $\varepsilon$. For instance, if $E = \{0, 1\}$ then
\[ E^1 = E \quad\text{and}\quad E^2 = \{(0, 0), (0, 1), (1, 0), (1, 1)\}, \quad\text{whereas} \]
\[ E^3 = \{(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1)\}. \]
(c) We will denote by $E^* := \bigcup_{n \in \mathbb{N}} E^n$ the set of all finite non-empty words over the
alphabet $E$. The set of all infinite words will be denoted by $E^{\mathbb{N}}$. In other words,
\[ E^{\mathbb{N}} := \left\{ \omega = (\omega_i)_{i=1}^{\infty} : \omega_i \in E \text{ for all } i \in \mathbb{N} \right\}. \]
(d) We define the length $|\omega|$ of a word $\omega$ to be the number of letters of which it consists.
That is, for every $\omega \in E^*$, the length of $\omega$ is the unique $n \in \mathbb{N}$ such that $\omega \in E^n$. For
$\omega \in E^{\mathbb{N}}$, we have that $|\omega| = \infty$. Furthermore, $|\varepsilon| = 0$.
(e) If $\omega \in E^* \cup E^{\mathbb{N}}$ and $n \in \mathbb{N}$ does not exceed the length of $\omega$, we define the initial block
$\omega|_n$ to be the initial $n$-length word of $\omega$, that is, the subword $\omega_1 \omega_2 \ldots \omega_n$.
(f) Given two words $\omega, \tau \in E^* \cup E^{\mathbb{N}}$, we define their wedge $\omega \wedge \tau \in \{\varepsilon\} \cup E^* \cup E^{\mathbb{N}}$ to be
their longest common initial block. For example, if we again let $E = \{0, 1\}$ and we
have words $\omega = (0, 0, 1, 0, 1, \ldots)$ and $\tau = (0, 0, 1, 1, 0, \ldots)$, then $\omega \wedge \tau = (0, 0, 1)$.
On the other hand, if $\gamma = (1, 0, 1, 0, 1, \ldots)$ then $\omega \wedge \gamma = \varepsilon$. Of course, if two (finite
or infinite) words $\omega$ and $\tau$ are equal, then $\omega \wedge \tau = \omega = \tau$.

Let us now introduce a metric on the space EN which reflects the idea that two words
are close if they share a long initial block. In other words, the longer their common
initial subword, the closer two words are. We leave the proof that this genuinely
defines a metric, in fact an ultrametric, as an exercise (see Exercise 1.6.10).

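Definition 1.2.10. Let the metric $d : E^{\mathbb{N}} \times E^{\mathbb{N}} \to [0, 1]$ be defined by $d(\omega, \tau) := 2^{-|\omega \wedge \tau|}$.

If $\omega$ and $\tau$ have no common initial block, then $\omega \wedge \tau = \varepsilon$. Thus, $|\omega \wedge \tau| = 0$ and $d(\omega, \tau) = 1$. On the other hand, if the two infinite words $\omega$ and $\tau$ are such that $\omega = \tau$, then $|\omega \wedge \tau| = \infty$ and we define $(1/2)^{+\infty} := 0$.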

Definition 1.2.11. Given a finite word ω ∈ E* , the cylinder set [ω] generated by ω is the
set of all infinite words with initial block ω, that is,

\[ [\omega] := \{\tau \in E^{\mathbb{N}} : \tau|_{|\omega|} = \omega\} = \{\tau \in E^{\mathbb{N}} : \tau_i = \omega_i \text{ for all } 1 \le i \le |\omega|\}. \]

We now introduce the shift map, which is defined by dropping the first letter of each
word and shifting all the remaining letters one place to the left.

Definition 1.2.12. The shift map $\sigma : E^{\mathbb{N}} \to E^{\mathbb{N}}$ is defined by setting $\sigma(\omega) = \sigma((\omega_i)_{i \ge 1}) := (\omega_{i+1})_{i \ge 1}$. That is,
\[ \sigma((\omega_1, \omega_2, \omega_3, \ldots)) := (\omega_2, \omega_3, \omega_4, \ldots). \]

The shift map is $\#E$-to-one on $E^{\mathbb{N}}$, where $\#E$ denotes the cardinality of $E$. In other
words, each infinite word has $\#E$ preimages under the shift map. In particular, if $E$ is
countably infinite, it follows that $\sigma$ is countable-to-one. Indeed, given any letter $e \in E$
and any infinite word $\omega \in E^{\mathbb{N}}$, the concatenation $e\omega = (e, \omega_1, \omega_2, \omega_3, \ldots)$ of $e$ with $\omega$
is a preimage of $\omega$ under the shift map, since $\sigma(e\omega) = \omega$.
Note that the shift map is continuous, since two words that are close share a long
initial block, and thus their images under the shift map, which result from dropping
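their first letters, will also share a long initial block. More precisely, whenever $d(\omega, \tau) < 1$, that is, whenever $|\omega \wedge \tau| \ge 1$, we have that
\[ d(\sigma\omega, \sigma\tau) = 2^{-|\sigma\omega \wedge \sigma\tau|} = 2^{-|\omega \wedge \tau| + 1} = 2 \cdot 2^{-|\omega \wedge \tau|} = 2\, d(\omega, \tau). \]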

It is evident that if ω = τ, then both d(ω, τ) and d(σ(ω), σ(τ)) are equal to zero.
The system (EN , σ) will be referred to as the full shift system. Let us now consider
another useful construction, that of sub-shifts. These are restrictions of the shift map
to certain closed and shift-invariant subsets of EN and are easiest to define in terms of
an incidence matrix, that is, an (E × E)-matrix consisting entirely of 0s and 1s.
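
The metric and the shift can be tried out on finite prefixes of infinite words. This is a minimal sketch under the assumption that the two prefixes supplied already differ somewhere, so that the wedge of the underlying infinite words is visible in the prefixes; the names `wedge_length`, `d` and `shift` are illustrative.

```python
def wedge_length(omega, tau):
    """Length of the longest common initial block of two finite prefixes."""
    n = 0
    for a, b in zip(omega, tau):
        if a != b:
            break
        n += 1
    return n

def d(omega, tau):
    """d(omega, tau) = 2^(-|omega ^ tau|), computed from prefixes that differ somewhere."""
    return 2.0 ** (-wedge_length(omega, tau))

def shift(omega):
    """Shift map: drop the first letter and move the rest one place to the left."""
    return omega[1:]

omega = (0, 0, 1, 0, 1)
tau = (0, 0, 1, 1, 0)
print(d(omega, tau))                # 0.125, since the wedge is (0, 0, 1)
print(d(shift(omega), shift(tau)))  # 0.25 = 2 * d(omega, tau)
```

The doubling of the distance under the shift is exactly the identity $d(\sigma\omega, \sigma\tau) = 2\,d(\omega, \tau)$ for words sharing at least their first letter.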
 
Definition 1.2.13. Let $A = (A_{ij})_{i,j \in E}$ be an incidence matrix. The set of all infinite
$A$-admissible words is defined to be
\[ E_A^{\mathbb{N}} := \{\omega \in E^{\mathbb{N}} : A_{\omega_n \omega_{n+1}} = 1 \text{ for all } n \in \mathbb{N}\}. \]

Note that if the $n$-th row of $A$ does not contain any 1, then no word can contain the
letter $n$. This letter then may as well be thrown out of the alphabet. We will therefore
impose the condition that every row of $A$ contains at least one 1. Further, notice that
if all the entries of the incidence matrix $A$ are equal to 1, then $E_A^{\mathbb{N}} = E^{\mathbb{N}}$. However, if $A$
has at least one 0 entry, then $E_A^{\mathbb{N}}$ is a proper subset of $E^{\mathbb{N}}$, called a sub-shift (of finite
type). In particular, if $A$ is the identity matrix then $E_A^{\mathbb{N}} = \{(e, e, e, \ldots) : e \in E\}$, that is,
$E_A^{\mathbb{N}}$ is the set of all constant words, which are the fixed points of $\sigma$ in $E^{\mathbb{N}}$. We will now
consider a more interesting sub-shift.

Example 1.2.14. Let $E = \{0, 1\}$ and let
\[ A = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}. \]
Then $E_A^{\mathbb{N}}$ consists of all those points where a 0 can be followed by either a 0 or a
1, but a 1 can only be followed by a 0. This example is known as the golden mean
shift. One reason for this name can be discovered on considering the cardinality of
the sets $E_A^n$, for $n \in \mathbb{N}$. We have (and the reader is advised to draw a diagram of
the $A$-admissible words to see why) that the sequence $(\#E_A^n)_{n \ge 1}$ coincides with the
sequence $(2, 3, 5, 8, 13, 21, 34, \ldots)$. This latter sequence is, of course, the sequence
of Fibonacci numbers starting with $f_2 = 2$ and $f_3 = 3$ instead of $f_0 = 1$ and $f_1 = 1$.
Observe that we have for the sequence of convergents $(p_n/q_n)_{n \ge 1}$ to $\gamma = (1 + \sqrt{5})/2$
that $p_n/q_n = f_{n+2}/f_{n+1}$ for each $n \in \mathbb{N}$.
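
The count of admissible words in Example 1.2.14 can be reproduced by brute force. The sketch below is an illustration only: it stores the incidence matrix as a dictionary and filters all words of a given length; the names `A` and `admissible_words` are choices made for this example.

```python
from itertools import product

# Incidence matrix of Example 1.2.14: the only forbidden transition is 1 -> 1.
A = {(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def admissible_words(n):
    """All words of length n over {0, 1} in which every transition is allowed by A."""
    return [w for w in product((0, 1), repeat=n)
            if all(A[(w[i], w[i + 1])] == 1 for i in range(n - 1))]

counts = [len(admissible_words(n)) for n in range(1, 8)]
print(counts)  # [2, 3, 5, 8, 13, 21, 34]
```

Each admissible word of length $n + 1$ ends either in 0 (preceded by any admissible word of length $n$) or in 01 (preceded by any admissible word of length $n - 1$), which explains the Fibonacci recursion.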

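In order to define the shift map on a sub-shift space $E_A^{\mathbb{N}}$, we must first verify that these
spaces are $\sigma$-invariant, which means that $\sigma(E_A^{\mathbb{N}}) \subseteq E_A^{\mathbb{N}}$. So, let $\omega \in E_A^{\mathbb{N}}$. Then $A_{\omega_n \omega_{n+1}} = 1$
for every $n \in \mathbb{N}$. In particular, $A_{(\sigma\omega)_n (\sigma\omega)_{n+1}} = A_{\omega_{n+1} \omega_{n+2}} = 1$ for all $n \in \mathbb{N}$. Thus, $\sigma\omega \in E_A^{\mathbb{N}}$
and it therefore follows that $E_A^{\mathbb{N}}$ is $\sigma$-invariant. Therefore, the restriction $\sigma : E_A^{\mathbb{N}} \to E_A^{\mathbb{N}}$ is
well defined.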

1.2.3 A return to the Gauss map

Let us now return to the Gauss map. In light of the above description of symbolic
dynamics and the shift map, we can now rephrase Proposition 1.2.7 in the following
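way: The Gauss map acts on the irrational points of the unit interval like the full shift
map acts on the space $\mathbb{N}^{\mathbb{N}}$. More precisely, we have that the topological dynamical
systems $(I, G)$ and $(\mathbb{N}^{\mathbb{N}}, \sigma)$ are topologically conjugate under the map $h : I \to \mathbb{N}^{\mathbb{N}}$ defined
by $h([x_1, x_2, x_3, \ldots]) = (x_1, x_2, x_3, \ldots) \in \mathbb{N}^{\mathbb{N}}$. In other words, we obtain the following
commuting diagram:
\[
\begin{array}{ccc}
I & \xrightarrow{\;G\;} & I \\
{\scriptstyle h}\downarrow & & \downarrow{\scriptstyle h} \\
\mathbb{N}^{\mathbb{N}} & \xrightarrow{\;\sigma\;} & \mathbb{N}^{\mathbb{N}}
\end{array}
\]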

We leave it as an exercise for the reader to verify that the map h is genuinely a
conjugacy map between the Gauss and shift systems (see Exercise 1.6.12).
Recall the sequence of convergents to each irrational number x ∈ [0, 1] introduced
in Subsection 1.1.1; these are the equivalent of initial blocks for the Gauss map and so
we can use them to define Gauss cylinder sets.

Definition 1.2.15. For each choice $x_1, \ldots, x_k \in \mathbb{N}$, define the $k$-th level Gauss cylinder
set $C(x_1, \ldots, x_k)$ by
\[ C(x_1, \ldots, x_k) := \{[y_1, y_2, \ldots] : y_i = x_i \text{ for } 1 \le i \le k\}. \]



The first level of these cylinder sets are given by the sets, for $n \in \mathbb{N}$,
\[ C(n) = \{[n, x_2, x_3, \ldots] : x_i \in \mathbb{N},\ i \ge 2\}. \]
For fixed $n \in \mathbb{N}$, any point $x \in C(n)$ is of the form
\[ x = \frac{1}{n + [y_1, y_2, \ldots]}, \]
for some $y_1, y_2, \ldots \in \mathbb{N}$. Therefore, since the values of $[y_1, y_2, \ldots]$ lie between 0 and 1,
we see that such a point is bigger than $1/(n + 1)$ and smaller than $1/n$. Therefore, $C(n) = (1/(n + 1), 1/n) \cap I$. In general, if we fix some $x_1, \ldots, x_k \in \mathbb{N}$ and let $x \in C(x_1, \ldots, x_k)$,
we have that
\[ x = \cfrac{1}{x_1 + \cfrac{1}{\ddots + \cfrac{1}{x_k + [y_1, y_2, \ldots]}}} \]
and again, since the values of $[y_1, y_2, \ldots]$ range from 0 to 1, we infer that $x$ can
take any value in the set $([x_1, \ldots, x_k], [x_1, \ldots, (x_k + 1)])_{\pm} \cap I$, where the notation $(\cdot, \cdot)_{\pm}$
indicates that the rational number $[x_1, \ldots, x_k]$ may be the left or right endpoint of
$C(x_1, \ldots, x_k)$ depending upon whether $k$ is even or odd, respectively. Thus, for the
Lebesgue measure $\lambda(C(x_1, \ldots, x_k))$ of this set, we find that
\[ \lambda(C(x_1, \ldots, x_k)) = \left| \frac{p_k}{q_k} - \frac{p_k + p_{k-1}}{q_k + q_{k-1}} \right| = \frac{1}{q_k^2 \left(1 + \dfrac{q_{k-1}}{q_k}\right)}. \]

Here, the last equality comes from Theorem 1.1.1 (c). From this, we immediately obtain
that
\[ \frac{1}{2 q_k^2} \le \lambda(C(x_1, \ldots, x_k)) \le \frac{1}{q_k^2}. \tag{1.7} \]
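
The length formula and the bounds in (1.7) can be checked in exact arithmetic. The following sketch assumes the usual recursion $p_k = x_k p_{k-1} + p_{k-2}$, $q_k = x_k q_{k-1} + q_{k-2}$ with $p_{-1} = 1$, $q_{-1} = 0$, $p_0 = 0$, $q_0 = 1$, which is taken here to match Theorem 1.1.1; the function names are illustrative.

```python
from fractions import Fraction

def convergents(digits):
    """Return (p_{k-1}, q_{k-1}, p_k, q_k) for [x1, ..., xk] via the standard recursion."""
    p_prev, q_prev = 1, 0  # p_{-1}, q_{-1}
    p, q = 0, 1            # p_0, q_0
    for a in digits:
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
    return p_prev, q_prev, p, q

def cylinder_length(digits):
    """Lebesgue measure of C(x1, ..., xk): the distance between its two endpoints
    [x1, ..., xk] and [x1, ..., xk + 1]."""
    p_prev, q_prev, p, q = convergents(digits)
    return abs(Fraction(p, q) - Fraction(p + p_prev, q + q_prev))

length = cylinder_length([2, 3])
_, q_prev, _, q = convergents([2, 3])
print(length)                                                 # 1/63
print(length == Fraction(1, q * (q + q_prev)))                # True: 1 / (q_k^2 (1 + q_{k-1}/q_k))
print(Fraction(1, 2 * q * q) <= length <= Fraction(1, q * q)) # True: inequality (1.7)
```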

It is of interest to calculate the approximate proportion of the $k$-th level cylinder set $C(x_1, \ldots, x_k)$ that is occupied by each of the $(k + 1)$-th level cylinder sets
$C(x_1, \ldots, x_k, n)$. Notice that the endpoints of the interval $C(x_1, \ldots, x_k, n)$ are given
by $[x_1, \ldots, x_k, n]$ and $[x_1, \ldots, x_k, n + 1]$ and, with the aid of Theorem 1.1.1, we can
write these in terms of the rational numbers $p_{k-1}/q_{k-1} = [x_1, \ldots, x_{k-1}]$ and $p_k/q_k = [x_1, \ldots, x_k]$ as follows:
\[ [x_1, \ldots, x_k, n] = \frac{n p_k + p_{k-1}}{n q_k + q_{k-1}} \quad\text{and}\quad [x_1, \ldots, x_k, n + 1] = \frac{(n + 1) p_k + p_{k-1}}{(n + 1) q_k + q_{k-1}}. \]

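Utilising Theorem 1.1.1 (c) once more, we obtain that
\[
\begin{aligned}
\lambda(C(x_1, \ldots, x_k, n)) &= \left| \frac{n p_k + p_{k-1}}{n q_k + q_{k-1}} - \frac{(n + 1) p_k + p_{k-1}}{(n + 1) q_k + q_{k-1}} \right| \\
&= \left| \frac{n (p_k q_{k-1} - q_k p_{k-1}) + (n + 1)(p_{k-1} q_k - p_k q_{k-1})}{(n q_k + q_{k-1})((n + 1) q_k + q_{k-1})} \right| \\
&= \frac{1}{n^2 q_k^2 \left(1 + \dfrac{q_{k-1}}{n q_k}\right)\left(1 + \dfrac{1}{n} + \dfrac{q_{k-1}}{n q_k}\right)}.
\end{aligned}
\]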

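Thus, it follows that
\[ \frac{\lambda(C(x_1, \ldots, x_k, n))}{\lambda(C(x_1, \ldots, x_k))} = \frac{1}{n^2} \cdot \frac{1 + \dfrac{q_{k-1}}{q_k}}{\left(1 + \dfrac{q_{k-1}}{n q_k}\right)\left(1 + \dfrac{1}{n} + \dfrac{q_{k-1}}{n q_k}\right)}. \]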

One easily verifies that the second ratio on the right-hand side of the above equality is
bounded above by 2 and below by 1/3, so finally we find that
\[ \frac{1}{3 n^2} \le \frac{\lambda(C(x_1, \ldots, x_k, n))}{\lambda(C(x_1, \ldots, x_k))} \le \frac{2}{n^2}. \tag{1.8} \]
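
As a quick sanity check of (1.8), added here as an illustration, consider the smallest case $k = 1$, $x_1 = 1$, $n = 1$. Then $C(1) = (1/2, 1) \cap I$, while $C(1, 1)$ has endpoints $[1, 1] = 1/2$ and $[1, 2] = 2/3$, so that

```latex
\[
\frac{\lambda(C(1,1))}{\lambda(C(1))}
= \frac{\tfrac{2}{3} - \tfrac{1}{2}}{1 - \tfrac{1}{2}}
= \frac{1/6}{1/2}
= \frac{1}{3},
\qquad
\frac{1}{3 \cdot 1^2} \;\le\; \frac{1}{3} \;\le\; \frac{2}{1^2}.
\]
```

In this case the lower bound in (1.8) is attained, which is consistent with the general formula for $q_{k-1}/q_k = 1$ and $n = 1$.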

1.2.4 Elementary metrical Diophantine analysis

Let us now consider another area of elementary number theory related to the con-
tinued fraction expansion. In this subsection, we will prove some results concerning
the Lebesgue measure λ of various sets of numbers, beginning with the set of badly
approximable numbers that was introduced in Definition 1.1.15. The proofs will utilise
the Gauss cylinder sets defined above.

Theorem 1.2.16. Where B denotes the set of badly approximable numbers, we have that

λ(B) = 0.

Proof. For each $N$ and $n \in \mathbb{N}$, define the sets
\[ A_N^{(n)} := \{[x_1, x_2, \ldots] \in I : x_i < N \text{ for all } 1 \le i \le n\}. \]
We aim to show that
\[ \lim_{n \to \infty} \lambda\bigl(A_N^{(n)}\bigr) = 0. \]

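To begin, note that $A_N^{(n+1)} \subset A_N^{(n)}$ for each $n \in \mathbb{N}$ and
\[ A_N^{(n+1)} = \bigcup_{\substack{(x_1, \ldots, x_n) \\ x_i < N,\, 1 \le i \le n}} \; \bigcup_{k : k < N} C(x_1, \ldots, x_n, k). \]
Then,
\[
\begin{aligned}
\lambda\Bigl(\bigcup_{k : k < N} C(x_1, \ldots, x_n, k)\Bigr) &= \left| \frac{p_n + p_{n-1}}{q_n + q_{n-1}} - \frac{p_n N + p_{n-1}}{q_n N + q_{n-1}} \right| \\
&= \frac{N - 1}{q_n^2 (1 + s_n)(N + s_n)} \\
&< \frac{N - 1}{q_n^2 N (1 + s_n)} = \frac{N - 1}{N}\, \lambda(C(x_1, \ldots, x_n)).
\end{aligned}
\]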

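Thus,
\[
\begin{aligned}
\lambda\bigl(A_N^{(n+1)}\bigr) &= \lambda\Bigl(\bigcup_{\substack{(x_1, \ldots, x_n) \\ x_i < N,\, 1 \le i \le n}} \; \bigcup_{k : k < N} C(x_1, \ldots, x_n, k)\Bigr) \\
&= \sum_{\substack{(x_1, \ldots, x_n) \\ x_i < N,\, 1 \le i \le n}} \lambda\Bigl(\bigcup_{k : k < N} C(x_1, \ldots, x_n, k)\Bigr) \\
&\le \sum_{\substack{(x_1, \ldots, x_n) \\ x_i < N,\, 1 \le i \le n}} \frac{N - 1}{N}\, \lambda(C(x_1, \ldots, x_n)) = \Bigl(1 - \frac{1}{N}\Bigr) \lambda\bigl(A_N^{(n)}\bigr).
\end{aligned}
\]
Applying this estimate $n$ times, we arrive at
\[ \lambda\bigl(A_N^{(n+1)}\bigr) \le \Bigl(1 - \frac{1}{N}\Bigr) \lambda\bigl(A_N^{(n)}\bigr) \le \cdots \le \Bigl(1 - \frac{1}{N}\Bigr)^{n} \lambda\bigl(A_N^{(1)}\bigr). \]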
N N
 
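Therefore, $\lim_{n \to \infty} \lambda\bigl(A_N^{(n)}\bigr) = 0$. Now, if we define the set $A_N$ by setting
\[ A_N := \{[x_1, x_2, \ldots] \in I : x_i < N \text{ for all } i \in \mathbb{N}\} \]
and notice that $A_N \subset A_N^{(n)}$ for all $n \in \mathbb{N}$, it follows that for all $N \in \mathbb{N}$,
\[ \lambda(A_N) = 0. \]
Finally, observing that $B = \bigcup_{N \in \mathbb{N}} A_N$, we have
\[ \lambda(B) = \lambda\Bigl(\bigcup_{N \in \mathbb{N}} A_N\Bigr) \le \sum_{N \in \mathbb{N}} \lambda(A_N) = 0. \]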
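
The geometric decay used in the proof can be observed numerically for small parameters. The sketch below is illustrative and assumes the standard recursion for the denominators $q_k$ (with $p_{-1} = 1$, $q_{-1} = 0$, $p_0 = 0$, $q_0 = 1$); it sums the exact lengths of all cylinders with digits below $N$. The function names `cylinder_length` and `measure_A` are assumptions made for this example.

```python
from fractions import Fraction
from itertools import product

def cylinder_length(digits):
    """Exact Lebesgue measure of the Gauss cylinder C(x1, ..., xk)."""
    p_prev, q_prev, p, q = 1, 0, 0, 1
    for a in digits:
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
    return abs(Fraction(p, q) - Fraction(p + p_prev, q + q_prev))

def measure_A(N, n):
    """Lebesgue measure of A_N^(n): the sum over all blocks (x1, ..., xn) with x_i < N."""
    return sum(cylinder_length(block) for block in product(range(1, N), repeat=n))

N = 3
for n in range(1, 7):
    current, following = measure_A(N, n), measure_A(N, n + 1)
    # The estimate from the proof: each step loses at least a factor (1 - 1/N).
    print(n, float(current), following <= (1 - Fraction(1, N)) * current)
```

For $N = 3$ the first two values are $2/3$ and $29/84$, and the comparison printed in the last column holds at every step.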

 
Corollary 1.2.17. Let $W := \{[x_1, x_2, \ldots] \in I : \limsup_{n \to \infty} x_n = \infty\}$. Then,
\[ \lambda(W) = 1. \]

Proof. The set W is simply the complement of the set B of badly approximable
numbers.
We have now seen that the set of badly approximable numbers does not contribute
to sets of irrational numbers of positive Lebesgue measure. Hence, if we want to
investigate sets of positive measure, then we have to look for irrationals which
are more rapidly approximated by their approximants than is the case for badly
approximable irrationals. Our next aim is to prove a theorem, originally due to Borel
and Bernstein¹, which will give us some information in this direction. In order to prove
this theorem, we will need the following well-known and extremely useful result. The
proof is neither long nor complicated, so we include it here for completeness.

Lemma 1.2.18 (Borel–Cantelli Lemma). Let $(C_n)_{n \ge 1}$ be a sequence of Borel-measurable
subsets of $[0, 1]$ and define the lim-sup set $C_\infty := \limsup_{n \to \infty} C_n$ to be
\[ \limsup_{n \to \infty} C_n := \bigcap_{n \ge 1} \bigcup_{m \ge n} C_m = \{x \in [0, 1] : x \in C_n \text{ for infinitely many } n \in \mathbb{N}\}. \]
Then, if $\sum_{n=1}^{\infty} \lambda(C_n) < \infty$, we have that
\[ \lambda(C_\infty) = 0. \]
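Proof. The convergence of $\sum_{n=1}^{\infty} \lambda(C_n)$ implies that for each $\varepsilon > 0$, there exists some
$n(\varepsilon) \in \mathbb{N}$ such that
\[ \sum_{n \ge n(\varepsilon)} \lambda(C_n) < \varepsilon. \]
By the definition of $C_\infty$, we have that
\[ C_\infty \subset \bigcup_{n \ge n(\varepsilon)} C_n. \]
Therefore, for arbitrary $\varepsilon > 0$, we obtain that
\[ \lambda(C_\infty) \le \lambda\Bigl(\bigcup_{n \ge n(\varepsilon)} C_n\Bigr) \le \sum_{n \ge n(\varepsilon)} \lambda(C_n) < \varepsilon. \]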

Letting ε tend to zero finishes the proof.

1 Borel’s original article [Bor09] contained a mistake, which was observed and corrected by Bernstein
[Ber12b, Ber12a]

We are now in a position to prove the second and final main theorem of this
subsection.

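Theorem 1.2.19 (Borel–Bernstein Theorem). For a function $\varphi : \mathbb{N} \to (0, \infty)$ we set
\[ W_\varphi := \{[x_1, x_2, \ldots] \in I : x_n > \varphi(n) \text{ for infinitely many } n \in \mathbb{N}\}. \]
(a) If the series $\sum_{n=1}^{\infty} 1/\varphi(n)$ diverges, then
\[ \lambda(W_\varphi) = 1. \]
(b) If the series $\sum_{n=1}^{\infty} 1/\varphi(n)$ converges, then
\[ \lambda(W_\varphi) = 0. \]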

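Proof. To prove part (a) we show that the complement of $W_\varphi$ has measure zero. The
proof follows along the same lines as the proof of Theorem 1.2.16. As before, we obtain
that
\[ \lambda\Bigl(\bigcup_{1 \le k < \varphi(n+1)} C(x_1, \ldots, x_n, k)\Bigr) < \Bigl(1 - \frac{1}{\varphi(n + 1)}\Bigr) \lambda\bigl(C(x_1, \ldots, x_n)\bigr). \]
Therefore, on setting $B_\varphi^{(m,n)} := \{[x_1, x_2, \ldots] \in I : x_i \le \varphi(i) \text{ for all } m \le i \le n\}$, we obtain
that
\[
\begin{aligned}
\lambda\bigl(B_\varphi^{(m,n+1)}\bigr) &< \Bigl(1 - \frac{1}{\varphi(n + 1)}\Bigr) \lambda\bigl(B_\varphi^{(m,n)}\bigr) < \cdots \\
&< \prod_{k=m}^{n} \Bigl(1 - \frac{1}{\varphi(k + 1)}\Bigr) \lambda\bigl(B_\varphi^{(m,m)}\bigr).
\end{aligned}
\]
Using the fact that $1 - x < e^{-x}$ for each $0 < x < 1$, we have that
\[ \lambda\bigl(B_\varphi^{(m,n+1)}\bigr) < e^{-\sum_{k=m}^{n} 1/\varphi(k+1)}\, \lambda\bigl(B_\varphi^{(m,m)}\bigr), \]
which implies, since by assumption the series $\sum_{k=m}^{n} 1/\varphi(k + 1)$ gets arbitrarily large,
that for each $m \in \mathbb{N}$,
\[ \lim_{n \to \infty} \lambda\bigl(B_\varphi^{(m,n)}\bigr) = 0. \]
Hence, as $B_\varphi^{m} := \{[x_1, x_2, \ldots] \in I : x_i \le \varphi(i) \text{ for all } i \ge m\} \subset B_\varphi^{(m,n)}$ for all $n \in \mathbb{N}$, we finally
obtain that $\lambda(B_\varphi^{m}) = 0$. Consequently,
\[ \lambda(W_\varphi^{C}) = \lambda\Bigl(\bigcup_{m \ge 1} B_\varphi^{m}\Bigr) = 0. \]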

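It remains to prove part (b). For this, we aim to use the Borel–Cantelli Lemma. To that
end, define the sets $W_\varphi^{(n)}$, for $n \in \mathbb{N}$, by setting
\[ W_\varphi^{(n)} := \{[x_1, x_2, \ldots] \in I : x_n > \varphi(n)\}. \]
It is then clear that
\[ W_\varphi = \{x \in I : x \in W_\varphi^{(n)} \text{ for infinitely many } n \in \mathbb{N}\}. \]
So, in order to apply the Borel–Cantelli Lemma, it suffices to show that the series
$\sum_{n=1}^{\infty} \lambda\bigl(W_\varphi^{(n)}\bigr)$ converges. Indeed,
\[ \lambda\bigl(W_\varphi^{(n+1)}\bigr) = \lambda\Bigl(\bigcup_{(x_1, \ldots, x_n) \in \mathbb{N}^n} \; \bigcup_{k > \varphi(n+1)} C(x_1, \ldots, x_n, k)\Bigr) \tag{1.9} \]
and
\[
\begin{aligned}
\lambda\Bigl(\bigcup_{k > \varphi(n+1)} C(x_1, \ldots, x_n, k)\Bigr) &= \left| \frac{p_n \varphi(n + 1) + p_{n-1}}{q_n \varphi(n + 1) + q_{n-1}} - \frac{p_n}{q_n} \right| \\
&= \frac{1}{q_n^2 (1 + s_n)} \cdot \frac{1 + s_n}{\varphi(n + 1) + s_n} \\
&< \frac{2}{\varphi(n + 1)}\, \lambda(C(x_1, \ldots, x_n)).
\end{aligned}
\tag{1.10}
\]
Thus, on combining (1.9) and (1.10), we have that
\[ \lambda\bigl(W_\varphi^{(n+1)}\bigr) < \frac{2}{\varphi(n + 1)} \sum_{(x_1, \ldots, x_n) \in \mathbb{N}^n} \lambda(C(x_1, \ldots, x_n)) = \frac{2}{\varphi(n + 1)} \]
and so,
\[ \sum_{n=1}^{\infty} \lambda\bigl(W_\varphi^{(n+1)}\bigr) < 2 \sum_{n=1}^{\infty} 1/\varphi(n) < \infty. \]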

This finishes the proof.

Corollary 1.2.20. For $\lambda$-a.e. $[x_1, x_2, x_3, \ldots] \in [0, 1]$, we have that
\[ x_n > n \log n, \quad\text{for infinitely many } n \in \mathbb{N}, \]
whereas for every $\varepsilon > 0$, we have that
\[ x_n < n (\log n)^{1+\varepsilon}, \quad\text{for all sufficiently large } n \in \mathbb{N}. \]



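Proof. This follows from Theorem 1.2.19 immediately on the observation that
\[ \sum_{n=1}^{\infty} \frac{1}{n \log n} = \infty \quad\text{and}\quad \sum_{n=1}^{\infty} \frac{1}{n (\log n)^{1+\varepsilon}} < \infty, \quad\text{for all } \varepsilon > 0. \]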
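
The two series statements used in this proof can be justified by the integral test. The following is a brief sketch added for illustration; it starts the comparison at $x = 2$, an assumption made here to avoid the vanishing of $\log 1$, and uses that both integrands are positive and decreasing on $[2, \infty)$.

```latex
\[
\int_{2}^{X} \frac{dx}{x \log x} = \log\log X - \log\log 2 \;\longrightarrow\; \infty
\quad (X \to \infty),
\]
\[
\int_{2}^{\infty} \frac{dx}{x (\log x)^{1+\varepsilon}}
= \left[ -\frac{1}{\varepsilon (\log x)^{\varepsilon}} \right]_{2}^{\infty}
= \frac{1}{\varepsilon (\log 2)^{\varepsilon}} < \infty .
\]
```

Hence the first series diverges and the second converges for every $\varepsilon > 0$.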

1.2.5 Markov partitions for interval maps

In this subsection, we introduce the idea of a Markov partition. This notion will be
used repeatedly to build symbolic codings for our various examples. We will restrict
the discussion to maps defined on the closed unit interval $[0, 1]$ which are either
already continuous or become continuous when considered as functions on the circle
$\mathbb{R}/\mathbb{Z} \cong [0, 1)$ identifying the points 0 and 1, with the exception of a countable set of
points (denoted by $\mathcal{E}$) where the map can be discontinuous. We will refer to such
maps as Markov interval maps. To make this clearer, all the possibilities are covered
by considering the tent map, the doubling map (modulo 1), as in Example 1.2.4 (see
also Fig. 1.2), and the Gauss map (modulo 1) (see Fig. 1.1), for which the exceptional
set is $\mathcal{E} = \{0\}$.

Definition 1.2.21. Let $T : J \to J$ be a map as described above, that is $J := [0, 1]$ or
$\mathbb{R}/\mathbb{Z}$. Further, let $\mathcal{M} := \{A_i : i \in E \subset \mathbb{N}\}$ be a countable collection of open non-empty
sub-intervals of $J$ and define $\mathcal{E} := J \setminus \bigcup_{i \in E} \overline{A_i}$, where we suppose that the set $\mathcal{E}$ of
exceptional points is at most countable. Here, $\overline{A}$ denotes the topological closure
of the subset $A$ in $J$. Then, the collection $\mathcal{M}$ is said to be a Markov partition for $T$ if it
satisfies the following properties:
(a) $T|_{A_i} : A_i \to T(A_i)$ defines a homeomorphism for each $i \in E$.
(b) $A_i \cap A_j = \emptyset$ for all $i, j \in E$ with $i \ne j$.
(c) If $T(A_i) \cap A_j \ne \emptyset$ for some $i, j \in E$, then $A_j \subset T(A_i)$.

Fig. 1.2. The graph of two Markov interval maps, $M_1$ and $M_2$. The map $M_1$ is continuous on $[0, 1]$ whereas $M_2$ has
to be considered as a continuous function on the circle $\mathbb{R}/\mathbb{Z} \cong [0, 1)$.

Example 1.2.22. We give here the most natural Markov partition for the Gauss map,
G : R/Z → R/Z. For each i ∈ N, let A i := (1/(i + 1), 1/i), so that M is the collection of
sets M = {(1/(i + 1), 1/i) : i ∈ N}. Then the set E is equal to the singleton {0}. Notice
that M in fact coincides with the family of first level Gauss cylinder sets, up to sets of
measure zero. We must check that the three properties defining a Markov partition are
satisfied for this choice of M. Property (a) is clearly satisfied, by the definition of the
Gauss map. Property (b) is also straightforward to check. For property (c), it is enough
to notice that for all i ∈ N we have

G (A i ) = (0, 1).
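The three Markov properties for the Gauss map can be spot-checked numerically. The following is a minimal sketch in exact rational arithmetic; the helper names `gauss` and `branch_index` are hypothetical, and only finitely many sample points of the first few intervals are tested, so this illustrates the claim rather than proving it.

```python
from fractions import Fraction

def gauss(x: Fraction) -> Fraction:
    """Gauss map G(x) = 1/x - floor(1/x) for x in (0, 1], with G(0) = 0."""
    if x == 0:
        return Fraction(0)
    y = 1 / x
    return y - (y.numerator // y.denominator)

def branch_index(x: Fraction) -> int:
    """Index i with x in A_i = (1/(i+1), 1/i); assumes x is an interior point."""
    y = 1 / x
    return y.numerator // y.denominator

for i in range(1, 6):
    left, right = Fraction(1, i + 1), Fraction(1, i)
    # nine interior sample points of A_i, in increasing order
    samples = [left + (right - left) * Fraction(k, 10) for k in range(1, 10)]
    images = [gauss(x) for x in samples]
    assert all(branch_index(x) == i for x in samples)   # samples lie in A_i
    assert all(0 < y < 1 for y in images)               # G(A_i) lies in (0, 1)
    assert images == sorted(images, reverse=True)       # G is decreasing on A_i
```

Since G restricted to A_i is the continuous, strictly decreasing function x ↦ 1/x − i, the monotonicity observed on the samples reflects property (a), and the image (0, 1) contains every A_j, which is property (c).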

Let us now consider collections U of subsets of X and define some operations on these
collections.

Definition 1.2.23. Let T : X → X be some given map.


(a) The join U ∨ V of two collections U and V of X is defined to be

U ∨ V := {U ∩ V : U ∈ U , V ∈ V} .

This definition is extended to finitely many collections in the natural way.


(b) We define the preimage of a collection U under the map T to be the collection
consisting of all the preimages of the elements of U under T, that is,

T −1 U := {T −1 (U) : U ∈ U }.

The collection $T^{-n}(\mathcal{U})$ is defined inductively to be the preimage of the collection $T^{-(n-1)}(\mathcal{U})$, in other words, $T^{-n}(\mathcal{U}) := T^{-1}(T^{-(n-1)}(\mathcal{U}))$.
(c) For each n ∈ N, we define the n-th refinement $\mathcal{U}^n$ of a collection $\mathcal{U}$ to be

\[ \mathcal{U}^n := \bigvee_{k=0}^{n-1} T^{-k}(\mathcal{U}) = \mathcal{U} \vee T^{-1}(\mathcal{U}) \vee \cdots \vee T^{-(n-1)}(\mathcal{U}). \]

These operations on collections of sets can now be used to define a certain type of
Markov partition.

Definition 1.2.24. A Markov partition $\mathcal{M}$ for a dynamical system ([0, 1], T) is said to be shrinking if the diameter of the largest element in each refinement $\mathcal{M}^n$ shrinks to zero as n tends to infinity. That is, $\mathcal{M}$ is shrinking provided that

\[ \lim_{n \to \infty} \sup \{ \operatorname{diam}(M) : M \in \mathcal{M}^n \} = 0. \]

Each shrinking Markov partition gives rise to a canonical coding, in the following way.
(Here we are following [Adl98], and we refer to this paper for further discussion.)

Definition 1.2.25. Let $\mathcal{M} := \{A_i : i \in E \subseteq \mathbb{N}\}$ be a shrinking Markov partition for a dynamical system ([0, 1], T) and let $A = (A_{ij})$ be the E × E incidence matrix defined, for each i, j ∈ E, by $A_{ij} = 1$ if and only if $T(A_i) \cap A_j \neq \emptyset$. Then, where $E_A^{\mathbb{N}}$ is the sub-shift consisting of all A-admissible words, the coding associated with $\mathcal{M}$ is given by the map

\[ \pi : E_A^{\mathbb{N}} \to [0, 1] \setminus \bigcup_{n=0}^{\infty} T^{-n}(\mathcal{E}) \]

defined by

\[ \pi((\omega_1, \omega_2, \omega_3, \ldots)) := \bigcap_{n=0}^{\infty} \overline{A_{\omega_1} \cap T^{-1}A_{\omega_2} \cap \cdots \cap T^{-n}A_{\omega_{n+1}}}. \]

Note that this map is well defined, since the intersection on the right-hand side above is a singleton, due to Cantor’s Intersection Theorem (since the sequence $(\overline{A_{\omega_1} \cap T^{-1}A_{\omega_2} \cap \cdots \cap T^{-n}A_{\omega_{n+1}}})_{n \geq 0}$ is a nested sequence of compact sets with diameters shrinking to zero; see Exercise 1.6.14 if you have not encountered this useful result before). Due to the restriction that the exceptional set $\mathcal{E}$ be countable, we obtain a unique coding for all but countably many points.

Theorem 1.2.26. The map π is a factor map from the system $(E_A^{\mathbb{N}}, \sigma)$ to the system $\left([0, 1] \setminus \bigcup_{n=0}^{\infty} T^{-n}(\mathcal{E}), T\right)$.

Proof. Recall that the definition of a factor map is that π should be a continuous surjection from $E_A^{\mathbb{N}}$ to $[0, 1] \setminus \bigcup_{n=0}^{\infty} T^{-n}(\mathcal{E})$ such that π ∘ σ = T ∘ π. It is easily demonstrated that π is uniformly continuous. Indeed, for this it is enough to notice that $\operatorname{diam}(\pi([x_1, \ldots, x_n])) \leq \sup\{\operatorname{diam}(M) : M \in \mathcal{M}^n\} \to 0$, for n → ∞, and uniformly for all admissible n-cylinders.

To see that π is surjective, we argue inductively. Let $x \in [0, 1] \setminus \bigcup_{k=0}^{\infty} T^{-k}(\mathcal{E})$ be an element of $A_{\omega_1} \cap T^{-1}A_{\omega_2} \cap \cdots \cap T^{-(n-1)}A_{\omega_n}$ for some n ∈ N. Since the collection $\{M : M \in \mathcal{M}^{n+1}\}$ covers the set

\[ \left(A_{\omega_1} \cap T^{-1}A_{\omega_2} \cap \cdots \cap T^{-(n-1)}A_{\omega_n}\right) \setminus \bigcup_{k=0}^{\infty} T^{-k}(\mathcal{E}), \]

there must exist (at least) one element $A_{\omega_1} \cap T^{-1}A_{\omega_2} \cap \cdots \cap T^{-n}A_{\omega_{n+1}}$ containing x with $T(A_{\omega_n}) \cap A_{\omega_{n+1}} \neq \emptyset$. In this way we construct a point $(\omega_1, \omega_2, \omega_3, \ldots)$ lying in $E_A^{\mathbb{N}}$ and one quickly verifies that $\pi((\omega_1, \omega_2, \omega_3, \ldots))$ is equal to x.

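Finally, it remains to show that π ∘ σ = T ∘ π. So, let $\omega = (\omega_1, \omega_2, \omega_3, \ldots) \in E_A^{\mathbb{N}}$ and observe that

\[ \pi \circ \sigma(\omega) = \pi((\omega_2, \omega_3, \omega_4, \ldots)) = \bigcap_{n=1}^{\infty} \overline{A_{\omega_2} \cap T^{-1}A_{\omega_3} \cap \cdots \cap T^{-(n-1)}A_{\omega_{n+1}}}. \]

On the other hand, using the continuity of the restriction of T to $A_{\omega_1}$ we have that

\[ T \circ \pi(\omega) = T\left( \bigcap_{n=0}^{\infty} \overline{A_{\omega_1} \cap T^{-1}A_{\omega_2} \cap \cdots \cap T^{-n}A_{\omega_{n+1}}} \right) \]
\[ = \bigcap_{n=0}^{\infty} \overline{T A_{\omega_1} \cap A_{\omega_2} \cap \cdots \cap T^{-(n-1)}A_{\omega_{n+1}}} \]
\[ = \bigcap_{n=1}^{\infty} \overline{A_{\omega_2} \cap T^{-1}A_{\omega_3} \cap \cdots \cap T^{-(n-1)}A_{\omega_{n+1}}}, \]

where the final equality comes from the fact that $T(A_{\omega_1}) \supset A_{\omega_2}$, since ω belongs to the set $E_A^{\mathbb{N}}$. This finishes the proof.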

Example 1.2.27. Let us return once more to the Gauss map and to its canonical Markov
partition given in Example 1.2.22. We can easily see that the Markov partition M given
in that example is shrinking. Indeed, the refinement Mn coincides with the collection
of n-th level Gauss cylinder sets, up to sets of measure zero, and the largest element
of this collection is C(1, . . . , 1). We saw in (1.7) that

\[ \frac{1}{2 q_n^2} \leq \lambda(C(x_1, \ldots, x_n)) \leq \frac{1}{q_n^2}, \]

and since for C(1, . . . , 1) we have that $q_n$ is equal to the (n + 1)-th Fibonacci number, it is clear that as n tends to infinity, the diameter of the cylinder set with code consisting of n 1s shrinks to zero.
Since M is shrinking, we can therefore obtain a coding from it. Recall that the
set E in this case is equal to the singleton {0}. Therefore, the set of points where the
coding is undefined is here simply equal to the set of pre-periodic points of 0, that is,
the rational numbers in [0, 1]. So, for any point $\omega = (x_1, x_2, x_3, \ldots) \in \mathbb{N}^{\mathbb{N}}$, we have that

\[ \{\pi(\omega)\} = A_{x_1} \cap G^{-1}(A_{x_2}) \cap G^{-2}(A_{x_3}) \cap \cdots . \]

In other words, we have that π(ω) lies in $A_{x_1} = (1/(x_1 + 1), 1/x_1)$, G(π(ω)) lies in $A_{x_2} = (1/(x_2 + 1), 1/x_2)$, $G^2(\pi(\omega))$ lies in $A_{x_3} = (1/(x_3 + 1), 1/x_3)$ and so on. Thus, the first continued fraction element of π(ω) is equal to $x_1$, the second is equal to $x_2$, the third is equal to $x_3$ and so on. Therefore, we have shown that the coding that arises from the Markov partition M coincides with the continued fraction expansion of each irrational number.

Note that in this case the map $\pi : \mathbb{N}^{\mathbb{N}} \to I$ is actually a topological conjugacy map, since the endpoints of all the intervals in M lie in the set $G^{-1}(0)$ and hence the map is injective as well as surjective.
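The identification of the Gauss coding with continued fraction digits can be illustrated computationally. The sketch below is an assumption-laden illustration: it works with rational inputs (for which the itinerary stops once the orbit reaches the exceptional point 0), it uses the half-open convention x ∈ (1/(a+1), 1/a] at interval endpoints, and the function names are hypothetical.

```python
from fractions import Fraction

def gauss_itinerary(x: Fraction, max_len: int = 50) -> list:
    """Record the index a with x in (1/(a+1), 1/a], apply G, and repeat."""
    digits = []
    while x != 0 and len(digits) < max_len:
        y = 1 / x
        a = y.numerator // y.denominator
        digits.append(a)
        x = y - a            # x <- G(x)
    return digits

def from_digits(digits) -> Fraction:
    """Evaluate the finite continued fraction [x1, ..., xk]."""
    x = Fraction(0)
    for a in reversed(digits):
        x = 1 / (a + x)
    return x

print(gauss_itinerary(Fraction(3, 7)))   # [2, 3]
print(from_digits([1, 1, 1, 2]))         # 5/8
```

Reading off which interval each iterate visits and evaluating the finite continued fraction are inverse operations here, which mirrors the role of π for infinite codes.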

Remark 1.2.28. The ideas of this section can be extended much further, to the setting
of graph-directed Markov systems. For this, we refer the reader to the textbook by
Mauldin and Urbanski [MU03].

1.3 The Farey map: definition and topological properties

In this section, we introduce the second of our main examples, namely, the Farey
map. First we give the definition and show how the Farey map is related to the Gauss
map. We then define a Markov partition for the Farey map and describe the coding
associated with it. Finally, in the second subsection, we give some basic topological
properties of the Farey map.

1.3.1 The Farey map

Let us now define a transformation on the unit interval that is closely related to the
Gauss map.

Definition 1.3.1. The Farey map F : [0, 1] → [0, 1] is defined by

\[ F(x) := \begin{cases} x/(1 - x) & \text{for } 0 \leq x \leq 1/2; \\ (1 - x)/x & \text{for } 1/2 < x \leq 1. \end{cases} \]

The graph of the Farey map is shown in Fig. 1.3.

Fig. 1.3. The Farey map, F : [0, 1] → [0, 1]. At the fixed point 0 the map F has slope 1. The second fixed point is $\gamma^* = (\sqrt{5} - 1)/2$.

For later use, let us also give the inverse branches of F. These are the functions $F_0 : (0, 1) \to (0, 1/2)$ and $F_1 : (0, 1) \to (1/2, 1)$ which are defined by

\[ F_0(x) := \frac{x}{1 + x} \quad \text{and} \quad F_1(x) := \frac{1}{1 + x}. \]

It is easily shown that the action of the map F on a point $x = [x_1, x_2, x_3, \ldots] \in (0, 1)$ is as follows:

\[ F(x) = \begin{cases} [x_1 - 1, x_2, x_3, \ldots] & \text{if } x_1 > 1; \\ [x_2, x_3, x_4, \ldots] & \text{if } x_1 = 1. \end{cases} \]

For this reason, the Farey map is sometimes referred to as the slow continued fraction
map, whereas the Gauss map is referred to as the fast continued fraction map. We can
describe the relationship between the Farey and Gauss maps more precisely. To do
this, we introduce the idea of a jump transformation, which is also often referred to as
Schweiger’s jump transformation [Sch95].

Definition 1.3.2. Let the map ρ : (0, 1] → N ∪ {0} be defined by

\[ \rho(x) := \inf\{n \geq 0 : F^n(x) \in (1/2, 1]\}. \]

Note that ρ(x) is finite for all x ∈ (0, 1]. Then, let the map $F^* : [0, 1] \to [0, 1]$ be defined by

\[ F^*(x) := \begin{cases} F^{\rho(x)+1}(x) & \text{if } x \neq 0; \\ 0 & \text{if } x = 0. \end{cases} \]

The map $F^*$ is said to be the jump transformation on (1/2, 1] of F.

Lemma 1.3.3. The jump transformation F * of the Farey map coincides with the Gauss
map.

Proof. Fix n ≥ 2 and let $x = [n, x_2, x_3, \ldots]$, so x ∈ (1/(n + 1), 1/n]. Then, we have that

\[ F^*(x) = F^n([n, x_2, x_3, \ldots]) = F^{n-1}([n - 1, x_2, x_3, \ldots]) = \cdots = F([1, x_2, x_3, \ldots]) = [x_2, x_3, \ldots] = G(x). \]

On the other hand, if $x = [1, x_2, x_3, \ldots] \in (1/2, 1]$, then ρ(x) is equal to zero and so we have that $F^*(x) = F(x)$, which is again equal to G(x), since $G|_{(1/2,1]} = F|_{(1/2,1]}$.
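Lemma 1.3.3 lends itself to a direct computational check. The following minimal sketch, with hypothetical function names, implements F, ρ and the jump transformation exactly as defined above and compares the result with the Gauss map on rational points, where exact arithmetic is available.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def farey(x: Fraction) -> Fraction:
    """Farey map: x/(1-x) on [0, 1/2], (1-x)/x on (1/2, 1]."""
    return x / (1 - x) if x <= HALF else (1 - x) / x

def gauss(x: Fraction) -> Fraction:
    """Gauss map G(x) = 1/x - floor(1/x), with G(0) = 0."""
    if x == 0:
        return Fraction(0)
    y = 1 / x
    return y - (y.numerator // y.denominator)

def rho(x: Fraction) -> int:
    """Smallest n >= 0 with F^n(x) in (1/2, 1]; requires x in (0, 1]."""
    n = 0
    while x <= HALF:
        x = farey(x)
        n += 1
    return n

def jump(x: Fraction) -> Fraction:
    """Jump transformation F*(x) = F^(rho(x)+1)(x), with F*(0) = 0."""
    if x == 0:
        return Fraction(0)
    for _ in range(rho(x) + 1):
        x = farey(x)
    return x

# F* and G agree on every rational p/q in (0, 1] with q < 40
assert all(jump(Fraction(p, q)) == gauss(Fraction(p, q))
           for q in range(1, 40) for p in range(1, q + 1))
```

For instance, for x = 1/3 one finds ρ(x) = 2 and F³(1/3) = 0 = G(1/3), so the three slow steps of F collapse into one step of G.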

Our next aim is to describe a coding generated by the Farey map. The two open
sets {B0 := (0, 1/2), B1 := (1/2, 1)} form a Markov partition for F. In this case, the
exceptional set E is the empty set. That the three conditions for a Markov partition
are satisfied is easy to check, so we leave this as an exercise.
The above Markov partition for the Farey map yields a coding of the numbers in [0, 1] as follows. Each element $\omega = (x_1, x_2, x_3, \ldots) \in \{0, 1\}^{\mathbb{N}}$ is mapped by π to the point x ∈ [0, 1] given by

\[ \{x\} := \bigcap_{n=0}^{\infty} \overline{B_{x_1} \cap F^{-1}B_{x_2} \cap \cdots \cap F^{-n}B_{x_{n+1}}}. \]

We will use the notation $\pi(\omega) = x =: \langle x_1, x_2, x_3, \ldots \rangle \in [0, 1]$. This coding will be referred to as the Farey coding. The Farey coding is related to the continued fraction expansion of x in a straightforward way. Indeed, if $x = [x_1, x_2, x_3, \ldots]$, then the Farey coding of x is given by $x = \langle 0^{x_1 - 1}, 1, 0^{x_2 - 1}, 1, 0^{x_3 - 1}, 1, \ldots \rangle$, where $0^n$ denotes the sequence of n ∈ N consecutive appearances of the symbol 0 and $0^0$ is understood to mean the appearance of no zeros between two consecutive 1s. This follows directly from the fact that the Gauss map is the jump transformation on (1/2, 1] of the Farey map.
We have described the coding for irrational numbers; let us now consider the rational numbers. We can still define a code for these points; it will just no longer be unique. So, let $x = [x_1, \ldots, x_k]$. In this case we obtain two infinite codings, namely, we have that

\[ x = \langle 0^{x_1 - 1}, 1, 0^{x_2 - 1}, 1, \ldots, 0^{x_k - 1}, 1, 0, 0, 0, \ldots \rangle \]

and

\[ x = \langle 0^{x_1 - 1}, 1, 0^{x_2 - 1}, 1, \ldots, 0^{x_k - 2}, 1, 1, 0, 0, 0, \ldots \rangle. \]

Indeed, take for example the point 1/2. Observe that $1/2 \in \overline{B_1}$ but also $1/2 \in \overline{B_0}$, so the first entry in the Farey coding for 1/2 can be either 1 or 0. Then F(1/2) = 1 and $F^n(1/2) = 0$ for all n ≥ 2, therefore we obtain that $1/2 = \langle 1, 1, 0, 0, 0, \ldots \rangle$ and $1/2 = \langle 0, 1, 0, 0, 0, \ldots \rangle$. The coding for any rational number can be deduced from this example, since every rational number is eventually mapped to 1/2 by the map F (or, to put it another way, the set of rational numbers coincides with the iteration of the point 1/2 under the inverse branches of the Farey map).
Recall that F acts on $x = [x_1, x_2, x_3, \ldots]$ in the following way:

\[ F(x) := \begin{cases} [x_1 - 1, x_2, x_3, \ldots] & \text{for } x_1 \geq 2; \\ [x_2, x_3, x_4, \ldots] & \text{for } x_1 = 1. \end{cases} \]

In particular, this means that if we instead write x in its Farey coding, so $x = \langle y_1, y_2, \ldots \rangle$, then

\[ F(x) = \langle y_2, y_3, \ldots \rangle. \]

So, again, the system (I, F) can be thought of as acting like the shift map σ on the shift space $E^{\mathbb{N}}$, where this time the alphabet E = {0, 1} is finite. This is a potential advantage of the map F over the map G in certain situations, as the space $\{0, 1\}^{\mathbb{N}}$ is a compact metric space, whereas $\mathbb{N}^{\mathbb{N}}$ is not.
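The relation between the Farey coding and the continued fraction digits can be sketched in code. The example below is only an illustration under stated assumptions: it treats rational points, resolves the ambiguity at 1/2 by assigning the symbol 0 on [0, 1/2] and 1 on (1/2, 1] (which selects the first of the two codings of a rational number), and uses hypothetical function names.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def farey(x: Fraction) -> Fraction:
    """Farey map: x/(1-x) on [0, 1/2], (1-x)/x on (1/2, 1]."""
    return x / (1 - x) if x <= HALF else (1 - x) / x

def farey_itinerary(x: Fraction, n: int) -> list:
    """First n Farey symbols of x, with symbol 0 on [0, 1/2] and 1 on (1/2, 1]."""
    code = []
    for _ in range(n):
        code.append(0 if x <= HALF else 1)
        x = farey(x)
    return code

def code_from_cf(digits, n: int) -> list:
    """<0^(x1-1), 1, 0^(x2-1), 1, ...> padded with zeros, cut to n symbols."""
    code = []
    for a in digits:
        code.extend([0] * (a - 1) + [1])
    code.extend([0] * n)
    return code[:n]

x = Fraction(3, 7)                        # 3/7 = [2, 3]
print(farey_itinerary(x, 8))              # [0, 1, 0, 0, 1, 0, 0, 0]
print(code_from_cf([2, 3], 8))            # [0, 1, 0, 0, 1, 0, 0, 0]
# F acts as the shift on codes: the code of F(x) is the code of x with y1 dropped
print(farey_itinerary(farey(x), 7) == farey_itinerary(x, 8)[1:])   # True
```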

We are now in a position to define the cylinder sets with respect to the Farey map.

Definition 1.3.4. For each n-tuple $(x_1, \ldots, x_n) \in \{0, 1\}^n$, define the Farey cylinder set $\widehat{C}(x_1, \ldots, x_n)$ by setting

\[ \widehat{C}(x_1, \ldots, x_n) := \{\langle y_1, y_2, \ldots \rangle : y_k = x_k, \text{ for all } 1 \leq k \leq n\}. \]

Note that the Farey cylinder sets coincide with the refinements of the Markov partition $\{B_0, B_1\}$ under the inverse branches $F_0$ and $F_1$ of F. That is, the cylinder set $\widehat{C}(x_1, \ldots, x_n)$ coincides with the set $F_{x_1} \circ F_{x_2} \circ \cdots \circ F_{x_n}((0, 1))$. We will refer to these successive refinements as the Farey decomposition.
There is another way to describe these cylinder sets, in terms of the classical construction of Stern–Brocot intervals (cf. [Ste58], [Bro61]). For each n ≥ 0, the elements of the n-th member of the Stern–Brocot sequence

\[ \mathcal{B}_n := \left\{ \frac{s_{n,k}}{t_{n,k}} : k = 1, \ldots, 2^n + 1 \right\} \]

are defined recursively as follows:

– $s_{0,1} := 0$ and $s_{0,2} := t_{0,1} := t_{0,2} := 1$;
– $s_{n+1,2k-1} := s_{n,k}$ and $t_{n+1,2k-1} := t_{n,k}$, for $k = 1, \ldots, 2^n + 1$;
– $s_{n+1,2k} := s_{n,k} + s_{n,k+1}$ and $t_{n+1,2k} := t_{n,k} + t_{n,k+1}$, for $k = 1, \ldots, 2^n$.

From this arises the set $\mathcal{T}_n$ of Stern–Brocot intervals of order n, which is given by

\[ \mathcal{T}_n := \left\{ \left[ \frac{s_{n,k}}{t_{n,k}}, \frac{s_{n,k+1}}{t_{n,k+1}} \right] : k = 1, \ldots, 2^n \right\}. \]

It is straightforward to check that these intervals are precisely the same intervals as the set

\[ \{ \widehat{C}(x_1, \ldots, x_n) : x_i \in \{0, 1\},\ 1 \leq i \leq n \}. \]

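Returning to the Stern–Brocot sequence, the n-th member of this sequence consists of $2^n + 1$ proper fractions and the n-th member of the sequence can be obtained from the (n − 1)-th member by adding in the mediant of each neighbouring pair, where we remind the reader that the mediant of two rational numbers $a/b$ and $a'/b'$ is by definition the rational number $(a + a')/(b + b')$. The first few of these sequences are given by:

\[ \mathcal{B}_0 = \left\{ \tfrac{0}{1}, \tfrac{1}{1} \right\}, \quad \mathcal{B}_1 = \left\{ \tfrac{0}{1}, \tfrac{1}{2}, \tfrac{1}{1} \right\}, \quad \mathcal{B}_2 = \left\{ \tfrac{0}{1}, \tfrac{1}{3}, \tfrac{1}{2}, \tfrac{2}{3}, \tfrac{1}{1} \right\}, \]
\[ \mathcal{B}_3 = \left\{ \tfrac{0}{1}, \tfrac{1}{4}, \tfrac{1}{3}, \tfrac{2}{5}, \tfrac{1}{2}, \tfrac{3}{5}, \tfrac{2}{3}, \tfrac{3}{4}, \tfrac{1}{1} \right\}, \]
\[ \mathcal{B}_4 = \left\{ \tfrac{0}{1}, \tfrac{1}{5}, \tfrac{1}{4}, \tfrac{2}{7}, \tfrac{1}{3}, \tfrac{3}{8}, \tfrac{2}{5}, \tfrac{3}{7}, \tfrac{1}{2}, \tfrac{4}{7}, \tfrac{3}{5}, \tfrac{5}{8}, \tfrac{2}{3}, \tfrac{5}{7}, \tfrac{3}{4}, \tfrac{4}{5}, \tfrac{1}{1} \right\}, \ldots \]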
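The mediant recursion is easy to sketch in code. The following minimal example, in which the function name `stern_brocot` is hypothetical, builds the n-th member of the sequence as a list of (numerator, denominator) pairs by inserting the mediant between each neighbouring pair.

```python
def stern_brocot(n: int) -> list:
    """n-th member of the Stern-Brocot sequence as (s, t) pairs, in increasing order."""
    level = [(0, 1), (1, 1)]
    for _ in range(n):
        nxt = []
        for (s, t), (s2, t2) in zip(level, level[1:]):
            nxt.append((s, t))              # keep the old fraction
            nxt.append((s + s2, t + t2))    # insert the mediant after it
        nxt.append(level[-1])               # the last fraction 1/1 has no right neighbour
        level = nxt
    return level

print(stern_brocot(2))        # [(0, 1), (1, 3), (1, 2), (2, 3), (1, 1)]
print(len(stern_brocot(4)))   # 17
```

Neighbouring fractions s/t < s'/t' produced this way always satisfy s't − st' = 1, which is why each mediant is already in lowest terms and why the Stern–Brocot interval between them has length 1/(tt').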
Fig. 1.4. The dyadic Stern–Brocot tree with root $s_{1,2}/t_{1,2} = 1/2$; its first four generations are (1/2), (1/3, 2/3), (1/4, 2/5, 3/5, 3/4) and (1/5, 2/7, 3/8, 3/7, 4/7, 5/8, 5/7, 4/5). For each $s_{n,2k}/t_{n,2k}$, n ∈ N and $k = 1, \ldots, 2^{n-1}$, we have the two offspring $s_{n+1,4k-2}/t_{n+1,4k-2}$ and $s_{n+1,4k}/t_{n+1,4k}$. For each n, the missing elements from $\mathcal{B}_n \setminus \{0, 1\}$ are marked in grey.

This sequence also gives rise to the Stern–Brocot Tree as shown in Fig. 1.4. The vertices
of the n-th generation of the Stern–Brocot Tree will be denoted by Sn := Bn \ Bn−1 , for
n ∈ N and the sequence (Sn )n∈N is called the even Stern–Brocot sequence.

Remark 1.3.5. The Markov partition {B0 , B1 } given above is not the only reasonable
choice for the Farey map. We might instead choose to use the partition M, that is, the
partition defined in Example 1.2.22 for the Gauss map. It is not hard to verify that this is
also a shrinking Markov partition for the Farey map, and so yields a different coding for

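the points of [0, 1]. In this case, we obtain a coding map $\pi : \mathbb{N}_A^{\mathbb{N}} \to [0, 1] \setminus \bigcup_{n \in \mathbb{N}} F^{-n}(0)$, where the sub-shift $\mathbb{N}_A^{\mathbb{N}}$ is determined by the infinite transition matrix given by

\[ A_{ij} = \begin{cases} 1 & \text{if } i = 1 \text{ and } j \in \mathbb{N}; \\ 1 & \text{if } i = n \text{ and } j = n - 1, \text{ for } n \geq 2; \\ 0 & \text{otherwise.} \end{cases} \]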

This sub-shift is known as the (infinite) renewal shift. We will return to the subject of
renewal theory in Chapter 3.

1.3.2 Topological properties of the Farey map

We now come to the study of the Farey map from a topological point of view. Let us
first introduce another well-studied dynamical system, which we shall shortly show
to be topologically conjugate to the Farey map.

Definition 1.3.6. Define the map T : [0, 1] → [0, 1] by setting

\[ T(x) := \begin{cases} 2x & \text{for } x \in [0, 1/2); \\ 2 - 2x & \text{for } x \in [1/2, 1]. \end{cases} \]

The map T is referred to as the tent map; the reason for this name is clear on inspection of the graph of T, shown in Fig. 1.5.

Fig. 1.5. The tent map, T : [0, 1] → [0, 1].

It turns out, and we shall prove this shortly, that the conjugating homeomorphism
between the Farey map and the tent map coincides with Minkowski’s question-mark
function, which we shall denote by Q : [0, 1] → [0, 1]. This remarkable function
was originally introduced by Minkowski [Min10] and later investigated by Denjoy
[Den38] and Salem [Sal43], amongst others. Minkowski’s original motivation behind
the definition of the function that now bears his name was to highlight the intriguing
property of continued fractions that was described in Lagrange’s Theorem (see
Theorem 1.2.8). Recall that this theorem states that the set of irrational algebraic
numbers of degree two corresponds precisely to the set of real numbers that admit
an eventually periodic continued fraction expansion. In other words, if x ∈ [0, 1] can
be written as a continued fraction of the form $[x_1, \ldots, x_m, \overline{x_{m+1}, \ldots, x_{m+k}}]$, then x is an
irrational root of some quadratic polynomial and, moreover, the converse statement
also holds. Minkowski designed the function Q to map the quadratic surds into the
non-dyadic rationals in a continuous and order-preserving way (we leave the proof
of this to Exercise 1.6.15). The question-mark function is constructed in the following
way. First, define Q(0) = Q(0/1) := 0 and Q(1) = Q(1/1) := 1. Then, define

\[ Q\left( \frac{p + p'}{q + q'} \right) := \frac{Q(p/q) + Q(p'/q')}{2}. \]

In other words, the function Q is successively defined on all the rational numbers in the unit interval by taking mediants of those that have already been defined. The definition of Q is extended to all of [0, 1] by continuity (since any uniformly continuous function from a dense set of a metric space E into another metric space can be uniquely extended to a continuous function on all of E).

Fig. 1.6. On the left, the graphs of the functions $Q_n$, n = 1, 2, 3, and on the right, an approximation to the graph of the Minkowski question-mark function, Q : [0, 1] → [0, 1].
Another way to think about the question-mark function is as a uniform limit of the sequence of piecewise linear functions $(Q_n)_{n \in \mathbb{N}}$, where each $Q_n : [0, 1] \to [0, 1]$ is defined by mapping the n-th level Stern–Brocot fractions, arranged in increasing order, onto the set $\{p/2^n : 0 \leq p \leq 2^n\}$ and then joining these image points by straight line segments. Fig. 1.6 shows the first few of these functions and also an approximation of the graph of Q itself.
Denjoy demonstrated that the function Q is given by the following formula:

\[ Q([x_1, x_2, x_3, \ldots]) = -2 \sum_{k=1}^{\infty} (-1)^k 2^{-\sum_{i=1}^{k} x_i}. \]
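Denjoy's formula can be compared with the mediant construction on rational points, where the sum is finite. The sketch below is an illustration with hypothetical function names; it computes the continued fraction digits by the Euclidean algorithm and evaluates the (finite) alternating sum exactly.

```python
from fractions import Fraction

def cf_digits(x: Fraction) -> list:
    """Continued fraction digits [x1, ..., xk] of a rational x in [0, 1] (empty for x = 0)."""
    digits = []
    while x != 0:
        y = 1 / x
        a = y.numerator // y.denominator
        digits.append(a)
        x = y - a
    return digits

def question_mark(x: Fraction) -> Fraction:
    """Q(x) = -2 * sum_k (-1)^k 2^(-(x1 + ... + xk)), a finite sum for rational x."""
    total, s = Fraction(0), 0
    for k, a in enumerate(cf_digits(x), start=1):
        s += a
        total += (-1) ** k * Fraction(1, 2 ** s)
    return -2 * total

print(question_mark(Fraction(1, 2)))   # 1/2
print(question_mark(Fraction(1, 3)))   # 1/4
print(question_mark(Fraction(2, 5)))   # 3/8
```

The value at 2/5, the mediant of 1/3 and 1/2, is indeed the average of Q(1/3) = 1/4 and Q(1/2) = 1/2, in agreement with the recursive definition.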

Later, Salem derived the most important properties of Q from this formula, including
the facts that Q is strictly increasing and singular with respect to Lebesgue measure,
which means that the derivative of Q is equal to zero, Lebesgue-almost everywhere.
The function Q is for this reason referred to as a slippery Devil’s staircase, a term
coined by Gutzwiller and Mandelbrot in [GM88]. The (multifractal) fractal nature of
Q is investigated in [KS08b], see also [KS07].
Also, recall that the distribution function $\Delta_\mu$ of a measure μ with support in [0, 1] is defined for each x ∈ [0, 1] by

\[ \Delta_\mu(x) := \mu([0, x]). \]

Note that a distribution function is always non-decreasing and right-continuous (see Theorem 9.1.1 in Dudley [Dud89]) and in the case of the measure μ having no atoms, the function $\Delta_\mu$ is continuous.

Proposition 1.3.7. The dynamical systems ([0, 1], F) and ([0, 1], T) are topologically conjugate and the conjugating homeomorphism Q is given by

\[ Q([x_1, x_2, x_3, \ldots]) := -2 \sum_{k=1}^{\infty} (-1)^k 2^{-\sum_{i=1}^{k} x_i}. \]

That is, the conjugating homeomorphism between the Farey map and the tent map is
Minkowski’s question-mark function. Moreover, the map Q is equal to the distribution
function ∆ μ0 of the measure of maximal entropy μ0 := λ ◦ Q for the Farey map.

Remark 1.3.8. The reader unfamiliar with the concept of measures of maximal
entropy should not be unduly alarmed by this terminology. For our purposes, it is
enough to know that $\mu_0$ assigns mass $2^{-n}$ to each n-th level Farey cylinder set. For
further reading we refer to [Wal82]. Also note that since Q is a homeomorphism the
measure λ ◦ Q is well defined.

Proof of Proposition 1.3.7. We will first show that the map Q is the conjugating homeomorphism from the Farey system to the tent system. For this, suppose first that $x = [x_1, x_2, x_3, \ldots] \in [0, 1/2]$, so that $x_1 \geq 2$. Then, Q(x) is an element of [0, 1/2] and we have that

\[ T(Q(x)) = 2 \left( -2 \sum_{k=1}^{\infty} (-1)^k 2^{-\sum_{i=1}^{k} x_i} \right) = -2 \sum_{k=1}^{\infty} (-1)^k 2^{-(x_1 - 1) - \sum_{i=2}^{k} x_i} = Q([x_1 - 1, x_2, x_3, \ldots]) = Q(F(x)). \]

Now, suppose that x ∈ (1/2, 1], that is, $x = [1, x_2, x_3, \ldots]$. Then, it follows that Q(x) ∈ (1/2, 1] and we have that

\[ T(Q(x)) = 2 - 2 \left( 2 \cdot 2^{-1} - 2 \sum_{k=2}^{\infty} (-1)^k 2^{-1 - \sum_{i=2}^{k} x_i} \right) = 2 \sum_{k=2}^{\infty} (-1)^k 2^{-\sum_{i=2}^{k} x_i} = Q([x_2, x_3, \ldots]) = Q(F(x)). \]

It only remains to show that Q is equal to the distribution function of μ 0 . Indeed, for
each x ∈ [0, 1] we have

∆ μ0 (x) = μ0 ([0, x]) = λ ◦ Q([0, x]) = λ([Q(0), Q(x)]) = λ([0, Q(x)]) = Q(x).

This finishes the proof.
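The conjugacy relation T ∘ Q = Q ∘ F can be tested exactly on rational points. The following sketch is an illustration only: the function names are hypothetical, Q is evaluated through the finite form of Denjoy's formula, and a check on finitely many rationals does not replace the proof above.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def farey(x: Fraction) -> Fraction:
    """Farey map F."""
    return x / (1 - x) if x <= HALF else (1 - x) / x

def tent(x: Fraction) -> Fraction:
    """Tent map T."""
    return 2 * x if x < HALF else 2 - 2 * x

def question_mark(x: Fraction) -> Fraction:
    """Q evaluated via the finite alternating sum over continued fraction digits."""
    total, s, k = Fraction(0), 0, 0
    while x != 0:
        y = 1 / x
        a = y.numerator // y.denominator   # next continued fraction digit
        k, s = k + 1, s + a
        total += (-1) ** k * Fraction(1, 2 ** s)
        x = y - a
    return -2 * total

points = sorted({Fraction(p, q) for q in range(1, 40) for p in range(0, q + 1)})
# conjugacy: T(Q(x)) = Q(F(x)) at every test point
assert all(tent(question_mark(x)) == question_mark(farey(x)) for x in points)
# Q is strictly increasing on the test points
values = [question_mark(x) for x in points]
assert all(u < v for u, v in zip(values, values[1:]))
```

For example, x = 3/7 = [2, 3] gives Q(x) = 7/16 and T(7/16) = 7/8, while F(3/7) = 3/4 = [1, 3] and Q(3/4) = 7/8.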


In preparation for the next proposition, we recall the definition of a Hölder continuous
function.

Definition 1.3.9. A map S : (X, d) → (X, d) of a metric space (X, d) into itself is said to be Hölder continuous with exponent κ > 0 if there exists a positive constant C > 0 such that

\[ d(S(x), S(y)) \leq C \, d(x, y)^{\kappa}, \quad \text{for all } x, y \in X. \]

We will show now that Minkowski’s question-mark function, the conjugating homeomorphism between the Farey and tent systems, is Hölder continuous with exponent log 2/(2 log γ), where we recall that $\gamma := (1 + \sqrt{5})/2$ denotes the golden mean. This result was also originally proved by Salem [Sal43]. First, we give a useful lemma.

Lemma 1.3.10. For the denominator $q_n$ of the n-th convergent of $x = [x_1, x_2, x_3, \ldots]$, we have that $q_n < \gamma^{x_1 + \cdots + x_n}$, for all n ∈ N.

Proof. We will prove this by induction. Certainly, for $p_1/q_1 = 1/x_1$, the inequality $x_1 = q_1 < \gamma^{x_1}$ holds. So, fix n ∈ N and suppose that $q_k < \gamma^{x_1 + \cdots + x_k}$ for all k < n. Recall that $q_n = x_n q_{n-1} + q_{n-2}$. Therefore,

\[ q_n < x_n \gamma^{x_1 + \cdots + x_{n-1}} + \gamma^{x_1 + \cdots + x_{n-2}}. \]

Thus, it suffices to show that

\[ x_n \gamma^{x_{n-1}} + 1 \leq \gamma^{x_{n-1} + x_n}, \quad \text{or, that} \quad x_n + 1/\gamma \leq \gamma^{x_n}. \]

This last inequality becomes the equality 1 + 1/γ = γ when $x_n = 1$; it is also straightforward to check that $2 + 1/\gamma = \gamma^2$. As the function $n \mapsto \gamma^n - n$ is increasing for n ≥ 2, the proof is finished.
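The bound of Lemma 1.3.10 can be checked numerically for short digit strings. The sketch below is a floating-point illustration, not a proof; the function name `denominators` is hypothetical and the check covers only digits between 1 and 4 and strings of length 5.

```python
from itertools import product

GAMMA = (1 + 5 ** 0.5) / 2   # golden mean

def denominators(digits) -> list:
    """Convergent denominators q_1, ..., q_n via q_n = x_n q_{n-1} + q_{n-2}, q_0 = 1, q_{-1} = 0."""
    q_prev, q = 0, 1
    out = []
    for a in digits:
        q_prev, q = q, a * q + q_prev
        out.append(q)
    return out

for digits in product(range(1, 5), repeat=5):
    qs = denominators(digits)
    for n, q in enumerate(qs, start=1):
        assert q < GAMMA ** sum(digits[:n])   # q_n < gamma^(x_1 + ... + x_n)

print(denominators([1, 1, 1, 1, 1]))   # [1, 2, 3, 5, 8]
```

The all-ones string gives the Fibonacci numbers, which is the case in which the bound is closest to being attained.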

Proposition 1.3.11. The map Q is s-Hölder continuous for s = log 2/(2 log γ ), but not for
any s > log 2/(2 log γ ).

Proof. In order to calculate the Hölder exponent of Q, first note that

\[ |Q(C(x_1, x_2, \ldots, x_n))| = |Q([x_1, x_2, \ldots, x_n]) - Q([x_1, x_2, \ldots, x_n, 1])| = 2^{-s_n}, \]

where we have put, as in the proof of Proposition 1.3.7, $s_n := \sum_{i=1}^{n} x_i$. This can be seen by simply calculating the image of the endpoints of this cylinder, or by noting that every Gauss cylinder $C(x_1, x_2, \ldots, x_n)$ is an $s_n$-th level Farey cylinder.
Now, we have that

\[ \lambda(C(x_1, \ldots, x_n)) = |p_n/q_n - p'_{n+1}/q'_{n+1}|, \quad \text{where } p'_{n+1}/q'_{n+1} = [x_1, \ldots, x_n, 1]. \]

Recalling that $p_n q_{n-1} - p_{n-1} q_n = (-1)^{n+1}$, it follows, in light of Lemma 1.3.10, that

\[ \lambda(C(x_1, \ldots, x_n)) = \frac{1}{q_n q'_{n+1}} > \gamma^{-(2 s_n + 1)}. \]

Rearranging this expression, we obtain that

\[ \lambda(C(x_1, \ldots, x_n)) > \left(2^{\log\gamma/\log 2}\right)^{-(2 s_n + 1)} = 2^{-\log\gamma/\log 2} \cdot \left(2^{-s_n}\right)^{2\log\gamma/\log 2} = c \cdot |Q(C(x_1, \ldots, x_n))|^{2\log\gamma/\log 2}, \]

where $c := 2^{-\log\gamma/\log 2} = \gamma^{-1}$. In other words, after replacing c by the constant $c^{-\log 2/(2\log\gamma)} = \sqrt{2}$, which we again denote by c,

\[ |Q(C(x_1, \ldots, x_n))| \leq c \cdot \lambda(C(x_1, \ldots, x_n))^{\log 2/(2\log\gamma)}. \]

Now, let x and y be two arbitrary different irrational numbers in [0, 1]. There must be
a first time during the backwards iteration of [0, 1] under the inverse branches of F in
which a Farey cylinder set appears between the numbers x and y. Say that this cylinder
set appears in the p-th stage of the Farey decomposition. If we iterate one more time,
it is clear that there are two (p + 1)-th level Farey cylinder sets fully contained in the
interval (x, y); moreover, one of these also has to be a Gauss cylinder set. Let this Gauss cylinder set be denoted by $C(z_1, z_2, \ldots, z_k)$, where $\sum_{j=1}^{k} z_j = p + 1$. This leads to the observation that, as $C(z_1, z_2, \ldots, z_k)$ is contained in the interval (x, y), we have

\[ |x - y|^{\log 2/(2\log\gamma)} > \lambda(C(z_1, z_2, \ldots, z_k))^{\log 2/(2\log\gamma)} \geq c^{-1} \cdot |Q(C(z_1, z_2, \ldots, z_k))| = c^{-1} \cdot 2^{-(p+1)}. \]

Consider the interval (x, y) again. By construction, it is contained inside two neighbouring (p − 1)-th level Farey intervals, and so

\[ |Q(x) - Q(y)| < 2^{-(p-1)} + 2^{-(p-1)} = 2^{-(p-2)} = 8 \cdot 2^{-(p+1)}. \]

Combining these observations, we obtain that

\[ |Q(x) - Q(y)| \leq 8c \, |x - y|^{\log 2/(2\log\gamma)}. \]

Now fix s > log 2/(2 log γ) and let $x_n$, $y_n$ be the left and right boundary points of the Gauss cylinder C(1, . . . , 1) of length n ∈ N. Then there exists a constant c > 0 such that, on the one hand,

\[ |x_n - y_n| = \lambda(C(1, \ldots, 1)) = \frac{1}{f_n f_{n+1}} \leq c \, \gamma^{-2n} \]

(cf. Exercise 1.6.4). On the other hand, we have

\[ |Q(x_n) - Q(y_n)| = \lambda(Q(C(1, \ldots, 1))) \geq c^{-1} 2^{-n}. \]

Combining these two observations gives

\[ \frac{|Q(x_n) - Q(y_n)|}{|x_n - y_n|^{s}} \geq c^{-s-1} \frac{2^{-n}}{\gamma^{-2sn}} = c^{-s-1} \exp\bigl(n(-\log 2 + 2s\log\gamma)\bigr) \to \infty. \]

This finishes the proof.

1.4 Two further examples

In this section we will introduce and study the properties of our other main examples.
These are two families of dynamical systems, the α-Lüroth and α-Farey systems, which
are both indexed by partitions α of the unit interval. We shall first introduce the
class of partitions of [0, 1] we are interested in and then define the α-Lüroth map
L α . We then describe the expansion of real numbers that can be derived from this
map, again in terms of a Markov partition. Next, we introduce the family of α-Farey
maps and develop topological properties for these maps similar to those described in
Section 1.3.2 for the Farey map.

1.4.1 The α-Lüroth maps

Let us begin by letting α := {A n : n ∈ N} denote a countably infinite partition of the


unit interval [0, 1], consisting of non-empty, right-closed and left-open intervals. It
is assumed throughout that the elements of α are ordered from right to left, starting
from A1 , and that these elements accumulate only at the origin. We will denote the
collection of all such partitions of [0, 1] by $\mathcal{A}$. Further, we let $a_n$ denote the Lebesgue measure $\lambda(A_n)$ of $A_n \in \alpha$ and let $t_n := \sum_{k=n}^{\infty} a_k$ denote the Lebesgue measure of the n-th tail of α. It is clear that $t_1 = \sum_{k=1}^{\infty} a_k = 1$ for every partition $\alpha \in \mathcal{A}$.

Definition 1.4.1. For a given partition $\alpha \in \mathcal{A}$, the α-Lüroth map $L_\alpha : [0, 1] \to [0, 1]$ is given by

\[ L_\alpha(x) := \begin{cases} (t_n - x)/a_n & \text{for } x \in A_n,\ n \in \mathbb{N}; \\ 0 & \text{if } x = 0. \end{cases} \]

In other words, the map $L_\alpha$ consists of countably many linear branches that send $A_n$ onto [0, 1), for each n ∈ N.

We now define a Markov partition for the map L α . Let E := {0} and let α̊ := {B n : n ∈ N},
where B n := Int(A n ) denotes the interior of A n for each n ∈ N. It is easy to verify that the
collection of sets α̊ constitutes a Markov partition. Indeed, the properties (a) and (b) of
Definition 1.2.21 are obviously satisfied and property (c) follows from the observation
that for all n ∈ N we have L α (B n ) = (0, 1).

For later use, let us also define the inverse branches of L α . These are the countable
family of maps L α,n : (0, 1) → B n defined for each n ∈ N by

L α,n (x) := t n − a n x.

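In order to construct a coding from this Markov partition, we must first show that it is shrinking (cf. Definition 1.2.24). To do this, we calculate the Lebesgue measure of the intervals that make up the refinements $\mathring{\alpha}^n$ of $\mathring{\alpha}$. First of all, the size of each element $B_n := (t_{n+1}, t_n)$ of $\mathring{\alpha}$ is equal to $a_n$. The refinement $\mathring{\alpha}^2$ is given by

\[ \mathring{\alpha}^2 = \mathring{\alpha} \vee L_\alpha^{-1}(\mathring{\alpha}) = L_\alpha^{-1}(\mathring{\alpha}) = \bigcup_{k \in \mathbb{N}} \bigcup_{n \in \mathbb{N}} \{L_{\alpha,n}(B_k)\} = \bigcup_{k \in \mathbb{N}} \bigcup_{n \in \mathbb{N}} \{(t_n - a_n t_k,\ t_n - a_n t_{k+1})\}. \]

Thus, the size of an interval $B_{n,k} := (t_n - a_n t_k,\ t_n - a_n t_{k+1})$ in $\mathring{\alpha}^2$ is equal to $a_n a_k$, since $t_k - t_{k+1} = a_k$. Now, suppose that the endpoints of an interval $B_{\ell_1, \ldots, \ell_n}$ in the refinement $\mathring{\alpha}^n$ are given by

\[ t_{\ell_1} - a_{\ell_1} t_{\ell_2} + \cdots + (-1)^{n-1} a_{\ell_1} \cdots a_{\ell_{n-1}} t_{\ell_n} \]

and

\[ t_{\ell_1} - a_{\ell_1} t_{\ell_2} + \cdots + (-1)^{n-1} a_{\ell_1} \cdots a_{\ell_{n-1}} t_{\ell_n + 1}, \]

so that the size of $B_{\ell_1, \ldots, \ell_n}$ is equal to $a_{\ell_1} \cdots a_{\ell_n}$. To shorten this notation, let us write $[\ell_1, \ldots, \ell_n]_\alpha := t_{\ell_1} - a_{\ell_1} t_{\ell_2} + \cdots + (-1)^{n-1} a_{\ell_1} \cdots a_{\ell_{n-1}} t_{\ell_n}$. Then,

\[ \mathring{\alpha}^{n+1} = L_\alpha^{-1}(\mathring{\alpha}^n) = \bigcup_{k \in \mathbb{N}} \bigcup_{(\ell_1, \ldots, \ell_n) \in \mathbb{N}^n} \{L_{\alpha,k}(B_{\ell_1, \ldots, \ell_n})\} \]
\[ = \bigcup_{k \in \mathbb{N}} \bigcup_{(\ell_1, \ldots, \ell_n) \in \mathbb{N}^n} \{(t_k - a_k [\ell_1, \ldots, \ell_n]_\alpha,\ t_k - a_k [\ell_1, \ldots, \ell_n + 1]_\alpha)_{\pm}\} \]
\[ = \bigcup_{(\ell_1, \ldots, \ell_n, \ell_{n+1}) \in \mathbb{N}^{n+1}} \{([\ell_1, \ldots, \ell_{n+1}]_\alpha,\ [\ell_1, \ldots, \ell_{n+1} + 1]_\alpha)_{\pm}\}, \]

where we recall that the notation $(\cdot, \cdot)_{\pm}$ means that the endpoints are not necessarily in the correct order. Thus, the size of an interval $B_{\ell_1, \ldots, \ell_{n+1}}$ is equal to $a_{\ell_1} \cdots a_{\ell_{n+1}}$. From this, we can deduce that the partition $\mathring{\alpha}$ is shrinking. Since $\sum_{k=1}^{\infty} a_k = 1$, it follows that there exists (at least) one of the $a_k$ with maximum size. Call it $a_{\max}$. Then, the largest element of $\mathring{\alpha}^n$ has size $(a_{\max})^n$ and, as $0 < a_{\max} < 1$, this clearly tends to zero as n tends to infinity.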
Now we can utilise the shrinking Markov partition $\mathring{\alpha}$ to obtain a coding for all the points in the set $[0, 1] \setminus \bigcup_{n=0}^{\infty} L_\alpha^{-n}(0)$. Exactly as was the case for the Gauss map before, if x lies in this set, we find a sequence $(\ell_1, \ell_2, \ldots) \in \mathbb{N}^{\mathbb{N}}$ such that $x \in \bigcap_{n=0}^{\infty} B_{\ell_1} \cap L_\alpha^{-1} B_{\ell_2} \cap \cdots \cap L_\alpha^{-n} B_{\ell_{n+1}}$. Thus, $x \in B_{\ell_1}$ and therefore,

\[ L_\alpha(x) = (t_{\ell_1} - x)/a_{\ell_1}. \]

So, $x = t_{\ell_1} - a_{\ell_1} L_\alpha(x)$. Then, $L_\alpha(x) \in B_{\ell_2}$ and a similar calculation leads us to the identity

\[ x = t_{\ell_1} - a_{\ell_1} t_{\ell_2} + a_{\ell_1} a_{\ell_2} L_\alpha^2(x). \]

We can continue this calculation indefinitely to obtain an alternating series expansion of each $x \in [0, 1] \setminus \bigcup_{n=0}^{\infty} L_\alpha^{-n}(0)$, which is given by

\[ x = t_{\ell_1} + \sum_{n=2}^{\infty} (-1)^{n-1} \Bigl( \prod_{i<n} a_{\ell_i} \Bigr) t_{\ell_n} = t_{\ell_1} - a_{\ell_1} t_{\ell_2} + a_{\ell_1} a_{\ell_2} t_{\ell_3} - \cdots . \]

This will be called the α-Lüroth expansion of the point x. To shorten the notation, we denote these infinite series expansions by $x = [\ell_1, \ell_2, \ell_3, \ldots]_\alpha$.
Notice that the infinite α-Lüroth expansions match with the finite ones we obtained above whilst calculating the endpoints of the intervals of the refined partitions $\mathring{\alpha}^n$. The main difference is that each infinite expansion is unique (as in the case of infinite continued fractions), whereas the finite ones can be written in either of the two ways:

\[ [\ell_1, \ell_2, \ldots, \ell_k]_\alpha := t_{\ell_1} - a_{\ell_1} t_{\ell_2} + \cdots + (-1)^{k-1} a_{\ell_1} \cdots a_{\ell_{k-1}} t_{\ell_k} \]

and

\[ [\ell_1, \ell_2, \ldots, \ell_k - 1, 1]_\alpha = t_{\ell_1} - a_{\ell_1} t_{\ell_2} + \cdots + (-1)^{k} a_{\ell_1} \cdots a_{\ell_{k-1}} a_{\ell_k - 1} t_1, \]

where we assume that $\ell_k > 1$.
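The derivation of the α-Lüroth expansion can be followed step by step in code. The sketch below makes one concrete choice for illustration, namely $a_n = 1/(n(n+1))$ and $t_n = 1/n$ (an assumption at this point of the text), and uses hypothetical function names. It records the digits, the k-th partial sum and the exact remainder $(-1)^k a_{\ell_1} \cdots a_{\ell_k} L_\alpha^k(x)$.

```python
from fractions import Fraction

def a(n: int) -> Fraction:
    """Interval lengths a_n (illustrative choice of partition)."""
    return Fraction(1, n * (n + 1))

def t(n: int) -> Fraction:
    """Tails t_n = a_n + a_{n+1} + ... = 1/n for the choice above."""
    return Fraction(1, n)

def digit(x: Fraction) -> int:
    """Index n with x in A_n = (t_{n+1}, t_n]; requires 0 < x <= 1."""
    n = 1
    while x <= t(n + 1):
        n += 1
    return n

def expansion(x: Fraction, k: int):
    """Return (digits, partial sum [l_1..l_k]_alpha, signed remainder) after at most k steps."""
    digits, conv, prod, sign = [], Fraction(0), Fraction(1), 1
    for _ in range(k):
        if x == 0:                      # orbit reached 0: the expansion is finite
            break
        n = digit(x)
        digits.append(n)
        conv += sign * prod * t(n)      # add (-1)^(j-1) a_{l_1}...a_{l_{j-1}} t_{l_j}
        prod *= a(n)
        sign = -sign
        x = (t(n) - x) / a(n)           # x <- L_alpha(x)
    return digits, conv, sign * prod * x

digits, conv, rem = expansion(Fraction(2, 7), 6)
print(digits)                            # [3, 1, 1, 3, 1, 1]
print(conv + rem == Fraction(2, 7))      # True
```

In this example the orbit of 2/7 is periodic (2/7 → 4/7 → 6/7 → 2/7), so the expansion is infinite even though the point is rational; the remainder shrinks by the factor $a_{\ell_j}$ at each step.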


By analogy with continued fractions, for which a number is rational if and only
if it has a finite continued fraction expansion, we say that x ∈ [0, 1] is an α-rational
number when x has a finite α-Lüroth expansion - that is, whenever x is a pre-image of
0 under the map L α - and say that x is an α-irrational number otherwise. We denote
the set of α-rational numbers by Qα and the set of α-irrational numbers by Iα . The set
Qα is, of course, a countable dense set in [0, 1]. The reader should also notice that the
α-rationals are not necessarily equal to actual rational numbers, unless the partition
α is chosen to consist solely of intervals with rational endpoints.

Example 1.4.2.
(a) Define the harmonic partition $\alpha_H$ by setting

\[ \alpha_H := \left\{ A_n := \left( \frac{1}{n+1}, \frac{1}{n} \right] : n \geq 1 \right\}. \]

The map $L_{\alpha_H} : [0, 1] \to [0, 1]$ is then given by

\[ L_{\alpha_H}(x) := \begin{cases} -n(n+1)x + (n+1) & \text{for } x \in \left( \frac{1}{n+1}, \frac{1}{n} \right]; \\ 0 & \text{for } x = 0. \end{cases} \]

This map can be found in the literature, where it is often referred to as the alternating Lüroth map. For references and more on the historical background to this, see Section 1.5.
With respect to the map $L_{\alpha_H}$, in exactly the way outlined above using the Markov partition $\mathring{\alpha}_H$, the corresponding series expansion of some arbitrary x ∈ [0, 1] turns out to be

\[ x = \sum_{n=1}^{\infty} (-1)^{n-1} (\ell_n + 1) \prod_{k=1}^{n} \bigl(\ell_k(\ell_k + 1)\bigr)^{-1} = \frac{1}{\ell_1} - \frac{1}{\ell_1(\ell_1 + 1)\ell_2} + \frac{1}{\ell_1(\ell_1 + 1)\ell_2(\ell_2 + 1)\ell_3} - \cdots, \]

where each $\ell_n \in \mathbb{N}$ and the expansion can, as usual, be finite or infinite.

(b) Define the partition $\alpha_D := \{ (1/2^n, 1/2^{n-1}] : n \in \mathbb{N} \}$. We will refer to $\alpha_D$ as the dyadic partition. For this partition, we obtain the map $L_{\alpha_D}$, which is given by

\[ L_{\alpha_D}(x) := \begin{cases} 2 - 2^n x & \text{for } x \in (1/2^n, 1/2^{n-1}]; \\ 0 & \text{for } x = 0. \end{cases} \]

Remark 1.4.3.
1. The name “α-Lüroth” for these maps is in honour of the German mathematician
J. Lüroth, for his 1883 paper [Lür83] which develops a particular series expansion
of real numbers which is related to the expansions derived above. For more
details, we refer once more to Section 1.5.
2. Note that the α-Lüroth expansion is a particular type of generalised Lüroth series, a
concept which was introduced by Barrionuevo et al. in [BBDK96] (also see [DK02]).

Before going any further, let us describe the action of the map $L_\alpha$ on the expansions it
generates. For each $x = [\ell_1, \ell_2, \ell_3, \ldots]_\alpha$, we have, since $x \in A_{\ell_1}$, that

\[ L_\alpha(x) = (t_{\ell_1} - x)/a_{\ell_1} = \left( t_{\ell_1} - (t_{\ell_1} - a_{\ell_1} t_{\ell_2} + a_{\ell_1} a_{\ell_2} t_{\ell_3} - \cdots) \right)/a_{\ell_1} = t_{\ell_2} - a_{\ell_2} t_{\ell_3} + \cdots = [\ell_2, \ell_3, \ell_4, \ldots]_\alpha. \]

This shows that $L_\alpha$, just like the Gauss map, can be thought of as acting as the shift map
on the space $\mathbb{N}^{\mathbb{N}}$, at least for those points in [0, 1] with infinite α-Lüroth expansions.
That is, $L_\alpha : I_\alpha \to I_\alpha$ and $\sigma : \mathbb{N}^{\mathbb{N}} \to \mathbb{N}^{\mathbb{N}}$ are topologically conjugate via the conjugacy map
$h : \mathbb{N}^{\mathbb{N}} \to I_\alpha$ given by $h(\ell_1 \ell_2 \ell_3 \ldots) = [\ell_1, \ell_2, \ell_3, \ldots]_\alpha$.

For each $x = [\ell_1, \ell_2, \ell_3, \ldots]_\alpha \in [0, 1]$, just as was done for the continued fraction
expansion, if we truncate the α-Lüroth expansion of x after k entries, then we obtain
the k-th α-Lüroth convergent of x, that is, for each k ∈ N we obtain the finite α-Lüroth
expansion $r_k^{(\alpha)}(x)$, given by

\[ r_k^{(\alpha)}(x) := [\ell_1, \ldots, \ell_k]_\alpha = t_{\ell_1} - a_{\ell_1} t_{\ell_2} + \cdots + (-1)^{k-1} a_{\ell_1} \cdots a_{\ell_{k-1}} t_{\ell_k}. \]

The behaviour of these convergents is exactly like that of the continued fraction
convergents, as shown in the following proposition.

Proposition 1.4.4. Let $x = [\ell_1, \ell_2, \ell_3, \ldots]_\alpha \in I_\alpha$. Then, the sequence of α-Lüroth
convergents of x satisfies the following four properties.
(a) The sequence $\left( r_{2n}^{(\alpha)}(x) \right)_{n \geq 1}$ of even convergents is increasing.
(b) The sequence $\left( r_{2n-1}^{(\alpha)}(x) \right)_{n \geq 1}$ of odd convergents is decreasing.
(c) Every convergent of odd order is greater than every convergent of even order.
(d) $\lim_{n \to \infty} \left( r_{n+1}^{(\alpha)}(x) - r_n^{(\alpha)}(x) \right) = 0$.

Proof. The proof is very similar to that of Proposition 1.1.3 and, as such, is left as an
exercise.
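The four properties can be observed numerically. The following Python sketch is an illustration only, under stated assumptions: it fixes the harmonic partition $\alpha_H$ (so $t_n = 1/n$ and $a_n = 1/(n(n+1))$), works with exact rational arithmetic, and the helper names `luroth_digits` and `convergent` are hypothetical and not taken from the text.

```python
from fractions import Fraction

def t(n):
    """Tail t_n = 1/n of the harmonic partition, where A_n = (1/(n+1), 1/n]."""
    return Fraction(1, n)

def a(n):
    """Length a_n = t_n - t_{n+1} = 1/(n(n+1)) of the interval A_n."""
    return t(n) - t(n + 1)

def luroth_digits(x, max_len):
    """First max_len alpha_H-Lueroth digits of x in (0, 1]; stops early if the orbit hits 0."""
    digits = []
    while x > 0 and len(digits) < max_len:
        n = 1
        while t(n + 1) >= x:      # find n with t_{n+1} < x <= t_n, i.e. x in A_n
            n += 1
        digits.append(n)
        x = (t(n) - x) / a(n)     # apply L_alpha on A_n
    return digits

def convergent(digits, k):
    """k-th convergent t_{l1} - a_{l1} t_{l2} + ... + (-1)^(k-1) a_{l1}...a_{l(k-1)} t_{lk}."""
    total, prod = Fraction(0), Fraction(1)
    for j, l in enumerate(digits[:k]):
        total += (-1) ** j * prod * t(l)
        prod *= a(l)
    return total

x = Fraction(2, 5)
digits = luroth_digits(x, 6)
convergents = [convergent(digits, k) for k in range(1, 7)]
```

For x = 2/5 the orbit under $L_{\alpha_H}$ is periodic (2/5, 3/5, 4/5, 2/5, ...) and never reaches 0, so the expansion is infinite with digits 2, 1, 1, 2, 1, 1, ... The odd convergents 1/2, 5/12, 29/72 decrease and the even convergents 1/3, 19/48, 115/288 increase, with x lying between them.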

Definition 1.4.5. For each k-tuple $(\ell_1, \ldots, \ell_k)$ of positive integers, define the α-Lüroth
cylinder set $C_\alpha(\ell_1, \ldots, \ell_k)$ associated with the α-Lüroth expansion by

\[ C_\alpha(\ell_1, \ldots, \ell_k) := \{ [y_1, y_2, \ldots]_\alpha : y_i = \ell_i \text{ for } 1 \leq i \leq k \}. \]

Observe once again that these cylinder sets coincide up to sets of measure zero with
the elements of the refinements $\mathring{\alpha}^n$ of the Markov partition $\mathring{\alpha}$.

1.4.2 The α-Farey maps

Let us now introduce a second family of maps, indexed by the same collection A of
partitions of [0, 1] as were used in the definition of L α . We will soon see that these
new maps are related to the maps L α in the same way the Farey map is related to the
Gauss map.

Definition 1.4.6. For a given partition $\alpha := \{A_n : n \in \mathbb{N}\} \in A$, the α-Farey map
$F_\alpha : [0, 1] \to [0, 1]$ is defined by

\[ F_\alpha(x) := \begin{cases} (1 - x)/a_1 & \text{if } x \in A_1; \\ a_{n-1}(x - t_{n+1})/a_n + t_n & \text{if } x \in A_n, \text{ for } n \geq 2; \\ 0 & \text{if } x = 0. \end{cases} \]

Although the formula looks a bit cryptic, all that the transformation $F_\alpha$ does is to map
the set $A_1$ linearly onto the interval [0, 1) and, for each n ≥ 2, map the interval $A_n$
linearly onto the interval $A_{n-1}$. In particular, notice that $F_\alpha|_{A_1} = L_\alpha|_{A_1}$. The action of
$F_\alpha$ on each point $x = [\ell_1, \ell_2, \ldots]_\alpha \in [0, 1]$ is given by

\[ F_\alpha(x) := \begin{cases} [\ell_2, \ell_3, \ldots]_\alpha & \text{for } \ell_1 = 1; \\ [\ell_1 - 1, \ell_2, \ell_3, \ldots]_\alpha & \text{for } \ell_1 \geq 2. \end{cases} \]

Notice that the map F α acts on the α-Lüroth expansion of x in precisely the same way
as the Farey map acts on the continued fraction expansion of each point x ∈ [0, 1].

Definition 1.4.7. Let the two inverse branches of the map F α be denoted by

F α,0 : [0, 1] → [0, t2 ] and F α,1 : [0, 1] → [t2 , 1].

With the convention that $F_{\alpha,0}(0) = 0$, these two branches are given by

\[ F_{\alpha,0}(x) := \frac{a_{n+1}}{a_n}(x - t_{n+1}) + t_{n+2}, \quad \text{for } x \in A_n,\ n \geq 1 \]

and

\[ F_{\alpha,1}(x) := 1 - a_1 x, \quad \text{for } x \in [0, 1]. \]

Note that F α,0 maps the interval A n into the interval A n+1 , for each n ∈ N.

Example 1.4.8.
(a) For the harmonic partition $\alpha_H := \left\{ A_n := \left( 1/(n+1), 1/n \right] : n \in \mathbb{N} \right\}$, we obtain the
$\alpha_H$-Farey map $F_{\alpha_H}$, which is given explicitly by

\[ F_{\alpha_H}(x) := \begin{cases} 2 - 2x & \text{for } x \in (1/2, 1]; \\ \dfrac{n+1}{n-1}\, x - \dfrac{1}{n(n-1)} & \text{for } x \in (1/(n+1), 1/n],\ n \geq 2. \end{cases} \]

The graphs of the maps $L_{\alpha_H}$ and $F_{\alpha_H}$ are shown in Fig. 1.7.

(b) Consider again the dyadic partition $\alpha_D := \left\{ \left( 1/2^n, 1/2^{n-1} \right] : n \in \mathbb{N} \right\}$. In this case
the map $F_{\alpha_D}$ coincides with the tent map. To see this, it is enough to note that for
each n ∈ N we have that $a_n = 2^{-n}$ and $t_n = 2^{-(n-1)}$. The graphs of the maps $L_{\alpha_D}$ and
$F_{\alpha_D}$ are shown in Fig. 1.8.
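Both examples can be reproduced from the general definitions of $L_\alpha$ and $F_\alpha$. The sketch below is illustrative rather than definitive: it assumes a partition is described by a hypothetical tail function `t` with `t(1) == 1` and values decreasing to 0, and `make_maps` and `tent` are names introduced here only for the example.

```python
from fractions import Fraction

def make_maps(t):
    """Build (L_alpha, F_alpha) from a tail function t with t(1) = 1 (an assumed interface)."""
    def a(n):
        return t(n) - t(n + 1)        # length of A_n = (t_{n+1}, t_n]

    def index(x):
        n = 1
        while t(n + 1) >= x:          # find n with x in A_n
            n += 1
        return n

    def L(x):
        if x == 0:
            return Fraction(0)
        n = index(x)
        return (t(n) - x) / a(n)

    def F(x):
        if x == 0:
            return Fraction(0)
        n = index(x)
        if n == 1:
            return (1 - x) / a(1)
        return a(n - 1) * (x - t(n + 1)) / a(n) + t(n)

    return L, F

def tent(x):
    """The tent map T on [0, 1]."""
    return 2 * x if x <= Fraction(1, 2) else 2 - 2 * x

L_D, F_D = make_maps(lambda n: Fraction(1, 2 ** (n - 1)))   # dyadic partition
L_H, F_H = make_maps(lambda n: Fraction(1, n))              # harmonic partition
```

With these definitions, `F_D` agrees with `tent` at every dyadic rational that was tried, and `F_H(Fraction(2, 5))` evaluates to 7/10, matching the explicit formula in part (a) with n = 2.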

We now show that the relationship between the maps L α and F α is exactly the same
as the relationship between the maps G and F, that is, L α is a jump transformation of
F α . More precisely, we make the following definition.

Definition 1.4.9. Let the map $\rho_\alpha : (0, 1] \to \mathbb{N} \cup \{0\}$ be defined by setting

\[ \rho_\alpha(x) := \inf \{ n \geq 0 : F_\alpha^n(x) \in A_1 \}. \]

Notice that the map $\rho_\alpha$ is finite everywhere on (0, 1]. Then, let the map $F_\alpha^* : [0, 1] \to [0, 1]$
be defined by

\[ F_\alpha^*(x) := \begin{cases} F_\alpha^{\rho_\alpha(x)+1}(x) & \text{if } x \neq 0; \\ 0 & \text{if } x = 0. \end{cases} \]

The map $F_\alpha^*$ is said to be the jump transformation on $A_1$ of $F_\alpha$.

Fig. 1.7. The $\alpha_H$-Lüroth and $\alpha_H$-Farey map, where $t_n = 1/n$, n ∈ N.

Fig. 1.8. The $\alpha_D$-Lüroth and $\alpha_D$-Farey map, which coincides with the tent map T, where $t_n = (1/2)^{n-1}$,
n ∈ N.

We then obtain the following result. Note that the proof can be copied line by line from
the proof of the corresponding result for the Farey and Gauss maps, so we omit it here.

Lemma 1.4.10. The jump transformation $F_\alpha^*$ of $F_\alpha$ coincides with the α-Lüroth
map $L_\alpha$.

Proof. This follows in precisely the same way as Lemma 1.3.3.
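Although the proof is omitted, the statement of the lemma is easy to test on examples. The following sketch does so for the harmonic partition only (an assumption made for concreteness, with $t_n = 1/n$); the function names `rho` and `F_star` are hypothetical and simply mirror the notation $\rho_\alpha$ and $F_\alpha^*$ above.

```python
from fractions import Fraction

def t(n):
    return Fraction(1, n)             # harmonic tails t_n = 1/n

def a(n):
    return t(n) - t(n + 1)            # a_n = 1/(n(n+1))

def index(x):
    """The n with x in A_n = (t_{n+1}, t_n], for x in (0, 1]."""
    n = 1
    while t(n + 1) >= x:
        n += 1
    return n

def L(x):
    if x == 0:
        return Fraction(0)
    n = index(x)
    return (t(n) - x) / a(n)

def F(x):
    if x == 0:
        return Fraction(0)
    n = index(x)
    if n == 1:
        return (1 - x) / a(1)
    return a(n - 1) * (x - t(n + 1)) / a(n) + t(n)

def rho(x):
    """Smallest n >= 0 with F^n(x) in A_1, for x in (0, 1]."""
    n = 0
    while index(x) != 1:
        x = F(x)
        n += 1
    return n

def F_star(x):
    """Jump transformation: apply F exactly rho(x) + 1 times (and fix 0)."""
    if x == 0:
        return Fraction(0)
    for _ in range(rho(x) + 1):
        x = F(x)
    return x
```

For instance, x = 2/7 lies in $A_3$, so `rho(x)` is 2, and the orbit 2/7, 17/42, 5/7, 4/7 under F ends at `L(x) = 4/7`.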


Let us now describe how to construct a Markov partition for the α-Farey map from
the partition α and use it to obtain another coding of the points in [0, 1] from the
map $F_\alpha$. The partition given by the two open sets $\{B_0 := \mathrm{Int}(A_1), B_1 := (0, 1] \setminus A_1\}$ is
a Markov partition for $F_\alpha$. Each α-irrational number in [0, 1] has an infinite coding
$x =: \langle x_1, x_2, \ldots \rangle_\alpha$ with $(x_1, x_2, \ldots) \in \{0, 1\}^{\mathbb{N}}$ such that $x_k = 1$ if and only if $F_\alpha^{k-1}(x) \in B_1$
for each k ∈ N. This coding will be referred to as the α-Farey expansion or the α-Farey
coding. (Here we are skipping over the details, but this coding is obtained in precisely
the way that the Farey coding was obtained in Section 1.3.) The α-Farey coding is
related to the α-Lüroth coding in exactly the same way as the Farey coding is related
to the continued fraction expansion, namely, if an α-irrational number x ∈ [0, 1] has
α-Lüroth coding given by $x = [\ell_1, \ell_2, \ell_3, \ldots]_\alpha$, then the α-Farey coding of x is given
by $x := \langle 0^{\ell_1 - 1}, 1, 0^{\ell_2 - 1}, 1, 0^{\ell_3 - 1}, 1, \ldots \rangle_\alpha$, where we recall that $0^n$ denotes the sequence
of n ∈ N consecutive appearances of the symbol 0 and $0^0$ is understood to mean
the appearance of no zeros between two consecutive 1s. For each α-rational number
$x = [\ell_1, \ell_2, \ldots, \ell_k]_\alpha$, one immediately verifies that this number has an α-Farey coding
given by either

\[ x = \langle 0^{\ell_1 - 1}, 1, 0^{\ell_2 - 1}, 1, \ldots, 0^{\ell_k - 1}, 1, 0, 0, 0, \ldots \rangle_\alpha \]

or

\[ x = \langle 0^{\ell_1 - 1}, 1, 0^{\ell_2 - 1}, 1, \ldots, 0^{\ell_k - 2}, 1, 1, 0, 0, 0, \ldots \rangle_\alpha. \]
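The translation between the two codings described above is purely combinatorial, so it can be sketched in a few lines of Python. The function names below are hypothetical; the code works on finite prefixes only and does not append the trailing zeros of an α-rational point.

```python
def farey_coding(digits):
    """Prefix 0^(l1-1), 1, 0^(l2-1), 1, ... of the alpha-Farey coding for digits l1, l2, ..."""
    code = []
    for l in digits:
        code.extend([0] * (l - 1))   # l - 1 zeros ...
        code.append(1)               # ... followed by a single 1
    return code

def luroth_digits_from_coding(code):
    """Inverse direction: l_j is one plus the number of zeros preceding the j-th 1."""
    digits, zeros = [], 0
    for symbol in code:
        if symbol == 0:
            zeros += 1
        else:
            digits.append(zeros + 1)
            zeros = 0
    return digits
```

For example, the digits (3, 1, 2) give the word 0, 0, 1, 1, 0, 1. Dropping its first symbol yields the word of (2, 1, 2), in line with the way $F_\alpha$ lowers a first digit $\ell_1 \geq 2$ by one.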

Recall that $F_\alpha$ acts on $x = [\ell_1, \ell_2, \ldots]_\alpha$ in the following way:

\[ F_\alpha(x) := \begin{cases} [\ell_1 - 1, \ell_2, \ell_3, \ldots]_\alpha & \text{for } \ell_1 \geq 2; \\ [\ell_2, \ell_3, \ldots]_\alpha & \text{for } \ell_1 = 1. \end{cases} \]

In particular, this means that if we instead write x in its α-Farey coding, that is,
$x = \langle x_1, x_2, x_3, \ldots \rangle_\alpha$, then

\[ F_\alpha(x) := \langle x_2, x_3, x_4, \ldots \rangle_\alpha. \]

Therefore, the map $F_\alpha : [0, 1] \to [0, 1]$ is a factor of the shift map σ on the shift
space $\{0, 1\}^{\mathbb{N}}$, via the factor map $h : \{0, 1\}^{\mathbb{N}} \to [0, 1]$ defined by
$h((x_1, x_2, x_3, \ldots)) := \langle x_1, x_2, x_3, \ldots \rangle_\alpha$.
Let us now define the cylinder sets associated with the map F α . These once more
coincide with the refinements of the Markov partition given above for F α .
Definition 1.4.11. For each n-tuple $(x_1, \ldots, x_n) \in \{0, 1\}^n$, define the α-Farey
cylinder set $\widehat{C}_\alpha(x_1, \ldots, x_n)$ by setting

\[ \widehat{C}_\alpha(x_1, \ldots, x_n) := \{ \langle y_1, y_2, \ldots \rangle_\alpha : y_k = x_k, \text{ for } 1 \leq k \leq n \}. \]

By analogy with the Farey decomposition described after Definition 1.3.4, we will
call the set of cylinder sets $\{ \widehat{C}_\alpha(x_1, \ldots, x_n) : (x_1, \ldots, x_n) \in \{0, 1\}^n \}$ the n-th level
α-Farey decomposition. Observe that we have the relation
$\widehat{C}_\alpha(x_1, \ldots, x_n) = F_{\alpha, x_1} \circ \cdots \circ F_{\alpha, x_n}([0, 1])$.

Notice that every α-Lüroth cylinder set is also an α-Farey cylinder set, whereas the
converse of this statement is not true. The precise description of the correspondence
is that any α-Farey cylinder set which has the form $\widehat{C}_\alpha(0^{\ell_1 - 1}, 1, \ldots, 0^{\ell_k - 1}, 1)$ coincides
with the α-Lüroth cylinder set $C_\alpha(\ell_1, \ldots, \ell_k)$, but if an α-Farey cylinder set is defined by
a finite word ending in the symbol 0, then it cannot be translated to a single α-Lüroth
cylinder set. However, we do have the relation

\[ \widehat{C}_\alpha(0^{\ell_1 - 1}, 1, 0^{\ell_2 - 1}, 1, \ldots, 0^{\ell_k - 1}, 1, 0^m) = \bigcup_{n \geq m+1} C_\alpha(\ell_1, \ell_2, \ldots, \ell_k, n). \]

It therefore follows that for the Lebesgue measure of this interval we have that

\[ \lambda\left( \widehat{C}_\alpha(0^{\ell_1 - 1}, 1, 0^{\ell_2 - 1}, 1, \ldots, 0^{\ell_k - 1}, 1, 0^m) \right) = \sum_{n \geq m+1} \lambda\left( C_\alpha(\ell_1, \ell_2, \ldots, \ell_k, n) \right) = a_{\ell_1} a_{\ell_2} \cdots a_{\ell_k} t_{m+1}. \]

In addition, we can identify the endpoints of each α-Farey cylinder set. If we consider
the set $\widehat{C}_\alpha(0^{\ell_1 - 1}, 1, \ldots, 0^{\ell_k - 1}, 1)$, then we already know the endpoints of this interval
(since it is also equal to an α-Lüroth cylinder set). On the other hand, the endpoints
of the set $\widehat{C}_\alpha(0^{\ell_1 - 1}, 1, 0^{\ell_2 - 1}, 1, \ldots, 0^{\ell_k - 1}, 1, 0^m)$ are given by $[\ell_1, \ldots, \ell_k, m + 1]_\alpha$ and
$[\ell_1, \ldots, \ell_k]_\alpha$.

1.4.3 Topological properties of F α

Let us now consider the topological properties of the maps F α . Perhaps by now
the reader will not be surprised to learn that they are essentially the same as the
topological properties of the Farey map F. Again, the proofs can be closely modelled
after the proofs of the corresponding results for the Farey map and so we leave many
of them as exercises.
Before stating the first proposition, we remind the reader that the measure of
maximal entropy $\mu_\alpha$ for the system $F_\alpha$ is the measure that assigns mass $2^{-n}$ to each
n-th level α-Farey cylinder set, for each n ∈ N.

Proposition 1.4.12. The dynamical systems $([0, 1], F_\alpha)$ and $([0, 1], T)$ are
topologically conjugate and the conjugating homeomorphism is given, for each
$x = [\ell_1, \ell_2, \ell_3, \ldots]_\alpha$, by

\[ \theta_\alpha(x) := -2 \sum_{k=1}^{\infty} (-1)^k\, 2^{-\sum_{i=1}^{k} \ell_i}. \]

Moreover, the map $\theta_\alpha$ is equal to the distribution function of the measure of maximal
entropy $\mu_\alpha$ for the α-Farey map.

Proof. See Exercise 1.6.17.
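The conjugacy relation $\theta_\alpha \circ F_\alpha = T \circ \theta_\alpha$ can be checked on finite expansions directly from the formula, since $\theta_\alpha$ depends on x only through its α-Lüroth digits. The sketch below is a sanity check under that observation, not a proof; `theta`, `farey_on_digits` and `tent` are hypothetical helper names, and only finite digit lists are handled.

```python
from fractions import Fraction

def theta(digits):
    """theta_alpha at the point with finite alpha-Lueroth expansion [l_1, ..., l_k]."""
    total, exponent = Fraction(0), 0
    for k, l in enumerate(digits, start=1):
        exponent += l                                  # l_1 + ... + l_k
        total += (-1) ** k * Fraction(1, 2 ** exponent)
    return -2 * total

def farey_on_digits(digits):
    """Action of F_alpha on the digits: drop a leading 1, otherwise lower l_1 by one."""
    if digits[0] == 1:
        return digits[1:]
    return [digits[0] - 1] + digits[1:]

def tent(y):
    """The tent map T on [0, 1]."""
    return 2 * y if y <= Fraction(1, 2) else 2 - 2 * y
```

For example, `theta([2, 3])` is 7/16 and `theta(farey_on_digits([2, 3]))` is 7/8, which equals `tent(Fraction(7, 16))`. The two finite expansions of the same point give the same value, for instance `theta([2]) == theta([1, 1])`.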


Notice that the only difference between the map θ α , for a given partition α, and
Minkowski’s question-mark function Q is that the summands in the power of 2 in the
latter are the elements of the continued fraction expansion of the point x, whereas in
the formula for the function θ α the elements of the α-Lüroth expansion turn up. Each
function θ α is continuous and strictly increasing. It can also be shown that each of
the functions θ α (with the obvious, trivial exception of the function θ α D , which simply
maps the tent map to the tent map) is singular with respect to the Lebesgue measure
(for more on these functions, see [Mun14]).
Our next aim is to determine the Hölder exponent and the sub-Hölder exponent
of the map θ α , for an arbitrary partition α. Recall that the definition of a Hölder
continuous function was given in Definition 1.3.9. In a similar vein, we say that a map
S : X → X of a metric space (X, d) into itself is sub-Hölder continuous with exponent
κ > 0 if there exists a positive constant C such that

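\[ d(S(x), S(y)) \geq C\, d(x, y)^{\kappa}, \quad \text{for all } x, y \in X. \]

In order to determine the Hölder and sub-Hölder exponents of $\theta_\alpha$, let us first define
$\kappa(n) := -n \log 2/(\log a_n)$ and set

\[ \kappa_+ := \inf \{ \kappa(n) : n \in \mathbb{N} \} \quad \text{and} \quad \kappa_- := \sup \{ \kappa(n) : n \in \mathbb{N} \}. \]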

Proposition 1.4.13. We have that the map θ α is κ+ -Hölder continuous and, provided that
κ− is finite, κ− -sub-Hölder continuous.

Proof. In order to calculate the Hölder exponent of $\theta_\alpha$, first note that

\[ |\theta_\alpha(C_\alpha(\ell_1, \ell_2, \ldots, \ell_k))| = |\theta_\alpha([\ell_1, \ell_2, \ldots, \ell_k]_\alpha) - \theta_\alpha([\ell_1, \ell_2, \ldots, \ell_k + 1]_\alpha)| = 2^{-\sum_{j=1}^{k} \ell_j}. \]

This can be seen by simply calculating the image of the endpoints of this cylinder,
or by noting that every α-Lüroth cylinder set $C_\alpha(\ell_1, \ell_2, \ldots, \ell_k)$ is an n-th level α-Farey
cylinder set, where $n = \sum_{j=1}^{k} \ell_j$.

Suppose first that $\kappa_+$ is non-zero. In this case we have that

\[ \lambda(C_\alpha(\ell_1, \ell_2, \ldots, \ell_k)) = \prod_{i=1}^{k} a_{\ell_i} = \prod_{i=1}^{k} 2^{-\ell_i/\kappa(\ell_i)} \geq \left( \prod_{i=1}^{k} 2^{-\ell_i} \right)^{1/\kappa_+} = \left( 2^{-\sum_{i=1}^{k} \ell_i} \right)^{1/\kappa_+} = |\theta_\alpha(C_\alpha(\ell_1, \ell_2, \ldots, \ell_k))|^{1/\kappa_+}. \]

Or, in other words,

\[ |\theta_\alpha(C_\alpha(\ell_1, \ell_2, \ldots, \ell_k))| \leq \lambda(C_\alpha(\ell_1, \ell_2, \ldots, \ell_k))^{\kappa_+}. \]

From this point, the proof that θ α is κ+ -Hölder continuous is completed in precisely
the same way as the proof that Q is log 2/(2 log γ )-Hölder continuous and we leave the
details to the reader.
Suppose now that $\kappa_+$ is equal to zero. Then, we have that for each q ∈ N there
exists $m_0 \in \mathbb{N}$ with the property that for every $m \geq m_0$,

\[ \kappa(m) = \frac{m \log 2}{-\log a_m} < \frac{1}{q}, \quad \text{or, equivalently,} \quad a_m < 2^{-qm}. \]

So we have that the lengths of the partition elements are eventually exponentially
decaying, and hence, the Hölder exponent of the map $\theta_\alpha$ is necessarily equal to zero.
It remains to show that the map $\theta_\alpha$ is $\kappa_-$-sub-Hölder continuous. Suppose that
$\kappa_-$ is finite (otherwise the definition of sub-Hölder continuity makes no real sense).
Similarly to the $\kappa_+$ case, we obtain the inequality

\[ |\theta_\alpha(C_\alpha(\ell_1, \ldots, \ell_k))|^{1/\kappa_-} \geq \lambda(C_\alpha(\ell_1, \ldots, \ell_k)). \]

The proof can be completed analogously to the proof of the Hölder continuity of
$\theta_\alpha$ from this point on and the details are left as an exercise for the reader (see
Exercise 1.6.19).

Example 1.4.14. For the conjugacy map $\theta_{\alpha_H}$ between the map $F_{\alpha_H}$, arising from the
harmonic partition, and the tent map T, we have that $\theta_{\alpha_H}$ is (log 4/ log 6)-Hölder
continuous. To show this, first observe that

\[ \kappa(1) = \frac{-\log 2}{\log 1/2} = 1 > \frac{2 \log 2}{\log 6} = \kappa(2) \quad \text{and} \quad \frac{2 \log 2}{\log 6} < \frac{3 \log 2}{\log 12} = \kappa(3). \]

Then, since $6^n > (n^2 + n)^2$ for n ≥ 3, we have that for all n ≥ 3,

\[ \kappa(n) = \frac{n \log 2}{\log(n(n+1))} > \frac{2 \log 2}{\log 6} = \kappa(2). \]

Concerning the existence of a constant κ for which the map $\theta_{\alpha_H}$ is κ-sub-Hölder
continuous, recall that this means we have to find κ > 0 and a constant C > 0 such that
for all x, y ∈ [0, 1],

\[ |\theta_{\alpha_H}(x) - \theta_{\alpha_H}(y)| \geq C\, |x - y|^{\kappa}. \]

In particular, this inequality has to be satisfied for x = 0 and y given by $[n]_{\alpha_H} = 1/n$
successively, for each n ∈ N. But here we have that $|\theta_{\alpha_H}(0) - \theta_{\alpha_H}(1/n)| = 2^{-(n-1)}$ and
|0 − 1/n| = 1/n, which implies that there can be no such κ.
Notice that, in line with the fact that there is no sub-Hölder continuity in this case,
we also have that $\kappa_-$ is infinite.
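The values of κ(n) for the harmonic partition are easy to tabulate. The short sketch below is a floating-point illustration only, restricted to an arbitrarily chosen finite range of n; the name `kappa_harmonic` is hypothetical.

```python
import math

def kappa_harmonic(n):
    """kappa(n) = -n log 2 / log a_n for the harmonic partition, a_n = 1/(n(n+1))."""
    return n * math.log(2) / math.log(n * (n + 1))

# Tabulate kappa(n) on a finite range (the cut-off 200 is an arbitrary choice).
values = {n: kappa_harmonic(n) for n in range(1, 200)}
kappa_plus = min(values.values())     # attained at n = 2 on this range
```

On this range the minimum is attained at n = 2, where κ(2) = log 4/ log 6, while the values keep growing with n, consistent with $\kappa_-$ being infinite.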

Remark 1.4.15. The reasoning given above for why the map θ α H fails to be sub-Hölder
continuous also works for the map Q (the conjugacy map between the Farey and
tent maps). In other words, there is no positive constant κ such that the map Q is
κ-sub-Hölder continuous.

1.4.4 Expanding and expansive partitions

Let us now introduce some particular classes of partitions that will be useful in the
chapters that follow, particularly in Chapter 3. Before beginning this task, we first
recall the definition of a slowly varying function.

Definition 1.4.16. A measurable function $\psi : \mathbb{R}^+ \to \mathbb{R}^+$ is said to be slowly varying if

\[ \lim_{x \to \infty} \frac{\psi(xy)}{\psi(x)} = 1, \quad \text{for all } y > 0. \]

In the following proposition, we list some of the useful properties that slowly varying
functions satisfy. From this list, it should be clear that the idea behind a slowly varying
function is that it behaves like a logarithmic function.

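Proposition 1.4.17. Let $\psi, \varphi : \mathbb{R}^+ \to \mathbb{R}^+$ be two slowly varying functions. Then the
following three statements hold:
(a) For any ε > 0, we have that
\[ \lim_{x \to \infty} x^{\varepsilon} \cdot \psi(x) = \infty \quad \text{and} \quad \lim_{x \to \infty} x^{-\varepsilon} \cdot \psi(x) = 0. \]
(b) $\lim_{x \to \infty} \dfrac{\log(\psi(x))}{\log(x)} = 0$.
(c) For any −∞ < a < ∞, the functions $\psi^a$, $\psi \cdot \varphi$ and $\psi + \varphi$ are all slowly varying.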

Proof. See the book by Seneta [Sen76].


Definition 1.4.18. Let $\alpha := \{A_n : n \in \mathbb{N}\} \in A$. Then:
(a) The partition α is said to be expanding provided that
\[ \lim_{n \to \infty} \frac{t_n}{t_{n+1}} = \rho, \quad \text{for some } \rho > 1. \]
(b) The partition α is said to be expansive of exponent θ ≥ 0 if the tails of the partition
satisfy the power law
\[ t_n = \psi(n) \cdot n^{-\theta}, \]
where $\psi : \mathbb{R}^+ \to \mathbb{R}^+$ is a slowly varying function.
(c) A partition α is said to be of finite type if for the sequence of tails $t_n$ of α, we have
that $\sum_{n=1}^{\infty} t_n$ converges. Otherwise, α is said to be of infinite type.

Notice that if α is expanding, one immediately verifies that α is of finite type. This
can be seen, for instance, by applying the ratio test for series convergence. The next
proposition describes the situation for expansive partitions.

Proposition 1.4.19. Suppose that α is expansive of exponent θ ≥ 0. Then we have the
following classification:
(a) If θ ∈ [0, 1), then α is of infinite type.
(b) If θ > 1, then α is of finite type.
(c) If θ = 1, then α can be either of finite or infinite type.

Proof. Suppose first that α is expansive of exponent θ ∈ [0, 1). Then, by Proposition 1.4.17,
for all ε > 0 there exists $n_0 \in \mathbb{N}$ such that if $n \geq n_0$, then we have that
$\psi(n) \geq n^{-\varepsilon}$. Let ε > 0 be sufficiently small such that θ + ε ∈ (0, 1). Then,

\[ \sum_{n=1}^{\infty} t_n = \sum_{n=1}^{n_0 - 1} \psi(n) \cdot n^{-\theta} + \sum_{n=n_0}^{\infty} \psi(n) \cdot n^{-\theta} \geq \sum_{n=n_0}^{\infty} n^{-(\theta + \varepsilon)} \geq \sum_{n=n_0}^{\infty} n^{-1}. \]

Consequently, $\sum_{n=1}^{\infty} t_n = \infty$ and α is of infinite type. Now suppose that α is expansive of
exponent θ > 1. Then, again by Proposition 1.4.17, for all ε > 0 there exists $n_0 \in \mathbb{N}$ such
that if $n \geq n_0$, then we have that $\psi(n) \leq n^{\varepsilon}$. For ε > 0 small enough such that θ − ε > 1,
we then have that

\[ \sum_{n=n_0}^{\infty} t_n = \sum_{n=n_0}^{\infty} \psi(n) \cdot n^{-\theta} \leq \sum_{n=n_0}^{\infty} n^{-(\theta - \varepsilon)} < \infty. \]

Therefore, in this case, the partition α is of finite type. It only remains to prove the
third assertion, which can be done by considering the following two examples. First,
let $t_1 := 1$ and for each n ≥ 2, let $t_n := (n \log n)^{-1}$. The partition α defined in such a way
is clearly expansive of exponent 1. For this partition, we have that

\[ \sum_{n=1}^{\infty} t_n = 1 + \sum_{n=2}^{\infty} \frac{1}{n \log n}, \]

which diverges. So, in this first case, the partition is of infinite type. On the other hand,
if now we define a partition by setting $t_1 := 1$ and $t_n := n^{-1} \cdot (\log n)^{-2}$ for n ≥ 2, we obtain
that

\[ \sum_{n=1}^{\infty} t_n = 1 + \sum_{n=2}^{\infty} \frac{1}{n (\log n)^2}, \]

which is a convergent series, so in this case we have that the partition is of finite type.
This finishes the proof.
Fig. 1.9 illustrates two α-Farey maps with α expansive. The graph on the left-hand
side has α with exponent θ = 2, so satisfies the condition of the second part of
Proposition 1.4.19. The graph on the right-hand side has α with exponent θ = 1/2, so
it satisfies the condition given in the first part of Proposition 1.4.19.
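The difference between the two partitions of Fig. 1.9 is already visible in the partial sums of their tails. The sketch below is a numerical illustration only: it takes the slowly varying factor ψ to be identically 1 (an assumption matching $t_n = n^{-2}$ and $t_n = n^{-1/2}$), and `partial_sum` is a hypothetical helper name.

```python
import math

def partial_sum(theta, N):
    """Sum of the tails t_n = n^(-theta) for 1 <= n <= N, taking psi identically 1."""
    return sum(n ** (-theta) for n in range(1, N + 1))

finite_type = partial_sum(2, 10_000)      # stays below pi^2 / 6
infinite_type = partial_sum(0.5, 10_000)  # grows like 2 * sqrt(N)
```

For θ = 2 the partial sums remain bounded by $\pi^2/6$, whereas for θ = 1/2 they exceed $2\sqrt{N} - 2$ and so grow without bound, in agreement with parts (b) and (a) of Proposition 1.4.19.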

1.4.5 Metrical Diophantine-like results for the α-Lüroth expansion

In this section, we consider some easily-obtained results for the α-Lüroth expansion
which are analogous to the metrical Diophantine results already given above
for the continued fraction expansion. We will first consider the equivalent of the
badly-approximable numbers.

Fig. 1.9. The graphs of two α-Farey maps with α expansive. The partition on the left is of finite type
with α given by $t_n := 1/n^2$, n ∈ N and the partition on the right is of infinite type with α given by
$t_n := 1/\sqrt{n}$, n ∈ N.

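Definition 1.4.20. For each N ∈ N, let the set $B_{\alpha,N}$ be defined by

\[ B_{\alpha,N} := \{ x = [\ell_1, \ell_2, \ldots]_\alpha \in I_\alpha : \ell_n \leq N \text{ for all sufficiently large } n \in \mathbb{N} \} \]

and set

\[ B_\alpha := \bigcup_{N \in \mathbb{N}} B_{\alpha,N}. \]

The set $B_\alpha$ will be referred to as the set of badly α-approximable numbers.

Lemma 1.4.21.

\[ \lambda(B_\alpha) = 0. \]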

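Proof. First notice that we can write the set of badly α-approximable numbers in the
following way:

\[ B_\alpha = \bigcup_{N \in \mathbb{N}} A_{\alpha,N}, \]

where

\[ A_{\alpha,N} := \{ x = [\ell_1, \ell_2, \ldots]_\alpha \in I_\alpha : \ell_k \leq N \text{ for all } k \in \mathbb{N} \}. \]

Also, for each N, n ∈ N, define

\[ A_{\alpha,N}^{(n)} := \{ x = [\ell_1, \ell_2, \ldots]_\alpha \in I_\alpha : \ell_k \leq N \text{ for all } 1 \leq k \leq n \}. \]

It is clear that $A_{\alpha,N} \subseteq A_{\alpha,N}^{(n)}$ and further that $A_{\alpha,N}^{(n+1)} \subset A_{\alpha,N}^{(n)}$ for all N, n ∈ N. Notice that
we may also write $A_{\alpha,N}^{(n+1)}$ in the following way:

\[ A_{\alpha,N}^{(n+1)} = \bigcup_{\substack{\ell_1, \ldots, \ell_{n+1} \\ \ell_i \leq N,\ 1 \leq i \leq n+1}} C_\alpha(\ell_1, \ldots, \ell_{n+1}) = \bigcup_{\substack{\ell_1, \ldots, \ell_n \\ \ell_i \leq N,\ 1 \leq i \leq n}} \ \bigcup_{k \leq N} C_\alpha(\ell_1, \ldots, \ell_n, k). \]

Thus, for all n ∈ N, we have that

\[ \lambda\left( A_{\alpha,N}^{(n+1)} \right) = \left( \sum_{k=1}^{N} a_k \right) \lambda\left( A_{\alpha,N}^{(n)} \right). \]

Hence, on applying this argument n − 1 more times, it follows that

\[ \lambda\left( A_{\alpha,N}^{(n+1)} \right) = \left( \sum_{k=1}^{N} a_k \right)^{n} \lambda\left( A_{\alpha,N}^{(1)} \right). \]

Since the last term above is simply a constant and since $0 < \sum_{k=1}^{N} a_k < 1$, this shows
that $\lambda(A_{\alpha,N}) = 0$, for any N ∈ N. Finally, we have that

\[ \lambda\left( \bigcup_{N=1}^{\infty} A_{\alpha,N} \right) \leq \sum_{N=1}^{\infty} \lambda(A_{\alpha,N}) = 0. \]

This finishes the proof of the lemma.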
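The recursion in the proof can be cross-checked by brute force for small parameters. The sketch below assumes the harmonic partition, for which $\sum_{k=1}^{N} a_k = N/(N+1)$, and uses the fact from Section 1.4.3 that the cylinder $C_\alpha(\ell_1, \ldots, \ell_n)$ has Lebesgue measure $a_{\ell_1} \cdots a_{\ell_n}$; both function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def a(k):
    """Interval length a_k = 1/(k(k+1)) for the harmonic partition."""
    return Fraction(1, k * (k + 1))

def measure_by_cylinders(N, n):
    """Add up a_{l1} * ... * a_{ln} over all digit tuples with every l_i <= N."""
    total = Fraction(0)
    for digits in product(range(1, N + 1), repeat=n):
        length = Fraction(1)
        for l in digits:
            length *= a(l)
        total += length
    return total

def measure_by_formula(N, n):
    """Closed form (a_1 + ... + a_N)^n obtained by iterating the recursion in the proof."""
    return sum(a(k) for k in range(1, N + 1)) ** n
```

For N = 3 and n = 4 both functions return $(3/4)^4 = 81/256$, and the closed form makes the geometric decay to zero as n grows explicit.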

Corollary 1.4.22. Let $W_\alpha$ be the set defined by

\[ W_\alpha := \{ x = [\ell_1, \ell_2, \ldots]_\alpha \in I_\alpha : \limsup_{n \to \infty} \ell_n = \infty \}. \]

Then, $W_\alpha$ is of full Lebesgue measure.

Proof. Notice that the complement of $W_\alpha$ is the set of all those α-irrational numbers
with bounded α-Lüroth elements. The corollary then follows directly from
Lemma 1.4.21.
Although the sets Aα,N defined in the proof of Lemma 1.4.21 have Lebesgue measure
zero for every N ∈ N, we can still distinguish between their sizes by calculating their
Hausdorff dimension. Luckily, this is very easy to do, as the next lemma demonstrates.
The reader unfamiliar with the Hausdorff dimension of a set can either refer, for
instance, to the book by Falconer [Fal14], or can safely ignore the next two results,
as they will not be needed for anything that follows.

Lemma 1.4.23. For each N ∈ N, we have

\[ \dim_H\left( A_{\alpha,N} \right) = s, \quad \text{where } s \text{ is given by} \quad \sum_{i=1}^{N} a_i^s = 1. \]

Proof. All that is required to prove this statement is to notice that for each N ∈ N the
set $A_{\alpha,N}$ is an invariant set for a finite iterated function system $\{L_{\alpha,1}, \ldots, L_{\alpha,N}\}$, where
$L_{\alpha,n}$ denotes the n-th inverse branch of the map $L_\alpha$. Recall that these are given by
$L_{\alpha,n}(x) := t_n - a_n x$. That $A_{\alpha,N}$ is an invariant set for this system means that

\[ A_{\alpha,N} = \bigcup_{i=1}^{N} L_{\alpha,i}\left( A_{\alpha,N} \right). \]

Then, since these inverse branches are contracting similarities, that is, they satisfy
the equality $|L_{\alpha,i}(x) - L_{\alpha,i}(y)| = a_i |x - y|$ for all x, y ∈ [0, 1], we have that the dimension
of $A_{\alpha,N}$ can be deduced directly from an application of Hutchinson’s Formula (see
[Fal14], Theorem 9.3).
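The equation $\sum_{i=1}^{N} a_i^s = 1$ rarely has a closed-form solution, but since its left-hand side is strictly decreasing in s it can be solved numerically. The sketch below uses plain bisection on [0, 1] in floating point, for the harmonic partition only; `hutchinson_dimension` and `dim_A` are hypothetical names and the tolerance is an arbitrary choice.

```python
def hutchinson_dimension(ratios, tol=1e-12):
    """Solve sum(r**s for r in ratios) = 1 for s in [0, 1] by bisection."""
    def f(s):
        return sum(r ** s for r in ratios) - 1.0

    lo, hi = 0.0, 1.0                 # f(0) >= 0 and f(1) < 0 when the ratios sum to less than 1
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid                  # the root lies to the right of mid
        else:
            hi = mid                  # the root lies at or to the left of mid
    return (lo + hi) / 2

def dim_A(N):
    """Approximate dim_H(A_{alpha_H, N}) for the harmonic partition, a_k = 1/(k(k+1))."""
    return hutchinson_dimension([1.0 / (k * (k + 1)) for k in range(1, N + 1)])
```

For N = 2 this solves $(1/2)^s + (1/6)^s = 1$ and gives a value close to 0.6. The computed dimensions increase with N and stay below 1.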
This observation can be used to calculate the Hausdorff dimension of the set Bα , as
follows.
Proposition 1.4.24. Let α be an arbitrary partition of [0, 1]. Then,

\[ \dim_H(B_\alpha) = 1. \]

Proof. Since $B_\alpha := \bigcup_{N \in \mathbb{N}} A_{\alpha,N}$, we have that

\[ \dim_H(B_\alpha) = \sup \left\{ \dim_H(A_{\alpha,N}) : N \in \mathbb{N} \right\}. \]

Then, by Lemma 1.4.23, $\dim_H\left( A_{\alpha,N} \right) = s$, where s is given by $\sum_{i=1}^{N} a_i^s = 1$ and
$\dim_H\left( A_{\alpha,N+1} \right) = t$, where t is given by $\sum_{i=1}^{N+1} a_i^t = 1$. Therefore, $a_1^t + \cdots + a_N^t < 1$ and
so s < t. In other words,

\[ \dim_H\left( A_{\alpha,N} \right) < \dim_H\left( A_{\alpha,N+1} \right). \]

Furthermore, as $\sum_{i=1}^{\infty} a_i = 1$, it follows that $\dim_H(B_\alpha) = 1$.

Note that similarly, the Hausdorff dimension of the set of badly approximable numbers
(for the continued fraction expansion) is also known to be equal to 1. However, the
proof is much more involved, so we simply refer to [Jar29]. Let us now consider the
result analogous to Theorem 1.2.19.

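Theorem 1.4.25.
(a) Let φ : N → N be a function such that the series $\sum_{n=1}^{\infty} t_{\varphi(n)}$ diverges. Then, for the set
$B_{\alpha,\varphi}$ defined by

\[ B_{\alpha,\varphi} := \{ x = [\ell_1, \ell_2, \ldots]_\alpha \in I_\alpha : \ell_k < \varphi(k) \text{ for all } k \in \mathbb{N} \}, \]

we have that

\[ \lambda(B_{\alpha,\varphi}) = 0. \]

(b) Let φ : N → N be a function such that the series $\sum_{n=1}^{\infty} t_{\varphi(n)}$ converges. Then, for the set
$W_{\alpha,\varphi}$ defined by

\[ W_{\alpha,\varphi} := \{ x = [\ell_1, \ell_2, \ldots]_\alpha \in I_\alpha : \ell_k > \varphi(k) \text{ infinitely often} \}, \]

we have that

\[ \lambda(W_{\alpha,\varphi}) = 0. \]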

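Proof. For the proof of part (a), we proceed similarly to the proof of Lemma 1.4.21.
Define the sets $B_{\alpha,\varphi}^{(n)}$ by setting

\[ B_{\alpha,\varphi}^{(n)} := \{ x = [\ell_1, \ell_2, \ldots]_\alpha \in I_\alpha : \ell_k < \varphi(k) \text{ for all } 1 \leq k \leq n \}. \]

Then, $B_{\alpha,\varphi}^{(n+1)} \subset B_{\alpha,\varphi}^{(n)}$ and $B_{\alpha,\varphi} \subset B_{\alpha,\varphi}^{(n)}$ for all n ∈ N. So, in order to prove that $\lambda(B_{\alpha,\varphi}) = 0$,
it suffices to show that

\[ \lim_{n \to \infty} \lambda\left( B_{\alpha,\varphi}^{(n)} \right) = 0. \]

To that end, notice that for arbitrary $(\ell_1, \ldots, \ell_n) \in \mathbb{N}^n$, we can write

\[ \lambda\left( \bigcup_{1 \leq k < \varphi(n+1)} C_\alpha(\ell_1, \ldots, \ell_n, k) \right) = \left( \sum_{k=1}^{\varphi(n+1)-1} a_k \right) \lambda(C_\alpha(\ell_1, \ldots, \ell_n)) = \left( 1 - t_{\varphi(n+1)} \right) \lambda(C_\alpha(\ell_1, \ldots, \ell_n)). \]

Thus,

\[ \lambda\left( B_{\alpha,\varphi}^{(n+1)} \right) = \left( 1 - t_{\varphi(n+1)} \right) \lambda\left( B_{\alpha,\varphi}^{(n)} \right) = \cdots = \prod_{k=1}^{n} \left( 1 - t_{\varphi(k+1)} \right) \lambda\left( B_{\alpha,\varphi}^{(1)} \right). \]

Now, since $1 - x \leq e^{-x}$ for all 0 < x < 1, we then have that

\[ \lambda\left( B_{\alpha,\varphi}^{(n+1)} \right) \leq e^{-\sum_{k=1}^{n} t_{\varphi(k+1)}}\, \lambda\left( B_{\alpha,\varphi}^{(1)} \right). \]

Consequently, as the partial sums $\sum_{k=1}^{n} t_{\varphi(k+1)}$ become arbitrarily large as n increases,
we have that $\lim_{n \to \infty} \lambda\left( B_{\alpha,\varphi}^{(n)} \right) = 0$.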
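Concerning the proof of part (b), we will again use the Borel–Cantelli Lemma.
Notice that

\[ W_{\alpha,\varphi} = \limsup_{n \to \infty} W_{\alpha,\varphi}^{(n)}, \quad \text{where} \quad W_{\alpha,\varphi}^{(n)} := \{ x \in I_\alpha : \ell_n \geq \varphi(n) \}. \]

Thus, to finish the proof, it is enough to show that

\[ \sum_{n=1}^{\infty} \lambda\left( W_{\alpha,\varphi}^{(n)} \right) < \infty. \]

Indeed,

\[ \begin{aligned} \lambda\left( W_{\alpha,\varphi}^{(n)} \right) &= \lambda\left( \bigcup_{(\ell_1, \ldots, \ell_{n-1}) \in \mathbb{N}^{n-1}} \ \bigcup_{k : k \geq \varphi(n)} C_\alpha(\ell_1, \ldots, \ell_{n-1}, k) \right) \\ &= \sum_{(\ell_1, \ldots, \ell_{n-1}) \in \mathbb{N}^{n-1}} \ \sum_{k : k \geq \varphi(n)} \lambda(C_\alpha(\ell_1, \ldots, \ell_{n-1}, k)) \\ &= \sum_{(\ell_1, \ldots, \ell_{n-1}) \in \mathbb{N}^{n-1}} \ \sum_{k : k \geq \varphi(n)} a_{\ell_1} \cdots a_{\ell_{n-1}} a_k = t_{\varphi(n)}. \end{aligned} \]

By assumption, the series $\sum_{n=1}^{\infty} t_{\varphi(n)}$ converges and so, therefore, does the series
$\sum_{n=1}^{\infty} \lambda\left( W_{\alpha,\varphi}^{(n)} \right)$. This finishes the proof.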
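The product formula from part (a) can be evaluated exactly in a concrete case. The sketch below makes two choices that are not in the text and serve only as an illustration: the harmonic partition ($t_n = 1/n$) and the function φ(n) = n + 1, for which $\sum t_{\varphi(n)}$ diverges. It also uses $\lambda(B_{\alpha,\varphi}^{(1)}) = 1 - t_{\varphi(1)}$, which follows from the same computation as in the proof.

```python
import math
from fractions import Fraction

def t(n):
    return Fraction(1, n)            # harmonic tails t_n = 1/n (assumed partition)

def phi(n):
    return n + 1                     # hypothetical choice with divergent sum of t_phi(n)

def measure_B(n):
    """lambda(B^(n)) = product over 1 <= k <= n of (1 - t_phi(k))."""
    result = Fraction(1)
    for k in range(1, n + 1):
        result *= 1 - t(phi(k))
    return result

def exponential_bound(n):
    """Upper bound exp(-(t_phi(1) + ... + t_phi(n))) coming from 1 - x <= exp(-x)."""
    return math.exp(-sum(float(t(phi(k))) for k in range(1, n + 1)))
```

With these choices the product telescopes to 1/(n + 1), which tends to zero and stays below the exponential bound, as the proof predicts.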

Remark 1.4.26. Notice that if the partition α is of finite type, that is, if α is such that
$\sum_{n=1}^{\infty} t_n$ converges, then we have that $\lambda(B_{\alpha,\varphi}) = 1$ for any arbitrary increasing function
φ : N → N.

1.5 Notes and historical remarks

1.5.1 The Farey sequence

The Farey map is named for John Farey (1766–1826), who was not a mathematician,
but a geologist. Farey’s one contribution to Mathematics was the article On a curious
property of vulgar fractions [Far16], in which he defines Farey sequences in the
following way. For each n ∈ N, list all the rationals between 0 and 1 which, when
expressed in their lowest terms, have denominator at most equal to n. Denoting the
n-th Farey sequence by $\mathcal{F}_n$, the first few are given by

\[ \mathcal{F}_1 := \left\{ \frac{0}{1}, \frac{1}{1} \right\}, \quad \mathcal{F}_2 := \left\{ \frac{0}{1}, \frac{1}{2}, \frac{1}{1} \right\}, \quad \mathcal{F}_3 := \left\{ \frac{0}{1}, \frac{1}{3}, \frac{1}{2}, \frac{2}{3}, \frac{1}{1} \right\}, \]

\[ \mathcal{F}_4 := \left\{ \frac{0}{1}, \frac{1}{4}, \frac{1}{3}, \frac{1}{2}, \frac{2}{3}, \frac{3}{4}, \frac{1}{1} \right\}, \quad \mathcal{F}_5 := \left\{ \frac{0}{1}, \frac{1}{5}, \frac{1}{4}, \frac{1}{3}, \frac{2}{5}, \frac{1}{2}, \frac{3}{5}, \frac{2}{3}, \frac{3}{4}, \frac{4}{5}, \frac{1}{1} \right\}, \ \ldots \]

The curious property of Farey’s title is that each member of the sequence is equal to the
mediant of its two neighbours. Recall that the mediant of two rational numbers a/b
and a′/b′ is by definition the rational number (a + a′)/(b + b′). Farey did not himself
provide a proof of his discovered property² and he was doubtless not the first to notice
it. Cauchy supplied the necessary proof in the same year that Farey’s article appeared.
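The mediant property is easy to confirm experimentally for small n. The following Python sketch builds the Farey sequences by brute force and compares each interior member with the mediant of its neighbours; the function names `farey` and `mediant` are hypothetical, and the comparison is made after reducing the mediant to lowest terms.

```python
from fractions import Fraction

def farey(n):
    """All rationals in [0, 1] whose denominator in lowest terms is at most n, in order."""
    return sorted({Fraction(p, q) for q in range(1, n + 1) for p in range(0, q + 1)})

def mediant(x, y):
    """Mediant (a + a') / (b + b') of x = a/b and y = a'/b', reduced to lowest terms."""
    return Fraction(x.numerator + y.numerator, x.denominator + y.denominator)

def has_mediant_property(n):
    """Check that every interior member of the n-th sequence is the mediant of its neighbours."""
    seq = farey(n)
    return all(mediant(left, right) == mid
               for left, mid, right in zip(seq, seq[1:], seq[2:]))
```

For example, in the third sequence the neighbours of 1/2 are 1/3 and 2/3, whose mediant 3/6 reduces to 1/2.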
We have already seen that if we iterate the point 1/2 under the two inverse
branches of the Farey map, each time one of the Farey fractions turns up. However,
as we have already pointed out, strictly speaking it is not the Farey sequence which
appears in this manner, but rather the Stern–Brocot sequence. The Stern–Brocot
sequence was independently discovered by the German number-theorist Moritz Stern
[Ste58] and the French clockmaker Achille Brocot [Bro61]. (Brocot used the sequences
to design systems of gears. For information on these sorts of applications, see Chapter
IV of Rockett and Szűsz [RS92].) For this reason, it would perhaps be more reasonable
to refer to the Farey map as the Stern–Brocot map. However, we choose to stick with
convention on this point.

2 That Farey did not give a proof of his curious property was pointed out by Hardy [HW08], with the
rather unfriendly comment that Farey was “at the best an indifferent mathematician”.

1.5.2 The classical Lüroth series

In the paper [Lür83], J. Lüroth introduces a series representation of real numbers from
the unit interval. His starting point is the observation that for every real number x in
the interval (0, 1), either x = 1/ℓ, for some positive integer ℓ ≥ 2, or 1/x lies between
two successive positive integers $\ell_1$ and $\ell_1 + 1$ and so

\[ x = \frac{1}{\ell_1 + 1} + \tilde{x}, \]

where, since $x < 1/\ell_1$, we have that $0 < \tilde{x} < 1/(\ell_1(\ell_1 + 1))$. Now, defining $x_1 := \tilde{x}(\ell_1 + 1)\ell_1$
supplies the equation

\[ x = \frac{1}{\ell_1 + 1} + \frac{x_1}{\ell_1(\ell_1 + 1)}. \]

Note that since $0 < \tilde{x} < 1/(\ell_1(\ell_1 + 1))$, we also obtain the inequality $0 < x_1 < 1$. Therefore,
the same argument holds for $x_1$ as for the original point x, which leads to the equation

\[ x = \frac{1}{\ell_1 + 1} + \frac{1}{\ell_1(\ell_1 + 1)(\ell_2 + 1)} + \frac{x_2}{\ell_1(\ell_1 + 1)\ell_2(\ell_2 + 1)}. \]

Clearly, this process either continues until such a time as one of the $x_i$ is equal to the
reciprocal of a positive integer that is at least equal to 2, or continues indefinitely. For
the special case that x = 1, we notice that 1 = 1/2 + 1/4 + 1/8 + · · · . In each case, this
gives the series expansion now called the Lüroth expansion of a real number in [0, 1].
Each finite expansion of the form above represents a rational number. Suppose
now that x ∈ [0, 1] has an infinite Lüroth expansion. Since each $\ell_k$ is at least equal to
1, for the k-th term in the Lüroth expansion of x we have that

\[ \frac{1}{\ell_1(\ell_1 + 1) \cdots \ell_{k-1}(\ell_{k-1} + 1)(\ell_k + 1)} \leq \frac{1}{2^k}. \]

Thus, it makes sense to write

\[ x = \sum_{n=1}^{\infty} \ell_n \prod_{k=1}^{n} \left( \ell_k(\ell_k + 1) \right)^{-1}. \]

We will use Lüroth’s original notation and write $x = S(\ell_1, \ell_2, \ldots)$ for this sum. For
instance, we have that 1 = S(1, 1, 1, . . .). The next observation in [Lür83] is that if
x ∈ [0, 1] has a finite Lüroth expansion, that is, if $x = S(\ell_1, \ell_2, \ldots, \ell_k)$ for some k ∈ N,
then

\[ x = S(\ell_1, \ell_2, \ldots, (\ell_k + 1), 1, 1, 1, \ldots). \]

This is straightforward to check. It follows that every number x in (0, 1] has an
infinite Lüroth expansion. In fact, this infinite representation is unique. As already
mentioned, each finite Lüroth expansion represents a rational number, but it is easy
to see, using only the sum of a geometric series, that each (eventually) periodic infinite
Lüroth expansion is also a rational number. Of course, each finite Lüroth expansion
can also be written as an eventually periodic expansion; in this case the periodic
part consists of infinitely many ones. The proofs of these statements are also given in
[Lür83].
It seems probable that Lüroth was thinking of a generalisation of the decimal
expansion of a real number when he introduced his infinite series expansion. He
states that the given expansion has many similarities with the representation through
infinite decimal expansions and asks whether or not it is possible to characterise the
numbers which have a finite Lüroth expansion in any other way, that is, as in the
way that rational numbers with finite decimal representations are exactly those with
denominators equal to $2^n 5^m$ for some positive integers n and m. As of the present
moment, we are unaware of any answer to this question.
The Lüroth expansion can also be generated by a dynamical system, L : [0, 1] →
[0, 1]. The map L is referred to as the Lüroth map and it is defined by

\[ L(x) := \begin{cases} n(n+1)x - n & \text{for } x \in \left[ \frac{1}{n+1}, \frac{1}{n} \right),\ n \geq 2; \\ 2x - 1 & \text{for } x \in \left[ \frac{1}{2}, 1 \right]; \\ 0 & \text{for } x = 0. \end{cases} \]

The graph of the map L is shown in Fig. 1.10 below.
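Iterating the map L and recording which branch is used produces the Lüroth digits. The sketch below is an illustration in exact rational arithmetic; it assumes the half-open intervals of the definition above, and the helper names `luroth_map`, `luroth_digits` and `luroth_sum` are hypothetical.

```python
import math
from fractions import Fraction

def luroth_map(x):
    """Return (n, L(x)) for x in (0, 1], where x lies in [1/(n+1), 1/n), or in [1/2, 1] if n = 1."""
    n = max(1, math.ceil(1 / x) - 1)
    return n, n * (n + 1) * x - n

def luroth_digits(x, max_len):
    """First max_len Lueroth digits of x; stops early if the orbit reaches 0."""
    digits = []
    while x > 0 and len(digits) < max_len:
        n, x = luroth_map(x)
        digits.append(n)
    return digits

def luroth_sum(digits):
    """Partial sum S(l_1, ..., l_k): sum over n of l_n / (l_1 (l_1 + 1) ... l_n (l_n + 1))."""
    total, prod = Fraction(0), Fraction(1)
    for l in digits:
        prod /= l * (l + 1)
        total += l * prod
    return total
```

For example, 3/4 has the finite expansion S(1, 1), whereas 2/5 is a fixed point of L and has the periodic expansion S(2, 2, 2, . . .), a rational number with an infinite eventually periodic expansion.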


Fig. 1.10. The Lüroth map, L : [0, 1] → [0, 1].

The Lüroth expansion of a real number in [0, 1] is generated by the Lüroth map
in precisely the same way as in all the other examples given above. It can be seen
from the graph of the map L that it is (basically) nothing other than the map $L_{\alpha_H}$ with
all positive slopes instead of all negative slopes. The map $L_{\alpha_H}$ was first described by
S. Kalpazidou, A. Knopfmacher and J. Knopfmacher [KKK91] in the early 1990s; they
called it the alternating Lüroth map and established some of its basic properties. The
Lüroth map and, to a lesser extent, the alternating Lüroth map have been studied
by several authors. In addition to those already cited above, these works include
[BBDK96], [DK96], [Gal72], [Gan01], [Šal68], [SW07] and in particular [DK02].
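
As a rough illustration of the definition of L, the following Python sketch evaluates the Lüroth map exactly on rational inputs. The names `luroth_map` and `luroth_orbit` are chosen for this illustration only and do not appear in the text; the branch index is recovered from the observation that x ∈ [1/(n+1), 1/n) is equivalent to n < 1/x ≤ n + 1.

```python
from fractions import Fraction
from math import ceil


def luroth_map(x: Fraction) -> Fraction:
    """Evaluate the Lüroth map L on [0, 1] using the branch n(n+1)x - n."""
    if not 0 <= x <= 1:
        raise ValueError("x must lie in [0, 1]")
    if x == 0:
        return Fraction(0)
    # x in [1/(n+1), 1/n) means n < 1/x <= n + 1, hence n = ceil(1/x) - 1.
    # The right endpoint x = 1 belongs to the branch n = 1 (2x - 1 on [1/2, 1]).
    n = max(1, ceil(1 / x) - 1)
    return n * (n + 1) * x - n


def luroth_orbit(x: Fraction, steps: int) -> list:
    """Return [x, L(x), L^2(x), ..., L^steps(x)]."""
    points = [x]
    for _ in range(steps):
        points.append(luroth_map(points[-1]))
    return points


if __name__ == "__main__":
    print(luroth_orbit(Fraction(3, 4), 3))
```

Exact rational arithmetic is used here because every branch of L is expanding, so floating-point rounding errors would grow under iteration.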

1.6 Exercises

Exercise 1.6.1 (Dirichlet’s Approximation Theorem). Fix x ∈ R. Prove that for every
N ∈ N there exist p, q ∈ Z with 1 ≤ q ≤ N such that

\[ |xq - p| \le \frac{1}{N}, \]

and deduce that for infinitely many co-prime integers p and q we have that

\[ \left| x - \frac{p}{q} \right| \le \frac{1}{q^2}. \]

Hint: Apply the Pigeonhole Principle to the ‘pigeons’ $kx - \lfloor kx \rfloor$, for k = 0, . . . , N, and the
‘holes’ $[\ell/N, (\ell + 1)/N)$, $\ell = 0, \ldots, N - 1$.

Exercise 1.6.2. Recall the following good decimal approximation of π:

π ≈ 3.141592653589793 . . .

(i) Find the first four elements in the continued fraction expansion of π.
(ii) Determine the first four convergents of π.

Exercise 1.6.3. Let x, y ∈ R, A, B, C ∈ Z, and a, b, c, d ∈ N such that cy + d ≠ 0. Show
that if

\[ x = \frac{ay + b}{cy + d} \quad \text{and} \quad Ay^2 + By + C = 0, \]

then x satisfies

\[ Dx^2 + Ex + F = 0, \quad \text{for some } D, E, F \in \mathbb{Z}. \]

Determine D, E and F in terms of A, B, C, a, b, c and d.

Exercise 1.6.4. Let $(f_n)$ denote the Fibonacci sequence, that is, $f_0 := 0$, $f_1 := 1$ and
$f_{n+2} := f_{n+1} + f_n$. Show that for the generating function we have

\[ \sum_{k=0}^{\infty} f_k z^k = \frac{z}{1 - z - z^2} \]

and deduce that

\[ f_n = \frac{1}{\sqrt{5}} \left( \gamma^n - (-\gamma)^{-n} \right), \]

where $\gamma := (1 + \sqrt{5})/2$ denotes the golden mean.

Exercise 1.6.5. Taking inspiration from the proof of Hurwitz’s Theorem II, prove that
for $x := \sqrt{2} - 1 = [2, 2, 2, \ldots]$, we have $\nu(x) = 1/\sqrt{8}$.

Exercise 1.6.6. Prove that if x and y are two equivalent irrational numbers (in the
sense of Definition 1.1.13 (c)), then ν (x) = ν (y).

Exercise 1.6.7. Let x and y be two equivalent irrational numbers. Show that there exist
integers a, b, c and d such that

ay + b
x= ,
cy + d

with ad − bc = ±1.

Exercise 1.6.8. Show that for every n ∈ N, the continued fraction expansion of
$\sqrt{n^2 + 1} - n$ is given by the periodic expansion

\[ \sqrt{n^2 + 1} - n = [\overline{2n}] = \cfrac{1}{2n + \cfrac{1}{2n + \cdots}}. \]

Exercise 1.6.9. Let $e = \sum_{k=0}^{\infty} 1/k! = 2.71828\ldots$ be the base of the natural logarithm,
and note that it is known that the continued fraction expansion of e is given by

\[ e = a_0 + [a_1, a_2, \ldots] = 2 + [1, 2, 1, 1, 4, 1, 1, 6, 1, 1, 8, \ldots], \]

where $a_0 := 2$, $a_1 := 1$, and for n ≥ 1 we have that $a_{3n-1} := 2n$ and $a_{3n} = a_{3n+1} := 1$. Show
that the following statements are true for every integer n ≥ 1.
(i) For exactly one element i ∈ {n, n + 1, n + 2}, we have that

\[ \left| e - \frac{p_i}{q_i} \right| < \frac{3}{2(i + 2) q_i^2}. \]

(ii) For exactly two elements i ∈ {n, n + 1, n + 2}, we have that

\[ \left| e - \frac{p_i}{q_i} \right| > \frac{1}{3 q_i^2}. \]

Exercise 1.6.10. Show that the function d : E∞ × E∞ → [0, 1] defined in Definition 1.2.10
is really a metric.

Exercise 1.6.11. Show that the space $\mathbb{N}^{\mathbb{N}}$ equipped with the metric d from Definition 1.2.10
is a complete metric space which is not locally compact.

Exercise 1.6.12. Show that the map $\psi : I \to \mathbb{N}^{\mathbb{N}}$ defined by

\[ \psi([x_1, x_2, x_3, \ldots]) := (x_1, x_2, x_3, \ldots) \]

is a topological conjugacy map between the Gauss system and the full shift map on
the shift space $\mathbb{N}^{\mathbb{N}}$.

Exercise 1.6.13. Show that the tent map and the map $L : [0, 1] \to [0, 1]$, $x \mapsto 4x(1 - x)$,
are conjugated via

\[ \psi : [0, 1] \to [0, 1], \quad x \mapsto \frac{1}{2} - \frac{1}{2} \cos(\pi x). \]

Exercise 1.6.14. Prove Cantor’s Intersection Theorem: If $(S_n)_{n \in \mathbb{N}}$ is a decreasing
sequence (so $S_{n+1} \subseteq S_n$) of non-empty compact sets in R (more generally, a complete
metric space), with diameters shrinking to zero, then the intersection $S := \bigcap_{n \in \mathbb{N}} S_n$ is
a singleton.

Hint: Choose a sequence of points $x_n \in S_n$. Show that $(x_n)_{n \in \mathbb{N}}$ is a Cauchy sequence
and therefore converges, to x say. Show that this x lies in each set $S_n$. Finally, show that
if x, y ∈ S, then x = y.

Exercise 1.6.15. Show that Minkowski’s question-mark function Q maps the set of
rational numbers onto the dyadic rationals and maps the quadratic surds onto the
set of non-dyadic rationals and show that their order is preserved.

Exercise 1.6.16. Re-derive the formula given by Denjoy for the function Q from the
definition given in terms of mediants.

Hint: Find inspiration in the proof of Proposition 1.3.11.

Exercise 1.6.17. Prove Proposition 1.4.12.

Exercise 1.6.18. Provide the missing details in the proof that the map L α is κ+ -Hölder
continuous.

Exercise 1.6.19. Supply the missing details in the proof that L α is κ− -sub-Hölder
continuous.

Exercise 1.6.20. Find an operator K on the set S of continuous, surjective, strictly
increasing functions defined on the unit interval such that $K^n(f)$ converges uniformly
to Minkowski’s question-mark function Q for any f ∈ S.
2 Basic ergodic theory
In this chapter, we aim to investigate the basic measure- and ergodic-theoretic
properties of all our main examples. Along the way, we will recall some standard
definitions and theorems that will be of use to us when we come to consider these
particular examples. We will assume that the reader is somewhat familiar with
measures and basic measure theory, but any definitions or theorems that we do not
mention explicitly can be found in, for instance, either [Coh80] or [Rud87].

2.1 Invariant measures

From this point on, we are interested in (measure-theoretic) dynamical systems, in
other words, systems (X, B, μ, T) where (X, B, μ) is a σ-finite measure space and T : X →
X is a measurable transformation. The most fundamental objects of interest in ergodic
theory are measures which remain invariant under the dynamics of the system. These
are the subject of our first definition.

Definition 2.1.1. Let (X, B, μ, T) be a dynamical system. The measure μ is called
T-invariant if we have

\[ \mu \circ T^{-1}(A) := \mu(T^{-1}(A)) = \mu(A) \quad \text{for all } A \in \mathcal{B}. \]

We also say that T : X → X is a measure-preserving transformation and call the system
(X, B, μ, T) a measure-preserving system.

Remark 2.1.2. For every T-invariant measure μ and any set $A \in \mathcal{B}$ of finite μ-measure,
we have

\[
\begin{aligned}
\mu(T^{-1}(A) \setminus A) &= \mu(T^{-1}(A)) - \mu(T^{-1}(A) \cap A) \\
&= \mu(A) - \mu(T^{-1}(A) \cap A) = \mu(A \setminus T^{-1}(A)).
\end{aligned}
\]

In practice, it can be difficult to check that a map preserves a given measure using
only this definition, as it is often the case that no specific information is known about
a general measurable set. However, it is enough to have knowledge of a particular class
of sets that generates the σ-algebra of measurable sets, as we now show. (Recall that
the σ-algebra generated by a collection C of subsets of X is the smallest, in the sense
of inclusion, σ-algebra that contains all the sets in C .)

Lemma 2.1.3. Let (X, B, μ) be a measure space and let T : X → X be a measurable
function. Suppose that $\mathcal{S}$ is a collection of subsets of X closed under taking intersections
with $\sigma(\mathcal{S}) = \mathcal{B}$ and such that μ is σ-finite on $\mathcal{S}$. Then, if $\mu \circ T^{-1}(B) = \mu(B)$ for every set
$B \in \mathcal{S}$, we have that the map T preserves the measure μ.

Proof. Since T is measurable, we have that $\mu \circ T^{-1}$ defines a measure on $\mathcal{B}$. Moreover,
this measure coincides with μ on $\mathcal{S}$. The facts that $\mathcal{S}$ is closed under taking intersections
and generates $\mathcal{B}$ and that μ is σ-finite on $\mathcal{S}$ guarantee that the measures μ and
$\mu \circ T^{-1}$ coincide on $\mathcal{B}$.

Example 2.1.4. The collection of all subintervals of [0, 1] together with the empty set
generates the Borel σ-algebra on [0, 1] and is closed under taking intersections.

To motivate the definition of T-invariance from the stochastic point of view let us
consider a probability space (X, B , μ) together with a measurable transformation
T : X → X. Then the stochastic process given by (g ◦ T k )k∈N0 defined on (X, B , μ)
is stationary for every integrable function g : X → R if and only if μ is T-invariant.
Stationarity follows from T-invariance by noting that for every Borel set A ⊆ R and every
k ∈ N we have

\[ \mu(\{x \in X : g \circ T^k(x) \in A\}) = \mu(T^{-k}\{x \in X : g(x) \in A\}) = \mu(\{x \in X : g(x) \in A\}), \]

if μ is T-invariant. Stationarity implies T-invariance by taking $g := 1_A$ for a measurable
set $A \in \mathcal{B}$ and k = 1 and observing that

\[ \mu(T^{-1}(A)) = E_\mu(1_A \circ T) = E_\mu(1_A) = \mu(A). \]

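Example 2.1.5. For the tent map T : [0, 1] → [0, 1] it is easily seen that for any interval
(a, b) ⊆ [0, 1] we have that

\[
\begin{aligned}
\lambda(T^{-1}(a, b)) &= \lambda\left( \left( \frac{a}{2}, \frac{b}{2} \right) \cup \left( \frac{2 - b}{2}, \frac{2 - a}{2} \right) \right) \\
&= \frac{1}{2}(b - a) + \frac{1}{2}(b - a) = b - a = \lambda((a, b)).
\end{aligned}
\]

Thus, it follows from Lemma 2.1.3 that the tent map preserves the Lebesgue measure.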
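
The computation in Example 2.1.5 can be replayed in a few lines of Python. This is only a sketch: the helper names `tent`, `tent_preimage` and `lebesgue` are introduced here for illustration, and exact rational arithmetic is used so that the equality of lengths can be tested without rounding.

```python
from fractions import Fraction


def tent(x):
    """Tent map on [0, 1]: 2x on [0, 1/2] and 2 - 2x on (1/2, 1]."""
    return 2 * x if x <= Fraction(1, 2) else 2 - 2 * x


def tent_preimage(a, b):
    """The two open intervals whose union is T^{-1}((a, b)), for 0 <= a < b <= 1."""
    return [(a / 2, b / 2), ((2 - b) / 2, (2 - a) / 2)]


def lebesgue(intervals):
    """Total length of a list of pairwise disjoint intervals."""
    return sum(right - left for left, right in intervals)


if __name__ == "__main__":
    a, b = Fraction(1, 5), Fraction(7, 10)
    pieces = tent_preimage(a, b)
    # The preimage has the same Lebesgue measure as (a, b) itself.
    print(lebesgue(pieces) == b - a)
    # The midpoint of each piece is mapped back into (a, b).
    print(all(a < tent((left + right) / 2) < b for left, right in pieces))
```

A check of this kind on intervals is all that Lemma 2.1.3 asks for, since the intervals form an intersection-stable generator of the Borel σ-algebra.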

The following definitions will be used throughout the book.

Definition 2.1.6. Let (X, B, μ) be a measure space and let $\mathcal{M}(\mathcal{B})$ denote the set of
all real-valued $\mathcal{B}$-measurable functions. We now define an equivalence relation by
identifying two elements $f, g \in \mathcal{M}(\mathcal{B})$ if they satisfy

\[ f \sim g \ :\Longleftrightarrow\ \mu(\{f \ne g\}) = 0. \tag{2.1} \]

Then we define

\[ M(\mathcal{B}) := \mathcal{M}(\mathcal{B}) / \sim. \]

Further, define

\[ L^1(\mu) := \left\{ f : X \to \mathbb{R} : f \text{ measurable and } \int |f| \, d\mu < \infty \right\} / \sim \]

and

\[ L^\infty(\mu) := \{ f : X \to \mathbb{R} : f \text{ measurable and } \operatorname{ess\,sup} |f| < \infty \} / \sim. \]

If it is clear from the context, we will not distinguish between measurable functions
and their equivalence classes given by (2.1). For any subset F of $M(\mathcal{B})$, we let $F_+$ denote
the subset of non-negative elements from F.

For the case of a finite measure space, the following proposition gives a useful
characterisation of T-invariant measures in terms of the space of L1 (μ) functions.

Proposition 2.1.7. Let T : (X, B, μ) → (X, B, μ) be a measurable transformation on a
probability space (X, B, μ). Then μ is T-invariant if and only if

\[ \int f \, d\mu = \int f \circ T \, d\mu, \quad \text{for all } f \in L^1(\mu). \tag{2.2} \]

Proof. Suppose first that (2.2) holds. In that case, for any measurable set B we can let
$f = 1_B$ to obtain that

\[ \mu(B) = \int 1_B \, d\mu = \int 1_B \circ T \, d\mu = \mu(T^{-1}(B)). \]

On the other hand, if T preserves the measure μ, then (2.2) holds for all characteristic
functions and therefore it also holds for all simple functions. If $f \in L^1(\mu)$ is non-negative, we can find a
sequence $(f_n)_{n \ge 1}$ of simple functions increasing to f; moreover, the sequence $(f_n \circ T)_{n \ge 1}$
is a sequence of simple functions increasing to $f \circ T$. We can then apply the Monotone
Convergence Theorem to deduce that

\[ \int f \, d\mu = \lim_{n \to \infty} \int f_n \, d\mu = \lim_{n \to \infty} \int f_n \circ T \, d\mu = \int f \circ T \, d\mu. \]

For a general $f \in L^1(\mu)$, the claim follows by applying this argument to the positive and
negative parts $f^+$ and $f^-$ and using the linearity of the integral. Since $f \in L^1(\mu)$ was
arbitrary, the proof is finished.


The proof of Proposition 2.1.7 actually shows rather more, namely, that it is enough to
check that the condition holds for all bounded measurable functions.

2.1.1 Invariant measures for the Gauss and α-Lüroth system

Our aim now is to describe the invariant measures for our main examples. Let us begin
with the α-Lüroth systems.

Proposition 2.1.8. The Lebesgue measure λ is $L_\alpha$-invariant.

Proof. Recall from Section 1.4.1 that the inverse branches $L_{\alpha,n} : [0, 1) \to A_n$ of $L_\alpha$ are
given by $L_{\alpha,n}(x) := t_n - a_n x$, for all n ∈ N. In order to show that λ is $L_\alpha$-invariant, by
Lemma 2.1.3 it is enough to show that for every subinterval I contained in [0, 1], we
have that

\[ \lambda(I) = \lambda(L_\alpha^{-1}(I)). \]

In fact, since the Lebesgue measure is non-atomic, it suffices to let I := [a, b] for some
0 ≤ a < b ≤ 1. A straightforward calculation shows that

\[
\begin{aligned}
\lambda(L_\alpha^{-1}[a, b]) &= \lambda\left( \bigcup_{n=1}^{\infty} L_{\alpha,n}([a, b]) \right) = \sum_{n=1}^{\infty} \lambda(L_{\alpha,n}([a, b])) \\
&= \sum_{n=1}^{\infty} |(t_n - a_n a) - (t_n - a_n b)| = \sum_{n=1}^{\infty} a_n (b - a) \\
&= b - a = \lambda([a, b]).
\end{aligned}
\]

As you will show in Exercise 2.6.1, the Lebesgue measure is not preserved by the
Gauss map. However, Gauss observed in 1845 that the map G does preserve a
Lebesgue-absolutely continuous measure, which we now define.

Definition 2.1.9. The Gauss measure $m_G$ is given, for all Borel measurable sets A ⊆
[0, 1], by

\[ m_G(A) := \frac{1}{\log 2} \int_A \frac{1}{1 + x} \, d\lambda(x). \]

Proposition 2.1.10. The Gauss map G preserves the Gauss measure m G .

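Proof. It suffices to show that $m_G([0, b]) = m_G \circ G^{-1}([0, b])$, for any 0 ≤ b ≤ 1. Recalling
that $G_n$ refers to the n-th inverse branch of G, which is given by $G_n(x) := 1/(x + n)$, we
have that

\[ G^{-1}([0, b]) = \bigcup_{n=1}^{\infty} G_n([0, b]) = \bigcup_{n=1}^{\infty} \left[ \frac{1}{n + b}, \frac{1}{n} \right]. \]

Thus, first observing that

\[
\frac{1 + \frac{1}{n}}{1 + \frac{1}{n + b}}
= \frac{\frac{n + 1}{n} \cdot \frac{n + b}{n + 1}}{\frac{n + b + 1}{n + b} \cdot \frac{n + b}{n + 1}}
= \frac{1 + \frac{b}{n}}{1 + \frac{b}{n + 1}},
\]

the following calculation finishes the proof:

\[
\begin{aligned}
m_G(G^{-1}([0, b])) &= \frac{1}{\log 2} \sum_{n=1}^{\infty} \int_{1/(n+b)}^{1/n} \frac{1}{1 + x} \, dx \\
&= \frac{1}{\log 2} \sum_{n=1}^{\infty} \left( \log\left( 1 + \frac{1}{n} \right) - \log\left( 1 + \frac{1}{n + b} \right) \right) \\
&= \frac{1}{\log 2} \sum_{n=1}^{\infty} \left( \log\left( 1 + \frac{b}{n} \right) - \log\left( 1 + \frac{b}{n + 1} \right) \right) \\
&= \frac{1}{\log 2} \sum_{n=1}^{\infty} \int_{b/(n+1)}^{b/n} \frac{1}{1 + x} \, dx \\
&= \sum_{n=1}^{\infty} m_G\left( \left[ \frac{b}{n + 1}, \frac{b}{n} \right] \right) \\
&= m_G([0, b]).
\end{aligned}
\]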
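
The identity just proved can also be observed numerically. The following Python sketch compares $m_G([0, b])$ with a truncated version of the series for $m_G(G^{-1}([0, b]))$. The function names and the truncation level `terms` are assumptions made for this illustration; by the telescoping step in the proof, cutting the series off after N terms leaves an error of $\log(1 + b/(N+1))/\log 2$.

```python
import math


def gauss_measure(a: float, b: float) -> float:
    """m_G([a, b]) = (log(1 + b) - log(1 + a)) / log 2 for 0 <= a <= b <= 1."""
    return (math.log1p(b) - math.log1p(a)) / math.log(2)


def gauss_measure_preimage(b: float, terms: int = 100_000) -> float:
    """Truncated series for m_G(G^{-1}([0, b])).

    Uses G^{-1}([0, b]) = union over n >= 1 of [1/(n + b), 1/n].
    """
    return sum(gauss_measure(1 / (n + b), 1 / n) for n in range(1, terms + 1))


if __name__ == "__main__":
    for b in (0.1, 0.5, 1.0):
        print(b, gauss_measure(0.0, b), gauss_measure_preimage(b))
```

The two printed columns should agree up to the truncation error described above, which is of order b/N.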

Remark 2.1.11. In order to make a guess at how Gauss arrived at his invariant measure,
we make the simple arithmetic observation that

\[ \sum_{n=1}^{\infty} \frac{1}{(n + x)(n + 1 + x)} = \sum_{n=1}^{\infty} \left( \frac{1}{n + x} - \frac{1}{n + 1 + x} \right) = \frac{1}{1 + x}, \]

which is equivalent to

\[ \sum_{n=1}^{\infty} \frac{1}{(n + x)^2} \cdot \frac{1}{1 + \frac{1}{n + x}} = \frac{1}{1 + x}. \]

This infinite sum can be expressed in terms of the Gauss map G as follows, with
$h_G(x) := 1/((1 + x) \log 2)$, and, as usual, with $G_n$ referring to the n-th inverse branch
of G,

\[ \sum_{n=1}^{\infty} |G_n'(x)| \, h_G(G_n(x)) = h_G(x). \]

The significance of this formula will be apparent shortly, when we come to describe
the transfer operator.
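
For readers who like to experiment, the fixed-point relation in Remark 2.1.11 can be tested numerically. In the sketch below, `branch_sum` is a name chosen for this illustration; it evaluates a truncation of the sum over inverse branches, using $G_n(x) = 1/(x + n)$ and $|G_n'(x)| = 1/(x + n)^2$. For comparison, the constant density 1 (that is, Lebesgue measure) is not reproduced: at x = 0 the sum is $\sum 1/n^2 = \pi^2/6$.

```python
import math


def h_gauss(x: float) -> float:
    """Density of the Gauss measure with respect to Lebesgue measure."""
    return 1.0 / ((1.0 + x) * math.log(2))


def branch_sum(h, x: float, terms: int = 100_000) -> float:
    """Truncation of sum_{n >= 1} |G_n'(x)| * h(G_n(x)) with G_n(x) = 1/(x + n)."""
    return sum(h(1.0 / (x + n)) / (x + n) ** 2 for n in range(1, terms + 1))


if __name__ == "__main__":
    for x in (0.0, 0.25, 1.0):
        # The two values agree up to a truncation error of order 1/terms.
        print(x, branch_sum(h_gauss, x), h_gauss(x))
    # The constant density is not a fixed point of the branch sum.
    print(branch_sum(lambda t: 1.0, 0.0), math.pi ** 2 / 6)
```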

Finally, we finish this section with the observation that the Gauss measure belongs to
the same measure class as the Lebesgue measure.

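Proposition 2.1.12. For any Borel set B ⊆ [0, 1], we have that

\[ \frac{\lambda(B)}{2 \log 2} \le m_G(B) \le \frac{\lambda(B)}{\log 2}. \]

Proof. Since in the following x lies between 0 and 1, we have that

\[ m_G(B) = \frac{1}{\log 2} \int_B \frac{1}{1 + x} \, d\lambda(x) \le \frac{1}{\log 2} \int_B 1 \, d\lambda(x) = \frac{\lambda(B)}{\log 2}, \]

and

\[ m_G(B) = \frac{1}{\log 2} \int_B \frac{1}{1 + x} \, d\lambda(x) \ge \frac{1}{\log 2} \int_B \frac{1}{2} \, d\lambda(x) = \frac{\lambda(B)}{2 \log 2}. \]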

2.2 Recurrence and conservativity

Let us now study a general property of invariant measures by taking a detour to present
one of the fundamental results of ergodic theory, namely, Halmos’s Recurrence
Theorem [Hal56]. This theorem states that for a conservative transformation (which
will be defined momentarily) on a σ-finite measure space, almost all points of a given
set return infinitely often to that set under iteration. Although it is relatively easy to
prove, the importance of this theorem should not be underestimated, as it is really
one of the very few completely general theorems in all of ergodic theory. Theorems of
this type were initially established for finite systems, but the proofs are essentially the
same.

Definition 2.2.1. Let (X, B, μ, T) be a measure-theoretic dynamical system. A set $W \in \mathcal{B}$
is called a wandering set for T if $\{T^{-n}(W) : n \ge 0\}$ is a collection of pairwise disjoint
sets. The collection of all wandering sets will be denoted by $\mathcal{W}_T$.

Note that if W ⊆ X is a wandering set for a map T : X → X, then this implies that

\[ \sum_{n=0}^{\infty} 1_W \circ T^n \le 1. \]

Definition 2.2.2. Let (X, B, μ) be a measure space and T : X → X a map. Then T is
said to be conservative if every wandering set for T is a null set for μ. If (X, B, μ, T)
is a measure-preserving system and T is conservative, then we will call the system a
conservative measure-preserving system.

Proposition 2.2.3. Any measure-preserving system (X, B, μ, T) with μ(X) finite is
conservative.

Proof. Fix $W \in \mathcal{B}$ and assume that $(T^{-k}(W))_{k \in \mathbb{N}_0}$ is a family of pairwise disjoint sets.
Then

\[ \mu(X) \ge \mu\left( \bigcup_{k \in \mathbb{N}} T^{-k} W \right) = \sum_{k \in \mathbb{N}} \mu(T^{-k} W) = \sum_{k \in \mathbb{N}} \mu(W) \]

implies that μ(W) = 0.


Note that measure-preserving systems on infinite measure spaces are not guaranteed to be conservative. Indeed, the
simplest counterexample is a translation of the real line. For instance, consider the
map given by T(x) = x + 1, which certainly preserves the Lebesgue measure on R, and
let A = (0, 1). Clearly, no point of A ever comes back to A under iteration of T, although
the Lebesgue measure of A is evidently not equal to zero.
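
The translation example is simple enough to spell out in code. The sketch below (with helper names of our own choosing) lists the preimages $T^{-n}(A) = (-n, 1 - n)$ of A = (0, 1) under T(x) = x + 1 and confirms that they are pairwise disjoint, so that A is a wandering set of Lebesgue measure 1.

```python
from itertools import combinations


def preimage(interval, n):
    """T^{-n}((a, b)) for the translation T(x) = x + 1 is (a - n, b - n)."""
    a, b = interval
    return (a - n, b - n)


def are_disjoint(i, j):
    """Two open intervals are disjoint iff one ends before the other begins."""
    return i[1] <= j[0] or j[1] <= i[0]


A = (0, 1)
preimages = [preimage(A, n) for n in range(50)]
all_disjoint = all(are_disjoint(i, j) for i, j in combinations(preimages, 2))

if __name__ == "__main__":
    print(preimages[:3], all_disjoint)
```

Only finitely many preimages are checked here, of course; the general statement is immediate from the formula for $T^{-n}(A)$.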
We are now ready to prove the main result of this section. For this we make the
following definition.

Definition 2.2.4. A transformation T : X → X of a measure space (X, B, μ) is said to be
non-singular if for each $B \in \mathcal{B}$ with μ(B) = 0 we have that $\mu(T^{-1}(B)) = 0$. That is, the map
$T^{-1}$ preserves sets of μ-measure zero. In this case we also call the dynamical system
(X, B, μ, T) non-singular.

Remark 2.2.5. A system is sometimes said to be (two-sided) non-singular if for each
B ∈ B we have μ(B) = 0 if and only if μ(T −1 (B)) = 0. We never need this stronger
property.

Let us recall that the symmetric difference $A \triangle B$ of two sets A and B is defined by
$A \triangle B := (A \setminus B) \cup (B \setminus A) = (A \cup B) \setminus (A \cap B)$. Hereafter, we will use the notation
“A = B mod μ” (respectively, “A ⊂ B mod μ”) to indicate that two sets are equal
(respectively, A is contained in B) up to a set of μ-measure zero, i.e., $\mu(A \triangle B) = 0$
(respectively, $\mu(A \setminus B) = 0$).

Theorem 2.2.6 (Halmos’s Recurrence Theorem). Let (X, B, μ, T) be a non-singular
dynamical system. Then for every set $A \in \mathcal{B}$ we have that μ(A ∩ W) = 0 for all wandering sets
$W \in \mathcal{B}$ if and only if for all measurable subsets B ⊂ A the set of points from B returning
infinitely often under T to B is equal to B mod μ, i.e.,

\[ \mu\left( \{x \in B : T^n(x) \in B \text{ for infinitely many } n \ge 1\} \right) = \mu(B). \]

Proof. For B ⊂ A measurable, let

\[ N := \{x \in B : T^n(x) \notin B \text{ for all } n \ge 1\} = B \setminus \bigcup_{n=1}^{\infty} T^{-n} B. \]

Our first claim is that μ(N) = 0. To show this, let x ∈ N. Then for every n ≥ 1 we have
that $T^n(x) \notin B$, and therefore $T^n(x) \notin N$. This shows that $N \cap T^{-n}(N) = \emptyset$, for all n ≥ 1.
Hence, it follows that for all i, j ∈ N such that j < i, we have that

\[ T^{-j}(N) \cap T^{-i}(N) = T^{-j}\left( N \cap T^{-(i-j)}(N) \right) = T^{-j}(\emptyset) = \emptyset. \]

So, the preimages $\{T^{-n}(N) : n = 0, 1, 2, \ldots\}$ of N under the iterates of T form a pairwise
disjoint family of sets, that is to say, N is a wandering subset of A. By our assumption
on A and since T is assumed to be non-singular, it follows that $0 = \mu(N) = \mu(T^{-n} N)$, for
all n ∈ N. Since $N = B \setminus \bigcup_{n=1}^{\infty} T^{-n} B$ this shows that for all n ≥ 0 we have

\[ T^{-n} B \subset \bigcup_{k > n} T^{-k} B \mod \mu. \]

Consequently,

\[ \bigcup_{k > n} T^{-k} B = T^{-n-1} B \cup \bigcup_{k > n+1} T^{-k} B = \bigcup_{k > n+1} T^{-k} B \mod \mu. \]

From this we deduce that

\[
\begin{aligned}
B \subset \bigcup_{k=0}^{\infty} T^{-k} B = \bigcup_{k > 1} T^{-k} B = \cdots &= \bigcap_{n \in \mathbb{N}} \bigcup_{k \ge n} T^{-k} B \\
&= \{x \in X : T^n(x) \in B \text{ for infinitely many } n \ge 1\} \mod \mu.
\end{aligned}
\]

For the reverse implication we assume that μ(A ∩ W) > 0 for some wandering set $W \in \mathcal{B}$.
Then for the set B := A ∩ W of positive measure we have $T^{-n} B \cap B = \emptyset$, for all n ∈ N,
so that no point of B ever returns to B and the recurrence property fails for B.

Note that the recurrence property for B in Halmos’s Recurrence Theorem can be stated
equivalently as follows:

\[ \sum_{n=1}^{\infty} 1_B \circ T^n = \infty \quad \mu\text{-a.e. on } B. \]

This leads to the following useful observation.

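Lemma 2.2.7. Let (X, B, μ, T) be a measure-preserving system and f, g two measurable
functions with $g \in L^1_+(\mu)$ and f > 0. Then

\[ \left\{ x \in X : \sum_{k=0}^{\infty} g \circ T^k(x) = \infty \right\} \subset \left\{ x \in X : \sum_{k=0}^{\infty} f \circ T^k(x) = \infty \right\} \mod \mu. \]

Proof. Let us consider the increasing sequence of sets $W_f^N := \left\{ \sum_{k=0}^{\infty} f \circ T^k < N \right\}$, N ∈ N, which
have union $\left\{ \sum_{k=0}^{\infty} f \circ T^k < \infty \right\}$. Since f > 0, we have $T^k(W_f^N) \subset W_f^N$ for every k ∈ N.
Hence, if j denotes the smallest index with $T^j(x) \in W_f^N$, then
$\sum_{k=0}^{n} 1_{T^{-k} W_f^N}(x) \cdot f \circ T^k(x) \le \sum_{i=0}^{\infty} f \circ T^i(T^j(x)) < N$, and the sum vanishes if there is no
such index. Using this bound and the T-invariance of μ we have for every n ∈ N

\[
\begin{aligned}
\int_{W_f^N} \sum_{k=0}^{n} g \circ T^k \, f \, d\mu &= \sum_{k=0}^{n} \int 1_{W_f^N} \cdot g \circ T^{n-k} \cdot f \, d\mu \\
&= \sum_{k=0}^{n} \int 1_{W_f^N} \circ T^k \cdot f \circ T^k \cdot g \circ T^n \, d\mu \\
&= \int g \circ T^n \sum_{k=0}^{n} 1_{T^{-k} W_f^N} \cdot f \circ T^k \, d\mu \\
&\le N \int g \circ T^n \, d\mu = N \int g \, d\mu < \infty.
\end{aligned}
\]

Since f > 0 this shows that μ-a.e. on $W_f^N$ the infinite sum $\sum_{k=0}^{\infty} g \circ T^k$ is finite. Hence,
$\left\{ \sum_{k=0}^{\infty} f \circ T^k < \infty \right\} = \bigcup_{N} W_f^N \subset \left\{ \sum_{k=0}^{\infty} g \circ T^k < \infty \right\}$ mod μ. Taking complements proves the
inclusion.

Now Lemma 2.2.7 gives rise to the definition of the Hopf decomposition of X with
respect to a measure-preserving transformation T.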

Definition 2.2.8 (Hopf decomposition). Let (X, B, μ, T) be a measure-preserving
system. Then the (μ-a.e. determined) Hopf decomposition of X with respect to T refers to
the decomposition into the conservative part

\[ C_T := \left\{ x \in X : \sum_{k=0}^{\infty} g \circ T^k(x) = \infty \right\} \]

for some $g \in L^1(\mu)$ such that g > 0 and the dissipative part $D_T := X \setminus C_T$.

By Lemma 2.2.7 the uniqueness of $C_T$ mod μ is guaranteed for this decomposition
and, as we will shortly see, we further have for every measure-preserving transformation
T that $X = C_T$ mod μ is equivalent to T being conservative (cf. Corollary 2.2.11).

Proposition 2.2.9. Let (X, B, μ, T) be a measure-preserving system. Then:
(a) All wandering sets are subsets (mod μ) of the dissipative part $D_T$ of X.
(b) For all measurable subsets A of the dissipative part $D_T$ with μ(A) > 0 there exists a
wandering set W ⊂ A with μ(W) > 0.

Proof. To prove (a) fix $f \in L^1(\mu)$ with f > 0 and let $W \in \mathcal{W}_T$ with μ(W) > 0. Then, for
every n ∈ N, we have

\[
\begin{aligned}
\int_W \sum_{k=0}^{n} f \circ T^k \, d\mu &= \sum_{k=0}^{n} \int 1_W \cdot f \circ T^{n-k} \, d\mu \\
&= \sum_{k=0}^{n} \int 1_W \circ T^k \cdot f \circ T^n \, d\mu \\
&= \int f \circ T^n \cdot \sum_{k=0}^{n} 1_W \circ T^k \, d\mu \le \int f \, d\mu < \infty.
\end{aligned}
\]

Since we suppose that the measure of W is positive, we have that the infinite sum
$\sum_{k=0}^{\infty} f \circ T^k$ must converge μ almost surely on W. Hence, W is contained in the
dissipative part.
Towards part (b) fix a measurable subset A of the dissipative part with positive
measure and without loss of generality we also assume μ(A) to be finite. If we make the
assumption that μ(A ∩ W) = 0 for all wandering sets $W \in \mathcal{B}$, then Halmos’s Recurrence
Theorem (Theorem 2.2.6) would imply that $\sum_{n \in \mathbb{N}} 1_A \circ T^n = \infty$ almost everywhere on A.
But then Lemma 2.2.7 with $g := 1_A$ would imply that A is a subset of the conservative
part. This contradiction finishes the proof.

Remark 2.2.10. What we have just proved is that the dissipative part $D_T$ of a
measure-preserving system is the measurable union of $\mathcal{W}_T$. By definition, this means
that the collection $\mathcal{W}_T$ is hereditary (i.e., measurable subsets of wandering sets are
wandering sets), that $\mathcal{W}_T$ covers $D_T$ (property (a)) and that $\mathcal{W}_T$ saturates $D_T$
(property (b)). In fact, any measurable
set with these properties is uniquely determined mod μ. To see why, suppose there
were two measurable unions D and D′ such that μ(D \ D′) > 0. Then by property (b)
there exists a wandering set W ⊂ D \ D′ with positive measure and by property (a)
this set must lie in D′, which gives a contradiction. The same argument with D and D′
interchanged gives that D = D′ mod μ.
The measurable union for a hereditary family of measurable sets (like $\mathcal{W}_T$) always
exists and can also be constructed abstractly.

Corollary 2.2.11. Let (X, B, μ, T) be a measure-preserving system. Then the system is
conservative if and only if

\[ C_T = X \mod \mu. \]

Proof. If the system is conservative, then by definition all elements of $\mathcal{W}_T$ have
measure zero. Hence, by Proposition 2.2.9 (b) we have $\mu(D_T) = 0$.
Conversely, if $C_T = X$ mod μ then $\mu(D_T) = 0$. Hence, by Proposition 2.2.9 (a) the
system must be conservative.

The next theorem provides a convenient way of determining whether a given
transformation is conservative with respect to an infinite invariant measure
(cf. Remark 2.3.22 (3) for an application). We will use the following notion of a
sweep-out set.

Definition 2.2.12. Let (X, B, μ) be a measure space and T : X → X a map. A set $A \in \mathcal{B}$ is
called a sweep-out set for T if A is of finite, positive μ-measure and

\[ \bigcup_{n=0}^{\infty} T^{-n}(A) = X \mod \mu. \]

Remark 2.2.13. Note that for any sweep-out set A we have

\[ \bigcup_{k \ge n} T^{-k} A = T^{-n} \bigcup_{k \ge 0} T^{-k} A = T^{-n} X = X \mod \mu, \]

for all n ∈ N, and hence

\[ \bigcap_{n \in \mathbb{N}} \bigcup_{k \ge n} T^{-k} A = X \mod \mu. \]

Theorem 2.2.14 (Maharam’s Recurrence Theorem). Let (X, B, μ, T) be a
measure-preserving system and suppose that there exists a sweep-out set A for T. Then T is
conservative.

Proof. This theorem is a direct consequence of Lemma 2.2.7 and Corollary 2.2.11 since
for $g := 1_A \in L^1_+(\mu)$ and $f \in L^1(\mu)$ with f > 0, arbitrary, we obtain with the help of
Remark 2.2.13

\[ X = \left\{ x \in X : \sum_{k=0}^{\infty} 1_A(T^k(x)) = \infty \right\} \subset \left\{ x \in X : \sum_{k=0}^{\infty} f(T^k(x)) = \infty \right\} \mod \mu. \]

Let us end this section with another recurrence theorem, this one giving information
about distances of orbits mapped by a measurable function to a metric space.

Theorem 2.2.15 (Poincaré’s Recurrence Theorem). Let T : X → X be a conservative
non-singular transformation of a σ-finite measure space (X, B, μ), let (M, d) be a
separable metric space and f : X → M a measurable function. Then μ-a.e. we have

\[ \liminf_{n \to \infty} d\left( f(x), f \circ T^n(x) \right) = 0. \]

Proof. Fix m ∈ N and let B ⊂ M be a measurable set of diameter less than 1/m. Then by Halmos’s
Recurrence Theorem (Theorem 2.2.6) we have μ-a.e. on $f^{-1} B$

\[ \sum_{n=0}^{\infty} 1_{f^{-1} B} \circ T^n = \infty. \]

Consequently, there exists a μ-null set $N_B \in \mathcal{B}$ such that for all $x \in f^{-1} B \setminus N_B$ we have

\[ \liminf_{n \to \infty} d\left( f(x), f \circ T^n(x) \right) \le \frac{1}{m}. \]

By the separability of M there exists up to measure zero a countable cover of X by sets
of the form $f^{-1} B \setminus N_B$ with measurable subsets B ⊂ M of diameter less than 1/m. The
union $U_m$ of these sets has full measure and for all $x \in U_m$ we have

\[ \liminf_{n \to \infty} d\left( f(x), f \circ T^n(x) \right) \le \frac{1}{m}. \]

The claim of the theorem then holds for all x in $\bigcap_{m \in \mathbb{N}} U_m$, which is a set of full
measure.

2.3 The transfer operator

2.3.1 Jacobians and the change of variable formula

Now, we remind the reader of absolutely continuous measures and the Radon–Nikodým
Theorem, which will be utilised heavily for the rest of this section.

Definition 2.3.1. Let μ and ν be two measures on a measurable space (X, B). Then ν is
called absolutely continuous with respect to μ, or sometimes μ-absolutely continuous,
and written $\nu \ll \mu$, if for all $B \in \mathcal{B}$ with μ(B) = 0 we have that ν(B) = 0. Moreover, if for
the two measures μ and ν we have that $\nu \ll \mu$ as well as $\mu \ll \nu$, then the two measures
are said to be equivalent and we write μ ∼ ν. We will also refer to equivalent measures
as being in the same measure class.

Note that the definition of a non-singular transformation given in Definition 2.2.4 can
also be phrased in the following way: The transformation T : X → X is non-singular if
the measure μ ◦ T −1 is absolutely continuous with respect to μ.

Remark 2.3.2. Let us point out here that Proposition 2.1.12 shows that the Gauss
measure m G is absolutely continuous with respect to the Lebesgue measure λ and vice
versa. Hence, the two measures are equivalent.

Lemma 2.3.3. Let (X, B, μ) be a σ-finite measure space and $g \in M_+(\mathcal{B})$. Then the
integral

\[ \mu_g(A) := \int_A g \, d\mu, \quad A \in \mathcal{B}, \tag{2.3} \]

defines a σ-finite measure $\mu_g$ on (X, B), which is absolutely continuous with respect to
μ. We have $g \in L^1_+(\mu)$ if and only if $\mu_g$ is finite. We have $\mu_g = \mu_h$ for two measurable
non-negative functions g, h if and only if g ∼ h, in other words, if g = h as elements of
$M_+(\mathcal{B})$.
The unique function $g \in M_+(\mathcal{B})$ with $\nu = \mu_g$ is called the density of ν with respect
to μ.

Proof. The fact that $\mu_g$ is a σ-additive set function (and hence defines a measure)
follows from the Monotone Convergence Theorem. Since μ is σ-finite there exists a
sequence of measurable sets $B_1 \subset B_2 \subset \cdots$ such that $\bigcup_k B_k = X$ and $\mu(B_k) < \infty$. Since
g < ∞ on X we have that $A_k := B_k \cap \{g \le k\}$ defines an increasing sequence of sets with
union equal to X for which we have $\mu_g(A_k) \le k \mu(B_k) < \infty$. This shows that also $\mu_g$ is
σ-finite. Clearly, $\mu_g \ll \mu$ and $g \in L^1_+(\mu)$ if and only if $\mu_g$ is finite.
If f ∼ g then it follows immediately that $\mu_f = \mu_g$. For the reverse implication define
$C_k := B_k \cap \{f < g\}$. Then, since $\int_{C_k} (g - f) \, d\mu = \mu_g(C_k) - \mu_f(C_k) = 0$ and the integrand is
non-negative, it follows that f = g μ-a.e. on $C_k$. Analogously, one shows that f = g μ-a.e.
on $B_k \cap \{f \ge g\}$. Hence, f = g μ-a.e. on $B_k$ for every k ∈ N and consequently f ∼ g.

The significance of the Radon–Nikodým Theorem is that also the converse of
Lemma 2.3.3 holds, that is, every finite measure ν such that $\nu \ll \mu$ has a density
with respect to μ.

Theorem 2.3.4 (The Radon–Nikodým Theorem). Let μ and ν be two σ-finite measures
on a measurable space (X, B) such that $\nu \ll \mu$. Then there exists a unique element
$h \in M_+(\mathcal{B})$ such that $\nu = \mu_h$, that is, for every set $B \in \mathcal{B}$, we have

\[ \nu(B) = \int_B h \, d\mu. \]

Moreover, if the measure ν is finite, then this almost surely unique function h belongs to
$L^1(\mu)$.

Proof. See, for instance, Theorem 6.10 in Rudin [Rud87], (where the theorem is proved
in slightly greater generality than stated here).

Remark 2.3.5. The unique density h appearing in Theorem 2.3.4 is often referred to as
the Radon–Nikodým derivative of ν with respect to μ and denoted by h = dν /dμ.

2.3.2 Obtaining invariant measures via the transfer operator

Our aim now is to obtain invariant measures which are absolutely continuous with respect to
a given reference measure. Throughout, we consider a non-singular transformation
T : X → X on a σ-finite measure space (X, B, μ). Let $g \in M_+(\mathcal{B})$ and suppose that $\mu_g$ has
density g with respect to μ, that is, for all measurable sets A we have $\mu_g(A) := \int_A g \, d\mu$.
Then, since $\mu_g \ll \mu$, we have $\mu_g \circ T^{-1} \ll \mu \circ T^{-1} \ll \mu$. Thus, via the Radon–Nikodým
Theorem, we can define the operator $\widehat{T}_\mu : M_+(\mathcal{B}) \to M_+(\mathcal{B})$ by

\[ \widehat{T}_\mu(g) := \frac{d(\mu_g \circ T^{-1})}{d\mu}. \]

If it is clear which measure μ is meant, we will simply write $\widehat{T} := \widehat{T}_\mu$. If $g \in L^1(\mu)$, then
we extend this definition linearly by setting $\widehat{T}(g) := \widehat{T}(g^+) - \widehat{T}(g^-)$. Defined in this way,
$\widehat{T} : L^1(\mu) \to L^1(\mu)$ is a positive, bounded, linear operator. In fact, for the operator norm
of $\widehat{T}$ we have that $\|\widehat{T}\| = 1$ (the proof of this fact is left to Exercise 2.6.11).

Definition 2.3.6. The operator $\widehat{T}_\mu : L^1(\mu) \to L^1(\mu)$ defined above is called the transfer
operator of T with respect to the measure μ.

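Let us make one further observation. For all $A \in \mathcal{B}$, we have that

\[
\begin{aligned}
\int_X 1_A \cdot \widehat{T}(g) \, d\mu &= \int_A \frac{d(\mu_g \circ T^{-1})}{d\mu} \, d\mu = \mu_g \circ T^{-1}(A) \\
&= \int_{T^{-1}(A)} g \, d\mu = \int_X (1_A \circ T) \cdot g \, d\mu.
\end{aligned}
\]

Then, an approximation argument shows that for all $f \in L^\infty(\mu)$, we equivalently have
that

\[ \int_X f \cdot \widehat{T}(g) \, d\mu = \int_X (f \circ T) \cdot g \, d\mu. \tag{2.4} \]

Inductively, it follows for all k ∈ N that $\int_X f \cdot \widehat{T}^k(g) \, d\mu = \int_X (f \circ T^k) \cdot g \, d\mu$. Furthermore,
the relation in (2.4) characterises $\widehat{T}(g)$. Indeed, suppose there exist $g_1$ and $g_2$ in $L^1(\mu)$
such that for all $f \in L^\infty(\mu)$ we have

\[ \int f g_1 \, d\mu = \int (f \circ T) g \, d\mu = \int f g_2 \, d\mu. \]

Then if we choose $f := \operatorname{sgn}(g_1 - g_2)$, it follows that

\[ \int |g_1 - g_2| \, d\mu = \int f (g_1 - g_2) \, d\mu = \int f g_1 \, d\mu - \int f g_2 \, d\mu = 0, \]

and thus $g_1 \sim g_2$.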

Remark 2.3.7. The relation in (2.4) shows that the transfer operator captures the
evolution of probability densities under the action of T : [0, 1] → [0, 1] in the following
sense. Suppose that $X_0$ denotes a [0, 1]-valued random variable with a distribution
absolutely continuous with respect to μ and density g. That is, $P(X_0 \in A) = \int_A g \, d\mu$, for
all $A \in \mathcal{B}$. Then, for all n ∈ N, the distribution of the random variable $X_n := T^n \circ X_0$ also
has a density with respect to μ, and that density coincides with $\widehat{T}^n g$.

Remark 2.3.8. For a σ-finite measure μ, it is a fact that $(L^1(\mu))^* \cong L^\infty(\mu)$. (Here the star
denotes the dual space of $L^1(\mu)$, where we recall that if X is a normed linear space, then
the dual space $X^*$ is defined to be the set of all continuous linear functionals f : X → R.)
Hence, the operator

\[ U_T : L^\infty(\mu) \to L^\infty(\mu), \quad f \mapsto f \circ T \]

is the adjoint operator of $\widehat{T}$ (Exercise 2.6.10). The operator $U_T$ is usually referred to as
the Koopman operator (see [Koo31]).

We now formulate a dual version of Proposition 2.2.9. For this fix some $f \in L^1(\mu)$ with
f > 0 and define the set $\widehat{C}_T$ and its complement $\widehat{D}_T$ by setting

\[ \widehat{C}_T := \left\{ \sum_{k \in \mathbb{N}} \widehat{T}^k f = \infty \right\} \quad \text{and} \quad \widehat{D}_T := \left\{ \sum_{k \in \mathbb{N}} \widehat{T}^k f < \infty \right\}. \]

This decomposition gives rise to a generalisation of the previously-defined Hopf
decomposition for measure-preserving systems (see Remark 2.3.10).

Proposition 2.3.9. Let (X, B, μ, T) be a non-singular system. Then:
(a) All wandering sets are subsets of $\widehat{D}_T$ mod μ.
(b) For all measurable subsets $A \subset \widehat{D}_T$ of the dissipative part with μ(A) > 0 there exists
a wandering set $W \in \mathcal{B}$ with μ(W) > 0 and W ⊂ A.

Proof. Towards part (a), let $W \in \mathcal{B}$ be a wandering set, that is, $\sum_{n=0}^{\infty} 1_W \circ T^n \le 1$, μ-a.e.
Then we have

\[ \int_W \sum_{k \in \mathbb{N}} \widehat{T}^k f \, d\mu = \int f \cdot \sum_{k \in \mathbb{N}} 1_W \circ T^k \, d\mu \le \int f \, d\mu < \infty. \]

That the integral is finite implies that the sum $\sum_{k \in \mathbb{N}} \widehat{T}^k f$ is finite μ almost everywhere
on W. This finishes the proof of part (a).
To prove (b), first fix a measurable subset

\[ A \subset \bigcup_{N \in \mathbb{N}} \left\{ \sum_{k \in \mathbb{N}} \widehat{T}^k f \le N \right\} = \widehat{D}_T \]

with positive measure. Then there exists a measurable subset $B \subset A \cap \left\{ \sum_{k \in \mathbb{N}} \widehat{T}^k f \le N \right\}$
for some N ∈ N with positive and finite measure.
Now, if we assume that μ(A ∩ W) = 0 for all wandering sets $W \in \mathcal{B}$, then Halmos’s
Recurrence Theorem (Theorem 2.2.6) would imply that $\sum_{n \in \mathbb{N}} 1_B \circ T^n = \infty$ almost everywhere on
B. But then we would have

\[ \infty > N \mu(B) \ge \int \sum_{k \in \mathbb{N}} \widehat{T}^k f \cdot 1_B \, d\mu = \int f \sum_{k \in \mathbb{N}} 1_B \circ T^k \, d\mu = \infty. \]

This contradiction finishes the proof.

Remark 2.3.10. The last proposition shows in particular that for a non-singular
dynamical system, the set $D_{\widehat{T}}$ is the measurable union of the wandering sets (cf.
Remark 2.2.10). Hence, this decomposition is
– independent of the chosen positive integrable function f and
– for a measure-preserving system coincides with the previously-defined Hopf
decomposition.

The following corollary allows us to characterise conservativity also in terms of the
transfer operator.

Corollary 2.3.11. Let (X, B , μ, T) be a non-singular dynamical system. Then the system
is conservative if and only if

\[ C_{\widehat{T}} = X \quad \text{mod } \mu. \]

Proof. The proof follows exactly along the lines of the proof of Corollary 2.2.11.

Example 2.3.12. Let us consider the extended tent map T : R → R given by

\[ T(x) := \begin{cases} 2x & \text{for } x \le 1/2, \\ -2x + 2 & \text{for } x > 1/2, \end{cases} \]

which defines a non-singular transformation with respect to the Lebesgue measure λ.
Then the conservative part of the Hopf decomposition of R is given by $C_{\widehat{T}} = [0, 1]$.
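
The claim of Example 2.3.12 can be made plausible numerically. The following Python sketch is an illustration only, not a proof; the helper names `tent`, `orbit` and `escapes` are hypothetical. It shows that orbits starting outside [0, 1] are pushed towards −∞ (for x < 0 one has T^n(x) = 2^n x, and every x > 1 is mapped to a negative number in one step), whereas orbits starting in [0, 1] never leave that interval.

```python
def tent(x):
    """Extended tent map on the real line, as in Example 2.3.12."""
    return 2 * x if x <= 0.5 else -2 * x + 2


def orbit(x, n):
    """Return the list [x, T(x), ..., T^n(x)]."""
    points = [x]
    for _ in range(n):
        x = tent(x)
        points.append(x)
    return points


def escapes(x, n=60):
    """True if the orbit of x is negative after n steps (it then stays negative)."""
    return orbit(x, n)[-1] < 0


if __name__ == "__main__":
    for start in (-0.1, 1.5, 0.3):
        print(start, "escapes" if escapes(start) else "stays in [0, 1]")
```

Sets such as [−2, −1) are wandering for this map, which is consistent with the dissipative part being the complement of [0, 1].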

As advertised at the beginning of this section, we are now in a position to find a
T-invariant measure with the help of the transfer operator.

Theorem 2.3.13. Let (X, B , μ, T) be a non-singular system. If g ∈ M+ (B) with $\widehat{T}_\mu(g) = g$,
then the measure $\mu_g$ (as defined in (2.3)) is T-invariant and $\widehat{T}_{\mu_g}(1) = 1$.

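Proof. Using (2.4), for all B ∈ B we have

\[ \mu_g \circ T^{-1}(B) = \int 1_B \circ T \cdot g \, d\mu = \int 1_B \cdot \widehat{T} g \, d\mu = \int 1_B \, g \, d\mu = \mu_g(B). \]

From this we deduce that

\[ \widehat{T}_{\mu_g}(1) = \frac{d(\mu_g \circ T^{-1})}{d\mu_g} = \frac{d\mu_g}{d\mu_g} = 1. \]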

The reason for naming $\widehat{T}$ the transfer operator should now be clear. The idea behind
this operator is that first of all it transfers the action of T on X to an action on L1 (μ)
and secondly, it transfers in this way the measure-theoretic problem of finding a
T-invariant measure to the functional-analytic problem of finding a fixed point for
the operator $\widehat{T}$. This is all very well, but without an explicit formula for $\widehat{T}$, we are still
not really any closer to actually writing down an invariant measure for our remaining
examples. In the following section, we will address this issue.

2.3.3 The Ruelle operator

Let us now specialize somewhat to the case that T is a continuous map of the circle R/Z
such that T admits a Markov partition {A i : i ≥ 1}, as in Definition 1.2.21. Let us further
assume that the map T consists of full branches, that is, that $T|_{A_i}(A_i) = (0, 1)$ for all
i ≥ 1. We are interested in finding invariant measures that are absolutely continuous
with respect to the Lebesgue measure λ, since these retain some physical meaning, so
instead of the transfer operator associated to some arbitrary σ-finite measure μ, we
shall consider the special case of the transfer operator with respect to λ.

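Remark 2.3.14. If μ ∼ λ and h := dμ/dλ denotes the a.e. positive density of μ with
respect to λ (see Exercise 2.6.13), then the operators $\widehat{T}_\mu$ and $\widehat{T}_\lambda$ are related as follows:

\[ \widehat{T}_\mu(f) = \frac{1}{h} \, \widehat{T}_\lambda(h \cdot f). \]

Indeed, for every A ∈ B, using the identity in (2.4) twice, first for the measure μ and
then for λ, we obtain that

\[
\begin{aligned}
\int_A \widehat{T}_\mu(f) \, d\mu &= \int_{[0,1]} 1_A \cdot \widehat{T}_\mu(f) \, d\mu = \int_{[0,1]} (1_A \circ T) \cdot f \, d\mu \\
&= \int_{[0,1]} (1_A \circ T) \cdot f h \, d\lambda = \int_{[0,1]} 1_A \cdot \widehat{T}_\lambda(h \cdot f) \, d\lambda \\
&= \int_A \frac{1}{h} \, \widehat{T}_\lambda(h \cdot f) \, d\mu.
\end{aligned}
\]

If g ∈ M+ (B) is such that $\widehat{T}_\lambda(g) = g$, then $d\lambda_g := g \, d\lambda$ defines a T-invariant measure
absolutely continuous with respect to λ and hence $\widehat{T}_{\lambda_g} 1 = 1$.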

Our aim here, as advertised at the end of the last section, is to define another, related,
operator which has a more concrete form. For this, we will need the following lemma.

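Lemma 2.3.15. Let φ : (a, b) → (0, 1) be continuously differentiable such that either
φ′ > 0 or φ′ < 0. Then we have

\[ \frac{d(\lambda \circ \varphi^{-1})}{d\lambda} = \big| (\varphi^{-1})' \big|. \]

Proof. We only consider the case φ′ > 0. Let $g := 1_J$ for some subinterval J of (0, 1).
Then by the substitution rule of integration we have

\[
\begin{aligned}
\lambda \circ \varphi^{-1}(J) &= \int g \circ \varphi \, d\lambda = \int g \circ \varphi \cdot \varphi' / \varphi' \, d\lambda = \int g \circ \varphi \cdot (\varphi^{-1})' \circ \varphi \cdot \varphi' \, d\lambda \\
&= \int g \cdot (\varphi^{-1})' \, d\lambda = \int_J (\varphi^{-1})' \, d\lambda.
\end{aligned}
\]

Since the subintervals of (0, 1) generate B we have verified the claim of the lemma.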
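
As a sanity check of Lemma 2.3.15, the following Python sketch compares both sides of the identity for one concrete choice, which is an assumption made purely for illustration: φ(x) = x² on (0, 1), so that φ⁻¹(y) = √y, and J = (0.25, 0.81). The function names are hypothetical and the integral is approximated by a simple midpoint rule.

```python
import math


def pushforward_length(phi_inverse, c, d):
    """lambda(phi^{-1}(J)) for J = (c, d), assuming phi is increasing."""
    return phi_inverse(d) - phi_inverse(c)


def integral_midpoint(func, c, d, steps=100_000):
    """Midpoint-rule approximation of the integral of func over (c, d)."""
    width = (d - c) / steps
    return sum(func(c + (i + 0.5) * width) for i in range(steps)) * width


def phi_inverse(y):
    """Inverse of the illustrative map phi(x) = x^2 on (0, 1)."""
    return math.sqrt(y)


def phi_inverse_derivative(y):
    """Derivative of phi^{-1}, that is, 1 / (2 sqrt(y))."""
    return 1 / (2 * math.sqrt(y))


# Left-hand side: Lebesgue measure of the preimage of J under phi.
lhs = pushforward_length(phi_inverse, 0.25, 0.81)
# Right-hand side: integral of |(phi^{-1})'| over J.
rhs = integral_midpoint(phi_inverse_derivative, 0.25, 0.81)
```

Both quantities agree up to the quadrature error, as the lemma predicts.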

For an interval map T, as described above, we have that the maps $T|_{A_i}$ are invertible
with measurable inverse branches $T_i := (T|_{A_i})^{-1}$. Further, suppose that $\lambda \circ T_i \ll \lambda$.
Hence, we can define the Jacobian for each inverse branch i ≥ 1 to be

\[ J_i := \frac{d(\lambda \circ T_i)}{d\lambda}. \]

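Proposition 2.3.16. For each g ∈ M+ (B) or g ∈ L1 (μ), we have

\[ \widehat{T}_\lambda(g) = \sum_{i \ge 1} g \circ T_i \cdot J_i. \]

Proof. Let g be as assumed and let A ∈ B be arbitrary. Then,

\[
\begin{aligned}
\lambda_g \circ T^{-1}(A) &= \lambda_g \Big( \bigcup_{i \ge 1} T_i(A) \Big) = \sum_{i \ge 1} \lambda_g(T_i(A)) \\
&= \sum_{i \ge 1} \int 1_A \circ T_i^{-1} \cdot g \circ T_i \circ T_i^{-1} \, d\lambda = \sum_{i \ge 1} \int 1_A \cdot g \circ T_i \, d(\lambda \circ T_i) \\
&= \int 1_A \sum_{i \ge 1} g \circ T_i \cdot J_i \, d\lambda.
\end{aligned}
\]

Since the density is unique this proves the proposition.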

Definition 2.3.17. Suppose that the dynamical system ([0, 1], T) is such that all the
inverse branches $T_i := (T|_{A_i})^{-1}$ are additionally continuously differentiable. Then the
Ruelle operator $P_T$ is defined for any function f : [0, 1] → R such that the right-hand
side makes sense for x ∈ (0, 1) to be

\[ P_T(f)(x) := \sum_{i \ge 1} (f \circ T_i)(x) \cdot \big| T_i'(x) \big|. \tag{2.5} \]

Proposition 2.3.18. We have that $P_T$ is well-defined as an operator on L1 (λ). When we
use the same notation for $P_T$ acting on functions and acting on equivalence classes, we
also have that

\[ P_T = \widehat{T}_\lambda. \]

Proof. We have that if f , g are two integrable functions in the same L1 (λ) equivalence
class, then also

\[ P_T(f) = P_T(g), \quad \lambda\text{-almost everywhere.} \]

To see this, it suffices to note that under strictly monotone continuously differentiable
functions, Lebesgue null-sets are mapped to null-sets.
Since by Lemma 2.3.15 we have $J_i = |T_i'|$, the second statement follows from
Proposition 2.3.16.

2.3.4 Invariant measures for F , F α , G and L α

The above discussion allows us to finally get our hands on Lebesgue-absolutely
continuous invariant measures for F and F α ; we can also revisit the invariant measures
for G and L α in terms of the relevant transfer operators. In light of Proposition 2.3.18,
we have that if we find a fixed point of the Ruelle operator, this gives us a fixed point
of the transfer operator.

Proposition 2.3.19. For the Farey system ([0, 1], B , F), the λ-absolutely continuous
measure νF defined by the density h F , given by h F (x) := 1/x, is an F-invariant measure.
Moreover, the measure νF is infinite and σ-finite.

Proof. First observe that according to (2.5), the Ruelle operator $P_F$ for the map F acts
in the following way: For f : [0, 1] → R measurable,

\[ P_F(f) = |F_0'| \cdot (f \circ F_0) + |F_1'| \cdot (f \circ F_1), \]

where F0 (x) := x/(1 + x) and F1 (x) := 1/(1 + x) denote the inverse branches of the
Farey map (as defined in Section 1.3.1). In order to show that the density h F defines
an F-invariant measure absolutely continuous with respect to λ, it suffices to show
that P F (h F ) = h F . Indeed, we have

\[ F_0'(x) = \frac{1}{(1+x)^2} \quad \text{and} \quad F_1'(x) = \frac{-1}{(1+x)^2}. \]

The following calculation then finishes the proof:

\[ P_F(h_F)(x) = \frac{1}{(1+x)^2} \left( h_F\Big( \frac{x}{1+x} \Big) + h_F\Big( \frac{1}{1+x} \Big) \right) = \frac{(1+x) + x(1+x)}{x(1+x)^2} = \frac{1}{x} = h_F(x). \]
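
The fixed point equation P_F(h_F) = h_F can also be checked exactly at rational points. The sketch below is a minimal illustration, assuming only the inverse branches F_0(x) = x/(1 + x) and F_1(x) = 1/(1 + x) given in the proof; the function names `h_F` and `ruelle_farey` are hypothetical.

```python
from fractions import Fraction


def h_F(x):
    """Density h_F(x) = 1/x from Proposition 2.3.19."""
    return 1 / x


def ruelle_farey(f, x):
    """Evaluate P_F(f)(x) = |F_0'(x)| f(F_0(x)) + |F_1'(x)| f(F_1(x))."""
    weight = 1 / (1 + x) ** 2  # |F_0'(x)| = |F_1'(x)| = 1/(1+x)^2
    return weight * (f(x / (1 + x)) + f(1 / (1 + x)))


if __name__ == "__main__":
    x = Fraction(3, 7)
    # Exact rational arithmetic: both values coincide.
    print(ruelle_farey(h_F, x), h_F(x))
```

By contrast, the constant function 1 is not fixed, since P_F(1)(x) = 2/(1 + x)², which reflects the fact that Lebesgue measure itself is not F-invariant.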

Before moving on to consider invariant measures for the α-Farey systems, let us
make an interesting observation about fractions in the Stern–Brocot tree (which was
introduced directly after Definition 1.3.4). For each reduced fraction v/w ∈ (0, 1) and
each n ∈ N0 , we have that

\[ \sum_{p/q \in F^{-n}(v/w)} \frac{1}{pq} = \frac{1}{vw}. \tag{2.6} \]

In particular, for the special case v/w = 1/2, we have that

\[ \sum_{p/q \in F^{-n}(1/2)} \frac{2}{pq} = 1. \]

The identity in this special case was first noted by the Canadian music theorist Pierre
Lamothe (see reference in [Gut11]), whereas the general case originates in [KS12a]. To
first prove (2.6) in an elementary way, suppose that the statement is true for all reduced
fractions v/w ∈ (0, 1) and for some n ∈ N0 . Then,

\[
\begin{aligned}
\sum_{p/q \in F^{-(n+1)}(v/w)} \frac{1}{pq} &= \sum_{p/q \in F^{-n}(F^{-1}(v/w))} \frac{1}{pq} \\
&= \sum_{p/q \in F^{-n}\left( \frac{v}{v+w} \right)} \frac{1}{pq} + \sum_{p/q \in F^{-n}\left( \frac{w}{v+w} \right)} \frac{1}{pq} \\
&= \frac{1}{v(v+w)} + \frac{1}{w(v+w)} = \frac{1}{vw}.
\end{aligned}
\]

Alternatively, the equality in (2.6) can be deduced as a special case from the fixed point
equation for the Ruelle operator $P_F$ for the map F. As in Proposition 2.3.19, let h F (x) :=
1/x denote the density which satisfies P F (h F ) = h F . For all x ∈ (0, 1) and all n ∈ N0 , we
then have that

\[ \sum_{y \in F^{-n}(x)} \big| (F^n)'(y) \big|^{-1} h_F(y) = h_F(x). \tag{2.7} \]

(You are asked to prove the above statement in Exercise 2.6.12.) A straightforward
calculation, which we leave to Exercise 2.6.14, shows that

\[ \left| (F^n)'\Big( \frac{p}{q} \Big) \right| = \frac{q^2}{w^2}, \quad \text{for all } \frac{p}{q} \in F^{-n}\Big( \frac{v}{w} \Big). \]

Inserting this into (2.7), it follows that

\[
\begin{aligned}
\sum_{p/q \in F^{-n}(v/w)} \frac{w^2}{pq} &= \sum_{p/q \in F^{-n}(v/w)} \frac{w^2}{q^2} \cdot \frac{q}{p} \\
&= \sum_{p/q \in F^{-n}(v/w)} \left| (F^n)'\Big( \frac{p}{q} \Big) \right|^{-1} h_F\Big( \frac{p}{q} \Big) = h_F\Big( \frac{v}{w} \Big) = \frac{w}{v},
\end{aligned}
\]

and hence the statement in (2.6) follows.
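
The identity (2.6) lends itself to a direct computational check. The following Python sketch is illustrative, with hypothetical function names; it builds the set F^{-n}(v/w) by repeatedly applying the two inverse branches F_0 and F_1 and sums 1/(pq) over the resulting reduced fractions, using exact rational arithmetic.

```python
from fractions import Fraction


def farey_preimages(x):
    """The two preimages F_0(x) = x/(1+x) and F_1(x) = 1/(1+x) of x under F."""
    return [x / (1 + x), 1 / (1 + x)]


def preimages_of_order(x, n):
    """All points of F^{-n}(x), obtained by applying the inverse branches n times."""
    level = [x]
    for _ in range(n):
        level = [y for z in level for y in farey_preimages(z)]
    return level


def reciprocal_product_sum(x, n):
    """Sum of 1/(p*q) over the reduced fractions p/q in F^{-n}(x)."""
    # Fraction keeps numerator and denominator in lowest terms automatically.
    return sum(Fraction(1, y.numerator * y.denominator)
               for y in preimages_of_order(x, n))


if __name__ == "__main__":
    for n in range(6):
        print(n, reciprocal_product_sum(Fraction(2, 5), n))
```

For v/w = 2/5 every level gives the same value 1/10, and for v/w = 1/2 every level gives 1/2, in agreement with the special case stated above.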

Proposition 2.3.20. For each α-Farey system ([0, 1], B , F α ), the λ-absolutely continuous
measure να , given by the density h F α , which is defined, up to multiplication by a
constant, by

\[ h_{F_\alpha} := \frac{d\nu_\alpha}{d\lambda} = \sum_{n=1}^{\infty} \frac{t_n}{a_n} \cdot 1_{A_n}, \]

is an F α -invariant measure. Moreover, να is σ-finite, and we have that να is an infinite
measure if and only if α is of infinite type.

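Proof. We will prove this as in the case of the Farey map by considering the Ruelle
operator $P_{F_\alpha}$ for the map F α , which acts on measurable functions f : [0, 1] → R by

\[ P_{F_\alpha}(f) = |F_{\alpha,0}'| \cdot (f \circ F_{\alpha,0}) + |F_{\alpha,1}'| \cdot (f \circ F_{\alpha,1}). \]

As in Proposition 2.3.19, it is sufficient to show that h F α is an eigenfunction of P F α ,
that is,

\[ P_{F_\alpha}(h_{F_\alpha}) = h_{F_\alpha}. \]

To see this, note that for the inverse branches F α,1 and F α,0 and the density h F α , an
easy computation shows that we have

\[ h_{F_\alpha} \circ F_{\alpha,1} = \frac{t_1}{a_1} \cdot 1_{[0,1]} \quad \text{and} \quad h_{F_\alpha} \circ F_{\alpha,0} = \sum_{n=1}^{\infty} \frac{t_{n+1}}{a_{n+1}} \cdot 1_{A_n}. \]

Moreover, one immediately verifies that

\[ |F_{\alpha,1}'| = a_1 \cdot 1_{[0,1]} \quad \text{and} \quad |F_{\alpha,0}'| = \sum_{n=1}^{\infty} \frac{a_{n+1}}{a_n} \cdot 1_{A_n}. \]

These two observations together imply that

\[
\begin{aligned}
P_{F_\alpha}(h_{F_\alpha}) &= |F_{\alpha,0}'| \cdot (h_{F_\alpha} \circ F_{\alpha,0}) + |F_{\alpha,1}'| \cdot (h_{F_\alpha} \circ F_{\alpha,1}) \\
&= \sum_{n=1}^{\infty} \Big( \frac{a_{n+1}}{a_n} \cdot \frac{t_{n+1}}{a_{n+1}} \Big) \cdot 1_{A_n} + t_1 \cdot 1_{[0,1]} \\
&= \sum_{n=1}^{\infty} \Big( \frac{t_{n+1}}{a_n} + 1 \Big) \cdot 1_{A_n} = \sum_{n=1}^{\infty} \frac{t_n}{a_n} \cdot 1_{A_n} = h_{F_\alpha}.
\end{aligned}
\]

Regarding the second statement of the proposition, the σ-finiteness of the measure να
follows from the fact that να (A n ) < ∞ for all n ∈ N, and a simple calculation shows that
for each n ∈ N we have that

\[ \nu_\alpha\Big( \bigcup_{k=1}^{n} A_k \Big) = \sum_{k=1}^{n} \nu_\alpha(A_k) = \sum_{k=1}^{n} \int_{A_k} h_{F_\alpha} \, d\lambda = \sum_{k=1}^{n} \frac{t_k}{a_k} \cdot a_k = \sum_{k=1}^{n} t_k. \]

Recalling that α is of infinite type provided that $\sum_{k=1}^{\infty} t_k$ diverges, the proof is finished.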
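
The two facts used in this proof can be illustrated for a concrete partition. The sketch below rests on assumptions not restated in this section: that a_n denotes λ(A_n) and t_n denotes the tail sum a_n + a_{n+1} + ..., so that t_1 = 1. The harmonic choice a_n = 1/(n(n + 1)) and all function names are chosen purely for illustration.

```python
from fractions import Fraction


def tails(a):
    """Tail sums t_n = 1 - (a_1 + ... + a_{n-1}) for the listed lengths a_n."""
    t, remaining = [], Fraction(1)
    for a_n in a:
        t.append(remaining)
        remaining -= a_n
    return t


def density_identity_holds(a):
    """Check t_{n+1}/a_n + 1 == t_n/a_n for all consecutive listed indices."""
    t = tails(a)
    return all(t[n + 1] / a[n] + 1 == t[n] / a[n] for n in range(len(a) - 1))


def measure_of_first_atoms(a, n):
    """nu_alpha(A_1 u ... u A_n) = t_1 + ... + t_n."""
    return sum(tails(a)[:n])


# Illustrative harmonic partition: a_n = 1/(n(n+1)), hence t_n = 1/n.
harmonic = [Fraction(1, n * (n + 1)) for n in range(1, 51)]
# Illustrative dyadic partition: a_n = 2^{-n}, hence t_n = 2^{-(n-1)}.
dyadic = [Fraction(1, 2 ** n) for n in range(1, 51)]
```

For the harmonic partition the partial sums of t_k are the harmonic numbers, which diverge (infinite type), while for the dyadic partition they stay below 2 (finite type).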

To complete the picture for our main examples, we recall the following facts for the
Gauss and the α-Lüroth system.

Proposition 2.3.21.
(a) For the Gauss system ([0, 1], B , G), the λ-absolutely continuous measure m G , given
by the density h G , which is defined by

\[ h_G(x) := \frac{dm_G}{d\lambda}(x) = \frac{1}{\log 2} \cdot \frac{1}{1+x}, \]

is a G-invariant probability measure.

(b) For an α-Lüroth system ([0, 1], B , L α ), the λ-absolutely continuous measure m α ,
given by the density h L α , which is defined by

\[ h_{L_\alpha}(x) := \frac{dm_\alpha}{d\lambda}(x) = 1_{[0,1]}(x), \]

is an L α -invariant probability measure. In other words, m α coincides with λ.

Proof. The statements in (a) and (b) have already been obtained in Proposition 2.1.10
and Proposition 2.1.8, respectively. However, for the Gauss map G this statement can
now be derived alternatively by considering the Ruelle operator P G for G, which is
defined, for measurable functions f : [0, 1] → R and x ∈ [0, 1], by

\[ P_G(f)(x) := \sum_{n=1}^{\infty} |G_n'(x)| \, f(G_n(x)), \]

with G n referring to the n-th inverse branch of G (see also Remark 2.1.11). One then
immediately verifies that P G (h G ) = h G , from which the assertion follows. The proof for
L α , using the Ruelle operator P L α for L α , is left as an exercise (see Exercise 2.6.4).
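
The verification of P_G(h_G) = h_G can be sketched numerically. The code below assumes the standard form G_n(x) = 1/(n + x) of the inverse branches of the Gauss map (this formula is not restated in this section), so that |G_n'(x)| = 1/(n + x)². With this choice the series telescopes: the N-th partial sum equals h_G(x) minus 1/(log 2 · (N + 1 + x)). The function names are hypothetical.

```python
import math


def h_G(x):
    """Gauss density h_G(x) = 1 / (log 2 * (1 + x))."""
    return 1 / (math.log(2) * (1 + x))


def ruelle_gauss_partial(f, x, terms):
    """Partial sum of P_G(f)(x) over the branches G_n(x) = 1/(n+x), n = 1..terms."""
    return sum(f(1 / (n + x)) / (n + x) ** 2 for n in range(1, terms + 1))


def telescoped_value(x, terms):
    """Closed form of the partial sum for f = h_G, obtained by telescoping."""
    return h_G(x) - 1 / (math.log(2) * (terms + 1 + x))


if __name__ == "__main__":
    x = 0.37
    print(ruelle_gauss_partial(h_G, x, 100_000), h_G(x))
```

The remainder after N terms is of order 1/N, so the partial sums converge to h_G(x), although rather slowly.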

Remark 2.3.22.
1. It is clear that the measures νF , να , m G and m α are all absolutely continuous with
respect to λ; indeed, they are defined that way. The converse is also true, since
their densities are strictly positive. Hence, all these measures are equivalent to λ.
2. The reader may have encountered the Bogolyubov–Krylov Theorem, which states
that for an arbitrary continuous map T : X → X on a compact metric space X there
always exists a T-invariant probability measure. However, this theorem does not
say anything about whether the measure is absolutely continuous with respect to
Lebesgue measure. There may be several invariant measures, but we will shortly
see in Section 2.4.5 that in our leading examples the λ-absolutely continuous ones
given by the densities h F and h F α are unique for their respective systems.
3. We can use Maharam’s Recurrence Theorem (see Theorem 2.2.14) to show that the
Farey system ([0, 1], F, νF ) and any α-Farey system ([0, 1], F α , να ) are conservative.
Indeed, this can be seen immediately on observing that both $\bigcup_{n=0}^{\infty} F^{-n}((1/2, 1])$
and $\bigcup_{n=0}^{\infty} F_\alpha^{-n}(A_1)$ are equal to (0, 1].

2.3.5 Invariant measures via the jump transformation

Before moving on from the subject of invariant measures, let us discuss another way of
obtaining an invariant measure for an infinite measure system, this time by way of the
jump transformation. Recall that in Definitions 1.3.2 and 1.4.9, we introduced specific
jump transformations for the maps F and F α , respectively. We also proved that the
Gauss map G is the jump transformation of the Farey map F with respect to the set
[1/2, 1] and the α-Lüroth map L α is the jump transformation of the α-Farey map F α
with respect to the set A1 . Let us now give a more general definition.

Definition 2.3.23. Let (X, B , μ, T) be a conservative measure-preserving system. Let E
be a measurable set such that T(E) = X and $\bigcup_{k \ge 0} T^{-k} E = X$. Then, the first passage time
p : X → N is defined to be

\[ p(x) := 1 + \inf \{ n \ge 0 : T^n(x) \in E \} \]

and the jump transformation $T_E^* : X \to X$ of T with respect to E is defined by setting

\[ T_E^*(x) := T^{p(x)}(x). \]

To shorten the notation, let us write {p = n} for the set {x ∈ X : p(x) = n}, so that $T_E^* = T^n$
on {p = n}. We also write {p > n} for the set {x ∈ X : p(x) > n}. Observe that {p > 0} = X
and $\{p > n\} = \bigcup_{k=n+1}^{\infty} \{p = k\}$ for n ≥ 1.

Example 2.3.24. In the case of the Farey map, setting E := [1/2, 1] (so the jump
transformation F *E coincides with the Gauss map, as was already shown earlier), yields
that the sets {p = n} are given by the first level Gauss cylinder sets, that is, for each
n ∈ N, we have that {p = n} = C(n). Also, in this case we have that {p > 0} = [0, 1] and
{p > n } is equal to the Farey cylinder with code consisting of n zeros, for n ≥ 1.
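
Definition 2.3.23 and Example 2.3.24 can be traced on concrete rational points. The Python sketch below assumes that the Farey map is F(x) = x/(1 − x) on [0, 1/2] and F(x) = (1 − x)/x on (1/2, 1], which is the map whose inverse branches are the F_0 and F_1 used in Proposition 2.3.19, and that G(x) = 1/x − ⌊1/x⌋. All function names are hypothetical, and the points 1/k are avoided because there the two maps differ by the usual boundary convention.

```python
from fractions import Fraction
from math import floor

HALF = Fraction(1, 2)


def farey(x):
    """Farey map on [0, 1] (assumed form, see the lead-in)."""
    return x / (1 - x) if x <= HALF else (1 - x) / x


def first_passage_time(x):
    """p(x) = 1 + inf{n >= 0 : F^n(x) in E} for E = [1/2, 1]; requires x > 0."""
    n = 0
    while not (HALF <= x <= 1):
        x = farey(x)
        n += 1
    return n + 1


def jump(x):
    """Jump transformation F_E^*(x) = F^{p(x)}(x)."""
    for _ in range(first_passage_time(x)):
        x = farey(x)
    return x


def gauss(x):
    """Gauss map G(x) = 1/x - floor(1/x) for x in (0, 1]."""
    return 1 / x - floor(1 / x)


if __name__ == "__main__":
    x = Fraction(5, 17)
    print(first_passage_time(x), jump(x), gauss(x))
```

For x = 5/17 the orbit is 5/17, 5/12, 5/7, 2/5, so the first passage time is 3, which is the first continued fraction digit of x, and the jump transformation returns 2/5 = G(5/17).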

One of the basic ideas behind the concept of the jump transformation is to find a set
E for a given map T such that the jump transformation T E* is easier to understand
than the original map T. The hope is to find a jump transformation which turns out
to be a map that has already been studied earlier. The following lemma then yields
information about T. (Throughout this section, it should be helpful to keep the Gauss
map and the Farey map in mind, along with their invariant measures.)

Lemma 2.3.25. With the notation above, assume that the map $T_E^* : X \to X$ preserves
a finite measure ν . We then have that the measure μ, defined for any measurable set
B ∈ B by

\[ \mu(B) := \sum_{n \ge 0} \nu\big( T^{-n}(B) \cap \{p > n\} \big), \]

is T-invariant.

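Proof. Noting that {p > n} = {p > n + 1} ∪ {p = n + 1}, we have that

\[
\begin{aligned}
\mu(T^{-1}(B)) &= \sum_{n \ge 0} \nu\big( T^{-n}(T^{-1}(B)) \cap \{p > n\} \big) \\
&= \sum_{n \ge 0} \nu\big( T^{-n}(T^{-1}(B)) \cap \{p > n+1\} \big) + \sum_{n \ge 0} \nu\big( T^{-n}(T^{-1}(B)) \cap \{p = n+1\} \big) \\
&= \sum_{n \ge 1} \nu\big( T^{-n}(B) \cap \{p > n\} \big) + \sum_{n \ge 1} \nu\big( (T_E^*)^{-1}(B) \cap \{p = n\} \big) \\
&= \sum_{n \ge 0} \nu\big( T^{-n}(B) \cap \{p > n\} \big) = \mu(B).
\end{aligned}
\]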

As an example, observe that if in the above lemma we put B = E, then we obtain that
μ(E) = ν (X). Also, let us consider the situation for the Farey and Gauss systems. We
already know that the Gauss map G preserves the probability measure m G . So, if we
set E := [1/2, 1], then Lemma 2.3.25 tells us that the Farey map F preserves the measure
μ, where μ is given for arbitrary measurable B ∈ B by

\[ \mu(B) := m_G(B) + \sum_{n \ge 1} m_G\big( F^{-n}(B) \cap \widehat{C}(\underbrace{0, \ldots, 0}_{n}) \big). \]

Here, $\widehat{C}(\underbrace{0, \ldots, 0}_{n})$ denotes the level n Farey cylinder set with code consisting of n zeros,
for each n ≥ 1. To get an idea of what this measure actually looks like, let us calculate
μ([1/(k + 1), 1/k]), for some k ≥ 2 (note that for k = 1, this is precisely μ(E) and we
already know that μ(E) is equal to m G (X)). Observe first that $F^{-n}(B) \cap \widehat{C}(\underbrace{0, \ldots, 0}_{n})$ is
equal to $F_0^n(B)$, for any measurable set B. Therefore, we have that

\[ \mu(B) = \sum_{n \ge 0} m_G(F_0^n(B)). \]

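Recalling that F0 (x) = x/(1 + x), one immediately verifies that $F_0^n(x) = x/(1 + nx)$. Thus,

\[
\begin{aligned}
\mu([1/(k+1), 1/k]) &= \sum_{n \ge 0} m_G\big( F_0^n([1/(k+1), 1/k]) \big) \\
&= \sum_{n \ge 0} \frac{1}{\log 2} \int_{1/(k+n+1)}^{1/(k+n)} \frac{1}{1+x} \, d\lambda(x) \\
&= \frac{1}{\log 2} \sum_{n \ge 0} \left( \log\Big( 1 + \frac{1}{k+n} \Big) - \log\Big( 1 + \frac{1}{k+n+1} \Big) \right) \\
&= \frac{1}{\log 2} \log \prod_{n \ge 0} \left( \frac{k+n+1}{k+n} \cdot \frac{k+n+1}{k+n+2} \right) \\
&= \frac{\log\big( (k+1)/k \big)}{\log 2}.
\end{aligned}
\]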

Recalling the measure νF obtained via the Ruelle operator in Proposition 2.3.19, we
deduce that for the sets [1/(k + 1), 1/k], for k ≥ 1,

\[ \frac{1}{\log 2} \, \nu_F([1/(k+1), 1/k]) = \mu([1/(k+1), 1/k]). \]

It will turn out later that these two measures really are equal up to multiplication by a
constant (see Theorem 2.4.35).
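
The telescoping computation above can be reproduced numerically. The following sketch uses hypothetical function names and truncates the series after finitely many terms; it compares the partial sums of m_G(F_0^n([1/(k + 1), 1/k])) with ν_F([1/(k + 1), 1/k]) / log 2, where ν_F has density 1/x.

```python
import math


def gauss_measure(a, b):
    """m_G([a, b]) = (log(1 + b) - log(1 + a)) / log 2 for 0 <= a <= b <= 1."""
    return (math.log1p(b) - math.log1p(a)) / math.log(2)


def mu_partial(k, terms):
    """Sum over n < terms of m_G(F_0^n([1/(k+1), 1/k])) = m_G([1/(k+n+1), 1/(k+n)])."""
    return sum(gauss_measure(1 / (k + n + 1), 1 / (k + n)) for n in range(terms))


def nu_F(k):
    """nu_F([1/(k+1), 1/k]) = integral of 1/x over the interval = log((k+1)/k)."""
    return math.log((k + 1) / k)


if __name__ == "__main__":
    for k in (1, 2, 3, 10):
        print(k, mu_partial(k, 100_000), nu_F(k) / math.log(2))
```

The truncation error after N terms is log(1 + 1/(k + N)) / log 2, so the two columns agree to roughly four decimal places for N = 100000.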

2.4 Ergodicity and exactness

Ergodicity is the natural way to describe indecomposability of transformations, in the
sense that the system is said to be ergodic if and only if it is not possible to decompose
it into two invariant subsystems supported on sets of positive measure.

Definition 2.4.1. The dynamical system (X, B , μ, T) is said to be ergodic provided that
whenever A ∈ B is such that T −1 (A) = A we have that either μ(A) = 0 or μ(X \ A) = 0.
In other words, an ergodic transformation has only trivial invariant subsets. We will
often simply say that the map T is ergodic with respect to μ or that μ is an ergodic
measure for T. If the system (X, B , μ, T) is measure-preserving and T is ergodic, we
will call the system an ergodic measure-preserving system.

Remark 2.4.2. Note that it is often useful to talk about T-invariant ergodic measures,
and in many books ergodicity is only defined for T-invariant measures. However, this
is not necessary. It can also be of interest to talk about ergodic measures that are not
T-invariant.

Let us begin by considering the existence of sweep-out sets (see Definition 2.2.12) for
conservative, ergodic systems.

Lemma 2.4.3. If (X, B , μ, T) is a conservative and ergodic non-singular system, then
every measurable set of positive and finite measure is a sweep-out set for T.

Proof. Suppose that E ∈ B is such that μ(E) > 0. Further, let E′ := E \ W, where W :=
{x ∈ E : T n (x) ∉ E for all n ≥ 1}. Then, by conservativity (see Halmos’s Recurrence
Theorem 2.2.6), we have that μ(W) = 0 and thus, μ(E′) > 0. Now, noting that
$x \in \bigcup_{n=0}^{\infty} T^{-n}(E')$ if and only if $x \in \bigcup_{n=1}^{\infty} T^{-n}(E')$, we have that

\[ T^{-1}\Big( \bigcup_{n=0}^{\infty} T^{-n}(E') \Big) = \bigcup_{n=0}^{\infty} T^{-n}(E'), \]

and thus, since $\mu\big( \bigcup_{n=0}^{\infty} T^{-n}(E') \big) > 0$, the ergodicity of T implies that

\[ \bigcup_{n=0}^{\infty} T^{-n}(E) = \bigcup_{n=0}^{\infty} T^{-n}(E') = X \quad \text{mod } \mu. \]

(Recall that we use the notation A = B mod μ to mean that the two sets A and B are
equal up to a set of μ-measure zero.)

Corollary 2.4.4. If (X, B , μ, T) is a non-singular system, then T is conservative and
ergodic if and only if for every measurable function f : X → [0, ∞) with $\int f \, d\mu > 0$ we
have that

\[ \sum_{k=0}^{\infty} f \circ T^k = \infty, \quad \mu\text{-a.e. on } X. \]

Proof. Suppose first that the condition stated for measurable functions holds. Then
let A ∈ B be either a wandering set or a T-invariant proper subset of X and suppose
that μ(A) > 0. It follows that

\[ \Big\{ \sum_{k=0}^{\infty} 1_A \circ T^k = \infty \Big\} \neq X \quad \text{mod } \mu, \]

contradicting our assumption for $f = 1_A$. Hence, such a set A does not exist and T is
conservative and ergodic.
The reverse implication follows from the fact that for a non-negative measurable
function f with $\int f \, d\mu > 0$ there must exist δ > 0 with μ({f > δ}) > 0. Hence, $\delta 1_{\{f > \delta\}} \le f$
and by Lemma 2.4.3 and Remark 2.2.13 we find μ-a.e. that

\[ \sum_{k=0}^{\infty} f \circ T^k \ge \delta \sum_{k=0}^{\infty} 1_{\{f > \delta\}} \circ T^k = \infty. \]

Corollary 2.4.5. If (X, B , μ, T) is a non-singular system, then it is also conservative and
ergodic if and only if for every measurable function $f \in L^1_+(\mu)$ with $\int f \, d\mu > 0$ we have
that

\[ \sum_{k=0}^{\infty} \widehat{T}^k f = \infty, \quad \mu\text{-a.e. on } X. \]

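Proof. Suppose first that T is conservative and ergodic. Since by Lemma 2.4.3 any A ∈ B
with μ(A) > 0 is a sweep-out set, we have that

\[ \int \sum_{k=0}^{\infty} \widehat{T}^k f \cdot 1_A \, d\mu = \int f \cdot \sum_{k=0}^{\infty} 1_A \circ T^k \, d\mu = \infty. \]

Therefore, it follows that $\sum_{k=0}^{\infty} \widehat{T}^k f = \infty$ μ-a.e. on X.
For the reverse implication fix a wandering set W ∈ B with μ(W) > 0 and $f \in L^1_+(\mu)$
with $\int f \, d\mu > 0$. Then we obtain the contradiction

\[ \infty = \int \sum_{k=0}^{\infty} \widehat{T}^k f \cdot 1_W \, d\mu = \int f \cdot \sum_{k=0}^{\infty} 1_W \circ T^k \, d\mu \le \int f \, d\mu < \infty. \]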

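If A is T-invariant with μ(A) > 0 and B a measurable subset of X \ A with 0 < μ(B) < ∞
then similarly we find the contradiction

\[ \infty = \int \sum_{k=0}^{\infty} \widehat{T}^k 1_B \cdot 1_A \, d\mu = \int 1_B \cdot \sum_{k=0}^{\infty} 1_A \circ T^k \, d\mu = \int \sum_{k=0}^{\infty} 1_B \cdot 1_A \, d\mu = 0. \]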

This shows that T is conservative and ergodic.

Proposition 2.4.6. For a non-singular dynamical system (X, B , μ, T), the following are
equivalent:
(a) T is ergodic with respect to μ.
(b) For B ∈ B, if $B = T^{-1}B$ mod μ (that is, $\mu(B \,\triangle\, T^{-1}(B)) = 0$), then either μ(B) = 0 or
μ(X \ B) = 0.
(c) For f : X → R measurable, if f ◦ T = f μ-a.e., then f is μ-a.e. equal to a constant.

Proof. First we prove that (a) implies (b). Suppose that T is ergodic and let B be a
μ-almost-invariant measurable set, that is, $\mu(B \,\triangle\, T^{-1}(B)) = 0$. We aim to construct a
T-invariant set A from B such that A has the same μ-measure as B. So, define

\[ A := \bigcap_{n=0}^{\infty} \bigcup_{k=n}^{\infty} T^{-k}(B). \]

It then follows, for each n ≥ 1, that

\[ B \,\triangle\, \bigcup_{k=n}^{\infty} T^{-k}(B) \subset \bigcup_{k=n}^{\infty} \big( B \,\triangle\, T^{-k}(B) \big). \]

Moreover, since

\[ B \,\triangle\, T^{-k}(B) \subset \bigcup_{i=0}^{k-1} \big( T^{-i}(B) \,\triangle\, T^{-(i+1)}(B) \big) = \bigcup_{i=0}^{k-1} T^{-i}\big( B \,\triangle\, T^{-1}(B) \big) \]

and since the system is non-singular, we also have that $\mu(B \,\triangle\, T^{-k}(B)) = 0$. Let
$B_n := \bigcup_{k=n}^{\infty} T^{-k}(B)$ and notice that the sequence $(B_n)_{n \ge 1}$ is a decreasing sequence of sets
with the property that $\mu(B_n \,\triangle\, B) = 0$, for each n ∈ N, and $\bigcap_{n \in \mathbb{N}} B_n = A$. It follows that
$\mu(A \,\triangle\, B) = 0$ and so,

\[ \mu(A) = \mu(B). \]

Furthermore, it is clear that the set A is T-invariant. Hence, the ergodicity of T implies
that μ(A) = 0 or μ(X \ A) = 0. Consequently, either μ(B) = 0 or μ(X \ B) = 0, which proves
the first implication.
Now we prove the implication from (b) to (c). Let the system be ergodic and
suppose that for the measurable function f : X → R we have that f ◦ T = f μ-a.e. For
each c ∈ R, we make the observation that the set $D_c := \{f \le c\} := \{x \in X : f(x) \le c\}$ is
(μ-a.e.) T-invariant. Therefore, for each of these sets $D_c$, either $\mu(D_c) = 0$ or $D_c = X$, up
to a set of μ-measure zero. Let $c_0 := \inf \{c \in \mathbb{R} : D_c = X \text{ mod } \mu\}$. Then,

\[ \{f = c_0\} = \bigcap_{n \ge 1} \Big\{ f \le c_0 + \frac{1}{n} \Big\} \setminus \bigcup_{n \ge 1} \Big\{ f \le c_0 - \frac{1}{n} \Big\} = X \quad \text{mod } \mu. \]

In other words, $f(x) = c_0$ for μ-a.e. x ∈ X.

To see that (c) implies (a), suppose that for all measurable functions f : X → R, we
have that if f ◦ T = f μ-a.e., then f is μ-a.e. equal to a constant. Let B ∈ B be such
that $T^{-1}(B) = B$. Then, $1_B \circ T = 1_B$ everywhere on X and so, by assumption, $1_B$ is
constant μ-a.e. on X. Therefore, by the definition of $1_B$, we have that either μ(B) = 0 or
μ(X \ B) = 0.
If we additionally assume that the system is conservative we can amend the list of
equivalences in the following way.

Proposition 2.4.7. For a conservative non-singular dynamical system (X, B , μ, T), the
following are equivalent:
(a) T is ergodic with respect to μ.
(b) For A ∈ B, if μ(A) > 0, then $\bigcup_{n=0}^{\infty} T^{-n}(A) = X$ mod μ.
(c) For A, B ∈ B, if μ(A)μ(B) > 0, then there exists n ∈ N such that

\[ \mu(T^{-n}(A) \cap B) > 0. \]

Proof. We will prove the string of implications (a) implies (b) implies (c) implies (a).
The implication from (a) to (b) is a consequence of Lemma 2.4.3 since the system
is supposed to be non-singular, ergodic and conservative. To prove that (b) implies (c),
let A and B be sets of positive measure. Since (b) holds, we have that

\[ \bigcup_{n=1}^{\infty} T^{-n}(A) = X \quad \text{mod } \mu, \]

which implies that

\[ 0 < \mu(B) = \mu\Big( B \cap \bigcup_{n=1}^{\infty} T^{-n}(A) \Big) \le \sum_{n=1}^{\infty} \mu(B \cap T^{-n}(A)). \]

It follows that there must be at least one n ≥ 1 such that $\mu(B \cap T^{-n}(A)) > 0$.
Suppose now that (c) holds and let A be a T-invariant set. Then we have for all
n ∈ N that

\[ 0 = \mu((X \setminus A) \cap A) = \mu((X \setminus A) \cap T^{-n}(A)). \]


So, by (c), either μ(A) = 0 or μ(X \ A) = 0, proving that T is ergodic. This finishes the
string of equivalences.
Next we state the following important uniqueness result for finite invariant ergodic
measures.

Proposition 2.4.8. Let (X, B , μ, T) be an ergodic invariant system with μ(X) = 1 and let
ν be another T-invariant probability measure on (X, B ) with ν ≪ μ. Then we have that
ν = μ.

Proof. Since ν ≪ μ, the Radon–Nikodým Theorem implies that there exists a density
$f \in L^1_+(\mu)$ with dν = f dμ. We are going to prove that f is constant μ-almost everywhere.
Since ν and μ are both probability measures, this then guarantees that f = 1. In fact,
for r ∈ R and all B ⊂ {f > r} with positive μ-measure we have
,
ν (B ) − rμ (B ) = (f (x ) − r) dμ (x ) > 0,
B

which implies that

ν (B ) > rμ (B ) .

Similarly, for all $C \subset F_r := \{f \le r\}$ it follows that ν (C) ≤ rμ (C). Making use of Remark 2.1.2
and since $T^{-1}(F_r) \setminus F_r \subset \{f > r\}$, we either have $\mu\big( T^{-1}(F_r) \setminus F_r \big) = 0$ or

\[
\begin{aligned}
\nu(T^{-1}(F_r) \setminus F_r) &> r \mu(T^{-1}(F_r) \setminus F_r) = r \mu(F_r \setminus T^{-1}(F_r)) \\
&\ge \nu(F_r \setminus T^{-1}(F_r)) = \nu(T^{-1}(F_r) \setminus F_r),
\end{aligned}
\]

which is impossible. Again in light of Remark 2.1.2, we have

\[ \mu(T^{-1}(F_r) \,\triangle\, F_r) = \mu(T^{-1}(F_r) \setminus F_r) + \mu(F_r \setminus T^{-1}(F_r)) = 0. \]

Since T is ergodic, it follows that μ (F r ) ∈ {0, 1}. This shows that f has to be μ-a.e.
equal to a constant.

2.4.1 Ergodicity of the systems G and L α

Let us now prove that the Gauss map is ergodic with respect to the Gauss measure.
Before beginning, we introduce the notation “a ≍ b”, which means that there exists a
positive constant C such that $C^{-1} a \le b \le C a$.

Lemma 2.4.9. The Gauss map G is ergodic with respect to the Gauss measure m G .

Proof. The first and most important step in the proof is to show that for any given
continued fraction cylinder set C(x1 , . . . , x n ) and for all measurable sets B, we have
that

\[ m_G(G^{-n}(B) \cap C(x_1, \ldots, x_n)) \asymp m_G(B) \, m_G(C(x_1, \ldots, x_n)). \tag{2.8} \]

In fact, it is sufficient to demonstrate (2.8) for all intervals of the form B := [c, d] ⊆ [0, 1],
since the set of all Borel sets satisfying (2.8) for a fixed constant can be shown to be a
monotone class.
Thus, let B be some fixed interval in [0, 1] and let $p_n/q_n := [x_1, \ldots, x_{n-1}, x_n]$ and
$p_{n-1}/q_{n-1} := [x_1, \ldots, x_{n-1}]$ denote the n-th and (n − 1)-th approximants of the number
$x = [x_1, x_2, \ldots] \in [0, 1]$. Notice that $x \in G^{-n}(B) \cap C(x_1, \ldots, x_n)$ if and only if $G^n(x) =
[x_{n+1}, x_{n+2}, \ldots] \in B$. Since $G^n$ is monotonic on each cylinder set C(x1 , . . . , x n ), it follows
that $G^{-n}(B) \cap C(x_1, \ldots, x_n)$ is an interval with endpoints given by

\[ \frac{p_n + p_{n-1} c}{q_n + q_{n-1} c} \quad \text{and} \quad \frac{p_n + p_{n-1} d}{q_n + q_{n-1} d}, \]

for some c, d > 0. (This follows from Theorem 1.1.5 and the fact that in this case we
have $r_{n+1} = c^{-1}$ or $r_{n+1} = d^{-1}$.) Therefore, the Lebesgue measure of the intersection
$G^{-n}(B) \cap C(x_1, \ldots, x_n)$ is equal to

\[
\begin{aligned}
\left| \frac{p_n + p_{n-1} d}{q_n + q_{n-1} d} - \frac{p_n + p_{n-1} c}{q_n + q_{n-1} c} \right|
&= \left| \frac{p_n q_{n-1} c + p_{n-1} q_n d - p_n q_{n-1} d - p_{n-1} q_n c}{(q_n + q_{n-1} c)(q_n + q_{n-1} d)} \right| \\
&= |d - c| \, \frac{1}{(q_n + q_{n-1} c)(q_n + q_{n-1} d)},
\end{aligned}
\]

by Theorem 1.1.1 (c). On the other hand, recall that the Lebesgue measure of the
cylinder set C(x1 , . . . , x n ) is given by

\[ \lambda(C(x_1, \ldots, x_n)) = \frac{1}{q_n (q_n + q_{n-1})}, \]

which implies that

\[
\begin{aligned}
\lambda(G^{-n}(B) \cap C(x_1, \ldots, x_n)) &= \lambda(B) \, \lambda(C(x_1, \ldots, x_n)) \, \frac{q_n (q_n + q_{n-1})}{(q_n + q_{n-1} c)(q_n + q_{n-1} d)} \\
&\asymp \lambda(B) \, \lambda(C(x_1, \ldots, x_n)).
\end{aligned}
\]

In light of Proposition 2.1.12, the proof of (2.8) is finished.
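
The length formula used in this step can be checked with exact arithmetic. The sketch below assumes the standard recursion for the approximants of [x_1, ..., x_n] (p_k = x_k p_{k-1} + p_{k-2} and q_k = x_k q_{k-1} + q_{k-2}, started from p_{-1} = 1, q_{-1} = 0, p_0 = 0, q_0 = 1) and the form G(x) = 1/x − ⌊1/x⌋ of the Gauss map; the function names are hypothetical. It computes the ratio of λ(G^{-n}(B) ∩ C(x_1, ..., x_n)) to λ(B) λ(C(x_1, ..., x_n)) for B = [c, d].

```python
from fractions import Fraction
from math import floor


def convergents(digits):
    """Return (p_{n-1}, q_{n-1}, p_n, q_n) for the digits x_1, ..., x_n."""
    p_prev, q_prev, p, q = 1, 0, 0, 1
    for x in digits:
        p_prev, q_prev, p, q = p, q, x * p + p_prev, x * q + q_prev
    return p_prev, q_prev, p, q


def cylinder_point(digits, t):
    """The point of C(x_1, ..., x_n) whose image under G^n is t."""
    p_prev, q_prev, p, q = convergents(digits)
    return (p + p_prev * t) / (q + q_prev * t)


def gauss(x):
    """Gauss map G(x) = 1/x - floor(1/x) for x in (0, 1]."""
    return 1 / x - floor(1 / x)


def distortion(digits, c, d):
    """lambda(G^{-n}(B) n C) / (lambda(B) * lambda(C)) for B = [c, d]."""
    p_prev, q_prev, p, q = convergents(digits)
    length = abs(cylinder_point(digits, d) - cylinder_point(digits, c))
    cylinder_length = Fraction(1, q * (q + q_prev))
    return length / ((d - c) * cylinder_length)


if __name__ == "__main__":
    print(distortion((2, 3, 1), Fraction(1, 5), Fraction(3, 4)))
```

Since 0 ≤ c, d ≤ 1 and q_{n-1} ≤ q_n, this ratio always lies between 1/2 and 2, which gives an explicit constant for the comparability with respect to λ.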


Now, suppose that A ∈ B is such that $G^{-1}(A) = A$. Then (2.8) implies that for each
n ∈ N and for every cylinder set of level n we have that

\[ m_G(A \cap C(x_1, \ldots, x_n)) \asymp m_G(A) \, m_G(C(x_1, \ldots, x_n)). \]

Clearly, this also holds for finite unions of (disjoint) cylinder sets and, since finite
unions of cylinder sets generate the Borel σ-algebra, this implies that

\[ m_G(A \cap B) \asymp m_G(A) \, m_G(B), \quad \text{for all } B \in \mathcal{B}. \]

On choosing B := [0, 1] \ A, it now follows that

\[ 0 \asymp m_G(A) \, m_G([0, 1] \setminus A), \]

which shows that m G (A) ∈ {0, 1}. This finishes the proof.
We can almost immediately obtain a stronger result about the Gauss map by only
slightly altering the above proof. We aim to show that the Gauss map is an exact
transformation. We first give the definition of exactness, for which we recall from
Definition 2.2.4 that a transformation is said to be non-singular if it preserves sets of
measure zero.

Definition 2.4.10. A non-singular transformation T of a σ-finite measure space
(X, B , μ) is said to be exact if for each B in the tail σ-algebra $\bigcap_{n \in \mathbb{N}} T^{-n}(\mathcal{B})$ we have
that either μ(B) or μ(X \ B) is equal to zero.

Remark 2.4.11.
1. This definition of exactness only makes sense for non-invertible transformations.
Indeed, if T : X → X is invertible, then it follows immediately that T −n (B) = B
for every n ∈ N. The correct corresponding property for invertible systems is
the K-property, named for Kolmogorov, who introduced it. For more details, see
[Par81] and references therein.
2. The tail σ-algebra is not an immediately transparent object. It helps to remember
that it is an intersection of sets of sets. In particular, this means that if
$B \in \bigcap_{n \in \mathbb{N}} T^{-n}(\mathcal{B})$, then $B \in T^{-n}(\mathcal{B})$ for all n ∈ N. Thus, there exists a sequence of sets
$(B_1, B_2, B_3, \ldots)$ such that $B = T^{-n}(B_n)$ for every n ∈ N. Another way of thinking of
this is to note that $T^{-n}(T^n(B)) = T^{-n}(T^n(T^{-n}(B_n))) = T^{-n}(B_n) = B$, for all n ∈ N.
3. It is easy to see that an exact transformation must be ergodic, for if the map T : X →
X is exact and B is a measurable subset of X such that T −1 (B) = B, then T −n (B) = B
for all n ∈ N and so, the set B belongs to the tail σ-algebra and hence, either μ(B)
or μ(X \ B) is equal to zero.

Theorem 2.4.12. The Gauss map G is exact with respect to the Gauss measure m G .

Proof. Let B ∈ B be such that B lies in the tail σ-algebra $\bigcap_{n \in \mathbb{N}} G^{-n}(\mathcal{B})$. As noted in
Remark 2.4.11, this implies that there exists a sequence of sets $(B_n)_{n \ge 1}$ such that
$B = G^{-n}(B_n)$ for every n ∈ N. We have shown in the proof of Lemma 2.4.9, in (2.8), that for
cylinder sets C(x1 , . . . , x n ) we have

\[ m_G(G^{-n}(A) \cap C(x_1, \ldots, x_n)) \asymp m_G(A) \, m_G(C(x_1, \ldots, x_n)), \quad \text{for all } A \in \mathcal{B}. \]

In particular, for our sequence $(B_n)_{n \ge 1}$ we then have that

\[ m_G(G^{-n}(B_n) \cap C(x_1, \ldots, x_n)) \asymp m_G(B_n) \, m_G(C(x_1, \ldots, x_n)), \quad \text{for all } n \in \mathbb{N}. \]

But this implies, since $m_G(B_n) = m_G(G^{-n}(B_n)) = m_G(B)$ by the G-invariance of m G , that
for every cylinder set C(x1 , . . . , x n ), we have that

\[ m_G(B \cap C(x_1, \ldots, x_n)) \asymp m_G(B) \, m_G(C(x_1, \ldots, x_n)). \]

This also holds for finite unions of cylinder sets and thus, since the cylinder sets
generate the Borel σ-algebra, we deduce that

\[ m_G(B \cap A) \asymp m_G(B) \, m_G(A), \quad \text{for all } A \in \mathcal{B}. \]

Choosing A to be the complement of B yields the desired result.


In Section 2.5 we will prove that the Farey map is exact. One ingredient for the proof
there will be the following lemma, the proof of which is again based on the relation
in (2.8).

Lemma 2.4.13. Let $\widehat{C}$ denote an arbitrary level n + 1 Farey cylinder set with code ending
in the symbol 1. For each $B \in \mathcal{B}$ such that $B \subseteq \widehat{C}$, we then have that

$$\lambda(B) \asymp \lambda(\widehat{C})\, \lambda(F^n(B)).$$

Proof. Say that for some $x_1,\ldots,x_k \in \mathbb{N}$ with $\sum_{i=1}^{k} x_i = n+1$ we have

$$\widehat{C} := \widehat{C}(0^{x_1-1}, 1, 0^{x_2-1}, \ldots, 1, 0^{x_k-1}, 1).$$

Next consider some set $B \in \mathcal{B}$ such that $B \subseteq \widehat{C}$. Then there exists $E \in \mathcal{B}$ such that

$$B = G^{-k}(E) \cap C(x_1,\ldots,x_k) = F^{-(n+1)}(E) \cap \widehat{C}(0^{x_1-1}, 1, 0^{x_2-1}, \ldots, 1, 0^{x_k-1}, 1).$$

Since $E = G^k(B) = F^{n+1}(B)$ and since $\lambda \asymp m_G$, by Proposition 2.1.12, it follows from (2.8)
that

$$\lambda(B) \asymp m_G(B) = m_G(G^{-k}(E) \cap C(x_1,\ldots,x_k)) \asymp m_G(E)\, m_G(C(x_1,\ldots,x_k)) \asymp \lambda(F^{n+1}(B))\, \lambda(\widehat{C}) \asymp \lambda(F^n(B))\, \lambda(\widehat{C}).$$

Here the final comparability follows by using the change of variables formula.

Let us now prove directly that an α-Lüroth system ([0, 1], L α ) is exact with respect to
the Lebesgue measure. The proof follows along the same lines as that for the Gauss
map, so we give only a sketch here and leave the details as an exercise for the reader.

Proposition 2.4.14. The α-Lüroth map L α is exact with respect to λ.


Proof. To start, let $B \in \bigcap_{n\in\mathbb{N}} L_\alpha^{-n}(\mathcal{B})$ be given. We aim to prove that λ(B) ∈ {0, 1}.
First, it is straightforward to calculate that for any single cylinder set $C_\alpha(\ell_1,\ldots,\ell_n)$,
we have that

$$\lambda(B \cap C_\alpha(\ell_1,\ldots,\ell_n)) = \lambda(C_\alpha(\ell_1,\ldots,\ell_n))\, \lambda(B).$$

One immediately verifies that this also holds for a finite union of L α -cylinder sets. From
this, we deduce that

λ(B ∩ C) = λ(B)λ(C), for all C ∈ B .

Therefore, by choosing C to be equal to [0, 1] \ B, we conclude that

0 = λ(B ∩ ([0, 1] \ B)) = λ(B)λ([0, 1] \ B).

This shows that λ(B) = 0 or λ(B) = 1, and hence finishes the proof.

Corollary 2.4.15. The α-Lüroth map L α is ergodic with respect to λ.

2.4.2 Ergodic theorems for probability spaces and consequences for the Gauss and
α-Lüroth systems

The first major result in ergodic theory was published in 1931 by G.D. Birkhoff
[Bir31]¹. This result is known as the pointwise ergodic theorem and it gives a precise
relationship between the average of an integrable function evaluated along the orbit of
a typical point (the time average) and the integral of the function (the space average).
There are now a great variety of proofs available; the interested reader is referred to
either [Wal82] or [EW11] (and references therein). In Chapter 4 we will prove the more
general Chacon–Ornstein Ergodic Theorem and then show how to derive Birkhoff’s
Pointwise Ergodic Theorem from this more general theorem. Let us here simply state
the result for the case of an ergodic probability-measure-preserving system.

Theorem 2.4.16 (Birkhoff’s Pointwise Ergodic Theorem). Let $(X, \mathcal{B}, \mu, T)$ be an ergodic, probability-measure-preserving system. If $f \in L^1(\mu)$, then we have for μ-a.e. x ∈ X
that

$$\lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} f \circ T^j(x) = \int f \, d\mu.$$

1 Birkhoff’s Pointwise Ergodic Theorem, although published first, was not the first ergodic theorem
to be proved. The work of von Neumann [Neu32] predates that of Birkhoff. In [Neu32] von Neumann
proves what is now called the Mean Ergodic Theorem. See the book [EW11] for an exposition of this
result and further references.

Since we have already proved that the Gauss and α-Lüroth systems are ergodic and
probability-measure-preserving, we may apply Birkhoff’s Pointwise Ergodic Theorem
to easily obtain the following interesting number-theoretic results. The original proofs
(in the continued fraction case, that is) of most of these statements were decidedly
more complicated.

Proposition 2.4.17. For λ-almost every real number x = [x1 , x2 , x3 , . . .] ∈ [0, 1], the
following statements hold:
(a) The element j appears in the continued fraction expansion of x with frequency

$$\lim_{n\to\infty} \frac{1}{n}\, \#\{i : i \le n,\ x_i = j\} = \frac{2\log(1+j) - \log(j) - \log(2+j)}{\log 2}.$$

More generally, every finite sequence $y_1,\ldots,y_n$ of positive integers appears in the
continued fraction expansion of x with frequency

$$m_G(C(y_1,\ldots,y_n)).$$

(b) For the geometric mean of the elements $x_n$ we have

$$\lim_{n\to\infty} (x_1 x_2 \cdots x_n)^{1/n} = \prod_{n=1}^{\infty} \left(\frac{(n+1)^2}{n(n+2)}\right)^{\log n/\log 2}.$$

(c) For the arithmetic mean of the elements $x_n$, we have

$$\lim_{n\to\infty} \frac{1}{n}(x_1 + x_2 + \cdots + x_n) = \infty.$$

(d) For the growth rate of the denominators $q_n$ of the approximants to x, we have

$$\lim_{n\to\infty} \frac{1}{n} \log(q_n) = \frac{\pi^2}{12 \log 2}.$$

(e) For the rate at which the approximants $p_n/q_n$ converge to x, we have

$$\lim_{n\to\infty} \frac{1}{n} \log\left|x - \frac{p_n}{q_n}\right| = -\frac{\pi^2}{6 \log 2}.$$

Proof. For the first statement, first notice that the element j appears in the first n
elements of the continued fraction expansion of an irrational number x with frequency

$$\frac{1}{n}\, \#\{i : i \le n,\ x_i = j\} = \frac{1}{n}\, \#\left\{ i : i \le n,\ G^{i-1}(x) \in \left(\frac{1}{j+1}, \frac{1}{j}\right) \right\}.$$

Now, observe that if we define the function $f \in L^1(m_G)$ to be equal to the characteristic
function $f := 1_{(1/(j+1),\,1/j)}$, then the ergodic sum $\frac{1}{n}\sum_{i=0}^{n-1} f \circ G^i(x)$ coincides with the
n-th frequency defined above. Thus, by Birkhoff’s Pointwise Ergodic Theorem, we
immediately deduce that

$$\begin{aligned}
\lim_{n\to\infty} \frac{1}{n}\, \#\{i : i \le n,\ x_i = j\} &= \int f \, dm_G = \frac{1}{\log 2} \int_{1/(j+1)}^{1/j} \frac{1}{1+y} \, d\lambda(y) \\
&= \frac{1}{\log 2} \left( \log\left(1 + \frac{1}{j}\right) - \log\left(1 + \frac{1}{j+1}\right) \right) \\
&= \frac{2\log(1+j) - \log(j) - \log(2+j)}{\log 2}.
\end{aligned}$$

The proof of the remaining part of the first statement follows similarly, on choosing
$f := 1_{C(y_1,\ldots,y_n)}$.
For part (b), define the function $f : (0, 1) \to [0, \infty)$ by setting $f(x) := \log n$, for
$x \in (1/(n+1), 1/n)$. It is easy to check that the function f is in $L^1(\lambda)$ (and hence in
$L^1(m_G)$, since λ and $m_G$ are comparable). By Birkhoff’s Pointwise Ergodic Theorem,
we therefore have for λ-a.e. x that

$$\lim_{n\to\infty} \frac{1}{n} \sum_{j=1}^{n} \log x_j = \lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} f(G^j(x)) = \int f \, dm_G.$$

A simple calculation shows that, after exponentiating, this yields the identity in part (b).
Proving part (c) requires a little more effort. Let now the function f be defined by
$f(x) := \lfloor 1/x \rfloor = x_1$, that is, f(x) is defined to be equal to the first element in the continued
fraction expansion of x. We then have that

$$\frac{1}{n}(x_1 + x_2 + \cdots + x_n) = \frac{1}{n} \sum_{j=0}^{n-1} f(G^j(x)).$$

However, we cannot directly apply Birkhoff’s Pointwise Ergodic Theorem because the
function f is not integrable in this instance. To overcome this, define for each N ∈ N,

$$f_N(x) := \begin{cases} f(x) & \text{if } f(x) \le N; \\ 0 & \text{otherwise.} \end{cases}$$

The function $f_N$ is in $L^1(\lambda)$ and so, by Birkhoff’s Pointwise Ergodic Theorem, we have
that

$$\liminf_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} f(G^j(x)) \ge \lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} f_N(G^j(x)) = \int_0^1 f_N \, dm_G.$$

The fact that the above integral tends to infinity as N tends to infinity finishes the proof
of part (c).
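
The divergence used in this last step can be made explicit. The following computation is only a sketch: it uses the digit frequencies from part (a), the identity $(k+1)^2 = k(k+2) + 1$ and the elementary bound $\log(1+u) \ge u/2$ for $0 \le u \le 1$.

```latex
\begin{align*}
\int_0^1 f_N \, dm_G
  &= \sum_{k=1}^{N} k \, m_G\bigl(\{x : x_1 = k\}\bigr)
   = \frac{1}{\log 2} \sum_{k=1}^{N} k \log\Bigl(1 + \frac{1}{k(k+2)}\Bigr) \\
  &\ge \frac{1}{2\log 2} \sum_{k=1}^{N} \frac{1}{k+2}
   \longrightarrow \infty \qquad (N \to \infty).
\end{align*}
```

In particular, the truncated integrals grow like a harmonic sum, that is, logarithmically in N.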

In order to prove part (d), first observe that if $x = [x_1, x_2, x_3, \ldots]$, then

$$\frac{p_n(x)}{q_n(x)} = \frac{1}{x_1 + [x_2, \ldots, x_n]} = \frac{1}{x_1 + \dfrac{p_{n-1}(G(x))}{q_{n-1}(G(x))}} = \frac{q_{n-1}(G(x))}{x_1 q_{n-1}(G(x)) + p_{n-1}(G(x))}.$$

This shows that $p_n(x) = q_{n-1}(G(x))$ for every n ∈ N, since the approximants are in
reduced form. It follows that

$$\frac{1}{q_n(x)} = \frac{p_n(x)}{q_n(x)} \cdot \frac{p_{n-1}(G(x))}{q_{n-1}(G(x))} \cdots \frac{p_1(G^{n-1}(x))}{q_1(G^{n-1}(x))},$$

so that

$$-\frac{1}{n} \log(q_n(x)) = \frac{1}{n} \sum_{j=0}^{n-1} \log\left( \frac{p_{n-j}(G^j(x))}{q_{n-j}(G^j(x))} \right).$$

Let the $L^1(m_G)$-function f be defined by $f(x) := \log x$. It then follows that

$$-\frac{1}{n} \log(q_n(x)) = \frac{1}{n} \sum_{j=0}^{n-1} f(G^j(x)) - \frac{1}{n} \sum_{j=0}^{n-1} \left( f(G^j(x)) - f\left( \frac{p_{n-j}(G^j(x))}{q_{n-j}(G^j(x))} \right) \right).$$

First noticing that the second term on the right hand side tends to zero as n tends
to infinity, since $p_{n-j}(G^j(x))/q_{n-j}(G^j(x))$ is a good approximation to $G^j(x)$ for large n, we have by
Birkhoff’s Pointwise Ergodic Theorem that

$$\lim_{n\to\infty} -\frac{1}{n} \log q_n = \lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} f(G^j(x)) = \frac{1}{\log 2} \int_0^1 \frac{\log x}{1+x} \, d\lambda(x) = -\frac{\pi^2}{12 \log 2}.$$

This proves part (d). Finally, since

$$\log q_n + \log q_{n+1} \le -\log\left| x - \frac{p_n}{q_n} \right| \le \log q_n + \log q_{n+2},$$

part (e) follows from part (d). This finishes the proof of the proposition.
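
Parts (a) and (d) lend themselves to a quick numerical sanity check. The sketch below is illustrative only and not part of the original text: the helper names `cf_digits`, `gauss_kuzmin` and `denominators` are hypothetical, and a rational number with many random binary digits is used as a stand-in for a "typical" point, which is a heuristic assumption rather than something the proposition guarantees.

```python
from fractions import Fraction
import math
import random


def cf_digits(x, n):
    """Return up to n continued fraction digits of a rational x in (0, 1).

    Each step records floor(1/x) and then applies the Gauss map
    G(x) = 1/x - floor(1/x), using exact rational arithmetic.
    """
    digits = []
    for _ in range(n):
        if x == 0:  # rational numbers have finite expansions
            break
        y = 1 / x
        a = y.numerator // y.denominator
        digits.append(a)
        x = y - a
    return digits


def gauss_kuzmin(j):
    """Limiting frequency of the digit j, as stated in part (a)."""
    return (2 * math.log(1 + j) - math.log(j) - math.log(2 + j)) / math.log(2)


def denominators(digits):
    """Denominators q_1, q_2, ... from q_n = x_n q_{n-1} + q_{n-2}, q_0 = 1, q_{-1} = 0."""
    q_prev, q = 0, 1
    out = []
    for a in digits:
        q_prev, q = q, a * q + q_prev
        out.append(q)
    return out


LEVY = math.pi ** 2 / (12 * math.log(2))  # the limit in part (d)

if __name__ == "__main__":
    random.seed(0)
    bits = 6000  # assumption: enough binary digits to supply well over 1500 CF digits
    x = Fraction(random.getrandbits(bits) | 1, 2 ** bits)
    digits = cf_digits(x, 1500)
    for j in (1, 2, 3):
        print(j, digits.count(j) / len(digits), gauss_kuzmin(j))
    q = denominators(digits)
    print(math.log(q[-1]) / len(digits), LEVY)
```

The empirical frequencies and the value of $\frac{1}{n}\log q_n$ should only be expected to be close to the limits, since n is finite and the convergence in Birkhoff’s theorem comes with no rate.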

Remark 2.4.18. Part (a) of the above proposition implies that for λ-a.e. x ∈ [0, 1], the
continued fraction expansion of x contains two 1s in a row infinitely often. Notice that
this implies that the same property holds for the Farey coding of almost every point.
This will turn out to be useful later on, when we prove that the Farey map is exact.
Part (c) says that the arithmetic mean of the first n continued fraction digits
diverges a.e. as n tends to infinity. Nevertheless, there exist meaningful stochastic laws
describing the continued fraction digits in greater depth. Lévy [Lév52] was the first
to derive non-degenerate limit laws in the context of continued fractions; namely, we
have that the continued fraction digits belong to the domain of attraction of a stable
law with characteristic exponent 1. More precisely, we have the following convergence
in distribution with respect to any absolutely continuous probability measure $\mu \ll \lambda$:

$$\frac{S_k}{k/\log 2} - \log k \xrightarrow{\ \mu\ } F,$$

where F has a stable distribution (cf. [Hei87] and [Phi88], and for related results see
also [Hen00]).
Khinchin showed that for a suitable normalising sequence a weak law of large
numbers holds (cf. [Khi35]). That is,

$$\frac{S_n}{n \log n} \longrightarrow \frac{1}{\log 2}$$

in measure with respect to λ. However, according to [Phi88] there is no
normalising sequence $(n_k)$ with $(n_k/k)$ non-decreasing such that a strong law of large
numbers is satisfied. More precisely, we either have that

$$\sum_{k=1}^{\infty} \frac{1}{n_k} < \infty \quad \text{and} \quad \lim_{k\to\infty} \frac{S_k}{n_k} = 0 \quad \lambda\text{-a.e.}$$

or

$$\sum_{k=1}^{\infty} \frac{1}{n_k} = \infty \quad \text{and} \quad \limsup_{k\to\infty} \frac{S_k}{n_k} = \infty \quad \lambda\text{-a.e.}$$

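On the other hand, Diamond and Vaaler have shown in [DV86] that for the trimmed
sum

$$S_n^{\flat} := \sum_{i=1}^{n} x_i - \max_{1 \le \ell \le n} x_\ell$$

we have

$$\lim_{n\to\infty} \frac{S_n^{\flat}}{n \log n} = \frac{1}{\log 2} \quad \lambda\text{-a.e.}$$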

Finally, let us mention two further related results, namely, the extreme value law for
continued fractions by Galambos [Gal73, JKS13] and Philipp’s law [Phi76]. The extreme
value law states

$$\lim_{n\to\infty} \lambda\left( \max_{1 \le k \le n} x_k < \frac{ns}{\log 2} \right) = \exp(-1/s)$$

and Philipp showed that Lebesgue a.e. we have

$$\liminf_{n\to\infty} \max_{1 \le k \le n} x_k \cdot \frac{\log\log n}{n} = \frac{1}{\log 2}.$$

Let us now turn our attention to the α-Lüroth systems. In light of the fact that we proved
in Section 2.4.1 that each map L α is ergodic with respect to the Lebesgue measure, we
can use Birkhoff’s Pointwise Ergodic Theorem to obtain various statements about the
α-Lüroth elements of λ-a.e. real number x ∈ [0, 1].

Proposition 2.4.19. Let $L_\alpha$ denote the α-Lüroth map for the partition $\alpha = \{A_n : n \in \mathbb{N}\}$
with $\lambda(A_n) = a_n$ and tails $t_n = \sum_{k=n}^{\infty} a_k$, as before. Then, for λ-a.e. $x = [\ell_1, \ell_2, \ell_3, \ldots]_\alpha \in [0, 1]$, the following statements hold:
(a) $\displaystyle \lim_{n\to\infty} \frac{1}{n}\, \#\{j \le n : \ell_j = k\} = a_k$, for each k ∈ N.
(b) $\displaystyle \lim_{n\to\infty} \frac{1}{n} \log\Bigl( \prod_{j=1}^{n} \ell_j \Bigr) = \sum_{k=1}^{\infty} a_k \log k$.
(c) $\displaystyle \lim_{n\to\infty} \frac{1}{n} \sum_{j=1}^{n} \ell_j = \sum_{k=1}^{\infty} t_k$.
(d) For each k ∈ N, every finite sequence $y_1, \ldots, y_k$ of positive integers appears infinitely
often in the α-Lüroth expansion of x.
(e) With the additional assumption on the partition α that $a_n \le t_{n+1}$ for sufficiently large
n ∈ N, we have that

$$\lim_{n\to\infty} \frac{1}{n} \log\left| x - r_n^{(\alpha)} \right| = \sum_{k=1}^{\infty} a_k \log a_k.$$

Proof. Each of the above statements follows directly on application of Birkhoff’s
Pointwise Ergodic Theorem to a specific Lebesgue integrable function f. For the first
four assertions, choose the function f in turn to be given by the characteristic function
$1_{A_k}$, then $\log(\ell_1(x))$, then $\ell_1(x)$ and finally by the characteristic function $1_{C_\alpha(y_1,\ldots,y_k)}$.
Here, $\ell_1(x)$ is defined to be the first element in the α-Lüroth expansion of x. For part
(c), note that the truncation argument used for the Gauss map is needed only in
the event that the partition α is of infinite type.
For part (e), first notice that under the stated condition on α, we have for
sufficiently large n that

$$a_{\ell_1} \cdots a_{\ell_n} a_{\ell_{n+1}} \le \left| x - r_n^{(\alpha)} \right| \le a_{\ell_1} \cdots a_{\ell_n}. \tag{2.9}$$

Then, let f be given by $f(x) := \log(a_{\ell_1(x)})$. Using (2.9), we have that

$$\begin{aligned}
\lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} f \circ L_\alpha^j(x) &= \lim_{n\to\infty} \frac{1}{n} \sum_{j=1}^{n} \log a_{\ell_j(x)} = \lim_{n\to\infty} \frac{1}{n} \log\left| x - r_n^{(\alpha)} \right| \\
&= \int_{[0,1]} \log a_{\ell_1(x)} \, d\lambda(x) = \sum_{k=1}^{\infty} \int_{A_k} \log a_{\ell_1(x)} \, d\lambda(x) \\
&= \sum_{k=1}^{\infty} a_k \log a_k.
\end{aligned}$$

This finishes the proof.
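
As a worked illustration of parts (c) and (e), consider the dyadic partition with $a_n = 2^{-n}$. This particular choice of partition is an assumption made here purely for the sake of example. The condition $a_n \le t_{n+1}$ then holds with equality for every n, and the limits can be evaluated in closed form:

```latex
\begin{align*}
a_n &= 2^{-n}, \qquad t_n = \sum_{k=n}^{\infty} 2^{-k} = 2^{-(n-1)}, \qquad t_{n+1} = 2^{-n} = a_n, \\
\sum_{k=1}^{\infty} t_k &= \sum_{k=1}^{\infty} 2^{-(k-1)} = 2, \\
\sum_{k=1}^{\infty} a_k \log a_k &= -\log 2 \sum_{k=1}^{\infty} k \, 2^{-k} = -2 \log 2 .
\end{align*}
```

So, for this partition, the arithmetic mean of the digits tends to 2 and $\frac{1}{n}\log|x - r_n^{(\alpha)}|$ tends to $-2\log 2$ for λ-a.e. x.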



Remark 2.4.20.
1. The lists given in Propositions 2.4.17 and 2.4.19 give only a small sample of the
possible results obtainable using Birkhoff’s Pointwise Ergodic Theorem in this
manner. The reader is invited to think of others.
2. It is immediately clear that the densities of the appearances of the digits in the
α-Lüroth expansion constitute a probability vector, as they are just given by the
associated a k s. Calculating the sum of the frequencies appearing in part (a) of
Proposition 2.4.17 shows that the same is true for the Gauss map.
3. The extra condition on α given in part (e) of the previous proposition is equivalent
to the requirement that a n /t n ≤ 1/2, for all n sufficiently large. For the example
of the alternating Lüroth map, this condition is met. It is also satisfied for any
expansive partition of exponent θ > 0 and for expanding partitions with ρ < 2.

To finish this section we will prove a useful consequence of Birkhoff’s Ergodic
Theorem.

Proposition 2.4.21. Under the assumptions of Birkhoff’s Pointwise Ergodic Theorem we
also have convergence of $\bigl( n^{-1} \sum_{j=0}^{n-1} f \circ T^j \bigr)$ in $L^1(\mu)$.

Proof. To prove convergence in $L^1(\mu)$ we first verify the claim for bounded functions
and then use the fact that $L^\infty(\mu)$ is dense in $L^1(\mu)$. Let $h \in L^\infty(\mu) \subset L^1(\mu)$. For
notational convenience we write $S_n h := \sum_{j=0}^{n-1} h \circ T^j$. Since $\|h \circ T\|_\infty = \|h\|_\infty$ we also
have that the a.e. defined limit $h^* = \lim n^{-1} S_n h$ is in $L^\infty(\mu)$. Hence, a.e. we have
$|n^{-1} S_n h - h^*| \to 0$ and by Lebesgue’s Dominated Convergence Theorem, it follows that
$\|n^{-1} S_n h - h^*\|_1 \to 0$. Since $(n^{-1} S_n h)$ is a Cauchy sequence in the Banach space $L^1(\mu)$,
for every ε > 0 there exists N(ε, h) ∈ N, such that for all k > 0 and n > N(ε, h) we have

$$\left\| \frac{1}{n} S_n h - \frac{1}{n+k} S_{n+k} h \right\|_1 < \varepsilon.$$

For ε > 0 and each $f \in L^1(\mu)$ we can find $h \in L^\infty(\mu)$ with $\|f - h\|_1 < \varepsilon/4$. Then for
n > N(ε/2, h) and k > 0

$$\begin{aligned}
\left\| \frac{1}{n} S_n f - \frac{1}{n+k} S_{n+k} f \right\|_1 &\le \left\| \frac{1}{n} S_n f - \frac{1}{n} S_n h \right\|_1 + \left\| \frac{1}{n} S_n h - \frac{1}{n+k} S_{n+k} h \right\|_1 \\
&\quad + \left\| \frac{1}{n+k} S_{n+k} h - \frac{1}{n+k} S_{n+k} f \right\|_1 \\
&\le 2 \|f - h\|_1 + \varepsilon/2 < \varepsilon.
\end{aligned}$$

Therefore $\bigl( \frac{1}{n} S_n f \bigr)_{n \ge 1}$ is a Cauchy sequence in $L^1(\mu)$ and consequently must have a
limit. This finishes the proof.

2.4.3 Ergodic theorems for infinite measures

Let us now turn our attention back to infinite measure-preserving systems. It turns
out, and we give a straightforward proof of this at the end of the section, that in the
case of a dynamical system that preserves an infinite measure, Birkhoff’s Pointwise
Ergodic Theorem is replaced by the following statement.

Theorem 2.4.22. Let $(X, \mathcal{B}, \mu, T)$ be a conservative and ergodic measure-preserving
system, such that μ(X) = ∞. Then, for all $f \in L^1(\mu)$ and μ-a.e. x ∈ X, we have that

$$\lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} f \circ T^j(x) = 0.$$

We delay the proof of the above theorem until after the statement of a stronger ergodic
theorem (Theorem 2.4.24); for the moment, let us consider again the original statement
of Birkhoff’s Pointwise Ergodic Theorem given in Theorem 2.4.16. So, let $T : (X, \mathcal{B}, \mu) \to (X, \mathcal{B}, \mu)$ be an ergodic, probability-measure-preserving system and let $A \in \mathcal{B}$ be a measurable
set. Define $S_n(A) := \sum_{j=0}^{n-1} 1_A \circ T^j$, that is, the function $S_n(A)$ evaluated at a point x
simply counts the number of visits the orbit of x makes to the set A before time n. We
shall, following Zweimüller [Zwe04], call $S_n(A)$ the occupation time of A. Birkhoff’s
Pointwise Ergodic Theorem then implies that

$$\lim_{n\to\infty} \frac{1}{n} S_n(A)(x) = \mu(A), \quad \text{for } \mu\text{-a.e. } x \in X.$$

This tells us three things. Firstly, it shows that the rate at which the occupation time
of A diverges is asymptotically the same for μ-a.e. x ∈ X. Secondly, it proves that this
rate depends on A only through the measure of the set A and, thirdly, it identifies the
occupation time as being proportional to n.
For infinite systems, however, the infinite-measure version of Birkhoff’s Pointwise
Ergodic Theorem, given above in Theorem 2.4.22, only provides an upper bound for
$S_n(A)$ when A has finite measure, namely $S_n(A)(x) = o(n)$ for μ-a.e. x, but it gives no
information on how the asymptotic behaviour of $S_n(A)(x)$ is
related to A and to x. It is natural to ask whether a sequence $(c_n)_{n\in\mathbb{N}}$ of normalising
constants can be found such that for all $A \in \mathcal{B}$, we have that

$$\lim_{n\to\infty} \frac{1}{c_n} S_n(A)(x) = \mu(A), \quad \text{for } \mu\text{-a.e. } x \in X.$$

Unfortunately, this is simply not possible, as the next theorem shows.

Theorem 2.4.23 (Aaronson’s Theorem). Let $(X, \mathcal{B}, \mu, T)$ be a conservative and ergodic
measure-preserving system, such that μ(X) = ∞. Also let $(c_n)_{n\ge 1}$ be an arbitrary
sequence of strictly positive real numbers. Then, for all $f \in L^1_+(\mu)$ we have

$$\liminf_{n\to\infty} \frac{1}{c_n} \sum_{j=0}^{n-1} f \circ T^j(x) = 0, \quad \mu\text{-a.e.},$$

or there exists a sequence $(n_k) \in \mathbb{N}^{\mathbb{N}}$ with $n_k \to \infty$ such that for all $f \in L^1_+(\mu)$ with $\int f \, d\mu > 0$, we have

$$\lim_{k\to\infty} \frac{1}{c_{n_k}} \sum_{j=0}^{n_k - 1} f \circ T^j = \infty, \quad \mu\text{-a.e.}$$

Proof. For the proof, we refer to Section 2.4 of Aaronson [Aar97].

Aaronson’s result shows that in the situation of an infinite invariant measure, the
asymptotic behaviour of ergodic sums (or, more specifically, occupation times) is
extremely complicated. Despite this negative result by Aaronson, there are plenty of
interesting qualitative and quantitative characterisations for infinite dynamical systems. Our first aim in this direction is to show that although the pointwise asymptotics
of the ergodic sum of an integrable function f crucially depends on the point x ∈ X
chosen, it only depends upon the function f through its expected value $\int_X f \, d\mu$.

Theorem 2.4.24 (Hopf’s Ratio Ergodic Theorem). Let $(X, \mathcal{B}, \mu, T)$ be a conservative
and ergodic measure-preserving system and let $f, g \in L^1(\mu)$, with g ≥ 0 and $\int_X g \, d\mu > 0$.
Then, for μ-a.e. x ∈ X, we have that

$$\lim_{n\to\infty} \frac{\sum_{j=0}^{n-1} f \circ T^j(x)}{\sum_{j=0}^{n-1} g \circ T^j(x)} = \frac{\int_X f \, d\mu}{\int_X g \, d\mu}.$$

We will prove this theorem shortly, in Section 2.4.6, using the technique of inducing. (In
Chapter 4 we will give an alternative proof of this theorem, by showing how to
derive it from the more general Chacon–Ornstein Ergodic Theorem.) Before doing either
of those things, let us now show how to prove Theorem 2.4.22 using Hopf’s theorem:
Assume that the measure μ is infinite. By the σ-finiteness of the space $(X, \mathcal{B}, \mu)$ we have
that for each m ∈ N, there exists some set $B_m \in \mathcal{B}$ such that $m \le \mu(B_m) < \infty$. Replacing f by |f| if necessary, we may assume that f ≥ 0. Applying
Hopf’s Ratio Ergodic Theorem to the functions $f \in L^1(\mu)$ and $1_{B_m}$ yields that

$$0 \le \limsup_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} f \circ T^j \le \lim_{n\to\infty} \frac{\sum_{j=0}^{n-1} f \circ T^j}{S_n(1_{B_m})} = \frac{\int_X f \, d\mu}{\mu(B_m)}, \quad \mu\text{-a.e. on } X.$$

Here, the second inequality above comes from the fact that $S_n(1_{B_m}) \le n$. Since m was
arbitrary and $\lim_{m\to\infty} \mu(B_m) = \infty$, the proof of Theorem 2.4.22 is complete.
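
To get a feel for Theorem 2.4.22, one can track the occupation time of A = [1/2, 1] along an orbit of the Farey map, which preserves an infinite measure. The sketch below is not from the original text. It assumes the usual definition F(x) = x/(1 − x) on [0, 1/2] and F(x) = (1 − x)/x on (1/2, 1], the function names are hypothetical, and the floating-point orbit is only a pseudo-orbit (it could in principle get stuck at the fixed point 0), so the printed numbers are merely indicative.

```python
from fractions import Fraction
import random


def farey(x):
    """Farey map: x/(1-x) on [0, 1/2] and (1-x)/x on (1/2, 1] (assumed definition)."""
    if 2 * x <= 1:
        return x / (1 - x)
    return (1 - x) / x


def occupation_time(x, n, lo, hi):
    """S_n(A)(x) for A = [lo, hi]: the number of j < n with F^j(x) in A."""
    count = 0
    for _ in range(n):
        if lo <= x <= hi:
            count += 1
        x = farey(x)
    return count


if __name__ == "__main__":
    random.seed(1)
    x0 = random.random()  # floating-point starting point, for illustration only
    for n in (10**3, 10**4, 10**5):
        s = occupation_time(x0, n, 0.5, 1.0)
        print(n, s / n)  # expected to drift slowly towards 0
```

The decay of S_n(A)/n is typically very slow, which is consistent with the remark that no single normalising sequence captures the pointwise behaviour.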

2.4.4 Inducing

In this section we will introduce and study induced maps. The idea behind these maps
is similar to that of the jump transformation introduced earlier (see Definitions 1.3.2
and 2.3.23). The basic construction goes back to Kakutani [Kak43] and Rokhlin
[Roh48]. In essence, it consists of viewing an infinite measure-preserving system
through the window of a set of finite measure. Recall that for a non-singular system
$T : (X, \mathcal{B}, \mu) \to (X, \mathcal{B}, \mu)$, a set A is called a sweep-out set for T if $\bigcup_{n=0}^{\infty} T^{-n}(A) = X \bmod \mu$, and also that we showed in Lemma 2.4.3 that for conservative, ergodic
transformations, every set $A \in \mathcal{B}$ with 0 < μ(A) < ∞ is a sweep-out set.

Definition 2.4.25. Let A be a sweep-out set for the non-singular, conservative trans-
formation T : X → X and define the function φ : X → N by setting

φ(x) := inf {n ≥ 1 : T n (x) ∈ A}.

Note that the conservativity of T ensures that

$$A = A^* := A \cap \bigcap_{n\in\mathbb{N}} \bigcup_{k \ge n} T^{-k} A \mod \mu.$$

In the context of inducing we will always assume that $A = A^*$, which guarantees that
φ(x) < ∞ for all x ∈ A. When we restrict the function φ to the set A, the map φ is called
the return time to A. Finally, the induced map $T_A : A \to A$ of T on $A = A^*$ is defined to be

$$T_A(x) := T^{\varphi(x)}(x).$$

We refer to $(A, \mathcal{B}_A, \mu|_A, T_A)$ with $\mu|_A$ denoting the measure μ restricted to $\mathcal{B}_A := \{A \cap B : B \in \mathcal{B}\}$ as the induced system. The idea behind $T_A$ is that it is an accelerated version
of T. It only records the times each orbit visits the set A, cutting out what happens in
between. Let us now prove one straightforward, but nevertheless useful, identity. To
shorten the notation, in all that follows we will again write {φ = n} for the set {x ∈ X :
φ(x) = n} and, similarly, {φ > n} for the set {x ∈ X : φ(x) > n}.

Lemma 2.4.26. If A is a sweep-out set for the non-singular, conservative transformation
T : X → X, then we have for all B ⊆ X that

$$T_A^{-1}(A \cap B) = \bigcup_{n=1}^{\infty} A \cap \{\varphi = n\} \cap T^{-n}(B).$$

Further, the induced system $(A, \mathcal{B}_A, \mu|_A, T_A)$ is also non-singular and conservative.

Proof. The function φ is finite μ-almost everywhere on X. It therefore follows that $A = \bigcup_{n=1}^{\infty} A \cap \{\varphi = n\}$, where this union is disjoint. Since $T_A = T^n$ on the set $A \cap \{\varphi = n\}$,
we have that

$$\begin{aligned}
T_A^{-1}(A \cap B) &= \bigcup_{n=1}^{\infty} A \cap \{\varphi = n\} \cap T_A^{-1}(A \cap B) \\
&= \bigcup_{n=1}^{\infty} A \cap \{\varphi = n\} \cap T_A^{-1}(A) \cap T_A^{-1}(B) \\
&= \bigcup_{n=1}^{\infty} A \cap \{\varphi = n\} \cap T^{-n}(B).
\end{aligned}$$

This identity immediately implies that the induced system is non-singular. To see
that the induced system is also conservative note that by the definition of φ and the
conservativity of T we have for all $B \in \mathcal{B}_A$ that

$$\sum_{k\in\mathbb{N}} 1_B \circ T_A^k = \sum_{n\in\mathbb{N}} 1_B \circ T^n = \infty$$

a.e. on B.

Let us now turn to measure-theoretic questions. Properties of the induced map can be
used to deduce interesting properties of the original system and vice versa, as we shall
show in the following three propositions. First we assume some knowledge of T A and
use this to investigate T.

Proposition 2.4.27. Let T : X → X be a $\mathcal{B}$-measurable transformation and assume that
for $A \in \mathcal{B}$ the induced map $T_A : A \to A$ preserves some finite measure ν. Then the following
hold:
(a) There exists a T-invariant measure m given by

$$m(B) := \sum_{n=0}^{\infty} \nu(A \cap \{\varphi > n\} \cap T^{-n}(B)), \quad \text{for all } B \in \mathcal{B},$$

such that $m|_A = \nu$.
(b) The system $(X, \mathcal{B}, m, T)$ is conservative.

Proof. The proof of part (a) follows from Lemma 2.4.26, similarly to the proof of
Lemma 2.3.25. We leave the details as an exercise. Part (b) follows directly from
Maharam’s Recurrence Theorem (see Theorem 2.2.14).

Proposition 2.4.28. Let $(X, \mathcal{B}, \mu, T)$ be conservative and non-singular.
(a) Assume that A is a sweep-out set for T such that the induced system $(A, \mathcal{B}_A, \mu|_A, T_A)$
is ergodic. Then $(X, \mathcal{B}, \mu, T)$ is also ergodic.
(b) If $(X, \mathcal{B}, \mu, T)$ is ergodic, then also $(A, \mathcal{B}_A, \mu|_A, T_A)$ is ergodic.

Proof. To prove (a) we first claim that for any measurable T-invariant set $B = T^{-1}(B)$,
the intersection A ∩ B is invariant under $T_A$. Indeed, in light of Lemma 2.4.26, we have
that

$$T_A^{-1}(A \cap B) = \bigcup_{n=1}^{\infty} A \cap \{\varphi = n\} \cap T^{-n}(B) = \bigcup_{n=1}^{\infty} A \cap \{\varphi = n\} \cap B = A \cap B.$$

By assumption, $T_A$ is ergodic, so either $\mu|_A(A \cap B) = 0$ or $\mu|_A((A \cap B)^c) = 0$. In the first
case, we have μ(A ∩ B) = 0 and we may conclude, since A is a sweep-out set and T is
non-singular, that

$$\mu(B) = \mu\Bigl( \bigcup_{n\in\mathbb{N}} T^{-n}(A) \cap B \Bigr) = \mu\Bigl( \bigcup_{n\in\mathbb{N}} T^{-n}(A \cap B) \Bigr) = 0.$$

Analogously, the second case yields that $\mu(B^c) = 0$ and the proof is finished.
For the proof of part (b) we make use of Lemma 2.4.3 in the following way. Assume
that we have a set $B \in \mathcal{B}_A$ with $T_A^{-1}(B) = B$, $\mu|_A(B) > 0$ and such that B is not equal to
A mod μ. Then the set A \ B has positive measure and for all x ∈ A \ B we find the
contradiction

$$0 = \sum_{k=0}^{\infty} 1_B \circ T_A^k(x) = \sum_{n=0}^{\infty} 1_B \circ T^n(x) = \infty.$$

Let us now present a converse to Proposition 2.4.28. That is, we now assume some
knowledge concerning the original map T and use this knowledge to obtain facts
about the induced map T A .

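Proposition 2.4.29. Let $(X, \mathcal{B}, \mu, T)$ be a measure-preserving system and let A
be a sweep-out set for T. Then the induced system $(A, \mathcal{B}_A, \mu|_A, T_A)$ is also
measure-preserving.

Proof. Fix $B \in \mathcal{B}_A$. Using Lemma 2.4.26 we find

$$\begin{aligned}
\mu|_A\bigl( T_A^{-1}(B) \bigr) &= \sum_{n=1}^{\infty} \mu\bigl( A \cap \{\varphi = n\} \cap T_A^{-1}(B) \bigr) \\
&= \sum_{n=1}^{\infty} \mu\Bigl( T^{-1}\Bigl( T^{-n+1}(B) \cap \bigcap_{k=0}^{n-2} T^{-k}(A^c) \Bigr) \setminus \Bigl( T^{-n}(B) \cap \bigcap_{k=0}^{n-1} T^{-k}(A^c) \Bigr) \Bigr) \\
&= \sum_{n=1}^{\infty} \Bigl[ \mu\Bigl( T^{-n+1}(B) \cap \bigcap_{k=0}^{n-2} T^{-k}(A^c) \Bigr) - \mu\Bigl( T^{-n}(B) \cap \bigcap_{k=0}^{n-1} T^{-k}(A^c) \Bigr) \Bigr] \\
&= \mu|_A(B) - \lim_{n\to\infty} \mu\Bigl( T^{-n}(B) \cap \bigcap_{k=0}^{n-1} T^{-k}(A^c) \Bigr) \le \mu|_A(B).
\end{aligned}$$

On the other hand, applying the above observation to $A \setminus B \in \mathcal{B}_A$ instead of B, we infer
that

$$\mu|_A\bigl( T_A^{-1}(B) \bigr) = \mu(A) - \mu(T_A^{-1}(A \setminus B)) \ge \mu(A) - \mu(A \setminus B) = \mu(B).$$

Combining these two inequalities proves the assertion.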

Fig. 2.1. The induced map $F_A$ of the Farey map on the interval A := [1/2, 1].

Example 2.4.30. For the Farey map F, let A := [1/2, 1]. Then, the induced map $F_A : A \to A$ is given by

$$F_A(x) := F^{\varphi(x)}(x) = F^{x_2}(x), \quad \text{for } x = [1, x_2, x_3, \ldots].$$

So, in this case, the sets {φ = n} are equal to the collection of second-level Gauss
cylinder sets {C(1, n) : n ∈ N}. In fact, we can explicitly calculate the map $F_A$ as
follows:

$$F_A(x) = \frac{1 - x}{nx - (n - 1)}, \quad \text{for } x = [1, n, x_3, x_4, \ldots] \in C(1, n).$$

Also, note that the action of $F_A$ on the continued fraction expansion of a point $x = [1, x_2, x_3, \ldots] \in A$ is given by $F_A([1, x_2, x_3, x_4, \ldots]) = [1, x_3, x_4, \ldots]$. (You will be asked
to check this in Exercise 2.6.5).
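
The closed formula for $F_A$ can be checked against direct iteration of F using exact rational arithmetic. The sketch below is illustrative only: the function names are hypothetical, the Farey map is assumed to be F(x) = x/(1 − x) on [0, 1/2] and F(x) = (1 − x)/x on (1/2, 1], and the digit n = x_2 is computed as the integer part of x/(1 − x). At the endpoints of the cylinder sets C(1, n) (where x/(1 − x) is an integer) the two computations can disagree, but these points form a null set.

```python
from fractions import Fraction
import math


def farey(x):
    """Farey map: x/(1-x) on [0, 1/2] and (1-x)/x on (1/2, 1] (assumed definition)."""
    if 2 * x <= 1:
        return x / (1 - x)
    return (1 - x) / x


def induced_farey(x):
    """Return (F_A(x), phi(x)) for 1/2 <= x < 1 by iterating F until the orbit is back in A."""
    steps = 1
    y = farey(x)
    while 2 * y < 1:  # y is still outside A = [1/2, 1]
        y = farey(y)
        steps += 1
    return y, steps


def induced_formula(x):
    """Return ((1 - x)/(n x - (n - 1)), n) with n = floor(x/(1 - x)), for 1/2 < x < 1."""
    n = math.floor(x / (1 - x))
    return (1 - x) / (n * x - (n - 1)), n


if __name__ == "__main__":
    x = Fraction(7, 10)  # x = [1, 2, 3], so n = 2 and F_A(x) = [1, 3] = 3/4
    print(induced_farey(x), induced_formula(x))
```

Both functions return the pair (image point, return time), so agreement confirms at once the formula for $F_A$ and the identity φ(x) = x_2 on the tested points.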

Proposition 2.4.31. The Farey map F is ergodic with respect to the measure νF .

Proof. To shorten the notation in what follows, let us denote the Borel σ-algebra on
[0, 1] by $\mathcal{B}$ and the Borel σ-algebra on [1/2, 1] =: A by $\mathcal{B}_A$.
It was shown in Proposition 2.3.19 that the map F preserves the σ-finite Borel
measure $\nu_F$ on the unit interval which is defined by the density $h_F$, given by $h_F(x) = 1/x$. It therefore follows from Proposition 2.4.29 that $F_A$ preserves the measure $\nu_F|_A$.
We will now show that the induced system $(A, \mathcal{B}_A, \nu_F|_A, F_A)$ of the Farey map and
the Gauss system $([0, 1], \mathcal{B}, m_G, G)$ are measure-theoretically isomorphic, where $\nu_F|_A$ is normalised to a probability measure, that is, divided by $\nu_F(A) = \log 2$. Recall that
this means that there exist sets X ⊆ [1/2, 1] and Y ⊆ [0, 1] such that $\nu_F|_A(X) = m_G(Y) = 1$
and a measure-preserving function ψ : X → Y such that $\psi \circ F_A = G \circ \psi$. Here, remember
that the function ψ being measure-preserving means that $\nu_F|_A \circ \psi^{-1}(B) = m_G(B)$, for all
$B \in \mathcal{B}$. Indeed, it suffices to let X = [1/2, 1), Y = (0, 1] and the function ψ be equal to the
right-hand branch of the Farey map itself, that is, let $\psi(x) := F|_A(x)$ for all x ∈ [1/2, 1).
Then, if $x = [1, x_2, x_3, \ldots] \in A$, we have that

$$F|_A \circ F_A(x) = F|_A([1, x_3, x_4, \ldots]) = [x_3, x_4, \ldots] = G([x_2, x_3, \ldots]) = G \circ F|_A(x).$$

Furthermore,

$$\begin{aligned}
\nu_F|_A\bigl( (F|_A)^{-1}([a, b]) \bigr) &= \nu_F|_A\left( \left[ \frac{1}{1+b}, \frac{1}{1+a} \right] \right) = \frac{1}{\log 2} \int_{1/(1+b)}^{1/(1+a)} \frac{1}{x} \, dx \\
&= \frac{1}{\log 2} \cdot \log\left( \frac{b+1}{a+1} \right) = m_G([a, b]).
\end{aligned}$$

It is clear that two measure-theoretically isomorphic systems are either both ergodic
or both not ergodic. Since we already know that G is ergodic, it follows from the
argument above that $F_A$ is also ergodic with respect to $\nu_F|_A$. Therefore, we can use
Proposition 2.4.28 to conclude that the map F is ergodic with respect to $\nu_F$.

Remark 2.4.32. An argument similar to the one we have just given for F can be used
to show that each α-Farey map F α is ergodic with respect to the invariant measure
discovered in Proposition 2.3.20. We leave the details to Exercise 2.6.9.

2.4.5 Uniqueness of the invariant measures for F and F α

We will now turn to the question of uniqueness of Lebesgue-absolutely continuous
invariant measures. It will turn out that for rather general systems there is, up to
multiplication by a constant, only one λ-absolutely continuous invariant measure.
The proof will employ the induced maps introduced in the previous section. We will
also need the following proposition.

Proposition 2.4.33. Let $(X, \mathcal{B}, \mu, T)$ be a conservative and ergodic measure-preserving
system, and for any $A \in \mathcal{B}$ with 0 < μ(A) < ∞, let φ denote the return time function on the
set A. Then,
(a) $\displaystyle \mu(B) = \sum_{n=0}^{\infty} \mu(A \cap \{\varphi > n\} \cap T^{-n}(B))$, for all $B \in \mathcal{B}$.
(b) $\displaystyle \mu(X) = \int_A \varphi \, d\mu$.

Proof. By Lemma 2.4.3, every set A satisfying 0 < μ(A) < ∞ is a sweep-out set for T.
Therefore, the function φ is well defined. Observe that for all n ≥ 0,

$$T^{-1}(A^c \cap \{\varphi > n\}) = (A \cap \{\varphi > n+1\}) \cup (A^c \cap \{\varphi > n+1\}). \tag{2.10}$$

(The proof of this fact is left to Exercise 2.6.8.) Now suppose that for all $B \in \mathcal{B}$ and some
fixed n ∈ N, we have

$$\mu(B) = \sum_{k=0}^{n} \mu(A \cap \{\varphi > k\} \cap T^{-k}(B)) + \mu(A^c \cap \{\varphi > n\} \cap T^{-n}(B)). \tag{2.11}$$

Then, using (2.10) and the T-invariance of μ, we obtain that

$$\begin{aligned}
\mu(B) &= \sum_{k=0}^{n} \mu(A \cap \{\varphi > k\} \cap T^{-k}(B)) + \mu(T^{-1}(A^c \cap \{\varphi > n\}) \cap T^{-(n+1)}(B)) \\
&= \sum_{k=0}^{n} \mu(A \cap \{\varphi > k\} \cap T^{-k}(B)) \\
&\quad + \mu\bigl( ((A \cap \{\varphi > n+1\}) \cup (A^c \cap \{\varphi > n+1\})) \cap T^{-(n+1)}(B) \bigr) \\
&= \sum_{k=0}^{n+1} \mu(A \cap \{\varphi > k\} \cap T^{-k}(B)) + \mu(A^c \cap \{\varphi > n+1\} \cap T^{-(n+1)}(B)).
\end{aligned}$$

Consequently, since the formula is obviously true for n = 0, we have shown by
induction that (2.11) holds for all n ≥ 0. In order to finish the proof of part (a), we
must show that $\lim_{n\to\infty} \mu(A^c \cap \{\varphi > n\} \cap T^{-n}(B)) = 0$. In order to do this, we split the
remainder of the proof up into three cases.
(i) Let $B = T^{-1}(A)$. Then,

$$\begin{aligned}
\mu(A) = \mu(T^{-1}(A)) &= \sum_{k=0}^{n} \mu(A \cap \{\varphi > k\} \cap T^{-(k+1)}(A)) + \mu(A^c \cap \{\varphi > n\} \cap T^{-(n+1)}(A)) \\
&= \sum_{k=0}^{n} \mu(A \cap \{\varphi = k+1\}) + \mu(A^c \cap \{\varphi = n+1\}),
\end{aligned}$$

and since we can write $\mu(A) = \sum_{k=0}^{\infty} \mu(A \cap \{\varphi = k\}) < \infty$, we have that $\lim_{n\to\infty} \mu(A^c \cap \{\varphi > n\} \cap T^{-n}(B)) = 0$, as required.
Incidentally, notice that the above calculation also shows that, for all n ≥ 0,

$$\begin{aligned}
\mu(A^c \cap \{\varphi = n+1\}) &= \mu(A) - \sum_{k=0}^{n} \mu(A \cap \{\varphi = k+1\}) \\
&= \mu\Bigl( A \setminus \bigcup_{k=0}^{n} (A \cap \{\varphi = k+1\}) \Bigr) \\
&= \mu(A \cap \{\varphi > n+1\}).
\end{aligned}$$

This will be useful in case (iii), below.
(ii) Suppose now that B ⊆ A. Then, $A^c \cap \{\varphi > n\} \cap T^{-n}(B) = \emptyset$ for all n ≥ 1, so in this case the proof
is finished.
(iii) Finally, let $B \subseteq A^c \cap \{\varphi = N\}$, for some fixed N ∈ N. In this case, we have that
$A^c \cap \{\varphi > n\} \cap T^{-n}(B) \subseteq A^c \cap \{\varphi = n + N\}$. Thus, we have that

$$\lim_{n\to\infty} \mu(A^c \cap \{\varphi > n\} \cap T^{-n}(B)) \le \lim_{n\to\infty} \mu(A^c \cap \{\varphi = n + N\}) = \lim_{n\to\infty} \mu(A \cap \{\varphi > N + n\}) = 0,$$

where the last two equalities are due to the observation made at the end of case
(i) and the fact that the set A is assumed to be of finite measure. This finishes the
proof of case (iii) and so completes the proof of part (a).

For part (b), if we substitute X for B into part (a), we obtain that

$$\mu(X) = \sum_{n=0}^{\infty} \mu(A \cap \{\varphi > n\} \cap T^{-n}(X)) = \sum_{n=0}^{\infty} \mu(A \cap \{\varphi > n\}) = \int_A \varphi \, d\mu.$$

Remark 2.4.34. The result in Proposition 2.4.33 (b) is known as Kac’s formula.
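
Both parts of Proposition 2.4.33 can be verified by hand in the simplest possible setting. The sketch below is a toy example chosen here, not taken from the text: X = {0, ..., 11} with counting measure, T a cyclic rotation (which is measure-preserving, conservative and ergodic since it consists of a single cycle), and arbitrary sets A and B. All names are hypothetical.

```python
def return_time(T, A, x):
    """phi(x) = inf{n >= 1 : T^n(x) in A}."""
    n, y = 1, T(x)
    while y not in A:
        n, y = n + 1, T(y)
    return n


def kac_sum(T, A):
    """Integral of phi over A with respect to counting measure."""
    return sum(return_time(T, A, x) for x in A)


def tower_count(T, A, B):
    """Right-hand side of part (a): sum over n >= 0 of #(A ∩ {phi > n} ∩ T^{-n}(B))."""
    total = 0
    for x in A:
        y = x
        for _ in range(return_time(T, A, x)):  # n = 0, ..., phi(x) - 1
            if y in B:
                total += 1
            y = T(y)
    return total


N = 12


def rotation(x):
    """Rotation by 5 on Z/12; gcd(5, 12) = 1, so the whole space is one cycle."""
    return (x + 5) % N


A = {0, 3, 4}
B = {1, 2, 3, 7}

if __name__ == "__main__":
    print(kac_sum(rotation, A), N)               # Kac's formula: both equal 12
    print(tower_count(rotation, A, B), len(B))   # part (a): both equal 4
```

Here the return times on A are 3, 5 and 4, which add up to the total mass 12, and every point of X is visited exactly once during some excursion starting in A, which is the combinatorial content of part (a).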

Theorem 2.4.35. Let $(X, \mathcal{B}, \mu, T)$ be a conservative, ergodic, non-singular (but not
necessarily measure-preserving) system. Then, up to multiplication by a constant, there
is at most one μ-absolutely continuous σ-finite T-invariant measure.

Proof. Let $m_1$ and $m_2$ be two non-zero, T-invariant σ-finite measures that are both
absolutely continuous with respect to μ. Then, let $B \in \mathcal{B}$ with μ(B) > 0. Since T is
conservative and ergodic, Lemma 2.4.3 implies that the set B is a sweep-out set for
T with respect to the measure μ. That is, we have

$$\bigcup_{n=0}^{\infty} T^{-n}(B) = X \mod \mu.$$

Therefore, since $\mu(X \setminus \bigcup_{n=0}^{\infty} T^{-n}(B)) = 0$, we also have that

$$m_1\Bigl( X \setminus \bigcup_{n=0}^{\infty} T^{-n}(B) \Bigr) = 0 \quad \text{and} \quad m_2\Bigl( X \setminus \bigcup_{n=0}^{\infty} T^{-n}(B) \Bigr) = 0.$$

In other words, the set B is also a sweep-out set for T with respect to $m_1$ and $m_2$.
In particular, $m_1(B), m_2(B) > 0$, so the measures $m_1$ and $m_2$ are in fact in the same
measure class as μ.
Now choose $A \in \mathcal{B}$ such that $0 < m_1(A) < \infty$ and $0 < m_2(A) < \infty$. We may assume,
without loss of generality, that $m_1(A) = m_2(A) = 1$. Then, the measures $m_1|_A$ and $m_2|_A$
are equivalent ergodic $T_A$-invariant probability measures for the dynamical system
given by $T_A : (A, \mathcal{B}_A) \to (A, \mathcal{B}_A)$. Thus, according to Proposition 2.4.8, we have that $m_1 = m_2$ on $\mathcal{B}_A$. The formula in Proposition 2.4.33 (a) then yields that $m_1 = m_2$ on all of $\mathcal{B}$.

Corollary 2.4.36. Up to multiplication by a constant, the invariant measures $\nu_F$ for the
Farey system $([0, 1], \mathcal{B}, F)$ and $\nu_\alpha$ for the α-Farey system $([0, 1], \mathcal{B}, F_\alpha)$ are unique.

Proof. First, both F and F α are non-singular with respect to λ, since νF , να and λ are
in the same measure class. Then, as F and F α are both conservative and ergodic (see
Proposition 2.4.31 and Exercise 2.6.9), an application of Theorem 2.4.35 gives that both
νF and να are unique.

2.4.6 Proof of Hopf’s Ratio Ergodic Theorem

Now, we will turn our attention to a proof of Hopf’s Ratio Ergodic Theorem. The proof
we will shortly present is due originally to Zweimüller [Zwe04]. It exploits the idea of
inducing in a way that will allow us to apply the finite measure version of Birkhoff’s
Pointwise Ergodic Theorem.
Before we begin the proof, let us first fix some notation. Throughout, the system
(X, B , μ, T) is assumed to be conservative, ergodic and measure-preserving. For f ∈
$L^1(\mu)$, we denote ergodic sums for the system T by

\[
S_n(f) := \sum_{j=0}^{n-1} f \circ T^j.
\]

We let A be a sweep-out set for T and consider the induced transformation $T_A : A \to A$ on A. For a measurable function $h : A \to \mathbb{R}$, we denote the ergodic sums for the induced system by

\[
S_n^A(h) := \sum_{j=0}^{n-1} h \circ T_A^j.
\]

A particularly important example is given by

\[
\varphi_n := S_n^A(\varphi) = \sum_{j=0}^{n-1} \varphi \circ T_A^j, \tag{2.12}
\]

where $\varphi : A \to \mathbb{N}$ is the return time function on A. Note that for a specific x ∈ A, the j-th summand $\varphi \circ T_A^j(x)$ inside this sum is equal to the length of the j-th excursion of the orbit $(T^n(x))_{n \ge 0}$ to the set A. To have a more concrete idea of what this means, it helps to think in terms of continued fractions. So, if $x = [1, x_1, x_2, x_3, \ldots] \in A := [1/2, 1]$ and if $F_A$ denotes the Farey map induced on A, then $\varphi_1(x) := \varphi(x) = x_1$, $\varphi_2(x) := \varphi(x) + \varphi(F_A(x)) = x_1 + x_2$, and so on; in general,

\[
\varphi_n(x) := \sum_{i=1}^{n} x_i.
\]
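
The relation between return times to A = [1/2, 1] and continued fraction digits can be checked numerically. The following is a minimal sketch in exact rational arithmetic. The function names (`from_cf`, `farey`, `return_times`) are illustrative choices rather than notation from the text, and the explicit formula used for the Farey map (x/(1 − x) on [0, 1/2] and (1 − x)/x on (1/2, 1]) is the standard one, assumed here to agree with the definition given earlier in the book.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def from_cf(digits):
    """Return the rational number [a_1, a_2, ..., a_k] as a Fraction."""
    x = Fraction(0)
    for a in reversed(digits):
        x = 1 / (a + x)
    return x

def farey(x):
    """Farey map (assumed form): x/(1-x) on [0, 1/2], (1-x)/x on (1/2, 1]."""
    return x / (1 - x) if x <= HALF else (1 - x) / x

def return_times(x, count):
    """First `count` successive return times of x to A = [1/2, 1]."""
    times = []
    for _ in range(count):
        n = 0
        while True:
            x = farey(x)
            n += 1
            if x >= HALF:  # the orbit is back in A
                break
        times.append(n)
    return times

# x = [1, x_1, x_2, ...] with x_1, ..., x_4 = 3, 2, 5, 4; the trailing digits
# only serve to keep the finite expansion away from the boundary point 1/2.
x = from_cf([1, 3, 2, 5, 4, 7, 6])
times = return_times(x, 4)
print(times)       # [3, 2, 5, 4]
print(sum(times))  # 14, that is, phi_4(x) = x_1 + x_2 + x_3 + x_4
```

Rational inputs are used only so that the arithmetic is exact; the statement in the text concerns general points of A.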

Notice also that, trivially, we have

\[
S_n^A(A) := S_n^A(1_A) = n, \quad \text{for all } n \ge 1.
\]

The idea of chopping up the orbits of points under T into pieces corresponding to each excursion to the set A is a useful one. We can also apply this idea to obtain the induced version of a function f : X → R, by adding up the values of the function observed during the first excursion and representing the result as a single function on A.

Definition 2.4.37. For $f : X \to \mathbb{R}$, let the function $f^A : A \to \mathbb{R}$ be defined by

\[
f^A(x) := \sum_{j=0}^{\varphi(x)-1} f \circ T^j(x).
\]

The function $f^A$ is referred to as the induced version of f on A.

Lemma 2.4.38. For an integrable function $f : X \to \mathbb{R}$, the following hold.

(a) $S_{\varphi_n}(f) = S_n^A(f^A)$, for all n ∈ N.

(b) $\int_X f \, d\mu = \int_A f^A \, d\mu$.

Proof. To prove part (a), we observe that for any n ∈ N, the section of orbit $x, T(x), \ldots, T^{\varphi_n(x)-1}(x)$ that determines the sum $S_{\varphi_n}(f)$ consists of n complete excursions to A (that is, $T^{\varphi_n(x)}(x) \in A$). Therefore, we have that

\begin{align*}
S_{\varphi_n}(f) &= S_{\varphi_1}(f) + S_{\varphi_2 - \varphi_1}(f \circ T_A) + \cdots + S_{\varphi_n - \varphi_{n-1}}(f \circ T_A^{n-1}) \\
&= S_{\varphi}(f) + S_{\varphi \circ T_A}(f \circ T_A) + \cdots + S_{\varphi \circ T_A^{n-1}}(f \circ T_A^{n-1}) \\
&= S_{\varphi}(f) + (S_{\varphi}(f)) \circ T_A + \cdots + (S_{\varphi}(f)) \circ T_A^{n-1} = S_n^A(f^A).
\end{align*}

To prove part (b), let $f := 1_B$ for some B ∈ B. Using Proposition 2.4.33 (a), we then have that

\begin{align*}
\int_X 1_B \, d\mu = \mu(B) &= \sum_{n=0}^{\infty} \mu\bigl(A \cap \{\varphi > n\} \cap T^{-n}(B)\bigr) \\
&= \int_A \sum_{n=0}^{\infty} 1_{A \cap \{\varphi > n\}} \cdot 1_B \circ T^n \, d\mu \\
&= \int_A \Bigl( \sum_{n=0}^{\varphi - 1} 1_B \circ T^n \Bigr) d\mu = \int_A (1_B)^A \, d\mu.
\end{align*}

Hence, the assertion in part (b) holds for characteristic functions. A standard ar-
gument from measure theory then finishes the proof; we leave the details as an
exercise.

Remark 2.4.39. Lemma 2.4.38 (b) also yields Kac's formula (see Remark 2.4.34) as a corollary, by simply choosing $f := 1_X$, for which $f^A = \varphi$. In particular, note that this formula implies that the T-invariant measure μ is infinite if and only if the return-time function to any set A of positive finite measure is non-integrable.
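
In the finite measure case, Kac's formula says that the mean return time to A, averaged over A, equals μ(X)/μ(A). The sketch below illustrates this on an irrational rotation of the circle with Lebesgue measure. This system is an assumption brought in purely for illustration (it is not one of the systems studied in this section), and the function name `mean_return_time` is hypothetical. Averaging along a single orbit, as in Exercise 2.6.6, the mean return time to A = [0, 1/4) should be close to 1/λ(A) = 4.

```python
import math

def mean_return_time(alpha, a_len, num_returns):
    """Average return time to A = [0, a_len) along the orbit of 0
    under the rotation x -> x + alpha (mod 1)."""
    x, steps, returns = 0.0, 0, 0
    while returns < num_returns:
        x = (x + alpha) % 1.0
        steps += 1
        if x < a_len:        # the orbit has returned to A
            returns += 1
    return steps / returns

golden = (math.sqrt(5) - 1) / 2   # an irrational rotation number
print(mean_return_time(golden, 0.25, 10_000))   # close to 1 / 0.25 = 4
```

Floating point arithmetic is used here, so the output is only an approximation of the ergodic average.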

We are now in a position to provide a proof of Hopf’s Ratio Ergodic Theorem.

Proof of Theorem 2.4.24. Let A be a sweep-out set for T. First observe that it suffices to prove that for all $f \in L^1(\mu)$, we have that

\[
\lim_{n \to \infty} \frac{S_n(f)}{S_n(1_A)}(x) = \frac{\int_X f \, d\mu}{\mu(A)}, \quad \text{for } \mu\text{-a.e. } x \in A. \tag{2.13}
\]

Indeed, the set of points where this limit exists and is equal to the right-hand side of the equality in (2.13) is T-invariant and of strictly positive μ-measure (since μ(A) > 0). Therefore, the correct limit must be attained μ-a.e., by ergodicity. Then, if the same assertion is made for $g \in L^1(\mu)$, with the extra conditions that g ≥ 0 and $\int_X g \, d\mu > 0$, the assertion of the theorem follows immediately.
Therefore, we are left only to give a proof of (2.13). For this, consider the induced map $T_A$. In light of Proposition 2.4.29, we have that $T_A$ is an ergodic measure-preserving transformation on the finite measure space $(A, \mathcal{B}_A, \mu|_A)$. We can therefore apply Birkhoff's Pointwise Ergodic Theorem to $T_A$ and the induced function $f^A$, which is integrable by Lemma 2.4.38, to deduce that

\[
\lim_{n \to \infty} \frac{S_{\varphi_n}(f)}{S_{\varphi_n}(1_A)} = \lim_{n \to \infty} \frac{S_n^A(f^A)}{n} = \int f^A \, d\mu|_A = \frac{\int_X f \, d\mu}{\mu(A)}, \quad \mu\text{-a.e. on } A. \tag{2.14}
\]

This proves (2.13) for μ-a.e. x ∈ A for the subsequence $(\varphi_n(x))_{n \ge 1}$. It remains to demonstrate convergence for the full sequence.

By the linearity of the integral, we may assume without loss of generality that f ≥ 0. Then the sequence $(S_n(f))_{n \ge 1}$ is non-decreasing in n. Now, for a.e. x ∈ A we find for every k ∈ N a positive integer n such that $\varphi_{n-1}(x) \le k < \varphi_n(x)$. Therefore, observing that $S_k(1_A)(x) = n - 1$ and using Lemma 2.4.38 (a), we have

\[
\frac{S_{n-1}^A(f^A)(x)}{n-1} \le \frac{S_k(f)(x)}{S_k(1_A)(x)} \le \frac{n}{n-1} \cdot \frac{S_n^A(f^A)(x)}{n}.
\]

Since n tends to infinity as k tends to infinity, both outer terms converge to the limit in (2.14), and the proof is finished.

2.5 Exactness revisited

Recall from Definition 2.4.10 that a non-singular transformation T of a σ-finite measure space (X, B, μ) is said to be exact if for each B in the tail σ-algebra $\bigcap_{n \in \mathbb{N}} T^{-n}(\mathcal{B})$ we have that either μ(B) or μ(X \ B) vanishes. In this section, our first aim is to prove
the exactness of the Farey map. The second aim will be to give a useful equivalent
formulation of exactness, known as Lin’s criterion.
Let us first give a different characterisation of exactness, which we will employ to prove that the Farey map is exact. This characterisation originates in a paper by Miernowski and Nogueira [MN13], and a more general formulation can be found in [Len14]. Similar, although not equivalent, ideas can already be found in the paper by Rokhlin [Rok64] from the 1960s in which exactness was first introduced. We will show that exactness of a transformation is implied by the following intersection property.

Definition 2.5.1. Let (X, B, μ) be a σ-finite measure space and let T : (X, B, μ) → (X, B, μ) denote a bi-measurable map (that is, T is measurable and T(B) ∈ B for all B ∈ B). Then T is said to satisfy the intersection property with respect to the measure μ provided that for every A ∈ B with positive measure, there exists some k ≥ 1, depending on A, such that $\mu(T^k(A) \cap T^{k+1}(A)) > 0$.

Lemma 2.5.2. Let the map T : (X, B , μ) → (X, B , μ) be bi-measurable and ergodic with
respect to μ. If T satisfies the intersection property, then T is exact.

Proof. Suppose that T is bi-measurable, ergodic and satisfies the intersection property, and let $A \in \bigcap_{m \in \mathbb{N}} T^{-m}(\mathcal{B})$. Suppose that μ(A) > 0. In order to show that T is exact, we have to show that the complement of A has μ-measure equal to zero. Since A belongs to the tail σ-algebra, we have that $T^{-m}(T^m(A)) = A$, for all m ≥ 0. We then have for all m ≥ 0,

\begin{align*}
T^{m+1}(T^{-1}(A) \setminus A) &= T^{m+1}\bigl(T^{-1}(T^{-m}(T^m(A))) \setminus T^{-(m+1)}(T^{m+1}(A))\bigr) \\
&= T^{m+1}\bigl(T^{-(m+1)}(T^m(A)) \setminus T^{-(m+1)}(T^{m+1}(A))\bigr) \\
&= T^m(A) \setminus T^{m+1}(A).
\end{align*}

Using this, it follows for all m ≥ 1, that

\begin{align*}
& T^m(T^{-1}(A) \setminus A) \cap T^{m+1}(T^{-1}(A) \setminus A) \\
&\quad = \bigl(T^{m-1}(A) \setminus T^m(A)\bigr) \cap \bigl(T^m(A) \setminus T^{m+1}(A)\bigr) = \emptyset.
\end{align*}

In particular, this shows that $\mu\bigl(T^m(T^{-1}(A) \setminus A) \cap T^{m+1}(T^{-1}(A) \setminus A)\bigr) = 0$, for all m ≥ 1. Hence, by the intersection property, we have that $\mu(T^{-1}(A) \setminus A) = 0$. Proceeding similarly for the set $A \setminus T^{-1}(A)$, we also obtain that $\mu(A \setminus T^{-1}(A)) = 0$. Hence, it follows that $T^{-1}(A) = A$ mod μ. From there, the result is an immediate consequence of the ergodicity of T.

We will now begin to work towards the proof that F is exact. This will follow from a pro-
position given below, after we prove the following preparatory lemma. Let us remark
that the main idea of the proof of exactness of F which we present here, including how
to utilise the intersection property given in Definition 2.5.1, is inspired by the ideas of
Lenci in [Len12], where one finds a similar proof valid for more general systems.

Lemma 2.5.3. Consider the Farey system ([0, 1], B, F), and let A be given such that λ(A) > 0. Then

\[
\limsup_{n \to \infty} \lambda\bigl(F^n(A) \cap C(1)\bigr) = \lambda(C(1)).
\]
 
Proof. We always have $\lambda\bigl(F^n(A) \cap C(1)\bigr) \le \lambda(C(1))$. Hence we are left to show that there exists a strictly increasing sequence of positive integers $(n_k)_{k \ge 1}$ such that $\lim_{k \to \infty} \lambda\bigl(F^{n_k}(A) \cap C(1)\bigr) = \lambda(C(1))$. For this let $x = x_1, x_2, x_3, \ldots$ be a Lebesgue-density point² of A and recall that $(\widehat{C}(x_1, \ldots, x_n))_{n \ge 1}$ denotes the shrinking family of Farey cylinder sets each containing x. Note that, since F is ergodic (or by using Halmos's Recurrence Theorem), we can certainly choose x such that there exists a sequence $(n_k)_{k \ge 1}$ such that $x_{n_k+1} = 1$, for all k ∈ N. To shorten the notation, let us define for all k ∈ N,

\[
D_k := \widehat{C}(x_1, \ldots, x_{n_k}, x_{n_k+1}) = \widehat{C}(x_1, \ldots, x_{n_k}, 1).
\]

We have that $F^{n_k}$ is bijective on $D_k$ and we have that $F^{n_k}(D_k) = \widehat{C}(1) = C(1)$. Since x is a Lebesgue-density point of A, it follows that

\[
\lim_{k \to \infty} \frac{\lambda(A \cap D_k)}{\lambda(D_k)} = 1 \quad \text{and} \quad \lim_{k \to \infty} \frac{\lambda(D_k \setminus A)}{\lambda(D_k)} = 0. \tag{2.15}
\]

2 A good reference for the Lebesgue density theorem and Lebesgue density points is Rudin [Rud87];
see in particular Theorem 7.2.

By partitioning C(1), we have

\[
1 = \frac{\lambda(C(1))}{\lambda(C(1))} = \frac{\lambda\bigl(C(1) \setminus F^{n_k}(A)\bigr)}{\lambda(C(1))} + \frac{\lambda\bigl(F^{n_k}(A) \cap C(1)\bigr)}{\lambda(C(1))}.
\]

Here, the limit for k tending to infinity of the first summand in the above expression is equal to zero. Indeed, by first using the fact that $F^{n_k}(D_k) = C(1)$ and then Lemma 2.4.13 and (2.15), it follows that we have

\[
\frac{\lambda\bigl(C(1) \setminus F^{n_k}(A)\bigr)}{\lambda(C(1))} \le \frac{\lambda\bigl(F^{n_k}(D_k \setminus A)\bigr)}{\lambda(C(1))} \ll \frac{\lambda(D_k \setminus A)}{\lambda(D_k)} \to 0,
\]

for k tending to infinity. This finishes the proof.

Proposition 2.5.4. Let A be given such that λ(A) > 0. Then for the Farey map F we have that

\[
\limsup_{n \to \infty} \lambda\bigl(F^n(A) \cap F^{n+1}(A) \cap C(1)\bigr) = \lambda(C(1)).
\]

Proof. Obviously, $\limsup_{n \to \infty} \lambda\bigl(F^n(A) \cap F^{n+1}(A) \cap C(1)\bigr) \le \lambda(C(1))$. As in the proof of Lemma 2.5.3, let $x = x_1, x_2, \ldots$ be a Lebesgue-density point of A. In light of Remark 2.4.18, we have that there exists a sequence $(m_k)_{k \ge 1}$ such that $x_{m_k+1} = x_{m_k+2} = 1$, for all k. Therefore, for both of the sequences $(\widehat{C}(x_1, \ldots, x_{m_k}, x_{m_k+1}))_{k \ge 1}$ and $(\widehat{C}(x_1, \ldots, x_{m_k}, x_{m_k+1}, x_{m_k+2}))_{k \ge 1}$ we can proceed as in the proof of Lemma 2.5.3, which yields that

\[
\lim_{k \to \infty} \lambda\bigl(F^{m_k}(A) \cap C(1)\bigr) = \lim_{k \to \infty} \lambda\bigl(F^{m_k+1}(A) \cap C(1)\bigr) = \lambda(C(1)).
\]

Using this observation we obtain for the lower bound

\begin{align*}
& \lambda\bigl(F^{m_k}(A) \cap F^{m_k+1}(A) \cap C(1)\bigr) \\
&\quad = \lambda\bigl(F^{m_k}(A) \cap C(1)\bigr) + \lambda\bigl(F^{m_k+1}(A) \cap C(1)\bigr) - \lambda\bigl((F^{m_k}(A) \cup F^{m_k+1}(A)) \cap C(1)\bigr) \\
&\quad \ge \lambda\bigl(F^{m_k}(A) \cap C(1)\bigr) + \lambda\bigl(F^{m_k+1}(A) \cap C(1)\bigr) - \lambda(C(1)) \to \lambda(C(1)),
\end{align*}

for k tending to infinity. This proves the proposition.

Corollary 2.5.5. For the Farey system ([0, 1], B, F), let A be given such that λ(A) > 0. Then there exists n ∈ N such that

\[
\lambda\bigl(F^n(A) \cap F^{n+1}(A)\bigr) > 0.
\]

In other words, the Farey map F satisfies the intersection property with respect to the Lebesgue measure λ.

Theorem 2.5.6. The Farey map F is exact with respect to the infinite invariant measure $\nu_F$.

Proof. Recalling that F is ergodic, this follows immediately on combining Corollary 2.5.5 with Lemma 2.5.2 and using the fact that λ and $\nu_F$ are in the same measure class.

Let us now move on to our second goal, the statement and proof of Lin’s equivalent
formulation of exactness [Lin71]. We will use this result in Chapter 5 to obtain
information about certain sets defined in terms of their continued fraction expansion.
In the proof here, we will make use of the dual space of L1 (X, B , μ), which we already
introduced in Remark 2.3.8, and the Banach–Alaoglu Theorem, which can be found,
for instance, as Theorem V.4.2 in Dunford and Schwartz [DS88], if the reader has not
encountered it before.

Theorem 2.5.7 (Lin's Criterion for Exactness). Let (X, B, μ, T) be a measure-preserving system. Then, T is exact if and only if for all $f \in L^1(X, \mathcal{B}, \mu)$ with $\int_X f \, d\mu = 0$ we have for the transfer operator $\widehat{T}$ that

\[
\lim_{n \to \infty} \bigl\| \widehat{T}^n(f) \bigr\|_1 = 0.
\]

Proof. First suppose that T is exact and that $f \in L^1(X, \mathcal{B}, \mu)$ has zero expectation, that is, suppose that $\int_X f \, d\mu = 0$. Then, since $\|\widehat{T}\| = 1$ (see Exercise 2.6.11), the sequence $(\|\widehat{T}^n(f)\|_1)_{n \ge 1}$ is bounded. To show that its limit is zero, fix a subsequence $(n_k)_{k \ge 1}$ such that

\[
\lim_{k \to \infty} \bigl\| \widehat{T}^{n_k}(f) \bigr\|_1 = \limsup_{n \to \infty} \bigl\| \widehat{T}^n(f) \bigr\|_1 < \infty.
\]

If $(g_n)_{n \ge 1}$ is defined by

\[
g_n := \operatorname{sign}\bigl(\widehat{T}^n(f)\bigr) \in L^\infty(X, \mathcal{B}, \mu),
\]

then we have, for all n ∈ N,

\[
\bigl\| \widehat{T}^n(f) \bigr\|_1 = \int g_n \cdot \widehat{T}^n(f) \, d\mu = \int g_n \circ T^n \cdot f \, d\mu.
\]

Since $\|g_n \circ T^n\|_\infty = 1$, it follows from the Riesz Representation Theorem that we can identify each $g_n \circ T^n$ with a bounded linear functional on $L^1(X, \mathcal{B}, \mu)$, that is, with an element in $L^1(X, \mathcal{B}, \mu)^* \cong L^\infty(X, \mathcal{B}, \mu)$. By the Banach–Alaoglu Theorem we know that the closed unit ball in $L^1(X, \mathcal{B}, \mu)^*$ is compact in the weak-* topology. Since $T^{-n-1}\mathcal{B} \subset T^{-n}\mathcal{B}$, it follows that $L^1(X, T^{-n-1}\mathcal{B}, \mu)^* \subset L^1(X, T^{-n}\mathcal{B}, \mu)^*$. Now for each K ∈ N, we consider the non-empty and weak-* compact sets $G_K$ of accumulation points of the sequence $(g_{n_k} \circ T^{n_k})_{k \ge K}$ in $L^1(X, \mathcal{B}, \mu)^*$, that is,

\[
G_K := \bigl\{ g \in L^1(X, T^{-n_K}\mathcal{B}, \mu)^* : g \text{ is an accumulation point of } (g_{n_k} \circ T^{n_k})_{k \ge K} \bigr\},
\]

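as subsets of the weak-* compact unit ball in $L^1(X, \mathcal{B}, \mu)^*$. Since $G_{K+1} \subset G_K$ for all K ∈ N, the intersection property of compact sets implies that $\bigcap_{K \in \mathbb{N}} G_K \ne \emptyset$. Fix $g \in \bigcap_{K \in \mathbb{N}} G_K$. By definition, $g \in L^1(X, T^{-n_K}\mathcal{B}, \mu)^* \cong L^\infty(X, T^{-n_K}\mathcal{B}, \mu)$ for all K ∈ N, so we have that g is measurable with respect to the tail σ-algebra $\bigcap_{n \in \mathbb{N}} T^{-n}\mathcal{B}$. Therefore, by the exactness of T, we have that g must be constant μ-a.e., that is, g = c ∈ [0, ∞) μ-a.e., for some c ∈ R. Since g is an accumulation point of the sequence $(g_{n_k} \circ T^{n_k})_{k \ge 1}$, there exists a subsequence $(n_{k_\ell})_{\ell \ge 1}$ such that $\lim_{\ell \to \infty} \int g_{n_{k_\ell}} \circ T^{n_{k_\ell}} \cdot f \, d\mu = \int g \cdot f \, d\mu$. This gives that

\begin{align*}
0 \le \limsup_{n \to \infty} \bigl\| \widehat{T}^n(f) \bigr\|_1 &= \lim_{k \to \infty} \bigl\| \widehat{T}^{n_k}(f) \bigr\|_1 = \lim_{k \to \infty} \int g_{n_k} \circ T^{n_k} \cdot f \, d\mu \\
&= \lim_{\ell \to \infty} \int g_{n_{k_\ell}} \circ T^{n_{k_\ell}} \cdot f \, d\mu = \int g \cdot f \, d\mu = c \cdot \int f \, d\mu = 0.
\end{align*}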

In order to prove the converse, we assume that T is not exact and construct $f \in L^1(X, \mathcal{B}, \mu)$ with $\int f \, d\mu = 0$ and $\liminf_{n \to \infty} \|\widehat{T}^n(f)\|_1 > 0$. To that end, choose $A \in \bigcap_{n \in \mathbb{N}} T^{-n}\mathcal{B}$ such that 0 < μ(A) < μ(X), which is possible by the σ-finiteness of μ. For the same reason there exists a measurable set B ⊂ X \ A such that 0 < μ(B) < ∞. For $f := 1_A - (\mu(A)/\mu(B)) \, 1_B$, we have that $f \in L^1(X, \mathcal{B}, \mu)$, $\int f \, d\mu = 0$ and $\int_A f \, d\mu = \mu(A) > 0$. Since $A \in \bigcap_{n \in \mathbb{N}} T^{-n}\mathcal{B}$, there exists a sequence $(A_n)_{n \ge 1}$ in B such that $A = T^{-n}A_n$, for each n ∈ N. This yields that for all n ∈ N, we have

\begin{align*}
\bigl\| \widehat{T}^n(f) \bigr\|_1 &\ge \int_{A_n} \bigl| \widehat{T}^n f \bigr| \, d\mu \ge \int_{A_n} \widehat{T}^n f \, d\mu = \int 1_{A_n} \widehat{T}^n f \, d\mu \\
&= \int 1_{A_n} \circ T^n \, f \, d\mu = \int_A f \, d\mu > 0.
\end{align*}

This finishes the proof.
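
Lin's criterion can be observed directly in a simple finite measure example. The sketch below uses the doubling map T(x) = 2x mod 1 on [0, 1] with Lebesgue measure, a standard example of an exact transformation; this system is an assumption brought in for illustration and is not discussed in this section. Its transfer operator is taken to be (f(x/2) + f((x + 1)/2))/2, which acts on step functions over the 2^L dyadic intervals of level L by the index rule used in `transfer_doubling` (a hypothetical helper name).

```python
from fractions import Fraction

def transfer_doubling(f):
    """Transfer operator of T(x) = 2x mod 1 applied to a step function,
    given as the list of its values on the 2^L dyadic intervals of level L."""
    half = len(f) // 2
    # The two preimages x/2 and (x+1)/2 of a point in interval j lie in
    # the intervals j // 2 and half + j // 2, respectively.
    return [(f[j // 2] + f[half + j // 2]) / 2 for j in range(len(f))]

def l1_norm(f):
    """L^1 norm with respect to Lebesgue measure on [0, 1]."""
    return sum(abs(v) for v in f) / len(f)

f = [Fraction(v) for v in (1, -1, 2, -2)]   # integral of f is zero
norms = [l1_norm(f)]
for _ in range(3):
    f = transfer_doubling(f)
    norms.append(l1_norm(f))
print(norms)   # the norms drop to 0 after two iterations
```

For step functions of level L the norm reaches zero after at most L iterations; for general integrable functions with zero integral one only expects convergence to zero, as in the theorem.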


We end this section by showing that if the underlying measure is finite, then exactness
of a transformation implies the following mixing property. Discussion of mixing for
infinite systems, whilst certainly interesting, is a much trickier business and we shall
not go into it here. The interested reader is referred to the work of Lenci, see [Len14],
[Len13], and references therein.

Definition 2.5.8. Let (X, B, μ, T) be a measure-preserving system such that μ(X) = 1. Then T is said to be mixing with respect to the measure μ provided that for every A, B ∈ B, we have that

\[
\lim_{n \to \infty} \mu\bigl(A \cap T^{-n}(B)\bigr) = \mu(A)\mu(B).
\]

Corollary 2.5.9. Let (X, B, μ, T) be a measure-preserving system such that μ(X) = 1. If T is exact, then T is mixing with respect to the measure μ.

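Proof. First note that since T preserves the probability measure μ, we have that $\mu \circ T^{-1} = \mu$, which implies $d(\mu \circ T^{-1})/d\mu = 1$ and hence $\widehat{T} 1 = 1$. Using this, it follows that for each $f \in L^1(X, \mathcal{B}, \mu)$ we have that $\widehat{T}\bigl(f - \int_X f \, d\mu\bigr) = \widehat{T}(f) - \int_X f \, d\mu$. Since $\bigl(f - \int_X f \, d\mu\bigr) \in L^1(X, \mathcal{B}, \mu)$ and $\int_X \bigl(f - \int_X f \, d\mu\bigr) d\mu = 0$, we can apply Lin's Criterion, which gives that $\lim_{n \to \infty} \bigl\| \widehat{T}^n\bigl(f - \int_X f \, d\mu\bigr) \bigr\|_1 = 0$. Using this and the fact that $1_A - \int_X 1_A \, d\mu \in L^1(X, \mathcal{B}, \mu)$, we obtain

\begin{align*}
\lim_{n \to \infty} \mu\bigl(A \cap T^{-n}(B)\bigr) &= \lim_{n \to \infty} \int_X 1_A (1_B \circ T^n) \, d\mu = \lim_{n \to \infty} \int_X (\widehat{T}^n 1_A) 1_B \, d\mu \\
&= \lim_{n \to \infty} \int_X \widehat{T}^n(1_A - \mu(A)) \, 1_B \, d\mu + \mu(A)\mu(B) \\
&= \mu(A)\mu(B).
\end{align*}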

This completes the proof.
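
As a concrete illustration of the mixing property, one can compute λ(A ∩ T^{-n}(B)) exactly for the doubling map T(x) = 2x mod 1 and two intervals A and B. The doubling map is again an assumed example rather than a system from this section, and the helper names `overlap` and `correlation` are hypothetical. The sketch uses the fact that T^{-n}([a, b)) is the union of the 2^n intervals [(j + a)/2^n, (j + b)/2^n), for j = 0, ..., 2^n − 1.

```python
from fractions import Fraction

def overlap(c, d, a, b):
    """Length of the intersection of the intervals [c, d) and [a, b)."""
    return max(Fraction(0), min(d, b) - max(c, a))

def correlation(A, B, n):
    """Lebesgue measure of A ∩ T^{-n}(B) for T(x) = 2x mod 1,
    where A = [c, d) and B = [a, b) are subintervals of [0, 1)."""
    (c, d), (a, b) = A, B
    scale = 2 ** n
    return sum(overlap(c, d, (j + a) / scale, (j + b) / scale)
               for j in range(scale))

A = (Fraction(0), Fraction(1, 3))
B = (Fraction(1, 4), Fraction(1, 2))
for n in range(9):
    print(n, correlation(A, B, n))   # approaches 1/3 * 1/4 = 1/12
```

For intervals with dyadic endpoints the limit value is attained after finitely many steps; for general intervals, as here, the values only converge.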

2.6 Exercises

Exercise 2.6.1. Show that the Lebesgue measure is not invariant under the Gauss map.

Exercise 2.6.2. Show that the system (R, B, λ, T) defined in Example 2.3.12 is really non-singular and that its Hopf decomposition is non-trivial with conservative part given by $C_T = [0, 1]$.

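Exercise 2.6.3. As before, let $F_0 : x \mapsto x/(1 + x)$ and $F_1 : x \mapsto 1/(1 + x)$ denote the two inverse branches of the Farey map F. For $\omega = (\omega_1, \ldots, \omega_n) \in \{0, 1\}^n$, n ∈ N, define

\[
F_\omega := F_{\omega_1} \circ \cdots \circ F_{\omega_n},
\]
\[
E_n := \bigl\{ (\omega_1, \ldots, \omega_n) \in \{0, 1\}^n : \#\{i : \omega_i = 1\} \text{ is even} \bigr\}, \quad \text{and} \quad O_n := \{0, 1\}^n \setminus E_n.
\]

(i) Show that for all n ∈ N and x ∈ (0, 1) we have that

\[
x = \frac{\prod_{\omega \in E_n} F_\omega(x)}{\prod_{\omega \in O_n} F_\omega(x)}.
\]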

(ii) Use the identity in (i) to obtain an alternative proof of the fixed point equation
P F (h) = h of the Ruelle operator P F for the map F. (See Proposition 2.3.19).
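
Before attempting part (i), it can be reassuring to test the product identity on a few values. The following sketch does this in exact rational arithmetic; it is a sanity check only and not a proof, and the helper names `branch` and `ratio` are hypothetical.

```python
from fractions import Fraction
from itertools import product

def F0(x):
    return x / (1 + x)

def F1(x):
    return 1 / (1 + x)

def branch(word, x):
    """F_omega(x) = F_{omega_1} o ... o F_{omega_n}(x); the last letter acts first."""
    for letter in reversed(word):
        x = F1(x) if letter else F0(x)
    return x

def ratio(n, x):
    """Product of F_omega(x) over E_n divided by the product over O_n."""
    even, odd = Fraction(1), Fraction(1)
    for word in product((0, 1), repeat=n):
        if sum(word) % 2 == 0:   # even number of letters equal to 1
            even *= branch(word, x)
        else:
            odd *= branch(word, x)
    return even / odd

x = Fraction(3, 10)
print([ratio(n, x) for n in range(1, 7)])   # each entry equals 3/10
```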

Exercise 2.6.4. Give a proof of Proposition 2.3.21 (b), by verifying the eigenequation
P L α h L α = h L α for the Ruelle operator P L α of an α-Lüroth map L α .

Exercise 2.6.5. Let F A : A → A denote the induced map of the Farey map F on
the set A := [1/2, 1]. Prove that F A ([x1 , x2 , x3 , x4 , . . .]) = [x1 , x3 , x4 , . . .], for all
[x1 , x2 , x3 , x4 , . . .] ∈ A.

Exercise 2.6.6. Let (X, B , μ, T) be an ergodic system with a finite invariant measure μ,
and suppose that E ∈ B is such that μ(E) > 0. Let (n k )k≥0 be the sequence of occurrence
times such that T n k (x) ∈ E for all k ≥ 0 (note that these are guaranteed to exist for μ-a.e.
x by Halmos’s Recurrence Theorem). Show that if we assume that n0 = 0, so x ∈ E, then
we have μ-a.e. that

\[
\lim_{k \to \infty} \frac{n_k}{k} = \frac{1}{\mu(E)}.
\]

Exercise 2.6.7. Prove that in an infinite ergodic system, the assumption of con-
servativity is necessary for the existence of sweep-out sets of finite measure. (See
Lemma 2.4.3.)

Exercise 2.6.8. Prove that where φ(x) := inf {n ≥ 1 : T n (x) ∈ A} and A is a sweep-out
set for T, we have

T −1 (A c ∩ {φ > n}) = (A ∩ {φ > n + 1}) ∪ (A c ∩ {φ > n + 1}), for all n ≥ 0.

Exercise 2.6.9. Taking inspiration from the proof of Proposition 2.4.31, prove that the
map F α is ergodic with respect to the measure να defined in Proposition 2.3.20.

Exercise 2.6.10. Using the duality $(L^1(\mu))^* \cong L^\infty(\mu)$ give a formal proof that the unitary operator $U_T : L^\infty(\mu) \to L^\infty(\mu)$, $f \mapsto f \circ T$, is the dual operator of $\widehat{T}$.

Exercise 2.6.11. Prove that $\|\widehat{T}\| = 1$.

Exercise 2.6.12. Prove the statement in (2.7).

Exercise 2.6.13. Let μ and ν be two σ-finite measures on (X, B) with μ ∼ ν. Show that the Radon–Nikodým density dμ/dν is almost everywhere positive and that $d\nu/d\mu = (d\mu/d\nu)^{-1}$.

Exercise 2.6.14. Show that if v/w is a reduced fraction in (0, 1), then for all $p/q \in F^{-n}(v/w)$ we have that

\[
\Bigl| (F^n)'\Bigl(\frac{p}{q}\Bigr) \Bigr| = \frac{q^2}{w^2}.
\]

Exercise 2.6.15. Show that the statement in Proposition 2.5.4 still holds if we replace

C(1) by any arbitrary Farey-cylinder whose final symbol is equal to 1. (Of course, the
sequence (m k )k≥1 might be a different one).

Exercise 2.6.16. Fill in the gap in the proof of Lin’s Criterion: Prove that if g is
measurable with respect to the tail σ-algebra of T and T is exact, then g is constant
μ-a.e.
3 Renewal theory and α-sum-level sets
In this chapter we will mainly investigate certain subsets of the unit interval which
are defined in terms of the α-Lüroth expansion. However, in order to motivate this
exploration, in the first section we will describe the analogous problem for the
continued fraction expansion. In the second section, we first define the sets we are
interested in and then show how classical results in the field of renewal theory can be
used to obtain detailed information about the sets in question.

3.1 Sum-level sets

One of the goals of Chapter 5 will be to give a detailed measure-theoretical analysis of the following sets $\mathcal{C}_n$, for n ∈ N, which we will call the sum-level sets for the continued fraction expansion:

\[
\mathcal{C}_n := \Bigl\{ [x_1, x_2, x_3, \ldots] \in [0, 1] : \sum_{i=1}^{k} x_i = n \text{ for some } k \in \mathbb{N} \Bigr\}.
\]

The first few of these sets are shown in Fig. 3.1, below. Directly from the definition, we have that $\mathcal{C}_1 = C(1) = [1/2, 1]$. Likewise, it follows that for the next few sum-level sets we have

\[
\mathcal{C}_2 = C(2) \cup C(1, 1), \quad \mathcal{C}_3 = C(3) \cup C(2, 1) \cup C(1, 1, 1) \cup C(1, 2),
\]

and so on.
To begin the inspection of the sequence $(\mathcal{C}_n)_{n \ge 1}$ of these sets, let us consider the lim-inf set, which is defined by

\[
\liminf_{n \to \infty} \mathcal{C}_n := \bigcup_{n \ge 1} \bigcap_{m \ge n} \mathcal{C}_m = \{ x \in [0, 1] : x \in \mathcal{C}_n \text{ for all sufficiently large } n \}.
\]

Fig. 3.1. The first four sum-level sets.

In order for an irrational number x to lie in all of the sets $\mathcal{C}_N, \mathcal{C}_{N+1}, \mathcal{C}_{N+2}, \ldots$, for some N ∈ N, we must have that $x = [x_1, \ldots, x_k, 1, 1, 1, \ldots]$, where $\sum_{i=1}^{k} x_i = N$. In other words, the lim-inf set of the sequence $(\mathcal{C}_n)_{n \ge 1}$ is equal to the set of all noble numbers (see Definition 1.1.13 (d)), that is, irrational numbers whose continued fraction digits are from some point on always equal to 1. As we have already observed, this set is countable. On the other hand, one immediately verifies that the lim-sup set¹

\[
\limsup_{n \to \infty} \mathcal{C}_n := \bigcap_{n \ge 1} \bigcup_{m \ge n} \mathcal{C}_m = \{ x \in [0, 1] : x \in \mathcal{C}_n \text{ for infinitely many } n \}
\]

is equal to the set of all irrational numbers in [0, 1]. Hence, at first sight, the sequence
of sum-level sets appears to be far away from being a canonical dynamical entity.
(However, in Chapter 5 we will show that this is actually not the case.)
For the Lebesgue measure of the first four members of the sequence of the
sum-level sets (cf. Fig. 3.1) one immediately computes that

λ(C1 ) = 1/2, λ(C2 ) = 1/3, λ(C3 ) = 3/10 and λ(C4 ) = 39/140.
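
These four values can be reproduced by a short exact computation. The sketch below relies on two facts that are assumed here rather than derived in this section: the n-th sum-level set is (up to endpoints) the disjoint union of the cylinders C(a_1, ..., a_k) over all tuples of positive integers with a_1 + ... + a_k = n, and the Lebesgue measure of such a cylinder is 1/(q_k (q_k + q_{k-1})), where q_k is the denominator of the k-th convergent. The function names are hypothetical.

```python
from fractions import Fraction

def compositions(n):
    """Yield all tuples of positive integers summing to n."""
    if n == 0:
        yield ()
        return
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield (first,) + rest

def cylinder_length(digits):
    """Lebesgue measure of the continued fraction cylinder C(a_1, ..., a_k),
    using the standard formula 1 / (q_k * (q_k + q_{k-1}))."""
    q_prev, q = 0, 1              # q_{-1} = 0, q_0 = 1
    for a in digits:
        q_prev, q = q, a * q + q_prev
    return Fraction(1, q * (q + q_prev))

def sum_level_measure(n):
    """Lebesgue measure of the n-th sum-level set."""
    return sum(cylinder_length(c) for c in compositions(n))

print([sum_level_measure(n) for n in range(1, 5)])
# [Fraction(1, 2), Fraction(1, 3), Fraction(3, 10), Fraction(39, 140)]
```

Since the number of compositions of n is 2^(n-1), this brute-force approach is only practical for small n.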

From this one might already start to suspect that the sequence (λ (Cn ))n≥1 is decreasing
for n tending to infinity. In fact, it was conjectured by Fiala and Kleban [FK10] that
λ (Cn ) tends to zero, as n tends to infinity. In Section 5.1, we will settle this conjecture
affirmatively, as well as prove some much stronger results. Before this, though, we will
consider the parallel, easier to analyse, situation for the α-Lüroth systems.

3.2 Sum-level sets for the α-Lüroth expansion

In this section, we will study the sequence of the Lebesgue measures of the α-sum-level
sets for an arbitrary α-Lüroth map L α , for a given partition α. Analogous to the
sum-level sets for the continued fraction expansion, defined in the previous section,

1 Recall that we have already encountered lim-sup sets in Chapter 1, specifically in Lemma 1.2.18 (the
Borel–Cantelli Lemma).

these sets are given, for each n ∈ N, by

\[
\mathcal{C}_n^{(\alpha)} := \Bigl\{ x \in C_\alpha(\ell_1, \ell_2, \ldots, \ell_k) : \sum_{i=1}^{k} \ell_i = n, \text{ for some } k \in \mathbb{N} \Bigr\}.
\]

Also, for later convenience, we define $\mathcal{C}_0^{(\alpha)} := [0, 1]$. Our toolkit for the investigation into the sequence $(\lambda(\mathcal{C}_n^{(\alpha)}))_{n \ge 1}$ will consist of classical results from renewal theory.

3.2.1 Classical renewal results

Our aim here is to state and give some ideas of the proofs of some strong renewal
theorems due to Garsia/Lamperti [GL63] and Erickson [Eri70]. Before doing so, we first
state and prove the original discrete renewal theorem due to Erdős, Pollard and Feller
[EFP49]. We begin by defining a renewal pair.

Definition 3.2.1. Let $(v_n)_{n \ge 1}$ be an infinite probability vector, that is, a sequence of non-negative real numbers for which $\sum_{n=1}^{\infty} v_n = 1$. Assume that associated to this vector there exists a sequence $(w_n)_{n \ge 0}$, with $w_0 := 1$, which satisfies the renewal equation:

\[
w_n = \sum_{m=1}^{n} v_m w_{n-m}, \quad \text{for all } n \in \mathbb{N}.
\]

A pair $((v_n)_{n \ge 1}, (w_n)_{n \ge 0})$ of sequences with these properties is referred to as a renewal pair.
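
Since the renewal equation expresses each w_n through earlier terms, the sequence (w_n) is determined recursively by (v_n). A minimal sketch of this recursion in exact arithmetic follows; the function name `renewal_sequence` and the dictionary representation of the probability vector are illustrative assumptions.

```python
from fractions import Fraction

def renewal_sequence(v, n_max):
    """Solve w_n = sum_{m=1}^n v_m * w_{n-m} with w_0 = 1, for n = 0, ..., n_max.
    `v` maps each index m >= 1 to v_m; missing indices count as 0."""
    w = [Fraction(1)]
    for n in range(1, n_max + 1):
        w.append(sum(v.get(m, 0) * w[n - m] for m in range(1, n + 1)))
    return w

# Example probability vector: v_1 = v_2 = 1/2 and v_m = 0 otherwise.
v = {1: Fraction(1, 2), 2: Fraction(1, 2)}
print(renewal_sequence(v, 4))
# [Fraction(1, 1), Fraction(1, 2), Fraction(3, 4), Fraction(5, 8), Fraction(11, 16)]
```

In this example the values oscillate around 2/3, which is the reciprocal of the mean 1 · 1/2 + 2 · 1/2 = 3/2.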

Let us give a brief sketch of the original probabilistic motivation for this definition.
For further details and many examples, we refer the reader to Chapter XIII of Feller
[Fel68a].
Consider a sequence of independent identically distributed random variables $(T_n)_{n \in \mathbb{N}}$ with values in N. (Just think of $T_n$ as the random discrete time between successive occurrences of a 'recurrent event', like the renewal of a burned-out lightbulb.) For the distribution, we write $v_k := \mathbb{P}(T_1 = k)$ for each k ∈ N. Now the probability of the occurrence of the event at time k ∈ N is given by

\[
w_k := \mathbb{P}\Bigl( \sum_{i=1}^{\ell} T_i = k \text{ for some } \ell \in \mathbb{N}_0 \Bigr).
\]

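Since the empty sum is by definition equal to 0 we have $w_0 = 1$. Using the fact that the random variables $(T_n)$ are independent and identically distributed we find

\begin{align*}
w_n &= \mathbb{P}\Bigl( \sum_{i=1}^{\ell} T_i = n \text{ for some } \ell \in \mathbb{N}_0 \Bigr) \\
&= \sum_{m=1}^{n} \mathbb{P}\Bigl( T_1 = m \text{ and } \sum_{i=1}^{\ell} T_i = n \text{ for some } \ell \in \mathbb{N} \Bigr) \\
&= \mathbb{P}(\{T_1 = n\}) + \sum_{m=1}^{n-1} \mathbb{P}\Bigl( T_1 = m \text{ and } \sum_{i=2}^{\ell} T_i = n - m \text{ for some } \ell \in \mathbb{N} \Bigr) \\
&= v_n + \sum_{m=1}^{n-1} v_m \, \mathbb{P}\Bigl( \sum_{i=1}^{\ell} T_{i+1} = n - m \text{ for some } \ell \in \mathbb{N}_0 \Bigr) \\
&= \sum_{m=1}^{n} v_m w_{n-m}.
\end{align*}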

This shows that the renewal equation $w_n = \sum_{m=1}^{n} v_m w_{n-m}$ for n ∈ N has its natural place in probability theory. In the following we will see how to determine the asymptotic behaviour of $(w_n)$ in terms of $(v_n)$ just by analysing the renewal equation.
We are now almost in a position to state and prove the classical discrete renewal
theorem. The proof we give here is essentially (with a few extra details inserted) the
original proof given in [EFP49]. Before we start, we make the following definitions for
a given renewal pair ((v n ), (w n )):

d v := gcd{n ≥ 1 : v n > 0} and d w := gcd{n ≥ 1 : w n > 0}.

Then, for all n with $w_n = 0$ we also have that $v_n = 0$, since using the renewal equation gives that $w_n = 0 = \sum_{m=1}^{n} v_m w_{n-m}$ and so each term in this sum must be equal to zero. In particular, $v_n w_0 = 0$, but since $w_0 = 1$, it follows that $v_n = 0$. This implies that $d_w$ is a factor of $d_v$. It is also possible to show, using a fairly straightforward but somewhat ungainly induction argument, that $d_v$ is a factor of $d_w$. Thus, these two quantities are always equal. We will also need the following elementary technical observation.

Lemma 3.2.2. If d is the greatest common divisor of the sequence of natural numbers $(n_k)_{k \ge 1}$, then there exist numbers K and M with the property that for each m ∈ N such that m ≥ M there exist $c_1, \ldots, c_K \in \mathbb{N}$ such that

\[
m \cdot d = \sum_{k=1}^{K} c_k n_k.
\]

Proof. We can assume that d = 1 (otherwise just divide each of the $n_k$ by d), and also that d is the greatest common divisor of the first K of the given numbers, that is, $\gcd(n_1, n_2, \ldots, n_K) = 1$. It is well known that there then exist integers $b_1, b_2, \ldots, b_K$ with the property that²

\[
b_1 n_1 + \cdots + b_K n_K = 1.
\]

Letting $b := \max\{|b_1|, |b_2|, \ldots, |b_K|\}$ and $M := b n_1 (n_1 + \cdots + n_K)$, we have that each m ≥ M can be written in the form

\[
m = b n_1 (n_1 + \cdots + n_K) + i n_1 + r (b_1 n_1 + \cdots + b_K n_K),
\]

where i ≥ 0 and $0 \le r < n_1$ come from the division algorithm applied to m − M. Collecting the coefficient of each $n_k$ gives the factors $c_k$, namely $c_1 = b n_1 + i + r b_1$ and $c_k = b n_1 + r b_k$ for k ≥ 2, and (since $b n_1 > |b_k| r$) they are clearly positive integers.

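
The construction in this proof is explicit enough to be carried out mechanically. The following sketch does so for a finite list of numbers with greatest common divisor 1; the helper names (`ext_gcd`, `bezout`, `threshold`, `positive_combination`) are hypothetical, and the Bézout coefficients are obtained with the extended Euclidean algorithm, which is one of several possible ways to produce the integers b_k.

```python
def ext_gcd(a, b):
    """Return (g, s, t) with s*a + t*b == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, s, t = ext_gcd(b, a % b)
    return g, t, s - (a // b) * t

def bezout(ns):
    """Return (g, bs) with sum(b_k * n_k) == g == gcd of all n_k."""
    g, coeffs = ns[0], [1]
    for n in ns[1:]:
        g, s, t = ext_gcd(g, n)
        coeffs = [s * c for c in coeffs] + [t]
    return g, coeffs

def threshold(ns):
    """The bound M = b * n_1 * (n_1 + ... + n_K) from the proof."""
    _, bs = bezout(ns)
    return max(abs(x) for x in bs) * ns[0] * sum(ns)

def positive_combination(ns, m):
    """Positive integers c_k with sum(c_k * n_k) == m, for m >= threshold(ns)."""
    g, bs = bezout(ns)
    if g != 1 or m < threshold(ns):
        raise ValueError("need gcd 1 and m >= threshold(ns)")
    b = max(abs(x) for x in bs)
    i, r = divmod(m - threshold(ns), ns[0])   # m - M = i * n_1 + r
    cs = [b * ns[0] + r * bk for bk in bs]
    cs[0] += i
    return cs

ns = [6, 10, 15]
m = threshold(ns) + 7
cs = positive_combination(ns, m)
print(cs, sum(c * n for c, n in zip(cs, ns)) == m)
```

The bound M produced this way is far from optimal; the lemma only requires that some such bound exists.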
Finally, before stating the theorem, we also need the following elementary lemma. We
include the proof for completeness.

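Lemma 3.2.3. Let $(b_n)_{n \ge 1}$ and $(\tilde{b}_n)_{n \ge 1}$ be two sequences of real numbers with the property that $\lim_{n \to \infty} (b_n + \tilde{b}_n)$ exists. Then, provided that we do not have $\liminf_{n \to \infty} b_n = -\infty$ and $\limsup_{n \to \infty} \tilde{b}_n = \infty$, or vice versa, it follows that

\[
\lim_{n \to \infty} (b_n + \tilde{b}_n) = \liminf_{n \to \infty} b_n + \limsup_{n \to \infty} \tilde{b}_n.
\]

Proof. On the one hand we have

\[
\lim_{n \to \infty} (b_n + \tilde{b}_n) - \limsup_{n \to \infty} \tilde{b}_n = \liminf_{n \to \infty} (b_n + \tilde{b}_n) + \liminf_{n \to \infty} (-\tilde{b}_n) \le \liminf_{n \to \infty} b_n.
\]

On the other hand,

\[
\lim_{n \to \infty} (b_n + \tilde{b}_n) - \liminf_{n \to \infty} b_n = \limsup_{n \to \infty} (b_n + \tilde{b}_n) + \limsup_{n \to \infty} (-b_n) \ge \limsup_{n \to \infty} \tilde{b}_n.
\]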

Combining these two inequalities, the lemma is proved.

Theorem 3.2.4 (Discrete Renewal Theorem). Let $((v_n)_{n \ge 1}, (w_n)_{n \ge 0})$ be a renewal pair and suppose that $d_v = 1$. Then

\[
\lim_{n \to \infty} w_n = \frac{1}{\sum_{m=1}^{\infty} m \cdot v_m},
\]

where the limit is understood to be equal to zero if the series in the denominator diverges.

Proof. For ease of notation, throughout we denote $s := \sum_{m=1}^{\infty} m \cdot v_m$. First, we show by induction that $0 \le w_n \le 1$, for each $n \in \mathbb{N}_0$. To start, notice that $w_1 = v_1 \cdot w_0 = v_1 \le 1$.

2 This can be seen, for instance, by considering moduli of integers. The interested reader is referred
to Section 2.9 of The Theory of Numbers, by Hardy and Wright [HW08].

Now suppose that $0 \le w_k \le 1$ for all 0 ≤ k ≤ n − 1. Then

\[
w_n = v_1 \cdot w_{n-1} + v_2 \cdot w_{n-2} + \cdots + v_n \cdot w_0 \le \sum_{k=1}^{n} v_k \le 1.
\]

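Let $w := \limsup_{n \to \infty} w_n$ and pick a subsequence $(w_{n_k})_{k \in \mathbb{N}}$ with the property that $\lim_{k \to \infty} w_{n_k} = w$. Then, for all m ≥ 1, we have, via Lemma 3.2.3, that

\begin{align*}
w = \lim_{k \to \infty} w_{n_k} &= \lim_{k \to \infty} \Bigl( v_m \cdot w_{n_k - m} + \sum_{\substack{1 \le s \le n_k \\ s \ne m}} v_s \cdot w_{n_k - s} \Bigr) \\
&= \liminf_{k \to \infty} v_m \cdot w_{n_k - m} + \limsup_{k \to \infty} \sum_{\substack{1 \le s \le n_k \\ s \ne m}} v_s \cdot w_{n_k - s} \\
&\le v_m \liminf_{k \to \infty} w_{n_k - m} + \sum_{s \ne m} v_s \limsup_{k \to \infty} w_{n_k - s} \\
&\le v_m \liminf_{k \to \infty} w_{n_k - m} + (1 - v_m) w.
\end{align*}

From this it follows immediately that $v_m w \le v_m \liminf_{k \to \infty} w_{n_k - m}$ and therefore, provided that $v_m > 0$, we obtain

\[
\limsup_{n \to \infty} w_n =: w \le \liminf_{k \to \infty} w_{n_k - m}.
\]

Thus,

\[
\lim_{k \to \infty} w_{n_k - m} = w. \tag{3.1}
\]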

Applying this argument many times over, one obtains that equation (3.1) holds for all m such that there exist positive integers $m_1, \ldots, m_j$ with each $v_{m_i} > 0$ so that $m = m_1 + \cdots + m_j$. Given that $d_v = 1$, from Lemma 3.2.2 it follows that every large enough m has this form (where we can do without the factors $c_i$, as there is no reason that the integers $m_i$ have to be distinct). In other words, there exists some M ∈ N such that (3.1) holds for every m ≥ M.
Now, for each $n \in \mathbb{N}_0$, set

\[
r_n := \sum_{m=n+1}^{\infty} v_m.
\]

Then $r_0 = 1$ and

\[
\sum_{m=1}^{\infty} m \cdot v_m = \sum_{m=1}^{\infty} v_m + \sum_{m=2}^{\infty} v_m + \sum_{m=3}^{\infty} v_m + \cdots = \sum_{n=0}^{\infty} r_n.
\]

From the renewal equation and the fact that $r_m - r_{m-1} = -v_m$, we deduce that

\[
r_0 \cdot w_n = w_n = - \sum_{m=1}^{n} (r_m - r_{m-1}) w_{n-m}
\]

and, by bringing the negative terms to the left-hand side, we can write this in the following way:

\[
r_0 \cdot w_n + r_1 \cdot w_{n-1} + \cdots + r_n \cdot w_0 = r_0 \cdot w_{n-1} + \cdots + r_{n-1} \cdot w_0. \tag{3.2}
\]

If we define the left-hand side of Equation (3.2) to be equal to $s_n$, then the right-hand side of Equation (3.2) is equal to $s_{n-1}$. Note that $s_0 = r_0 \cdot w_0 = 1$. Thus, in light of Equation (3.2), we have that $s_n = 1$, for all n ∈ N. In particular,

\[
\sum_{i=0}^{n_k - M} r_i \cdot w_{n_k - (M+i)} = 1. \tag{3.3}
\]
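
The conserved quantity s_n = r_0 w_n + r_1 w_{n-1} + ... + r_n w_0 = 1 is easy to confirm on an example. The sketch below does this exactly for a probability vector with three non-zero entries; the vector and the function name `invariant_values` are hypothetical choices made for illustration.

```python
from fractions import Fraction

def invariant_values(v, n_max):
    """Return [s_0, ..., s_{n_max}] with s_n = sum_{i=0}^n r_i * w_{n-i},
    where w solves the renewal equation for v and r_n = sum_{m > n} v_m."""
    w = [Fraction(1)]
    for n in range(1, n_max + 1):
        w.append(sum(v.get(m, 0) * w[n - m] for m in range(1, n + 1)))
    total = sum(v.values())           # equals 1 for a probability vector
    r = [total - sum(v.get(m, 0) for m in range(1, n + 1))
         for n in range(n_max + 1)]
    return [sum(r[i] * w[n - i] for i in range(n + 1))
            for n in range(n_max + 1)]

v = {1: Fraction(1, 5), 2: Fraction(3, 10), 3: Fraction(1, 2)}
print(invariant_values(v, 10))   # every entry equals 1
```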

We will now show that w = 1/s. First, suppose that s is finite. In that case, for all ε > 0 there exists N ∈ N with

\[
r_0 + r_1 + \cdots + r_N \ge s - \varepsilon.
\]

If k is sufficiently large such that $n_k - M \ge N$, then by (3.3) we have

\[
1 \ge \sum_{i=0}^{N} r_i \cdot w_{n_k - (M+i)}.
\]

It then follows from (3.1) that

\[
1 \ge w (r_0 + r_1 + \cdots + r_N) \ge w (s - \varepsilon).
\]

Since ε was an arbitrary positive number, we obtain the inequality w ≤ 1/s.

On the other hand, from (3.3) and from the pair of inequalities $w_n \le 1$ and $(r_{N+1} + r_{N+2} + \cdots) \le \varepsilon$, we deduce that

\[
1 \le \varepsilon + \sum_{i=0}^{N} r_i \cdot w_{n_k - (M+i)}.
\]

Letting k tend to infinity, the above inequality yields that

\[
1 \le \varepsilon + w \sum_{m=1}^{\infty} m \cdot v_m,
\]

and so we also have the opposite inequality, namely, w ≥ 1/s.



If we are instead in the situation that s is infinite, we have for all C > 0, that there exists an N ∈ N such that

\[
r_0 + r_1 + \cdots + r_N > C,
\]

from which, in a similar way to the above, we obtain the inequality 1 ≥ Cw. Since C can be arbitrarily large, it follows that w = 0. Notice that if $\limsup_{n \to \infty} w_n = 0$, then we must have that $\lim_{n \to \infty} w_n = 0$, as these are all non-negative numbers. Therefore, in the case where s is infinite, the proof is finished.
In the case where s is finite, we also have to show that lim inf n→∞ w n = 1/s. This
proceeds analogously, starting by setting w := lim inf n→∞ w n and then choosing a
subsequence that achieves this lower limit.
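The quantities used in the proof above can be checked numerically. The following is a minimal sketch, assuming a hypothetical probability vector v_1 = 1/2, v_2 = 1/3, v_3 = 1/6 (not taken from the text); the function name `renewal_sequence` is likewise only illustrative. It solves the renewal equation in exact arithmetic, confirms that s_n = 1 for every n as in Equation (3.2), and compares w_n with the predicted limit 1/s.

```python
from fractions import Fraction

def renewal_sequence(v, n_max):
    """Return [w_0, ..., w_{n_max}] with w_0 = 1 and w_n = sum_{m=1}^n v_m w_{n-m}.

    v[m] holds v_m (v[0] is unused); v_m is taken to be 0 beyond the list.
    """
    w = [Fraction(1)]
    for n in range(1, n_max + 1):
        top = min(n, len(v) - 1)
        w.append(sum(v[m] * w[n - m] for m in range(1, top + 1)))
    return w

# Hypothetical example vector (an assumption for illustration only).
v = [Fraction(0), Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
n_max = 40
w = renewal_sequence(v, n_max)

# Tails r_n = sum_{m > n} v_m, so that r_0 = 1.
r = [sum(v[n + 1:], Fraction(0)) for n in range(n_max + 1)]

# s_n = r_0 w_n + r_1 w_{n-1} + ... + r_n w_0, which Equation (3.2) forces to be 1.
s_values = [sum(r[i] * w[n - i] for i in range(n + 1)) for n in range(n_max + 1)]

# s = sum_m m * v_m, the reciprocal of the predicted limit of w_n.
mean = sum(m * v[m] for m in range(1, len(v)))
limit = 1 / mean

print(all(s == 1 for s in s_values), float(w[n_max]), float(limit))
```

Exact rationals are used so that the identity s_n = 1 can be tested without rounding error. Since all v_m here are positive up to m = 3, this is the simpler situation described in Remark 3.2.5.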

Remark 3.2.5. In the above proof, if it so happens that v m > 0 for every m ∈ N, we
could dispense with the slight complication of having to use Lemma 3.2.2, since in
this situation we have that Equation (3.1) holds for every m ∈ N.

We will now state some stronger renewal results obtained by Garsia and Lamperti
[GL63], and by Erickson [Eri70]. Their results are for the case where the limit in the
statement of the discrete renewal theorem is equal to zero. They study the manner in
which the sequence (w n )n≥0 tends to zero, under a certain additional hypothesis which
we will now describe. Let the sequences ((v n )n≥1 , (w n )n≥0 ) be a given renewal pair and
let the two associated sequences (V_n)_{n∈N} and (W_n)_{n∈N} be defined, for all n ∈ N, by
\[
V_n := \sum_{k=n}^{\infty} v_k \quad\text{and}\quad W_n := \sum_{k=1}^{n} w_k . \tag{3.4}
\]

Then the principal assumption in these strong renewal results is that V_n satisfies
\[
V_n = \psi(n)\, n^{-\theta},
\]

for all n ∈ N, for some θ ∈ [0, 1] and for some slowly varying function ψ. (Recall
that slowly varying functions were defined in Section 1.4.4.) Before stating the
theorem, let us also remind the reader that the notation “f (n) ∼ g(n)” means that
limn→∞ f (n)/g(n) = 1. Finally, the constants appearing on the right-hand side of the
first two statements are given in terms of the gamma function (which was originally
introduced by Euler). The gamma function is an extension of the factorial function to
complex arguments, so we have Γ(n) = (n − 1)!, and, considered as an extension to the
open right-half plane, it has no zeros. For more details, we refer the interested reader
to the book Complex Analysis by Gamelin [Gam01].

Strong renewal results by Garsia/Lamperti and Erickson [GL63, Lemma 2.3.1], [Eri70, Theorem 5]:
For θ ∈ [0, 1], we have that
\[
W_n \sim \bigl(\Gamma(2-\theta)\,\Gamma(1+\theta)\bigr)^{-1} \cdot n \cdot \Bigl(\sum_{k=1}^{n} V_k\Bigr)^{-1} .
\]

Also, if θ ∈ (1/2, 1], then
\[
w_n \sim \bigl(\Gamma(2-\theta)\,\Gamma(\theta)\bigr)^{-1} \cdot \Bigl(\sum_{k=1}^{n} V_k\Bigr)^{-1} .
\]

Finally, for θ ∈ (0, 1/2) we have that the limit in the latter formula does not have to exist
in general. However, for θ ∈ (0, 1/2] it is shown in [GL63, Theorem 1.1] that one at least
has
\[
\liminf_{n\to\infty} \, n \cdot w_n \cdot V_n = \frac{1}{\Gamma(\theta)\,\Gamma(1-\theta)} = \frac{\sin \pi\theta}{\pi},
\]
and that the limit exists if we restrict the indices to a set of integers whose complement
is of zero density³.
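To get a feeling for the first of these asymptotics, one can compare both sides numerically. The sketch below is an illustration only: it assumes the hypothetical tail sequence V_n = n^{-θ} with θ = 1/2 (so ψ ≡ 1 and v_n = V_n − V_{n+1}), and the cut-off n = 2000 is an arbitrary choice. The ratio of W_n to the predicted value approaches 1 only slowly, so the output should be read as a plausibility check rather than as evidence of the precise rate.

```python
import math

def renewal_sequence(v, n_max):
    """w_0 = 1 and w_n = sum_{m=1}^n v_m w_{n-m}; v[m] holds v_m, v[0] is unused."""
    w = [1.0]
    for n in range(1, n_max + 1):
        w.append(sum(v[m] * w[n - m] for m in range(1, n + 1)))
    return w

theta = 0.5      # assumed exponent, chosen for illustration
n_max = 2000     # arbitrary cut-off

# Assumed tails V_k = k^{-theta} (index 0 unused), hence v_k = V_k - V_{k+1}.
V = [0.0] + [k ** -theta for k in range(1, n_max + 2)]
v = [0.0] + [V[k] - V[k + 1] for k in range(1, n_max + 1)]

w = renewal_sequence(v, n_max)
W_n = sum(w[1:])  # W_n = w_1 + ... + w_n

# Right-hand side of the first strong renewal statement.
constant = 1.0 / (math.gamma(2 - theta) * math.gamma(1 + theta))
predicted = constant * n_max / sum(V[1:n_max + 1])

ratio = W_n / predicted
print(ratio)
```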
We will not rigorously prove these strong renewal results, as the proofs are
decidedly non-trivial. However, we will provide a sketch of some of the main ideas.
The proof of the first statement in the strong renewal results by Garsia/Lamperti and
Erickson is reasonably straightforward, although it does use some fairly heavy ana-
lytic machinery. The deep result underlying this statement is Karamata’s Tauberian
Theorem, which we state below in the setting of power series (the proof can be found
in [Fel68b]). Before stating this theorem, let us recall the following definitions:
– A measurable function ψ : R+ → R+ is said to be slowly varying if
\[
\lim_{x\to\infty} \frac{\psi(xy)}{\psi(x)} = 1, \quad \text{for all } y > 0 .
\]
– A function f : R+ → R+ is called regularly varying with exponent ρ (with ρ ∈ R) if
for all x ∈ R+ we have
\[
f(x) = x^{\rho} \cdot \psi(x),
\]
where ψ is slowly varying.
– A sequence (b_n)_{n∈N} is called regularly varying with exponent ρ if for all n ∈ N, we
have that b_n = f(n), with f : R+ → R+ regularly varying with exponent ρ.
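As a quick illustration of these definitions (the particular function is chosen here for convenience and does not appear in the text), take ψ(x) := log(e + x), which is positive on R+. Splitting off the leading logarithm shows that ψ is slowly varying, and hence that x^ρ · log(e + x) is regularly varying with exponent ρ:

```latex
\[
\frac{\psi(xy)}{\psi(x)}
  = \frac{\log(e + xy)}{\log(e + x)}
  = \frac{\log x + \log\bigl(y + e/x\bigr)}{\log x + \log\bigl(1 + e/x\bigr)}
  \;\xrightarrow[x \to \infty]{}\; 1
  \qquad \text{for every fixed } y > 0,
\]
\[
\text{so that}\quad
\frac{f(xy)}{f(x)} = y^{\rho}\,\frac{\psi(xy)}{\psi(x)} \longrightarrow y^{\rho}
\quad\text{for } f(x) := x^{\rho}\log(e + x).
\]
```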

3 If we set A(n) := {1, . . . , n} ∩ A, then the density of a set of integers A is given, where the limit
exists, by d(A) := lim_{n→∞} #A(n)/n. For example, if A := {n² : n ∈ N}, then since #A(n) ≤ √n we have
that d(A) = 0.

Theorem 3.2.6 (Karamata’s Tauberian Theorem). Let b_n ≥ 0 for all n ∈ N_0 and suppose
that the series
\[
B(s) := \sum_{n=0}^{\infty} b_n s^n
\]
converges for 0 ≤ s < 1. If ψ is slowly varying and 0 ≤ ρ < ∞, then the following two
statements are equivalent:

(a) \( B(s) \sim \dfrac{1}{(1-s)^{\rho}} \cdot \psi\Bigl(\dfrac{1}{1-s}\Bigr) \), as s → 1−.

(b) \( \displaystyle\sum_{k=0}^{n-1} b_k \sim \dfrac{n^{\rho} \cdot \psi(n)}{\Gamma(1+\rho)} \), as n → ∞.

Furthermore, if the sequence (b_n)_{n∈N} is monotonic and 0 < ρ < ∞, then (a) is equivalent to

(c) \( b_n \sim \dfrac{n^{\rho-1} \cdot \psi(n)}{\Gamma(\rho)} \), as n → ∞.

Finally, if for a family of sequences (b^x_n)_{x∈X} the asymptotic in (b) holds uniformly in
x ∈ X then so does the asymptotic in (a).

Proof. See [Fel68b], Theorem 5 in Section XIII.5. For the uniformity, a detailed
inspection of the proof of the Extended Continuity Theorem (Section XIII.1, Theorem
2a) is needed (cf. Exercises 3.3.6 and 3.3.7).
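A simple sanity check of the three statements, using an example sequence chosen here purely for illustration: let b_n := n + 1, which is monotonic, and take ρ = 2 and ψ ≡ 1. All three asymptotics can then be verified by hand.

```latex
\[
\text{(a)}\quad B(s) = \sum_{n=0}^{\infty} (n+1)\, s^{n} = \frac{1}{(1-s)^{2}},
\]
\[
\text{(b)}\quad \sum_{k=0}^{n-1} b_k = \frac{n(n+1)}{2} \sim \frac{n^{2}}{2} = \frac{n^{2}}{\Gamma(3)},
\]
\[
\text{(c)}\quad b_n = n + 1 \sim n = \frac{n^{2-1}}{\Gamma(2)} .
\]
```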
For the following discussion, we will need to use the notion of a generating function
(see also Chapter XI of [Fel68b]).

Definition 3.2.7. Let (c_i)_{i≥0} be a sequence of real numbers and define
\[
C(s) := \sum_{n=0}^{\infty} c_n s^n .
\]

Then, if C(s) is convergent in some interval −s0 < s < s0 , we say that C(s) is the
generating function of the sequence (c i )i≥0 . Note that if the sequence (c i )i≥0 is bounded,
then C(s) certainly converges in the interval |s| < 1.

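Now, for a given renewal pair ((v_n)_{n≥1}, (w_n)_{n≥0}), and with W_n and V_n defined as in
(3.4), we wish to use Karamata’s Tauberian Theorem to prove that
\[
W_n \sim \bigl(\Gamma(2-\theta)\,\Gamma(1+\theta)\bigr)^{-1} \cdot n \cdot \Bigl(\sum_{k=1}^{n} V_k\Bigr)^{-1} .
\]
To begin, first notice that
\[
\sum_{k=1}^{n} V_k \sim (1-\theta)^{-1} \cdot n^{1-\theta} \cdot \psi(n), \quad \text{as } n \to \infty .
\]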

Now, let us define two generating functions
\[
V(t) := \sum_{n=0}^{\infty} V_n t^n \quad\text{and}\quad W(t) := \sum_{n=0}^{\infty} w_n t^n .
\]
Recalling both that V_n = n^{−θ} ψ(n), where θ ∈ [0, 1] and ψ is a slowly varying function,
and that V_n is monotonically decreasing, we can apply Theorem 3.2.6 (c) =⇒ (a) to
obtain
\[
V(t) \sim \Gamma(1-\theta)\,(1-t)^{\theta-1}\, \psi\Bigl(\frac{1}{1-t}\Bigr) .
\]
Multiplying out and gathering coefficients yields that V(t)W(t) = 1/(1 − t), or, in other
words,
\[
W(t) = \frac{1}{(1-t)\,V(t)} .
\]
Thus,
\[
W(t) \sim \frac{1}{\Gamma(1-\theta)\,\psi\bigl(\frac{1}{1-t}\bigr)}\, (1-t)^{-\theta} .
\]

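Finally, applying Theorem 3.2.6 (a) =⇒ (b), and using the asymptotic for the sum of the V_k noted above in the last step, we have that
\[
\begin{aligned}
W_n &\sim \frac{1}{\Gamma(1-\theta)} \cdot n^{\theta} \cdot \frac{1}{\psi(n)} \cdot \frac{1}{\Gamma(1+\theta)} \\
    &= \frac{1-\theta}{\Gamma(1+\theta)\,\Gamma(2-\theta)} \cdot n^{\theta} \cdot \frac{1}{\psi(n)} \\
    &\sim \frac{1}{\Gamma(1+\theta)\,\Gamma(2-\theta)} \cdot n \cdot \Bigl(\sum_{k=1}^{n} V_k\Bigr)^{-1} .
\end{aligned}
\]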

The proof of the other parts of the strong renewal results quoted above also relies on
Theorem 3.2.6, together with some intricate estimates of integrals. We leave the details
to the intrepid reader.

3.2.2 Renewal theory applied to the α-sum-level sets

We begin our discussion with the crucial observation that the sequence of the
Lebesgue measures of these α-sum-level sets satisfies a renewal equation (as observed
in [WX11], [Mun11] and [KMS12]). Here, the role of the probability vector is filled by the
sequence of Lebesgue measures of the partition elements of α, that is, the sequence
(a m )m≥1 .

 
Lemma 3.2.8. We have that ((a_n), (λ(C_n^{(α)}))) defines a renewal pair. That is, for each n ∈ N,
we have
\[
\lambda\bigl(C_n^{(\alpha)}\bigr) = \sum_{m=1}^{n} a_m\, \lambda\bigl(C_{n-m}^{(\alpha)}\bigr) .
\]

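Proof. Since λ(C_0^{(α)}) = 1 and λ(C_1^{(α)}) = a_1, the assertion certainly holds for n = 1. For
n ≥ 2, the following calculation finishes the proof.
\[
\begin{aligned}
\lambda\bigl(C_n^{(\alpha)}\bigr)
 &= \lambda\bigl(C_\alpha(n)\bigr) + \sum_{m=1}^{n-1} \; \sum_{\substack{C_\alpha(\ell_1,\dots,\ell_k,m) \in C_n^{(\alpha)} \\ k \in \mathbb{N}}} \lambda\bigl(C_\alpha(\ell_1,\dots,\ell_k,m)\bigr) \\
 &= \lambda\bigl(C_\alpha(n)\bigr) + \sum_{m=1}^{n-1} a_m \sum_{\substack{C_\alpha(\ell_1,\dots,\ell_k) \in C_{n-m}^{(\alpha)} \\ k \in \mathbb{N}}} \lambda\bigl(C_\alpha(\ell_1,\dots,\ell_k)\bigr) \\
 &= a_n\, \lambda\bigl(C_0^{(\alpha)}\bigr) + \sum_{m=1}^{n-1} a_m\, \lambda\bigl(C_{n-m}^{(\alpha)}\bigr)
  = \sum_{m=1}^{n} a_m\, \lambda\bigl(C_{n-m}^{(\alpha)}\bigr) .
\end{aligned}
\]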

We are now in a position to prove our main results. The first of these is valid for
arbitrary partitions, but for the second we must restrict ourselves to partitions that
are either expansive of exponent θ ∈ [0, 1] or of finite type (recall that these were
introduced in Definition 1.4.18). The proof of the first statement of the first main result
again makes use of the notion of a generating function, which was defined above (see
Definition 3.2.7).

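Theorem 3.2.9. For the α-sum-level sets of an arbitrary given partition α ∈ A we have
that \( \sum_{n=1}^{\infty} \lambda(C_n^{(\alpha)}) \) diverges, and that
\[
\lim_{n\to\infty} \lambda\bigl(C_n^{(\alpha)}\bigr) =
\begin{cases}
0 & \text{if } F_\alpha \text{ is of infinite type;} \\[2pt]
\Bigl(\sum_{k=1}^{\infty} t_k\Bigr)^{-1} & \text{if } F_\alpha \text{ is of finite type.}
\end{cases}
\]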

Proof. The general form of the discrete renewal theorem given in Theorem 3.2.4 above
can be applied directly to our specific situation. For this, fix some partition α ∈ A,
and set v n := λ(A n ) = a n , for each n ∈ N. Let us recall again that this is certainly a
probability vector. Then, put w n := λ(Cn(α) ), for each n ∈ N0 . In light of Lemma 3.2.8 and
the observation that w0 = λ(C0(α) ) = 1, we then have that these particular sequences
(v_n)_{n≥1} and (w_n)_{n≥0} are indeed a renewal pair. Consequently, an application of the
discrete renewal theorem immediately implies that
\[
\lim_{n\to\infty} \lambda\bigl(C_n^{(\alpha)}\bigr) = \Bigl(\sum_{k=1}^{\infty} k \cdot a_k\Bigr)^{-1} = \Bigl(\sum_{k=1}^{\infty} t_k\Bigr)^{-1},
\]
where this limit is equal to zero if \( \sum_{k=1}^{\infty} t_k \) diverges. Note that by Lemma 2.3.20, the
divergence of the latter series is equivalent to the statement that the partition α is of
infinite type.

For the remaining assertion, let us consider the two generating functions V and
W, which are given by
\[
V(s) := \sum_{n=1}^{\infty} v_n s^n \quad\text{and}\quad W(s) := \sum_{m=0}^{\infty} w_m s^m .
\]
Using the Cauchy product formula for the two power series in tandem with the renewal
equation provided in Lemma 3.2.8, we have that
\[
W(s)\,V(s) = \sum_{n=1}^{\infty} s^n \sum_{m=1}^{n} v_m w_{n-m} = \sum_{n=1}^{\infty} w_n s^n = W(s) - 1 . \tag{3.5}
\]

Hence, W(s) = 1/(1 − V(s)). Since V(1) = \( \sum_{n=1}^{\infty} a_n \) = 1, this yields that
\[
\lim_{s \to 1^-} W(s) = +\infty,
\]
which shows that the series \( \sum_{n=0}^{\infty} \lambda(C_n^{(\alpha)}) \) diverges. This finishes the proof.
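The finite type case of Theorem 3.2.9 can be made concrete with a small computation. The sketch below assumes, purely for illustration, the dyadic partition with a_m = 2^{-m}, for which the tails are t_k = 2^{-(k-1)} and sum to 2; the function names are hypothetical. It solves the renewal equation of Lemma 3.2.8 in exact arithmetic and compares the result with the predicted limit 1/2.

```python
from fractions import Fraction

def sum_level_measures(a, n_max):
    """Solve w_0 = 1, w_n = sum_{m=1}^n a(m) * w_{n-m} for n = 1, ..., n_max.

    Here w_n plays the role of the Lebesgue measure of the n-th sum-level set.
    """
    w = [Fraction(1)]
    for n in range(1, n_max + 1):
        w.append(sum(a(m) * w[n - m] for m in range(1, n + 1)))
    return w

def dyadic(m):
    """Assumed example partition: a_m = 2^{-m} (sums to 1 over m >= 1)."""
    return Fraction(1, 2 ** m)

n_max = 30
w = sum_level_measures(dyadic, n_max)

# Tails t_k = sum_{m >= k} a_m = 2^{-(k-1)}; truncated sum of the tails (tends to 2).
tail_sum = sum(Fraction(1, 2 ** (k - 1)) for k in range(1, 200))
predicted_limit = 1 / tail_sum

print([float(x) for x in w[:5]], float(predicted_limit))
```

For this particular partition the sequence is constant from n = 1 onwards, so the limit is attained immediately. Since every w_n with n ≥ 1 equals 1/2, the series of measures clearly diverges, in line with the first assertion of the theorem.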

Theorem 3.2.10. For a given partition α which is either expansive of exponent θ ∈ [0, 1]
or of finite type, we have the following estimates for the asymptotic behaviour of the
Lebesgue measure of the α-sum-level sets.
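(a) With K_θ := (Γ(2 − θ)Γ(1 + θ))^{−1} for α expansive of exponent θ ∈ [0, 1] and with K_θ := 1
for α of finite type, we have that
\[
\sum_{k=1}^{n} \lambda\bigl(C_k^{(\alpha)}\bigr) \sim K_\theta \cdot n \cdot \Bigl(\sum_{k=1}^{n} t_k\Bigr)^{-1} .
\]
(b) With k_θ := (Γ(2 − θ)Γ(θ))^{−1} for α expansive of exponent θ ∈ (1/2, 1] and with k_θ := 1
for α of finite type, we have that
\[
\lambda\bigl(C_n^{(\alpha)}\bigr) \sim k_\theta \cdot \Bigl(\sum_{k=1}^{n} t_k\Bigr)^{-1} .
\]
(c) For an expansive partition α of exponent θ ∈ (0, 1), we have that
\[
\liminf_{n\to\infty} \, n \cdot t_n \cdot \lambda\bigl(C_n^{(\alpha)}\bigr) = \frac{\sin \pi\theta}{\pi} .
\]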

Moreover, if θ ∈ (0, 1/2), then the corresponding limit does not exist in general.
However, in this situation the existence of the limit is always guaranteed at least
on the complement of some set of integers of zero density.

Proof. The statements in the theorem concerning partitions α of finite type follow
easily from Theorem 3.2.9. Indeed, given that
\[
\lim_{n\to\infty} \frac{\lambda\bigl(C_n^{(\alpha)}\bigr)}{\bigl(\sum_{k=1}^{n} t_k\bigr)^{-1}}
 = \lim_{n\to\infty} \lambda\bigl(C_n^{(\alpha)}\bigr) \cdot \lim_{n\to\infty} \sum_{k=1}^{n} t_k = 1,
\]
the statement in part (b) follows immediately. The corresponding claim in part (a) follows
directly on considering the Cesàro average of the sequence of Lebesgue measures
of α-sum-level sets. Similarly to the proof of Theorem 3.2.9, the remainder of the proof
(that is, those parts concerning partitions that are expansive of exponent θ) follows
from straightforward applications of the strong renewal results of Garsia/Lamperti
and Erickson to the setting of the α-sum-level sets. For this we must set v_n := a_n, V_n := t_n
and w_n := λ(C_n^{(α)}), and recall that the so-defined pair of sequences ((v_n)_{n≥1}, (w_n)_{n≥0})
satisfies the conditions of a renewal pair.
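Part (b) can be illustrated numerically for a partition of infinite type. The sketch below assumes the harmonic-type partition a_m = 1/(m(m+1)), whose tails are t_k = 1/k, so that θ = 1, ψ ≡ 1 and k_θ = (Γ(1)Γ(1))^{-1} = 1; this choice of example and the cut-off n = 2000 are assumptions made here for illustration. The product of λ(C_n^{(α)}) with the partial sum of the t_k should then be close to 1, although the convergence is only logarithmic.

```python
def sum_level_measures(a, n_max):
    """w_0 = 1 and w_n = sum_{m=1}^n a_m w_{n-m}; a[m] holds a_m, a[0] is unused."""
    w = [1.0]
    for n in range(1, n_max + 1):
        w.append(sum(a[m] * w[n - m] for m in range(1, n + 1)))
    return w

n_max = 2000  # arbitrary cut-off

# Assumed example partition a_m = 1/(m(m+1)), with tails t_k = sum_{m >= k} a_m = 1/k.
a = [0.0] + [1.0 / (m * (m + 1)) for m in range(1, n_max + 1)]
t = [0.0] + [1.0 / k for k in range(1, n_max + 1)]

w = sum_level_measures(a, n_max)

# Part (b) with k_theta = 1 predicts w_n * (t_1 + ... + t_n) -> 1.
ratio_b = w[n_max] * sum(t[1:])
print(w[n_max], ratio_b)
```

The measures w_n decrease towards zero here, as Theorem 3.2.9 predicts for a partition of infinite type, but roughly like 1/log n, which is why even n = 2000 leaves a visible gap to the limit.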

3.3 Exercises

Exercise 3.3.1. Let (v_n) be an infinite probability vector with generating function V (see
Definition 3.2.7). Show that
\[
\lim_{s \to 1^-} \frac{1 - V(s)}{1 - s} = \sum_{m=1}^{\infty} m \cdot v_m .
\]

Exercise 3.3.2. Consider the renewal pair (v n , w n ) with the extra assumption that v1 :=
p ∈ (0, 1) and v2 := 1 − p. Determine the values of the sequence (w n ) explicitly with the
help of the generating functions V and W, and verify that the speed of convergence in
the renewal theorem is in fact exponential. (Hint: Use the relation (3.5) to show that
W = 1/(1 − V) is a rational function with two poles, one in s_1 = 1 and another one
in s_2. Find the residues and determine the power series of W by using the identity
\( (1 - s/s_k)^{-1} = \sum_{n \ge 0} (s/s_k)^n \), for k = 1, 2.)

Exercise 3.3.3. Generalise the ideas of Exercise 3.3.2 to the case that more than two,
but only finitely many, of the v n are non-zero and such that d v = 1. (Hint: As an
intermediate step show that 1− V(s) is a polynomial of finite degree which has a simple
root in 1 and all the other roots are of modulus strictly greater than 1. Also make use
of Exercise 3.3.1.)

Exercise 3.3.4. In the following exercise we employ a very useful representation for
slowly varying functions. Assume that L is a slowly varying function. Then there
exist constants c ∈ R and a > 0, a bounded measurable function η and a continuous
function δ, both defined on [a, ∞), with lim_{x→∞} η(x) = c and lim_{x→∞} δ(x) = 0 such that
for all x ≥ a we have
\[
L(x) = \exp\Bigl( \eta(x) + \int_a^x \frac{\delta(t)}{t} \, dt \Bigr) .
\]
Use this representation to prove:

1. For every 0 < r < s < ∞,
\[
\lim_{x\to\infty} \, \sup_{t \in [r,s]} \Bigl| \frac{L(tx)}{L(x)} - 1 \Bigr| = 0 .
\]

2. We have
\[
\lim_{x\to\infty} \frac{\log(L(x))}{\log(x)} = 0 .
\]
3. We have for α > 0
\[
\lim_{x\to\infty} x^{\alpha} L(x) = \infty \quad\text{and}\quad \lim_{x\to\infty} x^{-\alpha} L(x) = 0 .
\]
4. For α ∈ R and for slowly varying functions L_1, L_2 and L_3, where also
lim_{x→∞} L_3(x) = ∞, we have that each of the functions x ↦ (L_1(x))^α, L_1 + L_2, and
L_1 ◦ L_3 is also slowly varying.

Exercise 3.3.5. Show that a measurable function f : R+ → R+ is regularly varying with
exponent ρ ∈ R if and only if
\[
\lim_{x\to\infty} \frac{f(xy)}{f(x)} = y^{\rho} \quad \text{for every } y > 0 .
\]

Exercise 3.3.6. Let us fix a constant ρ ≥ 0, a slowly varying function ψ and a family of
distribution functions t ↦ U_x(t), t ≥ 0, x ∈ X, such that the corresponding
measure (also denoted by U_x) carries no mass in 0. The Laplace transform of U_x is
defined to be
\[
\omega_x : \mathbb{R}_+ \to \mathbb{R}_+ , \quad s \mapsto \int_0^{\infty} \exp(-st) \, dU_x(t) .
\]
Show that if, uniformly in x ∈ X, we have
\[
U_x(t) \sim t^{\rho}\, \psi(t),
\]
as t tends to infinity, then we also have uniformly in x ∈ X, as τ tends to zero,
\[
\omega_x(\tau) \sim \Gamma(\rho + 1)\, \tau^{-\rho}\, \psi(1/\tau) .
\]
Hint: First show that for some δ ∈ (0, 1) the quotient ω_x(δ · τ)/(τ^{−ρ} ψ(1/τ)) stays uniformly
bounded as τ tends to zero (for this split the domain of integration at the points 2^k/τ,
k ∈ N, and use integration by parts). Then use Exercise 3.3.4 (1) with r > 0 sufficiently
small and s < ∞ sufficiently large to split the domain of integration in the definition of
the Laplace transform into a convergent part and two negligible parts.

Exercise 3.3.7. Use Exercise 3.3.6 to prove the uniformity statement in Karamata’s
Tauberian Theorem 3.2.6.
Hint: Consider the distribution function
\[
U_x(t) := \sum_{k=0}^{\lfloor t \rfloor - 1} b^x_k + (t - \lfloor t \rfloor)\, b^x_{\lfloor t \rfloor}
\]
and make the change of variables y = exp(−t).
4 Infinite ergodic theory
In this chapter we will make a deeper journey into infinite ergodic theory, taking
up where we left off in Chapter 2. In the following chapter, we shall then see some
applications of this general theory to continued fractions.

4.1 The functional analytic perspective and the Chacon–Ornstein Ergodic Theorem

Our first main result of this section will be the Chacon–Ornstein Ergodic Theorem
[CO60], which is stated completely in terms of linear operators acting on L1 (μ).
After having proved this powerful result, we will then see that it implies Birkhoff’s
Pointwise Ergodic Theorem (which we have already seen in Theorem 2.4.16), Hopf’s
Ergodic Theorem (cf. Theorem 2.4.24), as well as Hurewicz’s Ergodic Theorem (Corol-
lary 4.1.18), which we will specifically need later in this chapter. The proof, as you
might expect, is not trivial. Before stating and proving the Chacon–Ornstein Ergodic
Theorem, we will first collect a few useful observations concerning the functional
analytic nature of this part of the theory. In particular, we shall now study the
previously-defined transfer operator T̂ and Koopman operator U_T : f ↦ f ◦ T from a
broader functional-analytic perspective (recall that the Koopman operator U_T was first
mentioned in Remark 2.3.8).
Let (X, B, μ) denote a σ-finite measure space. We remind the reader that the space
of integrable functions L^1(μ), in which functions are identified if they are a.e. equal,
together with the norm ‖·‖_1 given by
\[
\|f\|_1 := \int |f| \, d\mu
\]
defines a Banach space. The space L^∞(μ) equipped with the norm ‖·‖_∞ given by
\[
\|f\|_\infty := \inf \{ c \in \mathbb{R} : |f| \le c \ \text{a.e.} \}
\]
also defines a Banach space. Further, let L^p_+(μ) denote the set of non-negative
functions from L^p(μ) and recall that the non-negative part ψ^+ of a measurable
real-valued function ψ is given by the measurable function ψ^+ := max{ψ, 0}.
We shall now study bounded linear operators acting on L^1(μ); these are linear
functions V : L^1(μ) → L^1(μ) with bounded operator norm, which is defined to be
\[
\|V\| := \sup_{\|f\|_1 = 1} \|V(f)\|_1 .
\]

Definition 4.1.1. Let V : L^1(μ) → L^1(μ) be a linear operator.
(a) V is said to be a contraction if ‖V‖ ≤ 1.
(b) V is said to be positive if V(L^1_+(μ)) ⊂ L^1_+(μ).

We have already seen that if (X, B, μ, T) is a non-singular dynamical system then
T̂ : L^1(μ) → L^1(μ) is well defined and ‖T̂‖ ≤ 1 (see Exercise 2.6.11). In the next lemma,
we will consider positivity for T̂ and both properties for the Koopman operator.

Lemma 4.1.2. For the Koopman and the transfer operator we have:
(a) If (X, B, μ, T) is a measure-preserving dynamical system, then U_T : f ↦ f ◦ T is a
positive contraction on L^1(μ).
(b) If (X, B, μ, T) is a non-singular dynamical system, then T̂ is a positive contraction
on L^1(μ).

Proof. Let (X, B, μ, T) be a measure-preserving dynamical system. Then for every
f ∈ L^1(μ), we have
\[
\int |U_T f| \, d\mu = \int |f \circ T| \, d\mu = \int |f| \, d(\mu \circ T^{-1}) = \int |f| \, d\mu < \infty,
\]
which shows that U_T is well defined and a contraction. It is clear that U_T is positive,
so the proof of part (a) is finished.
Towards part (b), as we recalled above, we have already seen that T̂ has norm 1.
Positivity follows from the fact that for a given f ∈ L^1_+(μ) we have, for all g ∈ L^∞_+(μ), that
\[
\int \widehat{T} f \cdot g \, d\mu = \int f \cdot g \circ T \, d\mu \ge 0 .
\]
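On a finite state space the two operators can be written down explicitly, which makes the duality used in the proof above easy to test. The sketch below is a toy model only: the state space {0, 1, 2, 3}, the permutation T and the weights μ are hypothetical choices, and the formula (T̂f)(y) = Σ_{x : T(x) = y} f(x) μ(x)/μ(y) is simply what the identity ∫ T̂f · g dμ = ∫ f · g ◦ T dμ forces in this discrete setting.

```python
from fractions import Fraction

def koopman(T, g):
    """Koopman operator on a finite space: (U_T g)(x) = g(T(x))."""
    return [g[T[x]] for x in range(len(T))]

def transfer(T, mu, f):
    """Transfer operator: (T^ f)(y) = sum over x with T(x) = y of f(x) mu(x) / mu(y)."""
    out = [Fraction(0)] * len(T)
    for x, y in enumerate(T):
        out[y] += f[x] * mu[x] / mu[y]
    return out

def integral(mu, h):
    """Integral of h with respect to the weights mu."""
    return sum(m * value for m, value in zip(mu, h))

# Hypothetical example: a permutation with a non-invariant positive measure.
T = [1, 2, 3, 0]
mu = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
f = [Fraction(3), Fraction(-1), Fraction(2), Fraction(5)]
g = [Fraction(1), Fraction(0), Fraction(4), Fraction(2)]

Tf = transfer(T, mu, f)
lhs = integral(mu, [a * b for a, b in zip(Tf, g)])             # integral of T^f * g
rhs = integral(mu, [a * b for a, b in zip(f, koopman(T, g))])  # integral of f * (g o T)

norm_f = integral(mu, [abs(value) for value in f])
norm_Tf = integral(mu, [abs(value) for value in Tf])
print(lhs == rhs, norm_Tf <= norm_f)
```

Because T is a permutation and all weights are positive, the system is non-singular without being measure-preserving, which is the situation of part (b) of the lemma.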

Definition 4.1.3. Let V : L^1(μ) → L^1(μ) be a bounded linear operator. Then the dual
of V, which will be denoted by V*, is an operator acting on the dual space (L^1(μ))* ≅
L^∞(μ) which is uniquely determined by the identity
\[
\int f \cdot V^*(g) \, d\mu := \int V(f) \cdot g \, d\mu,
\]
for all f ∈ L^1(μ) and g ∈ L^∞(μ).

In particular, as was already alluded to in Remark 2.3.8, if (X, B, μ, T) is a non-singular
dynamical system, then for the transfer operator T̂ we have by definition that T̂* = U_T
as an operator acting on L^∞(μ).
Note that if V is a positive linear operator acting on L^1(μ) then the dual operator
V* : L^∞(μ) → L^∞(μ) is also positive. Indeed, for every g ∈ L^∞_+(μ) we have that
∫ V*(g) · f dμ = ∫ g · Vf dμ ≥ 0 for all f ∈ L^1_+(μ), and hence V*g ≥ 0. If V is a positive linear operator
acting on L^1(μ) we will use the notation
\[
S_n f := \sum_{k=0}^{n-1} V^k f \quad \text{for } n \in \mathbb{N} \cup \{\infty\}, \text{ and we set } S_0 f := 0 .
\]

Note that for V = U T , this definition coincides with our definition in the context of
dynamical systems as introduced in Section 2.4.6.
We can now continue towards the proof of the Chacon–Ornstein Ergodic Theorem
and its immediate corollaries, i.e., to Hopf’s, Birkhoff’s and Hurewicz’s Pointwise
Ergodic Theorems.

Lemma 4.1.4 (Chacon–Ornstein Lemma). Let V be a positive contraction acting on
L^1(μ) and let g ∈ L^1(μ) be such that g > 0. For all φ ∈ L^1(μ), we then have μ-a.e. that
\[
\lim_{n\to\infty} \frac{V^n \varphi}{S_n g} = 0 .
\]

Proof. Let ε > 0 be fixed and define
\[
E_n := \{ x \in X : V^n \varphi(x) > \varepsilon S_n g(x) \} .
\]
The aim is to show that \( \sum_{n=2}^{\infty} \mu_g(E_n) < \infty \), for μ_g given by dμ_g := g dμ, where we
assume without loss of generality that ∫ g dμ = 1. This will be sufficient, since then
by the Borel–Cantelli Lemma (see Lemma 1.2.18), we have that the set of points which
lie in infinitely many of the sets E_n is of μ_g-measure equal to zero; then taking the
complement of this limsup set and noting that g > 0 as well as that ε > 0 was arbitrary,
this gives \( \limsup_{n\to\infty} V^n\varphi / S_n g \le 0 \) μ-a.e. Applying this result to −φ instead of φ shows that
μ-a.e. we also have \( \liminf_{n\to\infty} V^n\varphi / S_n g \ge 0 \), giving finally the assertion.
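Now, since V is a positive operator and both 0 and ψ ∈ L^1(μ) are less than or equal
to ψ^+, we have 0 = V(0) ≤ V(ψ^+) as well as V(ψ) ≤ V(ψ^+), and hence (Vψ)^+ ≤ (V(ψ^+))^+ =
V(ψ^+). Using this and the fact that V(V^n φ − εS_n g) = V^{n+1} φ − S_{n+1} εg + εg, it follows that
\[
\begin{aligned}
\bigl(V^{n+1}\varphi - \varepsilon S_{n+1} g\bigr)^+ + 1_{E_{n+1}} \varepsilon g
 &= 1_{E_{n+1}} \bigl(V^{n+1}\varphi - \varepsilon S_{n+1} g\bigr) + 1_{E_{n+1}} \varepsilon g \\
 &= 1_{E_{n+1}} \bigl(V^{n+1}\varphi - S_{n+1} \varepsilon g + \varepsilon g\bigr) \\
 &= 1_{E_{n+1}} \bigl(V(V^{n}\varphi - \varepsilon S_{n} g)\bigr)^+ \\
 &\le V\bigl( (V^{n}\varphi - \varepsilon S_{n} g)^+ \bigr) .
\end{aligned}
\]
To shorten the notation below, let us set J_n := (V^n φ − εS_n g)^+. Then the above
inequality together with the fact that V is a contraction implies that
\[
\varepsilon \int 1_{E_{n+1}} g \, d\mu \le \int (V J_n - J_{n+1}) \, d\mu \le \int (\|V\| J_n - J_{n+1}) \, d\mu \le \int (J_n - J_{n+1}) \, d\mu .
\]
This shows that for all N ∈ N, we have that
\[
\varepsilon \sum_{n=1}^{N} \int 1_{E_{n+1}} g \, d\mu \le \sum_{n=1}^{N} \int (J_n - J_{n+1}) \, d\mu = \int (J_1 - J_{N+1}) \, d\mu \le \int J_1 \, d\mu < \infty .
\]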

Lemma 4.1.5 (Hopf’s maximal inequality). Let V be a positive contraction acting on
L^1(μ) and let f ∈ L^1(μ). If we set f_n := max_{0≤j≤n} S_j f for each n ∈ N_0, we then have
\[
\int_{\{f_n > 0\}} f \, d\mu \ge 0 .
\]

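Proof. First, notice that f_0 = 0 ≤ f^+ = f_1 ≤ f_2 ≤ · · · . Then, using the positivity of V, we
have on the set {f_n > 0} that
\[
\begin{aligned}
f_n = \max_{0 \le j \le n} \sum_{k=0}^{j-1} V^k f
 &= \max_{1 \le j \le n} \Bigl( f + \sum_{k=1}^{j-1} V^k f \Bigr)
  = f + \max_{1 \le j \le n} V \sum_{k=0}^{j-2} V^k f \\
 &\le f + V \max_{1 \le j \le n} \sum_{k=0}^{j-2} V^k f = f + V(f_{n-1}) \le f + V(f_n) .
\end{aligned}
\]
Since Vf_n ≥ 0 and V is contracting, a rearrangement of the above inequality yields that
\[
\begin{aligned}
\int_{\{f_n > 0\}} f \, d\mu
 &\ge \int_{\{f_n > 0\}} (f_n - V f_n) \, d\mu = \int f_n \, d\mu - \int_{\{f_n > 0\}} V f_n \, d\mu \\
 &\ge \int f_n \, d\mu - \int V f_n \, d\mu \ge \int f_n \, d\mu - \|V\| \int f_n \, d\mu \ge 0 .
\end{aligned}
\]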
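Hopf's maximal inequality can be tested in the simplest possible setting. The sketch below assumes a finite set with counting measure, on which a positive contraction of L^1 is just a matrix with non-negative entries and column sums at most 1; the matrices and functions are drawn at random with a fixed seed, and all names are illustrative rather than taken from the text.

```python
import random

def apply_operator(M, f):
    """(V f)(i) = sum_j M[i][j] f(j) for the matrix M representing V."""
    d = len(f)
    return [sum(M[i][j] * f[j] for j in range(d)) for i in range(d)]

def hopf_integral(M, f, n):
    """Sum of f over {f_n > 0}, where f_n = max_{0 <= j <= n} S_j f and S_0 f = 0."""
    d = len(f)
    partial = [0.0] * d   # current partial sum S_j f, starting with S_0 f = 0
    f_n = [0.0] * d       # running maximum of the partial sums
    power = list(f)       # V^k f, starting with k = 0
    for _ in range(n):
        partial = [p + q for p, q in zip(partial, power)]
        f_n = [max(a, b) for a, b in zip(f_n, partial)]
        power = apply_operator(M, power)
    return sum(f[i] for i in range(d) if f_n[i] > 0)

def random_substochastic(d, rng):
    """Non-negative d x d matrix with column sums at most 1 (a positive contraction)."""
    M = [[rng.random() for _ in range(d)] for _ in range(d)]
    for j in range(d):
        column_sum = sum(M[i][j] for i in range(d))
        scale = rng.random() / column_sum
        for i in range(d):
            M[i][j] *= scale
    return M

rng = random.Random(0)
results = []
for _ in range(200):
    M = random_substochastic(5, rng)
    f = [rng.uniform(-1.0, 1.0) for _ in range(5)]
    results.append(hopf_integral(M, f, 20))

print(min(results))
```

The column-sum condition is what makes the matrix a contraction for the ℓ^1 norm, since the norm of Vf is bounded by the sum over j of |f(j)| times the j-th column sum.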

Before stating the next result, let us fix some notation. We let Q_n(φ, g) := S_n φ/S_n g and
define Q̃(φ, g) := sup_{n∈N} Q_n(φ, g).

Lemma 4.1.6 (Wiener’s maximal inequality). Let V be a positive contraction on L^1(μ)
and let g ∈ L^1(μ) be such that g > 0 and such that μ_g, given by dμ_g := g dμ, is a probability
measure. For each φ ∈ L^1(μ) and s > 0, we then have that
\[
\mu_g\bigl( \{ \widetilde{Q}(\varphi, g) > s \} \bigr) \le \frac{\|\varphi\|_1}{s} .
\]
Proof. Hopf’s maximal inequality applied to f_n := max_{0≤j≤n} S_j(φ − sg) gives that for
each n ∈ N, we have that
\[
0 \le \int_{\{f_n > 0\}} (\varphi - s g) \, d\mu = \int_{\{f_n > 0\}} \varphi \, d\mu - s\, \mu_g(\{f_n > 0\}) .
\]
Since f_n > 0 if and only if max_{1≤k≤n} Q_k(φ − sg, g) > 0, it follows that
\[
\mu_g\Bigl( \Bigl\{ \max_{1 \le k \le n} Q_k(\varphi - s g, g) > 0 \Bigr\} \Bigr) = \mu_g(\{f_n > 0\}) \le \frac{1}{s} \int_{\{f_n > 0\}} \varphi \, d\mu \le \frac{1}{s} \|\varphi\|_1 .
\]
Since ({max_{1≤k≤n} Q_k(φ − sg, g) > 0})_{n∈N} is an increasing sequence of sets with union
equal to {Q̃(φ, g) > s}, using the continuity of μ_g from below finishes the proof.

Let g ∈ L^1(μ) be such that g > 0. By letting s tend to infinity in the previous lemma, we
find that Q̃(φ, g) < ∞ μ-a.e., for each φ ∈ L^1(μ), and in particular for those φ such
that φ > 0. In fact, this shows that {S_∞ g = ∞} = {S_∞ φ = ∞} mod μ and hence the set
{S_∞ g = ∞} is μ-a.e. independent of g.

Definition 4.1.7 (Hopf decomposition for operators). Let V be a positive contraction
on L^1(μ). The above observation allows us to define the (μ-a.e. determined) Hopf
decomposition of X with respect to the positive contraction V into the conservative part
C_V := {S_∞ g = ∞} for some g ∈ L^1(μ) such that g > 0 and the dissipative part D_V := X \ C_V.
If X = C_V mod μ, then V is called conservative.

Remark 4.1.8. This definition further generalizes our notion of Hopf decompositions.
Let (X, B , μ, T) be a measure-theoretical dynamical system. Then the following hold:
(a) If the system is measure-preserving, then
\[
C_T = C_{U_T} \quad \text{mod } \mu .
\]
(b) If the system is non-singular, then
\[
C_T = C_{\widehat{T}} \quad \text{mod } \mu .
\]

Lemma 4.1.9. Let V be a conservative positive contraction on L^1(μ) and let φ ∈ L^∞(μ)
be given such that either V*φ ≥ φ or V*φ ≤ φ. Then V*φ = φ.

Proof. For the case that V*φ ≥ φ, let g ∈ L^1(μ) be fixed such that g > 0. We then have
that
\[
0 \le \int (V^* \varphi - \varphi) \sum_{k=0}^{n-1} V^k g \, d\mu = \int \varphi \, (V^n g - g) \, d\mu \le 2 \|\varphi\|_\infty \int g \, d\mu < \infty .
\]
Since by assumption X_∞(g) = X, we have that \( S_n g = \sum_{k=0}^{n-1} V^k g \) is unbounded μ-a.e. on
X. Therefore, the latter inequality can be satisfied only if V*φ = φ.
The case V*φ ≤ φ can be treated in an analogous way and is left to the reader.

Example 4.1.10. Since for all f ∈ L^1_+(μ),
\[
\int f \cdot V^* 1_X \, d\mu = \int V f \, d\mu \le \int f \, d\mu = \int f \cdot 1_X \, d\mu,
\]
it follows that V*1_X ≤ 1_X. Consequently, in light of Lemma 4.1.9, we deduce that
\[
V^* 1_X = 1_X .
\]

Lemma 4.1.11. Let V be a positive conservative contraction on L1 (μ) and let φ ∈ L∞ (μ)
be V *-invariant, that is, φ = V * φ. Then φ+ and 1{a<φ≤b} are also both V * -invariant, for
all a, b ∈ R.

Proof. Fix φ ∈ L^∞(μ) such that φ = V*φ. Since V* is a positive operator, we have that
V*φ^+ ≥ (V*φ)^+ = φ^+ and hence we can apply Lemma 4.1.9, which tells us that V*φ^+ =
φ^+. These observations together with Example 4.1.10 give that for each a ∈ R,
\[
V^*(\varphi - a 1_X) = \varphi - a 1_X \quad\text{and}\quad V^*(\varphi - a 1_X)^+ = (\varphi - a 1_X)^+ .
\]
Next, observe that the sequence (h_n)_{n≥1} given by
\[
h_n := n \Bigl( 1/n - \bigl( 1/n - (\varphi - a)^+ \bigr)^+ \Bigr)
\]
converges monotonically from below to the indicator function 1_{a<φ}. Since by the above
observations, all elements h_n in this sequence are V*-invariant, we get V*1_{a<φ} ≥
V*h_n = h_n ↗ 1_{a<φ}. Again by Lemma 4.1.9 we have that V*1_{a<φ} = 1_{a<φ}. Now, the
lemma follows on observing that 1_{a<φ≤b} = 1_{a<φ} − 1_{b<φ}.

For a positive conservative contraction, the fixed points of its dual can be characterized
in the following way.

Lemma 4.1.12. Let V be a positive conservative contraction on L 1 (μ) and let φ ∈ L∞ (μ).
Then the following equivalence holds:

V * φ = φ ⇐⇒ V(φ · h) = φ · V(h), for all h ∈ L+1 (μ).

Proof. Let φ ∈ L^∞(μ) and assume that we have V(φ · h) = φ · V(h), for all h ∈ L^1_+(μ).
Note that by Example 4.1.10, we have that 1_X = V*1_X. Using this, it follows that for all
h ∈ L^1_+(μ) we have
\[
\begin{aligned}
\int V^*(\varphi) \cdot h \, d\mu &= \int \varphi \cdot V(h) \, d\mu = \int V(\varphi \cdot h) \, d\mu \\
 &= \int V^*(1_X) \cdot \varphi \cdot h \, d\mu = \int \varphi \cdot h \, d\mu .
\end{aligned}
\]
This shows that φ = V*φ.

For the reverse direction, let φ ∈ L^∞(μ) be given such that V*φ = φ. By
Lemma 4.1.11, we have for F := {a < φ ≤ b}, for arbitrary a, b ∈ R, that 1_F and 1_{F^c} =
1_X − 1_F are both V*-invariant. Using this, it follows that for each h ∈ L^1_+(μ) we have
that
\[
0 = \int 1_{F^c} \cdot V(1_F h) \, d\mu = \int 1_F \, V(1_{F^c} h) \, d\mu,
\]
where both integrands are non-negative.
It follows that 1_{F^c} V(1_F h) = 1_F V(1_{F^c} h) = 0. Using this and the linearity of V, we obtain
\[
V(1_F h) = (1_F + 1_{F^c}) \, V(1_F h) = 1_F \bigl( V(h) - V(1_{F^c} h) \bigr) = 1_F V(h) .
\]

Now the claim follows by approximating φ in L^∞(μ)-norm by elementary functions of
the form
\[
\varphi_n := \sum_{k=-2^n N}^{2^n N} 2^{-n} k \, 1_{\{k 2^{-n} < \varphi \le (k+1) 2^{-n}\}},
\]
for some fixed N > ‖φ‖_∞. Note that we
have ‖φ_n h − φh‖_1 → 0, for n → ∞.

Remark 4.1.13. The above lemma shows in particular that if all φ k ∈ L∞ (μ), k ∈
{1, . . . , n}, are V * -invariant then so is their product φ1 · · · φ n .

Definition 4.1.14. Let V : L1 (μ) → L1 (μ) be a bounded linear operator. Then the system
(L1 (μ), V) is called ergodic if the σ-algebra I := σ({f ∈ L∞ (μ) : V * f = f }) generated by
the V * -invariant functions is trivial.

Remark 4.1.15. Note that the σ-algebra I is trivial if and only if g ∈ {f ∈ L∞ (μ) :
V * f = f } implies that g is constant.

Lemma 4.1.16. For the transfer and the Koopman operator we have:
(a) If (X, B, μ, T) is a measure-preserving, conservative and ergodic dynamical system,
then (L^1(μ), U_T) is conservative and ergodic.
(b) If (X, B, μ, T) is a non-singular, conservative and ergodic dynamical system, then
(L^1(μ), T̂) is conservative and ergodic.

Proof. Both systems (L^1(μ), U_T) and (L^1(μ), T̂) are conservative by Remark 4.1.8. In
order to prove the ergodicity of U_T, we have to show that the σ-algebra I generated
by the U_T*-invariant functions is trivial, which is equivalent to the fact that all
U_T*-invariant functions are constant. Fix φ ∈ L^∞(μ) such that U_T*φ = φ. Then, by
Lemma 4.1.11, we can assume without loss of generality that φ ∈ L^∞_+(μ) \ {0}. Therefore,
where μ_φ is given by dμ_φ := φ dμ, we have for all f ∈ L^1(μ) that
\[
\int f \circ T \, d\mu_\varphi = \int U_T f \cdot \varphi \, d\mu = \int f \, U_T^*(\varphi) \, d\mu = \int f \varphi \, d\mu = \int f \, d\mu_\varphi .
\]
This shows that μ_φ is an invariant ergodic measure absolutely continuous with respect to μ. Hence,
Theorem 2.4.35 implies that μ_φ is equal to c · μ for some constant c > 0. This shows that
φ = c is in fact a constant function.
For part (b), to prove that T̂ is ergodic, we must show that the σ-algebra I
generated by the T̂*-invariant functions is trivial. However, since T̂* coincides with
the Koopman operator U_T acting on L^∞(μ), the latter assertion is an immediate
consequence of Proposition 2.4.6.
We are now in a position to state and prove our first main result of this chapter.
Before stating the theorem, we recall that for every sub-σ-algebra F of a σ-algebra
B, and function g ∈ L^1(μ) there exists an a.s. uniquely-defined function E_μ(g | F)
called the conditional expectation of g with respect to F which can be characterised as
follows: f = E_μ(g | F) if and only if f is F-measurable and for all B ∈ F we have
\[
\int_B f \, d\mu = \int_B g \, d\mu .
\]
Note that the conditional expectation is already characterised if the above equality
can be shown to hold for sets B ∈ F′ with F = σ(F′) and such that F′ is closed under
intersections and contains Ω.

Theorem 4.1.17 (Chacon–Ornstein Ergodic Theorem). Let V be a positive conservative
contraction on L^1(μ). We then have for f, g ∈ L^1(μ) such that g > 0, that the sequence
of quotients Q_n(f, g) := S_n f/S_n g converges μ-a.e. to a function Q(f, g), for n tending to
infinity. Moreover, the function Q(f, g) can be expressed as a conditional expectation.
That is,
\[
Q(f, g) = E_{\mu_g}\bigl( f/g \mid I \bigr),
\]
where I denotes the σ-algebra generated by the V*-invariant functions and μ_g is given by
dμ_g := g/μ(g) dμ.
Furthermore, if the system (L^1(μ), V) is ergodic (that is, I is trivial), then the limiting
function Q(f, g) is μ-a.e. equal to the constant function ∫ f dμ / ∫ g dμ.

Proof. Let us begin by proving the μ-a.e. convergence. For this, fix g ∈ L^1(μ) such that
g > 0. Next, let us define the set
\[
\mathcal{L} := \bigl\{ \varphi g + \psi - V\psi : \varphi \in L^\infty(\mu),\ \psi \in L^1(\mu),\ V(\varphi h) = \varphi V(h) \text{ for all } h \in L^1_+(\mu) \bigr\} .
\]
Now, let f ∈ ℒ be fixed. By the definition of ℒ, we immediately obtain that
\[
Q_n(f, g) = \frac{\varphi S_n g + \psi - V^n \psi}{S_n g}, \quad \text{for all } n \in \mathbb{N} .
\]
Recalling that V is conservative, a straightforward application of the Chacon–Ornstein
Lemma shows that on X_∞(g) = X the sequence (Q_n(f, g))_{n≥1} converges μ-a.e. to the
function φ. Therefore, we have that
\[
\mathcal{L} \subset \mathcal{L}' := \{ h \in L^1(\mu) : (Q_n(h, g))_{n \ge 1} \text{ converges } \mu\text{-a.e.} \} .
\]
Our next step is to show that ℒ is a dense subset of L^1(μ) and that ℒ′ is closed. To show
the denseness, we use the general fact that a subspace is dense in a normed vector
space if and only if the annihilator of the subspace is equal to the null-space, which
is a consequence of the Hahn–Banach Theorem (cf. [Rud91, Theorem 4.7]). Hence, it
is sufficient to show that the following implication holds for each k ∈ L^∞(μ):
\[
\int k h \, d\mu = 0 \ \text{ for all } h \in \mathcal{L} \implies k = 0 . \tag{4.1}
\]

In order to show this, note that for each ψ ∈ L^1(μ), we have that ψ − Vψ belongs to ℒ.
Hence, for each k ∈ L^∞(μ) that fulfills the left hand side of the above implication, we
have
\[
0 = \int k \psi \, d\mu - \int k \, V\psi \, d\mu = \int (k - V^* k) \, \psi \, d\mu .
\]
Since this must hold for all ψ ∈ L^1(μ), it follows that k = V*k. Using Lemma 4.1.12, we
have that kg ∈ ℒ and hence, ∫ k · kg dμ = 0. Since k · kg ≥ 0, it follows that k = 0, which
proves (4.1).
Now, to establish the μ-a.e. convergence, all that is left to show is that ℒ′ is closed
in L^1(μ). For this, let us fix h in the closure of ℒ′ and δ > 0, so that for each ε > 0 there exists h_ε ∈ ℒ′
with ‖h − h_ε‖_1 < ε · δ. Since (Q_n(h_ε, g))_{n≥1} converges μ-a.e., we have that
\[
\limsup_{n,m\to\infty} |Q_n(h, g)(x) - Q_m(h, g)(x)| = \limsup_{n,m\to\infty} |Q_n(h - h_\varepsilon, g)(x) - Q_m(h - h_\varepsilon, g)(x)| ,
\]
and so an application of Wiener’s maximal inequality (Lemma 4.1.6) gives that
\[
\begin{aligned}
\mu_g\Bigl( \Bigl\{ x : \limsup_{n,m\to\infty} |Q_n(h, g)(x) - Q_m(h, g)(x)| > \delta \Bigr\} \Bigr)
 &\le \mu_g\Bigl( \Bigl\{ x : \sup_{n,m} |Q_n(h - h_\varepsilon, g)(x) - Q_m(h - h_\varepsilon, g)(x)| > \delta \Bigr\} \Bigr) \\
 &\le \mu_g\bigl( \{ x : \widetilde{Q}(|h - h_\varepsilon|, g)(x) > \delta/2 \} \bigr) \le 2\, \frac{\|h - h_\varepsilon\|_1}{\delta} \le 2\varepsilon .
\end{aligned}
\]
Since δ > 0 was chosen arbitrarily, we can now conclude, by letting ε tend to zero, that
μ_g-a.e. we have that (Q_n(h, g))_{n≥1} converges. This shows that h ∈ ℒ′ and hence, ℒ′ is
closed.
It remains to characterise the limiting function. In order to do so, let E ( · |I ) =
Eμ g ( · |I ) denote the conditional expectation with respect to the measure μ g . Let us
first consider the special situation in which $f = \varphi g + \psi - V\psi$ is a fixed element in $\mathcal{L}$. We already know that in this case the limit of $(Q_n(f, g))_{n \ge 1}$ is μ-a.e. equal to $\varphi$, which
is I -measurable, by Lemma 4.1.12. Let J denote the set of all finite intersections of
subsets of the form {x : a < h(x) ≤ b}, for some a, b ∈ R and h ∈ L∞ (μ) with h = V * h.
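Then using Lemma 4.1.11 and Remark 4.1.13, we have for all $F \in \mathcal{J}$,

$$\begin{aligned}
\mu(g) \int 1_F \cdot f/g \, d\mu_g = \int 1_F \cdot f \, d\mu
&= \int 1_F \varphi g \, d\mu + \int 1_F \cdot \big(\psi - V(\psi)\big) \, d\mu \\
&= \int 1_F \cdot \varphi g \, d\mu + \int 1_F \cdot \psi \, d\mu - \int V^*(1_F) \cdot \psi \, d\mu \\
&= \mu(g) \int 1_F \cdot \varphi \, d\mu_g.
\end{aligned}$$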

Since the set $\mathcal{J}$ is closed under taking intersections and generates $\mathcal{I}$ and since $\varphi$ is $\mathcal{I}$-measurable, it follows from the characterisation of the conditional expectation stated above that $\varphi = E(f/g \,|\, \mathcal{I})$. For general $f \in L^1(\mu)$ the claim follows by approximating $f$ with functions from $\mathcal{L}$.
Finally, if we additionally have that V is ergodic, then the σ-algebra I generated by
the V *-invariant functions is trivial by definition and hence, the limiting function is
μ-a.e. constant and equal to
$$E(f/g \,|\, \mathcal{I}) = \int f/g \, d\mu_g = \frac{\int f \, d\mu}{\int g \, d\mu}.$$

As already mentioned at the beginning, we end this section by first showing how the
Chacon–Ornstein Ergodic Theorem implies Hurewicz’s Ergodic Theorem, and then
Hopf’s Ergodic Theorem, and, in turn, Birkhoff’s Pointwise Ergodic Theorem.

Corollary 4.1.18 (Hurewicz’s Ergodic Theorem). Let $(X, \mathcal{B}, \mu, T)$ be an ergodic conservative dynamical system and let $g \in L_1^+(\mu)$ be such that $\int g \, d\mu > 0$. For each $f \in L^1(\mu)$, we then have μ-a.e. that

$$\lim_{n \to \infty} \frac{\sum_{k=0}^{n-1} \widehat{T}^k(f)}{\sum_{k=0}^{n-1} \widehat{T}^k(g)} = \frac{\int f \, d\mu}{\int g \, d\mu}.$$

Proof. By the observations in Section 4.1 for $V = \widehat{T} : L^1(\mu) \to L^1(\mu)$, the Chacon–Ornstein Ergodic Theorem is applicable. Hence, for all $f \in L^1(\mu)$ and for a particular $g_0 \in L_1^+(\mu)$ such that $g_0 > 0$ the convergence holds as stated in the corollary. Note that such a function $g_0$ exists because $(X, \mathcal{B}, \mu)$ is σ-finite; in fact, this is a necessary and sufficient condition for σ-finiteness. For general $g \in L_1^+(\mu)$ with $\int g \, d\mu > 0$ we have that

$$\lim_{n \to \infty} \frac{\sum_{k=0}^{n-1} \widehat{T}^k(f)}{\sum_{k=0}^{n-1} \widehat{T}^k(g)}
= \lim_{n \to \infty} \frac{\sum_{k=0}^{n-1} \widehat{T}^k(f)}{\sum_{k=0}^{n-1} \widehat{T}^k(g_0)} \Bigg/ \frac{\sum_{k=0}^{n-1} \widehat{T}^k(g)}{\sum_{k=0}^{n-1} \widehat{T}^k(g_0)}
= \frac{\int f \, d\mu}{\int g \, d\mu}.$$

Remark 4.1.19. We can also give a second proof of Hopf’s Ergodic Theorem (which we already discussed in Chapter 2, see Theorem 2.4.24), using the Chacon–Ornstein Ergodic Theorem. The argument works in precisely the same way as that given above. First, we have seen in Section 4.1 that the Chacon–Ornstein Ergodic Theorem is applicable to the Koopman operator $U_T : L^1(\mu) \to L^1(\mu)$, given by $U_T f := f \circ T$. Therefore, the assertion in Hopf’s Ergodic Theorem certainly holds for all $f \in L^1(\mu)$ and for a particular $g_0 \in L_1^+(\mu)$ such that $g_0 > 0$. The proof is then completed in exactly the same way as in Hurewicz’s Ergodic Theorem for any $g \in L^1(\mu)$ with $\int g \, d\mu > 0$.
Now Birkhoff’s Pointwise Ergodic Theorem (see Theorem 2.4.16) follows immedi-
ately by choosing g = 1 X in Hopf’s Ergodic Theorem. Recall that in the proof we gave
of Hopf’s Ergodic Theorem in Chapter 2 using inducing, part of the argument was to
use Birkhoff’s Theorem, so this deduction is only reasonable now.
4.2 Pointwise dual ergodicity

In this section we will study an ergodicity property of the transfer operator associated
to a system.

Definition 4.2.1. An ergodic, conservative measure-preserving system $(X, \mathcal{B}, \mu, T)$ is said to be pointwise dual ergodic if there exists a sequence $(r_n)_{n \ge 1}$ such that μ-a.e.

$$\frac{1}{r_n} \sum_{i=0}^{n-1} \widehat{T}^i f \to \int f \, d\mu \quad \text{for all } f \in L^1(\mu).$$

The sequence $(r_n)_{n \ge 1}$, which is unique up to asymptotic equivalence (see the remark below), will be referred to as the return sequence of $T$.

Remark 4.2.2. For what follows it will be important to find a sequence $(a_n)_{n \ge 1}$ in the asymptotic class of $(r_n)$ that is strictly increasing. First fix $f, g \in M(\mathcal{B})$ with $f$ a μ-integrable function and $g$ a bounded function with $g > 0$, $\int g \, d\mu = 1$. Then fix a μ-typical point $x \in X$ that witnesses both the Hurewicz Ergodic Theorem, in the sense that

$$\lim_{n \to \infty} \sum_{k=0}^{n} \widehat{T}^k f(x) \Big/ \sum_{k=0}^{n} \widehat{T}^k g(x) = \int f \, d\mu,$$

and the pointwise dual ergodicity for $f$, in the sense that

$$\frac{1}{r_n} \sum_{i=0}^{n-1} \widehat{T}^i f(x) \to \int f \, d\mu.$$

Then the sequence $a_n := \sum_{k=0}^{n-1} \widehat{T}^k g(x)$ is strictly increasing and asymptotic to $r_n$. In fact we have shown that $a_n$ can be written as the sum $a_n = \sum_{k=1}^{n} b_k$ where $b_k := \widehat{T}^{k-1} g(x)$ is strictly positive.

Definition 4.2.3. For a set $A \in \mathcal{B}$ with $0 < \mu(A) < \infty$, the wandering rate of $A$ with respect to $T$ is given by the sequence $(w_n(A))_{n \ge 1}$, where

$$w_n(A) := \mu\Big( \bigcup_{k=0}^{n} T^{-k}(A) \Big).$$

With $\varphi$ denoting the return time function with respect to $A$ (as in Definition 2.4.25), we also have that

$$w_n(A) = \sum_{k=0}^{n} \mu(A \cap \{\varphi > k\}) = \mu(A) + \sum_{k=1}^{n} \mu\Big( T^{-k} A \setminus \bigcup_{\ell=0}^{k-1} T^{-\ell} A \Big). \tag{4.2}$$

The proof of this statement is left to Exercise 4.4.8.
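As a rough numerical illustration of the first equality in (4.2), the following sketch approximates $\sum_{k=0}^{n} \mu(A \cap \{\varphi > k\})$ for one concrete infinite-measure system. Everything specific here is an assumption made for the example and not part of the definition: the Farey map in its standard form, the set $A = (1/2, 1]$, and the invariant density $1/x$. For this choice a short computation suggests that $\bigcup_{k=0}^{n} F^{-k}(A) = (1/(n+2), 1]$, so the wandering rate should be close to $\log(n+2)$.

```python
import math

def farey(x):
    """Farey map on [0, 1] in its standard form (an assumption of this sketch)."""
    return x / (1.0 - x) if x <= 0.5 else (1.0 - x) / x

def capped_return_time(x, cap):
    """Return min(phi(x), cap + 1), where phi is the first return time to A = (1/2, 1]."""
    y = x
    for k in range(1, cap + 1):
        y = farey(y)
        if y > 0.5:
            return k
    return cap + 1

def wandering_rate(n, grid=20000):
    """Midpoint-rule approximation of sum_{k=0}^n nu(A ∩ {phi > k}) with d(nu) = dx / x."""
    step = 0.5 / grid
    total = 0.0
    for j in range(grid):
        x = 0.5 + (j + 0.5) * step
        # For fixed x, sum_{k=0}^n 1{phi(x) > k} equals min(phi(x), n + 1).
        total += capped_return_time(x, n) / x * step
    return total

for n in (1, 5, 10):
    print(n, wandering_rate(n), math.log(n + 2))
```

The logarithmic growth is one way to see "how infinite" this particular system is: the wandering rate diverges, but only slowly.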


Information about the wandering rate can be understood as information about “how infinite” the system is, in terms of the size of $X$ relative to $A$. That is, the increments $\mu(A \cap \{\varphi > n\})$ in the characterisation of the wandering rate quantify in some way how large $X$ is relative to $A$.

Definition 4.2.4. Let $(X, \mathcal{B}, \mu, T)$ be pointwise dual ergodic with return sequence $(r_n)$. Then, a set $A \in \mathcal{B}$ with positive, finite measure is called a uniform set for $f \in L_1^+(\mu)$ if

$$\frac{1}{r_n} \sum_{k=0}^{n-1} \widehat{T}^k f \to \int f \, d\mu \quad \text{uniformly (mod } \mu \text{) on } A.$$

We will also call a set $A \in \mathcal{B}$ uniform if it is a uniform set for some $f \in L_1^+(\mu)$ with $\int f \, d\mu > 0$.

Here, uniform convergence (mod μ) on $A$ means uniform convergence on a set $A_0$ such that the symmetric difference $A_0 \,\triangle\, A$ has μ-measure zero. One can also think of this convergence as convergence in the $L^\infty(\mu|_A)$-norm. We also recall that the definition of a regularly varying sequence was given directly above Theorem 3.2.6.

Lemma 4.2.5. Assume that the sequence $(r_n)$ is regularly varying with exponent $\rho \in [0, 1]$ and given by $r_n = \sum_{k=0}^{n} b_k$, for some non-negative sequence $(b_n)$. If for some $A \in \mathcal{B}$ with $0 < \mu(A) < \infty$ and $f \in L_1^+(\mu)$ we have uniformly (mod μ) on $A$ that

$$\lim_{n \to \infty} \frac{1}{r_n} \sum_{k=0}^{n-1} \widehat{T}^k f = \int f \, d\mu,$$

then with $B(s) := \sum_{n=0}^{\infty} b_n s^n$, $s \in [0, 1)$, we have uniformly (mod μ) on $A$ that

$$\lim_{s \to 1^-} \frac{1}{B(s)} \sum_{n=0}^{\infty} s^n \widehat{T}^n f = \int f \, d\mu.$$

Proof. Let $r_n = n^\rho \psi(n)$ for some slowly varying function $\psi$. Since $b_n \le r_n$ it follows that $B(s)$ is finite for all $s \in [0, 1)$. Then the claim follows by applying Karamata’s Tauberian Theorem (see Theorem 3.2.6) twice, first with the sequence $(b_n)$ and then with the sequence $(\widehat{T}^n f)$. More precisely, first note that

$$\frac{1}{\Gamma(1 + \rho)} \sum_{k=0}^{n} b_k \sim \frac{n^\rho \psi(n)}{\Gamma(1 + \rho)} \implies B(s) \sim \frac{1}{(1 - s)^\rho}\, \psi\Big( \frac{1}{1 - s} \Big)\, \Gamma(1 + \rho).$$

Thus, since by assumption uniformly (mod μ) on $A$

$$\sum_{k=0}^{n} \widehat{T}^k f \sim \int f \, d\mu \cdot r_n \sim \int f \, d\mu \cdot n^\rho \psi(n),$$

it follows that uniformly (mod μ) on $A$

$$B(s) \cdot \int f \, d\mu \sim \frac{\int f \, d\mu}{(1 - s)^\rho}\, \psi\Big( \frac{1}{1 - s} \Big)\, \Gamma(1 + \rho) \sim \sum_{n=0}^{\infty} s^n \widehat{T}^n f.$$
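The first implication used in this proof, from the growth of the partial sums $r_n$ to the behaviour of $B(s)$ near $s = 1$, can be checked numerically on a toy sequence. The sketch below is only an illustration under stated assumptions: the sequence $b_k = 1/\sqrt{k+1}$ is a hypothetical choice (not from the text), for which $r_n \sim 2\sqrt{n}$, so that $\rho = 1/2$ and the slowly varying function is the constant $2$.

```python
import math

RHO = 0.5   # exponent of regular variation for the toy sequence
PSI = 2.0   # constant slowly varying factor: r_n ~ PSI * n**RHO

def partial_sum(n):
    """r_n = sum_{k=0}^n b_k for the illustrative choice b_k = 1/sqrt(k+1)."""
    return sum(1.0 / math.sqrt(k + 1) for k in range(n + 1))

def generating_function(s, tol=1e-16):
    """B(s) = sum_{n>=0} b_n s^n, truncated once s^n drops below tol."""
    total, power, n = 0.0, 1.0, 0
    while power > tol:
        total += power / math.sqrt(n + 1)
        power *= s
        n += 1
    return total

def predicted(s):
    """Asymptotic form Gamma(1+rho) * psi * (1-s)^(-rho) suggested by Karamata's theorem."""
    return math.gamma(1 + RHO) * PSI / (1 - s) ** RHO

for s in (0.99, 0.999, 0.9999):
    print(s, generating_function(s) / predicted(s))
```

The printed ratios should drift towards 1 as $s \to 1^-$; the convergence is slow because of a constant-order correction term.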

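Lemma 4.2.6 ([Aar81]). For $f \in L^1(\mu)$ or $f \in M^+(\mathcal{B})$, and $A \in \mathcal{B}$ such that $0 < \mu(A) < \infty$ we have for $s \in (0, 1)$

$$\int_A \big( 1 - s^\varphi \big) \sum_{n=0}^{\infty} s^n \widehat{T}^n f \, d\mu = \sum_{n=0}^{\infty} s^n \int_{A_n} f \, d\mu,$$

where $A_n := T^{-n} A \setminus \bigcup_{k=0}^{n-1} T^{-k} A$, for $n \in \mathbb{N}$, and $A_0 = A$.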

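Proof. We may restrict our attention to non-negative measurable functions $f$ only since the case $f \in L^1(\mu)$ can be deduced from this in the following way: First note that for $f \in L_1^+(\mu)$ the right hand side of the equality is finite due to the fact that the sets $A_n$ are pairwise disjoint. For an arbitrary $f \in L^1(\mu)$ we consider its positive part $\max(f, 0)$ and its negative part $\max(-f, 0)$ separately and use the linearity and positivity of $\widehat{T}$ together with the linearity of both the integral and the infinite sum.

For $f \in M^+(\mathcal{B})$ and with $d\nu = f \, d\mu$ we have

$$\begin{aligned}
\int_A \widehat{T}^n f \, d\mu = \nu(T^{-n} A)
&= \nu(A_n) + \sum_{k=0}^{n-1} \nu\big( T^{-k}(A \cap \{\varphi = n - k\}) \big) \\
&= \nu(A_n) + \int_A \sum_{k=0}^{n-1} \widehat{T}^k f \cdot 1_{\{\varphi = n - k\}} \, d\mu.
\end{aligned}$$

Using this gives

$$\begin{aligned}
\int_A \sum_{n=0}^{\infty} s^n \widehat{T}^n f \, d\mu
&= \sum_{n=0}^{\infty} s^n \nu(A_n) + \sum_{n=1}^{\infty} s^n \int_A \sum_{k=0}^{n-1} \widehat{T}^k f \cdot 1_{\{\varphi = n - k\}} \, d\mu \\
&= \sum_{n=0}^{\infty} s^n \nu(A_n) + \int_A \sum_{k=1}^{\infty} s^k 1_{\{\varphi = k\}} \sum_{n=k}^{\infty} s^{n-k} \widehat{T}^{n-k} f \, d\mu \\
&= \sum_{n=0}^{\infty} s^n \int_{A_n} f \, d\mu + \int_A s^\varphi \sum_{n=0}^{\infty} s^n \widehat{T}^n f \, d\mu.
\end{aligned}$$

Rearranging the last identity proves the claim.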

Proposition 4.2.7 (Asymptotic Renewal Equation [Aar81]). Let $A$ be a uniform set with regularly varying return sequence $r_n = \sum_{k=0}^{n} b_k$, for some non-negative sequence $(b_k)$. Then

$$\int_A \big( 1 - s^\varphi \big) \, d\mu \sim \frac{1}{B(s)}, \quad \text{for } s \to 1^-,$$

where $B(s) := \sum_{n=0}^{\infty} b_n s^n$.

Proof. Let $A$ be a uniform set for $f \in L_1^+(\mu)$. By Lemma 4.2.6 and with $A_n$ as defined therein, we have

$$\int_A \big( 1 - s^\varphi \big) \sum_{n=0}^{\infty} s^n \widehat{T}^n f \, d\mu = \sum_{n=0}^{\infty} s^n \int_{A_n} f \, d\mu \to \int f \, d\mu,$$

for $s \to 1^-$, where the convergence follows by Abel’s Theorem. On the other hand, making use now of Lemma 4.2.5 and the almost everywhere uniform convergence, we find for $s \to 1^-$

$$\int_A \big( 1 - s^\varphi \big) \sum_{k=0}^{\infty} s^k \widehat{T}^k f \, d\mu \sim B(s) \int f \, d\mu \int_A \big( 1 - s^\varphi \big) \, d\mu.$$

These observations combined prove the proposition.

Proposition 4.2.8. If $T$ is pointwise dual ergodic with return sequence $(r_n)$ and if $A$ is a uniform set such that the wandering rate $w_n(A)$ is regularly varying with exponent $\alpha \in [0, 1]$, then we have

$$r_n\, w_n(A) \sim \frac{n}{\Gamma(2 - \alpha)\Gamma(1 + \alpha)}.$$

In particular, we have that $r_n$ is regularly varying with exponent $1 - \alpha$ and there exists a sequence $W_n \nearrow \infty$ such that for all uniform sets $B \in \mathcal{B}$ we have $W_n \sim w_n(B)$.

Proof. As shown in Remark 4.2.2, pointwise dual ergodicity implies that the sequence $(r_n)$ can be chosen to be $r_n = \sum_{k=0}^{n} b_k$ for some strictly positive sequence $(b_n)$. On the one hand, Lemma 4.2.6 for $f = 1_X$ together with Proposition 4.2.7 implies, for $s \to 1^-$,

$$\sum_{k=0}^{\infty} s^k \mu(A_k) = \frac{1}{1 - s} \int_A (1 - s^\varphi) \, d\mu \sim \frac{1}{(1 - s) B(s)},$$

with $B(s) := \sum_{k=0}^{\infty} s^k b_k$. Since, as in (4.2),

$$w_n(A) = \mu(A) + \sum_{k=1}^{n} \mu(A_k) \sim n^\alpha \psi(n)$$

for $\alpha \in [0, 1]$ and $\psi$ some slowly varying function, Karamata’s Tauberian Theorem gives on the other hand

$$\sum_{k=1}^{\infty} s^k \mu(A_k) \sim \frac{\Gamma(1 + \alpha)}{(1 - s)^\alpha}\, \psi\Big( \frac{1}{1 - s} \Big).$$

Hence we have

$$B(s) \sim \frac{1}{(1 - s)^{1 - \alpha}\, \Gamma(1 + \alpha)\, \psi(1/(1 - s))}.$$

Applying the implication (a) ⇒ (b) of Karamata’s Tauberian Theorem to the generating function $B(s)$ gives

$$r_n = \sum_{k=0}^{n} b_k \sim \frac{n^{1 - \alpha}}{\Gamma(2 - \alpha)\Gamma(1 + \alpha)\psi(n)}.$$

From this the claim follows.

4.3 ψ-mixing, Darling–Kac sets and pointwise dual ergodicity

Definition 4.3.1. Let $(X, \mathcal{B}, \mu, T)$ be a measure-preserving system. Then a set $A \in \mathcal{B}$ with positive, finite measure is called a Darling–Kac set if there exists a positive sequence $(r_n)$ such that

$$\frac{1}{r_n} \sum_{k=0}^{n-1} \widehat{T}^k 1_A \to \mu(A) \quad \text{uniformly (mod } \mu \text{) on } A.$$

Let us now introduce a stronger mixing property than the one given in Definition 2.5.8.
We recall that the refinements U n of a collection of sets U were defined in Defini-
tion 1.2.23(c) and σ(U ) denotes the σ-algebra generated by U .

Definition 4.3.2. Let $(X, \mathcal{B}, \mu, T)$ be a dynamical system with $\mu(X) < \infty$ and let $\mathcal{M}$ be a measurable partition of $X$. Then the system is said to be ψ-mixing with respect to $\mathcal{M}$, if there exists a sequence $(\psi_m)_{m \ge 0}$ of positive real numbers which tends to zero for $m$ tending to infinity, such that for all $n \in \mathbb{N}$, $A \in \sigma(\mathcal{M}^n)$, $B \in \mathcal{B}$ and $m \in \mathbb{N}_0$, we have that

$$\mu\big( A \cap T^{-(m+n)}(B) \big) \le (1 + \psi_m)\, \mu(A)\, \mu(B),$$

and for all $m \in \mathbb{N}$ large enough

$$\mu\big( A \cap T^{-(m+n)}(B) \big) \ge (1 - \psi_m)\, \mu(A)\, \mu(B).$$

Proposition 4.3.3. Let $(X, \mathcal{B}, \mu, T)$ be a conservative, measure-preserving and ergodic dynamical system. Let $A \in \mathcal{B}$ with $0 < \mu(A) < \infty$ and let $\mathcal{M}$ be a measurable partition of $A$ such that the return time function $\varphi$ with respect to $A$ is $\mathcal{M}$-measurable and the induced system $(A, \mathcal{B}_A, \mu|_A, T_A)$ is ψ-mixing with respect to $\mathcal{M}$. Then $A$ is a Darling–Kac set.

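Proof. Without loss of generality we assume that $\mu(A) = 1$. Recall that since $T$ is conservative and ergodic we have by Corollary 2.4.5 that $\sum_{k=0}^{\infty} \widehat{T}^k 1_A = \infty$ and hence the sequence $(a_n)_{n \ge 1}$ given for each $n \in \mathbb{N}$ by $a_n := \sum_{k=1}^{n} \mu(A \cap T^{-k} A)$ tends to infinity. We will show that this sequence will witness the Darling–Kac property for $A$. To begin, by the assumed ψ-mixing condition there exists a positive sequence $(\psi_n)_{n \ge 0}$ with $\lim_{n \to \infty} \psi_n = 0$ such that for $B \in \sigma(\mathcal{M}^k)$ with $k \in \mathbb{N}$, and for all $n \in \mathbb{N}_0$ we have

$$\widehat{T}_A^{k+n} 1_B \le (1 + \psi_n)\, \mu(B)$$

and for all $n \in \mathbb{N}$ large enough

$$\widehat{T}_A^{k+n} 1_B \ge (1 - \psi_n)\, \mu(B).$$

The key observation in this proof will be that for $B \in \mathcal{B}$ with $B \subset A$,

$$A \cap T^{-n} B = \bigcup_{k=1}^{n} \{\varphi_k = n\} \cap T_A^{-k} B, \tag{4.3}$$

where, as in (2.12), we define $\varphi_k := \sum_{\ell=0}^{k-1} \varphi \circ T_A^\ell$. For the transfer operators, it follows from (4.3) that

$$\widehat{T}^n 1_A = \sum_{k=1}^{n} \widehat{T}_A^k 1_{\{\varphi_k = n\}} \quad \text{and} \quad \sum_{k=1}^{n} \widehat{T}^k 1_A = \sum_{k=1}^{n} \widehat{T}_A^k 1_{\{\varphi_k \le n\}}.$$

In particular, we have $a_n = \sum_{k=1}^{n} \mu|_A(\{\varphi_k \le n\})$. Now, for the upper bound we have

$$\begin{aligned}
\sum_{k=1}^{n} \widehat{T}^k 1_A = \sum_{k=1}^{n} \widehat{T}_A^k 1_{\{\varphi_k \le n\}}
&\le \sum_{k=1}^{n+m} \widehat{T}_A^k 1_{\{\varphi_k \le n\}} \\
&\le m + \sum_{k=1}^{n} \widehat{T}_A^{k+m} 1_{\{\varphi_{k+m} \le n\}} \le m + \sum_{k=1}^{n} \widehat{T}_A^{k+m} 1_{\{\varphi_k \le n\}} \\
&\le m + \sum_{k=1}^{n} (1 + \psi_m)\, \mu|_A(\{\varphi_k \le n\}) = m + (1 + \psi_m)\, a_n,
\end{aligned}$$

where for the last inequality we used the fact that $\{\varphi_k \le n\} \in \sigma(\mathcal{M}^k)$. Since this inequality holds for every $m, n \in \mathbb{N}$ and since $(a_n)$ is diverging it follows that uniformly a.e.

$$\limsup_{n \to \infty} \frac{1}{a_n} \sum_{k=1}^{n} \widehat{T}^k 1_A \le 1.$$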

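For the lower bound we calculate similarly

$$\begin{aligned}
\sum_{k=1}^{n} \widehat{T}^k 1_A
&\ge \sum_{k=1}^{n} \widehat{T}_A^{k+m} 1_{\{\varphi_{k+m} \le n\}} - m \\
&\ge \sum_{k=1}^{n} \widehat{T}_A^{k+m} 1_{\{\varphi_k \le n\}} - \sum_{k=1}^{n} \widehat{T}_A^{k+m} 1_{\{\varphi_k \le n \le \varphi_{k+m}\}} - m \\
&\ge \sum_{k=1}^{n} (1 - \psi_m)\, \mu|_A(\{\varphi_k \le n\}) - \sum_{k=1}^{n} \widehat{T}_A^{k+m} 1_{\{\varphi_k \le n \le \varphi_{k+m}\}} - m \\
&\ge (1 - \psi_m)\, a_n - m - \sum_{k=1}^{n} \widehat{T}_A^{k+m} 1_{\{\varphi_k \le n \le \varphi_{k+m}\}}.
\end{aligned}$$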

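Now we observe that

$$\begin{aligned}
\sum_{k=1}^{n} \widehat{T}_A^{k+m} 1_{\{\varphi_k \le n \le \varphi_{k+m}\}}
&\le (1 + \psi_0) \sum_{k=1}^{n} \mu|_A(\{\varphi_k \le n \le \varphi_{k+m}\}) \\
&= (1 + \psi_0) \sum_{k=1}^{n} \sum_{\ell=1}^{n} \mu|_A(\{\varphi_k = \ell;\ \varphi_m \circ T_A^k > n - \ell\}) \\
&\le (1 + \psi_0)^2 \sum_{k=1}^{n} \sum_{\ell=1}^{n} \mu|_A(\{\varphi_k = \ell\})\, \mu|_A(\{\varphi_m > n - \ell\}).
\end{aligned}$$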
k=1 =1

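Let us split up the last sum for $\ell = 1, \ldots, n - p$ and $\ell = n - p + 1, \ldots, n$ for a fixed $p < n$. For the first part we have

$$\begin{aligned}
\sum_{k=1}^{n} \sum_{\ell=1}^{n-p} \mu|_A(\{\varphi_k = \ell\})\, \mu|_A(\{\varphi_m > n - \ell\})
&\le \mu|_A(\{\varphi_m > p\}) \sum_{k=1}^{n} \mu|_A(\{\varphi_k \le n - p\}) \\
&\le \mu|_A(\{\varphi_m > p\})\, a_n.
\end{aligned}$$

For the second part, using $\mu|_A(\{\varphi_m > n - \ell\}) \le 1$ we have

$$\begin{aligned}
\sum_{k=1}^{n} \sum_{\ell=n-p+1}^{n} \mu|_A(\{\varphi_k = \ell\})\, \mu|_A(\{\varphi_m > n - \ell\})
&\le \sum_{k=1}^{n} \mu|_A(\{n - p \le \varphi_k \le n\}) \\
&\le \sum_{k=1}^{n} \mu|_A(\{\varphi_k \le n\}) - \sum_{k=1}^{n} \mu|_A(\{\varphi_k \le n - p\}) \\
&\le \sum_{k=1}^{n} \mu|_A(\{\varphi_k \le n\}) - \sum_{k=1}^{n-p} \mu|_A(\{\varphi_k \le n - p\}) \\
&= a_n - a_{n-p} \le p.
\end{aligned}$$

Combining these three inequalities we see that for all $m, p \in \mathbb{N}$ we have

$$\liminf_{n \to \infty} \frac{1}{a_n} \sum_{k=1}^{n} \widehat{T}^k 1_A \ge 1 - \psi_m - (1 + \psi_0)^2\, \mu|_A(\{\varphi_1 \ge p\}).$$

Since $\varphi_1 = \varphi$ is finite a.e. we have $\mu|_A(\{\varphi_1 \ge p\}) \to 0$ for $p \to \infty$. Since also $\psi_m \to 0$ for $m \to \infty$, this proves the uniform lower bound

$$\liminf_{n \to \infty} \frac{1}{a_n} \sum_{k=1}^{n} \widehat{T}^k 1_A \ge 1.$$

Combining these bounds finishes the proof.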


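Lemma 4.3.4. For $A, B \in \mathcal{B}$ with $0 < \mu(A), \mu(B) < \infty$ and $A_n := A \setminus \bigcup_{k=1}^{n} T^{-k}(A)$ we have:

(a) $\displaystyle \sum_{k=0}^{\infty} \widehat{T}^k 1_{A_k} = 1_X$.

(b) $\displaystyle \sum_{k=0}^{n} \widehat{T}^k(1_B) = \sum_{\ell=0}^{n} \widehat{T}^\ell\Big( 1_{A_\ell} \cdot \sum_{k=0}^{n-\ell} \widehat{T}^k(1_B) \Big) + \sum_{k=0}^{n} \widehat{T}^k\, 1_{B \setminus \bigcup_{m=0}^{k} T^{-m}(A)}$.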

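Proof. To see the first claim, note that by Proposition 2.4.33 (a) we have for two measurable sets $A, C$ with finite measure and $\mu(A) > 0$ that

$$\sum_{k=0}^{\infty} \mu\big( A_k \cap T^{-k}(C) \big) = \sum_{k=0}^{\infty} \mu\big( A \cap T^{-k}(C) \cap \{\varphi > k\} \big) = \mu(C).$$

Hence, using the Monotone Convergence Theorem, for every measurable set $C$ with finite measure we have

$$\int_C \sum_{k=0}^{\infty} \widehat{T}^k(1_{A_k}) \, d\mu = \sum_{k=0}^{\infty} \int 1_C\, \widehat{T}^k(1_{A_k}) \, d\mu = \sum_{k=0}^{\infty} \mu\big( A_k \cap T^{-k}(C) \big) = \int_C 1_X \, d\mu.$$

This means that $\sum_{k=0}^{\infty} \widehat{T}^k(1_{A_k}) = 1_X$.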
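The second claim follows in a similar way. For $n \in \mathbb{N}_0$ and two measurable sets $A, C$ with finite measure we define

$$C_n := T^{-n}(C) \setminus \bigcup_{k=0}^{n} T^{-k}(A),$$

and observe that $T^{-1}(C_n) = C_{n+1} \cup (A \cap T^{-1}(C_n))$, where $C_{n+1} \cap (A \cap T^{-1}(C_n)) = \emptyset$. Now with $A, B, C$ measurable sets with finite measure we have

$$\begin{aligned}
\sum_{\ell=0}^{n} \sum_{k=0}^{n-\ell} \mu\big( B \cap T^{-k}(A_\ell) \cap T^{-(k+\ell)}(C) \big)
&= \sum_{\ell=0}^{n} \sum_{k=0}^{n-\ell} \mu\Big( B \cap T^{-k}\Big( A \cap \Big( T^{-\ell}(C) \setminus \bigcup_{m=1}^{\ell} T^{-m}(A) \Big) \Big) \Big) \\
&= \sum_{k=0}^{n} \mu\big( B \cap T^{-k}(C \cap A) \big) + \sum_{\ell=1}^{n} \sum_{k=0}^{n-\ell} \mu\big( B \cap T^{-k}\big( T^{-1}(C_{\ell-1}) \setminus C_\ell \big) \big).
\end{aligned}$$

Now for the second summand we find by a telescoping argument

$$\begin{aligned}
\sum_{\ell=1}^{n} \sum_{k=0}^{n-\ell} \mu\big( B \cap T^{-k}\big( T^{-1}(C_{\ell-1}) \setminus C_\ell \big) \big)
&= \sum_{\ell=0}^{n-1} \sum_{k=1}^{n-(\ell+1)+1} \mu\big( B \cap T^{-k}(C_\ell) \big) - \sum_{\ell=1}^{n} \sum_{k=0}^{n-\ell} \mu\big( B \cap T^{-k}(C_\ell) \big) \\
&= \sum_{\ell=0}^{n-1} \sum_{k=1}^{n-\ell} \mu\big( B \cap T^{-k}(C_\ell) \big) - \sum_{\ell=1}^{n-1} \sum_{k=1}^{n-\ell} \mu\big( B \cap T^{-k}(C_\ell) \big) - \sum_{\ell=1}^{n} \mu(B \cap C_\ell) \\
&= \sum_{k=1}^{n} \mu\big( B \cap T^{-k}(C_0) \big) - \sum_{\ell=1}^{n} \mu(B \cap C_\ell) \\
&= \sum_{k=0}^{n} \mu\big( B \cap T^{-k}(C \setminus A) \big) - \sum_{\ell=0}^{n} \mu(B \cap C_\ell).
\end{aligned}$$

Combining these two calculations gives

$$\sum_{k=0}^{n} \mu\big( B \cap T^{-k}(C) \big) = \sum_{\ell=0}^{n} \sum_{k=0}^{n-\ell} \mu\big( B \cap T^{-k}(A_\ell) \cap T^{-(k+\ell)}(C) \big) + \sum_{\ell=0}^{n} \mu\Big( B \cap \Big( T^{-\ell} C \setminus \bigcup_{m=0}^{\ell} T^{-m} A \Big) \Big).$$

The remaining part of the proof is analogous to the first part.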

Proposition 4.3.5. Let $(X, \mathcal{B}, \mu, T)$ be a conservative ergodic measure-preserving system, and let $A, B \in \mathcal{B}$ with $0 < \mu(A), \mu(B) < \infty$. Suppose that there exists an increasing positive sequence $(a_n)_{n \ge 1}$ tending to infinity such that a.e. on $A$ we have

$$\lim_{n \to \infty} \frac{1}{a_n} \sum_{k=0}^{n} \widehat{T}^k 1_B = \mu(B). \tag{4.4}$$

Then the system is pointwise dual ergodic. In particular, the existence of a Darling–Kac set implies pointwise dual ergodicity.

Proof. We are going to prove that the convergence in (4.4) in fact holds a.e. on $X$. Then Hurewicz’s Ergodic Theorem combined with this observation proves pointwise dual ergodicity.

First note that by the Chacon–Ornstein Lemma 4.1.4 and the assumption stated in the proposition we have for all $N \in \mathbb{N}$ that

$$\frac{a_{n-N}}{a_n} = \frac{a_{n-N}}{\sum_{k=0}^{n-N} \widehat{T}^k 1_B} \cdot \frac{\sum_{k=0}^{n} \widehat{T}^k 1_B}{a_n} \cdot \Bigg( 1 - \frac{\sum_{k=n-N+1}^{n} \widehat{T}^k 1_B}{\sum_{k=0}^{n} \widehat{T}^k 1_B} \Bigg) \to 1.$$

As before, for $n \ge 0$, let $A_n := A \setminus \bigcup_{k=1}^{n} T^{-k} A$. By Egorov’s Theorem we may assume without loss of generality that the convergence in (4.4) holds uniformly on $A$. Fix $\varepsilon > 0$. Using the first part of Lemma 4.3.4 we find for almost every $x \in X$ an $n_0 \in \mathbb{N}$ such that $\sum_{k=0}^{n_0} \widehat{T}^k 1_{A_k}(x) \ge (1 - \varepsilon)$. Further there exists $n_1 > n_0$ such that for all $n \ge n_1$ and uniformly on $A$ we have

$$\frac{1}{a_{n-n_0}} \sum_{k=0}^{n-n_0} \widehat{T}^k(1_B) \ge (1 - \varepsilon)\, \mu(B) \quad \text{and} \quad \frac{a_{n-n_0}}{a_n} \ge (1 - \varepsilon).$$

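Now using the second part of Lemma 4.3.4 gives for all $n \ge n_1$

$$\begin{aligned}
\frac{1}{a_n} \sum_{k=0}^{n} \widehat{T}^k 1_B(x)
&= \frac{1}{a_n} \Bigg( \sum_{k=0}^{n} \widehat{T}^k\Big( 1_{A_k} \cdot \sum_{\ell=0}^{n-k} \widehat{T}^\ell 1_B \Big) + \sum_{k=0}^{n} \widehat{T}^k\, 1_{B \setminus \bigcup_{j=0}^{k} T^{-j}(A)} \Bigg)(x) \\
&\ge \frac{1}{a_n} \sum_{k=0}^{n_0} \widehat{T}^k\Big( 1_{A_k} \cdot \sum_{\ell=0}^{n-k} \widehat{T}^\ell(1_B) \Big)(x) \\
&\ge \frac{1}{a_n} \sum_{k=0}^{n_0} \widehat{T}^k\Big( 1_{A_k} \cdot \sum_{\ell=0}^{n-n_0} \widehat{T}^\ell(1_B) \Big)(x) \\
&\ge \frac{a_{n-n_0}}{a_n} (1 - \varepsilon)\, \mu(B) \sum_{k=0}^{n_0} \widehat{T}^k 1_{A_k}(x) \ge (1 - \varepsilon)^3 \mu(B).
\end{aligned}$$

This shows that a.e. we have $\liminf_{n \to \infty} \frac{1}{a_n} \sum_{k=0}^{n} \widehat{T}^k 1_B \ge \mu(B)$.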
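Towards the upper bound for the limit superior, fix $\varepsilon > 0$. By Hurewicz’s Ergodic Theorem we also have a.e. on $A$,

$$\lim_{n \to \infty} \frac{1}{a_n} \sum_{k=0}^{n} \widehat{T}^k 1_A = \mu(A). \tag{4.5}$$

Again by Egorov’s Theorem we find a set $A' \in \mathcal{B}$ such that $\mu(A') > (1 + \varepsilon)^{-1} \mu(A)$ and the convergence in (4.5) holds uniformly on $A'$. Hence there exists $n_0 \in \mathbb{N}$ such that on $A'$ we have $\frac{1}{a_n} \sum_{k=0}^{n} \widehat{T}^k 1_A \le (1 + \varepsilon)\mu(A)$ for all $n \ge n_0$. Using the second part of Lemma 4.3.4 again with $A'$ in the place of $B$ gives

$$\begin{aligned}
\frac{1}{a_n} \sum_{k=0}^{n} \widehat{T}^k 1_{A'}
&= \frac{1}{a_n} \sum_{k=0}^{n} \widehat{T}^k\Big( 1_{A'_k} \cdot \sum_{\ell=0}^{n-k} \widehat{T}^\ell 1_{A'} \Big) \\
&\le \frac{1}{a_n} \sum_{k=0}^{n} \widehat{T}^k\Big( 1_{A'_k} \cdot \sum_{\ell=0}^{n} \widehat{T}^\ell(1_A) \Big) \\
&\le (1 + \varepsilon)\, \mu(A) \sum_{k=0}^{n} \widehat{T}^k 1_{A'_k} \\
&\le (1 + \varepsilon)\, \mu(A) \le (1 + \varepsilon)^2 \mu(A').
\end{aligned}$$

Hence, we have a.e. $\limsup_{n \to \infty} \frac{1}{a_n} \sum_{k=0}^{n} \widehat{T}^k 1_{A'} \le \mu(A')$. Another application of Hurewicz’s Ergodic Theorem allows us to replace $1_{A'}$ by $1_B$ in the above inequality. This finishes the proof.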

4.4 Exercises

Exercise 4.4.1. Let $X := (0, \infty)$ and $\lambda$ denote the Lebesgue measure restricted to $X$. Consider

$$V_1 : L^1(\lambda) \to L^1(\lambda), \qquad f \mapsto \big( x \mapsto V_1(f)(x) := e^{-x} f(x) \big).$$

Is $V_1$ a positive contractive operator? Is this operator conservative? Determine the sets $\{S_\infty g = \infty\}$ and $\{S_\infty g > 0\}$ for some integrable $g > 0$. Is $V_1$ ergodic?

Exercise 4.4.2. Let $X := (0, \infty)$ and $\lambda$ denote the Lebesgue measure restricted to $X$. Consider

$$V_2 : L^1(\lambda) \to L^1(\lambda), \qquad f \mapsto \Big( x \mapsto V_2(f)(x) := 1_{(0,1)}(x) \int f \, d\lambda \Big).$$

Is $V_2$ a positive contractive operator? Is this operator conservative? Determine the sets $\{S_\infty g = \infty\}$ and $\{S_\infty g > 0\}$ for some integrable $g > 0$. Is $V_2$ ergodic?

Exercise 4.4.3. Let $f, g \in L^1(\mu)$ with $g \ge 0$. Prove with the help of Wiener’s Maximal Inequality that the limit

$$\lim_{n \to \infty} \frac{S_n f}{S_n g} = \frac{S_\infty f}{S_\infty g}$$

exists μ-a.e. on $\{S_\infty g > 0\} \setminus \{S_\infty g = \infty\}$. Go back to Exercises 4.4.1 and 4.4.2 and consider the limit $\lim_{n \to \infty} S_n f / S_n g$ on $\{S_\infty g > 0\}$ and $\{S_\infty g = \infty\}$, respectively.

Exercise 4.4.4. Let $V$ be a positive conservative contraction on $L^1(\mu)$. Then a measurable set $A$ is called $V^*$-invariant if $1_A$ is a $V^*$-invariant function. Do the $V^*$-invariant sets form a σ-algebra? Compare this collection with the σ-algebra generated by the $V^*$-invariant functions.

Exercise 4.4.5. For g ∈ L+1 (μ) we have that the set {S∞ g = ∞} is V * -invariant.

Exercise 4.4.6. Prove Remark 4.1.15.

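Exercise 4.4.7. Let $(X, \mathcal{B}, \mu)$ be a probability space. We recall that a sub-σ-algebra $\mathcal{F} \subset \mathcal{B}$ is said to be trivial if for all $A \in \mathcal{F}$ we have $\mu(A) \in \{0, 1\}$. Show that for a trivial sub-σ-algebra $\mathcal{F}$ and all $f \in L^1(\mu)$ we have

$$E_\mu(f \,|\, \mathcal{F}) = \int f \, d\mu.$$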

Exercise 4.4.8. Prove the statement in (4.2).


5 Applications of infinite ergodic theory
In this chapter, we shall first consider the sum-level sets for the continued fraction
expansion and prove all the corresponding results to those obtained in Chapter 3 for
the α-sum-level sets. In the continued fraction case investigated here, though, we will
first have to prove that the Gauss map has the ψ-mixing property, in order to apply the
results from infinite ergodic theory given in Chapter 4, as the renewal arguments given
in Chapter 3 are no longer sufficient.
We will also see an application of the fine asymptotics of the Lebesgue measure of
the sum-level sets to Diophantine approximation. Finally, we employ infinite ergodic
theory to show that the even Stern–Brocot sequence is uniformly distributed with
respect to certain canonical weightings.

5.1 Sum-level sets for the continued fraction expansion, first investigations

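To begin, let us recall the definition of the sum-level sets $(\mathcal{C}_n)_{n \ge 1}$ for the continued fraction expansion:

$$\begin{aligned}
\mathcal{C}_n &:= \Big\{ [x_1, x_2, x_3, \ldots] \in [0, 1] : \sum_{i=1}^{k} x_i = n \text{ for some } k \in \mathbb{N} \Big\} \\
&= \bigcup_{k=1}^{n} \ \bigcup_{(x_1, \ldots, x_k) :\, \sum_{i=1}^{k} x_i = n} C(x_1, \ldots, x_k).
\end{aligned}$$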

We claimed in the introduction to this chapter that the renewal theory arguments as used in Chapter 3 are no longer sufficient to analyse the sum-level sets for the continued fraction expansion. To see why, observe that if we wanted to prove a result equivalent to Lemma 3.2.8 in this situation, we would be doomed to failure, as the following calculation shows:

$$\frac{3}{10} = \lambda(\mathcal{C}_3) \ne \frac{1}{2} \cdot \lambda(\mathcal{C}_2) + \frac{1}{6} \cdot \lambda(\mathcal{C}_1) + \frac{1}{12} \cdot \lambda(\mathcal{C}_0) = \frac{1}{3}.$$

(Recall that we calculated the values of the Lebesgue measure of the first four sum-level sets at the beginning of Chapter 3.)
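The two sides of this calculation can be reproduced with exact rational arithmetic. The following is a minimal sketch, assuming the standard formula $\lambda(C(x_1, \ldots, x_k)) = 1/(q_k(q_k + q_{k-1}))$ for the length of a Gauss cylinder (with $q_k$ the denominators of the convergents) and the convention $\lambda(\mathcal{C}_0) = 1$ that the display above implicitly uses; the function names are hypothetical.

```python
from fractions import Fraction

def compositions(n):
    """Yield all tuples of positive integers summing to n."""
    if n == 0:
        yield ()
        return
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield (first,) + rest

def cylinder_length(digits):
    """Lebesgue measure of C(x_1, ..., x_k), computed as 1 / (q_k (q_k + q_{k-1}))."""
    q_prev, q = 0, 1  # q_{-1} = 0, q_0 = 1
    for a in digits:
        q_prev, q = q, a * q + q_prev
    return Fraction(1, q * (q + q_prev))

def sum_level_measure(n):
    """Lebesgue measure of the n-th sum-level set (assumed convention: measure 1 for n = 0)."""
    if n == 0:
        return Fraction(1)
    return sum(cylinder_length(c) for c in compositions(n))

# Left-hand side and the renewal-type right-hand side of the display above.
lhs = sum_level_measure(3)
rhs = sum(Fraction(1, k * (k + 1)) * sum_level_measure(3 - k) for k in (1, 2, 3))
print(lhs, rhs)
```

Since there are $2^{n-1}$ compositions of $n$, this brute-force enumeration is only practical for small $n$.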
Before stating the first main theorem, we need the following lemma which
provides the crucial link between the sequence of sum-level sets and the Farey map.
Note that this lemma contradicts our initial impression that the sequence of sum-level
sets is not a dynamical entity, despite its apparent strangeness.
Lemma 5.1.1. For all $n \in \mathbb{N}$, we have that

$$F^{-(n-1)}(\mathcal{C}_1) = \mathcal{C}_n.$$

Proof. By computing the images of $\mathcal{C}_1$ under the inverse branches $F_0$ and $F_1$ of the Farey map, one immediately verifies that $F^{-1}(\mathcal{C}_1) = \mathcal{C}_2$. We then proceed by way of induction as follows. Assume that for some $n \in \mathbb{N}$ we have that $F^{-(n-1)}(\mathcal{C}_1) = \mathcal{C}_n$. Since $F^{-n}(\mathcal{C}_1) = F^{-1}(F^{-(n-1)}(\mathcal{C}_1)) = F^{-1}(\mathcal{C}_n)$, it is then sufficient to show that $F^{-1}(\mathcal{C}_n) = \mathcal{C}_{n+1}$. For this, let $x = [x_1, x_2, x_3, \ldots] \in \mathcal{C}_n$ be given. Then there exists $\ell \in \mathbb{N}$ such that

$$x \in C(x_1, \ldots, x_\ell) \quad \text{and} \quad \sum_{i=1}^{\ell} x_i = n.$$

By computing the images of $x$ under the inverse branches $F_0$ and $F_1$, one obtains that $F^{-1}(x) = \{[1, x_1, x_2, \ldots], [x_1 + 1, x_2, \ldots]\}$. Since we have that

$$1 + \sum_{i=1}^{\ell} x_i = (x_1 + 1) + \sum_{i=2}^{\ell} x_i = n + 1,$$

this shows that $F^{-1}(x) \subset \mathcal{C}_{n+1}$, and hence, $F^{-1}(\mathcal{C}_n) \subset \mathcal{C}_{n+1}$. The reverse inclusion $\mathcal{C}_{n+1} \subset F^{-1}(\mathcal{C}_n)$ can be established by counting the Stern–Brocot intervals contained in $\mathcal{C}_{n+1}$.
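The action of the two inverse branches on continued fraction digits, which drives the induction step above, can be checked on a rational test point. This is a sketch under the assumption that the inverse branches take their standard form $F_0(x) = x/(1+x)$ and $F_1(x) = 1/(1+x)$; the helper names are hypothetical, and the test point is chosen with last digit greater than 1 so that the Euclidean algorithm returns the digits one expects.

```python
from fractions import Fraction

def cf_digits(x):
    """Continued fraction digits of a rational x in (0, 1), via the Euclidean algorithm."""
    digits = []
    while x != 0:
        inv = 1 / x
        a = inv.numerator // inv.denominator
        digits.append(a)
        x = inv - a
    return digits

def from_digits(digits):
    """Rational number [a_1, ..., a_k] = 1 / (a_1 + 1 / (a_2 + ...))."""
    x = Fraction(0)
    for a in reversed(digits):
        x = 1 / (a + x)
    return x

def farey_inverse_branches(x):
    """F_0(x) = x / (1 + x) and F_1(x) = 1 / (1 + x) (standard forms, assumed here)."""
    return x / (1 + x), 1 / (1 + x)

x = from_digits([2, 3, 4])            # digit sum 9
y0, y1 = farey_inverse_branches(x)
print(cf_digits(y0), cf_digits(y1))   # digit sums should both be 10
```

Both preimages have digit sum $n + 1$, in line with the inclusion $F^{-1}(\mathcal{C}_n) \subset \mathcal{C}_{n+1}$.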

Remark 5.1.2. Notice that an analogous result also holds for the α-sum-level sets, but
that this observation was not necessary for the analysis given in Section 3.2.2.

We are now almost ready to prove our first main theorem. The proof will follow on combining Lemma 5.1.1 with the next result which depends on the fact that $F$ is exact (as shown in Theorem 2.5.6), so that we can apply Lin’s criterion for exactness (see Theorem 2.5.7). Let us also recall that the unique absolutely continuous invariant measure for the Farey map, denoted by $\nu_F$, is given by $\nu_F(A) := \int_A h_F(x) \, d\lambda(x)$, where $h_F(x) := 1/x$ (see Proposition 2.3.19).

Proposition 5.1.3. For each measurable set $C$ which satisfies $\nu_F(C) < \infty$, we have that

$$\lim_{n \to \infty} \lambda\big( F^{-n}(C) \big) = 0.$$

Proof. Let $C \in \mathcal{B}$ be given as stated in the proposition. So, for each $A \in \mathcal{B}$ for which $0 < \nu_F(A) < \infty$, we then have

$$\begin{aligned}
\lambda\big( F^{-n}(C) \big) &= \nu_F\big( 1_{F^{-n}(C)} \cdot h_F^{-1} \big) = \nu_F\big( 1_C \circ F^n \cdot h_F^{-1} \big) \\
&= \nu_F\Big( 1_C \circ F^n \cdot \Big( h_F^{-1} - \frac{1_A}{\nu_F(A)} + \frac{1_A}{\nu_F(A)} \Big) \Big) \\
&\le \Big\| \widehat{F}^n\Big( h_F^{-1} - \frac{1_A}{\nu_F(A)} \Big) \Big\|_1 + \frac{\nu_F\big( F^{-n}(C) \cap A \big)}{\nu_F(A)} \\
&\le \Big\| \widehat{F}^n\Big( h_F^{-1} - \frac{1_A}{\nu_F(A)} \Big) \Big\|_1 + \frac{\nu_F(C)}{\nu_F(A)} \\
&\to \frac{\nu_F(C)}{\nu_F(A)}, \quad \text{for } n \text{ tending to infinity.}
\end{aligned}$$

Here, the limit follows from the fact that $\nu_F\big( h_F^{-1} - 1_A/\nu_F(A) \big) = 0$ and $F$ is exact, and hence, Lin’s criterion is applicable. Therefore, by choosing $A \in \mathcal{B}$ such that $\nu_F(A)$ is arbitrarily large, the proposition follows.
We can now easily apply this result to determine the limit of the sequence (λ(Cn ))n≥1 ,
which was our first objective.

Theorem 5.1.4.

$$\lim_{n \to \infty} \lambda(\mathcal{C}_n) = 0.$$

Proof. The proof follows immediately by first putting C = C1 in Proposition 5.1.3, and
then using the fact that Cn = F −(n−1) (C1 ), for all n ∈ N, as shown in Lemma 5.1.1.

Remark 5.1.5.
1. The above theorem (and proof) can be found in [KS12b]. Also in that paper, there is
another proof of the same theorem, which is more elementary, in the sense that it
uses less infinite ergodic theory. The other proof depends upon first showing that
lim inf n→∞ λ(Cn ) = 0. This fact was first established by Fiala and Kleban [FK10],
but they did not provide a proof for the limit.
2. The arguments given above can be slightly modified to give a different proof of
the infinite-type part of Theorem 3.2.9. Note that this will not work for the finite
measure case.

5.2 ψ-mixing for the Gauss map and the Gauss problem

Recall that in Corollary 2.5.9, we used Lin’s criterion for exactness to show that if
a map T is exact, then it is also mixing (cf. Definition 2.5.8). It follows, in light
of Theorem 2.4.12, that the Gauss map G is mixing. Here, we want to show that G
satisfies the stronger property of ψ-mixing which was introduced in Chapter 4 (see
Definition 4.3.2). This property can sometimes, for instance in [Aar97], be found under
the name continued-fraction mixing precisely because the Gauss map satisfies it. The
proof that we will present here is very much inspired by the proof given in [Ios92].
Before proving that the Gauss map $G$ is ψ-mixing, we must make some preliminary remarks. First of all, recall that we have $\widehat{G} 1 = 1$. Secondly, we remind the reader that the identity $\widehat{G} f = h_G^{-1} P_G(h_G f)$ was established in Proposition 2.3.18, where there we understand the operator $P_G$ as acting on $L^1(\mu)$. Throughout this section we want to use the pointwise definition of $h_G^{-1} P_G(h_G f)$ for a concrete measurable function $f$ (and in a later section we will do similarly for the Farey map $F$), but in order to shorten the notation, we will simply write $\widehat{G}(f)(x) = h_G^{-1} P_G(h_G f)(x)$. Thus, for the operator $\widehat{G}$, for all $x \in [0, 1]$ and “suitable” functions $f$, where the class of suitable functions will be defined below, we will write

$$\widehat{G} f(x) = \sum_{i=1}^{\infty} p_i(x)\, f\Big( \frac{1}{i + x} \Big),$$

where we have set $p_i(x) := (1 + x)/((i + x)(i + 1 + x))$, for all $i \in \mathbb{N}$. (With the pointwise definition above kept in mind, the proof of this fact is simply a calculation; we leave it to Exercise 5.7.2.) Notice that $(p_i(x))_{i \ge 1}$ is a probability vector; this will turn out to be crucial later.
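The probability-vector property comes from the telescoping identity $p_i(x) = (1 + x)\big( \tfrac{1}{i + x} - \tfrac{1}{i + 1 + x} \big)$. As a quick sanity check, here is a small numerical sketch of the pointwise formula; the truncation level `terms` and the function names are hypothetical choices for the illustration, and the sums are only approximations of the infinite series.

```python
import math

def p(i, x):
    """Weight p_i(x) = (1 + x) / ((i + x)(i + 1 + x))."""
    return (1.0 + x) / ((i + x) * (i + 1.0 + x))

def weight_sum(x, terms=100000):
    """Truncated sum of the weights; telescoping gives 1 - (1 + x) / (terms + 1 + x)."""
    return sum(p(i, x) for i in range(1, terms + 1))

def gauss_transfer(f, x, terms=100000):
    """Truncated version of sum_{i >= 1} p_i(x) f(1 / (i + x))."""
    return sum(p(i, x) * f(1.0 / (i + x)) for i in range(1, terms + 1))

for x in (0.0, 0.5, 1.0):
    print(x, weight_sum(x), gauss_transfer(lambda t: 1.0, x))

# For f(t) = t at x = 0 the series is sum 1 / (i^2 (i + 1)) = pi^2 / 6 - 1.
print(gauss_transfer(lambda t: t, 0.0), math.pi ** 2 / 6 - 1)
```

Applied to the constant function 1 the truncated operator returns values close to 1, matching $\widehat{G} 1 = 1$.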
Further, let us introduce the set of functions of bounded variation,

BV := {f : [0, 1] → R : var f < ∞} ,

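where $\operatorname{var} f$ is defined to be $\operatorname{var} f := \operatorname{var}_{[0,1]} f$ and where $\operatorname{var}_{[a,b]} f$ is defined, for $[a, b] \subset [0, 1]$, to be

$$\operatorname{var}_{[a,b]} f := \sup\Big\{ \sum_{i=1}^{n} |f(x_{i+1}) - f(x_i)| : a \le x_1 < \cdots < x_{n+1} \le b,\ n \in \mathbb{N} \Big\}.$$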

Note that any function of bounded variation is in particular bounded. One can show
that any function of bounded variation f can be written as the difference g − h
of two bounded functions g and h, which are either both non-decreasing or both
non-increasing (you are asked to prove this in Exercise 5.7.1). More precisely, these
functions can be chosen, for x ∈ [0, 1], in the non-decreasing case to be

g (x) := var[0,x] f and h (x) := var[0,x] f − f (x) ,

and in the non-increasing case to be

g (x) := var[x,1] f and h (x) := var[x,1] f − f (x) .

It is easy to see, and we leave it as an exercise, that in the non-decreasing case, var g =
g (1) − g (0) = var f and var h = h(1) − h(0) = var f + f (0) − f (1), and in the non-increasing
case, var g = var f and var h = var f − f (0) + f (1).
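The non-decreasing decomposition can be illustrated on a fixed grid. The sketch below is an approximation only: the variation is computed along one equally spaced partition (which gives a lower bound for the supremum in the definition), and the test function $f(x) = \sin(6x)$ is a hypothetical choice.

```python
import math

def cumulative_variation(values):
    """Running total of |f(x_{i+1}) - f(x_i)| along a fixed partition."""
    out = [0.0]
    for a, b in zip(values, values[1:]):
        out.append(out[-1] + abs(b - a))
    return out

def is_nondecreasing(values, tol=1e-12):
    """Check monotonicity up to floating-point rounding."""
    return all(b >= a - tol for a, b in zip(values, values[1:]))

n = 1000
xs = [i / n for i in range(n + 1)]
f = [math.sin(6.0 * x) for x in xs]
g = cumulative_variation(f)              # grid version of g(x) = var_[0,x] f
h = [gi - fi for gi, fi in zip(g, f)]    # h = g - f, so that f = g - h

var_f = g[-1]
var_h = h[-1] - h[0]
print(var_f, var_h, var_f + f[0] - f[-1])
```

On the grid, both `g` and `h` are non-decreasing and the variation of `h` agrees with $\operatorname{var} f + f(0) - f(1)$ up to rounding.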

Lemma 5.2.1. If $f : [0, 1] \to \mathbb{R}$ is bounded and either non-decreasing or non-increasing, then $\widehat{G} f$ is bounded and non-increasing or non-decreasing, respectively.

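Proof. That $\widehat{G} f$ is bounded follows directly from the fact that $\sum_{i=1}^{\infty} p_i(x) = 1$ for all $x \in [0, 1]$. For the proof of the remaining assertions, assume that $f$ is non-decreasing (for the non-increasing case consider $-f$ instead) and let $x, y \in [0, 1]$ be fixed such that $x < y$. Then,

$$\begin{aligned}
\widehat{G} f(x) - \widehat{G} f(y)
&= \sum_{i=1}^{\infty} p_i(x)\, f\Big( \frac{1}{x + i} \Big) - \sum_{i=1}^{\infty} p_i(y)\, f\Big( \frac{1}{y + i} \Big) \\
&= \sum_{i=1}^{\infty} \big( p_i(x) - p_i(y) \big) f\Big( \frac{1}{x + i} \Big) + \sum_{i=1}^{\infty} p_i(y) \Big( f\Big( \frac{1}{x + i} \Big) - f\Big( \frac{1}{y + i} \Big) \Big) \\
&\ge \sum_{i=1}^{\infty} \big( p_i(x) - p_i(y) \big) f\Big( \frac{1}{x + i} \Big) \\
&= \sum_{i=1}^{\infty} \big( p_i(x) - p_i(y) \big) \Big( f\Big( \frac{1}{x + i} \Big) - f\Big( \frac{1}{x + 2} \Big) \Big) \ge 0,
\end{aligned}$$

where, in the final equality, and inequality, we have used the observations that $\sum_{i=1}^{\infty} (p_i(x) - p_i(y)) = 0$, that $p_1$ is decreasing, and that $p_i$ is increasing for all $i \ge 3$.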

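Lemma 5.2.2. For each monotone and bounded function $f : [0, 1] \to \mathbb{R}$, we have that

$$\operatorname{var} \widehat{G} f \le \frac{1}{2} \operatorname{var} f.$$

Proof. Using Lemma 5.2.1 and assuming first that $f$ is non-decreasing, we obtain

$$\begin{aligned}
\operatorname{var} \widehat{G} f = \widehat{G} f(0) - \widehat{G} f(1)
&= \sum_{i=1}^{\infty} p_i(0)\, f\Big( \frac{1}{i} \Big) - \sum_{i=1}^{\infty} p_i(1)\, f\Big( \frac{1}{1 + i} \Big) \\
&= \frac{1}{2} f(1) - \sum_{i=2}^{\infty} \big( p_{i-1}(1) - p_i(0) \big) f\Big( \frac{1}{i} \Big) \\
&= \frac{1}{2} f(1) - \frac{1}{2} \sum_{i=2}^{\infty} \frac{2}{i(i + 1)}\, f\Big( \frac{1}{i} \Big) \le \frac{1}{2} f(1) - \frac{1}{2} f(0) = \frac{1}{2} \operatorname{var} f,
\end{aligned}$$

where we used the facts that $f$ is non-decreasing and that $\big( 2/(i(i + 1)) \big)_{i \ge 2}$ is a probability vector. For the non-increasing case, consider $-f$ instead and observe that $\operatorname{var}(-f) = \operatorname{var} f$ and $\widehat{G}(-f) = -\widehat{G} f$.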

Remark 5.2.3. Note that the constant $1/2$ in the latter lemma is optimal. In order to see this, choose $f$ to be any non-decreasing function such that $f|_{[0,1/2]} = 0$ and $0 < f(1) < \infty$. For this choice, the above calculation immediately shows that $\operatorname{var} \widehat{G} f = \tfrac{1}{2} \operatorname{var} f$.

Now we are in a position to take the next step towards the proof that $G$ is ψ-mixing, namely, we will obtain a bound on the distance, arising from the supremum norm $\|\cdot\|_\infty$, of the $n$-th iterate of $\widehat{G}$ applied to functions of bounded variation and the integral of these functions.

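Lemma 5.2.4. For each $f \in BV$ and $n \in \mathbb{N}$, we have that

$$\Big\| \widehat{G}^n f - \int f \, dm_G \Big\|_\infty \le 2^{-n} \big( 2 \operatorname{var} f - |f(0) - f(1)| \big).$$

If $f$ is monotone then we have $2 \operatorname{var} f - |f(0) - f(1)| = \operatorname{var} f$.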

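Proof. First observe that for each $u \in [0, 1]$ and $f \in BV$, we have that

$$|f(u)| - \Big| \int f \, dm_G \Big| \le \Big| \int \big( f(u) - f(x) \big) \, dm_G(x) \Big| \le \operatorname{var} f,$$

and hence, $\|f\|_\infty \le \big| \int f \, dm_G \big| + \operatorname{var} f$. It follows from (2.4) by setting $f := 1$ that $\int \widehat{G}^n f \, dm_G = \int f \, dm_G$, and thus the above observation gives, for each $n \in \mathbb{N}$,

$$\Big\| \widehat{G}^n f - \int f \, dm_G \Big\|_\infty \le \operatorname{var}\Big( \widehat{G}^n f - \int f \, dm_G \Big) = \operatorname{var} \widehat{G}^n f.$$

Let $f = g - h$, for the monotone bounded functions $g$ and $h$ as defined prior to Lemma 5.2.1. Using Lemma 5.2.2, it follows that

$$\begin{aligned}
\Big\| \widehat{G}^n f - \int f \, dm_G \Big\|_\infty
&\le \Big\| \widehat{G}^n g - \int g \, dm_G \Big\|_\infty + \Big\| \widehat{G}^n h - \int h \, dm_G \Big\|_\infty \\
&\le \operatorname{var} \widehat{G}^n g + \operatorname{var} \widehat{G}^n h \\
&\le 2^{-n} (\operatorname{var} g + \operatorname{var} h) \\
&= 2^{-n} \min\{ 2 \operatorname{var} f - f(0) + f(1),\ 2 \operatorname{var} f + f(0) - f(1) \} \\
&= 2^{-n} \big( 2 \operatorname{var} f - |f(0) - f(1)| \big).
\end{aligned}$$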

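Lemma 5.2.5. For each $B \in \mathcal{B}$, all $m, n \in \mathbb{N}_0$, and every Gauss cylinder set $C := C(x_1, \ldots, x_n)$ of level $n > 0$ (or $C := [0, 1]$ in the case $n = 0$), we have that

\[ \left| \lambda\bigl( G^{-n-m}(B) \cap C \bigr) - m_G(B)\, \lambda(C) \right| \le 2^{-m} \log 2 \; m_G(B)\, \lambda(C). \]

Proof. Using Lemma 5.2.4 and, as before, the relationship between $\hat{G}$ and $P_G$, we have that

\[
\begin{aligned}
\left| \lambda\bigl( G^{-n-m}(B) \cap C \bigr) - m_G(B)\, \lambda(C) \right|
&= \left| \int \mathbf{1}_B \circ G^{m+n} \cdot h_G^{-1} \cdot \mathbf{1}_C \, dm_G - m_G(B)\, \lambda(C) \right| \\
&= \left| \int \Bigl( \mathbf{1}_B \cdot \hat{G}^{m+n}\bigl( h_G^{-1} \mathbf{1}_C \bigr) - \lambda(C)\, \mathbf{1}_B \Bigr) \, dm_G \right| \\
&\le \int \mathbf{1}_B \, dm_G \; \left\| \hat{G}^m \bigl( \hat{G}^n ( h_G^{-1} \mathbf{1}_C ) \bigr) - \lambda(C) \right\|_\infty \\
&\le m_G(B)\, 2^{-m} \Bigl( 2 \operatorname{var} \hat{G}^n\bigl( h_G^{-1} \mathbf{1}_C \bigr) - \bigl| \hat{G}^n\bigl( h_G^{-1} \mathbf{1}_C \bigr)(0) - \hat{G}^n\bigl( h_G^{-1} \mathbf{1}_C \bigr)(1) \bigr| \Bigr) \\
&= m_G(B)\, 2^{-m} \Bigl( 2 \operatorname{var}\bigl( h_G^{-1} P_G^n(\mathbf{1}_C) \bigr) - \bigl| \bigl( h_G^{-1} P_G^n(\mathbf{1}_C) \bigr)(0) - \bigl( h_G^{-1} P_G^n(\mathbf{1}_C) \bigr)(1) \bigr| \Bigr) \\
&\le 2^{-m} \log 2 \; m_G(B)\, \lambda(C).
\end{aligned}
\]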

Here, the final inequality can be seen as follows: Directly from the definition of $P_G$, we have that $P_G^n(\mathbf{1}_C)$ is equal to the derivative of the inverse branch of $G^n$ that maps the unit interval onto the Gauss cylinder $C$. With $p_n/q_n := [x_1, \ldots, x_n]$, this derivative is given for $y \in [0, 1]$ by $1/(q_n + y q_{n-1})^2$ (the proof of this fact is left to Exercise 5.7.3). Thus we obtain the formula

\[ \bigl( h_G^{-1} P_G^n(\mathbf{1}_C) \bigr)(y) = \frac{\log 2 \, (1 + y)}{(q_n + y q_{n-1})^2}. \]

To shorten the notation, let us set $f_n := h_G^{-1} P_G^n(\mathbf{1}_C)$. To obtain an upper bound on the variation of this function, we first take the derivative:

\[ f_n'(y) = \frac{\log 2}{(q_n + y q_{n-1})^2} \left( 1 - \frac{2 q_{n-1} (1 + y)}{q_n + y q_{n-1}} \right), \]

and then observe that if $x_n = 1$, this derivative is always negative (so the function is monotonically decreasing), if $x_n \ge 3$, then the derivative is always positive (so the function is monotonically increasing), and if $x_n = 2$ we have that the function has a maximum at the point $y = q_n/q_{n-1} - 2 \in (0, 1)$. Therefore, in the cases $x_n = 1$ and $x_n \ge 3$, we have immediately that

\[
\begin{aligned}
\operatorname{var}(f_n) = |f_n(0) - f_n(1)|
&= \left| \frac{\log 2}{q_n^2} - \frac{2 \log 2}{(q_n + q_{n-1})^2} \right| \\
&\le \frac{\log 2}{q_n^2 + q_n q_{n-1}} \left| \frac{(q_{n-1}/q_n)^2 + 2 q_{n-1}/q_n - 1}{1 + q_{n-1}/q_n} \right| \\
&\le \log 2 \; \lambda(C) \, \max \left\{ \left| \frac{x^2 + 2x - 1}{1 + x} \right| : x \in [0, 1] \right\} \\
&= \log 2 \; \lambda(C).
\end{aligned}
\]

If $x_n = 2$, or, equivalently, if $q_n/q_{n-1} \in [2, 3]$, a similar calculation shows that

\[
\begin{aligned}
\operatorname{var}(f_n) &\le 2 f_n\bigl( q_n/q_{n-1} - 2 \bigr) - f_n(0) - f_n(1) \\
&\le \log 2 \; \lambda(C) \, \max \left\{ \left| \frac{x^4 - 4x^3 + 3x^2 + 2x + 2}{2x(x + 1)(x - 1)} \right| : x \in [2, 3] \right\} \\
&= \frac{1}{6} \log 2 \; \lambda(C).
\end{aligned}
\]

The case in which $C := [0, 1]$ and $n = 0$ is more straightforward and is left to Exercise 5.7.4.
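
The variation bounds used in the last step of the proof can be tested on explicit cylinders. The sketch below assumes the standard recursion $q_k = x_k q_{k-1} + q_{k-2}$ with $q_{-1} = 0$, $q_0 = 1$ and the cylinder length $\lambda(C) = 1/(q_n (q_n + q_{n-1}))$; the function names are hypothetical.

```python
import itertools
import math

def convergent_denominators(digits):
    """Return (q_n, q_{n-1}) for [x_1, ..., x_n] via q_k = x_k q_{k-1} + q_{k-2}."""
    q_prev, q = 0, 1  # q_{-1} = 0, q_0 = 1
    for x in digits:
        q_prev, q = q, x * q + q_prev
    return q, q_prev

def f_n(y, q, q_prev):
    """f_n(y) = log 2 * (1 + y) / (q_n + y q_{n-1})^2."""
    return math.log(2) * (1 + y) / (q + y * q_prev) ** 2

def variation(digits):
    """Variation of f_n on [0, 1], following the case distinction in the proof."""
    q, q_prev = convergent_denominators(digits)
    f0, f1 = f_n(0, q, q_prev), f_n(1, q, q_prev)
    if digits[-1] == 2:
        y_star = q / q_prev - 2  # location of the maximum when x_n = 2
        return 2 * f_n(y_star, q, q_prev) - f0 - f1
    return abs(f0 - f1)  # monotone cases x_n = 1 and x_n >= 3

def cylinder_length(digits):
    """Lebesgue measure of C(x_1, ..., x_n)."""
    q, q_prev = convergent_denominators(digits)
    return 1 / (q * (q + q_prev))

# Ratio var(f_n) / (log 2 * lambda(C)) for all cylinders of level 2, digits 1..3.
for digits in itertools.product(range(1, 4), repeat=2):
    ratio = variation(digits) / (math.log(2) * cylinder_length(digits))
    print(digits, round(ratio, 4))
```

All printed ratios should lie in $[0, 1]$, and those with last digit $2$ should not exceed $1/6$, in line with the two displayed estimates.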

Corollary 5.2.6. For every set $B \in \mathcal{B}$, we have that

\[ \lim_{n \to \infty} \lambda\bigl( G^{-n}(B) \bigr) = m_G(B). \]

Proof. This is an immediate consequence of Lemma 5.2.5 with $C = [0, 1]$.

Theorem 5.2.7. The system $([0, 1], \mathcal{B}, m_G, G)$ is ψ-mixing with respect to the partition $\alpha_H$. More precisely, for all positive integers $m, n \in \mathbb{N}$, any Gauss cylinder set $C := C(x_1, \ldots, x_n)$ of level $n$ and every set $B \in \mathcal{B}$, we have that

\[ \left| m_G\bigl( G^{-n-m} B \cap C \bigr) - m_G(B)\, m_G(C) \right| \le 2^{-m} \log 2 \; m_G(B)\, m_G(C). \]

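Proof. Let $B$ and $C$ be given as stated in the theorem. For $y := (y_1, \ldots, y_N) \in \mathbb{N}^N$, we use the notation $[yC]$ to denote the Gauss cylinder set $C(y_1, \ldots, y_N, x_1, \ldots, x_n)$ of level $n + N$. With this notation, Lemma 5.2.5 implies that for each $N \in \mathbb{N}$, we have

\[
\begin{aligned}
\left| \lambda\bigl( G^{-N}( G^{-n-m} B \cap C ) \bigr) - m_G(B)\, \lambda\bigl( G^{-N} C \bigr) \right|
&= \left| \sum_{y \in \mathbb{N}^N} \Bigl( \lambda\bigl( [yC] \cap G^{-n-m-N} B \bigr) - m_G(B)\, \lambda([yC]) \Bigr) \right| \\
&\le 2^{-m} \log 2 \; m_G(B) \sum_{y \in \mathbb{N}^N} \lambda([yC]) \\
&= 2^{-m} \log 2 \; m_G(B)\, \lambda\bigl( G^{-N}(C) \bigr).
\end{aligned}
\]

Letting $N$ tend to infinity in the above inequality and invoking Corollary 5.2.6 finishes the proof of the theorem.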

Remark 5.2.8. The history of ψ-mixing for the Gauss map is a rather long one, and proving ψ-mixing for the Gauss map can be considered to be the first problem in the metric theory of continued fractions. It originates in a letter which Gauss wrote to Laplace on the 30th of January 1812, asking him to give an estimate of the error term

\[ \rho_n(x) := \left| \lambda\bigl( G^{-n}([0, x]) \bigr) - m_G([0, x]) \right|, \quad \text{for } x \in [0, 1] \text{ and } n \in \mathbb{N}. \]

This problem, known as the generalised Gauss problem¹, remained open for a long time, until R. O. Kuzmin [Kuz28] and P. Lévy [Lev29] independently and almost simultaneously gave a solution. Kuzmin showed that $\rho_n \le c \kappa^{\sqrt{n}}$, for some positive constants $\kappa < 1$ and $c$, whereas Lévy obtained the result that $\rho_n \le c \theta^n$, for positive constants $\theta < 0.68\ldots$ and $c$. These results of Kuzmin and Lévy were then followed by various improvements by several authors, among them W. Doeblin, F. Schweiger and P. Szüsz. The currently most satisfying estimate has been obtained by E. Wirsing [Wir74], who showed that the constant $\theta$ in Lévy's estimate is equal to $0.30366300289873265860\ldots$.

¹ Note that the actual Gauss problem was to show what we have obtained in Corollary 5.2.6.

5.3 Pointwise dual ergodicity for the Farey map

First let us recall from Chapter 2 the induced map $F_{C_1}$ of the Farey map on the interval $C_1 := [1/2, 1]$, which was defined in Example 2.4.30. This map acts on points $[1, x_2, x_3, \ldots]$ in $C_1$ by

\[ F_{C_1}([1, x_2, x_3, \ldots]) = [1, x_3, x_4, \ldots]. \]

We showed in the proof of Proposition 2.4.31 that this induced system is measure-
theoretically isomorphic to the Gauss system. Given that G is ψ-mixing, as was shown
in the previous section (see Theorem 5.2.7), it follows immediately that the map FC1 is
also ψ-mixing. Before stating the first result, recall that pointwise dual ergodicity and
Darling–Kac sets were introduced in Definitions 4.2.1 and 4.3.1, respectively.

Lemma 5.3.1. The set C1 is a Darling–Kac set for the Farey map.

Proof. This follows directly from Proposition 4.3.3, in combination with the discussion
above.

Theorem 5.3.2. The system ([0, 1], B , νF , F) is pointwise dual ergodic.

Proof. Since, according to Lemma 5.3.1, the map F has a Darling–Kac set, the result
follows directly from Theorem 4.3.5.

Now that we know that $F$ is pointwise dual ergodic, we can obtain information about wandering rates and return sequences. Let us denote the return sequence associated to $F$ by $(v_n)_{n \ge 1}$. In order to determine the asymptotic type of $(v_n)$, we must compute the wandering rate $(w_n(C_1))_{n \ge 1}$ (see Definition 4.2.3), where we recall that the wandering rate $(w_n(C))$ for any $C \in \mathcal{B}$ is given by

\[ w_n(C) := \nu_F \left( \bigcup_{k=1}^{n} F^{-(k-1)}(C) \right). \]

So, for each $n \in \mathbb{N}$ we have

\[ w_n(C_1) = \nu_F \left( \bigcup_{k=1}^{n} F^{-(k-1)}(C_1) \right) = \nu_F\bigl( \mathbf{1}_{[1/(n+1), 1]} \bigr) = \log(n + 1) \sim \log(n). \]

Next, observe that this wandering rate is slowly varying at infinity, where we recall from Chapter 3 that this means

\[ \lim_{n \to \infty} \frac{w_{k \cdot n}(C_1)}{w_n(C_1)} = 1, \quad \text{for each } k \in \mathbb{N}. \]


Lemma 5.3.3. For the map $F$, the return sequence $(v_n)$ can be defined by setting

\[ v_n := \frac{n}{\log(n)}. \]

Proof. This follows on combining Proposition 4.2.8 with the fact that F is pointwise
dual ergodic.
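
A short numerical illustration of the two sequences just introduced may be helpful. It is only a sketch: it assumes, as the computation of $w_n(C_1)$ above indicates, that $\nu_F$ has density $1/x$, so that $\nu_F([1/(n+1), 1]) = \log(n+1)$; the function names are hypothetical.

```python
import math

def wandering_rate(n):
    """w_n(C_1) = nu_F([1/(n+1), 1]) = integral of dx/x over that interval."""
    return math.log(n + 1)

def return_sequence(n):
    """v_n = n / log(n) as in Lemma 5.3.3 (meaningful for n >= 2)."""
    return n / math.log(n)

# Slow variation: w_{2n}/w_n tends to 1, but only at a logarithmic rate.
for n in (10, 10**3, 10**6, 10**12):
    ratio = wandering_rate(2 * n) / wandering_rate(n)
    print(n, round(ratio, 4), round(return_sequence(n), 1))
```

The ratios approach $1$ very slowly, which is typical for logarithmic wandering rates and is the reason why the asymptotics later in this chapter are only visible for large $n$.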

5.4 Uniform and uniformly returning sets

In this section, it will turn out to be helpful to have a formula for the transfer operator $\hat{F} := \hat{F}_{\nu_F}$, as we had for $\hat{G}$ in Section 5.2. Exactly as before, we have that for $f \in L_1(\nu_F)$,

\[ \hat{F}(f) = h_F^{-1} P_F(h_F f). \]

It is then straightforward to check that this leads to the pointwise definition

\[ \hat{F}(f)(x) = \frac{1}{1+x}\, f\!\left( \frac{x}{1+x} \right) + \frac{x}{1+x}\, f\!\left( \frac{1}{1+x} \right). \]
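
The pointwise formula translates directly into code. The following minimal sketch (function names are hypothetical) implements $\hat{F}$ as a map from functions to functions and checks three elementary consequences exactly, using rational arithmetic: $\hat{F}$ fixes constants, $\hat{F}(\mathrm{id})(x) = 2x/(1+x)^2$, and $\hat{F} f(1) = f(1/2)$.

```python
from fractions import Fraction

def farey_transfer(f):
    """Return the function F^ f defined by the pointwise formula."""
    def image(x):
        return f(x / (1 + x)) / (1 + x) + x * f(1 / (1 + x)) / (1 + x)
    return image

def iterate(f, n):
    """n-fold application of F^; evaluating the result costs 2**n calls of f."""
    for _ in range(n):
        f = farey_transfer(f)
    return f

x = Fraction(1, 3)
print(farey_transfer(lambda t: 1)(x))             # constants are fixed
print(farey_transfer(lambda t: t)(x))             # 2x/(1+x)^2 at x = 1/3
print(farey_transfer(lambda t: t * t)(Fraction(1)))  # equals f(1/2) for f(t) = t^2
```

Because every evaluation of $\hat{F}^n f$ branches into two evaluations of $\hat{F}^{n-1} f$, this direct recursion is only practical for moderate $n$.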

Now, we aim to obtain uniform sets for the system $([0, 1], \mathcal{B}, \nu_F, F)$. First recall the general definition of a uniform set stated in Definition 4.2.4: Let $(X, \mathcal{B}, \mu, T)$ be pointwise dual ergodic with return sequence $(r_n)$. Then, a set $A \in \mathcal{B}$ with positive, finite measure is called a uniform set for $f \in L_1^+(\mu)$ if

\[ \frac{1}{r_n} \sum_{k=0}^{n-1} \hat{T}^k f \to \int f \, d\mu \quad \text{uniformly (mod } \mu \text{) on } A. \]

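We introduce a set of functions $\mathcal{D}$, in order to show that the set $C_1$ is uniform for all of the elements of $\mathcal{D}$. So, define

\[ \mathcal{D} := \{ f \in L_1^+(\nu_F) \cap C^2([0, 1]) : f' > 0 \text{ and } f'' \le 0 \}. \]

We remark that $\mathcal{D}$ is not empty, since $\mathrm{id}_{[0,1]} \in \mathcal{D}$.

Lemma 5.4.1. We have $\hat{F}(\mathcal{D}) \subset \mathcal{D}$.

Proof. Let $f \in \mathcal{D}$ and consider the derivative of $\hat{F}(f)$: By the monotonicity of $f$ and $f'$ we have

\[ \bigl( \hat{F}(f) \bigr)'(x) = \underbrace{\frac{f'\!\left( \frac{x}{x+1} \right) - x f'\!\left( \frac{1}{x+1} \right)}{(x+1)^3}}_{> 0} + \underbrace{\frac{f\!\left( \frac{1}{x+1} \right) - f\!\left( \frac{x}{x+1} \right)}{(x+1)^2}}_{> 0}, \]

which implies $\bigl( \hat{F}(f) \bigr)' > 0$. Furthermore, differentiating once more and collecting terms shows

\[
\begin{aligned}
\bigl( \hat{F}(f) \bigr)''(x) ={}& \frac{f''\!\left( \frac{x}{x+1} \right) + x f''\!\left( \frac{1}{x+1} \right)}{(x+1)^5} + \frac{2 \left( f\!\left( \frac{x}{x+1} \right) - f\!\left( \frac{1}{x+1} \right) \right)}{(x+1)^3} \\
&+ \frac{2 (x+1)(x-1) f'\!\left( \frac{1}{x+1} \right) - 4 (x+1) f'\!\left( \frac{x}{x+1} \right)}{(x+1)^5} \le 0,
\end{aligned}
\]

since $f'' \le 0$, $f' > 0$, $x \le 1$ and $f(x/(x+1)) \le f(1/(x+1))$ for $x \in [0, 1]$. This finishes the proof.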
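
As a worked instance of Lemma 5.4.1, consider $f = \mathrm{id}_{[0,1]}$. The computation below is a sketch obtained directly from the pointwise formula for $\hat{F}$; it is meant as a sanity check rather than as part of the proof.

```latex
\[
\hat{F}(\mathrm{id})(x) = \frac{1}{1+x} \cdot \frac{x}{1+x} + \frac{x}{1+x} \cdot \frac{1}{1+x} = \frac{2x}{(1+x)^2},
\]
\[
\bigl( \hat{F}(\mathrm{id}) \bigr)'(x) = \frac{2(1-x)}{(1+x)^3} \ge 0,
\qquad
\bigl( \hat{F}(\mathrm{id}) \bigr)''(x) = \frac{4x - 8}{(1+x)^4} < 0
\quad \text{for } x \in [0, 1].
\]
```

So $\hat{F}(\mathrm{id})$ is again increasing and concave on $[0, 1]$. Note that the first derivative vanishes at the endpoint $x = 1$, so the positivity is strict only on $[0, 1)$.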

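Lemma 5.4.2. The set $C_1$ is uniform for every $f \in \mathcal{D}$.

Proof. Fix $f \in \mathcal{D}$. Note that $\hat{F} f(1) = f(1/2)$. Lemma 5.4.1 implies that $x \mapsto \hat{F}^n f(x)$ is monotone increasing. Thus, recalling from Lemma 5.3.3 that $v_n = n/\log(n)$, we have for every $x \in C_1$

\[
\begin{aligned}
\frac{1}{v_n} \sum_{k=0}^{n} \hat{F}^k f(1/2) \le \frac{1}{v_n} \sum_{k=0}^{n} \hat{F}^k f(x) &\le \frac{1}{v_n} \sum_{k=0}^{n} \hat{F}^k f(1) \\
&\le \frac{1}{v_n} \sum_{k=0}^{n+1} \hat{F}^k f(1) \le \frac{1}{v_n} f(1) + \frac{1}{v_n} \sum_{k=0}^{n} \hat{F}^k f(1/2).
\end{aligned}
\]

Since $\lim_{n \to \infty} v_n^{-1} f(1) = 0$, the uniform convergence $v_n^{-1} \sum_{k=0}^{n} \hat{F}^k f \to \int f \, d\nu_F$ follows from pointwise dual ergodicity.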
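
The sandwich argument in this proof can be observed numerically for $f = \mathrm{id}$. The sketch below evaluates $\hat{F}^k f$ by direct recursion on the pointwise formula for $\hat{F}$ (cost $2^k$ per evaluation, so only small $n$ are feasible); the helper names are hypothetical, and the convergence to $\int f \, d\nu_F$ is far too slow to be visible at these values of $n$.

```python
import math

def farey_iterate(f, k, x):
    """(F^^k f)(x) by recursion on the pointwise formula; cost 2**k."""
    if k == 0:
        return f(x)
    s = 1.0 + x
    return (farey_iterate(f, k - 1, x / s)
            + x * farey_iterate(f, k - 1, 1.0 / s)) / s

def normalised_sum(f, n, x):
    """(1/v_n) * sum_{k=0}^{n} (F^^k f)(x) with v_n = n / log(n)."""
    v_n = n / math.log(n)
    return sum(farey_iterate(f, k, x) for k in range(n + 1)) / v_n

identity = lambda x: x

for n in (4, 8, 12):
    lo, mid, hi = (normalised_sum(identity, n, x) for x in (0.5, 0.75, 1.0))
    gap_bound = identity(1.0) / (n / math.log(n))  # f(1) / v_n
    print(n, round(lo, 4), round(mid, 4), round(hi, 4), round(gap_bound, 4))
```

For each $n$ the three values are ordered, and the spread between the values at $x = 1/2$ and $x = 1$ stays below $f(1)/v_n$, which is the quantity that forces uniformity on $C_1$.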
We will now introduce a somewhat stronger notion than uniform sets, namely, that of
uniformly returning sets.

Definition 5.4.3. Let $(X, \mathcal{B}, \mu, T)$ be a conservative, ergodic, measure-preserving system. A set $C \in \mathcal{B}$ with $0 < \mu(C) < \infty$ is called uniformly returning for $f \in L_1^+(\mu)$ if there exists an increasing sequence $(w_n) := (w_n(f, C))_{n \ge 1}$ of positive real numbers such that $\mu$-almost everywhere and uniformly on $C$ we have that

\[ \lim_{n \to \infty} w_n \hat{T}^n(f) = \int f \, d\mu. \]
n→∞

Remark 5.4.4. If $A, B$ are sets of finite $\mu$-measure and $B$ is uniformly returning for $\mathbf{1}_A$, we immediately obtain a form of mixing for infinite systems, in the sense that

\[ w_n \, \mu\bigl( A \cap T^{-n}(B) \bigr) = \int w_n \hat{T}^n(\mathbf{1}_A) \cdot \mathbf{1}_B \, d\mu \to \mu(A)\, \mu(B). \]

Note that this could be considered to be a little unsatisfactory as a definition, since it is basically considering an infinite system by restricting it to a set of finite measure. The work of Lenci, mentioned directly before Definition 2.5.8, addresses this issue by looking at the problem of defining infinite mixing on the entire space from a more physical point of view.

Let us now return to the Farey map.

Proposition 5.4.5. For the Farey system $([0, 1], \mathcal{B}, \nu_F, F)$ we have that if $v \in L_1(\nu_F)$ satisfies

\[ \lim_{n \to \infty} w_n \hat{F}^n(v) = \int v \, d\nu_F \quad \text{almost everywhere uniformly on } C_1, \]

then the same holds on any compact subset of $(0, 1]$.

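Proof. Let us first recall that, for $x \in (0, 1]$ and $n \in \mathbb{N}$,

\[
\begin{aligned}
\bigl( P_F^{n+1}(h_F \cdot v) \bigr)(x) &= \bigl( P_F ( P_F^n (h_F \cdot v) ) \bigr)(x) \\
&= \bigl( P_F^n(h_F \cdot v) \bigr)(F_0(x)) \cdot |F_0'(x)| + \bigl( P_F^n(h_F \cdot v) \bigr)(F_1(x)) \cdot |F_1'(x)|,
\end{aligned}
\]

which gives

\[ \bigl( P_F^n(h_F \cdot v) \bigr)(F_0(x)) = \frac{\bigl( P_F^{n+1}(h_F \cdot v) \bigr)(x) - \bigl( P_F^n(h_F \cdot v) \bigr)(F_1(x)) \cdot |F_1'(x)|}{|F_0'(x)|}. \tag{5.1} \]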

We proceed by induction as follows. The start of the induction is given by the assumption in the proposition. For the inductive step, assume that the statement holds for $\bigcup_{i=1}^{k} C_i$, for some $k \in \mathbb{N}$. Then consider some arbitrary $y \in C_{k+1}$, and let $x$ denote the unique element in $A_k$ such that $F_0(x) = y$. Using (5.1), the fact that $\hat{F}(v) = h_F^{-1} P_F(h_F \cdot v)$ and the inductive hypothesis in tandem with the assumption that $\lim w_n/w_{n+1} = 1$, we obtain that

\[
\begin{aligned}
w_n \bigl( \hat{F}^n(v) \bigr)(y) &= w_n \bigl( \hat{F}^n(v) \bigr)(F_0(x)) = \frac{w_n \bigl( P_F^n(h_F \cdot v) \bigr)(F_0(x))}{h_F(F_0(x))} \\
&= \frac{w_n \bigl( P_F^{n+1}(h_F \cdot v) \bigr)(x) - |F_1'(x)| \cdot w_n \bigl( P_F^n(h_F \cdot v) \bigr)(F_1(x))}{h_F(F_0(x)) \cdot |F_0'(x)|} \\
&\sim \frac{h_F(x) - h_F(F_1(x)) \cdot |F_1'(x)|}{h_F(F_0(x)) \cdot |F_0'(x)|} \int v \, d\nu_F = \int v \, d\nu_F,
\end{aligned}
\]

where the last equality is a consequence of the eigenequation $P_F h_F = h_F$.

[Figure: graphs of $\log(n+2)\, \hat{F}^n \varphi(x)$ for $n = 1, 2, 4, 9$, together with $\varphi(x)$, plotted against $x \in [0, 1]$.]

Fig. 5.1. The iterates of the Farey transfer operator $\hat{F}^n$ acting on $\varphi : x \mapsto x$ and rescaled with the wandering rate approximate a.s. uniformly on compact subsets of $(0, 1]$ the constant function of height $1 = \int \varphi \, d\mu$.

Let us now return to our collection D. To prove that C1 is indeed uniformly returning
for every element of D we need the following observation.

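Lemma 5.4.6. On $C_1$, for every $f \in \mathcal{D}$ we have that

\[ \hat{F}^n f < \hat{F}^{n-1} f, \quad \text{for all } n \in \mathbb{N}. \]

Proof. Fix $f \in \mathcal{D}$. Again we use the fact that $\hat{F} f(1) = f(1/2)$. Then for every $x \in C_1$ we have

\[
\begin{aligned}
\hat{F}^n f(x) &\le \max \bigl\{ \hat{F}^n f(x) : x \in C_1 \bigr\} = \hat{F}^n f(1) = \hat{F}^{n-1} f(1/2) \\
&= \min \bigl\{ \hat{F}^{n-1} f(x) : x \in C_1 \bigr\} \le \hat{F}^{n-1} f(x),
\end{aligned}
\]

where at least one of the inequalities must be strict.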

Proposition 5.4.7. The set $C_1$ is uniformly returning for every $f \in \mathcal{D}$. That is, for all $f \in \mathcal{D}$ we have

\[ \log(n)\, \hat{F}^n(f) \to \int f \, d\nu_F, \quad \text{uniformly on } C_1. \]

(See Fig. 5.1 for an illustration.)

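Proof. Let $\lambda, \eta \in \mathbb{R}$ be arbitrary fixed real numbers with $0 < \lambda < \eta < \infty$, and let $\lfloor \cdot \rfloor$ denote the integer part. Putting

\[ V_n := \sum_{k=0}^{n} \hat{F}^k(f), \]

we have by the monotonicity of the sequence $\bigl( \hat{F}^n(f)|_{C_1} \bigr)_{n \in \mathbb{N}}$ that

\[ \frac{\hat{F}^{\lfloor n\eta \rfloor}(f)}{V_n} \cdot \bigl( \lfloor n\eta \rfloor - \lfloor n\lambda \rfloor \bigr) \le \frac{V_{\lfloor n\eta \rfloor} - V_{\lfloor n\lambda \rfloor}}{V_n} \le \frac{\hat{F}^{\lfloor n\lambda \rfloor}(f)}{V_n} \cdot \bigl( \lfloor n\eta \rfloor - \lfloor n\lambda \rfloor \bigr). \]

Since $\lfloor n\eta \rfloor - \lfloor n\lambda \rfloor \sim n(\eta - \lambda)$ as $n \to \infty$, we have for fixed $\varepsilon \in (0, 1)$ and all $n$ sufficiently large

\[ n(1 - \varepsilon)(\eta - \lambda) \le \lfloor n\eta \rfloor - \lfloor n\lambda \rfloor \le n(1 + \varepsilon)(\eta - \lambda). \]

This implies for all $n$ sufficiently large

\[ \frac{n \hat{F}^{\lfloor n\eta \rfloor}(f)}{V_n} \cdot (1 - \varepsilon)(\eta - \lambda) \le \frac{V_{\lfloor n\eta \rfloor} - V_{\lfloor n\lambda \rfloor}}{V_n} \le \frac{n \hat{F}^{\lfloor n\lambda \rfloor}(f)}{V_n} \cdot (1 + \varepsilon)(\eta - \lambda). \]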
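Here and below, $\alpha$ denotes the index of regular variation of $(V_n)$ and $\lfloor \cdot \rfloor$ the integer part; by Lemma 5.4.2 we have $V_n \sim v_n \int f \, d\nu_F$ with $v_n = n/\log(n)$, so that $\alpha = 1$ in the present setting. Since

\[ \frac{V_{\lfloor n\eta \rfloor} - V_{\lfloor n\lambda \rfloor}}{V_n} \to \eta^\alpha - \lambda^\alpha \quad \text{as } n \to \infty, \; \nu_F\text{-a.e. uniformly on } C_1, \]

we obtain on the one hand

\[ \frac{1}{1 + \varepsilon} \cdot \frac{\eta^\alpha - \lambda^\alpha}{\eta - \lambda} \le \liminf_{n \to \infty} \frac{n \hat{F}^{\lfloor n\lambda \rfloor}(f)}{V_n} \quad \nu_F\text{-a.e. uniformly on } C_1. \]

Letting $\eta \to \lambda$ and $\varepsilon \to 0$, it follows that

\[ \alpha \lambda^{\alpha - 1} \le \liminf_{n \to \infty} \frac{n \hat{F}^{\lfloor n\lambda \rfloor}(f)}{V_n} \quad \nu_F\text{-a.e. uniformly on } C_1. \]

On the other hand, we obtain similarly

\[ \limsup_{n \to \infty} \frac{n \hat{F}^{\lfloor n\eta \rfloor}(f)}{V_n} \le \alpha \eta^{\alpha - 1} \quad \nu_F\text{-a.e. uniformly on } C_1. \]

Since $\lambda$ and $\eta$ are arbitrary, we have for any $c > 0$

\[ \frac{n \hat{F}^{\lfloor nc \rfloor}(f)}{V_n} \to \alpha c^{\alpha - 1} \quad \nu_F\text{-a.e. uniformly on } C_1. \]

Finally, using $V_{\lfloor nc \rfloor} \sim c^\alpha V_n$ $\nu_F$-a.e. uniformly on $C_1$ and $\lfloor nc \rfloor \sim cn$, we obtain for $m = \lfloor nc \rfloor$

\[ \frac{m \hat{F}^m(f)}{V_m} = \frac{\lfloor nc \rfloor}{n} \cdot \frac{n \hat{F}^{\lfloor nc \rfloor}(f)}{V_n} \cdot \frac{V_n}{V_{\lfloor nc \rfloor}} \to \alpha \quad \nu_F\text{-a.e. uniformly on } C_1. \]

From this and Lemma 5.4.2 the assertion follows.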
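
Both the monotonicity of Lemma 5.4.6 and the rescaled iterates shown in Fig. 5.1 can be reproduced for small $n$. The sketch below uses the normalisation $\log(n+2)$ from the figure and the pointwise formula for $\hat{F}$; the helper name is hypothetical, and because the convergence is logarithmic the printed values are still visibly away from the limit $\int \varphi \, d\nu_F = 1$.

```python
import math

def farey_iterate(f, n, x):
    """(F^^n f)(x) by recursion on the pointwise formula; cost 2**n."""
    if n == 0:
        return f(x)
    s = 1.0 + x
    return (farey_iterate(f, n - 1, x / s)
            + x * farey_iterate(f, n - 1, 1.0 / s)) / s

phi = lambda x: x
points = (0.5, 0.75, 1.0)  # sample points in C_1 = [1/2, 1]

# Rescaled iterates log(n + 2) * F^^n(phi), as plotted in Fig. 5.1.
for n in (1, 2, 4, 9, 16):
    row = [math.log(n + 2) * farey_iterate(phi, n, x) for x in points]
    print(n, [round(v, 3) for v in row])

# Lemma 5.4.6: on C_1 the iterates decrease strictly in n.
decreasing = all(
    farey_iterate(phi, n, x) < farey_iterate(phi, n - 1, x)
    for n in range(1, 11) for x in points
)
print(decreasing)
```

The spread of each printed row over $C_1$ shrinks as $n$ grows, which is the uniformity asserted in Proposition 5.4.7.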

Remark 5.4.8. The material developed here for the set of functions D can be found
in greater generality in the context of operator renewal theory in various works,
including [Gou11, Gou04, MT15, MT12, Sar02, KKSS15, KKS15].

5.5 Finer asymptotics of Lebesgue measure of sum-level sets

Using the results obtained in the previous sections, we are now in a position to state
and prove our first result on the finer asymptotics of the sum-level sets.

Theorem 5.5.1.

\[ \sum_{k=1}^{n} \lambda(C_k) \sim \frac{n}{\log_2 n}. \]

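Proof. First, recall from Lemma 5.1.1 that $C_k = F^{-(k-1)}(C_1)$. Therefore,

\[
\begin{aligned}
\frac{1}{v_n} \cdot \sum_{k=1}^{n} \lambda(C_k) &= \frac{1}{v_n} \int \sum_{k=0}^{n-1} \mathbf{1}_{C_1} \circ F^k \, d\lambda \\
&= \frac{1}{v_n} \int \sum_{k=0}^{n-1} \mathbf{1}_{C_1} \circ F^k \cdot h_F^{-1} \, d\nu_F \\
&= \int \mathbf{1}_{C_1} \left( \frac{1}{v_n} \sum_{k=0}^{n-1} \hat{F}^k (h_F^{-1}) \right) d\nu_F.
\end{aligned}
\]

Since $C_1$ is a uniform set for $h_F^{-1} \in \mathcal{D}$ we have that $\lim_{n \to \infty} v_n^{-1} \sum_{k=0}^{n-1} \hat{F}^k (h_F^{-1}) = \int h_F^{-1} \, d\nu_F = 1$ a.e. uniformly on $C_1$, and so

\[ \lim_{n \to \infty} \frac{1}{v_n} \cdot \sum_{k=1}^{n} \lambda(C_k) = \nu_F(C_1) = \log 2. \]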

Our second, and final, theorem concerning the Lebesgue measure of the sum-level
sets gives a significant improvement of Theorem 5.1.4 and Theorem 5.5.1. That is, by
increasing the dosage of infinite ergodic theory, we are able to obtain the following
sharp estimate for the asymptotic behaviour of the Lebesgue measure of the sum-level
sets.

Theorem 5.5.2.

\[ \lambda(C_n) \sim \frac{\log 2}{\log n}. \]

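Proof. We use again the fact from Lemma 5.1.1 that $C_k = F^{-(k-1)}(C_1)$. Therefore, with $w_n := \log(n)$ as in Proposition 5.4.7,

\[
\begin{aligned}
w_n \cdot \lambda(C_n) &= w_n \int \mathbf{1}_{C_1} \circ F^{n-1} \, d\lambda \\
&= w_n \int \mathbf{1}_{C_1} \circ F^{n-1} \cdot h_F^{-1} \, d\nu_F \\
&= \int \mathbf{1}_{C_1} \, w_n \hat{F}^{n-1} (h_F^{-1}) \, d\nu_F.
\end{aligned}
\]

Since $C_1$ is uniformly returning for $h_F^{-1}$ and $w_n \sim w_{n-1}$, we have that

\[ \lim_{n \to \infty} w_n \hat{F}^{n-1} (h_F^{-1}) = \int h_F^{-1} \, d\nu_F = 1 \]

a.e. uniformly on $C_1$, and so

\[ \lim_{n \to \infty} w_n \cdot \lambda(C_n) = \nu_F(C_1) = \log 2. \]
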

Remark 5.5.3. We refer the interested reader to [Hee15] for an effective bound on the
error term of this asymptotic.
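
For small $n$ the measures $\lambda(C_n)$ can be computed exactly, which gives a concrete feeling for Theorems 5.5.1 and 5.5.2. The sketch below relies on the identity $C_n = F^{-(n-1)}(C_1)$ and on the description of the inverse branches as Möbius maps $z \mapsto (az+b)/(cz+d)$ with $|ad - bc| = 1$: the image of $[1/2, 1]$ under such a map has length $1/((c+d)(c+2d))$. The function names and the bookkeeping by bottom rows $(c, d)$ are choices made here, not taken from the text.

```python
from fractions import Fraction
import math

def bottom_rows(n):
    """Bottom rows (c, d) of all words of length n - 1 in F_0, F_1.

    Appending F_0: x -> x/(1+x) sends (c, d) to (c + d, d);
    appending F_1: x -> 1/(1+x) sends (c, d) to (d, c + d).
    """
    rows = [(0, 1)]  # the identity map
    for _ in range(n - 1):
        rows = [r for (c, d) in rows for r in ((c + d, d), (d, c + d))]
    return rows

def sum_level_measure(n, exact=True):
    """Lebesgue measure of C_n, exactly (Fraction) or in floating point."""
    rows = bottom_rows(n)
    if exact:
        return sum(Fraction(1, (c + d) * (c + 2 * d)) for c, d in rows)
    return math.fsum(1.0 / ((c + d) * (c + 2 * d)) for c, d in rows)

print([sum_level_measure(n) for n in range(1, 5)])

# Theorem 5.5.2 predicts log(n) * lambda(C_n) / log(2) -> 1 (slowly).
for n in (2, 6, 10, 14, 18):
    print(n, math.log(n) * sum_level_measure(n, exact=False) / math.log(2))
```

The number of words doubles with each level, so this brute-force enumeration is limited to roughly $n \le 20$; it is meant as an illustration of the slow logarithmic decay, not as a practical way to reach the asymptotic regime.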

Let us now give an application of the above theorem to elementary metrical Diophantine analysis. We first state the following result, which is a consequence of Theorem 1.2.19, given in Section 1.2.4.

Theorem 5.5.4. For $\lambda$-almost every $x = [x_1, x_2, x_3, \ldots] \in [0, 1]$, we have that

\[ \limsup_{n \to \infty} \frac{\log(x_n/n)}{\log \log n} = 1. \]

Proof. In light of Corollary 1.2.20, for $\lambda$-a.e. $x = [x_1, x_2, x_3, \ldots] \in [0, 1]$ we have that for all $\varepsilon > 0$ and all sufficiently large $n \in \mathbb{N}$,

\[ x_n < n (\log n)^{1 + \varepsilon}. \]

Taking logarithms of each side, it follows that for all sufficiently large $n \in \mathbb{N}$,

\[ \log(x_n) < \log(n) + (1 + \varepsilon) \log \log(n) \]

or, rewriting this expression, again for all sufficiently large $n \in \mathbb{N}$, we have

\[ \frac{\log(x_n/n)}{\log \log(n)} < 1 + \varepsilon. \]

Hence,

\[ \limsup_{n \to \infty} \frac{\log(x_n/n)}{\log \log(n)} \le 1 + \varepsilon. \]

Since $\varepsilon > 0$ was arbitrary, we may conclude that

\[ \limsup_{n \to \infty} \frac{\log(x_n/n)}{\log \log(n)} \le 1. \]

On the other hand, it also follows from Corollary 1.2.20 that for Lebesgue-almost every $x = [x_1, x_2, x_3, \ldots] \in [0, 1]$, we have that for infinitely many $n \in \mathbb{N}$,

\[ x_n > n \log(n). \]

Then,

\[ 1 \le \limsup_{n \to \infty} \frac{\log(x_n/n)}{\log \log(n)} \]

and the theorem is proved.


In contrast to the almost-everywhere property of continued fraction digits stated in Theorem 5.5.4, we can now prove a similar statement associated to the Farey coding using Theorem 5.5.2. Note that $\sum_{i=1}^{n} x_i$ represents the word length associated with the Farey system, whereas the parameter $n$ represents the word length associated with the Gauss system. Before stating the proposition, let us remind the reader that we use the notation $A \asymp B$ to mean that there exists a constant $c \ge 1$ such that $c^{-1} A \le B \le cA$.

Proposition 5.5.5. For $\lambda$-almost every $x = [x_1, x_2, x_3, \ldots] \in [0, 1]$, we have that

\[ \limsup_{n \to \infty} \frac{\log\bigl( x_{n+1} / \sum_{i=1}^{n} x_i \bigr)}{\log \log\bigl( \sum_{i=1}^{n} x_i \bigr)} \le 0. \]

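Proof. For each $n \in \mathbb{N}$ and $\varepsilon > 0$, let

\[ \mathcal{A}_n^\varepsilon := \bigcup_{k \in \mathbb{N}} \left\{ C(x_1, \ldots, x_{k+1}) : \sum_{i=1}^{k} x_i = n, \; x_{k+1} \ge n (\log n)^\varepsilon \right\} \]

and define

\[ A_n^\varepsilon := \bigcup_{C \in \mathcal{A}_n^\varepsilon} C. \]

Now recall from (1.8) that

\[ \lambda(C(x_1, \ldots, x_k, m)) \asymp m^{-2} \lambda(C(x_1, \ldots, x_k)). \]

Therefore, for all $k, \ell \in \mathbb{N}$, we obtain that

\[ \sum_{x_{k+1} \ge \ell} \lambda(C(x_1, \ldots, x_k, x_{k+1})) \asymp \ell^{-1} \lambda(C(x_1, \ldots, x_k)). \]

Using this estimate and Theorem 5.5.2, we deduce that

\[
\begin{aligned}
\lambda(A_n^\varepsilon) &= \sum_{k=1}^{n} \; \sum_{\substack{(x_1, \ldots, x_k) \\ \sum_{i=1}^{k} x_i = n}} \; \sum_{x_{k+1} \ge n (\log n)^\varepsilon} \lambda(C(x_1, \ldots, x_k, x_{k+1})) \\
&\asymp \sum_{k=1}^{n} \; \sum_{\substack{(x_1, \ldots, x_k) \\ \sum_{i=1}^{k} x_i = n}} \frac{\lambda(C(x_1, \ldots, x_k))}{n (\log n)^\varepsilon} \\
&= \frac{1}{n (\log n)^\varepsilon} \sum_{k=1}^{n} \; \sum_{\substack{(x_1, \ldots, x_k) \\ \sum_{i=1}^{k} x_i = n}} \lambda(C(x_1, \ldots, x_k)) \\
&= \frac{\lambda(C_n)}{n (\log n)^\varepsilon} \sim \frac{\log 2}{n (\log n)^{1 + \varepsilon}}.
\end{aligned}
\]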

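Hence, since the above calculation implies that the series $\sum_{n=1}^{\infty} \lambda(A_n^\varepsilon)$ converges, a straightforward application of the Borel–Cantelli Lemma then yields, where $A_\infty^\varepsilon := \limsup_{n \to \infty} A_n^\varepsilon$, that

\[ \lambda(A_\infty^\varepsilon) = 0, \quad \text{for each } \varepsilon > 0. \]

On considering the complement of the set $A_\infty^\varepsilon$ in $[0, 1]$, we have now shown that, for each $\varepsilon > 0$ and for $\lambda$-almost all $x = [x_1, x_2, x_3, \ldots]$,

\[ x_{k+1} < \left( \sum_{i=1}^{k} x_i \right) \left( \log \sum_{i=1}^{k} x_i \right)^{\varepsilon}, \quad \text{for all } k \in \mathbb{N} \text{ sufficiently large.} \]

By taking logarithms on both sides of the above inequality, we obtain, for all sufficiently large $k \in \mathbb{N}$, that

\[ \frac{\log(x_{k+1}) - \log\bigl( \sum_{i=1}^{k} x_i \bigr)}{\log \log\bigl( \sum_{i=1}^{k} x_i \bigr)} = \frac{\log\bigl( x_{k+1} / \sum_{i=1}^{k} x_i \bigr)}{\log \log\bigl( \sum_{i=1}^{k} x_i \bigr)} < \varepsilon. \]

It therefore follows that

\[ \limsup_{k \to \infty} \frac{\log\bigl( x_{k+1} / \sum_{i=1}^{k} x_i \bigr)}{\log \log\bigl( \sum_{i=1}^{k} x_i \bigr)} \le \varepsilon. \]

Finally, on letting $\varepsilon$ tend to zero, the proposition follows.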

Remark 5.5.6. Using these ideas, there are other results related to continued fractions and Diophantine analysis that can be obtained. For instance, for the random variable

\[ X_n(x) := \max \left\{ \sum_{i=1}^{k} x_i : \sum_{i=1}^{k} x_i \le n, \; k \in \mathbb{N}_0 \right\}, \quad x \in I, \]

the process $n - X_n$ is investigated in [KS08a]. In that paper, a uniform law and large deviation law are derived.
For further interesting results in the context of continued fraction digit sums we
also refer to [GLJ93, GLJ96], wherein alternating sums of continued fraction digits are
considered.

5.6 Uniform distribution of the even Stern–Brocot sequence

Let us begin this section by recalling the concept of weak convergence of probability measures. We say a sequence $(\mu_n)_{n \in \mathbb{N}}$ of Borel probability measures on $(\mathbb{R}, \mathcal{B})$ converges weakly to a Borel probability measure $\mu$ if for all $f \in C_b(\mathbb{R})$ we have

\[ \lim_{n \to \infty} \int f \, d\mu_n = \int f \, d\mu. \]

For this we write $\text{w-}\lim_n \mu_n = \mu$. Since the definition is difficult to use directly, there are several effective criteria for checking weak convergence. We will need the following two characterisations, which can be found in the standard literature on probability, for example in [Gut13]. We recall that $\Delta_\mu : x \mapsto \mu((-\infty, x])$ denotes the distribution function of $\mu$.
– (Distribution function) $\text{w-}\lim_n \mu_n = \mu$ if and only if $\lim_{n \to \infty} \Delta_{\mu_n}(x) = \Delta_\mu(x)$ for every $x \in \mathbb{R}$ that is a continuity point of $\Delta_\mu$.
– (Method of Moments) For a probability measure $\mu$, the function

\[ t \mapsto \int \exp(t \cdot x) \, d\mu \]

is called the moment generating function (you will see why in Exercise 5.7.6). We have that if $\int \exp(t \cdot x) \, d\mu_n \to \int \exp(t \cdot x) \, d\mu \in \mathbb{R}$ for $n$ tending to infinity and for all $t$ in a neighbourhood of $0$, then $\text{w-}\lim_n \mu_n = \mu$.
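
To see the method of moments at work in the simplest possible situation, consider the sequence of atomic measures $\mu_n := \frac{1}{n} \sum_{k=1}^{n} \delta_{k/n}$ on $[0, 1]$. This sequence is a hypothetical example chosen here for illustration and does not appear in the text; its moment generating functions are Riemann sums for that of Lebesgue measure $\lambda$ on $[0, 1]$, which is $(e^t - 1)/t$.

```python
import math

def mgf_uniform_grid(t, n):
    """Moment generating function of mu_n = (1/n) * sum_{k=1}^{n} delta_{k/n}."""
    return sum(math.exp(t * k / n) for k in range(1, n + 1)) / n

def mgf_lebesgue(t):
    """Moment generating function of Lebesgue measure on [0, 1]."""
    return 1.0 if t == 0 else math.expm1(t) / t

# Convergence for t in a neighbourhood of 0 implies weak convergence to lambda.
for t in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(t, mgf_uniform_grid(t, 10_000), mgf_lebesgue(t))
```

The same strategy, with the atomic measures replaced by rescaled restrictions of $\lambda$, is what the proof of Proposition 5.6.2 carries out.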

Let δ x denote the Dirac measure in x, that is δ x (A) = 1 for x ∈ A and δ x (A) = 0 otherwise.
Our aim in this section is to prove the following theorem:

Theorem 5.6.1. For each rational number $v/w \in (0, 1]$ we have that

\[ \text{w-}\lim_{n \to \infty} \; \log(n^{vw}) \sum_{p/q \in F^{-n}\{v/w\}} q^{-2} \, \delta_{p/q} = \lambda. \tag{5.2} \]

In order to prove this result, we first prove the following proposition, and then one
further lemma, which will allow us to transfer the result stated below for intervals to
the atomic measures considered in Theorem 5.6.1.

Proposition 5.6.2. For each interval $[a, b] \subset (0, 1]$ we have that

\[ \text{w-}\lim_{n \to \infty} \left( \frac{\log n}{\log(b/a)} \cdot \lambda|_{F^{-n}([a, b])} \right) = \lambda. \]

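Proof. Consider the family of functions $(\varphi_t)_{t \in [-1, 1]}$ given by $\varphi_t : x \mapsto x \cdot \exp(t \cdot x)$. The first aim is to show that for all $t \in [-1, 1]$ we have

\[ \hat{F} \varphi_t \in \mathcal{D}. \]

Indeed, for $t \in [-1, 0]$ this is an immediate consequence of $\hat{F}(\mathcal{D}) \subset \mathcal{D}$ (see Lemma 5.4.1) by noting that $\varphi_t$ is increasing, concave with $\varphi_t(0) = 0$, that is $\varphi_t \in \mathcal{D}$. For $t \in (0, 1]$, a straightforward computation shows that the first derivative of $\hat{F} \varphi_t$ at $x \in [0, 1]$ is given by

\[ \bigl( \hat{F} \varphi_t \bigr)'(x) = \frac{\varphi_t'\!\left( \frac{x}{x+1} \right) - x \varphi_t'\!\left( \frac{1}{x+1} \right)}{(x+1)^3} + \frac{\varphi_t\!\left( \frac{1}{x+1} \right) - \varphi_t\!\left( \frac{x}{x+1} \right)}{(x+1)^2}. \]

For the second derivative we then obtain

\[
\begin{aligned}
\bigl( \hat{F} \varphi_t \bigr)''(x) ={}& \frac{\bigl( -2xt - 6x + 2t + xt^2 + 2x^3 - 4tx^2 - 4 \bigr) \exp\!\left( \frac{tx}{x+1} \right)}{(x+1)^6} \\
&+ \frac{\bigl( 2tx - 6x - 2t + xt^2 + 2x^3 + 4tx^2 - 4 \bigr) \exp\!\left( \frac{t}{x+1} \right)}{(x+1)^6}.
\end{aligned}
\]

This immediately implies that $\bigl( \hat{F} \varphi_t \bigr)'' \le 0$, for all $t \in (0, 1]$. Hence, $\hat{F} \varphi_t$ is concave and we have that $\bigl( \hat{F} \varphi_t \bigr)'$ is decreasing on $[0, 1]$. Since $\bigl( \hat{F} \varphi_t \bigr)'(1) = 0$, this shows that on $[0, 1]$ we have that $\bigl( \hat{F} \varphi_t \bigr)' \ge 0$. Hence, $\hat{F} \varphi_t \in \mathcal{D}$, for all $t \in [-1, 1]$.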
We proceed by noting that Proposition 5.4.7 combined with Proposition 5.4.5 guarantees that every compact interval contained in $(0, 1]$ is a uniformly returning set for $\varphi_t$, for each $t \in [-1, 1]$. In order to complete the proof of the proposition, we employ the method of moments as follows. For each $[a, b] \subset (0, 1]$ and for each $t \in [-1, 1]$, we have

\[
\begin{aligned}
\lim_{n \to \infty} \int \exp(tx) \cdot \frac{\log n}{\nu_F([a, b])} \cdot \mathbf{1}_{F^{-n}([a, b])}(x) \, d\lambda(x)
&= \lim_{n \to \infty} \frac{\log n}{\nu_F([a, b])} \cdot \nu_F\bigl( \varphi_t \cdot \mathbf{1}_{F^{-n}([a, b])} \bigr) \\
&= \lim_{n \to \infty} \frac{\log n}{\nu_F([a, b])} \cdot \nu_F\bigl( \hat{F}^n \varphi_t \cdot \mathbf{1}_{[a, b]} \bigr) \\
&= \nu_F(\varphi_t) = \int \exp(tx) \, d\lambda(x).
\end{aligned}
\]

This shows that, restricted to $[-1, 1]$, the moment generating functions for the sequence of measures $\left( \frac{\log n}{\log(b/a)} \cdot \lambda|_{F^{-n}([a, b])} \right)_{n \in \mathbb{N}}$ converge to the moment generating function for the Lebesgue measure $\lambda$. In turn, this shows the weak convergence in question and hence finishes the proof of Proposition 5.6.2.

For the next lemma we introduce the notation $\Phi$ for the free semi-group generated by the inverse branches $F_0$ and $F_1$ of the Farey map $F$. Note that for each rational number $v/w \in (0, 1]$ we have that

\[ \bigl\{ F^{-n}\{v/w\} : n \in \mathbb{N} \bigr\} = \bigl\{ g(v/w) : g \in \Phi \bigr\}. \]

Moreover, note that the $\Phi$-orbit of $1$ is equal to the set of rational numbers contained in $(0, 1)$. (Note that this is just a slightly different way of repeating what we already knew from Section 1.3, when obtaining the Farey coding of the rational numbers.) Then if we associate matrices to the inverse branches $F_0 : x \mapsto \frac{x}{1+x}$ and $F_1 : x \mapsto \frac{1}{1+x}$, and observe that

\[ \left\{ \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} \right\} \subset GL_2(\mathbb{Z}) := \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} : a, b, c, d \in \mathbb{Z}, \; |ad - bc| = 1 \right\}, \]

then to each $g \in \Phi$ we can associate a matrix $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ from $GL_2(\mathbb{Z})$. The action of $g$ on $\mathbb{C}$ is given by $g : z \mapsto \frac{az + b}{cz + d}$. Thus $g(1) = v/w$, for some $v, w \in \mathbb{N}$ such that $v < w$ and $\gcd(v, w) = 1$. Furthermore, for the modulus of the derivative of $g$ at $x$ we have that $|g'(x)| = |cx + d|^{-2}$. To make this more precise, see also Exercise 5.7.5.
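
The correspondence between words in $\Phi$ and integer matrices is easy to experiment with. In the sketch below (all names are hypothetical), matrices are stored as nested tuples, composition of branches is matrix multiplication, and the derivative formula $|g'(z)| = |cz + d|^{-2}$ is checked against the chain rule in exact rational arithmetic.

```python
from fractions import Fraction

F0 = ((1, 0), (1, 1))  # x -> x / (1 + x)
F1 = ((0, 1), (1, 1))  # x -> 1 / (1 + x)

def compose(m1, m2):
    """Matrix product m1 * m2, i.e. the map 'apply m2 first, then m1'."""
    (a, b), (c, d) = m1
    (e, f), (g, h) = m2
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def act(m, z):
    """Moebius action z -> (az + b) / (cz + d)."""
    (a, b), (c, d) = m
    return (a * z + b) / (c * z + d)

def derivative_modulus(m, z):
    """|g'(z)| = |ad - bc| / (cz + d)^2, which is (cz + d)^(-2) here."""
    (a, b), (c, d) = m
    return abs(a * d - b * c) / (c * z + d) ** 2

word = compose(F1, compose(F0, F0))   # the map F_1 o F_0 o F_0
one = Fraction(1)
print(word)                            # ((2, 1), (3, 1))
print(act(word, one))                  # 3/4, so v/w = 3/4
print(derivative_modulus(word, one))   # 1/16 = w^(-2)
```

In this example $g(1) = 3/4$ and $|g'(1)| = 4^{-2}$, matching the relation $|g'(1)| = w^{-2}$ that is used at the end of the proof of Theorem 5.6.1.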
In the following we let Uε (x) denote the interval centred at x ∈ R of Euclidean
diameter diam(Uε (x)) equal to ε > 0.

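Lemma 5.6.3. For each $g \in \Phi$ there exists a constant $C_g$ such that for all $\varepsilon > 0$ sufficiently small and for all $h \in \Phi$, we have

\[ \bigl| \operatorname{diam}(h(U_\varepsilon(g(1)))) - \varepsilon |h'(g(1))| \bigr| \le \varepsilon C_g \operatorname{diam}(h(U_\varepsilon(g(1)))). \]

Proof. First we will prove the following bounded distortion property. Using Exercise 5.7.5, we know that for each $g \in \Phi$ we can find $m, n \in \mathbb{N}$ such that $|g'(z)| = |mz + n|^{-2}$. Now fix $z \in (0, 1)$ and let $0 < \varepsilon < z/2$ and $x, y \in U_\varepsilon(z)$. Then

\[
\begin{aligned}
\sup_{g \in \Phi} \left| \frac{|g'(x)|}{|g'(y)|} - 1 \right|
&\le \sup_{m, n \in \mathbb{N}} \left| \frac{(my + n)^2}{(mx + n)^2} - 1 \right|
\le \sup_{m, n \in \mathbb{N}} \left| \frac{m^2 (y^2 - x^2) + 2nm(y - x)}{(mx + n)^2} \right| \\
&\le \sup_{m, n \in \mathbb{N}} \frac{2m^2 + 2nm}{(mz/2 + n)^2} |y - x|
\le \frac{8}{z^2} |x - y| \sup_{m, n \in \mathbb{N}} \frac{m^2 + mn}{(m + 2n)^2} \\
&\le \frac{16}{z^2} |x - y|.
\end{aligned}
\]

Here, the last inequality can be seen by treating the two cases $m \le n$ and $m > n$ separately. Now, fix $g \in \Phi$. Then we have, for $0 < \varepsilon < g(1)^2/32$ and each $h \in \Phi$,

\[
\begin{aligned}
\left| \frac{\varepsilon |h'(g(1))|}{\operatorname{diam}(h(U_\varepsilon(g(1))))} - 1 \right|
&\le \left| \frac{1}{\frac{1}{\varepsilon} \int_{g(1) - \varepsilon/2}^{g(1) + \varepsilon/2} \left( \frac{|h'(\eta)|}{|h'(g(1))|} - 1 \right) d\eta + 1} - 1 \right| \\
&\le \left| \frac{1}{1 - 16\varepsilon/g(1)^2} - 1 \right| \le \frac{32\varepsilon}{g(1)^2}.
\end{aligned}
\]

From this we deduce that

\[ \bigl| \operatorname{diam}(h(U_\varepsilon(g(1)))) - \varepsilon |h'(g(1))| \bigr| < \operatorname{diam}(h(U_\varepsilon(g(1)))) \, \frac{32\varepsilon}{g(1)^2}. \]

Setting $C_g := 32/g(1)^2$ then finishes the proof.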


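Proof of Theorem 5.6.1. Let $g \in \Phi$ be given and define, for $\varepsilon > 0$ sufficiently small,

\[ U_{g,\varepsilon,n} := F^{-(n-1)}\bigl( U_\varepsilon(g(1)) \bigr). \]

With $u_{g,\varepsilon} := 1/\nu_F(U_\varepsilon(g(1))) = 1/\log\bigl( (g(1) + \varepsilon/2)/(g(1) - \varepsilon/2) \bigr)$, consider the scaled and restricted Lebesgue measure $\nu_{g,\varepsilon,n}$ which is given, for each $n \in \mathbb{N}$, by

\[ \nu_{g,\varepsilon,n} := u_{g,\varepsilon} \log n \cdot \lambda|_{U_{g,\varepsilon,n}}. \]

By Proposition 5.6.2, we then have that $\text{w-}\lim_{n \to \infty} \nu_{g,\varepsilon,n} = \lambda$. Then observe that

\[ \lim_{\varepsilon \searrow 0} \varepsilon u_{g,\varepsilon} = \lim_{\varepsilon \searrow 0} \frac{\varepsilon}{\log \dfrac{g(1) + \varepsilon/2}{g(1) - \varepsilon/2}} = g(1), \tag{5.3} \]

and consider the measures $\rho_{g,n}$ defined, for each $n \in \mathbb{N}$, by

\[ \rho_{g,n} := g(1) \log n \sum_{f(1) \in F^{-(n-1)}\{g(1)\}} \frac{|f'(1)|}{|g'(1)|} \cdot \delta_{f(1)}. \]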
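Using Lemma 5.6.3, we now obtain the following for all $x \in [0, 1]$, where $\Delta^{(\nu)}_{g,\varepsilon,n}$ and $\Delta^{(\rho)}_{g,n}$ denote the distribution functions of the measures $\nu_{g,\varepsilon,n}$ and $\rho_{g,n}$, respectively.

\[
\begin{aligned}
\left| \Delta^{(\nu)}_{g,\varepsilon,n}(x) - \Delta^{(\rho)}_{g,n}(x) \right|
&\le \log n \sum_{\substack{h \in \Phi: \\ hg(1) \in F^{-(n-1)}\{g(1)\}}} \frac{1}{\varepsilon} \Bigl| \varepsilon u_{g,\varepsilon} \operatorname{diam}(h(U_\varepsilon(g(1)))) - \varepsilon g(1) |h'(g(1))| \Bigr| + \frac{\log n}{n^2} \\
&\le \bigl( \varepsilon g(1) C_g + |\varepsilon u_{g,\varepsilon} - g(1)| \bigr) \frac{1}{\varepsilon u_{g,\varepsilon}} \, u_{g,\varepsilon} \log n \sum_{\substack{f \in \Phi: \\ f(1) \in F^{-(n-1)}\{g(1)\}}} |f'(1)| \\
&\xrightarrow{\; n \to \infty \;} \bigl( \varepsilon g(1) C_g + |\varepsilon u_{g,\varepsilon} - g(1)| \bigr) \frac{1}{g(1)},
\end{aligned}
\]

where the convergence follows from Proposition 5.6.2 and (5.3). This inequality holds for all $x \in [0, 1]$, $n \in \mathbb{N}$ and the right-hand side vanishes for $\varepsilon \to 0$. Hence, using the condition for weak convergence in terms of distribution functions, we obtain that

\[ \text{w-}\lim_{n \to \infty} \rho_{g,n} = \lambda. \]

The proof of Theorem 5.6.1 now follows, if we use in the definition of $\rho_{g,n}$ the fact that $g(1)$ can be written in the form of a reduced fraction $v/w$ and that then $|g'(1)| = w^{-2}$, as well as, similarly, that $f(1)$ can be written in the form of a reduced fraction $p/q$ and that then $|f'(1)| = q^{-2}$ (cf. Exercise 5.7.8).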

Finally, let us recall here the even Stern–Brocot sequence which was first defined in Section 1.3.1. For each $n \ge 0$, the $n$-th member of the Stern–Brocot sequence is denoted by $\mathcal{B}_n$ and the $n$-th member of the even Stern–Brocot sequence is defined to be $\mathcal{S}_n := \mathcal{B}_n \setminus \mathcal{B}_{n-1}$. Recalling that $\mathcal{S}_n$ is exactly the set $F^{-n}(\{1/2\})$, we obtain the following immediate corollary.

Corollary 5.6.4. For the even Stern–Brocot sequence we have that

\[ \text{w-}\lim_{n \to \infty} \; \log(n^2) \sum_{p/q \in \mathcal{S}_n} q^{-2} \, \delta_{p/q} = \lambda. \tag{5.4} \]
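
The weighted point measures in Corollary 5.6.4 can be generated explicitly for small $n$. The sketch below builds $F^{-n}(\{1/2\})$ by repeatedly applying the two inverse branches $F_0 : x \mapsto x/(1+x)$ and $F_1 : x \mapsto 1/(1+x)$, and evaluates the distribution function of $\log(n^2) \sum q^{-2} \delta_{p/q}$; the function names are hypothetical. Since the convergence is logarithmic, the values obtained for computationally feasible $n$ are still far from the limiting distribution function $x \mapsto x$.

```python
from fractions import Fraction
import math

def farey_preimages(points):
    """Both preimages F_0(r) = r/(1+r) and F_1(r) = 1/(1+r) of each point."""
    return [g for r in points for g in (r / (1 + r), 1 / (1 + r))]

def even_stern_brocot(n):
    """The set F^{-n}({1/2}) as a list of reduced fractions."""
    level = [Fraction(1, 2)]
    for _ in range(n):
        level = farey_preimages(level)
    return level

def distribution_function(n, x):
    """Mass that log(n^2) * sum q^{-2} delta_{p/q} assigns to [0, x]."""
    return math.log(n ** 2) * sum(
        1.0 / r.denominator ** 2 for r in even_stern_brocot(n) if r <= x
    )

print(sorted(even_stern_brocot(2)))
for n in (4, 8, 12):
    print(n, [round(distribution_function(n, x), 3) for x in (0.25, 0.5, 0.75, 1.0)])
```

`Fraction` keeps every point in lowest terms automatically, so the denominator $q$ read off from each element is the one appearing in (5.4).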

5.7 Exercises

Exercise 5.7.1. Show that any function of bounded variation f can be written as
the difference g − h of two bounded functions g and h, which are either both
non-decreasing or both non-increasing.

Exercise 5.7.2. Show that if $p_i(x) := (1 + x)/((i + x)(i + 1 + x))$ and $f \in L_1(\lambda)$, then

\[ \hat{G} f(x) = \sum_{i=1}^{\infty} p_i(x)\, f\!\left( \frac{1}{i + x} \right). \]

Exercise 5.7.3. Let $n \in \mathbb{N}$ and denote $C := C(x_1, \ldots, x_n)$. Show that $P_G^n(\mathbf{1}_C)$ is equal to the derivative of the inverse branch of $G^n$ that maps the unit interval onto the cylinder set $C$.

Exercise 5.7.4. Prove the case n = 0, C := [0, 1] of Lemma 5.2.5.

Exercise 5.7.5. Let

\[ GL_2(\mathbb{Z}) := \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} : a, b, c, d \in \mathbb{Z}, \; |ad - bc| = 1 \right\} \]

denote the group of invertible $2 \times 2$ matrices over the integers and let $\operatorname{Aut}(\hat{\mathbb{C}})$ denote the group of automorphisms of $\hat{\mathbb{C}}$, that is, the set of bi-holomorphic mappings of $\hat{\mathbb{C}}$. Show that the map

\[ GL_2(\mathbb{Z}) \to \operatorname{Aut}(\hat{\mathbb{C}}), \quad \begin{pmatrix} a & b \\ c & d \end{pmatrix} \mapsto \left( z \mapsto \frac{az + b}{cz + d} \right) \]

defines a group homomorphism and determine its kernel. Further, show that for every element $\varphi : z \mapsto \frac{az + b}{cz + d}$ in the image of this homomorphism we have $|\varphi'(z)| = |cz + d|^{-2}$.

Exercise 5.7.6. The function $t \mapsto \int \exp(t \cdot x) \, d\mu(x)$ is called the moment generating function of the probability distribution $\mu$. Show that the $k$-th derivative of this function at $0$ can be used to find the $k$-th moment $M_k(\mu) := \int x^k \, d\mu(x)$, $k \in \mathbb{N}_0$, of $\mu$, if it exists.

Exercise 5.7.7. Show that for each $g \in \Phi$ there exists a constant $\Delta_g$ such that for all $\varepsilon > 0$ sufficiently small and for all $h \in \Phi$, we have

\[ \bigl| \operatorname{diam}(h(U_\varepsilon(g(1)))) - \varepsilon |h'(g(1))| \bigr| \le \varepsilon^2 |(hg)'(1)| \Delta_g. \]

Exercise 5.7.8. Show that for each $p/q \in (0, 1)$ such that $\gcd(p, q) = 1$, there exists a unique element $f \in \Phi$ with $f(1) = p/q$, and we have $|f'(1)| = q^{-2}$.

Exercise 5.7.9. Prove that for all $a, b \in (0, 1)$ with $a < b$ we have

\[ \lim_{n \to \infty} \log(n)\, \lambda\bigl( F^{-n}([a, b]) \bigr) = \log(b/a). \]

Bibliography
[Aar81] J. Aaronson. The asymptotic distributional behaviour of transformations preserving
infinite measures. J. Anal. Math., 39:203–234, 1981.
[Aar97] J. Aaronson. An introduction to infinite ergodic theory, volume 50 of Mathematical
Surveys and Monographs. American Mathematical Society, Providence, RI, 1997.
[Adl98] R. L. Adler. Symbolic dynamics and Markov partitions. Bull. Amer. Math. Soc. (N.S.),
35(1):1–56, 1998.
[BBDK96] J. Barrionuevo, R. M. Burton, K. Dajani, and C. Kraaikamp. Ergodic properties of
generalized Lüroth series. Acta Arith., 74(4):311–327, 1996.
[Ber12a] F. Bernstein. Über eine Anwendung der Mengenlehre auf ein aus der Theorie der
säkularen Störungen herrührendes Problem. Math. Ann., 71:417–439, 1912.
[Ber12b] F. Bernstein. Über geometrische Wahrscheinlichkeit und über das Axiom der
beschränkten Arithmetisierbarkeit der Beobachtungen. Math. Ann., 72(4):585–587, 1912.
[Bir31] G. D. Birkhoff. Proof of the ergodic theorem. Proc. Natl. Acad. Sci. USA, 17:656–660,
1931.
[Bor09] E. Borel. Les probabilités denombrables et leurs applications arithmétiques. Rend. Circ.
Mat. Palermo, 27:247–271, 1909.
[Bro61] A. Brocot. Calcul des rouages par approximation, nouvelle méthode. Revue
chronométrique, 3:186–194, 1861.
[CO60] R. V. Chacon and D. S. Ornstein. A general ergodic theorem. Illinois J. Math., 4:153–160,
1960.
[Coh80] D. L. Cohn. Measure theory. Birkhäuser, Boston, Mass., 1980.
[Den38] A. Denjoy. Sur une fonction réelle de Minkowski. J. Math. Pures Appl. (9), 17:105–151,
1938.
[DK96] K. Dajani and C. Kraaikamp. On approximation by Lüroth series. J. Théor. Nombres
Bordeaux, 8(2):331–346, 1996.
[DK02] K. Dajani and C. Kraaikamp. Ergodic theory of numbers, volume 29 of Carus
Mathematical Monographs. Mathematical Association of America, Washington, DC,
2002.
[DS88] N. Dunford and J. T. Schwartz. Linear operators. Part I. Wiley Classics Library. John Wiley
& Sons, Inc., New York, 1988. General theory, With the assistance of William G. Bade and
Robert G. Bartle, Reprint of the 1958 original, A Wiley-Interscience Publication.
[Dud89] R. M. Dudley. Real analysis and probability. The Wadsworth & Brooks/Cole Mathematics
Series. Wadsworth & Brooks/Cole Advanced Books & Software, Pacific Grove, CA, 1989.
[DV86] H. G. Diamond and J. D. Vaaler. Estimates for partial sums of continued fraction partial
quotients. Pacific J. Math., 122(1):73–82, 1986.
[EFP49] P. Erdős, W. Feller, and H. Pollard. A property of power series with positive coefficients.
Bull. Amer. Math. Soc., 55:201–204, 1949.
[Eri70] K. B. Erickson. Strong renewal theorems with infinite mean. Trans. Amer. Math. Soc.,
151:263–291, 1970.
[EW11] M. Einsiedler and T. Ward. Ergodic theory with a view towards number theory, volume 259
of Graduate Texts in Mathematics. Springer-Verlag London, Ltd., London, 2011.
[Fal14] K. Falconer. Fractal geometry. John Wiley & Sons, Ltd., Chichester, third edition, 2014.
Mathematical foundations and applications.
[Far16] J. Farey. On a curious property of vulgar fractions. Phil. Mag. Ser. 1, 47(217):385–386,
1816.
[Fel68a] W. Feller. An introduction to probability theory and its applications. Vol. I. Third edition.
John Wiley & Sons, Inc., New York-London-Sydney, 1968.
[Fel68b] W. Feller. An introduction to probability theory and its applications. Vol. II. Third edition.
John Wiley & Sons, Inc., New York-London-Sydney, 1968.
[FK10] J. Fiala and P. Kleban. Intervals between Farey fractions in the limit of infinite level. Ann.
Sci. Math. Québec, 34(1):63–71, 2010.
[Gal72] J. Galambos. Some remarks on the Lüroth expansion. Czechoslovak Math. J.,
22(97):266–271, 1972.
[Gal73] J. Galambos. The largest coefficient in continued fractions and related problems. In
Diophantine approximation and its applications (Proc. Conf., Washington, D.C., 1972),
pages 101–109. Academic Press, New York, 1973.
[Gam01] T. W. Gamelin. Complex analysis. Undergraduate Texts in Mathematics. Springer-Verlag,
New York, 2001.
[Gan01] C. Ganatsiou. On some properties of the Lüroth-type alternating series representations
for real numbers. Int. J. Math. Math. Sci., 28(6):367–373, 2001.
[GL63] A. Garsia and J. Lamperti. A discrete renewal theorem with infinite mean. Comment.
Math. Helv., 37:221–234, 1962/1963.
[GLJ93] Y. Guivarc’h and Y. Le Jan. Asymptotic winding of the geodesic flow on modular surfaces
and continued fractions. Ann. Sci. École Norm. Sup. (4), 26(1):23–50, 1993.
[GLJ96] Y. Guivarc’h and Y. Le Jan. Note rectificative: “Asymptotic winding of the geodesic flow on
modular surfaces and continued fractions” (Ann. Sci. École Norm. Sup. (4) 26(1):23–50,
1993; MR1209912 (94a:58157)). Ann. Sci. École Norm. Sup. (4), 29(6):811–814, 1996.
[GM88] M. C. Gutzwiller and B. B. Mandelbrot. Invariant multifractal measures in chaotic
Hamiltonian systems, and related structures. Phys. Rev. Lett., 60(8):673–676, 1988.
[Gou04] S. Gouëzel. Sharp polynomial estimates for the decay of correlations. Israel J. Math.,
139:29–65, 2004.
[Gou11] S. Gouëzel. Correlation asymptotics from large deviations in dynamical systems with
infinite measure. Colloq. Math., 125(2):193–212, 2011.
[Gut11] S. B. Guthery. A motif of mathematics. Docent Press, Boston, MA, 2011. History and
application of the mediant and the Farey sequence.
[Gut13] A. Gut. Probability: A Graduate Course. Springer Texts in Statistics. Springer, New York,
second edition, 2013.
[Hal56] P. R. Halmos. Lectures on Ergodic Theory. Publications of the Mathematical Society of
Japan, no. 3. The Mathematical Society of Japan, 1956.
[Hee15] B. Heersink. An effective estimate for the Lebesgue measure of preimages of iterates of
the Farey map. Adv. Math., 291:621–634, 2016.
[Hei87] L. Heinrich. Rates of convergence in stable limit theorems for sums of exponentially
ψ-mixing random variables with an application to metric theory of continued fractions.
Math. Nachr., 131:149–165, 1987.
[Hen00] D. Hensley. The statistics of the continued fraction digit sum. Pacific J. Math.,
192(1):103–120, 2000.
[HW08] G. H. Hardy and E. M. Wright. An introduction to the theory of numbers. Oxford University
Press, Oxford, sixth edition, 2008. Revised by D. R. Heath-Brown and J. H. Silverman,
With a foreword by Andrew Wiles.
[Ios92] M. Iosifescu. A very simple proof of a generalization of the Gauss-Kuzmin-Lévy theorem
on continued fractions, and related questions. Rev. Roumaine Math. Pures Appl.,
37(10):901–914, 1992.
[Iso11] S. Isola. From infinite ergodic theory to number theory (and possibly back). Chaos,
Solitons and Fractals, 44(7):467–479, 2011.
[Jar29] V. Jarník. Zur metrischen Theorie der diophantischen Approximationen. Przyczynek do
metrycznej teorji przyblizeń diofantowych. Prace Mat.-Fiz., 36:91–106, 1929.
[JKS13] J. Jaerisch, M. Kesseböhmer, and B. O. Stratmann. A Fréchet law and an Erdős-Philipp law
for maximal cuspidal windings. Ergodic Theory Dynam. Syst., 33(4):1008–1028, 2013.
[Kak43] S. Kakutani. Induced measure preserving transformations. Proc. Imp. Acad. Tokyo,
19:635–641, 1943.
[Khi35] A. Khintchine. Metrische Kettenbruchprobleme. Compositio Math., 1:361–382, 1935.
[Khi64] A. Ya. Khinchin. Continued fractions. The University of Chicago Press, Chicago,
Ill.-London, 1964.
[KKK91] S. Kalpazidou, A. Knopfmacher, and J. Knopfmacher. Metric properties of alternating
Lüroth series. Portugal. Math., 48(3):319–325, 1991.
[KKS15] J. Kautzsch, M. Kesseböhmer, and T. Samuel. On the convergence to equilibrium of
unbounded observables under a family of intermittent interval maps. Ann. Henri
Poincaré, 17(9):2585–2621, 2016.
[KKSS15] J. Kautzsch, M. Kesseböhmer, T. Samuel, and B. O. Stratmann. On the asymptotics of the
α-Farey transfer operator. Nonlinearity, 28(1):143–166, 2015.
[KMS12] M. Kesseböhmer, S. Munday, and B. O. Stratmann. Strong renewal theorems and
Lyapunov spectra for α-Farey and α-Lüroth systems. Ergodic Theory Dynam. Syst.,
32(3):989–1017, 2012.
[Koo31] B. O. Koopman. Hamiltonian systems and transformation in Hilbert space. Proc. Natl.
Acad. Sci. USA, 17(5):315–318, 1931.
[KS07] M. Kesseböhmer and B. O. Stratmann. A multifractal analysis for Stern–Brocot intervals,
continued fractions and Diophantine growth rates. J. Reine Angew. Math., 605:133–163,
2007.
[KS08a] M. Kesseböhmer and M. Slassi. Large deviation asymptotics for continued fraction
expansions. Stoch. Dyn., 8(1):103–113, 2008.
[KS08b] M. Kesseböhmer and B. O. Stratmann. Fractal analysis for sets of non-differentiability of
Minkowski’s question mark function. J. Number Theory, 128(9):2663–2686, 2008.
[KS12a] M. Kesseböhmer and B. O. Stratmann. A dichotomy between uniform distributions of the
Stern–Brocot and the Farey sequence. Unif. Distrib. Theory, 7(2):21–33, 2012.
[KS12b] M. Kesseböhmer and B. O. Stratmann. On the asymptotic behaviour of the Lebesgue
measure of sum-level sets for continued fractions. Discrete Contin. Dyn. Syst.,
32(7):2437–2451, 2012.
[Kuz28] R. O. Kuz’min. Sur un problème de Gauss. Atti Congr. Intern. Bologne, 6:83–89, 1928.
[Len12] M. Lenci. Infinite-volume mixing for dynamical systems preserving an infinite measure.
Procedia IUTAM, 5:204–219, 2012. IUTAM Symposium on 50 Years of Chaos: Applied and
Theoretical.
[Len13] M. Lenci. Exactness, K-property and infinite mixing. Publ. Mat. Urug., 14:159–170, 2013.
[Len14] M. Lenci. Uniformly expanding Markov maps of the real line: exactness and infinite
mixing. preprint: arXiv:1404.2212, 2014.
[Lev29] P. Lévy. Sur les lois de probabilité dont dépendent les quotients complets et incomplets
d’une fraction continue. Bull. Soc. Math. France, 57:178–194, 1929.
[Lév52] P. Lévy. Fractions continues aléatoires. Rend. Circ. Mat. Palermo (2), 1:170–208, 1952.
[Lin71] M. Lin. Mixing for Markov operators. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete,
19:231–242, 1971.
[LM95] D. Lind and B. Marcus. An introduction to symbolic dynamics and coding. Cambridge
University Press, Cambridge, 1995.
[Lür83] J. Lüroth. Ueber eine eindeutige Entwickelung von Zahlen in eine unendliche Reihe. Math.
Ann., 21(3):411–423, 1883.
[Mar92] G. Markowsky. Misconceptions about the golden ratio. Coll. Math. J., 23(1):2–19, 1992.
[Min10] H. Minkowski. Geometrie der Zahlen. In 2 Lieferungen. II. (Schluß-) Lieferung. Leipzig:
B. G. Teubner. VIII + S. 241–256 (1910), 1910.
[MN13] T. Miernowski and A. Nogueira. Exactness of the Euclidean algorithm and of the Rauzy
induction on the space of interval exchange transformations. Ergodic Theory Dynam.
Syst., 33(1):221–246, 2013.
[MT12] I. Melbourne and D. Terhesiu. Operator renewal theory and mixing rates for dynamical
systems with infinite measure. Invent. Math., 189(1):61–110, 2012.
[MT15] I. Melbourne and D. Terhesiu. Erratum to: Operator renewal theory and mixing rates for
dynamical systems with infinite measure. Invent. Math., 202(3):1269–1272, 2015.
[MU03] R. D. Mauldin and M. Urbański. Graph directed Markov systems, volume 148 of
Cambridge Tracts in Mathematics. Cambridge University Press, Cambridge, 2003.
Geometry and dynamics of limit sets.
[Mun11] S. Munday. Finite and infinite ergodic theory for linear and conformal dynamical systems.
PhD thesis, University of St. Andrews, 2011.
[Mun14] S. Munday. On the derivative of the α-Farey-Minkowski function. Discrete Contin. Dyn.
Syst., 34(2):709–732, 2014.
[Neu32] J. von Neumann. Proof of the quasi-ergodic hypothesis. Proc. Natl. Acad. Sci. USA,
18:70–82, 1932.
[Par81] W. Parry. Topics in ergodic theory, volume 75 of Cambridge Tracts in Mathematics.
Cambridge University Press, Cambridge–New York, 1981.
[Phi88] W. Philipp. Limit theorems for sums of partial quotients of continued fractions. Monatsh.
Math., 105(3):195–206, 1988.
[Phi76] W. Philipp. A conjecture of Erdős on continued fractions. Acta Arith., 28(4):379–386,
1975/76.
[Roh48] V. Rohlin. A “general” measure-preserving transformation is not mixing. Doklady Akad.
Nauk SSSR (N.S.), 60:349–351, 1948.
[Rok64] V. A. Rokhlin. Exact endomorphisms of a Lebesgue space. Transl., Ser. 2, Am. Math. Soc.,
39:1–36, 1964.
[RS92] A. M. Rockett and P. Szüsz. Continued fractions. World Scientific Publishing Co., Inc.,
River Edge, NJ, 1992.
[Rud87] W. Rudin. Real and complex analysis. McGraw-Hill Book Co., New York, third edition,
1987.
[Rud91] W. Rudin. Functional analysis. International Series in Pure and Applied Mathematics.
McGraw-Hill, Inc., New York, second edition, 1991.
[Sal43] R. Salem. On some singular monotonic functions which are strictly increasing. Trans.
Amer. Math. Soc., 53:427–439, 1943.
[Šal68] T. Šalát. Zur metrischen Theorie der Lürothschen Entwicklungen der reellen Zahlen.
Czechoslovak Math. J., 18(93):489–522, 1968.
[Sar02] O. Sarig. Subexponential decay of correlations. Invent. Math., 150(3):629–653, 2002.
[Sch95] F. Schweiger. Ergodic theory of fibred systems and metric number theory. Oxford Science
Publications. The Clarendon Press, Oxford University Press, New York, 1995.
[Sen76] E. Seneta. Regularly varying functions, volume 508 of Lecture Notes in Mathematics.
Springer-Verlag, Berlin–New York, 1976.
[Ste58] M. A. Stern. Ueber eine zahlentheoretische Funktion. J. Reine Angew. Math., 55:193–220,
1858.
[SW07] L.-M. Shen and J. Wu. On the error-sum function of Lüroth series. J. Math. Anal. Appl.,
329(2):1440–1445, 2007.
[Wal82] P. Walters. An introduction to ergodic theory, volume 79 of Graduate Texts in
Mathematics. Springer-Verlag, New York–Berlin, 1982.
[Wir74] E. Wirsing. On the theorem of Gauss–Kusmin–Lévy and a Frobenius-type theorem for
function spaces. Acta Arith., 24:507–528, 1973/74. Collection of articles dedicated to Carl
Ludwig Siegel on the occasion of his seventy-fifth birthday, V.
[WX11] S. Wang and J. Xu. On the Lebesgue measure of sum-level sets for Lüroth expansion.
J. Math. Anal. Appl., 374(1):197–200, 2011.
[Zwe04] R. Zweimüller. Hopf’s ratio ergodic theorem by inducing. Colloq. Math., 101(2):289–292,
2004.
Index
absolutely continuous 75
admissible word 18
α-Farey decomposition 48
alphabet 17
α-Farey cylinder set 48
α-Farey expansion 47
α-Farey inverse branch 45
α-Farey map 44
α-irrational number 42
α-Lüroth convergent 44
α-Lüroth cylinder set 44
α-Lüroth expansion 42
α-Lüroth map 40
α-rational number 42
alternating Lüroth map 61
approximants 7
badly approximable numbers 11
badly α-approximable numbers 54
Banach space 137
Borel σ-algebra 65, 93
bounded variation 162
conditional expectation 143
conjugacy map 13
conservative operator 141
conservative part 72, 141
conservative transformation 69
continued fraction 1
– convergent 2
– elements of the 1
– expansion 1, 5, 29
– finite 1
– infinite 1
– sum-level set 122
continued-fraction mixing 161
convergent
– α-Lüroth 44
– continued fraction 2
cover 73
cylinder set
– α-Farey 48
– α-Lüroth 44
– Farey 33
– Gauss 19
– symbolic dynamic 17
Darling–Kac set 151
decomposition
– Farey 33
– Hopf 72
– Hopf’s (for operators) 141
Diophantine approximation 7
dissipative part 72, 141
distribution function 36, 177
Doeblin, Wolfgang 166
dual operator 138
dual space 77, 138
dyadic partition 43
dynamical system
– conservative 69
– conservative measure-preserving 69
– ergodic 88
– measure-preserving 64
– measure-theoretic 64
– non-singular 70
– number-theoretic 1
– topological 1, 12
empty word 17
equivalent 75
ergodic 88
ergodic theorem
– Birkhoff’s pointwise 96, 146
– Chacon–Ornstein 144
– Hopf’s 104, 146
– Hurewicz’s 146
eventually periodic 16
exact 115
exact transformation 94
expansion
– α-Farey 47
– α-Lüroth 42
– continued fraction 1, 5, 29
– Lüroth 59
factor 14
factor map 14
Farey coding 32
Farey decomposition 33
Farey map 30
Farey sequence 58
first passage time 86
fixed point 13
function
– distribution 36
– distribution 37, 177
– generating 61, 131, 133
– Hölder continuous 38
– induced 113
– Minkowski’s question-mark 35, 38, 49
– moment generating 177, 182
– regularly varying 130
– singular 36
– slowly varying 51, 129, 130
Gauss map 15, 19
Gauss measure 67
Gauss, Carl Friedrich 166
generating function 61, 131, 133
golden mean 8
– shift 19
golden ratio 8
Hölder continuous function 38
Hölder exponent 38
harmonic partition 42
Hopf decomposition 72
– for operators 141
Hurwitz constant 10
Hurwitz number 8
incidence matrix 18
induced function 113
induced map 105, 146
initial block 17
intersection property 115
invariant
– T- 64
– V*- 158
invariant measure 64
inverse branch
– α-Farey 45
– α-Lüroth 41
– Farey 31
– Gauss 15
join 27
jump transformation 31, 46, 86
Kac’s formula 111, 114
Koopman operator 77
Kuzmin, Rodion Ossijewitsch 166
Lüroth expansion 59
Lüroth map
– alternating 43, 61
Lüroth series 43
Lévy, Paul 166
Laplace, Pierre-Simon 166
Laplace Transform 136
Lemma
– Borel–Cantelli 23
– Chacon–Ornstein 139
– Hopf’s maximal inequality 140
– Wiener’s maximal inequality 140
length 17
letters 17
lim-inf set 122
lim-sup set 23
Lin’s Criterion 118
Lüroth map 60
map
– α-Lüroth 40
– alternating Lüroth 43, 61
– conjugacy 13
– factor 14
– Farey 30
– Gauss 15, 19
– induced 105
– Lüroth 60
– shift 18
– tent 35
Markov partition 27, 40, 47
– coding 28
– shrinking 27
Markov partition coding 28
measurable union 73
measure
– Dirac 177
– Gauss 67
– invariant 64
– restricted 105
measure of maximal entropy 37
measure-preserving 64
mediant 33
Minkowski’s question-mark function 35, 38, 49
mod μ 70
moment generating function 177, 182
Moments
– Method of 177
noble numbers 10
non-singular 94
n-th Farey sequence 58
number-theoretic dynamical systems 1
occupation time 103
operator
– bounded 137
– conservative 141
– contracting 138
– dual 138
– Koopman 77
– positive 138
– Ruelle 81
– transfer 76
operator norm 137
orbit 12
partition
– dyadic 43
– expanding 52
– expansive 52
– finite type 52
– harmonic 42
– infinite type 52
– Markov 27, 40, 47
– shrinking Markov 27
period 13
– prime 13
periodic
– eventually 16
periodic point 13
Pigeonhole Principle 61
point
– (pre-)periodic 13
– exceptional 27
pointwise dual ergodic 147
(pre-)periodic point 13
prime period 13
ψ-mixing 151, 161
quadratic surd 16
Radon–Nikodým derivative 76
refinement 27
regularly varying 130, 148
renewal equation 124
– asymptotic 149
renewal pair 124
renewal shift 34
renewal theorem 124
return sequence 147
return time 105
Ruelle operator 81
saturate 73
Schweiger, Fritz 166
shift map 18
singular function 36
slippery Devil’s staircase 36
slowly varying function 51, 129, 130
stable distribution 100
Stern–Brocot intervals 33, 160
Stern–Brocot sequence 33, 58
– even 34, 181
Stern–Brocot tree 34, 82
sub-Hölder continuous 49
sub-shift 18
sub-shifts 18
sum-level set 159
– α-Lüroth expansion 123
– α 133
– continued fraction 122
sweep-out set 73, 88, 105
symbolic dynamics 16
symbols 17
symmetric difference of two sets 70
system
– induced 105
Szüsz, Peter 166
tail σ-algebra 94
tent map 35
Theorem
– Aaronson’s 104
– Birkhoff’s Pointwise Ergodic 96, 146
– Bogolyubov–Krylov 85
– Borel–Bernstein 24
– Dirichlet’s Approximation 61
– Discrete Renewal 126
– Hahn–Banach 144
– Halmos’s Recurrence 70
– Hopf’s Ratio Ergodic 104
– Hurewicz’s ergodic 146
– Hurwitz’s 8, 9
– Karamata’s Tauberian 131
– Lagrange’s 16
– Maharam’s Recurrence 74
– Poincaré’s Recurrence 74
– Radon–Nikodým 76
– Strong Renewal 130
topological dynamical system 1
topologically conjugate 13
transfer operator 76
transformation
– ergodic 88
– exact 94
– jump 47, 86
– measure-preserving 64
– non-singular 70
– Schweiger’s jump 31
trivial σ-algebra 158
uniform set 148, 168
uniformly returning set 169
V*-invariant set 158
wandering rate 147, 167
wandering sets 69
weak convergence 177
wedge 17
Wirsing, Eduard 167
words 17
