A Treatise On Information Geometry: Chris Goddard, The University of Melbourne
Preface
Acknowledgements
1 Characteristic Classes
1.1 Characteristic Classes as pullbacks
1.2 Standard Classes and their Construction
1.2.1 The Stiefel Whitney Classes
1.2.2 The Euler Class
1.2.3 Chern Classes
1.2.4 Pontrjagin Classes
1.3 Generalisations
1.4 Generalised Invariants and application to Exotic Differentiable Structures
1.4.1 Definition
1.4.2 Geometric Interpretation of Generalised Invariants
2 Pseudo-Riemannian Geometry
2.1 Overview
2.2 The fundamental theorem of Riemannian geometry
2.2.1 The metric tensor
2.2.2 The euclidean covariant derivative
2.2.3 Axiomatic extension to arbitrary metrics
2.2.4 Uniqueness of the Levi-Civita connection
2.3 Geodesics and the geodesic equation
2.3.1 Definition and variational formulation
2.3.2 The cut and conjugate loci
2.4 Curvature
2.4.1 The curvature tensor and its symmetries
2.4.2 Interpretation of the Ricci and Scalar curvature
2.4.3 Some basic comparison results
2.5 Stokes' Theorem
2.6 Kähler geometry and Convexity Theory
2.6.1 The notion of convexity in determining stability
2.6.2 Kähler manifolds
2.6.3 Relation to minimal surface theory
2.6.4 Further thoughts
3.3.4 Lipschitz and BV Functions
3.3.5 Jacobians and the Area Formula
3.3.6 The Co-Area Formula
3.4 Key Constructions
3.4.1 Tangent Cones
3.4.2 Rectifiable Sets and a Structure Theorem
3.4.3 Currents
3.4.4 The Rectifiability Theorem
3.5 The Compactness Theorem for Integral Currents
3.5.1 Two important theorems from analysis
3.5.2 Remarks on Radon Measures
3.5.3 Mollification
3.5.4 Proof of The Theorem, and Existence of (Weak) Solutions to the Plateau Problem
3.6 Varifolds
3.6.1 Introduction
3.6.2 The Constancy Theorem
3.6.3 The Approximation and Deformation Theorems
3.6.4 The Boundary Rectifiability Theorem
3.7 A couple of applications
3.7.1 The Frankel-Lawson Result
3.7.2 Jon Pitts’ Construction
5 Geometric Evolution Equations
5.1 The Heat Flow Equation as the Prototypical Example
5.1.1 Maximum Principle
5.1.2 Comparison Principle
5.1.3 Averaging Property
5.1.4 Smoothing
5.2 The Curve Shortening Flow (CSF)
5.2.1 Introduction
5.2.2 The Shrinking Circle
5.2.3 Geometric Invariance
5.2.4 Avoidance Principle
5.2.5 Maximum Principle
5.2.6 Preserving Embeddedness
5.2.7 Finite Time Singularity
5.2.8 CSF as a steepest descent flow for length
5.2.9 Existence
5.2.10 Evolution of Curvature
5.2.11 Bounds on Curvature for CSF
5.2.12 An Isoperimetric Estimate
5.3 Some Geometric Background on Hypersurfaces
5.3.1 Introduction
5.3.2 The Second Fundamental Form
5.3.3 Differentiation on a Hypersurface
5.3.4 The Structure Equation for Hypersurfaces
5.3.5 Principal Curvatures
5.4 Mean Curvature Flow (MCF)
5.4.1 Introduction; MCF as a steepest descent flow of volume
5.4.2 Existence Results
5.4.3 Induced Evolution of Various Quantities along MCF
5.4.4 The Huisken Rescaling Result
5.4.5 Simon’s Identity and an application
5.4.6 Monotonicity Formula for the MCF and application to MCF Solitons
5.5 Proof of the Huisken rescaling result of MCF
5.5.1 Preliminaries
5.5.2 The Maximum Principle for Vectors
5.5.3 Extension of Technique to 2-tensors on Hypersurfaces
5.5.4 Application of 2-tensor Extension to Proof of Huisken
5.5.5 Preservation of bounded curvature
5.6 Rescaling
5.6.1 Preliminaries
5.6.2 A compactness theorem
5.6.3 Existence of smooth limit flows
5.7 The 2 dimensional Ricci Flow (2DRF)
5.7.1 Introduction and Existence
5.7.2 Evolution of the Scalar Curvature
5.7.3 Normalisation
5.7.4 Soliton Solutions
5.7.5 Computing the "Variational Derivative" for the 2D Gradient Ricci Soliton equation
5.7.6 $C^\infty$ convergence of the 2DRF
5.8 The Ricci Flow
5.8.1 Short time existence
5.8.2 Evolution of the Curvature
5.8.3 The Structural Tensor E
5.8.4 Convergence Results for Ricci Flow
5.8.5 The Perelman Functional; connection with the Fisher Information
5.9 Proof of Hamilton’s Theorem (1982)
5.9.1 The theorem, and sketch of its proof
5.9.2 A maximum principle
5.9.3 The Uhlenbeck Trick
5.9.4 Core of the Argument
5.10 The Huisken-Sinestrari theorem; surgery and classification of canonical singularities for MCF
5.10.1 Preliminaries, including statement of the theorem
5.10.2 Canonical Singularities
5.10.3 A priori Estimates
5.10.4 Surgery
5.11 Outline of the Classification Program
5.11.1 Injectivity Radius and Collapsing of Balls
5.11.2 Canonical Neighbourhoods
5.11.3 Connection to the Ricci flow with surgery
5.11.4 Perelman’s Length or "Entropy"
7 Fisher Information and application to the theory of Physical Manifolds
7.1 The Shannon Entropy
7.2 Fisher Information Theory and the Principle of Extreme Physical Information
7.2.1 A few definitions from statistics
7.2.2 The Fisher Information and the EPI principle
7.2.3 Interpretation
7.2.4 Concluding Remarks
7.3 Unbiased Estimators and the Cramer-Rao Inequality
7.3.1 Estimators and strong Cramer-Rao
7.3.2 Physical estimators and weak Cramer-Rao (for Riemannian-metric-measure spaces)
7.3.3 Physical estimators and weak Cramer-Rao (for Riemann-Cartan manifolds)
7.4 Fisher Information is the optimal sharp information measure
7.4.1 Introduction
7.4.2 Variation of the meta information
7.4.3 Alternative perspectives
8.2.1 Introduction
8.2.2 Result outline
8.2.3 Proof of Claims 1 and 2
8.3 Recovery of standard results
8.3.1 Proof of Claim 2
8.3.2 Proof of Claim 3
8.3.3 Proof of Claims 4 to 7
8.3.4 Classical electrodynamics
8.3.5 (Classical) quantum mechanics
8.4 Classification
8.4.1 Motivation
8.4.2 Speculation
8.4.3 Extensions
9.4.1 Application to condensed matter physics
9.4.2 Fluid dynamics
9.4.3 The Lorentz problem
9.4.4 Entanglement physics
Preface
that the Fisher information is $\int_M R_\sigma$; in particular this is critical when $R_\sigma = 0$.
Originally I also made some attempt to make concrete various ideas that do
not fit in any of the above categories. In particular, I wanted to somehow make
rigorous the idea of curvature on sets of fractal dimension. Early in 2008 I made
some progress with respect to these ideas, which became part of a project I came to
call turbulent geometry.
Following my preliminary investigations into turbulent geometry I began to focus
on the idea of building statistical spaces on top of statistical spaces. In particular
this, and various hybrid models including also turbulence and ideas from statis-
tics, have application to number theory and more practically towards constructing
theoretical frameworks for condensed matter physics. One of the more exciting de-
velopments here, in my opinion, was my discovery of the correspondence principle,
which in its simplest incarnation essentially states that there is a 1-1 correspondence
between completely general statistical manifolds and sharp turbulent geometries of
scalar type. This allows a physical interpretation of the former in a way that has con-
siderable aesthetic appeal, as of a Cartan-Riemannian manifold wherein the points
are of generalised measure.
However, with a few minor initial exceptions, much of the theory I develop in
the later sections of this book still lacks the degree of rigour which its scope would
otherwise dictate. An indication of the level of detail that is required is present
in my proofs of both Stokes' theorem and the Cramer-Rao inequality for statistical
manifolds. Unfortunately my generalisations of these theorems to the rather broader
classes of geometrical objects I deal with further along the path of my researches
are slightly rushed, and probably do require some closer inspection.
Indeed the main part of the dissertation I had in mind for this disclaimer is
the chapter on turbulent geometry (chapter 9). I have made a good faith attempt
to prove many results in this chapter, but I fear that many of my "proofs" here
are wanting in precision. However I have decided to retain this work because of its
potential promise to solve many extremely difficult and interesting problems. These
include what I call the Lorentz problem (why the preferred geometrical structure
of the universe should be Lorentzian), indications of how to construct a geometric
theory of condensed matter physics, and indications of how to develop a powerful
and predictive theory for entanglement physics. Others still are increased under-
standing of turbulence in fluid flow, the theory of purely formal structures of fractal
measure (such as the Mandelbrot set), and questions about the structure of the
prime numbers.
Acknowledgements
First and foremost, I would like to acknowledge my PhD supervisor, Professor Hyam
Rubinstein. Not only did he manage to continue to find me things to examine and
learn, but his boundless patience with me when I mentioned various ideas of mine
in incomplete form, and his tireless capacity for impressing upon me the need for
rigour was something which I have greatly appreciated. I would also like to thank my
parents; my father, for his interest in my research, for listening to me, and helping
guide my thought process to meaningful results, and my mother, for being a polite
listener.
Were it not for the popular science magazine New Scientist, I would be
unaware of Frieden’s book on the subject of Fisher information, which was
reviewed a number of years back. I should also thank my father once more for
pointing out the review to me, and impressing upon me the potential importance of
the work.
I should also thank the New Scientist for bringing to my attention the notion of
scale free physics, an idea advocated first by Howard Georgi in [22], though, taking
a similar role as Fisher to Frieden, the original instigators were Banks and Zaks in
their 1982 paper [7]. This motivated much of my interest in investigating (smooth)
fractional geometry and its connection to turbulence in early 2008.
Additionally, I owe a debt of gratitude to Ben Andrews for a passing dinnertime
conversation during the 2006 IAGSM winter school in Brisbane in which he men-
tioned the work of Amari, Nagaoka, Murray and Rice on Information Geometry,
without which I would not have been able to make rigorous my justification for the
variational principle that I invoke later on. There are various incarnations of this
principle, but the first steps along the path of extension and generalisation of the
result known as the Cramer-Rao inequality would have been impossible without ref-
erence to the work of other information geometers. I might add that this principle
underlies Roy Frieden’s work.
Of course, Frieden’s work would not have been possible, or at least would
certainly have been much more difficult, without various ideas, not least that of the
Fisher information, which are attributed to Ronald Fisher (1890 - 1962).
Finally an additional word of thanks to Professor Frieden who suggested during
an email exchange in late December 2008 that one of the best tools for proving the
optimality of the Fisher information over the space of positive functionals for a given
geometric model is the Cencov inequality, which is closely related to the Cencov
uniqueness theorem. This is still something that I need to more fully investigate.
Organisation and attribution of work
The organisation of this manuscript will be as follows. In the first chapter I survey
J. Milnor’s book on characteristic classes. Chapter two is also a survey, this time
of Riemannian geometry. No particular source has been used here, rather it being
a synthesis of a number of different works. Towards the end of the chapter some
original results are presented.
In chapter three I survey Leon Simon’s and Frank Morgan’s books on geometric
measure theory, as well as incorporating some notes from a course on the same that
Marty Ross gave back in 2006.
Chapter four is a survey mainly of the work in Gilbarg and Trudinger’s book
"Elliptic Partial Differential Equations of Second Order", together with an extensive
commentary on techniques from Shatah and Struwe’s book on geometric wave equations.
Following this in chapter five I give the notes rewritten almost verbatim from a
course that Ben Andrews gave at the AMSI winter school of 2006, together with a few
minor personal modifications. I also cover a few lectures given by Gerard Huisken
and one given by Nick Sheridan. The common thread to all of these materials is
that of geometric evolution equations and the Hamilton-Perelman approach to the
solution of the geometrisation conjecture, which is the topic of this chapter. I also
draw upon J. Morgan and Gang Tian’s detailed and extensive work on the same
subject.
Chapter six is the first truly original component of this work, and introduces
the notion of statistical geometry. Chapter seven draws heavily upon Roy Frieden’s
program described in his book "Physics from Fisher Information" and attempts
to make it rigorous. Chapter eight is the application of the techniques developed in the
previous two chapters to the derivation of the equations of geometrodynamics and
also the equations of the standard model. This chapter is also more or less completely
original.
Chapter nine is an extension of the ideas in the preceding three chapters to the
idea of a turbulent manifold. The ideas presented in this and the last chapter are
almost entirely due to the author.
Chapter 1
Characteristic Classes
1.1 Characteristic Classes as pullbacks
Theorem 1.1.1. (Whitney imbedding theorem for sphere bundles, [51]). Every
sphere bundle whose spheres are of dimension $n-1$ is equivalent to the bundle induced
by mapping the base space $M$ into the manifold $H(n, N)$, provided $N \geq \dim(M)+1$.
Theorem 1.1.2. (Steenrod, [49]). Two sphere bundles are equivalent iff the
corresponding mappings of $M$ into $H(n, N)$ are homotopic.
are the primary focus of this chapter. But these classes of bundles are more or less
the same, since an $n$-plane bundle can easily be mapped to an $(n-1)$-sphere bundle
via a map, call it $g$. Since cohomology functors reverse direction, we then
have an induced map $g^* : C(M, R) \to \bar{C}(M, R)$, where $\bar{C}$ is the characteristic ring
of $n$-plane bundles over $M$ with respect to the ring $R$.
1.2 Standard Classes and their Construction
$$u|_{(F, F_0)} \in H^n(F, F_0)$$
In order to define the Euler class, we consider the inclusion $(E, \emptyset) \subset (E, E_0)$.
This gives rise to a restriction homomorphism $H^*(E, E_0; \mathbb{Z}) \to H^*(E; \mathbb{Z})$, denoted
by $y \mapsto y|_E$. Applying this to the fundamental class $u \in H^n(E, E_0; \mathbb{Z})$ we obtain a
new class $u|_E \in H^n(E; \mathbb{Z})$. But we have that $H^n(E; \mathbb{Z})$ is canonically isomorphic
to $H^n(B; \mathbb{Z})$. This isomorphism follows from a highly nontrivial theorem that I will
not prove.
The Euler class of an oriented $n$-plane bundle $\chi$ is then defined to be the
cohomology class $e(\chi) \in H^n(B; \mathbb{Z})$ which corresponds to $u|_E$ under the canonical
isomorphism $\pi^* : H^n(B; \mathbb{Z}) \to H^n(E; \mathbb{Z})$.
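The construction just described can be summarised in a single chain of maps. The following is only a sketch, written under the assumption (stated in the text) that $\pi^*$ is an isomorphism; the notation $(\pi^*)^{-1}$ for its inverse is introduced here for illustration.

```latex
% Restrict the fundamental class u to E, then transport it to the base B
% using the inverse of the canonical isomorphism pi^*.
\[
  u \in H^n(E, E_0; \mathbb{Z})
  \;\longmapsto\;
  u|_E \in H^n(E; \mathbb{Z})
  \;\longmapsto\;
  e(\chi) := (\pi^*)^{-1}\bigl(u|_E\bigr) \in H^n(B; \mathbb{Z}).
\]
```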
There is in fact a connection between the Euler class of T M and the Euler
characteristic of M , where M is a manifold and T M the corresponding tangent
bundle.
First we need to define the notion of Kronecker index.
For $M$ a closed, possibly disconnected, smooth $n$-manifold, there is a unique
fundamental homology class
$$\mu(M) \in H_n(M; \mathbb{Z}/2)$$
with integer coefficients. Here $\pi_0$ is the restriction of the projection map $\pi$ of $\omega$
to $E_0$, and $\pi_0^*$ is the induced map on cohomology.
The groups $H^{i-2n}(B)$ and $H^{i-2n+1}(B)$ are zero for $i < 2n - 1$, from which it
follows that $\pi_0^* : H^i(B) \to H^i(E_0)$ is an isomorphism.
Now define the Chern classes $c_i(\omega) \in H^{2i}(B; \mathbb{Z})$ by induction on the complex
dimension $n$ of $\omega$. Define the top Chern class $c_n(\omega)$ to coincide with the Euler class
$e(\omega_{\mathbb{R}})$. For $i < n$ set
$$c_i(\omega) = (\pi_0^*)^{-1} c_i(\omega_0).$$
Suppose $T$ is the total space of $\omega_0$. Then what we are really doing here is
computing the Chern classes of $\omega_0$, which lie in the cohomology of $E_0$ (the base space
of $\omega_0$), and then defining the Chern classes of $\omega$ by pulling these back to $B$ via the
map $(\pi_0^*)^{-1}$. Since $\pi_0^* : H^{2i}(B) \to H^{2i}(E_0)$ is an isomorphism for $i < n$, this is
well defined. For $i > n$ the class $c_i(\omega)$ is defined to be zero.
There is in fact a link between Chern classes and Stiefel Whitney classes; Chern
classes are more or less the even Stiefel Whitney classes of a bundle of even
dimension, without the reduction to $\mathbb{Z}_2$ coefficients.
$$p_i(\chi) \in H^{4i}(B; \mathbb{Z}).$$
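The defining formula for these classes is missing from this copy. As a hedged reminder, and assuming the convention of Milnor's book on which this chapter is based, the Pontrjagin classes of a real vector bundle $\chi$ are obtained from the Chern classes of its complexification $\chi \otimes \mathbb{C}$:

```latex
% Pontrjagin classes via the Chern classes of the complexification.
% Sign convention assumed to follow Milnor's book on characteristic classes.
\[
  p_i(\chi) := (-1)^i \, c_{2i}(\chi \otimes \mathbb{C}) \in H^{4i}(B; \mathbb{Z}).
\]
```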
1.3 Generalisations
A direction of further investigation might be to examine the world of characteristic
classes for general fibre bundles; in other words, to drop the restriction that the
structure group be the general linear group. I am quite interested to see what the
analogues of the above classes might be in the general setting, if in fact there are
any.
There are of course many topologies that are somewhat troublesome to work
with, and which may not even admit differentiable structures at all. So we make
the following assumption:
There is, about each point in our set $X$, an open set $U$ and a function $f$ such
that the map $f : U \to \mathbb{R}^n$ is a homeomorphism. We call $n$ the dimension of our
topology. If two such sets $U$ and $V$ overlap, we require the transition function from
$U$ to $V$ to be a homeomorphism.
Note that there will always be at least one way of placing a differentiable struc-
ture on such a manifold (by taking finer charts in the original topology, and requiring
transition maps to be differentiable).
If there is a map $g : X \to \mathbb{R}^m$ such that $g$ is a bilinear mapping and $g(X)$ is a
differentiable submanifold of $\mathbb{R}^m$, we say that $X$ inherits the differentiable structure
from $\mathbb{R}^m$ via its embedding $g$. There may well be more than one way to embed
$X$ in $\mathbb{R}^m$, of course, which is kind of the whole point.
So what we would like to do is produce a class of invariants that, for a given
differentiable manifold, will specify its diffeomorphism class uniquely, given of course
that we know all of the invariants. So consider homotopy classes of maps from a
space $Y$ into our space $X$. For example, the homotopy groups $\pi_n(X)$ arise in this
way, for $Y_n = S^n$.
Now, let $Y$ be an arbitrary differentiable submanifold of infinite dimensional
euclidean space. We can represent such spaces in general by associating a symmetric
bilinear (possibly degenerate) form $g(x)$ to each point $x$ of $\mathbb{R}^\infty$ such that in a local
chart the induced functions $g_{ij}(x)$ are smooth; in other words we are thinking of
them as (pseudo)-Riemannian submanifolds of our infinite dimensional space. Now,
Generalised Invariants and application to Exotic Differentiable Structures
Then the limits of the Ricci flow under this construction, counted with
multiplicity, i.e. as varifolds, correspond to the elements of the group $G(g, 1, X)$; that is,
the limit of the flow above corresponds to the homotopy class of the map $f$ from
$N = (\mathbb{R}^{2n+1}, g)$ to $X$. If a flow may be perturbed from one limit to another, i.e. if
modifying the image via a diffeomorphism and restarting the flow gives a different
limit, then we identify the two limits.
In other words, what we are doing, if we consider the Ricci flow as $\sim_1$, and the
perturbation identification as $\sim_2$, is claiming that
$$G(g, 1, X) \cong (\{\text{all maps from } N \text{ to } X\}/\sim_1)/\sim_2$$
Intuitively one should think of the fundamental group and the torus, for instance,
and consider the action of this flow as causing the image $f(S^1)$ to pinch around
topological obstructions in the limit.
This is slightly messy; evidently we might want to remove the latter consideration
about perturbation identification. So instead, first apply the volume-normalised Ricci
flow to the manifold $X$ with the marking $f(N)$ within the ambient space $\mathbb{R}^{2n+1}$; then flow
$f(N)$ within the ambient space $X$. Then the worst thing that can happen is for the
Ricci flow limits to degenerate; for instance, consider the smooth torus. Then any
circle around its waist is a valid limit of the Ricci flow of $f(S^1)$. So we get a
one-dimensional family of solutions which vary smoothly one into the other. This leads
us to the following
Claim: If we perform this modified procedure, then the Ricci flow limits will
correspond to the elements of the group G(g, 1, X), with the implicit assumption
that we identify Ricci flow limits that can be deformed smoothly from one to the
other.
In fact we might expect:
Claim: Degeneracy of limits will correspond to an underlying symmetry of the
ambient space X. (For example, for the torus this is circular symmetry).
Chapter 2
Pseudo-Riemannian Geometry
2.1 Overview
Out of general interest, I acquired a very good book on Pseudo (or Semi) Riemannian
Geometry [41] and made a study of it towards the middle of 2005. The essential
difference between pseudo and standard Riemannian geometry, as is explained quite
quickly, is the relaxing of the condition that the metric g on the tangent bundle
to a manifold be a symmetric positive definite bilinear form to being merely a
nondegenerate symmetric bilinear form with fixed index (dimension of the space
of negative eigenvalues of g). One recalls that nondegeneracy of a bilinear form
means that its matrix in a given representation is invertible.
Many of the results that apply to Riemannian manifolds follow through for
semi-Riemannian manifolds, though by no means all. One also gets an interesting
interplay between timelike, spacelike, and lightlike vectors (vectors where $g(v, v)$ is
negative, positive, or zero respectively). The $0$ vector is defined to be spacelike.
A particularly important case of a semi-Riemannian manifold is the case where
the manifold has a metric with index one. These are referred to as Lorentz manifolds,
and are the core ingredient in the physical model known as general relativity.
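As a concrete illustration of the timelike, spacelike and lightlike trichotomy, consider the simplest Lorentz manifold. The example below is a sketch chosen for illustration and is not taken from the text: $\mathbb{R}^4$ with coordinates $(t, x, y, z)$ and the flat metric of index one.

```latex
% Minkowski space: a flat metric of index one on R^4 (illustrative example).
\[
  g = -dt^2 + dx^2 + dy^2 + dz^2 .
\]
% Classification of three sample vectors by the sign of g(v, v):
\[
  \begin{aligned}
    v &= (1, 0, 0, 0): & g(v, v) &= -1 < 0 && \text{(timelike)},\\
    v &= (0, 1, 0, 0): & g(v, v) &= +1 > 0 && \text{(spacelike)},\\
    v &= (1, 1, 0, 0): & g(v, v) &= -1 + 1 = 0 && \text{(lightlike)}.
  \end{aligned}
\]
```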
Riemannian geometry can be generalised to what I shall call Riemann-Cartan
geometry. Observe that if instead $g$ is an antisymmetric bilinear form,
the fundamental theorem of Riemannian geometry still goes through, and we get a
unique connection, known as the Cartan connection, corresponding to that metric $g$.
We may then conclude, by linearity of the proof of this statement for both symmetric
and antisymmetric bilinear forms, that we may drop the assumption of any form of
2.2 The fundamental theorem of Riemannian geometry
(ii) $\nabla_X (Y + Z) = \nabla_X Y + \nabla_X Z$
(iii) $\nabla_X (fY) = f\nabla_X Y + X(f)Y$
$$\frac{d}{dt} g(V, W) = g\left(\frac{DV}{dt}, W\right) + g\left(V, \frac{DW}{dt}\right)$$
$$\nabla_X Y - \nabla_Y X = [X, Y]$$
Geodesics and the geodesic equation
Adding the first two of these statements together then subtracting the third
gives
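The displayed result is missing from this copy. Assuming that the three statements referred to are the metric compatibility identities for the cyclic permutations of $(X, Y, Z)$, and using the torsion-free condition $\nabla_X Y - \nabla_Y X = [X, Y]$, the computation is the standard Koszul argument, sketched here:

```latex
% The three compatibility statements (assumed form):
\[
  \begin{aligned}
    X\, g(Y, Z) &= g(\nabla_X Y, Z) + g(Y, \nabla_X Z),\\
    Y\, g(Z, X) &= g(\nabla_Y Z, X) + g(Z, \nabla_Y X),\\
    Z\, g(X, Y) &= g(\nabla_Z X, Y) + g(X, \nabla_Z Y).
  \end{aligned}
\]
% Add the first two, subtract the third, and replace differences such as
% \nabla_X Z - \nabla_Z X by the bracket [X, Z] (torsion-free condition):
\[
  2\, g(\nabla_X Y, Z)
  = X\, g(Y, Z) + Y\, g(Z, X) - Z\, g(X, Y)
  + g([X, Y], Z) - g([X, Z], Y) - g([Y, Z], X).
\]
```

Since the right hand side involves only the metric and Lie brackets, and $g$ is nondegenerate, this determines $\nabla_X Y$ uniquely.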
$$L(\gamma) = \int_0^1 g(\gamma'(t), \gamma'(t))^{1/2}\, dt$$
where now one takes the norm with respect to the inner product $g$ on the
manifold in question.
In order for the path to be a shortest path, we require that the first variation of
the length vanish:
$$\delta L(\gamma) = 0$$
$$\nabla_{\gamma'} \gamma' = 0$$
$$\frac{d^2 x^k}{dt^2} + \Gamma^k_{ij} \frac{dx^i}{dt} \frac{dx^j}{dt} = 0$$
where $\Gamma^k_{ij} := (\nabla_{X_i} X_j)^k$ are the Christoffel symbols associated to the metric $g$.
A couple of examples of geodesics for instance are straight lines in Euclidean
space or great circles on the sphere.
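To see how the first of these examples falls out of the geodesic equation, recall the standard coordinate expression for the Christoffel symbols of the Levi-Civita connection (a textbook formula, quoted here as a reminder rather than derived in this text). In Euclidean space with Cartesian coordinates the metric components are constant, so every term vanishes:

```latex
% Christoffel symbols in local coordinates (standard formula):
\[
  \Gamma^k_{ij} = \tfrac{1}{2}\, g^{kl}
  \left( \partial_i g_{jl} + \partial_j g_{il} - \partial_l g_{ij} \right).
\]
% Euclidean case: g_{ij} = \delta_{ij} is constant, so \Gamma^k_{ij} = 0 and
% the geodesic equation reduces to
\[
  \frac{d^2 x^k}{dt^2} = 0
  \quad\Longrightarrow\quad
  x^k(t) = a^k + b^k t ,
\]
% i.e. a straight line traversed at constant speed.
```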
2.4 Curvature
The curvature is essentially the first correction term, or first obstruction, to a
Riemannian metric being locally flat. In other words, provided $\|x\| = o(\delta)$,
where
$$R(U, V) := \nabla_U \nabla_V - \nabla_V \nabla_U - \nabla_{[U,V]}$$
It can be checked that it has the properties of a tensor, i.e. it is linear over smooth
functions in each argument. Furthermore, it can be shown to have the following symmetries:
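The list itself does not survive in this copy. For reference, the standard symmetries of the curvature of the Levi-Civita connection are given below; this is an assumption about what the list contained, and the four-argument notation $R(X, Y, Z, W) := g(R(X, Y)Z, W)$ is introduced here for illustration.

```latex
% Standard symmetries of the Riemann curvature tensor,
% with R(X, Y, Z, W) := g(R(X, Y) Z, W).
\[
  \begin{aligned}
    R(X, Y, Z, W) &= -R(Y, X, Z, W)
      && \text{(antisymmetry in the first pair)},\\
    R(X, Y, Z, W) &= -R(X, Y, W, Z)
      && \text{(antisymmetry in the second pair)},\\
    R(X, Y, Z, W) &= R(Z, W, X, Y)
      && \text{(pair symmetry)},\\
    R(X, Y)Z + R(Y, Z)X + R(Z, X)Y &= 0
      && \text{(first Bianchi identity)}.
  \end{aligned}
\]
```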
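$$\gamma_\epsilon = \{x \mid d(x, \gamma) \leq \epsilon\}$$
Then
$$\mathrm{Vol}(\gamma_\epsilon) = \epsilon^{n-1} \|S^{n-2}\| L[\gamma] - c(n)\, \epsilon^{n+1} \int_\gamma R(\gamma', \gamma')\, ds$$
$$\mathrm{Vol}(B_r(x)) = r^n \|\mathrm{Vol}\, B_1^{\mathbb{R}^n}\| - c(n)\, r^{n+2} R(x)$$
for $r \ll 1$, giving us a geometrical idea of what the scalar curvature does.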
2.5 Stokes' Theorem
Theorem 2.5.1. Let $N^{k+1}$, $M^{n+1}$ be differential manifolds. Let $U^{k+1}(m) \subset N$,
$V^{n+1} \subset M$ be compact subsets of $N$, $M$, where $U^{k+1}(m)$ is a family of subsets
of $N$ specified by $m \in V$ such that the function $m \mapsto U^{k+1}(m)$ is smooth. Let
$\omega \in \Omega^k(N) \otimes \Omega^n(M)$. Let $d_N$ be the exterior derivative or de Rham operator on
$\Omega(N)$; let $d_M$ be likewise w.r.t. $\Omega(M)$. Then we have
$$\int_V \int_{U(m),\, m \in V} d_N d_M\, \omega = \int_{\partial V} \int_{\partial U(m),\, m \in V} \omega \qquad (2.1)$$
Remark. The nontrivial nature of this result is that $\omega$ is allowed to have the form
$$\omega = f(x, y)\, dx\, dy$$
in local coordinates, though it is possible that in all the examples I have been
working with the function $f(x, y)$ could be expressed in a power series expansion
$\sum_{i=0}^{\infty} \sum_{j=0}^{\infty} a_i(x) b_j(y)$. But it is still not obvious how to prove the above because we
are dealing with a smooth family of sets in $N$ specified by a single set in $M$.
Remark. The general idea of this result, or at least a generalisation, will prove useful
later when I am looking into mathematical turbulence in chapter 9.
Proof. Without loss of generality, assume that $\omega = f(x, y)\,dx\,dy$ in local coordinates
as in the remark above. Make the further assumption that $f$ and all its derivatives
vanish at infinity. Then we may split $f$ as a doubly infinite sum of a set of suitable
eigenfunctions as in the remark above. The result then follows by applying the
standard form of Stokes' theorem twice for each term in the sum, and then bringing
everything back together.
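For reference, the standard form of Stokes' theorem invoked in the proof is recalled below. This is the classical statement only; the symbols $U$ and $\alpha$ are placeholders introduced here, and the sketch makes no claim about how the parameter dependence of $U(m)$ is handled.

```latex
% Classical Stokes' theorem: U a compact oriented (k+1)-manifold with
% boundary, alpha a smooth k-form on U.
\[
  \int_U d\alpha = \int_{\partial U} \alpha .
\]
% In the proof it is applied once with d = d_N on U(m) \subset N and once
% with d = d_M on V \subset M, for each term a_i(x) b_j(y) of the expansion.
```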
$$\int_M \int_A d_{(M;a)}\, \omega(m, a) = \int_{\partial M} \int_A \omega(m, a)$$
Proof. A bit of care is required here since we are dealing with statistical
derivatives. First observe that we may write $\omega(m, a) = F(m, a)\,dm\,da$, and
$$d_{(M;a)}(m) = \exp_a(m) \circ d_M \circ \int_{b \in A} \exp_b^{-1}(m)$$
where we have a 1-1 correspondence between $a \in A$ and inner products at $m \in M$.
Then
$$\int_M \int_A d_{(M;a)}(m)\, \omega(m, a) = \int_M \int_A \int_A \exp_a d_M\left(\exp_b^{-1}(m) F(m, a)\right) db\, da\, dm$$
$$\int_{\partial M} \int_A \int_A \exp_a \exp_b^{-1} F(m, a)\, db\, da\, dm$$
$$\int_A \exp_a \exp_b^{-1}\, db = \mathrm{Id}_M$$
to
$$\int_{\partial M} \int_A F(m, a)\, da\, dm$$
Kähler geometry and Convexity Theory
Remark. So essentially the theorem more or less works, but we have to make sure
that our family of differential operators is chosen in such a way to make "physical
sense". In other words, the probabilistic flux must be conserved, and also we need
to have that our distribution is normalised in probabilistic weight to one. Otherwise
we run into trouble. The appropriate tools to tackle these issues will be developed
in some greater detail and along slightly different lines in chapter 6.
\[ J_p^2 = -1 \qquad (2.2) \]
for all tangent vector fields $X$ and $Y$ on $M$, where $\nabla$ is the Riemannian connection. This is equivalent to requiring that $J$ be globally parallel with respect to the Riemannian connection.
Definition 3. If a complex manifold M satisfies these two equations, we say that
M is a Kähler manifold.
It is useful to extend a hermitian metric $g$ on a complex manifold $M$ to a complex valued, "sesquilinear" form $h$ defined in the following way:
\[ h(X, Y) = g(X, Y) + i\omega(X, Y) \]
where
\[ \omega(X, Y) = g(X, JY) \]
for all $X, Y \in T_p M$.
The condition of being Kählerian imposes a strong restriction on ω:
\[ d\omega_p(X_1, X_2, X_3) = g(X_2, (\nabla_{X_1} J) X_3) - g(X_1, (\nabla_{X_2} J) X_3) + g(X_1, (\nabla_{X_3} J) X_2) \]
Proof. We need only prove things locally, so consider a small embedded complex
submanifold K ⊂ M . Then if X, Y ∈ T K, we have
So K is Kählerian.
The reason we might expect this to be true is because there actually is a corre-
spondence between minimal graphs and complex subdomains of the complex plane,
so it seems reasonable that we might be able to get the same result with a locally
$C^2$ manifold if we impose some sort of causality requirement.
In particular, we might expect a subset of solutions to certain physical systems
of even dimension to be Kähler, since they might be realisable as solutions to the
square roots of the relevant differential operator.
Chapter 3
Introduction and Motivation
A Survey of Geometric Measure Theory
\[ A'(t) = \frac{d}{dt} \int_D \sqrt{1 + (\nabla f + t \nabla \eta)^2}\, dA \]
By a result from measure theory, we may exchange the integral sign and the
derivative and get
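\[ A'(t) = \int_D \frac{\partial}{\partial t} \sqrt{1 + (\nabla f + t \nabla \eta)^2}\, dA = \int_D \frac{1}{2} \frac{2(\nabla f + t \nabla \eta) \cdot \nabla \eta}{\sqrt{1 + (\nabla f + t \nabla \eta)^2}}\, dA \qquad (3.1) \]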
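\[ \Big[ \eta \frac{\nabla f \cdot n}{\sqrt{1 + (\nabla f)^2}} \Big]\Big|_{\partial D} - \int_D \nabla \cdot \Big( \frac{\nabla f}{\sqrt{1 + (\nabla f)^2}} \Big) \eta\, dA = 0 \]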
The first term vanishes since η|∂D = 0. Since η is otherwise arbitrary, we then
conclude by the fundamental lemma of the calculus of variations that
\[ \nabla \cdot \Big( \frac{\nabla f}{\sqrt{1 + (\nabla f)^2}} \Big) = 0 \]
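To make the minimal surface equation concrete, here is a small numerical sketch. It evaluates the divergence of $\nabla f / \sqrt{1 + (\nabla f)^2}$ by central differences for Scherk's surface $f(x, y) = \log\cos y - \log\cos x$, a classical minimal graph, and for a paraboloid, which is not minimal. The choice of examples, the step size and all function names are my own assumptions for illustration.

```python
import math

def unit_flux(grad, x, y):
    """Return grad f / sqrt(1 + |grad f|^2) at the point (x, y)."""
    fx, fy = grad(x, y)
    w = math.sqrt(1.0 + fx * fx + fy * fy)
    return fx / w, fy / w

def minimal_surface_residual(grad, x, y, h=1e-5):
    """Central-difference approximation of div(grad f / sqrt(1 + |grad f|^2))."""
    px, _ = unit_flux(grad, x + h, y)
    mx, _ = unit_flux(grad, x - h, y)
    _, py = unit_flux(grad, x, y + h)
    _, my = unit_flux(grad, x, y - h)
    return (px - mx) / (2.0 * h) + (py - my) / (2.0 * h)

def scherk_grad(x, y):
    # Gradient of f(x, y) = log(cos y) - log(cos x), valid for |x|, |y| < pi/2.
    return math.tan(x), -math.tan(y)

def paraboloid_grad(x, y):
    # Gradient of f(x, y) = x^2 + y^2, which is not a minimal graph.
    return 2.0 * x, 2.0 * y

scherk_residual = minimal_surface_residual(scherk_grad, 0.3, 0.2)
paraboloid_residual = minimal_surface_residual(paraboloid_grad, 0.3, 0.2)
print("Scherk residual    :", scherk_residual)
print("paraboloid residual:", paraboloid_residual)
```

The Scherk residual is zero up to discretisation error, while the paraboloid residual is of order one, as expected for a surface with nonzero mean curvature.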
As mentioned before, there are a couple of technical issues that might compromise
the above naive approach. I will describe them here.
The main problem is the issue of convergence. For it is possible to find a family of surfaces that gets arbitrarily close to the minimal area, but which becomes pathological in the limit; for instance, one may get space-filling curves, which are very horrible indeed!
Another problem that may occur is that, in perturbing an initial smooth surface to try to find an ideal one, one may produce self-intersections in the new surface, and hence it will no longer be a manifold.
Geometric measure theory deals with these issues by slightly weakening the family of things that one is working with, to surfaces that are smooth almost everywhere and that have integer multiplicity. In particular, one introduces the concept of rectifiability. It turns out that the space-filling component of the pathological family of surfaces previously described is unrectifiable, whereas the "good bit" is rectifiable. There is a structure theorem about the objects one works with that in fact shows that there is a unique decomposition into these two separate pieces. Finally, if one has "nice" conditions, such as a smooth embedded boundary curve and a smooth ambient space, we may conclude, using a result known as the Allard Regularity theorem, that the rectifiable bit is actually the smooth surface we are looking for. Moreover, we are able to conclude that it satisfies the minimal surface equation, and so our "naive" analysis is justified.
I should remark that the Plateau Problem is not the only problem that is
amenable to these methods. Any variational problem can be treated in an anal-
ogous manner, and benefits to an equal degree from the tools of this theory. It is
merely that the Plateau Problem is easy to understand in a visual manner, and so
makes a very good example. Since we may in general be considering much broader
variational problems, it makes sense to extend from merely considering integer mul-
tiplicities and consider any value for the multiplicity at any point on the surface;
this is where the idea of density comes into its own.
Parallelogram law.
We also have:
Sobolev Spaces
From which the Schwarz inequality can easily be extracted. The triangle in-
equality follows easily from the Schwarz inequality.
The following theorem is a very important one, and gets used throughout anal-
ysis. We shall need it in what is to follow.
Theorem 3.2.2. (The Riesz Representation Theorem). For every bounded linear functional $F$ on a Hilbert space $H$, there is a uniquely determined element $f \in H$ such that $F(x) = (x, f)$ for all $x \in H$ and $\|F\| = \|f\|$.
Proof. Let $N = \{x \mid F(x) = 0\}$ be the null space of $F$. If $N = H$, the result follows trivially by taking $f = 0$. Otherwise, since $N$ is a closed subspace of $H$, there exists an element $z \neq 0$ in $H$ such that $(x, z) = 0$ for all $x \in N$. Then $F(z) \neq 0$ and moreover for any $x \in H$,
\[ F\Big(x - \frac{F(x)}{F(z)} z\Big) = F(x) - \frac{F(x)}{F(z)} F(z) = 0, \]
so $x - \frac{F(x)}{F(z)} z \in N$. Hence by our choice of $z$
\[ \Big(x - \frac{F(x)}{F(z)} z, z\Big) = 0, \]
or in other words
\[ (x, z) = \frac{F(x)}{F(z)} \|z\|^2. \]
Hence $F(x) = (x, f)$ with $f = \overline{F(z)}\, z / \|z\|^2$. For the norms, the Schwarz inequality gives
\[ \|F\| = \sup_{x \neq 0} \frac{|(x, f)|}{\|x\|} \le \sup_{x \neq 0} \frac{\|x\| \|f\|}{\|x\|} = \|f\|, \]
while
\[ \|f\|^2 = (f, f) = F(f) \le \|F\| \|f\|. \]
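The construction in the proof can be traced in a finite-dimensional toy case. The sketch below works in the real Hilbert space $\mathbb{R}^3$ with the usual dot product; the functional $F(x) = a \cdot x$, the coefficients $a$ and the helper names are hypothetical choices made only for illustration. It builds $f = F(z)\, z / \|z\|^2$ from a vector $z$ orthogonal to the null space and checks that $F(x) = (x, f)$.

```python
import math

def dot(u, v):
    """Standard inner product on R^n."""
    return sum(ui * vi for ui, vi in zip(u, v))

def riesz_representer(F, z):
    """Given a linear functional F and a nonzero z orthogonal to its
    null space, return f = F(z) z / ||z||^2 as in the proof (real case)."""
    scale = F(z) / dot(z, z)
    return [scale * zi for zi in z]

# Hypothetical functional F(x) = a . x on R^3.
a = [1.0, -2.0, 0.5]
F = lambda x: dot(a, x)

# The null space of F is the plane orthogonal to a, so any nonzero
# multiple of a is orthogonal to it.
z = [2.5 * ai for ai in a]
f = riesz_representer(F, z)

x = [0.3, 4.0, -1.2]
norm_f = math.sqrt(dot(f, f))
unit_f = [fi / norm_f for fi in f]

print("f        =", f)
print("F(x)     =", F(x), " (x, f) =", dot(x, f))
print("F(f/|f|) =", F(unit_f), " |f| =", norm_f)
```

Evaluating $F$ on the unit vector $f / \|f\|$ attains the value $\|f\|$, which illustrates the norm identity $\|F\| = \|f\|$ in this simple setting.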
The $L^p(\Omega)$ and $L^p_{loc}(\Omega)$ spaces consist of p-integrable functions over $\Omega$ and locally p-integrable functions over $\Omega$ respectively.
For $\psi$ to be p-integrable means that $\int_\Omega |\psi|^p$ is well defined and is finite.
For $\psi$ to be locally p-integrable means that for any $K \subset\subset \Omega$, $\int_K |\psi|^p$ is well defined and is finite.
The $L^p(\Omega)$ norm is defined on p-integrable functions on $\Omega$ by
\[ \|u\|_{p;\Omega} = \Big( \int_\Omega |u|^p \Big)^{1/p} \qquad (3.6) \]
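A multi-index $\alpha$ is a string $(\alpha_1, ..., \alpha_n)$ of non-negative integers, and is used as shorthand for multi-variable differentiation, e.g. $D^\alpha f = \frac{\partial^{|\alpha|} f}{\partial x_1^{\alpha_1} \cdots \partial x_n^{\alpha_n}}$, where $|\alpha| = \alpha_1 + \dots + \alpha_n$.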
3.2.2 Motivation
Let $\Omega$ be a closed domain, and $\phi \in C_0^1(\Omega)$ (once continuously differentiable functions on $\Omega$ with compact support). By the divergence theorem a $C^2(\Omega)$ solution of $\Delta u = f$ satisfies
\[ \int_\Omega Du \cdot D\phi\, dx = -\int_\Omega f \phi\, dx \qquad (3.7) \]
since the above is equivalent to $\int_\Omega D_i(\phi D_i u)\, dx = 0$, which, by the divergence theorem, is equivalent to the statement $\int_{\partial\Omega} \phi\, Du \cdot v\, ds = 0$ where $v$ is a unit normal to $\partial\Omega$. Since $\Omega$ is closed, this statement makes sense.
is an inner product on $C_0^1(\Omega)$. Hence we can complete $C_0^1(\Omega)$ with respect to the metric induced by this equation to get a Hilbert space, which we notate as $W_0^{1,2}(\Omega)$. This is an example of a so-called Sobolev space.
The linear functional $F$ defined by $F(\phi) = -\int_\Omega f \phi\, dx$ may be extended, through appropriate choice of $f$, to a bounded linear functional on $W_0^{1,2}(\Omega)$. Then by the Riesz representation theorem there exists an element $u \in W_0^{1,2}(\Omega)$ such that $(u, \phi) = F(\phi)$ for all $\phi \in C_0^1(\Omega)$. Hence a generalised solution to the Dirichlet problem, $\Delta u = f$, $u = 0$ on $\partial\Omega$, has been found. So we have reduced the classical problem to a question of whether this solution we have found in $W_0^{1,2}(\Omega)$ is in fact in $C_0^1(\Omega)$, in other words, a regularity problem.
3.2.3 The Sobolev Spaces $W^{k,p}(\Omega)$, $W_0^{k,p}(\Omega)$ and $W_{loc}^{k,p}(\Omega)$
Let $u \in L^1_{loc}(\Omega)$. Given a multi-index $\alpha$, $v \in L^1_{loc}(\Omega)$ is called the $\alpha$th weak derivative of $u$ if
\[ \int_\Omega \phi v\, dx = (-1)^{|\alpha|} \int_\Omega u D^\alpha \phi\, dx \qquad (3.9) \]
for all $\phi \in C_0^\infty(\Omega)$. We write $v = D^\alpha u$, and observe that this object is defined up to sets of Lebesgue measure zero.
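A quick numerical illustration of the definition, under assumptions of my own choosing: take $\Omega = (-1, 1)$, $u(x) = |x|$ and the candidate weak derivative $v(x) = \mathrm{sign}(x)$, and test the defining identity with $|\alpha| = 1$ against a single smooth compactly supported function $\phi(x) = (1 + x) e^{-1/(1 - x^2)}$. The test function, the midpoint quadrature and the helper names are all hypothetical; one test function of course does not prove the identity, it only illustrates it.

```python
import math

def bump(x):
    """Smooth bump exp(-1/(1 - x^2)) supported in (-1, 1)."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0

def phi(x):
    """Test function: an asymmetric multiple of the bump (hypothetical choice)."""
    return (1.0 + x) * bump(x)

def dphi(x):
    """Derivative of phi, computed by hand with the product and chain rules."""
    if abs(x) >= 1.0:
        return 0.0
    s = 1.0 - x * x
    return bump(x) * (1.0 - (1.0 + x) * 2.0 * x / (s * s))

def midpoint_integral(g, a, b, n=20000):
    """Composite midpoint rule on [a, b] with n cells."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

u = abs
v = lambda x: 1.0 if x > 0 else -1.0  # candidate weak derivative of |x|

lhs = midpoint_integral(lambda x: phi(x) * v(x), -1.0, 1.0)
rhs = -midpoint_integral(lambda x: u(x) * dphi(x), -1.0, 1.0)
print("integral of phi * v    :", lhs)
print("-(integral of u * phi'):", rhs)
```

The two numbers agree to quadrature accuracy even though $u$ has a corner at the origin, which is exactly the situation the weak derivative is designed to handle.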
We then define the Sobolev space
\[ \sum_{|\alpha| \le k} \|D^\alpha u\|_{p;\Omega} \qquad (3.12) \]
\[ W_{loc}^{k,p}(\Omega) = L^p_{loc}(\Omega) \cap \{u : D^\alpha u \in L^p_{loc}(\Omega),\ |\alpha| \le k\} \qquad (3.13) \]
Finally, we define the spaces $W_0^{k,p}(\Omega) = W^{k,p}(\Omega) \cap L_0^p(\Omega)$ where $L_0^p$ is the space of p-integrable functions on $\Omega$ with compact support. In other words, $W_0^{k,p}$ is the set of $(k, p)$-Sobolev functions with compact support.
\[ \|u\|_{np/(n-p);\Omega} \le C \|Du\|_{p;\Omega} \]
\[ \|u\|_{p^*;\Omega} \le C \|u\|_{k,p;\Omega}. \]
Further preliminaries from analysis
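\[ \sup_\Omega |u| \le C |K|^{1/p'} \Big\{ \sum_{|\alpha|=0}^{k-1} (\mathrm{diam}\, K)^{|\alpha|} \frac{1}{\alpha!} \|D^\alpha u\|_{p;K} + (\mathrm{diam}\, K)^k \frac{1}{(k-1)!} \Big(k - \frac{n}{p}\Big)^{-1} \|D^k u\|_{p;K} \Big\} \]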
(i) $L^m(\emptyset) = 0$
Remark. Lebesgue measure is Borel regular, that is, for any $A$, $L^m(A) = L^m(B)$ for some Borel set $B \supset A$. To see this, let $B_j = \cup_k P_{jk}$ with $L(B_j) \le L(A) + \frac{1}{j}$. Set $B = \cap_j B_j$.
Remark. $L^m$ is Radon, that is, any $A$ can be approximated by closed sets on the inside, and open sets on the outside. Equivalently, this means that $L^m$ is Borel regular and $L^m(A) < \infty$ for $A$ bounded.
Note that $H^n_\delta$ is an (outer) measure, and, furthermore, that $H^n$ is a Borel regular measure. To see the latter, let $A$, $B$ be two sets with $d = d(A, B) > 0$. Take $\delta < d$. Then $H^n_\delta(A \cup B) = H^n_\delta(A) + H^n_\delta(B)$. So $H^n$ is Borel. To see that it is in fact Borel regular, take $\delta_j = \frac{1}{j}$ and a family of covers $\{T_{jk}\}$ such that $H^n_{1/j}(\cup_k \bar{T}_{jk}) = H^n(A) + \frac{1}{j}$.
However, $H^n$ is not Radon if $n < m$. For example, $H^1(\text{unit 2-disc}) = \infty$.
Other interesting facts include that
Note. $H^n$ makes sense in any metric space $X$, and for any $n \ge 0$, not necessarily a natural number.
Remark. Hausdorff measure may otherwise seem quite arbitrary, but the fact that it is invariant under rigid motions of $\mathbb{R}^m$ means that oddly rotated sets can be measured quite easily using it, whereas they would be frightful to compute using Lebesgue measure. In fact, it turns out that we may compute the Lebesgue measure of horridly rotated sets using Hausdorff measure, due to the following extremely useful and deep theorem:
Theorem 3.3.1. $H^n = L^n$ on $\mathbb{R}^n$.
3.3.3 Densities
Let $A \subset \mathbb{R}^n$, $a \in \mathbb{R}^n$, $m \le n$. We define the m-dimensional density $\Theta^m(A, a)$ of $A$ at $a$ by
\[ \Theta^m(A, a) = \lim_{r \to 0} \frac{H^m(A \cap B^n(a, r))}{\alpha_m r^m} \qquad (3.14) \]
where $\alpha_m$ is the volume of the unit ball in $\mathbb{R}^m$.
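As an illustration of the density, the sketch below estimates $\Theta^2(A, a)$ for the closed quarter-plane $A = \{x \ge 0, y \ge 0\} \subset \mathbb{R}^2$ at a corner point, an edge point, an interior point and an exterior point. It uses the fact that $H^2 = L^2$ on $\mathbb{R}^2$ and replaces the limit by a single small radius; the midpoint-grid area estimate, the radius and the function names are assumptions made for this example only.

```python
import math

def density_estimate(indicator, a, r, cells=200):
    """Midpoint-grid estimate of L^2(A ∩ B(a, r)) / (pi r^2).

    `indicator(x, y)` returns True when (x, y) lies in A. The bounding
    square of the ball is cut into cells x cells squares and each cell
    midpoint inside both the ball and A contributes one cell area.
    """
    h = 2.0 * r / cells
    count = 0
    for i in range(cells):
        x = a[0] - r + (i + 0.5) * h
        for j in range(cells):
            y = a[1] - r + (j + 0.5) * h
            if (x - a[0]) ** 2 + (y - a[1]) ** 2 <= r * r and indicator(x, y):
                count += 1
    return count * h * h / (math.pi * r * r)

quarter_plane = lambda x, y: x >= 0.0 and y >= 0.0

r = 0.01
corner = density_estimate(quarter_plane, (0.0, 0.0), r)
edge = density_estimate(quarter_plane, (1.0, 0.0), r)
interior = density_estimate(quarter_plane, (1.0, 1.0), r)
exterior = density_estimate(quarter_plane, (-1.0, -1.0), r)
print("corner:", corner, "edge:", edge, "interior:", interior, "exterior:", exterior)
```

The estimates come out close to 1/4, 1/2, 1 and 0 respectively, so the density records how much of a small ball around the point is occupied by the set.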
Define $\Theta^m(\mu, a)$, where $\mu$ is a measure on $\mathbb{R}^n$, by
Theorem 3.3.4. Lipschitz functions are weakly differentiable, that is, for any Lipschitz $f$ there exists a $g = (g_1, ..., g_n)$ which is the weak derivative of $f$, i.e., such that
\[ \int \eta g_i = -\int f D_i \eta \]
for all test functions $\eta$.
for each $W \subset\subset U$, then there is a subsequence $\{u_{k'}\} \subset \{u_k\}$ and a $BV_{loc}$ function $u$ such that $u_{k'} \to u$ in $L^1_{loc}(U)$ and
\[ \int_W \|Du\| \le \liminf \int_W \|Du_{k'}\| \qquad (3.17) \]
for all $W \subset\subset U$.
Remarks.
(i) The finiteness condition above is necessary in the sense that in order to get a
compactness result of this nature, we need the sequence of functions and their
derivatives to be bounded in every bounded domain. If we wanted to establish
$L^2_{loc}(U)$ convergence, we would need all second derivatives to be bounded too.
(ii) The inequality following the finiteness condition is essentially a kind of Fatou’s
lemma for first order derivatives.
Having got this result, the game we now play is the following: we want to construct a family of $BV_{loc}$ functions to locally represent our surface (for example, in the case where we are considering the Plateau problem). It then follows from this theorem that this family will have a convergent subsequence. So we want to do calculus of variations with $BV_{loc}$ functions. The difficulties with this will lie of course in the fact that we may have corners and other pathological behaviours on sets of measure zero.
and
\[ \int_{\mathbb{R}^m} u(x) J_m f(x)\, dL^m x = \int_{\mathbb{R}^n} \sum_{x \in f^{-1}(y)} u(x)\, dH^m y \]
Proof. I will follow [35]. Suppose rank Df = m. Let {si } be a countable dense set of
affine maps of Rm onto m-dimensional planes in Rn (recall an affine map is a linear
transformation followed by a translation). Suppose E is a piece of A such that for
each a ∈ E the affine functions f (a) + Df (a)(x − a) and si (x) are approximately
equal.
Then $\det s_i$ is approximately $J_m f$ on $E$, and $f$ is injective on $E$ since the affine maps $s_i$ are injective.
Since f is differentiable, we can locally approximate f by f (aj ) + Df (aj )(x − aj )
in some small neighbourhood of aj which we can call Ej for a dense subset {aj }
of A. Since the set {si } is dense there is some element of this set, call it sj after
possible relabelling, such that sj is arbitrarily close to the local approximation to f
in Ej . Then evidently these neighbourhoods cover A.
So for each piece $E_i$,
\[ \int_A J_n f\, dL^m = |\det(L)|\, H^m(A) \]
Key Constructions
since the map f is linear and so the stretch factor from A to f (A) will be given by
|det(L)|.
\[ = |\det(L)| \int_{P(A)} H^{m-n}(P^{-1}\{y\})\, dL^n y \]
But this is precisely the result we wanted. The general case for f nonlinear
comes through easily as a consequence.
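\[ \mathrm{Tan}(E, a) = \{r \in \mathbb{R} : r \ge 0\} \Big[ \bigcap_{\epsilon > 0} \mathrm{Closure}\Big\{ \frac{x - a}{|x - a|} : x \in E,\ 0 < |x - a| < \epsilon \Big\} \Big] \qquad (3.20) \]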
We can interpret this theorem as stating that any set E in Rn can be split
uniquely into a curvelike bit (a bit where calculus makes sense) and an uncurvelike
bit (where things get pathological). This follows from the closely related theorem:
Remark. Note that $\hat{A}$ and $\hat{B}$ are more or less the same as $A$ and $B$, respectively, up to sets of $H^n$ measure zero.
3.4.3 Currents
The main building blocks of GMT are the rectifiable currents, m-dimensional ori-
ented surfaces. They are oriented rectifiable sets with integer multiplicities, finite
area, and compact support. One presumes an orientation can be defined by associ-
ating a basis of m formlike objects to each tangent space in B. Then, if it is possible
to choose a nowhere vanishing m-formlike object across all of B we could say that
it is orientable.
Multiplicity is related to how one interprets integration over such a set B. The
way integration works is to use the standard measure µ for each point, then modify
(∂S)(φ) = S(dφ)
The aim of this section will be to prove a limited version of this theorem. The
main reason I will not state a proof of the theorem in its full generality is essentially
because the technical nature of it would disguise the main ideas underlying the
establishment of this result. Instead, I will in fact focus on providing a proof of the
compactness theorem for BV functions stated earlier. This has several advantages:
(i) The compactness theorem for BV functions is in fact a special case (the codi-
mension one case) of the above theorem, since a codimension one current can
be viewed locally as a graph of a BVloc function,
(ii) Although this theorem is less general, it is considerably easier to prove and is
a good model of the general case,
and
(iii) The theorem is readily extendible to the codimension k case.
I will follow this proof with a focus on proving its main implication for the
Plateau problem (and more general variational problems), namely that weak min-
imising solutions will exist to it.
The Compactness Theorem for Integral Currents
Proof. (lifted from J. Koliha's notes on analysis [34], Theorem 5.5.2). Suppose $A$ is bounded and equicontinuous. Then, for any fixed $\epsilon > 0$ there is a ball $B(a, \delta(a))$ for each $a \in S$ such that
Since $S$ is compact, by the Heine-Borel lemma there exist balls $B(x_k, \delta(x_k))$, $k = 1, ..., N$, for some finite $N$, that cover $S$.
Now define a mapping $P : A \to \mathbb{R}^N$ by setting $P(f) = (f(x_1), ..., f(x_N))$. I claim that $P(A)$ is bounded in $\mathbb{R}^N$. To see this, observe that since $A$ is bounded by hypothesis, there is a $c > 0$ such that
\[ \|f\| \le c \text{ for all } f \in A \]
Finally
\[ |f(x) - f(y)| \le |f(x) - f_k(x)| + |f_k(x) - f_k(y)| + |f_k(y) - f(y)| \le \epsilon + \epsilon + \epsilon = 3\epsilon \qquad (3.22) \]
This corollary is important in any application where one wants to show existence
of a convergent subsequence of a set of functions. Hence it has obvious applications
in geometric measure theory, and, more generally, in existence theory for PDE.
A more general version of the theorem also holds:
Remark. Note that the theorem I proved earlier was essentially for the case $Y = \mathbb{R}^n$, but extending the proof to the general case is not too hard.
I now remind the reader of a theorem I mentioned earlier when I was talking
about Sobolev spaces:
Theorem 3.5.5. (The Riesz Representation Theorem). For every bounded linear functional $F$ on a Hilbert space $H$, there is a uniquely determined element $f \in H$ such that $F(x) = (x, f)$ for all $x \in H$ and $\|F\| = \|f\|$.
This result allows one, in certain circumstances, to show certain notions are well
defined. It will have particular power in the next section, in which I shall make
certain comments on Radon measures.
Proof. The correspondence is essentially achieved via use of the Riesz representation theorem. Let $\langle \cdot, \cdot \rangle$ be the inner product on $H$. Then, for any $L$, there is a $\nu$, and vice versa, such that
\[ L(f) = \int_X \langle f, \nu \rangle\, d\mu. \]
Corollary 3.5.7. As a consequence of the above, we can identify the Radon measures on $X$ with the non-negative linear functionals on $K(X, \mathbb{R})$. By abuse of notation, define from now on $\mu : K(X, \mathbb{R}) \to \mathbb{R}$ as $\mu(f) = \int_X f\, d\mu$.
3.5.3 Mollification
Definition 10. A symmetric mollifier is a function $\phi \in C_c^\infty(\mathbb{R}^n)$, $\phi \ge 0$, with $\mathrm{support}(\phi) \subset B_1(0)$, such that $\int_{\mathbb{R}^n} \phi = 1$ and $\phi(x) = \phi(-x)$.
where $\frac{1}{p} + \frac{1}{q} = 1$.
An appropriate symmetric mollifier $\phi$ might well be $\phi(x) = c \exp\big( \frac{1}{|x|^2 - 1} \big)$ for $|x| < 1$ and zero otherwise, for a choice of a constant $c$ such that $\int \phi\, dx = 1$.
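The following one-dimensional sketch, offered only as an illustration, normalises this bump numerically and mollifies $u(x) = |x|$ via $u_h(x) = \int_{|z| \le 1} \phi(z) u(x - hz)\, dz$. The midpoint quadrature, the grid size and the function names are assumptions of mine; the constant $c$ is computed numerically rather than in closed form.

```python
import math

def bump(x):
    """Unnormalised bump exp(1 / (x^2 - 1)) on |x| < 1, zero otherwise."""
    return math.exp(1.0 / (x * x - 1.0)) if abs(x) < 1.0 else 0.0

def midpoint_integral(g, a, b, n=2000):
    """Composite midpoint rule on [a, b] with n cells."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

# Normalising constant c, chosen so that the (discrete) integral of phi is 1.
C = 1.0 / midpoint_integral(bump, -1.0, 1.0)

def phi(x):
    """Symmetric mollifier in one dimension."""
    return C * bump(x)

def mollify(u, x, h, n=2000):
    """u_h(x) = integral over |z| <= 1 of phi(z) u(x - h z) dz."""
    return midpoint_integral(lambda z: phi(z) * u(x - h * z), -1.0, 1.0, n)

for h in (0.2, 0.1, 0.05):
    # For u = |x| the largest error sits at the corner x = 0.
    print("h =", h, " u_h(0) - u(0) =", mollify(abs, 0.0, h))

away_from_corner = mollify(abs, 0.5, 0.1)
print("u_h(0.5) =", away_from_corner)
```

The error at the corner shrinks linearly in $h$, and away from the corner the mollified function agrees with $u$, which matches the uniform convergence one expects for continuous $u$.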
The following result is mentioned in Leon Simon’s book, and also in Gilbarg and
Trudinger’s text. It is a standard result about mollification. First, however, we will
need an auxiliary lemma, which is essentially lemma 7.1 in [26]:
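\[ u_h(x) = h^{-n} \int_{|x-y| \le h} \phi\Big( \frac{x - y}{h} \Big) u(y)\, dy = \int_{|z| \le 1} \phi(z) u(x - hz)\, dz \qquad (3.23) \]
\[ \sup_\Omega |u - u_h| \le \sup_{x \in \Omega} \int_{|z| \le 1} \phi(z) |u(x) - u(x - hz)|\, dz \]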
Proof. The following proof of the first part is lifted from Gilbarg and Trudinger [26], lemma 7.2.
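Now, by Hölder's inequality, we have that
\[ |u_h(x)|^p = \Big| \int_{|z| \le 1} \phi(z) u(x - hz)\, dz \Big|^p = \Big| \int_{|z| \le 1} \phi(z)^{(p-1)/p} \phi(z)^{1/p} u(x - hz)\, dz \Big|^p \]
\[ \le \Big( \int_{|z| \le 1} \phi(z)\, dz \Big)^{p-1} \int_{|z| \le 1} \phi(z) |u(x - hz)|^p\, dz \le \int_{|z| \le 1} \phi(z) |u(x - hz)|^p\, dz \qquad (3.25) \]
\[ \int_\Omega |u_h|^p\, dx \le \int_\Omega \int_{|z| \le 1} \phi(z) |u(x - hz)|^p\, dz\, dx = \int_{|z| \le 1} \phi(z) \int_\Omega |u(x - hz)|^p\, dx\, dz \le \int_{B_h(\Omega)} |u(x)|^p\, dx \qquad (3.26) \]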
where $B_h(\Omega) = \{x \in U : \mathrm{dist}(x, \Omega) < h\}$. The last inequality follows from Hölder's inequality and since $x - hz$ is in $B_h(\Omega)$ for all $h < \frac{1}{2} \mathrm{dist}(\Omega, \partial U)$ and $|z| \le 1$, provided $x \in \Omega$. Observe certainly $\Omega \subset B_h(\Omega)$.
Then it now follows that
(we may do this because $C^0(U)$ is dense in $L^p(U)$ for all $p \ge 1$). I now invoke our auxiliary lemma to find an $h$ such that
\[ \|w - w_h\|_{L^p(\Omega)} \le \epsilon \]
since by definition
\[ \int_W \|Du\| = -\sup_g \int_W u\, \mathrm{div}(g) \]
where $g$ is a bump function (i.e. no larger in modulus than one, smooth, and supported in $W$).
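But then
\[ \int_W f \|Du\| = -\sup_g \int_W u\, \mathrm{div}(fg) \le -\liminf_{\sigma \to 0} \int_W u\, \mathrm{div}(f g_\sigma) =: \liminf_{\sigma \to 0} \int_W f \|Du^\sigma\|. \]
So it remains to show that
\[ \limsup_{\sigma \to 0} \int f \|Du^\sigma\| \le \int f \|Du\| \]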
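with $g$ a bump function. Also note that if $g$ is fixed, and for $\sigma < \mathrm{dist}(\mathrm{spt}(f), \partial U)$, we have
\[ \int_U f g \cdot \mathrm{grad}(u^\sigma) = -\int u^\sigma\, \mathrm{div}(fg) = -\int (\phi_\sigma \star u)\, \mathrm{div}(fg) = -\int u\, (\phi_\sigma \star \mathrm{div}(fg)) = -\int u\, \mathrm{div}(\phi_\sigma \star fg) \qquad (3.28) \]
But by definition of $\|Du\|$, we have that the above is nothing other than $\le \int_{W_\sigma} (f + \epsilon(\sigma)) \|Du\|$. Here $\epsilon(\sigma) \to 0$ for $\sigma \to 0$, and $W = \mathrm{spt}(f)$, $W_\sigma = \{x \in U : \mathrm{dist}(x, W) < \sigma\}$. This is because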
Varifolds
Existence of Weak Solutions to the Plateau Problem. The above theorem guar-
antees that given a sequence of surfaces realised as graphs of BV functions there
will be a convergent subsequence. But certainly if we perform calculus of variations
then we will naturally get a sequence of smooth surfaces, with the limit satisfying
the minimal surface equation. But a sequence can only have one limit; hence this
limit must be the solution given to us via the BV function compactness theorem.
However we are not guaranteed that it will be smooth; we only know in general that
it will be the graph of some BV function. Showing that what we have is in some
sense smooth will be the focus of the next section.
There is in fact a generalisation of all this to varifolds, which are essentially
currents with a measure θ, sort of like metric-measure spaces in the sense of Gro-
mov, but more pathological. This generalisation, the compactness theorem for such
objects, is known as the Allard regularity theorem. However I will not mention this
here, since the statement and proof of it are very technical and little instructive
value would be gained from a study. More on varifolds in the next section.
3.6 Varifolds
3.6.1 Introduction
Note that I have so far only treated the compactness and regularity theorems for the
case of objects with integral density. In more general variational problems, where
we might have some sort of natural Lagrangian that varies from point to point
in our manifold, for instance, it is necessary to consider different, non-constant
densities or equivalently different measures. In this section I mention a few results
that equip us with the means to deal with these additional complexities. The most
important theorems are the approximation and deformation theorems, which lead
to the more general compactness theorem for rectifiable varifolds. The constancy
theorem is of considerable theoretical utility, and was used by Jon Pitts extensively
in his thesis. The boundary regularity theorem as stated here really only applies to
integral varifolds, but it is also true more generally.
Varifolds are essentially a generalisation of the concept of currents. The primary
reason we are interested in these things is in order to extend our existence and
regularity results to general variational problems; what we have developed so far
is limited in scope to geometric variational problems on sets of integer multiplicity,
which limits us to examining things like the Plateau Problem. In fact, in later parts
of this thesis, I will examine the problem of varying the Fisher Information of a
space, which is definitely not integer valued!
For a more precise definition, let $\Omega$ be a subset of $\mathbb{R}^n$. Then a general m-dimensional varifold on $\Omega$ is essentially a Radon measure on $\Omega \times G(n, m)$, where $G(n, m)$ is the set of all m-dimensional subspaces of n-dimensional Euclidean space. However, for our purposes, we will mainly be interested in rectifiable varifolds. Such objects may essentially be represented as a graph with multiplicity, i.e., as a pair $(V, f)$ where the mass of a set $U \subset V$ is defined as
\[ m(U) = \int_U f\, dL^m \]
Another way of writing this is to define a new measure (which is not necessarily positive definite) such that $d\mu = f\, dL^m$ and then
\[ m(U) = \int_U d\mu \]
A couple of applications
Theorem 3.6.2. Given an integral current $T \in D_m(\mathbb{R}^n)$ and $\epsilon > 0$, there exists an m-dimensional polyhedral chain $P$ in $\mathbb{R}^n$, with support within a distance $\epsilon$ of the support of $T$, and a $C^1$ diffeomorphism $f$ of $\mathbb{R}^n$ such that
\[ f_\# T = P + E \]
Remark. Note we have an error term, E, which this theorem states we have precise
control over (the theorem says we can make the mass of the error as small as we
like by choosing an appropriate f and P ). This error plays a similar role in related
regularity results that the tilt excess plays in the proof of the Allard Regularity
Theorem.
But for the purposes of this section I will avoid further discussion of this and
focus instead on a result due to Frankel and Lawson, and an existence result due
to Jon Pitts. There are of course the other obvious applications to establishing
regularity of solutions to other natural variational problems, and determining when
singularities are formed similar to soap films given particular data. From the exam-
ple above, such singularities may well exist, even if the integrand and geometry are
smooth.
Proof. Suppose $V^n$ and $W^n$ are embedded submanifolds, and that they do not intersect. Suppose $\gamma$ is a geodesic from $p_0 \in V$ to $q_0 \in W$ that realises the minimum distance $l$ from $V$ to $W$. Clearly it must strike $V$ and $W$ orthogonally, otherwise it can easily be shortened. Any unit vector $X^0$ tangent to $V$ at $p_0$ parallel translates along $\gamma$ to a vector field $X$ along $\gamma$ tangent to $W$ at $q_0$. This vector field gives rise to a variation of $\gamma$ allowing end points to vary.
Lemma 3.7.2. (Key Lemma). The first variation of arc length is 0, and the second variation of arc length is
\[ L''_X(0) = B_W(X) - B_V(X) - \int_0^l K(X, T)\, ds \]
where $T = \frac{\partial}{\partial t}$ and $X = \frac{\partial}{\partial \alpha}$ are coordinate vector fields defined on a ribbon $0 \le t \le l$, $-\epsilon \le \alpha \le \epsilon$. At $\alpha = 0$, $T$ is the unit tangent field to $\gamma$, and the curve $\alpha = 0$ corresponds to the geodesic $\gamma$ running from $p_0$ to $q_0$. The curve $t = 0$ is a curve in $V$; the curve $t = l$ is a curve in $W$. $B_V(X)$ (similarly $B_W(X)$) represents the second fundamental form of $V$ (respectively $W$) at $p_0$ (respectively $q_0$) evaluated on $X(p_0)$ (resp. $X(q_0)$).
hence
\[ L'(\alpha) = \int_0^l \frac{g(\nabla_X T, T)}{g(T, T)^{1/2}}\, dt \qquad (3.29) \]
Since $g(T, T) = 1$ along $\alpha = 0$ we get
\[ L'(0) = \int_0^l g(\nabla_X T, T)\, dt = \int_0^l g(\nabla_T X, T)\, dt = 0 \]
which becomes
\[ L''(0) = \int_0^l \nabla_X g(\nabla_T X, T)\, dt - \int_0^l g(\nabla_T X, T)^2\, dt \]
The second term vanishes since $\nabla_T X = 0$ ($X$ is parallel along $\gamma$). So we are left with
\[ L''(0) = \int_0^l g(\nabla_T \nabla_X X, T)\, dt + \int_0^l g(R(X, T)X, T)\, dt \]
It is clear that the first two terms are the second fundamental forms with respect to $W$ and $V$ respectively evaluated at $q_0$ and $p_0$. (Recall that the second fundamental form of an embedding, say $f : M \to \bar{M}$, is $B(X, Y) = \bar{\nabla}_{\bar{X}} \bar{Y} - \nabla_X Y$ where $\bar{X}$, $\bar{Y}$ are extensions of vector fields $X$ and $Y$ locally defined on $M$ to $\bar{M}$ and $\bar{\nabla}$ is the connection on $\bar{M}$. $\nabla$ is defined by
\[ \nabla_X Y = (\bar{\nabla}_{\bar{X}} \bar{Y})^T.) \]
Then we have that for a codimension one submanifold, say $V$, that $g(\bar{\nabla}_X X, T)|_V = g(\bar{\nabla}_X X - \nabla_X X, T)|_V = g(B(X, X), T)|_V$ which is $B_V(X)$ by definition.
Doing this for $n$ orthonormal vectors $X_1^0, ..., X_n^0$ spanning $T_{p_0} V$ and summing we get
\[ \sum_{\alpha=1}^n L''_{X_\alpha}(0) = \sum_{\alpha=1}^n B_W(X_\alpha) - \sum_{\alpha=1}^n B_V(X_\alpha) - \int_0^l \mathrm{Ric}(T)\, ds \]
It is one of my current projects to try to extend this result to the case of minimal
submanifolds which admit singularities. There are examples of such things in the
literature. We would like to study the situation where one has minimal hypercones
in an ambient space of positive Ricci curvature. Three cases arise:
(i) Where the point of closest approach is between two smooth points,
(ii) Where the point of closest approach is between a smooth point and a singular
point,
and perhaps the most interesting case,
(iii) Where the point of closest approach is between two singular points.
Remark. Note that this is not as general as the Frankel-Lawson result, since we
require both V and W to be compact.
Theorem 3.7.3. The conjecture is true for $V^n$ an immersed minimal submanifold and $W^n$ an immersed minimal current of $M^{n+1}$.
Proof. Suppose $V^n$ and $W^n$ are embedded submanifolds, and that they do not intersect. Suppose $\gamma$ is a geodesic from $p_0 \in V$ to $q_0 \in W$ that realises the minimum distance $l$ from $V$ to $W$. The case where both of $p_0$, $q_0$ are nonsingular has already been treated. So suppose $q_0$ is singular. Now clearly $\gamma$ must strike $V$ and $W$ orthogonally, otherwise it can easily be shortened.
Lemma 3.7.4. (Key Lemma). The first variation of arc length is 0. The second
variation of arc length is negative for some choice of piecewise smooth vector field.
Proof. We first establish that the tangent cone at the singular point $q_0$ is a tangent plane. This enables us to sensibly talk about vector fields about $q_0$. Let $\sigma$ be $\gamma$ parametrised backwards from $q_0$ to $p_0$. Let $x_n$ be a sequence of points converging to $q_0$ in $W$. Let $\alpha_n$ be the corresponding sequence of geodesic curves from $q_0$ to $x_n$. Let $\tau_n$ be the corresponding sequence of geodesic curves from $x_n$ to $p_0$. We proceed in two steps:
(i) We first prove that if the angle between $\alpha_n$ and $\sigma$ is acute, then the length of $\tau_n$, $L(\tau_n)$, is less than $L(\sigma)$ for sufficiently large $n$, hence establishing that such is an impossibility if $q_0$ is to be the point of closest approach.
(ii) Next, we prove using the Hopf maximum principle that the angle between $\alpha_n$ and $\sigma$ cannot be obtuse either.
Step 1. We first observe the following fact: For neighbourhoods sufficiently small
in any manifold, the metric can be written as
for points $p$ within a distance $\epsilon$ from the center of geodesic coordinates with the choice $\nabla_{\partial_i} \partial_j = 0$ at the origin of the coordinates.
So take $N$ sufficiently large s.t. for all $n \ge N$, $L(\alpha_n) < \epsilon$. Take a series of $\epsilon$-balls that cover the geodesic triangle $\tau_n$, $\alpha_n$, $\sigma$. Then there is an isometry up to order $\epsilon^2$ that maps this geodesic triangle onto a triangle in Euclidean 2-space.
Hence from now on we will assume we are working in Euclidean 2-space, and by abuse of notation will refer to the images of $\tau_n$, $\alpha_n$ and $\sigma$ by the same names. To control error, we will take $N_2 > N$ s.t. $L(\alpha_n) < \epsilon^2$ for all $n \ge N_2$.
Now, the angle between $-\sigma$ and $-\tau_n$ will become very small for $n$ large, and, by the first order Taylor series expansion for the tangent of this angle, we compute it to be roughly $\frac{L(\alpha_n)}{L(\sigma)}$ to order $\frac{L(\alpha_n)^2}{L(\sigma)^2}$.
Proceed in steps of length $\epsilon$ along both $-\tau_n$ and $-\sigma$ until we arrive in the last $\epsilon$-ball about $q_0$. Connect the finishing points $A$ and $D$ along $-\tau_n$ and $-\sigma$ respectively by a curve $\lambda$. This curve $\lambda$ will divide the triangle into an isosceles triangle and a rectangle. Through an easy computation, the identical interior angles within the rectangle where $\lambda$ meets $\sigma$ and $\tau_n$ are seen to be
\[ \phi_n = \pi/2 + \frac{L(\alpha_n)}{2L(\sigma)} + \theta\Big( \frac{L(\alpha_n)^2}{L(\sigma)^2} \Big) \]
Choose $N_3 \ge N_2$ s.t. for all $n \ge N_3$, $L(\alpha_n) < \frac{1}{L(\sigma)}$.
Then $\phi_n \le \pi/2 + L(\alpha_n)^2/2 + \theta(L(\alpha_n)^4)$, or $\phi_n \le \pi/2 + \theta(\epsilon^4)$.
Then clearly if one projects down onto the image of $\sigma$, one sees that the projection of the curve from $A$ to $x_n$ is of the same length to order $\epsilon^4$ and the projection of the curve from $x_n$ to $q_0$ is nonzero because the angle between $\alpha_n$ and $\sigma$ is acute. So, if $P$ is the projection,
Length(curve from $D$ to $q_0$) = Length(curve from $A$ to $x_n$) + Length($P$(curve from $x_n$ to $q_0$)) $+\, \theta(\epsilon^4)$
Since all other terms are of order $\epsilon^2$ the error term is negligible and it is easily seen that Length(curve from $D$ to $q_0$) > Length(curve from $A$ to $x_n$), from which it easily follows that $L(\tau_n) < L(\sigma)$.
Step 2. Suppose now that there is a vector v in the tangent cone at q0 that meets
σ at an obtuse angle. By Step 1, all of the tangent cone must be at least at right
angles to σ. Furthermore, by the Hopf maximum principle, all of the tangent cone
must meet σ at an obtuse angle if part of it does. But then this is a contradiction
to the minimality of W .
So the tangent cone is a plane.
What about the second variation? Consider the class of all sequences of smooth
points (xn , yn ) such that xn converges to p0 and yn converges to q0 . From before,
we know that there is a variation such that the distance between xn and yn can be
reduced, no matter how close they are to p0 , q0 respectively.
Since V , W are compact the variational vector fields Kn inducing a negative
second variation corresponding to (xn , yn ) converge to some vector field K0 corre-
sponding to (p0 , q0 ) that induces a variation in length less than or equal to 0. We
can talk about vector fields at q0 since the tangent cone at q0 was proven to be a
tangent plane before. If the variation is less than 0 we are done. If it is equal to 0
then I claim, if all other variations are greater than or equal to 0, this contradicts
the minimality of W (this can be seen by once again invoking the Hopf maximum
principle). Hence either all second order variations are 0 or there exists a variation
less than 0, and we are done.
So assume all second order variations are 0 about q0 . But this is impossible
because the ambient space is not Ricci flat.
The theorem now follows easily in an analogous manner to the smooth case.
Remark. The statement that variational vector fields inducing negative second vari-
ation in length on compact spaces corresponding to a convergent sequence of points
themselves converge to another variational vector field inducing nonpositive second
variation is not at all obvious, and requires some degree of proof.
Note: It may be possible to extend the argument of Frankel and Lawson to mini-
mal submanifolds that arise as limits of smooth Riemannian submanifolds, following
the approach of [20].
Chapter 4
Elliptic and Parabolic PDEs
developed from the analogous results for elliptic PDE, so I shall talk about these
jointly.
4.1.1 Introduction
An elliptic equation is an equation of the following general form:
where $a^{ij}\zeta_i\zeta_j > 0$ for every nonzero vector ζ. This is known as the ellipticity
condition.
Sometimes a stronger condition is required, the notion of uniform ellipticity:
\[ a^{ij}\zeta_i\zeta_j \ge C\|\zeta\|^2 \]
(L(a, b, c) − ∂t )φ = 0
Existence Theory for PDEs
Definition 11. We will say that L(a, g) is uniformly elliptic if $0 < \lambda\|\zeta\|^2 \le a^{ij}\zeta_i\zeta_j \le \Lambda\|\zeta\|^2$ for some constants λ, Λ.
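As an illustration of Definition 11, the following is a minimal sketch (not part of the original notes) for the two dimensional case. It assumes the coefficient matrix is symmetric, and the helper names `ellipticity_constants` and `is_uniformly_elliptic` are hypothetical. The best constants λ, Λ at a point are the smallest and largest eigenvalues of the matrix, and uniform ellipticity over a set of sample points asks that the smallest eigenvalue stay bounded away from zero.

```python
import math


def ellipticity_constants(a11, a12, a22):
    """Smallest and largest eigenvalue of [[a11, a12], [a12, a22]].

    These are the best constants in lam*|z|^2 <= a^{ij} z_i z_j <= Lam*|z|^2.
    """
    mean = 0.5 * (a11 + a22)
    radius = math.hypot(0.5 * (a11 - a22), a12)
    return mean - radius, mean + radius


def is_uniformly_elliptic(coeffs, points):
    """coeffs(x, y) -> (a11, a12, a22); the domain is only sampled at `points`."""
    pairs = [ellipticity_constants(*coeffs(x, y)) for x, y in points]
    lam = min(p[0] for p in pairs)
    Lam = max(p[1] for p in pairs)
    return lam > 0.0, lam, Lam


# The Laplacian has a^{ij} = delta^{ij}, so lam = Lam = 1.
laplacian = lambda x, y: (1.0, 0.0, 1.0)
# A Tricomi-type operator u_xx + y u_yy degenerates on the line y = 0.
degenerate = lambda x, y: (1.0, 0.0, y)
```

On a sampled domain this can only suggest, not prove, uniform ellipticity, since the infimum is taken over finitely many points.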
What I said above for elliptic PDEs is essentially a consequence of the following
result:
Theorem 4.1.1. ((Theorem 3.5, [26]), [28], ”The Strong Maximum Principle”).
Suppose L = L(a, g) is uniformly elliptic, c = 0, and that Lφ ≥ 0 (or ≤ 0) in
a domain U . Then if φ achieves its maximum (minimum) in the interior of U it
must be constant. Furthermore, if c ≤ 0 and c/λ is bounded, φ cannot achieve a
nonnegative maximum (nonpositive minimum) in the interior of U unless it is
constant.
Proof. (by contradiction). Suppose φ is non constant and achieves its maximum
M ≥ 0 in the interior of U . Then the set Ū on which φ < M is certainly a subset of
U and also ∂ Ū ∩ U is nonempty. Choose now a point x in Ū closer to the boundary
of Ū than the boundary of U ; ie closer to the maximum point than to the boundary
of the original domain. Construct the largest metric ball B in Ū with x as centre.
Then φ(y) = M for some y on ∂B, while also φ < M in B.
To conclude the proof, we will need
Lemma 4.1.2. (Lemma 3.4, [26]). If L is uniformly elliptic, c = 0 and Lφ ≥ 0 in
B, then if y ∈ ∂B and
(i) φ is continuous at y,
(iii) There exists a sphere within B such that the intersection of that sphere with
∂B contains y,
Proof. (of lemma). The proof of this lemma as given in Gilbarg and Trudinger is
not personally to my taste; it is primarily an analyst’s proof, rigorous, but unnec-
essarily technical and not terribly instructive. Essentially one defines an auxiliary
function for comparison purposes and uses that to facilitate the argument. For a
more intuitive or geometric idea of why we expect the above to be true, at least in
the case that c = 0, consider a geodesic path from the centre x of the sphere in B to
the point y on ∂B. Furthermore we know that Lφ is nonnegative on this geodesic,
and φ is increasing along it at y. However, the second piece of information alone
tells us nothing; the derivative could easily be zero at y.
Consider now the geodesic as the real line and φ as a polynomial. Since Lφ can
be thought of as the ”deformed acceleration” since it is elliptic, we must have that
at y, φ(x) be locally of the form x2k , k ∈ N after an appropriate choice of chart
since the acceleration is nonnegative. By the second piece of data, we must be in
the right branch of x2k . Since y is not at zero, the derivative is nonzero and positive.
A similar argument, albeit with charts more carefully chosen, also goes through
for a nonzero function c.
It then follows easily from the lemma and from before that Dφ(y) ≠ 0, but this
is impossible since y was supposed to be a maximum.
The maximum principle for parabolic equations is proved along
similar lines, though it requires some more care. The main difference is of course
the time dependence of the bound, or the "diffusion" of u over time.
More precisely, the fundamental or Green's function solution
to the equation Lf − ∂t f = δ(x)δ(t) satisfies the inequality
\[ f \le e^{ct} g \]
Lf − δ(t)Lg − ∂t f = 0
L̄h − ∂t f + ch = 0
∂t f ≥ c(f − δ(t)g)
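Proof. Let $v = \log u$. Then $\frac{\partial v}{\partial t} = \frac{1}{u}\Delta u$ and
\[ \Delta v = \nabla_k\Big(\frac{1}{u}\nabla_k u\Big) = \frac{1}{u}\Delta u - \frac{1}{u^2}\|\nabla u\|^2 \]
Hence
\[ \frac{\partial v}{\partial t} = \Delta v + \|\nabla v\|^2 \]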
Hyperbolic PDEs
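I show $\Delta v \ge -\frac{n}{2t}$.
\[ \begin{aligned} \frac{\partial}{\partial t}\nabla_k v &= \nabla_k(\Delta v + \|\nabla v\|^2) \\ &= \Delta\nabla_k v + 2\nabla_l v\,\nabla_k\nabla_l v \end{aligned} \tag{4.1} \]
\[ \begin{aligned} \frac{\partial}{\partial t}\Delta v &= \Delta\Delta v + 2\|\nabla^2 v\|^2 + 2\nabla_l v\,\nabla_l\Delta v \\ &\ge \Delta\Delta v + \frac{2}{n}(\Delta v)^2 + 2\nabla v\cdot\nabla\Delta v \end{aligned} \tag{4.2} \]
\[ \frac{\partial}{\partial t}\Delta v \ge \frac{2}{n}(\Delta v)^2 \]
\[ \Delta v \ge -\frac{n}{2t} \]
\[ \frac{\partial}{\partial t}(\log u) - \|\nabla\log u\|^2 = \frac{\Delta u}{u} - \Big\|\frac{\nabla u}{u}\Big\|^2 \ge -\frac{n}{2t} \]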
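The bound $\Delta\log u \ge -\frac{n}{2t}$ can be sanity-checked numerically. The sketch below is an illustration only, not part of the original argument: it works in one space dimension (n = 1), uses the standard heat kernel on the real line, and the helper names are hypothetical. For the kernel itself the bound is attained with equality, and for a sum of two kernels the inequality is strict away from infinity.

```python
import math


def heat_kernel(x, t):
    """Fundamental solution of u_t = u_xx on the real line."""
    return math.exp(-x * x / (4.0 * t)) / math.sqrt(4.0 * math.pi * t)


def two_bumps(x, t, a=1.0):
    """Another positive solution: two kernels centred at +a and -a."""
    return heat_kernel(x - a, t) + heat_kernel(x + a, t)


def laplacian_log(u, x, t, h=1e-3):
    """Central second difference of v = log u in the space variable."""
    v = lambda y: math.log(u(y, t))
    return (v(x + h) - 2.0 * v(x) + v(x - h)) / (h * h)


t = 0.5
bound = -1.0 / (2.0 * t)  # -n/(2t) with n = 1
equality_gap = laplacian_log(heat_kernel, 0.3, t) - bound
worst_case = min(laplacian_log(two_bumps, k / 10.0, t) for k in range(-20, 21))
```

Here `equality_gap` is zero up to rounding, while `worst_case` stays above `bound`.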
Rij = 0
Following this I make some attempt at assembling an existence theory for more
general hyperbolic equations, trying to extend Shatah and Struwe’s techniques. The
main motivation is to apply this to slightly blurred Physical manifolds, where one
has two coupled PDEs
(1 + ∆)Rij = 0
and
(∆ + ∆2 )Rij = 0
Ric(g) = 0
which is of the same form as the first equation. I shall now proceed to sketch
the key ideas in this theory.
by absorbing time into our coordinate system, i.e. x ↦ x̄, x = (x̄, t). This will
simplify the description of Shatah-Struwe theory considerably.
Given this simplifying assumption, it turns out that all we need can be obtained
from the seventh page of [50]. Let u : M → R , L = L(u, p) : T M → R be smooth
functions. Write L(u, Du) as the Lagrangian density of u, and consider the action
\[ A(u; Q) = \int_Q L(u, Du)\,dm \]
\[ \frac{d}{d\epsilon}A(u_\epsilon; Q)\Big|_{\epsilon=0} = \int_Q \big(L_u(u, Du) - \partial_i L_{p_i}(u, Du)\big)\phi\,dz \]
Then u is stationary with respect to A iff Lu (u, Du) = ∂i Lpi (u, Du).
Now, consider the Lagrangian $L(u, Du) = \frac12\|\nabla u\|^2 + F(u, Du)$. Then, after some
intermediate working, one obtains that the above equality holds only if
\[ \Delta u = \frac{d}{d\epsilon}F(u_\epsilon, Du_\epsilon)\Big|_{\epsilon=0} \]
Since F was arbitrary, the right hand side of this expression is arbitrary. Hence
we can reverse engineer an action for the original Shatah-Struwe equation. Via
geometric measure theory, since we have constructed an action, the solution will
exist and it will be smooth.
∆2 u = h(x, t, u, Du, D2 u, D3 u)
The idea here is to consider the same sort of thing as for the above, but instead
to look at the action
then by the same general nonsense as before, we find that the first variation is
zero iff
\[ \Delta^2 u = \frac{d}{d\epsilon}F(u_\epsilon, Du_\epsilon, D^2u_\epsilon, D^3u_\epsilon)\Big|_{\epsilon=0} \]
Then, since the right hand side is arbitrary, and via the argument once more
from geometric measure theory that a smooth action implies smooth solutions, we
are done.
Chapter 5
The following notes are a typeset version, with some minor additions, of a most
excellent course given by Ben Andrews of the ANU at the University of Queensland
in the winter of 2006. For further information on this topic, in particular the spe-
cialisation to examination of the Ricci Flow and its application to the classification
of 3-manifolds, the interested reader is invited to investigate the sources Cao-Zhu
[8] and Morgan-Tian [36]. John Morgan and Gang Tian have also quite recently
written [37], which is a followup article to [36]. Some existence theory of PDE may
be required to understand some of the material in these references; hence the reader
is invited to also keep a copy of [26] handy, since many of the methods introduced
therein are applicable to weakly parabolic PDEs, of which the Ricci Flow is an
example.
Even though the information here is not directly relevant to the rest of this
work it has some bearing on classification of physical manifolds under a certain
choice of signal function (see later section). Note also that the fact that geometric
evolution equations of heat type improve the Fisher information in and of itself is
of considerable interest in finding extrema of this quantity (again, see the section
on Fisher information later on). Geometric evolution equations are also extremely
useful in proving results in comparison geometry; for instance, one such result (due
to Hamilton) tells us that a closed 3-manifold admitting a metric of positive Ricci
curvature admits a metric of constant positive sectional curvature. However, my
primary motivation in preparing this section was self-reference, and consolidation
of my understanding of this most beautiful subject.
The Heat Flow Equation as the Prototypical Example
\[ u : \Omega\times[0, T) \to \mathbb{R}, \qquad \frac{\partial u}{\partial t}(p, t) = \Delta u(p, t) \tag{5.1} \]
To solve a heat flow equation, we also need
Geometric Evolution Equations
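\[ \frac{d}{dt}\int_\Omega u^2 = 2\int_\Omega u\Delta u = 2\int_{\partial\Omega} uD_nu - 2\int_\Omega\|Du\|^2 \le 0 \tag{5.2} \]
"Energy"
\[ \begin{aligned} \frac{d}{dt}\int_\Omega\|Du\|^2 &= 2\int_\Omega D_iu\,D_i\Delta u \\ &= 2\int_\Omega D_iu\,\Delta D_iu \\ &= 2\int_{\partial\Omega} D_nu\,\Delta u - 2\int_\Omega(\Delta u)^2 \le 0, \end{aligned} \tag{5.3} \]
\[ \frac{d}{dt}\int_\Omega u\log u = \int_\Omega(1 + \log u)\Delta u = -\int_\Omega\frac{\|\nabla u\|^2}{u} \le 0 \tag{5.4} \]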
”Fisher Information”
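Define the Fisher Information I(u) as
\[ I = \int_\Omega\frac{\|Du\|^2}{u} \tag{5.5} \]
\[ \begin{aligned} \frac{d}{dt}I &= \frac{d}{dt}\int_\Omega\frac{D_iuD_iu}{u} \\ &= 2\int_\Omega\frac{D_iuD_i\Delta u}{u} - \int_\Omega\frac{D_iuD_iu}{u^2}\Delta u \\ &= 2\int_{\partial\Omega}\frac{D_nu\Delta u}{u} - 2\int_\Omega D_i\Big(\frac{D_iu}{u}\Big)\Delta u - \int_\Omega\frac{D_iuD_iu}{u^2}\Delta u \\ &= 0 + \int_\Omega\frac{D_iuD_i\Delta u}{u} - \int_\Omega\frac{(\Delta u)^2}{u} \\ &= \frac12\frac{d}{dt}I - \int_\Omega\frac{(\Delta u)^2}{u} \end{aligned} \tag{5.6} \]
Hence
\[ \frac{d}{dt}I = -2\int_\Omega\frac{(\Delta u)^2}{u} \le 0, \]
if u is a positive function on Ω.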
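The three monotone quantities above (the $L^2$ norm, the entropy $\int u\log u$ and the Fisher information) can be watched decreasing in a small experiment. The following is only a sketch under stated assumptions: the domain is a periodic interval of length 2π (so the boundary terms vanish), the initial datum 1 + 0.5 cos x and the explicit finite-difference scheme are arbitrary choices, and the function names are hypothetical.

```python
import math


def heat_step(u, lam):
    """One explicit Euler step of u_t = u_xx on a periodic grid.

    lam = dt / h^2 must satisfy lam <= 1/2 for stability.
    """
    n = len(u)
    return [u[i] + lam * (u[(i + 1) % n] - 2.0 * u[i] + u[i - 1]) for i in range(n)]


def functionals(u, h):
    """Discrete L^2 norm squared, entropy and Fisher information of u."""
    n = len(u)
    l2 = h * sum(x * x for x in u)
    entropy = h * sum(x * math.log(x) for x in u)
    fisher = h * sum(
        ((u[(i + 1) % n] - u[i - 1]) / (2.0 * h)) ** 2 / u[i] for i in range(n)
    )
    return l2, entropy, fisher


def run(n=64, steps=200, lam=0.2):
    h = 2.0 * math.pi / n
    u = [1.0 + 0.5 * math.cos(i * h) for i in range(n)]  # positive initial datum
    history = [functionals(u, h)]
    for _ in range(steps):
        u = heat_step(u, lam)
        history.append(functionals(u, h))
    return history


history = run()
```

Each entry of `history` is a triple (L², entropy, Fisher information); all three columns are non-increasing along the flow for this datum.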
5.1.4 Smoothing
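e.g. Suppose $\|u_0(p, 0)\| \le C_0$ (then $\|u_0(p, t)\| \le C_0$ $\forall p$, $\forall t \ge 0$).
Then for each $k \in \mathbb{N}$ there exists a constant $C_k$ such that
\[ \|D^ku(p, t)\|^2 \le \frac{C_k^2}{t^k}C_0 \qquad \forall p \in \Omega,\ t > 0 \]
Note that
\[ D^ku = \sum_{\{\alpha_1,\dots,\alpha_n \,|\, \alpha_1+\dots+\alpha_n = k\}}\frac{\partial^ku}{(\partial x_1)^{\alpha_1}\cdots(\partial x_n)^{\alpha_n}} \]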
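\[ \frac{\partial}{\partial t}D^\alpha u = D^\alpha\Delta u = \Delta(D^\alpha u) \]
Hence
\[ \frac{\partial}{\partial t}\|D^ku\|^2 = \sum_\alpha\frac{\partial}{\partial t}\|D^\alpha u\|^2 = 2\sum_\alpha D^\alpha u\,\Delta(D^\alpha u) \tag{5.7} \]
Now
So we deduce
\[ \frac{\partial}{\partial t}\|D^ku\|^2 \le \Delta\|D^ku\|^2 - c(k, n)\|D^{k+1}u\|^2 \]
\[ \frac{\partial}{\partial t}\|D^{k+1}u\|^2 \le \Delta\|D^{k+1}u\|^2 \]
Define
\[ Q = \frac{\|D^{k+1}u\|^2}{2\hat{C}_k^2 - \|D^ku\|^2} \]
\[ \frac{\partial}{\partial t}(2\hat{C}_k^2 - \|D^ku\|^2) \ge \Delta(2\hat{C}_k^2 - \|D^ku\|^2) + c(k, n)\|D^{k+1}u\|^2 \tag{5.9} \]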
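We would like to compute a heat equation type inequality for Q. For this we
make use of the following lemma:
Lemma. If f and g satisfy f, g > 0 and
\[ \frac{\partial f}{\partial t} \le \Delta f + P \]
\[ \frac{\partial g}{\partial t} \ge \Delta g + R \]
then
\[ \frac{\partial}{\partial t}\Big(\frac{f}{g}\Big) \le \Delta\Big(\frac{f}{g}\Big) + 2\frac{\nabla g}{g}\cdot\nabla\Big(\frac{f}{g}\Big) + \frac{1}{g}\Big(P - \frac{f}{g}R\Big) \]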
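The lemma can be checked by a direct computation. The following is a sketch of that verification (it is not taken from the course notes and uses only the quotient rule together with f, g > 0):

```latex
\begin{align*}
\frac{\partial}{\partial t}\Big(\frac{f}{g}\Big)
  &= \frac{1}{g}\frac{\partial f}{\partial t} - \frac{f}{g^2}\frac{\partial g}{\partial t}
   \le \frac{\Delta f + P}{g} - \frac{f(\Delta g + R)}{g^2}, \\
\Delta\Big(\frac{f}{g}\Big)
  &= \frac{\Delta f}{g} - \frac{2\nabla f\cdot\nabla g}{g^2}
     - \frac{f\Delta g}{g^2} + \frac{2f\|\nabla g\|^2}{g^3}, \\
2\frac{\nabla g}{g}\cdot\nabla\Big(\frac{f}{g}\Big)
  &= \frac{2\nabla f\cdot\nabla g}{g^2} - \frac{2f\|\nabla g\|^2}{g^3}.
\end{align*}
```

Adding the last two lines gives $\frac{\Delta f}{g} - \frac{f\Delta g}{g^2}$, so the right hand side of the first line equals $\Delta(\frac{f}{g}) + 2\frac{\nabla g}{g}\cdot\nabla(\frac{f}{g}) + \frac{1}{g}(P - \frac{f}{g}R)$, which is the claim.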
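\[ \frac{\partial}{\partial t}Q \le \Delta Q + V\cdot DQ - Q\,\frac{c(k, n)\|D^{k+1}u\|^2}{2\hat{C}_k^2 - \|D^ku\|^2} \tag{5.10} \]
where
\[ V = 2\,\frac{D(2\hat{C}_k^2 - \|D^ku\|^2)}{2\hat{C}_k^2 - \|D^ku\|^2} \]
I claim that $Q(p, t) \le \frac{1}{c(k, n)t}$. From this our result would automatically follow.
So observe that by the first inequality we have that
\[ \begin{aligned} \frac{\partial}{\partial t}\Big(tQ - \frac1c\Big) &= Q + t\frac{\partial Q}{\partial t} \\ &\le \Delta\Big(tQ - \frac1c\Big) + V\cdot D\Big(tQ - \frac1c\Big) - cQ\Big(tQ - \frac1c\Big) \end{aligned} \]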
Now by the existence theory for parabolic PDE if one has an equation of the
form
\[ \frac{\partial}{\partial t}X \le \Delta X - b_jD_jX + \mu X \]
where $b_j$ and µ are smooth then we have a maximum principle that states that,
provided X(p, 0) ≤ 0 for all p ∈ Ω then X(p, t) ≤ 0, ∀p ∈ Ω, t ≥ 0. (For a proof of
this maximum principle, see the previous section on existence theory.)
Certainly $tQ - \frac1c < 0$ at t = 0, so the conditions of the maximum principle are
satisfied and our claim follows.
Hence
\[ \|D^{k+1}u\|^2 \le \frac{1}{ct}(2\hat{C}_k^2 - \|D^ku\|^2) \sim \frac{\hat{C}_k^2}{ct} \]
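\[ T = \frac{\partial X/\partial u}{\|\partial X/\partial u\|} \]
\[ N = R_{-\pi/2}T \]
where $R_\theta$ is rotation by θ.
Since $\|T\|^2 = 1$, $2\langle T, \frac{\partial T}{\partial s}\rangle = 0$.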
Define
The Curve Shortening Flow (CSF)
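\[ \frac{\partial T}{\partial s} = -\kappa N \]
\[ 0 = \Big\langle\frac{\partial N}{\partial s}, T\Big\rangle + \langle N, -\kappa N\rangle, \]
or
\[ \frac{\partial N}{\partial s} = \kappa T \]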
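To make the sign conventions concrete, here is a small finite-difference sketch (an illustration, not part of the notes; the function names are hypothetical). It computes T, N = R_{-π/2}T and κ from ∂T/∂s = −κN for a planar curve, and is tried on a counter-clockwise circle of radius 2, where with these conventions N is the outward normal and κ = 1/2.

```python
import math


def circle(u, r=2.0):
    """Counter-clockwise circle of radius r."""
    return (r * math.cos(u), r * math.sin(u))


def unit_tangent(curve, u, h):
    """Unit tangent X_u / |X_u| and speed |X_u| by central differences."""
    (x0, y0), (x1, y1) = curve(u - h), curve(u + h)
    dx, dy = (x1 - x0) / (2.0 * h), (y1 - y0) / (2.0 * h)
    speed = math.hypot(dx, dy)
    return (dx / speed, dy / speed), speed


def frenet_frame(curve, u, h=1e-4):
    """Return (T, N, kappa) at parameter u with N = R_{-pi/2} T, dT/ds = -kappa N."""
    (tx, ty), speed = unit_tangent(curve, u, h)
    nx, ny = ty, -tx  # rotation of T by -pi/2
    (ax, ay), _ = unit_tangent(curve, u - h, h)
    (bx, by), _ = unit_tangent(curve, u + h, h)
    # d/ds = (1/|X_u|) d/du
    dtds = ((bx - ax) / (2.0 * h * speed), (by - ay) / (2.0 * h * speed))
    kappa = -(dtds[0] * nx + dtds[1] * ny)
    return (tx, ty), (nx, ny), kappa


T_vec, N_vec, kappa = frenet_frame(circle, 0.7)
```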
Now I discuss the curve shortening flow. Consider once more a smooth immer-
sion X0 . We wish to find X : R/Z × [0, T ) → R2 such that
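\[ \frac{\partial X}{\partial t}(u, t) = \frac{\partial^2X}{\partial s^2}(u, t) \]
with the initial condition X(u, 0) = X0 (u). This specifies our curve shortening
flow.
Note. Since $\frac{\partial X}{\partial s} = T$ we have that $\frac{\partial^2X}{\partial s^2} = \frac{\partial T}{\partial s} = -\kappa N$, and hence that
\[ \frac{\partial X}{\partial t} = -\kappa N \]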
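A crude numerical experiment shows the flow in action. The sketch below is an assumption-laden illustration rather than a scheme from the notes: it discretises $X_t = X_{ss}$ on a closed polygon with an explicit Euler step and a second difference in arc length, and all helper names are hypothetical. For a round circle the exact solution is $r(t) = \sqrt{r_0^2 - 2t}$ (since $r' = -\kappa = -1/r$), which the polygon tracks closely.

```python
import math


def regular_polygon(r, n):
    return [(r * math.cos(2.0 * math.pi * i / n), r * math.sin(2.0 * math.pi * i / n))
            for i in range(n)]


def csf_step(points, dt):
    """One explicit Euler step of X_t = X_ss on a closed polygon."""
    n = len(points)
    new_points = []
    for i in range(n):
        (xm, ym), (x, y), (xp, yp) = points[i - 1], points[i], points[(i + 1) % n]
        lm = math.hypot(x - xm, y - ym)  # length of the incoming edge
        lp = math.hypot(xp - x, yp - y)  # length of the outgoing edge
        # second derivative with respect to arc length on a non-uniform grid
        xss = 2.0 * ((xp - x) / lp - (x - xm) / lm) / (lm + lp)
        yss = 2.0 * ((yp - y) / lp - (y - ym) / lm) / (lm + lp)
        new_points.append((x + dt * xss, y + dt * yss))
    return new_points


def flow(points, dt, steps):
    for _ in range(steps):
        points = csf_step(points, dt)
    return points


def mean_radius(points):
    return sum(math.hypot(x, y) for x, y in points) / len(points)


final = flow(regular_polygon(1.0, 40), 1e-4, 3000)  # total time t = 0.3
```

The explicit step is only stable for dt well below half the squared edge length, so it cannot be run up to the singular time without shrinking dt.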
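\[ \frac{\partial\rho}{\partial u} = 0 = \frac{\partial\rho}{\partial v} \]
\[ D^2\rho \ge 0 \]
\[ \frac{\partial\rho}{\partial u} = 2\langle X - Y, T_X\rangle \tag{5.11} \]
and
\[ \frac{\partial\rho}{\partial v} = -2\langle X - Y, T_Y\rangle \tag{5.12} \]
We also have
\[ \frac{\partial^2\rho}{\partial u^2} = 2\langle T_X, T_X\rangle + 2\langle X - Y, -\kappa_XN_X\rangle \tag{5.13} \]
\[ \frac{\partial^2\rho}{\partial v^2} = 2 - 2\langle X - Y, -\kappa_YN_Y\rangle \tag{5.14} \]
and
\[ \frac{\partial^2\rho}{\partial u\partial v} = -2\langle T_Y, T_X\rangle \tag{5.15} \]
If $\frac{\partial\rho}{\partial u} = 0 = \frac{\partial\rho}{\partial v}$ then we can choose signs so that $T_X = T_Y \perp X - Y$.
So
\[ \begin{aligned} \frac{\partial\rho}{\partial t} &= 2\langle X - Y, -\kappa_XN_X + \kappa_YN_Y\rangle \\ &= \frac{\partial^2\rho}{\partial u^2} + \frac{\partial^2\rho}{\partial v^2} - 2 - 2 \\ &= v\cdot D^2\rho\cdot v \ge 0 \end{aligned} \tag{5.16} \]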
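\[ \begin{aligned} \frac{\partial}{\partial t}\|X_u\| &= \frac{1}{\|X_u\|}\Big\langle X_u, \frac{\partial}{\partial t}\frac{\partial X}{\partial u}\Big\rangle \\ &= \Big\langle T, \frac{\partial}{\partial s}(-\kappa N)\Big\rangle\|X_u\| \\ &= -\kappa^2\|X_u\| \end{aligned} \tag{5.17} \]
Hence
\[ \|X_u(u, t)\| \le e^{C^2t}\|X_u(u, 0)\| \tag{5.18} \]
and
\[ \|X_u(u, t)\| \ge e^{-C^2t}\|X_u(u, 0)\| \tag{5.19} \]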
proving (i).
Proof of (ii):
I first claim that if the arc length between two points u and v is δ0 that
Consider $\rho(u, v, t) = \|X(u, t) - X(v, t)\|^2$ on {(u, v, t) | arc length from u to v at
time t ≥ δ0 }.
The claim is then that for any two points u, v on the curve at time zero a
distance δ0 or greater apart, $\rho(u, v, t) \ge D\delta_0^2$.
Well, the maximum principle tells us that
so the points remain separated if they are separated initially, proving (ii).
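\[ \frac{\partial X}{\partial t}(u, t) = V(u, t) \]
Then
\[ \frac{\partial}{\partial t}\|X_u\| = \Big\langle T, \frac{\partial}{\partial s}V\Big\rangle\|X_u\| \]
Hence
\[ \begin{aligned} \frac{d}{dt}L(t) &= \int_0^1\Big\langle T, \frac{\partial V}{\partial s}\Big\rangle ds \\ &= \int_0^1\frac{\partial}{\partial s}\langle T, V\rangle ds - \int_0^1\langle V, -\kappa N\rangle ds \end{aligned} \tag{5.20} \]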
The first term vanishes since the variation vanishes on the boundary of the curve,
or, if it is a closed curve, since the end points are the same.
Hence
\[ \frac{d}{dt}L(t) = -\int_0^1\langle V, -\kappa N\rangle ds \ge -\Big(\int_0^1\|V\|^2ds\Big)^{1/2}\Big(\int_0^1\kappa^2ds\Big)^{1/2} \]
\[ \frac{\partial X}{\partial t}(u, t) = V(u, t) = -\kappa(u, t)N(u, t), \]
so we recover CSF.
5.2.9 Existence
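Let $(x, y) = X : \mathbb{R}/\mathbb{Z}\times[0, T) \to \mathbb{R}^2$.
Then
\[ \frac{\partial X}{\partial u} = (x_u, y_u) \]
and
\[ T = \frac{(x_u, y_u)}{\sqrt{x_u^2 + y_u^2}} \]
Now
\[ \begin{aligned} -\kappa N = \frac{\partial T}{\partial s} &= \frac{1}{\sqrt{x_u^2 + y_u^2}}\frac{\partial}{\partial u}\Big(\frac{(x_u, y_u)}{\sqrt{x_u^2 + y_u^2}}\Big) \\ &= \frac{(x_{uu}, y_{uu})}{x_u^2 + y_u^2} - \frac{(x_u, y_u)}{(x_u^2 + y_u^2)^2}(x_ux_{uu} + y_uy_{uu}) \end{aligned} \tag{5.21} \]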
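\[ \langle A^{ij\alpha}_{\beta}\zeta_i\zeta_j\eta_\alpha, \eta_\beta\rangle \ge \delta_0\|\zeta\|^2\|\eta\|^2 \]
\[ \frac{\partial}{\partial t}X^\alpha = A^\alpha_\beta(X^\beta)_{uu} \]
\[ \begin{aligned} \frac{\partial}{\partial t}\hat{X}(u, t) &= \frac{\partial}{\partial t}X(\phi(u, t), t) + X_uV(\phi(u, t), t) \\ &= -\kappa_{\hat{X}}N_{\hat{X}}(u, t) + X_uV(\phi(u, t), t) \end{aligned} \tag{5.22} \]
Choose
\[ V = \frac{1}{(x_u^2 + y_u^2)}\begin{pmatrix} x_u & y_u \end{pmatrix}\begin{pmatrix} x_{uu} \\ y_{uu} \end{pmatrix} \]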
90
Geometric Evolution Equations
∂ x̂ ∂ 1 x̂uu
= X̂ = 2
∂t ŷ ∂t x̂u + ŷu2 ŷuu
∂
∂t
X = −κN
we note that
∂ ∂ ∂
Xu = (−κN ) = kXu k (−κN )
∂t ∂u ∂s
∂κ 2
= (− N − κ T )kXu k (5.23)
∂s
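Now
\[ \begin{aligned} \frac{\partial}{\partial t}\|X_u\| &= \frac{\partial}{\partial t}\sqrt{\langle X_u, X_u\rangle} \\ &= \frac{1}{\|X_u\|}\Big\langle X_u, \frac{\partial}{\partial t}X_u\Big\rangle \\ &= \Big\langle T, \frac{\partial}{\partial t}X_u\Big\rangle \\ &= -\kappa^2\|X_u\| \end{aligned} \tag{5.24} \]
\[ \begin{aligned} \frac{\partial}{\partial t}\frac{\partial}{\partial s}f &= \frac{\partial}{\partial t}(\|X_u\|^{-1}\partial_uf) \\ &= \frac{1}{\|X_u\|}\partial_u\partial_tf - \frac{1}{\|X_u\|^2}(-\kappa^2\|X_u\|)\partial_uf \\ &= \partial_s\partial_tf + \kappa^2\partial_sf \end{aligned} \tag{5.25} \]
In other words
\[ [\partial_t, \partial_s] = \kappa^2\partial_s \]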
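Then
\[ \begin{aligned} \frac{\partial}{\partial t}\Big(\frac{\partial}{\partial s}T\Big) &= \frac{\partial}{\partial s}\Big(\frac{\partial}{\partial t}T\Big) + \kappa^2\frac{\partial}{\partial s}T \\ &= \partial_s(-\kappa_sN) - \kappa^3N \\ &= -\kappa_{ss}N - \kappa_s\kappa T - \kappa^3N \end{aligned} \tag{5.26} \]
\[ \frac{\partial}{\partial t}\Big(\frac{\partial}{\partial s}T\Big) = \frac{\partial}{\partial t}(-\kappa N) = -\kappa_tN - \kappa\kappa_sT \]
So the terms $\kappa_s\kappa T$ cancel and we are left with the evolution equation for the
curvature for a solution of CSF:
\[ \frac{\partial}{\partial t}\kappa = \kappa_{ss} + \kappa^3 \tag{5.27} \]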
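A quick consistency check of (5.27) is available on the shrinking circle. This sketch is illustrative only and assumes the standard explicit solution $r(t) = \sqrt{r_0^2 - 2t}$ of CSF for a round circle, so that κ = 1/r is constant along the curve and $\kappa_{ss} = 0$; the function names are hypothetical.

```python
import math


def circle_curvature(t, r0=1.0):
    """Curvature of the circle of initial radius r0 after time t under CSF."""
    return 1.0 / math.sqrt(r0 * r0 - 2.0 * t)


def evolution_residual(t, r0=1.0, h=1e-6):
    """kappa_t - (kappa_ss + kappa^3), with kappa_ss = 0 on a round circle."""
    kappa_t = (circle_curvature(t + h, r0) - circle_curvature(t - h, r0)) / (2.0 * h)
    return kappa_t - circle_curvature(t, r0) ** 3


residual = evolution_residual(0.2)
```

The residual vanishes up to finite-difference error, and the curvature blows up as t approaches $r_0^2/2$, the extinction time of the circle.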
\[ \Big\|\frac{\partial^k}{\partial s^k}\kappa\Big\|^2 \]
for all k. I will prove this only for k = 1, but the core ideas remain the same.
The key is to use the evolution equation for the curvature we have just derived.
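So suppose $\kappa^2 < C^2$. I claim that we get a bound on $\kappa_s$.
Proof. First note that
\[ \begin{aligned} \frac{\partial}{\partial t}\kappa_s &= (\kappa_{ss} + \kappa^3)_s + \kappa^2\kappa_s \\ &= \kappa_{sss} + 4\kappa^2\kappa_s \end{aligned} \tag{5.28} \]
\[ \frac{\partial}{\partial t}\kappa_s^2 = (\kappa_s^2)_{ss} - 2\kappa_{ss}^2 + 8\kappa^2\kappa_s^2 \]
Now if we denote $g = \frac{\kappa_s^2}{C^2 - \kappa^2}$, we get that
\[ \begin{aligned} \frac{\partial}{\partial t}g &\le g_{ss} + ag_s + 8\kappa^2g - 2g^2 + \frac{2\kappa^4\kappa_s^2}{(C^2 - \kappa^2)^2} \quad\text{for some } a, \\ &\le g_{ss} + ag_s + Cg - 2g^2 \quad\text{for some new constant } C. \end{aligned} \tag{5.29} \]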
Then, as before, since initially g is a positive bounded function we get, via the
maximum principle, that
g(u, t) ≤ D for all u ∈ Ω, t ≥ 0, for some positive number D or, in other words,
that
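\[ \kappa_s^2 \le C^2D \]
\[ \frac{\partial}{\partial t}\|X_u\| = -\kappa^2\|X_u\| \]
i.e. $X_u$ is bounded.
For higher derivatives, note that
\[ \frac{\partial}{\partial t}\Big\langle\frac{\partial^kX}{\partial u^k}, T\Big\rangle = -\kappa^2\Big\langle\frac{\partial^kX}{\partial u^k}, T\Big\rangle + \Big(\text{lower order in } \Big\langle\frac{\partial^iX}{\partial u^i}\Big\rangle\Big)(\kappa, \kappa_s, \dots) \]
So, once again by the maximum principle and induction, $\|\langle\frac{\partial^kX}{\partial u^k}, T\rangle\|$ remains
bounded if it is bounded initially. It turns out that this is all we need to conclude:
Theorem 5.2.1. If X exists on $\mathbb{R}/\mathbb{Z}\times[0, T)$, then $\|\frac{\partial^kX}{\partial u^k}\|$ is bounded on $\mathbb{R}/\mathbb{Z}\times[0, T)$
and $\|\frac{\partial^k}{\partial u^k}\frac{\partial^l}{\partial t^l}X\|$ is bounded.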
The Arzela Ascoli Theorem now tells us that Xt (.) = X(., t) → XT (.), i.e. the
flow converges smoothly to a sensible limit at t = T provided we have bounds on
the curvature. We may then apply short time existence starting from XT to extend
the solution further.
To digress briefly, recall the statement of the Arzela-Ascoli theorem:
Theorem 5.2.2. (Ascoli-Arzela). Let X be a compact metric space, Y a metric
space. Then a subset F of C(X,Y) is compact in the compact-open topology if and
only if it is equicontinuous, pointwise relatively compact and closed.
Note that this proves the Global Existence Theorem which I stated earlier, since
this process can only stop when the curvature becomes unbounded.
π = Ld sin( πl2 )
Some Geometric Background on Hypersurfaces
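\[ M\cap U = g^{-1}(0) \]
\[ T_xM = \mathrm{im}(DX(p)) \]
\[ \Big\{\frac{\partial X}{\partial p^1}, \dots, \frac{\partial X}{\partial p^n}\Big\} \]
\[ N = \frac{\nabla g}{\|\nabla g\|} \perp TM \]
\[ g_{ij} = \frac{\partial X}{\partial p^i}\cdot\frac{\partial X}{\partial p^j} \]
\[ h_{ij} = -\frac{\partial^2X}{\partial p^i\partial p^j}\cdot N \]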
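The second fundamental form plays the same role for a hypersurface that
the curvature plays for a curve; it completely specifies how the surface $M^n$ embeds
in $\mathbb{R}^{n+1}$.
For example, take $M^n = S^n \subset \mathbb{R}^{n+1}$. Then $g = \|X\|^2 - 1$ (here g is the defining
function, not the metric), and N = X. Differentiating $X\cdot X = 1$ twice gives
$\partial_i\partial_jX\cdot X = -\partial_iX\cdot\partial_jX$, which implies that $h_{ij} = g_{ij}$.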
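The identity $h_{ij} = g_{ij}$ on the unit sphere is easy to confirm numerically. The sketch below is an illustration under stated assumptions: it uses spherical coordinates (θ, φ) on $S^2$, central finite differences, and hypothetical helper names; the definitions of $g_{ij}$ and $h_{ij}$ are the ones given above.

```python
import math


def sphere(theta, phi):
    """Spherical-coordinate parametrisation of the unit sphere S^2."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fundamental_forms(X, normal, p, h=1e-4):
    """g_ij = X_i . X_j and h_ij = -X_ij . N at p = (p1, p2), by central differences."""
    def shifted(i, s, j, r):
        q = list(p)
        q[i] += s
        q[j] += r
        return X(*q)

    first = []
    for i in range(2):
        plus, minus = shifted(i, h, i, 0.0), shifted(i, -h, i, 0.0)
        first.append(tuple((a - b) / (2.0 * h) for a, b in zip(plus, minus)))
    N = normal(*p)
    g = [[dot(first[i], first[j]) for j in range(2)] for i in range(2)]
    hh = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            pp, pm = shifted(i, h, j, h), shifted(i, h, j, -h)
            mp, mm = shifted(i, -h, j, h), shifted(i, -h, j, -h)
            second = tuple((a - b - c + d) / (4.0 * h * h)
                           for a, b, c, d in zip(pp, pm, mp, mm))
            hh[i][j] = -dot(second, N)
    return g, hh


# On the unit sphere the unit normal at X is X itself.
g_num, h_num = fundamental_forms(sphere, sphere, (1.0, 0.5))
```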
The Implicit Function Theorem implies that if X and Y are local parametri-
sations of M near x ∈ M then Y = X ◦ φ where φ : A ⊂ Rn → B ⊂ Rn is a
diffeomorphism. In particular,
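\[ \frac{\partial Y}{\partial q^i} = \frac{\partial}{\partial q^i}(X(\phi(q))) = \frac{\partial X}{\partial p^j}\frac{\partial\phi^j}{\partial q^i} \]
Define
\[ \Lambda^j_i = \frac{\partial\phi^j}{\partial q^i} \]
as the Jacobian of φ.
Then observe that
\[ \begin{aligned} \frac{\partial^2Y}{\partial q^i\partial q^j} &= \frac{\partial}{\partial q^i}\Big(\frac{\partial X}{\partial p^k}\frac{\partial\phi^k}{\partial q^j}\Big) \\ &= \frac{\partial^2X}{\partial p^k\partial p^l}\frac{\partial\phi^k}{\partial q^j}\frac{\partial\phi^l}{\partial q^i} + \frac{\partial X}{\partial p^k}\frac{\partial^2\phi^k}{\partial q^j\partial q^i} \end{aligned} \tag{5.31} \]
Now by definition
\[ \begin{aligned} h^Y_{ij} &= -\frac{\partial^2Y}{\partial q^i\partial q^j}\cdot N \\ &= -\frac{\partial^2X}{\partial p^k\partial p^l}\cdot N\,\Lambda^k_i\Lambda^l_j \\ &= h^X_{kl}\Lambda^k_i\Lambda^l_j \end{aligned} \tag{5.32} \]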
proving that the second fundamental form also transforms like a tensor!
The second fundamental form has a natural interpretation. Suppose v ∈ Tx M .
Then h(v, v) is the curvature of the curve M ∩ Π in the plane Π, where Π =
span{v, N }.
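\[ \begin{aligned} V(WX) &= D_V\Big(W^i\frac{\partial X}{\partial p^i}\Big) \\ &= V^j\frac{\partial}{\partial p^j}\Big(W^i\frac{\partial X}{\partial p^i}\Big) \\ &= V^j\frac{\partial W^i}{\partial p^j}\frac{\partial X}{\partial p^i} + V^jW^i\frac{\partial^2X}{\partial p^j\partial p^i} \\ &= \Big(V^j\frac{\partial W^i}{\partial p^j}\frac{\partial X}{\partial p^i} + V^jW^i\Big(\frac{\partial^2X}{\partial p^i\partial p^j}\Big)^{tang}\Big) + V^jW^i\Big(\frac{\partial^2X}{\partial p^i\partial p^j}\cdot N\Big)N \end{aligned} \tag{5.33} \]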
Take f = X. Then
0 = ∂i ∂j (∂k X) − ∂j ∂i (∂k X)
= ∂i ((∇∂j ∂k )X − hjk N ) − (i ↔ j)
= (∇∂i (∇∂j ∂k ))X − h(∂i , ∇∂j ∂k )N − (∇i hjk + h(∇∂i ∂j , ∂k )
+ h(∂j , ∇∂i ∂k ))N − hjk hip g pq ∂q X − (i ↔ j)
= (∇∂i (∇∂j ∂k ) − ∇∂j (∇∂i ∂k )
− (hjk hip − hik hjp )g pq ∂q )X + (∇∂j hik − ∇∂i hjk )N (5.34)
g(ei , ej ) = δij
then
h(ei , ej ) = κi δij
Mean Curvature Flow (MCF)
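\[ H = \sum_{i=1}^n\kappa_i = g^{ij}h_{ij} \]
\[ K = \prod_{i=1}^n\kappa_i = \frac{\det(h)}{\det(g)} \]
\[ X : M_0\times[0, T) \to \mathbb{R}^{n+1} \]
where $\frac{\partial X}{\partial t} = V = \omega X + fN$.
Recall $g_{ij} = \frac{\partial X}{\partial p^i}\cdot\frac{\partial X}{\partial p^j}$.
Now
\[ \begin{aligned} \frac{\partial}{\partial t}\frac{\partial X}{\partial p^i} &= \frac{\partial}{\partial p^i}(\omega X + fN) \\ &= (\nabla_{\partial_i}\omega)X - h(\partial_i, \omega)N + \frac{\partial f}{\partial p^i}N + fh^p_i\partial_pX \end{aligned} \tag{5.37} \]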
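\[ \begin{aligned} \frac{\partial}{\partial t}g_{ij} &= \frac{\partial}{\partial t}\frac{\partial X}{\partial p^i}\cdot\frac{\partial X}{\partial p^j} + (i \leftrightarrow j) \\ &= g(\nabla_{\partial_i}\omega, \partial_j) + fh^p_ig_{pj} + (i \leftrightarrow j) \\ &= g(\nabla_{\partial_i}\omega, \partial_j) + g(\partial_i, \nabla_{\partial_j}\omega) + 2fh_{ij} \end{aligned} \tag{5.38} \]
Hence
\[ \begin{aligned} \frac{\partial}{\partial t}\det(g) &= \det(g)g^{ij}\frac{\partial}{\partial t}g_{ij} \\ &= 2\det(g)(\nabla_{\partial_i}\omega^i) + 2\det(g)fH \end{aligned} \tag{5.39} \]
So
\[ \frac{\partial}{\partial t}\sqrt{\det(g)} = [(\nabla_{\partial_i}\omega^i) + fH]\sqrt{\det(g)} \tag{5.40} \]
So we may finally conclude that
\[ \begin{aligned} \frac{d}{dt}|M| &= \int_M(\mathrm{div}(\omega) + fH)\,d\mu(g) \\ &= \int_MfH\,d\mu(g) \quad\text{by the divergence theorem} \end{aligned} \tag{5.41} \]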
\[ \frac{\partial X}{\partial t} = -kHN \]
\[ \frac{\partial X}{\partial t} = \Delta X \]
\[ \Delta X = g^{ij}(-h_{ij}N) = -HN \]
and
X(z, 0) = z
for all z ∈ M0 .
The solution to this initial value problem if it exists is the MCF of M0 .
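(The bracketed term is the normal component of $\frac{\partial^2X}{\partial p^i\partial p^j}$.)
Let $\dot\nabla$ be the covariant derivative at t = 0.
Then
\[ \frac{\partial X}{\partial t} = g^{ij}\big(\dot\nabla_i\dot\nabla_jX + (\dot\nabla_i\partial_j - \nabla_i\partial_j)X\big) \]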
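\[ \begin{aligned} \frac{\partial}{\partial t}g_{ij} &= \frac{\partial}{\partial t}\Big(\frac{\partial X}{\partial p^i}\cdot\frac{\partial X}{\partial p^j}\Big) \\ &= -2Hh_{ij} \end{aligned} \tag{5.42} \]
Evolution of Normal.
Define
\[ \frac{\partial}{\partial t}N = A^i\partial_iX \]
We compute the $A^i$.
Now certainly
\[ 0 = N\cdot T = N\cdot\frac{\partial X}{\partial p^i} \]
So
\[ 0 = \frac{\partial N}{\partial t}\cdot\frac{\partial X}{\partial p^i} + N\cdot\frac{\partial}{\partial p^i}(-HN) \]
The first term is nothing other than $A^jg_{ji}$. The second term is clearly $-\frac{\partial H}{\partial p^i}$. So
we conclude
\[ A^jg_{ji} = \frac{\partial H}{\partial p^i} \]
From which we derive finally the evolution equation for the normal to the hypersurface:
\[ \begin{aligned} \frac{\partial N}{\partial t} &= \frac{\partial H}{\partial p^i}g^{ij}\partial_jX \\ &= (\nabla H)X \end{aligned} \tag{5.43} \]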
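\[ \begin{aligned} \partial_t\partial_iN &= \partial_t(h^k_i\partial_kX) \\ &= \Big(\frac{\partial}{\partial t}h^k_i\Big)\partial_kX + h^k_i\partial_k(-HN) \\ &= \Big(\frac{\partial}{\partial t}h^k_i\Big)\partial_kX - Hh^k_ih^p_k\partial_pX \end{aligned} \tag{5.44} \]
But also
\[ \begin{aligned} \partial_t\partial_iN &= \partial_i(\nabla_kH\,g^{kl}\partial_lX) \\ &= (\nabla_i\nabla_kH\,g^{kl}\partial_lX) - h(\nabla H, \partial_i)N \end{aligned} \tag{5.45} \]
\[ \frac{\partial}{\partial t}h^j_i = \nabla_i\nabla_k(H)g^{kj} + Hh^k_ih^j_k \tag{5.46} \]
which is the evolution equation for h.
Consequently
\[ \frac{\partial}{\partial t}H = \Delta H + H\|h\|^2, \]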
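\[ X(p, t) \to x_0 \in \mathbb{R}^{n+1} \quad\text{as } t \to T \]
and
\[ \frac{X(p, t) - x_0}{(2n(T - t))^{1/2}} \to_{C^\infty} X_T \]
where $X_T(M_0) = S^n$.
Remark. Note that if $M_0 = r_0z$, where $z \in S^n$, then X(z, t) = r(t)z and N (z, t) = z.
So
\[ h^j_i\partial_jX = \partial_iN = \partial_iz = \partial_i\Big(\frac{X}{r}\Big) = \frac{1}{r}\partial_iX \]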
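For the round sphere the remark gives $h^j_i = \frac{1}{r}\delta^j_i$, hence H = n/r, and the flow $\partial_tX = -HN$ reduces to the ODE $r' = -n/r$. The sketch below is an illustration only, with hypothetical function names: it integrates this ODE with a classical Runge-Kutta step and confirms that the rescaled radius $r(t)/(2n(T - t))^{1/2}$ equals 1, where the extinction time $T = r_0^2/(2n)$ is an assumption read off from the exact solution $r(t) = \sqrt{r_0^2 - 2nt}$.

```python
import math


def shrink_sphere(r0, n, t, steps=2000):
    """Integrate dr/dt = -n/r (MCF of a round n-sphere) with classical RK4."""
    f = lambda r: -n / r
    dt = t / steps
    r = r0
    for _ in range(steps):
        k1 = f(r)
        k2 = f(r + 0.5 * dt * k1)
        k3 = f(r + 0.5 * dt * k2)
        k4 = f(r + dt * k3)
        r += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return r


def rescaled_radius(r0, n, t):
    """Radius after rescaling by (2n(T - t))^{1/2}, with T = r0^2 / (2n)."""
    T = r0 * r0 / (2.0 * n)
    return shrink_sphere(r0, n, t) / math.sqrt(2.0 * n * (T - t))


ratio = rescaled_radius(1.0, 2, 0.2)  # unit 2-sphere, extinction time T = 0.25
```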
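\[ \begin{aligned} \nabla_i\nabla_jH &= \nabla_i\nabla_j(g^{kl}h_{kl}) \\ &= g^{kl}\nabla_i\nabla_jh_{kl} \quad(\text{since } \nabla_ig_{jk} = 0) \\ &= g^{kl}\nabla_i\nabla_kh_{jl} \quad(\text{by Codazzi}) \end{aligned} \tag{5.47} \]
\[ \begin{aligned} 0 &= (\nabla_i\nabla_k - \nabla_k\nabla_i)h(\partial_j, \partial_l) \\ &= \nabla_i(\nabla_kh_{jl} + h(\nabla_k\partial_j, \partial_l) + h(\partial_j, \nabla_k\partial_l)) - (i \leftrightarrow k) \\ &= \nabla_i\nabla_kh_{jl} + \nabla h(\nabla_k\partial_j, \partial_l) + h(\nabla\partial, \nabla\partial) \\ &\quad + h(\nabla_i\nabla_k\partial_j, \partial_l) + h(\partial_j, \nabla_i\nabla_k\partial_l) - (i \leftrightarrow k) \end{aligned} \tag{5.48} \]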
Note that the second and third terms can be made to vanish if we choose coor-
dinates so that ∇k ∂j (x) = 0.
Continuing under this choice of coordinates, we get that
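Hence
\[ [\nabla_i, \nabla_k]h_{jl} = h_{ij}h^p_kh_{pl} - h_{kj}h^p_ih_{pl} + h_{il}h^p_kh_{jp} - h_{kl}h^p_ih_{jp} \]
Which implies after using the Codazzi equation and Simons' identity that
\[ \begin{aligned} \frac{\partial}{\partial t}h^j_i &= (\nabla_i\nabla_kH + Hh^p_ih_{pk})g^{kj} \\ &= \Delta h^j_i + h^j_i\|h\|^2 \end{aligned} \tag{5.49} \]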
Note that this is the same evolution equation for h as we derived before, except
that we did not make the assumption that M was a hypersurface, ie this equation
is more generally applicable.
for t0 ≥ t.
Remarks.
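(1) Note that the fundamental solution of the heat equation on $\mathbb{R}^{n+1}$ is $\rho(x, t) = ct^{-\frac{n+1}{2}}\exp(-\frac{\|x\|^2}{4t})$.
(2) Furthermore, if u satisfies $\frac{\partial u}{\partial t} = \Delta u$ in $\mathbb{R}^n$ then
\[ \begin{aligned} u(x_0, t_0) &= c_n(t_0)^{-n/2}\int\exp\Big(-\frac{\|x - x_0\|^2}{4t_0}\Big)u(x, 0)\,dx^n \quad(\text{as can be proved using Green's functions}) \\ &= c_n(t_0 - t)^{-n/2}\int\exp\Big(-\frac{\|x - x_0\|^2}{4(t_0 - t)}\Big)u(x, t)\,dx^n \quad\text{for } 0 \le t \le t_0 \end{aligned} \tag{5.50} \]
Hence
\[ \frac{d}{dt}\int(t_0 - t)^{-n/2}\exp\Big(-\frac{\|x - x_0\|^2}{4(t_0 - t)}\Big)u(x, t)\,dx^n = 0 \tag{5.51} \]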
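Now, consider solitons of MCF, i.e. $M_t = \lambda(t)M_0$. So we have $X(p, t) = \lambda(t)X(\phi(p, t), 0)$,
which implies $-H\mu|_{(p,t)} = \frac{\lambda'}{\lambda}X|_{(p,t)} + \lambda\frac{\partial X}{\partial x^j}\Big|_{(\phi(p,t),0)}\frac{\partial\phi^j}{\partial t}$.
\[ \frac{\partial X}{\partial t} = f\mu \]
So $\frac{\partial}{\partial t}g_{ij} = 2fh_{ij}$, and hence $\frac{\partial}{\partial t}(d\mu) = fH(d\mu)$.
Therefore,
\[ \frac{\partial}{\partial t}\|X\|^2 = 2\Big\langle X, \frac{\partial X}{\partial t}\Big\rangle = 2f\langle X, \mu\rangle \]
so
\[ \frac{\partial}{\partial t}\Big(\exp\Big(-\frac{c}{2}\|X\|^2\Big)d\mu\Big) = f(H - c\langle x, \mu\rangle)\exp\Big(-\frac{c}{2}\|X\|^2\Big)d\mu \]
Choose $c = \frac{1}{2(T - t)}$.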
Then we have from the relation immediately above, that a MCF soliton is a
critical point of $\int_{M_t}\exp(-\frac{\|X_t\|^2}{4(T - t)})\,d\mu$, where $M_t = \lambda(t)M_0$. We immediately see the
connection with the monotonicity formula, because, after reparametrisation, we see
that a MCF soliton is a critical point of $\int_{M_0}(t_0 - t)^{-n/2}\exp(-\frac{\|X_0\|^2}{4(T - t)})\,d\mu$.
We recall the monotonicity formula for the MCF, as stated above:
\[ \frac{d}{dt}\int_M(t_0 - t)^{-n/2}\exp\Big(-\frac{\|x - x_0\|^2}{4(t_0 - t)}\Big)d\mu \le 0. \]
Now I claim that we achieve equality if and only if Mt is a MCF soliton. Certainly
if Mt is a MCF soliton, this is the case. So it remains to prove the converse. In
particular, I will in the process prove the monotonicity formula.
(To prove that we can always choose an appropriate c, we compute
\[ -\frac{1}{\lambda}H\mu|_0 = -H\mu|_{M_t} = \lambda'X_0 = \frac{\lambda'}{c}H\mu|_0 + \text{something tangential} \]
This implies that $\lambda\lambda'|_t = \lambda_0\lambda_0' = -c$ and so $\lambda^2(t) = 1 - 2ct = 2c(T - t)$, from
which we intuit $\frac{\lambda'}{\lambda} = \frac12\frac{(\lambda^2)'}{\lambda^2} = \frac12(\log(2c(T - t)))' = -\frac{1}{2(T - t)}$, as required.)
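To prove the converse, we adopt the notation $\rho : \mathbb{R}^{n+1}\times[0, \infty) \to \mathbb{R}$ for the map
\[ \rho(x, t) = (t_0 - t)^{-\frac{n+1}{2}}\exp\Big(-\frac{\|x - x_0\|^2}{4(t_0 - t)}\Big) \]
Note that $\frac{\partial\rho}{\partial t} = -\Delta^{\mathbb{R}^{n+1}}\rho$.
We compute
\[ \begin{aligned} \frac{d}{dt}\int(t_0 - t)^{1/2}\rho(X(p, t), t)\,d\mu = &-\frac12\int(t_0 - t)^{-1/2}\rho\,d\mu - \int(t_0 - t)^{1/2}\rho H^2\,d\mu \\ &+ \int(t_0 - t)^{1/2}\big(-\Delta^{\mathbb{R}^{n+1}}\rho + D\rho(-H\mu)\big)\,d\mu \end{aligned} \]
\[ \Delta^{\mathbb{R}^{n+1}}\rho = D^2\rho(\mu, \mu) + \frac{\partial^2\rho}{(\partial x^i)^2} \]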
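Also,
\[ \begin{aligned} \nabla\nabla\rho(e_i, e_i) &= \frac{d^2}{ds^2}\rho(\gamma(s)) \\ &= \frac{d}{ds}(D\rho(\gamma')) \\ &= D\rho(\gamma'') + D^2\rho(\gamma', \gamma') \end{aligned} \tag{5.52} \]
and hence
\[ \Delta^{\mathbb{R}^{n+1}}\rho = D^2\rho(\mu, \mu) + \Delta^M\rho + HD_\mu\rho \]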
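So
\[ \begin{aligned} \frac{d}{dt}\int(t_0 - t)^{1/2}\rho\,d\mu = &-\int(t_0 - t)^{1/2}\Big[\frac{\rho}{2(t_0 - t)} + D_\mu D_\mu\rho\Big] \\ &- \int(t_0 - t)^{1/2}\rho\Big[H^2 - \frac{2HD_\mu\rho}{\rho} + \Big\|\frac{D_\mu\rho}{\rho}\Big\|^2\Big] + \int(t_0 - t)^{1/2}\frac{(D_\mu\rho)^2}{\rho} \\ = &-\int(t_0 - t)^{1/2}\rho\Big\|H - \frac{D_\mu\rho}{\rho}\Big\|^2d\mu \\ &- \int(t_0 - t)^{1/2}\Big[D_\mu D_\mu\rho - \frac{(D_\mu\rho)^2}{\rho} + \frac{\rho}{2(t_0 - t)}\Big] \end{aligned} \tag{5.53} \]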
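\[ D_iD_j\rho - \frac{D_i\rho D_j\rho}{\rho} + \frac{\rho}{2(t_0 - t)}\delta_{ij} = 0 \]
(2) For any solution u of $\frac{\partial u}{\partial t} = \Delta u$ on $\mathbb{R}^n$ with u ≥ 0,
\[ D_iD_ju - \frac{D_iuD_ju}{u} + \frac{u\delta_{ij}}{2t} \ge 0 \]
(This occurs if and only if $D_iD_j\log u + \frac{\delta_{ij}}{2t} \ge 0$).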
Remark. It follows from this claim that a (smooth) limit of rescaled flows about
any point is a shrinking MCF soliton.
Proof of Claim. Recall if µ is large,
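\[ \frac{\partial}{\partial t}\mu = \Delta\mu + \|\nabla\mu\|^2 \]
Hence
\[ \frac{\partial}{\partial t}D_i\mu = \Delta D_i\mu + 2D_k\mu\,D_kD_i\mu \]
and
\[ \frac{\partial}{\partial t}D_iD_j\mu = \Delta D_iD_j\mu + 2D_k\mu\,D_k(D_iD_j\mu) + 2D_jD_k\mu\,D_kD_i\mu \]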
One can then apply the maximum principle to show that tDi Dj µ + δij ≥ 0,
since this is certainly true when t = 0. This proves the claim, and the monotonicity
formula, since we have that
\[ \frac{d}{dt}\int_{M_t}(t_0 - t)^{1/2}\rho\,d\mu \le 0 \]
by the claim. Furthermore, equality is only achieved if
\[ H = \frac{D_\mu\rho}{\rho} = D_\mu\Big(-\frac{\|X - x_0\|^2}{4(t_0 - t)} + c(t)\Big) = -\frac{\langle X - x_0, \mu\rangle}{2(t_0 - t)} \]
but this is just the MCF soliton equation. In other words, this
implies that a smooth limit of rescaled flows about any point is a shrinking MCF
soliton, as required.
We have the following theorem, due to Huisken, which goes some way towards
classifying these beasts:
Theorem. A complete, shrinking soliton of MCF with H > 0 is either
(ii) S k × Rn−k ,
is preserved. To show this I shall invoke a Maximum Principle for Qij . I will state
a Maximum Principle for vectors first, and then show how this can be extended to
deal with 2-tensors. The preservation of the inequality will follow as a consequence
of this principle.
Proof of the Huisken rescaling result of MCF
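\[ \frac{\partial}{\partial t}f(u) = \frac{\partial f}{\partial u^\alpha}(\Delta u^\alpha + V^\alpha(u)) \tag{5.54} \]
\[ \begin{aligned} \Delta f &= g^{kl}\nabla_k\Big(\frac{\partial f}{\partial u^\alpha}\nabla_lu^\alpha\Big) \\ &= \frac{\partial f}{\partial u^\alpha}\Delta u^\alpha + g^{kl}\frac{\partial^2f}{\partial u^\alpha\partial u^\beta}\nabla_ku^\alpha\nabla_lu^\beta \end{aligned} \tag{5.55} \]
Hence
\[ \begin{aligned} \frac{\partial}{\partial t}f &= \Delta f - g^{kl}D^2f(\nabla_ku, \nabla_lu) + Df\cdot V \quad(\text{Note that } f > 0). \\ &\le \Delta f + Df\cdot V \quad(\text{since } D^2f \text{ is positive as } f \text{ is convex}) \end{aligned} \tag{5.56} \]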
Now, suppose we look at u ∈ Rn − ∂K, such that u0 is the closest point, and
w = ∇f (u0 ) with u = u0 + tw. Note that w · V (u0 ) < 0. Then
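\[ \begin{aligned} Df\cdot V &= \frac{\partial f}{\partial u^\alpha}V^\alpha \\ &= \frac{\partial f}{\partial u^\alpha}\Big|_{u_0 + tw}(V^\alpha(u_0)) + \frac{\partial f}{\partial u^\alpha}\Big|_{u_0 + tw}(V^\alpha(u) - V^\alpha(u_0)) \end{aligned} \]
Now, the first term is negative by our initial remark that $w\cdot V(u_0) < 0$. Since
V is smooth and for $u, u_0$ sufficiently close, $|V^\alpha(u) - V^\alpha(u_0)| \le C\|u - u_0\| = Cd = Cf$
for some constant C. Note also that $\frac{\partial f}{\partial u^\alpha}|_u = \nabla f(u)$ is also bounded since
K is compact and hence the function $\|\nabla f\|$ will be bounded on the boundary of K.
So the second term is bounded, and so we can conclude
\[ \frac{\partial f}{\partial t} \le \Delta f + Cf \]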
\[ \frac{\partial}{\partial t}T_{kl} = \Delta T_{kl} + Q_{kl}(T) \]
\[ \frac{\partial}{\partial t}A_{ij} = \Delta A_{ij} + Q(A)_{ij} \]
Then the idea is to work with this tensor. The remainder of the argument is
completely analogous to the previous case, if only a little more complicated.
Rescaling
Then once again, K is convex, a cone, and preserved by the above ODE.
So if κmax (x, 0) ≤ Cκmin (x, 0) ∀x ∈ M , then κmax (x, t) ≤ Cκmin (x, t) ∀x ∈
M, t ≥ 0.
Remark. Note if $\kappa_{max}(x, t) \le \kappa_{min}(x, t)$ ∀x, i.e. $h_{ij} = f(x)g_{ij}$, then f is constant
on connected components.
Proof of Remark. By Codazzi, $\nabla_kh_{ij} = \nabla_ih_{kj}$. This implies that $(\nabla_if)g_{kj} = (\nabla_kf)g_{ij}$
since ∇g = 0. Choose i = j ≠ k, and g = δ. Then we conclude $\nabla_kf = 0$,
or f is constant.
5.6 Rescaling
Rescaling is a recurring theme in the subject of geometric evolution equations. It
is often important to understand the limiting behaviour of geometric flows, in order
to get a better understanding of the singularities which may develop, and hence
to justify surgery in certain circumstances. This becomes particularly
important in the Ricci flow, where the idea is to break a three manifold into pieces
by performing successive surgeries, until the manifold has been decomposed into
"prime" components. Understanding how these "prime" components then
collapse to points is key to their classification.
5.6.1 Preliminaries
Now, MCF has a scaling symmetry: if X : M0n × [0, T ) → Rn+1 satisfies MCF, then
so does
where t̂ = t − t̄.
Definition 13. (Convergence of hypersurfaces).
A sequence of closed hypersurfaces Mk converges to M∞ if there exist
gk ∈ C ∞ (Rn+1 ), k ∈ N ∪ {∞},
Mk = gk−1 (0), ie the Mk are level sets of the gk ,
such that
k∇gk (x)k ≥ 1 for all x ∈ Mk (all points x are regular points, and hence the Mk
are well defined submanifolds of Rn+1 ),
and
gk → g∞ in C ∞ (BR ) for all R.
for all x ∈ Mk ∩ BR , j ≥ 0,
and assume we have a bound below on the ”tube radius”
The 2 dimensional Ricci Flow (2DRF)
Claim. For M0 convex, there exists a smooth limit flow at (x̄, T ), some x̄ ∈ Rn+1 .
Proof. We use results from existence theory of parabolic PDE to conclude smooth-
ness results on smaller and smaller balls.
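For $\bar{t} < T$, choose $\lambda = \frac{1}{r_-}$. For $0 \le t \le \frac{1}{8n}$, $M_\lambda(t)$ is between $S^n(C_0)$ and $S^n(\frac12)$.
On $B_{1/4}$ we can write $M_{(\lambda,t)} = \mathrm{graph}(u(\cdot, t))$ (on $B_{1/4}\times[0, \frac{1}{8n}]$). Then we observe
that $\|u(\cdot, t)\| \le c_0$ and $\|Du(\cdot, t)\| \le c_1$ for some constants $c_0$, $c_1$, and
\[ \frac{\partial u}{\partial t} = \Big(\delta^{ij} - \frac{D_iuD_ju}{1 + \|Du\|^2}\Big)D_iD_ju \]
Now the Hölder gradient estimate holds on $B_{1/8}\times[\frac{1}{16n}, \frac{1}{8n}]$. Therefore, by the
interior Schauder estimates,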
\[ \frac{\partial}{\partial t}g_{ij} = -2R_{ij} \]
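\[ \frac{\partial}{\partial t}g_{ij} = -2Kg_{ij} \]
\[ \begin{aligned} \nabla^g_{\partial_i}\partial_j &= \frac12g^{kl}(\partial_ig_{jl} + \partial_jg_{il} - \partial_lg_{ij})\partial_k \\ &= \frac12e^{-2u}(g_0)^{kl}(\partial_i(e^{2u}(g_0)_{jl})\cdots)\partial_k \\ &= \nabla^{g_0}_{\partial_i}\partial_j + \partial_iu\,\partial_j + \partial_ju\,\partial_i - (g_0)_{ij}\partial_pu\,(g_0)^{pq}\partial_q \end{aligned} \tag{5.58} \]
\[ \frac{\partial u}{\partial t} = e^{-2u}(\Delta_{g_0}u - K_0) \]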
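\[ \begin{aligned} \frac{\partial}{\partial t}\Gamma^k_{ij}\Big|_{(x_0,t_0)} &= \frac12g^{kl}(\partial_i(-fg_{jl}) + \partial_j(-fg_{il}) - \partial_l(-fg_{ij})) \\ &= \frac12g^{kl}(\nabla_i(-fg_{jl}) + \nabla_j(-fg_{il}) + \nabla_l(fg_{ij})) \\ &= -\frac12\nabla_i(f)\delta^k_j - \frac12\nabla_j(f)\delta^k_i + \frac12g_{ij}\nabla_l(f)g^{lk} \end{aligned} \tag{5.59} \]
Now, recall by definition
\[ \nabla_i\partial_j = \Gamma^p_{ij}\partial_p \]
so
\[ R_{ikj}{}^l = \partial_k\Gamma^l_{ij} - \partial_i\Gamma^l_{kj} + \Gamma\star\Gamma \]
so
\[ \begin{aligned} \frac{\partial}{\partial t}R_{ikj}{}^l &= \nabla_k\Big(-\frac12\nabla_i(f)\delta^l_j - \frac12\nabla_j(f)\delta^l_i + \frac12g_{ij}\nabla_p(f)g^{pl}\Big) - (i \leftrightarrow k) \\ &= -\frac12\nabla_k\nabla_jf\,\delta^l_i + \frac12g_{ij}\nabla_k\nabla_pf\,g^{pl} + \frac12\nabla_i\nabla_jf\,\delta^l_k - \frac12g_{kj}\nabla_i\nabla_pf\,g^{pl} \end{aligned} \tag{5.60} \]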
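\[ \begin{aligned} \frac{\partial}{\partial t}R_{ij} &= -\frac12\nabla_i\nabla_jf + \frac12g_{ij}\Delta f + \frac12\nabla_i\nabla_jf\,n - \frac12\nabla_i\nabla_jf \\ &= \frac12g_{ij}\Delta f \end{aligned} \tag{5.61} \]
But
\[ \frac{\partial}{\partial t}R_{ij} = \frac{\partial}{\partial t}\Big(\frac12Rg_{ij}\Big) = \frac12\frac{\partial R}{\partial t}g_{ij} + \frac12R(-fg_{ij}) \]
So
\[ \frac{\partial R}{\partial t} = \Delta f + Rf \]
\[ \frac{\partial R}{\partial t} = \Delta R + R^2 \]
\[ R_{min}(t) \ge \frac{R_{min}(0)}{1 - R_{min}(0)t} \ge -\frac{1}{t} \]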
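The lower bound comes from comparing $R_{min}$ with the solution of the ODE $\rho' = \rho^2$, $\rho(0) = R_{min}(0)$, which is $\rho(t) = \frac{R_{min}(0)}{1 - R_{min}(0)t}$. The following sketch (an illustration with hypothetical function names, not part of the notes) checks that this function solves the ODE and that it never drops below −1/t, however negative the initial value.

```python
def comparison_solution(R0, t):
    """Solution of rho' = rho^2 with rho(0) = R0, valid while 1 - R0*t > 0."""
    return R0 / (1.0 - R0 * t)


def ode_residual(R0, t, h=1e-6):
    """rho'(t) - rho(t)^2, with rho' approximated by a central difference."""
    derivative = (comparison_solution(R0, t + h) - comparison_solution(R0, t - h)) / (2.0 * h)
    return derivative - comparison_solution(R0, t) ** 2


# The bound rho(t) >= -1/t holds uniformly in the (negative) initial value.
samples = [(R0, t, comparison_solution(R0, t))
           for R0 in (-100.0, -1.0, -0.01)
           for t in (0.01, 1.0, 50.0)]
```

Note that the bound −1/t is independent of the initial data, which is what makes it useful for long-time estimates.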
5.7.3 Normalisation
We compute the change in area A(t) of the surface M (t) under the 2DRF
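\[ \frac{\partial}{\partial t}g_{ij} = -Rg_{ij} \]
Now,
\[ \mathrm{Area} = A = \int_M\sqrt{\det(g)} \]
\[ \frac{\partial}{\partial t}\det(g) = \det(g)\,\mathrm{tr}\Big(g^{-1}\frac{\partial g}{\partial t}\Big) = \det(g)(-2R) \]
\[ \frac{\partial}{\partial t}\sqrt{\det(g)} = -R\sqrt{\det(g)} \]
Hence
\[ \frac{d}{dt}A(g(t)) = -\int_MR\,d\mu(g(t)) \]
Out of interest, recall the Gauss-Bonnet theorem: $\int_MR = 4\pi\chi(M)$.
But anyway,
\[ \frac{\partial}{\partial t}\hat{g} = -R(g)g\frac{A(0)}{A(t)} - \frac{gA(0)}{A(t)^2}\Big(-\int R\Big) \]
Define $r = \frac{\int R}{A(0)}$ (a constant).
Then
\[ \frac{\partial}{\partial t}\hat{g} = -\frac{A(0)}{A}(Rg - r\hat{g}) \]
Note that $\hat{R}\hat{g} = Rg$, and define a time variable τ such that $\frac{\partial}{\partial\tau} = \frac{A}{A(0)}\frac{\partial}{\partial t}$. Then
\[ \frac{\partial}{\partial\tau}\hat{g} = -(\hat{R} - r)\hat{g} \]
\[ \frac{\partial}{\partial\tau}\hat{R} = \Delta\hat{R} + \hat{R}(\hat{R} - r) \]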
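\[ \frac{\partial}{\partial t}(\hat{R} - r) = \Delta(\hat{R} - r) + \hat{R}(\hat{R} - r), \]
(i) $\hat{R} \le -\delta$ is preserved,
(ii) $\frac{\partial}{\partial t}(\hat{R}_{max} - r) \le -\delta(\hat{R}_{max} - r)$, which implies that $\hat{R}_{max} - r \le Ce^{-\delta t}$,
(iv) $\frac{\partial}{\partial t}\hat{u} = -(\hat{R} - r)$ implies that $\|\hat{u}(x, t) - \hat{u}(x, 0)\| \le Ce^{-\delta t}$, which implies that
$\hat{u}(x, t) \to \hat{u}_\infty(x)$ in the $C^\infty$ norm, with $R(e^{2\hat{u}_\infty}g_0) = r$.
\[ \hat{g}_{(\phi(x,t),t)}\Big(\frac{\partial\phi_t}{\partial x^i}(x, t), \frac{\partial\phi_t}{\partial x^j}(x, t)\Big) = \hat{g}_{(x,0)}(\partial_i, \partial_j) \]
\[ \frac{\partial}{\partial t} = L_{\frac{\partial\phi}{\partial t}}g = \nabla_iV_j + \nabla_jV_i \]
\[ (R - r)g_{ij} + \nabla_iV_j + \nabla_jV_i = 0 \]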
A gradient soliton is one such that V can be realised as the gradient of a scalar
function f . But then ∇i Vj = ∇j Vi = ∇i ∇j f , so
(R − r)gij + 2∇i ∇j f = 0,
the Gradient Soliton equation. It turns out that the classification of solutions
of this equation is precisely what is required for the classification of prime three
manifolds, ie, to establish the geometrisation conjecture of Thurston. As a final
comment, note that if we contract the above, we get a related equation from which
we lose no information in the case of the 2D Ricci Flow:
(R − r) + ∆f = 0
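\[ \begin{aligned} 0 &= \nabla_kR_{ij} + \nabla_k\nabla_i\nabla_jf - \nabla_iR_{kj} - \nabla_i\nabla_k\nabla_jf \\ &= \nabla_kR_{ij} - \nabla_iR_{kj} + R_{kij}{}^p\nabla_pf \end{aligned} \tag{5.62} \]
\[ 0 = \nabla_kR - \nabla^pR_{kp} - R_{kp}\nabla^pf \]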
122
Geometric Evolution Equations
0 = ∇k R − ∇j Rkj − ∇j Rkj
ie ∇p Rkp = 21 ∇k R.
If dim(M ) = n = 2, we conclude further that
1 1
0 = ∇k R − Rδkp ∇p f
2 2
1
= (∇k R − R∇k f ), (5.63)
2
or
∇k (Re−f ) = 0
Re−f is quite reminiscent of the general form of the integrand in the Perelman
functional (see later).
Now,
∇k (k∇f k2 + R) = 2∇q f ∇k ∇q f + ∇k R
= (∇k R − R∇k f ) + r∇k f (5.64)
∂f
∂t
= ∆f + rf
Claim. If f is the solution of this equation, then (∆f + (R − r))(x, t) = 0 for all
x ∈ M , t ≥ 0.
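Proof of Claim.
Well certainly if $f$ satisfies this equation, then
\[
\frac{\partial}{\partial t}(\Delta f + (R - r)) = \frac{\partial}{\partial t}\Delta f + \Delta R + R(R - r)
\]
Now by definition,
\[
\frac{\partial}{\partial t}\Delta f = \frac{\partial}{\partial t}\left(\frac{1}{\sqrt{\det(g)}}\frac{\partial}{\partial x^i}\left(\sqrt{\det(g)}\,g^{ij}\frac{\partial}{\partial x^j} f\right)\right)
\]
\[
\Delta f = e^{-2u}\Delta_{g_0} f
\]
So
\[
\frac{\partial}{\partial t}\Delta f = -(-(R - r))\Delta f + \Delta\frac{\partial f}{\partial t}
\]
Then
\[
\begin{aligned}
\frac{\partial}{\partial t}(\Delta f + (R - r)) &= \Delta(\Delta f + rf) + \Delta R + R(R - r) + (R - r)\Delta f \\
  &= \Delta(\Delta f + (R - r)) + R(\Delta f + (R - r))
\end{aligned} \tag{5.65}
\]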
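\[
\frac{\partial}{\partial t} f = \Delta f + rf
\]
\[
\begin{aligned}
\frac{\partial}{\partial t}\nabla_i f &= \nabla_i(\Delta f + rf) \\
  &= g^{kl}\nabla_k\nabla_i\nabla_l f + g^{kl}R_{ikl}{}^{p}\nabla_p f + r\nabla_i f \\
  &= \Delta\nabla_i f - \frac{1}{2}R\nabla_i f + r\nabla_i f
\end{aligned} \tag{5.66}
\]
Hence
\[
\begin{aligned}
\frac{\partial}{\partial t}\|\nabla f\|^2 &= \frac{\partial}{\partial t}(g^{ij}\nabla_i f\nabla_j f) \\
  &= 2\nabla^i f\left(\Delta\nabla_i f - \frac{1}{2}R\nabla_i f + r\nabla_i f\right) + (R - r)\|\nabla f\|^2
\end{aligned} \tag{5.67}
\]
(Note that if $\frac{\partial}{\partial t}g_{ij} = \mu g_{ij}$ then $\frac{\partial}{\partial t}g^{ij} = -\mu g^{ij}$.)
Hence
\[
\begin{aligned}
\frac{\partial}{\partial t}\|\nabla f\|^2 &= \Delta\|\nabla f\|^2 - 2\|\nabla^2 f\|^2 - R\|\nabla f\|^2 + 2r\|\nabla f\|^2 + R\|\nabla f\|^2 - r\|\nabla f\|^2 \\
  &= \Delta\|\nabla f\|^2 - 2\|\nabla^2 f\|^2 + r\|\nabla f\|^2 \\
  &\le \Delta\|\nabla f\|^2 + r\|\nabla f\|^2
\end{aligned} \tag{5.68}
\]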
from which we conclude, again by the maximum principle for parabolic PDE, that
\[
\|\nabla f\|^2 \le Ce^{rt}
\]
for all $x \in M$.
We wish to control $R$, however; this will require looking to higher derivatives of $f$. So
\[
\frac{\partial}{\partial t}(\|\nabla f\|^2 + R) = \Delta(\|\nabla f\|^2 + R) - 2\|\nabla^2 f\|^2 + r\|\nabla f\|^2 + R(R - r)
\]
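Therefore
\[
\frac{\partial}{\partial t}(\|\nabla f\|^2 + R) \le \Delta(\|\nabla f\|^2 + R) - (R - r)^2 + r(\|\nabla f\|^2 + R) + R(R - r) - rR + r(R - r)
\]
Hence
\[
\frac{\partial}{\partial t}(\|\nabla f\|^2 + R - r) \le \Delta(\|\nabla f\|^2 + R - r) + r(\|\nabla f\|^2 + R - r)
\]
\[
\|\nabla f\|^2 + R - r \le Ce^{rt},
\]
and in particular,
\[
R - r \le Ce^{rt}
\]
since by our previous result $\|\nabla f\| \le Ce^{rt}$, and by abuse of notation I am not relabelling constants.
But we already know that $R - r \ge -Ce^{rt}$, since by the evolution equation for the scalar curvature,
\[
\begin{aligned}
\frac{\partial}{\partial t}(R - r) &= \Delta(R - r) + R(R - r) \\
  &= \Delta(R - r) + (R - r)^2 + r(R - r) \\
  &\ge \Delta(R - r) + r(R - r)
\end{aligned} \tag{5.70}
\]
\[
\|R - r\| \le Ce^{rt} \tag{5.71}
\]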
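\[
\frac{\partial}{\partial t} f = \Delta f
\]
so $\|f\| \le C$.
Like before,
\[
\frac{\partial}{\partial t}(\|\nabla f\|^2 + R) \le \Delta(\|\nabla f\|^2 + R)
\]
so $\|\nabla f\|^2 + R \le C$.
Now
\[
\begin{aligned}
\frac{\partial}{\partial t}(2t\|\nabla f\|^2 + f^2) &\le 2\|\nabla f\|^2 + \Delta(2t\|\nabla f\|^2) + \Delta f^2 - 2\|\nabla f\|^2 \\
  &= \Delta(2t\|\nabla f\|^2 + f^2)
\end{aligned} \tag{5.72}
\]
\[
\begin{aligned}
\frac{\partial}{\partial t}(2\|\nabla f\|^2 + R) &\le \Delta(2\|\nabla f\|^2 + R) - 2R^2 + R^2 \\
  &\le \Delta(2\|\nabla f\|^2 + R) - R^2
\end{aligned} \tag{5.73}
\]
I claim that $2\|\nabla f\|^2 + R \le \frac{C}{t}$, and hence that, putting together the previous information we deduced, $R \le \frac{C}{t}$.
Now, we already know that $R \ge -\frac{1}{t}$ since $\frac{\partial R}{\partial t} = \Delta R + R^2$, so we get that $R$ is controlled. We get bounds on $\|\nabla^k R\|$ as before. So normalised 2DRF will converge to a sensible limiting solution $e^{2u_\infty}g_0$ for the metric.
In fact, we can conclude more than this. Since $\frac{\partial}{\partial t}u = -\frac{1}{2}R$, and $\frac{\partial f}{\partial t} = \Delta f = -R$, we get that $\frac{\partial}{\partial t}(2u - f) = 0$. But since $f$ is bounded, this means that $u$ must be bounded. But if $u$ is bounded, we must get that $u \to u_\infty$ such that $R(e^{2u_\infty}g_0) = 0$.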
Finally, in the case that $r > 0$, we can no longer use such basic arguments and need to use more advanced tools. Proving convergence for the cases $r < 0$ and $r = 0$ is the analogue of the classification of the solutions to normalised 3DRF that Hamilton and his contemporaries made in the 80s and early 90s. The case $r > 0$ with $R > 0$ initially requires the introduction of an "entropy" to prove convergence, much like Perelman did for the 3DRF. In particular, one defines the quantity
\[
Z = \int_M R\log(R) \tag{5.74}
\]
and shows that it decreases with the time parameter under normalised 2DRF. In particular, you find that $\frac{d}{dt}Z \le 0$, and $\frac{d}{dt}Z = 0$ if and only if $g$ is a gradient soliton.
Also, at least for 2DRF, you need to use a related Differential Harnack Inequality. In particular, we have
Theorem 5.7.1. (Harnack Inequality for the 2DRF). For any solution of 2DRF with $R > 0$, we have the following inequality:
\[
\frac{\partial R}{\partial t} - \frac{\|\nabla R\|^2}{R} + \frac{R}{t} \ge 0 \tag{5.75}
\]
Proof. Recall the related Harnack Inequality for the heat equation:
If $\frac{\partial u}{\partial t} = \Delta u$ on $\mathbb{R}^n$, $u > 0$, then $\Delta u - \frac{\|\nabla u\|^2}{u} + \frac{nu}{2t} \ge 0$.
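The heat-equation inequality just recalled can be tested on explicit solutions. The sketch below works in dimension $n = 1$ and uses centred finite differences; the two test solutions (the heat kernel and $1 + e^{-t}\cos x$) and all function names are choices made for this illustration only.

```python
import math


def heat_kernel(x, t):
    """Fundamental solution of u_t = u_xx on the real line (n = 1)."""
    return math.exp(-x * x / (4.0 * t)) / math.sqrt(4.0 * math.pi * t)


def cosine_solution(x, t):
    """Another positive solution of u_t = u_xx: u = 1 + exp(-t) cos(x)."""
    return 1.0 + math.exp(-t) * math.cos(x)


def harnack_quantity(u, x, t, n=1, h=1e-4):
    """Delta u - |grad u|^2 / u + n u / (2 t), with centred differences in x."""
    u0 = u(x, t)
    ux = (u(x + h, t) - u(x - h, t)) / (2.0 * h)
    uxx = (u(x + h, t) - 2.0 * u0 + u(x - h, t)) / (h * h)
    return uxx - ux * ux / u0 + n * u0 / (2.0 * t)


if __name__ == "__main__":
    for x in (-1.0, 0.0, 0.5, 2.0):
        for t in (0.2, 1.0, 3.0):
            print(x, t,
                  harnack_quantity(heat_kernel, x, t),
                  harnack_quantity(cosine_solution, x, t))
```

For the heat kernel the quantity vanishes up to discretisation error, which shows that the inequality is sharp; for the second solution it is strictly positive at the sampled points.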
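Now for a soliton:
\[
0 = \nabla_i\nabla_j f + \frac{1}{2}(R - r)g_{ij}
\]
hence $\nabla_k R = R\nabla_k f$. So
\[
\nabla_k\nabla_l R = \nabla_l R\,\nabla_k f + R\nabla_k\nabla_l f = \frac{\nabla_l R\,\nabla_k R}{R} - \frac{1}{2}R(R - r)g_{kl}.
\]
Hence
\[
\frac{\partial R}{\partial t} - \frac{\|\nabla R\|^2}{R} = \Delta R - \frac{\|\nabla R\|^2}{R} + R(R - r) = 0
\]
Compute:
\[
\begin{aligned}
\frac{\partial}{\partial t}\left(\frac{\Delta R}{R} - \frac{\|\nabla R\|^2}{R^2} + R - r\right)
 \ge{}& \Delta\left(\frac{\Delta R}{R} - \frac{\|\nabla R\|^2}{R^2} + R - r\right) \\
 &+ \frac{\nabla R}{R}\cdot\nabla\left(\frac{\Delta R}{R} - \frac{\|\nabla R\|^2}{R^2} + R - r\right) \\
 &+ \left(\frac{\Delta R}{R} - \frac{\|\nabla R\|^2}{R^2} + (R - r)\right)^2
\end{aligned}
\]
\[
\frac{\Delta R}{R} - \frac{\|\nabla R\|^2}{R^2} + R - r \ge -\frac{1}{t}
\]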
\[
R \ge -\frac{1}{t}
\]
This requires us to modify the above notion of entropy and the Harnack inequalities, but it is possible to make arguments analogous to the above to make this all work.
Recall that
\[
\frac{\partial}{\partial t}g_{ij} = A_{ij}^{pqrs}\partial_r\partial_s g_{pq} + \text{lower order terms}
\]
We wish to show that this system is strongly parabolic, so that we can use
existence results for such equations (just use existence theory for the heat equation,
since all strongly parabolic equations are equivalent to the heat equation under an
appropriate reparametrisation). So, we wish to establish that
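The Ricci Flow
\[
\langle A_{ij}^{pqrs}\xi_r\xi_s\eta_{pq}, \eta_{ij}\rangle \ge \|\xi\|^2\|\eta\|^2
\]
So
\[
\begin{aligned}
R_{ikj}^{l}\partial_l &= \nabla_k(\nabla_i\partial_j) - \nabla_i(\nabla_k\partial_j) \\
 &= \nabla_k(\Gamma^l_{ij}\partial_l) - \nabla_i(\Gamma^l_{kj}\partial_l) \\
 &= (\partial_k\Gamma^l_{ij} - \partial_i\Gamma^l_{kj})\partial_l + \text{lower order terms} \\
 &= \frac{1}{2}g^{lm}\big(\partial_k(\partial_i g_{jm} + \partial_j g_{im} - \partial_m g_{ij}) - \partial_i(\partial_k g_{jm} + \partial_j g_{km} - \partial_m g_{kj})\big) + \text{lower order terms} \\
 &= \frac{1}{2}g^{lm}(\partial_k\partial_j g_{im} - \partial_k\partial_m g_{ij} - \partial_i\partial_j g_{km} + \partial_i\partial_m g_{kj}) + \text{lower order terms}
\end{aligned} \tag{5.76}
\]
Hence
\[
\begin{aligned}
R_{ij} &= -\frac{1}{2}g^{kl}\partial_k\partial_l g_{ij} + \frac{1}{2}\partial_j(g^{kl}\partial_k g_{il}) + \frac{1}{2}\partial_i(g^{kl}\partial_k g_{jl}) - \frac{1}{2}g^{kl}\partial_i\partial_j g_{kl} + \text{lower order terms} \\
 &= -\frac{1}{2}g^{kl}\partial_k\partial_l g_{ij} + \partial_i\left(\frac{1}{2}g^{kl}\partial_k g_{jl} - \frac{1}{4}g^{kl}\partial_j g_{kl}\right) + \partial_j\left(\frac{1}{2}g^{kl}\partial_k g_{il} - \frac{1}{4}g^{kl}\partial_i g_{kl}\right) + \text{lower order terms}
\end{aligned} \tag{5.77}
\]
Hence, recalling that $R_{ij}$ is a tensor, so that our calculation is the same in all coordinate systems, we get that
Hence this Ricci flow is weakly parabolic, and short time existence follows.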
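\[
\begin{aligned}
\frac{\partial}{\partial t}R_{ikj}^{l} &= \nabla_k(\partial_t\Gamma^l_{ij}) - \nabla_i(\partial_t\Gamma^l_{kj}) \\
 &= \nabla_k\left(\frac{1}{2}g^{lm}\left(\nabla_i\frac{\partial}{\partial t}g_{jl} + \nabla_j\frac{\partial}{\partial t}g_{il} - \nabla_l\frac{\partial}{\partial t}g_{ij}\right)\right) - (i \leftrightarrow k) \\
 &= \nabla_k\nabla_l R_{ij} - \nabla_k\nabla_i R_{jl} - \nabla_k\nabla_j R_{il} - \nabla_i\nabla_l R_{kj} + \nabla_i\nabla_k R_{jl} + \nabla_i\nabla_j R_{kl}
\end{aligned} \tag{5.78}
\]
(Define $\nabla_i\nabla_k R_{jl} = R \star R$)
Recall the second Bianchi identity:
\[
\begin{aligned}
\frac{\partial}{\partial t}R_{ikj}^{l} &= -\nabla_k\nabla^p R_{ljip} + \nabla_i\nabla^p R_{ljkp} + R \star R \\
 &= \nabla^p(\nabla_i R_{ljkp} - \nabla_k R_{ljip}) + R \star R \\
 &= \nabla^p(\nabla_i R_{kplj} + \nabla_k R_{pilj}) + R \star R \\
 &= -\nabla^p\nabla_p R_{iklj} + R \star R \quad \text{(using the second Bianchi identity once more)} \\
 &= \Delta R_{ikjl} + R \star R
\end{aligned} \tag{5.79}
\]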
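\[
\frac{\partial}{\partial t}R_{ijkl} = \Delta R_{ijkl} + R \star R
\]
\[
\frac{\partial}{\partial t}\mu_{ijk} = -R\mu_{ijk}
\]
\[
\frac{\partial}{\partial t}\mu^{ijk} = R\mu^{ijk}
\]
Hence
\[
\frac{\partial}{\partial t}E_i^j = \Delta E_i^j + 2E_i^k E_k^j + \mu_{iab}\mu^{jcd}E_c^a E_d^b
\]
which is the evolution equation for the structural tensor. This is very useful in the 3DRF.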
In fact, we have a more general result due to Böhm and Wilking. Before I
mention it, however, I should introduce the curvature operator. The curvature
operator $\bar{R}$ is a bilinear form on $\Lambda^2 TM$, which is defined by the relation
Remark. Note that having the smallest two eigenvalues positive is closely related to
the notion of 2-convexity, which is the main notion used in the Huisken-Sinestrari
result (which I will mention later on). Essentially for a hypersurface to be 2-convex
means that the sum of its smallest two principal curvatures is positive everywhere.
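Proof. (Böhm and Wilking, sketch). In order to prove this, we would like an analogue of the notion of the structural tensor for $n$ dimensional manifolds. So, given any basis $\{\Pi_\alpha : \alpha = 1, \cdots, \frac{n(n-1)}{2}\}$ for $\Lambda^2 T_x M$ we get (writing $\bar{R}_{\alpha\beta} = \bar{R}(\Pi_\alpha, \Pi_\beta)$)
\[
\frac{\partial}{\partial t}\bar{R}_\alpha^\beta = \Delta\bar{R}_\alpha^\beta + 2\left(R_\alpha^\gamma R_\gamma^\beta + C_\alpha^{\gamma\delta}C^{\beta\xi\tau}R_{\xi\gamma}R_{\tau\delta}\right)
\]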
The Cαβγ are structure constants taking the analogous role to the structural
constants Eij as I defined before, and are defined by
never leave Ω.
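Note that
\[
\frac{d}{dt}F = -\int (R_{ij} + \nabla_i\nabla_j f)\dot{g}_{ij}e^{-f}d\mu + \int (2\Delta f - \|\nabla f\|^2 + R)\left(\frac{1}{2}g^{ij}\dot{g}_{ij} - \dot{f}\right)e^{-f}d\mu
\]
Fix a smooth measure $d\xi$ and given $g$, define $f(g)$ by $e^{-f}d\mu(g) = d\xi$.
Then the variation of $F_\xi(g) = F(g, f(g))$ is
\[
\frac{d}{dt}F_\xi = -\int (R_{ij} + \nabla_i\nabla_j f)\dot{g}_{ij}\,d\xi
\]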
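\[
\frac{\partial}{\partial t}g_{ij} = -2(R_{ij} + \nabla_i\nabla_j f)
\]
\[
\bar{\psi} = \nabla_i\hat{f}
\]
\[
\frac{dI}{dt} = \int_M R_{ij}(g)\frac{\partial}{\partial t}g^{ij}\,d\mu(g),
\]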
Proof of Hamilton’s Theorem (1982)
So we may conclude:
Ricci Flow is the steepest descent flow of the Fisher Information for a sharp Riemannian manifold.
This makes sense, for it is clear that, under the Ricci flow, we are losing information about the manifold. Moreover, later when I discuss Ricci flow with surgery, it is obvious that we are losing information, because we are losing topology.
(1) Prove short time existence of the Ricci Flow on $M^3$ starting with a metric $g_0$ with $\mathrm{Ric}(g_0) > 0$.
(2) Show curvature blows up as $t \to T$.
(3) Show sectional curvatures get close together as the curvature gets large.
(4) Rescale time and the metric to get a solution to the equation $\frac{\partial g}{\partial t} = -2R_{ij} + \frac{2}{n}rg$, where $r = \frac{\int_M R\,dV}{\int_M dV}$. Show furthermore that this solution exists for all time and converges to a metric of constant curvature.
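\[
F : \xi \times [0, T) \to \xi
\]
\[
\frac{\partial}{\partial t}\alpha = \hat{\Delta}\alpha + F(\alpha)
\]
\[
\frac{da}{dt} = F(a)
\]
with $a(0) \in \kappa_x$, remains in $\kappa_x$, we may conclude that the solution $\alpha(t)$ of the PDE remains in $\kappa$.
\[
U_0 : (V, h) \to (TM^n, g_0)
\]
\[
\frac{\partial}{\partial t}U_a^i = R_l^i U_a^l
\]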
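where $U_a^i$ are components of the isometry with respect to some bases for $V$ and $TM^n$. Then I claim that $U(t)$ remains an isometry from $(V, h)$ to $(T(M), g_t)$ and we can consider the behaviour of the pullback of the Riemann tensor $Rm$ on $M$ to $V$, $U^{\star}Rm$, rather than $Rm$.
Proof of Claim.
\[
\begin{aligned}
\frac{\partial}{\partial t}(U^{\star}g_t)_{ab} &= \frac{\partial}{\partial t}(g_{ij}U_a^i U_b^j) \\
 &= \left(\frac{\partial}{\partial t}g_{ij}\right)U_a^i U_b^j + g_{ij}\left(\frac{\partial}{\partial t}U_a^i\right)U_b^j + g_{ij}U_a^i\frac{\partial}{\partial t}U_b^j \\
 &= -2R_{ij}U_a^i U_b^j + g_{ij}R_p^i U_a^p U_b^j + g_{ij}U_a^i R_q^j U_b^q \\
 &= 0
\end{aligned} \tag{5.80}
\]
\[
\begin{aligned}
\frac{\partial}{\partial t}R_{abcd} &= \Delta R_{abcd} + (R \star R)_{abcd} \\
 &= \Delta R_{abcd} + 2(B_{abcd} - B_{abdc} + B_{acbd} - B_{adbc})
\end{aligned} \tag{5.81}
\]
In 3 dimensions, this ODE can be written with respect to a basis $\{e_i\}$ of $T^{\star}M^3_x$ such that $Q$ is diagonal; in particular, since $E$ is the curvature operator (essentially) in dimension 3, we can write everything in terms of the eigenvalues of $E$ and we get
\[
\frac{d}{dt}E = \frac{d}{dt}\begin{pmatrix} \lambda & 0 & 0 \\ 0 & \mu & 0 \\ 0 & 0 & \nu \end{pmatrix} = \begin{pmatrix} \lambda^2 + \mu\nu & 0 & 0 \\ 0 & \mu^2 + \nu\lambda & 0 \\ 0 & 0 & \nu^2 + \lambda\mu \end{pmatrix}
\]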
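Initial values tell us all about the initial Ricci and scalar curvatures
\[
\mathrm{Ric} = \frac{1}{2}\begin{pmatrix} \mu + \nu & 0 & 0 \\ 0 & \nu + \lambda & 0 \\ 0 & 0 & \lambda + \mu \end{pmatrix}
\]
and $R = \lambda + \mu + \nu$.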
From now on we assume λ(0) ≥ µ(0) ≥ ν(0).
Lemma 5.9.2. If ∃C < ∞ such that λ < C(µ + ν) at t = 0, then this condition is
preserved under the Ricci flow.
Proof. We will use the maximum principle from before, for the set
Note that this set is invariant under parallel translation. To see that κ is convex,
note that
But $\max_{\|u\|=1}(\alpha Q_1 + \beta Q_2)(u, u) \le \alpha\max_{\|u\|=1}Q_1(u, u) + \beta\max_{\|u\|=1}Q_2(u, u)$.
This establishes convexity.
It just remains to show that the solution of the associated ODE stays in $\kappa$. This follows from the fact that
\[
\frac{d}{dt}\log\left(\frac{\lambda}{\mu + \nu}\right) = \frac{\mu^2(\nu - \lambda) + \nu^2(\mu - \lambda)}{\lambda(\mu + \nu)} \le 0
\]
as can easily be proved from the evolution equation for $\lambda, \mu, \nu$ from before. Thus by the maximum principle, the solution to the PDE remains in $\kappa$.
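The identity used in this proof can be checked directly from the eigenvalue ODE $\dot\lambda = \lambda^2 + \mu\nu$, $\dot\mu = \mu^2 + \nu\lambda$, $\dot\nu = \nu^2 + \lambda\mu$. The sketch below verifies it in exact rational arithmetic and then integrates the ODE by forward Euler to watch $\lambda/(\mu+\nu)$ decrease. It assumes $\lambda \ge \mu \ge \nu > 0$, and the function names are illustrative only.

```python
from fractions import Fraction


def eigenvalue_rhs(lam, mu, nu):
    """Right-hand side of the ODE for the eigenvalues of E."""
    return (lam * lam + mu * nu, mu * mu + nu * lam, nu * nu + lam * mu)


def log_ratio_rate(lam, mu, nu):
    """d/dt log(lam / (mu + nu)) computed directly from the ODE."""
    d_lam, d_mu, d_nu = eigenvalue_rhs(lam, mu, nu)
    return d_lam / lam - (d_mu + d_nu) / (mu + nu)


def claimed_rate(lam, mu, nu):
    """The closed form quoted in the proof."""
    return (mu * mu * (nu - lam) + nu * nu * (mu - lam)) / (lam * (mu + nu))


def euler_ratios(lam, mu, nu, dt=1e-4, steps=1000):
    """Forward-Euler integration, recording lam / (mu + nu) at each step."""
    ratios = [lam / (mu + nu)]
    for _ in range(steps):
        d_lam, d_mu, d_nu = eigenvalue_rhs(lam, mu, nu)
        lam, mu, nu = lam + dt * d_lam, mu + dt * d_mu, nu + dt * d_nu
        ratios.append(lam / (mu + nu))
    return ratios


if __name__ == "__main__":
    triple = (Fraction(3), Fraction(2), Fraction(1))
    print(log_ratio_rate(*triple), claimed_rate(*triple))
    ratios = euler_ratios(3.0, 2.0, 1.0)
    print(ratios[0], ratios[-1])
```

Exact fractions are used for the identity so that equality can be tested without rounding; floats are used only for the time integration.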
\[
\frac{\lambda - \nu}{\lambda + \mu + \nu} \le \frac{C}{(\lambda + \mu + \nu)^{\delta}}
\]
In particular, as the curvature gets large, as it will under the Ricci flow (recall $R = \lambda + \mu + \nu$), this shows that eigenvalues get "pinched" together. But the left hand side is scale invariant, so it follows that the limit of normalised Ricci flow, $g_\infty$, has constant sectional curvature, and we are done.
\[
\frac{\lambda - \nu}{\mu + \nu} \le \frac{C}{(\mu + \nu)^{\delta}}
\]
\[
\frac{d}{dt}\log\left(\frac{\lambda - \nu}{(\mu + \nu)^{1-\delta}}\right) \le \delta\lambda - \frac{1}{2}(1 - \delta)(\nu + \mu)
\]
By the lemma, it is possible to choose $\delta = \delta(C)$ small so that this is always non-positive, so that $\frac{\lambda - \nu}{(\mu + \nu)^{1-\delta}}$ is non-increasing, so that the inequality is preserved by the ODE.
Thus, by the maximum principle, it is also preserved by the PDE. This completes the proof of Hamilton's theorem.
Proof. (sketch). The idea as to how to prove this is to use the mean curvature flow.
So, given F0 : M n → Rn+1 , solve
The Huisken-Sinestrari theorem; surgery and classification of canonical
singularities for MCF
\[
\frac{d}{dt}F(p, t) = -H \cdot \nu(p, t) = \Delta_t(F(p, t))
\]
(ii) The neckpinch. The neck will continue to become longer and thinner under the flow. Under a microscope the singularity looks like the infinite cylinder $S^{n-1} \times \mathbb{R}$.
(iv) If $\lambda_1 \le \eta H$ (eg for a neckpinch or part of a horn not at the tip), then $\|\lambda_n - \lambda_2\| \le 5\eta H + \hat{C}_\eta$ ($n \ge 3$). (The cylindrical or "roundness" estimate). Roughly what this is used for is to conclude that the curvatures $\lambda_2, \cdots, \lambda_n$ are all arbitrarily close in this situation, ie the manifold looks locally like $S^{n-1} \times \mathbb{R}$.
\[
\|\nabla A\|^2 \le \eta_0 H^4 + C_{\eta_0}
\]
where $\eta_0 = \eta_0(n)$, and $C_{\eta_0} = C(\eta_0, M_0^n)$. $\eta_0$ depends only on the dimension, and not the initial data.
Remarks.
(1) (iii) is analogous to the Hamilton-Ivey estimates for the Ricci Flow, for $n = 3$, which control the structural tensor $E$: $E_{ij} \ge -\eta R - C_\eta$.
(2) Note that (iii) and (iv) are only useful if $H$ is huge in the neighbourhood of a point, so that the terms with dependence on $H$ will dominate. In particular, we would like to be able to show that if $H$ is large at a point, it is large in a neighbourhood of that point. This motivates the last estimate,
(3) There is a natural analogue to (v) in the RF: This was established a posteriori by contradiction arguments due to Perelman, using his non-collapsing estimate, which he derived from his entropy.
Remark. Note that if we are not guaranteed an estimate like (v), we may get,
in the RF, the following type of singularity, described as a ”sheet of cigars” by
Hamilton.
5.10.4 Surgery
Now that we have established that they will always develop, the idea is now to
perform surgery on canonical singularities of type (ii) and type (iii) (if type (i)
occurs we are done).
For the neckpinch, we cut the neck at both ends and glue in two n-balls, then
continue the flow on the individual pieces:
For the cusp, we merely slice off the end and glue in an n-ball:
This can all be made very precise. In particular, the cusp and the neckpinch can be viewed as $\epsilon$-thin, ie with a cross-section of metric diameter $\epsilon$. We can show that for $\epsilon$ sufficiently small, all estimates are preserved. A big issue is whether things are actually improved after surgery; obviously if things do not improve we are not
Outline of the Classification Program
guaranteed convergence. We also need to prove that only a finite number of surgeries
are needed, ie. we do not have to perform an infinite number in an arbitrarily small
interval. Finally, we need to show that the solution ”heals” sufficiently after a
surgery before the next for us to be justified in using the same estimates (after
surgery the solution is no longer C ∞ , so we need to be careful; we have to use the
smoothing property of heat-type equations).
To make this all precise, we have the following useful result, which allows us to
think of our necks as embedded in R3 .
Proof. (idea). One shows that one can foliate the neck by minimal surfaces via
harmonic maps. The maps then give the required parametrisation.
Finally, we have
So this is what one does: one continues to perform MCF with surgery on the manifold until all the pieces one ends up with are canonical singularities of type (i), ie. shrinking spheres. The result then follows.
Definition 16. ($\kappa$-non collapsed). Let $B(x, r)$ be the ball of metric radius $r$ centred at $x$. If $\|Rm\| \le r^{-2}$ within $B(x, r)$, then we say that $B(x, r)$ is $\kappa$-non collapsed if $\mathrm{vol}(B(x, r)) \ge \kappa r^n$, where $n$ is the dimension of $B(x, r)$.
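To make the definition concrete, the following sketch applies it to a ball in flat $\mathbb{R}^n$, where the curvature hypothesis holds trivially. The helper names and the choice of flat space are assumptions of this example, not part of the definition.

```python
import math


def euclidean_ball_volume(r, n):
    """Volume of a ball of radius r in flat R^n."""
    return math.pi ** (n / 2.0) / math.gamma(n / 2.0 + 1.0) * r ** n


def is_kappa_noncollapsed(volume, r, n, kappa, curvature_sup):
    """Apply the definition; it only applies when sup |Rm| <= r^-2 on the ball."""
    if curvature_sup > r ** -2:
        raise ValueError("curvature bound |Rm| <= r^-2 fails; definition does not apply")
    return volume >= kappa * r ** n


if __name__ == "__main__":
    vol = euclidean_ball_volume(2.0, 3)
    print(is_kappa_noncollapsed(vol, 2.0, 3, kappa=4.0, curvature_sup=0.0))
    print(is_kappa_noncollapsed(vol, 2.0, 3, kappa=5.0, curvature_sup=0.0))
```

In flat space the test reduces to comparing $\kappa$ with the volume of the unit ball, so every Euclidean ball is $\kappa$-non collapsed exactly when $\kappa$ is at most that volume, independently of $r$.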
\[
\phi_i^{\star}g_i \to_{C^\infty} g
\]
In particular, this means that, for the Ricci flow on compact manifolds, the Cheeger-Gromov result applies. In other words, we have a compactness result for such flows, which allows one to identify the original manifold as a connected sum of its limiting pieces after finitely many surgeries, which essentially proves Thurston's Geometrisation Conjecture. I will repeat many of these statements later.
Sidenote: The $C^{\epsilon^{-1}}$ topology is the topology of maps which are $\epsilon^{-1}$ times differentiable.
The $\epsilon$-cap. An $\epsilon$-cap is intuitively an $\epsilon$-neck which is "rounded off" at one end, as shown:
These are really the only essential canonical neighbourhoods. However, for a
proper analysis, we need to include two more:
The proof of this theorem is not entirely trivial, but neither is it particularly novel. However, it is, in some sense, one of the core results of the Hamilton-Perelman program. It essentially involves a lot of careful analysis and reasoning. The interested reader is advised to refer to the appendix of J. Morgan and G. Tian's extensive manuscript [36] for the details.
So we have eight possibilities for such manifolds: $S^3$, $\mathbb{RP}^3$, $\mathbb{RP}^3 \# \mathbb{RP}^3$, $\mathbb{R}^3$, $\mathbb{RP}^3 - \{pt\}$, $S^2 \times \mathbb{R}$, $S^2 \times S^1$ and the non-orientable $S^2$ bundle over $S^1$. It will turn out that in fact the following result is true:
Sketching the proof of this theorem is what I will concern myself with for the rest of this section.
(ii) What is to stop us from having infinitely many surgeries in a finite amount of
”time”?
(iii) How can we be sure the flow is ”smooth enough” after previous surgeries when
we wish to perform future surgeries?
(iv) Will we always get sensible limiting behaviour as a result of this process?
The standard way to address (i) is to have a priori estimates and show that one
can perform surgery in such a way that these estimates are preserved. To show that
we only have a finite number of surgeries in a finite amount of ”time”, the idea is,
crudely speaking, to observe that the process of surgery can be arranged to remove
a fixed amount of volume from the manifold, and, furthermore, that each further
surgery in a bounded interval must remove a fixed amount of volume bounded below
by some constant. We may hence conclude that an infinite number of surgeries in a
finite ”time” interval for a manifold starting with finite volume is impossible. (iii)
may be overcome by making careful arguments using the tools of geometric measure
theory and the smoothing property of parabolic equations of heat type. Really the
core trouble is with (iv). In fact, we have the following result, due to Cheeger,
Gromov and others, which I mentioned before:
Result. A Geometric Limit of a sequence of manifolds (Mn , gn (t), xn ) exists if
(ii) for each A < ∞ we have uniformly bounded curvature for restriction of the
flow to metric balls of radius A about base points.
If the above result holds, then we can conclude that the limit of a sequence of manifolds really does look like the solution of its renormalised Ricci flow, and the question of classification of these limits comes down to classifying gradient solitons. But we have the final result that gradient solitons are unions of canonical neighbourhoods, in particular $\epsilon$-necks and $\epsilon$-caps, and we are done.
So evidently the key is to establish that (i) and (ii) of the above result hold for geometric flows, otherwise the whole process of surgery will not work. In particular, horrible solutions like Hamilton's "sheet of cigars" might crop up and throw a spanner in the works. It took Perelman with his notion of Length or "Entropy" to show by contradiction arguments that (i) in fact holds (relation to the Gradient estimate from before). (ii) is essentially a consequence of the standard a priori estimates due to Hamilton.
Before I go into some more detail about Perelman's length and so forth, I make the following remark: this whole argument could be simplified considerably if we observe that the Ricci flow is steepest descent for the Fisher Information of a sharp Riemannian 3 manifold without masses, and the corresponding renormalised Ricci flow is steepest descent flow for the Physical Information of the same. Then we get automatically that the classification of the limits of such flows is the classification of gradient solitons. But perhaps some subtlety escapes me, most particularly because we are not only flowing manifolds smoothly, we are also performing surgery; so I shall continue with sketching the standard treatment.
Proposition 5. If $\frac{\partial}{\partial t}g_{ij} = -2R_{ij}$, $\frac{\partial\tau}{\partial t} = -1$ (ie $\tau \sim (T - t)$), and $\frac{\partial f}{\partial t} = -\Delta f + \|\nabla f\|^2 - R + \frac{n}{2\tau}$, then
\[
\frac{\partial}{\partial t}W(g(t), f(t), \tau(t)) = \int 2\tau\left\|R_{ij} + \nabla_i\nabla_j f - \frac{1}{2\tau}g_{ij}\right\|^2 (4\pi\tau)^{-\frac{n}{2}}e^{-f}\,d\mu(g) \ge 0
\]
Furthermore, equality is achieved if and only if one has a gradient Ricci soliton.
Remarks.
(1) This is precisely the monotonicity formula for the rescaled Ricci flow. Furthermore, the $W$ entropy is essentially a generalisation to rescaled Ricci flow of the Green's function result
\[
R(x_0, t_0) = \int_{M_t}(t - t_0)^{-n/2}\exp\left(-\frac{\|x - x_0\|^2}{4(t_0 - t)}\right)R(x, t)\,d\mu(g_t)
\]
(2) Note that $\tau(R + \|\nabla f\|^2) + f - n$ is constant on solitons for the above choices for $f$ and $\tau$.
Remark. We immediately see a similarity between this quantity and the physical information. I suppose one could roughly interpret $L(\gamma)$ as "the physical information of a path $\gamma$ through the manifold $M \times I$".
A whole theory of $L$-geodesics can then be developed. In particular, we can show using this functional that the Result in the previous section is true, by constructing an a posteriori gradient estimate for Ricci flow with surgery. In particular, Perelman used these quantities to deduce his theorem which I mentioned earlier:
Chapter 6
Statistical Geometry
6.1.1 Motivation
We are first and foremost interested in generalising the notion of a particle path.
For instance, consider the following picture:
Preliminary Definitions
We would like somehow to be able to model as particles objects that have measure normalised to one across each spacelike hypersurface of our manifold, with the bulk of the measure concentrated about a world tube. I will be more specific: we would like the measure to maximise at the centre of this tube, ie about some optimal geodesic path, but also somehow model uncertainty in position by having exponential falloff relative to the core geodesic.
It is necessary to construct some sort of machinery to handle this problem. In order to do this, we consider the notion of a discrete statistical manifold, which is a discrete space $M$ together with a discrete distribution space $A$ such that at each $m = m_0 \in M$ there is a finite set $a_1, ..., a_k$ such that there are paths from $m$ to $m_1(a_1), ..., m_1(a_k) \in M$. From these points there are paths to points $m_2(m_1(a_1), a_1), m_2(m_1(a_1), a_2), ...$ etc. The signal function now is a function $f$ from $M \times A$ to $\mathbb{R}$ such that, for fixed $m$, $\int_A f(m, a)\,da = 1$. It assigns to each path from $m_0$ to the $m_1(a_i)$ the weight, or probability, of that path being selected.
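A toy instance may help fix ideas. In the sketch below the discrete space $M$ is the integers, the distribution space is $A = \{-1, 0, +1\}$, the transition is $m_1(a) = m + a$, and the weights favour $a = 0$. All of these choices, and the function names, are hypothetical and serve only to illustrate a normalised signal function selecting paths.

```python
import random


def normalise(weights):
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have positive total")
    return {a: w / total for a, w in weights.items()}


def signal(m):
    """Hypothetical signal function f(m, .) on A = {-1, 0, +1}.

    The weights favour a = 0, so most of the measure follows the straight path.
    """
    return normalise({-1: 1.0, 0: 6.0, +1: 1.0})


def step(m, a):
    """Hypothetical transition m -> m_1(a): move by a."""
    return m + a


def sample_path(m0, n_steps, rng):
    """Draw one path m0, m1, ..., selecting each a with probability f(m, a)."""
    path = [m0]
    for _ in range(n_steps):
        f = signal(path[-1])
        choices = list(f.keys())
        a = rng.choices(choices, weights=[f[c] for c in choices], k=1)[0]
        path.append(step(path[-1], a))
    return path


if __name__ == "__main__":
    print(sample_path(0, 20, random.Random(0)))
```

Because the weights concentrate on $a = 0$, sampled paths stay close to the "core" path $m = m_0$, which is the discrete analogue of measure concentrated about a world tube.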
Diagrammatically:
The trick will be to find a way to move from discrete spaces M and A to natural
smooth manifolds, and reduce the distance between consecutive steps m0 , m1 etc
along each path to zero. This will be the focus of the next subsection.
Also we would like to define M not only to be a metric space, but a measure-
metric space, with measure ψ. Conservation of probability demands that the signal
function and the measure be related in the following way:
6.1.2 Introduction
Definition 18. A fuzzy Riemannian manifold (M, A, f ) is a differential manifold
M together with a Riemannian manifold A, together with a smooth non-negative
function f : M × A → R.
Each point in $A$ is an $n$ by $n$ matrix corresponding to an inner product on $\mathbb{R}^n$. We might make further assumptions on the structure of $A$. For instance we will assume for now that each inner product is symmetric, and that all points in $A$ have the same index. Finally we will assume that as we vary $m \in M$ the inner product $g(m, a)$ corresponding to $a \in A$ will vary smoothly; ie $g(\cdot, a)$ will be a metric on $M$. This correspondence must be smooth, and all the metrics must be of constant index throughout $M$.
Furthermore, the subspace of the space of metrics with fixed index corresponding to the set $A$ must be connected (this is to avoid issues in applications with spacelike directions becoming timelike under different metrics, ie to avoid pathological problems with causality). $\mathrm{grad}_a$, $\mathrm{div}_a$, $\mathrm{curl}_a$ and $\Delta_a$ shall from now on represent the usual differential operators with respect to the metric corresponding to $a$. Evidently we will need to integrate over $A$ in order to take all of these operators into account when we are using them, and they will be weighted in importance by our distribution function $f$.
$f$ is defined so as to have the following two properties:
The Normalisation Condition
\[
\int_A f(m, a)\sqrt{\det(h(a))}\,da = 1, \quad \text{for all } m \in M
\]
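\[
\mathrm{grad}_\Lambda K(m, a) = \int_{b \in A} f(m, b)\,\mathrm{grad}_b K(m, a)\,db
\]
\[
\mathrm{div}_\Lambda \psi(m, a) = \int_{b \in A} f(m, b)\,\mathrm{div}_b \psi(m, a)\,db
\]
\[
\mathrm{curl}_\Lambda \psi(m, a) = \int_{b \in A} f(m, b)\,\mathrm{curl}_b \psi(m, a)\,db
\]
Note now that the divergence free flow condition for the signal function is equivalent under this new notation to
\[
\int_A \mathrm{div}_\Lambda\,\mathrm{grad}_\Lambda f(m, a)\,da = 0 \quad \text{for all } m \in M
\]
This makes sense because we obviously want $\psi$ to also satisfy a more general divergence free condition:
\[
\int_{A \times A} f(m, b)\,\mathrm{div}_b \psi(m, a)\sqrt{\det(h(a))\det(h(b))}\,d(a \otimes b) = 0 \quad \text{for all } m \in M
\]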
for all m ∈ M .
In fact, as it will turn out later, this notion of compatibility is precisely what is
required for ψ to correspond to a notion of generalised momentum (see the section
on interesting examples).
Also, a mass distribution must satisfy the property that
as well as
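where $\|\cdot\|_{(m,a)}$ denotes the norm with respect to metric $g(\cdot, a)$ at the point $m \in M$.
It turns out that, in order for the flow to have locally extremal length, $\psi$ and $f$ must satisfy the additional coupling condition for all $(m, a) \in M \times A$:
\[
\nabla_{(\psi(m,a) - \mathrm{grad}_b f(m,a);\,b)}\frac{\psi(m, a)}{\|\psi(m, a)\|_{m,a}} = 0 \tag{6.4}
\]
Here $\nabla_{(\cdot, a)}$ is the Levi-Civita connection with respect to metric $g(\cdot, a)$.
Proof. Let $L(\phi, U)_s = \int_A\int_U \|\phi(m, a, s)\|_{(m,a)}\,dm\,da$ be a variation of $L(\phi, U)$.
Then
\[
\begin{aligned}
\frac{\partial}{\partial s}L(\phi, U)_s &= \int_A\int_U \frac{\partial}{\partial s}\|\phi\|_{(m,a)}\,dm\,da \\
 &= \frac{1}{2}\int_A\int_U \frac{\frac{\partial}{\partial s}\|\phi\|^2}{\|\phi\|}\,dm\,da \\
 &= \int_{A \times A}\int_U \langle\nabla_{(\frac{\partial}{\partial s};\,b)}\phi(m, a), \phi(m, a)\rangle / \|\phi(m, a)\|\,dm\,da
\end{aligned} \tag{6.5}
\]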
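\[
\langle\nabla_{(\frac{\partial}{\partial s};\,b)}\phi, \phi/\|\phi\|\rangle = \langle\nabla_{(v;\,b)}(\phi - \mathrm{grad}_b f), \phi/\|\phi\|\rangle \quad \text{for arbitrary } v
\]
\[
\begin{aligned}
\int_A\int_U \nabla\langle v, \phi/\|\phi\|\rangle \cdot (\phi - \mathrm{grad}_b f) &= \int_A\int_U \nabla\langle v, \phi/\|\phi\|\rangle \cdot G \\
 &= \int_A\int_U \nabla \cdot (G\langle v, \phi/\|\phi\|\rangle) - \int_A\int_U \langle v, \phi/\|\phi\|\rangle\,\nabla \cdot G
\end{aligned} \tag{6.7}
\]
(by a vector identity).
Now the first term vanishes by the divergence theorem and the fact that $v$ vanishes on the boundary. The second term vanishes since div curl is the zero operator.
Hence
\[
\frac{\partial}{\partial s}L(\phi(m, a, s), U)\Big|_{s=0} = -\int_A\int_U \left\langle v, \nabla_{(\phi - \mathrm{grad}_b f;\,b)}\frac{\phi}{\|\phi\|}\right\rangle dm\,da = 0
\]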
Existence of solutions
\[
\nabla_{\gamma(m,a,b)}\frac{\psi(m, a)}{\|\psi(m, a)\|_{(m,a)}} = 0 \tag{6.8}
\]
as can be easily verified by following the same procedure as for the other example.
A pair $(f, \psi)$ satisfying (6.1) and (6.8) I shall from now on refer to as a fuzzy geodesic.
Note that (6.1) and (6.8) can be interpreted as describing a coupling between the curvature and mass distributions on the manifold $M$.
\[
\frac{\partial^2 L}{\partial u\,\partial v}(\Psi(u, v), U)\Big|_{u=0,v=0} \ge 0.
\]
But even this is beset with problems. For if any of the second variations are
in fact 0 we cannot conclude anything about stability and must examine higher
derivatives. In particular we would need all the third order variations to be ≥ 0.
But then if a single third order variation is 0, we have to look further, etc. However,
since the domains we are dealing with are compact, it is inherently reasonable to
expect that we should be able to stop this process after looking at a finite number
of derivatives. So we have:
Remark. In order for the above proof to work it may need to be established that
all the induced P.D.E.s from looking at nth order variations have analytic solutions.
This is something that we should expect from this problem, however, since it is
motivated by physical considerations.
Definition 22. A domain Ω is called fundamental if it suffices to look at only the
first and second order derivatives to determine stability, i.e. if C(3) > diam(Ω).
Let me make all this more precise.
Definition 23. The diameter of a set $U \subset M$ is defined as
\[
\mathrm{diam}(U) = \sup_{a \in A}(\mathrm{diam}_a(U))
\]
where $\mathrm{diam}_a(U)$ is the standard diameter from semi-Riemannian geometry with respect to the metric indexed by $a$.
Theorem 6.2.1. U ⊂ M is topologically compact iff diam(U ) is finite.
Remark. This could be more or less taken as a definition for the topology of Λ =
(M, A, f ).
Conjecture. Suppose $\lambda = \sup_{(m,a) \in M \times A}\int_A f(m, b)\,\mathrm{grad}_b f(m, a)\,db$ is finite, and $\dim(M) = n$. Then, for each $R > 0$, there is a constant $K(\lambda, n, R)$ such that, for any $U$ with $\mathrm{diam}(U) < R$, it is sufficient to determine stability to the local length minimisation problem in $U$ by looking at $q$-parameter variations of order up to $q = K(\lambda, n, R)$.
Remarks.
(i) This is in accordance with intuition, because if it was necessary to look at an
infinite number of variations within a compact domain, somehow our solution
would possess an infinite amount of information in a bounded domain, which,
needless to say, is unphysical.
(ii) Note that K(λ, n, R) need not be an integer. In fact, in analogy with fractional
differentiation as defined before, it makes perfect sense to be able to make
variations of noninteger order. Note that this level of sophistication is probably
unnecessary, since physical functions tend to be C ∞ anyway.
(iii) The global bound on the variability of f is essentially a global bound on the
curvature of Λ = (M, A, f ). In other words, I am roughly saying, ”provided Λ
does not curve too much, we can put a bound on the number of variations we
need to make”.
Though it is possible that both parts of this conjecture are incorrect, and we
really should be looking at the following problem:
Problem. Given a set U and complete knowledge of f in U , compute all possible
solutions ψ can take in U such that the length function L(ψ, U ) with fixed boundary
conditions ψ|∂U (m, a) = g(m, a) is locally minimised in F.
The influence of boundary conditions means that (6.1) and (6.4) will no longer suffice to determine the correct solutions inside $U$ for $\psi$, and we will have to derive new relations that $\psi$, $g$ and $f$ must jointly satisfy. If solutions are nonunique, I posit the further
If this conjecture is correct, this leads one on to being able to define the relative probability of a solution occurring inside $U$ in terms of random noise outside.
Define $L(\psi|_U, K)_{avg}$ to be the average of the length induced from all possible extensions of $f$ to $K \subset M$ from $U \subset K$ (since we are dealing with copies of $U$ only, so $f$ may change outside $U$ and hence be effectively random), given a particular solution $\psi|_U$. (Note that if the above conjecture is true then $\psi$ will in fact extend uniquely to all of $K$ once we know $f$.) Then define the normalised length of solution $\psi|_U$ as
\[
\hat{L}(\psi|_U) = \lim_{\alpha \to \infty} L(\psi|_U, K_\alpha)_{avg} \Big/ \sum_{S \in N_U} L(S, K_\alpha)_{avg}
\]
where $K_\alpha$ are a series of expanding sets that envelop all of $M$ in the limit (we are of course making the assumption that $M$ is paracompact).
We now want to define a notion of probability such that if $\hat{L}(S_1) = n\hat{L}(S_2)$, then $P(S_1) = P(S_2)/n$. Certainly $1/\hat{L}(S_2) = n/\hat{L}(S_1)$, so define the probability of solution $\psi|_U$ occurring as being proportional to $1/\hat{L}(\psi|_U)$. In particular,
\[
P(\psi|_U) = \frac{1}{\hat{L}(\psi|_U)} \Big/ \sum_{S \in N_U}\frac{1}{\hat{L}(S)}
\]
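The probability assignment is a simple inverse-length weighting, which the sketch below implements for a finite set of candidate solutions. The dictionary of labels and normalised lengths is hypothetical input chosen for illustration; computing the lengths themselves is outside the scope of this example.

```python
def solution_probabilities(normalised_lengths):
    """Map {solution label: normalised length} to probabilities proportional to 1/length."""
    if any(length <= 0 for length in normalised_lengths.values()):
        raise ValueError("normalised lengths must be positive")
    inverse = {s: 1.0 / length for s, length in normalised_lengths.items()}
    total = sum(inverse.values())
    return {s: w / total for s, w in inverse.items()}


if __name__ == "__main__":
    # Hypothetical normalised lengths for three candidate solutions.
    print(solution_probabilities({"S1": 0.6, "S2": 0.2, "S3": 0.2}))
```

With this weighting a solution whose normalised length is $n$ times larger is $n$ times less probable, and the probabilities sum to one by construction.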
Conjecture. This corresponds to the real probability that a solution will occur.
In order to understand the types of solutions that may occur, we have the
following conjecture:
Conjecture. Stable solutions (ie first variation of length is zero, second variation
of length is positive) are characterised by geometrical symmetries. In other words,
there is a group of symmetries acting on the space that preserves the solution.
Which leads us to
Conjecture. There is a 1-1 correspondence between symmetry groups acting on
(M, A, f ) and stable solutions for ψ in (M, A, f ).
Other examples of the variation problem might be in modeling the excitation of
an object by an incoming packet of energy, or modeling the possible states leading
up to a particular distribution of mass/energy. This motivates the more general idea
of considering the mixed problem where part of the boundary conditions are fixed
and the other values for ψ on the boundary are allowed to vary, then trying to find
stable minimal solutions for the length over the compact domain.
Yet another generalisation that can be made to this variational problem is by
first associating to each a ∈ A a U(a) ⊂ M such that the correspondence a ↦ U(a)
is smooth. Denote this particular distribution of sets by D(A). Then define the
generalised length to be
Two classes of Signal Functions and related results
\[ L(\psi, D(A)) = \int_{a\in A}\int_{m\in U(a)} \|\psi(m,a)\|_{(m,a)} \sqrt{\det(g(m,a))\,\det(h(a))}\; dm\, da \]
Then solve the same problems as before while extremising this notion of length.
\[ \nabla_{(\psi(m,\sigma(m));\,\sigma(m))} \frac{\psi(m,\sigma(m))}{\|\psi(m,\sigma(m))\|} = 0 \]
So once again we have a choice of metric σ(m) defined effectively at each point
m ∈ M , except this time we have a distribution of approximate distance h about it
in the space of metrics, loosely speaking.
Since we are dealing with a narrow distribution, any ambiguity in our choice
of compatibility condition is of order $(h/(\text{min eigenvalue of } \sigma(m)))^{\dim M}$,
which is very small anyway, provided $h \ll 1$.
Interesting subcases are when $\sigma_{ij}$ is diagonal with signature $(1, 1, 1, -c^2)$ or
diagonal with signature $(1, 1, 1, 1)$.
To solve for ψ in this case we might just blindly hack away and try to solve, or
we might try
\[ \psi(m, a) = \psi(m)\exp_m\big(-|(\sigma_{ij}(m) - a_{ij}) m_i m_j| / h\big) \]
for some choice of ψ.
We might want to look at the case where the length scale, L, varies with position.
A natural choice is to modify the usual choice, h, by an invariant depending
on the inner product at that point. So it is sensible to look at a length scale
$L(m) = h\sqrt{\det(\sigma(m))}$, and look at signal functions f proportional to
$\exp_m\big(-|(\sigma_{ij}(m) - a_{ij}) m_i m_j| / L(m)\big)$.
A very important signal function to look at is $f(m, a) = (1 + \varepsilon\Delta_\sigma)\delta(\sigma(m) - a)$,
for $\varepsilon \ll 1$, as shall become apparent later. I shall now proceed to derive the
conservation and geodesic equations for this particular object.
In particular, the conservation equation will be
which reduces to
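\[ (1+\varepsilon\Delta_\sigma)\nabla_{((1+\varepsilon\Delta_\sigma)\bar\psi,\,\sigma)} \frac{(1+\varepsilon\Delta_\sigma)\bar\psi}{\|(1+\varepsilon\Delta_\sigma)\bar\psi\|} = 0 \]
To first order in $\varepsilon$ this reads
\[ \nabla_{(\bar\psi,\sigma)}\frac{\bar\psi}{\|\bar\psi\|} + \varepsilon\Big\{ \Delta_\sigma\nabla_{(\bar\psi,\sigma)}\frac{\bar\psi}{\|\bar\psi\|} + \nabla_{(\Delta_\sigma\bar\psi,\sigma)}\frac{\bar\psi}{\|\bar\psi\|} + \nabla_{(\bar\psi,\sigma)}\Big( \frac{\Delta_\sigma\bar\psi}{\|\bar\psi\|} - \frac{\bar\psi\,\Delta_\sigma\|\bar\psi\|}{\|\bar\psi\|^2} \Big) \Big\} = 0 \]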
But this expression may be simplified, for we observe that the last four terms
can be written as
Hence, we have the expression for our geodesic equation for this choice of signal
function:
\[ (1 + 2\varepsilon\Delta_\sigma)\nabla_{(\bar\psi,\sigma)}\frac{\bar\psi}{\|\bar\psi\|} = 0 \tag{6.10} \]
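\[ (1 + 2\varepsilon\Delta)\,\psi\Big(\frac{\psi}{\|\psi\|}\Big) = 0 \]
is equivalent to
\[ (1 + 2\varepsilon\Delta)\,\frac{d}{dt}\Big\{\frac{\psi}{\|\psi\|}(x + t\psi)\Big\}\Big|_{t=0} = 0 \]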
where $\hat\psi = \frac{\psi}{\|\psi\|}$.
Rewriting we get
\[ (1 + 2\varepsilon\Delta)\nabla f = 0 \]
where $\|u\| = 1$.
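Hence $(\nabla f)^\mu_\alpha = A^\mu_\alpha \exp(i\,u_\nu x^\nu/\sqrt{2\varepsilon})$, and therefore
$f^{kl} = \hat{A}^{kl}\exp(i\,u_p x^p/\sqrt{2\varepsilon}) + a^{kl}$.
But using symmetry $\hat\psi^k\hat\psi^l = f^{kl} = \hat{B}^k\hat{B}^l\exp(i(\tau^k + \tau^l)\cdot x/\sqrt{2\varepsilon}) + \alpha^k\alpha^l$.
Hence taking square roots we get that $\hat\psi^k = C^k\exp(i\lambda\cdot x/\sqrt{\varepsilon}) + \gamma^k$ for some new
constants C, λ, and γ.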
Remark. For the above solution, the overall direction of flux is in direction $\gamma^k$, since
the oscillatory term changes phase so rapidly with small $\varepsilon$ that all contributions
over an integrated region will cancel each other out. However, we do expect to
have some correction term of order $\varepsilon$ or higher, which will be the focus of the next
investigatory exercise.
Proposition 6. For the above solution to the fuzzy geodesic equation, consider a
rectangular region R with coordinates $\hat\gamma$, $\hat\gamma^\perp$. Suppose further that R is uniformly not
small (with dimensions of order $\varepsilon^n$, for n < 1/2). Then if the volume of the region
is V, the amount of flux through the region, Φ, is
Stokes Theorem for Statistical Manifolds
\[ \Phi(R) = V\|\gamma\| + E(\varepsilon) \]
\[ K\,\Big\|\int_N C\cdot n\,\exp\Big(\frac{i\lambda\cdot x}{\sqrt{\varepsilon}}\Big)\Big\| \]
Proof. In fact, though daunting at first, this is relatively trivial to prove, and follows
easily from Stokes' theorem in the sharp case. If we absorb the distribution f into
the derivatives and write the form ω as $\int_{b\in A} F(m, a)\,dK(m, b)$, for F a function and
the dK normalised unit forms, then we observe that, pointwise
\[ \int_M\int_A \epsilon_{ijk}\,\partial_j(m,b)F_k(m,a)\,dx_j(m,b)\,dK_i(m,c) = \int_{M_{bc}}\int_A \epsilon_{ijk}\,\partial_j(\phi_{bc}(m),c)F_k(\phi_{bc}(m),a)\,dx_j(\phi_{bc}(m),c)\,dK_i(\phi_{bc}(m),c) \]
and since this holds for all points m, a, b, c, it follows that the statistical
Stokes theorem is true.
where $f(m, a) = \int_A F(m, b)\,\delta(\sigma_b(m) - a)\,db$. (Note that, if $\sigma_b$ is just a number, as
in our number theory example, the statistical derivative reduces to
$\int_A F(m, c)\big(\frac{\partial}{\partial m} + \frac{\partial}{\partial c}\big)\,dc$,
since we don't have matrix multiplication twisting things up, or rather, matrix
multiplication is trivial.)
Recall Theorem 2.5.2 from chapter 2:
\[ \int_M\int_A d_{(M;a)}\,\omega(m, a) = \int_{\partial M}\int_A \omega(m, a) \]
But this is clearly equivalent to what we want to show: since $d_{(M;a)}$ is a
statistical derivative, we can simply let it be the statistical derivative with respect
to f, and we are done. In other words, Stokes' theorem holds for general statistical
manifolds.
Chapter 7
The Shannon Entropy
get that $H(AB) = H(A) + \sum_i p_i H_i(B)$, where $H_i(B)$ is the conditional entropy of
the scheme B given that event $A_i$ has occurred. To be more precise, let $q_{ij}$ be the
probability that the event $B_j$ in the scheme B occurs given that the event $A_i$ of
scheme A occurred, and let $p_i$ be the associated probabilities to the events in A.
Hence
\begin{align}
H(AB) &= -\sum_{i,j} p_i q_{ij}\,(\ln(p_i) + \ln(q_{ij})) \tag{7.1} \\
&= -\sum_i p_i \ln(p_i) \sum_j q_{ij} - \sum_i p_i \sum_j q_{ij}\ln(q_{ij}) \tag{7.2} \\
&= H(A) + \sum_i p_i H_i(B) \tag{7.3}
\end{align}
We then define $H_A(B) = \sum_i p_i H_i(B)$.
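The decomposition H(AB) = H(A) + H_A(B) can be checked numerically on a small finite scheme. The following is only a sketch: the probabilities `p` and conditional probabilities `q` are hypothetical values chosen for illustration, and `entropy` is a helper name introduced here.

```python
import math


def entropy(probs):
    """Shannon entropy -sum p ln p of a finite scheme (terms with p = 0 contribute 0)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical scheme A with three events, and conditional rows q[i][j] = P(B_j | A_i).
p = [0.5, 0.3, 0.2]
q = [[0.7, 0.3], [0.1, 0.9], [0.5, 0.5]]

# Joint scheme AB has probabilities p_i * q_ij.
joint = [p_i * q_ij for p_i, row in zip(p, q) for q_ij in row]

h_joint = entropy(joint)                                         # H(AB)
h_a = entropy(p)                                                 # H(A)
h_b_given_a = sum(p_i * entropy(row) for p_i, row in zip(p, q))  # H_A(B)
```

Up to floating point rounding, `h_joint` agrees with `h_a + h_b_given_a` because each row of `q` sums to one.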
We have the following result which I give without proof. The interested reader
is advised to refer to Khinchin [33] if they wish for a detailed argument.
Remarks. It can be shown that (i) holds for the notion of entropy that we
have defined. Clearly we would want this to be true, since intuitively things are
most uncertain in a scheme when all events have equal probability. (iii) basically is
the statement that the entropy of a scheme should not change if we add impossible
events. Most importantly, all three of these properties hold for the notion of entropy
defined above.
Remark. There is also the notion of the information of a system. Intuitively
speaking, if we perform an experiment on a scheme we gain some information (by
finding out which event occurs), and we eliminate the uncertainty of the scheme.
Hence this notion of information should be an increasing function of the entropy. It
Fisher Information and application to the theory of Physical Manifolds
(i) Shannon entropy is more or less a global measure of uncertainty and does not
allow finer examination of phenomena. In other words if we were to shuffle
the values in a finite or even continuous scheme around we would get the
same value for the entropy. This can be seen most clearly in the one dimen-
sional case by taking a discrete sum of values and observing that there is no
interdependence between them.
(ii) This entropy does not incorporate the act of observation or measurement of
information into the system. Shannon entropy is a measure of uncertainty from
the point of view of an observer from outside the system; for instance, someone
observing the outcome of the roll of a die. It is not immediately apparent that
this should matter, but, in general, the act of making a measurement from
within a system will actually locally perturb the system, and hence, perturb
the measurement. In other words, internal information (which we want to
quantify) ≠ external information (which is the quantity measured by Shannon
entropy).
Fisher Information Theory and the Principle of Extreme Physical Information
this area I found Wikipedia to be most useful. However, due to the dubious and
changeable nature of this source, it behooves me not to provide references to the
Wikipedia entries where I originally found this data, but rather a more conventional
source, such as a modern introduction to statistics. An example of such a book
containing the material I will need is the recent one by Coolidge [13].
After developing the statistical tools required, I shall refer to Frieden’s pioneering
work [19], describing in some detail why I have decided to use Fisher information in
the development that follows.
For example, the signal function is an example of such a function, with A being
the indexing space for events and M being a manifold upon which the events occur
pointwise.
Definition 29. Two random variables X, Y are independent if and only if the events
[X ≤ a] and [Y ≤ b] are independent for any numbers a, b.
Definition 30. The expected value of a random variable X for a probability distri-
bution f is given by the relation
\[ E(X) = \int_A f(x, \theta)\,X(x, \theta)\,d\theta \]
It is clear why we would expect E(X) to be a useful invariant. What is not clear
is why we would ever want to consider var(X).
Well, certainly var(X) gives us some measure of the variability of the random
variable X about its mean. It can easily be shown that var(X) = E(X^2) − (E(X))^2.
Furthermore, for independent random variables X, Y we have that var(X + Y ) =
var(X) + var(Y ).
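Both variance identities can be verified exactly on small discrete distributions. The sketch below uses hypothetical distributions `x` and `y` (dictionaries mapping values to probabilities) and builds the distribution of X + Y under the assumption of independence; all names are introduced here for illustration.

```python
from itertools import product


def expectation(dist, g=lambda v: v):
    """E[g(X)] for a finite distribution given as {value: probability}."""
    return sum(prob * g(value) for value, prob in dist.items())


def variance(dist):
    """var(X) = E[(X - E X)^2]."""
    mean = expectation(dist)
    return expectation(dist, lambda v: (v - mean) ** 2)


# Hypothetical finite distributions for two independent random variables.
x = {0: 0.2, 1: 0.5, 4: 0.3}
y = {-1: 0.6, 2: 0.4}

# Distribution of X + Y: independence means joint probabilities multiply.
sum_dist = {}
for (xv, xp), (yv, yp) in product(x.items(), y.items()):
    sum_dist[xv + yv] = sum_dist.get(xv + yv, 0.0) + xp * yp

var_x = variance(x)
shortcut = expectation(x, lambda v: v * v) - expectation(x) ** 2  # E(X^2) - (E X)^2
var_sum = variance(sum_dist)
```

Here `var_x` matches `shortcut`, and `var_sum` matches `variance(x) + variance(y)`, up to rounding.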
Definition 32. The score for a probability distribution with one hidden variable θ
for a random variable X that does not depend on position in M is defined as
\[ V(X, \theta) = \frac{\partial}{\partial x}\big(\ln(X(\theta) f(x, \theta))\big) \]
This is the quantity that I will be most interested in, and will be the primary
focus of the analysis to follow.
Note that for the intents and purposes of what I plan to use this for, X will
predominantly be the identity variable, ie X(θ) = 1 for all θ ∈ A, since the quantity
that I am predominantly interested in measuring is only the signal function. Also
note that the Fisher information is linear in its second argument for independent
random variables. In other words, I(θ, X + Y ) = I(θ, X) + I(θ, Y ). This follows
from the result that the variance of the “sum” of independent events is the sum of
their variances. (Note (X + Y )(x)f (θ, x) = X(x)f (θ, x)Y (x)f (θ, x) for independent
random variables X, Y ; addition is not the same in χ as one might expect. Equiv-
alently we could notate X + Y as X ∩ Y .) Intuitively this is in accordance with
the fact that we should be able to add the information from unrelated experiments
together in a linear fashion.
Definition 34. A random observation is a random variable that is equivalent to
the identity on the probability space over a manifold.
This of course leads us to the question of what we mean when we talk about a
measure of information, ie what properties it should have. We might like to try to
prove something like the following:
Theorem 7.2.1. (Uniqueness of the Information Measure). Let χ be the space of
random variables on A. Then, given a probability distribution f defined pointwise
over the manifold M as above there is one, and only one function K : M × χ → R_{≥0},
unique up to a nonzero scalar constant, that satisfies the following properties:
We are now at the point where we can define the bound information, J, and
describe the principle of extreme physical information, as described in Frieden’s
book.
I define the channel information to be the total Fisher information as defined
above. This is the total amount of information our manifold can, in some sense,
“carry”. The bound information, J, is in some sense the unavoidable information
contained within the system. It will be derived in our case by our conservation
equation, as described and motivated in the previous section. For a physical manifold
then, we expect I = J.
Now, the principle of extreme physical information (The EPI principle) states
that the physical information, K = I −J, must be critical with respect to the natural
Fisher variables of the system. In our case, the Fisher variables will be the metric
components $\sigma^{ij}$ in some local chart. This is how the EPI principle is described by
Frieden. But I in fact go one step further.
Refined EPI principle: For physical manifolds, the physical information, K,
must be locally minimal. In other words we roughly need the following two things
to hold:
δK = 0, and
δ^2 K ≥ 0.
I say roughly, of course, because the second criterion does not guarantee stability.
7.2.3 Interpretation
One could view this principle as being a maximal efficiency principle. Local
minimality is required, of course, from the point of view of stability: the
system must be stable, otherwise perturbations will throw things out of whack. Even
if one has K = 0, this is no guarantee of stability; obviously it makes no sense if
K is negative, but it could happen that an unphysical manifold might have K = 0,
δK = 0 for all variations, but K''(t) < 0 for some variation. Then there will
be a natural flow to negative information K and the manifold will become truly
unphysical.
Perhaps a better way to interpret the EPI principle, particularly in the statistical
setting, is that the universe wants to run as efficiently as possible, so each and every
point, in an ideal world, would know exactly what was going on everywhere else and
adjust itself to realise a minimum. (This is actually what happens in the degenerate
or sharp case, where one eliminates all probabilistic behaviour). However, allowing
more complex statistical models (which are more realistic) entails that there is an
increasing cost to a point "knowing" what occurs metrically further and further away
from it; in other words, the universe must organise a trade off between knowing
exactly what happens everywhere (which would require a massive investment of
energy in pinning down the geometry) and spending the least energy possible so as
to be slightly uncertain about the geometry, but still realise a lower energy state.
This is precisely what I will be pinning down with my discussion of almost sharp
geometries, which are closest in behaviour to standard quantum mechanical models,
but of course the mathematics of information geometry does allow quite significant
generalisation beyond this.
As for why the universe would "want" to run as efficiently as possible, perhaps
a better way of putting it would be to observe that since all other configurations
are unstable it has little choice in the matter. The inevitable noise due to statistical
fluctuations will cause any suboptimal choice to be quickly dropped in favour of
one more locally optimal. So while noise might prevent the universe from ever
attaining a global minimum for information, the fact that the information is stable
there means that it will get very close. So to a reasonable approximation we can
think of the universe as actually sitting at that point in solution space, in order to
draw various cartoon models of increasing sophistication. Naturally, of course, we
would like to understand the noise - and indeed elimination in parts will lead to
more complex formulations of Cramer-Rao and EPI. However the act of elimination
will always leave higher order noise yet to be quantified and understood. Hence the
best we can hope for is to produce models that work under a range of conditions
that we can experimentally measure and understand.
\[ I = \int_M\int_A \|\partial_\Lambda q(m, a)\|^2_{(m,a)}\,da\,dm \]
and
where $\psi(m, a) = \partial_\Lambda q^2(m, a) + B(m, a)$ for some B such that $\mathrm{div}_\Lambda B(m, a) = 0$.
The physical information, K, is defined to be I − J. To apply the EPI principle
one now need only solve the variational equation δK = 0. Hopefully this will prove
fruitful in deriving general relations that a physical manifold must then satisfy.
Philosophical Digression:
It might seem that in describing the characteristics a solution must satisfy on
a manifold that we are completely defining the state the solution must take. This
is not the case on complete manifolds because it seems to me against the very
principles of Fisher Information Theory to be able to make an infinite measurement
on a space (recall the EPI principle basically asserts physical behaviour arises as a
consequence of measurement), since this would require an experimental apparatus
of infinite capacity! So only finite domains can be measured, and, as before in my
discussion about extremising length, the solution may well not be unique. Of these
possible solutions, the one that the domain ultimately takes will depend on the
behaviour of the manifold outside.
Unbiased Estimators and the Cramer-Rao Inequality
this property can be thought of as the condition of attaining equality, or, at the very
least, local minimality, in what I shall call the Weak Cramer-Rao Inequality.
To put things into the terminology of Murray and Rice, a fuzzy Riemannian
manifold (M, A, f ) which is complete with respect to each metric in its corresponding
metric indexing space A is a complete exponential family; the sample space of events
is diffeomorphic to M × A; the space in which events occur is M ; and the likelihood
function is f : M × A → R.
From now on I shall assume that we are dealing with a fixed fuzzy Riemannian
manifold Λ = (M, A, f) such that M is complete with respect to each metric σ(·, a),
for any a ∈ A.
\[ E_m(\theta\circ u) = \theta(m) \]
where $\theta : M \to \mathbb{R}^n$ is a choice of coordinates for M and
$E_m(\lambda) = \int_A \lambda(m, a) f(m, a)\,da$, m ∈ M.
\[ E_m\Big((\theta_i\circ u)\,\frac{\partial\ln(f\circ u)}{\partial\theta_j}\Big) = 0 \]
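\[ \int_A \frac{\partial}{\partial\theta_j}(\theta_i f)\,da = \delta_{ji} + \int_A \Big((\theta_i\circ u)\,\frac{\partial(f\circ u)}{\partial\theta_j}\Big) = \delta_{ji} \]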
If we use the fact that $\frac{\partial f}{\partial\theta_j} = \frac{\partial\ln(f)}{\partial\theta_j}\,f$, we see that the relation follows.
\[ g^{ij}(f) = E_m\Big(\frac{\partial\ln(f)}{\partial\theta_i}\,\frac{\partial\ln(f)}{\partial\theta_j}\Big) \]
Proof. (Cramer-Rao).
Note that
Upon applying our lemma for unbiased estimators on the second term, then
integrating by parts, we get
\[ \frac{\partial\ln(f(v(m,a),a))}{\partial\theta_i} = 0 \]
Lemma 7.3.2. Any mle of an exponential family realises the Cramer-Rao lower
bound.
Lemma 7.3.3. Weakly unbiased estimators satisfy the additional relation that
\[ E\Big(\theta_i\,\frac{\partial\ln(f)}{\partial\theta_j}\Big) = 0. \]
Corollary 7.3.4. If our likelihood function f decays sufficiently fast towards
infinity from the center of our coordinates θ, then E(θ ◦ Id) = 0.
\[ E(\theta\circ Id) = \int_A\int_M \theta f\,dm\,da = \int_A \Big\{ \int_{\partial M} \phi f\,dS - \int_M \phi\,\frac{\partial f}{\partial\theta}\,dm \Big\}\,da \]
Now the first term on the right disappears since f decays sufficiently fast. The
second term on the right disappears by the lemma.
Lemma 7.3.5. Any estimator Γ of the form $\Gamma = \exp(-\psi/f)\,\partial_\sigma f$ is a weakly unbiased
estimator, where we define ψ by the conservation equation
\[ \mathrm{div}_\Lambda \psi = \Delta_\Lambda f \]
where $\mathrm{div}_\Lambda$ and $\Delta_\Lambda$ are the standard fuzzy differential operators. I shall call such
estimators physical estimators.
Proof. In a physical situation, which is the situation to which this is to apply, f will
decay sufficiently quickly that we can use our remark. So all we need to show is
that:
E(θ ◦ Γ) = 0
Proof. The above inequality is certainly not true for general weakly unbiased esti-
mators since the metric σ is not positive definite. So we need to show that
and
σij g ij (f (u)) ≥ 0.
In fact, what I will end up doing is proving that the second term vanishes for
physical estimators and that the following equality holds for the first:
\[ \int_M \sigma_{ij}\,\mathrm{cov}((\theta_i\circ u), (\theta_j\circ u)) = \int_M\int_A \Big(\frac{\|\partial_\sigma f\|^2}{f} - \frac{\|\psi\|^2}{f}\Big) \]
Lemma 7.3.7. For the choice of Γ for an estimator, the Cramer-Rao inequality
becomes
\[ \int_M\int_A \frac{\|\partial_\sigma f\|^2}{f}\,da\,dm - \int_M\int_A \frac{\|\psi\|^2}{f} \ge 0 \tag{7.6} \]
Remarks. Note that the first term of the above inequality is the channel infor-
mation as I defined it before, and the second term is the bound information. So
what this inequality is saying is that the difference of these, which I likewise defined
as the physical information, must be greater than or equal to zero. In particular,
since Λ is an exponential family, any mle of it will realise the inequality critically.
To choose such a critical mle amounts to computing the variation of K with respect
to the signal function f . This provides the sort of backing we were looking for before
to motivate the variational analysis to follow.
Proof. (Completion of Proof). I prove that
$\sigma_{ij}\,\mathrm{cov}((\theta_i\circ\Gamma), (\theta_j\circ\Gamma)) = \frac{\|\partial_\sigma f\|^2}{f} - \frac{\|\psi\|^2}{f}$.
So
Expanding, we get
\[ \int_A \Big(\frac{\partial_i f\,\partial_j f}{f} + \frac{\psi_i\psi_j}{f}\Big) - 2\int_A \frac{\partial_i f}{f}\,\frac{\psi_j}{f}\,f \]
Now
\[ \int_M\int_A \sigma_{ij}\,\frac{\partial_i f}{f}\,\frac{\psi_j}{f}\,f = \int_M\int_A \sigma_{ij}\,\psi_i\psi_j\,\frac{1}{f} \]
by Stokes' theorem.
It remains to show that $\int_M \sigma_{ij}\,g^{ij}(f(\Gamma))\,dm = 0$.
Now
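\begin{align*}
g^{ij}(f(\Gamma)) &= \int_A \frac{\partial\ln(f(\Gamma))}{\partial\Gamma_k}\,\frac{\partial\Gamma_k}{\partial\theta_i}\,\frac{\partial\ln(f(\Gamma))}{\partial\Gamma_l}\,\frac{\partial\Gamma_l}{\partial\theta_j}\,f \\
&= \int_A \frac{\partial\ln(f(\Gamma))}{\partial\Gamma_k}\,\frac{\partial(\mathrm{curl}\,V)_k}{\partial\theta_i}\,\frac{\partial\ln(f(\Gamma))}{\partial\Gamma_l}\,\frac{\partial(\mathrm{curl}\,V)_l}{\partial\theta_j}\,f
\end{align*}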
Lemma 7.3.8. Any weak mle of an exponential family realises the weak Cramer-Rao
lower bound.
Further Remarks. Now it may seem that it is a bit restrictive to only consider
physical estimators, but I claim that we can transform any map u : M × A → M
into a physical estimator. All one needs to do is choose an appropriate likelihood
function f s.t. E(θ ◦ u) = 0 and $u = \exp(-\psi/f)(\partial_\sigma f)$ for some appropriate ψ that
will itself depend on f . This transforms our problem into one of varying likelihood
functions, which makes good physical sense.
Furthermore, the analysis in this section has rested often on the fact that various
objects vanish at infinity. In certain situations of interest (for instance, modelling
the event horizons of black holes, or of the behaviour of charged plasmas contained
in a finite vessel), this is not enough, and it is necessary to consider finite domains,
or at least domains with boundary. One is led to the following conjecture, motivated
by similar work done by physicists working on the physics of black holes:
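Conjecture. The physical information, $K_M$, of the manifold and the physical
information, $K_{\partial M}$, of its boundary are related by the following inequality:
\[ K_M - K_{\partial M} \ge 0 \]
where
\[ K_{\partial M} = \int_{\partial M}\int_A \Big\{ \frac{\|\partial_\tau f\|^2_\tau}{f} - \frac{\|\psi\|^2_\tau}{f} \Big\} \]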
We can think of this as a kind of "holographic principle" for the physical
information K. Note that our earlier result can be considered a special case of the
above, for ∂M = ∅. Note that if this conjecture is true, as a particular consequence
it suffices in applications to discard all boundary terms that crop up in a calculation,
or, in other words, to consider problems with Dirichlet boundary conditions.
Final Remark. All of this suggests that we define ourselves an information
functional
\[ K(f) = \int_M\int_A \frac{\|\partial_\sigma f\|^2}{f}\,da\,dm - \int_M\int_A \frac{\|\psi\|^2}{f} \]
and play the following game: choose a family of signal functions f and minimise
this functional within this family. The best we could hope for of course would be to
zero the above functional. However, in most circumstances, all we can hope for is to
make it as small as possible. This is the game that I shall play in what is to follow.
Lemma 7.3.9. Weakly unbiased estimators satisfy the additional relation that
\[ E\Big(\theta_i\,\frac{\partial\ln(f)}{\partial\theta_j}\Big) = 0. \]
\[ E(\theta\circ Id) = \int_A\int_M \theta f\,dm\,da = \int_A \Big\{ \int_{\partial M} \phi f\,dS - \int_M \phi\,\frac{\partial f}{\partial\theta}\,dm \Big\}\,da \]
Now the first term on the right disappears since f decays sufficiently fast. The
second term on the right disappears by the lemma.
Lemma 7.3.11. Any estimator Γ of the form Γ = ∂Λ f is a weakly unbiased esti-
mator, where f satisfies the conservation equation
0 = ∆Λ f
where ∆Λ is the standard fuzzy differential operator for the Laplacian, and ∂Λ
is the gradient operator associated to the space. I shall call such estimators physical
estimators.
Remark. Note that a physical estimator corresponds precisely to a geometry with
signal function where probability flux is conserved. So physics occurs in spaces that
"make sense".
Proof. In a physical situation, which is the situation to which this is to apply, f will
decay sufficiently quickly that we can use our remark. So all we need to show is
that:
E(θ ◦ Γ) = 0
Fisher Information is the optimal sharp information measure
Theorem 7.3.14. (The EPI principle (for Riemann-Cartan manifolds)). With the
understanding that a physical estimator is to be used, the Cramer-Rao inequality is
equivalent to the nonnegativity of the Fisher Information. In other words, given a
distribution f over a space of n-dimensional nondegenerate bilinear forms of constant
index Λ, we have that
\[ I(f) := \int_M\int_A \frac{\|\partial_\Lambda f\|^2}{f} \ge 0 \]
This proves the EPI principle in the form we would like to use it.
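To make the functional concrete, here is a minimal numerical sketch of its flat, one-dimensional analogue, the integral of (f')^2 / f, with the ordinary derivative standing in for the fuzzy gradient ∂_Λ. This substitution, the choice of a Gaussian density, and the helper names are assumptions made for illustration only. For a Gaussian of standard deviation s the integral is known to equal 1/s^2, and the integrand is pointwise nonnegative.

```python
import math


def gaussian_density(x, s):
    """Normal density with mean 0 and standard deviation s."""
    return math.exp(-x * x / (2.0 * s * s)) / (s * math.sqrt(2.0 * math.pi))


def fisher_information_1d(density, lower, upper, steps=100_000, h=1e-5):
    """Midpoint-rule estimate of the integral of (f')^2 / f over [lower, upper].

    f' is approximated by a central finite difference with step h.
    """
    dx = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * dx
        f = density(x)
        df = (density(x + h) - density(x - h)) / (2.0 * h)
        total += df * df / f  # each summand is nonnegative because f > 0
    return total * dx


# Integrate over ten standard deviations either side of the mean.
info_unit = fisher_information_1d(lambda x: gaussian_density(x, 1.0), -10.0, 10.0)
info_wide = fisher_information_1d(lambda x: gaussian_density(x, 2.0), -20.0, 20.0)
```

The two estimates come out close to 1 and 1/4 respectively: a wider (less sharply localised) density carries less Fisher information.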
(i) What is the mathematical justification for using the EPI principle?
(ii) Why the Fisher information? Why not some other arbitrary information mea-
sure?
\[ I(L) = \int_M\int_A \frac{\|Lf\|^2}{f} \]
Remark. This does not preclude the fact that there may be other (locally) optimal
measures. However it is a good starting point for further philosophical discussion.
For now I will restrict myself to the case that the signal function is sharp.
Let S(M ) be the space of functionals F : (M → R) → R on our manifold M .
We would like to come up with some natural parametrisation of these.
We expect such a parametrisation to be compatible with the metric such that
our functionals F are locally diagonalisable with signature identical to the metric.
So locally our space has the dimension of the space of positive matrices on R^n.
Just as with our inner products for statistical manifolds as before, we play the
same game and define a probability density functional λ(m, a, h) such that for a
fixed (m, a), $\int_B \lambda(m, a, h)\,dh = 1$, and also, as before, $\int_B \Lambda(\lambda(m, a, h))\,dh = 0$ (with
the appropriate conventions for fuzzy derivatives).
Then, we may define the meta-information to be
\[ I = \int_M\int_A\int_B \lambda(m, a, b)\,\frac{\|L_{(m,a,b)} f(m,a)\|^2_{(m,a)}}{f(m,a)} \]
Make the assumption now that the information distribution functional is sharp,
that is, a particular information is preferred. Then λ(m, a, b) = δ(b − κ(m, a)), and
\[ I = \int_M\int_A \frac{\|\kappa_{jk}(m,a)(\partial_j)^k f(m,a)\|^2}{f(m,a)} \]
Take the first variation with respect to κ and require that it be zero; we get
immediately that κ is constant in m and a if it is critical. Furthermore, it is constant
in k, if we use the notion of generalised derivative (to real numbers). So κ is
essentially just $\kappa_{ij}(\partial_j)^k$ for some fixed k where $\kappa_{ij}$ is a constant matrix. One may
then make the observation that it is possible to find a new metric $\bar\sigma$ such that
$\|\kappa_j(\partial_j)^k f\|^2_\sigma = \|\partial^k f\|^2_{\bar\sigma}$, i.e. it is possible to absorb κ completely into the metric σ.
So we have reduced the problem to one of examining the so called k-information.
Why then should the 1-information be preferred?
I guess then the ultimate conclusion is this:
Theorem 7.4.2. For each k ∈ R, the k-information is optimal.
However it shall turn out that this is relatively irrelevant when it comes to
matters of physics, leaving one with a (potentially infinite) number of integration
constants. I suppose if there was any way to say that the one information is preferred
it would be simply that there is no guide as to how to determine these integration
constants; therefore assuming they exist is a nonsense (overspecification), which
rules out criticality for all informations except one or less. But if one considers the
k-information for k < 1 one gets nothing at all (underspecification). So k = 1 is the
only possibility.
This shall all hopefully become clearer to the reader after perusing the section
to come.
Theorem 7.4.3. For any exponential family (statistical structure) the associated
mle is unique.
Remark. In particular via my construction this means that for every geometric struc-
ture there is a unique mle.
Also from [40] we have that
Theorem 7.4.4. The unique mle realises the Cramer-Rao lower bound for the Fisher
Information.
So the Fisher information is the best information in the regard that it is the
unique functional which is zeroed by the estimator of maximal likelihood. This is
in accord with our common sense as to how an information should behave.
For yet another perspective, consider the Cencov Uniqueness Theorem [43],[44],[10].
This essentially states that for any coarse graining (averaging over the geometry of
a space), the Fisher information of the new geometry is less than that of the old
geometry. This means that the Fisher information is a special information, in that
it directly captures what we would expect from the 2nd law of thermodynamics: as
the disorder of the system increases, the Fisher information decreases.
Chapter 8
We are now finally ready to put theory to application. This brings me back to the
signal functions that I introduced before in my discussion of statistical manifolds.
There was in fact a very good reason for studying these things, since it is these
particular classes of objects (sharp and almost sharp manifolds) that will be the
basis of my study in this section. It will turn out that, by assuming that a manifold
is sharp and applying our variational principle, we shall recover the equations of
general relativity; furthermore, if we relax our restrictions a bit and assume that our
space is slightly fuzzy, then rinse and repeat our variational calculation, we shall be
able to derive the underlying equations on which one can base the standard model.
8.1.1 Introduction
Let us now suppose f (m, a) = δ(σ(m) − a) again. Our aim is to derive the classical
equations for a physical manifold. The first step is to reexpress I and J using this
signal function. Clearly for this choice, ψ(m, a) = ψ(m, a)δ(σ(m) − a).
Remark : Note that, when all is said and done, we will get a value for the
physical information K ≥ 0. In fact it is extremely unlikely that we will be able
to get K(σ) = 0 for any metric σ. However the point of this exercise is to find the
conditions on σ for it to minimise, or optimise, our information functional as defined
before.
Physics for Sharp Manifolds
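Substituting into I, and observing that
$\mathrm{grad}_\Lambda\,\delta(\sigma(m) - a) = \mathrm{grad}_\sigma\,\sigma_{jk}\,\frac{\partial}{\partial\sigma_{jk}}\delta(\sigma(m) - a)$
we get that
\begin{align*}
I = I(\sigma) &= \int_M\int_A \frac{1}{4\delta}\,\Big\langle \partial_\sigma\sigma_{jk}\,\frac{\partial}{\partial\sigma_{jk}}\delta(\sigma(m) - a),\; \partial_\sigma\sigma_{mn}\,\frac{\partial}{\partial\sigma_{mn}}\delta(\sigma(m) - a) \Big\rangle\,(\det\sigma)^{1/2}\,da\,dm \\
&= \int_M\int_A \frac{1}{4\delta}\,\frac{\partial\delta}{\partial\sigma_{jk}}\,\frac{\partial\delta}{\partial\sigma_{mn}}\,\langle \partial_\sigma\sigma_{jk}, \partial_\sigma\sigma_{mn} \rangle\,(\det\sigma)^{1/2}\,da\,dm
\end{align*}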
Now, before I proceed any further, I shall assume that in the general expression
for ψ deduced from the conservation equation, ψ(m, a) = gradΛ f (m, a) + B(m, a)
that B = 0. Later I shall relax this restriction, and show that, at least in the
classical case, B corresponds to the presence of electromagnetic effects.
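\[ \int_M\int_A \frac{1}{\delta}\,\partial_i(\delta)\,\partial_j(\delta)\,\Sigma\,da\,dm = \int_M\int_A \partial_i\partial_j(\Sigma)\,\delta\,da\,dm, \]
provided that $\Sigma(1/\sqrt{a}) \sim 1/a^p$ for some p > 1, $a \gg 1$.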
Claim 2 :
$\langle \partial_\sigma x_{jk}, \partial_\sigma x_{mn} \rangle$ satisfies the asymptotic property in Claim 1, for very small
values in the entries of the matrix $y = (x_{jk} - \sigma_{jk})^n_{j,k=1}$.
Claim 3 :
\[ I = I(\sigma) = \frac{1}{4}\int_M R\,(\det\sigma)^{1/2}\,dm \]
\[ J = J(\sigma) = \int_{M\times A} \frac{\|\psi(m,a)\|^2}{4f(m,a)}\,da\,dm = \int_M \frac{\|\psi\|^2}{4}\,dm \]
Physics from Fisher Information
Claim 4 :
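Assuming boundary terms are negligible, we get that
$\delta(4I) = \int_M (R_{ij} - \frac{1}{2}\sigma_{ij}R)(\det\sigma)^{1/2}\,\delta\sigma^{-1}$
and $\delta(4J) = \int_M (\bar\psi_i\bar\psi_j - \frac{1}{2}\sigma_{ij}\|\bar\psi\|^2)(\det\sigma)^{1/2}\,\delta\sigma^{-1}$, where δ is now the variational
derivative.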
Substituting the expressions from the final claim into the equation δK = 0 we
get
\[ G_{ij} = \bar\psi_i\bar\psi_j - \frac{1}{2}\sigma_{ij}\|\bar\psi\|^2 \]
This is remarkably similar to the Einstein equation, but with the additional term
$\frac{1}{2}\sigma_{ij}\|\bar\psi\|^2$.
This term might explain why cosmologists using the standard Einstein
equation to describe large objects like galaxies get puzzling results, and are forced
to invoke the presence of large amounts of (as yet unobserved) dark matter/energy.
It might also explain the Pioneer anomaly (the discrepancy between the predicted
flight path of the now famous probe out of the solar system and its actual journey).
Note: It can be shown that ψ̄ corresponds to the four momentum if we consider
toy examples like Minkowski space.
Also observe that this equation can be simplified, for if we contract it on both
sides with respect to the metric we obtain
$$R - \tfrac{n}{2}R = \|\bar\psi\|^2 - \tfrac{n}{2}\|\bar\psi\|^2$$
where we are assuming n = dimM > 2 (in applications, n will usually be 4).
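Since n ≠ 2, the contracted equation forces $R = \|\bar\psi\|^2$. Substituting this back, the
equation of state simplifies to
$$R_{ij} = \bar\psi_i\bar\psi_j$$
or, in coordinate invariant form
$$\mathrm{Ric} - \bar\psi\otimes\bar\psi = 0 \qquad (8.1)$$
together with the geodesic condition
$$\nabla_{(\bar\psi,\sigma)}\frac{\bar\psi}{\|\bar\psi\|} = 0$$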
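The contraction step can be checked numerically. The following is a minimal sketch, assuming a flat diagonal Lorentz metric and an arbitrary covector standing in for ψ̄ (both are illustrative choices, not taken from the text): if Ric = ψ̄ ⊗ ψ̄ is imposed, then the unreduced equation for G_ij holds identically, and the trace of G_ij is (1 − n/2)R.

```python
# Illustrative data (assumptions): a flat diagonal Lorentz metric and an
# arbitrary covector standing in for psi-bar.
n = 4
sigma = [[0.0] * n for _ in range(n)]
for i in range(n):
    sigma[i][i] = -1.0 if i == 0 else 1.0
sigma_inv = [row[:] for row in sigma]  # diag(-1, 1, 1, 1) is its own inverse

psi = [0.3, -1.2, 0.7, 2.0]  # hypothetical components psi_i


def trace(t):
    """Contract a (0,2)-tensor with the inverse metric: sigma^{ij} t_{ij}."""
    return sum(sigma_inv[i][j] * t[i][j] for i in range(n) for j in range(n))


# Impose the reduced equation Ric = psi (x) psi.
ric = [[psi[i] * psi[j] for j in range(n)] for i in range(n)]
scalar = trace(ric)  # scalar curvature R
norm2 = sum(sigma_inv[i][j] * psi[i] * psi[j] for i in range(n) for j in range(n))

# Both sides of G_ij = psi_i psi_j - (1/2) sigma_ij |psi|^2.
einstein = [[ric[i][j] - 0.5 * sigma[i][j] * scalar for j in range(n)] for i in range(n)]
rhs = [[psi[i] * psi[j] - 0.5 * sigma[i][j] * norm2 for j in range(n)] for i in range(n)]

max_residual = max(abs(einstein[i][j] - rhs[i][j]) for i in range(n) for j in range(n))
trace_gap = abs(trace(einstein) - (1 - n / 2) * scalar)
print(max_residual, trace_gap)
```

Both printed numbers should be zero up to rounding, which is all the reduction from G_ij to Ric uses.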
I prove this for the one dimensional case, but this example is easily extended
to the general case. First observe that $\delta(x) = \lim_{a\to\infty} f(a,x)$, where
$f(a,x) = \sqrt{a/\pi}\,\exp(-ax^2)$. Of course this definition is not unique, but by a deep
theorem in functional analysis it does not matter what limiting functions we use, as long as
they are smooth, since I am going to take derivatives.
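As a sanity check on the Gaussian family, here is a small numerical sketch. The test function cos and the integration window are my own assumptions for illustration. It shows the integral of f(a, x) against a smooth function approaching the value of that function at 0 as a grows.

```python
import math


def f(a, x):
    """Gaussian approximation sqrt(a/pi) * exp(-a x^2) to the delta function."""
    return math.sqrt(a / math.pi) * math.exp(-a * x * x)


def smeared(a, test, half_width=10.0, steps=20001):
    """Trapezoidal estimate of the integral of f(a, x) * test(x) over a finite window."""
    h = 2.0 * half_width / (steps - 1)
    total = 0.0
    for i in range(steps):
        x = -half_width + i * h
        weight = 0.5 if i in (0, steps - 1) else 1.0
        total += weight * f(a, x) * test(x)
    return total * h


# Distance from test(0) for increasingly sharp Gaussians.
errors = [abs(smeared(a, math.cos) - math.cos(0.0)) for a in (1.0, 10.0, 100.0)]
print(errors)
```

The errors shrink roughly like 1/(4a), consistent with the Gaussian concentrating at the origin.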
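Then
$$
\begin{aligned}
\int(\delta'(x))^2\,\frac{1}{\delta(x)}\,\Sigma(x)\,dx &= \lim_{a\to\infty}\int\left(\frac{\partial f(a,x)}{\partial x}\right)^2\Big/ f(a,x)\ \Sigma\,dx \\
&= \lim_a\int\sqrt{\frac{a}{\pi}}\,4a^2x^2e^{-ax^2}\,\Sigma(x)\,dx \\
&= \lim_a\frac{4a^{5/2}}{\sqrt\pi}\,\frac{\partial}{\partial a}\left(a^{-1/2}\int e^{-u^2}\,\Sigma(u/\sqrt a)\,du\right),\quad u = \sqrt a\,x \\
&= \lim_a\frac{4a^{5/2}}{\sqrt\pi}\left(-\frac{1}{2a^{3/2}}\int e^{-u^2}\,\Sigma(u/\sqrt a)\,du\right. \\
&\qquad\left. +\ a^{-1/2}\int\frac{-1}{2a^{3/2}}\,u\,e^{-u^2}\,\Sigma'(u/\sqrt a)\,du\right) \qquad (8.2)
\end{aligned}
$$
The first term vanishes by our asymptotic assumption on Σ. We are left with
$$
\begin{aligned}
&\lim_a\frac{4a^2}{2\sqrt\pi\,a^{3/2}}\int e^{-v}\,\Sigma''(\sqrt{v/a})\,\frac{1}{2\sqrt{va}}\,dv,\quad (v = u^2), \\
&= \lim_a\frac{4}{2\sqrt\pi}\int\frac{2u}{2u}\,e^{-u^2}\,\Sigma''(u/\sqrt a)\,du \\
&= \lim_a\frac{1}{\sqrt\pi}\int e^{-u^2}\,\Sigma''(u/\sqrt a)\,du \\
&= \Sigma''(0) \\
&= \int\delta''(x)\,\Sigma(x)\,dx, \qquad (8.3)
\end{aligned}
$$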
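$$
\begin{aligned}
0 &= \int_{U\times A}\frac{\partial\delta}{\partial\sigma_{jk}}\,\mathrm{div}_\sigma(\partial_\sigma\sigma_{jk})\,da\,dm \\
&= \int_{\partial U\times A}\frac{\partial\delta}{\partial\sigma_{jk}}\,\langle\partial_\sigma\sigma_{jk},\hat n\rangle\,dS\,da \qquad (8.4)
\end{aligned}
$$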
$$0 = \int\delta'\,\Sigma$$
and
$$\delta = \lim_{a\to\infty}\sqrt{\frac{a}{\pi}}\,e^{-ax^2} = \lim_{a\to\infty} g(a,x)$$
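So
$$
\begin{aligned}
0 &= -\lim_{a\to\infty} 2\int_{-\infty}^{\infty}\sqrt{\frac{a}{\pi}}\,a\,x\,e^{-ax^2}\,\Sigma(x)\,dx \\
&= -\lim_{a\to\infty} 2\,\frac{a}{\pi^{1/2}}\int_{-\infty}^{\infty} u\,e^{-u^2}\,\Sigma\!\left(\frac{u}{\sqrt a}\right)du \\
&= -\lim_{a\to\infty}\frac{a}{\sqrt\pi}\int_0^\infty e^{-t}\,\Sigma(\sqrt{t/a})\,dt \\
&= -\lim_{a\to\infty}\frac{a}{\sqrt\pi}\,\Sigma(1/\sqrt a) \qquad (8.5)
\end{aligned}
$$
and the only way this can be zero is if $\Sigma(1/\sqrt a)\sim\frac{1}{a^p}$, p > 1, for large a.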
Translating back to our situation, this means that < ∂σ σjk , n̂ >= 0
Now, consider the level sets of σmn in M . These sets will be codimension one, and
have a normal n̂ that is parallel to ∂σ σmn . Hence on these sets < ∂σ σjk , ∂σ σmn >= 0.
Now sweep out all of M by following the vector field ∂σ σmn to create a family of
sets covering all of M with this property.
Hence the second Claim is proven.
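$$
\begin{aligned}
I(\sigma) &= \int_{M\times A}\frac{1}{4\delta}\,\frac{\partial\delta}{\partial\sigma_{jk}}\frac{\partial\delta}{\partial\sigma_{mn}}\,\langle\partial_\sigma\sigma_{jk},\partial_\sigma\sigma_{mn}\rangle(\det\sigma)^{1/2}\,da\,dm \\
&= \int_{M\times A}\frac14\,\delta(\sigma(m)-a)\,\frac{\partial}{\partial\sigma_{jk}}\frac{\partial}{\partial\sigma_{mn}}\left(\langle\partial_\sigma\sigma_{jk},\partial_\sigma\sigma_{mn}\rangle(\det\sigma)^{1/2}\right)da\,dm \\
&= \frac14\int_M\frac{\partial}{\partial\sigma_{jk}}\frac{\partial}{\partial\sigma_{mn}}\left(\langle\partial_\sigma\sigma_{jk},\partial_\sigma\sigma_{mn}\rangle(\det\sigma)^{1/2}\right)dm \qquad (8.6)
\end{aligned}
$$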
I now demonstrate that this new expression is equivalent to one quarter the
integral of the scalar curvature.
First observe that the scalar curvature is the contraction of the Ricci curvature
with respect to the metric:
$$R = \sigma^{ij}\,\mathrm{Ric}_{ij}.$$
Lemma 8.1.1. I claim that the first three terms evaluate to zero after contraction,
that is, that the scalar curvature can be written in terms of coordinates as:
$$R = -\sigma^{i\alpha}\,\Gamma^m_{il}\,\Gamma^l_{\alpha m}$$
From this point on, without loss of generality, I will make the convenient
assumption that the coordinate vectors ∂i have been chosen so that [∂i , ∂j ] = 0. This
has the natural consequence that the Christoffel symbols are antisymmetric in their
lower indices, that is,
$$\Gamma^k_{ij} = -\Gamma^k_{ji}$$
$$\Gamma^m_{ij} = \tfrac12\left(\partial_i\sigma_{jk} + \partial_j\sigma_{ki} - \partial_k\sigma_{ij}\right)\sigma^{km} \qquad (8.7)$$
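$$
\begin{aligned}
(\partial_l\Gamma^l_{i\alpha} - \partial_\alpha\Gamma^l_{il})\sigma^{i\alpha}
&= \tfrac12\{\partial_l[(\partial_i\sigma_{\alpha k} + \partial_\alpha\sigma_{ki} - \partial_k\sigma_{i\alpha})\sigma^{kl}] - \partial_\alpha[(\partial_i\sigma_{lk} + \partial_l\sigma_{ki} - \partial_k\sigma_{li})\sigma^{kl}]\}\sigma^{i\alpha} \\
&= \tfrac12\{(\partial_i\sigma_{\alpha k} + \partial_\alpha\sigma_{ki} - \partial_k\sigma_{i\alpha})\sigma^{kl}(-\sigma^{kl}\partial_l\sigma_{kl}) \\
&\qquad - (\partial_i\sigma_{lk} + \partial_l\sigma_{ki} - \partial_k\sigma_{li})\sigma^{kl}(-\sigma^{kl}\partial_\alpha\sigma_{kl})\}\sigma^{i\alpha} \\
&\quad + \tfrac12\{\partial_l(\partial_i\sigma_{\alpha k} + \partial_\alpha\sigma_{ki} - \partial_k\sigma_{i\alpha}) - \partial_\alpha(\partial_i\sigma_{lk} + \partial_l\sigma_{ki} - \partial_k\sigma_{li})\}\sigma^{kl}\sigma^{i\alpha} \\
&= -\tfrac12[(\partial_i\sigma_{\alpha k})(\partial_l\sigma_{kl})(\sigma^{kl})^2\sigma^{i\alpha} + (\partial_\alpha\sigma_{ki})(\partial_l\sigma_{kl})(\sigma^{kl})^2\sigma^{i\alpha} \\
&\qquad - (\partial_k\sigma_{i\alpha})(\partial_l\sigma_{kl})(\sigma^{kl})^2\sigma^{i\alpha}] - \tfrac12[(\partial_l\sigma_{ik})(\partial_\alpha\sigma_{kl})(\sigma^{kl})^2\sigma^{i\alpha} \\
&\qquad + (\partial_i\sigma_{kl})(\partial_\alpha\sigma_{kl})(\sigma^{kl})^2\sigma^{i\alpha} - (\partial_k\sigma_{li})(\partial_\alpha\sigma_{li})(\sigma^{kl})^2\sigma^{i\alpha}] \qquad (8.9)
\end{aligned}
$$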
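Now
$$\tfrac12\{\partial_l\partial_i\sigma_{\alpha k} + \partial_l\partial_\alpha\sigma_{ki} - \partial_l\partial_k\sigma_{i\alpha} - \partial_\alpha\partial_l\sigma_{ik} - \partial_\alpha\partial_i\sigma_{kl} + \partial_\alpha\partial_k\sigma_{li}\}\,\sigma^{kl}\sigma^{i\alpha} = 0 \qquad (8.10)$$
after one observes that $\sigma^{kl}\sigma_{i\alpha} = \delta_{ki}\delta_{l\alpha}$ and plugs things through.
The remaining problem term, $\sigma^{ik}\Gamma^l_{ik}\Gamma^m_{lm}$, is zero because $\sigma^{ik} = \sigma^{ki}$ and
$\Gamma^l_{ik} = -\Gamma^l_{ki}$. A simple argument using these facts plus the judicious replacement of
dummy indices will then do the job.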
This proves the lemma.
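The vanishing of the last term rests on a general fact: contracting a symmetric array against an antisymmetric one over the same pair of indices gives zero. A minimal sketch follows, with randomly generated arrays standing in for σ^{ik} and for an object antisymmetric in (i, k). The arrays are purely illustrative and are not actual Christoffel symbols.

```python
import random

random.seed(0)
n = 4
raw_s = [[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
raw_a = [[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]

# Symmetric part (stand-in for sigma^{ik}) and antisymmetric part
# (stand-in for an object with A_{ik} = -A_{ki}).
sym = [[0.5 * (raw_s[i][k] + raw_s[k][i]) for k in range(n)] for i in range(n)]
anti = [[0.5 * (raw_a[i][k] - raw_a[k][i]) for k in range(n)] for i in range(n)]

# Full contraction sym^{ik} anti_{ik}: each (i, k) term cancels its (k, i) partner.
contraction = sum(sym[i][k] * anti[i][k] for i in range(n) for k in range(n))
print(contraction)
```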
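$$\frac{\partial}{\partial\sigma_{jk}}\frac{\partial}{\partial\sigma_{mn}}\left(\langle\partial_\sigma\sigma_{jk},\partial_\sigma\sigma_{mn}\rangle(\det\sigma)^{1/2}\right) = -\sigma^{i\alpha}\,\Gamma^m_{il}\,\Gamma^l_{\alpha m}\,(\det\sigma)^{1/2}$$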
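Now, under application of the second claim to remove a few unnecessary terms,
we get that
$$
\begin{aligned}
&\frac{\partial}{\partial\sigma_{jk}}\frac{\partial}{\partial\sigma_{mn}}\left(\langle\partial_\sigma\sigma_{jk},\partial_\sigma\sigma_{mn}\rangle(\det\sigma)^{1/2}\right) \\
&= \sigma^{i\alpha}\left[\frac{\partial}{\partial\sigma_{jk}}(\partial_i\sigma_{jk})\frac{\partial}{\partial\sigma_{mn}}(\partial_\alpha\sigma_{mn}) + \frac{\partial}{\partial\sigma_{mn}}(\partial_i\sigma_{jk})\frac{\partial}{\partial\sigma_{jk}}(\partial_\alpha\sigma_{mn})\right](\det\sigma)^{1/2} \\
&\quad + \frac{\partial}{\partial\sigma_{jk}}(\sigma^{i\alpha})\left[\frac{\partial}{\partial\sigma_{mn}}(\partial_i\sigma_{jk})\,\partial_\alpha\sigma_{mn} + \partial_i\sigma_{jk}\frac{\partial}{\partial\sigma_{mn}}(\partial_\alpha\sigma_{mn})\right](\det\sigma)^{1/2} \\
&\quad + \frac{\partial}{\partial\sigma_{mn}}(\sigma^{i\alpha})\left[\frac{\partial}{\partial\sigma_{jk}}(\partial_i\sigma_{jk})\,\partial_\alpha\sigma_{mn} + \partial_i\sigma_{jk}\frac{\partial}{\partial\sigma_{jk}}(\partial_\alpha\sigma_{mn})\right](\det\sigma)^{1/2} \\
&\quad + \sigma^{i\alpha}\frac{\partial}{\partial\sigma_{mn}}\left[(\partial_i\sigma_{jk})(\partial_\alpha\sigma_{mn})\right]\frac{\partial}{\partial\sigma_{jk}}(\det\sigma)^{1/2} \\
&\quad + \sigma^{i\alpha}\frac{\partial}{\partial\sigma_{jk}}\left[(\partial_i\sigma_{jk})(\partial_\alpha\sigma_{mn})\right]\frac{\partial}{\partial\sigma_{mn}}(\det\sigma)^{1/2} \qquad (8.11)
\end{aligned}
$$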
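$$
\begin{aligned}
&\sigma^{i\alpha}\left[\frac{\partial}{\partial\sigma_{jk}}(\partial_i\sigma_{jk})\frac{\partial}{\partial\sigma_{mn}}(\partial_\alpha\sigma_{mn}) + \frac{\partial}{\partial\sigma_{mn}}(\partial_i\sigma_{jk})\frac{\partial}{\partial\sigma_{jk}}(\partial_\alpha\sigma_{mn})\right](\det\sigma)^{1/2} \\
&= (\det\sigma)^{1/2}\sigma^{i\alpha}\big[(\Gamma^l_{ki}\delta_{jl}\delta_{kj} + \Gamma^l_{ij}\delta_{jl}\delta_{kk})(\Gamma^{\bar l}_{n\alpha}\delta_{m\bar l}\delta_{nm} + \Gamma^{\bar l}_{\alpha m}\delta_{m\bar l}\delta_{nn}) \\
&\qquad + (\Gamma^l_{ki}\delta_{ml}\delta_{nj} + \Gamma^l_{ij}\delta_{ml}\delta_{nk})(\Gamma^{\bar l}_{n\alpha}\delta_{j\bar l}\delta_{km} + \Gamma^{\bar l}_{\alpha m}\delta_{j\bar l}\delta_{kn})\big] \\
&= (\det\sigma)^{1/2}\sigma^{i\alpha}\big[\Gamma^k_{ki}\Gamma^n_{n\alpha} + \Gamma^j_{ji}\Gamma^m_{m\alpha} + \Gamma^l_{il}\Gamma^m_{m\alpha} \\
&\qquad + \Gamma^l_{il}\Gamma^m_{\alpha m} + \Gamma^m_{mi}\Gamma^n_{n\alpha} + \Gamma^m_{ji}\Gamma^j_{\alpha m} + \Gamma^m_{ij}\Gamma^j_{m\alpha} + \Gamma^m_{ij}\Gamma^j_{\alpha m}\big] \\
&= \sigma^{i\alpha}(\Gamma^m_{mi}\Gamma^n_{n\alpha} - \Gamma^m_{in}\Gamma^n_{\alpha m})(\det\sigma)^{1/2} \qquad (8.12)
\end{aligned}
$$
the last equality obtained using the fact that $\Gamma^k_{ij} = -\Gamma^k_{ji}$ several times.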
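Now it might appear that we have neglected terms like $\frac{\partial}{\partial\sigma_{jk}}\Gamma^l_{ki}$ but these all
vanish, because, for instance,
$$
\begin{aligned}
\frac{\partial}{\partial\sigma_{jk}}(\partial_i\sigma_{jk}) &= \Gamma^l_{ki}\delta_{jl}\delta_{kj} + \Gamma^l_{ij}\delta_{jl}\delta_{kk} + \frac{\partial}{\partial\sigma_{jk}}(\Gamma^l_{ki})\,\sigma_{lj} + \frac{\partial}{\partial\sigma_{jk}}(\Gamma^l_{ij})\,\sigma_{lk} \\
&= \Gamma^l_{ki}\delta_{jl}\delta_{kj} + \Gamma^l_{ij}\delta_{jl}\delta_{kk} \qquad (8.13)
\end{aligned}
$$
after we observe that the last two terms cancel after the relabelling of j ↦ k,
k ↦ j in the second last term and using the antisymmetry of the Christoffel symbols
in their lower indices. Similar results hold for all the other terms.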
Examining the second and third terms, we see that they are the same after
relabelling of dummy indices. Similarly for the fourth and fifth terms. Furthermore,
we observe that the second and fourth terms are related by the relation (second
term) = -2(fourth term), since we know that $\frac{\partial}{\partial\sigma_{jk}}(\det\sigma)^{1/2} = \tfrac12\sigma^{jk}(\det\sigma)^{1/2}$.
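The derivative of the volume factor used here is an instance of Jacobi's formula. As a hedged illustration, the sketch below compares a central finite difference of (det σ)^{1/2} with ½ σ^{jk} (det σ)^{1/2} for a hypothetical 2×2 symmetric positive-definite matrix, treating the entries as independent variables (an assumption of the sketch).

```python
import math


def det2(s):
    """Determinant of a 2x2 matrix."""
    return s[0][0] * s[1][1] - s[0][1] * s[1][0]


def inv2(s):
    """Inverse of a 2x2 matrix via the adjugate."""
    d = det2(s)
    return [[s[1][1] / d, -s[0][1] / d], [-s[1][0] / d, s[0][0] / d]]


sigma = [[2.0, 0.5], [0.5, 1.0]]  # hypothetical metric components sigma_jk
vol = math.sqrt(det2(sigma))
sigma_inv = inv2(sigma)

h = 1e-6
max_err = 0.0
for j in range(2):
    for k in range(2):
        plus = [row[:] for row in sigma]
        minus = [row[:] for row in sigma]
        plus[j][k] += h
        minus[j][k] -= h
        numeric = (math.sqrt(det2(plus)) - math.sqrt(det2(minus))) / (2.0 * h)
        predicted = 0.5 * sigma_inv[k][j] * vol  # (1/2) sigma^{jk} sqrt(det sigma)
        max_err = max(max_err, abs(numeric - predicted))
print(max_err)
```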
So the last three terms of (8.11) cancel, leaving us with only the second term
to evaluate:
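$$
\begin{aligned}
&\frac{\partial}{\partial\sigma_{jk}}(\sigma^{i\alpha})\left[\frac{\partial}{\partial\sigma_{mn}}(\partial_i\sigma_{jk})\,\partial_\alpha\sigma_{mn} + \partial_i\sigma_{jk}\frac{\partial}{\partial\sigma_{mn}}(\partial_\alpha\sigma_{mn})\right](\det\sigma)^{1/2} \\
&= -(\sigma^{i\alpha})^2\delta_{ji}\delta_{k\alpha}\big[(\Gamma^l_{ki}\delta_{ml}\delta_{nj} + \Gamma^l_{ij}\delta_{ml}\delta_{nk})\,\partial_\alpha\sigma_{mn} \\
&\qquad + (\partial_i\sigma_{jk})(\Gamma^l_{n\alpha}\delta_{ml}\delta_{nm} + \Gamma^l_{\alpha m}\delta_{ml}\delta_{nn})\big](\det\sigma)^{1/2} \\
&= -(\sigma^{jk})^2\big[(\Gamma^l_{kj}\delta_{ml}\delta_{nj} + \Gamma^l_{jj}\delta_{ml}\delta_{nk})\,\partial_k\sigma_{mn} + (\Gamma^l_{nk}\delta_{mn}\delta_{nm} \\
&\qquad + \Gamma^l_{km}\delta_{ml})\,\partial_j\sigma_{jk}\big](\det\sigma)^{1/2} \\
&= -(\sigma^{jk})^2\big[(\Gamma^m_{kj})\,\partial_k\sigma_{mj} + (\Gamma^m_{mk} + \Gamma^m_{km})\,\partial_j\sigma_{jk}\big](\det\sigma)^{1/2} \\
&= -(\sigma^{jk})^2\,\Gamma^m_{kj}\,\partial_k\sigma_{mj}\,(\det\sigma)^{1/2} \qquad (8.14)
\end{aligned}
$$
by antisymmetry of Γ in its lower indices.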
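Continuing:
$$
\begin{aligned}
-(\sigma^{jk})^2\,\Gamma^m_{kj}\,\partial_k\sigma_{mj}\,(\det\sigma)^{1/2}
&= -(\sigma^{i\alpha})^2\,\Gamma^m_{\alpha i}\,(\Gamma^l_{i\alpha}\sigma_{lm} + \Gamma^l_{\alpha m}\sigma_{li})(\det\sigma)^{1/2} \\
&= -\sigma^{i\alpha}(\Gamma^m_{\alpha i}\Gamma^l_{i\alpha}\delta_{il}\delta_{\alpha m} + \Gamma^l_{\alpha m}\Gamma^m_{\alpha i}\delta_{il}\delta_{\alpha i})(\det\sigma)^{1/2} \\
&= -\sigma^{i\alpha}(\Gamma^m_{mi}\Gamma^l_{l\alpha} + \Gamma^\alpha_{\alpha m}\Gamma^m_{\alpha\alpha}\delta_{\alpha i})(\det\sigma)^{1/2} \\
&= -\sigma^{i\alpha}\,\Gamma^m_{mi}\,\Gamma^l_{l\alpha}\,(\det\sigma)^{1/2} \qquad (8.15)
\end{aligned}
$$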
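$$
\begin{aligned}
\frac{\partial}{\partial\sigma_{jk}}\frac{\partial}{\partial\sigma_{mn}}\left(\langle\partial_\sigma\sigma_{jk},\partial_\sigma\sigma_{mn}\rangle(\det\sigma)^{1/2}\right)
&= \sigma^{i\alpha}(\Gamma^m_{mi}\Gamma^n_{n\alpha} - \Gamma^m_{in}\Gamma^n_{\alpha m})(\det\sigma)^{1/2} - \sigma^{i\alpha}\,\Gamma^m_{mi}\,\Gamma^l_{l\alpha}\,(\det\sigma)^{1/2} \\
&= -\sigma^{i\alpha}\,\Gamma^m_{in}\,\Gamma^n_{\alpha m}\,(\det\sigma)^{1/2} \\
&= R\,(\det\sigma)^{1/2} \qquad (8.16)
\end{aligned}
$$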
and
$$\Delta_{\sigma(m)}\sqrt{\det(\sigma)} = \sigma_{ij}\sigma_{kl}(m)\,\partial_k\partial_l\left(\delta_{ij} + R_{(1)ijkl}(m)\,x_kx_l + \dots\right)\Big|_{x=0}(\det(\sigma))^{1/2} = R_{(1)}(\det(\sigma))^{1/2}$$
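$$\delta J = \int_M\frac{\partial}{\partial\sigma^{kl}}\left(\sigma^{ij}\psi_i\psi_j(\det\sigma)^{1/2}\right)\frac{\partial\sigma^{kl}(s)}{\partial s}\,dp$$
$$\delta J = \int_M\left(\bar\psi_i\bar\psi_j - \tfrac12\sigma_{ij}\|\bar\psi\|^2\right)(\det\sigma)^{1/2}\,\frac{\partial\sigma^{ij}(s)}{\partial s}\,dp + 2\int_M\int_A\sigma^{kl}\,\frac{\partial}{\partial\sigma_{ij}}(\bar\psi_k)\,\bar\psi_l\,\delta\,(\det\sigma)^{1/2}\,\frac{\partial\sigma^{ij}(s)}{\partial s}\,dp$$
It remains to show that the last term is zero. Denote $\frac{\partial\sigma^{ij}(s)}{\partial s}\big|_{s=0}$ as F.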
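So, with the integral over the manifold M understood,
$$
\begin{aligned}
2\sigma^{ij}\bar\psi_i\frac{\partial\bar\psi_j}{\partial\sigma^{kl}}(\det\sigma)^{1/2}F
&= -2\frac{\partial}{\partial\sigma_{mn}}\left(\sigma^{ij}\frac{\partial\bar\psi_j}{\partial\sigma^{kl}}\,\partial_i\sigma_{mn}\,(\det\sigma)^{1/2}F\right) \\
&\propto \delta_{i\alpha}\delta_{j\beta}(-\sigma^{ij})^2\frac{\partial\bar\psi_j}{\partial\sigma^{kl}}\,\partial_i\sigma_{\alpha\beta}\,(\det\sigma)^{1/2}F \\
&\quad + \frac{\partial^2\bar\psi_j}{\partial\sigma_{\alpha\beta}\,\partial\sigma^{kl}}\,\partial_i\sigma_{\alpha\beta}\,(\det\sigma)^{1/2}F\,\sigma^{ij} \\
&\quad + \frac{\partial}{\partial\sigma_{\alpha\beta}}(\partial_i\sigma_{\alpha\beta})\,\sigma^{ij}\frac{\partial\bar\psi_j}{\partial\sigma^{kl}}(\det\sigma)^{1/2}F \\
&\quad + \tfrac12\left(-\sigma^{\alpha\beta}(\det\sigma)^{1/2}\right)\sigma^{ij}(\partial_i\sigma_{\alpha\beta})\frac{\partial\bar\psi_j}{\partial\sigma^{kl}}F \\
&\quad + \frac{\partial F}{\partial\sigma_{\alpha\beta}}\,\sigma^{ij}\,\partial_i\sigma_{\alpha\beta}\,\frac{\partial\bar\psi_j}{\partial\sigma^{kl}}(\det\sigma)^{1/2}. \qquad (8.17)
\end{aligned}
$$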
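$$0 = \mathrm{div}_\sigma\bar\psi = \frac{\partial\bar\psi_j}{\partial\sigma_{\alpha\beta}}\,\partial_i\sigma_{\alpha\beta}\,\sigma^{ij} = \sigma^{ij}\partial_i\bar\psi_j.$$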
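Also observe that since $\partial_i\sigma_{\alpha\beta}\frac{\partial F}{\partial\sigma_{\alpha\beta}} = \partial_iF$, we may integrate the last term by
parts to obtain
$$-F\,\partial_i\left(\sigma^{ij}\frac{\partial\bar\psi_j}{\partial\sigma^{kl}}(\det\sigma)^{1/2}\right)$$
(neglecting boundary terms, of course, since we are assuming that they are
negligible anyway)
$$
\begin{aligned}
= &-F\left(\sigma^{ij}\frac{\partial\bar\psi_j}{\partial\sigma^{kl}}\left(-\tfrac12\sigma^{\alpha\beta}\,\partial_i\sigma_{\alpha\beta}\,(\det\sigma)^{1/2}\right)\right)
\quad\left(\text{since } \partial_i(\det\sigma)^{1/2} = \frac{-\sigma^{\alpha\beta}\,\partial_i\sigma_{\alpha\beta}}{2}(\det\sigma)^{1/2}\right) \\
&- F\,\partial_i\sigma^{ij}\,\frac{\partial\bar\psi_j}{\partial\sigma^{kl}}(\det\sigma)^{1/2} - F\,\sigma^{ij}\,\partial_i\frac{\partial\bar\psi_j}{\partial\sigma^{kl}}(\det\sigma)^{1/2} \qquad (8.18)
\end{aligned}
$$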
Now it can be seen that the first and second terms here cancel with the fourth
and first terms in our previous expression (8.17) respectively.
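So our original expression reduces to
$$
\begin{aligned}
&\left\{\frac{\partial}{\partial\sigma_{\alpha\beta}}(\partial_i\sigma_{\alpha\beta})\,\sigma^{ij}\frac{\partial\bar\psi_j}{\partial\sigma^{kl}}(\det\sigma)^{1/2} - \sigma^{ij}\,\partial_i\frac{\partial\bar\psi_j}{\partial\sigma^{kl}}(\det\sigma)^{1/2}\right\}F \\
&= \left\{\frac{\partial}{\partial\sigma_{\alpha\beta}}(\Gamma^l_{\beta i}\sigma_{l\alpha} + \Gamma^l_{i\alpha}\sigma_{l\beta})\frac{\partial\bar\psi_j}{\partial\sigma^{kl}} - \partial_i\frac{\partial\bar\psi_j}{\partial\sigma^{kl}}\right\}\sigma^{ij}(\det\sigma)^{1/2}F \\
&= \left\{(\Gamma^l_{\beta i}\delta_{\alpha l}\delta_{\beta\alpha} + \Gamma^l_{i\alpha}\delta_{\alpha l}\delta_{\beta\beta})\frac{\partial\bar\psi_j}{\partial\sigma^{kl}}\sigma^{ij} - \frac{\partial}{\partial\sigma^{kl}}(\sigma^{ij}\partial_i\bar\psi_j) + \frac{\partial\sigma^{ij}}{\partial\sigma^{kl}}\,\partial_i\bar\psi_j\right\}(\det\sigma)^{1/2}F \\
&= 0, \qquad (8.19)
\end{aligned}
$$
since the first term is zero by the antisymmetry of Γ in its lower indices, and
the remaining two terms vanish since $\mathrm{div}_\sigma\bar\psi = 0$.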
But this was precisely what we needed to show, completing the proof of Claim
4.
(i) Suppose M has boundary. What is the impact of the boundary on the
resulting equations, i.e. what boundary terms does one get?
(ii) What happens if B ≠ 0? Moreover, can we determine from the resulting
equations how B is related to classical electromagnetic effects?
Note that if our conjecture from before, our "holographic principle", is true, then (i)
is an irrelevant question, i.e. it was completely general for us to consider this
variational problem with Dirichlet boundary conditions in order to derive the relevant
equations.
Point (ii) will be addressed in a later section.
Physics for almost sharp manifolds
But the Levi-Civita theorem also applies if we have a metric with both symmetric
and antisymmetric components, so as to produce a combined connection subject to
no assumption of symmetry or antisymmetry. This in turn induces a natural notion
of curvature for our space.
So instead of considering gradσ f = ψ − curlσ B as we did before in the sharp
case for a symmetric σ, absorb ψ into the dynamics of σ and rewrite the above as
gradσ̂ δ(σ̂) = −curlσ̂ B. Since the right hand side lacks generality (as I have pointed
out), replace it naturally with −gradτ δ(τ ). I claim this is a valid choice for any
antisymmetric τ . Then we have the rather simple expression gradσ̂+τ δ(σ̂)δ(τ ) = 0,
or, if we use the fact that σ̂ and τ are arbitrary,
This leads naturally of course, after mirroring the above arguments and
calculations, to a generalised Einstein-Hilbert action
$$\int_M R_\sigma\,dm = 0$$
it will turn out that in order to optimise K we will have to solve two coupled partial
differential equations for these parameters.
Throughout this section I will assume that B = 0.
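First I will demonstrate that a good choice of signal function is
$$\left(1 + \epsilon\Delta_\sigma + \frac{\epsilon^2\Delta_\sigma^2}{2!}\right)\delta(\sigma(m)-a)$$
where $\Delta_\sigma = \sigma^{kl}\nabla_k\partial_l$.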
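This signal function is motivated by considering the very reasonable signal function
$\frac{1}{k\sqrt\pi}e^{-(x/k)^2}$. For an arbitrary test function f,
$$
\begin{aligned}
\int\frac{1}{k\sqrt\pi}\,e^{-(x/k)^2}f(x)\,dx &= \frac{1}{\sqrt\pi}\int e^{-v^2}f(kv)\,dv \\
&= \frac{1}{\sqrt\pi}\int e^{-v^2}\left(f(0) + kv\,f'(0) + \frac{k^2}{2}v^2f''(0)\right)dv + \text{terms of order } k^3 \\
&= f(0) + 0 + \frac{k^2}{2\sqrt\pi}\left(-\frac{d}{dt}\sqrt{\frac{\pi}{t}}\right)\Big|_{t=1}f''(0) + \Theta(k^3) \\
&= f(0) + \left(\frac{k}{2}\right)^2f''(0) + \Theta(k^3) \\
&= \int\left(\delta(x) + \left(\frac{k}{2}\right)^2\delta''(x)\right)f(x)\,dx + \Theta(k^3) \qquad (8.20)
\end{aligned}
$$
But f was arbitrary, so $\frac{1}{k\sqrt\pi}e^{-(x/k)^2}\sim\delta(x) + \left(\frac{k}{2}\right)^2\delta''(x)$.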
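The second-order expansion of the Gaussian signal function can be tested numerically. The sketch below is an illustration under my own assumptions (test function cos, width k = 0.1, trapezoidal quadrature). It compares the smeared value of f with f(0) + (k/2)² f''(0).

```python
import math


def gaussian_average(k, test, half_width=8.0, steps=16001):
    """Trapezoidal estimate of (1/sqrt(pi)) * integral of exp(-v^2) * test(k v) dv."""
    h = 2.0 * half_width / (steps - 1)
    total = 0.0
    for i in range(steps):
        v = -half_width + i * h
        weight = 0.5 if i in (0, steps - 1) else 1.0
        total += weight * math.exp(-v * v) * test(k * v)
    return total * h / math.sqrt(math.pi)


k = 0.1
smeared = gaussian_average(k, math.cos)
# For f = cos we have f(0) = 1 and f''(0) = -1.
second_order = 1.0 + (k / 2.0) ** 2 * (-1.0)
gap = abs(smeared - second_order)
print(smeared, second_order, gap)
```

For this particular f the smeared value is exp(−k²/4) in closed form, so the gap is of order k⁴ and well inside the stated error.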
This naturally extends to the general case as
for small $\epsilon$, with error $O(\epsilon^2)$. So in what follows it makes sense to consider the
deformed metric $\bar\sigma := \sigma\times k^{-1}\circ\epsilon$ instead of σ. In order to make sense of the argument
to follow, however, for transparency write $\bar\sigma := \hat\epsilon\,\hat{\bar\sigma}$, where $\hat\epsilon = \sup_{x\in M}(\|k^{-1}\circ\epsilon\|_\sigma)$
is the maximal variation and hence constant.
Finally for the argument to follow map $\hat\epsilon\mapsto\epsilon$, and $\hat{\bar\sigma}\mapsto\sigma$. The point of doing
all of this is essentially so we can treat our perturbative parameter as constant and
concentrate wholly on the metric. We will unravel afterwards according to this key.
Note that the condition that boundary terms have negligible contribution seems
eminently reasonable in light of our assumptions if the boundary is at infinity, since
we assumed a finite mass distribution E < ∞ (the equations trivialise to the previous
case if E → ∞ (admittedly with boundary terms)).
Claim 2 : The total information K can be rewritten as
$$4K = \int_M\left(R + \epsilon R_{(2)} + \epsilon^2R_{(3)}\right)dm$$
where $R_{(2)}$ is the fourth order geometric invariant defined by the contraction
of Curv Riem three times, where $\mathrm{Curv}_{ij} = \nabla_i\nabla_j - \nabla_j\nabla_i - \nabla_{[i,j]}$ is the curvature
operator. Similarly $R_{(3)}$ is the sixth order geometric invariant. More generally,
we define $R_{(k)}$ to be the 2k-th order geometric invariant defined by the appropriate
number of contractions of $\mathrm{Curv}^{k-1}\,\mathrm{Riem}$.
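Corollary: Untangling the above via the key, we obtain that the information is
$$4K(\sigma,\epsilon) = \int_M\left(R_{(\sigma;\hat\epsilon)}\,\hat\epsilon + R_{(2)(\sigma;\hat\epsilon)}\,\hat\epsilon^2 + R_{(3)(\sigma;\hat\epsilon)}\,\hat\epsilon^3 + \dots\right)(\det(\sigma;\hat\epsilon))^{1/2}\,dm$$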
Remarks.
(1) Requiring now that δK = 0 allows us to solve simultaneously not only for the
optimal metric σ, but for the optimal expansion parameter $\epsilon$. In particular
this tells us that $\epsilon$ may vary wildly depending on the nature of our solution.
Furthermore, the requirement that two equations must hold - $\frac{\partial K}{\partial\sigma^{-1}} = 0$ and
$\frac{\partial K}{\partial\epsilon} = 0$ - serves as an obstruction to the space of solutions one might expect
given one or the other to hold by itself (see the later section on classification).
(2) We have neglected the contribution to the information from boundary terms,
which we assumed to be negligible. In fact, due to the nature of the model we
are using, this is a necessity. However it is clear that there is physics that does
not fall under this general umbrella. This will be taken up again in a later
chapter, when I talk about statistical stacks and condensed matter physics.
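$$
\begin{aligned}
4I = \int_M\int_A &\left(\delta^{-1}\|\partial_\sigma\delta\|^2 + \epsilon\left\{-\frac{\Delta_\sigma\delta}{\delta}\,\delta^{-1}\|\partial_\sigma\delta\|^2 + 2\delta^{-1}\Delta\|\partial\delta\|^2\right\}\right) \\
&+ \epsilon^2\left\{\Delta^2(\delta^{-1}\|\partial_\sigma\delta\|^2) + 2\delta^{-1}\Delta^2\|\partial\delta\|^2 - 2\Delta_\sigma(\delta^{-1}\Delta\|\partial\delta\|^2)\right\} + B(\epsilon)
\end{aligned}
$$
where $B(\epsilon)$ are boundary terms depending on derivatives of $\epsilon$, which can
therefore be neglected. We are also using the identity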
Recovery of standard results
$$\frac{1}{1 + \epsilon f + \epsilon^2f^2/2} = 1 - \epsilon f + \epsilon^2f^2/2 + \dots$$
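The truncated inverse 1/(1 + εf + ε²f²/2) ≈ 1 − εf + ε²f²/2 can be checked directly. In the sketch below the value of f and the sample values of ε are hypothetical. The gap between the exact reciprocal and the truncation should fall below ε³ as ε shrinks, which is what is needed for discarding terms of order ε³.

```python
def truncated_inverse(eps, f):
    """Second-order truncation 1 - eps f + eps^2 f^2 / 2."""
    x = eps * f
    return 1.0 - x + 0.5 * x * x


def exact_inverse(eps, f):
    """Exact reciprocal of 1 + eps f + eps^2 f^2 / 2."""
    x = eps * f
    return 1.0 / (1.0 + x + 0.5 * x * x)


f_value = 1.5  # hypothetical value of f at some point
eps_values = (0.1, 0.01, 0.001)
gaps = [abs(exact_inverse(eps, f_value) - truncated_inverse(eps, f_value))
        for eps in eps_values]
print(gaps)
```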
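Once more throwing away terms of order $\epsilon^3$ and above, we rewrite the above
expression as
$$4I = \int_M\int_A\delta^{-1}\|\partial_\sigma\delta\|^2 + \epsilon\int_M\int_A\left\{\left(\sigma^{\alpha\beta}\,\partial_\alpha\sigma_{jk}\,\partial_\beta\sigma_{mn}(\det\sigma)^{1/2}\right)\frac{\partial^2}{\partial\sigma_{jk}\,\partial\sigma_{mn}}\Delta_\sigma\delta\right\} + \epsilon^2\int_M\int_A\Delta^3(\det\sigma)^{1/2}\,\delta$$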
as
K = I(σ) − J(σ)
where
$$I(\sigma) = K(\sigma_s)$$
and
$$J(\sigma) = \int_M\int_A\|T(\sigma_s)\|^2\,\delta$$
$$J = \int_M\left(\|\bar\psi\|^2 - \|\mathrm{curl}_\sigma\bar B\|^2\right)$$
So since we already understand the variation of the first term above from before,
we need only focus on the second term.
Claim 4. We have:
$$\int_M\|\mathrm{curl}_\sigma\bar B\|^2 = \int_M\left(\|\nabla\bar B\|^2 + (\mathrm{div}_\sigma\bar B)^2\right)$$
$$K(\sigma) = \int_M\left(R - \|\bar\psi\|^2 + \|\partial\hat f\|^2\right)$$
divσ (B̂δ) = 0
Hence
divσ B̂ = 0
$$\int_M\|\mathrm{curl}_\sigma\bar B\|^2 = -\int_M\langle\bar B,\ \mathrm{curl}_\sigma\mathrm{curl}_\sigma\bar B\rangle$$
Proof. (Claim 6). This claim is an easy consequence of the previous claims and the
results of previous sections.
using the Minkowski metric. This is nothing other than the Coulomb Gauge.
I also claim that k∇B̄k2 = 0 is equivalent to the statement ∆B̄ = 0, since we
observe then that
But the expression on the left hand side is zero by a basic operator identity and
hence since B̄ is timelike the conclusion readily follows. So ∆B̄ = 0, or
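$$
\begin{aligned}
\left(\nabla^2 - \frac{\partial^2}{\partial t^2}\right)A_x &= 0 \\
\left(\nabla^2 - \frac{\partial^2}{\partial t^2}\right)A_y &= 0 \\
\left(\nabla^2 - \frac{\partial^2}{\partial t^2}\right)A_z &= 0 \\
\left(\nabla^2 - \frac{\partial^2}{\partial t^2}\right)\phi &= 0
\end{aligned}
$$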
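Each component equation is the homogeneous wave equation in units where c = 1. As a hedged illustration, the sketch below checks by central finite differences that a plane wave with frequency equal to the length of its wave vector is annihilated by ∇² − ∂²/∂t². The wave vector and sample point are arbitrary assumptions.

```python
import math

K_VEC = (1.0, 2.0, 2.0)  # hypothetical wave vector, |k| = 3
OMEGA = math.sqrt(sum(c * c for c in K_VEC))


def phi(t, x, y, z):
    """Plane wave cos(k.x - omega t) with omega = |k| (units with c = 1)."""
    return math.cos(K_VEC[0] * x + K_VEC[1] * y + K_VEC[2] * z - OMEGA * t)


def second_diff(g, point, axis, h=1e-3):
    """Central second difference of g along one coordinate axis."""
    up = list(point)
    down = list(point)
    up[axis] += h
    down[axis] -= h
    return (g(*up) - 2.0 * g(*point) + g(*down)) / (h * h)


point = (0.3, 0.1, -0.4, 0.7)  # (t, x, y, z), an arbitrary sample point
laplacian = sum(second_diff(phi, point, axis) for axis in (1, 2, 3))
residual = laplacian - second_diff(phi, point, 0)
print(residual)
```

The residual is not exactly zero only because of the finite-difference truncation, which is of order h².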
$$(1 + 2\hat\epsilon\Delta)\bar\psi = 0$$
and
$$(1 + 4\hat\epsilon\Delta + \dots)\bar\psi = V$$
for some potential function V , from which we conclude after substitution the
Klein-Gordon equation for the wave function:
$$2\hat\epsilon\Delta\bar\psi(m) = V(m)$$
For flat Lorentz manifolds, in the non-relativistic limit, where v/c << 1, we may
rescale time in terms of v/c to remove $\hat\epsilon$ and the equation degenerates to a parabolic
PDE, the Schrödinger equation:
8.4 Classification
8.4.1 Motivation
Now the classification of all n-manifolds for n ≥ 4 is well known to be impossible,
since the word problem on the fundamental group of such objects is known to be
in general insoluble. However, since we have some control of the full Riemann
curvature tensor for stable physical manifolds, and since, by their very nature, these
manifolds are specially constructed, we might expect some form of classification for
them. Hence we are led to the following
Conjecture: There is a classification of all stable, physical Lorentz n-manifolds
for each n for both the zeroth and first order cases treated previously. In other
words, there are a finite number of families of eigensolutions to these equations,
which may be described precisely.
These fundamental solutions may well correspond, at least in relation to the first
order equations, to well known “particles” which have been observed in experiment.
This leads us to the further
Conjecture: There is a one to one correspondence between eigenfamilies of stable,
physical Lorentz 4-manifolds with Dirichlet boundary conditions satisfying the full
blown 1st order equation with nonzero electromagnetic term, and the “particles” of
the Standard model of particle physics.
Clearly in order to analyse the solutions of our PDEs we need to have a general
set of tools at our disposal to identify symmetries in the equations. So one might
expect that Lie Group theory would play a pivotal role.
Motivating Example: Consider the Dirac equation (with all constants set to one,
except for h and m):
$$\sigma^{ij}\left(\frac{h}{m}\frac{\partial}{\partial x_j} + A_j\right)Q = iQ$$
8.4.2 Speculation
Certainly for small particles, we would expect a natural splitting in local charts of the
Lorentz metric into approximately a Riemannian metric for the space coordinates,
and the Riemannian metric for the time coordinate. To be more precise, we expect
there to be a coordinate system such that the off block entries in the expression for
the metric in the chart are of order $\epsilon^2$ and hence can be neglected. Then, in other
words, fundamental solutions would be classified by
Then, we may observe that from the geometrisation conjecture it is now well
known that there are only eight possible choices for the prime components of a three
manifold. One-manifolds have, of course, a trivial classification: they can be either
$S^1$ or $\mathbb{R}$.
To get fermions, we need to write the PDEs L1 R(1) := R(1) +R(2) = 0, L2 R(2) :=
R(2) +2R(3) = 0 as Ki R(i) = 0 where ”K̄i Ki = Li ” or, more precisely, < Ki , Ki >σ̂ =
Li where σ̂ is the complexification of σ (the complex splitting of our original PDE).
To be more precise, we should write R(i) (σ) = kR(i) (σ̂)k2σ̂ for the corresponding
complexifications of R(i) . Then we would say we get fermions if Ki (R(i) ) = 0.
Anyway, we then get eight fundamental solutions, according to our classification
above, times either a circle or a line. Solutions of the form X × S 1 correspond to
virtual fermions; solutions of the form X ×R we expect to be real fermions. This fits
in nicely with the standard model, which lists off eight different types of fermions
per “generation”.
But there is also the other complex factorisation of our original PDE, where
Li = kK̄i k2σ̂ , K̄i being the complex conjugate to the operator Ki . Then we get
corresponding solutions to the equation K̄i Γ = 0, where Γ = Ric − φ ⊗ φ. This leads
us to introduce an additional quantum number, which we shall call spin. Fermions
with spin up are solutions to the equation Ki Γ = 0; fermions with spin down are
solutions to the equation K̄i Γ = 0.
What about bosons? This is a slightly tricky thing to get precisely right, but
we can roughly say that we would expect bosons to be the pairing of a spin up
fermion with a spin down that otherwise have very similar quantum numbers, via
the factorisation of the equation we wrote above. Virtual bosons will correspond to
gluons, of course, of which there will be eight - and the real bosons will correspond
to the set {photon, W, Z, Higgs}. There are of course particle and antiparticle
equivalents of these last four (under the symmetry of charge conjugation, $\lambda\mapsto\lambda^T$,
where λ is the antisymmetric component of σ).
Now, we do not have nearly enough information to talk about the different
”generations” of the standard model fermions, but within the scope of this model we
might guess that they correspond to the ground and excited states of each of the eight
fundamental families of eigensolutions to the PDEs Ki R(i) = 0 (up to spin). It does
not seem immediately obvious as to why there should be only three eigensolutions
per family, since the PDE suggests there should be an infinite quantity.
One can make handwavy arguments saying that the perturbation expansion is
no longer valid beyond three but I suspect that the deeper reason is that it is the
particle physics version of the Lorentz problem. To recall, the Lorentz problem is the
mystery of why Lorentzian geometry should be the preferred geometry of low energy
physics. The reason that this is suggestive is because one can pair the triple of the
three generations of a type of fermion (spacelike directions) with the corresponding
force carrier, or boson (timelike direction). I do attempt to sketch a solution to the
Lorentz problem in the next chapter - perhaps the proof can be adapted to shed
some light upon this particular mystery too.
As a final throwaway remark, we might naively guess that maybe fermions are
somehow more fundamental than bosons (although this is in direct contradiction
to what I have just said); that bosons can be decomposed into fermion ”conjugate
pairs”. Maybe there is a connection here to twistor theory?
In fact, what we are doing is performing a kind of matched asymptotic expansion
with gluing - we are taking a complete model of the particle ($S^3\times\mathbb{R}$, for instance),
and then gluing it to a very small region in $\mathbb{R}^4_1$ which of course is modelled after a
space in which there is no matter (see diagram). So in this instance, these particles I
am describing are a perturbation on an otherwise boring landscape (there are more
metrics than simply that for $\mathbb{R}^4_1$ allowable on the outer solution, however; in fact,
once again, we have eight alternatives).
I must stress that this whole approximation is only valid when
(a) Our particles are small, i.e. we are dealing with the lower eigenvalues in this
model of the governing equation, and
(b) $\epsilon$ is very small.
So evidently if we are finding solutions of our equations with values of $\epsilon$ which
are not << 1, then these solutions are unphysical, because they contradict the
assumptions made to get to them.
Later on in the next chapter I will develop tools which might be usable to
overcome these limitations, and create even better models for particle physics.
However assumption (a) still stands by necessity.
8.4.3 Extensions
Remark. Here are some thoughts that I had in 2007 which serve to motivate the
following chapter. They should not be taken too seriously, but rather as an informal
a + bi + cj + dk
where the real part is associated to the timelike dimension and the ”imaginary”
part to the spacelike dimensions.
Of course this is all very artificial and may not really mean anything at all. Still
it suggests that quaternions might have a certain significance. This does slightly fly
in the face of intuition, however, because it is actually the complex numbers that
have the most structure and one might therefore expect them to be somewhat more
natural.
A more plausible way of looking at why the support of information in the universe
is almost entirely contained on a Lorentz 4-manifold can perhaps be seen by
examining the first order perturbative PDE for the equations of physics. For it to be
a nontrivial generalisation of the nonperturbative case, in other words, for there to
be any "quantum mechanical effects", one requires that the second degree curvature
term, $R_{(2)}$, be nonzero. But this is only possible if there are at least four dimensions
in which to work, because otherwise this term will vanish.
So then one can again invoke the principle of maximal laziness, or Occam’s razor,
or your favourite equivalent principle, and say then, well, clearly, four dimensions
is the simplest structure in which to work at low perturbations, so that is how the
information in the universe will naturally be structured.
That does not mean that there will not be small (though usually negligible)
traces of information in other, ”higher” dimensions.
Why would the index be one? It may be that the best way to organise a four
dimensional system is for it to have as much structure as possible. So we come
back to quaternions again, which is clearly equivalent to a certain class of Lorentz
manifolds.
The natural things to consider for such statistical manifolds are of course infi-
nite dimensional random matrices, or, alternately, the continuous geometries of von
Neumann could be adapted to help flesh out an appropriate model here to prove
this conjecture properly.
Interestingly, this conjecture may not be true in certain circumstances, which
could lead to some quite weird behaviour indeed.
Another research direction to consider is to look at ”higher statistical moments”,
that is, to build a statistical geometry on top of an existing statistical geometry. To
make myself a bit clearer, we have been considering, so far, a distribution on the
space of metrics, A1 , of a given differentiable manifold, M . It is then a simple process
of abstraction to consider different metrics over the space A1 , so we are considering
a distribution of inner products in a space A2 over the space of inner products A1 of
a manifold M n . We could even associate a differential structure to A1 by building
and gluing charts of appropriate dimension together from the underlying Euclidean
structure to make A1 essentially a general n(n − 1)/2 manifold, i.e. make it only
locally Euclidean for the purposes of construction of metrics over it.
In this way, one can potentially build an infinite ladder of spaces, (M = A0 , A1 , A2 , ..., Ai , ...).
Relatively speculative though the thought is, I believe that for assuming that all
distributions are almost sharp that the associated Banach or Hilbert spaces corre-
Chapter 9
Turbulent Geometry
Fractals are generally wily and unruly beasts. There are many unresolved questions
about the ones that are known, and even the simplest (such as the Mandelbrot set)
are notorious for their complexity and depth of structure. Is it useful to model the
atoms of reality to some degree as fractal structures? Certainly there are many
phenomena in nature that exhibit scale free behaviour, so it would appear that
the answer to this question might be yes. Certainly in cosmology, it is well known
that the large scale structure of galaxies and interstellar dust appears to exhibit
characteristic properties of being fractal, at least between certain distance scales -
a self-similarity across many orders of magnitude.
9.1 Preliminaries
9.1.1 Fractional Calculus
Before I go any further, I will need to describe the fractional calculus.
A fairly natural question to ask is, is there a way, for any real number α > 0, to
take the αth derivative of a function, or the αth integral?
It turns out that there is such a way. However, whereas for integral α differentiation
is purely a local operation, in general boundary information is required.
For some motivation, define $(Jf)(x) = \int_0^x f(t)\,dt$, $(J^2f)(x) = \int_0^x (Jf)(t)\,dt$, etc.
Then clearly
$$(J^nf)(x) = \frac{1}{(n-1)!}\int_0^x (x-t)^{n-1}f(t)\,dt$$
for natural numbers n and this definition extends naturally to all positive real numbers α:
$$(J^\alpha f)(x) = \frac{1}{\Gamma(\alpha)}\int_0^x (x-t)^{\alpha-1}f(t)\,dt$$
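To make the fractional integral concrete, here is a minimal numerical sketch. The function name `frac_integral`, the midpoint quadrature and the substitution u = (x − t)^α (which removes the endpoint singularity for α < 1) are my own choices, not part of the text. It checks the known value of the half-integral of f(t) = t, namely x^{3/2}/Γ(5/2), and that applying the half-integral twice reproduces the ordinary integral.

```python
import math


def frac_integral(f, alpha, x, steps=2000):
    """Midpoint estimate of (J^alpha f)(x) = 1/Gamma(alpha) * int_0^x (x-t)^(alpha-1) f(t) dt.

    The substitution u = (x - t)^alpha turns the integral into
    1/Gamma(alpha + 1) * int_0^{x^alpha} f(x - u^(1/alpha)) du,
    whose integrand has no endpoint singularity.
    """
    upper = x ** alpha
    h = upper / steps
    total = sum(f(x - ((i + 0.5) * h) ** (1.0 / alpha)) for i in range(steps))
    return total * h / math.gamma(alpha + 1.0)


def identity(t):
    """Test function f(t) = t."""
    return t


x = 2.0
half = frac_integral(identity, 0.5, x)
half_exact = x ** 1.5 / math.gamma(2.5)

# Semigroup check: J^(1/2) applied twice should agree with (J f)(x) = x^2 / 2.
twice = frac_integral(lambda t: frac_integral(identity, 0.5, t, steps=200), 0.5, x, steps=200)
print(half, half_exact, twice)
```

With α = 1 the same routine reduces to an ordinary midpoint rule for the integral of f from 0 to x.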
go from ∞ to ∞. So, with such a theory, we should be able to treat such stacked
spaces in a more streamlined, natural, and elegant fashion.
Of course, it seems quite logical that once we have determined and fleshed out
the necessary concepts, it should then be possible to use them to determine new
physics, after hitting the relevant information with the Cramer-Rao inequality and
evaluating the critical points. This shall become the ultimate aim of this section.
Before I mention the main ideas here, I should talk first of fractal geometry.
By this I do not mean fractal geometry in the standard sense, where one deals
with Cantor dust like objects. Rather, I am interested in considering a variety of a
continuum of objects where the dimension at each point in the set takes a value on
the real line and varies from point to point. More generally, in turbulent geometry,
I am interested in extending the notion of the dimension of an infinitesimal piece
of a set to Rn , or, more precisely, to a Riemannian n-manifold N , that depends
somehow on the base space M .
The key idea is to observe that, if ψ is some distribution over the real line, we can
measure the r-th fractal dimension of ψ by computing $\int_{\mathbb{R}}\int_{\mathbb{R}}\psi(x)\,\partial^r_{(a)}\delta(\sigma(x)-a)\,da\,dx$,
where σ is an inner product on the tangent space of R, and $\partial^r$ is the usual extension
of the derivative to the real line. This can easily be checked. We immediately
observe that there is a key connection to statistical geometry here. Pursuing this
analogy, we would like to be able to sensibly define ∂ r where r = (r1 , ..., rn ) is the
local expression of an element of some manifold N .
First define g to be a point in Rr if it is a map
$$\int_M\int_A\psi(m)\,\partial^r_{(a)}\delta(\sigma(m)-a)\,da\,dm$$
However certain issues remain unresolved, such as how to develop dynamics from
this language. That will be the focus of the next section.
where $\rho(\vec g)$ is the density of the Fisher information with respect to a vector of
signal functions $\vec g$. Then this is a natural interpretation of the fractal dimension,
given in terms of the standard language of statistical geometry from before, such
that the physical information is:
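$$\frac{\partial}{\partial\delta^{-1}} = \frac{\partial}{\partial\sigma^{-1}}\frac{\partial\sigma^{-1}}{\partial\delta^{-1}} = \frac{\partial}{\partial\sigma^{-1}}\frac{\partial\delta}{\partial\sigma} = \frac{\partial\delta}{\partial\sigma}\frac{\partial}{\partial\sigma^{-1}}$$
with the last equality following because $\frac{\partial}{\partial\sigma}\frac{\partial}{\partial\sigma^{-1}} = \frac{\partial}{\partial\mathrm{Id}}$ is the zero operator.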
If we now use this as a prototype for our idea of turbulent derivative, we wish
to look at
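$$\left(\frac{\partial}{\partial\delta}\delta;\ \kappa\right)^{1/2} = \left(\left(\frac{\partial}{\partial\sigma}\delta;\ \kappa\right)\left(\frac{\partial}{\partial\sigma^{-1}}\delta;\ \kappa\right)\right)^{1/2}$$
$$(\partial_\delta^*\,\delta;\ \kappa) := \left(\frac{\partial}{\partial\delta}\delta;\ \kappa\right)^{1/2}$$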
This will provide the desired effect of providing dimension κ(m, a) at the location
(m, a). This notation also naturally extends to more general signal functions.
We need to check that this is well defined, of course - namely that I(phys) is
invariant under alteration of ρ(g) by a boundary term. This may require a further
condition to be imposed, such as criticality or some form of conservation law, more
likely.
Regardless, it now seems clear that the natural space to use as our ”dimension
manifold” at p ∈ M for the ath inner product is T(p,a) M .
A special case of the above, when f and g are both sharp, with f (m, a) =
δ(σ(m) − a) and $\vec g(m,a) = \delta(\vec\tau(m)-a)$, causes the information $I_{(phys)}$ to reduce to
the rather elegant expression
$$I_{(phys)} = \int_M\partial^{R_{\vec\tau}}R_\sigma$$
What still is not clear, however, is whether we should merely require that I(phys)
be critical - the most likely bet - or whether both informations should be critical.
The information that we are interested in studying is for almost sharp turbulence:
$$I_{(turb)} = \int_M\left(1 + \epsilon\Delta_{\vec\tau} + \frac{\epsilon^2\Delta_{\vec\tau}^2}{2!} + \dots\right)R_{\vec\tau}$$
and a completely general signal function for the base space. If we evaluate things
by brute force this leads to expressions which are bulky and difficult to deal with,
so I will not do so here. Instead it will turn out that with a bit more work and
development of appropriate notation these expressions can be reduced to something
more manageable.
Note that a completely general signal function is of the form
\[
f(m,a) = \int_A F(m,b)\,\delta(\sigma_b(m) - a)\,db,
\]
and, without turbulence, the information would be
\[
\int_M\int_A e^{h}\,\Delta_{\sigma_b}(h)\,db\,dm,
\]
where $F = e^h$.
Preliminaries
with sharp, or almost sharp, turbulence defined on them, given by a signal
function of the form
Integrate by parts so that (∂f ∗ ; δ(τ (m) − b)) is acting on the volume element.
By general nonsense we may then realise this as an eigenfunction ρ(m, b) of a volume
element induced by a new family of metrics σb . Hence
\[
(f;g) = \int_A \rho(m,b)\,\delta(\sigma(m) - a)\,(\det(\sigma_b))^{1/2}\,db
\]
Cramer-Rao for turbulent statistical manifolds
where g(m, b, c) = G(m, c)δ(σc (m) − b). But by the correspondence principle
this is equivalent to
where α and β are sharp. So we can see that turbulent geometry is just a
special case of fractal fractal geometry, which we are interested in for its potential
applications to entanglement physics. By abuse of terminology I will also call the
latter second order turbulent geometry, and geometry with fractal measure first order
turbulent geometry.
where $\rho_f$, $\rho_{\vec g}$ are the Fisher information densities associated to the signal
functions $f$ and $\vec g$.
To prove this result I will need to play the same game as before and develop
an appropriate idea of weak and strong turbulent estimators. However, due to the
complexity of the issues involved, my treatment will be at best a sketch of what a
properly rigorous treatment would require.
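\[
E_m^{f\times\vec g}(\partial_f^{*}\theta\circ u;\ \vec\phi\circ\vec v) = (\partial_f^{*}\theta(m);\ \vec\phi(m))
\]
where $\partial_f^{*}$ is the turbulent statistical derivative with respect to $f$,
\[
E_m^{f\times\vec g}(\kappa) = \int_A\int_A (\partial_f^{*}f(m,a);\ \vec g(m,b))\,\kappa(m,a,b)\,db\,da,
\]
and $\phi_i, \theta : M \to \mathbb{R}^n$ are coordinate charts on $M$.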
Lemma 9.2.2. (Factorisation). $E_m^{(f\times\vec g)}(\partial_f^{*}\tau;\sigma) = \int_A (\partial_f^{*}(f\tau);(\vec g\sigma))$. Equiva-
f ×~g
Em ~ k ◦ ~v )lnf (u) ∂f ln~g(~v) ∂~g (∂f ∗ f (u); ~g (~v ))} = 0
{(∂f ∗ θi ◦ u; φ
\[
h^{kl}_{ij}(f,\vec g) := E_m^{f\times\vec g}\{\Gamma_{ik}(f,\vec g)\,\Gamma_{jl}(f,\vec g)\}
\]
where
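\[
\mathrm{cov}^{f\times\vec g}\{(\partial_f^{*}\theta_i\circ u;\vec\phi_k\circ\vec v),\ (\partial_f^{*}\theta_j\circ u;\vec\phi_l\circ\vec v)\} - h^{kl}_{ij}(f(u),\vec g(\vec v)) \ge 0
\]
\[
T^{ik}_{\alpha\beta} := (\partial_f^{*}(\theta_i\circ u);(\vec\phi_\alpha\circ\vec v)) - E_m^{f\times\vec g}\{\partial_f^{*}(\theta_i\circ u);(\vec\phi_\alpha\circ\vec v)\} - \left\{\partial_f^{*}\frac{\partial}{\partial\theta_k}\ln(f(u));\ \frac{\partial}{\partial\vec\phi_\beta}\ln(g(v))\right\}h^{\alpha\beta}_{ik}(f(u),\vec g(\vec v))
\]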
Expanding, we get
But, observing first of all that h is statistically independent, then by the lemma
the second term reduces to
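\[
2E_m^{(f\times\vec g)}\left(\left\{-E_m^{(f\times\vec g)}(\partial_f^{*}(\theta_i\circ u);(\vec\phi_k\circ\vec v))\left(\partial_f^{*}\frac{\partial}{\partial\theta_\alpha}\ln(f(u));\ \frac{\partial}{\partial\vec\phi_\beta}\ln(\vec g(\vec v))\right)\right\}h^{l\beta}_{j\alpha}(f(u),\vec g(\vec v))\right)
\]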
using Em (−v) = −Em (v), together with the definition of unbiased turbulent
estimator, we get
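\[
-2E_m^{(f\times\vec g)}\left((\partial_f^{*}(\theta_i\circ u);(\vec\phi_k\circ\vec v))\left(\partial_f^{*}\frac{\partial}{\partial\theta_\alpha}\ln(f(u));\ \frac{\partial}{\partial\vec\phi_\beta}\ln(\vec g(\vec v))\right)h^{l\beta}_{j\alpha}(f(u),\vec g(\vec v))\right)
\]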
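Note that $\frac{\partial\ln(f)}{\partial\theta_j} = \frac{1}{f}\frac{\partial f}{\partial\theta_j}$, and also
\[
\left(\partial_f^{*}\frac{\partial\ln(f)}{\partial\theta_i}f;\ \frac{\partial\ln(\vec g)}{\partial\vec\phi_j}\vec g\right) = \left(\partial_f^{*}\frac{\partial\ln(f)}{\partial\theta_i};\ \frac{\partial\ln(\vec g)}{\partial\vec\phi_j}\right)(\partial_f^{*}f;\vec g)
\]
by the factorisation lemma.
One concludes that
\[
\left(\partial_f^{*}\frac{\partial\ln(f)}{\partial\theta_i};\ \frac{\partial\ln(\vec g)}{\partial\vec\phi_j}\right)(\partial_f^{*}f;\vec g) = \left(\partial_f^{*}\frac{\partial f}{\partial\theta_i};\ \frac{\partial\vec g}{\partial\vec\phi_j}\right)
\]
from which follows, after integration by parts, that the second term in the original
expansion simplifies to
\[
-2E_m^{(f\times\vec g)}(h^{kl}_{ij}) = -2h^{kl}_{ij}
\]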
(u, v) = (∂f ∗ f, ~g )
Proof. To prove this we need the turbulent version of Stokes' theorem. However
after this the rest is easy.
Theorem 9.2.8. (Turbulent Stokes). Let ω be an (n − 1)-form on an n-manifold
M , and κ a vector valued function on M . Then
\[
\int_M d(\partial_\omega^{*}\omega;\kappa) = \int_{\partial M}(\partial_\omega^{*}\omega;\kappa)
\]
Statistical Stacks
Proof. The key observation is that the problem may be converted into a repeated
application of Stokes' theorem in the standard case, from which the result quickly
follows.
Remark. The statistical version of the above follows by merging this result with the
statistical Stokes theorem (see chapter 2 or chapter 7). I will not go into this here.
Suppose now $(u,\vec v)$ is a weak tmle. Then for this choice, the above inequality is
an equality, and, furthermore, we have that
\[
\mathrm{cov}^{(f\times\vec g)}(\{\partial_f^{*}(\theta_i\circ u);(\vec\phi_k\circ\vec v)\},\ \{\partial_f^{*}(\theta_j\circ u);(\vec\phi_l\circ\vec v)\}) = 0,
\]
since we may use $\theta = \ln = \exp^{-1}$ as our coordinate chart, and
then
Lemma 9.2.10. (Turbulent EPI principle). Write $h(f(u),\vec g(\vec v)) = (\partial_f^{*}f_{ij};\vec g_{kl})\,h^{kl}_{ij}(a)(f(u),\vec g(\vec v))$.
Then we have that
\[
0 \le \int_M\int_A h(f(u),\vec g(\vec v)) = \int_M\int_A (\partial_f^{*}\rho_f;\rho_{\vec g})
\]
f k (m, a0 , a1 , ...)
where the ai corresponds to the ith statistical level, and k ∈ R∞ is the turbulent
variable. But this is rather bad notation, since after all we might have less than one
ai ; so we might instead write
f k (m, a(k) )
∞
with some sort of induced probabilistic distribution being understood over RR ,
in particular k = ∂ρg(stack) (m, b) where ρg (m, b) is the information density of a signal
function for a standard statistical distribution over our manifold M with signal
function g and statistical variable b.
In particular, I will define $a^{(k)} \in \mathbb{R}^k$. Also, I claim that we may, without
loss of generality, interpret $f^k$ literally as the $k$th power of some base signal function $f(m,a)$.
Then, in order to compute the information, we write $f(m,a)^{g(m,n,b)} =: \gamma(m,n,a,b)$,
and then compute
\[
I(\gamma) = \int_M\int_N\int_A\int_B \rho_\gamma
\]
As a motivating example for all this, I claim that the information of a sharp
turbulent stack is
\[
\int_M\int_N R_\sigma^{\bar\tau_n(m)}(m)\,dn\,dm
\]
where here of course $g(m,n,b) = \delta(\tau(m,n) - b)$, and $f(m,a) = \delta(\sigma(m) - a)$.
This can be verified by arguments of the same sort as before, together with reference
to the definitions.
A slightly less trivial example is the following:
\[
\int_M\int_A \left(1 + \Delta_\tau + \frac{\Delta_\tau^2}{2} + \dots\right)\Delta_{f(m,a)}F(m,a)\,da\,dm
\]
which arises from the second statistical moment of a stack associated to the
signal function $f(m,a) = \int_A F(m,b)\,\delta(\sigma_b(m) - a)\,db$ after "forgetting" some of the
degrees of freedom at the first level. Here evidently our turbulent signal function
is almost sharp scalar turbulence, $g(m,b) = (1 + \Delta_\tau + \dots)\delta(\tau(m) - b)$ (since then
$f^{\partial\rho_g} = (1 + \Delta_\tau + \dots)f$ as required).
9.3.3 Dynamics
I now give the extremisation theorem for this object, ie, the Cramer-Rao inequality,
and give a sketch of its proof. This will not be nearly as hard as it could be, since
we have already done most of the work; it will mainly be an adaptation of the proof
of the Cramer-Rao inequality for standard turbulent spaces.
Theorem 9.3.1. (Cramer-Rao Inequality for a turbulent stack). Let us have a
turbulent stack (M, N, A, f, g) as above. Then $I(f^g) \ge 0$.
To prove this result we will need to play the standard game and build up an
appropriate set of tools to tackle the problem.
Definition 43. Let Λ = (M, N, A, f, g) be a turbulent stack, where f is the signal
function on the base and g is the signal function on the turbulent superstructure. A
turbulent estimator on this structure is a tuple (u, v) where u, v are maps from the
sample space M × A to M . An unbiased turbulent estimator is an estimator that
satisfies the condition that
\[
E_{(m)}^{f^g}((\theta_i\circ u);(\phi_j\circ v)) = (\theta_i(m);\phi_j(m,n))
\]
where
\[
E_m^{f^g}(\tau;\sigma) := \int_A f^g\,(\tau^\sigma)
\]
is the turbulent expectation with respect to the signal functions f and g, and
$\theta : M \to \mathbb{R}^n$ is a coordinate chart on M , $\phi : M\times N \to \mathbb{R}^n$ is a coordinate chart on
M × N.
Remark. From now on I will use the notation $f^g$ and $(f;g)$ interchangeably.
Lemma 9.3.2. (Factorisation). $E_m^{(f;g)}(\tau;\sigma) := \int_A ((f\tau);(g\sigma)) = \int_A (f;g)(\tau;\sigma)$.
Proof. We use the fact that f and g are normalised distribution functions over A,
which conserve probability.
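Lemma 9.3.3. Unbiased turbulent estimators satisfy the following relation:
\[
E_m^{(f^g)}\left(\{(\theta_i\circ u);(\phi_k\circ v)\}\left\{\left(\frac{\partial}{\partial\theta_j}\ln(f\circ u)\right);\left(\frac{\partial}{\partial\phi_l}\ln(g\circ v)\right)\right\}\right) = 0
\]
\[
h^{kl}_{ij}(f,g) := E_m^{(f^g)}\left\{\left(\frac{\partial}{\partial\theta_i}\ln(f);\frac{\partial}{\partial\phi_k}\ln(g)\right)\left(\frac{\partial}{\partial\theta_j}\ln(f);\frac{\partial}{\partial\phi_l}\ln(g)\right)\right\}
\]
\[
\mathrm{cov}^{(f;g)}(\{(\theta_i\circ u);(\phi_k\circ v)\},\ \{(\theta_j\circ u);(\phi_l\circ v)\}) - h^{kl}_{ij}(f(u),g(v)) \ge 0
\]
where $\mathrm{cov}^{(f;g)}(v^{ik},w^{jl}) := E_m^{(f;g)}(v^{ik}w^{jl})$.
Proof. Note that $E_m^{(f;g)}(\Gamma^{ik}_{\alpha\beta}\Gamma^{jl}_{\gamma\delta})$ is certainly nonnegative, for
\[
\Gamma^{ik}_{\alpha\beta} := ((\theta_i\circ u);(\phi_\alpha\circ v)) - E_m^{(f\times g)}\{(\theta_i\circ u);(\phi_\alpha\circ v)\} - \left\{\frac{\partial}{\partial\theta_k}\ln(f(u));\frac{\partial}{\partial\phi_\beta}\ln(g(v))\right\}h^{\alpha\beta}_{ik}(f(u),g(v))
\]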
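Expanding, we get
\[
\mathrm{cov}^{(f;g)}(\{(\theta_i\circ u);(\phi_k\circ v)\},\ \{(\theta_j\circ u);(\phi_l\circ v)\}) - 2E_m^{(f;g)}\left(\left\{((\theta_i\circ u);(\phi_k\circ v)) - E_m^{(f;g)}((\theta_i\circ u);(\phi_k\circ v))\left(\frac{\partial}{\partial\theta_\alpha}\ln(f(u));\frac{\partial}{\partial\phi_\beta}\ln(g(v))\right)\right\}h^{l\beta}_{j\alpha}(f(u),g(v))\right) + h^{kl}_{ij}(f(u),g(v)) \ge 0
\]
But, observing first of all that h is statistically independent, then by the lemma
the second term reduces to
\[
2E_m^{(f;g)}\left(\left\{-E_m^{(f;g)}((\theta_i\circ u);(\phi_k\circ v))\left(\frac{\partial}{\partial\theta_\alpha}\ln(f(u));\frac{\partial}{\partial\phi_\beta}\ln(g(v))\right)\right\}h^{l\beta}_{j\alpha}(f(u),g(v))\right)
\]
using $E_m(-v) = -E_m(v)$, together with the definition of unbiased turbulent
estimator, we get
\[
-2E_m^{(f;g)}\left(((\theta_i\circ u);(\phi_k\circ v))\left(\frac{\partial}{\partial\theta_\alpha}\ln(f(u));\frac{\partial}{\partial\phi_\beta}\ln(g(v))\right)h^{l\beta}_{j\alpha}(f(u),g(v))\right)
\]
from which follows, after integration by parts and use of the factorisation lemma,
that the second term in the original expansion simplifies to
\[
-2E_m^{(f;g)}(h^{kl}_{ij}) = -2h^{kl}_{ij}
\]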
∗
∂(∂ ∗ ∗ (τ ))
◦R◦∂(α) ◦ R(σ)
(γ)
Topics in turbulent statistical dynamics
statistical effects, so this model will not be useful in such cases, since it is a perturbative
expansion in $\|\epsilon\|_{\mathbb{R}^3}$, $\epsilon = (\epsilon_1, \epsilon_2, \epsilon_3)$.
First of all, ignore perturbative effects. Assume in other words that the geometry
is to some extent sharp, but exhibiting both types of fractal behaviour. Then I claim
we will have an action of the form
where $f(m,a)^{g(m,n,b)} = \delta(\sigma(m) - a)^{\delta(\tau(m,n) - b)}$ induces the metric $(\sigma;\tau)$ on $M$. In
particular the associated signal function S is of the form
S = ((f ; g); h)
With respect to τ :
Rτ = 0
With respect to σ:
With respect to :
(∂σ ∗ R(Rτ )σ ; Rκ ) = 0,
Rτ = 0
and
(∂σ ∗ R(Rτ )σ ; ∂σ ∗ Rκ ) = 0
We might be interested in the case where the geometry is flat, as is usually the
case where one is examining the behaviour of fluids on earth. Hence all three metrics
become purely antisymmetric. The aim is to now rewrite the corresponding PDEs
and compare with the standard form of the Navier-Stokes equations.
I should remark however that, before we can go any further, just like before we
need to reexamine the fundamentals more closely to make sure we know what we
are dealing with. This is a task for the future.
\[
\int_M (\partial_\sigma^{*}R_\sigma;R_\tau)
\]
For this to be critical we require both the derivatives wrt τ and σ to vanish.
Recalling the definition of derivative, we have
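\[
\partial^{f_\tau} := \sum_{n=0}^{\infty}\frac{f_\tau^{(n)}(x)\,\partial^n}{n!}
\]
hence
\[
\frac{\partial}{\partial\tau}\partial^{f_\tau} := \sum_{n=0}^{\infty}\frac{\partial f_\tau^{(n)}(x)}{\partial\tau}\frac{\partial^n}{n!} = \partial^{\frac{\partial f}{\partial\tau}}
\]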
(∂σ ∗ Rσ ; Rτ ) = 0
Differentiating wrt σ is not entirely a good idea, but differentiating half with
respect to σ, then half with respect to σ −1 , ie applying the operator ∂σ ∗ , is a better
idea and clearly equivalent.
Via the same trick we have
(n) ∂ n+1 g
∂∂ f g = Σ f n!
= ∂ ∂f g
(∂σ ∗ Rσ ; ∂σ ∗ Rτ ) = 0
But the fractional derivatives of a function can only both be zero if the orders
of the derivative are both the same. So
∂σ ∗ Rτ = Rτ
But this is impossible unless ∂σ ∗ is the identity operator, which will occur only
if σ = 0. So the optimal dimension is zero as required.
Now, in a sharp first order turbulent geometry, it turns out that the concept of
index generalises in a nice way:
Lemma 9.4.2. (Turbulent index). The index of a sharp turbulent geometry
$(\partial_\sigma^{*}\delta(\sigma(m) - a);\delta(\tau(m) - b))$ takes the form
\[
\mathrm{ind}(\partial_\sigma^{*}\sigma;\tau) = \mathrm{ind}(\sigma)^{\mathrm{ind}(\tau)}
\]
Proof. (sketch). This basically follows from the fact that the weighted index of $A^B$,
where A, B are both matrices, is equal to the index of A raised to the index of B.
For if we denote $A = \ln\hat A$ to be the logarithm of $\hat A$, ie such that $\hat A = e^A = \sum\frac{A^n}{n!}$,
we get $\hat A^B = e^{A\otimes B} = \sum\frac{(A\otimes B)^n}{n!}$. Now the index of $A\otimes B$ is the index of A times the
index of B. Write $\hat{\mathrm{ind}} := \log\circ\,\mathrm{ind}\circ\exp$. Then similarly $\hat{\mathrm{ind}}(A\otimes B) = \hat{\mathrm{ind}}(A)\,\mathrm{ind}(B)$.
Hence the index of $e^{A\otimes B}$ will be the exponential of this index, ie $e^{\hat{\mathrm{ind}}(A)\mathrm{ind}(B)}$. But the
index of $\hat A$ is $e^{\hat{\mathrm{ind}}(A)}$ ie $\mathrm{ind}(e^A) = e^{\hat{\mathrm{ind}}(A)}$; hence the index of $\hat A^B$ is $\mathrm{ind}(\hat A)^{\mathrm{ind}(B)}$.
Proof. (sketch). Via the proof of minimisation of dimension, in a sharp second order
turbulent geometry we must have that the dimension for the base and first levels
must be as small as possible. In particular at the first level the geometry of the
fractal measure will be trivial when optimised.
Now, by the lemma above, we then must have that the index of the geometry to
first order is $\mathrm{ind}(\sigma)^{\mathrm{ind}(\tau)}$. But by the above remarks τ = 0, and so ind(τ ) = 0. Since
we are assuming that the geometry of the base is nontrivial (in particular, since we
are assuming perturbative effects, which pose an obstruction to minimisation), we
have that σ ≠ 0. Hence the index is well defined and must be one.
It is then relatively easy to see that these results should generalise; in particular
it seems clear, as a consequence of these two results, that an optimal and stable
almost sharp geometry must be Lorentzian.
\[
\int_M (\partial_\sigma^{*}R_\sigma;\partial_\tau^{*}R_\tau;R_\kappa)
\]
of one could arbitrarily determine the state of the other. In particular this demon-
strated that a contradiction could be constructed wherein it was possible to violate
the second assumption, that two noncommuting observables could have independent
reality in the second state.
This motivated the development of hidden variable models, such as the Bohmian
mechanics of Aharonov and Bohm [5], [6] in the 1950s, in an effort to restore to
quantum mechanics causality and locality. But these theories all suffered from
various problems. In [3], Bell argued that it was impossible to construct a theory,
consistent with the wave function interpretation of quantum mechanics that did not
involve entanglement, or nonlocal effects. In order to show this he constructed a
famous inequality, now known (unsurprisingly) as the Bell Inequality.
Of course, the theory which I have laboriously constructed is, in and of itself,
riddled through and through with hidden variables. So in order to continue it
becomes necessary to address Bell’s arguments, and examine the assumptions upon
which he reached his conclusions.
Certainly in the course of this chapter and indeed over the course of this treatise
I have demonstrated that it is possible to construct higher order geometric objects,
which transcend limiting oneself to examining the geometry of the base "physical"
space. This allows mechanisms via which information may transit smoothly between
points which in a simpler geometry would be far apart. For instance, consider naively
the example of a manifold, and consider two points which with respect to one metric
$\sigma_0$ are a distance $L$ apart. But we may consider a family of metrics $\sigma_t$ such that the
distance between these two points with respect to $\sigma_t$ is $Le^{-t}$. Then clearly in the
limit as t becomes large and positive the distance will drop to zero.
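The shrinking distance in this example can be illustrated numerically. The following is only a minimal sketch: it assumes, for concreteness, the conformally rescaled family $\sigma_t = e^{-2t}\sigma_0$ (the text requires only that the distance scale as $Le^{-t}$), and the function name `distance_at` is hypothetical.

```python
import math


def distance_at(t: float, initial_distance: float) -> float:
    """Distance between two fixed points under sigma_t = exp(-2t) * sigma_0.

    Rescaling a metric by a constant factor c**2 rescales all lengths by c,
    so with c = exp(-t) the distance becomes initial_distance * exp(-t).
    This particular family is an assumption made for illustration only.
    """
    return initial_distance * math.exp(-t)


if __name__ == "__main__":
    L = 10.0  # distance with respect to sigma_0
    for t in (0.0, 1.0, 5.0, 20.0):
        print(f"t = {t:5.1f}   distance = {distance_at(t, L):.3e}")
```

The printed distances decrease monotonically towards zero as t grows, which is the behaviour the paragraph above relies on.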
We may now construct as in chapter 7 a distribution over this one parameter
family of metrics. So we have a probabilistic theory with hidden variables. Via Bell’s
paper our theory is still nonlocal. But we may make a further extension of the class of
objects we are dealing with, to consider what I describe in this chapter as turbulent
geometry. Then via the correspondence principle our statistical geometry becomes
a sharp first order turbulent geometry, a hidden variable model which nonetheless
provides us with a local description, and furthermore wherein at least part of the
information - the part corresponding to the limit of the sequence of metrics σt - is
mediated between the two points instantaneously.
Furthermore, if one considers the geometry to be almost sharp, in the sense I
defined earlier in chapter 7, then one recovers standard quantum mechanical effects
(as in chapter 8, section 3.5).
Definition 46. A hidden variable model (in the sense of Bell) is a model that
(ii) makes use of variables that are "hidden" in the sense that although they
impact on the process of measurement of physical parameters, they are not
necessarily measurable themselves.
This is certainly quite restrictive! By inspection it does not capture
the full generality of the mathematics which I have developed in the course of this
project; it fails even to capture the spirit of the statistical geometry. It is
similar to trying to understand the full generality of statistical geometry by merely
looking at the behaviour of almost sharp signal functions. Indeed, it is clear the
following is true:
Lemma 9.4.4. A hidden variable model (in the sense of Bell) is equivalent to the
standard product of an almost sharp geometry K with some other form of topological
space T . Furthermore, this is a product of the simplest type: if m ∈ K and n ∈
T , then the corresponding hidden variable model (in the sense of Bell) will have
corresponding coordinates (m, n).
Proof. (sketch). This follows from the fact that one is still using a wave function
description, and the definition of measurement of an observable in a hidden variable
model.
Chapter 10
In this chapter I will give a number of results from number theory which can be
derived using the methods I have outlined in preceding chapters. In a sense, many
of the results I will sketch here serve as "toy examples" for the physics associated
to the same mathematical models. The first section requires nothing more than
a detailed understanding of the statistical calculus for statistical manifolds. The
second needs a basic understanding of statistical stacks and of course of statistical
manifolds. As for the final result, it requires the tools of turbulent geometry.
As a minor remark, for the special case of restricting to criticality of subsets
of the real numbers, and considering their analytic extensions, it turns out that
the theory of turbulent stacks and the theory of turbulent geometry become more
or less equivalent. This is mainly because the statistical derivative becomes trivial
through experiencing cancellation; the sections over the space of inner products
do not contribute to the dynamics. However if one were to consider more general
results pertaining to tuples of complex numbers, these techniques split apart again
and develop their own individualistic character.
Of course there are many, many more results and variations on such that can
be derived using these tools, methods, and general philosophy. Such a treatment is
beyond the scope of this work - the main purpose of this chapter is not to catalogue
all potential results that can be demonstrated using the machinery of information
geometry, but rather to demonstrate its power and capacity. Indeed, it is my suspicion
that it would be impossible to succeed in the former task, since I am of the
opinion that the well of truth accessible after this fashion is inexhaustible.
The first statistical moment of the complex numbers
10.1.1 Introduction
Broadly speaking, the prime numbers are defined to be the minimal set that contains
maximal information about the natural numbers, ie the minimal set that generates
the natural numbers via multiplication.
I have already introduced the concept of a statistical manifold and developed
a variational principle which essentially amounts to computing critical points of an
information like quantity, called the Fisher information, on the statistical manifold.
Motivated by these two similar properties, it seems natural that many famous
conjectures and problems in number theory should be translatable into this language.
In this short note I indicate how it is possible to translate a famous conjecture of
Riemann’s into this context, and sketch how it might be possible to prove it using
these new tools.
This conjecture is:
Conjecture. (Conjecture A). The class of functions $g_{\{f_n\}_{n\in\mathbb{N}}}(z) = \sum_{n\in\mathbb{N}}\frac{f_n}{n^z}$, where
$\{f_n\} \in l^\infty(\mathbb{C})$, have no poles to the right of the critical line Re(z) = 1/2.
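To make the objects in this conjecture concrete, the following Python sketch evaluates partial sums of a series of the form $\sum_n f_n/n^z$. It is an illustration only: the function name `dirichlet_partial_sum` is hypothetical, and partial sums are only meaningful where the series converges (for bounded coefficients, absolutely for Re(z) > 1), so the sketch says nothing about the analytic extension or about the conjecture itself.

```python
import math


def dirichlet_partial_sum(coeffs, z):
    """Partial sum of sum_{n>=1} f_n / n**z, with coeffs = [f_1, f_2, ...].

    z may be a real or complex number; Python evaluates n**z for a positive
    integer n and complex z using the principal branch.
    """
    return sum(f / (n ** z) for n, f in enumerate(coeffs, start=1))


if __name__ == "__main__":
    ones = [1.0] * 10000  # f_n = 1 gives partial sums of the Riemann zeta series
    print(dirichlet_partial_sum(ones, 2), math.pi ** 2 / 6)
    print(dirichlet_partial_sum(ones, complex(2.0, 1.0)))
```

With all coefficients equal to one and z = 2, the partial sum approaches the classical value π²/6, which gives a quick sanity check of the implementation.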
I will quickly illustrate now how this implies the more famous conjecture:
Conjecture. (Riemann Hypothesis). The zeroes of the functions $g_{\{f_n\}_{n\in\mathbb{N}}}$ lie on the
critical line Re(z) = 1/2.
Corollary 10.1.2. The class of functions $g_{\{f_n\}_{n\in\mathbb{N}}}$ have no zeroes to the right of the
critical line Re(z) = 1/2.
Number theory from information geometry
It is then well known that the RH follows from this statement for this class. In
particular, this follows from the symmetry of the zeroes for L-functions about the
critical line Re(z) = 1/2, which is in turn a consequence of the functional equation
for the same.
Now, let Π(x, a, d) denote the number of prime numbers in an arithmetic progression
a, a + d, a + 2d, ... which are less than or equal to x. If the RH is true, then
it is well known that for every coprime a and d and for every ε > 0, we have that
\[
\Pi(x,a,d) = \frac{1}{\phi(d)}\int_2^x \frac{1}{\ln(t)}\,dt + O(x^{1/2+\epsilon}) \quad \text{as } x \to \infty,
\]
where φ : N → N is the Euler phi function, that is, the function where φ(n)
denotes the number of naturals less than or equal to n which are coprime to n.
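The quantities in this estimate are easy to compute for small x. The sketch below is a hedged illustration, not a test of the RH: the helper names (`euler_phi`, `primes_up_to`, `count_primes_in_progression`, `offset_log_integral`) are hypothetical, the integral is approximated by a plain midpoint rule, and x is assumed to be at least 2.

```python
import math


def euler_phi(n):
    """Number of naturals k with 1 <= k <= n that are coprime to n."""
    return sum(1 for k in range(1, n + 1) if math.gcd(k, n) == 1)


def primes_up_to(x):
    """Sieve of Eratosthenes; returns all primes <= x (x >= 2 assumed)."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(x) + 1):
        if sieve[p]:
            # Clear every multiple of p starting from p*p.
            sieve[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
    return [n for n in range(2, x + 1) if sieve[n]]


def count_primes_in_progression(x, a, d):
    """Pi(x, a, d): primes <= x lying in the progression a, a+d, a+2d, ..."""
    return sum(1 for p in primes_up_to(x) if p >= a and (p - a) % d == 0)


def offset_log_integral(x, steps=20000):
    """Midpoint-rule approximation of the integral of 1/ln(t) from 2 to x."""
    h = (x - 2) / steps
    return h * sum(1.0 / math.log(2 + (i + 0.5) * h) for i in range(steps))


if __name__ == "__main__":
    x, a, d = 100000, 1, 4
    actual = count_primes_in_progression(x, a, d)
    predicted = offset_log_integral(x) / euler_phi(d)
    print(actual, predicted, actual - predicted)
```

Comparing the count with the main term for a few coprime pairs (a, d) gives a feel for the size of the error term, although no finite computation can confirm the asymptotic bound.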
is not only zero but that δK, the first variation of K, be zero too. Here ψ =
gradΛ f . We ignore torsion due to our assumption of analyticity which renders its
contribution zero by Cauchy’s theorem.
In particular I will shortly write ψ more explicitly in terms of the signal
function. This will be to assist in generalisations.
\[
I = K = \int_M\int_A \|\partial f\|^2/f
\]
Let $F = e^h$. Then $F'' = (h'' + (h')^2)e^h$ and $F' = h'e^h$, so that $F'' - \frac{(F')^2}{F} = h''e^h$.
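The identity $F'' - (F')^2/F = h''e^h$ can be sanity-checked numerically in one variable. The following sketch uses central finite differences; the helper names and the test function h = sin are assumptions chosen purely for illustration.

```python
import math


def first_derivative(func, x, step=1e-4):
    """Central difference approximation of func'(x)."""
    return (func(x + step) - func(x - step)) / (2.0 * step)


def second_derivative(func, x, step=1e-4):
    """Central difference approximation of func''(x)."""
    return (func(x + step) - 2.0 * func(x) + func(x - step)) / step ** 2


def identity_residual(h, x):
    """Return (F'' - (F')**2 / F) - h'' * exp(h) at x, where F = exp(h).

    The residual should vanish up to finite-difference error.
    """
    F = lambda t: math.exp(h(t))
    lhs = second_derivative(F, x) - first_derivative(F, x) ** 2 / F(x)
    rhs = second_derivative(h, x) * F(x)
    return lhs - rhs


if __name__ == "__main__":
    for point in (-1.0, 0.0, 0.7, 2.0):
        print(point, identity_residual(math.sin, point))
```

The residuals are small compared with the size of either side, consistent with the identity; this is of course a check at sample points only, not a proof.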
Write $\psi = F^{1/2}\frac{\partial}{\partial\sigma}\partial\sigma$.
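If we recall that $\partial_\Lambda(z,a) = \int_A F(z,b)\left(\frac{\partial}{\partial z} + \frac{\partial}{\partial b}\right)db$ then roughly speaking
$h'' = \frac{\partial^2 h}{\partial z^2} + \frac{\partial^2 h}{\partial a^2} + 2\frac{\partial^2 h}{\partial z\,\partial a}$. But the second term vanishes due to the holographic principle,
and so does the third, so in particular, we have
\[
h'' = h_{zz}
\]
\[
h'' = \int_A F(z,b)\,h_{zz}\,db
\]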
Note we have the following functional identity that will come in handy:
Proof. This may be checked qualitatively by verifying that each side possesses the
same general properties; alternatively, one may take a sequence of smooth functions
converging to H and check both sides, observing that H 0 = δ.
Lemma 10.1.4. h(z, a) = A(a)z + B(a) + G(z, a) Hδ (γ(z, a)), where H is the Heaviside
function and δ is the Dirac delta functional, and G(z, a) is some random smooth
function depending on σ.
∂
Proof. Certainly hzz = k ∂σ ∂σk2 δ follows easily from above. Hence h is the second
antiderivative of this expression.
00
But δ may be written as Hδ . Then certainly h = G(z, a) Hδ (γ(z, a)) + A(a)z +
B(a), since z is the active variable here, and we may integrate out the derivatives
on H by judicious use of the holographic principle.
Proof. By symmetry $H(z,a) = -H(z,-a)$ and $H_a(z,a) = H_a(z,-a)$. Then
$H^{1/2}(z,a)\hat\psi_a = -H^{1/2}(z,-a)\hat\psi_a = H_a^{1/2}\hat\psi(z,-a) = H_a^{1/2}\hat\psi(z,a)$ follows as a consequence of these
symmetries and integration by parts.
Lemma 10.1.7. The fact that the first variation of the information is zero implies
that γ = γ(2z − a).
Proof. Since the first variation of $\int_M\int_A h_{zz}e^h$ is zero already, we may restrict ourselves
to the reduced information
\[
\hat I = \int_M\int_A H(\gamma)\|\hat\psi\|^2\,da\,dz
\]
where $\hat\psi = H\psi$.
Note also by conservation of probability, we moreover have that divΛ ψ = 0, and
hence divΛ ψ̂ = 0.
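Requiring that $\delta\hat I = 0$ implies that $\partial\hat I = 0$, or, ∂γ
Recall that ∂ is essentially the operator $\frac{\partial}{\partial z} + \frac{\partial}{\partial a}$. Now, we use the fact that
\[
H_a^{1/2}\hat\psi = H^{1/2}\hat\psi_a
\]
together with
\[
\hat\psi_z = 0
\]
\[
\gamma_z + 2\gamma_a = 0
\]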
\[
I = \int_M\int_A H(\gamma(z,a))\|\psi(z,a)\|^2
\]
If we compute the first variation with respect to γ once again and set it to zero,
we have that
\[
\delta(\gamma(t))\left.\frac{d\gamma}{dt}\right|_{t=a-2z}\|\hat\psi\|^2 = 0
\]
\[
0 = H(\gamma(t))\frac{d\gamma'}{d\gamma}\|\hat\psi\|^2
\]
which in turn implies $\frac{d\gamma'}{d\gamma} = 0$ or γ′ is constant, hence γ = Ct + D for constants
C and D. But by the Riemann mapping theorem we can deform γ to Ct since
the transformation T : Ct + D ↦ Ct is a conformal map. Finally we observe that
H(Ct) = H(t) as C is constant, which completes the proof.
Proof. (of Conjecture A). Note that F is analytic (since our statistical manifold is
analytic, require that the signal function be analytic).
It then follows from the analyticity of F that
\[
0 = \int_M\int_A F(z,a) = \int_M\int_A H(a-2z)\,e^{A(a)z+B(a)} = \int_M\int_A H(a-2z)\,\bar B(a)\,a^{-z}
\]
for certain choices of A and B.
Now suppose $\bar B(a) = \sum_{n=1}^{\infty} B_n\delta(n-a)$. Then the above expression becomes
\[
0 = \int_{\mathrm{Re}(z)\ge\frac{1}{2}}\sum_{n=1}^{\infty} B_n n^{-z}
\]
which is
\[
\int_M\int_A\int_A ((\Delta F)\delta + F\Delta\delta) = \int_M\int_A (h'' + R(\sigma)\delta)e^h
\]
where F (m, b) = eh(m,b) , since cross terms vanish as they can be rewritten as
boundary terms.
If this is to be zero, then h00 + Rδ = 0, or, via a similar argument to the previous
Lemma 10.1.4,
Turbulence and the criticality of the prime numbers
where we require that $\|\psi(a,z)\|^2 := (h'' + R)e^{A(a)z+B(a)}$ behaves nicely asymptotically
(and in fact behaves like a generalised mass distribution). Note that this
may not necessarily be positive definite.
The rest of the argument is identical to that given previously, following
Lemmas 10.1.7 and 10.1.8.
to achieve. The reason for this is that I am of the opinion that there is a natural
geometric hierarchy of models of growing sophistication and generality, and that not
only do models higher in the progression give better information, but that there are
an infinite number of such.
So, following an initial extension of the Riemann conjecture to signal functions
with fully general turbulence in the stack and the measure, I will indicate the beginnings
of the idea of a new mathematics, the idea of geometric exponentiation, and
suggest how this could be used to get even better results.
\[
I(\sigma,\tau) = \int_M (\partial_\sigma^{*}R_\sigma;R_\tau)\,dm
\]
Suppose M is the complex plane, and σ,τ are analytic functions from C to C,
being metrics on M . By the correspondence principle it is equivalent to the Fisher
information given by the signal function
\[
f(m,a) = \int_A F(m,b)\,\delta(\bar\sigma(m,b) - a)\,db
\]
(∂σ ∗ Rσ ; ∂σ ∗ Rτ ) = 0
and
(∂σ ∗ Rσ ; Rτ ) = 0
So since $R_\sigma = 0$ we have that $\frac{\partial^2}{\partial z^2}\ln(\sigma) = 0$.
Solving for σ, we get
ln(σ) = Az + B
or
σ = eAz+B
(Note that the result is independent of the value of τ , but of course we need
τ ≠ 0 as demonstrated above.)
This follows since
∂δ
(i) (∂f ∗ f ; g)2 = ( ∂σ
∂
(σ δ(τ −b) ) ∂σ
∂
(σ (−1)δ(τ −b) ))δ(σ − a)2 , as ∂f δ
= f δ.
(ii) f (z, a)δ(τ −b) = δlnf + H(κ(z, a)) for some function κ, so the above reduces to
∂ ∂
(iii) ∂σ
(δlnσ+ H) ∂σ (−δlnσ + H)δ(σ − a)2 = H(κ̂(z, a))2 δ(σ − a)2 , for a deformed
function κ̂ which proves my claim (if we approximate δ by a sequence of peaked
Gaussians of the form $be^{-ax^2}$ with integral normalised to one, then $\delta^2$ will be
by definition approximated by a sequence of the form $b^2e^{-2ax^2}$, and hence in
the limit will have zero measure).
We then conclude as before that κ̂(z, a) = C(2z − a) for some constant C, using
once more the fact that the information is critical.
Finally the conjecture then follows as before as a consequence of the analyticity
of F .
We would like to extract optimal information about critical subsets of the real
line using this model, given that f , g and h are all analytic. We know that the
information will be
(∂κ∗ Rκ ; ∂κ∗ Rγ ) = 0
(∂κ∗ Rκ ; Rγ ) = 0
Rβ = 0
Solving these simultaneously in the case that M is the complex plane should
provide us with a sharper understanding of critical subsets of the real numbers.
For some idea of what we might expect to come out of this, I provide the reader
with a conjecture that may well be a partial consequence of analysis of the above.
Conjecture. (Turbulent RH). Let $f, g \in l^\infty(\mathbb{N})\times\mathbb{N}$ be functions from N × N to
R. Consider the generalised L-function
\[
\zeta_{(f,g)}(z) = \sum_{i,j} f_{ij}\,\frac{\partial^{\,g_{ij}j^{-z}}}{\partial z^{\,g_{ij}j^{-z}}}\,i^{-z}
\]
I claim that, for all such f and g, the analytic extension of ζ(f,g) to the complex
numbers will only have (nontrivial) zeroes on the critical line
Geometric exponentiation and deeper information
\[
\mathrm{Re}(z) = \left.\frac{d}{dx}\arctan\left(\sqrt{x\tan(x)}\right)\right|_{x\tan(x)=\lambda(g)},
\]
where λ(g) is the average of the eigenvalues of the gij , weighted by multiplicity.
As a couple of remarks, note that this conjecture reduces to our statements for
the first and second statistical moments of the complex numbers, respectively (as
well it should). Perhaps the most mysterious aspect is the relationship between
λ(g) and the location of the critical line - and in particular the appearance of the
trigonometric function tan. Essentially this means that as the average eigenvalue
increases, the location of the critical line will occasionally "spike" to positive infinity,
then reappear at negative infinity and quickly drop back close to zero again, and do
so with a period of length π. Indeed, this is a significant departure from the naive
expectation that if we were to have taken successively higher statistical moments
of the complex numbers, that the location of the critical line for the associated L
functions would have jumped by multiples of 1/2 indefinitely.
It would be very good to even have a sketch of the above. However, given the
fact that I am still uncertain whether this is a precisely correct consequence of the
criticality of the turbulent information (even given everything else, I am unsure
about whether λ should not also be a function of f ), it would perhaps be wiser
to instead try to see what follows from the criticality of the action I have given
above instead. Also it was never my intent to delve particularly deep into this
area of mathematics - rather, I merely wished to indicate how the tools I have
developed can be used, and what results might follow if care is taken.
It also goes without saying that if the above conjecture is slightly off-beam and
in fact not a corollary of the criticality of our signal function above, a counterexample
should be easy enough to find via numerical methods.
indexing on the tangent space of $A^B$ via v(w) for the vector v(w) assigned to the
wth copy of the tangent space of A. Furthermore it is in fact possible to induce
a natural geometric structure on $M^N$ using the metrics σ and τ , via the fourth
order nondegenerate tensor Λ = σ ⊗ τ , with geometry given by
\[
\|(v(w),p(q))\|_\Lambda := \sigma_{ij}\tau_{kl}\,v^i(w^k)\,p^j(q^l).
\]
10.3.2 Application
To reiterate, this idea was developed in an attempt to find a way to deal with
an infinite number of geometric bifurcations, as is engendered by the two types of
mathematical turbulence in the turbulent geometry discussed above (turbulence in
the stack and the measure). It is my hope that it might be possible to find a new
correspondence principle for such spaces that allows the description of the previous
geometry not only in full generality, but also in an elegant and finite way.
As to applications of geometric exponentiation, my thoughts on this matter are
still quite vague. However I suspect it could be used to improve the understanding of
transcendental equations, as well as of aspects of the mathematics related to Galois
theory. I also have the intuition that geometric exponentiation would be useful to
find still deeper results about critical subsets of the real line than those described
previously in this chapter.
For instance, one might expect actions of the following general form, in the
simplest case:
\[
\int_M S_\Lambda(m)\,dm
\]
\[
\int_M (\partial_\Lambda^{*}S_\Lambda;S_\Gamma)\,dm
\]
ficult to make. However, applications of such methods may have large potential
payoff. I suspect that even slight progress could have benefits to the understanding
of materials science (in the case of plastic geometry) and improvement in sieving
methods (with respect to the tensor rank manifolds outlined). The reason to suspect
the latter is because it is quite possible that the metric on the tensor rank space may
be in correspondence with the bilinear form of the remainder term in the linear sieve
(see for instance Greaves [24]). Related questions, such as the Geometric Langlands
program, might also be addressable with deeper understanding.
Note that if we step back a bit, it is easy to observe that all the above
considerations relate to having ideas from statistics drive the development of
abstract geometric structures on manifolds. It then becomes natural to ask whether
we can use geometry to drive the development of statistical structures. Motivating
questions here include the rigorous formulation of Heisenberg's matrix mechanics,
the understanding of Brownian motion, and doubtless various other processes, such
as those arising in queuing theory, population dynamics, or the analysis of markets
and group psychology.
And in fact we can. The prototypical example to bear in mind here is that of a
Markov chain, where one has a nondegenerate bilinear form driving the evolution of
a probabilistic state vector. It seems clear to me that this is a natural place to
start the process of abstraction and generalisation with respect to this particular
viewpoint. In fact, in retrospect, it seems to me that this and its continuous
analogues are perhaps closer in spirit to the work of many of the more established
practitioners of Information Geometry than that covered in the bulk of this
treatise, being of a more statistical and less geometric flavour.
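To make the Markov chain picture concrete, the following is a minimal sketch rather than anything developed in this treatise. It assumes a two-state chain whose row-stochastic transition matrix P plays the role of the bilinear form; the particular entries of P, the initial state, and the helper names step, evolve and pairing are all illustrative assumptions. The matrix is chosen with nonzero determinant so that the associated form is nondegenerate.

```python
# Minimal sketch: a two-state Markov chain whose transition matrix P
# drives the evolution of a probability state vector.
# P, the initial state and all helper names are illustrative assumptions.

def step(state, P):
    """One step of the chain: new_j = sum_i state_i * P[i][j]."""
    n = len(P)
    return [sum(state[i] * P[i][j] for i in range(n)) for j in range(n)]

def evolve(state, P, steps):
    """Apply `step` repeatedly and return the final state vector."""
    for _ in range(steps):
        state = step(state, P)
    return state

def pairing(u, P, v):
    """Bilinear form B(u, v) = sum_ij u_i * P[i][j] * v_j induced by P."""
    n = len(P)
    return sum(u[i] * P[i][j] * v[j] for i in range(n) for j in range(n))

# Row-stochastic matrix (each row sums to 1).
# det(P) = 0.9*0.5 - 0.1*0.5 = 0.4, which is nonzero, so the form
# induced by P is nondegenerate.
P = [[0.9, 0.1],
     [0.5, 0.5]]

initial = [1.0, 0.0]             # all probability mass on state 0
final = evolve(initial, P, 50)   # state vector after 50 steps
```

For this choice of P the state vector settles towards the stationary distribution (5/6, 1/6), which solves πP = π, and total probability is preserved at every step. The pairing function simply exhibits the same matrix as a bilinear form on pairs of vectors.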
These and doubtless other considerations will hopefully be the subject of a future
work.
Bibliography
[3] - John S. Bell - “On the Einstein Podolsky Rosen paradox”, Physics 1, 195
(1964)
[7] - Banks and Zaks - “On the phase structure of vector-like gauge theories with
massless fermions”, Nucl. Phys. B 196:189, 1982
[8] - Huai Dong Cao and Xi Ping Zhu - “A Complete Proof of the Poincare and
Geometrization Conjectures - Application of the Hamilton-Perelman Theory of
the Ricci Flow” - http://www.intlpress.com/
[10] - N.N. Cencov - “Statistical decision rules and optimal inference” - Translations
of Math. Monog. 53, Amer. Math. Soc., Providence 1982
[30] - John David Jackson - “Classical Electrodynamics”, Third edition 1999,
Hamilton Printing Company
[36] - John W. Morgan and Gang Tian - “Ricci Flow and the Poincare Conjecture”,
http://arxiv.org/abs/math/0607607
[37] - John W. Morgan and Gang Tian - “Completion of the Proof of the
Geometrization Conjecture”, http://arxiv.org/abs/0809.4040
[39] - Misner, Thorne and Wheeler - “Gravitation”, W.H. Freeman and Company,
1973
[40] - Michael K. Murray and John W. Rice - “Differential Geometry and Statistics”,
Monographs on Statistics and Applied Probability 48, Chapman and Hall, 1993
[42] - Grisha Perelman - “The Entropy Formula for the Ricci Flow and its Geometric
Applications” - http://arxiv.org/abs/math.DG/0211159
[44] - Denes Petz - “Information Geometry and Statistical Inference” - 2nd
International Symposium on Information Geometry and its Applications, December
12-16 2005, Tokyo
[46] - Bernhard Riemann - “On the Number of Prime Numbers less than a Given
Quantity”, Monatsberichte der Berliner Akademie, November 1859
[50] - Jalal Shatah and Michael Struwe - “Geometric Wave Equations”, Courant
Lecture Notes in Mathematics, 1998