A Treatise On Information Geometry: Chris Goddard, The University of Melbourne

arXiv:1802.06178v1 [math.DG] 17 Feb 2018

Contents
Preface
Acknowledgements
Organisation and attribution of work
1 Characteristic Classes
1.1 Characteristic Classes as pullbacks
1.2 Standard Classes and their Construction
1.2.1 The Stiefel-Whitney Classes
1.2.2 The Euler Class
1.2.3 Chern Classes
1.2.4 Pontrjagin Classes
1.3 Generalisations
1.4 Generalised Invariants and application to Exotic Differentiable Structures
1.4.1 Definition
1.4.2 Geometric Interpretation of Generalised Invariants
2 Pseudo-Riemannian Geometry
2.1 Overview
2.2 The fundamental theorem of Riemannian geometry
2.2.1 The metric tensor
2.2.2 The Euclidean covariant derivative
2.2.3 Axiomatic extension to arbitrary metrics
2.2.4 Uniqueness of the Levi-Civita connection
2.3 Geodesics and the geodesic equation
2.3.1 Definition and variational formulation
2.3.2 The cut and conjugate loci
2.4 Curvature
2.4.1 The curvature tensor and its symmetries
2.4.2 Interpretation of the Ricci and Scalar curvature
2.4.3 Some basic comparison results
2.5 Stokes’ Theorem
2.6 Kähler geometry and Convexity Theory
2.6.1 The notion of convexity in determining stability
2.6.2 Kähler manifolds
2.6.3 Relation to minimal surface theory
2.6.4 Further thoughts
3 A Survey of Geometric Measure Theory
3.1 Introduction and Motivation
3.1.1 The Plateau Problem as the Prototypical Example
3.1.2 Possible problems with the standard variational approach
3.1.3 Geometric Measure Theory comes to the rescue
3.2 Sobolev Spaces
3.2.1 Preliminary Results
3.2.2 Motivation
3.2.3 The Sobolev Spaces $W^{k,p}(\Omega)$, $W_0^{k,p}(\Omega)$ and $W_{\mathrm{loc}}^{k,p}(\Omega)$
3.2.4 Approximation of Sobolev functions by smooth functions
3.2.5 The Sobolev Inequality
3.3 Further preliminaries from analysis
3.3.1 Lebesgue Measure
3.3.2 Hausdorff Measure
3.3.3 Densities
3.3.4 Lipschitz and BV Functions
3.3.5 Jacobians and the Area Formula
3.3.6 The Co-Area Formula
3.4 Key Constructions
3.4.1 Tangent Cones
3.4.2 Rectifiable Sets and a Structure Theorem
3.4.3 Currents
3.4.4 The Rectifiability Theorem
3.5 The Compactness Theorem for Integral Currents
3.5.1 Two important theorems from analysis
3.5.2 Remarks on Radon Measures
3.5.3 Mollification
3.5.4 Proof of The Theorem, and Existence of (Weak) Solutions to the Plateau Problem
3.6 Varifolds
3.6.1 Introduction
3.6.2 The Constancy Theorem
3.6.3 The Approximation and Deformation Theorems
3.6.4 The Boundary Rectifiability Theorem
3.7 A couple of applications
3.7.1 The Frankel-Lawson Result
3.7.2 Jon Pitts’ Construction
4 Existence Theory for PDEs
4.1 Elliptic and Parabolic PDEs
4.1.1 Introduction
4.1.2 The Maximum Principle
4.1.3 Differential Harnack Inequalities
4.2 Hyperbolic PDEs
4.2.1 Existence theory for hyperbolic equations
4.2.2 Existence theory for “elastic” wave equations
5 Geometric Evolution Equations
5.1 The Heat Flow Equation as the Prototypical Example
5.1.1 Maximum Principle
5.1.2 Comparison Principle
5.1.3 Averaging Property
5.1.4 Smoothing
5.2 The Curve Shortening Flow (CSF)
5.2.1 Introduction
5.2.2 The Shrinking Circle
5.2.3 Geometric Invariance
5.2.4 Avoidance Principle
5.2.5 Maximum Principle
5.2.6 Preserving Embeddedness
5.2.7 Finite Time Singularity
5.2.8 CSF as a steepest descent flow for length
5.2.9 Existence
5.2.10 Evolution of Curvature
5.2.11 Bounds on Curvature for CSF
5.2.12 An Isoperimetric Estimate
5.3 Some Geometric Background on Hypersurfaces
5.3.1 Introduction
5.3.2 The Second Fundamental Form
5.3.3 Differentiation on a Hypersurface
5.3.4 The Structure Equation for Hypersurfaces
5.3.5 Principal Curvatures
5.4 Mean Curvature Flow (MCF)
5.4.1 Introduction; MCF as a steepest descent flow of volume
5.4.2 Existence Results
5.4.3 Induced Evolution of Various Quantities along MCF
5.4.4 The Huisken Rescaling Result
5.4.5 Simon’s Identity and an application
5.4.6 Monotonicity Formula for the MCF and application to MCF Solitons
5.5 Proof of the Huisken rescaling result of MCF
5.5.1 Preliminaries
5.5.2 The Maximum Principle for Vectors
5.5.3 Extension of Technique to 2-tensors on Hypersurfaces
5.5.4 Application of 2-tensor Extension to Proof of Huisken
5.5.5 Preservation of bounded curvature
5.6 Rescaling
5.6.1 Preliminaries
5.6.2 A compactness theorem
5.6.3 Existence of smooth limit flows
5.7 The 2 dimensional Ricci Flow (2DRF)
5.7.1 Introduction and Existence
5.7.2 Evolution of the Scalar Curvature
5.7.3 Normalisation
5.7.4 Soliton Solutions
5.7.5 Computing the “Variational Derivative” for the 2D Gradient Ricci Soliton equation
5.7.6 $C^\infty$ convergence of the 2DRF
5.8 The Ricci Flow
5.8.1 Short time existence
5.8.2 Evolution of the Curvature
5.8.3 The Structural Tensor E
5.8.4 Convergence Results for Ricci Flow
5.8.5 The Perelman Functional; connection with the Fisher Information
5.9 Proof of Hamilton’s Theorem (1982)
5.9.1 The theorem, and sketch of its proof
5.9.2 A maximum principle
5.9.3 The Uhlenbeck Trick
5.9.4 Core of the Argument
5.10 The Huisken-Sinestrari theorem; surgery and classification of canonical singularities for MCF
5.10.1 Preliminaries, including statement of the theorem
5.10.2 Canonical Singularities
5.10.3 A priori Estimates
5.10.4 Surgery
5.11 Outline of the Classification Program
5.11.1 Injectivity Radius and Collapsing of Balls
5.11.2 Canonical Neighbourhoods
5.11.3 Connection to the Ricci flow with surgery
5.11.4 Perelman’s Length or “Entropy”
6 Statistical Geometry
6.1 Preliminary Definitions
6.1.1 Motivation
6.1.2 Introduction
6.1.3 Particles and mass flux
6.1.4 Fuzzy geodesics
6.2 Existence of solutions
6.2.1 Stability
6.2.2 Characterisation of the solution space
6.3 Two classes of Signal Functions and related results
6.3.1 “Sharp” signal functions
6.3.2 Almost sharp signal functions
6.3.3 Example of a Fuzzy Geodesic
6.4 Stokes Theorem for Statistical Manifolds
6.4.1 Naive formulation
6.4.2 The general statistical derivative
7 Fisher Information and application to the theory of Physical Manifolds
7.1 The Shannon Entropy
7.2 Fisher Information Theory and the Principle of Extreme Physical Information
7.2.1 A few definitions from statistics
7.2.2 The Fisher Information and the EPI principle
7.2.3 Interpretation
7.2.4 Concluding Remarks
7.3 Unbiased Estimators and the Cramer-Rao Inequality
7.3.1 Estimators and strong Cramer-Rao
7.3.2 Physical estimators and weak Cramer-Rao (for Riemannian-metric-measure spaces)
7.3.3 Physical estimators and weak Cramer-Rao (for Riemann-Cartan manifolds)
7.4 Fisher Information is the optimal sharp information measure
7.4.1 Introduction
7.4.2 Variation of the meta information
7.4.3 Alternative perspectives
8 Physics from Fisher Information
8.1 Physics for Sharp Manifolds
8.1.1 Introduction
8.1.2 Opening statements
8.1.3 Proof of the first Claim
8.1.4 Proof of the second Claim
8.1.5 Proof of the third Claim
8.1.6 Proof of Claim 4
8.1.7 Further Work
8.1.8 Addendum - simplification of the equations of geometrodynamics
8.2 Physics for almost sharp manifolds
8.2.1 Introduction
8.2.2 Result outline
8.2.3 Proof of Claims 1 and 2
8.3 Recovery of standard results
8.3.1 Proof of Claim 2
8.3.2 Proof of Claim 3
8.3.3 Proof of Claims 4 to 7
8.3.4 Classical electrodynamics
8.3.5 (Classical) quantum mechanics
8.4 Classification
8.4.1 Motivation
8.4.2 Speculation
8.4.3 Extensions
9 Turbulent Geometry
9.1 Preliminaries
9.1.1 Fractional Calculus
9.1.2 Basic tools and motivation
9.1.3 Construction of the information
9.1.4 The Correspondence Principle
9.1.5 Further generalisations
9.2 Cramer-Rao for turbulent statistical manifolds
9.2.1 Introduction
9.2.2 Turbulent estimators and the Cramer-Rao inequality
9.2.3 Application to physical estimators
9.3 Statistical Stacks
9.3.1 Motivation
9.3.2 Basic tools
9.3.3 Dynamics
9.3.4 A few quick remarks on notation
9.4 Topics in turbulent statistical dynamics
9.4.1 Application to condensed matter physics
9.4.2 Fluid dynamics
9.4.3 The Lorentz problem
9.4.4 Entanglement physics
10 Number theory from information geometry
10.1 The first statistical moment of the complex numbers
10.1.1 Introduction
10.1.2 The Argument (sketch)
10.1.3 Alternative approach using Riemann-Cartan geometry
10.1.4 Higher moments of the complex numbers
10.2 Turbulence and the criticality of the prime numbers
10.2.1 Introduction
10.2.2 A streamlined sketch of the Riemann conjecture
10.2.3 Towards a slightly more sophisticated conjecture
10.3 Geometric exponentiation and deeper information
10.3.1 Motivation and definition
10.3.2 Application
10.3.3 Concluding Remarks
Preface
During my time at the University of Melbourne I looked at a number of different
topics. In a rough semblance of order, these were: Characteristic Classes,
Pseudo-Riemannian Geometry, Comparison Geometry (together with the diameter sphere
problem), and Geometric Measure Theory (with a focus towards generalising a
particular theorem in minimal surface theory). I will discuss each of these in turn,
with the exception of comparison geometry, since that is to be the topic of my PhD
thesis.
In July 2006 I attended the AMSI winter school at the University of Queensland,
and was most fortunate to be given an extremely accessible account of geometric
evolution equations, and in particular the Ricci Flow, by Ben Andrews of the ANU.
In the same year I also studied in some detail the theory of Information Geometry,
and its application to what I call the theory of Physical Manifolds via an information
measure called the Fisher information. I constructed a mathematics which I think
could best be described as statistical geometry for this purpose. This, and various
generalisations, forms the focus of the latter part of this manuscript.
The general premise of statistical geometry is fairly simple. Take a differential
$n$-manifold, $M$. Consider the natural space of inner products on $\mathbb{R}^n$, $A$. Associate
to each point in $M$ a distribution $f(m) : A \to \mathbb{R}^+$ which assigns in some sense a
weighting to each potential geodesic direction from $m$, possibly favouring certain
directions over others. In particular, we would like $\int_A f(m, a) = 1$ for all $m \in M$.
There is a natural statistical derivative $\nabla_f$ induced by $f$.
We might use this to force $f$ to satisfy an additional conservation condition,
which essentially translates to conservation of probabilistic flux - that $\Delta_f f = 0$.
This was in fact roughly the original approach I took towards the matter. However
it turns out that this is roughly equivalent to criticality of the Fisher information.
In particular, for the special case of a sharp or Cartan-Riemannian manifold, where
$f(m, a) = \delta(\sigma(m) - a)$ for some metric $\sigma$, we have roughly that $\Delta_f f = R_\sigma$, and also
that the Fisher information is $\int_M R_\sigma$; in particular this is critical when $R_\sigma = 0$.
Originally I also made some attempt to make concrete various ideas that do
not fit in any of the above categories. In particular, I wanted to somehow make
rigorous the idea of curvature on sets of fractal dimension. Early in 2008 I made
some progress with respect to these ideas, which became part of a project I came to
call turbulent geometry.
Following my preliminary investigations into turbulent geometry I began to focus
on the idea of building statistical spaces on top of statistical spaces. In particular
this, and various hybrid models including also turbulence and ideas from statis-
tics, have application to number theory and more practically towards constructing
theoretical frameworks for condensed matter physics. One of the more exciting de-
velopments here, in my opinion, was my discovery of the correspondence principle,
which in its simplest incarnation essentially states that there is a 1-1 correspondence
between completely general statistical manifolds and sharp turbulent geometries of
scalar type. This allows a physical interpretation of the former in a way that has con-
siderable aesthetic appeal, as of a Cartan-Riemannian manifold wherein the points
are of generalised measure.
However, with a few minor initial exceptions, much of the theory I develop in
the later sections of this book still lacks the degree of rigour which its scope would
otherwise dictate. An indication of the level of detail that is required is present
in my proof of both Stokes’ theorem and the Cramer-Rao inequality for statistical
manifolds. Unfortunately my generalisations of these theorems to the rather broader
classes of geometrical objects I deal with further along the path of my researches
are slightly rushed, and probably do require some inspection.
Indeed the main part of the dissertation I had in mind for this disclaimer is
the chapter on turbulent geometry (chapter 9). I have made a good faith attempt
to prove many results in this chapter, but I fear that many of my “proofs” here
are wanting in precision. However I have decided to retain this work because of its
potential promise to solve many extremely difficult and interesting problems. These
include what I call the Lorentz problem (why the preferred geometrical structure
of the universe should be Lorentzian), indications of how to construct a geometric
theory of condensed matter physics, and indications of how to develop a powerful
and predictive theory for entanglement physics. Others still are increased under-
standing of turbulence in fluid flow, the theory of purely formal structures of fractal
measure (such as the Mandelbrot set), and questions about the structure of the
prime numbers.
Acknowledgements
First and foremost, I would like to acknowledge my PhD supervisor, Professor Hyam
Rubinstein. Not only did he manage to continue to find me things to examine and
learn, but his boundless patience with me when I mentioned various ideas of mine
in incomplete form, and his tireless capacity for impressing upon me the need for
rigour was something which I have greatly appreciated. I would also like to thank my
parents; my father, for his interest in my research, for listening to me, and helping
guide my thought process to meaningful results, and my mother, for being a polite
listener.
Were it not for the popular science magazine the New Scientist, I would be
unaware of Frieden’s book on the subject of Fisher information, which was
reviewed there a number of years back. I should also thank my father once more for
pointing out the review to me, and impressing upon me the potential importance of
the work.
I should also thank the New Scientist for bringing to my attention the notion of
scale free physics, an idea advocated by Howard Georgi in [22], although the original
instigators, standing to Georgi much as Fisher stands to Frieden, were Banks and Zaks
in their 1982 paper [7]. This motivated much of my interest in investigating (smooth)
fractional geometry and its connection to turbulence in early 2008.
Additionally, I owe a debt of gratitude to Ben Andrews for a passing dinnertime
conversation during the 2006 IAGSM winter school in Brisbane in which he men-
tioned the work of Amari, Nagaoka, Murray and Rice on Information Geometry,
without which I would not have been able to make rigorous my justification for the
variational principle that I invoke later on. There are various incarnations of this
principle, but the first steps along the path of extension and generalisation of the
result known as the Cramer-Rao inequality would have been impossible without ref-
erence to the work of other information geometers. I might add that this principle
underlies Roy Frieden’s work.
Of course, Frieden’s work would not have been possible, or at least would certainly
have been much more difficult, without various ideas, not least that of the
Fisher information, which are attributed to Ronald Fisher (1890 - 1962).
Finally an additional word of thanks to Professor Frieden who suggested during
an email exchange in late December 2008 that one of the best tools for proving the
optimality of the Fisher information over the space of positive functionals for a given
geometric model is the Cencov inequality, which is closely related to the Cencov
uniqueness theorem. This is still something that I need to more fully investigate.
Organisation and attribution of work
The organisation of this manuscript will be as follows. In the first chapter I survey
J. Milnor’s book on characteristic classes. Chapter two is also a survey, this time
of Riemannian geometry. No particular source has been used here; rather, it is
a synthesis of a number of different works. Towards the end of the chapter some
original results are presented.
In chapter three I survey Leon Simon’s and Frank Morgan’s books on geometric
measure theory, as well as incorporating some notes from a course on the same that
Marty Ross gave back in 2006.
Chapter four is a survey mainly of the work in Gilbarg and Trudinger’s book
“Elliptic Partial Differential Equations of Second Order”, together with an extensive commentary on
techniques from Shatah and Struwe’s book on geometric wave equations.
Following this in chapter five I give the notes rewritten almost verbatim from a
course that Ben Andrews gave at the AMSI winter school of 2006, together with a few
minor personal modifications. I also cover a few lectures given by Gerhard Huisken
and one given by Nick Sheridan. The common thread to all of these materials is
that of geometric evolution equations and the Hamilton-Perelman approach to the
solution of the geometrisation conjecture, which is the topic of this chapter. I also
draw upon J. Morgan and Gang Tian’s detailed and extensive work on the same
subject.
Chapter six is the first truly original component of this work, and introduces
the notion of statistical geometry. Chapter seven draws heavily upon Roy Frieden’s
program described in his book “Physics from Fisher Information” and attempts
to make it rigorous. Chapter eight is the application of the techniques developed in the
previous two chapters to the derivation of the equations of geometrodynamics and
also the equations of the standard model. This chapter is also more or less completely
original.
Chapter nine is an extension of the ideas in the preceding three chapters to the
idea of a turbulent manifold. The ideas presented in this and the last chapter are
almost entirely due to the author.
Chapter 1
Characteristic Classes
The idea of a characteristic class is to construct some cohomological framework on
a manifold and look at the things that generate it, which are known as characteristic
classes. These are used as invariants to differentiate between different
manifolds. The best known examples are the Stiefel-Whitney
classes, Euler classes, Chern classes, and Pontrjagin classes. The Stiefel-Whitney
classes live in cohomology with $\mathbb{Z}_2$ coefficients; the other three live in
integral cohomology and are attached to oriented real, complex, and real vector
bundles respectively. I shall discuss these all in
turn, and give a broad outline of their construction, following [38].
A core element of the construction of all these various types of characteristic
classes lies in understanding fibre bundles, in particular fibre bundles with GL(n)
as structure group and vector spaces as fibres, i.e. vector bundles. A good general
reference on fibre bundles can be found in [48].
The main reason that we are interested in characteristic class theory is that it
can distinguish between different differentiable structures on a topological manifold.
In other words, it is possible to show that there are different ways of doing calculus
on a given topology. John Milnor, together with collaborator Michel Kervaire, used
this theory to great effect in classifying the various allowable structures on n-spheres
in the late 1950s and early 1960s, which has inspired much of the recent modern
interest. More recently, a particularly spectacular success, resting on Donaldson’s
work, was the proof that there are uncountably many different differentiable
structures imposable upon $\mathbb{R}^4$, for instance.
In particular the fact that there may be various different ways of doing calculus
on a given topological space was not known until Milnor’s seminal paper,
“On Manifolds Homeomorphic to the 7-sphere”. This was a major advance and
demonstrated the counterintuitive notion that exotic forms of calculus can and do
exist on manifolds. However, the method of characteristic classes is not totally
constructive, and only serves as a crude indicator of how to distinguish between
different differentiable structures. It will be one of the primary purposes and aims
of this dissertation to build a few examples of atypical geometric structures on
manifolds, and to discuss, where possible, the associated insights into physics.
For now, however, I will sketch the preliminary ideas (and motivation) behind
the development of characteristic classes.
1.1 Characteristic Classes as pullbacks

Before providing an axiomatic formulation of the concept of characteristic class, I
will give the historical motivation for their development. I will follow Chern [12]
closely here.
First of all, the Grassmann manifold $H(n, N)$ is the space of all $n$-dimensional
linear spaces through a point in $\mathbb{R}^{n+N}$. This can also be thought of as the classes
of the rotation group $SO(n + N)$ modulo the rotation group $SO(n)$ of rotations
of the base and modulo the rotation group $SO(N)$ of rotations of the fibre, i.e.
$H(n, N) \cong SO(N) \backslash SO(n + N) / SO(n)$.
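As a quick sanity check on this description, the dimension of $H(n, N)$ can be read off from the quotient. This is only a sketch: it assumes the standard fact that $\dim SO(k) = \tfrac{1}{2}k(k-1)$, which is not stated in the text, and that the dimension of the quotient is the dimension of the large group minus those of the two subgroups.

```latex
\begin{align*}
\dim H(n, N) &= \dim SO(n+N) - \dim SO(n) - \dim SO(N) \\
             &= \tfrac{1}{2}(n+N)(n+N-1) - \tfrac{1}{2}n(n-1) - \tfrac{1}{2}N(N-1) \\
             &= nN .
\end{align*}
```

This agrees with the intuition that an $n$-plane near a fixed one is the graph of a linear map from $\mathbb{R}^n$ to the complementary $\mathbb{R}^N$, which takes $nN$ parameters to specify.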
We may now state two theorems of great importance in the theory of sphere
bundles:
Theorem 1.1.1. (Whitney imbedding theorem for sphere bundles, [51]). Every
sphere bundle whose spheres are of dimension $n-1$ is equivalent to the bundle induced
by mapping the base space $M$ into the manifold $H(n, N)$, provided $N \geq \dim(M) + 1$.
Theorem 1.1.2. (Steenrod, [49]). Two sphere bundles are equivalent iff the mappings
of $M$ into $H(n, N)$ are homotopic.
Consider now the cohomology ring $K(H(n, N))$ of $H(n, N)$ relative to some
coefficient ring $R$. Then the class of homotopic mappings of $M$ into $H(n, N)$, i.e. the
equivalence class of sphere bundles, induces a ring homomorphism of $K(H(n, N))$
into $K(M)$. The image $C(M, R)$ of this map in $K(M)$ is called the characteristic
ring of sphere bundles over $M$. So, in a sense, we have “pulled back” classes in
$H(n, N)$ to generate classes in $M$. Finally, a cohomology class of $C(M, R)$ is called
a characteristic cohomology class.
Of course, we may want to understand characteristic classes for other bundles
than just sphere bundles! In particular, we are interested in vector bundles, which
are the primary focus of this chapter. But these classes of bundles are more or less
the same, since an n-plane bundle can easily be mapped to an (n − 1)-sphere bundle
via a map; call it g. Since cohomology functors reverse direction, we then
have an induced map $g^* : C(M, R) \to \bar{C}(M, R)$, where $\bar{C}$ is the characteristic ring
of n-plane bundles over M with respect to the ring R.
1.2 Standard Classes and their Construction

1.2.1 The Stiefel Whitney Classes
The definition of Stiefel Whitney classes can be put on an axiomatic footing presum-
ing we know their existence. Proving their existence is actually the most difficult
part; once we know that they exist and what their properties are it is (relatively)
easy to do calculations. The following is directly from [38].
AXIOM 1. To each vector bundle χ there corresponds a sequence of cohomology
classes
$w_i(\chi) \in H^i(B(\chi); \mathbb{Z}/2), \quad i = 0, 1, 2, \ldots$
called the Stiefel-Whitney classes of χ. The class $w_0(\chi)$ is equal to the unit
element
$1 \in H^0(B(\chi); \mathbb{Z}/2)$ and $w_i(\chi)$ is zero for i > n if χ is an n-plane bundle
(an n-plane bundle here means a bundle whose fibres are n-dimensional vector
spaces).
In practice, χ will often be the tangent bundle of a manifold
M (i.e. B(χ) = M is the base space, with the fibre at each point being the tangent
space there). In this case we often refer to the above axiom/definition as defining the
Stiefel-Whitney classes of M.
AXIOM 2. (Naturality) If f : B(χ) → B(η) is covered by a bundle map from χ
to η, then
$w_i(\chi) = f^* w_i(\eta).$
A bundle map from χ to η is a continuous function g : E(χ) → E(η) which
carries each vector space $F_b(\chi)$ isomorphically onto one of the vector spaces $F_{b'}(\eta)$.
AXIOM 3. (The Whitney Product Theorem) If χ and η are vector bundles over
the same base space, then
$w_k(\chi \oplus \eta) = \sum_{i=0}^{k} w_i(\chi) \cup w_{k-i}(\eta)$
Here ∪ is the standard cup product operation in cohomology, and χ ⊕ η is the

Whitney sum of χ and η, which I shall now proceed to define.
First of all, it is necessary to define the notion of an induced bundle. Let χ be
a vector bundle with projection π : E → B and $B_1$ an arbitrary topological space.
Given any map $f : B_1 \to B$, define the induced bundle $f^*\chi$ over $B_1$ as follows. The
total space $E_1$ of $f^*\chi$ is the subset $E_1 \subset B_1 \times E$ consisting of all pairs (b, e) with
f(b) = π(e). The projection map $\pi_1 : E_1 \to B_1$ is defined by $\pi_1(b, e) = b$. Then, if
we define $\hat{f} : E_1 \to E$ by $\hat{f}(b, e) = e$, we have $\pi \circ \hat{f} = f \circ \pi_1$.
Now take two bundles $\chi_1, \chi_2$ over the same base space B. Let d : B → B × B
denote the diagonal embedding. Then the bundle $d^*(\chi_1 \times \chi_2)$ is referred to as the
Whitney sum of $\chi_1$ and $\chi_2$.
Finally, we have
AXIOM 4. For the line bundle $\gamma_1^1$ over the circle $P^1$, the Stiefel-Whitney class
$w_1(\gamma_1^1)$ is non-zero.
These four axioms completely characterise the Stiefel-Whitney classes.
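To see how the product axiom is used in calculations, here is a small sketch. It relies on a standard result that is not derived in this text (see Milnor and Stasheff): the total Stiefel-Whitney class of real projective space $P^n$ is $(1 + a)^{n+1}$, where a generates $H^1(P^n; \mathbb{Z}/2)$ and $a^{n+1} = 0$. The function names are illustrative assumptions.

```python
def whitney_product(w_chi, w_eta, top_degree):
    # Axiom 3 in the truncated polynomial ring Z/2[a]/(a^(top_degree+1)).
    # w_chi[i] and w_eta[i] are the mod 2 coefficients of a^i.
    out = [0] * (top_degree + 1)
    for i, ci in enumerate(w_chi):
        for j, cj in enumerate(w_eta):
            if i + j <= top_degree:
                out[i + j] = (out[i + j] + ci * cj) % 2
    return out


def total_class_projective_space(n):
    # Coefficients of (1 + a)^(n+1) truncated above degree n, for n >= 1.
    line = [1, 1] + [0] * (n - 1)   # total class 1 + a of the canonical line bundle
    total = [1] + [0] * n           # the unit element
    for _ in range(n + 1):
        total = whitney_product(total, line, n)
    return total


for n in (2, 3, 4):
    print(n, total_class_projective_space(n))
```

For n = 3 every class above $w_0$ vanishes, which is consistent with $P^3$ having a trivial tangent bundle.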
1.2.2 The Euler Class

Let $E_0$ be the set of all nonzero elements in the total space E of an oriented n-plane
bundle χ.
We have the following important theorem:
Theorem 1.2.1. The group $H^i(E, E_0)$ is zero for i < n, and $H^n(E, E_0)$ contains a
unique class u such that for each fibre $F = \pi^{-1}(b)$ the restriction
$u|(F, F_0) \in H^n(F, F_0)$
is the preferred generator of $H^n(F, F_0)$ determined by the orientation (with Z/2
coefficients, the unique non-zero class). Furthermore the correspondence
$x \mapsto x \cup u$ defines an isomorphism $H^k(E) \to H^{k+n}(E, E_0)$ for every k. (We call u the
fundamental cohomology class.)
In order to define the Euler class, we consider the inclusion $(E, \emptyset) \subset (E, E_0)$.
This gives rise to a restriction homomorphism $H^*(E, E_0; \mathbb{Z}) \to H^*(E; \mathbb{Z})$, denoted
by $y \mapsto y|E$. Applying this to the fundamental class $u \in H^n(E, E_0; \mathbb{Z})$ we obtain a
new class $u|E \in H^n(E; \mathbb{Z})$. But we have that $H^n(E; \mathbb{Z})$ is canonically isomorphic
to $H^n(B; \mathbb{Z})$. This isomorphism follows from a highly nontrivial theorem that I will
not prove.
The Euler class of an oriented n-plane bundle χ is then defined to be the
cohomology class $e(\chi) \in H^n(B; \mathbb{Z})$ which corresponds to u|E under the canonical
isomorphism $\pi^* : H^n(B; \mathbb{Z}) \to H^n(E; \mathbb{Z})$.
There is in fact a connection between the Euler class of T M and the Euler
characteristic of M , where M is a manifold and T M the corresponding tangent
bundle.
First we need to define the notion of Kronecker index.
For M a closed, possibly disconnected, smooth n-manifold, there is a unique
fundamental homology class
$\mu_M \in H_n(M; \mathbb{Z}/2)$
For any cohomology class $\nu \in H^n(M; \mathbb{Z}/2)$, we define the Kronecker index as
$\langle \nu, \mu_M \rangle = \nu(\mu_M) \in \mathbb{Z}/2.$
Theorem 1.2.2. If M is a smooth compact oriented manifold, then the Kronecker
index $\langle e(\tau_M), \mu_M \rangle$, using rational or integer coefficients, is equal to the Euler
characteristic χ(M). Similarly, for a non-oriented manifold, the Stiefel-Whitney
number $\langle w_n(\tau_M), \mu_M \rangle = w_n[M]$ is congruent to χ(M) modulo 2.
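The right-hand side of this theorem is easy to compute combinatorially. The following sketch counts vertices, edges and faces of two triangulated closed surfaces; the triangulations (the boundary of a tetrahedron for the sphere, and a 6-vertex triangulation of the projective plane) are supplied here for illustration and are not taken from the text.

```python
from itertools import combinations


def euler_characteristic(triangles):
    # V - E + F for a triangulated closed surface given by its list of triangles.
    faces = {frozenset(t) for t in triangles}
    edges = {frozenset(e) for t in faces for e in combinations(sorted(t), 2)}
    vertices = {v for t in faces for v in t}
    return len(vertices) - len(edges) + len(faces)


# Boundary of a tetrahedron: a triangulation of the sphere.
sphere = [(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4)]

# A 6-vertex triangulation of the real projective plane (every edge lies in two triangles).
projective_plane = [
    (1, 2, 4), (1, 2, 6), (1, 3, 4), (1, 3, 5), (1, 5, 6),
    (2, 3, 5), (2, 3, 6), (2, 4, 5), (3, 4, 6), (4, 5, 6),
]

for name, surface in [("sphere", sphere), ("projective plane", projective_plane)]:
    chi = euler_characteristic(surface)
    # By the theorem, chi mod 2 is the top Stiefel-Whitney number.
    print(name, chi, chi % 2)
```

Under the theorem, the parity of the result is the Stiefel-Whitney number $w_2[M]$: it vanishes for the sphere and is non-zero for the projective plane.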
1.2.3 Chern Classes

To construct Chern classes, one uses, instead of real vector bundles, complex vector
bundles. So let ω be a complex n-plane bundle. Construct a new canonical (n − 1)-plane
bundle $\omega_0$ over the deleted total space $E_0$. A point in $E_0$ is specified by a fibre
F of ω together with a non-zero vector v in that fibre. Suppose a Hermitian metric
has already been specified on ω. Then the fibre of $\omega_0$ over v is defined to be the
orthogonal complement of v in F. These vector spaces can be considered as fibres
of a new vector bundle $\omega_0$ over $E_0$.
Any real oriented 2n-plane bundle possesses an exact Gysin sequence
$\ldots \to H^{i-2n}(B) \xrightarrow{\cup e} H^i(B) \xrightarrow{\pi_0^*} H^i(E_0) \to H^{i-2n+1}(B) \to \ldots$
with integer coefficients. Here $\pi_0^*$ is induced by $\pi_0$, the restriction of the
projection map π of ω to $E_0$.
The groups $H^{i-2n}(B)$ and $H^{i-2n+1}(B)$ are zero for i < 2n − 1, from which it
follows that $\pi_0^* : H^i(B) \to H^i(E_0)$ is an isomorphism in that range.
Now define the Chern classes $c_i(\omega) \in H^{2i}(B; \mathbb{Z})$ by induction on the complex
dimension n of ω. Define the top Chern class $c_n(\omega)$ to coincide with the Euler class
$e(\omega_{\mathbb{R}})$. For i < n set
$c_i(\omega) = (\pi_0^*)^{-1} c_i(\omega_0)$
Suppose T is the total space of $\omega_0$. Then what we are really doing here is
computing the Chern classes of $\omega_0$, which lie in $H^{2i}(E_0; \mathbb{Z})$ ($E_0$ being the base space
of $\omega_0$), and then defining the Chern classes of ω by carrying these over to B via the
map $(\pi_0^*)^{-1}$. Since $\pi_0^* : H^{2i}(B) \to H^{2i}(E_0)$ is an isomorphism for i < n, this is
well defined. For i > n the class $c_i(\omega)$ is defined to be zero.
There is in fact a link between Chern classes and Stiefel-Whitney classes; Chern
classes are more or less the even Stiefel-Whitney classes of a bundle of even dimension,
without the reduction to Z/2 coefficients.
1.2.4 Pontrjagin Classes

I conclude my summary of (some of) the ideas in [MS] with a construction of the
Pontrjagin classes for a bundle. The i-th Pontrjagin class of a real n-plane bundle χ,
$p_i(\chi) \in H^{4i}(B; \mathbb{Z}),$
is defined to be the integral cohomology class $(-1)^i c_{2i}(\chi \otimes \mathbb{C})$.
Here χ ⊗ C is the so-called complexification of χ, where we take the tensor
product of each fibre V with C, the complex numbers.
1.3 Generalisations
A direction of further investigation might be to examine the world of characteristic
classes for general fibre bundles, in other words to drop the restriction that the
structure group be the general linear group. I am quite interested to see what the
analogues of the above classes might be in the general setting, if in fact there are
any.
1.4 Generalised Invariants and application to Exotic Differentiable Structures
1.4.1 Definition
The motivating question here is:
Question. Given a set with a particular topological structure, i.e. a particular
homeomorphism class, is there a systematic way of classifying all possible
differentiable structures that this set admits?
There are of course many topologies that are somewhat troublesome to work
with, and which may not even admit differentiable structures at all. So we make
the following assumption:
There is, about each point in our set X, an open set U and a function f such
that the map f : U → Rn is a homeomorphism. We call n the dimension of our
topology. If two such sets U and V overlap, we require the transition function from
U to V to be a homeomorphism.
We shall further assume that there is at least one way of placing a differentiable
structure on such a manifold (by taking finer charts in the original topology, and requiring
transition maps to be differentiable); this is not automatic in general.
If there is a map g : X → Rm such that g is a bilinear mapping and g(X) is a
differentiable submanifold of Rm , we say that X inherits the differentiable structure
from Rm from its embedding via g. There may well be more than one way to embed
X in Rm , of course, which is kind of the whole point.
So what we would like to do is produce a class of invariants that, for a given
differentiable manifold, will specify its diffeomorphism class uniquely, given of course
that we know all of the invariants. So consider homotopy classes of maps from a
space Y into our space X. For example, the homotopy groups πn (X) arise in this
way, for Yn = S n .
Now, let Y be an arbitrary differentiable submanifold of infinite dimensional
euclidean space. We can represent such spaces in general by associating a symmetric
bilinear (possibly degenerate) form g(x) to each point x of R∞ such that in a local
chart the induced functions gij (x) are smooth; in other words we are thinking of
them as (pseudo)-Riemannian submanifolds of our infinite dimensional space. Now,
since for any space $X^n$ there is an embedding $k : X^n \to \mathbb{R}^{2n+1}$, it seems reasonable
to restrict to pseudo-Riemannian submanifolds of $\mathbb{R}^{2n+1}$ and expect the following
result to hold:
Conjecture. Suppose we know, for all symmetric bilinear forms g on R2n+1 , the
homotopy groups G(g, 0, X) corresponding to the homotopy classes of continuous
maps f : (R2n+1 , g) → X. Then we know the homeomorphism class of X.
Now suppose we know the homeomorphism class of X, and want to know how
many possible diffeomorphism classes we have to choose from. This leads us to make
the following generalisation: instead of considering merely continuous maps f from
(R2n+1 , g) to X, require only that the maps be α-Hölder continuous, that is:
$\lim_{x \to x_0} \frac{|f(x) - f(x_0)|}{|x - x_0|^{\alpha}}$
is well defined, for every $x_0 \in \mathbb{R}^{2n+1}$.
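To get a feel for this condition, the quotient can be evaluated numerically. The sketch below is a one-dimensional illustration only, using the assumed test function f(x) = √|x| at $x_0 = 0$ (this example is not from the text): the quotient stays bounded for α = 1/2 and blows up for α = 1.

```python
import math


def holder_quotient(f, x0, x, alpha):
    # |f(x) - f(x0)| / |x - x0|^alpha for real numbers x != x0.
    return abs(f(x) - f(x0)) / abs(x - x0) ** alpha


def f(x):
    # Illustrative test function: continuous at 0 but not differentiable there.
    return math.sqrt(abs(x))


for h in (1e-2, 1e-4, 1e-6):
    # Second column: alpha = 1/2 (stays near 1). Third column: alpha = 1 (grows like h^(-1/2)).
    print(h, holder_quotient(f, 0.0, h, 0.5), holder_quotient(f, 0.0, h, 1.0))
```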

We let α vary between 0 and 1. Clearly α = 0 corresponds to continuous maps
and α = 1 corresponds to differentiable maps. We expect there to be greater variety
in the latter than the former, since in general we will have more obstructions to being
able to find a homotopy between disparate mappings if we restrict to differentiable
functions. This leads naturally to the following additional
Conjecture: Let 0 ≤ α ≤ β ≤ 1. Then
G(g, α, X) ⊂ G(g, β, X)
for every bilinear form g on R2n+1 , where G(g, a, X) is the homotopy group
corresponding to g, X, and a. In fact, we might expect something even stronger
than this, namely that the inclusion is that of a normal subgroup:
G(g, α, X) ⊴ G(g, β, X)
Corollary 1.4.1. It is possible to take the quotient of the space of our invariants
corresponding to differentiable maps (diffeomorphism classes) by the space of our
invariants corresponding to continuous maps (homeomorphism classes).
So for our example, after we have gone through this process, we will know
precisely how many ways there are to place differentiable structures on X given
that we know its homeomorphism class. The quotient space may often be a point,
in which case there will be only one way.
Of course, this may not tell us how to construct these diffeomorphism classes,
but it may be a good first step towards such a goal.
1.4.2 Geometric Interpretation of Generalised Invariants

There is a natural way to think of the groups G(g, 1, X).
Claim: Suppose $R_{ij}(g, f)$ is the induced Ricci curvature of the image of $(\mathbb{R}^{2n+1}, g)$
in X via the differentiable map f. Let $f_* g$ be the push forward of g. Take the limit
of the Ricci Flow in X,
$\frac{\partial}{\partial t} f_* g(t) = -\mathrm{Ric}(f_* g(t)), \qquad \mathrm{Ric}(f_* g(0)) = \mathrm{Ric}(g, f)$
Then the limits of the Ricci flow under this construction, counted with multiplic-
ity ie as varifolds, correspond to the elements of the group G(g, 1, X), i.e. the limit of
the flow above corresponds to the homotopy class of the map f from N = (R2n+1 , g)
to X. If a flow may be perturbed from one limit to another, ie if we modify the
image via a diffeomorphism and restart the flow we get a different limit, then we
identify the two limits.
In other words, what we are doing, if we consider the Ricci flow as $\sim_1$, and the
perturbation identification as $\sim_2$, is claiming that
$G(g, 1, X) \cong (\{\text{all maps from } N \text{ to } X\} / \sim_1) / \sim_2$
Intuitively one should think of the fundamental group and the torus, for instance,
and consider the action of this flow as causing the image f (S 1 ) to pinch around
topological obstructions in the limit.
This is slightly messy; evidently we might want to remove the latter consideration
about perturbation identification. So instead, first apply the volume-normalised Ricci
flow to the manifold X with the marking f(N) within the ambient space $\mathbb{R}^{2n+1}$; then flow
f(N) within the ambient space X. Then the worst thing that can happen is for the
Ricci flow limits to degenerate; for instance, consider the smooth torus. Then any
circle around its waist is a valid limit of the Ricci flow of f (S 1 ). So we get a one
dimensional family of solutions which smoothly vary one into the other. This leads
us to the following
Claim: If we perform this modified procedure, then the Ricci flow limits will
correspond to the elements of the group G(g, 1, X), with the implicit assumption
that we identify Ricci flow limits that can be deformed smoothly from one to the
other.
In fact we might expect:
Claim: Degeneracy of limits will correspond to an underlying symmetry of the
ambient space X. (For example, for the torus this is circular symmetry).
Chapter 2
Pseudo-Riemannian Geometry
2.1 Overview
Out of general interest, I acquired a very good book on Pseudo (or Semi) Riemannian
Geometry [41] and made a study of it towards the middle of 2005. The essential
difference between pseudo and standard Riemannian geometry, as is explained quite
quickly, is the relaxing of the condition that the metric g on the tangent bundle
to a manifold be a symmetric positive definite bilinear form to being merely a
nondegenerate symmetric bilinear form with fixed index (dimension of the space
of negative eigenvalues of g). One recalls that nondegeneracy of a bilinear form
means that its matrix with respect to any given basis is invertible.
Many of the results that apply to Riemannian manifolds follow through for
semi-Riemannian manifolds- though by no means all. One also gets an interesting
interplay between timelike, spacelike, and lightlike vectors (vectors where g(v, v) is
negative, positive, or zero respectively). The 0 vector is defined to be spacelike.
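The trichotomy is simple to compute in the basic index-one example. The sketch below assumes the flat metric on $\mathbb{R}^4$ with signature (−, +, +, +); this sign convention and the function names are choices made here for illustration.

```python
def minkowski(v, w):
    # Index-one inner product on R^4: -v0*w0 + v1*w1 + v2*w2 + v3*w3 (assumed convention).
    return -v[0] * w[0] + sum(a * b for a, b in zip(v[1:], w[1:]))


def causal_character(v):
    # Classify v by the sign of g(v, v).
    if not any(v):
        return "spacelike"  # convention from the text: the zero vector is spacelike
    q = minkowski(v, v)
    if q < 0:
        return "timelike"
    if q > 0:
        return "spacelike"
    return "lightlike"


for v in [(1, 0, 0, 0), (1, 1, 0, 0), (0, 1, 0, 0), (0, 0, 0, 0)]:
    print(v, causal_character(v))
```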
A particularly important case of a semi-riemannian manifold is the case where
the manifold has a metric with index one. These are referred to as Lorentz manifolds,
and are the core ingredient in the physical model known as general relativity.
Riemannian geometry can be generalised to what I shall call Riemann-Cartan
geometry. Observe that if instead g is an antisymmetric bilinear form, then
the fundamental theorem of Riemannian geometry still goes through, and we get a
unique connection, known as the Cartan connection, corresponding to that metric g.
We may then conclude, by linearity of the proof of this statement for both symmetric
and antisymmetric bilinear forms, that we may drop the assumption of any form of
symmetry, and consider instead a merely nondegenerate form g. Then, by splitting

it into its symmetric and antisymmetric components, we establish the existence
of unique connections corresponding to these parts; by linearity we establish the
existence of a unique connection corresponding to g.
Everything else then carries through for Riemann-Cartan geometry. The curvature
tensor still remains a symmetric tensor, since it is induced by the second
geometric derivative. We still have geodesics, parallel transport, and injectivity
radii; the entire language remains the same.
2.2 The fundamental theorem of Riemannian geometry
2.2.1 The metric tensor
In standard euclidean geometry on $\mathbb{R}^n$, in particular in vector calculus, we have the notion
of the "dot product" $\cdot : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$, which is a generalisation in some sense of the
angle between two vectors. In particular, $a \cdot b := \|a\| \|b\| \cos(\theta(a, b)) = \sum_i a_i b_i$. This
is in fact an example of an "inner product" on $\mathbb{R}^n$. There are others. For instance
we might define $a \bar{\cdot} b = 2 a_1 b_1 + a_2 b_2 + a_3 b_3 + \ldots + a_n b_n$. For this new inner product,
we have an "induced angle" $\bar{\theta}(a, b)$, such that $a \bar{\cdot} b = \|a\| \|b\| \cos(\bar{\theta}(a, b))$, where now
$\|a\|^2 = a \bar{\cdot} a$. So in a certain sense the geometry of our space, which is described with
distances and angles, can be made to depend wholly on the choice of inner product.
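A small numerical illustration of the induced angle, for two assumed sample vectors in the plane: the same pair of vectors subtends 45 degrees under the dot product and a smaller angle under the weighted inner product defined above.

```python
import math


def angle(a, b, weights):
    # Angle between a and b for the inner product sum_i weights[i] * a[i] * b[i].
    def inner(u, v):
        return sum(w * x * y for w, x, y in zip(weights, u, v))
    return math.acos(inner(a, b) / math.sqrt(inner(a, a) * inner(b, b)))


a, b = (1.0, 0.0), (1.0, 1.0)          # sample vectors, chosen for illustration
standard = angle(a, b, (1.0, 1.0))     # ordinary dot product
weighted = angle(a, b, (2.0, 1.0))     # 2*a1*b1 + a2*b2
print(math.degrees(standard), math.degrees(weighted))
```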
Now consider a differentiable manifold M. We generalise the idea of inner products
on $\mathbb{R}^n$ to inner products on TM. In particular, we define a Riemannian metric
on M to be a symmetric positive definite bilinear form g : TM × TM → R such that
$g_{ij} := g(\partial/\partial x^i, \partial/\partial x^j)$ are smooth functions for every chart $(x^i)$ on every element
U of an atlas for M.
2.2.2 The euclidean covariant derivative

In euclidean space, we have the idea of a vector field. A vector field is a map
$V : \mathbb{R}^n \to \mathbb{R}^n$ which assigns to each point a vector in a smoothly varying fashion.
We may take derivatives of vector fields in a standard sort of way. In particular, in
much the same way that one can take the directional derivative of
a function f, one can take the directional derivative of a vector field.
Recall that $\nabla_v f$, the directional derivative of f in direction v, is $\nabla f \cdot v$. We
define in a similar sense directional derivatives for a vector field X; write
$X = (X_1, \ldots, X_n)$ for functions $X_i$. Then $\nabla_v X := (\nabla_v X_1, \ldots, \nabla_v X_n)$.
This leads us to the notion of the covariant derivative for euclidean space. Let
v = W(x) now be the value at x of a vector field W. Observe first of all that
$W(f) := \nabla_W f = \nabla f \cdot W$ will no longer be constant but variable with position.
Then we define $\nabla_W X$ to be the vector field $(\nabla_W X_1, \ldots, \nabla_W X_n)$.
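The componentwise definition can be checked numerically. The following sketch approximates $\nabla_W X$ at a point by central differences; the fields X = (x², xy) and W = (y, 1) are assumed examples chosen here, not taken from the text.

```python
def covariant_derivative(X, W, p, h=1e-5):
    # Approximate (nabla_W X)(p) in R^n: differentiate each component of X along W(p).
    w = W(p)
    forward = X([pi + h * wi for pi, wi in zip(p, w)])
    backward = X([pi - h * wi for pi, wi in zip(p, w)])
    return [(f - b) / (2 * h) for f, b in zip(forward, backward)]


def X(p):
    # Example field X = (x^2, x*y).
    return [p[0] ** 2, p[0] * p[1]]


def W(p):
    # Example field W = (y, 1).
    return [p[1], 1.0]


# By hand: nabla_W X = (2*x*y, y^2 + x), which is (4, 5) at the point (1, 2).
print(covariant_derivative(X, W, [1.0, 2.0]))
```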
2.2.3 Axiomatic extension to arbitrary metrics

In much the same way we extended the idea of inner product from euclidean space
to Riemannian metrics on differentiable manifolds, we may axiomatically extend the
idea of covariant derivative ∇ from euclidean space to Riemannian manifolds.
First of all, observe that the Euclidean covariant derivative has the following
properties:
(i) $\nabla_{fX + gY} Z = f \nabla_X Z + g \nabla_Y Z$
(ii) $\nabla_X (Y + Z) = \nabla_X Y + \nabla_X Z$
(iii) $\nabla_X (fY) = f \nabla_X Y + X(f) Y$
In addition, we have that it is compatible with the Euclidean metric, i.e. if we
write $D = \exp^{-1} \circ d$ where $\exp : \mathbb{R}^n \to \mathbb{R}^n$ is the identity for euclidean space, then
$\frac{d}{dt} g(V, W) = g\left(\frac{DV}{dt}, W\right) + g\left(V, \frac{DW}{dt}\right)$
Finally we have that it is symmetric, that is
$\nabla_X Y - \nabla_Y X = [X, Y]$
Then, it is simple to generalise from here. If an operator ∇ satisfies the
first three conditions it is said to be an affine connection. If in addition it satisfies
the fourth and fifth, where now g is to be a Riemannian metric, we will call it a
Riemannian, or Levi-Civita, connection associated to that metric. In fact, it turns
out that each metric determines exactly one such connection, which I
will establish shortly.
2.2.4 Uniqueness of the Levi-Civita connection

In the previous section I described how one can abstract the idea of a connection to
a Riemannian manifold starting from the properties of the standard connection on
Euclidean space. I will now prove that, up to choice of metric, such connections are
unique.
Theorem 2.2.1. (”The fundamental theorem of Riemannian geometry”). Given a
Riemannian manifold M , there exists a unique Riemannian connection associated
to it.
Proof. If we assume existence, then certainly
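$X g(Y, Z) = g(\nabla_X Y, Z) + g(Y, \nabla_X Z)$
$Y g(Z, X) = g(\nabla_Y Z, X) + g(Z, \nabla_Y X)$
$Z g(X, Y) = g(\nabla_Z X, Y) + g(X, \nabla_Z Y)$
Adding the first two of these statements together then subtracting the third, and
using the symmetry of the connection to replace differences such as
$\nabla_X Z - \nabla_Z X$ by brackets, gives
$X g(Y, Z) + Y g(Z, X) - Z g(X, Y) = g([X, Z], Y) + g([Y, Z], X) + g([X, Y], Z) + 2 g(Z, \nabla_Y X)$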
which is an expression that can easily be rearranged to give ∇ uniquely in terms

of the metric. To demonstrate existence, define ∇ to satisfy the above expression.
Then it is easy to show that this definition satisfies the axioms.
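The rearrangement mentioned in the proof can be written out explicitly. As a sketch: solving the identity above for the connection term gives the first formula below, and, assuming coordinate vector fields $\partial_i$ (whose brackets vanish) together with the convention $\nabla_{\partial_i} \partial_j = \Gamma^k_{ij} \partial_k$, it specialises to the second. The coordinate notation $\partial_i$, $g_{ij}$, $g^{kl}$ is introduced here for illustration.

```latex
\begin{align*}
2\, g(\nabla_Y X, Z) &= X g(Y, Z) + Y g(Z, X) - Z g(X, Y) \\
  &\quad - g([X, Z], Y) - g([Y, Z], X) - g([X, Y], Z), \\
\Gamma^k_{ij} &= \tfrac{1}{2}\, g^{kl} \left( \partial_i g_{jl} + \partial_j g_{il} - \partial_l g_{ij} \right).
\end{align*}
```

Since g is nondegenerate and Z is arbitrary, the first line determines $\nabla_Y X$ completely, which is the uniqueness claim.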
Remark. For Riemann-Cartan manifolds, the Levi-Civita uniqueness theorem still
holds, but with the following minor modification. Instead of considering Xg(Y, Z) −
Y g(X, Z) + Zg(X, Y ), it is necessary to consider the sum (Xg(Y, Z) − Xg(Z, Y )) −
(Y g(X, Z) − Y g(Z, X)) + (Zg(X, Y ) − Zg(Y, X)), due to the lack of symmetry. One
should then get an expression wherein all terms in the connection save one can be
made to cancel, and uniqueness follows.
2.3 Geodesics and the geodesic equation

2.3.1 Definition and variational formulation
A geodesic in differential geometry is defined to be a locally shortest path between two
points on a Riemannian manifold. In this sense it is a generalisation of the notion
of straight line in Euclidean space. In particular, we are interested in measuring
the length of a path.
In euclidean space, the distance from a to b along a path $\gamma : [0, 1] \to \mathbb{R}^n$
will be $\int_0^1 \|\gamma'(t)\| \, dt$, the integral of the speed along the parametrised
curve.
Similarly, in Riemannian geometry, we define the length to be
$L(\gamma) = \int_0^1 g(\gamma'(t), \gamma'(t))^{1/2} \, dt$
where now one takes the norm with respect to the inner product g on the manifold
in question.
In order for the path to be a shortest path, we require that the first variation of
the length vanish:
$\delta L(\gamma) = 0$
which, for curves parametrised proportionally to arc length, is equivalent to requiring that
$\nabla_{\gamma'} \gamma' = 0$
This above expression is known as the geodesic equation. It can be written in
coordinates as
$\frac{d^2 x^k}{dt^2} + \Gamma^k_{ij} \frac{dx^i}{dt} \frac{dx^j}{dt} = 0$
where $\Gamma^k_{ij} := (\nabla_{X_i} X_j)^k$ are the Christoffel symbols associated to the metric g.
A couple of examples of geodesics for instance are straight lines in Euclidean
space or great circles on the sphere.
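The great circle example can be checked by integrating the geodesic equation numerically. The sketch below assumes spherical coordinates (θ, φ) on the unit sphere, for which the non-zero Christoffel symbols are $\Gamma^\theta_{\phi\phi} = -\sin\theta\cos\theta$ and $\Gamma^\phi_{\theta\phi} = \cot\theta$ (standard values, not derived in the text), and uses a classical Runge-Kutta step. The initial data are arbitrary choices.

```python
import math


def rhs(s):
    # Geodesic equation on the unit sphere as a first-order system in (theta, phi).
    theta, phi, dtheta, dphi = s
    ddtheta = math.sin(theta) * math.cos(theta) * dphi ** 2
    ddphi = -2.0 * math.cos(theta) / math.sin(theta) * dtheta * dphi
    return (dtheta, dphi, ddtheta, ddphi)


def rk4_step(s, dt):
    # One classical fourth-order Runge-Kutta step.
    k1 = rhs(s)
    k2 = rhs(tuple(x + 0.5 * dt * k for x, k in zip(s, k1)))
    k3 = rhs(tuple(x + 0.5 * dt * k for x, k in zip(s, k2)))
    k4 = rhs(tuple(x + dt * k for x, k in zip(s, k3)))
    return tuple(x + dt / 6.0 * (a + 2 * b + 2 * c + d)
                 for x, a, b, c, d in zip(s, k1, k2, k3, k4))


def embed(theta, phi):
    # Position in R^3 of the point with spherical coordinates (theta, phi).
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))


# Start on the equator at (1, 0, 0) with velocity (0, 1, -0.5) in R^3.
state = (math.pi / 2, 0.0, 0.5, 1.0)
normal = (0.0, 0.5, 1.0)  # normal to the plane spanned by start point and velocity
max_offset = 0.0
for _ in range(5000):
    state = rk4_step(state, 1e-3)
    p = embed(state[0], state[1])
    # Distance-like measure of how far the curve leaves the plane through the origin.
    max_offset = max(max_offset, abs(sum(n * x for n, x in zip(normal, p))))

speed_sq = state[2] ** 2 + math.sin(state[0]) ** 2 * state[3] ** 2
print(max_offset, speed_sq)
```

The curve stays in a plane through the origin, so it traces a great circle, and the squared speed $g(\gamma', \gamma')$ stays at its initial value 1.25 up to integration error.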
2.3.2 The cut and conjugate loci

The cut locus of a point p in a space M is defined to be the set of points C(p)
such that their preimage is the boundary of the critical ball about 0 in Tp M . The
conjugate locus of the same point p is the set of points C̄(p) such that the exponential
map fails to be a diffeomorphism on their preimage in Tp M .
2.4 Curvature
The curvature is essentially the first correction term, or first obstruction to a
Riemannian metric being locally flat. In other words, provided $\|x\| = o(\delta)$,
$g_{ij}(x_\delta) = \delta_{ij}(0) + \delta^2 R_{ijkl}(0) x_\delta^k x_\delta^l + o(\delta^2)$
It can be thought of alternatively as the ”geometric acceleration”. For instance a

sphere has positive curvature, whereas a saddle has negative curvature. Similarly, it
is possible, given bounds on curvature, to deduce how quickly geodesics will converge
or diverge.
2.4.1 The curvature tensor and its symmetries

It turns out that the curvature tensor can be more precisely described as
$R_{ijkl} := g(R(X_i, X_j) X_k, X_l)$
where
$R(U, V) := \nabla_U \nabla_V - \nabla_V \nabla_U - \nabla_{[U, V]}$
It can be checked that it has the properties of a tensor, i.e. it is linear in each argument.
Furthermore, it can be shown to have the following symmetries:
(i) R(u, v) = −R(v, u)
(ii) g(R(u, v)w, z) = −g(R(u, v)z, w)
(iii) R(u, v)w + R(v, w)u + R(w, u)v = 0
(iv) $\nabla_u R(v, w) + \nabla_v R(w, u) + \nabla_w R(u, v) = 0$
The Ricci curvature is defined to be $R_{il} := R_{ijkl} g^{jk}$, the contraction of the
Riemann curvature with respect to the second and third indices.
The scalar curvature is the contraction of the Ricci curvature, $R := R_{il} g^{il}$.
2.4.2 Interpretation of the Ricci and Scalar curvature

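Consider a geodesic γ. Define a metric tube $\gamma_\epsilon$ about γ as
$\gamma_\epsilon = \{x \mid d(x, \gamma) \le \epsilon\}$
Then
$\mathrm{Vol}(\gamma_\epsilon) = \epsilon^{n-1} \|S^{n-2}\| L[\gamma] - c(n) \epsilon^{n+1} \int_\gamma R(\gamma', \gamma') \, ds$
for $\epsilon \ll 1$. This gives us a geometrical interpretation of the Ricci curvature.
Similarly,
$\mathrm{Vol}(B_r(x)) = r^n \|\mathrm{Vol}\, B_1^{\mathbb{R}^n}\| - c(n) r^{n+2} R(x)$
for $r \ll 1$, giving us a geometrical idea of what the scalar curvature does.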
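The ball formula can be tested on the unit 2-sphere, where everything is explicit. The sketch below assumes two standard facts not stated in the text: a geodesic disc of radius r on the unit sphere has area 2π(1 − cos r), and the unit sphere has scalar curvature R = 2. Matching the expansion then forces c(2) = π/24.

```python
import math


def disc_area_unit_sphere(r):
    # Exact area of a geodesic disc of radius r on the unit 2-sphere (assumed standard formula).
    return 2.0 * math.pi * (1.0 - math.cos(r))


def disc_area_expansion(r, scalar_curvature=2.0):
    # Flat area pi*r^2 minus the leading curvature correction, with c(2) = pi/24.
    return math.pi * r ** 2 - (math.pi / 24.0) * r ** 4 * scalar_curvature


for r in (0.4, 0.2, 0.1):
    exact = disc_area_unit_sphere(r)
    approx = disc_area_expansion(r)
    # The remainder shrinks like r^6 as r decreases.
    print(r, exact, approx, exact - approx)
```

Positive scalar curvature makes small balls smaller than their flat counterparts, as the sign of the correction term indicates.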
2.4.3 Some basic comparison results

As I mentioned before, curvature bounds, and knowledge of the curvature in general,
can be used to conclude various things about the topology of the space. For
instance, the celebrated Gauss-Bonnet formula states that the integral of the Gaussian
curvature (which is half the scalar curvature) of a closed surface is 2π times the Euler
characteristic of that surface. So
we are extracting purely topological information out from what we put in. Higher
dimensional analogues of this result are possible, but I shall neither sketch nor prove
them here.
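A discrete analogue of this formula is easy to verify by computer. This is a swapped-in technique rather than the smooth theorem itself: for a closed polyhedral surface the curvature is concentrated at the vertices as angle defects, and their sum is again 2π times the Euler characteristic (Descartes' theorem). The octahedron below is an assumed example.

```python
import math


def angle_at(p, q, r):
    # Interior angle at vertex p of the triangle with vertices p, q, r.
    u = [a - b for a, b in zip(q, p)]
    v = [a - b for a, b in zip(r, p)]
    dot = sum(a * b for a, b in zip(u, v))
    return math.acos(dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v)))


def total_angle_defect(vertices, triangles):
    # Sum over vertices of (2*pi minus the sum of the face angles meeting there).
    angle_sum = [0.0] * len(vertices)
    for i, j, k in triangles:
        angle_sum[i] += angle_at(vertices[i], vertices[j], vertices[k])
        angle_sum[j] += angle_at(vertices[j], vertices[k], vertices[i])
        angle_sum[k] += angle_at(vertices[k], vertices[i], vertices[j])
    return sum(2.0 * math.pi - s for s in angle_sum)


# Regular octahedron: 6 vertices, 12 edges, 8 faces, so the Euler characteristic is 2.
octa_vertices = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
octa_triangles = [(0, 2, 4), (2, 1, 4), (1, 3, 4), (3, 0, 4),
                  (2, 0, 5), (1, 2, 5), (3, 1, 5), (0, 3, 5)]

print(total_angle_defect(octa_vertices, octa_triangles), 2 * math.pi * 2)
```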
Another result of importance is the Toponogov comparison theorem, which es-
sentially states that if one takes geodesic triangles with two sides of fixed length and
angle in two different spaces, and the first has smaller curvature than the second,
then the second will have a shorter third side. An immediate consequence of this is
that geodesics will converge more rapidly in spaces of higher curvature.
Knowledge of curvature also allows one to bound certain things like the injectivity
radius of a space from above. The injectivity radius at a point p is the supremum of
the numbers r such that the exponential map is injective on the ball of radius r in $T_p M$;
the injectivity radius of a manifold is the infimum over all such points.
2.5 Stokes’ Theorem

In this section I will give two different types of generalisations of Stokes’ theorem,
which will prove relevant once I have finished with the preliminaries and begin to
develop the information geometry in chapter 7. The first of these I developed a
number of years ago, back in 2002-3. I did not derive a careful proof of this first
result however until late 2007.
Theorem 2.5.1. Let $N^{k+1}$, $M^{n+1}$ be differential manifolds. Let $U^{k+1}(m) \subset N$,
$V^{n+1} \subset M$ be compact subsets of N, M, where $U^{k+1}(m)$ is a family of subsets
of N specified by m ∈ V such that the function $m \mapsto U^{k+1}(m)$ is smooth. Let
$\omega \in \Omega^k(N) \otimes \Omega^n(M)$. Let $d_N$ be the exterior derivative or de Rham operator on
Ω(N); let $d_M$ be likewise w.r.t. Ω(M). Then we have
$\int_V \int_{U(m),\, m \in V} d_N d_M \omega = \int_{\partial V} \int_{\partial U(m),\, m \in V} \omega \qquad (2.1)$
Remark. The nontrivial nature of this result is that ω is allowed to have the form
$f(x, y) \, dx_1 \ldots dx_k \, dy_1 \ldots dy_n$
in local coordinates, though it is possible that in all the examples I have been
working with the function f(x, y) could be expressed in a power series expansion
$\sum_{i=0}^{\infty} \sum_{j=0}^{\infty} a_i(x) b_j(y)$. But it is still not obvious how to prove the above because we
are dealing with a smooth family of sets in N specified by a single set in M.
Remark. The general idea of this result, or at least a generalisation, will prove useful
later when I am looking into mathematical turbulence in chapter 9.
Proof. Without loss of generality, assume that ω = f(x, y) dx dy in local coordinates
as in the remark above. Make the further assumption that f and all its derivatives
vanish at infinity. Then we may split f as a doubly infinite sum of a set of suitable
eigenfunctions as in the remark above. The result then follows by applying the
standard form of Stokes’ theorem twice for each term in the sum, and then bringing
everything back together.
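The simplest instance of the statement can be checked numerically. The sketch below makes strong simplifying assumptions that are mine, not the text's: k = n = 0, both U and V are the unit interval (so U does not depend on m), and ω is the function f(x, y) = sin(x) exp(y). Then $d_N d_M \omega$ is the mixed partial derivative times dx dy, and the double boundary consists of the four corners of the square with signs.

```python
import math


def f(x, y):
    # Illustrative choice of omega as a function on the unit square.
    return math.sin(x) * math.exp(y)


def mixed_partial(x, y):
    # d^2 f / dx dy for f(x, y) = sin(x) * exp(y).
    return math.cos(x) * math.exp(y)


def interior_integral(n=200):
    # Midpoint rule for the integral of the mixed partial over [0,1] x [0,1].
    h = 1.0 / n
    return sum(mixed_partial((i + 0.5) * h, (j + 0.5) * h)
               for i in range(n) for j in range(n)) * h * h


def boundary_sum():
    # f summed over the corner points of the square, signed by orientation.
    return f(1, 1) - f(0, 1) - f(1, 0) + f(0, 0)


print(interior_integral(), boundary_sum())
```

This is just the fundamental theorem of calculus applied once in each variable, which is the pattern the proof above applies term by term.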
The second type of generalisation of Stokes’ theorem is as follows:
Theorem 2.5.2. Let M be an n-manifold. Let A be diffeomorphic to the space
of inner products on $\mathbb{R}^n$. Consider a family of differential operators $d_{(M;a)}$ on M
indexed smoothly by A. Let ω(m, a) be a family of smooth (n − 1)-forms on M
indexed smoothly by A. Then we have that
$\int_M \int_A d_{(M;a)} \omega(m, a) = \int_{\partial M} \int_A \omega(m, a)$
Remark. We can in fact view A as a distribution of sorts over each point of M ,

so in fact this result has a statistical flavour. Additionally, we are taking only
one derivative, and not two. It will turn out that this sort of result is core to
the development of the theory of information for statistical manifolds, which is the
subject matter of chapters 6 through 8.
Proof. A bit of care is required here since we are dealing with statistical derivatives.
First observe that we may write ω(m, a) = F(m, a) dm da, and
$d_{(M;a)}(m) = \exp_a(m) \circ d_M \circ \int_{b \in A} \exp_b^{-1}(m)$,
where we have a 1-1 correspondence between a ∈ A
and inner products at m ∈ M.
Then
$\int_M \int_A d_{(M;a)}(m) \omega(m, a) = \int_M \int_A \int_A \exp_a d_M (\exp_b^{-1}(m) F(m, a)) \, db \, da \, dm$
By Stokes’ theorem in the standard case this reduces to
$\int_{\partial M} \int_A \int_A \exp_a \exp_b^{-1} F(m, a) \, db \, da \, dm$
since $d_M \exp_a(m) = 0$ via conservation of probabilistic flux.
This finally can be simplified, after observing that since $d_{(M;a)}$ is a statistical
derivative we must have
$\int_A \exp_a \exp_b^{-1} \, db = \mathrm{Id}_M$
to
$\int_{\partial M} \int_A F(m, a) \, da \, dm$
Remark. So essentially the theorem more or less works, but we have to make sure
that our family of differential operators is chosen in such a way to make ”physical
sense”. In other words, the probabilistic flux must be conserved, and also we need
to have that our distribution is normalised in probabilistic weight to one. Otherwise
we run into trouble. The appropriate tools to tackle these issues will be developed
in some greater detail and along slightly different lines in chapter 6.
2.6 Kähler geometry and Convexity Theory

2.6.1 The notion of convexity in determining stability
There is a well known theorem in minimal surface theory that gives conditions under
which a surface is in fact area minimising.
Theorem 2.6.1. (folklore). If one has a minimal graph over a compact, convex
domain, then it is in fact a length minimising graph, where by length I mean length
in the general n-dimensional sense.
This leads one to ask the natural questions:
If one has a statistical structure in a suitable sense, what are the conditions on
a substructure for it to be a volume minimising solution?
Is it possible to have a weaker condition than convexity on a domain for it to
be necessarily minimising?
It turns out that the theory of calibrations is uniquely suited to answering the
second of these two questions. As for the first, that will have to wait until later.
I give here the main results and ideas about calibrations without proof. The main
reference here is the paper of Harvey and Lawson on the matter [27], though I mainly
made use of a review article by Ivanov [29].
Definition 1. Let φ be a differential form of degree k on $\mathbb{R}^n$. Suppose that dφ = 0
and φ(ζ) ≤ vol(ζ) for any oriented k-plane ζ, where vol is the volume form in $\mathbb{R}^n$.
Then we will say that φ is an n-Euclidean calibration form.
The utility of such forms is that they can be used to prove the existence of stable
globally minimising submanifolds:
Theorem 2.6.2. If $X^k$ is a k-surface, and φ an n-Euclidean calibration form such
that $\phi_X$ (the restriction of φ to X) is the volume form on X, then X is globally
minimal in $\mathbb{R}^n$.
Of course, the notion can be extended to any smooth Riemannian manifold $M^n$.

In this case, we just say that φ is a calibration form. The theorem goes through for
this generalisation as well.
2.6.2 Kähler manifolds

I shall now proceed to define carefully the idea of a Kähler manifold and develop the
theory of these objects to the point where I am able to justify the theorem above.
In doing this I will follow the treatment given by Lawson in [L2] very closely.
Definition 2. An n-complex manifold is a manifold M that is locally diffeomorphic
to $\mathbb{C}^n$, and which has biholomorphic transition functions (so that the differentials
of these functions are everywhere complex linear). In particular, this implies that
there is a map $J_p : T_pM \to T_pM$ such that
$$J_p^2 = -1 \qquad (2.2)$$
So if we have local coordinates $(z^1, ..., z^n) = (x^1 + iy^1, ..., x^n + iy^n)$ where the $z^i$
are complex and the $x^i, y^i$ are real then $J(\partial/\partial x^i) = \partial/\partial y^i$ and $J(\partial/\partial y^i) = -\partial/\partial x^i$.
We also have that
$$T_{X,Y} = [JX, JY] - J[JX, Y] - J[X, JY] - [X, Y] = 0 \qquad (2.3)$$
as can be easily checked. There is a deep theorem of Newlander and Nirenberg
that states that any $C^\infty$ manifold that admits a tensor field J of type (1, 1) satisfying
the above equations can be made into a complex manifold with an appropriate atlas
of charts.
We would of course like to address the problem of what a natural choice of
metric on a complex manifold should be. When all is said and done, we require the
following restrictions on the choice of metric:
$$g(JX, JY) = g(X, Y) \qquad (2.4)$$
for all p ∈ M and all $X, Y \in T_pM$. In other words we want J to be an isometry, so
the metric g must be Hermitian.
$$(\nabla_X J)(Y) = \nabla_X(JY) - J(\nabla_X Y) = 0 \qquad (2.5)$$
for all tangent vector fields X and Y on M, where ∇ is the Riemannian connection.
This is equivalent to requiring that J be parallel with respect to the Riemannian
connection.
Definition 3. If a complex manifold M satisfies these two equations, we say that
M is a Kähler manifold.
It is useful to extend a hermitian metric g on a complex manifold M to a complex
valued, "sesquilinear" form h defined in the following way:
$$h(X, Y) = g(X, Y) + i\omega(X, Y)$$
where
$$\omega(X, Y) = g(X, JY)$$
for all $X, Y \in T_pM$.
The condition of being Kählerian imposes a strong restriction on ω:
2.6.3 Relation to minimal surface theory

In minimal surface theory there is a well known result.
Theorem 2.6.3. All minimal graphs $f : \mathbb{R}^{2n} \to \mathbb{R}^k$ over domains of even dimension
can be lifted to minimal graphs $\hat{f} : \mathbb{C}^n \to \mathbb{C}^k$.
More generally, we have the following result:
Theorem 2.6.4. If the ambient space M is Kähler, then all complex submanifolds U
with (possibly empty) boundary B are solutions of the Plateau problem with boundary
B.
This is reason enough to be relatively excited about using Kähler geometry.
Evidently we would like to use this somehow in relation to ideas of generalised
length.
However, note that this theorem is limited in that it does not deal with the
Plateau problem for submanifolds of odd real dimension in a Kähler manifold. This
concern cannot actually be resolved, and is in fact where the hope that somehow
Kähler manifolds are natural objects to look at hits a stumbling block. The truth
of the matter is that one is adding a certain amount of structure and hence getting
strong results for one’s efforts; but the assumptions on structure cannot be extended
to completely general spaces.
I will now proceed to prove the second theorem.
Lemma 2.6.5. If M is a complex manifold with a hermitian metric g, then g is
Kählerian iff dω = 0.
Proof. Fix a point p ∈ M. Let $X_1, X_2, X_3$ be tangent vectors at p and extend them
to local fields in M such that $(\nabla_{X_i} X_j)_p = 0$ for i, j = 1, 2, 3 (this can be achieved
by parallel transporting these vectors along the geodesics that they generate). So
then $[X_i, X_j]_p = 0$ and hence
$$d\omega_p(X_1, X_2, X_3) = X_1\,\omega(X_2, X_3) - X_2\,\omega(X_1, X_3) + X_3\,\omega(X_1, X_2).$$
Then since ω(X, Y) = g(X, JY) we have that
$$d\omega_p(X_1, X_2, X_3) = g(X_2, (\nabla_{X_1} J)X_3) - g(X_1, (\nabla_{X_2} J)X_3) + g(X_1, (\nabla_{X_3} J)X_2)$$
Then if g is Kählerian, $(\nabla_X J)Y = 0$ for all $X, Y \in T_pM$ so this expression is
zero.
To prove the converse, observe first that $(\nabla_X J)J = -J(\nabla_X J)$ as $J^2 = -1$, and
$\nabla_{JX} = J\nabla_X$ since the integrability tensor is 0. Then
$$d\omega_p(X_1, JX_2, JX_3) = g(JX_2, -J(\nabla_{X_1} J)X_3) - g(X_1, -(\nabla_{X_2} J)X_3) - g(X_1, -(\nabla_{X_3} J)X_2)$$
So $d\omega(X_1, X_2, X_3) - d\omega(X_1, JX_2, JX_3) = 2g(X_2, (\nabla_{X_1} J)X_3)$, from which it
easily follows that the condition that dω = 0 implies the Kählerian condition
$(\nabla_X J)Y = 0$ for all X, Y.
We now proceed to define another natural notion.
Definition 4. Suppose M is a complex manifold. Then by a complex submanifold
we mean an immersion θ : K → M of a complex manifold K such that the
representations of θ in complex coordinates are holomorphic.
Lemma 2.6.6. Every complex submanifold of a Kähler manifold is Kählerian in

the induced metric.
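Proof. We need only prove things locally, so consider a small embedded complex
submanifold K ⊂ M. Write $\bar\nabla$ for the Riemannian connection of M, ∇ for the
induced connection on K, and $(\cdot)^T$ for orthogonal projection onto TK. Since K is a
complex submanifold, J preserves TK and so commutes with this projection. Then if
X, Y ∈ TK, we have
$$\nabla_X(JY) = (\bar\nabla_X JY)^T = (J\bar\nabla_X Y)^T = J(\bar\nabla_X Y)^T = J\nabla_X Y$$
So K is Kählerian.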
Theorem 2.6.7. Every complex submanifold of a Kähler manifold is minimal.
Lemma 2.6.8. (Wirtinger’s Inequality). Suppose M is a Kähler manifold and let
K ⊂ M be any 2m dimensional oriented real submanifold. Suppose $dV_p$ is the volume
form of the induced metric on K at p. Then the restriction of $\omega^m = \omega \wedge ... \wedge \omega$, the
mth power of the Kähler form of M, to $T_pK$ satisfies $\omega^m/m! \le dV_p$, with equality iff
$T_pK$ is a complex subspace of $T_pM$.
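As a small numerical illustration of Wirtinger's inequality (a sketch added for orientation, not part of the original argument), one can take the flat model $\mathbb{C}^2 = \mathbb{R}^4$ with m = 1. The coordinate ordering (x1, y1, x2, y2), the helper names and the three sample planes below are assumptions made for the example. Absolute values are compared so that the check does not depend on the orientation convention hidden in ω(X, Y) = g(X, JY).

```python
import math

def J(v):
    # Complex structure on R^4 in coordinates (x1, y1, x2, y2):
    # J(d/dx) = d/dy and J(d/dy) = -d/dx, so that J^2 = -1.
    x1, y1, x2, y2 = v
    return (-y1, x1, -y2, x2)

def g(u, v):
    # Euclidean metric; it satisfies g(Ju, Jv) = g(u, v).
    return sum(a * b for a, b in zip(u, v))

def omega(u, v):
    # Kaehler form in the convention of the text: omega(X, Y) = g(X, JY).
    return g(u, J(v))

def area(u, v):
    # Area of the parallelogram spanned by u and v.
    return math.sqrt(max(g(u, u) * g(v, v) - g(u, v) ** 2, 0.0))

e = (1.0, 0.0, 0.0, 0.0)
complex_plane = (e, J(e))                 # a complex line: span{e, Je}
real_plane = (e, (0.0, 0.0, 1.0, 0.0))    # span{d/dx1, d/dx2}, not complex
mixed_plane = (e, (0.0, 1.0, 1.0, 0.0))   # tilted plane, not complex

for name, (u, v) in [("complex", complex_plane),
                     ("real", real_plane),
                     ("mixed", mixed_plane)]:
    print(name, abs(omega(u, v)), area(u, v))
```

Only the complex line attains equality between |ω| and the area; the other two planes satisfy the strict inequality.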
Proof. (of Theorem). Let M be any Kähler manifold and θ : K → M any complex
submanifold of real dimension 2m, with K compact and possibly with non-empty
boundary. Since dω = 0, the form $\omega^m/m!$ is closed, and by Wirtinger's inequality it
restricts to the volume form precisely on complex submanifolds, so it is a calibration
form. Then the volume of K in the induced metric is less than or equal to the volume
of any other 2m-dimensional submanifold homologous to K in M. In other words,
K is the solution of a Plateau problem in M.
2.6.4 Further thoughts

One natural question to ask is:
Can we go the other way, i.e., in certain cases, can we find for any solution
of a Plateau problem a complex submanifold corresponding to it? Also, can the
solutions of Plateau problems in odd real dimensions in Kähler manifolds always be
lifted to corresponding solutions of higher dimensional Plateau problems in complex
coordinates? (Obviously lifts may not be unique. The key question here is whether
such a lift will always exist.)
One would hope that the answer to these questions would be yes, as this would
be tremendously compelling support for the use of Kählerian geometry to describe
physical situations. But in general the answer is certainly no. For instance, simply
take the boundary of a Plateau problem in such a way that it cannot possibly be
matched to a complex submanifold of the ambient Kähler space.
However, it may be possible for very special Kähler manifolds to find a corre-
spondence. For instance, considering Kähler manifolds of complex dimension two
with index one. Then the sort of result we would like to prove would be:
Conjecture. Let M be a Kähler manifold of complex dimension two. Then all
submanifolds of index one (i.e. causally convex submanifolds) must necessarily be
complex submanifolds. Hence all causally convex solutions to the Plateau problem
in real dimension 2, 3 or 4 will be complex submanifolds.
The reason we might expect this to be true is because there actually is a corre-
spondence between minimal graphs and complex subdomains of the complex plane,
so it seems reasonable that we might be able to get the same result with a locally
$\mathbb{C}^2$ manifold if we impose some sort of causality requirement.
In particular, we might expect a subset of solutions to certain physical systems
of even dimension to be Kähler, since they might be realisable as solutions to the
square roots of the relevant differential operator.
Chapter 3
A Survey of Geometric Measure Theory
Geometric Measure Theory is an extraordinarily rich subject, initiated by Federer

(see his famous unreadable treatise, [15]), and still very much being developed by
such people as Leon Simon and Frank Morgan. My main references here are Simon’s
book [47], Morgan’s introductory text [35], and notes that I took from a series of
lectures given by Marty Ross in the first half of 2006.
3.1 Introduction and Motivation

Geometric measure theory is essentially an extension of differential geometry de-
signed to deal with possible convergence problems for variational problems. For
instance, one might be interested in finding a family of surfaces converging to say a
minimal surface defined by some boundary in a higher dimensional domain. This is
known as the Plateau Problem. The essential nature of the game here is:
(i) Take some sort of geometric variational problem (like the Plateau problem)
(ii) Convergence may not be well defined for the usual class of "nice" manifolds.
So in order to get nice convergence, extend solutions to the variational problem to
a class containing the "nice" manifolds but also containing uglier beasts. (To gain
something you must first lose something). The uglier beasts would be rectifiable
sets, rather than differentiable manifolds, for instance (the nature and construction of
these will be explained shortly).
(iii) Identify conditions on the problem (ambient space, boundary conditions)

that are sufficient to imply regularity of the solution to the variational problem, i.e.
prove that the solution is not an ugly beast, but is one of the nice guys. Results
that assist with this process are things like the Allard regularity theorem and its
cousins.
3.1.1 The Plateau Problem as the Prototypical Example

The Plateau Problem is actually the problem that historically the technology of
geometric measure theory was developed for. The statement of the problem, in its
simplest, ungeneralised form, is quite simple:
Given a closed boundary curve in euclidean three space, what is the surface of
minimal area spanning this curve?
The main advantage of considering this problem is that it can be visualised very
easily: it is clear, given some particular loop of curve, what will be bad candidate
surfaces, for instance, and what will be good ones. It is also clear that we should be
able to "improve" upon an initial approximation to the best surface by deforming
to an ideal, "minimal" surface smoothly. There are problems with this approach,
however, which is the whole point of geometric measure theory, but let us first
assume that we can do this.
A good first step towards determining whether a surface is minimal is to find the
necessary criterion for it to be stationary in area- for it is obvious that if one were to
perturb such a surface very slightly in any way, such that we get a family of surfaces,
the first derivative of the area of such a family evaluated at the ideal surface should
be zero. This approach leads one to what is known as the minimal surface equation.
I should now proceed to demonstrate how the minimal surface equation is derived
according to standard variational techniques, for the case in which a surface can be
realised as a graph.
If the surface can be realised as a graph, then there is some $f : D \subset \mathbb{R}^n \to \mathbb{R}$
such that the area A(f) is given by the expression
$$A(f) = \int_D \sqrt{1 + (\nabla f)^2}\, dA$$
Let $g_t = f + t\eta$ be an arbitrary variation of f, i.e. such that $\eta|_{\partial D} = 0$. Then if f
is a minimal graph we must have that
$$\frac{dA(g_t)}{dt}\Big|_{t=0} = 0$$
(This is a necessary condition for a minimiser).
So since $A(t) = \int_D \sqrt{1 + (\nabla f + t\nabla\eta)^2}\, dA$,
$$A'(t) = \frac{d}{dt}\int_D \sqrt{1 + (\nabla f + t\nabla\eta)^2}\, dA$$
By a result from measure theory, we may exchange the integral sign and the
derivative and get
$$A'(t) = \int_D \frac{\partial}{\partial t}\sqrt{1 + (\nabla f + t\nabla\eta)^2}\, dA
= \int_D \frac{\tfrac{1}{2}\cdot 2(\nabla f + t\nabla\eta)\cdot\nabla\eta}{\sqrt{1 + (\nabla f + t\nabla\eta)^2}}\, dA \qquad (3.1)$$
Then for any η,
$$0 = A'(0) = \int_D \frac{\nabla f\cdot\nabla\eta}{\sqrt{1 + (\nabla f)^2}}\, dA$$
Integrating by parts we get that
$$\left[\eta\,\frac{\nabla f\cdot n}{\sqrt{1 + (\nabla f)^2}}\right]\Big|_{\partial D} - \int_D \eta\,\nabla\cdot\left(\frac{\nabla f}{\sqrt{1 + (\nabla f)^2}}\right) dA = 0$$
The first term vanishes since $\eta|_{\partial D} = 0$. Since η is otherwise arbitrary, we then
conclude by the fundamental lemma of the calculus of variations that
$$\nabla\cdot\left(\frac{\nabla f}{\sqrt{1 + (\nabla f)^2}}\right) = 0$$
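The minimal surface equation just derived can be checked numerically on a concrete graph. The following is a minimal sketch, assuming n = 2 and using Scherk's surface f(x, y) = log cos y − log cos x as the test case (an example chosen here, not one taken from the text). The divergence is approximated by nested central differences, so the residual is only expected to vanish up to discretisation error; a paraboloid is included as a graph that is not minimal.

```python
import math

def scherk(x, y):
    # Scherk's surface, a classical minimal graph on |x|, |y| < pi/2.
    return math.log(math.cos(y)) - math.log(math.cos(x))

def paraboloid(x, y):
    # A graph that is NOT minimal, for comparison.
    return x * x + y * y

def flux(f, x, y, h):
    # The vector field grad f / sqrt(1 + |grad f|^2), via central differences.
    fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    w = math.sqrt(1.0 + fx * fx + fy * fy)
    return fx / w, fy / w

def mse_residual(f, x, y, h=1e-3):
    # Central-difference approximation of div(grad f / sqrt(1 + |grad f|^2)).
    dpx = (flux(f, x + h, y, h)[0] - flux(f, x - h, y, h)[0]) / (2 * h)
    dqy = (flux(f, x, y + h, h)[1] - flux(f, x, y - h, h)[1]) / (2 * h)
    return dpx + dqy

print("Scherk residual:    ", mse_residual(scherk, 0.3, 0.2))
print("paraboloid residual:", mse_residual(paraboloid, 0.3, 0.2))
```

The Scherk residual should be zero up to the finite-difference error, while the paraboloid residual is of order one.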
For a proof of the fundamental lemma of the calculus of variations, suppose
$h : [x_0, x_1] \to \mathbb{R}$ is continuous and suppose $\int_{x_0}^{x_1} h\eta = 0$ for every $C^1$ function
$\eta : [x_0, x_1] \to \mathbb{R}$ with $\eta(x_0) = 0 = \eta(x_1)$. Then I claim h = 0 (if we merely assume
$\int h\eta = 0$ for every measurable η, we may only conclude h = 0 a.e.).
The proof is by contradiction. For suppose $h(a) \neq 0$. Without loss of generality,
suppose in particular that h(a) > 0. But then by continuity h is positive on a small
interval around a, so it is possible to find a non-negative $C^1$ function η supported in
that interval such that $\int_{x_0}^{x_1} h\eta > 0$, which is impossible.
3.1.2 Possible problems with the standard variational approach
As mentioned before, there are a couple of technical issues that might compromise
the above naive approach. I will describe them here.
The main problem is the issue of convergence. For it is possible to find a family
of surfaces that gets arbitrarily close to the minimal area, but which becomes
pathological in the limit; for instance, one may get space filling curves, which are
very horrible indeed!
Another problem that may occur might be for instance the possibility that in
perturbing an initial smooth surface to try to find an ideal one, one may produce
self intersections in a new surface, and hence it will no longer be a manifold.
3.1.3 Geometric Measure Theory comes to the rescue
Geometric measure theory deals with these issues by slightly enlarging the family
of things that one is working with, to surfaces that are smooth almost everywhere,
and that have integer multiplicity. In particular, one introduces the concept of
rectifiability. It turns out that the space filling component of the pathological family
of surfaces previously described is unrectifiable; whereas the "good bit" is rectifiable.
There is a structure theorem about the objects one works with that in fact shows
that there is a unique decomposition into these two separate pieces. Finally, if one
has "nice" conditions, such as a smooth embedded boundary curve and a smooth
ambient space, we may conclude, using a result known as the Allard Regularity
theorem, that the rectifiable bit is actually the smooth surface we are looking for.
Moreover, we are able to conclude that it satisfies the minimal surface equation, and
so our "naive" analysis is justified.
I should remark that the Plateau Problem is not the only problem that is
amenable to these methods. Any variational problem can be treated in an anal-
ogous manner, and benefits to an equal degree from the tools of this theory. It is
merely that the Plateau Problem is easy to understand in a visual manner, and so
makes a very good example. Since we may in general be considering much broader
variational problems, it makes sense to extend from merely considering integer mul-
tiplicities and consider any value for the multiplicity at any point on the surface;
this is where the idea of density comes into its own.
3.2 Sobolev Spaces

Before moving on to the particulars of geometric measure theory, it is necessary
first to understand some concepts and definitions from analysis. In particular it is
necessary to understand the so called Sobolev spaces $W^{k,p}(\Omega)$ and $W^{k,p}_{loc}(\Omega)$.
I shall follow [26] in my treatment of the motivation for these objects.
3.2.1 Preliminary Results

Definition. A Hilbert space is a complex vector space H together with an operation (.,.):
H × H → C, satisfying the following relations:
(i) $(x, y) = \overline{(y, x)}$ for all x, y ∈ H,
(ii) (ax + by, z) = a(x, z) + b(y, z) for all a, b ∈ C, x, y, z ∈ H, and
(iii) (x, x) ≥ 0 for all x ∈ H, with equality iff x = 0.
We call this operation an inner product on H. For H to be a Hilbert space we
require that it must be complete with respect to the norm $\|x\| = (x, x)^{1/2}$ induced
by this inner product.
Let H be a Hilbert space, and x, y ∈ H. Then we have the following inequalities:
The Schwarz Inequality.
$$|(x, y)| \le \|x\|\|y\| \qquad (3.2)$$
The Triangle inequality.
$$\|x + y\| \le \|x\| + \|y\| \qquad (3.3)$$
Parallelogram law.
$$\|x + y\|^2 + \|x - y\|^2 = 2(\|x\|^2 + \|y\|^2) \qquad (3.4)$$
We also have:
Theorem 3.2.1. (Pythagoras’ Theorem). If (x, y) = 0 (i.e. x ⊥ y), then
$\|x + y\|^2 = \|x\|^2 + \|y\|^2$.
Proof. Very easy.
Now I shall prove the Schwarz inequality.
Let $y \neq 0$ first (the case y = 0 is trivial). Then, choose α ∈ C such that x − αy ⊥ y.
So (x − αy, y) = 0. Hence $(x, y) - \alpha\|y\|^2 = 0$, or $\alpha = \frac{(x, y)}{\|y\|^2}$. By Pythagoras’
theorem for x − αy and αy,
$$\|x\|^2 = \|(x - \alpha y) + \alpha y\|^2
= \|x - \alpha y\|^2 + \|\alpha y\|^2
\ge |\alpha|^2\|y\|^2 = \frac{|(x, y)|^2}{\|y\|^4}\|y\|^2
= \frac{|(x, y)|^2}{\|y\|^2} \qquad (3.5)$$
From which the Schwarz inequality can easily be extracted. The triangle
inequality follows easily from the Schwarz inequality.
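The projection step in this proof is easy to try out in a finite dimensional Hilbert space. The sketch below works in $\mathbb{C}^3$ with the inner product taken linear in the first slot; the two sample vectors and the helper names are arbitrary choices made for illustration.

```python
def inner(x, y):
    # Inner product on C^n, linear in the first argument.
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm(x):
    # inner(x, x) is real and non-negative, so take its real part.
    return inner(x, x).real ** 0.5

x = [1 + 2j, -1j, 3 + 0j]
y = [2 - 1j, 1 + 1j, 0.5j]

alpha = inner(x, y) / norm(y) ** 2            # chosen so that x - alpha*y is orthogonal to y
r = [a - alpha * b for a, b in zip(x, y)]     # r = x - alpha*y
alpha_y = [alpha * b for b in y]

print("(x - alpha y, y)       =", inner(r, y))                 # ~ 0
print("|x|^2                  =", norm(x) ** 2)
print("|r|^2 + |alpha y|^2    =", norm(r) ** 2 + norm(alpha_y) ** 2)
print("|(x,y)| and |x||y|     =", abs(inner(x, y)), norm(x) * norm(y))
```

The second and third printed values agree (Pythagoras), and dropping the non-negative term $\|x - \alpha y\|^2$ is exactly what yields the Schwarz inequality.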
The following theorem is a very important one, and gets used throughout anal-
ysis. We shall need it in what is to follow.
Theorem 3.2.2. (The Riesz Representation Theorem). For every bounded linear
functional F on a Hilbert space H, there is a uniquely determined element f ∈ H
such that F (x) = (x, f ) for all x ∈ H and kF k = kf k.
Proof. Let N = {x | F(x) = 0} be the null space of F. If N = H, the result follows
trivially by taking f = 0. Otherwise, since N is a closed subspace of H, there exists
an element $z \neq 0$ in H such that (x, z) = 0 for all x ∈ N. Then $F(z) \neq 0$ and
moreover for any x ∈ H,
$$F\left(x - \frac{F(x)}{F(z)}z\right) = F(x) - \frac{F(x)}{F(z)}F(z) = 0$$
so $x - \frac{F(x)}{F(z)}z \in N$. Hence by our choice of z
$$\left(x - \frac{F(x)}{F(z)}z,\, z\right) = 0$$
or in other words
$$(x, z) = \frac{F(x)}{F(z)}\|z\|^2$$
Hence F(x) = (x, f) where $f = \overline{F(z)}\,z/\|z\|^2$, showing existence. To prove
uniqueness suppose F(x) = (x, g) for all x ∈ H. Then 0 = F(x) − F(x) = (x, f − g)
for every x ∈ H. In particular $(f - g, f - g) = \|f - g\|^2 = 0$. But this is true iff
f − g = 0 by inner product axiom (iii). So f = g.
To see that $\|F\| = \|f\|$, we have, by the Schwarz inequality,
$$\|F\| = \sup_{x \neq 0}\frac{|(x, f)|}{\|x\|} \le \sup_{x \neq 0}\frac{\|x\|\|f\|}{\|x\|} = \|f\|,$$
and also that
$$\|f\|^2 = (f, f) = F(f) \le \|F\|\|f\|$$
proving equality of the norms.
The $L^p(\Omega)$ and $L^p_{loc}(\Omega)$ spaces consist of p-integrable functions over Ω and locally
p-integrable functions over Ω respectively.
For ψ to be p-integrable means that $\int_\Omega |\psi|^p$ is well defined and is finite.
For ψ to be locally p-integrable means that for any K ⊂⊂ Ω, $\int_K |\psi|^p$ is well
defined and is finite.
The $L^p(\Omega)$ norm is defined on p-integrable functions on Ω by
$$\|u\|_{p;\Omega} = \left(\int_\Omega |u|^p\right)^{1/p} \qquad (3.6)$$
A multi index α is a string $(\alpha_1, ..., \alpha_n)$ of non-negative integers, and is used as
shorthand for multi-variable differentiation, e.g.
$D^\alpha f = \left(\frac{\partial^{\alpha_1}}{\partial x_1^{\alpha_1}}\right) \cdots \left(\frac{\partial^{\alpha_n}}{\partial x_n^{\alpha_n}}\right) f$.
3.2.2 Motivation
Let Ω be a closed domain, and $\phi \in C_0^1(\Omega)$ (once continuously differentiable functions
on Ω with compact support). By the divergence theorem a $C^2(\Omega)$ solution of ∆u = f satisfies
$$\int_\Omega Du \cdot D\phi\, dx = -\int_\Omega f\phi\, dx \qquad (3.7)$$
since the above is equivalent to $\int_\Omega D_i(\phi D_i u)\, dx = 0$, which, by the divergence
theorem, is equivalent to the statement $\int_{\partial\Omega} \phi\, Du \cdot v\, ds = 0$ where v is a unit normal
to ∂Ω. Since Ω is closed, this statement makes sense.
Now the bilinear form
$$(u, \phi) = \int_\Omega Du \cdot D\phi\, dx \qquad (3.8)$$
is an inner product on $C_0^1(\Omega)$. Hence we can complete $C_0^1(\Omega)$ with respect to the
metric induced by this equation to get a Hilbert space, which we notate as $W_0^{1,2}(\Omega)$.
This is an example of a so called Sobolev space.
The linear functional F defined by $F(\phi) = -\int_\Omega f\phi\, dx$ may be extended, through
appropriate choice of f, to a bounded linear functional on $W_0^{1,2}(\Omega)$. Then by the
Riesz representation theorem there exists an element $u \in W_0^{1,2}(\Omega)$ such that (u, φ) =
F(φ) for all $\phi \in C_0^1(\Omega)$. Hence a generalised solution to the Dirichlet problem,
∆u = f, u = 0 on ∂Ω, has been found. So we have reduced the classical problem to
a question of whether this solution we have found in $W_0^{1,2}(\Omega)$ is in fact in $C_0^1(\Omega)$, in
other words, a regularity problem.
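The weak formulation (3.7) can be checked by hand in one dimension. The sketch below assumes Ω = [0, 1], the classical solution u(x) = sin(πx) of u'' = f with f(x) = −π² sin(πx), and the test function φ(x) = x(1 − x), which vanishes on ∂Ω; all of these are illustrative choices, and the integrals are approximated by a simple midpoint rule.

```python
import math

def midpoint(fn, a, b, n=10000):
    # Composite midpoint rule for the integral of fn over [a, b].
    h = (b - a) / n
    return h * sum(fn(a + (i + 0.5) * h) for i in range(n))

du = lambda x: math.pi * math.cos(math.pi * x)           # u'(x) for u = sin(pi x)
f = lambda x: -math.pi ** 2 * math.sin(math.pi * x)      # f = u''
phi = lambda x: x * (1.0 - x)                            # test function, zero at 0 and 1
dphi = lambda x: 1.0 - 2.0 * x

lhs = midpoint(lambda x: du(x) * dphi(x), 0.0, 1.0)      # integral of Du . Dphi
rhs = -midpoint(lambda x: f(x) * phi(x), 0.0, 1.0)       # minus integral of f phi

print(lhs, rhs, 4 / math.pi)
```

Both sides evaluate to 4/π, which can also be confirmed by integrating by parts directly.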
3.2.3 The Sobolev Spaces $W^{k,p}(\Omega)$, $W_0^{k,p}(\Omega)$ and $W^{k,p}_{loc}(\Omega)$
Let $u \in L^1_{loc}(\Omega)$. Given a multi-index α, $v \in L^1_{loc}(\Omega)$ is called the αth weak derivative
of u if
$$\int_\Omega \phi v\, dx = (-1)^{|\alpha|}\int_\Omega u D^\alpha\phi\, dx \qquad (3.9)$$
for all $\phi \in C_0^\infty(\Omega)$. We write $v = D^\alpha u$, and observe that this object is defined
up to sets of Lebesgue measure zero.
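To make the definition concrete, one can test it on u(x) = |x|, which is not differentiable at 0 but has weak derivative v(x) = sign(x). The sketch below is an illustration only: the interval Ω = (−0.7, 1.3), the off-centre bump function φ and the midpoint quadrature are all assumptions made for the example. The integrals are split at x = 0 so that each piece has a smooth integrand.

```python
import math

def phi(x, c=0.3):
    # Smooth bump supported in (c - 1, c + 1); a standard test function.
    t = x - c
    if abs(t) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - t * t))

def dphi(x, c=0.3):
    # Exact derivative of the bump function above.
    t = x - c
    if abs(t) >= 1.0:
        return 0.0
    return phi(x, c) * (-2.0 * t) / (1.0 - t * t) ** 2

def midpoint(fn, a, b, n):
    # Composite midpoint rule for the integral of fn over [a, b].
    h = (b - a) / n
    return h * sum(fn(a + (i + 0.5) * h) for i in range(n))

u = abs                                   # u(x) = |x|
v = lambda x: 1.0 if x > 0 else -1.0      # candidate weak derivative sign(x)

def integral(fn):
    # Integrate over the support of phi, split at the kink x = 0.
    return midpoint(fn, -0.7, 0.0, 7000) + midpoint(fn, 0.0, 1.3, 13000)

lhs = integral(lambda x: phi(x) * v(x))        # integral of phi * v
rhs = -integral(lambda x: u(x) * dphi(x))      # (-1)^1 * integral of u * phi'

print(lhs, rhs)
```

The two numbers agree to quadrature accuracy, as (3.9) with |α| = 1 requires.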
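We then define the Sobolev space
$$W^{k,p}(\Omega) = L^p(\Omega) \cap \{u : D^\alpha u \in L^p(\Omega), |\alpha| \le k\} \qquad (3.10)$$
We equip $W^{k,p}(\Omega)$ with the norm
$$\|u\|_{k,p;\Omega} = \left(\int_\Omega \sum_{|\alpha| \le k} |D^\alpha u|^p\, dx\right)^{1/p} \qquad (3.11)$$
which is easily seen to be equivalent to
$$\sum_{|\alpha| \le k} \|D^\alpha u\|_{p;\Omega} \qquad (3.12)$$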
We define the local Sobolev spaces in the obvious manner:
$$W^{k,p}_{loc}(\Omega) = L^p_{loc}(\Omega) \cap \{u : D^\alpha u \in L^p_{loc}(\Omega), |\alpha| \le k\} \qquad (3.13)$$
Finally, we define the spaces $W_0^{k,p}(\Omega) = W^{k,p}(\Omega) \cap L_0^p(\Omega)$ where $L_0^p$ is the space
of p-integrable functions on Ω with compact support. In other words, $W_0^{k,p}$ is the
set of (k, p)-Sobolev functions with compact support.
3.2.4 Approximation of Sobolev functions by smooth functions
3.2.5 The Sobolev Inequality
I follow [26] in this section.
We have the following results regarding comparability of different norms. The
first result relates the $L^p$ norms of a Sobolev function and its gradient. The second
relates the $L^{p^*}$ norm of a Sobolev function with its (k, p)-Sobolev norm. Here
$p^* = np/(n - kp)$.
Theorem 3.2.3. Let $\Omega \subset \mathbb{R}^n$, n > 1 be an open domain. There is a constant
C = C(n, p) such that if n > p, p ≥ 1, and $u \in W_0^{1,p}(\Omega)$ then
$$\|u\|_{np/(n-p);\Omega} \le C\|Du\|_{p;\Omega}$$
Furthermore, if p > n and Ω is bounded, then u ∈ C(Ω) and
$$\sup_\Omega |u| \le C|\Omega|^{1/n - 1/p}\|Du\|_{p;\Omega}$$
Theorem 3.2.4. Let $\Omega \subset \mathbb{R}^n$ be an open set. There is a constant C = C(n, k, p)
such that if kp < n, p ≥ 1, and $u \in W_0^{k,p}(\Omega)$, then
$$\|u\|_{p^*;\Omega} \le C\|u\|_{k,p;\Omega}.$$
If kp > n, then u ∈ C(Ω) and
$$\sup_\Omega |u| \le C|K|^{1/p'}\left\{\sum_{|\alpha|=0}^{k-1} (\mathrm{diam}\, K)^{|\alpha|}\frac{1}{\alpha!}\|D^\alpha u\|_{p;K}
+ (\mathrm{diam}\, K)^k \frac{1}{(k-1)!}\left(k - \frac{n}{p}\right)^{-1}\|D^k u\|_{p;K}\right\}$$
where K = spt u and C = C(k, p, n).
3.3 Further preliminaries from analysis

3.3.1 Lebesgue Measure
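m-dimensional Lebesgue Measure $L^m$ is defined on $\mathbb{R}^m$ for sets A as
$$L^m(A) = \inf\left\{\sum_{j=1}^\infty v(P_j) : A \subset \bigcup_{j=1}^\infty P_j\right\}$$
the infimum being taken over countable covers of A by boxes $P_j$ of volume $v(P_j)$.
It is what is known as an "outer measure", that is,
$L^m : P(\mathbb{R}^m) \to [0, \infty]$ where $P(\mathbb{R}^m)$ is the power set of $\mathbb{R}^m$, and satisfies the
following axioms:
(i) $L^m(\emptyset) = 0$
(ii) A ⊂ B implies $L^m(A) \le L^m(B)$
(iii) $L^m(\bigcup_{j=1}^\infty A_j) \le \sum_{j=1}^\infty L^m(A_j)$
and it also satisfies the property that $L^m(\text{box}) = \mathrm{Vol}(\text{box})$.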

Lebesgue measure is a very important measure, because we have that all Borel
sets of $\mathbb{R}^m$ are $L^m$ measurable. This follows from Caratheodory’s Criterion, which
states that if one has an (outer) measure µ on a metric space X, then if µ(A ∪ B) = µ(A) +
µ(B) whenever d(A, B) > 0 then µ is a Borel measure, i.e. all Borel sets in X are
µ-measurable.
I remind the reader that a Borel set is an element of the Borel σ-algebra
corresponding to a particular topology, which is the intersection of all σ-algebras
containing the closed sets in that topology. A σ-algebra is a set of sets, which is closed
under complements, countable intersections and countable unions, and which contains
the empty set and the whole space. This is related to the concept of measurability,
and hence to integration, since the set of all µ-measurable sets in a metric space X is
a σ-algebra (a µ-measurable set A ⊂ X satisfies µ(B) = µ(B ∩ A) + µ(B − A) for any B).
Proof. (of the Caratheodory Criterion). See J. Koliha’s notes [34].
Remark. Lebesgue measure is Borel regular, that is, for any A, $L^m(A) = L^m(B)$
for some Borel B ⊃ A. To see this, let $B_j = \bigcup_k P_{jk}$ be a union of boxes covering A
with $L^m(B_j) \le L^m(A) + \frac{1}{j}$. Set $B = \bigcap_j B_j$.
Remark. Lm is Radon, that is, any A can be approximated by closed sets on the
inside, and open sets on the outside. Equivalently, this means that Lm is Borel
regular and Lm (A) < ∞ for A bounded.
3.3.2 Hausdorff Measure

n-dimensional Hausdorff measure, $H^n$, on $\mathbb{R}^m$, n ≥ 0 is also a map from the power
set of $\mathbb{R}^m$ to [0, ∞], and also satisfies the axioms (i), (ii) and (iii) of an outer measure.
Furthermore, $H^n(M)$ = n-volume of M by any classical method.
We define
$$H^n_\delta(A) = \inf\left\{\sum_{j=1}^\infty \omega_n\left(\frac{\mathrm{diam}\,\bar{B}_j}{2}\right)^n : \mathrm{diam}\, B_j \le \delta,\ A \subset \bigcup_{j=1}^\infty B_j\right\},$$
where $\omega_n$ is the n-volume of the unit n-ball. n-dimensional Hausdorff measure is then
defined as
$$H^n(A) = \lim_{\delta \to 0^+} H^n_\delta(A)$$
Note that $H^n_\delta$ is an (outer) measure, and, furthermore, that $H^n$ is a Borel regular
measure. To see the latter, let A, B be two sets with d = d(A, B) > 0. Take δ < d.
Then $H^n_\delta(A \cup B) = H^n_\delta(A) + H^n_\delta(B)$. So $H^n$ is Borel. To see that it is in fact Borel
regular, take $\delta_j = \frac{1}{j}$ and a family of covers $\{T_{jk}\}$ such that
$H^n_{1/j}(\bigcup_k \bar{T}_{jk}) = H^n(A) + \frac{1}{j}$.
However, $H^n$ is not Radon if n < m. For example, $H^1$(unit 2-disc) = ∞.
Other interesting facts include that
(i) $H^n$ is invariant under isometries, that is, $H^n(\rho(A)) = H^n(A)$ if $\rho : \mathbb{R}^m \to \mathbb{R}^m$ is an isometry.
(ii) $H^0$ is the counting measure ($\omega_0 = 1$).
Note. $H^n$ makes sense in any metric space X, and for any n ≥ 0, not necessarily
a natural number.
Remark. Hausdorff measure may otherwise seem quite arbitrary, but the fact that
it is invariant under rigid motions of $\mathbb{R}^m$ means that oddly rotated sets can be
measured quite easily using it, whereas they would be frightful to compute using
Lebesgue measure. In fact, it turns out that we may compute the Lebesgue measure
of horridly rotated sets using Hausdorff measure, due to the following extremely useful
and deep theorem:
Theorem 3.3.1. $H^n = L^n$ on $\mathbb{R}^n$.
Proof. See Morgan [35].
Finally, it is sometimes useful to have some idea of the dimension of a set,

particularly if it behaves in a slightly pathological way. Hausdorff measure proves
to be quite useful in understanding at least this level of behaviour of quite unruly
beasts indeed.
Definition 5. (Hausdorff Dimension). The Hausdorff dimension of a set A is defined
to be the supremum of numbers s such that $H^s(A) = \infty$, or equivalently, the infimum
of numbers s such that $H^s(A) = 0$.
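The way $H^s$ jumps from ∞ to 0 at the dimension can be seen on the middle-thirds Cantor set, an example not discussed in the text. The sketch below evaluates the covering sums from the definition of $H^s_\delta$ for the natural stage-k covers by $2^k$ intervals of diameter $3^{-k}$. Since these are particular covers, the sums only bound $H^s_\delta$ from above; that the critical exponent log 2 / log 3 really is the Hausdorff dimension needs a separate lower bound argument which is not attempted here. The extension of $\omega_s$ to non-integer s through the Gamma function is the usual convention, assumed here.

```python
import math

def omega(s):
    # Volume of the unit ball, extended to real s >= 0 via the Gamma function.
    return math.pi ** (s / 2) / math.gamma(s / 2 + 1)

def cantor_cover_sum(s, k):
    # Covering sum for the stage-k cover of the Cantor set:
    # 2^k intervals, each of diameter 3^-k.
    return (2 ** k) * omega(s) * ((3.0 ** -k) / 2) ** s

d = math.log(2) / math.log(3)   # candidate dimension, about 0.6309

for s in (0.5, d, 0.8):
    print(round(s, 4), [cantor_cover_sum(s, k) for k in (5, 10, 20)])
```

Below the critical exponent the sums grow without bound as the covers are refined, above it they shrink to zero, and exactly at log 2 / log 3 they stay constant.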
3.3.3 Densities
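Let $A \subset \mathbb{R}^n$, $a \in \mathbb{R}^n$, m ≤ n. We define the m-dimensional density $\Theta^m(A, a)$ of A
at a by
$$\Theta^m(A, a) = \lim_{r \to 0}\frac{H^m(A \cap B^n(a, r))}{\alpha_m r^m} \qquad (3.14)$$
where $\alpha_m$ is the volume of the ball of radius one in $\mathbb{R}^m$.
Define $\Theta^m(\mu, a)$ where µ is a measure on $\mathbb{R}^n$ by
$$\Theta^m(\mu, a) = \lim_{r \to 0}\frac{\mu(B^n(a, r))}{\alpha_m r^m} \qquad (3.15)$$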
3.3.4 Lipschitz and BV Functions

Definition 6. A Lipschitz function, $f : \mathbb{R}^m \to \mathbb{R}^n$ is a function satisfying the
criterion:
$$|f(x) - f(y)| \le C|x - y|$$
for some constant C. The minimal such C is referred to as Lip f.
We have the following very important result about Lipschitz functions:
Theorem 3.3.2. (Rademacher’s Theorem). A Lipschitz function $f : \mathbb{R}^m \to \mathbb{R}^n$ is
differentiable almost everywhere.
Later we will be interested in looking at sets that can be realised locally as
graphs of Lipschitz functions. So from this theorem it follows that such objects are
smooth except on a set of measure zero, i.e. they are smooth manifolds except on a
set of measure zero.
Proof. A proof of Rademacher’s theorem can be found in Morgan’s book [35].
There are other results:
Theorem 3.3.3. (Whitney’s Theorem). Let $f : \mathbb{R}^m \to \mathbb{R}^n$ be Lipschitz. Then for
any ε > 0 there is a $C^1$ function $g : \mathbb{R}^m \to \mathbb{R}^n$ such that $L^m(\{x : f(x) \neq g(x)\}) < \varepsilon$,
where $L^m$ is the m dimensional Lebesgue measure.
Theorem 3.3.4. Lipschitz functions are weakly differentiable, that is, for any
Lipschitz f there exists a g which is the weak derivative of f, i.e., such that
$$\int \eta\, g_i = -\int f\, D_i\eta$$
for every smooth, compactly supported test function η, where $g_i$ denotes the ith
component of g.
Definition 7. BV functions, otherwise known as functions of bounded variation,
are integrable functions whose weak derivative is a finite Radon measure. That is, for
any open set $\Omega \subset \mathbb{R}^n$, we say $u \in L^1(\Omega)$ has bounded variation if there exists a finite
Radon measure Du such that
$$\int_\Omega u\, \mathrm{div}\,\phi = -\int_\Omega \phi \cdot Du,$$
for all $\phi \in C_c^1(\Omega, \mathbb{R}^n)$. We then write u ∈ BV(Ω).
Alternatively, we have the following equivalent statement:
Definition 8. A function $u \in L^1_{loc}(\Omega)$ is said to be in $BV_{loc}(\Omega)$ if for each W ⊂⊂ Ω there is
a constant c(W) < ∞ such that
$$\int_W u\, \mathrm{div}\,\phi \le c(W) \sup\|\phi\|$$
for all vector functions $\phi = (\phi_1, ..., \phi_n)$, $\phi_i \in C_c^\infty(W)$.
Proposition 1. If $u \in BV_{loc}(\Omega)$, then for any U ⊂⊂ Ω, there is a measure µ such
that
$$\int_U u\, \mathrm{div}\,\phi = \int_U \phi \cdot \mu$$
for any $\phi \in C_c^\infty(U, \mathbb{R}^n)$.
Proof. (See Simon [47] p.35)
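Yet another way of expressing the same thing:
Define the variation of u in Ω as
$$V(u, \Omega) = \sup\left\{\int_\Omega u\, \mathrm{div}\,\phi : \phi \in C_c^1(\Omega, \mathbb{R}^n),\ \|\phi\|_{L^\infty(\Omega)} \le 1\right\}.$$
Then $BV(\Omega) = \{u \in L^1(\Omega) : V(u, \Omega) \text{ is finite}\}$.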
These things are important mainly because of the following compactness theo-
rem, which is, in some sense, the fundamental theorem of geometric measure theory:
Theorem 3.3.5. (Compactness Theorem for BV functions). If $\{u_k\}$ is a sequence
of $BV_{loc}(U)$ functions satisfying
$$\sup_{k \ge 1}\left(\|u_k\|_{L^1(W)} + \int_W \|Du_k\|\right) < \infty \qquad (3.16)$$
for each W ⊂⊂ U, then there is a subsequence $\{u_{k'}\} \subset \{u_k\}$ and a $BV_{loc}$ function
u such that $u_{k'} \to u$ in $L^1_{loc}(U)$ and
$$\int_W \|Du\| \le \liminf \int_W \|Du_{k'}\| \qquad (3.17)$$
for all W ⊂⊂ U.
Proof. I shall prove this result later.
Remarks.
(i) The finiteness condition above is necessary in the sense that in order to get a
compactness result of this nature, we need the sequence of functions and their
derivatives to be bounded in every bounded domain. If we wanted to establish
L2loc (U ) convergence, we would need all second derivatives to be bounded too.
(ii) The inequality following the finiteness condition is essentially a kind of Fatou’s
lemma for first order derivatives.
Having got this result, the game we now play is the following- we want to
construct a family of BVloc functions to locally represent our surface, for example
(in the case we are considering the Plateau problem). It then follows from this
theorem that this family will have a convergent subsequence. So we want to do
calculus of variations with BVloc functions. The difficulties with this will lie of
course in the fact that we may have corners and other pathological behaviours on
sets of measure zero.
3.3.5 Jacobians and the Area Formula

In dealing with maps from one rectifiable set to another, it becomes a matter of
interest as to how to compute the Jacobian of such a map, as this knowledge proves
integral to establishing the area and coarea formulae (to follow). If $f : \mathbb{R}^m \to \mathbb{R}^n$ is
differentiable at a point a, the k dimensional Jacobian of f at a, $J_k f(a)$, is defined
to be the maximal k-dimensional volume of the image under Df(a) of a unit k-cube.
If rank(Df(a)) < k then $J_k f(a) = 0$. Otherwise, if rank(Df(a)) ≥ k, then
$J_k f(a)^2$ is merely the sum of the squares of the determinants of the k × k submatrices
of Df(a). If k = m ≤ n, this gives $J_k f(a)^2 = \det(Df(a)^T Df(a))$; if k = n ≤ m, it gives
$J_k f(a)^2 = \det(Df(a)\, Df(a)^T)$.
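The agreement between the two descriptions of $J_k f(a)^2$ (sum of squared minors versus Gram determinant) is the Cauchy-Binet identity, and it is easy to verify on a small case. The sketch below takes k = m = 2 and n = 3, with an arbitrary sample matrix standing in for Df(a); the function names are illustrative.

```python
from itertools import combinations

def det2(a, b, c, d):
    # Determinant of the 2x2 matrix [[a, b], [c, d]].
    return a * d - b * c

def jacobian_sq_minors(D):
    # Sum of squares of all 2x2 minors of an n x 2 matrix D (rows of length 2).
    return sum(det2(r[0], r[1], s[0], s[1]) ** 2 for r, s in combinations(D, 2))

def jacobian_sq_gram(D):
    # det(D^T D) for an n x 2 matrix D.
    g11 = sum(r[0] * r[0] for r in D)
    g12 = sum(r[0] * r[1] for r in D)
    g22 = sum(r[1] * r[1] for r in D)
    return det2(g11, g12, g12, g22)

D = [[1, 2], [3, 4], [5, 6]]   # sample derivative of a map R^2 -> R^3

print(jacobian_sq_minors(D), jacobian_sq_gram(D))
```

For this matrix the three minors are −2, −4 and −2, so both expressions equal 24 and $J_2 = \sqrt{24}$.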
The definition of Jk f has been arranged so as to make the following theorem
easy to state.
Theorem 3.3.6. (The Area Formula). Suppose $f : \mathbb{R}^m \to \mathbb{R}^n$ is Lipschitz for
m ≤ n. Then
$$\int_A J_m f(x)\, dL^m x = \int_{\mathbb{R}^n} N(f|A, y)\, dH^m y$$
and
$$\int_{\mathbb{R}^m} u(x) J_m f(x)\, dL^m x = \int_{\mathbb{R}^n} \sum_{x \in f^{-1}(y)} u(x)\, dH^m y$$
where N(f|A, y) is the cardinality of the set {x ∈ A : f(x) = y} and u is any
$H^m$ integrable function.
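A simple case in which the multiplicity function N matters is a curve that wraps twice around the unit circle. In the sketch below (an illustrative example, with m = 1, n = 2 and A = [0, 1) chosen for the purpose) the map is f(t) = (cos 4πt, sin 4πt), so that $J_1 f = |f'|$ and every point of the circle has exactly two preimages. The left-hand side is computed numerically; the right-hand side is 2 times the length 2π of the circle.

```python
import math

def f(t):
    # Wraps the interval [0, 1) twice around the unit circle.
    return (math.cos(4 * math.pi * t), math.sin(4 * math.pi * t))

def J1(t, h=1e-5):
    # One-dimensional Jacobian |f'(t)|, via a central difference.
    (x0, y0), (x1, y1) = f(t - h), f(t + h)
    return math.hypot(x1 - x0, y1 - y0) / (2 * h)

def midpoint(fn, a, b, n=2000):
    # Composite midpoint rule for the integral of fn over [a, b].
    step = (b - a) / n
    return step * sum(fn(a + (i + 0.5) * step) for i in range(n))

lhs = midpoint(J1, 0.0, 1.0)     # integral of J_1 f over A
rhs = 2 * (2 * math.pi)          # N = 2 on the circle, H^1(circle) = 2 pi

print(lhs, rhs)
```

Without the multiplicity factor the right-hand side would be only 2π, half of the integral of the Jacobian.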
Proof. I will follow [35]. Suppose rank Df = m. Let {si } be a countable dense set of
affine maps of Rm onto m-dimensional planes in Rn (recall an affine map is a linear
transformation followed by a translation). Suppose E is a piece of A such that for
each a ∈ E the affine functions f (a) + Df (a)(x − a) and si (x) are approximately
equal.
Then $\det s_i$ is approximately $J_m f$ on E, and f is injective on E since the affine
maps $s_i$ are injective.
Since f is differentiable, we can locally approximate f by f (aj ) + Df (aj )(x − aj )
in some small neighbourhood of aj which we can call Ej for a dense subset {aj }
of A. Since the set {si } is dense there is some element of this set, call it sj after
possible relabelling, such that sj is arbitrarily close to the local approximation to f
in Ej . Then evidently these neighbourhoods cover A.
So for each piece $E_i$,
$$H^m(f(E_i)) \approx H^m(s_i(E_i)) = L^m(s_i(E_i)) = \int_{E_i} \det(s_i)\, dL^m \approx \int_{E_i} J_m f\, dL^m \qquad (3.18)$$
Summing over the sets $E_i$ gives
$$\int_{\mathbb{R}^n} (\text{number of sets } E_i \text{ intersecting } f^{-1}\{y\})\, dH^m y \approx \int_A J_m f\, dL^m.$$
In the limit for a sequence of covers $E_{i,j}$ such that as j → ∞, $\mathrm{diam}(E_{i,j}) \to 0$
for all i uniformly, we get
$$\int_{\mathbb{R}^n} N(f|A, y)\, dH^m y = \int_A J_m f\, dL^m,$$
as required.
What if rank(Df) < m? Then $\int_A J_m f$ is zero. Evidently we would like to show
that $\int_{\mathbb{R}^n} N(f|A, y)\, dH^m y$ is also zero.
Consider the function $g : \mathbb{R}^m \to \mathbb{R}^{n+m}$, $x \mapsto (f(x), \varepsilon x)$.
Then $J_m g \le \varepsilon(\mathrm{Lip} f + \varepsilon)^{m-1}$. This follows from the fact that in diagonal form
Df has maximal entries Lip f, ..., Lip f along the diagonal repeated rank(Df) =
p < m times followed by a string of zeroes. So Dg will have maximal entries
Lip f + ε, ..., Lip f + ε, ε along the diagonal since rank(Dg) = m. Then $J_m g$ will be,
maximally, the product of these entries, that is, $\varepsilon(\mathrm{Lip} f + \varepsilon)^{m-1}$, as required.
Since Dg has rank m,
$$H^m(f(A)) \le H^m(g(A)) = \int_A J_m g \le \varepsilon(\mathrm{Lip} f + \varepsilon)^{m-1} L^m(A)$$
Hence, letting ε → 0, we have that $H^m(f(A)) = 0$, since ε = 0 corresponds to the map f.
From this it quickly follows that $\int_{\mathbb{R}^n} N(f|A, y)\, dH^m y$ is zero, and we are done.
The second part of the theorem follows by approximating u by simple functions.
3.3.6 The Co-Area Formula

The following result is very closely related to that of the previous section. It essen-
tially covers the other case, where we are dealing with maps from higher dimensional
spaces into lower dimensional ones, rather than vice versa.
First I remind readers of a well known result from analysis, Fubini’s Theorem.
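Theorem 3.3.7. (Fubini's Theorem, Special Case). Suppose µ is Hausdorff measure
on $\mathbb{R}^n$ and ν is Hausdorff measure on $\mathbb{R}^p$. Then if $f \in L^1(\mu \otimes \nu)$,
\[
\int_{\mathbb{R}^n \times \mathbb{R}^p} f(s,t)\, d(\mu \otimes \nu) = \int_{\mathbb{R}^n} \Big( \int_{\mathbb{R}^p} f(s,t)\, d\nu(t) \Big) d\mu(s) = \int_{\mathbb{R}^p} \Big( \int_{\mathbb{R}^n} f(s,t)\, d\mu(s) \Big) d\nu(t).
\]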
Proof. Refer to [Ko], section 16.2, for a proof of this result.

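Theorem 3.3.8. (The Coarea Formula). Suppose $f : \mathbb{R}^m \to \mathbb{R}^n$ is Lipschitz, with
$m > n$. Then if A is an $\mathcal{L}^m$ measurable set,
\[
\int_A J_n f(x)\, d\mathcal{L}^m x = \int_{\mathbb{R}^n} \mathcal{H}^{m-n}(A \cap f^{-1}(y))\, d\mathcal{L}^n y.
\]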
Proof. Again following [35]. If f is orthogonal projection, then Jn f = 1 and the

coarea formula reduces to the special case of Fubini’s theorem above.
More generally, suppose $J_n f \neq 0$. We may subdivide A as in the proof of the
area formula, and hence may assume f is linear. Then $f = L \circ P$, where P is
projection onto the orthogonal complement V of $\ker f$ and L is some nonsingular
linear map from V to $\mathbb{R}^n$. Then
\[
\int_A J_n f\, d\mathcal{L}^m = |\det(L)|\, \mathcal{H}^m(A)
\]
since the map f is linear and so the stretch factor from A to f(A) will be given by
$|\det(L)|$,
\[
= |\det(L)| \int_{P(A)} \mathcal{H}^{m-n}(P^{-1}\{y\})\, d\mathcal{L}^n y
\]
(by definition of what it means to be a projection),
\[
= \int_{L \circ P(A)} \mathcal{H}^{m-n}((L \circ P)^{-1}\{y\})\, d\mathcal{L}^n y \tag{3.19}
\]
since V is n-dimensional and L is a nonsingular map.
But this is precisely the result we wanted. The general case for f nonlinear
comes through easily as a consequence.
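The coarea formula can be checked numerically in a simple case. The sketch below assumes a hypothetical example that does not appear in the text: $f(x, y) = x^2 + y^2$ on the closed unit disc A (so $m = 2$, $n = 1$), for which $J_1 f = |\nabla f| = 2\sqrt{x^2 + y^2}$ and the level set $f^{-1}(y)$ is a circle of radius $\sqrt{y}$. Both sides should be close to $4\pi/3$.

```python
import math

def coarea_lhs(n=400):
    # Midpoint rule on a Cartesian grid for the integral over the unit disc A
    # of J_1 f = |grad f| = 2*sqrt(x^2 + y^2), where f(x, y) = x^2 + y^2.
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        x = -1.0 + (i + 0.5) * h
        for j in range(n):
            y = -1.0 + (j + 0.5) * h
            r2 = x * x + y * y
            if r2 <= 1.0:
                total += 2.0 * math.sqrt(r2)
    return total * h * h

def coarea_rhs(n=20000):
    # Integral over levels y in [0, 1] of H^1(A ∩ f^{-1}(y)); each level set
    # is a circle of radius sqrt(y), whose length is 2*pi*sqrt(y).
    h = 1.0 / n
    return sum(2.0 * math.pi * math.sqrt((k + 0.5) * h) for k in range(n)) * h

lhs = coarea_lhs()
rhs = coarea_rhs()
print(lhs, rhs, 4.0 * math.pi / 3.0)
```

The level-set lengths are entered analytically here, so this only tests the bookkeeping of the formula, not a general method for measuring fibres.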
3.4 Key Constructions

3.4.1 Tangent Cones
Define the tangent cone of E at a as consisting of the tangent vectors of E at a:
\[
\mathrm{Tan}(E, a) = \{r \in \mathbb{R} : r \ge 0\} \cdot \bigcap_{\epsilon > 0} \mathrm{Closure}\Big\{ \frac{x - a}{|x - a|} : x \in E,\ 0 < |x - a| < \epsilon \Big\} \tag{3.20}
\]
Intuitively speaking, the tangent cone to a ∈ E is essentially what the rectifi-

able set E looks like near a to a first order (linear) approximation. If E is an n-
dimensional manifold, the first order approximation will always be an n-dimensional
plane. But in rectifiable sets the tangent cone may very well look like a cone, or any
number of more exotic structures. Draw an (n + 1)-dimensional disc surrounding
a by viewing E near a as embedded in Rn+1 for some sensible coordinate system.
Then the tangent cone at a can be thought of as a subset of this (n + 1)-disc, and
in fact is uniquely determined by its intersection with the boundary of the disc, or
the n-sphere.
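To make definition (3.20) concrete, here is a small sketch that samples the unit directions $(x - a)/|x - a|$ for a hypothetical example set, the graph of $|x|$ in $\mathbb{R}^2$ at its corner $a = (0, 0)$. The set and the helper name are assumptions made for illustration. The directions found for every $\epsilon$ are the same two unit vectors, so the tangent cone is the pair of rays they generate, which is a genuine cone rather than a line.

```python
import math

def unit_directions(points, a, eps):
    # Unit vectors (x - a)/|x - a| for sample points x of E with 0 < |x - a| < eps,
    # rounded so that numerically equal directions are identified.
    dirs = set()
    for p in points:
        d = math.dist(p, a)
        if 0.0 < d < eps:
            dirs.add(tuple(round((pi - ai) / d, 6) for pi, ai in zip(p, a)))
    return dirs

# Hypothetical example set: E = graph of |x|, sampled on a fine grid.
E = [(k / 1000.0, abs(k / 1000.0)) for k in range(-1000, 1001)]
a = (0.0, 0.0)

for eps in (0.5, 0.1, 0.01):
    print(eps, sorted(unit_directions(E, a, eps)))
```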
3.4.2 Rectifiable Sets and a Structure Theorem

Recall that the natural measure for m-dimensional sets on Rn , is the Hausdorff
measure, Hm .
Borel subsets B of $\mathbb{R}^n$ are called $(\mathcal{H}^m, m)$ rectifiable if B is a countable union
of Lipschitz images of bounded subsets of $\mathbb{R}^m$, with $\mathcal{H}^m(B) < \infty$. I shall make this
more precise later. It is possible to associate to any rectifiable set B a canonical
tangent plane Tb B for each b ∈ B (also to be described in more detail later). (For
our treatment, sets will always be taken to have integer dimension, even though
geometric measure theory allows the study of more pathological sets.)
Theorem 3.4.1. (Structure Theorem). Let E be an arbitrary subset of Rn with

Hm (E) < ∞. Then E can be decomposed as the union of two disjoint sets E = A∪B
with A (Hm , m) rectifiable and B purely unrectifiable.
We can interpret this theorem as stating that any set E in Rn can be split
uniquely into a curvelike bit (a bit where calculus makes sense) and an uncurvelike
bit (where things get pathological). This follows from the closely related theorem:
Theorem 3.4.2. (Structure Theorem (2)). n-dimensional rectifiable sets admit

a tangent cone Hn -almost everywhere, and n-dimensional unrectifiable sets admit a
tangent cone Hn -almost nowhere. Hence any arbitrary subset E of Rn splits uniquely
into two disjoint sets Â ∪ B̂ with all points in Â admitting a tangent cone and all
points in B̂ not admitting a tangent cone.
Remark. Note that Â and B̂ are more or less the same as A and B, respectively, up
to sets of Hn measure zero.
3.4.3 Currents
The main building blocks of GMT are the rectifiable currents, m-dimensional ori-
ented surfaces. They are oriented rectifiable sets with integer multiplicities, finite
area, and compact support. One presumes an orientation can be defined by associ-
ating a basis of m formlike objects to each tangent space in B. Then, if it is possible
to choose a nowhere vanishing m-formlike object across all of B we could say that
it is orientable.
Multiplicity is related to how one interprets integration over such a set B. The
way integration works is to use the standard measure µ for each point, then modify
the measure pointwise by the map $\mu(p) \mapsto \mathrm{multiplicity}(p)\,\mu(p)$. To have integer

multiplicity means that pointwise multiplicity(p) ∈ Z. An example of this is illus-
trated by taking the real plane R2 and folding it in half. Then the fold line has
multiplicity one and all other points in the resulting half plane have multiplicity
two.
Anyway, the utility of rectifiable currents lies in a result from general measure
theory that states that one can integrate a smooth differential form φ over such a
set S, and hence view S as a current, or a linear functional on differential forms,
\[
\phi \mapsto \int_S \phi.
\]
One also has a boundary operator from m to m − 1-dimensional currents
(∂S)(φ) = S(dφ)
though ∂S will not necessarily be rectifiable even if S is.
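The definition $(\partial S)(\phi) = S(d\phi)$ can be illustrated in the simplest case. The sketch below assumes a hypothetical 1-current: the oriented interval $[a, b]$ carrying a constant integer multiplicity, acting on 1-forms $\phi(x)\,dx$. Its boundary, applied to a 0-form f, should then return multiplicity times $f(b) - f(a)$. All names are illustrative.

```python
def integrate(g, a, b, n=20000):
    # Midpoint rule for the integral of g over [a, b].
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

def interval_current(a, b, multiplicity):
    # The 1-current S attached to the oriented interval [a, b] with integer
    # multiplicity; it acts on a 1-form phi(x) dx, represented here by phi.
    def S(phi):
        return multiplicity * integrate(phi, a, b)
    return S

def boundary(S, step=1e-6):
    # (dS)(f) = S(df): a 0-form f is sent to S applied to f'(x) dx, with f'
    # approximated by a central difference.
    def dS(f):
        return S(lambda x: (f(x + step) - f(x - step)) / (2.0 * step))
    return dS

S = interval_current(0.0, 2.0, 3)
dS = boundary(S)
f = lambda x: x ** 3 - x
value = dS(f)
expected = 3 * (f(2.0) - f(0.0))  # multiplicity * (f(b) - f(a))
print(value, expected)
```

In this example the boundary is again a rectifiable object (two weighted points); the caveat above about ∂S concerns far less regular S.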

One might wonder if rectifiable currents which satisfy the problem of least area
possess any geometric significance. The Allard Regularity theorem tells us that for
m ≤ 6, an m-dimensional area-minimising rectifiable current in Rm+1 is a smooth
embedded manifold. In fact this is a best possible estimate, since there are examples
in the literature of area minimising rectifiable currents which admit codimension
seven singularities. This is related to the theorem of Frankel and Lawson, and my
interest in extending it.
Remark. For a more careful definition of these objects, consider any oriented m-
dimensional rectifiable set S. Let S̄(x) denote the element of the tangent cone at x.
Then for any differential m-form φ, define
\[
S(\phi) = \int_S \mu(x)\, \bar{S}(x) \cdot \phi\, d\mathcal{H}^m
\]
where µ(x) is an associated multiplicity to the point x. S can then be naturally

thought of as a rectifiable current (provided S also has compact support).
3.4.4 The Rectifiability Theorem

Definition 9. A current T is rectifiable if T = τ (M, θ, χ) where M is countably
n-rectifiable, H n -measurable, θ is a positive locally H n -integrable function on M ,
and χ(x) orients the tangent cone Tx M of M for H n -a.e. x ∈ M .
The following theorem essentially gives a condition for a current to be a rectifiable current.

Theorem 3.4.3. (32.1 [47]). Suppose $T \in \mathcal{D}_n(U)$ is such that $M_W(T) < \infty$ and
$M_W(\partial T) < \infty$ for all $W \subset\subset U$, and $\Theta^{*n}(\mu_T, x) > 0$ for $\mu_T$-a.e. $x \in U$. Then T is
rectifiable.
3.5 The Compactness Theorem for Integral Currents
This theorem is a core result of geometric measure theory; it allows us to sensibly
define convergence in certain special classes of geometric measure theoretic objects.
Theorem 3.5.1. (27.3 [47]). If $\{T_j\} \subset \mathcal{D}_n(U)$ is a sequence of integer multiplicity
currents with
\[
\sup_{j \ge 1} \big( M_W(T_j) + M_W(\partial T_j) \big) < \infty \quad \forall\, W \subset\subset U,
\]
then there is an integer multiplicity $T \in \mathcal{D}_n(U)$ and a subsequence $\{T_{j'}\}$ such
that $T_{j'} \to T$ in U.
The aim of this section will be to prove a limited version of this theorem. The
main reason I will not state a proof of the theorem in its full generality is essentially
because the technical nature of it would disguise the main ideas underlying the
establishment of this result. Instead, I will in fact focus on providing a proof of the
compactness theorem for BV functions stated earlier. This has several advantages:
(i) The compactness theorem for BV functions is in fact a special case (the codi-
mension one case) of the above theorem, since a codimension one current can
be viewed locally as a graph of a BVloc function,
(ii) Although this theorem is less general, it is considerably easier to prove and is
a good model of the general case,
and
(iii) The theorem is readily extendible to the codimension k case.
I will follow this proof with a focus on proving its main implication for the
Plateau problem (and more general variational problems), namely that weak min-
imising solutions will exist to it.
Remark. The above compactness theorem can be extended to sequences of geometric
measure theoretic objects that resemble currents but carry varying (not necessarily
integer), bounded multiplicities. These beasts are known as varifolds. This
allows one to apply this technology to such problems as variation of the scalar
curvature on a surface, etc. The proof of this result will be given in a later section.
3.5.1 Two important theorems from analysis

Definition. A set of functions F mapping from a space X to a space Y is said to be
equicontinuous if for every $\epsilon > 0$ and every $x_0 \in X$ there exists a $\delta > 0$, independent
of the choice of $f \in F$, such that if $\|x - x_0\| < \delta$ then $\|f(x) - f(x_0)\| < \epsilon$ for all $f \in F$.
On a compact space, equicontinuity of a family implies that each of its members is
uniformly continuous, which in turn is stronger than continuity.
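The role of the common δ can be seen numerically. The sketch below compares two hypothetical families on $[0, 1]$ that are not taken from the text: the damped family $\sin(nx)/n$, which shares the Lipschitz constant 1 and is therefore equicontinuous, and the undamped family $\sin(nx)$, which is not equicontinuous as $n \to \infty$. Only finitely many members can be sampled, so the code shows the trend rather than proving anything.

```python
import math

def family_modulus(family, delta, grid=500):
    # Maximum over members f and grid points x in [0, 1] (with x + delta <= 1)
    # of |f(x) - f(x + delta)|: a sample of the family's common modulus.
    worst = 0.0
    for f in family:
        for i in range(grid + 1):
            x = i / grid
            y = x + delta
            if y <= 1.0:
                worst = max(worst, abs(f(x) - f(y)))
    return worst

# Hypothetical families, truncated at n = 100 members each.
damped = [lambda x, n=n: math.sin(n * x) / n for n in range(1, 101)]
undamped = [lambda x, n=n: math.sin(n * x) for n in range(1, 101)]

for delta in (0.1, 0.01, 0.001):
    print(delta, family_modulus(damped, delta), family_modulus(undamped, delta))
```

For the damped family the sampled modulus never exceeds δ, whatever the member; for the undamped family the value at a fixed δ is governed by the highest frequency included and approaches 2 as more members are added.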
Definition. A set of functions F ⊂ C(X, Y ) is called pointwise relatively compact
if for all x ∈ X, the set {f (x) : f ∈ F } is compact in Y .
I may now state the result:
Theorem 3.5.2. (Arzela-Ascoli). Suppose S is a compact metric space. Then a

non-empty set A ⊂ C(S) is totally bounded iff A is bounded relative to the norm in
C(S) and equicontinuous.
Proof. (lifted from J. Koliha's notes on analysis [34], Theorem 5.5.2). Suppose A is
bounded and equicontinuous. Then, for any fixed $\epsilon > 0$ there is a ball $B(a, \delta(a))$ for
each $a \in S$ such that
\[
f \in A \text{ and } x \in B(a, \delta(a)) \text{ implies } |f(x) - f(a)| < \tfrac{1}{3}\epsilon.
\]
Since S is compact, by the Heine-Borel lemma there exist balls $B(x_k, \delta(x_k))$ for
$k = 1, \ldots, N$, for some finite N, that cover S.
Now define a mapping $P : A \to \mathbb{R}^N$ setting $P(f) = (f(x_1), \ldots, f(x_N))$. I claim
that $P(A)$ is bounded in $\mathbb{R}^N$. To see this, observe that since A is bounded by
hypothesis there is a $c > 0$ such that
\[
\|f\| \le c \text{ for all } f \in A.
\]
Hence, for each $f \in A$,
\[
\|P(f)\|_\infty = \max_k |f(x_k)| \le \|f\| \le c.
\]
But bounded sets in $\mathbb{R}^N$ are totally bounded, since $\mathbb{R}^N$ is a complete metric
space (once again we are using the Heine-Borel lemma). So let $P(f_1), \ldots, P(f_p)$ be a
$\tfrac{1}{3}\epsilon$-net for $P(A)$. Then I claim that $f_1, \ldots, f_p$ is an $\epsilon$-net for A.
Suppose $f \in A$. Then certainly $P(f) \in P(A)$, and there is an $f_j$ such that
$\|P(f) - P(f_j)\|_\infty < \tfrac{1}{3}\epsilon$. Recall also that for any x there is $x_k$ with $x \in B(x_k, \delta(x_k))$. So
\begin{align*}
|f(x) - f_j(x)| &\le |f(x) - f(x_k)| + |f(x_k) - f_j(x_k)| + |f_j(x_k) - f_j(x)| \\
&\le \tfrac{1}{3}\epsilon + \|P(f) - P(f_j)\|_\infty + \tfrac{1}{3}\epsilon \\
&< \tfrac{1}{3}\epsilon + \tfrac{1}{3}\epsilon + \tfrac{1}{3}\epsilon = \epsilon. \tag{3.21}
\end{align*}
This is true for any $x \in S$, so $\|f - f_j\| < \epsilon$, i.e. A is totally bounded.

For the converse, suppose A is totally bounded. Fix $\epsilon > 0$. Then there is an
$\epsilon$-net $f_1, \ldots, f_N$ such that for any $f \in A$ there is an $f_i$ such that $\|f - f_i\| < \epsilon$.
Since S is compact, the functions $|f_i|$ are bounded from above by constants $C_i$. Let
$C = \max_i C_i$. Then
\[
\|f\|_\infty \le \|f - f_i\|_\infty + \|f_i\|_\infty < \epsilon + C < 2C
\]
for small enough $\epsilon$. So we conclude that A is bounded.

It remains to show that A is equicontinuous, that is, that there is a $\delta > 0$ such that
for all $f \in A$ and all $x, y \in S$, if $|x - y| < \delta$ then $|f(x) - f(y)| < \epsilon$.
Observe now that since S is compact we may choose a δ small enough that this
condition does hold for a finite number of functions, e.g. our $\epsilon$-net $f_1, \ldots, f_N$. So then
\[
|f_k(x) - f_k(y)| < \epsilon \text{ for all } x, y \in S \text{ such that } |x - y| < \delta.
\]
Finally, choosing $f_k$ in the net with $\|f - f_k\| < \epsilon$,
\begin{align*}
|f(x) - f(y)| &\le |f(x) - f_k(x)| + |f_k(x) - f_k(y)| + |f_k(y) - f(y)| \\
&\le \epsilon + \epsilon + \epsilon = 3\epsilon \tag{3.22}
\end{align*}
so A is equicontinuous as required. This completes the proof.
The following corollary is of crucial importance in our development, and is often

also called the Arzela-Ascoli theorem:
Corollary 3.5.3. Suppose S is a compact metric space. Then a nonempty set

A ⊂ C(S) is compact iff A is bounded, closed and equicontinuous.
This corollary is important in any application where one wants to show existence
of a convergent subsequence of a set of functions. Hence it has obvious applications
in geometric measure theory, and, more generally, in existence theory for PDE.
A more general version of the theorem also holds:
Theorem 3.5.4. (Ascoli-Arzela). Let X and Y be metric spaces, X compact. Then

a subset F of C(X, Y ) is compact iff it is equicontinuous, pointwise relatively compact
and closed.
Remark. Note that the theorem I proved earlier was essentially for the case Y = Rn ,
but extending the proof to the general case is not too hard.
I now remind the reader of a theorem I mentioned earlier when I was talking
about Sobolev spaces:
Theorem 3.5.5. (The Riesz Representation Theorem). For every bounded linear
functional F on a Hilbert space H, there is a uniquely determined element f ∈ H
such that $F(x) = (x, f)$ for all $x \in H$ and $\|F\| = \|f\|$.
This result allows one, in certain circumstances, to show certain notions are well
defined. It will have particular power in the next section, in which I shall make
certain comments on Radon measures.
3.5.2 Remarks on Radon Measures

Recall that a measure µ on a space X is Radon if it is Borel regular and if it is
finite on compact subsets. The importance of Radon measures comes mainly from
the following fact:
Theorem 3.5.6. There is a one to one correspondence between µ-measurable functions
$\nu : X \to H$ satisfying $\|\nu\| = 1$ µ-a.e. and linear functionals $L : K(X, H) \to \mathbb{R}$, where
H is any Hilbert space and $K(X, H)$ is the space of continuous functions from X to
H, provided that the functionals L satisfy the finiteness condition
\[
\sup\{L(f) : f \in K(X, H),\ \|f\| \le 1,\ \mathrm{spt}(f) \subset P\} < \infty \text{ for each compact } P \subset X.
\]
Proof. The correspondence is essentially achieved via use of the Riesz representation
theorem. Let $\langle \cdot, \cdot \rangle$ be the inner product on H. Then, for any L, there is a ν, and
vice versa, such that
\[
L(f) = \int_X \langle f, \nu \rangle\, d\mu.
\]
Corollary 3.5.7. As a consequence of the above, we can identify the Radon measures
on X with the non-negative linear functionals on $K(X, \mathbb{R})$. By abuse of notation,
define from now on $\mu : K(X, \mathbb{R}) \to \mathbb{R}$ as $\mu(f) = \int_X f\, d\mu$.
The following theorem is important for the development to follow.

Theorem 3.5.8. Let $\{\mu_k\}$ be a sequence of Radon measures on a space X. Suppose
$\sup_k \mu_k(U) < \infty$ for every set $U \subset X$ with compact closure. Then we may conclude
that there is a subsequence $\{\mu_{k'}\}$ which converges to a Radon measure µ on X in
the sense that
\[
\lim \mu_{k'}(f) = \mu(f) \text{ for each } f \in K(X, \mathbb{R});
\]
in other words, the associated functionals converge.
3.5.3 Mollification
Definition 10. A symmetric mollifier is a function $\phi \in C_c^\infty(\mathbb{R}^n)$, $\phi \ge 0$, with
$\mathrm{support}(\phi) \subset B_1(0)$, such that $\int_{\mathbb{R}^n} \phi = 1$ and $\phi(x) = \phi(-x)$.
Now let u be an $L^1_{loc}$ function. We define $\phi_\sigma(x) = \sigma^{-n} \phi(x/\sigma)$, where φ is a symmetric
mollifier. Then $u_\sigma = \phi_\sigma * u$ (convolution) are the corresponding mollified functions.
The key idea of mollifying a function is to construct a sequence of nicer functions
converging to it (in particular, smooth functions converging to something which is
not guaranteed to be smooth), and conclude properties of the limit from results for
the mollified entities. This has powerful applications, because it is possible to do
analysis on mollified functions (differentiation, integration, etc) but not necessarily
on the more pathological entities we are interested in.
Observe that we may equivalently define the mollification of u as
\[
u_h(x) = h^{-n} \int_U \phi\Big(\frac{x - y}{h}\Big) u(y)\, dy
\]
for an appropriate symmetric mollifier φ. Recall also Hölder's inequality:
\[
\int_U uv\, dx \le \|u\|_p \|v\|_q
\]
where $\frac{1}{p} + \frac{1}{q} = 1$.
An appropriate symmetric mollifier φ might well be $\phi(x) = c \exp\big(\frac{1}{|x|^2 - 1}\big)$ for
$|x| < 1$ and zero otherwise, for a choice of a constant c such that $\int \phi\, dx = 1$.
The following result is mentioned in Leon Simon’s book, and also in Gilbarg and
Trudinger’s text. It is a standard result about mollification. First, however, we will
need an auxiliary lemma, which is essentially lemma 7.1 in [26]:
Lemma 3.5.9. (Auxiliary Lemma). Suppose u ∈ C 0 (U ). Then uh converges to u

uniformly on any domain Ω ⊂⊂ U .
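Proof. Since we can define a φ such that
\begin{align*}
u_h(x) &= h^{-n} \int_{|x - y| \le h} \phi\Big(\frac{x - y}{h}\Big) u(y)\, dy \\
&= \int_{|z| \le 1} \phi(z) u(x - hz)\, dz \tag{3.23}
\end{align*}
it follows that if $\Omega \subset\subset U$ and $2h < \mathrm{dist}(\Omega, \partial U)$ then
\begin{align*}
\sup_\Omega |u - u_h| &\le \sup_{x \in \Omega} \int_{|z| \le 1} \phi(z) |u(x) - u(x - hz)|\, dz \\
&\le \sup_{x \in \Omega} \sup_{|z| \le 1} |u(x) - u(x - hz)|. \tag{3.24}
\end{align*}
But since u is uniformly continuous over the set
\[
B_h(\Omega) = \{x : \mathrm{dist}(x, \Omega) < h\},
\]
as the closure of $B_h(\Omega)$ is a compact subset of U and u is continuous, it follows from the
above inequality that $u_h$ tends to u uniformly on Ω as h tends to zero.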
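The uniform convergence in the auxiliary lemma can be observed numerically. The following is a minimal one-dimensional sketch using the bump $\exp(1/(z^2 - 1))$, normalised by quadrature, and the formula (3.23). The test function $u(x) = |x|$, the domain $\Omega = [-1, 1]$ and the quadrature parameters are assumptions chosen for illustration.

```python
import math

def bump(z):
    # Unnormalised standard bump exp(1/(z^2 - 1)) on |z| < 1, zero elsewhere.
    return math.exp(1.0 / (z * z - 1.0)) if abs(z) < 1.0 else 0.0

N = 2000                                           # quadrature nodes on [-1, 1]
NODES = [-1.0 + (k + 0.5) * 2.0 / N for k in range(N)]
RAW = [bump(z) for z in NODES]
TOTAL = sum(RAW)
WEIGHTS = [w / TOTAL for w in RAW]                 # discrete phi(z) dz, summing to 1

def mollify(u, x, h):
    # u_h(x) = integral over |z| <= 1 of phi(z) u(x - h z) dz, as in (3.23).
    return sum(w * u(x - h * z) for w, z in zip(WEIGHTS, NODES))

def sup_error(u, h, samples=201):
    # Maximum over a grid in Omega = [-1, 1] of |u(x) - u_h(x)|.
    xs = [-1.0 + 2.0 * i / (samples - 1) for i in range(samples)]
    return max(abs(u(x) - mollify(u, x, h)) for x in xs)

for h in (0.4, 0.2, 0.1, 0.05):
    print(h, sup_error(abs, h))
```

For this u the error is concentrated at the corner $x = 0$ and scales linearly with h, consistent with the bound (3.24).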
Lemma 3.5.10. Suppose u ∈ BVloc (U ). Then the mollification of u, uσ , converges

in the limit as σ → 0 to u in L1loc (U ). Also kDuσ k → kDuk in the sense of Radon
measures in U , as mentioned in the previous theorem.
Proof. The following proof of the first part is lifted from Gilbarg and Trüdinger [26],
lemma 7.2.
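Now, by Hölder's inequality, we have that
\begin{align*}
|u_h(x)|^p &= \Big| \int_{|z| \le 1} \phi(z) u(x - hz)\, dz \Big|^p \\
&= \Big| \int_{|z| \le 1} \phi(z)^{1 - 1/p}\, \phi(z)^{1/p} u(x - hz)\, dz \Big|^p \\
&\le \Big( \int_{|z| \le 1} \phi(z)\, dz \Big)^{p-1} \int_{|z| \le 1} \phi(z) |u(x - hz)|^p\, dz \\
&\le \int_{|z| \le 1} \phi(z) |u(x - hz)|^p\, dz. \tag{3.25}
\end{align*}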
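In particular, if $\Omega \subset\subset U$ and $2h < \mathrm{dist}(\Omega, \partial U)$ then
\begin{align*}
\int_\Omega |u_h|^p\, dx &\le \int_\Omega \int_{|z| \le 1} \phi(z) |u(x - hz)|^p\, dz\, dx \\
&= \int_{|z| \le 1} \phi(z)\, dz \int_\Omega |u(x - hz)|^p\, dx \\
&\le \int_{B_h(\Omega)} |u(x)|^p\, dx \tag{3.26}
\end{align*}
where $B_h(\Omega) = \{x \in U : \mathrm{dist}(x, \Omega) < h\}$. The last inequality follows since
$\int \phi = 1$ and since $x - hz$ is in $B_h(\Omega)$ for all $h < \frac{1}{2}\mathrm{dist}(\Omega, \partial U)$ and
$|z| \le 1$, provided $x \in \Omega$. Observe certainly $\Omega \subset B_h(\Omega)$.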
Then it now follows that
\[
\|u_h\|_{L^p(\Omega)} \le \|u\|_{L^p(B_h(\Omega))}.
\]
Now, given $\epsilon > 0$, choose a $C^0(U)$ function w satisfying
\[
\|u - w\|_{L^p(B_h(\Omega))} \le \epsilon
\]
(we may do this because $C^0(U)$ is dense in $L^p(U)$ for $1 \le p < \infty$). I now invoke
our auxiliary lemma to find an h such that
\[
\|w - w_h\|_{L^p(\Omega)} \le \epsilon.
\]
Then applying our previous estimate to the difference $u - w$, we finally obtain
that
\begin{align*}
\|u - u_h\|_{L^p(\Omega)} &\le \|u - w\|_{L^p(\Omega)} + \|w - w_h\|_{L^p(\Omega)} + \|w_h - u_h\|_{L^p(\Omega)} \\
&\le 2\epsilon + \|u - w\|_{L^p(B_h(\Omega))} \le 3\epsilon \tag{3.27}
\end{align*}
for a small enough h. It follows that $u_h$ converges to u in $L^p_{loc}(U)$, and, in
particular, that $u_h$ converges to u in $L^1_{loc}(U)$.
I follow Simon's book [47], Lemma 6.2, for the proof of the second part. We
wish to show
\[
\lim_{\sigma \to 0} \int f |Du_\sigma| = \int f |Du|.
\]
I claim it is easy to demonstrate that
\[
\int f \|Du\| \le \liminf_{\sigma \to 0} \int f \|Du_\sigma\|
\]
since by definition
\[
\int_W \|Du\| = -\sup_g \int_W u\, \mathrm{div}(g)
\]
where g is a bump function (i.e. no larger in modulus than one, smooth, and
contained in the support of W).
But then
\[
\int_W f \|Du\| = -\sup_g \int_W u\, \mathrm{div}(fg) \le -\liminf_{\sigma \to 0} \int_W u\, \mathrm{div}(f g_\sigma) =: \liminf_{\sigma \to 0} \int_W f \|Du_\sigma\|.
\]
So it remains to show that
\[
\limsup_{\sigma \to 0} \int f \|Du_\sigma\| \le \int f \|Du\|.
\]
Observe first of all that
\[
\int f \|Du_\sigma\| = \sup_g \int f g \cdot \mathrm{grad}(u_\sigma)
\]
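with g a bump function. Also note that if g is fixed, and for $\sigma < \mathrm{dist}(\mathrm{spt}(f), \partial U)$,
we have
\begin{align*}
\int_U f g \cdot \mathrm{grad}(u_\sigma) &= -\int u_\sigma\, \mathrm{div}(fg) \\
&= -\int (\phi_\sigma * u)\, \mathrm{div}(fg) \\
&= -\int u\, (\phi_\sigma * \mathrm{div}(fg)) \\
&= -\int u\, \mathrm{div}(\phi_\sigma * fg). \tag{3.28}
\end{align*}
But by definition of $\|Du\|$, we have that the above is nothing other than
$\le \int_{W_\sigma} (f + \epsilon(\sigma)) \|Du\|$. Here $\epsilon(\sigma) \to 0$ for $\sigma \to 0$, and $W = \mathrm{spt}(f)$, $W_\sigma = \{x \in U :
\mathrm{dist}(x, W) < \sigma\}$. This is because
\[
\|\phi_\sigma * fg\| = f \|(\phi_\sigma * g_1, \ldots, \phi_\sigma * g_n)\| \le \phi_\sigma * f.
\]
But it is clear that $\phi_\sigma * f \to f$ uniformly in $W_{\hat\sigma}$ as $\sigma \to 0$, where $\hat\sigma < \mathrm{dist}(W, \partial U)$.
So this proves the second part.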
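Both conclusions of the lemma can be watched in one dimension. The sketch below assumes a hypothetical BV function, the unit step at the origin on $U = (-1, 1)$, whose total variation is 1. It mollifies the step with the normalised bump and reports the total variation of $u_\sigma$ on a grid together with a Riemann-sum estimate of the $L^1$ distance to u. The grid sizes and names are illustrative.

```python
import math

def bump(z):
    # Unnormalised standard bump exp(1/(z^2 - 1)) on |z| < 1, zero elsewhere.
    return math.exp(1.0 / (z * z - 1.0)) if abs(z) < 1.0 else 0.0

N = 1000
NODES = [-1.0 + (k + 0.5) * 2.0 / N for k in range(N)]
RAW = [bump(z) for z in NODES]
TOTAL = sum(RAW)
WEIGHTS = [w / TOTAL for w in RAW]      # discrete phi(z) dz, summing to 1

def step(x):
    # Hypothetical BV example: unit jump at 0, so ||Du||(U) = 1.
    return 1.0 if x >= 0.0 else 0.0

def mollified_step(x, sigma):
    # u_sigma(x) = integral over |z| <= 1 of phi(z) u(x - sigma z) dz.
    return sum(w * step(x - sigma * z) for w, z in zip(WEIGHTS, NODES))

def total_variation_and_l1(sigma, m=400):
    # Grid total variation of u_sigma on [-1, 1], and a Riemann-sum estimate
    # of the L^1 norm of u_sigma - u.
    xs = [-1.0 + 2.0 * i / m for i in range(m + 1)]
    vals = [mollified_step(x, sigma) for x in xs]
    tv = sum(abs(vals[i + 1] - vals[i]) for i in range(m))
    l1 = sum(abs(v - step(x)) for v, x in zip(vals, xs)) * (2.0 / m)
    return tv, l1

for sigma in (0.2, 0.1, 0.05):
    print(sigma, total_variation_and_l1(sigma))
```

The total variation stays at 1 for every σ while the $L^1$ error shrinks with σ, matching $\|Du_\sigma\| \to \|Du\|$ and $u_\sigma \to u$ in $L^1_{loc}$ for this example.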
3.5.4 Proof of The Theorem, and Existence of (Weak) Solutions to the Plateau Problem
Proof. (of the Compactness Theorem for BV functions). By the aforementioned
lemma, to show $u_{k'} \to u$ in $L^1_{loc}(U)$ for some subsequence $\{u_{k'}\}$, it is sufficient to
show that the sets
\[
\Big\{ u \in C^\infty(U) : \int_W (\|u\| + \|Du\|)\, d\mathcal{L}^n \le c(W) \Big\},
\]
where $W \subset\subset U$, are precompact in $L^1_{loc}(U)$ (remember, a set is precompact, or
totally bounded, if for any number E > 0 there is a covering of that set by finitely
many sets of maximal diameter E). This is because certainly if u belongs to the
above class, its associated Radon measure Du is bounded locally in $L^\infty_{loc}(U)$. But if
these sets are totally bounded then it follows that Du is bounded also in $L^1_{loc}(U)$ by
finiteness and compactness.
Then the result follows from Gilbarg and Trüdinger’s book [26], Theorem 7.22,
since we can then conclude these sets are totally bounded as required.
Existence of Weak Solutions to the Plateau Problem. The above theorem guar-
antees that given a sequence of surfaces realised as graphs of BV functions there
will be a convergent subsequence. But certainly if we perform calculus of variations
then we will naturally get a sequence of smooth surfaces, with the limit satisfying
the minimal surface equation. But a sequence can only have one limit; hence this
limit must be the solution given to us via the BV function compactness theorem.
However we are not guaranteed that it will be smooth; we only know in general that
it will be the graph of some BV function. Showing that what we have is in some
sense smooth will be the focus of the next section.
There is in fact a generalisation of all this to varifolds, which are essentially
currents with a measure θ, sort of like metric-measure spaces in the sense of Gromov,
but more pathological. This generalisation, the compactness theorem for such
objects, is due to Allard. However I will not mention this
here, since the statement and proof of it are very technical and little instructive
value would be gained from a study. More on varifolds in the next section.
3.6 Varifolds
3.6.1 Introduction
Note that I have so far only treated the compactness and regularity theorems for the
case of objects with integral density. In more general variational problems, where
we might have some sort of natural Lagrangian that varies from point to point
in our manifold, for instance, it is necessary to consider different, non-constant
densities or equivalently different measures. In this section I mention a few results
that equip us with the means to deal with these additional complexities. The most
important theorems are the approximation and deformation theorems, which lead
to the more general compactness theorem for rectifiable varifolds. The constancy
theorem is of considerable theoretical utility, and was used by Jon Pitts extensively
in his thesis. The boundary rectifiability theorem as stated here really only applies to
integral varifolds, but it is also true more generally.
Varifolds are essentially a generalisation of the concept of currents. The primary
reason we are interested in these things is in order to extend our existence and
regularity results to general variational problems; what we have developed so far
is limited in scope to geometric variational problems on sets of integer multiplicity,
which limits us to examining things like the Plateau Problem. In fact, in later parts
of this thesis, I will examine the problem of varying the Fisher Information of a
space, which is definitely not integer valued!
For a more precise definition, let Ω be a subset of Rn . Then a general m-
dimensional varifold on Ω is essentially a Radon measure on Ω × G(n, m), where
G(n, m) is the set of all m-dimensional subspaces of n-dimensional Euclidean space.
However, for our purposes, we will mainly be interested in rectifiable varifolds. Such
objects may essentially be represented as a graph with multiplicity, i.e., as a pair
(V, f) where the mass of a set $U \subset V$ is defined as
\[
m(U) = \int_U f\, d\mathcal{L}^m.
\]
Another way of writing this is to define a new measure (which is not necessarily
positive definite) such that $d\mu = f\, d\mathcal{L}^m$ and then
\[
m(U) = \int_U d\mu.
\]
In other words, varifolds lend themselves naturally to the investigation of prob-

lems other than that of minimising or extremising area, or are essentially a general-
isation from considering the mass of a set induced via the Lebesgue measure to that
induced by more general measures.
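As a small illustration of the weighted mass $m(U) = \int_U f\, d\mathcal{L}^m$, the sketch below takes a hypothetical one-dimensional example with $V = [0, 1]$ and the non-integer density $f(x) = 1 + x$, and compares it with the constant density 1, which recovers Lebesgue measure. The names and the choice of density are assumptions.

```python
def mass(density, a, b, n=10000):
    # m(U) for U = [a, b]: midpoint rule for the integral of the density
    # against one-dimensional Lebesgue measure.
    h = (b - a) / n
    return sum(density(a + (i + 0.5) * h) for i in range(n)) * h

weighted = mass(lambda x: 1.0 + x, 0.0, 0.5)   # non-constant density f(x) = 1 + x
lebesgue = mass(lambda x: 1.0, 0.0, 0.5)       # density 1 gives ordinary length
print(weighted, lebesgue)
```

Here the weighted mass of $[0, 0.5]$ is $0.5 + 0.125 = 0.625$, against a length of $0.5$.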
3.6.2 The Constancy Theorem

This result provides us with some control on the density of a varifold (or current)
through a certain region. It is hence quite useful to prove regularity results (remember,
to show regularity usually amounts to proving $\|V\|(B(m, a)) \le \omega_n a^n (1 + \epsilon)$ for
small $\epsilon$). In practice this theorem is usually used to show that the hypotheses of the
Allard Regularity Theorem are satisfied.
Theorem 3.6.1. (26.27, [47]). If U is open in Rn , if U is connected, if T ∈ Dn (U )

and if ∂T = 0, then ∃ c a constant such that T = c[[U ]].
What this is essentially saying is that the density of a rectifiable current T is

constant provided its boundary ∂T is zero. Recall that ∂T (φ) is defined to be T (dφ).
3.6.3 The Approximation and Deformation Theorems

(The relevant references here are 30.2 of Simon’s Book [47], 7.1 of Morgan’s Book
[35], and 4.2.20 of Federer’s treatise, [15].)
Following from the Deformation Theorem, the Approximation Theorem is a key
result in the theory of currents. It essentially states that any integral current T can
be approximated by a slight deformation of a polyhedral chain P . Recall that a
polyhedral chain is essentially a collection of simplices.
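Theorem 3.6.2. Given an integral current $T \in \mathcal{D}_m(\mathbb{R}^n)$ and $\epsilon > 0$, there exists
an m-dimensional polyhedral chain P in $\mathbb{R}^n$, with support within a distance $\epsilon$ of the
support of T, and a $C^1$ diffeomorphism f of $\mathbb{R}^n$ such that
\[
f_\# T = P + E
\]
with $M(E) \le \epsilon$, $M(\partial E) \le \epsilon$, $\mathrm{Lip}(f) \le 1 + \epsilon$, $\mathrm{Lip}(f^{-1}) \le 1 + \epsilon$, $\|f(x) - x\| \le \epsilon$,
and f = identity whenever $\mathrm{dist}(x, \mathrm{spt}(T)) \ge \epsilon$.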
Remark. Note we have an error term, E, which this theorem states we have precise
control over (the theorem says we can make the mass of the error as small as we
like by choosing an appropriate f and P ). This error plays a similar role in related
regularity results that the tilt excess plays in the proof of the Allard Regularity
Theorem.
3.6.4 The Boundary Rectifiability Theorem

This result essentially speaks for itself.
Theorem 3.6.3. Suppose T is an integer multiplicity current in $\mathcal{D}_n(U)$ with
$M_W(\partial T) < \infty$ for all $W \subset\subset U$. Then $\partial T$ is an integer multiplicity current.
3.7 A couple of applications

I have already mentioned the more obvious application of the theory to the Plateau
problem. However I should remark that it is possible to show that certain closed
curves in three space do not generate nice smooth surfaces, but rather smooth struc-
tures with a stratification. One can actually see this with soap films, for instance.
But for the purposes of this section I will avoid further discussion of this and
focus instead on a result due to Frankel and Lawson, and an existence result due
to Jon Pitts. There are of course the other obvious applications to establishing
regularity of solutions to other natural variational problems, and determining when
singularities are formed similar to soap films given particular data. From the exam-
ple above, such singularities may well exist, even if the integrand and geometry are
smooth.
3.7.1 The Frankel-Lawson Result

My original motivation for exploring the jungle of GMT was to understand the first
and second order behaviour of variations about singularities in varifolds- this was
towards the aim of extending a well known result of Lawson and Frankel, which I
shall now proceed to reveal and prove, following [16] and [17] most closely.
Theorem 3.7.1. (Main Theorem). Let Mn+1 be a complete, connected manifold

with positive Ricci curvature. Let Vn and Wn be immersed minimal hypersurfaces of
Mn+1 , each immersed as a closed subset, and let Vn be compact. Then Vn and Wn
must intersect.
Proof. Suppose Vn and Wn are embedded submanifolds, and that they do not in-
tersect. Suppose γ is a geodesic from p0 ∈ V to q0 ∈ W that realises the minimum
distance l from V to W . Clearly it must strike V and W orthogonally, otherwise it
can easily be shortened. Any unit vector $X^0$ tangent to V at $p_0$ parallel translates
along γ to a vector field X along γ tangent to W at q0 . This vector field gives rise
to a variation of γ allowing end points to vary.
Lemma 3.7.2. (Key Lemma). The first variation of arc length is 0, and the second
variation of arc length is
\[
L''_X(0) = B_W(X) - B_V(X) - \int_0^l K(X, T)\, ds
\]
where $T = \frac{\partial}{\partial t}$ and $X = \frac{\partial}{\partial \alpha}$ are coordinate vector fields defined on a ribbon
$0 \le t \le l$, $-\epsilon \le \alpha \le \epsilon$. At α = 0, T is the unit tangent field to γ, and the curve
α = 0 corresponds to the geodesic γ running from p0 to q0 . The curve t = 0 is a
curve in V ; the curve t = l is a curve in W . BV (X) (similarly BW (X)) represents
the second fundamental form of V (respectively W ) at p0 (respectively q0 ) evaluated
on X(p0 ) (resp. X(q0 )).
Proof. (of Key Lemma).
\[
L'(\alpha) = \int_0^l \frac{\partial}{\partial \alpha} g(T, T)^{1/2}\, dt = \int_0^l \nabla_X\, g(T, T)^{1/2}\, dt
\]
hence
\[
L'(\alpha) = \int_0^l \frac{g(\nabla_X T, T)}{g(T, T)^{1/2}}\, dt. \tag{3.29}
\]
Since $g(T, T) = 1$ along α = 0 we get
\[
L'(0) = \int_0^l g(\nabla_X T, T)\, dt = \int_0^l g(\nabla_T X, T)\, dt = 0
\]
since $\nabla_T X = 0$ as X is parallel displaced along γ. I.e. the first variation of arc
length is 0.
Continuing from our first variational equation we get
\[
L''(\alpha) = \int_0^l \nabla_X \Big( \frac{g(\nabla_T X, T)}{g(T, T)^{1/2}} \Big)\, dt
\]
which becomes
\[
L''(0) = \int_0^l \nabla_X\, g(\nabla_T X, T)\, dt - \int_0^l g(\nabla_T X, T)^2\, dt.
\]
The second term vanishes since $\nabla_T X = 0$ (X is parallel along γ). So we are left
with
\[
L''(0) = \int_0^l g(\nabla_T \nabla_X X, T)\, dt + \int_0^l g(R(X, T)X, T)\, dt
\]
using the relation $\nabla_X \nabla_T - \nabla_T \nabla_X = R(X, T)$ for orthogonal vector fields X, T.
The first term can be rewritten, using $\nabla_T T = 0$ along the geodesic γ:
\[
g(\nabla_T \nabla_X X, T) = \nabla_T\, g(\nabla_X X, T) - g(\nabla_X X, \nabla_T T) = \frac{\partial}{\partial t} g(\nabla_X X, T).
\]
Putting this all together, together with the result that
\[
K(T, X) = -g(R(X, T)X, T), \tag{3.30}
\]
we have that
\[
L''_X(0) = \frac{d^2 L}{d\alpha^2}(0) = g(\nabla_X X, T)_{q_0} - g(\nabla_X X, T)_{p_0} - \int_0^l K(T, X)\, dt.
\]
It is clear that the first two terms are the second fundamental forms with respect
to W and V respectively evaluated at $q_0$ and $p_0$. (Recall that the second fundamental
form of an embedding, say $f : M \to \bar{M}$, is $B(X, Y) = \bar\nabla_{\bar X} \bar Y - \nabla_X Y$, where $\bar X, \bar Y$
are extensions of vector fields X and Y locally defined on M to $\bar M$ and $\bar\nabla$ is the
connection on $\bar M$. ∇ is defined by
\[
\nabla_X Y = (\bar\nabla_{\bar X} \bar Y)^T.
\]
In this paragraph $\bar\nabla$ denotes the ambient connection, which is the connection written ∇
in the computation above.)
Then we have that for a codimension one submanifold, say V, that $g(\bar\nabla_X X, T)|_V =
g(\bar\nabla_X X - \nabla_X X, T)|_V = g(B(X, X), T)|_V$, which is $B_V(X)$ by definition.
Doing this for n orthonormal vectors $X_1^0, \ldots, X_n^0$ spanning $T_{p_0} V$ and summing we
get
\[
\sum_{\alpha=1}^n L''_{X_\alpha}(0) = \sum_{\alpha=1}^n B_W(X_\alpha) - \sum_{\alpha=1}^n B_V(X_\alpha) - \int_0^l \mathrm{Ric}(T)\, ds
\]
where $\mathrm{Ric}(T) = \sum_{\alpha=1}^n K(X_\alpha, T)$. Now it is well known that minimal hypersurfaces
are characterised by having mean curvature zero, hence $\sum_{\alpha=1}^n B_W(X_\alpha) = 0 =
\sum_{\alpha=1}^n B_V(X_\alpha)$. Since $\mathrm{Ric}(T) > 0$ by our hypothesis, we conclude that for some α,
$L''_{X_\alpha}(0) < 0$, contrary to our assumption that γ was of minimal length.
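The conclusion of the theorem can be seen by hand in the simplest instance, which is offered here only as an assumed illustration: M the round unit sphere $S^2$ (positive Ricci curvature, n = 1) and V, W two great circles, which are closed geodesics and hence minimal hypersurfaces. Writing each great circle as $\{x \in S^2 : x \cdot n_i = 0\}$ for a normal vector $n_i$, any two distinct ones meet at the antipodal points $\pm (n_1 \times n_2)/|n_1 \times n_2|$.

```python
import math

def cross(u, v):
    # Cross product in R^3.
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    # Euclidean inner product in R^3.
    return sum(ui * vi for ui, vi in zip(u, v))

def intersection(n1, n2):
    # A common point of the great circles {x.n1 = 0} and {x.n2 = 0} on the
    # unit sphere; its antipode is the second intersection point.
    c = cross(n1, n2)
    norm = math.sqrt(dot(c, c))
    if norm == 0.0:
        raise ValueError("normals are parallel: the two great circles coincide")
    return tuple(ci / norm for ci in c)

n1 = (0.0, 0.0, 1.0)   # hypothetical normal: the equator
n2 = (1.0, 2.0, 3.0)   # hypothetical normal of a second great circle
p = intersection(n1, n2)
print(p, dot(p, n1), dot(p, n2), dot(p, p))
```

This is only a consistency check of the statement in one model case; it does not exercise the second variation argument itself.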
It is one of my current projects to try to extend this result to the case of minimal
submanifolds which admit singularities. There are examples of such things in the
literature. We would like to study the situation where one has minimal hypercones
in an ambient space of positive Ricci curvature. Three cases arise:
(i) Where the point of closest approach is between two smooth points,
(ii) Where the point of closest approach is between a smooth point and a singular
point,
and perhaps the most interesting case,
(iii) Where the point of closest approach is between two singular points.
This amounts to trying to prove the following.

Conjecture. Let Mn+1 be a complete, connected manifold with positive Ricci curva-
ture. Let Vn and Wn be immersed minimal currents of Mn+1 with multiplicity one,
each immersed as a closed subset, and let both Vn and Wn be compact. Then Vn and
Wn must intersect.
Remark. Note that this is not as general as the Frankel-Lawson result, since we
require both V and W to be compact.
Theorem 3.7.3. The conjecture is true for Vn an immersed minimal submanifold
and Wn an immersed minimal current of Mn+1 .
Proof. Suppose Vn and Wn are embedded submanifolds, and that they do not in-
tersect. Suppose γ is a geodesic from p0 ∈ V to q0 ∈ W that realises the minimum
distance l from V to W. The case where both of $p_0$, $q_0$ are nonsingular has already
been treated. So suppose $q_0$ is singular. Now clearly γ must strike V and W
orthogonally, otherwise it can easily be shortened.
Lemma 3.7.4. (Key Lemma). The first variation of arc length is 0. The second
variation of arc length is negative for some choice of piecewise smooth vector field.
Proof. We first establish that the tangent cone at the singular point q0 is a tangent
plane. This enables us to sensibly talk about vector fields about q0 . Let σ be γ
parametrised backwards from q0 to p0 . Let xn be a sequence of points converging
to q0 in W . Let αn be the corresponding sequence of geodesic curves from q0 to xn .
Let τn be the corresponding sequence of geodesic curves from xn to p0. We proceed in two steps:
(i) We first prove that if the angle between αn and σ is acute, then the length L(τn) of τn is less than L(σ) for sufficiently large n, which is impossible if q0 is to be the point of closest approach.
(ii) Next, we prove using the Hopf maximum principle that the angle between
αn and σ cannot be obtuse either.
Step 1. We first observe the following fact: For neighbourhoods sufficiently small
in any manifold, the metric can be written as
\[ g(\partial_i, \partial_j)_p = \delta_{ij} + \theta(\epsilon^2) \]
for points p within a distance $\epsilon$ from the centre of geodesic coordinates with the choice $\nabla_{\partial_i}\partial_j = 0$ at the origin of the coordinates.
So take N sufficiently large s.t. for all n ≥ N, $L(\alpha_n) < \epsilon$. Take a series of $\epsilon$-balls that cover the geodesic triangle τn, αn, σ. Then there is an isometry up to order $\epsilon^2$ that maps this geodesic triangle onto a triangle in Euclidean 2-space.
Hence from now on we will assume we are working in Euclidean 2-space, and by abuse of notation will refer to the images of τn, αn and σ by the same names. To control error, we will take $N_2 > N$ s.t. $L(\alpha_n) < \epsilon^2$ for all $n \ge N_2$.
Now, the angle between −σ and −τn will become very small for n large, and, by the first order Taylor series expansion for the tangent of this angle, we compute it to be roughly $\frac{L(\alpha_n)}{L(\sigma)}$ to order $\frac{L(\alpha_n)^2}{L(\sigma)^2}$.
Proceed in steps of length $\epsilon$ along both −τn and −σ until we arrive in the last $\epsilon$-ball about q0. Connect the finishing points A and D along −τn and −σ respectively by a curve λ. This curve λ will divide the triangle into an isosceles triangle and a rectangle. Through an easy computation, the identical interior angles within the rectangle where λ meets σ and τn are seen to be
\[ \phi_n = \frac{\pi}{2} + \frac{L(\alpha_n)}{2L(\sigma)} + \theta\!\left(\frac{L(\alpha_n)^2}{L(\sigma)^2}\right) \]
Choose $N_3 \ge N_2$ s.t. for all $n \ge N_3$, $L(\alpha_n) < \frac{1}{L(\sigma)}$.
Then $\phi_n \le \pi/2 + L(\alpha_n)^2/2 + \theta(L(\alpha_n)^4)$, or $\phi_n \le \pi/2 + \theta(\epsilon^4)$.
Then clearly if one projects down onto the image of σ, one sees that the projection of the curve from A to xn is of the same length to order $\epsilon^4$ and the projection of the curve from xn to q0 is nonzero because the angle between αn and σ is acute. So, if P is the projection,
Length(curve from D to q0) = Length(curve from A to xn) + Length(P(curve from xn to q0)) + $\theta(\epsilon^4)$
Since all other terms are of order $\epsilon^2$ the error term is negligible and it is easily seen that Length(curve from D to q0) > Length(curve from A to xn), from which it easily follows that L(τn) < L(σ).
Step 2. Suppose now that there is a vector v in the tangent cone at q0 that meets
σ at an obtuse angle. By Step 1, all of the tangent cone must be at least at right
angles to σ. Furthermore, by the Hopf maximum principle, all of the tangent cone
must meet σ at an obtuse angle if part of it does. But then this is a contradiction
to the minimality of W .
So the tangent cone is a plane.
What about the second variation? Consider the class of all sequences of smooth
points (xn , yn ) such that xn converges to p0 and yn converges to q0 . From before,
we know that there is a variation such that the distance between xn and yn can be
reduced, no matter how close they are to p0 , q0 respectively.
Since V , W are compact the variational vector fields Kn inducing a negative
second variation corresponding to (xn , yn ) converge to some vector field K0 corre-
sponding to (p0 , q0 ) that induces a variation in length less than or equal to 0. We
can talk about vector fields at q0 since the tangent cone at q0 was proven to be a
tangent plane before. If the variation is less than 0 we are done. If it is equal to 0
then I claim, if all other variations are greater than or equal to 0, this contradicts
the minimality of W (this can be seen by once again invoking the Hopf maximum
principle). Hence either all second order variations are 0 or there exists a variation
less than 0, and we are done.
So assume all second order variations are 0 about q0 . But this is impossible
because the ambient space is not Ricci flat.
The theorem now follows easily in an analogous manner to the smooth case.
Remark. The statement that variational vector fields inducing negative second vari-
ation in length on compact spaces corresponding to a convergent sequence of points
themselves converge to another variational vector field inducing nonpositive second
variation is not at all obvious, and requires some degree of proof.
Note: It may be possible to extend the argument of Frankel and Lawson to mini-
mal submanifolds that arise as limits of smooth Riemannian submanifolds, following
the approach of [20].
3.7.2 Jon Pitts’ Construction

I discuss here the nature of the construction Jon Pitts uses to establish existence
of minimal submanifolds of compact spaces under certain dimensional restrictions.
Many thanks to Marty Ross for his considerable assistance in helping me dissect
the relevant manuscript [45] to understand the main thrust of the arguments. Jon
Pitts’ manuscript actually provides a rather beautiful demonstration of how many
of the techniques and tools of geometric measure theory are used in practice.
What he proves in his treatise is the following theorem:
Theorem 3.7.5. Let $M^n$ be a compact n-manifold, with n ≤ 6. Then for each 2 ≤ k ≤ 5 there is at least one minimal submanifold $T^k$ of M of dimension k.
The particular example to bear in mind here is the 2-sphere. I construct a minimal submanifold in the following way. Trace out a family, any family, of codimension one submanifolds of the 2-sphere (circles) from pole to pole, parametrised by numbers in the interval [0, 1]. So we have a mapping γ assigning to each t ∈ [0, 1] a circle γ(t) in $S^2$. Define L(γ) = max{L(γ(t)) : t ∈ [0, 1]}. Then we hope that we can find a τ that realises $\Lambda(M) = \inf_\gamma \{L(\gamma)\}$. For this τ, define V = τ(t) for a t such that L(τ(t)) = max{L(τ(T)) : T ∈ [0, 1]}. Then V will be minimal. In particular, V will be an equatorial circle of $S^2$, which is of course minimal, even though it will not be stable.
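As a minimal numerical sketch of the first half of this picture, one can take the latitude circles of the unit 2-sphere as the family and locate its longest slice. The choice of family and the names `circle_length` and `sweepout_width` are illustrative assumptions, not part of Pitts' construction; the sketch only evaluates the maximum for one fixed family and does not attempt the infimum over all families.

```python
import math

def circle_length(t):
    """Length of the latitude circle at polar angle pi*t on the unit 2-sphere."""
    return 2.0 * math.pi * math.sin(math.pi * t)

def sweepout_width(num_samples=1001):
    """Return (t_max, L_max): parameter and length of the longest slice of the family."""
    ts = [i / (num_samples - 1) for i in range(num_samples)]
    t_max = max(ts, key=circle_length)
    return t_max, circle_length(t_max)

t_max, L_max = sweepout_width()
```

For this family the longest slice sits at t = 1/2, the equator, with length 2π, in line with the expectation that the min-max slice is an equatorial circle.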
So we might reasonably expect this procedure to allow us to construct unstable
minimal submanifolds of more general spaces. There are of course problems that
might arise in this program; for instance, there might not be a τ that realises Λ(M ).
This necessitates the use of varifolds and the whole machinery of geometric measure
theory.
So, following this idea, first Jon Pitts proves that one can construct, via this
min-max procedure, a guy that is in some sense a weak solution to our minimisation
problem. He constructs a varifold V that possesses a set of properties that he calls
almost minimising. One of the properties of this varifold V is that there is a sequence
{Vi } converging to V .
One of his major results is his decomposition theorem, which essentially states
that each of the Vi splits locally into sheets Vil with density bounds; in particular
for applications, the density of these sheets will be tightly bound about an integer
multiplicity. It is in the construction of the tools to prove this theorem, in particular
curvature bounds, that the dimensional restriction for our ambient space M becomes
necessary to make.
The compactness theorem for varifolds then states that a subsequence of the Vil
converges to varifolds Si . The density bounds are preserved for the Si via certain
other results. Since one has bounds on the density, one can use the Allard regularity
theorem to prove that the Si are regular.
Pitts then uses a careful argument to prove that $V = \sum_i S_i$, and it follows that V is regular since the $S_i$ are pairwise disjoint (or equal) and are regular. Furthermore V is minimal.
Chapter 4
Existence Theory for PDEs
4.1 Elliptic and Parabolic PDEs

As I demonstrated earlier, geometric measure theory is sufficient to establish ex-
istence and regularity results to solutions of variational problems over smooth do-
mains. By such a problem I mean a situation in which one defines a certain geometric
invariant over a space, performs calculus of variations, and then asks whether there
are solutions to the resulting PDE. However one might ask whether solutions exist
to general classes of PDE boundary value problems, where one does not necessarily
have the luxury of some action principle to rely on. This is the game of existence
theory.
As far as I know, the very interesting question of whether all PDEs defined over
domains that admit regular (smooth) solutions can be realised as the stationarity
condition for some geometric invariant over the same domain, is still very much an
open one. Even if the answer to this question was in the affirmative, however, it does
not tell us which PDEs can be realised as such variational problems. So existence
theory will always be useful.
Understanding existence theory for elliptic and parabolic PDEs is very impor-
tant, particularly in application to the Ricci flow. Many estimates crucial to un-
derstanding limiting behaviour of solutions and regularity for geometric flows are
a consequence of understanding the analogous results for parabolic PDEs. Hence
it is important, before discussing these flows (see ahead, the section ”Geometric
Evolution Equations”), to discuss the general state of knowledge on this more fun-
damental area of mathematics. Many of the results for parabolic PDE can be quickly
developed from the analogous results for elliptic PDE, so I shall talk about these
jointly.
4.1.1 Introduction
An elliptic equation is an equation of the following general form:
\[ L(a, b, c)\phi := (a^{ij} D_i D_j + b^i D_i + c)\phi = 0 \]
where $a^{ij}\zeta_i\zeta_j > 0$ for every nonzero vector ζ. This is known as the ellipticity condition.
Sometimes a stronger condition is required, the notion of uniform ellipticity:
\[ a^{ij}\zeta_i\zeta_j \ge C\|\zeta\|^2 \]
for some positive C.

A parabolic equation is a slight departure:
(L(a, b, c) − ∂t )φ = 0
where L is an elliptic operator as before, and now φ = φ(x, t).

Equivalently we can consider the class of equations
\[ (L(a, g) + c)\phi := (g^{il} D_l (a_{ij} g^{jk} D_k) + c)\phi = 0 \]
where $a_{ij}$ is a nondegenerate matrix, and $b^k = g^{il} D_l (a_{ij} g^{jk})$ is an arbitrary function depending on the choice of g. Usually also g is a symmetric positive definite bilinear form, ie, a Riemannian metric. Now, if a is positive as before, we have an elliptic equation; if it admits one or more negative eigenvalues, then it is hyperbolic. In the special case that it admits one negative eigenvalue, and for the corresponding eigenvariable t we have $g\,\partial_t(ag)\,\partial_t\delta = -gag\,\partial_t^2\delta$, then the above class degenerates to parabolic type.
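The eigenvalue criterion can be sketched for a constant symmetric 2×2 coefficient matrix as follows. This is only an illustration under that assumption; the function names and the label "degenerate" (used when some eigenvalue vanishes, as happens for the second-order part of a parabolic operator) are choices made here, not notation from the text.

```python
import math

def eigenvalues_sym2(a11, a12, a22):
    """Eigenvalues of the symmetric matrix [[a11, a12], [a12, a22]], smallest first."""
    mean = 0.5 * (a11 + a22)
    radius = math.hypot(0.5 * (a11 - a22), a12)
    return mean - radius, mean + radius

def classify(a11, a12, a22, tol=1e-12):
    """Classify the principal part a^{ij} D_i D_j by the signs of its eigenvalues."""
    lo, hi = eigenvalues_sym2(a11, a12, a22)
    if lo > tol or hi < -tol:
        return "elliptic"      # definite (multiply by -1 if negative definite)
    if lo < -tol and hi > tol:
        return "hyperbolic"    # eigenvalues of both signs
    return "degenerate"        # some eigenvalue is (numerically) zero

def ellipticity_constant(a11, a12, a22):
    """Largest C with a^{ij} z_i z_j >= C |z|^2, i.e. the smallest eigenvalue."""
    return eigenvalues_sym2(a11, a12, a22)[0]
```

For example, the identity matrix (the Laplacian) is classified as elliptic with constant 1, while the matrix with diagonal entries 1 and -1 (the wave operator in one space dimension) is classified as hyperbolic.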
Such equations arise naturally in problems via separation of variables, where c
may incorporate some eigenvalue multiplied by a characteristic function depending
on the nature of the problem. Spherical harmonics are a good example of this.
4.1.2 The Maximum Principle

Roughly what the maximum principle states, for an elliptic PDE, is that if u satisfies the inequality $\Delta u + a^i D_i u + cu \ge 0$ and u is bounded on ∂U, where U ⊂ M is a codimension 0 subset, say by D, then u remains bounded by D for all x ∈ U. In particular, if u ≤ 0 on ∂U, then u ≤ 0 on all of U.
For a parabolic PDE, if u satisfies the inequality $\frac{\partial u}{\partial t} \le \Delta u + a^i D_i u + cu$ and u is bounded initially, say by D, then the maximum principle tells us that u is bounded at time t by $De^{ct}$.
I shall now proceed to make the above statements more precise, with proof.
Definition 11. We will say that L(a, g) is uniformly elliptic if $0 < \lambda\|\zeta\|^2 \le a^{ij}\zeta_i\zeta_j \le \Lambda\|\zeta\|^2$ for some constants λ, Λ.
What I said above for elliptic PDEs is essentially a consequence of the following
result:
Theorem 4.1.1. ((Theorem 3.5, [26]), [28], "The Strong Maximum Principle"). Suppose L = L(a, g) is uniformly elliptic, c = 0, and that Lφ ≥ 0 (or ≤ 0) in a domain U. Then if φ achieves its maximum (minimum) in the interior of U it must be constant. Furthermore, if c ≤ 0 and c/λ is bounded, φ cannot achieve a nonnegative maximum (nonpositive minimum) in the interior of U unless it is constant.
Proof. (by contradiction). Suppose φ is non constant and achieves its maximum
M ≥ 0 in the interior of U . Then the set Ū on which φ < M is certainly a subset of
U and also ∂ Ū ∩ U is nonempty. Choose now a point x in Ū closer to the boundary
of Ū than the boundary of U ; ie closer to the maximum point than to the boundary
of the original domain. Construct the largest metric ball B in Ū with x as centre.
Then φ(y) = M for some y on ∂B, while also φ < M in B.
To conclude the proof, we will need
Lemma 4.1.2. (Lemma 3.4, [26]). If L is uniformly elliptic, c = 0 and Lφ ≥ 0 in
B, then if y ∈ ∂B and
(i) φ is continuous at y,
(ii) φ(y) > φ(x) for all x ∈ B,
(iii) There exists a sphere within B such that the intersection of that sphere with ∂B contains y,
then the outer normal derivative satisfies
\[ \frac{\partial\phi}{\partial v}(y) > 0 \]
If c ≤ 0 and c/λ is bounded we may conclude the same provided φ(y) ≥ 0. If φ(y) = 0 there is no restriction on the sign of c.
Proof. (of lemma). The proof of this lemma as given in Gilbarg and Trudinger is
not personally to my taste; it is primarily an analyst’s proof, rigorous, but unnec-
essarily technical and not terribly instructive. Essentially one defines an auxiliary
function for comparison purposes and uses that to facilitate the argument. For a
more intuitive or geometric idea of why we expect the above to be true, at least in
the case that c = 0, consider a geodesic path from the centre x of the sphere in B to
the point y on ∂B. Furthermore we know that Lφ is nonnegative on this geodesic,
and φ is increasing along it at y. However, the second piece of information alone
tells us nothing; the derivative could easily be zero at y.
Consider now the geodesic as the real line and φ as a polynomial. Since Lφ can be thought of as the "deformed acceleration" since it is elliptic, we must have that at y, φ(x) be locally of the form $x^{2k}$, k ∈ N after an appropriate choice of chart since the acceleration is nonnegative. By the second piece of data, we must be in the right branch of $x^{2k}$. Since y is not at zero, the derivative is nonzero and positive.
A similar argument, albeit with charts more carefully chosen, also goes through for a nonzero function c.
It then follows easily from the lemma and from before that Dφ(y) ≠ 0, but this is impossible since y was supposed to be a maximum.
The maximum principle for parabolic equations is proved along similar lines, though it requires some more care. The main difference is of course the time dependence of the bound, or the "diffusion" of u over time.
More precisely, the fundamental or Green's function solution to the equation $Lf - \partial_t f = \delta(x)\delta(t)$ satisfies the inequality
\[ f \le e^{ct} g \]
where g is the solution to the equation $Lg = \delta(x)$.

It is then an easy consequence of the maximum principle above and the theory of Green's functions that if u ≤ D at t = 0, then $u \le De^{ct}$ for arbitrary t.
Finding a proof of the above inequality is not an entirely trivial exercise. Nonetheless I will make an attempt to at least sketch one. We certainly know that
\[ Lf - \delta(t)Lg - \partial_t f = 0 \]
since $Lg = \delta(x)$. Consequently if we factor $L = \bar{L} + c$ and integrate by parts twice, we get $\bar{L}(f - \delta(t)g) + g_{tt}\delta(t) - \partial_t f + c(f - \delta(t)g) = 0$. But g is time independent; hence the second term vanishes and we have
\[ \bar{L}h - \partial_t f + ch = 0 \]
where $h = f - \delta(t)g$. But since $\bar{L}$ is a positive operator we can write
\[ \partial_t f \ge c(f - \delta(t)g) \]
and consequently $\partial_t(\frac{f}{g}) \ge c(\frac{f}{g} - \delta(t))$, from which we conclude $\ln\frac{f}{g} \le c(t - H(t))$ and finally $f \le g e^{c(t - H(t))} \le g e^{ct}$, which concludes the proof of the inequality and hence of the maximum principle for parabolic PDE.
4.1.3 Differential Harnack Inequalities

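I prove the Harnack inequality for the heat equation, due to Li and Yau:
Theorem 4.1.3. If $\frac{\partial u}{\partial t} = \Delta u$ on $\mathbb{R}^n$, u > 0, then $\Delta u - \frac{\|\nabla u\|^2}{u} + \frac{nu}{2t} \ge 0$.
Proof. Let v = log u. Then $\frac{\partial v}{\partial t} = \frac{1}{u}\Delta u$ and
\[ \Delta v = \nabla_k\left(\frac{1}{u}\nabla_k u\right) = \frac{1}{u}\Delta u - \frac{1}{u^2}\|\nabla u\|^2 \]
Hence
\[ \frac{\partial v}{\partial t} = \Delta v + \|\nabla v\|^2 \]
I show $\Delta v \ge -\frac{n}{2t}$.
\[ \frac{\partial}{\partial t}\nabla_k v = \nabla_k(\Delta v + \|\nabla v\|^2) = \Delta\nabla_k v + 2\nabla_l v\,\nabla_k\nabla_l v \tag{4.1} \]
\[ \frac{\partial}{\partial t}\Delta v = \Delta\Delta v + 2\|\nabla^2 v\|^2 + 2\nabla_l v\,\nabla_l\Delta v \ge \Delta\Delta v + \frac{2}{n}(\Delta v)^2 + 2\nabla v\cdot\nabla\Delta v \tag{4.2} \]
At the minimum point of Δv, ΔΔv ≥ 0 and ∇Δv = 0. So,
\[ \frac{\partial}{\partial t}\Delta v \ge \frac{2}{n}(\Delta v)^2 \]
Hence by the maximum principle
\[ \Delta v \ge -\frac{n}{2t} \]
Or, remembering v = ln(u),
\[ \frac{\partial}{\partial t}(\log u) - \|\nabla\log u\|^2 = \frac{\Delta u}{u} - \left\|\frac{\nabla u}{u}\right\|^2 \ge -\frac{n}{2t} \]
This proves the Harnack inequality.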
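As a sanity check, the inequality can be tested numerically in one space dimension (n = 1). The sketch below is illustrative only: the two test solutions, the finite-difference step `h` and all function names are assumptions made here. For the heat kernel the quantity $u_{xx} - u_x^2/u + u/(2t)$ vanishes identically, so the inequality is sharp there; for the positive solution $1 + e^{-t}\cos x$ it should be strictly positive at the sampled points.

```python
import math

def heat_kernel(x, t):
    """Fundamental solution of u_t = u_xx on the real line."""
    return math.exp(-x * x / (4.0 * t)) / math.sqrt(4.0 * math.pi * t)

def periodic_solution(x, t):
    """Another positive solution of u_t = u_xx, namely 1 + exp(-t) cos x."""
    return 1.0 + math.exp(-t) * math.cos(x)

def li_yau_quantity(u, x, t, n=1, h=1e-4):
    """Central-difference estimate of u_xx - u_x^2 / u + n u / (2 t)."""
    u0 = u(x, t)
    ux = (u(x + h, t) - u(x - h, t)) / (2.0 * h)
    uxx = (u(x + h, t) - 2.0 * u0 + u(x - h, t)) / (h * h)
    return uxx - ux * ux / u0 + n * u0 / (2.0 * t)

points = [(i / 4.0, t) for i in range(-8, 9) for t in (0.5, 1.0, 2.0)]
# Relative size of the Li-Yau quantity for the heat kernel (should be ~0).
kernel_values = [li_yau_quantity(heat_kernel, x, t) / heat_kernel(x, t) for x, t in points]
# Li-Yau quantity for the periodic solution (should be positive).
periodic_values = [li_yau_quantity(periodic_solution, x, t) for x, t in points]
```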
4.2 Hyperbolic PDEs

In this section I review the work of Shatah and Struwe [50], who cover basic ex-
istence and regularity theory for hyperbolic wave equations. This is of interest in
application to the constitutive equation for a sharp Physical manifold (see the sec-
tion on Information Theory and Physical Manifolds), where one has an equation of
the form
\[ R_{ij} = 0 \]
Following this I make some attempt at assembling an existence theory for more
general hyperbolic equations, trying to extend Shatah and Struwe’s techniques. The
main motivation is to apply this to slightly blurred Physical manifolds, where one
has two coupled PDEs
\[ (1 + \Delta)R_{ij} = 0 \]
and
\[ (\Delta + \Delta^2)R_{ij} = 0 \]
Finally I attempt to establish an existence theory for mixed parabolic-hyperbolic

PDEs. This is related to steepest descent of the physical information for the previous
two examples.
Unlike for parabolic and elliptic equations, we no longer have the luxury of
having a maximum principle to draw upon, so we have to find other tricks. In
fact, it will turn out that existence theory for hyperbolic equations tends to rest on
finding and exploiting ”conserved” quantities.
4.2.1 Existence theory for hyperbolic equations

In their book [50], Shatah and Struwe develop an existence and regularity theory
for equations of the form
∆u = h(x, t, u, Du) (4.3)

Note that this is of interest in understanding existence and regularity of equa-
tions such as
Ric(g) = 0
since Ric(g) ∼ ∆g +{correction terms depending on g and first order derivatives

of g}. This gives a tensor equation of the form
∆gij = h(x, t, gij , Dgij )
which is of the same form as the first equation. I shall now proceed to sketch
the key ideas in this theory.
Remark. Due to the equivalence of parabolic PDE to hyperbolic PDE as I sketched

earlier, we may make the simplifying assumption of considering only equations of
the type
∆gij = h(x, gij , Dgij )
by absorbing time into our coordinate system, ie x ↦ x̄, x = (x̄, t). This will
simplify the description of Shatah-Struwe theory considerably.
Given this simplifying assumption, it turns out that all we need can be obtained
from the seventh page of [50]. Let $u : M \to \mathbb{R}$, $L = L(u, p) : TM \to \mathbb{R}$ be smooth functions. Write L(u, Du) as the Lagrangian density of u, and consider the action
\[ A(u; Q) = \int_Q L(u, Du)\,dm \]
where Q ⊂ M is an arbitrary subset.

Take a variation of u, $u_\epsilon = u + \epsilon\phi$, where φ is a function with compact support on M. Then if we evaluate the first variation of the above action with such notation, we obtain after following the standard sort of procedure that
\[ \frac{d}{d\epsilon} A(u_\epsilon; Q)\Big|_{\epsilon=0} = \int_Q \big(L_u(u, Du) - \partial_i L_{p_i}(u, Du)\big)\phi\,dm \]
Then u is stationary with respect to A iff $L_u(u, Du) = \partial_i L_{p_i}(u, Du)$.
Now, consider the Lagrangian $L(u, Du) = \frac{1}{2}\|\nabla u\|^2 + F(u, Du)$. Then, after some intermediate working, one obtains that the above equality holds only if
\[ \Delta u = \frac{d}{d\epsilon} F(u_\epsilon, Du_\epsilon)\Big|_{\epsilon=0} \]
Since F was arbitrary, the right hand side of this expression is arbitrary. Hence
we can reverse engineer an action for the original Shatah-Struwe equation. Via
geometric measure theory, since we have constructed an action, the solution will
exist and it will be smooth.
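The intermediate working can be sketched as follows. This assumes that F = F(u, p) is smooth and reads the right hand side of the equation for Δu as the first variation of F; the symbols $F_u$ and $F_{p_i}$ (partial derivatives of F in its first and second arguments) are notation introduced here for illustration.

```latex
\begin{align*}
L(u,p) &= \tfrac{1}{2}\|p\|^2 + F(u,p)
  \quad\Longrightarrow\quad
  L_u = F_u, \qquad L_{p_i} = p_i + F_{p_i}, \\
\partial_i L_{p_i}(u,Du) &= \Delta u + \partial_i\big(F_{p_i}(u,Du)\big), \\
L_u(u,Du) = \partial_i L_{p_i}(u,Du)
  &\iff
  \Delta u = F_u(u,Du) - \partial_i\big(F_{p_i}(u,Du)\big).
\end{align*}
```

Under this reading the right hand side is the variational derivative of $\int F$, which depends on u and its derivatives up to second order through the divergence term.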
4.2.2 Existence theory for ”elastic” wave equations

Before I finish I will briefly mention how one may establish an existence and regu-
larity theory for equations of the type
\[ \Delta^2 u = h(x, t, u, Du, D^2u, D^3u) \]
The idea here is to consider the same sort of thing as for the above, but instead to look at the action
\[ \|\Delta u\|^2 - F(u, Du, D^2u, D^3u) \]
then by the same general nonsense as before, we find that the first variation is zero iff
\[ \Delta^2 u = \frac{d}{d\epsilon} F(u_\epsilon, Du_\epsilon, D^2u_\epsilon, D^3u_\epsilon)\Big|_{\epsilon=0} \]
Then, since the right hand side is arbitrary, and via the argument once more
from geometric measure theory that a smooth action implies smooth solutions, we
are done.
Chapter 5
Geometric Evolution Equations
The following notes are a typeset version, with some minor additions, of a most
excellent course given by Ben Andrews of the ANU at the University of Queensland
in the winter of 2006. For further information on this topic, in particular the spe-
cialisation to examination of the Ricci Flow and its application to the classification
of 3-manifolds, the interested reader is invited to investigate the sources Cao-Zhu
[8] and Morgan-Tian [36]. John Morgan and Gang Tian have also quite recently
written [37], which is a followup article to [36]. Some existence theory of PDE may
be required to understand some of the material in these references; hence the reader
is invited to also keep a copy of [26] handy, since many of the methods introduced
therein are applicable to weakly parabolic PDEs, of which the Ricci Flow is an
example.
Even though the information here is not directly relevant to the rest of this
work it has some bearing on classification of physical manifolds under a certain
choice of signal function (see later section). Note also that the fact that geometric
evolution equations of heat type improve the Fisher information in and of itself is
of considerable interest in finding extrema of this quantity (again, see the section
on Fisher information later on). Geometric evolution equations are also extremely
useful in proving results in comparison geometry; for instance, one such result (due to Hamilton) tells us that a closed 3-manifold admitting a metric of positive Ricci curvature admits a metric of constant positive sectional curvature. However, the primary motivation in preparing this
section was mainly for my self-reference, and consolidation of my understanding of
this most beautiful subject.
5.1 The Heat Flow Equation as the Prototypical Example

The heat equation for a function
u : Ω × [0, T ) → R,
where Ω is a smoothly bounded open domain in Rn is the parabolic PDE
\[ \frac{\partial u}{\partial t}(p, t) = \Delta u(p, t) \tag{5.1} \]
To solve a heat flow equation, we also need
(i) Boundary data, eg Dirichlet data (u(p, t) = 0, p ∈ ∂Ω), or Neumann data

(Dn u(p, t) = 0, p ∈ ∂Ω), or Periodic data, and also
(ii) Initial data, ie u(p, 0) = u0 (p).
5.1.1 Maximum Principle

If u is a smooth solution of the heat equation on Ω × [0, T), then if $u_0(p) \ge 0$ ∀p ∈ Ω, then u(p, t) ≥ 0 ∀p ∈ Ω, and ∀t ≥ 0.
Proof. (Restricted to the Dirichlet case, though it follows for other boundary conditions as well). We show that $u_\epsilon(p, t) = u(p, t) + \epsilon(1 + t) \ge 0$ for all $\epsilon > 0$.
Fix $\epsilon > 0$. Then I claim that $u_\epsilon(p, 0) \ge \epsilon > 0$ ∀p ∈ Ω. For instance, in the Dirichlet case, $u_\epsilon(p, t) = \epsilon(1 + t) > 0$ ∀p ∈ ∂Ω.
Suppose the Result is not true. Then let $(p_0, t_0)$ be such that $u_\epsilon(p_0, t_0) = 0 = \inf\{u_\epsilon(p, t) \mid p \in \Omega, 0 \le t \le t_0\}$.
Then $\frac{\partial u_\epsilon}{\partial t}(p_0, t_0) \le 0$, since $u_\epsilon(p_0, t) \ge u_\epsilon(p_0, t_0)$ for $0 \le t \le t_0$.
But since $(p_0, t_0)$ was defined to be the point realising the infimum of $u_\epsilon$, we have also that $\Delta u_\epsilon(p_0, t_0) \ge 0$.
So then
\[ 0 \ge \frac{\partial u_\epsilon}{\partial t}(p_0, t_0) = \frac{\partial u}{\partial t}(p_0, t_0) + \epsilon = \Delta u(p_0, t_0) + \epsilon = \Delta u_\epsilon(p_0, t_0) + \epsilon \ge \epsilon, \]
a contradiction.
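A discrete analogue of this statement can be observed with an explicit finite-difference scheme in one space dimension. The sketch below is only an illustration: the grid, the initial profile and the names `heat_step` and `evolve` are assumptions made here. With ratio r = dt/dx² ≤ 1/2 each update is a convex combination of neighbouring values, which is the discrete counterpart of the maximum principle.

```python
import math

def heat_step(u, r):
    """One explicit step for u_t = u_xx with the two end values held fixed (Dirichlet).

    r = dt / dx**2; for r <= 1/2 the update is a convex combination of neighbours.
    """
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = (1.0 - 2.0 * r) * u[i] + r * (u[i - 1] + u[i + 1])
    return new

def evolve(u0, r=0.4, steps=200):
    """Return the list of successive states starting from u0."""
    states = [u0]
    for _ in range(steps):
        states.append(heat_step(states[-1], r))
    return states

n = 51
u0 = [math.sin(math.pi * i / (n - 1)) ** 2 for i in range(n)]  # nonnegative initial data
states = evolve(u0)
minima = [min(s) for s in states]                    # stays >= 0
maxima = [max(s) for s in states]                    # nonincreasing in time
l2_norms = [sum(v * v for v in s) for s in states]   # nonincreasing in time
```

In this run the minimum never drops below zero, while the maximum and the discrete L² norm decrease monotonically.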
5.1.2 Comparison Principle

If u, v are two smooth solutions of the heat equation with u(p, 0) ≥ v(p, 0) ∀p ∈ Ω then u(p, t) ≥ v(p, t).
Proof. Observe that u − v also satisfies the heat equation. The result then follows from the maximum principle (5.1.1) applied to u − v.
5.1.3 Averaging Property

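$\sup_{p\in\Omega} u(p, t)$ is monotone nonincreasing.
\[ \frac{d}{dt}\int_\Omega u^2 = 2\int_\Omega u\Delta u = 2\int_{\partial\Omega} uD_n u - 2\int_\Omega \|Du\|^2 \le 0 \tag{5.2} \]
"Energy"
\[ \frac{d}{dt}\int_\Omega \|Du\|^2 = 2\int_\Omega D_i u\,D_i\Delta u = 2\int_\Omega D_i u\,\Delta D_i u = 2\int_{\partial\Omega} D_n u\,\Delta u - 2\int_\Omega (\Delta u)^2 \le 0, \tag{5.3} \]
if we have either Dirichlet or Neumann boundary conditions, in which case the first term vanishes.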
"Shannon Entropy"
\[ \frac{d}{dt}\int_\Omega u\log(u) = \int_\Omega (1 + \log u)\Delta u = -\int_\Omega \frac{\|\nabla u\|^2}{u} \le 0 \tag{5.4} \]
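"Fisher Information"
Define the Fisher Information I(u) as
\[ I = \int_\Omega \frac{\|Du\|^2}{u} \tag{5.5} \]
\begin{align*}
\frac{d}{dt}I &= \frac{d}{dt}\int_\Omega \frac{D_i u D_i u}{u} \\
&= 2\int_\Omega \frac{D_i u D_i\Delta u}{u} - \int_\Omega \frac{D_i u D_i u}{u^2}\Delta u \\
&= 2\int_{\partial\Omega} \frac{D_n u\,\Delta u}{u} - 2\int_\Omega D_i\Big(\frac{D_i u}{u}\Big)\Delta u - \int_\Omega \frac{D_i u D_i u}{u^2}\Delta u \\
&= 0 + \int_\Omega \frac{D_i u D_i\Delta u}{u} - \int_\Omega \frac{(\Delta u)^2}{u} \\
&= \frac{1}{2}\frac{d}{dt}I - \int_\Omega \frac{(\Delta u)^2}{u} \tag{5.6}
\end{align*}
Hence
\[ \frac{d}{dt}I = -2\int_\Omega \frac{(\Delta u)^2}{u} \le 0, \]
if u is a positive function on Ω.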
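The monotonicity of I can be illustrated on the simplest explicit example, the one-dimensional heat kernel on the whole real line (an assumption made here for convenience, since the text works on a bounded domain Ω). For a Gaussian of variance 2t the Fisher information is 1/(2t), which is visibly decreasing in t. The sketch below only checks this value and the monotonicity numerically; the function names and quadrature parameters are illustrative.

```python
import math

def heat_kernel(x, t):
    """Fundamental solution of u_t = u_xx on the real line."""
    return math.exp(-x * x / (4.0 * t)) / math.sqrt(4.0 * math.pi * t)

def fisher_information(t, half_width=40.0, num_points=8001):
    """Trapezoid-rule estimate of the integral of u_x^2 / u for the heat kernel."""
    dx = 2.0 * half_width / (num_points - 1)
    total = 0.0
    for i in range(num_points):
        x = -half_width + i * dx
        u = heat_kernel(x, t)
        if u == 0.0:
            continue  # the far tail underflows to zero and contributes nothing
        ux = -x / (2.0 * t) * u  # exact x-derivative of the kernel
        weight = 0.5 if i in (0, num_points - 1) else 1.0
        total += weight * ux * ux / u * dx
    return total

times = (0.5, 1.0, 2.0, 4.0)
values = [fisher_information(t) for t in times]
```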
5.1.4 Smoothing
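eg. Suppose $\|u_0(p, 0)\| \le C_0$ (Then $\|u_0(p, t)\| \le C_0$ ∀p, ∀t ≥ 0)
Then for each k ∈ N there exists a constant $C_k$ such that
\[ \|D^k u(p, t)\|^2 \le \frac{C_k^2}{t^k} C_0 \quad \forall p \in \Omega,\ t > 0 \]
Note that
\[ D^k u = \Sigma_{\{\alpha_1,\dots,\alpha_n \mid \alpha_1 + \dots + \alpha_n = k\}} \frac{\partial^k u}{(\partial x_1)^{\alpha_1}\cdots(\partial x_n)^{\alpha_n}} \]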
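Proof. (Periodic case)

We prove the result by induction. Suppose we have
\[ \|D^k u(p, t)\|^2 \le \hat{C}_k^2, \quad \forall p \in \Omega,\ t \in [0, T] \]
We will bound $\|D^{k+1}u\|^2$.
\[ \frac{\partial}{\partial t}D^\alpha u = D^\alpha\Delta u = \Delta(D^\alpha u) \]
Hence
\[ \frac{\partial}{\partial t}\|D^k u\|^2 = \Sigma_\alpha \frac{\partial}{\partial t}\|D^\alpha u\|^2 = 2\Sigma_\alpha D^\alpha u\,\Delta(D^\alpha u) \tag{5.7} \]
Now
\[ \Delta\|D^\alpha u\|^2 = \Sigma_i D_i D_i\|D^\alpha u\|^2 = \Sigma_i 2D_i(D^\alpha u\,D^\alpha D_i u) = 2D^\alpha u\,\Delta(D^\alpha u) + 2\Sigma_i\|D_i D^\alpha u\|^2 \tag{5.8} \]
So we deduce
\[ \frac{\partial}{\partial t}\|D^k u\|^2 \le \Delta\|D^k u\|^2 - c(k, n)\|D^{k+1}u\|^2 \]
for some constant c(k, n).

Also note similarly that
\[ \frac{\partial}{\partial t}\|D^{k+1}u\|^2 \le \Delta\|D^{k+1}u\|^2 \]
Define
\[ Q = \frac{\|D^{k+1}u\|^2}{2\hat{C}_k^2 - \|D^k u\|^2} \]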
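Note that the denominator will always be positive.

Furthermore, note that rearranging our result above we get that
\[ \frac{\partial}{\partial t}(2\hat{C}_k^2 - \|D^k u\|^2) \ge \Delta(2\hat{C}_k^2 - \|D^k u\|^2) + c(k, n)\|D^{k+1}u\|^2 \tag{5.9} \]
We would like to compute a heat equation type inequality for Q. For this we make use of the following lemma:
Lemma. If f and g satisfy f, g > 0 and
\[ \frac{\partial f}{\partial t} \le \Delta f + P, \qquad \frac{\partial g}{\partial t} \ge \Delta g + R \]
then
\[ \frac{\partial}{\partial t}\Big(\frac{f}{g}\Big) \le \Delta\Big(\frac{f}{g}\Big) + 2\frac{\nabla g}{g}\cdot\nabla\Big(\frac{f}{g}\Big) + \frac{1}{g}\Big(P - \frac{f}{g}R\Big) \]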
Proof of Lemma. Merely a careful calculation.
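The calculation can be sketched as follows, assuming f and g are smooth with f, g > 0 as in the statement of the lemma.

```latex
\begin{align*}
\partial_t\Big(\frac{f}{g}\Big) &= \frac{\partial_t f}{g} - \frac{f\,\partial_t g}{g^2}, \\
\Delta\Big(\frac{f}{g}\Big) &= \frac{\Delta f}{g} - \frac{2\nabla f\cdot\nabla g}{g^2}
  - \frac{f\,\Delta g}{g^2} + \frac{2f\|\nabla g\|^2}{g^3}, \\
2\,\frac{\nabla g}{g}\cdot\nabla\Big(\frac{f}{g}\Big) &= \frac{2\nabla f\cdot\nabla g}{g^2}
  - \frac{2f\|\nabla g\|^2}{g^3}.
\end{align*}
Adding the last two lines cancels the gradient terms, so
\begin{align*}
\partial_t\Big(\frac{f}{g}\Big) - \Delta\Big(\frac{f}{g}\Big)
  - 2\,\frac{\nabla g}{g}\cdot\nabla\Big(\frac{f}{g}\Big)
  &= \frac{\partial_t f - \Delta f}{g} - \frac{f\,(\partial_t g - \Delta g)}{g^2} \\
  &\le \frac{P}{g} - \frac{fR}{g^2} = \frac{1}{g}\Big(P - \frac{f}{g}R\Big).
\end{align*}
```

The inequality in the last step uses both hypotheses together with the positivity of f and g.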

By the lemma, we have that
\[ \frac{\partial}{\partial t}Q \le \Delta Q + V\cdot DQ - Q\,\frac{c(k, n)\|D^{k+1}u\|^2}{2\hat{C}_k^2 - \|D^k u\|^2} \tag{5.10} \]
where
\[ V = 2\,\frac{D(2\hat{C}_k^2 - \|D^k u\|^2)}{2\hat{C}_k^2 - \|D^k u\|^2} \]
I claim that $Q(p, t) \le \frac{1}{c(k, n)t}$. From this our result would automatically follow.
So observe that by the first inequality we have that
\[ \frac{\partial}{\partial t}\Big(tQ - \frac{1}{c}\Big) = Q + t\frac{\partial Q}{\partial t} \le \Delta\Big(tQ - \frac{1}{c}\Big) + V\cdot D\Big(tQ - \frac{1}{c}\Big) - cQ\Big(tQ - \frac{1}{c}\Big) \]
Now by the existence theory for parabolic PDE if one has an equation of the form
\[ \frac{\partial}{\partial t}X \le \Delta X - b_j D_j X + \mu X \]
where $b_j$ and μ are smooth then we have a maximum principle that states that, provided X(p, 0) ≤ 0 for all p ∈ Ω then X(p, t) ≤ 0, ∀p ∈ Ω, t ≥ 0. (For a proof of this maximum principle, see the previous section on existence theory.)
Certainly $tQ - \frac{1}{c} < 0$ at t = 0, so the conditions of the maximum principle are satisfied and our claim follows.
Hence
\[ \|D^{k+1}u\|^2 \le \frac{1}{ct}(2\hat{C}_k^2 - \|D^k u\|^2) \sim \frac{\hat{C}_k^2}{ct} \]
and we are done.
5.2 The Curve Shortening Flow (CSF)

5.2.1 Introduction
First a brief discussion about curves.
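Consider $X_0 : \mathbb{R} \to \mathbb{R}^2$ such that $X_0(u + n) = X_0(u)$, for n ∈ Z. So we have an immersion of the circle into the plane. We are interested in finding properties invariant under rigid motions and reparametrisation.
The standard or canonical parametrisation is in terms of arc length.
Define
\[ \phi(s) = \int_0^s \Big\|\frac{\partial X}{\partial u}\Big\|\,du \]
The unit tangent vector T is defined as
\[ T = \frac{\partial X/\partial u}{\|\partial X/\partial u\|} \]
The unit normal vector N is defined as
\[ N = R_{-\pi/2}T \]
where $R_\theta$ is rotation by θ.
Since $\|T\|^2 = 1$, $2\langle T, \frac{\partial T}{\partial s}\rangle = 0$.
Define
\[ \frac{\partial T}{\partial s} = -\kappa N \]
The curvature κ determines $X_0$.

We have, similarly to T, that $\|N\|^2 = 1$ and consequently $2\langle N, \frac{\partial N}{\partial s}\rangle = 0$.
Also the relation $0 = \langle N, T\rangle$ implies that
\[ 0 = \Big\langle\frac{\partial N}{\partial s}, T\Big\rangle + \langle N, -\kappa N\rangle, \]
or
\[ \frac{\partial N}{\partial s} = \kappa T \]
Now I discuss the curve shortening flow. Consider once more a smooth immersion $X_0$. We wish to find $X : \mathbb{R}/\mathbb{Z} \times [0, T) \to \mathbb{R}^2$ such that
\[ \frac{\partial X}{\partial t}(u, t) = \frac{\partial^2 X}{\partial s^2}(u, t) \]
with the initial condition $X(u, 0) = X_0(u)$. This specifies our curve shortening flow.
Note. Since $\frac{\partial X}{\partial s} = T$ we have that $\frac{\partial^2 X}{\partial s^2} = \frac{\partial T}{\partial s} = -\kappa N$, and hence that
\[ \frac{\partial X}{\partial t} = -\kappa N \]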
5.2.2 The Shrinking Circle

As an example we consider the circle under curve shortening flow. The solution is
\[ X(u, t) = (r_0^2 - 2t)^{1/2}(\cos u, \sin u) \]
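For a circle of radius r the curvature is 1/r, so under the flow $\partial X/\partial t = -\kappa N$ the radius should satisfy dr/dt = −1/r. The following sketch checks numerically that $r(t) = (r_0^2 - 2t)^{1/2}$ has this property; the function names and the finite-difference step are illustrative assumptions.

```python
import math

def radius(t, r0=1.0):
    """Radius of the shrinking circle, r(t) = sqrt(r0^2 - 2t), valid for t < r0^2 / 2."""
    return math.sqrt(r0 * r0 - 2.0 * t)

def csf_residual(t, r0=1.0, h=1e-6):
    """Central-difference estimate of dr/dt + 1/r, which should vanish."""
    drdt = (radius(t + h, r0) - radius(t - h, r0)) / (2.0 * h)
    return drdt + 1.0 / radius(t, r0)

def extinction_time(r0=1.0):
    """Time at which the circle has shrunk to a point."""
    return 0.5 * r0 * r0

residuals = [csf_residual(t) for t in (0.0, 0.1, 0.2, 0.3, 0.4)]
```

The residuals vanish up to discretisation error, and the circle of initial radius $r_0$ disappears at time $r_0^2/2$.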
5.2.3 Geometric Invariance

If K is a rigid motion of $\mathbb{R}^2$ and φ : R → R is smooth with φ' > 0, then $\hat{X}(u, t) = K(X(\phi(u), t))$ is also a solution of curve shortening flow.
5.2.4 Avoidance Principle

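Suppose $X, Y : \mathbb{R}/\mathbb{Z} \times [0, T) \to \mathbb{R}^2$ are solutions of CSF with X(u, 0) ≠ Y(v, 0) for all u, v, then X(u, t) ≠ Y(v, t) for all u, v ∈ R/Z, all 0 ≤ t < T. ie if we have two curves that start disjoint they remain disjoint under CSF.
Proof. Define $\rho : \mathbb{R}/\mathbb{Z} \times \mathbb{R}/\mathbb{Z} \times [0, T) \to \mathbb{R}$ as
\[ \rho(u, v, t) = \|X(u, t) - Y(v, t)\|^2 \]
We have $\rho \ge \rho_0 > 0$ at t = 0 by hypothesis.

For any t ≥ 0, let u, v ∈ R/Z be such that
\[ \rho(u, v, t) = \inf\{\rho(p, q, t) \mid p, q \in \mathbb{R}/\mathbb{Z}\} \]
At this point we have
\[ \frac{\partial\rho}{\partial u} = 0 = \frac{\partial\rho}{\partial v} \]
and, by the nature of the infimum,
\[ D^2\rho \ge 0 \]
($D^2\rho$ is the matrix of mixed spatial derivatives of 2nd order of ρ).

Choose parametrisations so that u, v are arc-length parameters at time t.
Then
\[ \frac{\partial\rho}{\partial u} = 2\langle X - Y, T_X\rangle \tag{5.11} \]
and
\[ \frac{\partial\rho}{\partial v} = -2\langle X - Y, T_Y\rangle \tag{5.12} \]
We also have
\[ \frac{\partial^2\rho}{\partial u^2} = 2\langle T_X, T_X\rangle + 2\langle X - Y, -\kappa_X N_X\rangle \tag{5.13} \]
\[ \frac{\partial^2\rho}{\partial v^2} = 2 - 2\langle X - Y, -\kappa_Y N_Y\rangle \tag{5.14} \]
and
\[ \frac{\partial^2\rho}{\partial u\,\partial v} = -2\langle T_Y, T_X\rangle \tag{5.15} \]
If $\frac{\partial\rho}{\partial u} = 0 = \frac{\partial\rho}{\partial v}$ then we can choose signs so that $T_X = T_Y \perp X - Y$.
So
\begin{align*}
\frac{\partial\rho}{\partial t} &= 2\langle X - Y, -\kappa_X N_X + \kappa_Y N_Y\rangle \\
&= \frac{\partial^2\rho}{\partial u^2} + \frac{\partial^2\rho}{\partial v^2} - 2 - 2 \\
&= v\cdot D^2\rho\cdot v \ge 0 \tag{5.16}
\end{align*}
where v = [1, 1].

So the distance between X and Y is nondecreasing, which proves the avoidance principle.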
5.2.5 Maximum Principle

Suppose u : M × [0, T] → R is smooth, and M is a compact manifold without boundary. Suppose also that $\frac{\partial u}{\partial t}(x, t) \ge 0$ at every (x, t) ∈ M × [0, T) for which u(x, t) = inf{u(y, t) | y ∈ M}. Then inf{u(y, t) | y ∈ M} is nondecreasing in t.
Proof. Look at the infimum point of $u_\epsilon = u + \epsilon(1 + t) - \inf u$, where $u_\epsilon$ first hits 0. The argument is similar to the one we made before.
Warning. The following result is not true:
If u : M × [0, T) → R is smooth, u(x, 0) > 0 and $\frac{\partial u}{\partial t} \ge 0$ whenever u(x, t) = inf{u(y, t) | y ∈ M} = 0, then u(x, t) ≥ 0 for all x, t ≥ 0.
5.2.6 Preserving Embeddedness

If $X_0$ is embedded, and $X : \mathbb{R}/\mathbb{Z} \times [0, T) \to \mathbb{R}^2$ is a smooth solution of CSF, with $\sup\|\kappa(u, t)\| \le C$, then $X_t(\cdot) = X(\cdot, t)$ is an embedding. (Note in particular that κ of a curve is bounded if the curve can be parametrised by arc length).
Proof. We need
(i) $\|X_u\| \ne 0$
and
(ii) $X(\cdot, t)$ is one to one.
Proof of (i):
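\[ \frac{\partial}{\partial t}\|X_u\| = \frac{1}{\|X_u\|}\Big\langle X_u, \frac{\partial}{\partial t}\frac{\partial X}{\partial u}\Big\rangle = \Big\langle T, \frac{\partial}{\partial s}(-\kappa N)\Big\rangle\|X_u\| = -\kappa^2\|X_u\| \tag{5.17} \]
Hence
\[ \|X_u(u, t)\| \le e^{C^2 t}\|X_u(u, 0)\| \tag{5.18} \]
and
\[ \|X_u(u, t)\| \ge e^{-C^2 t}\|X_u(u, 0)\| \tag{5.19} \]
proving (i).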
Proof of (ii):
Consider $\rho(u, v, t) = \|X(u, t) - X(v, t)\|^2$ on {(u, v, t) | arc length from u to v at time t ≥ $\delta_0$}.
I claim that for any two points u, v on the curve at time zero a distance $\delta_0$ or greater apart, $\rho(u, v, t) \ge D\delta_0^2$.
Well, the maximum principle tells us that
\[ \inf_{(s,t)}\rho \ge \min\{\inf_{(\partial s,t)}\rho,\ \inf_{(s,0)}\rho\} \]
so the points remain separated if they are separated initially, proving (ii).
5.2.7 Finite Time Singularity

Any bounded embedded curve will reach a singularity after a finite amount of time under CSF.
Proof. Draw a circle enclosing the curve. By the avoidance principle, the two curves never intersect under CSF. But under CSF the enclosing circle shrinks to a point in finite time. Hence the enclosed curve must become singular no later than that time.
5.2.8 CSF as a steepest descent flow for length

It turns out more generally that all geometric evolution equations can be viewed
as a steepest descent flow for some functional. We shall see this later for the mean
curvature flow, and, later still, for the Ricci flow. But for now consider the following
functional:
R1
L(t) = 0
k ∂X
∂u
kdu
Consider an arbitrary variation
∂X
∂t
(u, t) = V (u, t)
Then
∂ ∂
∂t
kXu k =< T, ∂s V > kXu k
Hence
Z 1
d ∂V
L(t) = < T, > ds
dt ∂s
Z0 1 Z 1
∂
= < T, V > ds − < V, −κN > ds (5.20)
0 ∂s 0
The first term vanishes since the variation vanishes on the boundary of the curve,
or, if it is a closed curve, since the end points are the same.
Hence
\[ \frac{d}{dt}L(t) = -\int_0^1 \langle V, -\kappa N\rangle\, ds \ge -\left(\int_0^1 \|V\|^2 ds\right)^{1/2}\left(\int_0^1 \kappa^2 ds\right)^{1/2} \]
by the Cauchy-Schwarz inequality.

So we conclude that $\frac{dL}{dt}$ is minimised along all variations with $\int \|V\|^2 = 1$ by $V \propto -\kappa N$, since then this inequality is realised. But this means that
\[ \frac{\partial X}{\partial t}(u,t) = V(u,t) = -\kappa(u,t)N(u,t), \]
so we recover CSF.
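The steepest descent property can be observed numerically on a closed polygon. The following is a minimal sketch, not the author's method: the function names, the ellipse test curve, the step size and the explicit Euler scheme are all assumptions made for illustration. It moves each vertex by a finite difference approximation of $\partial T/\partial s = -\kappa N$ and records the polygon length after every step.

```python
import math

def polygon_length(pts):
    """Total length of the closed polygon through pts."""
    n = len(pts)
    return sum(math.hypot(pts[(i + 1) % n][0] - pts[i][0],
                          pts[(i + 1) % n][1] - pts[i][1]) for i in range(n))

def csf_step(pts, dt):
    """One explicit Euler step of X_t = dT/ds (= -kappa N) on a closed polygon."""
    n = len(pts)
    new_pts = []
    for i in range(n):
        xp, yp = pts[i - 1]
        x, y = pts[i]
        xn, yn = pts[(i + 1) % n]
        lm = math.hypot(x - xp, y - yp)   # length of the incoming edge
        lp = math.hypot(xn - x, yn - y)   # length of the outgoing edge
        ds = 0.5 * (lm + lp)              # arc length element at the vertex
        # Difference of unit tangents divided by ds approximates dT/ds.
        vx = ((xn - x) / lp - (x - xp) / lm) / ds
        vy = ((yn - y) / lp - (y - yp) / lm) / ds
        new_pts.append((x + dt * vx, y + dt * vy))
    return new_pts

def ellipse(a, b, n):
    """n points on the ellipse with semi-axes a and b (hypothetical test curve)."""
    return [(a * math.cos(2 * math.pi * i / n), b * math.sin(2 * math.pi * i / n))
            for i in range(n)]

pts = ellipse(2.0, 1.0, 200)
lengths = [polygon_length(pts)]
for _ in range(50):
    pts = csf_step(pts, 5e-5)   # small step, chosen for stability of the explicit scheme
    lengths.append(polygon_length(pts))
print(lengths[0], lengths[-1])
```

The step size is kept well below the square of the shortest edge so that the explicit scheme stays stable; a larger step would need an implicit method.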
5.2.9 Existence
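Let $(x,y) = X : \mathbb{R}/\mathbb{Z} \times [0,T) \to \mathbb{R}^2$.
Then
\[ \frac{\partial X}{\partial u} = (x_u, y_u) \]
and
\[ T = \frac{(x_u, y_u)}{\sqrt{x_u^2 + y_u^2}} \]
Now
\[
\begin{aligned}
-\kappa N = \frac{\partial T}{\partial s} &= \frac{1}{\sqrt{x_u^2+y_u^2}}\frac{\partial}{\partial u}\left(\frac{(x_u,y_u)}{\sqrt{x_u^2+y_u^2}}\right) \\
&= \frac{(x_{uu},y_{uu})}{x_u^2+y_u^2} - \frac{(x_u,y_u)}{(x_u^2+y_u^2)^2}(x_u x_{uu} + y_u y_{uu})
\end{aligned}
\tag{5.21}
\]
We can then write CSF in this choice of coordinates as
\[ \frac{\partial}{\partial t}\begin{pmatrix} x \\ y \end{pmatrix} = \frac{1}{(x_u^2+y_u^2)^2}\begin{pmatrix} y_u^2 & -x_u y_u \\ -x_u y_u & x_u^2 \end{pmatrix}\begin{pmatrix} x_{uu} \\ y_{uu} \end{pmatrix} \]
Definition 12. A system
\[ \frac{\partial}{\partial t}X^\alpha = A^{ij\alpha}_{\beta}\frac{\partial^2 X^\beta}{\partial u^i \partial u^j} \]
on $\Omega \times [0,T)$, $\Omega \subset \mathbb{R}^n$, is called strongly parabolic if for each $\zeta \in \mathbb{R}^n \setminus \{0\}$,
\[ \langle A^{ij\alpha}_{\beta}\zeta_i\zeta_j\eta_\alpha, \eta_\beta\rangle \ge \delta_0\|\zeta\|^2\|\eta\|^2 \]
Remark. Note that in the case $n = 1$,
\[ \frac{\partial}{\partial t}X^\alpha = A^\alpha_\beta (X^\beta)_{uu} \]
this amounts to requiring that
\[ \langle A^\alpha_\beta\eta_\alpha, \eta_\beta\rangle \ge \delta\|\eta\|^2 \]
In the case of CSF, we have that
\[ A^\alpha_\beta = \frac{1}{(x_u^2+y_u^2)^2}\begin{pmatrix} y_u^2 & -x_u y_u \\ -x_u y_u & x_u^2 \end{pmatrix} = \frac{1}{(x_u^2+y_u^2)^2}\begin{pmatrix} y_u \\ -x_u \end{pmatrix}\begin{pmatrix} y_u & -x_u \end{pmatrix} \]
which is NOT parabolic: the quadratic form vanishes for $\eta$ proportional to the tangent direction $(x_u, y_u)$.

To fix this, we add a time dependent reparametrisation to CSF. (It will turn out that the same problem arises with mean curvature flow and Ricci flow and can be fixed in an analogous manner.)
Define $\phi : \mathbb{R}/\mathbb{Z} \times [0,T) \to \mathbb{R}/\mathbb{Z}$ such that $\frac{\partial \phi(u,t)}{\partial t} = V(\phi(u,t),t)$.
Define $\hat X(u,t) = X(\phi(u,t),t)$. By our remark before this will also be a solution of CSF. Now
\[
\begin{aligned}
\frac{\partial}{\partial t}\hat X(u,t) &= \frac{\partial}{\partial t}X(\phi(u,t),t) + X_u V(\phi(u,t),t) \\
&= -\kappa_{\hat X}N_{\hat X}(u,t) + X_u V(\phi(u,t),t)
\end{aligned}
\tag{5.22}
\]
Choose
\[ V = \frac{1}{(x_u^2+y_u^2)^2}\begin{pmatrix} x_u & y_u \end{pmatrix}\begin{pmatrix} x_{uu} \\ y_{uu} \end{pmatrix} \]
Then
\[ \frac{\partial}{\partial t}\begin{pmatrix} \hat x \\ \hat y \end{pmatrix} = \frac{\partial}{\partial t}\hat X = \frac{1}{\hat x_u^2 + \hat y_u^2}\begin{pmatrix} \hat x_{uu} \\ \hat y_{uu} \end{pmatrix} \]
which is strongly parabolic.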
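The degeneracy of the original system, and its removal by the reparametrisation, can be checked on the coefficient matrices directly. This is a small illustrative sketch (the function names and the sample tangent vector are assumptions, not taken from the text). It omits the positive scalar prefactor, which cannot change the sign of the quadratic form, and evaluates $\eta^T M \eta$ in the tangent and normal directions.

```python
def csf_matrix(xu, yu):
    """Matrix part of the CSF coefficient A (positive scalar prefactor omitted)."""
    return [[yu * yu, -xu * yu],
            [-xu * yu, xu * xu]]

def quadratic_form(M, eta):
    """Return eta^T M eta for a 2x2 matrix M and a 2-vector eta."""
    return sum(eta[a] * M[a][b] * eta[b] for a in range(2) for b in range(2))

xu, yu = 3, 4                                  # sample tangent direction X_u = (x_u, y_u)
M = csf_matrix(xu, yu)
tangent_value = quadratic_form(M, (xu, yu))    # eta along X_u: the form degenerates
normal_value = quadratic_form(M, (yu, -xu))    # eta orthogonal to X_u: the form is positive

# After reparametrisation the matrix part is the identity, which is positive definite.
identity_value = quadratic_form([[1, 0], [0, 1]], (xu, yu))
print(tangent_value, normal_value, identity_value)
```

The vanishing value in the tangent direction reflects the invariance of CSF under reparametrisation of the curve.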

Short Time Existence Result. For any smooth X0 , there exists a unique smooth
solution of CSF X : R/Z × [0, T ) → R2 with X(u, 0) = X0 (u), for some T > 0.
Proof. This follows quickly from the reparametrisation above, which means that
existence (and uniqueness) follow from the analogous results for the standard heat
equation.
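Global Existence Theorem. For any smooth immersion $X_0 : \mathbb{R}/\mathbb{Z} \to \mathbb{R}^2$ there exists a unique smooth solution $X : \mathbb{R}/\mathbb{Z} \times [0,T) \to \mathbb{R}^2$ satisfying CSF with $X(u,0) = X_0(u)$ on a maximal time interval $[0,T)$ with $T < \infty$, and $\sup\{|\kappa(u,t)| : u \in \mathbb{R}/\mathbb{Z}\} \to \infty$ as $t \to T$.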
5.2.10 Evolution of Curvature

We deduce the equation for evolution of the curvature (which, by a remark earlier,
completely determines the curve) for a solution of CSF. This is analogous to equiv-
alent results for the Ricci flow, where one can deduce for 3-manifolds the evolution
equation for the full Riemann tensor (which completely determines the manifold)
for a solution of the Ricci flow.
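Starting from
\[ \frac{\partial}{\partial t}X = -\kappa N \]
we note that
\[
\begin{aligned}
\frac{\partial}{\partial t}X_u &= \frac{\partial}{\partial u}(-\kappa N) = \|X_u\|\frac{\partial}{\partial s}(-\kappa N) \\
&= \left(-\frac{\partial \kappa}{\partial s}N - \kappa^2 T\right)\|X_u\|
\end{aligned}
\tag{5.23}
\]
Now
\[
\begin{aligned}
\frac{\partial}{\partial t}\|X_u\| &= \frac{\partial}{\partial t}\sqrt{\langle X_u, X_u\rangle} \\
&= \frac{1}{\|X_u\|}\left\langle X_u, \frac{\partial}{\partial t}X_u\right\rangle \\
&= \left\langle T, \frac{\partial}{\partial t}X_u\right\rangle \\
&= -\kappa^2\|X_u\|
\end{aligned}
\tag{5.24}
\]
immediately from our previous computation.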

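Also note that
\[ \frac{\partial}{\partial t}T = \frac{\partial}{\partial t}\left(\frac{X_u}{\|X_u\|}\right) = -\frac{\partial \kappa}{\partial s}N \]
It then quickly follows from $\langle T, N\rangle = 0$ that
\[ \frac{\partial}{\partial t}N = \frac{\partial \kappa}{\partial s}T \]
Now, for any function $f$, we have that
\[
\begin{aligned}
\frac{\partial}{\partial t}\frac{\partial}{\partial s}f &= \frac{\partial}{\partial t}\left(\|X_u\|^{-1}\partial_u f\right) \\
&= \frac{1}{\|X_u\|}\partial_u\partial_t f - \frac{1}{\|X_u\|^2}(-\kappa^2\|X_u\|)\partial_u f \\
&= \partial_s\partial_t f + \kappa^2\partial_s f
\end{aligned}
\tag{5.25}
\]
In other words
\[ [\partial_t, \partial_s] = \kappa^2\partial_s \]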
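Then
\[
\begin{aligned}
\frac{\partial}{\partial t}\left(\frac{\partial}{\partial s}T\right) &= \frac{\partial}{\partial s}\left(\frac{\partial}{\partial t}T\right) + \kappa^2\frac{\partial}{\partial s}T \\
&= \partial_s(-\kappa_s N) - \kappa^3 N \\
&= -\kappa_{ss}N - \kappa_s\kappa T - \kappa^3 N
\end{aligned}
\tag{5.26}
\]
But note that also
\[ \frac{\partial}{\partial t}\left(\frac{\partial}{\partial s}T\right) = \frac{\partial}{\partial t}(-\kappa N) = -\kappa_t N - \kappa\kappa_s T \]
So the terms $\kappa_s\kappa T$ cancel and we are left with the evolution equation for the curvature for a solution of CSF:
\[ \frac{\partial}{\partial t}\kappa = \kappa_{ss} + \kappa^3 \tag{5.27} \]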
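As a quick consistency check (an illustrative special case, not part of the original derivation), take a round circle, for which $\kappa$ is constant along the curve and so $\kappa_{ss} = 0$. The evolution equation then reduces to an ODE:

\[
\frac{d\kappa}{dt} = \kappa^3
\quad\Longrightarrow\quad
\frac{d}{dt}\left(\kappa^{-2}\right) = -2
\quad\Longrightarrow\quad
\kappa(t) = \left(\kappa_0^{-2} - 2t\right)^{-1/2}.
\]

Writing $r = 1/\kappa$ gives $r(t)^2 = r_0^2 - 2t$, so the curvature blows up at $T = r_0^2/2$, in line with the finite time singularity result.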
5.2.11 Bounds on Curvature for CSF

In the subject of geometric evolution equations, it is often important to be able to
control geometric quantities, particularly if we want to understand the nature of
limiting solutions. For the CSF, we have the nice result that if we have a global bound on the curvature $\kappa$, we get bounds on
\[ \left\|\frac{\partial^k \kappa}{\partial s^k}\right\|^2 \]
for all $k$. I will prove this only for $k = 1$, but the core ideas remain the same.
The key is to use the evolution equation for the curvature we have just derived.
So suppose $\kappa^2 < C^2$. I claim that we get a bound on $\kappa_s$.
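Proof. First note that
\[
\begin{aligned}
\frac{\partial}{\partial t}\kappa_s &= (\kappa_{ss} + \kappa^3)_s + \kappa^2\kappa_s \\
&= \kappa_{sss} + 4\kappa^2\kappa_s
\end{aligned}
\tag{5.28}
\]
and hence that
\[ \frac{\partial}{\partial t}\kappa_s^2 = (\kappa_s^2)_{ss} - 2\kappa_{ss}^2 + 8\kappa^2\kappa_s^2 \]
Now if we denote $g = \frac{\kappa_s^2}{C^2 - \kappa^2}$, we get that
\[
\begin{aligned}
\frac{\partial}{\partial t}g &\le g_{ss} + a g_s + 8\kappa^2 g - 2g^2 + \frac{2\kappa^4\kappa_s^2}{(C^2-\kappa^2)^2} \quad \text{for some } a, \\
&\le g_{ss} + a g_s + Cg - 2g^2 \quad \text{for some new constant } C.
\end{aligned}
\tag{5.29}
\]
Then, as before, since initially $g$ is a positive bounded function we get, via the maximum principle, that $g(u,t) \le D$ for all $u \in \Omega$, $t \ge 0$, for some positive number $D$ or, in other words, that
\[ \kappa_s^2 \le C^2 D \]
which is what we wanted to show.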

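Also, we want to show that $X$ remains smooth. The way to do this is to control derivatives of $X$. Now
\[ \frac{\partial}{\partial t}\|X_u\| = -\kappa^2\|X_u\| \]
this implies that
\[ e^{-C^2 t}\|X_u(u,0)\| \le \|X_u(u,t)\| \le e^{C^2 t}\|X_u(u,0)\| \]
i.e. $X_u$ is bounded.
For higher derivatives, note that
\[
\begin{aligned}
X_{uu} &= \langle X_{uu}, N\rangle N + \langle X_{uu}, T\rangle T \\
&= \left(\frac{\partial}{\partial u}\langle X_u, N\rangle - \langle X_u, \kappa X_u\rangle\right)N + \langle X_{uu}, T\rangle T \\
&= -\kappa\|X_u\|^2 N + \langle X_{uu}, T\rangle T
\end{aligned}
\tag{5.30}
\]
So we need to control $\langle X_{uu}, T\rangle$.

More generally, we will find that we will need to control for $k$th order terms a term of the form
\[ \frac{\partial}{\partial t}\left\langle \frac{\partial^k X}{\partial u^k}, T\right\rangle = -\kappa^2\left\langle \frac{\partial^k X}{\partial u^k}, T\right\rangle + \left(\text{lower order in } \left\langle \frac{\partial^i X}{\partial u^i}\right\rangle\right)(\kappa, \kappa_s, \ldots) \]
So, once again by the maximum principle and induction, $\left\|\left\langle \frac{\partial^k X}{\partial u^k}, T\right\rangle\right\|$ remains bounded if it is bounded initially. It turns out that this is all we need to conclude:
Theorem 5.2.1. If $X$ exists on $\mathbb{R}/\mathbb{Z} \times [0,T)$, then $\left\|\frac{\partial^k X}{\partial u^k}\right\|$ is bounded on $\mathbb{R}/\mathbb{Z} \times [0,T)$ and $\left\|\frac{\partial^k}{\partial u^k}\frac{\partial^l}{\partial t^l}X\right\|$ is bounded.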
The Arzela Ascoli Theorem now tells us that Xt (.) = X(., t) → XT (.), i.e. the
flow converges smoothly to a sensible limit at t = T provided we have bounds on
the curvature. We may then apply short time existence starting from XT to extend
the solution further.
To digress briefly, recall the statement of the Arzela-Ascoli theorem:
Theorem 5.2.2. (Ascoli-Arzela). Let X be a compact metric space, Y a metric
space. Then a subset F of C(X,Y) is compact in the compact-open topology if and
only if it is equicontinuous, pointwise relatively compact and closed.
Note that this proves the Global Existence Theorem which I stated earlier, since
this process can only stop when the curvature becomes unbounded.
5.2.12 An Isoperimetric Estimate

Consider a closed curve $X_t$ with no self intersections in $\mathbb{R}^2$. Let $p, q$ be two points on the curve, $d$ be the distance separating them in the plane, $l$ be the distance separating them along the curve, and $L$ the total length of the curve. In particular, note that if the curve is precisely a circle of radius $r$, and $p$ and $q$ are separated by an angle $\theta$, then $L = 2\pi r$, $l = r\theta$, $\frac{d}{2} = r\sin(\frac{\theta}{2})$ and $\theta = \frac{2\pi l}{L}$. Then we can see that
\[ \pi = \frac{L}{d}\sin\left(\frac{\pi l}{L}\right) \]
So define $Z(p,q,t) = \frac{L}{d}\sin\left(\frac{\pi l}{L}\right)$ for any curve $X_t$.

Then $\sup_{(p,q)} Z \ge \pi$, with equality realised iff $X_t$ is a circle.
Claim. $\sup_{(p,q)} Z(p,q,t)$ is nonincreasing under CSF.
(In fact, one can show that $\sup_{(p,q)} Z(p,q,t)$ is decreasing under CSF if it is initially $> \pi$. This of course shows that for curves with no initial self intersections the limiting solution of renormalised CSF is actually a circle.)
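The quantity $Z$ is easy to evaluate on a discretised curve. The sketch below is illustrative only: the polygonal approximation, the function names and the ellipse test curve are assumptions introduced here. It computes the supremum of $Z$ over all pairs of vertices, giving a value close to $\pi$ for a circle and a strictly larger value for an ellipse.

```python
import math

def isoperimetric_ratio(L, d, l):
    """Z = (L/d) * sin(pi * l / L) for chord length d, arc length l, total length L."""
    return (L / d) * math.sin(math.pi * l / L)

def sup_ratio(pts):
    """Maximum of Z over all pairs of distinct vertices of a closed polygon."""
    n = len(pts)
    edges = [math.hypot(pts[(i + 1) % n][0] - pts[i][0],
                        pts[(i + 1) % n][1] - pts[i][1]) for i in range(n)]
    L = sum(edges)
    cumulative = [0.0]                    # arc length from vertex 0 to vertex i
    for e in edges[:-1]:
        cumulative.append(cumulative[-1] + e)
    best = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = math.hypot(pts[j][0] - pts[i][0], pts[j][1] - pts[i][1])
            l = cumulative[j] - cumulative[i]
            best = max(best, isoperimetric_ratio(L, d, l))
    return best

def ellipse(a, b, n):
    """n points on the ellipse with semi-axes a and b (hypothetical test curve)."""
    return [(a * math.cos(2 * math.pi * i / n), b * math.sin(2 * math.pi * i / n))
            for i in range(n)]

z_circle = sup_ratio(ellipse(1.0, 1.0, 200))    # regular polygon approximating a circle
z_ellipse = sup_ratio(ellipse(2.0, 1.0, 200))
print(z_circle, z_ellipse)
```

Since $\sin(\pi l/L)$ is unchanged under $l \mapsto L - l$, it does not matter which of the two arcs between $p$ and $q$ is used.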
5.3 Some Geometric Background on Hypersurfaces

5.3.1 Introduction
A hypersurface $M^n \subset \mathbb{R}^{n+1}$ is locally the level set of a function, i.e. for all $x \in M$ there is a smooth $g : U \to \mathbb{R}$ with $\|Dg\| \neq 0$, where $U \subset \mathbb{R}^{n+1}$ is an open set such that $x \in U$. Then
\[ M \cap U = g^{-1}(0) \]
In the compact case, it is possible to choose a global chart, i.e. a $g : \mathbb{R}^{n+1} \to \mathbb{R}$ such that $g^{-1}(0) = M$, and $x \in M$ implies that $\|Dg(x)\| \neq 0$.
We have the Implicit Function Theorem which states that there exist local
parametrisations, ie, given x ∈ M there exists an open neighbourhood U and an
X : V ⊂ Rn → U which is smooth and an embedding. In other words, X is one to
one and DX(z) : Rn → Rn+1 is injective.
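We can interpret the tangent space to a point $x$ in a hypersurface $M$ as
\[ T_x M = \{v \in \mathbb{R}^{n+1} \mid D_x g(v) = 0\} = \ker Dg(x) \]
Or, if $X$ is a local parametrisation such that $x = X(p)$ then
\[ T_x M = \operatorname{im}(DX(p)) \]
In other words, we get a natural basis for $T_x M$ as
\[ \left\{\frac{\partial X}{\partial p^1}, \ldots, \frac{\partial X}{\partial p^n}\right\} \]
The unit normal to a hypersurface is
\[ N = \frac{\nabla g}{\|\nabla g\|} \perp TM \]
The metric induced by the embedding of the hypersurface in $\mathbb{R}^{n+1}$ is
\[ g_{ij} = \frac{\partial X}{\partial p^i} \cdot \frac{\partial X}{\partial p^j} \]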
5.3.2 The Second Fundamental Form

We define the second fundamental form for a hypersurface $M$ as
\[ h_{ij} = -\frac{\partial^2 X}{\partial p^i \partial p^j} \cdot N \]
The second fundamental form plays an analogous role for a hypersurface that the curvature plays for a curve; it completely specifies how the surface $M^n$ embeds in $\mathbb{R}^{n+1}$.
For example, take $M^n = S^n \subset \mathbb{R}^{n+1}$. Then $g = \|X\|^2 - 1$, and $N = X$. This implies that $h_{ij} = g_{ij}$.
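The sphere example can be verified in one line. This sketch uses only the definitions of $h_{ij}$ and $g_{ij}$ given above, together with $N = X$ on the unit sphere; writing $\partial_i$ for $\partial/\partial p^i$ is a shorthand introduced here:

\[
X \cdot X = 1
\;\Longrightarrow\;
\partial_i X \cdot X = 0
\;\Longrightarrow\;
\partial_j\partial_i X \cdot X + \partial_i X \cdot \partial_j X = 0
\;\Longrightarrow\;
h_{ij} = -\partial_i\partial_j X \cdot N = \partial_i X \cdot \partial_j X = g_{ij}.
\]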
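The Implicit Function Theorem implies that if $X$ and $Y$ are local parametrisations of $M$ near $x \in M$ then $Y = X \circ \phi$ where $\phi : A \subset \mathbb{R}^n \to B \subset \mathbb{R}^n$ is a diffeomorphism. In particular,
\[ \frac{\partial Y}{\partial q^i} = \frac{\partial}{\partial q^i}(X(\phi(q))) = \frac{\partial X}{\partial p^j}\frac{\partial \phi^j}{\partial q^i} \]
Define
\[ \Lambda^j_i = \frac{\partial \phi^j}{\partial q^i} \]
as the Jacobian of $\phi$.
Then observe that
\[ g^Y_{ij} = \Lambda^k_i\Lambda^l_j g^X_{kl} \]
so the metric transforms like a tensor as we would expect.

More interestingly, observe that
\[
\begin{aligned}
\frac{\partial^2 Y}{\partial q^i \partial q^j} &= \frac{\partial}{\partial q^i}\left(\frac{\partial X}{\partial p^k}\frac{\partial \phi^k}{\partial q^j}\right) \\
&= \frac{\partial^2 X}{\partial p^k \partial p^l}\frac{\partial \phi^k}{\partial q^j}\frac{\partial \phi^l}{\partial q^i} + \frac{\partial X}{\partial p^k}\frac{\partial^2 \phi^k}{\partial q^j \partial q^i}
\end{aligned}
\tag{5.31}
\]
Now by definition
\[
\begin{aligned}
h^Y_{ij} &= -\frac{\partial^2 Y}{\partial q^i \partial q^j} \cdot N \\
&= -\frac{\partial^2 X}{\partial p^k \partial p^l} \cdot N\, \Lambda^k_i\Lambda^l_j \\
&= h^X_{kl}\Lambda^k_i\Lambda^l_j
\end{aligned}
\tag{5.32}
\]
proving that the second fundamental form also transforms like a tensor!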
The second fundamental form has a natural interpretation. Suppose v ∈ Tx M .
Then h(v, v) is the curvature of the curve M ∩ Π in the plane Π, where Π =
span{v, N }.
5.3.3 Differentiation on a Hypersurface

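Let $V$ be a tangent vector field to our hypersurface $M$, $V = V^i\frac{\partial X}{\partial p^i}$. Then, given a smooth function $f$ we can differentiate $f$ in direction $V$ to get
\[ Vf = V^i\frac{\partial f}{\partial p^i} \]
Suppose now $V, W$ are tangent vector fields to $M$. We compute
\[
\begin{aligned}
V(WX) &= D_V\left(W^i\frac{\partial X}{\partial p^i}\right) \\
&= V^j\frac{\partial}{\partial p^j}\left(W^i\frac{\partial X}{\partial p^i}\right) \\
&= V^j\frac{\partial W^i}{\partial p^j}\frac{\partial X}{\partial p^i} + V^j W^i\frac{\partial^2 X}{\partial p^j \partial p^i} \\
&= \left(V^j\frac{\partial W^i}{\partial p^j}\frac{\partial X}{\partial p^i} + V^j W^i\left(\frac{\partial^2 X}{\partial p^i \partial p^j}\right)^{\mathrm{tang}}\right) + V^j W^i\left(\frac{\partial^2 X}{\partial p^i \partial p^j} \cdot N\right)N
\end{aligned}
\tag{5.33}
\]
This motivates us to define
\[ V(WX) = (\nabla_V W)(X) - h(V,W)N \]
$\nabla$ is nothing other than the covariant derivative of $M$.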
5.3.4 The Structure Equation for Hypersurfaces

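Observe that trivially for any $f$ that
\[ 0 = \frac{\partial^2 f}{\partial x^i \partial x^j} - \frac{\partial^2 f}{\partial x^j \partial x^i} \]
Take $f = X$. Then
\[ 0 = (\nabla_{\partial_i}\partial_j - \nabla_{\partial_j}\partial_i)X - (h_{ij} - h_{ji})N \]
Differentiating one more time, we observe that
\[
\begin{aligned}
0 &= \partial_i\partial_j(\partial_k X) - \partial_j\partial_i(\partial_k X) \\
&= \partial_i((\nabla_{\partial_j}\partial_k)X - h_{jk}N) - (i \leftrightarrow j) \\
&= (\nabla_{\partial_i}(\nabla_{\partial_j}\partial_k))X - h(\partial_i, \nabla_{\partial_j}\partial_k)N - (\nabla_i h_{jk} + h(\nabla_{\partial_i}\partial_j, \partial_k) \\
&\quad + h(\partial_j, \nabla_{\partial_i}\partial_k))N - h_{jk}h_{ip}g^{pq}\partial_q X - (i \leftrightarrow j) \\
&= (\nabla_{\partial_i}(\nabla_{\partial_j}\partial_k) - \nabla_{\partial_j}(\nabla_{\partial_i}\partial_k) \\
&\quad - (h_{jk}h_{ip} - h_{ik}h_{jp})g^{pq}\partial_q)X + (\nabla_{\partial_j}h_{ik} - \nabla_{\partial_i}h_{jk})N
\end{aligned}
\tag{5.34}
\]
(The notation $(i \leftrightarrow j)$ means to repeat the previous expression in the line of calculation but under a reversal of the indices $i$ and $j$).
From examining the tangential and normal components of this we deduce the
Codazzi Equation.
\[ \nabla_{\partial_i}h_{jk} = \nabla_{\partial_j}h_{ik} \tag{5.35} \]
and the
Gauss Equation.
\[ [\nabla_{\partial_i}, \nabla_{\partial_j}]\partial_k = (h_{jk}h_{ip} - h_{ik}h_{jp})g^{pq}\partial_q \tag{5.36} \]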
5.3.5 Principal Curvatures

The eigenvalues of $h$ with respect to $g$ are the principal curvatures. So, if we choose a basis $e_1, \ldots, e_n$ for $T_x M$ such that
\[ g(e_i, e_j) = \delta_{ij} \]
then
\[ h(e_i, e_j) = \kappa_i\delta_{ij} \]
The Mean Curvature is defined to be
\[ H = \sum_{i=1}^n \kappa_i = g^{ij}h_{ij} \]
The Gauss Curvature is defined to be
\[ K = \prod_{i=1}^n \kappa_i = \frac{\det(h)}{\det(g)} \]
5.4 Mean Curvature Flow (MCF)

5.4.1 Introduction; MCF as a steepest descent flow of volume
MCF arises from a steepest descent flow of the volume of a hypersurface. In this way
it is a natural generalisation of the curve shortening flow, which I showed previously
arises as the steepest descent flow of the length of an immersed curve in R2 .
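The $n$-dimensional volume of $M$ is given by
\[ |M| = \int \sqrt{\det(g)} \]
Consider an arbitrary variation
\[ X : M_0 \times [0,T) \to \mathbb{R}^{n+1} \]
where $\frac{\partial X}{\partial t} = V = \omega X + fN$.
Recall $g_{ij} = \frac{\partial X}{\partial p^i} \cdot \frac{\partial X}{\partial p^j}$.
Now
\[
\begin{aligned}
\frac{\partial}{\partial t}\frac{\partial X}{\partial p^i} &= \frac{\partial}{\partial p^i}(\omega X + fN) \\
&= (\nabla_{\partial_i}\omega)X - h(\partial_i, \omega)N + \frac{\partial f}{\partial p^i}N + f h^p_i\partial_p X
\end{aligned}
\tag{5.37}
\]
where $h^p_i$ is defined to be $h_{ik}g^{kp}$.

So
\[
\begin{aligned}
\frac{\partial}{\partial t}g_{ij} &= \frac{\partial}{\partial t}\frac{\partial X}{\partial p^i} \cdot \frac{\partial X}{\partial p^j} + (i \leftrightarrow j) \\
&= g(\nabla_{\partial_i}\omega, \partial_j) + f h^p_i g_{pj} + (i \leftrightarrow j) \\
&= g(\nabla_{\partial_i}\omega, \partial_j) + g(\partial_i, \nabla_{\partial_j}\omega) + 2f h_{ij}
\end{aligned}
\tag{5.38}
\]
Hence
\[
\begin{aligned}
\frac{\partial}{\partial t}\det(g) &= \det(g)g^{ij}\frac{\partial}{\partial t}g_{ij} \\
&= 2\det(g)(\nabla_{\partial_i}\omega^i) + 2\det(g)fH
\end{aligned}
\tag{5.39}
\]
So
\[ \frac{\partial}{\partial t}\sqrt{\det(g)} = [(\nabla_{\partial_i}\omega^i) + fH]\sqrt{\det(g)} \tag{5.40} \]
So we may finally conclude that
\[
\begin{aligned}
\frac{d}{dt}|M| &= \int_M (\operatorname{div}(\omega) + fH)\, d\mu(g) \\
&= \int_M fH\, d\mu(g) \quad \text{by the divergence theorem}
\end{aligned}
\tag{5.41}
\]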
It is evident then that steepest descent flow is given by the variation
\[ \frac{\partial X}{\partial t} = -kHN \]
for some positive constant $k$. So choose $k = 1$. This is the equation of Mean Curvature Flow.
Alternatively, we can think of MCF as a natural heat equation in the sense that if we take the variation
\[ \frac{\partial X}{\partial t} = \Delta X \]
where $\Delta f = g^{ij}\nabla_i\nabla_j f$ is the Laplacian of $f$, then
\[ \Delta X = g^{ij}(-h_{ij}N) = -HN \]
which gives us MCF again.

More precisely, given a compact hypersurface $M_0$, we want to find $X : M_0 \times [0,T) \to \mathbb{R}^{n+1}$ such that
\[ \frac{\partial X}{\partial t} = -HN \]
and
\[ X(z,0) = z \]
for all $z \in M_0$.
The solution to this initial value problem if it exists is the MCF of $M_0$.
5.4.2 Existence Results

Short Time Existence.
We can establish short time existence of the MCF in two different ways-
(i) Write Mt as a graph over M0 , i.e. X(z, t) = z + ρ(z, t)N (z, 0). It is clear
that we will have short time existence since if we flow along a normal vector field
the graph will have no self intersections for small times t.
or
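(ii) Write
\[ \frac{\partial X}{\partial t} = g^{ij}\left(\frac{\partial^2 X}{\partial p^i \partial p^j} - (\nabla_{\partial_i}\partial_j)X\right) \]
(The bracketed term is the normal component of $\frac{\partial^2 X}{\partial p^i \partial p^j}$.)
Let $\dot\nabla$ be the covariant derivative at $t = 0$.
Then
\[ \frac{\partial X}{\partial t} = g^{ij}\left(\dot\nabla_i\dot\nabla_j X + (\dot\nabla_i\partial_j - \nabla_i\partial_j)X\right) \]
Note $V(z,t)X = (\dot\nabla_i\partial_j - \nabla_i\partial_j)X$ is defined independent of local parametrisation.
Hence we have short term existence.
Theorem 5.4.1. (Global Existence Theorem).
If $T$ is the maximal time of existence, then $\sup_{M_t}\|h\|^2 \to \infty$ as $t \to T$. Here $\|h\|^2 = g^{ij}g^{kl}h_{ik}h_{jl} = \sum_i \kappa_i^2$.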
5.4.3 Induced Evolution of Various Quantities along MCF

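Evolution of Metric.
\[ \frac{\partial}{\partial t}g_{ij} = \frac{\partial}{\partial t}\left(\frac{\partial X}{\partial p^i} \cdot \frac{\partial X}{\partial p^j}\right) = -2Hh_{ij} \tag{5.42} \]
Evolution of Normal.
Define
\[ \frac{\partial}{\partial t}N = A^i\partial_i X \]
We compute the $A^i$.
Now certainly
\[ 0 = N \cdot T = N \cdot \frac{\partial X}{\partial p^i} \]
So
\[ 0 = \frac{\partial N}{\partial t} \cdot \frac{\partial X}{\partial p^i} + N \cdot \frac{\partial}{\partial p^i}(-HN) \]
The first term is nothing other than $A^j g_{ji}$. The second term is clearly $-\frac{\partial H}{\partial p^i}$. So we conclude
\[ A^j g_{ji} = \frac{\partial H}{\partial p^i} \]
From which we derive finally the evolution equation for the normal to the hypersurface:
\[ \frac{\partial N}{\partial t} = \frac{\partial H}{\partial p^i}g^{ij}\partial_j X = (\nabla H)X \tag{5.43} \]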
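Evolution of h and also the mean curvature H.
Note first that
\[
\begin{aligned}
\partial_t\partial_i N &= \partial_t(h^k_i\partial_k X) \\
&= \left(\frac{\partial}{\partial t}h^k_i\right)\partial_k X + h^k_i\partial_k(-HN) \\
&= \left(\frac{\partial}{\partial t}h^k_i\right)\partial_k X - Hh^k_i h^p_k\partial_p X - h^k_i(\partial_k H)N
\end{aligned}
\tag{5.44}
\]
But also
\[
\begin{aligned}
\partial_t\partial_i N &= \partial_i(\nabla_k H\, g^{kl}\partial_l X) \\
&= (\nabla_i\nabla_k H\, g^{kl}\partial_l X) - h(\nabla H, \partial_i)N
\end{aligned}
\tag{5.45}
\]
So if we equate tangential components of the above expressions, we get the relation
\[ \frac{\partial}{\partial t}h^j_i = \nabla_i\nabla_k(H)g^{kj} + Hh^k_i h^j_k \tag{5.46} \]
which is the evolution equation for h.
Consequently
\[ \frac{\partial}{\partial t}H = \Delta H + H\|h\|^2, \]
the evolution equation for H.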
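For a round sphere of radius $r(t)$ this equation can be checked by hand. The sketch below is an illustrative special case; it assumes the standard values $H = n/r$ and $\|h\|^2 = n/r^2$ for a sphere (all principal curvatures equal to $1/r$), and uses that $H$ is constant on the sphere so that $\Delta H = 0$:

\[
\frac{dH}{dt} = H\|h\|^2 = \frac{n^2}{r^3},
\qquad
\frac{dH}{dt} = \frac{d}{dt}\left(\frac{n}{r}\right) = -\frac{n}{r^2}\frac{dr}{dt}
\quad\Longrightarrow\quad
\frac{dr}{dt} = -\frac{n}{r},
\]

which is the radial form of $\frac{\partial X}{\partial t} = -HN$ for a sphere.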
5.4.4 The Huisken Rescaling Result

The proof of the following result is due to Huisken. It is a typical example of
how under a geometric flow the limiting behaviour can be very nice. The way
we understand limiting behaviour, furthermore, is done by rescaling our solution-
typified by the final statement in the result which follows.
The Result. If $M_0$ is a compact convex hypersurface in $\mathbb{R}^{n+1}$, $n \ge 2$, then the solution of MCF remains smooth and convex on a maximal interval $[0,T)$. Furthermore,
\[ X(p,t) \to x_0 \in \mathbb{R}^{n+1} \text{ as } t \to T \]
and
\[ \frac{X(p,t) - x_0}{(2n(T-t))^{1/2}} \to_{C^\infty} X_T \]
where $X_T(M_0) = S^n$.
Remark. Note that if $M_0 = \{r_0 z : z \in S^n\}$, then $X(z,t) = r(t)z$ and $N(z,t) = z$. So
\[ h^j_i\partial_j X = \partial_i N = \partial_i z = \partial_i\left(\frac{X}{r}\right) = \frac{1}{r}\partial_i X \]
Hence $h^j_i = \frac{1}{r}\delta^j_i$. Now $\kappa_1 = \cdots = \kappa_n = \frac{1}{r}$, so $H = \frac{n}{r}$.

Then MCF becomes $\frac{\partial r}{\partial t} = -\frac{n}{r}$, which implies that $r(t) = (r_0^2 - 2nt)^{1/2}$. So if we set $T = \frac{r_0^2}{2n}$ then $r(t) = (2n(T-t))^{1/2}$. Hence it is clear that a natural rescaling of our solution should feature the quotient of some factor such as this.
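The shrinking sphere can be checked numerically. The following is a minimal sketch under stated assumptions: the function names, the sample values $r_0 = 2$, $n = 2$ and the use of a classical Runge-Kutta integrator are all choices made here for illustration. It integrates $dr/dt = -n/r$, compares with the closed form, and confirms that the rescaled radius is identically 1.

```python
import math

def sphere_radius(r0, n, t):
    """Closed-form radius r(t) = sqrt(r0^2 - 2 n t) of a shrinking n-sphere."""
    return math.sqrt(r0 * r0 - 2.0 * n * t)

def extinction_time(r0, n):
    """Time T = r0^2 / (2n) at which the radius reaches zero."""
    return r0 * r0 / (2.0 * n)

def integrate_radius(r0, n, t_end, steps):
    """Integrate dr/dt = -n/r with the classical fourth-order Runge-Kutta method."""
    def f(r):
        return -n / r
    h = t_end / steps
    r = r0
    for _ in range(steps):
        k1 = f(r)
        k2 = f(r + 0.5 * h * k1)
        k3 = f(r + 0.5 * h * k2)
        k4 = f(r + h * k3)
        r += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return r

r0, n = 2.0, 2                       # hypothetical sample values
T = extinction_time(r0, n)
t = 0.5
numeric = integrate_radius(r0, n, t, 1000)
exact = sphere_radius(r0, n, t)
rescaled = exact / math.sqrt(2.0 * n * (T - t))   # radius after Huisken's rescaling
print(T, numeric, exact, rescaled)
```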
5.4.5 Simon’s Identity and an application

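We derive an identity that will prove useful later on. Note that
\[
\begin{aligned}
\nabla_i\nabla_j H &= \nabla_i\nabla_j(g^{kl}h_{kl}) \\
&= g^{kl}\nabla_i\nabla_j h_{kl} \quad (\text{since } \nabla_i g_{jk} = 0) \\
&= g^{kl}\nabla_i\nabla_k h_{jl} \quad (\text{by Codazzi})
\end{aligned}
\tag{5.47}
\]
This is Simon’s Identity. For an example of its application, we compute
\[
\begin{aligned}
0 &= (\nabla_i\nabla_k - \nabla_k\nabla_i)h(\partial_j, \partial_l) \\
&= \nabla_i(\nabla_k h_{jl} + h(\nabla_k\partial_j, \partial_l) + h(\partial_j, \nabla_k\partial_l)) - (i \leftrightarrow k) \\
&= \nabla_i\nabla_k h_{jl} + \nabla h(\nabla_k\partial_j, \partial_l) + h(\nabla\partial, \nabla\partial) \\
&\quad + h(\nabla_i\nabla_k\partial_j, \partial_l) + h(\partial_j, \nabla_i\nabla_k\partial_l) - (i \leftrightarrow k)
\end{aligned}
\tag{5.48}
\]
Note that the second and third terms can be made to vanish if we choose coordinates so that $\nabla_k\partial_j(x) = 0$.
Continuing under this choice of coordinates, we get that
\[ 0 = [\nabla_i, \nabla_k]h_{jl} + h([\nabla_i, \nabla_k]\partial_j, \partial_l) + h(\partial_j, [\nabla_i, \nabla_k]\partial_l) \]
Hence
\[ [\nabla_i, \nabla_k]h_{jl} = h_{ij}h_{pk}h^p_l - h_{kj}h_{pi}h^p_l + h_{il}h_{pk}h^p_j - h_{kl}h_{pi}h^p_j \]
Which implies after using the Codazzi equation and Simon’s Identity that
\[ \nabla_i\nabla_j H = g^{kl}\nabla_k\nabla_l h_{ij} + h_{ij}\|h\|^2 - Hh^p_i h_{pj} \]
From which we finally derive
\[
\begin{aligned}
\frac{\partial}{\partial t}h^j_i &= (\nabla_i\nabla_k H + Hh^p_i h_{pk})g^{kj} \\
&= \Delta h^j_i + h^j_i\|h\|^2
\end{aligned}
\tag{5.49}
\]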
Note that this is the same evolution equation for h as we derived before, except
that we did not make the assumption that M was a hypersurface, ie this equation
is more generally applicable.
5.4.6 Monotonicity Formula for the MCF and application to MCF Solitons
Once again we use the simpler case of MCF to motivate the more technically complex
analogues in RF. It is evidently important to understand the limiting behaviour of
geometric flows, and it turns out that particular self-similar limits, known as soliton
solutions, are very important. In fact, both for the MCF and the RF, it will turn
out that any limit of rescaled flows about any point will be such a solution. To prove
this, we need a monotonicity formula for the relevant flow. In this section I shall
introduce such a formula for the MCF, and demonstrate its application to proving
that all limits of the MCF are self-similar. Finally, I will finish with mentioning a
partial classification result for these creatures.
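So, let $M_t$ be a solution of MCF in $\mathbb{R}^{n+1}$. Then we have a monotonicity formula for the MCF, due to Huisken:
\[ \frac{d}{dt}\left(\int_{M_t}(t_0 - t)^{-n/2}\exp\left(-\frac{\|x - x_0\|^2}{4(t_0 - t)}\right)d\mu(g_t)\right) \le 0 \]
for $t_0 \ge t$.
Remarks.
(1) Note that the fundamental solution of the heat equation on $\mathbb{R}^{n+1}$ is $\rho(x,t) = c\,t^{-\frac{n+1}{2}}\exp\left(-\frac{\|x\|^2}{4t}\right)$.
(2) Furthermore, if $u$ satisfies $\frac{\partial u}{\partial t} = \Delta u$ in $\mathbb{R}^n$ then
\[
\begin{aligned}
u(x_0, t_0) &= c_n\int (t_0)^{-n/2}\exp\left(-\frac{\|x - x_0\|^2}{4t_0}\right)u(x,0)\,dx^n \quad \text{(as can be proved using Green's functions)} \\
&= c_n\int (t_0 - t)^{-n/2}\exp\left(-\frac{\|x - x_0\|^2}{4(t_0 - t)}\right)u(x,t)\,dx^n \quad \text{for } 0 \le t \le t_0
\end{aligned}
\tag{5.50}
\]
Hence
\[ \frac{d}{dt}\int (t_0 - t)^{-n/2}\exp\left(-\frac{\|x - x_0\|^2}{4(t_0 - t)}\right)u(x,t)\,dx^n = 0 \tag{5.51} \]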
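Now, consider solitons of MCF, i.e. $M_t = \lambda(t)M_0$. So we have $X(p,t) = \lambda(t)X(\phi(p,t),0)$, which implies
\[ -H\mu|_{(p,t)} = \frac{\lambda'}{\lambda}X|_{(p,t)} + \lambda\frac{\partial X}{\partial x^j}\Big|_{(\phi(p,t),0)}\frac{\partial \phi^j}{\partial t}. \]
Taking the inner product with $\mu$, we get
\[ H = -\frac{\lambda'}{\lambda}\langle X, \mu\rangle \]
"The MCF Soliton Equation".

We may scale $-\frac{\lambda'}{\lambda} = \frac{1}{2(T-t)}$ (then $H_t = \frac{1}{\lambda}H_0$), so $\frac{\partial \lambda}{\partial t} = -\frac{c}{\lambda}$.
Under normal variations
\[ \frac{\partial X}{\partial t} = f\mu \]
So $\frac{\partial}{\partial t}g_{ij} = 2fh_{ij}$, and hence $\frac{\partial}{\partial t}(d\mu) = fH(d\mu)$.
Therefore,
\[ \frac{\partial}{\partial t}\|X\|^2 = 2\left\langle X, \frac{\partial X}{\partial t}\right\rangle = 2f\langle X, \mu\rangle \]
so
\[ \frac{\partial}{\partial t}\left(\exp\left(-\frac{c}{2}\|X\|^2\right)d\mu\right) = f(H - c\langle X, \mu\rangle)\exp\left(-\frac{c}{2}\|X\|^2\right)d\mu \]
Choose $c = \frac{1}{2(T-t)}$.
Then we have from the relation immediately above, that a MCF soliton is a critical point of $\int_{M_t}\exp\left(-\frac{\|X_t\|^2}{4(T-t)}\right)d\mu$, where $M_t = \lambda(t)M_0$. We immediately see the connection with the monotonicity formula, because, after reparametrisation, we see that a MCF soliton is a critical point of $\int_{M_0}(t_0 - t)^{-n/2}\exp\left(-\frac{\|X_0\|^2}{4(T-t)}\right)d\mu$.
We recall the monotonicity formula for the MCF, as stated above:
\[ \frac{d}{dt}\int_M (t_0 - t)^{-n/2}\exp\left(-\frac{\|x - x_0\|^2}{4(t_0 - t)}\right)d\mu \le 0. \]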
Now I claim that we achieve equality if and only if $M_t$ is a MCF soliton. Certainly if $M_t$ is a MCF soliton, this is the case. So it remains to prove the converse. In particular, I will in the process prove the monotonicity formula.
(To prove that we can always choose an appropriate $c$, we compute
\[ -\frac{1}{\lambda}H\mu|_0 = -H\mu|_{M_t} = \lambda' X_0 = \frac{\lambda'}{c}H\mu|_0 + \text{something tangential} \]
This implies that $\lambda\lambda'|_t = \lambda\lambda'|_0 = -c$ and so $\lambda^2(t) = 1 - 2ct = 2c(T - t)$, from which we deduce $-\frac{\lambda'}{\lambda} = -\frac{1}{2}\frac{(\lambda^2)'}{\lambda^2} = -\frac{1}{2}(\log(2c(T-t)))' = \frac{1}{2(T-t)}$ as required.)
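To prove the converse, we adopt the notation $\rho : \mathbb{R}^{n+1} \times [0,\infty) \to \mathbb{R}$ for the map
\[ \rho(x,t) = (t_0 - t)^{-\frac{n+1}{2}}\exp\left(-\frac{\|x - x_0\|^2}{4(t_0 - t)}\right) \]
Note that $\frac{\partial \rho}{\partial t} = -\Delta^{\mathbb{R}^{n+1}}\rho$.
We compute
\[
\begin{aligned}
\frac{d}{dt}\int (t_0 - t)^{1/2}\rho(X(p,t),t)\,d\mu &= -\frac{1}{2}\int (t_0 - t)^{-1/2}\rho\,d\mu - \int (t_0 - t)^{1/2}\rho H^2\,d\mu \\
&\quad + \int (t_0 - t)^{1/2}\left(-\Delta^{\mathbb{R}^{n+1}}\rho + D\rho(-H\mu)\right)d\mu
\end{aligned}
\]
We can write (in orthonormal coordinates with $e_{n+1} = \mu(x)$, $e_1, \cdots, e_n$ a basis for $T_X M$)
\[ \Delta^{\mathbb{R}^{n+1}}\rho = D^2\rho(\mu,\mu) + \sum_{i=1}^n\frac{\partial^2\rho}{(\partial x^i)^2} \]
Also,
\[
\begin{aligned}
\nabla\nabla\rho(e_i, e_i) &= \frac{d^2}{ds^2}\rho(\gamma(s)) \\
&= \frac{d}{ds}(D\rho(\gamma')) \\
&= D\rho(\gamma'') + D^2\rho(\gamma', \gamma')
\end{aligned}
\tag{5.52}
\]
where $\gamma$ is a geodesic in $M$, $\gamma(0) = X$ and $\gamma'(0) = e_i$.

Now $\gamma'' = -\kappa N = -h(\gamma', \gamma')\mu$ so
\[ \nabla_i\nabla_i\rho = D^2\rho(e_i, e_i) - h(e_i, e_i)D_\mu\rho \]
and hence
\[ \Delta^{\mathbb{R}^{n+1}}\rho = D^2\rho(\mu,\mu) + \Delta^M\rho + HD_\mu\rho \]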
Z Z
d 1/2 ρ
(t0 − t) ρdµ = − (t0 − t)1/2 [ + Dµ Dµ ρ]
dt 2(t0 − t)
(Dµ ρ)2
Z Z
1/2 2 2HDµ ρ Dµ ρ 2
− (t0 − t) ρ[H − +k k ] + (t0 − t)1/2
ρ ρ ρ
Z
Dµ ρ 2
= − (t0 − t)1/2 ρkH − k dµ
ρ
(Dµ ρ)2
Z
ρ
= − (t0 − t)1/2 [Dµ Dµ ρ − + ] (5.53)
ρ 2(t0 − t)
Claim. (1) For ρ as above,
Di ρDj ρ ρ
Di Dj ρ − ρ
+ δ
2(t0 −t) ij
=0
∂u
(2) For any solution u of ∂t
= ∆u on Rn with u ≥ 0,
Di uDj u uδij
Di Dj u − u
+ 2t
≥0
109
δij
(This occurs if and only if Di Dj logu + 2t
≥ 0).
Remark. It follows from this claim that a (smooth) limit of rescaled flows about
any point is a shrinking MCF soliton.
Proof of Claim. Recall if $\mu = \log u$ (here $\mu$ temporarily denotes $\log u$ rather than the normal),
\[ \frac{\partial}{\partial t}\mu = \Delta\mu + \|\nabla\mu\|^2 \]
Hence
\[ \frac{\partial}{\partial t}D_i\mu = \Delta D_i\mu + 2D_k\mu D_k D_i\mu \]
and
\[ \frac{\partial}{\partial t}D_i D_j\mu = \Delta D_i D_j\mu + 2D_k\mu D_k(D_i D_j\mu) + 2D_j D_k\mu D_k D_i\mu \]
One can then apply the maximum principle to show that $2tD_i D_j\mu + \delta_{ij} \ge 0$, since this is certainly true when $t = 0$. This proves the claim, and the monotonicity formula, since we have that
\[ \frac{d}{dt}\int_{M_t}(t_0 - t)^{1/2}\rho\,d\mu \le 0 \]
by the claim. Furthermore, equality is only achieved if
\[ H = \frac{D_\mu\rho}{\rho} = D_\mu\left(-\frac{\|X - x_0\|^2}{4(t_0 - t)} + c(t)\right) = -\frac{\langle X - x_0, \mu\rangle}{2(t_0 - t)} \]
but this is just the MCF soliton equation. In other words, this implies that a smooth limit of rescaled flows about any point is a shrinking MCF soliton, as required.
We have the following theorem, due to Huisken, which goes some way towards
classifying these beasts:
Theorem. A complete, shrinking soliton of MCF with $H > 0$ is either
(i) a shrinking sphere,
(ii) $S^k \times \mathbb{R}^{n-k}$, or
(iii) (a shrinking self similar solution of CSF) $\times\, \mathbb{R}^{n-1}$.
5.5 Proof of the Huisken rescaling result of MCF

5.5.1 Preliminaries
We set out to prove the result that I stated before, namely that compactness and
convexity are preserved under MCF. It is clear that compactness is preserved, since
we are following a steepest descent flow of the n-volume and the n-volume is initially
finite by hypothesis. To show that under MCF the surface flows to a point will
require invoking a corresponding avoidance principle as for CSF, which is again
very easy.
So the hard part is demonstrating that convexity is preserved if the hypersurface
is convex initially. Let me define convexity for a hypersurface more precisely:
$M_0$ is convex if $h_{ij} \ge \epsilon g_{ij}$, for some $\epsilon > 0$, where $h_{ij}$ is the second fundamental form of $M_0$ and $g_{ij}$ is its metric.
So this reduces the problem of proving that convexity is preserved to demonstrating that under MCF,
\[ Q_{ij} = h_{ij} - \epsilon g_{ij} \ge 0 \]
is preserved. To show this I shall invoke a Maximum Principle for Qij . I will state
a Maximum Principle for vectors first, and then show how this can be extended to
deal with 2-tensors. The preservation of the inequality will follow as a consequence
of this principle.
5.5.2 The Maximum Principle for Vectors

Suppose $\frac{\partial}{\partial t}u^\alpha = \Delta u^\alpha + V^\alpha(u)$, $\alpha = 1, \cdots, k$ on $\mathbb{R}^n$ (with periodic $u^\alpha_0$).
Let $K$ be a closed convex set in $\mathbb{R}^k$ such that $u(z,0) \in K$ for all $z \in \mathbb{R}^n$ and for points $y \in \partial K$, $V(y)$ "points into K". More precisely, $y + sV(y) \in K$ for $s$ small and positive.
Then we conclude that $u(z,t) \in K$ for all $z \in \mathbb{R}^n$, all $t \ge 0$.
Alternatively, $V(u)$ "points into K" for any $u \in \partial K$ means the solution of the ODE $\frac{d}{dt}U^\alpha = V^\alpha(U)$ with $U^\alpha(0) = z$ has $U^\alpha(t) \in K$ for $t \ge 0$.
Proof.
Claim. There exists a smooth convex function $f$ on $\mathbb{R}^k - \partial K$ such that $f \to 0$ on $\partial K$, and $f \sim d(\cdot, \partial K)$.
Proof of Claim. Given $y \in \mathbb{R}^k - \partial K$, choose $z$ to be the nearest point to $y$ in $\partial K$. Then define $f$ to be the distance between them. Note in particular that $\nabla d(y) = \nabla d(z)$.
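Continuation of Proof. We compute
\[ \frac{\partial}{\partial t}f(u) = \frac{\partial f}{\partial u^\alpha}(\Delta u^\alpha + V^\alpha(u)) \tag{5.54} \]
\[
\begin{aligned}
\Delta f &= g^{kl}\nabla_k\left(\frac{\partial f}{\partial u^\alpha}\nabla_l u^\alpha\right) \\
&= \frac{\partial f}{\partial u^\alpha}\Delta u^\alpha + g^{kl}\frac{\partial^2 f}{\partial u^\alpha \partial u^\beta}\nabla_k u^\alpha\nabla_l u^\beta
\end{aligned}
\tag{5.55}
\]
Hence
\[
\begin{aligned}
\frac{\partial}{\partial t}f &= \Delta f - g^{kl}D^2 f(\nabla_k u, \nabla_l u) + Df \cdot V \quad (\text{note that } f > 0) \\
&\le \Delta f + Df \cdot V \quad (\text{since } D^2 f \text{ is positive as } f \text{ is convex})
\end{aligned}
\tag{5.56}
\]
Now, suppose we look at $u \in \mathbb{R}^k - \partial K$, such that $u_0$ is the closest point, and $w = \nabla f(u_0)$ with $u = u_0 + tw$. Note that $w \cdot V(u_0) < 0$. Then
\[
\begin{aligned}
Df \cdot V &= \frac{\partial f}{\partial u^\alpha}V^\alpha \\
&= \frac{\partial f}{\partial u^\alpha}\Big|_{u_0 + tw}(V^\alpha(u_0)) + \frac{\partial f}{\partial u^\alpha}\Big|_{u_0 + tw}(V^\alpha(u) - V^\alpha(u_0))
\end{aligned}
\]
Now, the first term is negative by our initial remark that $w \cdot V(u_0) < 0$. Since $V$ is smooth, for $u, u_0$ sufficiently close, $|V^\alpha(u) - V^\alpha(u_0)| \le C\|u - u_0\| = Cd = Cf$ for some constant $C$. Note also that $\frac{\partial f}{\partial u^\alpha}\big|_u = \nabla f(u)$ is also bounded since $K$ is compact and hence the function $\|\nabla f\|$ will be bounded on the boundary of $K$. So the second term is bounded, and so we can conclude
\[ \frac{\partial f}{\partial t} \le \Delta f + Cf \]
for some constant $C > 0$.

This implies by the standard maximum principle that $f \le 0$ for $t \ge 0$, since $f \le 0$ initially. (Since $u(z,0) \in K$, $f(u(z,0)) \le 0$). So $f(u(z,t)) \le 0$, or $u(z,t) \in K$ for all $t \ge 0$.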
5.5.3 Extension of Technique to 2-tensors on Hypersurfaces

Theorem 5.5.1. Let $T$ be a symmetric 2-tensor on $M$ satisfying
\[ \frac{\partial}{\partial t}T_{kl} = \Delta T_{kl} + Q_{kl}(T) \]
Suppose $K$ is an $O(n)$ invariant convex set (we need it to be $O(n)$ invariant, otherwise under a change of basis or representation for $T$ the following would not be true) in $\mathrm{Sym}(n) = \{A \in GL(n) \mid A^t = A\}$.
Suppose further that $A_{ij} = T(e_i, e_j) \in K$ for any orthonormal basis $\{e_1, \cdots, e_n\}$ for $T_x M$, all $x \in M$, at $t = 0$. In other words, "$T \in K$".
Finally, assume that $Q(A)$ points into $K$ for $A \in \partial K$. Then $T \in K$ for $t \ge 0$.
Proof. Given a local parametrisation of $M$, we get $\{\partial_1, \cdots, \partial_n\}$ is a smooth basis of $T_x M$. Perform Gram-Schmidt to get $\{E_1, \cdots, E_n\}$ orthonormal with respect to $g$.
Then write $A_{ij} = T(E_i, E_j)$.
We have that
\[ \frac{\partial}{\partial t}A_{ij} = \Delta A_{ij} + Q(A)_{ij} \]
Then the idea is to work with this tensor. The remainder of the argument is
completely analogous to the previous case, if only a little more complicated.
5.5.4 Application of 2-tensor Extension to Proof of Huisken

Define
\[ K = \{A \mid A(e,e) \ge \epsilon \text{ for all } e \text{ such that } \|e\| = 1\} = \bigcap_{e : \|e\| = 1}\{A \mid A(e,e) \ge \epsilon\} \tag{5.57} \]
Then $K$ is the intersection of convex half spaces and hence is convex.

Consider now the ODE
\[ V_{ij}(A) = A_{ij}\|A\|^2 \]
Then if $A \in K$, $A + sV(A) = A(1 + s\|A\|^2) \ge \epsilon(1 + s\|A\|^2) > \epsilon$, $s > 0$, i.e. $V$ points into $K$. So the criteria for the 2-tensor maximum principle are met, and we conclude that $A$ remains in $K$ under mean curvature flow. In particular, if we define $A = h$ and since $M$ is compact, $g(v,v)$ realises its maximum on $M$ and hence $h(v,v) \ge \epsilon\max_{x \in M,\, w \in T_x M}(g(w,w)) = \hat\epsilon$. The relevant ODE is $V_{ij}(h) = h_{ij}\|h\|^2$, which is consistent with the above, and we are done.
5.5.5 Preservation of bounded curvature
As another application of the 2-tensor maximum principle, consider
K = {A|A(v, v) ≤ CA(w, w) for all v, w : kvk = kwk = 1}
Then once again, K is convex, a cone, and preserved by the above ODE.
So if κmax (x, 0) ≤ Cκmin (x, 0) ∀x ∈ M , then κmax (x, t) ≤ Cκmin (x, t) ∀x ∈
M, t ≥ 0.
Remark. Note if $\kappa_{\max}(x,t) \le \kappa_{\min}(x,t)$ for all $x$, i.e. $h_{ij} = f(x)g_{ij}$, then $f$ is constant on connected components.
Proof of Remark. By Codazzi, $\nabla_k h_{ij} = \nabla_i h_{kj}$. This implies that $(\nabla_i f)g_{kj} = (\nabla_k f)g_{ij}$ since $\nabla g = 0$. Choose $i = j \neq k$, and $g = \delta$. Then we conclude $\nabla_k f = 0$, or $f$ is constant.
5.6 Rescaling
Rescaling is a recurring theme in the subject of geometric evolution equations. It
is often important to understand the limiting behaviour of geometric flows, in order
to get a better understanding of the singularities which may develop, and hence
being able to justify surgery in certain circumstances. This becomes particularly
important in the Ricci flow, where the idea is to break a three manifold into pieces
by performing successive surgeries, until the manifold has been decomposed into
”prime” components. The understanding of how these ”prime” components then
collapse to points is key to their classification.
5.6.1 Preliminaries
Now, MCF has a scaling symmetry: if $X : M_0^n \times [0,T) \to \mathbb{R}^{n+1}$ satisfies MCF, then so does
\[ X_{(\lambda,\bar x,\bar t)}(p,t) = \lambda\left(X(p, \hat t/\lambda^2) - \bar x\right) \]
where $\hat t = t - \bar t$.
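A short chain rule computation confirms the symmetry. The sketch uses only two facts, stated here as assumptions of the check: translating by $\bar x$ leaves $N$ and $H$ unchanged, and dilating a hypersurface by $\lambda$ leaves $N$ unchanged while multiplying $H$ by $1/\lambda$. The symbol $H_\lambda$ for the mean curvature of the rescaled hypersurface is notation introduced for this check.

\[
\frac{\partial}{\partial t}X_{(\lambda,\bar x,\bar t)}(p,t)
= \lambda \cdot \frac{1}{\lambda^2}\,\frac{\partial X}{\partial t}\left(p, \hat t/\lambda^2\right)
= -\frac{1}{\lambda}HN
= -H_\lambda N,
\qquad H_\lambda = \frac{H}{\lambda}.
\]

This is the parabolic scaling: lengths scale by $\lambda$ and time by $\lambda^2$.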
Definition 13. (Convergence of hypersurfaces).
A sequence of closed hypersurfaces $M_k$ converges to $M_\infty$ if there exist
$g_k \in C^\infty(\mathbb{R}^{n+1})$, $k \in \mathbb{N} \cup \{\infty\}$,
$M_k = g_k^{-1}(0)$, i.e. the $M_k$ are level sets of the $g_k$,
such that
$\|\nabla g_k(x)\| \ge 1$ for all $x \in M_k$ (all points $x$ are regular points, and hence the $M_k$ are well defined submanifolds of $\mathbb{R}^{n+1}$),
and
$g_k \to g_\infty$ in $C^\infty(B_R)$ for all $R$.
5.6.2 A compactness theorem

Let $M_k$ be any sequence of hypersurfaces with
\[ \|\nabla^{(j)}h\|^2(x) \le C(R,j) \]
for all $x \in M_k \cap B_R$, $j \ge 0$,
and assume we have a bound below on the "tube radius"
\[ r_-(M_k, R) = \sup\{r \ge 0 \mid Y : M_k \cap B_R \times (-r,r) \to \mathbb{R}^{n+1},\ Y(x,s) = x + sN(x) \text{ is injective}\}, \]
i.e. $r_-(M_k, R) \ge \epsilon > 0$ for some $\epsilon$.
(This rules out, for instance, a sequence of manifolds $M_k$ embedded in $\mathbb{R}^{n+1}$ with a gap of distance $1/k$ apart. There is a more general theorem, the varifold compactness theorem, that allows such things to occur, but that is a story for another day, or, at least, another section (see section on Geometric measure theory).)
Suppose also that there exists $R > 0$ such that $M_k \cap B_R \neq \emptyset$ for all $k$.
Then we conclude that there is a convergent subsequence $\{M_{p_k}\}_{k \in \mathbb{N}}$.
5.6.3 Existence of smooth limit flows

Definition 14. A limit of rescalings $X_{(\lambda_k, x_k, t_k)}$, where $(x_k, t_k) \to (\bar x, \bar t)$ is called a limit flow.
Claim. For $M_0$ convex, there exists a smooth limit flow at $(\bar x, T)$, some $\bar x \in \mathbb{R}^{n+1}$.
Proof. We use results from existence theory of parabolic PDE to conclude smoothness results on smaller and smaller balls.
For $\bar t < T$, choose $\lambda = \frac{1}{r_-}$. For $0 \le t \le \frac{1}{8n}$, $M_\lambda(t)$ is between $S^n(C_0)$ and $S^n(\frac{1}{2})$. On $B_{1/4}$ we can write $M_{(\lambda,t)} = \mathrm{graph}(u(\cdot,t))$ (on $B_{1/4} \times [0, \frac{1}{8n}]$). Then we observe that $\|u(\cdot,t)\| \le c_0$ and $\|Du(\cdot,t)\| \le c_1$ for some constants $c_0, c_1$, and
\[ \frac{\partial u}{\partial t} = \left(\delta^{ij} - \frac{D_i u D_j u}{1 + \|Du\|^2}\right)D_i D_j u \]
Now the Hölder gradient estimate holds on $B_{1/8} \times [\frac{1}{16n}, \frac{1}{8n}]$. Therefore, by the interior Schauder estimates,
\[ \|u\|_{C^{(k,\alpha)}(B_{1/16} \times [\frac{3}{32n}, \frac{1}{8n}])} \le C_k(n, c_0) \]
$\|\nabla^{(j)}h\|^2$ is computed from $D^{(j+2)}u$. This implies
\[ \|\nabla^{(j)}h\|^2 \le C_j(n, c_0) \]
on $M_{(\lambda,t)}$, $\frac{3}{32n} \le t \le \frac{1}{8n}$.
This means that the tube radius is controlled, i.e. $M_{(\lambda,\bar t)} \cap B_k \neq \emptyset$, and so by the compactness theorem, there is a limit flow at $(\bar x, T)$.
5.7 The 2 dimensional Ricci Flow (2DRF)

Consider the PDE
\[ \frac{\partial}{\partial t}g_{ij} = -2R_{ij} \]
Given $M$ a compact manifold, with Riemannian metric $g_0$, we want to find $g(x,t)$ a smooth inner product on $T_x M$ satisfying this equation with $g(x,0) = g_0(x)$. We then say that $g(x,t)$ is a solution to the Ricci flow of $(M, g_0)$.
5.7.1 Introduction and Existence
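Note that for $n = 2$, $R_{11} = g^{kl}R_{1k1l} = \sum_k R_{1k1k} = R_{1212} = \mathrm{sect}(T_x M) = K$. Hence $R_{ij} = Kg_{ij}$, and
\[ \frac{\partial}{\partial t}g_{ij} = -2Kg_{ij} \]
But note that $R = R_{11} + R_{22} = 2K$.

Hence $g_{ij}(x,t) = e^{2u(x,t)}g_{ij}(x,0)$ for some function $u$.
Compute
\[ R(g(t)) = R(e^{2u}g_0) = e^{-2u}(R(g_0) - 2\Delta_{g_0}u) \]
\[
\begin{aligned}
\nabla^g_{\partial_i}\partial_j &= \frac{1}{2}g^{kl}(\partial_i g_{jl} + \partial_j g_{il} - \partial_l g_{ij})\partial_k \\
&= \frac{1}{2}e^{-2u}(g_0)^{kl}(\partial_i(e^{2u}(g_0)_{jl}) \cdots)\partial_k \\
&= \nabla^{g_0}_{\partial_i}\partial_j + \partial_i u\,\partial_j + \partial_j u\,\partial_i - (g_0)_{ij}\partial_p u\,(g_0)^{pq}\partial_q
\end{aligned}
\tag{5.58}
\]
From this we may compute the curvature:
\[ 2e^{2u}\frac{\partial u}{\partial t}g_0 = -e^{-2u}(R_0 - 2\Delta_0 u)e^{2u}g_0 \]
which implies that
\[ \frac{\partial u}{\partial t} = e^{-2u}(\Delta_{g_0}u - K_0) \]
So short time existence is ok.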
5.7.2 Evolution of the Scalar Curvature

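We compute $\frac{\partial}{\partial t}R$, if $\frac{\partial}{\partial t}g_{ij} = -fg_{ij}$.
Now recall $\Gamma^k_{ij} = \tfrac12 g^{kl}(\partial_i g_{jl} + \partial_j g_{il} - \partial_l g_{ij})$ is the expression for the Christoffel
symbols in local coordinates. Observe that, although $\Gamma$ is not a tensor, $\frac{\partial}{\partial t}\Gamma$
is. For a proof of this, let $\nabla$, $\bar\nabla$ be covariant derivatives, and consider
$$T(U,V) = \nabla_U V - \bar\nabla_U V = \nabla_U(V^j\partial_j) - \bar\nabla_U(V^j\partial_j) = (UV^j)\partial_j + V^j\nabla_U\partial_j - (UV^j)\partial_j - V^j\bar\nabla_U\partial_j = U^iV^jT(\partial_i,\partial_j).$$
Since $\frac{\partial}{\partial t}\Gamma$ is merely a limit of expressions like this, each of which is a
tensor, so must the limit be a tensor.
Hence we can compute $\frac{\partial}{\partial t}\Gamma$ in any local coordinate system. Choose local
coordinates such that $\Gamma(x_0,t_0) = 0$. Then
$$\frac{\partial}{\partial t}\Gamma^k_{ij}\Big|_{(x_0,t_0)} = \tfrac12 g^{kl}(\partial_i(-fg_{jl}) + \partial_j(-fg_{il}) - \partial_l(-fg_{ij}))$$
$$= \tfrac12 g^{kl}(\nabla_i(-fg_{jl}) + \nabla_j(-fg_{il}) + \nabla_l(fg_{ij}))$$
$$= -\tfrac12\nabla_i(f)\delta^k_j - \tfrac12\nabla_j(f)\delta^k_i + \tfrac12 g_{ij}\nabla_l(f)g^{lk} \quad (5.59)$$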
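Now, recall by definition
$$\nabla_i\partial_j = \Gamma^p_{ij}\partial_p$$
so
$$\nabla_k(\nabla_i\partial_j) = (\partial_k\Gamma^p_{ij})\partial_p + \Gamma^p_{ij}\Gamma^q_{kp}\partial_q$$
In particular, recall that
$$R^l_{ikj} = \partial_k\Gamma^l_{ij} - \partial_i\Gamma^l_{kj} + \Gamma\star\Gamma$$
so
$$\frac{\partial}{\partial t}R^l_{ikj} = \nabla_k\big(-\tfrac12\nabla_i(f)\delta^l_j - \tfrac12\nabla_j(f)\delta^l_i + \tfrac12 g_{ij}\nabla_p(f)g^{pl}\big) - (i \leftrightarrow k)$$
$$= -\tfrac12\nabla_k\nabla_j f\,\delta^l_i + \tfrac12 g_{ij}\nabla_k\nabla_p f\,g^{pl} + \tfrac12\nabla_i\nabla_j f\,\delta^l_k - \tfrac12 g_{kj}\nabla_i\nabla_p f\,g^{pl} \quad (5.60)$$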
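Then, for the Ricci tensor, we have (contracting $l$ with $k$, with $n = 2$)
$$\frac{\partial}{\partial t}R_{ij} = -\tfrac12\nabla_i\nabla_j f + \tfrac12 g_{ij}\Delta f + \tfrac12\nabla_i\nabla_j f\, n - \tfrac12\nabla_i\nabla_j f$$
$$= \tfrac12 g_{ij}\Delta f \quad (5.61)$$
But
$$\frac{\partial}{\partial t}R_{ij} = \frac{\partial}{\partial t}\big(\tfrac12 Rg_{ij}\big) = \tfrac12\frac{\partial R}{\partial t}g_{ij} + \tfrac12 R(-fg_{ij})$$
So
$$\frac{\partial R}{\partial t} = \Delta f + Rf$$
is the evolution equation for the scalar curvature.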

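Specialising to 2D Ricci flow, where $f = R$, we get
$$\frac{\partial R}{\partial t} = \Delta R + R^2$$
Consequently, the maximum principle gives a lower bound: at a spatial minimum of $R$ we have $\Delta R \ge 0$, so $\frac{d}{dt}R_{min} \ge R_{min}^2$, and comparison with the solution of the ODE $\frac{d\rho}{dt} = \rho^2$, $\rho(0) = R_{min}(0)$, yields
$$R_{min}(t) \ge \frac{R_{min}(0)}{1 - R_{min}(0)t} \ge -\frac{1}{t}$$
for as long as the flow exists. The last inequality is immediate when $R_{min}(0) \ge 0$, and when $R_{min}(0) = -a < 0$ it reads $\frac{a}{1+at} \le \frac{1}{t}$.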
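The comparison ODE behind this bound is easy to explore numerically. The following sketch (an illustration with hypothetical function names, not part of the original argument) integrates $\frac{d\rho}{dt} = \rho^2$ with a classical Runge-Kutta scheme, compares with the closed form $\rho(t) = \rho_0/(1 - \rho_0 t)$, and checks the uniform lower bound $-1/t$ for negative initial data.

```python
def rho_exact(rho0, t):
    # Solution of d(rho)/dt = rho^2 with rho(0) = rho0 (valid while 1 - rho0*t > 0).
    return rho0 / (1.0 - rho0 * t)

def rho_rk4(rho0, t_end, steps=10000):
    # Classical fourth-order Runge-Kutta integration of d(rho)/dt = rho^2.
    def f(r):
        return r * r
    h = t_end / steps
    rho = rho0
    for _ in range(steps):
        k1 = f(rho)
        k2 = f(rho + 0.5 * h * k1)
        k3 = f(rho + 0.5 * h * k2)
        k4 = f(rho + h * k3)
        rho += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return rho

if __name__ == "__main__":
    for rho0 in (-0.1, -5.0, -1000.0):
        for t in (0.01, 1.0, 50.0):
            print(rho0, t, rho_exact(rho0, t), -1.0 / t)
```

However negative the initial value is, the solution is bounded below by $-1/t$, which is why the bound on $R_{min}$ does not depend on the initial data.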
5.7.3 Normalisation
We compute the change in area A(t) of the surface M (t) under the 2DRF
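$$\frac{\partial}{\partial t}g_{ij} = -Rg_{ij}$$
Now,
$$\mathrm{Area} = A = \int_M \sqrt{\det(g)}$$
$$\frac{\partial}{\partial t}\det(g) = \det(g)\,\mathrm{tr}\Big(g^{-1}\frac{\partial g}{\partial t}\Big) = \det(g)(-2R)$$
$$\frac{\partial}{\partial t}\sqrt{\det(g)} = -R\sqrt{\det(g)}$$
Hence
$$\frac{d}{dt}A(g(t)) = -\int_M R\,d\mu(g(t))$$
Out of interest, recall the Gauss-Bonnet theorem: $\int_M R = 4\pi\chi(M)$.
But anyway,
$$\int_M R\,d\mu(g) = \int_M e^{-2u}(R_0 - 2\Delta_0 u)e^{2u}\,d\mu(g_0) = \int_M R_0\,d\mu(g_0)$$
so the rate of change in area is constant in time.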

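Hence if $A(0)$ is finite and $\int_M R_0\,d\mu(g_0) = 4\pi\chi(M) > 0$, there will exist a $T > 0$ such that $A(T) = 0$ (if instead $\chi(M) < 0$, the area grows linearly). So it is of
some interest to normalise our flow to fix the area.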
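As a concrete instance of the linear area decay, consider a round sphere of initial radius $r_0$. The sketch below is a worked example under the assumption that the sphere stays round under $\frac{\partial}{\partial t}g = -Rg$, so that with $R = 2/r^2$ the squared radius satisfies $\frac{d}{dt}r^2 = -2$; the function names are hypothetical.

```python
import math

def sphere_area(r0, t):
    # Round sphere under dg/dt = -R g: r(t)^2 = r0^2 - 2t, so A(t) = 4 pi (r0^2 - 2t).
    return 4.0 * math.pi * (r0 * r0 - 2.0 * t)

def total_curvature(r0, t):
    # Integral of R over the sphere: (2 / r^2) * (4 pi r^2) = 8 pi = 4 pi chi(S^2).
    r_sq = r0 * r0 - 2.0 * t
    return (2.0 / r_sq) * 4.0 * math.pi * r_sq

def extinction_time(r0):
    # A(t) = A(0) - 8 pi t vanishes at T = A(0) / (8 pi) = r0^2 / 2.
    return sphere_area(r0, 0.0) / (8.0 * math.pi)

if __name__ == "__main__":
    r0 = 1.5
    print("total curvature:", total_curvature(r0, 0.3), "vs 8*pi =", 8.0 * math.pi)
    print("extinction time:", extinction_time(r0))
```

The total curvature stays at $8\pi$ for all times before extinction, in agreement with Gauss-Bonnet, and the area reaches zero at $T = r_0^2/2$.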
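So define $\hat g(x,t) = g(x,t)\,\frac{A(g(0))}{A(g(t))}$. Notice that this implies that $\mathrm{Area}(\hat g(t)) = A(\hat g(0))$.
Now,
$$\frac{\partial}{\partial t}\hat g = -R(g)\,g\,\frac{A(0)}{A(t)} - \frac{g\,A(0)}{A(t)^2}\Big(-\int_M R\Big)$$
Define $r = \frac{\int_M R}{A(0)}$ (a constant).
Then
$$\frac{\partial}{\partial t}\hat g = -\frac{A(0)}{A}(Rg - r\hat g)$$
Note that $\hat R\hat g = Rg$, and define a time variable $\tau$ such that $\frac{\partial}{\partial\tau} = \frac{A}{A(0)}\frac{\partial}{\partial t}$. Then
$$\frac{\partial}{\partial\tau}\hat g = -(\hat R - r)\hat g$$
This is the normalised 2D Ricci Flow equation.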

Now, if we were to substitute f = R̂ − r in the computation before, we get the
evolution equation for the normalised curvature:
$$\frac{\partial}{\partial\tau}\hat R = \Delta\hat R + \hat R(\hat R - r)$$
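Special Case. Suppose $R(g_0) \le -\delta < 0$. Then observe that since
$$\frac{\partial}{\partial t}(\hat R - r) = \Delta(\hat R - r) + \hat R(\hat R - r),$$
by the maximum principle
(i) $\hat R \le -\delta$ is preserved,
(ii) $\frac{\partial}{\partial t}(\hat R_{max} - r) \le -\delta(\hat R_{max} - r)$, which implies that $\hat R_{max} - r \le Ce^{-\delta t}$,
(iii) $\|\hat R - r\| \le Ce^{-\delta t}$ for all $\delta \in (0, \|r\|)$,
(iv) $\frac{\partial}{\partial t}\hat u = -\tfrac12(\hat R - r)$ implies that $\|\hat u(x,t) - \hat u(x,s)\| \le Ce^{-\delta s}$ for $t \ge s$, which implies that
$\hat u(x,t) \to \hat u_\infty(x)$ in the $C^\infty$ norm, with $R(e^{2\hat u_\infty}g_0) = r$.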
5.7.4 Soliton Solutions

"Soliton" solutions are defined to be solutions which evolve without changing geometry (up to scaling). In particular, for such a solution, there exist diffeomorphisms
$\phi_t : M \to M$, $\phi_0 = \mathrm{id}$ such that
$$\hat g_{(\phi(x,t),t)}\Big(\frac{\partial\phi_t}{\partial x^i}(x,t), \frac{\partial\phi_t}{\partial x^j}(x,t)\Big) = \hat g_{(x,0)}(\partial_i, \partial_j)$$
In other words, $(\phi_t)^\star\hat g_t = \hat g_0$.

But this implies that
$$\frac{\partial}{\partial t}\hat g = L_{\frac{\partial\phi}{\partial t}}\,g = \nabla_iV_j + \nabla_jV_i$$
for some vector field $V$, where $L$ is the Lie derivative.

Hence
$$(R - r)g_{ij} + \nabla_iV_j + \nabla_jV_i = 0$$
A gradient soliton is one such that V can be realised as the gradient of a scalar
function f . But then ∇i Vj = ∇j Vi = ∇i ∇j f , so
(R − r)gij + 2∇i ∇j f = 0,
the Gradient Soliton equation. It turns out that the classification of solutions
of this equation is precisely what is required for the classification of prime three
manifolds, ie, to establish the geometrisation conjecture of Thurston. As a final
comment, note that if we contract the above, we get a related equation from which
we lose no information in the case of the 2D Ricci Flow:
(R − r) + ∆f = 0
It turns out that in n dimensions, the Gradient Ricci Soliton equation is
−2Rij = 2∇i ∇j f − λgij
5.7.5 Computing the "Variational Derivative" for the 2D Gradient Ricci Soliton equation
This computation is relevant later on when we consider the Perelman functional for
the n dimensional Ricci flow.
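Differentiate the 2D Gradient Ricci Soliton equation:
$$0 = \nabla_kR_{ij} + \nabla_k\nabla_i\nabla_jf - \nabla_iR_{kj} - \nabla_i\nabla_k\nabla_jf$$
$$= \nabla_kR_{ij} - \nabla_iR_{kj} + R^p_{kij}\nabla_pf \quad (5.62)$$
The trace of this expression is
$$0 = \nabla_kR - \nabla^pR_{kp} - R_{kp}\nabla^pf$$
Recall the second Bianchi identity:
$$\nabla_kR_{ijpq} + \nabla_iR_{jkpq} + \nabla_jR_{kipq} = 0$$
Taking the trace of this, we get
$$0 = \nabla_kR_{jq} - \nabla_jR_{kq} + \nabla^iR_{iqjk}$$
Taking the trace once more, we get
$$0 = \nabla_kR - \nabla^jR_{kj} - \nabla^jR_{kj}$$
ie $\nabla^pR_{kp} = \tfrac12\nabla_kR$.
If $\dim(M) = n = 2$, so that $R_{kp} = \tfrac12Rg_{kp}$, we conclude further that
$$0 = \tfrac12\nabla_kR - \tfrac12R\delta^p_k\nabla_pf$$
$$= \tfrac12(\nabla_kR - R\nabla_kf), \quad (5.63)$$
or
$$\nabla_k(Re^{-f}) = 0$$
$Re^{-f}$ is quite reminiscent of the general form of the integrand in the Perelman
functional (see later).
Now,
$$\nabla_k(\|\nabla f\|^2 + R) = 2\nabla^qf\,\nabla_k\nabla_qf + \nabla_kR$$
$$= (\nabla_kR - R\nabla_kf) + r\nabla_kf \quad (5.64)$$
since $\nabla_k\nabla_qf = -\tfrac12(R - r)g_{kq}$, and hence $2\nabla^qf\,\nabla_k\nabla_qf = -(R - r)\nabla_kf$.

Hence $\nabla_k(\|\nabla f\|^2 + R - rf) = 0$.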
5.7.6 $C^\infty$ convergence of the 2DRF

I prove that normalised 2DRF on the maximal interval $[0,\infty)$ converges in the $C^\infty$
norm to a limiting solution for the metric $g_\infty = e^{2u_\infty}g_0$, for $g(0,\cdot) = g_0(\cdot)$, provided
that $R_{max}$ is initially finite, ie the initial scalar curvature is bounded. This is a
model of proving smooth convergence of 3DRF solitons to a sensible limit.
So let g(t) be any solution of normalised 2DRF. At t = 0, let f be a solution of
∆f + (R − r) = 0.
For t > 0, define f to be the solution of
$$\frac{\partial f}{\partial t} = \Delta f + rf$$
Claim. If f is the solution of this equation, then (∆f + (R − r))(x, t) = 0 for all
x ∈ M , t ≥ 0.
Proof of Claim.
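Well certainly if $f$ satisfies this equation, then
$$\frac{\partial}{\partial t}(\Delta f + (R - r)) = \frac{\partial}{\partial t}\Delta f + \Delta R + R(R - r)$$
Now by definition,
$$\frac{\partial}{\partial t}\Delta f = \frac{\partial}{\partial t}\Big(\frac{1}{\sqrt{\det(g)}}\frac{\partial}{\partial x^i}\Big(\sqrt{\det(g)}\,g^{ij}\frac{\partial}{\partial x^j}f\Big)\Big)$$
Now $g = e^{2u}g_0$, so $\sqrt{\det(g)} = e^{2u}\sqrt{\det(g_0)}$, and $g^{-1} = e^{-2u}g_0^{-1}$.
Hence
$$\Delta f = e^{-2u}\Delta_{g_0}f$$
So
$$\frac{\partial}{\partial t}\Delta f = -(-(R - r))\Delta f + \Delta\frac{\partial f}{\partial t}$$
Then
$$\frac{\partial}{\partial t}(\Delta f + (R - r)) = \Delta(\Delta f + rf) + \Delta R + R(R - r) + (R - r)\Delta f$$
$$= \Delta(\Delta f + (R - r)) + R(\Delta f + (R - r)) \quad (5.65)$$
But then by the maximum principle, if $(\Delta f + (R - r))(x,0) = 0$, then
$(\Delta f + (R - r))(x,t) = 0$ for all $(x,t)$, which proves the claim.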
I now prove the convergence of normalised 2DRF to a sensible solution in the
case that r < 0. Essentially what we need to show is that R and all its derivatives
are bounded for all $x \in M$, $t \ge 0$. Then, by the Ascoli-Arzela theorem, we know
that as $t \to \infty$, there must be a subsequence of the $(M, g_t)_{t\in\mathbb{R}^+}$, say $(M, g_{t_k})_{k\in\mathbb{N}}$, that
converges to a smooth limit $(M, g_\infty)$. But then this limit must be unique because
the space of smooth metrics with bounded curvature is compact and Hausdorff.
We aim to bound R, since then bounds on its derivatives are automatic, by the
smoothing property of geometric evolution equations of heat type (see the original
discussion of the properties of the heat equation).
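Since
$$\frac{\partial}{\partial t}f = \Delta f + rf$$
we see by the maximum principle for parabolic PDE that $\|f\| \le Ce^{rt}$.

Now
$$\frac{\partial}{\partial t}\nabla_if = \nabla_i(\Delta f + rf)$$
$$= g^{kl}\nabla_k\nabla_i\nabla_lf + g^{kl}R^p_{ikl}\nabla_pf + r\nabla_if$$
$$= \Delta\nabla_if - \tfrac12R\nabla_if + r\nabla_if \quad (5.66)$$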
2
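Hence
$$\frac{\partial}{\partial t}\|\nabla f\|^2 = \frac{\partial}{\partial t}(g^{ij}\nabla_if\nabla_jf)$$
$$= 2\nabla^if\big(\Delta\nabla_if - \tfrac12R\nabla_if + r\nabla_if\big) + (R - r)\|\nabla f\|^2 \quad (5.67)$$
(Note that if $\frac{\partial}{\partial t}g_{ij} = \mu g_{ij}$ then $\frac{\partial}{\partial t}g^{ij} = -\mu g^{ij}$.)
Hence
$$\frac{\partial}{\partial t}\|\nabla f\|^2 = \Delta\|\nabla f\|^2 - 2\|\nabla^2f\|^2 - R\|\nabla f\|^2 + 2r\|\nabla f\|^2 + R\|\nabla f\|^2 - r\|\nabla f\|^2$$
$$= \Delta\|\nabla f\|^2 - 2\|\nabla^2f\|^2 + r\|\nabla f\|^2$$
$$\le \Delta\|\nabla f\|^2 + r\|\nabla f\|^2 \quad (5.68)$$
from which we conclude, again by the maximum principle for parabolic PDE,
that
$$\|\nabla f\|^2(x,t) \le Ce^{rt} \quad (5.69)$$
for all $x \in M$.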
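We wish to control $R$, however; this will require looking to higher derivatives of
$f$. So
$$\frac{\partial}{\partial t}(\|\nabla f\|^2 + R) = \Delta(\|\nabla f\|^2 + R) - 2\|\nabla^2f\|^2 + r\|\nabla f\|^2 + R(R - r)$$
Recall that $\Delta f + (R - r) = 0$ by hypothesis, so
$$\|\nabla^2f\|^2 \ge \tfrac12(\Delta f)^2 = \tfrac12(R - r)^2$$
Therefore
$$\frac{\partial}{\partial t}(\|\nabla f\|^2 + R) \le \Delta(\|\nabla f\|^2 + R) - (R - r)^2 + r(\|\nabla f\|^2 + R) + R(R - r) - rR$$
$$= \Delta(\|\nabla f\|^2 + R) + r(\|\nabla f\|^2 + R - r),$$
where the last step uses $-(R - r)^2 + R(R - r) - rR = -r^2$.
Hence
$$\frac{\partial}{\partial t}(\|\nabla f\|^2 + R - r) \le \Delta(\|\nabla f\|^2 + R - r) + r(\|\nabla f\|^2 + R - r)$$
So, once more by the maximum principle,
$$\|\nabla f\|^2 + R - r \le Ce^{rt},$$
and in particular,
$$R - r \le Ce^{rt}$$
since by our previous result $\|\nabla f\|^2 \le Ce^{rt}$, and by abuse of notation I am not
relabelling constants.
But we already know that $R - r \ge -Ce^{rt}$, since by the evolution equation for
the scalar curvature,
$$\frac{\partial}{\partial t}(R - r) = \Delta(R - r) + R(R - r)$$
$$= \Delta(R - r) + (R - r)^2 + r(R - r)$$
$$\ge \Delta(R - r) + r(R - r) \quad (5.70)$$
so by the maximum principle, $R - r \ge -Ce^{rt}$. We have now established that
$$\|R - r\| \le Ce^{rt} \quad (5.71)$$
which is what we needed to show.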

In the case that r = 0, we need to be a little bit more careful, but not overly so.
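Our equation for $f$ reduces to
$$\frac{\partial}{\partial t}f = \Delta f$$
so $\|f\| \le C$.
Like before,
$$\frac{\partial}{\partial t}(\|\nabla f\|^2 + R) \le \Delta(\|\nabla f\|^2 + R)$$
so $\|\nabla f\|^2 + R \le C$.
Now
$$\frac{\partial}{\partial t}(2t\|\nabla f\|^2 + f^2) \le 2\|\nabla f\|^2 + \Delta(2t\|\nabla f\|^2) + \Delta f^2 - 2\|\nabla f\|^2$$
$$= \Delta(2t\|\nabla f\|^2 + f^2) \quad (5.72)$$
so $2t\|\nabla f\|^2 + f^2 \le C$, and hence $\|\nabla f\|^2 \le C\min\{1, \frac1t\}$.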

Now
$$\frac{\partial}{\partial t}(2\|\nabla f\|^2 + R) \le \Delta(2\|\nabla f\|^2 + R) - 2R^2 + R^2$$
$$\le \Delta(2\|\nabla f\|^2 + R) - R^2 \quad (5.73)$$
I claim that $2\|\nabla f\|^2 + R \le \frac{C}{t}$, and hence that, putting together the previous
information we deduced, $R \le \frac{C}{t}$.
Now, we already know that $R \ge -\frac1t$ since $\frac{\partial R}{\partial t} = \Delta R + R^2$, so we get that $R$ is
controlled. We get bounds on $\|\nabla^kR\|$ as before. So normalised 2DRF will converge
to a sensible limiting solution $e^{2u_\infty}g_0$ for the metric.
In fact, we can conclude more than this. Since $\frac{\partial}{\partial t}u = -\tfrac12R$, and $\frac{\partial f}{\partial t} = \Delta f = -R$,
we get that $\frac{\partial}{\partial t}(2u - f) = 0$. But since $f$ is bounded, this means that $u$ must be
bounded. But if $u$ is bounded, we must get that $u \to u_\infty$ such that $R(e^{2u_\infty}g_0) = 0$.
Finally, in the case that $r > 0$, we can no longer use such basic arguments
and need to use more advanced tools. The convergence proofs for the cases $r < 0$ and
$r = 0$ are the analogues of the classification of the solutions to normalised 3DRF that
Hamilton and his contemporaries made in the 80s and early 90s. The case $r > 0$
with $R > 0$ initially requires the introduction of an "entropy" to prove convergence,
much like Perelman did for the 3DRF. In particular, one defines the quantity
$$Z = \int_M R\log(R) \quad (5.74)$$
and shows that it decreases with the time parameter under normalised 2DRF. In
particular, you find that $\frac{d}{dt}Z \le 0$, and $\frac{d}{dt}Z = 0$ if and only if $g$ is a gradient soliton.
Also, at least for 2DRF, you need to use a related Differential Harnack Inequality.
In particular, we have
Theorem 5.7.1. (Harnack Inequality for the 2DRF). For any solution of 2DRF
with R > 0, we have the following inequality:
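$$\frac{\partial R}{\partial t} - \frac{\|\nabla R\|^2}{R} + \frac{R}{t} \ge 0 \quad (5.75)$$
Proof. Recall the related Harnack Inequality for the heat equation:
If $\frac{\partial u}{\partial t} = \Delta u$ on $\mathbb{R}^n$, $u > 0$ then $\Delta u - \frac{\|\nabla u\|^2}{u} + \frac{nu}{2t} \ge 0$.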
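The heat-equation Harnack inequality is sharp on the fundamental solution, where the left hand side vanishes identically. The sketch below checks this numerically in dimension $n = 1$ using central differences; it is an illustration with hypothetical function names, not part of the proof.

```python
import math

def heat_kernel(x, t):
    # Fundamental solution of u_t = u_xx on the real line (n = 1).
    return math.exp(-x * x / (4.0 * t)) / math.sqrt(4.0 * math.pi * t)

def harnack_quantity(x, t, h=1e-3):
    # Laplacian(u) - |grad u|^2 / u + n u / (2 t) with n = 1,
    # using central differences in the space variable.
    u0 = heat_kernel(x, t)
    up = heat_kernel(x + h, t)
    um = heat_kernel(x - h, t)
    u_x = (up - um) / (2.0 * h)
    u_xx = (up - 2.0 * u0 + um) / (h * h)
    return u_xx - u_x * u_x / u0 + u0 / (2.0 * t)

if __name__ == "__main__":
    for x, t in [(0.0, 1.0), (0.7, 0.5), (-1.2, 2.0)]:
        print(x, t, harnack_quantity(x, t))
```

The printed values are zero up to discretisation error, which reflects the fact that $\Delta\log u = -\frac{n}{2t}$ for the heat kernel.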
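Now for a soliton:
$$0 = \nabla_i\nabla_jf + \tfrac12(R - r)g_{ij}$$
hence $\nabla_kR = R\nabla_kf$. So
$$\nabla_k\nabla_lR = \nabla_lR\,\nabla_kf + R\nabla_k\nabla_lf = \frac{\nabla_lR\,\nabla_kR}{R} - \tfrac12R(R - r)g_{kl}.$$
Hence
$$\frac{\partial R}{\partial t} - \frac{\|\nabla R\|^2}{R} = \Delta R - \frac{\|\nabla R\|^2}{R} + R(R - r) = 0$$
Compute:
$$\frac{\partial}{\partial t}\Big(\frac{\Delta R}{R} - \frac{\|\nabla R\|^2}{R^2} + R - r\Big) \ge \Delta\Big(\frac{\Delta R}{R} - \frac{\|\nabla R\|^2}{R^2} + R - r\Big) + \frac{\nabla R}{R}\cdot\nabla\Big(\frac{\Delta R}{R} - \frac{\|\nabla R\|^2}{R^2} + R - r\Big) + \Big(\frac{\Delta R}{R} - \frac{\|\nabla R\|^2}{R^2} + (R - r)\Big)^2$$
So we conclude by the maximum principle that
$$\frac{\Delta R}{R} - \frac{\|\nabla R\|^2}{R^2} + R - r \ge -\frac1t$$
The Harnack inequality for the 2DRF now is an easy consequence.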
To complete the argument of sensible convergence of the normalised 2DRF, we
use this inequality and the entropy to bound $R(t)$ above and below for all $t \ge 0$.
This gives bounds, as usual, on $\|\nabla^kR\|$, and using Ascoli-Arzela we conclude that
there is $C^\infty$ convergence of $u$ to some limit $u_\infty$, giving a metric $g_\infty = e^{2u_\infty}g_0$. The
stationarity of the entropy at this solution tells us that this will be a gradient Ricci
soliton.
In particular, $R_\infty$ will be constant and positive, so we conclude that $M$ is diffeomorphic to $S^2$ or $\mathbb{RP}^2$.
Of course this argument (for $r > 0$) requires that $R$ be initially positive everywhere in $M$. So we need a more general argument for when this is not the case, ie for
when there are points in $M$ initially such that $R < 0$ at these points, in particular,
that
$$R \ge -\frac1t$$
This requires us to modify the above notion of entropy and the Harnack inequal-
ities, but it is possible to make arguments analogous to the above to make this all
work.
5.8 The Ricci Flow

We now consider the n-dimensional Ricci flow.
5.8.1 Short time existence

I establish short time existence for the n-dimensional Ricci flow
$$\frac{\partial}{\partial t}g_{ij} = -2R_{ij}$$
Recall that
$$\frac{\partial}{\partial t}g_{ij} = A^{pqrs}_{ij}\partial_r\partial_sg_{pq} + \text{lower order terms}$$
We wish to show that this system is strongly parabolic, so that we can use
existence results for such equations (just use existence theory for the heat equation,
since all strongly parabolic equations are equivalent to the heat equation under an
appropriate reparametrisation). So, we wish to establish that
$$\langle A^{pqrs}_{ij}\xi_r\xi_s\eta_{pq}, \eta_{ij}\rangle \ge \epsilon\|\xi\|^2\|\eta\|^2$$
for any $\xi$, $\eta$, for some $\epsilon > 0$.

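Once again, remember that
$$\Gamma^k_{ij} = \tfrac12g^{kl}(\partial_ig_{jl} + \partial_jg_{il} - \partial_lg_{ij})$$
So
$$R^l_{ikj}\partial_l = \nabla_k(\nabla_i\partial_j) - \nabla_i(\nabla_k\partial_j)$$
$$= \nabla_k(\Gamma^l_{ij}\partial_l) - \nabla_i(\Gamma^l_{kj}\partial_l)$$
$$= (\partial_k\Gamma^l_{ij} - \partial_i\Gamma^l_{kj})\partial_l + \text{lower order terms}$$
$$= \tfrac12g^{lm}(\partial_k(\partial_ig_{jm} + \partial_jg_{im} - \partial_mg_{ij}) - \partial_i(\partial_kg_{jm} + \partial_jg_{km} - \partial_mg_{kj}))\partial_l + \text{lower order terms}$$
$$= \tfrac12g^{lm}(\partial_k\partial_jg_{im} - \partial_k\partial_mg_{ij} - \partial_i\partial_jg_{km} + \partial_i\partial_mg_{kj})\partial_l + \text{lower order terms} \quad (5.76)$$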
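Hence
$$R_{ij} = -\tfrac12g^{kl}\partial_k\partial_lg_{ij} + \tfrac12\partial_j(g^{kl}\partial_kg_{il}) + \tfrac12\partial_i(g^{kl}\partial_kg_{jl}) - \tfrac12g^{kl}\partial_i\partial_jg_{kl} + \text{lower order terms}$$
$$= -\tfrac12g^{kl}\partial_k\partial_lg_{ij} + \partial_i\big(\tfrac12g^{kl}\partial_kg_{jl} - \tfrac14g^{kl}\partial_jg_{kl}\big) + \partial_j\big(\tfrac12g^{kl}\partial_kg_{il} - \tfrac14g^{kl}\partial_ig_{kl}\big) + \text{lower order terms} \quad (5.77)$$
Hence, recalling that $R_{ij}$ is a tensor, so that our calculation is the same in all
coordinate systems, we get that
$$-2R_{ij} = g^{kl}\partial_k\partial_lg_{ij} + \nabla_iX_j + \nabla_jX_i + \text{lower order terms},$$
where $X_i = \tfrac12g^{kl}\partial_ig_{kl} - g^{kl}\partial_kg_{il}$.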

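It is possible to reparametrise this expression such that $X = 0$. Then $A^{pqrs}_{ij} = g^{rs}\delta^{pq}_{ij}$, and so
$$g^{ij}g^{rs}\delta^{pq}_{ij}\xi_r\xi_s\eta_{pq}\eta_{ij} = \|\xi\|^2\|\eta\|^2$$
Hence the reparametrised flow is strongly parabolic, and short time existence follows. (Without the reparametrisation, the Ricci flow itself is only weakly parabolic, because of the terms in $X$.)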
5.8.2 Evolution of the Curvature

It is possible to determine how the Riemann curvature tensor evolves along a one
parameter family of metrics on a fixed space M .
In particular, let us compute
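$$\frac{\partial}{\partial t}R^l_{ikj} = \nabla_k(\partial_t\Gamma^l_{ij}) - \nabla_i(\partial_t\Gamma^l_{kj})$$
$$= \nabla_k\Big(\tfrac12g^{lm}\Big(\nabla_i\frac{\partial}{\partial t}g_{jm} + \nabla_j\frac{\partial}{\partial t}g_{im} - \nabla_m\frac{\partial}{\partial t}g_{ij}\Big)\Big) - (i \leftrightarrow k)$$
$$= \nabla_k\nabla_lR_{ij} - \nabla_k\nabla_iR_{jl} - \nabla_k\nabla_jR_{il} - \nabla_i\nabla_lR_{kj} + \nabla_i\nabla_kR_{jl} + \nabla_i\nabla_jR_{kl} \quad (5.78)$$
(Here the commutator $\nabla_i\nabla_kR_{jl} - \nabla_k\nabla_iR_{jl}$ is a term of the form $R \star R$.)
Recall the second Bianchi identity:
$$0 = \nabla_iR_{jklm} + \nabla_jR_{kilm} + \nabla_kR_{ijlm}$$
From which follows the relation
$$0 = \nabla_lR_{ij} - \nabla_iR_{lj} + \nabla^kR_{lijk}$$
So using this relation, we get that:
$$\frac{\partial}{\partial t}R^l_{ikj} = -\nabla_k\nabla^pR_{ljip} + \nabla_i\nabla^pR_{ljkp} + R \star R$$
$$= \nabla^p(\nabla_iR_{ljkp} - \nabla_kR_{ljip}) + R \star R$$
$$= \nabla^p(\nabla_iR_{kplj} + \nabla_kR_{pilj}) + R \star R$$
$$= -\nabla^p\nabla_pR_{iklj} + R \star R \quad \text{(using the second Bianchi identity once more)}$$
$$= \Delta R_{ikjl} + R \star R \quad (5.79)$$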
5.8.3 The Structural Tensor E

The structural tensor E takes the same role for general 3 manifolds undergoing Ricci
Flow that the second fundamental form A takes for hypersurfaces undergoing MCF.
Hence it has a great deal of importance. In this section I will proceed to define this
object and derive the evolution equation that it obeys.
Let $(M^3, g)$ be orientable. Let $\mu(u,v,w)$ be the signed volume of the parallelepiped generated by $u$, $v$, and $w$. $\mu$ is of course the alternating tensor; in an oriented orthonormal frame, $\mu_{ijk} = 1$ if $(i,j,k)$ is an even permutation of $(1,2,3)$, $-1$ if $(i,j,k)$ is an odd
permutation and $0$ otherwise.
Define $E^{ij} = \tfrac14\mu^{iab}\mu^{jcd}R_{abcd}$, so $R_{abcd} = \mu_{abi}\mu_{cdj}E^{ij}$ (this follows from the identity
$g^{ij}\mu_{iab}\mu_{jcd} = g_{ac}g_{bd} - g_{ad}g_{bc}$).
The eigenvalues of $E$ are called the principal sectional curvatures. In a frame
which diagonalises $E$, we get that
$$E = \begin{pmatrix} \lambda & 0 & 0 \\ 0 & \mu & 0 \\ 0 & 0 & \nu \end{pmatrix}$$
with $R_{1212} = \nu$, $R_{1313} = \mu$, and $R_{2323} = \lambda$.

In particular, if $\|v\| = 1$, then $E(v,v) = \mathrm{Sect}(v^\perp)$, which gives us some form of
intuition for what $E$ is measuring.
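The relation between $E$ and the curvature tensor can be checked by brute force in an orthonormal frame. The sketch below (an illustration with hypothetical function names, using indices 0, 1, 2 in place of 1, 2, 3) builds $R_{abcd} = \mu_{abi}\mu_{cdj}E^{ij}$ from a diagonal $E$, reads off the sectional curvatures, and recovers $E$ through $E^{ij} = \tfrac14\mu^{iab}\mu^{jcd}R_{abcd}$.

```python
from itertools import product

def levi_civita(i, j, k):
    # Alternating symbol on indices 0, 1, 2: +1 / -1 for even / odd permutations, else 0.
    return (i - j) * (j - k) * (k - i) // 2

def riemann_from_E(E):
    # R_{abcd} = mu_{abi} mu_{cdj} E^{ij} in an orthonormal frame.
    R = {}
    for a, b, c, d in product(range(3), repeat=4):
        R[a, b, c, d] = sum(
            levi_civita(a, b, i) * levi_civita(c, d, j) * E[i][j]
            for i in range(3) for j in range(3))
    return R

def E_from_riemann(R):
    # E^{ij} = (1/4) mu^{iab} mu^{jcd} R_{abcd}.
    return [[0.25 * sum(levi_civita(i, a, b) * levi_civita(j, c, d) * R[a, b, c, d]
                        for a, b, c, d in product(range(3), repeat=4))
             for j in range(3)] for i in range(3)]

if __name__ == "__main__":
    lam, mu, nu = 3.0, 2.0, 1.0
    E = [[lam, 0.0, 0.0], [0.0, mu, 0.0], [0.0, 0.0, nu]]
    R = riemann_from_E(E)
    print("R_1212 =", R[0, 1, 0, 1], " R_1313 =", R[0, 2, 0, 2], " R_2323 =", R[1, 2, 1, 2])
    print("recovered E =", E_from_riemann(R))
```

The factor $\tfrac14$ in the definition of $E$ compensates for the two antisymmetric index pairs, each of which is counted twice in the contraction.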
I now proceed to determine the evolution equation for the components of the
structural tensor. Now, we know that
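$$\frac{\partial}{\partial t}R_{ijkl} = \Delta R_{ijkl} + R \star R$$
Furthermore, we have that
$$\frac{\partial}{\partial t}\mu_{ijk} = -R\mu_{ijk}$$
and hence that
$$\frac{\partial}{\partial t}\mu^{ijk} = R\mu^{ijk}$$
Hence
$$\frac{\partial}{\partial t}E_i^j = \Delta E_i^j + 2E_i^kE_k^j + \mu_{iab}\mu^{jcd}E_c^aE_d^b$$
which is the evolution equation for the structural tensor. This is very useful in
the 3DRF.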
5.8.4 Convergence Results for Ricci Flow

The following theorem, due to Hamilton in 1982, is a fairly significant result in the
development of the Ricci flow:
Theorem 5.8.1. If $(M^3, g_0)$, $M$ compact, has $\mathrm{Ricci}(g_0) > 0$ then there exists a
solution of Ricci flow starting from $g_0$ on a maximal time interval $[0,T)$. Furthermore, $\mathrm{Vol}(g(t)) \to 0$ as $t \to T$, and the normalised Ricci flow converges smoothly
to a metric of constant positive sectional curvature, that is
$$\frac{(\mathrm{Vol}(S^3))^{2/3}\,g(t)}{(\mathrm{Vol}(g(t)))^{2/3}} \to g_T$$
with $\mathrm{Sect}(g_T) = \text{constant} > 0$.
Proof. I will prove this later.
In fact, we have a more general result due to Böhm and Wilking. Before I
mention it, however, I should introduce the curvature operator. The curvature
operator R̄ is a bilinear form on Λ2 T M , which is defined by the relation
R̄(aij ∂i ∧ ∂j , bkl ∂k ∧ ∂l ) = aij bkl Rijkl
In particular, R̄(∂i ∧ ∂j , ∂i ∧ ∂j ) = Sect(∂i ∧ ∂j ) if ∂i and ∂j are orthonormal.

Remark. If n ≥ 4 there are elements of Λ2 T M which are not of the form u ∧ v for
any u, v ∈ T M . Furthermore, if R̄ ≥ 0 we may conclude Sect ≥ 0, but not vice
versa for the aforementioned reason.
So now, the result:
Theorem 5.8.2. (Böhm and Wilking, 2006). If $(M^n, g_0)$ is compact, and such
that the sum of the smallest two eigenvalues of the curvature operator $\bar R(x)$ is
positive, for all $x \in M$, then $M^n$ converges under Ricci flow to a manifold of constant
curvature (modulo scaling).
Remark. Note that having the sum of the smallest two eigenvalues positive is closely related to
the notion of 2-convexity, which is the main notion used in the Huisken-Sinestrari
result (which I will mention later on). Essentially for a hypersurface to be 2-convex
means that the sum of its smallest two principal curvatures is positive everywhere.
Proof. (Böhm and Wilking, sketch). In order to prove this, we would like an ana-
logue of the notion of the structural tensor for n dimensional manifolds. So, given
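any basis $\{\Pi_\alpha : \alpha = 1, \cdots, \frac{n(n-1)}{2}\}$ for $\Lambda^2T_xM$ we get (writing $\bar R_{\alpha\beta} = \bar R(\Pi_\alpha, \Pi_\beta)$)
$$\frac{\partial}{\partial t}\bar R_\alpha^\beta = \Delta\bar R_\alpha^\beta + 2\big(\bar R_\alpha^\gamma\bar R_\gamma^\beta + C_\alpha^{\gamma\delta}C^{\beta\xi\tau}\bar R_{\xi\gamma}\bar R_{\tau\delta}\big)$$
The $C_{\alpha\beta\gamma}$ are structure constants taking the analogous role to the structural
tensor $E_{ij}$ as I defined before, and are defined by
$$[\Pi_\alpha, \Pi_\beta] = C_{\alpha\beta\gamma}G^{\gamma\delta}\Pi_\delta,$$
where $G$ is the metric on $\Lambda^2TM$.

The problem is then, like with the Huisken rescaling result of MCF, to find
$SO(n)$ invariant convex subsets $\Omega$ of $\mathrm{Sym}(\Lambda^2TM)$ such that the flow lines of the
ODE corresponding to
$$\frac{\partial}{\partial t}\bar R = \bar R^2 + \bar R \star \bar R$$
never leave $\Omega$.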
5.8.5 The Perelman Functional; connection with the Fisher Information
It turns out that the Ricci Flow can be realised as the steepest descent flow of the
following functional
$$F(g,f) = \int_M (R_g + \|\nabla f\|_g^2)e^{-f}\,d\mu(g)$$
Note that
$$\frac{d}{dt}F = -\int (R_{ij} + \nabla_i\nabla_jf)\dot g_{ij}e^{-f}\,d\mu + \int (2\Delta f - \|\nabla f\|^2 + R)\big(\tfrac12g^{ij}\dot g_{ij} - \dot f\big)e^{-f}\,d\mu$$
Fix a smooth measure $d\xi$ and given $g$, define $f(g)$ by $e^{-f}d\mu(g) = d\xi$.
Then the variation of $F_\xi(g) = F(g, f(g))$ is
$$\frac{d}{dt}F_\xi = -\int (R_{ij} + \nabla_i\nabla_jf)\dot g_{ij}\,d\xi$$
ie steepest descent flow is
$$\frac{\partial}{\partial t}g_{ij} = -2(R_{ij} + \nabla_i\nabla_jf)$$
which is nothing other than the Ricci flow with reparametrisation.

Note: The following requires some reference to the section on Fisher Information
and Physical Manifolds.
Now note that the Physical Information Functional for a sharp Riemannian
manifold $M$ is
$$K = \int_M (R_{\hat g} - \|\bar\psi\|_{\hat g}^2)\,d\mu(\hat g)$$
Suppose $\bar\psi$ is an irrotational vector field, in other words,
$$\bar\psi_i = \nabla_i\hat f$$
for some function $\hat f$.

Then
$$K = \int_M (R_{\hat g} - \|\nabla\hat f\|_{\hat g}^2)\,d\mu(\hat g)$$
Now consider a diffeomorphism of $M$ that sends $\nabla_{\hat g}$ to $e^{-f}\nabla_g$ for some function $f$.

Then $R_{\hat g} = R_ge^{-2f}$, $\|\nabla\hat f\|_{\hat g}^2 = \|\nabla\hat f\|_g^2e^{-2f}$, $d\mu(\hat g) = e^fd\mu(g)$, and finally
$$K = \int_M (R_g - \|\nabla\hat f\|_g^2)e^{-f}\,d\mu(g)$$
So if we make a natural choice of flow on $M$, ie $\hat f = f$, we recover the Perelman
functional. In other words, crudely speaking,
Normalised Ricci Flow can be viewed as the steepest descent flow of the ”Physi-
cal” Information.
Perhaps more intuitively, if we consider the Fisher Information Functional for a
sharp Riemannian manifold, we get
$$I = \int_M R(g)\,d\mu(g)$$
In particular, for an arbitrary variation of the metric,
$$\frac{dI}{dt} = \int_M R_{ij}(g)\frac{\partial}{\partial t}g^{ij}\,d\mu(g),$$
from which it is easily deduced that the steepest descent flow is
$$\frac{\partial}{\partial t}g_{ij} = -R_{ij}(g).$$
So we may conclude:
Ricci Flow is the steepest descent flow of the Fisher Information for a sharp
Riemannian manifold.
This makes sense- for it is clear that, under the Ricci flow, we are losing infor-
mation about the manifold. Moreover, later when I discuss Ricci flow with surgery,
it is obvious that we are losing information, because we are losing topology.
5.9 Proof of Hamilton’s Theorem (1982)

This is a typeset version of a proof of Hamilton’s theorem given by Nick Sheridan
at the AMSI winter school.
5.9.1 The theorem, and sketch of its proof

We have the main result of this section:
Theorem 5.9.1. (Hamilton). Let M 3 be a compact 3 manifold which admits a
Riemannian metric of positive Ricci curvature. Then M 3 also admits a metric of
positive sectional curvature.
Proof. (sketch). We shall follow the following procedure to prove this:
(1) Prove short time existence of the Ricci Flow on $M^3$ starting with a metric $g_0$
with $\mathrm{Ric}(g_0) > 0$.
(2) Show curvature blows up as $t \to T$.
(3) Show sectional curvatures get close together as the curvature gets large.
(4) Rescale time and the metric to get a solution to the equation $\frac{\partial g_{ij}}{\partial t} = -2R_{ij} + \frac2n rg_{ij}$,
where $r = \frac{\int_M R\,dV}{\int_M dV}$. Show furthermore that this solution exists for all time and
converges to a metric of constant curvature.
5.9.2 A maximum principle

We have the following maximum principle for sections of an arbitrary vector bundle
which will prove most useful:
Proposition 2. Let $\Pi : \xi \to M^n$ be a vector bundle, with bundle metric $h$. Let $\bar\nabla(t)$
be a family of connections on $\xi$ compatible with $h$. Furthermore, suppose
$$F : \xi \times [0,T) \to \xi$$
is a continuous fibre preserving map which is Lipschitz on each fibre. Let $\kappa$ be a closed
subset of $\xi$ which is invariant under parallel translation by $\bar\nabla(t)$, and such that
$\kappa_x = \kappa \cap \Pi^{-1}(x)$ is closed and convex in $\xi_x$ for all $x \in M^n$. Finally, suppose $\alpha$ is a
solution of
$$\frac{\partial}{\partial t}\alpha = \hat\Delta\alpha + F(\alpha)$$
such that $\alpha(0) \in \kappa$. Then if for each fibre $\xi_x$ every solution of
$$\frac{da}{dt} = F(a)$$
with $a(0) \in \kappa_x$ remains in $\kappa_x$, we may conclude that the solution $\alpha(t)$ of the
PDE remains in $\kappa$.
5.9.3 The Uhlenbeck Trick

(This trick is used to simplify the evolution equation for Rijkl .)
So suppose $g(t)$ is a solution to Ricci flow on $M^n$. Let $(V,h)$ be a vector bundle
over $M^n$ with $h$ as metric such that
$$U_0 : (V,h) \to (TM^n, g_0)$$
is a bundle isometry. Evolve $U(t)$ by
$$\frac{\partial}{\partial t}U_a^i = R_l^iU_a^l$$
where $U_a^i$ are components of the isometry with respect to some bases for $V$ and
$TM^n$. Then I claim that $U(t)$ remains an isometry from $(V,h)$ to $(T(M), g_t)$ and
we can consider the behaviour of the pullback of the Riemann tensor $\mathrm{Rm}$ on $M$ to
$V$, $U^\star\mathrm{Rm}$, rather than $\mathrm{Rm}$.
Proof of Claim.
$$\frac{\partial}{\partial t}(U^\star g_t)_{ab} = \frac{\partial}{\partial t}(g_{ij}U_a^iU_b^j)$$
$$= \Big(\frac{\partial}{\partial t}g_{ij}\Big)U_a^iU_b^j + g_{ij}\Big(\frac{\partial}{\partial t}U_a^i\Big)U_b^j + g_{ij}U_a^i\frac{\partial}{\partial t}U_b^j$$
$$= -2R_{ij}U_a^iU_b^j + g_{ij}R_p^iU_a^pU_b^j + g_{ij}U_a^iR_q^jU_b^q$$
$$= 0 \quad (5.80)$$
In other words, $U$ remains an isometry.

Why do we want to do this? Well, first of all, the evolution equation for U ? Rm
is much nicer. Computing:
$$\frac{\partial}{\partial t}R_{abcd} = \Delta R_{abcd} + (R \star R)_{abcd}$$
$$= \Delta R_{abcd} + 2(B_{abcd} - B_{abdc} + B_{acbd} - B_{adbc}) \quad (5.81)$$
where Babcd (Q) = −Qef ab Qfedc for a 4-tensor Q.

We can in fact view Rm as a section of the bundle ξ = Λ2 T ? M n ⊗S Λ2 T ? M n ,
since in 3 dimensions each fibre is the same as the vector space of 3 by 3 symmetric
matrices.
Our relevant ODE for this bundle is
$$\frac{d}{dt}Q_{abcd} = 2(B_{abcd}(Q) - B_{abdc}(Q) + B_{acbd}(Q) - B_{adbc}(Q)) = (R \star R)_{abcd}(Q)$$
In 3 dimensions, this ODE can be written with respect to a basis $\{e_i\}$ of $T^\star M_x^3$
such that $Q$ is diagonal; in particular, since $E$ is the curvature operator (essentially)
in dimension 3, we can write everything in terms of the eigenvalues of $E$ and we get
$$\frac{d}{dt}E = \frac{d}{dt}\begin{pmatrix} \lambda & 0 & 0 \\ 0 & \mu & 0 \\ 0 & 0 & \nu \end{pmatrix} = \begin{pmatrix} \lambda^2 + \mu\nu & 0 & 0 \\ 0 & \mu^2 + \nu\lambda & 0 \\ 0 & 0 & \nu^2 + \lambda\mu \end{pmatrix}$$
Initial values tell us all about the initial Ricci and scalar curvatures
$$\mathrm{Ric} = \frac12\begin{pmatrix} \mu + \nu & 0 & 0 \\ 0 & \nu + \lambda & 0 \\ 0 & 0 & \lambda + \mu \end{pmatrix}$$
and $R = \lambda + \mu + \nu$.
From now on we assume λ(0) ≥ µ(0) ≥ ν(0).
5.9.4 Core of the Argument

We have the following
Lemma 5.9.2. If ∃C < ∞ such that λ < C(µ + ν) at t = 0, then this condition is
preserved under the Ricci flow.
Proof. We will use the maximum principle from before, for the set
κ = {Q ∈ ξ : λ(Q) − C(µ(Q) + ν(Q)) ≤ 0}
Note that this set is invariant under parallel translation. To see that κ is convex,
note that
$$\lambda(Q) - C(\nu(Q) + \mu(Q)) = \max_{\|u\|=1}Q(u,u) + \max_{(V,W\,|\,\langle V,W\rangle = 0)}\{-C(Q(V,V) + Q(W,W))\}$$
(from the definition of $E$).
But $\max_{\|u\|=1}(\alpha Q_1 + \beta Q_2)(u,u) \le \alpha\max_{\|u\|=1}Q_1(u,u) + \beta\max_{\|u\|=1}Q_2(u,u)$.
This establishes convexity.
It just remains to show that the solution of the associated ODE stays in $\kappa$. This
follows from the fact that
$$\frac{d}{dt}\log\Big(\frac{\lambda}{\mu + \nu}\Big) = \frac{\mu^2(\nu - \lambda) + \nu^2(\mu - \lambda)}{\lambda(\mu + \nu)} \le 0$$
as can easily be proved from the evolution equation for $\lambda$, $\mu$, $\nu$ from before. Thus
by the maximum principle, the solution to the PDE remains in $\kappa$.
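The monotonicity of $\lambda/(\mu + \nu)$ along the fibre ODE $\frac{d\lambda}{dt} = \lambda^2 + \mu\nu$, $\frac{d\mu}{dt} = \mu^2 + \nu\lambda$, $\frac{d\nu}{dt} = \nu^2 + \lambda\mu$ can be observed numerically. The sketch below is an illustration only: the initial values and the integration window (chosen well before the ODE blows up) are arbitrary assumptions, and the function names are hypothetical.

```python
def rhs(state):
    # Right hand side of the eigenvalue ODE for (lambda, mu, nu).
    lam, mu, nu = state
    return (lam * lam + mu * nu, mu * mu + nu * lam, nu * nu + lam * mu)

def rk4_step(state, h):
    # One classical fourth-order Runge-Kutta step.
    def shifted(k, c):
        return tuple(s + c * ki for s, ki in zip(state, k))
    k1 = rhs(state)
    k2 = rhs(shifted(k1, 0.5 * h))
    k3 = rhs(shifted(k2, 0.5 * h))
    k4 = rhs(shifted(k3, h))
    return tuple(s + h * (a + 2.0 * b + 2.0 * c + d) / 6.0
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def pinching_history(state0, t_end, steps):
    # Values of lambda / (mu + nu) at each time step, starting from state0.
    h = t_end / steps
    state = state0
    history = [state[0] / (state[1] + state[2])]
    for _ in range(steps):
        state = rk4_step(state, h)
        history.append(state[0] / (state[1] + state[2]))
    return history

if __name__ == "__main__":
    ratios = pinching_history((3.0, 2.0, 1.0), 0.1, 1000)
    print("initial ratio:", ratios[0], " final ratio:", ratios[-1])
```

With $\lambda \ge \mu \ge \nu > 0$ initially, the ratio decreases from its initial value, so any bound $\lambda \le C(\mu + \nu)$ that holds at time zero continues to hold along the ODE.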
The next result is
Theorem 5.9.3. If (M 3 , g0 ) is a closed Riemannian 3 manifold with Ric(g0 ) > 0

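then $\exists\, 0 < \delta < 1$ and $\bar C > 0$ (depending only on $g_0$) such that
$$\frac{\lambda - \nu}{\lambda + \mu + \nu} \le \frac{\bar C}{(\lambda + \mu + \nu)^\delta}$$
Remark. Note that this theorem implies that
$$\frac{(\lambda - \mu)^2 + (\mu - \nu)^2 + (\nu - \lambda)^2}{(\lambda + \mu + \nu)^2} \le C(\lambda + \mu + \nu)^{-\delta}$$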
In particular, as the curvature gets large, as it will under the Ricci flow (recall
R = λ + µ + ν), this shows that eigenvalues get ”pinched” together. But the left
hand side is scale invariant, so it follows that the limit of normalised Ricci flow, g∞ ,
has constant sectional curvature, and we are done.
Proof. Note that it suffices to show that
$$\frac{\lambda - \nu}{\mu + \nu} \le \frac{\bar C}{(\mu + \nu)^\delta}$$
Since $\mu + \nu > 0$ everywhere at time 0 by the $\mathrm{Ric} > 0$ condition, we may by
compactness choose such a $\bar C$ and a $\delta$. We now demonstrate that this condition is
preserved using the maximum principle.
Define
$$\kappa = \{Q \in \xi : (\lambda(Q) - \nu(Q)) - \bar C(\mu(Q) + \nu(Q))^{1-\delta} \le 0\}$$
To show that the solution to the ODE stays in $\kappa$ compute
$$\frac{d}{dt}\log\Big(\frac{\lambda - \nu}{(\mu + \nu)^{1-\delta}}\Big) \le \delta\lambda - \tfrac12(1 - \delta)(\nu + \mu)$$
By the lemma, it is possible to choose $\delta = \delta(C)$ small so that this always is non
positive, so that $\frac{\lambda - \nu}{(\mu + \nu)^{1-\delta}}$ is non increasing, so that the inequality is preserved by the
ODE.
Thus, by the maximum principle, it is also preserved by the PDE. This completes
the proof of Hamilton’s theorem.
5.10 The Huisken-Sinestrari theorem; surgery and classification of canonical singularities for MCF
This is a typeset version, with some modifications, of a particularly instructive lec-
ture given by Gerard Huisken at the University of Melbourne, in which he motivated
much of the surgery procedure and the general considerations of the classification
program for 3DRF by considering MCF. Much of the techniques and results he and
his collaborator Sinestrari used (and proved) are completely analogous to related
results for the full blown Ricci Flow. In this section I shall outline his argument,
providing clarification wherever I view it appropriate.
5.10.1 Preliminaries, including statement of the theorem

Let $M$ be a hypersurface evolving according to MCF. Let $A = \{h_{ij}\}$ be its second
fundamental form, with eigenvalues (principal curvatures) $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$.
Then we say that $M^n$ is 2-convex if $\lambda_1 + \lambda_2 > 0$ everywhere on $M^n$ (this is a weaker
criterion than convexity, which would require $\lambda_1 \ge 0$ everywhere).
For a couple of examples, note that $S^{n-1} \times \mathbb{R}$ is 2-convex, but $S^{n-2} \times \mathbb{R}^2$ has
$\lambda_1 + \lambda_2 = 0$. For instance, a thin neck with cross section $S^{n-1}$ with two bulbs on
each end is allowed (see the neckpinch in Canonical Singularities) as a hypersurface,
but the analogous picture with cross section $S^{n-2}$ is not.
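The two examples can be made concrete by listing principal curvatures. In the sketch below (hypothetical helper names; the curvature values assume standard round cylinders of a given radius embedded in $\mathbb{R}^{n+1}$), each flat direction contributes a zero principal curvature and each sphere direction contributes $1/\mathrm{radius}$.

```python
def cylinder_principal_curvatures(n, flat_dims, radius=1.0):
    # Principal curvatures of S^{n - flat_dims}_radius x R^{flat_dims} in R^{n+1}:
    # zero along each flat direction, 1/radius along each sphere direction.
    return sorted([0.0] * flat_dims + [1.0 / radius] * (n - flat_dims))

def is_two_convex(curvatures):
    # 2-convexity at a point: the two smallest principal curvatures have positive sum.
    lam = sorted(curvatures)
    return lam[0] + lam[1] > 0.0

if __name__ == "__main__":
    n = 3
    print("S^2 x R  :", is_two_convex(cylinder_principal_curvatures(n, 1)))
    print("S^1 x R^2:", is_two_convex(cylinder_principal_curvatures(n, 2)))
```

For $n = 3$ the first cylinder has curvatures $(0, 1, 1)$ and is 2-convex, while the second has curvatures $(0, 0, 1)$ and fails the strict inequality.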
This definition might seem somewhat arbitrary, but note that for $n = 3$, if
$R = (\lambda_1\lambda_2 + \cdots + \lambda_{n-1}\lambda_n) = \tfrac12(H^2 - \|A\|^2) > 0$, then this implies 2-convexity. If
$n = 4$, then if the manifold $M$ has the positive isotropic curvature of Hamilton, this
once again implies 2-convexity. So this suggests that this might be a reasonable
notion to study. In fact, we have the following result:
Theorem 5.10.1. (Huisken-Sinestrari). If $M^n$, $n \ge 3$ is 2-convex, then either
$M^n$ is diffeomorphic to $S^n$ or diffeomorphic to a finite connected sum of copies of
$S^{n-1} \times S^1$. If $M = \partial\Omega$, $\Omega \subset \mathbb{R}^{n+1}$, then either $\Omega$ is diffeomorphic to $B_1^{n+1}(0)$ or to
a finite connected sum of $B_1^n(0) \times S^1$.
Corollary 5.10.2. If $n \ge 3$, $M^n$ 2-convex and simply connected, then $M^n \equiv S^n$
and $\bar\Omega \equiv \bar B_1^{n+1}(0)$.
Proof. (sketch). The idea as to how to prove this is to use the mean curvature flow.
So, given $F_0 : M^n \to \mathbb{R}^{n+1}$, solve
$$\frac{d}{dt}F(p,t) = -H \cdot \nu(p,t) = \Delta_t(F(p,t))$$
subject to the initial condition $F(p,0) = F_0(p)$. This is a weakly parabolic
system.
We also have importantly that 2-convexity is preserved by MCF.
5.10.2 Canonical Singularities

If we perform MCF, the flow will continue until it reaches a singularity. For instance,
if $M^n = S^n_{R_0}$, then $R(t) = \sqrt{R_0^2 - 2nt}$, with the solution degenerating to a point at
$t = T = \frac{R_0^2}{2n}$.
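The shrinking sphere is governed by a single ODE: a round $n$-sphere of radius $R$ has mean curvature $H = n/R$, so under MCF $\frac{dR}{dt} = -\frac{n}{R}$. The sketch below (hypothetical function names, with an arbitrary choice of $n$ and $R_0$ in the usage example) integrates this ODE numerically and compares with the closed form $R(t) = \sqrt{R_0^2 - 2nt}$.

```python
import math

def sphere_radius_exact(r0, n, t):
    # Closed form solution of dR/dt = -n/R with R(0) = r0.
    return math.sqrt(r0 * r0 - 2.0 * n * t)

def extinction_time(r0, n):
    # The sphere shrinks to a point at T = r0^2 / (2n).
    return r0 * r0 / (2.0 * n)

def sphere_radius_rk4(r0, n, t_end, steps=2000):
    # Classical fourth-order Runge-Kutta integration of dR/dt = -n/R.
    def f(r):
        return -n / r
    h = t_end / steps
    r = r0
    for _ in range(steps):
        k1 = f(r)
        k2 = f(r + 0.5 * h * k1)
        k3 = f(r + 0.5 * h * k2)
        k4 = f(r + h * k3)
        r += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return r

if __name__ == "__main__":
    r0, n = 1.0, 2
    print("extinction time:", extinction_time(r0, n))
    print("numerical:", sphere_radius_rk4(r0, n, 0.2), " exact:", sphere_radius_exact(r0, n, 0.2))
```

The numerical integration should only be run up to times strictly before $T$, since the right hand side of the ODE is singular at $R = 0$.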
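Similarly, starting with the manifold $S^{n-m}_{R_0} \times \mathbb{R}^m$, only the $n - m$ sphere directions contribute to the mean curvature, so the radius of the sphere factor is $R(t) =
\sqrt{R_0^2 - 2(n-m)t}$. In particular, we have the following result, due to Huisken: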
Positive Case. If $M^n$ has $\lambda_1 > 0$ (convex) then $M_t^n$ contracts smoothly to a
round point. Compare this with the Ricci flow result due to Hamilton: If $n = 3$ and
$\mathrm{Ric}(g) > 0$, then $g_t$ contracts to a round metric on $S^3/\Gamma$.
In an ideal world, we might hope that this is always the case. But we run into
the following
Problem. If the positivity condition is relaxed, then other singularities will
develop.
In particular, for mean curvature flow, if we initially have 2-convexity, it can be
shown that these singularities fit into one of the following categories, known as the
canonical singularities:
(i) The shrinking sphere. Under MCF this solution will collapse to a point.
(ii) The neckpinch. The neck will continue to become longer and thinner under
the flow. Under a microscope the singularity looks like the infinite cylinder S n−1 ×R.
(iii) The cusp, or degenerate neckpinch. This is a translating solution of MCF.

In general, if one has something looking like a neckpinch where one of the spheres
is sufficiently large relative to the other, the smaller sphere will be ”eaten” by the
larger one and case (i) will apply; if both spheres decay at about the same speed a
neckpinch will develop. Hence there must be a critical point in between at which a
”cusp” is formed, which is this singularity. Under a microscope this will look like a
horn.
5.10.3 A priori Estimates

We have the following a priori estimates for MCF.
(i) λ1 + λ2 > 0 is preserved.
(ii) $\lambda_1 + \lambda_2 \ge \epsilon H$ is preserved, for some $\epsilon > 0$.
(iii) $\lambda_1 \ge -\eta H - C_\eta$ $\forall \eta > 0$. (Rescaling of singularities is weakly convex)
(iv) If $\lambda_1 \le \eta H$ (eg for a neckpinch or part of a horn not at the tip), then $\|\lambda_n -
\lambda_2\| \le 5\eta H + \hat C_\eta$ ($n \ge 3$). (The cylindrical or "roundness" estimate). Roughly
what this is used for is to conclude that the curvatures $\lambda_2, \cdots, \lambda_n$ are all
arbitrarily close in this situation, ie the manifold looks locally like $S^{n-1} \times \mathbb{R}$.
(v) (Gradient Estimate)
$$\|\nabla A\|^2 \le \eta_0H^4 + C_{\eta_0}$$
where $\eta_0 = \eta_0(n)$, and $C_{\eta_0} = C(\eta_0, M_0^n)$. $\eta_0$ depends only on the dimension,
and not the initial data.
Remarks.
(1) (iii) is analogous to the Hamilton-Ivey estimates for the Ricci Flow, for $n = 3$,
which control the structural tensor $E$: $E_{ij} \ge -\eta R - C_\eta$.
(2) Note that (iii) and (iv) are only useful if $H$ is huge in the neighbourhood of a
point, so that the terms with dependence on $H$ will dominate. In particular,
we would like to be able to show that if $H$ is large at a point, it is large in a
neighbourhood of that point. This motivates the last estimate, (v).
(3) There is a natural analogue to (v) in the RF: This was established a poste-
riori by contradiction arguments due to Perelman, using his non-collapsing
estimate, which he derived from his entropy.
Remark. Note that if we are not guaranteed an estimate like (v), we may get,
in the RF, the following type of singularity, described as a ”sheet of cigars” by
Hamilton.
If we get this type of singularity cropping up in the flow it creates problems,

because the surgery procedure (which I will get around to describing for MCF)
breaks down for this object. In other words, we cannot perform surgery on this
and then proceed to continue the flow, as we would like to do. However, using
the gradient estimate it is possible to eliminate this from the running, so to speak.
We then get an analogous classification of the singularities which develop under the
Ricci flow as ”canonical ones”, namely, getting a neckpinch $S^2 \times \mathbb{R}$ and a horn with
cross-section $S^2$, as well as the collapsing 3-sphere.
5.10.4 Surgery
Now that we have established that canonical singularities will always develop, the idea is to
perform surgery on those of type (ii) and type (iii) (if type (i)
occurs we are done).
For the neckpinch, we cut the neck at both ends and glue in two n-balls, then
continue the flow on the individual pieces:
For the cusp, we merely slice off the end and glue in an n-ball:
This can all be made very precise. In particular, the cusp and the neckpinch can
be viewed as ε-thin, i.e. with a cross-section of metric diameter ε. We can show that
for ε sufficiently small, all estimates are preserved. A big issue is whether things
are actually improved after surgery; obviously if things do not improve we are not
guaranteed convergence. We also need to prove that only a finite number of surgeries
are needed, ie. we do not have to perform an infinite number in an arbitrarily small
interval. Finally, we need to show that the solution ”heals” sufficiently after a
surgery before the next for us to be justified in using the same estimates (after
surgery the solution is no longer C ∞ , so we need to be careful; we have to use the
smoothing property of heat-type equations).
To make this all precise, we have the following useful result, which allows us to
think of our necks as embedded in R3 .
Proposition 3. (MCF). The interior of each neck has a canonical parametrisation

close to a standard filled cylinder, i.e. $D^{n-1} \times I$.
Proof. (idea). One shows that one can foliate the neck by minimal surfaces via
harmonic maps. The maps then give the required parametrisation.
Finally, we have
Proposition 4. If $M^n$ is 2-convex, n ≥ 3, then $M^n \cap \Pi$ consists of finitely many
2-convex components, not nested, for any plane Π. In other words, the surgery preserves
embeddedness. (Obviously we don’t want the surface passing through the balls we are
gluing on during surgery; this result tells us that it is possible to choose the balls
so this does not occur).
So this is what one does- one continues to perform MCF with surgery on the
manifold until all the pieces one ends up with are canonical singularities of type (i),
ie. shrinking spheres. The result then follows.
5.11 Outline of the Classification Program

Even though this section does contain some remarks from Ben’s lectures, it mostly
continues from where he left off. In this section I give a relatively detailed outline of
how one links the Hamilton-Perelman program to Thurston’s geometrisation conjec-
ture, demonstrating how classification of gradient solitons is sufficient. This section
of this survey draws most strongly from John Morgan and Gang Tian’s notes on the
Ricci Flow ([MT]).
5.11.1 Injectivity Radius and Collapsing of Balls

Definition 15. (Injectivity Radius). The injectivity radius at x ∈ M is defined as
the supremum of the radii of balls centred at the origin of the tangent space at x on
which the exponential map is a diffeomorphism onto its image.
Definition 16. (κ-non collapsed). Let B(x, r) be the ball of metric radius r centred
at x. If $\|Rm\| \le r^{-2}$ within B(x, r), then we say that B(x, r) is κ-non collapsed
if $\mathrm{vol}(B(x, r)) \ge \kappa r^n$, where n is the dimension of B(x, r).
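As a rough illustration of the volume inequality in this definition, the following Python sketch checks it for a flat Euclidean ball, where the curvature hypothesis holds trivially. The function names are hypothetical and chosen only for this example; the curvature bound is assumed to have been verified separately.

```python
import math


def unit_ball_volume(n: int) -> float:
    """Volume of the unit ball in Euclidean R^n: pi^(n/2) / Gamma(n/2 + 1)."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)


def is_kappa_noncollapsed(volume: float, r: float, n: int, kappa: float) -> bool:
    """Check only the volume inequality vol(B(x, r)) >= kappa * r^n.

    The curvature hypothesis |Rm| <= r^-2 is assumed to hold already.
    """
    return volume >= kappa * r ** n


# Flat example (an assumption made for illustration): a Euclidean ball of radius r.
n, r = 3, 2.0
flat_volume = unit_ball_volume(n) * r ** n
kappa_flat = unit_ball_volume(n)  # the largest kappa that works in flat space
```

In flat space the ratio vol(B(x, r))/r^n does not depend on r, so any κ below the unit ball volume passes the check at every scale.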
The importance of non-collapsing is emphasised in the following result:

Result. If $\|Rm\| \le r^{-2}$ on B(x, r) and B(x, r) is κ-non collapsed then
$\mathrm{inj}_x(M) \ge C(r, \kappa)$.
In particular, this type of control over the injectivity radius will be important,
for the following reason:
Theorem 5.11.1. (Cheeger-Gromov). Any sequence $(M_i^n, g_i, p_i)$ with $(M_i^n, g_i)$ complete,
$\|Rm(g_i)\| \le C$, $\|\nabla^k Rm(g_i)\| \le C_k$ and $\mathrm{inj}(g_i) \ge \epsilon > 0$ has a subsequence
which converges in the pointed sense.
I should of course mention:
Definition 17. (Pointed Convergence). We say that a sequence $(M_i, g_i, p_i)$ (where
the $M_i$ are manifolds, the $g_i$ are metrics and the $p_i$ are points in the $M_i$) converges to
(M, g, p) in the pointed sense if there exists
(i) an increasing sequence of compact $\Omega_i \subset M$ with $\cup_i \Omega_i = M$ ($p \in \Omega_i$ ∀i), and
(ii) a sequence of maps $\phi_i : \Omega_i \to M_i$ which are diffeomorphisms onto their image,
with $\phi_i(p) = p_i$, such that
$\phi_i^* g_i \to g$ in $C^\infty$ on compact subsets of M.
What Perelman was able to show was the following result:
Theorem 5.11.2. (Perel’man). Let g(t) be a solution of Ricci flow on a compact
$M^n \times [0, T]$. If p ∈ M , r > 0 s.t. $\|R\| \le \frac{1}{r^2}$ on $B_r(p)$ at t = T then ∃η = η(n, g(0), T )
such that inj(g(T ), p) ≥ ηr.
In particular, this means that, for the Ricci flow on compact manifolds, the
Cheeger-Gromov result applies. In other words, we have a compactness result for
such flows- which allows one to identify the original manifold as a connected sum of
its limiting pieces after finitely many surgeries, which essentially proves Thurston’s
Geometrisation Conjecture. I will repeat many of these statements later.
5.11.2 Canonical Neighbourhoods

Canonical neighbourhoods are of paramount importance in the Ricci flow, particu-
larly in Ricci flow with surgery. They reflect what happens around singularities of
the flow as the manifold gets ”stretched”, as in MCF with surgery from before.
The ε-neck. An ε-neck about a point x ∈ (M, g) is a submanifold N ⊂ M and a
diffeomorphism $\psi : S^2 \times (-\epsilon^{-1}, \epsilon^{-1}) \to N$ such that $x \in \psi(S^2 \times \{0\})$ and such that
the pullback of the rescaled metric, $\psi^*(R(x)g)$, is within ε in the $C^{\epsilon^{-1}}$ topology of
the product of the round metric of scalar curvature 1 on $S^2$ with the usual metric
on $(-\epsilon^{-1}, \epsilon^{-1})$. This is really a roundabout way of saying that we have the following
picture:
Sidenote: The $C^{\epsilon^{-1}}$ topology is the topology of maps which are $\epsilon^{-1}$ times
differentiable.
The ε-cap. An ε-cap is intuitively an ε-neck which is ”rounded off” at one end,
as shown:
These are really the only essential canonical neighbourhoods. However, for a
proper analysis, we need to include two more:
C-components. A C-component is a compact, connected Riemannian manifold
of positive sectional curvature diffeomorphic to $S^3$ or $\mathbb{RP}^3$ such that if the metric
is rescaled so that R(x) = 1 at some point x in the component, then
diam(C-component) ≤ C, $C^{-1} \le \mathrm{Sect}(x, v, w) \le C$ for any point x and any nonparallel vectors
v, w, and $C^{-1} \le \mathrm{vol}(C\text{-component}) \le C$.
ε-round components. An ε-round component satisfies the property that if the
metric about any x in it is rescaled by R(x), then the metric is within ε in the
$C^{\epsilon^{-1}}$ topology of a round metric of scalar curvature one.
Theorem 5.11.3. (Classification of 3-manifolds which are unions of ε-caps and ε-necks).
If a 3-manifold is a union of ε-caps and ε-necks, and ε is sufficiently small,
then it must be on the following list:
Furthermore, if it is of the form (1) or (2), it is one of $S^3$, $\mathbb{RP}^3$, or $\mathbb{RP}^3 \# \mathbb{RP}^3$;
if it is of the form (3) or (4), it is either $\mathbb{R}^3$ or $\mathbb{RP}^3 - \{pt\}$; if it is of the form
(5) it is $S^2 \times \mathbb{R}$; and finally if it is of the form (6) it is either the orientable or
non-orientable $S^2$ bundle over $S^1$.
The proof of this theorem is not entirely trivial, but neither is it particularly
novel. However, it is, in some sense, one of the core results of the Hamilton-Perelman
program. It essentially involves a lot of careful analysis and reasoning. The interested
reader is advised to refer to the appendix of J. Morgan and G. Tian’s extensive
manuscript [36] for the details.
So we have eight possibilities for such manifolds: $S^3$, $\mathbb{RP}^3$, $\mathbb{RP}^3 \# \mathbb{RP}^3$, $\mathbb{R}^3$,
$\mathbb{RP}^3 - \{pt\}$, $S^2 \times \mathbb{R}$, $S^2 \times S^1$ and the non-orientable $S^2$ bundle over $S^1$. It will turn out that
in fact the following result is true:
Theorem 5.11.4. (Thurston Geometrisation Conjecture). Any 3-manifold is dif-

feomorphic to a connected sum of the above eight so-called ”model” geometries.
Sketching the proof of this theorem is what I will concern myself with for the rest of
this section.
5.11.3 Connection to the Ricci flow with surgery

The key to proving the above result is to start off with an arbitrary manifold (M, g0 ),
and then flow it by Ricci flow until it looks nicer. Singularities will develop; these
can be controlled and understood in terms of the model geometries. As ε-necks and
ε-caps get very long and thin (ε gets very small), precise maps can be made
that parametrise the manifold around these singularities. Exactly analogous to MCF
with surgery, we may then perform surgery on our manifold, and then continue the
flow on the new pieces. This process is to be performed inductively.
Issues (just as with the MCF) are:
(i) Do things improve after the surgery?
(ii) What is to stop us from having infinitely many surgeries in a finite amount of
”time”?
(iii) How can we be sure the flow is ”smooth enough” after previous surgeries when
we wish to perform future surgeries?
(iv) Will we always get sensible limiting behaviour as a result of this process?
The standard way to address (i) is to have a priori estimates and show that one
can perform surgery in such a way that these estimates are preserved. To show that
we only have a finite number of surgeries in a finite amount of ”time”, the idea is,
crudely speaking, to observe that the process of surgery can be arranged to remove
a fixed amount of volume from the manifold, and, furthermore, that each further
surgery in a bounded interval must remove a fixed amount of volume bounded below
by some constant. We may hence conclude that an infinite number of surgeries in a
finite ”time” interval for a manifold starting with finite volume is impossible. (iii)
may be overcome by making careful arguments using the tools of geometric measure
theory and the smoothing property of parabolic equations of heat type. Really the
core trouble is with (iv). In fact, we have the following result, due to Cheeger,
Gromov and others, which I mentioned before:
Result. A Geometric Limit of a sequence of manifolds (Mn , gn (t), xn ) exists if
(i) we have uniform non-collapsing at xn in the time zero metric, and
(ii) for each A < ∞ we have uniformly bounded curvature for restriction of the
flow to metric balls of radius A about base points.
If the above result holds, then we can conclude that the limit of a sequence
of manifolds really does look like the solution of its renormalised Ricci flow, and
the question of classification of these limits comes down to classifying gradient soli-
tons. But we have the final result that gradient solitons are unions of canonical
neighbourhoods, in particular ε-necks and ε-caps, and we are done.
So evidently the key is to establish that (i) and (ii) of the above result hold for
geometric flows, otherwise the whole process of surgery will not work. In particular,
horrible solutions like Hamilton’s ”sheet of cigars” might crop up and throw a span-
ner in the works. It took Perelman with his notion of Length or ”Entropy” to show
by contradiction arguments that (i) in fact holds (relation to the Gradient estimate
from before). (ii) is essentially a consequence of the standard a priori estimates due
to Hamilton.
Before I go into some more detail about Perelman’s length and so forth, I make
the following remark- this whole argument could be simplified considerably if we
observe that the Ricci flow is steepest descent for the Fisher Information of a sharp
Riemannian 3 manifold without masses, and the corresponding renormalised Ricci
flow is steepest descent flow for the Physical Information of the same. Then we get
automatically that the classification of the limits of such flows is the classification of
gradient solitons. But perhaps some subtlety escapes me, most particularly because
we are not only flowing manifolds smoothly, we are also performing surgery; so I
shall continue with sketching the standard treatment.
5.11.4 Perelman’s Length or ”Entropy”

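The W -entropy of Perelman is defined as
\[ W(g, f, \tau) = \int_M \left( \tau (R + \|\nabla f\|^2) + f - n \right) (4\pi\tau)^{-n/2} e^{-f} \, d\mu(g) \]
Proposition 5. If $\frac{\partial}{\partial t} g_{ij} = -2R_{ij}$, $\frac{\partial \tau}{\partial t} = -1$ (i.e. $\tau \sim (T - t)$), and
$\frac{\partial f}{\partial t} = -\Delta f + \|\nabla f\|^2 - R + \frac{n}{2\tau}$, then
\[ \frac{\partial}{\partial t} W(g(t), f(t), \tau(t)) = 2\tau \int_M \left\| R_{ij} + \nabla_i \nabla_j f - \frac{1}{2\tau} g_{ij} \right\|^2 (4\pi\tau)^{-n/2} e^{-f} \, d\mu(g) \ge 0 \]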
Furthermore, equality is achieved if and only if one has a gradient Ricci soliton.
Remarks.
(1) This is precisely the monotonicity formula for the rescaled Ricci flow. Furthermore,
the W entropy is essentially a generalisation to rescaled Ricci flow
of the Green’s function result
\[ R(x_0, t_0) = \int_{M_t} (t - t_0)^{-n/2} \exp\left( -\frac{\|x - x_0\|^2}{4(t_0 - t)} \right) R(x, t) \, d\mu(g_t) \]
(2) Note that $\tau (R + \|\nabla f\|^2) + f - n$ is constant on solitons for the above choices
for f and τ .
Alternatively, we may define the related Length of Perelman:
\[ L(\gamma) = \int_0^{\bar{\tau}} \sqrt{\tau} \left( R(\gamma(\tau)) + \|\gamma'(\tau)\|^2 \right) d\tau \]
defined for a ”timelike” path γ : I → M × I, such that γ(τ ) ∈ M × {t − τ }, and
γ(0) = x. We say such a path is ”parametrised by backwards time”.
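To get a feel for this functional, here is a minimal numerical sketch in Python. It assumes a flat background (R = 0) and a path of constant speed; both are illustrative assumptions, not taken from the text. In that case the integral can be done by hand and equals (2/3) τ̄^{3/2} ‖γ'‖², which the midpoint rule should reproduce. The function name `l_length` is hypothetical.

```python
import math


def l_length(scalar_curvature, speed_squared, tau_bar, steps=20000):
    """Midpoint-rule approximation of
    L(gamma) = int_0^tau_bar sqrt(tau) * (R(gamma(tau)) + |gamma'(tau)|^2) d tau.

    scalar_curvature and speed_squared are functions of tau giving R along
    the path and the squared speed of the path respectively.
    """
    h = tau_bar / steps
    total = 0.0
    for k in range(steps):
        tau = (k + 0.5) * h  # midpoint of the k-th subinterval
        total += math.sqrt(tau) * (scalar_curvature(tau) + speed_squared(tau)) * h
    return total


# Illustrative assumptions: flat space (R = 0) and constant speed 1.5.
speed = 1.5
tau_bar = 2.0
approx = l_length(lambda tau: 0.0, lambda tau: speed ** 2, tau_bar)
exact = (2.0 / 3.0) * tau_bar ** 1.5 * speed ** 2
```

The weight √τ means that motion at small backwards time is cheap and motion at large backwards time is expensive.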
Remark. We immediately see a similarity between this quantity and the physical
information. I suppose one could roughly interpret L(γ) as ”the physical information
of a path γ through the manifold M × I”.
A whole theory of L-geodesics can then be developed. In particular, we can show
using this functional that the Result in the previous section is true, by constructing
an a posteriori gradient estimate for Ricci flow with surgery. In particular, Perel’man
used these quantities to deduce his theorem which I mentioned earlier:
Theorem 5.11.5. (Perel’man). Let g(t) be a solution of Ricci flow on a compact
$M^n \times [0, T]$. If p ∈ M , r > 0 s.t. $\|R\| \le \frac{1}{r^2}$ on $B_r(p)$ at t = T then ∃η = η(n, g(0), T )
such that inj(g(T ), p) ≥ ηr.
Proof. (sketch). The key step is to show that $\mathrm{vol}(B_r(p)) \ge \xi r^n$ for some
ξ = ξ(n, g(0), T ). We define
\[ \mu(g, t) = \inf \left\{ W(g, f, \tau) \;\middle|\; \int (4\pi\tau)^{-n/2} e^{-f} \, d\mu = 1 \right\} \]
where W is the W -entropy as defined above. Note that µ(g, t) will increase strictly with t
unless (M, g) is a soliton.
To establish the volume estimate, we construct an argument by contradiction, so
assume that $\mathrm{vol}(B_r(p))$ is very small. Then we can find $f_i$ such that
$\int (4\pi\tau)^{-n/2} e^{-f_i} \, d\mu = 1$, but $W(g, f_i, \tau) \to -\infty$. But this gives a contradiction to the above observation,
which completes the argument.
Chapter 6
Statistical Geometry
6.1 Preliminary Definitions
6.1.1 Motivation
Semi-Riemannian Geometry has an issue in dealing with particles, since it fails to

treat them as objects with some degree of uncertainty as to their position; it treats
them as points. To deal with this issue, I introduce the structure on a differential
manifold of a space of metrics, with a weighting function defined at each point
signifying precisely how important each metric is there.
We are first and foremost interested in generalising the notion of a particle path.
For instance, consider the following picture:
We would like somehow to be able to model as particles objects that have mea-
sure normalised to one across each spacelike hypersurface of our manifold, with the
bulk of the measure concentrated about a world tube. I will be more specific -
we would like the measure to maximise at the centre of this tube, ie about some
optimal geodesic path, but also somehow model uncertainty in position by having
exponential falloff relative to the core geodesic.
It is necessary to construct some sort of machinery to handle this problem. In
order to do this, we consider the notion of a discrete statistical manifold, which
is a discrete space M together with a discrete distribution space A such that at
each $m = m_0 \in M$ there is a finite set $a_1, \dots, a_k$ such that there are paths
from m to $m_1(a_1), \dots, m_1(a_k) \in M$. From these points there are paths to points
$m_2(m_1(a_1), a_1), m_2(m_1(a_1), a_2), \dots$ etc. The signal function f now is a function from
M × A to R such that, for fixed m, $\int_A f(m, a)\,da = 1$. It assigns to each path from $m_0$
to the $m_1(a_i)$ the weight, or probability, of that path being selected.
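The discrete picture can be sketched in a few lines of Python. The node names, step labels and weights below are hypothetical toy data invented for this example; the only property taken from the text is that, for each fixed m, the weights f(m, a) are non-negative and sum to one, so that they can be read as the probability of each path being selected.

```python
import random

# Hypothetical toy data: for each node m, the admissible steps a_1, ..., a_k
# together with the value f(m, a) of the signal function.
signal = {
    "m0": {"a1": 0.5, "a2": 0.3, "a3": 0.2},
    "m1": {"a1": 0.9, "a2": 0.1},
}


def is_normalised(weights, tol=1e-12):
    """Discrete normalisation: weights are non-negative and sum to 1."""
    return all(w >= 0 for w in weights.values()) and abs(sum(weights.values()) - 1.0) < tol


def sample_step(m, rng):
    """Select one outgoing path from m with probability f(m, a)."""
    steps = list(signal[m])
    return rng.choices(steps, weights=[signal[m][a] for a in steps], k=1)[0]


rng = random.Random(0)  # seeded only so that the sketch is reproducible
chosen = sample_step("m0", rng)
```

Repeatedly calling `sample_step` along successive nodes traces out one random path through the discrete space.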
Diagrammatically:
The trick will be to find a way to move from discrete spaces M and A to natural
smooth manifolds, and reduce the distance between consecutive steps m0 , m1 etc
along each path to zero. This will be the focus of the next subsection.
Also we would like to define M not only to be a metric space, but a measure-
metric space, with measure ψ. Conservation of probability demands that the signal
function and the measure be related in the following way:
Ultimately we will like to be able to represent particles in a sensible way as

geometric objects, intuitively as follows:
I shall now make this all more precise.
6.1.2 Introduction
Definition 18. A fuzzy Riemannian manifold (M, A, f ) is a differential manifold
M together with a Riemannian manifold A, together with a smooth non-negative
function f : M × A → R.
Each point in A is an n by n matrix corresponding to an inner product on Rn .
We might make further assumptions on the structure of A. For instance we will
assume for now that each inner product is symmetric, and that all points in A have
the same index. Finally we will assume that as we vary m ∈ M the inner product
g(m, a) corresponding to a ∈ A will vary smoothly; ie g(., a) will be a metric on M .
This correspondence must be smooth, and all the metrics must be of constant index
throughout M .
Furthermore, the subspace of the space of metrics with fixed index corresponding
to the set A must be connected (this is to avoid issues in applications with space-
like directions becoming timelike under different metrics, ie to avoid pathological
problems with causality). $\mathrm{grad}_a$, $\mathrm{div}_a$, $\mathrm{curl}_a$ and $\Delta_a$ shall from now on represent
the usual differential operators with respect to the metric corresponding to a.
Evidently we will need to integrate over A in order to take all of these operators into
account when we are using them, and they will be weighted in importance by our
distribution function f .
f is defined so as to have the following two properties:
The Normalisation Condition
\[ \int_A f(m, a) \sqrt{\det(h(a))} \, da = 1 \quad \text{for all } m \in M \]
Divergence Free Flow Condition
\[ \int_{A \times A \times A} f(m, b) \, \mathrm{div}_b \big( f(m, c) \, \mathrm{grad}_c (f(m, a)) \big) \sqrt{\det(h(a)) \det(h(b)) \det(h(c))} \, db \, dc \, da = 0 \]
for all m ∈ M , where h is the metric on A.
However, more generally the second property can be relaxed. When I go on to

define the coupling condition for a fuzzy manifold the significance of this will become
more apparent.
This function shall from now on be referred to as the signal function. It is by
no means unique- there is nothing to stop one from choosing several different signal
functions for the same choice of M and A.
The role of the signal function (provided it satisfies the conditions above) is
fairly straightforward. f (m, a) quantifies the importance of metric g(., a) at m ∈ M
by assigning a number to it between 0 and 1 at that point.
For instance, a standard Riemannian manifold is a fuzzy Riemannian manifold
with A being a point space, A ↦ g where g is a particular metric on M , and
f (m, A) = 1 for all m ∈ M .
We define the differential operators on a fuzzy semi-Riemannian manifold in the
following way.
Definition 19. If K : M × A → R and ψ : M × A → T M are general functions of
the given type on the fuzzy semi-Riemannian manifold Λ = (M, A, f ), then
\[ \mathrm{grad}_\Lambda K(m, a) = \int_{b \in A} f(m, b) \, \mathrm{grad}_b K(m, a) \, db \]
\[ \mathrm{div}_\Lambda \psi(m, a) = \int_{b \in A} f(m, b) \, \mathrm{div}_b \psi(m, a) \, db \]
\[ \mathrm{curl}_\Lambda \psi(m, a) = \int_{b \in A} f(m, b) \, \mathrm{curl}_b \psi(m, a) \, db \]
Note now that the divergence free flow condition for the signal function is equivalent
under this new notation to
\[ \int_A \mathrm{div}_\Lambda \mathrm{grad}_\Lambda f(m, a) \, da = 0 \quad \text{for all } m \in M \]
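The averaged operators are easiest to see when A is a finite set, so that the integral over A becomes a weighted sum. The Python sketch below makes that simplifying assumption (it is not the smooth setting of the definition) and works at a single point m of a 2-dimensional M, with each metric given as a 2 by 2 matrix and the differential dK given by its components. All names are hypothetical.

```python
def inverse_2x2(g):
    """Inverse of a 2x2 matrix given as nested lists."""
    (p, q), (r, s) = g
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]


def grad_b(g, dK):
    """Gradient of K with respect to the metric g: raise the index of dK."""
    g_inv = inverse_2x2(g)
    return [g_inv[i][0] * dK[0] + g_inv[i][1] * dK[1] for i in range(2)]


def fuzzy_grad(metrics, weights, dK):
    """Discrete analogue of grad_Lambda K = sum over b of f(m, b) * grad_b K."""
    total = [0.0, 0.0]
    for g, w in zip(metrics, weights):
        v = grad_b(g, dK)
        total = [total[0] + w * v[0], total[1] + w * v[1]]
    return total


# Two metrics at the point m, weighted equally by the signal function.
metrics = [[[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 4.0]]]
weights = [0.5, 0.5]  # f(m, b) for the two metrics; sums to 1
dK = [2.0, 4.0]       # components of the differential of K at m
result = fuzzy_grad(metrics, weights, dK)
```

With a single metric of weight 1 this reduces to the ordinary gradient, matching the remark that a standard Riemannian manifold is the case where A is a point.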
6.1.3 Particles and mass flux

Clearly we want to be able to define some notion of particle on this structure and
map out its behaviour. We want the notion of particle to somehow be compatible
with our signal function, in that we want its motion to also be divergence free (we
want stuff to have divergence free flow because if it didn’t, that would mean stuff
is entering a region and not exiting or emerging from a region without stuff going
in in the first place). We also want, given some specification of values for our mass
distribution on the boundary of a compact neighbourhood, a precise idea of how the
mass distribution filters through that neighbourhood. In other words, we want to
define some sort of generalisation of geodesic flow/geodesics.
This leads us to define
Definition 20. A mass distribution ψ : M × A → T M on a fuzzy Riemannian
manifold (M, A, f ) is a distribution that is compatible with the signal function f ,
that is, for all (m, a, b, c) ∈ M × A × A × A:
\[ f(m, c) \, \mathrm{div}_b \psi(m, a) = \mathrm{div}_b \big( f(m, c) \, \mathrm{grad}_c (f(m, a)) \big) \tag{6.1} \]
This makes sense because we obviously want ψ to also satisfy a more general
divergence free condition:
\[ \int_{A \times A} f(m, b) \, \mathrm{div}_b \psi(m, a) \sqrt{\det(h(a)) \det(h(b))} \, d(a \otimes b) = 0 \quad \text{for all } m \in M \]
which is notationally equivalent to (ignoring determinant factors) the following
conservation equation
\[ \int_A \mathrm{div}_\Lambda \psi(m, a) \, da = 0, \tag{6.2} \]
for all m ∈ M .
In fact, as it will turn out later, this notion of compatibility is precisely what is
required for ψ to correspond to a notion of generalised momentum (see the section
on interesting examples).
Also, a mass distribution must satisfy the property that
$\|\psi(m, a)\|^2_{(m,a)} \ge 0$ for all (m, a) ∈ M × A,
as well as
$\|\psi(m, a)\|_{(m,a)} = 0$ iff ψ(m, a) = 0.
One could think of the first of these as a ”reality” or causality condition on ψ.
For if $\|\psi(m, a)\|^2_{(m,a)} < 0$ for some (m, a), then $\|\psi(m, a)\|_{(m,a)}$ is purely imaginary,
so cannot be interpreted in any physically meaningful way. Note that in general
this could happen since the space of metrics we are dealing with might have nonzero
index. This will become important later on (rather soon actually) when I define
length. The second condition is a positivity condition.
Note the conservation equation is still an extraordinarily strong condition. In-

tuitively it amounts to choosing a particular choice for ψ so as to make the solution
unique. Also note that, once again, the notion of compatibility can be generalised.
For instance, the compatibility equation could be written as:
\[ \mathrm{div}_b \psi(m, a) = \Delta_b f(m, a) \tag{6.3} \]

For the purposes of ease of calculation I will in fact use this equation first and
use this as a toy example to demonstrate how one derives the secondary coupling
condition as a consequence. I will then proceed to use the previous, more physically
motivated equation and proceed along similar lines to derive its secondary coupling
condition.
Let F be the space of functions {f |f : M × A → T M }.
6.1.4 Fuzzy geodesics

Define length as follows:
Definition 21. The length L : F × {U | U ⊂⊂ M } → R is defined by
\[ L(\psi, U) = \int_{m \in U} \int_{a \in A} \|\psi(m, a)\|_{(m,a)} \sqrt{\det(g(m, a)) \det(h(a))} \, da \, dm \]
where $\|\cdot\|_{(m,a)}$ denotes the norm with respect to metric g(., a) at the point m ∈ M .
It turns out that, in order for the flow to have locally extremal length, ψ and f
must satisfy the additional coupling condition for all (m, a) ∈ M × A:
\[ \nabla_{(\psi(m,a) - \mathrm{grad}_b f(m,a);\, b)} \frac{\psi(m, a)}{\|\psi(m, a)\|_{(m,a)}} = 0 \tag{6.4} \]
Here $\nabla_{(\cdot\,;\, a)}$ is the Levi-Civita connection with respect to metric g(., a).
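Proof. Let $L(\phi, U)_s = \int_A \int_U \|\phi(m, a, s)\|_{(m,a)} \, dm \, da$ be a variation of L(φ, U ).
Then
\begin{align*}
\frac{\partial}{\partial s} L(\phi, U)_s &= \int_A \int_U \frac{\partial}{\partial s} \|\phi\|_{(m,a)} \, dm \, da \\
&= \int_A \int_U \frac{1}{2} \frac{\frac{\partial}{\partial s} \|\phi\|^2}{\|\phi\|} \, dm \, da \\
&= \int_{A \times A} \int_U \frac{\langle \nabla_{(\frac{\partial}{\partial s};\, b)} \phi(m, a), \phi(m, a) \rangle}{\|\phi(m, a)\|} \, dm \, da \tag{6.5}
\end{align*}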
The integral is over A × A now since information transfer throughout A is
instantaneous and so we must take into account all the different covariant derivatives
for each metric. Now since $\mathrm{div}_b \phi(m, a, s) = \Delta_b f(m, a)$ we can write
$\phi(m, a, s) = \mathrm{grad}_b f(m, a) + \mathrm{curl}_b K(m, a, s)$ for some vector field K. Note that f is fixed in the
variation; only K is a function of s. Therefore $\frac{\partial}{\partial s}$ is a vector only in the direction of
the curl of φ.
So...
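\begin{align*}
\left\langle \nabla_{(\frac{\partial}{\partial s};\, b)} \phi, \frac{\phi}{\|\phi\|} \right\rangle &= \left\langle \nabla_{(v;\, b)} (\phi - \mathrm{grad}_b f), \frac{\phi}{\|\phi\|} \right\rangle \quad \text{for arbitrary } v \\
&= \left\langle \nabla_{(\phi - \mathrm{grad}_b f;\, b)} v, \frac{\phi}{\|\phi\|} \right\rangle \quad \text{since the connection is symmetric} \\
&= -\left\langle v, \nabla_{(\phi - \mathrm{grad}_b f;\, b)} \frac{\phi}{\|\phi\|} \right\rangle + (\phi - \mathrm{grad}_b f) \left\langle v, \frac{\phi}{\|\phi\|} \right\rangle \tag{6.6}
\end{align*}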
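Let us consider the second term. Set $G = \mathrm{curl}_b K$.
\begin{align*}
\int_A \int_U \nabla \left\langle v, \frac{\phi}{\|\phi\|} \right\rangle \cdot (\phi - \mathrm{grad}_b f) &= \int_A \int_U \nabla \left\langle v, \frac{\phi}{\|\phi\|} \right\rangle \cdot G \\
&= \int_A \int_U \nabla \cdot \left( G \left\langle v, \frac{\phi}{\|\phi\|} \right\rangle \right) - \int_A \int_U \left\langle v, \frac{\phi}{\|\phi\|} \right\rangle \nabla \cdot G \tag{6.7}
\end{align*}
(by a vector identity).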
Now the first term vanishes by the divergence theorem and the fact that v
vanishes on the boundary. The second term vanishes since div curl is the zero
operator.
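Hence
\[ \frac{\partial}{\partial s} L(\phi(m, a, s), U) \Big|_{s=0} = -\int_A \int_U \left\langle v, \nabla_{(\phi - \mathrm{grad}_b f;\, b)} \frac{\phi}{\|\phi\|} \right\rangle dm \, da = 0 \]
And since v is arbitrary, by the fundamental theorem of the calculus of variations,
$\nabla_{(\phi(m,a) - \mathrm{grad}_b f(m,a);\, b)} \frac{\phi(m,a)}{\|\phi(m,a)\|_{m,a}} = 0$. Q.E.D.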
Now, what about the more physically motivated compatibility equation from
before, (6.1)?
Well certainly we must have $\mathrm{div}_b \psi(m, a, s) = \frac{1}{f(m,c)} \mathrm{div}_b (f(m, c) \, \mathrm{grad}_c f(m, a))$
for some variation ψ(m, a, s) of ψ(m, a). Since the right hand side is not a function
of s, we must have $\psi(m, a, s) = \mathrm{curl}_b A(m, a, s) + F(m, a)$ for some good choice of
F (Note also that the left hand side is not a function of c, so the right hand side
must exhibit some form of cancellation). Such a good choice of F is any solution
of the equation $\mathrm{div}_b F(m, a) = \frac{1}{f(m,c)} \mathrm{div}_b (f(m, c) \, \mathrm{grad}_c f(m, a))$. Then if we define
$\gamma(m, a, b) = \mathrm{curl}_b A(m, a, 0)$, the resulting condition for the length to be minimal
is
\[ \nabla_{\gamma(m,a,b)} \frac{\psi(m, a)}{\|\psi(m, a)\|_{(m,a)}} = 0 \tag{6.8} \]
as can be easily verified by following the same procedure as for the other example.
A pair (f, ψ) satisfying (6.1) and (6.8) I shall from now on refer to as a fuzzy geodesic.
Note that (6.1) and (6.8) can be interpreted as describing a coupling between the
curvature and mass distributions on the manifold M .
6.2 Existence of solutions

6.2.1 Stability
It is necessary to be careful, since we are only extremising length, not necessarily
minimising it. For a solution to be minimal, we hence require the additional condition
that, for a two parameter variation of ψ, $\Psi : [-\epsilon, \epsilon] \times [-\delta, \delta] \times M \times A \to TM$
(where Ψ(0, 0) = ψ),
\[ \frac{\partial^2 L}{\partial u \, \partial v} (\Psi(u, v), U) \Big|_{u=0, v=0} \ge 0. \]
But even this is beset with problems. For if any of the second variations are
in fact 0 we cannot conclude anything about stability and must examine higher
derivatives. In particular we would need all the third order variations to be ≥ 0.
But then if a single third order variation is 0, we have to look further, etc. However,
since the domains we are dealing with are compact, it is inherently reasonable to
expect that we should be able to stop this process after looking at a finite number
of derivatives. So we have:
Conjecture. In order to ascertain whether a solution to this process is stable over

a compact domain we only need to look at a finite number of derivatives.
Proof. (Idea). Intuitively if all the variational derivatives up to a certain order n

vanish the solution is flat to a certain extent and in order to vary sufficiently to satisfy
the boundary conditions it becomes apparent that the diameter of the domain must
exceed some certain parameter C(n). It is obvious that C(n) must increase strictly
with n. Hence there will be some natural number N such that for all n ≥ N , C(n)
is greater than the diameter of the domain. Hence it is only necessary to look at
the variations of degree less than N .
Remark. In order for the above proof to work it may need to be established that
all the induced P.D.E.s from looking at nth order variations have analytic solutions.
This is something that we should expect from this problem, however, since it is
motivated by physical considerations.
Definition 22. A domain Ω is called fundamental if it suffices to look at only the
first and second order derivatives to determine stability, i.e. if C(3) > diam(Ω).
Let me make all this more precise.
Definition 23. The diameter of a set U ⊂ M is defined as
$\mathrm{diam}(U) = \sup_{a \in A} (\mathrm{diam}_a(U))$
where $\mathrm{diam}_a(U)$ is the standard diameter from semi-Riemannian geometry with
respect to the metric indexed by a.
Theorem 6.2.1. U ⊂ M is topologically compact iff diam(U ) is finite.
Remark. This could be more or less taken as a definition for the topology of Λ =
(M, A, f ).
Conjecture. Suppose $\lambda = \sup_{(m,a) \in M \times A} \int_A f(m, b) \, \mathrm{grad}_b f(m, a) \, db$ is finite, and
dim(M) = n. Then, for each R > 0, there is a constant K(λ, n, R) such that,
for any U with diam(U ) < R, it is sufficient to determine stability to the local
length minimisation problem in U by looking at q-parameter variations of order up
to q = K(λ, n, R).
Remarks.
(i) This is in accordance with intuition, because if it was necessary to look at an
infinite number of variations within a compact domain, somehow our solution
would possess an infinite amount of information in a bounded domain, which,
needless to say, is unphysical.
(ii) Note that K(λ, n, R) need not be an integer. In fact, in analogy with fractional
differentiation as defined before, it makes perfect sense to be able to make
variations of noninteger order. Note that this level of sophistication is probably
unnecessary, since physical functions tend to be C ∞ anyway.
(iii) The global bound on the variability of f is essentially a global bound on the
curvature of Λ = (M, A, f ). In other words, I am roughly saying, ”provided Λ
does not curve too much, we can put a bound on the number of variations we
need to make”.
6.2.2 Characterisation of the solution space

There will be different locally minimal solutions for a given set U with signal function
f . These solutions for ψ will be discrete in F since U is compact. We could loosely
think of these in analogy with eigenfunctions. It is possible that there may be
families of these solutions; if so, we could loosely think of the parameters indexing
the families as eigenvalues.
We have the following conjecture, which is in two parts:
Conjecture. Given a set U , complete knowledge of f and knowledge of the boundary

conditions ψ|∂U , ψ is uniquely determined in U by (6.1) and (6.4). Furthermore, the
solution ψ can be written in terms of the minimal solutions for the length function
L on U .
Though it is possible that both parts of this conjecture are incorrect, and we
really should be looking at the following problem:
Problem. Given a set U and complete knowledge of f in U , compute all possible
solutions ψ can take in U such that the length function L(ψ, U ) with fixed boundary
conditions ψ|∂U (m, a) = g(m, a) is locally minimised in F.
The influence of boundary conditions means that (6.1) and (6.4) will no longer
suffice to determine the correct solutions inside U for ψ, and we will have to derive
new relations that ψ, g and f must jointly satisfy. If solutions are nonunique, I
posit the further
Conjecture. Suppose we have the conditions of the conjecture immediately above.

Suppose furthermore that we know ψ precisely in some neighbourhood of U in
$M - U^{\mathrm{int}}$. Then ψ is determined precisely inside U .
If this conjecture is correct, this leads one on to being able to define the relative
probability of a solution occurring inside U in terms of random noise outside.
Definition 24. The ”probability that a solution S will occur” is interpreted as

the relative influence of random noise outside a countably infinite number of copies
of (U, f ) in determining it. Furthermore, if we have two solutions, A and B, with
length(A) < length(B), we should reasonably expect probability(A) > probability(B).
How one would go about computing these probabilities as a function of U , f and
state S might be as follows.
Define NU to be the space of solutions for ψ in U .
Define $L(\psi|_U, K)_{avg}$ to be the average of the length induced from all possible
extensions of f to K ⊂ M from U ⊂ K (since we are dealing with copies of U only,
so f may change outside U and hence be effectively random), given a particular
solution $\psi|_U$. (Note that if the above conjecture is true then ψ will in fact extend
uniquely to all of K once we know f .) Then define the normalised length of solution
$\psi|_U$ as
\[ \hat{L}(\psi|_U) = \lim_{\alpha \to \infty} \frac{L(\psi|_U, K_\alpha)_{avg}}{\sum_{S \in N_U} L(S, K_\alpha)_{avg}} \]
where $K_\alpha$ are a series of expanding sets that envelop all of M in the limit (we
are of course making the assumption that M is paracompact).
We now want to define a notion of probability such that if L̂(S1) = nL̂(S2),
then P(S1) = P(S2)/n, so that longer solutions are less probable. Certainly
1/L̂(S2) = n/L̂(S1), so define the probability of solution ψ|U occurring as being
proportional to 1/L̂(ψ|U). In particular,
\[ P(\psi|_U) = \frac{1/\hat{L}(\psi|_U)}{\sum_{S\in N_U} 1/\hat{L}(S)} \]
Conjecture. This corresponds to the real probability that a solution will occur.
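As a minimal sketch of the normalisation just described, assume that the solution space N_U is finite and that the normalised lengths are already known. Both are assumptions made purely for illustration, and the three lengths below are hypothetical values rather than anything computed from a manifold.

```python
def solution_probabilities(normalised_lengths):
    """Map normalised lengths L-hat(S) to probabilities proportional to 1 / L-hat(S)."""
    if any(length <= 0 for length in normalised_lengths):
        raise ValueError("normalised lengths must be positive")
    weights = [1.0 / length for length in normalised_lengths]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical normalised lengths for three candidate solutions in N_U.
lengths = [1.0, 2.0, 4.0]
probs = solution_probabilities(lengths)
print(probs)
```

Under these assumptions a solution twice as long comes out half as probable, and the probabilities sum to one.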
In order to understand the types of solutions that may occur, we have the
following conjecture:
Conjecture. Stable solutions (i.e. the first variation of length is zero and the second
variation of length is positive) are characterised by geometrical symmetries. In other
words, there is a group of symmetries acting on the space that preserves the solution.
Which leads us to
Conjecture. There is a 1-1 correspondence between symmetry groups acting on
(M, A, f ) and stable solutions for ψ in (M, A, f ).
Other examples of the variation problem might be in modeling the excitation of
an object by an incoming packet of energy, or modeling the possible states leading
up to a particular distribution of mass/energy. This motivates the more general idea
of considering the mixed problem where part of the boundary conditions are fixed
and the other values for ψ on the boundary are allowed to vary, then trying to find
stable minimal solutions for the length over the compact domain.
Yet another generalisation that can be made to this variational problem is by
first associating to each a ∈ A a U(a) ⊂ M such that the correspondence a ↦ U(a)
is smooth. Denote this particular distribution of sets by D(A). Then define the
generalised length to be
\[ L(\psi, D(A)) = \int_{a\in A}\int_{m\in U(a)} \|\psi(m,a)\|_{(m,a)} \sqrt{\det(g(m,a))\,\det(h(a))}\; dm\, da \]
Then solve the same problems as before while extremising this notion of length.
6.3 Two classes of Signal Functions and related results
6.3.1 "Sharp" signal functions

There are several interesting choices we can make for the signal function f . For
instance, if the physical behaviour of the manifold is sharp, we have f (m, a) =
δ(σ(m) − a) for some cross section σ : M → A. In other words we have one, and
only one choice of inner product defined on Tm M for each m ∈ M . Of course we
then have ψ(m, a) = ψ(m, a)δ(σ(m) − a) for some function ψ. Now, if we consider
the compatibility condition (6.1) and substitute the above functional forms for ψ
and f, we get nothing other than
\[ \mathrm{div}_{\sigma(m)}\, \psi(m, \sigma(m)) = 0, \]
precisely what we would want if ψ was to correspond to momentum!
Now, if we solve for F in
\[ \delta(\sigma(m)-c)\,\mathrm{div}_b F(m,a) = \mathrm{div}_b\big(\delta(\sigma(m)-c)\,\mathrm{grad}_c\,\delta(\sigma(m)-a)\big), \]
we get that F = 0 is a solution. So we get finally that
\[ \nabla_{(\psi(m,\sigma(m));\,\sigma(m))} \frac{\psi(m,\sigma(m))}{\|\psi(m,\sigma(m))\|} = 0 \]
as our secondary equation, which is precisely the standard geodesic equation.
This is, once again, in agreement with classical physics: we have correspondence
between the particle paths and the geodesics of the manifold.
One might now ask what sort of equations T = ψ ⊗ ψ might satisfy in this
case. Here of course we are thinking of T as the stress energy tensor, inspired by
[MTW]. Well certainly divσ(m) T = 0 follows easily from divσ(m) ψ = 0. What about
a secondary equation for T? For this we need additional information about what it
means to be a physical manifold; see the next section.
6.3.2 Almost sharp signal functions

Another interesting case is when the space of metrics is nontrivial but, in some
sense, very narrow at each point in M . To model this, first identify Tm M with M ,
assuming, of course, that M is complete.
Then consider
\[ f(m,a) = \frac{\exp_m\!\big(-|(\sigma_{ij}(m) - a_{ij})\,m^i m^j|/h\big)}{\int_{a'\in A} \exp_m\!\big(-|(\sigma_{ij}(m) - a'_{ij})\,m^i m^j|/h\big)\,da'} \]
So once again we have a choice of metric σ(m) defined effectively at each point
m ∈ M , except this time we have a distribution of approximate distance h about it
in the space of metrics, loosely speaking.
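To get a feel for how such a signal function behaves, here is a toy one-dimensional sketch. It rests on several simplifying assumptions that are mine and not part of the construction above: the metric indexing space A is replaced by a finite grid of scalars, the quadratic form |(σ_ij − a_ij) m^i m^j| is replaced by |σ − a|, and exp_m is replaced by the ordinary exponential. The function name and the grid are hypothetical.

```python
import math


def almost_sharp_signal(sigma, grid, h):
    """Discrete analogue of f(m, a): exp(-|sigma - a| / h), normalised over the grid."""
    weights = [math.exp(-abs(sigma - a) / h) for a in grid]
    total = sum(weights)
    return [w / total for w in weights]


grid = [i / 100.0 for i in range(0, 201)]  # candidate values of a in [0, 2]
sigma = 1.0                                # the preferred "metric" at this point

f_wide = almost_sharp_signal(sigma, grid, h=0.5)
f_narrow = almost_sharp_signal(sigma, grid, h=0.01)

# Index of the most probable value of a for the narrow distribution.
peak_index = max(range(len(grid)), key=lambda i: f_narrow[i])
print(grid[peak_index], f_narrow[peak_index], f_wide[peak_index])
```

In this toy setting, shrinking h concentrates the distribution around a = σ, which is the sense in which the signal function is "almost sharp".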
Since we are dealing with a narrow distribution, any ambiguity with our choice
of compatibility condition is accurate up to order
$(h/(\text{min eigenvalue of } \sigma(m)))^{\dim M}$,
which is very small anyway, provided h ≪ 1.
Interesting subcases are when $\sigma_{ij}$ is diagonal with signature (1, 1, 1, −c²) or
diagonal with signature (1, 1, 1, 1).
To solve for ψ in this case we might just blindly hack away and try to solve, or
we might try
\[ \psi(m,a) = \psi(m)\,\exp_m\!\big(-|(\sigma_{ij}(m) - a_{ij})\,m^i m^j|/h\big) \]
for some choice of ψ.
We might want to look at the case where the length scale, L, varies with position.
A natural choice is to modify the usual choice, h, by an invariant depending
on the inner product at that point. So it is sensible to look at a length scale
$L(m) = h\sqrt{\det(\sigma(m))}$, and look at signal functions f proportional to
$\exp_m\!\big(-|(\sigma_{ij}(m) - a_{ij})\,m^i m^j| / L(m)\big)$.
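A very important signal function to look at is $f(m,a) = (1 + \varepsilon\Delta_\sigma)\delta(\sigma(m) - a)$,
for ε ≪ 1, as shall become apparent later. I shall now proceed to derive the
conservation and geodesic equations for this particular object.
In particular, the conservation equation will be
\[ (1 + \varepsilon\Delta_\sigma)\,\mathrm{div}_\sigma (1 + \varepsilon\Delta_\sigma)\bar\psi = 0 \]
which reduces to
\[ (1 + 2\varepsilon\Delta_\sigma)\,\mathrm{div}_\sigma \bar\psi = 0 \qquad (6.9) \]
The geodesic equation will be
\[ (1+\varepsilon\Delta_\sigma)\nabla_{((1+\varepsilon\Delta_\sigma)\bar\psi,\,\sigma)} \frac{(1+\varepsilon\Delta_\sigma)\bar\psi}{\|(1+\varepsilon\Delta_\sigma)\bar\psi\|} = 0 \]
After discarding terms of order ε² or higher, this reduces to
\[ \nabla_{(\bar\psi,\sigma)}\frac{\bar\psi}{\|\bar\psi\|} + \varepsilon\Big\{ \Delta_\sigma\nabla_{(\bar\psi,\sigma)}\frac{\bar\psi}{\|\bar\psi\|} + \nabla_{(\Delta_\sigma\bar\psi,\sigma)}\frac{\bar\psi}{\|\bar\psi\|} + \nabla_{(\bar\psi,\sigma)}\Big(\frac{\Delta_\sigma\bar\psi}{\|\bar\psi\|} - \frac{\bar\psi\,\Delta_\sigma\|\bar\psi\|}{\|\bar\psi\|^2}\Big)\Big\} = 0 \]
But this expression may be simplified, for we observe that the last four terms
can be written as
\[ 2\varepsilon\Delta_\sigma\nabla_{(\bar\psi,\sigma)}\frac{\bar\psi}{\|\bar\psi\|} \]
Hence, we have the expression for our geodesic equation for this choice of signal
function:
\[ (1 + 2\varepsilon\Delta_\sigma)\nabla_{(\bar\psi,\sigma)} \frac{\bar\psi}{\|\bar\psi\|} = 0 \qquad (6.10) \]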
6.3.3 Example of a Fuzzy Geodesic

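In this section I examine solutions of (6.10) subject to (6.9). Let σ be the metric
of standard Euclidean space. Then I make the following
Claim. The solution to (6.10) for σ = Id is
\[ \frac{\psi}{\|\psi\|} = A\exp\!\big(i\langle k, x\rangle\,\varepsilon^{-1/2}\big) + \alpha \]
where by abuse of notation I am embedding $\mathbb{R}^n$ in $\mathbb{C}^n$.
Proof.
\[ (1+2\varepsilon\Delta)\,\psi\Big(\frac{\psi}{\|\psi\|}\Big) = 0 \]
is equivalent to
\[ (1+2\varepsilon\Delta)\,\frac{d}{dt}\Big\{\frac{\psi}{\|\psi\|}(x+t\psi)\Big\}\Big|_{t=0} = 0 \]
which is the same as
\[ (1+2\varepsilon\Delta)\,\frac{\partial}{\partial x^j}(\hat\psi^i)\,\hat\psi^j\,\|\psi\| = 0 \]
where $\hat\psi = \psi/\|\psi\|$.
Rewriting we get
\[ (1+2\varepsilon\Delta)\Big(\frac{\partial}{\partial x^j}(\hat\psi^i\hat\psi^j) - \hat\psi^i\,\mathrm{div}\,\hat\psi\Big) = 0 \]
Using (6.9) we get
\[ (1+2\varepsilon\Delta)\,\frac{\partial}{\partial x^j}(\hat\psi^i\hat\psi^j) = 0 \]
So we are interested in solving an equation of the form
\[ (1+2\varepsilon\Delta)\nabla f = 0 \]
Note: If ε = 0, f is a constant, which is synonymous with the classical case of
geodesics in flat space just being straight lines, and hence the flow being constant.
So $(1+2\varepsilon\Delta)g_i = 0$. If we try $g_i = Ae^{k\cdot x}$, then $\Delta g_i = A\|k\|^2 e^{k\cdot x}$ and hence
$A(1+2\varepsilon\|k\|^2)e^{k\cdot x} = 0$. Hence $\|k\| = i/\sqrt{2\varepsilon}$.
So
\[ g_j = e^{\,i\,u_j\cdot x/\sqrt{2\varepsilon}}, \]
where ‖u‖ = 1.
Hence $(\nabla f)^\mu_\alpha = A^\mu_\alpha \exp\!\big(i\,u_\nu x^\nu/\sqrt{2\varepsilon}\big)$, and therefore
\[ f^{kl} = \hat{A}^{kl}\exp\!\big(i\,u_p x^p/\sqrt{2\varepsilon}\big) + a^{kl} \]
But using symmetry
\[ \hat\psi^k\hat\psi^l = f^{kl} = \hat{B}^k\hat{B}^l\exp\!\big(i(\tau^k+\tau^l)\cdot x/\sqrt{\varepsilon}\big) + \alpha^k\alpha^l \]
Hence taking square roots we get that
\[ \hat\psi^k = C^k\exp\!\big(i\lambda^k\cdot x/\sqrt{\varepsilon}\big) + \gamma^k \]
for some new constants C, λ, and γ.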
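The exponential ansatz used in the proof can be sanity-checked numerically. The following is a minimal sketch in one dimension, assuming the equation being solved is (1 + 2ε∆)g = 0 with ∆ the ordinary second derivative. The value of ε, the sample points and the finite-difference step are illustrative choices only.

```python
import cmath
import math

eps = 0.01                       # the small parameter epsilon (illustrative value)
k = 1j / math.sqrt(2.0 * eps)    # a root of 1 + 2 * eps * k**2 = 0


def g(x):
    """Trial solution g(x) = exp(k * x) with purely imaginary k."""
    return cmath.exp(k * x)


def second_derivative(func, x, step=1e-3):
    """Central finite-difference approximation to func''(x)."""
    return (func(x + step) - 2.0 * func(x) + func(x - step)) / step**2


# Residual of (1 + 2 * eps * d^2/dx^2) g at a few sample points.
residuals = [abs(g(x) + 2.0 * eps * second_derivative(g, x)) for x in (0.0, 0.3, 1.7)]
print(residuals)
```

The residuals are zero up to discretisation error, and since k is imaginary the solution is a pure oscillation whose frequency grows like ε^(−1/2).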
Remark. For the above solution, the overall direction of flux is in direction $\gamma^k$, since
the oscillatory term changes phase so rapidly for small ε that all contributions
over an integrated region will cancel each other out. However, we do expect to
have some correction term of order ε or higher, which will be the focus of the next
investigatory exercise.
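The cancellation described in this remark can be illustrated with a one-dimensional toy integral. The sketch below assumes scalar constants C, λ and γ (all hypothetical values) and integrates C exp(iλx/√ε) + γ over an interval. It uses the elementary bound 2|C|√ε/|λ| on the oscillatory part, which follows from integrating the exponential in closed form. This is an illustration only, not the estimate of the proposition that follows.

```python
import cmath
import math


def integrated_flow(C, lam, gamma, eps, length, samples=100_000):
    """Midpoint-rule integral of C * exp(i * lam * x / sqrt(eps)) + gamma over [0, length]."""
    dx = length / samples
    root_eps = math.sqrt(eps)
    total = 0j
    for n in range(samples):
        x = (n + 0.5) * dx
        total += C * cmath.exp(1j * lam * x / root_eps) + gamma
    return total * dx


C, lam, gamma, length = 1.0, 1.0, 0.5, 1.0  # hypothetical constants
results = []
for eps in (1e-2, 1e-4):
    # Subtract the constant drift gamma * length to isolate the oscillatory contribution.
    oscillatory = integrated_flow(C, lam, gamma, eps, length) - gamma * length
    bound = 2.0 * abs(C) * math.sqrt(eps) / abs(lam)
    results.append((eps, abs(oscillatory), bound))

for row in results:
    print(row)
```

In this toy setting the oscillatory contribution shrinks as ε decreases, while the drift term γ · length is unaffected.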
Proposition 6. For the above solution to the fuzzy geodesic equation, consider a
rectangular region R with coordinates γ̂, γ̂⊥. Suppose further that R is uniformly not
small (with dimensions of order $\varepsilon^n$, for n < 1/2). Then if the volume of the region
is V, the amount of flux through the region, Φ, is
\[ \Phi(R) = V\|\gamma\| + E(\varepsilon) \]
where the error E(ε) is bounded in magnitude by
\[ K\,\Big\|\int_N C\cdot n\,\exp\Big(\frac{i\lambda\cdot x}{\sqrt{\varepsilon}}\Big)\Big\| \]
where n is the unit normal to the boundary of V, N is an $\varepsilon^n$ neighbourhood of
∂R for some n > 1/2 and K is some positive constant depending on R.
Remark. It is clear to see from the above proposition that the corrections to the flux
are of order $\varepsilon^{1+n}$, and can hence be neglected. This is in accordance with "classical
intuition": fuzzy effects should only be local, and not have global consequences.
6.4 Stokes Theorem for Statistical Manifolds

6.4.1 Naive formulation
Many times in what is to follow I will require the following result, the analogue of
Stokes theorem applied to statistical manifolds:
Theorem 6.4.1. (Stokes).
\[ \int_M\int_A d_\Lambda\,\omega = \int_{\partial M}\int_A \omega \]
Proof. In fact, though daunting at first, this is relatively trivial to prove, and follows
easily from Stokes theorem in the sharp case. If we absorb the distribution f into
the derivatives and write the form ω as $\int_{b\in A} F(m,a)\,dK(m,b)$, for F a function and
the dK normalised unit forms, then we observe that, pointwise
\[ \int_M\int_A \epsilon_{ijk}\,\partial_j(m,b)\,F_k(m,a)\,dx_j(m,b)\,dK_i(m,c) = \int_{M_{bc}}\int_A \epsilon_{ijk}\,\partial_j(\phi_{bc}(m),c)\,F_k(\phi_{bc}(m),a)\,dx_j(\phi_{bc}(m),c)\,dK_i(\phi_{bc}(m),c) \]
for a suitable change of metric on M, and $\phi_{bc} : M \to M_{bc}$. Since φ is a
diffeomorphism, we then see that
\[ \int_M\int_A \epsilon_{ijk}\,\partial_j(m,b)\,F_k(m,a)\,dx_j(m,b)\,dK_i(m,c) = \int_{\partial M_{bc}}\int_A F_k(\phi_{bc}(m),a)\,dK_i(\phi_{bc}(m),c) = \int_{\partial M}\int_A F_k(m,a)\,dK_i(m,c) \]
and since then this holds for all points m, a, b, c it follows that the statistical
Stokes theorem is true.
6.4.2 The general statistical derivative

However there was some display of naivete in our proof of the previous result, since
we were assuming our statistical derivative had properties that it may well not in full
generality. So evidently before we can proceed further and fully nail Stokes theorem
for statistical spaces it is necessary to flesh out precisely what I mean by a general
statistical derivative in a geometrical context.
I will show that the appropriate derivative to use for a statistical manifold is in
fact
\[ \partial_{\Lambda;f}(m,b) = \int_A F(m,c)\Big(\frac{\partial}{\partial m} + \frac{\partial\sigma_b^{(ij)}}{\partial c}\,\frac{\partial}{\partial\sigma_b^{(ij)}}\Big)\,dc, \]
where $f(m,a) = \int_A F(m,b)\,\delta(\sigma_b(m) - a)\,db$. (Note that, if $\sigma_b$ is just a number, as
in our number theory example, the statistical derivative reduces to
$\int_A F(m,c)\big(\frac{\partial}{\partial m} + \frac{\partial}{\partial c}\big)\,dc$,
since we don't have matrix multiplication twisting things up, or rather, matrix
multiplication is trivial.)
Recall Theorem 2.5.2 from chapter 2:
\[ \int_M\int_A d_{(M;a)}\,\omega(m,a) = \int_{\partial M}\int_A \omega(m,a) \]
But this is clearly equivalent to what we want to show, for since $d_{(M;a)}$ is a
statistical derivative, we can simply let it be the statistical derivative with respect
to f, and we are done. In other words, Stokes theorem holds for general statistical
manifolds.
Chapter 7
Fisher Information and application to the theory of Physical Manifolds
7.1 The Shannon Entropy

Clearly we need more information about what it means to be a physical manifold
in order to generalise the classical results. I am motivated by Perelman’s notion
of entropy for the Ricci flow [42], since a flow is another name for a 1-parameter
variation, and the Ricci tensor occurs in the classical Einstein equation. So we are
led to consider entropy, and what it means in the context of a physical manifold.
I follow the treatment given in Khinchin [33] in what follows.
Definition 25. A finite scheme is a collection of events {n1 , ..., nk } with associated
probabilities {p1 , ..., pk }.
For a particular scheme, we want to measure the degree of uncertainty in it;
for example, if we have 2 events with probabilities 0.5 each, the relative degree of
uncertainty is higher than if we had probability 0.01 for one and 0.99 for the other.
It turns out that a very good measure of uncertainty for a given scheme is its
entropy H, given by $H = -\sum_i p_i \ln(p_i)$.
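The two schemes mentioned above can be compared directly. This is a minimal sketch of the entropy formula for a finite scheme; the function name is my own, and the convention 0 · ln(0) = 0 is the usual one, adopted here as an assumption.

```python
import math


def entropy(probabilities):
    """Shannon entropy H = -sum_i p_i ln(p_i), with 0 * ln(0) taken to be 0."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log(p) for p in probabilities if p > 0.0)


h_even = entropy([0.5, 0.5])      # two equally likely events
h_skewed = entropy([0.01, 0.99])  # one event almost certain
print(h_even, h_skewed)
```

The evenly split scheme has entropy ln 2, which is larger than that of the skewed scheme, in line with the intuition described.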
Suppose we have two schemes A and B, and form the product scheme AB. If
events in A are entirely unrelated to those in B, we have that H(AB) = H(A) +
H(B). In general, however, these two schemes will depend on one another, and we
get that $H(AB) = H(A) + \sum_i p_i H_i(B)$, where $H_i(B)$ is the conditional entropy of
the scheme B given that event $A_i$ has occurred. To be more precise, let $q_{ij}$ be the
probability that the event $B_j$ in the scheme B occurs given that the event $A_i$ of
scheme A occurred, and let $p_i$ be the associated probabilities to the events in A.
Hence, using $\sum_j q_{ij} = 1$ in the last step,
\begin{align}
H(AB) &= -\sum_{i,j} p_i q_{ij}\,(\ln(p_i) + \ln(q_{ij})) \tag{7.1} \\
      &= -\sum_i p_i \ln(p_i) \sum_j q_{ij} - \sum_i p_i \sum_j q_{ij}\ln(q_{ij}) \tag{7.2} \\
      &= H(A) + \sum_i p_i H_i(B) \tag{7.3}
\end{align}
We then define $H_A(B) = \sum_i p_i H_i(B)$.
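The decomposition of H(AB) can be checked on a small example. The sketch below uses a hypothetical scheme A with two events and hypothetical conditional probabilities q[i][j] for a scheme B with three events; none of these numbers come from the text.

```python
import math


def entropy(probabilities):
    """Shannon entropy of a finite scheme, skipping zero-probability events."""
    return -sum(p * math.log(p) for p in probabilities if p > 0.0)


# Hypothetical scheme A, and conditional probabilities q[i][j] = P(B_j | A_i).
p = [0.3, 0.7]
q = [[0.5, 0.25, 0.25],
     [0.1, 0.1, 0.8]]

# Probabilities of the product scheme AB.
joint = [p[i] * q[i][j] for i in range(len(p)) for j in range(len(q[i]))]

h_joint = entropy(joint)                                          # H(AB)
h_conditional = sum(p[i] * entropy(q[i]) for i in range(len(p)))  # H_A(B)
print(h_joint, entropy(p) + h_conditional)
```

The two printed values agree up to rounding, as the decomposition predicts.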
We have the following result which I give without proof. The interested reader
is advised to refer to Khinchin [33] if they wish for a detailed argument.
Theorem 7.1.1. There is only one notion of entropy, up to some proportionality
factor, which satisfies the following three properties:
(i) For any given k and $\sum_{i=1}^{k} p_i = 1$, the function H(p1, ..., pk) is largest for
pi = 1/k, i = 1, ..., k.
(ii) H(AB) = H(A) + HA(B), where HA(B) is as defined above.
(iii) H(p1, ..., pk, 0) = H(p1, ..., pk)
Remarks. It can be shown that (i) holds for the notion of entropy that we
have defined. Clearly we would want this to be true, since intuitively things are
most uncertain in a scheme when all events have equal probability. (iii) basically is
the statement that the entropy of a scheme should not change if we add impossible
events. Most importantly, all three of these properties hold for the notion of entropy
defined above.
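Properties (i) and (iii) are easy to probe numerically. The following sketch compares the uniform scheme on k = 5 events against randomly generated schemes, and checks that padding a scheme with an impossible event leaves the entropy unchanged. The random sampling is only a spot check and proves nothing; the seed and the sample size are arbitrary choices.

```python
import math
import random


def entropy(probabilities):
    """Shannon entropy of a finite scheme, skipping zero-probability events."""
    return -sum(p * math.log(p) for p in probabilities if p > 0.0)


k = 5
uniform_entropy = entropy([1.0 / k] * k)

random.seed(0)  # fixed seed so that the experiment is repeatable
largest_random_entropy = 0.0
for _ in range(1000):
    raw = [random.random() for _ in range(k)]
    scheme = [value / sum(raw) for value in raw]
    largest_random_entropy = max(largest_random_entropy, entropy(scheme))

# Property (iii): adding an impossible event changes nothing.
padded_entropy = entropy([0.2, 0.3, 0.5, 0.0])
unpadded_entropy = entropy([0.2, 0.3, 0.5])
print(uniform_entropy, largest_random_entropy, padded_entropy, unpadded_entropy)
```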
Remark. There is also the notion of the information of a system. Intuitively
speaking, if we perform an experiment on a scheme we gain some information (by
finding out which event occurs), and we eliminate the uncertainty of the scheme.
Hence this notion of information should be an increasing function of the entropy. It
is sensible to choose this function to be proportional to the entropy, so we can really
think of entropy as latent information.
Evidently we might think of using this notion of entropy for our purposes by
suitably generalising it and requiring it to be critical. However this particular form
of entropy is not entirely suited to our purposes, for a couple of reasons:
(i) Shannon entropy is more or less a global measure of uncertainty and does not
allow finer examination of phenomena. In other words if we were to shuffle
the values in a finite or even continuous scheme around we would get the
same value for the entropy. This can be seen most clearly in the one dimen-
sional case by taking a discrete sum of values and observing that there is no
interdependence between them.
(ii) This entropy does not incorporate the act of observation or measurement of
information into the system. Shannon entropy is a measure of uncertainty from
the point of view of an observer from outside the system; for instance, someone
observing the outcome of the roll of a die. It is not immediately apparent that
this should matter, but, in general, the act of making a measurement from
within a system will actually locally perturb the system, and hence, perturb
the measurement. In other words, internal information (which we want to
quantify) ≠ external information (which is the quantity measured by Shannon
entropy).
There is in fact a notion of information/entropy known as Fisher Information
that avoids these problems, which suggests it is more natural to use. This will be
the focus of the next section, and the remainder of this dissertation.
7.2 Fisher Information Theory and the Principle of Extreme Physical Information
7.2.1 A few definitions from statistics
Before I can talk about what the Fisher information is, I will have to guide the
reader (and myself) through a crash course in statistics in order to equip us properly
for the development and motivation of the Fisher information as a useful invariant for
getting useful data from a physical manifold. In the course of initially investigating
this area I found Wikipedia to be most useful. However, due to the dubious and
changeable nature of this source, it behooves me not to provide references to the
Wikipedia entries where I originally found this data, but rather a more conventional
source, such as a modern introduction to statistics. An example of such a book
containing the material I will need is the recent one by Coolidge [13].
After developing the statistical tools required, I shall refer to Frieden’s pioneering
work [19], describing in some detail why I have decided to use Fisher information in
the development that follows.
Definition 26. A probability distribution with statistical variables θ is a nonnegative
function f : A × M → R satisfying the property that
\[ \int_A f(x,\theta)\,d\theta = 1 \quad \text{for each } x \in M \]
For example, the signal function is such a function, with A being
the indexing space for events and M being a manifold upon which the events occur
pointwise.
Definition 27. A random variable is a function X : A × M → R for a probability
indexing space A.
Now choose a probability indexing space A, and let f be a probability distribution
defined on it, and let X : A × M → R be a random variable.
Definition 28. Two events α, β ∈ A are independent if and only if f(x, α ∩ β) =
f(x, α)f(x, β) for all x ∈ M. Note that this is certainly true for signal functions, if
we regard the events as individual metrics.
Definition 29. Two random variables X, Y are independent if and only if the events
[X ≤ a] and [Y ≤ b] are independent for any numbers a, b.
Definition 30. The expected value of a random variable X for a probability
distribution f is given by the relation
\[ E(X) = \int_A f(x,\theta)\,X(x,\theta)\,d\theta \]
Definition 31. The variance of a random variable X for a probability distribution
f as above is given by the relation
\[ \mathrm{var}(X) = E((X - \mu)^2) \]
where µ = E(X) is the expected value of X.
It is clear why we would expect E(X) to be a useful invariant. What is not clear
is why we would ever want to consider var(X).
Well, certainly var(X) gives us some measure of the variability of the random
variable X about its mean. It can easily be shown that var(X) = E(X²) − (E(X))².
Furthermore, for independent random variables X, Y we have that var(X + Y) =
var(X) + var(Y).
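Both identities can be verified on a small discrete example. The sketch below assumes a finite indexing space at a single fixed point of M, so that integrals over A become sums; the values and probabilities are hypothetical.

```python
from itertools import product


def expectation(values, probs):
    """E(X) for a discrete random variable given as parallel lists."""
    return sum(v * p for v, p in zip(values, probs))


def variance(values, probs):
    """var(X) = E((X - mu)^2)."""
    mu = expectation(values, probs)
    return expectation([(v - mu) ** 2 for v in values], probs)


# Two hypothetical independent discrete random variables.
x_values, x_probs = [0.0, 1.0, 2.0], [0.2, 0.5, 0.3]
y_values, y_probs = [-1.0, 3.0], [0.6, 0.4]

# Independence: the joint probability is the product of the marginals.
sum_values = [x + y for x, y in product(x_values, y_values)]
sum_probs = [px * py for px, py in product(x_probs, y_probs)]

var_x = variance(x_values, x_probs)
var_y = variance(y_values, y_probs)
var_sum = variance(sum_values, sum_probs)
second_moment_form = (expectation([v ** 2 for v in x_values], x_probs)
                      - expectation(x_values, x_probs) ** 2)
print(var_x, second_moment_form, var_sum, var_x + var_y)
```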
Definition 32. The score for a probability distribution with one hidden variable θ
for a random variable X that does not depend on position in M is defined as
\[ V(X,\theta) = \frac{\partial}{\partial x}\big(\ln(X(\theta)\,f(x,\theta))\big) \]
An important property of the score is that if we view it as a random variable
then E(V) = 0, provided certain integrability conditions are met (see J. Koliha's
notes on analysis [34], p.209), since
\[ E(V) = \int_A \frac{f'(x,\theta)}{f(x,\theta)}\,f(x,\theta)\,d\theta = \frac{\partial}{\partial x}\int_A f(x,\theta)\,d\theta = \frac{\partial}{\partial x}(1) = 0 \]
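The vanishing of the expected score can be seen on a toy family. The sketch below assumes a three-element indexing space and a hypothetical distribution f(x, θ) proportional to exp(c_θ x), normalised over θ for every x, with X taken to be identically 1. The derivative in x is approximated by a central difference, so the result is zero only up to discretisation error.

```python
import math

SLOPES = [-1.0, 0.5, 2.0]  # hypothetical parameters c_theta, one per event theta


def f(x, theta):
    """Toy distribution over three events, normalised in theta for every x."""
    weights = [math.exp(c * x) for c in SLOPES]
    return weights[theta] / sum(weights)


def score(x, theta, step=1e-5):
    """Central-difference approximation to d/dx ln f(x, theta)."""
    return (math.log(f(x + step, theta)) - math.log(f(x - step, theta))) / (2.0 * step)


def expected_score(x):
    """E(V) at the point x: the sum over theta of f(x, theta) * score(x, theta)."""
    return sum(f(x, theta) * score(x, theta) for theta in range(len(SLOPES)))


values = [expected_score(x) for x in (-1.0, 0.0, 2.5)]
print(values)
```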
7.2.2 The Fisher Information and the EPI principle

We are now ready to define the Fisher information.
Definition 33. The Fisher Information I(x, X) corresponding to a probability
distribution f and random variable X not depending on position x ∈ M is defined as
the variance of the score of f. Since the expectation of the score is zero, we have
that
\[ I(x,X) = E\Big[\Big(\frac{\partial}{\partial x}\ln(X(\theta)\,f(x,\theta))\Big)^2\Big] \]
It is then natural to define the Total Fisher Information as
\[ I(X) = \int_M I(x,X)\,dx \]
This is the quantity that I will be most interested in, and will be the primary
focus of the analysis to follow.
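As a worked toy case, take a two-element indexing space with f(x, 0) = s(x) and f(x, 1) = 1 − s(x), where s is the logistic function, and X identically 1. These choices are assumptions made for illustration only. A short calculation gives I(x) = s(x)(1 − s(x)), and integrating over the interval [−10, 10] (standing in for M) gives tanh(5), which is very nearly 1.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def f(x, theta):
    """Two-event toy distribution: theta = 0 has probability sigmoid(x)."""
    s = sigmoid(x)
    return s if theta == 0 else 1.0 - s


def df_dx(x, theta):
    """Exact derivative of f with respect to the position x."""
    s = sigmoid(x)
    return s * (1.0 - s) if theta == 0 else -s * (1.0 - s)


def fisher_information(x):
    """I(x) = E[(d/dx ln f)^2] = sum over theta of (df/dx)^2 / f, taking X = 1."""
    return sum(df_dx(x, theta) ** 2 / f(x, theta) for theta in (0, 1))


def total_fisher_information(lower, upper, samples=20_000):
    """Trapezoidal approximation to the integral of I(x) over [lower, upper]."""
    dx = (upper - lower) / samples
    interior = sum(fisher_information(lower + n * dx) for n in range(1, samples))
    return dx * (0.5 * fisher_information(lower) + interior + 0.5 * fisher_information(upper))


pointwise = fisher_information(0.0)
total = total_fisher_information(-10.0, 10.0)
print(pointwise, total)
```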
Note that for the intents and purposes of what I plan to use this for, X will
predominantly be the identity variable, i.e. X(θ) = 1 for all θ ∈ A, since the quantity
that I am predominantly interested in measuring is only the signal function. Also
note that the Fisher information is linear in its second argument for independent
random variables. In other words, I(θ, X + Y) = I(θ, X) + I(θ, Y). This follows
from the result that the variance of the “sum” of independent events is the sum of
their variances. (Note (X + Y)(x)f(θ, x) = X(x)f(θ, x)Y(x)f(θ, x) for independent
random variables X, Y; addition is not the same in χ as one might expect.
Equivalently we could notate X + Y as X ∩ Y.) Intuitively this is in accordance with
the fact that we should be able to add the information from unrelated experiments
together in a linear fashion.
Definition 34. A random observation is a random variable that is equivalent to
the identity on the probability space over a manifold.
This of course leads us to the question of what we mean when we talk about a
measure of information, ie what properties it should have. We might like to try to
prove something like the following:
Theorem 7.2.1. (Uniqueness of the Information Measure). Let χ be the space of
random variables on A. Then, given a probability distribution f defined pointwise
over the manifold M as above there is one, and only one function K : M ×χ → R≥0 ,
unique up to a nonzero scalar constant, that satisfies the following properties:
(i) Linearity in its second argument for independent random observations.

(ii) It is the maximal measure of variability in the probability distribution over M ,
subject to the previous condition.
Certainly the Fisher Information as I defined it above satisfies these conditions.

And indeed, for most intents and purposes, something like this will suffice. However,
it behooves us to seek a more fundamental and convincing derivation of the principle
I am going to invoke, since otherwise I would essentially have to motivate everything
physically and wave my hands over the parts where something more precise might
be required. For now, however, I will continue with this particular approach and see
where it leads us.
So anyway, uniqueness only up to some scalar constant will not matter, as it
will turn out, because at the end of the day we are aiming to extremise this measure
of information, and δK = 0 is completely equivalent to δ(λK) = 0, for λ a fixed
constant.
We are now at the point where we can define the bound information, J, and
describe the principle of extreme physical information, as described in Frieden’s
book.
I define the channel information to be the total Fisher information as defined
above. This is the total amount of information our manifold can, in some sense,
“carry”. The bound information, J, is in some sense the unavoidable information
contained within the system. It will be derived in our case from our conservation
equation, as described and motivated in the previous section. For a physical manifold
then, we expect I = J.
Now, the principle of extreme physical information (the EPI principle) states
that the physical information, K = I − J, must be critical with respect to the natural
Fisher variables of the system. In our case, the Fisher variables will be the metric
components $\sigma^{ij}$ in some local chart. This is how the EPI principle is described by
Frieden. But I in fact go one step further.
Refined EPI principle: For physical manifolds, the physical information, K,
must be locally minimal. In other words we roughly need the following two things
to hold:
δK = 0, and
δ²K ≥ 0.
I say roughly, of course, because the second criterion does not guarantee stability.
7.2.3 Interpretation
One could view this principle as being a maximal efficiency principle. Local
minimality is required, of course, from the point of view of stability: the
system must be stable, otherwise perturbations will throw things out of whack. Even
if one has K = 0, this is no guarantee of stability; obviously it makes no sense if
K is negative, but it could happen that an unphysical manifold might have K = 0,
δK = 0 for all variations, but K''(t) < 0 for some variation. Then there will
be a natural flow to negative information K and the manifold will become truly
unphysical.
Perhaps a better way to interpret the EPI principle, particularly in the statistical
setting, is that the universe wants to run as efficiently as possible, so each and every
point, in an ideal world, would know exactly what was going on everywhere else and
adjust itself to realise a minimum. (This is actually what happens in the degenerate
or sharp case, where one eliminates all probabilistic behaviour.) However, allowing
more complex statistical models (which are more realistic) entails that there is an
increasing cost to a point "knowing" what occurs metrically further and further away
from it; in other words, the universe must organise a trade-off between knowing
exactly what happens everywhere (which would require a massive investment of
energy in pinning down the geometry) and spending the least energy possible so as
to be slightly uncertain about the geometry, but still realise a lower energy state.
This is precisely what I will be pinning down with my discussion of almost sharp
geometries, which are closest in behaviour to standard quantum mechanical models,
but of course the mathematics of information geometry does allow quite significant
generalisation beyond this.
As for why the universe would "want" to run as efficiently as possible, perhaps
a better way of putting it would be to observe that since all other configurations
are unstable it has little choice in the matter. The inevitable noise due to statistical
fluctuations will cause any suboptimal choice to be quickly dropped in favour of
one more locally optimal. So while noise might prevent the universe from ever
attaining a global minimum for information, the fact that the information is stable
there means that it will get very close. So to a reasonable approximation we can
think of the universe as actually sitting at that point in solution space, in order to
draw various cartoon models of increasing sophistication. Naturally, of course, we
would like to understand the noise - and indeed elimination in parts will lead to
more complex formulations of Cramer-Rao and EPI. However the act of elimination
will always leave higher order noise yet to be quantified and understood. Hence the
best we can hope for is to produce models that work under a range of conditions
that we can experimentally measure and understand.
7.2.4 Concluding Remarks

An equivalent representation of the channel information, I, for a fuzzy Riemannian
manifold (M, A, f) may now be defined to be
\[ I = \int_M\int_A \|\partial_\Lambda q(m,a)\|^2_{(m,a)}\; da\, dm \]
for ψ a mass distribution and U ⊂ M, where q² = f. This makes sense because
by construction f is really a sort of probability distribution for the curvature. Here
by $\partial_\Lambda$ I mean the fuzzy gradient operator acting on q as a function, and not as a
tensor.
We now need to determine the bound information J. Using the conservation
equation relating ψ and f,
\[ \mathrm{div}_\Lambda\,\psi(m,a) = \mathrm{div}_\Lambda\,\mathrm{grad}_\Lambda\, q^2(m,a), \]
and
\[ \mathrm{grad}_\Lambda\, q^2(m,a) = 2q(m,a)\,\mathrm{grad}_\Lambda\, q(m,a), \]
and assuming perfect transfer of information, we get that
\[ I = \frac{1}{4}\int_M\int_A \frac{\|\psi(m,a) - B(m,a)\|^2}{f(m,a)}\; da\, dm = J, \]
where $\psi(m,a) = \partial_\Lambda q^2(m,a) + B(m,a)$ for some B such that $\mathrm{div}_\Lambda B(m,a) = 0$.
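The factor of 1/4 can be checked in the simplest possible setting. The sketch below assumes a one-dimensional "manifold" (the real line, truncated to [−8, 8]), a trivial indexing space, B = 0, and a Gaussian density f of hypothetical width SIGMA, so that ψ = f' and q = √f. For this f both expressions for I should equal 1/(4 SIGMA²).

```python
import math

SIGMA = 0.7  # hypothetical width of the toy density


def f(x):
    """Gaussian density, playing the role of the signal function."""
    return math.exp(-x * x / (2.0 * SIGMA ** 2)) / (SIGMA * math.sqrt(2.0 * math.pi))


def df(x):
    """Exact derivative of f; with B = 0 this plays the role of psi."""
    return -x / SIGMA ** 2 * f(x)


def q(x):
    return math.sqrt(f(x))


def dq(x, step=1e-5):
    """Central-difference approximation to q'(x), computed independently of df."""
    return (q(x + step) - q(x - step)) / (2.0 * step)


def integrate(func, lower, upper, samples=20_000):
    """Midpoint-rule approximation to the integral of func over [lower, upper]."""
    dx = (upper - lower) / samples
    return dx * sum(func(lower + (n + 0.5) * dx) for n in range(samples))


channel_from_q = integrate(lambda x: dq(x) ** 2, -8.0, 8.0)
channel_from_psi = 0.25 * integrate(lambda x: df(x) ** 2 / f(x), -8.0, 8.0)
print(channel_from_q, channel_from_psi, 1.0 / (4.0 * SIGMA ** 2))
```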
The physical information, K, is defined to be I − J. To apply the EPI principle
one now need only solve the variational equation δK = 0. Hopefully this will prove
fruitful in deriving general relations that a physical manifold must then satisfy.
Philosophical Digression:
It might seem that in describing the characteristics a solution must satisfy on
a manifold that we are completely defining the state the solution must take. This
is not the case on complete manifolds because it seems to me against the very
principles of Fisher Information Theory to be able to make an infinite measurement
on a space (recall the EPI principle basically asserts physical behaviour arises as a
consequence of measurement), since this would require an experimental apparatus
of infinite capacity! So only finite domains can be measured, and, as before in my
discussion about extremising length, the solution may well not be unique. Of these
possible solutions, the one that the domain ultimately takes will depend on the
behaviour of the manifold outside.
7.3 Unbiased Estimators and the Cramer-Rao Inequality
I now return to basics and attempt to defend once more the principle I invoked,
notably that the quantity K should be locally minimal in the space of signal
functions, f, on a fuzzy Riemannian manifold. In this section I will demonstrate how
this property can be thought of as the condition of attaining equality, or, at the very
least, local minimality, in what I shall call the Weak Cramer-Rao Inequality.
To put things into the terminology of Murray and Rice, a fuzzy Riemannian
manifold (M, A, f ) which is complete with respect to each metric in its corresponding
metric indexing space A is a complete exponential family; the sample space of events
is diffeomorphic to M × A; the space in which events occur is M ; and the likelihood
function is f : M × A → R.
From now on I shall assume that we are dealing with a fixed fuzzy Riemannian
manifold Λ = (M, A, f ) such that M is complete with respect to each metric σ(., a),
any a ∈ A.
7.3.1 Estimators and strong Cramer-Rao

Definition 35. An estimator u is a map from the sample space M × A to M. An
unbiased estimator is an estimator that satisfies the condition that
\[ E_m(\theta\circ u) = \theta(m) \]
where $\theta : M \to \mathbb{R}^n$ is a choice of coordinates for M and
$E_m(\lambda) = \int_A \lambda(m,a)\,f(m,a)\,da$, m ∈ M.
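Lemma 7.3.1. An unbiased estimator u satisfies the following relation:
\[ E_m\Big((\theta^i\circ u)\,\frac{\partial\ln(f\circ u)}{\partial\theta^j}\Big) = 0 \]
Proof. Differentiating the condition with respect to $\theta^j$ on both sides, we get:
\[ \frac{\partial}{\partial\theta^j}\int_A (\theta^i f)\,da = \delta^i_j + \int_A \Big((\theta^i\circ u)\,\frac{\partial(f\circ u)}{\partial\theta^j}\Big) = \delta^i_j \]
If we use the fact that $\frac{\partial f}{\partial\theta^j} = \frac{\partial\ln(f)}{\partial\theta^j}\,f$, we see that the relation follows.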
Definition 36. The Fisher Information Matrix $g^{ij}(f)$ (corresponding to the
likelihood function f), is defined to be
\[ g^{ij}(f) = E_m\Big(\frac{\partial\ln(f)}{\partial\theta^i}\,\frac{\partial\ln(f)}{\partial\theta^j}\Big) \]
We have the following inequality for unbiased estimators on Λ, known as the
Cramer-Rao inequality:
\[ \mathrm{cov}(\theta^i\circ u,\ \theta^j\circ u) - g^{ij}(f(u)) \ge 0 \qquad (7.4) \]
where $\mathrm{cov}(v^i, w^j)(m) = E_m(v^i w^j)$ is the covariance of v and w, and by $g^{ij}(f(u))(m,a)$
I mean $g^{ij}(f(u(m,a), a))$.
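Proof. (Cramer-Rao).
Note that
\[ E_m\Big(\Big(\theta^i\circ u - E_m(\theta^i\circ u) - \frac{\partial\ln(f(u))}{\partial\theta^k}\,g^{ik}\Big)\Big(\theta^j\circ u - E_m(\theta^j\circ u) - \frac{\partial\ln(f(u))}{\partial\theta^l}\,g^{jl}\Big)\Big) \]
is positive semi-definite. Expanding this, we get
\[ \mathrm{cov}(\theta^i\circ u,\ \theta^j\circ u) - 2E_m\Big((\theta^i\circ u - E_m(\theta^i\circ u))\,\frac{\partial\ln(f(u))}{\partial\theta^l}\,g^{jl}\Big) + g^{ij} \]
Upon applying our lemma for unbiased estimators on the second term, then
integrating by parts, we get
\[ E_m\Big((\theta^i\circ u - E_m(\theta^i\circ u))\,\frac{\partial\ln(f(u))}{\partial\theta^l}\,g^{jl}\Big) = -E_m\Big(\theta^i\,\frac{\partial\ln(f(u))}{\partial\theta^l}\,g^{jl}\Big) = g^{ij} \]
and we are done.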
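For orientation, here is the classical scalar form of the Cramer-Rao bound in the most elementary setting. It is a textbook analogue and not the fuzzy-manifold statement (7.4): for n independent Bernoulli(p) observations, any unbiased estimator of p has variance at least 1/(n I(p)), with I(p) = 1/(p(1 − p)). The sketch enumerates all samples exactly; the two estimators and the values of p and n are hypothetical choices.

```python
from itertools import product


def estimator_moments(estimator, p, n):
    """Exact mean and variance of an estimator over all 2**n Bernoulli(p) samples."""
    mean = 0.0
    second_moment = 0.0
    for sample in product((0, 1), repeat=n):
        prob = 1.0
        for outcome in sample:
            prob *= p if outcome == 1 else 1.0 - p
        value = estimator(sample)
        mean += prob * value
        second_moment += prob * value ** 2
    return mean, second_moment - mean ** 2


p, n = 0.3, 2
fisher_per_sample = 1.0 / (p * (1.0 - p))
bound = 1.0 / (n * fisher_per_sample)

# The sample mean attains the bound; using only the first observation does not.
mean_avg, var_avg = estimator_moments(lambda s: sum(s) / len(s), p, n)
mean_first, var_first = estimator_moments(lambda s: float(s[0]), p, n)
print(bound, var_avg, var_first)
```

Both estimators are unbiased, but only the sample mean (which is also the maximum likelihood estimator here) realises the lower bound.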
Definition 37. A maximum likelihood estimator (mle) of a statistical manifold like
Λ is defined to be an unbiased estimator v realising a local maximum of ln(f(v(m, a), a)),
m ∈ M (this is equivalent to realising a local maximum of the likelihood function
since ln is an increasing function, hence the name maximum likelihood estimator).
Amongst other things we certainly must have
\[ \frac{\partial\ln(f(v(m,a),a))}{\partial\theta^i} = 0 \]
Lemma 7.3.2. Any mle of an exponential family realises the Cramer-Rao lower
bound.
Proof. See Murray and Rice’s book.
7.3.2 Physical estimators and weak Cramer-Rao (for Riemannian-metric-measure spaces)
Definition 38. An estimator u is said to be weakly unbiased if
\[ E(\theta\circ u) = \int_M E_m(\theta\circ u)\,dm = E(\theta\circ\mathrm{Id}) \]
Lemma 7.3.3. Weakly unbiased estimators satisfy the additional relation that
\[ E\Big(\theta^i\,\frac{\partial\ln(f)}{\partial\theta^j}\Big) = 0. \]
Proof. Similar to that for unbiased estimators.
Corollary 7.3.4. If our likelihood function f decays sufficiently fast towards
infinity from the center of our coordinates θ, then E(θ ◦ Id) = 0.
Proof. Write θ = dφ.
Certainly
\[ E(\theta\circ\mathrm{Id}) = \int_A\int_M \theta f\,dm\,da = \int_A\Big\{\int_{\partial M}\phi f\,dS - \int_M \phi\,\frac{\partial f}{\partial\theta}\,dm\Big\}\,da \]
Now the first term on the right disappears since f decays sufficiently fast. The
second term on the right disappears by the lemma.
Lemma 7.3.5. Any estimator $\Gamma$ of the form $\Gamma = \exp(-\psi/f)\,\partial_\sigma f$ is a weakly unbiased estimator, where we define $\psi$ by the conservation equation
\[ \mathrm{div}_\Lambda \psi = \Delta_\Lambda f, \]
where $\mathrm{div}_\Lambda$ and $\Delta_\Lambda$ are the standard fuzzy differential operators. I shall call such estimators physical estimators.
Proof. In a physical situation, which is the situation to which this is to apply, $f$ will decay sufficiently quickly that we can use our remark. So all we need to show is that
\[ E(\theta \circ \Gamma) = 0. \]
Now $E(\theta \circ \Gamma) = \int_M \int_A \ln(\exp(-\psi/f)\,\partial_\sigma f) f$ for $\theta = \ln = \exp^{-1}$ as our choice of coordinate chart.
So
\[ \int_M \int_A \ln\Big(\exp\Big(\frac{-\psi}{f}\Big)\partial_\sigma f\Big) f = \int_M \int_A \ln(\partial_\sigma f) f - \psi. \]
Now $\ln \circ\, \partial_\sigma = \partial_\sigma \circ \ln$, so this becomes
\[ \int_M \int_A \partial_\sigma f - \psi, \]
but since $\psi = \partial_\sigma f + \mathrm{curl}_\Lambda B$, this is just
\[ -\int_M \int_A \mathrm{curl}_\Lambda B, \]
which vanishes by Stokes' theorem provided $B$ decays sufficiently rapidly towards infinity, which proves the lemma.
Lemma 7.3.6. For physical estimators we have a corresponding weak Cramer-Rao inequality:
\[ \int_M \sigma_{ij}\{\mathrm{cov}((\theta_i \circ u), (\theta_j \circ u)) - g^{ij}\}\,dm \geq 0 \tag{7.5} \]
Proof. The above inequality is certainly not true for general weakly unbiased estimators since the metric $\sigma$ is not positive definite. So we need to show that
\[ \sigma_{ij}\,\mathrm{cov}((\theta_i \circ u), (\theta_j \circ u)) \geq 0 \]
and
\[ \sigma_{ij}\, g^{ij}(f(u)) \geq 0. \]
In fact, what I will end up doing is proving that the second term vanishes for physical estimators and that the following equality holds for the first:
\[ \int_M \sigma_{ij}\,\mathrm{cov}((\theta_i \circ u), (\theta_j \circ u)) = \int_M \int_A \Big( \frac{\|\partial_\sigma f\|^2}{f} - \frac{\|\psi\|^2}{f} \Big). \]
But by the definition of a fuzzy Riemannian manifold $\partial_\sigma f$ and $\psi$ must be timelike vectors, so indeed $\int_M \sigma_{ij}\,\mathrm{cov}((\theta_i \circ u), (\theta_j \circ u))$ is positive. The lemma then follows from the strong Cramer-Rao inequality.
The previous lemma (which we just proved) is equivalent to the following:
Lemma 7.3.7. For the choice of $\Gamma$ for an estimator, the Cramer-Rao inequality becomes
\[ \int_M \int_A \frac{\|\partial_\sigma f\|^2}{f}\,da\,dm - \int_M \int_A \frac{\|\psi\|^2}{f} \geq 0 \tag{7.6} \]
Remarks. Note that the first term of the above inequality is the channel infor-
mation as I defined it before, and the second term is the bound information. So
what this inequality is saying is that the difference of these, which I likewise defined
as the physical information, must be greater than or equal to zero. In particular,
since Λ is an exponential family, any mle of it will realise the inequality critically.
To choose such a critical mle amounts to computing the variation of K with respect
to the signal function f . This provides the sort of backing we were looking for before
to motivate the variational analysis to follow.
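Proof. (Completion of Proof). I prove that $\sigma_{ij}\,\mathrm{cov}((\theta_i \circ \Gamma), (\theta_j \circ \Gamma)) = \frac{\|\partial_\sigma f\|^2}{f} - \frac{\|\psi\|^2}{f}$.
So
\[ \mathrm{cov}((\theta_i \circ \Gamma), (\theta_j \circ \Gamma)) = \int_A \Big(\partial_\sigma \ln(f) - \frac{\psi}{f}\Big)\Big(\partial_\sigma \ln(f) - \frac{\psi}{f}\Big) f. \]
Expanding, we get
\[ \int_A \Big( \frac{\partial_i f\,\partial_j f}{f} + \frac{\psi_i \psi_j}{f} \Big) - 2\int_A \frac{\partial_i f}{f}\frac{\psi_j}{f} f. \]
Now
\[ \int_M \int_A \sigma_{ij} \frac{\partial_i f}{f}\frac{\psi_j}{f} f = \int_M \int_A \sigma_{ij}\psi_i\psi_j \frac{1}{f} \]
by Stokes' theorem.
It remains to show that $\int_M \sigma_{ij}\, g^{ij}(f(\Gamma))\,dm = 0$.
Now
\begin{align*}
g^{ij}(f(\Gamma)) &= \int_A \frac{\partial \ln(f(\Gamma))}{\partial \Gamma_k}\frac{\partial \Gamma_k}{\partial \theta_i}\frac{\partial \ln(f(\Gamma))}{\partial \Gamma_l}\frac{\partial \Gamma_l}{\partial \theta_j} f \\
&= \int_A \frac{\partial \ln(f(\Gamma))}{\partial \Gamma_k}\frac{\partial (\mathrm{curl}\,V)_k}{\partial \theta_i}\frac{\partial \ln(f(\Gamma))}{\partial \Gamma_l}\frac{\partial (\mathrm{curl}\,V)_l}{\partial \theta_j} f.
\end{align*}
But this is 0 since $\mathrm{grad}\,\mathrm{curl}\,V = 0$. This completes the proof.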
Lemma 7.3.8. Any weak mle of an exponential family realises the weak Cramer-Rao
lower bound.
Further Remarks. Now it may seem that it is a bit restrictive to only consider
physical estimators, but I claim that we can transform any map u : M × A → M
into a physical estimator. All one needs to do is choose an appropriate likelihood
function $f$ such that $E(\theta \circ u) = 0$ and $u = \exp(-\psi/f)(\partial_\sigma f)$ for some appropriate $\psi$ that
will itself depend on f . This transforms our problem into one of varying likelihood
functions, which makes good physical sense.
Furthermore, the analysis in this section has often rested on the fact that various
objects vanish at infinity. In certain situations of interest (for instance, modelling
the event horizons of black holes, or of the behaviour of charged plasmas contained
in a finite vessel), this is not enough, and it is necessary to consider finite domains,
or at least domains with boundary. One is led to the following conjecture, motivated
by similar work done by physicists working on the physics of black holes:
Conjecture. The physical information $K_M$ of the manifold and the physical information $K_{\partial M}$ of its boundary are related by the following inequality:
\[ K_M - K_{\partial M} \geq 0, \]
which becomes an equality for a critical choice of signal function $f$. Here by $K_{\partial M}$ I mean of course
\[ K_{\partial M} = \int_{\partial M} \int_A \Big\{ \frac{\|\partial_\tau f\|_\tau^2}{f} - \frac{\|\psi\|_\tau^2}{f} \Big\}, \]
where $\tau$ is the restriction of $\sigma$ to the boundary of $M$.
We can think of this as a kind of "holographic principle" for the physical information $K$. Note that our earlier result can be considered a special case of the above, for $\partial M = \emptyset$. Note that if this conjecture is true, as a particular consequence
it suffices in applications to discard all boundary terms that crop up in a calculation,
or, in other words, to consider problems with Dirichlet boundary conditions.
Final Remark. All of this suggests that we define ourselves an information functional
\[ K(f) = \int_M \int_A \frac{\|\partial_\sigma f\|^2}{f}\,da\,dm - \int_M \int_A \frac{\|\psi\|^2}{f} \]
and play the following game: choose a family of signal functions $f$ and minimise
this functional within this family. The best we could hope for of course would be to
zero the above functional. However, in most circumstances, all we can hope for is to
make it as small as possible. This is the game that I shall play in what is to follow.
7.3.3 Physical estimators and weak Cramer-Rao (for Riemann-Cartan manifolds)
The notation in the previous subsection is unfortunately a bit too cumbersome
and too particular for our tastes. Ultimately we would like to express things more
succinctly and in full generality. More precisely we would like to deal with the case of
our metrics being general nondegenerate bilinear forms instead of merely symmetric.
As before, recall the following definition and its initial consequences:
Definition 39. An estimator $u$ is said to be weakly unbiased if
\[ E(\theta \circ u) = \int_M E_m(\theta \circ u)\,dm = E(\theta \circ \mathrm{Id}). \]
Lemma 7.3.9. Weakly unbiased estimators satisfy the additional relation that
\[ E\Big(\theta_i \frac{\partial \ln(f)}{\partial \theta_j}\Big) = 0. \]
Proof. Similar to that for unbiased estimators.
Corollary 7.3.10. If our likelihood function $f$ decays sufficiently fast towards infinity from the center of our coordinates $\theta$, then $E(\theta \circ \mathrm{Id}) = 0$.
Proof. Write $\theta = d\phi$.
Certainly
\[ E(\theta \circ \mathrm{Id}) = \int_A \int_M \theta f \,dm\,da = \int_A \Big\{ \int_{\partial M} \phi f \,dS - \int_M \phi \frac{\partial f}{\partial \theta}\,dm \Big\}\,da. \]
Now the first term on the right disappears since $f$ decays sufficiently fast. The second term on the right disappears by the lemma.
Lemma 7.3.11. Any estimator $\Gamma$ of the form $\Gamma = \partial_\Lambda f$ is a weakly unbiased estimator, where $f$ satisfies the conservation equation
\[ 0 = \Delta_\Lambda f, \]
where $\Delta_\Lambda$ is the standard fuzzy differential operator for the Laplacian, and $\partial_\Lambda$ is the gradient operator associated to the space. I shall call such estimators physical estimators.
Remark. Note that a physical estimator corresponds precisely to a geometry with
signal function where probability flux is conserved. So physics occurs in spaces that
”make sense”.
Proof. In a physical situation, which is the situation to which this is to apply, $f$ will decay sufficiently quickly that we can use our remark. So all we need to show is that
\[ E(\theta \circ \Gamma) = 0. \]
Now $E(\theta \circ \Gamma) = \int_M \int_A \ln(\partial_\sigma f) f$ for $\theta = \ln = \exp^{-1}$ as our choice of coordinate chart.
So
\[ \int_M \int_A \ln(\partial_\sigma f) f = \int_M \int_A \ln(\partial_\sigma f) f - \partial_\sigma f. \]
Now $\ln \circ\, \partial_\sigma = \partial_\sigma \circ \ln$, so this becomes
\[ \int_M \int_A \partial_\sigma f. \]
But this is zero by conservation of probability.
Lemma 7.3.12. For physical estimators we have a corresponding weak Cramer-Rao inequality:
\[ \int_M \sigma_{ij}\{\mathrm{cov}((\theta_i \circ u), (\theta_j \circ u)) - g^{ij}\}\,dm \geq 0 \tag{7.7} \]
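Proof. Observing that as before we have that
\[ \int_M \sigma_{ij}\, E_m\Big(\Big(\theta_i \circ u - E_m(\theta_i \circ u) - \frac{\partial \ln(f(u))}{\partial \theta_k} g^{ik}\Big)\Big(\theta_j \circ u - E_m(\theta_j \circ u) - \frac{\partial \ln(f(u))}{\partial \theta_l} g^{jl}\Big)\Big) \]
is positive semi-definite, since $u := \partial_\Lambda f$ is timelike, we conclude the lemma.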

Lemma 7.3.13. Any weak mle of an exponential family realises the weak Cramer-Rao lower bound.
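Theorem 7.3.14. (The EPI principle (for Riemann-Cartan manifolds)). With the understanding that a physical estimator is to be used, the Cramer-Rao inequality is equivalent to the nonnegativity of the Fisher Information. In other words, given a distribution $f$ over a space of $n$-dimensional nondegenerate bilinear forms of constant index $\Lambda$, we have that
\[ I(f) := \int_M \int_A \frac{\|\partial_\Lambda f\|^2}{f} \geq 0. \]
Proof. First of all, it is clear that
\[ \int_M \sigma_{ij}\,\mathrm{cov}_m(\theta_i \circ \partial_\Lambda f, \theta_j \circ \partial_\Lambda f) = \int_M \int_A \frac{\|\partial_\Lambda f\|^2}{f}, \]
so it remains to show that $\int_M \sigma_{ij}\, g^{ij}(f(\partial_\Lambda f)) = 0$.
Now
\begin{align*}
\int_M \sigma_{ij}\, g^{ij}(f(u)) &= \int_M \int_A \sigma_{ij} \frac{\partial \ln(f(u))}{\partial u_k}\frac{\partial u_k}{\partial \theta_i}\frac{\partial \ln(f(u))}{\partial u_l}\frac{\partial u_l}{\partial \theta_j} f \\
&= \int_M \int_A \sigma_{ij} \frac{\partial \ln(f(u))}{\partial u_k}\frac{\partial\, \mathrm{curl}(V)_k}{\partial \theta_i}\frac{\partial \ln(f(u))}{\partial u_l}\frac{\partial\, \mathrm{curl}(V)_l}{\partial \theta_j} f \\
&\qquad \text{(for some $V$, since the divergence of $u$ is zero)} \\
&= 0 \qquad \text{(since grad(curl) is the zero operator)} \tag{7.8}
\end{align*}
This proves the EPI principle in the form we would like to use it.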
7.4 Fisher Information is the optimal sharp information measure
7.4.1 Introduction
There were a number of criticisms leveled at Frieden's program of using Fisher information to recover physical principles. But there are two in particular that need to be addressed before I go any further:
(i) What is the mathematical justification for using the EPI principle?
(ii) Why the Fisher information? Why not some other arbitrary information measure?
I believe that I have provided a rather thorough explanation of (i), grounding it on the fact that the Fisher information satisfies the Cramer-Rao inequality for statistical manifolds. (ii), however, is not so clear cut. In particular, there are many, many other information measures that also satisfy the Cramer-Rao inequality. Many, many more. To have an idea of just how many, I will point out one general result:
The $L$-information,
\[ I(L) = \int_M \int_A \frac{\|Lf\|^2}{f}, \]
where $L$ is a differential operator, satisfies the Cramer-Rao inequality.
For example, $L_0 = (\partial_1, \partial_2, \dots, \partial_n)$ is the standard Fisher operator. Restricting to the case of a 3-dimensional chart, note that we could also have used $(\partial_3, \partial_1, \partial_2)$, $(\partial_2^2, \partial_1, \partial_1^3)$, $(\partial_1 + \partial_2 + 4\partial_3, \partial_2, \partial_3 + 2\partial_2^2)$, $(x_2\partial_1 + x_1\partial_2, \partial_2, \partial_3)$ etc., to have an idea of how many degrees of freedom there are here.
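To make the $L$-information concrete, here is a small Python sketch that evaluates $I(L) = \int \|Lf\|^2 / f$ in one dimension for a Gaussian $f$ with standard deviation $s$, once with $L = \partial$ (the Fisher case) and once with $L = \partial^2$. The one-dimensional Gaussian setting, the function names and the value of `s` are assumptions made purely for illustration; the sketch only evaluates the functional and does not test the Cramer-Rao property claimed above.

```python
import math

def gaussian(x, s):
    """Centred normal density with standard deviation s."""
    return math.exp(-0.5 * (x / s) ** 2) / (s * math.sqrt(2 * math.pi))

def trapezoid(g, lo, hi, n=20000):
    """Composite trapezoid rule for the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    total = 0.5 * (g(lo) + g(hi))
    for k in range(1, n):
        total += g(lo + k * h)
    return total * h

def first_derivative(x, s):
    """(d/dx) f for the Gaussian f, written analytically."""
    return -x / s ** 2 * gaussian(x, s)

def second_derivative(x, s):
    """(d/dx)^2 f for the Gaussian f, written analytically."""
    return (x ** 2 / s ** 4 - 1 / s ** 2) * gaussian(x, s)

def l_information(apply_L, s):
    """Integral of (L f)^2 / f, where apply_L(x, s) returns (L f)(x)."""
    return trapezoid(lambda x: apply_L(x, s) ** 2 / gaussian(x, s),
                     -12 * s, 12 * s)

s = 1.3                                           # illustrative width
fisher = l_information(first_derivative, s)       # closed form: 1 / s**2
second_order = l_information(second_derivative, s)  # closed form: 2 / s**4
print(fisher, second_order)
```

The two operators give different values of the functional for the same signal function, which is the freedom the question "why the Fisher information?" refers to.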
So I shall now state the main result of this section.
Theorem 7.4.1. Over all possible information measures, the $L_0$-information, or Fisher Information, is critical. In other words, it is an optimal measure of information.
Remark. This does not preclude the fact that there may be other (locally) optimal
measures. However it is a good starting point for further philosophical discussion.
For now I will restrict myself to the case that the signal function is sharp.
Let S(M ) be the space of functionals F : (M → R) → R on our manifold M .
We would like to come up with some natural parametrisation of these.
We expect such a parametrisation to be compatible with the metric such that
our functionals F are locally diagonalisable with signature identical to the metric.
So locally our space has the dimension of the space of positive matrices on $\mathbb{R}^n$.
7.4.2 Variation of the meta information

To gather the appropriate tools to prove the theorem, we need to define a new func-
tional which I shall call the meta-information. First I shall assemble the ingredients:
Essentially we wish to associate a probability distribution of information operators $L(m, a)$ to each pair $(m, a)$ in our statistical manifold. Such information operators are characterised by the way they act on vectors. In particular, we define the information $h_{ijk}$ to be the coefficient of the $i$th component of a general operator $L$, being the coefficient of the $j$th directional vector field in that component raised to the $k$th power. In particular,
\[ L_{ijk} f = h_{ijk} (\partial_j)^k f. \]
Just as with our inner products for statistical manifolds as before, we play the same game and define a probability density functional $\lambda(m, a, h)$ such that for a fixed $(m, a)$, $\int_B \lambda(m, a, h)\,dh = 1$, and also, as before, $\int_B \Lambda(\lambda(m, a, h))\,dh = 0$ (with the appropriate conventions for fuzzy derivatives).
Then, we may define the meta-information to be
\[ I = \int_M \int_A \int_B \lambda(m, a, b) \frac{\|L(m,a,b) f(m,a)\|^2_{(m,a)}}{f(m,a)}. \]
Make the assumption now that the information distribution functional is sharp, that is, a particular information is preferred. Then $\lambda(m, a, b) = \delta(b - \kappa(m, a))$, and
\[ I = \int_M \int_A \frac{\|\kappa_{jk}(m,a)(\partial_j)^k f(m,a)\|^2}{f(m,a)}. \]
Take the first variation with respect to κ and require that it be zero; we get
immediately that κ is constant in m and a if it is critical. Furthermore, it is constant
in $k$, if we use the notion of generalised derivative (to real numbers). So $\kappa$ is essentially just $\kappa_{ij}(\partial_j)^k$ for some fixed $k$ where $\kappa_{ij}$ is a constant matrix. One may then make the observation that it is possible to find a new metric $\bar\sigma$ such that $\|\kappa_j(\partial_j)^k f\|^2_\sigma = \|\partial^k f\|^2_{\bar\sigma}$, i.e. it is possible to absorb $\kappa$ completely into the metric $\sigma$.
So we have reduced the problem to one of examining the so called k-information.
Why then should the 1-information be preferred?
I guess then the ultimate conclusion is this:
Theorem 7.4.2. For each k ∈ R, the k-information is optimal.
However it shall turn out that this is relatively irrelevant when it comes to
matters of physics, leaving one with a (potentially infinite) number of integration
constants. I suppose if there was any way to say that the one information is preferred
it would be simply that there is no guide as to how to determine these integration
constants; therefore assuming they exist is a nonsense (overspecification), which
rules out criticality for all informations except one or less. But if one considers the
k-information for k < 1 one gets nothing at all (underspecification). So k = 1 is the
only possibility.
This shall all hopefully become clearer to the reader after perusing the section
to come.
7.4.3 Alternative perspectives

However, we can make the following important observation, that what we have been
doing has been slightly unnecessary and perhaps a little silly.
Recall the idea of maximal likelihood estimator (mle) from before.
From Murray and Rice’s book on the subject [40] we have that
Theorem 7.4.3. For any exponential family (statistical structure) the associated
mle is unique.
Remark. In particular via my construction this means that for every geometric struc-
ture there is a unique mle.
Also from [40] we have that
Theorem 7.4.4. The unique mle realises the Cramer-Rao lower bound for the Fisher
Information.
So the Fisher information is the best information in the regard that it is the
unique functional which is zeroed by the estimator of maximal likelihood. This is
in accord with our common sense as to how an information should behave.
For yet another perspective, consider the Cencov Uniqueness Theorem [43],[44],[10].
This essentially states that for any coarse graining (averaging over the geometry of
a space), the Fisher information of the new geometry is less than that of the old
geometry. This means that the Fisher information is a special information, in that
it directly captures what we would expect from the 2nd law of thermodynamics - as
the disorder of the system decreases, so does the Fisher information.
In other words, the information of our system is in monotone correspondence with the Fisher information, which gives us another reason to prefer the Fisher information as a measure of such.
Chapter 8
Physics from Fisher Information
We are now finally ready to put theory to application. This brings me back to the
signal functions that I introduced before in my discussion of statistical manifolds.
There was in fact a very good reason for studying these things, since it is these
particular classes of objects (sharp and almost sharp manifolds) that will be the
basis of my study in this section. It will turn out that by assuming that a manifold is sharp and applying our variational principle we shall recover the equations of general relativity; furthermore, if we relax our restrictions a bit and assume that our space is slightly fuzzy, then rinse and repeat our variational calculation, we shall be able to derive the underlying equations on which one can base the standard model.
8.1 Physics for Sharp Manifolds
8.1.1 Introduction
Let us now suppose f (m, a) = δ(σ(m) − a) again. Our aim is to derive the classical
equations for a physical manifold. The first step is to reexpress I and J using this
signal function. Clearly for this choice, ψ(m, a) = ψ(m, a)δ(σ(m) − a).
Remark : Note that, when all is said and done, we will get a value for the
physical information K ≥ 0. In fact it is extremely unlikely that we will be able
to get K(σ) = 0 for any metric σ. However the point of this exercise is to find the
conditions on σ for it to minimise, or optimise, our information functional as defined
before.
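Substituting into $I$, and observing that $\mathrm{grad}_\Lambda \delta(\sigma(m) - a) = \mathrm{grad}_\sigma \sigma_{jk} \frac{\partial}{\partial \sigma_{jk}} \delta(\sigma(m) - a)$, we get that
\begin{align*}
I = I(\sigma) &= \int_M \int_A \frac{1}{4\delta} \Big\langle \partial_\sigma \sigma_{jk} \frac{\partial}{\partial \sigma_{jk}} \delta(\sigma(m) - a),\ \partial_\sigma \sigma_{mn} \frac{\partial}{\partial \sigma_{mn}} \delta(\sigma(m) - a) \Big\rangle (\det\sigma)^{1/2}\,da\,dm \\
&= \int_M \int_A \frac{1}{4\delta} \frac{\partial \delta}{\partial \sigma_{jk}} \frac{\partial \delta}{\partial \sigma_{mn}} \langle \partial_\sigma \sigma_{jk}, \partial_\sigma \sigma_{mn} \rangle (\det\sigma)^{1/2}\,da\,dm.
\end{align*}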
Now, before I proceed any further, I shall assume that in the general expression
for ψ deduced from the conservation equation, ψ(m, a) = gradΛ f (m, a) + B(m, a)
that B = 0. Later I shall relax this restriction, and show that, at least in the
classical case, B corresponds to the presence of electromagnetic effects.
8.1.2 Opening statements

I now make a series of claims:
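Claim 1:
\[ \int_M \int_A \frac{1}{\delta}\,\partial_i(\delta)\,\partial_j(\delta)\,\Sigma\,da\,dm = \int_M \int_A \partial_i \partial_j(\Sigma)\,\delta\,da\,dm, \]
provided that $\Sigma(1/\sqrt{a}) \sim 1/a^p$ for some $p > 1$, $a \gg 1$.
Claim 2:
$\langle \partial_\sigma x_{jk}, \partial_\sigma x_{mn} \rangle$ satisfies the asymptotic property in Claim 1, for very small values in the entries of the matrix $y = (x_{jk} - \sigma_{jk})^n_{j,k=1}$.
Claim 3:
\[ I = I(\sigma) = \frac{1}{4} \int_M R\,(\det\sigma)^{1/2}\,dm. \]
Examining $J$, we find, upon assuming $B = 0$, that
\[ J = J(\sigma) = \int_{M \times A} \frac{\|\psi(m,a)\|^2}{4 f(m,a)}\,da\,dm = \int_M \frac{\|\psi\|^2}{4}\,dm. \]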
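Claim 4:
Assuming boundary terms are negligible, we get that $\delta(4I) = \int_M (R_{ij} - \frac{1}{2}\sigma_{ij} R)(\det\sigma)^{1/2}\,\delta\sigma^{-1}$ and $\delta(4J) = \int_M (\bar\psi_i \bar\psi_j - \frac{1}{2}\sigma_{ij}\|\bar\psi\|^2)(\det\sigma)^{1/2}\,\delta\sigma^{-1}$, where $\delta$ is now the variational derivative.
Substituting the expressions from the final claim into the equation $\delta K = 0$ we get
\[ G_{ij} = \bar\psi_i \bar\psi_j - \frac{1}{2}\sigma_{ij}\|\bar\psi\|^2. \]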
This is remarkably similar to the Einstein equation, but with the additional term $\frac{1}{2}\sigma_{ij}\|\bar\psi\|^2$.
This term might explain why cosmologists using the standard Einstein
equation to describe large objects like galaxies get puzzling results, and are forced
to invoke the presence of large amounts of (as yet unobserved) dark matter/energy.
It might also explain the Pioneer anomaly (the discrepancy between the predicted
flight path of the now famous probe out of the solar system and its actual journey).
Note: It can be shown that ψ̄ corresponds to the four momentum if we consider
toy examples like Minkowski space.
Also observe that this equation can be simplified, for if we contract it on both sides with respect to the metric we obtain
\[ R - \frac{n}{2} R = \|\bar\psi\|^2 - \frac{n}{2}\|\bar\psi\|^2, \]
where we are assuming $n = \dim M > 2$ (in applications, $n$ will usually be 4). Hence $R = \|\bar\psi\|^2$, and as a consequence the equation of state simplifies to
\[ R_{ij} = \bar\psi_i \bar\psi_j \]
or, in coordinate invariant form,
\[ \mathrm{Ric} - \bar\psi \otimes \bar\psi = 0. \tag{8.1} \]
Together with the geodesic condition
\[ \nabla_{(\bar\psi, \sigma)} \frac{\bar\psi}{\|\bar\psi\|} = 0 \]
and the conservation equation
\[ \mathrm{div}_\sigma \bar\psi = 0 \]
we have now completely described the dynamics of a physical manifold without
boundary for the zeroth order perturbation f (m, a) = δ(σ(m) − a).
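The contraction step that takes $G_{ij} = \bar\psi_i\bar\psi_j - \frac{1}{2}\sigma_{ij}\|\bar\psi\|^2$ to $R_{ij} = \bar\psi_i\bar\psi_j$ is pure index algebra, and can be checked on a toy example. The Python sketch below does so at a single point, assuming $n = 4$, a Minkowski metric and an arbitrary covector; these choices and all variable names are illustrative assumptions, and the check says nothing about curvature itself.

```python
n = 4

# Assumed toy metric: sigma_ij = diag(-1, 1, 1, 1), which is its own inverse.
sigma = [[0.0] * n for _ in range(n)]
for i in range(n):
    sigma[i][i] = -1.0 if i == 0 else 1.0
sigma_inv = [row[:] for row in sigma]

psi = [2.0, 0.5, -1.0, 0.25]  # arbitrary covector standing in for psi-bar

def contract(upper, lower):
    """Full contraction upper^{ij} lower_{ij}."""
    return sum(upper[i][j] * lower[i][j] for i in range(n) for j in range(n))

psi_psi = [[psi[i] * psi[j] for j in range(n)] for i in range(n)]
norm2 = contract(sigma_inv, psi_psi)  # ||psi||^2 with respect to sigma

# Right-hand side of G_ij = psi_i psi_j - (1/2) sigma_ij ||psi||^2 and its trace.
rhs = [[psi_psi[i][j] - 0.5 * sigma[i][j] * norm2 for j in range(n)]
       for i in range(n)]
trace_rhs = contract(sigma_inv, rhs)  # should equal (1 - n/2) ||psi||^2

# Candidate solution Ric_ij = psi_i psi_j: rebuild the Einstein tensor from it.
ricci = psi_psi
scalar_R = contract(sigma_inv, ricci)
einstein = [[ricci[i][j] - 0.5 * sigma[i][j] * scalar_R for j in range(n)]
            for i in range(n)]

print(trace_rhs, (1 - n / 2) * norm2)
```

Since the trace of the right-hand side is $(1 - n/2)\|\bar\psi\|^2$ and the trace of $G_{ij}$ is $(1 - n/2)R$, the two scalars agree whenever $n \neq 2$, and the Einstein tensor rebuilt from $R_{ij} = \bar\psi_i\bar\psi_j$ reproduces the right-hand side entry by entry.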
8.1.3 Proof of the first Claim
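I prove this for the one dimensional case, but this example is easily extended to the general case. First observe that $\delta(x) = \lim_{a\to\infty} f(a, x)$, where $f(a, x) = \sqrt{\frac{a}{\pi}} \exp(-ax^2)$. Of course this definition is not unique, but by a deep theorem in functional analysis it does not matter what limiting functions we use, as long as they are smooth, since I am going to take derivatives.
Then
\begin{align*}
\int (\delta'(x))^2 (1/\delta(x)) \Sigma(x)\,dx &= \lim_{a\to\infty} \int \Big(\frac{\partial f(a,x)}{\partial x}\Big)^2 / f(a,x)\,\Sigma\,dx \\
&= \lim_a \int \sqrt{\frac{a}{\pi}}\, 4a^2 x^2 e^{-ax^2} \Sigma(x)\,dx \\
&= \lim_a \frac{4a^{5/2}}{\sqrt{\pi}} \frac{\partial}{\partial a}\Big(a^{-1/2} \int e^{-u^2} \Sigma(u/\sqrt{a})\,du\Big), \quad u = \sqrt{a}\,x \\
&= \lim_a \frac{4a^{5/2}}{\sqrt{\pi}} \Big(-\frac{1}{2a^{3/2}} \int e^{-u^2} \Sigma(u/\sqrt{a})\,du \\
&\qquad + a^{-1/2} \int \frac{-1}{2a^{3/2}}\, u e^{-u^2} \Sigma'(u/\sqrt{a})\,du\Big) \tag{8.2}
\end{align*}
The first term vanishes by our asymptotic assumption on $\Sigma$. We are left with
\begin{align*}
&\lim_a \frac{4a^2}{2\sqrt{\pi}\,a^{3/2}} \int e^{-v}\, \Sigma''(\sqrt{v/a}) \frac{1}{2\sqrt{va}}\,dv, \quad (v = u^2), \\
&= \lim_a \frac{4}{2\sqrt{\pi}\,2} \int \frac{2u}{u}\, e^{-u^2} \Sigma''(u/\sqrt{a})\,du \\
&= \lim_a \frac{1}{\sqrt{\pi}} \int e^{-u^2} \Sigma''(u/\sqrt{a})\,du \\
&= \Sigma''(0) \\
&= \int \delta''(x) \Sigma(x)\,dx, \tag{8.3}
\end{align*}
which proves the claim.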
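The role of the decay hypothesis on $\Sigma$ can be seen numerically. The following Python sketch evaluates the left-hand side $\int (\partial_x f)^2 / f \; \Sigma \, dx$ for the Gaussian approximants $f(a, x)$ and the test function $\Sigma(x) = x^4$, which satisfies $\Sigma(1/\sqrt{a}) = a^{-2}$ (so $p = 2$). The test function, the function names and the chosen values of $a$ are assumptions made for illustration. For this $\Sigma$ the integral has the closed form $15/(2a)$, so it tends to $0 = \Sigma''(0)$; the sketch checks only this one case and is not a verification of Claim 1 in general.

```python
import math

def f(a, x):
    """Gaussian approximant to the delta function: sqrt(a/pi) exp(-a x^2)."""
    return math.sqrt(a / math.pi) * math.exp(-a * x * x)

def df_dx(a, x):
    """Derivative of f with respect to x."""
    return -2.0 * a * x * f(a, x)

def sigma_test(x):
    """Assumed test function with Sigma(1/sqrt(a)) = a**-2, i.e. p = 2."""
    return x ** 4

def lhs(a, n=20000):
    """Trapezoid approximation of the integral of (f')^2 / f * Sigma."""
    half_width = 10.0 / math.sqrt(a)  # the integrand is negligible beyond this
    h = 2.0 * half_width / n
    def integrand(x):
        return df_dx(a, x) ** 2 / f(a, x) * sigma_test(x)
    total = 0.5 * (integrand(-half_width) + integrand(half_width))
    for k in range(1, n):
        total += integrand(-half_width + k * h)
    return total * h

for a in (10.0, 100.0, 1000.0):
    print(a, lhs(a), 15.0 / (2.0 * a))  # numerical value against closed form
```

Replacing `sigma_test` by a function that does not vanish fast enough at the origin is a quick way to see why the hypothesis $p > 1$ is imposed.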
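8.1.4 Proof of the second Claim

From the conservation equation for a subset $U$ of $M$, we have that
\begin{align*}
0 &= \int_{U \times A} \mathrm{div}_\sigma\Big(\partial_\sigma \sigma_{jk} \frac{\partial \delta}{\partial \sigma_{jk}}\Big)\,da\,dm \\
&= \int_{\partial U \times A} \frac{\partial \delta}{\partial \sigma_{jk}} \langle \partial_\sigma \sigma_{jk}, \hat n \rangle\,dS\,da \tag{8.4}
\end{align*}
Now consider the one dimensional case for simplicity:
\[ 0 = \int \delta' \Sigma \]
and
\[ \delta = \lim_{a\to\infty} \sqrt{\frac{a}{\pi}}\, e^{-ax^2} = \lim_{a\to\infty} g(a, x), \]
which implies that
\[ \delta' = -\lim_{a\to\infty} 2\sqrt{\frac{a}{\pi}}\, a x e^{-ax^2}. \]
So
\begin{align*}
0 &= -\lim_{a\to\infty} 2 \int_{-\infty}^{\infty} \sqrt{\frac{a}{\pi}}\, a x e^{-ax^2} \Sigma(x)\,dx \\
&= -\lim_{a\to\infty} 2 \frac{a}{\pi^{1/2}} \int_{-\infty}^{\infty} u e^{-u^2} \Sigma\Big(\frac{u}{\sqrt{a}}\Big)\,du \\
&= -\lim_{a\to\infty} \frac{a}{\sqrt{\pi}} \int_0^{\infty} e^{-t}\, \Sigma(\sqrt{t/a})\,dt \\
&= -\lim_{a\to\infty} \frac{a}{\sqrt{\pi}}\, \Sigma(1/\sqrt{a}) \tag{8.5}
\end{align*}
and the only way this can be zero is if $\Sigma(1/\sqrt{a}) \sim \frac{1}{a^p}$, $p > 1$, for large $a$.
Translating back to our situation, this means that $\langle \partial_\sigma \sigma_{jk}, \hat n \rangle = 0$.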
Now, consider the level sets of σmn in M . These sets will be codimension one, and
have a normal n̂ that is parallel to ∂σ σmn . Hence on these sets < ∂σ σjk , ∂σ σmn >= 0.
Now sweep out all of M by following the vector field ∂σ σmn to create a family of
sets covering all of M with this property.
Hence the second Claim is proven.
8.1.5 Proof of the third Claim

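From the first and second claims, we establish quickly that
\begin{align*}
I(\sigma) &= \int_{M \times A} \frac{1}{4\delta} \frac{\partial \delta}{\partial \sigma_{jk}} \frac{\partial \delta}{\partial \sigma_{mn}} \langle \partial_\sigma \sigma_{jk}, \partial_\sigma \sigma_{mn} \rangle (\det\sigma)^{1/2}\,da\,dm \\
&= \int_{M \times A} \frac{1}{4}\,\delta(\sigma(m) - a) \frac{\partial}{\partial \sigma_{jk}} \frac{\partial}{\partial \sigma_{mn}} \big(\langle \partial_\sigma \sigma_{jk}, \partial_\sigma \sigma_{mn} \rangle (\det\sigma)^{1/2}\big)\,da\,dm \\
&= \frac{1}{4} \int_M \frac{\partial}{\partial \sigma_{jk}} \frac{\partial}{\partial \sigma_{mn}} \big(\langle \partial_\sigma \sigma_{jk}, \partial_\sigma \sigma_{mn} \rangle (\det\sigma)^{1/2}\big)\,dm \tag{8.6}
\end{align*}
I now demonstrate that this new expression is equivalent to one quarter the integral of the scalar curvature.
First observe that the scalar curvature is the contraction of the Ricci curvature with respect to the metric:
\[ R = \sigma^{ij}\,\mathrm{Ric}_{ij}. \]
The Ricci tensor has the following expression in coordinates:
\[ \mathrm{Ric}_{ik} = \partial_l \Gamma^l_{ik} - \partial_k \Gamma^l_{il} + \Gamma^l_{ik}\Gamma^m_{lm} - \Gamma^m_{il}\Gamma^l_{km}. \]
Lemma 8.1.1. I claim that the first three terms evaluate to zero after contraction, that is, that the scalar curvature can be written in terms of coordinates as:
\[ R = -\sigma^{i\alpha}\,\Gamma^m_{il}\,\Gamma^l_{\alpha m}. \]
From this point on, without loss of generality, I will be making the convenient assumption that the coordinate vectors $\partial_i$ have been chosen so that $[\partial_i, \partial_j] = 0$. This has the natural consequence that the Christoffel symbols are antisymmetric in their lower indices, that is,
\[ \Gamma^k_{ij} = -\Gamma^k_{ji}. \]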
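I will also make use of the following two identities:
\[ \Gamma^m_{ij} = \frac{1}{2}(\partial_i \sigma_{jk} + \partial_j \sigma_{ki} - \partial_k \sigma_{ij})\,\sigma^{km} \tag{8.7} \]
\[ \partial_i \sigma_{jk} = \Gamma^l_{ki}\sigma_{lj} + \Gamma^l_{ij}\sigma_{lk} \tag{8.8} \]
Proof. (of lemma) By (8.7),
\begin{align*}
(\partial_l \Gamma^l_{i\alpha} - \partial_\alpha \Gamma^l_{il})\sigma^{i\alpha} &= \frac{1}{2}\{\partial_l[(\partial_i \sigma_{\alpha k} + \partial_\alpha \sigma_{ki} - \partial_k \sigma_{i\alpha})\sigma^{kl}] - \partial_\alpha[(\partial_i \sigma_{lk} + \partial_l \sigma_{ki} - \partial_k \sigma_{li})\sigma^{kl}]\}\sigma^{i\alpha} \\
&= \frac{1}{2}\{(\partial_i \sigma_{\alpha k} + \partial_\alpha \sigma_{ki} - \partial_k \sigma_{i\alpha})\sigma^{kl}(-\sigma^{kl}\partial_l \sigma_{kl}) \\
&\qquad - (\partial_i \sigma_{lk} + \partial_l \sigma_{ki} - \partial_k \sigma_{li})\sigma^{kl}(-\sigma^{kl}\partial_\alpha \sigma_{kl})\}\sigma^{i\alpha} \\
&\quad + \frac{1}{2}\{\partial_l(\partial_i \sigma_{\alpha k} + \partial_\alpha \sigma_{ki} - \partial_k \sigma_{i\alpha}) - \partial_\alpha(\partial_i \sigma_{lk} + \partial_l \sigma_{ki} - \partial_k \sigma_{li})\}\sigma^{kl}\sigma^{i\alpha} \\
&= -\frac{1}{2}[(\partial_i \sigma_{\alpha k})(\partial_l \sigma_{kl})(\sigma^{kl})^2\sigma^{i\alpha} + (\partial_\alpha \sigma_{ki})(\partial_l \sigma_{kl})(\sigma^{kl})^2\sigma^{i\alpha} \\
&\qquad - (\partial_k \sigma_{i\alpha})(\partial_l \sigma_{kl})(\sigma^{kl})^2\sigma^{i\alpha}] - \frac{1}{2}[(\partial_l \sigma_{ik})(\partial_\alpha \sigma_{kl})(\sigma^{kl})^2\sigma^{i\alpha} \\
&\qquad + (\partial_i \sigma_{kl})(\partial_\alpha \sigma_{kl})(\sigma^{kl})^2\sigma^{i\alpha} - (\partial_k \sigma_{li})(\partial_\alpha \sigma_{li})(\sigma^{kl})^2\sigma^{i\alpha}] \tag{8.9}
\end{align*}
Now
\[ \frac{1}{2}\{\partial_l \partial_i \sigma_{\alpha k} + \partial_l \partial_\alpha \sigma_{ki} - \partial_l \partial_k \sigma_{i\alpha} - \partial_\alpha \partial_l \sigma_{ik} - \partial_\alpha \partial_i \sigma_{kl} + \partial_\alpha \partial_k \sigma_{li}\}\sigma^{kl}\sigma^{i\alpha} = 0 \tag{8.10} \]
after one observes that $\sigma^{kl}\sigma_{i\alpha} = \delta_{ki}\delta_{l\alpha}$ and plugs things through.
The remaining problem term, $\sigma^{ik}\Gamma^l_{ik}\Gamma^m_{lm}$, is zero because $\sigma^{ik} = \sigma^{ki}$ and $\Gamma^l_{ik} = -\Gamma^l_{ki}$. A simple argument using these facts plus the judicious replacement of dummy indices will then do the job.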
This proves the lemma.
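So it remains to demonstrate that
\[ \frac{\partial}{\partial \sigma_{jk}} \frac{\partial}{\partial \sigma_{mn}} \big(\langle \partial_\sigma \sigma_{jk}, \partial_\sigma \sigma_{mn} \rangle (\det\sigma)^{1/2}\big) = -\sigma^{i\alpha}\,\Gamma^m_{il}\,\Gamma^l_{\alpha m}\,(\det\sigma)^{1/2}. \]
Now, under application of the second claim to remove a few unnecessary terms, we get that
\begin{align*}
&\frac{\partial}{\partial \sigma_{jk}} \frac{\partial}{\partial \sigma_{mn}} \big(\langle \partial_\sigma \sigma_{jk}, \partial_\sigma \sigma_{mn} \rangle (\det\sigma)^{1/2}\big) \\
&= \sigma^{i\alpha}\Big[\frac{\partial}{\partial \sigma_{jk}}(\partial_i \sigma_{jk}) \frac{\partial}{\partial \sigma_{mn}}(\partial_\alpha \sigma_{mn}) + \frac{\partial}{\partial \sigma_{mn}}(\partial_i \sigma_{jk}) \frac{\partial}{\partial \sigma_{jk}}(\partial_\alpha \sigma_{mn})\Big](\det\sigma)^{1/2} \\
&\quad + \frac{\partial}{\partial \sigma_{jk}}(\sigma^{i\alpha})\Big[\frac{\partial}{\partial \sigma_{mn}}(\partial_i \sigma_{jk})\,\partial_\alpha \sigma_{mn} + \partial_i \sigma_{jk} \frac{\partial}{\partial \sigma_{mn}}(\partial_\alpha \sigma_{mn})\Big](\det\sigma)^{1/2} \\
&\quad + \frac{\partial}{\partial \sigma_{mn}}(\sigma^{i\alpha})\Big[\frac{\partial}{\partial \sigma_{jk}}(\partial_i \sigma_{jk})\,\partial_\alpha \sigma_{mn} + \partial_i \sigma_{jk} \frac{\partial}{\partial \sigma_{jk}}(\partial_\alpha \sigma_{mn})\Big](\det\sigma)^{1/2} \\
&\quad + \frac{\partial}{\partial \sigma_{mn}}\sigma^{i\alpha}\,[(\partial_i \sigma_{jk})(\partial_\alpha \sigma_{mn})]\,\frac{\partial}{\partial \sigma_{jk}}(\det\sigma)^{1/2} \\
&\quad + \frac{\partial}{\partial \sigma_{jk}}\sigma^{i\alpha}\,[(\partial_i \sigma_{jk})(\partial_\alpha \sigma_{mn})]\,\frac{\partial}{\partial \sigma_{mn}}(\det\sigma)^{1/2} \tag{8.11}
\end{align*}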
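Consider the first term. By repeated application of (8.8) we get:
\begin{align*}
&\sigma^{i\alpha}\Big[\frac{\partial}{\partial \sigma_{jk}}(\partial_i \sigma_{jk}) \frac{\partial}{\partial \sigma_{mn}}(\partial_\alpha \sigma_{mn}) + \frac{\partial}{\partial \sigma_{mn}}(\partial_i \sigma_{jk}) \frac{\partial}{\partial \sigma_{jk}}(\partial_\alpha \sigma_{mn})\Big](\det\sigma)^{1/2} \\
&= (\det\sigma)^{1/2}\sigma^{i\alpha}[(\Gamma^l_{ki}\delta_{jl}\delta_{kj} + \Gamma^l_{ij}\delta_{jl}\delta_{kk})(\Gamma^{\bar l}_{n\alpha}\delta_{m\bar l}\delta_{nm} + \Gamma^{\bar l}_{\alpha m}\delta_{m\bar l}\delta_{nn}) \\
&\qquad + (\Gamma^l_{ki}\delta_{ml}\delta_{nj} + \Gamma^l_{ij}\delta_{ml}\delta_{nk})(\Gamma^{\bar l}_{n\alpha}\delta_{j\bar l}\delta_{km} + \Gamma^{\bar l}_{\alpha m}\delta_{j\bar l}\delta_{kn})] \\
&= (\det\sigma)^{1/2}\sigma^{i\alpha}[\Gamma^k_{ki}\Gamma^n_{n\alpha} + \Gamma^j_{ji}\Gamma^m_{m\alpha} + \Gamma^l_{il}\Gamma^m_{m\alpha} \\
&\qquad + \Gamma^l_{il}\Gamma^m_{\alpha m} + \Gamma^m_{mi}\Gamma^n_{n\alpha} + \Gamma^m_{ji}\Gamma^j_{\alpha m} + \Gamma^m_{ij}\Gamma^j_{m\alpha} + \Gamma^m_{ij}\Gamma^j_{\alpha m}] \\
&= \sigma^{i\alpha}(\Gamma^m_{mi}\Gamma^n_{n\alpha} - \Gamma^m_{in}\Gamma^n_{\alpha m})(\det\sigma)^{1/2} \tag{8.12}
\end{align*}
the last equality obtained using the fact that $\Gamma^k_{ij} = -\Gamma^k_{ji}$ several times.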
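Now it might appear that we have neglected terms like $\frac{\partial}{\partial \sigma_{jk}}\Gamma^l_{ki}$ but these all vanish, because, for instance,
\begin{align*}
\frac{\partial}{\partial \sigma_{jk}}(\partial_i \sigma_{jk}) &= \Gamma^l_{ki}\delta_{jl}\delta_{kj} + \Gamma^l_{ij}\delta_{jl}\delta_{kk} + \frac{\partial}{\partial \sigma_{jk}}(\Gamma^l_{ki})\sigma_{lj} + \frac{\partial}{\partial \sigma_{jk}}(\Gamma^l_{ij})\sigma_{lk} \\
&= \Gamma^l_{ki}\delta_{jl}\delta_{kj} + \Gamma^l_{ij}\delta_{jl}\delta_{kk} \tag{8.13}
\end{align*}
after we observe that the last two terms cancel after the relabelling of $j \mapsto k$, $k \mapsto j$ in the second last term and using the antisymmetry of the Christoffel symbols in their lower indices. Similar results hold for all the other terms.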
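Examining the second and third terms, we see that they are the same after relabelling of dummy indices. Similarly for the fourth and fifth terms. Furthermore, we observe that the second and fourth terms are related by the relation (second term) = -2(fourth term), since we know that $\frac{\partial}{\partial \sigma_{jk}}(\det\sigma)^{1/2} = \frac{1}{2}\sigma^{jk}(\det\sigma)^{1/2}$.
So the last three terms of (8.11) cancel, leaving us with only the second term to evaluate:
\begin{align*}
&\frac{\partial}{\partial \sigma_{jk}}(\sigma^{i\alpha})\Big[\frac{\partial}{\partial \sigma_{mn}}(\partial_i \sigma_{jk})\,\partial_\alpha \sigma_{mn} + \partial_i \sigma_{jk} \frac{\partial}{\partial \sigma_{mn}}(\partial_\alpha \sigma_{mn})\Big](\det\sigma)^{1/2} \\
&= -(\sigma^{i\alpha})^2\delta_{ji}\delta_{k\alpha}[(\Gamma^l_{ki}\delta_{ml}\delta_{nj} + \Gamma^l_{ij}\delta_{ml}\delta_{nk})\,\partial_\alpha \sigma_{mn} \\
&\qquad + (\partial_i \sigma_{jk})(\Gamma^l_{n\alpha}\delta_{ml}\delta_{nm} + \Gamma^l_{\alpha m}\delta_{ml}\delta_{nn})](\det\sigma)^{1/2} \\
&= -(\sigma^{jk})^2[(\Gamma^l_{kj}\delta_{ml}\delta_{nj} + \Gamma^l_{jj}\delta_{ml}\delta_{nk})\,\partial_k \sigma_{mn} + (\Gamma^l_{nk}\delta_{mn}\delta_{nm} + \Gamma^l_{km}\delta_{ml})\,\partial_j \sigma_{jk}](\det\sigma)^{1/2} \\
&= -(\sigma^{jk})^2[(\Gamma^m_{kj})\,\partial_k \sigma_{mj} + (\Gamma^m_{mk} + \Gamma^m_{km})\,\partial_j \sigma_{jk}](\det\sigma)^{1/2} \\
&= -(\sigma^{jk})^2\,\Gamma^m_{kj}\,\partial_k \sigma_{mj}\,(\det\sigma)^{1/2} \tag{8.14}
\end{align*}
by antisymmetry of $\Gamma$ in its lower indices.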
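Continuing:
\begin{align*}
-(\sigma^{jk})^2\,\Gamma^m_{kj}\,\partial_k \sigma_{mj}\,(\det\sigma)^{1/2} &= -(\sigma^{i\alpha})^2\,\Gamma^m_{\alpha i}(\Gamma^l_{i\alpha}\sigma_{lm} + \Gamma^l_{\alpha m}\sigma_{li})(\det\sigma)^{1/2} \\
&= -\sigma^{i\alpha}(\Gamma^m_{\alpha i}\Gamma^l_{i\alpha}\delta_{il}\delta_{\alpha m} + \Gamma^l_{\alpha m}\Gamma^m_{\alpha i}\delta_{il}\delta_{\alpha i})(\det\sigma)^{1/2} \\
&= -\sigma^{i\alpha}(\Gamma^m_{mi}\Gamma^l_{l\alpha} + \Gamma^\alpha_{\alpha m}\Gamma^m_{\alpha\alpha}\delta_{\alpha i})(\det\sigma)^{1/2} \\
&= -\sigma^{i\alpha}\,\Gamma^m_{mi}\,\Gamma^l_{l\alpha}\,(\det\sigma)^{1/2} \tag{8.15}
\end{align*}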
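Combining these computations together, we get that
\begin{align*}
\frac{\partial}{\partial \sigma_{jk}} \frac{\partial}{\partial \sigma_{mn}} \big(\langle \partial_\sigma \sigma_{jk}, \partial_\sigma \sigma_{mn} \rangle (\det\sigma)^{1/2}\big) &= \sigma^{i\alpha}(\Gamma^m_{mi}\Gamma^n_{n\alpha} - \Gamma^m_{in}\Gamma^n_{\alpha m})(\det\sigma)^{1/2} - \sigma^{i\alpha}\,\Gamma^m_{mi}\,\Gamma^l_{l\alpha}\,(\det\sigma)^{1/2} \\
&= -\sigma^{i\alpha}\,\Gamma^m_{in}\,\Gamma^n_{\alpha m}\,(\det\sigma)^{1/2} \\
&= R\,(\det\sigma)^{1/2} \tag{8.16}
\end{align*}
which completes the proof of Claim 3.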
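The determinant identity $\frac{\partial}{\partial \sigma_{jk}}(\det\sigma)^{1/2} = \frac{1}{2}\sigma^{jk}(\det\sigma)^{1/2}$ used in the proof above can be checked by finite differences. The Python sketch below does this for one symmetric positive definite $3 \times 3$ matrix, treating all nine entries as independent variables (so that the derivative with respect to $\sigma_{jk}$ picks out the $(k, j)$ entry of the inverse, which equals the $(j, k)$ entry when $\sigma$ is symmetric). The sample matrix, the step size and the helper names are assumptions made for illustration.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def inverse3(m):
    """Inverse of a 3x3 matrix via the transposed cofactor matrix."""
    d = det3(m)
    inv = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            rows = [r for r in range(3) if r != j]  # delete row j
            cols = [c for c in range(3) if c != i]  # delete column i
            minor = (m[rows[0]][cols[0]] * m[rows[1]][cols[1]]
                     - m[rows[0]][cols[1]] * m[rows[1]][cols[0]])
            inv[i][j] = (-1) ** (i + j) * minor / d
    return inv

def sqrt_det_derivative(m, j, k, h=1e-6):
    """Central difference of sqrt(det m) with respect to the entry m[j][k]."""
    plus = [row[:] for row in m]
    minus = [row[:] for row in m]
    plus[j][k] += h
    minus[j][k] -= h
    return (math.sqrt(det3(plus)) - math.sqrt(det3(minus))) / (2 * h)

# Assumed sample metric: symmetric and positive definite.
sigma = [[2.0, 0.3, 0.1],
         [0.3, 1.5, -0.2],
         [0.1, -0.2, 1.0]]
sigma_inv = inverse3(sigma)
root_det = math.sqrt(det3(sigma))

max_error = max(
    abs(sqrt_det_derivative(sigma, j, k) - 0.5 * sigma_inv[k][j] * root_det)
    for j in range(3) for k in range(3)
)
print(max_error)
```

The printed number is the largest discrepancy between the finite-difference derivative and $\frac{1}{2}\sigma^{jk}(\det\sigma)^{1/2}$ over all entries.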

Remark. Note that it is possible to avoid this painful calculation, if we observe that the metric expansion about a point $m \in M$ is
\[ \sigma_{ij}(m) = \delta_{ij} + R^{(1)}_{ijkl}(m)\,x_k x_l + R^{(2)}_{ijklts}(m)\,x_k x_l x_t x_s + O(x^6) \]
and
\[ \Delta_{\sigma(m)} \sqrt{\det(\sigma)} = \sigma_{ij}\sigma_{kl}(m)\,\partial_k \partial_l\big(\delta_{ij} + R^{(1)}_{ijkl}(m)\,x_k x_l + \dots\big)\big|_{x=0}\,(\det(\sigma))^{1/2} = R^{(1)}(\det(\sigma))^{1/2}. \]
Hence from claims 1 and 2 we have
\[ I(\sigma) = \int_M \Delta_{\sigma(m)}(\det(\sigma))^{1/2}\,dm = \int_M R^{(1)}(m)(\det(\sigma))^{1/2}\,dm, \]
which again completes the proof of Claim 3.

Remark. Note that this may be generalised. Suppose the $2n$th term of the metric expansion for $\sigma$ is $R^{(n)}_{i_1 j_1 \dots i_n j_n} x_{i_1} x_{j_1} \dots x_{i_n} x_{j_n}$. Then we can similarly show that $\Delta^n_\sigma (\det(\sigma))^{1/2} = R^{(n)}(\det(\sigma))^{1/2}$. We will in fact need this later when we start to look at almost sharp geometries.
8.1.6 Proof of Claim 4

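$I = \int_M R \det(\sigma)^{1/2}\,dm$ is the standard Einstein Hilbert action. Under our assumption that boundary terms are negligible, it is well known that the variation of $I$ is as claimed. So it remains to establish the expression for $J$.
Now,
\[ \delta J = \int_M \frac{\partial}{\partial \sigma^{kl}}\big(\sigma^{ij}\psi_i\psi_j(\det\sigma)^{1/2}\big) \frac{\partial \sigma^{kl}(s)}{\partial s}\,dp \]
for a variation $\sigma^{kl}(s)$ of $\sigma^{kl}$.
Hence
\[ \delta J = \int_M \Big(\bar\psi_i\bar\psi_j - \frac{1}{2}\sigma_{ij}\|\bar\psi\|^2\Big)(\det\sigma)^{1/2} \frac{\partial \sigma^{ij}(s)}{\partial s}\,dp + 2\int_M \int_A \sigma^{kl} \frac{\partial}{\partial \sigma^{ij}}(\bar\psi_k)\,\bar\psi_l\,\delta\,(\det\sigma)^{1/2} \frac{\partial \sigma^{ij}(s)}{\partial s}\,dp. \]
It remains to show that the last term is zero. Denote $\frac{\partial \sigma^{ij}(s)}{\partial s}\big|_{s=0}$ as $F$.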
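So, with the integral over the manifold $M$ understood,
\begin{align*}
2\sigma^{ij}\bar\psi_i \frac{\partial \bar\psi_j}{\partial \sigma^{kl}}(\det\sigma)^{1/2} F &= -2\frac{\partial}{\partial \sigma_{mn}}\Big(\sigma^{ij} \frac{\partial \bar\psi_j}{\partial \sigma^{kl}}\,\partial_i \sigma_{mn}\,(\det\sigma)^{1/2} F\Big) \\
&\propto \delta_{i\alpha}\delta_{j\beta}(-\sigma^{ij})^2 \frac{\partial \bar\psi_j}{\partial \sigma^{kl}}\,\partial_i \sigma_{\alpha\beta}\,(\det\sigma)^{1/2} F \\
&\quad + \frac{\partial^2 \bar\psi_j}{\partial \sigma_{\alpha\beta}\,\partial \sigma^{kl}}\,\partial_i \sigma_{\alpha\beta}\,(\det\sigma)^{1/2} F \sigma^{ij} \\
&\quad + \frac{\partial}{\partial \sigma_{\alpha\beta}}(\partial_i \sigma_{\alpha\beta})\,\sigma^{ij} \frac{\partial \bar\psi_j}{\partial \sigma^{kl}}(\det\sigma)^{1/2} F \\
&\quad + \frac{1}{2}(-\sigma^{\alpha\beta}(\det\sigma)^{1/2})\,\sigma^{ij}(\partial_i \sigma_{\alpha\beta}) \frac{\partial \bar\psi_j}{\partial \sigma^{kl}} F \\
&\quad + \frac{\partial F}{\partial \sigma_{\alpha\beta}}\,\sigma^{ij}\,\partial_i \sigma_{\alpha\beta} \frac{\partial \bar\psi_j}{\partial \sigma^{kl}}(\det\sigma)^{1/2}. \tag{8.17}
\end{align*}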
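We immediately notice that the second term vanishes since
\[ 0 = \mathrm{div}_\sigma \bar\psi = \frac{\partial \bar\psi_j}{\partial \sigma_{\alpha\beta}}\,\partial_i \sigma_{\alpha\beta}\,\sigma^{ij} = \sigma^{ij}\partial_i \bar\psi_j. \]
Also observe that since $\partial_i \sigma_{\alpha\beta} \frac{\partial F}{\partial \sigma_{\alpha\beta}} = \partial_i F$, we may integrate the last term by parts to obtain
\[ -F\,\partial_i\Big(\sigma^{ij} \frac{\partial \bar\psi_j}{\partial \sigma^{kl}}(\det\sigma)^{1/2}\Big) \]
(neglecting boundary terms, of course, since we are assuming that they are negligible anyway)
\begin{align*}
&= -F\Big(\sigma^{ij} \frac{\partial \bar\psi_j}{\partial \sigma^{kl}}\Big(-\frac{1}{2}\sigma^{\alpha\beta}\,\partial_i \sigma_{\alpha\beta}\,(\det\sigma)^{1/2}\Big)\Big) \qquad \Big(\text{since } \partial_i(\det\sigma)^{1/2} = \frac{-\sigma^{\alpha\beta}\partial_i \sigma_{\alpha\beta}}{2}(\det\sigma)^{1/2}\Big) \\
&\quad - F\,\partial_i \sigma^{ij} \frac{\partial \bar\psi_j}{\partial \sigma^{kl}}(\det\sigma)^{1/2} - F\,\sigma^{ij}\,\partial_i \frac{\partial \bar\psi_j}{\partial \sigma^{kl}}(\det\sigma)^{1/2} \tag{8.18}
\end{align*}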
Now it can be seen that the first and second terms here cancel with the fourth
and first terms in our previous expression (8.17) respectively.
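So our original expression reduces to
\begin{align*}
&\Big\{\frac{\partial}{\partial \sigma_{\alpha\beta}}(\partial_i \sigma_{\alpha\beta})\,\sigma^{ij} \frac{\partial \bar\psi_j}{\partial \sigma^{kl}}(\det\sigma)^{1/2} - \sigma^{ij}\,\partial_i \frac{\partial \bar\psi_j}{\partial \sigma^{kl}}(\det\sigma)^{1/2}\Big\} F \\
&= \Big\{\frac{\partial}{\partial \sigma_{\alpha\beta}}(\Gamma^l_{\beta i}\sigma_{l\alpha} + \Gamma^l_{i\alpha}\sigma_{l\beta}) \frac{\partial \bar\psi_j}{\partial \sigma^{kl}} - \partial_i \frac{\partial \bar\psi_j}{\partial \sigma^{kl}}\Big\}\sigma^{ij}(\det\sigma)^{1/2} F \\
&= \Big\{(\Gamma^l_{\beta i}\delta_{\alpha l}\delta_{\beta\alpha} + \Gamma^l_{i\alpha}\delta_{\alpha l}\delta_{\beta\beta}) \frac{\partial \bar\psi_j}{\partial \sigma^{kl}}\sigma^{ij} - \frac{\partial}{\partial \sigma^{kl}}(\sigma^{ij}\partial_i \bar\psi_j) + \frac{\partial \sigma^{ij}}{\partial \sigma^{kl}}\,\partial_i \bar\psi_j\Big\}(\det\sigma)^{1/2} F \\
&= 0, \tag{8.19}
\end{align*}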
since the first term is zero by the antisymmetry of Γ in its lower indices, and
the remaining two terms vanish since divσ ψ̄ = 0.
But this was precisely what we needed to show, completing the proof of Claim
4.
8.1.7 Further Work

To deal with the zeroth order case properly I still need to look at a couple of things:
(i) Suppose $M$ has boundary. What is the impact of the boundary on the resulting equations, i.e. what boundary terms does one get?
(ii) What happens if $B \neq 0$? Moreover, can we determine from the resulting equations how $B$ is related to classical electromagnetic effects?
Note that if our conjecture from before, our ”holographic principle” is true, (i)
is an irrelevant question, i.e. it was completely general for us to consider this vari-
ational problem with Dirichlet boundary conditions in order to derive the relevant
equations.
Point (ii) will be addressed in a later section.
8.1.8 Addendum - simplification of the equations of geometrodynamics
Thinking a little more about these ideas for deriving physics from sharp manifolds
leads one to realise that they lack full generality. Namely, one is unable to use all
natural degrees of freedom. In fact, it would appear that we have been led astray by
the misleading coincidence that the number of degrees of freedom for this simpler
approach is sufficient to describe physics in a four geometry, and also that it is closer
intuitively to standard notions of physics, eg 4-vectors and so on.
However, the crucial objects of study are not vectors, they are (1, 1) tensors.
So this suggests we should drop the assumption of symmetry in our choice of bi-
linear form for describing geometrical dynamics. Then one can split the form into
its symmetric and antisymmetric components. The symmetric component would
correspond to the channel information - and would give rise to the action for the
curvature. The antisymmetric component corresponds to the bound information.
This is clearly the correct picture to bear in mind since it is well known that when
one converts the equations of electrodynamics into tensor form one obtains an antisymmetric (1, 1) tensor, called the field strength tensor $F^{\alpha\beta}$ (Jackson, "Classical Electrodynamics", page 556 [30]).
For the former the tools of Riemannian geometry are valid. However, for the
latter, which I might as well call "Cartan geometry", since all that is occurring is a
minor change of axiom, all the theory for Riemannian geometry goes through - so for
instance we have the analogue of a unique antisymmetric Levi-Civita connection -
and we get a natural curvature tensor, with a corresponding scalar curvature induced
for the bound information.
But the Levi-Civita theorem also applies if we have a metric with both symmetric
and antisymmetric components, so as to produce a combined connection subject not
to assumption of symmetry or antisymmetry. This in turn induces a natural notion
of curvature for our space.
So instead of considering $\mathrm{grad}_\sigma f = \psi - \mathrm{curl}_\sigma B$ as we did before in the sharp
case for a symmetric $\sigma$, absorb $\psi$ into the dynamics of $\sigma$ and rewrite the above as
$\mathrm{grad}_{\hat\sigma}\,\delta(\hat\sigma) = -\mathrm{curl}_{\hat\sigma} B$. Since the right hand side lacks generality (as I have pointed
out), replace it naturally with $-\mathrm{grad}_\tau\,\delta(\tau)$. I claim this is a valid choice for any
antisymmetric $\tau$. Then we have the rather simple expression $\mathrm{grad}_{\hat\sigma+\tau}\,\delta(\hat\sigma)\delta(\tau) = 0$,
or, if we use the fact that $\hat\sigma$ and $\tau$ are arbitrary,
$$\mathrm{grad}_\sigma\,\delta(\sigma) = 0 \quad \text{for any arbitrary } \sigma : M \to GL(n).$$
This leads naturally of course, after mirroring the above arguments and
calculations, to a generalised Einstein-Hilbert action
$$\int_M R_\sigma \, dm = 0$$
which we would equate to zero to determine physical solutions (criticality of the
information). Here of course I stress $\sigma : M \to GL(n)$ may not be symmetric; if it
is, of course, the solutions are the standard Ricci flat or Einstein metrics. Certainly
there is a fair bit to check here, but, as you can see, the advantage of this extra work
is a considerable simplification of the theory, particularly in its later incarnations
and generalisations.
8.2 Physics for almost sharp manifolds

8.2.1 Introduction
In this section I will attempt to develop, in an analogous manner to before, the
equations describing the behaviour of the curvature and mass distributions within
a physical manifold for a slightly more sophisticated choice of signal function. One
could think of this as the next term in a perturbative expansion about an ideal
metric, so whereas before I was looking at the "zeroth order case", I am now looking
at the "first order case". In particular, our physical information $K$ will now depend
on two parameters: the metric parameter $\sigma$ and the expansion parameter $\epsilon$. Hence
it will turn out that in order to optimise $K$ we will have to solve two coupled partial
differential equations for these parameters.
Throughout this section I will assume that B = 0.
First I will demonstrate that a good choice of signal function is
$$\left(1 + \epsilon\Delta_\sigma + \frac{\epsilon^2\Delta_\sigma^2}{2!}\right)\delta(\sigma(m) - a)$$
where $\Delta_\sigma = \sigma^{kl}\nabla_k\partial_l$.
This signal function is motivated by considering the very reasonable signal function
$$f(m, a) = \left(\frac{1}{k\sqrt{\pi}}\right)^{\dim(M)} \exp\left(-|(\sigma^{ij}(m) - a^{ij})m_i m_j|/k^2\right)$$
that I constructed previously.

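For $k \ll 1$ this can be expanded in terms of $\delta$-functions to get the expression
which I provided above up to quadratic order in $k$. I shall now provide a proof of
this claim, working, as usual, with the one dimensional case.
Let $f : \mathbb{R} \to \mathbb{R}$ be an arbitrary smooth function. Then
$$\begin{aligned}
\int \frac{1}{k\sqrt{\pi}}\, e^{-(x/k)^2} f(x)\,dx &= \frac{1}{\sqrt{\pi}} \int e^{-v^2} f(kv)\,dv \\
&= \frac{1}{\sqrt{\pi}} \int e^{-v^2}\left(f(0) + kv f'(0) + \frac{k^2}{2} v^2 f''(0)\right)dv + \text{terms of order } k^3 \\
&= f(0) + 0 - \frac{k^2}{2\sqrt{\pi}} \left.\left(\frac{d}{dt}\sqrt{\frac{\pi}{t}}\right)\right|_{t=1} f''(0) + \Theta(k^3) \\
&= f(0) + \left(\frac{k}{2}\right)^2 f''(0) + \Theta(k^3) \\
&= \int \left(\delta(x) + \left(\frac{k}{2}\right)^2 \delta''(x)\right) f(x)\,dx + \Theta(k^3)
\end{aligned} \tag{8.20}$$
Here the third line uses $\int v^2 e^{-tv^2}\,dv = -\frac{d}{dt}\sqrt{\pi/t}$, which equals $\sqrt{\pi}/2$ at $t = 1$.
But $f$ was arbitrary, so $\frac{1}{k\sqrt{\pi}}\, e^{-(x/k)^2} \sim \delta(x) + (k/2)^2\,\delta''(x)$.
This naturally extends to the general case as
$$\left(\frac{1}{k\sqrt{\pi}}\right)^{\dim(M)} \exp\left(-|(\sigma^{ij}(m) - a^{ij})m_i m_j|/k^2\right) \sim \left(1 + \left(\frac{k}{2}\right)^2 \Delta_\sigma\right)\delta(\sigma(m) - a)$$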
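The one dimensional expansion can be checked numerically. The following is a minimal sketch, not part of the original argument: it assumes the test function $f = \cos$ (so $f(0) = 1$ and $f''(0) = -1$) and the illustrative value $k = 0.1$, and the helper names `gaussian_average` and `delta_expansion` are hypothetical. For this choice the Gaussian average is known in closed form, $e^{-k^2/4}$, so the two sides should agree up to higher order terms in $k$.

```python
import math

def gaussian_average(f, k, half_width=12.0, steps=20001):
    """Trapezoid estimate of (1/(k*sqrt(pi))) * integral of exp(-(x/k)^2) * f(x) dx."""
    a = -half_width * k                      # integrate over [-12k, 12k]; the tails are negligible
    h = 2.0 * half_width * k / (steps - 1)   # grid spacing
    total = 0.0
    for i in range(steps):
        x = a + i * h
        weight = 0.5 if i in (0, steps - 1) else 1.0
        total += weight * math.exp(-(x / k) ** 2) * f(x)
    return total * h / (k * math.sqrt(math.pi))

def delta_expansion(f0, f2, k):
    """Right-hand side of the expansion: f(0) + (k/2)^2 * f''(0)."""
    return f0 + (k / 2.0) ** 2 * f2

k = 0.1                                      # illustrative small parameter (an assumption)
lhs = gaussian_average(math.cos, k)          # Gaussian-smeared value of cos
rhs = delta_expansion(1.0, -1.0, k)          # cos(0) = 1, cos''(0) = -1
print(lhs, rhs, abs(lhs - rhs))
```

The printed difference should be far smaller than $k^2$, which is what the truncation at quadratic order predicts.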
Now $k$ is a dimensionless parameter, albeit very small, so we may define $\epsilon = (k/2)^2$
as a new dimensionless parameter.
Note from my concluding remarks from the previous section that we may make
the simplifying assumption of working with a Riemann-Cartan metric, in which case
we do not need to worry about torsion or a mass distribution, since they are absorbed
into the metric tensor.
A few further preliminary remarks. Let $k : \sigma \mapsto \Delta_\sigma$ be the Laplacian map. Then
observe that $\Delta_\sigma \circ \epsilon = k(\sigma \times k^{-1} \circ \epsilon)$, where we are using the property $\Delta_{\sigma\times\tau} = \Delta_\sigma\Delta_\tau$
of $k$. Then we get that
$$\int_M \Delta_{(\sigma;\epsilon)}\,(\det(\sigma;\epsilon))^{1/2} = \int_M \Delta_\sigma\,(\det\sigma)^{1/2}$$
for small $\epsilon$, with error $O(\epsilon^2)$. So in what follows it makes sense to consider the
deformed metric $\bar\sigma := \sigma \times k^{-1} \circ \epsilon$ instead of $\sigma$. In order to make sense of the argument
to follow, however, for transparency write $\bar\sigma := \hat\epsilon\,\hat{\bar\sigma}$, where $\hat\epsilon = \sup_{x\in M} \|k^{-1} \circ \epsilon\|_\sigma$
is the maximal variation and hence constant.
Finally, for the argument to follow, map $\hat\epsilon \mapsto \epsilon$ and $\hat{\bar\sigma} \mapsto \sigma$. The point of doing
all of this is essentially so we can treat our perturbative parameter as constant and
concentrate wholly on the metric. We will unravel afterwards according to this key.
8.2.2 Result outline

Now that these fundamentals have been covered we are ready for the main results.
Claim 1: Under the assumption that boundary terms are negligible, for the
above choice of signal function the total information can be written as
$$4K = \int_M (\Delta_\sigma + \epsilon\Delta_\sigma^2 + \epsilon^2\Delta_\sigma^3)(\det(\sigma))^{1/2}\,dm.$$
Note that the condition that boundary terms have negligible contribution seems
eminently reasonable in light of our assumptions if the boundary is at infinity, since
we assumed a finite mass distribution $E < \infty$ (the equations trivialise to the previous
case if $E \to \infty$ (admittedly with boundary terms)).
Claim 2: The total information $K$ can be rewritten as
$$4K = \int_M (R + \epsilon R^{(2)} + \epsilon^2 R^{(3)})\,dm$$
where $R^{(2)}$ is the fourth order geometric invariant defined by the contraction
of $\mathrm{Curv}\,\mathrm{Riem}$ three times, where $\mathrm{Curv}_{ij} = \nabla_i\nabla_j - \nabla_j\nabla_i - \nabla_{[i,j]}$ is the curvature
operator. Similarly $R^{(3)}$ is the sixth order geometric invariant. More generally,
we define $R^{(k)}$ to be the $2k$th order geometric invariant defined by the appropriate
number of contractions of $\mathrm{Curv}^{k-1}\,\mathrm{Riem}$.
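Corollary: Untangling the above via the key, we obtain that the information is
$$4K(\sigma, \epsilon) = \int_M \left(R_{(\sigma;\hat\epsilon)}\,\hat\epsilon + R^{(2)}_{(\sigma;\hat\epsilon)}\,\hat\epsilon^2 + R^{(3)}_{(\sigma;\hat\epsilon)}\,\hat\epsilon^3 + \dots\right)(\det(\sigma;\hat\epsilon))^{1/2}\,dm$$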
Remarks.
(1) Requiring now that $\delta K = 0$ allows us to solve simultaneously not only for the
optimal metric $\sigma$, but for the optimal expansion parameter $\epsilon$. In particular
this tells us that $\epsilon$ may vary wildly depending on the nature of our solution.
Furthermore, the requirement that two equations must hold - $\frac{\partial K}{\partial\sigma^{-1}} = 0$ and
$\frac{\partial K}{\partial\epsilon} = 0$ - serves as an obstruction to the space of solutions one might expect
given one or the other to hold by itself (see the later section on classification).
(2) We have neglected the contribution to the information from boundary terms,
which we assumed to be negligible. In fact, due to the nature of the model we
are using, this is a necessity. However it is clear that there is physics that does
not fall under this general umbrella. This will be taken up again in a later
chapter, when I talk about statistical stacks and condensed matter physics.
8.2.3 Proof of Claims 1 and 2

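By definition,
$$4I = \int_M\int_A \frac{1}{(1 + \epsilon\Delta_\sigma + \dots)\delta}\,\left\|\left(1 + \epsilon\Delta_\sigma + \frac{\epsilon^2\Delta^2}{2}\right)\partial(1 + \epsilon\Delta_\sigma + \dots)\delta\right\|^2$$
Expanding this expression and throwing away terms of order $\epsilon^3$ or higher, we
are left with
$$4I = \int_M\int_A \left(\delta^{-1}\|\partial_\sigma\delta\|^2 + \epsilon\left\{-\frac{\Delta_\sigma\delta}{\delta}\,\delta^{-1}\|\partial_\sigma\delta\|^2 + 2\delta^{-1}\Delta\|\partial\delta\|^2\right\}\right) + \epsilon^2\left\{\Delta^2(\delta^{-1}\|\partial_\sigma\delta\|^2) + 2\delta^{-1}\Delta^2\|\partial\delta\|^2 - 2\Delta_\sigma(\delta^{-1}\Delta\|\partial\delta\|^2)\right\} + B(\epsilon)$$
where $B(\epsilon)$ are boundary terms depending on derivatives of $\epsilon$, which can
therefore be neglected. We are also using the identity
$$\frac{1}{1 + \epsilon f + \epsilon^2 f^2/2} = 1 - \epsilon f + \epsilon^2 f^2/2 + \dots$$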
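The reciprocal identity used in the expansion can be sanity-checked numerically. This is only an illustrative sketch, treating $f$ as an ordinary real number rather than an operator (an assumption made purely for the check); the function names and the sample values of $f$ and $\epsilon$ are hypothetical.

```python
def exact_reciprocal(eps, f):
    """Left-hand side: 1 / (1 + eps*f + (eps*f)^2 / 2)."""
    return 1.0 / (1.0 + eps * f + (eps * f) ** 2 / 2.0)

def truncated_series(eps, f):
    """Right-hand side truncated at second order: 1 - eps*f + (eps*f)^2 / 2."""
    return 1.0 - eps * f + (eps * f) ** 2 / 2.0

f = 1.3                                   # arbitrary sample value (an assumption)
eps_values = (0.04, 0.02, 0.01)
residuals = [abs(exact_reciprocal(e, f) - truncated_series(e, f)) for e in eps_values]

for e, r in zip(eps_values, residuals):
    # each residual should be well below (eps*f)^3, i.e. beyond the retained order
    print(e, r, (e * f) ** 3)
```

If the truncation is valid to second order, halving $\epsilon$ should shrink the residual by at least a factor of eight.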
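Once more throwing away terms of order $\epsilon^3$ and above, we rewrite the above
expression as
$$4I = \int_M\int_A \delta^{-1}\|\partial_\sigma\delta\|^2 + \epsilon\int_M\int_A \left\{(\sigma^{\alpha\beta}\partial_\alpha\sigma_{jk}\,\partial_\beta\sigma_{mn}(\det\sigma)^{1/2})\frac{\partial^2}{\partial\sigma_{jk}\partial\sigma_{mn}}\Delta_\sigma\delta\right\} + \epsilon^2\int_M\int_A \Delta^3(\det\sigma)^{1/2}\,\delta$$
This proves the first claim.

But $\Delta^n(\det\sigma)^{1/2} = R^{(n)}(\det\sigma)^{1/2}$, so this is nothing other than
$$\int_M\int_A (R^{(1)} + \epsilon R^{(2)} + \epsilon^2 R^{(3)} + \dots)\,\delta$$
which proves the second claim.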
8.3 Recovery of standard results

Here I will demonstrate how standard results in classical physics follow from the
equations developed above. In particular, I will show that Maxwell’s equations arise
as a special case for the physics of sharp manifolds, and I will show how the
Klein-Gordon equation follows as a consequence of the physics of almost sharp manifolds.
First, write the stress energy tensor in terms of the metric as $T(\sigma) = \mathrm{curl}_\sigma\,\sigma =
\mathrm{curl}_{\sigma_s}\sigma_s + \mathrm{curl}_{\sigma_{as}}\sigma_{as}$, where $\sigma_s$ is the symmetric part of the Riemann-Cartan metric
$\sigma$ and $\sigma_{as}$ is the antisymmetric part.
Then $\mathrm{div}_\sigma T(\sigma)$ is certainly zero, as required.
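We can also rewrite the action
$$K = \int_M\int_A \|\partial\delta(\sigma)\|^2/\delta(\sigma)$$
as
$$K = I(\sigma) - J(\sigma)$$
where
$$I(\sigma) = K(\sigma_s)$$
and
$$J(\sigma) = \int_M\int_A \|T(\sigma_s)\|^2\,\delta.$$
In particular there exists a vector field $\phi$ such that $\|T\|^2 = \|\phi\|^2$ over $M$.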

Claim 1. $\partial f = \phi$. So if we define a new vector field $\psi$, with $\mathrm{div}\,\psi = 0$, then
$\phi = \psi - \mathrm{curl}_\sigma B$, for some vector field $B$.
Claim 2.
$$J = \int_M \|\bar\psi - \mathrm{curl}_\sigma\bar B\|^2,$$
where $\psi = \bar\psi\,\delta$ and $\mathrm{curl}_\sigma\bar B\,\delta = B$.

Claim 3.
$$J = \int_M (\|\bar\psi\|^2 - \|\mathrm{curl}_\sigma\bar B\|^2)$$
So since we already understand the variation of the first term above from before,
we need only focus on the second term.
Claim 4. We have:
$$\int_M \|\mathrm{curl}_\sigma\bar B\|^2 = \int_M (\|\nabla\bar B\|^2 + (\mathrm{div}_\sigma\bar B)^2)$$
provided that we are assuming Dirichlet boundary conditions.

Claim 5. There exists $\hat f$ such that
$$\|\nabla\hat f\|^2 = \|\nabla\bar B\|^2 + (\mathrm{div}_\sigma\bar B)^2$$
Claim 6. The physical information is
$$K(\sigma) = \int_M (R - \|\bar\psi\|^2 + \|\partial\hat f\|^2)$$
Remark. If $\bar\psi = 0$, then the physical information for a sharp manifold corresponds
to the Perelman functional after a simple change of coordinates.
Claim 7. Assuming Dirichlet boundary conditions, the constitutive equation for
a sharp manifold with nonzero electromagnetic term is then
$$R_{ij} = \Gamma_{ij} \tag{8.21}$$

Remark. In the case that $\bar\psi = 0$, the first equation is very closely related to
the equation for a gradient soliton. In particular there is a close relation between
the information functional for a sharp manifold and the Perelman functional for the
Ricci flow.
The first claim is self-evident; it follows from the definition of signal function. I
prove Claims 2 through to 7 for the sharp case. The perturbative case is analogous.

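Write $B = \hat B\delta$. Then we know that
$$\mathrm{div}_\sigma(\hat B\delta) = 0$$
since $B$ is the curl of some fuzzy vector field on $\Lambda$.

So
$$\begin{aligned}
0 &= (\mathrm{div}\,\hat B)\,\delta + \langle\hat B, \partial_\sigma\delta\rangle(\det\sigma)^{1/2} \\
&= \left\{(\det\sigma)^{1/2}\,\mathrm{div}\,\hat B - (\sigma^{i\alpha}\hat B_i\,\partial_\alpha\sigma_{jk}(\det\sigma)^{1/2})\frac{\partial}{\partial\sigma_{jk}}\right\}\delta \\
&= \left\{(\det\sigma)^{1/2}\,\mathrm{div}\,\hat B - \hat B_i(\sigma^{i\alpha}\partial_\alpha\sigma_{jk})(\det\sigma)^{1/2}\frac{\partial}{\partial\sigma_{jk}}\right\}\delta \\
&= \left\{\mathrm{div}\,\hat B - \hat B_i\left(\sigma^{i\alpha}(\Gamma^l_{k\alpha}\sigma_{lj} + \Gamma^l_{\alpha j}\sigma_{lk})\right)\frac{\partial}{\partial\sigma_{jk}}\right\}(\det\sigma)^{1/2}\,\delta \\
&= \mathrm{div}\,\hat B\,(\det\sigma)^{1/2}\,\delta
\end{aligned} \tag{8.22}$$
Hence
$$\mathrm{div}_\sigma\hat B = 0$$
So $\hat B$ is the curl of some vector field, say $\bar B$.

This concludes the proof of Claim 2.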

Now,
$$J = \int_M \|\psi - B\|^2/f = \int_M (\|\psi\|^2 + \|B\|^2 - 2\langle\psi, B\rangle)/f = \int_M (\|\psi\|^2 + \|B\|^2 - 2\|B\|^2)/f$$
So Claim 3 follows immediately from Claim 2.
8.3.3 Proof of Claims 4 to 7

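Proof. (Claim 4).
This part follows from integration by parts:
$$\int_M \|\mathrm{curl}_\sigma\bar B\|^2 = -\int_M \langle\bar B, \mathrm{curl}_\sigma\mathrm{curl}_\sigma\bar B\rangle$$
using the assumption of Dirichlet boundary conditions, followed by application
of the identity $\epsilon_{ijk}\epsilon_{klm} = \delta_{il}\delta_{jm} - \delta_{im}\delta_{jl}$.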
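The contraction identity for the Levi-Civita symbol invoked in this proof can be verified by brute force. The sketch below assumes the standard three dimensional Euclidean symbol with indices 0, 1, 2 (the curved, Lorentzian setting of the text would need the appropriate metric factors); the function names are hypothetical.

```python
from itertools import product

def levi_civita(i, j, k):
    """Levi-Civita symbol for indices in {0, 1, 2}: +1, -1 or 0."""
    return (i - j) * (j - k) * (k - i) // 2

def kronecker(a, b):
    """Kronecker delta."""
    return 1 if a == b else 0

def identity_holds():
    """Check sum_k eps_ijk * eps_klm == d_il * d_jm - d_im * d_jl for all free indices."""
    for i, j, l, m in product(range(3), repeat=4):
        lhs = sum(levi_civita(i, j, k) * levi_civita(k, l, m) for k in range(3))
        rhs = kronecker(i, l) * kronecker(j, m) - kronecker(i, m) * kronecker(j, l)
        if lhs != rhs:
            return False
    return True

print(identity_holds())
```

This is the identity that turns the double curl into a gradient of the divergence minus a Laplacian, which is the step used to pass from the integrated curl to the two terms of Claim 4.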
Proof. (Claim 5).

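Now first observe that $\|\nabla\bar B\|^2 + (\mathrm{div}_\sigma\bar B)^2$ is positive, since as $\bar\psi$ is timelike we
must have that $\mathrm{curl}_\sigma\bar B$ is timelike, which implies that $\bar B$ must be timelike. Hence
there is a vector $V$ such that $\|V\|^2$ is the above quantity.
Consider $\|\mathrm{curl}_\sigma V\|^2 = \|\nabla\mathrm{curl}_\sigma\bar B\|^2 + (\mathrm{div}_\sigma\mathrm{curl}_\sigma\bar B)^2$. This quantity is zero since
$\nabla\mathrm{curl}_\sigma$ and $\mathrm{div}_\sigma\mathrm{curl}_\sigma$ are zero operators. Hence $\mathrm{curl}_\sigma V = 0$, which implies $V = \nabla\hat f$
for some function $\hat f$.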
This completes the proof of claim 5.
Proof. (Claim 6). This claim is an easy consequence of the previous claims and the
results of previous sections.
Proof. (Claim 7). Follows by a simple first variation argument.
8.3.4 Classical electrodynamics

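Now, starting with the dynamical equation for a critical sharp manifold, assume $R^{(i)}$
is $O(\epsilon^i)$ and $\|\bar\psi\|^2$ is $O(\epsilon)$. Then $\partial_i\hat f\,\partial_j\hat f = 0$, in other words, $\hat f$ is constant. We interpret
$\hat f$ then to be the "frequency".
Also recall $0 = \|\partial\hat f\|^2 = \|\nabla\bar B\|^2 + (\mathrm{div}\,\bar B)^2$. Then $\mathrm{div}\,\bar B = 0$ and $\|\nabla\bar B\|^2 = 0$
since $\bar B$ is timelike. So if we interpret $\bar B$ as the four-potential, $\bar B = (A, \phi)$, we get
that
$$\nabla\cdot A - \frac{\partial\phi}{\partial t} = 0$$
using the Minkowski metric. This is nothing other than the Lorenz gauge condition, up to sign convention.
I also claim that $\|\nabla\bar B\|^2 = 0$ is equivalent to the statement $\Delta\bar B = 0$, since we
observe then that
$$\|\nabla\mathrm{curl}_\sigma\bar B\|^2 = \|\nabla^2\hat f\|^2 = \|\nabla^2\bar B\|^2$$
But the expression on the left hand side is zero by a basic operator identity and
hence since $\bar B$ is timelike the conclusion readily follows. So $\Delta\bar B = 0$, or
$$\begin{aligned}
\left(\nabla^2 - \frac{\partial^2}{\partial t^2}\right)A_x &= 0 \\
\left(\nabla^2 - \frac{\partial^2}{\partial t^2}\right)A_y &= 0 \\
\left(\nabla^2 - \frac{\partial^2}{\partial t^2}\right)A_z &= 0 \\
\left(\nabla^2 - \frac{\partial^2}{\partial t^2}\right)\phi &= 0
\end{aligned}$$
But these are the homogeneous electromagnetic field equations.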
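As a small illustration of the homogeneous wave equations above, one can check by finite differences that a plane wave with $\omega = |\mathbf{k}|$ is annihilated by $\nabla^2 - \partial_t^2$. This is a sketch under stated assumptions: flat Minkowski space, units with $c = 1$, and an arbitrarily chosen wave vector $(1, 2, 2)$; all names in the code are hypothetical.

```python
import math

WAVE_VECTOR = (1.0, 2.0, 2.0)   # illustrative choice (an assumption)
OMEGA = 3.0                     # |k| = sqrt(1 + 4 + 4) = 3 in units with c = 1

def plane_wave(t, x, y, z):
    """One scalar component of the potential: cos(k.x - omega*t)."""
    kx, ky, kz = WAVE_VECTOR
    return math.cos(kx * x + ky * y + kz * z - OMEGA * t)

def second_difference(func, point, axis, h=1e-3):
    """Central finite-difference estimate of the second derivative along one axis."""
    forward = list(point)
    backward = list(point)
    forward[axis] += h
    backward[axis] -= h
    return (func(*forward) - 2.0 * func(*point) + func(*backward)) / h ** 2

def wave_residual(func, point):
    """Estimate (Laplacian - d^2/dt^2) func at point = (t, x, y, z)."""
    laplacian = sum(second_difference(func, point, axis) for axis in (1, 2, 3))
    return laplacian - second_difference(func, point, 0)

residual = wave_residual(plane_wave, (0.3, 0.1, -0.4, 0.7))
print(residual)   # small, limited by the finite-difference step
```

A function that does not satisfy the dispersion relation, such as the static profile $\cos(x)$, leaves a residual of order one instead.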
8.3.5 (Classical) quantum mechanics

If we do a bit of fiddling with the results from the beginning, it is clear that the
action for the perturbative case should look something like
$$K(\sigma) = \int_M (1 + \hat\epsilon\Delta + \hat\epsilon^2\Delta^2 + \dots)(R - \|\bar\psi\|^2 + \|\partial\hat f\|^2)$$
Let $\partial\hat f = 0$. Interpret $\bar\psi$ as the wave function, as intended. Then a simple
variation argument gives the constitutive equations
$$(1 + \hat\epsilon\Delta + \hat\epsilon^2\Delta^2 + \dots)R = (1 + \hat\epsilon\Delta + \hat\epsilon^2\Delta^2 + \dots)\|\bar\psi\|^2$$
$$(\Delta + 2\hat\epsilon\Delta^2 + \dots)R = (\Delta + 2\hat\epsilon\Delta^2 + \dots)\|\bar\psi\|^2$$
Make the further simplifying assumption that curvature is negligible. Then to
first order we have
$$(1 + 2\hat\epsilon\Delta)\bar\psi = 0$$
and
$$\Delta(1 + 4\hat\epsilon\Delta + \dots)\bar\psi = 0$$
This second can be rewritten as
$$(1 + 4\hat\epsilon\Delta + \dots)\bar\psi = V$$
for some potential function $V$, from which we conclude after substitution the
Klein-Gordon equation for the wave function:
$$2\hat\epsilon\Delta\bar\psi(m) = V(m)$$
For flat Lorentz manifolds, in the non-relativistic limit, where $v/c \ll 1$, we may
rescale time in terms of $v/c$ to remove $\hat\epsilon$ and the equation degenerates to a parabolic
PDE, the Schrödinger equation:
$$(2\hat\epsilon\Delta - \partial_t)\bar\psi(x, t) = V(x, t)$$
8.4 Classification
8.4.1 Motivation
Now the classification of all n-manifolds for n ≥ 4 is well known to be impossible,
since the word problem on the fundamental group of such objects is known to be
in general insoluble. However, since we have some control of the full Riemann
curvature tensor for stable physical manifolds, and since, by their very nature, these
manifolds are specially constructed, we might expect some form of classification for
them. Hence we are led to the following
Conjecture: There is a classification of all stable, physical Lorentz n-manifolds
for each n for both the zeroth and first order cases treated previously. In other
words, there are a finite number of families of eigensolutions to these equations,
which may be described precisely.
These fundamental solutions may well correspond, at least in relation to the first
order equations, to well known “particles” which have been observed in experiment.
This leads us to the further
Conjecture: There is a one to one correspondence between eigenfamilies of stable,
physical Lorentz 4-manifolds with Dirichlet boundary conditions satisfying the full
blown first order equation with nonzero electromagnetic term, and the “particles” of
the Standard Model of particle physics.
Clearly in order to analyse the solutions of our PDEs we need to have a general
set of tools at our disposal to identify symmetries in the equations. So one might
expect that Lie Group theory would play a pivotal role.
Motivating Example: Consider the Dirac equation (with all constants set to one,
except for $h$ and $m$):
$$\sigma^{ij}\left(\frac{h}{m}\frac{\partial}{\partial x^j} + A_j\right)Q = iQ$$
where $\sigma$ is an almost flat Lorentz metric, $Q$ is the probability amplitude and $A$
is the four-current.
This equation has two families of eigensolutions, one corresponding to the elec-
tron of particle physics, the other to the positron. Z2 acts naturally on these solu-
tions as a symmetry of the Dirac equation, suggesting that we might expect more
generally for more complicated Lie Groups to allow us to classify the solution space
of more sophisticated PDEs.
8.4.2 Speculation
Certainly for small particles, we would expect a natural splitting in local charts of the
Lorentz metric into approximately a Riemannian metric for the space coordinates,
and a Riemannian metric for the time coordinate. To be more precise, we expect
there to be a coordinate system such that the off-block entries in the expression for
the metric in the chart are of order $\epsilon^2$ and hence can be neglected. Then, in other
words, fundamental solutions would be classified by
{Prime 3-manifolds} × {Prime 1-manifolds}
Then, we may observe that from the geometrisation conjecture it is now well
known that there are only eight possible choices for the prime components of a three
manifold. One-manifolds have, of course, a trivial classification: they can be either
$S^1$ or $\mathbb{R}$.
To get fermions, we need to write the PDEs $L_1 R^{(1)} := R^{(1)} + \epsilon R^{(2)} = 0$, $L_2 R^{(2)} :=
R^{(2)} + 2\epsilon R^{(3)} = 0$ as $K_i R^{(i)} = 0$ where "$\bar K_i K_i = L_i$" or, more precisely, $\langle K_i, K_i\rangle_{\hat\sigma} =
L_i$ where $\hat\sigma$ is the complexification of $\sigma$ (the complex splitting of our original PDE).
To be more precise, we should write $R^{(i)}(\sigma) = \|R^{(i)}(\hat\sigma)\|^2_{\hat\sigma}$ for the corresponding
complexifications of $R^{(i)}$. Then we would say we get fermions if $K_i(R^{(i)}) = 0$.
Anyway, we then get eight fundamental solutions, according to our classification
above, times either a circle or a line. Solutions of the form $X \times S^1$ correspond to
virtual fermions; solutions of the form $X \times \mathbb{R}$ we expect to be real fermions. This fits
in nicely with the standard model, which lists off eight different types of fermions
per “generation”.
But there is also the other complex factorisation of our original PDE, where
$L_i = \|\bar K_i\|^2_{\hat\sigma}$, $\bar K_i$ being the complex conjugate to the operator $K_i$. Then we get
corresponding solutions to the equation $\bar K_i\Gamma = 0$, where $\Gamma = \mathrm{Ric} - \phi\otimes\phi$. This leads
us to introduce an additional quantum number, which we shall call spin. Fermions
with spin up are solutions to the equation $K_i\Gamma = 0$; fermions with spin down are
solutions to the equation $\bar K_i\Gamma = 0$.
What about bosons? This is a slightly tricky thing to get precisely right, but
we can roughly say that we would expect bosons to be the pairing of a spin up
fermion with a spin down that otherwise have very similar quantum numbers, via
the factorisation of the equation we wrote above. Virtual bosons will correspond to
gluons, of course, of which there will be eight - and the real bosons will correspond
to the set {photon, W, Z, Higgs}. There are of course particle and antiparticle
equivalents of these last four (under the symmetry of charge conjugation, $\lambda \mapsto \lambda^T$,
where $\lambda$ is the antisymmetric component of $\sigma$).
Now, we do not have nearly enough information to talk about the different
"generations" of the standard model fermions, but within the scope of this model we
might guess that they correspond to the ground and excited states of each of the eight
fundamental families of eigensolutions to the PDEs $K_i R^{(i)} = 0$ (up to spin). It does
not seem immediately obvious as to why there should be only three eigensolutions
per family, since the PDE suggests there should be an infinite quantity.
One can make handwavy arguments saying that the perturbation expansion is
no longer valid beyond three but I suspect that the deeper reason is that it is the
particle physics version of the Lorentz problem. To recall, the Lorentz problem is the
mystery of why Lorentzian geometry should be the preferred geometry of low energy
physics. The reason that this is suggestive is because one can pair the triple of the
three generations of a type of fermion (spacelike directions) with the corresponding
force carrier, or boson (timelike direction). I do attempt to sketch a solution to the
Lorentz problem in the next chapter - perhaps the proof can be adapted to shed
some light upon this particular mystery too.
As a final throwaway remark, we might naively guess that maybe fermions are
somehow more fundamental than bosons (although this is in direct contradiction
to what I have just said); that bosons can be decomposed into fermion "conjugate
pairs". Maybe there is a connection here to twistor theory?
In fact, what we are doing is performing a kind of matched asymptotic expansion
with gluing - we are taking a complete model of the particle ($S^3 \times \mathbb{R}$, for instance),
and then gluing it to a very small region in $\mathbb{R}^4_1$ which of course is modelled after a
space in which there is no matter (see diagram). So in this instance, these particles I
am describing are a perturbation on an otherwise boring landscape (there are more
metrics than simply that for $\mathbb{R}^4_1$ allowable on the outer solution, however; in fact,
once again, we have eight alternatives).
I must stress that this whole approximation is only valid when
(a) Our particles are small, i.e. we are dealing with the lower eigenvalues in this
model of the governing equation, and
(b) $\epsilon$ is very small.
So evidently if we are finding solutions of our equations with values of $\epsilon$ which
are not $\ll 1$, then these solutions are unphysical, because they contradict the
assumptions made to get to them.
Later on in the next chapter I will develop tools which might be usable to
overcome these limitations, and create even better models for particle physics.
However assumption (a) still stands by necessity.
8.4.3 Extensions
Remark. Here are some thoughts that I had in 2007 which serve to motivate the
following chapter. They should not be taken too seriously, but rather as an informal
discussion from which one can launch oneself as a point of departure.

Various extensions that could be made might be to consider an "expansion
metric" $\tau$, which corresponds to the signal function $(1 + (\tau\cdot\nabla)\cdot(\tau\cdot\nabla))\,\delta(\sigma(m) - a)$,
in order to take into account a thickening or thinning of tubes in the space of metrics
approximating a sharp manifold. Other work that could be done might be to try to
examine more closely the link between the particular PDEs that crop up in these
problems, the existence of local but possibly not global solutions to them, and the
connection to chaos. A journey into chaos, with its connections to fractal geometry,
is worthy of another treatise in its own right. Various people such as Smale have
made great contributions to this area. Perhaps there is a connection to Morse Theory
here?
Furthermore, one might ask, why Lorentzian geometry? Why should reality
seem to be most plausibly described by a four-geometry with three spacelike and
one timelike dimensions? This leads naturally to the idea of considering a potential
geometry with possibly infinite dimensions, and performing the same sort of analysis
on this new geometry as I have done here. The potential for "fractional" dimensions
or a Cantor dust on various levels should be also allowable. Then I make the
following conjecture:
Conjecture: On such an infinite dimensional space, an optimal distribution of
information should have support almost everywhere on a finite dimensional submanifold
with three spacelike and one timelike dimensions. In other words, a minimal and
stable configuration for the information is for reality structures with these dimen-
sional characteristics. The remainder of the information for a stable (or almost
stable) system should inhabit a set of measure zero.
Why should this be true? Well, there is a natural ring structure on such a space,
as one can see from the observation that there are local charts into the quaternions
a + bi + cj + dk
where the real part is associated to the timelike dimension and the "imaginary"
part to the spacelike dimensions.
Of course this is all very artificial and may not really mean anything at all. Still
it suggests that quaternions might have a certain significance. This does slightly fly
in the face of intuition, however, because it is actually the complex numbers that
have the most structure and one might therefore expect them to be somewhat more
natural.
A more plausible way of seeing why the support of
information in the universe is almost entirely contained on a Lorentz 4-manifold is
to examine the first order perturbative PDE for the equations
of physics. For it to be a nontrivial generalisation of the nonperturbative case, in
other words, for there to be any "quantum mechanical effects", one requires that
the second degree curvature term, $R^{(2)}$, be nonzero. But this is only possible if there
are at least 4 dimensions in which to work, because otherwise this term will vanish.
So then one can again invoke the principle of maximal laziness, or Occam’s razor,
or your favourite equivalent principle, and say then, well, clearly, four dimensions
is the simplest structure in which to work at low perturbations, so that is how the
information in the universe will naturally be structured.
That does not mean that there will not be small (though usually negligible)
traces of information in other, "higher" dimensions.
Why would the index be one? It may be that the best way to organise a four
dimensional system is for it to have as much structure as possible. So we come
back to quaternions again, which are clearly equivalent to a certain class of Lorentz
manifolds.
The natural things to consider for such statistical manifolds are of course infi-
nite dimensional random matrices, or, alternately, the continuous geometries of von
Neumann could be adapted to help flesh out an appropriate model here to prove
this conjecture properly.
Interestingly, this conjecture may not be true in certain circumstances, which
could lead to some quite weird behaviour indeed.
Another research direction to consider is to look at "higher statistical moments",
that is, to build a statistical geometry on top of an existing statistical geometry. To
make myself a bit clearer, we have been considering, so far, a distribution on the
space of metrics, $A_1$, of a given differentiable manifold, $M$. It is then a simple process
of abstraction to consider different metrics over the space $A_1$, so we are considering
a distribution of inner products in a space $A_2$ over the space of inner products $A_1$ of
a manifold $M^n$. We could even associate a differential structure to $A_1$ by building
and gluing charts of appropriate dimension together from the underlying Euclidean
structure to make $A_1$ essentially a general $n(n-1)/2$ manifold, i.e. make it only
locally Euclidean for the purposes of construction of metrics over it.
In this way, one can potentially build an infinite ladder of spaces, $(M = A_0, A_1, A_2, \dots, A_i, \dots)$.
Relatively speculative though the thought is, I believe that, assuming all
distributions are almost sharp, the associated Banach or Hilbert spaces
corresponding to the "Quantum Mechanics" of such systems form the pattern $B_0$, $B_1 =$
dual of $B_0$, $B_2 =$ dual of $B_1$, etc., if we view much of modern analysis as the study
of the structure of almost sharp statistical geometries.
Perhaps the construction of higher statistical moments, together with ideas from
continuous geometry, consideration of more general signal functions and also taking
into account non-equilibrium dynamics, that is, dynamics that does not involve real-
ising the lower bound of the Cramer-Rao inequality (the last a very strong maybe),
would lead to interesting and significant physics beyond the standard model. As we
shall see in the material to follow, this is in a certain way indeed the case.
Chapter 9
Turbulent Geometry
Fractals are generally wily and unruly beasts. There are many unresolved questions
about the ones that are known, and even the simplest (such as the Mandelbrot set)
are notorious for their complexity and depth of structure. Is it useful to model the
atoms of reality to some degree as fractal structures? Certainly there are many
phenomena in nature that exhibit scale free behaviour, so it would appear that
the answer to this question might be yes. Certainly in cosmology, it is well known
that the large scale structure of galaxies and interstellar dust appears to exhibit
characteristic properties of being fractal, at least between certain distance scales -
a self-similarity across many orders of magnitude.
Turbulence is another mystery that defies explanation. From waves crashing

down onto a beach and foaming onward, to even fairly simple systems where water
is pushed through a pipe at high pressure which contains discrete jumps in tube
diameter, turbulence is ubiquitous, but is not very well understood. It is often
confused as a chaotic phenomenon, but it does not have to be. For instance, a
flow can converge to one with turbulent, stable eddies that have predictable and
reproducible properties. The flow over the surface of a dimpled golf ball is also
turbulent, but, if anything, it allows an experienced golfer more control over its
trajectory. Obviously, it would be marvelous to have some model that at least gave
us some method of understanding such phenomena, at least while they are stable!
In this chapter I attempt to develop a mathematics that may go towards helping

to understand these objects and processes in more detail.
9.1 Preliminaries
9.1.1 Fractional Calculus
Before I go any further, I will need to describe the fractional calculus.
A fairly natural question to ask is, is there a way, for any real number α > 0, to
take the αth derivative of a function, or the αth integral?
It turns out that there is such a way, however, whereas for integral α differenti-
ation is purely a local operation, in general boundary information is required.
For some motivation, define $(Jf)(x) = \int_0^x f(t)\,dt$, $(J^2 f)(x) = \int_0^x (Jf)(t)\,dt$, etc.
Then clearly
$$(J^n f)(x) = \frac{1}{(n-1)!}\int_0^x (x - t)^{n-1} f(t)\,dt$$
for natural numbers $n$, and this definition extends naturally to all positive real numbers $\alpha$:
$$(J^\alpha f)(x) = \frac{1}{\Gamma(\alpha)}\int_0^x (x - t)^{\alpha-1} f(t)\,dt$$
We then define the $\beta$th fractional derivative of a function $f$ as
$$(D^\beta f)(x) = ((D^n \circ J^{n-\beta})f)(x)$$
for any integer $n > \beta$.
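To make the definition of $J^\alpha$ concrete, here is a minimal numerical sketch. It assumes the test function $f(t) = t$, for which the standard closed form is $(J^\alpha f)(x) = x^{1+\alpha}/\Gamma(2+\alpha)$, and it uses the substitution $u = (x - t)^\alpha$ to remove the integrable singularity at $t = x$ before applying the midpoint rule. The function name `fractional_integral` and the step count are hypothetical choices made for illustration.

```python
import math

def fractional_integral(f, alpha, x, steps=2000):
    """Approximate (J^alpha f)(x) for alpha > 0 and x >= 0.

    With u = (x - t)^alpha the kernel (x - t)^(alpha - 1) dt becomes du / alpha, so
    (J^alpha f)(x) = 1/Gamma(alpha + 1) * integral_0^{x^alpha} f(x - u^(1/alpha)) du.
    """
    upper = x ** alpha
    h = upper / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h                      # midpoint of the i-th subinterval
        total += f(x - u ** (1.0 / alpha))
    return total * h / math.gamma(alpha + 1.0)

half_integral = fractional_integral(lambda t: t, 0.5, 1.0)
closed_form = 1.0 / math.gamma(2.5)            # x^(1+alpha) / Gamma(2+alpha) at x = 1, alpha = 1/2
print(half_integral, closed_form)
```

For $\alpha = 1$ the same routine reduces to the ordinary integral, for example `fractional_integral(lambda t: t, 1.0, 2.0)` approximates $\int_0^2 t\,dt$.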

These definitions of course extend naturally to multivariable calculus and cal-
culus on manifolds.
9.1.2 Basic tools and motivation

Here I shall define the necessary tools and the objects of my study henceforth.
The aim is to be able to model spaces of generalised fractal dimension, in a sense
which I will soon make more precise. The motivation was to attempt to fabricate a
machinery that would, once and for all, provide the means to validate rigorously our
previous educated guess as to why a stable physical geometry should be Lorentzian.
In other words, we should be able to say why physics as we usually understand it
should take place in Lorentzian space. Furthermore, I am motivated by the need to
remove myself of dimensional dependence, since if we can model our base space as
an infinite dimensional manifold with a certain specific fractal measure, we will have
that if we take a statistical manifold on top of our statistical manifold (for the base
space) instead of having a quadratic growth in the size of the dimensions, we will
go from ∞ to ∞. So, with such a theory, we should be able to treat such stacked
spaces in a more streamlined, natural, and elegant fashion.
Of course, it seems quite logical that once we have determined and fleshed out
the necessary concepts, it should then be possible to use them to determine new
physics, after hitting the relevant information with the Cramer-Rao inequality and
evaluating the critical points. This shall become the ultimate aim of this section.
Before I mention the main ideas here, I should talk first of fractal geometry.
By this I do not mean fractal geometry in the standard sense, where one deals
with Cantor dust like objects. Rather, I am interested in considering a variety of a
continuum of objects where the dimension at each point in the set takes a value on
the real line and varies from point to point. More generally, in turbulent geometry,
I am interested in extending the notion of the dimension of an infinitesimal piece
of a set to Rn , or, more precisely, to a Riemannian n-manifold N , that depends
somehow on the base space M .
The key idea is to observe that, if $\psi$ is some distribution over the real line, we can
measure the $r$th fractal dimension of $\psi$ by computing $\int_{\mathbb{R}}\int_{\mathbb{R}} \psi(x)\,\partial^r_{(a)}\delta(\sigma(x) - a)\,da\,dx$,
where $\sigma$ is an inner product on the tangent space of $\mathbb{R}$, and $\partial^r$ is the usual extension
of the derivative to the real line. This can easily be checked. We immediately
observe that there is a key connection to statistical geometry here. Pursuing this
analogy, we would like to be able to sensibly define $\partial^r$ where $r = (r_1, \dots, r_n)$ is the
local expression of an element of some manifold $N$.
First define $g$ to be a point in $\mathbb{R}^r$ if it is a map
$$g : (\dots((\mathbb{R}^{r_1} \to \mathbb{R}^{r_2}) \to \mathbb{R}^{r_3}) \to \dots) \to \mathbb{R}^{r_n}$$
Now, define $\partial^r(f)$ to be the function $g$ in $\mathbb{R}^r$ which maps $\partial^{r_1}f \mapsto \partial^{r_2}f \mapsto \dots \mapsto \partial^{r_n}f$.
Then we can make sense of the $r$-measure of a $\psi$ distribution over a manifold
$M$ in an analogous fashion:
$$\int_M\int_A \psi(m)\,\partial^r_{(a)}\delta(\sigma(m) - a)\,da\,dm$$
However certain issues remain unresolved, such as how to develop dynamics from
this language. That will be the focus of the next section.
9.1.3 Construction of the information

We seek to compute the information of a statistical manifold with a completely
general signal function, subject to almost sharp turbulence. This should serve as a
useful model for at least certain physical phenomena that lie beyond the scope of
the standard model.
First I will have to determine how to make sense of the information for this
problem. In fact, a two tier approach is necessary. Suppose now instead of having
one definite dimension of each point in our space we have a statistical distribution.
Furthermore, suppose we define the turbulent information as follows:
\[ I_{(\mathrm{turb})} = \int_M \int_A \rho_{(\vec g)}\, da\, dm \]
where $\rho_{(\vec g)}$ is the density of the Fisher information with respect to a vector of
signal functions $\vec g$. Then this is a natural interpretation of the fractal dimension,
given in terms of the standard language of statistical geometry from before, such
that the physical information is:
\[ I_{(\mathrm{phys})} = \int_M \int_A (\partial_f^* \rho_{(f)} ; \rho_{(\vec g)})\, da\, dm \]
where $\partial_f^*$ is the derivative with respect to the signal function $f$.

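I will need to go into some more detail about $\partial_f^*$. If we look at the motivational
example of the signal function $\delta(\sigma(m) - a)$, we would like to somehow have some
sensible notion compatible with the idea that $(\frac{\partial}{\partial a}\delta; \kappa)$, for some function $\kappa$, will give
an effective dimension of $\kappa(m, a)$ at $(m, a)$, but does not interfere with our previous
notion of statistical derivative over the domain $M \times A$ for plain statistical geometries.
In particular we are interested in a derivative with respect to the domain
$\{M \times A\} \to \mathbb{R}$, the space of signal functions. Consider
\[ \frac{\partial}{\partial\delta^{-1}}
   = \frac{\partial}{\partial\sigma^{-1}} \frac{\partial\sigma^{-1}}{\partial\delta^{-1}}
   = \frac{\partial}{\partial\sigma^{-1}} \frac{\partial\delta}{\partial\sigma}
   = \frac{\partial\delta}{\partial\sigma} \frac{\partial}{\partial\sigma^{-1}} \]
with the last equality following because
$\frac{\partial}{\partial\sigma} \frac{\partial}{\partial\sigma^{-1}} = \frac{\partial}{\partial \mathrm{Id}}$ is the zero operator.
If we now use this as a prototype for our idea of turbulent derivative, we wish
to look at
\[ (\tfrac{\partial}{\partial\delta}\delta; \kappa)^{1/2} = ((\tfrac{\partial}{\partial\sigma}\delta; \kappa)(\tfrac{\partial}{\partial\sigma^{-1}}\delta; \kappa))^{1/2} \]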
which we may notate shorthand as
\[ (\partial_\delta^* \delta; \kappa) := (\tfrac{\partial}{\partial\delta}\delta; \kappa)^{1/2} \]
This will provide the desired effect of providing dimension κ(m, a) at the location
(m, a). This notation also naturally extends to more general signal functions.
We need to check that this is well defined, of course: namely, that $I_{(\mathrm{phys})}$ is
invariant under alteration of $\rho_{(g)}$ by a boundary term. This may require a further
condition to be imposed, such as criticality or, more likely, some form of
conservation law.
Regardless, it now seems clear that the natural space to use as our "dimension
manifold" at $p \in M$ for the $a$-th inner product is $T_{(p,a)}M$.
A special case of the above, when $f$ and $g$ are both sharp, with $f(m, a) =
\delta(\sigma(m) - a)$ and $\vec g(m, a) = \delta(\vec\tau(m) - a)$, causes the information $I_{(\mathrm{phys})}$ to reduce to
the rather elegant expression
\[ I_{(\mathrm{phys})} = \int_M \partial^{R_{\vec\tau}} R_\sigma \]
and of course the turbulent information to be
\[ I_{(\mathrm{turb})} = \int_M R_{\vec\tau} \]
What still is not clear, however, is whether we should merely require that I(phys)
be critical - the most likely bet - or whether both informations should be critical.
The information that we are interested in studying is for almost sharp turbulence:
\[ I_{(\mathrm{turb})} = \int_M \Bigl(1 + \Delta_{\vec\tau} + \frac{\Delta_{\vec\tau}^2}{2!} + \ldots\Bigr) R_{\vec\tau} \]
and a completely general signal function for the base space. If we evaluate things
by brute force this leads to expressions which are bulky and difficult to deal with,
so I will not do so here. Instead it will turn out that with a bit more work and
development of appropriate notation these expressions can be reduced to something
more manageable.
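As a notational aside, and assuming the elided terms of the almost sharp series continue as $\Delta_{\vec\tau}^n/n!$ (an assumption; only the first three terms are written out above), the operator series is formally an exponential. Convergence is not addressed in this sketch.

```latex
\[
  \Bigl(1 + \Delta_{\vec\tau} + \frac{\Delta_{\vec\tau}^2}{2!} + \dots\Bigr) R_{\vec\tau}
  = \sum_{n=0}^{\infty} \frac{\Delta_{\vec\tau}^n}{n!}\, R_{\vec\tau}
  = e^{\Delta_{\vec\tau}} R_{\vec\tau}
\]
```

In this form the sharp case corresponds to keeping only the $n = 0$ term, which recovers the integrand $R_{\vec\tau}$ of the sharp turbulent information.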
Note that a completely general signal function is of the form
\[ f(m, a) = \int_A F(m, b)\,\delta(\sigma_b(m) - a)\,db, \]
and, without turbulence, the information would be $\int_M \int_A e^h \Delta_{\sigma_b}(h)\,db\,dm$, where
$F = e^h$.
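Two formal consequences of this form may help fix ideas. Both are sketches under stated assumptions: that the order of integration can be exchanged, that $\sigma_b(m)$ lies in $A$ for every $b$, and that $b_0 \in A$ is a fixed index (a symbol introduced here purely for illustration).

```latex
\[
  F(m,b) = \delta(b - b_0)
  \;\Longrightarrow\;
  f(m,a) = \int_A \delta(b - b_0)\,\delta(\sigma_b(m) - a)\,db
         = \delta(\sigma_{b_0}(m) - a),
\]
\[
  \int_A f(m,a)\,da
  = \int_A F(m,b)\Bigl(\int_A \delta(\sigma_b(m) - a)\,da\Bigr)\,db
  = \int_A F(m,b)\,db .
\]
```

The first line recovers a sharp signal function as a special case; the second shows that, under these assumptions, $f$ is normalised in $a$ exactly when $F$ is normalised in $b$.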
9.1.4 The Correspondence Principle

In the previous discussion, I was interested in signal functions of the form
\[ f(m, a) = \int_A F(m, b)\,\delta(\sigma_b(m) - a)\,db \]
with sharp, or almost sharp, turbulence defined on them, given by a signal
function of the form
\[ \vec g(m, a) = (1 + \Delta_{\vec\tau})\,\delta(\vec\tau(m) - a) \]
in order to perhaps gain an understanding of entanglement physics.

The base signal function is an example of the most general signal function one
can get over a statistical manifold without additional structure. However, there is
a key problem with looking at objects of this type, because it is not at all clear
what their connection to real world physics is; there are few obvious clues as to how
to connect the statistical dynamics predicted by these things with experiment, and
certainly no clear way to connect to practical application.
However, we might make the observation that sharp turbulent manifolds share
similar properties to general signal functions over statistical geometries. In partic-
ular there is integration twice over the space of inner products, and once over the
manifold itself. This suggests in turn the following question:
Question. Is there a correspondence between sharp turbulent manifolds and general
statistical geometries?
It turns out that the answer to this question is yes:
Theorem 9.1.1. (Correspondence principle). Let $f(m, a) = \delta(\sigma(m) - a)$ and
$g(m, a) = \delta(\vec\tau(m) - a)$ be sharp signal functions. Then there is a correspondence
$(f, g) \mapsto (F, \sigma)$ such that the signal function volume element associated to $(f, g)$,
$(f; g)\,dV = (\partial_f^* f; g)\,dV(f; g)$, corresponds to the signal function volume element
associated to $(F, \sigma)$,
\[ \bar f(m, a)\,dm = \int_A F(m, b)\,\delta(\sigma_b(m) - a)\,(\det(\sigma_b))^{1/2}\,db\,dm. \]
Proof. First of all,
\[ (f; g)(m, a, b) = (\partial_f^* f(m, a); g(m, b)) \]
Integrate by parts so that $(\partial_f^*; \delta(\tau(m) - b))$ is acting on the volume element.
By general nonsense we may then realise this as an eigenfunction $\rho(m, b)$ of a volume
element induced by a new family of metrics $\sigma_b$. Hence
\[ (f; g) = \int_A \rho(m, b)\,\delta(\sigma(m) - a)\,(\det(\sigma_b))^{1/2}\,db \]
By absorbing $\rho$ partially into $\delta$ we find a new function $F(m, b)$ such that
\[ (f; g) = \int_A F(m, b)\,\delta(\sigma_b(m) - a)\,(\det(\sigma_b))^{1/2}\,db \]
which completes the proof.

Remark. It is clear as a corollary to this result that entanglement physics may
well be described to a first approximation by second order turbulence. That is,
geometries with actions of the form
\[ I(\sigma, \tau, \phi) = \int_M (\partial_\sigma^* R(\sigma); \partial_\tau^* R(\tau); R(\phi))\,dm \]
would seem to be the key models to look at.

Remark. Another key point to mention is that we are not looking at turbulent
geometry in full generality here to realise the correspondence; in particular we are
looking only at fractal geometry. Our signal functions are both sharp. g is not a
vector; it is merely a scalar functional.
9.1.5 Further generalisations

It is clear that merely looking at a vector of signal functions is not enough for full
generality. For instance, we might be interested in transformations of this vector,
such as if it were only a representation from a coordinate chart of an enveloping
space $N$. In particular, we are interested in signal functions of the form
\[ h(m, n, a, b) = \partial^{g(m,n,b)} f(m, a), \]
where $g((m, n), b)$ is a signal function on $M \times N$, for instance of the form
$\delta(\tau(m, n) - b)$ if $g$ is sharp. Of course $M \times N$ induces a new topology on $M$, since $\tau$
induces a new metric for each $n \in N$,
$\bar\tau_n(m) := \tau(m, n_1) \to \tau(m, n_2) \to \ldots \to \tau(m, n_k)$.
But this is not entirely general enough; we would like to have a signal function
of the form
\[ h(m, a, b, c) = (f(m, a); g(m, b, c)) \]
where $g(m, b, c) = G(m, c)\,\delta(\sigma_c(m) - b)$. But by the correspondence principle
this is equivalent to
\[ (f(m, a); \alpha(m, b); \beta(m, c)) \]
where $\alpha$ and $\beta$ are sharp. So we can see that turbulent geometry is just a
special case of fractal fractal geometry, which we are interested in for its potential
applications to entanglement physics. By abuse of terminology I will also call the
latter second order turbulent geometry, and geometry with fractal measure first order
turbulent geometry.
9.2 Cramer-Rao for turbulent statistical manifolds

9.2.1 Introduction
What we must show before we can proceed, in order to justify the calculation in
our final section, is that the physical information satisfies some form of
generalised Cramer-Rao inequality for the class of objects of this section, which
I choose to call turbulent statistical manifolds.
In other words, we would like
\[ I_{(\mathrm{phys})} = I_{(f,\vec g)} := \int_M \int_A (\partial_f^* \rho_f; \rho_{\vec g})\,da\,dm \geq 0 \]
for all $f$ and $\vec g$, subject possibly to some sort of proviso - a "conservation relation" linking
the two. Here of course $\rho_k = \|\partial k\|^2 / k$ is the usual information density for a standard
statistical manifold.
I shall hazard a guess and suggest that the condition which we need is that
$I_{(\mathrm{turb})} = I_{\vec g} := \int_M \int_A \rho_{\vec g}\,da\,dm$ is critical; that is, it is zero and $\delta I_{\vec g} = 0$.
In fact, it turns out we do not need this additional condition; merely the
following is true:
Theorem 9.2.1. (Turbulent EPI Principle). Let $\Lambda = (M, A, f, \vec g)$ be a statistical
turbulent manifold. Then
\[ I_{(\mathrm{phys})} = \int_M \int_A (\partial_f^* \rho_f; \rho_{\vec g}) \geq 0 \]
where $\rho_f$, $\rho_{\vec g}$ are the Fisher information densities associated to the signal
functions $f$ and $\vec g$.
To prove this result I will need to play the same game as before and develop
an appropriate idea of weak and strong turbulent estimators. However, due to the
complexity of the issues involved, my treatment will be at best a sketch of what a
properly rigorous treatment would require.
9.2.2 Turbulent estimators and the Cramer-Rao inequality

Definition 40. Let $\Lambda = (M, A, f, \vec g)$ be a statistical turbulent manifold, where
$f$ is the signal function on the base and $\vec g$ is the signal function vector on the
turbulent superstructure. (In this case signal functions take the form $h(m, a, b) =
(\partial_f^* f(m, a); \vec g(m, b))$.) A turbulent estimator on this structure is a tuple $(u, \vec v)$ where
$u, v_i$ are maps from the sample space $M \times A$ to $M$. An unbiased turbulent estimator
is an estimator that satisfies the condition that
\[ E_m^{f \times \vec g}(\partial_f^* \theta \circ u; \vec\phi \circ \vec v) = (\partial_f^* \theta(m); \vec\phi(m)) \]
where $\partial_f^*$ is the turbulent statistical derivative with respect to $f$,
\[ E_m^{f \times \vec g}(\kappa) = \int_A \int_A (\partial_f^* f(m, a); \vec g(m, b))\,\kappa(m, a, b)\,db\,da, \]
and $\phi_i, \theta : M \to \mathbb{R}^n$ are coordinate charts on $M$.
Lemma 9.2.2. (Factorisation). $E_m^{(f \times \vec g)}(\partial_f^* \tau; \sigma) = \int_A (\partial_f^*(f\tau); (\vec g \sigma))$. Equivalently,
$(\partial_f^* f; \vec g)(\partial_f^* \tau; \sigma) = (\partial_f^*(f\tau); (\vec g \sigma))$.
Lemma 9.2.3. Unbiased turbulent estimators satisfy the following relation:
\[ E_m^{f \times \vec g}\{(\partial_f^* \theta_i \circ u; \vec\phi_k \circ \vec v)\,\ln f(u)\,\partial f\,\ln \vec g(\vec v)\,\partial \vec g\,(\partial_f^* f(u); \vec g(\vec v))\} = 0 \]
Proof. The result follows by differentiating the definition of an unbiased turbulent
estimator on both sides and then integrating by parts, as with the simpler case.
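For comparison, the "simpler case" referred to in the proof can be sketched for an ordinary one-parameter family $f(x;\theta)$ with an unbiased estimator $u$. This is the classical argument, stated under the assumption that differentiation under the integral sign is permitted; it is not the turbulent computation itself, and the normalisation of the turbulent relation is as stated in the lemma and is not rederived here.

```latex
\begin{align*}
  \int u(x)\, f(x;\theta)\,dx = \theta
  &\;\Longrightarrow\; \int u(x)\,\partial_\theta f(x;\theta)\,dx = 1
  \;\Longrightarrow\; E_\theta\bigl[u\,\partial_\theta \ln f\bigr] = 1, \\
  \int f(x;\theta)\,dx = 1
  &\;\Longrightarrow\; E_\theta\bigl[\partial_\theta \ln f\bigr] = 0, \\
  \text{hence}\quad
  E_\theta\bigl[(u - \theta)\,\partial_\theta \ln f\bigr] &= 1 .
\end{align*}
```

Applying the Cauchy-Schwarz inequality to the last line gives the classical bound $\mathrm{Var}_\theta(u)\, E_\theta[(\partial_\theta \ln f)^2] \geq 1$.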
Definition 41. The Fisher information tensor $h^{kl}_{ij}(f, \vec g)$ (corresponding to the
likelihood functions $f, \vec g$) is defined to be
\[ h^{kl}_{ij}(f, \vec g) := E_m^{f \times \vec g}\{\Gamma_{ik}(f, \vec g)\,\Gamma_{jl}(f, \vec g)\} \]
where
\[ \Gamma_{ik}(f, \vec g) = (\ln f \circ \partial f_i)(\ln g \circ \partial g_k)(\partial_f^* f; g) \]
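Theorem 9.2.4. (Turbulent Cramer-Rao Inequality).
\[ \mathrm{cov}^{f \times \vec g}\{(\partial_f^* \theta_i \circ u; \vec\phi_k \circ \vec v), (\partial_f^* \theta_j \circ u; \vec\phi_l \circ \vec v)\} - h^{kl}_{ij}(f(u), \vec g(\vec v)) \geq 0 \]
where $\mathrm{cov}^{f \times \vec g}(v^{ik}, w^{jl}) := E_m^{f \times \vec g}(v^{ik} w^{jl})$.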
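Proof. Note that $E_m^{f \times \vec g}(T^{ik}_{\alpha\beta} T^{jl}_{\gamma\delta})$ is certainly nonnegative, for
\[ T^{ik}_{\alpha\beta} := (\partial_f^*(\theta_i \circ u); (\vec\phi_\alpha \circ \vec v))
   - E_m^{f \times \vec g}\{\partial_f^*(\theta_i \circ u); (\vec\phi_\alpha \circ \vec v)\}
   - \{\partial_f^* \tfrac{\partial}{\partial\theta_k} \ln(f(u)); \tfrac{\partial}{\partial\vec\phi_\beta} \ln(g(v))\}\, h^{\alpha\beta}_{ik}(f(u), \vec g(\vec v)) \]
Expanding, we get
\[ \mathrm{cov}^{(f \times \vec g)}(\{\partial_f^*(\theta_i \circ u); (\vec\phi_k \circ \vec v)\}, \{\partial_f^*(\theta_j \circ u); (\vec\phi_l \circ \vec v)\})
   - 2E_m^{(f \times \vec g)}(\{(\partial_f^*(\theta_i \circ u); (\vec\phi_k \circ \vec v))
   - E_m^{(f \times \vec g)}(\partial_f^*(\theta_i \circ u); (\vec\phi_k \circ \vec v))
     (\partial_f^* \tfrac{\partial}{\partial\theta_\alpha} \ln(f(u)); \tfrac{\partial}{\partial\vec\phi_\beta} \ln(\vec g(\vec v)))\}\, h^{l\beta}_{j\alpha}(f(u), \vec g(\vec v)))
   + h^{kl}_{ij}(f(u), \vec g(\vec v)) \geq 0 \]
But, observing first of all that $h$ is statistically independent, then by the lemma
the second term reduces to
\[ 2E_m^{(f \times \vec g)}(\{-E_m^{(f \times \vec g)}(\partial_f^*(\theta_i \circ u); (\vec\phi_k \circ \vec v))
   (\partial_f^* \tfrac{\partial}{\partial\theta_\alpha} \ln(f(u)); \tfrac{\partial}{\partial\vec\phi_\beta} \ln(\vec g(\vec v)))\}\, h^{l\beta}_{j\alpha}(f(u), \vec g(\vec v))) \]
Using $E_m(-v) = -E_m(v)$, together with the definition of unbiased turbulent
estimator, we get
\[ -2E_m^{(f \times \vec g)}((\partial_f^*(\theta_i \circ u); (\vec\phi_k \circ \vec v))
   (\partial_f^* \tfrac{\partial}{\partial\theta_\alpha} \ln(f(u)); \tfrac{\partial}{\partial\vec\phi_\beta} \ln(\vec g(\vec v)))\, h^{l\beta}_{j\alpha}(f(u), \vec g(\vec v))) \]
Note that $\frac{\partial \ln(f)}{\partial\theta_j} = \frac{\partial f / \partial\theta_j}{f}$, and also
\[ (\partial_f^* \tfrac{\partial \ln(f)}{\partial\theta_i} f; \tfrac{\partial \ln(\vec g)}{\partial\vec\phi_j} \vec g)
   = (\partial_f^* \tfrac{\partial \ln(f)}{\partial\theta_i}; \tfrac{\partial \ln(\vec g)}{\partial\vec\phi_j})(\partial_f^* f; \vec g) \]
by the factorisation lemma.
One concludes that
\[ (\partial_f^* \tfrac{\partial \ln(f)}{\partial\theta_i}; \tfrac{\partial \ln(\vec g)}{\partial\vec\phi_j})(\partial_f^* f; \vec g)
   = (\partial_f^* \tfrac{\partial f}{\partial\theta_i}; \tfrac{\partial \vec g}{\partial\vec\phi_j}) \]
from which it follows, after integration by parts, that the second term in the original
expansion simplifies to
\[ -2E_m^{(f \times \vec g)}(h^{kl}_{ij}) = -2h^{kl}_{ij} \]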
completing the proof.
Definition 42. A turbulent maximum likelihood estimator (tmle) is an unbiased
estimator $(u, \vec v)$ that realises a local maximum of $(\partial_f^* \ln(f \circ u); \ln(\vec g \circ \vec v))$.
Lemma 9.2.5. Any tmle realises the Cramer-Rao lower bound.
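In the classical (non-turbulent) setting the analogous statement is that the maximum likelihood estimator of a Gaussian mean, namely the sample mean, attains the Cramer-Rao bound. A minimal numerical sketch of that classical fact follows. The Gaussian model, the parameter values and the function name `sample_mean_variance` are assumptions chosen for illustration; nothing here is taken from the turbulent construction.

```python
import random
import statistics

def sample_mean_variance(theta, sigma, n, trials, seed=0):
    """Monte Carlo variance of the sample mean of n draws from N(theta, sigma^2)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    means = [
        statistics.fmean(rng.gauss(theta, sigma) for _ in range(n))
        for _ in range(trials)
    ]
    return statistics.pvariance(means)

theta, sigma, n = 1.5, 2.0, 25

# Fisher information about theta carried by n i.i.d. N(theta, sigma^2) observations.
fisher_information = n / sigma**2

# Classical lower bound on the variance of any unbiased estimator of theta.
cramer_rao_bound = 1.0 / fisher_information

# The sample mean is the MLE of theta; its variance should sit at the bound.
empirical_variance = sample_mean_variance(theta, sigma, n, trials=20000)
```

Comparing `empirical_variance` with `cramer_rao_bound` shows agreement up to Monte Carlo error, which is the classical content of the statement that a maximum likelihood estimator realises the lower bound.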
9.2.3 Application to physical estimators

Similarly to our previous experience with statistical manifolds, we have a notion
of weakly unbiased turbulent estimator, and also of weak Cramer-Rao inequality,
where in this case we integrate over M in order to obtain our result.
In particular we have
Theorem 9.2.6. (Weak turbulent Cramer-Rao inequality). Write
$f(m, a) = \int_A F(m, b)\,\delta(\sigma_b(m) - a)\,db$ and $\vec g(m, a) = \int_A G(m, b)\,\delta(\tau_b(m) - a)\,db$.
Let $f_{ij} = \int_A F(m, b)\,\sigma_{ij}(m, b)\,db$, and similarly for $\vec g$. Note that we need not
worry about the role of derivatives of $F$ and $\vec G$ since, as $f$, $\vec g$ are probability
distributions, such contributions will vanish at infinity. Then
\[ \int_M \int_A (\partial_f^* f_{ij}; \vec g_{kl})\bigl(\mathrm{cov}_a^{(f \times \vec g)}(\{\partial_f^*(\theta_i \circ u); (\vec\phi_k \circ \vec v)\}, \{\partial_f^*(\theta_j \circ u); (\vec\phi_l \circ \vec v)\})
   - h^{kl}_{ij}(a)(f(u), \vec g(\vec v))\bigr)\,da\,dm \geq 0 \]
where $\mathrm{cov}_a$ is the covariance density function and $h^{kl}_{ij}(a)$ is the Fisher information
tensor density.
This follows from appropriate generalisations of the prior results.

Lemma 9.2.7. Any weak turbulent unbiased estimator may be written as
\[ (u, v) = (\partial_f^* f, \vec g) \]
Proof. To prove this we need the turbulent version of Stokes' theorem. However,
after this the rest is easy.
Theorem 9.2.8. (Turbulent Stokes). Let $\omega$ be an $n - 1$ form on an $n$-manifold
$M$, and $\kappa$ a vector valued function on $M$. Then
\[ \int_M d(\partial_\omega^* \omega; \kappa) = \int_{\partial M} (\partial_\omega^* \omega; \kappa) \]
Proof. The key observation is that the problem may be converted into a repeated
application of Stokes' theorem in the standard case, from which the result quickly
follows.
Remark. The statistical version of the above follows by merging this result with the
statistical Stokes theorem (see chapter 2 or chapter 7). I will not go into this here.
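Suppose now $(u, \vec v)$ is a weak tmle. Then for this choice, the above inequality is
an equality, and, furthermore, we have that
$\mathrm{cov}^{(f \times \vec g)}(\{\partial_f^*(\theta_i \circ u); (\vec\phi_k \circ \vec v)\}, \{\partial_f^*(\theta_j \circ u); (\vec\phi_l \circ \vec v)\}) = 0$,
since we may use $\theta = \ln = \exp^{-1}$ as our coordinate chart, and then
\[ \mathrm{cov}^{(f \times \vec g)}(\{\partial_f^*(\theta_i \circ u); (\vec\phi_k \circ \vec v)\}, \{\partial_f^*(\theta_j \circ u); (\vec\phi_l \circ \vec v)\})
   = \int_A \ln((\partial_f^* f; \vec g))\,(\partial_f^* f; \vec g) \]
But $\ln \circ f^* = f^* \circ \ln$, so this becomes
\[ \int_A (\partial_f^* \ln(f); \ln(\vec g))\,(\partial_f^* f; \vec g) \]
but by the definition of weak tmle this is zero.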

Corollary 9.2.9. For a weak tmle $(u, v)$, we have that $\int_M h^{kl}_{ij}(f(u), \vec g(\vec v)) = 0$.
Lemma 9.2.10. (Turbulent EPI principle). Write
$h(f(u), \vec g(\vec v)) = (\partial_f^* f_{ij}; \vec g_{kl})\, h^{kl}_{ij}(a)(f(u), \vec g(\vec v))$.
Then we have that
\[ 0 \leq \int_M \int_A h(f(u), \vec g(\vec v)) = \int_M \int_A (\partial_f^* \rho_f; \rho_{\vec g}) \]
Proof. Follows by the factorisation lemma.
9.3 Statistical Stacks

9.3.1 Motivation
We would like to sometimes be able to describe the critical behaviour of systems
where properties of the solution in the base space do not necessarily vanish at infinity.
For instance, we might like to study periodic physical behaviour in our systems (for
instance, as in condensed matter physics) rather than rapid asymptotic decay of the
salient properties of our solution (as in particle physics). This suggests that we need
to derive an extremisation principle that naturally has a particular generalisation
of the holographic principle that applies to some sort of "lifted" geometry that is
related to, but not corresponding to, the base.
One such way of doing this, as I shall describe, is to consider a so-called
statistical stack. The basic idea here is fairly simple - we would like to construct a
statistical structure of a statistical structure, define a suitable generalisation of the
Fisher information, and then play the standard game, by examining critical behaviour
corresponding to particular points in the solution space of this functional.
What makes this game slightly harder to play in full generality is that we do
not merely want to be able to construct the one-fold stack corresponding to an
underlying manifold with signal function (the statistical manifold), or a two-fold,
three-fold etc. Rather we need to somehow develop a language that allows us to
describe for instance 1/2-fold stacks, or (n, m)-stacks; in other words we would like
to construct an appropriate language for turbulent stacks.
9.3.2 Basic tools

I shall now describe the construction. As a first observation, we will have a sequence
of signal functions for each level, so this complicates things. Hence we need some
way of combining these into a single induced signal function. I shall write this meta
signal function as
\[ f^k(m, a_0, a_1, \ldots) \]
where $a_i$ corresponds to the $i$-th statistical level, and $k \in \mathbb{R}^\infty$ is the turbulent
variable. But this is rather bad notation, since after all we might have fewer than one
$a_i$; so we might instead write
\[ f^k(m, a^{(k)}) \]
with some sort of induced probabilistic distribution being understood over $\mathbb{R}^{\mathbb{R}^\infty}$;
in particular $k = \partial\rho_{g\,(\mathrm{stack})}(m, b)$, where $\rho_g(m, b)$ is the information density of a signal
function for a standard statistical distribution over our manifold $M$ with signal
function $g$ and statistical variable $b$.
In particular, I will define $a^{(k)} \in \mathbb{R}^k$. Also, I claim that we may interpret without
loss of generality $f^k$ literally as the power $f^k$, for some base signal function $f(m, a)$.
Then, in order to compute the information, we write $f(m, a)^{g(m,n,b)} =: \gamma(m, n, a, b)$,
and then compute
\[ I(\gamma) = \int_M \int_N \int_A \int_B \rho_\gamma \]
As a motivating example for all this, I claim that the information of a sharp
turbulent stack is
\[ \int_M \int_N R_{\sigma^{\bar\tau_n(m)}}(m)\,dn\,dm \]
where here of course $g(m, n, b) = \delta(\tau(m, n) - b)$, and $f(m, a) = \delta(\sigma(m) - a)$.
This can be verified by the same sorts of arguments as before and reference to the
definitions.
A slightly less trivial example is the following:
\[ \int_M \int_A \Bigl(1 + \Delta_\tau + \frac{\Delta_\tau^2}{2} + \ldots\Bigr) \Delta_{f(m,a)} F(m, a)\,da\,dm \]
which arises from the second statistical moment of a stack associated to the
signal function $f(m, a) = \int_A F(m, b)\,\delta(\sigma_b(m) - a)\,db$ after "forgetting" some of the
degrees of freedom at the first level. Here evidently our turbulent signal function
is almost sharp scalar turbulence, $g(m, b) = (1 + \Delta_\tau + \ldots)\,\delta(\tau(m) - b)$ (since then
$f^{\partial\rho_g} = (1 + \Delta_\tau + \ldots) f$ as required).
9.3.3 Dynamics
I now give the extremisation theorem for this object, ie, the Cramer-Rao inequality,
and give a sketch of its proof. This will not nearly be as hard as it could be, since
we have already done most of the work; it will mainly be an adaptation of the proof
of the Cramer-Rao inequality for standard turbulent spaces.
Theorem 9.3.1. (Cramer-Rao Inequality for a turbulent stack). Let us have a
turbulent stack $(M, N, A, f, g)$ as above. Then $I(f^g) \geq 0$.
To prove this result we will need to play the standard game and build up an
appropriate set of tools to tackle the problem.
Definition 43. Let Λ = (M, N, A, f, g) be a turbulent stack, where f is the signal
function on the base and g is the signal function on the turbulent superstructure. A
turbulent estimator on this structure is a tuple (u, v) where u, v are maps from the
sample space M × A to M . An unbiased turbulent estimator is an estimator that
satisfies the condition that
\[ E_{(m)}^{f^g}((\theta_i \circ u); (\phi_j \circ v)) = (\theta_i(m); \phi_j(m, n)) \]
where
\[ E_m^{f^g}(\tau; \sigma) := \int_A f^g\,(\tau^\sigma) \]
is the turbulent expectation with respect to the signal functions $f$ and $g$, and
$\theta : M \to \mathbb{R}^n$ is a coordinate chart on $M$, $\phi : M \times N \to \mathbb{R}^n$ is a coordinate chart on
$M \times N$.
Remark. From now on I will use the notation $f^g$ and $(f; g)$ interchangeably.
Lemma 9.3.2. (Factorisation). $E_m^{(f;g)}(\tau; \sigma) := \int_A ((f\tau); (g\sigma)) = \int_A (f; g)(\tau; \sigma)$.
Proof. We use the fact that f and g are normalised distribution functions over A,
which conserve probability.
Lemma 9.3.3. Unbiased turbulent estimators satisfy the following relation:
\[ E_m^{(f^g)}(\{(\theta_i \circ u); (\phi_k \circ v)\}\{(\tfrac{\partial}{\partial\theta_j} \ln(f \circ u)); (\tfrac{\partial}{\partial\phi_l} \ln(g \circ v))\}) = 0 \]
Proof. The result follows by differentiating the definition of an unbiased turbulent
estimator on both sides and then integrating by parts, as with the simpler case.
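Definition 44. The Fisher information tensor $h^{kl}_{ij}(f, g)$ (corresponding to the
likelihood functions $f, g$) is defined to be
\[ h^{kl}_{ij}(f, g) := E_m^{(f^g)}\{(\tfrac{\partial}{\partial\theta_i} \ln(f); \tfrac{\partial}{\partial\phi_k} \ln(g))(\tfrac{\partial}{\partial\theta_j} \ln(f); \tfrac{\partial}{\partial\phi_l} \ln(g))\} \]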
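Theorem 9.3.4. (Turbulent Cramer-Rao Inequality).
\[ \mathrm{cov}^{(f;g)}(\{(\theta_i \circ u); (\phi_k \circ v)\}, \{(\theta_j \circ u); (\phi_l \circ v)\}) - h^{kl}_{ij}(f(u), g(v)) \geq 0 \]
where $\mathrm{cov}^{(f;g)}(v^{ik}, w^{jl}) := E_m^{(f;g)}(v^{ik} w^{jl})$.
Proof. Note that $E_m^{(f;g)}(\Gamma^{ik}_{\alpha\beta} \Gamma^{jl}_{\gamma\delta})$ is certainly nonnegative, for
\[ \Gamma^{ik}_{\alpha\beta} := ((\theta_i \circ u); (\phi_\alpha \circ v)) - E_m^{(f \times g)}\{(\theta_i \circ u); (\phi_\alpha \circ v)\}
   - \{\tfrac{\partial}{\partial\theta_k} \ln(f(u)); \tfrac{\partial}{\partial\phi_\beta} \ln(g(v))\}\, h^{\alpha\beta}_{ik}(f(u), g(v)) \]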
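Expanding, we get
\[ \mathrm{cov}^{(f;g)}(\{(\theta_i \circ u); (\phi_k \circ v)\}, \{(\theta_j \circ u); (\phi_l \circ v)\})
   - 2E_m^{(f;g)}(\{((\theta_i \circ u); (\phi_k \circ v)) - E_m^{(f;g)}((\theta_i \circ u); (\phi_k \circ v))
     (\tfrac{\partial}{\partial\theta_\alpha} \ln(f(u)); \tfrac{\partial}{\partial\phi_\beta} \ln(g(v)))\}\, h^{l\beta}_{j\alpha}(f(u), g(v)))
   + h^{kl}_{ij}(f(u), g(v)) \geq 0 \]
But, observing first of all that $h$ is statistically independent, then by the lemma
the second term reduces to
\[ 2E_m^{(f;g)}(\{-E_m^{(f;g)}((\theta_i \circ u); (\phi_k \circ v))
   (\tfrac{\partial}{\partial\theta_\alpha} \ln(f(u)); \tfrac{\partial}{\partial\phi_\beta} \ln(g(v)))\}\, h^{l\beta}_{j\alpha}(f(u), g(v))) \]
Using $E_m(-v) = -E_m(v)$, together with the definition of unbiased turbulent
estimator, we get
\[ -2E_m^{(f;g)}(((\theta_i \circ u); (\phi_k \circ v))
   (\tfrac{\partial}{\partial\theta_\alpha} \ln(f(u)); \tfrac{\partial}{\partial\phi_\beta} \ln(g(v)))\, h^{l\beta}_{j\alpha}(f(u), g(v))) \]
from which it follows, after integration by parts and use of the factorisation lemma,
that the second term in the original expansion simplifies to
\[ -2E_m^{(f;g)}(h^{kl}_{ij}) = -2h^{kl}_{ij} \]
completing the proof.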
Definition 45. A turbulent maximum likelihood estimator (tmle) is an unbiased
estimator $(u, v)$ that realises a local maximum of $(\ln(f \circ u); \ln(g \circ v))$.
Lemma 9.3.5. Any tmle realises the Cramer-Rao lower bound.
It is then possible, following the same methodology as before, to sharpen
these results to deal with weak estimators and finally prove the Cramer-Rao
inequality for turbulent stacks. In order to do this, of course, we need one more
result: Stokes' theorem for turbulent stacks.
Theorem 9.3.6. (Stokes for (sharp) turbulent stacks). Let $\omega$ be an $n - 1$ form and
$\tau$ an arbitrary vector valued function on $M$. Then
\[ \int_M \int_N d(\omega^{\tau(m,n)}(m)) = \int_{\partial M} \int_N \omega^{\tau(m,n)}(m) \]
Proof. As opposed to Stokes for sharp turbulent manifolds, we are not performing
in effect a repeated application of the standard form of Stokes. Rather we are
trying to show that Stokes holds not only for forms that are function valued, but
also for forms that are functionals, or higher order objects. But this is an eminently
reasonable thing to expect.
9.3.4 A few quick remarks on notation

The notation that I have been using so far is perhaps not entirely transparent, so
I will briefly digress and provide an alternative perspective on what I have been
doing. Essentially the difference between turbulence in the measure and turbulence
in the stack is a matter of the order at which operators are applied. Consider the
curvature operator $R : \{\text{metrics on a manifold}\} \to \{\text{functions on the same manifold}\}$.
Similarly, consider the operator $\partial^*_{(\tau)} : F \to F$ where $F$ is the space of functions on
a manifold, and $\tau$ is an auxiliary "turbulence" metric associated to the operator,
whose action is defined as $\partial^*_{(\tau)}(f) = (\partial_f^* f; R(\tau))$.
Then for turbulence in the measure, we construct $\partial^*_{(\tau)} \circ R(\sigma)$. Conversely, for
turbulence in the stack, we act component wise on $\sigma$ with the operator $R \circ \partial^*_{(\tau)}$. Of
course in general these two operators will not commute.
We may iterate this process. For consider
\[ \partial^*_{(\partial^*_{(\gamma)}(\tau))} \circ R \circ \partial^*_{(\alpha)} \circ R(\sigma) \]
It is clear that there is a natural "bifurcation" occurring in the operators here,
and in fact this is exactly what does occur when we look at deeper levels of
information within the structures I have defined above.
However I will continue to stick to the older notation for the remainder of this
manuscript, and perhaps develop the above ideas further in a later discourse.
9.4 Topics in turbulent statistical dynamics

Here I motivate a number of areas where the tools in this chapter may be applied,
without going into too much detail or development.
9.4.1 Application to condensed matter physics

The simplest nontrivial action incorporating both fractal behaviour in the stack as
well as in the measure is the focus of this section. This serves well as a model
for systems where statistical effects are still small but not negligible, such as in
condensed matter physics. As an important remark, I should mention that some
particularly exotic condensed matter systems will probably have more significant
statistical effects, so this model will not be useful in such cases, since it is a
perturbative expansion in $\|\epsilon\|_{\mathbb{R}^3}$, $\epsilon = (\epsilon_1, \epsilon_2, \epsilon_3)$.
First of all, ignore perturbative effects. Assume in other words that the geometry
is to some extent sharp, but exhibiting both types of fractal behaviour. Then I claim
we will have an action of the form
\[ \int_M (\partial_{(\sigma;\tau)}^* R_{(\sigma;\tau)}(m); R_\kappa(m))\,dm \]
where $f(m, a)^{g(m,n,b)} = \delta(\sigma(m) - a)^{\delta(\tau(m,n) - b)}$ induces the metric $(\sigma; \tau)$ on $M$. In
particular the associated signal function $S$ is of the form
\[ S = ((f; g); h) \]
where $h(m, a) = \delta(\kappa(m) - a)$.

Introducing small statistical fluctuations in $f$ only - we would have to have a
higher order construction to justify fluctuations in $g$ and $h$ - creates a new action
\[ \int_M (\partial_\sigma^*(R(R_\tau)^{(1)}_\sigma + R(R_\tau)^{(2)}_\sigma \epsilon + R(R_\tau)^{(3)}_\sigma \epsilon^2 + \ldots); R^{(1)}_\kappa) \]
where $R(R_\tau)_\sigma (\det(\sigma; \tau))^{1/2} := \Delta_\sigma^{\Delta_\tau} (\det(\sigma; \tau))^{1/2}$ is the eigenfunction associated
to the respective operator acting on the volume element, $(\sigma; \tau) = \sigma_{ij}^{\tau_{kl}}$, and $\det$ is
the natural extension of determinant to the tensor product of two matrices.
So we have an action in terms of the scalar variable $\epsilon$, as well as three metrics:
$\sigma$, $\tau$, and $\kappa$. Differentiating with respect to these parameters we obtain a set of
partial differential equations which control the behaviour of critical examples of
these objects.
With respect to $\kappa$:
\[ (\partial_\sigma^*(R(R_\tau)^{(1)}_\sigma + R(R_\tau)^{(2)}_\sigma \epsilon + R(R_\tau)^{(3)}_\sigma \epsilon^2 + \ldots); R^{(1)}_\kappa) = 0 \]
With respect to $\tau$:
\[ R_\tau = 0 \]
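With respect to $\sigma$:
\[ (\partial_\sigma^*(R(R_\tau)^{(1)}_\sigma + R(R_\tau)^{(2)}_\sigma \epsilon + R(R_\tau)^{(3)}_\sigma \epsilon^2 + \ldots); \partial_\sigma^* R^{(1)}_\kappa) = 0 \]
With respect to $\epsilon$:
\[ (\partial_\sigma^*(R(R_\tau)^{(2)}_\sigma + 2R(R_\tau)^{(3)}_\sigma \epsilon + \ldots); R^{(1)}_\kappa) = 0 \]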
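The last of these conditions is a term-by-term derivative of the perturbative series. As a sketch, write $\epsilon$ for the scalar perturbation parameter and abbreviate the coefficients as $R^{(n)} := R(R_\tau)^{(n)}_\sigma$ (an abbreviation introduced here for illustration), and assume that the turbulent derivative $\partial_\sigma^*$ and the pairing $(\,\cdot\,;\,\cdot\,)$ do not themselves depend on $\epsilon$ and commute with differentiation in it:

```latex
\[
  \frac{\partial}{\partial\epsilon}
  \bigl(R^{(1)} + \epsilon R^{(2)} + \epsilon^2 R^{(3)} + \dots\bigr)
  = R^{(2)} + 2\epsilon R^{(3)} + \dots
\]
```

Under those assumptions the zeroth order coefficient $R^{(1)}$ drops out, which is why it is absent from the criticality condition in the perturbation parameter.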
It is my hope that, given certain simplifying assumptions, these coupled PDEs
will reduce to give standard results in condensed matter theory. To sharpen the
models to a greater extent, a second order theory is the way to go. Roughly speaking,
one now allows second order turbulence in both the stack and the measure, as well as
considering fractal measure with fractal stacking (cross term). This allows a model
as above to be constructed, but now with statistical fluctuations permissible at all
parts of the first level.
However, I feel that the foundations need to be touched up some more before
this is developed further. This is a task for future research.
9.4.2 Fluid dynamics

On a slightly simpler note, we are interested in fluid flow with "turbulence", or
at least first order turbulence. This requires a sharp action as described in the
discussion immediately above, i.e.
\[ \int_M (\partial_\sigma^* R(R_\tau)_\sigma; R_\kappa(m))\,dm \]
The geometric fluid flow of a physical system is described by the criticality of
this functional. Hence we must have
\[ (\partial_\sigma^* R(R_\tau)_\sigma; R_\kappa) = 0, \]
\[ R_\tau = 0 \]
and
\[ (\partial_\sigma^* R(R_\tau)_\sigma; \partial_\sigma^* R_\kappa) = 0 \]
We might be interested in the case where the geometry is flat, as is usually the
case where one is examining the behaviour of fluids on earth. Hence all three metrics
become purely antisymmetric. The aim is to now rewrite the corresponding PDEs
and compare with the standard form of the Navier-Stokes equations.
I should remark however that, before we can go any further, just like before we
need to reexamine the fundamentals more closely to make sure we know what we
are dealing with. This is a task for the future.
9.4.3 The Lorentz problem

I would like to now sketch a potential resolution of the Lorentz problem, or the
mystery of why the physics we experience should occur in a Lorentzian geometry.
Recall from our results before in 8.2 that we certainly need at least four dimensions
to realise a perturbative model of physics, since $R^{(2)}$ requires four dimensions to be
computably nonzero.
This of course leaves the questions of why the index should be one and,
furthermore, why in a sharp model the framework is more stable the fewer dimensions
one has to play with. The latter turns out to be easier to show, so I will prove that
first:
Theorem 9.4.1. (Minimisation of dimension). In a sharp fractal manifold, or first
order turbulent geometry, the optimal fractal dimension for the information is as
small as possible; in other words, it is zero.
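Proof. (sketch). Recall that the action is of the form
\[ \int_M (\partial_\sigma^* R_\sigma; R_\tau) \]
For this to be critical we require both the derivatives wrt $\tau$ and $\sigma$ to vanish.
Recalling the definition of derivative, we have
\[ \partial^{f_\tau} := \sum_{n=0}^{\infty} \frac{f_\tau^{(n)}(x)\,\partial^n}{n!} \]
hence
\[ \frac{\partial}{\partial\tau} \partial^{f_\tau} := \sum_{n=0}^{\infty} \frac{\partial f_\tau^{(n)}(x)}{\partial\tau} \frac{\partial^n}{n!} = \partial^{\frac{\partial f}{\partial\tau}} \]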
Consequently requiring the derivative wrt τ to vanish is equivalent to requiring
\[ (\partial_\sigma^* R_\sigma; R_\tau) = 0 \]
Differentiating wrt $\sigma$ is not entirely a good idea, but differentiating half with
respect to $\sigma$, then half with respect to $\sigma^{-1}$, i.e. applying the operator $\partial_\sigma^*$, is a better
idea and clearly equivalent.
Via the same trick we have
\[ \partial\,\partial^{f} g = \sum \frac{f^{(n)}\,\partial^{n+1} g}{n!} = \partial^{\partial f} g \]
So differentiating wrt $\sigma^*$ and setting to zero produces the consequential relation
\[ (\partial_\sigma^* R_\sigma; \partial_\sigma^* R_\tau) = 0 \]
But the fractional derivatives of a function can only both be zero if the orders
of the derivative are both the same. So
\[ \partial_\sigma^* R_\tau = R_\tau \]
But this is impossible unless $\partial_\sigma^*$ is the identity operator, which will occur only
if $\sigma = 0$. So the optimal dimension is zero as required.
Now, in a sharp first order turbulent geometry, it turns out that the concept of
index generalises in a nice way:
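Lemma 9.4.2. (Turbulent index). The index of a sharp turbulent geometry
$(\partial_\sigma^* \delta(\sigma(m) - a); \delta(\tau(m) - b))$ takes the form
\[ \mathrm{ind}(\partial_\sigma^* \sigma; \tau) = \mathrm{ind}(\sigma)^{\mathrm{ind}(\tau)} \]
where $\mathrm{ind}(\sigma)$ is the weighted index of the metric $\sigma$.
Proof. (sketch). This basically follows from the fact that the weighted index of $A^B$,
where $A, B$ are both matrices, is equal to the index of $A$ raised to the index of $B$.
For if we denote $A = \ln \hat A$ to be the logarithm of $\hat A$, i.e. such that $\hat A = e^A = \sum \frac{A^n}{n!}$,
we get $\hat A^B = e^{A \otimes B} = \sum \frac{(A \otimes B)^n}{n!}$. Now the index of $A \otimes B$ is the index of $A$ times the
index of $B$. Write $\hat{\mathrm{ind}} := \log \circ\, \mathrm{ind} \circ \exp$. Then similarly $\hat{\mathrm{ind}}(A \otimes B) = \hat{\mathrm{ind}}(A)\,\mathrm{ind}(B)$.
Hence the index of $e^{A \otimes B}$ will be the exponential of this index, i.e. $e^{\hat{\mathrm{ind}}(A)\,\mathrm{ind}(B)}$. But the
index of $\hat A$ is $e^{\hat{\mathrm{ind}}(A)}$, i.e. $\mathrm{ind}(e^A) = e^{\hat{\mathrm{ind}}(A)}$; hence the index of $\hat A^B$ is $\mathrm{ind}(\hat A)^{\mathrm{ind}(B)}$.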
Theorem 9.4.3. (Index Lemma). In a nontrivial turbulent geometry with almost
sharp geometry on the base, and sharp geometry on the first and second levels, the
Cramer-Rao inequality implies the turbulent index must be greater than or equal to
one; in particular, when the information is critical, the index must be one.
Proof. (sketch). Via the proof of minimisation of dimension, in a sharp second order
turbulent geometry we must have that the dimension for the base and first levels
must be as small as possible. In particular at the first level the geometry of the
fractal measure will be trivial when optimised.
Now, by the lemma above, we then must have that the index of the geometry to
first order is $\mathrm{ind}(\sigma)^{\mathrm{ind}(\tau)}$. But by the above remarks $\tau = 0$, and so $\mathrm{ind}(\tau) = 0$. Since
we are assuming that the geometry of the base is nontrivial (in particular, since we
are assuming perturbative effects, which pose an obstruction to minimisation), we
have that $\sigma \neq 0$. Hence the index is well defined and must be one.
It is then relatively easy to see that these results should generalise; in particular
it seems clear that as a consequence of these two results that an optimal and stable
almost sharp geometry must be Lorentzian.
9.4.4 Entanglement physics

Recall from before I mentioned that the simplest cartoon model for entanglement
physics, or action at a distance in the base geometry M , is probably described by
criticality of the action
$$\int_M (\partial_\sigma * R_\sigma ; \partial_\tau * R_\tau ; R_\kappa)$$
Here I am assuming "scalar turbulence", or merely fractal dynamics.

Now, it is of interest to discuss how this all relates to the issues raised in the
EPR paper of the mid 1930s [14], which were discussed further in Bell’s paper of
the mid 1960s [3]. I will make an attempt to do this now.
In [14], it was demonstrated that the separate assumptions that quantum me-
chanics was a physically complete theory, together with the idea that separate non-
commuting observables could not have simultaneous existence, were incompatible.
This was achieved by examining the states of two particles which interacted only
from time 0 to time T , and showing that, after this time, the observation of the state
of one could arbitrarily determine the state of the other. In particular this demon-
strated that a contradiction could be constructed wherein it was possible to violate
the second assumption, that two noncommuting observables could have independent
reality in the second state.
This motivated the development of hidden variable models, such as the Bohmian mechanics of Bohm [5], [6] in the 1950s, in an effort to restore causality and locality to quantum mechanics. But these theories all suffered from various problems. In [3], Bell argued that it was impossible to construct a theory consistent with the wave function interpretation of quantum mechanics that did not involve entanglement, or nonlocal effects. In order to show this he constructed a famous inequality, now known (unsurprisingly) as the Bell Inequality.
Of course, the theory which I have laboriously constructed is, in and of itself,
riddled through and through with hidden variables. So in order to continue it
becomes necessary to address Bell’s arguments, and examine the assumptions upon
which he reached his conclusions.
Certainly in the course of this chapter and indeed over the course of this treatise I have demonstrated that it is possible to construct higher order geometric objects, which transcend limiting oneself to examining the geometry of the base "physical" space. This allows mechanisms via which information may transit smoothly between points which in a simpler geometry would be far apart. For instance, consider naively the example of a manifold, and consider two points which with respect to one metric $\sigma_0$ are a distance L apart. But we may consider a family of metrics $\sigma_t$ such that the distance between these two points with respect to $\sigma_t$ is $Le^{-t}$. Then clearly in the limit as t becomes large and positive the distance will drop to zero.
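The shrinking-distance example can be made concrete with a small numerical sketch. The specific choice below is an assumption made purely for illustration and is not taken from the text: the base space is the Euclidean plane, the reference metric is the flat one, and the family is the conformal rescaling sigma_t = exp(-2t) * sigma_0, under which every length is multiplied by exp(-t). The function name `segment_length` is hypothetical.

```python
import math

def segment_length(p, q, t, steps=1000):
    """Approximate length of the straight segment p -> q in the plane,
    measured with the (assumed) metric sigma_t = exp(-2t) * Euclidean metric."""
    scale = math.exp(-t)  # lengths scale by the square root of the conformal factor
    total = 0.0
    for k in range(steps):
        # endpoints of the k-th small piece of the segment
        x0 = [p[i] + (q[i] - p[i]) * k / steps for i in range(2)]
        x1 = [p[i] + (q[i] - p[i]) * (k + 1) / steps for i in range(2)]
        total += scale * math.dist(x0, x1)
    return total

if __name__ == "__main__":
    p, q = (0.0, 0.0), (3.0, 4.0)  # Euclidean distance L = 5
    for t in (0, 1, 5, 10):
        print(t, segment_length(p, q, t))
```

Because the rescaling is constant over the plane, straight segments remain shortest paths, so the printed values track L * exp(-t) and decay towards zero as t grows.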
We may now construct as in chapter 7 a distribution over this one parameter
family of metrics. So we have a probabilistic theory with hidden variables. Via Bell’s
paper our theory is still nonlocal. But we may make a further extension of the class of
objects we are dealing with, to consider what I describe in this chapter as turbulent
geometry. Then via the correspondence principle our statistical geometry becomes
a sharp first order turbulent geometry, a hidden variable model which nonetheless
provides us with a local description, and furthermore wherein at least part of the information - the part corresponding to the limit of the sequence of metrics $\sigma_t$ - is mediated between the two points instantaneously.
Furthermore, if one considers the geometry to be almost sharp, in the sense I
defined earlier in chapter 7, then one recovers standard quantum mechanical effects
(as in chapter 8, section 3.5).
This certainly seems to provide a constructive contradiction with the thesis of [3]. So some discussion is definitely warranted.
The crux of the matter, it seems to me, is what definition Bell used or had in
mind for what he thought of as a hidden variable model. Looking through his paper,
and reading between the lines, I extract the following definition by implication:
Definition 46. A hidden variable model (in the sense of Bell) is a model that
(i) is consistent with the wave function interpretation of quantum mechanics,
(ii) makes use of variables that are "hidden" in the sense that although they impact on the process of measurement of physical parameters, they are not necessarily measurable themselves.
Furthermore it is a model that satisfies the standard wave function interpretation, ie, one has a wave function $\psi(m, n)$, where m are the coordinates in a chart corresponding to the standard format of quantum mechanics, and n are the coordinates in a chart corresponding to the hidden variables.
An observable in this model is an operator A that acts on this wave function in the following manner:
$$A|\psi\rangle := \int_N A(n, m)\,\psi(m, n)\,dn$$
The expectation of A is then
$$E(A) := \int_M A|\psi\rangle(m)$$
There are no observables for the space N; it is M that in this interpretation is the space in which physics occurs.
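To make the two integrals in this definition tangible, here is a minimal discretised sketch. Everything concrete in it is an assumption for illustration only: M and N are both taken to be the unit interval, the integrals are replaced by midpoint Riemann sums, and the toy choices of `psi` and `A` are hypothetical. It is not a model of any physical system.

```python
def midpoints(lo, hi, count):
    """Midpoints of `count` equal cells covering [lo, hi], plus the cell width."""
    width = (hi - lo) / count
    return [lo + (k + 0.5) * width for k in range(count)], width

def apply_observable(A, psi, m, n_grid, dn):
    """Riemann-sum approximation of (A|psi>)(m) = integral over N of A(n, m) psi(m, n) dn."""
    return sum(A(n, m) * psi(m, n) for n in n_grid) * dn

def expectation(A, psi, m_grid, dm, n_grid, dn):
    """Riemann-sum approximation of E(A) = integral over M of (A|psi>)(m)."""
    return sum(apply_observable(A, psi, m, n_grid, dn) for m in m_grid) * dm

if __name__ == "__main__":
    m_grid, dm = midpoints(0.0, 1.0, 200)
    n_grid, dn = midpoints(0.0, 1.0, 200)
    psi = lambda m, n: 1.0   # toy "wave function", constant on the unit square
    A = lambda n, m: m * n   # toy kernel for the observable
    print(expectation(A, psi, m_grid, dm, n_grid, dn))  # close to 1/4
```

Note how the hidden coordinate n is integrated out first, so that only a function of m survives; this mirrors the remark that there are no observables for the space N.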
This is certainly quite restrictive! And, indeed by inspection it does not capture
the full generality of the mathematics which I have developed in the course of this
project; indeed, it fails to even capture the spirit of the statistical geometry; it is
similar to trying to understand the full generality of statistical geometry by merely
looking at the behaviour of almost sharp signal functions. Indeed, it is clear the
following is true:
Lemma 9.4.4. A hidden variable model (in the sense of Bell) is equivalent to the
standard product of an almost sharp geometry K with some other form of topological
space T . Furthermore, this is a product of the simplest type: if m ∈ K and n ∈
T , then the corresponding hidden variable model (in the sense of Bell) will have
corresponding coordinates (m, n).
Proof. (sketch). This follows from the fact that one is still using a wave function
description, and the definition of measurement of an observable in a hidden variable
model.
An example of such a model would be for instance a differentiable manifold M producted with the space of all possible metrics σ on M. However, this is quite a different thing from what I have been advocating in terms of the statistical geometry; whereas I was dealing with a similar sort of product on a local level, the statistics was imbedded implicitly in the computation of geodesic paths induced by local descriptors I called signal functions, rather than explicitly via a naive probabilistic distribution akin to the idea of many "parallel universes".
This distinction is crucial, for it is absolutely necessary that different inner products should be able to interact locally, rather than artificially enjoying a disjoint and distinct existence from their peers. Another way of putting this is that the naive model I described above, of M producted with the space of all possible metrics, has a trivial, and maximally disconnected topology, whereas the model I have been advocating has a maximally connected topology. The hidden variable models (in the sense of Bell) are, I believe, of the first type.
In particular, there is hence no reason why the Bell inequality should extend
to the more general objects I have constructed - since they are not hidden variable
theories in the sense above, and, in certain respects, are quite the opposite.
This (hopefully) suggests that the EPR paradox is satisfactorily resolved by the above theoretical considerations. Furthermore, since the resolution is constructive, we now have an action (as above) which may be studied to provide further insights into the mediation of so-called entanglement effects. This is a promising direction for further research.
Chapter 10
Number theory from information geometry
In this chapter I will give a number of results from number theory which can be
derived using the methods I have outlined in preceding chapters. In a sense, many
of the results I will sketch here serve as "toy examples" for the physics associated
to the same mathematical models. The first section requires nothing more than
a detailed understanding of the statistical calculus for statistical manifolds. The
second needs a basic understanding of statistical stacks and of course of statistical
manifolds. As for the final result, it requires the tools of turbulent geometry.
As a minor remark, for the special case of restricting to criticality of subsets
of the real numbers, and considering their analytic extensions, it turns out that
the theory of turbulent stacks and the theory of turbulent geometry become more
or less equivalent. This is mainly because the statistical derivative becomes trivial
through experiencing cancellation; the sections over the space of inner products
do not contribute to the dynamics. However if one were to consider more general
results pertaining to tuples of complex numbers, these techniques split apart again
and develop their own individualistic character.
Of course there are many, many more results and variations on such that can
be derived using these tools, methods, and general philosophy. Such a treatment is
beyond the scope of this work - the main purpose of this chapter is not to catalogue
all potential results that can be demonstrated using the machinery of information
geometry, but rather to demonstrate its power and capacity. Indeed, it is my suspicion that it would be impossible to succeed in the former task, since I am of the opinion that the well of truth accessible after this fashion is inexhaustible.
10.1 The first statistical moment of the complex numbers
This is an adaptation of a paper which I wrote on this subject, titled "On the criticality of the prime numbers and a conjecture of Riemann".
10.1.1 Introduction
Broadly speaking, the prime numbers are defined to be the minimal set that contains
maximal information about the natural numbers, ie the minimal set that generates
the natural numbers via multiplication.
I have already introduced the concept of a statistical manifold and developed
a variational principle which essentially amounts to computing critical points of an
information like quantity, called the Fisher information, on the statistical manifold.
Motivated by these two similar properties, it seems natural that many famous
conjectures and problems in number theory should be translatable into this language.
In this short note I indicate how it is possible to translate a famous conjecture of
Riemann’s into this context, and sketch how it might be possible to prove it using
these new tools.
This conjecture is:
Conjecture. (Conjecture A). The class of functions $g_{\{f_n\}_{n\in\mathbb{N}}}(z) = \sum_{n\in\mathbb{N}} \frac{f_n}{n^z}$, where $\{f_n\} \in l^\infty(\mathbb{C})$, have no poles to the right of the critical line Re(z) = 1/2.
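For readers who wish to experiment with this class of functions, the following sketch evaluates truncated sums of the series for a bounded coefficient sequence. The function name `dirichlet_partial_sum` and the choice of representing the sequence as a Python callable are assumptions for illustration; a finite truncation of course says nothing by itself about poles or analytic continuation.

```python
import math

def dirichlet_partial_sum(f, z, terms):
    """Partial sum  sum_{n=1}^{terms} f(n) / n**z  of the series g_f(z)."""
    z = complex(z)
    return sum(f(n) * n ** (-z) for n in range(1, terms + 1))

if __name__ == "__main__":
    ones = lambda n: 1.0                      # bounded sequence f_n = 1
    alternating = lambda n: (-1.0) ** (n + 1)  # bounded sequence f_n = +1, -1, +1, ...
    print(dirichlet_partial_sum(ones, 2, 10000), math.pi ** 2 / 6)
    print(dirichlet_partial_sum(alternating, 1, 10000), math.log(2))
    print(dirichlet_partial_sum(alternating, 0.5 + 14.0j, 10000))
```

The first two lines compare truncations against the classical limits of the corresponding series at z = 2 and z = 1; the last evaluates at a point on the line Re(z) = 1/2, where the alternating series converges only slowly.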
I will quickly illustrate now how this implies the more famous conjecture:
Conjecture. (Riemann Hypothesis). The zeroes of the functions $g_{\{f_n\}_{n\in\mathbb{N}}}$ lie on the critical line Re(z) = 1/2.
Essentially this implication rests on the following simple result:
Lemma 10.1.1. For every sequence $f_n \in l^\infty(\mathbb{C})$, there exists a sequence $h_n \in l^\infty(\mathbb{C})$ such that $1/g_{\{f_n\}} = g_{\{h_n\}}$.
From which follows, assuming the first conjecture holds:
Corollary 10.1.2. The class of functions $g_{\{f_n\}_{n\in\mathbb{N}}}$ have no zeroes to the right of the critical line Re(z) = 1/2.
It is then well known that the RH follows from this statement for this class. In
particular, this follows from the symmetry of the zeroes for L-functions about the
critical line Re(z) = 1/2, which is in turn a consequence of the functional equation
for the same.
Now, let $\Pi(x, a, d)$ denote the number of prime numbers in an arithmetic progression a, a + d, a + 2d, ... which are less than or equal to x. If the RH is true, then it is well known that for every coprime a and d and for every $\epsilon > 0$, we have that
$$\Pi(x, a, d) = \frac{1}{\phi(d)} \int_2^x \frac{1}{\ln(t)}\,dt + O(x^{1/2+\epsilon}) \quad \text{as } x \to \infty,$$
where $\phi : \mathbb{N} \to \mathbb{N}$ is the Euler phi function, that is, the function where $\phi(n)$ denotes the number of naturals less than or equal to n which are coprime to n.
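The asymptotic statement above can be explored numerically. The sketch below counts primes in a progression by sieving and compares the count with the main term, approximated by a midpoint rule. All function names are hypothetical, the progression is assumed to start at a positive a coprime to d, and the comparison is only indicative: it cannot by itself confirm or refute the size of the error term.

```python
import math

def primes_up_to(x):
    """Sieve of Eratosthenes: all primes <= x."""
    if x < 2:
        return []
    sieve = [True] * (x + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, math.isqrt(x) + 1):
        if sieve[p]:
            for multiple in range(p * p, x + 1, p):
                sieve[multiple] = False
    return [n for n, is_prime in enumerate(sieve) if is_prime]

def prime_count_ap(x, a, d):
    """Pi(x, a, d): primes <= x lying in the progression a, a + d, a + 2d, ..."""
    return sum(1 for p in primes_up_to(x) if p >= a and (p - a) % d == 0)

def euler_phi(n):
    """Number of k in 1..n with gcd(k, n) = 1."""
    return sum(1 for k in range(1, n + 1) if math.gcd(k, n) == 1)

def main_term(x, d, steps=100000):
    """Midpoint-rule approximation of (1/phi(d)) * integral from 2 to x of dt / ln(t)."""
    width = (x - 2) / steps
    integral = sum(width / math.log(2 + (k + 0.5) * width) for k in range(steps))
    return integral / euler_phi(d)

if __name__ == "__main__":
    for x in (10**3, 10**4, 10**5):
        print(x, prime_count_ap(x, 1, 4), round(main_term(x, 4), 1))
```

Printing the exact count next to the main term for a few values of x gives a feel for how small the discrepancy is relative to x.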
10.1.2 The Argument (sketch)

We consider M = C, A = C (the space of metrics on the complex line).
The natural signal function is
$$f(m, a) = \int_A F(m, b)\,\delta(\sigma(m, b) - a)\,db$$
We wish to choose f such that the Fisher information is optimised; in particular, we wish to find f such that the quantity
$$K = \int_M \int_A \|\partial f\|^2/f - \|\psi\|^2/f$$
is not only zero but that δK, the first variation of K, be zero too. Here $\psi = \mathrm{grad}_\Lambda f$. We ignore torsion due to our assumption of analyticity which renders its contribution zero by Cauchy's theorem.
In particular I will write ψ more explicitly in terms of the signal function fairly shortly. This will be to assist in generalisations.
$$I = K = \int_M \int_A \|\partial f\|^2/f$$
A computation of the Fisher information yields the fairly easy expression
$$I = \int_M \int_A \left( \partial_\Lambda^2 F - \frac{(\partial_\Lambda F)^2}{F} \right) - \int_M \int_A \left\| F^{1/2} \frac{\partial}{\partial\sigma} \partial\sigma \right\|^2$$
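Let $F = e^h$. Then $F'' = (h'' + (h')^2)e^h$ and $F' = h'e^h$, so that $F'' - \frac{(F')^2}{F} = h''e^h$.
Write $\psi = F^{1/2} \frac{\partial}{\partial\sigma} \partial\sigma$.
If we recall that $\partial_\Lambda(z, a) = \int_A F(z, b)\left(\frac{\partial}{\partial z} + \frac{\partial}{\partial b}\right) db$ then roughly speaking $h'' = \frac{\partial^2 h}{\partial z^2} + \frac{\partial^2 h}{\partial a^2} + 2\frac{\partial^2 h}{\partial z\,\partial a}$. But the second term vanishes due to the holographic principle, and so does the third, so in particular, we have
$$h'' = h_{zz}$$
as a distributional expression. More precisely, we get
$$h'' = \int_A F(z, b)\,h_{zz}\,db$$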
Note we have the following functional identity that will come in handy:
Lemma 10.1.3. $e^{H/\delta} = H$.
Proof. This may be checked qualitatively by verifying that each side possesses the same general properties; alternatively, one may take a sequence of smooth functions converging to H and check both sides, observing that $H' = \delta$.
Lemma 10.1.4. $h(z, a) = A(a)z + B(a) + G(z, a)\frac{H}{\delta}(\gamma(z, a))$, where H is the Heaviside function and δ is the Dirac delta functional, and G(z, a) is some random smooth function depending on σ.
Proof. Certainly $h_{zz} = \left\|\frac{\partial}{\partial\sigma}\partial\sigma\right\|^2 \delta$ follows easily from above. Hence h is the second antiderivative of this expression.
But δ may be written as $\left(\frac{H}{\delta}\right)''$. Then certainly $h = G(z, a)\frac{H}{\delta}(\gamma(z, a)) + A(a)z + B(a)$, since z is the active variable here, and we may integrate out the derivatives on H by judicious use of the holographic principle.
Lemma 10.1.5. $e^h = H(\gamma)\hat{F}$, with $\hat{F} = e^{Az+B}$.
Proof. Follows from the fact that $e^{G\frac{H}{\delta}} = e^{\frac{H}{\delta}} = H$ from a previous lemma.
Lemma 10.1.6. If we write $\psi = H\hat{\psi}$, then $H^{1/2}\hat{\psi}_a = H_a^{1/2}\hat{\psi}$.
Proof. By symmetry $H(z, a) = -H(z, -a)$ and $H_a(z, a) = H_a(z, -a)$. Then $H^{1/2}(z, a)\hat{\psi}_a = -H^{1/2}(z, -a)\hat{\psi}_a = H_a^{1/2}\hat{\psi}(z, -a) = H_a^{1/2}\hat{\psi}(z, a)$ follows as a consequence of these symmetries and integration by parts.
Lemma 10.1.7. The fact that the first variation of the information is zero implies that $\gamma = \gamma(2z - a)$.
Proof. Since the first variation of $\int_M \int_A h_{zz} e^h$ is zero already, we may restrict ourselves to the reduced information
$$\hat{I} = \int_M \int_A H(\gamma)\|\hat{\psi}\|^2\,da\,dz$$
where $\hat{\psi} = H\psi$.
Note also by conservation of probability, we moreover have that $\mathrm{div}_\Lambda \psi = 0$, and hence $\mathrm{div}_\Lambda \hat{\psi} = 0$.
Requiring that $\delta\hat{I} = 0$ implies that $\frac{\partial\hat{I}}{\partial\gamma} = 0$, or,
$$\int_M \int_A \left( \partial\|H(\gamma)^{1/2}\hat{\psi}\|^2 (\partial\gamma)^{-1} \right) da\,dz = 0$$
Recall that ∂ is essentially the operator $\frac{\partial}{\partial z} + \frac{\partial}{\partial a}$. Now, we use the fact that
$$H_a^{1/2}\hat{\psi} = H^{1/2}\hat{\psi}_a$$
together with
$$\hat{\psi}_z = 0$$
which is a consequence of conservation of probability, to deduce that our expression above is the same as
$$\int_M \int_A (H_z^{1/2} + 2H_a^{1/2})(\partial\gamma)^{-1}\psi(H^{1/2}\hat{\psi}) = 0$$
which clearly implies
$$\gamma_z + 2\gamma_a = 0$$
proving the lemma.
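The link between the final transport equation and the stated form of γ can be checked directly. The following short verification treats z and a formally as independent variables; the symbol g is introduced only for this check and denotes an arbitrary differentiable function of one variable.

```latex
\begin{align*}
\gamma(z,a) = g(2z - a)
  &\;\Longrightarrow\; \gamma_z = 2\,g'(2z-a), \qquad \gamma_a = -\,g'(2z-a) \\
  &\;\Longrightarrow\; \gamma_z + 2\gamma_a = 2g' - 2g' = 0 .
\end{align*}
```

Conversely, along any line on which 2z - a is constant (so that dz/ds = 1 and da/ds = 2 for a parameter s), the equation says that dγ/ds = γ_z + 2γ_a = 0, so γ is constant on each such line and therefore depends on (z, a) only through the combination 2z - a.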
Lemma 10.1.8. γ is trivial; that is, $\gamma = C(2z - a)$ for some constant C.
Proof. Recall that the information is
$$I = \int_M \int_A H(\gamma(z, a))\|\psi(z, a)\|^2$$
If we compute the first variation with respect to γ once again and set it to zero, we have that
$$\delta(\gamma(t)) \left.\frac{d\gamma}{dt}\right|_{t=a-2z} \|\hat{\psi}\|^2 = 0$$
Then this implies
$$0 = H(\gamma(t)) \frac{d\gamma'}{d\gamma} \|\hat{\psi}\|^2$$
which in turn implies $\frac{d\gamma'}{d\gamma} = 0$ or $\gamma'$ is constant, hence $\gamma = Ct + D$ for constants C and D. But by the Riemann mapping theorem we can deform γ to Ct since the transformation $T : Ct + D \mapsto Ct$ is a conformal map. Finally we observe that $H(Ct) = H(t)$ as C is constant, which completes the proof.
Proof. (of Conjecture A). Note that F is analytic (since our statistical manifold is analytic, we require that the signal function be analytic).
It then follows from the analyticity of F that
$$0 = \int_M \int_A F(z, a) = \int_M \int_A H(a - 2z)e^{A(a)z+B(a)} = \int_M \int_A H(a - 2z)\bar{B}(a)a^{-z}$$
for certain choices of A and B.
Now suppose $\bar{B}(a) = \sum_{n=1}^\infty B_n \delta(n - a)$. Then the above expression becomes
$$0 = \int_{\mathrm{Re}(z) \geq \frac{1}{2}} \sum_{n=1}^\infty B_n n^{-z}$$
which completes the proof of Conjecture A.
10.1.3 Alternative approach using Riemann-Cartan geometry
The previous approach was outlined by myself before I started to realise that the
metric-measure formalism of Riemannian geometry with measure could be simplified
and generalised to Riemann-Cartan geometry. However I should emphasise that
there is no loss of generality in applying the former methods as in the previous
subsection, to the special case of the complex numbers. Nonetheless one is motivated
to find a cleaner and more purely geometric derivation of the above result using the
more general methods. So I will seek to quickly sketch one here for my own benefit and the reader's.
As before, the natural signal function is
$$f(m, a) = \int_A F(m, b)\,\delta(\sigma(m, b) - a)\,db$$
but now the total information is merely
$$I(f) = \int_M \int_A \|\partial f\|^2/f$$
since we can ignore the measure terms.
This we can compute and find to be
$$I(f) = \int_M \int_A \Delta f$$
which is
$$\int_M \int_A \int_A ((\Delta F)\delta + F\Delta\delta) = \int_M \int_A (h'' + R(\sigma)\delta)e^h$$
where $F(m, b) = e^{h(m,b)}$, since cross terms vanish as they can be rewritten as boundary terms.
If this is to be zero, then $h'' + R\delta = 0$, or, via a similar argument to the previous Lemma 10.1.4,
$$h = A(a)z + B(a) + C(z, a)\frac{H}{\delta}(\gamma(z, a))$$
which gives the information as
$$I(f) = \int_M \int_A H(\gamma)(h'' + R)e^{A(a)z+B(a)}\,da\,dz$$
where we require that $\|\psi(a, z)\|^2 := (h'' + R)e^{A(a)z+B(a)}$ behaves nicely asymptotically (and in fact behaves like a generalised mass distribution). Note that this may not necessarily be positive definite.
The rest of the argument is totally identical to that given previously, following
Lemmas 10.1.7 and 10.1.8.
10.1.4 Higher moments of the complex numbers

It is possible to look at higher statistical moments of the complex numbers. For
instance, if we look at the second statistical moment, I claim we get the following
result:
Conjecture. Consider the class of functions $\zeta'_f = \sum_i f_i \frac{d}{dz} i^{-z}$, where $f \in l^\infty(\mathbb{N})$. Then $\zeta'_f$ has nontrivial zeroes only on the critical line Re(z) = 1.
In particular I claim this is a consequence of the Cramer-Rao inequality for the second statistical moment of the real line, extended analytically to the complex numbers.
One might naively expect, from these two pieces of data, that the critical line should move by multiples of 1/2 indefinitely as we examine successively higher moments. This is not the case; the true story of what happens in general will be the focus of the next section.
10.2 Turbulence and the criticality of the prime numbers
10.2.1 Introduction
Next, I will demonstrate how the tools I have developed in the chapter on turbulent
geometry can be used to significantly extend the (generalised) Riemann conjecture.
However before I do this I will demonstrate how the sketch I gave above can be
streamlined, through use of the correspondence principle.
Naively we might then hope to get, say, optimal information about even the singleton distribution of the primes, but I fear that this is unfortunately impossible to achieve. The reason is that I am of the opinion that there is a natural geometric hierarchy of models of growing sophistication and generality, and that not only do models higher in the progression give better information, but that there are an infinite number of such.
So, following an initial extension of the Riemann conjecture to signal functions
with fully general turbulence in the stack and the measure, I will indicate the begin-
nings of the idea of a new mathematics, the idea of geometric exponentiation, and
suggest how this could be used to get even better results.
10.2.2 A streamlined sketch of the Riemann conjecture

Consider the following action:
$$I(\sigma, \tau) = \int_M (\partial_\sigma * R_\sigma ; R_\tau)\,dm$$
Suppose M is the complex plane, and σ, τ are analytic functions from C to C, being metrics on M. By the correspondence principle it is equivalent to the Fisher information given by the signal function
$$f(m, a) = \int_A F(m, b)\,\delta(\bar{\sigma}(m, b) - a)\,db$$
For the information to be critical we require that $\delta I(\sigma, \tau) = 0$.
Consequently, via similar methods to before, we get the following equations that σ and τ should satisfy:
$$(\partial_\sigma * R_\sigma ; \partial_\sigma * R_\tau) = 0$$
and
$$(\partial_\sigma * R_\sigma ; R_\tau) = 0$$
If σ is nontrivial this implies that τ is constant and $R_\sigma = 0$. Note that we know that the constant τ must be nonzero since otherwise the corresponding metric $\tau(z) = \tau$ will become degenerate.
Recall now that an analytic metric is of the following general form:
$$ds^2 = \sigma(z)^2\,dz\,d\bar{z}$$
and the corresponding (Gaussian) curvature is
$$R_\sigma = K = -\frac{1}{\sigma(z)^2} \frac{\partial^2}{(\partial z)^2} \ln(\sigma(z))$$
So since $R_\sigma = 0$ we have that $\frac{\partial^2}{\partial z^2}\ln(\sigma) = 0$.
Solving for σ, we get
$$\ln(\sigma) = Az + B$$
or
$$\sigma = e^{Az+B}$$
where A and B are constants.
Note $f(z, a) = \delta(\sigma(z) - a)$ is analytic.
Consider now $g(z, a) = \delta(\tau - a)$ the analytic signal function associated to the constant metric τ. Via properties of the turbulent derivative and delta functions, I claim then that the total signal function is
$$F(z, a) = (\partial_f * f; g) = H(\kappa(z, a))\,\delta(\sigma(z) - a)$$
(Note that the result is independent of the value of τ, but of course we need τ ≠ 0 as demonstrated above.)
This follows since
∂δ
(i) (∂f ∗ f ; g)2 = ( ∂σ
∂
(σ δ(τ −b) ) ∂σ
∂
(σ (−1)δ(τ −b) ))δ(σ − a)2 , as ∂f δ
= f δ.
(ii) $f(z, a)^{\delta(\tau-b)} = \delta \ln f + H(\kappa(z, a))$ for some function κ, so the above reduces to
(iii) $\frac{\partial}{\partial\sigma}(\delta \ln\sigma + H)\frac{\partial}{\partial\sigma}(-\delta \ln\sigma + H)\,\delta(\sigma - a)^2 = H(\hat{\kappa}(z, a))^2\,\delta(\sigma - a)^2$, for a deformed function $\hat{\kappa}$ which proves my claim (if we approximate δ by a sequence of peaked Gaussians of the form $be^{-ax^2}$ with integral normalised to one, then $\delta^2$ will be by definition approximated by a sequence of the form $b^2e^{-2ax^2}$, and hence in the limit will have zero measure).
We then conclude as before that κ̂(z, a) = C(2z − a) for some constant C, using
once more the fact that the information is critical.
Finally the conjecture then follows as before as a consequence of the analyticity
of F .
10.2.3 Towards a slightly more sophisticated conjecture

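Consider now the signal function
$$(\partial_F * F(m, a, c); g(m, b)), \quad F = f(m, a)^{h(m,c)}.$$
We would like to extract optimal information about critical subsets of the real line using this model, given that f, g and h are all analytic. We know that the information will be
$$I(\sigma, \tau, \kappa) = \int_M (\partial_\sigma * R_{(\partial_\kappa * R_\kappa ; R_\gamma)\sigma} ; \partial_\tau * R_{(R_\beta)\tau} ; R_\lambda)\,dm$$
Extremising this, or requiring that δI = 0, gives the following six relations:
$$(\partial_\sigma * R_{(\partial_\kappa * R_\kappa ; R_\gamma)\sigma} ; \partial_\tau * R_{(R_\beta)\tau} ; R_\lambda) = 0$$
$$(\partial_\sigma * R_{(\partial_\kappa * R_\kappa ; R_\gamma)\sigma} ; \partial_\sigma * \partial_\tau * R_{(R_\beta)\tau} ; R_\lambda) = 0$$
$$(\partial_\sigma * R_{(\partial_\kappa * R_\kappa ; R_\gamma)\sigma} ; \partial_\tau * R_{(R_\beta)\tau} ; \partial_\tau * R_\lambda) = 0$$
$$(\partial_\kappa * R_\kappa ; \partial_\kappa * R_\gamma) = 0$$
$$(\partial_\kappa * R_\kappa ; R_\gamma) = 0$$
$$R_\beta = 0$$
Solving these simultaneously in the case that M is the complex plane should provide us with a sharper understanding of critical subsets of the real numbers.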
For some idea of what we might expect to come out of this, I provide the reader
with a conjecture that may well be a partial consequence of analysis of the above.
Conjecture. (Turbulent RH). Let $f, g \in l^\infty(\mathbb{N}) \times \mathbb{N}$ be functions from $\mathbb{N} \times \mathbb{N}$ to $\mathbb{R}$. Consider the generalised L-function
$$\zeta_{(f,g)}(z) = \sum_{i,j} f_{ij} \frac{\partial^{\,g_{ij} j^{-z}}}{\partial z^{\,g_{ij} j^{-z}}}\, i^{-z}$$
I claim that, for all such f and g, the analytic extension of $\zeta_{(f,g)}$ to the complex numbers will only have (nontrivial) zeroes on the critical line
$$\mathrm{Re}(z) = \left.\frac{d}{dx}\arctan\left(\sqrt{x\tan(x)}\right)\right|_{x\tan(x)=\lambda(g)},$$
where $\lambda(g)$ is the average of the eigenvalues of the $g_{ij}$, weighted by multiplicity.
As a couple of remarks, note that this conjecture reduces to our statements for
the first and second statistical moments of the complex numbers, respectively (as
well it should). Perhaps the most mysterious aspect is the relationship between
λ(g) and the location of the critical line - and in particular the appearance of the
trigonometric function tan. Essentially this means that as the average eigenvalue
increases, the location of the critical line will occasionally "spike" to positive infinity, then reappear at negative infinity and quickly drop back close to zero again, and do so with a period of length π. Indeed, this is a significant departure from the naive expectation that if we were to have taken successively higher statistical moments of the complex numbers, that the location of the critical line for the associated L-functions would have jumped by multiples of 1/2 indefinitely.
It would be very good to even have a sketch of the above. However, given the
fact that I am still uncertain whether this is a precisely correct consequence of the
criticality of the turbulent information (even given everything else, I am unsure
about whether λ should not also be a function of f ), it would perhaps be wiser
to instead try to see what follows from the criticality of the action I have given
above instead. Also it was never my intent to delve particularly deep into this area of mathematics; rather, I merely wished to indicate how the tools I have developed can be used, and what results might follow if care is taken.
It also goes without saying that if the above conjecture is slightly off-beam and in fact not a corollary of the criticality of our signal function above, a counterexample should be easy enough to find via numerical methods.
10.3 Geometric exponentiation and deeper information
10.3.1 Motivation and definition
Recall from my previous work on turbulent geometry that there are two types of
turbulence that admit straightforward modelling - turbulence in the measure and
turbulence in the stack. These result in actions of the form RR and RR respectively, where I am taking considerable liberties paraphrasing here. Similarly we can
consider measure-measure turbulence, measure-stack turbulence, stack-measure tur-

R
bulence, and stack-stack turbulence. These result in actions of the form RR , RRR ,
RRR , and RRR respectively.
It is natural to then ask what happens in general. Well certainly at the nth iteration we will have $2^n$ different possibilities for an action. So the complexity of our models, if we wish to encompass all possibilities, grows exponentially with further attention to detail. This is evidently not desirable. In particular we would ideally like to know what happens if we push n off to infinity, to generate an infinite number of discrete geometric bifurcations in our models. Then it is readily seen that the number of possibilities is $2^{\aleph_0}$, or $\aleph_1$, the cardinality of the real numbers (I am taking a further liberty here - for those who wish to believe in intermediate infinities I adopt the convention that they might take a continuous range of values $\aleph_k$ where k is between 0 and 1).
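The exponential growth in the number of possible actions is easy to see by brute-force enumeration. The sketch below simply lists every sequence of turbulence types at a given depth; the labels "measure" and "stack" come from the discussion above, while the function name and the string encoding of a sequence are assumptions for illustration.

```python
from itertools import product

TURBULENCE_TYPES = ("measure", "stack")

def turbulence_sequences(n):
    """All length-n sequences of turbulence types, one type per level of iteration."""
    return ["-".join(seq) for seq in product(TURBULENCE_TYPES, repeat=n)]

if __name__ == "__main__":
    print(turbulence_sequences(2))
    for n in range(1, 6):
        print(n, len(turbulence_sequences(n)))
```

At depth two this recovers the four combinations named in the text, and the count doubles with each further level.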
So this leads one to ask, is there some formalism that would allow us to deal with an infinite number of discrete bifurcations? Needless to say, if this is to be doable, some novel idea or way of looking at things is essential to make progress so that calculations do not become unmanageable. To cut things short, I believe that the answer is yes, and this is where the idea of geometric exponentiation enters the picture.
Set exponentiation is a fairly simple concept to understand. Consider two sets A and B. The product A × B may readily be formed, and is understood to be the set of tuples (a, b) where a ∈ A and b ∈ B. So this is the product of two sets. How about raising one set to the power of another? What exactly should this mean? Obviously we expect the cardinality of $A^B$ to be necessarily of quite a different order to that of A × B. So this needs to follow from the definition.
Briefly, the exponential $A^B$ will be understood to mean a product $\times_{b\in B} A(b)$, where each A(b) is a copy of A indexed by an element b of B. So if B is large, say infinite, as will often be the case, $A^B$ will be very large indeed. In what is to follow, A and B will often be Riemannian manifolds. Then $A^B$ will be a manifold itself of uncountably infinite dimension. However more information is required to specify and map out an appropriate amount of geometric structure for such spaces, and this leads directly to the notion of geometric exponentiation.
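For finite sets the definition of set exponentiation can be spelled out directly. In the sketch below, which assumes small finite A and B purely for illustration, an element of the exponential is represented as a dictionary assigning to each index b one element of the corresponding copy of A; the function names are hypothetical.

```python
from itertools import product

def set_product(A, B):
    """Cartesian product A x B: all tuples (a, b)."""
    return list(product(A, B))

def set_power(A, B):
    """A^B: one copy of A for each b in B, i.e. all assignments b -> a."""
    B = list(B)
    return [dict(zip(B, choice)) for choice in product(A, repeat=len(B))]

if __name__ == "__main__":
    A = [0, 1, 2]
    B = ["x", "y", "z", "w"]
    print(len(set_product(A, B)))  # |A| * |B|
    print(len(set_power(A, B)))    # |A| ** |B|
```

The product grows like |A| times |B|, whereas the exponential grows like |A| to the power |B|, which is the difference in order of cardinality referred to above.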
So we would like to exponentiate say one Riemannian manifold (M, σ) by another (N, τ). As before, by exponentiation of two sets $A^B$ I mean taking one copy of A for each element in B, leading to a structure that potentially could have uncountable dimension. If A and B have differentiable structure, we can similarly induce an indexing on the tangent space of $A^B$ via v(w) for the vector v(w) assigned to the wth copy of the tangent space of A. Furthermore it is in fact possible to induce a natural geometric structure on $M^N$ using the metrics σ and τ, via the fourth order nondegenerate tensor $\Lambda = \sigma \otimes \tau$, with geometry given by $\|(v(w), p(q))\|_\Lambda := \sigma_{ij}\tau_{kl}v^i(w^k)p^j(q^l)$.
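The contraction defining this geometry can be illustrated in a finite-dimensional toy setting. The sketch below rests on an assumption of mine that is not stated in the text: the indexed vector with components v^i(w^k) is stored as a finite array V[i][k], and likewise p^j(q^l) as P[j][l], so that the pairing becomes a plain sum over four indices. The function name `lambda_pairing` is hypothetical.

```python
def lambda_pairing(sigma, tau, V, P):
    """Contract sigma[i][j] * tau[k][l] * V[i][k] * P[j][l] over all four indices."""
    m, n = len(sigma), len(tau)
    total = 0.0
    for i in range(m):
        for j in range(m):
            for k in range(n):
                for l in range(n):
                    total += sigma[i][j] * tau[k][l] * V[i][k] * P[j][l]
    return total

if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    V = [[1.0, 2.0], [3.0, 4.0]]
    # With both metrics Euclidean the pairing is the sum of products of matching entries.
    print(lambda_pairing(identity, identity, V, V))
```

In this truncated picture the fourth order tensor Λ = σ ⊗ τ never needs to be formed explicitly; its action is recovered by applying σ to the upper indices and τ to the indexing coordinates.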
10.3.2 Application
To reiterate, this idea was developed in an attempt to find a way to deal with
an infinite number of geometric bifurcations, as is engendered by the two types of
mathematical turbulence in the turbulent geometry discussed above (turbulence in
the stack and the measure). It is my hope that it might be possible to find a new
correspondence principle for such spaces that allows the description of the previous
geometry not only in full generality, but also in an elegant and finite way.
As to applications of geometric exponentiation, my thoughts on this matter are
still quite vague. However I suspect it could be used to improve the understanding of
transcendental equations, as well as of aspects of the mathematics related to Galois
theory. I also have the intuition that geometric exponentiation would be useful to
find still deeper results about critical subsets of the real line than those described
previously in this chapter.
For instance, one might expect actions of the following general form, in the simplest case:
$$\int_M S_\Lambda(m)\,dm$$
Here $S_\Lambda$ is the generalised curvature due to the 4-tensor Λ on M. This of course is the action corresponding to the signal function
$$f(m, a(b)) = \delta(\Lambda(m) - a(b))$$
To extract the geometric exponential analogue of the Riemann conjecture, the following action needs to be considered:
$$\int_M (\partial_\Lambda * S_\Lambda ; S_\Gamma)\,dm$$
This is of course the corresponding turbulent information of two signal functions as immediately above and I claim that it satisfies a corresponding Cramer-Rao inequality.
In particular I suspect that exploring the consequences of this action over the complex numbers would lead to preliminary criticality results in transcendental number theory.
10.3.3 Concluding Remarks

It is necessary to emphasise that all of the results that I have discussed up to this
point are still quite primitive. I mentioned not too long ago that I believe that
there is a natural geometric hierarchy of countably infinite order of structures of
monotonically increasing complexity. Throughout the latter half of this work I have
endeavoured to climb as much of this edifice as I have been able.
The first few tentative steps up this mountain - differential geometry, Riemannian geometry, statistical geometry, turbulent geometry - have been difficult. One
might hope that, with these minor successes, one could systematise the method
of ascent, and indeed we are led to the idea of geometric exponentiation in this
manner. However I have come to increasingly suspect that this hierarchy is patho-
logically non-inductive, even bearing in mind that each step up brings something
new and unexpected.
For instance, one might naively intuit that one can keep on doubling the order
of our tensors - from 2nd degree in Riemannian geometry to 4th in exponential, to
8th, etc. It is possible to do this and study the associated structures - in fact, I have
already initiated a study of exponential, or "plastic" geometry, which has interesting
connections to viscoplastic media - but we are ignoring something quite important.
That is, there is no reason we should not consider a distribution of information over
the space of tensors of all possible ranks. Furthermore, this rank need not be a
natural number - one could reasonably imagine a sensible abstraction of the idea of
tensor rank to the reals, or even to euclidean n-space.
We then would get a tensor rank manifold, N , forming a strange variety of
topological product with our physical space, M . We may also in turn, for simplicity,
have a standard metric, or second rank tensor on N that gives it the structure of a
Riemann-Cartan manifold. The associated geometric models would then look like
quite exotic fibre bundles.
So further progress of a definite and fully general variety will certainly be difficult
to make. However, applications of such methods may have large potential
payoff. I suspect that even slight progress could have benefits to the understanding
of materials science (in the case of plastic geometry) and improvement in sieving
methods (with respect to the tensor rank manifolds outlined). The reason to suspect
the latter is that it is quite possible that the metric on the tensor rank space may
be in correspondence with the bilinear form of the remainder term in the linear sieve
(see for instance Greaves [24]). Related questions, such as the Geometric Langlands
program, might also be addressable with deeper understanding.
Note that if we step back a bit, it is easy to observe that all the above consid-
erations relate to having ideas from statistics driving the development of abstract
geometric structures on manifolds. It then becomes natural to ask whether we can
use geometry to drive the development of statistical structures. Motivating questions
here concern the rigorous formulation of Heisenberg's matrix mechanics, the
understanding of Brownian motion, and doubtless various other processes, such as
those arising in queuing theory, population dynamics, or the analysis of markets
and group psychology.
And in fact we can. The prototypical sort of example to bear in mind here is
that of a Markov chain, where one has a nondegenerate bilinear form driving the
evolution of a probabilistic state vector. It seems clear to me that this is a natural
place to start in order to begin the process of abstraction and generalisation with
respect to this particular viewpoint. In fact in retrospect it seems to me that this
and its continuous analogues are perhaps closer in spirit to the work of many of the
more established practitioners of Information Geometry than that covered in the
bulk of this treatise, being of a more statistical and less geometric flavour.
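The prototypical example can be sketched in a few lines. This is only an illustration under stated assumptions: the three-state transition matrix P is hypothetical, and reading the "nondegenerate bilinear form" as a row-stochastic matrix with nonzero determinant is one interpretation of the paragraph above, not a construction taken from the text.

```python
# Hypothetical 3-state Markov chain; the entries of P are illustrative only.
# Each row sums to 1 (row-stochastic), so probability mass is conserved.
P = [
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.2, 0.8],
]


def det3(A):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (
        A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
        - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
        + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0])
    )


def bilinear(p, q, A):
    """The bilinear form B(p, q) = sum_ij p_i * A[i][j] * q_j."""
    n = len(A)
    return sum(p[i] * A[i][j] * q[j] for i in range(n) for j in range(n))


def step(p, A):
    """One step of the chain: the j-th new component is B(p, e_j)."""
    n = len(A)
    return [sum(p[i] * A[i][j] for i in range(n)) for j in range(n)]


def evolve(p, A, steps):
    """Apply the transition `steps` times to the state vector p."""
    for _ in range(steps):
        p = step(p, A)
    return p


if __name__ == "__main__":
    # Nonzero determinant: the form is nondegenerate in the assumed sense.
    print("det P =", det3(P))
    # Start with all mass in state 0 and watch the state vector relax.
    print("after 1 step:   ", step([1.0, 0.0, 0.0], P))
    print("after 200 steps:", evolve([1.0, 0.0, 0.0], P, 200))
```

For this particular P the state vector relaxes towards the stationary distribution (0.4, 0.4, 0.2), which can be checked by hand from the fixed-point equation for the chain.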
These and doubtless other considerations will hopefully be the subject of a future
work.
Bibliography
[1] - Shun-ichi Amari - "Differential-Geometrical Methods in Statistics" - Lecture
Notes in Statistics, Springer-Verlag, 1985
[2] - Shun-ichi Amari, Hiroshi Nagaoka - "Methods of Information Geometry"
- Translations of Mathematical Monographs Volume 191, Oxford University
Press, 1993
[3] - John S. Bell - "On the Einstein Podolsky Rosen paradox", Physics 1, 195
(1964)
[4] - Berger - "A Panoramic View of Riemannian Geometry" - Springer, 2003
[5] - D. Bohm - Phys. Rev. 85, 166 and 180 (1952)
[6] - D. Bohm and Y. Aharonov, Phys. Rev. 108, 1070 (1957)
[7] - Banks and Zaks, "On the phase structure of vector-like gauge theories with
massless fermions", Nucl. Phys. B 196:189, 1982
[8] - Huai Dong Cao and Xi Ping Zhu - "A Complete Proof of the Poincaré and
Geometrization Conjectures - Application of the Hamilton-Perelman Theory of
the Ricci Flow" - http://www.intlpress.com/
[9] - Manfredo Perdigao do Carmo - "Riemannian Geometry", Birkhäuser 1992
[10] - N.N. Cencov - "Statistical decision rules and optimal inference" - Translation
of Math. Monog. 53, Amer. Math. Soc., Providence 1982
[11] - Cheeger and Ebin - "Comparison theorems in Riemannian Geometry"
[12] - Shiing-Shen Chern - "On the multiplication in the characteristic ring of a
sphere bundle", Ann. of Math., 49 (1948), 362-372
[13] - Frederick L. Coolidge - "Statistics: A Gentle Introduction", Sage Publications,
2nd Edition 2006
[14] - A. Einstein, B. Podolsky, and N. Rosen - "Can Quantum-Mechanical Description
of Physical Reality Be Considered Complete?", Phys. Rev. 47, 777-780
(1935)
[15] - Federer - "Geometric Measure Theory", Springer, 1996 edition
[16] - Theodore Frankel - "Manifolds with Positive Curvature", Pacific J. Math., 11
(1961), 165-174
[17] - Theodore Frankel - "On the fundamental group of a compact minimal submanifold",
Ann. of Math., 83 (1966), 68-73
[18] - R.A. Fisher - "Statistical Methods and Scientific Inference"
[19] - B.Roy Frieden - "Physics from Fisher Information", Cambridge University
Press, 1999 edition
[20] - Ryuichi Fukuoka - "Mollifier smoothing of tensor fields on differentiable manifolds
and applications to Riemannian Geometry" - arXiv:math.DG/0608230
[21] - Michael Gromov - "Curvature, diameter and Betti numbers", Comment.
Math. Helvetici 56 (1981) 179-195
[22] - Howard Georgi - "Unparticle Physics", hep-ph/0703260
[23] - Detlef Gromoll and Karsten Grove - "A Generalisation of Berger's Rigidity
Theorem for Positively Curved Manifolds", Ann. scient. Ec. Norm. Sup., 4
serie, 20 (1987), 227-239
[24] - George Greaves - "Sieves in Number Theory", A Series of Modern Surveys in
Mathematics Volume 43, Springer 2001
[25] - Karsten Grove and Katsuhiro Shiohama - "A generalised sphere theorem",
Annals of Mathematics, 106 (1977), 201-211
[26] - D. Gilbarg and N. S. Trudinger - "Elliptic Partial Differential Equations of
Second Order", Springer, 3rd edition, 2001
[27] - R. Harvey and H. B. Lawson, "Calibrated geometries", Acta Math. 148 (1982),
47-157
[28] - E. Hopf - "Elementare Bemerkungen über die Lösungen partieller Differentialgleichungen
zweiter Ordnung vom elliptischen Typus", Sitz. Ber. Preuss. Akad.
Wissensch. Berlin, Math.-Phys. Kl. 19, 147-152 (1927)
[29] - A. O. Ivanov - "Calibration forms", Advances in Soviet Mathematics Vol 15,
Minimal Surfaces, editor A. T. Fomenko, American Mathematical Society, 1993
[30] - John David Jackson - "Classical Electrodynamics", Third edition 1999, Hamilton
Printing Company
[31] - H. Blaine Lawson, Jr. - "The Unknottedness of Minimal Embeddings", Inventiones
math., 11 (1970), 183-187
[32] - H. Blaine Lawson, Jr. - "Lectures on Minimal Submanifolds", Vol. 1, Publish
or Perish Inc., (1980)
[33] - A. I. Khinchin - "Mathematical foundations of Information Theory", Dover
Publications, 1957
[34] - J.J. Koliha - "Metrics, Norms and Integrals: An Introduction to Contemporary
Analysis", World Scientific Publishing Co Pte Ltd, 2008
[35] - Frank Morgan - "Geometric Measure Theory - A Beginner's Guide", 2nd
edition, Academic Press, 1995
[36] - John W. Morgan and Gang Tian - "Ricci Flow and the Poincare Conjecture",
http://arxiv.org/abs/math/0607607
[37] - John W. Morgan and Gang Tian - "Completion of the Proof of the Geometrization
Conjecture", http://arxiv.org/abs/0809.4040
[38] - Milnor and Stasheff - "Characteristic Classes", Annals of Mathematics Studies,
Study 76, Princeton University Press
[39] - Misner, Wheeler and Thorne - "Gravitation", W.H. Freeman and Company,
1973
[40] - Michael K. Murray and John W. Rice - "Differential Geometry and Statistics",
Monographs on Statistics and Applied Probability 48, Chapman and Hall, 1993
[41] - Barrett O'Neill - "Semi-Riemannian Geometry with Applications to Relativity",
Academic Press 1983
[42] - Grisha Perelman - "The Entropy Formula for the Ricci Flow and its Geometric
Applications" - http://arxiv.org/abs/math.DG/0211159
[43] - Denes Petz - "Covariance and Fisher Information in Quantum Mechanics" -
arxiv:quant-ph/0106125v1, 22 Jun 2001
[44] - Denes Petz - "Information Geometry and Statistical Inference" - 2nd International
Symposium on Information Geometry and its Applications, December
12-16 2005, Tokyo
[45] - Jon Pitts - "Existence and regularity of minimal surfaces on Riemannian
manifolds"
[46] - Bernhard Riemann - "On the Number of Prime Numbers less than a Given
Quantity", Monatsberichte der Berliner Akademie, November 1859
[47] - Leon Simon - "Lectures on Geometric Measure Theory", Proceedings of the
Centre for Mathematical Analysis, ANU, Volume 3 1983
[48] - Steenrod - "The topology of fibre bundles"
[49] - Steenrod - "The classification of sphere bundles", Ann. of Math., 45 (1944),
294-311
[50] - Jalal Shatah and Michael Struwe - "Geometric Wave Equations", Courant
Lecture Notes in Mathematics, 1998
[51] - Whitney - "Topological properties of differentiable manifolds", Bull. Amer.
Math. Soc., 43 (1937), 785-805
